Differential Geometry

[i]
Alan U. Kennington
Differential geometry reconstructed

a unified systematic framework
First edition [work in progress]
Differential geometry reconstructed. Copyright (C) 2010, Alan U. Kennington. All Rights Reserved. The author hereby grants permission to print this book draft in A4 format. Printing in all other formats is forbidden. You may not charge any fee for copies of this book draft.
[Cover illustration; legible labels include Ω1, Ω2, mathematics, physics, chemistry, biology, neuroscience, anthropology]
[ www.topology.org/tex/conc/dg.html ] [ draft: UTC 2010–4–8 Thursday 09:22 ]

[ii]
Mathematics Subject Classification (MSC 2000): 53–01
Library cataloguing data
Kennington, Alan Ulrich (1953– )

Differential geometry reconstructed:
a unified systematic framework
2010 516.3
First printing, April 2010 [work in progress]
Copyright © 2010, Alan U. Kennington.
All rights reserved.
The author hereby grants permission to print this book draft in A4 format.
Printing in all other formats is forbidden.
You may not charge any fee for copies of this book draft.
This book was typeset by the author with the plain TeX typesetting system.
The illustrations in this book were created with MetaPost.
This book is available on the Internet: http://www.topology.org/tex/conc/dg.html

[iii]
Preface
This book should be suitable for fourth-year university mathematics, physics and engineering students, or for
anyone who has already learned differential geometry but has an uneasy feeling that they may have skimmed
over a few too many fine points. The intention here is to replace intuition and hand-waving with a seamless,
systematic exposition. However, this is only a definitions book, not a theorems book. The reader must
look elsewhere for serious theorems and serious applications. But understanding definitions is obviously an
enormously important part of understanding theorems and applications.
The author wrote the first 112 pages of this book in early 1992 in Bonn on his Atari ST computer. After
nine years of neglect, he wrote another 310 pages from August 2001 to November 2002. It is still a scruffy
“work in progress” scrapbook, but it may be ready for a first printing some time in 2010. Then there should
be one or two printings every year thereafter.
Right now, this book looks more like a construction site than a finished building. With some imagination,
you may be able to conjure up a vision of the finished work through the scaffolding. Material is being added
and rewritten in many chapters and sections simultaneously. The creative process for producing this book
is illustrated in the following diagram. All processes are happening concurrently.
[Diagram: books → (read) → brain [ideas] → (write) → desk [notes] → (type) → workstation [TeX files] → (upload) → web server [PostScript files] → (download) → Internet]
The current strategy is to first type in all of my hand-written notes during the “ideas capture phase”. Then
during the “consolidation phase”, everything will be made neat, tidy, comprehensible and coherent. The
book is being assembled like a jigsaw puzzle. Some of the pieces are fitting together nicely already, but
most pieces are in disorganized heaps. Many pieces are still in the box waiting to be thrown on the table.
Sometimes new pieces must be crafted by hand. It’s like moving into a new house. First you dump all the
boxes on the floor; then you must put everything where it belongs. Inconsistencies, repetition, self-indulgence
and frivolity will be progressively removed. All of the theorems will be proved. All of the exercises will be
solved. Formative chaos will yield to serene order. It won’t happen overnight, but it will happen.
April 2010 Dr. Alan U. Kennington

Melbourne, Victoria
Australia
Disclaimer
The author of this book disclaims any express or implied guarantee of the fitness of this book for any purpose.
In no event shall the author of this book be held liable for any direct, indirect, incidental, special, exemplary,
or consequential damages (including, but not limited to, procurement of substitute services; loss of use, data,
or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict
liability, or tort (including negligence or otherwise) arising in any way out of the use of this book, even if
advised of the possibility of such damage.
Biography
The author was born in England in 1953 to a German mother and Irish father. The family migrated in 1963
to Adelaide, South Australia. The author graduated from the University of Adelaide in 1984 with a Ph.D.
in mathematics. He was a tutor at the University of Melbourne in 1984, a research assistant at the Australian
National University (Canberra) in early 1985, Assistant Professor at the University of Kentucky for the 1985/86
academic year, and a visiting researcher at the University of Heidelberg, Germany, in 1986/87. From 1987 to
2007, the author carried out research and development of communications and information technologies in
Australia, Germany and the Netherlands.

[iv]
Chapters
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Part I. Preliminary topics
2. Philosophical considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3. Logic semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4. Logic methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5. Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6. Relations and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
7. Order and integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8. Rational and real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
9. Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
10. Linear algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
11. Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
12. Affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
13. Tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
14. Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
15. Topology classes and constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
16. Topological curves, paths and groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
17. Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
18. Differential calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
19. Diffeomorphisms in Euclidean space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
20. Measure and integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
21. Differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
22. Non-topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
23. Topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
24. Parallelism on topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . 529

Part II. Differential geometry
25. Topological manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
26. Differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
27. Tangent bundles on differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . 571
28. Tensor bundles and tensor fields on manifolds . . . . . . . . . . . . . . . . . . . . . . . 601
29. Higher-order tangent vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
30. Differentials on manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623
31. Higher-order differentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
32. Vector field calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
33. Differentiable groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
34. Differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669
35. Connections on differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . 681
36. Affine connections and covariant derivatives . . . . . . . . . . . . . . . . . . . . . . . . 699
37. Geodesics, convexity and Jacobi fields . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
38. Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
39. Pseudo-Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
40. Tensor calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
41. Geometry of the 2-sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
42. Examples of manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 767
43. Examples of fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775
44. Derivations, gradient operators, germs and jets . . . . . . . . . . . . . . . . . . . . . . 781
45. History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791
46. Exercise questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805
47. Exercise answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811
48. Notations and abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823
49. Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 833
50. Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841


[vii]
Table of contents
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Layers of structure of differential geometry . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Topic flow diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Chapter groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Objectives and motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Some minor details of presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 Differences from other differential geometry texts . . . . . . . . . . . . . . . . . . . . . 11
1.8 MSC 2000 subject classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.9 How to learn mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.10 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Part I. Preliminary topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Chapter 2. Philosophical considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.1 The bedrock of mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Logic, language and tribalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Ontology of mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Plato’s theory of ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Sets as parameters for socio-mathematical network communications . . . . . . . . . . . 31
2.6 Sets as parameters for classes of objects . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 Extraneous properties of set-constructions in definitions . . . . . . . . . . . . . . . . . 38
2.8 Axioms versus constructions for defining mathematical systems . . . . . . . . . . . . . 40
2.9 Some general remarks on mathematics and logic . . . . . . . . . . . . . . . . . . . . . 43
2.10 Dark sets and dark numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.11 Integers and infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.12 Real numbers and infinitesimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Chapter 3. Logic semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.1 Mathematical logic subject development . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2 General comments on logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3 Modelling, meta-modelling and recursive modelling . . . . . . . . . . . . . . . . . . . . 73
3.4 The universality (or otherwise) of modern logic . . . . . . . . . . . . . . . . . . . . . . 77
3.5 Logic in literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.6 Proposition-store versus world-view ontology for logic . . . . . . . . . . . . . . . . . . 85
3.7 A proposition-store ontology for logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.8 Undecidable propositions and incomplete information transfer . . . . . . . . . . . . . . 93
3.9 The semantics of truth and falsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.10 The semantics of logical negation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.11 Proof by contradiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.12 The moods of logical propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.13 Other remarks on the semantics of logic . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.14 Naive mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

Chapter 4. Logic methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

4.1 Concrete proposition domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.2 Logic operations in concrete proposition domains . . . . . . . . . . . . . . . . . . . . . 116
4.3 Logical operators and expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.4 Logical expression evaluation and logical argumentation . . . . . . . . . . . . . . . . . 124
4.5 Propositional calculus formalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.6 Deduction rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.7 An implication-based propositional calculus . . . . . . . . . . . . . . . . . . . . . . . . 130
4.8 Some propositional calculus theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.9 Meta-theorems and the “deduction theorem” . . . . . . . . . . . . . . . . . . . . . . . 135
4.10 Further theorems for the implication operator . . . . . . . . . . . . . . . . . . . . . . 139
4.11 Other logical operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.12 Parametrized families of propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.13 Logical quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.14 Predicate calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.15 Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
4.16 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Chapter 5. Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

5.1 Zermelo-Fraenkel set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.2 The ZF extension axiom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.3 The ZF empty set, pair, union and power set axioms . . . . . . . . . . . . . . . . . . . 165
5.4 The ZF replacement axiom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.5 The ZF regularity axiom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.6 The ZF infinity axiom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.7 Russell’s paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.8 ZF set theory definitions and notations . . . . . . . . . . . . . . . . . . . . . . . . . . 183
5.9 Axiom of choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.10 Axiom of countable choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
5.11 Zermelo set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
5.12 Bernays-Gödel set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
5.13 Basic properties of binary set unions and intersections . . . . . . . . . . . . . . . . . . 194
5.14 Basic properties of general set unions and intersections . . . . . . . . . . . . . . . . . . 196
5.15 Closure of set unions under arbitrary unions . . . . . . . . . . . . . . . . . . . . . . . 198
5.16 Specification tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Chapter 6. Relations and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

6.1 Ordered pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.2 Cartesian product of a pair of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.3 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
6.4 Equivalence relations and partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.5 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.6 Function set maps and inverse set maps . . . . . . . . . . . . . . . . . . . . . . . . . . 214
6.7 Composition of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.8 Families of sets and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.9 Cartesian products of families of sets and functions . . . . . . . . . . . . . . . . . . . . 219
6.10 Partial Cartesian products and identification spaces . . . . . . . . . . . . . . . . . . . 220
6.11 Partially defined functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
6.12 Notations for sets of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

Chapter 7. Order and integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

7.1 Ordered sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7.2 Ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.3 Natural numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.4 Unsigned integer arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.5 Signed integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.6 Extended integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
7.7 Cartesian products of sequences of sets and functions . . . . . . . . . . . . . . . . . . 237
7.8 Choice functions without the axiom of choice . . . . . . . . . . . . . . . . . . . . . . . 238
7.9 Indicator functions and delta functions . . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.10 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.11 Combinations and ordered selections . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
7.12 List spaces for general sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
7.13 Reformulation of logic in terms of axiomatic mathematics . . . . . . . . . . . . . . . . 246
Chapter 8. Rational and real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

8.1 Rational numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.2 Extended rational numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
8.3 Real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
8.4 Extended real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
8.5 Real number tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
8.6 Some useful basic real-valued functions . . . . . . . . . . . . . . . . . . . . . . . . . . 252
8.7 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Chapter 9. Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

9.1 Semigroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
9.2 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
9.3 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
9.4 Left transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
9.5 Right transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
9.6 Mixed transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
9.7 Figures and invariants of transformation groups . . . . . . . . . . . . . . . . . . . . . 276
9.8 Rings and fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
9.9 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
9.10 Associative algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
9.11 Lie algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
9.12 List space for sets with algebraic structure . . . . . . . . . . . . . . . . . . . . . . . . 285
Chapter 10. Linear algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

10.1 Linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
10.2 Linear subspaces and basis vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
10.3 Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
10.4 Eigenspaces of linear space endomorphisms . . . . . . . . . . . . . . . . . . . . . . . . 292
10.5 Linear functionals and dual spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
10.6 Direct sums of linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
10.7 Quotients of linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
10.8 Inner products and norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
10.9 Groups of linear transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10.10 Free linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10.11 Exact sequences of linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

Chapter 11. Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

11.1 Rectangular matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
11.2 Component matrices of linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
11.3 Square matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
11.4 Real square matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
11.5 Real symmetric matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
11.6 Real symmetric definite and semi-definite matrices . . . . . . . . . . . . . . . . . . . . 316
11.7 Matrix groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
Chapter 12. Affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

12.1 Affine spaces discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
12.2 Affine space definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
12.3 Affine transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
12.4 Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
Chapter 13. Tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

13.1 The meaning of tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
13.2 Multilinear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
13.3 Linear spaces of multilinear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
13.4 Symmetric and antisymmetric multilinear maps . . . . . . . . . . . . . . . . . . . . . 332
13.5 Tensor product metadefinition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
13.6 Tensor products of linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
13.7 Covariant tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
13.8 Mixed tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
13.9 General tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
13.10 Alternating tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
13.11 Alternating tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
13.12 Tensor products defined via free linear spaces . . . . . . . . . . . . . . . . . . . . . . . 346
13.13 Tensor products defined via lists of tensor monomials . . . . . . . . . . . . . . . . . . 347
Chapter 14. Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
14.1 Overview of topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
14.2 History and generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
14.3 Topological spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
14.4 Some simple topologies on finite sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
14.5 Interior and closure of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
14.6 Exterior and boundary of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
14.7 Limit points and isolated points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
14.8 Some simple topologies on countably infinite sets . . . . . . . . . . . . . . . . . . . . . 369
14.9 Generation of topologies from collections of sets . . . . . . . . . . . . . . . . . . . . . 372
14.10 The standard topology for the real numbers . . . . . . . . . . . . . . . . . . . . . . . 374
14.11 Open bases and open subbases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
14.12 Continuous functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
14.13 Homeomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379

Chapter 15. Topology classes and constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

15.1 Product and quotient topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
15.2 Separation classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
15.3 Separation and disconnection of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
15.4 Connectivity classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
15.5 Definition of continuity of functions using connectivity . . . . . . . . . . . . . . . . . . 392
15.6 Open bases, countability classes and separability . . . . . . . . . . . . . . . . . . . . . 394
15.7 Compactness classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
15.8 Topological properties of real number intervals . . . . . . . . . . . . . . . . . . . . . . 397
15.9 Topological dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
15.10 Set union topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
15.11 Topological identification spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
Chapter 16. Topological curves, paths and groups . . . . . . . . . . . . . . . . . . . . . . . . . . 401

16.1 Curve and path terminology and definition options . . . . . . . . . . . . . . . . . . . . 401
16.2 Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
16.3 Path-equivalence relations for curves . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
16.4 Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
16.5 Convex curvilinear interpolation in affine spaces . . . . . . . . . . . . . . . . . . . . . 410
16.6 Algebraic topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
16.7 Topological groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
16.8 Topological transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
16.9 Topological vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
16.10 Network topology and continuous paths in networks . . . . . . . . . . . . . . . . . . . 415
Chapter 17. Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

17.1 Distance functions and balls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
17.2 Set distance and set diameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
17.3 The topology induced by a metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
17.4 Continuous functions in metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
17.5 Rectifiable sets, curves and paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
Chapter 18. Differential calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

18.1 Infinitesimals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
18.2 Differentiation for one variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
18.3 Unidirectional differentiability of real-to-real functions . . . . . . . . . . . . . . . . . . 435
18.4 Higher-order derivatives for real-to-real functions . . . . . . . . . . . . . . . . . . . . . 436
18.5 Differentiation for several variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
18.6 Higher-order derivatives for several variables . . . . . . . . . . . . . . . . . . . . . . . 443
18.7 Some differentiability-based function spaces . . . . . . . . . . . . . . . . . . . . . . . . 444
18.8 Differentiation for abstract linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . 444
18.9 Hölder continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
Chapter 19. Diffeomorphisms in Euclidean space . . . . . . . . . . . . . . . . . . . . . . . . . . . 447

19.1 Tangent vectors and diffeomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
19.2 Differentials and diffeomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
19.3 Second-level tangent vectors and diffeomorphisms . . . . . . . . . . . . . . . . . . . . 454
19.4 Diffeomorphism pseudogroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
19.5 Second-order differential operators and diffeomorphisms . . . . . . . . . . . . . . . . . 459
19.6 Directionally differentiable homeomorphisms . . . . . . . . . . . . . . . . . . . . . . . 461

Chapter 20. Measure and integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463

20.1 Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
20.2 Lebesgue integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
20.3 Rectangular Stokes theorem in two dimensions . . . . . . . . . . . . . . . . . . . . . . 465
20.4 Rectangular Stokes theorem in three dimensions . . . . . . . . . . . . . . . . . . . . . 467
20.5 Differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
20.6 The exterior derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
20.7 Exterior differentiation using Lie derivatives . . . . . . . . . . . . . . . . . . . . . . . 473
20.8 Geometric measure theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
20.9 Stokes theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
20.10 Radon measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
20.11 Some integrability-based function spaces . . . . . . . . . . . . . . . . . . . . . . . . . 475
20.12 Logarithmic and exponential functions . . . . . . . . . . . . . . . . . . . . . . . . . . 475
20.13 Trigonometric functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
Chapter 21. Differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481

21.1 Ordinary differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
21.2 Systems of linear second-order ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
21.3 Boundary value problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
21.4 Initial value problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
21.5 Calculus of variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
21.6 ODEs for defining exponential and trigonometric functions . . . . . . . . . . . . . . . . 486
21.7 Taylor series and exponentials of matrices . . . . . . . . . . . . . . . . . . . . . . . . 487
Chapter 22. Non-topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

22.1 Non-topological fibrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
22.2 Parallelism for non-topological fibrations . . . . . . . . . . . . . . . . . . . . . . . . . 491
22.3 Non-topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
22.4 Finite transformation groups as fibre bundles . . . . . . . . . . . . . . . . . . . . . . . 493
Chapter 23. Topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
23.1 History, motivation and overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
23.2 Topological fibrations with intrinsic fibre spaces . . . . . . . . . . . . . . . . . . . . . 500
23.3 Topological fibrations and fibre atlases . . . . . . . . . . . . . . . . . . . . . . . . . . 502
23.4 Fibration identification spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
23.5 Structure groups discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
23.6 Topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
23.7 Fibre bundle homomorphisms, isomorphisms and products . . . . . . . . . . . . . . . . 512
23.8 Structure-preserving fibre set maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
23.9 Topological principal fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
23.10 Associated topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
23.11 Construction of associated topological fibre bundles . . . . . . . . . . . . . . . . . . . 524
23.12 Construction of associated topological fibre bundles via orbit spaces . . . . . . . . . . . 525
Chapter 24. Parallelism on topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . 529

24.1 Parallelism path classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
24.2 Pathwise parallelism on topological fibre bundles . . . . . . . . . . . . . . . . . . . . . 531
24.3 Associated parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
24.4 Other topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536

Part II. Differential geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
Chapter 25. Topological manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541

25.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
25.2 Euclidean and locally Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 543
25.3 Topological manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
25.4 Charts and atlases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
25.5 Topological manifold constructions, attributes and relations . . . . . . . . . . . . . . . 548
25.6 Topological identification spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
Chapter 26. Differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551

26.1 Overview of differentiable structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
26.2 Terminology and definition choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
26.3 Differentiable manifold atlases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
26.4 Some standard differentiable manifold atlases . . . . . . . . . . . . . . . . . . . . . . . 556
26.5 Some basic definitions for differentiable manifolds . . . . . . . . . . . . . . . . . . . . 556
26.6 Differentiable real-valued functions on differentiable manifolds . . . . . . . . . . . . . . 558
26.7 Differentiable curves and paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
26.8 Differentiable families of differentiable transformations . . . . . . . . . . . . . . . . . . 561
26.9 Differentiable maps between differentiable manifolds . . . . . . . . . . . . . . . . . . . 562
26.10 Analytic manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
26.11 Unidirectionally differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . 564
26.12 Lipschitz manifolds and rectifiable curves . . . . . . . . . . . . . . . . . . . . . . . . . 565
26.13 Differentiable fibrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567
26.14 Tangent space building principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
Chapter 27. Tangent bundles on differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . 571

27.1 Styles of representation of tangent vectors . . . . . . . . . . . . . . . . . . . . . . . . 573
27.2 Tangent bundle metadefinition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
27.3 Tangent vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
27.4 Computational tangent vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
27.5 Tangent operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
27.6 Tagged tangent operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
27.7 Pointwise tangent spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
27.8 Tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587
27.9 Tangent operator bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
27.10 The tangent bundle of a tangent bundle . . . . . . . . . . . . . . . . . . . . . . . . . 590
27.11 Horizontal components and drop functions . . . . . . . . . . . . . . . . . . . . . . . . 594
27.12 Tangent frames and coordinate basis vectors . . . . . . . . . . . . . . . . . . . . . . . 596
27.13 Tangent space constructions, attributes and relations . . . . . . . . . . . . . . . . . . . 598
27.14 Unidirectional tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
27.15 Distributions as representations of tangent bundles . . . . . . . . . . . . . . . . . . . . 599
27.16 Tangent bundles on infinite-dimensional manifolds . . . . . . . . . . . . . . . . . . . . 600

Chapter 28. Tensor bundles and tensor fields on manifolds . . . . . . . . . . . . . . . . . . . . . . 601

28.1 Contravariant tensors and tensor spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 602
28.2 Cotangent vectors and cotangent spaces . . . . . . . . . . . . . . . . . . . . . . . . . 602
28.3 Covariant and mixed tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
28.4 Double tangent spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
28.5 Vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
28.6 Tangent operator fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
28.7 Tensor fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
28.8 Vector fields and tensor fields along curves . . . . . . . . . . . . . . . . . . . . . . . . 608
28.9 Differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
Chapter 29. Higher-order tangent vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611

29.1 Higher-order tangent operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
29.2 Tensorization coefficients for second-order tangent operators . . . . . . . . . . . . . . . 615
29.3 Higher-order tangent vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
29.4 Higher-order tangent spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620
29.5 Drop functions for second-level tangent vectors . . . . . . . . . . . . . . . . . . . . . . 620
29.6 Elliptic second-order operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
29.7 Higher-order vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
29.8 Higher-order vector fields for families of curves . . . . . . . . . . . . . . . . . . . . . . 622
Chapter 30. Differentials on manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623

30.1 Pointwise differentials versus induced maps . . . . . . . . . . . . . . . . . . . . . . . . 623
30.2 The differential of a real-valued function . . . . . . . . . . . . . . . . . . . . . . . . . 625
30.3 The differential of a differentiable map . . . . . . . . . . . . . . . . . . . . . . . . . . 627
30.4 The differential of a curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
30.5 One-parameter transformation families and vector fields . . . . . . . . . . . . . . . . . 633
Chapter 31. Higher-order differentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635

31.1 Higher-order differentials of a real-valued function . . . . . . . . . . . . . . . . . . . . 635
31.2 Higher-order differentials of a differentiable map . . . . . . . . . . . . . . . . . . . . . 635
31.3 Higher-order differentials of a curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
31.4 Higher-order differentials of curve families . . . . . . . . . . . . . . . . . . . . . . . . 638
31.5 Differentials of real-valued functions for higher-order operators . . . . . . . . . . . . . . 639
31.6 Hessian operators at critical points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
31.7 Differentials of differentiable maps for higher-order operators . . . . . . . . . . . . . . . 640
31.8 Differentials of curves for higher-order operators . . . . . . . . . . . . . . . . . . . . . 642
Chapter 32. Vector field calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643

32.1 Naive vector field derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
32.2 The Poisson bracket . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
32.3 Vector field derivatives for curve families . . . . . . . . . . . . . . . . . . . . . . . . . 647
32.4 Lie derivatives of vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
32.5 Lie derivatives of tensor fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
32.6 The exterior derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652

Chapter 33. Differentiable groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653

33.1 Lie groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
33.2 Hilbert’s fifth problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
33.3 Left invariant vector fields on Lie groups . . . . . . . . . . . . . . . . . . . . . . . . . 656
33.4 Right invariant vector fields on Lie groups . . . . . . . . . . . . . . . . . . . . . . . . 660
33.5 The Lie algebra of a Lie group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
33.6 Diffeomorphism groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663
33.7 Lie transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664
33.8 Infinitesimal transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666
Chapter 34. Differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669

34.1 Differentiable fibre bundles with non-Lie structure group . . . . . . . . . . . . . . . . . 670
34.2 Differentiable fibre bundles with Lie structure group . . . . . . . . . . . . . . . . . . . 670
34.3 Vector fields on differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . 671
34.4 Differentiable principal fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . 673
34.5 Vector fields on differentiable principal fibre bundles . . . . . . . . . . . . . . . . . . . 674
34.6 Associated differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . 674
34.7 Vector bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677
34.8 Tangent bundles of differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . 677
34.9 Tangent frame bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
Chapter 35. Connections on differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . 681

35.1 Naming, history and choice of definitions . . . . . . . . . . . . . . . . . . . . . . . . . 682
35.2 Differentiation of parallel transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684
35.3 Horizontal lift functions for ordinary fibre bundles . . . . . . . . . . . . . . . . . . . . 686
35.4 Curvature of connections on ordinary fibre bundles . . . . . . . . . . . . . . . . . . . . 689
35.5 Horizontal lift functions for principal fibre bundles . . . . . . . . . . . . . . . . . . . . 690
35.6 Connection forms for PFB connections . . . . . . . . . . . . . . . . . . . . . . . . . . 693
35.7 Covariant derivatives for general connections . . . . . . . . . . . . . . . . . . . . . . . 695
35.8 Parallel displacement for PFB connections . . . . . . . . . . . . . . . . . . . . . . . . 695
35.9 Alternative definitions for general connections . . . . . . . . . . . . . . . . . . . . . . 696
Chapter 36. Affine connections and covariant derivatives . . . . . . . . . . . . . . . . . . . . . . . 699

36.1 Concepts, history and terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
36.2 Overview of affine connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
36.3 Motivation for defining connections on manifolds . . . . . . . . . . . . . . . . . . . . . 702
36.4 Affine connections on tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . 704
36.5 Covariant derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
36.6 Hessian operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
36.7 Elliptic second-order operator fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
36.8 Curvature and torsion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
36.9 Affine connections on principal fibre bundles . . . . . . . . . . . . . . . . . . . . . . . 711
36.10 Coefficients of affine connections on principal fibre bundles . . . . . . . . . . . . . . . . 711
36.11 Connections for Lagrangian mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . 713

Chapter 37. Geodesics, convexity and Jacobi fields . . . . . . . . . . . . . . . . . . . . . . . . . . 715

37.1 Covariant derivatives of vector fields along curves . . . . . . . . . . . . . . . . . . . . 715
37.2 Geodesic curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 716
37.3 Jacobi fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717
37.4 Convex sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717
37.5 Convex combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
37.6 Convex curvilinear interpolation in affine manifolds . . . . . . . . . . . . . . . . . . . 718
37.7 Families of geodesic interpolations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
37.8 Exponential maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
37.9 Convex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
Chapter 38. Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723

38.1 Historical notes on Riemannian geometry . . . . . . . . . . . . . . . . . . . . . . . . . 724
38.2 Overview of Riemannian geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
38.3 The Riemannian metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
38.4 The point-to-point distance function . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
38.5 The Levi-Civita connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
38.6 Curvature tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
38.7 Differential operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
38.8 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
38.9 Embedded Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
38.10 Information geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
Chapter 39. Pseudo-Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735

39.1 Overview of pseudo-Riemannian geometry . . . . . . . . . . . . . . . . . . . . . . . . 735
39.2 The pseudo-Riemannian metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
39.3 General relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
39.4 Singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
39.5 Global solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738
Chapter 40. Tensor calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
40.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
40.2 Differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
40.3 Manifolds with affine connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
40.4 Equations of geodesic variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
40.5 Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
40.6 Pseudo-Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
40.7 Submanifolds of Euclidean space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745

Chapter 41. Geometry of the 2-sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747

41.1 Terrestrial coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
41.2 Tensor calculus in terrestrial coordinates . . . . . . . . . . . . . . . . . . . . . . . . . 749
41.3 Metric tensor calculation from the distance function . . . . . . . . . . . . . . . . . . . 751
41.4 The principal fibre bundle in terrestrial coordinates . . . . . . . . . . . . . . . . . . . 752
41.5 The Riemannian connection in terrestrial coordinates . . . . . . . . . . . . . . . . . . 753
41.6 Coordinates for polar exponential maps . . . . . . . . . . . . . . . . . . . . . . . . . . 755
41.7 The global tangent bundle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757
41.8 Isometries of S² . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758
41.9 Geodesic curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760
41.10 Affinely parametrized geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 762
41.11 Convex sets and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
41.12 Normal coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
41.13 Jacobi fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
41.14 Circles on the sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
41.15 Calculation of the “hours of daylight” . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
41.16 Some standard map projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
41.17 Projection of a sphere onto a plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
Chapter 42. Examples of manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 767

42.1 Topological space examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 767
42.2 Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768
42.3 Non-Hausdorff locally Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 769
42.4 Hölder-continuous manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769
42.5 Torus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 771
42.6 General sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 771
42.7 Conical coordinates for Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . . . 772
42.8 Hyperboloid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772
42.9 Tractrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772
42.10 Analysis on Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773
Chapter 43. Examples of fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775
43.1 Euclidean fibre bundles on Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . 775
43.2 The Möbius strip as a fibre bundle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776
43.3 The Möbius strip fibre bundle on S¹ . . . . . . . . . . . . . . . . . . . . . . . . . . . 778
Chapter 44. Derivations, gradient operators, germs and jets . . . . . . . . . . . . . . . . . . . . . 781

44.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782
44.2 Some elementary examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782
44.3 Further elementary examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785
44.4 Spaces of differentiable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785
44.5 Spaces of smooth functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787
44.6 The space of analytic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
44.7 The Hölder spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
44.8 Further topics on derivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
44.9 Germs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
44.10 Jets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790
Chapter 45. History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791

45.1 Chronology of mathematicians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791
45.2 Origins of words and notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796
45.3 Etymology of affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
45.4 Logical language in ancient literature . . . . . . . . . . . . . . . . . . . . . . . . . . . 801

Chapter 46. Exercise questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805

46.1 Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805
46.2 Sets, relations and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 806
46.3 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
46.4 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
46.5 Linear algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
46.6 Tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808
46.7 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808
Chapter 47. Exercise answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811

47.1 Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811
47.2 Sets, relations and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 813
47.3 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
47.4 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
47.5 Linear algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818
47.6 Tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818
47.7 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 819
Chapter 48. Notations and abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823

48.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823
48.2 Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831
Chapter 49. Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 833

49.1 Differential geometry introductory texts . . . . . . . . . . . . . . . . . . . . . . . . . 833
49.2 Other differential geometry references . . . . . . . . . . . . . . . . . . . . . . . . . . . 835
49.3 Other mathematics references . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 836
49.4 Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
49.5 Logic and set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
49.6 Anthropology and linguistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
49.7 Philosophy and ancient history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
49.8 History of mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
49.9 Other references . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
49.10 Comments on other people’s books . . . . . . . . . . . . . . . . . . . . . . . . . . . . 840
Chapter 50. Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841

Chapter 1
Introduction
1.1 Layers of structure of differential geometry . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Topic flow diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Chapter groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Objectives and motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Some minor details of presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 Differences from other differential geometry texts . . . . . . . . . . . . . . . . . . . . . . . 11
1.8 MSC 2000 subject classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.9 How to learn mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.10 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

This book is not “Differential Geometry Made Easy”. Differential geometry is not easy. If you think it’s easy,
you haven’t understood it! Attempts to make it seem easy give the reader only a superficial understanding.
The best that can be hoped for realistically is a systematic, self-consistent presentation of topics so that the
ideas can be assimilated by the reader without any more pain and confusion than absolutely necessary. This
book aims to be “Differential Geometry Made Crystal Clear”, but enlightenment requires effort.

This is a definitions book, not a theorems book. Definitions introduce you to things and tell you their names.
Theorems tell you properties and relations of things. Most mathematical texts give definitions so that they
can present their theorems. In this book, theorems are given only when required for the presentation of
definitions. If the reader can understand the definitions in the DG literature, that is a good starting point
for understanding the theorems. To be meaningful, many definitions do require existence, uniqueness or
regularity proofs. So some basic theorems are unavoidable. Also a few theorems are given here to motivate
definitions or to clarify the relations between them.

Before studying differential geometry, the reader should have some prior familiarity with set theory, group
theory, linear algebra, topology, measure theory and partial differential equations. These prerequisites are
presented in preliminary chapters in this book, but it is preferable to have studied these topics beforehand.

Differential geometry is the geometry of manifolds. Coordinate charts are the principal defining characteristic
of differential geometry, not an embarrassing nuisance. Therefore coordinate charts are constantly and
unashamedly in the foreground in this book. Some authors don’t like coordinates. They can be hidden but
not removed. (The coordinates, that is.)

The central concepts of differential geometry are coordinate charts, tangent vectors, the exterior derivative,
pathwise parallelism, curvature and metric tensors. The aim of this book is to give the reader a confident
understanding of these concepts and the relations between them. The presentation strategy is to stratify all
DG concepts according to structural layers. This book will hopefully fill the role of an illustrated dictionary.
It does not try to be a comprehensive encyclopedia. It presents only the dramatis personae, not the complete
works of Shakespeare.

The author may use this book as a resource for the creation of other books. When this full version has been
released, the author may write a half-length version which omits the less popular technicalities. The shorter
version may be titled “Differential Geometry Made Easy”. It might sell a lot of copies!
1.1. Layers of structure of differential geometry

The following table summarizes the progressive build-up of the layers of structure of differential geometry
in the chapters of this book.
layer                   main concept   structure                                        chapters
0  point-set layer      points         set of points with no topological structure      5–13
1  topological layer    connectivity   topological space: open neighbourhoods           14–17, 23–25
2  differential layer   vectors        atlas of differentiable charts; tangent bundle   26–34
3  connection layer     parallelism    affine connection on the tangent bundle          35–37
4  metric layer         distance       Riemannian metric tensor field                   38–39
The following table shows which levels of structure are required by some important concepts. For example,
geodesic curves are well defined if an affine connection or Riemannian metric is specified, but not if you only
have a differentiable structure.
structural layers where concepts are meaningful
layer  concepts                    point set  topology  differentiable  affine      Riemannian
                                                        structure       connection  metric
0      cardinality of sets         yes        yes       yes             yes         yes
1      boundaries of sets          -          yes       yes             yes         yes
       connectivity of sets        -          yes       yes             yes         yes
       continuity of functions     -          yes       yes             yes         yes
2      tangent vectors             -          -         yes             yes         yes
       differentials of functions  -          -         yes             yes         yes
       tensor bundles              -          -         yes             yes         yes
       differential forms          -          -         yes             yes         yes
       vector field algebra        -          -         yes             yes         yes
       Lie derivatives             -          -         yes             yes         yes
       exterior derivative         -          -         yes             yes         yes
       Stokes theorem              -          -         yes             yes         yes
3      parallel transport          -          -         -               yes         yes
       covariant derivatives       -          -         -               yes         yes
       geodesic curves             -          -         -               yes         yes
       geodesic coordinates        -          -         -               yes         yes
       convex sets and functions   -          -         -               yes         yes
       Riemann curvature           -          -         -               yes         yes
       Ricci curvature             -          -         -               yes         yes
4      angle between vectors       -          -         -               -           yes
       length of vector            -          -         -               -           yes
       distance between points     -          -         -               -           yes
       normal coordinates          -          -         -               -           yes
       sectional curvature         -          -         -               -           yes
       scalar curvature            -          -         -               -           yes
       Einstein curvature tensor   -          -         -               -           yes
       Laplace-Beltrami operator   -          -         -               -           yes
The specification of any structural layer uniquely determines the lower layer structures but not the higher
layer structures. When only a point set is specified, there are many possible choices for the topology. When
only the topology is specified, there are many choices for the differentiable structure. On a given differentiable
structure, many choices of connection are possible. But a Riemannian metric uniquely determines the affine
connection, which uniquely determines the differentiable structure, and so forth.
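As an illustration of how a metric determines a connection, here is a sketch in conventional index notation. The notation is an assumption at this point, since it is not introduced until later chapters. The connection determined by a Riemannian metric is usually taken to be the Levi-Civita connection, whose coefficients in a coordinate chart are computed from the metric components g_{ij} as follows.

```latex
\Gamma^k_{ij} = \frac{1}{2}\, g^{kl}
  \left( \partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij} \right)
```

Here g^{kl} denotes the inverse matrix of g_{ij}, and summation over the repeated index l is implied. The determination runs in one direction only: many different metrics (for example, all constant positive multiples of g) yield the same coefficients.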
The higher layers are optional. You only need to provide the layers of structure which are required by the
concepts you wish to use. More structure gives you more concepts. It is noteworthy that so many concepts
are well defined in the absence of a metric, and even in the absence of a connection.

1.2. Topic flow diagram

Chapters 2 to 24 present preliminary topics for reference in later chapters. These topics are in the early
chapters so that later chapters are not cluttered by the interpolation of prerequisites. The preliminary topics
include logic, sets, functions, order and numbers (Chapters 3–8), algebra (Chapter 9), linear algebra
(Chapters 10–12), tensor algebra (Chapter 13), topology (Chapters 14–17), calculus (Chapters 18–21)
and fibre bundles (Chapters 22–24).
Differential geometry begins with topological manifolds (Chapter 25), followed by differentiable manifolds
(Chapter 26), connections (Chapters 35–37) and Riemannian metric spaces (Chapters 38–39).
This book tries to disentangle which concepts and theorems belong to four levels of structure: topological
structure (Chapter 25), differentiable structure (Chapters 26–32), affine connection structure
(Chapters 35–37), and Riemannian metric structure (Chapters 38–39). Lie (differentiable) groups (Chapter 33)
and differentiable fibre bundles (Chapter 34) may be considered as preliminary topics, but like tensor
algebra (Chapter 13) and topological fibre bundles (Chapters 23–24), they may be regarded as core topics
of differential geometry.
Later chapters deal with tensor calculus (Chapter 40), the 2-sphere (Chapter 41), example geometries and
fibre bundles (Chapters 42–43), and alternative tangent space definitions (Chapter 44).
The topic flow diagram shows the progressive build-up of algebraic structure from “sets and numbers” to
“tensor algebra”. Then “topology” and “calculus” are combined with “tensor algebra” to define
“differentiable manifolds”. Adding “topological fibre bundles” to this yields “differentiable fibre bundles”
on which “affine connections” are defined.
Adding a metric or pseudo-metric leads to Riemannian or pseudo-Riemannian manifolds. Algebraic and
analytical structure are thus developed in two intermingled streams. This modern approach to differential
geometry is expressed in the rarified language of fibre bundles. Affine connections may be defined directly
on differentiable manifolds, bypassing fibre bundles as suggested by the dashed arrow, like in the olden days.
[Figure: topic flow diagram. Nodes: sets and numbers; algebra; topology; linear spaces; calculus; tensor
algebra; topological manifolds; differentiable manifolds; topological fibre bundles; Lie groups;
differentiable fibre bundles; affine connections; Riemannian manifolds; pseudo-Riemannian manifolds.]
1.3. Chapter groups
1.3.1 Remark: The page counts for the chapter groups are as follows.
pages  chapter group                                 chapters
   16  introduction                                  1
  524  Part I: preliminary topics                    2–24
   94      philosophy, semantics                     2–3
  148      logic, set theory, numbers                4–8
   90      algebra                                   9–13
   80      topology                                  14–17
   60      calculus                                  18–21
   52      topological fibre bundles                 22–24
  250  Part II: differential geometry                25–44
   10      topological manifolds                     25
  102      differentiable manifolds                  26–32
   28      Lie groups, differentiable fibre bundles  33–34
   42      connections                               35–37
   24      Riemannian metric, tensor calculus        38–40
   34      examples                                  41–43
   10      derivations                               44
   50  appendices                                    45–49
   45  index                                         50
1.3.2 Remark: The chapters of this book fall more or less naturally into the following groups.
Chapter 1 is a general introduction. This may be safely ignored apart from Section 1.1.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Part I. Preliminary topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Chapters 2 and 3 discuss philosophy and semantics. These are the most annoying chapters of the book.
2. Philosophical considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3. Logic semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Chapters 4 to 8 present logic, sets, relations, functions, order and numbers.
4. Logic methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5. Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6. Relations and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
7. Order and integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8. Rational and real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Chapters 9 to 13 introduce algebra, especially linear and multilinear (tensor) algebra.
9. Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
10. Linear algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
11. Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
12. Affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
13. Tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
Chapters 14 to 17 introduce topology. Metric spaces are a particular kind of topological space.
14. Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
15. Topology classes and constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
16. Topological curves, paths and groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
17. Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
Chapters 18 to 21 introduce analytical topics, namely the differential calculus and integral calculus. This
provides a break from topology before returning to it in the fibre bundle chapters.
18. Differential calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
19. Diffeomorphisms in Euclidean space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
20. Measure and integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
21. Differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
Chapters 22 to 24 introduce fibre bundles which have topology but no differentiable structure.
22. Non-topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
23. Topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
24. Parallelism on topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . 529
Part II. Differential geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
Chapter 25 introduces topological manifolds. This is layer 1 in the five-layer DG structure model.
25. Topological manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
Chapters 26 to 32 add differentiable structure (i.e. charts) to manifolds. This commences layer 2.
26. Differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
27. Tangent bundles on differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . 571
28. Tensor bundles and tensor fields on manifolds . . . . . . . . . . . . . . . . . . . . . . . 601
29. Higher-order tangent vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
30. Differentials on manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623
31. Higher-order differentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
32. Vector field calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
Chapters 33 and 34 introduce Lie (i.e. differentiable) groups. Lie groups are required for the formal definition
of differentiable fibre bundles, which are required for defining general connections.
33. Differentiable groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
34. Differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669
Chapters 35 to 37 introduce connections (i.e. differentiable parallelism) on manifolds. This is layer 3. Con-
nections are required for concepts such as covariant derivatives, geodesics, convexity and curvature.
35. Connections on differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . 681
36. Affine connections and covariant derivatives . . . . . . . . . . . . . . . . . . . . . . . . 699
37. Geodesics, convexity and Jacobi fields . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
Chapters 38 to 40 introduce Riemannian and pseudo-Riemannian metrics. This is layer 4. Such metrics are
required for general relativity. Tensor calculus is a notational system for practical DG calculations.
38. Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
39. Pseudo-Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
40. Tensor calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
Chapters 41 to 43 present numerous examples of manifolds and fibre bundles. A particularly useful and
familiar manifold is the 2-sphere S².
41. Geometry of the 2-sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
42. Examples of manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 767
43. Examples of fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775
Chapter 44 is a not very useful set of notes on derivations, germs and jets, which provide some obscure
representations of tangent spaces.
44. Derivations, gradient operators, germs and jets . . . . . . . . . . . . . . . . . . . . . . 781
Chapters 45 to 50 are appendices to the main part of the book.
45. History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791
46. Exercise questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805
47. Exercise answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811
48. Notations and abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823
49. Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 833
50. Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841
1.4. Objectives and motivations

1.4.1 Remark: Between 1986 and 1991, the author was trying to generalize some geometric properties of
solutions of second-order boundary and initial value problems from flat space to differentiable manifolds. (In
particular, the author needed estimates for parallel transport of second-order partial differential operators
along geodesics in terms of bounds on curvature.)
More importantly, it seemed a terrible shame that mathematicians had developed such a deep and compre-
hensive corpus of results for partial differential equations in flat space, particularly for boundary and initial
value problems, whereas according to cosmologists, the universe is no longer flat, in which case a vast swathe
of the PDE corpus must surely be null and void, being inapplicable to curved space. Many of the techniques
of PDE theory are in fact very much dependent on the special properties of Euclidean space.
The five-layer structural organization of differential geometry in Section 1.1 is a direct consequence of the
desire to minimize the requirements placed by DG on PDE so that the maximum extent of generalization
to curved spaces will be facilitated. The requirements minimization objective is considered in choosing all
definitions in this book.
For the task of converting PDE concepts and techniques to curved space, the author could not find differential
geometry texts which met the high standards of systematic development and logical rigour of the best analysis
texts. The more he read, the more confusing the subject became because of the multitude of contradictory
definitions and formalisms.
The origins and motivations of fundamental DG concepts are largely submerged under a century of continuous
redefinition and rearrangement. The differential geometry literature is plagued by a plethora of mutually
incomprehensible formalisms and notations. Writing this book has been like creating a map of the world
from a hundred regional maps which use different coordinate systems and languages for locating and naming
geographical features.
1.4.2 Remark: Long after initially writing the comments in Remark 1.4.1 regarding a “multitude of contra-
dictory definitions and formalisms” and a “plethora of mutually incomprehensible formalisms and notations”,
the author acquired a copy of Michael Spivak’s 5-volume DG book. The first two paragraphs of the preface
to the 1970 edition ([43], page ix) contain eerily similar comments.
[. . . ] no one denies that modern definitions are clear, elegant, and precise; it’s just that it’s impossible
to comprehend how any one ever thought of them. And even after one does master a modern
treatment of differential geometry, other modern treatments often appear simply to be about totally
different subjects.
Since 1970, it seems little has changed. If anything, the literature is now even more confusing.
1.4.3 Remark: The initial strategy of this book was to stitch together a dozen of the differential geometry
articles in the Mathematical Society of Japan’s excellent Encyclopedic dictionary of mathematics [34] into a
small coherent presentation in a logical order with uniform notation, together with prerequisites and further
details from other texts. The original target length of about 50 pages has unfortunately been exceeded!
The recursive catchment area of differential geometry prerequisites is a surprisingly large proportion of
undergraduate mathematics. The finished product will hopefully achieve a reasonable coherence and harmony
between the various perspectives of the subject without becoming encyclopedic.
1.4.4 Remark: The most difficult aspect of differential geometry is the lack of explanation for why defini-
tions are so and not otherwise. There is perhaps an analogy here with the contrast between ancient Egyptian
mathematics and classical Greek mathematics. It was said that Thales brought back mathematics from a
visit to Egypt in around 600 BC. (See Ball [188], pages 14–19: “Probably it was as a merchant that Thales
first went to Egypt, but during his leisure there he studied astronomy and geometry.” There had been
very substantial sea trade in the Eastern Mediterranean for centuries. So such contacts were inevitable.)
The Egyptian priest class never gave reasons for why their theorems were true. They simply observed, for
example, that a 3/4/5 triangle has a right angle. (See Bell [190], page 40.) The Egyptians just said: “This is
how you do it.” The Greeks, by contrast, insisted on trying to find proofs, and by finding proofs, the Greeks
were able to enormously expand the body of theorems. Classical Greek mathematics was characterized by
the excitement of discovery whereas Egyptian mathematics was static. (It is just possible that the severe
limitations of the Egyptian writing system had something to do with this. The Egyptian scribe class had
to learn all the symbols by rote, although their writing was partly phonetically based. The Greek writing
system was the first fully phonetic system in history, which resulted in high general literacy because only
a few simple principles were required to pronounce every word in the language. Axiomatic and deductive
thinking were an integral part of Greek culture.)
While some differential geometry books do try very hard to motivate the choices of definitions, there are
many definitions for which it is very difficult to find any explanation of how the choice is made. The modern
mathematician’s instinct is always to modify and extend definitions to see if something useful arises. In this
book, an attempt is made to determine whether many of the apparently arbitrary choices and
restrictions in definitions are really necessary. If it turns out that dropping a requirement or extending the
domain of an argument results in a useless or meaningless definition, this helps to clarify the meaning. But
sometimes the usual way of doing things turns out to be an obstacle in the way of further development of
the subject. Therefore this book tries to avoid simply saying: “This is how you do it.”
1.4.5 Remark: A vegetarian cook will generally be better at cooking vegetables than the meat-centric
cook who regards vegetables as a necessary but uninteresting background. In the same way, a definition-
centric book will generally explain definitions better than a theorem-centric book which regards definitions
as a necessary but uninteresting background.
1.4.6 Remark: The principal goal of mathematics teaching is to liberate the student from the teacher. The
teacher who explains the motivation and justification of every assumption and assertion assists the student to
become independent of the teacher. The teacher who says “this is how you do it” without explanation makes
the student a prisoner of dogma, unable to adapt their knowledge to circumstances and make new discoveries.
Dogmatically taught students become dogmatic teachers and practitioners because they do not know the
reasons for how things are done. The best teachers are happiest when their students correctly challenge
assumptions and assertions and have the courage and capability to develop valid extensions, generalizations
and alternatives.
1.4.7 Remark: For every mathematical object, one may ask the following questions.
(1) How do I perform calculations with this object?
(2) In which space does this object “live”?
(3) What is the “essential nature” of this object?
Students who need mathematics only as a tool for other subjects may be taught only the answers to ques-
tion (1). This often leads to incorrect or meaningless calculations because of a lack of knowledge of which
kind of space each object “lives” in. Different spaces have different rules. Question (3) is important to help
guide one’s intuition to form conjectures and discover proofs and refutations. One must have some kind of
mental concept of every class of object. This book tries to give answers to all three of the above questions.
Human beings are half animal, half robot. It is important to satisfy the animal half’s need for motivation
and meaning as well as the robot half’s need to do calculations.
If one cannot determine the class of object to which a symbolic expression refers, it may be that the
expression is a “pseudo-notation”. That is, it may be a meaningless expression. Such expressions are
frequently encountered in differential geometry (and in theoretical physics and applied mathematics). This
book tries to avoid pseudo-notations. They should be replaced with well-defined, meaningful expressions.
An effective tactic for making sense of difficult ideas in the DG literature is to determine which set each
object in each mathematical expression belongs to. It sometimes happens that a difficult problem is made
easy, trivial or vacuous by carefully identifying the meanings of all symbolic expressions and sub-expressions
in the statement of the problem. Plodding correctness is better than brilliant guesswork. (Best of all, of
course, is brilliant guesswork combined with plodding correctness.)
1.4.8 Remark: To suppose that mathematics is the art of calculation is like supposing that architecture
is the art of bricklaying, or that literature is the art of typing. Calculation in mathematics is necessary,
but it is an almost mechanical procedure which can be automated by machines. The mathematician does
much more than a computerized mathematics package. The mathematician formulates problems, reduces
them to a form which a computer can handle, checks that the results correspond to the original problem,
and interprets and applies the answers. Therefore the mathematician must not be too concerned with mere
computation. That is the task of the robot. The task of the human is to understand mathematics.
1.4.9 Remark: Mathematical research generally proceeds as two parallel activities. By intuition, the
mathematician forms conjectures. Intuition is a forward-looking activity. But every conjecture needs proof
by rigorous deduction. Proof is a backward-looking activity which tries to “join up the dots” between
assumptions and assertions. (In a sense, one goes into debt when one states a theorem, and the debt is only
paid off when the theorem is proved.) Both intuition and deduction are essential in mathematical research.
Therefore a good book should assist the reader in both areas. The strict methods of proof must be presented,
but a strong intuitive insight must also be communicated.
This book tries to be thoroughly rigorous, while also trying to make as many concepts as possible intuitively
clear. Therefore many of the fundamental concepts are discussed at much greater length than is usual,
and many diagrams are included to assist the intuition. After all, the ability to validate or invalidate a
conjecture is of little value if one’s conjectures are formed at random with little idea of what to expect.
Often the discussions in this book state the completely obvious. But there is method in this madness.
By making obvious observations explicit, one may question them, and sometimes a statement which is
“obviously true” turns out later to be either not so obvious at all, or sometimes partly or completely false.
It is normal in lectures to spell out many obvious consequences of definitions and theorems. Publishers,
and their reviewers, generally frown upon the inclusion of obvious observations in books. But if one is
self-publishing, this constraint on the communication of “the obvious” is removed.
1.4.10 Remark: Most DG texts assume a high degree of smoothness (e.g. C^∞) for functions and manifolds
to make their work easier. This hinders applications to functions and manifolds (e.g. the graphs of functions)
which arise as solutions of initial and boundary value problems for physical models. Analysts often have to
work very hard indeed to prove that a problem has solutions with even limited regularity such as C^{1,1} or C^2.
So the blanket assumption that everything is C^∞ unduly restricts the applicability of DG results.
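A standard example, not taken from this book, shows that these regularity classes really are different. Assuming the usual convention that C^{1,1} denotes functions whose first derivatives are Lipschitz continuous, the function below is C^{1,1} on the real line but not C^2.

```latex
f(x) = x\,|x|, \qquad f'(x) = 2\,|x|, \qquad
|f'(x) - f'(y)| \le 2\,|x - y| \quad \text{for all } x, y \in \mathbb{R}.
```

The derivative f' is Lipschitz continuous but not differentiable at x = 0, so a result proved only under a C^2 or smoothness assumption says nothing directly about f.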
The analyst seeking to apply DG theorems must first investigate whether C^∞-based results can survive with
weaker regularity assumptions. At least the cautious, responsible analyst who does not wish to recklessly cut
corners should verify that any C^∞ assumptions can be removed without endangering the validity of theorems
before using them. Therefore this book tries to keep regularity assumptions in definitions and theorems as
weak as possible to save analysts the time and expense of verifying applicability in weak regularity contexts.
1.4.11 Remark: Just as the general public of the 18th and 19th centuries had a desire to understand the
new theories of gravity, the spherical Earth and the solar system, so in our own time the public have the same
desire to understand the new theories of gravity, curved space-time and the universe. Every effort should
be made to remove obstacles lying in the way of the non-specialist who wants to better understand the big
ideas of our time. There’s no point being in the 21st century if one’s understanding of the universe remains
stuck in the 19th. It is a worthy life aim to understand and appreciate the best theories and discoveries of
one’s own era; even more worthy is to contribute to them.
It is possible that the current orthodoxy in cosmology may require an overhaul some time soon. If Einstein’s
equations need to be put on the operating table for major surgery, it will be important to have a deep under-
standing of the mathematical machinery underlying those equations. This justifies the detailed investigation
of the fundamentals of differential geometry in this book.
1.4.12 Remark: Since the scope of this book is quite extensive, from the fundamental concepts of logic to
the convoluted structures of differential geometry, sorting the topics into define-before-use order has not been
easy. Specialists in each mathematical topic borrow what they need from other specialists in a non-acyclic
manner, possibly unaware of the circularity of their definitions. So define-before-use ordering is difficult to
achieve.
A large part of the author’s motivation for seeking a “bedrock for mathematics” is (or was) the desire to
arrange all concepts in define-before-use order. This is still an objective, but it can be achieved only in a
best-effort sense.
1.4.13 Remark: The diversity and divergence of definitions in mathematics over time are reminiscent
of the diversity and divergence of natural language families during the last few thousand years. (The
divergence process for natural languages is well described, both in overview and in detail, by Ostler [176].)
The reunification of mathematical definitions is difficult to achieve, but the objective is worthwhile.
1.4.14 Remark: Differential geometry has developed multiple languages and dialects for expressing its
extensive network of concepts during the last 150 years. Familiarity with the folklore of each school of
thought in this subject sometimes requires a lengthy apprenticeship to learn the language and methods. It
is the author’s belief that a competent mathematician should be able to learn differential geometry from
books alone without the need for initiation into the mysteries by the knowledgeable ones.
The author is reminded of the foreword to the counterpoint tutorial “Gradus ad Parnassum” ([209], page 17)
by Johann Joseph Fux, published in 1725:
I do not believe that I can call back composers from the unrestrained insanity of their writing to
normal standards. Let each follow his own counsel. My object is to help young persons who want
to learn. I knew and still know many who have fine talents and are most anxious to study; however,
lacking means and a teacher, they cannot realize their ambition, but remain, as it were, forever
desperately athirst.
Joseph Haydn was one of the composers who learned counterpoint from Fux’s book because he “lacked means
and a teacher”. This differential geometry book is also aimed at those who lack means and a teacher. The
“Gradus ad Parnassum” was much more successful than previous counterpoint tutorials because Fux arranged
the ideas in a logical order, starting from the most basic two-part counterpoint (with gradually increasing
rhythmic complexity), then three-part counterpoint in the same way, and finally four-part counterpoint in
the same graduated manner. Fux said about this systematic approach: “When I used this method of teaching
I observed that the pupils made amazing progress within a short time.” ([209], page 18.) This book tries
similarly to arrange all of the ideas which are required for differential geometry in a clear systematic order.
1.4.15 Remark: The motivations for this book may be summarized as follows.
(1) Provide a “source book” for the author to create smaller, beginner-friendlier books.
(2) Help the author to understand at least some of the mathematical physics which was incomprehensible
to him in the 1970s.
(3) Provide a background resource for mathematics and physics students who are studying differential
geometry, particularly those who haven’t two pennies to rub together.
(4) Provide a notations and definitions resource for other authors of differential geometry and physics books.
(5) Provide the necessary background for the author to generalize some work on boundary value problems
for elliptic second-order differential equations from flat space to curved space. The analytic approach
of this book will hopefully assist analysts in general to extend flat-space results to curved space.
(6) Provide a path to understanding Einstein’s general relativity for the determined adult with much
enthusiasm but limited initial mathematics background.
(7) Restructure the subject of differential geometry into a unified, integrated subject which makes sense to
pure mathematicians. This requires some research into the motivation of definitions.
(8) Clarify for all differential geometry practitioners which concepts and theorems apply to each of the
layers of differential geometry so that they don’t use the wrong formulas in their work.
(9) Provide justifications for each building block in the entire edifice of theory which underlies differential
geometry so as to facilitate the reconstruction of physics. Only by understanding in depth why the
ziggurat is constructed as it is can one safely reconstruct it.
(10) Provide a framework for differential geometry which is “close to the ground”. High abstractions are
avoided. (Some authors represent simple concepts by astonishingly complex definitions with no obvious
benefit. This book tries to give the simplest possible formal definition for every concept as if the reader
had better things to do than spend hours studying a convoluted definition to determine finally that it
is exactly the same as a very simple definition which they knew already.)
1.5. Style
1.5.1 Remark: Some DG books begin with appeals to physical and geometric intuition; some start in the
middle layers of the subject with differentiable manifolds and work outwards; others begin with the familiar
flat spaces of high school geometry and gradually introduce manifolds and curvature. Here the approach is
to systematically build secure foundations in the early chapters so that whenever there is doubt, one may
trace any definition back through the progressively built-up layers to the ground levels of mathematical logic
and set theory. Although the soil underneath mathematical logic is quite sandy in places, it is probably the
best basis on which to build a secure formalism for differential geometry. (See Remark 2.1.4 for this choice
of foundation layer.)
Many DG books try to build the reader’s understanding on an intuitive foundation, starting with familiar
ideas and appealing to intuition at frequent intervals to help make conceptual leaps which don’t seem quite
logical. Much of differential geometry is counter-intuitive. So intuition often leads down erroneous paths.
In this book, the theory is developed along strictly mathematical lines. Intuition is often helpful, but must
always be backed up by rigorous proof.
1.5.2 Remark: In some books, half of the material is relegated to a series of frustrating exercises at the
end of each chapter. Putting significant material in the exercises makes the main presentation incomplete. In
this book, answers for all exercises will be provided. Trying to understand mathematics is itself a sufficient
exercise. Interested readers can always work out their own examples and check the theorems and definitions
to make sure that they are correct and self-consistent. The enthusiastic reader can experiment with theorems
and definitions to try to improve them or to understand why particular assumptions were chosen. Whenever
an author says that an assertion is “clear”, “easy to show”, or “blindingly obvious”, this is an invitation to
readers to check that they know how to fill in the missing steps. Simmons [140], page xi, makes the following
similar comment. (Authors were less gender-neutral in 1963.)
The serious student will train himself to look for gaps in proofs, and should regard them as tacit
invitations to do a little thinking on his own. Phrases like “it is easy to see,” “one can easily
show,” “evidently,” “clearly,” and so on, are always to be taken as warning signals which indicate
the presence of gaps, and they should put the reader on his guard.
1.5.3 Remark: There are three kinds of books on any technical subject: (1) a reference book, (2) a
tutorial, (3) a ‘cookbook’. The cookbook style of work presents a set of recipes for solving various particular
problems from which the reader is supposed to gradually accumulate skills and knowledge or else just solve the
particular kinds of problems that they are interested in. The tutorial style of work takes the reader through
a subject in a manner which systematically increases their knowledge by starting from easy concepts with
which they can be expected to be familiar and adding new concepts which depend only on concepts that
have been presented earlier or else are assumed as prerequisites. The reference style of work is supposed to
be complete, systematic and well-indexed so that someone already knowledgeable in the subject can quickly
locate the details which they require. This author is against cookbooks. The cookbook style of presentation
is best suited to subjects which are so totally incomprehensible that trial-and-error is the only way to make
sense of them. Then some ready-made recipes are indispensable. The style here is intended to be a combination
of reference and tutorial. In other words, this book should hopefully be complete and systematic, but also
readable and digestible in a single linear reading.
1.5.4 Remark: If the reader finds the amount of informal discussion in this book excessive, it is interesting
to note that Hermann Weyl frequently entered into philosophical discussion in his classic “Raum, Zeit,
Materie” [51]. In 1923, there was more time to think about the meaning of mathematics. At that time, there
was less of a boundary between mathematics and philosophy. On the other hand, some authors chatter too
much. (For example, the book on π by Beckmann [189] looks more like a political pamphlet than a serious
mathematics book.)
1.6. Some minor details of presentation

1.6.1 Remark: The mathematical literature has a wide range of notations for sets of numbers. This book
uses the following notations.
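                      all                    positive                                    non-negative                              negative                   non-positive
integers              $\mathbb{Z}$           $\mathbb{Z}^+$, $\mathbb{N}$                $\mathbb{Z}^+_0$, $\omega$                $\mathbb{Z}^-$             $\mathbb{Z}^-_0$
rationals             $\mathbb{Q}$           $\mathbb{Q}^+$                              $\mathbb{Q}^+_0$                          $\mathbb{Q}^-$             $\mathbb{Q}^-_0$
reals                 $\mathbb{R}$           $\mathbb{R}^+$                              $\mathbb{R}^+_0$                          $\mathbb{R}^-$             $\mathbb{R}^-_0$
extended integers     $\bar{\mathbb{Z}}$     $\bar{\mathbb{Z}}^+$, $\bar{\mathbb{N}}$    $\bar{\mathbb{Z}}^+_0$, $\omega^+$        $\bar{\mathbb{Z}}^-$       $\bar{\mathbb{Z}}^-_0$
extended rationals    $\bar{\mathbb{Q}}$     $\bar{\mathbb{Q}}^+$                        $\bar{\mathbb{Q}}^+_0$                    $\bar{\mathbb{Q}}^-$       $\bar{\mathbb{Q}}^-_0$
extended reals        $\bar{\mathbb{R}}$     $\bar{\mathbb{R}}^+$                        $\bar{\mathbb{R}}^+_0$                    $\bar{\mathbb{R}}^-$       $\bar{\mathbb{R}}^-_0$
The notation $\omega$ is used for the non-negative integers $\mathbb{Z}^+_0$ when the ordinal number representation is meant.
Thus 0 = ∅, 1 = {0}, 2 = {0, 1} and so forth. Then $\omega^+ = \omega \cup \{\omega\}$. The notation $\mathbb{Z}^+_0$ is preferable when
the representation is not important. Note that the plus-symbol in $\mathbb{Z}^+_0$ indicates the inclusion of positive
numbers, whereas the plus-symbol in $\omega^+$ means that an extra element is added to the set $\omega$.
The notation $\mathbb{N}$ is mnemonic for the “natural numbers” $\mathbb{Z}^+$. (Some authors include zero in the natural
numbers.) The bar over a set of numbers indicates that the set is extended by elements ∞ and −∞. So
$\bar{\mathbb{Z}} = \mathbb{Z} \cup \{\infty, -\infty\}$ and $\bar{\mathbb{R}} = \mathbb{R} \cup \{\infty, -\infty\}$. Then the positivity and negativity restrictions are applied to
these extended sets. So $\bar{\mathbb{Z}}^+_0 = \mathbb{Z}^+_0 \cup \{\infty\}$ and $\bar{\mathbb{R}}^- = \mathbb{R}^- \cup \{-\infty\}$.
The notations $\mathbb{N}_n = \{1, 2, \ldots n\}$ and $\mathbb{Z}_n = \{0, 1, \ldots n - 1\}$ are used as index sets for finite sequences. These
index sets are often used interchangeably in an informal manner. Thus $\mathbb{R}^n$ usually means $\mathbb{R}^{\mathbb{N}_n}$ in practice.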
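The ordinal number representation and the two index-set conventions in this remark can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names `ordinal`, `index_set_from_one` and `index_set_from_zero` are hypothetical and do not appear in this book, and Python frozensets are used as a stand-in for finite sets.

```python
def ordinal(n):
    """Return the finite ordinal n as a frozenset, so that n = {0, 1, ..., n-1}."""
    current = frozenset()              # 0 is the empty set
    for _ in range(n):
        current = current | {current}  # successor: k + 1 = k ∪ {k}
    return current


def index_set_from_one(n):
    """Index set {1, 2, ..., n} (hypothetical helper name)."""
    return set(range(1, n + 1))


def index_set_from_zero(n):
    """Index set {0, 1, ..., n-1} (hypothetical helper name)."""
    return set(range(n))


zero, one, two = ordinal(0), ordinal(1), ordinal(2)
print(one == frozenset({zero}))        # 1 = {0}
print(two == frozenset({zero, one}))   # 2 = {0, 1}
print(len(ordinal(5)))                 # the ordinal 5 has five elements
print(index_set_from_one(3), index_set_from_zero(3))
```

The two index sets have the same number of elements, which is presumably why they can be used interchangeably in informal contexts.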
1.6.2 Remark: When mathematical symbols appear at the beginning of a sentence, the meaning can
become unclear. Therefore a strong effort is made to commence all sentences with natural language words.
For similar reasons, end-of-sentence mathematical symbols at the beginning of a line are also avoided where
possible. These style considerations sometimes lead to slightly unnatural sentence structure.
1.6.3 Remark: Alternative definitions are sometimes given for the author’s preferred definitions. An
alternative definition has an arrow pointing to the corresponding standard definition as in the following.
9.2.4 Definition: This is a definition which is adopted as standard in this book.
9.2.18 Definition (→ 9.2.4): This is an equivalent alternative version of Definition 9.2.4.
1.6.4 Remark: Theorems based on non-standard axioms are tagged. (See Remarks 4.7.2 and 5.0.10.)

1.6.5 Remark: In mathematical definitions in this book, the word “when” is sometimes used where usually
the word “if” is used. A definition is not the same as a logical implication or equivalence. A definition is a
shorthand for an often-used concept. It is not an axiom or theorem. The popular definition “assignment”
notations $\stackrel{\rm def}{=}$, $\stackrel{\Delta}{=}$, $\leftarrow$ and $:=$ are clumsy, unattractive and incorrect. Plain language is used instead.
1.6.6 Remark: The modern symbol “□” is placed at the end of each completed proof instead of the
old-fashioned abbreviation QED (Quod Erat Demonstrandum).
1.6.7 Remark: The spelling is mostly British. The suffix spellings -ize and -ization are used in accordance
with the excellent discussion in the OED [212], page 1122. However, some of the North American spelling
variations devised by Noah Webster in about 1828 ([205], page xxiv) are occasionally used.
1.7. Differences from other differential geometry texts

The following are some of the general differences in presentation between this book and the majority of other
differential geometry textbooks.
(1) Differential geometry concepts are presented in a strictly progressive order in terms of structural layers
and sublayers. In particular, the Riemannian metric is not introduced until all affine connection topics
have been presented, and all concepts which are meaningful without a connection are defined in the
differentiable manifold chapters before connections are defined. (See Section 1.1 for further details.)
(2) Substantial preliminary chapters present most of the prerequisites for the book. This avoids having to
weave elementary material as needed into the more advanced material as many books do.
(3) Definitions are the main focus rather than theorems. Theorems are presented only to support the
presentation of definitions.
(4) Classes of mathematical objects are generally defined semi-formally in terms of specification tuples.
(5) Examples are mostly collected at the end of the book to avoid interrupting the theoretical development.
(6) The exercises are at the end of the book and all answers are given. Therefore no essential result is
required to be provided by the reader.
The following are some specific differences between the way concepts are defined in this book compared to
many or most other differential geometry books.
(i) Paths are defined as equivalence classes of curves, not images of curves.
(ii) Associated fibre bundles are defined in terms of a relation between fibre bundle atlases rather than the
customary explicit constructions using orbit spaces.
(iii) Connections on fibre bundles are generalized to topological pathwise parallelism.
(iv) Tangent vectors are defined as equivalence classes of vector components rather than as differential
operators or curves.
(v) Higher-order tangent operators are defined for use in analysis of partial differential equations.
(vi) Connections are defined on ordinary fibre bundles rather than the customary principal fibre bundles.
(vii) The covariant derivative is defined directly in terms of an affine connection by using a “drop function”.
(viii) Riemannian metric tensors are defined as half the Hessian of the square of the point-to-point distance
function, rather than as ad hoc symmetric covariant tensors of degree 2.
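Item (viii) can be checked numerically in a very simple case. The following is a minimal sketch, assuming the flat plane described in polar coordinates (r, θ), where the squared point-to-point distance is given by the law of cosines. The helper names `dist_sq_polar` and `half_hessian` are hypothetical, and the central finite differences are a numerical substitute chosen here for illustration, not the construction used in this book.

```python
import math


def dist_sq_polar(p, q):
    """Squared Euclidean distance between points p and q given as (r, theta)."""
    r1, t1 = p
    r2, t2 = q
    return r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(t1 - t2)


def half_hessian(dist_sq, p, h=1e-4):
    """Approximate half the Hessian of q -> dist_sq(p, q) at q = p."""
    n = len(p)

    def value(i, si, j, sj):
        # Evaluate dist_sq with coordinates i and j of q shifted by +/- h.
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return dist_sq(p, tuple(q))

    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Central-difference estimate of the second partial derivative.
            second = (value(i, 1, j, 1) - value(i, 1, j, -1)
                      - value(i, -1, j, 1) + value(i, -1, j, -1)) / (4.0 * h * h)
            g[i][j] = 0.5 * second
    return g


metric = half_hessian(dist_sq_polar, (2.0, 0.7))
for row in metric:
    print([round(x, 4) for x in row])
```

At the point with r = 2, the computed matrix is close to diag(1, r²) = diag(1, 4), which agrees with the familiar metric components for polar coordinates in the plane.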

1.8. MSC 2000 subject classification

The following ‘wheel of fortune’ shows where differential geometry fits into the general scheme of things.
[Figure: circular chart of the MSC 2000 top-level subject areas, labelled by their two-digit codes from 00 to 97, with a “you are here” marker.]
1.8.1 Remark: Differential geometry is one of 63 mathematics subject areas. In the Dewey decimal
classification system, mathematics occupies 10 out of 1000 classifications. This suggests that differential
geometry constitutes approximately 1/6300 of all human knowledge. This is about 0.016%.
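The arithmetic behind this estimate can be reproduced with exact fractions. This is a minimal sketch which takes the two proportions stated above at face value; the variable names are hypothetical.

```python
from fractions import Fraction

# Proportions as stated in the remark (taken at face value).
dg_share_of_mathematics = Fraction(1, 63)        # one of 63 subject areas
mathematics_share_of_dewey = Fraction(10, 1000)  # 10 out of 1000 classifications

dg_share = dg_share_of_mathematics * mathematics_share_of_dewey
print(dg_share)                          # 1/6300
print(f"{float(dg_share) * 100:.3f}%")   # 0.016%
```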
It is perhaps a depressing thought that decades of study are required to acquire even a fair understanding of
such a small proportion of human knowledge. The human mind is a finite microscope scanning an infinite
universe of ideas. No matter how much one learns, one’s world-view will always be woefully incomplete and
unrepresentative. The research community is like millions of ants with tiny microscopes, each examining a
grain of sugar at a time. Textbook writers try to assemble the grains of sugar into tidy tasty sugar cubes
of knowledge. But there is a huge sugar-mountain of knowledge out there to be organized and understood.
So it is essential to continually compactify and demystify all areas of mathematics so that valuable human
time and energy are not wasted collecting and descrambling scattered and sometimes cryptic material.

1.9. How to learn mathematics

1.9.1 Remark: This is the “asterisk method of learning” which I discovered in 1975. It worked!
(1) Find somewhere quiet. Turn off the radio (and your portable digital music player). A library reading
room is best if it is a quiet library reading room. Preferably have a large table or desk to work on.
(2) Open the book or lecture notes which you wish to study.
(3) Start copying the relevant part of the book or the lecture notes to a notepad by hand. If you are
studying a book, you should be writing a summary or paraphrasing what you are reading. If you are
studying lecture notes, you should copy everything and add explanations where required.
(4) Whenever you copy something, ask yourself if you really understand it completely. In other words, you
must understand every word in every sentence. As long as you are completely comfortable with what
you are copying, keep going.
(5) If you read something which is difficult to understand, stop and think about it until you understand it
clearly. If a mathematical expression is unclear, try to determine which set or class each term in the
expression belongs to. All sub-expressions in a complex expression must belong to some set or class.
Every operator and function in an expression must act only on variables in the domain of definition of
the operator or function.
(6) If you find something that you really can’t understand after a long time, copy it to your notebook, but
put an asterisk in the margin. This means that you have copied something that you did not understand.
(7) While you continue copying, keep going back to the lines which are marked with an asterisk to see if
you can understand them. If you find an explanation later, you can erase the asterisk.
(8) When you have finished copying enough material for one sitting, look over your notes to see if you can
understand the lines which still have an asterisk. If you have no asterisks, that means that you have
understood everything. So you can progress to the next chapter or section of the text.
(9) If you still have one or more asterisks left in your notes after a day or more, you should keep trying to
understand the lines with the asterisks. Whenever you get some spare time and energy, just look at the
lines with asterisks on them. These are the lines that need your attention most.
(10) If you discuss your work with other people, especially with teachers or tutors, show them your notes
and the lines with the asterisks. Try to get them to explain these lines to you.
If you keep working like this, you will find that your study becomes very efficient. This is because you do
not waste your time studying things which you have already understood.
I used to notice that I would spend most of my time reading the things which I did understand. To learn
efficiently, it is necessary to focus on the difficult things which you do not understand. That’s why it is so
important to mark the incomprehensible lines with an asterisk.
Copying material by hand is important because this forces the ideas to go through the mind. The mind is
on the path between the eyes and the hands. So when you copy something, it must go through your mind!
It is also important to develop an awareness of whether you do or do not really understand something. It is
important to remove the mental blind spots which can hide the difficult things.
When copying material, it is important to determine which is the first word where it becomes difficult. Read
a difficult sentence until you find the first incomprehensible word. Focus on that word. If a mathematical
expression is too difficult to understand, read each symbol one by one and ask yourself if you really know
what each symbol means. Make sure you know which set or space each symbol belongs to. Look up the
meaning of any symbol which is not clear.
The best way to learn any subject is to write a book on it. But if you’re in a hurry, the asterisk method is
a good second best.
1.9.2 Remark: It is often said that learning is most effective when the student has an insight into an
idea. This assertion is sometimes used as a pretext for the avoidance of “rote learning” because insight must
precede the acquisition of ideas. In practice, this leads to inefficient learning. The best way to get insight
into the properties and relations of ideas is to first upload them into the mind and then achieve insight into
them. For example, a child who has learned the “five times table” will easily notice the redundancy in this
table during or after rote learning. The best strategy is to learn first, then understand more deeply.

1.10. Acknowledgements
[ Acknowledgements will be given here to particular people who helped with this book. ]
It goes perhaps without saying that the author is greatly indebted to Donald E. Knuth [207] for the gift of
the TEX typesetting system, without which this book would not have been created. The author would have
been even more grateful if plain TEX didn’t require an IQ of 200 to fully understand it. Thanks also to John
D. Hobby for the MetaPost illustration tool which is used for all of the diagrams.
The fonts used in this book include cmr (Computer Modern Roman, American Mathematical Society),
msam/msbm (AMSFonts), rsfs (Taco Hoekwater) and wncyr Cyrillic (Washington University).

Part I
Preliminary topics

Chapter 2
Philosophical considerations
2.1 The bedrock of mathematics
2.2 Logic, language and tribalism
2.3 Ontology of mathematics
2.4 Plato’s theory of ideas
2.5 Sets as parameters for socio-mathematical network communications
2.6 Sets as parameters for classes of objects
2.7 Extraneous properties of set-constructions in definitions
2.8 Axioms versus constructions for defining mathematical systems
2.9 Some general remarks on mathematics and logic
2.10 Dark sets and dark numbers
2.11 Integers and infinity
2.12 Real numbers and infinitesimality
The purpose of this chapter is to outline some of the difficulties which are encountered when attempting
to find a solid basis for mathematics, in particular for differential geometry. Any resemblance between this
chapter and the academic subject “philosophy of mathematics” is purely coincidental and unintentional.
[Figure: “wheel of knowledge” with seven disciplines arranged in a circle: mathematics, physics, chemistry, biology, neuroscience, anthropology and logic.]
2.0.1 Remark: No bedrock of knowledge underlies mathematics. Reductionism ultimately fails.

The above medieval-style “wheel of knowledge” shows some interdependencies between seven disciplines.
One may seek answers to questions about the foundations of each discipline by following the arrow to a more
fundamental discipline, but there seems to be no ultimate “bedrock of knowledge”.
The author started writing this book with the intention of putting differential geometry on a firm, reliable
footing for the benefit of mathematicians, physicists and engineers so that they could feel complete confidence
that their work had a solid basis. When he had read a set theory book by Halmos [160] in 1975, it seemed
to him that all of mathematics was reducible to set constructions. But by delving deeper into set theory and
logic, the author encountered the opposite result. Thirty years later, the happy illusion faded and crumbled.
It seems like a completely reasonable idea to seek the meaning of a mathematical expression in terms of
the meanings of individual symbols in the expression. When this is applied recursively, the inevitable result
is either a cyclic definition at some level or a definition which refers outside of mathematics. Part of the
philosophy of this book is to avoid external definitions. But cyclic definitions are even more unsatisfying and
empty. So the best solution is to admit that expanding the semantic tree of mathematical expressions must
ultimately lead to leaf nodes which make references outside mathematics. But these external references do
not have to be external to this book. In other words, a book which seeks to give full meaning to mathematical
expressions may, and should, include sufficient extra-mathematical context at the lowest levels to ensure that
the whole edifice has meaning.
The reductionist approach to mathematics is enormously valuable, but cannot be carried to its full conclusion.
Just as matter may be reduced to atoms and elementary particles, so all of mathematics may be reduced to
sets and logic. It turns out, however, that when elementary particles are split, they are recursively defined
in terms of each other. They can be arranged in a network, not in a tree. In the same way, the elements of
set theory and logic turn out to be recursively defined in terms of each other.
2.0.2 Remark: Philosophical enquiry is unavoidable when studying meanings of mathematical definitions.
This chapter arose naturally from the wide scope and unifying objectives of the book. If one writes about a
narrow range of mathematical topics, there is little need for philosophical reflection, but this book attempts
to present differential geometry and all of its prerequisites in a unified and systematic manner, tracing all
definitions to their origins and incorporating numerous divergent approaches to the subject.
2.0.3 Remark: Some kinds of philosophical questions which arise in a definitions book.
Many particular questions of a philosophical nature arose when writing this book. For example: What are
the “correct” or “best” definitions for real numbers, tensor products (of vectors), and tangent vectors (on
manifolds)? Should one accept the “existence” of a set whose contents can never be known? If so, what
would such “existence” mean? How can a secure basis be provided for mathematics if basic logic requires
set theory for its establishment and set theory requires basic logic for its establishment? What is the real
“stuff” of mathematics? If the written symbols are not the real “stuff” of mathematics, what is? What do
the symbols point to? Is mathematics universal in the sense that inter-galactic civilisations would discover
the same mathematics that Earthlings have? Or is mathematics merely a local culture which is propagated
in our civilisation more or less in the manner of natural languages? Is there any sense in which anything in
mathematics can be said to be certainly true in an absolute sense? Or is all mathematics merely a socially
defined behaviour? To what extent is our mathematics a consequence of the peculiar capabilities of the
human brain? Would a more advanced species use a totally different (and superior) mathematics? Since
the real number system seems to be a consequence of our system of physical measurements, what is the
significance of the absurdly large infinity of elements of the set of real numbers? Are the logical difficulties
that arise with the real numbers a problem with the physical universe which we measure, or a problem with
the limitations of the human modelling process? Since we never experience or observe anything truly infinite
in the real world, how is it that the most successful applications of mathematics to the physical sciences rely
so heavily on infinite and infinitesimal concepts?
These questions and many more are likely to arise spontaneously from any serious attempt to provide a
robust, self-consistent, unified, systematic basis for a rich mathematical topic such as differential geometry
and its pre-requisites. Therefore some philosophy seems inevitable since philosophy is the study of questions
which cannot be answered. (When a question can be answered, it moves from the philosophy department
to the science faculty.)
2.0.4 Remark: Philosophy of mathematics is potentially harmless.

Richard Feynman is reputed to have said: “Philosophy of science is about as useful to scientists as ornithology
is to birds.” Philosophy of mathematics is probably equally useful. It is best for mathematicians to get on
with their mathematics and let the philosophers worry about what it all means, if anything. Philosophy of
mathematics may or may not be completely harmless.
The best attitude to philosophy of mathematics might be to simply read it and forget it. In fact, one may
as well just not read it at all. Skipping this chapter would certainly do no harm. This chapter is not much
more than a collection of mini-essays and speculations, sometimes presented in a disconnected fashion. The
reader should certainly not expect logical self-consistency in this chapter. Philosophy is a great opportunity
to relax one’s intellectual rigour for a while and let the mind unwind.

2.1. The bedrock of mathematics

2.1.1 Remark: Rigorous mathematics is boot-strapped from naive mathematics.
Although logic is arguably the bedrock of mathematics, it floats on a sea of molten magma, namely the
naive notions of logic, sets, functions, order and numbers which are used in the formulation of “rigorous”
mathematical logic. This is illustrated in Figure 2.1.1.
[Figure: three layers, from top to bottom.
  sedimentary layers (rigorous mathematics): functions, order, numbers; set theory
  bedrock layer: mathematical logic
  magma layer (naive mathematics): naive logic, naive sets, naive functions, naive order, naive numbers]

Figure 2.1.1 Relations between naive mathematics and “rigorous mathematics”
Mathematics is boot-strapped into existence by first assuming socially and biologically acquired naive notions;
then building logical machinery on this wobbly basis; then “rigorously” redefining the naive notions of sets,
functions and numbers with the logical machinery. Mathematics and logic are like two snakes swallowing
each other by the tail. (A similar problem occurs in natural-language dictionaries. Some minimal vocabulary
is required to boot-strap the definitions.) Conjuring up concepts such as metamathematics and metalogic
solves nothing. Inventing long names to describe the circular definition problem does not make it go away.
2.1.2 Remark: Etymology and meaning of the word “naive”.

The word “naive” is not necessarily pejorative. The French word “naive” comes from the Latin word
“nativus”, which means “native”, “innate” or “natural”. These meanings are applicable in the context of
logic and set theory. The Latin word “nativus” comes from Latin “natus”, which means “born”. So the
word “naive” may be thought of as meaning “inborn”. That is, naive mathematics is a capability which
humans are born with. (This is related, of course, to the philosophizing in linguistics about the extent to
which natural language is an inborn or learned capability. Much the same arguments apply to mathematical
behaviour.)
2.1.3 Remark: Fundamental mathematics is both a-priori and analytic knowledge.

The boot-strap notion of logic and mathematics seems to resolve the question of whether logic and arithmetic
are a-priori knowledge. It seems that logic and arithmetic are both a-priori and analytic knowledge because
they are a-priori during the boot-strap phase and analytic when re-defined “rigorously”. A mathematical
education consists of many loops around the snakes and ladders of logic and mathematics until one’s thinking
is synchronized with other mathematicians. Philosophers who worry about the true nature of mathematics
possibly make the mistake of assuming a static picture. They ignore the cyclic and dynamic nature of all
knowledge, which has no ultimate bedrock. Terra firma floats on molten magma.
2.1.4 Remark: Mathematical logic is the best “bedrock layer” for mathematics.
One must agree on a starting point for the presentation of a body of mathematics and then proceed in an
orderly fashion from that agreed starting point. The starting assumptions cannot be validated in a non-
cyclic way from more basic “knowledge”. Like all communication of knowledge, one must proceed from
an agreed point in an agreed manner. New ideas can be communicated only if old ideas are first agreed.

(Computer communications are similar. A pair of computers generally start any communication session
with a “handshake” procedure to synchronize their states so that the rest of the session will be correctly
understood by both sides.)
At a postgraduate university mathematics level, it makes sense to strive to achieve synchronization of
concepts in the mathematical logic layer as a “bedrock” and base everything else on that. Since most of
mathematics can be built upon a basis of mathematical logic, it is a suitable starting point. This does not
mean that logic is self-evidently true. It is just easier to achieve standardization of language and concepts
in the mathematical logic layer in the current century at a particular educational level.
2.1.5 Remark: The bedrock of mathematics may be chosen to be natural or minimal.

One may compare the organization of mathematics as a subject with chemistry. One group of chemists
could define hydrogen and oxygen as products of the electrolysis of water, while another group could define
water as a compound made from hydrogen and oxygen. It is more natural to define the hard-to-produce
gases in terms of the ubiquitous liquid. But it is more minimalist to start from 92 elements and define all
compounds in terms of them.
In the same way, the basis of each mathematical topic may be chosen to be either natural or minimal. Very
often a minimal set of axioms and rules will seem unnatural and difficult to understand, while a natural
starting point may suffer from substantial redundancy and untidiness. In chemistry, there is broad agreement
that the 92 elements are a suitable basis for understanding all compounds. But in mathematics and logic,
and in differential geometry in particular, there are often no clear winners in the competition to be the
undisputed basis of particular topics.
2.1.6 Remark: Mathematics bedrock layers in history.

The choice of bedrock layer for mathematics seems to be a question of fashion. In ancient Greece, and for
a long time after, the most popular choice for the bedrock layer was geometry. For a long time, arithmetic
and algebra could be based on the ruler-and-compass geometry of points, lines and circles. From the time of
Descartes onwards, geometry could be based on arithmetic and algebra. Then geometry became progressively
more secondary, and arithmetic became more primary, because algebra and analysis extended numbers well
beyond what the old geometry could deliver. Later on, set theory provided a new, more primary layer upon
which arithmetic, algebra, analysis and geometry could be based. Around the beginning of the 20th century,
mathematical logic became a yet more primary layer upon which set theory and all of mathematics could
be based.
Maybe some day a new concept layer will be developed to better underlie logic and mathematics. Such a
development is difficult to foresee. The fact that logic and set theory are mutually intertwined in a cyclic
manner suggests that a better bedrock layer is needed. But at the current time, logic seems to be the best
choice of boot-strap layer.
2.1.7 Remark: The network of mathematical concepts requires coherence.

Although the network of mathematical concepts cannot be defined in the sense of deriving all concepts from
other concepts in an acyclic manner, the concept network can at least be coherent. Coherence is the best that
can be hoped for in the non-acyclic concept network which includes both mathematics and mathematical
logic as in Figure 2.1.1.
Coherence is not the same as logical self-consistency. The latter means that the concept network is tested
with respect to a deductive logic framework which is established external to the network. In the situation
considered here, the logical framework is part of the concept network which is being tested. The rules of
deduction are themselves defined within the network.
In the original Latin, the word “coherent” means “sticking together” whereas “consistent” means “standing
firmly” or “remaining firm”. The English meanings of “consistent” and “coherent” are very similar in general
usage. In the context of mathematical logic, “consistent” generally means that no propositions will contradict
each other whereas “coherent” is a more abstract term referring to the entire framework of concepts.
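The non-acyclic character of the concept network can be made concrete with a small sketch. The graph below is a hypothetical, drastically simplified concept network (its nodes and edges are assumptions made for illustration, not a transcription of Figure 2.1.1), and `find_cycle` is an illustrative helper which reports one dependency cycle if any exists.

```python
from typing import Dict, List, Optional

# Hypothetical concept network: each concept maps to the concepts which its
# definition relies on. The edges are illustrative assumptions only.
DEPENDS_ON: Dict[str, List[str]] = {
    "differential geometry": ["analysis"],
    "analysis": ["set theory"],
    "set theory": ["mathematical logic"],
    # Logic texts presuppose naive sets, functions and sequences.
    "mathematical logic": ["set theory"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of concepts, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so the path loops back.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found is not None:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    print(find_cycle(DEPENDS_ON))
```

A depth-first search of this kind can only detect that a cycle exists. It says nothing about whether the network is coherent, which is the property the remark above is concerned with.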
2.1.8 Remark: The mathematical pre-requisites for mathematical logic.

Mendelson [165], pages 5–11, explicitly presents six pages of mathematics which are pre-requisite to his
introduction to mathematical logic.

For the absolute novice a summary will be given here of some of the basic ideas and results used
in the text.
Since Mendelson’s six pages include (very compactly) a big chunk of the basic theory of sets, relations,
functions, cardinal numbers and order, this seems to be a quiet confession that mathematical logic cannot be
developed without assuming much mathematics which is itself based on mathematical logic. The surprising
thing is that this circular dependency is so seldom remarked upon. Maybe the teachers for each topic refer
to each other for the pre-requisites of their courses, each believing that their own topic is firmly based on
lower-level concepts presented elsewhere. Hopefully, in the end, there are at least no contradictions between
the topics.
Remark 7.2.5 discusses how the naive notion of a sequence is required as prior knowledge for the development
of logic and mathematics, but is then re-defined within ZF set theory.
Section 3.14 is an attempt to list some of the naive mathematics required in the set-up of mathematical logic
in this book.
2.1.9 Remark: Bertrand Russell’s disillusionment with the non-provability of axioms.

The lack of a “bedrock” for mathematics is reminiscent of the experience of Bertrand Russell when he
discovered that everything in Euclidean geometry is derived from axioms which must be accepted without
proof. The following paragraph (Clark [180], page 34) describes Russell’s disillusionment.
Lack of an emotional hitching-post was quite clearly a major factor in driving the young Russell
out on his quest for an intellectual alternative – for certainty in an uncertain world – a journey
which took him first into mathematics and then into philosophy. The expedition had started by
1883 when Frank Russell took his brother’s mathematical training in hand. ‘I gave Bertie his first
lesson in Euclid this afternoon’, he noted in his diary on 9 August. ‘He is sure to prove a credit to
his teacher. He did very well indeed, and we got half through the Definitions.’ Here there was to
be no difficulty. The trouble came with the Axioms. What was the proof of these, the young pupil
asked with naive innocence. Everything apparently rested on them, so it was surely essential that
their validity was beyond the slightest doubt. Frank’s statement that the Axioms had to be taken
for granted was one of Russell’s early disillusionments. ‘At these words my hopes crumbled’, he
remembered; ‘. . . why should I admit these things if they can’t be proved?’ His brother warned that
unless he could accept them it would be impossible to continue. Russell capitulated – provisionally.
2.1.10 Remark: Philosophy does not provide a solid bedrock for mathematical logic.
Sadly, one cannot find a bedrock underlying logic even in the realm of philosophy because philosophy itself
depends heavily on logic, thereby creating a further dependency cycle. At best one can hope to find a
logic and philosophy which are coherent with each other. Philosophy does “underlie” logic in some sense,
as suggested by the diagram discussed in Remark 2.0.1, but all of the arrows depicted in that diagram are
accompanied by thinner arrows pointing in the opposite direction. One would not want to base any subject
on philosophy anyway, since philosophy is the woolliest of all disciplines. Every proposition in philosophy
can be shown to be simultaneously true, false and meaningless. Philosophy is analogous to the Earth’s core
in the magma/bedrock/sedimentary picture in Figure 2.1.1. Everything in the Earth’s core is a matter of
conjecture and guesswork because direct observation is not possible.
2.1.11 Remark: Mathematics is psychological and biological in nature, location and origin.
Lakoff/Núñez [173], page 49, has the following comment on where the ultimate meaning of mathematics
comes from when mathematical symbols are recursively interpreted.
To understand a mathematical symbol is to associate it with a concept—something meaningful in
human cognition that is ultimately grounded in experience and created via neural mechanisms.
[. . . ]
The meaning of mathematical symbols is not in the symbols alone and how they can be manipulated
by rule. Nor is the meaning of symbols in the interpretation of the symbols in terms of
set-theoretical models that are themselves uninterpreted. Ultimately, mathematical meaning is like
everyday meaning. It is part of embodied cognition.
Texts on mathematical logic typically interpret symbolic logic in terms of set theory, which in turn is defined
in terms of mathematical logic. The Lakoff/Núñez claim is that the recursive interpretation does end
somewhere, namely with “embodied cognition”, which means essentially the biological processes of human
thinking. In other words, all mathematics is ultimately psychological and biological in its nature, its location
and its origin.
2.2. Logic, language and tribalism

2.2.1 Remark: Removing the circularities in logic and mathematics by external references.
The attempt to remove the circularities in the structure of mathematics may be compared to a study of
how the parts of the human body are supported. One may observe that the head is supported by the neck,
which is supported by the shoulders, which are supported by the chest, which is supported by the spine and
abdomen and so forth. But the feet are not supported by any part of the body. The feet must be supported
by something outside the body. In this sense, the ground underneath the feet is part of the body. In fact,
the dirt we walk on is an inseparable part of the body (in the same way that various friendly bacteria which
are needed for digestion are an essential part of the human body). Even if we jump in the air, the trajectory
of that jump is defined by the point at which we leave the ground and the point at which we return. If we
swim in the sea, the sea itself is defined by the ground on which it rests. Swimming is supported by the
buoyancy of the water, which is supported by the earth below it.
Thus any explanation of the physical support of the components of the human body can only avoid circular
dependencies by finally referring to something which is outside the body.
In the same way, an explanation of the meaning of mathematics can never avoid circularities unless there is
some support point outside mathematics. This is where naive mathematics is required. Naive mathematics
is the point of contact between pure mathematics and the nature of human experience itself, including the
experience of the arrow of time, which allows us to order events, and this ordering of events allows us to
count objects in sets, which gives us cardinality and numbers. The arrow of time is also required for the
ordering of logical arguments from assumptions at the beginning to assertions at the end. A mathematics
book typically starts with assumptions and works towards conclusions, which also requires the arrow of time
to distinguish the future from the past in an asymmetric fashion. The ability to read a mathematical text
requires knowledge of left and right, and up and down.
2.2.2 Remark: The membership concept is related to tribalism.
The concept of sets comes from the human mind’s ability to group objects together and define boundaries
around territories. The very word “membership” in the set theory context is suggestive of tribal membership,
which is fundamental to both human cooperation and competition (which in turn are responsible for the
majority of human happiness and misery respectively).
2.2.3 Remark: The concept of propositions originates from cooperation and competition.
The ability to convert mathematics into propositions is derived from the human ability to convert ideas into
words, which is fundamental to the evolution of humans from isolated monkey tribes into communities of
minds who build common views of the world and history. Speech allowed the human species to transmit
knowledge efficiently and preserve it through replication in the minds of others after individuals who possessed
knowledge died. (Speech also permitted more efficient cooperation within tribes to defeat competitors and
the environment.) Ultimately the conversion of all mathematics into symbolic logic rests on the speech
capability—the ability to convert ideas into words and convert words back into ideas. (See Figure 2.2.1.)
[Diagram labels: ideas, verbalization, talking, speech, hearing, interpretation, ideas]
Figure 2.2.1 Communication of human ideas via speech
The semantics of symbolic logic rests on this human capability to convert back and forth between ideas and
words. So it seems that our ability to perform symbolic logic rests heavily on a human capability which
evolved in response to a need to cooperate better within tribes to increase survivability.

2.2.4 Remark: The role of language in community formation.

The claim in Remark 2.2.3 that language plays an important role in tribe and community formation is
supported by the following passage in a book on anthropological linguistics by Foley [172], page 69.
Aiello and Dunbar [170] and Dunbar [171] [. . . ] note a close correlation between social-group size
and brain size, and note that both of these were increasing significantly about the time of Homo
habilis 2 million years ago. As the size increased, grooming behaviour would have no longer sufficed
to ensure group social cohesion. Aiello and Dunbar [170] calculate that Homo erectus (Olduvai 9)
with a cranial capacity of 1067 cm³ and a mean group size of 116.39 would have needed to spend
33 percent of its time in grooming activity to promote social bonding. They posit that the function
and complexity of vocalizations were extended to take on some of the load previously carried by
grooming. They also point out that gelada baboons, living in the largest groups among primates
excluding humans, with a mean group size of 115, have vocalization patterns with a number of
features once considered unique to human speech: fricatives, stops and nasals, 3 places of articulation
(labial, dental and velar), and prosodic melodies (Richman [177,178,179]). These vocal properties
seem to supplement grooming as a mechanism for social bonding. Given the grooming time that
would have been required for hominids, Aiello and Dunbar [170] propose that vocalizations
underwent a similar extension of function in the ancestral Homo lineage. Indeed, the importance of their
role in promoting social cohesion has been emphasized continuously by many scholars, going back
at least to Malinowski [175]; this, of course, is also central to the view of linguistic practices
presented in [the previous chapter]. This is often insufficiently recognized because speakers of written
languages (and linguists!) often unduly overemphasize its propositional bearing function.
It seems somewhat ironic that our modern, sophisticated system of communicating and thinking may have
originated not so much in the need to communicate “propositions” or “facts”, but rather in the advantages of
forming larger tribal groupings for the sake of cooperation against other animals of the same species. In other
words, the original adaptive advantage of language may have been the greater solidarity when competing
with other individuals of the same species. The fact that language eventually developed the ability to
communicate propositions may have been an incidental by-product of identifying which individuals are in or
out of a particular group; in other words, “friend or foe”.
2.2.5 Remark: The anthropological approach to mathematics.
The whole of mathematics (and physics and logic) may be approached with an anthropological mind-set.
(Anthropology is the study of humans with the same mind-set that zoology applies to animals, but with
special emphasis on the differences between humans and other animals.) It is important to study not only
what people do in mathematics, but also how they think about what they are doing. When the studied
humans say things about their thinking which do not seem to make much sense, one should pay more
attention to what they are doing.
2.2.6 Remark: Counting requires language and the arrow of time.
The ability to count things comes from the association of an ordered sequence of words with an ordered
sequence of observations of objects. Counting (beyond one or two items) would not be possible without
a vocabulary of number-words. So numbers arise (probably) from ordered listings of things according to
a standard set of number-words. This ability to associate words with lists of objects is clearly useful for
communicating whether one’s possessions have increased or decreased, for example lists of tribal members,
tools or domesticated animals.
Sometimes it seems sad that geometry has been reduced to numbers (as in numerical coordinate systems),
and that numbers have been reduced to symbolic logic. But symbolic logic is more convenient for reliable
written communication, which originated in the words and sentences of natural-language speech, which itself
arose from the need for efficient communication within human tribes.
It seems, then, that there is nothing really fundamental about the reduction of mathematics to symbolic
logic. It just happens to be convenient for fast and reliable communication between humans. If evolution
had taken another path, the most efficient means of communication might have been more geometrical in
character, with no words or sentences at all. Or effective communication could have evolved along lines
which are difficult for us to imagine because we are human beings who see everything anthropocentrically.
Consequently one should not necessarily expect symbolic logic (and ordered logical arguments) to be the
means of expressing the fundamentals of mathematics among extra-terrestrial civilisations. In other words,
one should not be too concerned at the apparent arbitrariness of some aspects of mathematical logic. Many
aspects are arbitrary, because they originate in the peculiarities of the human species and our current
culture.
The purpose of presenting naive mathematics is to identify, in a sense, “where the feet meet the ground”.
(See Section 3.14.)
2.2.7 Remark: The normative influence of written language on spoken language.

There are some similarities between the disciplines of logic and linguistics. Linguists (including
anthropologists and proselytizers) have written dictionaries and grammar books describing the languages of illiterate
societies since shortly after the invention of printing in Europe. (For example, see Ostler [176], pages
341–347, for a brief account of the history of dictionaries and grammar books in the Spanish colonies in America
from about 1540 onwards.) Then later the studied societies were able to use dictionaries and grammar books
to help learn their own languages better and to teach their children. (Even in the modern world, some groups
of people learn how to do their own traditional dances and rituals from anthropologists who described these
practices before they became extinct in the wild.)
In the same way, logical argument used to be an activity in the wild. Logicians studied how that wild logic
was happening and developed symbolic logic to describe that behaviour. Gradually the studied populations,
particularly mathematicians, have been able to use symbolic logic in their own thinking.
The formalization of both language and logic is partly good and partly bad. The linguists end up forcing
languages to go down paths that they otherwise would not have. People who learn language from books
find that they have to strictly conform to rules which have been inferred (sometimes imperfectly) by the
linguists. (See Figure 2.2.2.)
[Diagram labels: ideas, talking, speech, hearing, ideas; writing, literature, reading]
Figure 2.2.2 Communication of human ideas via speech and literature
This often brings about conflict between prescribed language and colloquial language. In the case of logic,
the specified logic of the logicians can force mathematicians to accept propositions against their intuition and
will. It is important not to forget where formal logic came from. Formal logic derives its authority from the
wild logic from which the formal logic was inferred. Logicians have as much authority to tell mathematicians
how to do mathematics as anthropologists have to tell Australian aborigines how to roast a kangaroo.
2.2.8 Remark: Formalist mathematics yields existence assertions which require “faith”.
One could view some of the heated controversies about intuitionism and constructivism in the 19th and
20th centuries as resistance to the new formal way of developing mathematics. In particular, the concept
of “existence” formerly meant that something could be specifically described, whereas later mathematicians
used formal methods to describe an empty concept of existence which did not yield specific objects. The
fact that one can write down that something exists, and justify this statement in terms of a logical argument
based upon baseless axioms, is nowadays accepted as a proof of existence.
The formalist approach to the notion of existence shows how formal grammars can distort the original
thinking processes by failing to fully, accurately describe the original processes. In the same way that
the real, original speakers of a language may feel they have a right to speak as they see fit rather than
following the grammar books and dictionaries of anthropologists and linguists, so also it seems reasonable
that mathematicians should feel they have the right to object to axioms which seem to describe something
which they do not recognize as authentic mathematics.
When one can prove an assertion ∃x, P(x) from axioms, it requires a certain amount of faith to believe
that an x really does exist which satisfies P(x). A sceptical person wishes to see material evidence of the
existence. If someone claims that a set is non-empty, but also states that not a single member of that set can
be specified, scepticism would seem to be justified. Even the best-presented argument for the existence of
pixies is thrown into doubt by the impossibility of ever seeing one.
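A standard illustration of a non-constructive existence proof, which is not taken from this book, is the following classical argument. It establishes an assertion of the form ∃a, ∃b, P(a, b) without the proof itself determining which pair (a, b) satisfies P.

```latex
\textbf{Claim.} There exist irrational numbers $a$ and $b$ such that $a^b$ is rational.

\textbf{Proof.} Let $c = \sqrt{2}^{\sqrt{2}}$. Either $c$ is rational or $c$ is irrational.
\begin{itemize}
\item If $c$ is rational, take $a = b = \sqrt{2}$. Then $a^b = c$ is rational.
\item If $c$ is irrational, take $a = c$ and $b = \sqrt{2}$. Then
\[
  a^b = \bigl(\sqrt{2}^{\sqrt{2}}\bigr)^{\sqrt{2}}
      = \sqrt{2}^{\,\sqrt{2}\cdot\sqrt{2}}
      = \sqrt{2}^{\,2}
      = 2,
\]
which is rational.
\end{itemize}
In both cases a suitable pair $(a,b)$ exists. \qed
```

The argument relies on the law of the excluded middle and never decides which of the two cases holds. (The Gelfond–Schneider theorem shows that the second case is the true one, but the proof above does not use this.)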
2.3. Ontology of mathematics

2.3.1 Remark: Ontology versus “an ontology”.
Semantics is the association of meanings with texts. An ontology is a semantics for which the meanings are
expressed in terms of real-world models.
In philosophy, the subject of “ontology” is generally defined as the study of being or the essence of things. The
phrase “an ontology” has a different but related meaning. An ontology, especially in the context of artificial
intelligence computer software, is a standardized model to which different languages may refer to facilitate
translation between the languages. To achieve this objective, an ontology must contain representations of all
the objects, classes, relations, attributes and other things which are signified by the languages in question.
In the context of mathematics, an ontology must be able to represent all of the ideas which are signified by
mathematical language. The big question is: What should an ontology for mathematics contain? In other
words, what are the things to which mathematics refers and what are the relations between them? This is
very much like asking: What is the essence of mathematical things?
2.3.2 Remark: Real-world models are required for interpreting physical phenomena.
Since the “real world” can only be indirectly perceived via physical phenomena (i.e. interactions between
human minds and the real world), the closest we can come to the real world is a model of the real world which
is consistent with observed phenomena. Real-world models vary between individual people and over time, and
according to the tasks to which the models are to be applied. A single individual may simultaneously adopt
multiple real-world models. Models may attempt to comprehensively describe the whole universe or only
particular aspects of the entire universe. Since the words “whole” and “entire” cannot be validated for the
universe, because we cannot know the full extent of the limitations of human perception and observations
of the universe, all models must be assumed to be partial at best. More importantly, the limitations on
modelling by humans imply that all real-world models have dubious correctness and completeness even
under the most optimistic assumptions.
The particular difficulty in developing an ontology for mathematics is the non-physical nature of
mathematical “phenomena”.
2.3.3 Remark: An ontology for mathematics can be based on mind-states.

The view adopted in this book is that a satisfying ontology for mathematics can be built on a real-world
model which locates mathematical objects inside human minds, in the minds of some animals, and also in
computer systems which execute mathematical software. The key concept in this proposed model is the
“socio-mathematical network” which is outlined in Section 2.5. The socio-mathematical network view of
mathematics contrasts with the Platonic “Ideal” view in Section 2.4.
2.3.4 Remark: Albert Einstein’s explanation of the ontological problem for mathematics.
Albert Einstein wrote the following comments about “the ontological problem” in an essay: “The problem
of space, ether, and the field in physics” [182], pages 61–62.
It is the security by which we are so much impressed in mathematics. But this security is purchased
at the price of emptiness of content. Concepts can only acquire content when they are connected,
however indirectly, with sensible experience. But no logical investigation can reveal this connection;
it can only be experienced. And yet it is this connection that determines the cognitive value of
systems of concepts.
Take an example. Suppose an archaeologist belonging to a later culture finds a text-book of
Euclidean geometry without diagrams. He will discover how the words “point,” “straight-line,”
“plane” are used in the propositions. He will also see how the latter are deduced from each other.
He will even be able to frame new propositions according to the known rules. But the framing of
these propositions will remain an empty word-game for him, as long as “point,” “straight-line,”
“plane,” etc., convey nothing to him. Only when they do convey something will geometry possess
any real content for him. The same will be true of analytical mechanics, and indeed of any exposition
of the logically deductive sciences.
What does this talk of “straight-line,” “point,” “intersection,” etc., conveying something to one,
mean? It means that one can point to the parts of sensible experience to which those words refer.
This extra-logical problem is the essential problem, which the archaeologist will only be able to solve
intuitively, by examining his experience and seeing if he can discover anything which corresponds
to those primary terms of the theory and the axioms laid down for them. Only in this sense can
the question of the nature of a conceptually presented entity be reasonably raised.
With our pre-scientific concepts we are very much in the position of our archaeologist in regard to the
ontological problem. We have, so to speak, forgotten what features in the world of experience caused
us to frame those concepts, and we have great difficulty in representing the world of experience to
ourselves without the spectacles of the old-fashioned conceptual interpretation. There is the further
difficulty that our language is compelled to work with words which are inseparably connected with
those primitive concepts.
Mathematics is not able to define itself in a self-contained way. Even though a machine may perform
calculations (like a future archaeologist who does not know what the words and symbols refer to), the
machine is not able to solve the ontological problem of mapping words and symbols to their meaning.
2.3.5 Remark: Mathematics without ontology is empty of meaning.

Mathematics is as much about sets as computer data is about zeros and ones. Just as computer data must
be brought to life by being interpreted by human beings, so also must set constructions be brought to life
by interpretation. Set theory is a language, but this language is no more useful than ancient Mycenaean
clay tablets if one has no dictionary and no familiarity with the world model within which the tablets were
created.
Even if a future archaeologist has the same sort of brain as 21st century mathematicians, however, there is
still the difficult question of how 21st century mathematical symbols can be mapped to the concepts inside
the mind of the future archaeologist. The minds of mathematicians in this century are formed by a long
and intense process of training to “see” concepts which are quite foreign to spontaneous thinking. These
concepts require considerable “mind-stretching”. It is not obvious that the minds of people of the future
would be sufficiently “stretched” by reading 21st century mathematical literature to be able to synchronize
to our mathematical concepts.
On the other hand, the mathematical literature of ancient Greece was somehow adequate to stimulate the
renaissance of mathematics in Europe after a Dark Age which lasted more than 1500 years. There was,
admittedly, some slim continuity in the person-to-person communication of mathematical concepts. So it
is difficult to know if the literature alone could have stimulated a mathematical renaissance. Luckily the
particular natural language of the texts was not yet extinct when Europe emerged from this Dark Age.
2.3.6 Remark: Mathematical activity may be perceived internally.

Einstein is probably not right (in Remark 2.3.4) in saying that all meaning must be connected to “sensible
experience” unless one includes internal perception as a kind of sensory experience. Pure mathematical
language refers to activities which are perceived within the human brain.
2.3.7 Remark: The intellectual content of mathematics lies in human ideas.

Lakoff/Núñez [173], page xi, makes the following comment on the lack of intellectual content in mathematical
symbols.
Mathematics is seen as the epitome of precision, manifested in the use of symbols in calculation
and in formal proofs. Symbols are, of course, just symbols, not ideas. The intellectual content of
mathematics lies in its ideas, not in the symbols themselves. In short, the intellectual content of
mathematics does not lie where the mathematical rigor can be most easily seen—namely, in the
symbols. Rather, it lies in human ideas.
2.3.8 Remark: Classification of ontologies for mathematics.

Ontologies for mathematics are often categorized in an effort to try to bring order to a chaotic jumble of
proposals by numerous authors. Various mathematics ontology category lists may be found in the literature.
Among the numerous categories are the following.

(1) Plato’s theory of ideas. Mathematics exists in a “timeless realm of being”. (See Section 2.4.)
(2) Mathematics exists in the machinery of the universe.
(3) Mathematics exists in the human mind.
2.3.9 Remark: The physical-universe-structure mathematics ontology.

The ontology category (2) in Remark 2.3.8 is exemplified by the following comments in Lakoff/Núñez [173],
page xv. (The authors then proceed to cast scorn on the idea.)
Mathematics is part of the physical universe and provides rational structure to it. There are
Fibonacci series in flowers, logarithmic spirals in snails, fractals in mountain ranges, parabolas in
home runs, and π in the spherical shape of stars and planets and bubbles.
Later, these same authors say the following (Lakoff/Núñez [173], page 3).
How can we make sense of the fact that scientists have been able to find or fashion forms of
mathematics that accurately characterize many aspects of the physical world and even make correct
predictions? It is sometimes assumed that the effectiveness of mathematics as a scientific tool shows
that mathematics itself exists in the structure of the physical universe. This, of course, is not a
scientific argument with any empirical scientific basis.
[. . . ] Our argument, in brief, will be that whatever “fit” there is between mathematics and the
world occurs in the minds of scientists who have observed the world closely, learned the appropriate
mathematics well (or invented it), and fit them together (often effectively) using their all-too-human
minds and brains.
There is very great certainty that mathematics does exist in the human mind. This mathematics corresponds
to observations of, and interactions with, the physical universe. But those observations and interactions are
extremely limited by the human sensory system and the human cognitive system. We cannot be at all certain
that the mathematics of our physical models is inherent in the observed universe itself. We can only say
that our mathematics is highly suited to describing our observations and interactions, which are themselves
very limited by the channels through which we make our observations.
The correspondence between models and the universe is pretty good at times, but we view the universe
through a cloud of statistical variations in all of our measurements. All observations of the physical universe
require statistical inference to discern the noumena underlying the phenomena. This inference may be
explicit, with reference to probabilistic models, or it may be implicit, as in the case of the human sensory
system.
2.3.10 Remark: The mathematical mind ontology leads to a useful avenue of research.
One may look at the options for the ontology of mathematics in Remark 2.3.8 in terms of their consequences
for research. If one accepts option (1) (the mathematical heaven) or option (2) (the mathematical universe),
the source of mathematics is in both cases inaccessible to practical research. Both are dead ends for research.
By contrast, if one accepts option (3) (the mathematical mind), one is led to inquire into the history of
the development of mathematical ideas in the human mind. The history of models, propositions, numbers
and sets can be inferred to some extent from biology, zoology, palaeontology, archaeology, history and
anthropology. In particular, the development of these building blocks of mathematical thought over the last
million years can be inferred as we have progressed from monkeys to pre-linguistic humans, to linguistic
humans, to neolithic farmers, to the earliest literacy, and so forth. The “deconstruction” of our modern
mathematical ideas in this historical fashion brings some credible clarification of their nature.
2.4. Plato’s theory of ideas

2.4.1 Remark: Sets and numbers exist in minds, not in a mathematics-heaven.
The Platonic style of ontology is explicitly rejected in this book. Sets and numbers, for example, really
exist in the mind-states and communications among human beings (and also in the electronic states and
communications among computers and between computers and human beings), but sets and numbers do not
exist in any “mathematics heaven” where everything is perfect and eternal. Any such “heaven for ideas” is
located in the human mind, if anywhere, and it is neither perfect nor eternal. Plato’s ideal “Forms” really
do exist, but only in the human mind.

The idea that all imperfect physical-world circles are striving towards a single Ideal circle form in a perfect
Form-world is quite seductive. (For discussion of Plato’s “theory of ideas”, see for example Russell [186],
Chapter XV, pages 135–146; Foley [172], pages 81–83.)
[ Remarks 2.4.2 and 2.4.3 are very similar to Remark 2.11.19. Should combine or at least collocate them. ]
2.4.2 Remark: A plausible argument in favour of a Platonic-style ontology for mathematics.

A plausible argument may be made in favour of Platonic ontology for the integers in the following form.
(1) Numbers written on paper must refer to something.
(2) Numbers do not correspond exactly to anything in the sensible world.
(3) Therefore numbers correspond exactly to something which is not in the sensible world.
Here the word “sensible” means “able to be sensed”, e.g. via sight or hearing. The same overall form of
argument is applied to geometrical forms such as triangles in this passage from Russell [186], page 139.
In geometry, for example, we say: ‘Let ABC be a rectilinear triangle.’ It is against the rules to ask
whether ABC really is a rectilinear triangle, although, if it is a figure that we have drawn, we may
be sure that it is not, because we can’t draw absolutely straight lines. Accordingly, mathematics
can never tell us what is, but only what would be if. . . There are no straight lines in the sensible
world; therefore, if mathematics is to have more than hypothetical truth, we must find evidence
for the existence of super-sensible straight lines in a super-sensible world. This cannot be done by
the understanding, but according to Plato it can be done by reason, which shows that there is a
rectilinear triangle in heaven, of which geometrical propositions can be affirmed categorically, not
hypothetically.
Although microscopy, chemistry and quantum mechanics convince us very easily that there is no such thing
as a straight line, it is a little more difficult to make the case that the “cardinality of collections” is ill-defined
in the sensible world. Each integer is supposed to correspond to the cardinality of real-world collections of
objects, or so we are told in our infancy. Even chimpanzees can count. But serious doubt may be thrown
on this idea. So here is the above 3-step argument for integers.
(1) If you write down the numbers 1, 2 and 3 on a piece of paper and ask: “Are these numbers?”, many
people will say that they are. But if you write down the word “dog” and ask if this is a dog, they
will almost certainly say it is not. The word “dog” refers to an external entity. In the same way, the
numbers written on paper also refer to entities other than the ink on the paper. So what do written
numerals refer to?
The written numerals are only ink or graphite smeared onto paper. If billions of people are writing these
symbols, they must mean something. They must refer to something in the world of experience of the people
who write the symbols to communicate with each other.
(2) Nothing in the observed physical world corresponds exactly to numbers. The number of cows in a field
may seem to be 3, for example, but if a cow dies, the number of cows will vary over time. There will be
a point in time at which it is not clear whether there are 2 or 3 cows because one of them is undergoing
a death process. Even in the case of humans, there is enormous controversy over the many definitions
of when death has occurred. If a cow is born, the number of cows increases, but there are times when
the cow count is ambiguous. There is frequent animated controversy over the time-point at which a cow
(or human) may be said to come into existence. Likewise, there is ambiguity about whether a cow is
inside a given field or outside it. If the cow is entering or leaving the field via a hole in the fence, the
time of entry or exit is subjective. Quantum mechanics muddies the waters still further.
(3) Since nothing in the physical world corresponds exactly to the idea of a number, and written numbers
must refer to something, there must be a non-physical world where numbers exist. This non-physical
world must be perceived by all humans because otherwise they could not communicate about numbers
with each other. This proves the existence of a non-physical world where all numbers are perfect and
eternal. This world may be referred to as a “mathematics heaven”. This number-world can be perceived
by the minds of human beings.
Convincing argument, isn’t it? Well, maybe not. The integers may (and probably do) refer to specific kinds
of contents of the human mind. But these contents can be sensed by the mind, although they are not sensed
externally. In other words, it is not necessary to invent some sort of spooky, aetherial, all-pervasive
meta-universe which can be detected by deep contemplation and meditation. Numbers (and rectilinear triangles)
are just part of the normal thought processes of normal human beings living normal lives. Nothing spooky.
2.4.3 Remark: Possible non-existence of sensible objects.
It may be remarked that the second part of the argument in Remark 2.4.2 implies in particular that the
integers 0 and 1 have no reliable correspondence to the “sensible world” either. So both existence and
uniqueness of objects are thrown into doubt. (This is because existence means that the cardinality of a
collection is at least 1 whereas uniqueness means that the cardinality is at most 1.) A cow may be defined
as a region of space-time which has an approximate boundary, but this kind of definition is not free of
ambiguity.
In order to count up to 1 cow, the observer must be able to
(i) see a cow as more than just a two-dimensional pattern of light;
(ii) distinguish one cow from another similar-looking cow;
(iii) identify the cow at each point in time as being the same cow;
(iv) distinguish cows from non-cows;
(v) not count the same cow twice by accident.
The difficulty of programming computers to perform these tasks shows that even counting to 1 is non-trivial.
One may think of the process of counting collections of objects as a kind of analogue-to-digital conversion
(ADC). The engineering of systems which perform ADC is a well-developed art, but there are always boundary
cases which are difficult to convert. Mathematics and logic are performed on the converted perceptions, not
on real things themselves.
In particular, the geometrical concept of a “point with no extent” is a “digital” concept which exists only
in the mind, but this concept is the result of conversion of “analogue” observations. Likewise, the “digital”
or “discrete” nature of human language requires some sort of ADC to convert fuzzy ideas inside minds into
particulate language.
2.4.4 Remark: Single object with a trajectory over time versus time sequence of objects.
Requirement (iii) in Remark 2.4.3 implies that as a cow develops over life, it is supposed to be the same cow.
This is despite the fact that the cow does change over that time, and a large proportion of the matter in the
cow is exchanged during life. In the case of a car whose parts have all been replaced at least once during its
lifetime, one often still regards that car as being the same car, having the same “identity”.
The same identification issue arises in the case of nations. It is generally assumed that a nation at some
point in history may be identified with the nation of the same name 150 years later, although all of the
individuals have been “exchanged”. This is sometimes called the “continuity and succession” issue. This
arises especially in discussions of debts and rights. (For example, Russia was regarded as the “successor state”
to the Soviet Union in 1991, inheriting its rights and obligations even though the territory and population
were significantly altered.)
Similarly, companies merge and split, and employees come and go, but somehow the identity of companies
must be determined. As another example, if a bacterium divides, it is difficult to say whether it died at that
time, or half of its identity is given to each of its offspring, or its full identity is given to each of the offspring.
A very similar issue arises in the design of databases. One must often decide what constitutes an “object”.
Some attributes determine the identity of an object, whereas other attributes are changeable without chang-
ing the identity. Sometimes the determination of object identity is easy. Sometimes it is almost impossible.
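
The database view of identity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Cow`, the identity attribute `tag_id` and the changeable attributes `weight_kg` and `field_name` are hypothetical names chosen for the example, and the decision that the tag alone determines identity is exactly the kind of design choice discussed above, not a fact about cows.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Cow:
    # Hypothetical identity attribute: assumed fixed for the life of the object.
    tag_id: str
    # Changeable attributes: these may vary without changing the identity.
    weight_kg: float = 0.0
    field_name: str = ""

    def __eq__(self, other):
        # Two records denote "the same cow" if and only if their tags agree.
        return isinstance(other, Cow) and self.tag_id == other.tag_id

    def __hash__(self):
        return hash(self.tag_id)


calf = Cow("tag-0042", weight_kg=40.0, field_name="north")
adult = Cow("tag-0042", weight_kg=550.0, field_name="south")
other = Cow("tag-0043", weight_kg=550.0, field_name="south")

same_cow = (calf == adult)               # same identity, different attributes
different_cow = (adult == other)         # same attributes, different identity
herd_size = len({calf, adult, other})    # counting depends on the identity rule
```

With this rule the herd is counted as two cows. A different choice of identity attribute would give a different count, which is the point of the remark.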
The implication of these comments for mathematics, and for logic generally, is that one should not take too
seriously the idea that numbers and sets have their origins in the perceived universe. Numbers and sets are
human concepts for organizing perceptions. Numbers and sets may be thought of as part of the “commentary”
by the human mind on the world as perceived. When a child is shown 3 objects and is told that this is the
number 3, this does not mean that the number 3 is in the real world. The number 3 is something which
arises in the child’s mind when 3 objects are shown. And the definition of “3 objects” is whatever triggers
the adult’s mental concept of “3 objects”. The adult and the child are sharing the same commentary on the
world as perceived. But the fact that two people experience the same thing under the same circumstances
does not imply that that experience is “out there” in the environment.

2.4.5 Remark: Descartes and Hermite supported the Platonic ontology for mathematics.
Descartes and Hermite supported the Platonic Forms view of mathematics. Bell [191], page 457, published
the following comment in 1937.
Hermite’s number-mysticism is harmless enough and it is one of those personal things on which
argument is futile. Briefly, Hermite believed that numbers have an existence of their own above
all control by human beings. Mathematicians, he thought, are permitted now and then to catch
glimpses of the superhuman harmonies regulating this ethereal realm of numerical existence, just
as the great geniuses of ethics and morals have sometimes claimed to have visioned the celestial
perfections of the Kingdom of Heaven.
It is probably right to say that no reputable mathematician today who has paid any attention
to what has been done in the past fifty years (especially the last twenty five) in attempting to
understand the nature of mathematics and the processes of mathematical reasoning would agree
with the mystical Hermite. Whether this modern skepticism regarding the other-worldliness of
mathematics is a gain or a loss over Hermite’s creed must be left to the taste of the reader. What is
now almost universally held by competent judges to be the wrong view of “mathematical existence”
was so admirably expressed by Descartes in his theory of the eternal triangle that it may be quoted
here as an epitome of Hermite’s mystical beliefs.
“I imagine a triangle, although perhaps such a figure does not exist and never has existed any-
where in the world outside my thought. Nevertheless this figure has a certain nature, or form, or
determinate essence which is immutable or eternal, which I have not invented and which in no way
depends on my mind. This is evident from the fact that I can demonstrate various properties of
this triangle, for example that the sum of its three interior angles is equal to two right angles, that
the greatest angle is opposite the greatest side, and so forth. Whether I desire it or not, I recognize
very clearly and convincingly that these properties are in the triangle although I have never thought
about them before, and even if this is the first time I have imagined a triangle. Nevertheless no one
can say that I have invented or imagined them.” Transposed to such simple “eternal verities” as
1 + 2 = 3, 2 + 2 = 4, Descartes’ everlasting geometry becomes Hermite’s superhuman arithmetic.
2.4.6 Remark: Bertrand Russell abandoned the Platonic mathematics ontology.
Bertrand Russell once believed in the ideal Forms ontology, but later rejected it. Bell [190], page 564, said the
following about Bertrand Russell’s change of mind.
In the second edition (1938) of the Principles, he recorded one such change which is of particular
interest to mathematicians. Having recalled the influence of Pythagorean numerology on all subse-
quent philosophy and mathematics, Russell states that when he wrote the Principles, most of it in
1900, he “shared Frege’s belief in the Platonic reality of numbers, which, in my imagination, peopled
the timeless realm of Being. It was a comforting faith, which I later abandoned with regret.”
There is no need to find a location for mathematics other than the human mind. Perfectly circular circles,
perfectly straight lines and zero-width points all exist in the human mind, which is an adequate location
for all mathematical concepts. There is no need to postulate an extrasensory perception by which the mind
perceives perfect geometric forms. The perfect forms are inside the mind and are perceived by the mind
internally. The perfect forms are real in the sense that CGI graphics of dinosaurs are real.
2.4.7 Remark: The location of mathematics is the human brain.
Lakoff/Núñez [173], page 33, has the following comment on the location of mathematics.
Ideas do not float abstractly in the world. Ideas can be created only by, and instantiated only in,
brains. Particular ideas have to be generated by neural structures in brains, and in order for that
to happen, exactly the right kind of neural processes must take place in the brain’s neural circuitry.
Lakoff/Núñez [173], page 9, has the following similar comment.
Mathematics as we know it is human mathematics, a product of the human mind. Where does
mathematics come from? It comes from us! We create it, but it is not arbitrary—not a mere
historically contingent social construction. What makes mathematics nonarbitrary is that it uses
the basic conceptual mechanisms of the embodied human mind as it has evolved in the real world.
Mathematics is a product of the neural capacities of our brains, the nature of our bodies, our
evolution, our environment, and our long social and cultural history.

2.5. Sets as parameters for socio-mathematical network communications

2.5.1 Remark: Ontological questions disappear in the anthropological viewpoint.
The “socio-mathematical network” idea referred to in this section is essentially an anthropological view
of mathematics. We can study human mathematical behaviour with the same mind-set as when studying
animal communication behaviour. In this perspective, most philosophical questions about meanings of
mathematical concepts disappear. Within the anthropological perspective, mathematical concepts exist
only in models which are built to explain the observed behaviour. In other words, we seek models to explain
the externally observed behaviour whereas in the more abstract philosophy of mathematics, one searches for
external objects with which the mathematical concepts can be associated.
2.5.2 Remark: Set theory and logic are only languages, not the real stuff of mathematics.
The principal assertion of this section is that set theory and mathematical logic are merely languages for
the communication of mathematical ideas rather than the true “stuff” of mathematics. Mathematical mind
states and activity are the true stuff of mathematics.
A mathematical definition of a system such as a tensor space is not really a definition of the system. It
is merely a set of construction instructions and interoperability tests which characterize the system to be
constructed.
2.5.3 Remark: Ontology of mathematics based on mind-states in a socio-mathematical network.

In this section, a model is sketched for an informal ontology of mathematics. According to this “socio-
mathematical network” model, mathematical entities (such as numbers and sets) are identified with retriev-
able data-items in mind-states which result from a process of synchronization of a network of mathematical
thinkers. The notations of mathematics are parameters of a communications protocol between participants
in the network. The network protocol also includes indications of object classes.
Whether or not real-life mathematics is done literally in this fashion is not important. An ontological model
is only an agreed “reference model” which acts as a target for the semantics of a language.
2.5.4 Remark: There is no standard notation for ontological classes of mathematical objects.
As mentioned in Section 2.6, sets alone are not sufficient to convey the full meaning of mathematical entities.
Each mathematical object represented by a set must be explicitly or implicitly tagged within each context
with an indication of the class to which the object belongs.
This raises the difficult question of how to communicate classes of objects. A language mechanism is required,
for example, to indicate the difference between a left transformation group and a right transformation group
which have identical set representations. (This clash is mentioned in Remarks 5.16.6 and 33.8.8.) How is
the difference between the ordered pair (0, 1) ∈ ℝ² and the complex number i = (0, 1) ∈ ℂ to be indicated?
Should there be an international standards body to maintain a register of mathematical object classes?
Should there be a standardized notation for mathematical class ontology? How would the meaning of such
a notation be communicated and standardized?
[ Maybe category theory gives a standardized notation for a large number of mathematical object classes? ]
At present there is (probably) no international standard notation for mathematical class ontology. Any such
notation would need to be boot-strapped with recursively lower-level concepts until some sort of “a-priori
knowledge” level is reached.
The central claim which is made in this section is that mathematical object classes, which provide the missing
semantic content for set constructions, are currently standardized informally by means of socio-mathematical
network synchronization processes. There is nothing very unusual about these processes. Abstract concepts
such as beauty, truth, pain, pleasure, hope and despair are standardized by means of informal social network
communication through immersion in the culture.
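
The need for class tags may be illustrated by a minimal Python sketch. Everything named here is an assumption made for the example: the wrapper `Tagged`, the tag strings "R2" and "C", and in particular the componentwise product chosen for the class "R2", which is an arbitrary illustrative operation and not a standard structure on ℝ².

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    cls: str      # hypothetical class tag, e.g. "R2" or "C"
    data: tuple   # the bare set-style representation


def multiply(x, y):
    """Multiply two tagged objects; the meaning depends on the class tag."""
    if x.cls != y.cls:
        raise TypeError("objects belong to different classes")
    a, b = x.data
    c, d = y.data
    if x.cls == "C":
        # Complex multiplication: (a + bi)(c + di).
        return Tagged("C", (a * c - b * d, a * d + b * c))
    if x.cls == "R2":
        # Componentwise product, chosen only for illustration.
        return Tagged("R2", (a * c, b * d))
    raise TypeError("unknown class tag")


pair = Tagged("R2", (0, 1))
i = Tagged("C", (0, 1))

same_representation = (pair.data == i.data)   # identical bare representations
same_object = (pair == i)                     # distinguished by the class tag
i_squared = multiply(i, i)
pair_squared = multiply(pair, pair)
```

The two objects have the same bare representation but behave differently: `i_squared` is the tagged pair (-1, 0), whereas `pair_squared` is the tagged pair (0, 1).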
2.5.5 Remark: Indication of word categories in ancient Egyptian hieroglyphics.

It is perhaps noteworthy that in ancient Egyptian hieroglyphics, concrete words were represented by pictures
of the things referred to whereas abstract words were written as phonetic symbols followed by a special cat-
egory symbol (a rolled-up papyrus) which indicated abstract nouns. (For Egyptian abstract determinatives,
see [198], pages 5–6; [215], pages 80–83.) The papyrus symbol following a word signified that the idea could
not be shown in a picture, but only written in terms of its pronunciation on papyrus. Dogs can be drawn,
but mental states are abstract, invisible entities which can be referred to only by words (for example on
papyrus).
Some modern languages, such as Vietnamese, have a similar system of category-words which precede ordinary
words to disambiguate them. In English, the categories of words are indicated informally. In mathematics,
the category is usually indicated informally also.
2.5.6 Remark: Understanding mathematics requires prior capability to represent the ideas.
It is an interesting question whether an alien species from another galaxy could decipher and really under-
stand our mathematics written on Earth. Probably alien minds would require similar internal processes to
humans in order to understand mathematical texts as anything more than a formal system. The inability to
see the similarity between 4 chickens and 4 asteroids (i.e. the concept of cardinal numbers) would probably
exclude any understanding of human mathematics. Mathematical notations must stimulate in the human
reader some pre-existing capability to internally represent the ideas indicated. Integers on paper must evoke
some kind of “essence of integers”. Otherwise mathematics is as meaningless as ancient Minoan tablets.
Harmonization of such “essences” requires a social network to make words and symbols correspond to com-
monly agreed meanings. This process is as reliable, and unreliable, as social agreement on the meanings of
natural languages.
2.5.7 Remark: The difference between internal representations and communication languages.
Figure 2.5.1 illustrates the role of set theory as a communication protocol between mathematicians, as
opposed to a language for referring to mathematical objects. Set theory plays a role similar to the Internet
protocols for communicating between computers.
[ Figure 2.5.1: mind A and mind B exchange the problem 3 + 1, the answer 4 and the test problem 2 + 2 via the
set theory protocol. Mind A holds the internal forms 3# + 1#, 4# and 2# + 2#; mind B holds 3## + 1##, 4## and
2## + 2##. Within each mind, a “construction” step links the problem to the answer and a “rule test” step links
the answer to the test problem. ]
Figure 2.5.1 Commutative diagram for socio-mathematical synchronization
Mathematical language steers the reader or listener towards the thoughts which the writer or speaker is
trying to communicate. Mathematical language does not describe the destination conclusions, but rather
the navigation directions to get to those conclusions. Thus mathematics is more a “steering language” than
a “location language”. In other words, mathematical language sends “difference signals” to alter the reader’s
or listener’s thoughts rather than absolute signals. It follows that the sender and receiver of this language
must have agreed starting points.
By comparison, the languages used for storing data inside computers are quite different to network protocol
languages. For example, C++ and Fortran are programming languages which are used inside a single
computer, whereas HTTP and SNMP are network languages for communication between computers “on the
wire”. The programming languages specify data in a fairly static way whereas the network languages consist
of a sequence of instructions which direct the recipient to dynamically change state in some way. The “state
transition rules” for network languages steer the receiving computer towards a specified state.
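
The dependence of a “steering language” on agreed starting points can be sketched as follows. This is a toy model only: the function `apply_steps` and the message vocabulary "add" and "mul" are hypothetical and do not correspond to any real network protocol.

```python
def apply_steps(start, steps):
    """Apply a sequence of relative ("difference") instructions to a state."""
    state = start
    for op, arg in steps:
        if op == "add":
            state += arg
        elif op == "mul":
            state *= arg
        else:
            raise ValueError(f"unknown instruction: {op}")
    return state


# The sender transmits directions, not the destination itself.
steps = [("add", 1), ("mul", 2)]

agreed = apply_steps(3, steps)       # receiver shares the sender's starting point
mismatched = apply_steps(4, steps)   # receiver starts from a different state
```

The same message leads to the state 8 from the agreed starting point 3, but to the state 10 from the starting point 4.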

2.5.8 Remark: Synchronization of different implementations of the integers.

Consider the simple situation illustrated in Figure 2.5.1 where two minds A and B have already established
an agreement on the nature and usage of the integers 1, 2 and 3. Mind A has internal representations 1#, 2#
and 3#. Mind B has internal representations 1##, 2## and 3##. Suppose mind A wishes to introduce the concept
of the number 4 into the discussion. Then mind A could send a definition of 4 to mind B as 4 = 3 + 1.
Although mind B has a different internal representation of the numbers 3 and 1 to mind A, the operation
of addition may be carried out to obtain 4## from 3## + 1##.
To verify that the rules of addition hold, mind A may send a test problem to mind B such as “2 + 2 = ?”.
If the result of the addition 2## + 2## is the same as 3## + 1##, the addition rules are preserved. To confirm this
to mind A, the internal number 4## is translated to the external number 4 and sent back to mind A, which
converts this to internal number 4#.
In a more systematic way, mind A could send the complete set of rules for the number system {1, 2, 3, 4} for
verification by mind B. If all rules of the system are verified, the two minds have a consistent definition of
the system. They can then proceed to each prove theorems about their number systems, confident that all
operations will yield the same result at both ends of the communication path.
The number 4 might be represented by a child as a symbol ‘4’ drawn on paper (or a drawing of 4 chickens),
by an ancient Roman as ‘IV’, by a mathematician as 3+ or {0, 1, 2, 3} or {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}},
and by a computer as a sequence of voltages of the form 00000100 or 11111011.
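
The exchange described in this remark may be sketched in Python under some loudly stated assumptions. The classes `TallyMind` and `SetMind` are hypothetical stand-ins for minds A and B: the first stores a number as a string of tally marks, the second as a von Neumann ordinal built from nested frozensets. Decimal numeral strings play the role of the external protocol, and Python's built-in `int` is used only to read numerals "on the wire".

```python
class TallyMind:
    """Hypothetical mind A: a number is a string of tally marks."""

    def decode(self, numeral):          # external numeral -> internal form
        return "|" * int(numeral)

    def encode(self, internal):         # internal form -> external numeral
        return str(len(internal))

    def add(self, x, y):
        return x + y                    # concatenate the tallies


class SetMind:
    """Hypothetical mind B: a number is a von Neumann ordinal."""

    def decode(self, numeral):
        n = frozenset()
        for _ in range(int(numeral)):
            n = n | {n}                 # successor: n union {n}
        return n

    def encode(self, internal):
        return str(len(internal))       # the ordinal n has exactly n elements

    def add(self, x, y):
        result = x
        for _ in range(len(y)):
            result = result | {result}  # apply the successor len(y) times
        return result


def solve(mind, problem):
    """Receive a problem such as "3+1", compute internally, reply externally."""
    left, right = problem.split("+")
    return mind.encode(mind.add(mind.decode(left), mind.decode(right)))


mind_a, mind_b = TallyMind(), SetMind()
definition_of_4 = solve(mind_b, "3+1")   # mind B constructs its own 4
test_answer = solve(mind_b, "2+2")       # rule test sent by mind A
synchronized = (definition_of_4 == test_answer == solve(mind_a, "2+2"))
```

The internal objects of the two minds are never compared directly. Only the external numerals are compared, which is all that the communication path permits.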
2.5.9 Remark: Mind-dependent addition and equality.

Perhaps notations such as +#, +##, =# and =## should be used in Remark 2.5.8 to emphasize that these
operations are mind-dependent. These superscripts have been suppressed in the interests of tidiness.
2.5.10 Remark: The phenomena/noumena and language/representation dichotomies are similar.

In the case of mathematics, we can see the inter-person network languages, but not the internal represen-
tation “languages”. The distinction between phenomena and noumena is similar to the distinction between
observable languages and internal representations. Phenomena are the observations of things, which are an
interaction between observers and observed things. Noumena are the things themselves, for which we can
create speculative models which match the phenomena. In the same way, the languages which are used for
mathematical interactions between individuals are directly observable, whereas the internal representations
of mathematical objects within individuals cannot be observed, but we can create models (in our own minds)
of the representations of mathematical objects inside other people’s minds.
2.5.11 Remark: The definition of equality is the responsibility of individual applications.

The difference between the traffic on the mathematics communication channels and the implementation of
mathematical concepts in human minds corresponds fairly well to the difference between abstract mathe-
matical logic and concrete mathematical logic as discussed in Remark 4.12.3.
For example, when the equation “x = ∅” is written in abstract logic, this means that the concrete object
which is pointed to by the variable name x is the same as the object which is pointed to by the constant
name ∅. The meaning of equality is the responsibility of the mind which implements the set concepts. For the
purposes of the communications channel, it is only important that whatever equality relation is implemented
in the concrete logic inside minds must satisfy the abstract requirements (reflexivity etc.) which are assumed
at the linguistic level.
When a concrete equality relation $\stackrel{c}{=}$ is exported from the concrete level inside the mind as an abstract
linguistic-level equality relation $\stackrel{a}{=}$ “on the wire”, it must possess the properties of an equality relation for
any choice of map $\mu_{V_k} : N_V \to V_k$ from the variable abstract name space $N_V$ to the concrete object spaces $V_k$.
(See Figure 2.5.2.) What is called “equality” at the abstract linguistic level is probably more accurately called
“identity” at the concrete level. (See Remark 4.15.1 for related comments.)
2.5.12 Remark: Synchronization of different internal representations of tensor products

More complex constructions, such as tensor products of linear spaces or tangent spaces on differentiable
manifolds, may follow the same basic pattern as for integers. Mind A may construct a tensor product from
two linear spaces which both minds have previously synchronized. Then mind A may send a request to
mind B to construct tensor products of individual vector pairs by formally juxtaposing them. It may be
that mind B already has a physical intuition for what a tensor product might be, but it is not necessary.

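[ Figure 2.5.2: at the abstract linguistic level, minds 1 and 2 exchange the statement X = Y (abstract equality).
Mind 1 realizes it in object space V1 as the concrete relation µV1(X) =1 µV1(Y); mind 2 realizes it in object
space V2 as the concrete relation µV2(X) =2 µV2(Y). ]
Figure 2.5.2 Abstract and concrete equality/identity relations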
The vector products may be internally represented by mind B as symbolic juxtapositions of pairs of internal
representations of vectors. When this has been done by mind B, mind A should send some tests to determine
whether mind B has a consistent definition. For example, mind A could send pairs v1 ⊗ v2 and (2v1) ⊗ (0.5v2)
for some vectors v1 and v2. Then mind B should obtain the same answer for each tensor product, namely
v1# ⊗ v2# = (2v1#) ⊗ (0.5v2#). If all of the rules of a tensor product are satisfied, then all relevant properties and
relations of the tensor product space will be the same for both minds because the tensor product rules are
a complete characterization (as discussed in Section 13.5).
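The test described in this remark can be illustrated by a minimal sketch. It assumes, purely for illustration, that mind B represents the tensor product of two coordinate vectors as the array of pairwise component products. The helper names tensor and scale and the sample vectors are hypothetical, not part of the text.

```python
from fractions import Fraction

def tensor(v, w):
    """Outer-product representation of v (x) w (an assumed internal representation)."""
    return tuple(tuple(a * b for b in w) for a in v)

def scale(c, v):
    """Scalar multiple c*v of a coordinate vector."""
    return tuple(c * a for a in v)

# Hypothetical test vectors sent by mind A.
v1 = (Fraction(1), Fraction(-3))
v2 = (Fraction(2), Fraction(5), Fraction(7))

# Mind B should obtain the same answer for v1 (x) v2 and (2 v1) (x) (0.5 v2).
lhs = tensor(v1, v2)
rhs = tensor(scale(2, v1), scale(Fraction(1, 2), v2))
same_answer = (lhs == rhs)
print(same_answer)

# A representation by bare pairs would fail the same test.
naive_pairs_agree = ((v1, v2) == (scale(2, v1), scale(Fraction(1, 2), v2)))
print(naive_pairs_agree)
```

Passing one such test does not establish that all tensor product rules hold. It only raises confidence, in the same way as the tests for integers.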
2.5.13 Remark: Axioms are an efficient way of synchronizing infinitely many objects.
The kind of synchronization of definitions which is described in Remark 2.5.8 is clearly not feasible for infinite
systems, and is quite inefficient even for finite systems. This is why axioms are a good idea. The purpose
of axioms is to synchronize an arbitrarily large number of properties and relations among participants in
a socio-mathematical discussion. Axioms are typically templates for an infinite number of properties or
relations. Axioms require only finite bandwidth to communicate an infinite amount of information. This
gets around the finite bandwidth limitation of human communications.
If it can be shown that all systems which obey a particular set of axioms are isomorphic in some sense, this
builds confidence that two people are essentially talking about the same thing if their systems obey the par-
ticular axioms. This gets around the problem that each human may have a different internal representation
of each object or class of object.
For the positive integers, the classic axiomatic system is due to Peano. (See for example Definition 7.3.2.)
If two people agree on the Peano axioms and the rules of logic, any deduction made from the axioms and
rules by one person must (hopefully) be accepted by anyone else who accepts the axioms and rules. If the
axioms and rules imply uniqueness of the system of integers up to isomorphism, one can be fairly certain
that any theorem proved by one person will be accepted by all people in the socio-mathematical network.
This is the entire motivation behind the axiomatization of mathematical systems. The process is similar to
the standardization processes for the Internet and for audio cable connectors.
2.5.14 Remark: Jacquard loom programming analogy for mathematics synchronization.
The process of socio-mathematical synchronization may be compared to the synchronization of Jacquard
loom designs by exchanging punch cards and loom outputs in the post. (The Jacquard loom was invented in
1790 or 1801. See for example Guinness encyclopedia [204], pages 319, 335.) If loom designer A sends a set
of programming cards for the loom through the post to loom designer B, who uses the cards to weave some
cloth which is sent back through the post, loom designer A may compare the cloth with what is produced
locally. If the result is the same, this increases the level of confidence that the two machines work in the
same way. One machine may be wooden and the other metallic, but as long as they produce the same output
for the same inputs, both loom designers can be confident that whenever they create a set of programming
cards for their own machine, it will produce the same cloth on the other machine a long distance away. The
same synchronization process is involved in mathematics, where each mathematician has a high confidence
that their deductions will be accepted by other mathematicians, because they all test their own thinking
processes against standard axioms and deductive methods.
Mathematical theorems are not absolute knowledge any more (or less) than Jacquard loom punch cards are
absolute knowledge. Mathematical theorems apply only to the machines, i.e. the minds, which obey the
same calculation rules.
2.5.15 Remark: Inference of virtual machines by observing mathematical communications.
By observing the communication of mathematics between people, one can infer something about the nature
of the “virtual machines” which participate in the mathematical network. The inferred virtual machines
must be able to store information which is received on the communication channel for later transmission.
The machines must be able to somehow represent all of the data structures described in the communications
and the parameters of those structures must be modifiable in response to incoming information.
One can infer that the stored data structures corresponding to observed communications of mathematics
must have at least as much information content as the structures which are observed on the channel. What
one cannot infer is that the data representation in the mind has a direct relation to the way in which
mathematics is written.
2.5.16 Remark: Mathematics is a craft, not a science.

It may be concluded that mathematics is not a science but rather a mental and social craft. The mathematics
craft requires both psychological processes and social communications. It is impossible to settle questions
about the true nature of mathematics without reference to the human mind and social context. Therefore it
is useless to ask what the true tangent vector definition is or which set theory axioms are the true axioms.
You choose your axioms and your definitions according to your purposes and the community with which you
wish to communicate. Therefore mathematical logic may be regarded as a systematic study of a particular
range of human psychological and social behaviour.
2.5.17 Remark: The objects of mathematics are part of anthropology.

In terms of the diagram which was discussed in Remark 2.0.1, the socio-mathematical network model lies
in the discipline of anthropology. Geometric and arithmètic concepts may thus be explained in terms of
neuropsychology and neurophysiology. These may in turn be explained in terms of biology, chemistry and
physics. This completes the cycle of dependencies of the seven disciplines because physical theories are
expressed in terms of mathematics.
2.5.18 Remark: Axiom systems are abstracted from experience.

Axioms are often presented in textbooks as being the starting point for mathematical definitions. However,
axioms may also be regarded as merely a summary of experience. That is, the experience comes first and
the axioms are derived from experience. Bell [191], page 356, made the following comment regarding the
axioms (or “postulates”) of an abstract algebraic field.
Such a set of postulates may be regarded as a distillation of experience. Centuries of working
with numbers and getting useful results according to the rules of arithmetic—empirically arrived
at—suggested most of the rules embodied in these precise postulates, but once the suggestions of
experience are understood, the interpretation (here common arithmetic) furnished by experience
is deliberately suppressed or forgotten, and the system defined by the postulates is developed
abstractly, on its own merits, by common logic plus mathematical tact.
The unsystematic synchronization of the idea of a field (or any other mathematical system) amongst math-
ematicians may be replaced by a compact set of rules, which then becomes the basis of socio-mathematical
synchronization. Mathematical systems exist first in human experience, and are later formalized by axioms
and constructions for efficient communication with others. (The idea that mathematics is “empirically ar-
rived at” may be compared with the Einstein quotation in Remark 2.3.4 and the comments in Remark 2.3.5.)
Although the most basic systems, such as integers and real numbers, are derived from experience, mathe-
maticians often invent axiomatic systems simply out of curiosity or idle recreation. Such systems sometimes
develop a connection with real-world experience at a later time in history, but often these invented-on-paper
mathematical systems become sterile exercises, like roads leading into the desert, hotels without customers,
trains without passengers. This is not different to the situation of natural languages. The words of natural
languages are initially closely connected to experience, but speakers and writers may create copious speech
or text with no apparent connection to the real world—grammatically correct yet ontologically empty.
2.5.19 Remark: Axiomatization of mathematical systems discards the ontology.

When the “real stuff” of mathematics in the human mind is formalized through axioms and constructions
for easy, reliable communication and synchronization, the original experience and meaning of mathematical
objects is not communicated through the axioms and constructions. The communicable properties and
relations of mathematical objects enable correct computations to be made, but do not communicate what
the objects are. Therefore it is important to accompany definitions with humanly meaningful explanations
of their significance. The writer usually knows the significance; the reader sometimes does not know and
cannot easily guess.
2.5.20 Remark: Behaviourism versus introspection.

The behaviourist movement in psychology, which was dominant in the USA in the middle decades of the 20th
century, claimed that human introspection is an improper subject for serious study. Its proponents claimed that only
human behaviour which is observable by others can be studied scientifically. (One big hole in their argument
was the fact that the reports of the observers of the behaviour are as unreliable as the introspective reports
by the subjects being studied.) According to this view, the inner experience of individuals must be ignored
and a human being is considered to be no more than the sum total of their observable behaviours.
In a similar way, mathematics could be regarded as no more than an observable human behaviour where
streams of symbols pass between people. These symbols could be analysed for patterns without under-
standing their meaning for the participants. This perspective would be as erroneous as the behaviourist
perspective in psychology. It would be like regarding all of human history as no more than a series of texts
without regard to the world about which the texts were written.
2.5.21 Remark: Academic philosophy of mathematics. Embodied mind theories.

In the serious academic philosophy of mathematics, the viewpoint called “embodied mind theories” seems
to be very close to the ideas expressed in this chapter. (See Lakoff/Núñez [173] for this viewpoint.) That
view asserts that all of mathematics exists in the human brain, and that the instantiation of mathematics
may be explained through cognitive science.
2.6. Sets as parameters for classes of objects

2.6.1 Remark: Sets are merely “secondary keys” for mathematical objects.
In this section, it is proposed that sets are merely parameters for classes of mathematical objects, not the
objects themselves.
The question of what mathematical objects are, and where they reside, is discussed in Section 2.5. Even if
we accept that mathematical objects reside in human minds, it does not immediately follow that sets are
the primary objects. In fact, it is more likely that classes are primary, and sets provide merely a “secondary
key” (in the sense of computer databases) which tells you which object in a class is being indicated. The
reason for this proposal is essentially the idea that there are not enough sets to comfortably describe all of
the objects which are discussed in mathematics.
Maybe this section is stating the obvious, but the author thought for a long time that sets were the primary
objects of mathematics. However, although the map from mathematical objects to sets may be well-defined,
the inverse map from sets to mathematical objects is highly non-unique. This is why “tagging” of sets
by class and/or context is required. A student could quite reasonably suppose that sets are the primary
mathematical objects because teachers so often make statements like: “We define a group to be an ordered
pair (G, σ) where G is a set and the function σ : G × G → G has the following properties.” When a teacher
defines a concept to be a set, they are simply not stating the “obvious” fact that the definition is
context-dependent. Human communication would be very inefficient indeed if the full context were made explicit
in all statements.
2.6.2 Remark: Mathematics without semantics is literally meaningless.

Just as physical containers (like boxes, cups, saucepans and houses) are very useful in everyday life, but are
not so useful without other things to put inside them, so also sets are very useful in mathematics, but trying
to build all of mathematics with pure sets only is somewhat extreme. Yes, it can be done. But the semantic
content of sets is very limited indeed. Mathematics without semantics is literally meaningless. Giving
meaning to mathematical definitions requires something more than just containers. Meaning is normally
communicated through informal discussions which frame the set-based definitions. But perhaps definitions
can be upgraded and enhanced a little to communicate some semantics themselves from time to time. One
should not need to read or listen to a “second channel” to get the semantics of every definition.
The emptiness of pure set definitions suggests that other axiomatically defined systems should be permit-
ted within mathematics. For example, integers and real numbers may be independently axiomatized and
introduced into mathematics without being constructed themselves out of sets. Concepts such as tensor
spaces and tangent spaces can be defined axiomatically, and concrete constructions may be imported from
any source desired.
It should not be forgotten that sets are themselves merely models of some aspects of human perception of
the world. Even sets are “imported” into mathematical discussion from “somewhere else”. So it is not a
radical step to import other mathematical structures also. The mathematical logic and analysis of tensor
spaces and tangent spaces is independent of the source of importation of the structures. That logic and
analysis is the primary value in the study of such structures. The particularities of the representation are
secondary. Mathematics is more about methods than concrete constructions.
2.6.3 Remark: Sets provide a parameter space for classes of mathematical objects.
Sets may be regarded as concept parameters rather than the concepts themselves. A set indicates which
object is intended from among a given class of objects. A set on its own is not a full mathematical object.
Each object can only be fully specified if its class is indicated. The category of all sets may be regarded as
a huge parameter space for the concepts that mathematicians wish to discuss and think about.
Just as mathematics provides tools for modelling the physical world, so also set theory provides tools for
modelling mathematical thinking. There is more content in mathematical thinking than is captured in set
constructions. Like names of people, sets communicate which objects are intended, but do not provide a
complete description. Set theory provides a more or less adequate system for pointing to mathematical
objects, but it does not answer the question of what those objects really are.
Sets are no more the real concepts of mathematics than English-language words are the real things of the
world in which we live. (A related comment is made in Remark 2.3.5.)
2.6.4 Remark: Ubiquity of the empty set.

Consider the example of the empty set ∅. This set is a member of every topology. In a particular topological
space (X, T ), the meaning of the element ∅ ∈ T is “the empty subset of X”. In other words, the empty set
∅ indicates which element of a given set T is meant.
Consider topologies T1 and T2 for sets X1 and X2 respectively, where X1 ∩ X2 = ∅. The empty set is in
both T1 and T2. So T1 ∩ T2 = {∅} ≠ ∅. Intuitively one would consider that T1 ∩ T2 = ∅ because X1 and X2
have no association of any kind. All topologies for all topological spaces contain the empty set as a common
element. This is more serious than it seems. Suppose one wishes to define a function f on the set T1 ∪ T2 .
Then f must agree on the empty set of each of these two topologies. It is impossible, for example, to define
f (Ω1 ) = 1 for Ω1 ∈ T1 and f (Ω2 ) = 2 for Ω2 ∈ T2 . This suggests that the notation “∅” really means “the
empty subset of the set under consideration”. In other words, the set should be tagged with the containing
set. (And the containing set should be tagged with its object class.)
Lakoff/Núñez [173], page XIII, makes a related remark: “Why is there a unique empty class and why is it a
subclass of all classes? Indeed, why is the empty class a class at all, if it cannot be a class of anything?”
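The clash described in this remark may be reproduced in a few lines. The following is only a sketch: the two small topologies and the idea of tagging each open set with its containing set as a Python pair are illustrative assumptions, not constructions from this book.

```python
# Two topologies on disjoint sets X1 = {1, 2} and X2 = {3, 4} (hypothetical examples).
EMPTY = frozenset()
X1 = frozenset({1, 2})
X2 = frozenset({3, 4})
T1 = {EMPTY, frozenset({1}), X1}
T2 = {EMPTY, frozenset({3}), X2}

# The underlying sets are disjoint, yet the topologies share the element EMPTY.
shared = T1 & T2
print(shared == {EMPTY})

# Attempt to define f = 1 on T1 and f = 2 on T2 as a single mapping on T1 u T2.
f = {omega: 1 for omega in T1}
f.update({omega: 2 for omega in T2})
print(f[EMPTY])  # 2: the value 1 assigned via T1 has been overwritten

# Tagging each open set with its containing set removes the clash.
tagged_T1 = {(X1, omega) for omega in T1}
tagged_T2 = {(X2, omega) for omega in T2}
print(tagged_T1 & tagged_T2 == set())
```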
2.6.5 Remark: Ambiguity of zero-valued tangent operators. Context tags are needed.
A popular definition of tangent vectors on manifolds has a problem which is similar to the example in
Remark 2.6.4. As mentioned in Remark 27.6.1, if each tangent vector v at a point p in a manifold M is
represented as a first-order differential operator Dv on the space of continuously differentiable functions f :
M → ℝ, it transpires that the zero vector v = 0 is represented as exactly the same operator Dv at all
points p ∈ M . Many authors ignore this inconvenient fact, which implies that the tangent spaces at all
points of a manifold have a common element. This is an unfortunate by-product of the set construction
used for the representation. The zero operator D0 at each point p ∈ M really means “the zero vector in the
tangent space at p”. The set construction (i.e. the differential operator) parametrizes the elements of each
per-point tangent space, but the set construction on its own is insufficient to convey the intended meaning.
2.6.6 Remark: Clashes between objects represented as tuples.

As another example, the set A = {(0, 1)} (where 0 and 1 denote the usual real numbers) may represent
{i} ⊆ ℂ where i is the imaginary unit, or a set of real-number pairs A ⊆ ℝ², or A
may represent a partially defined function from ℝ to ℝ which maps 0 to 1. In practice, these kinds of set
construction clashes must be resolved by implicit tagging within each context.
Functions are usually thought of as a different category of object to sets, and yet for reasons of economy,
functions are represented as sets. The sense that a function is some sort of active machine producing outputs
from given inputs, whereas a set is a passive object, must be provided by the person who thinks about the
object. Within a particular context, it may be clear whether a set A is intended to be a set or a function,
but sometimes it is not, and sometimes there is a clash between identical sets which have different meanings
within the same context.
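The three readings of the set A = {(0, 1)} can be made concrete in a short sketch. The variable names are hypothetical, and Python's built-in complex, set and dict types stand in for the three intended classes. The point is that the interpretation is supplied by the code which reads the set, not by the set itself.

```python
# One and the same set of pairs.
A = frozenset({(0, 1)})

# Reading 1: a set of complex numbers, each pair being (real part, imaginary part).
as_complex_set = {complex(re, im) for (re, im) in A}

# Reading 2: a set of points in the plane.
as_points = set(A)

# Reading 3: a partially defined function which maps 0 to 1.
as_function = dict(A)

print(as_complex_set, as_points, as_function)
```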
2.6.7 Remark: Pathological clash of definitions for ordinal numbers.
Consider the set of integers. The integers 0, 1 and 2 may be defined as ∅, {∅} and {∅, {∅}} respectively.
(See Section 7.2.) The idea that the number 1 is a subset of the number 2 seems a little absurd. The best
way to think of this set representation of the integers is that it is a parametrization within a limited scope.
Otherwise the topological space (∅, {∅}) on the empty set would be identical to the pair (0, 1), which would
be perplexing. Clearly the integer 1 and the set {∅} are objects of two different classes, which just happen
to be parametrized by the same set within each class. Similarly, consider that an ordered pair such as (0, 1)
is defined as {{0}, {0, 1}}. So (0, 1) = {{∅}, {∅, {∅}}}, which also seems absurd.
A danger of defining integers to be sets is that if one, for instance, wishes to deal with sets of the form {A, b}
such that A ⊆ ℝ and b is an integer, then a set such as {∅, 0} could arise. But this would then satisfy
{∅, 0} = {∅} = {0} = 1. So a set which appears to have two elements in fact has only one. When all objects
are represented as sets, there is always a danger that these objects will clash. This suggests the idea of
“tagging” sets with a category label of some sort. Halmos [160], page 35, refers to such clashes of definitions
as “pathological”.
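The identities in this remark can be checked mechanically. The sketch below models pure sets with Python frozensets. The function names successor and kuratowski_pair are labels chosen here for illustration.

```python
def successor(n):
    """Successor in the set representation of integers: n + 1 = n u {n}."""
    return n | frozenset({n})

zero = frozenset()       # 0 = {}
one = successor(zero)    # 1 = {{}}
two = successor(one)     # 2 = {{}, {{}}}

def kuratowski_pair(a, b):
    """Ordered pair (a, b) represented as {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# The number 1 is a proper subset (and also an element) of the number 2.
print(one < two, one in two)

# The apparently two-element set {{}, 0} collapses to {{}} = {0} = 1.
collapsed = frozenset({frozenset(), zero})
print(len(collapsed), collapsed == one)

# The ordered pair (0, 1) is the set {{{}}, {{}, {{}}}}, whose elements are 1 and 2.
print(kuratowski_pair(zero, one) == frozenset({one, two}))

# The topological space ({}, {{}}) is the same pair of sets as (0, 1).
print((frozenset(), frozenset({frozenset()})) == (zero, one))
```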
2.6.8 Remark: Disambiguation of sets by implicit class tags in each usage context.
The idea of using sets as parameters of objects within mathematical classes is outlined in Section 5.16. In
such a framework, each mathematical object is indicated by a class/set pair.
One way to subtly clarify the class to which a set belongs is to refer to it as, for example, “the complex
number (0, 1)” or “the ordered pair (0, 1)”. Alternatively, one may write “(0, 1) ∈ ℂ” or “(0, 1) ∈ ℝ × ℝ”.
This kind of implicit class tagging is mostly adopted in this book (as in most mathematics books). It is
important not to forget that the real-number pair (0, 1) is not a complex number on its own.
Explicit tagging would look more like Complex(0, 1) or Real-tuple(0, 1), which would be tedious to write
and to read.
2.6.9 Remark: Inheritance of implicit class tags by constructed mathematical objects
The implicit class tags which are described in Remark 2.6.8 for mathematical objects are presumably inherited
by objects which are constructed out of them. The two component sets of the cross product ℂ × ℂ still have
the quality of being complex numbers even in the cross product, and the cross product ℝ² × ℝ² similarly
inherits the implicit class of each set ℝ². Otherwise the sets ℂ × ℂ and ℝ² × ℝ² would be the same.
If complex tuples and ℝ² tuples are given explicit tags as in the examples Complex-tuple((0, 1), (0, 2)) or
Real-tuple-tuple((0, 1), (0, 2)), one would expect the first-component projections to be Complex(0, 1) or
Real-tuple(0, 1) respectively. This suggests that the original tuple-tuples should have been formalized as
Complex-tuple(Complex(0, 1), Complex(0, 2)) or Real-tuple-tuple(Real-tuple(0, 1), Real-tuple(0, 2)).
This kind of systematic formalization of explicit class tags becomes very tedious very quickly. A similar for-
malization is routinely implemented in computer programming languages where ambiguity must be avoided.
But even in mature “strongly typed” programming languages such as C++, ambiguities and surprising
outcomes are often encountered despite serious attempts at standardization.
In the case of mathematics, it is important to simply be aware of the pitfalls of “untyped” objects. As
mentioned in Remark 1.4.7, it is important to be able to identify in which space (i.e. class) each mathematical
object “lives”. Whenever confusion arises, it is useful to be able to determine the class of every sub-expression
of every expression.
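The explicit tags of Remarks 2.6.8 and 2.6.9 resemble class definitions in a programming language. The sketch below uses Python dataclasses as one possible rendering. The class names mirror the tags Complex, Real-tuple, Complex-tuple and Real-tuple-tuple, but the field names are assumptions made here for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Complex:
    re: float
    im: float

@dataclass(frozen=True)
class RealTuple:
    x: float
    y: float

@dataclass(frozen=True)
class ComplexTuple:
    first: Complex
    second: Complex

@dataclass(frozen=True)
class RealTupleTuple:
    first: RealTuple
    second: RealTuple

# Same parameter (0, 1), different class tags: the tagged objects are distinct.
print(Complex(0, 1) == RealTuple(0, 1))  # False

# Components keep their own tags, so projections return tagged objects.
z = ComplexTuple(Complex(0, 1), Complex(0, 2))
p = RealTupleTuple(RealTuple(0, 1), RealTuple(0, 2))
print(z.first, p.first)
print(z == p)  # False
```

Equality between dataclass instances is only evaluated field by field when both objects have the same class, which is the behaviour wanted for class/set pairs.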
2.7. Extraneous properties of set-constructions in definitions

2.7.1 Remark: Definitions ensure that people are talking about the same thing.
In Section 2.5, it is asserted that mathematical objects are not located on paper (as written symbols) and
are not located in some mysterious, mystical, metaphysical, mathematical universe where everything is
perfect. Mathematical objects are located inside minds. This ontological picture has implications for how
mathematical objects should be defined. This is important in a definitions book such as this.
Mathematical definitions are part of the communications language which may be observed on a socio-
mathematical network whereas the real “stuff” of mathematics resides inside the minds of the participants.
In this view, definitions are merely signals whose purpose is to ensure that participants in a network are
talking about the same thing.
2.7.2 Remark: Internal representations of mathematical objects should not transmit extraneous properties.
We may consider two mathematical objects within two different minds to be “the same thing” if all of their
properties are identical. This raises the issue of extraneous properties. Sometimes two objects may have
identical properties as required by the particular context of a discussion, but they may have additional
properties which are different. For example, the number 5 may have the colour green inside one mind and a
bright yellow colour in another mind.
There is little that can be done to prevent two minds from giving their mathematical object representations
extraneous properties which are not specified by definitions communicated over the network. However, it
is possible to prevent such extraneous properties from entering into the discussion by listing very precisely
those properties which are relevant to the context. Such a listing of relevant properties is similar to an
axiomatic specification. One of the purposes of axiomatization is to ensure that a discussion is not confused
by the introduction of extraneous properties.
As a practical example, two participants may agree that the real numbers 3 and π satisfy $3 < \pi$, but
one participant may claim that $3 \subsetneq \pi$ whereas the other one claims that $3 \supsetneq \pi$. (See Notation 5.1.12.)
These set inclusion relations are irrelevant and mutually contradictory, but two different kinds of Dedekind
cut definitions do in fact have these properties. (See Section 8.3 for Dedekind cuts.) Clearly this kind of
intrusion of contradictory extraneous properties into a discussion must be excluded. The simplest way to
avoid such contradictions is to ban all extraneous properties.
2.7.3 Remark: Constructional definitions on the network should not have extraneous properties.
The danger of extraneous properties mentioned in Remark 2.7.2 arises because of differing internal repre-
sentations of mathematical objects. Such differences cannot be avoided. One can only prevent extraneous
properties of internal representations from appearing on the communications channel.
A second cause of extraneous properties is the co-existence on a socio-mathematical network of multiple
alternative set-constructions for objects. For example, some network participants may use a Dedekind cut
of the form $\{q \in \mathbb{Q};\ q^3 < n\}$ to represent the real number $\sqrt[3]{n}$ for integers $n \in \mathbb{Z}$ on the network whereas
other participants may use a Dedekind cut of the form $\{q \in \mathbb{Q};\ q^3 > n\}$ for $\sqrt[3]{n}$ on the network. The first
representation has the extraneous property that $n_1 < n_2 \Leftrightarrow \sqrt[3]{n_1} \subsetneq \sqrt[3]{n_2}$. The second representation has
the contradictory extraneous property that $n_1 < n_2 \Leftrightarrow \sqrt[3]{n_1} \supsetneq \sqrt[3]{n_2}$.
This kind of extraneous property is not a leakage of internal representation properties into the discussion.
In this case, the extraneous properties arise from the use of multiple definitions “on the wire”. The above
two example definitions provide all intended properties of real numbers such as algebraic operations, order
and completeness, but they also provide extraneous properties.
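The two contradictory inclusion properties may be illustrated numerically. In the sketch below, each cut is modelled as a membership test on rational numbers, and a finite sample of rationals stands in for the full set of rationals. This can only illustrate the inclusions, not prove them. The names lower_cut, upper_cut and restrict are hypothetical.

```python
from fractions import Fraction

def lower_cut(n):
    """Membership test for the cut {q rational; q^3 < n}."""
    return lambda q: q ** 3 < n

def upper_cut(n):
    """Membership test for the cut {q rational; q^3 > n}."""
    return lambda q: q ** 3 > n

# A finite sample of rationals between -5 and 5 (an assumption of this sketch).
sample = [Fraction(k, 8) for k in range(-40, 41)]

def restrict(cut):
    """The part of a cut which lies in the finite sample."""
    return {q for q in sample if cut(q)}

n1, n2 = 2, 5  # n1 < n2

# Lower cuts: the cube root of n1 is a proper subset of the cube root of n2.
lower_proper_subset = restrict(lower_cut(n1)) < restrict(lower_cut(n2))

# Upper cuts: the cube root of n1 is a proper superset of the cube root of n2.
upper_proper_superset = restrict(upper_cut(n1)) > restrict(upper_cut(n2))

print(lower_proper_subset, upper_proper_superset)
```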
2.7.4 Remark: How to manage extraneous properties of constructional definitions.

One may avoid the extraneous properties that arise from a diversity of set-construction representations on
the network by simply having no set-construction definitions at all. One could exclusively use axiomatic
definitions. However, there are at least two good reasons to make use of set constructions instead of axioms
for some definitions.
(i) Some set-construction definitions are precisely what is intended. In other words, all properties of the
construction are intended. An example is the representation of real number intervals [r1 , r2 ] as sets
{x ∈ ℝ; r1 ≤ x ∧ x ≤ r2} for r1, r2 ∈ ℝ with r1 ≤ r2. Absolutely all properties of this representation
are intended. (There are alternative representations which have both the intended properties and some
unintended extraneous properties.) Real-number intervals could be defined axiomatically, but this would
require some tedious work to define the intervals and to verify that the axioms satisfy existence and
uniqueness.
(ii) Even when a set construction has undesired extraneous properties, the set construction may have many
advantages. For example, a specific construction may have the advantage of automatically satisfying
existence and uniqueness whereas an axiomatic definition might require substantial work to demonstrate
existence and uniqueness up to isomorphism. A specific construction may also be useful for visualization
or intuitive understanding of the objects. For example, linear spaces of real n-tuples may be represented
as functions f : ℕn → ℝ, but then (r1, r2) = {(1, r1), (2, r2)} and (r1, r2, r3) = {(1, r1), (2, r2), (3, r3)}
for r1, r2, r3 ∈ ℝ so that (r1, r2) ⊆ (r1, r2, r3), which is (most likely) an undesired extraneous property.
In case (i), there is no danger of extraneous properties. So there is no reason to avoid such representations.
In case (ii), extraneous properties are typically disregarded in an informal way, but the disregard sometimes
fails. It is difficult to resolve this problem. For example, the non-negative integers may be represented as ∅
for 0 and n ∪ {n} for integers n + 1 with n ≥ 0. (See Remark 7.2.4.) This representation has the extraneous
property that n ∈ n + 1 and n ⊆ n + 1 for all n ≥ 0. The reader may be instructed to not apply the relations
“∈” and “⊆” to pairs of numbers, but it is difficult to make a complete list of all extraneous properties which
must be ignored.
Case (ii) should be avoided where possible. If such a set construction is given, a set of axioms should be
given also, and any particular construction should be presented as merely a representation for the axiom
system, not as the real definition.
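The extraneous inclusion property of tuples in case (ii) is easy to exhibit. The following sketch represents a real tuple as the graph of a function on the index set {1, …, n}, as described above. The helper name as_function and the sample values are hypothetical.

```python
def as_function(*components):
    """Represent (r1, ..., rn) as the set of pairs {(1, r1), ..., (n, rn)}."""
    return frozenset(enumerate(components, start=1))

pair = as_function(0.5, 2.0)
triple = as_function(0.5, 2.0, 7.0)

# The pair is a subset of the triple, which is presumably not intended.
print(pair <= triple)
```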
2.8. Axioms versus constructions for defining mathematical systems

2.8.1 Remark: Axiomatic versus constructional methods for defining mathematical systems.
Mathematical systems may be brought into a mathematical communication in two ways.
(1) Axiomatic method: Define a system by its properties. (Examples: Peano natural number axioms,
real-number axioms.)
(2) Constructional method: Define a system as a specific set construction from previously defined systems.
(Examples: von Neumann construction for ordinal numbers, Dedekind cuts and Cantor construction for
real numbers.)
2.8.2 Remark: Further classification of the axiomatic method of definition.

The axiomatic method (1) in Remark 2.8.1 may be further classified as follows.
(i) Inconsistent axioms. There are no set structures satisfying the axioms.
(ii) Unique representation of a single system. One and only one set structure satisfies the axioms.
(iii) Multiple representations of a single system. Many set structures satisfy the axioms, but they are
all equivalent to each other (for some equivalence relation).
(iv) Class of systems. Many set structures satisfy the axioms, and they are not all equivalent to each
other (for some equivalence relation).
In case (ii), one may as well use a direct construction.
Case (iii) may be said to characterize all representations of a single defined concept. Such a definition may
be referred to as a “characterization” or “metadefinition”. In other words, the defined concept is fixed, but
the representation is variable. For example, all systems which satisfy Metadefinition 13.5.1 (for the tensor
product of linear spaces) are isomorphic. All such systems are equivalent alternative representations of the
tensor product idea.
In case (iv), the axioms define a class of systems. For example, Definition 25.3.1 gives a set of rules for an
n-dimensional manifold. For each parameter n ∈ ℤ⁺, there are many equivalence classes of n-dimensional
manifolds.
2.8.3 Remark: Modern and classical axiom systems.

Shoenfield [169], page 2, calls axiom systems in Remark 2.8.2 case (iv) “modern axiom systems”, whereas
he calls axiom systems in case (ii) “classical axiom systems”. But he comments: “Of course, the difference
is not really in the axiom system, but in the intentions of the framer of the system.” He defines an axiom
system as “the entire edifice [. . . ], consisting of basic concepts, derived concepts, axioms, and theorems”. A
classical axiom system has only a single “universe” of objects whereas a modern axiom system may have
many alternative “universes” of objects. (See Shoenfield [169], page 10.)
2.8.4 Remark: Further classification of the constructional method of definition.

The constructional method (2) in Remark 2.8.1 may be further classified as follows.
(i) The construction method yields no structures.
(ii) The construction method yields exactly one structure.
(iii) The construction method yields more than one structure.
One might think that a construction method should always yield one and only one structure. However,
concepts are very often defined as “the solution” of a set of equations, or as the outcome of some maximization
or minimization process. For example, the square root of a real number y is defined as “the solution” x of
the equation $x^2 = y$. Clearly the number of solutions may be zero, one or many according to the value of y.
Since so many concepts are constructed as the solutions of equations, existence and uniqueness proofs are
frequently required if the concept being defined is expected to be single-valued. Such proofs are sometimes
extremely difficult to provide. Consequently establishing that a concept is well-defined can sometimes require
a lot of hard work. So a mere “definitions book”, for example, might not be so straightforward as expected.
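The square root example can be made concrete with a minimal sketch. The function name `real_square_roots` is hypothetical, and floating-point numbers stand in for the real numbers, so this only illustrates the zero/one/many distinction rather than proving anything.

```python
import math

def real_square_roots(y: float) -> list[float]:
    """Return all real x with x * x == y (up to floating-point rounding)."""
    if y < 0:
        return []            # no real solution exists
    if y == 0:
        return [0.0]         # exactly one solution
    r = math.sqrt(y)
    return [-r, r]           # two solutions, so "the solution" is ambiguous

for y in (-4.0, 0.0, 9.0):
    print(y, real_square_roots(y))
```

A definition of "the" square root therefore needs an extra condition (such as non-negativity of x, together with a restriction to y ≥ 0) before it is single-valued.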
Even set constructions which appear to be direct and simple may be regarded as sets of equations which
have solutions. Consider, for example, the empty set x = ∅. This is shorthand for “x; ∀y, y ∉ x”. In other
words, the empty set is the unique set x which satisfies ∀y, y ∉ x. It happens that this set x does exist and
is unique, but this requires proof.
Consider also two-element sets {a, b}. For each a and b, {a, b} is defined to be the solution x of the proposition
∀y, (y ∈ x ⇔ (y = a ∨ y = b)). Just like the empty set, this two-element set requires an existence and
uniqueness proof.
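The uniqueness half of such a proof is short if the ZF axiom of extensionality is assumed. Existence is usually obtained from the axiom of pairing. The following sketch makes those assumptions explicit. The abbreviation $P_{a,b}$ for the defining proposition is introduced here only for illustration.

```latex
\begin{align*}
P_{a,b}(x) \;&:\Leftrightarrow\; \forall y,\ \bigl(y \in x \Leftrightarrow (y = a \vee y = b)\bigr) \\
P_{a,b}(x) \wedge P_{a,b}(x') \;&\Rightarrow\; \forall y,\ (y \in x \Leftrightarrow y \in x') \\
&\Rightarrow\; x = x' \qquad \text{(by extensionality)}
\end{align*}
```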
2.8.5 Remark: Parametrized families of systems.

Sometimes a single mathematical system, or a single class of systems, is defined (by axioms or by construction)
in isolation. Examples of a single system are the integers, the rational numbers, the real numbers and the
complex numbers. Examples of a single class of systems are the finite groups, general groups, fields, rings,
linear spaces and topological spaces.
Often, however, it is convenient to define many systems or classes of systems according to a variable parameter.
Examples of parametrized single systems are the group of permutations of the set $\mathbb{N}_n$ for a given positive
integer n, the ring C(X, Y) of continuous functions from X to Y for given topological spaces X and Y, and
the function space $C^{k,\alpha}(X)$ for a given non-negative integer k, real number α ∈ (0, 1] and differentiable
manifold X. Examples of parametrized classes of systems are the linear spaces over a given field K and
the vector fields on a given differentiable manifold M.
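A parametrized single system can be sketched as a function of its parameter. The following minimal example assumes that the set being permuted is {1, ..., n}. The function names are hypothetical.

```python
from itertools import permutations

def permutation_group(n: int) -> set[tuple[int, ...]]:
    """All permutations of {1, ..., n}, each stored as the tuple (p(1), ..., p(n))."""
    return set(permutations(range(1, n + 1)))

def compose(p, q):
    """Return p after q, that is, the map k -> p(q(k))."""
    return tuple(p[q[k] - 1] for k in range(len(q)))

for n in (1, 2, 3, 4):
    group = permutation_group(n)
    # Closure under composition is one of the group axioms.
    closed = all(compose(p, q) in group for p in group for q in group)
    print(n, len(group), closed)
```

Each value of n yields one definite system, so the function plays the role of the parametrized definition.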
It is very often impossible to determine whether the parameters for a definition are really parameters or are
in fact part of a single non-parametrized definition. For example, one may define n-dimensional topological
manifolds for a given parameter n. But one could equally define topological manifolds with the dimension n
as part of the definition. For example, consider the following.
(1) Let n be a non-negative integer. An n-dimensional topological manifold is a Hausdorff space M which
is everywhere locally homeomorphic to $\mathbb{R}^n$.
(2) A topological manifold is a Hausdorff space M which is everywhere locally homeomorphic to $\mathbb{R}^n$ for
some non-negative integer n.
In case (1), n is clearly a parameter for an infinite set of definitions. In case (2), n is a dummy variable
in the proposition “∃n, M is locally homeomorphic to $\mathbb{R}^n$”. In the latter case, it must be shown that n is
unique if it exists. (In fact, it is not unique if M is the empty topological space, and the proof is non-trivial
otherwise.)
In practice, definitions of the form (1) and (2) are used interchangeably, although in a formal sense they are
very different.
2.8.6 Remark: Set constructions have extraneous properties. Axioms require existence/uniqueness.
As mentioned in Section 2.7, the set-construction style of definition has the disadvantage of “extraneous
properties”. But axiomatic definitions have the difficulty that existence and uniqueness proofs are often
required.

A useful way to think about set-construction definitions is that every such set-construction is merely a “canon-
ical representation” of an abstract idea. In other words, any construction which has the same information
content and properties may be substituted for the given representation. Equivalence of representations may
be defined with the aid of an isomorphism test. This isomorphism is often clear from the context.
The provision of a canonical representation avoids the requirement to provide an existence proof (which
would be necessary for an axiomatic definition). The uniqueness requirement is then fulfilled by the provision
of an isomorphism test, because this actually defines uniqueness for representations of the given concept.
In this sense, all set-construction definitions are essentially the same thing as axiomatically defined metadef-
initions or characterizations. In the former case, the set of permitted representations is determined by an
isomorphism test relative to a canonical representation. In the latter case, the set of permitted representations
is determined by a set of required attributes and relations. Consequently one should never be too
insistent on the adoption of one definition or another. The important thing is to very precisely communicate
the concept which is being presented.
2.8.7 Remark: Mathematical classes which may be defined either axiomatically or constructionally.
Some mathematical systems are best defined by axioms; others are best defined as constructions; some sys-
tems may be conveniently defined either axiomatically or constructionally; and some systems are inconvenient
for both methods of definition.
For example, a linear space $K^n$ may be constructed from a field K using ordered n-tuples in a way which
is well suited to the vast majority of contexts. (See Remark 2.7.3 part (ii) for an example of extraneous
properties of the system $K^n$.) For such a system, it is superfluous, but not too difficult, to go through the
axiomatic approach. One could, in fact, give a metadefinition for the linear space $K^n$ as an abstract linear
space V together with families of canonical linear injections $i_k : K \to V$ and canonical linear projections
$\pi_k : V \to K$ such that $\pi_k \circ i_k = \mathrm{id}_K$ and $i_k \circ \pi_k : V \to V$ is idempotent for all $k \in \mathbb{N}_n$. If a few more axioms
are added, such a metadefinition will guarantee that all such representations V will be isomorphic to each
other and to the usual construction of $K^n$. In this case, the constructional method is more convenient than
the axiomatic method and the extraneous properties are not too difficult to ignore.
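The injection and projection conditions can be checked against the usual tuple construction. This is a minimal sketch in which the field K is modelled by Python's `Fraction` (the rationals) and n is fixed at 3. Both choices, and the names `inject` and `project`, are assumptions made only for illustration.

```python
from fractions import Fraction

n = 3  # dimension; K is modelled here by the rationals (an assumption)

def inject(k: int, a: Fraction) -> tuple:
    """i_k : K -> K^n, placing a in position k (1-based) and zero elsewhere."""
    return tuple(a if j == k else Fraction(0) for j in range(1, n + 1))

def project(k: int, v: tuple) -> Fraction:
    """pi_k : K^n -> K, reading off component k (1-based)."""
    return v[k - 1]

a = Fraction(5, 7)
v = (Fraction(1), Fraction(2), Fraction(3))
for k in range(1, n + 1):
    # Projecting after injecting gives the identity on K.
    assert project(k, inject(k, a)) == a
    # Injecting after projecting maps K^n to K^n and is idempotent.
    once = inject(k, project(k, v))
    twice = inject(k, project(k, once))
    assert twice == once

# A plausible candidate for one of the additional axioms (the text does not
# list them): the n component maps together recover the original vector.
parts = [inject(k, project(k, v)) for k in range(1, n + 1)]
assert tuple(sum(column) for column in zip(*parts)) == v
```

The tuple construction passes these checks, which is the sense in which it is one representation among the systems that such a metadefinition would admit.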
The non-negative integers may be defined via axioms as in Definition 7.3.2 (similar to the Peano axioms) or
via the von Neumann construction from empty sets as in Remark 7.2.4. Probably the axioms provide the
most convenient definition for the non-negative integers, but the construction based on ZF set theory provides
some useful insights and is more easily extended to general ordinal numbers. Likewise, the real numbers are
perhaps slightly more conveniently defined by axioms rather than construction, but more advanced texts
prefer constructions, either in the Dedekind or Cantor style. (See Remark 8.3.4 for comments on these two
constructions.)
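The von Neumann construction is easy to sketch with immutable sets, assuming the usual rule that 0 is the empty set and the successor of n is n ∪ {n}. The function name `von_neumann` is hypothetical.

```python
def von_neumann(n: int) -> frozenset:
    """Return the von Neumann set for n: 0 is the empty set, n + 1 is n ∪ {n}."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])
    return s

three = von_neumann(3)
print(len(three))                  # the set for n has n elements
print(von_neumann(2) in three)     # m < n corresponds to membership
print(von_neumann(2) < three)      # ... and also to proper inclusion
```

The last two lines show properties of the representation (membership and inclusion between numbers) which an axiomatic definition of the non-negative integers would not mention.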
One may regard the constructional approach as merely a short-cut version of the axiomatic approach because
a specific system construction conveniently removes the need for axioms and for uniqueness and existence proofs.
In fact, the constructional approach is always a special case of the axiomatic approach because one may have
axioms which precisely specify the construction method.
In this book, the vast majority of definitions use the constructional method. The axiomatic method is used
only when specific constructions have significant difficulties (as in the case of tensor products). In particular,
a list of axioms specifying properties and relations is preferable when the mathematics literature contains
a proliferation of unsatisfying definitions, as is the case for tangent vectors on manifolds for example. The
axiomatic method is essential for systems which are highly non-unique, such as the categories of groups,
linear spaces and topological manifolds.
2.8.8 Remark: Particular set-constructions for concepts may be too ugly.

The set constructions for some concepts, such as for tensor products and tangent spaces in particular,
are quite unintuitive and unconvincing because they are much more complicated and untidy than one’s
intuitive ideas about these concepts. (For example, the usual von Neumann ordinal number representation
is somewhat ugly. There seem to be no graceful representations for tensor products either.) In such cases,
there are often multiple alternative definitions, each of which is more or less suitable for particular applications
and audiences.
The adoption of an axiomatic definition for a concept often avoids the ugliness of particular representations.
If no tidy, intuitively appealing set-construction can be found, an axiomatic definition may be preferable.

2.8.9 Remark: Axiomatic systems specify interoperability, not implementation.

An axiomatic system for a class of mathematical objects is analogous to interoperability and conformance
specifications in various engineering fields. In engineering standardization bodies, it is usual to require
standards to specify only interfaces, not implementation. Great importance is attached to interoperability
so that implementers of interoperable systems are left free to design concrete representations of the abstract
specifications as they wish. The philosophical principle behind this is that there should be open competition
to build the best implementations of the systems, while maintaining full interoperability between different
brands of systems which do not need to know anything about the internal details of particular system designs.
In the mathematics field, it is not merely undesirable to know how other people implement mathematical
objects in their own minds; it is actually (currently) impossible to know how mathematical objects are
implemented. Therefore mathematical definitions can only ever specify interoperability rules. The entire
literature of mathematics specifies only what the communications channels between humans must look like,
not what the concepts look like inside human minds.
This remark has the important consequence that mathematical definitions do not describe mathematical
objects. Definitions only describe communications between human beings about mathematical objects.
Consequently, one should not seek deep understanding of mathematical objects by examining definitions,
which only describe the contents of communications channels. The same mathematical object in a human
mind may be communicated in many different ways. Thus many different definitions may actually refer to
the same object. For example, five different definitions of tangent vectors on manifolds might refer to the
same mental concept.
This perspective on mathematical definitions is particularly significant for this book because it is concerned
predominantly with definitions as the foreground concern, not just as a necessary prelude to presentation
of theorems. In order to choose the best definitions, the author needed to know what is being defined.
When it is accepted that only the communication of mathematical systems is being defined, not the systems
themselves, it follows that definitions should be generally more axiomatic in character rather than in terms of
specified set constructions. In other words, mathematical systems should preferably be characterized rather
than constructed. Specific constructions should be regarded as mere “demonstration prototypes” rather than
“the only way to do it”. In engineering, “reference implementations” play an analogous role. Any system
which is a “drop-in replacement” for the reference implementation is considered to be acceptable.
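The engineering analogy can be illustrated with a small sketch. The ordered pair is chosen here as a hypothetical example of a concept. Its "interface" is the single property that two pairs are equal exactly when their components agree, and two unrelated implementations are tested against that property. All names in the code are illustrative assumptions.

```python
from typing import Callable, Hashable

# Two hypothetical "implementations" of the ordered-pair concept.
def tuple_pair(a: Hashable, b: Hashable):
    return (a, b)

def kuratowski_pair(a: Hashable, b: Hashable):
    # Kuratowski's set construction {{a}, {a, b}}.
    return frozenset([frozenset([a]), frozenset([a, b])])

def satisfies_pair_axiom(pair: Callable, samples) -> bool:
    """Check the characterizing property on the samples:
    pair(a, b) == pair(c, d) exactly when a == c and b == d."""
    return all(
        (pair(a, b) == pair(c, d)) == (a == c and b == d)
        for a in samples for b in samples for c in samples for d in samples
    )

samples = [0, 1, 2]
for impl in (tuple_pair, kuratowski_pair):
    print(impl.__name__, satisfies_pair_axiom(impl, samples))
```

Either implementation is a "drop-in replacement" for the other as far as this property is concerned, although the two objects look nothing alike internally.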
2.9. Some general remarks on mathematics and logic
2.9.1 Remark: The vast majority of all possible theorems are uninteresting.
Symbolic logic seems to reduce all of mathematical deduction to mechanistic computations which may
be automated by a computer. In principle, a computer program may generate all possible theorems in
mathematics. However, it is equally true that a computer may generate all possible digital images and all
possible natural-language texts. Possessing all $(256 \times 256 \times 256)^{1000000}$ possible 1000 pixel by 1000 pixel images
is of very little practical benefit. Someone must still be employed to find the interesting images.
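The size of this number can be estimated without constructing it. The sketch below assumes 24-bit colour (256 levels for each of three channels) and one million pixels, which is what the base and exponent of the count suggest.

```python
import math

colours_per_pixel = 256 ** 3          # 24-bit colour
pixels = 1000 * 1000                  # one million pixels per image

# The count is (256^3)^1000000 = 2^24000000; only its size is computed here.
bits = pixels * (colours_per_pixel.bit_length() - 1)
decimal_digits = math.floor(bits * math.log10(2)) + 1

print(bits)             # 24000000
print(decimal_digits)   # 7224720
```

So the number of possible images has more than seven million decimal digits.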
In the same way, the ability to generate all possible theorems is of little value. The real business of mathematics
is to find and use the interesting theorems. (See Remark 4.8.1 for a similar comment.) The mechanization
of mathematics serves only as a check: when someone claims to have discovered an interesting theorem, it
can be established with great confidence whether the theorem is true or false.
2.9.2 Remark: Set theory and logic define rules for an artificial universe.
Mathematics may be regarded as an artificial universe with its own rules or “equations of motion” or
“evolution equations”. Within this artificial universe, models may be built for real-world processes, which is
why the rules are chosen as they are. (More accurately, the rules of mathematics are chosen as a compromise
between what the human brain is capable of and what the real world applications demand.)
A computer language similarly creates an abstract semantic space with its own special rules. Each language
is best suited to modelling different kinds of situations and problems. Just as computer languages may be
executed on computers, so mathematics may be processed in human minds. The rules of mathematics are
not provable. They are just currently the best rules for practical applications. (The recreational applications
of mathematics are a by-product of the useful applications.)

2.9.3 Remark: Acceptance of mathematical concepts varies according to personality and history.
There have always been disagreements about the fundamentals of mathematics. For much of history, neg-
ative numbers were not accepted, and complex numbers were only generally accepted in relatively recent
times. Some people might still not accept induction arguments or infinite sets. Some people might not
accept that there are more irrationals than rationals. (See Bell [190], page 566.) At various times in history,
irrational numbers, transcendental numbers, non-analytic functions and non-Euclidean geometries were re-
jected. People must have a model in their own mind which corresponds to the elements of mathematics
which they accept. A person to whom a negative number seems absurd will have great difficulties with
signed arithmetic, no matter how skilled they are in manipulating symbols according to the prescribed rules.
2.9.4 Remark: Arbitrariness of the rules of mathematics.

Arguing about the axioms of mathematics is as important, and unimportant, as arguing about the rules of
football. There are many “codes” of football rules which co-exist in the world, but within a single code, and
for a particular football game, there must be an agreed set of rules. Mathematics is “all in the mind”. So
there is no way to test the validity of any set of rules for mathematics.
2.9.5 Remark: Controversies about infinities in the fundamentals of mathematics.

When you study mathematics, you are studying one of the capabilities of the human mind. Most of the
controversies in the fundamentals of mathematics are concerned with the credibility or otherwise of infinite
concepts. Since the human mind is finite, it is not surprising that disagreements occur in this area, because
everything that can be said about infinite things is conjectural. Different opinions on set theory are gen-
erally distinguished by differing levels of comfort with concepts which are infinite, which are all ultimately
extrapolations from what we can think of to what we can’t think of directly. Unfortunately, nearly all of
differential geometry is concerned with things which are infinite.
2.9.6 Remark: Mathematics is an art or craft.

Mathematics is an art like music or knitting, taught by boot-strapping from the prior skills and naive con-
cepts of the individual. In other words, mathematics is not pure and absolute knowledge. Mathematics is
an abstraction from some rather ancient arts, principally the geometric arts (including surveying, naviga-
tion, astronomy and carpentry) and the arithmètic arts (including accounting, chronology, and weights and
measures). René Descartes established strong links between the geometric and arithmètic arts. Ultimately
the numerical view of geometry triumphed over the geometric view of arithmetic because the arithmètic
perspective was the more powerful, although the arbitrariness of the choice of numeric coordinates still an-
noys physicists and mathematicians. (Analysis and algebra are included in the arithmètic arts since they
deal with numbers rather than points and lines, although both analysis and algebra were often couched in
geometric language until the 19th century.) One of the constant currents in differential geometry is the
antagonism between the numerical and geometric points of view. The frequent clamour for coordinate-free
differential geometry is an attempt to return to the good old days when geometry had a purely geometric
character. This clash of views seems to be inherent in human nature, since we have both geometric and
arithmètic modes of thinking, and the geometric mode is more intuitive and direct.
[ Check the roles of the people mentioned in Remark 2.9.7. ]
2.9.7 Remark: Mathematical deduction may be regarded as a typographic art.

Around the turn of the 20th century, there were two important developments in relations between the
geometric and arithmètic arts. First, there was the set theory associated with names such as Cantor and
Weierstraß. Then there was the symbolic logic basis for mathematics developed by Peano, Frege, Russell,
Zermelo, Fraenkel, Bernays, Gödel and others. The set theoretic perspective provided a new basis for
geometry and numbers. Then symbolic logic provided a systematic basis for set theory.
Symbolic logic may be characterized as a typographic art rather than a science. The marshalling of sequences
of symbols on lines and the enforcement of strict rules between sequences of lines are reminiscent of lead
typography. Symbolic logic requires prior notions of order of symbols on lines, the order of lines within
an argument, sets of symbols (the font cases), and relations between sets of symbols and between lines
of symbols. Any treatment of symbolic logic freely uses number concepts such as integer arithmetic, and
geometric concepts such as left, right, above and below.
A related remark is made by Lakoff/Núñez [173], page 86, as follows.

Our mathematics of calculation and the notation we do it in is chosen for bodily reasons—for ease
of cognitive processing and because we have ten fingers and learn to count on them. But our bodies
enter into the very idea of a linearly ordered symbolic notation for mathematics. Our writing
systems are linear partly because of the linear sweep of our arms and partly because of the linear
sweep of our gaze. The very idea of a linear symbol system arises from the peculiar properties of
our bodies. And linear symbol systems are at the heart of mathematics. Our linear, positional,
polynomial-based notational system is an optimal solution to the constraints placed on us by our
bodies (our arms and our gaze), our cognitive limitations (visual perception and attention, memory,
parsing ability), and possibilities given by conceptual metaphor.
The phrase “linear, positional, polynomial-based notational system” refers to our system for writing numbers.
Thus the decimal notation represents a non-negative integer by the sequence of coefficients $a_n a_{n-1} \ldots a_0$ of
a polynomial $\sum_{i=0}^{n} a_i 10^i$. (See Lakoff/Núñez [173], page 82.)
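The correspondence between a digit sequence and its polynomial can be sketched in a few lines. The function names are hypothetical, and the base is left as a parameter only to show that nothing depends on the choice of ten.

```python
def digits_to_int(digits: list[int], base: int = 10) -> int:
    """Evaluate sum(a_i * base**i) for digits written a_n ... a_0 (Horner's rule)."""
    value = 0
    for a in digits:
        value = value * base + a
    return value

def int_to_digits(value: int, base: int = 10) -> list[int]:
    """Return the coefficients a_n ... a_0 of a non-negative integer."""
    digits = [value % base]
    while value >= base:
        value //= base
        digits.append(value % base)
    return digits[::-1]

print(int_to_digits(2010))            # [2, 0, 1, 0]
print(digits_to_int([2, 0, 1, 0]))    # 2010
```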
2.9.8 Remark: Medieval university subject levels: trivium and quadrivium.

It is interesting to compare the topic layering in Figure 2.1.1 (Remark 2.1.1) with the corresponding layering
of university subjects in medieval Europe. These subject layers are illustrated in Figure 2.9.1.
[Figure: upper layer “quadrivium” (arithmetic, geometry, astronomy, music); lower layer “trivium” (grammar, logic, rhetoric)]
Figure 2.9.1 Relations between trivium and quadrivium in modern mathematics
There are four perspectives for mathematics which can each be defined in terms of the others. These are the
geometric, numeric, set theoretic and symbolic logic perspectives. The latter two are identifiable with the
“trivium” of ancient Greek and medieval university education, namely grammar, rhetoric and logic, whereas
the former two are identifiable with the university “quadrivium”, namely arithmetic, geometry, astronomy
and music. In a sense, the systematization of mathematics at the turn of the 20th century redefined the
mathematics of the quadrivium (the advanced university subjects) in terms of the trivium (the elementary
university subjects).
2.9.9 Remark: Roles and interactions of mathematicians and physicists.

The interactions between mathematicians, physicists and reality are roughly illustrated in Figure 2.9.2.
[Figure: chain linking the real world, experimental physicist, theoretical physicist, mathematical physicist and mathematician]
Figure 2.9.2 Interactions between mathematicians, physicists and reality
There are endless arguments about the differences between theoretical and mathematical physics. The
following is an outline of this author’s current perspective on the issue.
(1) The experimental physicist does experiments, including design of experiments, design of apparatus,
operating the apparatus, collecting data, recording data, organizing data, making initial interpretations
of data, and publishing the data in a meaningful form. The experimental physicist has some contact
with the real world.
(2) The theoretical physicist interprets data which is produced by the experimental physicist by comparing
data to what the theories predict, attempting to fit data to the theories, attempting to select among
competing theories by eliminating hypotheses which are inconsistent with the data, and suggesting
future research projects to help select more effectively between competing theories.

(3) The mathematical physicist studies the theories which theoretical physicists work on and proposes new
theories, but is not directly involved in trying to harmonize experimental data with theories. The
mathematical physicist studies the mathematical properties of physical theories to determine whether
they are self-consistent and to determine what predictions can be made from the theories. Thus the
mathematical physicist is concerned with the mathematical machinery of physical models, not with
their correspondence to experimental data, but the motivation is to assist the theoretical physicist by
checking the validity and broad properties of proposed models, improving the models, and providing
tools to better utilize the models.
(4) The mathematician studies mathematical machinery. The mathematician is concerned with the self-
consistency of mathematical systems and the consequences and interrelationships of mathematical sys-
tems. A large proportion of those systems originate in physics and other sciences. Some mathematical
systems, however, could be characterized as of recreational or intellectual interest only. Sometimes
mathematicians develop systems out of mere curiosity which turn out unintentionally to be useful for
modelling physical systems. Mathematicians are usually concerned with logical self-consistency, mini-
malist clarity and strict correctness more than mathematical physicists are.
The mathematician (4) is also called a pure mathematician to contrast with the applied mathematician (5).
But applied mathematicians are often so concerned with their application areas that they are better grouped
together with the application discipline since they use mathematics as a tool rather than developing new
mathematics.
(5) The applied mathematician is concerned with practical techniques of solution of mathematical prob-
lems more than with the formulation of mathematical frameworks for defining classes of problems.
Thus applied mathematicians often develop numerical techniques and algorithms for solving problems
approximately for use in real-life applications.
2.9.10 Remark: Mathematics as a tool subservient to the interests of physicists.

Bell [191], pages 197–198, makes the following comments on the differences between mathematicians and
physicists in the context of the examination of a paper on heat theory by Fourier for the Grand Prize of the
French Academy in 1812.
Laplace, Lagrange, and Legendre were the referees. While admitting the novelty and importance
of Fourier’s work they pointed out that the mathematical treatment was faulty, leaving much to be
desired in the way of rigor. Lagrange himself had discovered special cases of Fourier’s main theorem
but had been deterred from proceeding to the general result by the difficulties which he now pointed
out. These subtle difficulties were of such a nature that their removal at the time would probably
have been impossible. More than a century was to elapse before they were satisfactorily met.
In passing it is interesting to observe that this dispute typifies a radical distinction between pure
mathematicians and mathematical physicists. The only weapon at the disposal of pure mathemati-
cians is sharp and rigid proof, and unless an alleged theorem can withstand the severest criticism
of which its epoch is capable, pure mathematicians have but little use for it.
The applied mathematician and the mathematical physicist, on the other hand, are seldom so
optimistic as to imagine that the infinite complexity of the physical universe can be described
fully by any mathematical theory simple enough to be understood by human beings. Nor do they
greatly regret that Airy’s beautiful (or absurd) picture of the universe as a sort of interminable,
self-solving system of differential equations has turned out to be an illusion born of mathematical
bigotry and Newtonian determinism; they have something more real to appeal to at their own
back door—the physical universe itself. They can experiment and check the deductions of their
purposely imperfect mathematics against the verdict of experience—which, by the very nature
of mathematics, is impossible for a pure mathematician. If their mathematical predictions are
contradicted by experience they do not, as a mathematician might, turn their backs on the physical
evidence, but throw their mathematical tools away and look for a better kit.
This indifference of scientists to mathematics for its own sake is as enraging to one type of pure
mathematician as the omission of a doubtful iota subscript is to another type of pedant. The result
is that but few pure mathematicians have ever made a significant contribution to science—apart,
of course, from inventing many of the tools which scientists find useful (perhaps indispensable).

It would be very surprising indeed if the mathematics of which the human mind is capable was adequate to
describe all physical phenomena. The human mind is a small machine. The universe is a very large machine.
The human mind is implemented as a biological system which is optimized for many purposes other than
doing physics. It is truly surprising that human mathematical modelling is as powerful as it is. However,
future progress in physics may require new developments in the fundamentals of mathematics. This justifies
the strong emphasis in this book on fundamentals. Overhauling a system requires thorough understanding.
2.10. Dark sets and dark numbers

[ Rewrite Sections 2.10, 2.11 and 2.12 in light of Remark 2.10.1. ]
2.10.1 Remark: Dark sets and numbers are sets and numbers which cannot be named.
The issues regarding dark sets and numbers, and grey sets and numbers, and concerns about the validity of
concepts like infinity and infinitesimality, become quite easy to resolve in terms of the modelling ontology
which is outlined in Section 3.3, Section 4.1, Remark 4.12.3, Remark 5.7.25 and elsewhere.
Dark numbers (or pixie numbers) are those numbers which are assumed to exist in a concrete object space
which is modelled by an abstract language space. Since the span of the abstract language is limited to a
countable set of names, it is impossible to give names to all elements of a presumed uncountable space of
concrete objects. Therefore almost all elements of a presumed uncountable concrete space must be “dark”.
That is, they cannot be given names.
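The counting argument behind this can be written out as follows, under two assumptions which are not stated in this form in the text: that names are finite strings over a finite non-empty alphabet $\Sigma$, and that the concrete space is modelled by the real numbers. The symbols $N_V$ and $\mu_V$ are the name space and name map which appear later in this remark.

```latex
\begin{align*}
N_V \subseteq \Sigma^* = \bigcup_{k=0}^{\infty} \Sigma^k
  \quad&\Rightarrow\quad |N_V| \le |\Sigma^*| = \aleph_0 \\
|\mathbb{R}| = 2^{\aleph_0} > \aleph_0
  \quad&\Rightarrow\quad |\mathbb{R} \setminus \mu_V(N_V)| = 2^{\aleph_0}
\end{align*}
```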
Since in practice only a finite number of names can ever be written down or thought about, almost all of
the non-dark concrete objects will never actually be given names under any naming
system. Since the naming systems for logical and mathematical objects vary over time and can be newly
invented or extended at will, it is not possible to talk about the “largest number which can be named”,
because obviously this meta-name may be used in a meta-naming-system to include a number which is even
larger. Naming systems for logical and mathematical objects are socially defined and highly dynamic. So
the classes of “grey numbers” and “grey sets” are also socially defined and dynamic.
Questions regarding infinity concepts are also quite easy to resolve within the names/objects modelling
perspective. The elements of presumed infinite sets in modelled systems cannot all be referred to individually
by name, as mentioned above. The concrete systems being modelled do not exist in any physical sense.
Mathematical models (such as ZF set theory) which supposedly describe infinite sets simply do not model
any real system. The system being modelled in this case is an imaginary system, and models which include
infinite sets merely state how such a system would behave if it existed. Since such systems can never be
observed in nature (due to the finite bandwidth of human perception if for no other reason), mathematics
involving infinities is, in a sense, null and void and to no effect. However, the infinitudes of mathematics
are never tested in practice (since such testing is impossible). So the infinities of mathematics are not part
of the testable component of models. Only the finite parts of mathematics are ever tested, and these finite
parts are often found to be enormously useful. Therefore there is no absolute imperative to discard infinite
mathematics on the grounds that infinities can neither be named nor observed.
The principle of mathematical induction requires a name to be assigned to an object in a set, and an induction
rule is applied to this named object to show that another named object (its successor) is in the set, which
then shows that there is no maximum element in the set. This is what we mean by an infinite set. However,
since we cannot name all objects, and we cannot explicitly specify an infinite number of naming maps, it is
not possible to validate the application of mathematical induction to concrete systems. The concept of “the
largest namable number in a set” is as circular as “the largest number which can be written in ten words”.
When one writes N = “the largest namable number in set X”, one must adjust one’s naming system so that
the name N points to the largest namable number in X. Thus the name map from abstract to concrete
objects is defined in terms of itself. There is an implicit name-to-object map $\mu_V : N_V \to V$ here which
maps an abstract name such as N to a concrete number $\mu_V(N)$. (See Remark 4.12.3 for notations.) This is
illustrated in Figure 2.10.1.
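The dependence of “the largest namable number” on the naming system can be shown with a toy model. The sketch below assumes a deliberately tiny naming system (decimal numerals of at most three characters) and represents the name map as a Python dictionary. All names in the code are hypothetical.

```python
def make_decimal_naming(max_len: int) -> dict[str, int]:
    """A toy name map: decimal numerals of at most max_len characters."""
    return {str(v): v for v in range(10 ** max_len)}

mu = make_decimal_naming(3)            # names "0" ... "999"
largest = max(mu.values())
print(largest)                         # 999

# Extending the naming system with one meta-name immediately names a larger number.
mu_extended = dict(mu)
mu_extended["successor of the largest number named by mu"] = largest + 1
print(max(mu_extended.values()))       # 1000
```

The meta-name is only meaningful relative to the earlier map `mu`. If it were required to refer to the extended map itself, the definition would become circular in the way described above.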
[Figure: the name map $\mu_V$ sends the names $N_V$ in the abstract name space to the named numbers within the modelled concrete numbers V; the remaining concrete numbers are dark numbers]
Figure 2.10.1 The naming bottleneck and dark numbers
The inability to give names to all individual objects does not contradict the fact that we can give names to
uncountable aggregates of objects. The set of real numbers can be named. The set of real numbers between
0 and 1 can be named. We just can’t name all of the members of such sets. We “know” that they are “there”,
but we can’t name them! But we only know that they are there because the abstract theory claims that this
is so. In fact, the truth is that we will never obtain evidence that they are not there. So it is (potentially)
harmless to assume that all of the individual real numbers are “there”. But “there” means in the modelled
concrete object space, which we can never experience and never test. One of the great advantages of the
human mind is its ability to imagine things which have not yet been experienced. In this case, the human
mind imagines real numbers and ZF set theory, which are actually impossible to experience.
As long as there are practical advantages to the development of models of imaginary worlds, there is an
incentive to continue to develop them. As mentioned in Remark 3.4.2, there are practical reasons why we
should continue to accept our current logic and mathematics. It is in our self-interest to do so. Modern logic
and mathematics are not “true” in any sense. They are beneficial to our interests. They just work. There
is no incentive to interrogate a goose too closely if it continues to lay golden eggs.
The way in which the “naming bottleneck” restricts our ability to give names to all of the real numbers is
analogous to the way in which the use of a microscope limits our ability to view every point on Earth. A
name map from a countable set of abstract names to a notional uncountably infinite set of concrete objects
is like a microscope for viewing the concrete objects. We can only see a countable number (at most) at any
one time. We just don’t have the required bandwidth to see (and name) all of the real numbers at the same
time. But we can view a satellite image which shows the whole Earth in aggregate. Mathematical induction,
for example, names and examines two objects at a time, which is a microscopic operation on an infinite set.
2.10.2 Remark: Finite minds cannot think about non-finitely-representable things.

In symbolic logic, only a countable number of propositions may be written (or thought about or processed
in a computer), even in infinite time, since all propositions contain a finite number of symbols chosen from
a countable set. Therefore only a countable number of sets may ever be referred to. This leads to some
“issues” when formulating a set theory which deals with uncountable numbers of things. There are bound
to be difficulties when finite minds try to contemplate infinite concepts.
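The countability claim can be made concrete by listing every finite string in order of length. The following is only a sketch: it assumes a two-letter alphabet for brevity (the remark allows any countable symbol set), and the name `all_strings` is invented for this illustration.

```python
from itertools import count, islice, product

def all_strings(alphabet):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(0):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)

# Each string appears at some finite position in the list,
# which is exactly what it means for the set to be countable.
first = list(islice(all_strings("ab"), 7))
print(first)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Every proposition is one such string, so the propositions can be no more numerous than the positions in this list.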
2.10.3 Remark: Most sets cannot be talked about individually.

The sets which one can explicitly refer to in a set theory are like ants in a garden. If a particular set theory
has only a finite number of sets in total, we can give them all names and see them individually, like in a
formicarium. (A “formicarium” is like an aquarium, but with ants instead of fish.) When the number of
sets is infinite, they may be compared to the ants in a big garden. You can, in principle, see any individual
ant, but you can’t see them all. That is, you can see any ant, but you can’t see all the ants. This is an
important distinction. This is similar to the observation that you can see any person in the world, but you
can’t see all 6 billion people in the world. This does not make us doubt that those people exist. Our lives
are just not long enough to visit them all. (A hundred Gregorian years equals only 3,155,695,200 seconds.)
The fact that we can visit any one of them at will gives us confidence that the rest do exist. Therefore we
can refer to some sets, people or ants as individuals, but the rest must be referred to in aggregate.
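The figure in the parenthesis can be checked directly, assuming the mean Gregorian year of 365.2425 days and 86,400 seconds per day (neither constant is stated in the text). The second line is an added illustration of the visiting rate which 6 billion people would demand.

```latex
\begin{align*}
100 \times 365.2425 \times 86{,}400\ \text{s} &= 3{,}155{,}695{,}200\ \text{s}, \\
\frac{6 \times 10^{9}\ \text{people}}{3{,}155{,}695{,}200\ \text{s}} &\approx 1.9\ \text{people per second}.
\end{align*}
```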
Suppose a set theory has an uncountably infinite number of sets. You can only refer to a finite selection of
sets from a countably infinite menu of these sets. This kind of set theory is like a garden with a vast number
of ants, of which any single ant can be seen and referred to, but there are additionally an uncountably
infinite number of pixies which can never, ever be seen or referred to in any individual way. We know that
all those pixies are out there, and we can refer to them in aggregate. We can classify the pixies according to
hair colour, eye colour, gender and shoe size, but we can never, ever see one individually. This is analogous
to the situation with infinite sequences of digits in decimal expansions of real numbers. We assume that
they all exist although we can never write them down. But we can classify them without referring to them
individually.
There is a third level of set theory in which some sets consist entirely of pixies. That is, not a single element
of the set may be referred to individually. The elements may be classified and written about in aggregate, but
not a single one of them can be referred to individually. This is the kind of set which is brought into existence
by the axiom of choice when the Zermelo-Fraenkel set theory axioms do not alone imply its existence.
2.10.4 Remark: Unmentionable sets and numbers are incompressible.

Numbers and sets which cannot be referred to individually by any construction or other specific individual
definition may be described as “unmentionable” numbers or sets. They could also be described as “incompressible”
numbers and sets because the ability to reduce an infinite specification to a finite specification
is a kind of compression. For example, the real number π has an infinite number of digits in its decimal
expansion, but the expression $\pi = 4 \sum_{n=0}^{\infty} (-1)^n (2n+1)^{-1}$ is a finite specification for π which uniquely
defines it so that it can be distinguished from any other specified real number.
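To see the compression at work, one can evaluate partial sums of this series. This is a minimal sketch with an invented function name; exact rational arithmetic is used so that no rounding enters until the final conversion to a decimal.

```python
from fractions import Fraction

def leibniz_partial_sum(terms):
    """Exact value of 4 * sum of (-1)^n / (2n+1) for n = 0 .. terms-1."""
    total = Fraction(0)
    for n in range(terms):
        total += Fraction((-1) ** n, 2 * n + 1)
    return 4 * total

for terms in (1, 10, 1000):
    print(terms, float(leibniz_partial_sum(terms)))
```

The program text is a few lines long, yet it determines as many digits of π as one has patience for. The series converges very slowly, so this is a specification of π rather than a practical way to compute it.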
2.10.5 Remark: Unmentionable numbers and sets may be thought of as dark numbers and dark sets.
Pixie numbers and sets, the numbers and sets which we know are “out there” but can never see or think
about individually, could be described as “dark numbers” and “dark sets” by analogy with “dark matter” and
“dark energy”. (The difference, perhaps, is that we don’t really know that dark matter and dark energy are
out there.) The supposed existence of dark matter and dark energy is inferred in aggregate from their effects
in the same way that dark numbers may be mentioned in aggregate. (See Remark 2.11.12 for classification
of dark numbers into black numbers and grey numbers.)
[ How does the classification of discomfort levels in Remark 2.10.6 change if ZF is replaced by NBG set theory?
Proper classes presumably have even more “issues” than sets. ]
2.10.6 Remark: Ant-and-pixie analogy for discomfort levels with unmentionable sets and numbers.
The levels of discomfort with infinite and unreachable sets described in Remark 2.10.3 may be further
described as follows.
Discomfort level 0: Ants in a Formicarium. (Finite set.)
All inhabitants are ants. You can see any individual ant. If you wish, you can see all ants as individuals.
This is a metaphor for a set theory which has only finite sets. Each set is like a formicarium with only
a finite number of ants, which can all be referred to individually within the language of the set theory.
Discomfort level 1: Ants in the Garden. (Countably infinite set.)
In this case, all of the inhabitants are ants, and you can see any ant individually. You can’t see them
all individually because there are too many of them. (There are in my garden anyway.) But you can
see as many as you like individually. No ant is incapable of being detected if you make enough effort.
This is a metaphor for a set theory in which there are infinite sets such as the set ω of finite ordinal numbers.
Although we can’t write down all the elements, we can single out any finite subset of them, given a
large enough piece of paper to write on or enough time to pronounce all the words of the specification.
Even though we don’t have infinite paper or infinite lives, the induction principle is very convincing:
No matter how large a number is, we can always append a zero to make it bigger. Virtually everyone
is convinced by this argument after about the age of 10. Most people are content to refer to all the
integers in aggregate although we can never write them all down individually.
Discomfort level 2: Both Ants and Pixies in the Garden. (Uncountably infinite set.)
In this case, there are infinitely many ants in the garden, but there are infinitely more numerous invisible
pixies in the garden. You can see any individual ant, but not all of them individually in one lifetime.
However, you can’t see or refer to any of the pixies individually. You can refer to them only in aggregate.
This is a metaphor for a set theory in which an “infinity axiom” brings into existence infinite sets such
as ω, and a “power set axiom” brings into existence sets such as $\mathbb{P}(\omega)$, which is the set of all subsets of
an infinite set. The set $\mathbb{P}(\omega)$ is equivalent to the set $2^\omega$ of all infinite sequences of zeros and ones, which
may be thought of as the binary numbers in the real interval [0, 1) with infinite binary digits after the
“decimal point”. The set $\mathbb{R}$ of real numbers is therefore also a mixture of mentionable numbers and
unmentionable “pixie numbers”.
At this discomfort level, we can individually specify any finite number of elements of the countably
infinite subset of mentionable elements of $2^\omega$, but we can’t individually specify any elements of the
uncountable subset of unmentionable elements of $2^\omega$. Luckily, the mentionable elements are those of
greatest interest because if we can think of an element, we can always write down a description in finitely
many words. The unmentionable numbers can be referred to in aggregate. They are like pixies which
we will never see, but we “know” they’re there. We can refer, for example to the set of all sequences of
zeros and ones which start with a one. That is an aggregate description, but we can’t write down most
of the elements of this set.
So we have lots of ants to work with, and they are the most interesting things because they are the
things which we can think of individually. But there are infinitely more pixies than ants. We can never
even think about the pixie sets and numbers individually with a finite mind. The number π is a real
number which we can think of and specify. (E.g. $\pi = 4 \sum_{n=0}^{\infty} (-1)^n (2n+1)^{-1}$ is a finite specification
of an infinite number of decimal digits. See Remark 2.10.4.) So π is an ant in this metaphor, not a
pixie. We can’t see all of the digits in the decimal expansion of π, but at least we can specify this ant
unambiguously and see as much of it as we have time for.
It is not possible to give an example of an unmentionable number because, by definition, they cannot
be individually mentioned! This is certainly disturbing. We know they exist, but we can’t give any
examples. In real life, when someone claims that things of a particular kind exist, but no examples can
ever be given because they are undetectable by any means, such a person would generally be humoured
and politely avoided.
Discomfort level 3: Only Pixies in the Garden. (Set of AC-existence-only sets.)
In this case, a set has elements which are all pixies. We can talk about the set as a whole in aggregate,
but we can never see a single individual element because every element is an unmentionable pixie set.
The axiom of choice brings into existence sets which are “constructed” according to some unspecified
infinite set of random choices which are unknown and unknowable. For example, a Lebesgue non-
measurable subset of the real numbers is guaranteed by AC to exist, but there is no way to know its
membership. Invisible sets such as Lebesgue non-measurable sets may be thought of as “dark sets” in
a rough analogy with the hypotheses of dark matter and dark energy in cosmology.
If at least one Lebesgue non-measurable subset of the real numbers exists, then there are uncountably
many such subsets. So we “know” (by the axiom of choice) that there exist an uncountable infinity of
pixies in the garden of Lebesgue non-measurable sets, but we can’t see any individual pixie. We can
refer to the pixies in the garden only in aggregate. If U denotes the set of Lebesgue non-measurable
functions $f : \mathbb{R} \to \mathbb{R}$, then U and the subset $\{f \in U;\ f(0) = 1\}$ of U are well-defined aggregates,
but no example elements of either set can be written down.
There are no visible ants at all in such a garden, which is very frustrating indeed. We are happy to
assume that there are billions of people in the world whom we will never meet because we can meet as
many of them as we like and extrapolate from our experiences, but in the case of a garden where all
the pixies are invisible, we don’t even have a starting point on which to base the extrapolation of our
experience. Absolutely no individual examples of the set’s members can be presented.
A more profoundly worrying aspect of pixies-only sets is that you can make claims about their members
without fear that someone will construct a counter-example to your claim! Disproving a conjecture by
construction of a counter-example is impossible. People who believe in the existence of sets like this
should be humoured and politely avoided. (Admittedly, there are other ways of disproving conjectures
than the provision of counterexamples. But it is very disturbing to know that one important method
of proof, namely the construction of a counter-example, is removed from the tool-box in advance.)
[ Get a reference for who showed that ZF is consistent with the negative of AC. In that case, Lebesgue non-
measurable sets are known to not exist. Find out if Gödel’s theorems have something to do with “goblin”
sets which have indeterminate existence status. ]
Discomfort level 4: Only Goblins, and we don’t know if they’re there.

Level 3 might seem like it is the most uncomfortable level, but there are sometimes sets whose existence
is not knowable, whereas at level 3, only the membership of the pixie sets is unknowable. For lack of a
better word, sets whose existence or non-existence is unknowable may be referred to as “goblins”.
For example, Zermelo-Fraenkel set theory without the axiom of choice does not say whether Lebesgue
non-measurable sets exist or not. If they do exist, they are unmentionable, but the axioms do not
allow us to show that they exist. (We “know” this because someone proved that ZF with no Lebesgue
non-measurable sets is a consistent axiomatic system. But presumably, like all metalogic and meta-
mathematics proofs, you have to assume quite a lot of basic logic and mathematics in order to make
the proofs work.)
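The identification used at discomfort level 2, between subsets of ω, infinite sequences of zeros and ones, and binary expansions, can be sketched for a mentionable subset. The function names are invented for this illustration, and only a finite prefix of the sequence is ever produced, which is rather the point.

```python
from fractions import Fraction

def indicator_prefix(member, length):
    """First `length` terms of the 0/1 sequence of a subset of omega."""
    return [1 if member(n) else 0 for n in range(length)]

def binary_fraction(bits):
    """Value of the binary expansion 0.b0 b1 b2 ... for a finite prefix."""
    return sum(Fraction(b, 2 ** (k + 1)) for k, b in enumerate(bits))

# The set of even numbers is an "ant": a finite rule specifies it.
evens = indicator_prefix(lambda n: n % 2 == 0, 8)
print(evens)                   # [1, 0, 1, 0, 1, 0, 1, 0]
print(binary_fraction(evens))  # 85/128
```

A pixie subset has no membership rule which could be passed in as `member`, so no such program can be written for it.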
2.10.7 Remark: Informal classification of discomfort levels with unmentionable sets and numbers.
The levels of discomfort in Remark 2.10.6 may be summarized as follows.
level  description                                                                 example
0.     Finite ants. All ants visible in finite time on finite paper.               1000, ZF−∞
1.     Infinite ants. Any single ant visible, but not all in finite viewing time.  ω
2.     Ants and pixies. All ants visible. Pixies invisible, but definitely exist.  $2^\omega$, $\mathbb{R}$
3.     Pixies only. No ants. Pixies invisible, but definitely exist.               ZF+AC
4.     Goblins. All goblins invisible, and they may or may not exist.              ZF
If the infinity axiom is removed from the Zermelo-Fraenkel set theory axioms ZF, the reduced axioms ZF−∞
would be at discomfort level 0. Probably some subset of the ZF axioms (but including the infinity axiom)
would be at level 1. (The power set axiom would have to be removed, for example.) The full ZF set theory is
at discomfort level 2 (if you assume that any set not constructible with ZF axioms definitely does not exist).
If the axiom of choice is added to ZF, this increases to discomfort level 3. (But see Remark 2.10.8.) There
are times when one can tolerate level 3. The reader must then simply be aware that they may encounter a
garden full of invisible pixies which can only ever be referred to in aggregate. As long as the limitations are
understood, there should be no seriously harmful consequences.
ZF set theory has discomfort level 4 if no determination is made on whether sets requiring AC for existence
proofs really do or do not exist. Then, for example, Lebesgue non-measurable sets become goblins whose
existence is indeterminate.
2.10.8 Remark: ZF set theory may have the highest level of unmentionability discomfort.
In a sense, Zermelo-Fraenkel set theory without the axiom of choice may be even more disturbing than ZF
with AC. Although ZF with AC guarantees the existence of the disturbing pixie sets whose membership
is unknowable (and remember that membership is the only property that sets can have in ZF set theory),
ZF without AC is more disturbing because we don’t even know if these disturbing sets are “out there”.
Possibly even more disturbing than this is a ZF set theory where the existence of Lebesgue non-measurable
sets is excluded by an axiom. This is, apparently, a logically consistent axiomatic system. It is not clear
whether it is better to exclude the most disturbing sets, or learn to live with them, or just adopt a “don’t
ask” policy. In this book, the “don’t ask” option is adopted; that is, simple ZF with no axiom of choice. In
other words, there may be some goblins out there.
The situation where the ZF axioms and deduction rules are inadequate to determine whether Lebesgue
non-measurable sets exist is illustrated by Figure 3.8.2 in Remark 3.8.2.
[ Find a reference for ZF with axiom asserting non-existence of Lebesgue non-measurable sets. ]
2.10.9 Remark: The human mind is a finite machine in a countably infinite playpen.
One may say that the human mind is a finite machine which is confined to a countably infinite playpen in an
uncountably infinite world. This is true even in pure mathematics without considering the empirical world
at all. The set of mentionable real numbers is a countable playpen within which we can do mathematics.
An uncountable infinity of unmentionable numbers are forever out of reach of human thought. It is quite
embarrassing to realize that mathematical analysis has such a dubious basis, and that most of physics and
the other sciences are expressed in terms of this flimsy foundation.
The “countable playpen” is evaded in practice by referring to infinite sets of mathematical objects in aggregate.
Otherwise analysis would be impossible. But our real-world experiences develop a strong belief that we can
test general propositions by drawing a random sample from a population and applying the test criteria to the
sample. But in many areas of mathematics, the “random sample” excludes in advance almost all elements of
the general set. Thus all “random samples” are in fact strongly biased. So this practical method for allaying
scepticism of general propositions is not applicable, or is seriously flawed at best.
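The bias is easy to exhibit if floating-point numbers are taken as a stand-in for the mentionable reals. That substitution is an assumption of this sketch and is much cruder than the situation in the text. Every “random real” which the generator can return is a fraction whose denominator is a power of two.

```python
import random
from fractions import Fraction

def is_power_of_two(n):
    """True when the positive integer n is 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0

rng = random.Random(0)  # fixed seed so that the run is repeatable
samples = [rng.random() for _ in range(1000)]

# Fraction(x) converts a float to the exact rational which it stores.
print(all(is_power_of_two(Fraction(x).denominator) for x in samples))  # True
```

So a test of a proposition about all real numbers in [0, 1) by such sampling never once touches an irrational number, although the irrationals are almost all of the interval.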
2.10.10 Remark: Very large sets are analogous to dynamically allocated “virtual” computer memory.
The problems with dark sets and dark numbers are less troubling if we take a “dynamic existence” approach to sets and
numbers, analogous to the “virtual memory” concept in computer operating systems.
In modern computers, a program is allocated a large space of computer addresses which it can access.
However, those addresses do not “exist” until the program tries to read or write the contents of the addresses.
The operating system dynamically allocates real memory to represent the virtual memory addresses when
the program attempts an access.
This is a bit like a sandy tennis court which has no hard surface for the ball to bounce on, but just before
the ball hits the ground, a small area of hard surface is placed where the ball is about to hit. From the ball’s
point of view, the whole tennis court is a hard, flat, continuous area. The advantage of this approach is that
it economizes the available pieces of hard court surface if they are in short supply.
Sets and numbers are very much like this. The Zermelo-Fraenkel axioms for sets and the Cantor construction
for real numbers guarantee the “existence” of sets and numbers whenever the mathematician wishes them
to exist, in accordance with various construction principles. So when they are required, they come into
existence.
Now when one talks about the existence of, say, all of the real numbers, one should ask whether one is talking
about the virtual real numbers which exist in principle (but which are not yet “allocated”), or the real real
numbers which have been brought into existence dynamically by some construction operation.
If one takes an anthropological view of sets and numbers, one finds that in practice, all sets and numbers
are defined by procedures or algorithms. Apart from the random sets which are brought into existence by
the axiom of choice, all sets and numbers are arrived at by a construction of some sort, which can be written
as a linear sequence of symbols. This countable space of constructions of sets and numbers yields numbers
which are brought into existence as required, if the procedures or algorithms conform to the rules of the
system.
This “dynamic allocation” picture of sets and numbers relegates the dark sets and numbers to the “virtual
memory” part of the system, whereas the “reachable” or “mentionable” sets and numbers are dynamically
brought into existence as “real memory” as required.
The fact that this picture is dynamic is not a problem. Human mathematical thought is dynamic. So in
the anthropological sense, the picture is accurate. The desire for a static picture arises from the fact that
mathematicians think of their concepts of sets and numbers as being static.
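The “dynamic allocation” picture of this remark can be sketched as a toy address space. The class `LazyMemory` is invented for this illustration. A real operating system allocates whole pages rather than single cells, so this is only a caricature of virtual memory.

```python
class LazyMemory:
    """Toy address space whose cells exist only once they are accessed."""

    def __init__(self, size):
        self.size = size        # number of "virtual" addresses
        self.allocated = {}     # the cells which "really" exist so far

    def _check(self, address):
        if not 0 <= address < self.size:
            raise IndexError(address)

    def read(self, address):
        self._check(address)
        # A first read brings the cell into existence with content 0.
        return self.allocated.setdefault(address, 0)

    def write(self, address, value):
        self._check(address)
        self.allocated[address] = value

mem = LazyMemory(2 ** 64)
mem.write(12345678901234567890, 7)
print(mem.read(12345678901234567890), len(mem.allocated))  # 7 1
```

The program may address any of $2^{64}$ cells, but only the cells which it has touched occupy real storage. In the analogy, the untouched cells play the role of the dark numbers.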
2.11. Integers and infinity

2.11.1 Remark: Mathematical induction and the infinity of integers.
The principle of mathematical induction is fundamental to all of mathematics. It is probably the most
important of all mathematical principles. It is essentially equivalent to the concept of infinity. If this
principle is not “true”, then there are no infinite sets. Whether or not the integers are infinite is related
to the questions: whether the universe is infinite, and whether time and space are infinitely divisible. The
Euclidean geometry model for space is infinite and infinitely divisible, but these properties are never needed
in practice. Euclidean geometry was suitable for the surface of the Earth as long as one did not apply
the model to unduly large extents. It was a model which was true enough within its range of application.
Similarly, whether the integers are infinite is irrelevant if one can never use an infinite number of integers in
practice.
2.11.2 Remark: Infinite integers versus the finite human mind and finite universe.
It is very difficult to reconcile the idea of infinitely many integers with the finite human mind. A finite
mind or computer can only represent a finite number of numbers. For example, a computer memory with N
bits of storage can represent at most $2^N$ different numbers. These do not need to be the integers from 0 up
to $2^N - 1$. For example, the number $2^{1,000,000}$ can be represented in very little space. One might refer to such
an expression as a “compressed number representation”. (The familiar floating point representations are only
a particular style of number compression.) Although this number has 1,000,000 binary zeros after the initial
digit 1, only a few bytes are required for its representation rather than a million. But if 1,000,000 binary
digits are chosen at random, it is very unlikely that such an integer can be “compressed”. Representation
of all integers less than $2^{1,000,000}$ requires at least a million binary bits of storage. This is not difficult to do
with modern computers.
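The contrast between a compressible and an incompressible million-bit number can be tried out with a general-purpose compressor. This sketch substitutes zlib for the “compressed number representation” of the text, and a seeded pseudo-random generator for truly random digits. Both substitutions are assumptions made for the illustration.

```python
import random
import zlib

N_BITS = 1_000_000

# 2**1_000_000: a one followed by a million binary zeros.
power = (1 << N_BITS).to_bytes(N_BITS // 8 + 1, "big")

# A million pseudo-random bits (fixed seed so that the run is repeatable).
noise = random.Random(0).getrandbits(N_BITS).to_bytes(N_BITS // 8, "big")

power_packed = len(zlib.compress(power, 9))
noise_packed = len(zlib.compress(noise, 9))
print(len(power), "->", power_packed)
print(len(noise), "->", noise_packed)
```

The power of two shrinks to a small fraction of its raw size, whereas the pseudo-random digits do not shrink at all. (Strictly speaking the pseudo-random number is itself highly compressible, since the seed and the algorithm specify it. zlib merely fails to find that description.)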
Now consider the numbers from 0 up to $2^N$ where $N = 10^{1,000,000}$. Representation of all of these numbers
requires at least $10^{1,000,000}$ bits of storage. The amount of matter in the Universe is around about $5.68 \times 10^{56}$
grams. (See Misner/Thorne/Wheeler [38], box 27.4, page 738.) This corresponds to about $3.40 \times 10^{80}$
protons. If each proton stores one bit of information, this is still very much less than the required $10^{1,000,000}$
bits. So even with a brain as big as the Universe, one inevitably finds that there must be integers which
cannot be thought about, no matter how big the brain is.
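The proton count can be recovered from the quoted mass, assuming a proton mass of about $1.673 \times 10^{-24}$ grams (a value which is not given in the text). The second line compares orders of magnitude.

```latex
\begin{align*}
\frac{5.68 \times 10^{56}\ \text{g}}{1.673 \times 10^{-24}\ \text{g}} &\approx 3.40 \times 10^{80}, \\
\log_{10}\bigl(3.40 \times 10^{80}\bigr) &\approx 80.5 \ll 1{,}000{,}000 = \log_{10}\bigl(10^{1,000,000}\bigr).
\end{align*}
```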
There is also a limit to the number of integers that can be represented in a given time because of special
relativity and quantum effects. The most serious restriction is the “bandwidth bottleneck” of any mind.
There is a very low upper bound on the number of bits of data which can be fed into a mind in a given time.
One must conclude that the vast majority of the numbers between 0 and $2^{10^{1,000,000}}$ will never be thought
about by any being or system in the Universe. Even if the Universe was infinitely voluminous, it would not
be possible to communicate the state of such a Universe to any subset of the Universe (such as a human
mind) in finite time due to bandwidth constraints. So if we now crank up the number N to much larger
values, it is clear that 0% of all integers can ever be thought about if there are an infinite number of them.
The situation is much worse than this. Not only will almost all integers not be thought about; almost
all integers cannot be thought about. There are only a finite number of methods that human beings will
ever devise for “compressing” integers. By writing $10^{10}$ instead of 10,000,000,000 we obtain a compression
factor of about 2.5. By writing $10^{1,000,000}$ instead of a million-and-one decimal digit number, we obtain a
compression factor of about 100,000. There are only a finite number of ways of compressing integers into
such notations. Consider the sequence of numbers $n_i$ defined by $n_0 = 1$ and $n_{i+1} = 2^{n_i}$ for (finitely many)
integers i. These are 1, 2, 4, 16, 65536, $2^{65536}$, and so forth. Even at i = 5, the number $n_i$ is painful to write
down as decimal digits. But the numbers become very large very quickly. Suppose we take the $10^{1,000,000}$th
element of this sequence. That will be a really huge number. So now use that number $n_i$ for $i = 10^{1,000,000}$
as the number i. This is now monstrously bigger. Now do this feedback process $10^{1,000,000}$ times. The
number will be stupendously big. However, no matter what method we specify, the result will take a finite
amount of space to represent. The description of any integer, no matter how much “compression” we use,
will require a finite amount of time to describe or a finite amount of space on a sheet of paper or in a
computer. Even if we use all of the atoms of the Universe, there are still only a finite number of numbers
that can be represented, because there are only a finite number of methods of specifying sequences and other
ways of describing integers. Therefore some integers cannot be described.
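The growth of the sequence with $n_0 = 1$ and $n_{i+1} = 2^{n_i}$ can be seen from its first six terms. This is a small sketch with an invented function name. The digit count is obtained from a logarithm instead of printing the number, since some Python versions refuse by default to convert integers of this size to decimal strings.

```python
import math

def tower(count):
    """First `count` terms of n_0 = 1, n_{i+1} = 2 ** n_i."""
    terms = [1]
    while len(terms) < count:
        terms.append(2 ** terms[-1])
    return terms

terms = tower(6)
print(terms[:5])               # [1, 2, 4, 16, 65536]
print(terms[5].bit_length())   # 65537

# Number of decimal digits of 2**65536.
digits = math.floor(65536 * math.log10(2)) + 1
print(digits)                  # 19729
```

The next term, $n_6 = 2^{n_5}$, would need $n_5 + 1$ binary digits, which is far more than any computer can store. The description “tower(7)[6]” is nevertheless only a few characters long.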
[ At this point, the Kolmogorov-Chaitin complexity might be relevant. See EDM2 [35], 354.D. This might
have something to do with the compressibility of numbers. ]
2.11.3 Remark: An argument against the validity of the induction principle.

One may conclude from Remark 2.11.2 that (1) there are only finitely many integers which can be represented
in the Universe in which we live, and (2) there is a maximum integer which can be represented. But now the
intuitive principle of induction should save us. The intuitive induction principle says: “Whatever the largest
integer is, just add one. Then you have a bigger integer. That contradicts the hypothesis of a finite number
of integers. Therefore the integers are infinite.” This intuition is so fundamental to modern mathematics
that it is almost impossible to reconcile with the above argument for the finity of integers. (It is difficult to
let go of a belief which is acquired at a very young age!)
The problem with the induction principle is that you can’t know which integer is the largest. Therefore you
cannot add 1 to it. This is related to Zeno’s paradox of Achilles and the tortoise. It is also related to the
atomic theory of matter. More precisely, consider the analogy with the finite speed of light. “No matter
how fast you travel, you can always travel faster.” Even within special relativity, this is true. But there is
a finite limit to speed. The problem is that the amount of energy required for each speed boost increases so
that one can never exceed the speed of light. (One would need an infinite amount of fuel, and this would
make the vehicle infinitely massive, and so forth.)

In the case of integers, you must be able to represent an integer in order to add 1 to it. No matter which
integer you pick, even if you take the millionth power of this integer, you still only have yet another large
integer. The finity of integers is related to this kind of paradox: “One more than the largest number which
can be described in thirteen words.” This seems to describe a number in 13 words. The problem is that
it doesn’t describe a number. It describes a recursive relation which is not soluble. So it is not a paradox.
It is similar to “the solution of x = x + 1”, but the contradictory recursion is easier to disguise in plain
English. (See Remark 8.4.4 for discussion of x = x + 1.) In the same way, “the largest number which can
be represented” is not a representation of a number. The source of the paradox is that a human mind is
trying to talk about the largest number which can be represented by a universe which includes that human
mind. A system cannot represent all of the states of a system which includes itself. That is as recursive and
meaningless as the 13-word pseudo-paradox above. Therefore there is no paradox in assuming that there are
only a finite number of integers.
2.11.4 Remark: We don’t really need an infinite number of numbers.

One should not feel too uncomfortable with a finite set of integers. Euclidean geometry assumes a point
set of infinite extent, but we use this for 2-d maps of the Earth and 3-d models of the Universe. We just
ignore the useless bit outside the domain of interest. This is as harmless as Newtonian mechanics being
applied to everyday situations. We still use Newtonian mechanics for almost all practical purposes, but we
ignore the part of the model which we don’t need. In the same way, we should not really care if there are
finite or infinite integers in our model for human and mechanical computation processes. We will never need
most of the integers in the infinite set. So it is of no importance that most of the integers will never be
thought about, and never can be thought about, by any human or computer, or any network of humans and
computers.
2.11.5 Remark: The assumption of infinitely many integers avoids the need for boundary conditions.
Although the “real world” set of integers, which we model by the mathematical integers, is fairly clearly
finite, there is one excellent reason for assuming that the integers are infinite. That reason is the avoidance of
boundary conditions. If we defined the integers to be finite, all theorems and proofs would need to deal with
the boundary condition at the “high end” of the set. We would need to say things like: “Let $a_{n+1} = 2a_n$ if n
is less than the largest integer; otherwise $a_{n+1}$ is undefined.” Working with such a “closed set” of integers,
we would always need to have separate conditions for the “interior” and “boundary” of the set of integers.
This is because our logical systems require the total absence of contradictions and errors. (A modified logical
system could, in principle, permit fuzziness of the logic at the boundaries of the system. But that would be
a revolutionary step backwards which would rob mathematics of any credibility which it currently possesses.
For example, proof by contradiction would cease to be valid. See Section 3.11 for discussion of proof by
contradiction.)
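Programmers already live with this kind of boundary condition. The following sketch assumes a hypothetical largest integer `MAX_INT`, set here to the largest signed 64-bit value purely for illustration, and shows the extra case which a simple doubling rule then requires.

```python
MAX_INT = 2 ** 63 - 1  # hypothetical largest representable integer

def doubled(a):
    """Return 2*a, or None where the result would exceed MAX_INT."""
    if a > MAX_INT // 2:
        return None   # boundary case: the doubled value is undefined
    return 2 * a      # interior case: ordinary arithmetic

print(doubled(3))        # 6
print(doubled(MAX_INT))  # None
```

Every function built on `doubled` must then handle `None` as well, and so the boundary case spreads through the whole system. The fiction of infinitely many integers removes the first branch altogether.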
If we exclude those integers which cannot be represented by the state of a human mind, or by a machine
the size of the Universe, we would need to place conditions on integers like: “If n is a representable integer,
then. . . ”. This would make all of mathematics very messy indeed. (It is true that some mathematics is
nominally finite, but mathematical logic itself makes extensive use of induction. So induction is explicit or
implicit in all of modern mathematics.)
The strongest argument for the infinity of the integers and the principle of mathematical induction is conve-
nience. The infinite integers in Definitions 7.3.2 and 7.3.5 are simply the most convenient kinds of integers
which include all of the integers that we will ever want to refer to, and we don’t have to extend the set every
time someone extends the integers by thinking up a previously unknown method of describing integers. This
is like using Euclidean 3-space to describe the Universe so that we don’t have to say how big it is or whether
it is infinite. Euclidean 3-space has enough points for our purposes. If we don’t use it all, that’s not a
problem. If we run out of numbers or points, that would be a problem.
Similarly, in physics and engineering, the behaviour of systems after they achieve equilibrium is routinely
studied, although equilibrium requires infinite time. It’s just more convenient to ignore the modelling error
and work with the infinite-time solution.
In conclusion, it will be assumed in this book that the integers are infinite and that the principle of mathe-
matical induction is valid. This is a convenient and harmless fiction, as long as you don’t think about it too
deeply. It’s just a model. All models are imperfect because models are in the mind, and the world is “out
there”. The mind is too simple to represent the complexity of the world because a mind is a tiny subset of
the world. Mathematics is the study of models, not of the real world.
2.11.6 Remark: Practicality of mathematics using only a finite number of integers.

There could be applications for a mathematical theory of integers which explicitly recognizes “compressible
integers” and works exclusively within such a framework. Then one could represent a large number of
integers using a “compression code book”. Using this code book, very large numbers may be represented
by shorter strings of symbols, like $10^{1,000,000} - 1$ to represent a number with many decimal digits. Since the
code book must be finite, one must be careful to include in the code book as many important integers as
possible, omitting “uninteresting” integers. It would be possible to define large finite sets of integers which
have well-defined partial closure properties. Obviously no finite set can be closed under normal addition. But
some sort of “weak closure” could be useful. Such almost-closed sets of integers would have the advantage
of being implementable in a finite-state machine.
Such a finite theory of integers is routinely implemented in computers. The problems which arise from such
finite representations are well known in the field of numerical analysis. This topic lies outside the scope of
this book.
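As a minimal sketch of why a finite set of integers cannot be closed under normal addition, the following Python fragment emulates an 8-bit unsigned integer type. The names BITS, MODULUS, checked_add and wrapping_add are assumptions made for this illustration only.

```python
BITS = 8              # assumed word size for the illustration
MODULUS = 1 << BITS   # 256 representable values: 0, 1, ..., 255


def checked_add(x: int, y: int) -> int:
    """Ordinary addition, refusing any result that leaves the finite set."""
    total = x + y
    if total >= MODULUS:
        raise OverflowError("sum is not representable in %d bits" % BITS)
    return total


def wrapping_add(x: int, y: int) -> int:
    """Addition modulo 2**BITS: closed, but no longer ordinary addition."""
    return (x + y) % MODULUS
```

Here checked_add(200, 100) fails because 300 lies outside the set, whereas wrapping_add(200, 100) returns 44. Wrap-around restores closure only by changing the meaning of addition, which is one possible reading of “weak closure”.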
2.11.7 Remark: The “biggest number” is time-dependent.

A useful way to think of the set of representable integers is to consider the set as time-dependent. The integers
which have been represented somewhere on Earth yesterday may be fed into functions which generate new,
bigger integers today. Today’s biggest integer will be larger than yesterday’s biggest integer. This helps to
explain our intuition that no matter how large an integer is, we can always make it bigger. This is true, but
making integers bigger takes time.
One might argue that in the long run, we can generate arbitrarily large integers. But as the old saying goes:
“In the long run, we’re all dead.” (This saying is generally credited to John Maynard Keynes.) Just as a
pseudo-number “infinity” can be defined as the solution to x = x + 1, interpreted as an iteration formula (as
mentioned in Remark 8.4.4), so also we can think of the definition of the “biggest number” as the solution
of an iteration process which daily increases the number of integers.
In this perspective, the assumption that the integers are infinite is a way of avoiding the need to keep
increasing the “biggest number” every day as people need larger and larger numbers.
One of the first signs that a child is taking an interest in mathematics is when they ask: “What is the biggest
number?” Sometimes children think up a really big number and think it is the biggest for a while, and then
realize that they can add 1, multiply by 10, or do something else to make it bigger. Eventually they settle
on “infinity” as the biggest number, until they find out that this has certain technical difficulties also.
Like most of the ancient riddles and paradoxes of mathematics, the principle of mathematical induction
remains incompletely resolved to the present day. Formalization of irresolvable difficulties by axiomatizing
them as in Definition 7.3.2 does not solve these problems. Axiomatization is merely a compromise agreement
to call a truce so as not to raise philosophical difficulties which get in the way of mathematical productivity.
If the agreement is looked at too closely, the compromise may unravel. Axiomatization can only formalize
and standardize our current state of ignorance. It does not remove the ignorance.
2.11.8 Remark: Infinite sets can be referred to finitely in aggregate.

Although we can only represent a finite number of integers in a finite time in a finite universe with finite
communication channels, it is possible to refer to infinite sets of integers in aggregate. (The words “set”
and “aggregate” have very similar meaning.) For example, it is impossible to list all of the odd numbers
individually, but it is easy to refer to the set $\{n \in \mathbb{Z};\ \exists m \in \mathbb{Z},\ n = 2m + 1\}$.
The very-large-sets problem also disappears in the case of aggregates. For example, we cannot list all of the
integers from 0 up to 2N where $N = 10^{1,000,000}$, as mentioned in Remark 2.11.2, but we can easily refer to
the set $\{n \in \mathbb{Z};\ 0 \le n \le N\}$.
The set $\{n \in \mathbb{Z};\ 0 \le n \le N\}$ is well defined for any representable integer N, even though we cannot list
the elements if N is too large. The set is a well-defined aggregate because there is a well-defined procedure
to determine if a number is or is not in the set. A set is defined by its membership. So the ability to
answer “yes” or “no” to the membership question is the only pre-requisite for meaningfulness of the set.
Similarly, the infinite set of all natural numbers is a well-defined aggregate because there are well-defined
membership determination procedures. Another example is the set of prime integers. (See Definition 7.4.2.)
It may be very difficult in practice to determine whether any given integer is a prime, but the procedures
are well-defined. So the set is well-defined.
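The idea that a set is defined by its membership procedure, not by a listing of its elements, can be sketched in Python as follows. The function names are hypothetical, and trial division is used for primality only because it is the simplest well-defined procedure, not because it is practical.

```python
N = 10 ** 1_000_000  # a representable bound with about a million decimal digits


def is_odd(n: int) -> bool:
    """Membership test for the set of odd integers."""
    return n % 2 == 1


def in_initial_segment(n: int) -> bool:
    """Membership test for the set of integers n with 0 <= n <= N."""
    return 0 <= n <= N


def is_prime(n: int) -> bool:
    """Membership test for the primes: well defined, though slow for large n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True
```

None of these functions lists the elements of its set. Each one answers only the “yes” or “no” membership question for a given integer.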
2.11.9 Remark: Operational definitions are required for representable sets and numbers.
A requirement for the practical definition of any aggregate of integers is the availability of a procedure to
determine whether any given integer is or is not inside the aggregate. There is a corresponding requirement
for the definition of sums of series of real numbers.
A number or set is “represented” or “pointed to” or “indicated” by a formula or expression if it is practically
possible to determine if this number or set is the same or different to any number or set that someone else
represents. For example, no one can write down all the decimal places of π or e, but it is easy to determine
that these numbers are different. Given two different series expressions for π, it is possible to verify that
the sums of the series are equal. (See also Remark 2.12.4.) Thus a “representation” of a number or set is
the same sort of thing as an “operational definition”. (Lebesgue non-measurable sets, by contrast, are not
representable. Nobody knows which numbers are members of such sets.)
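A rough numerical illustration of comparing two series expressions for π is sketched below. It assumes two standard series (the Leibniz series and the Nilakantha series) chosen for this example. Agreement of partial sums to a given tolerance is only evidence, not a verification that the sums are equal.

```python
def leibniz_pi(terms: int) -> float:
    """Partial sum of 4 * (1 - 1/3 + 1/5 - 1/7 + ...)."""
    return 4.0 * sum((-1) ** k / (2 * k + 1) for k in range(terms))


def nilakantha_pi(terms: int) -> float:
    """Partial sum of 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ..."""
    total = 3.0
    for k in range(1, terms + 1):
        total += (-1) ** (k + 1) * 4.0 / ((2 * k) * (2 * k + 1) * (2 * k + 2))
    return total


def agree(x: float, y: float, tolerance: float) -> bool:
    """Operational comparison of two represented numbers at finite resolution."""
    return abs(x - y) < tolerance
```

For example, agree(leibniz_pi(10000), nilakantha_pi(10000), 1e-3) holds, while the same test applied to a partial sum for π and a value near e fails at any reasonable tolerance.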
2.11.10 Remark: Semi-pixie integers. Analogy to limited resolution of telescopes.

The integers which we can never write down or “indicate”, because they have too many decimal digits
and cannot be compressed by any humanly possible algorithm so as to fit into the state space of the total
machinery available to humanity, may be termed “semi-pixie integers”. These are the integers which we can
refer to in aggregate but never individually. Of course, there must be a grey area between the indicatable
and non-indicatable integers. The set of semi-pixie integers is not well-defined. If we could write down one
of the semi-pixie integers, it would not be a semi-pixie integer. You know that they are “out there”, but
you can never see them. We don’t even know which integers are “visible” and which are not.
This could be thought of as a resolution problem. The resolution of our human ways of indicating things is not
fine enough to resolve all individual integers. We can refer to vast sets of integers just as we can see pictures
of distant galaxies of which we cannot see any individual atoms. We know that there are atoms out there,
but we will never see them. Our inability to resolve individual atoms in distant galaxies does not make us
uncomfortable with the assumption that the atoms are indeed there. The non-indicatability of most integers
is no more frustrating or bewildering than this.
2.11.11 Remark: The abundance of integers.
We can think of all infinite structures in mathematics as arenas in which mathematical activity can take
place. The closure axioms of infinite mathematical structures are sufficient to guarantee that whatever
activity takes place will remain within the arena, but this does not imply that such activity can ever reach
all parts of the arena. Some parts are unreachable, and most of the theoretically reachable parts of the arena
will never be reached at any time in human history. We can think of this as a kind of abundance of integers.
Given the choice between scarcity and abundance, probably abundance is the lesser of the two discomforts.
2.11.12 Remark: Classification of dark numbers into black numbers and grey numbers.
Perhaps we could sub-classify dark numbers (mentioned in Remark 2.10.5) into black numbers and grey
numbers. The dark real numbers which are essentially random and incompressible could be thought of as
black numbers because they absolutely cannot be represented at all in any finite system such as a human
mind or digital computer. But an integer with a randomness which cannot be compressed to an expression
which is representable in a computer the size of a planet is only a grey number because the incompressible
size requires only a finite state space, although that finite state space is far beyond all known technologies.
In this sense, the childish question about the “biggest number” does have a serious answer. There is a
biggest number which can be thought about or talked about, but if we knew which number this was, we
could add 1 to it and make a bigger number. In other words, the biggest number is only the biggest number
until we actively think about it. To be more precise, the biggest number is time-dependent.
2.11.13 Remark: Zermelo-Fraenkel set theory axiom of infinity.

The infinity axiom, Definition 5.1.26 (8), has some difficulties. We will never be able to write down all of
the finite ordinal numbers. The main thing which convinces people to accept this axiom is the difficulty in
seeing where the numbers would run out. The induction argument is that if there are only finitely many
integers then there must be a last integer, and we can add 1 to this to get a larger integer, which is a
contradiction. This is quite convincing. We will never see an infinite number of integers, but “we know
they’re there”. Mathematics seems to be full of objects which are invisible and unknowable, but this infinity
axiom is perhaps the least disturbing of these mysteries.
A more disturbing consequence of the infinity axiom is its combination with the power set axiom,
Definition 5.1.26 (5), to show that the set $\mathbb{P}(\omega)$ of all subsets of the non-negative integers is well defined. This can
be identified with the set $2^\omega$, which is equivalent to the set of all real numbers in the interval [0, 1) expressed
in binary (base-2) form. Since there are only countably many sentences that can be written in ZF theory,
almost all elements of $\mathbb{P}(\omega)$ are “unreachable”. That is, they can never be written down. So almost all such
sets are eternally beyond individual description by any mathematician. These sets can be used in aggregate,
but they cannot be singled out and discussed individually. Whereas we can single out any single integer
in ω, but not all of them, the vast majority of subsets of ω cannot even be singled out. Nevertheless, most
mathematicians nowadays do not worry about this too much.
So the progression of discomfort here is from level (1): elements of a countably infinite set can be singled
out, but they can’t all be given names in finite time; to level (2): infinitely many subsets of a countably
infinite set can be singled out, but almost all subsets can never be even referred to individually; and to
level (3): none of the elements of a set brought into existence by the axiom of choice can ever be referred
to individually. At level (1), each element may be singly referred to or the whole set can be referred to in
aggregate. At level (2), essentially any element of interest can be singled out, although the rest of the set
may be referred to only in aggregate. At level (3), not even one single element can be referred to as an
individual, and therefore the whole set may be referred to only in aggregate. Consequently, these levels of
discomfort may be worth accepting if referring to aggregates suffices.
By comparison, one may think of all the people on Earth. We may meet any one of them, but we can’t meet
them all. However, we are happy to refer to the aggregate of all people on Earth. (Similarly we are happy
to accept the number π although we will never see all of its digits.) This is level (1) “discomfort”. We can
see many stars, but there are many stars which we will never be able to see even if we try. However, we are
happy to talk about all the stars in the universe in aggregate as a category, and we can see a large number of
examples, which is reassuring. This is level (2) “discomfort”. We know (more or less) that the inside of the
Sun is extremely hot, but we can never go inside with a thermometer to check. However, we are still happy
to assume that the inside of the Sun does exist and that it is indeed hot inside there. This is similar to
level (3) “discomfort”. It seems that humans can cope with some fairly high levels of abstract thinking about
things which will never or can never be observed directly. But there are limits to this kind of abstraction.
In this book discomfort up to level (2) will be accepted, but theorems which require level (3) discomfort will
be specially marked, for example as Theorem [zf+ac]. (See Theorem 20.1.4 for a real example.)
2.11.14 Remark: The origin of classes which are mentionable in aggregate but not as individuals.
The concept of “aggregates” in Remark 2.11.13 arises in real-life set theory from an induction, extrapolation
or generalization process. A person sees a rabbit, then another rabbit, then another. And after a while
one acquires the concept of a general rabbit class which is defined by the observed attributes of the rabbits
which one has seen so far. Then any future rabbits are classified into this class on the basis of the inferred
attributes of the class. Thus the process has the stages: (1) see examples, (2) infer common attributes,
(3) define a class in terms of attributes, (4) classify future class members according to attributes. This
process of class definition, forming aggregate concepts from individual examples, is built into most animals.
(See Figure 2.11.1.) Classification is more or less the same thing as pattern recognition.
In the case of infinite classes, for example the integers, the process is the same. One sees many integers, notes
their common characteristics, defines a class based on these characteristics, and then classifies future objects
as integers on the basis of the characteristics. After a while, the class of objects acquires its own sense of
reality. It is this ability to conceptualize classes or aggregates which is the basis of naive (i.e. in-born) set
theory. Without this mental skill, humans would not be able to learn set theory.
Classes are an important building-block of real-world models. Classes make possible the manipulation of an
infinite number of instances based on attributes of classes rather than the attributes or identity of individual
instances.
Most classes are, in principle, infinite. The word “infinite” means literally “without end”. In other words,
there is no “last element” of an infinite class, or at least, one cannot state which element of the class is the
last. So, since one cannot see an “end” of the set, one assumes, for intellectual economy, that there is no
end. Then literally, the set is “infinite”. (The ZF set theory definition of infinity in fact defines an infinite
set to have no “last element”. That is, every element has a “successor”.)
Figure 2.11.1 The development of classes from individual observations (axes: attribute parameter 1, attribute parameter 2; clusters: mice, rats, hamsters)
Now in terms of this “cognitive theory” for infinite sets, it becomes clear that there is no paradox in stating
that a set may be mentioned in aggregate, but no element of that set may be referred to individually. Thus
we may talk about the real numbers in aggregate, whereas almost all real numbers are not individually
finitely describable in any language, simply because of the finite bandwidth of the human mind and human
communications.
The vast majority of mathematical models and mathematical concepts are expressed in terms of classes, i.e.
aggregates, not in terms of individual things.
2.11.15 Remark: Mathematical induction, the infinity concept, and straws on camel’s backs.
The common argument in favour of the existence of an infinite number of integers is that you can add 1 to
the largest integer to get a larger integer. Therefore there is no largest integer. However, if you can’t write
down the largest integer, you can’t add 1 to it.
The number of atoms in the Universe is of the order of $10^{80}$. Even if we store $10^{10}$ bits of data in each atom,
that gives us only $2^{10^{90}}$ numbers which can be represented with all the matter in the universe. So out of the
first $2^{10^{100}}$ integers, certainly only at most one billionth of all integers can be represented at one time. With
this sort of argument, it becomes clear that there must be a point at which we can no longer keep adding
1 to the integers. This is like the “straws on the camel’s back” argument. We know that a camel (without
further breeding advances) cannot carry 100 tons of straw. But we cannot imagine that a single straw will
break its back. (See also Remark 4.13.12 regarding camels and straw.) In the same way, there must be a
largest integer. We just can’t imagine what it is.
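The arithmetic behind this counting argument can be written out explicitly. The following is a worked restatement of the figures quoted above, not an additional claim.

```latex
\begin{align*}
  \text{available bits} &\approx 10^{80} \times 10^{10} = 10^{90}, \\
  \text{distinct bit patterns} &= 2^{10^{90}}, \\
  \frac{2^{10^{90}}}{2^{10^{100}}} &= 2^{-(10^{100} - 10^{90})}
    = 2^{-10^{90}\,(10^{10} - 1)} \ll 10^{-9}.
\end{align*}
```

So the bound of “one billionth” is extremely generous. The representable fraction is smaller than that by an amount which itself cannot be written out in decimal digits using all the matter in the Universe.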
Therefore there is no concrete representation of an infinite set of integers, even in human minds. (Minds are,
of course, much smaller than the Universe.) Since ZF set theory is only a model, and there is no concrete
system which fulfils the ZF infinity axiom (Definition 5.1.26 (8)), it follows that the conclusions of ZF set
theory can never be obtained. Since all of mathematical analysis (particularly differential equations) relies
heavily on infinite and infinitesimal limits, it follows that mathematical analysis is not validated by ZF set
theory. And since almost all models in physics are expressed in terms of differential equations, the vast
majority of physics receives no support from ZF set theory. Consequently almost the whole of mathematical
physics is essentially baseless.
2.11.16 Remark: For large numbers, name-to-number mapping ceases to be a yes/no proposition.
Perhaps the best way of thinking about the infinity issue in Remark 2.11.15 is to note that as numbers become
larger, the ability to represent the number in a mathematical mind ceases to be a yes/no question. It is
the applicability of the concept of a proposition which fails here, more than the concept of a number. In
the phrase “let N be the largest integer”, the verb “be” is an implicit instruction to construct a
name-to-number map which maps N to the largest integer. It is this constructibility question which changes from a
yes/no proposition into a confidence-level estimation which depends upon resource availability and political
will. Logic is based upon yes/no questions. The logic model is not applicable otherwise.
2.11.17 Remark: Abstract-to-concrete variable name maps and the cosmic information bottleneck.
One might ask why it is that, according to Remark 2.11.15, there cannot be infinitely many integers in any
concrete system, whereas in ZF set theory, we can confidently assert that if M is the maximum integer, then
M + 1 is a larger integer, therefore there is no largest integer, hence the integers are infinite. The explanation
for this apparent contradiction is the fact that the abstract variable name space in ZF set theory (which
contains variable names of sets and numbers) requires a variable name map in order to be applied to a
concrete variable space (which contains concrete sets and numbers). (See Remark 4.12.3 and Figure 4.12.1
for abstract-to-concrete name maps for logical variables.)
When a variable M is used in symbolic logic, the map $\mu_V : N_V \to V$ from the abstract variable name space
$N_V$ to the concrete variable space V is arbitrary and unspecified. The deductions of symbolic logic are
independent of the choice of variable name map $\mu_V$.
Now when we assert in the abstract variable name space that we can add 1 to M, this only has some
significance for the concrete variable space if a variable name map $\mu_V$ is specified. Then we can interpret
M + 1 as addition of $\mu_V(M)$ and $\mu_V(1)$ in the concrete space of integers. However, we cannot even define
the map $\mu_V$ if it requires too much information. The cosmic information bottleneck prevents us from writing
down the map. Therefore the conclusions from the abstract name space are not applicable. Abstract results
from a model are only applicable if a map can be defined from the model to a concrete system.
2.11.18 Remark: Possible unsuitability of infinitely many integers for physics.

On a personal note, this author found the huge excess of real numbers quite disturbing in the early 1970s
while studying both mathematics and physics. In physics, nothing could be measured to more than about 10
significant figures whereas in mathematics, people took seriously the calculation of hundreds of digits of
numbers like π. There were even contests on the mainframes in those days to calculate vast sequences of
decimal digits of π whereas in reality, the ratio of a circle’s circumference to its diameter could never be
measured to more than 10 figures.
The countably infinite set of integers seemed less disturbing than the uncountable set of real numbers. But
now, upon further reflection, the stupendously large set of integers is just as disturbing. Even mathematical
induction does not bear up well under very close scrutiny, although in younger years it seemed almost
self-evidently valid.
[ Remark 2.11.19 is very similar to Remarks 2.4.2 and 2.4.3. ]
2.11.19 Remark: Unsuitability of integers for describing the real world.
Taking scepticism of the integers one step further, it is possible to doubt even that small integers are an
adequate model for anything in the real world. One may argue that the integers have been natural since
human beings were herding cattle 10,000 years ago. Counting cattle was important then. Surely the integers
are an adequate model for the counting of cattle. However, there are problems with this.
To see the inadequacy of integers to describe even the simplest counting task, suppose a cow is dying. At
what point do you say that the cow has ceased to be a cow? Even when high-tech probes are placed in
human beings, it is highly disputed when a human being has died. So in order to count cows, one would
need an agreed definition of cow death so that the number of cows can be reduced by 1. Consider also births.
At what point in time during birth has the new cow come into existence? There must be a time during
which there is ambiguity. If the grazing area is large, it may be difficult to know if a cow is in one’s own
territory or in the neighbour’s territory. Since one cannot observe all points simultaneously, the cow count
could depend on the times and places at which observations are made. There is also a problem with species.
At what point does the breeding of cattle (with or without genetic engineering) lead to a new species which
is no longer described as cattle? If a mutant cow has two heads, is this two cows or one? Or is it a new
species of cow?
It is clear that counting cows does not always yield an integer as we understand integers in modern mathe-
matics. Most counting situations have the same kinds of difficulties as counting cows. When the prehistoric
problems are combined with quantum mechanics and relativity, the problems become even greater. Simul-
taneity of existence of cows is not always well defined. All observations have an irreducible uncertainty.
Clearly the integers provide only an approximate model for real systems. The model is often useful, but only
if one does not examine its accuracy too closely.
These difficulties with integers apply equally to all sets. The whole of set theory is accurate only in describing
the models inside the human brain, not the fuzzy, ambiguous reality which is “out there”.

One might therefore ask whether there are better models for the real world than such crisp concepts as
integers and sets. In fact, this is probably not the best question to ask. A better question is whether integers
and sets provide a good basis for the formalization of human thinking and models. Mathematics provides
tools for thinking and mental models, not for the real world. The mind forms models, and the model can
be formalized in terms of mathematics. It is not the job of mathematics to accurately describe the real (i.e.
phenomenal) world, but rather to provide a language and tool-box for human thinking about the real world.
More accurately, mathematics provides a language framework for the communication of mathematical ideas.
2.11.20 Remark: Infinity and uncountability are negative concepts.

An interesting clue to the possible vacuity of concepts of infinity lies in English-language words such as
“infinite” and “uncountable”. Both of these words are negatives: “infinite” means “not finite” and “un-
countable” means “not countable”. In each case, the word does not actually say what the concept means.
Each word only says what the concept is not.
When one shows that a set is infinite, one typically invalidates the assumption that the set is finite. That
is, one assumes the set to be finite (typically by associating the set with a finite set of integers), and then
arrives at a contradiction by showing that the set has an element which is not in the assumed set.
To show that the real numbers are not countable, one typically uses a diagonal argument, which starts from
the assumption that we have an ordered listing of the elements of the set and then proceeds to construct an
element of the set which is not in the listing.
In each case, when proving that a set is infinite or uncountable, the proof is obtained by challenging the
validity of an assumed listing of elements in the set. Thus no infinite or uncountable list is constructed.
This is certainly unsatisfying, which explains why concepts of infinity and uncountability were so deeply
controversial in the 19th century.
2.11.21 Remark: Proofs of infinity and uncountability use a challenge/response method.

Both the inductive method of proof of the infinity of a set and the diagonal method of proof of the un-
countability of a set may be regarded as “challenge/response” proof methods. In each case, the assertion is
challenged by a finite (in the first case) or countable (in the second case) listing of the elements of the set.
Then the response to this challenge is the construction of an element of the set which is not in the putative
listing. Thus the construction in the proof is in the negative response to the attempted challenge to the
main assertion.
The constructive negative response is effectively an algorithm which can deal with an infinite number of
possible challenges. But none of those challenges is presented explicitly. Thus the response is more like a
template for an infinite number of responses.
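The two challenge/response constructions can be sketched as Python functions. This is an illustration under assumptions: binary sequences are modelled as functions from index to 0 or 1, a putative listing is modelled as a function from position to sequence, and all names (finite_response, diagonal_response, example_listing) are hypothetical.

```python
from typing import Callable, List

BinarySequence = Callable[[int], int]       # index -> 0 or 1
Listing = Callable[[int], BinarySequence]   # position in list -> sequence


def finite_response(challenge: List[int]) -> int:
    """Answer a finite listing of non-negative integers with one that it omits."""
    return max(challenge, default=0) + 1


def diagonal_response(listing: Listing) -> BinarySequence:
    """Answer a countable listing with a sequence differing from entry k at index k."""
    return lambda k: 1 - listing(k)(k)


def example_listing(k: int) -> BinarySequence:
    """A hypothetical challenge: entry k is the binary digits of k, lowest first."""
    return lambda i: (k >> i) & 1


response = diagonal_response(example_listing)
```

Neither function constructs an infinite or uncountable list. Each is a single finite template which produces a counterexample for whatever challenge it is given.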
2.11.22 Remark: The difficulty of imagining the termination of an infinite sequence.

Saying that “set X is infinite” is a bit like saying: “I am the Ruler of the Universe.” The mere writing of
a proposition does not make it true. Within axiomatized set theory, there is usually an axiom which states
that infinite sets exist. This does not prove anything at all.
Although the integers may seem to be obviously infinite because of the principle of mathematical induction,
the inductive method of reasoning is not valid in real life. The fact that we cannot imagine that there is
a point at which a sequence terminates does not imply that the sequence does not terminate. Life is full
of experiences which one cannot imagine coming to an end. But they do end anyway. Zeno’s paradoxes
relied upon the inability to imagine infinite processes. If one cannot imagine an infinite sequence, one cannot
imagine how the sequence may eventually terminate.
If someone claims to be the Ruler of the Universe, it may not be possible to prove otherwise. But the
inability to disprove an assertion does not imply the validity of the assertion. In particular, it may be that
the assertion is meaningless. It is quite meaningful to say that one cannot imagine that there is a greatest
integer. This just means that no attempt to utter the largest integer can succeed, or at least we cannot
imagine that such an attempt would succeed. But this does not prove that the set of integers is infinite,
unless one understands “infinite” in the literal sense that the set has no end.

2.12. Real numbers and infinitesimality

[ 2008-5-8: This section has not yet been integrated into this chapter. ]
2.12.1 Remark: Finite resolution of real-world measurements.

The real numbers arose historically from measurements of length, area, volume, weight, angles and other
physical observables. So the real numbers may be regarded as the standard mathematical model for physical
measurements. This does not mean that physical things themselves have equations of motion which involve
real numbers. It is only the measurements which are modelled by real numbers. Measurements are cate-
gorized in philosophy as “phenomena” as opposed to the underlying unknowable entities (the “noumena”)
which produce the phenomena. Although fractional numbers, whether they are rational integer fractions,
decimal expansions or sexagesimal expansions, may be continued without any obvious limit, one can never
make a measurement to infinite accuracy. Real-world measurements work with intervals of uncertainty.
However, the human mind can perform induction on decimal expansions, for example, to conclude that since
no matter how accurate a measurement is it can always be made more accurate, it follows that there are no
limits. Hence we typically assume the existence of infinite decimal expansions although we can never measure
or experience them. (We are accustomed to accepting the existence of things which we cannot experience.
For example, we cannot meet all 6,000 million people in the world, as mentioned in Remark 2.10.6. But this
does not make us seriously doubt their existence.)
The infinite resolution of real numbers causes no serious problem unless someone demonstrates that there
is a limit to measurement resolution. For example, if it were shown that time and space are particulate
or “atomic”, the real numbers would need to be replaced with something else. Then all of analysis and
differential geometry would need to be totally revised in such a framework. So the sceptical mathematician
or physicist should be quite alarmed (as the author is) when a textbook starts with the assertion without
proof that space is a 3-dimensional real-number manifold. Such assertions should be tagged as axioms or
conjectures, not stated as a-priori knowledge. The observed phenomena have finite accuracy of measurements.
The infinite accuracy of the real number system is only an assumed model for the underlying unknowable
entities. In the case of solids, liquids and gases, we know that the model is wrong. In the case of space and
time, we have no evidence against the real-number model – yet!
In recent times, there have been increasing indications that space and time may have a granular character,
in particular with reference to the Planck length, which is about 10^{-35} metres.
[ Write more about the Planck length. Perhaps also mention the holographic geometry idea: Craig J. Hogan,
“Measurement of quantum fluctuations in geometry”, Phys. Rev. D 77, 104031 (2008). My personal suspicion
is that space-time may be “created” or constructed by vast quantities of gravity particles which cause a
“shadow force”. These particles then cause minuscule variations in space-time, in the same way that massive
objects like stars cause large-scale curvature of space-time, and perhaps by the identical mechanism. This
might then explain the GEO600 noise better than the holographic theory. I’m not a physicist. So I’m just
speculating wildly here. But I might be right though! ]
2.12.2 Remark: Questionable validity of real numbers for the physics behind the measurements.
During the 19th century, there was much debate about the validity of the real number system. Even in the
20th century, after the logical self-consistency problems had been dealt with, debate has continued on the
relevance of the real numbers. Despite all the apparent logical paradoxes, the real numbers do seem to be
at least logically self-consistent, whether or not they correspond to the true nature of physical space. The
real numbers do seem to provide a good basic model for physical measurements, although it is still not clear
how physical measurements correspond to the underlying “reality”. It is an interesting question whether or
not the real numbers will always provide the principal underlying building block for physical models. Since
manifolds are built very specifically out of the real numbers, the entire relevance of differential geometry
depends crucially on the relevance of the real numbers.
Why should we suppose that space and time are infinitely divisible when we can’t even examine them under a
microscope? It seems to be merely a collective agreement among scientists that space and time are infinitely
divisible and uniform down to any resolution. Coincidentally this is what the real number system is like.
So we may be simply projecting our models onto reality. The philosophical problems with the real numbers
should not be assumed to be problems with the real world. We cannot be certain that the real numbers are
an accurate model for any real-world system at all. Therefore all of differential geometry, which is heavily
based on real number systems ℝ^n, must be regarded as only a tool-box for creating models which are suitable
until proven unsuitable.
It is difficult to re-examine one’s assumptions about numbers after a lifetime immersed in conventional
thinking. Since the role of scientists is partly to overthrow orthodoxy in the name of progress, it is important
to take an aggressive attitude towards the weak points of conventional thinking.
2.12.3 Remark: Unreachable real numbers.

One of the serious philosophical problems with the real numbers is the question of “effective computability”.
Since only a countable number of real numbers can be computed within ZF set theory, almost all real
numbers must be “ZF-unreachable”. This, unfortunately, means that almost all real numbers are “pixies at
the bottom of the garden”. We know they’re there, but we will never be able to write a specification for them.
If these numbers are rejected, then there will seem to be a very dense set of “gaps” in the set of real numbers.
This could cause big problems for Lebesgue measure theory. But if the existence of unreachable real numbers
can be comfortably accepted, this would re-open the question of whether the existence of unreachable choice
functions can be accepted with a similar level of comfort. It seems odd, though, to be making statements
about objects which can never be written down. It’s a bit like writing the laws of physics for an alternative
universe which one has never seen, and provably cannot be seen or experienced in any way. This perspective
makes both uncomputable real numbers and axiom-of-choice functions seem distinctly metaphysical.
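The countability claim above can be made concrete with a minimal sketch. It assumes, purely for illustration, that a "specification" of a real number is a finite string over a finite alphabet; the function name `specifications` and the two-letter alphabet are hypothetical choices, not part of the text. Listing strings by length and then alphabetically gives every specification a finite position in a single list, which is why at most countably many real numbers can be specified.

```python
from itertools import count, islice, product

def specifications(alphabet):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(0):
        # All strings of exactly this length, in alphabetical order.
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

# The first few specifications over the (hypothetical) alphabet {0, 1}.
first_seven = list(islice(specifications("01"), 7))
print(first_seven)  # ['', '0', '1', '00', '01', '10', '11']
```

Every finite string appears at some finite index in this enumeration, whereas the set of all infinite binary sequences admits no such listing.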
To be told that almost all theorems are unreachable within ZF seems perfectly acceptable. But to be told
that almost all real numbers are unreachable within ZF is more disconcerting. One might ask how the
physical world copes with these issues. If the real world is somehow operating with real numbers, how
does it resolve questions of uncomputability of numbers? Does the real world “skip the gaps” between the
effectively computable numbers? There’s obviously something wrong with the real numbers if they have
such extraordinary complexity and yet atoms and photons seem to be care-free and nonchalant about it.
Although we can never measure anything to infinite significant figures in the real world, mathematicians fill
in the gaps with infinite decimal expansions and such notions. So any infinite sequence of digits is a figment
of the imagination anyway. And yet all of analysis is dependent on these infinite expansions. It could be that
pixies really do exist, because without them it seems impossible to create a viable analysis and differential
geometry. Since analysis is a bedrock underlying almost all physics, and since physics seems to get so many
answers right, maybe those pixies at the bottom of the garden really are out there making analysis work
correctly. This would take away an argument against the axiom of choice then.
All in all, it seems that any infinite sequence of zeros and ones in a binary number should be permitted. They
can’t all be reached, but it’s nice to know that they are there. This filling in the gaps with pixie numbers
which will never be seen seems to be the lesser of two evils. The fact that we will never be able to write
down algorithms to generate most of the real numbers is not sufficient reason to exclude them. The pixie
numbers seem to be harmless enough. This contrasts with the situation of the axiom of choice where some
important proofs rely upon the throw of the dice to generate choice functions, and then no choice function
can be written down. In the case of real numbers, we have a healthy population of reachable real numbers
to work with without ever having to postulate existence of unreachable numbers in order to make proofs
work. Since we do not rely on the existence of pixie numbers, it will be no great sadness if they are taken
away some day, whereas withdrawal of AC from a topic which relies on it would make it painful to discover
and weed out the dud theorems.
2.12.4 Remark: The historical motivation for the formalization of real numbers.
In the 21st century, the real numbers seem so natural, it is difficult to understand why there was ever a need
to formalize the real numbers and establish the validity of the concept. The following quote from Bell [191],
pages 519–520, helps to clarify the problem which needs to be solved.
If two rational numbers are equal, it is no doubt obvious that their square roots are equal. Thus
2 × 3 and 6 are equal; so also then are √(2 × 3) and √6. But it is not obvious that √2 × √3 = √(2 × 3),
and hence √2 × √3 = √6. The un-obviousness of this simple assumed equality, √2 × √3 = √6,
taken for granted in school arithmetic, is evident if we visualize what the equality implies: the
“lawless” square roots of 2, 3, 6 are to be extracted, the first two of these are then to be multiplied
together, and the result is to come out equal to the third. As no one of these three roots can be
extracted exactly, no matter to how many decimal places the computation is carried, it is clear
that the verification by multiplication as just described will never be complete. The whole human
race toiling incessantly through all its existence could never prove in this way that √2 × √3 = √6.
Closer and closer approximations to equality would be attained as time went on, but finality would
continue to recede. To make these concepts of “approximation” and “equality” precise, or to replace
our first crude conceptions of irrationals by sharper descriptions which will obviate the difficulties
indicated, was the task Dedekind set himself in the early 1870’s—his work on Continuity and
Irrational Numbers was published in 1872.
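Bell's point can be illustrated by a small numerical sketch using Python's standard `decimal` module. The function name `root_product_gap` and the chosen precisions are assumptions made for this illustration only. At each working precision the three roots are extracted to finitely many digits, the first two are multiplied, and the result is compared with the third. The gap shrinks as the precision grows, but each run verifies only an approximation, never the equality itself.

```python
from decimal import Decimal, localcontext

def root_product_gap(digits):
    """Return |sqrt(2)*sqrt(3) - sqrt(6)| computed to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        # Each root is rounded to `digits` significant digits before multiplying.
        product = Decimal(2).sqrt() * Decimal(3).sqrt()
        return abs(product - Decimal(6).sqrt())

# Closer and closer approximations to equality, but never a proof of it.
for digits in (10, 20, 40, 80):
    print(digits, root_product_gap(digits))
```

The printed gaps depend on rounding and may even be zero at some precisions, which shows only that the two sides agree to that many digits.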
2.12.5 Remark: The requirement to include irrationals in the real numbers.
The need for irrational numbers was not always accepted. In the 19th century, Kronecker urged
mathematicians to build all of mathematics without irrationals. Mathematics could have continued without
irrationals, but the physical sciences would probably have been seriously handicapped by their lack. Bell [191],
pages 521–522, made the following comment about irrational numbers and infinite concepts.
It depends upon the individual mathematician’s level of sophistication whether he regards these
difficulties as relevant or of no consequence for the consistent development of mathematics. The
courageous analyst goes boldly ahead, piling one Babel on top of another and trusting that no
outraged god of reason will confound him and all his works, while the critical logician, peering
cynically at the foundations of his brother’s imposing skyscraper, makes a rapid mental calculation
predicting the date of collapse. In the meantime all are busy and all seem to be enjoying themselves.
But one conclusion appears to be inescapable: without a consistent theory of the mathematical
infinite there is no theory of irrationals: without a theory of irrationals there is no mathematical
analysis in any form even remotely resembling what we now have; and finally, without analysis
the major part of mathematics—including geometry and most of applied mathematics—as it now
exists would cease to exist.
The most important task confronting mathematicians would therefore seem to be the construction
of a satisfactory theory of the infinite.
Mathematics sometimes seems like an exhausting series of mind-stretches. First one has to stretch one’s
ideas about integers to accept enormously large integers. Then one must accept negative numbers. Then
fractional, algebraic and transcendental numbers. After this, one must accept complex numbers. Then there
are real and complex vectors of any dimension, infinite-dimensional vectors and semi-normed topological
vector spaces. To add pain to injury, one must somehow accept a dizzying array of transfinite numbers
which are “more infinite than infinity”. Beyond this are dark numbers and dark sets which exist but can
never be written down. If there were no benefits to this exhausting series of mind-stretches, only people with
a serious personality disorder would study such stuff. Only the amazing success of applications to science
and engineering justifies the whole edifice of modern mathematics. The bizarre infinities and abstractions of
mathematics cannot be said to be “true” in any sense. Intellectual discomfort is the price of obtaining the
analytical power of mathematics.
2.12.6 Remark: Symbolic mathematics software can’t represent all real numbers. Nor can humans.
One of this author’s objections to symbolic mathematics software packages from the 1970s until recent times
was the fact that such packages could not possibly represent all of the real numbers. Therefore such software
could only, at best, deal with a limited range of real mathematics. It took a long time to realize that this
limitation applies to human beings also. Anything that a mathematician can write down can be represented
and manipulated by computer software. So anything which computer software cannot represent cannot be
written down by human beings either.
One might counter-argue that mathematicians can think concepts which are not writable. Well, maybe so,
but such concepts cannot be transmitted on the communications channels to other mathematicians. The
mathematics which is written in books and papers consists only of the writable component of mathematics.
All written and spoken mathematics transmits only a finite amount of information. (Even the source files
for this book contain only a finite number of bytes of information!)
Therefore there is no reason to exclude computers from the socio-mathematical network. Computers can
express everything that mathematicians can, if they are programmed extensively enough. The problem with
computers is only that they do not have the naive ontology for mathematics and logic which humans do.
It may be, in fact, impossible to codify human in-built ontology well enough to permit a computer to even
pretend to understand the meaning of what it is doing. (But the effort to program computers to pretend to
understand might nevertheless be worthwhile!)


Chapter 3
Logic semantics
3.1 Mathematical logic subject development
3.2 General comments on logic
3.3 Modelling, meta-modelling and recursive modelling
3.4 The universality (or otherwise) of modern logic
3.5 Logic in literature
3.6 Proposition-store versus world-view ontology for logic
3.7 A proposition-store ontology for logic
3.8 Undecidable propositions and incomplete information transfer
3.9 The semantics of truth and falsity
3.10 The semantics of logical negation
3.11 Proof by contradiction
3.12 The moods of logical propositions
3.13 Other remarks on the semantics of logic
3.14 Naive mathematics
[ This chapter and the next are currently in the “ideas capture phase”. The grouping of remarks and definitions
between and within the sections is not tidy at all. Some of the subsections of this chapter and the next
are like mini-essays or sketch-pad notes. Much of the text is currently repetitive or tedious. During the
“consolidation phase”, this chapter (and the next) will be totally rearranged. ]
3.0.1 Remark: Mathematical logic is the foundation layer of mathematics.

The mathematics in this book starts at the beginning with mathematical logic, building up differential
geometry from the lowest-level concepts because it is so unsatisfying to be told that one’s questions about
the fundamentals of a subject are answered in a course one never took, in a book one never read, or in the
realm of the “obvious”.
Logic is the foundation layer upon which all of mathematics is built. Therefore this chapter and chapter 4
are the “bedrock” on which the following chapters rest. These are also the weakest chapters because they
have nothing to rest on except anthropology and some naive notions of logic and set theory.
3.1. Mathematical logic subject development

3.1.1 Remark: Conceptual frameworks are not necessarily “true”.
The framework for logic which is summarized in Remark 3.1.2 is not necessarily “true” or “correct” in any
sense. The objective of any model or conceptual framework is to provide a fairly unified, consistent set of
concepts which facilitates the study of a wide range of ideas within a subject.
A conceptual framework is not very useful if one must constantly make excuses for things which do not fit,
or if the effort to make things fit is disproportionate to the benefit. A framework is useful if one can easily
find a place within the framework for every concept. Thus, for example, general relativity provides a useful
framework for cosmology and astrophysics, and natural selection provides a useful framework for biology
and palaeontology.

Under the view in Remark 3.1.2 that the primary concept of logic is the “model”, that propositions are properties
of models, and that logical argument is a mere algebra for “solving for unknowns” amongst the propositions,
many of the paradoxes and discomforts of logic seem to disappear. Therefore this framework can at least be
considered to be useful and comfortable.
3.1.2 Remark: High-level overview of mathematical logic.

Mathematical logic may be conceptualized as the following stages of subject development. This may be
regarded as a “tour of mathematical logic”. This summary presents the author’s current understanding of
the “true nature” of mathematical logic.
(1) Define truth functions on concrete proposition domains: For any (naive) set P of concrete
propositions, a function τ : P → {F, T} is assumed to be defined. This is merely a model. Like any
other model, there may be differences between the model and the system being modelled. If the model
has errors, it must be corrected or abandoned. In this model, each proposition P ∈ P has a truth value
τ (P ) which may be true (T) or false (F). As in the case of any model, the values τ (P ) may be unknown
although some relations between the values may be assumed a-priori as part of the modelling. The
objective of the modelling is to solve for unknown truth values in terms of a-priori assumptions.
(2) Define a set of abstract names for concrete propositions: An abstract proposition name space
N is defined so that propositions may be referred to conveniently. This is just like using symbols x,
y, A, B, f , g etc. to refer to numbers, sets or functions. Thus we may write P = “The Sun rises in
the East.” Then P is easier to write than the full sentence. It is understood that the abstract names
refer to arbitrary propositions. The proposition name map may be denoted as µ : N → P. Then
t = τ ◦ µ : N → {F, T} maps abstract proposition names to the truth values of the propositions to
which they refer.
(3) Define logical expressions: Functions of the truth values on a concrete proposition domain P may
be written as logical expressions which can be parsed to determine the functions which are intended.
For example, the logical expression A ⇒ B refers to the function (A, B) ↦ φ_⇒(t(A), t(B)), where
φ_⇒ : {F, T}^2 → {F, T} is defined by φ_⇒(x, y) = F if (x, y) = (T, F), otherwise φ_⇒(x, y) = T. More
generally, truth functions have the form φ : {F, T}^n → {F, T} for non-negative integers n.
(4) Define “logical algebra” problems: Just as in the case of elementary real-number algebra, it is
possible to define “logic algebra” problems where the values of some functions (i.e. logical expressions)
have known values and the task is to solve for the values of individual propositions, the “unknowns”.
For example, given that t(A ⇒ B) = T and t(B ⇒ A) = F, solve for t(A) and t(B). (In this example,
the only solution is: t(A) = F and t(B) = T. Note that this does not follow, at this stage, from
propositional calculus. The solution follows from the definitions of the functions which are represented
by logical expressions.) The “solutions” of logic algebra problems are not necessarily the truth values
of all individual propositions. Sometimes the objective is to determine the truth values of particular
logical expressions rather than the individual propositions. Solutions of logical algebra problems are
called “theorems”. (To make things confusing, theorems are sometimes called “propositions”. Such
confusing terminology should be avoided.)
(5) Formalize logical algebra methods as “propositional calculus”: The methods of solving logical
algebra problems may be written in a formal language as a sequence of statement lines which use a
small range of symbols and follow a small, explicit set of rules. Therefore the methods of logical algebra
may be formalized as a strict, explicit set of rules and assumptions. Any such set of rules may be
referred to as a “propositional calculus” if it always yields correct conclusions from given assumptions.
Thus propositional calculus may be defined as “formalized logical algebra”. The strict formalization of
logical algebra removes the need to understand the meanings of statements. If the rules are followed
mechanically, true logical expressions always follow from true logical expressions.
(6) Extend propositions to parametrized proposition families: In many applications of logic, it
is most efficient to organize very large (or even infinite) sets of propositions into families, where a
proposition is obtained for each value of a parameter called a “variable”. The families are called
“predicates”. Let Q denote the set of predicates and let V denote the set of variables. Then the expression
F (x) ∈ N is a proposition name for every F ∈ Q and x ∈ V. (It is convenient to use a common
variable space V for all predicates. So t(F (x)) = T is assumed for all x for which the expression
F(x) is undefined.) Predicates may accept multiple variables as parameters. Such predicates have the
form F : V^k → N for non-negative integers k. This is formally equivalent to replacing the set V of
variables with the set of sequences of variables: V^0 ∪ V^1 ∪ V^2 ∪ V^3 . . .. (Strictly speaking, one should
distinguish between variables in V and variable names in N_V, and there should be a variable name
map µ_V : N_V → V.)
[ Modify the presentation here to correctly differentiate between variables and variable names. ]
(7) Define quantifiers to represent bulk conjunctions and disjunctions: It often happens that one
wishes to form the conjunction of all propositions in a family F ∈ Q. The long way to do this is
F(x_1) ∧ F(x_2) ∧ F(x_3) ∧ F(x_4) . . ., iterating explicitly over all variable values. The short way to do this
is ∀x, F(x). This is symbolic notation for “t(F(x)) = T for all x ∈ V”. (This shows the advantage of
having a common variable space. The symbolic notation does not need to specify it.) Similarly ∃x, F(x)
means “t(F(x)) = T for some x ∈ V”. The long way to do this is F(x_1) ∨ F(x_2) ∨ F(x_3) ∨ F(x_4) . . ..
The sub-expressions “∀x” and “∃x” are called “quantifiers”. The “dummy variable” or “bound variable”
x may be any symbol. (Generally these dummy variables are drawn from the same set of variable names
N_V as for other variables.) It would be possible to define many other kinds of “bulk logical operators”,
but it turns out that two quantifiers are enough. Other bulk logical operators can be defined in terms
of these two.
(8) Define logical expressions for parametrized proposition families: Just as in the case of simple
propositions, stage (3), a syntax may be defined for the logical expressions which are permitted for
parametrized proposition families. Such logical expressions include all of the syntax of simple proposi-
tions together with constructions which use quantifiers. As in stage (3), the symbolic expressions are
merely notations for functions which must be determined by parsing the expressions. This kind of logic
is called “predicate logic”.
(9) Define “logical algebra” problems for parametrized proposition families: Just as in the case of
simple propositions, stage (4), “logical algebra” problems may be defined for parametrized proposition
families. To distinguish these two cases, one could call the logical algebra in stage (4) “proposition
algebra”, whereas the proposition families version could be called “predicate algebra”.
(10) Formalize predicate algebra methods as “predicate calculus”: Just as in the case of simple
propositions, stage (5), the methods of solution for parametrized proposition family problems (“predicate
algebra”) may be strictly formalized as rules of deduction for sequences of lines of symbolic text with a
well-defined syntax. Any such formalization is called a “predicate calculus”. (Sometimes the expression
“first order language” is used.) As in stage (5), the strict formalization of predicate algebra removes
the need to understand the meanings of statements. If the rules are followed mechanically, true logical
expressions always follow from true logical expressions.
(11) Add logical functions to parametrized proposition family logic: This is an extension of the
parametrized proposition family logic in stage (6) to include functions of the form f : V^k → V for
non-negative integers k. Then the parameter x of any proposition F(x) may be replaced with the value
of such a function. For example, the expression F(f(x, y, z)) yields a proposition for f : V^3 → V, F ∈ Q
and x, y, z ∈ V. Various other such extensions are also possible. The quantifiers in stage (7), the logical
expressions in stage (8), the predicate algebra in stage (9) and the predicate calculus in stage (10) may
then be defined in the same way as before.
(12) Define set theories in terms of predicate calculus: The framework in stage (11) is now expressive
enough to define set theories. A set theory (such as Zermelo-Fraenkel or Neumann-Bernays-Gödel)
generally requires particular sets of predicates, functions and variables, together with a set of axioms
(which are particular kinds of deduction rules).
(13) Define abstract structures outside set theories: Many kinds of structures, such as number systems
and groups, may be defined purely linguistically in a predicate calculus without basing these structures
on set theory. Sets are merely one kind of system which can be defined within a predicate calculus.
However, a typical set theory has a rich enough structure to enable many other kinds of structures to
be defined with it. Therefore many kinds of mathematical structures are defined directly in terms of
sets to save the burden of developing a separate language and calculus. Structures which are defined
within set theory may be regarded as “representations” of the real structures. But all structures defined
within a predicate calculus are merely models anyway. Sometimes it is convenient to define multiple
representations of a single structure concept. (Examples of this are tensor algebras and tangent bundles.)

(14) Extend set theories by including outside abstract structures: Structures defined inside and
outside set theory may be combined by applying set construction methods to sets of objects which are
not strictly set objects. The objects in ZF and NBG set theory are all sets (or classes). But one may
define structures outside set theory and insert them into a set-theory structure. For example, one may
define integers outside set theory with the Peano axioms. The integers 1, 2, 3,. . . which are defined in
such a non-set-theoretic structure may then be permitted to participate in a set theory. Thus the set
{1, 2, 3} would be a valid construction within a set theory even though the symbols 1, 2 and 3 do not
represent sets.
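The “logical algebra” problem in stage (4) can be sketched as a brute-force search over truth assignments. This is only an illustration under stated assumptions: the names `phi_imp` and `solve`, and the encoding of T and F as Python's `True` and `False`, are hypothetical choices and not notation from this book. The sketch uses nothing but the definition of the truth function for ⇒ in stage (3), so the solution follows from the definitions and not from any propositional calculus.

```python
from itertools import product

def phi_imp(x, y):
    """Truth function for implication: false exactly when (x, y) = (T, F)."""
    return not (x and not y)

def solve(names, constraints):
    """Return every truth assignment t on `names` meeting all constraints.

    Each constraint is a pair (expression, required_value), where
    `expression` maps an assignment (a dict from names to truth values)
    to the truth value of a logical expression.
    """
    solutions = []
    for values in product([False, True], repeat=len(names)):
        t = dict(zip(names, values))
        if all(expr(t) == required for expr, required in constraints):
            solutions.append(t)
    return solutions

# The problem in stage (4): t(A => B) = T and t(B => A) = F.
constraints = [
    (lambda t: phi_imp(t["A"], t["B"]), True),
    (lambda t: phi_imp(t["B"], t["A"]), False),
]
print(solve(["A", "B"], constraints))  # [{'A': False, 'B': True}]
```

The search visits all 2^n assignments, so it is practical only for small numbers of propositions. Its purpose here is to show that the “unknowns” are determined by the truth functions alone.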
Perhaps the most difficult part of arriving at the above overview is forgetting what one “knows” from
immersion in the subject. It is generally more difficult to lose a false belief than to acquire it in the first
place. It follows from the above summary that mathematics cannot be identified exactly with any set theory.
Set theories and other logical systems are merely models for systems which are assumed to exist outside the
logical systems.
The formal languages and methods of mathematics may be identified to a great extent with set theories and
other logical systems, but the semantics always lies outside. This is true in the same sense that ordinary
text and images generally refer to things which lie outside the text and images.
Formal logic does not formalize the concrete objects of mathematics. It only formalizes what people write
about mathematical objects. Only the language can be formalized, not the objects which are discussed in
that language. Mathematical objects reside inside minds.
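The reading of quantifiers in stage (7) as bulk conjunctions and disjunctions can be sketched for a finite variable space. The variable space `V`, the predicates `is_positive` and `is_even`, and the helper names `forall` and `exists` are all hypothetical, chosen only for illustration. For brevity, each predicate here returns a truth value directly, which collapses the proposition name F(x) and its truth value t(F(x)) into one step.

```python
# A small finite variable space (hypothetical example values).
V = [1, 2, 3, 4]

# Hypothetical predicates: each maps a variable to a truth value.
def is_positive(x):
    return x > 0

def is_even(x):
    return x % 2 == 0

def forall(F, variables):
    """Bulk conjunction F(x_1) and F(x_2) and ... over the variable space."""
    return all(F(x) for x in variables)

def exists(F, variables):
    """Bulk disjunction F(x_1) or F(x_2) or ... over the variable space."""
    return any(F(x) for x in variables)

print(forall(is_positive, V))  # True
print(forall(is_even, V))      # False
print(exists(is_even, V))      # True
```

For an infinite variable space no such explicit iteration can terminate, which is one reason why the symbolic notation is needed.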
3.1.3 Remark: Mathematical logic is an art of modelling, whether the subject matter is concrete or not.
There is a parallel between the history of mathematics and the history of art. In the olden days, paintings
were all representational. In other words, paintings were supposed to be modelled on visual perceptions of
the physical world. But although the original objective of painting was to represent the world as accurately
as possible, or at least convince the viewer that the representation was accurate, the advent of cameras made
this representational role almost redundant. Photography produced much more accurate representations
for a much lower price than any painter could compete with. So the visual arts moved further and further
away from realism towards abstract non-representational art. In non-representational art, the methods and
techniques were retained, but the original objective to accurately model the visual world was discarded. With
modern computer generated imagery, the capability to portray things which are not physically perceived has
increased enormously.
Mathematics was originally supposed to represent the real world, for example in arithmetic (for counting and
measuring), geometry (for land management and three-dimensional design) and astronomy (for navigation
and curiosity). The demands of science required rapid development of the capabilities of mathematics to
represent ever more bizarre conceptions of the physical world. But as in the case of the visual arts, the
methods and techniques of mathematics took on a life of their own. Increasingly during the 19th and 20th
centuries, mathematics took a turn towards the abstract. It was no longer felt necessary for pure mathematics
to represent anything at all. Sometimes fortuitously such non-representational mathematics turned out to
be useful for modelling something or other in the real world, and this justified further research into models
which modelled nothing.
The fact that a large proportion of the models of logic and mathematics no longer represent anything in the
perceived world does not change the fact that the structures of logic and mathematics are indeed models.
Just as paintings are not necessarily paintings of physical things, but are still paintings, so also the
structures of mathematics are not necessarily models of physical things, but they are still models. It follows
from this that mathematics is an art of modelling, not a science of eternal truth. Therefore it is pointless to
ask whether mathematical logic is correct or not. It simply does not matter. Logic provides tools, techniques
and methods for building and studying models. Whether those models correspond to anyone’s intuitive idea
of “correct” logic is a matter of taste and applicability.
3.1.4 Remark: The excluded middle is non-negotiable. Logical argumentation must guarantee it.
The logic development schema in Remark 3.1.2 guarantees the “excluded middle”. As mentioned in Re-
mark 3.1.3, it does not matter whether logic “in the wild” satisfies the excluded middle. The fundamental
starting assumption for logic models is that propositions can be true or false. All of the axioms and rules
of deduction in logic are mere “techniques of solution” of “logical equations”. If those techniques arrive at
the conclusion that a proposition is both true and false (or neither true nor false), then it is the techniques
which are faulty.
Similarly, the logic development schema in Remark 3.1.2 guarantees the validity of “proof by contradiction”.
If the adoption of a tentative truth value for a proposition leads to a contradiction, then either the method
of argument is faulty, or the proposed axioms are incompatible with any valid logical system model, or
the tentative assumption was false. So one may validly infer that if the axioms are self-consistent and the
argument is error-free, then the tentative assumption must be false.
The principal assertion of Remark 3.1.2 is that the excluded middle is primary and logical argumentation is
secondary. If there is a clash, it is the method of argument which must give way.
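The way a two-valued starting assumption forces both the excluded middle and proof by contradiction can be checked mechanically. The following is a minimal sketch, assuming propositions are modelled as Python booleans; the helper names (is_tautology, implies and so on) are illustrative and do not come from this book.

```python
from itertools import product


def is_tautology(formula, num_vars):
    """Return True if the formula is true under every two-valued assignment."""
    return all(formula(*values) for values in product([False, True], repeat=num_vars))


def implies(a, b):
    """Material implication for two-valued propositions."""
    return (not a) or b


def excluded_middle(p):
    """P or not P."""
    return p or (not p)


def by_contradiction(p):
    """If assuming 'not P' leads to a falsehood, then P."""
    return implies(implies(not p, False), p)


def bare_proposition(p):
    """P alone, included for comparison: it is not a tautology."""
    return p


print(is_tautology(excluded_middle, 1))
print(is_tautology(by_contradiction, 1))
print(is_tautology(bare_proposition, 1))
```

In this sketch the two principles hold simply because every proposition is assumed from the outset to take exactly one of two values, which is the sense in which the schema guarantees them.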
3.2. General comments on logic

3.2.1 Remark: Modern logic arose by necessity, from paradoxes and confusion.
Mendelson [165], pages 1–2, says the following on the historical origins of modern mathematical logic.
Although logic is basic to all other studies, its fundamental and apparently self-evident character
discouraged any deep logical investigations until the late nineteenth century. Then, under the impe-
tus of the discovery of non-Euclidean geometries and of the desire to provide a rigorous foundation
for analysis, interest in logic revived. This new interest, however, was still rather unenthusiastic
until, around the turn of the century, the mathematical world was shocked by the discovery of the
paradoxes, i.e. arguments leading to contradictions.
One could argue that the deeper questions of logic only arose because mathematicians insisted on pushing
their naive logic beyond its limits. If they had restricted themselves to fairly ordinary, non-pathological areas
of mathematics, there might have been no need for the deep study of logic. However, they did push the
boundaries to try to find more and more pathological and infinite sets with the most extreme characteristics.
They tried to define ridiculously infinite sets which were of no practical value, which led to the paradoxes
of Burali-Forti (1897), Cantor (1899) and Russell (1902). They tried to calculate the area beneath the
nastiest possible curves, which culminated in Lebesgue non-measurable sets (Vitali, 1905), which were non-
constructible. Paradoxes were, in hindsight, inevitable at the boundaries of the scope of naive logic.
It could perhaps be argued that it was the extreme extrapolation of naive logic which led inevitably to the
complexities and abstractions of modern mathematical logic. Conversely, one could argue that there is little
need for the intellectual intensity of mathematical logic if mathematicians restrict themselves to reasonable
constructions and concepts. In particular, the paradoxes and tortuous entanglements of modern logic can
probably be safely ignored by physicists and other users of mathematics.
3.2.2 Remark: Symbolic logic is not the basis of all rationality.

Symbolic logic should not be taken to be absolutely true in any sense. The book by Lakoff/Núñez [173],
page 8, makes the following comment on this subject.
Symbolic logic is not the basis of all rationality, and it is not absolutely true. It is a beautiful
metaphorical system, which has some rather bizarre metaphors. It is useful for certain purposes
but quite inadequate for characterizing anything like the full range of the mechanisms of human
reason.
3.2.3 Remark: Most of mathematics can be reduced to symbolic logic.

Logical argument is the art of convincing people (including oneself) of the validity of propositions. But
mathematics delivers more than mere propositions.
The outputs or “deliverables” of mathematics also include calculations and diagrams. Calculations are
arithmètic in character whereas diagrams (and graphs) are geometric in character. However, diagrams
and graphs may be coordinatized with numbers, which are arithmètic, and arithmetic may be reduced to
logical propositions. Therefore ultimately all mathematical deliverables would seem to be reducible to logical
propositions, although the meaning of such propositions requires semantics which lie outside pure symbolic
logic. The process of reformulation and interpretation of geometry, arithmetic and analysis in terms of
mathematical logic is illustrated in Figure 3.2.1.
Mathematicians don’t just “deliver” propositions, calculations and diagrams. They back up their deliverables
with justifications. The purpose of such justifications is to convince the “client” to accept the deliverables.

[Diagram labels: geometry questions, geometry answers; arithmetic and analysis questions, arithmetic and analysis answers; symbolic logic assumptions, logical arguments, symbolic logic assertions; “reformulation” between the question levels, “interpretation” between the answer levels.]
Figure 3.2.1 Reformulation and interpretation of geometry, arithmetic and analysis
The mathematician is expected to provide a kind of certification of the correctness of the deliverables. This
leads to a heavy emphasis on deciding whether a deliverable is “true”. The word “true” means in practice that
the deliverable should be accepted. Thus all mathematical deliverables are divided into true and false, and
one should accept the true deliverables and reject the false ones. This is the socio-mathematical significance
of truth and falsity.
Mathematics does much more than merely decide which propositions are true or false. For example, math-
ematics includes the reduction of arithmetic and geometry to symbolic logic, and the interpretation of the
outcomes of logical analysis back to the tasks in arithmetic and geometry where they should be applied.
3.2.4 Remark: Abstract mathematics removes the semantics.

Pure, abstract mathematical logic often seems to be a meaningless intellectual recreation. This is because
the original meaning, the semantics, has been removed during rigorous axiomatization. It often seems that
semantics is regarded as an optional extra in mathematical logic. Sometimes it is almost impossible to
discover the meaning or origins of mathematical concepts.
However, when logic is being applied to real mathematics, such as geometry, arithmetic and analysis as in
Remark 3.2.3, the semantics may be rediscovered by retracing the steps of the formulation of these subjects
into symbolic logic.
3.2.5 Remark: Mathematics is an unusually precise application of logic.

Logic may be applied to a very wide range of contexts. For example, one may pick up a book at random,
highlight several sentences in the book, and hypothesize that those sentences are either correct or incorrect.
Then the propositions of this logic could be labelled $A_i$ for $i = 1, 2, \ldots, n$, for some non-negative integer $n$,
where each $A_i$ means: “Sentence $i$ is correct.” Each of these propositions has the truth value “true” or
“false”. One may then notice that there are dependencies between the sentences resulting from a-priori
knowledge (or assumptions) about the subject. Thus it may be that $A_1 \Rightarrow A_2$, meaning that if sentence 1
is correct, then sentence 2 must be correct. By accumulating such information on relations between the
truth values of the propositions, one may gradually determine individual truth values. This is similar to
the procedures of algebra, where the numerical variables are solved for in terms of known relations between
them.
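The analogy with solving algebraic equations can be made concrete by brute force. The following sketch assumes three highlighted sentences and three hypothetical relations between them (the constraints are invented for illustration). It enumerates every true/false assignment, keeps those which satisfy all of the relations, and reports the propositions whose truth value is the same in every surviving assignment.

```python
from itertools import product


def consistent_assignments(num_props, constraints):
    """Enumerate all true/false assignments that satisfy every constraint."""
    return [
        values
        for values in product([False, True], repeat=num_props)
        if all(constraint(values) for constraint in constraints)
    ]


def determined_values(num_props, constraints):
    """Map proposition number -> truth value for the propositions that take
    the same value in every consistent assignment; others are left out."""
    solutions = consistent_assignments(num_props, constraints)
    determined = {}
    for i in range(num_props):
        seen = {values[i] for values in solutions}
        if len(seen) == 1:
            determined[i + 1] = seen.pop()
    return determined


# Hypothetical knowledge about three highlighted sentences.
constraints = [
    lambda v: (not v[0]) or v[1],   # A1 => A2
    lambda v: not (v[1] and v[2]),  # A2 and A3 cannot both be correct
    lambda v: v[0],                 # sentence 1 is known to be correct
]

print(determined_values(3, constraints))
```

With only the first two relations, no individual truth value is determined. Adding the third relation fixes all three, which is the sense in which accumulated relations gradually determine individual values.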
The application of logic to pure mathematics is similar to other applications of logic, but the logic of pure
mathematics has two exceptional features.
(1) Precision of modelling: The modelling of truth and falsity for pure mathematics is assumed to be
perfectly accurate. In other words, all propositions are assumed to be either true or false, and all
abstract logical operations are mapped verbatim to the propositions of the concrete logic.
(2) Completeness of modelling: The logically deducible propositions are the only true propositions. In
other words, only the deducible propositions are to be accepted. Any proposition whose truth value
is not deducible is simply unknown. There is no presumption of truth until proven false. The entire
span of the subject is the set of all deducible propositions. Everything outside that span is unknown
territory.
This is quite different to the application of logic in science and everyday real-world experience. When
applying logic to the real world, one never has certainty of the truth value of any proposition, and only a
subset of all possible propositions is subjected to logical analysis. The set of propositions to be analyzed is
determined by the context of the discussion and the motives of the participants.
The concepts of truth and falsity in abstract logic have no meaning in isolation from the application context.
Abstract logic has a set of rules which must be followed, and the words “true” and “false” are merely abstract
labels which must be manipulated according to the rules. However, in the application to pure mathematics,
a proposition which is determined to be “true” according to the rules must be accepted in the corpus of
mathematical knowledge. A proposition which is determined to be “false” according to the rules must be
rejected (or ejected) from the mathematical knowledge corpus. Such acceptance and rejection of propositions
is absolute in the pure mathematical realm.
3.2.6 Remark: Logic is the art of deducing propositions from other propositions.
Since physics is expressed in terms of mathematics, and mathematics is formally justified in terms of logic,
one might reasonably conclude that logic is very deep and important and fundamental. However, logic is
very much less significant than it may at first seem. Logic is merely the algebra of logical expressions. The
methods of logic allow one to deduce logical expressions from other logical expressions. This is very much
like elementary algebra, which is a set of methods for solving for algebraic expressions in terms of given
algebraic expressions. This analogy is summarized in the following table.
entities       direct method                  indirect method
numbers        measure numbers directly       solve equations to determine numbers
propositions   assess propositions directly   deduce propositions from axioms
For example, one may measure the height H of a building directly with a tape measure. Alternatively one
may time the fall of a stone from the top of the building and solve the equation $\frac{1}{2} g t^2 = H$ for the height H
in terms of the measured time t. The indirect method is not inherently superior to the direct method. The
indirect method exploits a general rule which is inferred from past observations. It may be more efficient
economically to use the indirect method. But the estimate of H does not have any greater credibility because
the value was arrived at by using an equation.
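The indirect method for the building height may be sketched as a one-line calculation. This assumes a value of 9.81 m/s² for g and neglects air resistance; the function name and the sample fall time are illustrative only.

```python
def height_from_fall_time(t_seconds, g=9.81):
    """Height H in metres from H = (1/2) g t^2, ignoring air resistance."""
    return 0.5 * g * t_seconds ** 2


# A measured fall time of 2 seconds gives a height of roughly 19.6 metres.
print(height_from_fall_time(2.0))
```

The result is only as credible as the measured time and the assumed rule, which is the point being made about indirect methods.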
In a similar way, I may wish to determine whether the plants in my garden are wet. I could walk outside
and measure this directly. However, I may have heard rain falling heavily for the last 30 minutes. So I know
indirectly that the plants must be wet. This is because A ⇒ B, where A = “It has been raining for the last
30 minutes.” and B = “The plants in my garden are wet.” If I apply A and A ⇒ B to conclude B, this does
not make the conclusion B any more credible than if I measure B directly.
In the case of mathematical logic, the fact that I can deduce 2+2 = 4 from the ZF set theory axioms together
with the definition of addition does not make the conclusion 2 + 2 = 4 any more credible because I derived
it from axioms. In principle, all of the “facts” of mathematics could be determined directly through deep
thinking, or experimental observation, or any other means. Logic is, in this sense, useless. But just as we do
not wish to measure all physical phenomena directly, but prefer instead to predict the values of measurements
from the application of inferred laws of physics, so also it is more efficient to codify mathematics in terms of
small sets of laws from which all other “facts” may be deduced.
Thus logical argumentation does not make assertions more valid than if they are examined on a case-by-
case basis. Axiomatic systems play the same role in logic that physical models do in physics. By making
calculations from such systems or models, it is possible to reduce the burden of direct measurement or
assessment.
As in the case of physical models, the validity of an axiomatic system is limited by its ability to pass predictive
tests. If a physical model makes a prediction which disagrees with observation, the model must be restricted
in scope, modified or discarded. The same applies to axiomatic systems. If a set of axioms and logical rules
yields conclusions which are unacceptable, the system must be restricted in scope, modified or discarded.
One cannot say that a proposition is given enhanced validity by being deduced from a set of axioms, no
matter how reasonable the axioms may seem at first sight.
It is sometimes difficult to explain to non-scientists that science does not claim to deliver truth, and that
scientists try to disprove their own theories. Scientists only work with models, and models are frequently
“voted out of office” if they do not deliver the right predictions. The fame and career of a scientist are
greatly enhanced by disproving generally accepted models. So most experiments are designed to disprove
theories, not prove them. In the same way, mathematicians and logicians should not place too much value
on their methods of argumentation and calculation. The axiomatic systems which they develop, and the
deduction rules which they apply, must be evaluated according to their ability to generate assertions which
are acceptable.
One may say that logical argumentation is merely the art of solving simultaneous logical equations. This
enables a very large number of logical assertions to be generated from a compact set of axioms and rules.
Therefore formal logic is useful for compactly presenting mathematics. But logic can never guarantee that
anything is true. Logic never proves anything. Logic only facilitates the compression of a large number of
propositions into a small space. Any “truth” that there is in mathematics comes from somewhere else.
3.2.7 Remark: Axioms do not fall from the sky on golden tablets.
Axiomatization was famously successful in the case of Euclid’s geometry. It seems likely that the axiomatic
minimalism instinct observable in mathematics and logic since that time has been inspired originally by the
success of that project. Perhaps the axiomatization programme has gone too far. A subject may be axioma-
tized so as to generate the desired range of theorems, but then there may be unintended consequences such
as paradoxes, pathological examples and counter-intuitive results. Quite often the undesired consequences
of an axiomatic system are held to have credibility because of being deduced by logical arguments. Axioms
do not fall from the sky on golden tablets. On the other hand, it does take courage to assert a proposition
without the support of a deductive argument of some sort.
3.2.8 Remark: Linguistic logic is a higher brain function than real-world modelling.
Logic seems to be a “higher” function than mere arithmetic and other modelling activity in the brain. A
principal difference between logic and mere modelling is that in logic, one tolerates not knowing whether a
hypothesis is true or false. The logical brain considers a proposition to be an “object” whose truth value is
an unknown attribute.
Pre-scientific people were (and are) not able to accept the not-knowing. They need to know which of any
two options is true. They need a definite world-model where everything is very definitely one way or the
other.
The scientist says: “It may be true or it may be false, and we must wait to find out which.” Pre-scientific
thinkers just want to know the answer and cannot tolerate the Schrödinger’s cat scenario where the world
could have more than one possibility indefinitely. They don’t want to hear about modalities.
It seems likely that there was once a pre-logical era where each human had just one world-view at any point
in time, with no IFs and no ORs. That is why the ancient Gilgamesh story has no ORs and only eleven IFs
in 3000 lines. (See Remark 3.5.2.)
Probably all humans still have the yearning for certainty, and maybe all humans feel uncomfortable with not
knowing the answers to important questions. But most modern people resist the desire to fill in the gaps of
ignorance with myths and ancient, simplistic world-pictures.
A question arose in the writing of this book whether logic should follow or precede the sets and numbers
chapters. One could argue that logic should come after arithmetic because all animals can do world-modelling
and many can do arithmetic, whereas only humans, who have language, are capable of expressing logical
propositions, which themselves are “objects” with attributes.
However, logic and mathematics in this book do not have to be in the same order in which logic and maths
developed among animals on Earth. Our modern approach to maths (and everything else) is immersed in a
framework of logical thinking. Mathematics is no longer taught as a set of fixed and eternal methods, as it
was by the ancient Egyptian priests when Thales visited them in about 600 BC. The ancient Greeks introduced
logic and deduction to mathematics, and they proved propositions. They also speculated actively and with
enthusiasm about propositions which could not be decided to be true or false; in other words, paradoxes.
The Greeks introduced IFs and ORs and hypotheses to maths, especially to geometry. That is the modern,
logical way of thinking.
Thus it seems right to start this book with our modern methods of thinking, which affect everything that
follows in the book, even though historically, logical thinking probably dates only to about 750 BC, about the
same time as the writing-down of Homer and the very early Olympics. (The last version of Gilgamesh falls
into this early “logical era”.)
The anthropology of logic could throw a lot of light on the nature of logical thinking. It would be interesting
to know whether the tribes of New Guinea and the Brazilian forests discuss hypotheses and have logical
arguments. Anthropologists study language, kinship relations, tools, technology, rituals, music and pictorial
art, but they do not seem to research the anthropology of logical thinking. Yet logical thinking is probably
the biggest thing to happen to the human species since the advent of language. Logical thinking is what
made the modern science-based world possible!
3.3. Modelling, meta-modelling and recursive modelling

3.3.1 Remark: Mathematics is the study of models, including analysis and synthesis of models.
Mathematics may be defined as the study of models. This includes the analysis and synthesis of models.
The techniques of mathematics may be thought of as a toolbox for creating, studying and applying models.
Some mathematical models are intended to correspond to the physical or empirical world. Other models are
purely abstract because they correspond to processes inside minds, for example. (By comparison, painting
was originally the art of depiction of the visible world, but over time, the subjects of depiction became
more and more abstract. A totally abstract painting depicts only itself. In the same way, pure, abstract
mathematics models nothing but itself.)
Applied mathematics is concerned with the application of mathematical models to real-world systems. (The
three modelling stages, abstraction, analysis and application, are discussed in Remark 3.9.6.)
3.3.2 Remark: World models are required in all animals in the sensor/motor feedback loop.
Animals have sensor and motor paths which must be coordinated. The manipulation of the environment,
and of the animal’s own body, requires a feedback loop to ensure that the motor path sends the right signals
to achieve an objective. (See Figure 3.3.1.) It is almost obvious that some sort of model of the world must
be implemented within the organism. Minds arise from motion and manipulation.
[Diagram labels: world (noumena), organism (model), sensor path, motor path.]
Figure 3.3.1 Modelling of the world by an organism
[ Make a table of definitions of logic “structure”, “interpretation” and “model” by various authors. ]
3.3.3 Remark: The logic literature uses the word “model” differently.
The word “model” is widely used in the logic literature in the reverse sense to the meaning in this book.
Shoenfield [169], page 18, first defines a “structure” for an abstract (first-order) language to be a set of
concrete predicates, functions and variables, together with maps between the abstract and concrete spaces,
as outlined in Remark 4.12.3. If such a structure maps true abstract propositions only to true concrete
propositions, it is then called a “model”. (Shoenfield [169], page 22.)
Mendelson [165], page 49, defines an “interpretation” with the same meaning as a “structure”, just men-
tioned. Then a “model” is defined as an interpretation with valid maps in the same way as just mentioned.
(Mendelson [165], page 51.) Shoenfield [169], pages 61, 62, 260, seems to define an “interpretation” as more
or less the same as a “model”.
Here the abstract language is regarded as the model for the concrete logical system which is being discussed.
It is perhaps understandable that logicians would regard the abstract logic as the “real thing” and the
concrete system as a mere “structure”, “interpretation” or “model”. But in the scientific context, generally
the abstract description is regarded as a model for the concrete system being studied. It would be accurate
to use the word “application” for a valid “structure” or “interpretation” of a language rather than a “model”.
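The logic literature's usage can be illustrated with a deliberately simplified sketch. It assumes a propositional language with two symbols rather than the first-order languages treated by Shoenfield and Mendelson, and all names in it are hypothetical. A “structure” here is just an assignment of concrete truth values to the abstract symbols, and it counts as a “model” when every axiom of the abstract theory comes out true under that assignment.

```python
def is_model(structure, axioms):
    """A structure (interpretation) counts as a model, in the logic
    literature's sense, if every axiom evaluates to true under it."""
    return all(axiom(structure) for axiom in axioms)


# Abstract theory with two symbols, "p" and "q", and two axioms.
axioms = [
    lambda s: s["p"],                  # axiom 1: p
    lambda s: (not s["p"]) or s["q"],  # axiom 2: p => q
]

# Two hypothetical concrete systems supplying truth values for the symbols.
structure_a = {"p": True, "q": True}
structure_b = {"p": True, "q": False}

print(is_model(structure_a, axioms))
print(is_model(structure_b, axioms))
```

In the terminology suggested in this remark, structure_a would be called a valid “application” of the abstract language, while structure_b is a structure which is not one.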

3.3.4 Remark: Discussion context and discussed context.

Logic takes place within a “network of discussions”. Every logic discussion has a “discussion context” A and
a “discussed context” B. But the discussed context B may itself be a discussion context which is dealing with
a third discussed context C. (See Figure 3.3.2.) The third context may be referred to as a “meta-discussion
context”.
[Diagram labels: model 1, model 2; machine 1, machine 2, machine 3.]
Figure 3.3.2 Modelling of modelling
The 5-layer model illustrated in Figure 4.5.1 in Remark 4.5.4 typically refers to three different contexts.
Layer 1 belongs to a discussed context. Layers 2 and 3 belong to a discussion context. Layers 4 and 5
belong to a meta-discussion context. As mentioned in Remark 2.2.7, a discussion context often influences
the discussed context. (For example, grammar books and dictionaries influence the development of the
languages which they describe.) So the boundaries of the contexts are sometimes difficult to determine.
Difficulties arise when one attempts to combine multiple discussion contexts in a cyclic fashion. For example,
logicians discuss mathematics and comment on its validity, while mathematicians likewise discuss logic and
comment on its validity. There is no cycle-free hierarchy of superior intellect here. The fact that one
group of people A discuss the activities of another group of people B does not imply that A has superior
knowledge above B. It is always “interesting” (and often exasperating) to see one’s own discipline being
discussed by another discipline. This kind of interdisciplinary “critique” can lead to hostilities sometimes.
(There are many strategies for prevailing in such conflicts. One example is “embrace and extend”, i.e. making
“contributions” to the literature of the other discipline. Alternatively, just ignore them if you have some
serious work to do.)
Logic is just one aspect of the way people think when they are discussing any subject. The subject matter
may be anything at all, including the written or spoken discussions of other people. Since logic is often part
of the discussed context, logic may be part of the subject of discussion as well as being part of the discussion
itself.
Confusion can be avoided in discussions of logic by distinguishing clearly between multiple contexts. Con-
fusion can be maximized by mixing multiple contexts without indicating which portions of the discussion
belong to each context.
3.3.5 Remark: Models and meta-models may co-exist within one logic machine.
Meta-modelling is the modelling of a model. Both the model and the meta-model may co-exist within one
logic machine. This is illustrated in Figure 3.3.3. Model 1 is a meta-model for machine 3 in this diagram
and also in Figure 3.3.2.
[Diagram labels: model 1, model 2; machine 1 = machine 2, machine 3.]
Figure 3.3.3 Meta-modelling within one logic machine (or mind)
[Diagram labels: model 1, model 2; machine 1, machine 2; alternating nested PC and ZF labels.]
Figure 3.3.4 Recursive modelling of one machine by a second machine
3.3.6 Remark: Recursive modelling of one machine by a second machine.

If two modelling machines are so foolhardy as to attempt to model each other, the result is recursive madness.
This is illustrated in Figure 3.3.4.
For example, Zermelo-Fraenkel set theory (ZF) may be modelled within propositional calculus (PC), and
PC may be modelled within ZF. One should not examine recursive models too closely. It is particularly
inadvisable to try to perform recursive modelling inside a single mind as in Figure 3.3.5. (Attempting to
do this can adversely affect your sleep pattern!) One can at least hope that the external modelled system,
machine 3 in this case, will not be as deeply recursive as the abstract model suggests.
[Diagram labels: model 1, model 2; machine 1 = machine 2, machine 3; PC + ZF with alternating nested PC and ZF labels.]
Figure 3.3.5 Recursive modelling inside a single machine (or mind)
It is difficult to discuss self-consistency for a recursive model because one can never really define anything
fully. One can only define things in terms of other things, which are defined in terms of yet other things,
and so forth. However, one could perhaps discuss “coherence” as a second best.
3.3.7 Remark: In practice, modelling loops do not become infinite.

The recursive modelling situations alluded to in Remark 3.3.6 are not quite as disastrous as they look.
The reason for this is the same as the reason why positive feedback in audio systems does not exhibit the
theoretical infinite runaway behaviour which the simplest mathematical models predict. The voltages in an
audio system do not reach billions of volts. Nor does the sound level destroy the Earth’s atmosphere by
over-heating. Audio systems rapidly reach the limits of the simplest kind of model, after which the behaviour
is constrained by those limits.
In the case of modelling loops, the definitions are never expanded to an infinite extent. For example, a
ZF set theory concept may be converted to pure predicate logic (such as a first order language). The pure
logic may then be written in terms of the set theory of the concrete propositions domain and the various
relations and maps which are required to formalize logic. Those sets, relations and maps may then be
converted into pure predicate logic again. In practice, finite machines such as human brains never perform
this unbounded expansion very far. The expansion is limited by practical concerns. If the expansion is
performed on computers, memory space is generally an effective limitation.
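A bounded expansion of this kind may be sketched as follows. The two-entry definition table is a hypothetical stand-in for “set theory defined in terms of predicate logic” and vice versa, and the depth limit plays the role of the practical concerns which stop the expansion.

```python
# Hypothetical mutually recursive "definitions": each term is defined in
# terms of the other, as with set theory (ZF) and predicate logic (PL).
DEFINITIONS = {
    "ZF": ["PL"],
    "PL": ["ZF"],
}


def expand(term, max_depth):
    """Expand a term by substituting definitions, stopping at max_depth."""
    if max_depth == 0:
        return term  # practical limit reached: leave the term unexpanded
    parts = DEFINITIONS.get(term, [])
    if not parts:
        return term  # primitive term with no further definition
    inner = ", ".join(expand(part, max_depth - 1) for part in parts)
    return f"{term}[{inner}]"


print(expand("ZF", 3))  # prints ZF[PL[ZF[PL]]]
```

Without the depth limit the recursion would never terminate. With it, each expansion is a finite object, however large the limit is chosen to be.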
All of the infinite aspects of logic and set theory are only potential. The axioms and rules define the range
of what can be expressed within logic and set theory. The full range cannot be instantiated in practice.
Unbounded traversals through set theory and logic, following the recursive definitions of each in terms of
the other, need only be free of contradictions. One may say that the combined recursive
system is “coherent” if no contradictions can happen in any such traversal.

In terms of the proposition-store ontology for logic in Section 3.7, one may say that any logical system is
only “finitely populated” at any point in time. In other words, despite the theoretically infinite number of
propositions which can be asserted within most logical systems, the proposition store never becomes infinitely
populated with propositions. (One might compare this with the Earth, which only has a finite population
at any point in time.) The axioms and rules of a logical system determine what propositional offspring is
permitted from any group of parent propositions. The expansion of definitions follows similar breeding rules.
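The “finitely populated” proposition store can be sketched as a set which grows by applying rules a bounded number of times. The function and the successor rule below are illustrative assumptions, not part of the ontology in Section 3.7. The rule could generate propositions without end, but the store is finite after every step.

```python
def grow_store(axioms, rules, steps):
    """Apply the rules to the store a bounded number of times.

    Each rule maps the current store (a set) to a set of newly permitted
    propositions. The store is finite after every step, even if the rules
    could go on generating propositions indefinitely.
    """
    store = set(axioms)
    for _ in range(steps):
        offspring = set()
        for rule in rules:
            offspring |= rule(store)
        store |= offspring
    return store


def successor_rule(store):
    """Hypothetical rule: from 'n is a number' infer 'n + 1 is a number'."""
    return {n + 1 for n in store}


store = grow_store(axioms={0}, rules=[successor_rule], steps=5)
print(sorted(store))  # prints [0, 1, 2, 3, 4, 5]
```

Here each integer n stands for the proposition “n is a number”. The rules fix which offspring are permitted, while the number of steps actually taken fixes how populated the store is at any given time.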
3.3.8 Remark: Logic and set theory do not build anything. Customers must provide everything.
In a sense, there is no real problem with the circularity of logic and set theory. Both of these theories
merely tell you that if you have a system which may be accurately modelled by the rules and axioms which
you propose for them, then there are various consequences (i.e. theorems). The methods and techniques of
logic and set theory do not build anything! They only tell you the consequences of rules and axioms which
are presumed to be valid for a system which you must provide yourself. If you have no such system to be
modelled, then no conclusions can be drawn. To put it simply, logic and set theory tell you: “Give me a
system which satisfies these rules and axioms, and I will tell you some conclusions which can be drawn.” In
other words, the spaces of propositions and sets must be built and provided by other means.
The apparent mutual circularity of logic and set theory disappears when it is realized that the actual
construction of concrete logic and set theory systems is “somebody else’s problem”! Logicians and set
theorists are only service providers, not manufacturers.
So it is okay that a logical system requires the a-priori provision of sets of logical propositions, predicates,
variables and names. The methods of logic do not have to provide these. They are assumed to satisfy, for
example, the rules and axioms of ZF or NBG set theory. It is the customer’s responsibility to provide these
sets and ensure that they meet the prerequisites.
Likewise it is okay that a set theory requires the a-priori provision of a logical system within which the set
theory axioms may be presented and developed. Once again, it is the customer’s responsibility to provide
this. Set theory delivers no conclusions at all unless a logical system is provided which meets the service
provider’s specifications.
Logic and set theory may be combined. The logic service provider requires you to provide a set theory which
meets the requirements of the set theory service provider. And vice versa. If the customers are unable to
provide the prerequisites in advance, the service providers will not be able to deliver any valid services at
any price.
3.3.9 Remark: Truth-value status of propositions depends on logical system context.
Within a logic machine A, all propositions which are proved in A are true and are believed without question.
But from the point of view of a machine B which models machine A, all proved propositions in A are
completely arbitrary, since the propositions in A are merely entities within a modelled system. If A is also
modelling B, the propositions in machine B seem equally arbitrary to A. This kind of situation is familiar
when the “logic machines” are human minds. But the situation also arises when two symbolic logic systems
are able to model each other. Then, depending on which logical system context one is thinking within, each
proposition may seem to be either unquestionably true or else completely arbitrary. Thus the truth-value
status of propositions is highly dependent on the discussion context. Consequently one can never say that
anything in mathematics is true or false. Everything is context-dependent.
3.3.10 Remark: Mathematical logic may be useful for ethics discussions, subject to conditions.
A clear distinction is generally made between empirical propositions and value propositions. The empirical
category includes: “You are eating the ice cream.” The value category includes: “You should not eat
the ice cream.” This division of propositions into IS-propositions and SHOULD-propositions is not always
completely clear. But science is generally held to be in the first category, while ethics is held to be in the
second category.
Logic may be applied to both the empirical and ethics categories of propositions. However, ethics propositions
generally depend very much upon empirical propositions whereas empirical propositions are not supposed to
depend on value considerations. At least, empirical propositions should not depend on ethical propositions.
In practice, the two-state yes/no requirement for logical propositions is easier to satisfy with empirical
propositions than with value propositions. Value statements are often subjective, controversial, ambiguous
or meaningless. Standard mathematical logic absolutely requires two mutually exclusive states for each
proposition. If this requirement can be satisfied, mathematical logic may be a useful model for ethical
discussions, and logical tools may be of some value.
Laws (for human conduct, not scientific laws) are very often written in the form A ⇒ B, where A is an
empirical proposition and B is an imperative proposition. When B is some sort of punishment, it is implicit
that A should not be done. In other words, A is then a value proposition. But when the law is applied,
A is generally empirical (to be determined by evidence) whereas B is effectively a value statement which
says that action B should or must be carried out. There is clearly ample opportunity here for ambiguity.
On the other hand, laws are probably among the earliest applications of logic, with a history of thousands
of years. The antecedent A of a law A ⇒ B is often a compound logical expression with many component
propositions.
3.4. The universality (or otherwise) of modern logic

3.4.1 Remark: The correctness of mathematical logic is proved by robots on Mars.
If logic is a mere anthropological observable, one might reasonably ask why our culture has so much certainty
in the objectivity and universality of its logical processes. Is there any objective measure of the “correctness”
of our culture of logical argument? After all, there have been other systems of logic within recorded history
which are different to our current logic.
An extraterrestrial observer could note that our culture has sent robots to Mars. Previous cultures have
not done so. It seems to be a fair statement that the ability to engineer Martian robots, and navigate
them through the Solar System’s gravitational fields and other obstacles to land them on Mars and transmit
close-up photos and measurements back to Earth, is an objective indication that our logic is well attuned to
the nature of the Universe. One could point to any of the accomplishments of the last century as supporting
evidence for the superiority of our logic, mathematics, physics, chemistry and engineering.
A person commencing education in our culture must make great efforts to accept the propositions and rules of
argumentation of our society. People would presumably not make such effort if there were no incentives, but
the incentives are very strong. Human beings obtain great power through their current methods of logical
argumentation, mathematics and science. It is quite possible that if, in future, the outputs from science
are viewed as bad, our society could abandon the currently fashionable logic and adopt other methods of
argumentation.
3.4.2 Remark: Aristotle’s logic may have been popular because of Alexander’s success.
The case could be made that Aristotle’s logic was held in high esteem by medieval Europe partly because he
was well known to have been a tutor of Alexander the Great. Aristotle became tutor to Alexander in either
343bc (Russell [186], page 173; Barnes [187], page 10) or 342bc (Cotterill [181], page 468). According to
Russell, Aristotle tutored Alexander between the ages of 13 and 16 years. The association of Aristotle with
Alexander was very well known (Russell [186], page 174).
At the death of Alexander, the Athenians rebelled, and turned on his friends, including Aristotle,
who was indicted for impiety, but, unlike Socrates, fled to avoid punishment.
This is a reference to the execution of Socrates in 399bc, partly on account of his tutoring of such unpopular
politicians as Alcibiades. Similarly, Barnes [187], page 11, says:
In 323 Alexander died in Babylon. When the news reached Athens Aristotle, unwilling to share the
fate of Socrates, left the city lest the Athenians put a second philosopher to death.
Russell [186], pages 206–212, is quite critical of Aristotle’s logic and describes numerous errors and points of
confusion. On page 206, he says the following about the undeserved influence of Aristotelian logic.
Aristotle’s influence, which was very great in many different fields, was greatest of all in logic. In
late antiquity, when Plato was still supreme in metaphysics, Aristotle was the recognized authority
in logic, and he retained this position throughout the Middle Ages. It was not till the thirteenth
century that Christian philosophers accorded him supremacy in the field of metaphysics. This
supremacy was largely lost after the Renaissance, but his supremacy in logic survived. Even at
the present day, all Catholic teachers of philosophy and many others still obstinately reject the
discoveries of modern logic, and adhere with a strange tenacity to a system which is as definitely
antiquated as Ptolemaic astronomy. This makes it difficult to do historical justice to Aristotle.
His present-day influence is so inimical to clear thinking that it is hard to remember how great an
advance he made upon all his predecessors (including Plato), or how admirable his logical work
would seem if it had been a stage in a continual progress, instead of being (as in fact it was) a dead
end, followed by over two thousand years of stagnation.
It seems plausible that the strong, broad acceptance of Aristotelian logic for such a long time was influenced
by the very high regard in which Alexander’s empire-building capability was held. This would seem to
support the idea that people accept the logic framework which they believe has the most prestige or the
greatest power. Therefore our current system of logic may not be so absolute and universal as it seems.
One could go further and speculate that all knowledge and “truth” is determined by prestige, power and
economic self-interest. If science ever stops delivering concrete benefits, the world could return once more
to a pre-scientific dark age.
3.4.3 Remark: Mathematical logic is an imperfect model for the physical world.
It is very difficult indeed to see how logic could possibly be wrong. It seems that logic is so fundamental and
obvious, it must be beyond doubt. The propositional calculus should be beyond doubt too. How can one
find anything dubious in simple, basic logic?
As a human calculational activity, as a set of procedures, one can no more cast doubt on simple logic than
one can disprove the rules of the game of chess. Logic is a well-defined set of procedures which are agreed
upon by large numbers of people. Anyone who adheres to the rules should arrive at the same conclusions in
all cases.
The doubtfulness of logic lies in its applicability to real-life propositions. This is analogous to the way
in which the integers have well-defined rules, although the applicability of the integers to the real world
is dubious. There are no collections of physical objects with 10^1000 elements. So the interpretation of
integers as cardinalities of collections of objects is dubious. All but a finite number of integers must be “grey
numbers”, as mentioned in Remark 2.11.12. So almost all integers cannot even be thought about, or written
down, or represented in computers. Even the counting of small numbers of physical objects is quite dubious,
as mentioned in Remark 2.11.19. So the integers are an abstraction which is merely a model for the real
world. As the old saying goes, the real world is only an approximation to the models of theoretical physics.
Similarly, real-world counting is a mere approximation to the mathematical integers, and real-world logic is
only an approximation to the crisp perfection of abstract logic.
Logical propositions are an abstraction from real-world propositions. In the real world, no proposition is
ever true or false beyond all doubt. This is partly because of quantum mechanics. But it also follows from
the unreliability of human observation. Even the state of a human mind is highly variable. It is impossible
to know what anyone means when they say that a proposition is “true”. A person’s beliefs may change over
time. So during a time of transition, the truth value is in doubt. Also, most propositions are contingent on
other propositions, which are themselves in doubt. All empirical knowledge has a probabilistic or statistical
nature. Most propositions are open to a wide range of interpretations. The definition of “truth” is itself
very elusive. Truth is such a fundamental concept that it is impossible to define, just as probability cannot
ultimately be defined in concrete terms. Truth is a “primitive concept” which is not reducible to other
concepts.
Thus logic is a perfect representation of itself, but it is dubious that it accurately models real-life propositions
which are held by real-life human beings.
The above comments may be summarized by the statement: “Logic is a human procedure.” In other words,
logic is not an absolutely accurate model for anything in the real world.
3.4.4 Remark: Logic might not be a-priori absolute and universal.

Immanuel Kant asserted that Euclidean geometry was somehow a-priori determined in some sense. Since his
time (1724–1804), it has been discovered that real-world physical space does not necessarily follow Euclidean
geometry at all. This raises doubts that anything is really a-priori determined, including logic and numbers.
To quote Bell [190], page 344, writing in 1945:
The arbitrary freedom in the mathematical construction of ‘spaces’ and ‘geometries’ at last made
it plain that Kant’s a priori space and his whole conception of the nature of mathematics are
erroneous. Yet, as late as 1945, students of philosophy were still faithfully mastering Kant’s obsolete
ideas under the delusion that they were gaining an insight into mathematics. As Kant appealed
to his mathematical misconceptions in the elaboration of his system, it is just possible that some
other parts of his philosophy are exactly as valid as his mathematics.
Russell [186], pages 685–686, gave the following summary of Kant’s assertions about geometry.
As regards space, the metaphysical arguments are four in number.
(1) Space is not an empirical concept, abstracted from outer experiences, for space is presupposed
in referring sensations to something external, and external experience is only possible through
the presentation of space.
(2) Space is a necessary presentation a priori, which underlies all external perceptions; for we
cannot imagine that there should be no space, although we can imagine that there should be
nothing in space.
(3) Space is not a discursive or general concept of the relations of things in general, for there is
only one space, of which what we call ‘spaces’ are parts, not instances.
(4) Space is presented as an infinite given magnitude, which holds within itself all the parts of
space; this relation is different from that of a concept to its instances, and therefore space is
not a concept but an Anschauung.
The transcendental argument concerning space is derived from geometry. Kant holds that Euclidean
geometry is known a priori, although it is synthetic, i.e. not deducible from logic alone. Geometrical
proofs, he considers, depend upon the figures; we can see, for instance, that, given two intersecting
straight lines at right angles to each other, only one straight line at right angles to both can be drawn
through their point of intersection. This knowledge, he thinks, is not derived from experience. But
the only way in which my intuition can anticipate what will be found in the object is if it contains
only the form of my sensibility, antedating in my subjectivity all the actual impressions. The
objects of sense must obey geometry, because geometry is concerned with our ways of perceiving,
and therefore we cannot perceive otherwise. This explains why geometry, though synthetic, is
a priori and apodeictic.
Some philosophers have fought back against this attack on Kant’s apparent bungle on geometry. For example,
see Palmquist [185].
If logic and numbers are examined in depth, they look less and less a-priori. However, just as the human
concept of geometry has evolved to adapt to the evidence of the senses, it seems reasonable to suppose that
logic also adapts over time to observations of the physical world.
3.4.5 Remark: Evolution should yield animals whose logic matches the real world.
The amazing coincidence that human minds (and a large proportion of other animal minds) are well-prepared
for conceptualizing two-dimensional and three-dimensional space is easily explained by adaptation to the
environment. Similarly, colour perception is a useful adaptation, not a purely coincidental match between
capability and utility. In the 20th century, the concept of geometry was extended from flat spaces to curved
spaces in response to physical observations. The concept of colour was extended in the last few centuries
from visible light to a vast spectrum of electromagnetic radiation in response to observations also. It would
not be too surprising, then, if our concepts of logic needed to be extended in some way over the next few
centuries, possibly in response to new discoveries in quantum mechanics or other areas of physics.
3.5. Logic in literature

3.5.1 Remark: Ancient literature may give clues to the meaning of logic.
It is difficult to develop mental concepts without language. When concepts are present in a language,
the users of that language develop and reinforce those concepts through frequent use. Therefore it seems
plausible that one may trace the development of human thinking about particular concepts by studying their
appearance in language. For example, one could study the use of logic words in the 6000 or so languages
which are currently in use in the world. This might indicate something about the way in which people in
different cultures actually think. A society which does not have words for “if” and “or” would be unlikely
to have well-developed capabilities in propositional logic.
Written human language dates from about 3500–3250bc. Cuneiform writing on tablets dates from about
3000bc. Therefore we have potentially some access to the development of human language and thought over
the last 5000 years. There are several obstacles in this path. There were not very many humans on Earth
5000 years ago. In 2250bc, there were only about 25 million people on Earth. In 3500bc, there were only
about 4 million people in Europe, Western Asia and North Africa. (See McEvedy [184], page 34.) Very few
of these people were actually writing anything. And of the tiny amount of literature that was written, only
a very tiny proportion has survived to the current time.
3.5.2 Remark: The epic of Gilgamesh contains almost no logic in 3000 lines.
The most ancient substantial narration (about 3000 lines) is the Gilgamesh epic, which is set principally
in Iraq around 2750bc or 2650bc. It was initially composed orally about 2250bc and was written down in
Sumerian and Akkadian in various versions between about 2050bc and 650bc, the standard version being
written down around 1100bc. (See Gilgamesh [203].)
Since the epic of Gilgamesh is so ancient, one would expect to see less logic in the text than in more modern
texts for two main reasons: people probably had not developed logical thinking to a great extent, and any
logic which was present in people’s thinking would probably not have been given vocabulary and grammatical
structures at that time. This conjecture is supported by a reading of Gilgamesh. The entire epic seems to
have no occurrences at all of the word “or”, nor any equivalent grammatical construction. This is not
surprising because the word “or” generally indicates uncertainty or ignorance. This is unlikely to be popular
in a narration.
The word “or” requires the narrator to convey two possibilities, neither of which is certain. The truth of one
proposition is conditionally linked to the truth of the other. (The inclusive “or” has the form (¬A) ⇒ B.
The exclusive “or” has the form (¬A) ⇔ B.)
By contrast, the word “and” can be omitted because it is the natural way of thinking. (A ∧ B is equivalent
to the sequence of propositions A, B, stated one after the other.) So the word “and” conveys certainty of
two or more propositions, just like stating the propositions one after the other. The word “not” indicates
certainty of the negative proposition, leaving no room for doubt.
Logical expressions of the form ¬A ∧ ¬B also indicate certainty because, in this case, two negative assertions
are made with certainty. For example, the following line appears in Gilgamesh [203], page 5.
he knows not a people, nor even a country. I.108
This is superficially equivalent to the form ¬(A ∨ B), but this is not how the statement is communicated.
The following is another case of this logical form.
‘O Gilgamesh, there never has been a way across, X.79
nor since olden days can anyone cross the ocean.’ X.80
It could be argued that a statement of the form ¬A ⇒ B is equivalent to A ∨ B. (The quotation of
lines VI.96–97 in Remark 45.4.1 is of this form.) But such statements are not expressed as disjunctions in
Gilgamesh.
The epic of Gilgamesh has 11 instances of the word “if”. These are listed in Remark 45.4.1. For 3000 lines
of text, this is a rather small amount of logical language. This seems to support the idea that logic appeared
in language at a late stage, and that quite likely, people did not think in the modern logical fashion very
frequently.
The paucity of logic in Gilgamesh contrasts with the abundance of numbers. For example, tablet I has 7
unique numbers in 300 lines (1/2, 3, 2/3, 1/3, 6, 2, 7, in order of appearance). A fuller set of number counts is
listed in Table 3.5.1. (The last column shows the number of IF-sentences.)
This seems to support the view that logic is a later development than numbers in human history. One
might argue that narrations are, by their nature, unlikely to include OR and IF logic, because these indicate
uncertainty or ignorance, as opposed to AND and NOT logic, which indicate certain knowledge. The readers
and listeners of narrations typically just want to know what happened. They don’t want to guess! Therefore
it is no coincidence that all of the eleven IF-sentences in Remark 45.4.1 are part of quoted dialogues, not
part of the main narration.
tablet  lines  count  numbers                                                                       if
I       300    7      1/2, 3, 2/3, 1/3, 6, 2, 7
II      303    5      2, 7, 60, 1/2, 10                                                             1
III     287    4      2, 7, 13, 20
IV      260    10     20, 30, 50, 3, 1 1/2, 2, 4, 5, 7, 6
V       302    5      2, 3, 13, 10, 6
VI      183    7      3, 2, 7, 100, 200, 30, 6                                                      3
VII     267    12     2, 20, 6, 7, 3, 4, 5, 8, 9, 10, 11, 12                                        1
VIII    219    4      10, 40, 3, 2
IX      196    13     2/3, 1/3, 12, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
X       322    14     6, 7, 300, 5, 3, 1 1/2, 2, 4, 8, 9, 10, 11, 12, 120                           5
XI      329    17     5, 10, 6, 7, 9, 30,000, 10,000, 20,000, 2/3, 14, 2, 3, 4, 20, 30, 1/2, 3 1/2  1
XII     153    6      2, 3, 4, 5, 6, 7
Table 3.5.1 The numbers of instances of numbers and logic in Gilgamesh

3.5.3 Remark: The Hammurabi Code of Laws, which date from about 1860bc, show logic in almost every
law. For example:
7. If any one buy from the son or the slave of another man, without witnesses or a contract, silver
or gold, a male or female slave, an ox or a sheep, an ass or anything, or if he take it in charge, he
is considered a thief and shall be put to death.
Presumably there were laws before these were written down, and all laws would have much the same level
of logical structure.
3.5.4 Remark: Bēowulf contains more logical language than Gilgamesh.

The Anglo-Saxon (Old English) epic Bēowulf [219] is a 3182-line poem composed in England about 840ad. A
reading of this work certainly reveals more instances of “or” and “if” logical constructions. However, many
appearances of these words in an English translation [195] turn out to be artefacts of the translation. The
first three instances of the word “or” in this translation (in lines 186, 250 and 252) are such artefacts. If one
carefully reads the Old English source to eliminate such artefacts, there remain genuine OR-constructions at
16 locations, in lines 283, 437, 635, 637, 693, 1491, 1763–1766 (7 instances), 1848, 2253, 2376, 2434, 2494,
2495, 2536, 2840 and 2922. This may seem a small number, but it contrasts strongly with the zero in
Gilgamesh. (See
Remark 45.4.2 for examples of logic in Bēowulf.)
The Old English word for “or” is variously written as “oððe”, “oðþe” or “oþðe”. (Pronunciation: “ð” (called
“eth”) sounds like “th” in “this”; “þ” (called “thorn”) sounds like “th” in “thick”.) This word is derived
from the word meaning “other”, which English retains in the word “otherwise”. This shows that originally
the word “or”, which is derived from “oððe”, had the meaning of an implication. In other words, a sentence
of the form A ∨ B was effectively written as (¬A) ⇒ B. That is: “A is true, otherwise B is true.”
There are 26 IF-constructions in Bēowulf in lines 272, 280, 346, 442, 447, 452, 527, 593, 684, 1104, 1182, 1185,
1319, 1379, 1382, 1477, 1481, 1822, 1826, 1836, 1852, 2514, 2519, 2637, 2841 and 2870. (The Old English
word for “if” is “gif” or “gyf”.) As in the case of OR-constructions, there is a contrast with Gilgamesh,
which has only 11 IF-constructions in roughly the same number of lines.
There are also many instances of the word “unless”. (This is “nefne”, “nemne” or “nymþe” in Old English.)
The word “unless” is logically equivalent to “or”. This is readily verified by noting that “A unless B” means
“if not A, then B”, which means (¬A) ⇒ B, which is equivalent to A ∨ B. There are numerous instances of
“unless” in Bēowulf, including at lines 781, 2151, 2533, 2654 and 3054.
Logic in Bēowulf is also expressed with the word “except”, which is “būtan” in Old English. This often
means the same as “unless”. In fact, this gives a clue to how logic can enter into natural language. All
OR-constructions and IF-constructions have the character of exceptions. This is summarized in the following
table.
statement      formula    main claim   second claim
A or B         A ∨ B      A            B
if A then B    A ⇒ B      ¬B           A
A, unless B    ¬A ⇒ B     A            B
Each of these logical expressions has a main claim. But if the main claim fails, there is a back-up claim, the
exception. This is why the word “or” in English is so closely related to “otherwise”, and both words come
from the Old English “oððe”. The word “otherwise” means “in the exceptional case that the main claim
fails”. The OR-construction and IF-construction differ in that the main claim for the OR-construction is
the foreground or default claim, whereas the main claim for the IF-construction is in the background, as the
normal case in which the exceptional condition does not apply. (See also Remarks 3.6.1 and 3.7.3 for
foreground/background propositions.)
It follows that non-atomic logical expressions (i.e. not including AND and NOT expressions, which are really
atomic assertions) have the character of afterthoughts. This suggests a state of mind in which one claim is
made, followed by another state of mind in which the exception is stated. This may explain why there was
little non-atomic logic in very early literature.
3.5.5 Remark: There are essentially only two kinds of binary logical operator.
If one considers that propositions and their negatives are of equivalent status (since the negative of every
atomic proposition is an atomic proposition), there are in fact only two kinds of non-atomic binary logical
operator: inclusive and exclusive. To clarify this, let X_{T,T} X_{T,F} X_{F,T} X_{F,F} denote the sequence of values of
a truth function X : {F, T}^2 → {F, T}. For example, TTTF denotes the logical operator (A, B) ↦ A ∨ B.
Then the 16 possible truth functions may be expressed as follows.
function  operator   type          sub-type   equivalents           atoms
TTTT      ⊤          always true                                    0
TTTF      ¬A ⇒ B     implication   inclusive  A ∨ B      A ⇐ ¬B
TTFT      ¬A ⇒ ¬B    implication   inclusive  A ∨ ¬B     A ⇐ B
TTFF      A          atomic                                         1
TFTT      A ⇒ B      implication   inclusive  ¬A ∨ B     ¬A ⇐ ¬B
TFTF      B          atomic                                         1
TFFT      A ⇔ B      implication   exclusive  ¬A ⇔ ¬B
TFFF      A ∧ B      conjunction              A, B                  2
FTTT      A ⇒ ¬B     implication   inclusive  ¬A ∨ ¬B    ¬A ⇐ B
FTTF      A ⇔ ¬B     implication   exclusive  ¬A ⇔ B
FTFT      ¬B         atomic                                         1
FTFF      A ∧ ¬B     conjunction              A, ¬B                 2
FFTT      ¬A         atomic                                         1
FFTF      ¬A ∧ B     conjunction              ¬A, B                 2
FFFT      ¬A ∧ ¬B    conjunction              ¬A, ¬B                2
FFFF      ⊥          always false                                   0
The “always true” and “always false” operators convey no information at all. The 4 atomic propositions
convey information about only one of the propositions. So these are not, strictly speaking, binary operators.
The 4 conjunctions are equivalent to simple lists of two atomic propositions. So the information in these
operators can be conveyed by individual propositions, one after the other.
This leaves the 4 inclusive and 2 exclusive disjunctions (or implications). The 4 inclusive disjunctions run
through the 4 combinations of T and F for the two propositions. They are equivalent to each other under
swaps of the proposition truth values. The 2 exclusive disjunctions are similarly equivalent to each other
under swaps of proposition truth values.
Thus, apart from binary logical expressions which are equivalent to lists of 0, 1 or 2 atomic propositions,
there are only two logical operators which are unique under swaps of proposition truth values.
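The classification in Remark 3.5.5 can be checked by brute force. The following Python sketch is an illustration only and is not part of the formal development. It assumes that a “swap of proposition truth values” means replacing A by ¬A and/or B by ¬B in the arguments of a truth function. The names PAIRS, code, from_code and orbit are hypothetical helpers introduced here.

```python
from itertools import product

# Argument pairs in the order used for the 4-letter codes: TT, TF, FT, FF.
PAIRS = [(True, True), (True, False), (False, True), (False, False)]

def code(f):
    """Return the 4-letter code of a truth function f(A, B), e.g. 'TTTF' for A or B."""
    return "".join("T" if f(a, b) else "F" for a, b in PAIRS)

def from_code(c):
    """Return the truth function whose values are given by the 4-letter code c."""
    table = {pair: ch == "T" for pair, ch in zip(PAIRS, c)}
    return lambda a, b: table[(a, b)]

def orbit(c):
    """Return the codes obtained from c by negating A and/or B in the arguments."""
    f = from_code(c)
    return frozenset(
        # a != na negates a exactly when na is True (assumed meaning of a "swap").
        code(lambda a, b, na=na, nb=nb: f(a != na, b != nb))
        for na, nb in product([False, True], repeat=2)
    )

all_codes = ["".join(p) for p in product("TF", repeat=4)]
orbits = sorted({orbit(c) for c in all_codes}, key=lambda o: (len(o), sorted(o)))
for o in orbits:
    print(len(o), sorted(o))
```

Under this assumption the 16 truth functions fall into 7 classes: the two constants, the pairs {A, ¬A} and {B, ¬B}, the 2 exclusive operators, the 4 conjunctions and the 4 inclusive disjunctions. Only the inclusive and exclusive classes are not reducible to lists of 0, 1 or 2 atomic propositions.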
3.5.6 Remark: Biconditionals are equivalent to lists of conditionals.

One may observe that the biconditional A ⇔ B is equivalent to a list of two simple conditionals: A ⇒ B, B ⇒
A. Therefore all non-atomic propositions are conditionals. This re-write of biconditionals as conditionals is
fairly close to how people do think about biconditionals colloquially. Alternatively, A ⇔ B is equivalent to
the list: A ⇒ B, ¬A ⇒ ¬B. This is perhaps closer to how people really think about biconditionals.
However, disjunctions are not so easily expressible as lists of disjunctions. The exclusive disjunction A ⊻ B
is equivalent to the list of disjunctions: A ∨ B, ¬A ∨ ¬B. This might seem convincing to the modern
mathematical mind. But it is probably not how people think about the exclusive OR in non-technical
contexts. People probably think of A ⊻ B as meaning: (A ∨ B) ∧ ¬(A ∧ B), which is not a list of disjunctions.
Another natural way of thinking about A ⊻ B is: (A ∧ ¬B) ∨ (¬A ∧ B), which looks even less like a list of
disjunctions. The list A ∨ B, A ⇒ ¬B seems moderately intuitively convincing. So does the list ¬A ⇒ B,
A ⇒ ¬B. But these last two lists use implications in place of disjunctions.
It can be tentatively concluded that inclusive and exclusive OR-expressions are qualitatively different. Sim-
ilarly, conditionals and biconditionals are qualitatively different.
              permissive   strict
              (one-way)    (two-way)
implication   A ⇒ B        A ⇔ B
disjunction   ¬A ⇒ B       A ⇔ ¬B
A permissive operator makes a one-way link from one proposition to another. A strict operator makes a
two-way link between propositions. But the two two-way implications are not thought of in the same way.
The two-way disjunction is thought of as exclusive multiple choices: (A ∧ ¬B) ∨ (¬A ∧ B). The two-way
implication is thought of as a choice between “both true” and “neither true”: (A ∧ B) ∨ (¬A ∧ ¬B). In
other words, in both cases, a more natural way of thinking is as a disjunction of conjunctions, not as a list of
one-way implications or disjunctions. One may therefore conclude that all four non-atomic logical operators
are qualitatively different, even though they may all be written as one or two implications.
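The equivalences asserted in Remark 3.5.6 can be verified by truth tables. The following Python sketch is an illustration only. It assumes that a list of propositions is read as their conjunction, as in the comment on “and” in Remark 3.5.2. The helper names implies and same, and the dictionary checks, are hypothetical and introduced here.

```python
from itertools import product

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

def same(f, g):
    """True if f and g agree on all four truth-value pairs (A, B)."""
    return all(f(a, b) == g(a, b) for a, b in product([True, False], repeat=2))

iff = lambda a, b: a == b   # biconditional
xor = lambda a, b: a != b   # exclusive OR

# Each list of propositions is modelled as a conjunction (an assumption).
checks = {
    "iff as A=>B, B=>A":            same(iff, lambda a, b: implies(a, b) and implies(b, a)),
    "iff as A=>B, notA=>notB":      same(iff, lambda a, b: implies(a, b) and implies(not a, not b)),
    "iff as (A&B) or (notA&notB)":  same(iff, lambda a, b: (a and b) or (not a and not b)),
    "xor as AvB, notA v notB":      same(xor, lambda a, b: (a or b) and (not a or not b)),
    "xor as (AvB) & not(A&B)":      same(xor, lambda a, b: (a or b) and not (a and b)),
    "xor as (A&notB) or (notA&B)":  same(xor, lambda a, b: (a and not b) or (not a and b)),
    "xor as AvB, A=>notB":          same(xor, lambda a, b: (a or b) and implies(a, not b)),
    "xor as notA=>B, A=>notB":      same(xor, lambda a, b: implies(not a, b) and implies(a, not b)),
}
for name, ok in checks.items():
    print(name, ok)
```

All of the forms agree on every truth-value pair, so the differences discussed in the remark are differences of presentation and of how people think, not of truth function.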
3.5.7 Remark: Colloquial logic confusions: inclusive versus exclusive disjunctions and implications.
In colloquial logic, even in the 21st century, there is frequent confusion between the inclusive OR and the
exclusive OR. Very often, it is the exclusive OR which is meant. Often it is not possible for even the speaker
to determine whether the inclusive or exclusive OR is meant. Thus A∨B is confused with (A∨B)∧¬(A∧B).
(These colloquial logic confusions are also mentioned in Remark 3.13.7.)
Parallel to the inclusive/exclusive OR confusion is the conditional/biconditional confusion where someone
says that B is true if A is true, but it is consciously or subconsciously implied that B is true only if A is
true. Thus A ⇒ B is confused with A ⇔ B.
These colloquial logic confusions are summarized in the following table.
              inclusive   exclusive
implication   A ⇒ B       A ⇔ B
disjunction   A ∨ B       A ⇔ ¬B
As mentioned in Remark 3.5.5, disjunctions and implications are equivalent under swaps of proposition
truth values. So there are only two unique, strictly binary logical operators. But these two are confused in
colloquial logic. So there is effectively only one unique kind of strictly binary logical operator which is found
in general literature.
Further confusions are quite common in ordinary daily life. Some people who hear a sentence of the form
“if A, then B” think that this is equivalent to “since A, then B”, which is equivalent to “B is true because
A is true”. Such people are used to hearing conditional compound sentences only when the antecedent is
true. But even if the probability of this being true in daily life is high, from the purely logical point of view,
A ⇒ B is neither an assertion of A nor of B, and yet many people interpret it as an assertion of both!
3.5.8 Remark: Disjunctions are cognitively more complex than conjunctions.
The low frequency of disjunctions in literature, relative to conjunctions, is not surprising. The compound
proposition “A ∧ B” is equivalent to:
(1) A is true.
(2) B is true.
The compound proposition “A ∨ B” is equivalent to:
(1) A may be true or false.
(2) B may be true or false.
(3) At least one of A and B is true.

Seen from this perspective, the disjunction of two propositions is significantly more difficult to grasp than
the conjunction. The disjunction is quite ambiguous about each of the simple propositions A and B. The
number of true propositions in the pair must be at least one. This seems to imply that some sort of counting
operation is required. This becomes clearer in the case of the triple disjunction A ∨ B ∨ C.
(3) C may be true or false.
(4) Either one, two or three of the propositions A, B and C are true.
None of the three propositions is known to be true or false. But if two of them are known in some other
way to be false, the remaining single proposition will then be known to be true. This can be expressed as
follows.
(3) C may be true or false.
(4) If A and B are false, then C is true.
(5) If A and C are false, then B is true.
(6) If B and C are false, then A is true.
If a person’s objective is to establish that some simple proposition is true when A ∨ B ∨ C is given, the
logical path towards this objective is somewhat complex. The triple conjunction has no such complexity.
The triple conjunction gives three true propositions with no extra work required.
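The inference path for the triple disjunction can be made concrete by enumerating truth-value triples. The following Python sketch is an illustration only. It lists the triples which satisfy A ∨ B ∨ C together with whatever values are already known, and reports which propositions are then forced. The names NAMES, models and forced are hypothetical helpers introduced here.

```python
from itertools import product

NAMES = ("A", "B", "C")

def models(known):
    """Assignments to (A, B, C) which satisfy A or B or C and the known values."""
    result = []
    for values in product([True, False], repeat=3):
        assignment = dict(zip(NAMES, values))
        if not any(values):
            continue  # this triple violates the disjunction A or B or C
        if all(assignment[name] == value for name, value in known.items()):
            result.append(assignment)
    return result

def forced(known):
    """Propositions whose truth value is the same in every remaining model."""
    remaining = models(known)
    return {name: remaining[0][name]
            for name in NAMES
            if len({m[name] for m in remaining}) == 1}

print(forced({}))                         # the disjunction alone forces nothing
print(forced({"A": False}))               # one known value forces nothing new
print(forced({"A": False, "B": False}))   # two known false values force C
```

The disjunction alone leaves 7 of the 8 triples open and determines no single proposition. A conclusion about an individual proposition appears only after two of the three are known to be false.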
3.5.9 Remark: Conjunctions communicate more information than disjunctions.

From the information perspective, the conjunction of propositions gives the maximum information about
them because it specifies all of the truth values, whereas the disjunction gives very little information. The
disjunction tells us only that at least one of the propositions is true, and we don’t even know which. So it is
no wonder that disjunctions are not favoured in ancient literature. (See Remark 4.13.9 for similar comments
regarding universal and existential quantifiers.)
Although conjunctions and disjunctions are superficially similar, since they are in some sense duals of each
other, the differences in information content between them demonstrate that they are fundamentally different.
If logical expressions are written in disjunctive normal form, it becomes clearer how much information is
contained in them. For example, A ∧ B is already in disjunctive normal form, but A ∨ B is equivalent to
(A ∧ B) ∨ (A ∧ ¬B) ∨ (¬A ∧ B), which only narrows down the options to 3 out of 4 possibilities for the
pair of truth values (t(A), t(B)). The triple conjunction A ∧ B ∧ C specifies a single possibility out of 8
possibilities for the truth-value triple (t(A), t(B), t(C)), whereas A ∨ B ∨ C specifies 7 possibilities out of 8,
which is clearly much less specific.
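The counts quoted above can be checked by brute force. The following Python sketch is only an illustration: the helper name count_models is hypothetical, and the built-in functions all and any are used to stand for the N-fold conjunction and the N-fold disjunction respectively.

```python
from itertools import product

def count_models(connective, n):
    """Count the truth-value tuples of length n which `connective` accepts.

    `connective` is a function such as all (conjunction) or any (disjunction).
    """
    return sum(1 for values in product([True, False], repeat=n) if connective(values))

for n in (2, 3):
    total = 2 ** n
    # Conjunction leaves 1 possibility; disjunction excludes only the all-false tuple.
    print(n, count_models(all, n), count_models(any, n), total)
```

For N propositions, the conjunction selects one tuple out of 2^N, whereas the disjunction rules out only the single all-false tuple.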
[ In relation to Remark 3.5.9, quantify the information in conjunctions and disjunctions of N propositions.
E.g. −N^{−1} ln N, or something like that. ]
3.5.10 Remark: Geographical navigation may be a metaphor for logic.

Navigation in the geographical terrain is a quite plausible metaphor for logical expressions. The expression
A ⇒ B could be interpreted as: “If you follow path A, you will arrive at point B.” Multiple such implications
can be concatenated to make up a path to arrive at a distant destination. The expression (A ∨ B) ⇒ C
could be interpreted as: “If you follow either path A or path B, you will arrive at point C.” The expression
¬A could be interpreted as: “Path A leads nowhere.” The analogy is not perfect, but it is plausible that the
mental processes of terrain navigation and propositional calculus have a lot in common.
If this point of view is correct, one may regard essentially all animals as being capable of logic. Animals which
can navigate in sophisticated ways probably implement a richer range of logical expressions. For example,
an animal which knows multiple ways to arrive at a destination can navigate alternative paths if one or more
paths are unexpectedly blocked. The generality of logic required for terrain navigation is probably greater
than that which is required for simple operant conditioning.

3.6. Proposition-store versus world-view ontology for logic

3.6.1 Remark: The world-model ontology for logic explains the double negative with two-way decisions.
The acceptance/rejection model for truth and falsity which is presented in Remark 3.7.2 encounters serious
difficulties when one considers concepts such as double negation. (See Section 3.10 for the semantics of
negation.)
Section 3.7 proposes that an ontology for logic can be based on a machine model where the machine
accumulates true propositions over time. In this model, a proposition is tagged with “true” to mean that it
should be added to the proposition-store whereas a “false” tag means that it should be rejected. But double
negation poses difficulties. The rejection of an instruction to reject a proposition A does not imply that A
should be accepted. In mathematical logic, however, we do want double negation to imply the assertion of
a proposition.
If one spends enough time and energy trying to make an accept/reject model of the true/false concept
correspond naturally to a wide variety of logic contexts, one begins to find this approach frustrating and
unsatisfying. There seems to be something fundamentally lacking in the proposition-store machine model.
The problems with the proposition-store machine model seem to be fixed by a world-model machine model.
In a world-model machine model for logic, each logic machine is thought of as deriving all propositions from
a world-model. The world-model may be a model of anything at all, not necessarily within the physical
world. The world-model may be pure fantasy. Propositions then arise within the world-model as choices
between options for the world-model.
For example, one world-model could be a Newtonian picture of planets revolving around a star in elliptical
orbits. Such a model has a space of unknown parameters such as the number of planets, their orbital
elements, and the mass, radius and temperature of each planet. One may propose that the fourth planet
from the star has an orbital period greater than one year. The assertion or negation of this proposition A
constrains the total space of parameters for the model. Such a model is not just a big set of true propositions.
The propositions are attributes or properties of the model. The model is more than merely the totality of
all true propositions about the model.
If one accepts the world-model machine model for logic, the negation ¬A of a proposition A is always a
positive assertion. When one asserts A, one is constraining the parameter space of the model to a particular
subset of the total set of parameters. The negative assertion ¬A means that the set of parameters is
constrained to the complement of the subset asserted by A.
It follows from this kind of world-model-machine ontology that truth and falsity are not fundamentally
different. The truth of a proposition implies going down one path. The falsity of the proposition implies
going down some other path. In particular, the assertion that a proposition is false does not imply some
sort of vacuum of information. Every negative assertion is effectively a positive assertion. Both positive
and negative assertions are branches of a two-way choice. (This is illustrated in Figure 3.6.1 for the case of
palaeolithic mammoth-hunters deciding whether there is or is not a mammoth drinking at the local pond. If
the news is negative, the state of mind Z does not cease to exist. The state of mind goes down one branch
of a two-way decision tree.)
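[Figure: two-way decision tree. The root, state Z0, poses the question “Mammoth?” (Go or stay?). The “true” branch leads to state Z1, “Mammoth!” (Let’s go!). The “false” branch leads to state Z2, “No mammoth!” (Let’s stay!).]
Figure 3.6.1 Decision-making for mammoth-hunters: Falsity does not imply a vacuum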
To put it another way, a proposition which relates to a world model asserts a foreground option if it is
true, but there are always one or more other options in the background. The alternative to the truth of a
proposition is not a vacuum. To put it simply: For every foreground, there is a background.

By contrast with the acceptance/rejection model for truth and falsity, the falsity of a proposition in the
world-model machine model is never a mere rejection of a proposition. The falsity always implies acceptance
of a complementary proposition. In view of these considerations, a double negative ¬¬A is always equivalent
to the positive assertion A because the complement of a complement equals the original set.
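The set-complement reading of negation can be made concrete with a toy parameter space. The following Python sketch is only an illustration under stated assumptions: the world-model is reduced to a single hypothetical parameter (the orbital period of the fourth planet, in whole years from 1 to 6), and the names PARAMETER_SPACE, assert_prop and negate are invented for this example.

```python
# Hypothetical toy world-model: the only unknown parameter is the orbital
# period of the fourth planet, restricted here to whole years 1..6.
PARAMETER_SPACE = frozenset(range(1, 7))

def assert_prop(predicate):
    """Return the subset of the parameter space where `predicate` holds."""
    return frozenset(p for p in PARAMETER_SPACE if predicate(p))

def negate(subset):
    """Negation is the complement within the parameter space."""
    return PARAMETER_SPACE - subset

A = assert_prop(lambda period: period > 1)   # "period greater than one year"
not_A = negate(A)

print(sorted(A))            # parameters allowed by A
print(sorted(not_A))        # parameters allowed by not-A: also a positive constraint
print(negate(not_A) == A)   # double negation returns the original subset
```

In this sketch, both A and ¬A are non-empty constraints on the same parameter space, which is the sense in which a negative assertion is also a positive one.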
3.6.2 Remark: The observable behaviour of mathematicians suggests a proposition-store model for logic.
Mathematics is quite unusual among the applications of logic. In the case of mathematics, it sometimes
seems that all of the subject can be represented as a set of propositions. In fact, it seems to be possible
to “do” mathematics without having any understanding of the meaning of the symbols at all. A person
(or a computer) who knows only the rules of deduction can verify the correctness or incorrectness of any
mathematical argument if it is presented formally enough. (Computers don’t seem to be very successful at
discovering new, interesting theorems though. Nor are they able to distinguish between superficial and deep
theorems.)
The theorems which are accumulated in any mathematical subject form a proposition-store which seems
to represent the accumulated knowledge in that subject. The strong emphasis in pure mathematics on the
accumulation of theorems may be partly due to the lack of diagrams in journals in the past. This creates
the impression that mathematics is an essentially text-based subject, and that all mathematical ideas can
be represented as a sequence of propositions.
It was the apparent theorem-accumulation nature of mathematics which made this author at first incline
to the view that a natural ontology for logic might be a machine which accumulates propositions. In fact,
mathematics is one of the very few subjects where purely abstract logical deduction, devoid of
semantics, can be so productive. This is why computer software can apparently make sense of mathematics.
The software only makes sense of the textual level of the subject, not the semantic level. This is also why
computer software is not very successful in discovering new approaches in mathematics. Humans work at
both the textual and semantic levels. The propositions of mathematics serve merely to record the conclusions
of the mathematician, and to verify the self-consistency of the arguments. Discovery generally requires an
understanding of the semantics underlying each mathematical subject. (That’s why this author is strongly
in favour of diagrams to illustrate mathematics. Diagrams help to clarify the semantic content of a text.)
The proposition-store-machine ontology in Section 3.7 makes truth and falsity seem to be fundamentally
different in terms of their semantics. Similarly, this ontology makes conjunctions and disjunctions seem
fundamentally different. (See Remark 3.7.14, for example.)
The world-model-machine ontology in Remark 3.6.1 makes truth and falsity seem like duals of each other.
This corresponds very well to the calculus of logical operators, which shows duality between true and false,
and between conjunctions and disjunctions. Thus the world-model-machine ontology is, ironically, closer to
the properties of abstract logical operator calculus than the proposition-store-machine ontology.
In digital electronics, it is well known that the laws of logic are invariant under the swap of high and low
voltages. In other words, when “true” and “false” are swapped, all of the logic rules stay the same. This fits
very well with the world-model-machine ontology.
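The invariance under the swap of “true” and “false” can be verified exhaustively for the two binary operators. The following Python sketch is only an illustration: the names swap, dual, conj and disj are hypothetical. It checks that conjunction, viewed after swapping the truth values on all inputs and on the output, behaves exactly like disjunction, and vice versa.

```python
from itertools import product

def swap(value):
    """Exchange the roles of true and false (high and low voltage)."""
    return not value

def dual(op):
    """Return the operator seen after swapping true/false on inputs and output."""
    return lambda a, b: swap(op(swap(a), swap(b)))

def conj(a, b):
    return a and b

def disj(a, b):
    return a or b

pairs = list(product([True, False], repeat=2))
and_dual_is_or = all(dual(conj)(a, b) == disj(a, b) for a, b in pairs)
or_dual_is_and = all(dual(disj)(a, b) == conj(a, b) for a, b in pairs)
print(and_dual_is_or, or_dual_is_and)
```

This is the familiar De Morgan duality: a circuit which computes “and” under one voltage convention computes “or” under the opposite convention.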
3.6.3 Remark: The excluded middle follows easily from a class model for logical propositions.
The world-model ontology for propositional logic seems to be supported by the chapter on classes and
symbolic logic in Lakoff/Núñez [173], pages 121–139. On page 131, they make the following comment.
The trick here is to conceptualize propositions in terms of classes, in just such a way that Boole’s
algebra of classes will map onto propositions, preserving inferences. This is done in the semantics
of propositional logic by conceptualizing each proposition P as being (metaphorically of course) the
class of world states in which that proposition P is true. For example, suppose the proposition P is
It is raining in Paris. Then the class of world states A, where P is true, is the class of all the states
of the world in which it is raining in Paris. Similarly, if Q is the proposition Harry’s dog is barking,
then B is the class of states of the world in which Harry’s dog is barking. The class A ∪ B is the
class of world states where either P or Q is true—that is, where it is raining in Paris or Harry’s
dog is barking, or both.
The authors then proceed to show how the law of the excluded middle follows from the corresponding law
for classes. It is only a small intellectual leap to conclude that if all propositions are derived as attributes
from world-models, then the excluded middle law must hold for propositional logic. Thus propositions which
transmit world-model attributes can be either true or false, and not both, and not anything else. It is difficult
to think of practically useful propositions which are not attributes of some sort of model. A completely
abstract proposition with no semantics at all would not necessarily obey the excluded middle law, but such
an abstract kind of proposition would seem to have only recreational interest.
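The class interpretation quoted above can be tried out on a very small universe of world states. The following Python sketch is only an illustration: the universe WORLD_STATES, consisting of four hypothetical states (raining in Paris or not, Harry’s dog barking or not), is an assumption made for this example and is not taken from the cited text.

```python
from itertools import product

# Hypothetical world states: (raining_in_paris, harrys_dog_barking).
WORLD_STATES = frozenset(product([True, False], repeat=2))

A = frozenset(s for s in WORLD_STATES if s[0])   # states where P is true
B = frozenset(s for s in WORLD_STATES if s[1])   # states where Q is true
complement_A = WORLD_STATES - A                  # states where not-P is true

p_or_q = A | B
print(len(p_or_q))                        # number of states where P or Q (or both) holds
print(A | complement_A == WORLD_STATES)   # excluded middle: every state decides P
print(A & complement_A == frozenset())    # no state makes P both true and false
```

The excluded middle here is nothing more than the fact that a class and its complement together exhaust the universe of states.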
3.6.4 Remark: Intuitionistic propositional calculus omits the excluded middle axiom.
One of the schools of thought in mathematical logic called “intuitionism” claims that the excluded middle
rule A ∨ ¬A, for all propositions A, should not be accepted. (See EDM2 [35], 156.C, page 614.) The
excluded middle is also known by the Latin name tertium non datur. For a particular propositional calculus,
Mendelson [165], page 43, says that intuitionistic propositional calculus is obtained by replacing the axiom
¬¬α ⇒ α with the weaker axiom ¬α ⇒ (α ⇒ β).
3.7. A proposition-store ontology for logic

[ 2008-12-12: The author concluded a couple of weeks ago that the proposition-store ontology for logic is
unsatisfactory. It seems to describe what mathematicians apparently do, but it doesn’t describe what they
think. It is also useless for giving meaning to logical expressions, even as simple as logical negation. So the
world-model ontology is now being adopted. So this section will be downgraded to a mere side-comment
some time soon. ]
3.7.1 Remark: A proposition may be true or false. An assertion claims that a proposition is true.
The word “proposition” is Latin for “putting forth”. In other words, a proposition is something which one
person (or group of persons) puts forward to another person (or group of persons). Thus a proposition is
something which is offered, with more or less compulsion.
The word “assertion” is Latin for “laying hold of” in the sense of laying claim to something as one’s property.
By extension, an assertion may mean a claim that something is true.
In modern mathematical logic, a proposition is a statement which may be true or false, whereas an assertion
is a proposition which is claimed to be true. (In former times, a proposition was generally a statement which
was claimed to be true.) Thus we distinguish between
(1) the mere writing of a proposition, and
(2) the assertion that it is true.
The assertion symbol “⊢” before a proposition indicates that the writer is asserting that the proposition
is true. The mere writing of a proposition means that the writer merely wishes to discuss the proposition,
no matter if it be true or false. (This is essentially the same as the distinction between indicative and
subjunctive verb moods discussed in Remark 3.12.2.) However, the context of an informal logical argument
often indicates that a proposition is being asserted, not just discussed. It is the writer’s responsibility to
clarify which meaning is intended.
             example    description               verb mood     truth value
proposition  A ∨ B      could be true or false    subjunctive   t(A ∨ B) = T or F
assertion    ⊢ A ∨ B    is claimed to be true     indicative    t(A ∨ B) = T
3.7.2 Remark: Observed logical behaviour includes offering, accepting and rejecting propositions.
In the real world, the practical interpretation of the concepts of truth and falsity by human minds may be
thought of as the acceptance and rejection of propositions over time. The human mind may be thought of
as starting initially with zero (or very few) propositions about the world, and over time, propositions are
added to the individual’s “true proposition set”, or rejected if found to be false. In practice, when a person
says that a proposition is true, what they mean is that they will accept or maintain it in their collection of
propositions. When they say that a proposition is false, they mean that they will reject it if offered, and will
delete it if it was accepted earlier. This is illustrated in Figure 3.7.1.
When particular individuals assert the truth of a proposition, they are claiming that it should be accepted
by themselves and others. This suggests that assertions have a socially coercive character, as suggested in
Remark 3.7.12.

[Figure: a proposition offered from the universe of propositions passes through a proposition truth tester, which either accepts it (T) into the proposition stores or rejects it (F).]
Figure 3.7.1 Model for truth in mathematical logic
There is no absolute meaning for “truth”. Each individual (and perhaps even each state of mind of each
individual) has a particular set of true propositions. A proposition is said to be true for a particular speaker
if it is accepted by that speaker. It is said to be false if it is rejected by the particular speaker. Acceptance
and rejection are states of mind which can be inferred from observable social behaviour. According to this
point of view, truth and falsity are subjective.
The proposition acceptance/rejection model for logic has some difficulties which become clear in the case of
double negative propositions, as mentioned in Remark 3.10.8.
3.7.3 Remark: The concept of truth often has a multiple-choice character.

In practice, a large proportion of logical propositions are of the “multiple choice” type. In other words, there
is an a-priori assumption that one of a set of options is true. Then the decision to be made is which one
of the options is true. For example, one might ask which direction a path leads in. The “true” direction
would be the correct choice of direction. In the minds of people who are trying to answer a question about
direction, the question is “which direction?” rather than whether each particular direction is “true”. In
other words, instead of having to make a two-way choice between “true” and “false”, the choice is between
the many directions of the compass. Thus the concept of truth is often a matter of choosing the right option
from a set of options. (See Remark 3.9.5 for similar comments.)
To put it another way, “true” means: “Yes, this is the right choice.” “False” means: “No, this is the wrong
choice.” Behind every foreground proposition there is a background proposition. Every foreground has a
background. So for every foreground proposition which can be true, there is a background proposition which
is true if the foreground proposition is false. The human mind works on a foreground/background basis, not
a foreground/vacuum basis. Deciding a proposition always determines which of two or more choices will be
accepted. (See also Remark 3.6.1 for foreground/background propositions.)
3.7.4 Remark: Every assertion is effectively a double-negative assertion.

To put it yet another way, whenever a proposition is asserted to be true, the negation of the proposition is
brought into the picture as the background. Therefore the assertion of a proposition always implies that the
proposition is in question, and that the negative of the proposition is also being considered. In this sense,
one may regard all positive assertions as double-negative assertions, because the assertion is a denial of the
negative of the assertion. Thus propositions are like the two sides of a coin. There are no single-sided
coins. All assertions arise from a two-way choice. (This explains why it is dangerous for a politician to deny
anything. The audience immediately gives at least some credence to the negative of the assertion.)
3.7.5 Remark: Deducing new propositions from old propositions can be automated.
If a group of individuals happen to adopt the same rules for deriving new “true” (i.e. acceptable) propositions
from already accepted propositions, and if they start with the same set of “true” (i.e. accepted) propositions,
they can be expected to agree on the new propositions which they derive.
We may regard truth and falsity as being merely a tagging of propositions by machine-like minds which
manage their proposition-stores by accepting true and rejecting false propositions. To start the process,
mathematicians try to agree on rules of argument and a set of axioms. This is illustrated in Figure 3.7.2.
A propositional (or predicate) calculus may be thought of as the design of an abstract machine which
has a proposition-store and a process for accumulating “true” propositions. The design of such a calculus
is supposed to resemble the way a real-world mathematician works. If the calculus is implemented on a
computer, one may hope that the implementation may be faster and less error-prone. In fact, one may
regard mathematical logic as the mechanization of human mathematical thinking. In other words, it is an
attempt to systematically describe yet another slow, unreliable, tedious human activity so that it may be
delegated to machines which are fast, reliable and pain-free.

[Figure: a proposition offered from the universe of propositions passes through a proposition truth tester, which either accepts it (T) into the proposition stores or rejects it (F). Additional labels: well-formed formula rules, axioms and deduction rules, proposition re-use.]
Figure 3.7.2 Rules and axioms in logical deduction
3.7.6 Remark: The tasks of logic include proposition evaluation and “solving logical equations”.
The proposition testing units in Remark 3.7.5 must be able to do both evaluation of compound propositions,
given the truth values of the component atomic propositions, and the deduction of conclusions from given
compound propositions. These two tasks are analogous to the evaluation of algebraic expressions and the
solutions of simultaneous algebraic equations respectively. (See Remark 4.4.2.)
3.7.7 Remark: Conjunctions are difficult to give meaning to in the proposition-store logic model.
Given any two propositions A and B which are accepted by a “logic machine”, we may form the combined
proposition “A and B”. This has a clear meaning with reference to the “logic machine” model. It means
that both propositions A and B are accepted by the machine. It follows, then, from our naive (i.e. in-built)
knowledge of the world that the meta-proposition “A and B” is necessarily true if both A and B are true.
This is because truth of a proposition means that it is accepted by a particular logic machine.
The previous paragraph has a suspicious circularity. We, as observers of a logic machine M , are accepting the
propositions “A is accepted by machine M ” and “B is accepted by machine M ” within our own minds, and
we are ourselves “logic machines”. We then form the compound proposition “both A and B are accepted by
machine M ” in our own minds, and we accept this compound proposition because of our own naive logical
processes for the conjunction of propositions. We therefore conclude that machine M should accept the
proposition “A and B”. But this is the imposition of our own logical processes on machine M . This is, in
fact, exactly what an assertion is. An assertion is a claim by one logical machine that a proposition should
be accepted by another machine.
It is difficult to escape the conclusion that the conjunction of propositions (connecting them with the word
“and”), and all of the rest of mathematical logic, is merely a behaviour which humans do, and which we
believe any other rational machine should accept.
3.7.8 Remark: Logic is the art of leading an audience to conclusions.

A logical argument is, broadly speaking, a communication which is intended to lead a reader or listener to
accept one or more propositions. (In fact, the words “deduction” and “induction” both come from the Latin
word “ducere” which means “to lead”.) A logical argument generally makes use of assumed prior accepted
propositions (which correspond to axioms) and agreed rules of argument to lead the reader or listener in the
desired direction.
3.7.9 Remark: Temporal ordering of logical argumentation.

In terms of the “logic machine” model, a logical argument has a temporal parameter. The sequence of steps
in an argument is played out over time, inducing a sequence of states in the target logic machine M (i.e. the
reader or listener), which finally reaches a state where one or more new propositions are accepted by M .
For example, consider the following sketch of a trivial argument.
Theorem: α, α ⇒ β ⊢ β
       Proposition    Justification
(1)    α              Assumption 1
(2)    α ⇒ β          Assumption 2
(3)    β              MP (1,2)

Such a formalized argument assumes that the machine M already accepts the propositions α and α ⇒ β.
Then in step (3), machine M is supposed to insert β into its own set of accepted propositions. So the
contents of the proposition-store of machine M are as follows after each line of the argument has been put.
       Proposition    Proposition-store of machine M
(1)    α              α, α ⇒ β
(2)    α ⇒ β          α, α ⇒ β
(3)    β              α, α ⇒ β, β
In serious mathematics, the proposition-store of the reader or listener would, of course, be too large to write
out explicitly.
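The evolution of the proposition-store in this argument can be imitated by a few lines of code. The following Python sketch is only an illustration under stated assumptions: atomic propositions are represented as strings, an implication is represented as a tuple ("implies", antecedent, consequent), and the function name modus_ponens_closure is hypothetical. Modus ponens is the only deduction rule which is implemented.

```python
def modus_ponens_closure(store):
    """Repeatedly apply modus ponens until no new proposition is accepted.

    Atomic propositions are strings; an implication is a tuple
    ("implies", antecedent, consequent).
    """
    store = set(store)          # work on a copy of the given store
    changed = True
    while changed:
        changed = False
        for prop in list(store):
            is_implication = isinstance(prop, tuple) and prop[0] == "implies"
            if is_implication and prop[1] in store and prop[2] not in store:
                store.add(prop[2])   # accept the consequent
                changed = True
    return store

initial_store = {"alpha", ("implies", "alpha", "beta")}
final_store = modus_ponens_closure(initial_store)
print("beta" in initial_store, "beta" in final_store)
```

The initial store corresponds to lines (1) and (2) of the argument, and the final store corresponds to line (3).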
3.7.10 Remark: Static versus dynamic view of proposition validity.

The main point of Remark 3.7.8 is that propositions are not true or false in a static sense. True and false
propositions are accumulated in proposition stores inside logic machines (of which the human mind is but
one example). In other words, truth and falsity are dynamic in character. However, human beings hold their
propositions as if they were true in a static sense. Humans like to believe that the conclusions which they
arrive at are true in an absolute and eternal sense. Therefore they do not say: “I now accept this proposition
until such time as I reject it.” They say: “This proposition is true.” This is perhaps a necessary illusion.
If every individual believed that all of their beliefs were arbitrary and temporary, they would not act with
confidence on the basis of their beliefs. They would fear that their beliefs could be negated at any time.
One consequence of the desire to maintain the illusion of the static nature of beliefs is that mathematicians
try to establish, at least within the scope of a handful of axioms and rules (i.e. an axiomatic system such
as the predicate calculus), that there are no possible contradictions between the propositions that may be
accumulated by different people starting with the same axioms and rules.
Russell’s paradox is just one example of how two different logic machines following the same axioms and
rules could correctly deduce that a particular proposition is both true (must be accepted) and false (must
be rejected). If one of the rules of argument is RAA (reductio ad absurdum), such a situation leads to the
necessary acceptance and rejection of all propositions. (See Section 3.11 for discussion of RAA.)
3.7.11 Remark: Static versus dynamic view of logical expressions.
The modus ponens deduction rule may be regarded as defining the implication operator “⇒”. On the other
hand, a logical expression such as A ⇒ B may be regarded as a notation for a truth value function φ_⇒(t(A), t(B)) in terms
of the truth values t(A) and t(B) of the propositions named by A and B. (See for example Remark 3.12.2.)
The first of these perspectives is dynamic, the second is static.
The modus ponens view of the implication operator shows how implication expressions may be applied in
the deduction process, which is what the proposition-store model does. Propositional calculus and predicate
calculus are dynamic processes which are fairly well explained by the proposition-store model.
The logical function tree view of the implication operator assumes that the component atomic propositions
already have known truth values, which are then inserted into the calculation to produce a truth value for
the logical expression.
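The relation between the two views can be checked mechanically. The following Python sketch is only an illustration: the function name implies and the variable names are hypothetical. It defines the implication operator statically as a truth function, and then confirms that modus ponens is consistent with this function, in the sense that B is true in every row of the truth table where both A and A ⇒ B are true.

```python
from itertools import product

def implies(a, b):
    """Static view: the truth value of A => B as a function of t(A) and t(B)."""
    return (not a) or b

# Truth table for the implication operator.
table = {(a, b): implies(a, b) for a, b in product([True, False], repeat=2)}

# Dynamic view recovered from the static one: in every row where both A and
# A => B are true, B is true, so modus ponens never leads from truth to falsity.
modus_ponens_sound = all(b for (a, b), value in table.items() if a and value)
print(table[(True, False)], modus_ponens_sound)
```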
One might reasonably ask which of the two views is more “correct”, or which is more fundamental. The
history of logical argumentation, particularly in ancient Greek philosophical speculation, and presumably also
in relation to the legal systems which in Mesopotamia are known to date from 2000 BC, has shown a strong
bias towards the dynamic view of logic. But this is quite likely because the consequent of an implication is
typically an imperative (i.e. an action to be performed) in non-scientific contexts. Ancient Greek logic was
mostly in the form of syllogisms, which are dynamic in character. Implications in some non-scientific contexts have
a causal character, where the antecedent chronologically precedes the consequent. In most non-scientific
contexts, there are subliminal undertones which go beyond cold, bare logic.
In scientific contexts, it seems more natural to think of the static functional view of logical expressions as
fundamental, and the dynamic logical argumentation view as derived from this. In scientific applications,
one typically argues from assumed, or confirmed, general laws to particular consequences. Thus the static
statement of laws or axioms is considered primary, and the dynamic argumentation process is simply a
necessary task to recover unknown parameters of models from known parameters. The assertion of a logical
expression is therefore seen as a static “fact”, and the argumentation process is seen as a human activity
which combines static facts to infer unknown things from known things. In this context, modus ponens may
be viewed as merely one among very many “tricks of the trade” for inferring unknowns from knowns.
Most mathematical logic textbooks focus on the implication operator and modus ponens as the basis for
their formalizations of logic. This emphasizes the dynamic view of logic. This may create the impression
that all logic is fundamentally dynamic in character. But this confuses the methods of solving logic problems
with the nature of logical assertions.
3.7.12 Remark: Influencing other individuals’ beliefs may seek to influence behaviour.
In the case of human beings, the ultimate purpose of foisting propositions on other individuals is to modify
their behaviour, since human behaviour is affected by beliefs. So logic could plausibly have arisen (some time
in the last million years) as a method for social behaviour control through control of beliefs. By understanding
the prior beliefs of other individuals, and their algorithms for acceptance or rejection of propositions, it is
possible to induce other individuals to accept new propositions (or reject previously accepted propositions)
by leading them down logical pathways.
Mathematical logic merely seeks to formalize and standardize the already existing methods of argument used
by human beings.
One consequence of the logic machine perspective is the observation that all logical argument, such as in a
mathematics paper or book, has an implicit “proposition store” in the background upon which arguments
are based. Mathematicians generally use the already-proved propositions of other mathematicians, although
the prior assumptions of the arguments of others are not always explicitly verified to be compatible with the
assumptions of the author in question.
To put it simply, the meaning of an assertion in a theorem is “add this to your proposition-store”. An
assertion is essentially a command (or request) from one individual to another to accept a proposition in the
proposition-store.
3.7.13 Remark: Utterance of propositions has both a descriptive and prescriptive character.
The descriptive and prescriptive interpretations of logical assertions are illustrated in Figure 3.7.3 in the case
of a compound proposition A ∧ B.
[Figure: a sender holding A and B in proposition store 1 communicates with a receiver holding proposition store 2. Descriptive reading: “I accept A and B.” Prescriptive reading: “You should accept A and B.”]
Figure 3.7.3 Descriptive versus prescriptive interpretation of proposition A ∧ B
In the descriptive case, the sender of the proposition is stating that its proposition store includes both A and
B at the time of sending. In the prescriptive case, the sender of the proposition is stating that the recipient’s
proposition store should include both A and B shortly after the time of reception. (See Remark 3.12.1 for
related comments.)
3.7.14 Remark: The consequent of a conditional has the character of an imperative.

If the compound proposition A ∧ B in Remark 3.7.13 is replaced with A ∨ B, the situation is qualitatively
different. When the proposition A ∨ B is used in the descriptive sense, the sender merely needs to determine
if it believes either A or B, or both. Then the sender calculates whether the compound proposition A ∨ B is
true or false, and this truth value is transmitted. However, when the recipient receives A ∨ B, it is usually
not possible to determine truth values for A and B. The recipient must store the compound proposition as
is. Then it must wait for further information. For example, if the proposition ¬A is received, this can be
combined with A ∨ B to infer that B is true. (This is analogous to the situation where the sender transmits
an equation such as x + y = 8, and the recipient later combines this with x = 3 to obtain y = 5.)
To summarize this situation, if A∧B is received, the truth values of A and B can be immediately determined,
and the original compound proposition A ∧ B may be discarded. But with a compound proposition like
A ∨ B (or A ⇒ B), there is insufficient information to discard it in favour of the truth values of the atomic
propositions A and B.

In terms of the proposition insertion notation in Remark 3.10.2, the reception of A ∧ B may be decomposed
into the pair of insertions A → T, B → T, whereas the reception of A∨B must be implemented as A∨B → T
because it cannot be decomposed. Essentially this same point is made in Remark 3.13.6. (Conversely, if
A ∧ B is asserted to be false, the truth values of A and B are unknown, in which case A ∧ B → F must be
saved by the recipient rather than individual truth values for A and B.)
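The asymmetry between receiving a conjunction and receiving a disjunction can be sketched in code. The following is only an illustrative model, not a definition from this book: the class name Recipient, its method names and the dictionary-based store are all assumptions. The insertion ¬A → T is modelled as A → F, and contradictions are not detected.

```python
class Recipient:
    """Hypothetical recipient logic machine (illustrative names throughout)."""

    def __init__(self):
        self.values = {}    # atomic proposition name -> True or False
        self.pending = []   # stored disjunctions (a, b), each meaning "a ∨ b → T"

    def receive_atom(self, name, value):
        """Insertion name → T (value=True) or name → F (value=False)."""
        self.values[name] = value
        self._resolve()

    def receive_and(self, a, b):
        """A ∧ B → T decomposes into A → T and B → T; the compound is discarded."""
        self.values[a] = True
        self.values[b] = True
        self._resolve()

    def receive_or(self, a, b):
        """A ∨ B → T cannot be decomposed, so the compound is stored as is."""
        self.pending.append((a, b))
        self._resolve()

    def _resolve(self):
        """Use known atomic values to discharge stored disjunctions."""
        progress = True
        while progress:
            progress = False
            for a, b in list(self.pending):
                va, vb = self.values.get(a), self.values.get(b)
                if va is True or vb is True:
                    # Already satisfied: nothing more can be inferred from it.
                    self.pending.remove((a, b))
                    progress = True
                elif va is False and vb is None:
                    self.values[b] = True      # ¬A together with A ∨ B gives B
                    self.pending.remove((a, b))
                    progress = True
                elif vb is False and va is None:
                    self.values[a] = True      # ¬B together with A ∨ B gives A
                    self.pending.remove((a, b))
                    progress = True


m = Recipient()
m.receive_or("A", "B")        # stored as a compound; no atomic values yet
m.receive_atom("A", False)    # later information triggers the inference of B
print(m.values, m.pending)
```

In this sketch the disjunction waits in the pending list until further information arrives, in the same way that x + y = 8 waits for x = 3.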
In the case of sending compound propositions, there is no such qualitative difference between conjunctions
and disjunctions. For both kinds of compound operators, it may be possible to determine the truth value if
only one of the component truth values is known. (For example, if A is false, then A ∧ B is guaranteed false,
no matter what B is. If A is true, then A ∨ B is guaranteed true, no matter what B is.) Consequently the
sender of a truth value for a compound proposition may not even know the truth values of all the component
propositions. So the ambiguity for the recipient is unavoidable. (See Remark 4.2.6 for an example truth
table which allows the possibility of unknown truth values.)
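The sender's situation can be illustrated with a three-valued evaluation in which None stands for a truth value unknown to the sender. The table in Remark 4.2.6 is not reproduced here, so the convention below (often called the strong Kleene convention) is an assumption, as are the function names and3 and or3.

```python
def and3(a, b):
    """Conjunction over True, False and None (None = unknown to the sender)."""
    if a is False or b is False:
        return False          # one false component decides the conjunction
    if a is True and b is True:
        return True
    return None               # otherwise the sender cannot decide


def or3(a, b):
    """Disjunction over True, False and None (None = unknown to the sender)."""
    if a is True or b is True:
        return True           # one true component decides the disjunction
    if a is False and b is False:
        return False
    return None


print(and3(False, None))   # False: decided although B is unknown
print(or3(True, None))     # True: decided although B is unknown
print(and3(True, None))    # None: the sender cannot decide
```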
Although the sender of a compound proposition may have incomplete knowledge of the truth values of the
component propositions, the receiver may have extra knowledge which permits it to deduce more of the
component truth values than the sender possessed. These considerations are suggestive of the notion of a
“logic machine network”; in other words, a network of logic machines which implement a kind of population
dynamics.
3.7.15 Remark: Truth values are attribute tags for propositions.
Notation 3.7.16 presents abbreviations for the truth values “true” and “false”. These may be thought of as
proposition tags. In other words, they are attributes which are attached to propositions.
EDM2 [35], 411.E, page 1552, uses the notation " for “true” and # for “false”. In some computer programming
languages (such as C and C++), and in some symbolic logic formalisms, “true” is denoted 1 and
“false” is denoted 0, which seems very much easier to remember than " and #. (There is also a very ancient
numerological significance to the choice of 1 and 0 for true and false respectively. But this is a “family show”.
So the reader will have to look elsewhere for the fine details.)
3.7.16 Notation:
T denotes the proposition-tag “true”.
F denotes the proposition-tag “false”.
3.7.17 Remark: Conditional propositions and disjunctions may be regarded as “delayed assertions”.
As mentioned in Remark 3.7.14, if the insertion A ∨ B → T occurs in a recipient logic machine, a later
insertion ¬A → T permits the deduction that B is true. This kind of delayed insertion of a proposition is
particularly intended by the compound proposition A ⇒ B. This proposition permits the later deduction
of B to be triggered if A is asserted. The chaining together of multiple such implications is the reason why
propositional calculus is often formalized in terms of the implication operator.
A disjunction such as A ∨ B is equivalent to each of the two implications (¬A) ⇒ B and (¬B) ⇒ A. Thus
the delayed assertion of A is triggered by ¬B, and the delayed assertion of B is triggered by ¬A. (See
Remark 3.10.14 for further comments on triggering simple propositions via compound propositions.)
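The equivalence of A ∨ B to each of the implications (¬A) ⇒ B and (¬B) ⇒ A can be checked by brute force over the four two-valued truth assignments. This is a minimal sketch; the function names are illustrative only.

```python
from itertools import product


def implies(p, q):
    """Two-valued material implication p ⇒ q."""
    return (not p) or q


def equivalent_everywhere():
    """True if A ∨ B, (¬A) ⇒ B and (¬B) ⇒ A agree on all four assignments."""
    return all(
        (a or b) == implies(not a, b) == implies(not b, a)
        for a, b in product([False, True], repeat=2)
    )


print(equivalent_everywhere())
```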
As mentioned in Remark 3.5.4, the word “or” originates from an Anglo-Saxon word which means “other”
or “otherwise”. This is suggestive of the idea that if the first proposition is not true, then the other one is
true. Thus A ∨ B means: “A is true; otherwise B is true.” This agrees with the idea that a disjunction can
be interpreted as a delayed assertion which is triggered by another assertion. In this case, B is triggered
by ¬A.
3.7.18 Remark: The prescriptive and imperative character of logic may be abused.
As the word “foisting” in Remark 3.7.12 suggests, logic has a somewhat aggressive character at times. In
fact, the ancient Greeks did employ logic as a political weapon to achieve objectives which were often quite
questionable ethically. It is particularly noteworthy that the combination of a contradictory set of axioms
with the RAA deduction rule (mentioned in Remark 3.7.10) enables the proponent (i.e. the person who puts
forward an argument) to deduce that any proposition at all is true or false. (This exploit can be observed
frequently in modern politics if one knows what to look for.)
3.7.19 Remark: Generalized truth-value tags to indicate uncertainty.
One could go further in the proposition-store “logic machine” modelling by proposing that all logic machines
may have more than one proposition-store. For example, there could be a second store for false propositions.
Our understanding of logic in our culture requires that a proposition may not be simultaneously in both the
“true-store” and the “false-store”. (In practice, humans do in fact allow this sort of contradictory situation.)
In addition to true and false proposition stores, logic machines could have an “undecidable-store” and an
“unknown-validity-store” for propositions which are not yet decided. There could also be a “probably-true-
store” and a “probably-false-store”, and a very wide range of similar modal indications.
As a matter of implementation, it is more efficient to maintain a single store of propositions with tags for
each proposition, indicating varying levels of truth or falsity.
3.8. Undecidable propositions and incomplete information transfer

3.8.1 Remark: Compound propositions communicate partial information.
An extended truth table for the implication operator is presented in Remark 4.2.6. The extended truth table
shows the sender’s view of the implication logical expression. The task of the recipient is to invert this truth
table to infer the extended truth values of A and B from the extended truth value of A ⇒ B. If the received
truth value for A ⇒ B is F, the recipient can say with certainty that the sender believes that A is true and
B is false. This gives the maximum information. If the recipient receives the truth value T for A ⇒ B, there
are 5 possible combinations of extended truth values for A and B. This gives the least information about
the belief state of the sender.
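The recipient's inversion task can be sketched by enumerating all pairs of extended truth values. The extended table used below is an assumption (Remark 4.2.6 is not reproduced here): A ⇒ B is taken to be T whenever A is F or B is T, to be F when A is T and B is F, and to be U otherwise. Under this assumption the enumeration agrees with the counts stated above.

```python
from itertools import product

F, T, U = "F", "T", "U"


def implies3(a, b):
    """Assumed extended truth table for A ⇒ B over the values F, T and U."""
    if a == F or b == T:
        return T
    if a == T and b == F:
        return F
    return U


def invert(received):
    """All pairs (A, B) of extended truth values consistent with the received value."""
    return [
        (a, b)
        for a, b in product([F, T, U], repeat=2)
        if implies3(a, b) == received
    ]


print(invert(F))        # a single pair: maximum information
print(len(invert(T)))   # several pairs: least information
```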
The order of truth values in Figure 3.8.1 is intentionally different to the normal English-language usage.
The reason for this is the fact that F and T are represented by the numbers 0 and 1 respectively in some
contexts. Therefore increasing numerical order is adopted here.
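[Figure 3.8.1 diagram: machine M1 (true/false model, with truth value map τ from the propositions P1, P2, P3, ... in P to the values F, T) passes axioms and rules by an incomplete information flow to machine M2 (true/false/unknown model, with truth value map τ# from the propositions P1, P2, P3, ... in P to the values F, T, U).]
Figure 3.8.1 Ontology of unknown truth values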
3.8.2 Remark: Undecidable truth values are a consequence of incomplete information transfer.
When a mathematical model contains undecidable propositions, a pseudo-truth-value tag U may be attached
to those propositions which are proved to be undecidable. In this case, a situation arises which is illustrated
in Figure 3.8.1. There is a presumed system M1 which is being modelled by system M2. In M1, the truth
value map τ is well defined for all propositions P ∈ P.
One way in which the information about the truth values of propositions in the space P could be incomplete
is through incomplete transmission of truth values from M1 to M2. In this case, truth values are transmitted
one by one. But for
large sets of propositions, particularly for infinite sets, one usually transmits information about propositions
via axioms and deduction rules. In the axioms-and-rules case in machine M2, incompleteness of the truth
value function τ# relative to machine M1 may arise because the axioms and rules are insufficient to span the
whole proposition space P.
In the case of first order languages such as ZF set theory, there are indeed axiomatic systems which are
incomplete. One may think of such systems as being imperfect models M2 of perfect models M1. In other
words, incomplete axiomatic systems are models for implied complete systems in which all truth values
are well defined. The pseudo-truth-value U indicates an incompleteness of information in a model which
suffers from incomplete transfer of information.

The question still remains, whether one should add extra truth values to symbolic logic to have the satisfaction
of assigning a truth value to every proposition. This would be as useful as adding an unknown pseudo-
integer to the integers or an unknown pseudo-real-number to the real numbers. It is somewhat absurd to
add such pseudo-values to every set in mathematics and logic. (This kind of absurdity is also discussed in
Remark 4.2.8.) It is better to deal with incomplete information meta-logically, in the discussion context.
The danger of inventing pseudo-truth-values for propositions is that one may fall into the error of supposing
that the system under study (M1 in this case) possesses these pseudo-truth-values, which is not correct. The
discussed system M1 has definite truth values for all propositions. They just happen to be unknown in the
discussing system M2 .
The situation is perhaps clarified by considering two logic machines M1 and M2 which are modelled by a
third machine M3 . (See Figure 3.8.2.)
[Figure 3.8.2 diagram: machines M1 and M2 are both true/false models, with truth value maps τ1 and τ2 respectively from the propositions P1, P2, P3, ... in P to the values F, T. Each passes axioms and rules by an incomplete information flow to machine M3, which is a true/false/unknown model with truth value map τ# from P to the values F, T, U.]
Figure 3.8.2 Ontology of unknown truth values: two modelled logic machines
Even if the communicated axioms and rules are identical for M1 and M2, the truth value functions τ1 and
τ2 may be different for some propositions P ∈ P, but within the span of the axioms and rules, there must
be complete agreement between τ1, τ2 and τ#. For the propositions P ∈ P for which τ1(P) and τ2(P) are
different, the value of τ#(P) must be U.
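The relation between τ1, τ2 and τ# can be sketched as follows. This is only an illustration under stated assumptions: the function name model_truth is hypothetical, the truth value maps are represented as dictionaries, and the span of the axioms and rules is simply supplied as a set named derivable rather than being computed by deduction.

```python
F, T, U = "F", "T", "U"


def model_truth(tau1, tau2, derivable):
    """Build a candidate tau# for the modelling machine M3.

    tau1, tau2: dicts from proposition names to F or T (machines M1 and M2).
    derivable:  set of propositions within the span of the shared axioms and
                rules (assumed to be given; no deduction is performed here).
    """
    tau_sharp = {}
    for p in tau1:
        if p in derivable:
            # Within the span, tau1, tau2 and tau# must agree completely.
            if tau1[p] != tau2[p]:
                raise ValueError("machines disagree within the span: " + p)
            tau_sharp[p] = tau1[p]
        else:
            # Outside the span, the model only records "unknown".
            tau_sharp[p] = U
    return tau_sharp


tau1 = {"P1": T, "P2": F, "P3": T}
tau2 = {"P1": T, "P2": F, "P3": F}   # differs from tau1 on P3
print(model_truth(tau1, tau2, derivable={"P1", "P2"}))
```

Every proposition on which the two machines differ necessarily lies outside the span, so it receives the tag U. The sketch also assigns U to propositions outside the span on which the machines happen to agree.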
Now, if one wishes to model machine M2 by a third machine M3 , then it would be sensible to permit three
truth values in the model of M2 by M3 . This causes a minor blow-out in the number of types of unknowns.
(See Figure 3.8.3.)
Of course, this procedure becomes more “interesting” if one has a larger network of machines which are all
modelling each other. This introduces such concepts as “unknown unknowns”. Such a “network of logic
machines” situation does often happen in human literature where multiple contexts refer to each other.
3.8.3 Remark: Logical algebra is the “restoration” of lost truth values.
As mentioned in Remark 45.2.4, the original Arabic word “algebra” meant “restoration” in the sense of
restoring bones to their original condition after breakage, which is analogous to restoring the unknown values
of numbers, given some formulas for the unknowns which are somehow lost. This is fully applicable to “logic
algebra” (i.e. propositional calculus), which seeks to “restore” the unknown truth values of propositions,
given some logical formulas for their lost values. This perspective implies that propositions do have truth values,
and the purpose of propositional calculus is to restore those values. Consequently, truth and falsity do not
originate in the propositional calculus. Truth and falsity are assumed to be well defined, and the task is
merely to discover their values. Hence no discussion of inconsistency-tolerant logic is required, and the
excluded middle is guaranteed.

[Figure 3.8.3 diagram: machine M1 is a true/false model with truth value map τ and values F, T. It passes axioms and rules by incomplete information flow 1 to machine M2, a true/false/unknown model with truth value map τ# and values F, T, U = {F, T}. Machine M2 in turn passes axioms and rules by incomplete information flow 2 to machine M3, a true/false/unknowns model with truth value map τ## and values F, T, {F, T}, {F, U}, {T, U}, {F, T, U}.]
Figure 3.8.3 Ontology of unknown truth values: second-order modelling of unknowns
3.9. The semantics of truth and falsity

3.9.1 Remark: Analogy between undefined truth concept and undefined probability concept.
Truth is generally not defined in mathematical logic, although it is the core concept. Truth is left as an
undefined concept with well-defined procedures for its calculation, but with no statement of what it is. This is
analogous to the way in which probability is undefined in probability theory, and set membership is undefined in
set theory. The meaning of “truth” must be sought outside the domain of mathematical logic.
3.9.2 Remark: Importance of giving meaning to the concepts of truth and falsity.
The primary objective of logical argumentation is the tagging of propositions with the adjectives “true” or
“false”, with the implication that the reader (or listener) should accept the true propositions and reject the
false propositions (as discussed in Section 3.7). It is therefore reasonable to commence a treatment of logic
by attempting to clarify the concepts of truth and falsity.
3.9.3 Remark: Mathematical logic truth is restricted to two truth values.

The concept of “truth” can be employed in a wide variety of contexts. As mentioned in Remark 4.1.1, the
abstract calculus of truth and falsity may be applied to a wide range of concrete systems whose components
may be in either of two states. The states may be called “true” and “false”, or “on” and “off”, or “high”
and “low”, or any other pair of names.
The concept of “truth” in abstract logic should be carefully distinguished from the related concept of the
accuracy of a hypothesis, model or theory. Abstract logic is a well-defined discipline involving truth tables,
logical operators, logical quantifiers, deduction rules, axioms, definitions and theorems. In abstract logic,
truth and falsity are effectively defined by the logical procedures which manipulate propositions and their
truth values. (The methods and procedures of abstract logic are the subject of Chapter 4.)
By contrast, the truth (or falsity) of propositions which refer to the real world is highly subjective, depending
on the entire intellectual framework and context in which the modelling is carried out. The notion of “scientific
truth” refers generally to the accuracy and suitability of particular models for explaining observations
of phenomena. Scientific truth depends on the ambiguities of our observations of the real world whereas
logical truth is defined by social conventions alone.
In colloquial language, the real-world kind of “truth” refers to both the lack of intentional deception and
the lack of accidental error. The kind of truth referred to in this case is the accuracy of the correspondence
between a person’s model of the world and the world itself. This kind of truth is not the subject of
mathematical logic. Therefore there is no need for concepts like the confidence level of an assertion in
mathematical logic. There are two truth values, and only two truth values. If observations of the real world
are subjective and vulnerable to error and uncertainty, that is not an issue which needs to be discussed
within mathematical logic.
The distinction between abstract and concrete logic is similar to how the abstract game of chess is distinguished
from the concrete pieces of wood or plastic which are moved around on a physical board.
3.9.4 Remark: The concepts of truth and falsity arise from doubt and suspicion.
In human discussions and literature, there is no absolute need for the concepts of truth and falsity to play
a role. As long as there is no doubt about the veracity of what is written, there is no need to discuss
whether a statement is true or false. However, if any doubt arises, either because of the possible dishonesty
or ignorance of the speaker or writer, it is necessary to attach some sort of truth value or confidence value
to every proposition which is in doubt. Then each proposition P becomes the subject of a meta-assertion,
which states that P is true or false. So truth and falsity are always part of a meta-discussion, where a
previous discussion is itself the subject of the discussion.
In the modern world, we are so familiar with such meta-discussion that we are almost unaware of the
transition between assertions and meta-assertions. The sceptical mind-set doubts all statements by all
people, including oneself. But this kind of distrust could perhaps be regarded as pathological, indicating
a serious breakdown of trust. If all discussion is met with scepticism, the meta-discussions by the sceptics
are likewise subject to doubt, in which case there is little purpose in listening to any discussions at all. In
practice, generally only a small proportion of statements are subject to active doubt. When humans are
doing practical logic, usually only a small number of propositions are thrown into question.
[ Insert here a comment on the relation of the truth concept to indicator (or characteristic) functions. For
example, “true North” means the correct choice of directions among many directions. The “true” North is
the direction whose truth value is “true”. This is like superimposing a characteristic function on the set
of all directions, such that the correct direction has the value “true” and all other directions have value
“false”. A large proportion, maybe all, “truth” contexts are of the nature of a multiple choice, where one
choice is “true”. Therefore one may express a proposition in two different but equivalent ways: either as a
truth-valued “indicator function”, or as a simple proposition stating that the one and only one correct choice
is the “true” choice. ]
[ Remove the redundancy between Remarks 3.7.3 and 3.9.5. ]
3.9.5 Remark: Truth and falsity are not essential in natural language.
Most of the time, people do not consciously think in terms of the two truth values for propositions. Mostly,
people think in terms of “multiple choice”. (This is also mentioned in Remark 3.7.3.) For example, people
think about which path is best to take when walking to a destination. If a person is thinking about whether
the left-most path is the shortest, the person does not think: “Is it true or false that the left-most path is the
best?” The conscious choice is not between “true” and “false”, but rather between the various path choices.
The concepts of “true” and “false” are not really necessary at all. The concept of which choice to make is
sufficient. Consider the question: “Is option A the correct decision?” This question can be answered with
“yes” or “no”. This looks very much like a true/false answer. Instead of replying “yes” or “no” to the above
question, one could answer with a different choice: “Option B is the correct decision.” It seems as if the
attachment of truth values to propositions is unnecessary. If it is known a-priori that one and only one of a
set of options is true, the assertion of a particular option implicitly negates the other options.
There are some languages which lack the words “yes” and “no”, such as Latin and Irish Gaelic. (In Irish
Gaelic, the response to a yes/no question is essentially to repeat the verb of the question in the affirmative
or in the negative.) Discussions can clearly be conducted without yes/no and true/false vocabulary.
3.9.6 Remark: Three stages of logic abstraction, analysis and application.
Abstract logic is an attempt to collect and formalize the common elements of practical logical thinking.
As for any kind of abstraction, there are three stages in this process. First, the concrete context to which
abstract logic is to be applied must be mapped to an abstract logic system. Secondly, some calculations
are carried out within the abstract framework. Thirdly, the conclusions from the abstract logic must be
mapped back to the application context. (See Figure 3.9.1.) This process can be beneficial if the modelling
is accurate.

[Figure 3.9.1 diagram: concrete propositions are mapped by abstraction to abstract propositions; an abstract logical calculus yields further abstract propositions; these are mapped by application back to concrete propositions.]
Figure 3.9.1 Three-stage logic abstraction and application process
3.9.7 Remark: Analytic logic and synthetic logic.

The abstract logic domain is generally called “analytic logic”. The use of logic in concrete contexts is called
“synthetic logic”.
[ Remove the definitions of analytic and synthetic logic if no reference can be found for them. ]
3.9.8 Remark: Propositions may be abstracted from any model or observation.

The most basic prerequisite for the applicability of abstract logic to a context is that there must be a well-
defined space of propositions. Each proposition must have a truth-value attribute which takes one and only
one of the values “true” and “false”. For example, in the case of the Leonardo da Vinci painting known as
the “Mona Lisa”, one may ask whether she is smiling or not. If the proposition “she is smiling” can only be
true or false, then abstract logic is applicable. If the answer to the question “Is she smiling?” could be “yes
and no”, then standard abstract logic is not applicable.
[ Strictly speaking, the proposition symbols in the following should be wff names like α rather than proposition
names like A. Maybe this should be clarified. ]
3.10. The semantics of logical negation

3.10.1 Remark: Dynamic interpretation of truth tables for proposition-store logic machines.
It is usual in textbooks to give the following “truth table” for the negation operator.
A ¬A
T F
F T
This is taken to mean that if A is true, then ¬A is false; and if A is false, then ¬A is true. In terms of the
logic machine model, the table means that if A is in the true-proposition list of machine M, then ¬A should
not be inserted in the list; if ¬A is in the true-proposition list of machine M, then A should not be inserted
in the list. In other words, the negation of a proposition blocks the acceptance of that proposition.
However, the truth table does not mean: “If A is not in the true proposition list, then ¬A should be inserted
in the list.” If neither A nor ¬A is in the true-proposition list, the truth table tells us nothing about
whether ¬A or A respectively should be inserted in the list. The truth table creates the impression that all
propositions are either true or false. But in a dynamic system, neither the proposition A nor its negation
¬A may be in the true-proposition list at a given point in time. If a proposition and its negative are both
not in the true-proposition list, this means that the proposition is undecided, not necessarily undecidable.
3.10.2 Remark: Interpretation of truth tables as proposition-store state-machine instructions.

The truth table in Remark 3.10.1 may be converted to logic-machine pseudo-instructions as follows.
(i) If A ∈ T, then ¬A → F. (If A is in the T-list, insert ¬A into the F-list.)
(ii) If ¬A ∈ F, then A → T. (If ¬A is in the F-list, insert A into the T-list.)
(iii) If ¬A ∈ T, then A → F. (If ¬A is in the T-list, insert A into the F-list.)
(iv) If A ∈ F, then ¬A → T. (If A is in the F-list, insert ¬A into the T-list.)
The first row of the truth table in Remark 3.10.1 corresponds to rules (i) and (ii). The second row corresponds
to rules (iii) and (iv). The ad-hoc notation A → T means “insert proposition A into the T-list”. Note that
the phrase “A ∈ T” is a static prior condition which must be satisfied at the time that the rule is executed.
The phrase “A → T” is a dynamic action which must be executed when a condition is satisfied.
According to this logic-machine view, truth is dynamic. In other words, truth is a process.
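The pseudo-instructions (i) to (iv) can be sketched as a small program which applies the rules repeatedly until nothing changes. This is only an illustration: the function names and the representation of ¬A as the string "¬A" are assumptions, and double negations are not simplified.

```python
def neg(p):
    """Name of the negation of proposition p (no double-negation simplification)."""
    return "¬" + p


def apply_negation_rules(t_list, f_list, atoms):
    """Apply rules (i) to (iv) for each atomic name until no rule adds anything."""
    t, f = set(t_list), set(f_list)
    changed = True
    while changed:
        changed = False
        for a in atoms:
            na = neg(a)
            rules = [
                (a in t, na, f),    # (i)   If A ∈ T, then ¬A → F.
                (na in f, a, t),    # (ii)  If ¬A ∈ F, then A → T.
                (na in t, a, f),    # (iii) If ¬A ∈ T, then A → F.
                (a in f, na, t),    # (iv)  If A ∈ F, then ¬A → T.
            ]
            for condition, prop, target in rules:
                if condition and prop not in target:
                    target.add(prop)    # the dynamic action "prop → target"
                    changed = True
    return t, f


print(apply_negation_rules({"A"}, set(), ["A"]))    # A ∈ T triggers ¬A → F
print(apply_negation_rules(set(), set(), ["A"]))    # undecided: nothing is inserted
```

The second call shows the point made in Remark 3.10.1: if neither A nor ¬A has been inserted, the rules insert nothing, so A remains undecided.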
3.10.3 Remark: Self-consistency rules for proposition-store logic machines.

There is nothing in the insertion rules in Remark 3.10.2 to prevent both A and ¬A being in both the true-list
and the false-list. To prevent this situation, there must be an a-priori rule to prevent contradictions. This
“self-consistency rule” can have two different forms.
(1) It is forbidden to have an identical proposition A in both the true-list and the false-list.
(2) For any proposition A, it is forbidden to have both A and ¬A in the true-list, or both A and ¬A in the
false-list.
Rule (1) is not sufficient to prevent contradictions on its own. But if it is combined with the truth table for
the negation operator, contradictions are prevented. For example, suppose A and ¬A are in the true-list.
The negation operator rules imply that ¬A must be inserted in the false-list. Then ¬A will be in both lists.
So the rule is contravened.
Likewise, rule (2) is not sufficient to prevent contradictions on its own. But if it is combined with the truth
table for the negation operator, contradictions are prevented. For example, suppose A is in both the true-list
and the false-list. Then the negation operator rules imply that ¬A must be inserted into both lists. So A
and ¬A will both be in both lists. So the rule is contravened.
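The two example sequences of events can be traced in code. The sketch below is illustrative only: the function names are hypothetical, ¬A is represented by the string "¬A", and the closure function applies the four negation rules of Remark 3.10.2 in a single pass, which suffices because rules (i) and (ii) together say that A is in the true-list exactly when ¬A is in the false-list, and rules (iii) and (iv) say the same with the lists exchanged.

```python
def neg(p):
    """Name of the negation of proposition p."""
    return "¬" + p


def close_under_negation(t_list, f_list, atoms):
    """Apply the negation-operator rules (i) to (iv) for each atomic name."""
    t, f = set(t_list), set(f_list)
    for a in atoms:
        na = neg(a)
        if a in t or na in f:      # rules (i) and (ii)
            t.add(a)
            f.add(na)
        if na in t or a in f:      # rules (iii) and (iv)
            t.add(na)
            f.add(a)
    return t, f


def violates_rule_1(t, f):
    """Rule (1): no identical proposition in both the true-list and the false-list."""
    return bool(set(t) & set(f))


def violates_rule_2(t, f, atoms):
    """Rule (2): never both A and ¬A in the true-list, nor both in the false-list."""
    return any(
        (a in t and neg(a) in t) or (a in f and neg(a) in f) for a in atoms
    )


# First example: A and ¬A are both in the true-list.
t, f = {"A", "¬A"}, set()
print(violates_rule_1(t, f))                                    # not yet detected
print(violates_rule_1(*close_under_negation(t, f, ["A"])))      # detected after the rules act

# Second example: A is in both the true-list and the false-list.
t, f = {"A"}, {"A"}
print(violates_rule_2(t, f, ["A"]))                             # not yet detected
t2, f2 = close_under_negation(t, f, ["A"])
print(violates_rule_2(t2, f2, ["A"]))                           # detected after the rules act
```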
3.10.4 Remark: If a proposition-store logic machine generates a contradiction, it must be rebooted.

Remarks 3.10.1 and 3.10.3 lead to a dilemma if the negation-operator rule somehow clashes with the self-
consistency rule according to the example sequences of events described in Remark 3.10.3. Which rule takes
precedence? Should the negation rule proceed unhindered, yielding an inconsistent logic machine (which
breaks the self-consistency rule)? Or should the negation rule be prevented from acting in such cases, which
then yields a system which is still in a self-contradictory state? In the computer software context, such a
situation would lead to a “system hang” where the computer can no longer continue to function.
Under such circumstances, one may “reboot” the system and hope that the problem will not occur again for
a long time, or that the problem was caused by an “execution error”. Or one may abandon all machines with
the same design because they yield contradictions. Alternatively, one may remove one or more of the “boot-
time” propositions (called “axioms”) and re-start the machine in the hope that the removed propositions
were the sole cause of the problem. This is more or less what happened with Russell’s paradox. When this
was discovered, the axioms of set theory were modified to (hopefully) prevent the same kind of contradiction
happening again. Various new set theory designs were proposed, and these are now used instead of the
earlier “buggy” set theory.
The fixes for Russell’s paradox would be called “bug fixes” or “(software) patches” in the computer software
context. The subsequent divergence of set theories into various flavours would be referred to as “forking” of
the system into multiple “derived systems”.
3.10.5 Remark: Reductio ad absurdum may be interpreted as a local-scope reboot.

The show-stopping (or machine-stopping) contradictions mentioned in Remark 3.10.4 are acted out in miniature
in the RAA (reductio ad absurdum) method of deduction. In the RAA method, a tentative assumption A
is first made (with the intention of proving it false), and deductions are made from this assumption until a
contradiction is encountered. This causes a mini-reboot of the logical system. (In fact, one may think of the
logical argument which follows the tentative assumption as a temporary context which is “pushed” onto a
“stack” of contexts, in terms of a particular kind of computer programming model.)
When a contradiction is encountered, the RAA method specifies that the original tentative assumption A
may be asserted in the negative (¬A), and all deductions from the positive assertion A must be removed.
This is like building a mini-logic-machine which includes the tentative assumption A, and this mini-machine
is destroyed when a contradiction is found in it. (In a sense, the RAA method is very similar to the “devil’s
advocate” method of argument.)
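The mini-machine picture of the RAA method can be sketched as follows. Everything here is an assumption made for illustration: the function names, the representation of deduction rules as a list of (premise, conclusion) pairs, and the simplification of ¬¬A to A. The temporary context is a copy of the store which is thrown away when the function returns, which plays the role of popping the context from the stack.

```python
def neg(p):
    """Negation of a proposition name; ¬¬A is simplified to A (an assumption)."""
    return p[1:] if p.startswith("¬") else "¬" + p


def has_contradiction(props):
    """True if some proposition and its negation are both present."""
    return any(neg(p) in props for p in props)


def reductio(store, assumption, implications):
    """Tentatively assume `assumption` and deduce until a contradiction appears.

    store:        set of accepted propositions (left unmodified).
    implications: list of (premise, conclusion) pairs, a hypothetical stand-in
                  for the deduction rules of the machine.
    Returns the store with ¬assumption added if a contradiction was found,
    otherwise an unchanged copy of the store.
    """
    scratch = set(store) | {assumption}    # temporary "pushed" context
    changed = True
    while changed and not has_contradiction(scratch):
        changed = False
        for premise, conclusion in implications:
            if premise in scratch and conclusion not in scratch:
                scratch.add(conclusion)
                changed = True
    if has_contradiction(scratch):
        # All deductions made in the scratch context are discarded;
        # only the negated assumption survives.
        return set(store) | {neg(assumption)}
    return set(store)


rules = [("A", "B"), ("B", "C")]
print(reductio({"¬C"}, "A", rules))    # the store gains ¬A; B and C are discarded
```

Note that if the store is already contradictory, this sketch refutes any assumption whatsoever, which matches the observation in Remark 3.7.18.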
[ Does the regula falsi method have any connection with the RAA method? ]

3.10.6 Remark: The difficulty of expressing compound logical expressions in natural languages.
The negation operator seems straightforward enough when one considers only abstract proposition names.
One simply writes the symbolic expression ¬A corresponding to any expression A. However, suppose the
name A refers to a concrete proposition such as: “The Earth is round.” How must one interpret the
proposition ¬A? The concrete proposition “¬The Earth is round.” does not have any obvious meaning. The
symbol “¬” is not part of the natural language of the space of concrete propositions. Since we do know
English, we can write: ¬A = “The Earth is not round.” This form of negation seems plausible enough. We
merely need to understand the language in which the concrete proposition (i.e. sentence) is written and then
form the negative of that proposition. But this is not always so easy.
As a more complex example, consider the sentence B = “The Sun does not rise in the East.” Perhaps the
negative of this sentence is: “The Sun does not not rise in the East.” This is almost acceptable, although
it is not a common construction in English. Another candidate for the negative sentence is: “The Sun does
not rise not in the East.” This is also unnatural. Since we know English very well, we can guess the correct
negative as: “The Sun does rise in the East.” Or more simply: “The Sun rises in the East.”
But the examples can be more complex than that. Let C = “Either the Arctic is melting or the Equator
is warming or not becoming less humid, unless the global warming trend is not verified.” It quickly becomes
clear that writing propositions such as ¬C in the concrete proposition language requires an
unreasonably detailed knowledge of the language.
A clean way to resolve the difficulty of concretely representing negations of propositions is to define ¬A
to mean “A is false” or “t(A) = F”, or some such expression in a meta-language. To be more precise,
¬A means: “The proposition to which we give the name A is false.” To be even more precise than this, we
could say that ¬A is a compound proposition name expression whose truth value is the negative of the truth
value of A, but which does not point to a specified sentence in the concrete language, although there may be
a sentence in the concrete language whose truth value is guaranteed to be the negative of the truth value of
A also. Only simple proposition names point directly to concrete propositions. All proposition expressions
point to simple proposition names, which in turn do point to concrete propositions.
When we say that a proposition is false, the proposition is itself the subject of the assertion. Truth values
are not always available in a concrete proposition language. We discuss truth-value proposition-attributes
in a “discussion language” or “meta-language”. In natural English, we discuss truth and falsity within the
language itself, but this is not necessarily possible in all concrete languages. For any concrete proposition A,
we can fairly plausibly write ¬A as “It is not true that A.” However, in some languages, the concrete
proposition A must be converted to a different verb mood and/or word order, and the rules for this may be
quite complex, even if the language does possess a sentence construction of this kind. (In the case of transistor
circuit voltages or Egyptian hieroglyphics, the map of abstract logic expressions to concrete “sentences” is
even more problematic than is described here.)
A classic example of the difficulty of negating a proposition arises from a question such as: “Have you stopped
committing crimes?” One possible answer is: “I have stopped committing crimes.” The apparent negative
of this answer is: “I have not stopped committing crimes.” Although both of these sentences have a well-
defined meaning in English, they are not true negatives of each other. Most people would be reluctant to
answer “yes” or “no” to the question because neither answer would seem correct. This suggests that concrete
language is unsuitable for logical operations. If even simple negation is so problematic, there is little hope
of performing the full generality of logical operations with crisp precision.
3.10.7 Remark: A negated proposition name is not the same as a meta-sentence.

Remark 3.10.8 discusses the consequences of regarding ¬A as a meta-proposition relative to A. This seems
right in the sense that the sentence “A is false.” is a meta-sentence relative to the sentence A. However,
a better interpretation is that A is a name of a proposition, and ¬A is a logical expression constructed
from the name A, and when asserted, ⊢ ¬A means that the expression φ¬ (t(A)) has the value T. That is,
⊢ ¬A is equivalent to φ¬ (t(A)) = T. Thus all logical expressions of proposition names, except for atomic
proposition names, are proposition pseudo-names which are well-defined only in the abstract logic, not in
the concrete logic. Similarly, A ∨ B is a proposition pseudo-name which, when asserted, has a well-defined
meaning. Namely, ⊢ A ∨ B means φ∨ (t(A), t(B)) = T.
Nevertheless, the approach discussed in Remark 3.10.8 is useful in clarifying the consequences of assuming
that non-atomic logical expressions are meta-propositions relative to the concrete proposition space in some
sense.
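The reading of an asserted compound expression as an equation on truth values can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names phi_not and phi_or, the use of Python booleans for T and F, and the sample truth value map t are all introduced here and are not part of the formal development.

```python
# Truth values are modelled as Python booleans: True for T, False for F.

def phi_not(x: bool) -> bool:
    """Truth function for negation (assumed name)."""
    return not x

def phi_or(x: bool, y: bool) -> bool:
    """Truth function for disjunction (assumed name)."""
    return x or y

# Hypothetical truth value map t on two atomic proposition names.
t = {"A": False, "B": True}

# Asserting "not A" means phi_not(t(A)) = T.
# No concrete sentence for "not A" is ever constructed.
not_A_holds = phi_not(t["A"]) is True

# Asserting "A or B" means phi_or(t(A), t(B)) = T.
A_or_B_holds = phi_or(t["A"], t["B"]) is True
```

Only the atomic names A and B are looked up in t. The compound expressions are evaluated purely by applying truth functions to those values.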
3.10.8 Remark: The difficulty of mapping abstract logical expressions back to concrete propositions.
Even if one accepts that the truth and falsity of concrete propositions must be discussed within a meta-
language, the problem of the multiple application of logical operators still arises. As a trivial example,
the double negation expression ¬¬A must be given meaning for any proposition A. If ¬A is located in a
meta-language, surely ¬¬A is located in a meta-meta-language! If ¬A means “t(A) = F”, then surely ¬¬A
means “t(“t(A) = F”) = F”.
¬A ≡ “ t(A) = F”
¬¬A ≡ “ t(“ t(A) = F”) = F”.
[ Alternatively t(¬A) = φ¬ (t(A)). So t(¬A) = F ⇔ t(A) = T? ]

It is not obvious a-priori how propositions in the meta-meta-language can be mapped to the meta-language.
After all, mapping propositions from the meta-language to the concrete language is, in general, either very
difficult or even impossible. The situation is saved by the fact that the meta-language is a very special kind
of language. Unlike the concrete proposition languages, we do know a lot about the syntax and semantics of
the meta-language. In particular, we know that if the meta-proposition “t(A) = T” is false, then “t(A) = F”
is true, and vice versa. Therefore double negation can be simply and reliably mapped from the meta-meta-
language to the meta-language. (Of course, this does not imply that a further map to the concrete language
is simple or reliable.)
The meta-meta-proposition “t(“t(A) = F”) = F” can be mapped to the meta-proposition “t(A) = T”. This
follows from our knowledge that the “true” and “false” truth-values are mutually exclusive in the meta-
language layer. Similarly, the meta-meta-propositions “t(“t(A) = T”) = F” and “t(“t(A) = F”) = T” can
both be mapped to the meta-proposition “t(A) = F”.
“t(“t(A) = T”) = T” ↦ “t(A) = T”
“t(“t(A) = F”) = F” ↦ “t(A) = T”
“t(“t(A) = T”) = F” ↦ “t(A) = F”
“t(“t(A) = F”) = T” ↦ “t(A) = F”.
The meta-proposition “t(A) = F” cannot be mapped straightforwardly to a concrete proposition, as discussed
above, but the meta-proposition “t(A) = T” is generally identified with the concrete proposition A.
“t(A) = T” ↦ A
“t(A) = F” ↦ ???.
This is an exception. Essentially all logical expressions, apart from atomic expressions A, are problematic
to map to the concrete language. (This is illustrated in Figure 3.10.1.)
In the case of all other logical expressions, the same argument applies as for the double negative. Just as we
can say that ¬¬A is fully equivalent to A, if we interpret A as the meta-proposition t(A) = T, so also we
can interpret A ∧ A, for example, as fully equivalent to A with the same interpretation. More generally, all
equivalences are valid within the meta-proposition layer, but these equivalences are not always valid in the
concrete proposition layer.
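The collapse of nested meta-propositions into the meta-language layer can be sketched as a short recursive function. This is a hedged illustration only: the representation of “t(X) = v” as a Python pair (X, v) and the function name collapse are assumptions made here for the example.

```python
# A meta-proposition "t(X) = v" is modelled as the pair (X, v), where X is
# either an atomic proposition name (a string) or another such pair,
# and v is True for T or False for F.

def collapse(p):
    """Map a nested meta-proposition to a pair (name, value)."""
    inner, value = p
    if isinstance(inner, str):
        # Already a meta-proposition about an atomic name.
        return (inner, value)
    name, inner_value = collapse(inner)
    # "t(inner) = T" keeps the inner value; "t(inner) = F" flips it,
    # because T and F are mutually exclusive in the meta-language layer.
    return (name, inner_value if value else not inner_value)

# The meta-meta-proposition "t("t(A) = F") = F" maps to "t(A) = T".
double_negation = collapse((("A", False), False))
```

The function stops at the meta-language layer. It does not attempt the further map to the concrete language, which is the step described above as difficult or impossible in general.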
[ Find references for the “deflationary” and “redundancy” theories of truth, where apparently a sentence of
the form “A is true.” is equivalent or identical to the corresponding sentence A. Of course, I don’t agree
with such theories, but it’s good to present opposing viewpoints. ]
3.10.9 Remark: General truth functions.

Having discussed one particular “truth function”, namely the negation operator in Remark 3.10.1, it is
straightforward to generalize this concept to general truth functions. (See Definition 4.2.4 for truth func-
tions.)

Diagram in three layers for the concrete proposition A = “Tá an Domhan cothrom.”
meta-meta-propositions: t(“t(A) = T”) = T and t(“t(A) = F”) = F map to t(A) = T;
t(“t(A) = T”) = F and t(“t(A) = F”) = T map to t(A) = F.
meta-propositions: t(A) = T maps to the concrete proposition A; t(A) = F maps to “. . . ???. . . ”.
Figure 3.10.1 Mapping (meta-)meta-propositions to (meta-)propositions
3.10.10 Remark: Semantics of abstract logical propositions.

The primary application of truth functions is to define logical operators. Each logical operator corresponds
to a fixed truth function.
The truth table in Remark 3.10.1 defines the unique unary truth function f : {F, T} → {F, T} which
satisfies f (T) = F and f (F) = T. This is the truth table of the unary logical operator “¬”, namely the
logical negation operator.
To be more precise, let t(A) denote the truth value F or T of a proposition A. Then ¬A means a proposition
whose truth value t(¬A) satisfies t(¬A) = f (t(A)), where f is the unary truth function defined by the table
in Remark 3.10.1.
There is something unsatisfying in this definition of negated propositions ¬A. How do we know that there
exists a proposition in the universe of propositions which satisfies t(¬A) = f (t(A))? How do we know that
there is at most one such proposition? A simple way to resolve these questions is to define ¬A
to mean “the proposition which has the opposite truth value to A”. But this is also unsatisfying because
meta-language is required to give this proposition meaning. In fact, this is essentially the real situation. In
real life, assertions of the form “A is not true” have the character of meta-propositions. A similar comment
applies to compound propositions such as “A and B are both true” and “B is true if A is true”. However,
in our culture, we accept meta-propositions such as “it is not true that the cat sat on the mat” (t(A) = F)
on an equal footing to simple propositions such as “the cat sat on the mat” (A).
3.10.11 Remark: Indicative and subjunctive verb moods for abstract propositions.
The truth value attribute t(A) for propositions A, referred to in Remark 3.10.10, is related to the subjunctive
verb mood which is discussed in Remark 3.7.1. Meta-propositions of the form “t(A) = T” correspond to the
indicative mood.
3.10.12 Remark: Semantics of assertion of negative propositions.

In the same way that one defines “⊢ A” to mean “there exists a proof for A” (where “exists” means that
the person making the assertion can produce a proof on demand), one might conjecture that “⊢ ¬A” might
mean “there exists no proof for A”. This could be interpreted to mean that there is a provable meta-theorem
which asserts that no proof for the proposition A is possible. (Of course, this requires an entire meta-logic
to be defined and developed.)
One way to prove (in a meta-logic) that no proof for A can be provided would be to show that the result of
any such proof would lead to a contradiction. But this would require a powerful general meta-theorem which
proves that the original logic is contradiction-free, i.e. consistent. This form of proof (of the impossibility
of a proof for A) is essentially the same as the RAA rule. (See Section 3.11.) Other kinds of meta-logical
proofs of the impossibility of finding a proof are likely to be much more complicated than this, for example
involving concepts like “reachability” and “span”.

Even if a meta-mathematical proof of the non-existence of a proof may be provided, this would not be a
convincing meaning for the assertion “⊢ ¬A”. This kind of assertion must mean that the truth value of
A is “false”. This, in turn, is only well-defined when interpreted within the concrete context to which the
abstract logic is being applied.
The semantics of “⊢ ¬A” may be denoted as “t(A) = F” in the same way that the meaning of “⊢ A” may
be denoted as “t(A) = T”. There is no need to discuss proofs or existence of proofs at all. More generally,
the assertion of any abstract logical expression has a meaning which can be fully expressed in terms of truth
values of the concrete propositions to which the proposition names in the expression refer.
3.10.13 Remark: Equivalent logical expressions versus identical logical expressions.

When two compound logical expressions have the same truth value for all choices of truth values for the atomic
propositions, one may ask whether these expressions are the same, and if so, in what sense. For example,
are the expressions A and ¬¬A the same? This is similar to asking whether the algebraic expressions x and
(x + 1) − 1 are the same. We would generally say that these expressions are different, but they always have
the same value. So they are “equivalent”. In the same way, equal-valued logical expressions such as A and
¬¬A are equivalent, but not identical expressions.
A logical expression specifies a set of operations which are to be carried out. For example, the expression
¬¬A specifies that the sequence of operations φ¬ ◦ φ¬ must be applied to A. More complex expressions
require trees of operations, not just linear sequences of operations.
As mentioned in Remark 3.10.8, a concrete proposition corresponding to the abstract logical expression ¬A
does not necessarily exist. The space of concrete propositions is not necessarily closed under the usual logical
operations (in Notation 4.3.3) because the concrete proposition language might not have all such operations.
In particular, an abstract expression like ¬¬A might not correspond to a concrete proposition. Therefore
one certainly cannot say that the propositions A and ¬¬A are “the same”, because ¬¬A might not even be
defined as a concrete proposition.
It is much tidier to regard simple atomic propositions like A as names of concrete propositions, and then
regard all compound logical expressions, including ¬A, as being merely notations for logical function trees
in terms of the truth values of the component propositions. (The parsing of logical expressions to produce
the corresponding function trees is discussed in Remark 4.3.10.) Thus ¬A, in particular, is not the name of
a concrete proposition.
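The distinction between identical and equivalent expressions can be made concrete with a small Python sketch. The representation is an assumption made for illustration: an expression is either an atomic name (a string) or a tuple whose first entry names the operator, and the functions evaluate, atoms and equivalent are hypothetical helpers introduced here.

```python
from itertools import product

# An expression is an atomic name (str) or a function tree:
# ("not", e), ("and", e1, e2) or ("or", e1, e2).

def evaluate(expr, t):
    """Truth value of a function tree for a truth value map t on atomic names."""
    if isinstance(expr, str):
        return t[expr]
    op = expr[0]
    if op == "not":
        return not evaluate(expr[1], t)
    if op == "and":
        return evaluate(expr[1], t) and evaluate(expr[2], t)
    if op == "or":
        return evaluate(expr[1], t) or evaluate(expr[2], t)
    raise ValueError(f"unknown operator: {op!r}")

def atoms(expr):
    """Set of atomic names occurring in an expression."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(atoms(e) for e in expr[1:]))

def equivalent(e1, e2):
    """True if e1 and e2 have equal truth values for every choice of atomic values."""
    names = sorted(atoms(e1) | atoms(e2))
    for values in product([False, True], repeat=len(names)):
        t = dict(zip(names, values))
        if evaluate(e1, t) != evaluate(e2, t):
            return False
    return True

A = "A"
not_not_A = ("not", ("not", "A"))
identical = (A == not_not_A)            # compares the trees themselves
same_value = equivalent(A, not_not_A)   # compares truth values only
```

Here identical is False because the two function trees differ, while same_value is True because both trees yield the same truth value for each value of A.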
3.10.14 Remark: Interpretation of logical expressions as triggering delayed conclusions.

As mentioned in Remark 3.7.17, compound propositions such as A ⇒ B and A ∨ B may be thought of as
triggering future deductions. The proposition B is “triggered” by A if A ⇒ B is true. The proposition B is
“triggered” by ¬A if A ∨ B is true.
The triggering concept gives a clue to how one might interpret the negation of a proposition. As observed
in Remark 3.12.1, the consequent clause of a conditional is typically in the imperative verb mood in natural
languages. (In other words, people rarely say that if A is true, then B is true. More often, people say that
if A is true, then action B must be carried out.)
Suppose A ⇒ B is a conditional, where B is in the imperative verb mood. If a person believes that A is true,
this implies that they must carry out action B. If A is false, this implies that B does not need to be carried
out, at least not on account of the rule A ⇒ B. In ancient history, laws had the form of conditionals with
an imperative consequent clause. Finding that the antecedent clause was false meant that the consequent
action did not need to be carried out.
As suggested in Remark 3.10.1, the negation of a proposition blocks the acceptance of the proposition. Since
a primary reason for wanting to establish the truth value of propositions is to determine consequent actions,
the negation of a proposition has the result of blocking the consequences of accepting it. Thus the negation
of a proposition is not merely the non-assertion of it, but also an active blocking of its assertion.
3.10.15 Remark: Truth and falsity versus belief and disbelief.

Instead of “truth” and “falsity”, one should perhaps talk of “belief” and “disbelief”. In the English language,
the word “disbelief” can mean either “not believing something” or “believing that something is false”. In
other words, disbelief can indicate either a scepticism, which could later be converted to belief, or a positive
belief in the falsity of a proposition. (This is reminiscent of the Scottish law distinction between “not guilty”
and “not proven”.)
When we say that a proposition is false, we generally mean that it is not only not provable, but also something
stronger than that. One way to interpret falsity is to consider implications like: “If the food is poisonous,
you must not eat it.” If the proposition P = “The food is poisonous.” is true, one must not eat the food.
And if there is any chance at all that it is poisonous, one should presumably not eat it. However, if it is
positively asserted that the food is not poisonous, it would be safe to eat it. The mere inability to prove P
would not be sufficient grounds for eating the food. But the positive assertion that P is false means that
the food is safe to eat. One no longer needs to think at all of the logical implications of P being true.
3.11. Proof by contradiction

3.11.1 Remark: Equivalent and related concepts to proof by contradiction.
The important method of mathematical logic known as “proof by contradiction” is equivalent, or identical,
to the concepts and methods of reasoning known as “the excluded middle”, “reductio ad absurdum” (RAA),
“Ex contradictione sequitur quodlibet”, “Ex falso sequitur quodlibet” and “the indirect method”. There is a
long history of controversy about this method.
[ Find references and translations for Ex contradictione sequitur quodlibet and Ex falso sequitur quodlibet. ]
3.11.2 Remark: Proof by contradiction is based on the excluded middle.

One of the most important tools of mathematical argument is “reductio ad absurdum” (RAA), which is
based on the “excluded middle”, which means that all well-formed propositions are either true or false (but
not both, and not some other truth value). Bell [190], pages 565–566, quotes Luitzen Egbertus Jan Brouwer
as saying that: “A implies the absurdity of the absurdity of A, but the absurdity of the absurdity of A does
not imply A.” One may reply to this that: “the absolute impossibility of the absolute impossibility of A
does imply A.” Therefore the “excluded middle” rule is accepted here.
Doubt about the validity of the excluded middle law does arise in colloquial, informal argumentation where
multiple world-models are mixed indiscriminately. This is due to mixing multiple meanings of the words
“true” and “false” in a single context, applying these words to propositions belonging to multiple inconsistent
world models without specifying which word instances refer to which models.
3.11.3 Remark: Proof by contradiction assumes that the logical system is contradiction-free.
Logic is a model for human mental processes. The rules of logic must correctly describe how humans think.
By accepting RAA, we are merely incorporating into formal logic the observation that mathematicians do
generally argue that if they can obtain both A and ¬A from a tentative assumption that B is true, then B
must be false. This is equivalent to expressing the certainty that all of the previous assertions in the logical
system that one has built up contain no contradictions.
3.11.4 Remark: Proof by contradiction is potentially dangerous.

The biggest danger of RAA is the fact that if a set of axioms is not self-consistent, then all propositions
can be proved to be both true and false. So the whole logical system collapses. This danger may seem to
be an argument in favour of rejecting RAA, but if a logical system does contain self-contradictions, it is
probably a good idea to make the system collapse. The RAA deduction method assumes that the logical
system is self-consistent. The best cure for a logical system which is not self-consistent is to remove offending
components of the system until it is self-consistent, not to remove the RAA deduction method.
3.11.5 Remark: Proof by contradiction is very dangerous if empirical propositions are permitted.
If some of the propositions in a logical system incorporate empirical observations and inferences, there would
be a fair likelihood that contradictory assertions could enter into the system. In this case, the RAA method
would certainly be unreliable and undesirable. The slightest error in observation or inference could lead
to totally arbitrary conclusions. In order to use RAA successfully, one must be totally certain that the
logical system has not previously acquired any contradictory assertions. This guarantee is fairly achievable
in a purely abstract model, but not in a system which is frequently acquiring assertions from unreliable
observations and inferences.

3.11.6 Remark: The excluded middle follows from world-model ontology, not proposition-store ontology.
The excluded middle assertion “¬¬A ⊢ A” seems difficult to justify if one accepts the proposition-store
ontology for mathematical logic which is mentioned in Remark 3.7.2. According to this ontology, logical
machines (such as human beings) accumulate propositions over time. Truth means that a proposition is
accepted. Falsity means that a proposition is rejected. So double negation ¬¬A means that the rejection
¬A is rejected. But this does not imply the acceptance of the proposition A.
However, if one adopts the world-model ontology which is mentioned in Remark 3.6.1, the assertion
“¬¬A ⊢ A” is much easier to justify. According to this ontology, propositions are attributes of models. That is,
propositions are either true or false according to whether the logic machine’s world-model has the attribute
or does not. The possession or non-possession of any specific attribute partitions all world-model states into
two subsets. (This is illustrated in Figure 3.11.1.)
Diagram: a sender machine M1 and a receiver machine M2, each with model states z ∈ Z partitioned into
ZP = {z ∈ Z; P (z)} (states where P (z) is true) and the complement Z̄P (other states).
The message “I assert P .” corresponds to “P is true”.
The message “I do not assert P .”, or “I assert ¬P .”, or silence, corresponds to “P is false”.
Figure 3.11.1 World-model ontology for logical negation
So negation of an attribute means that the complementary set is indicated. The double complement of a
set is itself. So the general assertion “¬¬A ⊢ A” is guaranteed by the semantics of propositions. Such a
guarantee cannot be given in the absence of an ontology.
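The set-theoretic argument can be checked on a toy example. The following Python sketch assumes a small finite set Z of world-model states and an arbitrary sample attribute P. The integers used as states and the helper names states_where and complement are placeholders introduced here.

```python
# Hypothetical finite set of world-model states; the integers are placeholders.
Z = frozenset(range(6))

def states_where(P):
    """The subset {z in Z; P(z)} of states which have the attribute P."""
    return frozenset(z for z in Z if P(z))

def complement(S):
    """The states of Z which are not in S."""
    return Z - S

def P(z):
    """An arbitrary sample attribute of a state."""
    return z % 2 == 0

Z_P = states_where(P)
Z_not_P = complement(Z_P)            # states indicated by the negation of P
Z_not_not_P = complement(Z_not_P)    # states indicated by the double negation
```

In this sketch Z_not_not_P equals Z_P for any choice of P, since the two subsets partition Z. This mirrors the double complement argument.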
The discussion in Remark 3.10.8 concludes that if the abstract proposition “t(A) = F” is false then “t(A) =
T” must be true. In other words, the meta-meta-proposition “t(“t(A) = F”) = F” can be mapped to the
meta-proposition “t(A) = T”. This follows specifically from the level of abstraction of these propositions.
3.11.7 Remark: Propositional calculus neither needs nor permits a third truth value.
One may argue that for every proposition regarding a world model (inside a logic machine), there could be
three possibilities for the world-model attribute: true, false or unknown. However, in this case there are
three possible propositions: P1 = “attribute A is true”, P2 = “attribute A is false” and P3 = “attribute A
is unknown”. Each one of these attributes may be true or false. One of the propositions P1 , P2 and P3 is
true, and the other two are false. These three propositions each have two possible truth values, “true” and
“false”.
If the previous paragraph seems confusing, that may be because the words “true” and “false” are used in
two different contexts, with two different sets of meanings. In the first context, these words refer to possible
states of the discussed logic machine M1 , for which there are three possibilities for world-model attributes.
In the second context, the words “true” and “false” are used by logic machine M2 to describe machine M1 .
The logical language of machine M2 conforms to the excluded middle rule, whereas the logical language of
machine M1 does not.
Ultimately it seems that the excluded middle is valid for all logic machines which have only two options for
the truth value of any proposition. If there are more than two options, one is no longer dealing with logic,
whose propositions are defined to have exactly two truth value options. Any logic machine which works
with more than two truth value options can be accurately modelled by a logic machine which has two-option
propositions. That is, in fact, what defines a proposition. A logical proposition always has two possible
truth values. Anything else is a generalization of the concept of logical propositions. Such generalizations
may be modelled, as outlined above, in terms of a two-option logic. (See Section 4.1 for the fundamental
definitions of two-truth-value logics.)
It may be concluded that all logic machines either implement a strict two-truth-value logic, or else they can
be accurately modelled by a strict two-truth-value logic.
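The modelling of a three-option attribute by two-valued propositions can be sketched in Python as follows. The class name Attr and the function name two_valued_model are assumptions made for this illustration. The keys P1, P2 and P3 follow the three propositions named in this remark.

```python
from enum import Enum

class Attr(Enum):
    """The three possible options for a world-model attribute of machine M1."""
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

def two_valued_model(state: Attr) -> dict:
    """Describe a three-option attribute by three two-valued propositions."""
    return {
        "P1": state is Attr.TRUE,      # "attribute A is true"
        "P2": state is Attr.FALSE,     # "attribute A is false"
        "P3": state is Attr.UNKNOWN,   # "attribute A is unknown"
    }

description = two_valued_model(Attr.UNKNOWN)
```

For each of the three states, exactly one of P1, P2 and P3 is true, and each of them individually is either true or false. So the describing machine M2 uses only two truth values.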

Since logic is merely a model, its conclusions are only valid to the extent that the modelled system is
accurately modelled. Therefore necessarily some logic machines cannot benefit from the conclusions of logic
because the modelling assumptions are not satisfied. One may, however, transform a non-conformant system
so that two-truth-value logic is applicable.
3.11.8 Remark: Interpretation of proof by contradiction in the world-model logic ontology.
The comments in Remarks 3.11.6 and 3.11.7 may now be applied to the issue of proof by contradiction. In
terms of the proposition-store ontology, this method of proof is difficult to justify. In this ontology, the RAA
method commences with the insertion of a trial proposition A into the proposition-store. Then calculations
are performed which show that A is inconsistent with some combination of pre-existing propositions. So A
must be removed from the store. This does not imply in any way that its negation ¬A must be inserted
in the store. The fact that a proposition cannot be inserted does not automatically imply that its negation
must be inserted. It is entirely feasible for the logic machine to continue with neither of the propositions A
and ¬A in the store.
In terms of the world-model ontology, all propositions are attributes of the model. Each proposition answers
a yes/no question about the model. So if a proposition is found to be inconsistent with other propositions
about the model, this does imply that the proposition is necessarily false, which by definition of ¬A, implies
that ¬A must be true.
Logical inconsistencies can only arise in the world-model ontology because of an error of logical calculus,
which means that the machine does not satisfy the minimum requirements for the logic model, or else one
made the wrong guess out of the two options: “true” and “false”. As mentioned in Section 4.1, the most
fundamental assumption of standard two-state logic is that there is a set of concrete propositions, all of
which are either true or false. Sometimes one is given the truth values of compound propositions (i.e.
logical expressions) instead of atomic propositions. From the truth values of compound propositions, the
truth values of atomic propositions can be deduced. (This is similar to solving simultaneous equations in
algebra.) If a contradiction arises in this deduction process, this implies that the truth values given for the
compound propositions were not all correct.
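The analogy with solving simultaneous equations can be illustrated by a brute-force search over truth value assignments. This is a minimal sketch, not an efficient method. The function name models, the two atomic names and the particular compound propositions chosen are assumptions made for the example.

```python
from itertools import product

def models(names, constraints):
    """All truth value assignments to `names` which satisfy every constraint."""
    result = []
    for values in product([False, True], repeat=len(names)):
        t = dict(zip(names, values))
        if all(c(t) for c in constraints):
            result.append(t)
    return result

names = ["A", "B"]

# Given compound propositions: "A or B" is true and "not A" is true.
given = [lambda t: t["A"] or t["B"], lambda t: not t["A"]]

# Deduce the atomic truth values from the compound ones.
solutions = models(names, given)

# Add the trial proposition "not B". No assignment survives, so the
# truth values given for the compound propositions were not all correct.
with_trial = models(names, given + [lambda t: not t["B"]])
```

Here solutions contains the single assignment with A false and B true, and with_trial is empty. If the two given propositions are guaranteed, the empty result shows that the trial proposition must be false.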
In the world-model ontology, contradictions always imply that one or more incorrect propositions have been
introduced into the logical calculus. Therefore one can validly infer that if all assumptions but one are
guaranteed in some way to be valid, the non-guaranteed assumption must be false.
The world-model ontological picture of the RAA rule may be summarized by saying that the truth values of
all propositions are “out there”, and we only need to find out what they are. The logical calculus is merely
a method of combining the clues which we have about the model under study to arrive at conclusions. In
the proposition-store ontology, propositions are meaningless texts which are subject to arbitrary rules. The
fact that a proposition is syntactically correct does not imply that it has a meaning. Therefore it seems that
semantics is essential for the RAA method to work. In fact, the existence of an underlying model seems to
be required for logic to be anything more than a recreational activity.
3.11.9 Remark: Discomfort with proof by contradiction has a long history.
Bell [191], pages 277–278, makes the following comment about reductio ad absurdum in the context of a
disagreement between Malus (1775–1812) and Cauchy (1789–1857).
Étienne-Louis Malus was not a professional mathematician but an ex-officer of engineers in Napo-
leon’s campaigns in Germany and Egypt, who made himself famous by his accidental discovery of
the polarization of light by reflexion. So possibly his objections struck young Cauchy as just the sort
of captious criticisms to be expected from an obstinate physicist. In proving his most important
theorems Cauchy had used the “indirect method” familiar to all beginners in geometry. It was to
this method of proof that Malus objected.
In proving a proposition by the indirect method, a contradiction is deduced from the assumed falsity
of the proposition; whence it follows, in Aristotelian logic, that the proposition is true. Cauchy
could not meet the objection by supplying direct proofs, and Malus gave in—still unconvinced that
Cauchy had proved anything.
The “indirect method” of proof is the same as reductio ad absurdum.
3.11.10 Remark: Discomfort with proof by contradiction is still present in the modern world.
Lakoff/Núñez [173], page XIII, make the following remark about the RAA method.

[. . . ] why, in formal logic, does every proposition follow from a contradiction? Why should anything
at all follow from a contradiction?
On page 121, they say the following on this subject.
And what about “P and not P , therefore Q”? Why should anything at all follow from a contradic-
tion, much less everything? Generations of students have found these to be disturbing questions.
Formal definitions, axioms, and proofs do not answer these questions. They just raise further
questions, like “Why these axioms and not others?”
It is incorrect to describe proof by contradiction as: “From a contradiction, any proposition follows.” If
no errors are made in logical deduction from self-consistent assumptions, no contradictions will ever arise.
So one cannot then make anything follow from a contradiction because there aren’t any! But if one does
try to introduce an ad-hoc proposition into an argument, and that proposition is shown to contradict other
propositions, one may correctly infer that the ad-hoc proposition is in fact false, assuming that it is at
least a well-defined proposition which is within the scope of the logical system being considered. (A well-
defined abstract proposition must be a valid logical expression whose individual proposition names refer to
propositions in the concrete proposition domain.) It is assumed also that the deduction rules of the system
have been shown to yield only true conclusions from true assumptions.
3.12. The moods of logical propositions

Verb moods are also discussed in Remarks 3.7.1, 3.10.6, 3.10.11 and 3.10.14.
3.12.1 Remark: Descriptive and prescriptive moods of propositions.

In terms of the descriptive and prescriptive interpretations of propositions discussed in Remark 3.7.13,
the notations A ∈ T and A → T in Remark 3.10.2 may be interpreted as descriptive and prescriptive
respectively. That is, A ∈ T means that the proposition is currently in the true proposition list of the sender
of the proposition whereas A → T means that the recipient of the proposition should insert A in its true
proposition list.
To make the meaning of propositions clearer, one could, in principle, use notations such as A ∈ T and
A → T in all mathematical logic literature to distinguish the indicative and imperative mood of the verb.
Unfortunately, symbolic logic and mathematical notation generally do not indicate the mood, tense and
other inflections of verbs. This information is usually indicated informally in the natural-language context.
3.12.2 Remark: Non-asserted propositions may be regarded as pseudo-propositions.

Propositions can be interpreted as being in either the subjunctive or indicative verb mood. If a proposition
P is asserted, for example with the notation “⊢ P”, this corresponds to the indicative verb mood. When a
proposition is not asserted, but is merely being discussed, this corresponds to the subjunctive verb mood.
(This verb mood distinction is also discussed in Remark 3.7.1.)
When the proposition A ∧ B is being asserted, it is equivalent to the truth value equation t(A ∧ B) = T,
which means that t(A) = T and t(B) = T. But when the proposition A ∧ B is merely being discussed, not
asserted, it satisfies
t(A ∧ B) = T ⇔ (t(A) = T and t(B) = T).
Thus the truth value of A ∧ B depends on the truth values of A and B. In other words, in the non-asserted
case, we don’t know if the proposition A ∧ B is true or false unless we know the truth values of A and B. In
fact, we could say that the non-asserted proposition A ∧ B is a pseudo-proposition whose only attribute is
a truth value t(A ∧ B), which is defined to equal φ∧ (t(A), t(B)), where the function φ∧ : {F, T}² → {F, T}
is defined by
φ∧ (x, y) = T if (x, y) = (T, T),
φ∧ (x, y) = F otherwise.
By contrast, the asserted proposition A ∧ B means that the truth values of both A and B are known to be T.
3.12.3 Remark: In general literature, the consequents of conditionals are mostly imperatives.
In old literature, such as Gilgamesh (Remark 3.5.2) and Bēowulf (Remark 3.5.4), the consequent of a
conditional is generally a future-tense or imperative clause. The kind of consequent clause which is of interest
for mathematics is the present indicative. In mathematics, one generally wants to express conditionals such
as: “If A is true, then B is true.” Pure mathematics is a fairly static system where propositions are either
eternally true or eternally false. Computer software often has conditionals of the imperative form: “If A is
true, then do B.” But pure mathematics is not like that.
3.13. Other remarks on the semantics of logic

[ This section is a holding pen for remarks not yet allocated to sections. ]
3.13.1 Remark: Verbatim importation of abstract logical expressions into mathematics.

As mentioned in Remark 3.2.5, the application of logic to pure mathematics has the special feature that logical
operators are mapped verbatim from the abstract logic to the concrete logic. For example, if A = “x ∈ U ”
and B = “x ∈ V ”, then the abstract proposition A ∧ B maps to the concrete proposition “x ∈ U ∧ x ∈ V ”.
This map has the property that “x ∈ U ∧ x ∈ V ” is true if and only if A ∧ B is true. Mathematical logic is
defined in such a way as to ensure that abstract and concrete logic operations give the same result.
Many mathematicians are not comfortable with such an application of logic to mathematics. They prefer to
communicate logical operations informally in the text of their expositions. However, the adoption of formal
logical notations is generally clearer and less ambiguous than natural language, and the use of formal logic
notation is not incompatible with the provision of informal clarification in natural language.
3.13.2 Remark: A finite set of propositions implies an infinite set of compound propositions.
One problem with the logic machine model for propositional calculus is the question of whether all compound
propositions should be automatically inserted into the proposition lists of the machine. For example, suppose
the true-list of a machine has n simple propositions A1, A2, …, An. We may then insert the n² compound
propositions (Ai ∧ Aj) into the true-list for integers i, j = 1, …, n. Then the 2n³ propositions ((Ai ∧ Aj) ∧ Ak)
and (Ai ∧ (Aj ∧ Ak )) may be added. Clearly an infinite number of true compound statements follow from
a finite set of simple propositions. This contradicts the initial assumption of a finite machine. An infinite
set of propositions would build all of the difficulties of the infinity concept into the design of the machine.
This infinite set of potential true statements which follow from a finite set of simple propositions justifies
the interpretation of truth functions as a dynamic permission to assert a proposition as opposed to a static
specification of which compound statements are true. (When logical quantifiers are brought into the picture,
the infinity of true propositions in the static picture becomes even more difficult to deal with.)
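The counts n² and 2n³ can be checked by direct enumeration. This is a minimal sketch, assuming that compound propositions are represented as text strings; the function name conjunction_levels and the choice n = 3 are hypothetical.

```python
from itertools import product

def conjunction_levels(names):
    """Return the first two levels of conjunctions built from the given names."""
    level1 = [f"({a} ∧ {b})" for a, b in product(names, repeat=2)]
    left = [f"(({a} ∧ {b}) ∧ {c})" for a, b, c in product(names, repeat=3)]
    right = [f"({a} ∧ ({b} ∧ {c}))" for a, b, c in product(names, repeat=3)]
    return level1, left + right

names = ["A1", "A2", "A3"]  # n = 3 simple propositions
level1, level2 = conjunction_levels(names)
print(len(level1), len(level2))  # n² = 9 and 2n³ = 54 for n = 3
```

Each further level of nesting multiplies the count again, so the enumeration never terminates for any n ≥ 1.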
3.13.3 Remark: Use of a limited set theory to underlie mathematical logic.

When grappling with the cyclic, intertwined nature of logic and set theory, one thin hope for reprieve is to
suppose that a finite naive set theory can be used as a more or less firm basis for logic, and then logic can
be a basis for a set theory which encompasses infinite concepts. (See Figure 3.13.1.)
[Figure omitted: layered diagram. Bottom layer: finite naive mathematics (finite logic, finite sets, finite functions, finite order, finite numbers). Above it: mathematical logic, then infinite set theory. Top layer: infinite “rigorous” mathematics (infinite functions, infinite order, infinite numbers).]
Figure 3.13.1 Finite naive mathematics and infinite “rigorous” mathematics

If the naive set theory underlying logic is required to provide infinite sets, any hope of using a finite framework
to explain infinite concepts is lost. In other words, infinity would need to be input to the framework in order
to get infinity as an output. There would be little net profit.
3.13.4 Remark: Reformulation of logic in terms of set theory.

The mechanization of mathematical logic seems to take away the problem of the circularity of mathematics
and logic. When mathematics (including set theory and numbers) is used to define logic, this is like running
a simulation of computer hardware inside the software which runs on that hardware. (See Figure 3.13.2.)
This is clearly possible, but it is obvious that the simulation is not the same thing as the real hardware. In
particular, a self-simulation runs much more slowly than the physical hardware. (Computer simulation is
also mentioned in Remark 3.13.8.)
[Figure omitted: nested boxes labelled “software”, “hardware” and “simulation”.]
Figure 3.13.2 Recursive simulation of a computer by itself
In the same way, when logic is re-formalized in terms of set theory and numbers, it is not the same logic as
the naive logic which we used to formalize mathematical logic (and set theory and numbers) originally. (See
Section 7.13 for the reformulation of mathematical logic in terms of axiomatic set theory and numbers.)
Another way to look at this is that during mathematical education, we construct in our minds a series of
virtual logic machines which are used progressively to define new virtual logic machines. As soon as we have
reached a level where set theory and numbers are defined, we can return to our logic and re-formulate it.
But this is then not the same logic which we used to formulate logic the first time we learned it.
A conclusion from this is that it is somewhat deceptive to claim that mathematical logic is defined in terms
of sets and numbers as it sometimes is in advanced logic textbooks. It should always be emphasized that
this is a re-formulation in the same way that a simulation of hardware in software is not the same thing
as the hardware on which the software runs. In particular, when it is claimed, on the basis of set-theoretic
analysis, that some class of mathematical logics is complete, incomplete, or self-consistent, the validity of
such claims is open to question. One must be at least sceptical of a “proof” of the correctness of a machine
if the “proof” is only verified by the machine itself.
3.13.5 Remark: On-demand construction of compound propositions.

The danger of infinite lists of propositions mentioned in Remark 3.13.2 can be avoided by generating derived
compound propositions “on the fly”. One may regard compound propositions as meta-propositions which
may be tested whenever desired. These meta-propositions are propositions about the state of the core list
of true propositions. Thus there are two proposition stores: a (relatively)
static core proposition list and a temporary, derived working meta-proposition list.
3.13.6 Remark: Conjunctions are like proposition lists. Disjunctions are not.
A meta-proposition of the form A ∧ B is equivalent to the insertion of propositions A and B into the
core proposition list. (See Figure 3.13.3.) Thus if T, T′ are the sets of true propositions and true
meta-propositions respectively, A ∧ B ∈ T′ is equivalent to {A, B} ⊆ T.
However, there is no such equivalent for a disjunction such as A ∨ B. There is no set of simple propositions
which can be inserted into the core proposition list to obtain the same effect as the insertion of A ∨ B in the
meta-proposition list. (See Figure 3.13.4.) This suggests that conjunctions and disjunctions are qualitatively
different. (This issue is also discussed in Remark 3.7.14.)
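The two proposition stores of Remarks 3.13.5 and 3.13.6 can be sketched as follows. This is only an illustration under stated assumptions: the names core_true, known_true and assert_conjunction are hypothetical, compound propositions are represented as nested tuples, and a result of False means only “not derivable from the core list”, not “known to be false”.

```python
# Core list of true simple propositions (example content is invented).
core_true = {"A", "B"}

def known_true(expr) -> bool:
    """Test a compound proposition on demand against the core list.

    expr is a proposition name, or a tuple ("and", p, q) or ("or", p, q).
    """
    if isinstance(expr, str):
        return expr in core_true
    op, p, q = expr
    if op == "and":
        return known_true(p) and known_true(q)
    if op == "or":
        return known_true(p) or known_true(q)
    raise ValueError(f"unknown operator: {op}")

def assert_conjunction(p: str, q: str) -> None:
    """Asserting p ∧ q has the same effect as inserting p and q into the core list."""
    core_true.update({p, q})

assert_conjunction("C", "D")
```

No function assert_disjunction can be written in the same style: asserting A ∨ B does not determine which simple propositions should be inserted into core_true.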
[Figure omitted: inserting A ∧ B into the true meta-proposition list is equivalent to inserting A and B into the true proposition list.]
Figure 3.13.3 Interpretation of conjunction of propositions
[Figure omitted: inserting A ∨ B into the true meta-proposition list has no evident equivalent (“? ? ?”) in the true proposition list.]
Figure 3.13.4 Interpretation of disjunction of propositions?
3.13.7 Remark: A proposition-store must be able to store both simple and compound propositions.
The disjunction of two propositions seems to be qualitatively different to a conjunction. (Essentially this
same point is made in Remark 3.7.14.) This is reflected in the use of the word “or” in colloquial English,
which has numerous forms such as “exclusive or”, “inclusive or” and “and/or”, whereas there is only one
form of “and”. The “if” operator, which is roughly equivalent to a disjunction, is also the subject of much
confusion. (Many people find it difficult to accept that A ⇒ B is true if A is false. This subject is also
mentioned in Remark 3.5.7.) The “and” operator indicates certainty whereas the “or” and “if” operators
indicate uncertainty.
Consequently, it is not possible to avoid the inclusion of compound propositions in the semi-static core
proposition list of a logic machine. To be able to indicate the uncertainty (or conditionality) of the “or” and
“if” operators, these operators must be allowed into the core list. Disjunctions and conditionals cannot be
generated from a list of atomic propositions because one does not know if the true proposition list includes
all true propositions.
[ Remark 3.13.8 has some overlap with Remark 3.13.4. ]
3.13.8 Remark: Analogy of cyclic logical systems to cyclic computer simulations.

When set theory and arithmetic are used in a meta-logical proof of the self-consistency or completeness of
a first-order logic of some sort, such a proof inspires as much confidence as a computer software analysis
or simulation of the processor hardware which is used by the computer for all of its own calculations. The
acceptance of the correctness of the hardware implies that the conclusions of the software
are correct, which in turn implies the correctness of the hardware. This is clearly unsafe thinking.
The process of building confidence in computer hardware started historically with computers which were
simple enough for humans to analyze in great detail. If enough good designers worked on the task, it could
be hoped that the hardware would be fairly bug free. A similar process occurred in the evolution of computer
software. The first simulations of the operation of computers were performed by the human brain, which is
itself a computer. Later, computers were used in the design and verification of computer designs, on the
assumption that no bugs had crept in at the beginning, for example in the brains of the designers. As long
as all of the computers are consistent with each other, one can be confident that they are either all right or
all wrong.
The same observation applies to mathematics, which evolved from naive beginnings into a system sophis-
ticated enough to model itself. Mathematics has neither more nor less credibility than computer hardware
designs. Each mathematical system is effectively a machine design. Mathematical systems have evolved in
much the same way as computer designs.
3.14. Naive mathematics

3.14.1 Remark: To boot-strap the methods of mathematical logic, we need some “naive mathematics”.
This corresponds to the “magma layer” in Figure 2.1.1. Any reader who cannot recognize the concepts of
naive mathematics will not be able to understand mathematical logic, set theory and the rest of mathematics
anyway; so there seems little harm in assuming that the reader will have acquired naive mathematics from
previous experience.
The purpose of a naive mathematics definition is to attempt to informally standardize a minimal collection
of concepts which can be used as a basis for developing mathematical logic. (Serious formal definitions of
naive mathematics are not possible because, of course, there is nothing of a formal nature to base it on!)
3.14.2 Remark: There is no absolute requirement to develop naive mathematics (or any other subject)
in an axiomatic manner. As mentioned in Remarks 3.2.6 and 3.2.7, propositions do not magically acquire
superior credibility or infallibility because of being deduced from other propositions. Axiomatization has
many virtues, for example: compactness, minimalism, greater likelihood of self-consistency, and greater
clarity of exposition. But some assertions must be permitted without the benefit of deductive justification.
Nevertheless, naive mathematics does require consensus. In fact, there is only partial consensus on naive
mathematics and logic. This is unavoidable. But the effort to achieve consensus is necessary. At the very
minimum, each author must attempt to achieve consensus with some proportion of the readers. This book
tries more than most to make explicit the underlying naive assumptions.
Verification of naive mathematics must necessarily be conducted “by hand”. This is for the same reason
that the first computer hardware and software designs needed to be verified by human authors and testers.
(See Remarks 3.13.4 and 3.13.8 for comments on recursive computer simulations.)
3.14.3 Remark: There are cyclic dependencies among the concepts defined in Definition 3.14.4. Therefore
this naive definition serves only to refine the concepts, not define them.
[ Definition 3.14.4 will present a collection of naive mathematics concepts. This is an extremely rough first
hack of a definition which will take a lot of work to make it useful. Please ignore it for now. ]
3.14.4 Definition [naive]:

(1) A name is a sequence of letters (with optional subscripts and optional superscripts, which are themselves
either names or numbers).
(2) A (naive) set is a name A together with an effective procedure for determining whether any given object
x is in the set A (denoted x ∈ A) or not in the set A (denoted x ∉ A).
(3) An element of a set A is any object x such that x is in A.
(4) A sequence is a set of names which are ordered in some way.
(5) A number sequence is a sequence which is used for counting objects in sets. (??)
(6) Two sets A and B are equinumerous if there is a one-to-one association between the elements of A and
the elements of B.
(7) A (naive) cardinal number is a symbol or symbol-sequence n which is associated with sets in such a way
that n is associated with both set A and set B if and only if A and B are equinumerous. (?)
(8) The cardinality of a set A is any cardinal number n which is associated with A (according to the definition
of a cardinal number). (?)
(9) Two cardinal numbers m and n are called equivalent cardinal numbers if there exists a set A such that
m and n are both associated with A (according to the definition of a cardinal number). (?)
(To be continued. . . maybe!)
[ Definition 3.14.4, and perhaps all of Section 3.14, could be eliminated without much loss. If the naive
mathematics is kept, it should define only those concepts which are actually used in the set-up of logic, set
theory and numbers. ]

Chapter 4
Logic methods
4.1 Concrete proposition domains
4.2 Logic operations in concrete proposition domains
4.3 Logical operators and expressions
4.4 Logical expression evaluation and logical argumentation
4.5 Propositional calculus formalization
4.6 Deduction rules
4.7 An implication-based propositional calculus
4.8 Some propositional calculus theorems
4.9 Meta-theorems and the “deduction theorem”
4.10 Further theorems for the implication operator
4.11 Other logical operators
4.12 Parametrized families of propositions
4.13 Logical quantifiers
4.14 Predicate calculus
4.15 Equality
4.16 Uniqueness
The purpose of this chapter is to present the methods and procedures of logic without thinking too much
about the semantics, which is the subject of Chapter 3. The semantics of logic resides in the application
contexts of logic. Each context has its own semantics.
[ 2008-11-13: The split of the old logic chapter into Chapters 3 and 4 is not yet complete. The bulk of
semantics should be in Chapter 3, whereas Chapter 4 should simply present the calculus of logic, just the
methods and procedures. ]
[ 2008-12-6: The two logic chapters still contain a lot of repetition, and a large amount of material still needs
to be written. ]
4.0.1 Remark: The methods and procedures of logic come in many styles. These styles include truth
tables, propositional calculus, predicate calculus, syllogisms and Boolean algebra. In addition to the textual
and tabular methods of logic, it is also possible to make logical calculations by means of diagrams such as the
function trees which are discussed in Remark 4.3.10. (There is even a quirky graphical method of syllogisms
which was published by Lewis Carroll [159].)
All of the different ways of doing logic are essentially equivalent. One chooses the methods and procedures
which are best suited to particular applications. Within each style of logic, there are various flavours and
sub-flavours which are adopted by different bodies of literature and particular authors.
Sometimes the most powerful methods and procedures are not the easiest to apply. In general, it is best
to study the most powerful framework for a subject in depth, and then study also some specific techniques
which make work easier for particular applications. An objective of this chapter will be to present a broad,
powerful framework, together with some useful specific methods for applications which may arise in the rest
of the book.

[ Should have a section on truth-table methods, including the abbreviated, syntax-directed truth tables. Also
have at least one section on Boolean algebra. Maybe even have a section on syllogisms? ]
4.0.2 Remark: There are two styles of calculation in symbolic logic.

(1) The algebraic style: Logical expressions are treated as functions which can be evaluated in terms of
their arguments, and deduction is performed in the manner of algebra, solving for the unknown truth
values.
(2) The linguistic style: Logical expressions are manipulated as lines of text subject to rules of deduction,
ignoring even the most basic semantics of the expressions.
Style (2) is a rigid formalization (or “mechanization”) of style (1). Logical arguments in the algebraic style
are generally informal, as in algebra. The linguistic style removes the need for any semantics at all, which is
preferable if one distrusts informality. The disadvantage of text-level linguistic argumentation is that human
beings can find it dry and dreary.
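The contrast between the two styles can be sketched in a few lines. This is a minimal illustration, not a formal system: the function names implies and modus_ponens are hypothetical, and the text matching handles only lines of the simple form “X ⇒ Y”.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Truth function for the implication A ⇒ B."""
    return (not a) or b

# Algebraic style: given that A and A ⇒ B are both true, solve for the
# unknown truth value of B by evaluating the expressions as functions.
solutions_for_B = {b for a, b in product([False, True], repeat=2)
                   if a and implies(a, b)}

# Linguistic style: apply the modus ponens rule to lines of text,
# without evaluating or interpreting the expressions at all.
def modus_ponens(lines):
    """From lines 'X' and 'X ⇒ Y', derive the line 'Y' by text matching."""
    derived = set()
    for line in lines:
        if " ⇒ " in line:
            antecedent, consequent = line.split(" ⇒ ", 1)
            if antecedent in lines:
                derived.add(consequent)
    return derived

derived = modus_ponens({"A", "A ⇒ B"})
```

Both computations reach the same conclusion about B, but only the first one refers to truth values.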
4.0.3 Remark: The calculus of logical expressions is essentially void of meaning. It is simply a collection
of notations and definitions.
Logic only becomes interesting when one must solve “logical equations” for unknown individual propositions.
Similarly, the mere differentiation of functions in calculus is of limited interest. The really exciting task is to
solve (ordinary or partial) differential equations, where the derivatives are known (or satisfy some equation),
and one must invert the differentiation process to determine the original function, i.e. the “solution”. (This
is also discussed in Remark 4.4.2.)
4.0.4 Remark: The extent of standardisation in mathematical logic seems to be somewhat less than in set
theory. In the case of set theory, one may simply say that one wishes to accept the Zermelo-Fraenkel axioms
or the Neumann-Bernays-Gödel axioms, and the meaning is well standardized. There are many variants of
these set theory axiom sets, but they are equivalent variants. In the case of mathematical logic, there are
many alternative non-equivalent axiomatic systems (comprised of syntax rules, deduction rules and axioms),
and these systems do not generally even have standard names. (Although the theorems are often equivalent,
the proofs are typically very different.) Every author seems to have a different system. This necessitates a
complete development of the body of theorems for each system, which wastes much effort.
Consequently, the author of this text will try to determine a generally suitable axiomatic system for math-
ematical logic which combines most features of most of the worthwhile axiomatic systems and exemplifies
most of the concepts. However, it is not possible to identify one “standard” logic which is generally ac-
cepted and applied. Nor is it possible to combine all, or most, axiomatic frameworks for logic into a single
comprehensive framework.
In particular, it should be noted that many frameworks for mathematical logic attempt to derive meta-logic
“theorems” from some sort of lower-level semantic framework which is based directly on set theory. To this
author, it is a great mystery why this kind of brazenly circular form of development is presented with such
earnestness by so many authors. Rather than attempting to “prove” assertions about logic using a basis
of set theory, one may as well come clean and start the development as a totally ad-hoc deductive system,
leaving the reader to decide if they wish to accept the formalism.
4.1. Concrete proposition domains

4.1.1 Remark: The proposition names in abstract propositional calculus may refer to anything at all. One
may consider the propositional calculus to be a “functional module” in the engineering sense. Propositional
calculus may be applied to any set as the object space which is referred to by the proposition name space. The
objects could be, for example, bistable transistor circuits which have two states, ON and OFF. Alternatively,
the objects could be English-language sentences. Of greatest interest for mathematics is the case that the
proposition object space is a set of symbolic expressions defined within a set theory such as Zermelo-Fraenkel.
(See Figure 4.1.1.)
[Figure omitted: upper row (name spaces, abstract logics): propositional calculus 1 and propositional calculus 2, each containing the names A, B and A ∨ B. Lower row (object spaces, concrete logics): V1, V2, … (transistor voltages); S1, S2, … (natural language sentences); P1, P2, … (set theory propositions).]
Figure 4.1.1 Multiple object spaces for abstract propositional calculus modules
The usual caveat emptor warning applies to the use of a particular propositional calculus in combination
with a particular proposition object space. It is the responsibility of the user to ensure that the concepts
of “true” and “false” have some relevance to the object space, and that the logical operators correspond
to something meaningful in the application domain. A propositional calculus has no absolute validity. It
is simply a functional unit like an algebraic system. (See Chapter 9 for algebraic systems.) There is no
guarantee that a particular algebraic system is applicable to a given set of objects. Any set of objects to
which an algebraic system is applied must be tested to see if it satisfies the axioms of the algebraic system.
Likewise one must test any intended object space for a propositional calculus to ensure that it satisfies the
axioms and rules of that propositional calculus.
[ Mendelson [165], page 32 footnote, uses the word “proposition” to mean a meta-theorem. Make a comparative
table of such vocabulary variations of various logic authors. Mendelson [165], page 31 (footnote), uses
the term “object language” to mean what I call “abstract logic”. (See Figure 4.1.1.) ]
4.1.2 Remark: The word “domain”, which is used in Definition 4.1.3, has a particular meaning in the theory
of functions within set theory, namely the source space of a relation or function. (See Definition 6.3.6.)
That meaning is not intended here.
Some plausible alternatives for the word “domain” in Definition 4.1.3 would be “range”, “class”, “universe”,
“space” or “zone”.
[ Definition 4.1.3 is tentative. It might need some improvement. It looks like a concrete proposition domain
needs to be defined in a set which is defined within a background system, defined a priori. But in a sense,
this might not be such a bad thing. It could be explicitly stated that this is so, and maybe it should be stated
here. ]
4.1.3 Definition: A truth value map is a map τ : P → {F, T} for any (naive) set P.
The concrete proposition domain of a truth value map τ : P → {F, T} is its domain P.
4.1.4 Remark: The “naive sets” in Definition 4.1.3 are not really sets at all. The English language requires
some noun to describe those objects which pass a specific test procedure or operational definition. To be
more precise, one should think of “naive sets” not as expressions like “{x; F (x)}”, where F is some test
which is applied to things x, but rather as expressions like “x; F (x)”. In other words, one should talk about
“those x which pass the test F ” rather than “the set of x which pass the test F ”. The word “class” can be
used for such naive sets, although the word “class” is often used for more specific concepts such as NBG set
theory classes.
In short, one may say that naive sets or classes are an objectification of properties. There is not necessarily
any “object” which corresponds to particular test procedures or operational definitions.
4.1.5 Remark: If the truth value map in Definition 4.1.3 is generalized to permit multiple values for
the map, the result is that the propositions in the proposition domain may be true, false or both. In
this case, the relation τ ⊆ P × {F, T} has the property ∀x ∈ P, ∃y ∈ {F, T}, (x, y) ∈ τ. Such
inconsistency-tolerant or “paraconsistent” logics are not difficult to define, and could have useful applications. (See
Mortensen [167], page 2.) However, it seems more economical intellectually to partition the proposition
domain P for paraconsistent logics into a domain P0 of single-truth-valued propositions and a domain P1 of
double-truth-valued propositions.

Any logical axioms which are proposed for the combined domain must specify whether they apply to
propositions of one or both of the sub-domains. Proof by contradiction only works for propositions which lie
in the single-valued sub-domain. This kind of two-tiered system with normal propositions and abnormal
pseudo-propositions is reminiscent of the two-tiered NBG set theory, which has normal sets and abnormal
proper classes. In both cases, the functionality for the abnormal tier is severely limited. One may talk about
the abnormal things, but one cannot do very much with them.
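The partition into single-valued and double-valued propositions can be computed directly when the proposition domain is finite. In this sketch, a multi-valued truth assignment is represented as a set of (proposition, truth value) pairs; the proposition names p, q, r and the helper name values are invented for illustration.

```python
# A multi-valued truth assignment as a relation: a set of (proposition, value) pairs.
# Here "r" is both true and false.
tau = {("p", True), ("q", False), ("r", True), ("r", False)}

# The proposition domain: every proposition has at least one truth value.
P = {x for x, _ in tau}

def values(x):
    """The set of truth values which tau associates with proposition x."""
    return {v for y, v in tau if y == x}

P0 = {x for x in P if len(values(x)) == 1}  # single-truth-valued propositions
P1 = {x for x in P if len(values(x)) == 2}  # double-truth-valued propositions
```

Any rule which relies on a proposition having exactly one truth value, such as proof by contradiction, would then be applied to P0 only.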
4.1.6 Remark: The concrete proposition domains in Definition 4.1.3 can be static or dynamic. However,
the dynamic case is expressible as the static case in the same way that time is bundled together with space
parameters in physics.
To be more specific, suppose the truth value function τ has a time parameter: τ : P × ℝ → {F, T}. The
space/time domain P × ℝ can be replaced by the concrete proposition domain P′, which makes the function
τ : P′ → {F, T} look static.
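The repackaging of a dynamic truth value function as a static one amounts to treating each pair (proposition, time) as a single proposition. The following is a minimal sketch; the function names tau_dynamic and make_static and the example proposition are hypothetical.

```python
from typing import Callable, Tuple

def tau_dynamic(p: str, t: float) -> bool:
    """Invented time-dependent truth value: "switch is on" holds for t >= 1."""
    return p == "switch is on" and t >= 1.0

def make_static(
    tau: Callable[[str, float], bool]
) -> Callable[[Tuple[str, float]], bool]:
    """Repackage a two-argument map tau(p, t) as a map on pairs (p, t)."""
    return lambda pt: tau(pt[0], pt[1])

# The static map takes one argument: an element of the combined domain.
tau_static = make_static(tau_dynamic)
```

For example, tau_static(("switch is on", 2.0)) gives the same value as tau_dynamic("switch is on", 2.0).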
4.1.7 Remark: One may draw an analogy between truth value maps and the solutions of BVPs (boundary
value problems) and IVPs (initial value problems) for PDEs (partial differential equations). BVP solutions
are analogous to static truth value maps and IVP solutions are analogous to dynamic truth value maps.
PDEs, and boundary or initial conditions, place constraints on the solutions to PDE problems. Very often,
it is not possible to solve PDE problems sufficiently to say what the solution u(x) is for a particular x in the
domain. One must very often be content to know how the properties of solutions u depend on the constraints
of a problem.
In the same way, very often one finds that the constraints placed on truth value maps for a particular “logic
problem” only permit one to determine various properties of the truth value map “solutions” for the problem.
For example, if one could determine the truth value for every proposition “X ∈ Y ” in a Zermelo-Fraenkel
set theory, every open problem of mathematics would be solved!
In a “logic problem”, existence and uniqueness of truth-value-map “solutions” is very important, just as
it is in PDE theory. Roughly speaking, existence for logic problems is the same as “self-consistency” and
uniqueness is the same as “completeness”. If a single contradiction is found in a logic system, this means
that the number of solutions of the logic problem is zero. Self-consistency is not an optional extra. Without
self-consistency, there is no solution to work with. This is the basis of proof by contradiction, which is
discussed in Section 3.11.
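For a finite proposition domain, the “solutions” of a logic problem can be found by brute force. This sketch assumes that constraints are given as Python functions on candidate truth value maps; the names solutions, implies, unique, several and none_at_all are hypothetical.

```python
from itertools import product

def solutions(names, constraints):
    """All truth value maps on the given names which satisfy every constraint."""
    result = []
    for vals in product([False, True], repeat=len(names)):
        tau = dict(zip(names, vals))
        if all(c(tau) for c in constraints):
            result.append(tau)
    return result

def implies(a: bool, b: bool) -> bool:
    return (not a) or b

names = ["A", "B"]

# Existence and uniqueness: the constraints A and A ⇒ B determine B.
unique = solutions(names, [lambda t: t["A"], lambda t: implies(t["A"], t["B"])])

# Existence without uniqueness: the constraint A leaves B undetermined.
several = solutions(names, [lambda t: t["A"]])

# No existence: the constraints A and ¬A contradict each other.
none_at_all = solutions(names, [lambda t: t["A"], lambda t: not t["A"]])
```

The three cases have one, two and zero solutions respectively. In the last case there is no truth value map to work with.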
4.1.8 Remark: The following are some examples of concrete proposition domains.
(1) P = the propositions “x < y” for real numbers x and y. The map τ : P → {F, T} is defined so that
τ (“x < y”) = T if x < y and τ (“x < y”) = F if x ≥ y in the familiar way.
(2) P = the propositions “X ∈ Y ” for sets X and Y in a Zermelo-Fraenkel set theory. The map τ : P →
{F, T} is defined so that τ(“X ∈ Y”) = T if X ∈ Y and τ(“X ∈ Y”) = F if X ∉ Y in the familiar way.
(One may think of the totality of mathematics as the task of calculating this truth value map.)
(3) P = the propositions “X ∈ Y” and “X ⊆ Y” for sets X and Y in a Zermelo-Fraenkel set theory. The
map τ : P → {F, T} is defined so that τ(“X ∈ Y”) = T if X ∈ Y, τ(“X ∈ Y”) = F if X ∉ Y,
τ(“X ⊆ Y”) = T if X ⊆ Y, τ(“X ⊆ Y”) = F if X ⊈ Y in the familiar way.
(4) P = the well-formed formulas of Zermelo-Fraenkel set theory. The map τ : P → {F, T} is defined so
that τ (P ) = T if P is true and τ (P ) = F if P is false.
(5) P = the well-formed formulas of any first-order language. The map τ : P → {F, T} is defined so that
τ (P ) = T if P is true and τ (P ) = F if P is false.
In case (1), the domain P is a well-defined set in the Zermelo-Fraenkel sense. In the other cases above, P is
generally not a ZF set.
The following examples are not so well defined as the others.
(6) P = the propositions P (x, t) defined as “the voltage of transistor x is high at time t” for transistors x
in a digital electronics circuit for times t. The truth value τ (P (x, t)) is not usually well defined for all
pairs (x, t) for various reasons, such as settling time, related to latching and sampling. Nevertheless,
propositional calculus can be usefully applied to this system.

(7) P = the propositions P (n, t, z) defined as “the voltage Vn (t) at time t equals z” for locations n in a
digital electronics circuit, at times t, with logical voltages z equal to 0 or 1. The truth value τ (P (n, t, z))
is usually not well defined at all times t. Propositions P (n, t, z) may be written as “Vn (t) = z”.
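Examples (1) and (7) can be sketched as follows. This is an illustration only: the function names tau_less_than and tau_voltage are hypothetical, the sample voltages are invented, and the value None is used here to mark propositions whose truth value is not well defined.

```python
import math
from typing import Optional

def tau_less_than(x: float, y: float) -> bool:
    """Example (1): truth value of the proposition "x < y"."""
    return x < y

# Example (7), simplified: logical voltages V_n(t) are known only at some
# sampled pairs (n, t). The sample data is invented for illustration.
samples = {(1, 0.0): 1, (2, 3.7e-6): 0}

def tau_voltage(n: int, t: float, z: int) -> Optional[bool]:
    """Truth value of "V_n(t) = z", or None where it is not well defined."""
    if (n, t) not in samples:
        return None
    return samples[(n, t)] == z
```

The first map is defined on its whole domain, whereas the second is defined only where a sample exists.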
The idea of concrete proposition domains and truth value maps is illustrated in Figure 4.1.2.
[Figure omitted: three concrete proposition domains, P1 = {“π < 3”, “3.1 < π”, “π < 3.2”, …}, P2 = {“∃x, ∀y, y ∉ x”, “∅ ∈ ∅”, “∅ ⊆ ∅”, …} and P3 = {“V1(0) = 1”, “V2(3.7 µS) = 0”, “V7(7.2 mS) = 1”, …}, are mapped by the truth value maps τ1, τ2 and τ3 respectively to the truth value space {F, T}.]
Figure 4.1.2 Concrete proposition domains and truth value maps
4.1.9 Remark: The kind of sets of propositions and names which appear in the definitions of formal
symbolic logic are not the same as the kinds of sets which appear in formal set theories such as Zermelo-
Fraenkel (Section 5.1) and Bernays-Gödel (Section 5.12). The sets in abstract symbolic logic are externally
defined collections of propositions or names which have very limited membership relations.
For example, if P denotes the set of propositions of a propositional calculus, the membership relation x ∈ P
is defined for elements x of P, but there is no membership relation x ∈ y for pairs of elements x and y
in P. If the external application context does permit such relations, this is not an issue for symbolic logic
because such relations are not used at all in symbolic logic. It is meaningless (or irrelevant) to say that one
proposition is an “element” of another proposition. Likewise, it is meaningless to say that one name or label
is an “element” of another name or label.
This remark is related to the theory of types developed by Russell and Whitehead [168]. (See Mendelson [165],
page 4.) By organizing sets into layers called “types” and forbidding the set membership relation within
each type, the dangers of Russell’s paradox are avoided.
The “sets” required for the formalization of symbolic logic are therefore “naive sets” whose pruned-down
properties, relations and axioms effectively avoid the dangers of mathematical set theory. The “naive set”
idea also partially avoids the circularity of definitions between logic and set theory.
It may seem at first sight that the membership relation is permitted to operate between elements of the
spaces used in ZF set theory. However, the propositions of ZF set theory do not possess any membership
relation. The ZF membership relation operates between the parameter names for propositions. The sets
of ZF set theory are merely parameters for propositions. Thus M (x, y) = “x ∈ y” is a proposition with
parameters x and y. The membership relation M is a family of parametrized propositions. Parametrized
propositions belong to predicate calculus, not propositional calculus.
In the case of predicate calculus, the set of proposition parameters can encounter Russell’s paradox, but this
is not a problem with the naive set of proposition parameters itself. The problem is simply that the axioms
of a set theory can be inconsistent if they are not chosen carefully. To be specific, let U denote the set of all
sets and let Q denote the problematic set which satisfies ∀x ∈ U, M(x, Q) ⇔ (M(x, U) ∧ ¬M(x, x)). Setting
x = Q gives M(Q, Q) ⇔ ¬M(Q, Q), because M(Q, U) holds, and this leads to the contradiction
M(Q, Q) ∧ ¬M(Q, Q). This is a consequence of the axioms which guarantee the
existence of such a set Q. Therefore there cannot be a universal set U which supports a set relation M which
satisfies the bad set of axioms. The contradiction lies in the axioms, not in the set of proposition parameters
(i.e. the set of sets).
The problem in Russell’s paradox is caused by allowing sets to be elements of themselves. Although it seems
plausible that the class of all things should contain the class of all things as an element, the possibility of
sets being elements of themselves is inconsistent with the restriction axiom. If the axiom ∀x ∈ U, ¬M (x, x)
is accepted instead, we have Q = U and ¬M (Q, Q). So the specification ∀x ∈ U, M (x, Q) ⇔ (M (x, U ) ∧
¬M (x, x)) for Q only yields the consequence ∀x ∈ U, ¬M (x, x), which doesn’t contradict anything. In
summary, Russell’s paradox is a show-stopper for cyclic inclusion relations among sets, not for the sets
themselves.
To put it another way, let Q = {x ∈ P; x ∉ x}. Then Q = P. So Q ∉ Q because P ∉ P. So Russell’s
paradox does not apply.
[ The discussion in Remark 4.1.9 needs to be checked very carefully! ]
4.1.10 Definition: A proposition name map is a map µ : N → P from a (naive) set N to a concrete
proposition domain P.
The proposition name space of a proposition name map µ : N → P is its domain N .
4.1.11 Remark: The relation between proposition name spaces and concrete proposition domains is illus-
trated in Figure 4.1.3.
[ Figure: a proposition name space N with names A, B, C, . . . mapped by the proposition name map µ to a concrete proposition domain P, which is mapped by the truth value map τ to the truth value space {F, T}; the composite t = τ ◦ µ is the abstract truth value map ]
Figure 4.1.3 Proposition name space and concrete proposition domain
The proposition name map µ in Definition 4.1.10 is quite arbitrary. A particular name space may be mapped
to any concrete proposition domain in any way at all. (This is illustrated in Figure 4.1.4.)
Abstract logic uses the proposition name space, not the concrete proposition domain. The theorems of
abstract logic, which are validated at a linguistic level, may then be applied to any concrete proposition
domain at all, with any choice of proposition name map. The theorems of abstract logic are only valid to
the extent that the assumptions of the model are valid.
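As an informal illustration of Definition 4.1.10, the following Python sketch models proposition name maps and truth value maps as dictionaries. The names tau_reals, tau_sets, mu_reals, mu_sets and compose are hypothetical, chosen only for this example, and the sample propositions are simple assertions about π and the empty set. It is a sketch of the idea that one name space may be mapped to different concrete proposition domains, not a formal construction.

```python
# Truth value maps tau : P -> {F, T} for two concrete proposition domains.
tau_reals = {"π < 3": "F", "3.1 < π": "T", "π < 3.2": "T"}
tau_sets = {"∅ ∈ ∅": "F", "∅ ⊆ ∅": "T"}

# Two different proposition name maps mu : N -> P on the same name space N = {A, B}.
mu_reals = {"A": "3.1 < π", "B": "π < 3"}
mu_sets = {"A": "∅ ∈ ∅", "B": "∅ ⊆ ∅"}

def compose(tau, mu):
    """Return the abstract truth value map t = tau ∘ mu : N -> {F, T}."""
    return {name: tau[prop] for name, prop in mu.items()}

t_reals = compose(tau_reals, mu_reals)
t_sets = compose(tau_sets, mu_sets)
print(t_reals)  # {'A': 'T', 'B': 'F'}
print(t_sets)   # {'A': 'F', 'B': 'T'}
```

The same names A and B receive different truth values under the two name maps, which is the arbitrariness described above.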
4.2. Logic operations in concrete proposition domains

4.2.1 Remark: The context of a particular concrete proposition domain may not permit the construction
or expression of general logical operations. In particular, many natural languages are limited in their ability
to express logical operations. This is discussed in Section 3.10, particularly in Remark 3.10.8.
In the case that a proposition context does support logical operations, the propositions which are formed by
those operations may be referred to as “truth-functional combinations”. (See Mendelson [165], page 12.)
The truth-functional combinations are straightforward to define in formal set theory because the same logical
operations are used as in formal logic. But in other concrete proposition domains, particularly in natural
languages, truth-functional combinations are quite difficult to define and are vulnerable to ambiguity. It
seems preferable to present the methods of logic at an abstract level, where precision and unambiguity are easy
to achieve, and then import the abstract logic into concrete domains. If this importation map can be shown
to be precise and unambiguous, all of the methods of abstract logic will be directly applicable.
[ Figure: a proposition name space N with names A, B, C, . . . mapped by name maps µ1, µ2, µ3 to concrete proposition domains P1, P2, P3, which are mapped by truth value maps τ1, τ2, τ3 to the truth value space {F, T} ]
Figure 4.1.4 Proposition name space with multiple concrete proposition domains
4.2.2 Remark: Statements which are purely logical combinations of other sentences are called
“truth-functional combinations”. The function f̄ in Definition 4.2.3 is necessarily of the form f̄ : {F, T}^n → {F, T}.
There are therefore 2^(2^n) different choices of f̄ for n component sentences.
The function f in Definition 4.2.3 has domain D^n and range D, where D is the “compound proposition
space” under consideration. (See Remark 4.5.4.)
4.2.3 Definition: A truth-functional combination of sentences A1, . . . An, for any non-negative integer n,
is a sentence f(A1, . . . An) such that its truth-value t(f(A1, . . . An)) is a function f̄(t(A1), . . . t(An)), where
the function f̄ may depend on the truth values t(A1), . . . t(An), but is independent of the sentences A1, . . . An
themselves.
4.2.4 Definition: A truth function is a function f : {F, T}^n → {F, T} for some non-negative integer n.
A truth function of n arguments or n-ary truth function, for non-negative integer n, is a truth function with
domain {F, T}^n.
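The count of n-ary truth functions stated in Remark 4.2.2 can be checked by brute force for small n. The following Python sketch enumerates every truth function as a lookup table; the names TRUTH_VALUES and truth_functions are assumptions made for this illustration only.

```python
from itertools import product

TRUTH_VALUES = ("F", "T")

def truth_functions(n):
    """Return all n-ary truth functions as dicts from n-tuples to truth values."""
    args = list(product(TRUTH_VALUES, repeat=n))  # the 2**n argument tuples
    return [dict(zip(args, values))
            for values in product(TRUTH_VALUES, repeat=len(args))]

for n in range(4):
    # Each line shows n, the enumerated count, and the formula 2**(2**n).
    print(n, len(truth_functions(n)), 2 ** (2 ** n))
```

For n = 0 there is a single empty argument tuple, so exactly two truth functions exist, one for each constant value.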
4.2.5 Remark: If one or more of the propositions A1, . . . An in Definition 4.2.3 do not have well-defined
truth values, the function value f̄(t(A1), . . . t(An)) will not necessarily be well defined. But under such
circumstances, the sentence f(A1, . . . An) may be well defined, and the truth value t(f(A1, . . . An)) may
also be well defined. This anomaly may be remedied by defining a third pseudo-truth-value U which means
“unknown truth value”. Then the domain and range of f̄ may be extended so that f̄ : {F, T, U}^n →
{F, T, U}.
Propositions with unknown truth values may occur particularly in systems where the truth values are de-
termined dynamically in some way.
4.2.6 Remark: In terms of the “unknown truth value” in Remark 4.2.5, one may write the following
extended truth table for the implication operator.

A B A⇒B
T T T
T F F
T U U
F T T
F F T
F U T
U T T
U F U
U U U
This truth table is discussed in Remark 3.8.1 in the context of a logic machine network.
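One possible way to arrive at such an extended table is to read U as “either F or T” and to report a definite value only when every completion of the unknown arguments gives the same result. The following Python sketch applies this reading to the implication operator. The names COMPLETIONS, extend and implies are hypothetical, and the completion-based reading is an assumption of this sketch rather than a definition taken from the text.

```python
from itertools import product

# Possible two-valued completions of each extended truth value.
COMPLETIONS = {"T": [True], "F": [False], "U": [False, True]}

def implies(a, b):
    """Ordinary two-valued implication."""
    return (not a) or b

def extend(op, *args):
    """Apply a two-valued operator to arguments in {F, T, U}."""
    results = {op(*values)
               for values in product(*(COMPLETIONS[a] for a in args))}
    if results == {True}:
        return "T"
    if results == {False}:
        return "F"
    return "U"  # the completions disagree, so the value is unknown

table = {(a, b): extend(implies, a, b) for a in "TFU" for b in "TFU"}
for (a, b), value in table.items():
    print(a, b, value)
```

Under this assumption the nine printed rows agree with the table above.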
4.2.7 Remark: It seems fairly reasonable to add a third truth value “unknown” to the standard two,
“true” and “false”. However, the case for this is not overwhelming.
Consider, for example, the analogous situation where the real-valued solution of an algebraic or PDE problem
is unknown. In this case, we do not add an extra element U to the real numbers to represent unknown values.
Thus if a value x in algebra or f (x, t) in PDE theory is unknown, we do not write x = unknown (or x = U)
or f (x, t) = unknown (or f (x, t) = U). We simply say that the value is unknown. (In some elementary
teaching, however, such monstrosities are sometimes seen!)
We generally assume that the value of a solution to an equation is defined, even when we do not know what
it is. The equations x = a and f (x, t) = b are satisfied for some a and b, but we just don’t know what they
are.
4.2.8 Remark: If we did want to add an unknown element U to the real numbers to represent the state of
knowledge about a system A by a mathematical thinking machine M1 (such as a human brain), we would
then have only the ordinary real numbers in system A, but this would be modelled by machine M1 using
an extended set of real numbers. This is awkward, but not too absurd. The problem comes when a second
machine M2 models the state of machine M1 . Then machine M2 may not fully know the state of machine M1 .
Machine M2 may not know if machine M1 is representing a value x as a real number (because it knows the
value), or as the pseudo-number U. If machine M2 does not have any information, it must use a second
level of unknown real number, for example U2 . (This is perhaps one of the infamous “unknown unknowns”!)
Machine M2 does not know if machine M1 knows the value of x or not.
But the problem is more difficult than that. Machine M2 may know that machine M1 knows the value of x,
but may not know what value it knows. This is a different kind of unknown to the original U (which was a
“known unknown”). Now machine M2 needs to define an “unknown known”, possibly denoted U1 .
The same issues arise in the case of extended truth values. If we add a third truth value U to permit
machine M1 to model unknown truth values of propositions about system A, it becomes necessary to invent
another two pseudo-truth-values when machine M2 models machine M1 . When machine M3 models ma-
chine M2 , a further four pseudo-truth-values are required. If two logic machines have the great misfortune
to be modelling each other’s state of knowledge, an infinite number of pseudo-truth-values will be required
to represent all of the possibilities. (Example: “I know that you don’t know that I don’t know that you
know that I know whether the Sun rises in the East.”)
The use of pseudo-truth-values to represent states of ignorance may be useful under some circumstances.
But it is inconvenient to introduce such monstrosities into the formal theory of logic. (See also Remark 3.8.2
for discussion of this point.)
4.3. Logical operators and expressions

This section is concerned with logical propositions without parameters and without quantifiers. Existential
and universal quantifiers, and parametrized families of propositions, are introduced in Sections 4.12 and 4.13.
4.3.1 Remark: The logical operators in Notation 4.3.3 are applied to abstract propositions in a “discussion
context”, not necessarily in the “discussed context”. However, in formal set theory, these operators are used in
both contexts. One may think of this as the importation of the operators (and compound logical expressions)
from the abstract discussion context into the concrete discussed context.

4.3.2 Remark: Although the first three symbols in Notation 4.3.3 are common in logic, they are not so
common in mathematics. In differential geometry, the ∧ symbol clashes with the wedge product in exterior
algebra. So these three logic symbols are rarely used outside the preliminary chapters.
4.3.3 Notation:
(i) ¬ means “not” (logical negation).
(ii) ∧ means “and” (logical conjunction).
(iii) ∨ means “or” (logical disjunction).
(iv) ⇒ means “implies” (logical implication).
(v) ⇔ means “if and only if” (logical equivalence).
4.3.4 Definition: A propositional connective is any of the symbols in Notation 4.3.3.
4.3.5 Remark: The ∧ and ∨ symbols have some easy mnemonics (in English). The ∧ symbol suggests the
letter “A” in the word “and”, but without the horizontal strut. The ∨ symbol is the opposite of this. The ∧
and ∨ symbols for propositions match the corresponding ∩ and ∪ symbols for sets. The ∪ symbol resembles
the letter “U” for “union”. (Arguably the ∩ symbol suggests the “A” in the word “all”.)
The ¬ (not) symbol is perhaps inspired by the − (minus) symbol for arithmètic negation. A popular
alternative for “not” is the ∼ symbol. However, this is easily confused with other uses of the same symbol.
So ¬ is preferable when logic is mixed with mathematics in a single text.
According to Lemmon [164], page 19, the ∨ symbol is actually a letter “v”, which is a mnemonic for the
Latin word “vel”, which means the inclusive OR, as opposed to the Latin word “aut”, which means the
exclusive OR.
[ Possibly replace the plain TEX ¬ symbol with or something similar. ]
4.3.6 Remark: There is a fair amount of variation in notations for logical operators and statements. The
following table summarizes some sample notations.
logic operator notations
author not and or implies if and only if nand nor xor prop wff
EDM [34] , ∼,¯ ∧, &, · ∨, + →, ⊃, ⇒ ↔, $, ≡, ∼, ⊃⊂, ⇔ A
EDM2 [35] , ∼,¯ ∧, &, · ∨, + →, ⊃, ⇒ ↔, ≡, ∼, ⊃⊂, ⇔ A
KEM [122] ¬ ∧ ∨ → ↔ p H
Lemmon [164] − & v → ↔ | ↓ P
Mendelson [165] ∼ ∧ ∨ ⊃ ≡ | ↓ A A
Reinhardt [135] ¬ ∧ ∨ ⇒ ⇔ | C A
Shoenfield [169] & ∨ → ↔ p
Szekeres [45] and or ⇒ ⇔ P
Kennington ¬ ∧ ∨ ⇒ ⇔ | ↓ ▵ A α
The notations for proposition names and well-formed formula (wff) names are indicated in the table as
“prop” and “wff” respectively.
The NAND operator is also called the “Sheffer stroke” or “alternative denial” operator. (See Mendelson [165],
pages 26, 42.) The NOR operator is also called the “Peirce arrow”, “Quine dagger” or “joint denial” operator.
(See Mendelson [165], page 26.)
4.3.7 Remark: There is no notation for the exclusive OR operator in Remark 4.3.6 because there seems
to be no standard for it in logic and mathematics. The exclusive OR of A and B is (A ∨ B) ∧ ¬(A ∧ B), which
is equivalent to A ⇔ ¬B. It also has the useful property that the truth value t(A ⇔ ¬B) equals T if and
only if the sum of the truth values t(A) + t(B) is an odd integer. (This rule applies also to the exclusive OR
of any finite set of propositions, which is well defined because the operator is associative and commutative.)
So a notation resembling the addition symbol “+” could be suitable, such as “⊕”. This is in fact used in
some contexts. (For example, see Lin/Costello [210], pages 16–17.) But this symbol is also used frequently
in algebra. The sometimes-used exclusive-OR symbol “≢” also clashes with a frequently used symbol. (For
example, see CRC [100], pages 16–21.)
Modified OR symbols such as “∨̇”, “∨̄” or “%” could be used because the inclusive OR and exclusive OR
are often confused with each other.
A fairly rational notation choice would be a superposed OR and AND symbol “∨ ∧”. This has the same sort of
4-way symmetry as the “⇔” biconditional symbol. But it would be tedious to write this by hand frequently.
It would make some sense to use a superposed X (×) and circle (○) to represent the exclusive OR operator.
Such a symbol (if it could be easily produced in TeX) would look similar to the “∨ ∧” symbol. It would also
have the same symmetries as the biconditional symbol. It would also suggest the first two letters of the
abbreviation XOR.
The triangle notation “△” would make good sense for the exclusive OR because it resembles the Delta symbol
∆ and the corresponding set operation is the “set difference”. (See Definition 5.13.14 and Notation 5.13.15.)
To distinguish this symbol from the set operation, the corresponding small triangle symbol “▵” could be used
instead.
4.3.8 Definition: The alternative denial operator is the map f : {F, T}^2 → {F, T} satisfying f(v1, v2) =
F for (v1, v2) = (T, T), otherwise f(v1, v2) = T.
The joint denial operator is the map f : {F, T}^2 → {F, T} satisfying f(v1, v2) = T for (v1, v2) = (F, F),
otherwise f(v1, v2) = F.
The exclusive-or operator is the map f : {F, T}^2 → {F, T} satisfying f(v1, v2) = T for (v1, v2) = (T, F)
or (F, T), otherwise f(v1, v2) = F.
The alternative denial operator is also known as the NAND operator or the Sheffer stroke.
The joint denial operator is also known as the NOR operator or the Peirce arrow or Quine dagger.
4.3.9 Notation:
| denotes the alternative denial operator.
↓ denotes the joint denial operator.
▵ denotes the exclusive OR operator.
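The three operators of Definition 4.3.8 are small enough to tabulate directly. The following Python sketch, with hypothetical function names nand, nor, xor and xor_identities_hold, represents T and F by the Python values True and False and checks the two equivalent forms of the exclusive OR quoted in Remark 4.3.7.

```python
from itertools import product

def nand(v1, v2):
    """Alternative denial: false only for (T, T)."""
    return not (v1 and v2)

def nor(v1, v2):
    """Joint denial: true only for (F, F)."""
    return not (v1 or v2)

def xor(v1, v2):
    """Exclusive OR: true exactly for (T, F) and (F, T)."""
    return v1 != v2

def xor_identities_hold():
    """Compare xor with (A ∨ B) ∧ ¬(A ∧ B) and with A ⇔ ¬B on all four inputs."""
    return all(
        xor(a, b) == ((a or b) and not (a and b)) == (a == (not b))
        for a, b in product([False, True], repeat=2)
    )

print(xor_identities_hold())  # True
```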
4.3.10 Remark: Propositional calculus automates logic at the symbolic level, ignoring the semantics of
logical expressions. Figure 4.3.1 illustrates the difference between the syntactic and semantic levels.
[ Figure: the parse tree (syntax) of the expression (A ∧ ¬B) ∨ C beside the corresponding function tree (semantics), which expresses t((A ∧ ¬B) ∨ C) = φ∨(φ∧(t(A), φ¬(t(B))), t(C)) ]
Figure 4.3.1 Example logical expression tree with syntax and semantics
The tree on the left shows how a simple abstract logical expression (A ∧ ¬B) ∨ C may be parsed. At the
syntactic level, such parsing is required in order to ensure that the expression satisfies the rules for a
well-formed formula. Propositional calculus performs operations on expressions at the syntactic level, using only
a simple set of blind rules to determine which deductions may be made from sets of compound propositions.
At the semantic level, the example expression (A ∧ ¬B) ∨ C has a truth value t((A ∧ ¬B) ∨ C) which
equals φ∨(φ∧(t(A), φ¬(t(B))), t(C)). The expression (A ∧ ¬B) ∨ C is merely a written representation of the
corresponding spoken language. The meaning of the expression is a set of operations which must be carried
out as indicated by the function tree in Figure 4.3.1. The function-tree diagram is an attempt to convey the
required mental operations in the same way that the expression (A ∧ ¬B) ∨ C is an attempt to convey the
same operations in linear written text. The rules and axioms of propositional calculus cannot be understood
in terms of text alone. The origin of the deduction rules and axioms lies in the logical procedures which are
summarized by abstract symbolic logic expressions.
See Remark 4.13.11 for an analogous attempt to parse a quantified logical expression.
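The distinction between a parse tree and its function tree can be made concrete with a small Python sketch. The classes Var, Not, And and Or and the evaluator t below are hypothetical names introduced only for this illustration. The nested object is the parsed syntax of (A ∧ ¬B) ∨ C, and the recursive function carries out the operations φ¬, φ∧ and φ∨ from the leaves to the root.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Not:
    operand: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

def t(expr, valuation):
    """Truth value of a parsed expression, computed bottom-up as in a function tree."""
    if isinstance(expr, Var):
        return valuation[expr.name]
    if isinstance(expr, Not):
        return not t(expr.operand, valuation)                        # φ¬
    if isinstance(expr, And):
        return t(expr.left, valuation) and t(expr.right, valuation)  # φ∧
    if isinstance(expr, Or):
        return t(expr.left, valuation) or t(expr.right, valuation)   # φ∨
    raise TypeError(f"not a well-formed expression: {expr!r}")

# Parse tree of (A ∧ ¬B) ∨ C.
example = Or(And(Var("A"), Not(Var("B"))), Var("C"))
print(t(example, {"A": True, "B": False, "C": False}))   # True
print(t(example, {"A": False, "B": False, "C": False}))  # False
```

The data structure alone has no truth value. A truth value arises only when the evaluator is applied with a given assignment of truth values to the proposition names.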
[ Mention syntax-directed translation in connection with syntax and semantics of parse-trees? Also tree
decoration? Abbreviated truth tables are also an example of syntax-directed translation? ]
4.3.11 Definition:
A conjunct of an expression of the form α ∧ β, for wffs α and β, is either α or β.
The left conjunct of an expression of the form α ∧ β, for wffs α and β, is α.
The right conjunct of an expression of the form α ∧ β, for wffs α and β, is β.
A disjunct of an expression of the form α ∨ β, for wffs α and β, is either α or β.
The left disjunct of an expression of the form α ∨ β, for wffs α and β, is α.
The right disjunct of an expression of the form α ∨ β, for wffs α and β, is β.
The principal connective of an expression of the form α◦β, where ◦ is one of the connectives in Notation 4.3.3,
and α and β are wffs, is the connective ◦.
4.3.12 Remark: A problem with Definition 4.3.11 is the possibility that an expression of the form α ∧ β,
for example, may contain an additional operator ∧ in the sub-expression α, or β, or both. Then the choice
of the pair of conjuncts would be non-unique. However, the use of strict parenthesizing rules removes the
non-uniqueness. Under such rules, each side of a binary operator such as ∧ or ∨ is required to be either a
primitive symbol or a parenthesized sub-expression. Then the ambiguity is removed by the parentheses.
If the parsing of a logical expression is ambiguous, the semantics of the expression is ambiguous. Logical
expressions are merely notations for function trees. So if the parsing is non-unique, the meaning is non-
unique. The fact that the same result may be provably unchanged for any choice of interpretation does not
change the fact that the function tree interpretation is ambiguous.
It would probably be more correct to refer to the “conjuncts (or disjuncts) of an operator” rather than the
“conjuncts (or disjuncts) of an expression”. Thus if an expression has the form α ∧ (β ∧ γ), for example, the
conjuncts of the first ∧-operator would be α and (β ∧ γ), whereas the conjuncts of the second ∧-operator
would be β and γ. More generally, one may refer to the “operands” of each operator in a logic expression.
Usually the number of operands for any operator is one or two, but it is quite straightforward to generalize the
concept to larger numbers of operands. (On the other hand, larger numbers of operands are best expressed
in functional notation rather than by operators.)
The associativity of the ∧ and ∨ operators implies that the parenthesization rules may be relaxed. One
way to remove the function-tree ambiguity in this case is to decide on a fixed “associativity rule” for all
such expressions. For example, a left-associativity rule would interpret α ∧ β ∧ γ as (α ∧ β) ∧ γ whereas
right-associativity would interpret it as α ∧ (β ∧ γ). This approach effectively parenthesizes all expressions
so that the order of application of operations is unique. (For a typical computer software context for left
and right associativity, see for example Kernighan/Ritchie [206], page 200.)
Another way to remove ambiguity in unparenthesized expressions such as α ∧ β ∧ γ is to interpret such
expressions in terms of multi-operand functions. For example, α ∧ β ∧ γ could be interpreted as φ∧ (α, β, γ),
which in turn is defined in terms of the two-operand operators.
Similar comments refer to the definition of the “principal connective”.
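The left-associativity and right-associativity rules described above can be sketched in Python as two different ways of folding a list of operands into a nested tree. The names left_assoc, right_assoc and evaluate are hypothetical, and only the ∧ operator is modelled. The final check confirms that both parenthesizations, and the multi-operand reading, give the same truth value, even though the trees themselves differ.

```python
from functools import reduce
from itertools import product

def left_assoc(operands):
    """Parenthesize a1 ∧ a2 ∧ a3 ... as ((a1 ∧ a2) ∧ a3) ..."""
    return reduce(lambda acc, x: (acc, x), operands)

def right_assoc(operands):
    """Parenthesize a1 ∧ a2 ∧ a3 ... as a1 ∧ (a2 ∧ (a3 ...))."""
    return reduce(lambda acc, x: (x, acc), reversed(operands))

def evaluate(tree, valuation):
    """Evaluate a nested pair tree of conjunctions; leaves are proposition names."""
    if isinstance(tree, str):
        return valuation[tree]
    left, right = tree
    return evaluate(left, valuation) and evaluate(right, valuation)

names = ["α", "β", "γ"]
print(left_assoc(names))   # (('α', 'β'), 'γ')
print(right_assoc(names))  # ('α', ('β', 'γ'))

same_value = all(
    evaluate(left_assoc(names), v) == evaluate(right_assoc(names), v) == all(v.values())
    for values in product([False, True], repeat=3)
    for v in [dict(zip(names, values))]
)
print(same_value)  # True
```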
4.3.13 Remark: In terms of the semantic-level function trees mentioned in Remark 4.3.12, the principal
connective defined in Definition 4.3.11 for logical expressions corresponds to the function at the root of the
tree for the parsed expression. The conjuncts (or disjuncts) of a conjunction (or disjunction) are the left and
right branches of the root of the parse tree.

4.3.14 Definition:
A conditional (expression) is an expression of the form α ⇒ β for wffs α and β.
The antecedent (subexpression) of a conditional expression α ⇒ β for wffs α and β is the wff α.
The consequent (subexpression) of a conditional expression α ⇒ β for wffs α and β is the wff β.
A biconditional (expression) is an expression of the form α ⇔ β for wffs α and β.
4.3.15 Remark: Logical operators can be expressed in terms of arithmètic relations among truth values
if the function τ from the set of proposition names to the set of integers {0, 1} is defined by τ (P ) = 0 if P
is false and τ(P) = 1 if P is true. (The symbol “▵” means the exclusive OR.)
logical expression    arithmètic expression
A                     τ(A) = 1
¬A                    τ(A) = 0
A ∧ B                 τ(A) + τ(B) = 2
A ∨ B                 τ(A) + τ(B) ≥ 1
A ⇒ B                 τ(A) ≤ τ(B)
A ⇔ B                 τ(A) = τ(B)
A ▵ B                 τ(A) + τ(B) = 1
A ∧ B ∧ C             τ(A) + τ(B) + τ(C) = 3
A ∨ B ∨ C             τ(A) + τ(B) + τ(C) ≥ 1
A ▵ B ▵ C             τ(A) + τ(B) + τ(C) = 1 or 3
As often mentioned in this book, the logic layer is infested with dependencies on set and number concepts.
All of logic relies upon the natural integers anyway. So it is not a substantial defeat to be using integers in
definitions of logical operators.
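The rows of the arithmetic table can be verified exhaustively, since only finitely many truth value combinations are involved. The following Python sketch uses hypothetical helper names tau, check_two_operand_rows and check_three_operand_rows, and it represents the exclusive OR by the Python inequality operator on Boolean values.

```python
from itertools import product

def tau(p):
    """Map a Boolean truth value to the integer 0 or 1."""
    return 1 if p else 0

def check_two_operand_rows():
    """Compare each two-operand row with the ordinary Boolean operators."""
    return all(
        (a and b) == (tau(a) + tau(b) == 2)
        and (a or b) == (tau(a) + tau(b) >= 1)
        and ((not a) or b) == (tau(a) <= tau(b))
        and (a == b) == (tau(a) == tau(b))
        and (a != b) == (tau(a) + tau(b) == 1)
        for a, b in product([False, True], repeat=2)
    )

def check_three_operand_rows():
    """Compare each three-operand row; s is the sum of the three integer values."""
    return all(
        (a and b and c) == (s == 3)
        and (a or b or c) == (s >= 1)
        and ((a != b) != c) == (s in (1, 3))
        for a, b, c in product([False, True], repeat=3)
        for s in [tau(a) + tau(b) + tau(c)]
    )

print(check_two_operand_rows(), check_three_operand_rows())  # True True
```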
Conjunctions and disjunctions are easily expressed in terms of minimum and maximum operators as in the
following table.
logical expression    min/max expression
A                     min(τ(A)) = 1
¬A                    max(τ(A)) = 0
A ∧ B                 min(τ(A), τ(B)) = 1
A ∨ B                 max(τ(A), τ(B)) = 1
A ∧ B ∧ C             min(τ(A), τ(B), τ(C)) = 1
A ∨ B ∨ C             max(τ(A), τ(B), τ(C)) = 1
A substantial advantage of these min/max operators is that they are also valid for infinite sets of propositions.
This is important in the predicate calculus, where propositions are organized into parametrized families, and
these families are typically countably or uncountably infinite.
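For finite families, the min/max formulation can be compared directly with Python's built-in all and any functions. The sketch below uses hypothetical names conjunction, disjunction and agrees. It covers non-empty finite families only, since the built-in min and max cannot be applied to empty or infinite collections; the infinite case would require an infimum and supremum instead.

```python
from itertools import product

def conjunction(taus):
    """A1 ∧ A2 ∧ ... holds if and only if the minimum of the values τ(Ai) equals 1."""
    return min(taus) == 1

def disjunction(taus):
    """A1 ∨ A2 ∨ ... holds if and only if the maximum of the values τ(Ai) equals 1."""
    return max(taus) == 1

# Compare with all() and any() for every 0/1 family of length 1 to 4.
agrees = all(
    conjunction(taus) == all(taus) and disjunction(taus) == any(taus)
    for n in range(1, 5)
    for taus in product([0, 1], repeat=n)
)
print(agrees)  # True
```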
4.3.16 Remark: There is a sense in which proposition names and wff names are not an essential component
of an axiomatic system for propositional calculus. All such names are arbitrary “dummy variables” whose
sole purpose is to indicate which variables are the same. This is essentially the same as the role of pronouns
in natural languages.
The information contained in dummy variables can be communicated by other means, for example by links in
a diagram. However, the scope of these dummy variables could be quite large. If the scope of a proposition
name is spread over many pages, it would be inconvenient to use other forms of less arbitrary linkage
between the variables in expressions. It is more convenient to simply remember that the particular choice of
proposition names is arbitrary. Such arbitrariness of symbols is present in almost all of mathematics anyway.
Parentheses are even more clearly inessential than proposition names because “Polish notation” is a well-
defined full substitute for parentheses. The “reverse Polish” notation is not easy for humans to read, but the
existence of this option proves that parentheses are not an essential part of the language. (Shoenfield [169],
pages 14–16, presents a first-order language notation which is expressed in terms of prefix operators.)
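To illustrate that parentheses are inessential, the following Python sketch evaluates an expression written in prefix (Polish) notation by a single left-to-right recursive scan. The names BINARY and eval_prefix are hypothetical, and the token string “∨ ∧ A ¬ B C” is this sketch's prefix rendering of the expression (A ∧ ¬B) ∨ C.

```python
BINARY = {
    "∧": lambda a, b: a and b,
    "∨": lambda a, b: a or b,
    "⇒": lambda a, b: (not a) or b,
    "⇔": lambda a, b: a == b,
}

def eval_prefix(tokens, valuation):
    """Evaluate a prefix token list; return (truth value, unconsumed tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head == "¬":
        value, rest = eval_prefix(rest, valuation)
        return (not value), rest
    if head in BINARY:
        left, rest = eval_prefix(rest, valuation)
        right, rest = eval_prefix(rest, valuation)
        return BINARY[head](left, right), rest
    return valuation[head], rest  # a proposition name

tokens = "∨ ∧ A ¬ B C".split()
value, leftover = eval_prefix(tokens, {"A": True, "B": False, "C": False})
print(value, leftover)  # True []
```

An empty list of unconsumed tokens indicates that the whole token sequence was parsed as a single expression.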

4.3.17 Remark: One might reasonably ask what a zero-operand logical operator looks like. A unary
operator acting on an abstract proposition yields an abstract proposition whose truth value φ(t(A)) is a
function φ of a single truth-value variable t(A). There are 2^(2^1) = 4 possible such functions, as mentioned
in Remark 4.2.2. In the case of zero-operand operators, there are 2^(2^0) = 2 choices for the operator. These
are introduced in Definition 4.3.18 and Notation 4.3.19.
4.3.18 Definition:
The true (zero-operand) operator is the zero-operand operator whose expression value is true.
The false (zero-operand) operator is the zero-operand operator whose expression value is false.
4.3.19 Notation:
⊤ denotes the true zero-operand logical operator.
⊥ denotes the false zero-operand logical operator.
4.3.20 Remark: The true and false zero-operand operators may be used in a logical expression in the
same way as other operators. For example, the expressions ⊤, ¬⊥, ¬(A ∧ ⊥) and A ∨ ⊤ are all valid logical
expressions. In fact, they all have the value T for any choice of operands. In other words, for any abstract
proposition A, t(⊤) = T, t(¬⊥) = T, t(¬(A ∧ ⊥)) = T and t(A ∨ ⊤) = T.
It follows, therefore, that the expressions ⊤ and ⊥ are valid abstract propositions, whose truth values are
always-true and always-false respectively.
There are also operators with any positive number of operands which are always-true or always-false. To
avoid excessive diversity of notations, the always-true operators with n operands could be denoted as ⊤(),
⊤(·), ⊤(·, ·) and ⊤(·, ·, ·) for n = 0, 1, 2, 3, and so forth. Thus t(⊤(A, B)) = T for all abstract propositions
A and B. The always-false operators with one or more operands may be denoted similarly. However, there
is very seldom a practical requirement for such operators.
The always-true and always-false logical operators should be distinguished from the always-true and always-
false logical predicates. The logical operators take propositions as arguments whereas logical predicates take
logical variables as arguments. (See Remark 4.12.9 and Notation 4.12.10 for always-true and always-false
logical predicates.)
Note also that ⊤ and ⊥ are not names for concrete propositions. They are not labels for propositions in
a concrete proposition domain. However, both ⊤ and ⊥ are abstract compound propositions which are
valid logical expressions. Therefore it is incorrect to refer to them as the “always-true proposition” and the
“always-false proposition” respectively. (On the other hand, they do happen to be always-true and always-
false logical predicates respectively, although strictly speaking, logical predicates are not defined as such in
propositional calculus.)
4.3.21 Remark: The author chose the notations ⊤ and ⊥ to represent “always true” and “always false”
respectively on 4 July 2008, but only on a temporary basis due to lack of a better symbol. The symbols
seemed arbitrary and unlikely to be popular, although the ⊤ symbol does look like a “T”, and the ⊥
symbol is the opposite of this. Unfortunately, the ⊥ symbol is also used in Euclidean geometry to mean
“perpendicular”, which is a potential clash. On the other hand, an advantage of these symbols is that
the ⊤ symbol suggests the graph of a function whose value is always equal to 1, whereas ⊥ suggests the
corresponding function equal to 0 everywhere. This matches the usual arithmetic values of these symbols.
However, on 24 January 2009, the author found the exact same notation in use in an article on formal proof
(Harrison [161]).
4.3.22 Remark: Definition 4.3.23 introduces “tautological” or “tautologous” compound abstract propositions,
also known simply as tautologies. Compound abstract propositions contain zero or more atomic
proposition names. The truth value t(α) of a compound proposition α is some function f : {F, T}^n → {F, T}
of the truth values of the n component propositions, where n is a non-negative integer. Thus t(α) =
f(t(A1), t(A2), . . . t(An)), where A1, A2, . . . An are the component propositions. A compound proposition
is a tautology if and only if f(t1, t2, . . . tn) = T for all n-tuples (t1, t2, . . . tn) ∈ {F, T}^n.
[ Remark 4.3.22 seems excessively tedious. Fix this. ]

4.3.23 Definition: A tautology is a compound abstract proposition whose truth value is “true” for all
possible combinations of truth values for its atomic propositions.
A contradiction is a compound abstract proposition whose truth value is “false” for all possible combinations
of truth values for its atomic propositions.
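Definition 4.3.23 translates directly into an exhaustive truth table test. In the following Python sketch, a compound proposition is represented by a Python function of n Boolean arguments; the names truth_table, is_tautology and is_contradiction are hypothetical choices for this illustration.

```python
from itertools import product

def truth_table(f, n):
    """List the values of f over all 2**n combinations of truth values."""
    return [f(*values) for values in product([False, True], repeat=n)]

def is_tautology(f, n):
    """True if f is true for every combination of its n arguments."""
    return all(truth_table(f, n))

def is_contradiction(f, n):
    """True if f is false for every combination of its n arguments."""
    return not any(truth_table(f, n))

print(is_tautology(lambda a: a or not a, 1))       # True:  A ∨ ¬A
print(is_contradiction(lambda a: a and not a, 1))  # True:  A ∧ ¬A
print(is_tautology(lambda a, b: (not a) or b, 2))  # False: A ⇒ B
print(is_tautology(lambda: True, 0))               # True:  zero atomic propositions
```

The last line shows the case n = 0, where the truth table has a single row.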
4.4. Logical expression evaluation and logical argumentation

4.4.1 Remark: The word “calculus” in “propositional calculus” and “predicate calculus” is perhaps mis-
leading. The word “algebra” might be a more accurate analogy. The proposition names (such as A, B, C)
in compound propositions (such as A ∧ (B ∨ C)) are analogous to variables (such as x, y, z) in algebraic
expressions (such as x(y + z)). The procedures of propositional calculus to determine truth values of com-
pound propositions are analogous to the procedures of algebra to determine values of algebraic expressions.
So probably it would be more helpful to talk about “propositional algebra” and “predicate algebra”. In fact,
an early attempt to build symbolic logic in the 19th century is now referred to as “Boolean algebra”.
4.4.2 Remark: In the case of algebra, there are two basic tasks. One is easy. The other is difficult.
(1) Simple calculation: Evaluation of algebraic expressions, given the values of the individual variables.
For example, evaluate 2x + 3y, given that x = 10 and y = 15.
(2) Inverse problems: Inference of the values of individual variables, given the values of algebraic expres-
sions. For example: solve for x and y, given that 2x + 3y = 65 and 5x − 2y = 20.
In logic, there are two analogous basic tasks.
(1) Simple calculation: Evaluation of truth-values of logical expressions, given the truth-values of indi-
vidual logical variables. For example, determine the truth value of A ∧ (B ∨ C), given that A and C are
true and B is false.
(2) Inverse problems: Inference of the truth-values of individual logical variables, given the truth-values
of logical expressions. For example, determine the possible truth-values of A, B and C, given that
A ∧ (B ∨ C) is true and (A ∨ B) ∧ C is false. (See Exercise 46.1.3 for solution.)
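Both logic tasks can be sketched in a few lines of Python for the examples just given. The function names e1 and e2 are hypothetical labels for the expressions A ∧ (B ∨ C) and (A ∨ B) ∧ C. The inverse problem is solved here by brute-force enumeration of all eight truth value combinations, which is only one possible method and is not taken from the exercise cited above.

```python
from itertools import product

def e1(A, B, C):
    """The expression A ∧ (B ∨ C)."""
    return A and (B or C)

def e2(A, B, C):
    """The expression (A ∨ B) ∧ C."""
    return (A or B) and C

# Task (1): simple calculation with A and C true and B false.
print(e1(True, False, True))  # True

# Task (2): inverse problem, find all (A, B, C) with e1 true and e2 false.
solutions = [
    (A, B, C)
    for A, B, C in product([False, True], repeat=3)
    if e1(A, B, C) and not e2(A, B, C)
]
print(solutions)  # [(True, True, False)]
```

The forward calculation needs one evaluation, whereas the enumeration needs 2^n evaluations for n logical variables.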
The rules of propositional calculus are reminiscent of the manipulation rules of algebra, which gradually
reduce given equations to a desired form. Just as the rules of algebra ensure that the transformed equations
have the same solutions as the initial equations which are given, so also the rules of propositional calculus
ensure that the set of combinations of truth values for logical variables which satisfy the equations are not
altered by application of the deduction rules. The rules of propositional calculus are required to verifiably
leave the set of truth value combinations unchanged. This is usually expressed as the requirement that the
deduction rules must never permit false conclusions from true premisses.
One may make a further analogy with calculus. There are two well-known branches of calculus.
(1) Simple calculation: Differentiation of functional expressions. In other words, the differential calculus.
For example, calculate the derivative of exp(−x²).
(2) Inverse problems: Determination of functional expressions whose derivatives are as specified. In other
words, the integral calculus. For example: determine the form of f (x), given that the derivative of f
is −2x exp(−x²) for all x ∈ ℝ.
As in the case of algebra and logic, so also in calculus, case (1) is a straightforward calculation (requiring only
a bounded, finite number of steps), whereas case (2) is more problematic, sometimes requiring an unbounded
or infinite number of steps, and sometimes being impossible to solve at all. In the propositional calculus,
case (2) requires only a finite number of steps, although finding them may be quite difficult. (All conjectural
theorems can nevertheless be easily tested with truth tables by virtue of the deduction theorem in Section 4.9.) In the predicate
calculus, case (2) can sometimes require an unbounded number of steps or could even be impossible to solve.
(The predicate calculus is effectively the propositional calculus for infinite families of logical variables. So
the difficulty of predicate calculus is not very surprising.)
4.4.3 Remark: As an example of the observation in Remark 4.4.2 that the rules of deduction in propositional
calculus may be regarded as techniques of solution of simultaneous logical equations, consider the modus
ponens rule described in Remark 4.6.1. This is equivalent to solving for the truth value t(B), given the
truth values t(A ⇒ B) = T and t(A) = T. Modus ponens is, in fact, similar to the substitution rule in
real-number algebra. Since the truth value of A is T, this may be substituted into the true expression
A ⇒ B to yield t(B) = T.
4.4.4 Remark: It may be argued that the propositional calculus is a terrible waste of time and energy
because it can be done more simply and quickly with truth tables. It is true that all PC theorems may be
easily determined to be provable or not provable by the use of simple algorithms such as truth tables.
Every assertion may be easily converted to an equivalent single wff which is a tautology if and only if the
assertion is provable. Whether a wff is a tautology can be determined by a simple algorithm which evaluates
the truth or falsity of the wff for each of the possible 2^n combinations of truth values of the n proposition
names in the wff. (This approach may be given the name “tabular exhaustive enumeration”.) Therefore the
propositional calculus could be abandoned in favour of simple calculation.
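The “tabular exhaustive enumeration” algorithm may be sketched in Python as follows. The representation of a wff as a nested tuple, and the names names, evaluate and is_tautology, are assumptions made for this illustration only; they do not appear elsewhere in this book.

```python
from itertools import product

def names(wff):
    """Return the set of proposition names occurring in a wff."""
    if isinstance(wff, str):
        return {wff}
    return set().union(*(names(sub) for sub in wff[1:]))

def evaluate(wff, env):
    """Evaluate a wff, given a dict env mapping proposition names to bools."""
    if isinstance(wff, str):
        return env[wff]
    op, args = wff[0], [evaluate(sub, env) for sub in wff[1:]]
    if op == "not":
        return not args[0]
    if op == "and":
        return args[0] and args[1]
    if op == "or":
        return args[0] or args[1]
    if op == "implies":
        return (not args[0]) or args[1]
    raise ValueError(f"unknown operator: {op!r}")

def is_tautology(wff):
    """Test all 2**n truth-value combinations of the n names in the wff."""
    ps = sorted(names(wff))
    return all(
        evaluate(wff, dict(zip(ps, values)))
        for values in product([False, True], repeat=len(ps))
    )

# The assertion "from A and A ⇒ B, deduce B" as the single wff (A ∧ (A ⇒ B)) ⇒ B.
mp_wff = ("implies", ("and", "A", ("implies", "A", "B")), "B")
print(is_tautology(mp_wff))
print(is_tautology(("implies", "A", "B")))
```

The running time grows as 2^n in the number n of proposition names, which is acceptable for the small formulas of this chapter but not for large ones.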
A counter-argument to this argument is the fact that the predicate calculus (involving existential and uni-
versal quantifiers) is not so easy to analyze by simple algorithms. When the axioms of set theory are added
to the predicate calculus, there seems to be no kind of “truth table” to facilitate the task of determining
which statements are true or false. (A similar comment is made in Remark 4.14.3.) Since the propositional
calculus is required as part of both the predicate calculus and set theory, it is beneficial to have a uniform
approach to all three topics: propositional calculus, predicate calculus and set theory.
It is most economical to use the axiomatic approach for all three levels. The only reason the truth-table
approach is possible for propositional calculus is the fact that PC expressions represent only a finite number
of propositions. (In predicate calculus, a single logical expression often signifies an infinite number of propo-
sitions involving an infinite number of individual variables.) Truth tables are useful for teaching purposes,
but have limited applicability in serious mathematical deduction. Truth tables are more closely associated
with the “simple calculation” tasks described in part (1) in Remark 4.4.2 than with the “inverse problems”
tasks in part (2).
[ Check whether there might actually be an effective truth-table method of testing all predicate calculus
assertions for validity. ]
4.5. Propositional calculus formalization
4.5.1 Remark: Propositional calculus is a formalization of the methods of argument which are used for
solving simultaneous logic equations. By formalizing the text-level procedures which are observed in the
informal methods of solution, it is possible to dispense with the semantics and perform all calculations
without any reference to the meaning of the symbols whatsoever. Thus one may regard propositional
calculus as a semantics-free framework for solving logic problems. In other words, propositional calculus is
semantics-free logic.
4.5.2 Remark: There are hundreds of reasonable ways to develop the propositional calculus. Axiomatic
systems for propositional calculus typically have the following components.
(i) Operators: The set of primitive symbols such as operators. Typical symbols are ⇒, ∧, ∨ and ¬. These
symbols are called the “primitive connectives” of the system. Sometimes the parentheses “(” and “)”
may be defined as primitive symbols although their function is usually only for the grouping of symbols
into sub-formulas to make operands unambiguous.
(ii) Name space: The sets of permitted labels for propositions and well-formed formulas. The labels for
propositions are called the “statement names” of the system. They are typically single letters, with
or without subscripts. (Presumably the labels for well-formed formulas could be referred to as “wff
names”?)
(iii) Syntax rules: The rules which decide the syntactic correctness of logical sentences. A permitted
sentence is called a “well-formed formula”, abbreviated to wff or wf. (A wff may also be referred to as
a “statement form”.)
(iv) Axioms: The set of axioms or axiom schemas. Axiom schemas are templates into which arbitrary
well-formed formulas may be substituted.

(v) Inference rules: The set of rules of inference. Typical inference rules are “modus ponens” and “reductio
ad absurdum”. (See Remark 4.6.8 for the significance and origin of the phrase “modus ponens”.)
4.5.3 Remark: It is conventional to typeset Latin phrases in italics, e.g. modus ponens and modus tollens,
so that readers will not waste their time looking up these phrases in English dictionaries, although many
common Latin words and phrases are defined in English dictionaries.
4.5.4 Remark: The relations between the statements, statement names, statement forms, statement-form
names and statement-form-name forms in Remark 4.5.2 are illustrated in Figure 4.5.1.
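[ Figure 4.5.1 shows five layers, listed here from the top layer 5 down to the bottom layer 1. ]
layer 5: statement form letter forms (compound syntactic variables): α ⇒ β, β ⇒ γ, γ ∨ δ, ¬δ
layer 4: statement form letters (syntactic variables): α, β, γ, δ
layer 3: statement forms (abstract proposition formulas): A ∧ B, A ∧ (B ∨ C), B ∨ C, ¬C
layer 2: statement letters (abstract propositions): A, B, C, D
layer 1: statements (concrete propositions): statement 1, statement 2, statement 3, statement 4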
Figure 4.5.1 Statements, names and forms
The statements (or propositions) in layer 1 in Figure 4.5.1 are in some externally defined concrete proposition
space. (See Remark 4.1.1 for concrete spaces of propositions.) Concrete atomic propositions may be transistor
voltages, natural-language statements, symbolic mathematics statements, or any other kinds of two-state
components of systems.
The statement names (or proposition names) in layer 2 are associated in an implementation-dependent
fashion with concrete propositions. Two statement names may refer to the same concrete proposition. The
association may vary over time. In fact, there is no need to have even an equality relation on the space of
concrete propositions.
In the terminology of Remark 3.3.4, the proposition names in layer 2 belong to the “discussion context”
whereas the concrete propositions in layer 1 belong to the “discussed context”. Layer 3 also belongs to
the discussion context. (Layers 4 and 5 belong to a meta-discussion context, which discusses the layer 2/3
context.)
It is not necessary to have a well-defined concept of truth and falsity in layer 1, the space of concrete
statements. In fact, the entities in layer 1 don’t need to have any sort of two-state attributes at all. Layer 2
in Figure 4.5.1 does have a crisp, sharp notion of truth and falsity, but this is not a contradiction. The upper
four layers belong to a discussion context for layer 1, which is the discussed context. Conclusions which are
arrived at in layer 2 can only be expressed in layer 1 if layer 1 has a full set of two-state attributes and logical
expressions.
In layer 3, the atomic abstract proposition names in layer 2 are combined into logical compounds. This
is the layer in which propositional calculus does its work. The compound abstract expressions in layer 3
may or may not be associated with equivalent compound expressions in layer 1. Some concrete proposition
domains do not support general logical expressions, in which case layer 3 may be regarded as an extension
of the concrete proposition domain.

In layer 4, the meta-logical analysis of compound abstract propositions is facilitated by associating wff names
with compound propositions in layer 3. This association is arbitrary except to the extent specified in the
discussion context. The association is not necessarily fixed. Wff names are also known as “syntactical
variables” (Shoenfield [169], page 7), “metalogical variables” (Lemmon [164], page 49) or “statement letters”
(Mendelson [165], page 15).
In layer 5, the syntactic variables in layer 4 are combined meta-logically into compounds of compounds.
When the values of the syntactic variables are substituted into an expression in layer 5, the result is an
expression in layer 3.
Compound logical statements (in layer 3) are often explained as having their truth-values determined by
the truth-values of atomic logical statements (in layer 2). In practice, the reverse is very often the case. In
other words, the truth-values of compound statements are given, and the task is to solve for the truth-values
of either the atomic components or other compound statements. This is analogous to the tasks undertaken
in algebra. One is generally given relations between variables, and the task is to solve for the individual
variables or other relations between variables. (The daily tasks of PDE analysis are similar, where one is
given a PDE plus some boundary and/or initial values, and the task is to find the set of all solutions, except
that mostly exact solutions cannot be computed; only the properties of solutions can be determined in most
practical cases.)
The names in each of the upper four layers in Figure 4.5.1 refer to individual entities in the corresponding
lower layer.
4.5.5 Remark: Truth and falsity do not necessarily have a crisp, well-defined status in layer 1 of
Figure 4.5.1 in Remark 4.5.4. If the propositions in layer 1 are defined within a predicate calculus or first-
order language (or some other symbolic logic context), then the truth values will be well defined. But if the
propositions are in natural English language, truth values will generally be fuzzy and uncertain. Similarly,
if the propositions are voltages of electronic logic circuits, the truth values can be indeterminate under some
circumstances. Truth values are well defined in the upper four layers of Figure 4.5.1.
4.5.6 Remark: Definition 4.5.7 is fairly woolly because it must be explained in terms of naive logic, naive
sets and naive numbers. To put it more briefly, for any non-negative integer n, α1, α2, . . . αn ⊢ β means that
the conclusion β can be deduced from the assumptions α1, α2, . . . αn. The assertion symbol “⊢” may
be given a subscript to indicate which propositional calculus is used for the deduction.
4.5.7 Definition: An assertion in a propositional calculus X is an ordered pair (α, β) such that α is a
finite set of zero or more wffs, and β is a wff, and for some logical argument in X, the wff β is the conclusion
and the wffs in α are the assumptions.
4.5.8 Notation: α1, α2, . . . αn ⊢_X β, for non-negative integers n, denotes the assertion in propositional
calculus X of the pair (α, β), where α = {α1, α2, . . . αn}.
The subscript X may be omitted. Thus α1, α2, . . . αn ⊢ β, for non-negative integers n, denotes the assertion
in a propositional calculus (which is implied in the context) of the pair (α, β), where α = {α1, α2, . . . αn}.
[ The two-way assertion symbol in Notation 4.5.9 looks clumsy. The vertical lines are too close together and
the horizontal dashes are too long. See Lemmon [164], page 34. It shouldn’t look like an electronic circuit
diagram symbol for a capacitor. ]
4.5.9 Notation: α ⊣⊢ β denotes the assertion of both α ⊢ β and β ⊢ α.
4.6. Deduction rules

[ It might be a good idea to summarize all of the propositional calculus rule-sets and axiom-sets which appear
in various textbooks. The different systems should at least be listed in a table. ]
4.6.1 Remark: The modus ponens rule means that whenever a line of the form
(n1 ) α
appears on line (n1 ) in an argument, for any wff α, and the line

(n2 ) α ⇒ β
appears in the same argument on line (n2 ), for any wff β, then the line
(n3 ) β MP (n1 ,n2 )
may be written on any line (n3 ) in the same argument if n1 < n3 and n2 < n3 . To assist checking for errors,
it is conventional to indicate the two inputs to the rule in some such way as “MP (n1 ,n2 )” as indicated.
4.6.2 Remark: The modus ponens rule is sufficient for all deduction in the propositional calculus if a
small set of axiom schemas is assumed. However, the MP rule gives very strong emphasis to the conditional
operator “⇒”. This is an asymmetric operator which is not quite as intuitive in meaning as the and-
operator “∧” and the not-operator “¬”. The PC axioms are somewhat unintuitive when expressed in terms
of the conditional operator. One may therefore ask whether a rule equivalent to MP may be implemented
in terms of the ∧ and ¬ operators.
Consider the following form of deduction: α, ¬(α ∧ ¬β) ⊢ β
(1) α Assumption 1
(2) ¬β Assumption
(3) α ∧ ¬β AND (1,2)
(4) ¬(α ∧ ¬β) Assumption 2
(5) (α ∧ ¬β) ∧ ¬(α ∧ ¬β) AND (3,4)
(6) β RAA (2,5)
Thus a combination of the and-introduction rule and reductio ad absurdum yields the same result as MP,
because ¬(α ∧ ¬β) has the same meaning as α ⇒ β. The and-introduction rule is really nothing other than
the definition of the and-operator. So RAA is effectively of equivalent strength to modus ponens. Therefore
the propositional calculus could be written in terms of the and-operator and the RAA rule.
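The claim that ¬(α ∧ ¬β) has the same meaning as α ⇒ β can be checked by comparing truth tables. The following Python fragment is a minimal sketch of such a check; the names IMPLIES, not_and_not and same_table are hypothetical and are introduced only for this illustration.

```python
from itertools import product

# Truth table of the conditional operator, written out explicitly.
IMPLIES = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

def not_and_not(a, b):
    """Truth value of ¬(α ∧ ¬β), using only the and-operator and not-operator."""
    return not (a and not b)

# Compare the two truth tables on all four combinations of truth values.
same_table = all(
    not_and_not(a, b) == IMPLIES[(a, b)]
    for a, b in product([False, True], repeat=2)
)
print(same_table)
```

This verifies only the semantic equivalence of the two expressions. It says nothing about how conveniently a whole axiom system can be written in terms of the and-operator.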
[ Write out the axioms of propositional calculus in terms of the and-operator and the RAA rule. ]
4.6.3 Remark: Some deduction rules may be interpreted as definitions of particular logical operators.
For example, the modus ponens rule is effectively the definition of the logical implication operator “⇒”.
The AND-introduction and AND-elimination rules define the conjunction operator “∧”. The RAA rule is
effectively a definition of the logical negation operator “¬”.
4.6.4 Remark: The modus ponens inference rule is powerful enough to be used essentially alone as the
sole inference rule in part (v) of Remark 4.5.2. Many popular axiomatic systems use only this single rule.
Logicians often strive to reduce propositional calculus to a spartan, minimalist set of inference rules, axioms
and primitive symbols. The more spartan the axiomatic system is, the more difficult it is to deduce the basic
properties of the operators listed in Notation 4.3.3.
The use of a minimal axiomatic system can be justified on the grounds of reliability. The smaller the
definition of the system, the easier it should be to verify that the system is valid in terms of one’s idea of
how logic should be done. However, the intuitive correctness of the system is difficult to establish if the
operators, axioms or rules seem unfamiliar because they are excessively abstracted.
An extreme form of minimalist axiomatic system uses the NAND (not-and) operator (which is equivalent to
the “Sheffer stroke” or “alternative denial”) as the only operator symbol, with a single axiom and a single
inference rule. It is somewhat burdensome to have to derive the familiar standard logic from such an austere
foundation. It also suffers from a serious lack of intuitive comprehensibility. A very similar minimalist
axiomatic system uses the NOR (not-or) operator (also known as “joint denial”) as its only operator symbol.
(These two operators correspond closely to the way a one-transistor circuit can be made to function as a
logic device.)
Since the modus ponens inference rule is so popular, and this rule specifically requires wffs of the form α ⇒ β,
it seems sensible for even a minimalist axiomatic system to include the ⇒ symbol as part of its symbol set in
part (i) of Remark 4.5.2. The complete set of logical operations cannot be defined in terms of the ⇒ symbol
alone. In fact, among the binary operators, only NAND and NOR can each generate the full set of logic
operations from a single operator. (See Remark 4.11.3 for related comments.)
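The way in which the NAND operator generates the other familiar operators may be sketched in Python as follows. The particular constructions chosen for not_, and_, or_ and implies_ are one possible choice among several, and the function names are hypothetical.

```python
from itertools import product

def nand(a, b):
    """The NAND (not-and) operator, also called alternative denial."""
    return not (a and b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return nand(nand(a, b), nand(a, b))

def or_(a, b):
    return nand(nand(a, a), nand(b, b))

def implies_(a, b):
    return nand(a, nand(b, b))

# Check each NAND-based construction against the usual truth-value definition.
all_agree = all(
    not_(a) == (not a)
    and and_(a, b) == (a and b)
    and or_(a, b) == (a or b)
    and implies_(a, b) == ((not a) or b)
    for a, b in product([False, True], repeat=2)
)
print(all_agree)
```

Even these small constructions show how quickly expressions grow when everything is reduced to a single operator.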

[ Jean Nicod (1917) showed that the single axiom schema (α|(β|γ))|((δ|(δ|δ))|((ε|β)|((α|ε)|(α|ε)))) is sufficient
for generating standard propositional calculus, using the single inference rule α, (α|(β|γ)) ⊢ γ. See
Mendelson [165], page 42, where this NAND operator is called “alternative denial”. ]
4.6.5 Remark: The primacy of the implication operator in propositional calculus may seem inevitable,
but it is valuable to consider why other operators cannot easily fulfil the same role.
Since ancient Greek times, syllogisms such as “all people are mortal; Socrates is a person; therefore Socrates
is mortal” have dominated logic. Such a deduction is equivalent to modus ponens, which is equivalent to the
rule “α, α ⇒ β ⊢ β”. But rules and theorems of the form “α1, α2, . . . αn ⊢ β” are equivalent to tautologies of the form
“α1 ⇒ (α2 ⇒ . . . (αn ⇒ β) . . .)”. This seems to support the claim that the implication operator is special
in some way for logical deduction. The assertion symbol “⊢” appears to be essentially equivalent to an
implication operator.
Our method of logical deduction generally proceeds from a set of pre-established facts or rules, and proceeds
to apply these to special circumstances. As a result, logical deduction tends to have the form of substituting
special-instance information into general rules, and such deduction looks very much like modus ponens:
“α, α ⇒ β ⊢ β”, where α ⇒ β is the general rule, α is the special-instance information, and β is the
proposition which is deduced from the rule when it is applied to α.
The possibility of replacing implication with conjunction as the primary operator for propositional calculus
is briefly discussed in Remark 4.6.2. However, although this may be technically possible, it would not be a
natural model for human deduction. Disjunction would be a much more likely candidate for logical operator
primacy, but only because disjunction is very similar to implication. Real-world literature is generally
expressed much more in terms of implications than disjunctions. So this seems to tip the balance in favour
of the implication operator.
Implication rules can be chained together to make new rules. For example, α ⇒ β, β ⇒ γ, γ ⇒ δ ⊢ α ⇒ δ.
This cannot be done with conjunctions or disjunctions because they are symmetric. In an aesthetic sense,
the use of symmetric operators would be very pleasing. Symmetry is normally a sought-after attribute in
mathematical theories. But in this case, the asymmetry is apparently highly desirable for utilitarian reasons.
Deduction is itself asymmetric in nature.
4.6.6 Remark: Tautologies such as “(A ∧ (A ⇒ B)) ⇒ B”, which have the ⇒ symbol as the primary
operator, are often written interchangeably in the form of a theorem like “(A ∧ (A ⇒ B)) ⊢ B”. The
concept of the assertion symbol ⊢ can be guessed by noting that the following four statements are
essentially interchangeable.
⊢ (A ∧ (A ⇒ B)) ⇒ B
A ∧ (A ⇒ B) ⊢ B
A, (A ⇒ B) ⊢ B
A ⊢ (A ⇒ B) ⇒ B
A list of zero or more propositions to the left of the assertion symbol is the set of “assumptions”. The single
proposition to the right of the assertion symbol is the “assertion”. It is asserted that the assertion can be
deduced from the assumptions. In principle, all theorems in mathematics may be written in this way, but
such a rigorously correct notation is not popular among mathematicians although strict symbolic logic would
make theorem statements much less ambiguous. The assertion symbol “⊢” is rarely used in this book except
in the logic and set theory chapters.
4.6.7 Remark: The modus ponens inference rule may be thought of very roughly as replacing the ⇒
operator with the ⊢ symbol. The reverse replacement is known as the “deduction theorem”. According to
this metatheorem, the theorem Γ ⊢ A ⇒ B may be proved if the theorem Γ, A ⊢ B can be proved, for any
list Γ of wffs. (See Section 4.9.)
[ Maybe arrange the four modes in Remark 4.6.8, and their rough meanings, in a table. ]
4.6.8 Remark: The phrase modus ponens is an abbreviation for the medieval reasoning principle called
modus ponendo ponens. This was one of the following four principles of reasoning. (See Lemmon [164],
page 61.)

(i) Modus ponendo ponens. Roughly speaking: A, A ⇒ B ⊢ B.
(ii) Modus tollendo tollens. Roughly speaking: ¬B, A ⇒ B ⊢ ¬A.
(iii) Modus ponendo tollens. Roughly speaking: A, ¬(A ∧ B) ⊢ ¬B.
(iv) Modus tollendo ponens. Roughly speaking: ¬A, A ∨ B ⊢ B.
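The validity of the four modes listed above can be confirmed semantically by truth tables. The following Python sketch does this; the helper name valid and the dictionary modes are hypothetical, and the check is a truth-table check of the rough forms given above, not a derivation within any particular axiomatic system.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def valid(premises, conclusion):
    """True if the conclusion holds in every row where all premises hold."""
    return all(
        conclusion(a, b)
        for a, b in product([False, True], repeat=2)
        if all(p(a, b) for p in premises)
    )

modes = {
    "ponendo ponens": ([lambda a, b: a, implies], lambda a, b: b),
    "tollendo tollens": ([lambda a, b: not b, implies], lambda a, b: not a),
    "ponendo tollens": ([lambda a, b: a, lambda a, b: not (a and b)],
                        lambda a, b: not b),
    "tollendo ponens": ([lambda a, b: not a, lambda a, b: a or b],
                        lambda a, b: b),
}
results = {name: valid(prem, concl) for name, (prem, concl) in modes.items()}

# For contrast, the invalid pattern "from B and A ⇒ B, deduce A".
affirming_consequent = valid([lambda a, b: b, implies], lambda a, b: a)

print(results)
print(affirming_consequent)
```

All four modes pass the check, whereas the contrasting pattern fails in the row where A is false and B is true.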
The Latin words in these reasoning-mode names come from the verbs “ponere” and “tollere”. The verb
“ponere” (which means “to put”) has the following particular meanings in the context of logical argument.
(See White [218], page 474.)
(1) [ In speaking or writing: ] To lay down as true; to state, assert, maintain, allege.
(2) To put hypothetically, to assume, suppose.
Meaning (1) is intended in the word “ponens”. Meaning (2) is intended in the word “ponendo”. Thus the
literal meaning of “modus ponendo ponens” is “assertion-by-assumption mode”. In other words, when A is
assumed, B may be asserted.
The Latin verb “tollere” (which means “to lift up”) has the following figurative meanings. (White [218],
page 612.)
To do away with, remove; to abolish, annul, abrogate, cancel.
Mode (ii) (“modus tollendo tollens”) may be translated “negative-assertion-by-negative-assumption mode”
or “negation-by-negation mode”. In other words, when B is assumed to be false, A may be asserted to be
false. Effectively “tollendo” means “by negative assumption” while “tollens” means “negative assertion”.
Mode (iii) (“modus ponendo tollens”) may be translated “negative-assertion-by-positive-assumption mode”
or “negation-by-assumption mode”. When A is assumed to be true, B may be asserted to be false.
Mode (iv) (“modus tollendo ponens”) may be translated “positive-assertion-by-negative-assumption mode”
or “assertion-by-negation mode”. In this case, when A is assumed to be false, B may be asserted to be true.
Since modes (iii) and (iv) are rarely used as inference rules, the more popular modes (i) and (ii) are generally
abbreviated to simply “modus ponens” and “modus tollens” respectively.
4.7. An implication-based propositional calculus
The subject of this section is the particular formulation of propositional calculus which is described in
Definition 4.7.4. This axiomatic system is adopted as the basis for logic theorems which are required in this
book.
[ Possibly also have a section which presents a NAND-based propositional calculus. This would be purely
recreational. Probably not worth the ink. ]
4.7.1 Remark: Definition 4.7.4 is a compromise between the ascetic, impoverished NAND-based axiom
system (mentioned in Remark 4.6.4) and an easy-going 4-symbol, 10-axiom system. Definition 4.7.4 defines
a two-operator axiomatic system with ⇒ and ¬ as primitive symbols, together with three axiom schemas,
and modus ponens as the sole inference rule. This one-rule, two-symbol, three-axiom system is less tedious
to work with than the NAND and NOR logics, but it is still hard work to generate the basic properties of the
symbols in Notation 4.3.3 from it. This axiomatic system is attributed to Jan Łukasiewicz. (It is described
by Mendelson [165], pages 30–31.)
[ Present a 4-symbol, 10-axiom, 1-rule system and refer to this in Remark 4.7.1. ]
[ The single axiom schema ((((α ⇒ β) ⇒ (¬γ ⇒ ¬δ)) ⇒ γ) ⇒ ε) ⇒ ((ε ⇒ α) ⇒ (δ ⇒ α)) was shown
by C. A. Meredith (1953) to be equivalent to the three-axiom set in Definition 4.7.4. This is mentioned in
Mendelson [165], page 42. ]
4.7.2 Remark: Logic theorems which are deduced from the propositional calculus axiomatic system in
Definition 4.7.4 will be tagged with the abbreviation “PC”. For example, see Theorem 4.8.3. (Theorems
which are not tagged are, by default, derived from Zermelo-Fraenkel set theory. See Remark 5.0.10.)
[ Get a reference for who invented the axiom system in Definition 4.7.4. Probably Jan Łukasiewicz. ]

[ One source gives a simpler version of the third axiom schema (PC 3) in Definition 4.7.4: (¬β ⇒ ¬α) ⇒
(β ⇒ α). Check that this is valid and find out why Mendelson gives the more complicated axiom schema.
It must be shown that this modified axiom, together with axioms (PC 1) and (PC 2), implies axiom (PC 3)
as stated. ]
4.7.3 Remark: Definition 4.7.4 is organized as parts (i) to (v) corresponding to the five parts of Re-
mark 4.5.2.
4.7.4 Definition: The following is an axiom system for propositional calculus.

(i) The primitive connectives are ⇒ (“implies”) and ¬ (“not”). The grouping parentheses are “(” and “)”.
(ii) The statement names are the upper-case letters of the Roman alphabet, A to Z, with or without decimal
integer subscripts. The “wff names” are the lower-case letters of the Greek alphabet, with or without
decimal integer subscripts.
(iii) Any statement name is a wff. For any wff α, the formula (¬α) is a wff. For any wffs α and β, the
formula (α ⇒ β) is a wff. Any formula which cannot be constructed by recursive application of these
rules is not a wff. (For clarity, parentheses may be omitted in accordance with precedence rules.)
(iv) The axiom schemas are as follows:
(PC 1) α ⇒ (β ⇒ α).
(PC 2) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)).
(PC 3) (¬β ⇒ ¬α) ⇒ ((¬β ⇒ α) ⇒ β).
(v) The only inference rule is modus ponens. (See Remark 4.6.1.)
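As a plausibility check on Definition 4.7.4, one may confirm by truth tables that every instance of the three axiom schemas is a tautology and that modus ponens never leads from true premises to a false conclusion. The Python sketch below does this under the assumption that ⇒ and ¬ have their usual truth-value interpretations; the function names are hypothetical. It is sufficient to range over the truth values of α, β and γ, because the truth value of a schema instance depends only on the truth values of the wffs substituted into it.

```python
from itertools import product

BOOLS = [False, True]

def imp(a, b):
    """Usual truth-value interpretation of the ⇒ operator."""
    return (not a) or b

def pc1(a, b):
    return imp(a, imp(b, a))

def pc2(a, b, c):
    return imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c)))

def pc3(a, b):
    return imp(imp(not b, not a), imp(imp(not b, a), b))

pc1_ok = all(pc1(a, b) for a, b in product(BOOLS, repeat=2))
pc2_ok = all(pc2(a, b, c) for a, b, c in product(BOOLS, repeat=3))
pc3_ok = all(pc3(a, b) for a, b in product(BOOLS, repeat=2))

# Modus ponens: whenever α and α ⇒ β are both true, β is true.
mp_ok = all(b for a, b in product(BOOLS, repeat=2) if a and imp(a, b))

print(pc1_ok, pc2_ok, pc3_ok, mp_ok)
```

This indicates only that the system cannot prove anything which fails the truth-table test. It does not show the converse, namely that every tautology can be derived from the axioms.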
[ Should prove somewhere that Definition 4.7.4 is consistent with the truth-value function properties of the
operators ⇒ and ¬. Also show that all properties of these operators may be derived from Definition 4.7.4.
In other words, show that Definition 4.7.4 is equivalent to the operators. ]
4.7.5 Remark: Axiom schemas (PC 1)–(PC 3) in Definition 4.7.4 are perhaps a little difficult to interpret.
Axiom (PC 1) is just one half of the definition of the ⇒ operator. Axiom (PC 2) looks like a “distributivity
axiom” or “restriction axiom”, which says how the ⇒ operator distributes over itself.
4.7.6 Remark: Axiom (PC 3) looks very much like a “reductio ad absurdum” axiom. It states that if the
assumption ¬β implies both ¬α and α, then the assumption ¬β must be false; in other words β must be
true.
4.7.7 Remark: The gradual build-up of theorems in an axiomatic system (such as propositional calculus,
predicate calculus or Zermelo-Fraenkel set theory) is analogous to the way programming procedures (also
called “functions”) are built up in software libraries. In both cases, there is an attempt to amass a hoard of
re-usable “intellectual capital” which can be used in a wide variety of future work. Consequently the work
gets progressively easier as “user-friendly” theorems (or programming procedures) accumulate over time.
Accumulation of “theorem libraries” sounds like a good idea in principle, but a single error in a single re-usable
item (a theorem or a programming procedure) can easily propagate to a very wide range of applications. In
other words, “bugs” can creep into re-usable libraries. It is for this reason that there is so much emphasis
on total correctness in mathematical logic. The slightest error could propagate to all of mathematics.
The development of the propositional calculus is also analogous to “boot-strapping” a computer operating
system. The propositional calculus is the lowest functional layer of mathematics. Everything else is based
on this substrate. Logic and set theory may be thought of as the “operating system” of mathematics. Then
differential geometry is a “user application” in this “operating system”.
4.8. Some propositional calculus theorems

4.8.1 Remark: Theorem 4.8.3 follows the usual informal approach to proofs in logic, which is to find proofs
for desired assertions, while building up a small library of useful intermediate assertions along the way. A
different approach would be to generate all possible assertions which can be obtained in n deduction steps
with all possible combinations of applications of the deduction rules. Then n can be gradually increased to
discover all possible theorems. This is analogous to finding the span of a set of vectors in a linear space.
However, such a systematic, exhaustive approach is about as useful as generating the set of all possible chess
games according to the number of moves n. As n increases, the number of possible games increases very
rapidly. But a more serious problem is that the vast majority of such games are worthless. Similarly in
logic, the vast majority of true assertions are uninteresting. (A similar comment is made in Remark 2.9.1.)
4.8.2 Remark: The order of assertions in Theorem 4.8.3 is chosen so that earlier assertions assist in the
proof of later assertions. Although the proof is long, and quite stressful if you’re out of practice, it does
demonstrate some of the flavour of propositional calculus. After reading the proofs of the first one or two
parts of the theorem, the reader may like to find proofs for the other parts without looking at the solutions
provided here. Another useful exercise is to try to find shorter proofs than those which are given here.
4.8.3 Theorem [pc]: The following assertions follow from Definition 4.7.4.
(i) ⊢ α ⇒ α.
(ii) ⊢ (¬α ⇒ α) ⇒ α.
(iii) α ⇒ β, β ⇒ γ ⊢ α ⇒ γ.
(iv) α ⇒ (β ⇒ γ) ⊢ β ⇒ (α ⇒ γ).
(v) ⊢ (¬β ⇒ ¬α) ⇒ (α ⇒ β).
(vi) ⊢ ¬¬α ⇒ α.
(vii) α ⇒ β ⊢ ¬¬α ⇒ β.
(viii) ⊢ α ⇒ ¬¬α.
(ix) α ⇒ β ⊢ α ⇒ ¬¬β.
(x) α ⇒ β ⊢ ¬¬α ⇒ ¬¬β.
(xi) α ⇒ β ⊢ ¬β ⇒ ¬α.
(xii) α ⇒ ¬β ⊢ β ⇒ ¬α.
(xiii) ¬α ⇒ β ⊢ ¬β ⇒ α.
(xiv) ⊢ ¬α ⇒ (α ⇒ β).
(xv) ⊢ α ⇒ (¬α ⇒ β).
(xvi) ⊢ ¬(α ⇒ ¬β) ⇒ α.
(xvii) ⊢ ¬(α ⇒ ¬β) ⇒ β.
Proof: To prove part (i): ⊢ α ⇒ α

(1) α ⇒ ((α ⇒ α) ⇒ α) PC 1
(2) (α ⇒ ((α ⇒ α) ⇒ α)) ⇒ ((α ⇒ (α ⇒ α)) ⇒ (α ⇒ α)) PC 2
(3) (α ⇒ (α ⇒ α)) ⇒ (α ⇒ α) MP (1,2)
(4) α ⇒ (α ⇒ α) PC 1
(5) α⇒α MP (4,3)
To prove part (ii): ⊢ (¬α ⇒ α) ⇒ α
(1) ¬α ⇒ ¬α part (i)
(2) (¬α ⇒ ¬α) ⇒ ((¬α ⇒ α) ⇒ α) PC 3
(3) (¬α ⇒ α) ⇒ α MP (1,2)
To prove part (iii): α ⇒ β, β ⇒ γ ⊢ α ⇒ γ
(1) α⇒β Hyp
(2) β⇒γ Hyp
(3) (β ⇒ γ) ⇒ (α ⇒ (β ⇒ γ)) PC 1
(4) α ⇒ (β ⇒ γ) MP (2,3)

(5) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) PC 2

(6) (α ⇒ β) ⇒ (α ⇒ γ) MP (4,5)
(7) α ⇒ γ MP (1,6)
To prove part (iv): α ⇒ (β ⇒ γ) ⊢ β ⇒ (α ⇒ γ)
(1) α ⇒ (β ⇒ γ) Hyp
(2) β ⇒ (α ⇒ β) PC 1
(3) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) PC 2
(4) (α ⇒ β) ⇒ (α ⇒ γ) MP (1,3)
(5) β ⇒ (α ⇒ γ) part (iii) (2,4)
To prove part (v): ⊢ (¬β ⇒ ¬α) ⇒ (α ⇒ β)
(1) α ⇒ (¬β ⇒ α) PC 1
(2) (¬β ⇒ ¬α) ⇒ ((¬β ⇒ α) ⇒ β) PC 3
(3) (¬β ⇒ α) ⇒ ((¬β ⇒ ¬α) ⇒ β) part (iv) (2)
(4) α ⇒ ((¬β ⇒ ¬α) ⇒ β) part (iii) (1,3)
(5) (¬β ⇒ ¬α) ⇒ (α ⇒ β) part (iv) (4)
To prove part (vi): ⊢ ¬¬α ⇒ α
(1) ¬α ⇒ ¬α part (i)
(2) (¬α ⇒ ¬¬α) ⇒ ((¬α ⇒ ¬α) ⇒ α) PC 3
(3) (¬α ⇒ ¬α) ⇒ ((¬α ⇒ ¬¬α) ⇒ α) part (iv) (2)
(4) (¬α ⇒ ¬¬α) ⇒ α MP (1,3)
(5) ¬¬α ⇒ (¬α ⇒ ¬¬α) PC 1
(6) ¬¬α ⇒ α part (iii) (5,4)
To prove part (vii): α ⇒ β ⊢ ¬¬α ⇒ β
(1) α ⇒ β Hyp
(2) ¬¬α ⇒ α part (vi)
(3) ¬¬α ⇒ β part (iii) (2,1)
To prove part (viii): < α ⇒ ¬¬α
(1) ¬¬¬α ⇒ ¬α part (vi)
(2) (¬¬¬α ⇒ ¬α) ⇒ (α ⇒ ¬¬α) part (v)
(3) α ⇒ ¬¬α MP (1,2)
To prove part (ix): α ⇒ β < α ⇒ ¬¬β
(1) α ⇒ β Hyp
(2) β ⇒ ¬¬β part (viii)
(3) α ⇒ ¬¬β part (iii) (1,2)
To prove part (x): α ⇒ β < ¬¬α ⇒ ¬¬β
(1) α ⇒ β Hyp
(2) ¬¬α ⇒ β part (vii) (1)
(3) ¬¬α ⇒ ¬¬β part (ix) (2)
To prove part (xi): α ⇒ β < ¬β ⇒ ¬α
(1) α⇒β Hyp
(2) ¬¬α ⇒ ¬¬β part (x) (1)
(3) (¬¬α ⇒ ¬¬β) ⇒ (¬β ⇒ ¬α) part (v)
(4) ¬β ⇒ ¬α MP (2,3)

To prove part (xii): α ⇒ ¬β < β ⇒ ¬α

(1) α ⇒ ¬β Hyp
(2) ¬¬α ⇒ ¬β part (vii) (1)
(3) (¬¬α ⇒ ¬β) ⇒ (β ⇒ ¬α) part (v)
(4) β ⇒ ¬α MP (2,3)
To prove part (xiii): ¬α ⇒ β < ¬β ⇒ α
(1) ¬α ⇒ β Hyp
(2) ¬α ⇒ ¬¬β part (ix) (1)
(3) (¬α ⇒ ¬¬β) ⇒ (¬β ⇒ α) part (v)
(4) ¬β ⇒ α MP (2,3)
To prove part (xiv): < ¬α ⇒ (α ⇒ β)
(1) ¬α ⇒ (¬β ⇒ ¬α) PC 1
(2) (¬β ⇒ ¬α) ⇒ (α ⇒ β) part (v)
(3) ¬α ⇒ (α ⇒ β) part (iii) (1,2)
To prove part (xv): < α ⇒ (¬α ⇒ β)
(1) α ⇒ ¬¬α part (viii)
(2) ¬¬α ⇒ (¬α ⇒ β) part (xiv)
(3) α ⇒ (¬α ⇒ β) part (iii) (1,2)
To prove part (xvi): < ¬(α ⇒ ¬β) ⇒ α
(1) ¬α ⇒ (α ⇒ ¬β) part (xiv)
(2) ¬(α ⇒ ¬β) ⇒ α part (xiii) (1)
To prove part (xvii): < ¬(α ⇒ ¬β) ⇒ β
(1) ¬β ⇒ (α ⇒ ¬β) PC 1
(2) ¬(α ⇒ ¬β) ⇒ β part (xiii) (1)
This completes the proof of Theorem 4.8.3.
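As an informal cross-check, which is no substitute for the formal proofs above, the seventeen assertions of Theorem 4.8.3 can be tested against truth tables. The following is a minimal sketch in Python, a language which is not otherwise used in this book. The tuple encoding of wffs and the helper names Not, Imp, atoms, value and entails are assumptions made purely for illustration, the names a, b and c stand for α, β and γ, and the usual two-valued semantics for ⇒ and ¬ is assumed.

```python
from itertools import product

# Assumed encoding: a proposition name is a string, ("not", A) is ¬A,
# and ("imp", A, B) is A ⇒ B.
def Not(a): return ("not", a)
def Imp(a, b): return ("imp", a, b)

def atoms(f):
    """Return the set of proposition names occurring in the wff f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))

def value(f, env):
    """Evaluate the wff f under the truth assignment env (name -> bool)."""
    if isinstance(f, str):
        return env[f]
    if f[0] == "not":
        return not value(f[1], env)
    return (not value(f[1], env)) or value(f[2], env)

def entails(hyps, concl):
    """True if concl is true under every assignment which makes all hyps true."""
    names = sorted(set().union(atoms(concl), *(atoms(h) for h in hyps)))
    for bits in product([False, True], repeat=len(names)):
        env = dict(zip(names, bits))
        if all(value(h, env) for h in hyps) and not value(concl, env):
            return False
    return True

a, b, c = "a", "b", "c"
# Each entry is (list of wffs left of the assertion symbol, wff on the right).
theorem_4_8_3 = {
    "i": ([], Imp(a, a)),
    "ii": ([], Imp(Imp(Not(a), a), a)),
    "iii": ([Imp(a, b), Imp(b, c)], Imp(a, c)),
    "iv": ([Imp(a, Imp(b, c))], Imp(b, Imp(a, c))),
    "v": ([], Imp(Imp(Not(b), Not(a)), Imp(a, b))),
    "vi": ([], Imp(Not(Not(a)), a)),
    "vii": ([Imp(a, b)], Imp(Not(Not(a)), b)),
    "viii": ([], Imp(a, Not(Not(a)))),
    "ix": ([Imp(a, b)], Imp(a, Not(Not(b)))),
    "x": ([Imp(a, b)], Imp(Not(Not(a)), Not(Not(b)))),
    "xi": ([Imp(a, b)], Imp(Not(b), Not(a))),
    "xii": ([Imp(a, Not(b))], Imp(b, Not(a))),
    "xiii": ([Imp(Not(a), b)], Imp(Not(b), a)),
    "xiv": ([], Imp(Not(a), Imp(a, b))),
    "xv": ([], Imp(a, Imp(Not(a), b))),
    "xvi": ([], Imp(Not(Imp(a, Not(b))), a)),
    "xvii": ([], Imp(Not(Imp(a, Not(b))), b)),
}
failures = [k for k, (h, q) in theorem_4_8_3.items() if not entails(h, q)]
print(failures)  # an empty list means that every assertion passed the check
```

A truth-table check of this kind only confirms that the assertions are semantically valid. It says nothing about whether a proof from the three axiom schemas exists, which is the point of the proofs given above.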
4.8.4 Remark: It is fair to ask how one might discover the proof which is presented here for part (i) of
Theorem 4.8.3. The designer of this axiomatic system chose the axioms so that the propositional calculus
could only just be generated from them. The system designer had the unfair advantage of working backwards
from the theorems to the axioms.
For the non-specialist, the discovery of proofs is initially like solving some of those frustrating recreational
puzzles which are sold in puzzle shops. They are designed so that a solution exists, but is very difficult to
find. One might also compare the proof of basic theorems from minimal axiom sets to the deciphering of
encrypted messages where you know the answer (or “plain-text”), and you have to descramble the message
to arrive at the known answer. When proving logic theorems, one knows the answers, and the axioms are
like a compressed, encrypted version of the full set of basic logic theorems. Therefore the main benefit of
proving basic logic theorems is the acquisition of decryption skills, which hopefully will be applicable when
the “plain-text” is not known in advance.
The first thing one may notice about the first two axioms, (PC 1) and (PC 2), is that the “input” (the left
side of the top-level implication) of (PC 2) looks similar to the whole of (PC 1). Recognizing this, one can
write the following.
< (α ⇒ β) ⇒ (α ⇒ α)
(1) α ⇒ (β ⇒ α) PC 1
(2) (α ⇒ (β ⇒ α)) ⇒ ((α ⇒ β) ⇒ (α ⇒ α)) PC 2
(3) (α ⇒ β) ⇒ (α ⇒ α) MP (1,2)

Now the input of (3) is the same as the “output” (the right side of the top-level implication) of (PC 1) with
α and β swapped: β ⇒ (α ⇒ β). However, it is clearly not possible to prove that β is true for any wff β. So
this line of attack leads nowhere. However, if β is replaced in (3) with (β ⇒ α), the input of (3) becomes
the same as axiom (PC 1). This gives a valid result as follows.
<α⇒α
(4) α ⇒ (β ⇒ α) PC 1
(5) (α ⇒ (β ⇒ α)) ⇒ (α ⇒ α) (from assertion above)
(6) α ⇒ α MP (4,5)
Now that we know what we’re doing, we can combine the two above arguments into a single proof and
pretend that we discovered it exactly as presented.
<α⇒α
(1) α ⇒ ((β ⇒ α) ⇒ α) PC 1
(2) (α ⇒ ((β ⇒ α) ⇒ α)) ⇒ ((α ⇒ (β ⇒ α)) ⇒ (α ⇒ α)) PC 2
(3) (α ⇒ (β ⇒ α)) ⇒ (α ⇒ α) MP (1,2)
(4) α ⇒ (β ⇒ α) PC 1
(5) α⇒α MP (4,3)
If β is replaced by α, this is the same as the proof which was given for Theorem 4.8.3 (i).
4.8.5 Remark: A particular difficulty of propositional calculus theorem proving is knowing the best order
in which to attack a desired set of assertions. If the order is chosen well, the earlier assertions can be very
useful for proving the later assertions. But it is often only after the proofs are found that one can see a
better order of attack.
In practice, one typically has a single target assertion to prove and looks (recursively) for assertions which
can assist with this. If this backwards-deductive search leads to known results, the target assertion can be
proved. The natural order of discovery is typically the opposite of the order of deduction of propositions.
This is a constant feature of all mathematics research. Discovery of new results typically starts from the
conclusions and works back to old results. (Most of the assertions in Theorem 4.8.3 were proved by the
author in order to reach assertion (xvii), which happens to be equivalent to (α ∧ β) ⇒ α, an important
property of the ∧ operator. The author first sketched a proof of assertion (xvii) using unproven assertions
and then found proofs of these unproven assertions, which in turn required further unproven assertions, until
finally all required assertions were proved from axioms.)
4.8.6 Remark: Theorem 4.8.3 (iii) means that the conditional logical operator is transitive. This is equiv-
alent to the “a-fortiori” method of proof.
4.9. Meta-theorems and the “deduction theorem”

[ Possibly some of the discussion in this section should be moved to Chapter 3. ]
[ How many other PC meta-theorems are useful enough to present here? ]
4.9.1 Remark: As mentioned in Remark 4.9.4, all deduction rules may be regarded as meta-theorems.
This is because deduction rules are generally proved by meta-logical means to yield only true results from
true assumptions. Deduction rules are derived from more fundamental considerations such as the definitions
of logical operators. (Some logic textbooks may give the impression that the deduction rules are more
fundamental than logical operators.)
One example of a deduction rule which is really a meta-theorem is the substitution rule in propositional
calculus. This states that any compound proposition may be substituted for any proposition name in a
theorem to yield a new theorem. If the concrete proposition domain is closed under all logical operators,
then the validity of the substitution rule is almost obvious. This is because one needs only to define a
proposition name which refers to each desired compound proposition and then substitute this name for the
generic name which was in the original theorem.

If the concrete proposition domain is not closed under logical operations, no such simple substitution is
possible. In this case, it is necessary to determine whether the set of axioms is closed under substitution
of compound propositions. In typical axiom sets, this is true. In a propositional calculus, the axioms will
typically be templates into which arbitrary logical expressions may be substituted. So the substitution rule
is once again valid.
If the concrete proposition domain is not closed under logical operations and the axioms are not closed under
substitution of logical expressions, it is possible that the substitution rule could be invalid.
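The substitution rule itself is a simple operation on expressions. The following Python sketch substitutes a compound proposition for a proposition name in the theorem ¬¬α ⇒ α and then spot-checks the result with a truth table. The tuple encoding and the names substitute, atoms, value and is_tautology are assumptions made for illustration only. The truth-table test assumes the usual two-valued semantics and is not the meta-logical justification which is discussed above.

```python
from itertools import product

# Assumed encoding: a proposition name is a string, ("not", A) is ¬A,
# and ("imp", A, B) is A ⇒ B.
def Not(a): return ("not", a)
def Imp(a, b): return ("imp", a, b)

def substitute(f, mapping):
    """Simultaneously replace proposition names in f by the wffs in mapping."""
    if isinstance(f, str):
        return mapping.get(f, f)
    return (f[0],) + tuple(substitute(g, mapping) for g in f[1:])

def atoms(f):
    """Return the set of proposition names occurring in the wff f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))

def value(f, env):
    """Evaluate the wff f under the truth assignment env (name -> bool)."""
    if isinstance(f, str):
        return env[f]
    if f[0] == "not":
        return not value(f[1], env)
    return (not value(f[1], env)) or value(f[2], env)

def is_tautology(f):
    """True if f is true under every truth assignment to its names."""
    names = sorted(atoms(f))
    return all(value(f, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))

theorem = Imp(Not(Not("a")), "a")                        # ¬¬α ⇒ α
instance = substitute(theorem, {"a": Imp("p", Not("q"))})  # α := (p ⇒ ¬q)
print(instance == Imp(Not(Not(Imp("p", Not("q")))), Imp("p", Not("q"))))  # True
print(is_tautology(theorem), is_tautology(instance))                      # True True
```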
4.9.2 Remark: In a formal sense, the assertions (1) “< α ⇒ β” and (2) “α < β” are different for any
wffs α and β, but in practice, they are “equipotent”, more or less. In other words, they both yield the same
results if they are applied in a logical argument using modus ponens. If the wff α appears on a line of an
argument, assertion (1) may be applied to infer β by modus ponens. Assertion (2) may be applied to infer
β by the “theorem application rule”. So it seems to be unimportant whether the theorem is available in one
form or the other.
One fly in the ointment here is that an axiom such as (γ ⇒ (δ ⇒ ε)) ⇒ ((γ ⇒ δ) ⇒ (γ ⇒ ε)) (as
in Definition 4.7.4 (PC 2)) can only yield a result using MP if a proposition of the form (γ ⇒ (δ ⇒ ε)) is
available. It may be that α ⇒ β is of the required form, but α and β individually are not. So such an
axiom could then be applied to assertion (1), but not to assertion (2). It is therefore desirable to be able to
convert between these two forms of assertion. The conversion from (1) to (2) is discussed in Remark 4.9.3.
The reverse conversion is the subject of Theorem 4.9.8.
4.9.3 Remark: It is always possible to convert an assertion of the form “< α ⇒ β” to the assertion
“α < β” using modus ponens. Suppose first that “< α ⇒ β” is a proved theorem. Then the following
argument will be valid.
α < β
(1) α Hyp
(2) α ⇒ β Theorem
(3) β MP (1,2)
The proof is apparently trivial. However, the general rule that the existence of a proof for the assertion
“< α ⇒ β” implies the existence of a proof for the assertion “α < β” is not a theorem. It is a meta-theorem.
The reason for this is that α and β are not names of completely general wffs in each assertion. The scope
of these wffs includes both of the assertions. These wff names refer to fixed wffs within the combined scope
of the two assertions. The meta-theorem is a statement about proofs, not about propositions or wffs. Any
such theorem about proofs is a meta-theorem.
4.9.4 Remark: One way to bring the meta-theorem in Remark 4.9.3 into the main stream of argumentation
within propositional calculus is to simply declare this meta-theorem to be a deduction rule. In other words,
one may declare that whenever there exists a proof of an assertion of the form “< α ⇒ β”, it is permissible
to infer the assertion “α < β”. A deduction rule can permit any kind of inference one desires. In this
case, we “know” that this rule will always give true inferences from true assumptions. This follows from the
meta-proof in Remark 4.9.3. Since all of the other rules of inference are justified meta-logically in the same
way, there is no reason to exclude this kind of deduction rule. In fact, all deduction rules may be regarded
as meta-theorems because they can only be justified by meta-proofs.
Probably the principal reason for excluding the declaration of the meta-theorem in Remark 4.9.3 as a
deduction rule is the “minimalist principle” which has pervaded all of mathematics and logic for the last
hundred years.
Another way to avoid the need for the meta-theorem in Remark 4.9.3 is to forbid all theorems where there
are assumptions. Given a theorem of the form “< α ⇒ β”, one may always infer β from α by modus ponens.
In other words, there is no need for theorems of the form “α < β”. This would make the presentation
of symbolic logic only slightly more tedious. An assertion of the form α1 , α2 , . . . αn < β would need to
be presented in the form < α1 ⇒ (α2 ⇒ (. . . (αn ⇒ β) . . .)), which is somewhat untidy. For example,
α1 , α2 , α3 < β becomes < α1 ⇒ (α2 ⇒ (α3 ⇒ β)).
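The conversion from a list of assumptions to a single nested implication is mechanical. The following Python sketch illustrates it. The tuple encoding and the names Imp, nest and show are hypothetical and are introduced only for this example.

```python
from functools import reduce

# Assumed encoding: a wff name is a string and ("imp", A, B) is A ⇒ B.
def Imp(a, b): return ("imp", a, b)

def nest(hyps, concl):
    """Build α1 ⇒ (α2 ⇒ (... (αn ⇒ β) ...)) from [α1, ..., αn] and β."""
    # Work outwards from β, wrapping the last assumption first.
    return reduce(lambda acc, h: Imp(h, acc), reversed(hyps), concl)

def show(f):
    """Render a wff with explicit parentheses around every implication."""
    if isinstance(f, str):
        return f
    return "(" + show(f[1]) + " ⇒ " + show(f[2]) + ")"

print(show(nest(["α1", "α2", "α3"], "β")))  # (α1 ⇒ (α2 ⇒ (α3 ⇒ β)))
```

With an empty list of assumptions, nest returns β unchanged, which corresponds to an assertion with nothing on the left of the assertion symbol.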

4.9.5 Remark: After doing a fairly large number of proofs in the style of Theorem 4.8.3, one naturally
feels a desire for short-cuts. One of the most significant frustrations is the inability to convert an assertion
of the form “α < β” to an implication of the form “< α ⇒ β”.
An assertion of the form “α < β” means that the writer claims that there exists a proof of the wff β from
the assumption α. Therefore if α appears on a line of an argument, then β may be validly written on any
later line. It is intuitively clear that this is equivalent to the assertion “< α ⇒ β”. The latter assertion can
be used as an input to modus ponens whereas the assertion “α < β” cannot.
Although it is possible to convert an assertion of the form “< α ⇒ β” to the assertion “α < β” (as mentioned
in Remark 4.9.3), there is no way to make the reverse conversion. A partial solution to this problem is called
the “Deduction Theorem”. This is not actually a theorem at all. It is sometimes referred to as a meta-
theorem, but if the logical framework for the proof of the meta-theorem is not well defined, it cannot be said
to be a real theorem at all. It might be more accurate to call it a “naive theorem” since the proof uses naive
logic and naive mathematics.
The proof of this “theorem” requires mathematical induction, which requires the arithmetic of infinite sets,
which requires set theory, which requires logic. (This inescapable cycle of dependencies was forewarned
in Remark 2.1.1 and illustrated in Figure 2.1.1.) However, all of the propositional calculus requires naive
arithmetic and naive set theory already. So one may as well go the whole hog and throw in some naive
mathematical induction too. (Mathematical induction is generally taught by the age of 16 years. One
cannot generally progress in mathematics without accepting it. A concept which is taught at a young
enough age is generally accepted as “obvious”.)
One could call these sorts of “logic theorems” by various names, like “pseudo-theorems”, “meta-theorems”
or “fantasy theorems”. In this book, they will be called “naive theorems” since they are proved using naive
logic and naive set theory.
4.9.6 Remark: Quite likely the “wffs” in Remark 4.9.7 are really wff-wffs. This is because they are wffs
whose names are themselves wffs. [ Check this. ]
4.9.7 Remark: Theorem 4.9.8 is an attempt to provide a converse for the meta-theorem in Remark 4.9.3.
To formulate naive Theorem 4.9.8, let W denote the set of all possible wffs in the propositional calculus in
Definition 4.7.4, let W^n denote the set of all sequences of n wffs for non-negative integers n, and let List(W )
denote the set ⋃_{n=0}^∞ W^n of sequences of wffs with non-negative length. An element of W^n is said to be a wff
sequence of length n. The concatenation of two wff sequences Γ1 and Γ2 is denoted with a comma as Γ1 , Γ2 .
[ Give a pseudo-theorem which states that quoting a theorem is a valid procedure for proving an assertion.
See comment at end of Remark 4.9.10. ]
[ Ideally one should have fully-developed meta-theory with meta-axioms and meta-rules and meta-classes etc.
to give some credibility to “theorems” like Theorem 4.9.8. In principle, this is the purpose of Section 3.14.
But actually this enterprise is doomed. For example, we should at least be able to say what “provable” means.
We could define “proofs” as sequences of “lines”, and define “lines” as sequences of “symbols”. But what
is a symbol? The first machines must be created by hand. Only then can we hope that some machines will
create other machines. ]
4.9.8 Theorem [naive]: “Deduction Theorem”

In the propositional calculus in Definition 4.7.4, let Γ ∈ List(W ) be a wff sequence. Let α ∈ W and β ∈ W
be wffs for which the assertion Γ, α < β is provable. Then the assertion Γ < α ⇒ β is provable.
Proof: Let Γ ∈ List(W ) be a wff sequence. Let α ∈ W and β ∈ W be wffs. Let ∆ = (δ1 , . . . δm ) ∈ W m
be a proof of the assertion Γ, α < β with δm = β. First assume that no other theorems are used in the
proof ∆.
Define the proposition P (k) for integers k with 1 ≤ k ≤ m by P (k) = “the assertion Γ < α ⇒ δi is provable
for all positive i with i ≤ k”.
[ Define concepts like “line” and “proof” and “quoting a theorem” and “deduction rule” using sets and lists and
integers so that the next paragraph will make sense. Provide these pseudo-definitions before the statement
of the pseudo-theorem. ]

To prove P (1) it must be shown that Γ < α ⇒ δ1 is provable by some proof ∆′ ∈ List(W ). Every line of
the proof ∆ must be either (a) an axiom (possibly with some substitution for the wff names), (b) an element
of the wff sequence Γ, (c) the wff α, or (d) the output of a modus ponens rule application. Line δ1 of the
proof ∆ cannot be a modus ponens output. So there are only three possibilities. Suppose δ1 is an axiom.
Then the following argument is valid.
Γ < α ⇒ δ1
(1) δ1 (from some axiom)
(2) δ1 ⇒ (α ⇒ δ1 ) PC 1
(3) α ⇒ δ1 MP (1,2)
If δ1 is an element of the wff sequence Γ, the situation is almost identical.
Γ < α ⇒ δ1
(1) δ1 Hyp (from Γ)
(2) δ1 ⇒ (α ⇒ δ1 ) PC 1
(3) α ⇒ δ1 MP (1,2)
If δ1 equals α, the following proof works.
Γ < α ⇒ δ1
(1) α ⇒ α Theorem 4.8.3 (i)
When a theorem is quoted as above, it means that a full proof could be written out “inline” according to the
proof of the quoted theorem. Now the proposition P (1) is proved. So it remains to show that P (k−1) ⇒ P (k)
for all k > 1.
Now suppose that P (k − 1) is true for an integer k with 1 < k ≤ m. That is, assume that the assertion
Γ < α ⇒ δi is provable for all integers i with 1 ≤ i < k. Line δk of the original proof ∆ must have been
justified by one of the four reasons (a) to (d) given above. Cases (a) to (c) lead to a proof of Γ < α ⇒ δk
exactly as for establishing P (1) above.
In case (d), line δk of proof ∆ is arrived at by modus ponens from two lines δi and δj with 1 ≤ i < k
and 1 ≤ j < k with i ≠ j, where δj has the form δi ⇒ δk . By the inductive hypothesis P (k − 1), there are
valid proofs ∆′ ∈ W^(m′) for Γ < α ⇒ δi and ∆″ ∈ W^(m″) for Γ < α ⇒ δj . Then the concatenated argument
∆′, ∆″ ∈ W^(m′+m″) has α ⇒ δi on line (m′) and α ⇒ (δi ⇒ δk ) on line (m′ + m″). A proof of Γ < α ⇒ δk
may then be constructed as an extension of the argument ∆′, ∆″ as follows.
Γ < α ⇒ δk
(m′) α ⇒ δi (above lines of ∆′)
(m′ + m″) α ⇒ (δi ⇒ δk ) (above lines of ∆″)
(m′ + m″ + 1) (α ⇒ (δi ⇒ δk )) ⇒ ((α ⇒ δi ) ⇒ (α ⇒ δk )) PC 2
(m′ + m″ + 2) (α ⇒ δi ) ⇒ (α ⇒ δk ) MP (m′ + m″, m′ + m″ + 1)
(m′ + m″ + 3) α ⇒ δk MP (m′, m′ + m″ + 2)
(Alternatively one could first prove the theorem α ⇒ β, α ⇒ (β ⇒ γ) < α ⇒ γ and apply this to lines (m′)
and (m′ + m″).) This establishes P (k). Therefore by mathematical induction, it follows that P (m) is true,
which means that Γ < α ⇒ β is provable.
4.9.9 Remark: For the principle of mathematical induction, see Remark 7.3.4.
4.9.10 Remark: Theorem 4.9.8 is clearly bogus. It is circular, like a thief who sells you an item which
they have just stolen from you. However, this kind of pseudo-theorem is common in the mathematical logic
literature. The proof is such a total mess, it would be much tidier to simply assume a kind of “reverse
modus ponens” rule rather than invoke the machinery of mathematical induction at such a basic level in the
development of logic.

Another way out of this mess is to do without any so-called “Deduction Theorem”. It is not essential. One
work-around is to never, ever have any wffs on the left side of the assertion symbol in theorems. The cost of
doing this is simply that in applying theorems, one must have an extra line of modus ponens to apply the
theorem for every wff which was eliminated from the left side of the assertion symbol. This is a very tiny
cost compared to the intellectual messiness of Theorem 4.9.8.
Yet another way out of the “Deduction Theorem” mess is to go back to the proof of any theorems which
have wffs “on the left” and apply the imported proofs “inline” in the proof in which the theorem is required.
This is more onerous than the total avoidance of wffs “on the left”.
[ Implement the idea in the following paragraph. ]
The use of theorems is itself a kind of second inference rule in addition to modus ponens. Strictly speaking,
there should also be a pseudo-theorem to “prove” by naive logic and naive set theory that the application
of previous theorems to the proof of a new theorem is valid.
[ Remarks 4.9.11 and 4.9.12 are perhaps too philosophical for this chapter. They could be either moved to
Chapter 2 or deleted. ]
4.9.11 Remark: If the reader concludes from Remark 4.9.10 that all of mathematical logic and mathematics
is bogus, that would be fair enough if the expectation was for a directed acyclic graph of concepts,
but that is a false expectation. Mathematics is like a natural-language dictionary which defines all words
in terms of other words. The recursive look-up of words must eventually arrive at a cyclic dependency.
One requires a certain amount of a-priori knowledge. Mathematics has the advantage that all of the cyclic
dependencies can be isolated to a well-defined subset of the subject.
Logic depends on set theory and integer arithmetic. Integer arithmetic depends on set theory, which depends
on logic. If the reader can accept the network of concepts in these elementary topics, the rest of mathematics
is “solid”.
4.9.12 Remark: The unsatisfying cyclic nature of logic and set theory at the core of mathematics is
particularly sad for people who wish to take refuge in mathematics to escape the lack of careful logic in
physics. On this subject, Bell [191], pages 516–517, says the following about Dedekind’s exit from physics
and chemistry into mathematics.
But he did not wander long in darkness. By the age of seventeen he had smelt numerous rats in
the alleged reasoning of physics and had turned to mathematics for less objectionable logic.
[ Since Gödel’s famous theorems are meta-mathematical, relying on naive mathematics for their proofs, it
seems that these theorems may all be bogus, since they rely on the prior establishment of much set theory for
the meta-mathematics. Check this. The aspersions cast on mathematicians by logicians may all be baseless
since their logical tools are constructed with the very mathematical tools which they seek to undermine. ]
4.9.13 Remark: An assertion of the form α1 , α2 , . . . αn < β means that there exists a proof of β from
the assumptions α1 , α2 , . . . αn . Therefore if the wffs α1 , . . . αn appear on any n lines of an argument,
then the wff β may be written on any later line. It is perhaps intuitively clear that this is equivalent to
the assertion < α1 ⇒ (α2 ⇒ (. . . (αn ⇒ β) . . .)). It is also perhaps intuitively clear that a meta-proof of
this meta-theorem may be derived from the “Deduction Theorem” (Theorem 4.9.8) with the aid of naive
mathematical induction. However, it is not even possible to denote this inductive meta-theorem within the
notation framework which has been defined. So it is difficult to see how one can realistically hope to prove
an assertion which cannot be clearly written down!
4.10. Further theorems for the implication operator

4.10.1 Remark: Using the “deduction theorem”, the assertions in Theorem 4.8.3 which have wffs on the
left of the assertion symbol may be converted to equivalent assertions with no wffs on the left, as in Theorem 4.10.2.
4.10.2 Theorem [pc]: The following assertions follow from the propositional calculus in Definition 4.7.4.
(i) < (α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (α ⇒ γ)).
(ii) < (α ⇒ (β ⇒ γ)) ⇒ (β ⇒ (α ⇒ γ)).

(iii) < (α ⇒ β) ⇒ (¬¬α ⇒ β).

(iv) < (α ⇒ β) ⇒ (α ⇒ ¬¬β).
(v) < (α ⇒ β) ⇒ (¬¬α ⇒ ¬¬β).
(vi) < (α ⇒ β) ⇒ (¬β ⇒ ¬α).
(vii) < (α ⇒ ¬β) ⇒ (β ⇒ ¬α).
(viii) < (¬α ⇒ β) ⇒ (¬β ⇒ α).
Proof: By the “Deduction Theorem” (Theorem 4.9.8), part (i) follows from Theorem 4.8.3 (iii), part (ii)
follows from Theorem 4.8.3 (iv), part (iii) follows from Theorem 4.8.3 (vii), part (iv) follows from Theo-
rem 4.8.3 (ix), part (v) follows from Theorem 4.8.3 (x), part (vi) follows from Theorem 4.8.3 (xi), part (vii)
follows from Theorem 4.8.3 (xii) and part (viii) follows from Theorem 4.8.3 (xiii).
4.10.3 Remark: For proof of Theorem 4.10.2 without the “Deduction Theorem”, see Exercise 46.1.4.
4.10.4 Remark: The purpose of Theorem 4.10.5 is to prove Theorem 4.11.7 (viii), which is equivalent to
Theorem 4.10.5 (vi).
4.10.5 Theorem [pc]: The following assertions follow from the propositional calculus in Definition 4.7.4.
(i) < (¬α ⇒ β) ⇒ ((α ⇒ β) ⇒ β).
(ii) < (α ⇒ β) ⇒ ((¬α ⇒ β) ⇒ β).
(iii) α ⇒ (β ⇒ γ), γ ⇒ δ < α ⇒ (β ⇒ δ).
(iv) < (¬α ⇒ β) ⇒ ((β ⇒ γ) ⇒ ((α ⇒ γ) ⇒ γ)).
(v) α ⇒ (β ⇒ (γ ⇒ δ)) < α ⇒ (γ ⇒ (β ⇒ δ)).
(vi) < (α ⇒ γ) ⇒ ((β ⇒ γ) ⇒ ((¬α ⇒ β) ⇒ γ)).
Proof: To prove part (i): < (¬α ⇒ β) ⇒ ((α ⇒ β) ⇒ β)

(1) (¬β ⇒ ¬α) ⇒ ((¬β ⇒ α) ⇒ β) PC 3
(2) (α ⇒ β) ⇒ (¬β ⇒ ¬α) Theorem 4.10.2 (vi)
(3) (α ⇒ β) ⇒ ((¬β ⇒ α) ⇒ β) Theorem 4.8.3 (iii) (2,1)
(4) (¬β ⇒ α) ⇒ ((α ⇒ β) ⇒ β) Theorem 4.8.3 (iv) (3)
(5) (¬α ⇒ β) ⇒ (¬β ⇒ α) Theorem 4.10.2 (viii)
(6) (¬α ⇒ β) ⇒ ((α ⇒ β) ⇒ β) Theorem 4.8.3 (iii) (5,4)
To prove part (ii): < (α ⇒ β) ⇒ ((¬α ⇒ β) ⇒ β)
(1) (¬α ⇒ β) ⇒ ((α ⇒ β) ⇒ β) part (i)
(2) (α ⇒ β) ⇒ ((¬α ⇒ β) ⇒ β) Theorem 4.8.3 (iv) (1)
To prove part (iii): α ⇒ (β ⇒ γ), γ ⇒ δ < α ⇒ (β ⇒ δ)
(1) α ⇒ (β ⇒ γ) Hyp
(2) γ⇒δ Hyp
(3) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) PC 2
(4) (α ⇒ β) ⇒ (α ⇒ γ) MP (1,3)
(5) (α ⇒ γ) ⇒ ((γ ⇒ δ) ⇒ (α ⇒ δ)) Theorem 4.10.2 (i)
(6) (α ⇒ β) ⇒ ((γ ⇒ δ) ⇒ (α ⇒ δ)) Theorem 4.8.3 (iii) (4,5)
(7) (β ⇒ γ) ⇒ ((γ ⇒ δ) ⇒ (β ⇒ δ)) Theorem 4.10.2 (i)
(8) α ⇒ ((γ ⇒ δ) ⇒ (β ⇒ δ)) Theorem 4.8.3 (iii) (1,7)
(9) (γ ⇒ δ) ⇒ (α ⇒ (β ⇒ δ)) Theorem 4.8.3 (iv) (8)
(10) α ⇒ (β ⇒ δ) MP (2,9)
To prove part (iv): < (¬α ⇒ β) ⇒ ((β ⇒ γ) ⇒ ((α ⇒ γ) ⇒ γ))

(1) (¬α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (¬α ⇒ γ)) Theorem 4.10.2 (i)

(2) (¬α ⇒ γ) ⇒ ((α ⇒ γ) ⇒ γ) part (i)
(3) (¬α ⇒ β) ⇒ ((β ⇒ γ) ⇒ ((α ⇒ γ) ⇒ γ)) part (iii) (1,2)
To prove part (v): α ⇒ (β ⇒ (γ ⇒ δ)) < α ⇒ (γ ⇒ (β ⇒ δ))
(1) α ⇒ (β ⇒ (γ ⇒ δ)) Hyp
(2) (β ⇒ (γ ⇒ δ)) ⇒ (γ ⇒ (β ⇒ δ)) Theorem 4.10.2 (ii)
(3) α ⇒ (γ ⇒ (β ⇒ δ)) Theorem 4.8.3 (iii) (1,2)
To prove part (vi): < (α ⇒ γ) ⇒ ((β ⇒ γ) ⇒ ((¬α ⇒ β) ⇒ γ))
(1) (¬α ⇒ β) ⇒ ((β ⇒ γ) ⇒ ((α ⇒ γ) ⇒ γ)) part (iv)
(2) (β ⇒ γ) ⇒ ((¬α ⇒ β) ⇒ ((α ⇒ γ) ⇒ γ)) Theorem 4.8.3 (iv) (1)
(3) (β ⇒ γ) ⇒ ((α ⇒ γ) ⇒ ((¬α ⇒ β) ⇒ γ)) part (v) (2)
(4) (α ⇒ γ) ⇒ ((β ⇒ γ) ⇒ ((¬α ⇒ β) ⇒ γ)) Theorem 4.8.3 (iv) (3)
This completes the proof of Theorem 4.10.5.
4.11. Other logical operators

4.11.1 Remark: The implication-based propositional calculus which is introduced in Section 4.7 offers
only two logical operators, ⇒ and ¬. Of the five logical connectives in Notation 4.3.3, the two connectives ⇒
and ¬ are undefined in the axiom system described in Definition 4.7.4 because they are primitive connectives.
(All of the operators are fully defined in the semantic context, but the symbols-only language context defines
only manipulation rules, not meaning.) The other three connectives are defined in terms of ⇒ and ¬ in
Definition 4.11.2.
4.11.2 Definition:
(i) α ∨ β means (¬α) ⇒ β for any wffs α and β.
(ii) α ∧ β means ¬(α ⇒ ¬β) for any wffs α and β.
(iii) α ⇔ β means (α ⇒ β) ∧ (β ⇒ α) for any wffs α and β.
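The three definitions can be compared with the usual semantic meanings of the connectives by expanding them into ⇒ and ¬ and tabulating truth values. The following Python sketch does this. The tuple encoding and the names Not, Imp, Or, And, Iff, value and table are assumptions made for illustration, and the comparison uses Python's built-in "or", "and" and "==" on booleans as the reference semantics.

```python
from itertools import product

# Assumed encoding: a proposition name is a string, ("not", A) is ¬A,
# and ("imp", A, B) is A ⇒ B. Only ⇒ and ¬ are primitive.
def Not(a): return ("not", a)
def Imp(a, b): return ("imp", a, b)

def Or(a, b): return Imp(Not(a), b)               # Definition 4.11.2 (i)
def And(a, b): return Not(Imp(a, Not(b)))         # Definition 4.11.2 (ii)
def Iff(a, b): return And(Imp(a, b), Imp(b, a))   # Definition 4.11.2 (iii)

def value(f, env):
    """Evaluate the wff f under the truth assignment env (name -> bool)."""
    if isinstance(f, str):
        return env[f]
    if f[0] == "not":
        return not value(f[1], env)
    return (not value(f[1], env)) or value(f[2], env)

def table(f, names):
    """List the truth values of f over all assignments to names, in order."""
    return [value(f, dict(zip(names, bits)))
            for bits in product([False, True], repeat=len(names))]

rows = list(product([False, True], repeat=2))
names = ["a", "b"]
print(table(Or("a", "b"), names) == [x or y for x, y in rows])    # True
print(table(And("a", "b"), names) == [x and y for x, y in rows])  # True
print(table(Iff("a", "b"), names) == [x == y for x, y in rows])   # True
```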
[ Perhaps the NAND (|) and NOR (↓) operators should be defined near here. It would be a good idea to find
a standard symbol for XOR too. ]
4.11.3 Remark: Definition 4.11.2 is not how logical connectives are defined in the real world. It just
happens that there is a lot of redundancy among the operators in Notation 4.3.3. So it is possible to define
the full set of operators in terms of a proper subset. Defining the operators in terms of a minimal set of
operators is part of a minimalist mode of thinking which is not necessarily useful or helpful.
[ Maybe there should be a discussion of reductionism, minimalism and naturalism in the philosophy chapter? ]
Reductionism has been enormously successful in the natural sciences in the last couple of centuries. But
minimalism is not the same thing as reductionism. Reductionism recursively reduces complex systems to
fundamental principles and synthesizes entire systems from these simpler principles. (E.g. Solar System
dynamics can be synthesized from Newton’s laws.) However, it cannot be said that the operators ⇒ and ¬
are more “fundamental” than the operators ∧ and ∨ . The best way to think of the basic logical connectives
is as a network of operators which are closely related to each other.
In many contexts, the set of three operators ∧ , ∨ and ¬ is preferred as the “fundamental” operator set.
(For example, there are useful decompositions of all truth functions into “disjunctive normal form” and
“conjunctive normal form”.) In the context of propositional calculus, the ⇒ operator is more “fundamental”
because it is the basis of the modus ponens inference rule. But the modus ponens rule could easily be
replaced by an equivalent rule which uses one or more different logical connectives. (The possibility of
basing propositional calculus on the AND and NOT operators is mentioned in Remark 4.6.2.)
A propositional calculus based on the single NAND operator with a single axiom and modus ponens is
possible, but it requires a lot of work for nothing. The NAND operator is in no way “the fundamental
operator” underlying all other operators. It is minimal, not fundamental. (See also Remark 4.6.4 for related
comments on NAND-operator logic.)

4.11.4 Remark: The ten assertions of Theorem 4.11.7 are the same as the axiom schemas of the propositional
calculus described by Mendelson [165], page 40. This axiomatic system is attributed to Kleene [162].
4.11.5 Remark: Theorem 4.11.7 part (viii) seems to be quite difficult to prove from the three axiom schemas
in Definition 4.7.4. To simplify the proof, some preliminary assertions are given in Lemma 4.11.6.
Theorem 4.11.7 (viii) is a kind of mirror image of axiom (PC 2) in Definition 4.7.4. The former has a γ on the right
in each of the three terms whereas the latter has an α on the left in each of the three terms.
4.11.6 Lemma [pc]: The following assertions for wffs α, β and γ follow from the propositional calculus in
Definition 4.7.4.
(i) < ¬(¬β ⇒ ¬(¬α ⇒ β)) ⇒ α.
(ii) < ((α ⇒ β) ⇒ γ) ⇒ ((¬β ⇒ ¬α) ⇒ γ).
4.11.7 Theorem [pc]: The following assertions for wffs α, β and γ follow from the propositional calculus
in Definition 4.7.4.
(i) < α ⇒ (β ⇒ α).
(ii) < (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)).
(iii) < (α ∧ β) ⇒ α.
(iv) < (α ∧ β) ⇒ β.
(v) < α ⇒ (β ⇒ (α ∧ β)).
(vi) < α ⇒ (α ∨ β).
(vii) < β ⇒ (α ∨ β).
(viii) < (α ⇒ γ) ⇒ ((β ⇒ γ) ⇒ ((α ∨ β) ⇒ γ)).
(ix) < (α ⇒ β) ⇒ ((α ⇒ ¬β) ⇒ ¬α).
(x) < ¬¬α ⇒ α.
Proof: Parts (i) and (ii) are identical to axiom schemas (PC 1) and (PC 2) respectively in Definition 4.7.4.
Part (iii) is the same as < ¬(α ⇒ ¬β) ⇒ α by Definition 4.11.2 (ii). But this assertion is identical to
Theorem 4.8.3 (xvi). Similarly, part (iv) is the same as < ¬(α ⇒ ¬β) ⇒ β by Definition 4.11.2 (ii), and this
assertion is identical to Theorem 4.8.3 (xvii).
To prove part (v): < α ⇒ (β ⇒ (α ∧ β))
(1) (α ⇒ ¬β) ⇒ (α ⇒ ¬β) Theorem 4.8.3 (i)
(2) α ⇒ ((α ⇒ ¬β) ⇒ ¬β) Theorem 4.8.3 (iv) (1)
(3) ((α ⇒ ¬β) ⇒ ¬β) ⇒ (β ⇒ ¬(α ⇒ ¬β)) Theorem 4.10.2 (vii)
(4) α ⇒ (β ⇒ ¬(α ⇒ ¬β)) Theorem 4.8.3 (iii) (2,3)
(5) α ⇒ (β ⇒ (α ∧ β)) Definition 4.11.2 (ii) (4)
To prove part (vi): < α ⇒ (α ∨ β)
(1) α ⇒ (¬α ⇒ β) Theorem 4.8.3 (xv)
(2) α ⇒ (α ∨ β) Definition 4.11.2 (i) (1)
To prove part (vii): < β ⇒ (α ∨ β)
(1) β ⇒ (¬α ⇒ β) PC 1
(2) β ⇒ (α ∨ β) Definition 4.11.2 (i) (1)
Part (viii) is identical to Theorem 4.10.5 (vi).
To prove part (ix): < (α ⇒ β) ⇒ ((α ⇒ ¬β) ⇒ ¬α)
(1) (α ⇒ β) ⇒ (¬β ⇒ ¬α) Theorem 4.10.2 (vi)
(2) (¬β ⇒ ¬α) ⇒ ((β ⇒ ¬α) ⇒ ¬α) Theorem 4.10.5 (i)
(3) (α ⇒ β) ⇒ ((β ⇒ ¬α) ⇒ ¬α) Theorem 4.8.3 (iii) (1,2)

(4) (β ⇒ ¬α) ⇒ ((α ⇒ β) ⇒ ¬α) Theorem 4.8.3 (iv) (3)

(5) (α ⇒ ¬β) ⇒ (β ⇒ ¬α) Theorem 4.10.2 (vii)
(6) (α ⇒ ¬β) ⇒ ((α ⇒ β) ⇒ ¬α) Theorem 4.8.3 (iii) (5,4)
(7) (α ⇒ β) ⇒ ((α ⇒ ¬β) ⇒ ¬α) Theorem 4.8.3 (iv) (6)
Part (x) is identical to Theorem 4.8.3 (vi).
[ 2007-1-19: This section is currently being rewritten. So everything after this point does not follow smoothly
from the above definitions. ]
[ Make Remarks 4.11.8 and 4.11.9 into theorems. They are actually meta-theorems. ]
4.11.8 Remark: If a wff α is a tautology, it may be combined in a conjunction with any wff β without
altering the truth or falsity of the enclosing wff. Thus if β is a sub-wff of a wff, it may be replaced with the
combined wff (α ∧ β). This fact follows from Theorem 4.11.7 (iv) and the assertion α < β ⇒ (α ∧ β), which
is a consequence of Theorem 4.11.7 (v).
Conversely, a tautology may be removed from a conjunction. Thus α may be substituted for α ∧ β if β is a
tautology. An example application of this is line (5.15.2) in the proof of Theorem 5.15.4.
4.11.9 Remark: There is a corresponding observation to Remark 4.11.8 in terms of a contradiction instead
of a tautology. (See Definition 4.3.23 for contradictions.) A contradiction β can be combined with any
proposition α without altering its truth or falsity. Thus the combined proposition α ∨ β is equivalent to α
for any proposition α and contradiction β.
[ Highly desirable for Theorem 4.11.10 would be an “equivalent expression substitution” metatheorem. Then
equivalent expressions could be substituted into logical expressions without changing their truth values, just
like the substitution methods in number algebra. ]
4.11.10 Theorem [pc]: The following assertions for wffs α, β and γ follow from the propositional calculus.
(i) ⊢ (α ∨ β) ⇔ (β ∨ α). (Commutativity of disjunction.)
(ii) ⊢ (α ∧ β) ⇔ (β ∧ α). (Commutativity of conjunction.)
(iii) ⊢ (α ∨ α) ⇔ α.
(iv) ⊢ (α ∧ α) ⇔ α.
(v) ⊢ (α ∨ (β ∨ γ)) ⇔ ((α ∨ β) ∨ γ). (Associativity of disjunction.)
(vi) ⊢ (α ∧ (β ∧ γ)) ⇔ ((α ∧ β) ∧ γ). (Associativity of conjunction.)
(vii) ⊢ (α ∨ (β ∧ γ)) ⇔ ((α ∨ β) ∧ (α ∨ γ)). (Distributivity of disjunction over conjunction.)
(viii) ⊢ (α ∧ (β ∨ γ)) ⇔ ((α ∧ β) ∨ (α ∧ γ)). (Distributivity of conjunction over disjunction.)
(ix) ⊢ (α ∨ (α ∧ β)) ⇔ α. (Absorption of disjunction over conjunction.)
(x) ⊢ (α ∧ (α ∨ β)) ⇔ α. (Absorption of conjunction over disjunction.)
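As a sanity check, the ten equivalences of Theorem 4.11.10 can be tested by substituting all eight combinations of truth values for α, β and γ. This is a semantic check only, not a proof from the axioms and rules. The dictionary `EQUIVALENCES` and the function `holds` are names introduced here for illustration.

```python
from itertools import product

# Each entry pairs the left and right sides of one equivalence,
# written as functions of the truth values a, b, c of alpha, beta, gamma.
EQUIVALENCES = {
    "i":    (lambda a, b, c: a or b,          lambda a, b, c: b or a),
    "ii":   (lambda a, b, c: a and b,         lambda a, b, c: b and a),
    "iii":  (lambda a, b, c: a or a,          lambda a, b, c: a),
    "iv":   (lambda a, b, c: a and a,         lambda a, b, c: a),
    "v":    (lambda a, b, c: a or (b or c),   lambda a, b, c: (a or b) or c),
    "vi":   (lambda a, b, c: a and (b and c), lambda a, b, c: (a and b) and c),
    "vii":  (lambda a, b, c: a or (b and c),  lambda a, b, c: (a or b) and (a or c)),
    "viii": (lambda a, b, c: a and (b or c),  lambda a, b, c: (a and b) or (a and c)),
    "ix":   (lambda a, b, c: a or (a and b),  lambda a, b, c: a),
    "x":    (lambda a, b, c: a and (a or b),  lambda a, b, c: a),
}

def holds(lhs, rhs):
    """True if lhs and rhs have the same truth value for every assignment."""
    return all(lhs(a, b, c) == rhs(a, b, c)
               for a, b, c in product([False, True], repeat=3))

results = {label: holds(lhs, rhs) for label, (lhs, rhs) in EQUIVALENCES.items()}
```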
4.11.11 Theorem [pc]: The following tautologies hold for propositions A and B.
(i) (A ⇒ B) ⇔ (A ⇔ (A ∧ B)).
Proof: See Exercise 46.1.5.
4.12. Parametrized families of propositions

The predicate calculus is introduced in Section 4.13. The basic concepts of predicate calculus include the
idea of parametrized families of propositions. The parameters are called “variables”.
4.12.1 Remark: The predicate calculus may be regarded as a management system for very large sets of
propositions by grouping them into classes.
In the propositional calculus, each proposition is regarded as an individual entity to be managed without any
regard to attributes which it may have in common with other propositions. For example, the propositions
A = “The Sun is a star.” and B = “Alpha Centauri is a star.” are treated as having nothing in common at
all. But each proposition has the form: P (X) = “X is a star.”
The propositions A and B are members of a class of propositions P (X) for different values of the variable X.
The predicate calculus tries to exploit the redundancy in this class of propositions. Thus one may write
∀X, P (X) to mean that everything is a star. The alternative in pure propositional calculus is to make this
assertion for every choice of X, which would be tedious for a large set of X values, or impossible in the case
of an infinite set.
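The grouping of propositions into a parametrized family can be illustrated in code. The sketch below assumes a small finite stand-in for the universe of objects. The table `IS_STAR` and its entries are hypothetical illustration data, and Python's built-in `all` and `any` play the roles of the quantifiers over this finite universe.

```python
# A small finite stand-in for the universe of objects (hypothetical data).
IS_STAR = {"Sun": True, "Alpha Centauri": True, "Earth": False}

def P(x):
    """The proposition template P(X) = "X is a star"."""
    return IS_STAR[x]

# Propositional-calculus style: one separately named proposition per object.
A = P("Sun")
B = P("Alpha Centauri")

# Predicate-calculus style: a single quantified statement over the family.
everything_is_a_star = all(P(x) for x in IS_STAR)
something_is_a_star = any(P(x) for x in IS_STAR)
```

In this toy universe, A and B are both true, the universal statement is false and the existential statement is true. For an infinite universe no such loop could terminate in general, which is the point made in the text.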
4.12.2 Remark: It is often stated that the predicate calculus, which deals with parametrized families
of propositions, is different to the propositional calculus in that it permits infinite classes of propositions
because the set of predicate parameters is defined as a set which may be infinite and even uncountably infinite.
However, there is no real difference. There is nothing to prevent the space of propositions in the propositional
calculus from being arbitrarily infinite except the minor inconvenience of defining a naming notation for such
a large set of propositions. The predicate calculus provides explicit support for the organization of the space
of propositions as a parametrized family.
Since the predicate calculus does typically deal with infinite sets of propositions, infinite conjunctions and
disjunctions are provided. The universal and existential quantifiers are simply infinite conjunctions and
disjunctions respectively. (See Section 4.13 for logical quantifiers.) They are only required when there are
infinitely many propositions, but such infinite proposition sets are not strictly specific to the predicate
calculus. Consequently, the propositional calculus and predicate calculus are not actually as different as
they may at first seem.
4.12.3 Remark: The predicate calculus requires two kinds of names, namely (1) names which refer to
propositions and (2) names which refer to parameters for the propositions. Thus the notation P (x) means
a proposition template which includes a symbol x. The understanding is that one may substitute for x any
other name from the set of parameter names. Similarly, the notation Q(x, y) means a proposition template
which has two substitutable names, x and y. And so forth for any number of proposition parameters. In set
theory, the parameter names typically represent sets.
The name-to-object maps for a predicate language are summarized in the following table.
name map    name space    object domain    object type
µV          NV            V                variables
µQ          NQ            Q                predicates
µP          NP            P                propositions
The choice of the words “space” and “domain” here is fairly arbitrary. (See Remark 4.1.2 for discussion of
this choice of words.) The spaces and maps in the above table are illustrated in Figure 4.12.1.
(Diagram: the abstract name spaces NV (variable names x, y, z, . . .), NQ (predicate names P , Q, . . .) and
NP (proposition names P (x), Q(y, z), . . .) are mapped by µV , µQ and µP to the concrete spaces V (variables
∅, {∅}, . . .), Q (predicates “∈”, . . .) and P (propositions ∅ ∈ {∅}, . . .). The truth value maps t (abstract row)
and τ (concrete row) take values in {F, T}.)
Figure 4.12.1 Variables, predicates, propositions and truth values

[ Strictly speaking, there should be a map µT : NT → T , where T is the concrete set of truth values, and
NT = {F, T} is the abstract set of truth values. Add these to the table and the diagram. The concrete
truth values may be voltages, for example. And the truth map may be varied for the same abstract and
concrete truth value spaces. The abstract truth value name space NT probably should be a fixed space. On
the other hand, other kinds of logic can use more than two truth values, for example. ]
The space Q of predicates may be partitioned according to the number of parameters of each predicate.
Thus Q = Q0 ∪ Q1 ∪ Q2 ∪ . . ., where Qk is the space of predicates which have k parameters. The predicates in
Qk have the form P : V^k → {F, T}. In terms of the list notation in Section 7.12, the predicates in Q have
the form P : List(V) → {F, T}.
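The partition of predicates by number of parameters can be sketched for a finite variable space. Everything in this example is an assumption made for illustration: the space `V`, the three sample predicates and the helper `extension`, which tabulates a k-parameter predicate as a map from V^k to truth values.

```python
from itertools import product

V = [0, 1, 2]  # hypothetical finite variable space

def truth_always(*args):
    """A predicate accepting any list of parameters, as in P : List(V) -> {F, T}."""
    return True

def is_zero(x):
    """A one-parameter predicate (an element of Q1)."""
    return x == 0

def less_than(x, y):
    """A two-parameter predicate (an element of Q2)."""
    return x < y

def extension(pred, k):
    """Tabulate a k-parameter predicate as a map from V^k to {False, True}."""
    return {args: pred(*args) for args in product(V, repeat=k)}

table0 = extension(truth_always, 0)  # V^0 contains only the empty tuple
table1 = extension(is_zero, 1)
table2 = extension(less_than, 2)
```

The zero-parameter case has exactly one entry because V^0 consists of the empty list alone, so a zero-parameter predicate is just a single truth value.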
4.12.4 Remark: The five layers of linguistic structure for propositional calculus in Figure 4.5.1, in
Remark 4.5.4, are applicable also to predicate calculus. The main difference is that the logical operators are
extended by the addition of universal and existential quantifiers in the predicate calculus.
4.12.5 Remark: It is noteworthy that although the variables in predicate calculus are typically arbitrarily
infinite, the number of predicates is typically a small finite number. For example, in ZF set theory, there
are only two predicates, namely the set equality and set membership relations. (In NBG set theory, there
is an additional single-parameter predicate which determines whether a class is a set.) This large difference in
size between the sets of variables and predicates is completely understandable when one views predicates as
families of propositions. Each predicate corresponds to a different concept, typically a fundamental relation
or property of objects.
Another noteworthy difference between variables and predicates is that the predicates are usually constant.
The axioms of a predicate calculus typically specify the properties of the very small number of predicates.
The predicates are imported from the concrete logical system. There is typically no need for variable names
for concrete predicates in the axioms.
In the case of abstract predicates which are built from the concrete predicates by the use of logical operators
and quantifiers, variable predicates are frequently encountered, and they are typically infinite in number.
4.12.6 Remark: The use of the word “set” for the classes of proposition names and parameter names in
Remark 4.12.3 seems dangerous because sets have not been defined yet. However, these are “naive sets”.
There is no membership relation “∈” on these two particular naive sets. The only membership relation is
between the symbols (as elements) and the whole class. I.e. all of the symbols are elements of their class. The
specification axiom is useful for indicating subsets of the full symbol sets. But very few set constructions
are required. So the danger is not very great.
[ Check Remark 4.12.7 to see if it makes good sense or not. ]
4.12.7 Remark: It seems potentially dangerous that parameter names such as x and y in Remark 4.12.3
could be the names of sets, such as ZF or NBG sets. But this is a semantic issue. Since predicate calculus
operates only at the linguistic level, the meaning of the symbols is of no importance to the integrity of the
language itself. At the linguistic level, we are only interested in applying rules and axioms to abstract logical
expressions. Contradictions can occur if the axioms of the language are not well chosen. But this implies
only that the axiomatic system is not self-consistent. It does not imply that the formulation of the system
is itself inconsistent. The set of axioms is still a well-defined set of axioms, and the set of deduction rules is
still a well-defined set of deduction rules. The fact that a contradiction may be proved within an axiomatic
system just means that the system has undesirable characteristics, not that it contains contradictions within
the formulation itself.
4.12.8 Remark: The wffs in a propositional calculus are meaningful only if a “domain of interpretation”
is specified for its statement variables, relations, functions and constants. According to Mendelson [165],
page 49, an “interpretation” requires that the variables be elements of a set D and each logical relation,
function and constant must be associated with a relation, function and constant in the set-theory sense
within the set D. In other words, the interpretation of a propositional calculus requires some of the basic set
theory which is presented in Chapters 5 and 6. (Or alternatively, only some sort of limited naive set theory
is required.)

4.12.9 Remark: It is sometimes convenient to have a notation for a predicate which is always true (or
always false). Such a predicate does not need parameters. So Notation 4.12.10 introduces the zero-parameter
predicates which are always true or always false. This is the same notation as for zero-operand logical
operators. (See Notation 4.3.19, which is discussed in Remark 4.3.20.)
These predicates are added to the abstract predicate names for any particular predicate calculus. There are
not necessarily any concrete predicates for which these abstract predicates are labels.
As an extension of notation, logical predicates which are always true or always false and have one or more
parameters, may be denoted as for functions. Thus, for example, ⊤(x, y, z) would be a true proposition for
any variables x, y and z, and ⊥(a, b, c, d) would be a false proposition for any variables a, b, c and d. Luckily
such notation is rarely needed.
4.12.10 Notation:
⊤ denotes the true zero-parameter logical predicate.
⊥ denotes the false zero-parameter logical predicate.
4.12.11 Remark: In addition to predicates and variables, predicate calculus may also have functions.
These are naive functions of some kind, not the same as the relational functions in set theory. But they look
very similar. A predicate logic function is a map of the form f : V^n → V for some non-negative integer n.
[ It seems like predicate logic functions on the domain of predicates are required. In other words, functions of
the form g : Q^n → V for non-negative integers n. For example {x; F (x)} looks like g(F ), yielding a set for
each predicate. ]
4.12.12 Remark: In addition to the variable predicates, variable functions and individual variables, there
is also a requirement for constants in each of these categories. For example, in ZF set theory, “∈” is a
constant predicate and “∅” is an individual constant. Figure 4.12.2 illustrates the map µV of variable names
for sets and the map µQ for the constant name “∈” for the concrete set membership predicate.
(Diagram: the abstract names A, B, C and “∈” are mapped by µV and µQ to the concrete objects µV (A),
µV (B), µV (C) and the concrete predicate µQ (∈).)
Figure 4.12.2 Mapping the constant name “∈” to a concrete predicate
[ There probably should be a separate section on the definition of “constant” in Remark 4.12.13. Probably
this remark should be split up into smaller conceptual steps. Although this remark is probably correct in
principle, and quite likely useful, it needs a lot of improvement to make it comprehensible. Some of the
notation and terminology need fixing too. ]
4.12.13 Remark: The definition of “constant” in Remark 4.12.12, for names of predicates, functions and
individuals, only has meaning if there is a definition of equality (or “identity”) on the corresponding concrete
object space. Otherwise one cannot know whether two names are pointing to the same object. But definitions
of constants require even more than that.

The definition of the word “constant” is not at all obvious. In mathematics presentations, one often hears
statements like: “Let C be a constant.” But then it generally transpires that C is quite arbitrary. So it
isn’t constant at all, although it will generally be constant with respect to something. Analysis texts often
have propositions of the general form: ∀x, ∃C, ∀y, P (x, y, C), where P is some three-parameter predicate.
(See for example the epsilon-delta criterion for continuity on metric spaces in Theorem 17.4.3.) In this case,
C is constant with respect to y, but not with respect to x. Thus one may typically write C(x) to informally
suggest that C may depend on x but not on y.
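The dependence of C on x, but not on y, can be made concrete over small finite ranges. The predicate `P` below is a hypothetical example chosen only so that a suitable C exists for each x while no single C works for every x. The ranges `X`, `Y` and `CS` are likewise assumptions.

```python
X = range(3)
Y = range(3)
CS = range(6)   # candidate values for C

def P(x, y, C):
    # Hypothetical predicate: C is squeezed between x + y and x + 2.
    return x + y <= C <= x + 2

# "forall x, exists C, forall y, P(x, y, C)": C may depend on x but not on y.
c_depends_on_x = all(any(all(P(x, y, C) for y in Y) for C in CS) for x in X)

# "exists C, forall x, forall y, P(x, y, C)": C is constant with respect to x too.
c_uniform = any(all(P(x, y, C) for x in X for y in Y) for C in CS)

# The witness written informally as C(x): one value of C for each x.
witness = {x: next(C for C in CS if all(P(x, y, C) for y in Y)) for x in X}
```

Here the first statement holds and the second fails, and the witness table shows a different C for each x, which is the sense in which such a “constant” is constant only with respect to y.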
So the question arises in the case of names of predicates, functions and variables as to what a constant name
should be constant with respect to. The simple answer is that constant names are constant with respect to
name maps, but this requires some explanation.
Consider first the name maps µV : NV → V from the individual name space NV to the individual object
space V. (The “individual” spaces are usually called “variable” spaces, but this is confusing when one is
talking about “constant variables”. A more accurate term would be “predicate parameter spaces”.) In
principle, the language defined in terms of the name space should be invariant under arbitrary choices of the
name map µV . That is, all of the axioms and rules should be valid for any given name map. Therefore the
axioms and rules should be invariant under any permutation of the elements of the object space V.
If all elements of a concrete space are equivalent, there is no need to be concerned with the choice of names.
But this is not usually the case. More typically, each element of a concrete space has a unique character
which is of interest in the study of that space. In this case, one often wishes to have fixed names for some
concrete objects.
Consider the example of the empty set in ZF set theory. Let a ∈ V be the empty set in a concrete ZF set
theory (often called a “model” of the theory). Then the name A ∈ NV in the parameter name space NV
points to a if and only if µV (A) =^c a, where =^c : V × V → {F, T} denotes the equality relation on the concrete
parameter space V. (See Figure 4.12.3.)
(Diagram: two panels, each showing the abstract names A, B, C in NV mapped to concrete objects in V. In
the first panel, a =^c µ1V (A), b =^c µ1V (B) and c =^c µ1V (C). In the second panel, a =^c µ2V (B), b =^c µ2V (A)
and c =^c µ2V (C).)
Figure 4.12.3 Definition of a constant name
A variable name C ∈ NV can be said to be “constant” if ∀µ1V , µ2V ∈ V^NV , µ1V (C) =^c µ2V (C), where V^NV
denotes the set of functions f : NV → V.
Since the concrete parameter space has unknown elements, one cannot know what a is. (Concrete spaces are
an “implementation” matter, which cannot be standardized at the linguistic level in logic.) Therefore the
condition µV (A) =^c a is not meaningful. However, if C is a name which we know (somehow) has the property
µV (C) =^c a, we can define A by the condition µV (A) =^c µV (C). This is now importable into the
abstract name space as the condition: A =^a C, where =^a : NV × NV → {F, T} is the import of the concrete
relation =^c to the parameter name space.
Now the condition A =^a C is also not meaningful because C is not known. The problem here is that the
equality relation is fully democratic. To see how to escape from this problem, denote an ad-hoc abstract
single-parameter predicate PC by PC (X) = “X =^a C”. (This is equivalent to PC (X) = “µV (X) =^c a”.) Then
clearly PC (X) is true if and only if X =^a C. That is, ∀X, (PC (X) ⇔ (µV (X) =^c a)). So the predicate PC
characterizes the required constant C. That is, we can determine if any name X points to the fixed object
a by testing it with the predicate PC .
A predicate P ∈ NQ which satisfies a uniqueness rule, namely ∀x ∈ NV , ∀y ∈ NV , ((P (x) ∧ P (y)) ⇒ (x =^a y)),
characterizes a unique concrete object. The predicate PC defined by PC (X) = “µV (X) =^c a” does satisfy this
uniqueness requirement, but this alone cannot define PC because it does not define the unknown object a ∈ V.
The predicate PC (X) = “µV (X) =^c a” is specific to a, but there is no way to explicitly specify a.
We need to convert PC into a predicate which is meaningful. This can be done by defining P (X) = “∀y, y ∉ X”.
The predicate P does not refer to any a-priori knowledge of the identity of the empty set at all. So this
does define a constant name for the empty set. A name X is the name of the constant object (called the
“empty set”) if and only if P (X) is true.
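The way in which a characterizing predicate pins down a constant name can be sketched on a tiny concrete space. The example assumes a concrete space `V` of three hereditarily finite sets modelled by Python `frozenset` objects, and a name space of three names. The function `is_constant` is a hypothetical helper which tests the constancy condition over a given family of name maps.

```python
from itertools import product

# Hypothetical concrete space: three hereditarily finite sets.
EMPTY = frozenset()
V = [EMPTY, frozenset({EMPTY}), frozenset({EMPTY, frozenset({EMPTY})})]
NAMES = ["A", "B", "C"]

def P(obj):
    """P(X) = "forall y, y not in X", evaluated on the object named by X."""
    return all(y not in obj for y in V)

# All name maps from the name space to V, each represented as a dict.
all_maps = [dict(zip(NAMES, objs)) for objs in product(V, repeat=len(NAMES))]

def is_constant(name, maps):
    """True if every map in `maps` sends `name` to the same concrete object."""
    return len({mu[name] for mu in maps}) == 1

# Over unrestricted name maps, the name C is not constant.
unrestricted = is_constant("C", all_maps)

# Restricting to maps for which P holds of the object named by C pins C down.
constrained = is_constant("C", [mu for mu in all_maps if P(mu["C"])])
```

The predicate is true of exactly one object in this space, so the restricted family of name maps always sends C to that object, without the code ever naming the object in the definition of P.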
Next it must be noted that the predicate expression P (X) = “∀y, y ∉ X” depends on the membership
relation “∈”, which is in turn a fixed predicate which is imported from the concrete predicate space. This
has shifted the constancy problem from predicate parameters to predicates. It seems that the need to have
an a-priori mapping from some constant predicate names to particular concrete predicates is unavoidable.
On the other hand, the ZF axioms characterize the membership relation “∈” very precisely. If there are two
or more such membership relations on the concrete space, that is not a problem. All of the consequences
of ZF set theory will be valid for all maps µQ from the predicate name space NQ to the concrete predicate
space Q.
The “constant” empty set name is characterized by the predicate P (X) = “∀y, y ∉ X”, which depends on the
variable name “∈” for a membership relation which satisfies the ZF axioms. However, this is not a problem
which anyone worries about. The choice of predicate name map µQ : NQ → Q is very strongly constrained
by the ZF set axioms. Then in terms of the choice of the membership relation “∈”, the name ∅ is uniquely
constrained by the predicate P (∅), which is true only if the name “∅” points to the unique empty set for the
particular choice of predicate name map for “∈”. Then we have a =^c µV (∅).
In conclusion, the definition of a constant name (for a predicate parameter) requires a definition of equality
on the concrete parameter space, which implies that a first order language without equality cannot have
constant names, and at least one additional “constant” predicate must be specified in order to distinguish
one parameter from another, so that one will know which concrete object is being pointed to by the constant.
4.13. Logical quantifiers
4.13.1 Remark: The subject of this section is predicate calculus, which is the logic of parametrized
proposition families with the existential and universal quantifiers. Predicate calculus is an extension of
propositional calculus, which is the subject of Section 4.4.
4.13.2 Remark: The quantifier symbols ∀ and ∃ mean “for all” (the universal quantifier) and “for some”
(the existential quantifier) respectively. The existential quantifier may be defined in terms of the universal
quantifier. Thus ∃x, P (x) is defined to mean ¬(∀x, ¬P (x)). Notation 4.13.3 is an attempt to formalize the
meanings of these symbols.
4.13.3 Notation:
∀x, P (x) for any variable name x and predicate name P means “P (x) is true for all variables x”.
∃x, P (x) for any variable name x and predicate name P means “P (x) is true for some variable x”.
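For a finite domain, the definition of the existential quantifier in terms of the universal quantifier can be written out directly. This is a sketch only: the functions `forall` and `exists` and the sample predicates are names introduced here, and a finite `domain` stands in for the universe of variables.

```python
def forall(domain, pred):
    """Truth value of "forall x, pred(x)" over a finite domain."""
    return all(pred(x) for x in domain)

def exists(domain, pred):
    """Defined from the universal quantifier: not (forall x, not pred(x))."""
    return not forall(domain, lambda x: not pred(x))

domain = range(10)

def is_even(x):
    return x % 2 == 0

def is_negative(x):
    return x < 0

some_even = exists(domain, is_even)
some_negative = exists(domain, is_negative)
```

On an empty domain this definition makes every universal statement true and every existential statement false.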
4.13.4 Remark: The symbol ∃ does not mean “there exists” as many elementary texts erroneously claim.
It is a quantifier, not a verb. Sentences of the form “∃x such that P (x).” are wrong – and very annoying to
the cognoscenti. The correct form is “∃x, P (x).” or: “There exists x such that P (x).” Thus in colloquial
contexts one may read ∃x as “there exists x such that”, but not “there exists x”.
4.13.5 Remark: Although the ∃ symbol does not mean “there exists”, the symbol is mnemonic for the
letter “E” of the word “exists”. Similarly the ∀ symbol is mnemonic for “A” in the word “all”. The rotation
of the symbols through 180° is a relic of the olden days when typesetting used lead fonts. Using an existing
character presumably saved space in the font drawer (called the “case”).

4.13.6 Remark: The use of commas after every quantifier is intended to remove ambiguity. The comma
terminates the quantifier unambiguously. This is a good idea even if the following sub-expression is
parenthesized because juxtaposition of two sub-expressions could be confusing. It is better to simply make all
expressions easy to parse by using the quantifier-terminating commas.
4.13.7 Remark: There is some variation in notations for logical quantifiers. The following table gives a
sample of notations.
author             universal quantifier                    existential quantifier
EDM [34]           ∀xF (x), (x)F (x), ΠxF (x), ⋀xF (x)     ∃xF (x), (Ex)F (x), ΣxF (x), ⋁xF (x)
EDM2 [35]          ∀xF (x)                                 ∃xF (x)
KEM [122]          ∀xA(x)                                  ∃xA(x)
Lemmon [164]       (x)P x                                  (∃x)P x
Mendelson [165]    (x)A_1^1(x)                             (Ex)A_1^1(x)
Reinhardt [135]    ⋀_x P (x)                               ⋁_x P (x)
Shoenfield [169]   ∀xA                                     ∃xA
Szekeres [45]      ∀x(P (x))                               ∃x(P (x))
Kennington         ∀x, P (x)                               ∃x, P (x)
4.13.8 Remark: The universal and existential quantifiers are sometimes denoted by ⋀ and ⋁ respectively.
This choice of symbols may be justified by considering expressions of the form “P (x1 ) ∧ P (x2 ) ∧ P (x3 ) ∧ . . .”
and “P (x1 ) ∨ P (x2 ) ∨ P (x3 ) ∨ . . .” respectively. However, these notations would be confusing in the context
of tensor algebra. So they are not used in this book. (The subscript version is also inconvenient for the
vertical spacing of text, as is evident in Remark 4.13.7.)
4.13.9 Remark: The universal quantifier gives the maximum possible information about the truth values
of a predicate P because the statement ∀x, P (x) determines the truth value of the proposition P (x) for
all values of x in the universe of the logical system. By contrast, the existential quantifier gives almost
the minimum possible information about the truth values of a predicate P because the statement ∃x, P (x)
determines the truth value of P (x) for only one value of x in the universe, and we don’t even know which
value of x this is.
Although universal and existential quantifiers are superficially similar, since they are in some sense duals of
each other, the differences in information content between them demonstrate that they are fundamentally
different.
A similar comment was made in the case of conjuncts and disjuncts of propositions in Remark 3.5.9.
[ The following semantics remarks should be moved to the logic semantics chapter. ]
4.13.10 Remark: It is very difficult to specify notations for semantics. So Notation 4.13.3 necessarily
explains the basic quantifier notations in natural language. If one looks too closely at this short explanation
of the universal and existential quantifiers, however, some serious difficulties arise. The words “all” and
“some” are not easy to explain precisely. In the physical world, it is very often impossible to prove that all
the members of a class have a particular property. Even if the class is not infinite, it may still be impossible
to test the property for all members.
One might think that the word “some” is easier because the truth of the proposition is firmly proved as soon
as one example is found. But if ∃x, P (x) is true, it might still not be possible to find the single required
example x which proves the proposition. There is thus a clear duality between the two quantifiers. If the
class of variables x is infinite for empirical propositions P (x), it is never possible to prove that ∀x, P (x) is
true, and it is never possible to prove that ∃x, P (x) is false. (This duality follows from the fact that “true”
and “false” are duals of each other.)
The inability to establish quantified predicates by physical testing, in the case of empirical propositions, is
not necessarily a show-stopper. Propositions usually do not refer to the “real world” at all. Propositions are
usually attributes of models of the real world. And the quantified predicates may not arise from case-by-case
testing, but rather from some sort of a-priori assumption about the model. Then if the model does find a
contradiction with the real world, the model must be modified or limited in some way. This is how most
universal propositions enter into models. They are adopted because they are initially not contradicted by
observations, and the model is maintained as long as it is useful.
In the case of mathematical logic, it is sometimes impossible to find counter-examples to universal
propositions, or examples for existential propositions. An important example of this is the idea of a Lebesgue
non-measurable set. No examples of these sets can be found in the sense of being able to write down a rule
for determining what the elements of such a set are. If the axiom of choice is added to the ZF set theory
axioms, it can be proved by means of axioms and deduction rules that such sets must exist. In other words,
it is shown that ∃x, P (x) is true for the predicate P (x) = “the set x is Lebesgue non-measurable”. But no
examples can be found. This does not prove that the assertion is false, which is very frustrating if one thinks
about it too much.
The difficulties with universal and existential quantifiers in mathematics are in one sense the reverse of the
difficulties for empirical propositions. The empirical claim that “all swans are white” is always vulnerable
to the discovery of new cases. So empirical universals can never be certain. But in the case of mathematics,
universal propositions are often quite robust. For example, one may be entirely certain of the proposition:
“All squares of even integers are even.” If anyone did find a counter-example, namely an even integer x whose
square x^2 is not even, one could apply a quick deduction (from the definitions) that x^2 is not even only if x
is not even, which would be a contradiction. More generally, universal mathematical propositions are usually proved by demonstrating
the absurdity of any counter-example. There is no corresponding method of proof to show the total absurdity
of non-white swans.
4.13.11 Remark: In Remark 4.3.10, it is mentioned that unquantified logical expressions are
straightforward to parse and attach semantics to. Figure 4.13.1 shows an analogous attempt to parse and attach
semantics to a quantified logical expression: ∃b, (b ∈ X ∧ F (a, b)). The use of bound variables implies that
the nodes of the parse tree do not have a fixed meaning. The tree must be traversed potentially an infinite
number of times to determine the truth value of the entire quantified expression.
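(Diagram: the parse tree for ∃b, (b ∈ X ∧ F (a, b)), labelled “syntax”, has the children ∃b and b ∈ X ∧ F (a, b).
The latter has the children b ∈ X and F (a, b), with leaves b, X, a, b. The corresponding function tree,
labelled “semantics”, has the nodes φ∃, φ∧, φ∈ and F over the same leaves.)
t(∃b, (b ∈ X ∧ F (a, b))) = max_b φ∧(φ∈(b, X), F (a, b))
Figure 4.13.1 Example quantified logical expression tree with syntax and semantics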
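The semantic formula in Figure 4.13.1 can be evaluated directly when the universe is finite. The sketch below makes several assumptions for illustration: truth values are encoded as 0 and 1 so that the maximum acts as a disjunction, the universe for the bound variable b is a small range, and the set `X` and the predicate `F` are hypothetical choices.

```python
# Truth values are encoded as 0 (false) and 1 (true), so "max" acts as "or".
UNIVERSE = range(6)   # hypothetical finite universe for the bound variable b
X = {1, 3, 5}         # hypothetical set X

def phi_in(b, s):
    """Semantic function for the membership relation."""
    return int(b in s)

def phi_and(p, q):
    """Semantic function for conjunction on 0/1 truth values."""
    return min(p, q)

def F(a, b):
    # Hypothetical two-parameter predicate: "a + b equals 4".
    return int(a + b == 4)

def t_exists(a):
    """Truth value of "exists b, (b in X and F(a, b))" for the free variable a."""
    return max(phi_and(phi_in(b, X), F(a, b)) for b in UNIVERSE)

values = [t_exists(a) for a in range(4)]
```

The inner expression is re-evaluated once for each value of the bound variable b, which is the repeated traversal of the tree described in the text. With an infinite universe the loop would not terminate in general.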
4.13.12 Remark: The problem of interpreting infinite sets within set theory is really identical to the
infinity problem in logic. The predicate calculus already has all of the infinity difficulties. On the other
hand, no infinite set of propositions can ever be written down and checked one by one. So one can never
really prove that any universally quantified infinite family of propositions is satisfied. This seems to put
infinite conjunctions in the same general category as propositions which can never be proved true or false.
The infinite logic of limits in calculus is so overwhelmingly important in physics that it is difficult to simply
ignore infinities. Perhaps it is best to think of the word “infinite” as meaning “finite but unbounded”. Then
one can carry through the analysis of limits without fear that the elements of a sequence will vanish into a
vacuum while being observed.
There are adequate metaphors for infinity concepts. For example, the idea that no matter how large a
number is, there will always be a larger number, is very convincing. We simply cannot imagine that any
number could be so big that we could not add 1 to it. But this, and any other infinity metaphor, breaks
down when we consider straws on the backs of camels. It is very difficult to imagine that a single straw will
break the back of a camel. But we also know that a weight of 100 tons cannot be carried by any camel. So
there must be a point at which the camel will collapse. If we replace straws with single hairs or even lighter
objects, it is even more difficult to imagine the camel collapsing from such a tiny weight increment. (See
also Remark 2.11.15 regarding camels and straw.) As mentioned in Remark 2.11.2, the boundedness of the
integers seems entirely plausible. But we just can’t imagine the breaking point where the universe’s ability
to represent an integer would break down. (People can’t imagine their own death either, but this does not
imply that humans are immortal. People’s minds are simply too limited to imagine some things.)
It should be noted that although mathematical physics does rely very heavily on limits in its formulation,
it is the results of differentiation and integration of functions which are useful in physics, not the limiting
processes themselves. Limits just provide a basis for developing algorithms which give the right answers.
Furthermore, the atomic (or elementary particle) nature of matter and the quantum limits on measurement
imply that no differentiation or integration (or any other kind of prediction of models) can ever be measured
to infinite accuracy or even unbounded accuracy. Therefore it is not even clear that the laws of physics can
be verified in any circumstances at all. Exact differentiation and integration lie in the realm of metaphysics
because they are defined within inferred models which underlie phenomena. So in this sense, limits are not
directly used in physics, and are therefore not validated by physics.
4.13.13 Remark: The set of proposition parameters in a first order logic should surely be a class in the
sense of NBG set theory rather than a set. (See Section 5.12 for NBG set theory.) For example, ZF set
theory is supposedly a first order logic, and the “set of all sets” is not a set. (See also Remark 4.1.4 regarding
sets and classes of propositions.)
On the other hand, maybe the “set” of all individual variable names need not necessarily be sufficient to
represent each element of the proposition parameter “set” by a separate name. So perhaps a ZF-style set
suffices for the proposition parameter space.
It’s important to distinguish between the set of variable names in the language and the “set” of individual
objects (variables) in the semantic space of the language. (The variable names belong to the discussion
context. The concrete variables belong to the discussed context. These two contexts are discussed in
Remark 4.3.1.) The set of variable names is limited by the information-carrying ability of the medium in
which one writes sentences in the language. This set must therefore presumably be no larger than countably
infinite. But the semantic space with which one interprets the individual variable names could be a proper
class (in the sense of NBG set theory).
The observation that the set of names must be finite (or countably infinite), because of the bandwidth
limitations of human writing and talking, whereas the class of objects in the concrete logical system may
have a completely arbitrary cardinality, is in essence the reason why pixie (or dark) sets and numbers must
exist. Although the ZF axioms imply that the real numbers, for example, are uncountable, the variable name
space for the predicate calculus cannot refer to all of them. (For dark sets and numbers, see for example
Section 2.10.)
4.14. Predicate calculus

[ This section will present the logical axioms and the deduction rules for general predicate calculus. Then
particular first order languages may be derived from this. ]
[ Discuss completeness theorems for predicate calculus near here. For example, in a complete logical system
there should be a meta-theorem that if A is not provable, then A is provably false. ]
[ Discuss also first-order and second-order predicate logics (or languages). Second-order logics permit predicate
names to be the bound variables in quantifiers. ]
[ Possibly present a version of group theory which is formalized as a first-order language which does not need
set theory. Perhaps this could be done just after Section 9.2? ]
4.14.1 Remark: In this book, QC is an abbreviation for predicate calculus. The Q suggests the word
“quantifier”. There isn’t just one predicate calculus. But there is a core set of common logical axioms and
deduction rules which apply to every predicate calculus.

4.14.2 Remark: There is essentially only one propositional calculus, although there are very numerous
formulations of it. But in the case of predicate calculus, there are many very different first order languages
because they have a wide variety of sets of predicates, functions and axioms.
[ Most of Remark 4.14.3 almost certainly should be in the logic semantics chapter. ]
4.14.3 Remark: Truth tables are a tabular representation of the method of exhaustive substitution. When
there are N propositions in a logical expression, there are 2^N combinations to test. When there are infinitely
many propositions, the exhaustive substitution method is evidently not directly applicable. (There is a similar
comment in Remark 4.4.4.)
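The exhaustive substitution method for finitely many propositions can be sketched in a few lines of Python. This is only an illustration: the function names (implies, truth_table, is_tautology) and the two sample expressions are chosen here and do not appear elsewhere in this book.

```python
from itertools import product


def implies(a, b):
    """Material implication on Python booleans."""
    return (not a) or b


def truth_table(expr, num_props):
    """Return every row (values, result) of the truth table of expr."""
    rows = []
    for values in product((False, True), repeat=num_props):
        rows.append((values, bool(expr(*values))))
    return rows


def is_tautology(expr, num_props):
    """Exhaustive substitution: expr must be true in all 2**num_props rows."""
    return all(result for _, result in truth_table(expr, num_props))


def modus_ponens(a, b):
    """(A and (A => B)) => B, which is true in every row."""
    return implies(a and implies(a, b), b)


def converse_error(a, b):
    """(A => B) => (B => A), which fails when A is false and B is true."""
    return implies(implies(a, b), implies(b, a))
```

The table has one row per combination of truth values, so the work doubles with each additional proposition, and the loop has no analogue for an infinite family of propositions.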
Mendelson [165], page 56, makes the following comment on the inapplicability of the truth table approach
to the evaluation of logical expressions involving parametrized families of propositions.
In the case of the propositional calculus, the method of truth tables provides an effective test as to
whether any given statement form is a tautology. However, there does not seem to be any effective
process to determine whether a given wf is logically valid, since, in general, one has to check the
truth of a wf for interpretations with arbitrarily large finite or infinite domains. In fact, we shall see
later that, according to a fairly plausible definition of “effective”, it may actually be proved that
there is no effective way to test for logical validity. The axiomatic method, which was a luxury in
the study of the propositional calculus, thus appears to be a necessity in the study of wfs involving
quantifiers, and we therefore turn now to the consideration of first-order theories.
This observation requires careful interpretation. It does not imply that parametrized families of proposi-
tions can derive their validity only from axioms. The semantics of parametrized propositions is not totally
different to the semantics of unparametrized propositions. The change of notation to indicate parameters,
and the introduction of universal and existential quantifiers to indicate infinite conjunctions and disjunctions
respectively, does not imply that the semantic foundations are irrelevant to the determination of truth and
falsity of logical expressions.
It must be remembered that the axiomatization of logic is merely a formalization of a kind of “logical
algebra”. (This is summarized in Remark 3.1.2.) There is an analogy here to numerical algebra, where a
proposition such as x^2 + 2x = (x + 1)^2 − 1 may be shown to be true for all x ∈ ℝ without needing to substitute
all values of x to ensure the validity of the formula. One uses algebraic manipulation rules which preserve the
validity of propositions, such as “add the same number to both sides of an equation” and “multiply both sides
of the equation by a non-zero number”, together with rules of distributivity, commutativity, associativity
and so forth. But these rules are based directly on the semantics of addition and multiplication. The rules
are valid if and only if they match the semantics of the arithmetic operations. In the same way, the axioms
of predicate calculus are not arbitrary or optional. The predicate calculus axioms and rules derive their
validity entirely from the propositional calculus, which derives its axioms and rules from the semantics of
propositional logic.
Consequently, the validity of predicate calculus is derived entirely from exhaustive substitution in principle.
It is not possible to carry out exhaustive substitution for infinite families of propositions. But the rules and
axioms are determined from a study of the meanings of logical expressions.
Thus predicate calculus is merely a formalization of the “algebra” of parametrized families of propositions,
and the objective of this algebra is to “solve” for the truth values of particular propositions and logical
expressions, and also to show equality (or other relations) between the truth value functions represented by
various logical expressions which may involve quantifiers.
It is too easy to fall into the temptation to regard truth in predicate calculus as arising solely from manipulations
of symbols according to apparently somewhat arbitrary rules and axioms. The truth of a proposition
does not arise from the hocus-pocus with the line-by-line deductions. The truth arises from the semantics
of the proposition, which is merely discovered or inferred by means of line-by-line argument.
The above comment by Mendelson [165], page 56, continues into the following footnote.
There is still another reason for a formal axiomatic approach. Concepts and propositions which
involve the notion of interpretation, and related ideas such as truth, model, etc., are often called
semantical to distinguish them from syntactical precise formal languages. Since semantical notions
are set-theoretic in character, and since set theory, because of the paradoxes, is considered a rather

shaky foundation for the study of mathematical logic, many logicians consider a syntactical ap-
proach, consisting in a study of formal axiomatic theories using only rather weak number-theoretic
methods, to be much safer.
The paradoxes in set theory seem to all arise from self-referential propositions. (This is discussed at length
in Section 5.7.) But self-referential propositions are really a logic problem. It might be a little unfair to
blame set theory for paradoxes which arise in logic which is imported from the predicate calculus in the
first place! The paradoxes disappear when sets are defined non-self-referentially. So it might be safe to
base logic on a semantical foundation of set theory after all. More important than this, the validity of
predicate calculus is unquestionably derived from the semantics of set theory. This cannot be avoided. In
fact, detaching predicate calculus from set theory creates vast opportunities to deviate from the natural
meanings of logical propositions and head off into territory populated by ever more bizarre paradoxes of a
logical kind. (Examples of this could include multi-valued logic and inconsistency-tolerant logic.) At the
very least, the axioms of predicate calculus should be based on a consideration of the underlying semantics.
This should prevent wild departures from the original purposes of the study of logic.
4.14.4 Remark: Lemmon [164], page 91, has the following comment on the inapplicability of the truth
table approach to the predicate calculus. (The word “sequent” here roughly means a theorem of propositional
calculus.)
At more complex levels of logic, however, such as that of the predicate calculus [. . . ], the truth-table
approach breaks down; indeed there is known to be no mechanical means for sifting expressions
at this level into the valid and the invalid. Hence we are required there to use techniques akin to
derivation for revealing valid sequents, and we shall in fact take over the rules of derivation for
the propositional calculus, expanding them to meet new needs. The propositional calculus is thus
untypical: because of its relative simplicity, it can be handled mechanically—indeed, [. . . ] we can
even generate proofs mechanically for tautologous sequents. For richer logical systems this aid is
unavailable, and proof-discovery becomes, as in mathematics itself, an imaginative process.
The requirement for the use of “the rules of derivation” is not very different to the requirement for deduction
and reduction procedures in numerical algebra. The task of predicate calculus is to solve problems in “logical
algebra”. If infinite sets of variables are present in any kind of algebra, numerical or logical, it is fairly clear
that one must use rules to handle the infinite sets rather than particular instances. But this does not imply
that the methods of argument, devoid of semantics, have a life of their own. By analogy, one may specify an
arbitrary set of rules for manipulating numerical algebra expressions, equations and relations, but if those
rules do not correspond to the underlying meaning of the expressions, equations and relations, those rules
will be of recreational interest at best, and a waste of time and resources at worst. To put it simply, semantics
does matter!
It is not quite clear that “an imaginative process” is required, whatever that may mean. Perhaps the author
meant that semantics has a role in proof discovery. It is remarkable that mathematics, historically speaking,
was originally entirely a matter of “imagination”. Then the communications between mathematicians were
codified symbolically to the extent that some mechanization and automation was possible. But then, when
mechanical methods are inadequate, the application of “imagination” is almost regarded as a necessary evil,
whereas it was originally the whole business!
One could mention a parallel here with the industrial revolution, during which a vast proportion of human
productive activity was mechanized and automated. The observation that the design, use and repair of
machines requires human intervention, including manual dexterity and “an imaginative process”, is perhaps
annoying at times, but one should not forget that humans were an endangered species of monkey only a couple
of hundred thousand years ago. The automation of a large proportion of human thought by mechanizing
logical processes can be as beneficial as the automation of economic goods and services. But semantics
defines the meaning and purposes of logical processes. So semantics is as necessary to logic as human needs
are to the totality of economic production. (A cynic would possibly comment here that sometimes too many
people forget that the economy is supposed to serve humans, not vice versa. By analogy, one should not
permit blind methods of logic to force mathematicians to conclusions which seem totally ridiculous. If the
conclusions are too bizarre, then maybe it’s the mechanized logic which needs to be fixed, not the minds of
mathematicians.)
[ Express the QC axioms in Mendelson [165], page 57, in terms of min/max operators combined with clipping.
Must define “clip” functions like x ↦ max(0, x) somewhere. For example, A ⇒ B(x) is equivalent to
max(t(A) − t(B(x)), 0) = 0. This approach is not very satisfying, but it does deliver a language for expressing
the semantics of predicate calculus. ]
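The clipping idea in the preceding note may be tried out on a finite domain. The following Python sketch is not a formalization. It assumes numeric truth values 1 for true and 0 for false, and it reads the universal and existential quantifiers as min and max respectively over a finite list of truth values. The names clip, t, implies_by_clip, forall and exists are chosen here for illustration only.

```python
def clip(x):
    """The clip function x -> max(0, x)."""
    return max(0, x)


def t(value):
    """Numeric truth value: 1 for true, 0 for false."""
    return 1 if value else 0


def implies_by_clip(ta, tb):
    """A => B holds exactly when clip(t(A) - t(B)) equals 0."""
    return clip(ta - tb) == 0


def forall(truth_values):
    """Universal quantifier as a minimum (1 on an empty domain)."""
    return min(truth_values, default=1)


def exists(truth_values):
    """Existential quantifier as a maximum (0 on an empty domain)."""
    return max(truth_values, default=0)
```

On an infinite domain, the min and max would have to be replaced by infimum and supremum, which can no longer be computed by enumeration.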
4.15. Equality
[ See EDM2 [35], 411.J, for predicate calculus with equality. ]
4.15.1 Remark: The concept of uniqueness requires a definition for the equality relation, which is usually
denoted as “=”. (See Remark 45.2.6 for the history of this symbol.)
A definition of equality must be provided somehow in any set theory, for example in Zermelo-Fraenkel set
theory. There are (at least) two ways to introduce equality into predicate logic.
(1) Assume that the concrete logical system which is being modelled has its own equality relation which
defines the identity of concrete objects. Import the concrete equality relation into the abstract predicate
logic. The imported equality relation then necessarily satisfies the “reflexivity of equality” axiom (4.15.1)
and “substitutivity of equality” axiom (4.15.2). (See Mendelson [165], page 75, for these axioms.)
∀A, A = A (4.15.1)
wff α (A = B) ⇒ (α(A) ⇔ α(B)). (4.15.2)
(2) Define the abstract equality relation in terms of some other relation (such as set membership) which is
imported from the concrete logical system. Then specify that the reflexivity and substitutivity axioms
be satisfied for the defined equality relation.
In the case of set theory, the equality A = B may be defined for sets A and B to mean the proposition
∀x, ((x ∈ A) ⇔ (x ∈ B)), where “∈” denotes a set membership relation which is imported from a concrete
predicate logic. This is an example of approach (2).
By contrast, in approach (1), the presumed importation of the equality relation from a concrete predicate
logic automatically yields (4.15.1) and (4.15.2). So any two names which refer to the same object may
be used interchangeably in any proposition. (See Remark 5.2.3 for more detailed discussion of these two
approaches to defining equality.)
4.15.2 Remark: The importation of a concrete equality relation into the abstract name space is achieved
by defining A = B to mean µ_V(A) =^c µ_V(B) for all A, B ∈ N_V, where =^c denotes the concrete equality
relation =^c : V × V → {F, T}.
Similarly, the truth value P(A) for any one-parameter predicate name P, and any variable name A, is defined
by P(A) = µ_Q(µ_V(A)).
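The importation of equality through a name map can be illustrated with a toy model in Python. Everything concrete in this sketch is hypothetical: the object labels, the variable names, and the modelling of the predicate map as a dictionary from predicate names to concrete predicates are assumptions made for the illustration. They are not definitions from this book.

```python
# Hypothetical name map: variable names -> concrete objects.
mu_V = {"A": "obj1", "B": "obj1", "C": "obj2"}

# Hypothetical predicate map: predicate names -> concrete predicates.
mu_Q = {"P": lambda obj: obj == "obj1"}


def concrete_eq(u, v):
    """Stand-in for the concrete equality relation on the object space."""
    return u == v


def abstract_eq(name1, name2):
    """Two names are equal when they refer to the same concrete object."""
    return concrete_eq(mu_V[name1], mu_V[name2])


def truth_value(pred_name, var_name):
    """Truth value of a one-parameter predicate name applied to a variable name."""
    return mu_Q[pred_name](mu_V[var_name])
```

In this model the names A and B are distinct symbols, but they are equal in the abstract sense because both refer to obj1. Reflexivity and substitutivity then hold without any further stipulation, because they are inherited from the concrete equality.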
4.15.3 Remark: Equality may be defined in a logical system which is not a set theory. The minimum
requirements for an equality definition are the three well-known conditions for an equivalence relation as
presented in Section 6.4, namely reflexivity, symmetry and transitivity. These three conditions are not
limited to set theories. But they are not automatically defined in all logical systems.
4.15.4 Remark: There are (at least) two ways of interpreting the equality concept in symbolic logic.
(1) Linguistic level: Regard equality as an abstract relation among symbols without any reference to the
meaning of the symbols in some other space of entities.
(2) Semantic level: Regard the symbols as mere temporary labels which refer always to elements of an
externally defined set of objects.
In case (1), the symbols are the real “things” which are defined in the logical system. So equality is an
abstract relation between symbols. Usually this relation would have some sort of connection through axioms
to other predicates in the system.
In case (2), the relation of equality is defined by the association of the symbols with underlying objects.
Then equality between symbol A and symbol B means that symbols A and B refer to the same object in the
set of objects. The equivalence relation conditions (reflexivity, symmetry and transitivity) are automatically

satisfied by importing the concrete equivalence relation from the concrete object space, from which the
language acquires its semantics.
One might fairly ask how the equality relationship is defined in the semantic space of concrete objects.
However, this is an application question. The abstract theory of a predicate calculus with equality is
applicable to any concrete object space which satisfies the stated requirements. It is the “user’s” responsibility
to ensure that the requirements are met.
4.16. Uniqueness
4.16.1 Remark: There is no absolute reason why there could not be more than one equality relation in
a logical system. Then one could define uniqueness for each equality relation. However, logical systems
typically have a single equality relation which is thought of as defining the correspondence of names to
underlying objects.
4.16.2 Remark: Mendelson [165], page 168, uses the adjective “univocal” to refer to ordered-pair-set
relations which have the uniqueness property which is specified in Definition 6.11.3, but this definition may
be applied to 2-parameter predicates P as follows.
∀x, ∀y, ∀z, (P (x, y) ∧ P (x, z)) ⇒ y = z.
4.16.3 Remark: The shorthand “ ∃# ” is often used in mathematics to mean “for some unique”. For
example, “∃# x, P (x)” should mean “for some unique x, P (x) is true”. This may be expanded to the statement
“for some x, P (x) is true; and there is at most one x such that P (x) is true”. So it is really shorthand for
two statements. This is stated more precisely in Notation 4.16.4.
4.16.4 Notation: ∃# x, P(x), for a predicate P and variable name x, means:
(∃x, P(x)) ∧ (∀x, ∀y, ((P(x) ∧ P(y)) ⇒ (x = y))).
4.16.5 Remark: If the predicate P in Notation 4.16.4 is a very complicated expression, possibly possessing
several parameters, the longhand proposition could be quite irksome.
It is important in practice to not ignore additional predicate parameters because uniqueness is usually
conditional upon other parameters being restricted in some way. In colloquial contexts, the “∃# ” notation
is sometimes used in such a way that the dependencies are ambiguous. This can be avoided by ensuring
that all quantifiers and conditions are written out fully and in the correct order. Notation 4.16.6 gives an
extension of Notation 4.16.4 to explicitly show other parameters of the predicate P .
4.16.6 Notation: ∃# x, P(y_1, … y_m, x, z_1, … z_n), for a predicate P with m + n + 1 parameters for
non-negative integers m and n, and a variable name x, means:
(∃x, P(y_1, … y_m, x, z_1, … z_n))
∧ (∀u, ∀v, ((P(y_1, … y_m, u, z_1, … z_n) ∧ P(y_1, … y_m, v, z_1, … z_n)) ⇒ (u = v))).
4.16.7 Remark: In terms of the cardinality notation in set theory, one might write ∃# x, P (x) informally
as “#{x; P (x)} = 1”, meaning that there is precisely one thing x such that P (x) is true. (Existence means
that #{x; P (x)} ≥ 1 whereas uniqueness means that #{x; P (x)} ≤ 1.) But the notation “∃# ” corresponds
to a logic concept which is not restricted to set theory.
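On a finite domain, the longhand meaning of unique existence and the informal cardinality reading can be compared directly. The following Python sketch assumes a finite domain whose elements are pairwise distinct. The function names exists_unique and count_satisfying are chosen here for illustration.

```python
def exists_unique(domain, P):
    """Longhand form: (exists x, P(x)) and (for all x, y, (P(x) and P(y)) => x = y)."""
    items = list(domain)
    existence = any(P(x) for x in items)
    uniqueness = all(
        (not (P(x) and P(y))) or x == y
        for x in items
        for y in items
    )
    return existence and uniqueness


def count_satisfying(domain, P):
    """Number of elements x of the domain for which P(x) is true."""
    return sum(1 for x in domain if P(x))
```

The existence part corresponds to a count of at least 1 and the uniqueness part to a count of at most 1, so the two functions agree precisely when the count equals 1. The longhand form has the advantage that it does not need a notion of cardinality.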
[ Formalize the idea that if something exists and is unique, then it may be given a name and can be used in
later definitions and theorems. ]
4.16.8 Remark: Much of pure mathematics is concerned with proving that sets (or numbers, or functions)
exist and are unique. A function is defined to have a value which exists and is unique. Much of partial
differential equations theory is concerned with existence and uniqueness. So the ∃# quantifier represents an
important concept. In particular, if a set exists and is unique, it can be used in a definition. Definitions
mostly define things which exist and are unique.
One may use the word “the” for unique objects. For example, the Zermelo-Fraenkel empty set axiom
(Definition 5.1.26 (2)) states that ∃A, ∀x, x ∉ A. It is easily proved that ∃# A, ∀x, x ∉ A. Therefore the set A
may be given the name “the empty set” and a specific notation “∅”. (See Theorem 5.8.2 and Notation 5.8.4.)

[ Possibly give formal definitions of the “multiplicity quantifiers” in Remark 4.16.9. ]
4.16.9 Remark: It would be convenient to have notations for concepts such as “there are at least 2 things
x such that P(x) is true”, notated “∃_2 x, P(x)”. More generally, “there are between m and n things such that
P(x) is true” for cardinal numbers m ≤ n, notated “∃_m^n x, P(x)”. Then ∃#_n would be a sensible shorthand
for ∃_n^n. Hence the quantifiers ∃#, ∃#_1 and ∃_1^1 are equivalent.
The notation “∃^n x, P(x)” should mean: “There are at most n things such that P(x) is true.” So “∃^n”
should mean the same as “∃_0^n”. (See Exercises 46.1.6, 46.1.7, 46.1.8, 46.1.9, 46.1.10, 46.1.11, 46.1.12, 46.1.13
regarding expressions for “∃_m^n”.)
These kinds of generalized uniqueness and existence quantifiers could be referred to as “multiplicity quanti-
fiers”.
4.16.10 Remark: Perhaps the generalized existential quantifier notations in Remark 4.16.9 are best written
out in longhand. The simple case ∃_2 x, P(x) may be defined as ∃x, ∃y, (P(x) ∧ P(y) ∧ (x ≠ y)), but
the more general cases become rapidly more complex. The slightly more complex case ∃#_2 x, P(x) (that is,
∃_2^2 x, P(x)) would mean
(∃x, ∃y, (P(x) ∧ P(y) ∧ (x ≠ y))) ∧ (∀x, ∀y, ∀z, ((P(x) ∧ P(y) ∧ P(z)) ⇒ (x = y ∨ y = z ∨ x = z))).
4.16.11 Remark: Just as “unique” means that there is one thing only, the word “duplique” must mean
that there are two things only. The adjectives for 3 and 4 might be “triplique” and “quadruplique” respec-
tively. (This terminology is highly conjectural of course.)
Just as the noun “existence” means that there exists at least one thing, the word “duplicity” might mean
that two things exist. (The corresponding adjective might be “duplicate” or “duplex”.) The nouns for 3 and
4 might be “triplicity” and “quadruplicity” respectively.
4.16.12 Theorem: The statement ∃^n x, P(x) is equivalent to ¬(∃_{n+1} x, P(x)) for non-negative integers n.
So ∃_m^n x, P(x) is equivalent to (∃_m x, P(x)) ∧ ¬(∃_{n+1} x, P(x)).
The statement ∃_n x, P(x) is equivalent to ¬(∃^{n−1} x, P(x)) for positive integers n.
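The equivalences between the “at least”, “at most” and “between” quantifiers can be checked by brute force on a small finite domain. The following Python sketch is a sanity check, not a proof. It interprets the quantifiers by counting, which is an assumption valid only for finite domains, and all function names are chosen here for illustration.

```python
from itertools import product


def count(domain, P):
    """Number of distinct elements x of the domain with P(x) true."""
    return sum(1 for x in set(domain) if P(x))


def at_least(m, domain, P):
    """There are at least m things x such that P(x) is true."""
    return count(domain, P) >= m


def at_most(n, domain, P):
    """There are at most n things x such that P(x) is true."""
    return count(domain, P) <= n


def between(m, n, domain, P):
    """There are between m and n things x such that P(x) is true."""
    return at_least(m, domain, P) and at_most(n, domain, P)


def theorem_holds(domain_size):
    """Check the equivalences for every predicate on a domain of the given size."""
    domain = list(range(domain_size))
    for bits in product((False, True), repeat=domain_size):
        P = lambda x, bits=bits: bits[x]
        for n in range(domain_size + 2):
            if at_most(n, domain, P) != (not at_least(n + 1, domain, P)):
                return False
            if n >= 1 and at_least(n, domain, P) != (not at_most(n - 1, domain, P)):
                return False
            for m in range(n + 1):
                expected = at_least(m, domain, P) and not at_least(n + 1, domain, P)
                if between(m, n, domain, P) != expected:
                    return False
    return True
```

A call such as theorem_holds(4) runs through all 16 predicates on a four-element domain.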
[ Under what conditions can one say that ∃x, x = x is a tautology? ]

[ According to Mendelson [165], page 47, a proposition of the form “∀y, α” for some wff α is the same as
the unquantified wff α if α does not refer to the variable y. This is a definition of the meaning of “∀y, α”.
This expression requires definition because the standard definition of “∀y” applies only in the case that the
quantified wff refers to y. Formalize these ideas axiomatically. ]
[ Are tautologies defined for predicate calculus? Or are they only defined in propositional calculus? ]
4.16.13 Theorem [qc]: The following statements are tautologies.

(i) ∀x, x = x.
(ii) ∃y, y = x, for any x.
(iii) ∀x, ∃y, y = x.
[ Provide a more formal proof of Theorem 4.16.13. ]
Proof: Part (i) follows from the definition of equality. Part (ii) follows by noting that x = x. Therefore
y = x is satisfied by substituting x for y. Part (iii) is essentially the same as part (ii).
[ Need to formalize the idea that a symbol like ∅ can represent a particular individual variable in a predicate
logic, for example defined by “x; ∀y, y ∉ x” without the curly brackets. Similarly define notations like
ordered pairs {a, b} in terms of an expression which satisfies unique existence. ]

Chapter 5
Sets
5.1 Zermelo-Fraenkel set theory
5.2 The ZF extension axiom
5.3 The ZF empty set, pair, union and power set axioms
5.4 The ZF replacement axiom
5.5 The ZF regularity axiom
5.6 The ZF infinity axiom
5.7 Russell’s paradox
5.8 ZF set theory definitions and notations
5.9 Axiom of choice
5.10 Axiom of countable choice
5.11 Zermelo set theory
5.12 Bernays-Gödel set theory
5.13 Basic properties of binary set unions and intersections
5.14 Basic properties of general set unions and intersections
5.15 Closure of set unions under arbitrary unions
5.16 Specification tuples
5.0.1 Remark: Set theory is a foundation layer upon which most of mathematics may be built. In this
book, however, set theory is explained in terms of an even lower foundation layer, mathematical logic,
which is presented in Chapters 3 and 4. So it seems reasonable to commence a systematic explanation of
modern mathematics with logic, progressing to set theory, followed by the rest of mathematics. (And then
mathematical physics rests firmly and comfortably on such a solid, multi-layered foundation of mathematics.)
On the other hand, Halmos [160], page vi, said:
[. . . ] general set theory is pretty trivial stuff really, but, if you want to be a mathematician, you
need some, and here it is; read it, absorb it, and forget it.
It is true that mathematics can be done successfully without first studying set theory or mathematical logic.
But some people have a strong desire to understand what everything means, not just how to do it. Since
this book is strongly oriented towards meaning and understanding, as opposed to the achievement of mere
calculational prowess, the presentation commences with the foundations. Some people study set theory just
because “you need some”. They prefer to let dedicated specialists worry about the foundations. But the
mathematician (and mathematical physicist) who has revolutionary intentions, not content with the gradual
evolutionary development of their subject, will desire an in-depth understanding of the fundamentals.
5.0.2 Remark: Mathematics is full of perplexing and incomprehensible ambiguities. So it is always useful
to be able to wave one’s hands and claim that everything can be made meaningful using “rigorous set
theory”. This is like the ancient Greeks explaining events (e.g. in the Iliad) in terms of the whims of deities
on Mount Olympus. As long as no one went there, no one could prove that the whims of deities on Mount
Olympus didn’t explain events. Similarly, one should not examine set theory too closely because the myth
of a solid basis in rigorous set theory is useful to maintain. Even in times when most logicians thought that

set theory was irreparably self-inconsistent, mathematics progressed regardless. Mathematics did not stop
for a decade while waiting for paradoxes in set theory to be resolved. So don’t panic!
5.0.3 Remark: It is not a priori obvious that mathematics can be formalized in terms of symbolic logic.
The overwhelming majority of mathematical literature is written in terms of informal arguments. Very
often, a close examination of even the best-written research papers reveals ambiguities, self-contradictions
and meaningless assertions.
It is not certain that the ruthless reduction of real-life mathematical literature would not remove some of
the mathematical corpus which is considered true by the majority of the best mathematicians. Nevertheless,
most mathematicians would accept that any argument which is shown to be either wrong or meaningless
when formalized should be rejected from the corpus, no matter how strongly supported the argument may
be by “intuition”. The view is taken in this book that any argument which cannot be validated in formal
set theory must be rejected from the corpus of mathematical knowledge.
5.0.4 Remark: Constructing all mathematical objects from ZF sets is like building all houses, cars and
computers out of empty boxes. Although it is possible to build mathematics from highly convoluted con-
structions which are defined entirely in terms of the empty set, this is not at all natural.
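To give some idea of how such constructions look, the following Python sketch builds the first few non-negative integers out of nothing but the empty set, using nested frozenset objects. The construction used is the well-known von Neumann one, in which each number is the set of all smaller numbers. It is chosen here purely for illustration, and the function names are hypothetical.

```python
def successor(n):
    """The successor of n is the union of n with the singleton containing n."""
    return n | frozenset([n])


def von_neumann(k):
    """Build the number k starting from the empty set."""
    n = frozenset()
    for _ in range(k):
        n = successor(n)
    return n


def show(s):
    """Write a nested frozenset using only curly brackets and commas."""
    parts = sorted((show(x) for x in s), key=lambda text: (len(text), text))
    return "{" + ", ".join(parts) + "}"
```

Already the number 3 is written as {{}, {{}}, {{}, {{}}}}, and the written form roughly doubles in length with each successor.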
An alternative to the sets-only approach for the definition of mathematical objects is the axiomatic approach,
where objects such as integers, real numbers and other low-level structures are defined by axioms. Then
the particular set-based constructions of these structures are merely representations for the corresponding
axiomatic systems.
By allowing structures other than sets to be defined axiomatically, mathematical constructions will not then
all be derivable from empty sets as their core underlying object. Some minimalism is thereby sacrificed for
the sake of naturalism. However, minimalism in the foundations implies a vast amount of pointless work
in the construction phase, constructing all mathematical objects out of the empty set. The fact that it
can be done does not imply that it should be done. Suitable candidates for axiomatization (i.e. definition
independent of pure set theory) include numbers, groups, fields, linear spaces, tensors and tangent vectors.
Allowing some classes of mathematical objects to be defined independently of set theory could be regarded
as a kind of diversification strategy or insurance policy. If it turns out later that there is some fundamental
flaw in pure set theory which cannot be fixed, it would be useful to have an independent basis for at least
some of the structures which are required for mathematics.
5.0.5 Remark: The origin of modern set theory is generally traced back to Georg Cantor’s work in the last
quarter of the 19th century, particularly his paper “Grundlagen einer allgemeinen Mannigfaltigkeitslehre”
(“Foundations of a general theory of aggregates”) in 1883. This partially intuitive form of set theory contained
paradoxes which were resolved within axiomatic set theory in the early 20th century.
5.0.6 Remark: The use of set theory requires the acceptance of a small number of rules of logical procedure
and some basic axioms. These rules and axioms are like building regulations which should prevent the too-
creative architect from designing an edifice that may collapse in a storm. Some architects of mathematical
structures may choose to build rogue projects outside the building safety regulations, but to have a work
certified, it must be made to conform, and fellow architects must inspect it for flaws and design faults.
Physicists and other scientists are not bound by the mathematicians’ building regulations. Therefore mathe-
matical models constructed by physicists and others must sometimes be dismantled and reconstructed to fit
the current regulations. Part of the work of mathematicians is to reconstruct models which were originally
constructed by non-mathematicians.
5.0.7 Remark: The idea of trying to find a minimal set of axioms for set theory, from which one can
deduce all of set theory by logical arguments, seems to be inspired by Euclid’s “Elements”, which attempted
to achieve this goal for geometry. It is not entirely obvious that such a venture should succeed. It is clear
that within almost any subject, it will always be possible to deduce at least some “facts” from other facts.
It should perhaps be surprising that so much of mathematics can be deduced without contradictions from a
small set of axioms. There are very few subjects which can be so thoroughly axiomatized as mathematics.
It is difficult to be certain that nothing has been lost from mathematics in the attempt to reduce the
assumptions and rules to a minimum. It is also not clear just how much is gained from this minimalist
endeavour.

One difficulty with the axiomatic approach is that one must begin with a knowledge of a large number of
facts of a subject, and then work backwards to find a set of axioms which generates the entire subject by
means of simple deduction rules. The axioms are justified by the facts which follow from them. But the
deductive method justifies conclusions in terms of assumed axioms. In this sense, the axiomatic method is
not intuitive.
5.0.8 Remark: Most proofs of theorems in this chapter are given as exercise questions (Chapter 46)
because they are more useful for the reader’s active learning than for presenting novel techniques of deduction.
Answers will be provided in Chapter 47 for all exercises in case the reader wishes to ensure the validity of
proofs. Although most of the theorems are elementary in character, their proofs can be elusive or sometimes
even frustrating, especially since the correctness of most of the assertions is “obvious”.
5.0.9 Remark: Abbreviations

The following are abbreviations for some particular set theories. ZF means “Zermelo-Fraenkel”. BG means
“Bernays-Gödel”. NBG means “Neumann-Bernays-Gödel”.
The abbreviation AC means “axiom of choice”. CC means “axiom of countable choice”. The abbreviation
CCω is also popular for the axiom of countable choice.
5.0.10 Remark: Theorems which are based on non-standard axioms (i.e. not Zermelo-Fraenkel set theory)
are tagged. For example, a theorem based on the axioms ZF plus AC is written as Theorem [zf+ac]. Instances
of this are Theorems 10.2.25 and 20.1.4. (See also Remark 4.7.2.)
Non-standard additions to ZF set theory (“optional extras”) include AC, CC, the axiom of dependent choice
and the continuum hypothesis.
5.1. Zermelo-Fraenkel set theory

5.1.1 Remark: An excellent introduction to “naive set theory” is Halmos [160]. For a very concise sum-
mary of Zermelo-Fraenkel set theory, see EDM [34], article 35, or the updated version in EDM2 [35], sec-
tion 33.B. For a much less digestible but more rigorous treatment, see Mendelson [165], page 206, which
defines Zermelo-Skolem-Fraenkel set theory in terms of Neumann-Bernays-Gödel set theory, which is pre-
sented in Mendelson [165], pages 159–170. (The sceptical reader might question the need for so many different
set theories. Why can’t there be just one standard set theory? As someone once famously observed, the
good thing about standards is that there are so many to choose from!)
5.1.2 Remark: Sets are not defined. As far as set theory is concerned, sets are merely symbols which
satisfy some axioms and rules of deduction. However, mathematicians think of those symbols as referring to
mathematical objects. (See Section 2.3 for some comments on the ontology of mathematics.)
Although sets are undefined, this is the normal way in which axiomatic systems are treated. For example,
in probability theory, probability itself is not defined.
5.1.3 Remark: Set theory defines the membership relation “∈” rather than sets. So it would perhaps be
more accurate to call set theory “membership theory”.
Since the only relation between sets (apart from equality) is the membership relation, it is reasonable to
expect that two sets are the same set if and only if they have the same membership relations with other sets.
So it is usual to require that A = B if ∀x, (x ∈ A ⇔ x ∈ B). This requirement can be either an axiom or a
definition of set equality. (This is also discussed in Section 5.2.)
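As an informal illustration of this requirement, the following sketch models a few finite sets with Python's built-in frozenset. This is an assumption made only for the example (ZF sets are not finite in general), and the names empty, a, b, universe and same_members are hypothetical. Two sets which are built in different ways are equal exactly when they have the same members.

```python
# Finite sets modelled with Python's built-in frozenset (an assumption of
# this sketch; ZF sets in general need not be finite).
empty = frozenset()

# Two sets constructed in different ways, but with the same members.
a = frozenset([empty, frozenset([empty])])
b = frozenset([frozenset([empty]), empty, empty])  # order and repetition are irrelevant


def same_members(x, y, universe):
    """Check "for all z in universe, (z in x) <=> (z in y)"."""
    return all((z in x) == (z in y) for z in universe)


# A finite universe of candidate members, large enough for this example.
universe = [empty, frozenset([empty]), a]

print(same_members(a, b, universe))  # same membership relations "on the left"
print(a == b)                        # frozenset equality is extensional
```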
5.1.4 Notation: x ∉ y means ¬(x ∈ y).
5.1.5 Remark: When the elements of a set A are themselves sets under consideration, the set A may be
referred to as a “collection”. There is really no difference between sets and collections, but it is sometimes
useful to have a different word to help clarify an idea. Since essentially all things in set theory are sets, it
follows that all members of sets must be sets.
5.1.6 Remark: One very annoying thing about set theory is the fact that an expression like {x; P (x)},
the “set of things satisfying predicate P”, is not guaranteed to define a set. (See Notation 5.8.12.) To have a
high likelihood of constructing only well-defined sets, it is necessary to restrict set constructions to a limited
set of set-construction rules. One such rule-set is the ZF set of axioms which are outlined in this section. (Set
construction techniques are very succinctly explained in EDM2 [35], section 381.G.) Bernays-Gödel (BG) set
theory gets around the problem of ill-defined sets (Russell’s paradox) by introducing a second kind of set
called a “class”. This extends ZF set theory. (See EDM2 [35], section 33.C, for BG set theory.)
5.1.7 Definition:
A subset of a set B is a set A such that ∀x, (x ∈ A ⇒ x ∈ B).
A superset of a set A is a set B such that ∀x, (x ∈ A ⇒ x ∈ B).
5.1.8 Notation:
A ⊆ B means that A is a subset of B.
A ⊇ B means that A is a superset of B.
5.1.9 Remark: The notation ⊂ meaning ⊆ is avoided in this book because although ⊂ mostly means the
same thing as ⊆, it very often means strict inclusion. Similarly the notation ⊃ is avoided.
5.1.10 Notation: x ≠ y means ¬(x = y) for any sets x and y.
5.1.11 Definition:
A proper subset of a set B is a set A such that A ⊆ B and A ≠ B.
A proper superset of a set A is a set B such that A ⊆ B and A ≠ B.
5.1.12 Notation:
A ⊊ B means that set A is a proper subset of set B.
A ⊋ B means that set A is a proper superset of set B.
[ Consider using some of the msbm symbols for the strict inclusion relations. ]
5.1.13 Remark: To be safe, one should use the clumsy-looking strict inclusion notations ⊊ and ⊋ respectively
for the proper subset and superset relations. The clumsiness does not matter because these strict
inclusion relations are almost never needed. It goes perhaps without saying that the relations ⊊ and ⊋ must
not be confused with the relations ⊈ and ⊉ respectively.
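The distinction between the inclusion relations can be checked on small finite examples. The following is a minimal sketch which assumes that finite sets are modelled by Python frozensets with integers as members. The function names and the sets A, B and C are hypothetical and are used only for illustration.

```python
def is_subset(a, b):
    """A ⊆ B: every member of a is a member of b."""
    return all(x in b for x in a)


def is_proper_subset(a, b):
    """A is a proper subset of B: A ⊆ B and A ≠ B."""
    return is_subset(a, b) and a != b


def is_not_subset(a, b):
    """The negation of A ⊆ B."""
    return not is_subset(a, b)


A = frozenset({1, 2})
B = frozenset({1, 2, 3})
C = frozenset({3, 4})

print(is_subset(A, B), is_proper_subset(A, B))      # A is a proper subset of B
print(is_subset(B, B), is_proper_subset(B, B))      # B ⊆ B, but not properly
print(is_not_subset(C, B), is_proper_subset(C, B))  # "not a subset" differs from "proper subset"
```

Python's own frozenset operators `<=` and `<` implement the same non-strict and strict inclusion tests.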
5.1.14 Notation:
A proposition of the form ∀x ∈ S, P (x) for a set S and set-theoretic formula P means ∀x, (x ∈ S ⇒ P (x)).
A proposition of the form ∃x ∈ S, P (x) for a set S and set-theoretic formula P means ∃x, (x ∈ S ∧ P (x)).
5.1.15 Remark: It is curious that the meanings of the proposition forms ∀x ∈ S, P(x) and
∃x ∈ S, P(x) have different forms. The former uses “⇒” while the latter uses “∧”. However, this difference ensures
that ∃x ∈ S, P(x) is equivalent to ¬(∀x ∈ S, ¬P(x)). This matches the definition of the existential quantifier
in Remark 4.13.2. (For proof, see Exercise 46.2.1.)
The proposition ∀x, (x ∈ S ⇒ P(x)) is equivalent to ∀x, (x ∉ S ∨ P(x)).
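For finite sets, the two restricted quantifiers and the duality between them can be tested directly. The sketch below assumes finite sets of integers, and the names forall_in, exists_in and is_even are hypothetical. It also shows that the universal form is vacuously true on the empty set, whereas the existential form is false there.

```python
def forall_in(S, P):
    """Restricted universal quantifier: for all x, (x in S) implies P(x)."""
    return all(P(x) for x in S)


def exists_in(S, P):
    """Restricted existential quantifier: for some x, (x in S) and P(x)."""
    return any(P(x) for x in S)


def is_even(n):
    return n % 2 == 0


for S in [frozenset(), frozenset({1, 3}), frozenset({1, 2, 3})]:
    # Duality: "exists x in S, P(x)" is equivalent to "not (for all x in S, not P(x))".
    lhs = exists_in(S, is_even)
    rhs = not forall_in(S, lambda x: not is_even(x))
    print(sorted(S), lhs, rhs)
```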
5.1.16 Remark: The definition of a ZF set theory is presented as a set of axioms in Definition 5.1.26
because sets are a basic concept in this book, not derived from other concepts. There is no canonical
representation of the system of sets in terms of other systems. Each mathematician is expected to have their
own representation of sets; the ZF axioms just provide a set of tests to ensure that all mathematicians are
discussing an equivalent system when they discuss sets. A set of axioms such as ZF may be thought of as
a set of regulations for an outsourced mathematical system; a conformant system is given a certificate of
compliance if it is proved to comply with the axioms. The choice of representation is a mere implementation
detail to be decided by the supplier.
Humans may represent sets on paper as lists of symbols between braces, or as regions of a diagram, or in many
other ways. Humans usually have some sort of set representation in their own minds also. Computers may
represent sets as bit patterns in electrical or magnetic storage. As long as such representations satisfy the
axioms, they constitute a valid set theory which must be isomorphic in some sense to any other representation.
For example, every ZF set theory representation must have a unique empty set A and sets B = {A},
C = {A, B} etc. In any other representation, the empty set A′ and sets B′ = {A′}, C′ = {A′, B′} etc.
must have the same properties A′ ∪ C′ = C′, B′ ∩ C′ = B′ etc. as in the first representation. It should be
impossible for one representation to have sets which another does not have because all of the elements of
sets are required to be sets themselves. [ Is this really true? Is it an established fact that any two systems
satisfying the ZF axioms are necessarily co-extensive? ]
5.1.17 Remark: Definition 5.1.26 specifies the properties of the membership relation between sets. It
also specifies construction methods which yield new sets from given sets. So it provides a combination
of membership relation properties and set existence rules. The set existence axioms 2, 3, 4, 5, 6 and 8
guarantee the existence of various kinds of sets. Axioms 1 and 7 specify constraints which limit set theory
to safe ground.
All of the ZF set existence axioms guarantee the existence of sets which are constructed with the aid of
pre-existing sets except for Axioms 2 and 8, which make sets out of nothing. The only non-independent (i.e.
redundant) ZF axiom is Axiom 2.
[ There might be a second redundant set theory axiom. Check this. ]
The productive axioms 2, 3, 4, 5, 6 and 8 may be thought of as defining lower bounds on the sets which may
exist in ZF set theory. The restrictive axioms 1 and 7 may be thought of as defining upper bounds on set
existence.
5.1.18 Remark: ZF set existence axioms 2, 3, 4, 5 and 6 all specify sets in terms of a set-theoretic formula.
Therefore by the extension axiom (1), these sets are uniquely defined. The combination of existence and
uniqueness is, of course, a pre-requisite for giving a specific name or notation to a set. Axiom 2 (empty
set existence) is equivalent to ∃X, ∀z, (z ∈ X ⇔ ⊥(z)), where ⊥ denotes the always-false predicate. (See
Notation 4.12.10.)
Therefore all of the ZF set existence axioms yield unique, specified sets. This contrasts with the axiom of
choice, which merely claims that a specified class of sets is non-empty. This is why the axiom of choice is so
useless. (See also Remark 5.9.2.)
5.1.19 Remark: In a sense, all of the productive ZF axioms mentioned in Remark 5.1.17 make sets bigger
with the exception of Axiom 6, which guarantees that you can prune back a set more or less as you wish,
using an arbitrary set-theoretic formula. In particular, it allows you to prune back any set to the empty set.
Since the infinity axiom (8) guarantees the existence of at least one set, it follows that an empty set exists.
In other words, axiom 2 follows from axioms 6 and 8.
5.1.20 Remark: The set membership symbol ∈ is presumably derived from the first letter of the word
“element”. This symbol must not be confused with ϵ (epsilon) or ε (variant epsilon).
5.1.21 Remark: Definition 5.1.22 arises from the antediluvian soup of intuitive set theory which is the
basis of formal logic. Since a “formula” is not defined here, and nor are variables, logical connectives and
relations, a “set-theoretic formula” is not very well defined. (Mendelson [165], page 164, uses the term
“predicative well-formed formula”.)
5.1.22 Definition: A set-theoretic formula is a formula which contains only variables, logical connectives
and the predicate symbol “∈” (set membership).
5.1.23 Remark: Definition 5.1.24 classifies variables as “bound” or “free”. Essentially a bound variable
has local scope as a parameter of a quantifier whereas a free variable has global scope. This implies that a
variable may be a free variable for a sub-formula of a formula and simultaneously a bound variable within
the full formula. In computer software, a bound variable would be called a “dummy variable”.
[ Definition 5.1.24 needs to be improved a lot. ]
5.1.24 Definition: A bound variable in a set-theoretic formula is a variable x which is within the scope
of a universal quantifier “∀x” or existential quantifier “∃x”.
A free variable in a set-theoretic formula is a variable which is not a bound variable.
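The classification of variables can be made concrete with a small sketch. It assumes a hypothetical encoding of set-theoretic formulas as nested tuples, which is not taken from this book, and it treats freedom occurrence by occurrence, which is slightly finer than the definition above. The function free_variables collects the variables which are not captured by any enclosing quantifier.

```python
def free_variables(formula):
    """Return the set of variable names that occur free in a formula.

    Formulas are nested tuples (a hypothetical encoding for this sketch):
      ("in", x, y)                            atomic formula "x in y"
      ("not", p)                              negation
      ("and" | "or" | "implies" | "iff", p, q)
      ("forall" | "exists", x, p)             quantifier binding x within p
    """
    op = formula[0]
    if op == "in":
        return {formula[1], formula[2]}
    if op == "not":
        return free_variables(formula[1])
    if op in ("and", "or", "implies", "iff"):
        return free_variables(formula[1]) | free_variables(formula[2])
    if op in ("forall", "exists"):
        # The quantified variable is bound within the scope of the quantifier.
        return free_variables(formula[2]) - {formula[1]}
    raise ValueError(f"unknown formula constructor: {op!r}")


# Sub-formula (x in A <=> x in B): the variables x, A and B are all free.
body = ("iff", ("in", "x", "A"), ("in", "x", "B"))

# Full formula "for all x, (x in A <=> x in B)": x is bound; A and B stay free.
full = ("forall", "x", body)

print(sorted(free_variables(body)))
print(sorted(free_variables(full)))
```

The variable x is free in the sub-formula and bound in the full formula, which is the situation of a variable being simultaneously free and bound at different levels of one formula.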

5.1.25 Remark: The function of a free variable in a logical expression is to permit substitution of values
from a variable space. Thus a logical expression which contains free variables may be regarded as a template
which generates a different particular proposition for each combination of values substituted for the free
variables in the template expression.
The function of a bound (or dummy) variable is to link two points in a logical expression. It is not permitted
to substitute particular values for a dummy variable as one may do for free variables.
The ZF set theory axioms in Definition 5.1.26 are proposition templates. Concrete individuals of the indicated
type may be substituted for any of the free variables. All of the quantifiers refer to the set of individuals,
not predicates or functions. All individuals are sets. So all quantifiers refer to sets.
The equality relation “=” in Definition 5.1.26 is a two-parameter predicate which is imported from the
concrete predicate domain. (See Remark 4.12.3 for concrete predicates.)
5.1.26 Definition: The axioms of a Zermelo-Fraenkel set theory are as follows.
(1) The extension axiom: For any sets A and B,
(∀x, (x ∈ A ⇔ x ∈ B)) ⇒ (A = B).
(2) The empty set axiom:
∃A, ∀x, x ∉ A.
(3) The unordered pair axiom: For any sets A and B,
∃C, ∀z, (z ∈ C ⇔ (z = A ∨ z = B)).
(4) The union axiom: For any set K,
∃X, ∀z, (z ∈ X ⇔ ∃S, (z ∈ S ∧ S ∈ K)).
(5) The power set axiom: For any set X,
∃P, ∀A, (A ∈ P ⇔ ∀x, (x ∈ A ⇒ x ∈ X)).
(6) The replacement axiom: For any set-theoretic formula f and set A,
(∀x, ∀y, ∀z, ((f(x, y) ∧ f(x, z)) ⇒ y = z)) ⇒ ∃B, ∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ f(x, y))).
(7) The regularity axiom: For any set-theoretic formula P,
(∃x, P(x)) ⇒ ∃z, (P(z) ∧ ∀y ∈ z, (¬P(y))).
(8) The infinity axiom:
∃X, ∀z, (z ∈ X ⇔ ((∀u, u ∉ z) ∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))).
[ Shoenfield [169], page 239, has a different form of axiom (7), which seems a little simpler. But it may start
from different assumptions. Check this. ]
5.1.27 Remark: The following is a compact summary of the ZF set theory axioms. The variables on the
left indicate what kind of free variable or free predicate is to be used in the axiom schema on the right. The
word “formula” on the left means “set-theoretic formula”.
(1) sets A, B: (∀x, (x ∈ A ⇔ x ∈ B)) ⇒ (A = B)
(2) ∃A, ∀x, x ∉ A
(3) sets A, B: ∃C, ∀z, (z ∈ C ⇔ (z = A ∨ z = B))
(4) set K: ∃X, ∀z, (z ∈ X ⇔ ∃S, (z ∈ S ∧ S ∈ K))
(5) set X: ∃P, ∀A, (A ∈ P ⇔ ∀x, (x ∈ A ⇒ x ∈ X))
(6) formula f , set A: (∀x, ∀y, ∀z, ((f (x, y) ∧ f (x, z)) ⇒ y = z)) ⇒ ∃B, ∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ f (x, y)))
(7) formula P : (∃x, P (x)) ⇒ ∃z, (P (z) ∧ ∀y ∈ z, (¬P (y)))
(8) ∃X, ∀z, (z ∈ X ⇔ ((∀u, u ∉ z) ∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))).
It is quite awesome that the entire basis of set theory can be written in 8 lines. The rest of mathematics is
just definitions, theorems, notations and remarks.

[ Refer to the set construction “stages” in Shoenfield [169], pages 238–240. ]

5.1.28 Remark: The following is a high-level interpretation of the compact summary of the ZF set theory
axioms in Remark 5.1.27. The variables on the left indicate what kind of free variable or free predicate is to
be used in the axiom schema on the right. The word “formula” on the left means “set-theoretic formula”.
(1) sets A, B: (A ⊆ B ∧ B ⊆ A) ⇒ (A = B)
(2) ∅ is a set
(3) sets A, B: {A, B} is a set
(4) set K: ⋃K is a set
(5) set X: ℙ(X) is a set
(6) formula f , set A: f is a function ⇒ f (A) is a set
(7) formula P : (∃x, P (x)) ⇒ ∃z, (P (z) ∧ ∀y ∈ z, (¬P (y)))
(8) ω is a set.
5.2. The ZF extension axiom

5.2.1 Remark: The extension axiom, Definition 5.1.26 (1), is also known as the axiom of extensionality.
This axiom may be thought of as the proposition (A ⊆ B ∧ B ⊆ A) ⇒ A = B. That is, if the sets referred to
by two abstract set names A and B have the same membership relations “to the left” with all other abstract
set names, then the abstract set names must refer to the same concrete object. And if two names refer to
the same object, all attributes of that object must be identical. (See Figure 5.2.1.)
Figure 5.2.1 ZF extension axiom: (∀x, (x ∈ A ⇔ x ∈ B)) ⇒ A = B
In other words, the only property of a set is what it contains. A set is completely determined by its contents.
Consequently if ∀x, (x ∈ A ⇔ x ∈ B) then ∀x, (A ∈ x ⇔ B ∈ x), which means that A and B have the
same set membership relations “on the right” if they have the same membership relations “on the left”.
[ In terms of the proposition naming framework in Section 4.1, A = B means that A and B refer to the same
object in the concrete proposition domain, since the abstract equality relation is imported from the concrete
logic domain. This is mentioned elsewhere already, but perhaps there should be a diagram of the import
process near here. ]
5.2.2 Remark: In the presentation of NBG (Neumann-Bernays-Gödel) set theory by Mendelson [165],
pages 159–170, the extension axiom is defined differently to Definition 5.1.26 (1). This is because in a purely
linguistic, abstract symbolic logic, equality needs to be explicitly defined since the symbols are just symbols
which do not refer to anything concrete. In fully abstract logic, the equality relation “=” may be defined in
terms of the membership relation “∈” by defining A = B to mean ∀x, (x ∈ A ⇔ x ∈ B).
In terms of the defined equality relation “=”, the extension axiom then requires this relation to have the
following relation to the set membership relation “on the right”. (See Mendelson [165], page 161.)
A = B ⇒ ∀x, (A ∈ x ⇔ B ∈ x).
This axiom resembles Axiom (1), but it is quite different. It means that two sets which are equal are contained
in the same sets “on the right”. In other words, if the ∈-relations on the left are the same, then the
∈-relations on the right are the same, in which case all ∈-relations of A and B are the same. So it is reasonable
to think of the two labels A and B as referring to the same object. But set objects are outside the scope of
symbolic logic, which only defines the truth and falsity of symbolic expressions. Maybe this is the difference
between “naive” set theory and “real” set theory. In a naive set theory, the concept of set equality is taken
for granted, and it is automatically assumed that two symbols which refer to a single set must have the same
membership relations, whereas in dinkum set theory, there is no concept of symbols referring to objects
outside the symbolic language.
Another way to think of this is that in naive set theory, the labels are just names which may refer to the same
object, whereas in symbolic algebra, the names are distinguishing attributes of objects; hence two objects
may be equal but not the same! This is a bit like two hydrogen atoms being equal but not the same because
they have a different location. In many practical situations in mathematics, it is necessary to think of two
copies of the same set as being different objects if they have different labels, for example in the case of a
function f between two sets A and B: if A and B happen to be equal as sets, we don’t necessarily want to
admit f as a member of the set of maps from A to A. The fact that two sets being used in different parts
of a system happen to have the same members shouldn’t necessarily make them the same in all respects. A
clearer example is the fact that the right translation group of a group G and the left translation group of G
are represented by identical sets although they are different kinds of structure. It is a philosophical question
as to whether two things which are equal in all respects are necessarily the same object. A lot of time and
energy can be wasted on this question if sufficient tea and coffee are provided.
5.2.3 Remark: Two ways of introducing the extension axiom are summarized in the following table.
    proposition                             modelling   linguistic
1.  (A = B) ⇒ ∀z, (z ∈ A ⇔ z ∈ B)          FOL + EQ    definition
2.  (A = B) ⇒ ∀z, (A ∈ z ⇔ B ∈ z)          FOL + EQ    axiom
3.  (∀z, (z ∈ A ⇔ z ∈ B)) ⇒ (A = B)        axiom       definition
4.  (∀z, (A ∈ z ⇔ B ∈ z)) ⇒ (A = B)        theorem     theorem
The modelling approach to logic is adopted in this book and also by Shoenfield [169], pages 238–240, and
by EDM2 [35], 33.B, pages 147–148. In this approach, it is assumed that the names in the variable name
space NV of the language refer to objects in a concrete variable space V. (In the case of set theory, this
means that the names of sets refer to concrete sets in some externally defined space.) An equality relation is
assumed to be defined already on the concrete variable space, and this relation is imported into the abstract
space via the variable name map µV : NV → V.
Therefore in the modelling approach, the equality relation “=” is defined as an import of the concrete equality
relation. Since it is assumed, furthermore, that the membership relation “∈” is also imported from a concrete
membership relation, and the concrete relation is assumed to be well-defined for the concrete variables, it
automatically follows that in the abstract name space, (A = B) ⇒ (z ∈ A ⇔ z ∈ B) and (A = B) ⇒
(A ∈ z ⇔ B ∈ z). In fact, B may be substituted for any instance of A like this without changing the
truth value of the proposition. That is, (A = B) ⇒ (F (A) ⇔ F (B)) for any predicate F . (This is
called the “substitutivity of equality” axiom. See Remark 4.15.1.) So lines (1) and (2) in the above table
follow immediately. (“FOL + EQ” abbreviates “first order language with equality”.) In this case, line (3),
(∀z, (z ∈ A ⇔ z ∈ B)) ⇒ (A = B), is specified as an axiom of extension. This is required because it is not
a-priori obvious that the concrete membership relation would be related to the concrete equality relation in
this way.
The language approach in the right column of the table is used by Mendelson [165], pages 159–170. In this
approach, the equality relation is not imported from a concrete space. In this case, the equality relation
needs to be defined in terms of the membership relation, which is the only relation which is imported from
the concrete space. In this approach, A = B is defined to mean ∀z, (z ∈ A ⇔ z ∈ B). This yields lines (1)
and (3) in the table.
In the language approach, since the equality relation is no more than a definition in terms of the membership
relation, there is no guarantee at all that this abstract equality of A and B implies that they both refer to
the same concrete object, which therefore would have all attributes identical. Therefore this “substitutivity
of equality” property must be specified as an axiom of extension. From this, it then follows as a theorem that
this language is a first order language with equality. (See Mendelson [165], Proposition 4.2.) By contrast,
the modelling approach assumes a-priori that set theory is a first order language with equality.
This explains why line (3) is an axiom in the modelling approach whereas it is given by a definition in the
language approach, and the reverse is true for line (1).
Line (4) follows in both approaches from Theorem 5.2.4.
5.2.4 Theorem: (∀z, (A ∈ z ⇔ B ∈ z)) ⇒ (A = B).
Proof: Suppose that A and B are (names of) sets such that ∀z, (A ∈ z ⇔ B ∈ z). Let z = {A}. (This is
a set by the unordered pair axiom, Definition 5.1.26 (3), applied to the pair A, A.) Then A ∈ z. Therefore
B ∈ z. So B = A.
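The key step of this proof, namely that the singleton {A} separates A from every other set, can be observed on small finite sets. The sketch below assumes that finite sets are modelled by Python frozensets. The particular sets empty, A, B and z are chosen only for illustration.

```python
empty = frozenset()
A = frozenset([empty])       # the set {∅}
B = frozenset([empty, A])    # the set {∅, {∅}}, which differs from A

z = frozenset([A])           # the singleton {A}, i.e. the unordered pair {A, A}

print(A in z)   # A is always a member of {A}
print(B in z)   # B is a member of {A} only if B equals A
```

So if A and B are different sets, the set z = {A} is a witness that they do not have the same membership relations on the right.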
5.2.5 Remark: When an a-priori assumption of a first order language with equality is made, a very general
axiom of “substitution of equality” is adopted. (See also Remark 5.2.3.) This may be written as follows.
∀x, ∀y, ((x = y) ⇒ (α(x, x) ⇒ α(x, y))).
This means that a variable x in any wff α may be substituted with y in all, some or none of the places where
x occurs in the formula. (See Mendelson [165], page 75.)
5.2.6 Remark: It should not be forgotten that the extension axiom, like all other axioms, is not intended
to be “true”. The fact that this axiom is adopted in ZF set theory is merely intended to focus the mind on
those systems which do satisfy the axiom. Thus ZF set theory restricts the focus to those systems which
have the special property that the identity of each object is determined by its members. That is, the identity of
an object x is determined by the objects y such that y ∈ x.
The members of a set determine its identity in a one-to-one fashion. The reverse implication (∀x, (x ∈ A ⇔
x ∈ B)) ⇐ (A = B) follows from the “substitution of equality” property which is required for a first order
language with equality. Consequently, ZF set theory deals very specifically with collections of objects. These
collections have no distinguishing attributes other than their membership relations “on the left”.
One could almost say that the extension axiom is the only axiom which is required in set theory. This axiom
on its own encapsulates the essential nature of sets. The other ZF axioms are required to force the system
of sets to have at least a minimum of useful sets (axioms 2, 3, 4, 5, 6 and 8), and to not have sets which
could cause trouble (axiom 7). (See also Remark 5.1.17 for related comments.)
It would be possible to develop a set theory in which the sets have other attributes in addition to left-side
membership relations. However, such a theory would almost certainly be easier to develop by extending
ZF-style set theory with the addition of extra attributes to sets than by developing a new theory from
scratch. For example, Section 2.6 discusses the desirability of attaching a “class tag” to each set when using
sets constructions to represent classes of objects. (Class tags are also discussed in Section 5.16.) Then the
empty set in one class could be distinguished from the empty set in another class, for example, and complex
numbers (x, y) ∈ ℂ can be distinguished from real number tuples (x, y) ∈ ℝ². Such a system is easily
developed in terms of ordered pairs (C, X), where C is an element of a set of class tags and X is a ZF set.
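A minimal sketch of such tagged sets follows. It assumes Python tuples as ordered pairs and frozensets as the underlying sets. The class name Tagged and the tag strings are hypothetical and are not taken from this book.

```python
from typing import NamedTuple


class Tagged(NamedTuple):
    """An ordered pair (C, X): a class tag C together with an underlying set X."""
    tag: str
    value: frozenset


# The same underlying set (here the empty set) carrying two different class tags.
# The tag names are illustrative only.
empty_as_group = Tagged("group", frozenset())
empty_as_topology = Tagged("topology", frozenset())

print(empty_as_group.value == empty_as_topology.value)  # compared as bare sets
print(empty_as_group == empty_as_topology)               # compared as tagged objects
```

The comparison of the underlying sets is purely extensional, while the comparison of the pairs also takes the class tag into account.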
5.2.7 Remark: As a historical note, the issues discussed in Remark 5.2.1 caused the author to spend a lot
of time reading about what the symbols in logic books really mean. Like many, many months. It all made
sense finally, and this required a major rewrite of much of the fundamental logic and set theory material.
When the rewrite is done, the meaning of the logic and set theory will hopefully be clear in the minds of
readers also!
5.3. The ZF empty set, pair, union and power set axioms
5.3.1 Remark: The remarks in this section are comments on the individual ZF set theory axioms in
Definition 5.1.26. It is important to get comfortable with the axioms, at least to some extent, before
proceeding to the systematic development which starts in Section 5.8.
The commentary on Russell’s paradox in Section 5.7 refers principally to the regularity axiom (7). This is
given its own section because it is such a big issue. The ZF extension axiom (1) is discussed in Section 5.2.
The ZF replacement axiom (6) is discussed in Section 5.4. The ZF regularity axiom (7) is discussed in
Section 5.5. The ZF infinity axiom (8) is discussed in Section 5.6.

5.3.2 Remark: empty set axiom (2)

The empty set axiom states that there exists an empty set. It follows from Axiom (1) that the empty set is
unique. The empty set is denoted as ∅. (See Definition 5.8.3 and Notation 5.8.4 for the empty set.)
5.3.3 Remark: unordered pair axiom (3)

A notation for the set C in the unordered pair axiom is {A, B}. By setting B = A, this axiom also implies
the existence of the set {A}. Any set with one and only one element is called a “singleton”. (See Definition 5.8.9
for singletons.)
5.3.4 Remark: union axiom (4)

The union axiom means that ∀K, ∃X, X = {z; ∃S ∈ K, z ∈ S}. (See Figure 5.3.1.) The notation for this
set X is ⋃_{S∈K} S or just ⋃K.
Figure 5.3.1 ZF union axiom for a general collection of sets: ∃X, ∀z, (z ∈ X ⇔ ∃S, (z ∈ S ∧ S ∈ K))
If K = {A, B} for sets A and B, the set X = ⋃_{S∈K} S may be denoted as A ∪ B. (See Figure 5.3.2 for the
special case of the union of two sets.)
Figure 5.3.2 ZF union axiom for a collection of two sets: ∃X, ∀z, (z ∈ X ⇔ (z ∈ A ∨ z ∈ B))
Together with Axiom (3), this axiom implies that 3-member sets {x, y, z} are well defined by considering K =
{{x, y}, {y, z}}. Sets containing any finite number of given elements are similarly well defined. (It might seem
that Axiom (3) could be replaced with a weaker “singleton axiom” guaranteeing the existence of the set {x} for
any x, which could then be combined with the union axiom to construct pairs {x, y}. But Axiom (3) is not stronger
than necessary, because this construction would require the set K = {{x}, {y}}, which is itself a pair!)
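The construction of a 3-member set from pairs and a union can be sketched for finite sets as follows. The sketch assumes that sets are modelled by Python frozensets, and the function names pair, big_union and union are hypothetical.

```python
def pair(a, b):
    """Unordered pair {a, b} (Axiom 3). pair(a, a) is the singleton {a}."""
    return frozenset([a, b])


def big_union(K):
    """Union of a collection K of sets (Axiom 4): all members of members of K."""
    return frozenset(z for S in K for z in S)


def union(a, b):
    """The union of two sets, defined as the union of the pair {a, b}."""
    return big_union(pair(a, b))


x = frozenset()      # the empty set
y = pair(x, x)       # the singleton {x}
z = pair(x, y)       # the pair {x, y}, which is a third distinct set

# The 3-member set {x, y, z} as the union of K = {{x, y}, {y, z}}.
triple = big_union(pair(pair(x, y), pair(y, z)))
print(len(triple))
print(triple == frozenset([x, y, z]))
```

Note that the collection K is itself built with pair, which is why the pair axiom cannot be bypassed in this construction.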
5.3.5 Remark: power set axiom (5)

The power set axiom may be written as ∃P, ∀A, (A ∈ P ⇔ A ⊆ X). That is, P = {A; A ⊆ X} is a set for any
given set X. The power set P = {A; A ⊆ X} for any set X will be denoted ℙ(X). (See Notation 5.8.19.)
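For a finite set X, the power set can be listed explicitly. The following sketch assumes finite sets modelled by Python frozensets and uses the standard library function itertools.combinations. The function name power_set is hypothetical.

```python
from itertools import combinations


def power_set(X):
    """Return the set of all subsets of a finite set X."""
    members = list(X)
    return frozenset(
        frozenset(subset)
        for r in range(len(members) + 1)
        for subset in combinations(members, r)
    )


X = frozenset({1, 2, 3})
P = power_set(X)

print(len(P))                     # number of subsets of a 3-member set
print(all(A <= X for A in P))     # every member of P is a subset of X
print(frozenset() in P, X in P)   # membership of the empty set and of X itself
```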

5.4. The ZF replacement axiom

This section contains comments on the ZF replacement axiom, Definition 5.1.26 (6).
5.4.1 Remark: The replacement axiom says effectively that for any set-theoretic formula g and any set A,
there exists a set B such that B = {g(x); x ∈ A}. This set may be denoted by g(A), although this is not
actually unambiguous and the meaning of this notation must always be determined from the context. The
axiom as stated allows for the possibility that the function g may only be partially defined on A, which may
explain why a two-parameter formula f is used instead of a single-parameter formula g.
The set B whose existence is guaranteed by the replacement axiom for a pre-existing set A and a function-like
rule f is not required to be an element of A or any other pre-existing set. The rule f may be thought of as
mapping elements of A to sets constructed according to rule f outside A, which are then gathered together
in a new set B. (See Figure 5.4.1.)
Figure 5.4.1 ZF replacement axiom (the rule f takes x in A to y in B)
If the rule f in Axiom (6) associates elements of A with elements of A, and if for all x ∈ A the predicate
f (x, y) is either false for all y or else true for only y = x, then f becomes a simple yes/no function on A.
In this case Axiom (6) guarantees the existence of the subset of A which is defined by any single-parameter
set-theoretic formula g. This special case is called the specification axiom or the axiom of subsets.
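Both the general axiom and this special case can be imitated for finite sets. The sketch below makes two assumptions which are not part of the axiom: sets are Python frozensets of integers, and the possible values of y are searched in an explicitly given finite collection called candidates. The function names replace and specify are hypothetical.

```python
def replace(A, f, candidates):
    """Return the set of all y such that f(x, y) holds for some x in A.

    `candidates` is a finite search space for y (an artefact of this sketch;
    the axiom itself places no such bound on where y may come from).
    """
    images = frozenset()
    for x in A:
        ys = [y for y in candidates if f(x, y)]
        # Hypothesis of the axiom: f(x, y) and f(x, z) together imply y = z.
        if len(ys) > 1:
            raise ValueError("f is not function-like on A")
        images |= frozenset(ys)
    return images


def specify(A, g):
    """Return {x in A; g(x)} by replacement with f(x, y) := (y = x and g(x))."""
    return replace(A, lambda x, y: y == x and g(x), candidates=A)


A = frozenset({0, 1, 2, 3})

# A function-like rule which is only partially defined on A: y = 10x for odd x.
B = replace(A, lambda x, y: x % 2 == 1 and y == 10 * x, candidates=range(100))
print(sorted(B))

# The special case: a yes/no rule prunes A back to a subset.
print(sorted(specify(A, lambda x: x >= 2)))
```

With the always-false rule, specify prunes A back to the empty set, which is the observation used in Remark 5.1.19.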
5.4.2 Remark: If the replacement axiom in ZF set theory is replaced by the slightly weaker specification
axiom (6′) in Definition 5.11.1, the resulting set of axioms is called Zermelo set theory. Any theorem which
is true in Zermelo set theory is also true in ZF set theory. (See Section 5.11 for Zermelo set theory.)
5.4.3 Remark: The replacement axiom (6) may be replaced by the combination of the separation axiom
with a more general replacement axiom. (See EDM2 [35], 33.B, page 147.) The axiom of separation (or of
specification, or of comprehension, or of subsets) is as follows, for any set X and predicate P .
∃Y, ∀z, (z ∈ Y ⇔ (z ∈ X ∧ P (z))). (5.4.1)
A more general axiom of replacement is as follows for sets X and predicates R.
∃Y, ∀x, ((x ∈ X ∧ ∃a, R(x, a)) ⇒ ∃b, (b ∈ Y ∧ R(x, b))). (5.4.2)
It is an interesting exercise to show that the combination of the two axioms (5.4.1) and (5.4.2) implies the
single ZF replacement axiom. (See Exercise 46.2.2.) It is not difficult to show that the ZF replacement
axiom implies (5.4.1). (See Exercise 46.2.3.) However, it seems that the ZF replacement axiom does not
imply (5.4.2) unless it is assisted by the axiom of choice.
5.5. The ZF regularity axiom

This section contains comments on the ZF regularity axiom, Definition 5.1.26 (7).
5.5.1 Remark: The regularity axiom is also known as the “axiom of foundation”.
The regularity axiom says that if P (x) is true for some x, then there exists a set z such that P (z) is true
but P (y) is false for all y ∈ z. In other words, for any predicate P which is not the always-false predicate,
there exists a set z for which P (z) is true but none of the elements y of z satisfy P (y). This would imply
that a sequence of set memberships must terminate.

For a non-empty set V, define the predicate P by P(x) = “x ∈ V” for sets x. Then the regularity axiom
implies ∃z ∈ V, ∀y ∈ z, y ∉ V (or equivalently ∃z ∈ V, ∀y ∈ V, y ∉ z, or ∃z ∈ V, z ∩ V = ∅). In other words,
for some z in V, none of the elements of z are in V. The negation of this would be ∀z ∈ V, ∃y ∈ z, y ∈ V.
But this would mean that every element z0 of V contains at least one element z1 of V, which in turn contains
at least one element z2 of V, and so forth, which would never arrive at a set containing no elements of V.
(See Figure 5.5.1.) By redefining V to be the set of all sets in a sequence . . . ∈ zn ∈ . . . ∈ z2 ∈ z1 ∈ z0,
it follows from the axiom that every such sequence terminates on the left for some n.
Figure 5.5.1 Infinite chain of set memberships “on the left”: . . . ∈ z6 ∈ z5 ∈ z4 ∈ z3 ∈ z2 ∈ z1 ∈ z0
It follows that the proposition x ∈ x is false for all sets x. This in turn prevents Russell’s paradox from
happening because a set U satisfying ∀x, x ∈ U would contradict Axiom (7), as would also a set Q satisfying
∀x, (x ∈ Q ⇔ x ∉ x), because clearly Q = U since x ∉ x for all sets x. So Russell’s paradox is resolved by
forbidding the kinds of sets which cause the embarrassment. (Bernays-Gödel set theory in Section 5.12 gets
around the paradox by demoting problematic “sets” such as U and Q to mere “classes” while accepting all
of the sets in ZF theory as fully certified sets.)
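The special case of the regularity axiom for a non-empty set V can be checked by hand on small examples. The following is a minimal sketch, assuming that hereditarily finite sets are modelled by Python frozensets; the function name epsilon_minimal and the sets e0, e1, e2 are illustrative and are not taken from the text.

```python
def epsilon_minimal(V):
    """Return some z in V with z ∩ V = ∅, or None if no such z exists."""
    for z in V:
        if not (z & V):      # none of the elements of z are in V
            return z
    return None

e0 = frozenset()             # ∅
e1 = frozenset({e0})         # {∅}
e2 = frozenset({e0, e1})     # {∅, {∅}}

V = frozenset({e1, e2})      # a non-empty set which does not contain ∅
z = epsilon_minimal(V)       # the element {∅}, since ∅ is not in V
```

Since a frozenset can only be built from sets which already exist, every set of this kind is well-founded, so the search always succeeds for non-empty V.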
5.5.2 Remark: Although the regularity axiom forbids an infinite sequence of set memberships on the left,
an infinite sequence of such set memberships can be obtained as the union over all “leftward paths”. For
example, the set ⋃ω is the set of all second-order members (members of members) of the set ω. Then ⋃ω = ω. (This is
illustrated in Figure 5.5.2. The set ⋃ω equals the set of all numbers in the second row below ω.)
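ω = { 0, 1, 2, 3, . . . , n, . . . }, with the members of each element listed in the row beneath it:
1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}, . . . , n = {0, . . . , n−1}, and so forth for each member in turn.
Figure 5.5.2 Set membership tree for the set of finite ordinal numbers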
So ⋃⋃ . . . ⋃ω = ω. This only occurs because a union is taken over multiple membership paths. Any
particular path downwards from ω in Figure 5.5.2 terminates after a finite number of steps, although this
number of steps is unbounded. (This is a clear example of the difference between the words “infinite” and
“unbounded”.)
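A finite analogue of this remark can be computed directly. The sketch below assumes the frozenset model of hereditarily finite sets; the helper names successor, ordinals, big_union and longest_path are illustrative only. For a finite ordinal the union drops by one step, ⋃(n+1) = n, and the longest downward membership path from n has exactly n steps, which is finite for each n but unbounded over ω.

```python
def successor(n):
    """The von Neumann successor n ∪ {n}."""
    return n | frozenset({n})

def ordinals(count):
    """The first `count` finite ordinals 0, 1, 2, ... as frozensets."""
    out, n = [], frozenset()
    for _ in range(count):
        out.append(n)
        n = successor(n)
    return out

def big_union(s):
    """⋃s, the set of all members of members of s."""
    return frozenset(y for x in s for y in x)

def longest_path(s):
    """Number of steps in the longest chain s ∋ x_1 ∋ x_2 ∋ ... below s."""
    return 0 if not s else 1 + max(longest_path(x) for x in s)

nums = ordinals(8)
path_lengths = [longest_path(n) for n in nums]
```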
5.5.3 Remark: Although the regularity axiom prevents an infinite chain of set memberships “to the left”,
there is nothing to stop an infinite chain of set memberships “to the right”. In fact, the set of ordinal numbers
is defined as such an infinite sequence, whose existence is guaranteed by Axiom (8). (See Exercise 46.2.4.)
Thus line (5.5.1) is okay.
x ∈ {x} ∈ {{x}} ∈ {{{x}}} ∈ . . . (5.5.1)
But line (5.5.2) is not okay.
. . . ∈ U ∈ U ∈ U ∈ U ∈ U. (5.5.2)
One might ask why there is an asymmetry here. One way to look at this is to consider that the nature and
meaning of a set is determined by what it contains, not by what it is contained in. Thus to determine the
nature of a set, one follows the membership relation network to the left. By looking at the members, and
the members of the members, and so forth, one should be able to find an end-point of any traversal along
a leftwards path in the membership relation network within a finite number of steps. The regularity axiom
guarantees that this is so. Therefore the nature of all sets may be determined in this way.
One might ask also why the nature of a set is not determined by the sets which it is contained in. That would
seem to not be totally unnatural. However, this approach would require a very large number of top-level
sets to be given meaning outside pure set theory. In the ZF approach to set theory, all membership relation
traversals to the left ultimately end with the empty set. Therefore one only needs to give meaning to one
set, namely the empty set. This does not seem to be very onerous. It is not at all clear how one could start
with a single universe set and give meaning to its members, and members of members, and so forth.
So the short answer to the question of why the regularity axiom is needed is that it provides certainty that
all sets have determinable content.
[ Find out if the “pure sets” requirement in Remark 5.5.4 does in fact imply satisfaction of the regularity
axiom. See Shoenfield [169], pages 238–240, for set construction “stages”. ]
5.5.4 Remark: If all sets in ZF set theory are “pure sets”, i.e. all sets are built up in a finite number of
“stages” from the empty set ∅ (axiom 2) and the finite ordinals set ω (axiom 8), it seems intuitively clear
that the ZF regularity axiom should hold automatically. Each of the set construction axioms 3, 4, 5 and 6
seems to build sets which obey the regularity axiom if the source sets out of which they are built obey the
regularity axiom.
5.6. The ZF infinity axiom

[ Maybe Remark 5.6.1 is not 100% correct. But there must be some truth in it. ]
5.6.1 Remark: The concept of “infinity” is essentially impossible to resolve. One may write down a set
of symbols on paper which purport to refer to something which is infinite. But in all of mathematics, it is
difficult to know what any symbols are pointing to, other than thoughts or models in people’s minds, and
the contents of those minds are inaccessible other than via writing and speech. As discussed in Sections 2.11
and 2.12, there are very serious problems with all infinite concepts.
In the case of the ZF infinity axiom, even though it may look as if infinity has been objectively defined, in fact
a set of axioms is only a particular set of propositions within a logical model. (See Section 3.3 for models.)
A model has validity only to the extent that it accurately represents a modelled system. In particular, a
universal quantifier “∀z” can only range over an infinite set of values if the concrete system being modelled
has an infinite number of objects.
The infinity axiom states that the modelled system has no last element in at least one set. So this axiom
does encapsulate the idea that no matter how many elements are in the set, there is always one more –
because no element is the last element. I.e. every member of the set has a successor which differs from all
other members of the set. But this merely states a required property of an infinite set. It does not explicitly
list the elements of the set, and obviously that is impossible. By comparison, consider that π has a specified
set of properties, but we are never given the complete set of decimal digits.
More than anything, it should not be forgotten that the development of ZF set theory only guarantees that
various consequences follow from the axioms if there exists a concrete system which is accurately modelled
by the axioms. If there is no such concrete system, then the assumptions (i.e. axioms) of ZF are not satisfied,
and therefore there will be no consequences whatsoever!
The infinity axiom differs from the other set existence axioms by giving merely a recurrence relation for a
set construction, not giving a more or less explicit list of elements.
[ Also comment on the equivalent axiom of ∈-induction in Remark 5.6.2? ]
5.6.2 Remark: There are many common variants of the infinity axiom, Definition 5.1.26 (8).
∃X, ((∅ ∈ X) ∧ (∀y ∈ X, (y ∪ {y} ∈ X))) (5.6.1)
∃X, ((∃u, u ∈ X) ∧ ∀u, (u ∈ X ⇒ ∃v, (v ∈ X ∧ u ⊆ v ∧ v ≠ u))). (5.6.2)
∃X, (∃y, (y ∈ X ∧ ∀z, z ∉ y) ∧ ∀u, (u ∈ X ⇒ ∃v, (v ∈ X ∧ ∀w, (w ∈ v ⇔ (w ∈ u ∨ w = u))))). (5.6.3)
Axiom (5.6.1) is given in EDM2 [35], 33.B, axiom (5.6.2) is given in EDM [34], 35.B, and Mendelson [165],
page 169, and axiom (5.6.3) is given by Shoenfield [169], page 240.
The variants of the infinity axiom differ in more than one feature.
(1) Some variants are written in high-level set language, using specific set-construction expressions such as
∅ and y ∪ {y}, while others are written in raw, low-level logic language such as ∃y, ∀z, z ∉ y.
(2) Some variants use expressions like ∀y, (y ∈ z ⇔ F (y)) to specify the precise membership of a set z,
whereas others use expressions like ∃y, (y ∈ z ∧ F(y)) to specify only minimum requirements of set
membership.
When stating the ZF axioms initially, it is generally preferable to use only low-level logical language.
Notations such as ∅ and y ∪ {y} should be introduced after all of the axioms have been presented. Therefore
in (1), preference should be given to pure logical language, although it is very helpful to explain the axioms
later in high-level language.
In (2), the style of axiom which is adopted affects the amount of work required to develop the axioms into
a useful initial corpus of theorems. If the less specific, weaker set-membership requirement is adopted, an
axiom of substitution (or specification) must be applied to convert the set into a useful specific form. If
the more specific, stronger set-membership requirement is adopted, the work is easier, but it may be more
difficult to convince oneself that the more specific requirement is intuitively justifiable. If one’s aim is to
convince a sceptical reader of the reasonableness of an axiom, probably the more general style could be
preferable. But when a lot of manipulation is required to strengthen this style of axiom to a useful form,
some of the credibility is lost. If the weaker, less specific style has no advantages (such as greater generality,
for example), it is generally better to be specific, unless this would make the axiom too long or too abstract
to easily understand.
A style of infinity axiom which is in high-level set language, and which is fully specific about set membership,
is as follows.
∃X, ∀z, (z ∈ X ⇔ (z = ∅ ∨ ∃y ∈ X, z = y ∪ {y})). (5.6.4)
In terms of such an axiom, one may write the set ω of finite ordinal numbers immediately as the unique
set X which satisfies (5.6.4). This is almost too easy! Proposition (5.6.4) may be rewritten in low-level pure
predicate logic as follows.
∃X, ∀z, (z ∈ X ⇔ ((∀u, u ∉ z) ∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))). (5.6.5)
Proposition (5.6.5) is both specific and low-level. It means precisely that the set ω of finite ordinal numbers
is well defined. By contrast, a proposition such as (5.6.2) means that “for some set X, X is infinite”. In other
words, there exists at least one set which is infinite. They are effectively equivalent axioms when combined
with the other axioms. So one may choose the form of the infinity axiom according to one’s objectives and
stylistic preferences.
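As a sketch only, and not necessarily the form which the author would choose, proposition (5.6.2) can be expanded into low-level predicate logic by replacing u ⊆ v with ∀w, (w ∈ u ⇒ w ∈ v), and by replacing v ≠ u with the existence of an element of v which is not in u. The second replacement assumes the extension axiom, and it is valid here only because u ⊆ v has already been asserted.

```latex
\exists X,\ \Bigl( (\exists u,\ u \in X) \wedge
  \forall u,\ \bigl( u \in X \Rightarrow
    \exists v,\ \bigl( v \in X
      \wedge (\forall w,\ (w \in u \Rightarrow w \in v))
      \wedge (\exists w,\ (w \in v \wedge w \notin u)) \bigr) \bigr) \Bigr).
```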
[ Rewrite proposition (5.6.2) in low-level logic. ]
5.6.3 Remark: One might reasonably ask why the very general concept of infinite sets is described in ZF
set theory in terms of the von Neumann ordinal number construction, where each element in the set ω is
constructed as x ∪ {x} from each preceding set x.
The construction of an infinite set X requires the specification of a new set which is different to all preceding
sets, for each given subset of X. To be able to state that a set X is infinite, we need to be able to assert
that no matter how large any subset Y ⊆ X is, we can always generate a set S(Y ) which is different to all
elements of Y . So the general requirement is to find a general construction rule which yields a set S(Y ) for
any given set Y, such that S(Y) ∉ Y for any set Y.
The construction Y ↦ Y ∪ {Y} happens to satisfy the requirement. To see this, let S(Y) = Y ∪ {Y} and
note that Y ∈ S(Y), but Y ∉ Y because this is forbidden by the regularity axiom. Therefore S(Y) ≠ Y, by
the “substitution of equality” axiom of a first order language with equality. So each generated set is different
to the previous set.
We also need to prove that S(Y ) is different to all preceding elements of the sequence of sets. To show
this, note that Y ⊆ S(Y). In fact, by mathematical induction, it is clear that all generated elements of the
sequence include all previous elements of the sequence. In fact, each element Y of the sequence equals the
union of all elements in the sequence up to and including Y. Therefore the proposition Y ∉ Y also implies
that S(Y) is different to all of the preceding elements of the sequence.
The construction S(Y) = {Y} looks simpler, and it easily guarantees S(Y) ≠ Y (by the regularity axiom).
One can show that no two elements of the sequence ∅, {∅}, {{∅}}, . . . are equal by applying the regularity
axiom inductively.
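The two successor constructions in this remark can be compared on their first few terms. This is a minimal sketch which assumes the frozenset model of hereditarily finite sets; the names von_neumann, zermelo and iterate are illustrative. It checks only finitely many terms, so it is an illustration, not a proof.

```python
def von_neumann(y):
    """S(Y) = Y ∪ {Y}."""
    return y | frozenset({y})

def zermelo(y):
    """S(Y) = {Y}."""
    return frozenset({y})

def iterate(step, count):
    """The first `count` terms of the sequence ∅, S(∅), S(S(∅)), ..."""
    seq, y = [], frozenset()
    for _ in range(count):
        seq.append(y)
        y = step(y)
    return seq

vn = iterate(von_neumann, 10)
zm = iterate(zermelo, 10)

# Both sequences consist of pairwise different sets.
vn_distinct = len(set(vn)) == len(vn)
zm_distinct = len(set(zm)) == len(zm)

# Only the first construction makes each term the set of all earlier terms.
vn_cumulative = all(vn[j] == frozenset(vn[:j]) for j in range(len(vn)))
```

One visible difference is that in the second sequence each term has only its immediate predecessor as a member, so for example ∅ is not a member of {{∅}}.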
[ Provide a much briefer, and more rigorous, proof that the finite ordinals in Remark 5.6.3 are all different. ]
[ Find a convincing reason why one should not define S(Y ) = {Y } in Remark 5.6.3. ]
5.6.4 Remark: The inductive rule in the infinity axiom is uncomfortably similar to the inductive rules in
Remarks 5.7.14 and 5.7.15 which arise from the study of Russell’s paradox for “universe sets”. One might
reasonably ask whether the reasons for excluding universe sets from set theory might also be applicable in
the case of infinite sets which are guaranteed to “exist” by the ZF infinity axiom.
As mentioned in Remarks 5.5.1 and 5.5.3, infinite chains of set memberships “on the left” are forbidden in
order to exclude universe-like sets which lead to Russell’s paradox. No logical contradiction arises from the
infinity axiom, but it is difficult to see how an infinite set of sets can be a model for a concrete physical
system, or even for a system which exists only in human minds.
The infinity axiom is best thought of as providing a boundary-less “arena” in which mathematics can be
carried out without ever having to worry about where the bounds lie. For comparison, it is useful to have a
model of the physical universe which is infinite in time and space, whether or not this is true. This provides
an economy of thought. If the model is boundary-less (which is what the induction principle tells us),
then the special boundary case never needs to be considered. No matter where the real boundary is, we can
always ignore it because it’s somewhere “over the horizon”.
This view of the infinity axiom is equally applicable to the infinitesimality of the real numbers. Even if real
time and space are somehow particulate in nature, at least the model has no bounds on resolution. So limits
may be calculated without any concerns about the finite resolution of time or space. Giving our models
a higher resolution than the real world presumably does less harm than if the models have an insufficient
resolution.
5.7. Russell’s paradox
[ The section is somewhat repetitive. It needs to be weeded and compressed. ]
Russell’s paradox is so deep (and unpleasant), it deserves its own section. Maybe it deserves its own book!
5.7.1 Remark: The importance of Russell’s paradox lies in the fact that it is simple, and yet it was not
known to be a serious problem by the professionals until 1903. If such a fundamental difficulty with the idea
of “the set of all sets” can go unnoticed for such a long time, one must seriously wonder whether there are
perhaps other such serious problems still waiting to be found.
A second good reason to be worried about Russell’s paradox is the fact that the work-around for this problem
is the fairly arbitrary-looking axiom of regularity. (See Section 5.5.) This gets around the problem, but it is
not instantly clear that it is a natural requirement for sets. The regularity axiom looks like an untidy patch
for a fundamentally broken system.
Two objectives arise from these observations: (1) to determine restrictions on logical systems (such as a set
theory) which will ensure that such a big problem never arises again, and (2) to find a way of viewing the
concept of “the set of all sets” so that it seems totally unreasonable to expect such a thing to be accepted. It
is important to develop some sort of instinctual recognition which will prevent such things from happening
again.
5.7.2 Remark: The term “naive set theory” has many different meanings, including the following.
(1) Halmos [160] wrote a book titled “Naive set theory”, meaning an informal approach to axiomatic set
theory. This approach was informal in the sense that there was no formal propositional calculus or
predicate calculus.
(2) “Naive set theory” may be defined as set theory which is inborn in humans, or which is commonly
known by people who study mathematics before encountering the mathematical theories of set theory
such as Zermelo-Fraenkel.
(3) A set theory may be called “naive” if it includes the axiom of naive comprehension:
∀F, ∃x, ∀y, (F (y) ⇔ y ∈ x). (5.7.1)
This means that there is a “set” corresponding to each predicate function F .
5.7.3 Remark: Case (3) in Remark 5.7.2 may be thought of as converting adjectives into nouns. If there
is a one-to-one correspondence between predicates and sets, it seems almost superfluous to define sets at all.
If every set corresponds to a predicate, and vice versa, set theory is really the same as the study of the relations
between predicates. In terms of indicator functions, described in Section 7.9, there is a formal correspondence
between any single-parameter logical predicate P and the indicator function χ_x, where x is the set whose
existence is guaranteed by the naive comprehension (NC) axiom: ∃x, ∀y, (y ∈ x ⇔ P(y)). This requires the identification of the
truth values F and T with the numbers 0 and 1 respectively. Then ∀y, (y ∈ x ⇔ P(y) ⇔ χ_x(y) = 1).
(To be continued. . . or deleted!)
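The correspondence between a predicate, a set and an indicator function can be seen in a small finite example. The sketch below is restricted to a finite domain, so it illustrates bounded specification, not the naive comprehension axiom itself; the domain and the predicate P are hypothetical choices made only for illustration.

```python
domain = range(10)                    # a finite stand-in for the variable domain

def P(y):
    """A hypothetical single-parameter predicate."""
    return y % 3 == 0

x = frozenset(y for y in domain if P(y))   # the set corresponding to P

def chi_x(y):
    """Indicator function of x, with F and T identified with 0 and 1."""
    return 1 if y in x else 0

# For every y in the domain: y ∈ x ⇔ P(y) ⇔ χ_x(y) = 1.
agree = all(P(y) == (y in x) and (y in x) == (chi_x(y) == 1) for y in domain)
```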
5.7.4 Remark: The quantifiers ∃x and ∀y in (5.7.1) in Remark 5.7.2 refer to an implied concrete variable
domain. The axiom of naive comprehension is supposed to be satisfied by such a domain. But the conse-
quences of this axiom are only valid if a variable domain can be found which satisfies it! It is perhaps not
emphasized enough in logic that none of the carefully argued proofs have any validity at all if no system can
be found for which the axioms are valid.
The discovery of a contradiction in a set of axioms implies, one would assume, that the set of axioms is null
and void, and to no further effect. It is a mystery, therefore, why some people would want to persist with
the study of axioms which are not valid for any possible system.
Since the axiom of naive comprehension seems to give rise to a plethora of contradictions, one would assume
that it should be abandoned and ignored. But it continues to occupy the minds of many otherwise serious
people. No scientist, engineer or architect could make a serious career from the study of models which contain
contradictions which prevent them being applicable in any possible universe. It is one of life’s mysteries that
this kind of useless study is regarded as respectable when carried out in the philosophy department.
[ Check what the correspondence is between NBG set theory and the axiom of naive comprehension in
Remark 5.7.5. ]
5.7.5 Remark: Case (3) seems to correspond to NBG set theory, where sets co-exist with proper classes.
NBG classes correspond to arbitrary predicates on the space of all sets. NBG classes are only permitted to
be members of other sets if they are sets according to ZF.
5.7.6 Remark: Theorems which are derived in accordance with the axiom of naive comprehension are
marked as [nc].
5.7.7 Theorem [nc]: Russell’s paradox
Let U be a set which satisfies ∀x, x ∈ U. Let Q = {x ∈ U; x ∉ x}. Then Q ∈ Q and Q ∉ Q.
Proof: First show that Q ∈ Q. If Q ∈ Q, then the proposition is obviously true. So suppose that Q ∉ Q
and let x = Q. Then x ∈ U by the definition of U. So x ∈ U and x ∉ x. Therefore x ∈ Q by the definition
of Q. In other words, Q ∈ Q, as was to be proven.
Now show that Q ∉ Q. If Q ∉ Q then the proposition is obviously true. So suppose that ¬(Q ∉ Q). By the
law of the excluded middle, this implies that Q ∈ Q. Then by the definition of Q, Q ∈ U and Q ∉ Q. So
Q ∉ Q, as was to be proven.
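The contradiction in Theorem 5.7.7 uses nothing about sets beyond the single instance x = Q of the definition of Q. So no binary “membership” relation on any domain can have an element Q satisfying ∀x, (x ∈ Q ⇔ x ∉ x). The following brute-force sketch confirms this for all relations on very small finite domains; the function names and the encoding of a relation as a table mem[x][q] are assumptions made for this illustration.

```python
from itertools import product

def has_russell_element(mem, n):
    """mem[x][q] is True when x ∈ q. Seek q with: for all x, x ∈ q ⇔ x ∉ x."""
    return any(all(mem[x][q] == (not mem[x][x]) for x in range(n))
               for q in range(n))

def count_relations_with_russell_element(n):
    """Count the binary relations on {0, ..., n-1} which have a Russell element."""
    count = 0
    for bits in product([False, True], repeat=n * n):
        mem = [bits[i * n:(i + 1) * n] for i in range(n)]
        if has_russell_element(mem, n):
            count += 1
    return count

counts = [count_relations_with_russell_element(n) for n in (1, 2, 3)]
```

Every count is zero because the required equivalence always fails at x = q, which is exactly the step used in the proof.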
5.7.8 Remark: Theorem 5.7.9 “proves” that Russell’s paradox is not a problem. It assumes NC, the axiom
of naive comprehension. If the proof of Russell’s paradox is applied to a particular subset of the supposed
universe set, it follows that the universe set does not exist. Even more satisfying than this “result” is the
consequence that there are no sets at all which are elements of themselves. In other words, the axiom of
naive comprehension implies a kind of regularity axiom.

5.7.9 Theorem [nc]: ∀x, x ∉ x.
Proof: Let U = {x; ⊤}, where ⊤ denotes the always-true predicate. (See Notation 4.12.10.) Then U is
the universe of all sets. Let V = U \ {U} and Q = {x ∈ V; x ∉ x}.
Suppose Q ∈ Q. Then Q ∈ V and Q ∉ Q. This is a contradiction. So the proposition Q ∈ Q must be false.
Now suppose that Q ∉ Q. If Q ∈ V, then Q ∈ Q by the definition of Q. This is a contradiction. So the
proposition Q ∈ V must be false. Therefore Q ∉ V.
The only set which is not a member of V is U itself. So the proposition Q ∉ V implies that Q = U. Then by
the definition of Q, it follows that ∀x, (x ∈ U ⇔ (x ∈ V ∧ x ∉ x)). This implies both ∀x, (x ∈ U ⇒ x ∈ V)
and ∀x, (x ∈ U ⇒ x ∉ x). So U ⊆ V and ∀x ∈ U, x ∉ x. The proposition U ⊆ V implies that U ∉ U. The
proposition ∀x ∈ U, x ∉ x means that there are no sets which are members of themselves.
5.7.10 Remark: The “proof” of Theorem 5.7.9 gives a hint as to why it is necessary to get serious about
the fundamentals of mathematical logic. It is often difficult to keep track of what is being assumed and
what is being proved. When the set of accepted assertions is infected by just one meaningless proposition,
everything that follows is pure nonsense. On the other hand, Theorem 5.7.9 does give a rather pleasing
result. It “proves” that sets cannot be elements of themselves, which makes good sense. But then again, the
proof of Theorem 5.7.7 is just as valid, but its result is totally horrible, namely the invalidity of the most
fundamental property of truth values, the “excluded middle” rule. Sadly we cannot pick and choose which
“theorems” we accept from a tainted batch. The whole batch must be discarded to be safe.
In Theorem 5.7.9, an inconsistent set of axioms which permits a universal set U = {x; ⊤} satisfying ∀x, x ∈ U
is shown to satisfy U ∉ U, which directly contradicts the definition of the set U. But if the definition of
U is contradicted by its own consequences, this implies that the whole proof must be wrong because it relies
totally on the definition of U. This suggests that there is something fundamentally wrong with the proposition
∃U, ∀x, x ∈ U . Perhaps it should be replaced with ∃U, ∀x, (x = U ∨ x ∈ U ). Such a proposition is reminiscent
of the definition of a maximal element of a partially ordered set S: ∃M ∈ S, ∀x ∈ S, (x = M ∨ x < M ). So
it does not seem unnatural to define the “most comprehensive set” in a similar fashion.
Whether or not the proof of Theorem 5.7.9 is valid, and in which logical frameworks it may be valid or
otherwise, the theorem does strongly suggest abandoning the idea that the set U of all sets satisfies U ∈ U .
So the regularity axiom (Definition 5.1.26 (7)) is more natural than it may at first seem. In fact, it is natural
to require at least that the set membership relation be anti-reflexive (i.e. ∀x, ¬(x ∈ x)) and antisymmetric
(i.e. ∀x, ∀y, ¬(x ∈ y ∧ y ∈ x)), and furthermore that there should be no cyclic chains.
5.7.11 Remark: It could be claimed that the regularity axiom (discussed in Remark 5.5.1) is an ad-hoc
solution for a deep and real problem with the concept of sets and classes. In fact, Mortensen [167] suggests
that Russell’s paradox may be escaped by abandoning the “excluded middle” logical law, adopting instead
“paraconsistent logic” or “inconsistency-tolerant logic”. (See Mortensen [167], pages 1–5.) This is an extreme
and unnatural escape route.
It is, in fact, quite natural to forbid sets to be members of themselves, either directly or via a chain of
containments: x ∈ y ∈ z ∈ . . . ∈ x. Set membership is inspired by the metaphor of physical containers, and
one never observes in reality that a container is contained in itself, except perhaps in an Escher lithograph.
(See for example “Print Gallery”, 1956, in Escher [201], page 127.) The superficially plausible claim that
the set of all things should contain the set of all things is as reasonable as trying to assign meaning to a
statement like: “This sentence is false.” A similar piece of nonsense is the equation x = x + 1.
Consider also that we define “the fastest swimmer” to mean “the swimmer who is faster than all other
swimmers”, not “the swimmer who is faster than all swimmers”. Similarly, it is natural to define the
universe of all sets to be “the set which contains all other sets”.
[ The axiom template in Remark 5.7.2 case (3) would be more likely to avoid Russell’s paradox if it was
replaced with: ∀F, ∃x, ∀y, ((F(y) ∧ y ≠ x) ⇔ y ∈ x). Check whether there may be a completely general such
modification to the naive comprehension axiom which generates all NBG classes, or at least guarantees x ∉ x,
or guarantees the ZF regularity axiom. ]
5.7.12 Remark: In terms of the world-model ontology for logic described in Section 3.6, the replacement
of the excluded middle with tolerance of contradictions means that a logic machine’s world-model is
simultaneously in two complementary subsets of the world-model state space. (See Figure 5.7.1.)
Figure 5.7.1 Inconsistency-tolerant logic with world-model ontology (a state z, with region Z_P labelled t(P(z)) = T and region Z \ Z_P labelled t(P(z)) = F)
A careful study of the kind of thinking which arrives at inconsistency-tolerant logic reveals context confusion.
Logic commences with the assumption that there are propositions which are either true or false, and not
both. Then various relations between the truth values of propositions are noticed, and these relations are
used to discover the truth values of propositions. But then when these relations yield contradictory results,
somehow it is the truth values of the propositions which are held responsible. But it is clear that the relations
must be faulty, because the validity of the relations between truth values comes entirely from their ability
to correctly determine truth values of propositions.
Now suppose it is argued that in reality, propositions can be both true and false. For example, in the
quantum mechanical Schrödinger’s cat scenario, the cat may be both alive and dead. But this is actually a
third “mixed state”. There are three possible states of the cat in this model: “alive”, “dead” or “mixed”.
For each of the three states z, the proposition that the cat is in state z may be either true or false, and
precisely one of these three assertions is true according to this model. When one of these three propositions
is true, the state z is in the set of world-model states corresponding to that proposition. When it is false,
the state is in the complementary set.
In the classical model, there are two states. In the quantum model, there are three states. If one interprets
the mixed quantum state in terms of the classical model, it looks like a proposition is simultaneously true
and false. But this mixed state does not occur in the classical model! So the contradiction occurs only when
the two models are confused. Each model-state must be interpreted within its own model, because that is
where it is given meaning.
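The three-state model can be written out explicitly. In the sketch below, which is only an illustration with hypothetical names, each proposition “the cat is in state z” is represented by its set of world-model states, and the truth function t reports whether the current state lies in that set. Within this model every proposition has exactly one truth value in every state, including the state “mixed”.

```python
STATES = ("alive", "dead", "mixed")    # state space Z of the three-state model

def truth_set(name):
    """The set Z_P of states in which 'the cat is in state <name>' is true."""
    return {z for z in STATES if z == name}

def t(prop_states, z):
    """Truth value of a proposition, given as its truth set, in state z."""
    return z in prop_states

# In every state, precisely one of the three propositions is true.
true_counts = {z: sum(t(truth_set(name), z) for name in STATES) for z in STATES}
```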
Figure 5.7.1 depicts a contradictory situation. But this does not agree with the definition of the word “false”.
A proposition is said to be false when the world-model is not in one of the states where the proposition is true.
What is illustrated here is a three-valued truth function. The states in which the proposition P is both true
and false need to be given some sort of meaning. It is meaningless to simply say that a proposition is both
true and false. Not all sequences of words are meaningful sentences. It is even difficult to give meaning to the
individual truth values “true” and “false”. In the case of those who argue for an inconsistency-tolerant logic,
there is a belief that methods of argument take priority over the definition of a proposition. But methods
of argument are only valid to the extent that they correctly recover the truth values of propositions. If they
yield contradictory results, this is an indication that one or more assumptions in the method of argument
are invalid.
To give an analogy, if one solves algebraic equations for a real number x, and one method of calculation
yields x = 3 while another method yields x = 57, one would not say that the correct answer is that x equals
both 3 and 57 simultaneously. There is no number which equals both 3 and 57. At the very least, a method
of calculation must yield meaningful results. That is, the results must be at least consistent with the model
under consideration. Meaningfulness is a prerequisite for correctness.
5.7.13 Remark: To understand why (and how) the “set of all sets” concept is hopelessly self-referential
(like the Liar’s paradox), it is amusing to try to construct very simple examples of universe sets and see how
Russell’s paradox (Theorem 5.7.7) happens.
The simplest case of a universe of sets is U = ∅. In other words, we have a system which has no sets at
all. However, U is the set of all sets, which is therefore, apparently, a set. Therefore we must have U ∈ U ,
apparently. So we must correct the definition of U to at least contain U itself. So we must have U = {U }.
This seems fine. This universe candidate satisfies U ∈ U . It looks like we should add also the set {U } as a
member of U , but this is not necessary because U and {U } are the same set. (If we look inside U , we will see
another copy of U, and inside that is yet another copy, and so on ad infinitum. But that is not necessarily
a serious problem, although it does seem a rather unusual kind of set!)
Now with U = {U}, what is Q? By definition, Q = {x ∈ U; x ∉ x} = ∅ because U ∈ U. Then Q is clearly
a subset of U, but it is not (currently) an element of U. If Q ∉ U, then we have no problems. We will
have Q ∉ Q = ∅, and Q ∉ U = {U} because Q ≠ U, because U ≠ ∅. This seems to have avoided Russell’s
paradox.
However, the paradox rests heavily on the axiom of specification. It is assumed that the universe U is closed
under the restriction {x ∈ U ; P (x)} of U by any predicate P . In particular, the always-false predicate yields
the empty set. Therefore we must have ∅ ∈ U . So now we have U = {U, ∅}. Therefore Q = {∅} because
U ∈ U and ∅ ∉ ∅. Once again, we have Q ∉ Q and Q ∉ U. No problem!
The problem, though, is that the specification axiom now requires {x ∈ U; x ∉ x} to be an element of U. So
we must once again add Q = {∅} to U . This gives U = {U, ∅, {∅}} and Q = {∅, {∅}}, which does not trigger
Russell’s paradox. But repeated applications of the specification axiom give U = {U, ∅, {∅}, {∅, {∅}}, . . .}
and Q = {∅, {∅}, {∅, {∅}}, . . .}. The reader who has studied ordinal numbers should recognize that Q = ω
at this stage, namely the set of finite ordinal numbers. (See Definition 7.2.12.)
The problem does not end there. The requirement Q ∈ U arises once again from the specification axiom,
even after we apply mathematical induction to arrive at U = {U } ∪ ω and Q = ω. We must now insert ω
into U . This gives U = {U } ∪ ω ∪ {ω} and Q = ω ∪ {ω}. This process does not stop until the entire “set”
of ordinal numbers is included in U .
From this sort of argument, the ordinal numbers arise in a natural way from the application of the specifi-
cation axiom to a supposed universe of sets. Thus when Russell’s paradox is applied to just the empty set,
the resulting set construction leads directly to the Burali-Forti paradox, which is dated 1897, 6 years before
Russell’s paradox, dated 1903. (For example, see Halmos [160], page 80; EDM2 [35], 319.B, page 1188;
Mendelson [165], page 2.) This is because the “final” value of Q in this induction process is the “set” W of
all ordinal numbers, and this “set” has the well-known property that W is less than W , which is a logical
contradiction.
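The first few steps of the construction in this remark can be reproduced mechanically. The sketch below assumes that U is written as {U} ∪ X, and it stores only the well-founded part X as a Python set of frozensets, because a frozenset cannot contain itself. The names russell_set, close_under_russell and ordinal are illustrative.

```python
def russell_set(members):
    """Q = {x ∈ U; x ∉ x} for U = {U} ∪ members. U is skipped because U ∈ U."""
    return frozenset(x for x in members if x not in x)

def close_under_russell(steps):
    """Repeatedly form Q and add it to U, as the specification axiom demands."""
    members, history = set(), []
    for _ in range(steps):
        q = russell_set(members)
        history.append(q)
        members.add(q)
    return history

def ordinal(n):
    """The finite von Neumann ordinal n."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset({s})
    return s

history = close_under_russell(6)
matches_ordinals = history == [ordinal(k) for k in range(6)]
```

The successive values of Q are ∅, {∅}, {∅, {∅}}, and so forth, which are the finite ordinals 0, 1, 2, . . . in order.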
5.7.14 Remark: Note that the proposition U ∈ U is in itself highly problematic, even without thinking
about Russell’s paradox or any set theory axioms. In the simplest case, if U = {U }, we might like to
know what U is. That’s simple! Clearly U equals {U }. So we can express an unknown set in terms of
itself. If we ask what is inside U , it is just U . And when we look inside again, we see U again. Therefore
U = {U} = {{U}} = {{{U}}} = . . . . So we can know what U is if we know what U is. But knowing that
U = {U } is very much like knowing that x = x + 1. (This must be a very big number because whatever x is,
x must be 1 bigger than that. So x = (x + 1) + 1 = x + 2. So x is arbitrarily larger than whatever number
we can think of. This is actually a rather poetic “definition” for infinity. But we cannot accept this as a
serious number.) Some self-referential equations do not have solutions. And U = {U } seems to be such an
equation.
It is fairly clear that U = {U } really defines nothing at all. So this example of the proposition U ∈ U is
unacceptable and meaningless, and should not be permitted to enter into mathematics. It is more suited to a
book of riddles and brain-teasers. But suppose there are more elements in U . I.e. suppose that U = {U } ∪ X
for some set X which satisfies U ∉ X. Then when we look inside U , we see U together with all of the
elements of X. And when we look inside this copy of U , we see another copy of U together with the elements
of X. Making the substitution gives U = {{U } ∪ X} ∪ X. The process never ends, and we never find out
what U is! So it doesn’t matter how we look at it, we can never know what U is.
The self-reference problem is really the core of Russell’s paradox. From a meaningless assumption, any
number of contradictions may be derived. Perhaps the main reason why it is so difficult to see what
is wrong with Russell’s paradox is the fact that the core of the problem is hidden in the self-referential
requirement U ∈ U which arises from ∀x, x ∈ U . The self-reference gives rise to positive feedback which –
surprise, surprise – does not yield a stable solution. (Sounds a bit like my book really!)
5.7.15 Remark: The recursive construction in Remark 5.7.14 uses substitution in each iteration step.
That is, the universe set U generates an infinite number of universes by iterating the rule Un+1 = {Un },
where U0 = U .
Another kind of infinite recursion results from the equation Un+1 = Un ∪ {Un }. (This rule is the same as
the successor set construction in Definition 7.2.2 which generates the ordinal numbers, and the Burali-Forti
paradox if taken too far.) This kind of recursion arises when trying to define the universe set U as “the set
which contains all other sets”. This definition avoids the self-referential nature of the definition: “the set of
all sets”.
However, this approach still has serious problems. If we have initially any set U0 which does not contain the
universe U0 , we can add U0 into U0 to construct U1 = U0 ∪ {U0 }. Then we can add U1 to this to obtain
U2 = U1 ∪ {U1 } = U0 ∪ {U0 , U1 } and so forth. (See Figure 5.7.2.) But when induction is applied to this
process to yield Uω = ⋃_{n=0}^∞ Un , we can construct Uω+1 = Uω ∪ {Uω } and so forth. This gives a class of sets
which are the same as the ordinal numbers except that the seed set is U0 instead of ∅.
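The two recursion rules in this remark can be tried out on finite stages. The following is only a minimal sketch: it assumes Python frozensets as a stand-in for sets, and the names wrap, successor, iterate and the atom "a" in the seed set are illustrative choices which do not appear in the text.

```python
# Sketch only: frozensets stand in for sets; all names here are hypothetical.

def wrap(s):
    """Rule U_{n+1} = {U_n}."""
    return frozenset({s})

def successor(s):
    """Rule U_{n+1} = U_n ∪ {U_n} (the successor set construction)."""
    return s | frozenset({s})

def iterate(rule, seed, steps):
    """Return the finite list [U_0, U_1, ..., U_steps]."""
    stages = [seed]
    for _ in range(steps):
        stages.append(rule(stages[-1]))
    return stages

seed = frozenset({"a"})            # a non-empty seed set U_0 with one atom
chain = iterate(successor, seed, 4)
nested = iterate(wrap, seed, 4)

# Each successor stage gains exactly one new element, namely the previous stage.
sizes = [len(u) for u in chain]
# U_2 = U_0 ∪ {U_0, U_1}
u2_expanded = seed | frozenset({chain[0], chain[1]})
```

Every stage is a member of all later stages, and no stage is a member of itself. The code can only reach finite stages. The limit stage Uω is outside its scope.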
Figure 5.7.2 Iteration of universe sets (nested sets U0, U1, U2, U3)
From this kind of argument, it is clear that Russell’s paradox is only a consequence of the Burali-Forti
paradox when the successor set construction is applied to any set at all. In this sense, Russell’s paradox
is a downstream consequence of the Burali-Forti paradox because the real paradox lies in the universe set
definition itself, not in its properties. On the other hand, Russell’s paradox may be presented in a very
elementary way, whereas the ordinal numbers are an amazingly convoluted construction.
5.7.16 Remark: The attempt by the universe set to contain itself leads to a chain of expanding universe
instances, and the rate of expansion accelerates as the universe set gets bigger. One might wonder if an
expanding physical universe might be related to this. Accelerating expansion suggests a positive feedback
loop. But the analogy is almost certainly superficial.
5.7.17 Remark: Within the “cognitive science” perspective, described in Lakoff/Núñez [173], pages 121–
131, sets and classes are regarded as an abstraction of physical containers. In other words, the human mind
performs manipulations of mathematical sets using the same general apparatus as is used for thinking about
physical containers. The human capability to think about containers is clearly advantageous to survival.
This capability is present also in many kinds of animals. The claim in the cognitive theory perspective is
that it is this mental capability which is being put to work when a mathematician thinks about sets. This
seems entirely plausible. The fact that set operations are taught at an elementary level in terms of Venn
diagrams adds weight to this metaphorical connection.
One might expect some help in resolving the question of universe sets by referring to the container metaphor.
One could ask whether a bucket is inside itself or not. (See Figure 5.7.3.) The same ambiguities which arise
in abstract set theory arise also in the case of a physical bucket. The words “inside” and “outside” do not
seem to apply to the bucket itself. The bucket has a non-zero thickness. Points in the world are divided into
“inside”, “outside” and “boundary”. Is the boundary within the set or outside it? In topology, this question
is answered in exact and satisfying terms. (The topology of set boundaries relies heavily on infinite limits,
which in turn lead to very much the same sorts of problems again. But that’s a story for another day.)
If every bucket is contained inside itself, this suggests that we adopt the axiom ∀x, x ∈ x. This is not a totally
impossible axiom to work with. We can define the empty set to be the set ∅ which satisfies x ∈ ∅ ⇔ x = ∅.
The rest of set theory can be developed by carefully excluding the case x ∈ x when considering any set x. It
would be an “interesting” exercise to re-express the ZF axioms to be consistent with the proposition ∀x, x ∈ x.
(See Exercise 46.2.5.)
It seems much less confusing to say that a bucket is not contained inside itself. This suggests an axiom such
as ∀x, x ∉ x, which excludes the possibility of a universe set. If we imagine a bucket which contains literally
everything, it must contain itself. But the idea of a bucket containing itself is meaningless, a mere quandary
for philosophers to discuss over afternoon tea. Discussion of the “set of all sets” riddle should be kept
harmlessly within the bounds of the philosophy department tea room.
Figure 5.7.3 A bucket (regions labelled: boundary, inside, outside)
A third solution is to say that all buckets are on the “boundary” of themselves. This is neither x ∈ x
nor x ∉ x. In terms of logic, there cannot be a third possibility if we accept the “excluded middle” principle.
So we must either invent a third logical truth value, or else invent a second set relation in addition to ∈.
But then we would need to generalize ZF set theory to include this second set relation xBy to mean “x is
inside the boundary of y”, which would be a lot of wasted effort, or else express ZF set theory in terms of a
third truth value B (in addition to F and T) which would mean “on the boundary of being true and being
false”. Luckily such monstrosities are not in current use by serious mathematicians.
5.7.18 Remark: The attempt in Remark 5.7.15 to define a universe set would arise naturally if we com-
mence with the ZF-generated sets, say, and try to insert into the ZF sets U0 = “the set of all ZF-generated
sets”. Since this modifies the definition of “the set of all ZF-generated sets” to U1 = U0 ∪ {U0 }, a
recursive process arises as described.
Perhaps the most important thing to notice about any attempt to insert a universe set into the ZF sets is
that it is really totally useless. The constructional activities within the bounds and constraints of ZF set
theory are adequate for mathematics. When very young people want to know what the “biggest number”
is, clearly we cannot name a number which is larger than all others. (This is closely related to the question:
“What is the solution to the equation x = x + 1?”) We can define a non-number called “infinity” for either
the integers or the real numbers, and we can do useful things with such a concept. But we do not try to
insert “infinity” into the standard numbers. We define new sets of extended numbers and keep these sets of
extended numbers clearly separate in our thinking.
In the same way, it is perfectly okay to have a “set_1 of all sets_0” if we keep the level-0 sets separate from the
level-1 sets. This sounds a little like the set classes introduced by Russell/Whitehead [168], but all of the
useful sets are in this case in the first class U0 . The higher classes are provided for recreational mathematics
and logic.
5.7.19 Remark: The difficulties described in Remark 5.7.18 arise in a totally analogous way in computer
file systems. For example, in the unix-family operating systems, it is generally not permitted to make a
“hard link” to a directory, precisely because this would be disastrous for any software which tries to traverse
the file system’s directory tree. (In the olden days, the “super-user” was permitted to make hard links to
directories, but ordinary computer users could not be trusted with such a dangerous capability.)
Figure 5.7.4 illustrates how the file system “folders” (i.e. directories) would appear in a typical graphical user
interface if a “hard link” is made inside a directory “universe” to the directory “universe” itself. (The
unix-style command for this would be something like: “ln . universe”. Do not try this at home unless
you have made a complete computer back-up first.)
Inside the top-level folder, there would appear to be another folder (i.e. sub-directory) called “universe”.
For clarity, the top directory is called U0 here, and the sub-directory is called U1 . (On a real computer, all
of these directories would have the same name.) Then inside the folder U1 would appear another folder U2 ,
and so forth.
Figure 5.7.4 Computer “folder” which contains a “hard link” to itself (nested folders named “universe”, labelled U0, U1, U2, ...)
There is actually nothing logically wrong with this hall-of-mirrors situation, apart from the fact that any
software which tries to traverse the directory tree will “hang”. (Good software would detect the infinite loop
and terminate it, but not all software is written so carefully.) The main argument against a self-containing
universe set is that it is useless and is also a huge burden to maintain. High price, no benefit.
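The difference between careless and careful traversal software can be sketched as follows. This is an illustration under stated assumptions, not a description of any real file system interface: the directory tree is simulated by a Python dictionary which maps a directory identifier to its named entries, and the names fs, naive_depth and walk are hypothetical.

```python
# Sketch only: a simulated file system, not a real one.
# Directory 0 is "universe"; it contains an entry "universe" which is a
# hard link back to directory 0 itself.
fs = {0: {"universe": 0}}

def naive_depth(fs, root, limit):
    """Descend into the first entry of each directory, up to 'limit' levels.

    Without the artificial limit, this loop would never terminate on a
    self-containing directory.
    """
    depth, node = 0, root
    while fs[node] and depth < limit:
        node = next(iter(fs[node].values()))
        depth += 1
    return depth

def walk(fs, root, root_name="universe"):
    """Traverse the tree, remembering which directories were already seen."""
    seen, loops = set(), []
    stack = [(root, root_name)]
    while stack:
        node, path = stack.pop()
        if node in seen:
            loops.append(path)      # report the loop instead of following it
            continue
        seen.add(node)
        for name, child in fs[node].items():
            stack.append((child, path + "/" + name))
    return seen, loops

visited, loops = walk(fs, 0)
```

The careful traversal visits the single directory once and reports the path at which the loop was detected. The naive traversal stops only because of its artificial depth limit.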
In mathematics, if the membership relation on sets is not an acyclic graph with a finite depth for traversals
“on the left”, it will not be possible to determine the meaning of a set. The most basic level of meaning of
a set is determined by its elements. The membership of a set is what distinguishes one set from another. A
set’s membership determines its identity since the only property of a set is its membership relations. This
is recursively true for each of the members of a set. If the recursive expansion of the elements of a set in
terms of further elements does not terminate, then a set’s meaning is unknowable. The finite termination of
set membership traversal “on the left” is precisely what is stipulated by the ZF axiom of regularity. (See
Section 5.5.)
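For a finite membership graph, the termination of traversals “on the left” can be tested mechanically. The sketch below assumes that the membership relation is given as a dictionary from set names to lists of element names. The names rank and is_well_founded are illustrative, and the finite test is only an analogue of the regularity axiom, not a statement of it.

```python
# Sketch only: a finite membership graph as {set name: list of element names}.

def rank(members, x, in_progress=None):
    """Return the membership depth of x, or raise ValueError on a cycle."""
    if in_progress is None:
        in_progress = set()
    if x in in_progress:
        raise ValueError("membership cycle through " + repr(x))
    in_progress.add(x)
    try:
        # The depth of a set is one more than the greatest depth of its elements.
        return max((rank(members, y, in_progress) + 1 for y in members[x]),
                   default=0)
    finally:
        in_progress.discard(x)

def is_well_founded(members, x):
    """True if every membership traversal starting at x terminates."""
    try:
        rank(members, x)
        return True
    except ValueError:
        return False

ordinals = {"0": [], "1": ["0"], "2": ["0", "1"]}   # 2 = {0, 1}
loop = {"U": ["U"]}                                 # U = {U}
mixed = {"U": ["U", "X"], "X": []}                  # U = {U} ∪ {X}
```

The expansion of “2” terminates at depth 2, whereas the expansions of U = {U} and U = {U} ∪ {X} never terminate.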
A different way to construct the directory hierarchy which is illustrated in Figure 5.7.4 is to start with
an empty directory called “universe” and copy this empty directory into itself. There will then be two
directories, U0 and U1 ∈ U0 , both called “universe”. Then U0 (which now contains U1 ) can be copied
into U0 again. This time a new directory U2 appears inside U1 . The new U2 ∈ U1 is a copy of the old U1 .
Each time the copy operation is done, a new sub-directory appears as in the diagram. This process yields
an unbounded sequence of directories rather than an infinite sequence. But this dynamic process yields the
same result as the static hard-link method up to a finite depth.
The problems get serious when one tries to construct a file system directory corresponding to the set Q =
{x ∈ U ; x ∉ x}. First a directory Q should be constructed inside U . Then there is the question of what
to put inside the new directory Q. It is clear that Q is an empty directory. So Q does not contain the
element Q. It is also clear that Q ≠ U because Q is (currently) empty and U contains Q. So there is no
doubt that Q ∉ Q. But Q ∈ U . So the construction rule Q = {x ∈ U ; x ∉ x} requires that Q must be
inserted in Q.
But now we have Q ∈ Q. So the rule Q = {x ∈ U ; x ∉ x} requires that Q must be removed from Q.
This process clearly does not converge. It yields a sequence which alternates between the directory trees
Q ∈ Q ∈ U and Q ∈ U . (See Figure 5.7.5.)
Figure 5.7.5 Computer “folder” containing the folder of all non-self-containing folders (two alternating states of the folder universe U = {Q}, where Q = {x ∈ U ; x ∉ x}: with and without Q inside Q)
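The alternation between the two directory trees can be simulated with a single truth value. This is a minimal sketch which assumes U = {Q}, so that the only question is whether Q currently contains Q. The names russell_step, history and stable_states are illustrative.

```python
# Sketch only: the state is the truth value of "Q ∈ Q", assuming U = {Q}.

def russell_step(q_in_q):
    """Re-apply the rule Q = {x ∈ U; x ∉ x} to the current directory tree.

    Q belongs to the rebuilt Q exactly when Q did not contain itself before.
    """
    return not q_in_q

state = False                 # Q starts as an empty directory
history = [state]
for _ in range(6):
    state = russell_step(state)
    history.append(state)

# A stable directory tree would be a state which the rule leaves unchanged.
stable_states = [s for s in (False, True) if russell_step(s) == s]
```

The history alternates forever between the two states, and the list of stable states is empty.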

The source of the problem here is very clear. The membership of Q is defined in terms of the membership
of Q. It is specified to be the reverse. Therefore the test x ∈ Q ⇔ x ∉ x gives a contradiction when
Q is substituted for x. Note particularly that this has nothing at all to do with the special properties of the
universe set U . The problem arises in exactly the same way if one specifies merely the simple system:
    U = {Q}
    Q = {x ∈ U ; x ∉ x}.
Such a system of equations has no solution for the membership relation between the sets. This suggests
the conclusion that the cause of Russell’s paradox is the form of the equation for Q, not the universe set
property U ∈ U . So it appears that Russell’s paradox is itself erroneous. So maybe there is no such paradox
at all. On the other hand, the only reason that one expects a set satisfying Q = {x ∈ U ; x ∉ x} to be
a member of a set U is the combination of (1) the closure of U with respect to the specification axiom,
and (2) the idea that U could be an element of U . This combination leads inevitably to the problematic
set Q. Since the specification axiom seems entirely reasonable, this strongly suggests that U ∈ U is totally
unreasonable. But it only becomes totally unacceptable when combined with the specification axiom.
Then again, the specification axiom itself is potentially unreasonable. In the case of a fixed set U , it is totally
reasonable. But when the set U depends on Q, and Q depends on U , as is the case here, such a dynamic
application of the specification axiom becomes unacceptable.
It is also interesting to try to construct a file system directory corresponding to the set W = {x ∈ U ; x ∈ x}.
5.7.20 Remark: The equality Q = {x ∈ U ; x ∉ x} in Remark 5.7.19 is equivalent to x ∈ Q ⇔ (x ∈ U ∧ x ∉ x).
When Q is substituted into this proposition, the result is Q ∈ Q ⇔ (Q ∈ U ∧ Q ∉ Q). It
follows directly that Q ∉ U . Therefore U is not a universe set, or else Q exists in a different universe.
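The step “it follows directly” can be checked by a truth table. The sketch below treats the two propositions Q ∈ Q and Q ∈ U as independent truth values and lists the assignments which satisfy the substituted proposition. The names consistent and solutions are illustrative.

```python
from itertools import product

# Sketch only: the propositions "Q ∈ Q" and "Q ∈ U" as plain truth values.

def consistent(q_in_q, q_in_u):
    """Test the proposition Q ∈ Q ⇔ (Q ∈ U ∧ Q ∉ Q)."""
    return q_in_q == (q_in_u and not q_in_q)

# Pairs (Q ∈ Q, Q ∈ U) which satisfy the proposition.
solutions = [(a, b) for a, b in product((False, True), repeat=2)
             if consistent(a, b)]
```

The only consistent assignment has both Q ∉ Q and Q ∉ U.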
5.7.21 Remark: Using the notations ⊤ and ⊥ for the propositions which are respectively always true or
always false (as in Notation 4.12.10), one may write ∅ = {x; ⊥} and U = {x; ⊤}. This makes it appear that
a universe set is as reasonable an idea as the empty set, which is rarely called into question.
The problem with this argument, however, is that both expressions {x; ⊥} and {x; ⊤} are meta-concepts. Written
out fully in terms of the variable space V for the logical system, they signify {x ∈ V; ⊥} and {x ∈ V; ⊤}
respectively. But the membership relation “∈” is in this case defined in the meta-set-theory within which the
set theory has been defined. This is not the membership relation which is defined in the system’s predicate
space Q. (See Remarks 4.12.3 and 3.1.2 for an explanation of these spaces.) Therefore the expression
{x ∈ V; ⊤} signifies the set V, which is the concrete variable space which is supplied a-priori for the logical
system. This set must be defined in a different set theory, within which the foreground set theory is defined.
It is fairly clear from these observations that Russell’s paradox arises directly from the use of set theory
to define set theory! This kind of chicken-and-egg situation is described in Remark 2.1.1 and elsewhere in
this book. It is very easy to become confused between set operations in the background and foreground set
theories; in other words, between the meta-set-theory and the set theory. If these two layers are confused,
the kind of machine modelling loop which is described in Remark 3.3.5 arises.
5.7.22 Remark: In the perspective of Remark 5.7.21, it would be surprising if a paradox like Russell’s
paradox did not arise. One can be confident that a paradox has been resolved only when its occurrence
seems inevitable rather than surprising or disturbing.
Self-containing sets are no more meaningful than the sentence S1 = “The sentence S1 is true.” This may
seem to be a very safe sentence because it is true if it is true. But if one looks more carefully, one notices
that one can only know that S1 is true if S1 is true. One can only say that S1 is true if it is true. How can
one know that it is true? If it is false, one might think that a contradiction occurs. Maybe not so. Suppose
S1 is false. Then the sentence S1 is false, which is not a contradiction at all. In other words, we observe that
S1 is false if it is false. There is no contradiction here at all! A contradiction only occurs if a statement is
both true and false. In this case, we can choose to say that S1 is true, in which case there is no contradiction,
or false, in which case there is no contradiction.
If one substitutes S1 into the right hand side of the equality repeatedly, the same answer “false” appears.
Remember that simply stating a proposition is not the same thing as asserting it. (See Remark 3.7.1 for
discussion of this.) So there is no contradiction at all in the sentence S1 being false, unless one confuses the
concepts of statement and assertion.
When one realizes that there is a problem with the sentence S1 (because there is no contradiction in stating
that it is true or false), it is then no great surprise when one considers S0 = “The sentence S0 is false.” In
this case there is a serious problem whether it is true or false. There is a contradiction in both cases. But
this is just the other side of the same coin. In the S1 case, a sentence can be both true and false. In the S0
case, the sentence can be neither true nor false.
One may draw an analogy here between S1 and the real-number equation x = x, and between S0 and the
real-number equation x = x + 1. The first equation permits all solutions for x whereas the second equation
permits none. There is no contradiction here. The first equation is merely useless and the second equation
is telling us to go home early. These analogies are summarized in the following table.
    sets               sentences                       numbers
    U = {x; x = x}     S1 = “Sentence S1 is true.”     x = x
    Q = {x; x ∉ x}     S0 = “Sentence S0 is false.”    x = x + 1
In both the liar’s paradox and Russell’s paradox, the problem is resolved by seeing clearly the self-referential
nature of the system, which may be resolved by distinguishing between a logical system and its meta-logical
framework system.
5.7.23 Remark: Suppose the axiom ∀x, x ∉ x is accepted. Then any universe class U satisfies U ∉ U . The
Russell’s paradox construction may be applied to U to obtain Q = {x ∈ U ; x ∉ x}. Clearly then Q = U .
So Q ∉ U . Therefore Q ∉ Q by the definition of Q. It does not matter that Q does satisfy the second
proposition “x ∉ x” which defines Q because it does not satisfy the first criterion x ∈ U . Russell’s paradox
disappears entirely.
As outlined in Remark 5.7.25 in greater detail, it is okay to think about a set of objects, and then think
about a set of sets of objects, and so forth. One may even think unambiguously about a set whose members
are all of the sets which are constructed by iterating this procedure. The paradoxes arise only when there is
confusion as to which side of the membership relation ∈ is the object and which side is the class.
To put it simply, it is okay to think about the universe set, i.e. “the set of all sets”. But the first instance of
the word “set” is a meta-set. It exists in a system in which the ordinary sets are being modelled. The NBG
set theory approach tries to include a universe set as a second-class citizen which has no membership rights.
(First-class sets may be elements of NBG proper classes, but NBG proper classes are not permitted to be a
member of anything at all.) This approach is perhaps not such a good idea. It seems preferable to separate
ZF sets from meta-sets which are formed from ZF sets. Then this process may be continued to arbitrary
levels of meta-meta-ZF sets and so forth.
Thus the ideal solution to Russell’s paradox is probably to define a universe set as “the meta-set of all sets”.
The modern approach to set theory tries to break down the psychological barriers between sets and objects.
When one first learns the set concept, one is told that a set is a collection of objects. Then one learns that
one can define a set of sets, and so forth. In ZF set theory, all sets are either empty or else a set of sets. The
lack of distinction between the different categories of sets, meta-sets, meta-meta-sets and so forth is helpful
in reducing the complexity of language required to describe them. But when people use the same name for
two different things, they begin to think that those things have more in common than if they had different
names.
It is not necessary to put each set in an explicit layer number as proposed by Russell/Whitehead [168]. But it
is necessary to keep set membership relations cycle-free. It is not necessary to define a separate object/class
model for each level of set membership as outlined in Remark 5.7.25. But it is necessary to place limits on
the co-existence of sets in a single logical system so that membership relation cycles cannot occur.
Most importantly, when someone says “all sets”, one must ask: “All sets in which universe?” The word
“all” only has meaning if it refers to a specified universe. One cannot define a new set as U = “the set of
all sets”. This only defines U to be equal to the pre-defined concrete set domain V of a particular logical
system. This universe, being part of the framework (or meta-system) in which the logical system is defined,
cannot be inside the universe which it defines. In conclusion, the “set of all sets” is very well defined, but it
is defined in a meta-system or supporting framework.

It may seem that the almost endless discussion of Russell’s paradox is tedious and worthless. But it has a
very positive value in helping to clarify the entire nature of all of logic and set theory. If Russell’s paradox
can be made to “lie flat”, that is a sign that the whole edifice of mathematical logic has been constructed
as a stable, robust structure. If the edifice is vulnerable to paradoxes, that is a sign that there is something
fundamentally weak in the design.
[ Find out if anyone has already developed the meta-sets alternative to NBG set theory which is suggested in
Remark 5.7.23. ]
5.7.24 Remark: Perhaps yet another way to look upon “the set of all sets” is the dynamic perspective.
That is, all sets are constructed by some unspecified mental process which actively collects a “collection of
objects”. In this dynamic perspective, one should be very surprised indeed if one formed a collection of
objects and found that inside that collection was the object which has just been constructed. This would
be, at the very least, a violation of causality.
One should not expect to be able to make a collection of objects which do not yet already exist. That would
be as nonsensical as the phrase: “the largest number which can be expressed in ten words”. As soon as
causality is introduced into set construction, the self-referential paradoxes disappear.
In this dynamic perspective, “the set of all sets” can only mean “the set of all previously existing sets”. It is
not an unrelated observation that the disturbing aspects of the concept of infinity are very much ameliorated
by taking a dynamic perspective in which unbounded sets of objects are generated dynamically “on demand”.
(This “on-demand” perspective of mathematical modelling is also alluded to in Remark 5.7.25.)
Another way of looking at this is that the “axiom of naive comprehension” is not so much naive as careless
or irresponsible. Perhaps it should be renamed the “axiom of reckless comprehension”. Then probably it
would receive less respect than it currently does.
5.7.25 Remark: Attractive though the container metaphor in Remark 5.7.17 may seem, it is too simple to
explain modern set theory and the way modern mathematicians think. Although sets are generally taught
at an elementary level in terms of objects and containers (or boundary curves as in Venn diagrams), modern
set theory permits arbitrary nesting of sets within sets, which blurs the distinction between objects and their
containers. (In fact, in pure set theory, all objects are classes, and all classes are objects. This is like saying
that all objects are buckets.)
To give a credible ontology for modern set theory, it is helpful to make use of the modelling machines which
are discussed in Section 3.3. Models or representations of the world are found in very simple animals. So
world-modelling is apparently a more basic and ubiquitous activity amongst animals than arithmetic or logic.
If a machine represents objects in a model, it seems safe to suppose that one may specify and name collections
of objects based on attributes or enumeration. Thus if U is the total collection of objects in the model, one
may specify collections S_F = {x ∈ U ; F (x)} for any criterion F . Alternatively, one may explicitly list the
elements of a collection S of objects. It is entirely reasonable for a name S to be given to a collection if the
machine can effectively determine for each object x ∈ U whether x ∈ S or x ∉ S. Classification of objects
into categories S is important for basic survival. So it is not surprising that animals have this capability.
The object/class concept becomes problematic when one attempts to include the collections of modelled
objects within the model itself. Generally a model represents some system (either external or internal to
the organism), and that system would not generally contain abstract collections as objects. For example,
we may see birds in a tree, but we never see the class of all birds as a single object sitting on a branch of
a tree. The classes within a model refer to the model’s own object representations. Classes are part of the
apparatus of the model, not part of the system which is being modelled.
If one tries to insert a model’s classes directly into the model’s own object space, the object space is thereby
modified, which consequently modifies the model’s classes, which in turn modifies the object space. This insertion
process is not at all guaranteed to reach a stable state, as discussed in Remarks 5.7.14, 5.7.15 and 5.7.19.
Sets can be included as objects within the world-model ontology (alluded to in Section 3.6) by making a
model of a model, where each model has distinct categories of objects and sets. Thus machine M1 may
have distinct objects and classes of objects. Then machine M2 may model both the objects and classes in
machine M1 . (See Figure 5.7.6.)

Figure 5.7.6 Classes and objects, and meta-classes and meta-objects (machine M0: modelled system; machine M1: model; machine M2: meta-model)
The model in machine M2 may give names to arbitrary sets of sets of objects which are in the model of
machine M1 . In the bird example, if person 2 models the state of mind of person 1, who is modelling the
birds in the trees, then the class of all birds may indeed be a valid “object” inside the mind of person 1
which can be modelled by person 2.
This process can be extended any finite number of times. In each case, machine Mn+1 creates a model of
both the objects and classes in machine Mn and then constructs arbitrary subsets of the class of objects in
this meta-model. (See Figure 5.7.7.)
Figure 5.7.7 Classes and objects in models, recursive (machines M0, M1, M2, ..., Mn, Mn+1: modelled system, model, meta-model, ..., meta^(n−1)-model, meta^n-model)
Since humans are able to introspect, they can model their own state of mind. Therefore the modelled system
of a person’s modelling activity may be the same person’s past, present, future or potential state of mind.
This is quite possibly how nested classes originated in human thinking during the last few thousand years.
To represent an unbounded depth of set nesting in a single model, one may introduce the concept of an
induction-capable machine ω which can represent any number of the machines n in its world model. It is not
necessary to represent all of the finite-depth machines in the model of machine ω. This induction-capable
machine only needs to be able to represent any finite-depth machine on demand. Thus machine ω does not
require infinite modelling capability, only unbounded modelling capability. (See Figure 5.7.8.)
Figure 5.7.8 Classes and objects in models, infinite (machine Mω models the inductive system of machines M0, M1, M2, ..., Mn, Mn+1; its own model is the meta^ω-model)

Transfinite induction-capable machines may implement models which represent the aggregate of a well-
ordered set of individual machines, ordered by the relation that one machine is modelled within another.
Thus one can arrive at a machine which, in principle, may represent any ordinal number set. (See Section 7.2
for ordinal numbers.) However, the ordinal numbers are not closed under this process of building new
machines.
[ Check Remark 5.7.26 to see if it can be strengthened a little. ]
5.7.26 Remark: Yet another way to look at the “set of all sets” problem is to observe that the inclusion
in a universe set U of the set U itself and all of its subsets is formally the requirement ℙ(U ) ⊆ U , where
ℙ(U ) denotes the set of all subsets of U . (See Definition 5.8.18.) The specification axiom implies that all
(or almost all) of the subsets of any set must also be sets. (The subsets must have the special property that
they are expressible in terms of a set-theoretic formula. This actually places a severe restriction on which
subsets can even be written down.) By basic cardinality theory, assuming that U does have a cardinality, if
ℙ(U ) ⊆ U then #(ℙ(U )) ≤ #(U ). But #(ℙ(A)) > #(A) for any set A. (See for example Halmos [160],
pages 100-101.) This is a contradiction.
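The cardinality inequality can be illustrated on a small finite set. The sketch below is not a proof for general sets. It assumes a three-element set as a toy example, and the names powerset, diagonal and the particular map f are illustrative choices.

```python
from itertools import chain, combinations

# Sketch only: a finite toy set, not a universe set.

def powerset(s):
    """Return the list of all subsets of the finite set s."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def diagonal(s, f):
    """Return {x ∈ s; x ∉ f(x)} for a map f from s to subsets of s."""
    return frozenset(x for x in s if x not in f[x])

A = {0, 1, 2}
subsets = powerset(A)

# An arbitrary attempt to assign a subset of A to each element of A.
f = {0: frozenset({0, 1}), 1: frozenset(), 2: frozenset({0, 2})}
d = diagonal(A, f)
```

The diagonal set d differs from every assigned subset f(x), which is the idea behind the inequality #(ℙ(A)) > #(A). Note that the diagonal set has the same form as the set Q in Russell’s paradox.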
This form of argument is not completely secure, but it does suggest that there is a problem with trying to
include a set U and its specifiable subsets inside U .
5.7.27 Remark: In conclusion, one may say that the ZF regularity axiom (Definition 5.1.26 (7)) is not an
ad-hoc kludge to fix up Russell’s paradox. The regularity axiom merely prevents self-referential monsters
from inflicting painful positive feedback noise upon set theory. Mathematicians have enough work to do
already without having to fight such hideous monsters.
[ Also describe Curry’s paradox briefly. Then reject it for the same reason that Russell’s paradox is rejected,
namely because the axiom of naive comprehension is too naive. Just turning any arbitrary adjective into a
noun is too permissive. Curry’s paradox shows that A = {x; x ∈ x ⇒ P } for any proposition P leads to a
proof of P . So all propositions are true! However, this assumes that A can be a member of A. So this is a
self-referential set. ]
5.8. ZF set theory definitions and notations
5.8.1 Remark: Theorem 5.8.2 states that there is only one set which satisfies the empty set axiom (Def-
inition 5.1.26 (2)). In other words, there is one and only one empty set. Therefore this set can be given its
own notation and can be called the empty set.
5.8.2 Theorem: Let A1 , A2 be sets which satisfy ∀x, x ∉ A1 and ∀x, x ∉ A2 . Then A1 = A2 .
Proof: Let A1 , A2 be sets which satisfy ∀x, x ∉ A1 and ∀x, x ∉ A2 . To prove that A1 = A2 , it suffices to
show that ∀x, (x ∈ A1 ⇔ x ∈ A2 ) (by the extension axiom, Definition 5.1.26 (1)). But
    ∀x, x ∉ A1 ⇒ ∀x, (x ∉ A1 ∨ x ∈ A2 )    (5.8.1)
               ⇔ ∀x, (x ∈ A1 ⇒ x ∈ A2 ).    (5.8.2)
So ∀x, (x ∈ A1 ⇒ x ∈ A2 ). Similarly ∀x, (x ∈ A2 ⇒ x ∈ A1 ). Therefore ∀x, (x ∈ A2 ⇔ x ∈ A1 ). Line (5.8.1)
follows from Theorem 4.11.7 (vi). Line (5.8.2) follows from Definition 4.11.2.
5.8.3 Definition: The empty set is the set A which satisfies ∀x, x ∉ A.
5.8.4 Notation: ∅ denotes the empty set.

5.8.5 Remark: From Theorem 5.8.2, it follows that ∀A, ((∀x, x ∉ A) ⇒ (A = ∅)).
Therefore for any concrete proposition domain (or concrete logic domain) which is modelled by ZF set theory,
there will be one and only one individual object E such that the variable name map µV : NV → V maps ∅
to µV (∅) = E. (See Remark 3.1.2 for terminology and notations.)

5.8.6 Remark: The empty set in ZF set theory is the unique individual concrete set which has no members.
The uniqueness follows from the ZF extension axiom as in the proof of Theorem 5.8.2. The existence follows
from the ZF empty set axiom.
When a concrete object exists and is unique, it may be given a constant abstract name. Names are generally
given to objects in an informal way. However, it is possible to define constant names in a systematic way.
If a set S(z1 , z2 , . . . zn ) satisfying x ∈ S(z1 , z2 , . . . zn ) ⇔ P (x, z1 , z2 , . . . zn ) exists and is unique for all n-
tuples (z1 , z2 , . . . zn ), then S(z1 , z2 , . . . zn ) is a well-defined n-parameter name family. This may be regarded
as a named abstract n-parameter logical function. This definition may be introduced as follows.
∀z1 , z2 , . . . zn , S(z1 , z2 , . . . zn ) = {x; P (x, z1 , z2 , . . . zn )}.
Using this convention, one may formally define the 0-parameter function ∅ as ∅ = {x; x ≠ x}. This is,
however, an artificial way of defining the empty set because the predicate P (x) = “x ≠ x” is merely designed
to be false for all x. This predicate has nothing at all to do with the equality relation. It would be
more natural to define P (x) = “⊥(x)”, where ⊥ is the always-false predicate. (See Remark 4.12.9 and
Notation 4.12.10.) Thus ∅ = {x; ⊥(x)}. But this is also an artificial way to define the empty set.
An axiom (or theorem) which guarantees the existence of an n-parameter family of sets S(z1 , z2 , . . . zn ) often
has the form:
∀z1 , z2 , . . . zn , ∃X, ∀y, (y ∈ X ⇔ P (y, z1 , z2 , . . . zn )).
In this case, it is possible to notate the set X as S(z1 , z2 , . . . zn ) = {x; P (x, z1 , z2 , . . . zn )}. But sometimes
the existence axiom (or theorem) may have the following form.
∀z1 , z2 , . . . zn , ∃X, P (X, z1 , z2 , . . . zn ).
In this case, it would be desirable to introduce the notation S(z1 , z2 , . . . zn ) for X as follows.
∀z1 , z2 , . . . zn , S(z1 , z2 , . . . zn ) = X; P (X, z1 , z2 , . . . zn ).
In other words, S(z1 , z2 , . . . zn ) is the unique set X for which P (X, z1 , z2 , . . . zn ) is true. That is, instead
of notating the set of objects which satisfy the predicate P , one notates the unique single object which
satisfies P .
In the case of the empty set, we could then define the notation ∅ as:
∅ = X; ∀y, y ∉ X,
where the predicate P is defined by P (X) = “∀y, y ∉ X”. This form of notation introduction is easy to
convert into plain language. It means that ∅ denotes the unique set X which satisfies ∀y, y ∉ X. In other
words, it denotes the unique set which has no members.
Definition 5.8.9 and Notation 5.8.11 introduce singleton sets {a} for any a. These may be formalized as
follows.
∀a, {a} = {x; P (x, a)}
= {x; x = a}, (5.8.3)
where P is defined by P (x, z) = “x = z”. Alternatively, one may define:
∀a, {a} = X; ∀x, (x ∈ X ⇔ P (x, a))
= X; ∀x, (x ∈ X ⇔ x = a). (5.8.4)
In this case, the “bracket-less” form of definition (5.8.4) seems less natural than the standard bracketed
form (5.8.3).
It is important to remember that the equality symbols “=” in the above definitions do not represent an
equality relation. (This notational issue for definitions is discussed in Remark 1.6.5.)

5.8.7 Remark: The empty set may be thought of as a “universal subset” because ∅ ⊆ A for all sets A.
By symmetry, one might expect also the unique existence of a “universal superset”. It turns out that such
a concept is as painful as the empty set is painless. There is no universal set in ZF set theory, in order to
avoid Russell’s paradox, but Bernays-Gödel set theory does have a “universal class”. (See Section 5.12 for
BG set theory.)
5.8.8 Theorem: Let A be a set. Then ∅ ⊆ A.
Proof: Let A be a set. It is to be shown that ∀x, (x ∈ ∅ ⇒ x ∈ A). By Definition 5.8.3, ∀x, x ∉ ∅.
Therefore by Theorem 4.11.7 (vi), ∀x, (x ∉ ∅ ∨ x ∈ A). By Definition 4.11.2, this is equivalent to the
proposition ∀x, (x ∈ ∅ ⇒ x ∈ A).
5.8.9 Definition: A singleton is a set A which satisfies ∃x, (x ∈ A ∧ (∀y ∈ A, y = x)).
5.8.10 Remark: For any set x, there is a unique set A which satisfies x ∈ A and ∀y ∈ A, y = x. Existence
follows from the ZF unordered pair axiom, Definition 5.1.26 (3).
An equivalent condition for a set A to be a singleton is ∃′ x, x ∈ A. The unique singleton A which contains
a set x is characterized by the proposition y ∈ A ⇔ y = x.
5.8.11 Notation: {x} denotes the singleton set A which satisfies x ∈ A ∧ ∀y ∈ A, y = x.
5.8.12 Notation: {x; P (x)}, for any set-theoretic formula P , means the set S which satisfies x ∈ S ⇔
P (x) if it can be proven that such a set exists.
5.8.13 Remark: If it has not been proven in ZF set theory that x ∈ S ⇔ P (x) for some set S, then
Notation 5.8.12 denotes merely a conjectural set. The introduction of such conjectural sets is error-prone
because sometimes such notations are mistaken for well-defined sets which can be used as such. Therefore all
use of this form of notation should be accompanied by a proof that the set exists, or else should be indicated
very clearly as conjectural. By contrast, Notation 5.8.15 is safe because the set is known to be well defined
by the specification (or replacement) axiom.
5.8.14 Remark: There are a few popular variants of Notation 5.8.12. For example, Shoenfield [169],
page 242, uses [x | P (x)] to mean {x; P (x)}. (See also Remark 5.8.26 for discussion of alternative notations.)
5.8.15 Notation: {x ∈ A; P (x)} means {x; (x ∈ A) ∧ P (x)} for any set A and set-theoretic formula P .
5.8.16 Remark: A difficulty with Notation 5.8.15 is the fact that “x ∈ A” is actually a proposition with
two variables x and A. There is no absolute reason why one should assume that x is the bound variable or
“dummy variable” and that “x ∈ A” is a condition which is to be combined with P (x) to define the set.
One way to avoid such ambiguity would be a notation such as E_x [(x ∈ A) ∧ P (x)] (where E is mnemonic
for “ensemble”), which makes it clear that the symbol x is a bound variable and A is a free variable. (See
Remark 5.1.23 for free and bound variables.)
One possible generalization of Notation 5.8.15 would be to say that {x R y; P (x)} means {x; (x R y) ∧ P (x)}
for any relation R. An ambiguity in this sort of generalized notation is exemplified by an expression such as
“{A ⊆ X; P (A)}”. One may guess that A is the dummy variable because it is the first variable before the
semicolon, and it appears in P (A). But if this set is written as “{X ⊇ A; P (A)}”, the meaning is ambiguous
even though the proposition “X ⊇ A” is equivalent to the proposition “A ⊆ X”. Notation 5.8.23 attempts
to clarify this situation.
5.8.17 Theorem:
(i) A = {x; x ∈ A} = {x ∈ A; x = x} for any set A.
(ii) {x ∈ A; P (x)} ⊆ A, for any set A and any set-theoretic formula P .
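For finite sets, Notation 5.8.15 and Theorem 5.8.17 can be mimicked by a Python set comprehension. The sketch below is only an analogy under the assumption that A is a finite set of integers; the function name specify is a hypothetical helper, not part of any library.

```python
def specify(a, p):
    """Finite analogue of {x in A; P(x)}: the members of a which satisfy p."""
    return frozenset(x for x in a if p(x))


A = frozenset(range(10))

evens = specify(A, lambda x: x % 2 == 0)     # {x in A; x is even}
everything = specify(A, lambda x: x == x)    # {x in A; x = x}, as in Theorem 5.8.17 (i)
nothing = specify(A, lambda x: x != x)       # always-false predicate: the empty set
```

Because the dummy variable ranges only over members of A, the result cannot contain anything outside A, which is the content of Theorem 5.8.17 (ii).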
5.8.18 Definition: The power set of a set X is the set {A; A ⊆ X} of all subsets of X.
5.8.19 Notation: IP(X) for a set X denotes the power set of X.

5.8.20 Remark: A popular alternative notation for the power set IP(X) is 2^X, but this is inconvenient
for typing and is not very accurate. The notation 2^X is best reserved for the set of functions from X
to 2 = {0, 1}. (See also Notation 6.5.13 and Remark 7.9.6.)
5.8.21 Remark: The power set in Definition 5.8.18 is well defined for any set X by ZF axiom (5) in
Definition 5.1.26.
5.8.22 Remark: The power set IP(X) is defined by the proposition ∀A, (A ∈ IP(X) ⇔ A ⊆ X).
Therefore the proposition A ⊆ X can always be replaced by the equivalent proposition A ∈ IP(X) and vice
versa. This equivalence is frequently useful in applications.
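The interchangeability of A ⊆ X and A ∈ IP(X) can be checked mechanically for a small finite set. The following Python sketch is an illustration only; power_set is a hypothetical helper written for this example, and the set X and the candidate sets are arbitrary choices.

```python
from itertools import combinations


def power_set(x):
    """Return the set of all subsets of the finite set x, as frozensets."""
    elems = list(x)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}


X = frozenset({"a", "b", "c"})
PX = power_set(X)

# A is a subset of X exactly when A is a member of the power set of X.
candidates = [frozenset(), frozenset({"a"}), frozenset({"a", "d"}), X]
agree = all((A <= X) == (A in PX) for A in candidates)

# A set of subsets selected by a predicate, here "A has exactly two members".
two_element_subsets = {A for A in PX if len(A) == 2}
```

The last line selects subsets of X by filtering members of the power set, which is the pattern {A ∈ IP(X); P(A)}.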
5.8.23 Notation: {A ⊆ X; P (A)} for any set X and set-theoretic formula P means {A ∈ IP(X); P (A)}.
5.8.24 Remark: Life gets more confusing with expressions such as {(x, y, z) ∈ IR^3; y = z} or {f : X →
Y ; f is one-to-one}. Such expressions can be reduced to Notation 5.8.15, for example {p ∈ IR^3; p = (x, y, z) ∧
y = z} and {f ∈ Y^X; f is one-to-one}. (See Notation 6.5.15 for the latter style of notation for sets of
functions.) Usually the context should make clear which variables are the dummy variables.
5.8.25 Remark: Another kind of confusion can arise with expressions such as {f (x); x ∈ A} for a set-
theoretic expression f which is a function on A (i.e. f (x) is uniquely defined for all x ∈ A). It follows by ZF
Axiom (6) (Definition 5.1.26) that {f (x); x ∈ A} is a set if A is a set and f is a set-theoretic function on A.
Notation 5.8.27 formalizes this.
5.8.26 Remark: Some people use the notation {x ∈ A : P (x)} for the set of x ∈ A such that P (x) is true.
The colon can be confusing in set constructions such as {f : A → B : P (f )}.
Some people use the notation {x ∈ A | P (x)}, but this can be confusing in probability contexts. In this book,
the notation {x ∈ A; P (x)} is used throughout. The semicolon means “such that”. The vertical stroke
notation “ | ” for “such that” may be confused with the following notations.
(i) The probability notation “Prob(E | F )” for events E and F , which means the probability of the event
E conditioned by the event F .
(ii) The modulus |z| of a complex number z, the norm |v| of a vector v, or the determinant |A| of a matrix A.
(iii) The restriction f|_X of a function f to a set X.
(iv) The divisibility operator. For example, m | n means that m divides n.
5.8.27 Notation: {f (x); x ∈ A} for any set A and set-theoretic function f means {y; ∃x ∈ A, f (x) = y}.
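For a finite set A, the two sides of Notation 5.8.27 can be compared directly. In the Python sketch below, the function name image, the squaring function and the bounding set B are all assumptions made for the sake of the example.

```python
def image(f, a):
    """Finite analogue of {f(x); x in A}."""
    return frozenset(f(x) for x in a)


A = frozenset({-2, -1, 0, 1, 2})
squares = image(lambda x: x * x, A)

# The same set written in the form {y in B; exists x in A, f(x) = y},
# where B is a set which is known in advance to contain all of the values.
B = frozenset(range(10))
squares_again = frozenset(y for y in B if any(x * x == y for x in A))
```

Both constructions give the same three-element set, although A has five members, because the function is not one-to-one.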
5.8.28 Remark: It is a good habit to avoid “naked dummy variables” when specifying sets. An expression
such as {x; P (x)} which does not specify a set which is to be restricted by the proposition P is dangerous
because the expression might not specify a valid set. This is no problem in NBG set theory (which is not
adopted for this book), but in ZF set theory it is very important to ensure that all set specifications are valid
sets so as to avoid Russell’s paradox. The simplest way to achieve this is to use Notation 5.8.15 to specify
sets. If A is previously verified to be a set, and P is a valid set-theoretic formula, then {x ∈ A; P (x)} is a
set by ZF Axiom (6) in Definition 5.1.26.
5.9. Axiom of choice

[ Like most of the book, this section is in the “ideas capture phase”. Therefore there is currently a lot of
repetition and informal discussion. ]
5.9.1 Remark: The author has decided to present ZF as the standard, safe set theory for the differential
geometer to work in. Theorems which are valid in ZF without AC (axiom of choice) could be referred to
as “ZF-clean” theorems. All “AC-tainted” theorems will be tagged with a warning indication like “Theo-
rem [zf+ac]”. (See for example Theorem 20.1.4.) Mathematicians who are comfortable with AC might call
such theorems “AC-enhanced”. The reader can thus be made aware that such theorems are at the border
of meaningfulness, perhaps even a little outside the border. Theorems which can be proven with the weaker
axiom of countable choice will be tagged as “Theorem [zf+cc]” and will be called “CC-tainted”. (Exam-
ples: Theorem 7.2.26 and 7.2.36.) It is very easy to accidentally use AC in a proof, especially in topology.
Theorems claimed as ZF-clean in this book may be tagged as AC-tainted or CC-tainted in a later revision
if a dependency is discovered. Caveat lector!
5.9.2 Remark: The ZF existence axioms 2, 3, 4, 5, 6 and 8 in Definition 5.1.26 are essentially of the
explicit set-membership form ∃X, ∀z, (z ∈ X ⇔ F (z)) for some predicate F . The axiom of choice is not
of this form. The explicit set-membership specification permits the immediate definition of a unique set X
whose members satisfy F . (See Remark 5.1.18.) The axiom of choice does not enable any such definition,
unique or not. This is why the axiom of choice is useless. AC simply does not tell you what the members of
a set X are. One must remember that a set is defined by its members. The membership of a set is its only
property. If the membership is not known, the set is not known.
5.9.3 Remark: The axiom of choice may be described as a “pixies at the bottom of the garden” axiom –
you know they’re there, but you never see them. Given a set U , a subset X of U which is guaranteed to exist
only by the axiom of choice cannot be determined explicitly. That is, there are elements x ∈ U for which it is
not possible to determine whether x ∈ X. The set “exists”, but you can’t see it. (A good example of this is
a set which is not Lebesgue measurable.) Some mathematicians feel quite comfortable with this while others
find the discomfort intolerable. The axiom of choice is rejected in this book. (If the reader sees any explicit
or implicit use of the axiom of choice in the book, please notify the author immediately.) Mendelson [165],
page 197, says: “The Axiom of Choice is one of the most celebrated and contested statements in the theory
of sets.” Struik [194], page 201, says the following about Zermelo’s proof of the well-ordering of real numbers
in 1902:
Since Zermelo based his proof on the Auswahlaxiom, which states that from each subset of a given
set one of its elements can be singled out, mathematicians differed on the acceptability of a proof
where no constructive procedure could be given for the finding of such an element. Hilbert and
Hadamard were willing to accept it; Poincaré and Borel were not.
This kind of disagreement led to a split between “formalism” and “intuitionism”. The battle is still worth
fighting. Most scientists would reject a theory that explains events on Earth in terms of an arbitrary theory
about goblins in another galaxy. Even though formally the theory might be valid, an untestable theory
is useless. The axiom of choice is untestable because it gives you sets which you can never see. Just as
Ockham’s razor is invoked in science to eliminate arbitrary and untestable theorizing, so also should the
axiom of choice be rejected in mathematics. Real mathematicians solve problems by constructing solutions,
not by pulling them out of a hat.
5.9.4 Remark: In the author’s opinion, there is no need for the axiom of choice in serious mathematics. It
is an interesting axiom for recreational mathematics. For example, the existence of Lebesgue non-measurable
subsets of the real numbers can be proved with the axiom of choice, but you can never know what is in such
a set. If X is a non-measurable subset of IR, it is not possible to determine for all x ∈ IR whether x ∈ X
or x ∉ X. In other words, such sets may exist in some abstract sense, but it is not possible to determine the
set’s contents. The axiom of choice is the “lucky dip axiom”. With all of the other existential axioms, you
choose what is in the set, and the axiom guarantees that your set exists if you obey the rules. The set which
the Axiom of Choice delivers to you is neither of your choosing nor under your control. The choice set or choice
function is an unknowable, random selection of elements from a set of sets. Not only is the set or function
determined by roulette wheels and dice, but you don’t even get told what the numbers are after the wheel
has been spun and the dice have been rolled. The Axiom of Choice is the roulette wheel of set theory!
If a set’s contents can be determined, there must be some sort of rule or procedure which determines which
elements are in the set. In other words, the set is constructible according to some procedure or formula. Even
though the axiom of choice is intuitively appealing because one feels that choices from non-empty sets must
be possible, it causes sets to come into existence which have an indeterminate membership. It is very difficult
to say that a set really exists if it has a membership which cannot be determined. Since the whole raison
d’être of sets is to specially distinguish members from non-members, a set with undeterminable membership
is incapable of fulfilling its only purpose. One can write symbols for such objects and manipulate them with
self-consistent logic, but they are of no practical use. Therefore it will be assumed in this book that the
axiom of choice is not one of the axioms of mathematics. (In fact, the axiom of choice will be derided and
vilified at every opportunity. If a theorem requires AC, tough!)

The author rejects the Axiom of Choice not because it is possibly wrong, but because it is certainly useless.
5.9.5 Remark: Definition 5.9.6 adds the axiom of choice to the ZF set theory in Definition 5.1.26. This
can be useful when courage and moral fortitude fail. The set of axioms ZF plus AC is sometimes denoted
as ZFC.
Although it is now well known that AC is independent of the ZF axioms, there are limited forms of the
axiom of choice which do follow from ZF. Some AC-similar ZF theorems are discussed in Section 7.8.
5.9.6 Definition: A Zermelo-Fraenkel set theory with axiom of choice is a set theory which satisfies all of
the conditions of Definition 5.1.26 together with Axiom (9).
(9) The choice axiom: For any set X,
((∀A, A ∈ X ⇒ ∃x, (x ∈ A)) ∧ (∀A, ∀B, (A ∈ X ∧ B ∈ X ∧ A ≠ B) ⇒ ¬∃x, (x ∈ A ∧ x ∈ B)))
⇒ ∃y, (y ⊆ ⋃X ∧ ∀A, (A ∈ X ⇒ ∃z, (z ∈ A ∧ z ∈ y ∧ ∀w, (w ∈ A ∧ w ∈ y ⇒ w = z)))).
[ Comment on the individual sub-expressions of Definition 5.9.6, Axiom (9), and axioms (5.9.1) and (5.9.2).
Comment on how they are related. ]
5.9.7 Remark: It is difficult to have faith in an axiom which doesn’t fit on a single line, even using the
abbreviation ⋃X for the union of X. Using even more abbreviations, Definition 5.9.6, Axiom (9), may be
expressed as follows.
((∀A ∈ X, A ≠ ∅) ∧ (∀A, B ∈ X, (A ≠ B ⇒ A ∩ B = ∅))) ⇒ ∃y ⊆ ⋃X, ∀A ∈ X, ∃z, A ∩ y = {z}. (5.9.1)
The set y contains a single choice of an element in each set A in the collection X.
As mentioned in Remark 5.1.18, the existence assertion for the set y in Definition 5.9.6, Axiom (9), is not
of the form ∃y, ∀x, (x ∈ y ⇔ F (x)) for some predicate F . The axiom of choice does not specify a unique set.
The axiom only claims that the set ⋃X is non-empty. There is no indication of how one might determine
the membership of the set y. Since the membership of a set is its only property, and this essential property
cannot be determined, the axiom of choice is useless for constructing a set.
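By way of contrast, when an explicit selection rule is available, a set y satisfying the conclusion of (5.9.1) can be written down without any appeal to the axiom of choice. The Python sketch below assumes a finite, pairwise disjoint collection of non-empty finite sets of integers and uses the rule "take the least element". The names choice_set and satisfies_5_9_1 are hypothetical helpers for this illustration.

```python
def choice_set(collection):
    """Select the least element of each set in the collection (an explicit rule)."""
    return frozenset(min(a) for a in collection)


def satisfies_5_9_1(collection, y):
    """Check that y is a subset of the union of the collection and that
    the intersection of y with each member A is a singleton {z}."""
    union = frozenset().union(*collection)
    return y <= union and all(len(a & y) == 1 for a in collection)


# A pairwise disjoint collection of non-empty sets.
X = {frozenset({3, 1}), frozenset({5, 4}), frozenset({9})}
y = choice_set(X)
ok = satisfies_5_9_1(X, y)

# Without disjointness the same rule can fail: {1, 2} meets the selected set twice.
overlapping = {frozenset({1, 2}), frozenset({2, 3})}
fails = not satisfies_5_9_1(overlapping, choice_set(overlapping))
```

Here the membership of y is completely determined by the rule, which is exactly what the axiom of choice does not provide in the general case.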
[ Show the equivalence of the choice axiom (9) and axiom (5.9.2). ]
5.9.8 Remark: EDM2 [35], section 33.B expresses the axiom of choice similarly to the following:
(∀x ∈ X, ∃y, P (x, y)) ⇒ ∃f, (∀x ∈ X, P (x, f (x))), (5.9.2)
for any set X and set-theoretic formula P . (See Definition 5.1.22 for an explanation of set-theoretic formulas.)
Here f is a set which happens to be a function with domain including X, not a set-theoretic formula. (See
Figure 5.9.1.) This formulation of the axiom is incomplete because it does not include the requirement that
f be a function. However, it has the advantage of intuitive clarity.
Figure 5.9.1 Choice function f for a relation P

5.9.9 Remark: A simple summary of the axiom of choice is (S ≠ ∅ ∧ (∀i ∈ S, X_i ≠ ∅)) ⇒ ×_{i∈S} X_i ≠ ∅.
But this requires the formal definition of arbitrary cross products in Section 7.7.
Unlike the ZF existence axioms, the axiom of choice does not specify the elements of a particular set. AC tells
you that a particular set of sets is not empty. This is a weak assertion. No new set is brought into existence.
One is told by AC only that any direct product of a non-empty collection of non-empty sets contains at least
one element. When an ex machina “roulette wheel” has chosen the elements for all sets in a collection, one
may make one’s own choices for a finite sub-collection, and let the roulette wheel choose the rest. In this
way, many more elements of the direct product can be found.
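For a finite index set, the non-emptiness of the cross product needs no axiom of choice, since all of its elements can be listed. The following Python sketch illustrates this under the assumption of a small finite family; the function name cross_product and the example family are hypothetical.

```python
from itertools import product


def cross_product(family):
    """List the elements of the cross product of a finite indexed family
    {i: X_i}. Each element is a dict which maps every index i to a member of X_i."""
    keys = list(family)
    return [dict(zip(keys, combo))
            for combo in product(*(family[k] for k in keys))]


family = {"i": {1, 2}, "j": {"a"}, "k": {0, 1, 2}}
choices = cross_product(family)    # 2 * 1 * 3 elements

# A single empty member makes the whole product empty.
degenerate = cross_product({"i": {1, 2}, "j": set()})
```

Each dict in the list is a choice function for the family. The axiom of choice asserts only that such a list is non-empty when the index set is arbitrary, without exhibiting any of its entries.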
5.9.10 Remark: It is important to be aware of what one loses by not accepting the axiom of choice. The
following assertions require the general axiom of choice.
(i) Every set is equinumerous to an ordinal number. Therefore the cardinality of all sets is well defined.
(See Mendelson [165], page 200.)
(ii) Any non-empty direct product of compact topological spaces is a compact topological space. (Tikhonov’s
theorem. See Remark 15.7.10.) For a topological space to be compact requires only that a particular kind
of subcollection of given collections of sets should exist. Compactness does not require the specification
of any method of determining the contents of these subcollections. This makes the axiom of choice very
tempting because it asserts the existence of sets, but no method of determining the contents.
(iii) There exists a subset of IR which is not Lebesgue measurable. (See Theorem 20.1.4.)
(iv) Any linear space X over a field k has a basis. (See Theorem 10.2.25.)
(v) Cardinal numbers are comparable. [ See EDM2 [35], 34.B. This is probably the same as trichotomy? ]
(vi) The Heine-Borel theorem (Theorem 17.3.27)? [ This must be checked. ]
The following assertions require the axiom of countable choice.
(vii) Any Dedekind-finite set (Definition 7.2.24) is a finite set as defined by equinumerosity to finite ordinal
numbers. (See Theorem 7.2.26.)
(viii) Every infinite set has a subset which is equinumerous to the set ω of finite ordinal numbers. That is, if X is an infinite
set, then ∃f, ((∀i ∈ ω, f (i) ∈ X) ∧ (∀i, j ∈ ω, (i = j ⇔ f (i) = f (j)))). (See Theorem 7.2.28.)
(ix) The union of a countable number of countable sets is countable. (See Theorem 7.2.36.)
5.9.11 Remark: Concerning the Lebesgue non-measurable sets mentioned in Remark 5.9.10 (iii), an ex-
ample of a non-measurable set may be constructed by partitioning IR into equivalence classes according to
whether the difference between two numbers is a rational number. Then these (apart from the equivalence
class of zero) are paired into mirror images of each other, and a function f is defined to be 1 on one class
of each pair and 0 on the other class. No doubt, such a function can be defined. But without the axiom of
choice, such a set cannot be reached by set-theoretic formulas inserted into the ZF axiom templates. The
problem seems to be that the set theory language cannot independently specify an uncountable number of
function values. More importantly, if humans could do so, then surely this could be expressed in symbolic
algebra. Therefore the problem seems to be ultimately a limitation of the human mind. It is not possible
to choose a toss of the coin for each member of an uncountable set of equivalence classes which cannot be
enumerated in some systematic way. Countable sets can often be enumerated and manipulated according
to inductive or iterative rules. (This is without an axiom of countable choice. See Remark 7.8.3.) There is
apparently no way to express a rule to deal with the very complex structure of these equivalence class pairs.
Since we cannot enumerate them in terms of other sets, we cannot assign explicit values.
So AC comes to the rescue by tossing a coin for us an infinite number of times. But since we cannot see the
results of the coin tosses, and we would not be able to register the results in our limited minds anyway, we
can do absolutely nothing with the resulting non-measurable set except feel comforted (or discomforted) by
its existence. We will never be able to draw a graph of it as one can in the case of a nowhere-differentiable
continuous function, for example. It is not possible to know what its value f (x) is for x = exp(4.2), except
that f (x) = 0 or 1. In short, such a function is a dead end. All sets and functions ‘constructed’ with AC are
dead ends. It is not even possible to devise algorithms to numerically approximate the values of such sets
and functions.

In the case of countably infinite enumerations, even though we cannot carry out an infinite number of
calculations, we can carry out as many as we wish. Given enough time, we can imagine reaching any point
in the enumeration. (Some mathematicians have rejected this view though.) In the case of uncountable sets,
there is typically no way to convince ourselves that we could get to any point in finite time. In other words,
countable induction fails. Since the human mind cannot go beyond this point, it seems pointless to fill in this
void in human capability with the Axiom of Choice which provides only dead ends to further investigation.
Consider the example of the decimal expansion of π. We could never calculate 10^80 digits of π, but we
generally accept that it can be done “in principle”. So it is impossible, but we accept it anyway. The whole
concept of the real number system relies upon this kind of acceptance that decimal expansions (or some
equivalent) can be continued indefinitely. Even though π is unknowable, we can know as many digits as we
wish, without any hard barrier to stop us. AC only tells us that there is something in the void, but we cannot
know anything about it. One may accept the Axiom of Choice if one wishes to do so. But a mathematics
which has black-box sets which cannot be opened is untidy. This is like receiving gifts of thousands of books in
boxes which can never be opened. One might as well refuse the gifts and keep the house tidy. One may never
finish reading the infinite book of digits of π, but most people would accept that being able to read several
pages is better than nothing. Reality can’t be measured to infinite digits anyway. (And besides, according
to general relativity, the circumference/diameter ratio of a circle depends on local space-time curvature and
the size and orientation of the circle!)
It seems that the human mind can accept that linear thought patterns can be extended indefinitely, which is
what induction is: an indefinite sequence of linear thought. But uncountable sets cannot be reached in such
a linear sequence. It is then highly questionable whether a simple linear mind can ever appreciate a universe
which is not at all linear. Differential geometry, as a tool-chest for physics, is an attempt to reduce massively
uncountable complexity to linear sequences of thought. This is surprisingly feasible, but it must never be
forgotten that the world is understood through the extremely narrow funnel of the single-threaded human
mind. Even basic measure theory has dead ends which cannot be investigated further. So the full complexity
of differential geometry inevitably has a forest of dead ends. The question of the Axiom of Choice gives some
idea of the limitations of mathematical thinking and modelling. Mathematics is only neat and tidy if you
ignore the voids beyond which one cannot think. This is yet another reason to not view mathematics as a
solid bedrock for physics or any other science.
5.9.12 Remark: Mendelson [165], page 201, has the following pro-AC statement:
The status of the Axiom of Choice has become less controversial in recent years. To most math-
ematicians it seems quite plausible and it has so many important applications in practically all
branches of mathematics that not to accept it would seem to be a wilful hobbling of the practicing
mathematician.
So AC makes life easier. This seems a poor excuse for introducing sets with undeterminable membership into
set theory, considering that the membership of a set is the only purpose for its definition. The sets which
require AC can safely be ignored because one will never arrive at them without AC. No contradictions arise
by not admitting the “pixies at the bottom of the garden”. It may feel better to know that all sets have a
cardinal number, but the sets which do not have a cardinal number will never be encountered in practice.
AC does not give any practical benefit.
A good example of the uselessness of the axiom of choice is the existence of a basis for every linear space.
Theorem 10.2.25, with the assistance of the axiom of choice, asserts that every linear space has a basis.
However, it is not possible to do anything in practice with a linear space which has an intangible basis. So
in practice, only those spaces for which a basis can be constructed are useful. Therefore in this book, when a
basis is required for a theorem, the existence of the basis is required as a condition. In this way, the axiom of
choice is avoided, and no useful theorems are lost. An example is Theorem 10.5.3 which asserts the existence
of linear functionals of a particular kind on any linear space which has a basis. Every usable linear space
has a basis. So this theorem covers all usable linear spaces. The same general observation applies to all
applications of the axiom of choice.
[ See Halmos [160], page 60, for a comment on the history of trying to avoid the axiom of choice. ]
5.9.13 Remark: The proof of the Heine-Borel theorem (Theorem 17.3.27) sometimes uses the Axiom of
Choice in the equivalent form of Zorn’s Lemma. (See Simmons [140], p. 110–114.) But it seems that AC is
not required. For example, see Taylor [145], pages 30–31.
[ There’s a problem with using ordered pairs in Remark 5.9.14. They are not introduced until Definition 6.1.3. ]
5.9.14 Remark: In the literature, there are many lists of equivalents for the Axiom of Choice. (See for
example EDM2 [35], section 34, Mendelson [165], page 197, Halmos [160], page 60.) Some well-known
equivalents for AC, under the assumption of the ZF axioms, are as follows.
(1) Some slightly different formulations of the Axiom of Choice in Remark 5.9.8, equation (5.9.2):
(1.1) For any set X,
(X ≠ ∅ ∧ ∀A ∈ X, A ≠ ∅)
⇒ ∃f, ((∀x ∈ X, ∀y, z, (((x, y) ∈ f ∧ (x, z) ∈ f ) ⇒ y = z)) ∧ ∀B ∈ X, f (B) ∈ B).
In other words, for any non-empty set X of non-empty sets, there is a function f with domain X
such that f (B) ∈ B for all B ∈ X. This is perhaps the best-known form of AC.
(1.2) For any set X,
∃f, ∀B, ((B ⊆ X ∧ B ≠ ∅) ⇒ f (B) ∈ B).
The function f is a set which happens to be a function, not a set-theoretic formula. This axiom
means that for any set X, there exists a choice function f such that for any subset B of X, f (B) ∈ B;
that is, f chooses a single element of B. (See Mendelson [165], page 197.) This is illustrated in
Figure 5.9.2.
Figure 5.9.2 Choice of one element for every subset Bi of a given set X
(1.3) The Multiplicative Axiom. (Mendelson [165], page 198.) This states that for any set X,
(∀A, B ∈ X, (A ∩ B = ∅ ∨ A = B)) ⇒ (∃C, ∀A ∈ X, ∃′ x ∈ C, x ∈ A).
In other words, if X is a disjoint collection of sets, then there exists a set C such that for any
member A of X, the set A ∩ C contains exactly one element x. So C is a “choice set” which selects
exactly one element from each element of a disjoint collection. See Figure 5.9.3.
Figure 5.9.3 Choice of one element xi from each set Ai of a disjoint collection X

(2) The Well-Ordering Theorem. This theorem states that any set can be well-ordered. It seems that
Zermelo formulated the Axiom of Choice in 1904 with the express purpose of proving the well-ordering
theorem.
[ Define well-ordering. ]
(3) Zorn’s Lemma. According to EDM2 [35], 34.C, Zorn’s lemma has at least the following 5 forms.
(3.1) For any (partially) ordered set X, if every totally ordered subset of X has an upper bound, then
X has at least one maximal element. In other words, every inductively ordered set has at least one
maximal element.
(3.2) If every well-ordered subset of an ordered set M has an upper bound, then there is at least one
maximal element in M .
(3.3) Every ordered set M has a well-ordered subset W such that every upper bound of W in M belongs to W .
(3.4) For a condition C of finite character for sets, every set X has a maximal (for the relation of inclusion)
subset of X that satisfies C.
(3.5) Let C be a condition of finite character for functions from X to Y . Then, in the set of functions
that satisfy C, there is a function whose domain is maximal (for the relation of inclusion).
(4) Trichotomy. (See Mendelson [165], page 198.) For any sets X and Y , either X is equinumerous to a
subset of Y , or Y is equinumerous to a subset of X. In other words, either there exists a bijection from
X to a subset of Y , or vice versa.
A common feature of AC equivalents is that they talk about “any set”. Theorems containing this phrase
should always be viewed with suspicion and be scrutinized for dependence on the axiom of choice. The
purpose of the above list is to help the reader spot any sneaky use of AC in an equivalent form.
[ There seem to be plenty more equivalents for the Axiom of Choice. See for example Mendelson [165], page 199
(exercises), Taylor [145], pages 19–21, EDM2 [35], 34.A. ]
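For a finite collection of finite non-empty sets, a choice function can be written down by an explicit rule, so no axiom is needed. AC only matters when no such rule is available. The following Python sketch illustrates forms (1.1) and (1.3) in that finite setting. The names choice_function and choice_set, the sample collection X and the rule "take the minimum element" are illustrative assumptions, not part of the text.

```python
def choice_function(X):
    """Return a dict f with f[B] in B for every B in X, as in form (1.1).

    X is a finite collection of non-empty frozensets of integers. The rule
    "take the minimum element" is an explicit definition, so nothing is
    chosen arbitrarily and the axiom of choice is not involved.
    """
    if any(len(B) == 0 for B in X):
        raise ValueError("every member of X must be non-empty")
    return {B: min(B) for B in X}


def choice_set(X):
    """Return a set C meeting each member of a pairwise disjoint X in exactly one point, as in form (1.3)."""
    members = list(X)
    for i, A in enumerate(members):
        for B in members[i + 1:]:
            if A & B:
                raise ValueError("members of X must be pairwise disjoint")
    return set(choice_function(X).values())


# Hypothetical example collection of pairwise disjoint non-empty sets.
X = {frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})}
f = choice_function(X)
C = choice_set(X)
```

The explicit rule works here only because every member of X carries a natural ordering. For an arbitrary collection of arbitrary sets there is no such rule in general, which is exactly the gap that AC fills.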
5.10. Axiom of countable choice

5.10.1 Remark: This little piece of text appears on the internet somewhere.
The Axiom of Countable Choice (CC) is a weak form of the Axiom of Choice. It states that every
countable set of nonempty sets has a choice function.
ZF+CC (that is, the Zermelo-Fraenkel axioms together with the Axiom of Countable Choice)
suffices to prove that the union of countably many countable sets is countable. It also suffices to
prove that every infinite set has a countably infinite subset.
These consequences seem desirable enough. So probably CC should be invoked to acquire these propositions.
It might be very difficult indeed to track down which theorems already use CC without being aware of it.
5.10.2 Remark: The countable version of the axiom of choice is weaker than the standard axiom. The
countable version is perhaps more intuitively “obvious” than the full axiom, but it still brings into existence
sets whose membership is unknowable.
[ Must find a tidy set-theoretic expression which means “X is countable”. Then use this in Definition 5.10.3
instead of plain English. ]
5.10.3 Definition: A Zermelo-Fraenkel set theory with countable axiom of choice is a set theory which
satisfies all of the conditions of Definition 5.1.26 together with Axiom (9′).
(9′) The countable choice axiom: For any countable set X and any set-theoretic formula P ,
(∀x ∈ X, ∃y, P (x, y)) ⇒ ∃f, (∀x ∈ X, P (x, f (x))). (5.10.1)
5.10.4 Remark: CC-tainted theorem examples in this book are Theorems 7.2.26, 7.2.28 and 7.2.36.

5.11. Zermelo set theory

Probably there is no use for Zermelo set theory in this book. The difference between Zermelo and ZF set
theory is that there are some unusual sets which are guaranteed to exist in ZF which are not guaranteed
to exist in Zermelo set theory. Probably these kinds of sets have no practical use in differential geometry.
So the slightly weaker Zermelo set theory could (probably) be used for DG instead of ZF. But this book
assumes ZF anyway because the replacement axiom does not seem particularly implausible.
[ Should quote some theorems about consistency of ZF and Zermelo axioms. ]
5.11.1 Definition: A Zermelo set theory is a set theory which satisfies all of the conditions of
Definition 5.1.26 except that the replacement axiom (6) is replaced with Axiom (6′).
(6′) The specification axiom: For any set x and set-theoretic formula A,
∃y, ∀u, (u ∈ y ⇔ (u ∈ x ∧ A(u)))
5.11.2 Remark: Axiom (6′) is also known as the axiom of separation, the axiom of comprehension or the
axiom of subsets.
5.11.3 Remark: The specification axiom is implied by the replacement axiom. Theorem 5.11.4 follows
from this. The existence of the set {ω, IP(ω), IP(IP(ω)), IP(IP(IP(ω))), . . .}, where ω is the set of non-negative
integers, can be proved in ZF theory but not in Zermelo set theory. (See EDM2 [35], 33.B, page 148, for this
statement. See Halmos [160], Section 19, pages 74–77, for constructions using the axiom of substitution.)
The difference between the two axiom-sets is not very likely to be an issue for differential geometry. The
weaker axiom of specification (6′) is probably sufficient. However, just to be on the safe side, the ZF set
theory Axiom (6) is adopted in this book.
[ Theorem 5.11.4 needs to be proved, or else a reference must be found. ]
5.11.4 Theorem: Any theorem that can be proved in Zermelo set theory can also be proved in Zermelo-
Fraenkel set theory.
5.11.5 Remark: There is no axiom-of-choice style of problem here. Axiom (6) constructs sets with definite
contents, not some random, unknowable sample of elements of sets. If the specification axiom guarantees the
existence of a set X, you can determine whether any given set y is in X or not. This is because the set X is
constructed by your own set-theoretic formula which you used in the invocation of the axiom. With AC, by
contrast, you never know which elements are in the constructed set. It’s a lucky dip. You get something from
the lucky dip, but you don’t know what you get. All of the ZF existential axioms guarantee the existence of
sets whose contents you have specified in your choice of the variables (sets or set-theoretic formulas) which
you insert into the axiom templates. AC gives you a pig in a poke!
5.12. Bernays-Gödel set theory

[ This section will contain a presentation of Bernays-Gödel set theory, not just informal comments. ]
[ According to EDM2 [35], 381.G: “A set in the naive sense is a collection {x; C(x)} of all x which satisfy a
certain condition C(x).” Does this correspond to the definition of a “class” in Bernays-Gödel set theory? ]
5.12.1 Remark: ZF set theory may be extended by introducing two types of class. In Bernays-Gödel set
theory, “classes” are defined instead of sets, and a set is defined as a special type of class which is an element
of some other class. Thus a class X is said to be a set if there is some class Y such that X ∈ Y . Then
“proper classes” are the classes which are not members of any other class. This allows one to define, for
example, the class of all sets. The class is not a member of any other class, but it has all sets as members.
Russell’s paradox does not happen because the class U , say, of all sets is not a member of itself. The class
Q with X ∈ Q ⇔ (X ∈ U ∧ X ∉ X) gives no paradox because Q ∉ Q but Q ∉ U . So Russell’s
paradoxical “set” Q is not a set. This sounds a little like an ad-hoc fix, but since BG set theory is reportedly
logically self-consistent, it’s difficult to argue against it.
BG set theory with its two types of class seems to be a half-way house between ZF set theory which has a
single type of class or set and the Russell-Whitehead set theory which had an infinite number of types. A
clear presentation of BG set theory is given by Mendelson [165], Chapter 4, pages 159–170. (They call this
NBG set theory after John von Neumann who first proposed a similar set of axioms in 1925 and 1928.)
5.12.2 Remark: According to Mendelson [165], page 204, the axiom of choice is consistent and independent
with respect to NBG set theory.
5.12.3 Remark: Since Zermelo-Fraenkel set theory is essentially the same thing as Bernays-Gödel set
theory minus the proper classes, one could regard BG set theory as a kind of conceptual layer underlying ZF
because ZF is a specialization of BG. Most concepts of ZF theory seem to work very similarly in BG set
theory, but classes must always be checked to see if they are sets or proper classes. This increases the amount
of work required. Since ZF is embedded inside BG theory, it seems that there is little value in presenting
the BG set theory axioms for the purposes of differential geometry. It is more convenient to adopt the ZF
axioms, and if one wishes to talk about concepts like the class of all linear spaces or the class of all manifolds,
it is sufficient to simply remember to use the word “class” instead of “set” for such generalities so as to avoid
Russell’s paradox. Most people just want to do correct calculations anyway. So the philosophical alley-ways
into which BG set theory leads are a luxury which one may profitably forego.
5.13. Basic properties of binary set unions and intersections

From this point onwards, the Zermelo-Fraenkel set theory axioms are assumed. Any theorem which requires
additional set theory axioms will be tagged with the required axioms.
5.13.1 Remark: The union of any two sets is well defined by the union Axiom (4) in Definition 5.1.26.
The intersection of any two sets is well defined by the replacement Axiom (6) or specification Axiom (6′).
5.13.2 Definition:
The union of sets A and B is the set {x; x ∈ A ∨ x ∈ B}.
The intersection of two sets A and B is the set {x; x ∈ A ∧ x ∈ B}.
5.13.3 Notation:
A ∪ B for sets A and B denotes the union of A and B
A ∩ B for sets A and B denotes the intersection of A and B
5.13.4 Theorem:
(i) A ∪ ∅ = A for any set A.
(ii) A ∩ ∅ = ∅ for any set A.
(iii) A ⊆ B ⇔ ∀z, (z ∈ A ⇔ (z ∈ A ∧ z ∈ B)) for any sets A and B.
5.13.5 Remark: Theorem 5.13.4 (iii) is equivalent to the tautology (α ⇒ β) ⇔ (α ⇔ (α ∧ β)) with the
meanings α = “z ∈ A” and β = “z ∈ B”.
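The tautology in Remark 5.13.5 can be confirmed by a truth table. The following Python sketch enumerates all four truth assignments. The helper names implies, is_tautology and subset_formula are illustrative assumptions.

```python
from itertools import product


def implies(a, b):
    """Material implication: a => b."""
    return (not a) or b


def is_tautology(formula, arity):
    """Return True if formula is true under every assignment of truth values."""
    return all(formula(*values) for values in product([False, True], repeat=arity))


def subset_formula(alpha, beta):
    """(alpha => beta) <=> (alpha <=> (alpha and beta))."""
    return implies(alpha, beta) == (alpha == (alpha and beta))


holds = is_tautology(subset_formula, 2)
```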
[ Must also do the families version of Theorem 5.13.6 following Definition 6.8.1. ]
5.13.6 Theorem: The following identities hold for all sets A, B and C.
(i) A ∪ B ⊇ A.
(ii) A ∪ B ⊇ B.
(iii) A ∩ B ⊆ A.
(iv) A ∩ B ⊆ B.
(v) A ∪ B = B ∪ A. (Commutativity of union.)
(vi) A ∩ B = B ∩ A. (Commutativity of intersection.)
(vii) A ∪ A = A.
(viii) A ∩ A = A.
(ix) A ∪ (B ∪ C) = (A ∪ B) ∪ C. (Associativity of union.)
(x) A ∩ (B ∩ C) = (A ∩ B) ∩ C. (Associativity of intersection.)
(xi) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). (Distributivity of union over intersection.)
(xii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). (Distributivity of intersection over union.)
(xiii) A ∪ (A ∩ B) = A. (Absorption of union over intersection.)
(xiv) A ∩ (A ∪ B) = A. (Absorption of intersection over union.)
Proof: These formulas follow from the corresponding formulas for the logical operators ∨ and ∧.
Part (i) follows from Theorem 4.11.7 (vi). Part (ii) follows from Theorem 4.11.7 (vii). Part (iii) follows from
Theorem 4.11.7 (iii). Part (iv) follows from Theorem 4.11.7 (iv). Part (v) follows from Theorem 4.11.10 (i).
Part (vi) follows from Theorem 4.11.10 (ii). Part (vii) follows from Theorem 4.11.10 (iii). Part (viii) fol-
lows from Theorem 4.11.10 (iv). Part (ix) follows from Theorem 4.11.10 (v). Part (x) follows from Theo-
rem 4.11.10 (vi). Part (xi) follows from Theorem 4.11.10 (vii). Part (xii) follows from Theorem 4.11.10 (viii).
Part (xiii) follows from Theorem 4.11.10 (ix). Part (xiv) follows from Theorem 4.11.10 (x).
5.13.7 Definition: Two sets A and B are disjoint if A ∩ B = ∅.
5.13.8 Definition: The (relative) complement of a set B within a set A is the set {x ∈ A; x ∉ B}.
5.13.9 Notation: A \ B for sets A and B denotes the complement of B within A.
5.13.10 Theorem: The following identities hold for all sets X and A.
(i) (X \ A) ∩ A = ∅.
(ii) (X \ A) ∪ A = X ∪ A.
(iii) (X \ A) \ A = X \ A.
(iv) X \ (X \ A) = X ∩ A.
Hence the following identities hold if A ⊆ X.
(v) (X \ A) ∩ A = ∅.
(vi) (X \ A) ∪ A = X.
(vii) (X \ A) \ A = X \ A.
(viii) X \ (X \ A) = A.
5.13.11 Theorem (de Morgan’s law): The following identities hold for all sets A, B and X.
(i) X \ (A ∪ B) = (X \ A) ∩ (X \ B).
(ii) X \ (A ∩ B) = (X \ A) ∪ (X \ B).
5.13.12 Theorem: The following identities hold for all sets A, B and X.
(i) A ⊆ B ⇒ (X \ A) ⊇ (X \ B).
5.13.13 Theorem: The following identities hold for all sets A, B, C and D.
(i) (A ∪ B) ∩ (C ∪ D) = (A ∩ C) ∪ (B ∩ C) ∪ (A ∩ D) ∪ (B ∩ D).
(ii) (A ∩ B) ∪ (C ∩ D) = (A ∪ C) ∩ (B ∪ C) ∩ (A ∪ D) ∩ (B ∪ D).
And so forth . . . [ See Simmons [140], p. 6–14 for more properties of sets. ]
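Identities of this kind can be sanity-checked by brute force over all subsets of a small universe. This is not a proof, but it quickly exposes a mistyped formula. The following Python sketch checks Theorems 5.13.10, 5.13.11 and 5.13.13 over the subsets of a three-element universe. The universe {0, 1, 2} and all function names are illustrative assumptions.

```python
from itertools import combinations, product


def subsets(universe):
    """Return all subsets of a finite set as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


SUBSETS = subsets({0, 1, 2})  # hypothetical small universe
EMPTY = frozenset()


def theorem_5_13_10_holds():
    """Parts (i) to (iv): complement identities for all sets X and A."""
    for X, A in product(SUBSETS, repeat=2):
        if (X - A) & A != EMPTY:
            return False
        if (X - A) | A != X | A:
            return False
        if (X - A) - A != X - A:
            return False
        if X - (X - A) != X & A:
            return False
    return True


def theorem_5_13_11_holds():
    """De Morgan's law relative to X."""
    for X, A, B in product(SUBSETS, repeat=3):
        if X - (A | B) != (X - A) & (X - B):
            return False
        if X - (A & B) != (X - A) | (X - B):
            return False
    return True


def theorem_5_13_13_holds():
    """Four-set distributivity identities."""
    for A, B, C, D in product(SUBSETS, repeat=4):
        if (A | B) & (C | D) != (A & C) | (B & C) | (A & D) | (B & D):
            return False
        if (A & B) | (C & D) != (A | C) & (B | C) & (A | D) & (B | D):
            return False
    return True
```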
5.13.14 Definition: The (symmetric) set difference of two sets A and B is the set (A \ B) ∪ (B \ A).
5.13.15 Notation: A △ B, for sets A and B, denotes the symmetric set difference of A and B.

5.13.16 Theorem: The symmetric set difference has the following properties.
(i) A △ A = ∅.
(ii) A △ ∅ = A.
(iii) A △ B = B △ A.
(iv) (A △ B) △ C = A △ (B △ C).
5.13.17 Remark: The properties of the symmetric set difference in Theorem 5.13.16 are quite satisfyingly
simple and symmetric. This is because a symmetric set difference contains a point if and only if the number
of sets which contain the point is odd. This observation easily implies both the symmetry rule (iii) and the
associativity rule (iv). The set A △ A in part (i) contains no points because all points are either in zero or
two of the sets A and A.
Theorem 5.13.18, on the other hand, contains almost no satisfyingly simple and symmetric rules because the
symmetric set difference operator, which is arithmetic in nature, is mixed with the union and intersection
operators, which are logical in nature. The logical operators have simple relations among themselves, and
the arithmetic operator △ has simple relations. But combinations of these two kinds of relations yield very
few simple results.
5.13.18 Theorem: The symmetric set difference has the following properties.
(i) A ∩ (B △ C) = (A ∩ B) △ (A ∩ C).
(ii) A ∪ (B △ C) = (A ∪ B ∪ C) \ ((B ∩ C) \ A).
(iii) A △ (B ∪ C) = ((A △ B) ∪ (A △ C)) \ (A ∩ (B △ C)).
(iv) A △ (B ∪ C) = ((A △ B) ∩ (A △ C)) ∪ ((B △ C) \ A).
(v) A △ (B ∩ C) = ((A △ B) ∪ (A △ C)) \ ((B △ C) \ A).
(vi) A △ (B ∩ C) = ((A △ B) ∩ (A △ C)) ∪ (A ∩ (B △ C)).
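The parity description of the symmetric set difference, and the six mixed identities of Theorem 5.13.18, can be sanity-checked by brute force. The following Python sketch uses the built-in `^` operator on frozensets for the symmetric set difference. The universe {0, 1, 2} and the function names are illustrative assumptions, and an exhaustive check over a small universe is evidence rather than a proof.

```python
from itertools import combinations, product


def subsets(universe):
    """Return all subsets of a finite set as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


UNIVERSE = {0, 1, 2}  # hypothetical small universe
SUBSETS = subsets(UNIVERSE)


def parity_rule_holds():
    """x lies in A ^ B ^ C if and only if x lies in an odd number of A, B, C."""
    for A, B, C in product(SUBSETS, repeat=3):
        for x in UNIVERSE:
            count = (x in A) + (x in B) + (x in C)
            if (x in (A ^ B ^ C)) != (count % 2 == 1):
                return False
    return True


def theorem_5_13_18_holds():
    """Parts (i) to (vi), with set difference written as '-'."""
    for A, B, C in product(SUBSETS, repeat=3):
        checks = [
            A & (B ^ C) == (A & B) ^ (A & C),
            A | (B ^ C) == (A | B | C) - ((B & C) - A),
            A ^ (B | C) == ((A ^ B) | (A ^ C)) - (A & (B ^ C)),
            A ^ (B | C) == ((A ^ B) & (A ^ C)) | ((B ^ C) - A),
            A ^ (B & C) == ((A ^ B) | (A ^ C)) - ((B ^ C) - A),
            A ^ (B & C) == ((A ^ B) & (A ^ C)) | (A & (B ^ C)),
        ]
        if not all(checks):
            return False
    return True
```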
5.14. Basic properties of general set unions and intersections
5.14.1 Remark: The properties of general unions and intersections are especially applicable in topology.
5.14.2 Notation:
⋃S denotes the set {x; ∃A ∈ S, x ∈ A} for any set S.
⋂S denotes the set {x; ∀A ∈ S, x ∈ A} for any non-empty set S.
5.14.3 Remark:
The set ⋃S in Notation 5.14.2 is well defined by the union Axiom (4) in Section 5.1.
The set ⋂S for a non-empty set of sets S is well defined by Axiom (6) or (6′).
5.14.4 Theorem:
(i) x ∈ ⋃S ⇔ (∃A ∈ S, x ∈ A) ⇔ (∃A, (x ∈ A ∧ A ∈ S)) for any set of sets S.
(ii) x ∈ ⋂S ⇔ (∀A ∈ S, x ∈ A) ⇔ (∀A, (x ∈ A ∨ A ∉ S)) for any non-empty set of sets S.
5.14.5 Theorem:
(i) ⋃∅ = ∅.
(ii) ⋃{A} = A for any set A.
(iii) ⋂{A} = A for any set A.
(iv) ⋃{A, B} = A ∪ B for any sets A and B.
(v) ⋂{A, B} = A ∩ B for any sets A and B.
5.14.6 Remark: Theorem 5.14.7 gives some generalizations to sets of sets of the statements in Theorems
5.13.6 and 5.13.13. (The reader may like to ponder why part (vii) of Theorem 5.14.7 requires S1 and S2 to
be non-empty.)
5.14.7 Theorem: Let A be a set. Let S, S1 and S2 be sets of sets. Then the following statements are
true.
(i) S1 ⊆ S2 ⇒ ⋃S1 ⊆ ⋃S2.
(ii) S1 ⊆ S2 ⇒ ⋂S1 ⊇ ⋂S2 if S1 ≠ ∅ and S2 ≠ ∅.
(iii) A ∩ (⋃S) = ⋃{A ∩ X; X ∈ S}.
(iv) A ∪ (⋂S) = ⋂{A ∪ X; X ∈ S} if S ≠ ∅.
(v) (⋃S1) ∩ (⋃S2) = ⋃{X1 ∩ X2; X1 ∈ S1, X2 ∈ S2}.
(vi) (⋂S1) ∪ (⋂S2) = ⋂{X1 ∪ X2; X1 ∈ S1, X2 ∈ S2} if S1 ≠ ∅ and S2 ≠ ∅.
(vii) (⋃S1) ∪ (⋃S2) = ⋃{X1 ∪ X2; X1 ∈ S1, X2 ∈ S2} if S1 ≠ ∅ and S2 ≠ ∅.
(viii) (⋂S1) ∩ (⋂S2) = ⋂{X1 ∩ X2; X1 ∈ S1, X2 ∈ S2} if S1 ≠ ∅ and S2 ≠ ∅.
(ix) (⋃S1) ∪ (⋃S2) = ⋃(S1 ∪ S2).
(x) (⋂S1) ∩ (⋂S2) = ⋂(S1 ∪ S2) if S1 ≠ ∅ and S2 ≠ ∅.
(xi) A ∈ S ⇒ A ⊆ ⋃S.
(xii) A ∈ S ⇒ A ⊇ ⋂S if S ≠ ∅.
(xiii) (∀X ∈ S, X ⊆ A) ⇔ (⋃S ⊆ A).
(xiv) (∀X ∈ S, X ⊇ A) ⇔ (⋂S ⊇ A) if S ≠ ∅.
(xv) (∀X1 ∈ S1, ∀X2 ∈ S2, X1 ∩ X2 = ∅) ⇔ (⋃S1 ∩ ⋃S2 = ∅).
5.14.8 Remark: Propositions (i) and (ii) in Theorem 5.14.9 are generalizations of De Morgan’s law. (See
Theorem 5.13.11.)
5.14.9 Theorem: Let A be a set. Let S be a set of sets. Then the following statements are true.
(i) A \ ⋃S = ⋂{A \ X; X ∈ S} if S ≠ ∅.
(ii) A \ ⋂S = ⋃{A \ X; X ∈ S} if S ≠ ∅.
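General unions and intersections of finite sets of sets are easy to model, which allows a brute-force sanity check of the generalized De Morgan laws in Theorem 5.14.9 and of the non-emptiness condition in Theorem 5.14.7 (vii). In the following Python sketch, the functions big_union and big_intersection stand for the unary union and intersection of Notation 5.14.2. These names, the two-point universe and the checking functions are illustrative assumptions.

```python
from itertools import combinations, product


def subsets(items):
    """Return all subsets of a finite collection as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def big_union(S):
    """Union of all members of the set of sets S. The union of the empty set is empty."""
    result = frozenset()
    for A in S:
        result |= A
    return result


def big_intersection(S):
    """Intersection of all members of a non-empty set of sets S."""
    members = list(S)
    if not members:
        raise ValueError("the intersection of an empty set of sets is not defined")
    result = members[0]
    for A in members[1:]:
        result &= A
    return result


POINT_SETS = subsets([0, 1])        # the 4 subsets of a hypothetical universe {0, 1}
COLLECTIONS = subsets(POINT_SETS)   # the 16 sets of such subsets


def theorem_5_14_9_holds():
    """Generalized De Morgan laws for every A and every non-empty S."""
    for A, S in product(POINT_SETS, COLLECTIONS):
        if not S:
            continue
        if A - big_union(S) != big_intersection({A - X for X in S}):
            return False
        if A - big_intersection(S) != big_union({A - X for X in S}):
            return False
    return True


def part_vii_holds(S1, S2):
    """Theorem 5.14.7 (vii) for one particular pair of sets of sets."""
    pairwise = {X1 | X2 for X1 in S1 for X2 in S2}
    return big_union(S1) | big_union(S2) == big_union(pairwise)
```

When S1 is empty and S2 = {{0}}, the set of pairwise unions is empty, so its union is empty, whereas the left-hand side of (vii) is {0}. The call part_vii_holds(frozenset(), frozenset({frozenset({0})})) therefore returns False.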
5.14.10 Remark: Notation 5.8.15 introduced the abbreviation {x ∈ A; P (x)} for {x; (x ∈ A) ∧ P (x)},
where A is a set and P is a set-theoretic formula. Theorem 5.14.11 presents some properties of unions and
intersections applied to sets of this form. Part (ii) is perhaps more surprising than part (i).
5.14.11 Theorem:
(i) x ∈ ⋃{A ∈ S; P(A)} ⇔ (∃A ∈ S, (x ∈ A ∧ P(A))) ⇔ (∃A, (x ∈ A ∧ A ∈ S ∧ P(A)))
for any set of sets S and set-theoretic formula P .
(ii) x ∈ ⋂{A ∈ S; P(A)} ⇔ (∀A ∈ S, (x ∈ A ∨ ¬P(A))) ⇔ (∀A, (x ∈ A ∨ A ∉ S ∨ ¬P(A)))
for any non-empty set of sets S and set-theoretic formula P .
5.14.12 Remark: Theorem 5.14.13 presents some basic properties of the power set construction in Defi-
nition 5.8.18 and Notation 5.8.19.
5.14.13 Theorem: Let A1 and A2 be sets. Let S1 and S2 be sets of sets. Then the following statements
are true.
(i) A1 ∈ IP(A1).
(ii) A1 ⊆ A2 ⇒ IP(A1) ⊆ IP(A2).
(iii) ⋃(IP(A1)) = A1.
(iv) ∀C ∈ IP(S2), ⋃C ∈ IP(⋃S2). That is, S1 ⊆ S2 ⇒ ⋃S1 ⊆ ⋃S2.
(v) S1 ⊆ IP(⋃S1).
(vi) S1 ⊆ IP(S2) ⇒ ⋃S1 ⊆ S2.
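The power set properties above can also be sanity-checked on small finite sets. The following Python sketch models the power set as a frozenset of frozensets and checks parts (i), (ii), (iii), (v) and (vi). The function names and the two-point universe are illustrative assumptions. In part (vi) the set S2 is represented by a set of points, since its subsets must be of the same kind as the members of S1.

```python
from itertools import combinations, product


def powerset(A):
    """Return the set of all subsets of the finite set A."""
    items = list(A)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def big_union(S):
    """Union of all members of the set of sets S."""
    result = frozenset()
    for A in S:
        result |= A
    return result


POINT_SETS = powerset(frozenset({0, 1}))   # subsets of a hypothetical universe {0, 1}
COLLECTIONS = powerset(POINT_SETS)         # sets of such subsets


def power_set_properties_hold():
    for A1, A2 in product(POINT_SETS, repeat=2):
        if A1 not in powerset(A1):                              # part (i)
            return False
        if A1 <= A2 and not powerset(A1) <= powerset(A2):       # part (ii)
            return False
        if big_union(powerset(A1)) != A1:                       # part (iii)
            return False
    for S1 in COLLECTIONS:
        if not S1 <= powerset(big_union(S1)):                   # part (v)
            return False
        for A in POINT_SETS:
            if S1 <= powerset(A) and not big_union(S1) <= A:    # part (vi)
                return False
    return True
```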


5.14.14 Remark: The set construction ⋃_{x∈X} f(x) in Notation 5.14.15 (i) is well defined by the replacement
axiom, Definition 5.1.26 (6), which guarantees that {y; ∃x ∈ X, y = f(x)} = {f(x); x ∈ X} is a set. Then
the union axiom, Definition 5.1.26 (4), guarantees that the union of this set is also a set.
The set construction ⋂_{x∈X} f(x) in Notation 5.14.15 (ii) is well defined by the axiom of specification.
5.14.15 Notation:
(i) ⋃_{x∈X} f(x) means {y; ∃x ∈ X, y ∈ f(x)} for any set X and any set-theoretic function f such that
f(x) is a set for all x ∈ X.
(ii) ⋂_{x∈X} f(x) means {y; ∀x ∈ X, y ∈ f(x)} for any set X and set-theoretic function f such that f(x) is
a set for all x ∈ X and f satisfies ∃x ∈ X, ∃y ∈ Y, y ∈ f(x).
5.14.16 Theorem:
(i) ⋃_{y∈Y} {x ∈ X; P(x, y)} = {x ∈ X; ∃y ∈ Y, P(x, y)} for any sets X and Y and set-theoretic formula P .
(ii) ⋂_{y∈Y} {x ∈ X; P(x, y)} = {x ∈ X; ∀y ∈ Y, P(x, y)}, for any sets X and Y and set-theoretic formula P
which satisfies ∃x ∈ X, ∃y ∈ Y, P(x, y).

5.14.17 Remark: The set ⋃_{y∈Y} {x ∈ X; P(x, y)} in Theorem 5.14.16 may be thought of as the domain
of the set-theoretic formula P if it is subject to the restrictions x ∈ X and y ∈ Y on P(x, y).
The set ⋂_{y∈Y} {x ∈ X; P(x, y)} is perhaps less useful. If the set {(x, y) ∈ X × Y ; P(x, y)} is interpreted as
the graph of a function f : X → Y , the set ⋂_{y∈Y} {x ∈ X; P(x, y)} is the set of x ∈ X for which f({x}) = Y .
Relations and functions are defined in Chapter 6.
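A small worked instance may make Theorem 5.14.16 concrete. In the following Python sketch, the sets X and Y and the formula P(x, y), meaning "x divides y", are hypothetical choices made only for illustration, as are the function names. The indexed union and intersection are computed directly from the family y ↦ {x ∈ X; P(x, y)} and compared with the right-hand sides of parts (i) and (ii).

```python
def indexed_union(Y, family):
    """Union over y in Y of the sets family(y)."""
    result = set()
    for y in Y:
        result |= family(y)
    return result


def indexed_intersection(Y, family):
    """Intersection over y in Y of the sets family(y), for non-empty Y."""
    members = [family(y) for y in Y]
    if not members:
        raise ValueError("the index set Y must be non-empty")
    result = set(members[0])
    for member in members[1:]:
        result &= member
    return result


# Hypothetical example data: P(x, y) means "x divides y".
X = set(range(1, 7))
Y = {4, 6, 9}


def P(x, y):
    return y % x == 0


def family(y):
    """The set {x in X; P(x, y)} for a fixed y."""
    return {x for x in X if P(x, y)}


union_side = indexed_union(Y, family)
exists_side = {x for x in X if any(P(x, y) for y in Y)}
intersection_side = indexed_intersection(Y, family)
forall_side = {x for x in X if all(P(x, y) for y in Y)}
```

With these choices the union is the set of elements of X which divide at least one of 4, 6 and 9, and the intersection is the set of common divisors, which contains only 1.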
[ Give a list of properties of the symmetric set difference operator distributed over arbitrary unions and
intersections? For example, try to simplify A △ (⋃S) and A △ (⋂S). ]
5.15. Closure of set unions under arbitrary unions
5.15.1 Remark: Theorem 5.15.4 states that for any set of sets S, the set of unions of subsets of S is closed
under arbitrary unions. This fact is useful in topology, for example for the proof of Theorem 14.9.8. One
may write Theorem 5.15.4 more formally as
(∀U ∈ Q, ∃C ∈ IP(S), U = ⋃C) ⇒ (∃C̄ ∈ IP(S), ⋃Q = ⋃C̄)
for any Q and S.

5.15.2 Remark: In the statement of Theorem 5.15.4, ⋃C ⊆ ⋃S for all C ⊆ S (by Theorem 5.14.7 (i)). So
⋃C ∈ IP(⋃S) (as in Theorem 5.14.13 (iv)). Therefore T ⊆ IP(⋃S). This guarantees that T is a well-defined
set by the ZF axiom of specification. (Alternatively the replacement axiom may be used.)
5.15.3 Remark: To prove Theorem 5.15.4, it is intuitively clear that one may combine all of the collections
C of which the sets U are composed into a collection C̄ ∈ IP(S), and this combined collection C̄ will have
the property that ⋃Q = ⋃C̄, which immediately implies that ⋃Q ∈ T as claimed. The obstacle here is
that it is not always possible to reconstruct the collections C from the unions ⋃C. Given a set Q ⊆ T , it is
only known that each element U of Q has the property: ∃C ∈ IP(S), U = ⋃C. The set Q is not equipped
with information about how its elements U were constructed from sets C ∈ IP(S). Even if Q was actually
constructed by someone from sets C ∈ IP(S), the construction history is discarded when the set is “handed
over” to the “recipient” of the set. The recipient of the set Q only knows what the elements U are, together
with the useful hint that each such set U is equal to the union of some subset of S. The recipient is not
told which subset! For a very large set S, it could be computationally challenging to determine which
combination of elements of S may be combined to produce U . The required combination of elements may be
uncountably infinite. An even more unpleasant fly in the ointment is the possibility that the set Q was never
even constructed at all. It may be a completely arbitrary subset of T , and S could be a completely arbitrary
general set. In such a situation, there is no history of “construction” of Q to be recovered. Recovering the
sets C ∈ IP(S) may be equivalent to solving a cryptographic puzzle, for example, or some famous “open
problem”. It follows from this discussion that the simple aggregation of the sets C will not deliver the
required combined set C̄.
The axiom of choice would bail us out of this quandary by providing, for each set U ∈ Q, a single choice
of collection C ∈ IP(S) such that U = ⋃C. This might not be the same set C which was used in the
construction of U , if it was constructed, but that doesn’t matter. The important thing is to obtain a set
C̄ ∈ IP(S) such that ⋃Q = ⋃C̄.
The choice of set C for each U ∈ Q would be equivalent to constructing a choice function f : Q → IP(S)
which satisfies ⋃f(U) = U for each set U ∈ Q. Then Theorem 6.8.13 applied to f would yield the desired
result. However, since this book is avoiding the axiom of choice whenever possible, this approach is not
followed here.
The approach followed in the proof of Theorem 5.15.4 is to use the set C̄ = ⋃{C ∈ IP(S); ⋃C ∈ Q} for the
combined collection of sets which will yield ⋃Q = ⋃C̄. This choice of C̄ combines all possible collections C
which could have been used in the construction of the sets U ∈ Q. So it should hopefully satisfy ⋃C̄ = ⋃Q,
and hopefully the choice axiom will not be required to prove this. By using all possible set collections C
with U = ⋃C, we avoid making a choice of set collection C, thereby avoiding the axiom of choice.
[ Give an example of a set S in Theorem 5.15.4 for which it is impossible or computationally impractical to
reconstruct sets C ∈ IP(S) for which U = ⋃C for U ∈ Q. E.g. make the solution of this problem equivalent
to solving a crypto puzzle or the axiom of choice. Lebesgue non-measurable sets could be useful for the
latter. ]
5.15.4 Theorem: Let S be a set of sets. Define T = {⋃C; C ⊆ S}. Then ∀Q ⊆ T, ⋃Q ∈ T .
Proof: Let S be a set of sets. Define T = {⋃C; C ⊆ S}. Let Q ⊆ T . It is to be shown that ⋃Q ∈ T . The
set Q is defined by the proposition that ∀U ∈ Q, ∃C ∈ IP(S), U = ⋃C. This means that Q is a collection
of sets which are unions of subcollections of the collection S.
Let C̄ = ⋃{C ∈ IP(S); ⋃C ∈ Q}. It will be shown that ⋃C̄ is an element of T which is equal to ⋃Q. To
prove this, it suffices to show that C̄ ⊆ S and ⋃C̄ = ⋃Q.
To show that C̄ ⊆ S, note that C̄ ⊆ ⋃IP(S) by Theorem 5.14.7 (i) and ⋃IP(S) = S by Theorem 5.14.13 (iii).
The equality ⋃C̄ = ⋃Q may be derived as follows.
x ∈ ⋃Q ⇔ ∃U ∈ Q, x ∈ U
⇔ ∃U, (x ∈ U ∧ U ∈ Q)
⇔ ∃U, (x ∈ U ∧ U ∈ Q ∧ U ∈ T ) (5.15.1)
⇔ ∃U, (x ∈ U ∧ U ∈ Q ∧ ∃C ∈ IP(S), (U = ⋃C))
⇔ ∃U, ∃C, (x ∈ U ∧ U ∈ Q ∧ C ∈ IP(S) ∧ U = ⋃C)
⇔ ∃C, ∃U, (C ∈ IP(S) ∧ U ∈ Q ∧ x ∈ U ∧ U = ⋃C)
⇔ ∃C, (C ∈ IP(S) ∧ ∃U, (U ∈ Q ∧ x ∈ U ∧ U = ⋃C))
⇔ ∃C, (C ∈ IP(S) ∧ ∃U, (⋃C ∈ Q ∧ x ∈ ⋃C ∧ U = ⋃C))
⇔ ∃C, (C ∈ IP(S) ∧ ⋃C ∈ Q ∧ x ∈ ⋃C ∧ (∃U, U = ⋃C))
⇔ ∃C, (C ∈ IP(S) ∧ ⋃C ∈ Q ∧ x ∈ ⋃C) (5.15.2)
⇔ ∃C, (C ∈ IP(S) ∧ ⋃C ∈ Q ∧ ∃V, (x ∈ V ∧ V ∈ C))
⇔ ∃V, ∃C, (C ∈ IP(S) ∧ ⋃C ∈ Q ∧ x ∈ V ∧ V ∈ C)
⇔ ∃V, (x ∈ V ∧ ∃C, (C ∈ IP(S) ∧ V ∈ C ∧ ⋃C ∈ Q))
⇔ ∃V, (x ∈ V ∧ ∃C ∈ IP(S), (V ∈ C ∧ ⋃C ∈ Q))
⇔ ∃V, (x ∈ V ∧ V ∈ C̄) (5.15.3)
⇔ ∃V ∈ C̄, x ∈ V
⇔ x ∈ ⋃C̄.
It follows that ⋃Q ∈ T . Line (5.15.1) follows from Theorem 5.13.4 (iii) because Q ⊆ T . Line (5.15.2) follows
from Theorem 4.16.13 (ii) and Remark 4.11.8. Line (5.15.3) follows from Theorem 5.14.11 (i).
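The choice-free construction in this proof can be tried out on a small finite example. The following Python sketch builds T from a hypothetical three-member set of sets S, forms the combined collection C̄ for every Q ⊆ T as the union of all C ⊆ S whose union belongs to Q, and checks that C̄ ⊆ S and that the union of C̄ equals the union of Q. The particular S and all function names are illustrative assumptions. In the finite case no choice axiom is at stake anyway, so this is only a check of the bookkeeping.

```python
from itertools import combinations


def powerset(A):
    """Return a list of all subsets of the finite set A, as frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def big_union(collection):
    """Union of all members of a collection of frozensets."""
    result = frozenset()
    for member in collection:
        result |= member
    return result


# Hypothetical set of sets S.
S = frozenset({frozenset({1}), frozenset({1, 2}), frozenset({3})})

# T is the set of unions of subcollections C of S.
T = frozenset(big_union(C) for C in powerset(S))


def combined_collection(Q):
    """The collection C-bar: the union of all C in the power set of S whose union lies in Q."""
    return big_union(C for C in powerset(S) if big_union(C) in Q)


# Closure of T under arbitrary unions, checked for every Q that is a subset of T.
closed = all(big_union(Q) in T for Q in powerset(T))

# The combined collection is a subset of S and has the same union as Q.
witness_ok = all(
    combined_collection(Q) <= S and big_union(combined_collection(Q)) == big_union(Q)
    for Q in powerset(T)
)
```

Note that combined_collection never selects one particular C for each U ∈ Q. It takes all candidates at once, which mirrors how the proof avoids a choice function.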

[ Is there a similar theorem to 5.15.4 for ⋂C? What about finite unions and intersections? Can this be
applicable to showing that a topology is generated by all finite intersections of all unions of a set of sets? ]
5.15.5 Remark: Assertions of the form ∀x ∈ X, ∃y, P (x, y) for a proposition P depending on two variables
often lead to the temptation to use the axiom of choice because such assertions intuitively suggest the
existence of a function f on the set X such that P (x, f (x)) holds for all x ∈ X. In the particular case
of Theorem 5.15.4, ∀U ∈ Q, ∃C ∈ IP(S), U = ⋃C, which leads to the temptation to propose a function
f : Q → IP(S) satisfying ⋃f (U ) = U for all U ∈ Q.
5.15.6 Remark: Proving theorems without the axiom of choice is a bit like learning how to cook without
using meat, fish or eggs. Just as the lacto-vegetarian must find substitutes for forbidden foods, the mathe-
matician who rejects AC must find alternative paths to achieve desired results. The non-AC mathematician
must learn to recognize theorems which are “contaminated” by AC or CC. This would be much easier if
everyone provided accurate labelling of theorems which indicated the ingredients used in their manufacture.
5.15.7 Remark: Theorem 5.15.4 is related to the CC-tainted Theorem 7.2.36, which says that the union
of a countable set of countable sets is countable.
5.15.8 Remark: The many lines of the logical calculation in the proof of Theorem 5.15.4 may seem ex-
cessive, but this is the price of certainty that the logic is correct. By checking that every line is correctly
derived from axioms and already-proved lines, the final line of a derivation can be trusted to not depend
on intuition or unwarranted assumptions. Most mathematical proofs are presented informally. An informal
proof relies on the ability of the reader to “fill the gaps” between the steps of the argument. If the correctness
of an argument is in contention, the reader must know how to write out the argument in full, no matter how
tedious this may be. In mathematics, occasional tedium is better than occasional incorrectness.
[ The unary ∪ and ∩ operators are too small and are too close to the symbol on the right, but the ⋃ and ⋂
symbols are too large for this purpose when the symbol on the right is a small letter. So what is needed are
symbols which are in between these two sizes. Probably should define macros for these symbols and define
them later to be some better-sized symbols. ]
5.16. Specification tuples
5.16.1 Remark: The fact that mathematicians do not agree on which sets should represent the concepts
of differential geometry shows that the sets are not themselves the objects under consideration, but merely
serve to indicate which object is being looked at within a class of objects.
Every mathematical object should ideally have not only a set to indicate which object it is, but also an
object class, a name tag, a scope identifier and an encoding class. The encoding class is really necessary for
the same reason that it is necessary in computer software. Computers represent all data in terms of zeros
and ones which are the same kinds of zeros and ones whether the data is text, integers, floating-point numbers or
images in dozens of formats. Therefore all data in computers has some sort of indication of the encoding rules
used for each piece of data. It is also necessary to know what class of object is represented. To distinguish
one object from another, identifiers (name tags) are used. Since names from different contexts may clash,
scope identifiers are often used implicitly or explicitly. Strictly speaking, mathematics should also have all
five of these components: (1) a set to indicate which object of a class is indicated; (2) a class tag to indicate
the human significance of the object class; (3) an encoding tag to indicate the chosen representation; (4) a
name tag to indicate which object of a class is indicated; (5) a scope tag to remove ambiguity from multiple
uses of a name tag. All of this is done informally in most mathematics, but it is helpful to be aware of the
limits of the expressive power of sets. Sets should be thought of as the mathematical equivalent of the zeros
and ones of computer data. Either explicitly or implicitly, this raw data must be brought to life.
5.16.2 Remark: Mathematical objects may be organized into classes. The objects in each mathematical
class may be indicated by a “specification tuple”, a parameter sequence which uniquely determines a single
object in the class. Names for things are often confused with the things which they refer to. Mathematical
names are no exception. A specification tuple is often thought of as the definition of the object itself. For
example, a pair (G, σG ) may be defined to be a group if the function σG : G × G → G satisfies the axioms of
a group. The trivial group with identity 0 would then be the tuple ({0}, {((0, 0), 0)}). However, given only
this pair of sets, it would be difficult to guess that it is supposed to be a group. There is something missing
in the bare parameter list. One could think of the missing significance as the “essence of a group”.
5.16.3 Remark: A full specification tuple may be inconveniently long. In this case, the tuple may be
abbreviated. For example, a topological group might be referred to as a set G with various operations and
a topology, whereas a full specification tuple might be (G, TG , σG ) where TG is the topology and σG is the
group operation. Informal specifications are fine for simple situations, but as structures become progressively
more complex, the burden on the reader’s memory and guesswork becomes excessive. It is best to specify
the full set of parameters to avoid ambiguity when introducing a new concept.
5.16.4 Remark: Although one thinks of such notations as G and (G, TG , σG ) as referring to the same
thing, they cannot be equal. A statement such as “G = (G, TG , σG )” is logical nonsense. It is preferable to
use an asymmetric notation such as
G −< (G, TG , σG ) or (G, TG , σG ) >− G,
to indicate that G is an abbreviation for (G, TG , σG ). The non-standard chicken-foot symbols “−<” and “>−”
are used frequently in this book.
5.16.5 Remark: In a more formal presentation, one might indicate mathematical classes explicitly. For
example, the class of all topological principal G-bundles might be denoted as TPFB[G, TG , σG ], where
G −< (G, TG , σG ) is a topological group. (In this case, there is a different class TPFB[G, TG , σG ] for each
choice of the triple of parameters (G, TG , σG ), where G is a group, TG is a topology on G and σG is an
algebraic operation on G.)
An individual member of this class might be denoted as TPFB[G, TG , σG ](P, TP , q, B, TB , A^G_P , µ^G_P ), which
indicates both the parameters of the class and the parameters of the particular object. The 3 class parameters
are (G, TG , σG ). The 7 object parameters are (P, TP , q, B, TB , A^G_P , µ^G_P ).
With such a notation, the reader would see at all times the clear division between class and object parameters.
In practice, such an object is denoted as (P, q, B) or (P, TP , q, B, TB , A^G_P , µ^G_P ), and the class membership is
indicated in the informal context. Although this book is not written in such a formal way, an effort has been
made to ensure that most class and object specifications could be formalized if required. Most texts freely
mix class and object parameters, which can make it difficult to think clearly about what one is doing.
5.16.6 Remark: Suppose LTG(G, X, σ, µ) denotes a left transformation group G with group operation σ :
G × G → G, acting on X with action µ : G × X → X. Let RTG(G, X, σ, µ) denote the correspond-
ing right transformation group. Then for any group GP(G, σ), the parameters of LTG(G, G, σ, σ) and
RTG(G, G, σ, σ) are identical. Yet they are respectively the left and right transformation groups of G acting
on G. So they are different classes of structure with identical parameters. (This ambiguity is also mentioned
in Remark 33.8.8.)
This shows that there must be something extra which indicates the class of a specification tuple. Thus, for
example, when this book talks about “the group (G, σ)”, what is really meant is “the structure GP(G, σ)”,
where the meaning of GP is explained only in non-technical human-to-human language. The meaning of
the class of a structure lies outside formal mathematics in the socio-mathematical context. So the reader
should have no illusions that the pair (G, σ) is a group. It is just a pair of parameters.
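The point of Remark 5.16.6 may be illustrated by a minimal Python sketch. The class names LTG and RTG follow the remark. The use of Python named tuples, and the example of the integers modulo 3 acting on themselves, are assumptions made only for this illustration.

```python
from typing import Callable, NamedTuple


class LTG(NamedTuple):
    """Left transformation group with parameters (G, X, sigma, mu)."""
    G: frozenset
    X: frozenset
    sigma: Callable
    mu: Callable


class RTG(NamedTuple):
    """Right transformation group with the same parameter list."""
    G: frozenset
    X: frozenset
    sigma: Callable
    mu: Callable


# Hypothetical example group: the integers modulo 3 under addition.
G = frozenset({0, 1, 2})


def sigma(g, h):
    return (g + h) % 3


left = LTG(G, G, sigma, sigma)
right = RTG(G, G, sigma, sigma)

# The bare parameter tuples are identical ...
same_parameters = tuple(left) == tuple(right)
# ... so only the class tag distinguishes the two structures.
same_class = type(left) is type(right)
```

Here the class tag plays the role of the "something extra" which the specification tuple itself does not carry.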
5.16.7 Remark: The idea that mathematical definitions refer to “classes” or “objects” does not derive
from the corresponding terminology in computer programming. For example, Bell [191], page 505, published
the following in 1937.
A manifold is a class of objects (at least in common mathematics) which is such that any member
of the class can be completely specified by assigning to it certain numbers [. . . ]

Chapter 6
Relations and functions
6.1 Ordered pairs
6.2 Cartesian product of a pair of sets
6.3 Relations
6.4 Equivalence relations and partitions
6.5 Functions
6.6 Function set maps and inverse set maps
6.7 Composition of functions
6.8 Families of sets and functions
6.9 Cartesian products of families of sets and functions
6.10 Partial Cartesian products and identification spaces
6.11 Partially defined functions
6.12 Notations for sets of functions
6.0.1 Remark: Relations and functions may be represented as sets of ordered pairs. In other words,
objects which are related to each other may be explicitly and exhaustively listed. In practice, the ordered
pairs are too numerous to list. So the ordered pair lists are specified by compact, finite rules of some
sort. Therefore one may regard the set-of-ordered-pairs formalism as merely an abstract conceptualization
of relations and functions.
Two rules which specify relations (or functions) are considered to be equal if and only if they generate the
same set of ordered pairs. But since any such sets of ordered pairs may be impossible to list explicitly, the
equality of the generated sets is determined by logical analysis of the rules, not by comparing the ordered
pair lists.
Suppose relations R1 and R2 are defined by rules R1 = {(x, y); P1 (x, y)} and R2 = {(x, y); P2 (x, y)}, where
P1 and P2 are set-theoretic relations (i.e. predicates with two variables). Then
R1 = R2 ⇔ ∀x, ∀y, (P1 (x, y) ⇔ P2 (x, y)).
In other words, testing equality of the sets R1 and R2 is equivalent to testing the equivalence of the predicates
P1 and P2 . Although one is expected to have in mind the sets, in practice one works with the predicates in
the logical domain.
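The equivalence between equality of relation-sets and equivalence of relation-predicates can be checked by brute force on a small finite universe. The following Python sketch does this. The universe U and the two rules P1 and P2 are hypothetical choices made for illustration. In general no such exhaustive listing is available, which is why one works with the predicates.

```python
from itertools import product

# Hypothetical finite universe standing in for the values of x and y.
U = range(-3, 4)


def P1(x, y):
    return y == x * x


def P2(x, y):
    # A differently written rule which generates the same pairs as P1.
    return y >= 0 and y == abs(x) ** 2


def relation_set(P, universe):
    """The relation-set {(x, y); P(x, y)} restricted to a finite universe."""
    return {(x, y) for x, y in product(universe, repeat=2) if P(x, y)}


R1 = relation_set(P1, U)
R2 = relation_set(P2, U)

# Extensional equality of the generated sets ...
sets_equal = R1 == R2
# ... agrees with pointwise equivalence of the predicates.
predicates_equivalent = all(
    P1(x, y) == P2(x, y) for x, y in product(U, repeat=2)
)
```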
The two ways of formalizing relations may be referred to as “relation-sets” and “relation-predicates”. The
correspondence between relation-sets and relation-predicates breaks down under some circumstances. For
example, when one refers to the set of all relations R between real numbers, it is simply not possible to
write all such relations as finite rules. Such “dummy variable” relations must be formalized as sets. Another
circumstance is the case of Lebesgue non-measurable functions. These relations are chosen with the Axiom
of Choice and cannot be specified by any rule.
Relation-sets have the advantage of avoiding issues such as the existence of finite rules to specify the relations.
On the other hand, relation-predicates have the advantage that they are well defined even when the domain
and range of the relation are not sets. (For example, the set inclusion relation “⊆” is defined for all sets,


but the domain and range are the “set of all sets”, which is not a set! So this relation-predicate has no
corresponding relation-set.)
These comments apply to sets in general, not just to relations and functions.
6.0.2 Remark: Although functions are represented as particular kinds of sets, they are generally thought
of as being dynamic, in contrast to sets which are static. Functions may be thought of as moving or
transforming objects, or associating attributes with objects. Sometimes one thinks of functions as having a
temporal or causal quality.
Since functions may be applied to functions, it often happens that a function acts on static objects, but the
function is itself acted upon by other functions. So the same function may sometimes have the active role,
sometimes a passive role.
Similarly, relations are thought of as more than just sets of ordered pairs. They may be thought of as
indicating associations between things.
The meanings of sets, relations and functions are clearly not fully captured in the set theory formalism. The
meanings are communicated within the socio-mathematical context.
6.0.3 Remark: Figure 6.0.1 shows some of the relations, er. . . relationships between relations and func-
tions. (The “relations” between classes of mathematical objects are “naive relations” in the realm of “naive
mathematics” or “a-priori mathematics” or “metamathematics”. The “relations” which are defined in Sec-
tion 6.3 as subsets of Cartesian products of sets constitute a well-defined class of sets within set theory.)
[ Figure 6.0.1: diagram with nodes Cartesian product, relation, partially defined function, equivalence
relation, partial order, total order, function, injective function, surjective function, bijective function. ]
Figure 6.0.1 Relationships between relations and functions
6.1. Ordered pairs

In order to define relations and functions, it is necessary to first define ordered pairs.
6.1.1 Remark: The ordered pairs {{a}, {a, b}} in Definition 6.1.3 are well defined by ZF Axiom (3). One
peculiarity of this representation is the fact that the ordered pair (a, a) is represented as {{a}}. (Clearly
one does not normally think of sets of the form {{a}} as ordered pairs. This shows once again that sets on
their own mean very little. One must also know the class of objects to which a set belongs in order to know
which mathematical object it points to.)
Another peculiarity is that n-tuples (a1 , a2 , . . . an ) for non-negative integers n are represented as functions
with values a1 , a2 , . . . an , but functions are represented as ordered pairs. So a 2-tuple is represented as a
function, which is a set of ordered pairs, but a 2-tuple is thought of as being the same thing as an ordered
pair and is often called such. The circularity of the definitions here is broken by first defining ordered pairs
as in Definition 6.1.3, then representing functions in terms of these ordered pairs, and then defining general
n-tuples in terms of functions. This is an example of boot-strapping of definitions in mathematics. Ordered
pairs are first defined in a “bare-handed” way and then redefined later using more sophisticated machinery.
Yet another peculiarity is the fact that ordered pairs have two different set representations. In practice,
as soon as functions have been defined, one may ignore Definition 6.1.3 and use the n-tuple representation
instead. [ See Halmos [160], section 6, for ordered pairs. ] Although a kind of function-free n-tuple is discussed
in Remark 6.1.13, such function-free n-tuples are not used in practical definitions for n > 2.

6.1.2 Remark: One could axiomatically define a “system of pairs of sets”, called Pairs, say, to contain all
of the ordered pairs of sets in a ZF set theory Sets, say. The relations between sets and pairs could be defined
in terms of a function P : Sets × Sets → Pairs and functions L : Pairs → Sets and R : Pairs → Sets
such that L(P (x, y)) = x, R(P (x, y)) = y and P (L(p), R(p)) = p for sets x and y and pairs p. We could call
P (x, y) the ordered pair of sets x and y. Then Definition 6.1.3 would be just one possible representation of
the system Pairs in terms of sets. So Definition 6.1.3 should not be taken too seriously. The set expression
{{a}, {a, b}} is a mere parametrization of the concept of an ordered pair. The concept of the “order” of two
things is something that the reader must already know about and use in the understanding of ordered pairs.
6.1.3 Definition: An ordered pair is a set of the form {{a}, {a, b}} for any sets a and b.
6.1.4 Notation: (a, b) denotes the ordered pair {{a}, {a, b}} for any a and b.
6.1.5 Remark: According to Mendelson [165], page 162, Kazimierz (Casimir) Kuratowski discovered the
representation of ordered pairs in Definition 6.1.3.
6.1.6 Remark: The set-theoretic functions P , L and R in Remark 6.1.2 for the set-pair representation in
Definition 6.1.3 would be P (a, b) = {{a}, {a, b}}, L({{a}, {a, b}}) = a and R({{a}, {a, b}}) = b. (Of course,
one cannot really talk about P mapping the cross-product Sets × Sets to Pairs because Sets is not a
set and cross products have not been defined yet. So P cannot be thought of as mapping ordered pairs of
sets (a, b) ∈ Sets × Sets to Pairs unless naive notions of cross product and ordered pairs are used. So
there are at least three layers of ordered pair definitions here: a logic layer, a basic set-pair layer, and a
sequence-indexed-by-integers layer. One could dig more deeply for more definitional layers for ordered pairs!)
6.1.7 Remark: Theorems 6.1.8 and 6.1.9 show that the left and right elements of an ordered pair can be
extracted from the pair.
6.1.8 Theorem: The left element of an ordered pair p is the set a defined by ∀x ∈ p, a ∈ x.
Proof: Let p = (a, b) be an ordered pair. Then p = {{a}, {a, b}}. Therefore ∀x ∈ p, a ∈ x. But it must
be shown that this uniquely determines the set a. In other words, it must be shown that if ∀x ∈ p, a′ ∈ x,
then a′ = a. Suppose a = b. Then a′ ∈ {a}. Therefore a′ = a. Suppose a ≠ b. Then a′ ∈ {a} and a′ ∈ {a, b}.
In particular, a′ ∈ {a} again. So a′ = a.
6.1.9 Theorem: The right element of an ordered pair p is the set b defined by ∃′x ∈ p, b ∈ x.
Proof: The proposition ∃′x ∈ p, b′ ∈ x means that ∃x ∈ p, b′ ∈ x and ∀x, y ∈ p, ((b′ ∈ x ∧ b′ ∈ y) ⇒ x = y).
(See Notation 4.16.4 for the unique existence notation.) This clearly holds with b = b′. So it remains to
show that any b′ which satisfies ∃′x ∈ p, b′ ∈ x must equal b. Suppose a = b. Then b′ ∈ {a} = {b}. So b′ = b.
Suppose a ≠ b. Then either b′ ∈ {a} or b′ ∈ {a, b}. So either b′ = a or b′ = b. Suppose b′ = a. Then x = {a}
and y = {a, b} implies that (b′ ∈ x ∧ b′ ∈ y) and x ≠ y, which contradicts the assumption. Hence b′ = b.
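The extraction conditions in Theorems 6.1.8 and 6.1.9 can be tried out on small examples. The following Python sketch models sets as frozensets. This modelling, and the function names pair, left and right, are assumptions for illustration only and are not part of the formal development.

```python
def pair(a, b):
    """The ordered pair {{a}, {a, b}} of Definition 6.1.3 as nested frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def left(p):
    """The unique a which is an element of every x in p (Theorem 6.1.8)."""
    common = frozenset.intersection(*p)
    (a,) = common  # unpacking fails unless there is exactly one such element
    return a


def right(p):
    """The unique b which is an element of exactly one x in p (Theorem 6.1.9)."""
    candidates = frozenset.union(*p)
    (b,) = [c for c in candidates if sum(c in x for x in p) == 1]
    return b


p = pair(1, 2)
q = pair(3, 3)  # collapses to {{3}}, as noted in Remark 6.1.1
```

The tuple unpacking in left and right mirrors the uniqueness parts of the two proofs. It raises an error if the argument does not have the form of an ordered pair.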
6.1.10 Remark: Theorems 6.1.8 and 6.1.9 are unsatisfying. One would ideally like to have simple logical
expressions left and right which yield a and b from p as a = left(p) and b = right(p). There are expressions
for {a} and {b}. We may write {a} = ⋂p = {z; ∀x ∈ p, z ∈ x} and {b} = {z; ∃′x ∈ p, z ∈ x}. An
alternative for {b} in terms of p = (a, b) would be
{b} = ⋃p \ ⋂p if ⋃p ≠ ⋂p,
{b} = ⋃p if ⋃p = ⋂p.
These are very clumsy constructions, but there seem to be no expressions for a and b themselves at all. In
fact, there seems to be no expression to simply construct a from {a}.
One way out of this difficulty would be to invent a form of expression such as “x; x ∈ X” to extract a
from X = {a}. But this logical expression only yields a single thing if the set is a singleton. It is desirable
to have an operator which yields “the thing inside the set”. The closest one can come to “x = the thing
inside X” is “x ∈ X”. Within the context, one must also then claim that ∃′x, x ∈ X.
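The union and intersection expressions for {a} and {b} in Remark 6.1.10 can be evaluated directly. The following Python sketch assumes the same frozenset modelling of sets. The function names are hypothetical.

```python
def pair(a, b):
    """The ordered pair {{a}, {a, b}} as nested frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def singleton_left(p):
    """{a}, computed as the intersection of the elements of p."""
    return frozenset.intersection(*p)


def singleton_right(p):
    """{b}: union minus intersection if they differ, otherwise the union."""
    big_union = frozenset.union(*p)
    big_intersection = frozenset.intersection(*p)
    if big_union != big_intersection:
        return big_union - big_intersection
    return big_union
```

Both functions return singletons, not the elements a and b themselves, which is exactly the limitation described in the remark.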

6.1.11 Remark: Theorem 6.1.12 states the property of ordered pairs which motivates the definition. Any
definition which had the same property would be equally suitable, but Definition 6.1.3 seems to be universally
accepted.
6.1.12 Theorem: Ordered pairs satisfy ∀a, ∀b, ∀c, ∀d, ((a, b) = (c, d) ⇔ ((a = c) ∧ (b = d))).
6.1.13 Remark: Ordered triples, quadruples, pentuples and higher-order tuples are defined in Section 7.7
(Definition 7.7.3) in terms of functions (which are in turn defined in terms of ordered pairs in Section 6.5).
If ordered triples are required, there will be a cyclic loop of definitions here unless ordered triples can be
defined without functions. This cycle of definitions can be broken by Definition 6.1.14.
6.1.14 Definition: An ordered triple is a set of the form ((a, b), c) for any three sets a, b and c. This is
denoted as (a, b, c).
An ordered quadruple is a set of the form ((a, b, c), d) for any four sets a, b, c and d. This is denoted
as (a, b, c, d).
6.1.15 Remark: By naive induction, Definition 6.1.14 may be extended to any order of tuple by the rule
(a1 , a2 , . . . an+1 ) = ((a1 , a2 , . . . an ), an+1 ). (Of course, this doesn’t make much sense if functions and integers
haven’t been defined yet.) To make the induction complete, it is convenient to define a 1-tuple as (a) = a
for any set a. This inductive rule cannot be used for the definition of a 0-tuple.
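The inductive rule of Remark 6.1.15 amounts to a left fold over the list of components. The following Python sketch shows this under the assumption that sets are modelled as frozensets. The function name ntuple is hypothetical.

```python
from functools import reduce


def pair(a, b):
    """The ordered pair {{a}, {a, b}} as nested frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def ntuple(*items):
    """(a1, ..., a(n+1)) = ((a1, ..., an), a(n+1)), with (a) = a when n = 1."""
    if not items:
        raise ValueError("the inductive rule does not define a 0-tuple")
    return reduce(pair, items)
```

For example, ntuple(1, 2, 3) evaluates pair(pair(1, 2), 3), which is the ordered triple of Definition 6.1.14.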
6.2. Cartesian product of a pair of sets

The Cartesian product of a pair of sets may be defined in terms of ordered pairs. The Cartesian product of
more than two sets is generally defined in terms of functions and integers, which are not yet defined in this
section. Therefore this section is concerned only with the Cartesian product of two sets.
6.2.1 Definition: The Cartesian product of two sets A and B is the set {(a, b); a ∈ A, b ∈ B} of all
ordered pairs (a, b) such that a ∈ A and b ∈ B.
6.2.2 Notation: A × B denotes the Cartesian product of sets A and B.
6.2.3 Remark: The Cartesian product of two sets A and B may be visualized as in Figure 6.2.1.
[ Figure 6.2.1: diagram with labels b, (a, b), B and A × B. ]
Figure 6.2.1 Cartesian product of sets
6.2.4 Remark: The Cartesian product A × B is well defined because it is a subset of ℙ(ℙ(A ∪ B)). (To
see this, note that {a} and {a, b} are elements of ℙ(A ∪ B) so that (a, b) ∈ ℙ(ℙ(A ∪ B)) and A × B ∈
ℙ(ℙ(ℙ(A ∪ B))).) So the existence follows from Axioms (3), (4) and (5). [ Is Axiom (6′) required here? ]

6.2.5 Theorem: The Cartesian product has the following properties for sets A, B, C and D.
(i) (A × B = ∅) ⇔ ((A = ∅) ∨ (B = ∅)).
(ii) If A ≠ ∅ and B ≠ ∅, then (A × B ⊆ C × D) ⇔ ((A ⊆ C) ∧ (B ⊆ D)).
(iii) (A × C) ∪ (B × C) = (A ∪ B) × C and (C × A) ∪ (C × B) = C × (A ∪ B).
(iv) (A × C) ∩ (B × C) = (A ∩ B) × C and (C × A) ∩ (C × B) = C × (A ∩ B).
(v) (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D).
(vi) (A × B) ∪ (C × D) ⊆ (A ∪ C) × (B ∪ D).
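Some of the properties in Theorem 6.2.5 can be spot-checked on small finite sets. In the following Python sketch, built-in Python tuples stand in for ordered pairs, and the sample sets A, B, C and D are hypothetical. A check of this kind illustrates the statements but does not prove them.

```python
def cartesian(A, B):
    """A × B as a set of Python tuples standing in for ordered pairs."""
    return {(a, b) for a in A for b in B}


# Hypothetical sample sets.
A, B, C, D = {1, 2}, {2, 3}, {1, 2, 3}, {3, 4}

prop_i = cartesian(A, set()) == set() and cartesian(set(), B) == set()
prop_iii = cartesian(A, C) | cartesian(B, C) == cartesian(A | B, C)
prop_iv = cartesian(A, C) & cartesian(B, C) == cartesian(A & B, C)
prop_v = cartesian(A, B) & cartesian(C, D) == cartesian(A & C, B & D)
prop_vi = cartesian(A, B) | cartesian(C, D) <= cartesian(A | C, B | D)

# For these particular sets the inclusion in (vi) is strict:
# the pair (3, 2) lies in the right-hand side only.
strict_vi = cartesian(A, B) | cartesian(C, D) != cartesian(A | C, B | D)
```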
6.3. Relations
[ See Halmos [160], section 7 for more on relations. ]
6.3.1 Remark: The mathematics literature uses the words “relation”, “function”, “domain” and “range”
in confusing and inconsistent ways. Definitions 6.3.2 and 6.3.6 attempt to reduce the confusion a little by
insisting that the domain and range must be implicit in the specification of a relation. In other words, a
relation is defined to be no more than a set of ordered pairs, whereas a popular alternative formalization
includes the domain and range explicitly in the definition of a relation.
6.3.2 Definition: A relation is a set of ordered pairs.

A relation between a set A and a set B is a subset of A × B.
A relation from a set A to a set B is a subset of A × B.
A relation in a set A is a subset of A × A.
A pair (a, b) is said to satisfy the relation R if (a, b) ∈ R.
6.3.3 Remark: Remark 6.0.1 mentions the difference between a relation-predicate (defined by a logical
predicate expression or “rule”) and a relation-set (defined as a set of ordered pairs of related objects). In
this terminology, Definition 6.3.2 defines a relation-set. (See Definition 5.1.22 for a half-hearted attempt to
define a “set-theoretic formula”, which is how relation-predicates are defined.)
6.3.4 Remark: The definition of a relation R in Definition 6.3.2 may be decomposed into the following
conditions.
(i) R is a set;
(ii) ∀z ∈ R, ∃a, ∃b, z = (a, b).
Condition (i) is significant. One often refers to a predicate with two parameters as a “relation”, but this is not
at all the same thing. In ZF set theory, such a relation-predicate applies in principle to all sets, and there is no
“set of all sets” in ZF set theory. (This is discussed at length in Section 5.7.) Therefore unrestricted relation-
predicates are not relations in the sense of Definition 6.3.2. However, a relation-predicate can be converted
into a relation-set by restricting the domain and range to a set. For example, the equality relation and
membership relation for ZF sets are both not sets. However, for any set X, the sets {(x, y) ∈ X × X; x = y}
and {(x, y) ∈ X × X; x ∈ y} are well-defined relations in the sense of Definition 6.3.2.
6.3.5 Example: The following are relations for all sets X and Y .
(1) X × Y.
(2) ∅.
(3) {(x, y) ∈ X × Y ; x = y}.
(4) {(x, y) ∈ X × Y ; x ∈ y}.
(5) {(x, y) ∈ X × ℙ(X); x ∈ y}.
(6) {(x, y) ∈ ℙ(X) × ℙ(Y ); x ⊆ y}.
6.3.6 Definition: The domain of a relation R is the set {a; ∃b, (a, b) ∈ R}.
The range (or the image or the codomain) of a relation R is the set {b; ∃a, (a, b) ∈ R}.

6.3.7 Notation: Dom(R) denotes the domain of a relation R.

Range(R) denotes the range of a relation R.
Im(R) is an alternative notation for Range(R) for relations R.
6.3.8 Remark: Since Definition 6.3.2 specifies that a relation is a set, it follows that the domain and range
in Definition 6.3.6 are also sets. It is perhaps not immediately clear why this is so. As is very often the case,
it is the ZF replacement axiom (Definition 5.1.26 (6)) which saves the situation. The specification axiom
(which is weaker) implies that any expression of the form {x ∈ X; P (x)} is a genuine set if X is a set and P
is a set-theoretic formula. Hence it is sufficient to find suitable X and P to represent the domain and range
of a relation. In this case, the set X = ⋃⋃R does the job for any relation R.
6.3.9 Theorem: The domain and range of a relation are sets. For any relation R,
Dom(R) = {a ∈ ⋃⋃R; ∃b ∈ ⋃⋃R, (a, b) ∈ R}
and
Range(R) = {b ∈ ⋃⋃R; ∃a ∈ ⋃⋃R, (a, b) ∈ R}.
Proof: Let R be a relation. Let a satisfy ∃b, (a, b) ∈ R. By definition of an ordered pair, this means that
∃b, {{a}, {a, b}} ∈ R. It follows that {a} ∈ (a, b) for some b. Let y = (a, b). Then {a} ∈ y and y ∈ R for
some y. That is, ∃y, ({a} ∈ y ∧ y ∈ R). By the definition of set union, this is equivalent to {a} ∈ ⋃R.
Let z = {a}. Then a ∈ z and z ∈ ⋃R. So by definition of set union again, this implies a ∈ ⋃⋃R. This
argument may be summarized as follows.
(a, b) ∈ R ⇔ {{a}, {a, b}} ∈ R
⇒ ∃y, ({a} ∈ y ∧ y ∈ R)
⇔ {a} ∈ ⋃R
⇒ ∃z, (a ∈ z ∧ z ∈ ⋃R)
⇔ a ∈ ⋃⋃R.
An almost identical argument shows that (a, b) ∈ R ⇒ b ∈ ⋃⋃R. By a double application of the ZF
union axiom (Definition 5.1.26 (4)), ⋃⋃R is a well defined set. It then follows by the replacement axiom
(Definition 5.1.26 (6)) that {a ∈ ⋃⋃R; ∃b ∈ ⋃⋃R, (a, b) ∈ R} and {b ∈ ⋃⋃R; ∃a ∈ ⋃⋃R, (a, b) ∈ R}
are both well defined sets. But since (a, b) ∈ R implies that a ∈ ⋃⋃R and b ∈ ⋃⋃R, these sets are the
same as {a; ∃b, (a, b) ∈ R} and {b; ∃a, (a, b) ∈ R} respectively, which are simply the definitions of Dom(R)
and Range(R). It follows that the domain and range of R are sets for any relation R.
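The role of the double union ⋃⋃R in Theorem 6.3.9 can be seen concretely on a small relation. The following Python sketch assumes that sets are modelled as frozensets and that ordered pairs have the form {{a}, {a, b}} of Definition 6.1.3. The sample relation and the names big_union, UUR, dom and rng are hypothetical.

```python
def pair(a, b):
    """The ordered pair {{a}, {a, b}} as nested frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def big_union(S):
    """The union of S: the set of all elements of elements of S."""
    return frozenset().union(*S)


# Hypothetical relation, stored as a set of ordered pairs in set form.
raw_pairs = {(1, "x"), (2, "x"), (2, "y")}
R = frozenset(pair(a, b) for a, b in raw_pairs)

# Two applications of the union operator strip both levels of set-braces.
UUR = big_union(big_union(R))

# Domain and range as subsets of the double union, as in the theorem.
dom = frozenset(a for a in UUR if any(pair(a, b) in R for b in UUR))
rng = frozenset(b for b in UUR if any(pair(a, b) in R for a in UUR))
```

For this relation, the double union contains exactly the four objects 1, 2, "x" and "y", and the domain and range are cut out of it by the two conditions.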
6.3.10 Remark: One may regard the steps in the proof of Theorem 6.3.9 as progressively replacing the
set-braces “{” and “}” with the set-union operator “⋃”. The two levels of nesting of a and b in set-braces
are replaced by two levels of set-union operators.
6.3.11 Theorem: R ⊆ (⋃⋃R) × (⋃⋃R) for any relation R.
Proof: The assertion follows immediately from Theorem 6.3.9.
6.3.12 Remark: The word “the” is part of the defined terms in Definition 6.3.6 because generally any set
X such that ∀z ∈ R, ∃a ∈ X, ∃b, z = (a, b) may be referred to as “a domain for R”. Similarly, any set Y
such that ∀z ∈ R, ∃a, ∃b ∈ Y, z = (a, b) may be referred to as “a range for R”. In other words, any set X
with Dom(R) ⊆ X may be called “a domain for R”, and any set Y with Range(R) ⊆ Y may be called “a
range for R”.
From this terminology, one may say that R is a relation from A to B if and only if R is a relation, and A is
a domain for R, and B is a range for R.
6.3.13 Definition: The image of a set A by a relation R is the set {b; ∃a ∈ A, (a, b) ∈ R}.
The pre-image (or inverse image) of a set B by a relation R is the set {a; ∃b ∈ B, (a, b) ∈ R}.
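Definition 6.3.13 translates directly into set comprehensions. In the following Python sketch, built-in tuples stand in for ordered pairs, and the sample relation R is hypothetical.

```python
def image(R, A):
    """{b; there is some a in A with (a, b) in R}."""
    return {b for (a, b) in R if a in A}


def preimage(R, B):
    """{a; there is some b in B with (a, b) in R}."""
    return {a for (a, b) in R if b in B}


# Hypothetical relation.
R = {(1, "x"), (2, "x"), (2, "y"), (3, "z")}

img = image(R, {1, 2})
pre = preimage(R, {"x"})
```

Note that the set A need not be a subset of Dom(R). Elements of A outside the domain simply contribute nothing to the image.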

6.3.14 Definition: The source set of a “relation from a set A to a set B” is the set A.
The target set of a “relation from a set A to a set B” is the set B.
6.3.15 Remark: The terms “source set” and “target set” in Definition 6.3.14 are probably non-standard.
These sets are often referred to colloquially as the “domain” and “range” of the relation. If a relation is
defined to be merely a set R of ordered pairs, as opposed to the tuple (R, A, B), then the sets A and B are
not properties of the set R. These sets are merely part of the context in which the relation is discussed. Thus
Dom(R) and Range(R) are well defined, whereas Source(R) and Target(R) are not well defined. Nevertheless
it is very often useful to be able to refer to the source and target sets. Therefore they need names which are
different to “domain” and “range”.
In view of the above comments, it should be noted that Definition 6.3.14 is in fact a pseudo-definition. These
sets are properties of the meta-language used in the discussion of a relation, not properties of the relation
itself. (Such pseudo-definitions are fairly rare in careful mathematics. An example of a pseudo-notation is the
commonly used “Mⁿ” for an n-dimensional manifold. The careful reader will notice such pseudo-notations
and pseudo-definitions when they occur, but hopefully not too many in this book!)
Definition 6.3.14 may be converted into a valid definition if the word “the” is replaced by “a” as in Defini-
tion 6.3.16.
6.3.16 Definition: A source set for a relation R is any set A such that A ⊇ Dom(R).
A target set for a relation R is any set B such that B ⊇ Range(R).
6.3.17 Remark: It is clear from Definitions 6.3.2 and 6.3.6 that if R is a relation between sets A and B,
then Dom(R) ⊆ A and Range(R) ⊆ B.
The word “range” is used in the mathematics literature for two different concepts. Sometimes the range of a
relation R means the set B (in the case of a relation between sets A and B), but equally often it means the
set {b; ∃a, (a, b) ∈ R}. It is the latter definition which is adopted here. This choice of definition is influenced
by the meaning of the English-language word “range”, but it also agrees with Halmos [160], page 27.
[ Remark 6.3.18 overlaps a lot with Remark 6.3.4. ]
6.3.18 Remark: The concept of a relation-set should be distinguished from a relation-predicate or “set-
theoretic relation”, which is a logic concept. A set-theoretic relation is a symbol in mathematical logic which
is built out of language primitives whereas a relation-set is a particular kind of set within set theory. There
is a third concept of “relation” which is the natural language meaning of the word. (To avoid confusion,
the word “relationship” could be used for the natural language meaning.) Unfortunately, all three concepts
are combined in mathematical writing. The reader must infer from context which of the three meanings is
intended.
6.3.19 Remark: A relation-set is not usually thought of as just a set of ordered pairs. Generally a relation
will be introduced into a mathematical discussion as a “relation between (sets) A and B” or a “relation from
(set) A to (set) B”. A relation is thought of as associating objects with each other. Thus the ordered pair
(a, b) associates object a with object b. A relation, being a set of ordered pairs, associates any number of
objects in this way.
Generally the writer or speaker will have in mind specific sets A and B between which the relation defines
associations. Therefore it is common to define a relation as a subset of a Cartesian product A × B. However,
it is, strictly speaking, unnecessary to specify the source and target sets A and B respectively. The fact that
a relation R is required in Definition 6.3.2 to be a set guarantees that Dom(R) and Range(R) are both sets.
From this it follows that R ⊆ A × B if A and B are chosen as A = Dom(R) and B = Range(R).
6.3.20 Remark: In some formalisms, a relation is defined as a triple of sets (R, A, B), where A and B are
sets and R ⊆ A × B. This is then generally abbreviated to R. Thus R −< (R, A, B), using the “chicken-foot”
notation for abbreviations introduced in Section 5.16. In such a formalism, the reader must decide according
to context whether R means the full tuple (R, A, B) or just the set of ordered pairs R ⊆ A × B.
The tuple formalism (R, A, B) has the advantage of communicating the intended context to the reader, but
it is often clumsy and confusing. The contextual sets A and B are probably best communicated in the
surrounding text.

6.3.21 Notation: a R b for a relation R means that (a, b) ∈ R.
6.3.22 Remark: The infix notation a R b is abstracted from the well-known notations for relations such
as a = b, a ≠ b, a ∈ b, a < b, a > b, a ≤ b, a ≥ b, a ⊆ b, a ⊇ b, a ≡ b, a ∼ b, a ≃ b, a ≈ b and a ≅ b.
6.3.23 Definition: The composition or composite of relations R1 and R2 is the set {(a, b); ∃c, ((a, c) ∈
R1 ∧ (c, b) ∈ R2 )}.
6.3.24 Notation: R2 ◦ R1 denotes the composition of two relations R1 and R2 .
6.3.25 Remark: The composite of two relations is a relation. This is almost obvious because a relation is
defined as a set of ordered pairs. The only non-obvious assertion is that the composite of two relations is a
set. The fact that someone writes down something of the form X = {x; P (x)} does not necessarily imply
that X is a set in Zermelo-Fraenkel set theory. Theorem 6.3.26 verifies that the composition of relations
yields a set.
6.3.26 Theorem: The composite R2 ◦ R1 of two relations R1 and R2 is a relation which satisfies R2 ◦ R1 ⊆
Dom(R1 ) × Range(R2 ).
Proof: Let R1 and R2 be relations. By Theorem 6.3.9, Dom(Ri ) and Range(Ri ) are sets for i = 1, 2.
Thus Ri ⊆ Dom(Ri ) × Range(Ri ) for i = 1, 2. Let R = R2 ◦ R1 denote the composition of R1 and R2 . Let
(a, b) ∈ R. Then (a, c) ∈ R1 and (c, b) ∈ R2 for some c. Therefore a ∈ Dom(R1 ) and b ∈ Range(R2 ). Hence
R ⊆ Dom(R1 ) × Range(R2 ).
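The composition in Definition 6.3.23 and the inclusion in Theorem 6.3.26 can be illustrated on finite relations. In the following Python sketch, built-in tuples stand in for ordered pairs, and the sample relations R1 and R2 are hypothetical.

```python
def dom(R):
    return {a for (a, _) in R}


def rng(R):
    return {b for (_, b) in R}


def compose(R2, R1):
    """R2 ◦ R1 = {(a, b); for some c, (a, c) in R1 and (c, b) in R2}."""
    return {(a, b) for (a, c) in R1 for (c2, b) in R2 if c == c2}


# Hypothetical relations.
R1 = {(1, "p"), (2, "q"), (3, "r")}
R2 = {("p", "X"), ("q", "X"), ("s", "Y")}

R = compose(R2, R1)
bound = {(a, b) for a in dom(R1) for b in rng(R2)}  # Dom(R1) × Range(R2)
```

Note the argument order. As in Notation 6.3.24, the relation which is applied first is written on the right.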
6.3.27 Definition: The inverse of a relation R between two sets A and B is the relation between B and
A specified as the set {(b, a); (a, b) ∈ R} of reversed pairs of R.
6.3.28 Notation: R⁻¹ denotes the inverse of a relation R.
[ Definition 6.3.29 should be extended to give a comprehensive list of names of properties of relations. See for
example EDM2 [35], 311. Possibly define these properties before this point instead. ]
6.3.29 Definition: A reflexive relation in a set X is a relation R in X such that ∀a ∈ X, (a, a) ∈ R.
A symmetric relation is a relation R such that ∀a, ∀b, ((a, b) ∈ R ⇒ (b, a) ∈ R).
A transitive relation is a relation R such that ∀a, ∀b, ∀c, (((a, b) ∈ R ∧ (b, c) ∈ R) ⇒ (a, c) ∈ R).
6.3.30 Theorem:
(i) The inverse (R⁻¹)⁻¹ of the inverse R⁻¹ of a relation R satisfies (R⁻¹)⁻¹ = R.
(ii) The composite R⁻¹ ◦ R of a relation R with its inverse R⁻¹ is a symmetric relation.
Proof: Part (i) follows directly from Definition 6.3.27 because reversing each pair of R twice restores it.
To prove part (ii), let R be a relation. Let S denote the composite R⁻¹ ◦ R of R with its inverse.
Suppose (x1 , x2 ) ∈ S = R⁻¹ ◦ R. Then for some y, (x1 , y) ∈ R and (y, x2 ) ∈ R⁻¹. So (x2 , y) ∈ R by the
definition of R⁻¹. Similarly, (y, x1 ) ∈ R⁻¹. Hence (x2 , x1 ) ∈ S by the definition of the composite. Therefore
S is symmetric.
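Both parts of Theorem 6.3.30 can be checked on a small example. In the following Python sketch, built-in tuples stand in for ordered pairs, and the sample relation R is hypothetical.

```python
def inverse(R):
    """The set of reversed pairs of R."""
    return {(b, a) for (a, b) in R}


def compose(R2, R1):
    """{(a, b); for some c, (a, c) in R1 and (c, b) in R2}."""
    return {(a, b) for (a, c) in R1 for (c2, b) in R2 if c == c2}


def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)


# Hypothetical relation.
R = {(1, "u"), (2, "u"), (3, "v")}

# The composite of R with its inverse relates x1 to x2 whenever
# they share a common right element y.
S = compose(inverse(R), R)
```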
6.3.31 Definition: An injective relation is a relation R which satisfies
∀x1 , ∀x2 , ∀y, (((x1 , y) ∈ R ∧ (x2 , y) ∈ R) ⇒ x1 = x2 ).
6.3.32 Theorem: The composite of any two injective relations is an injective relation.
Proof: Let R1 and R2 be injective relations. By Definition 6.3.23, the composite of R1 and R2 is the
relation R = {(a, b); ∃c, ((a, c) ∈ R1 ∧ (c, b) ∈ R2 )}. Suppose (a1 , b) ∈ R and (a2 , b) ∈ R. Then for some c1
and c2 , (a1 , c1 ), (a2 , c2 ) ∈ R1 and (c1 , b), (c2 , b) ∈ R2 . Since R2 is an injective relation, c1 = c2 . So a1 = a2
because R1 is an injective relation. Hence R is an injective relation.
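Theorem 6.3.32 can be illustrated with two finite injective relations. In the following Python sketch, built-in tuples stand in for ordered pairs, and the sample relations are hypothetical. Note that the relation R2 here is injective in the sense of Definition 6.3.31 although it is not a function, since "p" is related to two different right elements.

```python
def is_injective(R):
    """True if no right element y is shared by two distinct left elements."""
    first_left = {}
    for x, y in R:
        # setdefault records the first x seen for y and returns it thereafter.
        if first_left.setdefault(y, x) != x:
            return False
    return True


def compose(R2, R1):
    """{(a, b); for some c, (a, c) in R1 and (c, b) in R2}."""
    return {(a, b) for (a, c) in R1 for (c2, b) in R2 if c == c2}


# Hypothetical injective relations.
R1 = {(1, "p"), (2, "q")}
R2 = {("p", "X"), ("q", "Y"), ("p", "Z")}

R = compose(R2, R1)
```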
6.3.33 Definition: The (domain) restriction of a relation R to a set A is the relation {(x, y) ∈ R; x ∈ A}.
The range restriction of a relation R to a set B is the relation {(x, y) ∈ R; y ∈ B}.

6.4. Equivalence relations and partitions

6.4.1 Definition: An equivalence relation in a set X is a relation R in X such that
(i) ∀x ∈ X, x R x; (reflexivity)
(ii) ∀x, y ∈ X, (x R y ⇒ y R x); (symmetry)
(iii) ∀x, y, z ∈ X, ((x R y and y R z) ⇒ x R z). (transitivity)
6.4.2 Definition: A partition of a set X is a set S ⊆ ℙ(X) such that
(i) ⋃S = X.
(ii) ∀A, B ∈ S, (A = B or A ∩ B = ∅).
A set X is said to equal the disjoint union of S if S is a partition of X.
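For finite collections, the two conditions of Definition 6.4.2 can be tested mechanically. The following Python sketch is one way to do this. The function name is_partition is hypothetical. The condition S ⊆ ℙ(X) need not be tested separately because it follows from condition (i).

```python
from itertools import combinations


def is_partition(S, X):
    """Check conditions (i) and (ii) of Definition 6.4.2 for a finite S."""
    S = [frozenset(A) for A in S]
    covers = frozenset().union(*S) == frozenset(X)  # condition (i)
    disjoint = all(
        A == B or not (A & B) for A, B in combinations(S, 2)
    )  # condition (ii)
    return covers and disjoint
```

For example, is_partition([{1, 2}, {3}], {1, 2, 3}) holds, whereas the collection [{1, 2}, {2, 3}] fails condition (ii) and the collection [{1}, {2}] fails condition (i) for the same set X.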
6.4.3 Remark: There is a slight abuse of language in the term “disjoint union”. This term creates the
impression that there are two kinds of union: an ordinary union ⋃S and a disjoint union of S, which many
people denote as ⋃̇S. However, the phrase “X is the disjoint union of S” means “X is the union of S, and S
is a pairwise disjoint collection of sets”. The phrase “disjoint union of S” means “union of S, which happens
to be a pairwise disjoint collection”. If the collection S doesn’t happen to be pairwise disjoint, then
“the disjoint union of S” doesn’t have any meaning.
[ Here present the relationship between a partition and an equivalence relation. ]
6.4.4 Definition: The quotient set of a set X with respect to an equivalence relation R is the set X/R
defined as the set of equivalence classes of X with respect to R. . .
[ For the quotient set, see Halmos [160], page 28. ]
6.4.5 Remark: The set X/R of equivalence classes of a set X with respect to a relation R may also be
called an identification set or identification space. The term “identification space” implies the inheritance
by X/R of some structure on the set X, such as a topology or linear space structure. Another name for the
quotient set X/R is the classification of X by R. It can also be called simply the partition of X by R.
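The quotient set can be computed explicitly for a finite set. The following Python sketch assumes the usual meaning of an equivalence class, namely the set of all y ∈ X with x R y, since the text of Definition 6.4.4 does not spell this out. The example relation, congruence modulo 3, is hypothetical.

```python
def quotient(X, R):
    """X/R as the set of classes {y in X; (x, y) in R}, one for each x in X."""
    return {frozenset(y for y in X if (x, y) in R) for x in X}


X = range(6)
# Hypothetical equivalence relation in X: congruence modulo 3.
R = {(x, y) for x in X for y in X if (x - y) % 3 == 0}

classes = quotient(X, R)
```

Since the result is a set of sets, equivalent elements of X yield the same class only once. The classes obtained here cover X and are pairwise disjoint, which is why X/R may be called the partition of X by R.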
6.5. Functions
6.5.1 Remark: The word “function” refers to two kinds of mathematical entity:
(i) a rule-based “set-theoretic function” (specified as a “set-theoretic formula” as in Definition 5.1.22);
(ii) a special kind of relation-set (as defined in Definition 6.3.2).
A rule-based (“set-theoretic”) function may be thought of as a procedure or a sequence of operations in the
logic layer which yields a new set from a given set. The domain of a set-theoretic function is not necessarily
a set. As an example, the operation of constructing the union X ∪ {∅} from any given set X is clearly a
well-defined operation on all sets. But the “set of all sets” is not a set. This kind of logic-layer set-theoretic
function is not the subject of this section.
A reasonable name for the kind of function in part (i) would be a “function-predicate” by analogy with the
“relation-predicate” alluded to in Remark 6.3.4. Then a reasonable name for the kind of function in part (ii)
would be a “function-set” by analogy with the corresponding “relation-set”.
6.5.2 Remark: Functions are generally represented in set theory as particular kinds of sets, but they are
usually thought of as being a separate class of object – something like a machine which produces outputs
for given inputs. The representation of functions as sets is an economical measure which keeps the number
of object classes low. (Conceptual economy is discussed by Halmos [160], section 6. Dissatisfaction with the
passive definition of a function as a set is discussed by Halmos [160], section 8.)

6.5.3 Remark: It may be that the feeling which mathematicians have that a function is different to a set
is due to the historical origins of functions. In the olden days, for instance, the square of an integer x was
defined by a procedure of multiplication of x by itself, which was an active process of generating one number
from another. But the set definition of the “square function” is more like a look-up table in computing. The
set definition is a set of ordered pairs (x, x^2 ). So to calculate the square of a number with the set definition,
you look up the value in the set of ordered pairs. In a more active definition of functions, you would specify
an algorithm or procedure. It seems to have been necessary historically to abandon functions defined as
procedures in favour of functions defined as look-up tables in order to remove an arbitrary limitation on
the set of functions that one can discuss. The down side of this has been the loss of ‘active mood’ in the
definition of functions.
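The contrast between the two views can be sketched in a few lines of Python. This is only an illustration on an assumed finite domain {0, 1, 2, 3}; the names square_rule, square_pairs and lookup are hypothetical and are not notation from this book.

```python
# Sketch: the square function on the (assumed) finite domain {0, 1, 2, 3}.

def square_rule(x):
    """Active view: a procedure which computes the value from the argument."""
    return x * x

DOMAIN = {0, 1, 2, 3}

# Passive view: the function is its set of ordered pairs (x, x^2).
square_pairs = {(x, x * x) for x in DOMAIN}

def lookup(pairs, x):
    """Evaluate a function-set by finding the unique pair whose first entry is x."""
    values = [y for (a, y) in pairs if a == x]
    if len(values) != 1:
        raise KeyError(x)
    return values[0]
```

The look-up table gives the same values as the procedure on this domain, but it carries no trace of how those values were generated.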
6.5.4 Remark: The nouns “function” and “map” are used synonymously in this book. In many contexts,
the word “map” indicates a function between sets which are peers in some sense (such as differentiable
manifolds), whereas “function” is then used to indicate a more light-weight function such as a real-valued
function. The word “mapping” is synonymous with “map”, and a “family” is really the same thing as a
general function except that it is thought about differently. All of these synonyms for “function” are useful
for putting the focus on different aspects of functions.
6.5.5 Remark: Whereas a relation is introduced in Definition 6.3.2 as a set of ordered pairs without any
specification of an explicit domain or range set, most introductory texts do explicitly define a function in
terms of a specified domain set and range set. But functions, like relations, do not require the domain or range to
be specified in advance. The domain can always be determined from the set of ordered pairs in a function.
Similarly, any set which contains all values of the function may be considered to be a range set for it.
Therefore Definition 6.5.6 introduces three levels of specification of a function. When neither the domain
nor the range is specified, condition (i) requires only that the value of the function must have a unique value
if it has a value at all. When the domain is specified, it is required by condition (ii) to be equal to the
set of values for which the relation does have a value. When both the domain and range sets are specified,
condition (iii) only requires that the specified range set Y should contain all of the values of the relation f .
It does not need to equal the set of values of the relation.
6.5.6 Definition: A function is a relation f such that
(i) ∀x, ∀y1 , ∀y2 , (((x, y1 ) ∈ f ∧ (x, y2 ) ∈ f ) ⇒ y1 = y2 ).
A function on X, for any set X, is a function f such that
(ii) Dom(f ) = X.
A function from X to Y , for any sets X and Y , is a function f on X such that
(iii) Range(f ) ⊆ Y .
6.5.7 Notation: f : X → Y means that f is a function from X to Y .
6.5.8 Remark: A relation f is a function f : X → Y if and only if ∀x ∈ X, ∃′ y ∈ Y, (x, y) ∈ f . This can be
expressed in terms of set cardinality as ∀x ∈ X, #{y ∈ Y ; (x, y) ∈ f } = 1. In other words, there is one and
only one y for each x ∈ X such that (x, y) ∈ f . In terms of the set map f¯ in Definition 6.6.1, the condition
for a relation f to be a function f : X → Y may be written as ∀x ∈ X, #(f¯({x})) = 1.
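The three levels of specification in Definition 6.5.6 can be tested mechanically for finite relations. The following is a minimal sketch, assuming that a relation is represented as a Python set of pairs; the helper names dom, rng, is_function, is_function_on and is_function_from_to are hypothetical.

```python
def dom(f):
    """Domain of a relation given as a set of ordered pairs."""
    return {x for (x, y) in f}

def rng(f):
    """Range (set of values) of a relation given as a set of ordered pairs."""
    return {y for (x, y) in f}

def is_function(f):
    # Condition (i): each x has at most one y, so the number of pairs
    # equals the number of distinct first entries.
    return len(dom(f)) == len(f)

def is_function_on(f, X):
    # Condition (ii): the domain is exactly X.
    return is_function(f) and dom(f) == set(X)

def is_function_from_to(f, X, Y):
    # Condition (iii): the target set Y only needs to contain the values.
    return is_function_on(f, X) and rng(f) <= set(Y)

f = {(1, "a"), (2, "a")}   # a function on {1, 2}
r = {(1, "a"), (1, "b")}   # a relation which is not a function
```

Here is_function_from_to(f, {1, 2}, Y) holds for every set Y which contains "a", which reflects the arbitrariness of the target set.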
6.5.9 Remark: The domain, range and image of a function are defined exactly as for relations in
Definition 6.3.6. The notations Dom(f ), Range(f ) and Im(f ) are defined for functions exactly as for relations in
Notation 6.3.7.
6.5.10 Remark: The range of a function f from X to Y is sometimes defined to be the set Y rather than
the set {y ∈ Y ; ∃x ∈ X, (x, y) ∈ f } ⊆ Y , which is not generally the same set. The term “image”, however,
is always the set {y ∈ Y ; ∃x ∈ X, (x, y) ∈ f } of values of f . In this book, both the range and image
of a function are understood to be the set of values {y ∈ Y ; ∃x ∈ X, (x, y) ∈ f } of the function f as in
Definition 6.3.6. The term “image” should be preferred because it is less ambiguous. A practical difficulty
with the word “image” is the fact that the abbreviation Im clashes with the abbreviation for the imaginary
part of a complex number.

For maximum clarity one should use the term “target set” for the set Y in the phrase “function from X
to Y ”, and “image” for the set of values of f . The arbitrariness of the target set is a nuisance which is
accepted for good reasons.
6.5.11 Definition: An argument of a function f is any element of the domain of f .
A value of a function f is any element of the range of f .
The value of a function f for an argument x of f is the value y of f such that (x, y) ∈ f .
6.5.12 Notation: f (x) denotes the value of a function f for an argument x ∈ Dom(f ).
fx denotes the value of a function f for an argument x ∈ Dom(f ).
(fi )i∈X is an alternative notation for a function f with domain X.
6.5.13 Notation: Y^X , for sets X and Y , denotes the set of functions from X to Y .
6.5.14 Remark: The set Y^X is the set of functions f such that Dom(f ) = X and Range(f ) ⊆ Y .
6.5.15 Notation: {f : X → Y ; P (f )} for sets X and Y and set-theoretic formula P means the set
{f ∈ Y^X ; P (f )}.
6.5.16 Remark: If ⊤ denotes the always-true set-theoretic formula with zero arguments (as in
Notation 4.12.10), then the set Y^X may be written in terms of Notation 6.5.15 as {f : X → Y ; ⊤}. But then the
dummy variable f does not appear in the expression P (f ) = ⊤. So it seems superfluous to write the letter
f at all. (One could make use here of the single-parameter always-true logical predicate which is alluded to
in Remark 4.12.9.) The set Y^X is sometimes written as {f : X → Y }, which also contains the superfluous
dummy variable f .
In this book, the notation “X → Y ” is proposed as an equivalent for Y^X . (See Notation 6.12.2.) This
notation has the advantage that it avoids the superfluous dummy variable, but it is slightly non-standard.
One place where the author has found this sort of notation is in a computer software user manual for the
Isabelle/HOL “proof assistant for higher-order logic” [166], page 5.
6.5.17 Definition: The empty function is the set ∅.
6.5.18 Remark: If f is the empty function f = ∅ then Dom(f ) = ∅ and Range(f ) = ∅. It follows that f
is a function from X to Y if and only if X = ∅. In other words, the target set is arbitrary.
6.5.19 Definition: The identity function on a set X is the function f : X → X with ∀x ∈ X, f (x) = x.
6.5.20 Remark: The identity function on any set X is clearly the same thing as {(x, x); x ∈ X}.
6.5.21 Notation: idX for any set X denotes the identity function on X.
6.5.22 Remark: Since the identity function idX is parametrized by a set which is defined in the context
where it is used, it is really a kind of “meta-function” or “function template”.
6.5.23 Definition: A function f : X → Y is injective (or an injection) if ∀x1 , x2 ∈ X, (f (x1 ) = f (x2 ) ⇒
x1 = x2 ). An injective function is also said to be one-to-one or 1–1.
A function f : X → Y is surjective (or a surjection) if ∀y ∈ Y, ∃x ∈ X, f (x) = y. A surjective function is
also said to be onto.
A function f : X → Y is bijective (or a bijection) if it is injective and surjective.
6.5.24 Remark: Whereas the injective property is independent of the choice of range set Y , the surjective
property depends completely on the set Y . In fact, since a function f is specified as a set of ordered pairs,
it is possible to determine the domain of f as the set Dom(f ) = {x; (x, y) ∈ f }, but it is not possible to
determine the range of f from only the ordered pairs of f . That is, the range of a function is not an attribute
of the function as usually specified. One can only determine that Y ⊇ Range(f ) = {y; (x, y) ∈ f }. Then
a function f : X → Y is said to be surjective if Y = Range(f ), which means that surjectivity is a relation
between a function f and a given set Y , not an attribute of the function as injectivity is.
It follows that the target space Y of a function f : X → Y must always be stated when asserting that a
function is surjective or onto. It is best to say explicitly something like “f is onto Y ” or “f is surjective
to Y ”.
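The asymmetry between injectivity and surjectivity shows up directly when a function is stored as a bare set of pairs. In the following sketch, the helper names is_injective and is_surjective_onto are hypothetical, and the input is assumed to be a function already. The injectivity test needs only the pairs, whereas the surjectivity test needs the target set as an extra argument.

```python
def is_injective(f):
    """For a function f (set of pairs): distinct arguments give distinct values."""
    return len({y for (_, y) in f}) == len(f)

def is_surjective_onto(f, Y):
    """Surjectivity is a relation between f and a stated target set Y."""
    return {y for (_, y) in f} == set(Y)

f = {(1, "a"), (2, "b")}
g = {(1, "a"), (2, "a")}
```

The same set of pairs f is onto {"a", "b"} but not onto {"a", "b", "c"}, so the question “is f surjective?” has no answer until the target set is stated.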

6.5.25 Theorem: A function f : X → Y is a bijection if and only if the inverse relation f −1 : Y → X is
a function.
6.5.26 Remark: It is not necessary to define “inverse function” because it is the same as the “inverse
relation” if the “inverse function” is well defined.
The inverse of a function is always well defined, but the inverse might not be a function. Thus “the inverse
relation f −1 ” is well defined for any function (or relation) f , but “the inverse function f −1 ” is not well
defined unless f is a bijection.
It is important to distinguish between “the inverse of a function f ” (which is always defined) and “the inverse
function f −1 ” (which is only defined if f is a bijection).
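For finite examples, the distinction can be checked by swapping the entries of each pair. This is a small sketch with hypothetical helper names (inverse, dom, is_function), using sets of pairs as the representation of relations.

```python
def inverse(f):
    """Inverse relation: always defined, for any set of ordered pairs."""
    return {(y, x) for (x, y) in f}

def dom(f):
    return {x for (x, y) in f}

def is_function(f):
    """A relation is a function if no first entry occurs in two pairs."""
    return len(dom(f)) == len(f)

g = {(1, "a"), (2, "a")}   # not injective
h = {(1, "a"), (2, "b")}   # injective

g_inv = inverse(g)         # {("a", 1), ("a", 2)}: a relation, not a function
h_inv = inverse(h)         # {("a", 1), ("b", 2)}: a function on {"a", "b"}
```

In the sense of Theorem 6.5.25, h is a bijection from {1, 2} to Y = {"a", "b"} because h_inv is a function on Y. For the larger target set Y = {"a", "b", "c"}, the inverse relation h_inv is unchanged, but it is not a function on Y since its domain is only {"a", "b"}.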
6.5.27 Definition: A restriction of a function f is any function g such that g ⊆ f .
The restriction to a set A of a function f is the function {(x, y) ∈ f ; x ∈ A} = f ∩ (A × Range(f )).
6.5.28 Notation: f|_A for a function f and set A denotes the restriction of f to A.
6.5.29 Remark: For expressions F , the notation F (x)|_{x=a} means F (a). This is useful for substituting a
value into a complicated expression as in this example:
( ∂/∂x^j (ψ_β ◦ ψ_α^{−1} (x))^i ) |_{x=ψ_α (p)} .
This is not the same thing as function restriction. The notation F (x)|_{x=a} denotes substitution of a value a,
which gives the unique value of the expression F when it is restricted to the set {a} ⊆ Dom(F ). Clearly
F (a) is not exactly the same thing as F|_{a} .
6.5.30 Remark: It is not necessary for the set A in Definition 6.5.27 to be a subset of the domain X of f .
A function g is a restriction of a function f if and only if it is the restriction of f to A for some set A. Suppose
g is a restriction of f . Then g ⊆ f ; so g = f|_Dom(g) . Conversely, if g = f|_A , then clearly g ⊆ f by the
definition of f|_A . Definition 6.5.31 defines a function extension so that g is an extension of f if and only if
f is a restriction of g.
6.5.31 Definition: An extension of a function f is any function g such that f ⊆ g.
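Restriction is easy to experiment with when functions are stored as sets of pairs. The sketch below uses the hypothetical helper names restrict and dom; the set A is deliberately chosen so that it is not a subset of the domain.

```python
def restrict(f, A):
    """Restriction of f to A: keep the pairs whose first entry lies in A."""
    return {(x, y) for (x, y) in f if x in A}

def dom(f):
    return {x for (x, y) in f}

f = {(1, 1), (2, 4), (3, 9)}
A = {2, 3, 7}            # 7 is not in Dom(f); this is permitted
g = restrict(f, A)       # {(2, 4), (3, 9)}
```

The result g is a subset of f, and restricting f to Dom(g) gives g again, which is the argument in the preceding remark in miniature.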
6.6. Function set maps and inverse set maps
6.6.1 Definition: The set map corresponding to a function f : X → Y is the function f¯ : IP(X) → IP(Y )
defined by f¯(A) = {f (x); x ∈ A} for all A ⊆ X.
The inverse set map corresponding to a function f : X → Y is the function f¯−1 : IP(Y ) → IP(X) defined by
f¯−1 (B) = {x ∈ X; f (x) ∈ B} for all B ⊆ Y .
6.6.2 Remark: The set map f¯ in Definition 6.6.1 is illustrated in Figure 6.6.1.
[ Diagram: f : X → Y maps x to f (x); f¯ : IP(X) → IP(Y ) maps A to f¯(A) = {f (x); x ∈ A}. ]
Figure 6.6.1 Set map between power sets

The set maps f¯ and f¯−1 in Definition 6.6.1 are usually denoted simply as f and f −1 . This notation re-use,
although economical, leads to actual contradictions when the domain or range of f has elements which are
contained in other elements. Thus for instance, if f : X → Y with ∅ ∈ X, then f (∅) may refer to the original
function f or the corresponding set map f¯. If this seems to be a problem for pathological sets X and Y only,
consider X = ω, the set of finite ordinal numbers. In this case, every element of ω is also a subset of ω, which
makes it absolutely essential to distinguish between a function f : X → Y and its corresponding set map f¯.
Despite the real danger, most authors use such ambiguous notation. For clarity, as in Theorems 6.6.3
and 6.6.4, different notation may be used for set maps. This kind of difficulty could be resolved by placing a
tag on each element of X and IP(X) to distinguish their set membership. But this is not the way mathematics
is currently done.
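The clash can be reproduced with the first two finite ordinals. The following sketch assumes that the ordinals 0 = ∅ and 1 = {∅} are modelled by Python frozensets and that the function is stored as a dictionary; the names empty, one and set_map are hypothetical.

```python
empty = frozenset()            # the ordinal 0, which is the set ∅
one = frozenset({empty})       # the ordinal 1, which is the set {∅}

X = {empty, one}               # both elements of X are also subsets of X
f = {empty: "a", one: "b"}     # a function on X

def set_map(f, A):
    """The set map of f applied to a subset A of the domain."""
    return {f[x] for x in A}

value_at_empty = f[empty]              # the function value f(∅)
image_of_empty = set_map(f, empty)     # the set map value for the subset ∅
value_at_one = f[one]                  # the function value f({∅})
image_of_one = set_map(f, one)         # the set map value for the subset {∅}
```

The function value at ∅ is "a" while the set map value at ∅ is the empty set, and the function value at {∅} is "b" while the set map value at {∅} is {"a"}. A single symbol f cannot mean both.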
6.6.3 Theorem: Let f : X → Y be a function, and let f¯ : IP(X) → IP(Y ) denote the set map corresponding
to f . Then the following statements are true for any sets A, B ∈ IP(X).
(i) f¯(∅) = ∅ and f¯(X) ⊆ Y .
(ii) A ⊆ B ⇒ f¯(A) ⊆ f¯(B).
(iii) f¯(A ∪ B) = f¯(A) ∪ f¯(B).
(iv) f¯(A ∩ B) ⊆ f¯(A) ∩ f¯(B).
(v) f¯(X \ A) ⊇ f¯(X) \ f¯(A).
[ For properties of set maps and inverse set maps, see EDM2 [35], 381.C. ]
6.6.4 Theorem: Let f : X → Y be a function, and let f¯−1 : IP(Y ) → IP(X) denote the inverse set map
corresponding to f . Then the following statements are true for any sets A, B ∈ IP(Y ).
(i) f¯−1 (∅) = ∅ and f¯−1 (Y ) = X.
(ii) A ⊆ B ⇒ f¯−1 (A) ⊆ f¯−1 (B).
(iii) f¯−1 (A ∪ B) = f¯−1 (A) ∪ f¯−1 (B).
(iv) f¯−1 (A ∩ B) = f¯−1 (A) ∩ f¯−1 (B).
(v) f¯−1 (Y \ A) = f¯−1 (Y ) \ f¯−1 (A).
And so forth . . .
If f is surjective, then (ii) has the stronger form
(ii′) A ⊆ B ⇔ f¯−1 (A) ⊆ f¯−1 (B). Therefore A = B ⇔ f¯−1 (A) = f¯−1 (B).
[ Try to think up some statements here for when f is a bijection. ]
Proof: Part (ii) is elementary. To show (ii′), suppose that f¯−1 (A) ⊆ f¯−1 (B) and let y ∈ A. Then f (x) = y
for some x ∈ X because f is surjective. So x ∈ f¯−1 (A). Therefore x ∈ f¯−1 (B) and so y = f (x) ∈ B.
...
6.6.5 Remark: The reason for the simplicity of Theorem 6.6.4 relative to Theorem 6.6.3 is the fact that
the inverse of a function is automatically one-to-one and onto even though the inverse may not be a function.
6.6.6 Theorem: Let f : X → Y and let f¯ and f¯−1 be as in Theorems 6.6.3 and 6.6.4. Then:
(i) ∀S ∈ IP(Y ), f¯(f¯−1 (S)) ⊆ S, with equality if f is surjective.
(ii) ∀S ∈ IP(X), f¯−1 (f¯(S)) ⊇ S.
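A small finite example shows where the set map loses information and the inverse set map does not. This is only a sketch: the function f, the target set Y and the helper names image and preimage are assumptions chosen for illustration, with f stored as a dictionary which is neither injective nor onto Y.

```python
f = {1: "a", 2: "a", 3: "b"}       # f : X -> Y, neither injective nor onto Y
X = set(f)
Y = {"a", "b", "c"}

def image(f, A):
    """Set map: the set of values f(x) for x in A."""
    return {f[x] for x in A}

def preimage(f, B):
    """Inverse set map: the set of x in Dom(f) with f(x) in B."""
    return {x for x in f if f[x] in B}

# The set map only satisfies an inclusion for intersections.
A, B = {1}, {2}
lhs = image(f, A & B)                      # set()
rhs = image(f, A) & image(f, B)            # {"a"}

# The inverse set map preserves intersections exactly.
P, Q = {"a", "c"}, {"a", "b"}
pre_lhs = preimage(f, P & Q)
pre_rhs = preimage(f, P) & preimage(f, Q)

# Round trips: one loses the part of S outside the image, the other may gain.
there_and_back = image(f, preimage(f, {"a", "c"}))   # {"a"}
back_and_there = preimage(f, image(f, {1}))          # {1, 2}
```

In this example the round trip through the inverse set map returns only the part of {"a", "c"} which lies in the image of f, because "c" is not a value of f.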
6.6.7 Remark: Theorem 6.6.8 extends Theorems 6.6.3 and 6.6.4 to arbitrary sets of sets. This theorem
involves the “double set map” f¯ : IP(IP(X)) → IP(IP(Y )) for the function f , which is defined by f¯ : S ↦
{f¯(A); A ∈ S}, and the “double inverse set map” (the set map of the inverse set map) f¯−1 : IP(IP(Y )) →
IP(IP(X)) defined by f¯−1 : S ↦ {f¯−1 (A); A ∈ S}.
6.6.8 Theorem: Let f : X → Y be a function with set map f¯ : IP(X) → IP(Y ) and inverse set map
f¯−1 : IP(Y ) → IP(X). Then the following statements are true for any S ∈ IP(IP(X)) and S′ ∈ IP(IP(Y )).
(i) f¯(⋃S) = ⋃{f¯(A); A ∈ S} = ⋃f¯(S).
(ii) f¯(⋂S) ⊆ ⋂{f¯(A); A ∈ S} = ⋂f¯(S) if S ≠ ∅.
(iii) f¯−1 (⋃S′ ) = ⋃{f¯−1 (A); A ∈ S′ } = ⋃f¯−1 (S′ ).
(iv) f¯−1 (⋂S′ ) = ⋂{f¯−1 (A); A ∈ S′ } = ⋂f¯−1 (S′ ) if S′ ≠ ∅.
And so forth . . .
6.6.9 Theorem: Let X and Y be sets and let f : X → Y be a function from X to Y . Then X is the
disjoint union of the sets f −1 ({y}) for y ∈ Y . That is,
X = ⋃_{y∈Y} f −1 ({y})
and
∀y1 , y2 ∈ Y, (y1 ≠ y2 ⇒ f −1 ({y1 }) ∩ f −1 ({y2 }) = ∅).
6.6.10 Remark: Theorem 6.6.9 is illustrated in Figure 6.6.2.
[ Diagram: the sets f −1 ({y1 }) and f −1 ({y2 }) in X lying over the points y1 and y2 of Y . ]
Figure 6.6.2 Partitioning of a set X by an inverse function f −1
Functions provide a useful tool for partitioning sets. The value of a function f on a set X may be thought
of as a tag which identifies elements of X which belong to the same part of the partition. In other words, a
function effectively defines an equivalence relation on its domain.
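The partitioning of a domain by function values can be checked on a small example. The sketch below assumes a function stored as a dictionary and uses the hypothetical helper name fibres; the target set Y is chosen to contain one element which is not a value of f, so that one of the sets f −1 ({y}) is empty.

```python
def fibres(f, Y):
    """Map each y in Y to the set of arguments x with f(x) = y."""
    return {y: {x for x in f if f[x] == y} for y in Y}

f = {x: x % 3 for x in range(6)}     # tags 0, 1, 2 on the set {0, ..., 5}
Y = {0, 1, 2, 3}                     # 3 is not a value of f

parts = fibres(f, Y)
union_of_parts = set().union(*parts.values())
pairwise_disjoint = all(
    parts[y1].isdisjoint(parts[y2]) for y1 in Y for y2 in Y if y1 != y2
)
```

The union of the fibres recovers the whole domain and distinct fibres never overlap. Two arguments are equivalent under the induced equivalence relation exactly when they carry the same tag.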
6.6.11 Remark: Theorem 6.6.9 provides the foundation for non-topological fibrations (groupless fibre
bundles). The tuple (E, π, B) could be regarded as a fibration if π : E → B is any function from E to B.
Then the “total space” E is partitioned by the “fibre sets” π −1 ({b}) for b ∈ B. In other words, the set
{π −1 ({b}); b ∈ B} is a partition of E. (See Section 22.1 for non-topological fibrations.)
6.7. Composition of functions

6.7.1 Definition: The composition or composite of two functions f : A → B and g : C → D such that
B ⊆ C is the function h : A → D defined by h(x) = g(f (x)) for all x ∈ A.
6.7.2 Notation: g ◦ f denotes the composition of two functions f and g.
6.7.3 Remark: The composition of functions f and g in Definition 6.7.1 would be a well-defined function
under the weaker assumption that Im(f ) ⊆ C. However, it is always possible to define B to be Im(f ) anyway.
A further inconvenience caused by the arbitrariness of the range set D is that although D may equal Im(g),
the image of g ◦ f might be a proper subset of Im(g) (unless Im(f ) = C). Thus one may wish to adjust the
target set of g ◦ f to a set other than D. In that case the expression h : A → D in Definition 6.7.1 would
not hold. This shows the nuisance value of always having to attach a useless target set to every function.
Definition 6.3.23 gives an even more general composite g ◦ f of functions f and g which does not even require
Range(f ) ⊆ Dom(g). However, such a composite is not always defined everywhere on Dom(f ). Partially
defined functions are presented in Section 6.11.
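The composite of Definition 6.7.1 and the shrinking of its image can be seen in a short sketch. The functions are stored as dictionaries, and the name compose is a hypothetical helper which follows the order of Notation 6.7.2, so that compose(g, f) applies f first.

```python
def compose(g, f):
    """The composite g ◦ f: apply f first and then g. Requires Im(f) ⊆ Dom(g)."""
    return {x: g[f[x]] for x in f}

f = {1: "a", 2: "a"}          # f : A -> B with A = {1, 2}, B = {"a", "b"}
g = {"a": 10, "b": 20}        # g : C -> D with C = {"a", "b"}, D = {10, 20}

h = compose(g, f)             # h : A -> D
image_of_g = set(g.values())
image_of_h = set(h.values())
```

Here D equals the image of g, but the image of the composite is the proper subset {10}, because the image of f does not exhaust C.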

6.7.4 Remark: Although (f ◦g)(x) is the same thing as f (g(x)), f ◦g is not the same thing as f (g) because
f ◦ g is a function constructed from f and g, not the value of f for the argument g.
6.7.5 Remark: It is sometimes desirable to define g ◦ f even if Range(f ) ⊈ Dom(g) in Definition 6.7.1. In
this case, f must first be restricted to f −1 (Dom(g)) before composing f with g. It would be useful to be
able to denote this generalized composite also by g ◦ f , but this does not seem to be common practice.
6.7.6 Remark: A function composition of the form g ◦ f −1 : B → C for f : A → B and g : A → C may
be defined when f is not injective if the non-invertibility of f is somehow cancelled by the non-invertibility
of g. To make sense of this, suppose that f is surjective and that g(x1 ) = g(x2 ) whenever f (x1 ) = f (x2 )
for x1 , x2 ∈ A. (This means that ∀y ∈ B, ∃z ∈ C, f −1 ({y}) ⊆ g −1 ({z}), where z is unique for each y.) Then
the set g(f −1 ({y})) must be a singleton for all y ∈ B. Consequently g ◦ f −1 may be defined to map y to the
unique element of this singleton. This is formalized in Definition 6.7.7. (It is tempting to go wobbly at the
knees here and apply the axiom of choice to construct a right inverse f −1 for f from which g ◦ f −1 may be
constructed. Luckily that’s not the only way to do it.) Definition 6.7.7 is illustrated in Figure 6.7.1.
[ Diagram: f : A → B and g : A → C, with g ◦ f −1 : B → C mapping y to z. ]
Figure 6.7.1 Generalized right inverse function: g = (g ◦ f −1 ) ◦ f
6.7.7 Definition: The function quotient of a function g : A → C with respect to a surjective function
f : A → B such that ∀x1 , x2 ∈ A, (f (x1 ) = f (x2 ) ⇒ g(x1 ) = g(x2 )) is the function g ◦ f −1 : B → C defined
by g ◦ f −1 = {(f (x), g(x)); x ∈ A}.
6.7.8 Remark: The relation h = {(f (x), g(x)); x ∈ A} ⊆ B × C in Definition 6.7.7 is a function from
B to C because for all y ∈ B, there is at least one x ∈ A with f (x) = y since f is surjective, and the
uniqueness of g(x) for a given f (x) follows from the assumed relation between f and g. The function
quotient satisfies g = (g ◦ f −1 ) ◦ f . This may be regarded as a “generalized right inverse” of some sort.
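The construction in Definition 6.7.7 needs no choice of a right inverse for f, as the following sketch suggests. The functions are stored as dictionaries on a common domain, and the name function_quotient is a hypothetical helper. It builds the set of pairs (f (x), g(x)) and raises an error if that relation is not single-valued.

```python
def function_quotient(g, f):
    """Build {(f(x), g(x)); x in A} and check that it is a function."""
    pairs = {(f[x], g[x]) for x in f}
    h = dict(pairs)
    if len(h) != len(pairs):
        # Some value of f is paired with two different values of g.
        raise ValueError("g is not constant on the fibres of f")
    return h

A = range(6)
f = {x: x % 3 for x in A}            # surjective onto B = {0, 1, 2}
g = {x: (x % 3) ** 2 for x in A}     # constant on each fibre of f

h = function_quotient(g, f)          # {0: 0, 1: 1, 2: 4}
```

The identity g = (g ◦ f −1 ) ◦ f can then be confirmed pointwise, since h[f[x]] equals g[x] for every x in A.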
6.8. Families of sets and functions

Families are defined simply as functions. The main difference is in the notation and the focus on the function
values. The domain of a family is regarded as a mere index set which only provides tags for the values of
the function. A family usually has values that are all sets or all functions. A set of sets or functions is often
provided with tags to create a family out of a set.
Sets of things and families of things are often thought of interchangeably since a family can be constructed
from a set by providing tags for all elements of the set, and a set can be constructed from a family as the
range of the family. An important difference is the fact that the same object may appear twice in the range
of a family whereas there cannot be two copies of an object in a set. So if a set is indexed and the indexes
are removed, the original set is recovered. But if a family has its indices removed, it may not be possible to
reconstruct the family from the range of the family.
A family of sets or functions may be thought of as an “indexed set” of sets or functions. But the indexed
set is sometimes defined to be the range of the family.
Although families have the same set representation as functions, they are thought of as being a different
kind of object. This shows once again that mathematical objects have more significance than their set
representation alone. A similar observation is that all functions are represented as sets. So a family of
functions is also a family of sets, although a family of sets is not generally a family of functions.
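The loss of information in passing from a family to its range can be seen in a short sketch. The index labels and the variable names are hypothetical, and frozensets are used because Python sets cannot themselves be elements of a set.

```python
# A family of sets: a function from an index set to sets.
family = {"i": frozenset({1, 2}), "j": frozenset({1, 2}), "k": frozenset({3})}

# The range of the family forgets the indices and merges repeated values.
range_of_family = set(family.values())

# Conversely, a set can be indexed to make a family, and the indexing undone.
some_set = {frozenset({1, 2}), frozenset({3})}
indexed = dict(enumerate(sorted(some_set, key=sorted)))
recovered = set(indexed.values())
```

The family has three members but its range has only two elements, so the family cannot be rebuilt from its range. The set some_set, on the other hand, is recovered exactly after indexing and removal of the indices.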

6.8.1 Definition: A family of sets with index set I is a function S with domain I such that S(i) is a set
for all i ∈ I.
6.8.2 Notation: The value S(i) for a family of sets S is usually denoted as Si .
A family of sets S with index set I may be denoted as (Si )i∈I .
6.8.3 Remark: It is unnecessary to require all of the sets Si in a family (Si )i∈I in Definition 6.8.1 to be
subsets of a single set X. The sets Si are always subsets of the union of all the sets Si , which is a set by
Definition 5.1.26, Axiom (4).
6.8.4 Notation: As a convenience in notation, if two families (Ai )i∈I and (Bi )i∈I have the same index
set I, then the family of pairs (Ai , Bi ) may be denoted in abbreviated fashion as (Ai , Bi )i∈I instead of the
explicit notation ((Ai , Bi ))i∈I .
An n-tuple of families ((A^1_i )i∈I , (A^2_i )i∈I , . . . (A^n_i )i∈I ) for n ≥ 1 may be denoted as (A^1_i , A^2_i , . . . A^n_i )i∈I instead
of the explicit notation ((A^1_i , A^2_i , . . . A^n_i ))i∈I .
6.8.5 Definition: The union of a family of sets S = (Si )i∈I is the union of the range of S. In other words,
it is the set ⋃{Si ; i ∈ I}. The union of S may be denoted as ⋃_{i∈I} Si .
The intersection of a family of sets S = (Si )i∈I such that I ≠ ∅ is the intersection of the range of S. In
other words, it is the set ⋂{Si ; i ∈ I}. The intersection of S may be denoted as ⋂_{i∈I} Si .
6.8.6 Remark: Theorem 6.8.7 extends Theorem 5.14.7 to general families.
6.8.7 Theorem: Let A = (Ai )i∈I and B = (Bj )j∈J be families of sets and let C be a set. Then the
following statements are true.
(i) C ∪ (⋂_{i∈I} Ai ) = ⋂_{i∈I} (C ∪ Ai ) if A ≠ ∅.
(ii) C ∩ (⋃_{i∈I} Ai ) = ⋃_{i∈I} (C ∩ Ai ).
(iii) (⋂_{i∈I} Ai ) ∪ (⋂_{j∈J} Bj ) = ⋂_{(i,j)∈I×J} (Ai ∪ Bj ) if A ≠ ∅ and B ≠ ∅.
(iv) (⋃_{i∈I} Ai ) ∩ (⋃_{j∈J} Bj ) = ⋃_{(i,j)∈I×J} (Ai ∩ Bj ).
6.8.8 Remark: Theorem 6.8.9 is an indexed version of Theorem 6.6.8.
6.8.9 Theorem: Let f : X → Y be a function with set map f¯ : IP(X) → IP(Y ) and inverse set map
f¯−1 : IP(Y ) → IP(X). Then the following statements are true for any families of sets A = (Ai )i∈I : I → IP(X)
and B = (Bi )i∈I : I → IP(Y ).
(i) f¯(⋃_{i∈I} Ai ) = ⋃_{i∈I} f¯(Ai ).
(ii) f¯(⋂_{i∈I} Ai ) ⊆ ⋂_{i∈I} f¯(Ai ) if I ≠ ∅.
(iii) f¯−1 (⋃_{i∈I} Bi ) = ⋃_{i∈I} f¯−1 (Bi ).
(iv) f¯−1 (⋂_{i∈I} Bi ) = ⋂_{i∈I} f¯−1 (Bi ) if I ≠ ∅.
[ For more properties of functions for families of sets, see EDM2 [35], 381.D. ]
6.8.10 Remark: Theorems 6.8.11 and 6.8.13 are applicable to topology.
[ Theorems 6.8.11 and 6.8.13 look suspiciously similar to Theorem 5.15.4. Check to ensure that the axiom of
choice is not used in Theorems 6.8.11 and 6.8.13. ]
6.8.11 Theorem: Let f : I → IP(IP(X)) for sets I and X. Then ⋃{⋃f (A); A ∈ I} = ⋃⋃{f (A); A ∈ I}.
Proof: Approaching the equation ⋃{⋃f (A); A ∈ I} = ⋃⋃{f (A); A ∈ I} from the left-hand side,
x ∈ ⋃{⋃f (A); A ∈ I} ⇔ ∃A ∈ I, x ∈ ⋃f (A)
⇔ ∃A ∈ I, ∃B ∈ f (A), x ∈ B.
Approaching from the right-hand side,
x ∈ ⋃⋃{f (A); A ∈ I} ⇔ ∃B ∈ ⋃{f (A); A ∈ I}, x ∈ B
⇔ ∃B, (B ∈ ⋃{f (A); A ∈ I} ∧ x ∈ B)
⇔ ∃B, ∃A ∈ I, (B ∈ f (A) ∧ x ∈ B)
⇔ ∃A ∈ I, ∃B ∈ f (A), x ∈ B.
The result follows.
6.8.12 Remark: See Section 5.15 for a version of Theorem 6.8.13 which does not use functions.
6.8.13 Theorem: Let X be a set and Q ∈ IP(IP(X)). Let f : Q → IP(IP(X)) be a function such that
A = ⋃f (A) for all A ∈ Q. Then ⋃Q = ⋃{⋃f (A); A ∈ Q} = ⋃⋃{f (A); A ∈ Q}.
Proof: Clearly ⋃Q = ⋃{A; A ∈ Q} = ⋃{⋃f (A); A ∈ Q}, and from Theorem 6.8.11, it follows that
⋃{⋃f (A); A ∈ Q} = ⋃⋃{f (A); A ∈ Q}.
6.8.14 Definition: A family of functions with index set I is a function f with domain I such that f (i) is
a function for all i ∈ I.
6.8.15 Notation: The value f (i) for a family of functions f is usually denoted as fi .
A family of functions f with index set I may be denoted as (fi )i∈I .
6.8.16 Remark: It could be useful to define a notation π^A_B : X^A → X^B for general sets B ⊆ A and X by
π^A_B : f ↦ f|_B .
For any sets X, A and B and any function g : B → A, one could usefully define a notation πg : X^A → X^B
for the projection map πg : f ↦ f ◦ g.
Definition 6.9.8 defines specific projection maps for Cartesian products of set families such as ×i∈I Si .
6.9. Cartesian products of families of sets and functions

The elements of the Cartesian product in Definition 6.9.1 may be thought of as either functions or sets of
ordered pairs (the graphs of the functions). The perspective may be chosen according to one’s purposes.
6.9.1 Definition: The Cartesian product of a family of sets (Si )i∈I is the set of functions
{x : I → ⋃_{i∈I} Si ; ∀i ∈ I, xi ∈ Si }.
6.9.2 Notation: ×i∈I Si for a family of sets (Si )i∈I denotes the Cartesian product of the family of sets
according to Definition 6.9.1.
6.9.3 Remark: The Cartesian product in Definition 6.9.1 may be written as:
×_{i∈I} Si = {f ∈ IP(I × ⋃_{i∈I} Si ); (∀j ∈ I, ∃′ x ∈ ⋃_{i∈I} Si , (j, x) ∈ f ) ∧ (∀j ∈ I, ∃x ∈ Sj , (j, x) ∈ f )}.
6.9.4 Remark: If Si = X for all i ∈ I, then ×i∈I Si = X^I for any sets X and I. (See Notation 6.5.13 for
the set of functions X^I .)
If I = ℕ_n for some n ∈ ℤ^+_0 and Si = X for all i ∈ I, then ×i∈I Si = X^n . (See Notation 7.2.33 for index
sets ℕ_n .)
6.9.5 Remark: The Cartesian product in Definition 6.9.1 is a well-defined set because it is a subset of
IP(I × ⋃_{i∈I} Si ). But it cannot always be proven to be non-empty in general without an axiom of choice. Unless one
is trying very hard to construct pathological sets, it will usually be true that a Cartesian product will be
non-empty if all of the member sets are non-empty. Nevertheless, the non-emptiness of a Cartesian product
should not be assumed without at least thinking about the issue. Section 7.8 has some ideas on establishing
non-emptiness of Cartesian products for readers who dislike the axiom of choice.
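For a finite family of finite sets, the Cartesian product can simply be enumerated, and no choice axiom is in question. The sketch below assumes that a family is stored as a dictionary from indices to sets and that each element of the product is returned as a dictionary from indices to chosen elements. The name cartesian_product is a hypothetical helper built on the standard library function itertools.product.

```python
from itertools import product

def cartesian_product(S):
    """All functions x on the index set of S with x[i] in S[i] for every index i."""
    indices = list(S)
    return [dict(zip(indices, choice))
            for choice in product(*(S[i] for i in indices))]

S = {"p": {0, 1}, "q": {"x", "y", "z"}}
P = cartesian_product(S)                       # 2 * 3 = 6 functions

with_empty_member = cartesian_product({"p": {0, 1}, "q": set()})
of_empty_family = cartesian_product({})
```

If any member set is empty then the product is empty. The product of the empty family is not empty: it contains exactly one element, the empty function.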

6.9.6 Remark: The Cartesian product of a family of sets may be interpreted as the set of all tagged sets
of elements of the family of sets. In other words, one element is sampled from each of the sets in the family,
and the index of each set is tagged onto the chosen element to keep track of where it came from. This is
clearer in the case that all of the sets in the family are the same set. Thus the Cartesian product of a family
(Si )i∈I where Si = X for all i ∈ I may be interpreted as the set of all tagged subsets of X, where one element
of X must be chosen for each index i ∈ I. The partial Cartesian product in Definition 6.10.1 differs in that
all indices are optional – it is not necessary to choose an element for each index.
6.9.7 Remark: The definition of ×i∈I Si is reminiscent of the definition of a cross-section of a fibre bundle
because of the way that an element of Si must be chosen for each i ∈ I.
6.9.8 Definition: The projection maps for a Cartesian product S = ×i∈I Si are the functions πj : S → Sj
defined for j ∈ I by πj : (xi )i∈I ↦ xj for all (xi )i∈I ∈ S.
[ See Halmos [160], page 36, for set projections. ]

[ Define also projections for arbitrary subsets of the index set. Define cross-sections of arbitrary projection
maps as right inverses of surjective functions. ]
6.9.9 Remark: See Remark 6.8.16 for some more general notations for projection maps.
6.9.10 Definition: An n-ary function from a set X to a set Y for n ∈ ℤ^+_0 is a function f : X^n → Y .
[ Somewhere near here should be a definition and notation for sets like {X1 × X2 ; X1 ∈ S1 and X2 ∈ S2 },
where S1 and S2 are sets of sets. I don’t remember what such constructions are useful for. ]
6.9.11 Definition: The direct product of any two functions f and g is the function f × g : Dom(f ) ×
Dom(g) → Range(f ) × Range(g) defined by (f × g) : (x, y) ↦ (f (x), g(y)) for (x, y) ∈ Dom(f ) × Dom(g).
6.9.12 Definition: The pointwise direct product of any two functions f and g is the function f ×̇ g :
Dom(f ) ∩ Dom(g) → Range(f ) × Range(g) defined by (f ×̇ g)(x) = (f (x), g(x)) for all x ∈ Dom(f ) ∩ Dom(g).
6.9.13 Remark: Definition 6.9.11 is standard. (E.g. see EDM2 [35], 381.C.) Definition 6.9.12 and its
notations are probably non-standard. The pointwise direct product function is the same as the composition
of the direct product with a diagonal map. Let X = Dom(f ) ∩ Dom(g) in Definition 6.9.12 and define the
diagonal map d : X → X × X by d(x) = (x, x) for all x ∈ X. Then (f × g) ◦ d is the same as the pointwise
direct product f ×̇ g. In practice, the notation f × g will generally be used instead of f ×̇ g, but the context
should always resolve the ambiguity.
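The relation between the two products can be verified on a small example. In this sketch the functions are stored as dictionaries, and the names direct_product, pointwise_product, diagonal and compose are hypothetical helpers.

```python
def direct_product(f, g):
    """The map (x, y) -> (f(x), g(y)) on Dom(f) x Dom(g)."""
    return {(x, y): (f[x], g[y]) for x in f for y in g}

def pointwise_product(f, g):
    """The map x -> (f(x), g(x)) on the intersection of Dom(f) and Dom(g)."""
    return {x: (f[x], g[x]) for x in f.keys() & g.keys()}

def diagonal(X):
    """The diagonal map d : X -> X x X with d(x) = (x, x)."""
    return {x: (x, x) for x in X}

def compose(g, f):
    """The composite g ◦ f: apply f first and then g."""
    return {x: g[f[x]] for x in f}

f = {1: "a", 2: "b"}
g = {2: "u", 3: "v"}
X = f.keys() & g.keys()                 # the common domain {2}

via_diagonal = compose(direct_product(f, g), diagonal(X))
pointwise = pointwise_product(f, g)
```

Both constructions give the single pair which maps 2 to ("b", "u"), which is the statement that (f × g) ◦ d coincides with the pointwise direct product.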
6.10. Partial Cartesian products and identification spaces

6.10.1 Definition: The partial Cartesian product of a family of sets (Si )i∈I is the set of functions ×̊i∈I Si
defined by ×̊i∈I Si = {x : J → ⋃_{i∈I} Si ; J ⊆ I and ∀i ∈ J, xi ∈ Si }. (This is a well-defined set because it is
a subset of IP(I × ⋃_{i∈I} Si ).)
6.10.2 Remark: If the index set I is a subset of the integers, then the elements of a partial Cartesian
product may be referred to as “partial sequences”.
6.10.3 Remark: Definition 6.10.1 is probably non-standard. The idea here is that the functions x are not
necessarily defined on all of the index set I. The set ×̊i∈I Si is a superset of the standard Cartesian product
set ×i∈I Si . Elements of ×̊i∈I Si are restrictions of the elements of ×i∈I Si to arbitrary subsets of I. (These
restrictions are actually projections.) That is, the elements of the partial Cartesian product may be thought
of as families (xi )i∈I of the normal Cartesian product for which some of the elements xi may be undefined.
Unfortunately, set theory does not have a standard symbol or definition for an undefined element.
6.10.4 Remark: Unlike the situation with full Cartesian products, the partial Cartesian products in
Definition 6.10.1 are guaranteed to be non-empty, even if some or all of the sets in the family are empty.
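The guaranteed non-emptiness comes from the empty function, which is defined on the subset J = ∅ of the index set. The following sketch enumerates the partial Cartesian product of a finite family stored as a dictionary. The name partial_cartesian_product is a hypothetical helper, and each element is returned as a dictionary defined on some subset J of the index set.

```python
from itertools import combinations, product

def partial_cartesian_product(S):
    """All functions x defined on some subset J of the indices with x[i] in S[i]."""
    indices = list(S)
    result = []
    for size in range(len(indices) + 1):
        for J in combinations(indices, size):
            for choice in product(*(S[i] for i in J)):
                result.append(dict(zip(J, choice)))
    return result

S = {1: {"a", "b"}, 2: set()}           # one member set is empty
partial = partial_cartesian_product(S)
full = list(product(*S.values()))       # the full product is empty here

T = {1: {"a", "b"}, 2: {"c"}}
partial_T = partial_cartesian_product(T)
```

For the family S the full product is empty, but the partial product contains the empty function together with the two functions defined only at the index 1. For the family T there are (2 + 1)(1 + 1) = 6 partial functions, since each index is either omitted or given one of its permitted values.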

6.10.5 Definition: An identification space of the family of sets (Si )i∈I is a subset X of the partial
Cartesian product set ×̊i∈I Si such that
(i) ∀x ∈ X, x ≠ ∅,
(ii) ∀x, y ∈ X, ∀i ∈ I, (xi = yi ⇒ x = y), and
(iii) ∀i ∈ I, ∀s ∈ Si , ∃x ∈ X, xi = s.
In other words, the elements of X are non-empty elements of ×̊i∈I Si such that every element of each set Si
is the element xi of one and only one element x of X.
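The three conditions can be checked mechanically for finite families. The sketch below makes one interpretive assumption which is not stated in the definition: the equality xi = yi in condition (ii) is only tested for indices i at which both x and y are defined. Partial functions are stored as dictionaries, and the name is_identification_space and the element labels are hypothetical.

```python
def is_identification_space(X, S):
    """Check conditions (i), (ii), (iii) for a list X of partial functions on S."""
    # (i) Every element of X is a non-empty partial function.
    nonempty = all(len(x) > 0 for x in X)
    # (ii) Two elements of X which agree at a common index are equal.
    #      (Assumption: only indices where both are defined are compared.)
    separated = all(
        x == y
        for x in X for y in X for i in S
        if i in x and i in y and x[i] == y[i]
    )
    # (iii) Every element s of every S[i] occurs as x[i] for some x in X.
    covering = all(
        any(i in x and x[i] == s for x in X) for i in S for s in S[i]
    )
    return nonempty and separated and covering

S = {1: {"w1", "x1"}, 2: {"x2", "z2"}}
good = [{1: "w1"}, {1: "x1", 2: "x2"}, {2: "z2"}]     # x1 is grafted onto x2
bad = [{1: "w1"}, {1: "x1", 2: "x2"}, {1: "x1", 2: "z2"}]
```

The list good identifies x1 with x2 and leaves w1 and z2 ungrafted. The list bad fails condition (ii) because the element x1 occurs in two different elements of X.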
6.10.6 Remark: Identification spaces are closely related to quotient sets, which are introduced in
Definition 6.4.4. [ Near here, give a commentary on each of the three conditions of Definition 6.10.5. ]
Definition 6.10.5 requires some interpretation. Strictly speaking, a subset X of the set ×̊i∈I Si is only a
representation of an identification space of the family of sets (Si )i∈I . The identification space of a family of sets
is a kind of equivalence class of the sets in the family. It is easier to interpret Definition 6.10.5 if the sets
Si are pairwise disjoint. In this case, there is a canonical map f : ⋃_{i∈I} Si → X defined so that f (s) is the
unique element x of X such that xi = s for some i ∈ I. (See Figure 6.10.1, where a dot “·” is used for the
“undefined” elements of sequences in X.)
(Diagram: the canonical map f sends the elements of S1 and S2 to X ⊆ ×̊i∈I Si = S1 ×̊ S2, with
f(w1) = (w1, ·), f(x1) = f(x2) = (x1, x2) and f(z2) = (·, z2).)
Figure 6.10.1 Identification space of sets S1 and S2
Since f is a well-defined surjective function, the sets of the form f^−1({x}) for x ∈ X are non-empty and
constitute a partition of ⋃i∈I Si. The elements of each set f^−1({x}) are regarded as identified. In other
words, the elements in such a set are regarded as “grafted” onto each other.
If the sets Si are not disjoint, it is straightforward to attach a “tag” i to elements of each Si so as to force
the sets to be disjoint. If the sets Si are disjoint, it is possible to formalize the elements of the identification
space as a partition of the set ⋃i∈I Si rather than as a set of tagged elements of the partial Cartesian
product ×̊i∈I Si. In fact, the partial Cartesian product set ×̊i∈I Si is simply the set of subsets of ⋃i∈I Si
with tags on each element to indicate which set Si it was drawn from.
The concept of grafting sets onto each other to create identification spaces is fundamental to differential
geometry. Differentiable and topological manifolds are generally created by grafting portions of Euclidean
spaces onto each other to create more general classes of spaces. The sets (Si )i∈I are in this case the domains
of charts in an atlas. The domain of each chart is a “patch”. The patches are grafted together to form an
abstract manifold. It is necessary to also define an identification space topology and an identification space
differentiable structure, and so forth, in order to build up the structural layers of manifolds.
6.10.7 Remark: The concept of a “direct sum of a family of sets” (see EDM2 [35], 381.E) is essentially
equivalent to an identification space for a disjoint family of sets.
6.11. Partially defined functions

6.11.1 Remark: A function is said to be “well defined” if it has a unique value for every element of
its domain set. In other words, a function is well defined if and only if it is a function. The reason for
the superfluous adjective “well-defined” is the fact that sometimes one wishes to discuss “partially defined
functions”, which are not truly functions because they are not necessarily defined for the whole domain.
Sometimes “multiple-valued functions” are discussed, especially in the context of complex functions. To
avoid woolly thinking, it is best to avoid using the word “function” except when it is well defined.
6.11.2 Remark: Definition 6.11.3 and Notation 6.11.4 are non-standard but often useful. There are many
situations in differential geometry and analysis where the functions of interest are not defined everywhere.
Mendelson [165], page 168, uses the adjective “univocal” to refer to ordered-pair relations which have the
same uniqueness property as in Definition 6.11.3. (See Remark 4.16.2 for a 2-parameter predicate version of
this definition.)
6.11.3 Definition: A partially defined function (or partial function or local function) from a set A to a
set B is a relation f −< (f, A, B) such that
(i) ∀(a1, b1), (a2, b2) ∈ f, (a1 = a2 ⇒ b1 = b2).
6.11.4 Notation: A →̊ B denotes the set of all local functions from a set A to a set B.
f : A →̊ B means that f is a local function from a set A to a set B.
6.11.5 Remark: An alternative notation for A →̊ B would be B̊^A. However, it is best to leave the sets A
and B unadorned to make way for various kinds of subscripts and superscripts.
For example, a partially defined function f from C_0^1(IR^n) to IR would be of the form f : C_0^1(IR^n) →̊ IR or
f : (IR^n → IR) →̊ IR. It would be difficult to comfortably position a small circle on top of either of the sets
C_0^1(IR^n) or IR^n → IR.
6.11.6 Remark: There is a notation Y^X for the set of functions f : X → Y, but there is no simple notation
for the set of partially defined functions {f : U → Y; U ⊆ X}. This could be denoted in the following ways.
{f : U → Y; U ⊆ X} = ⋃U⊆X Y^U
≡ (Y ∪ {Y})^X
= (Y^+)^X.
The second notation has the advantage of making it obvious that the cardinality of the set is (#(Y) + 1)^#(X),
but it is not good, meaningful set notation.
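The cardinality formula can be checked by brute force for small finite sets. The following Python sketch is only an informal illustration. It assumes that a partially defined function is encoded as a dictionary on its domain of definition, and the sentinel UNDEFINED plays the role of the extra element adjoined to Y. These names are introduced here and are not part of the text.

```python
from itertools import product

UNDEFINED = object()  # stands in for the extra "undefined" value


def partial_functions(X, Y):
    """List all partially defined functions from X to Y as dicts."""
    xs = list(X)
    choices = list(Y) + [UNDEFINED]
    return [
        # Keep only the arguments where a value in Y was chosen.
        {x: v for x, v in zip(xs, values) if v is not UNDEFINED}
        for values in product(choices, repeat=len(xs))
    ]


X = {1, 2, 3}
Y = {"a", "b"}
fs = partial_functions(X, Y)
everywhere_defined = [f for f in fs if len(f) == len(X)]
print(len(fs), (len(Y) + 1) ** len(X))            # both counts agree
print(len(everywhere_defined), len(Y) ** len(X))  # the functions in Y^X
```

Each argument independently receives either a value in Y or no value at all, which is why the count is (#(Y) + 1)^#(X).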
[ Show that the composite of any two functions is a partially defined function. See Definition 6.3.23 and
Remark 6.7.3. ]
6.11.7 Definition: The composition or composite of partially defined functions f1 −< (f1, A1, B1) and
f2 −< (f2, A2, B2) is the relation f −< (f, A1, B2) where f = {(a, b); ∃c ∈ B1 ∩ A2, ((a, c) ∈ f1 ∧ (c, b) ∈ f2)}.
6.11.8 Theorem: The composite of any two partially defined functions is a partially defined function.
Proof: Let f1 −< (f1, A1, B1) and f2 −< (f2, A2, B2) be partially defined functions. By Definition 6.3.23,
the composite of f1 and f2 is the relation f −< (f, A1, B2) where f = {(a, b); ∃c ∈ B1 ∩ A2, ((a, c) ∈
f1 ∧ (c, b) ∈ f2)}. Suppose (a, b1) ∈ f and (a, b2) ∈ f. Then for some c1, c2 ∈ B1, (a, c1), (a, c2) ∈ f1 and
(c1, b1), (c2, b2) ∈ f2. Since f1 is a partially defined function, c1 = c2. So b1 = b2 because f2 is a partially
defined function. Hence f is a partially defined function.
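The composite in Definition 6.11.7 can be computed directly for finite relations. The following Python sketch is an informal illustration only. It assumes that a relation is encoded as a set of ordered pairs, and the function names are chosen here for convenience.

```python
def compose(f1, f2):
    """Composite relation {(a, b); (a, c) in f1 and (c, b) in f2 for some c}."""
    return {(a, b) for (a, c) in f1 for (c2, b) in f2 if c == c2}


def is_partial_function(f):
    """True if the set of pairs f has at most one value for each argument."""
    values = {}
    for a, b in f:
        if a in values and values[a] != b:
            return False
        values[a] = b
    return True


# f1 is defined on {1, 2, 3}, but f2 is not defined at the value "r".
f1 = {(1, "p"), (2, "q"), (3, "r")}
f2 = {("p", 10), ("q", 20)}
f = compose(f1, f2)
print(sorted(f), is_partial_function(f))
```

Here f1 is an everywhere-defined function on {1, 2, 3}, but the composite is defined only on {1, 2}, which shows how partially defined functions arise from the composition of ordinary functions.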
6.11.9 Remark: Theorem 6.11.8 is suspiciously similar to Theorem 6.3.32. Theorem 6.11.8 is illustrated
in Figure 6.11.1.
6.11.10 Theorem: The composite of any two functions is a partially defined function.
Proof: This is an immediate corollary of Theorem 6.11.8.

(Diagram: f1 : A1 → B1 and f2 : A2 → B2, with composite f2 ◦ f1 : A1 ∩ f1^−1(A2) → B2.)
Figure 6.11.1 Composite of two functions is a partially defined function
6.12. Notations for sets of functions

6.12.1 Remark: Let A and B be sets. It is useful to denote by A → B the set of functions f : A → B.
Then one may write f ∈ (A → B). This set of functions may also be denoted as B^A. (See Notation 6.5.13.)
Thus f ∈ (A → B) and f : A → B mean exactly the same thing as f ∈ B^A. The notation A → B seems
useless until one considers function-valued functions. Let C be a set, and let f be a function on A whose
values are functions from B to C. One may write this as f : A → C^B or f ∈ (C^B)^A. It is much clearer to
write f : A → (B → C) or f ∈ (A → (B → C)). (See Remark 6.5.16 for further comments on this notation.)
6.12.2 Notation: X → Y for sets X and Y denotes the set Y^X.
6.12.3 Remark: For sets A, B and C, a function f : A → (B → C) has a “function transpose” f^t :
B → (A → C) defined by f^t(b)(a) = f(a)(b) for a ∈ A and b ∈ B. (One could also refer to f^t as a
“re-sequence” of f.) As a practical example, a tangent operator field on a differentiable manifold M is of the
form X : M → ((M → IR) → IR). (Here A = M, B = (M → IR) and C = IR.) This may be transposed as
the function X^t : (M → IR) → (M → IR) defined by X^t(f)(x) = X(x)(f) for all f ∈ (M → IR) and x ∈ M.
In fact, whenever a function is function-valued, and the functions all have the same domain, the function
may be transposed in this way to make a new function.
It often happens in differential geometry that function-valued functions are transposed in this way for
convenience according to context. Noteworthy examples of this are differentials and connections. There are
four different ways of representing the information in a function f : A × B → C using transposition and
“domain-splitting”. These are as follows.
f : A × B → C
f^t : B × A → C defined by f^t(b, a) = f(a, b)
f̄ : A → (B → C) defined by f̄(a)(b) = f(a, b)
f̄^t : B → (A → C) defined by f̄^t(b)(a) = f(a, b),
where f̄ denotes the “domain-split” of f. Any one of these functions may be defined in terms of any of the
others. Therefore they all contain the same information. Functions are often freely converted between these
forms with little or no comment.
Any function defined on a cross-product may be regarded as a function-valued function by fixing the value of
one coordinate and constructing a function using the remaining coordinate. The functions f̄ : A → (B → C)
and f̄^t : B → (A → C) may be thought of as “projections” of f onto A and B respectively. One could even
invent notations such as π1 f for f̄ and π2 f for f̄^t. Obviously there are very many more ways of doing this
in the case of a Cartesian product of many sets.
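The four representations translate directly into a programming language with first-class functions, where the “domain-split” is usually called currying. The following Python sketch is an informal illustration, and the helper names swap, domain_split and transpose are assumptions introduced here.

```python
def swap(f):
    """Transpose of f : A x B -> C, giving B x A -> C."""
    return lambda b, a: f(a, b)


def domain_split(f):
    """Domain-split of f : A x B -> C, giving A -> (B -> C)."""
    return lambda a: lambda b: f(a, b)


def transpose(g):
    """Function transpose of g : A -> (B -> C), giving B -> (A -> C)."""
    return lambda b: lambda a: g(a)(b)


def f(a, b):
    return 10 * a + b  # an arbitrary sample function of two arguments


f_bar = domain_split(f)
f_bar_t = transpose(f_bar)
print(f(2, 3), swap(f)(3, 2), f_bar(2)(3), f_bar_t(3)(2))
```

All four expressions evaluate f at the same pair (2, 3), which reflects the statement that the four forms carry the same information.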
[ Must also discuss here the “circled arrow” or “William Tell” notation “−→◦” for group actions as shown in
Figure 23.6.2. Use notation G −→◦ F to denote G → (F → F) or G × F → F. ]


Chapter 7
Order and integers
7.1 Ordered sets
7.2 Ordinal numbers
7.3 Natural numbers
7.4 Unsigned integer arithmetic
7.5 Signed integers
7.6 Extended integers
7.7 Cartesian products of sequences of sets and functions
7.8 Choice functions without the axiom of choice
7.9 Indicator functions and delta functions
7.10 Permutations
7.11 Combinations and ordered selections
7.12 List spaces for general sets
7.13 Reformulation of logic in terms of axiomatic mathematics
The concept of order on sets is closely associated with numbers. Historically, the first numbers were ordinal
numbers, and order could generally be expressed in terms of numbers. However, both order and numbers
have broad generalizations beyond their historical origins.
Order does not require set theory. Definitions of various categories of order relations may be expressed in
terms of predicate calculus alone. However, applicable order relations are typically defined on sets. Therefore
only order on sets will be presented here.
7.1. Ordered sets

Totally ordered sets may be thought of as abstractions from the order relation of the real numbers. Partially
ordered sets may be thought of as an abstraction from the inclusion relations of sets. (See Definition 6.3.2
for relations.) Some useful references for ordered sets are Halmos [160], section 14, Simmons [140], section 8,
pages 43–48, and EDM2 [35], article 311.
Ordered sets are particularly useful for indexing sets. A set which is indexed by a totally ordered set is
called a “sequence”. The index sets of sequences are most often subsets of the integers, but Definition 7.1.14
generalizes this to any totally ordered set. This generality is applicable to “paths” which are traversals of
a given set in a specified order. Paths are usually specified (as in Section 16.4) in terms of a real-number
parameter, but a totally ordered parameter is the natural generalization.
7.1.1 Definition: A (partial) order for a set X is a relation R −< (R, X, X) which satisfies:
(i) ∀x ∈ X, x R x; (weak reflexivity)
(ii) ∀x, y ∈ X, (x R y ∧ y R x) ⇒ x = y; (antisymmetry)
(iii) ∀x, y, z ∈ X, (x R y ∧ y R z) ⇒ x R z. (transitivity)
A (partially) ordered set is a pair (X, R) such that X is a set and R −< (R, X, X) is a partial order for X.


7.1.2 Theorem: If a relation R for a set X is a partial order, then the inverse relation R^−1 is a partial
order for X also.
7.1.3 Example: For any set A, the power set X = IP(A) has a partial order R defined by
R = {(x, y) ∈ X × X; x ⊆ y}.
In other words, x R y ⇔ x ⊆ y for x, y ∈ X. Thus “ ⊆ ” is a partial order on IP(A) for any set A.
[ Define partial order on X^I for ordered set X and set I as in Remark 16.1.8: x ≤ y ⇔ (∀i ∈ I, xi ≤ yi). ]
7.1.4 Definition: A total order for a set X is a relation R −< (R, X, X) which satisfies:
(i) ∀x, y ∈ X, x R y ∨ y R x; (strong reflexivity)
(ii) ∀x, y ∈ X, (x R y ∧ y R x) ⇒ x = y; (antisymmetry)
(iii) ∀x, y, z ∈ X, (x R y ∧ y R z) ⇒ x R z. (transitivity)
A totally ordered set is a pair (X, R) such that X is a set and R −< (R, X, X) is a total order for X.
[ Define lexicographic total order on X^I for ordered sets X and I as in Remark 16.1.8: x ≤ y ⇔ ∀j ∈
n, ((∀i ∈ n, i < j ⇒ xi = yi) ⇒ xj ≤ yj). Equivalently, x ≤ y ⇔ ∀j ∈ n, ((xj ≤ yj) ∨ (∃i ∈ n, (i <
j ∧ xi ≠ yi))). (See also Exercise 46.7.10.) ]
7.1.5 Remark: The difference between a partial order and a total order lies in the stronger reflexivity
condition for a total order. Clearly every total order is a partial order. So all theorems and definitions which
mention a partial order also apply to a total order.
Either reflexivity condition implies that the domain and range of the relation R are both equal to the set X.
So the full relation triple (R, X, X) is redundant. Therefore it is best to specify an order as (X, R) where R
is just the set of ordered pairs (the graph) of the order relation.
7.1.6 Theorem: If a relation R for a set X is a total order, then the inverse relation R^−1 is a total order
for X also.
7.1.7 Definition: The dual order of an order R for a set X is the inverse R^−1 of the relation R. In other
words, the dual of an order R is the relation R^−1 = {(x, y); (y, x) ∈ R}.
7.1.8 Notation: The symbol “≤” is often used for a partial or total order R. Thus x ≤ y means x R y.
The symbol “≥” means the dual of the order “≤”. Thus “≥” equals R^−1.
The symbol “<” denotes the relation which satisfies x < y ⇔ (x ≤ y ∧ x ≠ y).
The symbol “>” denotes the relation which satisfies x > y ⇔ (x ≥ y ∧ x ≠ y).
7.1.9 Remark: In computer programming (in some computer languages), a function f which defines a
total order on a set X typically returns a value in the set {−1, 0, 1}, where
f(x, y) = −1 if x < y,
f(x, y) = 0 if x = y,
f(x, y) = +1 if x > y.
However, in mathematics an order relation is represented as a subset of X × X, which is equivalent to a
boolean (or indicator) function on X × X, which is a function whose values lie in {0, 1}, where the integer 1
represents “true”. The reason for the difference is the fact that equality of elements x, y ∈ X is taken for
granted in mathematics. Therefore only two of the order-function values need to be specified.
In computer programming, partial orders are not often represented as functions. For such an order-function, it
would be necessary to have a fourth value to represent “unrelated” because it is possible that neither (x, y) nor
(y, x) is an element of the order.
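Both kinds of order-function are easy to sketch in Python. The following informal illustration uses the standard library function functools.cmp_to_key to sort with a three-valued comparison. The four-valued comparison for the inclusion order uses None as the “unrelated” value, which is an arbitrary choice made here, as are the function names.

```python
from functools import cmp_to_key


def cmp_total(x, y):
    """Three-valued order-function for a total order: -1, 0 or +1."""
    return (x > y) - (x < y)


def cmp_subset(x, y):
    """Four-valued order-function for set inclusion; None means unrelated."""
    if x == y:
        return 0
    if x < y:  # for Python sets, "<" means proper subset
        return -1
    if x > y:  # for Python sets, ">" means proper superset
        return 1
    return None


print(sorted([3, 1, 2], key=cmp_to_key(cmp_total)))
print(cmp_subset({1}, {1, 2}), cmp_subset({1}, {2}))
```

A general-purpose sorting routine has no use for the fourth value, which may be one reason why partial orders are seldom packaged as comparison functions.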

7.1.10 Definition: A minimal element of a subset A of a partially ordered set X is an element x ∈ A
such that ∀a ∈ A, ¬(a < x).
A maximal element of a subset A of a partially ordered set X is an element x ∈ A such that ∀a ∈ A, ¬(a > x).
A lower bound of a subset A of a partially ordered set X is an element x ∈ X such that ∀a ∈ A, x ≤ a.
An upper bound of a subset A of a partially ordered set X is an element x ∈ X such that ∀a ∈ A, x ≥ a.
An infimum of a subset A of a partially ordered set X is a lower bound x ∈ X for A such that y ≤ x for all
lower bounds y for A.
A supremum of a subset A of a partially ordered set X is an upper bound x ∈ X for A such that x ≤ y for
all upper bounds y for A.
A minimum of a subset A of a partially ordered set X is an element x ∈ A such that ∀y ∈ A, x ≤ y.
A maximum of a subset A of a partially ordered set X is an element x ∈ A such that ∀y ∈ A, x ≥ y.
[ Define chains somewhere in this section. Show that a strict order has no cyclic chains. ]
7.1.11 Remark: There is at most one maximum and one minimum for any subset of a partially ordered
set. The infimum of a subset A is a maximum of the set of all lower bounds of A, and the supremum of A is
a minimum of the set of all upper bounds of A. Therefore there is at most one infimum and one supremum
for any subset of a partially ordered set.
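The distinction between minimal elements, minimum and infimum is easily seen in a small finite example. The following Python sketch is an informal illustration which uses the divisibility order on X = {1, . . . 12}. The function names are assumptions made here, and None is returned when no minimum or maximum exists.

```python
def divides(x, y):
    """The partial order 'x divides y' on positive integers."""
    return y % x == 0


def minimal_elements(A, leq):
    # x is minimal if no a in A satisfies a < x, i.e. a <= x and a != x.
    return {x for x in A if not any(a != x and leq(a, x) for a in A)}


def minimum(A, leq):
    candidates = [x for x in A if all(leq(x, y) for y in A)]
    return candidates[0] if candidates else None


def maximum(A, leq):
    candidates = [x for x in A if all(leq(y, x) for y in A)]
    return candidates[0] if candidates else None


def lower_bounds(A, X, leq):
    return {x for x in X if all(leq(x, a) for a in A)}


def infimum(A, X, leq):
    # The infimum is the maximum of the set of lower bounds, if it exists.
    return maximum(lower_bounds(A, X, leq), leq)


X = set(range(1, 13))
A = {4, 6}
print(minimal_elements(A, divides), minimum(A, divides))
print(lower_bounds(A, X, divides), infimum(A, X, divides))
```

In this example both elements of A are minimal, A has no minimum, and the infimum of A is the greatest common divisor 2, which does not belong to A.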
7.1.12 Remark: One may define homomorphisms and isomorphisms with respect to the order structure
on a set as in Definition 7.1.13. The definitions apply to both partial and total orders. It is important to
remember that the inequality symbol “ ≤ ” represents two different order relations according to the ordered
set in which the elements are being compared. If the sets X and Y are the same (or X ∩ Y ≠ ∅), one should
use distinct notations such as RX and RY to indicate which order is being used, since two different orders
may be defined on the same set.
7.1.13 Definition: An order homomorphism between two ordered sets X and Y is a map f : X → Y
such that ∀x1, x2 ∈ X, (x1 ≤ x2 ⇒ f(x1) ≤ f(x2)).
An order isomorphism between two ordered sets X and Y is a bijection f : X → Y such that ∀x1, x2 ∈
X, (x1 ≤ x2 ⇔ f(x1) ≤ f(x2)).
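The difference between the one-way implication for homomorphisms and the two-way implication for isomorphisms can be illustrated with two orders on the same set, which is precisely the situation where Remark 7.1.12 recommends distinct notations. The following Python sketch is informal. The orders are passed explicitly as functions, and all names are chosen here for illustration.

```python
def is_order_homomorphism(f, X, leq_X, leq_Y):
    """True if a <= b in X implies f(a) <= f(b) in Y."""
    return all(leq_Y(f(a), f(b)) for a in X for b in X if leq_X(a, b))


def is_order_isomorphism(f, X, Y, leq_X, leq_Y):
    """True if f is a bijection X -> Y with a <= b iff f(a) <= f(b)."""
    images = [f(x) for x in X]
    bijective = len(set(images)) == len(images) and set(images) == set(Y)
    return bijective and all(
        leq_X(a, b) == leq_Y(f(a), f(b)) for a in X for b in X)


def divides(x, y):
    return y % x == 0


def usual(x, y):
    return x <= y


def identity(x):
    return x


D = [1, 2, 3, 6]  # the divisors of 6
print(is_order_homomorphism(identity, D, divides, usual))
print(is_order_isomorphism(identity, D, D, divides, usual))
```

The identity map from (D, divides) to (D, ≤) is a bijective order homomorphism, but it is not an order isomorphism because 2 ≤ 3 although 2 does not divide 3.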
7.1.14 Definition: A sequence (of sets) is a family of sets (Xi )i∈I such that the index set I is a totally
ordered set.
A sequence of functions is a family of functions (fi )i∈I such that the index set I is a totally ordered set.
7.1.15 Remark: Families of sets and functions are defined in Section 6.8. Definition 7.1.14 is not strictly
correct because an ordered set is really a pair (I, R) rather than a set I. A family of sets (Definition 6.8.1)
is strictly speaking a triple X −< (X, I, J), namely a function X : I → J for some sets I and J. Therefore
the order (R, I, I) must be incorporated into the triple (X, I, J) somehow. Since the order is thought of as
being tightly attached to the set I, one good possibility would be to specify a sequence as (X, I, R, J), or
perhaps as (X, (I, R), J). The author’s preference is to standardize on (X, I, R, J) as the specification tuple
for a sequence, although an equally good solution is to simply have two different tuples: (X, I, J) for the
sequence and (I, R) for the total order on I.
7.1.16 Remark: A family of sets or functions (with no total order on the index set) may sometimes be
referred to casually as a “sequence” of sets or functions. If the index set has no specified order, the word
“family” should be used. The word “sequence” comes from the Latin word “sequi” meaning “to follow”. So
a sequence must have a specified order.
7.1.17 Definition: An ordered traversal of a set X is a bijection f : I → X for some totally ordered set I.
[ Should also define ordered traversals of ordered traversals. I.e. permit the index set to be a doubly totally
ordered set. This is like a database table with a primary key and a secondary key. Also allow an arbitrary
number of “keys” for multiply ordered traversals. This concept could be useful for defining multi-parameter
paths. See Remark 16.1.8. ]

7.1.18 Remark: The non-standard Definition 7.1.17 is essentially equivalent to Definition 7.1.14. It is
intended as a generalization of the concept of a continuous curve in a topological space (as in Section 16.2)
to a non-topological curve. The very least that one expects of a curve is that it should have an order in which
the points are traversed. If I = IR, then this permits discontinuities which a continuous curve forbids. An
equally good term for a traversal would be a “trajectory”, although this word has a more specific meaning
in physics.
An obvious variation of the ordered traversal idea is to simply transfer the order relation from the ordered
set I in Definition 7.1.17 to the set X itself. One can obviously also invert the map f without materially
affecting the definition. A much more interesting variation is to replace the total order on I with a general
partial order on I. But once again, one may as well consider the order to be defined on the set X itself.
The purpose of inventing an “ordered traversal” is to put any superfluous structures on the ordered set I
into the background. If one uses real-number intervals for I, the topological, differentiability and algebraic
properties of the interval suggest themselves, which is often not desirable.
7.2. Ordinal numbers

[ The number sections are very scrappy and sloppy right now because it’s all very standard material which
everybody knows. They will be written properly after the more difficult parts of the book have been written. ]
[ Shoenfield [169], pages 246–252, has an interesting version of ordinals and transfinite induction. ]
7.2.1 Remark: The concept of infinity follows naturally from the concept of extrapolation, which is the
ability to infer a rule and apply it to make predictions. No one ever holds an infinite amount of information
in the mind because of fundamental limitations on information storage. Even if the human mind could hold
infinite information, it would not be possible to communicate that information because of signal processing
limitations on information transmission. Therefore notionally infinite concepts are actually rules, and the
rules themselves are finite. Even simple animals are able to recognize patterns and learn rules. Psychologists
have a concept of “conditioning”, which means that an animal learns to follow a pattern. The learning animal
infers a rule and follows it. The recognition of a rule or pattern is sometimes called the “aha” experience
when the individual “gets it”. When a pattern has been recognized, the individual’s perception of a sequence
of events changes from surprise to disinterest. The first few times that a human put a seed in the ground
and an edible plant grew out of it, this may have been surprising, but after a while recognition must have
dawned, after which the inferred rule would become “knowledge”, from which a conceptually infinite number
of predictions could be made.
Humans are good at inferring rules from experience. For example, when inference is applied to a sequence
such as 1, 2, 4, 8, 16. . . , a human can infer the rule f(n) = 2^n, which is a finite rule that in principle specifies
an infinite amount of information, but this is only possible because of the redundancy in the sequence. So
rule and pattern recognition rely upon redundancy in the sequences of experiences.
It is important to remember that nothing in mathematics is really infinite. Infinite concepts are only infinite
in principle. Without infinite concepts, mathematics would have to return to the 16th century or even much
earlier. Without infinite concepts, there would certainly be no differential geometry. However, all infinite
concepts and manipulations are expressed in terms of finitely representable rules.
7.2.2 Definition: The successor set of a set X is the set X ∪ {X}.
[ Check the claim in Remark 7.2.3 that a set which equals its own successor set suffers from the Burali-Forti
paradox. ]
7.2.3 Remark: For any set X, the successor set X ∪ {X} is a well-defined set. This follows from the
unordered pair axiom and the union axiom in ZF set theory.
The successor set construction is used in the definition of the ordinal numbers, both finite and infinite. If it
is supposed that a set W is equal to its own successor set, such a “set” suffers from the Burali-Forti paradox.
[ Probably von Neumann’s set construction for ordinal numbers in Remark 7.2.4 was published in 1928? ]

7.2.4 Remark: Roughly speaking, the finite ordinal numbers are defined inductively as 0 = ∅, 1 = {0},
2 = {0, 1}, 3 = {0, 1, 2}, n^+ = n ∪ {n}, etc. Definition 7.2.12 gives a more rigorous characterization.
According to EDM2 [35], 33.B, this successor-set construction for the ordinal numbers is due to John von
Neumann.
This construction has various useful properties. For example, 0 ∈ 1 ∈ 2 ∈ 3 . . . and 0 ⊆ 1 ⊆ 2 ⊆ 3 . . ..
But x ∈ y ⇒ x ≠ y. Therefore 0 ⊊ 1 ⊊ 2 ⊊ 3 . . .. The membership relation is transitive, and in fact
corresponds exactly to our usual notion of the “<” inequality. (This seems to be the principal motivation for
the construction.) The construction of these sets permits no other sets “in the gaps” between them. Further
properties are mentioned in Remark 7.2.6.
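The first few von Neumann ordinals can be built explicitly as hereditarily finite sets. The following Python sketch is an informal illustration which uses the built-in frozenset type. It relies on Python's own integers to count the iterations, so it is only a naive check of the properties above and not a foundation for anything. The names successor and ordinal are introduced here.

```python
def successor(x):
    """The successor set x ∪ {x} of Definition 7.2.2."""
    return x | frozenset([x])


def ordinal(n):
    """The n-th finite ordinal, built by n applications of successor."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


three = ordinal(3)
print(three == frozenset([ordinal(0), ordinal(1), ordinal(2)]))
print(ordinal(2) in ordinal(5), ordinal(2) < ordinal(5))
```

For frozensets the operator “<” means proper subset, so the second line confirms in one instance that membership and proper inclusion both correspond to the usual “<” inequality.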
7.2.5 Remark: In real life, people tend to represent the non-negative integers as sequences of decimal
digits. Thus 4096 and 99999 would be non-negative integers. This suggests an alternative representation as
finite sequences (i.e. lists) of the form (d_i)_{i=1}^k ∈ ⋃_{n=0}^∞ {0, 1, . . . 9}^n. The fly in the ointment here is that
such sequences require the prior definition of the non-negative integers to provide index sets for sequences
of arbitrary length.
This raises the question of whether sequences of arbitrary length can be defined without first defining integers.
Definition 6.1.14 suggests that this can be done. However, each length of sequence must be defined separately
because inductive definition cannot be done without first defining all of the non-negative integers!
The whole purpose of the apparently absurd von Neumann construction for ordinal numbers is to boot-strap
the definitions of all integer-related things starting from the basis of the ZF axioms. The axiom of infinity
is the crucial axiom which makes it all work.
As soon as the ordinal numbers have been defined, and the basic properties have been established, they may
be used to re-define integers as sequences of digits, either decimal or binary, or in another related form.
It may seem that the von Neumann construction is hereby shown to be at a more fundamental level than the
representation of integers as digit sequences. However, this may be an illusion. All of mathematical logic,
upon which the ZF axioms are based, pre-supposes the naive notion of sequences of symbols of arbitrary
length (in each proposition) and sequences of propositions of arbitrary length (in deductive arguments). If
sequences are required already for logic, then surely they may be applied in the representation of integers
as digit sequences. This is true in principle, but not in practice. In practice, when the ZF axioms have
been built up on a basis of propositional and predicate calculus, the underpinnings of the logical calculus
are removed and ZF set theory must stand on its own. There is some small justice in this, namely the fact
that by discarding naive logic and mathematics when the 8 axioms of ZF set theory have been set up, this
minimal set of axioms is easier to examine for validity than an ill-defined body of naive mathematics. On
the other hand, the metamathematical examination of the validity of ZF set theory requires a lot of naive
mathematics. One may argue back and forth forever in this way. It is better to just not think about it too
much.
7.2.6 Remark: It is not possible to use induction to define the finite ordinal numbers in the obvious way
because induction requires the prior definition of ordinal numbers. The ZF infinity axiom guarantees the
existence of at least one infinite set, but from this we must construct a set of ordinals which is infinite but
not too infinite. When the finite ordinals have been rigorously defined, the rest of differential geometry can
be based securely on them.
For this important leap it is convenient to state a few intuitively obvious properties of the finite ordinal
numbers N ∈ ω defined informally (by naive induction) as 0 = ∅, 1 = {0}, 2 = {0, 1} etc. The following
intuitive properties of finite ordinal numbers can be used both to define them and to verify that the definition
matches the intuitive idea.
(i) For all N ∈ ω, either N = ∅ or ∃m ∈ ω, N = m ∪ {m}. That is, all finite ordinal numbers are either
the empty set or can be constructed as the successor of another ordinal number. However, this property
does not exclude the possibility of an infinite set.
(ii) For all non-empty N ∈ ω, ⋃N ∈ N. The element ⋃N of N is the maximum element of N. This property
prevents the set N from being infinite because the successor set N = (⋃N) ∪ {⋃N} of ⋃N cannot be a member
of N.
(iii) For all N ∈ ω, for all m ∈ N, m ∈ ω. That is, all elements of finite ordinal numbers are finite ordinal
numbers. In other words, N ⊆ ω.
(iv) For all N1, N2 ∈ ω, either N1 ⊊ N2, N2 ⊊ N1 or N1 = N2.
(v) For all N1, N2 ∈ ω, either N1 ∈ N2 or N2 ∈ N1 or N1 = N2.
(vi) For all N1, N2 ∈ ω, N1 ∈ N2 if and only if N1 ⊊ N2.
(vii) For all N1, N2 ∈ ω, if there exists a bijection f : N1 → N2 then N1 = N2. Equivalently, if N1 ≠ N2, then
there exists no bijection f : N1 → N2. (This means that all finite ordinal numbers are Dedekind-finite.)
(viii) For all N ∈ ω, all subsets of N are well-ordered (on the left) by the “∈” membership relation. That is,
∀S ∈ IP(N) \ {∅}, ∃m ∈ S, ∀x ∈ S \ {m}, (m ∈ x ∧ x ∉ m). That is, every non-empty subset of N has
a minimum.
(ix) For all N ∈ ω, all subsets of N are well-ordered on the right by the “∈” membership relation. That is,
∀S ∈ IP(N) \ {∅}, ∃m ∈ S, ∀x ∈ S \ {m}, (x ∈ m ∧ m ∉ x). That is, every non-empty subset of N has
a maximum.
(x) For all N ∈ ω, there is no set x such that N ∈ x and x ∈ N^+. That is, there is no set between any
finite ordinal number and its successor.
In essence, a finite ordinal number has a minimum and a maximum, and each element in between is tightly
chained to its successor and predecessor. The above properties should be sufficient to ensure finiteness, but
there is no way of rigorously testing a definition of finiteness until the finite ordinals have been defined. The
Dedekind definition of finiteness is appealing and perhaps trustworthy, but a better test might be whether
mathematical induction works as expected on ω. The conditions most closely associated with mathematical
induction are (i) and (viii).
7.2.7 Remark: There is probably no more important intellectual leap in mathematics than the introduc-
tion of the set ω of finite ordinal numbers. The concept of infinity has puzzled and troubled mathematicians
and non-mathematicians from 2500 years ago to the present day. For example, EDM2 [35], article 47, says
about Cantor: “In 1884, under the strain of opposition to his ideas and his efforts to prove the continuum
hypothesis, he suffered the first of many attacks of depression which continued to hospitalize him from time
to time until his death.” The main source of opposition to Cantor’s ideas about infinity was Kronecker. (Ein-
stein apparently called the clashes between the followers of Cantor and Kronecker a “frog-mouse battle”.
See Bell [191], page 556.) Better known are the infinity paradoxes of Zeno of Elea.
7.2.8 Remark: First, Definition 7.2.9 characterizes finite ordinal numbers in terms of a simple set-theoretic
formula. Then Definition 7.2.12 introduces the infinite set of all finite ordinal numbers.
7.2.9 Definition: A finite ordinal number is a set N such that either N = ∅ or
⋃N ∈ N ∧ ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}). (7.2.1)
7.2.10 Remark: Definition 7.2.9 may not look like the intuitive concept of an ordinal number. So it must
be shown that the conditions of the definition imply the familiar properties of the non-negative integers. It is
easy to check that the first few ordinal numbers do satisfy the definition, but it must be shown that all finite
ordinal numbers satisfy the definition, and only the finite ordinal numbers satisfy the definition. Since the
ordinal numbers are only defined informally, no rigorous proof can be given, but it is possible to check that
sets satisfying Definition 7.2.9 have the sort of properties that we expect. It is highly desirable to show that
all sets satisfying Definition 7.2.9 are finite. But since finiteness is defined in terms of the ordinal numbers,
this is impossible. It is meaningful, however, to require that the finite ordinal numbers are Dedekind-finite.
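The easy part of this check can be carried out mechanically. The following Python sketch is an informal illustration which tests condition (7.2.1) on hereditarily finite sets encoded as frozensets. It cannot say anything about infinite sets, so it verifies only that the first few ordinals pass and that some near-misses fail. The function names are assumptions introduced here.

```python
def successor(x):
    return x | frozenset([x])


def ordinal(n):
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


def big_union(N):
    """The union ⋃N of all elements of the set N."""
    return frozenset().union(*N)


def satisfies_7_2_1(N):
    """Test the condition of Definition 7.2.9 for a frozenset N."""
    if N == frozenset():
        return True
    if big_union(N) not in N:
        return False
    # Every element of N is empty or is the successor of an element of N.
    return all(m == frozenset() or any(m == successor(a) for a in N)
               for m in N)


print([satisfies_7_2_1(ordinal(n)) for n in range(6)])
print(satisfies_7_2_1(frozenset([ordinal(1)])))              # the set {1}
print(satisfies_7_2_1(frozenset([ordinal(0), ordinal(2)])))  # the set {0, 2}
```

The set {1} fails because its element 1 is neither empty nor the successor of an element of {1}, and the set {0, 2} fails because the element 1 is missing from the chain.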
7.2.11 Theorem: Let N be a finite ordinal number. Then:

(i) ∀m ∈ N \ {∅}, ∅ ∈ m.
(ii) . . .
Proof: To prove (i), suppose that the assertion is false. Then the set N′ = {m ∈ N \ {∅}; ∅ ∉ m} must
be non-empty. By the ZF regularity axiom (Definition 5.1.26 (7)) applied to the proposition P(x) defined
by “x ∈ N′”, one obtains ∃z ∈ N′, ∀y ∈ N′, y ∉ z. For such z, it follows from the definition of N that
∃y ∈ N, z = y ∪ {y}, because z ≠ ∅. So y ∈ z and y ⊆ z. If y = ∅ or ∅ ∈ y, then ∅ ∈ z, which contradicts
z ∈ N′. Otherwise y ∈ N′ and y ∈ z, which contradicts the choice of z. So N′ = ∅, which proves (i).
(To be continued. . . )

7.2.12 Definition: The set of finite ordinal numbers is the set ω defined by
ω = {N ; N = ∅ ∨ (⋃N ∈ N ∧ ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}))} (7.2.2)
[ Theorem 7.2.13 seems a little pointless. Scrap it if it is useless. ]
7.2.13 Theorem: The set ω of finite ordinal numbers is well-defined. That is, the existence of the set ω is
guaranteed by the ZF axioms and the set is uniquely determined by the definition.
Proof: The existence of ω follows directly from the infinity axiom, Definition 5.1.26 (8), which guarantees
the existence of a set X such that
∀z, (z ∈ X ⇔ ((∀u, u ∉ z) ∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))).
This may be rewritten in higher-level set language as:
∀z, (z ∈ X ⇔ (z = ∅ ∨ ∃y ∈ X, z = y ∪ {y})).
A set ω satisfying (7.2.2) can be constructed from the set X as
ω = {N ∈ X; N = ∅ ∨ ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a})}. (7.2.3)
This set is well-defined by the ZF replacement axiom. It must be shown that this set equals X, and that
the resulting set ω in (7.2.3) is the same as the set ω in (7.2.2).
(To be continued. . . maybe!)
7.2.14 Remark: The ordinal numbers become rapidly impractical if written out in full as 0 = ∅, 1 = {∅},
2 = {∅, {∅}} etc. (See Exercise 46.3.1.)
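The successor-set construction can be tried out on a computer. The following Python sketch is not part of the original text. It models hereditarily finite sets with frozenset, and the names successor, ordinal, big_union, satisfies_definition and render are illustrative assumptions. The check in satisfies_definition reads the formula of Definition 7.2.9 as "⋃N ∈ N, and every element of N is ∅ or the successor of an element of N", which is an assumption about the intended formula.

```python
def successor(s: frozenset) -> frozenset:
    """Return the successor set s ∪ {s}."""
    return s | frozenset([s])


def ordinal(n: int) -> frozenset:
    """Build the finite ordinal n by applying the successor n times to ∅."""
    result = frozenset()
    for _ in range(n):
        result = successor(result)
    return result


def big_union(s: frozenset) -> frozenset:
    """Return the union of all elements of s."""
    out = frozenset()
    for member in s:
        out = out | member
    return out


def satisfies_definition(N: frozenset) -> bool:
    """Check the assumed reading of the formula in Definition 7.2.9."""
    empty = frozenset()
    if N == empty:
        return True
    members_ok = all(
        m == empty or any(m == successor(a) for a in N) for m in N
    )
    return big_union(N) in N and members_ok


def render(s: frozenset) -> str:
    """Write a hereditarily finite set out in full with braces."""
    if not s:
        return "∅"
    return "{" + ", ".join(sorted((render(m) for m in s), key=len)) + "}"


if __name__ == "__main__":
    for n in range(4):
        print(n, "=", render(ordinal(n)))
```

Printing a few more values shows how quickly the fully written-out form grows, which is the impracticality referred to in Remark 7.2.14.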
7.2.15 Remark: Figure 7.2.1 illustrates the first seven ordinal numbers. (See Exercise 46.3.2 for a similar
diagram of the ordinal number 10.)
Figure 7.2.1 The first seven ordinal numbers
7.2.16 Remark: The notation S^+ = S ∪ {S} for successor sets clashes negligibly with notations such as
ℤ_0^+ and ℝ^+. The inequality S^+ ≠ S follows from the ZF regularity axiom, which implies that S ∉ S for
sets S.
7.2.17 Theorem: For any set S in ZF set theory, the set S^+ = S ∪ {S} is well-defined.

[ Should present the mathematical induction principle in this section. ]

[ Insert near here a set of axioms for a system of finite ordinal numbers? Show that Definition 7.2.12 satisfies the
axioms? Alternatively give no axioms for the ordinals, but do give the Peano axioms for natural numbers. The
best method of defining integers is to first define the ordinal numbers using set theory. Then define natural
numbers using Peano axioms. And then finally show that the ordinal numbers are a valid representation for
the natural numbers. This effectively puts the two styles of numbers on an equal footing, neither being truly
prior to the other. ]
7.2.18 Definition: Two sets X and Y are said to be equinumerous or equipotent if there exists a bijec-
tion f : X → Y .
7.2.19 Remark: Halmos [160], page 52, uses the word “equivalent” instead of “equinumerous” for Defini-
tion 7.2.18. He uses the notation X ∼ Y to denote the relation of equinumerosity. However, note that this
is a relation-predicate in the sense of Remark 6.3.4, not a relation-set.
[ Define cardinality as a kind of pseudo-function on sets which is a kind of a proxy for cardinality order and
equality relations. Need to define relation-predicates like “domination” and “strict domination”. (See Hal-
mos [160], pages 87-90.) Then summarize cardinal arithmetic and cardinal numbers similarly to Halmos [160],
pages 94–102. Also mention the Schröder-Bernstein theorem. ]
7.2.20 Definition: A finite set is a set S which satisfies ∃n ∈ ω, ∃f : n → S, f is bijective.
7.2.21 Remark: Definition 7.2.20 means that a finite set is a set which is equinumerous to a finite ordinal
number.
7.2.22 Theorem:
(i) A set S is finite if and only if ∃n ∈ ω, ∃f : n → S, f is surjective.
(ii) A set S is finite if and only if ∃n ∈ ω, ∃f : S → n, f is injective.
7.2.23 Remark: Definition 7.2.24 is Dedekind’s definition of a finite set. Every finite set (equinumerous
with a finite ordinal number) is Dedekind-finite, but the axiom of countable choice is required to show that
all Dedekind-finite sets are finite. (See Mendelson [165], pages 184–185 and Halmos [160], page 61.) The
Dedekind definition of finite sets is attractive because it uses a very simple rule instead of the machinery
of finite ordinal numbers, but the axiom of choice is the greater evil. (See Remark 5.9.10 (vii) for further
comments on Dedekind-finite sets. See Definition 7.2.20 for finite sets.)
7.2.24 Definition: A Dedekind-finite set is a set X such that all injective functions f : X → X are
surjective.
A Dedekind-infinite set is a set X such that f(X) ≠ X for some injective function f : X → X.
7.2.25 Theorem: If a set S is finite, then S is Dedekind-finite.
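For a small finite set, the Dedekind condition can be checked by brute force. The following Python sketch is an illustration only, not a proof. It enumerates every function from a finite set to itself and confirms that the injective ones are surjective. The function names are assumptions introduced here.

```python
from itertools import product


def self_maps(X):
    """Yield every function f : X → X as a dict, for a finite set X."""
    elems = list(X)
    for images in product(elems, repeat=len(elems)):
        yield dict(zip(elems, images))


def is_injective(f):
    """A dict-function is injective if no value is repeated."""
    return len(set(f.values())) == len(f)


def is_surjective(f, X):
    """A dict-function is surjective onto X if every element of X is a value."""
    return set(f.values()) == set(X)


def injective_maps_are_surjective(X):
    """Check the condition of Definition 7.2.24 by exhaustive enumeration."""
    return all(is_surjective(f, X) for f in self_maps(X) if is_injective(f))


if __name__ == "__main__":
    print(injective_maps_are_surjective({0, 1, 2}))
```

By contrast, the map n ↦ n + 1 on ω is injective but never takes the value 0, which is the standard example of a Dedekind-infinite set. That case cannot be enumerated and is not covered by the sketch.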
7.2.26 Theorem [zf+cc]: If a set S is Dedekind-finite, then S is finite.
[ For Theorem 7.2.26, see Mendelson [165], page 199, Prop. 4.38(II) and page 185, Prop. 4.25(3). ]
7.2.27 Remark: Theorems 7.2.26 and 7.2.28 are tainted by the Axiom of Countable Choice, as indicated.
Therefore they must not be used in the proof of theorems which are not CC-tainted.
7.2.28 Theorem [zf+cc]: Every infinite set has a subset which is equinumerous to the set ω of finite ordinal numbers.
That is, if X is an infinite set, then ∃f, ((∀i ∈ ω, f(i) ∈ X) ∧ (∀i, j ∈ ω, (i = j ⇔ f(i) = f(j)))).
[ Define cardinality in terms of equinumerosity and ordinal numbers. Notation #(X). Define total order on
the integers. ]
7.2.29 Notation: ω^+ = ω ∪ {ω}.
7.2.30 Remark: Since the set ω may be interpreted as the infinite “number” ∞, the set ω^+ in
Notation 7.2.29 may be thought of as ω ∪ {∞}.

7.2.31 Notation: ℕ = {1, 2, . . .} = ω \ {0} denotes the set of natural numbers, which are defined here to
be the positive integers.
7.2.32 Remark: Some authors define ℕ to be the same as ω. Apart from the fact that this wastes a symbol,
the “natural numbers” are arguably the counting numbers which have been used for thousands of years, which
do not include zero because until the last few centuries, zero seemed like an unnatural and unnecessary
concept. Therefore ℕ is used for the positive integers here. This agrees with EDM2 [35], 294.B, which
defines natural numbers in terms of the Peano axioms.
7.2.33 Notation: ℕ_n denotes the set {1, 2, . . . n} for n ∈ ℤ_0^+.
[ Put here the definition of the cardinality #(S) of a (countable) set S. (Probably don’t need much about
transfinite numbers.) The word “denumerable” is a common (possibly rather old-fashioned) synonym for
“countable”. ]
7.2.34 Notation: For any set X and non-negative integer k, ℙ_k(X) denotes the set {S ∈ ℙ(X); #(S) ≤
k} of subsets of X with no more than k elements.
[ Is Notation 7.2.34 standard? Is it useful for anything? ]
7.2.35 Remark: Notation 7.2.34 is analogous to the power set notation ℙ(X) in Definition 5.1.26,
Axiom 5. One could define more general sets of limited numbers of subsets. For example, ℙ_ω(X) would denote
the set of all countable subsets of a general set X. One could also define notations ℙ_m^n(X) to mean the
subsets of X with no less than m and no more than n elements.
[ Possibly make some comments about transfinite numbers and transfinite induction near here. ]
[ Define “countable set” near here. ]
7.2.36 Theorem [zf+cc]: The union of a countable set of countable sets is countable.
7.3. Natural numbers
7.3.1 Remark: There are many approaches to building up the definitions of the various number systems.
Three of these approaches are: (1) Specify a particular set representation of each number system, (2) Define
each number system in terms of a set of abstract axioms, and (3) Build each number system from a simpler
number system using a “characterization”. Approach (1) was used to define the ordinal numbers in Sec-
tion 7.2 because there is fairly universal acceptance of the “successor set” construction for ordinal numbers.
Approach (3) is used for the other number systems here.
Many authors use the axiomatic approach (2) for number systems, but defining mathematical systems in
terms of axioms can give them a very abstract, dry and unreal character. Systems defined in this way often
look like meaningless lists of ad-hoc rules. The purely axiomatic approach has the advantage of avoiding
disputes about which representation to use, but the cost is the necessity to show that representations exist
and are unique up to isomorphism. It is best to use the purely axiomatic approach when there are many
non-isomorphic representations. In the case of number systems such as the finite ordinal numbers ω, the
integers ℤ, the rationals ℚ, the real numbers ℝ and the complex numbers ℂ, it is important to guarantee
both existence and uniqueness up to isomorphism, and proving existence and uniqueness should not be
laborious. Most importantly, number system definitions should have a strong intuitive meaning. A set of 13
abstract axioms (Approach (2)) has very little intuitive value. A particular construction (Approach (1)) for
systems such as the real numbers is almost always tedious and low-level, and suffers from the multiplicity
(i.e. non-uniqueness) of different constructions. But the incremental approach (3) shows clearly the relations
between the number systems. Approach (3) is a compromise between the excessively low-level approach (1)
and the excessively high-level approach (2).
[ Probably Definition 7.3.2 should be regarded as a definition of a successor set. Then integers can be based
upon this by attaching an addition operation. ]

7.3.2 Definition: A system of natural numbers is a pair (ℕ, s) such that ℕ is a set and s : ℕ → ℕ is a
function such that
(i) ∀m, n, p ∈ ℕ, ((s(m) = p ∧ s(n) = p) ⇒ m = n),
(ii) ∃u ∈ ℕ, ∀p ∈ ℕ, s(p) ≠ u,
(iii) ∀S ∈ ℙ(ℕ), ∀u ∈ ℕ \ s(ℕ), ((u ∈ S ∧ ∀n ∈ S, s(n) ∈ S) ⇒ S = ℕ).
7.3.3 Remark: Definition 7.3.2 is based on the Peano axioms. (For example, see Halmos [160], page 46;
Mendelson [165], pages 102–117.) The function s is called the successor function of the system of natural
numbers ℕ. For any element n ∈ ℕ, the element s(n) is called the successor of n in ℕ.
7.3.4 Remark: Condition (i) means that s is injective. Condition (ii) means that s(ℕ) ≠ ℕ. In other
words, s is not surjective. Condition (iii) is known as the “Principle of Mathematical Induction”. So a
system of natural numbers may be defined more concisely as in Definition 7.3.5.
7.3.5 Definition (→ 7.3.2): A system of natural numbers is a pair (ℕ, s) such that ℕ is a set and s : ℕ → ℕ
is an injective, non-surjective function which satisfies the induction principle:
(i) ∀S ∈ ℙ(ℕ), ∀u ∈ ℕ \ s(ℕ), ((u ∈ S ∧ ∀n ∈ S, s(n) ∈ S) ⇒ S = ℕ).
7.3.6 Remark: In order to speak of the (system of) natural numbers, it must be shown that all systems
of natural numbers are isomorphic and that there is only one isomorphism between each pair of systems of
natural numbers. Then each element of any such system may be unambiguously identified with a unique
element of each other system. This is shown in Theorem 7.3.10. But first it must be shown that at least one
system of natural numbers exists within ZF set theory. This is shown in Theorem 7.3.7.
7.3.7 Theorem: The set ω of ordinal numbers forms a system of natural numbers if the successor function
s : ω → ω is defined by s(X) = X ∪ {X} for all X ∈ ω.
7.3.8 Theorem: Let (ℕ, s) be a system of natural numbers. Then ℕ \ s(ℕ) contains only one element.
7.3.9 Remark: Theorem 7.3.8 means that there is one and only one element of (the set ℕ of) a system
of natural numbers which is not the successor of another element of ℕ. This is usually denoted as 0
or 1. There is a difference between these two options only if an operation of addition is attached to the set.
Roughly half of the texts start the integers with 0 and the other half start with 1.
7.3.10 Theorem: Let (ℕ, s) and (ℕ′, s′) be systems of natural numbers. Then there is a unique bijection
f : ℕ → ℕ′ such that f ◦ s = s′ ◦ f.
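The map f in Theorem 7.3.10 is forced by the condition f ◦ s = s′ ◦ f as soon as the non-successor element of one system is sent to the non-successor element of the other. The following Python sketch computes the first few values of f between two concrete systems: the positive integers with n ↦ n + 1, and the finite ordinals with X ↦ X ∪ {X}. The names iso_prefix and successor_set and the choice of the two systems are assumptions made for illustration. Only a finite prefix can be computed, so this illustrates the recursion and proves nothing.

```python
def successor_set(s: frozenset) -> frozenset:
    """Successor function on finite ordinals: X ↦ X ∪ {X}."""
    return s | frozenset([s])


def iso_prefix(zero_a, succ_a, zero_b, succ_b, count):
    """Return the first `count` pairs (x, f(x)) of the map f determined by
    f(zero_a) = zero_b and f(succ_a(x)) = succ_b(f(x))."""
    pairs = []
    x, y = zero_a, zero_b
    for _ in range(count):
        pairs.append((x, y))
        x, y = succ_a(x), succ_b(y)
    return pairs


# System A: positive integers starting at 1. System B: finite ordinals starting at ∅.
pairs = iso_prefix(1, lambda n: n + 1, frozenset(), successor_set, 5)
f = dict(pairs)

if __name__ == "__main__":
    for x, y in pairs:
        print(x, "->", "ordinal with", len(y), "elements")
```

In this example the first element 1 of system A corresponds to the ordinal ∅, which shows why the choice between "0" and "1" as the name of the first element is only a matter of labelling until addition is introduced.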
[ Give a set of axioms for a system of signed integers. Then give a particular representation of the integers in
terms of equivalence classes of pairs of integers? Show uniqueness up to isomorphism? Define positive and
non-negative integers without arithmetic operations, then with addition only, and then with both addition
and multiplication. Multiplication can be defined in terms of addition by induction? Addition can be defined
in terms of the successor operation by induction? ]
7.3.11 Remark: The necessity of defining only a system of integers rather than the system of integers is
clear from the wide variety of representations of the integers. The non-negative integers may be represented
as ordinal numbers or as signed integers, e.g. as pairs (a, b) ∈ ℕ^2 with the equivalence relation (a1, b1) ≡
(a2, b2) ⇔ (a1 + b2 = a2 + b1). But they may also be represented as a subset of the real numbers, which use
different representations. The real numbers may be regarded as the vector space ℝ^1, which is represented
differently to ℝ. A similar comment applies to the inclusion ℝ ⊆ ℂ. So the integers may be represented as
a subset of ℝ^1 or ℂ. So here are several different representations of the “same” system of integers. There
is little choice but to give in and define only a system of integers.

7.4. Unsigned integer arithmetic

7.4.1 Remark: In Sections 7.2 and 7.3, there are no addition or multiplication operations. For such bare
integers, the successor operation is defined, and this implicitly defines order. If the successor operation is
repeated, a primitive kind of addition is implied, and if addition is repeated, a primitive kind of multiplication
is implied.
7.4.2 Definition: A composite integer is a positive integer k ∈ ℕ such that ∃m, n ∈ ℕ \ {1}, k = mn.
A prime integer is a positive integer p ∈ ℕ \ {1} such that p is not a composite integer.
7.4.3 Notation: m | k for m, k ∈ ℕ denotes the proposition: ∃n ∈ ℕ, k = mn.
7.4.4 Definition: For m, k ∈ ℕ, m divides k when ∃n ∈ ℕ, k = mn.
7.4.5 Remark: The propositions “m | k” and “∃n ∈ ℕ, k = mn” mean the same thing for all m, k ∈ ℕ.
The study of prime numbers does have some importance for finite groups, and finite groups do arise in the
study of differentiable groups, for example as quotients of closely related differentiable groups, but the theory
of prime numbers is mostly not of great relevance to differential geometry.
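The divisibility and compositeness conditions above translate directly into bounded searches. The following Python sketch transcribes them literally for positive integers. It is deliberately naive, and the function names are assumptions. The function is_prime excludes 1, following the usual convention.

```python
def divides(m: int, k: int) -> bool:
    """m | k for positive integers: some positive integer n satisfies k == m * n."""
    return any(k == m * n for n in range(1, k + 1))


def is_composite(k: int) -> bool:
    """k is composite if k == m * n for some positive integers m, n other than 1."""
    return any(k == m * n for m in range(2, k) for n in range(2, k))


def is_prime(p: int) -> bool:
    """p is prime if p > 1 and p is not composite (1 is excluded by convention)."""
    return p > 1 and not is_composite(p)


if __name__ == "__main__":
    print([p for p in range(1, 30) if is_prime(p)])
```

The search bounds are valid because any factor n of a positive integer k satisfies 1 ≤ n ≤ k, and any factor other than 1 in a product of two such factors is less than k.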
[ Here define non-negative integer powers of non-negative integers. ]
7.4.6 Remark: According to Ball [188], page 38, the word “power” in the mathematical sense is due to
Hippocrates of Chios, who lived about 470–410 BC.
Hippocrates [. . . ] denoted the square on a line by the word δύναμις, and thus gave the technical
meaning to the word power which it still retains in algebra: there is reason to think that this use
of the word was derived from the Pythagoreans, who are said to have enunciated the result of the
proposition Euc. i, 47, in the form that “the total power of the sides of a right-angled triangle is
the same as that of the hypotenuse.”
The meaning of the word δύναμις is given by Feyerabend [202], pages 107–108, as
ability; might, power, strength; military force, army; expedient, implement; talent, faculty; power
of speech; miracle; influence, consequence; worth, value; meaning, signification.
It seems, then, that the word “power” in mathematics originally referred only to the second power. Then
higher powers (and fractional, negative and complex powers) must have developed as generalizations of the
square. The use of the word “power” for power sets is by analogy with the powers of 2.
It happens that algebraic expressions for powers in physics invariably involve also the square of some sort of
velocity. But this is quite likely a coincidence.
[ Define computer representations and arithmetic for unsigned integers. ]
[ Represent the set ℤ_0^+ of non-negative integers by {u ∈ 2^ω; #{i ∈ ω; u(i) = 1} < ∞}. See Remark 8.3.3 (7).
Also use other bases instead of 2. ]

[ Present computer-style N-bit representations of unsigned integers using the set 2^N. ]
7.5. Signed integers

[ Also present an axiomatic system for signed integers. ]
7.5.1 Remark: Figure 7.5.1 illustrates the equivalence classes which are used in one popular construction
of the signed integers from unsigned integers. This is formalized in Definition 7.5.2.
7.5.2 Definition: A space of (signed) integers is the set of equivalence classes of ℕ × ℕ for any system ℕ
of natural numbers, where (m1, n1) ∼ (m2, n2) whenever m1 + n2 = m2 + n1.
7.5.3 Notation: ℤ denotes the set of (signed) integers.
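The equivalence relation in Definition 7.5.2 can be tested using only addition of unsigned integers. The following Python sketch makes two assumptions which are not stated in the definition: the unsigned integers include 0, and the pair (m, n) is thought of as the difference m − n. The function names are illustrative. Each equivalence class then has exactly one representative with at least one component equal to zero.

```python
def equivalent(p, q) -> bool:
    """(m1, n1) ∼ (m2, n2) if and only if m1 + n2 == m2 + n1."""
    (m1, n1), (m2, n2) = p, q
    return m1 + n2 == m2 + n1


def canonical(p):
    """Return the representative of the class of p with a zero component."""
    m, n = p
    k = min(m, n)
    return (m - k, n - k)


def to_int(p) -> int:
    """Interpret the pair (m, n) as the signed integer m − n (an assumed reading)."""
    m, n = p
    return m - n


if __name__ == "__main__":
    print(canonical((2, 5)), to_int((2, 5)))
```

Two pairs are equivalent exactly when they have the same canonical representative, so the classes may be labelled by the pairs (k, 0) and (0, k).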
7.5.4 Notation:
ℤ^+ = {n ∈ ℤ; n > 0} = {1, . . .} denotes the set of positive integers.
ℤ_0^+ = {n ∈ ℤ; n ≥ 0} = {0, 1, . . .} denotes the set of non-negative integers.
ℤ^− = {n ∈ ℤ; n < 0} = {. . . − 3, −2, −1} denotes the set of negative integers.
ℤ_0^− = {n ∈ ℤ; n ≤ 0} = {. . . − 3, −2, −1, 0} denotes the set of non-positive integers.

Figure 7.5.1 Construction of signed integers from pairs of unsigned integers
7.5.5 Remark: Reinhardt [135], page 8, gives the notations ℤ^+ = {1, 2, 3 . . .} and ℤ_0^+ = {0, 1, 2, 3 . . .}.
These do not seem to be in common use in the English-language literature, but they are used in this book
for the greatest overall simplicity and consistency.
7.5.6 Notation: ℤ_n denotes the set {i ∈ ℤ; 0 ≤ i < n} = {0, 1, . . . n − 1} for n ∈ ℤ_0^+.
7.5.7 Remark: Since the elements of the set ℤ_n in Notation 7.5.6 are all non-negative, one could define
ℤ_n as a subset of the ordinal numbers ω. In fact, the set ℤ_n is essentially identical to the ordinal number n =
{0, 1, . . . n − 1}. However, the notation ℤ_n is very commonly used. It is also useful to be able to handle
negative numbers in the same context as such sets since the modulo function (Definition 8.6.16) is defined
for signed numbers and is closely associated with the sets ℤ_n.
7.5.8 Remark: The “two’s complement” representation of negative integers in computers, which is the
most popular by far in personal computers, is of the form (−2^n, p), which represents the integer p − 2^n,
where p ∈ ℤ_0^+ is a non-negative integer satisfying 2^{n−1} ≤ p < 2^n, where n is the number of bits. The
bit-patterns for 0 ≤ p < 2^{n−1} represent themselves. So the value represented by a two’s complement
binary number p is ((p + 2^{n−1}) mod 2^n) − 2^{n−1} in terms of the modulo operator in Definition 8.6.16.
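The value formula in Remark 7.5.8 can be checked against a standard library decoder. The following Python sketch implements the formula with Python's % operator, which returns a non-negative result for a positive modulus. It is assumed here that this agrees with the modulo operator of Definition 8.6.16. The function name twos_complement_value is illustrative.

```python
def twos_complement_value(p: int, n: int) -> int:
    """Value of the n-bit pattern p (0 <= p < 2**n) read as two's complement."""
    if n < 1 or not 0 <= p < 2 ** n:
        raise ValueError("p must be an n-bit unsigned pattern")
    half = 2 ** (n - 1)
    # Shift by half, reduce modulo 2**n, then shift back.
    return ((p + half) % 2 ** n) - half


if __name__ == "__main__":
    for p in (0, 1, 127, 128, 255):
        print(p, "->", twos_complement_value(p, 8))
```

For 8-bit patterns, the results agree with int.from_bytes(bytes([p]), "big", signed=True), which is Python's own two's complement decoding of a single byte.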
7.5.9 Remark: Figure 7.5.2 illustrates another popular way of representing signed integers.
Figure 7.5.2 Construction of signed integers from unsigned integers and sign
This kind of representation is sometimes used in computers. (For example, n-bit “sign-magnitude”
representation has one sign bit followed by n − 1 value bits.) In this case, each signed integer corresponds to
a unique value-sign pair (v, s), where v is a non-negative integer and s = 0 or 1. The pair (0, 1) (meaning
“−0”) may or may not be excluded from the set. If “−0” is included in the set, it is defined to be equal to
0 anyway.
7.5.10 Remark: a = (a_α)_{α∈A} denotes a function with domain A. If A is a subset [m, n] of ℤ, then a may
be regarded as a sequence and written as (a_i)_{i=m}^n. If A is a cross-product of two suitable subsets ℤ_m and ℤ_n
of ℤ, then a may be regarded as a matrix and written as [a_ij]_{i,j=0}^{m−1,n−1}.
[ Should define “sequences” as families with integer number domain. But Definition 7.1.14 defines a sequence
as a family with a totally ordered index set. ]
[ Must define addition and multiplication on integers. This should maybe wait until the algebra chapter. ]

7.5.11 Remark: The first publication of the modern definition of multiplication for signed numbers is
attributed to Rafael Bombelli in 1572. (See Remark 45.2.7.)
[ Define computer representations and arithmetic for signed integers. ]

[ Maybe comment on Gaußian integers near here. Basically that’s ℤ^n. Also comment on the linear
transformations which are permitted for such grids. ]
7.6. Extended integers

[ Also present an axiomatic system for extended signed integers. ]
7.6.1 Remark: It does not matter how the pseudo-numbers ∞ and −∞ are represented. The only things
that really matter are the order relations ∀n ∈ ℤ, n < ∞ and ∀n ∈ ℤ, −∞ < n, and of course −∞ < ∞.
7.6.2 Definition: The set of extended integers is the set ℤ ∪ {−∞, ∞}.
7.6.3 Notation: ℤ̄ denotes the set of extended integers ℤ ∪ {−∞, ∞}.
7.6.4 Notation:
ℤ̄^+ = {n ∈ ℤ̄; n > 0} = {1, 2, . . . , ∞} denotes the set of positive extended integers.
ℤ̄_0^+ = {n ∈ ℤ̄; n ≥ 0} = {0, 1, . . . , ∞} denotes the set of non-negative extended integers.
ℤ̄^− = {n ∈ ℤ̄; n < 0} = {−∞, . . . − 3, −2, −1} denotes the set of negative extended integers.
ℤ̄_0^− = {n ∈ ℤ̄; n ≤ 0} = {−∞, . . . − 3, −2, −1, 0} denotes the set of non-positive extended integers.
7.6.5 Remark: The bar over the symbol for the set of integers in Notation 7.6.3 is inspired by the usual
notation for the closure of a set in topology. The rough idea here is that by including the pseudo-numbers
∞ and −∞, the set of numbers is made in some sense “complete” by including the limits of numbers as
they get arbitrarily large in either direction. Of course, the notion of “limit” requires topology, which is not
defined for a mere algebraic system. (Example 14.8.3 gives a hint of how to give topological meaning to the
idea that ∞ is the limit of arbitrarily large integers.) The same over-bar convention is followed consistently
(in this book) for the rational and real number systems also, as summarized in Remark 1.6.1.
7.7. Cartesian products of sequences of sets and functions
7.7.1 Notation: X^n, for any set X and n ∈ ℤ_0^+, denotes the Cartesian product ×_{i∈ℕ_n} X_i where X_i = X
for all i ∈ ℕ_n.
7.7.2 Remark: Notation 7.7.1 has the interesting consequence that X^0 = Y^0 for any sets X and Y. This
shows how necessary it is to have class tags and even name tags for sets, since the set value of a combination
of symbols does not contain the entire meaning of those symbols.
7.7.3 Definition: An n-tuple of elements of a set X for any n ∈ ℤ_0^+ is any element of X^n. A 2-tuple may
be referred to as a duple (rarely) or ordered pair, a 3-tuple may be called a triple, a 4-tuple may be called a
quadruple and a 5-tuple may be called a pentuple.
7.7.4 Remark: The phrase “ordered pair” should be used with care because it can also mean Defini-
tion 6.1.3.
7.7.5 Notation: The expressions (a, b), (a, b, c), (a, b, c, d) and (a, b, c, d, e) for a, b, c, d, e ∈ X denote
respectively a duple, triple, quadruple and pentuple of elements of X. (The ordering on the printed line
represents the order within each tuple.) Notations for general n-tuples are defined inductively on n.
7.7.6 Definition: For any set X and m, n ∈ ℤ_0^+, the standard identification map concat : X^m × X^n →
X^{m+n} is defined for a ∈ X^m and b ∈ X^n by concat : (a, b) ↦ c, where c ∈ X^{m+n} is defined by c_i = a_i for
i ∈ ℕ_m and c_{i+m} = b_i for i ∈ ℕ_n.
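The concatenation map of Definition 7.7.6 can be written out index by index. The following Python sketch reads the index sets as {1, . . . m} and {1, . . . n}, which is an assumption about the intended notation, and converts these 1-based indices to Python's 0-based tuple positions. The name concat is taken from the definition and the rest is illustrative.

```python
def concat(a: tuple, b: tuple) -> tuple:
    """Return c with c_i = a_i for 1 <= i <= m and c_{i+m} = b_i for 1 <= i <= n."""
    m, n = len(a), len(b)
    c = [None] * (m + n)
    for i in range(1, m + 1):
        c[i - 1] = a[i - 1]        # c_i = a_i
    for i in range(1, n + 1):
        c[i + m - 1] = b[i - 1]    # c_{i+m} = b_i
    return tuple(c)


if __name__ == "__main__":
    print(concat((1, 2), (3,)))
```

The result coincides with Python's built-in tuple concatenation a + b. The case m = n = 0 gives the empty tuple, which is the single element of X^0.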
[ Define Gaußian integers ℤ^n. ]

[ Also define Cartesian product of sequence of functions corresponding to families of functions as in Defini-
tion 6.9.11. ]

7.8. Choice functions without the axiom of choice

[ Near here should have a list of ways of avoiding AC by having particular knowledge about a collection of
sets. E.g. if all of the sets in a collection are the same, one can choose the same value for each set. In
the particular example of the set of all non-empty compact subsets of the real numbers, one can choose the
infimum of each subset as the representative. This is an uncountable collection which is quite complex, but
choices can easily be made. ]
Despite all that has been said against the Axiom of Choice in Section 5.9, there are often times when it is
desirable to be able to say that the cross product of a non-empty family of non-empty sets is non-empty.
The elements of any cross product can be thought of as “choice functions” because each element of the cross
product singles out an element from each element of the family. As mentioned in Remark 5.9.9, the axiom
of choice may be expressed as
(S ≠ ∅ ∧ (∀i ∈ S, X_i ≠ ∅)) ⇒ ×_{i∈S} X_i ≠ ∅. (7.8.1)
In other words, if a non-empty family (X_i)_{i∈S} consists entirely of non-empty members, then its cross product
×_{i∈S} X_i is non-empty. Equivalently, there exists a function f : S → ⋃_{i∈S} X_i (called a “choice function”)
such that ∀i ∈ S, f (i) ∈ Xi . Although the general axiom of choice is rejected in this book, there are special
conditions under which Equation (7.8.1) follows from the ZF axioms in Section 5.1. Theorem 7.8.1 gives one
set of special conditions.
[ There are hopefully some more sets of conditions for AC, perhaps using total orders or well-ordering on the
collections of sets. If each set in a collection of sets is nicely enough ordered, then perhaps a unique minimum
or maximum of each set can be “chosen”. It seems pretty clear that if ∀i ∈ S, ∃′x ∈ X_i, ∀y ∈ X_i, x R_i y,
where R_i is an order on X_i for all i ∈ S, then one can “choose” f(i) = min_i(X_i) for all i ∈ S, where
min_i denotes the minimum with respect to the order R_i. But this seems to have nothing to do with order
specifically. (See Theorem 7.8.2.) Any uniqueness result for a proposition would do. Alternatively, the
set of all choices for finite subcollections of the collection of sets could be given a suitable ordering so that
a global choice can be made in terms of the ordering. ]
7.8.1 Theorem: Let the family (X_i)_{i∈S} have non-empty intersection ⋂_{i∈S} X_i. Then
(S ≠ ∅ ∧ (∀i ∈ S, X_i ≠ ∅)) ⇒ ×_{i∈S} X_i ≠ ∅.
Proof: To say that the intersection ⋂_{i∈S} X_i is non-empty means that ∃x, ∀i ∈ S, x ∈ X_i. So define
f = {(i, x); i ∈ S}. Then f ∈ ×_{i∈S} X_i. Therefore ×_{i∈S} X_i ≠ ∅.
7.8.2 Theorem: If the family (X_i)_{i∈S} satisfies ∀j ∈ S, ∃′x ∈ X_j, P(j, x) for some set-theoretic formula P,
then
(S ≠ ∅ ∧ (∀i ∈ S, X_i ≠ ∅)) ⇒ ×_{i∈S} X_i ≠ ∅.
Proof: Define f : S → ⋃_{i∈S} X_i by f = {(j, x) ∈ S × ⋃_{i∈S} X_i; P(j, x) ∧ x ∈ X_j}. Then f is a function
because x is uniquely determined by j, and f ∈ ×_{i∈S} X_i.
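The two constructions in Theorems 7.8.1 and 7.8.2 share the feature that the choice function is written down by an explicit rule, so that no arbitrary choices are made. The following Python sketch mimics both rules for a small finite family stored as a dict. For finite families the axiom of choice is never needed, so the sketch shows only the shape of the function f. The function names, the use of min to pick a common element, and the example predicate are all assumptions made for illustration.

```python
def constant_choice(family: dict) -> dict:
    """Theorem 7.8.1 style: use one common element x and set f(i) = x for all i."""
    if not family:
        return {}
    common = set.intersection(*(set(members) for members in family.values()))
    if not common:
        raise ValueError("the family has empty intersection")
    x = min(common)  # any fixed element of the intersection would do
    return {i: x for i in family}


def unique_choice(family: dict, predicate) -> dict:
    """Theorem 7.8.2 style: f(j) is the unique x in X_j satisfying predicate(j, x)."""
    choice = {}
    for j, members in family.items():
        witnesses = [x for x in members if predicate(j, x)]
        if len(witnesses) != 1:
            raise ValueError(f"no unique witness for index {j!r}")
        choice[j] = witnesses[0]
    return choice


family = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 2, 9}}

if __name__ == "__main__":
    print(constant_choice(family))
    print(unique_choice(family, lambda j, x: x == min(family[j])))
```

The predicate "x is the minimum of X_j" is one instance of a uniquely satisfied formula P(j, x).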
7.8.3 Remark: It seems reasonable that an “axiom of countable choice” should follow from the ZF axioms
using induction. However it appears that this is not so. Equation (7.8.1) must be proven for countable S.
Let (X_i)_{i∈S} be a non-empty countable family of non-empty sets. Define Y = ×_{i∈S} (X_i ∪ {∅}). Then Y ≠ ∅
because the function z = {(i, ∅); i ∈ S} is an element of Y .
It may be assumed that the index set S is the set ω of finite ordinal numbers by composing the map X with
a surjective map g : ω → S to obtain X ◦ g : ω → Range(X).
Let P (n) be the proposition: ∃x ∈ Y, ∀j < n, xj ∈ Xj . This means that there exists a sequence of choices
xj ∈ Xj for j < n, although the remaining xj for j ≥ n might all equal the empty set. Clearly P (0) is true
because the function z as above satisfies the proposition. Now suppose that P (k) is true for some k ≥ 0.
Then ∃x ∈ Y, ∀j < k, x_j ∈ X_j. But ∃y, y ∈ X_k because X_k ≠ ∅. So define x̄ ∈ Y by
x̄_j = x_j for j ≠ k,
x̄_j = y for j = k,
for all j ∈ ω. Then ∀j < k + 1, x̄_j ∈ X_j. So P(k + 1) is true. Therefore P(n) is true for all n by induction.
That is, ∀n ∈ ω, ∃x ∈ Y, ∀j < n, xj ∈ Xj .
Unfortunately, it does not seem to be possible to swap the universal and existential quantifiers here. (This is
a very common frustration indeed in mathematics. Swapping the quantifiers without sufficient justification is
the cause of many errors, especially when one “knows” that the desired result is true. A similar kind of error
is swapping intersections and unions as in ⋃_i ⋂_j S_ij and ⋂_j ⋃_i S_ij.) Several lines of attack here all lead back
subtly to the axiom of choice which one is trying to avoid. For example, let Y_n = {x ∈ Y; ∀i < n, x_i ∈ X_i}.
Then there is surely a function h : ω → Y such that h_n ∈ Y_n for all n ∈ ω. If so, then it is simple to
construct a function h̄ ∈ ⋂_{n∈ω} Y_n from the sequence (h_n)_{n∈ω}. But the existence of such a function requires
the axiom of countable choice!
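The effect of swapping quantifiers, or of swapping unions and intersections, is already visible for very small finite examples. The following Python sketch uses a 2 × 2 family of sets S_ij and a simple two-variable predicate. Both are chosen purely for illustration and are not taken from the text.

```python
# A 2 x 2 family of sets S[i, j], chosen so that the two orders of operation differ.
S = {(0, 0): {1}, (0, 1): {2}, (1, 0): {2}, (1, 1): {1}}
I = (0, 1)
J = (0, 1)


def union_of_intersections(S, I, J):
    """Compute the union over i of the intersection over j of S[i, j]."""
    return set().union(*(set.intersection(*(S[i, j] for j in J)) for i in I))


def intersection_of_unions(S, I, J):
    """Compute the intersection over j of the union over i of S[i, j]."""
    return set.intersection(*(set().union(*(S[i, j] for i in I)) for j in J))


# The quantifier analogue with the predicate "x == n" on {0, 1, 2}.
forall_exists = all(any(x == n for x in range(3)) for n in range(3))
exists_forall = any(all(x == n for n in range(3)) for x in range(3))

if __name__ == "__main__":
    print(union_of_intersections(S, I, J), intersection_of_unions(S, I, J))
    print(forall_exists, exists_forall)
```

In this example the union of intersections is empty while the intersection of unions is {1, 2}. Likewise "for all n there exists x" holds, while "there exists x for all n" fails.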
There is an interesting comment by Taylor [145], page 19, regarding the axiom of countable choice.
Any non-empty set A contains at least one element x, and in the ordinary process of logic one can
choose a particular element from a non-empty set. By using the principle of induction it follows
that one can choose an element from each of a sequence of non-empty sets, but difficulty arises if
one has to make the simultaneous choice of an element from each set of a non-countable class C .
This comment seems to assume the axiom of countable choice. If so, it shows how easy it is to use AC
without knowing it.
7.8.4 Remark: This whole subject of axioms of choice and pixies at the bottom of the garden has become
so deeply convoluted that the author will have to make some hard choices in order to avoid spending the
next 10 years in deep meditation on set theory. The choice will probably be to accept the ZF axioms, but to
tag all theorems requiring the axiom of general or countable choice so that the reader will be able to assess
just what they will lose or gain by rejecting or accepting such axioms. The AC-sceptic may simply ignore
all AC-tainted results.
7.9. Indicator functions and delta functions

7.9.1 Remark: The numbers 0 and 1 in Definition 7.9.2 may be unsigned integers, signed integers, real
numbers, complex numbers, or elements of any unitary ring or field. The range of the function depends on
the context. So the indicator function is actually a kind of “meta-function”.
The same observation applies also to the Kronecker delta function and the Levi-Civita alternating symbol.
The symbols “0” and “1” are to be interpreted according to context.
7.9.2 Definition: The indicator function of a subset A of a set S is the function χ_A : S → {0, 1} defined by
χ_A(x) = 1 if x ∈ A,
χ_A(x) = 0 if x ∈ S \ A,
for any subset A of S.
7.9.3 Notation: χA for a subset A of a set S denotes the indicator function of A.
7.9.4 Remark: Figure 7.9.1 illustrates an indicator function χA where A = {a} is a subset of S = 2
.
The ambient set S is usually clear from the context.
Figure 7.9.1 Function χ{a} for S = 2

An indicator function is also sometimes called a characteristic function, but this can be confusing, especially
in probability contexts. However, the use of the Greek letter χ for the indicator function is no doubt derived
from the first letter of χαρακτήρ (which means “stamp, mark, characteristic trait, character, token”).
7.9.5 Theorem: For any set S, the sets ℙ(S) and 2^S are equinumerous.
Proof: Define f : ℙ(S) → 2^S by f : A ↦ χ_A for all A ⊆ S, where χ_A denotes the indicator function for
A as a subset of S. Then f is a bijection.
7.9.6 Remark: The sets 2^S and ℙ(S) are quite frequently identified with each other. The inverse of the
canonical map in the proof of Theorem 7.9.5 is the map f^{−1} : g ↦ {x ∈ S; g(x) = 1}.
In the case S = ω, there is a surjection φ : 2^ω → [0, 1] which maps infinite sequences of zeros and ones to the
real numbers which have these sequences as binary expansions, namely φ : g ↦ Σ_{n∈ω} g(n) 2^{−n−1}. This map is
not injective because, for example, the sequences 1, 0, 0, 0, . . . and 0, 1, 1, 1, . . . both map to 1/2.
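The sum Σ_{n∈ω} g(n) 2^{−n−1} can be explored with exact rational arithmetic on finite prefixes of a sequence g. The following Python sketch computes such partial sums with the standard fractions module. The name phi_partial is an assumption. Since only finitely many terms are summed, statements about the infinite sum are seen only through the size of the remaining gap.

```python
from fractions import Fraction


def phi_partial(bits):
    """Sum of bits[n] * 2**(-n-1) over the given finite prefix of a 0/1 sequence."""
    return sum((Fraction(b, 2 ** (n + 1)) for n, b in enumerate(bits)), Fraction(0))


if __name__ == "__main__":
    print(phi_partial([1]))                  # prefix of 1, 0, 0, 0, ...
    for k in range(1, 6):
        print(phi_partial([0] + [1] * k))    # prefixes of 0, 1, 1, 1, ...
```

The partial sums of 0, 1, 1, 1, . . . fall short of 1/2 by exactly 2^{−(k+1)} after k ones, so the infinite sum equals 1/2, which is also the value for 1, 0, 0, 0, . . . Similarly, the partial sums of the all-ones sequence fall short of 1 by 2^{−k}, so the infinite sum equals 1.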
7.9.7 Definition: The (integer) power-of-two function is the function f : ℤ_0^+ → ℤ^+ which satisfies f(0) =
1 and f(n + 1) = 2f(n) for n ∈ ℤ_0^+.
7.9.8 Notation: 2^n for n ∈ ℤ_0^+ denotes the value of the integer power-of-two function for argument n.
7.9.9 Theorem: For any finite set S, #(ℙ(S)) = #(2^S) = 2^{#(S)}.
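The correspondence between subsets and indicator functions, and the resulting count, can be checked by enumeration for a small set. The following Python sketch represents an indicator function as a dict from S to {0, 1}. All function names are assumptions made for illustration.

```python
from itertools import combinations


def power_set(S):
    """Return the list of all subsets of the finite set S as frozensets."""
    items = list(S)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def indicator(A, S):
    """Indicator function of A as a subset of S, stored as a dict S → {0, 1}."""
    return {x: (1 if x in A else 0) for x in S}


def from_indicator(g):
    """Inverse map: recover the subset {x; g(x) = 1} from a 0/1-valued dict g."""
    return frozenset(x for x, v in g.items() if v == 1)


def power_of_two(n: int) -> int:
    """Integer power of two by the recursion f(0) = 1, f(n + 1) = 2 f(n)."""
    value = 1
    for _ in range(n):
        value = 2 * value
    return value


if __name__ == "__main__":
    S = {"a", "b", "c"}
    print(len(power_set(S)), power_of_two(len(S)))
```

Each subset is recovered from its indicator function, and distinct subsets give distinct indicator functions, which is the bijection used in the proof of Theorem 7.9.5.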
7.9.10 Definition: The Kronecker delta function on a set S is the function δ : S × S → {0, 1} which is
defined so that δ(i, j) = 1 if and only if i = j.
7.9.11 Notation: The Kronecker delta expression δ(i, j) may be denoted as δ_ij, δ^j_i, δ^i_j or δ^ij.
7.9.12 Remark: The Kronecker delta function in Definition 7.9.10 is often applied to sets S which are
subsets of the integers. The Kronecker delta function for S = ℤ is illustrated in Figure 7.9.2.
Figure 7.9.2 Kronecker delta function on the integers
As mentioned in Remark 7.9.1, the range of the Kronecker delta function must be interpreted according to
context. The symbols “0” and “1” must be defined within the context. Thus the Kronecker delta function
is a kind of meta-function.
Theorem 7.9.13 shows that the indicator function and the Kronecker delta function for any set S are closely related.
7.9.13 Theorem: For any set S,
∀i, j ∈ S, χ{i} (j) = δij .
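The identity between the indicator function of a singleton and the Kronecker delta can be spot-checked on a finite range of integers. This is a minimal sketch in which kronecker_delta and indicator are hypothetical names.

```python
def kronecker_delta(i, j):
    """delta(i, j) = 1 if and only if i = j."""
    return 1 if i == j else 0

def indicator(a):
    """Indicator function of the set a, as a Python function."""
    return lambda x: 1 if x in a else 0

S = range(-3, 4)
# chi_{i}(j) = delta_ij for all i, j in S.
assert all(indicator({i})(j) == kronecker_delta(i, j) for i in S for j in S)
```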
7.10. Permutations
7.10.1 Remark: A permutation is usually defined only for finite sets. Permutations are useful for defining
symmetries and for constructing examples of finite groups.

7.10.2 Remark: The word “permutation” is used for two different but related concepts. The first kind of
permutation, given in Definition 7.10.3, is a map from an entire set to itself. This may be thought of as an
ordering of the elements of the set although the set does not need to have a defined order.
A permutation P of a set X is typically used in conjunction with a function g : X → Y or a function h :
Y → X for some set Y . Since P : X → X is a bijection, the functions g ◦ P : X → Y and P ◦ h : Y → X are
well defined. The function g ◦ P is injective if and only if g is injective. Similarly, g ◦ P is surjective if and
only if g is surjective. Corresponding relations hold between P ◦ h and h. Thus permutations are useful for
modifying functions by rearranging the elements of the domain and/or range.
The second kind of permutation is an “ordered selection” or “ordered subset” of a given set X. Thus this
kind of permutation of the elements of a set X is a subset S of X which has a total order structure added
to the subset S. The simplest way to add an order structure to a set S is to induce the order from a set
which has a well-known pre-defined order, such as a set of integers for example. Thus an injective function
$\psi : S \to \mathbb{Z}$ induces an order on a set S by defining x ≤ y for x, y ∈ S if and only if ψ(x) ≤ ψ(y). For a
finite set X, it is customary to use the inverse of such a map ψ and require the domain of this inverse to be
a contiguous set such as $\mathbb{N}_k$ for some $k \in \mathbb{Z}_0^+$. Thus a permutation could be defined as a map $p : \mathbb{N}_k \to S$.
This kind of permutation is called a “k-permutation” by EDM2 [35], article 330, to distinguish it from the
set-bijection style of permutation. In the special case that $X = \mathbb{N}_k$, the two concepts are the same. Feller [107],
page 28, uses the term “ordered sample” for an ordered selection and uses the word “permutation” for the
special case #(X) = k.
Set-bijection permutations are defined in this section. Ordered-selection “permutations” are defined in
Section 7.11.
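The two kinds of permutation can be compared in a short Python sketch. It assumes that X is a small finite set and that maps are stored as dictionaries. The names perm_X, compose and is_injective are hypothetical, and itertools.permutations is used to enumerate both the bijections and the ordered selections.

```python
from itertools import permutations

X = (1, 2, 3)
# Set-bijection permutations of X, each stored as a dict x -> P(x).
perm_X = [dict(zip(X, image)) for image in permutations(X)]

def compose(g, p):
    """Return g o p as a dict, where p maps into the domain of g."""
    return {x: g[p[x]] for x in p}

def is_injective(f):
    return len(set(f.values())) == len(f)

g = {1: "a", 2: "b", 3: "a"}   # not injective
h = {1: "a", 2: "b", 3: "c"}   # injective
# g o P is injective if and only if g is injective.
assert all(is_injective(compose(g, P)) == is_injective(g) for P in perm_X)
assert all(is_injective(compose(h, P)) == is_injective(h) for P in perm_X)

# Ordered selections ("k-permutations") of X with k = 2, written as 2-tuples.
two_permutations = list(permutations(X, 2))
```

Here the ordered selections (1, 2) and (2, 1) are different, although they select the same subset of X.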
7.10.3 Definition: A permutation of a set X is a bijection from X to X.
7.10.4 Notation: perm(X) for a set X denotes the set of permutations of X.
7.10.5 Remark: Notation 7.10.4 is probably non-standard.
7.10.6 Remark: A permutation is a transposition if and only if exactly two elements of the set are swapped.
Definition 7.10.7 is related to the “swap” operator in Definition 7.12.2.
If the set X in Definition 7.10.7 has fewer than two elements, there are no transpositions on X.
[ Probably should define the “swap” operator for general sets. Some other list operators are similarly general
in nature. ]
7.10.7 Definition: A transposition of a set X is a bijection f : X → X such that for some i, j ∈ X
with i ≠ j,
$$\forall x \in X,\quad f(x) = \begin{cases} j & \text{if } x = i \\ i & \text{if } x = j \\ x & \text{otherwise.} \end{cases}$$
7.10.8 Remark: Definition 7.10.9 assumes that a total order exists on the domain of a permutation.
However, this is not really necessary. The parity of a permutation can be defined in terms of the number of
transpositions that the permutation is equivalent to.
[ Show that the parity of the number of transpositions of a permutation is equal to the parity in Defini-
tion 7.10.9. Determine necessary and sufficient structures and conditions on the domain to make this true
more generally. ]
7.10.9 Definition: The parity of a permutation f : X → X of a totally ordered set X is the integer
$\mathrm{parity}(f) = (-1)^k$ where $k = \#\{(i, j) \in X \times X;\ i < j \wedge f(i) > f(j)\}$.
7.10.10 Definition: An even permutation of a totally ordered set X is a permutation f of X such that
parity(f ) = 1.
An odd permutation of a totally ordered set X is a permutation f of X such that parity(f ) = −1.

7.10.11 Remark: The parity of a permutation is also called the sign or the index of the permutation.
Parity is multiplicative with respect to function composition. (See Theorem 7.10.12 (i).) The Levi-Civita
symbol in Definition 7.10.20 is essentially the same thing as the parity function.
7.10.12 Theorem:
(i) If f and g are permutations of a set X, then parity(f ◦ g) = parity(f ) parity(g).
(ii) parity(f ) = −1 for any transposition f of any set X.
(iii) A permutation f of a set X is even if and only if f is equal to the composition of an even number of
transpositions of X.
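Parts (i) and (ii) of Theorem 7.10.12 can be verified exhaustively for a small set. The following sketch computes the parity by counting inversions as in Definition 7.10.9. The set X = {1, 2, 3, 4} and the helper names parity, compose and transposition are assumptions made for this illustration.

```python
from itertools import permutations

def parity(f):
    """Parity of a permutation f of a finite totally ordered set, given as a dict."""
    xs = sorted(f)
    k = sum(1 for a, i in enumerate(xs) for j in xs[a + 1:] if f[i] > f[j])
    return (-1) ** k

def compose(f, g):
    """Return f o g as a dict."""
    return {x: f[g[x]] for x in g}

def transposition(xs, i, j):
    """The permutation of xs which swaps i and j and fixes everything else."""
    f = {x: x for x in xs}
    f[i], f[j] = j, i
    return f

X = (1, 2, 3, 4)
perms = [dict(zip(X, image)) for image in permutations(X)]
assert all(parity(compose(f, g)) == parity(f) * parity(g) for f in perms for g in perms)
assert all(parity(transposition(X, i, j)) == -1 for i in X for j in X if i != j)
```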
7.10.13 Definition: The factorial function is the map $f : \mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ defined inductively by $f(0) = 1$ and
$\forall n \in \mathbb{Z}^+,\ f(n) = n f(n - 1)$.
7.10.14 Notation: $n!$ denotes the value of the factorial function for argument n.
7.10.15 Remark: The factorial function may be expressed as $n! = \prod_{i=1}^n i$.
7.10.16 Definition: The Jordan factorial function is the map $f : \mathbb{Z}_0^+ \times \mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ defined inductively by
$\forall n \in \mathbb{Z}_0^+,\ f(n, 0) = 1$ and $\forall n, k \in \mathbb{Z}_0^+,\ f(n, k + 1) = (n - k) f(n, k)$.
7.10.17 Notation: $(n)_k$ denotes the value of the Jordan factorial function for argument (n, k).
7.10.18 Remark: The Jordan factorial function may be expressed as $(n)_k = \prod_{i=0}^{k-1} (n - i)$. When n < k,
$(n)_k = 0$. When n = k, $(n)_k = n!$. When n ≥ k, $(n)_k = n!/(n - k)!$. The Jordan factorial function is also
defined (and very useful) for a general real or complex argument n. Notation 7.10.17 is not very safe in
differential geometry, which is infested with parentheses and subscripts.
7.10.19 Remark: The number of permutations of a set with n elements is $n!$ for $n \in \mathbb{Z}_0^+$.
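The inductive rules for the factorial and Jordan factorial functions translate directly into recursive code. The sketch below, with hypothetical function names fact and jordan, compares them with the closed forms and with brute-force counts of permutations and ordered selections for small arguments.

```python
from itertools import permutations
from math import factorial

def fact(n):
    """Factorial via the inductive rule f(0) = 1, f(n) = n f(n - 1)."""
    return 1 if n == 0 else n * fact(n - 1)

def jordan(n, k):
    """Jordan factorial via f(n, 0) = 1, f(n, k + 1) = (n - k) f(n, k)."""
    return 1 if k == 0 else (n - (k - 1)) * jordan(n, k - 1)

for n in range(7):
    # n! is the number of permutations of a set with n elements.
    assert fact(n) == factorial(n) == len(list(permutations(range(n))))
    for k in range(10):
        expected = factorial(n) // factorial(n - k) if k <= n else 0
        assert jordan(n, k) == expected
        # (n)_k is also the number of ordered selections of k items from n.
        assert jordan(n, k) == len(list(permutations(range(n), k)))
```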
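7.10.20 Definition: The Levi-Civita (alternating) symbol or Levi-Civita tensor (with n indices) is the
function $\epsilon : (\mathbb{N}_n \to \mathbb{N}_n) \to \{-1, 0, 1\}$, for some $n \in \mathbb{Z}_0^+$, defined by
$$\epsilon(f) = \begin{cases} -1 & \text{if } f \text{ is an odd permutation of } \mathbb{N}_n \\ 0 & \text{if } f \text{ is not a permutation of } \mathbb{N}_n \\ +1 & \text{if } f \text{ is an even permutation of } \mathbb{N}_n, \end{cases}$$
for all $f : \mathbb{N}_n \to \mathbb{N}_n$.
7.10.21 Notation: $\epsilon_{i_1, \ldots i_n}$ for $n \in \mathbb{Z}_0^+$ denotes the value $\epsilon(i)$ of the Levi-Civita symbol for $i = (i_k)_{k=1}^n$.
$\epsilon^{i_1, \ldots i_n}$ is an alternative notation for $\epsilon_{i_1, \ldots i_n}$.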
7.10.22 Remark: See Definition 7.10.3 for permutations. The Levi-Civita symbol is essentially the same
thing as the parity function in Definition 7.10.9. Only the notation is different.
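The Levi-Civita symbol can be sketched as a function of an index tuple. The following Python fragment assumes that the indices are taken from {1, ..., n}, where n is the number of indices, and it computes the parity by counting inversions. The name levi_civita is hypothetical.

```python
def levi_civita(*indices):
    """Value of the Levi-Civita symbol for a tuple of indices from {1, ..., n}."""
    n = len(indices)
    if sorted(indices) != list(range(1, n + 1)):
        return 0  # the index sequence is not a permutation of {1, ..., n}
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if indices[a] > indices[b])
    return -1 if inversions % 2 else 1

assert levi_civita(1, 2, 3) == 1    # identity permutation
assert levi_civita(2, 1, 3) == -1   # a single transposition
assert levi_civita(1, 1, 3) == 0    # repeated index
```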
7.11. Combinations and ordered selections

7.11.1 Remark: An ordered selection is also known as an “ordered sample”, a “k-permutation” or simply
a “permutation”. This is explained in Remark 7.10.2. A combination is called a “k-combination” by
EDM2 [35], article 330, but it could also be referred to as an “unordered selection” or “unordered sample”.
Although these concepts are important in probability theory, they are even more important in analysis,
particularly as coefficients of Taylor series.
7.11.2 Definition: The combination symbol is the function $C : \mathbb{Z}_0^+ \times \mathbb{Z}_0^+ \to \mathbb{Z}_0^+$ defined by
$$\forall n, r \in \mathbb{Z}_0^+,\quad C(n, r) = \#\{x \in \mathbb{P}(n);\ \#(x) = r\}.$$
7.11.3 Notation: $C^n_r$ for $n, r \in \mathbb{Z}_0^+$ denotes the value of the combination symbol $C(n, r)$.

7.11.4 Remark: Definition 7.11.2 means that $C^n_r$ is the number of distinct r-element subsets of an
n-element set. The notation $\binom{n}{r}$ is a popular alternative to $C^n_r$. However, when combination symbols are
mixed with other areas of mathematics, the notation $\binom{n}{r}$ tends to be ambiguous.

7.11.5 Theorem:
$$\forall n, r \in \mathbb{Z}_0^+,\quad C^n_r = \prod_{i=0}^{r-1} \frac{n - i}{1 + i}.$$
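The product formula can be compared with the subset-counting definition for small arguments. In this sketch, C_count and C_product are hypothetical names. Exact rational arithmetic is used so that the intermediate quotients are not rounded, and Python's math.comb serves as an independent reference value.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def C_count(n, r):
    """C(n, r) by counting the r-element subsets of n = {0, ..., n - 1}."""
    return sum(1 for _ in combinations(range(n), r))

def C_product(n, r):
    """C(n, r) from the product of (n - i)/(1 + i) over i = 0, ..., r - 1."""
    result = Fraction(1)
    for i in range(r):
        result *= Fraction(n - i, 1 + i)
    return result

for n in range(8):
    for r in range(10):
        assert C_count(n, r) == C_product(n, r) == comb(n, r)
```

When r > n, the product contains the factor n − n = 0, which agrees with the fact that an n-element set has no subsets with more than n elements.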
7.11.6 Remark: When the combination symbol values are arranged in a triangle, this is called “Pascal’s
triangle”, although Pascal called it an “arithmètic triangle” or “triangle arithmétique”. Pascal’s triangle has
so many interesting properties that whole books have been written about it. In particular, Blaise Pascal
wrote a small treatise “Traité du triangle arithmétique” [130] in 1654, published posthumously in 1665. He
wrote the following preface to the chapter on applications.
DIVERS VSAGES DV TRIANGLE
ARITHMETIQVE.
Dont le generateur est l’Vnité
Apres auoir donné les proportions qui se rencontrent entre les cellules & les rangs des Triangles
Arithmetiques, ie passe à diuers vsages de ceux dont le generateur est l’vnité; c’est ce qu’on verra
dans les traictez suiuans. Mais i’en laisse bien plus que ie n’en donne; c’est vne chose estrange
combien il est fertile en proprietez, chacun peut s’y exercer; I’auertis seulement icy, que dans toute
la suite, ie n’entends parler que des Triangles Arithmetiques, dont le generateur est l’vnité.
This may be translated into English as follows.
Various uses of the arithmètic triangle with unit generator
After having given the proportions which are encountered between the cells and the rows of
arithmètic triangles, I pass to various uses of those of which the generator is unity; that is what
one will see in the following tracts. But I have left out many more of them than I have given; it is
a strange thing how fertile in properties it is, everyone may exercise himself on it; I mention here
only that in all of the following, I intend only to talk of arithmètic triangles of which the generator
is unity.
The comment about fertile properties is more often translated as: “It is extraordinary how fertile in properties
this triangle is. Everyone can try his hand.” The conclusion one may draw from this is that it is pointless
to try to make a comprehensive list of properties of the combination symbol (i.e. Pascal’s triangle).
Pascal’s stipulation that “the generator is unity” is equivalent to part (i) of Theorem 7.11.9.
7.11.7 Remark: Struik [194], page 74, notes that Pascal’s triangle was published by Chinese
mathematicians Yáng Huī (about 1238–1298 AD) and Zhū Shìjié (about 1260–1320 AD) during the Sung dynasty
(960–1279 AD).
Yáng Huī presents us with the earliest extant representation of the Pascal triangle, which we find
again in a book c. 1303 written by Zhū Shìjié (Chu Shi-chieh).
However, it seems these Chinese authors merely wrote out six or eight rows of the triangle without system-
atically investigating its properties as Pascal did.
7.11.8 Remark: To establish that the “cells” of Pascal’s triangle are equal to the combination symbol
in Definition 7.11.2, it is necessary to show that the combination symbol satisfies the induction rule in
Theorem 7.11.9.
$$\begin{array}{ccc} C^n_{r-1} & & C^n_r \\ & C^{n+1}_r & \end{array}$$
7.11.9 Theorem:
(i) $\forall r \in \mathbb{Z},\ C^0_r = \delta_{r0}$.
(ii) $\forall n \in \mathbb{Z}_0^+,\ \forall r \in \mathbb{Z},\ C^{n+1}_r = C^n_{r-1} + C^n_r$.
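The induction rule generates Pascal's triangle row by row. The sketch below assumes that cells outside the triangle are treated as zero, and compares the generated cells with math.comb. The name pascal_rows is hypothetical.

```python
from math import comb

def pascal_rows(count):
    """Rows 0, ..., count - 1 of Pascal's triangle from the induction rule."""
    row = [1]                      # row 0: C(0, r) = delta_{r0}
    rows = [row]
    for _ in range(count - 1):
        padded = [0] + row + [0]   # cells outside the triangle count as zero
        # C(n + 1, r) = C(n, r - 1) + C(n, r)
        row = [padded[r] + padded[r + 1] for r in range(len(row) + 1)]
        rows.append(row)
    return rows

rows = pascal_rows(8)
assert all(rows[n][r] == comb(n, r) for n in range(8) for r in range(n + 1))
```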

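7.11.10 Notation:
$I^n_r$ for $n, r \in \mathbb{Z}_0^+$ denotes the set $\{f : \mathbb{N}_r \to \mathbb{N}_n;\ \forall j, k \in \mathbb{N}_r,\ j < k \Rightarrow f(j) < f(k)\}$.
$J^n_r$ for $n, r \in \mathbb{Z}_0^+$ denotes the set $\{f : \mathbb{N}_r \to \mathbb{N}_n;\ \forall j, k \in \mathbb{N}_r,\ j \le k \Rightarrow f(j) \le f(k)\}$.
7.11.11 Theorem:
(i) $\forall n, r \in \mathbb{Z}_0^+,\ \#(I^n_r) = C^n_r$.
(ii) $\forall n, r \in \mathbb{Z}_0^+,\ \#(J^n_r) = C^{n+r-1}_r$.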
Proof: [ A proof of the symmetric case (ii) is given by Feller [107], section II.5, page 38. (The antisymmetric
case is there too! But that’s much simpler.) ]
7.11.12 Remark: The notations Irn and Jrn in Notation 7.11.10 are non-standard. Federer [106], 1.3.2,
p.15, gives the notation Λ(n, r) for Irn .
Sets of increasing and non-decreasing sequences of indices (i1 , . . . ir ) selected from a complete set (1, . . . n)
are frequently encountered in tensor algebra. One often wishes to sum an expression for index sequence
values which have a unique representative for each subset of distinct index values.
It is perhaps noteworthy that Theorem 7.11.11 is equivalent to the following assertions.
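$$\forall n, r \in \mathbb{Z}_0^+,\quad \#(I^n_r) = \prod_{i=0}^{r-1} \frac{n - i}{1 + i}.$$
$$\forall n, r \in \mathbb{Z}_0^+,\quad \#(J^n_r) = \prod_{i=0}^{r-1} \frac{n + i}{1 + i}.$$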
These are more similar to each other than suggested by parts (i) and (ii) of Theorem 7.11.11.
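The two product formulas can be checked by enumerating the index sequences for small n and r. In the following sketch, the increasing sequences are generated by itertools.combinations and the non-decreasing sequences by itertools.combinations_with_replacement. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement

def increasing(n, r):
    """Strictly increasing index sequences of length r from {1, ..., n}."""
    return list(combinations(range(1, n + 1), r))

def non_decreasing(n, r):
    """Non-decreasing index sequences of length r from {1, ..., n}."""
    return list(combinations_with_replacement(range(1, n + 1), r))

def product_formula(n, r, sign):
    """Product of (n + sign * i)/(1 + i) over i = 0, ..., r - 1."""
    result = Fraction(1)
    for i in range(r):
        result *= Fraction(n + sign * i, 1 + i)
    return result

for n in range(6):
    for r in range(6):
        assert len(increasing(n, r)) == product_formula(n, r, -1)
        assert len(non_decreasing(n, r)) == product_formula(n, r, +1)
```

For example, with n = 3 and r = 2 there are 3 increasing sequences and 6 non-decreasing sequences.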
7.11.13 Notation: $f_i$ for a function $f : \mathbb{N}_n \to A$ and $i \in I^n_m$ for any set A means the sequence
$f \circ i = (f(i(j)))_{j=1}^m = (f_{i(j)})_{j=1}^m = (f_{i_1}, \ldots f_{i_m}) : \mathbb{N}_m \to A$.
7.11.14 Notation: $f^i$ for a function $f : \mathbb{N}_n \to A$ and $i \in I^n_m$ for any set A means the sequence
$f \circ i = (f(i(j)))_{j=1}^m = (f^{i(j)})_{j=1}^m = (f^{i_1}, \ldots f^{i_m}) : \mathbb{N}_m \to A$.
7.11.15 Remark: It may seem that Notations 7.11.13 and 7.11.14 are contradictory. Notation 7.11.14
often arises in contexts where the Einstein index convention (Remark 13.9.18) is used. According to this
convention, raised indices are a mnemonic which indicates that the sequence is contravariant in some sense
(whereas lowered indices are a mnemonic for covariance). Notation 7.11.14 simply preserves the choice of
subscript or superscript.
The set A in Notations 7.11.13 and 7.11.14 is typically the field of numbers which is used for coordinates
and coefficients of vectors or a set of vectors in a linear space.
[ Here insert ordered selections. See for example EDM2 [35], article 330. ]
7.12. List spaces for general sets

7.12.1 Remark: Lists are finite sequences of arbitrary length. A list of items in a set X is an element of
the set union $\bigcup_{i=0}^\infty X^i$. The set $X^i$ may be taken literally as the set of functions $f : i = \{0, 1, \ldots i - 1\} \to X$
or the set of functions $f : \mathbb{N}_i = \{1, 2, \ldots i\} \to X$. The initial index 0 is used in this section for maximum
convenience.
Although lists are familiar in computer software and everyday life, they are rarely formalized in mathematics
texts. The items in a list may be referred to also as components, elements or members.
One application for lists is in the definition of the exterior derivative of a differential form. (See Defini-
tion 20.6.2.) List spaces are also useful in homology theory.

[ Consider the space List(X) to be embedded in X ∞ ? ]

[ Should indicate for all list operations what the length of the resulting list is. ]
[ Many of the following list operators are well-defined for general functions. Make a separate definition for
such any-function operators. ]
7.12.2 Definition: For any set X, the list space List(X) on X is the set
$$\mathrm{List}(X) = \bigcup_{i \in \mathbb{Z}_0^+} X^i = \bigcup_{i=0}^\infty X^i,$$
together with the following operations:

(i) The length function $\mathrm{length} : \mathrm{List}(X) \to \mathbb{Z}_0^+$ defined by $\mathrm{length} : x \mapsto \#(\mathrm{Dom}(x))$.
(ii) The canonical injection $i : X \to \mathrm{List}(X)$ defined by $i : x \mapsto (x)$, where (x) is the one-element list
containing the single element x.
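(iii) The concatenation function $\mathrm{concat} : \mathrm{List}(X) \times \mathrm{List}(X) \to \mathrm{List}(X)$, defined by
$\forall x, y \in \mathrm{List}(X)$, $\mathrm{concat}(x, y) \in X^{\mathrm{length}(x)+\mathrm{length}(y)}$ and, for all indices i with $0 \le i < \mathrm{length}(x) + \mathrm{length}(y)$,
$$\mathrm{concat}(x, y)_i = \begin{cases} x_i & \text{if } i < \mathrm{length}(x) \\ y_{i-\mathrm{length}(x)} & \text{if } i \ge \mathrm{length}(x). \end{cases}$$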
(iv) The “restriction to n items” function. . .

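(v) The “omit item j” function $\mathrm{omit}_j : \mathrm{List}(X) \to \mathrm{List}(X)$ defined by:
$$\mathrm{omit}_j(\ell)(i) = \begin{cases} \ell_i & \text{if } i < j \\ \ell_{i+1} & \text{if } j \le i < \mathrm{length}(\ell) - 1. \end{cases}$$
That is,
$$\mathrm{omit}_j(\ell) = (\ell_0, \ldots, \ell_{j-1}, \ell_{j+1}, \ldots, \ell_{\mathrm{length}(\ell)-1}).$$
Note that $\mathrm{omit}_j$ acts as the identity on lists which have length less than or equal to j. Otherwise, $\mathrm{omit}_j$
maps $X^k$ to $X^{k-1}$, where $k = \mathrm{length}(\ell)$.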
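(vi) The “omit items j and k” function $\mathrm{omit}_{j,k} : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined for $j, k \in \mathbb{Z}_0^+$ such that $j \ne k$
by
$$\mathrm{omit}_{j,k}(\ell)(i) = \begin{cases} \ell_i & \text{if } i < \min(j, k) \\ \ell_{i+1} & \text{if } \min(j, k) \le i < \max(j, k) - 1 \\ \ell_{i+2} & \text{if } \max(j, k) - 1 \le i < \mathrm{length}(\ell) - 2. \end{cases}$$
Note that $\mathrm{omit}_{j,k}$ acts as the identity on lists which have length less than or equal to min(j, k).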
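(vii) The “swap items j and k” operator $\mathrm{swap}_{j,k} : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined for $j, k \in \mathbb{Z}_0^+$ by
$$\mathrm{swap}_{j,k}(\ell)(i) = \begin{cases} \ell_k & \text{if } i = j \text{ and } j \in \mathrm{Dom}(\ell) \\ \ell_j & \text{if } i = k \text{ and } k \in \mathrm{Dom}(\ell) \\ \ell_i & \text{if } i \notin \{j, k\}. \end{cases}$$
The operator $\mathrm{swap}_{j,k}$ equals the identity function if j = k. (This operator is well-defined for any
function, not just for lists.)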
(viii) The “substitute x at position j” operator $\mathrm{subs}_{j,x} : \mathrm{List}(X) \to \mathrm{List}(X)$ defined by:
$$\mathrm{subs}_{j,x}(\ell)(i) = \begin{cases} x & \text{if } i = j \\ \ell_i & \text{if } i \ne j. \end{cases}$$
The operator $\mathrm{subs}_{j,x}$ equals the identity on lists $\ell$ for which $j \notin \mathrm{Dom}(\ell)$. (This operator is well-defined
for any function, not just for lists.)

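(ix) The “insert x at position j” function $\mathrm{insert}_{j,x} : \mathrm{List}(X) \to \mathrm{List}(X)$ defined by:
$$\mathrm{insert}_{j,x}(\ell)(i) = \begin{cases} \ell_i & \text{if } i < j \\ x & \text{if } i = j \\ \ell_{i-1} & \text{if } j < i < \mathrm{length}(\ell) + 1. \end{cases}$$
That is,
$$\mathrm{insert}_{j,x}(\ell) = (\ell_0, \ldots, \ell_{j-1}, x, \ell_j, \ldots, \ell_{\mathrm{length}(\ell)-1}).$$
The operator $\mathrm{insert}_{j,x}$ acts as the identity on lists which have length less than or equal to j.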
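(x) The “subsequence of length n starting at k” function $\mathrm{subseq}_{k,n} : \mathrm{List}(X) \to \mathrm{List}(X)$ is defined by
$$\mathrm{subseq}_{k,n}(\ell)(i) = \begin{cases} \ell_{k+i} & \text{for } 0 \le i < \min(n, \mathrm{length}(\ell) - k) \\ 0 & \text{otherwise.} \end{cases}$$
This function maps $X^j$ to $X^m$, where $m = \min(n, j - k)$.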
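The list operations above can be sketched in Python by representing lists as tuples with initial index 0. This is only an illustration under stated assumptions. The argument order is a hypothetical choice, and wherever the definition leaves the out-of-range behaviour open, the sketch treats the operation as the identity or omits only the items which exist.

```python
def concat(x, y):
    """Concatenation: length(concat(x, y)) = length(x) + length(y)."""
    return x + y

def omit(j, lst):
    """Drop the item at index j; identity if length(lst) <= j."""
    return lst if len(lst) <= j else lst[:j] + lst[j + 1:]

def omit2(j, k, lst):
    """Drop items j and k (j != k). Assumption: an out-of-range index is ignored."""
    return omit(min(j, k), omit(max(j, k), lst))

def swap(j, k, lst):
    """Exchange items j and k. Assumption: identity unless both indices exist."""
    if j >= len(lst) or k >= len(lst):
        return lst
    out = list(lst)
    out[j], out[k] = out[k], out[j]
    return tuple(out)

def subs(j, x, lst):
    """Replace item j by x; identity if j is not in the domain."""
    return lst if j >= len(lst) else lst[:j] + (x,) + lst[j + 1:]

def insert(j, x, lst):
    """Put x at index j; identity if length(lst) <= j, as stated in the text."""
    return lst if len(lst) <= j else lst[:j] + (x,) + lst[j:]

def subseq(k, n, lst):
    """Up to n items starting at index k."""
    return lst[k:k + n]

ell = ("a", "b", "c", "d", "e")
assert omit(1, ell) == ("a", "c", "d", "e")
assert omit2(1, 3, ell) == ("a", "c", "e")
assert insert(2, "x", ell) == ("a", "b", "x", "c", "d", "e")
assert subseq(1, 3, ell) == ("b", "c", "d")
```

The lengths of the results agree with the text: omit reduces the length by one, insert increases it by one, and subseq returns min(n, length − k) items when k does not exceed the length.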
7.12.3 Remark: It is often convenient to use the initial index 1 instead of 0 in Definitions 7.12.2, 9.12.1
and 9.12.2. Then all operations are the same except that the indices are shifted. It may be useful to define
list spaces with a variable starting index or even general indices instead of integer indices.
7.12.4 Remark: It is often useful to allow a mixture of finite and infinite sequences as in Definition 7.12.5.
[ Should define list operations for Definition 7.12.5. ]
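7.12.5 Definition: For any set X, the extended list space $\overline{\mathrm{List}}(X)$ on X is the set
$$\begin{aligned} \overline{\mathrm{List}}(X) &= X^\omega \cup \bigcup_{i=0}^\infty X^i \\ &= \bigcup \{X^i;\ i \in \overline{\mathbb{Z}}_0^+\} \\ &= \bigcup \{X^i;\ i \in \omega^+\} \\ &= \mathrm{List}(X) \cup X^\omega. \end{aligned}$$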
7.12.6 Remark: The notation $\overline{\mathrm{List}}(X)$ for an extended list space is supposed to suggest an analogy with
$\overline{\mathbb{Z}}_0^+ = \mathbb{Z}_0^+ \cup \{\infty\}$ and other such extended number sets. (See Notation 7.6.4 for $\overline{\mathbb{Z}}_0^+$.) A suitable alternative
would be $\mathrm{List}^*(X)$, which suggests completion of a topological space, as for example $\overline{\mathbb{R}}_0^+ = \mathbb{R}_0^+ \cup \{\infty\}$.
7.13. Reformulation of logic in terms of axiomatic mathematics

As mentioned in Remarks 3.13.4 and 3.13.8, in order to avoid circularity in definitions, it is necessary to
first define symbolic logic in terms of naive mathematics, and then later return to make a reformulation of
mathematical logic in terms of the axiomatic mathematics which has been built on the logical foundations.
Since mathematical logic seems to require only sets, order, numbers, relations, functions and other such
elementary concepts, it seems that this section of the book is a suitable location for a systematic reformulation
of mathematical logic in terms of formal set theory and numbers.

Chapter 8
Rational and real numbers
8.1 Rational numbers
8.2 Extended rational numbers
8.3 Real numbers
8.4 Extended real numbers
8.5 Real number tuples
8.6 Some useful basic real-valued functions
8.7 Complex numbers
8.1. Rational numbers

[ Give sets of axioms for the rational numbers here. One set in terms of the integers $\mathbb{Z}$. The other set specifying
the rationals from scratch. ]
8.1.1 Definition: The set of rational numbers is the set $\{[(n, d)];\ n \in \mathbb{Z},\ d \in \mathbb{Z} \setminus \{0\}\}$, where $[(n, d)]$
denotes the equivalence class $\{(n', d') \in \mathbb{Z}^2;\ d' \ne 0 \text{ and } nd' = n'd\}$ for all pairs $(n, d) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$.
8.1.2 Notation: $\mathbb{Q}$ denotes the set of rational numbers.
8.1.3 Remark: Definition 8.1.1 is a popular style of representation for the set of rational numbers in terms
of equivalence classes of pairs of integers with equal ratios. This is illustrated in Figure 8.1.1. (This diagram
is reminiscent of some constructions in projective geometry.)
Figure 8.1.1 Construction of rational numbers from integers
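The equivalence relation in Definition 8.1.1 can be tested without any division. The sketch below represents a rational number by any integer pair (n, d) with d ≠ 0. The function canonical is a hypothetical choice of one representative per equivalence class (lowest terms with positive denominator), which is the same normalization that Python's fractions.Fraction applies.

```python
from fractions import Fraction
from math import gcd

def equivalent(p, q):
    """(n, d) ~ (n', d') if and only if n d' = n' d, with non-zero denominators."""
    (n, d), (n2, d2) = p, q
    assert d != 0 and d2 != 0
    return n * d2 == n2 * d

def canonical(p):
    """Representative of the class of p in lowest terms with positive denominator."""
    n, d = p
    g = gcd(n, d) * (1 if d > 0 else -1)
    return (n // g, d // g)

assert equivalent((1, 2), (-3, -6))
assert not equivalent((1, 2), (2, 3))
assert canonical((-3, -6)) == (1, 2)
assert canonical((4, -6)) == (-2, 3)
q = Fraction(4, -6)
assert (q.numerator, q.denominator) == canonical((4, -6))
```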


[ Present floating-point representations of rational numbers, such as $2 \times (2 \times 2^M) \times 2^N$, where the first
component is the sign bit, the second component is the sign of the exponent, the third component is the
absolute value of the exponent (or something like that, e.g. 2’s complement representation), and the last
component is the “fractional part” of the number. ]
[ Present a list of reasonable representations for rational numbers similar to the list in Remark 8.3.3. ]
[ Define a canonical embedding of the integers inside $\mathbb{Q}$, and also all of the order, arithmètic and metric
structures which $\mathbb{Q}$ has. Also the subtraction, reciprocal and division operations, and the lowest common
denominator, the norm/modulus, and normalized “mixed fractions”. Also define decimal and binary expan-
sions, and floating point representations. ]
[ Define intervals and interval notations [q1 , q2 ] etc. Present the use of the ∞ symbol to represent unbounded
intervals. ]
8.1.4 Notation:
$\mathbb{Q}^+$ denotes the set of positive rational numbers $\{q \in \mathbb{Q};\ q > 0\}$.
$\mathbb{Q}_0^+$ denotes the set of non-negative rational numbers $\{q \in \mathbb{Q};\ q \ge 0\}$.
$\mathbb{Q}^-$ denotes the set of negative rational numbers $\{q \in \mathbb{Q};\ q < 0\}$.
$\mathbb{Q}_0^-$ denotes the set of non-positive rational numbers $\{q \in \mathbb{Q};\ q \le 0\}$.
[ Define the arithmetic of intervals and general subsets of $\mathbb{Q}$. For example [a, b] + [c, d] and q.[a, b]. ]
[ Do a version of Theorem 8.3.11 for rational numbers. Define inf and sup for subsets of $\mathbb{Q}$. ]
[ Present arithmetics here for computer representations of rational numbers. In particular, define floating
point representations and arithmetic. ]
[ Define Dedekind cuts and Cauchy sequences for $\mathbb{Q}$ and show non-closure. ]
[ Rational number tuples should be presented somewhere, maybe in the matrix algebra chapter, Chapter 11.
Show closure under linear transformations with rational number matrix coefficients. Although matrix mul-
tiplication belongs in the matrix chapter, tuples and non-algebraic operations on tuples probably should be
presented near here. ]
[ Bresenham’s line algorithm is used for the rasterisation of lines with rational parameters, using an integer
grid. Perhaps such algorithms should be discussed near here. They are used in practice for the implemen-
tation of rational and real numbers on real-world systems. ]
8.2. Extended rational numbers
[ Give an axiomatization or two for the extended rational numbers. ]
8.2.1 Definition: The set of extended rational numbers is the set $\mathbb{Q} \cup \{-\infty, \infty\}$.
8.2.2 Notation: $\overline{\mathbb{Q}}$ denotes the set of extended rational numbers $\mathbb{Q} \cup \{-\infty, \infty\}$.
[ Define arithmetic, order, metric, etc. on $\overline{\mathbb{Q}}$. ]
8.2.3 Remark: The “infinite rational numbers” −∞ and ∞ are not easy to represent. It is possible to
artificially define $-\infty = \{(n, 0);\ n \in \mathbb{Z}^-\}$ and $\infty = \{(n, 0);\ n \in \mathbb{Z}^+\}$ by analogy with the ordered pair
equivalence classes in Definition 8.1.1. However, the arithmètic rules will not then be consistent with the
finite rationals. Since the rules for order and arithmetic must be specified separately for the infinite rationals
anyway, one may as well leave their representation unspecified. The choice of representation is of purely
academic interest!
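8.2.4 Notation:
$\overline{\mathbb{Q}}^+$ denotes the set of positive extended rational numbers $\{q \in \overline{\mathbb{Q}};\ q > 0\}$.
$\overline{\mathbb{Q}}_0^+$ denotes the set of non-negative extended rational numbers $\{q \in \overline{\mathbb{Q}};\ q \ge 0\}$.
$\overline{\mathbb{Q}}^-$ denotes the set of negative extended rational numbers $\{q \in \overline{\mathbb{Q}};\ q < 0\}$.
$\overline{\mathbb{Q}}_0^-$ denotes the set of non-positive extended rational numbers $\{q \in \overline{\mathbb{Q}};\ q \le 0\}$.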
[ Give a comparative table of (extended) rational number notations for various authors, including the sets of
positive and negative rational numbers. ]
[ Define general extended real number intervals. Define arithmetic for extended real number intervals. ]

8.3. Real numbers

[ Refer to some numerical analysis concepts near here. Computers use rational numbers which approximate
to real numbers. ]
8.3.1 Remark: The “floating-point numbers” which are implemented in digital computer hardware rep-
resent rational numbers, not real numbers, although the term “real numbers” is often used incorrectly for
computer numbers. However, rational numbers are often used in practice to represent intervals of real num-
bers. For example, a decimal floating-point number like 3.141592654 is understood to indicate an interval
such as [3.1415926535, 3.1415926545]. Whether computer hardware number implementations are interpreted
as rational numbers or intervals of real numbers, their operations are generally only approximate at best.
The purpose of mathematical definitions of real numbers is to remove the ambiguity and imprecision of
the informal every-day usage. The idea of number intervals as an interpretation for the common usage of
numbers provides a natural basis for a rigorous definition of real numbers.
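The rational nature of hardware floating-point numbers can be observed directly in Python, whose float type is a binary floating-point number on typical platforms. The following sketch uses the standard methods float.as_integer_ratio and the decimal module. The helper names exact_value and decimal_interval are hypothetical, and the half-unit interval is only one possible interpretation of a decimal numeral.

```python
from decimal import Decimal
from fractions import Fraction

def exact_value(x):
    """The rational number which the floating-point value x represents exactly."""
    n, d = x.as_integer_ratio()
    return Fraction(n, d)

def decimal_interval(text):
    """Interval of half a unit in the last place around a decimal numeral."""
    d = Decimal(text)
    half_unit = Decimal(1).scaleb(d.as_tuple().exponent) / 2
    return (d - half_unit, d + half_unit)

tenth = exact_value(0.1)
assert tenth != Fraction(1, 10)                             # 0.1 is not stored exactly
assert (tenth.denominator & (tenth.denominator - 1)) == 0   # denominator is a power of two
assert 0.1 + 0.2 != 0.3                                     # arithmetic is only approximate
assert decimal_interval("3.141592654") == (Decimal("3.1415926535"), Decimal("3.1415926545"))
```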
8.3.2 Remark: The real numbers sometimes seem to have some surprising properties, and one may wonder
whether they are an accurate model for the universe. However, one must always remember that mathematical
systems only give conclusions if the assumptions are true. In the case of real numbers, it is highly unlikely
that any physical system would satisfy the axioms. It is dubious that even the rational numbers could be
a correct model for any physical system. Certainly measurements in physics do not provide unbounded
significant figures of accuracy. All physical measurements have a limited resolution.
Although physical phenomena (the appearances of things) do not offer unbounded resolution, physical
noumena (the underlying true natures of things) could possibly offer unbounded or infinite resolution in
some systems. But there is no way of knowing the true natures of things. When real numbers are used to
model noumena, one can only say that the finite-resolution approximations which are provided by phenom-
ena seem to match very well with many of the models in physics. If the underlying things behind phenomena
are not in fact modelled correctly by the real numbers, this does not really matter. Physics never says what
a thing is, only how it appears.
The real number system only tells us the properties of a modelled system if it precisely matches the speci-
fication. So if the properties of real numbers seem a little odd, and do not match exactly our experience of
the phenomena of the universe, it doesn’t matter because we will never be able to check. Therefore the real
number system should not be taken too seriously. It’s only a model.
[ Insert here a set of axioms for the real numbers. Maybe give one axiomatization from scratch, and one based
on $\mathbb{Q}$. Then give the Cantor representation of real numbers as equivalence classes of Cauchy sequences.
Probably some work is required to show that the definition satisfies the axioms. See EDM2 [35], section 294.E
for both Dedekind and Cantor representations of real numbers. See Spivak [144], pages 487–512, for axiomatic
definition of complete ordered fields (i.e. the real numbers), the Dedekind and Cantor constructions, and
proof of uniqueness of real numbers up to isomorphism. ]
8.3.3 Remark: Representations of the real numbers include the following.

(1) Dedekind cuts.
(2) Cauchy sequences.
(3) $\mathbb{Z} \times 2^\omega$. The first component represents the floor of the real number. (See Definition 8.6.8.) The second
component represents the fractional part of the number. (See Definition 8.6.13.)
(4) $2 \times \omega \times 2^\omega$. The first component represents the sign of the real number. (See Definition 8.6.2.) The second
component represents the floor of the absolute value of the number. (See Definition 8.6.1.) The third
component represents the fractional part of the absolute value of the number. (See Definition 8.6.13.)
(5) $2 \times \mathbb{Z} \times 2^\omega$. The first component represents the sign of the real number. The second component represents
the shift of the binary “decimal point” of the absolute value of the number. The third component
represents the fractional part of the absolute value of the number.
(6) $\mathbb{Z} \times \{x \in 2^\omega;\ \#\{i \in \omega;\ x(i) = 0\} = \infty\}$. The first component represents the floor of the real number.
(See Definition 8.6.8.) The second component represents the fractional part of the number. By contrast
with option (3), there is no need to apply an equivalence class to this representation.
(7) $\{u \in 2^\omega;\ \#\{i \in \omega;\ u(i) = 1\} < \infty\} \times \{x \in 2^\omega;\ \#\{i \in \omega;\ x(i) = 0\} = \infty\}$. This is the same as option (6)
except that the first component (the floor of the real number) is represented as a finite sequence of
binary ones.
Other representations may be based, for example, on the usual floating point representations for rational
numbers, or could replace the base 2 with other integers which are greater than 1.
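The floor-plus-binary-fraction style of representation can be illustrated for rational inputs, where exact arithmetic is available. In this sketch, floor_and_bits and value are hypothetical names. The greedy digit extraction never produces a tail consisting entirely of ones, so the digit sequences lie in the restricted set of sequences with infinitely many zeros. Only a finite prefix of the digit sequence is computed.

```python
from fractions import Fraction
from math import floor

def floor_and_bits(q, count):
    """Floor of q and the first `count` binary digits of its fractional part."""
    whole = floor(q)
    frac = q - whole
    bits = []
    for _ in range(count):
        frac *= 2
        bit = floor(frac)      # 0 or 1
        bits.append(bit)
        frac -= bit
    return whole, bits

def value(whole, bits):
    """Partial sum whole + sum of bits[n] 2^(-n-1)."""
    return whole + sum(Fraction(b, 2 ** (n + 1)) for n, b in enumerate(bits))

w, b = floor_and_bits(Fraction(-5, 4), 4)   # -5/4 = -2 + 3/4
assert (w, b) == (-2, [1, 1, 0, 0])
assert value(w, b) == Fraction(-5, 4)
```

For a number such as 1/3 with a non-terminating binary expansion, the partial sum underestimates the number by less than 2^(−count).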
[ Define some or all of the real-number representations in Remark 8.3.3. Show the correspondences between
them. Maybe show how arithmetic is done in some or all of these representations. ]
[ $\mathbb{Q}$ is a set of equivalence classes on $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$. Define $\mathbb{R}$ as a set of equivalence classes on
$\{f \in \mathbb{Q}^\omega;\ f \text{ is a Cauchy sequence}\}$, or Dedekind cuts. Define $\mathbb{R}^n$ as the set of functions from n to $\mathbb{R}$. Make all
vectors have indices which start at 0. ]
8.3.4 Remark: The most popular representations for the real numbers in pure mathematics are the
Dedekind representation and the Cantor representation. The former defines real numbers as semi-infinite
intervals of the rational numbers called Dedekind cuts. The latter defines real numbers as equivalence classes
of Cauchy sequences.
The Dedekind representation exploits the total order on the rationals to “fill the gaps”. The Cantor repre-
sentation exploits the distance function on the rational numbers to “fill the gaps”. The total order and the
distance function on the rational numbers are not unrelated. However, the distance function approach is
much more generally applicable. Any metric space may be completed (i.e. the “gaps” can be filled in) using
the Cauchy sequence approach. On the other hand, the total order approach requires much less convoluted
logic for its construction.
In applied mathematics and the sciences, the most popular representations of the real numbers are as finite
and infinite decimal or binary expansions. However, irrational numbers cannot be represented precisely by
such expansions, since they require an infinite amount of information.
8.3.5 Definition: A Cauchy sequence of rational numbers is a sequence $x : \omega \to \mathbb{Q}$ such that
$$\forall \varepsilon \in \mathbb{Q}^+,\ \exists n \in \omega,\ \forall i, j \ge n,\quad |x_i - x_j| < \varepsilon.$$
[ Definition 8.3.6 is only one representation of the real numbers. There should be a definition of “a system of
real numbers” rather than “the system of real numbers”. This should be followed by several representations.
Define the Dedekind, Cauchy sequence, decimal and binary representations. The decimal representation is
perhaps closest to how people really think about the real numbers? Finite decimal expansions are what we
typically get from measurements of the real world because that’s how measurement equipment is designed.
Rational numbers are a consequence of dividing up units into equal portions. Then it is only reasonable to
assume that the gaps between rationals are filled in somehow. The Cantor representation is closely related
to the concept of “successive approximation”. ]
8.3.6 Definition: The set of real numbers is the set of equivalence classes of all Cauchy sequences of
rational numbers, where two Cauchy sequences x = (xi )i∈ω and y = (yi )i∈ω are considered equivalent if the
combined sequence (zi )i∈ω defined by z2i = xi and z2i+1 = yi for all i ∈ ω is a Cauchy sequence.
8.3.7 Notation: IR denotes the set of real numbers.
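The interleaving test for equivalence of Cauchy sequences can be explored numerically with exact rational arithmetic. The sketch below builds two different rational sequences which approach the square root of 2, one by Newton iteration and one by bisection, and examines the spread of the tail of the combined sequence. All function names are hypothetical, and a check on a finite window can only suggest, not prove, that a sequence is Cauchy.

```python
from fractions import Fraction

def newton_sqrt2(count):
    """Rational Newton iterates x -> (x + 2/x)/2, which approach sqrt(2) from above."""
    x, out = Fraction(2), []
    for _ in range(count):
        out.append(x)
        x = (x + 2 / x) / 2
    return out

def bisect_sqrt2(count):
    """Rational lower bounds for sqrt(2) obtained by interval bisection."""
    lo, hi, out = Fraction(1), Fraction(2), []
    for _ in range(count):
        out.append(lo)
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return out

def interleave(x, y):
    """Combined sequence z with z[2i] = x[i] and z[2i+1] = y[i]."""
    return [t for pair in zip(x, y) for t in pair]

def tail_spread(z, n):
    """max |z_i - z_j| over i, j >= n, within the finite window available."""
    return max(z[n:]) - min(z[n:])

z = interleave(newton_sqrt2(12), bisect_sqrt2(12))
assert tail_spread(z, 0) == 1
assert tail_spread(z, 20) < Fraction(1, 2 ** 9)

# The constant sequences 1 and 2 are each Cauchy, but their interleaving is not.
w = interleave([Fraction(1)] * 12, [Fraction(2)] * 12)
assert tail_spread(w, 20) == 1
```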
[ Define order on IR. ]
8.3.8 Notation:
$\mathbb{R}^+$ denotes the set of positive real numbers $\{r \in \mathbb{R};\ r > 0\}$.
$\mathbb{R}_0^+$ denotes the set of non-negative real numbers $\{r \in \mathbb{R};\ r \ge 0\}$.
$\mathbb{R}^-$ denotes the set of negative real numbers $\{r \in \mathbb{R};\ r < 0\}$.
$\mathbb{R}_0^-$ denotes the set of non-positive real numbers $\{r \in \mathbb{R};\ r \le 0\}$.
8.3.9 Remark: Some authors use $\mathbb{R}^+$ for non-negative real numbers, but then there would be no obvious
notation for positive reals. It would be tedious to have to write $\mathbb{R}^+ \setminus \{0\}$ for the positive reals.
Some authors use notations such as $\mathbb{R}_+$ and $\mathbb{Z}_+$ instead of the superscript versions. The advantage of this
would be that you could write $\mathbb{R}^n_+$ for the Cartesian product of n copies of $\mathbb{R}_+$. However, this would cause
ambiguity for notations like $\mathbb{R}^0_+$ for the non-negative real numbers.

8.3.10 Definition: An open interval of the real numbers is any set of the form (a, b) = {x ∈ IR; a < x <
b}, (a, ∞) = {x ∈ IR; a < x}, (−∞, b) = {x ∈ IR; x < b} or IR, for some a, b ∈ IR with a ≤ b.
A closed interval of the real numbers is any set of the form [a, b] = {x ∈ IR; a ≤ x ≤ b}, [a, ∞) = {x ∈
IR; a ≤ x}, (−∞, b] = {x ∈ IR; x ≤ b} or IR, for some a, b ∈ IR with a ≤ b.
A closed-open interval of the real numbers is any set of the form [a, b) = {x ∈ IR; a ≤ x < b} for some
a, b ∈ IR with a ≤ b.
An open-closed interval of the real numbers is any set of the form (a, b] = {x ∈ IR; a < x ≤ b} for some
a, b ∈ IR with a ≤ b.
A semi-closed interval or semi-open interval of the real numbers is any open-closed or closed-open interval
of the real numbers.
An interval of the real numbers is any open, closed or semi-closed interval of the real numbers.
The closed unit interval of the real numbers is the set [0, 1].
The open unit interval of the real numbers is the set (0, 1).
8.3.11 Theorem: A non-empty subset I of IR is an interval if and only if I ⊇ (inf I, sup I).
Proof: It is easy to show that all non-empty intervals satisfy I ⊇ (inf I, sup I). So suppose that I is a
non-empty subset of IR which satisfies I ⊇ (inf I, sup I).
8.3.12 Remark: In terms of the standard topology on the real numbers, a subset of IR is connected if and
only if it is an interval. (See Definition 14.10.1 for the standard topology on the real numbers.) Note that
an open interval may be the empty set. In this case, inf I = ∞ and sup I = −∞. So (inf I, sup I) = ∅.
[ Near here define sup and inf for subsets of IR. Show existence of these for bounded sets etc. Is this the Heine-
Borel theorem? How does this task differ for the Cantor and Dedekind definitions of real numbers? Since
the supremum and infimum may be infinite, this requires the definition of positive and negative infinities
before extended real numbers are defined. ]
[ Show that the real numbers are uncountable by using a diagonal construction from an ordering of a supposedly
countable set of real numbers. Maybe give two or more proofs of the uncountability of the real numbers.
Comment on the constructibility issues for this proof. ]
[ Even more than in the case of integers, the concept of “compressibility” of real numbers may be a useful
concept. (See comment at end of Remark 2.11.1.) Whereas all integers have a finite representation, almost
all real numbers have no finite representation, whether compressed or not. Therefore one could usefully
introduce a “code book” concept to select a finite or countable subset of the real numbers with partial or full
closure under arithmètic operations. Such a space of compressible numbers could provide a limited universe
of real numbers for doing arithmetic for some purposes. Some simple examples of this are the algebraic real
numbers and the sub-algebras of the real numbers generated by solutions of algebraic equations. ]
8.3.13 Remark: If space and time turn out to be discrete in some sense in some future fundamental theory
of the universe, it may be necessary to abandon the real numbers. It would be interesting to try to develop
various potential replacements for the real numbers in anticipation of future developments.
8.4. Extended real numbers

[ Give an axiomatization or two for the extended real numbers. ]
8.4.1 Definition: The extended real number system is. . .

8.4.2 Notation: $\overline{\mathbb{R}}$ denotes the set of extended real numbers $\mathbb{R} \cup \{-\infty, \infty\}$.
[ Give a comparative table of (extended) real number notations for various authors, including the sets of
positive and negative real numbers. ]

8.4.3 Remark: The notation $\overline{\mathbb{R}}$ for the extended real numbers is used by very few authors. (Federer [106],
2.1.1, page 51, does use this over-bar notation.) The principal advantage of the over-bar notation is the
ability to add superscripts, which is difficult for the more popular asterisk superscript notation $\mathbb{R}^*$.
The asterisk superscript has the advantage of suggesting the usual notation for topological compactification
of a set. (See for example Taylor [145], page 34.) However, the topological space asterisk usually denotes
a one-point compactification, whereas $\overline{\mathbb{R}}$ is a two-point compactification. So there is not much loss in
abandoning the asterisk superscript.
[ Give a full set of arithmètic operations for the real numbers. ]
8.4.4 Remark: The “number” infinity, denoted ∞, may be thought of as the solution of the equation:
$x = x + 1$. (8.4.1)
This is not much more absurd than defining the imaginary number $i$ as the solution of $x^2 = -1$. The
equation $x = x + 1$ may be “solved” by iteration. Start with $x = 1$ and substitute this into the right hand
side. This gives $x = 2$. Now substitute again to obtain $x = 3$. By continuing this an infinite number of
times, the solution $x$ converges to infinity! Another way to solve Equation (8.4.1) is to divide both sides
by $x$, which gives $1 = (x + 1)/x$, which means that the ratio of $x + 1$ to $x$ equals 1, which is a more and
more accurate approximation as $x$ tends to infinity.
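8.4.5 Notation:
$\overline{\mathbb{R}}^+$ denotes the set of positive extended real numbers $\{r \in \overline{\mathbb{R}};\ r > 0\}$.
$\overline{\mathbb{R}}^+_0$ denotes the set of non-negative extended real numbers $\{r \in \overline{\mathbb{R}};\ r \ge 0\}$.
$\overline{\mathbb{R}}^-$ denotes the set of negative extended real numbers $\{r \in \overline{\mathbb{R}};\ r < 0\}$.
$\overline{\mathbb{R}}^-_0$ denotes the set of non-positive extended real numbers $\{r \in \overline{\mathbb{R}};\ r \le 0\}$.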
[ Define intervals of extended real numbers. Give notations. ]
8.5. Real number tuples

[ There is perhaps not much that can be said about rational and real number tuples without getting into
matrix algebra. However, it is useful to give some brief notations and definitions here. There could be some
“issues” arising from the superscripts on the positive/negative subsets. ]
8.5.1 Definition: $\mathbb{R}^n$ denotes the set of real $n$-tuples for $n \in \mathbb{Z}^+_0$.
8.5.2 Remark: See Notation 7.7.1 for the definition of $X^n$ for general sets $X$. The conventional index set
of all $n$-tuples in $\mathbb{R}^n$ is the set $N_n = \{1, 2, \ldots, n\}$, although there are strong arguments in favour of the
index set $n = \{0, 1, \ldots, n - 1\}$.
8.5.3 Definition: The $m,n$-concatenation operator for real (number) tuples for $m, n \in \mathbb{Z}^+_0$ is the function
$Q_{m,n} : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^{m+n}$ defined by
$Q_{m,n} : ((x_1, \ldots, x_m), (y_1, \ldots, y_n)) \mapsto (x_1, \ldots, x_m, y_1, \ldots, y_n)$.
8.6. Some useful basic real-valued functions

[ See the document fund.tex for more basic function definitions. ]
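8.6.1 Definition: The absolute value function $|\cdot| : \mathbb{R} \to \mathbb{R}$ is defined by
$$\forall x \in \mathbb{R}, \quad |x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x \le 0. \end{cases}$$
8.6.2 Definition: The sign function of a real variable is defined by
$$\forall x \in \mathbb{R}, \quad \mathrm{sign}(x) = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0. \end{cases}$$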
8.6.3 Remark: The sign function is also called the signum function (from the Latin word “signum”).
Some authors use the notation sgn(x) for sign(x). The absolute value function is sometimes known as the
“modulus”. The absolute value and sign functions are illustrated in Figure 8.6.1.

Figure 8.6.1 Absolute value and sign functions (two panels: |x| and sign(x), each plotted against x)
8.6.4 Theorem: ∀x ∈ IR, sign(x).|x| = x.
Proof: sign(x).|x| equals 1.x = x for x > 0, (−1).(−x) = x for x < 0, and 0.x = x for x = 0.
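The case analysis in this proof can be spot-checked numerically. The following is only a minimal sketch: the function name `sign` and the list `samples` are illustrative choices, not part of the text, and exact rational arithmetic is used so that the equality test is not affected by rounding.

```python
from fractions import Fraction

def sign(x):
    """Sign function of Definition 8.6.2: returns 1, 0 or -1."""
    return (x > 0) - (x < 0)

# Exact sample values covering the three cases x < 0, x = 0 and x > 0.
samples = [Fraction(-7, 2), Fraction(-1), Fraction(0), Fraction(1, 3), Fraction(5)]

# Theorem 8.6.4: sign(x) * |x| == x for every sample.
for x in samples:
    assert sign(x) * abs(x) == x
```

A finite check of this kind proves nothing, of course. It only guards against transcription errors in the definitions.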
8.6.5 Remark: The Heaviside function H : IR → IR is variously defined with H(0) equal to 0, 1 or 1/2.
In generalized function and transform contexts where it usually appears, the value of H(0) often has no
significance. As a Fourier transform, it usually is best to set H(0) = 1/2, but for simplicity in calculations,
H(0) equal to 0 or 1 is more convenient. It is often advantageous for functions to be right-continuous, which
suggests that H(0) should equal 1. Generally the value is chosen to suit the application.
[ Give a comparative table of conventions for the Heaviside function’s notation and value at 0. ]
EDM2 [35] gives H(0) = 1 in section 125.E and appendix A, 12.II, but H(0) = 1/2 in section 306.B. Rudin [138],
page 180, exercise 24 gives H(0) = 0. Treves [147], pages 26 and 240, leaves H(0) undefined. Definition 8.6.6
gives the Heaviside function in terms of the signum function, which forces H(0) = 1/2.
Some authors give the notation ε(t) for the Heaviside function of t. (E.g. see CRC [156], page F-180.) The
Heaviside function is also called the unit step function.
8.6.6 Definition: The Heaviside function H : IR → IR is defined by ∀x ∈ IR, H(x) = (1 + sign(x))/2.
8.6.7 Remark: As illustrated in Figure 8.6.2, the Heaviside function value $H(0)$ is not the same as $H(0)\,0^0$.
This could be regarded as a bad thing. Generally the preferred value for $0^0$ would be 0. More serious, perhaps,
is the fact that $H(x)^p \ne H(x)$ for $x = 0$ and integers $p > 1$. The equality $H(x)^p = H(x)$ does hold if $H(0)$
equals 0 or 1 (but not for any other values).
Figure 8.6.2 Heaviside function multiplied by monomial (three panels: $H(x) \not\equiv H(x)x^0$, $H(x)x^1$ and $H(x)x^2$, each plotted against $x$)
The value $H(0) = 0$ would have the advantage that $H(x) = \lim_{p \to 0^+} x^p$ for all $x \ge 0$. But the value $H(0) = \tfrac12$
has the advantage that $H(x) = 1 - H(-x)$ for all $x \in \mathbb{R}$.
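The trade-off described in this remark can be made concrete with a short sketch. It assumes the convention of Definition 8.6.6, so that H(0) = 1/2. The names `sign`, `heaviside` and `xs` are illustrative only.

```python
from fractions import Fraction

def sign(x):
    """Sign function of Definition 8.6.2: returns 1, 0 or -1."""
    return (x > 0) - (x < 0)

def heaviside(x):
    """H(x) = (1 + sign(x))/2 as in Definition 8.6.6, so H(0) = 1/2."""
    return Fraction(1 + sign(x), 2)

xs = [Fraction(-2), Fraction(0), Fraction(3, 4)]

# The symmetry H(x) = 1 - H(-x) holds everywhere, including x = 0.
assert all(heaviside(x) == 1 - heaviside(-x) for x in xs)

# The idempotence H(x)^p = H(x) fails at x = 0, where (1/2)^2 = 1/4.
assert heaviside(Fraction(0)) ** 2 != heaviside(Fraction(0))
assert all(heaviside(x) ** 2 == heaviside(x) for x in xs if x != 0)
```

Replacing the value at zero by 0 or 1 would restore idempotence but break the symmetry, which is the tension the remark describes.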
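8.6.8 Definition: The floor function $\mathrm{floor} : \mathbb{R} \to \mathbb{Z}$ is defined by
$\forall x \in \mathbb{R},\ \mathrm{floor}(x) = \lfloor x \rfloor = \sup\{i \in \mathbb{Z};\ i \le x\}$.
8.6.9 Definition: The ceiling function $\mathrm{ceiling} : \mathbb{R} \to \mathbb{Z}$ is defined by
$\forall x \in \mathbb{R},\ \mathrm{ceiling}(x) = \lceil x \rceil = \inf\{i \in \mathbb{Z};\ i \ge x\}$.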

Figure 8.6.3 The floor and ceiling functions (two panels: floor(x) and ceiling(x), each plotted against x)
8.6.10 Theorem: ∀x ∈ IR, ceiling(x) = − floor(−x).
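No proof is given here for this identity, but it is easy to test on exact rational values. The sketch below relies on the standard-library functions `math.floor` and `math.ceil`, which accept `Fraction` arguments. The helper name `ceiling_via_floor` is hypothetical.

```python
import math
from fractions import Fraction

def ceiling_via_floor(x):
    """Right hand side of Theorem 8.6.10: -floor(-x)."""
    return -math.floor(-x)

# Samples include negative values, integers and non-integers.
samples = [Fraction(-7, 2), Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(9, 4)]

for x in samples:
    assert math.ceil(x) == ceiling_via_floor(x)
```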
8.6.11 Remark: Some authors use the notation ent(x) for floor(x). (E.g. see CRC [156], page F-180.)
8.6.12 Remark: Two useful functions which can be easily expressed in terms of the floor function are the
“fractional part” and “round” functions.
8.6.13 Definition: The fractional part function frac : IR → [0, 1) is defined by
∀x ∈ IR, frac(x) = x − floor(x).
8.6.14 Definition: The round function $\mathrm{round} : \mathbb{R} \to \mathbb{Z}$ is defined by
$\forall x \in \mathbb{R},\ \mathrm{round}(x) = \mathrm{sign}(x) \cdot \mathrm{floor}(|x| + \tfrac12)$.
This may also be called the nearest integer function.
Figure 8.6.4 Fractional part and rounding functions (two panels: frac(x) and round(x), each plotted against x)
8.6.15 Remark: There are several common variants of the rounding function in Definition 8.6.14. Probably
the most popular version rounds to the nearest integer “away from zero”, as defined here.
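One of the variants can be seen directly in Python, whose built-in `round` resolves ties to the nearest even integer rather than away from zero. The sketch below implements Definitions 8.6.13 and 8.6.14 as stated. The names `frac` and `round_half_away` are illustrative and are not taken from any library.

```python
import math
from fractions import Fraction

def sign(x):
    """Sign function of Definition 8.6.2: returns 1, 0 or -1."""
    return (x > 0) - (x < 0)

def frac(x):
    """Fractional part of Definition 8.6.13: x - floor(x), always in [0, 1)."""
    return x - math.floor(x)

def round_half_away(x):
    """Round function of Definition 8.6.14: sign(x) * floor(|x| + 1/2)."""
    return sign(x) * math.floor(abs(x) + Fraction(1, 2))

# Ties are rounded away from zero by Definition 8.6.14 ...
assert round_half_away(Fraction(5, 2)) == 3
assert round_half_away(Fraction(-5, 2)) == -3
# ... whereas the built-in round() uses the round-half-to-even variant.
assert round(Fraction(5, 2)) == 2
```

The two conventions agree except at half-integers, so a choice only matters when ties can occur.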
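8.6.16 Definition: The modulo function $\mathrm{mod} : \mathbb{R} \times (\mathbb{R} \setminus \{0\}) \to \mathbb{R}^+_0$ is defined by
$\forall x \in \mathbb{R},\ \forall m \in \mathbb{R} \setminus \{0\},\quad x \bmod m = \inf\{y \in \mathbb{R}^+_0;\ \exists i \in \mathbb{Z},\ x = i \cdot m + y\}$.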

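8.6.17 Remark: The modulo function satisfies
$$\begin{aligned}
\forall x \in \mathbb{R},\ \forall m \in \mathbb{R} \setminus \{0\},\quad x \bmod m
&= \inf\bigl(\{x + i \cdot m;\ i \in \mathbb{Z}\} \cap \mathbb{R}^+_0\bigr) \\
&= \inf\{x - k|m|;\ k \in \mathbb{Z} \text{ and } x \ge k|m|\} \\
&= x - |m| \lfloor x/|m| \rfloor \\
&= |m| \,\mathrm{frac}(x/|m|).
\end{aligned}$$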
Figure 8.6.5 illustrates the modulo function and a shifted modulo function $x \mapsto (x + a) \bmod 2a - a$. Such
functions are familiar in electronics as “relaxation oscillator” waveforms.
Figure 8.6.5 Modulo functions (two panels: x mod a and (x + a) mod 2a − a, each plotted against x)
8.6.18 Theorem: The absolute value, floor and modulo functions satisfy the following:
(i) ∀a ∈ IR, ∀b ∈ IR \ {0}, a mod b = a mod (−b) = a mod |b|.
(ii) ∀a ∈ IR, ∀b ∈ IR \ {0}, 0 ≤ a mod b < |b|.
(iii) ∀a ∈ IR, ∀b ∈ IR \ {0}, a mod b = a − |b|. floor(a/|b|).
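These three properties can be checked on sample values by taking (iii) as the computational definition. This is a sketch under that assumption. The function name `mod` and the sample lists are illustrative. Note that this modulo function differs from Python's `%` operator, whose result takes the sign of the divisor.

```python
import math
from fractions import Fraction

def mod(x, m):
    """x mod m computed as x - |m| * floor(x / |m|), as in Theorem 8.6.18 (iii)."""
    if m == 0:
        raise ValueError("the modulus must be non-zero")
    return x - abs(m) * math.floor(x / abs(m))

xs = [Fraction(-7, 2), Fraction(0), Fraction(5, 3)]
ms = [Fraction(2), Fraction(-2), Fraction(3, 4)]

for x in xs:
    for m in ms:
        # (i) The sign of the modulus is irrelevant.
        assert mod(x, m) == mod(x, -m) == mod(x, abs(m))
        # (ii) The value always lies in the half-open interval [0, |m|).
        assert 0 <= mod(x, m) < abs(m)

# Contrast with Python's % operator for a negative modulus.
assert mod(Fraction(7, 2), Fraction(-2)) == Fraction(3, 2)
assert Fraction(7, 2) % Fraction(-2) == Fraction(-1, 2)
```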
8.6.19 Remark: The sawtooth functions shown in Figure 8.6.6 are useful for defining the sine and cosine
functions in Section 20.13.
Figure 8.6.6 Sawtooth functions (two panels: |(x + a) mod 2a − a| and |(x + 3a) mod 4a − 2a| − a, each plotted against x)
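A particular sawtooth function is the “distance to nearest integer”, defined by $\inf\{|x - k|;\ k \in \mathbb{Z}\}$. The
difference between $x$ and the rounded value $\mathrm{round}(x) = \mathrm{sign}(x) \cdot \mathrm{floor}(|x| + \tfrac12)$ is
$$\begin{aligned}
x - \mathrm{sign}(x) \cdot \mathrm{floor}(|x| + \tfrac12) &= \mathrm{sign}(x) \cdot \bigl(|x| - \mathrm{floor}(|x| + \tfrac12)\bigr) \\
&= \mathrm{sign}(x) \cdot \bigl(\mathrm{frac}(|x| + \tfrac12) - \tfrac12\bigr).
\end{aligned}$$
Therefore the distance from a real number $x$ to the nearest integer equals the absolute value of this, which
is
$$\begin{aligned}
\bigl|\mathrm{frac}(|x| + \tfrac12) - \tfrac12\bigr| &= \bigl|(|x| + \tfrac12) \bmod 1 - \tfrac12\bigr| \\
&= \bigl|(x + a) \bmod 2a - a\bigr|
\end{aligned}$$
with $a = \tfrac12$, which is illustrated in Figure 8.6.6. (The last step uses the fact that the distance to the
nearest integer is the same for $x$ and $-x$.)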
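The sawtooth expression for the distance to the nearest integer can be compared with a direct search over neighbouring integers. The following sketch assumes the modulo function is computed as in Theorem 8.6.18 (iii). All function names are illustrative.

```python
import math
from fractions import Fraction

def mod(x, m):
    """x mod m computed as x - |m| * floor(x / |m|), as in Theorem 8.6.18 (iii)."""
    return x - abs(m) * math.floor(x / abs(m))

def distance_to_nearest_integer(x):
    """Sawtooth formula |(x + a) mod 2a - a| with a = 1/2."""
    a = Fraction(1, 2)
    return abs(mod(x + a, 2 * a) - a)

def brute_force_distance(x):
    """inf{|x - k|; k an integer}, searched over the two integers adjacent to x."""
    n = math.floor(x)
    return min(abs(x - k) for k in (n, n + 1))

# Exact samples from -2.5 to 2.5 in steps of 0.1, including the half-integers.
samples = [Fraction(n, 10) for n in range(-25, 26)]
assert all(distance_to_nearest_integer(x) == brute_force_distance(x) for x in samples)
```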

8.6.20 Remark: Figure 8.6.7 shows some basic square functions. The functions floor(x) and floor(x/a) −
2 floor(x/(2a)) are right-continuous. The functions ceiling(x) and 2 ceiling(x/(2a)) − ceiling(x/a) are left-
continuous.
Figure 8.6.7 Square functions (two panels: floor(x/a) − 2 floor(x/(2a)) and 2 ceiling(x/(2a)) − ceiling(x/a), each plotted against x)
8.7. Complex numbers

[ Give an axiomatisation for the complex numbers. ]
8.7.1 Notation: $\mathbb{C}$ denotes the set of complex numbers.
[ Must define a representation of the complex numbers here. ]

[ Present complex number arithmetic, order(?), intervals(??), metric, norm, modulus, Cauchy sequences and
embedding in IR. Also present complex number tuples? ]
8.7.2 Remark: Complex numbers are avoided as far as possible in this book. Life is already difficult
enough without them.
8.7.3 Remark: There is a strong argument for the reality of complex numbers in the properties of Taylor
series. For example, consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f : x \mapsto (1 + x^2)^{-1}$. This function is
clearly analytic, but the Taylor series for $f$ at a point $x_0 \in \mathbb{R}$ has convergence radius $(1 + x_0^2)^{1/2}$, which is
coincidentally the distance from the point $(x_0, 0)$ to $(0, 1)$ in the complex plane, and $(0, 1)$ happens to be a
pole of the analytic extension of $f$ to the complex numbers. (See Figure 8.7.1.)
Figure 8.7.1 Convergence radii suggesting complex pole for real-analytic function (diagram: radius of convergence of $(1 + x^2)^{-1}$ at sample points x1, x2, x3, x4 on the x-axis)
This kind of rule holds generally for real-analytic functions, which suggests that even if the complex numbers
are ignored, they still have their unavoidable effect on the real line. This is reminiscent of the way in which
geophysicists detect minerals a long distance under the ground using gravimetric, electromagnetic and
other detection methods. The existence and nature of minerals deep underground may be inferred from
evidence obtained entirely above ground. In the same way, the poles of real-analytic functions
“deep in the complex plane” may be detected by their influence on the radius of convergence of
Taylor series for real-valued functions evaluated purely within the real line.
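This “detection from the real line” can be imitated numerically for the example f(x) = (1 + x^2)^(-1). The sketch below is an illustration under stated assumptions, not a method from the text. The Taylor coefficients at x0 are generated by the recurrence obtained from multiplying the series by 1 + (x0 + h)^2, and the radius of convergence is estimated by the root test, with the limit superior approximated by a maximum over a short window of late coefficients. The function names and the parameters `count` and `window` are hypothetical.

```python
import math

def taylor_coefficients(x0, count):
    """Coefficients c_n of f(x0 + h) = sum of c_n * h^n for f(x) = 1/(1 + x^2).

    Multiplying the series by A + 2*x0*h + h^2, with A = 1 + x0^2, must give 1, so
    A*c_0 = 1, A*c_1 + 2*x0*c_0 = 0 and A*c_n + 2*x0*c_(n-1) + c_(n-2) = 0 for n >= 2.
    """
    A = 1.0 + x0 * x0
    c = [1.0 / A, -2.0 * x0 / (A * A)]
    for n in range(2, count):
        c.append(-(2.0 * x0 * c[n - 1] + c[n - 2]) / A)
    return c[:count]

def estimated_radius(x0, count=200, window=8):
    """Root test estimate: 1/R is approximately the max of |c_n|^(1/n) over late n."""
    c = taylor_coefficients(x0, count)
    roots = [abs(c[n]) ** (1.0 / n) for n in range(count - window, count)]
    return 1.0 / max(roots)

# Compare the estimate, computed purely from real data, with (1 + x0^2)^(1/2).
for x0 in (0.0, 1.0, 2.0):
    print(x0, estimated_radius(x0), math.sqrt(1.0 + x0 * x0))
```

The estimate uses only real arithmetic on real Taylor coefficients, yet it should approach the distance from x0 to the complex pole as more coefficients are used.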

This kind of argument only shows the “reality” of the complex numbers if one accepts the “reality” of power
series. But there is no reason in physics why power series should have a special significance, apart from
the fact that addition and multiplication are programmed into calculators. It is very unusual for a physical
system to have state functions which are truly real analytic except in mathematical models, because an
analytic function is determined everywhere from any infinitesimal region. At the very microscopic level at
which quantum theory is effective, probably the real functions are indeed real analytic; so one would expect
complex numbers to be relevant. The usefulness in quantum theory of complex numbers is therefore not
surprising. There seems to be no argument for the “reality” of quaternions similar to the argument for the
reality of complex numbers.


Chapter 9
Algebra
9.1 Semigroups
9.2 Groups
9.3 Subgroups
9.4 Left transformation groups
9.5 Right transformation groups
9.6 Mixed transformation groups
9.7 Figures and invariants of transformation groups
9.8 Rings and fields
9.9 Modules
9.10 Associative algebras
9.11 Lie algebras
9.12 List space for sets with algebraic structure
Figure 9.0.1 shows the family relations between several of the algebraic classes in this chapter. (Another
family tree is shown in Figure 9.9.1.)
Figure 9.0.1 Family tree of semigroups, groups, rings and fields (diagram nodes: semigroup (G,σG); group (G,σG); transformation semigroup (G,X,σG,µ); commutative group (G,σG); transformation group (G,X,σG,µ); left module over a group (G,M,σG,σM,µ); ring (R,σR,τR); effective transformation group (G,X,σG,µ); unitary ring (R,σR,τR); commutative ring (R,σR,τR); free transformation group (G,X,σG,µ); commutative unitary ring (R,σR,τR); field (K,σK,τK))
Only the basic facts are given about most of these classes although transformation groups are presented in
more detail because of their importance in differential geometry. Linear spaces are even more important; so
they have their own chapter: Chapter 10.

As explained in Section 5.16, specification tuples for mathematical objects may be abbreviated. For example,
a group may either be fully specified as a tuple (G, σ), where G is the set and σ : G × G → G is the operation
on G, or else the group may be abbreviated to just G. The adoption of an abbreviation is indicated by a
notation such as G −< (G, σ), which means that “G is an abbreviation for (G, σ)”. The meaning of G is
ambiguous because it represents both the basic set and the full tuple, which cannot be equal, but this kind
of symbol re-use is standard practice. The chicken-foot notation “−<” is non-standard though.
9.0.1 Remark: Algebra is distinguished from analysis by the absence of limits and “infinite processes” in
algebra. Bell [191], page 355, said the following on this subject in 1937.
[. . . ] it must be kept in mind that algebra deals only with finite processes; when infinite processes
enter, as for example in summing an infinite series, we are thrust out of algebra into another
domain. This is emphasized because the usual elementary text labelled “Algebra” contains a great
deal—infinite geometric progressions, for instance—that is not algebra in the modern meaning of
the word.
9.1. Semigroups
Groups (Definition 9.2.4) are algebraic systems with a single set and a single operation defined on that set.
This puts groups among the simplest of algebraic systems. A semigroup is a stripped-down version of a
group. A semigroup is not required to have inverses and an identity as a group is. (If groups were bacteria,
semigroups would be viruses.)
9.1.1 Definition: A semigroup is a pair G −< (G, σ) where G ≠ ∅ and σ : G × G → G satisfies
(i) $\forall g_1, g_2, g_3 \in G,\ \sigma(g_1, \sigma(g_2, g_3)) = \sigma(\sigma(g_1, g_2), g_3)$. (Associativity.)
9.1.2 Remark: The associativity condition (i) in Definition 9.1.1 is what one expects from a set of actions
which have some sort of memory of the result of the actions. Otherwise how would the operations “know”
that the operation of g1 with σ(g2 , g3 ) is supposed to give exactly the same result as the operation of
σ(g1 , g2 ) with g3 ? Somehow some memory of the elements g2 and g3 must be included in the combined
group element σ(g2 , g3 ). Associativity looks a little like a conservation law, and one might ask what is being
conserved so that the rule can be maintained so precisely.
When presented with an operation which satisfies Definition 9.1.1, it would be reasonable to infer that
there is a hidden “state space” somewhere because associativity is a characteristic of the composition of
functions, and the closure of the semigroup operation is a characteristic of a set of functions which have a
common domain and range. The domain of the functions may be thought of as a state space which is being
transformed. Conversely, for $g \in G$, let $L_g : G \to G$ denote the map $L_g : h \mapsto \sigma(g, h)$. Then Definition 9.1.1
specifies that $\{L_g;\ g \in G\}$ is a closed set of functions and $L_{g_1} \circ L_{g_2} = L_{\sigma(g_1, g_2)}$ for all $g_1, g_2 \in G$. So
$\{L_g;\ g \in G\}$ is a semigroup of transformations which is isomorphic to G itself. This isomorphism occurs if
and only if the operation σ is associative. In practice, there is often a set other than G on which the elements
of G act, and this results in the associative rule.
If the reader is still sceptical, think of a child’s first experiences with multiplication. It can come as a surprise
that, for example, (5 × 3) × 2 gives the same result as 5 × (3 × 2), and that this is a general rule. It later
becomes clear that this rule follows from the fact that the product of 3 integers corresponds to the volume
of a rectangular solid. The different order of calculation corresponds to different ways of adding up the unit
cubes in the solid. So this associative rule follows from the fact that multiplication of numbers is modelled
on an underlying system. Multiplication is not just an abstract set of rules which someone made up.
9.1.3 Example: The cross product for the vectors $i, -i, j, -j, k, -k, 0 \in \mathbb{R}^3$ is a non-associative closed
operation. For example, $L_{i \times j}(j) = -i \ne 0 = L_i(L_j(j))$. So this is not a semigroup. Therefore it cannot be
identified with a set of transformations.
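The closure and the failure of associativity in this example are easy to confirm by direct computation. In the sketch below, vectors are represented as integer triples. The function name `cross` and the set name `S` are illustrative.

```python
from itertools import product

def cross(u, v):
    """Cross product of two vectors in IR^3, given as triples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def neg(u):
    """Negative of a vector."""
    return tuple(-c for c in u)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
zero = (0, 0, 0)

# The seven vectors i, -i, j, -j, k, -k, 0 of Example 9.1.3.
S = {i, neg(i), j, neg(j), k, neg(k), zero}

# Closure: the cross product of any two elements of S is again in S.
assert all(cross(u, v) in S for u, v in product(S, repeat=2))

# Non-associativity: (i x j) x j = k x j = -i, but i x (j x j) = i x 0 = 0.
assert cross(cross(i, j), j) == neg(i)
assert cross(i, cross(j, j)) == zero
```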
9.1.4 Remark: A group is required to have an inverse element, which suggests that in some sense, no
“information” is lost in the transformations corresponding to group elements. In other words, all actions
are reversible. If an action has no inverse, this suggests that “information” has been lost. The original state
cannot be recovered. So semigroups would typically be useful for irreversible operations while groups are
useful for reversible operations.

In the case of the operator solutions of the heat equation, the operators are one-to-one but not onto. (See
Treves [147], section 6.1, page 41.) They are only non-invertible because they are not surjective. The
impossibility of defining general negative time-step operators as solutions to the heat equation arises from
the fact that arbitrary time-zero functions are generally not smooth enough. One may construct negative
time-step operator solutions using Fourier transforms, but they do not yield convergent integrals for general
functions. Therefore, in a strict sense, one may say that information is not lost in this situation. But in
practice, we know that noise tends to obscure smoothed heat distributions, which makes it impossible to
make the reverse-time transformation to recover the initial heat distribution. In other words, the inverse
operator is extremely unstable. Therefore, in practice, information is lost. In fact, heat redistribution does
correspond to an increase in entropy, which is a loss of information.
So the conclusion remains that groups tend to be associated with reversible, information-preserving systems
whereas semigroups are associated with irreversible, information-losing systems.
9.2. Groups
9.2.1 Remark: The early development of group theory is generally attributed to Galois (1811–1832).
However, Bell [191], page 278, makes the following comment, apparently attributing group theory to Cauchy
(1789–1857).
[. . . ] the theory of substitutions, begun systematically by Cauchy, and elaborated by him in a long
series of papers in the middle 1840’s, which developed into the theory of finite groups [. . . ]
According to Szekeres [45], page 27:
The concept of a group has its origins in the work of Evariste Galois (1811–1832) and Niels Henrik
Abel (1802–1829) on the solution of algebraic equations by radicals.
Bell [191], page 282, states that the transition from a concrete group (such as a group of permutations) to
an abstract group (defined only by labels for its elements and the “multiplication table” for the elements)
was made by Cayley (1821–1895).
This abstract point of view is that now current. It was not Cauchy’s, but was introduced by Cayley
in 1854. Nor were completely satisfactory sets of postulates for groups stated till the first decade
of the twentieth century.
Oddly, Bell [191], page 518, also attributes the abstract approach to group theory to Dedekind.
Dedekind was one of the first to appreciate the fundamental importance of the concept of a group
in algebra and arithmetic. In this early work Dedekind already exhibited two of the leading charac-
teristics of his later thought, abstractness and generality. Instead of regarding a finite group from
the standpoint offered by its representation in terms of substitutions [. . . ], Dedekind defined groups
by means of their postulates [. . . ] and sought to derive their properties from this distillation of their
essence. This is in the modern manner: abstractness and therefore generality.
9.2.2 Remark: Definition 9.2.4 says that any two actions of the group are related by a unique third action
of the group. This is a kind of algebraic closure of the group under a quotient operation.
9.2.3 Remark: Transformation groups are presented in Sections 9.4 and 9.5, but abstract groups and
semigroups have many of the characteristics one would associate with transformation groups. In practice,
apart from abstract examples in pure mathematics and groups embedded within other algebraic structures
(such as fields), groups and semigroups tend to be groups of transformations of some space of states of some
system. (See also some related comments in Remark 9.1.2.)
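9.2.4 Definition: A group is a pair G −< (G, σ) such that G ≠ ∅ and σ : G × G → G satisfies the following.
(i) $\forall g_1, g_2, g_3 \in G,\ \sigma(g_1, \sigma(g_2, g_3)) = \sigma(\sigma(g_1, g_2), g_3)$. (Associativity.)
(ii) $\forall a, b \in G,\ \exists'\, g_\ell, g_r \in G,\ \sigma(g_\ell, b) = \sigma(b, g_r) = a$. (Unique existence of left and right quotients.)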
[ Should revert to the traditional definition of groups (Definition 9.2.18) and present Definition 9.2.4 (the
“clever definition”) and an alternative. ]

9.2.5 Notation: The group operation σ of a group (G, σ) in Definition 9.2.4 may be notated additively
or multiplicatively. In the additive notation, σ(g1 , g2 ) is written as g1 + g2 . In the multiplicative notation,
σ(g1 , g2 ) is written as g1 g2 or g1 ◦g2 , or sometimes as g1 ·g2 . Due to the associativity condition (i), parentheses
are optional. Multiplicative notation is used in this section. Additive notation is usually used when the group
is embedded in a two-operation algebraic structure such as a ring or field.
The circle symbol ◦ is also used for the composition of arbitrary functions. To avoid ambiguity, it is better to
not use ◦ for group operations unless the group operation happens to be some sort of function composition.
9.2.6 Remark: There is no very good reason why semigroups should be non-empty. In this book, trivial
cases of mathematical classes are permitted as much as possible.
In the case of a group, it is desirable to always be able to say that there is an identity element. Therefore in
Definition 9.2.4, there is an a-priori requirement of non-emptiness. Otherwise one would have to talk about
“non-empty groups” all the time, which would be tedious. Definition 9.2.4 is a rarely mentioned definition
of a group which substitutes the unique existence of left and right quotients for the identity and two-sided
inverse conditions (Definition 9.2.18). The conditions are equivalent except that 9.2.4 (ii) permits the group
to be empty.
9.2.7 Notation:
$L_g$ for $g$ in a group $G$ denotes the function $L_g : G \to G$ defined by $L_g : h \mapsto gh$.
$R_g$ for $g$ in a group $G$ denotes the function $R_g : G \to G$ defined by $R_g : h \mapsto hg$.
9.2.8 Definition: The function Lg in Notation 9.2.7 may be referred to as the left action by g (on G).
The function Rg may be referred to as the right action by g (on G).
9.2.9 Remark: Associativity may be thought of as a kind of commutativity because associativity means
that the left and right actions of group elements commute. In other words, Lg1 ◦ Rg2 = Rg2 ◦ Lg1 for
all g1 , g2 ∈ G. Group elements h ∈ G may be thought of as “two-port objects” which can be multiplied
from the left or the right, and the actions on these two ports commute. This contrasts with transformation
groups and vector spaces, for example, where the elements of the “passive set” may be multiplied only on
one side (typically on the left) by elements of the “active set”.
9.2.10 Definition: An identity for a group G is any element e ∈ G such that eg = ge = g for all g ∈ G.
9.2.11 Theorem: For any group G, an identity e ∈ G exists and is unique.
Proof: Definition 9.2.4 (ii) implies that there are well-defined quotient functions $Q_\ell : G \times G \to G$ and
$Q_r : G \times G \to G$ such that $Q_\ell(a, b)\,b = a$ and $b\,Q_r(a, b) = a$ for all $a, b \in G$.
Choose any $a \in G$ and let $e_\ell = Q_\ell(a, a)$. Then for any $g \in G$, $e_\ell g = e_\ell(a Q_r(g, a)) = (e_\ell a) Q_r(g, a) =
a Q_r(g, a) = g$. Therefore $e_\ell = Q_\ell(g, g)$ for all $g \in G$. This is a “left identity” for the group. Similarly, there
is a “right identity” $e_r$ such that $e_r = Q_r(g, g)$ for all $g \in G$. To show that $e_\ell = e_r$, note that $e_\ell = e_\ell e_r = e_r$.
So an identity $e = e_\ell = e_r$ exists and is unique.
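For a finite set with an explicit operation, the quotient condition of Definition 9.2.4 (ii) and the construction of the identity in this proof can be carried out by exhaustive search. The following is a minimal sketch, assuming the group is given as a list of elements and a Python function for the operation. All function names are hypothetical. The permutations of three objects serve as an example of a group, and multiplication modulo 4 serves as a semigroup which is not a group.

```python
from itertools import permutations, product

def is_associative(G, op):
    """Condition (i): op(a, op(b, c)) == op(op(a, b), c) for all a, b, c in G."""
    return all(op(a, op(b, c)) == op(op(a, b), c) for a, b, c in product(G, repeat=3))

def left_quotients(G, op, a, b):
    """All g in G with op(g, b) == a."""
    return [g for g in G if op(g, b) == a]

def right_quotients(G, op, a, b):
    """All g in G with op(b, g) == a."""
    return [g for g in G if op(b, g) == a]

def has_unique_quotients(G, op):
    """Condition (ii): exactly one left and one right quotient for each pair (a, b)."""
    return all(len(left_quotients(G, op, a, b)) == 1
               and len(right_quotients(G, op, a, b)) == 1
               for a, b in product(G, repeat=2))

def identity_via_left_quotient(G, op):
    """The element e = Q_left(a, a) from the proof, for an arbitrary choice of a."""
    a = G[0]
    return left_quotients(G, op, a, a)[0]

# Example of a group: permutations of {0, 1, 2} under composition.
S3 = list(permutations(range(3)))

def compose(p, q):
    """Composition of permutations: (p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

# Example of a semigroup which is not a group: {0, 1, 2, 3} under multiplication mod 4.
Z4 = list(range(4))

def mult_mod4(a, b):
    return (a * b) % 4

assert is_associative(S3, compose) and has_unique_quotients(S3, compose)
e = identity_via_left_quotient(S3, compose)
assert all(compose(e, g) == g == compose(g, e) for g in S3)
assert is_associative(Z4, mult_mod4) and not has_unique_quotients(Z4, mult_mod4)
```

The brute-force search is only practical for small finite sets, but it makes the difference between the quotient condition and mere associativity visible.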
9.2.12 Remark: Probably the customary notation “e” for the identity in Definition 9.2.4 is a mnemonic
for the German word “Einheit”, which means “unity”, because the identity in a multiplicatively notated
group is also called the unit element or unity.
9.2.13 Definition: A left inverse of an element g in a group G is an element h ∈ G such that hg = e.

A right inverse of an element g in a group G is an element h ∈ G such that gh = e.
An inverse of an element g in a group G is an element h ∈ G such that hg = gh = e.
9.2.14 Theorem: For any element g of a group G, g has a unique inverse in G.
Proof: Let $g_\ell, g_r \in G$ be the unique solutions of $g_\ell g = e$ and $g g_r = e$, where $e \in G$ is the unique two-sided
identity guaranteed by Theorem 9.2.11. Then $g_\ell = g_\ell e = g_\ell(g g_r) = (g_\ell g) g_r = e g_r = g_r$. Therefore $g_\ell = g_r$ is
the unique two-sided inverse of $g$.
9.2.15 Notation: For any element g in a group G, g −1 denotes the inverse of g in G.

9.2.16 Remark: Although left and right inverses always exist and are the same for a group, they may be
different, non-unique or even non-existent for semigroups.
9.2.17 Remark: Condition (ii) of Definition 9.2.4 is equivalent to the combination of conditions (ii) and
(iii) in the alternative Definition 9.2.18 which is more commonly presented.
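9.2.18 Definition (→ 9.2.4): A group is a pair G −< (G, σ) such that σ : G × G → G satisfies the following.
(i) $\forall g_1, g_2, g_3 \in G,\ \sigma(g_1, \sigma(g_2, g_3)) = \sigma(\sigma(g_1, g_2), g_3)$. (Associativity.)
(ii) $\exists e \in G,\ \forall g \in G,\ \sigma(e, g) = \sigma(g, e) = g$. (Existence of identity.)
(iii) $\forall g \in G,\ \exists g' \in G,\ \sigma(g', g) = \sigma(g, g') = e$. (Existence of inverses.)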
9.2.19 Remark: Conditions (ii) and (iii) of Definition 9.2.18 require the existence of an identity element
and the existence of an inverse for each element in the group. One discomfort of these two conditions is that it
must first be shown that the identity element e in (ii) is unique so that (iii) will be meaningful. The combined
condition Definition 9.2.4 (ii) does not have this problem but does require some work to demonstrate the
existence of the identity and inverses.
Definition 9.2.18 has the disadvantage that it requires a priori that the left and right identity and inverses be
the same, whereas Definition 9.2.4 has the disadvantage that it requires the a priori uniqueness of quotients.
Thus to demonstrate that a pair (G, σ) is a group requires different work in each case.
Theorems 9.2.11 and 9.2.14 show that a pair (G, σ) satisfies Definition 9.2.18 if it satisfies Definition 9.2.4.
To show the converse, first note that the identity and inverses in Definition 9.2.18 (ii) and (iii) are unique
because e1 = e1 e2 = e2 for any identities e1, e2 ∈ G, and similarly if g1′, g2′ ∈ G are inverses for g ∈ G, then
g1′ = g1′ e = g1′ (g g2′) = (g1′ g) g2′ = e g2′ = g2′. In terms of the unique inverses, one may construct gℓ = ab⁻¹ and
gr = b⁻¹a as solutions to Definition 9.2.4 (ii). The uniqueness of the solution gℓ follows from the observation
that if gℓ is any solution of gℓ b = a, then gℓ = gℓ e = gℓ (bb⁻¹) = (gℓ b)b⁻¹ = ab⁻¹. The uniqueness of gr
follows identically. Therefore Definitions 9.2.4 and 9.2.18 are equivalent.
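The uniqueness of quotients can be spot-checked by brute force on a small group. The following Python sketch is only an illustration, not part of the formal argument. It assumes the group of all permutations of three points, represented as tuples, and the helper names compose, inverse, left_quotients and right_quotients are invented for this example. It checks that ab⁻¹ is the only solution g of gb = a and that b⁻¹a is the only solution g of bg = a.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

# Assumed example group: all permutations of {0, 1, 2}.
G = list(permutations(range(3)))

def left_quotients(a, b):
    """All g in G with g b = a."""
    return [g for g in G if compose(g, b) == a]

def right_quotients(a, b):
    """All g in G with b g = a."""
    return [g for g in G if compose(b, g) == a]

# Each quotient should exist, be unique, and equal a b^-1 or b^-1 a respectively.
unique = all(
    left_quotients(a, b) == [compose(a, inverse(b))]
    and right_quotients(a, b) == [compose(inverse(b), a)]
    for a in G
    for b in G
)
print(unique)
```

A finite check of this kind proves nothing in general, but it is a quick guard against sign or ordering mistakes in the formulas for the two quotients.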
9.2.20 Definition: A commutative group is a group (G, σ) such that σ(g1 , g2 ) = σ(g2 , g1 ) for all g1 , g2 ∈ G.
9.2.21 Remark: Commutative groups are also known as “Abelian groups” after Niels Henrik Abel (1802–29).
9.2.22 Definition: A group homomorphism from a group G1 −< (G1, σ1) to a group G2 −< (G2, σ2) is a
map φ : G1 → G2 such that φ(σ1 (g, h)) = σ2 (φ(g), φ(h)) for all g, h ∈ G1 . Less formally, φ(gh) = φ(g)φ(h)
for g, h ∈ G1 .
A group isomorphism from a group G1 to a group G2 is a bijective group homomorphism φ : G1 → G2 .
A group endomorphism of a group G is a group homomorphism φ : G → G.
A group automorphism of a group G is a group isomorphism φ : G → G.
A group monomorphism from a group G1 to a group G2 is an injective group homomorphism φ : G1 → G2 .
A group epimorphism from a group G1 to a group G2 is a surjective group homomorphism φ : G1 → G2 .
9.2.23 Notation: Let G, G1 , G2 be groups.
Hom(G1 , G2 ) denotes the set of group homomorphisms from G1 to G2 .
Iso(G1 , G2 ) denotes the set of group isomorphisms from G1 to G2 .
End(G) denotes the set of group endomorphisms of G.
Aut(G) denotes the set of group automorphisms of G.
Mon(G1 , G2 ) denotes the set of group monomorphisms from G1 to G2 .
Epi(G1 , G2 ) denotes the set of group epimorphisms from G1 to G2 .
9.2.24 Remark: The inverse φ⁻¹ of a group isomorphism φ in Definition 9.2.22 is not explicitly required
to be a homomorphism. But it is readily verified that for any g2, h2 ∈ G2, φ⁻¹(g2 h2) = φ⁻¹(φ(g1)φ(h1)) =
φ⁻¹(φ(g1 h1)) = g1 h1 = φ⁻¹(g2)φ⁻¹(h2), where g1 = φ⁻¹(g2) and h1 = φ⁻¹(h2).
[ Possibly present a formalization of group theory in terms of first-order logic without using set theory. Then
also do such a formalization for each other species of algebra? Maybe there could be a separate section of this
chapter for such first-order language formalizations, which are really abstract axiomatic approaches at the
linguistic level, as opposed to representations in terms of ZF sets. See Section 4.14 for predicate calculus. ]

264 9. Algebra
9.3. Subgroups
9.3.1 Definition: A subgroup of a group (G, σG ) is a group (H, σH ) such that H ⊆ G and σH ⊆ σG .
9.3.2 Remark: If (H, σH) is a subgroup of (G, σG), then σH = σG|H×H, the restriction of σG to H × H. In terms
of the left and right quotient functions Qℓ : G × G → G and Qr : G × G → G defined for an arbitrary group
(G, σG) in the proof of Theorem 9.2.11, a pair (H, σG|H×H) is a subgroup of (G, σG) if and only if
∅ ≠ H ⊆ G and Qℓ(H × H) ⊆ H (or equivalently, Qr(H × H) ⊆ H).
9.3.3 Definition: The left coset of a subgroup H of a group G by an element g ∈ G is the set {gh; h ∈ H}.
The right coset of a subgroup H of a group G by an element g ∈ G is the set {hg; h ∈ H}.
9.3.4 Notation: Let H be a subgroup of a group G and let g ∈ G. Then gH denotes the left coset
{gh; h ∈ H} and Hg denotes the right coset {hg; h ∈ H}.
9.3.5 Remark: In terms of Notations 9.2.7 and 9.3.4, gH = Lg (H) and Hg = Rg (H).
9.3.6 Theorem: The set {gH; g ∈ G} of left cosets of a subgroup H of a group G is a partition of G.
Similarly, the set {Hg; g ∈ G} of right cosets of H is a partition of G.
Proof: Every g ∈ G lies in the left coset gH because e ∈ H, so the left cosets cover G. Now suppose
g1 H ∩ g2 H ≠ ∅. Then g1 h1 = g2 h2 for some h1, h2 ∈ H. For any h ∈ H,
g1 h = g1 h1 (h1⁻¹ h) = g2 h2 h1⁻¹ h ∈ g2 H because H is a subgroup of G. So g1 H ⊆ g2 H. Similarly, g1 H ⊇ g2 H.
So g1 H = g2 H. The proof for right cosets is the same.
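The partition property is easy to observe on a small example. The sketch below is a hedged illustration only. It assumes the group of permutations of {0, 1, 2} and a two-element subgroup H generated by one transposition, and all function names are invented for the example. It computes the left and right cosets and checks that each family partitions G.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

# Assumed example: G is all permutations of {0, 1, 2}; H = {identity, swap of 0 and 1}.
G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]

def left_coset(g, H):
    """The left coset gH = {gh; h in H}."""
    return frozenset(compose(g, h) for h in H)

def right_coset(g, H):
    """The right coset Hg = {hg; h in H}."""
    return frozenset(compose(h, g) for h in H)

left_cosets = {left_coset(g, H) for g in G}
right_cosets = {right_coset(g, H) for g in G}

def is_partition(blocks, universe):
    """True if the blocks are non-empty, pairwise disjoint and cover the universe."""
    blocks = list(blocks)
    covers = set().union(*blocks) == set(universe)
    disjoint = sum(len(b) for b in blocks) == len(set(universe))
    return covers and disjoint and all(blocks)

print(is_partition(left_cosets, G), is_partition(right_cosets, G))
print(left_cosets == right_cosets)
```

In this example both families are partitions of G, but they are different partitions, because this particular H is not a normal subgroup.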
9.3.7 Remark: A simple kind of left action µ : G × X → X may be defined on the set X = {gH; g ∈ G}
of left cosets of H by µ : (g′, gH) ↦ (g′g)H. This is clearly associative in the sense that g1 (g2 H) = (g1 g2)H
for all g1, g2 ∈ G. Therefore one may write simply g1 g2 H instead of µ(g1, g2 H).
An attempt to define translations of left cosets from the right as a map ν : X × G → X with ν : (gH, g′) ↦
{ghg′; h ∈ H} does not necessarily yield left or right cosets of H. If one attempts instead to define
ν : (gH, g′) ↦ (gg′)H, the left coset (gg′)H may depend on the choice of the representative g. That is, if
g1 H = g2 H, it is not guaranteed that (g1 g′)H = (g2 g′)H. The map ν would be well defined if the left and
right cosets of H yielded the same partition of G. Such a subgroup H is called a “normal subgroup” of G.
For such a subgroup, the maps µ : G × X → X and ν : X × G → X are both well defined, and the values
µ(g′, gH) and ν(gH, g′) depend only on the coset of H to which g′ belongs. Therefore one may construct a
well-defined operation σ′ : X × X → X with σ′ : (g1 H, g2 H) ↦ (g1 g2)H. Then it turns out that (X, σ′) is a
group. This is called the quotient group of G modulo H and is given the notation G/H. This is presented
more formally in Definitions 9.3.8 and 9.3.10.
9.3.8 Definition: A normal subgroup of a group G is a subgroup H of G such that

∀g ∈ G, ∀h ∈ H, ∃h′ ∈ H, gh = h′g.
9.3.9 Remark: Definition 9.3.8 for a normal subgroup H of G is equivalent to requiring that gH = Hg
for all g ∈ G. To show that the mirror image of the condition in Definition 9.3.8 holds, note that for all
g ∈ G and h ∈ H, hg = (g⁻¹h⁻¹)⁻¹ = (h′g⁻¹)⁻¹ = g(h′)⁻¹ for some h′ ∈ H, and (h′)⁻¹ ∈ H because H is a subgroup.
9.3.10 Definition: The quotient group of G modulo H for any normal subgroup H −< (H, σH) of a group
G −< (G, σG) is the group (G/H, σG/H), where G/H = {gH; g ∈ G} and σG/H : (g1 H, g2 H) ↦ g1 g2 H for
all g1, g2 ∈ G.
9.3.11 Remark: Quotient groups are also known as “factor groups”. To show that quotient groups in
Definition 9.3.10 are well defined, it must first be checked that the operation σG/H is well defined. It must
be shown that if g1 H = g1′ H and g2 H = g2′ H, then g1 g2 H = g1′ g2′ H. Suppose that g1 H = g1′ H and
g2 H = g2′ H. Then g1′ = g1 h1 and g2′ = g2 h2 for some h1, h2 ∈ H. So g1′ g2′ H = (g1 h1)(g2 h2)H = g1 h1 g2 H
because h2 H = H. But h1 g2 = g2 h1′ for some h1′ ∈ H because H is a normal subgroup of G. So
g1′ g2′ H = g1 g2 h1′ H = g1 g2 H. The identity of G/H is eG H = H, and the inverse of gH is g⁻¹H.
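The role of normality in this check can be seen by brute force. The following sketch is an illustration under stated assumptions: G is the group of permutations of {0, 1, 2}, A3 is its subgroup of even permutations (a normal subgroup), and H is a two-element subgroup which is not normal. The function names are invented for the example. The test asks whether the coset (g1 g2)H is independent of the representatives chosen for g1 H and g2 H.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def left_coset(g, H):
    """The left coset gH = {gh; h in H}."""
    return frozenset(compose(g, h) for h in H)

G = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # even permutations: a normal subgroup
H = [(0, 1, 2), (1, 0, 2)]               # identity and one transposition: not normal

def coset_product_is_well_defined(G, H):
    """Check that (g1 H, g2 H) -> (g1 g2) H does not depend on the representatives."""
    for g1 in G:
        for g2 in G:
            target = left_coset(compose(g1, g2), H)
            for a in left_coset(g1, H):      # every other representative of g1 H
                for b in left_coset(g2, H):  # every other representative of g2 H
                    if left_coset(compose(a, b), H) != target:
                        return False
    return True

print(coset_product_is_well_defined(G, A3))
print(coset_product_is_well_defined(G, H))
```

For the normal subgroup the product of cosets is independent of the representatives, whereas for the non-normal subgroup some choice of representatives gives a different coset.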

9.3.12 Remark: Remark 9.3.7 and Definitions 9.3.8 and 9.3.10 may be summarized as follows from the
perspective of Sections 9.4 and 9.5. For an arbitrary subgroup H of a group G, define the maps µℓ : G × Xℓ →
Xℓ with µℓ : (g′, gH) ↦ (g′g)H and µr : Xr × G → Xr with µr : (Hg, g′) ↦ H(gg′), where Xℓ = {gH; g ∈ G}
and Xr = {Hg; g ∈ G}. Then for any subgroup H of G, µℓ satisfies Definition 9.4.4 for a left transformation
group and µr satisfies Definition 9.5.2 for a right transformation group. But when H is a normal subgroup
of G, Xℓ = Xr = G/H becomes a group which combines the left and right actions µℓ and µr.
Curiously, something similar happens with fibre bundles in Chapter 22. An ordinary fibre bundle defines a
left transformation operation on the fibre space whereas a principal fibre bundle defines a two-sided group
action on the fibre space. Maybe these observations are connected in some way. Maybe not.
[ Here present isomorphism theorems, etc. ]
9.3.13 Definition: The (right) conjugate of a subset S of a group G by an element g ∈ G is the subset
g⁻¹Sg = {g⁻¹xg; x ∈ S} of G.
Two subsets S1 and S2 of a group G are said to be conjugate subsets if S1 is the conjugate of S2 by some
element of G.
9.3.14 Remark: A problem with conjugation of subsets in Definition 9.3.13 is the fact that some authors
define it as g⁻¹Sg while others define it as gSg⁻¹. These are not always the same. However, g⁻¹Sg = gSg⁻¹
for all g ∈ G if S is a normal subgroup of G. The converse almost follows if S is a subgroup of G, but not
quite. (See Exercise 46.4.1.) The same issue arises in Definition 9.3.22.
Quite often, it doesn’t matter which definition is used. For example, the normalizer in Definition 9.3.25 seeks
those group elements g ∈ G such that g⁻¹Sg = S, but this holds if and only if S = gSg⁻¹.
In the interests of completeness, Definition 9.3.15 gives the corresponding left conjugate if the conjugate in
Definition 9.3.13 is called the right conjugate.
9.3.15 Definition: The left conjugate of a subset S of a group G by an element g ∈ G is the subset
gSg⁻¹ = {gxg⁻¹; x ∈ S} of G.
9.3.16 Remark: If G is a left transformation group (defined in Section 9.4), the right conjugate may be
interpreted as a “pull-back” function. This is because the action of g⁻¹hg on a point x is the action of h
on g(x), pulled back by g⁻¹. Similarly the left conjugate can be interpreted as a “push-forth” function. This
insight probably has some tenuous relevance to differentiable groups in Chapter 33.
Figure 9.3.1 shows the idea that a right and left conjugate of an element h by an element g of a group G
may be interpreted respectively as a pull-back of h from g to e or a push-forth of h from e to g.
[ Figure 9.3.1 (diagram not reproduced): the left panel shows the right conjugate g⁻¹hg as the pull-back of h from g to e; the right panel shows the left conjugate ghg⁻¹ as the push-forth of h from e to g. ]
Figure 9.3.1 Right and left conjugates as pull-back and push-forth
This is related to the concept of parallel transport. The element h is transported between the points e and
g in the group. In some sense, a conjugate g⁻¹hg or ghg⁻¹ is parallel to h by being transported along the
path consisting of g followed by g⁻¹ or vice versa. If the group is commutative, all conjugates of h are equal
to h, in which case one could say that there is an “absolute parallelism”, i.e. a path-independent parallelism.
The pull-back and push-forth concepts are also mentioned in Remarks 10.5.25, 26.9.3 and 30.1.1. Covariant
differentiation in Section 36.5 may be regarded as a form of pull-back because it involves pulling back a

function value in a parallel fashion from a variable point to a fixed point so that it can be compared to the
function values at the fixed point. A covariant derivative is a measure of deviation from absolute parallelism.
9.3.17 Example: Illustrated in Figure 9.3.2 is the example G = GL(2), the group of invertible linear
transformations of ℝ², with g, h ∈ G defined by the left action of the matrices Rθ and S0 respectively on
column vectors in ℝ², where for all θ ∈ ℝ,
\[
R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
S_\theta = \begin{pmatrix} -\cos 2\theta & -\sin 2\theta \\ -\sin 2\theta & \cos 2\theta \end{pmatrix}.
\]
That is, Rθ rotates points of ℝ² by angle θ in an anti-clockwise direction and Sθ reflects points through a
line normal to the vector (cos θ, sin θ).
[ Figure 9.3.2 (diagram not reproduced): with g = Rθ, h = S0 and g⁻¹ = R−θ, the right conjugate g⁻¹hg = R−θ S0 Rθ = S−θ is the pull-back of h by g, and the left conjugate ghg⁻¹ = Rθ S0 R−θ = Sθ is the push-forth of h by g. ]
Figure 9.3.2 Right conjugate is a pull-back, left conjugate is a push-forth
The effect of the right conjugate g⁻¹hg, which has matrix R−θ S0 Rθ, is to pull back the direction of reflection
of S0 by the angle θ to S−θ. The effect of the left conjugate ghg⁻¹, which has matrix Rθ S0 R−θ, is to push
forth the direction of reflection of S0 by the angle θ to Sθ. In other words, the left conjugate by g rotates h
forward in the direction of g.
In connection with this example, see Exercises 46.4.2 and 46.4.3.
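The two matrix identities in this example can be checked numerically. The sketch below is a minimal illustration using only the Python standard library. The function names R, S, matmul and close, the tolerance, and the sample angle 0.7 are choices made for this example. R and S follow the rotation and reflection matrices of the example, and the code compares R−θ S0 Rθ with S−θ and Rθ S0 R−θ with Sθ.

```python
import math

def R(theta):
    """Anti-clockwise rotation of the plane by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def S(theta):
    """Reflection through the line normal to the vector (cos theta, sin theta)."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return ((-c, -s), (-s, c))

def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def close(A, B, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices up to a small tolerance."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

theta = 0.7                      # arbitrary sample angle
g, h, g_inv = R(theta), S(0.0), R(-theta)

right_conjugate = matmul(matmul(g_inv, h), g)   # g^-1 h g
left_conjugate = matmul(matmul(g, h), g_inv)    # g h g^-1

print(close(right_conjugate, S(-theta)))
print(close(left_conjugate, S(theta)))
```

A single sample angle is of course not a proof. The identities follow for all θ from the double-angle formulas.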
9.3.18 Notation: S^g for a subset S of a group G and g ∈ G denotes the conjugate of S by g.
9.3.19 Remark: In terms of Notation 9.2.7, S^g = g⁻¹Sg = Lg⁻¹(Rg(S)) = Rg(Lg⁻¹(S)) = (Lg⁻¹ ◦
Rg)(S) = (Rg ◦ Lg⁻¹)(S).
Similarly for the left conjugate, gSg⁻¹ = Lg(Rg⁻¹(S)) = Rg⁻¹(Lg(S)) = (Lg ◦ Rg⁻¹)(S) = (Rg⁻¹ ◦ Lg)(S).
For proof, see Exercise 46.4.4.
9.3.20 Theorem: Let H be a subgroup of a group G. Then for all g ∈ G, g⁻¹Hg is a subgroup of G.
9.3.21 Remark: Conjugation maps are automorphisms which are related to adjoint representations of
groups.
9.3.22 Definition: The (right) conjugation map of a group G by an element g ∈ G is the map h ↦ g⁻¹hg
for h ∈ G.
An inner automorphism of a group G is the same thing as a conjugation map.
The left conjugation map of a group G by an element g ∈ G is the map h ↦ ghg⁻¹ for h ∈ G.
9.3.23 Remark: In terms of Notation 9.2.7, the right conjugation map by g ∈ G is Lg⁻¹ ◦ Rg = Rg ◦ Lg⁻¹.
The left conjugation map by g ∈ G is Lg ◦ Rg⁻¹ = Rg⁻¹ ◦ Lg.

9.3.24 Theorem: All conjugation maps of a group are automorphisms.

9.3.25 Definition: The normalizer of a subset S of a group G is the subset {g ∈ G; g⁻¹Sg = S} of G.
9.3.26 Remark: The definition of the normalizer of a subset of a group is the same whether the right or
left conjugate is used. The same comment applies to the centralizer and centre of a group.
9.3.27 Definition: The centralizer of a subset S of a group G is the set {g ∈ G; ∀x ∈ S, g⁻¹xg = x}.
The centre of a group G is the centralizer of G in G.
9.3.28 Notation: N (S) for a subset S of a group G denotes the normalizer of S.
Z(S) for a subset S of a group G denotes the centralizer of S.
9.3.29 Theorem: If S is a subset of a group G, then the normalizer N (S) is a subgroup of G.
If S is a subset of a group G, then the centralizer Z(S) is a subgroup of G.
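The normalizer and centralizer can be computed directly from their definitions in a small group. The following sketch is illustrative only. It assumes the group G of permutations of {0, 1, 2} and takes S to be its subgroup A3 of even permutations, and the helper names are invented for the example. It also tests the subgroup property by the usual criterion that the set is non-empty and closed under the operation (a, b) ↦ ab⁻¹.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # assumed example subset S

def conjugate(x, g):
    """The right conjugate g^-1 x g."""
    return compose(compose(inverse(g), x), g)

def normalizer(S):
    """All g in G with g^-1 S g = S as sets."""
    return [g for g in G if {conjugate(x, g) for x in S} == set(S)]

def centralizer(S):
    """All g in G with g^-1 x g = x for every x in S."""
    return [g for g in G if all(conjugate(x, g) == x for x in S)]

def is_subgroup(H):
    """Non-empty and closed under (a, b) -> a b^-1."""
    H = set(H)
    return bool(H) and all(compose(a, inverse(b)) in H for a in H for b in H)

print(len(normalizer(A3)), len(centralizer(A3)), centralizer(G))
print(is_subgroup(normalizer(A3)), is_subgroup(centralizer(A3)))
```

In this example the normalizer of A3 is the whole group, the centralizer of A3 is A3 itself, and the centre of G contains only the identity.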
9.3.30 Definition: The direct product of groups (G1, σ1) and (G2, σ2) is the group (G, σ) where G =
G1 × G2 and σ : G × G → G satisfies σ : ((g1, g2), (g1′, g2′)) ↦ (g1 g1′, g2 g2′).
The direct product of (G1, σ1) and (G2, σ2) may be denoted as (G1, σ1) × (G2, σ2).
The direct product of a finite family of groups (Gi, σi)i∈I is the group (G, σ) where G = ×i∈I Gi and σ :
G × G → G satisfies σ : ((gi)i∈I, (gi′)i∈I) ↦ (gi gi′)i∈I.
9.3.31 Remark: The identity of the direct product group in Definition 9.3.30 is the element e = (e1, e2),
where e1 and e2 are the identities of G1 and G2 respectively. The inverse of (g1, g2) ∈ G is (g1⁻¹, g2⁻¹). Each
of the groups G1 and G2 is called a “direct factor” of G.
[ See EDM2 [35], 190.L, for further information on direct products. ]
9.3.32 Remark: It does not seem to be possible to vary the operation σ in Definition 9.3.30 in any simple
way. For example, the operation ((g1, g2), (g1′, g2′)) ↦ (g1 g1′, g2⁻¹ g2′) would not generally be a group operation
because the identity element would not work correctly. So “skewed” direct product operations do not seem
to be possible as they are in the case of action maps of direct products of transformation groups. (See
Definitions 9.6.4, 9.6.5, 9.6.8 and 9.6.9.)
[ Maybe define direct products of infinite families of groups, if any use can be found for these. Mention the
question of axiom of choice for assuring that an infinite direct product is non-empty. ]
[ Define semi-direct products of groups here. See EDM2 [35], 190.N. Check whether they are related to my
skew products. Maybe even skew products are not very useful anyway, since their use in defining connections
is questionable. ]
9.3.33 Remark: Any group (G, σ) may be embedded diagonally in the direct product (G, σ)×(G, σ) under
the map φ : G → G × G defined by φ : g ↦ (g, g), or via the maps φ1 : g ↦ (g, e) or φ2 : g ↦ (e, g).
[ Maybe could define groupoids near here, if an application for them can be found. See EDM2 [35], 190.P. ]
[ Near here, present quotient sets like ((G1 , σ1 ) × (G2 , σ2 ))/H and (G × G)/G? ]
9.4. Left transformation groups

9.4.1 Remark: Abstract groups are of little use in themselves, particularly in practical subjects like
differential geometry and physics. The useful groups are the ones which actually do something by acting on a
space of some kind, which is exactly what transformation groups are.
Transformation groups are used throughout differential geometry, for example in the definition of connections
on fibre bundles. See Section 16.7 for topological transformation groups and Section 33.7 for differentiable
transformation groups.
A transformation group is an algebraic system consisting of two sets and two operations. The active set
G −< (G, σ) is a group. The passive set X has no operation of its own. There is a binary operation
µ : G × X → X called the “action” of the group G on X. Transformation semigroups are also defined here,
but nothing useful is said about them.

9.4.2 Definition: A (left) transformation semigroup of a set X is a tuple (G, X, σ, µ) where (G, σ) is a
semigroup, and the map µ : G × X → X satisfies
(i) ∀g1 , g2 ∈ G, ∀x ∈ X, µ(σ(g1 , g2 ), x) = µ(g1 , µ(g2 , x)).
9.4.3 Remark: Definitions 9.4.2 and 9.4.4 do not exclude the possibility that X is the empty set.
9.4.4 Definition: A (left) transformation group of a set X is a tuple (G, X, σ, µ) where (G, σ) is a group,
and the map µ : G × X → X satisfies the following conditions.
(i) ∀g1 , g2 ∈ G, ∀x ∈ X, µ(σ(g1 , g2 ), x) = µ(g1 , µ(g2 , x)).
(ii) ∀x ∈ X, µ(e, x) = x, where e is the identity of G.
The map µ may be referred to as the action map or the action of G on X.
[ X in Definition 9.4.4 may be called the “G-space” or the “action set” of G? ]
9.4.5 Notation: For a left transformation group in Definition 9.4.4, µ(g, x) is generally denoted gx or g.x
for g ∈ G and x ∈ X. The associative condition (i) ensures that this does not result in ambiguity since
(g1 g2 )x = g1 (g2 x) for all g1 , g2 ∈ G and x ∈ X.
9.4.6 Notation: If G is a left transformation group of a set X with action µ : G × X → X, the map
Lg : X → X is defined by Lg (x) = µ(g, x) = gx for x ∈ X.
9.4.7 Remark: For any fixed g ∈ G in Definition 9.4.4, the map Lg : X → X must be a bijection, because
Lg(Lg⁻¹(x)) = Lg⁻¹(Lg(x)) = x for all x ∈ X, where Lg⁻¹ denotes the left translation by g⁻¹.
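The two conditions for a left transformation group, and the bijectivity of each map Lg, can be verified mechanically for a small example. The sketch below is a hedged illustration. It assumes the group of permutations of X = {0, 1, 2} acting on X in the natural way, and the names mu, compose and inverse are chosen for the example only.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))   # assumed example group
X = range(3)                       # the passive set
e = (0, 1, 2)                      # identity of G

def mu(g, x):
    """Action map: the permutation g moves the point x to g[x]."""
    return g[x]

# Condition (i): mu(g1 g2, x) = mu(g1, mu(g2, x)).
associative = all(
    mu(compose(g1, g2), x) == mu(g1, mu(g2, x))
    for g1 in G for g2 in G for x in X
)
# Condition (ii): mu(e, x) = x.
identity_ok = all(mu(e, x) == x for x in X)
# Each L_g is a bijection of X, with inverse given by the action of g^-1.
bijective = all(sorted(mu(g, x) for x in X) == list(X) for g in G)
inverse_undoes = all(mu(inverse(g), mu(g, x)) == x for g in G for x in X)

print(associative, identity_ok, bijective, inverse_undoes)
```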
9.4.8 Remark: Definition 9.4.9 is related to the definition of fibre bundle homomorphisms, and fibre
bundle homomorphisms are related to the inheritance (or “porting”) of parallelism between fibre bundles in
Section 24.3. However, some parts of Definition 9.4.9 are of dubious value. (Epimorphisms, in particular,
are rarely seen in public.) Definition 9.4.9 is illustrated in Figure 9.4.1.
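[ Figure 9.4.1 (diagram not reproduced): the groups G1 and G2, with operations σ1 and σ2, are joined by φ̂ : G1 → G2; the sets X1 and X2 are joined by φ : X1 → X2; µ1 is the action of G1 on X1 and µ2 is the action of G2 on X2. ]
Figure 9.4.1 Transformation group homomorphism maps and spaces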
[ The notation in Definition 9.4.9 could perhaps be improved by using φ̌ instead of φ? ]
9.4.9 Definition: A (left) transformation group homomorphism from a left transformation group
(G1, X1) −< (G1, X1, σ1, µ1) to a left transformation group (G2, X2) −< (G2, X2, σ2, µ2) is a pair of maps (φ̂, φ)
with φ̂ : G1 → G2 and φ : X1 → X2 such that
(i) φ̂(σ1 (g, h)) = σ2 (φ̂(g), φ̂(h)) for all g, h ∈ G1 . That is, φ̂(gh) = φ̂(g)φ̂(h) for g, h ∈ G1 .
(ii) φ(µ1 (g, x)) = µ2 (φ̂(g), φ(x)) for all g ∈ G1 , x ∈ X1 . That is, φ(gx) = φ̂(g)φ(x) for g ∈ G1 , x ∈ X1 .
A (left) transformation group isomorphism from (G1, X1) to (G2, X2) is a left transformation group
homomorphism (φ̂, φ) such that φ̂ : G1 → G2 and φ : X1 → X2 are bijections.
A (left) transformation group endomorphism of a left transformation group (G, X) is a left transformation
group homomorphism (φ̂, φ) from (G, X) to (G, X).
A (left) transformation group automorphism of a left transformation group (G, X) is a left transformation
group isomorphism (φ̂, φ) from (G, X) to (G, X).

A (left) transformation group monomorphism from (G1, X1) to (G2, X2) is a left transformation group
homomorphism (φ̂, φ) such that φ̂ : G1 → G2 and φ : X1 → X2 are injective.
A (left) transformation group epimorphism from (G1, X1) to (G2, X2) is a left transformation
group homomorphism (φ̂, φ) such that φ̂ : G1 → G2 and φ : X1 → X2 are surjective.
9.4.10 Example: One naturally asks whether the pair of maps in Definition 9.4.9 may be replaced with
a single map. Unfortunately, neither φ̂ nor φ determines the other. Consider the example of embedding
the transformation group (G1, X1) = (SO(2), S¹) inside (G2, X2) = (GL(2), ℝ²). The pair of identity maps
φ̂ : G1 → G2 and φ : X1 → X2 constitutes a transformation group homomorphism. (See Figure 9.4.2.)
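[ Figure 9.4.2 (diagram not reproduced): the rotation Rθ ∈ SO(2), acting on S¹ via µ1, is mapped by φ̂ to Rθ ∈ GL(2), acting on ℝ² via µ2; points x, y ∈ S¹ are mapped by φ to φ(x), φ(y) ∈ ℝ². ]
Figure 9.4.2 Transformation group homomorphism in Example 9.4.10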
In fact, it is a monomorphism. But for a fixed choice of φ̂, the pair (φ̂, ψ ◦ φ) is also a transformation group
homomorphism for any map ψ : ℝ² → ℝ² which commutes with every rotation Rθ, for example any scaling
or rotation about the origin. For a fixed choice of φ, if the target group G2 is not specified, then G2 may be
replaced by any supergroup of G2 acting on X2, with the same map φ̂, and the pair (φ̂, φ) is still a
transformation group homomorphism. If
the target group G2 is given, the map φ̂ may or may not be uniquely specified by the map φ. In the current
example, φ̂ is uniquely determined by φ. But if the group G2 is replaced with the group of all bijections
of ℝ² (the permutation group of ℝ²), then, for example, φ̂ may be replaced by the map which sends group
elements Rθ ∈ SO(2) to bijections of ℝ² which rotate the points of S¹ in the same way but leave all other
points of ℝ² fixed. More generally, any map φ determines only the action of group elements φ̂(g) ∈ G2 on
points in the image of φ.
It follows from this example that transformation group homomorphisms must specify both maps in general.
9.4.11 Remark: Definition 9.4.9 (ii) may be expressed as φ ◦ Lg = Lφ̂(g) ◦ φ for all g ∈ G1 . Hence
∀g1 ∈ G1 , ∃g2 ∈ G2 , φ ◦ Lg1 = Lg2 ◦ φ, since one can simply put g2 = φ̂(g1 ). But the reverse does not follow.
A necessary condition for the existence of g1 ∈ G1 for a given g2 ∈ G2 is that Lg2 (Range(φ)) ⊆ Range(φ).
A sufficient condition is that g2 ∈ Range(φ̂).
[ Check if the combined necessary and sufficient conditions in Remark 9.4.11 imply existence of g1 . Also
examine the (non-)uniqueness of g1 and g2 when the other is given. ]
9.4.12 Remark: An equivariant map in Definition 9.4.13 is the map φ of a left transformation group
homomorphism (φ̂, φ) in Definition 9.4.9 for which φ̂ : G1 → G2 = G1 is the identity map.
9.4.13 Definition: An equivariant map between left transformation groups (G, X1) −< (G, X1, σ, µ1) and
(G, X2) −< (G, X2, σ, µ2) is a map φ : X1 → X2 which satisfies
(i) ∀g ∈ G, ∀x ∈ X1, φ(µ1(g, x)) = µ2(g, φ(x)). That is, φ(gx) = gφ(x) for g ∈ G, x ∈ X1.
[ Near here, define group anti-homomorphisms. They are useful for representations of a group as a right action,
and for other purposes. ]
9.4.14 Definition: An effective (left) transformation group is a left transformation group (G, X, σ, µ) such
that ∀g ∈ G \ {e}, ∃x ∈ X, gx ≠ x. (In other words, Lg = Le only if g = e.)
Such a left transformation group G −< (G, X, σ, µ) is said to act effectively on X.

9.4.15 Remark: If (G, X, σ, µ) is an effective left transformation group and g, g′ ∈ G are such that Lg =
Lg′, then g = g′. In other words, no two different group elements produce the same action. That is, the
group element is uniquely determined by its group action. Conversely, a left transformation group in which
each group element is uniquely determined by its action must be effective. (For proof, see Exercise 46.4.6.)
9.4.16 Example: A simple example of an effective left transformation group is the group (G, X) of all
permutations of an arbitrary set X. (This group is called the “symmetric group” of X if X is finite. See
Section 7.2 for permutations.) The action of each group element g ∈ G is a bijection Lg : X → X. The
group operation of G is determined by µ as function composition so that Lgg′ = Lg ◦ Lg′ for all g, g′ ∈ G. This
is the largest possible effective left transformation group of a given set X.
At the other extreme is the trivial group G containing only the identity operation on any set X.
9.4.17 Example: A fairly general example of a non-effective left transformation group is given by the
tuple (G, Xℓ, σ, µℓ) which is the group of left translations of the set Xℓ of left cosets of an arbitrary subgroup
H of an arbitrary group G −< (G, σ) as described in Remark 9.3.12. If H contains a non-trivial normal subgroup
of G (for example, if H is itself a non-trivial normal subgroup), then (G, Xℓ, σ, µℓ) is not effective.
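Whether the action on left cosets is effective can be checked by brute force in a small group. The sketch below is an illustration only. It assumes the group G of permutations of {0, 1, 2}, and all names in it are invented for the example. It computes the set of group elements which fix every left coset of H. This set is the intersection of the conjugates of H, so the action is effective exactly when that intersection is trivial.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def left_coset(g, H):
    """The left coset gH = {gh; h in H}."""
    return frozenset(compose(g, h) for h in H)

G = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # a non-trivial normal subgroup
H = [(0, 1, 2), (1, 0, 2)]               # a non-trivial subgroup which is not normal

def acts_trivially(g, H):
    """True if left translation by g fixes every left coset xH."""
    return all(left_coset(compose(g, x), H) == left_coset(x, H) for x in G)

def kernel(H):
    """The elements of G which act as the identity on the left cosets of H."""
    return [g for g in G if acts_trivially(g, H)]

print(kernel(A3))
print(kernel(H))
```

For the normal subgroup A3 every element of A3 acts trivially, so the coset action is not effective. For the two-element subgroup H only the identity acts trivially, so in that case the coset action is effective even though H is not the trivial group.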
9.4.18 Remark: If the set X of an effective left transformation group (G, X, σ, µ) is the empty set, the
only possible choice for G is the trivial group {e}.
9.4.19 Theorem: The group operation of an effective left transformation group is uniquely determined by
the action map.
Proof: Let (G, X, σ, µ) be an effective left transformation group. Let g1, g2 ∈ G. Then µ(g1 g2, x) =
µ(g1, µ(g2, x)) for all x ∈ X. Suppose that there are two group elements g3, g4 ∈ G such that µ(g3, x) =
µ(g4, x) = µ(g1 g2, x) for all x ∈ X. Then for all x ∈ X, µ(g3 g4⁻¹, x) = µ(g3, µ(g4⁻¹, x)) = µ(g4, µ(g4⁻¹, x)) =
µ(g4 g4⁻¹, x) = µ(e, x) = x. This implies that g3 g4⁻¹ = e since the group action is effective, and hence
g3 = g4. So the group element σ(g1, g2) = g1 g2 is uniquely determined by the action map µ.
9.4.20 Remark: Theorem 9.4.19 implies that there is no real need to specify the group operation σ of
an effective left transformation group (G, X, σ, µ) because all of the information is in the action map. An
effective transformation group is no more than the set of left transformations Lg : X → X of the set X. If
the group action is not effective, then there are at least two distinct group elements g, g′ ∈ G which specify the same
action Lg = Lg′ : X → X. Any group which is explicitly constructed as a set of transformations of a set will
automatically be effective.
If a left transformation group is effective, the group elements g and the corresponding left translations Lg
may be used interchangeably. There is no danger of real ambiguity in this.
9.4.21 Definition: A left transformation group G −< (G, X, σ, µ) is said to act freely on the set X if
∀g ∈ G \ {e}, ∀x ∈ X, gx ≠ x; that is, if group elements apart from the identity e have no fixed points.
A free left transformation group is a left transformation group (G, X, σ, µ) which acts freely on X.
9.4.22 Remark: In the special case that X = ∅ in Definition 9.4.21, the group G is completely arbitrary.
Therefore every left transformation group of the empty set acts freely, but it acts effectively only if G is the
trivial group {e}. If X ≠ ∅, then all transformation groups which act freely on X are effective.
Spivak [43], volume II, page 309, says “G acts without fixed point” rather than “G acts freely”.
9.4.23 Remark: The free transformation groups in Definitions 9.4.21 and 9.5.11 must be distinguished
from “free groups”, which are groups (not transformation groups) which are generated freely from a set of
abstract elements.
A canonical example of a transformation group which acts freely is a group acting on itself by left or right
translation as in Theorem 9.4.24. This is the kind of transformation group which is needed for principal fibre
bundles in Sections 23.9 and 34.4, which play an important role in defining parallelism and connections.
9.4.24 Theorem: Let (G, σ) be a group. Define the action map µ : G × G → G by µ : (g1, g2) ↦ σ(g1, g2).
Then the tuple (G, G, σ, µ) = (G, G, σ, σ) is a free left transformation group of G.

Proof: For a left transformation group, the action map µ : G × X → X must satisfy the associativity
rule µ(σ(g1, g2), x) = µ(g1, µ(g2, x)) for all g1, g2 ∈ G and x ∈ X. If the formula for µ in the theorem is
substituted into this rule with X = G, it follows easily from the associativity of σ. The identity rule
µ(e, x) = x holds because e is the identity of G. The transformation group acts freely because gx = x for
some x ∈ G implies g = (gx)x⁻¹ = xx⁻¹ = e.
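The difference between acting effectively and acting freely can be made concrete with two actions of the same small group. The sketch below is illustrative and rests on stated assumptions: G is the group of permutations of {0, 1, 2}, which acts on the three points in the natural way and on itself by left translation. The predicate names is_effective and is_free are invented for the example and simply transcribe the quantifiers of the two definitions.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))
e = (0, 1, 2)

def is_effective(G, X, mu, e):
    """Every g other than e moves at least one point of X."""
    return all(any(mu(g, x) != x for x in X) for g in G if g != e)

def is_free(G, X, mu, e):
    """Every g other than e moves every point of X."""
    return all(mu(g, x) != x for g in G if g != e for x in X)

def natural(g, x):
    """Natural action of a permutation on the points 0, 1, 2."""
    return g[x]

def translation(g, x):
    """Left translation action of G on itself."""
    return compose(g, x)

print(is_effective(G, range(3), natural, e), is_free(G, range(3), natural, e))
print(is_effective(G, G, translation, e), is_free(G, G, translation, e))
print(is_effective(G, [], natural, e), is_free(G, [], natural, e))
```

The natural action is effective but not free, since a transposition fixes one point. The left translation action is both effective and free. The last line uses the empty set as the passive set, where the action is vacuously free but not effective for this non-trivial group.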
9.4.25 Definition: The (left) transformation group of G acting on G by left translation, or the left
translation group of G, is the left transformation group (G, G, σ, σ).
9.4.26 Remark: The group action in Theorem 9.4.24 is only supposed to apply to the active copy of G,
but it is an interesting philosophical question as to whether this really means anything.
9.4.27 Definition: The orbit of a left transformation group (G, X) passing through the point x ∈ X is
the set Gx = {gx; g ∈ G}.
9.4.28 Definition: The orbit space of a left transformation group (G, X) is the set of orbits {Gx; x ∈ X},
denoted as X/G.
9.4.29 Theorem: The orbit space of a left transformation group (G, X) is a partition of X.
Proof: Let (G, X, σ, µ) be a left transformation group. Every x ∈ X lies in the orbit Gx because ex = x.
Let x1, x2 ∈ X be such that Gx1 ∩ Gx2 ≠ ∅. Then
g1 x1 = g2 x2 for some g1, g2 ∈ G. So for any g ∈ G, gx1 = (gg1⁻¹)g1 x1 = gg1⁻¹g2 x2 ∈ Gx2. Hence Gx1 ⊆ Gx2.
Similarly Gx1 ⊇ Gx2. So Gx1 = Gx2 and it follows that X/G is a partition of X. (This proof is suspiciously
similar to the proof of Theorem 9.3.6.)
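The orbit space of a small transformation group can be listed explicitly. The sketch below is a hedged example. It assumes a cyclic group generated by a single permutation p of the five points {0, 1, 2, 3, 4}, where p cycles 0, 1, 2 and swaps 3 and 4. The permutation p and the helper names are choices made for the example.

```python
def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def generate(p):
    """List the cyclic group generated by the permutation p."""
    e = tuple(range(len(p)))
    elements, g = [e], p
    while g != e:
        elements.append(g)
        g = compose(p, g)
    return elements

# Assumed example: p cycles 0 -> 1 -> 2 -> 0 and swaps 3 <-> 4.
p = (1, 2, 0, 4, 3)
G = generate(p)
X = range(5)

def orbit(x):
    """The orbit Gx = {gx; g in G}."""
    return frozenset(g[x] for g in G)

orbit_space = {orbit(x) for x in X}

# The orbits should be pairwise disjoint and should cover X.
covers = set().union(*orbit_space) == set(X)
disjoint = sum(len(o) for o in orbit_space) == len(X)

print(len(G), sorted(sorted(o) for o in orbit_space), covers and disjoint)
```

Here the group has six elements and the orbit space consists of the two orbits {0, 1, 2} and {3, 4}.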
9.4.30 Remark: Another way to show that the orbit space is a partition of the passive set X in Theorem
9.4.29 is to note that the relation (R, X, X) defined by R = {(x1 , x2 ) ∈ X × X; ∃g ∈ G, x1 = gx2 } is an
equivalence relation whose equivalence classes are of the form Gx for x ∈ X.
9.4.31 Remark: Apart from being a partition of the passive set X, the orbit space of (G, X) seems to
have no further structure or operations from the transformation group. (Of course there may be topological
or algebraic structures on X, but these are ignored here.) The group action has been effectively “moduloed
out” by the quotient construction. This contrasts with the situation for mixed transformation groups.
9.4.32 Example: The set of rotations of a sphere about the axis through its North pole is a group whose
orbits are the lines of constant latitude. (See Figure 9.4.3.) The group action causes translations within each
orbit but not between them. (See Exercise 46.4.7.)
[ Figure 9.4.3 (diagram not reproduced). ]
Figure 9.4.3 Orbits of rotation of S² by 120° in Example 9.4.32
9.4.33 Definition: The stabilizer of a point x ∈ X for a left transformation group (G, X, σ, µ) is the
group (G_x, σ_x) with G_x = {g ∈ G; gx = x} and σ_x = σ|G_x×G_x, the restriction of σ to G_x × G_x.
9.4.34 Remark: A left transformation group (G, X, σ, µ) acts freely on X if and only if G_x = {e} for
all x ∈ X. This follows immediately from Definitions 9.4.21 and 9.4.33.
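Stabilizers are equally easy to compute for a small example. The following sketch is illustrative only. It assumes the cyclic group generated by the permutation p of {0, 1, 2, 3, 4} which cycles 0, 1, 2 and swaps 3 and 4, and the helper names are invented for the example. It lists the stabilizer of each point and tests freeness through the criterion that every stabilizer is trivial.

```python
def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def generate(p):
    """List the cyclic group generated by the permutation p."""
    e = tuple(range(len(p)))
    elements, g = [e], p
    while g != e:
        elements.append(g)
        g = compose(p, g)
    return elements

# Assumed example: p cycles 0 -> 1 -> 2 -> 0 and swaps 3 <-> 4.
p = (1, 2, 0, 4, 3)
e = (0, 1, 2, 3, 4)
G = generate(p)
X = range(5)

def stabilizer(x):
    """The set of group elements g with gx = x."""
    return {g for g in G if g[x] == x}

# The action is free exactly when every stabilizer contains only the identity.
acts_freely = all(stabilizer(x) == {e} for x in X)

for x in X:
    print(x, sorted(stabilizer(x)))
print(acts_freely)
```

In this example the points 0, 1, 2 are fixed by the cube of p, and the points 3, 4 are fixed by the square of p, so no stabilizer is trivial and the action is not free.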

[ Show that Range(σx ) = Gx for Definition 9.4.33. ]

[ Present quotients related to stabilizers near here. ]
[ Near here, present approximately three isomorphism theorems. ]
9.4.35 Remark: Analogous to the direct products of groups in Definition 9.3.30 are the direct products
of transformation groups in Definitions 9.4.36 and 9.4.37. The group (G, σ) in Definition 9.4.36 is the direct
product group introduced in Definition 9.3.30. It is a very easy exercise to verify that the tuples (G, X, σ, µ)
in Definitions 9.4.36 and 9.4.37 satisfy Definition 9.4.4 for a left transformation group. (See Exercise 46.4.8.)
For left transformation groups with the same group, Definitions 9.4.36 and 9.4.37 give two direct products
which are not generally isomorphic, although the group (G, σ) may be identified with a subgroup of the
direct product group (G, σ) × (G, σ) as mentioned in Remark 9.3.33.
9.4.36 Definition: The direct product of left transformation groups (G1, X1, σ1, µ1) and (G2, X2, σ2, µ2)
is the left transformation group (G, X, σ, µ) where (G, σ) = (G1, σ1) × (G2, σ2), X = X1 × X2, and µ :
G × X → X satisfies µ : ((g1, g2), (x1, x2)) ↦ (g1 x1, g2 x2).
9.4.37 Definition: The (same-group) direct product of left transformation groups (G, X1, σ, µ1) and
(G, X2, σ, µ2) is the left transformation group (G, X, σ, µ) where X = X1 × X2, and µ : G × X → X
satisfies µ : (g, (x1, x2)) ↦ (gx1, gx2).
9.4.38 Remark: It is possible to combine the products and quotients of left transformation groups. The
different-group direct product (G, X, σ, µ) in Definition 9.4.36 yields an orbit space (X1 × X2)/G which is
perhaps not very interesting. This orbit space is defined as the set {G(x1, x2); x1 ∈ X1, x2 ∈ X2}, where
G(x1, x2) denotes the set {(g1 x1, g2 x2); g1 ∈ G1, g2 ∈ G2} = (G1 x1) × (G2 x2). Therefore (X1 × X2)/G may
be identified with the set (X1/G1) × (X2/G2) = {(G1 x1, G2 x2); x1 ∈ X1, x2 ∈ X2}.
The left action ν1 : G1 × ((X1 × X2)/G) → (X1 × X2)/G defined by ν1 : (g1, G1 x1 × G2 x2) ↦ (G1 g1 x1 × G2 x2)
yields nothing of interest because G1 g1 = G1 for all g1 ∈ G1.
9.4.39 Remark: A slightly more interesting product-quotient arises from the same-group direct product in
Definition 9.4.37. In this case, the set (X1 × X2)/G equals {G(x1, x2); x1 ∈ X1, x2 ∈ X2}, where G(x1, x2) =
{(g x1, g x2); g ∈ G} = [(x1, x2)]. These orbits do not generally split into simple products as was the case in
Remark 9.4.38.
It is tempting to believe that a left action ν1 can now be defined on the orbit space (X1 × X2)/G as
ν1 : G × ((X1 × X2)/G) → (X1 × X2)/G where ν1 : (h, [(x1, x2)]) ↦ [(h x1, x2)]. Unfortunately, this is
well-defined only if h commutes with every element of G. This is because [(h g x1, g x2)] must equal [(h x1, x2)] for
all g ∈ G, and [(h g x1, g x2)] = [(g⁻¹ h g x1, x2)]. But it is not generally true that g⁻¹ h g = h. [ This motivates
the definition of mixed left and right actions? ]
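The failure of ν1 in Remark 9.4.39 can be checked on a small finite example. The following Python sketch is an illustration only and is not part of the formal development. It assumes the symmetric group S3 acting on X1 = X2 = {0, 1, 2} by permuting points. The names orbit, nu_is_well_defined and good_h are invented for this example.

```python
from itertools import permutations

# Assumed example: G = S3, each element stored as a tuple g with g[i] the image of i.
G = list(permutations(range(3)))
X = range(3)  # X1 = X2 = {0, 1, 2}, with left action g.x = g[x]

def orbit(x1, x2):
    """The orbit [(x1, x2)] under the same-group action g.(x1, x2) = (g x1, g x2)."""
    return frozenset((g[x1], g[x2]) for g in G)

def nu_is_well_defined(h):
    """True if [(h x1, x2)] does not depend on the representative of [(x1, x2)]."""
    return all(orbit(h[y1], y2) == orbit(h[x1], x2)
               for x1 in X for x2 in X
               for (y1, y2) in orbit(x1, x2))

# Elements h for which the hoped-for action nu1 is well defined.
good_h = [h for h in G if nu_is_well_defined(h)]
```

In this example good_h contains only the identity permutation, which is the only element of S3 that commutes with every element of the group.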
9.5. Right transformation groups

[ This section is too boring to read. It is intended for reference only. ]
This section is not a simple mirror image of Section 9.4. It is true that left and right transformation
groups are essentially mirror images of each other, but there are numerous subtleties which must be checked,
particularly when left and right transformation groups are combined in various ways in Section 9.6. After
all, a real-life mirror may be held at different angles to create many different mirror images.
An example of the need for both left and right transformation groups is the definition of a principal fibre
bundle. (For example, see Section 23.9.) The total space of a principal fibre bundle is acted on by fibre chart
transition maps on the left, and by structure group actions on the right. The fact that the actions are on
opposite sides makes the notation more convenient.
9.5.1 Definition: A right transformation semigroup of a set X is a tuple (G, X, σ, µ) where (G, σ) is a
semigroup, and the map µ : X × G → X satisfies
(i) ∀g1 , g2 ∈ G, ∀x ∈ X, µ(x, σ(g1 , g2 )) = µ(µ(x, g1 ), g2 ).

9.5.2 Definition: A right transformation group of a set X is a tuple (G, X, σ, µ) where (G, σ) is a group,
and the map µ : X × G → X satisfies
(i) ∀g1 , g2 ∈ G, ∀x ∈ X, µ(x, σ(g1 , g2 )) = µ(µ(x, g1 ), g2 ).
(ii) ∀x ∈ X, µ(x, e) = x, where e is the identity of G.
The map µ may be referred to as the action map or the action of G on X.
9.5.3 Notation: For a right transformation group in Definition 9.5.2, µ(x, g) is generally denoted as xg
or x.g for g ∈ G and x ∈ X. The associative condition (i) ensures that this does not result in ambiguity,
since x(g1 g2) = (xg1)g2 for all g1, g2 ∈ G and x ∈ X.
9.5.4 Remark: The difference between left and right transformation groups lies in the associativity rule,
condition (i) in Definition 9.5.2, rather than in the order of G and X in the domain G × X or X × G of the
action µ. The importance of the associativity differences becomes clearer from a study of Definition 9.6.1.
9.5.5 Notation: If G is a right transformation group of a set X with action µ : X × G → X, the map
Rg : X → X is defined by Rg (x) = µ(x, g) = xg for x ∈ X.
9.5.6 Remark: For any fixed g ∈ G in Definition 9.5.2, the map Rg : X → X must be a bijection, because
Rg(R_{g⁻¹}(x)) = R_{g⁻¹}(Rg(x)) = x for all x ∈ X.
9.5.7 Remark: Definition 9.5.8 is used in principal fibre bundles. (See Section 23.9.) The fibre charts of
a principal fibre bundle are equivariant maps with respect to the right action of a structure group on the
total space of a principal fibre bundle.
9.5.8 Definition: An equivariant map between right transformation groups (G, X1) −< (G, X1, σ, µ1) and
(G, X2) −< (G, X2, σ, µ2) is a map φ : X1 → X2 which satisfies
(i) ∀g ∈ G, ∀x ∈ X1 , φ(µ1 (x, g)) = µ2 (φ(x), g). That is, φ(xg) = φ(x)g for g ∈ G, x ∈ X1 .
[ Define right transformation group homomorphisms. ]
9.5.9 Definition: An effective right transformation group is a right transformation group (G, X, σ, µ) such
that ∀g ∈ G \ {e}, ∃x ∈ X, xg ≠ x. (In other words, Rg = Re only if g = e.)
Such a right transformation group G −< (G, X, σ, µ) is said to act effectively on X.
9.5.10 Theorem: The group operation σ in Definition 9.5.2 is uniquely determined by the action map µ
if the group action is effective.
Proof: The argument is essentially the same as for left transformation groups in Theorem 9.4.19.
9.5.11 Definition: A right transformation group G −< (G, X, σ, µ) is said to act freely on the set X if
∀g ∈ G \ {e}, ∀x ∈ X, xg ≠ x; in other words, group elements apart from the identity e have no fixed points.
A free right transformation group is a right transformation group (G, X, σ, µ) which acts freely on X.
9.5.12 Theorem: Let G −< (G, σ) be a group. Define the action map µ : G × G → G by µ : (g1, g2) ↦
σ(g1, g2). Then the tuple (G, G, σ, µ) = (G, G, σ, σ) is a free right transformation group of G.
Proof: For a right transformation group, the action map µ : X × G → X must satisfy the associativity
rule µ(x, σ(g1, g2)) = µ(µ(x, g1), g2) for all g1, g2 ∈ G and x ∈ X. If the formula for µ in the theorem is
substituted into this rule with X = G, it follows easily from the associativity of σ. The identity rule µ(x, e) = x
holds because e is the identity of G. The transformation group acts freely because xg = x implies g = e by
left cancellation in the group G.
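Theorem 9.5.12 can be checked by brute force on a small group. The following Python sketch assumes G = S3 represented by permutation tuples; this representation and the names sigma and mu are choices made for the illustration, not notation from the text.

```python
from itertools import permutations

# Assumed example: G = S3, each element stored as a tuple p with p[i] the image of i.
G = list(permutations(range(3)))
e = (0, 1, 2)

def sigma(g1, g2):
    """Group operation: composition, apply g2 first and then g1."""
    return tuple(g1[g2[i]] for i in range(3))

def mu(x, g):
    """Right translation: the action map equals the group operation."""
    return sigma(x, g)

# Condition (i) of Definition 9.5.2: x(g1 g2) = (x g1) g2.
assoc_ok = all(mu(x, sigma(g1, g2)) == mu(mu(x, g1), g2)
               for x in G for g1 in G for g2 in G)
# Condition (ii) of Definition 9.5.2: x e = x.
identity_ok = all(mu(x, e) == x for x in G)
# Freeness (Definition 9.5.11): x g != x whenever g != e.
free_ok = all(mu(x, g) != x for g in G if g != e for x in G)
```

All three flags evaluate to True for this example, which agrees with the theorem but of course does not replace the proof.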
9.5.13 Definition: The right transformation group of G acting on G by right translation, or the right
translation group of G, is the right transformation group (G, G, σ, σ).
9.5.14 Remark: Oddly enough, the tuple (G, G, σ, σ) is identical for the left and right transformation
groups where G acts on G. This is yet another example which demonstrates that definitions of object classes
are more than just the sets in the specification tuples. The same tuple is used to parametrize two different
classes of objects: both the left and right transformation group classes.

[ Present orbits, stabilizers and quotients near here. ]
9.5.15 Remark: The direct products of right transformation groups in Definitions 9.5.16 and 9.5.17 are
simple mirror images of the direct products of left transformation groups in Definitions 9.4.36 and 9.4.37. The direct product
group (G, σ) = (G1 , σ1 ) × (G2 , σ2 ) is the same for left and right transformation group direct products.
9.5.16 Definition: The direct product of right transformation groups (G1 , X1 , σ1 , µ1 ) and (G2 , X2 , σ2 , µ2 )
is the right transformation group (G, X, σ, µ) where (G, σ) = (G1 , σ1 ) × (G2 , σ2 ), X = X1 × X2 , and
µ : X × G → X satisfies µ : ((x1, x2), (g1, g2)) ↦ (x1 g1, x2 g2).
9.5.17 Definition: The (same-group) direct product of right transformation groups (G, X1 , σ, µ1 ) and
(G, X2 , σ, µ2 ) is the right transformation group (G, X, σ, µ) where X = X1 × X2 , and µ : X × G → X
satisfies µ : ((x1, x2), g) ↦ (x1 g, x2 g).
9.5.18 Remark: As noted in Remark 9.4.39, product-quotient orbit spaces of the form (X1 × X2 )/G may
be constructed from the direct products in Definitions 9.5.16 and 9.5.17.
9.6. Mixed transformation groups

Left and right transformation groups are often mixed in the definitions of fibre bundles. Mixed transformation
groups are particularly applicable in the definition of associated fibre bundles.
This section was inspired by the strangeness of the way associated fibre bundles are constructed via orbit
spaces, as for example condition (i) in Definition 23.12.3. The objective of this section is to reconstruct a
plausible algebraic background for those strange constructions. In particular, the objective is to construct
generalizations of tensor and multilinear maps to transformation groups. (In fact, it has turned out that the
product-quotient groups in this section are almost irrelevant to defining associated fibre bundles, no matter
what the textbooks say.)
9.6.1 Definition: The mirror image of a left transformation group (G, X, σ, µ) is the right transformation
group (G, X, σ, µ′), where µ′ : X × G → X is defined by µ′ : (x, g) ↦ µ(g⁻¹, x).
The mirror image of a right transformation group (G, X, σ, µ) is the left transformation group (G, X, σ, µ′),
where µ′ : G × X → X is defined by µ′ : (g, x) ↦ µ(x, g⁻¹).
[ The mirror images in Definition 9.6.1 are important for converting left transformation groups (G, F1 ), (G, F2 )
so that (G, F1 ) is a right transformation group, and then form a skew product from this. ]
9.6.2 Remark: Definition 9.6.1 is non-standard, at least in name. To show that the mirror image of
a left transformation group is a valid right transformation group, it is necessary to verify conditions (i)
and (ii) of Definition 9.5.2. Condition (ii) is trivial to verify. Condition (i) follows from the calculation
µ′(x, g1 g2) = µ(g2⁻¹ g1⁻¹, x) = µ(g2⁻¹, µ(g1⁻¹, x)) = µ′(µ(g1⁻¹, x), g2) = µ′(µ′(x, g1), g2). The validity of the
mirror image of a right transformation group follows identically. As one would expect, the mirror image of
the mirror image is the same as the original transformation group.
An attempt to similarly construct a left transformation group from a left transformation group, by simply
replacing g with g⁻¹, fails because this would yield µ′(g1 g2, x) = µ′(g2, µ′(g1, x)), which has the wrong
order. An attempt to construct mirror image groups using some sort of inverse of the group operation σ fails
similarly (as mentioned in Remark 9.3.32). Therefore it seems unlikely that further similar mirror images
for general transformation groups can be constructed.
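The calculations in Remark 9.6.2 can be confirmed on a finite example. The following Python sketch assumes G = S3 acting from the left on X = {0, 1, 2} by mu(g, x) = g[x]. The function names mu_mirror and mu_left_attempt are invented labels for the mirror image of Definition 9.6.1 and for the failed construction which keeps g on the left.

```python
from itertools import permutations

# Assumed example: G = S3 acting on X = {0, 1, 2} from the left by mu(g, x) = g[x].
G = list(permutations(range(3)))
X = range(3)

def sigma(g1, g2):
    """Group operation: composition, apply g2 first and then g1."""
    return tuple(g1[g2[i]] for i in range(3))

def inv(g):
    """Inverse permutation."""
    out = [0] * 3
    for i, gi in enumerate(g):
        out[gi] = i
    return tuple(out)

def mu(g, x):
    """Left action of G on X."""
    return g[x]

def mu_mirror(x, g):
    """Mirror image as in Definition 9.6.1: (x, g) -> mu(g^-1, x)."""
    return mu(inv(g), x)

def mu_left_attempt(g, x):
    """Failed attempt from Remark 9.6.2: keep g on the left but invert it."""
    return mu(inv(g), x)

triples = [(g1, g2, x) for g1 in G for g2 in G for x in X]

# Right associativity rule x(g1 g2) = (x g1) g2 for the mirror image.
mirror_is_right_action = all(
    mu_mirror(x, sigma(g1, g2)) == mu_mirror(mu_mirror(x, g1), g2)
    for g1, g2, x in triples)

# Left associativity rule (g1 g2)x = g1(g2 x) tested on the failed attempt.
attempt_is_left_action = all(
    mu_left_attempt(sigma(g1, g2), x) == mu_left_attempt(g1, mu_left_attempt(g2, x))
    for g1, g2, x in triples)

# The mirror image of the mirror image is the original left action.
double_mirror_ok = all(mu_mirror(x, inv(g)) == mu(g, x) for g in G for x in X)
```

For this non-commutative example, mirror_is_right_action and double_mirror_ok are True, while attempt_is_left_action is False.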
[ Near here, define mixed transformation group homomorphisms. ]
9.6.3 Remark: The “skew products” (almost certainly a non-standard name) of a left and right transfor-
mation group in Definitions 9.6.4, 9.6.5, 9.6.8 and 9.6.9 are the same as the direct product of the left (or
right) transformation group with the mirror image of a right (or left) transformation group. It is an easy
exercise to verify that these skew products satisfy Definitions 9.4.4 and 9.5.2 for left and right transformation
groups. (See Exercise 46.4.9.)

9.6.4 Definition: The left outside skew product of a left transformation group (G1 , X1 , σ1 , µ1 ) and a
right transformation group (G2 , X2 , σ2 , µ2 ) is the left transformation group (G, X, σ, µ) where (G, σ) =
(G1, σ1) × (G2, σ2), X = X1 × X2, and µ : G × X → X satisfies µ : ((g1, g2), (x1, x2)) ↦ (g1 x1, x2 g2⁻¹).
9.6.5 Definition: The right inside skew product of a right transformation group (G1 , X1 , σ1 , µ1 ) and a
left transformation group (G2 , X2 , σ2 , µ2 ) is the right transformation group (G, X, σ, µ) where (G, σ) =
(G1, σ1) × (G2, σ2), X = X1 × X2, and µ : X × G → X satisfies µ : ((x1, x2), (g1, g2)) ↦ (x1 g1, g2⁻¹ x2).
9.6.6 Remark: It is important to use the correct choice of inverses in Definitions 9.6.4 and 9.6.5. If the
map µ : ((g1, g2), (x1, x2)) ↦ (g1⁻¹ x1, x2 g2) is used for the left skew product, the correct associativity rule
is not obeyed. Similarly, associativity for the right skew product fails if the map µ : ((x1, x2), (g1, g2)) ↦
(x1 g1⁻¹, g2 x2) is used.
On the other hand, these incorrect maps become correct if left and right are swapped. That is, the left
skew product (G, X, σ, µ) becomes a right transformation group if the map µ is redefined as µ : X × G → X
with µ : ((x1, x2), (g1, g2)) ↦ (g1⁻¹ x1, x2 g2). Similarly, the right skew product (G, X, σ, µ) becomes a left
transformation group if µ is redefined as µ : G × X → X with µ : ((g1, g2), (x1, x2)) ↦ (x1 g1⁻¹, g2 x2). The
general rule is: “If the g is on the wrong side, use the inverse.”
[ The other two kinds of skew products should also be in Definitions 9.6.4, 9.6.5, 9.6.8 and 9.6.9. ]
These comments apply also to Definitions 9.6.8 and 9.6.9, which are essentially special cases of Definitions
9.6.4 and 9.6.5.
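The rule "if the g is on the wrong side, use the inverse" can be tested by brute force. The following Python sketch is a finite check under assumed data: G1 = G2 = S3, with G1 acting on X1 = {0, 1, 2} from the left by permutation and G2 acting on X2 = S3 from the right by right translation. The names skew, wrong, is_left_action and is_right_action are invented for this illustration.

```python
from itertools import permutations, product

G = list(permutations(range(3)))   # assumed example: G1 = G2 = S3
X1 = range(3)                      # left action of G1 on X1: g.x = g[x]
X2 = G                             # right action of G2 on X2 = S3: x.g = sigma(x, g)

def sigma(g1, g2):
    """Group operation: composition, apply g2 first and then g1."""
    return tuple(g1[g2[i]] for i in range(3))

def inv(g):
    """Inverse permutation."""
    out = [0] * 3
    for i, gi in enumerate(g):
        out[gi] = i
    return tuple(out)

pairs = list(product(G, G))        # the direct product group G1 x G2
points = list(product(X1, X2))     # the product set X1 x X2

def sigma_pair(g, h):
    """Componentwise operation of the direct product group."""
    return (sigma(g[0], h[0]), sigma(g[1], h[1]))

def skew(g, x):
    """Definition 9.6.4: ((g1, g2), (x1, x2)) -> (g1 x1, x2 g2^-1)."""
    return (g[0][x[0]], sigma(x[1], inv(g[1])))

def wrong(g, x):
    """Remark 9.6.6: ((g1, g2), (x1, x2)) -> (g1^-1 x1, x2 g2)."""
    return (inv(g[0])[x[0]], sigma(x[1], g[1]))

def is_left_action(act):
    """Left rule: (g h).x = g.(h.x)."""
    return all(act(sigma_pair(g, h), x) == act(g, act(h, x))
               for g in pairs for h in pairs for x in points)

def is_right_action(act):
    """Right rule: x.(g h) = (x.g).h, reading act(g, x) as x acted on by g."""
    return all(act(sigma_pair(g, h), x) == act(h, act(g, x))
               for g in pairs for h in pairs for x in points)

skew_is_left = is_left_action(skew)
wrong_is_left = is_left_action(wrong)
wrong_is_right = is_right_action(wrong)
```

In this example skew_is_left and wrong_is_right are True and wrong_is_left is False, in agreement with Remark 9.6.6.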
9.6.7 Remark: The same-group right skew product in Definition 9.6.9 is used in definitions of associated
fibre bundles. Definitions 9.4.36, 9.4.37, 9.5.16, 9.5.17, 9.6.1, 9.6.4 and 9.6.5 have been presented primarily
as preparation for Definition 9.6.9.
9.6.8 Definition: The (same-group) left outside skew product of a left transformation group (G, X1 , σ, µ1 )
and a right transformation group (G, X2 , σ, µ2 ) is the left transformation group (G, X, σ, µ) where X =
X1 × X2, and µ : G × X → X satisfies µ : (g, (x1, x2)) ↦ (g x1, x2 g⁻¹).
9.6.9 Definition: The (same-group) right inside skew product of a right transformation group
(G, X1 , σ, µ1 ) and a left transformation group (G, X2 , σ, µ2 ) is the right transformation group (G, X, σ, µ)
where X = X1 × X2, and µ : X × G → X satisfies µ : ((x1, x2), g) ↦ (x1 g, g⁻¹ x2).
[ Please ignore Remark 9.6.10 for now. It has many errors!! ]
9.6.10 Remark: In Remark 9.4.39, a “product-quotient” construction of the form (X1 ×X2 )/G was defined
as the orbit space of a group G acting on a direct product of left transformation groups. The same can be
done for right transformation groups and for mixed transformation groups such as in Definitions 9.6.4, 9.6.5,
9.6.8 and 9.6.9. As noted before, the different-group products (such as in Definitions 9.6.4 and 9.6.5) are not
very interesting.
[ Maybe should try to use (x1 g −1 , gx2 ) in the following instead of (x1 g, g −1 x2 ). ]
The form of product-quotient space most often used for associated fibre bundles is constructed from the
“right inside skew product” in Definition 9.6.9. The quotient of this skew product X1 × X2 with respect to
G is the orbit space (X1 × X2 )/G defined as {xG; x ∈ X1 × X2 } = {(x1 , x2 )G; x1 ∈ X1 , x2 ∈ X2 }, where
(x1 , x2 )G = {(x1 g, g −1 x2 ); g ∈ G} = [(x1 , x2 )] is an orbit of (G, X, σ, µ).
It was noted in Remark 9.4.39 that a certain kind of natural action by G on (X1 × X2 )/G is well-defined
only if the group is commutative, which is a very strong constraint. (This hoped-for action is a map of the
form ν : ([(x1, x2)], h) ↦ [(x1 h, x2)] or something similar.) It seems that the only way to get around this
commutativity requirement without G being commutative is to define both left and right transformations
on X1 or X2 . The commutativity of a left with a right action on a single set looks like an associativity rule,
namely (gx)h = g(xh) for group elements g and h. If such a rule holds, then a useful action may be defined
on the product-quotient space. (Such double-sided actions arise in the case that X1 or X2 is either a group
or the total space of a principal fibre bundle.)
Suppose that a group G acts on a set X1 both from the left and the right. Then there is a left transformation
group (G, X1, σ, µℓ1) and a right transformation group (G, X1, σ, µr1). First consider the direct product X1 × X2
of (G, X1, σ, µℓ1) and (G, X2, σ, µ2). Then (X1 × X2)/G = {[(x1, x2)]; x1 ∈ X1, x2 ∈ X2} as in Remark 9.4.39,
where [(x1, x2)] = {(g x1, g x2); g ∈ G}. A right transformation ν : ((X1 × X2)/G) × G → (X1 × X2)/G may
be defined by ν : ([(x1, x2)], h) ↦ [(x1 h, x2)] for h ∈ G. This is well-defined if µℓ1 commutes with µr1 because
then ν : ([(g x1, g x2)], h) ↦ [((g x1)h, g x2)] = [(g(x1 h), g x2)] = [(x1 h, x2)].
It would be interesting to know whether this right action ν can be “transferred” from X1 to X2 . That is,
although ν seems to act on the X1 component, it would be useful if this could be regarded as an action on
the X2 component instead.
The fact that the different-group product-quotient yields no interesting transformation groups, whereas the
same-group product-quotient does, may be thought of as being due to the orbits of the former being so large
that no group action can move a point from one orbit into another, while in the same-group case, the orbits
are something like “diagonal stripes” which can still be moved around by a group action.
[ Mention here the similarity between product-quotients of transformation groups and tensor products of
modules over rings. ]
[ Perhaps (X1 × X2)/G can be regarded as a sort of tensor space for some generalization of “multilinear”.
For linear spaces V1, V2, a multilinear map f : V1 × V2 → W satisfies f(λ⁻¹ v1, λ v2) = f(v1, v2) for nonzero λ in a
field K. ]
[ The right inside skew product has the property (?) that given a map φ : X1 → G, a right action µ1 : X1 ×G →
X1 and a left action µ2 : G × X2 → X2 , a function φ̃ : (X1 × X2 )/G → X2 such that φ̃([(x1 , x2 )]) = φ(x1 )x2 . ]
9.7. Figures and invariants of transformation groups

The purpose of this section is to indicate the wide range of transformation groups which may be associated
with each other. When the points of a geometry are transformed by elements of the structure group of
that geometry, not only the points of that geometry but also the subsets of the geometry are transformed.
The most general kind of thing that is transformed along with the points of a geometry may be referred
to as “figures”. The list below indicates the wide range of these figures, including the points of a space,
the subsets of points, functions valued on those points, distributions (generalized functions) valued on those
points, and so forth. The transformation group which transforms the points in the space is in some sense
“associated” with the group of transformations of the spaces of figures. Thus the group of transformations
of tangent vectors is associated with the group of transformations of covectors or tensors of various kinds.
This helps to motivate the notions of associated fibre bundles in Section 23.12.
[ See EDM2 [35], article 226, for classical terminology for invariants and covariants. ]
This section is motivated by the observation that convex combinations of vectors are preserved under affine
transformations. This is not strictly invariance, because the midpoint of two points, for instance, is trans-
formed to a different point by an affine transformation, but the midpoint of two transformed points is the
same as the transformed midpoint of the original points. This kind of preservation of relations could be
described as a kind of “covariance”. In other words, the transformation group preserves relations rather
than points or functions of points. However, it is arguable that the same word “invariant” could be applied
both to fixed points, fixed function values, and fixed relations under transformation groups.
When parallelism is defined on fibre bundles, it is often required that some structure is preserved. Thus,
for instance, if two objects attached to one point of a set satisfy some relation, then the same two objects
transported in a parallel fashion to another point may be required to maintain that relation. A property
or relation which is maintained under a transformation is called an “invariant” of the group of transforma-
tions. So when defining parallelism for manifolds, it is convenient to define a “structure group” so that the
properties and relations which are supposed to be maintained are invariants of that group. This motivates
the presentation here of transformation group invariants.
The definition of group invariants is based on the notion of “figures”. A group action on a space X may
be extended to a wide variety of spaces of figures derived from the basic space X. It does not seem to be
possible to define general spaces of figures and the extended group actions on them in a systematic way. To
see why, consider the following examples, where X is a set acted on by a group G −< (G, X, σ, µ). The set of
figures is denoted as F , and the operator Lg : F → F denotes an action of g ∈ G on F in each case.

[ Is it true that all transformations of a set X which preserve the invariants of a group G on X are members
of G? ]
(1) F ⊆ X. The figures are just the points of the base set X. In this case, the natural action Lg : F → F
satisfies Lg : x ↦ µ(g, x).
(2) F ⊆ ℙ(X). The figures are subsets of X. The natural operation Lg : F → F satisfies Lg(S) =
{µ(g, x); x ∈ S} for all S ∈ F. A special case would be the set of images of one-to-one
(continuous) maps f : Ck → X, where $C_k = \{x \in \mathbb{R}^k ;\ \sum_{i=1}^k x_i \le 1 \text{ and } \forall i = 1 \ldots k,\ x_i \ge 0\}$ is a closed
k-dimensional “cell”. Such images might be referred to as “disoriented cells”. Closed cells could be
replaced with open cells or boundaries of cells. By fixing k, the set of figures could be restricted to just
line segments or triangles etc.
(3) ∀γ ∈ F, γ : I → X. The figures γ are maps from some set I to X. Examples are sequences of elements
of X such as a basis of a vector space (where I is a finite or countable set), curves in X (where I is an
interval of the real line), and families of curves in X (where I is a subset of ℝ² such as [0, 1] × [0, 1]).
The natural operation Lg : F → F is defined by Lg (γ)(t) = µ(g, γ(t)) for γ ∈ F and t ∈ I. A special
case would be the set of functions f : Ck → X, where Ck is defined as in case (2). Such functions could
be called “oriented cells”.
(4) ∀f ∈ F, f : X → Y . Each figure is a map from X to some other set Y . In this case, there are two
natural operations Lg : F → F . The action could be defined either so that Lg (f )(x) = f (µ(g, x)) or
Lg (f )(x) = f (µ(g −1 , x)). The second choice has the effect that the function is moved in the direction
of the action of g on X because the points x are moved in the reverse direction. This has the advantage
that if both f and x are transformed by g ∈ G, then the value f (x) is unchanged.
(5) ∀f ∈ F, f : (X → Y) →˚ Z. Each figure f is a map from some space of functions from X to a set Y,
valued in a third set Z. An example is the set of continuous linear functionals f : C₀^∞(ℝⁿ) → ℝ
with X = ℝⁿ and Y = Z = ℝ. In this case, the figures are functions of functions. A double reversal
of the sense of the Lg action in case (4) yields the forward action defined by Lg(f)(φ) = f(L_{g⁻¹}(φ)) for
f ∈ F, φ : X → Y, where Lg(φ)(x) = φ(µ(g⁻¹, x)) for φ : X → Y and x ∈ X.
(6) ∀f ∈ F, f : I → ℙ(X). The figures are maps from an index set I with values which are subsets of X.
An example would be an infinite sequence of neighbourhoods of a point in a topological space X. The
natural operation Lg : F → F here is Lg (f )(t) = {µ(g, x); x ∈ f (t)} for f ∈ F and t ∈ I.
(7) F is the set of ordered paths in X. The figures are images C = Range(γ) of continuous curves γ : I → X,
together with a total order R on C. In other words, the map γ is discarded except for the ordering it
imposes on the set Range(γ) ⊆ X. In this case, Lg : F → F is defined so that Lg : (C, R) ↦ (C′, R′)
where C′ = Lg C is the translated set as in case (2), and R′ is defined as the obvious translation of R
by g. An obvious generalization of this is the case that F is a set of partially ordered subsets of the
set X.
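The two candidate operations in case (4) can be compared on a finite example. The following Python sketch assumes G = S3 acting on X = {0, 1, 2} by permutation and takes the figures to be all functions from X to a two-element set. The names L and pull are invented labels for the second and first choices respectively. The observation that only the second choice obeys the left associativity rule is a standard fact which is recorded here for illustration; it is not a claim made in the text.

```python
from itertools import permutations, product

G = list(permutations(range(3)))   # assumed example: G = S3 acting by g.x = g[x]
X = range(3)

def sigma(g1, g2):
    """Group operation: composition, apply g2 first and then g1."""
    return tuple(g1[g2[i]] for i in range(3))

def inv(g):
    """Inverse permutation."""
    out = [0] * 3
    for i, gi in enumerate(g):
        out[gi] = i
    return tuple(out)

# A figure is a function f : X -> {'a', 'b'}, stored as a tuple with f[x] the value at x.
F = list(product("ab", repeat=3))

def L(g, f):
    """Second choice in case (4): (L_g f)(x) = f(g^-1 x)."""
    g_inv = inv(g)
    return tuple(f[g_inv[x]] for x in X)

def pull(g, f):
    """First choice in case (4): (L_g f)(x) = f(g x)."""
    return tuple(f[g[x]] for x in X)

# If both f and x are transformed by g, the value f(x) is unchanged.
value_unchanged = all(L(g, f)[g[x]] == f[x] for g in G for f in F for x in X)

# Left rule L_{g1 g2} = L_{g1} L_{g2}, tested for each choice.
L_is_left = all(L(sigma(g1, g2), f) == L(g1, L(g2, f))
                for g1 in G for g2 in G for f in F)
pull_is_left = all(pull(sigma(g1, g2), f) == pull(g1, pull(g2, f))
                   for g1 in G for g2 in G for f in F)
```

Here value_unchanged and L_is_left are True, whereas pull_is_left is False because the first choice composes in the reverse order.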
[ Also mention tensors and alternating tensors etc. in this list? ]
There is clearly a very broad class of objects which could be regarded as “figures” for a set X. In each case,
there is no well-defined rule for generating the map Lg from the figure space F , although it is ‘obvious’ which
map to choose in each case, more or less. The point made here is that the concepts of figures and invariants
seem to be some sort of meta-concept which cannot be codified easily.
Suppose now that a figure space F1 has been chosen for a transformation group (G, X, σ, µ), and a transfor-
mation operator Lg : F1 → F1 for each g ∈ G. Then any function θ : F1 → F2 for any figure space F2 for X
(with action operator also denoted as Lg ) is said to be an invariant of the group G if θ(Lg f ) = Lg θ(f ) for
all f ∈ F1 and g ∈ G. (This is equivalent to requiring θ to be an equivariant map from (G, F1 ) to (G, F2 ).
See Definition 9.4.13 for equivariance.)
[ Maybe in the case that θ(Lg f ) = Lg θ(f ), the function θ should be called covariant, and only invariant
if θ(Lg f ) = θ(f ). ]
Typical examples of invariant functions θ would be the length of a curve, the centroid of a finite set of vectors,
the maximum of a real-valued function, the angle between two vectors, and the mass of a Radon measure.
The θ functions may be boolean-valued attribute functions such as the collinearity of a set of points. The
attribute value set F2 may not be modified at all by the Lg operations. But in the case of linear combinations,
for instance, the linear combination of a sequence of vectors is not invariant under transformations of the

base space X, but the linear combination is transformed in the same way as the sequence of vectors. Thus
linear combination operations are invariants of the group of linear transformations.
9.7.1 Remark: An important example of a group invariant is the operation of linear combination on
linear spaces such as the tangent vector spaces at points of a differentiable manifold. Consider the linear
space $\mathbb{R}^n$. For any sequence $\lambda = (\lambda_i)_{i=1}^m \in \mathbb{R}^m$, define the linear combination operator
$S_\lambda : (\mathbb{R}^n)^m \to \mathbb{R}^n$ by $S_\lambda : (v_i)_{i=1}^m \mapsto \sum_{i=1}^m \lambda_i v_i$. This maps m-tuples of vectors in $\mathbb{R}^n$ to a linear
combination of those vectors. Then $S_\lambda(L_g v) = g S_\lambda(v)$ for any $g \in GL(n)$ and $v = (v_i)_{i=1}^m \in (\mathbb{R}^n)^m$,
where $L_g : (\mathbb{R}^n)^m \to (\mathbb{R}^n)^m$ is defined so that $L_g : (v_i)_{i=1}^m \mapsto (g v_i)_{i=1}^m$. So every linear combination
operator $S_\lambda$ is an invariant (or covariant?) function of the set of m-frames in $\mathbb{R}^n$.
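The identity S_λ(L_g v) = g S_λ(v) can be checked numerically. The following Python sketch uses small integer data chosen only for illustration (n = 2, m = 3). The function names S, apply and L and the particular values of g, lam and v are assumptions of this example.

```python
def S(lam, v):
    """Linear combination operator S_lambda applied to an m-tuple v of vectors in R^n."""
    n = len(v[0])
    return tuple(sum(l * vi[k] for l, vi in zip(lam, v)) for k in range(n))

def apply(g, w):
    """Matrix g (a list of rows) applied to the vector w."""
    return tuple(sum(row[k] * w[k] for k in range(len(w))) for row in g)

def L(g, v):
    """L_g applies g to each vector of the m-tuple v."""
    return tuple(apply(g, vi) for vi in v)

# Illustrative values, not taken from the text.
g = [[2, 1], [1, 1]]             # determinant 1, so g is in GL(2)
lam = (2, -1, 5)
v = ((1, 0), (0, 1), (3, -2))

lhs = S(lam, L(g, v))            # S_lambda(L_g v)
rhs = apply(g, S(lam, v))        # g S_lambda(v)
```

Both lhs and rhs equal (23, 6) for these values.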

When defining a connection on a differentiable fibre bundle in Chapters 35 and 36, the connection will be
required to preserve these invariants. This will play a role in the transition from connections on ordinary
fibre bundles to connections on principal fibre bundles. More specifically, the invariance of a connection on
a principal fibre bundle under group action on the right will arise from the invariance of linear combinations
of n-frames under linear transformations.
9.8. Rings and fields

This section presents the basic definitions of rings and fields.
9.8.1 Definition: A ring is a tuple R −< (R, σ, τ) such that
(i) (R, σ) is a commutative group (written additively),
(ii) (R, τ ) is a semigroup (written multiplicatively),
(iii) ∀a, b, c ∈ R, a(b + c) = ab + ac and (a + b)c = ac + bc.
9.8.2 Definition: A unitary ring is a ring (R, σ, τ ) such that

(i) ∃e ∈ R, ∀a ∈ R, (ea = a and ae = a).
The element e is called the unity, unity element, identity element or unit element of the ring.
The unit element may be denoted as 1 or 1R .
A unitary ring is also called a ring with unity.
9.8.3 Remark: The unity in Definition 9.8.2 is unique.
9.8.4 Definition: A zero ring is a ring (R, σ, τ ) such that R has only one element. Thus R = {0}.
9.8.5 Definition: A commutative ring is a ring (R, σ, τ ) such that

(i) ∀a, b ∈ R, ab = ba.
9.8.6 Remark: A commutative ring with unity is also called a commutative unitary ring.
9.8.7 Definition: An ideal of a ring R is an additive subgroup K of R such that

(i) ∀x ∈ R, ∀k ∈ K, (kx ∈ K and xk ∈ K).
[ Check whether the word “ideal” is derived from the term “ideal numbers”. See Bell [191], page 474. ]
9.8.8 Definition: A field is a tuple K −< (K, σ, τ) such that
(i) (K, σ) is a commutative group (written additively);
(ii) (K, τ) is a commutative semigroup (written multiplicatively);
(iii) (K′, τ|_{K′×K′}) is a group, where K′ = K \ {0K};
(iv) ∀a, b, c ∈ K, a(b + c) = ab + ac and (a + b)c = ac + bc.
9.8.9 Remark: A field is the same thing as a commutative unitary ring (K, σ, τ ) which is not a zero ring,
with the additional condition:
(i) ∀a ∈ K \ {0K }, ∃b ∈ K, ab = ba = 1K .
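The additional condition in Remark 9.8.9 is easy to test for the rings of integers modulo n. The following Python sketch takes for granted the standard fact that each such ring is a commutative unitary ring, so that only the zero-ring exclusion and the existence of inverses need to be checked. The function name is_field_mod is invented for this example.

```python
def is_field_mod(n):
    """Test Remark 9.8.9 for the integers modulo n.

    The ring must not be a zero ring, and every nonzero a must have some b with ab = 1.
    """
    if n < 2:
        return False  # the integers modulo 1 form a zero ring
    return all(any(a * b % n == 1 for b in range(n)) for a in range(1, n))

z5_is_field = is_field_mod(5)
z6_is_field = is_field_mod(6)
```

The integers modulo 5 pass the test, while the integers modulo 6 fail because 2 has no multiplicative inverse.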

9.8.10 Remark: According to Bell [191], page 355, writing in 1937, a field was sometimes called a “corpus”
in English to correspond to the German “Körper” and the French “corps”. The choice of the word “field”
in English is unfortunate because it clashes badly with the concept of a field in mathematical physics as a
function of space or space-time.
[ Must also define ordered fields. For ordered fields, see EDM2 [35], 149.N, page 581; Curtis [101], page 4. See
also an application of ordered fields in Remark 12.2.15. ]
9.9. Modules
The following table summarizes some basic algebraic systems and their operations and maps. When there
are two sets in one of these systems, there is a map whose domain is the cross product of the two sets and
the range is one or other of the two sets. Here the range set is called the “passive set” and the other set is
called the “active set”. In all cases in the following table, the active set is the one on the left, but obviously
the order is somewhat arbitrary.
system name                 active  passive  active set        passive set       action map
                            set     set      operation         operation
group, semigroup            G                σ : G × G → G
ring                        R                σ : R × R → R
                                             τ : R × R → R
field                       K                σ : K × K → K
                                             τ : K × K → K
transformation group        G       X        σ : G × G → G                       µ : G × X → X
left A-module               A       M                          σ : M × M → M     µ : A × M → M
left module over a group    G       M        σG : G × G → G    σM : M × M → M    µ : G × M → M
(unitary) left module       R       M        σR : R × R → R    σM : M × M → M    µ : R × M → M
  over a ring                                τR : R × R → R
linear space                K       V        σK : K × K → K    σV : V × V → V    µ : K × V → V
                                             τK : K × K → K
associative algebra,        K       A        σK : K × K → K    σA : A × A → A    µ : K × A → A
  Lie algebra                                τK : K × K → K    τA : A × A → A
It sometimes seems to analysts that algebraists throw together random sets of conditions for algebraic systems
and then study and classify them as an intellectual recreation. Nevertheless, it is useful to give here the
names and basic properties of some combinations of conditions because they arise naturally in differential
geometry. Probably the most useful systems are transformation groups, linear spaces and Lie algebras.
On the subject of algebraic systems, Bell [190], page 213, says: “There are 4,096 (perhaps more) possible
generalizations of a field. To develop them all without some definite object in view would be slightly silly.
Only those that experience has suggested have been worked out in any detail. The rest will keep till they
are needed; the apparatus for developing them is available.”
A left A-module is an algebraic system with two sets and two operations. This is similar to a transformation
group, but in this case the active set is unstructured rather than the passive (acted-upon) set.
The remaining algebraic systems in this section and Sections 9.10 and 9.11 have two sets with three or more
operations. Both sets have one or more operations, and there is an action operation of one set on the other.
9.9.1 Remark: A mnemonic for the symbols is that σ means addition (“sums”), τ means multiplication
(“times”) and µ means a map. The σ symbol is hopefully reminiscent of the ◦ “operation” symbol.
9.9.2 Remark: Linear spaces (also called “vector spaces”) are deferred until Chapter 10. Linear spaces
are presented in much greater detail than other kinds of algebras because of their very great importance for
differential geometry.

9.9.3 Remark: There is such a thing as a module without an operator domain. (For example, see
EDM2 [35], 277.B.) This is just a commutative group written additively. To such a module may be added
an operator domain which becomes the active set while the module is the passive set. The operator domain
may be an abstract set, a group, a ring or a field.
9.9.4 Definition: A module (without operator domain) is a commutative group, whose operation is written
additively.
9.9.5 Definition: The module of homomorphisms from a module M1 to a module M2 is the group
(Hom(M1 , M2 ), σ), where Hom(M1 , M2 ) is the set of group homomorphisms from M1 to M2 , and σ is
the operation of pointwise addition on Hom(M1 , M2 ).
9.9.6 Remark: The module of homomorphisms in Definition 9.9.5 is a module because it is a group whose
operation is commutative because the operation of M2 is commutative. Thus (f + g)(x) = f (x) + g(x) =
g(x) + f (x) = (g + f )(x) for all f, g ∈ Hom(M1 , M2 ) and x ∈ M1 .
9.9.7 Definition: The ring of endomorphisms of a module M is the ring (End(M ), σ, τ ), where End(M )
is the set of group endomorphisms of M, σ is the operation of pointwise addition on End(M), and τ is the
composition operation of End(M ).
9.9.8 Definition: A (left) A-module is a tuple M −< (A, M, σM, µ) such that
(i) M −< (M, σM) is a commutative group (written additively);
(ii) µ : A × M → M (written multiplicatively) satisfies ∀a ∈ A, ∀x, y ∈ M, a(x + y) = ax + ay.
A is said to be an operator domain of the module M .
M is also said to be a module with operator domain A or a module over (the set) A.
9.9.9 Remark: Since a.0 = a(0 + 0) = a.0 + a.0 in Definition 9.9.8, it follows that a.0 = 0 for all a ∈ A.
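A small finite instance of Definition 9.8 8 and Remark 9.9.9 may help to fix ideas. The following Python sketch assumes M is the additive group of integers modulo 6 and takes the operator domain A = {2, 3, 5} as an unstructured set acting by multiplication modulo 6. These choices and the names add and mu are made only for this illustration.

```python
M = range(6)      # assumed example: the additive group of integers modulo 6
A = [2, 3, 5]     # an unstructured operator domain, chosen arbitrarily

def add(x, y):
    """Group operation of M, written additively."""
    return (x + y) % 6

def mu(a, x):
    """Action of the operator a on the module element x."""
    return (a * x) % 6

# Condition (ii) of Definition 9.9.8: a(x + y) = ax + ay.
distributive = all(mu(a, add(x, y)) == add(mu(a, x), mu(a, y))
                   for a in A for x in M for y in M)
# Remark 9.9.9: a.0 = 0 for all a in A.
zero_fixed = all(mu(a, 0) == 0 for a in A)
```

Both flags are True here. Note that A carries no operation of its own in this example, which is the sense in which the active set is unstructured.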
9.9.10 Definition: An A-homomorphism between modules M1 and M2 over a set A is a map f : M1 → M2

such that
(i) f is a group homomorphism from M1 to M2 ;
(ii) ∀a ∈ A, ∀x ∈ M1 , f (ax) = af (x).
An A-homomorphism between modules is also called an operator homomorphism or allowed homomorphism.
An A-endomorphism of a module M over a set A is an A-homomorphism from M to M .
An A-automorphism of a module M over a set A is an invertible A-endomorphism of M .
9.9.11 Remark: Definition 9.9.10 and Notation 9.9.12 apply not only to modules over abstract sets, but
also to modules over groups, rings and fields.
9.9.12 Notation: HomA (M1 , M2 ) denotes the set of all A-homomorphisms from module M1 to module
M2 over the same set A.
EndA (M ) denotes the set of all A-endomorphisms of a module M over a set A.
AutA (M ) denotes the set of all A-automorphisms of a module M over a set A.
GL(M ) is an alternative notation for AutA (M ).
9.9.13 Remark: The subscript A in Notation 9.9.12 is superfluous because Definition 9.9.8 requires the
set A to be part of the specification of a module.
9.9.14 Definition: The module of A-homomorphisms from a module M1 to a module M2 over the same
set A is the group (HomA (M1 , M2 ), σ), where HomA (M1 , M2 ) is the set of A-homomorphisms from M1 to
M2 and σ is the operation of pointwise addition on HomA (M1 , M2 ).
9.9.15 Definition: The ring of A-endomorphisms of an A-module M is the ring (EndA(M), σ, τ), where
EndA(M) is the set of A-endomorphisms of M, σ is the operation of pointwise addition on EndA(M), and τ
is the composition operation of EndA(M).

9.9.16 Remark: The tuple (EndA(M), σ, τ) in Definition 9.9.15 is a ring. As in Remark 9.9.6,
the tuple (EndA(M), σ) is a commutative group. The pair (EndA(M), τ) is a semigroup by the
associativity of composition in general. Distributivity follows pointwise: (g + h) ∘ f = g ∘ f + h ∘ f by the
definition of pointwise addition, and f ∘ (g + h) = f ∘ g + f ∘ h because f is a group homomorphism.
[ Maybe define a (left) module over a semigroup here? Then it should be possible to say that a left module
over a ring is a left module over a semigroup. ]
9.9.17 Definition: A (left) module over a group G is a tuple M −< (G, M, σG, σM, µ) such that
(i) G −< (G, σG) is a group (written multiplicatively);
(ii) M −< (M, σM) is a commutative group (written additively);
(iii) µ : G × M → M (written multiplicatively) satisfies ∀a ∈ G, ∀x, y ∈ M, a(x + y) = ax + ay;
(iv) ∀a, b ∈ G, ∀x ∈ M, (ab)x = a(bx);
(v) ∀x ∈ M, 1G x = x.
9.9.18 Remark: A left module over a group G is the same thing as a left G-module M such that G is a
group and the additional conditions (iv) and (v) of Definition 9.9.17 hold.
9.9.19 Definition: A (left) module over a ring R is a tuple M −< (R, M, σR, τR, σM, µ) such that
(i) R −< (R, σR, τR) is a ring;
(ii) M −< (M, σM) is a commutative group (written additively);
(iii) µ : R × M → M (written multiplicatively) satisfies ∀a ∈ R, ∀x, y ∈ M, a(x + y) = ax + ay;
(iv) ∀a, b ∈ R, ∀x ∈ M, (a + b)x = ax + bx;
(v) ∀a, b ∈ R, ∀x ∈ M, (ab)x = a(bx).
9.9.20 Remark: A left module over a ring R is the same thing as a left R-module M such that R is a ring
and conditions (iv) and (v) of Definition 9.9.19 are satisfied.
Definition 9.9.19 is not a specialization or extension of Definition 9.9.17. Condition (v) of Definition 9.9.19
applies to the multiplication operation of the ring R, which is not a group operation, whereas condition (iv)
of Definition 9.9.17 refers to a group operation which just happens to be written multiplicatively. Even if
the latter condition were written additively, it would mean that (a + b)x = a(bx), which is incompatible
with Definition 9.9.19. Therefore a left module over a ring cannot be a left module over a group. This is
illustrated by the forking of the tree in Figure 9.9.1 between modules over rings and modules over groups.
9.9.21 Definition: A unitary (left) module over a ring R is a left module M −< (R, M, σR, τR, σM, µ) over
a ring R such that
(i) R −< (R, σR, τR) is a unitary ring;
(ii) ∀x ∈ M, 1R x = x, where 1R is the unity of R.
9.9.22 Remark: Hartley/Hawkes [116], definition 5.1, page 70, use Definition 9.9.21 for left modules over
rings, but they comment that some authors use Definition 9.9.19 instead. However, it seems safest in a
general context to always state the unitary requirement explicitly.
9.9.23 Remark: If the ring R of a unitary left module M over R happens to be a field, then M is a linear
space over R. (See Definition 10.1.2.) So a linear space is the same thing as a unitary left module over a
field.
9.9.24 Remark: The family tree in Figure 9.9.1 summarizes the relations between the algebraic structures
in this section and Sections 9.10 and 9.11.
9.9.25 Definition: An R-linear map between modules M1 and M2 over a ring R is an R-homomorphism
from M1 to M2 .

[Figure: tree diagram whose nodes are the following structures and specification tuples.]
left A-module (A, M, σM, µ); transformation group (G, X, σG, µ)
left module over ring (R, M, σR, τR, σM, µ); left module over group (G, M, σG, σM, µ)
unitary left module over ring (R, M, σR, τR, σM, µ)
associative algebra (R, A, σR, τR, σA, τA, µ); Lie algebra (R, A, σR, τR, σA, τA, µ); linear space (K, V, σK, τK, σV, µ)
real Lie algebra (ℝ, A, σℝ, τℝ, σA, τA, µ)
Figure 9.9.1 Family tree of modules and algebras
9.9.26 Definition: The A-module of A-homomorphisms from a module M1 −< (A, M1, σ1, µ1) to a module
M2 −< (A, M2, σ2, µ2) over the same commutative ring A is the A-module (A, M, σ, µ), where M =
HomA(M1, M2) is the set of A-homomorphisms from M1 to M2, σ is the operation of pointwise addition
on HomA(M1, M2), and µ : A × M → M is the action map defined as the pointwise product µ(a, f)(x) = af(x)
for all a ∈ A, f ∈ HomA(M1, M2) and x ∈ M1.
9.9.27 Definition: The A-module of A-endomorphisms of a module M over a commutative ring A is the
A-module of A-homomorphisms from M to M .
[ Here present the tensor product of two modules over a ring as in EDM2 [35], section 277.J. This definition
is very similar indeed to the way tensor products of vector spaces are defined. Should also try to generalize
this to modules over a group, which is the relevant situation for defining associated fibre bundles. ]
[ Mention “balanced maps” and the relation to multilinear maps? Also do balanced maps for product-quotients
of groups? See EDM2 [35], section 277.J. ]
9.10. Associative algebras
Sections 9.10 and 9.11 define associative algebras and Lie algebras. (Tensor algebras are presented in Chapter 13.)
Algebras are distinguished from modules by having five operations: an addition and a multiplication operation
for each of the active and passive sets, and an action operation.
9.10.1 Definition: An associative algebra over R, for a commutative unitary ring R −< (R, σR, τR), is a
tuple A −< (R, A, σR, τR, σA, τA, µ) such that:
(i) A −< (R, A, σR, τR, σA, µ) is a unitary left module over the ring R;
(ii) A −< (A, σA, τA) is a ring;
(iii) ∀λ ∈ R, ∀a, b ∈ A, λ(ab) = (λa)b = a(λb).
9.10.2 Example: The R-module EndR(M) of R-endomorphisms of an R-module M for a commutative
ring R, given in Definition 9.9.27, is a non-trivial example of an associative algebra. These associative
algebras are converted into a useful class of Lie algebras by Theorem 9.11.9 and Definition 9.11.10.
9.10.3 Theorem: The R-module of R-endomorphisms of a module over a commutative unitary ring R is
an associative algebra over R.
Proof: Condition 9.10.1 (i) follows from Definition 9.9.27. Condition (ii) follows from Remark 9.9.16.
Condition (iii) follows pointwise from commutativity of the product operation of R.
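The pointwise calculation for condition (iii) can be sketched as follows, in LaTeX, assuming that the product of EndR(M) is composition and that the scalar product is pointwise, as in Definitions 9.9.15 and 9.9.26. The symbols f, g, λ, a and x are chosen here for illustration only.

```latex
% f, g in End_R(M); lambda, a in R; x in M.
\begin{align*}
(\lambda(f\circ g))(x)   &= \lambda\, f(g(x)), \\
((\lambda f)\circ g)(x)  &= (\lambda f)(g(x)) = \lambda\, f(g(x)), \\
(f\circ(\lambda g))(x)   &= f(\lambda\, g(x)) = \lambda\, f(g(x))
   && \text{($f$ is an $R$-homomorphism)}. \\
\intertext{Commutativity of $R$ ensures that $\lambda f$ is again an $R$-endomorphism:}
(\lambda f)(ax) &= \lambda(a f(x)) = (\lambda a) f(x) = (a\lambda) f(x) = a\,(\lambda f)(x).
\end{align*}
```

The three expressions in the first group agree for every x, which is condition (iii) of Definition 9.10.1 for this algebra.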
9.10.4 Remark: The K-module of K-automorphisms of a K-module is an associative sub-algebra of the
K-module of K-endomorphisms. This may be regarded as a “general linear” algebra, as suggested by
Notation 9.10.5.

9.10.5 Notation: GL(V) denotes the associative algebra of K-automorphisms of a K-module V together
with the operations of pointwise addition, composition product, and pointwise scalar product. In other
words, GL(V) denotes the associative algebra (K, A, σK, τK, σA, τA, µ), where A = AutK(V), σA is pointwise
addition on A, τA is the product by composition on A, and µ is the pointwise scalar product of K on A.
9.10.6 Remark: There is a succinct summary of the general concept of representations in EDM2 [35],
section 362.A: “For a mathematical system A, a map from A to a similar (but in general ‘more concrete’)
system preserving the structure of A is called a representation of A.” Thus general representations are
defined in terms of subjective concepts such as “similar” and “concrete”. In differential geometry (and
physics), representations of groups and algebras are particularly useful. Representations of associative and
Lie algebras are defined in Definitions 9.10.7 and 9.11.17 respectively.
[ See EDM2 [35], 362.C, for further details on Definition 9.10.7 regarding unitary homomorphisms. ]
9.10.7 Definition: A linear representation of an associative algebra A over a commutative unitary ring
K is an associative algebra homomorphism ρ : A → EndK (M ) for some K-module M .
9.11. Lie algebras

Although associative algebras and Lie algebras have the same number and general form of operations,
the passive set A for an associative algebra is a ring whereas the passive set for a Lie algebra has an
anticommutative product operation τA which satisfies a Jacobi identity instead of an associativity rule.
9.11.1 Definition: A Lie algebra is a tuple A −< (R, A, σR, τR, σA, τA, µ) such that:
(i) A −< (R, A, σR, τR, σA, µ) is a unitary left module over the ring R;
(ii) R −< (R, σR, τR) is a commutative ring with unity;
(iii) τA : A × A → A, notated as the bracket [·, ·], satisfies:
$$\forall m, n \in \mathbb{Z}^+_0,\ \forall \alpha \in R^m,\ \forall \beta \in R^n,\ \forall X \in A^m,\ \forall Y \in A^n,\quad
\Bigl[\sum_{i=1}^m \alpha_i X_i,\ \sum_{j=1}^n \beta_j Y_j\Bigr] = \sum_{i,j=1}^{m,n} \alpha_i \beta_j [X_i, Y_j];$$
(iv) ∀X ∈ A, [X, X] = 0;
(v) ∀X, Y, Z ∈ A, [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y ]] = 0. (Jacobi identity.)
[ Can condition (iii) of Definition 9.11.1 be derived by induction from a simpler rule? ]
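9.11.2 Remark: The anticommutativity of [·, ·] in Definition 9.11.1 follows from conditions (iii) and (iv) since 0 =
[X + Y, X + Y] = [X, X] + [X, Y] + [Y, X] + [Y, Y] = [X, Y] + [Y, X]. Conversely, anticommutativity gives
[X, X] = −[X, X], so 2[X, X] = 0, which implies condition (iv) whenever 2x = 0 implies x = 0 in A (for example,
when 2 is invertible in R).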
9.11.3 Remark: The Jacobi identity is equivalent to [X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]], which resembles
a distributive rule. It is also equivalent to [X, [Y, Z]] − [[X, Y], Z] = [Y, [X, Z]], which resembles an associativity
(or “non-associativity”) rule. It is probably best to think of the Jacobi identity as a replacement for
the associativity rule rather than for the distributive rule.
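The rearrangement of the Jacobi identity into a product-rule form uses only anticommutativity of the bracket. A sketch of the steps in LaTeX:

```latex
\begin{align*}
[X,[Y,Z]] &= -[Y,[Z,X]] - [Z,[X,Y]]
   && \text{(Jacobi identity)} \\
          &= [Y,[X,Z]] + [[X,Y],Z]
   && \text{(since $[Z,X] = -[X,Z]$ and $[Z,W] = -[W,Z]$ with $W=[X,Y]$).}
\end{align*}
```

Each step is reversible, so the rearranged form [X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]] is equivalent to the Jacobi identity when the bracket is anticommutative.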
9.11.4 Remark: If the ring R in Definition 9.11.1 is a field, then A − < (R, A, σR , τR , σA , µ) is a linear space.
So a Lie algebra over a field may be thought of as a linear space A to which a vector product τA : A × A → A
has been added.
9.11.5 Definition: A Lie subalgebra of a Lie algebra A −< (R, A, σR, τR, σA, τA, µ) is a Lie algebra
A′ −< (R, A′, σR, τR, σA′, τA′, µ′) such that A′ ⊆ A, σA′ ⊆ σA, τA′ ⊆ τA and µ′ ⊆ µ.
9.11.6 Definition: A Lie algebra homomorphism from a Lie algebra A1 −< (R, A1, σR, τR, σA1, τA1, µ1) to
a Lie algebra A2 −< (R, A2, σR, τR, σA2, τA2, µ2) is a map f : A1 → A2 such that
(i) ∀λ1 , λ2 ∈ R, ∀a, b ∈ A1 , f (λ1 a + λ2 b) = λ1 f (a) + λ2 f (b); (f is R-linear)
(ii) ∀a, b ∈ A1 , f ([a, b]) = [f (a), f (b)].

9.11.7 Remark: The Lie algebra homomorphism in Definition 9.11.6 is the same thing as the R-homomorphism
between modules in Definition 9.9.10 together with the extra condition 9.11.6 (ii).
9.11.8 Definition: A Lie algebra isomorphism from a Lie algebra A1 to a Lie algebra A2 is a bijective
Lie algebra homomorphism φ : A1 → A2 .
A Lie algebra endomorphism of a Lie algebra A is a Lie algebra homomorphism φ : A → A.
A Lie algebra automorphism of a Lie algebra A is a Lie algebra isomorphism φ : A → A.
A Lie algebra monomorphism from a Lie algebra A1 to a Lie algebra A2 is an injective Lie algebra
homomorphism φ : A1 → A2.
A Lie algebra epimorphism from a Lie algebra A1 to a Lie algebra A2 is a surjective Lie algebra
homomorphism φ : A1 → A2.
9.11.9 Theorem: Let A −< (R, A, σR, τR, σA, τA, µ) be an associative algebra. Define τA′ : A × A → A by
τA′ : (X, Y) ↦ τA(X, Y) − τA(Y, X). Then (R, A, σR, τR, σA, τA′, µ) is a Lie algebra.
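Proof: From Definition 9.10.1, it follows that
$$\begin{aligned}
\Bigl[\sum_{i=1}^m \alpha_i X_i,\ \sum_{j=1}^n \beta_j Y_j\Bigr]
&= \tau_A'\Bigl(\sum_{i=1}^m \alpha_i X_i,\ \sum_{j=1}^n \beta_j Y_j\Bigr) \\
&= \tau_A\Bigl(\sum_{i=1}^m \alpha_i X_i,\ \sum_{j=1}^n \beta_j Y_j\Bigr)
 - \tau_A\Bigl(\sum_{j=1}^n \beta_j Y_j,\ \sum_{i=1}^m \alpha_i X_i\Bigr) \\
&= \sum_{i=1}^m \sum_{j=1}^n \alpha_i \beta_j \tau_A(X_i, Y_j)
 - \sum_{j=1}^n \sum_{i=1}^m \alpha_i \beta_j \tau_A(Y_j, X_i) \\
&= \sum_{i,j=1}^{m,n} \alpha_i \beta_j [X_i, Y_j],
\end{aligned}$$
which verifies Definition 9.11.1 (iii). Condition (iv) of Definition 9.11.1 is obvious. The Jacobi condition (v)
follows from the kind of boring calculation that is usually set as an exercise.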
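For readers who want to see the exercise, the calculation can be sketched as follows in LaTeX. The product τA(X, Y) is abbreviated to XY here (an abbreviation adopted only for this sketch), and the steps use the associativity and distributivity of τA.

```latex
% Bracket: [X,Y] = XY - YX, where XY abbreviates \tau_A(X,Y).
\begin{align*}
[X,[Y,Z]] &= X(YZ - ZY) - (YZ - ZY)X = XYZ - XZY - YZX + ZYX, \\
[Y,[Z,X]] &= Y(ZX - XZ) - (ZX - XZ)Y = YZX - YXZ - ZXY + XZY, \\
[Z,[X,Y]] &= Z(XY - YX) - (XY - YX)Z = ZXY - ZYX - XYZ + YXZ.
\end{align*}
% Adding the three right-hand sides, each of the six products
% XYZ, XZY, YZX, ZYX, YXZ, ZXY occurs once with each sign, so the sum is 0.
```

Each of the six triple products appears exactly once with a plus sign and once with a minus sign, so the three brackets sum to zero.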
9.11.10 Definition: The Lie algebra associated with an associative algebra A −< (R, A, σR, τR, σA, τA, µ)
is the Lie algebra (R, A, σR, τR, σA, τA′, µ), where τA′ : A × A → A is defined by τA′ : (X, Y) ↦ τA(X, Y) −
τA(Y, X).
9.11.11 Definition: A real Lie algebra is a Lie algebra whose commutative ring is ℝ.
9.11.12 Remark: The algebra X^∞(M) of C^∞ vector fields in a C^∞ manifold M with the Poisson bracket
as the product operation is a real Lie algebra. This is stated more precisely in Theorem 32.2.11. (See
Definition 32.2.10 for the Poisson bracket.)
9.11.13 Remark: Let V −< (K, V, σK, τK, σV, µ^V_K) be a K-module for a commutative unitary ring K. As
noted in Definition 9.9.15 and Remark 9.9.16 and Definition 9.9.27, the set EndK (V ) of K-endomorphisms of
V is a K-module if it is given the operations of pointwise addition and composition product. As noted further
in Theorem 9.10.3, EndK (V ) is an associative algebra over K. Since the set AutK (V ) of K-automorphisms
(the K-linear maps from V to V ) is a closed K-module of EndK (V ), it follows that AutK (V ) is also an
associative algebra over K. By Theorem 9.11.9 and Definition 9.11.10, the associative algebra AutK (V ) can
be converted into a Lie algebra by replacing its composition product with the commutator of that product.
This is the Lie algebra in Notation 9.11.14.
9.11.14 Notation: gl(V ) for a K-module V denotes the Lie algebra associated with the associative algebra
AutK (V ) of K-automorphisms of V .
9.11.15 Example: A useful example of a Lie algebra gl(V) is the case that V is the set ℝ^n for some n ∈ ℤ⁺,
together with the usual linear space structure for ℝ^n. Then the elements of Autℝ(ℝ^n) are the invertible
linear transformations of ℝ^n. With respect to a basis for ℝ^n, these correspond to invertible n × n matrices.
Therefore gl(V) corresponds to the set of invertible n × n matrices together with the operations of matrix
addition and the commutator product.
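As a small illustration of the commutator product for n = 2, the following LaTeX sketch computes the bracket of two invertible matrices. The matrices X and Y are chosen here purely as an example.

```latex
\[
X = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
Y = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad
XY = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
YX = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix},
\]
\[
[X, Y] = XY - YX = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
```

The bracket is non-zero precisely because X and Y do not commute under matrix multiplication.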

9.11.16 Remark: A representation of a Lie algebra A is defined to be a Lie algebra homomorphism ρ from
A to a Lie algebra of the form gl(V ) as in Notation 9.11.14. The notation (ρ, V ) is sometimes used instead
of ρ, although the space V is, strictly speaking, superfluous.
9.11.17 Definition: A (linear) representation of a Lie algebra A −< (K, A, σK, τK, σA, τA, µ^A_K) over a
K-module V −< (K, V, σK, τK, σV, µ^V_K) is a Lie algebra homomorphism ρ : A → gl(V).
The space V is called the representation space of the representation.

9.11.18 Remark: If K = ℝ and V = ℝ^n for some n ∈ ℤ⁺ in Definition 9.11.17, then the Lie algebra A
is mapped by ρ to the space Autℝ(ℝ^n) of linear transformations of ℝ^n with the product defined by the
commutator of the composition of transformations. If ρ is injective, then ρ embeds A in a matrix algebra.
9.11.19 Definition: The adjoint representation of a Lie algebra A is the representation ρ : A → gl(A)
defined by
∀X, Y ∈ A, ρ(X)(Y ) = [X, Y ].
9.11.20 Notation: ad(X) for an element X of a Lie algebra A denotes the image ρ(X) of X under the
adjoint representation of A. Thus ad(X) : A → A is defined by ad(X) : Y ↦ [X, Y].
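That the map X ↦ ad(X) respects the bracket is a direct consequence of the Jacobi identity and anticommutativity. A sketch in LaTeX, where the bracket on the right-hand side is assumed to be the commutator of the composition product:

```latex
% Claim: ad([X,Y]) = ad(X) ad(Y) - ad(Y) ad(X), evaluated at an arbitrary Z in A.
\begin{align*}
\bigl(\mathrm{ad}(X)\,\mathrm{ad}(Y) - \mathrm{ad}(Y)\,\mathrm{ad}(X)\bigr)(Z)
  &= [X,[Y,Z]] - [Y,[X,Z]] \\
  &= [[X,Y],Z]
     && \text{(Jacobi identity and anticommutativity)} \\
  &= \mathrm{ad}([X,Y])(Z).
\end{align*}
```

The middle step is the Jacobi identity in the rearranged form [X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]].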
9.11.21 Remark: The set of all maps ad(X) for X in a Lie algebra A is closed under the operations of gl(A).
Therefore this set, together with the restricted operations of gl(A), is a subalgebra of gl(A). This is formalized in
Definition 9.11.22.
9.11.22 Definition: The adjoint Lie algebra of a Lie algebra A is the subalgebra {ad(X); X ∈ A} of gl(A).
9.11.23 Notation: ad(A) for a Lie algebra A denotes the adjoint Lie algebra {ad(X); X ∈ A} of A.
9.11.24 Remark: The French word “adjoint” comes from the verb “adjoindre” meaning to “associate”,
“attach” or “affix”. It corresponds to the English word “adjunct”, which means a thing which is joined,
subordinate, associated or auxiliary to something else. In mathematics, the sense of the word “adjoint” is
that it is associated or subsidiary to something else. In the case of adjoint representations and adjoint Lie
algebras, the English word “associated” is probably a suitable translation. Perhaps the word “adjunct” is
an even better translation.
[ What is the relation, if any, between adjoint representations and adjoint operators? ]
[ Define direct sums of Lie algebras. ]
[ Define associative algebras near here. E.g. cross product in ℝ³. Generalize to higher dimensions? Maybe
use alternating products of tensors for this? ]
9.12. List space for sets with algebraic structure

The operations available on a list space depend on the operations defined on the base set X. See Section 7.12
for list spaces for general sets.
[ Incorporate all of the material from fund.tex here on list spaces? Also include possibly some material from
the list spaces folder which was written for the symbolic algebra approach to network reliability calculations. ]
9.12.1 Definition: If X is a semigroup whose operation is written additively, the following operations are
defined for List(X) in addition to the operations in Definition 7.12.2:
(i) The projection functions πi : List(X) → X, defined by. . .
$$\pi_i(\ell) = \begin{cases} \ell_i & \text{if } i < \mathrm{length}(\ell) \\ 0_X & \text{if } i \ge \mathrm{length}(\ell). \end{cases}$$
(ii) The sum function ∑ : List(X) → X defined for ℓ ∈ List(X) by
$$\sum \ell = \sum_{i=0}^{\mathrm{length}(\ell)-1} \ell_i.$$
This is well defined because addition in X is associative. The sum is understood to be evaluated “left to right”. That
is, the sum is ℓ_0 + ℓ_1 + . . ..
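As a concrete illustration, the following LaTeX sketch evaluates both operations on one list. It assumes that X is the set of integers under addition (so that 0_X = 0) and that lists are indexed from 0; the list (3, 1, 4) is chosen here only as an example.

```latex
% Example list over X = \mathbb{Z} with addition, so 0_X = 0.
\[
\ell = (3, 1, 4), \qquad \mathrm{length}(\ell) = 3,
\]
\[
\pi_0(\ell) = 3, \qquad \pi_2(\ell) = 4, \qquad \pi_5(\ell) = 0_X = 0,
\]
\[
\sum \ell = \ell_0 + \ell_1 + \ell_2 = (3 + 1) + 4 = 8.
\]
```

The projection π5 returns the zero element because the index 5 is not less than the length of the list.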

9.12.2 Definition: If X is a ring, then the following additional operations are defined for List(X) in
addition to the operations for a semigroup in Definition 9.12.1:
(i) The product function from (X, List(X)) to List(X) defined by
(xℓ)_i = xℓ_i
for x ∈ X, ℓ ∈ List(X) and 0 ≤ i < length(ℓ).
[ There are plenty more operations for rings probably. And probably for groups and semigroups too. Must
have a separate definition for each class of base set. ]

Chapter 10
Linear algebra
10.1 Linear spaces
10.2 Linear subspaces and basis vectors
10.3 Linear maps
10.4 Eigenspaces of linear space endomorphisms
10.5 Linear functionals and dual spaces
10.6 Direct sums of linear spaces
10.7 Quotients of linear spaces
10.8 Inner products and norms
10.9 Groups of linear transformations
10.10 Free linear spaces
10.11 Exact sequences of linear maps
10.1. Linear spaces

10.1.1 Remark: The term “linear space” is used in Definition 10.1.2 instead of “vector space” to emphasize
the fact that the elements of a linear space are not necessarily vectors with end-points embedded in a larger
linear space. The terms “linear space” and “vector space” are often used interchangeably, but the term
“vector space” reflects the historical contexts of physics and geometry whereas the term “linear space” refers
to the abstract mathematical definition. The “vector space” of physics is better modelled by the affine spaces
which are defined in Chapter 12.
10.1.2 Definition: A linear space over a field K = (K, σK , τK ) is a tuple V − < (K, V, σK , τK , σV , µ)
such that (V, σV ) is a commutative group written additively, and the operation µ : K × V → V , written
multiplicatively, satisfies
(i) ∀λ ∈ K, ∀v1 , v2 ∈ V, λ(v1 + v2 ) = λv1 + λv2 ,
(ii) ∀λ1 , λ2 ∈ K, ∀v ∈ V, (λ1 + λ2 )v = λ1 v + λ2 v,
(iii) ∀λ1 , λ2 ∈ K, ∀v ∈ V, (λ1 λ2 )v = λ1 (λ2 v), and
(iv) ∀v ∈ V, 1K v = v.
The operation σV on V is called vector addition. The operation µ is called scalar multiplication.
10.1.3 Remark: A linear space over a field K is the same thing as a unitary left module V over the ring
structure of K. (See Definition 9.9.21 and Remark 9.9.23.)
10.1.4 Remark: In a linear space, 0K v = 0V for all v ∈ V because by Definition 10.1.2 (ii), 0K v =
(0K + 0K )v = 0K v + 0K v for all v ∈ V .
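The final step from 0K v = 0K v + 0K v to 0K v = 0V is a cancellation in the group (V, σV). A sketch in LaTeX, adding the additive inverse of 0K v to both sides:

```latex
\begin{align*}
0_V &= 0_K v + \bigl(-(0_K v)\bigr) \\
    &= (0_K v + 0_K v) + \bigl(-(0_K v)\bigr)
       && \text{(by $0_K v = 0_K v + 0_K v$)} \\
    &= 0_K v + \bigl(0_K v + (-(0_K v))\bigr)
       && \text{(associativity in $V$)} \\
    &= 0_K v + 0_V = 0_K v.
\end{align*}
```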
10.1.5 Remark: Condition 10.1.2 (iv) is independent of the other conditions. It follows from (iii) that
1K (λv) = λv for all λ ∈ K and v ∈ V . This does not imply that 1K v = v for all v ∈ V because the range
of µ might not be the entire space V . (We only know that Range(µ) is a subspace of V if condition (iv) is
omitted.)

If λv = 0V for all λ ∈ K and v ∈ V, all of the linear space conditions except (iv) are satisfied for any field
K and non-trivial commutative group (V, σV). In fact, for any valid linear space V, the operation µ may be
replaced by µ′ : (λ, v) ↦ λP(v), where P is an idempotent linear endomorphism. (I.e. P : V → V is linear
and P(P(v)) = P(v) for all v ∈ V.) Then µ′ satisfies all of the other conditions, but µ′ does not satisfy (iv)
unless P happens to be the identity map.
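A concrete instance may make this clearer. The following LaTeX sketch assumes V = K² with the usual componentwise operations and takes P to be the projection onto the first coordinate; both choices are made here only for illustration.

```latex
% V = K^2, P(x,y) = (x,0), so P(P(v)) = P(v), and mu'(lambda, v) = lambda P(v).
\begin{align*}
\mu'(\lambda, (x,y)) &= \lambda P(x,y) = (\lambda x, 0), \\
\mu'(\lambda_1, \mu'(\lambda_2, v)) &= \lambda_1 P(\lambda_2 P(v))
   = \lambda_1 \lambda_2 P(P(v)) = \lambda_1 \lambda_2 P(v) = \mu'(\lambda_1\lambda_2, v), \\
\mu'(1_K, (0,1)) &= (0,0) \neq (0,1).
\end{align*}
```

The second line verifies condition (iii), conditions (i) and (ii) follow from the linearity of P in the same way, and the third line shows that condition (iv) fails for the vector (0, 1).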
10.1.6 Remark: Vectors in linear spaces are often drawn as arrows as in Figure 10.1.1, which illustrates
the commutativity of vector addition.
[Figure: parallelogram diagram with the labels 0, v1, v2, v1 + v2 = v2 + v1, “copy of v1” and “copy of v2”.]
Figure 10.1.1 Commutativity of linear space addition
The sum of two vectors v1 and v2 is thought of as being constructed by translating a “copy” of v2 to the end
of the v1 arrow. Then the vector v1 + v2 is at the pointy end of the translated v2 arrow. This way of doing
vector addition is correct in affine spaces, which are defined in Chapter 12. This is not how vector addition
works in a linear space. The affine space picture is used for teaching linear spaces because it is convenient,
not because it is correct.
Vector addition in an affine space is illustrated in Figure 12.2.1. In an affine space, vectors are depicted as
arrows between points. This is because in an affine space, there are two different kinds of objects, namely
points and vectors. The points are represented as dots. The vectors are represented as arrows which start
at one point and terminate at another point. Therefore the vector labels are attached to the arrows. In
Figure 12.2.1, the points are labelled P , Q and R, while the vectors are labelled v, w and v + w.
In a linear space, there are only vectors. The vectors may be thought of as points, but there is only one kind
of object, namely vectors. Therefore it is best to depict the vectors as points in a linear space. The vectors
in a linear space are not “portable” or “mobile” as vectors in an affine space are. A vector in a linear space
is a fixed element of a set. In Figure 10.1.1, the vector labels 0, v1 , v2 and v1 + v2 are attached to the points
in the diagram which represent the four vectors.
The arrows joining the origin to each point are superfluous in a linear space diagram. There is no “portable
arrow” in a linear space. It is true that vector addition in a linear space follows the familiar parallel transport
rules which are used in diagrams such as Figure 10.1.1. However, this only works because we define parallel
transport in an affine space in terms of linear space vector addition.
In a linear space, the arrows between the points representing vectors are actually directed line segments.
For example, the “vector” from v1 to v1 + v2 is actually the curve γ : [0, 1] → V defined by γ(t) =
(1 − t)v1 + t(v1 + v2 ) for t ∈ [0, 1]. It happens that this is the same as γ(t) = v1 + tv2 for t ∈ [0, 1], which is
admittedly suggestive of the vector v2 being added to v1 . This “suggestion” is formalized in the definition of
affine spaces. Figure 10.1.2 illustrates line segment curves between vectors in terms of a parameter t ∈ [0, 1].
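The equality of the two expressions for γ(t) uses only the conditions of Definition 10.1.2. A sketch in LaTeX:

```latex
\begin{align*}
(1-t)v_1 + t(v_1 + v_2)
  &= (1-t)v_1 + t v_1 + t v_2  && \text{(condition (i))} \\
  &= \bigl((1-t) + t\bigr)v_1 + t v_2 && \text{(condition (ii))} \\
  &= 1\,v_1 + t v_2 = v_1 + t v_2 && \text{(condition (iv))}.
\end{align*}
```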
10.1.7 Remark: Linear spaces are defined in Weyl [51], section 2, page 15, in a classical fashion for K = ℝ.
Weyl makes the observation that the multiplicative rules for the rational subfield of the field ℝ follow from
the additive rules.
10.1.8 Remark: Bell [191], page 379, attributes the initial development of finite-dimensional spaces, as a
generalization of the familiar two and three dimensional spaces, to Cayley.
Another of the ideas originated by Cayley, that of the geometry of “higher space” (space of n
dimensions) is likewise of present scientific significance but of incomparably greater importance as
pure mathematics. Similarly for the theory of matrices, again an invention of Cayley’s.

[Figure: parallelogram diagram with the labels 0, v1, v2, v1 + v2, tv1, tv2, t(v1 + v2), v1 + tv2 and v2 + tv1.]
Figure 10.1.2 Directed line segments between vectors in a linear space
10.2. Linear subspaces and basis vectors

10.2.1 Definition: A subspace of a linear space V −< (K, V, σK, τK, σV, µV) is a linear space
W −< (K, W, σK, τK, σW, µW) such that W ⊆ V, σW = σV|W×W and µW = µV|K×W.
10.2.2 Remark: Definition 10.2.1 is equivalent to the conventional definition of a subspace of V as a subset
W which is closed under vector addition and scalar multiplication.
10.2.3 Theorem: The intersection of a set of subspaces of a given linear space is itself a subspace.
Proof: Let V −< (K, V, σK, τK, σV, µV) be a linear space, and let C be a set of subspaces of V. Define the
tuple (K, W, σK, τK, σW, µW) by W = ⋂C, σW = σV|W×W and µW = µV|K×W. Then Range(σW) ⊆ W
and Range(µW) ⊆ W because every subspace in C is closed under vector addition and scalar multiplication.
So W is a linear space. Therefore it is a subspace of V.
10.2.4 Definition: The subspace spanned by a subset S of a linear space V is the intersection of all
subspaces of V which contain S. In other words it is the subspace
⋂{U ⊆ V ; U is a subspace of V and S ⊆ U}.
A spanning set of a linear subspace W of a linear space V is a set S ⊆ W which spans W .
10.2.5 Definition: A finite-dimensional linear space is a linear space which has a finite spanning set.
An infinite-dimensional linear space is a linear space which is not finite-dimensional.
The dimension of a finite-dimensional linear space is the least cardinality of spanning sets for the space.
10.2.6 Notation: dim(V ) for a linear space V denotes the dimension of V .
10.2.7 Remark: The trivial subspace {0V } of a linear space V is spanned by the empty set. Therefore
dim({0V }) = 0.
The trivial subspace {0V} of a linear space V is a subspace of all subspaces of V. The space V is a subspace
of V.
This chapter is mostly concerned with finite-dimensional linear spaces.
10.2.8 Definition: A linear combination of a subset X of a linear space V over a field K is any element
v ∈ V which satisfies
$$\exists n \in \mathbb{Z}^+_0,\ \exists w \in X^n,\ \exists k \in K^n,\quad \sum_{i=1}^n k_i w_i = v.$$
10.2.9 Definition: A (finite) basis for a linear space V over a field K is a sequence (e_i)_{i=1}^n ∈ V^n for some
n ∈ ℤ⁺₀ such that
$$\forall v \in V,\ \exists' k \in K^n,\quad \sum_{i=1}^n k_i e_i = v.$$
In other words, for all v ∈ V, there is a unique n-tuple k ∈ K^n such that v = ∑_{i=1}^n k_i e_i.

10.2.10 Remark: Definition 10.2.9 requires the map φ : K^n → V which is defined by φ(k) = ∑_{i=1}^n k_i e_i
for k ∈ K^n to be a bijection. Although this map is clearly always well defined, it is not at all obvious that
every linear space will have a basis. Even if a basis does exist, it is not instantly obvious that the number
of elements n is the same for all bases. Such assertions require proof.
10.2.11 Theorem: If a linear space V is finite-dimensional, then V has a finite basis.
10.2.12 Theorem: Every basis of a finite-dimensional linear space has a number of elements equal to the
dimension of the space.
10.2.13 Remark: The inverse map φ⁻¹ may be thought of as a linear chart for V. Each basis for a linear
space yields a different linear chart. These linear charts are the “component maps” in Definition 10.2.14.
10.2.14 Definition: The component map for a basis (e_i)_{i=1}^n of an n-dimensional linear space V over a
field K for n ∈ ℤ⁺₀ is the map κ : V → K^n defined by ∀v ∈ V, v = ∑_{i=1}^n κ(v)_i e_i.
10.2.15 Remark: The component map κ in Definition 10.2.14 is well defined because Definition 10.2.9
requires the existence of a unique component n-tuple k ∈ K n for each v ∈ V .
10.2.16 Definition: The component tuple of a vector v with respect to a basis (e_i)_{i=1}^n for an n-dimensional
linear space V over a field K for n ∈ ℤ⁺₀ is the tuple κ(v) ∈ K^n, where κ is the component map in
Definition 10.2.14.
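A small worked example of a component tuple, sketched in LaTeX. The space ℝ² with componentwise operations, the basis (e_1, e_2) and the vector v are all chosen here only for illustration.

```latex
% Basis e_1 = (1,1), e_2 = (1,-1) of R^2; vector v = (3,1).
\[
k_1 e_1 + k_2 e_2 = (k_1 + k_2,\ k_1 - k_2) = (3, 1)
\iff k_1 = 2,\ k_2 = 1,
\]
\[
\kappa(v) = (2, 1), \qquad \text{since } 2\,(1,1) + 1\,(1,-1) = (3,1).
\]
```

The same vector has the component tuple (3, 1) with respect to the basis ((1, 0), (0, 1)), which illustrates that the component map depends on the choice of basis.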
10.2.17 Notation: v^i for a vector v in a finite-dimensional linear space V and i ∈ ℕ_n, where n = dim(V) ∈ ℤ⁺₀,
denotes the ith element of the component n-tuple of v with respect to a basis (e_i)_{i=1}^n for V.
10.2.18 Remark: The use of a superscript index in Notation 10.2.17 is sometimes only a hint that the
n-tuple v is a contravariant vector component. In this case, v^i and v_i mean exactly the same thing. However,
in tensor calculus contexts, the superscript and subscript notations have different meanings. Therefore it is
important to make clear in each context what the subscript and superscript index notations mean. In the
absence of a metric tensor field, it is usually safe to assume that the use of both subscripts and superscripts
is merely a hint of the meaning. But when a metric tensor field is defined, the notations v^i and v_i have
different meanings.
[ Discuss the relation between a basis and a subspace spanned by a set. Show that if a linear space is finite-
dimensional, then it has a finite basis with number of elements equal to the dimension. Show that the
dimension of the basis is unique. ]
10.2.19 Example: For any field K −< (K, σK, τK), the linear space K −< (K, K, σK, τK, σK, τK) is a
one-dimensional linear space over the field K. The context will generally make clear whether a field K is being
used as a field or as a linear space.
Particularly useful examples of such 1-dimensional linear spaces are the real numbers ℝ over the field ℝ and
the complex numbers ℂ over the field ℂ. The complex numbers may also be regarded as a 2-dimensional
linear space over ℝ.
[ Define here the spaces K n etc., such as IRn as a linear space. ]
10.2.20 Definition: A Euclidean linear space is a set ℝ^n for some n ∈ ℤ⁺₀, together with the operations
of componentwise addition and scalar multiplication over the field ℝ.
10.2.21 Definition: The standard basis of a Euclidean linear space ℝ^n for n ∈ ℤ⁺₀ is the sequence (e_i)_{i=1}^n
defined by e_i = (δ_ij)_{j=1}^n ∈ ℝ^n for i ∈ ℕ_n.
10.2.22 Remark: For an infinite-dimensional linear space, a more general definition of a basis is required.
Definition 10.2.23 requires the number of non-zero elements in a linear combination to be finite. This is
because, in the absence of a topology, there is no way to give meaning to an infinite series of vectors. (The
subject of topological vector spaces is where this issue is addressed. Banach spaces and Hilbert spaces are
special kinds of topological vector spaces.)

10.2.23 Definition: A basis for a linear space V over a field K is a set B ⊆ V such that
$$\forall v \in V,\ \exists' k : B \to K,\quad \#(\{e \in B;\ k(e) \ne 0\}) < \infty \ \text{ and } \sum_{e \in B} k(e)e = v.$$
In other words, for all v ∈ V, there is a unique function k : B → K such that k(e) = 0 for all but a finite
number of vectors e ∈ B and v = ∑_{e∈B} k(e)e.
10.2.24 Remark: It is not immediately obvious that every linear space has a basis. In fact, the infamous
Axiom of Choice is required to make this assertion in full generality as in Theorem 10.2.25.
10.2.25 Theorem [zf+ac]: Every linear space has a basis.
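Proof: Let V be a linear space. Let C denote the set of all linearly independent subsets of V. Then C
is partially ordered by set inclusion. For any subset $C_1$ of C which is totally ordered by set inclusion, the
union $\bigcup C_1$ is a linearly independent subset of V. (Any finite subset of $\bigcup C_1$ is contained in a single
element of $C_1$ because $C_1$ is totally ordered.) So $\bigcup C_1 \in C$. Therefore by Zorn’s Lemma (Remark 5.9.14
part (3.2)), C contains a maximal element B with respect to the relation of set inclusion. That is, there is a
linearly independent subset B of V such that B ∪ {v} is linearly dependent for all v ∈ V \ B. But this implies
that B spans V, because a non-trivial linear relation among elements of B ∪ {v} must have a non-zero
coefficient for v, and this expresses v as a linear combination of elements of B.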
10.2.26 Remark: A special case of the linear combination concept introduced in Definition 10.2.8 is the
concept of a convex combination in Definition 10.2.27.
10.2.27 Definition: A convex combination of a subset X of a linear space V over a field K is any element
v ∈ V which satisfies
$$\exists n \in \mathbb{Z}_0^+,\ \exists w \in X^n,\ \exists k \in K^n,\quad \sum_{i=1}^n k_i = 1_K \ \text{ and } \ \sum_{i=1}^n k_i w_i = v.$$
10.3. Linear maps

10.3.1 Definition: A linear map from a linear space V over a field K to a linear space W over K is a
map φ : V → W such that
(i) ∀v1 , v2 ∈ V, φ(v1 + v2 ) = φ(v1 ) + φ(v2 ),
(ii) ∀λ ∈ K, ∀v ∈ V, φ(λv) = λφ(v).
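Conditions (i) and (ii) can be spot-checked numerically. The following Python sketch is only an illustration
under stated assumptions: the field is taken to be the rational numbers (via the standard fractions module),
vectors are tuples, and the names passes_linearity_checks, matrix_map and shifted_map are hypothetical.
A finite set of test vectors can refute linearity but can never prove it.

```python
from fractions import Fraction
from itertools import product


def add(v, w):
    """Componentwise vector addition."""
    return tuple(a + b for a, b in zip(v, w))


def scale(lam, v):
    """Scalar multiplication of the vector v by lam."""
    return tuple(lam * a for a in v)


def passes_linearity_checks(phi, vectors, scalars):
    """Test conditions (i) and (ii) on the given sample vectors and scalars."""
    additive = all(
        phi(add(v, w)) == add(phi(v), phi(w))
        for v, w in product(vectors, repeat=2)
    )
    homogeneous = all(
        phi(scale(lam, v)) == scale(lam, phi(v))
        for lam in scalars
        for v in vectors
    )
    return additive and homogeneous


def matrix_map(x):
    """A map from 2-tuples to 3-tuples given by a matrix, hence linear."""
    return (x[0] + 2 * x[1], 3 * x[1], -x[0])


def shifted_map(x):
    """A translation, which is not linear because it moves the zero vector."""
    return (x[0] + 1, x[1])


VECTORS = [
    (Fraction(1), Fraction(0)),
    (Fraction(0), Fraction(1)),
    (Fraction(2, 3), Fraction(-5)),
]
SCALARS = [Fraction(0), Fraction(1), Fraction(-7, 2)]
```

With these samples, matrix_map passes both checks, whereas shifted_map already fails condition (i).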
10.3.2 Notation: Lin(V, W ) denotes the set of linear maps from the linear space V to the linear space W .
10.3.3 Remark: The notation L(V, W ) is used for the set of linear maps in Rudin [137], defn. 9.6, page 184.
The notation Hom(V, W ) is used for Lin(V, W ) in Federer [106].
The notation HomK (V, W ) is used in EDM2 [35], section 256.D, but the field K is implicit in the specifications
of the linear spaces V and W . It is best to avoid tacking “useful hints” onto notations. (A similar kind of
“useful hint” is the annoying practice of writing a manifold M as M n to remind the reader of the dimension
of the manifold.)
[ Should cover the relation between endomorphisms of a linear space and matrix algebras. See EDM2 [35],
256.D. Somehow the matrix of the change of basis is also the matrix of the change of coordinates with respect
to a basis. ]
10.3.4 Definition: The linear space of linear space homomorphisms from a linear space V to a linear
space W is the set Lin(V, W ) of linear maps from V to W , together with the operations of pointwise vector
addition and scalar multiplication given by
∀f1 , f2 ∈ Lin(V, W ), ∀v ∈ V, (f1 + f2 )(v) = f1 (v) + f2 (v)
and
∀λ ∈ K, ∀f ∈ Lin(V, W ), ∀v ∈ V, (λf )(v) = λf (v).
10.3.5 Remark: Every category in mathematics is accompanied by a large amount of boilerplate. In par-
ticular, various kinds of morphisms are defined. Definition 10.3.6 and Notation 10.3.7 present the basic mor-
phisms for linear spaces. (These are created by copy/paste/edit from Definition 9.2.22 and Notation 9.2.23.)

10.3.6 Definition: A linear space homomorphism from a linear space V1 to a linear space V2 is a linear
map φ : V1 → V2 .
A linear space isomorphism from a linear space V1 to a linear space V2 is a bijective linear space homomor-
phism φ : V1 → V2 .
A linear space endomorphism of a linear space V is a linear space homomorphism φ : V → V .
A linear space automorphism of a linear space V is a linear space isomorphism φ : V → V .
A linear space monomorphism from a linear space V1 to a linear space V2 is an injective linear space homo-
morphism φ : V1 → V2 .
A linear space epimorphism from a linear space V1 to a linear space V2 is a surjective linear space homomor-
phism φ : V1 → V2 .
10.3.7 Notation: Let V , V1 , V2 be linear spaces over a field K.
Hom(V1 , V2 ) denotes the set of linear space homomorphisms from V1 to V2 .
Iso(V1 , V2 ) denotes the set of linear space isomorphisms from V1 to V2 .
End(V ) denotes the set of linear space endomorphisms of V .
Aut(V ) denotes the set of linear space automorphisms of V .
Mon(V1 , V2 ) denotes the set of linear space monomorphisms from V1 to V2 .
Epi(V1 , V2 ) denotes the set of linear space epimorphisms from V1 to V2 .
10.4. Eigenspaces of linear space endomorphisms

10.4.1 Remark: Eigenspaces, eigenvectors and eigenvalues are defined for linear space endomorphisms.
The “eigen” concepts are a way of choosing a basis for a linear space to simplify the form of a linear map.
The simplest situation for understanding the “eigen” concepts is a one-dimensional linear space V . A linear
endomorphism for such a space has a very simple form. It must be equivalent to a scalar multiplication
operation on V . In other words,
∀φ ∈ End(V ), ∃λ ∈ K, ∀v ∈ V, φ(v) = λv,
where K is the field of V . (See Exercise 46.5.1 for proof.) It follows that the double application of an
endomorphism φ is also equivalent to a scalar multiplication operation:
$\forall \phi \in \mathrm{End}(V),\ \forall v \in V,\quad \phi(\phi(v)) = \lambda^2 v.$
Clearly this can be extended to arbitrary powers of φ and polynomials in φ. (Infinite power series in φ, such
as the exponential, can also be defined if the linear space is closed under limit operations, but that’s another
story. Limits require topology, which is defined in Chapter 14.)
More generally, an “eigenspace” of an endomorphism φ ∈ End(V ) on a linear space V over a field K is a
subspace W of V such that
∃λ ∈ K, ∀v ∈ W, φ(v) = λv.
In other words, within the eigenspace W , the endomorphism is equivalent to a scalar multiplication operation.
This makes the behaviour of the endomorphism very simple within the eigenspace, just as if the linear space
had been one-dimensional. For example, one may easily calculate any finite power of the endomorphism:
$\exists \lambda \in K,\ \forall k \in \mathbb{Z}_0^+,\ \forall v \in W,\quad \phi^k(v) = \lambda^k v.$
More generally, for any polynomial function P , P (φ)(v) = P (λ)v for all v ∈ W . The scalar multiple λ is
called the “eigenvalue” of φ on the eigenspace W . The non-zero elements of W are called eigenvectors.
Unfortunately, the eigenspaces of an endomorphism do not necessarily span the whole linear space V . The
eigenvalues of eigenspaces generally are not the same. However, it turns out that the eigenspace concept is
enormously useful for studying the behaviour of endomorphisms, particularly if the eigenspaces do span the
whole linear space. Then the behaviour of the map may be decomposed into the simple behaviour on each
eigenspace. For example, a linear space automorphism may be inverted by replacing its eigenvalues with the
reciprocals of the eigenvalues.
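The last two observations may be illustrated by a small computation. The following Python sketch is an
informal example under stated assumptions: the endomorphism is given by the 2 × 2 rational matrix A below,
whose eigenvectors (1, 1) and (1, −1) with eigenvalues 3 and 1 span the whole space. The names apply,
power_apply, decompose and apply_inverse are hypothetical.

```python
from fractions import Fraction

# Matrix of an endomorphism of the 2-dimensional rational linear space.
A = ((Fraction(2), Fraction(1)),
     (Fraction(1), Fraction(2)))

# Eigenvalue/eigenvector pairs of A: A(1,1) = 3(1,1) and A(1,-1) = 1(1,-1).
EIGENPAIRS = [
    (Fraction(3), (Fraction(1), Fraction(1))),
    (Fraction(1), (Fraction(1), Fraction(-1))),
]


def apply(matrix, v):
    """Apply the matrix to the vector v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in matrix)


def power_apply(matrix, k, v):
    """Apply the matrix k times to the vector v."""
    for _ in range(k):
        v = apply(matrix, v)
    return v


def decompose(w):
    """Return (a, b) such that w = a*(1,1) + b*(1,-1)."""
    return Fraction(w[0] + w[1], 2), Fraction(w[0] - w[1], 2)


def apply_inverse(w):
    """Invert A by dividing each eigenvector component by its eigenvalue."""
    result = (Fraction(0), Fraction(0))
    for (lam, e), c in zip(EIGENPAIRS, decompose(w)):
        result = tuple(r + (c / lam) * x for r, x in zip(result, e))
    return result
```

Within the eigenspace spanned by (1, 1), the fourth power of the map is multiplication by 3⁴ = 81, and
applying A to apply_inverse(w) recovers w.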
[ Talk about spectral analysis of linear operators L ∈ Lin(V, V ). Show that Tr(L) and the spectrum of L are
invariants under linear transformations. ]

10.5. Linear functionals and dual spaces

Linear functionals are often described as linear maps from a linear space to the linear space’s field. This is
not quite correct. Definition 10.3.1 requires a linear map to be a map between two linear spaces V and W
over the same field K. A field is not the same thing as a linear space, although Example 10.2.19 mentions
that a field may be easily converted into a linear space over itself. A linear functional is then a map from a
linear space V to its field K which obeys the same formal linearity rules as a linear map does.
10.5.1 Definition: A linear functional on a linear space V over a field K is a function f : V → K such
that
(i) ∀v1 , v2 ∈ V, f (v1 + v2 ) = f (v1 ) + f (v2 ), and
(ii) ∀λ ∈ K, ∀v ∈ V, f (λv) = λf (v).
10.5.2 Remark: Perhaps surprisingly, there is no guarantee, for each vector v in a general linear space V ,
that there exists a linear functional f on V for which f (v) is non-zero. Such a general guarantee requires
the axiom of choice. Alternatively the linear space may be required to have a basis as in Theorem 10.5.3.
(A general linear space is guaranteed to have at least one basis if the axiom of choice is assumed. See
Theorem 10.2.25 for this.)
10.5.3 Theorem: Let V be a linear space over a field K which has a basis. Then for any non-zero v ∈ V ,
there is a linear functional f : V → K such that f(v) ≠ 0.
Proof: Let B be a basis for V. Let v ∈ V \ {0}. Denote the component map for the basis B by
$\kappa : V \to K^B$. Then $v = \sum_{e \in B} \kappa(v)(e)e$. The set $B' = \{e \in B;\ \kappa(v)(e) \ne 0\}$ is finite. Let
$\bar{e} \in B$ be a basis vector such that $\kappa(v)(\bar{e}) \ne 0$. (There must be such a vector $\bar{e}$ because v ≠ 0.)
Define f : V → K by $f(w) = \kappa(v)(\bar{e})^{-1} \kappa(w)(\bar{e})$ for all w ∈ V. Then f is a well-defined linear
functional on V. But $f(v) = 1_K$. This is non-zero.
10.5.4 Remark: The set of linear functionals on a linear space has a natural linear space structure by
defining pointwise vector addition and scalar multiplication operations. The space of linear functionals is
called the “dual space”. The original space may then be referred to as the “primal space”, although the
word “primal” is rarely used in this way in the literature.
When the primal linear space is the tangent space at a point of a differentiable manifold, the dual space is
the same as the set of all differentials of differentiable real-valued functions. Consequently, the primal spaces
for differentiable manifolds typically correspond to differential operators whereas the dual spaces typically
correspond to differentials of functions.
10.5.5 Definition: The dual space of a linear space $V \mathrel{-\!\!<} (K, V, \sigma_K, \tau_K, \sigma_V, \mu_V)$ is the linear
space $V^* \mathrel{-\!\!<} (K, V^*, \sigma_K, \tau_K, \sigma_{V^*}, \mu_{V^*})$ where the set $V^*$ is the set of all linear functionals
on V, and $\sigma_{V^*}$, $\mu_{V^*}$ are the operations of pointwise addition and scalar multiplication on $V^*$.
[ Robertson/Robertson [136], page 25, calls Definition 10.5.5 the “algebraic dual” as opposed to the “topo-
logical dual”. ]
10.5.6 Remark: Bases for the primal and dual spaces are sometimes distinguished by using superscripts
for the dual space basis while using the same letter, such as e, for both bases. Then the components of primal
space vectors with respect to the primal basis also use superscripts. This kind of superscript convention is
used in Definition 10.5.7. The convention is useful as a mnemonic for calculations, but it sometimes leads
to erroneous assumptions. (See Remark 10.2.18 for further comment on this.)
10.5.7 Definition: The canonical dual basis of a basis $(e_i)_{i=1}^n$ for a finite-dimensional linear space V is
the basis $(e^i)_{i=1}^n$ for the dual space $V^*$ defined by $e^i(v) = v^i$ for all $i \in \mathbb{N}_n$, for all
$v = \sum_{i=1}^n v^i e_i \in V$.
10.5.8 Remark: The fact that the functions $e^i$ in Definition 10.5.7 are well-defined relies on the existence
and uniqueness of the coordinates $v^i$ for all vectors v ∈ V. It is easy to show that the $e^i$ are linear functionals.
The fact that the sequence $(e^i)_{i=1}^n$ is a basis for $V^*$ follows from the linearity of any $f \in V^*$ because for
any v ∈ V,
$$f(v) = f\Big(\sum_{i=1}^n v^i e_i\Big) = \sum_{i=1}^n v^i f(e_i) = \sum_{i=1}^n f(e_i) e^i(v),$$
so that $f = \sum_{i=1}^n f(e_i) e^i$. This shows that the $e^i$ span $V^*$. To show that the coefficients of the $e^i$
are unique, let $f = \sum_{i=1}^n f_i e^i = 0$ for $(f_i)_{i=1}^n \in K^n$. Then f(v) = 0 for all v ∈ V. In particular,
$0 = f(e_j) = \sum_{i=1}^n f_i e^i(e_j) = f_j$ for all $j \in \mathbb{N}_n$. So the coefficients are unique and the sequence
$(e^i)_{i=1}^n$ is a basis for $V^*$.
10.5.9 Remark: The canonical dual basis in Definition 10.5.7 is the unique sequence of linear functionals
$(e^i)_{i=1}^n$ on V such that $e^i(e_j) = \delta^i_j$ for all $i, j \in \mathbb{N}_n$.
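The characterization in Remark 10.5.9 suggests a way to compute a canonical dual basis in coordinates. The
following Python sketch is an illustration under assumptions which are not part of the text: the linear space
is the space of rational n-tuples, each functional is represented by its tuple of values on the standard basis,
and the dual basis is read off from the rows of the inverse of the matrix whose columns are the basis vectors.
The names invert, dual_basis and evaluate are hypothetical.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix of rationals by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row of the matrix with the matching row of the identity.
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix: the vectors are not a basis")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dual_basis(basis):
    """Return the dual basis functionals as rows of coefficients."""
    n = len(basis)
    # Entry (i, j) is the i-th component of the j-th basis vector.
    columns = [[basis[j][i] for j in range(n)] for i in range(n)]
    return invert(columns)


def evaluate(functional, v):
    """Evaluate a functional, given by its coefficient row, on the vector v."""
    return sum(a * x for a, x in zip(functional, v))


BASIS = [(1, 1), (0, 2)]
DUAL = dual_basis(BASIS)
```

For the basis (1, 1), (0, 2), the computed dual basis is (1, 0), (−1/2, 1/2), and evaluating the i-th functional
on the j-th basis vector gives the Kronecker delta.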
10.5.10 Theorem: The dimension of the dual of a finite-dimensional linear space equals the dimension of the primal space.
Proof: This follows immediately from the fact that the canonical dual basis is a basis which has the same
number of elements as the basis of the primal space.
10.5.11 Definition: The dual component map for the dual $V^*$ of an n-dimensional linear space V over
a field K with $n \in \mathbb{Z}_0^+$ for a basis $(e_i)_{i=1}^n$ for V is the map $\kappa^* : V^* \to K^n$ defined by
$\kappa^* : f \mapsto (f(e_i))_{i=1}^n$ for $f \in V^*$.
10.5.12 Remark: In terms of the dual component map $\kappa^*$ in Definition 10.5.11, any linear functional
$f \in V^*$ satisfies $f = \sum_{i=1}^n \kappa^*(f)_i e^i$, where $(e^i)_{i=1}^n$ is the canonical dual basis of $(e_i)_{i=1}^n$.
This follows from the calculation
$$f(e_j) = \Big(\sum_{i=1}^n \kappa^*(f)_i e^i\Big)(e_j) = \sum_{i=1}^n \kappa^*(f)_i e^i(e_j) = \sum_{i=1}^n \kappa^*(f)_i \delta^i_j = \kappa^*(f)_j.$$
[ Remark 10.5.12 probably needs to be explained a bit better. ]
10.5.13 Notation: $f_i$ denotes the ith element of the component n-tuple $\kappa^*(f) \in K^n$ of f with respect
to a basis $(e_i)_{i=1}^n$ for V, where f is a linear functional on a finite-dimensional linear space V over a field K
and $i \in \mathbb{N}_n$, where $n = \dim(V) \in \mathbb{Z}_0^+$.
10.5.14 Remark: Notation 10.5.13 is the covariant analogue of Notation 10.2.17 for contravariant vector
components. As alluded to in Remark 10.2.18, the use of subscript and superscript indices is merely a hint
of the transformation rules for vector and linear functional component tuples except when a Riemannian or
pseudo-Riemannian metric tensor is being used to raise or lower the indices.
10.5.15 Remark: In terms of the primal and dual coordinate systems for an n-dimensional linear space V ,
the value f(v) for any $f \in V^*$ and v ∈ V may be written as $f(v) = \sum_{i=1}^n f_i v^i$.
[ Must define “canonical” somewhere. Maybe have a glossary at the end of the book? ]
10.5.16 Remark: The dual $V^{**}$ of the dual $V^*$ of a linear space V is canonically isomorphic to the primal
space V if V has a finite basis. This is the case which is covered by Theorem 10.5.17. If V has an infinite
basis, the canonical map is still injective by Theorem 10.5.3, but the proof of surjectivity does not carry over
because the sums over the basis, which are applied to arbitrary linear functionals, are then no longer finite.
For a completely general linear space, even injectivity is not guaranteed unless the Axiom of Choice is
invoked as in Theorem 10.2.25.
10.5.17 Theorem: Let V be a linear space over a field K which has a finite basis. Define $h : V \to V^{**}$ by
$h(v)(f) = f(v)$ for all v ∈ V and $f \in V^*$. Then h is a linear space isomorphism.
Proof: The map h in Theorem 10.5.17 is well-defined because for all v ∈ V, h(v) is a linear map from $V^*$
to the field K of the linear space V. (This follows from the calculation
$h(v)(\lambda_1 f_1 + \lambda_2 f_2) = (\lambda_1 f_1 + \lambda_2 f_2)(v) = \lambda_1 f_1(v) + \lambda_2 f_2(v) = \lambda_1 h(v)(f_1) + \lambda_2 h(v)(f_2)$
for all v ∈ V, $\lambda_1, \lambda_2 \in K$ and $f_1, f_2 \in V^*$.) Therefore $h(v) \in V^{**}$.
For all $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$,
$h(\lambda_1 v_1 + \lambda_2 v_2)(f) = f(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 f(v_1) + \lambda_2 f(v_2) = \lambda_1 h(v_1)(f) + \lambda_2 h(v_2)(f)$
for all $f \in V^*$. So $h(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 h(v_1) + \lambda_2 h(v_2)$. Thus h is a linear space homomorphism.
To show that h is one-to-one, suppose that $v_1, v_2 \in V$ are vectors such that $h(v_1) = h(v_2)$ and $v_1 \ne v_2$. By
Theorem 10.5.3, there is a linear functional $f \in V^*$ such that $f(v_1 - v_2) \ne 0$. Then
$0 = h(v_1)(f) - h(v_2)(f) = f(v_1) - f(v_2) = f(v_1 - v_2) \ne 0$. This is a contradiction. So h is one-to-one.
To show that h is surjective, let $g \in V^{**}$. It must be shown that there is a vector v ∈ V such that g(f) = f(v)
for all $f \in V^*$. Let B be a finite basis for V and let $\kappa : V \to K^B$ denote the component map for B in V.
Any w ∈ V may be written as $w = \sum_{e \in B} \kappa(w)(e)e$. Define $e' \in V^*$ for e ∈ B by $e'(w) = \kappa(w)(e)$ for
all w ∈ V. Then for any $f \in V^*$ and w ∈ V,
$f(w) = f\big(\sum_{e \in B} \kappa(w)(e)e\big) = \sum_{e \in B} \kappa(w)(e)f(e) = \sum_{e \in B} e'(w)f(e)$.
So $f = \sum_{e \in B} f(e)e'$, which is a finite sum because B is finite. Therefore for all $f \in V^*$,
$g(f) = g\big(\sum_{e \in B} f(e)e'\big) = \sum_{e \in B} f(e)g(e') = f\big(\sum_{e \in B} g(e')e\big) = f(v)$, where
$v = \sum_{e \in B} g(e')e \in V$. So h is surjective. Hence h is a linear space isomorphism.

10.5.18 Definition: The second dual of a linear space V is the dual of the dual of V .
10.5.19 Remark: The second dual could also be called the “double dual”. Robertson/Robertson [136],
page 70, use the word “bidual” for the dual of the dual.
10.5.20 Definition: The canonical map from a linear space V to its second dual is the map $h : V \to V^{**}$
defined by $h : v \mapsto (f \mapsto f(v))$. In other words, $h(v)(f) = f(v)$ for all v ∈ V and $f \in V^*$.
10.5.21 Remark: If the expression f(v) in Definition 10.5.20 is regarded as a function of two variables
$f \in V^*$ and v ∈ V, the canonical map h is the transpose of this function. (See Remark 6.12.3 for the transpose
of a function of a function.) More precisely, if $\phi : V^* \to (V \to K)$ denotes the map $\phi : f \mapsto (v \mapsto f(v))$, then
h may be thought of as the transposed function $h : V \to (V^* \to K)$ defined by $h : v \mapsto (f \mapsto f(v))$. But φ
is actually the identity map $\mathrm{id}_{V^*}$ on $V^*$. So the second dual canonical map h is just the function transpose
of $\mathrm{id}_{V^*}$. (When the revolution comes, this will be the second remark on the bonfire.)
10.5.22 Remark: If W in the space of linear maps Lin(V, W ) is the linear space K over K, there is
an isomorphism Lin(V, K) ≈ V ∗ . If V in the space Lin(V, W ) is the linear space K over K, there is an
isomorphism Lin(K, W ) ≈ W .
10.5.23 Definition: The transpose of a linear map φ : V → W for linear spaces V and W over the same
field K is the map $\phi^T : W^* \to V^*$ defined by $\phi^T : f \mapsto (v \mapsto f(\phi(v)))$.
In other words, $\phi^T(f)(v) = f(\phi(v))$ for all $f \in W^*$ and v ∈ V. Equivalently, $\phi^T(f) = f \circ \phi$ for all $f \in W^*$.
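Since the transpose is defined purely by composition, it can be written down directly with functions as
values. The following Python sketch illustrates Definition 10.5.23, together with the canonical map of
Definition 10.5.20 for comparison. It is only an informal example: functionals are modelled as ordinary
Python functions on tuples, and the names transpose, canonical_map, phi and f are hypothetical.

```python
def transpose(phi):
    """Return the transposed map, which sends a functional f to f composed with phi."""
    def phi_transposed(f):
        return lambda v: f(phi(v))
    return phi_transposed


def canonical_map(v):
    """Return h(v), which evaluates each functional f at the fixed vector v."""
    return lambda f: f(v)


def phi(v):
    """An example linear map from 2-tuples to 3-tuples."""
    x, y = v
    return (x, x + y, 2 * y)


def f(w):
    """An example linear functional on 3-tuples."""
    a, b, c = w
    return a - b + 3 * c


# The transposed map applied to f is a functional on 2-tuples.
pulled_back = transpose(phi)(f)
```

Here pulled_back(v) and f(phi(v)) agree for every v by construction. For the example maps above, the
pulled-back functional sends (x, y) to 5y.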
10.5.24 Remark: Frankel [19], page 640, notes that the transpose φT of a linear map φ : V → W in
Definition 10.5.23 is the same as the “pullback operator” for functionals on W . EDM2 [35], 256.G, calls φT
the “dual mapping” or “transposed mapping”. Yosida [151], VII.1, calls φT the “dual operator” or “conjugate
operator”. Robertson/Robertson [136], page 38, call φT the transpose, but they note the alternative terms
“adjoint”, “conjugate” and “dual”.
Crampin/Pirani [12], page 27, call φT the “adjoint” of φ. This use of the word “adjoint” is (hopefully) a
minority opinion. The adjoint of φ is usually defined as a quite different map φ∗ : W → V which uses inner
products on the linear spaces V and W . The term “adjoint” is also used for φT by Gilbarg/Trudinger [111],
page 79, in the context of Banach spaces, and by Spivak [43], page I.103, for linear maps φ : IRn → IRn .
To minimize confusion, it seems best to use the terms “transpose”, “transposed map” or “dual map” for
the concept in Definition 10.5.23 since there seems to be no obvious alternative to “adjoint” for the inner
product space concept.
10.5.25 Remark: If V = W and $\phi = \mathrm{id}_V$ in Definition 10.5.23, then $\phi^T = \mathrm{id}_{V^*}$. In the general case, the
formula $\phi^T(f) = f \circ \phi$ may be written as $\phi^T = R_\phi$, where $R_\phi$ is the right action on linear functionals
$f \in W^*$ by the map φ. This is actually a special case of the concept of the “pull-back” of the function
f : W → K by the map φ to a function $\phi^T(f) : V \to K$. The pull-back concept is used frequently for functions
on manifolds.
The natural generalization of this to the set W → K of all (not necessarily linear) K-valued functions on W
would have the same definition $\phi^T : f \mapsto f \circ \phi$. Another useful generalization would be to linear maps from
V and W to a third linear space U. Thus a transposed map $\phi^T : \mathrm{Lin}(W, U) \to \mathrm{Lin}(V, U)$ may be defined by
the same rule $\phi^T(f) = f \circ \phi$ for all f ∈ Lin(W, U).
10.5.26 Remark: There is an eerie similarity between the transposed map in Definition 10.5.23 and the
canonical map for the second dual in Definition 10.5.20. The formula for the transposed map is
$\phi^T : f \mapsto f \circ \phi$, whereas for the second dual canonical map, it is $h(v) : f \mapsto f(v)$. In each case, the formula
“multiplies” the linear functional f on the right by something. In the former case, the argument of f is the
value of φ. In the latter case, the argument of f is the vector v ∈ V.
By some tortuous thinking, it is possible to represent the second dual map $h : V \to V^{**}$ in terms of a
transposed map. For a given v ∈ V, define the linear map $\phi_v : K \to V$ by $\phi_v(t) = tv$ for all t ∈ K, regarding
the field K as a linear space over K. Then the transpose of $\phi_v$ is $\phi_v^T : V^* \to K^*$ defined by
$\phi_v^T : f \mapsto f \circ \phi_v$, so that $\phi_v^T(f) : t \mapsto f(tv)$. If $K^*$ is identified with K via another canonical map, the map
$t \mapsto f(tv)$ could be identified with f(v). Thus effectively $\phi_v^T : f \mapsto f(v)$. So $\phi_v^T$ is (roughly) the same thing
as h(v). (When the revolution comes, this will be the first remark on the bonfire.)

10.6. Direct sums of linear spaces

[ What is a direct product of linear spaces? The same as direct sum? ]
[ Among modules, direct sums have finite sums, whereas infinite sums are allowed in the direct product. See
EDM2 [35] 277.F. Check whether this is a related issue. ]
[ Possibly should present direct sums of linear spaces for dual spaces Vα∗ etc. also? ]
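10.6.1 Definition: The external direct sum of a family $(V_\alpha)_{\alpha \in A}$ of linear spaces over a field K is the
set
$$\bigoplus_{\alpha \in A} V_\alpha = \Big\{ (v_\alpha)_{\alpha \in A} \in \times_{\alpha \in A} V_\alpha;\ \#\{\alpha \in A;\ v_\alpha \ne 0\} < \infty \Big\}$$
together with the operations of componentwise addition and scalar multiplication defined by
$$\forall (v_\alpha)_{\alpha \in A}, (w_\alpha)_{\alpha \in A} \in \bigoplus_{\alpha \in A} V_\alpha,\quad (v_\alpha)_{\alpha \in A} + (w_\alpha)_{\alpha \in A} = (v_\alpha + w_\alpha)_{\alpha \in A}$$
and
$$\forall \lambda \in K,\ \forall (v_\alpha)_{\alpha \in A} \in \bigoplus_{\alpha \in A} V_\alpha,\quad \lambda (v_\alpha)_{\alpha \in A} = (\lambda v_\alpha)_{\alpha \in A}.$$
The external direct sum of linear spaces may also be called a formal direct sum.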
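10.6.2 Notation: $\bigoplus_{\alpha \in A} V_\alpha$ denotes the external direct sum of a family $(V_\alpha)_{\alpha \in A}$ of linear spaces.
$\bigoplus_{i=0}^{n-1} V_i$ denotes the external direct sum $\bigoplus_{\alpha \in A} V_\alpha$ in case $A = n \in \omega$.
$\bigoplus_{i=1}^{n} V_i$ denotes the external direct sum $\bigoplus_{\alpha \in A} V_\alpha$ in case $A = \mathbb{N}_n$ with $n \in \mathbb{Z}_0^+$.
$\bigoplus^n V$ denotes the external direct sum $\bigoplus_{i=1}^{n} V_i$ in case $n \in \mathbb{Z}_0^+$ and $V_i = V$ for all $i \in \mathbb{N}_n$.
$V_1 \oplus V_2$ denotes the external direct sum $\bigoplus_{i=1}^{2} V_i$.
$V_1 \oplus V_2 \oplus V_3$ denotes the external direct sum $\bigoplus_{i=1}^{3} V_i$, and so forth.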
10.6.3 Remark: The formal or abstract direct sum in Definition 10.6.1 is referred to as an external direct
sum by Hartley/Hawkes [116], definition 3.1, page 34, whereas they call Definition 10.6.5 (corresponding to
their definition 3.2) the “internal direct sum”.
Most authors use the simple term “direct sum” of linear spaces to mean the “internal direct sum” which
is given by Definition 10.6.5. Since the internal and external direct sums of linear spaces are different set
constructions, they should have different notations to distinguish them. But when the internal direct sum is
defined, it is equivalent to the external direct sum. This reduces the motivation to find a separate notation.
The external direct sum of linear spaces has much broader applicability than the internal direct sum. But
the internal direct sum is more often used in applications. It seems inevitable that both the internal and
external constructions should share a single notation.
Hartley/Hawkes [116], page 34, make the following useful observation in regard to external and internal
direct sums of rings.
The reason for the use of the same notation for external and internal direct sums will appear
shortly. The external direct sum should be thought of as a way of building up more complicated
rings from given ones, and the internal direct sum as a way of breaking a given ring down into
easier components.
This comment is equally valid for direct sums of linear spaces. The “reason” for the use of the same notation
is that the external and internal direct sums are isomorphic, as shown here in Theorem 10.6.7.
It seems fairly safe to use the same notation for external and internal direct sums of linear spaces, since the
interpretation will generally be clear from the context.
[ See EDM2 [35] 277.F for direct sums of linear spaces. The EDM2 article on modules is more detailed. Should
show that the direct product is actually a linear space, in particular that it is non-empty. ]

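10.6.4 Definition: The standard injection $i_\beta : V_\beta \to \times_{\alpha \in A} V_\alpha$ is defined by
$$\forall w \in V_\beta,\quad (i_\beta(w))_\alpha = \begin{cases} w & \alpha = \beta \\ 0_{V_\alpha} & \alpha \ne \beta. \end{cases}$$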
[ Must define standard projections. Should be the same as Definition 6.9.8. ]

10.6.5 Definition: Let V1 and V2 be subspaces of a linear space V over a field K such that V1 ∩ V2 = {0}.
Then the (internal) direct sum of V1 and V2 is the subspace of V defined by
{λ1 v1 + λ2 v2 ; λ1 , λ2 ∈ K, v1 ∈ V1 , v2 ∈ V2 }.
10.6.6 Remark: Definition 10.6.5 means that the direct sum of two subspaces with trivial intersection is
the span of the union of the two subspaces. (See Definition 10.2.4 for the span of a subset of a linear space.)
Thus
$$\{\lambda_1 v_1 + \lambda_2 v_2;\ \lambda_1, \lambda_2 \in K,\ v_1 \in V_1,\ v_2 \in V_2\} = \bigcap \{U \in \mathbb{P}(V);\ U \text{ is a subspace of } V \text{ and } V_1 \cup V_2 \subseteq U\}$$
if V1 and V2 are subspaces of V such that V1 ∩ V2 = {0}.
10.6.7 Theorem: Let V1 , V2 , V3 be subspaces of a linear space V over a field K such that V3 is the internal
direct sum of V1 and V2 . Define a map φ : V1 ⊕ V2 → V3 by
∀v1 ∈ V1 , ∀v2 ∈ V2 , φ(v1 , v2 ) = v1 + v2 .
Then φ is a linear space isomorphism.

10.6.8 Remark: Theorem 10.6.7 means that the internal direct sum of two subspaces of a linear space is
isomorphic to the external direct sum of the spaces. Therefore Notation 10.6.2 is re-used in Notation 10.6.9,
as discussed in Remark 10.6.3.
10.6.9 Notation: V1 ⊕ V2 denotes the internal direct sum of linear subspaces V1 and V2 which is given
by Definition 10.6.5.
10.6.10 Theorem: Let V1 , V2 , V3 be subspaces of a linear space V over a field K such that V3 = V1 ⊕ V2 is
the internal direct sum of V1 and V2 . Then any element v3 ∈ V3 has a unique representation v3 = v1 + v2
as a sum of a pair of elements v1 ∈ V1 and v2 ∈ V2 . That is,
$$\forall v_1, v_1' \in V_1,\ \forall v_2, v_2' \in V_2,\quad v_1 + v_2 = v_1' + v_2' \ \Rightarrow\ v_1 = v_1' \text{ and } v_2 = v_2'.$$
(The scalars in a linear combination λ1 v1 + λ2 v2 are not themselves unique because they may be absorbed
into the vectors, but the products λ1 v1 ∈ V1 and λ2 v2 ∈ V2 are unique.)
10.6.11 Theorem: The dimension of a direct sum of linear spaces equals the sum of the dimensions of
the component spaces.
10.6.12 Remark: Theorem 10.6.11 may be contrasted with the situation for tensor products of linear
spaces. The dimension of a tensor product of linear spaces equals the product of the dimensions of the
component spaces. (See Theorem 13.6.17 for the dimension of a tensor product of linear spaces.)
10.7. Quotients of linear spaces

[ Where is the direct sum/product of linear spaces defined? Apparently direct sums are in Section 10.6. ]
10.7.1 Definition: The quotient (linear) space of a linear space V over a field K with respect to a subspace
W of V is the quotient set {v + W ; v ∈ V } together with operations of addition and scalar multiplication
defined by
∀v1 , v2 ∈ V, (v1 + W ) + (v2 + W ) = (v1 + v2 ) + W
and
∀λ ∈ K, ∀v ∈ V, λ(v + W ) = (λv) + W.

10.7.2 Notation: V /W for a linear space V and a subspace W of V denotes the quotient space of V with respect to W .
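10.7.3 Remark: The operations in Definition 10.7.1 are well-defined because the right-hand sides do not
depend on the choice of representatives. If v1′ − v1 ∈ W and v2′ − v2 ∈ W , then (v1′ + v2′ ) − (v1 + v2 ) ∈ W
and λv1′ − λv1 ∈ W for all λ ∈ K, since W is a subspace of V .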
[ Deal with the standard map j : V → V /W such that j(v) = v + W . Then have j(v1 ) + j(v2 ) = j(v1 + v2 )
and λj(v) = j(λv) etc. ]
[ Put lots of isomorphism theorems here. ]
10.7.4 Theorem: [ Isomorphism theorem π′ : V / ker π → W for π : V → W . ]
[ Must define kernel and image of linear maps. ]
10.7.5 Theorem: Let f : V → W be a linear map for finite-dimensional linear spaces V and W . Then
dim V = dim ker f + dim Im f .
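The dimension formula in Theorem 10.7.5 can be checked for a concrete matrix. The following Python sketch is
an informal illustration under stated assumptions: the linear map is given by a non-empty matrix of rationals
acting on column vectors, the dimension of the image is counted as the number of pivot columns after row
reduction, and a basis for the kernel is built from the free columns. The names row_reduce and kernel_basis
are hypothetical.

```python
from fractions import Fraction


def row_reduce(matrix):
    """Return (reduced row echelon form, list of pivot column indices)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    pivots = []
    r = 0  # index of the next row which needs a pivot
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][c]
        rows[r] = [x / p for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    return rows, pivots


def kernel_basis(matrix):
    """Return a basis for the kernel, with one vector for each free column."""
    rref, pivots = row_reduce(matrix)
    n_cols = len(matrix[0])
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for row_index, pivot_col in enumerate(pivots):
            v[pivot_col] = -rref[row_index][free]
        basis.append(tuple(v))
    return basis


M = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
_, PIVOTS = row_reduce(M)
KERNEL = kernel_basis(M)
```

For this matrix the image has dimension 2 and the kernel is spanned by (−1, −1, 1), so the two dimensions
add up to 3, which is the dimension of the domain.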
10.8. Inner products and norms

[ Define upper and lower norms of linear maps for general normed linear spaces. That is, define
$\lambda^-(\phi) = \inf\{|\phi(v)|;\ |v| = 1\}$ and $\lambda^+(\phi) = \sup\{|\phi(v)|;\ |v| = 1\}$ for linear maps φ : V → W and given
norms on V and W ? ]
10.8.1 Definition: The p-norm on a Cartesian product $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ and $p \in [1, \infty]$ is the map
$x \mapsto \big(\sum_{i=1}^n |x_i|^p\big)^{1/p}$ for $1 \le p < \infty$, and $x \mapsto \max_{i=1}^n |x_i|$ for $p = \infty$.
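10.8.2 Notation: $|x|_p$ for $x \in \mathbb{R}^n$, $n \in \mathbb{Z}_0^+$ and $p \in [1, \infty]$ denotes the p-norm of $x \in \mathbb{R}^n$. Thus
$$|x|_p = \begin{cases} \big(\sum_{i=1}^n |x_i|^p\big)^{1/p} & \text{if } 1 \le p < \infty \\ \max_{i=1}^n |x_i| & \text{if } p = \infty. \end{cases}$$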
[ Must define the standard topology on extended-real sets near Definition 14.10.1. ]
10.8.3 Remark: For all $x \in \mathbb{R}^n$, for $n \in \mathbb{Z}_0^+$, the map $p \mapsto |x|_p$ is a continuous function on [1, ∞].
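The p-norm and its behaviour as p grows can be explored numerically. The following Python sketch is an
informal illustration which uses floating-point arithmetic. The function name p_norm is hypothetical.
Dividing by the largest absolute component before raising to the power p is an implementation choice made
here to avoid overflow for large p. It is not part of the definition.

```python
import math


def p_norm(x, p):
    """Return the p-norm of the sequence x for 1 <= p <= infinity."""
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    if not x:
        return 0.0  # the empty tuple is the only element when n = 0
    largest = max(abs(t) for t in x)
    if largest == 0:
        return 0.0
    if math.isinf(p):
        return float(largest)
    # Scale so that every ratio lies in [0, 1] before taking powers.
    return largest * sum((abs(t) / largest) ** p for t in x) ** (1.0 / p)


X = (3.0, -4.0)
NORMS = [p_norm(X, p) for p in (1, 2, 10, 200, math.inf)]
```

For x = (3, −4), the values decrease from 7 at p = 1 through 5 at p = 2 towards 4 at p = ∞, which is
consistent with the continuity of the map at the endpoint p = ∞.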
10.8.4 Definition: The Euclidean norm on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the 2-norm $x \mapsto |x|_2$ on $\mathbb{R}^n$.
[ Define general inner products on linear spaces. Also define the correspondence between norms and inner
products. Explain the conditions under which an inner product may be constructed from a norm. (Something
to do with a parallelogram condition.) ]
10.8.5 Definition: The (standard) (Euclidean) inner product on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the map
$f : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined by $f : (x, y) \mapsto \sum_{i=1}^n x_i y_i$.
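10.8.6 Notation: $(x, y)$ denotes the inner product of $x, y \in \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$.
$x \cdot y$ denotes the inner product of $x, y \in \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$.
$\langle x, y \rangle$ denotes the inner product of $x, y \in \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$.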
10.8.7 Remark: There are many notations for inner products, of which Notation 10.8.6 gives a selection.
A norm may be defined on any linear space. In physics contexts, the angle bracket notation ⟨x, y⟩ is especially
popular. When multiple norms are defined on a single space in a single context, it may be necessary to use
subscripts to distinguish the norms. The quantum mechanics bra/ket notation is especially ambiguous since
the angle brackets may denote either the inner product or the action of a linear functional, depending on
whether the left vector is in the primal or dual space.

[ Here define the inner product and norm on IRn . Show some basic properties. Define orthogonal and
orthonormal sets of vectors. Present Gram-Schmidt orthogonalization. ]
[ See Remark 9.11.24 for comments on the meaning of the French word “adjoint”. See EDM2 [35], 256.Q, for
adjoint maps. ]
[ Remark 10.8.8 needs to be tidied up a bit to make more sense. Eigenvalues are related to the values of a
bilinear form relative to the corresponding inner product: β(v, v) ≤ |λ|(v, v), or something like that. ]
10.8.8 Remark: As mentioned in Remark 13.2.14, square matrices are useful for (at least) two kinds of
applications, namely as the coefficients of a linear endomorphism and as the coefficients of a bilinear map.
In both cases, a basis must first be specified for the linear space.
In the case of a linear map φ : V → V for a linear space V , the eigenspaces of the matrix of the map are
useful for the study of the properties of the linear map. In the case of a bilinear function β : V × V → K,
where K is the field of the linear space V , the eigenvalues of the matrix of the bilinear function are also
useful, but in a different way.
If a matrix $A = (a_{ij})_{i,j=1}^n$ is real and symmetric, its eigenspaces span the whole linear space V, and the
eigenvalues are real. Denote by $\lambda^+$ the maximum eigenvalue and by $\lambda^-$ the minimum eigenvalue. (See also
Definition 11.4.2 for $\lambda^+$ and $\lambda^-$.) Then the map φ : V → V never multiplies the Euclidean length of a vector
in V by a multiple which is greater than $\lambda = \max(|\lambda^+|, |\lambda^-|)$, where the Euclidean norm is the one which is
determined by the chosen basis. In fact, λ = sup{|φ(v)|; |v| ≤ 1}.
For a bilinear map corresponding to a real symmetric matrix, the number λ is equal to sup{|β(v, v)|; v ∈
V and |v| ≤ 1} for the same Euclidean norm.
It would now be useful to be able to ignore the complicated algebra of eigenvalues and use the simpler-
looking norm-based expressions instead. (In practice, of course, the sup and inf expressions would involve
complicated algebra anyway, but they do look less algebraic.) The norm-based sup/inf expressions are
perfectly meaningful for asymmetric matrices, where eigenvalue calculations are more difficult.
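The relation between the eigenvalue bound and the norm-based expressions can be checked numerically in
two dimensions. The following Python sketch is an informal illustration under assumptions which should be
noted: the matrix is a real symmetric 2 × 2 matrix with rows (a, b) and (b, c), the norm is the Euclidean norm,
λ is taken to be the larger of the absolute values of the two eigenvalues, and the suprema are only
approximated by sampling the unit circle. The names symmetric_eigenvalues and sampled_suprema are
hypothetical.

```python
import math


def symmetric_eigenvalues(a, b, c):
    """Return (minimum, maximum) eigenvalue of the matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


def sampled_suprema(a, b, c, samples=3600):
    """Approximate sup |Av| and sup |v.Av| over Euclidean unit vectors v."""
    sup_map = 0.0
    sup_form = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x, y = math.cos(t), math.sin(t)
        ax, ay = a * x + b * y, b * x + c * y  # components of Av
        sup_map = max(sup_map, math.hypot(ax, ay))
        sup_form = max(sup_form, abs(x * ax + y * ay))
    return sup_map, sup_form


LAMBDA_MINUS, LAMBDA_PLUS = symmetric_eigenvalues(2.0, 1.0, -3.0)
LAMBDA = max(abs(LAMBDA_MINUS), abs(LAMBDA_PLUS))
SUP_MAP, SUP_FORM = sampled_suprema(2.0, 1.0, -3.0)
```

For this example the eigenvalues are approximately 2.19 and −3.19, so the bound is determined by the
minimum eigenvalue. Both sampled suprema approach the same value λ from below.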
10.9. Groups of linear transformations

[ Clearly distinguish between GL(V ) and GL(m, K). Make a correspondence between these in the case of a
finite-dimensional space V . The component version needs matrices. This will be in Section 11.7. ]
[ Present the linear space automorphism groups GL(IRn ) and so forth. Apparently GL(n, IR) is the group of
n × n matrices over the field IR, whereas GL(IRn ) is a subset of Lin(IRn , IRn ) = Hom(IRn , IRn ). ]
[ See Malliavin [36], page 184 for the basic classical groups. See also EDM2 [35], chapter 60. ]
[ Also make sure to include groups which preserve hyperbolic norm like the Minkowski and Lorentz groups.
Also deal with rational linear or projective groups? ]
10.10. Free linear spaces

Free linear spaces may be thought of as the external direct sum of a copy of a field K for each element of an
arbitrary set. Free linear spaces are useful as raw “uncut” spaces which may be sculpted into useful spaces
by constructing the quotient over an appropriate relation. For example, tensor products of linear spaces may
be defined in this way. (Tensor product spaces are defined in Chapter 13. The free linear space construction
method for tensor product spaces is presented in Section 13.12.)
[ Prove the above claim. That is, show that a free linear space is equivalent to an external direct sum ⊕α∈S Vα ,
where Vα = K for all α ∈ S. ]
10.10.1 Definition: The free linear space on a set $S$ over a field $K$ −< $(K, \sigma_K, \tau_K)$ is the tuple $V$ −< $(K, V, \sigma_K, \tau_K, \sigma_V, \mu)$, where
$$V = \bigl\{ f : S \to K;\ \#\{x \in S;\ f(x) \neq 0\} < \infty \bigr\},$$
σV is the pointwise addition operation in V , and µ : K × V → V is the pointwise scalar multiplication
operation by elements of K on V .
The standard immersion from the set S to the free linear space V on S is the map i : S → V defined as the
characteristic function i(x) = χ{x} : S → K for x ∈ S. (Thus i(x)(y) = δxy for all x, y ∈ S.) As informal
notation, x may be written for i(x).

10.10.2 Remark: See Definition 7.9.2 for the indicator function χ in Definition 10.10.1.
10.10.3 Remark: From the computational point of view, Definition 10.10.1 is easy to represent and operate
on. Because each element of the free linear space has only a finite set of non-zero components, it is possible
to represent an element of the space by a finite set of ordered pairs. Any vector f in the free linear space V
on the set S over K may be represented by the finite set
$$\bigl\{ (x, f(x)) \in f;\ f(x) \neq 0 \bigr\}.$$
Such a representation is easy to store and manipulate if the elements of S are easy to store and manipulate.
This is very close to the way symbolic algebra is implemented in computer software. Figure 10.10.1 illustrates
an element χ{u} + 2χ{v} + (−3/2)χ{w} with u = (3, 3), v = (1, −4) and w = (−4, −1) in the free linear space
on the Cartesian product set S = IR2 over the field K = IR.
[Figure 10.10.1: the points (u, 1), (v, 2) and (w, −3/2) plotted over the plane S = IR2, with the coefficient axis K = IR.]
Figure 10.10.1 χ{(3,3)} + 2χ{(1,−4)} + (−3/2)χ{(−4,−1)} in the free linear space on IR2 over IR
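The finite-set-of-pairs representation in Remark 10.10.3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the field K is approximated by exact rationals (`fractions.Fraction`), elements of S are assumed to be hashable values such as tuples, and the function names `normalize`, `immerse`, `add` and `scale` are hypothetical names chosen here, not notation from the text.

```python
from fractions import Fraction

def normalize(f):
    """Drop zero coefficients, so only the finite non-zero support is stored."""
    return {x: c for x, c in f.items() if c != 0}

def immerse(x):
    """Standard immersion i(x): the function equal to 1 at x and 0 elsewhere."""
    return {x: Fraction(1)}

def add(f, g):
    """Pointwise addition of two finitely supported functions S -> K."""
    h = dict(f)
    for x, c in g.items():
        h[x] = h.get(x, 0) + c
    return normalize(h)

def scale(k, f):
    """Pointwise scalar multiplication by an element k of K."""
    return normalize({x: k * c for x, c in f.items()})

# The element of Figure 10.10.1, with points of S = IR^2 stored as tuples.
u, v, w = (3, 3), (1, -4), (-4, -1)
elem = add(add(immerse(u), scale(2, immerse(v))),
           scale(Fraction(-3, 2), immerse(w)))
```

Storing only the non-zero coefficients means that the zero vector is the empty dictionary, and two vectors are equal exactly when their dictionaries are equal.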
10.10.4 Remark: Free spaces of any kind may be thought of as a kind of “symbolic algebra” such as
is offered by some computer software. Each element of a “free space” of any class represents a grouping
of symbols rather than a mathematical object. The intended mathematical object is obtained from the
“symbolic algebra” by imposing an equivalence relation and adding algebraic operations. In the example
of tensors, a symbolic expression such as “λ1 (v1 ⊗ w1 ) + λ2 (v2 ⊗ w2 )” may be represented as a sum of
characteristic functions such as λ1 χ{(v1 ,w1 )} + λ2 χ{(v2 ,w2 )} .
Whenever mathematicians or physicists write down juxtapositions of symbols without working out what
they represent mathematically, the concept of a “free” space can often save the situation. For instance,
when tensor products are written, they are usually written as v ⊗ w. To formalize this mathematically, one
may first create the “uncut space” or “free space” of all formal juxtapositions of vectors v and w in some
linear space, and then reduce the size of this free space by introducing equivalence relations. This approach
is also used for “free groups”, “free R-modules”, and so forth.
An example of interest to differential geometry is a free R-module of singular q-chains over a commutative
unitary ring R such as R = ℤ. In this case, such a free R-module is referred to as the space of “formal
linear combinations” of elements of some space. The base space for the free space of q-chains is the space of
q-simplices. (See Greenberg/Harper [114], page 44.)
10.10.5 Remark: Sparse matrices may be represented by lists in a similar way to free linear spaces. If a
matrix has very few non-zero entries, it is more efficient in computer software to represent matrices as lists of
the non-zero terms. Thus the representation of algebras (such as tensor products of linear spaces) in terms
of free linear spaces may be thought of as “sparse representations”.
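As a rough illustration of Remark 10.10.5, the following Python sketch stores a matrix as a dictionary of its non-zero entries, keyed by index pairs starting at 1. The helper names `sparse_from_dense` and `sparse_matvec` are hypothetical, and the dictionary-of-keys layout is only one of several possible sparse formats.

```python
def sparse_from_dense(rows):
    """Keep only the non-zero entries, keyed by 1-based (i, j) index pairs."""
    return {(i, j): a
            for i, row in enumerate(rows, start=1)
            for j, a in enumerate(row, start=1)
            if a != 0}

def sparse_matvec(entries, x, m):
    """Multiply a sparse matrix with m rows by a dense vector x (a list)."""
    y = [0] * m
    for (i, j), a in entries.items():
        y[i - 1] += a * x[j - 1]   # convert 1-based indices to list positions
    return y

dense = [[0, 0, 5],
         [0, 0, 0],
         [7, 0, 0]]
entries = sparse_from_dense(dense)
product = sparse_matvec(entries, [1, 2, 3], 3)
```

The work done by `sparse_matvec` is proportional to the number of stored entries rather than to the full size of the matrix.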
10.11. Exact sequences of linear maps

Exact sequences of linear maps are applicable to the definition of connection forms on principal fibre bundles
in Section 35.6. In algebraic topology, exact sequences of homomorphisms between modules over a ring are
used extensively.
[ See EDM2 [35] 277.E for exact sequences. Also see Hartley/Hawkes [116]. Also see Fulton [108], p.57, for
quotient modules and exact sequences. Also see Greenberg/Harper [114], section 14, page 75. ]
10.11.1 Definition: An exact sequence of linear maps (over a field $K$) is a pair $((V_i)_{i=1}^{n+1}, (f_i)_{i=1}^{n})$ of
sequences such that each $V_i$ is a linear space (over $K$), each $f_i$ is a linear map from $V_i$ to $V_{i+1}$, and
$\mathrm{Im}\, f_i = \ker f_{i+1}$ for $1 \le i \le n-1$; that is, the pair of sequences must satisfy
(i) $\forall i \in \mathbb{N}_{n+1}$, $V_i$ is a linear space (over $K$),
(ii) $\forall i \in \mathbb{N}_n$, $f_i \in \mathrm{Lin}(V_i, V_{i+1})$,
(iii) $\forall i \in \mathbb{N}_{n-1}$, $\mathrm{Im}\, f_i = \ker f_{i+1}$.
If K is not mentioned, it may be arbitrary or else equal to IR, depending on context.
10.11.2 Notation: The pair of sequences $((V_i)_{i=1}^{n+1}, (f_i)_{i=1}^{n})$ may be denoted
$$V_1 \xrightarrow{f_1} V_2 \xrightarrow{f_2} \cdots \xrightarrow{f_{n-1}} V_n \xrightarrow{f_n} V_{n+1}.$$
The trivial linear space {0} may be denoted 0 in such diagrams.

10.11.3 Theorem: Suppose $V_1 \xrightarrow{f_1} V_2 \cdots V_n \xrightarrow{f_n} V_{n+1}$ is an exact sequence of linear maps between finite
dimensional linear spaces. Then the following hold:
(i) For $1 \le k \le l \le n$,
$$\dim \mathrm{Im}\, f_l = \sum_{i=k}^{l} (-1)^{l-i} \dim V_i - (-1)^{l-k} \dim \ker f_k.$$
(ii) If $V_1 = 0$, then for $1 \le l \le n$,
$$\dim \mathrm{Im}\, f_l = \sum_{i=2}^{l} (-1)^{l-i} \dim V_i.$$
(iii) If $V_1 = 0$ and $V_{n+1} = 0$, then
$$\sum_{i=2}^{n} (-1)^{n-i} \dim V_i = 0.$$
That is, the sum of the dimensions of the odd-numbered spaces equals the sum of the dimensions of the even-numbered spaces.
Proof: Let $1 \le k \le n$. Then by Theorem 10.7.5, $\dim V_k = \dim \ker f_k + \dim \mathrm{Im}\, f_k$. That is, $\dim \mathrm{Im}\, f_k =
\dim V_k - \dim \ker f_k$. This establishes case $l = k$ of (i).
If $1 \le k < n$, then $\ker f_{k+1} = \mathrm{Im}\, f_k$ by exactness, and so
$$\begin{aligned}
\dim \mathrm{Im}\, f_{k+1} &= \dim V_{k+1} - \dim \ker f_{k+1} \\
&= \dim V_{k+1} - \dim \mathrm{Im}\, f_k \\
&= \dim V_{k+1} - (\dim V_k - \dim \ker f_k).
\end{aligned}$$
This is case $l = k + 1$ of (i). The result for general $k$ and $l$ follows by the inductive application of Theorem 10.7.5
and the exactness of the sequence.
Parts (ii) and (iii) follow immediately from (i).
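The dimension count in part (iii) of Theorem 10.11.3 can be checked numerically on a small example. The sketch below is an illustration only: the maps g(t) = (t, t, 0) and f(x, y, z) = (x − y, z) and their matrices `G` and `F` are assumptions chosen for this example, the field is taken to be the rationals, and `rank` and `matmul` are hypothetical helper names.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def matmul(a, b):
    """Product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Assumed example: 0 -> K^1 --g--> K^3 --f--> K^2 -> 0.
G = [[1], [1], [0]]            # matrix of g acting on column vectors
F = [[1, -1, 0], [0, 0, 1]]    # matrix of f acting on column vectors
dims = [0, 1, 3, 2, 0]         # dim V_1, ..., dim V_5, so n = 4

# Exactness at the middle space: Im g lies in ker f, and the dimensions agree.
FG = matmul(F, G)
exact_at_middle = (all(x == 0 for row in FG for x in row)
                   and rank(G) == 3 - rank(F))

n = len(dims) - 1
alternating = sum((-1) ** (n - i) * dims[i - 1] for i in range(2, n + 1))
```

Here `rank(G)` equals dim V_2 (so g is injective) and `rank(F)` equals dim V_4 (so f is surjective), which together with `exact_at_middle` confirms that the assumed sequence is exact.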
10.11.4 Theorem:
(i) If $V_1 \xrightarrow{f} V_2 \to 0$ is an exact sequence of linear maps, then $f$ is surjective, and hence $\dim V_1 \ge \dim V_2$.
(ii) If $0 \to V_1 \xrightarrow{g} V_2$ is an exact sequence of linear maps, then $g$ is injective, and hence $\dim V_1 \le \dim V_2$.
(iii) If $0 \to V_1 \xrightarrow{f} V_2 \to 0$ is an exact sequence of linear maps, then $f$ is a bijection, and hence $\dim V_1 = \dim V_2$.
(iv) If $0 \to V_1 \xrightarrow{g} V_2 \xrightarrow{f} V_3 \to 0$ is an exact sequence of linear maps, then $f$ is surjective, $g$ is injective,
$f \circ g$ is the zero map, and $\dim V_2 = \dim V_1 + \dim V_3$.
Proof: Statements (i), (ii) and (iii) are straightforward. Part (iv) follows from part (iii) of Theorem 10.11.3.
[ Define a complementary subspace. See EDM2 [35] 256.F and Simmons [140], page 193. ]

10.11.5 Theorem: Suppose $0 \to U \xrightarrow{g} V \xrightarrow{f} W \to 0$ is an exact sequence of linear maps. Then the
following hold:
(i) For any complementary subspace Q of ker f in V , there exist a unique ρ ∈ Lin(W, V ) and a unique
ω ∈ Lin(V, U ) such that
$$0 \longleftarrow U \xleftarrow{\ \omega\ } V \xleftarrow{\ \rho\ } W \longleftarrow 0$$
is an exact sequence of linear maps with ker ω = Q, and f ◦ ρ = idW and ω ◦ g = idU .
(ii) For any ρ ∈ Lin(W, V ) such that f ◦ ρ = idW , the space Q = Im ρ is complementary to ker f in V . Then
Q uniquely determines a map ω ∈ Lin(V, U ) such that ω ◦ g = idU and ker ω = Q.
(iii) For any ω ∈ Lin(V, U ) such that ω ◦ g = idU , the space Q = ker ω is complementary to ker f in V . Then
Q uniquely determines a map ρ ∈ Lin(W, V ) such that f ◦ ρ = idW and Im ρ = Q.
10.11.6 Remark: In summary, given the exact sequence $0 \to U \xrightarrow{g} V \xrightarrow{f} W \to 0$ of linear maps, there is
a one-to-one correspondence between the complementary subspaces Q of ker f in V , the maps ρ ∈ Lin(W, V )
such that f ◦ ρ = idW , and the maps ω ∈ Lin(V, U ) such that ω ◦ g = idU . In the application to connections
on a principal bundle, Q will be the set of horizontal vectors, ρ will be the lift function, and ω will be the
connection form.
[ What is the function h in the diagram in the following remark? ]
10.11.7 Remark: The following relations and Figure 10.11.1 summarize the assumptions and assertions
of Theorem 10.11.5:
f and ω are surjective
g and ρ are injective
dim V = dim U + dim W
V = ker f ⊕ Q
ker f = Im g
Q = ker ω = Im ρ
f ◦ ρ = idW
ω ◦ g = idU .
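[Figure 10.11.1: diagram of the sequence 0 ⇄ U ⇄ V ⇄ W ⇄ 0, with g : U → V and f : V → W labelling the forward arrows and ω : V → U and ρ : W → V labelling the reverse arrows. The subspace Q is drawn above the sequence with the labels h and idQ.]
Figure 10.11.1 Exact sequences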
If the vectors of ker f are termed vertical vectors and the vectors of Q are termed horizontal vectors, then
g ◦ ω is a projection of vectors in V onto the vertical vectors, and ρ ◦ f is a projection of vectors in V onto
the horizontal vectors. It follows that there is a further one-to-one correspondence in Theorem 10.11.5 with
the projection maps of V onto subspaces of V which are complementary to ker f .
Proof of theorem 10.11.5: [ The proof of this theorem should be taken from the proofs of Theorems
10.11.9 and 10.11.10. ]
10.11.8 Remark: The following two theorems deal with right inverses of surjective linear maps. A one-to-one
correspondence is demonstrated between such right inverses and the subspaces which are complementary to
the kernels of such maps.
$$V \underset{\rho}{\overset{f}{\rightleftarrows}} W \longrightarrow 0.$$
10.11.9 Theorem: Let V and W be linear spaces over the same field. Let f : V → W be a surjective
linear map. Let ρ : W → V be a linear map such that f ◦ ρ = idW . Then V = (ker f ) ⊕ ρ(W ).

Proof: Clearly ρ(W ) is a linear subspace of V . To show that (ker f ) ∪ ρ(W ) spans V , let v ∈ V and put
v′ = ρ ◦ f (v). Then
f (v − v′) = f (v) − f (ρ ◦ f (v))
= f (v) − idW ◦ f (v)
= 0.
So v − v′ ∈ ker f . But v′ ∈ ρ(W ). Hence v = (v − v′) + v′ ∈ (ker f ) + ρ(W ). To show that (ker f ) ∩ ρ(W ) = {0},
suppose v ∈ (ker f ) ∩ ρ(W ). Then v = ρ(w) for some w ∈ W . But w = idW (w) = f ◦ ρ(w) = f (v) = 0,
because v ∈ ker f . So v = ρ(0) = 0. So V = (ker f ) ⊕ ρ(W ).
10.11.10 Theorem: Let V and W be linear spaces over the same field. Let f : V → W be a linear map
onto W . Let Q be a linear subspace of V such that V = (ker f ) ⊕ Q. Then for all w ∈ W , there is a
unique v ∈ Q such that f (v) = w. Denote this function on W by ρ : W → V . Then ρ is linear, and is an
isomorphism of W onto Q. In particular, there exists a unique linear map ρ : W → V such that f ◦ ρ = idW
and ρ(W ) = Q.
Proof: Let w ∈ W . Since f is surjective, there is a v′ ∈ V such that f (v′) = w. There exist u ∈ ker f
and v ∈ Q such that v′ = u + v. Then f (v) = f (v′) − f (u) = f (v′) = w. To show that v is unique, suppose
v2 ∈ Q and f (v2 ) = w. Then v2 − v ∈ Q and f (v2 − v) = w − w = 0. So v2 − v ∈ (ker f ) ∩ Q, and therefore
v2 − v = 0. Thus the function ρ : W → Q of the theorem is well-defined.
To show that ρ is linear, let w1 , w2 ∈ W , and let v1 = ρ(w1 ) and v2 = ρ(w2 ). Then for any λ1 , λ2 in the field
of the linear spaces V and W , the vector λ1 v1 + λ2 v2 is in Q, and f (λ1 v1 + λ2 v2 ) = λ1 f (v1 ) + λ2 f (v2 ) =
λ1 w1 + λ2 w2 . Hence λ1 v1 + λ2 v2 = ρ(λ1 w1 + λ2 w2 ). That is, ρ is linear.
To show that ρ(W ) = Q, let v ∈ Q and w = f (v). Then ρ(w) ∈ Q and f (ρ(w)) = w, from the definition of ρ.
So v −ρ(w) ∈ Q and f (v −ρ(w)) = f (v)−f (ρ(w)) = 0, and therefore v −ρ(w) = 0, from the complementarity
of ker f and Q. Hence v ∈ ρ(W ) for all v ∈ Q. (This shows incidentally that the restriction of ρ ◦ f to Q equals idQ .)
The fact that f (ρ(w)) = w for any w ∈ W means that f ◦ ρ = idW . It follows that ρ is injective and is
therefore an isomorphism.
To show the uniqueness of a linear map ρ : W → V such that f ◦ ρ = idW and ρ(W ) = Q, let ρ1 and ρ2
be two such functions, and let w ∈ W . Then f (ρ1 (w) − ρ2 (w)) = w − w = 0 and ρ1 (w) − ρ2 (w) ∈ Q. So
ρ1 (w) − ρ2 (w) = 0 by the complementarity of ker f and Q. So ρ1 = ρ2 . This completes the proof of the
theorem.
10.11.11 Remark: Theorems 10.11.9 and 10.11.10 are partial converses of each other. They imply that
choosing a subspace complementary to the kernel of a linear map is equivalent to choosing a right inverse
for the map.
These theorems are especially applicable to the construction of a connection in a fibre bundle (in Definitions
35.5.3, 35.9.2, 35.9.4 and 35.6.3), where a right inverse to the projection map of the fibre bundle is sought.
A right inverse to the projection map is in fact a “lift” function, whereas a complementary subspace to
the kernel of the projection map turns out to be a subspace of “horizontal vectors”. Theorems 10.11.9
and 10.11.10 demonstrate that the same information is contained in either choice – whether one chooses a
lift function or a subspace of horizontal vectors.
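The decomposition used in the proof of Theorem 10.11.9 can be tried out on a concrete case. The following Python sketch is an illustration under assumptions: the surjective map f(x, y, z) = (x − y, z) and the right inverse ρ(a, b) = (a, 0, b) are chosen only for this example, and `apply` and `split` are hypothetical helper names.

```python
def apply(matrix, vec):
    """Apply a matrix (list of rows) to a component tuple (list)."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

F = [[1, -1, 0], [0, 0, 1]]     # assumed surjective map f : K^3 -> K^2
R = [[1, 0], [0, 0], [0, 1]]    # assumed right inverse rho : K^2 -> K^3

def split(v):
    """Write v as (v - rho(f(v))) + rho(f(v)), as in the proof of Theorem 10.11.9."""
    horizontal = apply(R, apply(F, v))               # component in rho(W)
    vertical = [a - b for a, b in zip(v, horizontal)]  # component in ker f
    return vertical, horizontal

vertical, horizontal = split([5, 2, 7])
```

A different right inverse of the same f would give a different horizontal subspace ρ(W), and therefore a different splitting of the same vector, which is the correspondence described in Remark 10.11.11.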
[ Near here have a section on topological linear spaces, Banach spaces and Hilbert spaces. ]


Chapter 11
Matrix algebra
11.1 Rectangular matrix algebra
11.2 Component matrices of linear maps
11.3 Square matrix algebra
11.4 Real square matrix algebra
11.5 Real symmetric matrix algebra
11.6 Real symmetric definite and semi-definite matrices
11.7 Matrix groups
The family tree in Figure 11.0.1 summarizes the main sets of matrices which are described in this chapter.
[Figure 11.0.1: family tree, from the most general set at the top to the most specific sets at the bottom:
rectangular matrix Mm,n (K)
  square matrix Mn,n (K); real rectangular matrix Mm,n (IR)
    real square matrix Mn,n (IR)
      real symmetric matrix Sym(n, IR)
        real positive semi-definite symmetric matrix Sym+0 (n, IR); real negative semi-definite symmetric matrix Sym−0 (n, IR)
          real positive definite symmetric matrix Sym+ (n, IR); real negative definite symmetric matrix Sym− (n, IR)]
Figure 11.0.1 Family tree of sets of matrices

11.0.1 Remark: According to Bell [191], page 379, matrix algebra was invented by Arthur Cayley. (See
Remark 10.1.8.)
11.1. Rectangular matrix algebra

Matrix spaces are not a generalization of linear spaces of tuples of elements of a field, strictly speaking.
Matrices are doubly indexed whereas tuple vectors are singly indexed. Tuple vectors arise naturally as
sequences of components of vectors in a linear space with respect to a basis. Rectangular matrices arise
naturally as the components of linear maps between linear spaces with respect to bases for the source and
target spaces.
A linear transformation from an n-dimensional linear space V to an m-dimensional linear space W may be
specified by a rectangular m × n matrix of components with respect to a basis on V and a basis on W .
For a fixed basis for each space, the association between linear maps and matrices is a bijection. In practical
applications, calculations with linear spaces and linear maps are usually performed in terms of vector
components and matrix algebra with respect to linear space bases.
The operations defined on rectangular matrices include matrix addition, scalar multiplication, and a limited
form of matrix multiplication. Most of the interesting operations, such as matrix inversion, the determinant
and trace functions and the calculation of eigenvalues and eigenvectors, require square matrices which are
defined in Section 11.3.
11.1.1 Definition: An m × n matrix over a field $K$ for $m, n \in \mathbb{Z}^+_0$ is an element of $K^{\mathbb{N}_m \times \mathbb{N}_n}$.
11.1.2 Remark: The notation $K^{\mathbb{N}_m \times \mathbb{N}_n}$ is inconvenient. The notation $K^{m \times n}$ is generally used instead.
The ordinal number $m = \{0, 1, \ldots m-1\}$ is clearly not the same as the set $\mathbb{N}_m = \{1, 2, \ldots m\}$. Mathematics,
computer programming and many other subjects are plagued by lack of standardization of the starting
number for sequences: 0 or 1. There seems to be no tidy cure for this problem. One must simply be ready
to convert between the representations when the need arises. It will be assumed here that the indices start
at 1 because that is the majority preference.
11.1.3 Notation: $M_{m,n}(K)$ for $m, n \in \mathbb{Z}^+_0$ and a field $K$ denotes the set of m × n matrices over $K$. That
is, $M_{m,n}(K) = K^{\mathbb{N}_m \times \mathbb{N}_n}$.
11.1.4 Notation: $M_{m,n}$ for $m, n \in \mathbb{Z}^+_0$ denotes the set $M_{m,n}(\mathbb{R})$.
11.1.5 Remark: The default field for matrices is chosen to be the real numbers in Notation 11.1.4, but in
some contexts the default field would be the complex numbers.
11.1.6 Remark: A matrix $A$ in the set $K^{\mathbb{N}_m \times \mathbb{N}_n}$ is a map $A : \mathbb{N}_m \times \mathbb{N}_n \to K$. A matrix may also be
notated as a doubly indexed sequence. Thus $A = (A(i,j))_{(i,j) \in \mathbb{N}_m \times \mathbb{N}_n} = (A_{ij})_{(i,j) \in \mathbb{N}_m \times \mathbb{N}_n} = (A_{ij})_{i=1,j=1}^{m,n}$.
It is customary to use lower-case letters for matrices when they have indices and upper-case when the indices
are absent. The purpose of the upper-case is to contrast matrices and vectors when used in the same
expression. When indices are used, the upper-case hint is not required because the reader can count the
indices to distinguish matrices from vectors. Thus it is usual to write $A = (a_{ij})_{i=1,j=1}^{m,n}$. If $m = n$, it is usual
to write $A = (a_{ij})_{i,j=1}^{n}$.
Square brackets may also be used. Thus $A = [a_{ij}]_{i,j=1}^{m,n}$. Maybe the rectangularity of the brackets is supposed
to suggest the rectangularity of the matrix. Likewise, when the elements are fully written out in rows and
columns, the brackets may be round or rectangular. Thus
$$A = (a_{ij})_{i,j=1}^{m,n} = [a_{ij}]_{i,j=1}^{m,n} =
\begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{pmatrix} =
\begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{bmatrix}.$$
The rectangular brackets are preferable because they look nicer.
11.1.7 Definition: The linear space of m × n matrices over a field $K$ for $m, n \in \mathbb{Z}^+_0$ is the set of matrices
$M_{m,n}(K)$ together with the following operations.
(i) Addition $\sigma : M_{m,n}(K) \times M_{m,n}(K) \to M_{m,n}(K)$, written as $\sigma(A, B) = A + B$, defined by
$$\forall A, B \in M_{m,n}(K),\ \forall i \in \mathbb{N}_m,\ \forall j \in \mathbb{N}_n,\quad (A + B)(i, j) = A(i, j) + B(i, j).$$
(ii) Scalar multiplication $\mu : K \times M_{m,n}(K) \to M_{m,n}(K)$, written as $\mu(k, A) = kA$, defined by
$$\forall k \in K,\ \forall A \in M_{m,n}(K),\ \forall i \in \mathbb{N}_m,\ \forall j \in \mathbb{N}_n,\quad (kA)(i, j) = kA(i, j).$$
11.1.8 Remark: The formal linear space Definition 10.1.2 requires a tuple (K, Mm,n (K), σK , τK , σ, µ) with
field operations σK and τK for K. Definition 11.1.7 is clearer without such formality.

11.1.9 Remark: There are two obvious natural isomorphisms from the spaces of tuples of elements of
a field K to the matrix spaces over K. These isomorphisms are the maps φc : K m → Mm,1 (K) and
φr : K n → M1,n (K) in Definitions 11.1.10 and 11.1.11. These define respectively the column and row
representations of the spaces K m and K n . (The column vector map and row vector map are summarized in
Figure 11.1.1.)
[Figure 11.1.1: the tuple vector (v1 , v2 , . . . vn ) in K n is mapped by φc to the column vector with entries v1 , v2 , . . . vn in Mn,1 (K), and by φr to the row vector [ v1 v2 . . . vn ] in M1,n (K).]
Figure 11.1.1 Column vector map and row vector map
11.1.10 Definition: The column vector map for a linear space $K^m$ over a field $K$ for $m \in \mathbb{Z}^+_0$ is the map
$\phi_c : K^m \to M_{m,1}(K)$ defined by:
$$\forall v \in K^m,\ \forall i \in \mathbb{N}_m,\quad \phi_c(v)(i, 1) = v_i.$$
The matrix $\phi_c(v)$ is called the column vector for $v$.
11.1.11 Definition: The row vector map for a linear space $K^n$ over a field $K$ for $n \in \mathbb{Z}^+_0$ is the map
$\phi_r : K^n \to M_{1,n}(K)$ defined by:
$$\forall v \in K^n,\ \forall j \in \mathbb{N}_n,\quad \phi_r(v)(1, j) = v_j.$$
The matrix $\phi_r(v)$ is called the row vector for $v$.
11.1.12 Definition: The transpose of a matrix $A \in M_{m,n}(K)$ for a field $K$ and $m, n \in \mathbb{Z}^+_0$ is the matrix
$B \in M_{n,m}(K)$ defined by $B(j, i) = A(i, j)$ for all $i \in \mathbb{N}_m$ and $j \in \mathbb{N}_n$.
11.1.13 Notation: $A^T$ denotes the transpose of $A \in M_{m,n}(K)$ for a field $K$ and $m, n \in \mathbb{Z}^+_0$.
11.1.14 Remark: The notation tA for the transpose of a matrix A is also popular. However, a mixture
of prefix and postfix superscripts (or subscripts) can be confusing. Therefore only postfix superscripts and
subscripts are used in this book.
11.1.15 Remark: Clearly the transpose of the transpose of a matrix is the original matrix. That is,
$(A^T)^T = A$ for all $A \in M_{m,n}(K)$ for any field $K$ and $m, n \in \mathbb{Z}^+_0$.
A matrix space Mm,n (K) is not closed under the transpose operation unless m = n. The transpose of a
column vector X ∈ Mm,1 (K) is a row vector X T ∈ M1,m (K). The transpose of a row vector Y ∈ M1,n (K)
is a column vector Y T ∈ Mn,1 (K).
It is readily verified that the transpose operation commutes with both addition and scalar multiplication on
a linear space of matrices. In other words, (A + B)T = AT + B T and (λA)T = λAT for A, B ∈ Mm,n (K)
and λ ∈ K.
11.1.16 Definition: The matrix product of an ordered pair of matrices $A \in M_{m,p}(K)$ and $B \in M_{p,n}(K)$
for a field $K$ and $m, n, p \in \mathbb{Z}^+_0$ is the matrix $C \in M_{m,n}(K)$ defined by
$$\forall i \in \mathbb{N}_m,\ \forall j \in \mathbb{N}_n,\quad C(i, j) = \sum_{k=1}^{p} A(i, k) B(k, j).$$
11.1.17 Notation: $AB$ denotes the matrix product of matrices $A \in M_{m,p}(K)$ and $B \in M_{p,n}(K)$ for a
field $K$ and $m, n, p \in \mathbb{Z}^+_0$.

11.1.18 Remark: The matrix product is a map $\tau : M_{m,p}(K) \times M_{p,n}(K) \to M_{m,n}(K)$. This is not closed
in any nice way. The operation can be made closed by embedding the spaces $M_{m,n}(K)$ in an infinite space
$M_{\infty,\infty}(K) = K^{\mathbb{N} \times \mathbb{N}}$. Then the product $AB$ is calculated as
$$\forall i, j \in \mathbb{N},\quad (AB)(i, j) = \sum_{k=1}^{\infty} A(i, k) B(k, j). \tag{11.1.1}$$
The general matrices $A \in M_{m,p}(K)$ and $B \in M_{q,n}(K)$ for a field $K$ and $m, n, p, q \in \mathbb{Z}^+_0$ may be multiplied
to give $AB \in M_{m,n}(K)$ defined by
$$\forall i \in \mathbb{N}_m,\ \forall j \in \mathbb{N}_n,\quad (AB)(i, j) = \sum_{k=1}^{\min(p,q)} A(i, k) B(k, j).$$
This calculation gives the same result as (11.1.1), but it is probably not very useful. Closure of the matrix
product is, of course, guaranteed in the case of square matrices.
The infinite matrix idea also makes possible the addition of arbitrary matrices $A \in M_{m_1,n_1}(K)$ and $B \in
M_{m_2,n_2}(K)$ for a field $K$ and $m_1, n_1, m_2, n_2 \in \mathbb{Z}^+_0$. A matrix sum $A + B \in M_{m_3,n_3}(K)$ may be defined for
$m_3 = \max(m_1, m_2)$ and $n_3 = \max(n_1, n_2)$ by
$$\forall i \in \mathbb{N}_{m_3},\ \forall j \in \mathbb{N}_{n_3},\quad (A + B)(i, j) = A(i, j) + B(i, j),$$
where $0_K$ is assumed for each right-hand term if it is undefined. As in the case of the generalized matrix
product, this generalized matrix addition is of dubious practical value.
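11.1.19 Theorem: For any field $K$, $\forall m, n, p \in \mathbb{Z}^+_0$, $\forall A \in M_{m,p}(K)$, $\forall B \in M_{p,n}(K)$, $(AB)^T = B^T A^T$.
Proof: Suppose $m, n, p \in \mathbb{Z}^+_0$, $A \in M_{m,p}(K)$ and $B \in M_{p,n}(K)$ for some field $K$. Then for all $i \in \mathbb{N}_n$
and $j \in \mathbb{N}_m$,
$$((AB)^T)_{ij} = (AB)_{ji} = \sum_{k=1}^{p} a_{jk} b_{ki} = \sum_{k=1}^{p} b_{ki} a_{jk} = \sum_{k=1}^{p} (B^T)_{ik} (A^T)_{kj} = (B^T A^T)_{ij}$$
as claimed.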
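Definitions 11.1.12 and 11.1.16 and Theorem 11.1.19 can be illustrated with a short Python sketch. It is only a sketch under assumptions: matrices are stored as lists of rows, Python list positions start at 0 whereas the indices in the text start at 1, and the names `transpose` and `matmul` and the sample matrices are hypothetical.

```python
def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Matrix product of an m x p matrix a and a p x n matrix b."""
    p = len(b)
    if any(len(row) != p for row in a):
        raise ValueError("inner dimensions must agree")
    return [[sum(a[i][k] * b[k][j] for k in range(p))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[1, 2, 3],
     [4, 5, 6]]            # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]               # 3 x 2

lhs = transpose(matmul(A, B))              # (AB)^T
rhs = matmul(transpose(B), transpose(A))   # B^T A^T
```

Note that `matmul(transpose(A), transpose(B))` is also defined for these shapes, but it is a 3 × 3 matrix, so the order of the factors on the right-hand side of the theorem matters.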
11.1.20 Definition: The identity matrix in a matrix space $M_{n,n}(K)$ for a field $K$ and $n \in \mathbb{Z}^+_0$ is the
matrix $A \in M_{n,n}(K)$ defined by $A(i, j) = \delta_{ij}$ for all $i, j \in \mathbb{N}_n$.
11.1.21 Notation: $I_n$ denotes the identity matrix in $M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$.
11.1.22 Remark: See Definition 7.9.10 and Notation 7.9.11 for the Kronecker delta symbol δij in Defini-
tion 11.1.20. Notation 11.1.21 is not very unique or specific because the letter I is used for many different
purposes. Its meaning is usually made clear in context.
It is noteworthy that the identity matrix is well defined for any field K because every field has a unique zero
and unity element. (As mentioned in Remark 9.8.3, the zero and unity element are unique even in a unitary
ring. A field is a sub-species of the unitary rings.)
The Kronecker symbol δij is defined in this context to have entries which are either the zero or unity element
of the field K.
11.1.23 Theorem: Let $A \in M_{m,n}(K)$ be a matrix over a field $K$ with $m, n \in \mathbb{Z}^+_0$.
Then $I_m A = A I_n = A$.
11.1.24 Definition: A left inverse of a matrix $A \in M_{m,n}(K)$ for a field $K$ and $m, n \in \mathbb{Z}^+_0$ with $m \ge n$ is
a matrix $B \in M_{n,m}(K)$ such that $BA = I_n$.
A right inverse of a matrix $A \in M_{m,n}(K)$ for a field $K$ and $m, n \in \mathbb{Z}^+_0$ with $m \le n$ is a matrix $B \in M_{n,m}(K)$
such that $AB = I_m$.
11.1.25 Definition: An orthogonal matrix in a set of matrices $M_{m,n}(K)$ for $m, n \in \mathbb{Z}^+_0$ and a field $K$ is a
matrix $A \in M_{m,n}(K)$ which satisfies $A A^T = I_m$.
11.1.26 Remark: In most texts, orthogonality is defined only for square matrices, although orthogonal
matrices are well-defined for general rectangular matrices. However, there are no orthogonal matrices in the
case m > n in Definition 11.1.25.
11.1.27 Theorem: Let $A \in M_{m,n}(K)$ with $m, n \in \mathbb{Z}^+_0$ for some field $K$. If $m > n$ then $A$ is not orthogonal.
[ Define row rank and column rank of rectangular matrices. Also define nullity. Define the determinant of a
rectangular matrix as a matrix. Define cofactors of elements of a matrix. These definitions make much more
sense after the determinant of a square matrix has been defined. So maybe delay these topics? ]
11.1.28 Remark: In the olden days, matrix algebra was performed by hand with pen (or pencil) on paper.
Hand-written matrix algebra is most convenient when the vector components are depicted as column vectors.
I.e. the elements of the n-tuples are listed in increasing order down the page. This gives the best layout
when such a vector is multiplied by a matrix as in the following.
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} =
\begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
The use of vertical (i.e. column) vectors keeps the handwritten calculations compact in the horizontal
direction. A further advantage of column vectors is that the composition of multiple linear transformations
gives the same order for matrices. Thus if y = f (x) = Ax and z = g(y) = By for matrices A and B,
the composition g(f (x)) equals BAx. Thus in both notations, the order is the reverse of the temporal (or
causal) order which intuitively underlies the composition of functions. An m × n matrix over a field K is
used for linear maps $f : K^n \to K^m$, which has m and n in the reverse order. An advantage of multiplying
a vector by a matrix from the left is that the indices are contiguous. For example, $y_i = \sum_{j=1}^{n} a_{ij} x_j$. The
two instances of the index j are close to each other.
In Markov chain theory, the reverse convention is adopted. There the vectors are row vectors, which makes the
matrix multiplication order the reverse of the standard function composition notation order. The advantage
in this case is that the matrix multiplication order matches the temporal order since the initial state of the
system is a row vector which is multiplied on the right by state transition matrices. For example, in the
equation $w_k = \sum_{i=1}^{n} \sum_{j=1}^{n} v_i P_{ij} Q_{jk}$, the initial state is v and one or more matrices on the right modify the state,
first P then Q.
11.1.29 Remark: Theorem 11.1.30 is a generalized multinomial formula. For m = n = 2, this theorem
states that (a11 + a12 )(a21 + a22 ) = a11 a21 + a11 a22 + a12 a21 + a12 a22 .
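11.1.30 Theorem: Let $A \in M_{m,n}(K)$ be a matrix over a field $K$ with $m, n \in \mathbb{Z}^+_0$.
Then
$$\prod_{i=1}^{m} \sum_{j=1}^{n} a_{ij} = \sum_{J \in \mathbb{N}_n^{\mathbb{N}_m}} \prod_{i=1}^{m} a_{i,J(i)},$$
where the sum on the right is over all functions $J : \mathbb{N}_m \to \mathbb{N}_n$.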
Proof: This theorem may be proved by a double application of induction to the distributive law for a
field. (See Exercise 46.5.2 for proof.)
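The generalized multinomial formula of Theorem 11.1.30 can be checked numerically for small matrices. In the Python sketch below, each tuple `J` produced by `itertools.product` plays the role of a function from row indices to column indices. The function names and the sample matrix are hypothetical, the entries are integers rather than elements of a general field, and at least one row is assumed.

```python
from itertools import product
from math import prod

def product_of_row_sums(a):
    """Left-hand side: the product over rows i of the sum over columns j of a[i][j]."""
    return prod(sum(row) for row in a)

def sum_over_choice_functions(a):
    """Right-hand side: sum over all J (one column per row) of the product of a[i][J[i]]."""
    m, n = len(a), len(a[0])
    return sum(prod(a[i][J[i]] for i in range(m))
               for J in product(range(n), repeat=m))

A = [[1, 2, 3],
     [4, 5, 6]]
lhs = product_of_row_sums(A)
rhs = sum_over_choice_functions(A)
```

For an m × n matrix the right-hand side has n to the power m terms, so this kind of check is only practical for small m and n.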
11.2. Component matrices of linear maps

The principal application of rectangular matrix algebra is to linear maps between finite-dimensional linear
spaces. This section presents the correspondence between linear maps and matrix algebra. The principal
property of the component matrices of linear maps is given in Theorem 11.2.5, namely that the matrix of a
composition of linear maps is the product of the matrices of the linear maps. Therefore calculations may be
performed on linear maps in terms of their component matrices.
Linear maps are often specified in terms of component matrices. The principal drawback of component
matrices is the fact that they depend on the choice of basis for both the source and target space. Conversion
between matrices with respect to different bases is tedious and error-prone.
The decision to work with bases and matrices or directly with linear spaces and linear maps is a trade-
off between their advantages and disadvantages in each application context. Differential geometers who

proclaim the virtues of “coordinate-free” methods are advocating the avoidance of bases and matrices.
The “coordinates” they refer to are the components of vectors and matrices (and tensors) with respect to
particular choices of bases. Abstract theorems can mostly be written in a coordinate-free style, but practical
calculations mostly require the use of vector component tuples and matrices with respect to bases.
Matrices are also useful for representing bilinear maps with respect to a linear space basis. This is discussed
in Remark 13.2.14.
11.2.1 Definition: The (component) matrix of a linear map $\phi : V \to W$ with respect to bases $(e_j)_{j=1}^{n} \in
V^n$ and $(f_i)_{i=1}^{m} \in W^m$ for linear spaces $V$ and $W$ over a field $K$ is the matrix $\alpha \in M_{m,n}(K)$ such that
$\phi(e_j) = \sum_{i=1}^{m} f_i \alpha_{ij}$ for all $j \in \mathbb{N}_n$.
11.2.2 Remark: The matrix $\alpha$ in Definition 11.2.1 is well defined because by definition of a basis, each
vector $\phi(e_j)$ has a unique m-tuple $(\alpha_{ij})_{i=1}^{m} \in K^m$ of components with respect to the basis $(f_i)_{i=1}^{m}$. In terms
of the component map in Definition 10.2.14, $\alpha_{ij} = \kappa_W(\phi(e_j))_i$, where $\kappa_W : W \to K^m$ is the component map
with respect to the basis $(f_i)_{i=1}^{m}$.
11.2.3 Definition: The linear map for a (component) matrix $\alpha \in M_{m,n}(K)$ with respect to bases
$(e_j)_{j=1}^{n} \in V^n$ and $(f_i)_{i=1}^{m} \in W^m$ for linear spaces $V$ and $W$ over a field $K$ is the linear map given by
$\phi : v \mapsto \sum_{i=1}^{m} \sum_{j=1}^{n} f_i \alpha_{ij} \kappa_V(v)_j$ for $v \in V$, where $\kappa_V : V \to K^n$ is the component map for the basis $(e_j)_{j=1}^{n}$.
11.2.4 Remark: Definition 11.2.3 specifies a unique linear map φ ∈ Lin(V, W ) for each matrix α ∈
Mm,n (K) for given bases for V and W . Conversely, Definition 11.2.1 specifies a unique matrix α ∈ Mm,n (K)
for each linear map φ ∈ Lin(V, W ) for the same bases. This establishes a bijection between the linear maps
and component matrices. Thus one may freely choose whether to work “coordinate-free”, in terms of linear
maps, or “using coordinates”, in terms of matrices.
11.2.5 Theorem: Let $U, V, W$ be linear spaces over $K$ with bases $a \in U^\ell$, $b \in V^m$ and $c \in W^n$ respectively.
Let $\phi \in \mathrm{Hom}(U, V)$ and $\psi \in \mathrm{Hom}(V, W)$, with matrices $\alpha \in M_{m,\ell}(K)$ and $\beta \in M_{n,m}(K)$ respectively. Then
the matrix of $\psi \circ \phi$ is the product matrix $\beta\alpha$.
11.2.6 Remark: Whereas the basis vectors are transformed by multiplying the sequence of basis vectors
on the right by the matrix α in Definition 11.2.1, component tuples are multiplied on the left by α.
Let $v = \sum_{j=1}^{n} v_j e_j$ be a vector in $V$ with components $(v_j)_{j=1}^{n} \in K^n$. (Here $v$ denotes both the vector and
its component tuple.) Let $\phi : V \to W$ be the linear map for the component matrix $\alpha \in M_{m,n}(K)$. Then
by Definition 11.2.3, $\phi(v) = \sum_{i=1}^{m} \sum_{j=1}^{n} f_i \alpha_{ij} v_j$. So the component tuple for $\phi(v)$ with respect to the basis
$(f_i)_{i=1}^{m} \in W^m$ is $(w_i)_{i=1}^{m}$, where $w_i = \sum_{j=1}^{n} \alpha_{ij} v_j$ for $i \in \mathbb{N}_m$. This is multiplication of the component tuple
of $v$ on the left by the matrix $\alpha$. This is illustrated in Figure 11.2.1.
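[Figure 11.2.1: diagram with two rows. Bottom row: the linear map φ : V → W between the linear space V with basis (ej ) and the linear space W with basis (fi ). Top row: matrix multiplication from the tuple space K n to the tuple space K m , taking (vj ) to (wi ) with wi equal to the sum over j of αij vj . The two rows are joined by the component maps κV : V → K n and κW : W → K m .]
Figure 11.2.1 Matrix multiplication corresponding to a linear map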
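The rule that component tuples are multiplied on the left by the component matrix can be tried out in Python. The sketch below works directly with component tuples, which amounts to assuming the tuple spaces with their standard bases. The matrices `alpha` and `beta` and the helper names `apply` and `matmul` are hypothetical. It also checks that applying two maps in succession corresponds to a single multiplication by the product matrix with the last-applied map's matrix on the left, as in the BAx example of Remark 11.1.28.

```python
def apply(alpha, v):
    """Component tuple w of the image: w_i is the sum over j of alpha_ij v_j."""
    return [sum(a * x for a, x in zip(row, v)) for row in alpha]

def matmul(a, b):
    """Matrix product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

alpha = [[1, 2],
         [0, 1],
         [3, 0]]        # 3 x 2 matrix: components of a map from K^2 to K^3
beta = [[1, 0, 1],
        [2, 1, 0]]      # 2 x 3 matrix: components of a map from K^3 to K^2

v = [1, 1]
step_by_step = apply(beta, apply(alpha, v))   # first alpha, then beta
in_one_go = apply(matmul(beta, alpha), v)     # a single multiplication
```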
[ Present the special case of linear spaces V = K n and W = K m . ]

[ Should also discuss matrices for changes of basis in a single space. ]

[ Present the relation between the matrix transpose and the transpose of a linear map. See Definition 10.5.23. ]
[ Maybe have a new section for the inner product component matrices which are mentioned in Remark 11.2.7. ]
11.2.7 Remark: Since an inner product on a linear space over the real number field is a symmetric bilinear
map on the linear space, it is possible to represent the inner product as a matrix multiplied by the components
of vectors with respect to a basis.
11.3. Square matrix algebra

Whereas general rectangular matrices correspond to general linear homomorphisms φ : V → W , square
matrices are required for the components of linear endomorphisms φ : V → V . The component matrices of
linear automorphisms are invertible square matrices.
11.3.1 Notation: $M_n(K)$ denotes $M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$.
11.3.2 Definition: The trace of a matrix $A = (a_{ij})_{i,j=1}^{n} \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$ is the sum $\sum_{i=1}^{n} a_{ii}$.
11.3.3 Notation: $\mathrm{Tr}(A)$ denotes the trace of a matrix $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$.
11.3.4 Theorem: The trace of square matrices has the following properties for all $n \in \mathbb{Z}^+_0$ and fields $K$.
(i) ∀λ ∈ K, ∀A ∈ Mn,n (K), Tr(λA) = λ Tr(A).
(ii) ∀A, B ∈ Mn,n (K), Tr(A + B) = Tr(A) + Tr(B).
(iii) ∀A ∈ Mn,n (K), Tr(AT ) = Tr(A).
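The three properties of Theorem 11.3.4 can be spot-checked with exact rational arithmetic. The following is only a sketch: the helper names (`trace`, `transpose`, `add`, `scale`) and the sample matrices are assumptions made for illustration, and the rational numbers stand in for a general field K.

```python
from fractions import Fraction


def trace(a):
    """Sum of the diagonal elements of a square matrix given as a list of rows."""
    return sum(a[i][i] for i in range(len(a)))


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def add(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(lam, a):
    """Scalar multiple of a matrix."""
    return [[lam * x for x in row] for row in a]


A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
B = [[Fraction(0), Fraction(-1)], [Fraction(5), Fraction(1, 2)]]
lam = Fraction(3, 7)

assert trace(scale(lam, A)) == lam * trace(A)      # part (i)
assert trace(add(A, B)) == trace(A) + trace(B)     # part (ii)
assert trace(transpose(A)) == trace(A)             # part (iii)
assert trace([]) == 0                              # the empty sum for n = 0
```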
11.3.5 Remark: Determinants are defined in terms of permutations and parity, which are defined in
Section 7.10. The set $\mathrm{perm}(\mathbb{N}_n)$ is the set of $n!$ permutations of $\mathbb{N}_n = \{1, 2, \ldots n\}$.
11.3.6 Definition: The determinant of a matrix $A = (a_{ij})_{i,j=1}^n \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and field $K$ is the
element of $K$ given by $\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \bigl( \mathrm{parity}(P) \prod_{i=1}^n a_{i,P(i)} \bigr)$.
11.3.7 Notation: $\det(A)$ denotes the determinant of a matrix $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$.
$$\det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P(i)}.$$
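The permutation-sum formula of Definition 11.3.6 translates directly into code. The sketch below is an illustration under stated assumptions: permutations are represented as 0-based tuples, and the parity is computed by counting inversions, which is one standard way to obtain it (the text itself defers the definition of parity to Section 7.10). The function names are hypothetical.

```python
from itertools import permutations


def parity(p):
    """Parity (+1 or -1) of a permutation given as a tuple of 0-based images,
    computed by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det(a):
    """Determinant of a square matrix by the permutation-sum formula."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = parity(p)
        for i in range(n):
            term *= a[i][p[i]]
        total += term
    return total


print(det([[1, 2], [3, 4]]))  # -2
```

The cost grows like n!, so this form is useful for checking definitions on small matrices rather than for computation. For n = 0 the sum has the single empty permutation as its only term, so `det([])` returns 1.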
11.3.8 Remark: If the field $K$ in Definition 11.3.6 is $\mathbb{Z}_2 = \{0, 1\}$, the parity values $1$ and $-1$ are the same.
The matrix $A$ has only zeros and ones as elements. Since the determinant can only be zero or one, it is
interesting to ask what kinds of matrices have the non-zero determinant value. The diagonal matrix with
$a_{ij} = \delta_{ij}$ for $i, j \in \mathbb{N}_n$ certainly has $\det(A) = 1$. It is also clear that $\det(A) = 1$ for any lower triangular
matrix with $a_{ij} = 1$ for $i = j$ and $a_{ij} = 0$ for $i < j$. (This is true for any field.)
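11.3.9 Theorem: Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$. Then $\det(A^T) = \det(A)$.
Proof: For any $n \in \mathbb{Z}^+_0$, the set $\{P^{-1};\ P \in \mathrm{perm}(\mathbb{N}_n)\}$ is equal to $\mathrm{perm}(\mathbb{N}_n)$. This is because
$P \in \mathrm{perm}(\mathbb{N}_n)$ if and only if $P^{-1} \in \mathrm{perm}(\mathbb{N}_n)$. Since $\mathrm{parity}(P^{-1}) = \mathrm{parity}(P)$, it follows that
$$\det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P^{-1}(i)}.$$
But $\prod_{i=1}^n a_{i,P^{-1}(i)} = \prod_{i=1}^n a_{P(i),P^{-1}(P(i))} = \prod_{i=1}^n a_{P(i),i}$ for any $P \in \mathrm{perm}(\mathbb{N}_n)$ because a permutation of
the factors in a product in a field does not affect the value of the product. (According to Definition 9.8.8,
the product operation of a field is commutative.) Therefore
$$\det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{P(i),i},$$
which equals $\det(A^T)$.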

11.3.10 Theorem: Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$. Then $\det(\lambda A) = \lambda^n \det(A)$ for $\lambda \in K$.
11.3.11 Remark: For Theorem 11.3.13, it is useful to first prove Lemma 11.3.12. The parity function is
extended by $\mathrm{parity}(Q) = 0$ for functions $Q : \mathbb{N}_n \to \mathbb{N}_n$ which are not permutations (i.e. not bijections).
[ Try to find a shorter proof for Lemma 11.3.12. It can’t be so difficult! ]
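11.3.12 Lemma: Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$. Let $Q : \mathbb{N}_n \to \mathbb{N}_n$. Then
$$\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)} = \mathrm{parity}(Q) \det(A). \tag{11.3.1}$$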
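Proof: Suppose first that $Q$ is not a permutation of $\mathbb{N}_n$. (This implies, incidentally, that $n \ge 2$.) Then
$Q(k) = Q(\ell)$ for some $k, \ell \in \mathbb{N}_n$ with $k \ne \ell$. Let $S \in \mathrm{perm}(\mathbb{N}_n)$ be the permutation which swaps $k$ and $\ell$.
(That is, $S(k) = \ell$ and $S(\ell) = k$; otherwise $S(i) = i$.) Then $Q \circ S = Q \circ S^{-1} = Q$ and $\mathrm{parity}(S) = -1$.
Substitute $P = T \circ S$ in the left-hand side of (11.3.1) for permutations $T \in \mathrm{perm}(\mathbb{N}_n)$. Then
$$\begin{aligned}
\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)}
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T \circ S) \prod_{i=1}^n a_{Q(i),TS(i)} \\
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T)\, \mathrm{parity}(S) \prod_{i=1}^n a_{QS^{-1}(i),T(i)} \\
&= \mathrm{parity}(S) \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T) \prod_{i=1}^n a_{Q(i),T(i)} \\
&= 0,
\end{aligned}$$
because $x = -x$ for $x \in K$ implies that $x = 0$. [ The Lemma must be true for a field which contains an
element $x$ with $x + x = 0$. Must find a different kind of proof in this case? ] Since $\mathrm{parity}(Q) = 0$ for a
non-permutation, the Lemma is verified in this case.
If $Q$ is a permutation of $\mathbb{N}_n$, substitution of $P = T \circ Q$ in (11.3.1) yields
$$\begin{aligned}
\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)}
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T \circ Q) \prod_{i=1}^n a_{Q(i),TQ(i)} \\
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T)\, \mathrm{parity}(Q) \prod_{i=1}^n a_{QQ^{-1}(i),T(i)} \\
&= \mathrm{parity}(Q) \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T) \prod_{i=1}^n a_{i,T(i)} \\
&= \mathrm{parity}(Q) \det(A),
\end{aligned}$$
which is as claimed.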
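11.3.13 Theorem: Let $A, B \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ and a field $K$. Then $\det(AB) = \det(A) \det(B)$.
Proof: Let $N_n = (\mathbb{N}_n)^n$ (the set of all maps from $\mathbb{N}_n$ to $\mathbb{N}_n$) and $P_n = \mathrm{perm}(\mathbb{N}_n) \subseteq N_n$. Then
$$\begin{aligned}
\det(AB)
&= \sum_{P \in P_n} \mathrm{parity}(P) \prod_{i=1}^n \Bigl( \sum_{k=1}^n a_{ik} b_{k,P(i)} \Bigr) \\
&= \sum_{P \in P_n} \mathrm{parity}(P) \sum_{Q \in N_n} \Bigl( \prod_{i=1}^n a_{i,Q(i)} b_{Q(i),P(i)} \Bigr) && (11.3.2) \\
&= \sum_{Q \in N_n} \sum_{P \in P_n} \mathrm{parity}(P) \prod_{i=1}^n a_{i,Q(i)} b_{Q(i),P(i)} \\
&= \sum_{Q \in N_n} \Bigl( \prod_{i=1}^n a_{i,Q(i)} \Bigr) \sum_{P \in P_n} \mathrm{parity}(P) \prod_{j=1}^n b_{Q(j),P(j)} \\
&= \sum_{Q \in N_n} \Bigl( \prod_{i=1}^n a_{i,Q(i)} \Bigr) \mathrm{parity}(Q) \sum_{P \in P_n} \mathrm{parity}(P) \prod_{j=1}^n b_{j,P(j)} && (11.3.3) \\
&= \Bigl( \sum_{Q \in P_n} \mathrm{parity}(Q) \prod_{i=1}^n a_{i,Q(i)} \Bigr) \Bigl( \sum_{P \in P_n} \mathrm{parity}(P) \prod_{j=1}^n b_{j,P(j)} \Bigr) \\
&= \det(A) \det(B).
\end{aligned}$$
Line (11.3.2) follows from Theorem 11.1.30. Line (11.3.3) follows from Lemma 11.3.12. In the line after (11.3.3),
the sum over $Q \in N_n$ may be restricted to $Q \in P_n$ because $\mathrm{parity}(Q) = 0$ if $Q$ is not a permutation.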
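Lemma 11.3.12 and Theorem 11.3.13 can both be checked exhaustively for a small case. The sketch below makes some assumptions for illustration: indices are 0-based, a map Q is a tuple of images, the extended parity is computed by counting inversions, and the names `signed_sum`, `det` and `matmul` are hypothetical. Integer entries are used so that all arithmetic is exact.

```python
from itertools import permutations, product


def parity(q):
    """Extended parity of a map q on {0, ..., n-1} given as a tuple of images:
    +1 or -1 for a permutation (by counting inversions), 0 for a non-bijection."""
    n = len(q)
    if sorted(q) != list(range(n)):
        return 0
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if q[i] > q[j])
    return -1 if inversions % 2 else 1


def signed_sum(a, q):
    """Left-hand side of (11.3.1): the sum over permutations P of
    parity(P) times the product over i of a[q[i]][P[i]]."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = parity(p)
        for i in range(n):
            term *= a[q[i]][p[i]]
        total += term
    return total


def det(a):
    """Determinant: the special case where q is the identity map."""
    return signed_sum(a, tuple(range(len(a))))


def matmul(a, b):
    """Product of two n x n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


A = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
B = [[1, 0, 2], [-3, 1, 1], [4, 2, 0]]

# Lemma 11.3.12 for every map Q from {0, 1, 2} to itself (27 maps in all).
for q in product(range(3), repeat=3):
    assert signed_sum(A, q) == parity(q) * det(A)

# Theorem 11.3.13.
assert det(matmul(A, B)) == det(A) * det(B)
```

A check of this kind is evidence for one matrix pair only. It does not replace the proof, and it says nothing about fields of characteristic 2.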
11.3.14 Remark: Left and right inverses of matrices are given by Definition 11.1.24. In the case of square
matrices, a matrix B is a left inverse of a matrix A if and only if B is a right inverse of A. This common
left/right inverse is a unique matrix which is called the inverse of A if such a matrix exists.
11.3.15 Definition: An invertible matrix is a matrix $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}^+_0$ such that for some
$B \in M_{n,n}(K)$, $B$ is the inverse of $A$.
11.3.16 Notation: $A^{-1}$ denotes the inverse of a square matrix $A$.
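11.3.17 Theorem: The set $M^{\mathrm{inv}}_{n,n}(K)$ of invertible $n \times n$ matrices over a field $K$ for $n \in \mathbb{Z}^+_0$ has the
following properties.
(i) $\forall \lambda \in K \setminus \{0_K\},\ \forall A \in M^{\mathrm{inv}}_{n,n}(K),\ (\lambda A)^{-1} = \lambda^{-1} A^{-1}$.
(ii) $\forall A, B \in M^{\mathrm{inv}}_{n,n}(K),\ (AB)^{-1} = B^{-1} A^{-1}$.
(iii) $\forall A \in M^{\mathrm{inv}}_{n,n}(K),\ \det(A^{-1}) = \det(A)^{-1}$.
Proof: Property (iii) follows from Theorem 11.3.13 and the definition of an inverse matrix, since
$\det(A) \det(A^{-1}) = \det(A A^{-1}) = \det(I) = 1_K$.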
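The three properties of Theorem 11.3.17 can be spot-checked for 2 × 2 rational matrices. This is a sketch under stated assumptions: the inverse is computed with the adjugate formula for the 2 × 2 case, which is a standard formula not taken from the text, and the helper names and sample matrices are hypothetical.

```python
from fractions import Fraction


def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    d = det2(a)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]


def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def scale2(lam, a):
    """Scalar multiple of a matrix."""
    return [[lam * x for x in row] for row in a]


F = Fraction
A = [[F(2), F(1)], [F(5), F(3)]]    # determinant 1
B = [[F(1), F(2)], [F(3), F(4)]]    # determinant -2
lam = F(3, 4)

assert inv2(scale2(lam, A)) == scale2(1 / lam, inv2(A))      # property (i)
assert inv2(matmul2(A, B)) == matmul2(inv2(B), inv2(A))      # property (ii)
assert det2(inv2(B)) == 1 / det2(B)                          # property (iii)
```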
[ Define eigenvalues and eigenvectors of square matrices. Give relations between invertibility of a matrix and
its set of eigenvalues. ]
11.3.18 Remark: It turns out that the trace and determinant functions, and the inverse matrix operation,
are given a lot more meaning when viewed within the framework of eigenspaces.
The concept of eigenspaces for linear space endomorphisms is discussed in Section 10.4. The one-to-one
onto correspondence between matrices and linear maps is discussed in Section 11.2. The eigenspace concept
for linear spaces may be applied to square matrices via this correspondence. This is because linear space
endomorphisms correspond to square matrices.
11.3.19 Definition: A symmetric $n \times n$ matrix over a field $K$ for $n \in \mathbb{Z}^+_0$ is a matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(K)$
such that $\forall i, j \in \mathbb{N}_n,\ a_{ij} = a_{ji}$.
11.4. Real square matrix algebra

[ Generalize the definitions in this section to fields other than IR. ]
11.4.1 Remark: The real-number field $\mathbb{R}$ has a standard total order. This enables the definition of
various norms on matrices over $\mathbb{R}$.
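11.4.2 Definition: The upper norm for real square matrices is the function $\lambda^+ : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ defined
for $n \in \mathbb{Z}^+_0$ by
$$\forall A \in M_{n,n}(\mathbb{R}), \quad \lambda^+(A) = \sup \Bigl\{ \sum_{i,j=1}^n a_{ij} v_i v_j ;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \Bigr\}.$$

The lower norm for real square matrices is the function $\lambda^- : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ defined for $n \in \mathbb{Z}^+_0$ by
$$\forall A \in M_{n,n}(\mathbb{R}), \quad \lambda^-(A) = \inf \Bigl\{ \sum_{i,j=1}^n a_{ij} v_i v_j ;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \Bigr\}.$$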
11.4.3 Remark: The terms “upper norm” and “lower norm” in Definition 11.4.2 are probably non-
standard. These functions are useful for putting upper and lower limits on the coefficients of second-order
partial derivatives in partial differential equations.
11.4.4 Remark: In the case $n = 0$ in Definition 11.4.2, $\lambda^+(A) = -\infty$ and $\lambda^-(A) = \infty$ for all $A \in M_{n,n}(\mathbb{R})$.
Therefore these functions are of dubious value for $n = 0$.
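11.4.5 Theorem:
(i) $\forall n \in \mathbb{Z}^+_0,\ \forall A \in M_{n,n}(\mathbb{R}),\ \lambda^+(\tfrac12 (A + A^T)) = \lambda^+(A)$.
(ii) $\forall n \in \mathbb{Z}^+_0,\ \forall A \in M_{n,n}(\mathbb{R}),\ \lambda^-(\tfrac12 (A + A^T)) = \lambda^-(A)$.
(iii) $\forall n \in \mathbb{Z}^+,\ \forall A \in M_{n,n}(\mathbb{R}),\ \lambda^+(\tfrac12 (A - A^T)) = \lambda^-(\tfrac12 (A - A^T)) = 0$.
Proof: Parts (i) and (ii) follow from $\sum_{i,j=1}^n (\tfrac12 (a_{ij} + a_{ji})) v_i v_j = \sum_{i,j=1}^n a_{ij} v_i v_j$, which follows from
commutativity of real number multiplication. Part (iii) follows from $\sum_{i,j=1}^n (\tfrac12 (a_{ij} - a_{ji})) v_i v_j = 0$.
(The case $n = 0$ is excluded from part (iii) because of Remark 11.4.4.)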
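The two identities used in the proof of Theorem 11.4.5 hold for every vector v, not only for unit vectors, so they can be checked exactly with integer data. The following sketch uses hypothetical helper names (`quad`, `sym_part`, `antisym_part`) and an arbitrarily chosen matrix.

```python
from fractions import Fraction


def quad(a, v):
    """Quadratic form: the sum over i, j of a[i][j] * v[i] * v[j]."""
    n = len(v)
    return sum(a[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


def sym_part(a):
    """Symmetric part (A + A^T) / 2."""
    n = len(a)
    return [[Fraction(a[i][j] + a[j][i], 2) for j in range(n)] for i in range(n)]


def antisym_part(a):
    """Antisymmetric part (A - A^T) / 2."""
    n = len(a)
    return [[Fraction(a[i][j] - a[j][i], 2) for j in range(n)] for i in range(n)]


A = [[1, 4, -2], [0, 3, 5], [7, -1, 2]]
for v in ([1, 0, 0], [1, 2, 3], [-2, 5, 1]):
    assert quad(sym_part(A), v) == quad(A, v)     # used for parts (i) and (ii)
    assert quad(antisym_part(A), v) == 0          # used for part (iii)
```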
11.4.6 Remark: In the study of second-order partial differential equations, the order properties of the
coefficient matrix for the second-order derivatives are important for the classification of operators and equa-
tions. An elliptic operator, for example, requires a second-order coefficient matrix which is positive definite.
A weakly elliptic operator requires a positive semi-definite second-order coefficient matrix. In curved space,
similar classifications are applicable, although the second-order derivatives in this case should be covariant
and the coefficient matrix is replaced by a second-order contravariant tensor. Then the (semi)definiteness
properties are applied to the components of this contravariant tensor with respect to a basis. Of course,
the (semi)definiteness properties must be chart-independent in order to ensure that the properties are well
defined.
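11.4.7 Definition: A real square matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ for $n \in \mathbb{Z}^+_0$ is said to be
positive semi-definite if $\forall v \in \mathbb{R}^n,\ \sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$,
negative semi-definite if $\forall v \in \mathbb{R}^n,\ \sum_{i,j=1}^n a_{ij} v_i v_j \le 0$,
positive definite if $\forall v \in \mathbb{R}^n \setminus \{0\},\ \sum_{i,j=1}^n a_{ij} v_i v_j > 0$,
negative definite if $\forall v \in \mathbb{R}^n \setminus \{0\},\ \sum_{i,j=1}^n a_{ij} v_i v_j < 0$.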
11.4.8 Theorem: Let $n \in \mathbb{Z}^+$ and $A \in M_{n,n}(\mathbb{R})$. Then
(i) $A$ is positive semi-definite if and only if $\lambda^-(A) \ge 0$;
(ii) $A$ is negative semi-definite if and only if $\lambda^+(A) \le 0$.
Proof: To show (i), let $n \in \mathbb{Z}^+$ and suppose that $A$ is positive semi-definite. Then $A \in M_{n,n}(\mathbb{R})$ and
$\forall v \in \mathbb{R}^n,\ \sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$. By Definition 11.4.2,
$\lambda^-(A) = \inf \bigl\{ \sum_{i,j=1}^n a_{ij} v_i v_j ;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \bigr\}$.
But $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$ for any $v \in \mathbb{R}^n$. Therefore $\lambda^-(A) \ge 0$ as claimed.
Now suppose that $A \in M_{n,n}(\mathbb{R})$ and $\lambda^-(A) \ge 0$. If $v \in \mathbb{R}^n$ is the zero vector $v = 0$, then $\sum_{i,j=1}^n a_{ij} v_i v_j = 0$.
So $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$ as required. If $v \ne 0$, define $w \in \mathbb{R}^n$ by $w = k^{-1/2} v$, where $k = \sum_{i=1}^n v_i^2$.
Then $\sum_{i=1}^n w_i^2 = 1$. So $\sum_{i,j=1}^n a_{ij} w_i w_j \ge 0$ by the definition of $\lambda^-(A)$. Therefore
$\sum_{i,j=1}^n a_{ij} v_i v_j = k \sum_{i,j=1}^n a_{ij} w_i w_j \ge 0$ since $k \ge 0$. It follows that $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$ for all $v \in \mathbb{R}^n$.
Hence $A$ is positive semi-definite.
The proof of (ii) follows by suitable changes of sign.
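For a 2 × 2 matrix, the lower and upper norms of Definition 11.4.2 can be approximated by sampling unit vectors on the circle. The sketch below is only a numerical illustration: the function name `sampled_norms_2x2`, the sample count and the test matrix are assumptions, and sampling gives an upper estimate of the infimum and a lower estimate of the supremum rather than exact values.

```python
import math


def quad(a, v):
    """Quadratic form: the sum over i, j of a[i][j] * v[i] * v[j]."""
    n = len(v)
    return sum(a[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


def sampled_norms_2x2(a, samples=3600):
    """Approximate (lower norm, upper norm) of a 2 x 2 real matrix by
    evaluating the quadratic form at unit vectors (cos t, sin t)."""
    values = [quad(a, (math.cos(2 * math.pi * k / samples),
                       math.sin(2 * math.pi * k / samples)))
              for k in range(samples)]
    return min(values), max(values)


# The quadratic form of this matrix on the unit circle is 2 + sin(2t).
A = [[2.0, 1.0], [1.0, 2.0]]
lo, hi = sampled_norms_2x2(A)
print(round(lo, 6), round(hi, 6))  # 1.0 3.0
```

Since the sampled lower value is positive here, the example is consistent with Theorem 11.4.8 (i): the matrix is positive semi-definite (in fact positive definite).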

11.4.9 Remark: Definite and semi-definite matrices are generally defined only for the special case of real
symmetric or complex hermitian matrices. (See for example EDM2 [35], 269.I.) Whether a real matrix $A$
has a definiteness or semi-definiteness property depends only on the symmetric part $\tfrac12 (A + A^T)$. (This is
stated in Theorem 11.4.5.) The main applications of (semi)definiteness are to eigenspaces for real symmetric
(or complex hermitian) matrices, which are guaranteed to have real eigenvalues. Such matrices
therefore have well-defined ordering properties when applied to vectors. Nevertheless, Definition 11.4.7 defines
(semi)definiteness for all real square matrices. Notation 11.6.1 specializes the concept to real symmetric
matrices.
11.5. Real symmetric matrix algebra

[ Probably discuss in this section the basic facts about symmetric matrices, orthogonal eigenspaces, preserva-
tion of matrix symmetry under linear transformations, and similar useful things. Define also characteristic
polynomials. ]
[ It is usual to discuss real symmetric matrices in the context of hermitian complex matrices. The usual sets
of complex matrices will be presented if I can find a use for them. There should be a new section for complex
matrices. ]
11.5.1 Remark: Symmetric matrices arise constantly in partial differential equations as the second derivatives
of $C^2$ real-valued functions on manifolds. Since the matrices $U = (u_{ij})_{i,j=1}^n$ of second derivatives of
such functions are necessarily symmetric, the coefficient matrices $A = (a_{ij})_{i,j=1}^n$ of second-order terms such
as $\mathrm{Tr}(A^T U) = \sum_{i,j=1}^n a_{ij} u_{ij}$ in PDEs may be assumed symmetric also, since the anti-symmetric component
has no effect. The following notations and theorems are motivated by PDE applications.
11.5.2 Definition: A real symmetric $n \times n$ matrix for $n \in \mathbb{Z}^+_0$ is a matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ such that
$\forall i, j \in \mathbb{N}_n,\ a_{ij} = a_{ji}$.
11.5.3 Notation: $\mathrm{Sym}(n, \mathbb{R})$ denotes the set of real symmetric $n \times n$ matrices for $n \in \mathbb{Z}^+_0$.
11.5.4 Remark: It is not generally true that $AB \in \mathrm{Sym}(n, \mathbb{R})$ if $A, B \in \mathrm{Sym}(n, \mathbb{R})$. By Theorem 11.5.5,
$(AB)^T = BA$ if $A, B \in \mathrm{Sym}(n, \mathbb{R})$, which is not quite the same thing. Theorem 11.1.19 implies that $(AB)^T =
B^T A^T$ for any matrices $A, B \in M_{n,n}(\mathbb{R})$. So $(AB)^T = BA = B^T A^T = (A^T B^T)^T$ for all $A, B \in \mathrm{Sym}(n, \mathbb{R})$.
It is not possible to conclude the general equality of $(AB)^T$ and $AB$ from this. However, $AB + BA$ is
symmetric if $A$ and $B$ are symmetric, as shown in Theorem 11.5.6.

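11.5.5 Theorem: Let $n \in \mathbb{Z}^+_0$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then $(AB)^T = BA$.

11.5.6 Theorem: Let $n \in \mathbb{Z}^+_0$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then $AB + BA \in \mathrm{Sym}(n, \mathbb{R})$.
Proof: Let $n \in \mathbb{Z}^+_0$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then by Theorem 11.5.5 and Remark 11.1.15,
$(AB + BA)^T = (AB)^T + (BA)^T = BA + AB = AB + BA$. So $AB + BA \in \mathrm{Sym}(n, \mathbb{R})$ as claimed.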
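A small integer example shows the three statements side by side: the transpose of AB equals BA, the product AB need not be symmetric, and the sum AB + BA is symmetric. The helper names and the matrices in this sketch are chosen for illustration only.

```python
def matmul(a, b):
    """Product of two n x n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def add(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2], [2, 0]]     # symmetric
B = [[0, 1], [1, 3]]     # symmetric

AB = matmul(A, B)
BA = matmul(B, A)
assert transpose(AB) == BA           # the transpose of AB is BA
assert AB != transpose(AB)           # AB itself is not symmetric here
S = add(AB, BA)
assert S == transpose(S)             # AB + BA is symmetric
```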
[ Possibly Remark 11.5.7 applies to general symmetric matrices for any field? ]
11.5.7 Remark: Theorem 11.5.8 gives a spectral decomposition for a symmetric matrix $A$ using a style of
inductive proof which does not seem to rely on solving the polynomial equation $\det(\lambda I - A) = 0$.
The eigenvalues and eigenvectors that arise from Theorem 11.5.8 derive their name from the German adjective
“eigen”, which means “own”, “typical” or “particular”. The reason for this choice of word is the fact that
eigenvalues and eigenvectors are invariant under orthogonal transformations. To see this, if $A = \sum_{i=1}^n \lambda_i e_i e_i^T$
is transformed by the orthogonal matrix $B$, then the transformed matrix is $B^{-1} A B = \sum_{i=1}^n \lambda_i e'_i (e'_i)^T$
with $e'_i = B^{-1} e_i$. So the eigenvalues and eigenvectors may be thought of as being “attached” to these
matrices under any orthogonal transformation.

[ Prove the assertion about $B^{-1} A B = \sum_{i=1}^n \lambda_i e'_i (e'_i)^T$ in Remark 11.5.7. This should be in a separate section
on eigenspaces. ]

11.5.8 Theorem: For any symmetric matrix $A \in \mathrm{Sym}(n, \mathbb{R})$, there are numbers $\lambda_1, \ldots \lambda_n \in \mathbb{R}$ and
orthonormal vectors $e_1, \ldots e_n \in \mathbb{R}^n$ such that $Ax = \sum_{i=1}^n \lambda_i (x^T e_i) e_i$ for all $x \in \mathbb{R}^n$. Consequently,
$A = \sum_{i=1}^n \lambda_i e_i e_i^T$.
Proof: [ Try to prove this spectral decomposition theorem using the induction method. ]
[ Quote proof of orthogonality of eigenspaces of self-adjoint operator in Rudin [138], page 313, thm 12.12. ]
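The statement of Theorem 11.5.8 can be illustrated for one hand-picked 2 × 2 matrix. In the sketch below the numbers λ_i and the orthonormal vectors e_i are simply supplied (they are assumptions chosen for this example, not computed by any algorithm from the text), and floating-point comparisons use a small tolerance.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]
s = 1 / math.sqrt(2)
# Pairs (lambda_i, e_i), chosen by hand for this particular matrix.
pairs = [(3.0, (s, s)), (1.0, (s, -s))]


def dot(x, y):
    """Euclidean inner product of two tuples."""
    return sum(a * b for a, b in zip(x, y))


def reconstruct(pairs):
    """The matrix given by the sum over i of lambda_i * e_i * e_i^T."""
    n = len(pairs[0][1])
    return [[sum(lam * e[i] * e[j] for lam, e in pairs) for j in range(n)] for i in range(n)]


def apply_decomposition(pairs, x):
    """The vector given by the sum over i of lambda_i * (x^T e_i) * e_i."""
    n = len(x)
    return [sum(lam * dot(x, e) * e[i] for lam, e in pairs) for i in range(n)]


# The vectors e_1, e_2 are orthonormal.
assert math.isclose(dot(pairs[0][1], pairs[0][1]), 1.0)
assert math.isclose(dot(pairs[1][1], pairs[1][1]), 1.0)
assert math.isclose(dot(pairs[0][1], pairs[1][1]), 0.0, abs_tol=1e-12)

# The sum of lambda_i e_i e_i^T reproduces A.
R = reconstruct(pairs)
assert all(math.isclose(R[i][j], A[i][j], abs_tol=1e-12) for i in range(2) for j in range(2))

# A x agrees with the decomposition applied to x.
x = (0.5, -2.0)
Ax = [dot(row, x) for row in A]
assert all(math.isclose(u, w, abs_tol=1e-12) for u, w in zip(Ax, apply_decomposition(pairs, x)))
```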
11.6. Real symmetric definite and semi-definite matrices

11.6.1 Notation:
$\mathrm{Sym}^+_0(n, \mathbb{R})$ denotes the set of real positive semi-definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}^+_0$.
$\mathrm{Sym}^-_0(n, \mathbb{R})$ denotes the set of real negative semi-definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}^+_0$.
$\mathrm{Sym}^+(n, \mathbb{R})$ denotes the set of real positive definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}^+_0$.
$\mathrm{Sym}^-(n, \mathbb{R})$ denotes the set of real negative definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}^+_0$.
11.6.2 Remark: Theorem 11.5.6 states that $AB + BA \in \mathrm{Sym}(n, \mathbb{R})$ if $A, B \in \mathrm{Sym}(n, \mathbb{R})$, for all $n \in \mathbb{Z}^+_0$.
However, it is not generally true that $AB + BA \in \mathrm{Sym}^+_0(n, \mathbb{R})$ if $A, B \in \mathrm{Sym}^+_0(n, \mathbb{R})$. Example 11.6.3 is a
counterexample to this hypothesis.
Roughly speaking, a positive semi-definite real symmetric matrix corresponds to a linear transformation
which rotates vectors by no more than 90°. But a composition of two such transformations may rotate
vectors by up to 180°. Hence positive semi-definiteness is not preserved under composition, even when
restricted to real symmetric matrices.
[ Rudin [138], page 330, has some useful results on positive operators on Hilbert spaces which are related to
Remark 11.6.2. ]
[ Give a theorem that positive semi-definite matrices do not rotate vectors by more than 90°. ]
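11.6.3 Example: Let $A = \begin{bmatrix} a & 0 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & b \\ b & 1 \end{bmatrix}$ with $a = 2$, $b = 1$ and $v = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$. Then
$AB + BA = \begin{bmatrix} 2a & b(a + 1) \\ b(a + 1) & 2 \end{bmatrix} = \begin{bmatrix} 4 & 3 \\ 3 & 2 \end{bmatrix}$. So $v^T (AB + BA) v = -2 \not\ge 0$. Hence $AB + BA \notin \mathrm{Sym}^+_0(2, \mathbb{R})$ although
$A, B \in \mathrm{Sym}^+_0(2, \mathbb{R})$.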
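The arithmetic of Example 11.6.3 can be reproduced directly. In this sketch the helper `is_psd_2x2` uses the standard criterion for a symmetric 2 × 2 matrix (both diagonal elements and the determinant non-negative). That criterion is an assumption brought in for the check, since it is not stated in the text, and the function names are hypothetical.

```python
def matmul(a, b):
    """Product of two n x n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def quad(m, v):
    """Quadratic form v^T m v."""
    n = len(v)
    return sum(m[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


def is_psd_2x2(m):
    """Positive semi-definiteness test for a symmetric 2 x 2 matrix."""
    return m[0][0] >= 0 and m[1][1] >= 0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] >= 0


a, b = 2, 1
A = [[a, 0], [0, 1]]
B = [[1, b], [b, 1]]
v = [2, -3]

AB, BA = matmul(A, B), matmul(B, A)
M = [[AB[i][j] + BA[i][j] for j in range(2)] for i in range(2)]

print(M)            # [[4, 3], [3, 2]]
print(quad(M, v))   # -2
print(is_psd_2x2(A), is_psd_2x2(B), is_psd_2x2(M))   # True True False
```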
11.6.4 Remark: Theorem 11.6.5 is important in the theory of elliptic partial differential operators.
[ One way of proving Theorem 11.6.5 would be to diagonalize either A or U . But that requires some theorems
about diagonalization of matrices and invariance of the set of eigenvalues under orthogonal transformations. ]
11.6.5 Theorem: If $A, U \in \mathrm{Sym}^+_0(n, \mathbb{R})$ for some $n \in \mathbb{Z}^+_0$, then $\mathrm{Tr}(AU) \ge 0$.
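Theorem 11.6.5 can be spot-checked on randomly generated matrices. The sketch below builds positive semi-definite symmetric matrices in the form G^T G, which is one convenient way to generate them and is an assumption of this example. It therefore tests only matrices of that form with small integer entries, and it is not a substitute for a proof.

```python
import random


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))


def random_psd(n, rng):
    """A symmetric positive semi-definite matrix of the form G^T G.
    It is positive semi-definite because v^T (G^T G) v = |G v|^2 >= 0."""
    g = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(n)]
    return matmul(transpose(g), g)


rng = random.Random(0)      # fixed seed so that the run is reproducible
for _ in range(200):
    A = random_psd(3, rng)
    U = random_psd(3, rng)
    assert trace(matmul(A, U)) >= 0
```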
[ See Rudin [137], pages 186–188, for properties of products of matrices. ]
11.7. Matrix groups

[ In this section, define matrix groups such as GL(n), SL(n), O(n) and SO(n). These are related to the
corresponding automorphism groups GL(IRn ) and so forth in Section 10.9. ]
[ Also define the groups of Euclidean transformations and affine transformations on IRn . Might as well do
projective and conformal groups too. Also add translation groups to these groups. And just for good
measure, do Lorentz transformations too. ]
[ Also make sure to include groups which preserve a hyperbolic norm like the Minkowski and Lorentz groups. ]
11.7.1 Remark: The classical matrix groups include the following.

notation   name                        definition

GL(n)      general linear group
SL(n)      special linear group
O(n)       orthogonal group
SO(n)      special orthogonal group
U(n)       unitary group
SU(n)      special unitary group


Chapter 12
Affine spaces
12.1 Affine spaces discussion
12.2 Affine space definitions
12.3 Affine transformation groups
12.4 Euclidean spaces
12.0.1 Remark: Roughly speaking, an affine space is a linear space from which the coordinates have been
removed. So, for example, the points of an affine space have no origin, no angles, no metric and no inner
product. However, parallelism and convexity are well defined in an affine space.
In a linear space, points and vectors are the same thing. Every point is a vector and every vector is a point.
But in an affine space, points and vectors are members of two disjoint spaces, namely the point space and
the vector space.
Since affine spaces may be constructed by “removing” the coordinates from a linear space, the points of an
affine space may in practice be labelled by coordinates for convenience. Therefore in practical calculations
in affine spaces, points and vectors may be confused. One must exercise some self-discipline to avoid using
concepts such as “origin”, “axes”, “angles” and “metric” in an affine space.
One may also think of an affine space as a point space in which every point is an origin of the space, and
all compatible sets of coordinates at all such origins are equally acceptable. Since there are multiple ways
of thinking about affine spaces, it is important to have a single reference definition to help prevent woolly
thinking. (See Definition 12.2.3.)
12.0.2 Remark: Affine spaces provide a reference model for understanding parallel transport and affine
connections on general manifolds. An affine space may be thought of as a manifold with absolute (i.e.
path-independent) parallelism. In other words, affine spaces are examples of “flat spaces”.
12.0.3 Remark: The word “affine” is used in many ways, some of which are mutually contradictory.
An “affine transformation” of a linear space is a combination of a translation and an invertible linear
transformation.
An “affine space” is a combination of a space of points with a linear space of vectors at each point.
An “affine connection” is really a linear connection. It is possible to justify the name “affine connection”,
but it is confusing.
An “affine manifold” may refer to a differentiable manifold on which an affine connection is defined. (See
Remark 36.1.4.)
12.1. Affine spaces discussion

12.1.1 Remark: The concepts of affine spaces and affine transformations were apparently first published
by Euler [104] in 1748 in a superficial way. They were first dealt with in a serious manner by Möbius [76]
in 1827. Euler chose the name “affine” for the concept, but it was Möbius who correctly defined the general
affine transformation and proved some non-trivial properties and invariants of affine transformations, such
as the invariance of convex combinations of points. In choosing the name “affine”, Euler needed something
to describe a geometric relation which was weaker than similarity, and since “affine” means “related”, he
thought it seemed appropriate. (See Section 45.3 for details.)

12.1.2 Remark: It is important to have some understanding of affine spaces in order to better under-
stand the concept of an affine connection on a differentiable manifold. (Affine connections are presented in
Chapter 36.) An affine space exhibits a canonical example of an affine connection in the same sense that
a Euclidean metric space exhibits a canonical example of a Riemannian metric. In an affine linear space,
straight lines and parallelism are defined, but not distance or angles. An affine connection defines geodesics
and parallel transport, but not distance or angles.
[ Reinhardt [135], pages 128–169, defines analytic (coordinate) and synthetic (axiomatic point/line) geometry. ]
12.1.3 Remark: The summary of classical point-and-line geometry in Reinhardt [135], pages 128–169, is
very enlightening.
[ The Euclidean and affine groups will be defined in Section 12.3. ]
12.1.4 Remark: An affine space is a linear space V with no special zero vector and no scalar multiplication
or vector addition operations, although vector subtraction is permitted on the point space. An affine space
is a democratic space where all points are equal. Definition 12.2.3 removes unwanted concepts from the
linear space V by starting with a bare set X and adding only those properties of V which are required.
An alternative construction to specify an affine space would be an equivalence class of linear maps. This is
the approach taken for defining manifolds and it is just as applicable here. Thus an affine space could be
specified as a set X together with a bijection ψ : X → V . The pair (X, ψ) is then taken to be equivalent to
the pair (X′, ψ′) if and only if X = X′ and ψ′ ◦ ψ⁻¹ : V → V is an affine transformation, namely the composition
of a translation with an element of GL(V). This would be closer to “the truth” than Definition 12.2.3.
There is a third possibility for defining affine spaces which seems to be much better than even the atlas
approach. Affine spaces can be defined in a natural, intrinsic manner by constructing a four-way equivalence
relation on the points of a set. Thus given a set X, a parallelism relation on the set X × X is defined to
mean that (P, Q) ∼ (R, S) whenever the vector $\overrightarrow{PQ}$ is parallel to $\overrightarrow{RS}$.
12.1.5 Remark: The following table shows which kinds of transformations leave which relations invariant
in point-and-line geometries.
transformation   preserved relations               group             DG concept
projective       coincidence of lines and points   rational linear   projective connection?
affine           parallelism of lines              general linear    affine connection
similarity       angles                            conformal         conformal connection?
congruence       distances                         orthogonal        Riemannian metric
12.2. Affine space definitions

12.2.1 Remark: Definition 12.2.3 follows EDM2 [35], article 7, in that it uses a completely general linear
space. It is similar to the definition by Weyl [51], section 2, which uses a general real linear space. Greenberg/
Harper [114], section 8, page 41, use the linear space IRn , which is probably not a good idea. The important
thing to note about Definition 12.2.3 is the fact that the linear space V has no specified basis and no inner
product or metric. The base space X is given only algebraic structure by the linear space V . If the linear
space IRn had been chosen instead of a general linear space, there would have been ambiguity as to which
properties were to be inherited by the base space.
12.2.2 Remark: The approach taken in Definition 12.2.3 represents the parallelism structure of an affine
space in terms of an associated linear space. It is also possible to take a more intrinsic approach whereby
the parallelism is specified as an equivalence relation. There seem to be four sensible ways to define an affine
space.
(i) An abstract set X with vectors in an abstract linear space V as in Definition 12.2.3.
(ii) The same approach as in Definition 12.2.3 except that instead of a single difference operation δ, the set
of all difference operations which are related by affine transformation to this is used.

(iii) Start with an abstract linear space V or a concrete linear space such as IRn and remove unwanted
properties by insisting on invariance under the group of affine transformations. This is the point of view
of the Erlanger Programm.
(iv) An equivalence relation for point pairs in an abstract set X. This is a synthetic geometry approach.
12.2.3 Definition: An affine space over a linear space V is a pair X −< (X, δ) where the non-empty set
X and function δ : X × X → V satisfy the following conditions.
(i) For all Q ∈ X, the map P ↦ δ(P, Q) is a bijection from X to V .
(ii) For all P ∈ X, the map Q ↦ δ(P, Q) is a bijection from X to V .
(iii) For all P, Q, R ∈ X, δ(P, R) = δ(P, Q) + δ(Q, R), where + : V × V → V denotes the vector addition
function of V .
The point space of the affine space (X, δ) is the set X.
The vector space of the affine space (X, δ) over a linear space V is the linear space V .
12.2.4 Remark: The definition of two spaces in Definition 12.2.3, the point space and the vector space,
contrasts with the situation in linear spaces where points and vectors are the same thing. The distinction
between points and vectors in affine spaces is useful as a metaphor for dealing with tangent vectors for
differentiable manifolds, where points and vectors are even more distinct than in the case of affine spaces.
12.2.5 Definition: The affine structure (function) for an affine space (X, δ) over a linear space V is the
function σ : X × V → X defined by
(i) ∀P, Q ∈ X, σ(Q, δ(P, Q)) = P .
12.2.6 Remark: Since the map P ↦ δ(P, Q) in Definition 12.2.3 (i) is a bijection from X to V for all Q ∈
X, the function σ in Definition 12.2.5 (i) is well defined.
12.2.7 Notation: The function δ : X × X → V for an affine space (X, δ) over a linear space V may be
denoted as the binary operation “ − ”. Thus P − Q denotes δ(P, Q) for all P, Q ∈ X.
The affine structure function σ : X × V → X for an affine space (X, δ) over a linear space V may be denoted
as the binary operation “ + ”. Thus P + v denotes σ(P, v) for all P ∈ X and v ∈ V .
12.2.8 Remark: In terms of Notation 12.2.7, the transitivity property of the affine difference function,
Definition 12.2.3 (iii), may be written as ∀P, Q, R ∈ X, P − R = (P − Q) + (Q − R), where + denotes the
vector addition operation on V . (This is illustrated in Figure 12.2.1 with v = P − Q and w = Q − R.)
(Diagram: a triangle of points P, Q, R with the vectors v = P − Q, w = Q − R and v + w = P − R.)
Figure 12.2.1 Transitivity of affine space vectors
The transitivity property implies that P − P = 0 for all P ∈ X. This can be seen by noting that P − P =
(P − P ) + (P − P ).
12.2.9 Remark: Definition 12.2.5 (i) may be written as ∀P, Q ∈ X, Q + (P − Q) = P , where + denotes
the affine structure function for (X, δ).
Definition 12.2.5 (i) implies that the addition operation “+” is just a way of providing a convenient shorthand
for the “ − ” operation. Thus Q + v really means “the unique point P ∈ X such that P − Q = v”.
[ Add lots more properties to Theorem 12.2.10. ]

12.2.10 Theorem: The following properties are valid for any affine space (X, δ) over a linear space V over
a field K, with affine structure function σ.
(i) ∀P, Q ∈ X, P − Q = −(Q − P ).
(ii) ∀P ∈ X, P − P = 0V .
(iii) ∀P ∈ X, P + 0V = P .
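The axioms of Definition 12.2.3 and the properties in Theorem 12.2.10 can be checked on a concrete model. The sketch below takes X to be the plane of points in ℚ³ whose coordinates sum to 1, and V to be the linear space of vectors whose coordinates sum to 0. This model is an assumption chosen for illustration, and `delta`, `sigma`, `vadd` and `in_X` are hypothetical names.

```python
from fractions import Fraction as F


def delta(p, q):
    """Difference function delta(P, Q) = P - Q, a vector with coordinate sum 0."""
    return tuple(a - b for a, b in zip(p, q))


def sigma(q, v):
    """Affine structure function sigma(Q, v) = Q + v."""
    return tuple(a + b for a, b in zip(q, v))


def vadd(v, w):
    """Componentwise addition of tuples."""
    return tuple(a + b for a, b in zip(v, w))


def in_X(p):
    """Membership test for the point space X: coordinates sum to 1."""
    return sum(p) == 1


P = (F(1), F(0), F(0))
Q = (F(1, 2), F(1, 2), F(0))
R = (F(-1), F(3), F(-1))
zero = (F(0), F(0), F(0))

assert all(in_X(p) for p in (P, Q, R))
assert delta(P, R) == vadd(delta(P, Q), delta(Q, R))     # Definition 12.2.3 (iii)
assert sigma(Q, delta(P, Q)) == P                        # Definition 12.2.5 (i)
assert delta(P, Q) == tuple(-c for c in delta(Q, P))     # Theorem 12.2.10 (i)
assert delta(P, P) == zero                               # Theorem 12.2.10 (ii)
assert sigma(P, zero) == P                               # Theorem 12.2.10 (iii)
assert not in_X(vadd(P, Q))                              # X is not closed under addition
```

The last line shows why this point space is not itself a linear subspace: the componentwise sum of two points has coordinate sum 2 and so leaves the plane.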
12.2.11 Remark: The bijections P ↦ δ(P, Q) and Q ↦ δ(P, Q) in Definition 12.2.3 may be thought of as
manifold charts which satisfy a linearity constraint.
12.2.12 Remark: The set {P + t(Q − P ); t ∈ K} in Definition 12.2.13 is well defined because the product
of elements t ∈ K and Q − P ∈ V is well defined in V . The + operation in this definition is the affine structure
function for the affine space. The expression (1 − t)P + tQ, on the other hand, is not well defined because
there is no product function from K × X to X, and there is no addition function from X × X to X. The
expression (1 − t)P + tQ is well defined, however, if X is given a linear space structure which is consistent
with the affine space structure.
12.2.13 Definition: The line through points P, Q ∈ X in an affine space (X, δ) over a linear space V with
field K is the set {P + t(Q − P ); t ∈ K}.
12.2.14 Theorem: Lines through points in an affine space (X, δ) over a linear space V with field K have
the following properties, where L̄(P, Q) = {P + t(Q − P ); t ∈ K} denotes the line through points P, Q
for P, Q ∈ X.
(i) ∀P ∈ X, L̄(P, P ) = {P }.
(ii) For all P, Q ∈ X with P ≠ Q, the map φ : K → L̄(P, Q) defined by φ : t ↦ P + t(Q − P ) is a bijection.
(iii) ∀P, Q ∈ X, L̄(P, Q) = L̄(Q, P ).
(iv) ∀P, Q ∈ X, ∀t ∈ K \ {0K }, L̄(P, Q) = L̄(P, P + t(P − Q)).
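Over a finite field, a line is a finite set, so the four properties above can be checked by direct enumeration. The sketch below takes K to be the field of integers modulo 5 and X = K², regarded as an affine space over itself. Both choices are assumptions made for the example, and the function names are hypothetical.

```python
P_MOD = 5    # K = Z_5, a field with five elements


def line(p, q):
    """The set {P + t(Q - P); t in K} in the affine space K^2 over K = Z_5."""
    return {tuple((a + t * (b - a)) % P_MOD for a, b in zip(p, q)) for t in range(P_MOD)}


def point_plus(p, t, v):
    """The point P + t v, with coordinates reduced modulo 5."""
    return tuple((a + t * c) % P_MOD for a, c in zip(p, v))


P, Q = (1, 2), (4, 0)

assert line(P, P) == {P}                         # property (i)
assert len(line(P, Q)) == P_MOD                  # property (ii): five distinct points
assert line(P, Q) == line(Q, P)                  # property (iii)
p_minus_q = tuple((a - b) % P_MOD for a, b in zip(P, Q))
for t in range(1, P_MOD):                        # property (iv), for every t != 0
    assert line(P, Q) == line(P, point_plus(P, t, p_minus_q))
```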
[ For ordered fields in Remark 12.2.15, see EDM2 [35], 149.N, page 581; Curtis [101], page 4. ]
12.2.15 Remark: Definition 12.2.13 describes an “infinite line”, at least if the field K is infinite. The “line
segment” between points P and Q would be defined in plain Euclidean space by limiting the parameter t to
the interval [0, 1] in K = IR. In the case of a general field K, the numbers 0K , 1K ∈ K are well defined, but
the interval [0K , 1K ] = {t ∈ K; 0K ≤ t ≤ 1K } is not defined unless K possesses a suitable total order. It
makes sense to require K to be an ordered field in this case. This is done in Definition 12.2.16.
12.2.16 Definition: The line segment through points P, Q ∈ X in an affine space (X, δ) over a linear
space V with ordered field K is the set {P + t(Q − P ); t ∈ K, 0K ≤ t, t ≤ 1K }.
12.2.17 Remark: Notations 12.2.18 and 12.2.25 are suggested by the standard closed interval notation
in IR.
12.2.18 Notation: [P, Q] denotes the line segment through points P, Q ∈ X for an affine space X over a
linear space with an ordered field.
12.2.19 Theorem: Line segments through points in an affine space (X, δ) over a linear space V with
ordered field K have the following properties, where L(P, Q) = {P + t(Q − P ); t ∈ K, 0K ≤ t, t ≤ 1K }
denotes the line segment through points P, Q for P, Q ∈ X.
(i) ∀P ∈ X, L(P, P ) = {P }.
(ii) For all P, Q ∈ X with P ≠ Q, the map φ : [0K , 1K ] → L(P, Q) defined by φ : t ↦ P + t(Q − P ) is a
bijection, where [0K , 1K ] = {t ∈ K; 0K ≤ t and t ≤ 1K }.
(iii) ∀P, Q ∈ X, L(P, Q) = L(Q, P ).
12.2.20 Remark: The expression $P_0 + \sum_{i=1}^k t_i (P_i - P_0)$ in Definitions 12.2.21 and 12.2.22 is a well-defined
element of $X$ for all point sequences $P \in X^{k+1}$ and number sequences $t \in K^k$. The expression
$\sum_{i=1}^k t_i (P_i - P_0)$ is defined inductively by the rule $\sum_{i=1}^j t_i (P_i - P_0) = t_j (P_j - P_0) + \sum_{i=1}^{j-1} t_i (P_i - P_0)$. Since vector addition
in $V$ is commutative, the sum expression is independent of the order of addition of the terms.

12.2.21 Definition: The hyperplane through points $P_0, P_1 \ldots P_k \in X$ for $k \in \mathbb{Z}^+_0$ in an affine space $(X, \delta)$
over a linear space $V$ with field $K$ is the set $\{P_0 + \sum_{i=1}^k t_i (P_i - P_0);\ t \in K^k\}$.
12.2.22 Definition: The hyperplane segment through points $P_0, P_1 \ldots P_k \in X$ for $k \in \mathbb{Z}^+_0$ in an affine
space $(X, \delta)$ over a linear space $V$ with ordered field $K$ is the set $\{P_0 + \sum_{i=1}^k t_i (P_i - P_0);\ t \in [0_K, 1_K]^k\}$.
12.2.23 Remark: The “hyperplane segment through points” in Definition 12.2.22 is also known as the
“span” of the points P0 , P1 . . . Pk .
12.2.24 Remark: The expression $P_0 + \sum_{i=1}^k t_i (P_i - P_0)$ in Definitions 12.2.21 and 12.2.22 resembles the
expression $(1 - \sum_{i=1}^k t_i) P_0 + \sum_{i=1}^k t_i P_i$. As pointed out for the two-point case in Remark 12.2.12, the latter
expression is ill defined because there is no product function from $K \times X$ to $X$ and no addition function
from $X \times X$ to $X$.
A convex combination of $k + 1$ points $P_0, P_1, \ldots P_k$ may be written as $\sum_{i=0}^k \lambda_i P_i$ when the point space $X$
has a linear space structure which is compatible with the affine space structure on $X$. The number sequence
$\lambda \in K^{k+1}$ is then required to satisfy $\sum_{i=0}^k \lambda_i = 1_K$.
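When the point space does carry a compatible linear structure, the two expressions compared in Remark 12.2.24 agree. The sketch below checks this in ℚ² with exact arithmetic. The point space, the sample points and the names `affine_combo` and `weighted_sum` are assumptions made for illustration.

```python
from fractions import Fraction as F


def affine_combo(points, t):
    """P0 + sum_{i=1}^k t_i (P_i - P0): uses only point differences
    and the addition of a vector to a point."""
    p0 = points[0]
    vec = tuple(sum(ti * (p[c] - p0[c]) for ti, p in zip(t, points[1:]))
                for c in range(len(p0)))
    return tuple(a + b for a, b in zip(p0, vec))


def weighted_sum(points, lam):
    """sum_{i=0}^k lambda_i P_i: needs a linear space structure on X."""
    return tuple(sum(l * p[c] for l, p in zip(lam, points))
                 for c in range(len(points[0])))


pts = [(F(0), F(1)), (F(2), F(1)), (F(1), F(4))]
t = [F(1, 3), F(1, 2)]
lam = [1 - sum(t)] + t      # lambda_0 = 1 - t_1 - t_2, so the weights sum to 1

assert sum(lam) == 1
assert affine_combo(pts, t) == weighted_sum(pts, lam)
```

The function `affine_combo` uses only operations that exist in any affine space, whereas `weighted_sum` multiplies points by scalars and adds points, which is meaningful only because ℚ² is also a linear space.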
12.2.25 Notation: $[P_0, P_1 \ldots P_k]$ denotes the hyperplane segment through points $P_0, P_1 \ldots P_k \in X$ for
$k \in \mathbb{Z}^+_0$ and an affine space $X$ over a linear space with an ordered field.
[ Should give some examples of affine spaces which are not linear spaces. A good example could be the set of
points in $\mathbb{R}^{n+1}$ which lie in any plane which does not pass through the zero vector. ]
12.2.26 Remark: EDM2 [35], article 7, summarizes the entire subject of affine geometry and affine spaces.
Greenberg/Harper [114], section 8, page 41, gives a useful 2-page overview of affine space definitions.
Weyl [51], section 2, gives a useful 6-page account in German.
[ Define affine transformations and groups of affine transformations. These may be expressed in terms of the
tangent bundle on $\mathbb{R}^n$. Give all affine space definitions also in terms of the tangent bundle $T(\mathbb{R}^n)$. ]
12.3. Affine transformation groups
[ This section should present the standard “inhomogeneous” linear transformation groups which permit a
translation of some sort. These should be presented both as abstract linear space groups and as matrix
groups. Include inhomogeneous Lorentz transformations in this section. Maybe projective or rational linear
groups could be included also? Both affine and rational linear transformations may be represented as square
matrices with an additional row or column which is treated differently. ]
12.4. Euclidean spaces

12.4.1 Remark: Euclidean spaces are defined in many different ways including the following. In parentheses
are possible names for some of these concepts and constructions. Some specification tuples could be
given for all of these, but that would be taking it too seriously. The point being made here is that there is
no single “standard definition” of a Euclidean space. There are many standards to choose from.
(i) The linear space $\mathbb{R}^n$ with no metric or inner product. (Euclidean linear space?)
(ii) The linear space $\mathbb{R}^n$ with a metric but no inner product. (Euclidean metric space?)
(iii) The linear space $\mathbb{R}^n$ with a norm and an inner product. (Euclidean inner product space?)
(iv) A linear space of n dimensions, namely a linear space with the absolute origin and axis directions
removed. (This can be done by starting with an abstract set and adding to this an n-dimensional chart
and the set of all n-dimensional charts which are related to this by isometries.) This is the space of
Euclidean geometry.
(v) A linear space of n dimensions together with a tangent bundle.
(vi) A linear space of n dimensions together with the usual tangent bundle, Riemannian metric and Levi-
Civita connection.
(vii) Affine space of n dimensions. (Euclidean affine space?)

(viii) The set $\mathbb{R}^n$ together with the usual flat affine connection.
(ix) The set $\mathbb{R}^n$ together with its usual topology. (Euclidean topological space?)
(x) Any topological space which is homeomorphic to $\mathbb{R}^n$ with the usual topology.
(xi) The set $\mathbb{R}^n$ together with its usual differentiable structure.
(xii) Any topological space which is diffeomorphic to $\mathbb{R}^n$ with the usual differentiable structure.
(xiii) The set $\mathbb{R}^n$ together with its usual Lebesgue measure.
As a result of the wide variety of meanings in the literature, it is difficult to say exactly how much algebraic,
geometric or analytic structure is referred to when someone says “Euclidean space”. But ambiguity turns
to absurdity when a notation such as “$E^n$” is proposed for one or more of these Euclidean spaces. When
the notation “$E^n$” is used in elementary texts it is supposedly to remind the reader to think of $\mathbb{R}^n$ with
some range of usual “Euclidean” structures attached to it. However, such an “$E^n$” notation is not the nth
power of anything – certainly not the nth power of E, whatever E is. This kind of pseudo-notation should
be replaced with meaningful notation wherever it is found. (An even more absurd notation is “$M^n$” for an
n-dimensional manifold. Teachers do more harm than good by using such meaningless notation because it
forces students to abandon logical thinking.)
[ Maybe present here some standardized definitions of Euclidean space? Could maybe define many kinds of
Euclidean space, such as “Euclidean affine spaces” “Euclidean linear spaces” and “Euclidean topological
spaces”. ]

Chapter 13
Tensor algebra
13.1 The meaning of tensors
13.2 Multilinear maps
13.3 Linear spaces of multilinear maps
13.4 Symmetric and antisymmetric multilinear maps
13.5 Tensor product metadefinition
13.6 Tensor products of linear spaces
13.7 Covariant tensors
13.8 Mixed tensors
13.9 General tensor algebra
13.10 Alternating tensors
13.11 Alternating tensor algebra
13.12 Tensor products defined via free linear spaces
13.13 Tensor products defined via lists of tensor monomials
This chapter presents only the algebra of tensor spaces. Tensor space concepts such as differential forms
(Section 20.5) and the exterior derivative (Section 20.6) require differential and integral calculus for their
treatment. These analysis-dependent topics are therefore delayed until Chapter 20, following the required
analysis concepts.
13.1. The meaning of tensors
13.1.1 Remark: Tensors are central to general relativity and differential geometry in general. For example,
Einstein’s equations in Remark 39.3.2 are expressed in terms of the Ricci curvature tensor R, the metric
tensor g and the stress-energy tensor T . Although tensors arise naturally and inevitably in differential
geometry, they are difficult to define in an intuitively obvious way.
Even in diagrams, it is difficult to express non-trivial tensor concepts. For example, general tensors of rank 2
in three dimensions span a 9-dimensional space, which is, of course, difficult to represent diagrammatically
on paper. Symmetric tensors of rank 2 in three dimensions span a 6-dimensional space, which is also
difficult, but the corresponding antisymmetric tensors span a 3-dimensional space, which is in principle
representable as 3-dimensional diagrams, but this “cross product” representation has many difficulties with
interpretation. Any tensors which go beyond rank 2 and 3 dimensions are even more difficult to grasp
intuitively or diagrammatically.
It is probably not an exaggeration to say that the tensor concept is the major single source of pain when
learning general relativity or differential geometry.
13.1.2 Remark: Any attempt to define general tensors in a way which is at once simple, mathematically
correct and intuitively clear is futile. However, tensors may be defined in terms of monomial tensors. (These
are also known as “simple tensors”. See Definition 13.6.6.) The following is an informal definition of a
monomial tensor.
A monomial tensor is the effect of a sequence of vector arguments on a multilinear function.


This may be abbreviated as follows.

A monomial tensor is the multilinear effect of a sequence of vectors.
This concept is useful because multilinear functions do appear frequently in non-trivial ways in differential
geometry. Before attempting to interpret the above definition, it is helpful to first consider some examples
of multilinear functions.
The area of a triangle is one of the most fundamental concepts in Euclidean geometry. Let the vertices of
a triangle be A, B and C. Let v1 and v2 denote the vectors $\overrightarrow{AB}$ and $\overrightarrow{AC}$ respectively. Then the area of
the triangle ABC is proportional to the length of v1 for fixed v2 and is proportional to the length of v2 for
fixed v1 , if it is assumed that the directions of these vectors are fixed. If this example is examined more
closely, it may be seen that the (signed) area is fully bilinear with respect to the pair of vectors. Similarly, the
(signed) volume of a tetrahedron is a trilinear function of the edge vectors at any one of its vertices.
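The bilinearity of the triangle area can be checked with exact arithmetic. The following Python sketch assumes a triangle in the plane Q^2 and uses the signed area (half the 2×2 determinant of the edge vectors), since it is the signed quantity which is bilinear. The helper names `signed_area`, `add` and `scale` are introduced only for this illustration.

```python
from fractions import Fraction

def signed_area(v1, v2):
    """Signed area of the triangle with edge vectors v1 = AB and v2 = AC at vertex A."""
    return Fraction(1, 2) * (v1[0] * v2[1] - v1[1] * v2[0])

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(c, v):
    return (c * v[0], c * v[1])

v1, v2, w = (3, 1), (1, 2), (-2, 5)
a, b = 4, -3

# Linearity in the first argument for a fixed second argument.
assert signed_area(add(scale(a, v1), scale(b, w)), v2) == a * signed_area(v1, v2) + b * signed_area(w, v2)
# Linearity in the second argument for a fixed first argument.
assert signed_area(v1, add(scale(a, v2), scale(b, w))) == a * signed_area(v1, v2) + b * signed_area(v1, w)
```

The unsigned area is the absolute value of this quantity, so it is proportional to the length of each edge vector for fixed directions, but it is not additive.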
A second example of a multilinear function is the second-order component of the Maclaurin series of a twice
continuously differentiable function $\phi : \mathbb{R}^n \to \mathbb{R}$ for any n ≥ 2. The second order term has the form
$(x, y) \mapsto \frac{1}{2} \sum_{i,j=1}^n a_{ij} x_i y_j$ for $x, y \in \mathbb{R}^n$, where
$a_{ij} = \partial^2 \phi(z) / \partial z_i \partial z_j \big|_{z=0}$ for i, j = 1 . . . n. This expression is
clearly linear with respect to each of x and y when the other is fixed. So this is a bilinear function. Similarly,
the third order term is trilinear and the kth order term is k-linear.
A monomial tensor is supposed to be the effect of a sequence of vectors on a multilinear function. Thus
if f (v1 , v2 ) is a bilinear function of v1 and v2 , and the sequences (v1 , v2 ) and (v1′ , v2′ ) satisfy f (v1 , v2 ) =
f (v1′ , v2′ ), then the two vector sequences have the same effect on f because they give f the same value.
So if f (v1 , v2 ) = f (v1′ , v2′ ) for all bilinear functions f , then the monomial tensors for these two pairs
are identical. In other words, a monomial tensor is effectively a vector sequence, subject to an equivalence
relation.
As an example, let v1 ∈ V1 and v2 ∈ V2 be vectors in real linear spaces V1 and V2 . The tensor v1 ⊗ v2 is
defined as the map v1 ⊗ v2 : f ↦ f (v1 , v2 ) evaluated on the space of bilinear functions $f : V_1 \times V_2 \to \mathbb{R}$. In
other words, v1 ⊗ v2 is the effect of the pair (v1 , v2 ) on bilinear functions $f : V_1 \times V_2 \to \mathbb{R}$. Thus v1 ⊗ v2 is
the “bilinear effect” of the pair (v1 , v2 ).
More generally, if $(v_i)_{i=1}^m \in \times_{i=1}^m V_i$ is a sequence of m vectors in linear spaces $V_i$ over a field K, the tensor
product $v_1 \otimes \ldots \otimes v_m$ is defined as the effect $f \mapsto f(v_1, \ldots v_m)$ of the sequence $(v_1, \ldots v_m)$ on multilinear
functions $f : \times_{i=1}^m V_i \to K$.
A general tensor is an arbitrary sum of multilinear effects of sequences of vectors.
A tensor is an arbitrary sum of monomial tensors.
Each individual sequence of vectors $(v_1, \ldots v_m)$ has a multilinear effect which is a map from multilinear functions
$f : \times_{i=1}^m V_i \to K$ to values $f(v_1, \ldots v_m) \in K$. The set of these maps is not itself closed under addition, but it
generates a linear space over K with respect to pointwise addition and scalar multiplication. It is this linear
space of arbitrary sums of maps which is called a “tensor space”. The map for any individual sequence of
vectors is called a “tensor monomial”.
13.1.3 Remark: The construction of tensors from linear spaces is similar to the way a (finite-dimensional)
linear space may be reconstructed from its double dual. A vector v in a linear space V may be thought of
as the “linear effect” of v on the dual space V ∗ of linear functionals on V . That is, the value λ(v) for
v ∈ V and λ ∈ V ∗ may be regarded as the value of a map µ(v) : V ∗ → K defined by µ(v) : λ ↦ λ(v). Thus
µ : V → V ∗∗ is a linear space isomorphism. It may seem useless to replace a linear space with the dual of
its dual space. Nothing is gained in the case of linear functionals. But in the case of multilinear functionals
$f : \times_{i=1}^m V_i \to \mathbb{R}$ on more than one linear space, the “dual of the dual” is not isomorphic to the original
space V , nor to the Cartesian product $\times_{i=1}^m V_i$. The construction method for tensors is the same as for the
dual of the dual, but the constructed space has dimension $\prod_{i=1}^m \dim(V_i)$, which is in general different from the
dimension $\sum_{i=1}^m \dim(V_i) = \dim(\times_{i=1}^m V_i)$. Therefore the tensor product space is in general not isomorphic
to the Cartesian product. It is a very different kind of product with a very different algebra.
13.1.4 Remark: Tensors are not quite as perplexing as they seem at first. We are used to thinking of
the ratios 3/4 and 6/8 as exactly the same thing. In the same way, the tensor products (0, 2) ⊗ (3, 4) and
(0, 1) ⊗ (6, 8) are exactly the same thing. Any bilinear function acting on these vector pairs will give the
same result. This is like saying that if you multiply by 3 and divide by 4, you get the same “effect” as
multiplying by 6 and dividing by 8. We could say that 3/4 and 6/8 have the same “fractional effect”.
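The equality in Remark 13.1.4 can be verified directly. In the following Python sketch, it is assumed that every bilinear function on Q^2 × Q^2 has the form f(v, w) = Σ a[i][j] v[i] w[j], so the “bilinear effect” of a pair (v, w) is determined by the array of products v[i] w[j]. The names `outer` and `bilinear` are hypothetical helpers for this illustration.

```python
def outer(v, w):
    """Array of products v[i] * w[j], which determines the bilinear effect of (v, w)."""
    return [[vi * wj for wj in w] for vi in v]

def bilinear(a, v, w):
    """Evaluate the bilinear function with coefficient array a on the pair (v, w)."""
    return sum(a[i][j] * v[i] * w[j] for i in range(len(v)) for j in range(len(w)))

# The two vector pairs differ, but their product arrays coincide.
assert outer((0, 2), (3, 4)) == outer((0, 1), (6, 8))

# Consequently any bilinear function gives the same value on both pairs.
a = [[1, -2], [5, 7]]   # an arbitrary coefficient array chosen for the example
assert bilinear(a, (0, 2), (3, 4)) == bilinear(a, (0, 1), (6, 8))
```

This mirrors the fraction analogy: just as 3/4 and 6/8 are different numeral pairs with the same ratio, the two vector pairs are different sequences with the same product array.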

13.1.5 Remark: An alternating tensor is the “antisymmetric multilinear effect” of a sequence of vectors.
Alternating tensors arise naturally as line elements, area elements and volume elements in multi-dimensional
integration theory.
A symmetric tensor is the symmetric multilinear effect of a sequence of vectors. However, there is (probably)
no real need to develop the theory of symmetric tensors because the algebra of symmetric tensors is not very
interesting.
13.1.6 Remark: The set of “multilinear effects” of vector sequences is not closed under addition. The
closure of this set under addition is called a “tensor (product) space”.
13.1.7 Remark: The subject of tensors may be thought of as “multilinear algebra”. It seems to have
started with Hermann Günther Grassmann in the middle of the 19th century. The topic of alternating tensors
is called exterior algebra, exterior calculus or Grassmann algebra. (See Federer [106], page 8 and Frankel [19],
page 66 for notes on Grassmann as the originator of the exterior calculus.)
13.1.8 Remark: Three ways of defining tensor products are presented in this chapter.
(i) Characterization in terms of a multilinear map (Metadefinition 13.5.1).
(ii) The dual of the linear space of multilinear maps (Definition 13.6.2).
(iii) The quotient of a free linear space by a set of multilinear identities (Definition 13.12.1).
Metadefinition 13.5.1 defines tensor space representations in general. Definitions 13.6.2 (dual of multilinear
maps) and 13.12.1 (quotient of free linear space) are particular tensor space representations. All tensor space
representations (for a particular sequence of linear spaces) are related by a unique isomorphism. Therefore
calculations in all representations give the same answers.
The appearance of multiple representations in the literature is not peculiar to tensors. For example, the
real numbers may be represented as either Dedekind cuts, equivalence classes of Cauchy sequences, or
binary expansions. Tangent vectors on manifolds may be represented as coordinates, differential operators
or equivalence classes of differentiable curves. The acceptance of multiple representations may be thought
of as “outsourcing” the construction of mathematical systems to multiple definition providers subject to
interoperability standards.
[ The word “tensor” was possibly introduced by Hamilton in 1846. Check this. ]
13.1.9 Remark: Some useful references for this chapter are Frankel [19], sections 2.4–2.6, Federer [106],
chapter 1, Warner [50], chapter 2, EDM2 [35], article 256.I–O, Gallot/Hulin/Lafontaine [20], page 36, and
Crampin/Pirani [12], page 101.
13.1.10 Remark: Simple vectors and differential forms are not sufficient to do a full range of algebra
and calculus in manifolds. The derivative of a vector field is not a vector field. It is a new kind of object.
Similarly, the products of differentials which are required for integration are not simple differentials. These
new kinds of objects are “tensors”. A tensor is a kind of product of vectors or differentials or mixtures of
vectors and differentials.
13.1.11 Remark: A tensor may be characterized as “the multilinear effect of a sequence of vectors”. This
is not easy to understand. It is easier to first understand alternating tensors, which may be thought of
as “the antisymmetric multilinear effect of a sequence of vectors”. This sounds even more complex. But
consider the example of a pair of vectors v1 and v2 . The “antisymmetric multilinear effect” of this pair
of vectors is just the area spanned by the two vectors. The word “area” here means “directed area”,
not just the amount of area. This directed area is denoted v1 ∧ v2 . (The symbol “∧” is pronounced
“wedge”.) Since this area has a direction, reversing one of the vectors changes the direction to the opposite:
v1 ∧ (−v2 ) = −(v1 ∧ v2 ) = (−v1 ) ∧ v2 . When the order of multiplication is swapped, the resulting area has
the opposite direction. So v2 ∧ v1 = −(v1 ∧ v2 ). It follows that v2 ∧ v1 + v1 ∧ v2 = 0.
As illustrated in Figure 13.1.1, the antisymmetric multilinear effect of the pair of vectors v1 and v1 + v2 is the
same as for the pair v1 and v2 . This is because v1 ∧ (v1 + v2 ) = v1 ∧ v1 + v1 ∧ v2 by linearity with respect to the
second factor and v1 ∧ v1 = −(v1 ∧ v1 ) by antisymmetry. So v1 ∧ v1 = 0. Hence v1 ∧ (v1 + v2 ) = v1 ∧ v2 . Similarly,
0.5(v1 − v2 ) ∧ (v1 + v2 ) = (v1 − 0.5(v1 + v2 )) ∧ (v1 + v2 ) = v1 ∧ (v1 + v2 ) − 0.5(v1 + v2 ) ∧ (v1 + v2 ) = v1 ∧ (v1 + v2 ).
Figure 13.1.1 Equivalent antisymmetric multilinear effect of vector pairs:
v1 ∧ v2 = v1 ∧ (v1 + v2 ) = 0.5(v1 − v2 ) ∧ (v1 + v2 ) = (2v1 ) ∧ (0.5v2 )
By a happy coincidence, the parallelograms subtended by these vector pairs have the same area. This coincidence
holds generally for vector sequences with the same “antisymmetric multilinear effect”. So alternating
tensors are used for integration of functions on curves, surfaces and volumes. The line, area and volume
elements for (standard) integration are antisymmetric multilinear products of vectors. This explains the
importance of alternating tensors in the differential geometry literature.
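The equalities shown in Figure 13.1.1 can be checked numerically. The following Python sketch assumes that the vectors lie in the plane Q^2, where the directed area v1 ∧ v2 has a single component, namely the 2×2 determinant of the pair. The function name `wedge` and the sample vectors are chosen only for this illustration.

```python
from fractions import Fraction

def wedge(u, v):
    """Single component of u ∧ v in two dimensions: the directed parallelogram area."""
    return u[0] * v[1] - u[1] * v[0]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

def scale(c, v):
    return (c * v[0], c * v[1])

v1, v2 = (3, 1), (1, 2)
half = Fraction(1, 2)

pairs = [
    (v1, v2),
    (v1, add(v1, v2)),
    (scale(half, sub(v1, v2)), add(v1, v2)),
    (scale(2, v1), scale(half, v2)),
]
areas = [wedge(u, v) for u, v in pairs]
assert len(set(areas)) == 1          # all four pairs span the same directed area

assert wedge(v2, v1) == -wedge(v1, v2)   # swapping the factors reverses the direction
assert wedge(v1, v1) == 0                # a repeated vector spans zero area
```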
13.2. Multilinear maps

A multilinear map is a vector-valued function of multiple vector variables which depends linearly on each
variable individually.
13.2.1 Remark: Definition 13.2.3 means that f : ×α∈A Vα → U is multilinear if f is linear with respect to
each space Vβ individually for fixed values of vα ∈ Vα for indices α ∈ A with α ≠ β. This requirement for
linearity with respect to each variable for fixed values of other variables is similar to the definition of partial
derivatives, which also require all but one of the variables to be fixed.
13.2.2 Remark: Definition 13.2.3 uses the “substitution operator” notation which defines $E(x)\big|_{x=V}$ to
mean the substitution of expression V into expression E(x) wherever the variable x occurs. This is a text-level
metanotational convention. Probability theory uses a similar convention as in Prob(X|E) for logical
expressions E. The notation {x ∈ S; P (x)} for the restriction of a set S by a condition P is a comparable
text-level metanotation in which a proposition appears.
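13.2.3 Definition: A multilinear map from a Cartesian product ×α∈A Vα of linear spaces (Vα )α∈A over a
field K to a linear space U over K is a map f : ×α∈A Vα → U such that
$$\forall v \in \times_{\alpha \in A} V_\alpha,\ \forall \beta \in A,\ \forall w_1, w_2 \in V_\beta,\ \forall \lambda_1, \lambda_2 \in K,$$
$$f\big(v\big|_{v_\beta = \lambda_1 w_1 + \lambda_2 w_2}\big) = \lambda_1 f\big(v\big|_{v_\beta = w_1}\big) + \lambda_2 f\big(v\big|_{v_\beta = w_2}\big). \qquad (13.2.1)$$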
13.2.4 Remark: The linearity of a multilinear map with respect to all component spaces of its domain is
illustrated in Figure 13.2.1.
When #(A) = 1 in Definition 13.2.3, the map f is linear.
When #(A) = 2, the map is said to be bilinear.
When #(A) = 3, the map is said to be trilinear, and so forth.
Figure 13.2.2 gives an alternative style of visualization for the definition of multilinear functions. It is
important to keep in mind that the “axes” for the Cartesian product V1 × V2 × V3 represent the entire spaces
V1 , V2 and V3 , not one-dimensional subspaces of these spaces.
13.2.5 Notation: $\mathscr{L}((V_\alpha)_{\alpha \in A}; U)$ for a family $(V_\alpha)_{\alpha \in A}$ of linear spaces over a field K and a linear space
U over K denotes the set of multilinear maps from $\times_{\alpha \in A} V_\alpha$ to U .
13.2.6 Notation: $\mathscr{L}(V_1, \ldots V_m; U)$ for a sequence of $m \in \mathbb{Z}_0^+$ linear spaces $(V_i)_{i=1}^m$ and a linear space U
over a field K denotes the set $\mathscr{L}((V_\alpha)_{\alpha \in A}; U)$ with $A = \mathbb{N}_m = \{1, \ldots m\}$.

Figure 13.2.1 Linearity of a multilinear function with respect to domain components (trilinear map f : V1 × V2 × V3 → U )
Figure 13.2.2 Definition of multilinear function (f ∈ L (V1 , V2 , V3 ; U ))
13.2.7 Notation: $\mathscr{L}_m(V, U)$ for $m \in \mathbb{Z}_0^+$ and linear spaces V and U over a field K denotes the set of
multilinear maps $\mathscr{L}((V_\alpha)_{\alpha \in A}; U)$ with $A = \mathbb{N}_m$ and $V_\alpha = V$ for all α.
13.2.8 Remark: It follows from Definition 13.2.3 that the value of a multilinear function is zero if any of
its arguments is zero. (See Theorem 13.2.9 (i).) This is a consequence of the multiplicative nature of the
tensor product space in contrast to the additive nature of the Cartesian product space.
The set of multilinear maps L (V ; U ) is the same thing as the set of linear space homomorphisms Hom(V, U )
defined in Section 10.3. If U is the field K regarded as a linear space over K, then L (V ; K) = Hom(V, K)
is the linear space dual V ∗ of V as in Section 10.5.
13.2.9 Theorem: Let (Vα )α∈A be a finite family of linear spaces over a field K and U be a linear space
over K. Then:
(i) ∀f ∈ L ((Vα )α∈A ; U ), ∀β ∈ A, ∀v ∈ ×α∈A Vα , (vβ = 0 ⇒ f (v) = 0).
Proof: For (i), let v = (vα )α∈A with vβ = 0 for some β ∈ A. In Definition 13.2.3, let w1 = w2 = 0 and
λ1 = λ2 = 0. Then λ1 w1 + λ2 w2 = 0 = vβ , so line (13.2.1) gives f (v) = 0·f (v) + 0·f (v) = 0.
13.2.10 Example: Figure 13.2.3 illustrates a bilinear function f from linear spaces $V_1 = V_2 = \mathbb{R}^1$ to the
linear space $U = \mathbb{R}^1$.
All multilinear functions from $\mathbb{R}^1 \times \mathbb{R}^1$ to $\mathbb{R}^1$ are of the form f (x1 , x2 ) = kx1 x2 for some $k \in \mathbb{R}$. The
diagram uses k = 1. It is clear from the contour curves that f is not linear. A linear function of two variables
would have straight lines for contours. However, the value of f (x1 , x2 ) for a constant value of x1 clearly
varies linearly with respect to x2 .
Figure 13.2.3 Bilinear function $f : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ with f (x1 , x2 ) = x1 x2 (contour curves)
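The distinction between bilinear and linear in Example 13.2.10 can be made concrete with a few integer checks. This is a minimal Python sketch; the function name `f` follows the example and the sample values are arbitrary.

```python
def f(x1, x2, k=1):
    """The bilinear function f(x1, x2) = k * x1 * x2 of Example 13.2.10."""
    return k * x1 * x2

# Linear in x1 for fixed x2, and linear in x2 for fixed x1.
assert f(2 * 3 + 5 * (-1), 4) == 2 * f(3, 4) + 5 * f(-1, 4)
assert f(3, 2 * 4 + 5 * 7) == 2 * f(3, 4) + 5 * f(3, 7)

# Not linear as a function of the pair (x1, x2): doubling the pair quadruples the value.
assert f(1 + 1, 1 + 1) != f(1, 1) + f(1, 1)
assert f(2, 2) == 4 * f(1, 1)

# The value is zero whenever either argument is zero.
assert f(0, 7) == 0 and f(7, 0) == 0
```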
13.2.11 Remark: Theorem 13.2.12 re-expresses Definition 13.2.3 in terms of a logical expression which
does not use the substitution metanotation in Remark 13.2.2. However, expression (13.2.2) is probably less
easy to interpret than (13.2.1).
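13.2.12 Theorem: A map f from a Cartesian product ×α∈A Vα of linear spaces (Vα )α∈A over a field K
to a linear space U over K is multilinear if and only if
$$\forall \beta \in A,\ \forall \lambda_1, \lambda_2 \in K,\ \forall u, v, w \in \times_{\alpha \in A} V_\alpha,$$
$$\big(u_\beta = \lambda_1 v_\beta + \lambda_2 w_\beta \text{ and } \forall \alpha \in A \setminus \{\beta\},\ u_\alpha = v_\alpha = w_\alpha\big) \Rightarrow f(u) = \lambda_1 f(v) + \lambda_2 f(w). \qquad (13.2.2)$$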
13.2.13 Remark: Linear homomorphisms are defined for a broader class than linear spaces. For example,
maps between modules over a commutative ring have a well-defined concept of linearity. (For such maps,
see Definition 9.9.25.) Multilinearity of maps is well-defined for these more general classes. So tensor spaces
are well-defined for these classes also. (See for example Bump [97], chapter 9.)
13.2.14 Remark: In Section 11.2, the correspondence between matrices and linear maps was presented.
There is also a useful correspondence between matrices and bilinear functions on a linear space.
Let V be an n-dimensional linear space over a field K with basis $(e_i)_{i=1}^n \in V^n$. Let α : V × V → K be a
bilinear map on V . For $i, j \in \mathbb{N}_n$, let $a_{ij} = \alpha(e_i, e_j)$. The matrix $a \in M_{n,n}(K)$ then satisfies
$$\forall v, w \in V,\quad \alpha(v, w) = \alpha\Big(\sum_{i=1}^n v_i e_i, \sum_{j=1}^n w_j e_j\Big) = \sum_{i=1}^n \sum_{j=1}^n v_i w_j\, \alpha(e_i, e_j) = \sum_{i,j=1}^n a_{ij} v_i w_j,$$
where $v = \sum_{i=1}^n v_i e_i$ and $w = \sum_{j=1}^n w_j e_j$ express v and w in terms of the given basis.
Conversely, for any matrix $a \in M_{n,n}(K)$, the map $(v, w) \mapsto \sum_{i,j=1}^n a_{ij} v_i w_j$ for v, w ∈ V defines a bilinear map
on V . The correspondence between the matrices and the bilinear maps is one-to-one and onto. Consequently
matrices offer an equivalent way of expressing bilinear maps. (This is illustrated in Figure 13.2.4, which is
similar to Figure 11.2.1 but not the same.)
The matrix representation requires a basis and the matrix is different for each basis. Nevertheless the matrix
representation is the most usual way of specifying bilinear maps in practical applications.
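The correspondence in Remark 13.2.14 may be sketched in Python as follows, assuming V = Q^3 with its standard basis, so that the components of a vector are simply its coordinates. The bilinear map `alpha` and the helper names `matrix_of` and `eval_from_matrix` are hypothetical examples introduced for this illustration.

```python
def alpha(v, w):
    """An arbitrary example of a bilinear map on Q^3."""
    return 2 * v[0] * w[1] - v[2] * w[0] + 5 * v[1] * w[1]

def matrix_of(bilinear_map, basis):
    """Matrix a with a[i][j] = bilinear_map(e_i, e_j) for the given basis."""
    return [[bilinear_map(ei, ej) for ej in basis] for ei in basis]

def eval_from_matrix(a, v, w):
    """Evaluate sum_{i,j} a[i][j] * v[i] * w[j] from components v, w with respect to the basis."""
    n = len(a)
    return sum(a[i][j] * v[i] * w[j] for i in range(n) for j in range(n))

basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
a = matrix_of(alpha, basis)

# The matrix reproduces the bilinear map on arbitrary vectors.
v, w = (1, 2, 3), (4, 5, 6)
assert eval_from_matrix(a, v, w) == alpha(v, w)
```

For a different basis the matrix would change, and the components of v and w would have to be computed with respect to that basis before calling `eval_from_matrix`.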

Figure 13.2.4 Components of a bilinear map with respect to a basis
13.3. Linear spaces of multilinear maps

13.3.1 Remark: The pointwise addition and scalar multiplication operations for L ((Vα )α∈A ; U ) are defined
so that
(λ1 f1 + λ2 f2 )((vα )α∈A ) = λ1 f1 ((vα )α∈A ) + λ2 f2 ((vα )α∈A )
for all f1 , f2 ∈ L ((Vα )α∈A ; U ), λ1 , λ2 ∈ K and (vα )α∈A ∈ ×α∈A Vα .
13.3.2 Theorem: The set L ((Vα )α∈A ; U ) is closed under pointwise addition and scalar multiplication.
13.3.3 Definition: The linear space of multilinear maps from the family (Vα )α∈A of linear spaces over a
field K to a linear space U over K is the set L ((Vα )α∈A ; U ) together with the operations of pointwise vector
addition and scalar multiplication.
13.3.4 Remark: Strictly speaking, the linear space of multilinear maps in Definition 13.3.3 is the tuple
L ((Vα )α∈A ; U ) −< (K, V̄ , σK , τK , σV̄ , µK ), where K −< (K, σK , τK ) is the field, V̄ = L ((Vα )α∈A ; U ) is the
set of maps, σV̄ : V̄ × V̄ → V̄ denotes pointwise addition on V̄ , and µK : K × V̄ → V̄ denotes pointwise
multiplication of elements of V̄ by elements of K. (See Definition 10.1.2 for linear spaces.)
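13.3.5 Remark: Let $(e_{\alpha,i})_{i=1}^{n_\alpha}$ be a basis for the linear space $V_\alpha$ for α ∈ A, so that $n_\alpha = \dim(V_\alpha)$ for α ∈ A.
Let $c : \times_{\alpha \in A} \mathbb{N}_{n_\alpha} \to K$ be a family of coefficients in the field K. (Thus c(j) ∈ K for families of indices
$j = (j_\alpha)_{\alpha \in A} \in \times_{\alpha \in A} \mathbb{N}_{n_\alpha}$.)
Let $(v_\alpha^i)_{i=1}^{n_\alpha}$ be the coefficients of $v_\alpha$ with respect to $(e_{\alpha,i})_{i=1}^{n_\alpha}$ for α ∈ A. Thus
$v_\alpha = \sum_{i=1}^{n_\alpha} v_\alpha^i e_{\alpha,i}$ for α ∈ A.
Define the map $f_c : \times_{\alpha \in A} V_\alpha \to K$ by
$$f_c : (v_\alpha)_{\alpha \in A} \mapsto \sum_{j \in \times_{\alpha \in A} \mathbb{N}_{n_\alpha}} c(j) \prod_{\alpha \in A} v_\alpha^{j_\alpha}.$$
Then $f_c \in \mathscr{L}((V_\alpha)_{\alpha \in A}; K)$.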
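The construction of the map f_c in Remark 13.3.5 can be sketched in Python. This sketch assumes that A = {0, 1, …, m−1}, that each V_α is Q^{n_α} with its standard basis, and that indices are 0-based. The name `make_fc` and the dictionary representation of the coefficient family c are assumptions made for this illustration.

```python
from itertools import product
from math import prod

def make_fc(c, dims):
    """Build f_c from a coefficient family c (dict: index tuple -> coefficient, default 0).

    dims is the tuple (n_alpha) of dimensions; the returned function takes a tuple
    of component tuples (v_alpha) and returns sum_j c(j) * prod_alpha v_alpha[j_alpha].
    """
    def fc(vs):
        return sum(
            c.get(j, 0) * prod(v[ja] for v, ja in zip(vs, j))
            for j in product(*(range(n) for n in dims))
        )
    return fc

dims = (2, 3)
c = {(0, 0): 1, (1, 2): 4}
fc = make_fc(c, dims)

# Evaluating on basis vectors recovers the coefficients: f_c(e_{0,1}, e_{1,2}) = c(1, 2).
assert fc(((0, 1), (0, 0, 1))) == c[(1, 2)]

# Linearity in the first argument for a fixed second argument.
u, w, x = (1, 2), (3, -1), (3, 5, 7)
combo = tuple(2 * ui + 5 * wi for ui, wi in zip(u, w))
assert fc((combo, x)) == 2 * fc((u, x)) + 5 * fc((w, x))
```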

[ Maybe show that $f_c\big((e_{\alpha,I(\alpha)})_{\alpha \in A}\big) = c\big((I(\alpha))_{\alpha \in A}\big)$ for $I \in \times_{\alpha \in A} \mathbb{N}_{n_\alpha}$? ]
[ Show how Remark 13.3.5 specializes to A = {1, 2}. Also comment on how this embeds the dual of the dual
in the primal space. ]
[ Can Remark 13.3.5 be extended to target spaces U ≠ K? Show that the above gives a basis for the space of
multilinear maps. Deduce the dimension $\dim(\mathscr{L}((V_\alpha)_{\alpha \in A}; \mathbb{R})) = \prod_{\alpha \in A} \dim(V_\alpha)$ from this (or otherwise). ]
[ Theorem 13.3.6 is closely related to Theorem 13.6.17 for the tensor product space. ]

13.3.6 Theorem: Let (Vα )α∈A be a finite family of finite-dimensional linear spaces over a field K. Then
$$\dim\big(\mathscr{L}((V_\alpha)_{\alpha \in A}; K)\big) = \prod_{\alpha \in A} \dim(V_\alpha).$$
[ Show a canonical isomorphism L ((Vα∗ )α∈A ; K)∗ ≅ L ((Vα )α∈A ; K). Frankel [19] gives L ((Vα∗ )α∈A ; K) as
the definition of the space of contravariant tensors and L ((Vα )α∈A ; K) as the space of covariant tensors.
See comment after Theorem 13.7.3. ]
13.4. Symmetric and antisymmetric multilinear maps

13.4.1 Remark: In Definitions 13.4.2 and 13.4.3 for symmetric and antisymmetric multilinear maps, recall
that a permutation of a finite set A is any bijection P : A → A. (See Definition 7.10.3.) When requiring a
symmetry for all permutations, all linear spaces in the product must be the same.
13.4.2 Definition: A symmetric multilinear map from a Cartesian product $V^m$ of a linear space V over a
field K, for $m \in \mathbb{Z}_0^+$, to a linear space U over K is a multilinear map $f \in \mathscr{L}_m(V, U)$ such that
$f(\mathrm{swap}_{j,k}(v)) = f(v)$ for all $v = (v_i)_{i=1}^m \in V^m$ and $j, k \in \mathbb{N}_m$.
13.4.3 Definition: An antisymmetric multilinear map from a Cartesian product $V^m$ of a linear space
V over a field K, for $m \in \mathbb{Z}_0^+$, to a linear space U over K is a multilinear map $f \in \mathscr{L}_m(V, U)$ such that
$f(\mathrm{swap}_{j,k}(v)) = -f(v)$ for all $v = (v_i)_{i=1}^m \in V^m$ and $j, k \in \mathbb{N}_m$ such that j ≠ k.
An antisymmetric multilinear map is also called an alternating multilinear map.
13.4.4 Notation: $\mathscr{L}_m^+(V, U)$ for $m \in \mathbb{Z}_0^+$ and linear spaces U and V denotes the set of symmetric
multilinear maps from $V^m$ to U . Thus
$$\mathscr{L}_m^+(V, U) = \{f \in \mathscr{L}_m(V, U);\ \forall j, k \in \mathbb{N}_m,\ f \circ \mathrm{swap}_{j,k} = f\}.$$
13.4.5 Notation: $\mathscr{L}_m^-(V, U)$ for $m \in \mathbb{Z}_0^+$ and linear spaces U and V denotes the set of antisymmetric
multilinear maps from $V^m$ to U . Thus
$$\mathscr{L}_m^-(V, U) = \{f \in \mathscr{L}_m(V, U);\ \forall j, k \in \mathbb{N}_m,\ (j \neq k \Rightarrow f \circ \mathrm{swap}_{j,k} = -f)\}.$$
13.4.6 Theorem: The set $\mathscr{L}_m^+(V, U)$ is closed under pointwise addition and scalar multiplication.
13.4.7 Theorem: The set $\mathscr{L}_m^-(V, U)$ is closed under pointwise addition and scalar multiplication.
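13.4.8 Theorem: Let $f \in \mathscr{L}_m^+(V, U)$ for $m \in \mathbb{Z}_0^+$ and linear spaces V and U over a field K. Then for all
permutations $P : \mathbb{N}_m \to \mathbb{N}_m$ and $v = (v_i)_{i=1}^m \in V^m$, $f\big((v_{P(i)})_{i=1}^m\big) = f\big((v_i)_{i=1}^m\big)$. In other words,
$$\forall f \in \mathscr{L}_m^+(V, U),\ \forall P \in \mathrm{perm}(\mathbb{N}_m),\ \forall v \in V^m,\quad f(v \circ P) = f(v).$$
13.4.9 Theorem: Let $f \in \mathscr{L}_m^-(V, U)$ for $m \in \mathbb{Z}_0^+$ and linear spaces V and U over a field K. Then for all
permutations $P : \mathbb{N}_m \to \mathbb{N}_m$ and $v = (v_i)_{i=1}^m \in V^m$, $f\big((v_{P(i)})_{i=1}^m\big) = \mathrm{parity}(P)\, f\big((v_i)_{i=1}^m\big)$. In other words,
$$\forall f \in \mathscr{L}_m^-(V, U),\ \forall P \in \mathrm{perm}(\mathbb{N}_m),\ \forall v \in V^m,\quad f(v \circ P) = \mathrm{parity}(P)\, f(v).$$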
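Theorems 13.4.8 and 13.4.9 can be illustrated with two familiar trilinear maps on Q^3. The following Python sketch takes the determinant of three vectors as an example of an antisymmetric map and the product of first components as an example of a symmetric map. The function names and the 0-based permutation tuples are choices made for this illustration only.

```python
from itertools import permutations

def parity(p):
    """Parity (+1 or -1) of a permutation given as a tuple of 0-based images."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(vs):
    """Determinant of a sequence of n vectors in Q^n by the Leibniz formula (antisymmetric, n-linear)."""
    n = len(vs)
    total = 0
    for p in permutations(range(n)):
        term = parity(p)
        for i in range(n):
            term *= vs[i][p[i]]
        total += term
    return total

def first_components(vs):
    """Product of the first components of the vectors (symmetric, multilinear)."""
    result = 1
    for v in vs:
        result *= v[0]
    return result

v = ((2, 0, 1), (1, 3, -1), (0, 5, 4))
for p in permutations(range(3)):
    permuted = tuple(v[p[i]] for i in range(3))      # the sequence v ∘ P
    assert det(permuted) == parity(p) * det(v)       # Theorem 13.4.9
    assert first_components(permuted) == first_components(v)   # Theorem 13.4.8
```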

13.4.10 Theorem: Let V be a finite-dimensional linear space over a field K, and $m \in \mathbb{Z}_0^+$. Then
$$\dim\big(\mathscr{L}_m(V, K)\big) = \dim(V)^m,$$
$$\dim\big(\mathscr{L}_m^-(V, K)\big) = C^{\dim(V)}_m,$$
$$\dim\big(\mathscr{L}_m^+(V, K)\big) = C^{\dim(V)+m-1}_m.$$
Proof: Let $(e_i)_{i=1}^n$ be a basis for V . All elements λ of $\mathscr{L}_m(V, K)$ have the form $\lambda : v \mapsto \sum_{I \in \mathbb{N}_n^m} a_I v_I$
for $v \in V^m$. Since there are no constraints on the coefficients $a_I$, it follows that $\dim(\mathscr{L}_m(V, K)) = \dim(V)^m$.
In the case of $\mathscr{L}_m^-(V, K)$, the coefficients $a_I$ are constrained by the antisymmetry rule $a_I = \mathrm{parity}(P)\, a_{I \circ P}$ for
all permutations $P \in \mathrm{perm}(\mathbb{N}_m)$. From this it follows that $a_I = 0$ for index sequences I with any two indices
equal. The remaining index sequences may be partitioned according to the equivalence relation I ≡ J if and
only if $\exists P \in \mathrm{perm}(\mathbb{N}_m),\ I = J \circ P$. A unique representative may be chosen from each equivalence class
by sorting into increasing order. There is one and only one increasing index sequence in each equivalence
class, and there is one and only one equivalence class for each increasing index sequence. It follows that the
number of equivalence classes equals the number of increasing index sequences in $I^n_m$. But this is equal to
$C^n_m$ by Theorem 7.11.11 (i).
In the case of $\mathscr{L}_m^+(V, K)$, the symmetry rule implies that $a_I = a_J$ whenever $I = J \circ P$ for some $P \in \mathrm{perm}(\mathbb{N}_m)$.
Equivalent index sequences may be partitioned into equivalence classes as in the antisymmetric case, but
coefficients with repeated indices are not set to zero. A unique representative for each equivalence class may
be obtained by sorting into non-increasing order. Since the symmetry rule forces $a_I = a_J$ exactly when the sorted
index sequences of I and J are equal, it follows that the number of independent coefficients is equal to the
cardinality of the set $J^n_m$ of non-increasing maps from $\mathbb{N}_m$ to $\mathbb{N}_n$. By Theorem 7.11.11 (ii), this equals $C^{n+m-1}_m$.
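The counting steps in the proof of Theorem 13.4.10 can be checked by brute force for small n and m. The following Python sketch enumerates all index sequences in {1, …, n}^m and compares the counts with the binomial coefficients, assuming that $C^n_m$ denotes the usual number of m-element subsets of an n-element set and that “increasing” means strictly increasing. The function names are introduced only for this illustration.

```python
from itertools import product
from math import comb

def count_sequences(n, m, keep):
    """Count index sequences I in {1..n}^m which satisfy the predicate keep."""
    return sum(1 for I in product(range(1, n + 1), repeat=m) if keep(I))

def strictly_increasing(I):
    return all(a < b for a, b in zip(I, I[1:]))

def non_increasing(I):
    return all(a >= b for a, b in zip(I, I[1:]))

for n in range(1, 5):
    for m in range(0, 4):
        # All index sequences: dimension of the space of all m-linear maps.
        assert count_sequences(n, m, lambda I: True) == n ** m
        # Strictly increasing sequences: dimension in the antisymmetric case.
        assert count_sequences(n, m, strictly_increasing) == comb(n, m)
        # Non-increasing sequences: dimension in the symmetric case.
        assert count_sequences(n, m, non_increasing) == comb(n + m - 1, m)
```

For example, with n = 3 and m = 2 the three counts are 9, 3 and 6, which agree with the dimensions of general, antisymmetric and symmetric rank-2 tensors in three dimensions quoted in Remark 13.1.1.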
[ The proof of Theorem 13.4.10 is a bit too intuitive. It should be possible to do elementary combinatorics a
bit more rigorously than this. ]
13.4.11 Remark: A different way to specify the antisymmetry in Definition 13.4.3 is to require f (v) = 0
for all $v = (v_i)_{i=1}^m \in V^m$ such that $v_i = v_j$ for some $i, j \in \mathbb{N}_m$ with i ≠ j.
Suppose that the alternative condition is true. Let $i, j \in \mathbb{N}_m$ with i ≠ j. Let $w_1, w_2 \in V$. By the
multilinearity of $f \in \mathscr{L}_m(V, U)$,
$$0 = f\big(v|_{v_i = w_1 + w_2,\, v_j = w_1 + w_2}\big) = f\big(v|_{v_i = w_1,\, v_j = w_1}\big) + f\big(v|_{v_i = w_1,\, v_j = w_2}\big) + f\big(v|_{v_i = w_2,\, v_j = w_1}\big) + f\big(v|_{v_i = w_2,\, v_j = w_2}\big)$$
$$= 0 + f\big(v|_{v_i = w_1,\, v_j = w_2}\big) + f\big(v|_{v_i = w_2,\, v_j = w_1}\big) + 0.$$
Hence $f\big(v|_{v_i = w_1,\, v_j = w_2}\big) = -f\big(v|_{v_i = w_2,\, v_j = w_1}\big)$. Since f is antisymmetric for a simple two-parameter
transposition, the antisymmetry holds for all permutations of parameters. A similar proof yields the converse.
13.5. Tensor product metadefinition

In Sections 13.5 and 13.6, tensors are defined by removing from sequences of vectors any information which
disappears in multilinear maps on those sequences. In other words, a tensor is constructed from sequences
of vectors by equating those sequences of vectors for which any multilinear map gives the same value.
Tensor spaces are first characterized or specified by Metadefinition 13.5.1, which is based on Federer [106],
section 1.1.1, page 9. (See also Gallot/Hulin/Lafontaine [20], page 36.)
13.5.1 Metadefinition: A tensor product space for a finite family (Vα )α∈A of linear spaces over a field
K is a pair (W, µ) such that
(i) W is a linear space over K,
(ii) µ : ×α∈A Vα → W is multilinear, and
(iii) for any pair (U, ν) where U is a linear space and ν : ×α∈A Vα → U is multilinear, there exists a unique
linear map g : W → U such that ν = g ◦ µ.
The map µ is referred to as the canonical multilinear map.
13.5.2 Remark: Metadefinition 13.5.1 is illustrated in Figure 13.5.1. There is a different map g for each
multilinear map ν, but the map µ is unique to the particular tensor product space definition and there is a
unique function g for each pair (µ, ν).

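(Diagram: $\mu : \times_{\alpha\in A} V_\alpha \to W$, $\nu_k : \times_{\alpha\in A} V_\alpha \to U_k$ and $g_k : W \to U_k$ for $k = 1, 2$.)
Figure 13.5.1 Metadefinition of tensor spaces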
13.5.3 Remark: EDM2 [35], section 256.I, calls the canonical multilinear map µ in Metadefinition 13.5.1
the “canonical bilinear mapping” in the case that #(A) = 2, but the noun “mapping” is generally avoided
in this book. The noun “map” is used here instead. Usually the map µ is not surjective. The tensors are
an extension of the range of the canonical map µ. The extension W (minimally) closes the tensor product space
with respect to linear operations. That’s the whole point of tensor spaces. Tensor spaces would be useless
if they were nothing more than the image of a multilinear map.
Both Bump [97], page 50, and Fulton/Harris [109], page 471, call the map µ in Metadefinition 13.5.1 the
“universal” bilinear map in the case #(A) = 2.
13.5.4 Remark: Condition (iii) in Metadefinition 13.5.1 may be interpreted as saying that all of the
information in any multilinear map from the product ×α∈A Vα to any linear space U is contained in the
representation W , because after “filtering” the product through the map µ, it is still possible to reconstruct
any multilinear map ν from the space W via a map g : W → U . The requirement that g should be unique
means that no unnecessary information remains in the space W following the application of the map µ. So
the pair (W, µ) keeps all relevant information and removes all irrelevant information for the construction of
multilinear maps on the cross product. This justifies the claim that tensors are the “multilinear quintessence”
of sequences of vectors.
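Condition (iii) can be illustrated in a concrete model. The sketch below assumes (as a standard fact, not something established in this section) that the space of 2×3 matrices together with the outer product is a tensor product space for rational 2-vectors and 3-vectors. The coefficient array a and the names nu and g are hypothetical.

```python
from fractions import Fraction

def mu(v, w):
    """Outer product: models the canonical bilinear map into W = 2x3 matrices."""
    return [[vi * wj for wj in w] for vi in v]

# Hypothetical coefficients of a bilinear map nu(v, w) = sum_ij a[i][j] v[i] w[j].
a = [[Fraction(1), Fraction(0), Fraction(2)],
     [Fraction(-1), Fraction(3), Fraction(1, 2)]]

def nu(v, w):
    """A bilinear map on the Cartesian product of the two spaces."""
    return sum(a[i][j] * v[i] * w[j] for i in range(2) for j in range(3))

def g(t):
    """The linear map on W which reconstructs nu from the filtered data mu(v, w)."""
    return sum(a[i][j] * t[i][j] for i in range(2) for j in range(3))

v = (Fraction(2), Fraction(-1))
w = (Fraction(1), Fraction(4), Fraction(-3))

# nu factors through mu via the linear map g.
assert nu(v, w) == g(mu(v, w))
```

In this model the map g is forced by ν, since the outer products of basis vectors span W. This is the uniqueness part of condition (iii).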
13.5.5 Remark: All tensor product definitions are equivalent because any two tensor product definitions
will yield isomorphic tensor spaces. It follows from condition (iii) that if two tensor product definitions yield
pairs (W1 , µ1 ) and (W2 , µ2 ) for a family (Vα )α∈A , then there exist unique linear maps g12 : W1 → W2 and g21 : W2 → W1
such that µ2 = g12 ◦ µ1 and µ1 = g21 ◦ µ2 . Then µ1 = (g21 ◦ g12 ) ◦ µ1 , so g21 ◦ g12 is the identity map on W1 by the
uniqueness in condition (iii) applied to the pair (W1 , µ1 ). Similarly g12 ◦ g21 is the identity map on W2 .
Therefore g12 and g21 are linear isomorphisms between W1 and W2 .
This is illustrated in Figure 13.5.2.
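(Diagram: $\mu_1 : \times_{\alpha\in A} V_\alpha \to W_1$ and $\mu_2 : \times_{\alpha\in A} V_\alpha \to W_2$, with $g_{12} : W_1 \to W_2$ and $g_{21} : W_2 \to W_1$.)
Figure 13.5.2 Uniqueness of tensor space up to isomorphism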
13.5.6 Remark: Very importantly, the isomorphism in Remark 13.5.5 between any two representations of
a tensor product is unique. This ensures that any individual tensor in one representation may be identified
with one and only one particular tensor in the other representation. So there is no ambiguity when one
asks: “Which tensor is this?” The same considerations apply to Metadefinition 27.2.1 for tangent bundles
to manifolds.
13.5.7 Theorem: For any two representations (W1 , µ1 ) and (W2 , µ2 ) of the tensor product of a sequence
of linear spaces, there exists a unique isomorphism between the representations which commutes with the
canonical maps µ1 and µ2 .
[ Federer proves Theorem 13.5.7. Must rewrite its statement more precisely. ]
13.5.8 Notation: The tensor product space W in Metadefinition 13.5.1 is denoted as ⊗α∈A Vα .
If $A = \mathbb{N}_m$ for $m \in \mathbb{Z}^+_0$, then the tensor product space may be denoted as $\otimes_{i=1}^m V_i$ or $V_1 \otimes \ldots V_m$.

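13.5.9 Theorem: Let $(V_\alpha)_{\alpha\in A}$ and $(V'_\alpha)_{\alpha\in A}$ be families of linear spaces with the same index set. Then
for any family of linear maps $(f_\alpha)_{\alpha\in A}$ such that $f_\alpha : V_\alpha \to V'_\alpha$ for all $\alpha \in A$, there is a unique linear map
$\otimes_{\alpha\in A} f_\alpha : \otimes_{\alpha\in A} V_\alpha \to \otimes_{\alpha\in A} V'_\alpha$ such that
\[ \forall (v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha,\quad \Big(\otimes_{\alpha\in A} f_\alpha\Big)\Big(\otimes_{\alpha\in A} v_\alpha\Big) = \otimes_{\alpha\in A} f_\alpha(v_\alpha). \]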
Proof: . . .
13.5.10 Remark: Theorem 13.5.9 is illustrated in Figure 13.5.3. The direct product function ×α∈A fα is
introduced in Definition 6.9.11.
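(Diagram: commuting square with $\mu : \times_{\alpha\in A} V_\alpha \to \otimes_{\alpha\in A} V_\alpha$, $\mu' : \times_{\alpha\in A} V'_\alpha \to \otimes_{\alpha\in A} V'_\alpha$, $\times_{\alpha\in A} f_\alpha : \times_{\alpha\in A} V_\alpha \to \times_{\alpha\in A} V'_\alpha$ and $\otimes_{\alpha\in A} f_\alpha : \otimes_{\alpha\in A} V_\alpha \to \otimes_{\alpha\in A} V'_\alpha$.)
Figure 13.5.3 Unique lift from a family of linear maps to a single tensor space map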
[ Maybe also define (anti)symmetric tensors in the definition context. ]

[ When proving properties of tensors, some properties are valid for any representation consistent with the
general definition. Other properties are representation-dependent. Should clearly distinguish between these.
In particular, should try to prove as much as possible in terms of the general definition before presenting the
specific representation. ]
13.6. Tensor products of linear spaces
13.6.1 Remark: The standard definition for tensor products in this book is the dual of the linear space
of multilinear maps from a linear space family (Vα )α∈A to the field K of the linear spaces. This definition is
simpler, clearer and more economical than the representation in Section 13.12 as the quotient space of a free
linear space on ×α∈A Vα with respect to the subspace generated by the set of multilinear equivalence rules.
13.6.2 Definition: The tensor product (space) of a finite family (Vα )α∈A of linear spaces over a field K
is the dual Hom(L ((Vα )α∈A ; K), K) of the linear space of multilinear maps L ((Vα )α∈A ; K).
A tensor space is the tensor product of any finite family of linear spaces.
A tensor is any element of a tensor space.
[ Must show that Definition 13.6.2 satisfies Metadefinition 13.5.1. Make this a theorem. ]
13.6.3 Notation: ⊗α∈A Vα for a finite family (Vα )α∈A of linear spaces over a field K denotes the tensor
product space of (Vα )α∈A . Thus
\[ \otimes_{\alpha\in A} V_\alpha = L((V_\alpha)_{\alpha\in A}; K)^* = \mathrm{Hom}(L((V_\alpha)_{\alpha\in A}; K), K). \]
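13.6.4 Definition: The canonical multilinear map of a tensor space $\otimes_{\alpha\in A} V_\alpha$ is the map $\mu : \times_{\alpha\in A} V_\alpha \to \otimes_{\alpha\in A} V_\alpha$ defined by
\[ \mu\big((v_\alpha)_{\alpha\in A}\big)(\lambda) = \lambda\big((v_\alpha)_{\alpha\in A}\big), \]
for all $(v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ and $\lambda \in L((V_\alpha)_{\alpha\in A}; K)$.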
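The canonical map of Definition 13.6.4 is an "evaluate at these vectors" functional, which can be mimicked directly with closures. The following Python sketch is illustrative only. The bilinear maps lam1 and lam2 on pairs of 2-vectors and all names are hypothetical.

```python
def mu(vs):
    """Canonical map: send a tuple of vectors to the functional lam -> lam(vs)."""
    vs = tuple(vs)
    return lambda lam: lam(vs)

def lam1(vs):
    """A hypothetical bilinear map on pairs of 2-vectors."""
    x, y = vs
    return x[0] * y[0] + 2 * x[0] * y[1] - x[1] * y[1]

def lam2(vs):
    """Another hypothetical bilinear map on pairs of 2-vectors."""
    x, y = vs
    return x[0] * y[1] - x[1] * y[0]

def lam_sum(vs):
    """The linear combination 3*lam1 + lam2 of the two bilinear maps."""
    return 3 * lam1(vs) + lam2(vs)

t = mu(((1, 2), (3, -1)))

# mu(vs) is linear as a function of the multilinear map lam.
assert t(lam_sum) == 3 * t(lam1) + t(lam2)

# mu is additive in its first vector argument, tested on lam1.
assert mu(((1, 7), (3, -1)))(lam1) == t(lam1) + mu(((0, 5), (3, -1)))(lam1)
```

The first assertion reflects the fact that µ((vα)) is an element of the dual of the space of multilinear maps. The second is one instance of the multilinearity of µ.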

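(Diagram: the top row shows $(V^*_\alpha)_{\alpha\in A}$, $L((V_\alpha)_{\alpha\in A}, K)$, $\otimes_{\alpha\in A} V^*_\alpha$ and $(\otimes_{\alpha\in A} V_\alpha)^*$; the bottom row shows $(V_\alpha)_{\alpha\in A}$, $L((V^*_\alpha)_{\alpha\in A}, K)$, $\otimes_{\alpha\in A} V_\alpha$ and $(\otimes_{\alpha\in A} V^*_\alpha)^*$; the arrows are labelled “iso”, “dual” and “m-dual”.)
Figure 13.6.1 Multilinear spaces and tensor products of linear space families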
13.6.5 Remark: Figure 13.6.1 illustrates the relations between spaces of multilinear maps and tensor
product spaces of linear space families. The abbreviation “iso” means “isomorphism”, and “m-dual” means
the “multilinear dual” or “space of multilinear maps”.
The families of linear spaces (Vα )α∈A and (Vα∗ )α∈A are shown in Figure 13.6.1 rather than the Cartesian
products ×α∈A Vα and ×α∈A Vα∗ for two reasons. First, the dual (×α∈A Vα )∗ of the space ×α∈A Vα is not equal
to the Cartesian product ×α∈A Vα∗ of the dual spaces. Secondly, the multilinear spaces L ((Vα )α∈A ; K)
and L ((Vα∗ )α∈A ; K) require the full linear space structures of the spaces Vα and Vα∗ respectively for their
definition, not just the point sets. So to be precise in the diagram, it is best to show the families of linear
spaces rather than the Cartesian products.
13.6.6 Definition: A simple tensor in a tensor product space ⊗α∈A Vα is any element of the image of its
canonical multilinear map µ.
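13.6.7 Notation: $\otimes_{\alpha\in A} v_\alpha$ for a family of vectors $(v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$, where $(V_\alpha)_{\alpha\in A}$ is a finite family of linear spaces over a field
$K$, denotes the value $\mu((v_\alpha)_{\alpha\in A})$, where µ is the canonical multilinear map for the tensor product $\otimes_{\alpha\in A} V_\alpha$.
13.6.8 Notation: $\otimes_{i=1}^m V_i$ for a sequence of linear spaces $(V_i)_{i=1}^m$ over a field $K$, where $m \in \mathbb{Z}^+_0$, denotes
the tensor product space $\otimes_{\alpha\in A} V_\alpha$ with index set $A = \mathbb{N}_m$.
13.6.9 Notation: $V_1 \otimes \ldots V_m$ for a sequence of linear spaces $(V_i)_{i=1}^m$ over a field $K$, where $m \in \mathbb{Z}^+_0$,
denotes the tensor product space $\otimes_{i=1}^m V_i$.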
[ Maybe say something near here about Hom(L ((vα )α∈A ; W1 ), W2 ). Is this good for anything? Is it
isomorphic to something? ]
13.6.10 Remark: In terms of Notations 13.6.8 and 13.6.9, one may write
\[ V_1 \otimes \ldots V_m = \otimes_{i=1}^m V_i = L((V_i)_{i=1}^m; K)^* = L(V_1, \ldots V_m; K)^*. \]
13.6.11 Notation: $\otimes_{i=1}^m v_i$ for a sequence of vectors $(v_i)_{i=1}^m$ in linear spaces $(V_i)_{i=1}^m$ over a field $K$, where
$m \in \mathbb{Z}^+_0$, denotes the simple tensor $\mu((v_i)_{i=1}^m)$.
13.6.12 Notation: $v_1 \otimes \ldots v_m$ for a sequence of vectors $(v_i)_{i=1}^m$ in linear spaces $(V_i)_{i=1}^m$ over a field $K$,
where $m \in \mathbb{Z}^+_0$, denotes the simple tensor $\otimes_{i=1}^m v_i$.
13.6.13 Remark: In terms of Notations 13.6.11 and 13.6.12, one may write
\[ v_1 \otimes \ldots v_m = \otimes_{i=1}^m v_i : \lambda \mapsto \lambda((v_i)_{i=1}^m) = \lambda(v_1, \ldots v_m) \]
for all $\lambda \in L((V_i)_{i=1}^m; K)$.
13.6.14 Notation: $\otimes^m V$ denotes the tensor product of m copies of a linear space V for any $m \in \mathbb{Z}^+_0$. In
other words, $\otimes^m V = \otimes_{i=1}^m V_i$, where $V_i = V$ for $i \in \mathbb{N}_m$. Thus $\otimes^m V = L_m(V, K)^*$.
[ Define the degree of a tensor near here. ]

13.6.15 Remark: The canonical map µ in Definition 13.6.4 is not injective. For example, if $K = \mathbb{R}$,
$A = \{1, 2\}$ and $V_1 = V_2 = \mathbb{R}^3$, then $\mu((v_1, v_2)) = \mu((t v_1, t^{-1} v_2))$ for any $t \in \mathbb{R} \setminus \{0\}$.
In general, the canonical map µ is not surjective either. For example, tensors of the form $\mu((v_1, v_2)) + \mu((v_3, v_4)) \in \otimes^2 \mathbb{R}^3$ for $v_k \in \mathbb{R}^3$ are usually not expressible as $\mu((v_5, v_6))$ for $v_5, v_6 \in \mathbb{R}^3$.
13.6.16 Remark: To interpret Definitions 13.6.2 and 13.6.4, consider the case $A = \{1, 2\}$ with $V_1 = V_2 = \mathbb{R}^n$. Let $v_1 \in V_1$ and $v_2 \in V_2$. Then $\mu((v_1, v_2))$ is the linear function on the linear space $L(V_1, V_2; \mathbb{R})$ which
maps every multilinear function $\lambda \in L(V_1, V_2; \mathbb{R})$ to $\lambda((v_1, v_2))$. In other words, $\mu((v_1, v_2)) = v_1 \otimes v_2 : \lambda \mapsto \lambda(v_1, v_2)$ for all $(v_1, v_2) \in V_1 \times V_2$.
For example, denote a real n×n matrix as $(a_{ij})_{i,j=1}^n$, and define $\lambda : V_1 \times V_2 \to \mathbb{R}$ by $\lambda(v_1, v_2) = \sum_{i,j=1}^n a_{ij} v_1^i v_2^j$
for $(v_1, v_2) \in V_1 \times V_2$. Then λ is clearly bilinear. Therefore Definition 13.6.2 implies that $(v_1 \otimes v_2)(\lambda) = \mu((v_1, v_2))(\lambda) = \sum_{i,j=1}^n a_{ij} v_1^i v_2^j$ for all $v_1 \in V_1$ and $v_2 \in V_2$. The value of $\sum_{i,j=1}^n a_{ij} v_1^i v_2^j$ clearly does not
change if $v_1$ and $v_2$ are scaled by inverse factors. The “tensor quality” of a tensor product $v_1 \otimes v_2$ is whatever
quality makes a difference to such quadratic-looking expressions. Any other quality doesn’t count.
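The observations in Remarks 13.6.15 and 13.6.16 can be tested in coordinates. The sketch below assumes that a tensor in the tensor product of two copies of a 3-dimensional space is modelled by its 3×3 component matrix with respect to the standard basis, so that µ(v, w) becomes the outer product. It also assumes the standard fact that a component matrix represents a simple tensor if and only if all of its 2×2 minors vanish. All names are hypothetical.

```python
from fractions import Fraction

def outer(v, w):
    """Component matrix of the simple tensor of v and w in the standard basis."""
    return [[vi * wj for wj in w] for vi in v]

def mat_add(a, b):
    """Entrywise sum of two component matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def is_simple(t):
    """True if every 2x2 minor vanishes, i.e. the component matrix has rank <= 1."""
    n, m = len(t), len(t[0])
    return all(t[i][j] * t[k][l] == t[i][l] * t[k][j]
               for i in range(n) for k in range(n)
               for j in range(m) for l in range(m))

v1 = [Fraction(1), Fraction(2), Fraction(0)]
v2 = [Fraction(0), Fraction(1), Fraction(3)]
scale = Fraction(5)

# Not injective: scaling the two vectors by inverse factors gives the same tensor.
scaled = outer([scale * x for x in v1], [y / scale for y in v2])
assert scaled == outer(v1, v2)

# Not surjective: a sum of two simple tensors need not be simple.
e1 = [Fraction(1), Fraction(0), Fraction(0)]
e2 = [Fraction(0), Fraction(1), Fraction(0)]
two_terms = mat_add(outer(e1, e1), outer(e2, e2))
assert is_simple(outer(v1, v2))
assert not is_simple(two_terms)
```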
[ Show the dimension of the explicit tensor definition in addition to proving it from the general definition? ]
13.6.17 Theorem: If #(A) < ∞ and dim Vα < ∞ for all α ∈ A, then
\[ \dim\Big(\otimes_{\alpha\in A} V_\alpha\Big) = \prod_{\alpha\in A} \dim V_\alpha, \]
and a basis for the tensor product is. . .
Proof: [ See Federer [106], 1.1.2, etc. ]

.
13.6.18 Remark: The dimension $\prod_{\alpha\in A} \dim(V_\alpha)$ of the tensor product $\otimes_{\alpha\in A} V_\alpha$ may be compared with
the dimension $\sum_{\alpha\in A} \dim(V_\alpha)$ of the direct product $\oplus_{\alpha\in A} V_\alpha$, which is also known as the “direct sum” of
the family of linear spaces. This shows clearly how different the two concepts are. It also explains why the
word “sum” is used for the direct sum of linear spaces.
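The comparison in Remark 13.6.18 amounts to counting basis labels. The sketch below assumes the standard fact that products of basis vectors, one from each factor, form a basis of the tensor product. The dimensions and names are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical dimensions dim V_alpha for an index set with three elements.
dims = {"alpha": 2, "beta": 3, "gamma": 4}

# Tensor product: one basis element per choice of a basis index in every factor.
tensor_basis_labels = list(product(*(range(d) for d in dims.values())))

# Direct sum: one basis element per (factor, basis index) pair.
direct_sum_basis_labels = [(a, i) for a, d in dims.items() for i in range(d)]

assert len(tensor_basis_labels) == prod(dims.values()) == 24
assert len(direct_sum_basis_labels) == sum(dims.values()) == 9
```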
13.6.19 Remark: When m = 1 in Notation 13.6.14, the space $\otimes^m V = \otimes^1 V$ may be identified with the
linear space V although it is not represented by the same set. The space $\otimes^1 V$ is represented as $L_1(V, K)^*$,
which is the dual of the dual of V . The difference between this and the space V is generally ignored. This
is not a serious embarrassment because the number 2 is represented differently as an element of $\mathbb{Z}$ and $\mathbb{R}$,
and no one worries about that, even though strictly $2_{\mathbb{Z}} \ne 2_{\mathbb{R}}$. For any set X, the Cartesian product $X^m$ of
m copies of the set X is identified with X when m = 1, even though $X^1$ is really a set of functions which
are valued in X. As long as the user of a definition knows how to convert the informal “equalities” into
strictly correct identification maps, there is no problem. A similar situation alluded to in Remark 26.3.10 is
the identification of topological manifolds with $C^0$ manifolds.
[ Also mention equivalents, isomorphisms and duals such as $L((V_\alpha)_{\alpha\in A}, W) \cong \mathrm{Hom}(\otimes_{\alpha\in A} V_\alpha, W)$. ]
13.7. Covariant tensors

[ Rewrite this section. ]
13.7.1 Remark: There are many choices for representation of all kinds of tensors. The simplest kind of
representation for covariant tensors is as the multilinear functionals Lm (V, IR) on Cartesian products of a
linear space. But the representation chosen for contravariant tensors is the dual space Lm (V, IR)∗ . So to be
consistent, it would make sense to represent covariant tensors as the space Lm (V ∗ , IR)∗ , which is the linear
dual of the m-linear dual of the linear dual of V . In a choice between the symmetry of the space Lm (V ∗ , IR)∗
and the simplicity of the space Lm (V, IR), it seems best to favour simplicity. This is particularly because of
the heavy use of covariant tensors in applications.
In many contexts, it becomes clear that contravariant and covariant vectors are not exact mirror images of
each other. The notations of tensor calculus give the illusion of this mirror-image equivalence by using upper
and lower indices for contravariant and covariant vector coordinates. It would be self-defeating to insist on
a mirror-image symmetry in tensor representations when this symmetry is in fact not always valid.

13.7.2 Remark: The words “covariant” and “contravariant” often seem to be defined with the reverse
meanings to what is expected. The terminology may be justified by noting that the coefficients of
contravariant vectors vary as inverses of the transformations of basis vectors. However, contravariant vectors
themselves are the primal vectors whereas covariant vectors are the dual vectors. In a comment about the
transformation rule for contravariant vector coefficients with respect to a transformed basis, Szekeres [45],
page 84, says: “This law of transformation of components of a vector v is sometimes called the contravariant
transformation law of components, a curious and somewhat old-fashioned terminology that possibly
defies common sense.” (See also a related discussion of the confusing terminology for the “covariant derivative”
in Remark 36.5.3.)
[ Must look at how to interpret the dual of spaces such as L (V1 , V2 ; U ) for linear spaces U ≠ IR as contravariant
tensors of some sort. ]
13.7.3 Theorem: (⊗α∈A Vα )∗ is canonically isomorphic to L ((Vα )α∈A ; K).
Proof: Since ⊗α∈A Vα = (L ((Vα )α∈A ; K))∗ by Definition 13.6.2, this theorem follows from the general
fact that the dual of the dual of any linear space is canonically isomorphic to the primal. (See Theorem 10.5.17.)
In this case, define the linear map h : L ((Vα )α∈A ; K) → (⊗α∈A Vα )∗ by h(λ)(w) = w(λ) for
all λ ∈ L ((Vα )α∈A ; K) and w = ⊗α∈A vα ∈ ⊗α∈A Vα .
(Diagram: $v = (v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$ is mapped to $w = \otimes_{\alpha\in A} v_\alpha \in \otimes_{\alpha\in A} V_\alpha = L((V_\alpha)_{\alpha\in A}; K)^*$; the maps $\lambda \in L((V_\alpha)_{\alpha\in A}; K)$ and $h(\lambda) \in (\otimes_{\alpha\in A} V_\alpha)^*$ take values in $K$.)
Figure 13.7.1 Maps for the dual of a tensor product space
[ Show something like $\otimes_{\alpha\in A} V^*_\alpha \cong L((V_\alpha)_{\alpha\in A}; K)$ in some canonical sense. Similarly, show that $\otimes_{\alpha\in A} V_\alpha \cong L((V^*_\alpha)_{\alpha\in A}; K)$. EDM2 [35], 256.I, says that $\mathrm{Hom}(V_1 \otimes V_2, W) \cong L(V_1, V_2; W)$, for instance. Since linear
spaces are isomorphic if they have the same dimension, these isomorphisms should be natural in some sense.
Note that Frankel [19], p. 59, section 2.4b uses L ((Vα∗ )α∈A ; K) as the definition of ⊗α∈A Vα . ]
13.7.5 Remark: Figure 13.7.2 illustrates the relations between spaces of multilinear maps and tensor
product spaces of a single linear space. As in Figure 13.6.1, the abbreviation “iso” means “isomorphism”,
and “m-dual” means the “multilinear dual” or “space of multilinear maps”.
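(Diagram: the top row shows $V^*$, $L_m(V, \mathbb{R})$, $\otimes_m V^*$ and $(\otimes^m V)^*$; the bottom row shows $V^{**}$, $V$, $L_m(V^*, \mathbb{R})$, $\otimes^m V$ and $(\otimes^m V^*)^*$; the arrows are labelled “iso”, “dual” and “m-dual”.)
Figure 13.7.2 Linear and multilinear duals of a linear space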

13.8. Mixed tensors

13.8.1 Remark: EDM2 [35], 256.J, mentions a natural isomorphism $(\otimes^{s,r} V)^* \cong L((V)_{i=1}^s, (V^*)_{j=1}^r; K)$.
This agrees with Theorem 13.7.3. They obtain a natural isomorphism $\otimes^{r,s} V \cong L((V)_{i=1}^s, (V^*)_{j=1}^r; K)$ by
combining this with the duality $(\otimes^{s,r} V)^* = \otimes^{r,s} V$.
13.8.2 Definition: The mixed tensor space $\otimes^{r,s} V$ is defined for linear spaces V and $r, s \in \mathbb{Z}^+_0$ as the
tensor product $\otimes_{i=1}^{r+s} V_i$, where $V_i = V$ for $i \le r$ and $V_i = V^*$ for $i > r$.
An element of $\otimes^{r,s} V$ is said to be a mixed tensor of type (r, s).
[ Define various kinds of tensor degree near here. Maybe r is the “contravariant degree”, s is the “covariant
degree” and r + s is the “total degree”? ]
13.8.3 Remark: It is tedious to have to always specify that $(r, s) \in \mathbb{Z}^+_0 \times \mathbb{Z}^+_0$ or $r, s \in \mathbb{Z}^+_0$. Therefore the
type (r, s) of a tensor is always assumed to lie in $\mathbb{Z}^+_0 \times \mathbb{Z}^+_0$ unless explicitly restricted in some way.
13.8.4 Remark: The mixed tensors in Definition 13.8.2 are required to have a sequence of primal spaces
followed by a sequence of dual spaces, but clearly this could be generalized so that the primal and dual
spaces are mixed up in any order. It is unclear why this is not generally done. It is possible to construct
a closed algebra of mixed tensors by always keeping the primal and dual spaces grouped together, but in
physics, the full generality is required. (See Remark 28.3.1 for further comment on this.)
This issue is briefly mentioned by Misner/Thorne/Wheeler [38], section 3.5, page 84. They say: “Because
the frame-independent geometric notation is somewhat ambiguous . . . , one often uses component notation to
express coordinate-independent, geometric relations between geometric objects.” In other words, the order
is important and the standard tensor notations do not permit one to express this order adequately.
As an example, a possible notation for the space of tensors whose components are written as $K_i{}^{jk}{}_{\ell}{}^{m}$ would
be $\otimes^{0,1,2,1,1}\, \mathbb{R}^n$. More generally, the notation would be $\otimes^{u_1,d_1,u_2,d_2,\ldots} V$ with contravariant and covariant
degrees respectively equal to $u_1, u_2, \ldots$ and $d_1, d_2, \ldots$. It would then be necessary to develop a set of
notations for arbitrary contractions and products of such tensors and spaces. This seems to be an area
where the physicists’ component notations are better developed than the pure mathematical notations.
13.8.5 Remark: [ This remark is a bit conjectural, like Remark 13.8.4. ]

It might be that the kludgy component notations lead one astray. In the context of a manifold without
metric and without connection, there is no way to raise or lower indices in an arbitrary fashion. So a mixed
tensor in $\otimes^{r,s} V$ can only mean an s-linear map on a tensor product of r copies of a linear space V . There is
actually no difference between $e_1 \otimes e_2 \ldots e_r \otimes e^1 \otimes e^2 \ldots e^s$ and $e^1 \otimes e^2 \ldots e^s \otimes e_1 \otimes e_2 \ldots e_r$. In the absence
of a metric, there is no way to arbitrarily raise and lower any coefficients of such tensor products.
The meaning of a simple tensor $a^I{}_J\, e_1 \otimes e_2 \ldots e_r \otimes e^1 \otimes e^2 \ldots e^s$ for multi-indexes $I \in \mathbb{N}_n^r$ and $J \in \mathbb{N}_n^s$ is a
map in $\mathrm{Lin}(\otimes^r V, \otimes^s V) \equiv L_s(\otimes^r V, V)$. This seems to indicate that the representation of mixed tensors as
a tensor product of a mixture of copies of the linear space V and its dual $V^*$ is purely for convenience to save
having to discuss $\mathrm{Lin}(\otimes^r V, \otimes^s V)$ or $L_s(\otimes^r V, V)$. The coefficients $a^I{}_J$ are thus really of the form $(a^I)_J$,
which more clearly suggests the space $\mathrm{Lin}(\otimes^r V, \otimes^s V)$. From this we may conclude that mixed tensors
should be defined as $\mathrm{Lin}(\otimes^r V, \otimes^s V)$ or $L_s(\otimes^r V, V)$. One may then note in passing that there is an
equivalent space $\otimes^r V \otimes \otimes^s V^*$. The latter space is how mixed tensors are usually defined.
It may be concluded that any attempt to define spaces like $\otimes^{r_1,s_1,r_2,s_2,r_3,s_3} V$ would only yield worthless
permutations of coefficient arrays of the form $(a^I)_J$ for $I \in \mathbb{N}_n^{r_1+r_2+r_3}$ and $J \in \mathbb{N}_n^{s_1+s_2+s_3}$. Therefore
definitions of such spaces would be worthless. However, it is important to ask the question and obtain the
answer.
In the case of metric spaces, however, the picture is different. In this case, there is an additional Einstein
index convention that multiplying by suitable copies of the metric tensor g and its inverse yields a tensor
with the same symbol but different location of indices (i.e. raised or lowered). Thus coefficients $R^i{}_{jk\ell}$ may
be converted to $R_{ijk}{}^{\ell}$, and other similar variations. In this case, the index locations keep track of the metric
tensor multiplications which have been applied.

13.8.6 Notation: $\otimes_s V$ denotes the tensor product of s copies of a linear space V , for any $s \in \mathbb{Z}^+_0$.
Thus $\otimes_s V = \otimes^{0,s} V$.
13.8.7 Notation: $V^{r,s}$ denotes the sequence of linear spaces $(U_i)_{i=1}^{r+s}$ for non-negative integers $r, s \in \mathbb{Z}^+_0$,
where $U_i = V$ for $i = 1 \ldots r$ and $U_i = V^*$ for $i = r + 1, \ldots r + s$.
13.8.8 Notation: $L_{r,s}(V, W)$ denotes the space $L(V^{r,s}, W)$ of multilinear maps from $V^{r,s}$ to W for any
linear spaces V and W , and non-negative integers $r, s \in \mathbb{Z}^+_0$. In particular, W may be the field K of V .
13.8.9 Remark: Notations 13.8.7 and 13.8.8 may be non-standard, but they seem reasonable enough. The
mixed tensor space $\otimes^{r,s}(V)$ is, by Definition 13.8.2, the dual of $L_{r,s}(V, K)$, where K is the field of the linear
space V , for any $r, s \in \mathbb{Z}^+_0$. That is, $\otimes^{r,s}(V) = L_{r,s}(V, K)^*$.
[ Must determine whether T 0,2 (M ) is the dual of T 2,0 (M ) in some sense, etc. etc. ]
[ Also see EDM2 [35], 256.I, for tensor products of functions f : M1 → M2 or f : V1 → V2 . These should be
useful for defining T r,s (M1 , M2 ). ]
[ Must define non-degenerate (0, 2) tensors around here somewhere. ]
[ Introduce a bilinear inner product or metric on these mixed spaces. Federer [106] introduces inner products
in section 1.7, page 27. ]
[ Define contractions and traces here. For example, $C^k_\ell$ would denote the contraction of the kth contravariant
index with the ℓth covariant index. This assumes a certain amount of orderliness in the sets of indices.
Note that contractions are not well-defined on tensor algebras. They must be defined on tensors with a
well-defined type. ]
13.9. General tensor algebra

Tensor spaces have operations of vector addition and scalar multiplication, but they have no vector product
operation. In order to accommodate such a product operation, tensor algebras are built from an infinite
sequence of tensor spaces. The tensor spaces are not individually closed under the tensor product operation.
[ It is possible to define tensor product operations for mixed spaces Vα . Of course, this is a little untidy. Even
in the case of two kinds of spaces Vα , such as a primal space V and dual space V ∗ , the requirement to keep
track of order is untidy. Maybe this can be done by always ignoring order in such products? But then again,
in tensor calculus, contravariant and covariant indices are often mixed in arbitrary order. So probably this
is a useful thing to define here! Alternating wedge-style products probably wouldn’t make sense though.
Definitely must do a general version of Definition 13.9.2 for an arbitrary pair of tensor spaces. ]
[ The numbers r and s are called the “degrees” of tensors if they are not mixed. See EDM2 [35], 256.J. The
pair (r, s) is called the “type” of the tensor. ]
13.9.1 Remark: Definition 13.9.2 uses the extended canonical map µ in Definition 13.13.5. The expression
$\lambda(v_i, w_j)$ in Definition 13.9.2 means the value of λ for the sequence of r + s vectors formed by concatenating
the r vectors $v_{i,k}$ with the s vectors $w_{j,\ell}$. The term µ(v) means the tensor of degree r defined by $\mu(v)(\lambda) = \sum_{i=1}^{m_1} \lambda(v_i)$ for all $\lambda \in L_r(V, \mathbb{R})$, and similarly $\mu(w)(\lambda) = \sum_{j=1}^{m_2} \lambda(w_j)$ for all $\lambda \in L_s(V, \mathbb{R})$.
[ Must try to find a more abstract definition of tensor product operation which does not use polynomial
representations as in Definition 13.9.2. ]
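13.9.2 Definition: The tensor product operation for a linear space V is the operation
$\otimes : \otimes^r V \times \otimes^s V \to \otimes^{r+s} V$ which is defined for all $r, s \in \mathbb{Z}^+_0$ by
\[ \forall \lambda \in L_{r+s}(V, \mathbb{R}),\quad (\mu(v) \otimes \mu(w))(\lambda) = \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \lambda(v_i, w_j) \]
for all tensor polynomials $v = ((v_{i,k})_{k=1}^r)_{i=1}^{m_1} \in (V^r)^{m_1}$ and $w = ((w_{j,\ell})_{\ell=1}^s)_{j=1}^{m_2} \in (V^s)^{m_2}$.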

13.9.3 Theorem: The product in Definition 13.9.2 is independent of tensor polynomial representations.
In other words, if $\mu(v) = \mu(v')$ and $\mu(w) = \mu(w')$, then $\mu(v) \otimes \mu(w) = \mu(v') \otimes \mu(w')$.
Proof: Let $v' = ((v'_{i,k})_{k=1}^r)_{i=1}^{m'_1} \in (V^r)^{m'_1}$ and $w' = ((w'_{j,\ell})_{\ell=1}^s)_{j=1}^{m'_2} \in (V^s)^{m'_2}$ be alternative representations
for v and w respectively so that $\mu(v) = \mu(v')$ and $\mu(w) = \mu(w')$. Then for all $\lambda \in L_{r+s}(V, \mathbb{R})$,
\begin{align}
\sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \lambda(v_i, w_j) - \sum_{i=1}^{m'_1} \sum_{j=1}^{m'_2} \lambda(v'_i, w'_j)
&= \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \lambda^L_{w_j}(v_i) - \sum_{i=1}^{m'_1} \sum_{j=1}^{m'_2} \lambda^L_{w'_j}(v'_i) \tag{13.9.1} \\
&= \sum_{j=1}^{m_2} \mu(v)(\lambda^L_{w_j}) - \sum_{j=1}^{m'_2} \mu(v')(\lambda^L_{w'_j}) \tag{13.9.2} \\
&= \mu(v)\Big( \sum_{j=1}^{m_2} \lambda^L_{w_j} - \sum_{j=1}^{m'_2} \lambda^L_{w'_j} \Big), \tag{13.9.3}
\end{align}
where the r-linear functions $\lambda^L_y \in L_r(V, \mathbb{R})$ are defined by $\lambda^L_y : x \mapsto \lambda(x, y)$ for $x \in V^r$ and $y \in V^s$. The
r-linear map $\sum_{j=1}^{m_2} \lambda^L_{w_j} - \sum_{j=1}^{m'_2} \lambda^L_{w'_j} \in L_r(V, \mathbb{R})$ on line (13.9.3) may be evaluated for any $x \in V^r$ as follows:
\begin{align}
\Big( \sum_{j=1}^{m_2} \lambda^L_{w_j} - \sum_{j=1}^{m'_2} \lambda^L_{w'_j} \Big)(x)
&= \sum_{j=1}^{m_2} \lambda(x, w_j) - \sum_{j=1}^{m'_2} \lambda(x, w'_j) \tag{13.9.4} \\
&= \sum_{j=1}^{m_2} \lambda^R_x(w_j) - \sum_{j=1}^{m'_2} \lambda^R_x(w'_j) \tag{13.9.5} \\
&= \mu(w)(\lambda^R_x) - \mu(w')(\lambda^R_x) \tag{13.9.6} \\
&= 0, \notag
\end{align}
where the s-linear functions $\lambda^R_x \in L_s(V, \mathbb{R})$ are defined by $\lambda^R_x : y \mapsto \lambda(x, y)$ for $x \in V^r$ and $y \in V^s$. Since
the r-linear map on line (13.9.3) is the zero map, it follows that the expression on the left of line (13.9.1) is
zero, which means that $\mu(v) \otimes \mu(w) = \mu(v') \otimes \mu(w')$ as claimed.
13.9.4 Remark: The proof of Theorem 13.9.3 is perhaps not instantly comprehensible. Line (13.9.1)
highlights the fact that a multilinear function of r + s vectors is in particular multilinear with respect to the
first r vectors. Thus by fixing the last s vectors $(w_{j,\ell})_{\ell=1}^s$, the (r + s)-linear function λ becomes the r-linear
function $\lambda^L_{w_j}$ of the remaining r vectors $(v_{i,k})_{k=1}^r$. This is exactly what the tensor $\mu(v) \in \otimes^r V$ of degree
r needs as an argument. Therefore line (13.9.2) applies the definition of µ(v), which gives $\mu(v)(\lambda^L_{w_j}) = \sum_{i=1}^{m_1} \lambda^L_{w_j}(v_i)$. (This shows the convenience of defining tensors as the dual of a space of multilinear maps
rather than the stodgy old quotient space of a free linear space in Section 13.12.) Line (13.9.3) uses the fact
that $\mu(v) = \mu(v')$ (because v and v′ represent the same tensor) and the linearity of µ(v) with respect to
r-linear maps.
The argument of µ(v) in line (13.9.3) is an r-linear map which is a linear combination of r-linear maps. In
line (13.9.4), this r-linear map is applied to a sequence $x \in V^r$ of r vectors and expanded according to the
definitions of $\lambda^L_{w_j}$ and $\lambda^L_{w'_j}$. Line (13.9.5) uses the reverse trick to line (13.9.1) by fixing the first r arguments
of λ to construct an s-linear map $\lambda^R_x$. This is what the tensors µ(w) and µ(w′) act on. So line (13.9.6) uses
the definitions of µ(w) and µ(w′) to simplify the expression. Since the tensors µ(w) and µ(w′) are the same
tensor with two different representations, the result is zero. By the linearity of µ(v) in line (13.9.3), this
implies that Definition 13.9.2 gives the same result, no matter which representations are used.
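13.9.5 Example: Let $v = ((v_1, v_2), (v_3, v_4)) \in (V^2)^2$ and $w = ((w_1, w_2), (w_3, w_4)) \in (V^2)^2$ in Definition 13.9.2.
These represent the tensors $\mu(v) = v_1 \otimes v_2 + v_3 \otimes v_4$ and $\mu(w) = w_1 \otimes w_2 + w_3 \otimes w_4$. Therefore
\begin{align*}
\mu(v) \otimes \mu(w) &= (v_1 \otimes v_2 + v_3 \otimes v_4) \otimes (w_1 \otimes w_2 + w_3 \otimes w_4) \\
&= (v_1 \otimes v_2 \otimes w_1 \otimes w_2) + (v_1 \otimes v_2 \otimes w_3 \otimes w_4) + (v_3 \otimes v_4 \otimes w_1 \otimes w_2) + (v_3 \otimes v_4 \otimes w_3 \otimes w_4).
\end{align*}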
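Definition 13.9.2 and Theorem 13.9.3 can be tried out numerically. In the following sketch, which is only an illustration, a tensor polynomial is modelled as a list of terms, each term being a tuple of vectors, and a tensor acts on a multilinear map by summing its values over the terms. The particular vectors, the 4-linear map lam and all function names are hypothetical.

```python
from itertools import product

def mu(poly):
    """Tensor represented by a tensor polynomial: lam -> sum of lam over the terms."""
    return lambda lam: sum(lam(*term) for term in poly)

def poly_product(v, w):
    """Concatenate every term of v with every term of w, as in Definition 13.9.2."""
    return [vi + wj for vi, wj in product(v, w)]

def lam(a, b, c, d):
    """A hypothetical 4-linear map on 2-vectors with arbitrary integer coefficients."""
    return sum((1 + i + 2 * j + 4 * k + 8 * l) * a[i] * b[j] * c[k] * d[l]
               for i in range(2) for j in range(2)
               for k in range(2) for l in range(2))

# Two representations of the tensor e1 (x) e2 + e2 (x) e1.
v = [((1, 0), (0, 1)), ((0, 1), (1, 0))]
v_alt = [((1, 1), (0, 1)), ((0, -1), (0, 1)), ((0, 1), (1, 0))]

# Two representations of the simple tensor (1, 2) (x) (3, 4).
w = [((1, 2), (3, 4))]
w_alt = [((2, 4), (3, 4)), ((-1, -2), (3, 4))]

# The product tensor takes the same value on lam for both pairs of representations.
assert mu(poly_product(v, w))(lam) == mu(poly_product(v_alt, w_alt))(lam)
```

Concatenating terms corresponds to forming the sequence of r + s vectors on which λ is evaluated. The final assertion is one instance of the representation independence proved in Theorem 13.9.3.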

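13.9.6 Definition: The tensor algebra of a linear space V −< $(K, V, \sigma_K, \tau_K, \sigma_V, \mu^V_K)$ is the tuple
$(K, A, \sigma_K, \tau_K, \sigma_A, \tau_A, \mu^A_K)$ where:
(i) $(K, A, \sigma_K, \tau_K, \sigma_A, \mu^A_K) = \oplus_{r=0}^{\infty} \otimes^r V$ is the direct sum of the linear spaces $\otimes^r V$;
(ii) $\tau_A : A \times A \to A$, denoted as the binary operation ⊗, is defined by
\[ u \otimes v = \sum_{i=0}^{k} \sum_{j=0}^{\ell} u_i \otimes v_j = \sum_{n=0}^{k+\ell} \sum_{i=0}^{n} (u_i \otimes v_{n-i}) \]
for $u, v \in A$ with $u = \sum_{i=0}^{k} u_i$ and $v = \sum_{j=0}^{\ell} v_j$ with $u_i \in \otimes^i V$ and $v_j \in \otimes^j V$ for $i = 0 \ldots k$ and
$j = 0 \ldots \ell$, where $u_i = 0$ for $i > k$ and $v_j = 0$ for $j > \ell$ in the last expression, and the tensor space product $\otimes : \otimes^i V \times \otimes^j V \to \otimes^{i+j} V$ is as in Definition 13.9.2.
13.9.7 Notation: $\otimes^* V$ denotes the tensor algebra A −< $(K, A, \sigma_K, \tau_K, \sigma_A, \tau_A, \mu^A_K)$ in Definition 13.9.6.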
∞ r
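13.9.8 Remark: The direct sum $\oplus_{r=0}^{\infty} \otimes^r V$ in Definition 13.9.6 (i) is defined as the set of all almost-all-zero
infinite sequences of elements from the different degrees of tensor spaces. Thus $a = (a_r)_{r=0}^{\infty} \in \oplus_{r=0}^{\infty} \otimes^r V$
if $a_r \in \otimes^r V$ for all $r \in \mathbb{Z}^+_0$ and $a_r = 0$ for all but finitely many r. Thus part (ii) means that $a \otimes b = \big( \sum_{s=0}^{r} a_s \otimes b_{r-s} \big)_{r=0}^{\infty}$.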
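The degree-by-degree product formula in Remark 13.9.8 can be sketched with component arrays. The following Python code is an illustration under stated assumptions: a homogeneous tensor of degree r is modelled as a dictionary from index tuples of length r to coefficients with respect to some fixed basis, and an element of the tensor algebra is a dictionary from degrees to such tensors. All names are hypothetical.

```python
from collections import defaultdict

def homogeneous_product(a, b):
    """Components of the product of two homogeneous tensors: c[I + J] = a[I] * b[J]."""
    out = defaultdict(int)
    for i_idx, a_val in a.items():
        for j_idx, b_val in b.items():
            out[i_idx + j_idx] += a_val * b_val
    return dict(out)

def algebra_product(a, b):
    """The degree-r part of the result is the sum over s of a_s times b_(r-s)."""
    out = defaultdict(lambda: defaultdict(int))
    for s, a_s in a.items():
        for t, b_t in b.items():
            for idx, val in homogeneous_product(a_s, b_t).items():
                out[s + t][idx] += val
    return {r: dict(c) for r, c in out.items()}

# a = 5 + e1 and b = e2 + e1 (x) e2, with basis vectors labelled 1 and 2.
a = {0: {(): 5}, 1: {(1,): 1}}
b = {1: {(2,): 1}, 2: {(1, 2): 1}}

result = algebra_product(a, b)
assert result == {1: {(2,): 5}, 2: {(1, 2): 6}, 3: {(1, 1, 2): 1}}
```

Only finitely many degrees are stored, which corresponds to the almost-all-zero condition on sequences in the direct sum.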
∞
13.9.9 Remark: Most textbooks say very little indeed about product operations of tensor algebras for
fully general sums of tensors with different degrees. The tensor product in Definition 13.9.2 applies only to
pairs of tensors where each element of the pair is in a single tensor space. Kobayashi/Nomizu [27], page 22,
define the general product, and EDM2 [35], section 256.O, talks about the direct sum of tensor spaces, which
implies that the tensor algebra contains general sums. It seems that there is not much interest in such sums
of tensors of mixed degree in applications. Therefore the general mixed product operations are defined here
for logical completeness only. For example, consider $5 + v_1 + v_1 \otimes v_2 + v_3 \otimes v_4 \otimes v_5$. This is an element of
$\oplus_{r=0}^{3} \otimes^r V$ since $5 \in \otimes^0 V$. There is no way to simplify the sum of $v_1$ and $v_1 \otimes v_2$ since they are in different
components of the direct sum of tensor spaces.
13.9.10 Remark: Since any linear space may be inserted into Definition 13.9.6 to construct a tensor
algebra, certainly the dual of any linear space may also be turned into a tensor algebra. If $V^*$ is the dual of
a linear space V , then $\otimes^* V^*$ is a well-defined tensor algebra. For a given linear space V , the tensor algebra
$\otimes^* V$ is called the contravariant tensor algebra of V whereas $\otimes^* V^*$ is called the covariant tensor algebra
of V . It is not quite so easy to construct the tensor algebra of mixed contravariant and covariant tensors.
(For mixed tensors, see Definition 13.8.2.)
13.9.11 Definition: The mixed tensor product operation for a linear space V is the operation
$\otimes : \otimes^{r_1,s_1} V \times \otimes^{r_2,s_2} V \to \otimes^{r_1+r_2,s_1+s_2} V$ which is defined for all $r_1, r_2, s_1, s_2 \in \mathbb{Z}^+_0$ by
\[ \forall \lambda \in L_{r_1+r_2,s_1+s_2}(V, \mathbb{R}),\quad (\mu(v) \otimes \mu(w))(\lambda) = \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \lambda(v_i^1, w_j^1, v_i^2, w_j^2) \]
for all $v = ((v^1_{i,k})_{k=1}^{r_1}, (v^2_{i,k})_{k=1}^{s_1})_{i=1}^{m_1} \in (V^{r_1,s_1})^{m_1}$ and $w = ((w^1_{j,\ell})_{\ell=1}^{r_2}, (w^2_{j,\ell})_{\ell=1}^{s_2})_{j=1}^{m_2} \in (V^{r_2,s_2})^{m_2}$.
13.9.12 Remark: The mixed linear spaces $V^{r_1,s_1}$ and $V^{r_2,s_2}$ in Definition 13.9.11 are defined in Notation 13.8.7.
The mixed multilinear space $L_{r_1+r_2,s_1+s_2}(V, \mathbb{R})$ is defined in Notation 13.8.8.
13.9.13 Theorem: The product in Definition 13.9.11 is independent of tensor polynomial representations.
In other words, if $\mu(v) = \mu(v')$ and $\mu(w) = \mu(w')$, then $\mu(v) \otimes \mu(w) = \mu(v') \otimes \mu(w')$.
Proof: A proof like that of Theorem 13.9.3 probably works. Trust me, I’m a mathematician.
13.9.14 Definition: The mixed tensor algebra of a linear space V −< $(K, V, \sigma_K, \tau_K, \sigma_V, \mu^V_K)$ is the tuple
$(K, A, \sigma_K, \tau_K, \sigma_A, \tau_A, \mu^A_K)$ where:
(i) $(K, A, \sigma_K, \tau_K, \sigma_A, \mu^A_K) = \oplus_{r,s=0}^{\infty} \otimes^{r,s} V$ is the direct sum of the linear spaces $\otimes^{r,s} V$;
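(ii) $\tau_A : A \times A \to A$, denoted as the binary operation $\otimes$, is defined by
\[ u \otimes v = \sum_{r_1=0}^{k_1} \sum_{s_1=0}^{\ell_1} \sum_{r_2=0}^{k_2} \sum_{s_2=0}^{\ell_2} (u_{r_1,s_1} \otimes v_{r_2,s_2}) = \sum_{r=0}^{k_1+k_2} \sum_{s=0}^{\ell_1+\ell_2} \sum_{i=0}^{r} \sum_{j=0}^{s} (u_{i,j} \otimes v_{r-i,s-j}) \]
for all $u, v \in A$, where $u = \sum_{r_1=0}^{k_1} \sum_{s_1=0}^{\ell_1} u_{r_1,s_1}$ and $v = \sum_{r_2=0}^{k_2} \sum_{s_2=0}^{\ell_2} v_{r_2,s_2}$ for some $u_{r_1,s_1} \in \otimes^{r_1,s_1} V$
and $v_{r_2,s_2} \in \otimes^{r_2,s_2} V$ for $r_1 = 0 \ldots k_1$, $s_1 = 0 \ldots \ell_1$, $r_2 = 0 \ldots k_2$, and $s_2 = 0 \ldots \ell_2$, and the tensor space
product $\otimes : \otimes^{r_1,s_1} V \times \otimes^{r_2,s_2} V \to \otimes^{r_1+r_2,s_1+s_2} V$ is given by Definition 13.9.11. In the second expression,
$u_{i,j}$ and $v_{r-i,s-j}$ are taken to be zero whenever an index lies outside the ranges just stated.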
13.9.15 Notation: $\otimes^{*,*} V$ denotes the tensor algebra $(K, A, \sigma_K, \tau_K, \sigma_A, \tau_A, \mu^A_K)$ in Definition 13.9.14.
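The index bookkeeping in Definition 13.9.14 (ii) may be easier to follow in a small case. The elements below are chosen purely for illustration and do not come from the text: take $u = u_{1,0} + u_{0,1}$ and $v = v_{0,0} + v_{1,1}$, with each component in the space $\otimes^{r,s} V$ indicated by its subscripts. The first sum in (ii) then has four terms:

```latex
\[
u \otimes v
  = \underbrace{u_{1,0} \otimes v_{0,0}}_{\in\, \otimes^{1,0} V}
  + \underbrace{u_{0,1} \otimes v_{0,0}}_{\in\, \otimes^{0,1} V}
  + \underbrace{u_{1,0} \otimes v_{1,1}}_{\in\, \otimes^{2,1} V}
  + \underbrace{u_{0,1} \otimes v_{1,1}}_{\in\, \otimes^{1,2} V} .
\]
```

Each term lands in the direct summand whose pair of degrees is the sum of the degree pairs of its two factors, as the signature of the map in Definition 13.9.11 requires.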
13.9.16 Remark: Notation 13.9.15 is admittedly not standard although it is perfectly logical. EDM2 [35],
section 256.K, calls it T (V ). Kobayashi/Nomizu [27], page 22, call this space T.
13.9.17 Remark: It can be seen from Definitions 13.9.11 and 13.9.14 in particular that tensors are an index
management nightmare. Luckily, tensors of mixed degree are rarely seen. Gallot/Hulin/Lafontaine [20],
page 36, say about the tensors: “It is one of the unpleasant tasks for the differential geometers to define
them!” They’re not exaggerating!!
[ Try to define tensor product and algebra for $\otimes^{r_1,s_1,r_2,s_2,\ldots} V$. ]
13.9.18 Remark: The “Einstein index convention” is a very useful collection of pseudo-notations. The
meaning of expressions which use this convention depends to some extent on context. So it is important to
always provide sufficient context to make the meaning clear. The convention includes the following rules.
(i) The vectors in a basis for a linear space $V$ use subscript indices. Example: $(e_i)_{i=1}^n$.
(ii) The coordinates of a vector $v$ with respect to a basis $(e_i)_{i=1}^n$ use superscript indices. Example: $v = \sum_{i=1}^n v^i e_i$.
(iii) When subscript and superscript indices (which follow the convention) are summed over the linear space
basis index set, the sum symbol may be omitted. Example: $v = v^i e_i$ means $v = \sum_{i=1}^n v^i e_i$.
(iv) When a “metric tensor” $(g_{ij})_{i,j=1}^n$ is provided for a linear space, and $(v^i)_{i=1}^n$ is the sequence of vector
coordinates with respect to a basis $(e_i)_{i=1}^n$, the sequence of coordinates $(v_i)_{i=1}^n$ is defined by
$v_i = g_{ij} v^j = \sum_{j=1}^n g_{ij} v^j$ for $i \in \mathbb{N}_n$. This is called “lowering the indices”.
(v) etc. etc. etc.
It is important to ensure that the implicit linear space basis and metric tensor are defined clearly in the
context of expressions which use the Einstein index convention. This is particularly important when multiple
bases and multiple metric tensors are used in a single context. The meaning of the choice of subscript or
superscript for a particular sequence of objects depends on what kind of object it is. For example, tensor
coefficients and tensor basis elements have opposite choices. It is also important to remember that some
sequences with subscripts and superscripts are not tensors of any kind at all, although they do use the Einstein
index convention. For example, the Christoffel symbol is not a tensor. The individual terms in the exterior
derivative are also not generally tensors.
Remark 7.11.15 mentions some index convention rules for multi-index subscripts and superscripts.
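Rules (ii) to (iv) of the convention can be spelled out as explicit sums. The following is a minimal sketch only: the helper names (expand, lower_index, dot), the sample basis of IR2 and the choice of the Euclidean inner product for the metric are all assumptions made for illustration, not part of the text.

```python
# Illustrative sketch; the helper names and the sample data are hypothetical.

def expand(coords, basis):
    """v = v^i e_i written out as the explicit sum over i of v^i * e_i."""
    dim = len(basis[0])
    return [sum(coords[i] * basis[i][k] for i in range(len(basis)))
            for k in range(dim)]

def lower_index(g, v_up):
    """v_i = g_ij v^j written out as the explicit sum over j, for each i."""
    n = len(v_up)
    return [sum(g[i][j] * v_up[j] for j in range(n)) for i in range(n)]

def dot(a, b):
    """Euclidean inner product, assumed here as the source of the metric."""
    return sum(x * y for x, y in zip(a, b))

basis = [[1, 0], [1, 1]]   # e_1, e_2: an assumed non-orthonormal basis
g = [[dot(ei, ej) for ej in basis] for ei in basis]   # g_ij = e_i . e_j
v_up = [2, 3]              # coordinates v^1, v^2

v = expand(v_up, basis)            # v = 2 e_1 + 3 e_2
v_down = lower_index(g, v_up)      # lowered coordinates v_1, v_2
```

With this particular metric, each lowered coordinate equals the inner product of v with the corresponding basis vector. This shows why the implicit basis and metric must be stated before a lowered index means anything.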
13.10. Alternating tensors

Alternating covariant tensors are also known as exterior forms, skew-symmetric forms, skew-symmetric
tensors and antisymmetric tensors.
Alternating tensors are motivated by integration over submanifolds of flat spaces or manifolds. This is the
subject of geometric measure theory. The familiar Lebesgue measure is used for integration with respect to
volume elements. The region of integration for Lebesgue measure typically has the same dimension as the
ambient space. But in geometric measure theory, the region of integration typically has dimension less than
the ambient space dimension. Examples are integration over surfaces and curves in 3-dimensional space.
13.10.1 Remark: The m-area spanned by a sequence of m tangent vectors at a point in a manifold is
an anti-symmetric multilinear function of the tangent vectors. Therefore alternating tensors of degree m
contain exactly the right amount of information for integrating over an m-submanifold.
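The antisymmetry and multilinearity claimed in Remark 13.10.1 can be checked numerically in the simplest special case, where m vectors in the flat space IRm span a signed m-volume given by the determinant. This sketch assumes that special case only. The function names are hypothetical, and the Leibniz formula is used just to keep the example free of external libraries.

```python
from itertools import permutations

def sign(perm):
    """Parity of a permutation of 0..m-1: +1 if even, -1 if odd."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def signed_volume(*vectors):
    """Leibniz-formula determinant of m vectors in R^m (signed m-volume)."""
    m = len(vectors)
    total = 0
    for perm in permutations(range(m)):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= vectors[row][col]
        total += term
    return total

u, v, w = (3, 1), (1, 2), (4, -1)
u_plus_w = tuple(a + b for a, b in zip(u, w))

area_uv = signed_volume(u, v)      # signed area of the parallelogram on u, v
area_vu = signed_volume(v, u)      # swapping the arguments flips the sign
```

Swapping two arguments changes the sign, repeating an argument gives zero, and the function is additive in each argument separately.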

13.10.2 Remark: The lowered index in the notation $\Lambda_m V$ for covariant antisymmetric tensor spaces is
inherited from the $L^-_m(V, K)$ notation. Conveniently this matches the “Einstein convention” for lowered
indices. The raised index in the notation $\bigwedge^m V$ for the contravariant space (“wedge product”) is inherited from
the corresponding $\otimes^m V$ notation. Once again, the raised index matches the convention for contravariant
tensors.
Notations 13.10.3 and 13.10.5 denote the linear spaces of antisymmetric multilinear maps which were introduced
in Theorem 13.4.7. The relations between many of the antisymmetric tensor spaces in this section are
shown in Figure 13.10.1.
[ Diagram: the spaces V, V∗ and V∗∗, the form spaces Λm(V, IR) and Λm(V∗, IR), and the corresponding alternating tensor products and their duals, joined by arrows labelled “dual” and “iso” ]
Figure 13.10.1 Linear and antisymmetric multilinear duals of a linear space
13.10.3 Notation: $\Lambda_m(V, W)$ for a linear space $V$ with field $K$, where $m \in \mathbb{Z}^+_0$, denotes the set $L^-_m(V, W)$
together with its pointwise vector addition and scalar multiplication operations.
13.10.4 Definition: The elements of $\Lambda_m(V, W)$ for a linear space $V$ with field $K$, where $m \in \mathbb{Z}^+_0$, are
called (alternating) m-forms.
13.10.5 Notation: $\Lambda_m V$ for a linear space $V$ over a field $K$, where $m \in \mathbb{Z}^+_0$, is an abbreviation for
$\Lambda_m(V, K)$, where $K$ is identified with the linear space $K$ over the field $K$.
13.10.6 Definition: The alternating tensor product of $m$ copies of a linear space $V$ for any $m \in \mathbb{Z}^+_0$ is the
dual linear space of the linear space of antisymmetric multilinear forms $\Lambda_m(V, K)$.
13.10.7 Notation: $\bigwedge^m V$ for $m \in \mathbb{Z}^+_0$ and a linear space $V$ over a field $K$ denotes the alternating tensor
product of $m$ copies of $V$. Thus $\bigwedge^m V = \Lambda_m(V, K)^*$.
13.10.8 Definition: An m-covector for a linear space $V$ and $m \in \mathbb{Z}^+_0$ is any element of $\Lambda_m V$.
13.10.9 Definition: An m-vector for a linear space $V$ and $m \in \mathbb{Z}^+_0$ is any element of $\bigwedge^m V$.
13.10.10 Definition: A simple m-vector in an alternating tensor product $\bigwedge^m V$ for a linear space $V$ and
$m \in \mathbb{Z}^+_0$ is any $f \in \bigwedge^m V$ of the form $f : \lambda \mapsto \lambda(v)$ for some $v \in V^m$.
13.10.11 Notation: $\wedge_{i=1}^m v_i$ for a sequence $v \in V^m$, where $V$ is a linear space and $m \in \mathbb{Z}^+_0$, denotes the
simple m-vector $f \in \bigwedge^m V$ defined by $f(\lambda) = \lambda(v)$.
$v_1 \wedge \ldots \wedge v_m$ is an alternative notation for $\wedge_{i=1}^m v_i$.
13.10.12 Remark: Definition 13.10.6 and Notation 13.10.7 imply that for $m \ge 2$ and $\dim(V) \ge 1$,
\[ \begin{aligned}
\textstyle\bigwedge^m V &= \{ \lambda|_{\Lambda_m(V,K)} ;\ \lambda \in L_m(V, K)^* \} \\
  &= \{ \lambda|_{\Lambda_m(V,K)} ;\ \lambda \in \otimes^m V \} \\
  &\not\subseteq \otimes^m V = L_m(V, K)^* .
\end{aligned} \]
That is, the individual tensors in $\bigwedge^m V$ are subsets (or function subgraphs) of the individual tensors in $\otimes^m V$,
but the alternating tensor space as a whole is not a subset of the tensor space $\otimes^m V$. This observation scarcely
rises above the pedantic. The reader (and writer) may safely ignore it.

general tensor antisymmetric tensor

reference contravariant covariant contravariant covariant
m
Bump [97] ⊗ V — K
m
∧ V Km—
m
Crampin/Pirani [12] — — Km V Km ∗ V∗
Darling [14] — — Km V V ,K Am (V → IR)
EDM2 [35] ⊗m".V , T0m (V ),# ⊗m"V.∗ , Tm0 (V#), V
m ∗
V
m ∗ m
L V , IR L V, IR
K Km
Federer [106] ⊗m V — Km V V
m
Flanders [66] — — V —
Km ∗
Frankel [19] ⊗m V ⊗m V ∗ K— Km V ∗
m
Fulton/Harris [109] V ⊗m — V Km V ∗
Gallot/Hulin/Lafontaine [20] ⊗m V ⊗m V ∗ — V
Kobayashi/Nomizu [27] Tm 0 (V ) T0m (V ) K— Km ∧ m —
m
Lang [31] — Lm (V, IR) V V , La (V, IR)
Lee [33] T m (V ) Tm (V ) — Λm (V )
Malliavin [36] — Lm (V, IR) — Lm,a (V, IR)
Spivak [43] Tm (V ) T m (V ) — Ωm (V )
Szekeres [45] — — Λm (V ) Λ (V ∗ ), Λ∗m (V )
m
Km K
Kennington ⊗m V Lm (V, IR), ⊗m V ∗ V Λm (V, IR), m V ∗
Table 13.10.1 Summary of the general and antisymmetric tensor space notations
13.10.13 Remark: Table 13.10.1 summarizes the general and antisymmetric tensor space notations of a
selection of authors. Although there is clearly much agreement, there is also significant diversity.
[ Present a basis for Λr V . Then show how things work in terms of this basis. ]
[ Demonstrate how alternating tensors are related to area and volume. ]
[ Give a diagram showing how vectors u and v are related to the directed area u ∧ v spanned by the vectors.
Show sets of equivalent vector pairs. See Figure 13.1.1. ]
13.10.14 Theorem: Let $V$ be a linear space and $m \in \mathbb{Z}^+_0$. Then $\dim(\bigwedge^m V) = \dim(\Lambda_m V) = C^{\dim(V)}_m$.
Proof: It follows from Theorem 13.4.10 and Notation 13.10.3 that $\dim(\Lambda_m V) = C^n_m$ for $n = \dim(V)$. The
dimensionality of $\bigwedge^m V$ then follows from Theorem 10.5.10.
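The count in Theorem 13.10.14 can be illustrated by enumeration. The sketch below rests on two assumptions not stated in this section: that the symbol C with upper index n and lower index m denotes the binomial coefficient "n choose m", and that a basis of the space of m-forms may be labelled by the strictly increasing m-tuples of indices from {1, ..., n}. The function name is hypothetical.

```python
from itertools import combinations
from math import comb

def increasing_multi_indices(n, m):
    """All strictly increasing m-tuples of indices drawn from {1, ..., n}."""
    return list(combinations(range(1, n + 1), m))

# For n = dim(V) = 4 and m = 2 there are six index pairs.
pairs = increasing_multi_indices(4, 2)

# Tabulate the number of multi-indices for every degree m from 0 to n + 1.
n = 4
dimensions = [len(increasing_multi_indices(n, m)) for m in range(n + 2)]
```

The table ends with a zero because no strictly increasing tuple of length n + 1 exists. Under the stated assumptions this matches the fact that forms of degree greater than dim(V) vanish.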
[ See Notation 7.11.10 for index sets $I^n_r$ with $C^n_r$ elements. ]
[ Also mention equivalents, isomorphisms and duals such as Hom(Λm V, W ) ≅ Λm (V, W ). See Federer [106],
page 17. Also Hom(Λm V ∗ , W ) ≅ Λm (V ∗ , W ). ]
13.11. Alternating tensor algebra

Alternating tensors have a much more interesting algebra than general tensors.
13.11.1 Definition: The algebra Λ∗ (V, W ) of alternating forms over the linear space V is. . . [ See Federer [106] 1.4.2. ]
13.11.2 Definition: The exterior algebra Λ∗ V over the linear space V is. . . [ See Federer [106] 1.3.1. ]
13.11.3 Definition: The alternating algebra Λ∗ (V, W ) over the linear space V with coefficients in the
alternating algebra W is. . . [ See Federer [106] 1.4.2. ]
[ After definition of Λr V , must also define the exterior product ∧ : Λr V × Λs V → Λr+s V . See Frankel [19],
pages 67–68. ]
[ Maybe have a tiny section on symmetric tensors near here? Maybe not interesting enough. Could use
notation $\bigodot^m V$ for the space of symmetric tensors and $\odot_{i=1}^m v_i = v_1 \odot v_2 , \ldots v_m$ for symmetric tensors. Use
$\bigodot_m(V, W)$ for the space $L^+_m(V, W)$ of symmetric multilinear maps on $V$? See Federer [106], page 41. ]

13.12. Tensor products defined via free linear spaces

This section presents an alternative approach to defining tensor products of linear spaces. A simpler approach
is presented in Section 13.6. In this section, tensor products are carved out of free linear spaces. Free linear
spaces are defined in Section 10.10.
[ The map $u \mapsto e_u$ in Definition 13.12.1 is called the “canonical projection” in EDM2 [35], 256.I. The standard
immersion is called the “canonical multilinear map”. ]
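13.12.1 Definition: The tensor product $\otimes_{\alpha\in A} V_\alpha$ of a family $(V_\alpha)_{\alpha\in A}$ of linear spaces over a field $K$ is
the quotient linear space $F/G$ of the free linear space $F$ on the set $\times_{\alpha\in A} V_\alpha$ with respect to the subspace $G$
of $F$ generated by the set
\[ \{ e_u + e_v - e_w ;\ u, v, w \in \times_{\alpha\in A} V_\alpha \text{ and } \exists \beta \in A,\ (u_\beta + v_\beta = w_\beta \text{ and } \forall \alpha \in A \setminus \{\beta\},\ u_\alpha = v_\alpha = w_\alpha) \} \]
\[ \cup\ \{ e_u - c e_v ;\ u, v \in \times_{\alpha\in A} V_\alpha,\ c \in K, \text{ and } \exists \beta \in A,\ (u_\beta = c v_\beta \text{ and } \forall \alpha \in A \setminus \{\beta\},\ u_\alpha = v_\alpha) \}, \]
where $e_u \in F$ denotes the function $e_u = \chi_{\{u\}} : \times_{\alpha\in A} V_\alpha \to K$.

The standard immersion of a product $\times_{\alpha\in A} V_\alpha$ of linear spaces over a field $K$ into the tensor product $\otimes_{\alpha\in A} V_\alpha$
is the function $\mu : \times_{\alpha\in A} V_\alpha \to \otimes_{\alpha\in A} V_\alpha$ defined by
\[ \forall v \in \times_{\alpha\in A} V_\alpha,\quad \mu(v) = e_v + G. \]
That is, each element $v$ of $\times_{\alpha\in A} V_\alpha$ is mapped onto the coset in $F/G$ of the characteristic function of $\{v\}$.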
13.12.2 Remark: Definition 13.12.1 may be interpreted as representing each symbol of the form $v$ as $e_v$.
(See Figure 13.12.1.)
[ Diagram: the point (u, 1) plotted over the product space ×α∈A Vα ]
Figure 13.12.1 Function eu = χ{u} for α = 2, V1 = V2 = K = IR, u = (3, 4)
13.12.3 Remark: The tensor concept seems to have been invented by physicists. It sometimes happens
that physicists represent concepts by symbols without regard to the representation of those symbols in terms
of set-theoretic constructions. It is easy to write the symbol v ⊗ w for any vectors v and w, but due to
the equivalence rules, it is difficult to give this a unique set theory representation, particularly for practical
computation requirements. This is not difficult if using a fixed basis, but this has the disadvantage of
dependence on the basis.
13.12.4 Remark: An example which demonstrates how Definition 13.12.1 works is the equality
\[ (2, 4) \otimes (3, 5) = 2(1, 2) \otimes (3, 5) = (1, 2) \otimes (6, 10) = (1, 2) \otimes (1, 9) + (1, 2) \otimes (5, 1) \]
in the tensor product space $\mathbb{R}^2 \otimes \mathbb{R}^2$. Essentially, when the index set is of the form $A = \mathbb{N}_k = \{1, \ldots k\}$,
Definition 13.12.1 creates a tensor space from a free linear space by applying any number of rules of the form
$(u_1, \ldots u_\beta, \ldots u_k) + (u_1, \ldots v_\beta, \ldots u_k) = (u_1, \ldots u_\beta + v_\beta, \ldots u_k)$ and $c(u_1, \ldots u_\beta, \ldots u_k) = (u_1, \ldots c u_\beta, \ldots u_k)$ to
determine equivalence classes. Thus, for instance, $(2, 4) \otimes (3, 5)$ really means $[((2, 4), (3, 5))]$, the equivalence
class of the pair $((2, 4), (3, 5)) \in \mathbb{R}^2 \times \mathbb{R}^2$.
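The chain of equalities in Remark 13.12.4 can be checked in coordinates. This sketch assumes the standard basis of IR2, and it assumes that a monomial v ⊗ w is represented by the 2 × 2 array of products of coordinates. That is the basis-dependent representation which Remark 13.12.3 warns about, so it serves only as a check and not as a definition. All function names are hypothetical.

```python
def outer(v, w):
    """Coordinates of v (x) w in the basis e_i (x) e_j: the array [v_i * w_j]."""
    return [[vi * wj for wj in w] for vi in v]

def add(a, b):
    """Entrywise sum of two coordinate arrays."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    """Scalar multiple of a coordinate array."""
    return [[c * x for x in row] for row in a]

lhs = outer((2, 4), (3, 5))                       # (2,4) (x) (3,5)
step1 = scale(2, outer((1, 2), (3, 5)))           # 2 (1,2) (x) (3,5)
step2 = outer((1, 2), (6, 10))                    # (1,2) (x) (6,10)
step3 = add(outer((1, 2), (1, 9)),                # (1,2) (x) (1,9)
            outer((1, 2), (5, 1)))                #   + (1,2) (x) (5,1)
```

All four arrays coincide, so the four expressions name the same equivalence class even though the pairs of vectors differ.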
[ Near here, show how tensors may be defined in terms of a basis. ]

13.12.5 Remark: Let e : ×α∈A Vα → F be the standard immersion of Definition 13.12.1. Let j : F → F/G
be the standard map of Definition 10.7.1. Then the map µ = j ◦ e : ×α∈A Vα → F/G satisfies
µ(v) = χ{v} + G.
The subspace G of F corresponds to the relation that two vectors in F are equivalent if and only if they are
equal except for. . .
13.12.6 Remark: A metadefinition for the tensor product of linear spaces is given in Section 13.5.
Definitions of tensor product spaces are “characterized” by a set of properties which any definition of tensor
products must satisfy. These properties are asserted in Theorem 13.12.7 for the free linear space definition
of tensor product given here.
13.12.7 Theorem: The tensor product immersion µ : ×α∈A Vα → ⊗α∈A Vα satisfies

(i) µ is multilinear,
(ii) for any linear space W (over the same field as for the Vα ) and multilinear map ψ : ×α∈A Vα → W , there
exists a unique linear map f : ⊗α∈A Vα → W such that ψ = f ◦ µ.
Therefore Definition 13.12.1 satisfies Metadefinition 13.5.1.
13.12.8 Remark: All of the definitions, notations and theorems for the tensor space definition in
Section 13.6 apply also to the free linear space definition.
13.13. Tensor products defined via lists of tensor monomials

13.13.1 Remark: Many important definitions for tensors are stated in terms of tensor monomials and then
extended by linearity to all tensors. Therefore it is important to establish the relation between tensor
monomials and tensors. It is also important to ensure that all definitions expressed in terms of tensor monomials
are independent of the polynomial representation. The “polynomial representation” in Definition 13.13.3 is
not really a polynomial; it is a sum of monomials. A slightly more general-looking expression would have
an arbitrary field element as a factor in front of each monomial term $\otimes_{\alpha\in A} v_{i,\alpha}$, but such factors can be
absorbed into the monomials.
13.13.2 Definition: A tensor monomial in a tensor space $\otimes_{\alpha\in A} V_\alpha$ is a tensor $\otimes_{\alpha\in A} v_\alpha = \mu((v_\alpha)_{\alpha\in A})$
for some $(v_\alpha)_{\alpha\in A} \in \times_{\alpha\in A} V_\alpha$, where $\mu$ is the canonical map of the tensor space.
[ A tensor monomial is called a “simple tensor” by Federer [106]? See Definition 13.6.6. ]
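13.13.3 Definition: A tensor polynomial representation of a tensor $w$ in a tensor space $\otimes_{\alpha\in A} V_\alpha$ is a
sequence $v = ((v_{i,\alpha})_{\alpha\in A})_{i=1}^k \in (\times_{\alpha\in A} V_\alpha)^k$, for some $k \in \mathbb{Z}^+_0$, such that $w = \sum_{i=1}^k \mu((v_{i,\alpha})_{\alpha\in A})$. In other
words, $w(\lambda) = \sum_{i=1}^k \lambda((v_{i,\alpha})_{\alpha\in A})$ for all $\lambda \in L((V_\alpha)_{\alpha\in A}; K)$.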
13.13.4 Remark: The set of all tensor polynomial representations of a tensor space $\otimes_{\alpha\in A} V_\alpha$ in Definition 13.13.3
is a “list space” of the form $\mathrm{List}(\times_{\alpha\in A} V_\alpha) = \bigcup_{k=0}^{\infty} (\times_{\alpha\in A} V_\alpha)^k$. (See Sections 7.12 and 9.12 for
list spaces. Although it is more logical to begin list indices at 0, it will be assumed here that monomial
indices start at 1.) The list space consists of finite lists of indexed sets of vectors. The concatenation of
two such lists yields another list in the same list space. Other useful operations on these lists include the
omission of one or more elements and the insertion of one or more elements. Such operations are used
extensively in tensor algebra, often in an informal manner which can be confusing or ambiguous.
The list representation in Definition 13.13.3 is very close to the way one would represent tensors in symbolic
algebra software. It is also very similar to the free linear space quotient style of definition of tensor spaces
in Section 13.12.
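The list representation of Remark 13.13.4 might be sketched in software as follows. This is an assumption-laden illustration and not the method of any particular symbolic algebra package. A representation is taken to be a Python list of monomials, each monomial is a tuple of vectors (one per index in A = {1, 2}), and the names evaluate, concatenate and the sample bilinear functions are hypothetical.

```python
def evaluate(poly, lam):
    """Value w(lam) of the represented tensor on a multilinear function lam:
    the sum over the monomials in the list of lam applied to the monomial."""
    return sum(lam(*monomial) for monomial in poly)

def concatenate(p, q):
    """Concatenating two lists gives a representation of the sum of the tensors."""
    return p + q

# Four bilinear functions on R^2 x R^2 which pick out one coordinate product each.
coordinate_forms = [lambda v, w, i=i, j=j: v[i] * w[j]
                    for i in range(2) for j in range(2)]

p = [((2, 4), (3, 5))]                             # one monomial
q = [((1, 2), (1, 9)), ((1, 2), (5, 1))]           # two monomials
empty = []                                         # the list of length 0

values_p = [evaluate(p, lam) for lam in coordinate_forms]
values_q = [evaluate(q, lam) for lam in coordinate_forms]
values_pq = [evaluate(concatenate(p, q), lam) for lam in coordinate_forms]
```

The lists p and q differ, yet they give the same values on all four coordinate forms, which span the bilinear functions on IR2 × IR2. This is the sense in which different tensor polynomial representations can specify the same tensor. The empty list evaluates to zero on every form.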
13.13.5 Definition: The extended canonical map for a tensor space $\otimes_{\alpha\in A} V_\alpha$ is the map
$\mu : \mathrm{List}(\times_{\alpha\in A} V_\alpha) \to \otimes_{\alpha\in A} V_\alpha$ defined so that $\mu(v)$ is the tensor represented by the polynomial $v$.

13.13.6 Remark: Although it is clear that all tensor polynomial representations in Definition 13.13.3
specify a tensor, it is not so obvious that all tensors have finite polynomial representations. Theorem 13.13.7
states that the extended canonical map in Definition 13.13.5 is surjective.
13.13.7 Theorem: All tensors in a tensor space ⊗α∈A Vα with finite-dimensional linear spaces Vα for a
finite index set A have a finite tensor polynomial representation.
Proof: For each $\alpha \in A$, let $(v_{\alpha,i})_{i=1}^{n_\alpha}$, where $n_\alpha = \dim(V_\alpha)$, be a basis for the linear space $V_\alpha$.
...

Chapter 14
Topology
14.1 Overview of topology
14.2 History and generalities
14.3 Topological spaces
14.4 Some simple topologies on finite sets
14.5 Interior and closure of sets
14.6 Exterior and boundary of sets
14.7 Limit points and isolated points
14.8 Some simple topologies on countably infinite sets
14.9 Generation of topologies from collections of sets
14.10 The standard topology for the real numbers
14.11 Open bases and open subbases
14.12 Continuous functions
14.13 Homeomorphisms
14.0.1 Remark: Chapters 14 to 17 are not a full introduction to general topology. Only those aspects of
topology which are needed for later chapters are presented here. Apart from the basic definitions, the most
important aspects of topology for differential geometry are topological space construction techniques (such
as product and quotient topologies), classes of topological spaces, continuous curves and paths, topological
transformation groups, and metric spaces.
14.0.2 Remark: Topology is the study of the connectivity of sets and the continuity of functions. So:
Topology is the study of connectivity and continuity.
Roughly speaking, continuous functions are those which preserve the connectivity of sets. (See Section 15.5
for details.) So continuity may be defined in terms of connectivity. But connectivity and continuity may
both be defined in terms of the concepts of interior, exterior and boundary of sets. Thus:
Topology is the study of the interior, exterior and boundary of sets.
The interior Int(S) and exterior Ext(S) of a set S in a topological space X may be defined in terms of the
boundary Bdy(S) as Int(S) = S \ Bdy(S) and Ext(S) = (X \ S) \ Bdy(S). So everything in topology may
be defined in terms of boundaries of sets. Hence:
Topology is the study of boundaries.
The modern technical specification of a topology is expressed in terms of open sets. (An open set is a set
which contains none of its boundary points.) The interior, exterior and boundary of a set are then defined in
terms of these open sets. However, in terms of the meaning of a topology, boundaries are more fundamental
than open sets. Both concepts contain the same information in a technical sense, but boundaries have a
much stronger intuitive appeal than open sets.

14.0.3 Remark: The concepts of the interior, exterior and boundary of sets are familiar from many real-life
contexts, including the following.
(1) Biological cells and the bodies of animals and plants.
(2) Enclosed vehicles such as cars, aeroplanes, spacecraft and ships.
(3) Oceans, seas and lakes, water droplets, icebergs and glaciers.
(4) Nations, territories and continents.
(5) Planets, stars, asteroids.
It is not totally implausible that the earliest vertebrate animals on Earth implemented the concepts of
interior, exterior and boundary in the world-models by which they navigated their environments. These
topological concepts are arguably more fundamental than numbers or propositional logic.
The logical concept of a class of objects implies both an interior and an exterior of the class. So there is
apparently a close fundamental relation between set theory and topology.
Mathematical models of the physical world are very often expressed in terms of the interior, exterior and
boundary of sets. For example, the Stokes theorem (in Section 20.9) expresses integrals over a region in
terms of integrals over its boundary. Conservation equations for physical flows are expressed in terms of
boundary integrals and interior integrals. Solutions of boundary value problems are typically expressed as
integrals over boundaries and interior regions. A large proportion of complex analysis is concerned with
integrals over boundary curves and their relations to integrals over interior regions bounded by curves.
Perhaps most importantly, all of mathematical analysis is expressed in terms of limits, which are effectively
equivalent to boundaries of sets. Thus analysis is based upon the boundary concept, which is the core
concept of topology.
14.0.4 Remark: A topology for a set X defines the interior Int(S), exterior Ext(S) and boundary Bdy(S)
of every subset S of X. These three sets form a partition of X for each set S. The technical specification of
a topology uses open neighbourhoods to define these three sets.
The interior of a set S consists of the points x1 which have at least one neighbourhood which is entirely
inside the set S. The exterior consists of the points x3 which have at least one neighbourhood which is
entirely outside the set S. The boundary consists of the points x2 which are neither interior nor exterior.
In other words, every neighbourhood of a boundary point contains at least one point inside and one point
outside S. (See Figure 14.0.1.)
[ Diagram: a set S and its complement X \ S, with points x1 ∈ Int(S), x2 ∈ Bdy(S) and x3 ∈ Ext(S) and their neighbourhoods ]
Figure 14.0.1 Interior, boundary and exterior of a set
One way to think about set interiors and boundaries is to recall Zeno’s paradox of Achilles and the tortoise.
Imagine the tortoise walking towards the boundary. Whenever Achilles gets to where the tortoise was, he
still has the tortoise between him and the boundary. So Achilles always has a neighbour inside the set.
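On a finite set the three sets of Remark 14.0.4 can be computed directly from a list of open sets. The sketch below assumes the usual characterisation of the interior as the union of the open sets contained in S, and of the exterior as the interior of the complement. The function names and the small example topology are hypothetical.

```python
def interior(S, opens):
    """Union of all open sets which lie entirely inside S."""
    result = set()
    for U in opens:
        if U <= S:
            result |= U
    return result

def exterior(S, X, opens):
    """Interior of the complement of S in X."""
    return interior(X - S, opens)

def boundary(S, X, opens):
    """Points of X which are neither interior nor exterior to S."""
    return X - interior(S, opens) - exterior(S, X, opens)

X = frozenset({1, 2, 3})
# An assumed topology on X: closed under unions and intersections.
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
S = frozenset({1, 3})

int_S = interior(S, opens)
ext_S = exterior(S, X, opens)
bdy_S = boundary(S, X, opens)
```

In this example the point 3 belongs to S but is a boundary point, because its only open neighbourhood is X itself, which also contains the point 2 outside S. The results also agree with the formulas Int(S) = S \ Bdy(S) and Ext(S) = (X \ S) \ Bdy(S) of Remark 14.0.2.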
14.0.5 Remark: The pure mathematical concept of a zero-width boundary of a set (in the case of a metric
space) will not match anything in the physical world if space (or space-time) turns out to be granular in
nature. Even if space and time really are infinitely divisible, no physical measurement can be made to
an infinite number of significant figures. However, the exact topological concepts of interior, exterior and
boundary are applied only within pure mathematical models. The correspondence between the physical
world and mathematical models is an application issue.

14.1. Overview of topology

14.1.1 Remark: In differential geometry layer 0, there are points or events. A point is a location in space.
An event is a combination of a point and a time which locates the event in time. Points (and events) are
represented mathematically as elements of sets. (In the following, the word “point” includes the meaning
“event”.) The only property of a point is its location.
There is no association at all between points in layer 0. The points are independent of each other. The only
relation between points is the equality relation. In other words, given any two points P and Q, it is possible
to say if the points are equal (P = Q) or not equal (P ≠ Q). There is no distance relation. So there is no
distinction between points which are close to P and points which are distant from P .
A set of points can be counted because the equality relation is well defined on any set. Counting a set or
subset can be achieved by labelling points with numbers as illustrated in Figure 14.1.1.
[ Diagram: ten scattered points labelled 0 to 9 ]
Figure 14.1.1 Layer 0: Points (or events)
Although the points in this figure are drawn on 2-dimensional paper, pure points do not have coordinates of
any kind. All we can do with pure points is label and group them. Despite this lack of attributes, a lot of
mathematics requires no point attributes at all. Basic set theory is presented in Chapters 3 to 8.
14.1.2 Remark: Differential geometry layer 1 adds topological structure to simple point sets. Topology
may be thought of as a kind of glue which holds neighbouring points together. Otherwise points would have
no association with their neighbours at all.
14.1.3 Remark: The fundamental concepts of topology are connectivity of sets and continuity of functions.
The simpler concept to understand intuitively is connectivity of sets. Topological structure is defined in terms
of “open neighbourhoods” of points.
14.1.4 Remark: An “open set” is a set which has no boundary. This means that all points of an open set
are “interior points”, which means that they are surrounded by other points of the same set. You can get
an intuitive idea of what this means by removing the national border lines from a map of the world, leaving
only a zero-width space between countries. Then all the remaining points inside countries are interior points
surrounded only by other points of the same country. You might say that the remaining points inside the
boundaries still have an “edge”, but this is only because the map has finite resolution. If you consider the
boundary to have a thickness of exactly zero metres, then no matter how close you get to the boundary,
you will still be within one country or another. Only when you are exactly on the boundary will you have
“neighbours” in more than one country. (The idea of an open set is related to Zeno’s paradox of Achilles
and the tortoise.)
14.1.5 Remark: In topology, it is considered that every point x is in the interior of one or more neighbour-
hoods Nx of x. Figure 14.1.2 illustrates two sets of points. Each point x ∈ S1 and y ∈ S2 is surrounded by a
single circular neighbourhood Nx or Ny in this diagram, but points may have any number of neighbourhoods
of any size and shape.
The fact that the two sets of points S1 and S2 in Figure 14.1.2 are disconnected from each other is clear from
the fact that each point-set S1 and S2 can be covered by neighbourhoods which exclude the neighbourhoods
of the other set of points. That is, the intersection Nx ∩ Ny of neighbourhoods Nx and Ny is empty for all
x ∈ S1 and y ∈ S2 .
[ Diagram: points x and y with neighbourhoods Nx and Ny such that Nx ∩ Ny = ∅ ]
Figure 14.1.2 Disjoint neighbourhoods of individual points of disconnected sets
The neighbourhoods of the sets S1 and S2 may be joined into combined neighbourhoods Ω1 and Ω2 which
contain all of the respective points in the interiors. This is illustrated in Figure 14.1.3. The combined
neighbourhoods are $\Omega_1 = \bigcup_{x\in S_1} N_x$ and $\Omega_2 = \bigcup_{y\in S_2} N_y$ respectively. Clearly S1 is in the interior of Ω1 and
S2 is in the interior of Ω2 , and the intersection Ω1 ∩ Ω2 is empty because all of the neighbourhood pairs Nx
and Ny have an empty intersection.
[ Diagram: combined neighbourhoods $\Omega_1 = \bigcup_{x\in S_1} N_x$ and $\Omega_2 = \bigcup_{y\in S_2} N_y$ with Ω1 ∩ Ω2 = ∅ ]
Figure 14.1.3 Disjoint combined neighbourhoods of disconnected sets
Note that the sets Ω1 and Ω2 are not associated with any particular individual point. The subject of topology
is much simpler as a logical discipline if the open sets are not associated with particular points. Instead of
dealing with an infinite number of little neighbourhoods around an infinite number of points, we can deal
instead with a much smaller number of combined neighbourhoods around combined sets of points. This leads
to a way of thinking that looks more like Figure 14.1.4.
14.1.6 Remark: The “open set” concept is an efficient abstraction of the per-point “open neighbourhood”
concept. The gain in logical efficiency has the disadvantage of a loss of intuitive directness. In fact, in
applications one generally focuses on per-point neighbourhoods rather than abstract open sets. In this book,
Top(X) denotes the set of open sets of a set X and Topx (X) = {Ω ∈ Top(X); x ∈ Ω} denotes the set of open
neighbourhoods of a point x ∈ X. In practice, the pointwise sets Topx (X) are generally more useful than the
abstract set Top(X). It is important to be able to change focus easily between the per-point neighbourhoods
in Topx (X) and the per-set neighbourhoods in Top(X).
In terms of abstract open sets, a set is defined to be disconnected if it can be covered by two disjoint open
sets Ω1 and Ω2 which each contain at least one point of the set. Otherwise the set is connected. In other
words, a set is connected if and only if it has no “gaps” which separate the set into two or more components.
The set illustrated in Figure 14.1.4 is disconnected.
[ Diagram: a set of points covered by two disjoint open sets Ω1 and Ω2 ]
Figure 14.1.4 Definition of disconnectedness in terms of disjoint open set covering

This fact is determined by first specifying the set of all open neighbourhoods in the topological structure of the
point space. (A different choice of neighbourhoods would give a different classification of sets into connected
and disconnected.) In Figure 14.1.4, there are two sets Ω1 and Ω2 which “cover” the set of points. In other
words, all points of the set are inside at least one of the two sets. In this case, the neighbourhoods are disjoint.
This is the definition of disconnectedness of a set. A set X is disconnected if there are two disjoint neighbourhoods
which cover X and there is at least one point of X in each of the two neighbourhoods. These neighbourhoods
effectively “disconnect” the set into two non-empty components. (See also Figure 15.4.1.)
The ability to disconnect points and sets of points from each other is the fundamental task of open sets. A
topology is formally defined to be a set of open sets.
14.1.7 Remark: Chapter 14 presents the basic concepts of topology. One might ask how the set of open
sets is chosen. The choice is quite arbitrary, but must satisfy some rules to keep the definitions self-consistent.
In practice, each commonly used space has a small number of usual topologies which are commonly defined
on that space.
If the set of open neighbourhoods is reduced by removing some open neighbourhoods, this tends to reduce
the ability to disconnect sets into two components. It follows that more sets are then defined to be connected
as the set of neighbourhoods is reduced. Similarly, if the set of neighbourhoods is augmented by adding new
neighbourhood sets, this makes it easier to disconnect sets into two portions. Then you tend to have fewer
connected sets and more disconnected sets. Roughly speaking, the bigger the topology is, the smaller the
set of connected sets is.
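The definition of disconnectedness in Remark 14.1.6, and the rough monotonicity described in Remark 14.1.7, can be tried out on a three-point set. This is a minimal sketch under stated assumptions: the three nested topologies are chosen arbitrarily for illustration, the function names are hypothetical, and only non-empty subsets are counted.

```python
from itertools import combinations

def is_disconnected(S, opens):
    """True if two disjoint open sets cover S and each contains a point of S."""
    return any(
        not (U & V) and S <= (U | V) and (U & S) and (V & S)
        for U, V in combinations(opens, 2)
    )

def connected_subsets(X, opens):
    """All non-empty subsets of X which are not disconnected by the open sets."""
    subsets = [frozenset(c) for r in range(1, len(X) + 1)
               for c in combinations(sorted(X), r)]
    return [S for S in subsets if not is_disconnected(S, opens)]

X = frozenset({1, 2, 3})
coarse = [frozenset(), X]                                      # fewest open sets
middle = [frozenset(), frozenset({1}), frozenset({2, 3}), X]
fine = [frozenset(c) for r in range(4)                         # every subset open
        for c in combinations(sorted(X), r)]

counts = [len(connected_subsets(X, T)) for T in (coarse, middle, fine)]
```

As open sets are added, the number of connected subsets in this example drops from 7 to 4 to 3. With every subset open, only the single-point sets remain connected.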
14.1.8 Remark: Continuity of a function may be defined in terms of connectivity. A function is continuous
if and only if its inverse preserves disconnectedness. (There is a minor technicality that the range of the
function must be within a “normal space”, but as the name suggests, this is a fairly weak requirement.
Normal spaces are defined in Section 15.2.) Most texts do not define continuity of functions in terms of
connectedness, but this style of definition is presented in Section 15.5 as an alternative. It is useful to just
know that continuity can be defined as the preservation of set disconnectedness by the inverse of a function
rather than preservation of set openness.
Continuous functions can be used to define connectedness of sets. Thus the concepts of connectivity and
continuity are “equipotent”. If you know which sets are connected, you can determine which functions are
continuous. If you know which functions are continuous, you can determine which sets are connected.
14.1.9 Remark: It is disconcerting that real physical geometry does not correspond to the properties of
topological spaces. For example, if a zero-diameter single point is removed from a physical plane figure, it is
impossible to detect this. No measurement apparatus has a zero resolution. If a point is removed from the
set ℝ^2, the topology is significantly altered. For a physical set, there is no way to determine whether the
set is open or closed because a zero-thickness boundary is undetectable.
All physical sets have fuzzy boundaries. However, this is not a serious problem. It must never be forgotten
that mathematical models and physical systems occupy different universes. Topology applies to abstract
mathematical models, not to observations of real physical systems. Real points have positive width. Real
boundaries have positive thickness. The advantage of zero-diameter points and zero-thickness boundaries
is that they avoid all arguments about the minimum discernible point width or boundary thickness, which
depend very much on the quality of the measuring equipment and the nature of the experiment.
14.1.10 Remark: In layer 0, it is possible to count sets. The cardinality of sets is determined by defining an
equivalence relation on all sets; then saying that two sets have the same cardinality if they are equivalent. Two
sets A and B have the same cardinality (i.e. number of elements) if and only if there is a bijection h : A → B.
By analogy, in layer 1, two sets A and B are said to have the same topology if there is a homeomorphism h :
A → B. A homeomorphism is defined as a bijection h : A → B with the extra restriction that both h
and its inverse h^{-1} : B → A map open sets to open sets. In other words, a homeomorphism preserves not
only the number of elements in subsets, but also the property of openness of subsets. Since all topological
properties, such as connectivity and continuity, are defined in terms of the topology, two sets with equivalent
topology must also have equivalent connectivity and continuity properties. This reduces work because every
topological fact that is known about one topological space is automatically true for all equivalent topological
spaces.
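For finite spaces, the homeomorphism condition above can be tested directly. The following Python sketch is a minimal illustration, assuming that a map h is represented as a dictionary and a topology as a collection of sets. The function names image and is_homeomorphism are hypothetical.

```python
def image(h, S):
    """Image of the set S under the map h (a dict)."""
    return frozenset(h[a] for a in S)

def is_homeomorphism(h, A, TA, B, TB):
    """Check that h : A -> B is a bijection such that both h and its
    inverse map open sets to open sets."""
    A, B = frozenset(A), frozenset(B)
    TA = {frozenset(s) for s in TA}
    TB = {frozenset(s) for s in TB}
    # h must be a bijection from A onto B.
    if set(h) != A or set(h.values()) != B or len(set(h.values())) != len(h):
        return False
    inverse = {v: k for k, v in h.items()}
    forward_open = all(image(h, s) in TB for s in TA)
    backward_open = all(image(inverse, t) in TA for t in TB)
    return forward_open and backward_open

# Two-point spaces with one open singleton each.
A, TA = {1, 2}, [set(), {1}, {1, 2}]
B, TB = {"p", "q"}, [set(), {"q"}, {"p", "q"}]
print(is_homeomorphism({1: "q", 2: "p"}, A, TA, B, TB))
print(is_homeomorphism({1: "p", 2: "q"}, A, TA, B, TB))
```

The second map is a bijection, so it preserves cardinality, but it does not preserve openness, which is the extra restriction in layer 1.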

In layer 0, we can classify all sets in equivalence classes according to their numbers of elements. This is
achieved in practice by defining a wide class of sets called “ordinal numbers” which are used as standard sets
for comparison with other sets. Thus a set A has 3 elements if and only if there is a bijection h : A → B,
where B is the standard 3-element set (which happens to be the set 3 = {∅, {∅}, {∅, {∅}}}).
In layer 1, the classification of sets according to their topology is one of the greatest preoccupations. It is not
possible to classify topologies in a totally ordered sequence as in the case of cardinality. Algebraic topology
is concerned with the calculation of algebraic attributes of topological spaces. Considerable
research is concerned with determining sufficient conditions which guarantee the topological equivalence (i.e.
the existence of a homeomorphism) between sets. The Poincaré conjecture is just one famous example of a
research question which attempts to uniquely determine the topology of a set in terms of specified attributes.
14.1.11 Remark: If the reader has the impression that Figures 14.1.2, 14.1.3 and 14.1.4 resemble diagrams
of small multi-celled animals, this would not be an entirely unjustified impression. Sets of points are analogous
to single-celled organisms whereas multi-celled organisms have a topological structure which indicates the
connectivity of the cells. When large numbers of cells are present, it makes better sense to focus on the
whole organism rather than the individual cells.
14.2. History and generalities

14.2.1 Remark: The word “topology” was introduced in 1847 by Johann Benedict Listing in “Vorstudien
in Topologie”. This Greek-derived word (from “τόπος”) corresponds roughly to the earlier Latin phrase
“analysis situs” for the same subject. The Latin word “situs” is defined by White [218], page 572, as: “The
manner of lying; the situation, local position, site of a thing”. It is not really clear how this word is related
to the subject matter of topology.
Bell [191], page 492, suggested (in 1937) that the word “topology” was earlier than the term “analysis situs”.
[. . . ] topology (now called analysis situs) as first developed bore but little resemblance to the
elaborate theory which today absorbs all the energies of a prolific school [. . . ]
14.2.2 Remark: Most topology can be divided into two flavours: global connectivity classification and
local continuity analysis.
(1) Global connectivity classification. This flavour of topology is concerned with classifying topolog-
ical spaces into equivalence classes according to their connectivity properties. Algebraic topology, for
example, attaches homeomorphism-invariant algebraic structures to topological spaces to assist in their
classification.
(2) Local continuity analysis. This flavour of topology is mostly concerned with the pointwise continuity
of maps between topological spaces for which the global connectivity properties are either trivial or of
no interest. Topological vector spaces, for example, are generally trivially connected according to all of
the connectivity algorithms defined in algebraic topology.
The difference between these two flavours of topology is well described by Simmons [140], page viii, as follows.
Historically speaking, topology has followed two principal lines of development. In homology theory,
dimension theory, and the study of manifolds, the basic motivation appears to have come from
geometry. In these fields, topological spaces are looked upon as generalized geometric configurations,
and the emphasis is placed on the structure of the spaces themselves. In the other direction, the
main stimulus has been analysis. Continuous functions are the chief objects of interest here, and
topological spaces are regarded primarily as carriers of such functions and as domains over which
they can be integrated. These ideas lead naturally into the theory of Banach and Hilbert spaces
and Banach algebras, the modern theory of integration, and abstract harmonic analysis on locally
compact groups.
Within the context of differential geometry, one could say, broadly speaking, that topology flavour (1) is
principally the concern of the pure mathematicians whereas topology flavour (2) is of more interest to physi-
cists. A prime example of the pure mathematical focus in differential geometry is the Poincaré conjecture.
Physicists tend to be more interested in infinite-dimensional linear spaces of functions (such as vector and
tensor fields) which are defined on topologically well-understood spaces such as ℝ^k or S^k.

14.2.3 Remark: In popular presentations, topology is often explained as the study of properties of topo-
logical spaces which are invariant under homeomorphisms. (This is topology flavour (1) in Remark 14.2.2.)
One could call this “tea-cup and dough-nut topology”.
This kind of homeomorphism-invariant focus of topology is more or less in line with Felix Klein’s 1872
“Erlanger Programm”. The set of homeomorphisms may be regarded as the structure group for the geometry
of topological spaces. This does not quite fit, however. Klein’s definition of a geometry has a fixed set with
a fixed set of automorphisms, not an infinite number of point sets. (See Remark 19.4.2 for related comments
on pseudogroups of homeomorphisms.)
14.2.4 Remark: There is a close correspondence between set connectivity and function continuity. It
follows from Theorem 15.5.8 that a bijection between two open subsets of ℝ^n is a homeomorphism if and
only if the image (and pre-image) of any disconnected pair of sets is a corresponding disconnected pair of
sets. (This applies more generally to normal topological spaces.) In this sense, the connectivity of sets is
a maximal invariant of the set of homeomorphisms of ℝ^n, and the set of homeomorphisms is the maximal
pseudogroup which preserves connectivity. Thus one may say that connectedness is the fundamental invariant
of homeomorphisms in the same sense that tangent vectors are (more or less) the fundamental invariant of
C^1 diffeomorphisms.
14.2.5 Remark: The local continuity analysis flavour of topology is closely associated with the limit
concepts of analysis.
In every interesting topology, each point x ∈ X has an infinite number of neighbourhoods which get closer
and closer to x. (If there were only finitely many neighbourhoods, a single innermost neighbourhood could
be used instead of an infinite set of them.) In this way, the notion of a “limit” is defined. A point x is a
limit of a sequence of points y ∈ X if there is at least one point y in each of the neighbourhoods of x.
The idea of a “limit” has been psychologically troubling ever since the famous limit paradoxes of Zeno. Even
in the 18th and 19th centuries, limits were found to be philosophically troubling. Topology is the subject
which is supposed to resolve all of the issues regarding limits in a logically self-consistent fashion. However,
since the concept of “infinity” is itself difficult to grasp and accept, it is never possible to fully resolve all
difficulties. Topology can only ensure that the formalism is logically self-consistent. Topology cannot take
away the essential discomfort of infinite and infinitesimal concepts.
14.2.6 Remark: Topology has numerous levels of structure within numerous classes. For example, separation
classes are discussed in Section 15.2, connectivity classes in Section 15.4, separability classes in
Section 15.6, and compactness classes in Section 15.7. It is very important to take note of the levels of
structure required for each definition and theorem. Equally, it is important to clearly state the assumptions upon
which all theorems and definitions are based. The full statement of assumptions is sometimes tedious both
to write and read, but this is less painful in the long run than the occasional false application of theorems
and definitions. (It is a common source of error in mathematics to apply a theorem when its assumptions
are not satisfied, particularly in applications outside pure mathematics.)
A similar hazard of false application is present in differential geometry. The level of structure required for
definitions and theorems in differential geometry is often implied in the context, not stated explicitly in the
statement of the definition or theorem. (This is the motivation for presenting the five-level DG structure
model in Section 1.1.)
[ Apparently a topological space was originally defined by Hausdorff in 1914 with the Hausdorff space require-
ment. Then Kuratowski generalized topological spaces in 1922 to the modern definition. Find a printed
reference for this in Remark 14.2.7. ]
14.2.7 Remark: It could be argued that the scope of general topology is too wide. The most general
definition of topology (Definition 14.3.3) permits absurd extremes of scope which are difficult to find appli-
cations for. Topological concepts have been generalized enormously in the last 200 years, far away from the
original focus on Euclidean spaces and metric spaces. Most useful topology is carried out with topological
spaces which are constrained in various ways to make them applicable, for example by requiring the spaces
to fall within one or more of the classes referred to in Remark 14.2.6. In particular, most of the topology
examples in Sections 14.4 and 14.8 are of very dubious applicability. Unbridled generalization often makes
paths into barren deserts with oases few and far between.

[ Find a reference for the statement in Remark 14.2.8 that Cantor’s set theory was motivated by the desire
to fill the gaps between the algebraic real numbers. This seems to be implicitly justified by Bell [190],
pages 273–278, but an explicit reference would be better. ]
14.2.8 Remark: Set theory, as introduced by Georg Cantor, arose during the last quarter of the 19th
century out of a need to fill the gaps between the algebraic real numbers. In topological language, this is a
question about the completeness of the real numbers. So set theory, which is now the framework of almost
all mathematics, may be said to have arisen from a question in topology. Therefore it is no surprise that so
much basic set algebra is required for topology.
14.2.9 Remark: In topology, as in measure theory, many definitions are expressed in terms of arbitrary
sets and vast collections of subsets. This often leads one into the temptation to invoke the axiom of choice
as a convenient way of bringing sets and functions into existence to simplify proofs of theorems. The
mathematician who is against the axiom of choice must be constantly on guard.
14.3. Topological spaces

14.3.1 Remark: There are numerous equivalent formalisms for topological spaces. A topology may be
formalized, for example, in terms of open sets (Definition 14.3.3), closed sets (Notation 14.3.15), interior
operators (Definition 14.5.1), closure operators (Definition 14.5.4) or per-point neighbourhoods. (These are
described and formalized axiomatically in EDM2 [35], 425.B, page 1606.)
The most popular formalism defines a topology to be its set of open sets as in Definition 14.3.3. Sim-
mons [140], page 98, says the following on this subject.
A good deal of research was done along these lines in the early days of topology. It was found
that there are many different ways of defining a topological space, all of which are equivalent to
one another. Several decades of experience have convinced most mathematicians that the open set
approach is the simplest, the smoothest, and the most natural.
From the point of view of doing the practical analysis, undoubtedly the open-set formalization is the best.
However, it does lack direct intuitive appeal, as mentioned in Remark 14.0.2.
14.3.2 Remark: There is some redundancy in the specification tuple for a topological space. The set X
in a tuple (X, T) always satisfies X = ⋃T. But in usage, the set X is usually in the foreground and the
topology T is in the background. Therefore the pair (X, T) is often abbreviated to just X.
14.3.3 Definition: A topology for a set X is a set T ⊆ ℙ(X) such that

(i) {∅, X} ⊆ T,
(ii) ∀Ω1, Ω2 ∈ T, Ω1 ∩ Ω2 ∈ T,
(iii) ∀C ⊆ T, ⋃C ∈ T.
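For a finite set X, the three conditions of Definition 14.3.3 can be checked mechanically. The following Python sketch is a minimal illustration. The function name is_topology is hypothetical, and the shortcut for condition (iii) is valid only because T is finite here: closure under pairwise unions then implies closure under all unions, and the empty union is ∅, which is covered by condition (i).

```python
from itertools import combinations

def is_topology(X, T):
    """Check Definition 14.3.3 for a finite set X and a collection T of sets."""
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    # T must be a set of subsets of X.
    if any(not s <= X for s in T):
        return False
    # (i) The empty set and X are elements of T.
    if frozenset() not in T or X not in T:
        return False
    # (ii) Pairwise intersections, and (iii) pairwise unions, stay in T.
    for a, b in combinations(T, 2):
        if a & b not in T or a | b not in T:
            return False
    return True

print(is_topology({1, 2}, [set(), {1}, {1, 2}]))
print(is_topology({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}]))
```

The second collection fails because the union {1, 2} of two of its elements is missing.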
14.3.4 Definition:
A topological space is a pair X −< (X, T) such that T is a topology for the set X.
A point in a topological space (X, T_X) is an element of X.
The point set of a topological space (X, T_X) is the set X.
A set in a topological space (X, T_X) is a subset of X.
14.3.5 Notation: Top(X) denotes the topology T on a topological space X −< (X, T) when the choice of
topology T on X is implicit in a particular context. That is, Top(X) = T .
14.3.6 Remark: For topological spaces (X, T), specification of the set X is redundant because X = ⋃T.
So it is perhaps perplexing that the pair (X, T ) is usually abbreviated as X. The set X certainly does not
contain the full information in the pair (X, T ), but the set T does contain the full information.
14.3.7 Remark: Since Theorem 14.3.8 (ii′) implies condition (ii) of Definition 14.3.3, the finite intersection
condition (ii′) may be substituted for the two-set intersection condition (ii) without changing the definition.
The two-set intersection rule (ii) in Definition 14.3.3 is chosen to facilitate the proof of validity of topologies.
But by Theorem 14.3.8, the two-set rule always implies the more powerful finite intersection rule.

14.3.8 Theorem: Let (X, T) be a topological space. Then

(ii′) ∀C ⊆ T, ((1 ≤ #(C) < ∞) ⇒ ⋂C ∈ T).
Proof: To show (ii′), let (X, T) be a topological space and let C ⊆ T satisfy 1 ≤ #(C) < ∞. If #(C) = 1,
then C = {Ω} for some set Ω ∈ T. So ⋂C = Ω ∈ T as claimed. If #(C) = 2, then C = {Ω1, Ω2} for some
sets Ω1, Ω2 ∈ T. So ⋂C = Ω1 ∩ Ω2 ∈ T by Definition 14.3.3 (ii).
The result ⋂C ∈ T for general #(C) may be proved by an induction argument. Let n = #(C) > 1. Then
there is a bijection f : ℕ_n → C. So ⋂C = ⋂_{i=1}^{n} f(i) = (⋂_{i=1}^{n−1} f(i)) ∩ f(n). This is an element of T
by Definition 14.3.3 (ii) if (ii′) is valid for #(C) = n − 1. Therefore by induction on n, (ii′) is valid for
all n = #(C).
14.3.9 Remark: A topology for a set X is a set of subsets of X which contains the empty set, the set X,
and every finite intersection and arbitrary union of sets in the topology.
One might reasonably ask why there is an asymmetry between set intersections and set unions in Defini-
tion 14.3.3. A simple answer is that topology would be very boring if closure under arbitrary intersections
was required. In that case, every topology on a set X would be the set of all unions of a partition of X. If
the topology had the ability to separate pairs of points at all (in the sense of the very weak T1 separation
class in Definition 15.2.4), the topology would be the power set ℙ(X). Then the only connected sets would
be singletons.
A better way to answer the question is to consider the intuitive idea of the interior of an open set. A set is
intuitively defined to be “open” if every point in the set is in the interior of the set.
If a point x is in the interior of an open set Ω1 and also in the interior of Ω2 , then we would expect x to
be in the interior of Ω1 ∩ Ω2 although the “walls” of the set would be a little “closer”. In the case of a
union Ω1 ∪ Ω2 , the “walls” of the union will be either the same “distance” away or further away. Since the
union operation makes sets bigger, we are guaranteed to always be in the interior of a set no matter how
many open sets are in a union, even an infinite or uncountably infinite number of open sets. The shrinkage
of a set, on the other hand, has the danger that eventually we might not have enough room to move. Being
in the “interior” of a set intuitively means that we have at least some “space” between each point and the
“walls”. The amount of space for the intersection Ω1 ∩ Ω2 should be the minimum of the space for the
individual sets Ω1 and Ω2 . But the minimum of a “small space” and a “small space” is still a “small space”.
So by naive induction, we expect that the intersection of any finite number of sets will leave us at least some
“room” around each point which is still in the intersection.
Therefore a topological space requires closure under finite intersections and arbitrary unions.
14.3.10 Remark: Some texts (e.g. Simmons [140], section 16, p.92) do not permit a topology to be defined
on an empty set X. However, this would be inconvenient for the statement of some kinds of general rules.
It is tedious to have to always specially exclude cases where a set is empty. Therefore the pair (X, T ) where
X = ∅ and T = {∅} will be regarded as a valid topology in this text. (See Example 14.4.2 for details.)
14.3.11 Notation: Top_x(X) denotes the set of neighbourhoods of a point x ∈ X −< (X, T) for an implicit
topology T on X. That is, Top_x(X) = {Ω ∈ Top(X); x ∈ Ω} = {Ω ∈ T; x ∈ Ω}.
14.3.12 Definition: An open set in a topological space (X, T_X) is any set Ω ∈ T_X.
A closed set in a topological space (X, T_X) is any set K ⊆ X such that X \ K ∈ T_X.
14.3.13 Remark: In the English-language literature, the letter G is often used for open sets and F is
often used for closed sets. Maybe this is because the German word for “open” is “geöffnet” and the French
word for “closed” is “fermé”. The letter K tends to be used for compact sets (Definition 15.7.4), but is also
commonly used for closed sets.
The letter Ω is very popular for open sets because “open”, “offen” and “ouvert” all start with “o” and in
Greek, O-mega means “big O”. (The Greek letter o has the name “o-micron”, which means “small O”.)
14.3.14 Remark: The set of closed sets for a topology is rarely given its own notation. EDM2 [35], 425.B,
page 1606, uses 𝔒 (Fraktur O) for the set of open sets and 𝔉 (Fraktur F) for the set of closed sets. But the
Fraktur font is difficult to write and read. The non-standard Notation 14.3.15 uses an over-bar to indicate
the set of closed sets of a topological space (X, T ) by analogy with Notations 14.3.5 and 14.5.5.

14.3.15 Notation: \overline{Top}(X) denotes the set of closed sets in a topological space X −< (X, T). That is,
\overline{Top}(X) = {F ∈ ℙ(X); X \ F ∈ Top(X)}
                  = {X \ Ω; Ω ∈ Top(X)}
                  = {X \ Ω; Ω ∈ T}.
\overline{Top}_x(X) denotes the set of closed sets in a topological space X −< (X, T) which contain a point x ∈ X.
That is, \overline{Top}_x(X) = {F ∈ \overline{Top}(X); x ∈ F}.
14.3.16 Theorem:
(1) The union of any finite set of closed sets in a topological space is closed.
(2) The intersection of any non-empty set of closed sets in a topological space is closed.
Proof: To prove part (1), let C be a non-empty finite set of closed sets in a topological space X. Let
C′ = {X \ K; K ∈ C}. By Definition 14.3.12, ∀S ∈ C′, S ∈ Top(X). By Theorem 14.3.8, ⋂C′ ∈ Top(X).
But ⋃C = X \ (⋂C′). So the union of C is closed by Definition 14.3.12. If C is empty, ⋃C = ∅, which is
a closed set.
For part (2), let C be a non-empty set of closed sets in a topological space X. Let C′ = {X \ K; K ∈ C}. By
Definition 14.3.12, ∀S ∈ C′, S ∈ Top(X). By Definition 14.3.3 (iii), ⋃C′ ∈ Top(X). But ⋂C = X \ (⋃C′).
So the intersection of C is closed by Definition 14.3.12.

14.3.17 Remark: For any set X, both {∅, X} and the power set ℙ(X) are valid topologies on X. (See
Exercise 46.7.1.) Note that the trivial topology {∅, X} contains only one element if X = ∅. Note also that
the trivial topology and the discrete topology are the same if X is empty or contains only one element. It
is clear that {∅, X} ⊆ T ⊆ ℙ(X) for any topology T on any set X. So the trivial topology is the smallest
topology on a set, and the discrete topology is the largest topology (in the sense of the partial order on sets
defined by set inclusion).
14.3.18 Definition: The trivial topology on a set X is the set T_X = {∅, X}.
A trivial topological space is a topological space (X, T) such that T is the trivial topology on X.
14.3.19 Definition: The discrete topology on a set X is the set T_X = ℙ(X).
A discrete topological space is a topological space (X, T) such that T is the discrete topology on X.
[ The terms “coarse” and “fine” are sometimes used instead of weak and strong. Define these. ]
14.3.20 Remark: When more than one topology is being discussed on a single set, one often compares the
relative “strength” or “weakness” of the topologies. It is often true that if a given topology has a property,
then all stronger, or weaker, topologies also have that property. Therefore knowing the relative strength or
weakness of topologies is often useful for proving properties more easily.
Contrary to standard English usage, topologies T1 and T2 such that T1 = T2 are said to be both weaker and
stronger than each other. It is simpler to adopt this convention than to say, for instance, that “T1 is weaker
than or equal to T2”. For any set X, the trivial topology {∅, X} is clearly weaker than all topologies on X,
and the discrete topology ℙ(X) is stronger than all topologies on X.
14.3.21 Definition: A topology T1 on a set X is said to be weaker than a topology T2 on X if T1 ⊆ T2 .
T1 is said to be stronger than T2 if T1 ⊇ T2 .
14.4. Some simple topologies on finite sets

14.4.1 Remark: It is a useful familiarization exercise to determine in full generality the set of all topologies
on the most trivial sets. On the other hand, the subject of topology is not directed at finite sets. Topology
is principally concerned with limiting processes and continuity, and these concepts are only non-trivial for
infinite sets. However, connectivity does make good sense in a finite set. Connectivity for finite sets is
sometimes referred to as “network topology”, which is discussed in Section 16.10.
The smallest topology Top(X) = {∅, X} on any set X is the trivial topology in Definition 14.3.18. The
largest topology Top(X) = ℙ(X) = {S; S ⊆ X} on any set X is the discrete topology in Definition 14.3.19.
The interesting thing is to determine what the other possibilities are for the topology on any given set X.

14.4.2 Example: The only possible topology on the set X = ∅ is Top(X) = {∅}. This gives the empty
topology (X, T ) = (∅, {∅}) which is mentioned in Remark 14.3.10.
14.4.3 Remark: When considering the possible topologies on countable sets of points, it is convenient to
consider only sets whose “points” are integers. Only the cardinality of the point set matters for this task.
14.4.4 Example: On a single-element set X = {1}, the only possible topology is Top(X) = {∅, X}. In
this case, the trivial topology and the discrete topology are the same.
14.4.5 Example: On a two-element set X = {1, 2}, there are four possible topologies.
     topology             abbreviation
a    {∅, X}               0
b    {∅, {1}, X}          1
c    {∅, {2}, X}          2
d    {∅, {1}, {2}, X}     1, 2
Topology a is the trivial topology. Topology d is the discrete topology. Topologies b and c are equivalent
under a permutation of the point set. So there is only one “interesting” topology on a two-point set. The
four possible topologies (amongst the 2^(2^2) = 16 subsets of ℙ({1, 2})) are illustrated in Figure 14.4.1.
Figure 14.4.1 All topologies on the set {1, 2} (panels a to d)
Since ∅ and X are always elements of a topology on a set X, it makes sense to ignore them. Similarly, the
set brackets are a distraction. So the topologies may be abbreviated as in the right column of the table.
14.4.6 Example: On a three-element set X = {1, 2, 3}, there are 9 unique topologies. The other topologies
are obtained by permuting the point set.
     topology                 multiplicity
a    0                        1
b    1                        3
c    12                       3
d    1, 12                    6
e    1, 2, 12                 3
f    2, 12, 23                3
g    1, 2, 12, 23             6
h    1, 2, 3, 12, 13, 23      1
i    1, 23                    3
     total                    29
Topology a is the trivial topology. Topology h is the discrete topology. Including all of the permutations of the
point set, there are 29 possible topologies amongst the 2^(2^3) = 256 subsets of ℙ({1, 2, 3}). (Combinatorics
enthusiasts may like to amuse themselves by trying to find a general formula for the number of possible
topologies for each point-set cardinality.) Topologies a to g are illustrated in Figure 14.4.2.
Figure 14.4.2 Unique topologies on the set {1, 2, 3} (panels a to g)

The determination of the set of all valid topologies on a four-element set is left to the interested reader. (See
Exercise 46.7.2.)
Since ∅ and X are always elements of a topology on X = ℕ_n for n ∈ ℤ^+, the number of subsets of ℙ(X) which
must be checked to find valid topologies is 2^(2^n − 2) for n ≥ 1. This number increases rather rapidly with
increasing n.
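The brute-force search described above can be sketched in Python for very small point sets. This is only an illustrative sketch, not an efficient algorithm. The names powerset, is_topology, all_topologies and classes_up_to_permutation are hypothetical helpers, and the point set is assumed to be {1, ..., n}. For n = 3 the search finds 29 topologies, which fall into 9 classes under permutations of the point set (including the class of {∅, {1}, {2, 3}, X}).

```python
from itertools import combinations, permutations

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_topology(X, T):
    """Finite-case check of Definition 14.3.3 (pairwise closure suffices)."""
    if frozenset() not in T or X not in T:
        return False
    return all(a & b in T and a | b in T for a, b in combinations(T, 2))

def all_topologies(n):
    """Every topology on {1, ..., n}, found by testing all candidates."""
    X = frozenset(range(1, n + 1))
    # The sets ∅ and X are forced, so only the other subsets are optional.
    optional = [s for s in powerset(X) if s and s != X]
    found = []
    for r in range(len(optional) + 1):
        for extra in combinations(optional, r):
            T = frozenset(extra) | {frozenset(), X}
            if is_topology(X, T):
                found.append(T)
    return found

def classes_up_to_permutation(n):
    """Number of topologies on {1, ..., n} up to relabelling of points."""
    points = range(1, n + 1)
    seen, count = set(), 0
    for T in all_topologies(n):
        if T in seen:
            continue
        count += 1
        for p in permutations(points):
            relabel = dict(zip(points, p))
            seen.add(frozenset(frozenset(relabel[x] for x in s) for s in T))
    return count

for n in range(4):
    print(n, len(all_topologies(n)), classes_up_to_permutation(n))
```

The number of candidates doubles with every additional optional subset, so this approach is not practical much beyond n = 4.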
14.4.7 Remark: In the case of finite point sets X, there is a simple duality between the open sets and
closed sets because an “arbitrary union” of sets in a topology on X means a “finite union” of sets when X
is finite.
Let T be a topology on a finite set X. Then T̃ = {X \ Ω; Ω ∈ T} is a topology on X. The topology T̃ is a kind
of “dual topology” of T. The closure of T̃ under set union follows from the fact that (X \ Ω1) ∪ (X \ Ω2) =
X \ (Ω1 ∩ Ω2) for all Ω1, Ω2 ∈ T. Closure under intersections follows from (X \ Ω1) ∩ (X \ Ω2) = X \ (Ω1 ∪ Ω2).
Of course the dual of the dual T̃ is the same as the original topology T.
In Example 14.4.5, topology b is the dual of topology c. Both topologies a and d are self-dual.
In Example 14.4.6, the dual of topology b is the same as a permutation of topology c and the dual of topology e
is the same as a permutation of topology f. Topologies d and g are equivalent (under set permutations) to their
own dual topologies. Topologies a and h are self-dual.
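The dual construction for finite point sets is easy to try out. The following Python sketch is an illustration only. The function names dual and is_topology are hypothetical, and the check of the topology axioms relies on the finiteness of X.

```python
from itertools import combinations

def dual(X, T):
    """The collection of complements X \\ Ω of the sets Ω in T."""
    X = frozenset(X)
    return frozenset(X - frozenset(s) for s in T)

def is_topology(X, T):
    """Finite-case check of Definition 14.3.3 (pairwise closure suffices)."""
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    if frozenset() not in T or X not in T:
        return False
    return all(a & b in T and a | b in T for a, b in combinations(T, 2))

# Topologies b and c of Example 14.4.5 on X = {1, 2}.
X = {1, 2}
b = frozenset({frozenset(), frozenset({1}), frozenset(X)})
c = frozenset({frozenset(), frozenset({2}), frozenset(X)})
print(dual(X, b) == c)           # b and c are dual to each other
print(dual(X, dual(X, b)) == b)  # the dual of the dual is the original
```

For an infinite point set the collection of complements need not be closed under arbitrary unions, so this duality is specific to the finite case.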
14.4.8 Example: In practice, one usually wants topologies which have some sort of uniformity or symmetry.
For example, a topology on sets like ℤ, ℚ, ℝ and ℝ^n for n ∈ ℤ^+_0 would generally be expected to
be invariant under arbitrary translations. Translation invariance is a very strong constraint which greatly
reduces the set of possible topologies on a set.
If X is a finite set, the only topologies on X which are invariant under all permutations of X are the trivial
and discrete topologies. To see this, suppose {x} ∈ Top(X) for some x ∈ X. Then permutation invariance
implies that {x} ∈ Top(X) for all x ∈ X. Since all unions of elements of Top(X) are elements of Top(X), this
implies that Top(X) = ℙ(X). Now suppose that A ∈ Top(X) is an arbitrary non-empty open set such that A ≠ X.
Then there are elements x, y ∈ X such that x ∈ A and y ∉ A. So by permutation invariance of Top(X), the
set B_z = swap_{y,z}(A) is an element of Top(X) for all z ∈ Z = X \ {x}. (See Definition 7.12.2 (vii) for the
swap function.) Therefore the finite intersection B = ⋂_{z∈Z} B_z must be an element of Top(X). But B = {x}.
So by the first argument, Top(X) is once again the discrete topology. Note that the finiteness of X was an
essential step in this proof.
It follows that there are no interesting uniform topologies on finite sets with respect to the set of all permu-
tations of the point set.
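The key step in the argument of Example 14.4.8 is the claim that the intersection of the swapped copies of A is the singleton {x}. This can be checked exhaustively on a small set. The following Python sketch is an illustration under stated assumptions: the function swap is a hypothetical stand-in for the swap function of Definition 7.12.2 (vii), taken here to be the permutation which exchanges y and z and fixes all other points, and isolate is a hypothetical name for the intersection construction.

```python
from itertools import combinations

def swap(y, z, A):
    """Image of the set A under the permutation which exchanges y and z."""
    def s(p):
        return z if p == y else y if p == z else p
    return frozenset(s(p) for p in A)

def isolate(X, A, x, y):
    """Intersection of swap(y, z, A) over all z in X \\ {x},
    where x is in A and y is not in A."""
    X, A = frozenset(X), frozenset(A)
    if x not in A or y in A:
        raise ValueError("need x in A and y not in A")
    B = X
    for z in X - {x}:
        B &= swap(y, z, A)
    return B

# Exhaustive check on a five-point set.
X = frozenset(range(1, 6))
ok = all(
    isolate(X, A, x, y) == frozenset({x})
    for r in range(1, len(X))
    for A in map(frozenset, combinations(sorted(X), r))
    for x in A
    for y in X - A
)
print(ok)
```

The loop over z is a finite intersection, which is where the finiteness of X enters.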
14.4.9 Example: If the set of all permutations of a finite set is replaced by the set of rotations, the
situation in Example 14.4.8 is slightly different. A rotation R_d : ℕ_n → ℕ_n by distance d for n, d ∈ ℤ^+_0 with
n ≥ 1 and d < n is defined by R_d : x ↦ 1 + (x + d − 1) mod n. Suppose n is a composite integer with n = kℓ
and k, ℓ ∈ ℤ^+ \ {1}. (See Definition 7.4.2 for composite integers.) Let T be the set of subsets of ℕ_n which
have periodicity k. Then T is a topology on ℕ_n. This follows from the fact that the union of any set of
subsets with period k also has period k. The same is true for intersections. This topology is then the same
as ℓ copies of the discrete topology on ℕ_k.
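The periodic topology of Example 14.4.9 can be constructed explicitly for small n. The following Python sketch is an illustration only. It assumes that the point set is {1, ..., n} and that a set "has periodicity k" when it is built by repeating a pattern in {1, ..., k} across all n/k blocks. The function names rotate, periodic_topology, is_rotation_invariant and closed_under_union_and_intersection are hypothetical.

```python
from itertools import combinations

def rotate(x, d, n):
    """R_d(x) = 1 + (x + d - 1) mod n on the point set {1, ..., n}."""
    return 1 + (x + d - 1) % n

def periodic_topology(n, k):
    """All subsets of {1, ..., n} with period k, where k divides n."""
    if n % k != 0:
        raise ValueError("k must divide n")
    T = set()
    for r in range(k + 1):
        for pattern in combinations(range(1, k + 1), r):
            # Repeat the pattern chosen in {1, ..., k} across all n // k blocks.
            T.add(frozenset(x + j * k for x in pattern for j in range(n // k)))
    return T

def is_rotation_invariant(T, n):
    """True if every rotation maps the collection T onto itself."""
    return all({frozenset(rotate(x, d, n) for x in S) for S in T} == T
               for d in range(n))

def closed_under_union_and_intersection(T):
    """Pairwise closure, which suffices for a finite collection."""
    return all(a | b in T and a & b in T for a in T for b in T)

T = periodic_topology(6, 2)
print(sorted(sorted(S) for S in T))
print(is_rotation_invariant(T, 6), closed_under_union_and_intersection(T))
```

With n = 6 and k = 2 the resulting topology is neither trivial nor discrete, in contrast to the situation for the full permutation group.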
14.5. Interior and closure of sets

14.5.1 Definition: The (topological) interior of a set S in a topological space (X, T) is the union of all
open sets in (X, T) which are included in the set S. In other words, the interior of S is the set
⋃_{Ω∈T, Ω⊆S} Ω = ⋃{Ω ∈ T; Ω ⊆ S}.
14.5.2 Notation: Int(S) denotes the interior of a set S with respect to an implicit topological space (X, T ).
14.5.3 Remark: The intersection expression in Definition 14.5.4 is well-defined because the set {X \Ω; Ω ∈
T and S ⊆ X \ Ω} is non-empty since ∅ ∈ T and S ⊆ X \ ∅ = X.

14.5.4 Definition: The (topological) closure of a set S in a topological space (X, T) is the intersection of
all closed sets in (X, T) which include the set S. In other words, the closure of S is the set
⋂_{Ω∈T, X\Ω⊇S} (X \ Ω) = ⋂{X \ Ω; Ω ∈ T and S ⊆ X \ Ω}
                      = ⋂{K ∈ ℙ(X); X \ K ∈ T and S ⊆ K}
                      = ⋂{K ∈ \overline{Top}(X); S ⊆ K}.
14.5.5 Notation: S̄ denotes the closure of a set S with respect to an implicit topology.
14.5.6 Remark: The notation S̄ for the closure of a set S is used by Simmons [140], page 68; Rudin [137],
page 39; Rudin [138], page 7; Taylor [145], page 26; Robertson/Robertson [136], page 6; Treves [147],
page xv; Gilbarg/Trudinger [111], page 9; Adams [93], page 9; Helms [117], page 1; Darling [14], page 114;
Malliavin [36], page 122; and Reinhardt [135], page 214. The notation S^a is used by EDM2 [35], 425.B,
page 1607; and Yosida [151], page 3. (The letter “a” may be mnemonic for “adherent set”. But Yosida [151]
says that it comes from the German phrase for closure: “abgeschlossene Hülle”.) The notations S^− and Cl S
are given by Ahlfors [94], page 53.
14.5.7 Remark: Notations 14.5.2 and 14.5.5 require the topology to be determined by the context. So
there can be confusion if there is more than one topology under consideration.
As mentioned in Remark 14.3.6, a topological space (X, T) is fully determined by the set T. Therefore
Notation 14.5.8 fully determines the implicit topological space (X, T) by naming only the topology T
because X = ⋃T.
14.5.8 Notation: Int_T(S) denotes the interior of a set S with respect to a topological space (X, T).
14.5.9 Theorem: Let S be a subset of a topological space X. Then:

(1) Int(S) ∈ Top(X).
(2) Int(S) ⊆ S.
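(3) $\overline{S} \in \overline{\mathrm{Top}}(X)$.
(4) S ⊆ S̄.
(5) X \ S̄ = Int(X \ S).
(6) S̄ = X \ Int(X \ S).
(7) Int(S) ⊆ S̄.
(8) $\mathrm{Int}(S) = X \setminus \overline{X \setminus S}$.
(9) $\overline{X \setminus S} = X \setminus \mathrm{Int}(S)$.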
Let S1 and S2 be subsets of a topological space X. Then:
(10) S1 ⊆ S2 ⇒ Int(S1 ) ⊆ Int(S2 ).
(11) S1 ⊆ S2 ⇒ S̄1 ⊆ S̄2 .
Proof: Part (1) follows from Definition 14.3.3 because the interior of a set is the union of a set of open
sets by Definition 14.5.1.
For part (2), let C = {Ω ∈ Top(X); Ω ⊆ S}. Then ∀z ∈ C, z ⊆ S. So ⋃C ⊆ S by Theorem 5.14.7 (xiii).
That is, Int(S) ⊆ S.
Part (3) follows from Theorem 14.3.16 because the closure of a set is the intersection of a non-empty set of
closed sets by Definition 14.5.4. (The set C of closed supersets of a set S ⊆ X is non-empty because X is a
closed set in any topological space X. Therefore X ∈ C. See also Remark 14.5.3.)
For part (4), let C = {K ∈ $\overline{\mathrm{Top}}(X)$; S ⊆ K}. Then ∀z ∈ C, z ⊇ S. So ⋂C ⊇ S by Theorem 5.14.7 (xiv).
That is, S̄ ⊇ S.
Part (5) follows from Int(X \ S) = ⋃{Ω ∈ Top(X); Ω ⊆ X \ S} = ⋃{Ω; Ω ∈ Top(X) ∧ S ⊆ X \ Ω} =
X \ ⋂{X \ Ω; Ω ∈ Top(X) ∧ S ⊆ X \ Ω}, which equals X \ S̄ by Definition 14.5.4.
Part (6) follows from part (5) and Theorem 5.13.10 (viii).

Part (7) follows from parts (2) and (4).

Part (8) follows from part (5) by substituting X \ S for S.
Part (9) follows from part (8).
Part (10) follows from Theorem 5.14.7 (i) because S1 ⊆ S2 implies that {Ω ∈ Top(X); Ω ⊆ S1 } ⊆ {Ω ∈
Top(X); Ω ⊆ S2 } (by the transitivity of the set inclusion relation).
Part (11) follows from Theorem 5.14.7 (ii) because S1 ⊆ S2 implies that {K ∈ $\overline{\mathrm{Top}}(X)$; K ⊇ S1 } ⊇ {K ∈
$\overline{\mathrm{Top}}(X)$; K ⊇ S2 } (by the transitivity of the set inclusion relation).
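14.5.10 Theorem: Let S, S1 and S2 be subsets of a topological space X. Then:
(1) S ∈ Top(X) ⇔ Int(S) = S.
(2) $S \in \overline{\mathrm{Top}}(X) \Leftrightarrow \overline{S} = S$.
(3) Int(Int(S)) = Int(S).
(4) $\overline{\overline{S}} = \overline{S}$.
(5) $\overline{\mathrm{Int}(S)} \subseteq \overline{S}$.
(6) Int(S̄) ⊇ Int(S).
(7) Int(S1 ) ∪ Int(S2 ) ⊆ Int(S1 ∪ S2 ).
(8) Int(S1 ) ∩ Int(S2 ) ⊆ Int(S1 ∩ S2 ).
(9) $\overline{S_1} \cup \overline{S_2} \supseteq \overline{S_1 \cup S_2}$.
(10) $\overline{S_1} \cap \overline{S_2} \supseteq \overline{S_1 \cap S_2}$.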
Proof: For part (1), let S be an open set in X. That is, S ∈ Top(X). Then S ∈ {Ω ∈ Top(X); Ω ⊆ S}.
So S ⊆ ⋃{Ω ∈ Top(X); Ω ⊆ S} = Int(S) by Definition 14.5.1. But Int(S) ⊆ S by Theorem 14.5.9 (2).
So Int(S) = S.
Now suppose that S ⊆ X and Int(S) = S. Then S is open by Theorem 14.5.9 (1). So part (1) is verified.
For part (2), let S be a closed set in X. That is, $S \in \overline{\mathrm{Top}}(X)$. Then S ∈ {K ∈ $\overline{\mathrm{Top}}(X)$; K ⊇ S}. So
S ⊇ ⋂{K ∈ $\overline{\mathrm{Top}}(X)$; K ⊇ S} = S̄ by Definition 14.5.4. But S̄ ⊇ S by Theorem 14.5.9 (4). So S̄ = S.
Now suppose that S ⊆ X and S̄ = S. Then S is closed by Theorem 14.5.9 (3). So part (2) is verified.
Part (3) follows from part (1) and Theorem 14.5.9 (1).
Part (5) follows from Theorem 14.5.9 parts (2) and (11).
Part (6) follows from Theorem 14.5.9 parts (4) and (10).
For part (7), note that Int(S1 ) ∪ Int(S2 ) is open by Theorem 14.5.9 (1) and Definition 14.3.3 (iii). So Int(S1 ) ∪
Int(S2 ) = Int(Int(S1 ) ∪ Int(S2 )) by part (1). This is a subset of Int(S1 ∪ S2 ) by Theorem 14.5.9 (10).
Part (8) may be proved as for part (7) except that Definition 14.3.3 part (ii) is used instead of part (iii).
For part (9), note that S̄1 ∪ S̄2 is closed by Theorem 14.5.9 (3) and Theorem 14.3.16 (1). So $\overline{S_1} \cup \overline{S_2} = \overline{\overline{S_1} \cup \overline{S_2}}$
by part (2). This is a superset of $\overline{S_1 \cup S_2}$ by Theorem 14.5.9 (11).
Part (10) may be proved as for part (9) except that Theorem 14.3.16 part (2) is used instead of part (1).
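14.5.11 Theorem: The closure of any set S in a topological space X is equal to the complement of the
interior of the complement of S in X. In other words, S̄ = X \ Int(X \ S).
The interior of S is the complement of the closure of the complement of S. That is, $\mathrm{Int}(S) = X \setminus \overline{X \setminus S}$.
Proof: To prove the first part, note that
$$\begin{aligned}
\overline{S} &= \bigcap\,\{X \setminus \Omega;\ \Omega \in \mathrm{Top}(X) \wedge S \subseteq X \setminus \Omega\} \\
&= X \setminus \bigcup\,\{\Omega;\ \Omega \in \mathrm{Top}(X) \wedge S \subseteq X \setminus \Omega\} \\
&= X \setminus \bigcup\,\{\Omega;\ \Omega \in \mathrm{Top}(X) \wedge \Omega \subseteq X \setminus S\} \\
&= X \setminus \mathrm{Int}(X \setminus S).
\end{aligned}$$
The second part follows by substituting X \ S for S.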
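The two definitions and the duality in Theorem 14.5.11 can be checked by brute force on a small finite space. The following is only a sketch: the three-point set, the four-element topology and the function names `interior` and `closure` are illustrative assumptions, not part of the text.

```python
from itertools import chain

def interior(S, opens):
    """Union of all open sets included in S (Definition 14.5.1)."""
    return frozenset(chain.from_iterable(G for G in opens if G <= S))

def closure(S, X, opens):
    """Intersection of all closed sets which include S (Definition 14.5.4)."""
    result = frozenset(X)          # X itself is always a closed superset of S
    for G in opens:
        K = X - G                  # K is closed because its complement G is open
        if S <= K:
            result &= K
    return result

# Hypothetical example space: X = {1, 2, 3} with a four-element topology.
X = frozenset({1, 2, 3})
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]

S = frozenset({2})
int_S = interior(S, opens)         # no non-empty open set fits inside {2}
cl_S = closure(S, X, opens)        # closed supersets of {2} are {2, 3} and X

# Theorem 14.5.11: closure(S) = X \ Int(X \ S) and Int(S) = X \ closure(X \ S).
duality_closure = (cl_S == X - interior(X - S, opens))
duality_interior = (int_S == X - closure(X - S, X, opens))
```

On infinite spaces no such enumeration is possible, so this only illustrates the definitions; it proves nothing beyond the chosen example.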

14.5.12 Theorem:
(1) An element x of a set S in a topological space X is an element of the interior of S in X if and only if
∃Ω ∈ Top_x(X), Ω ⊆ S.
(2) A point x in a topological space X is an element of the closure of a set S in X if and only if
∀Ω ∈ Top_x(X), Ω ∩ S ≠ ∅.
Proof: To show part (1), assume that ∃Ω ∈ Top_x(X), Ω ⊆ S. Let G ∈ Top_x(X) satisfy G ⊆ S. Then
x ∈ G ∈ {Ω ∈ Top(X); Ω ⊆ S}. So by Definition 14.5.1, x is in the interior of S.
Conversely, assume that x ∈ Int(S). Then x ∈ ⋃{Ω ∈ Top(X); Ω ⊆ S}. Therefore ∃Ω ∈ Top(X), x ∈ Ω ⊆ S.
It follows that x ∈ Ω and so Ω ∈ Top_x(X). Hence ∃Ω ∈ Top_x(X), x ∈ Ω ⊆ S, as was to be shown.
Part (2) follows easily from part (1) by the application of Theorem 14.5.11. But it is instructive to prove it
directly. For any x ∈ X,
$$\begin{aligned}
x \in \bigcap\,\{X \setminus \Omega;\ \Omega \in T \text{ and } S \subseteq X \setminus \Omega\}
&\Leftrightarrow x \notin \bigcup\,\{\Omega;\ \Omega \in T \text{ and } S \subseteq X \setminus \Omega\} \\
&\Leftrightarrow \neg\,\exists \Omega \in \mathrm{Top}(X),\ (x \in \Omega \wedge S \subseteq X \setminus \Omega) \\
&\Leftrightarrow \neg\,\exists \Omega \in \mathrm{Top}(X),\ (x \in \Omega \wedge S \cap \Omega = \emptyset) \\
&\Leftrightarrow \forall \Omega \in \mathrm{Top}(X),\ (x \notin \Omega \vee S \cap \Omega \ne \emptyset) \\
&\Leftrightarrow \forall \Omega \in \mathrm{Top}(X),\ (x \in \Omega \Rightarrow S \cap \Omega \ne \emptyset) \\
&\Leftrightarrow \forall \Omega \in \mathrm{Top}_x(X),\ S \cap \Omega \ne \emptyset.
\end{aligned}$$
In other words, x is in the closure of S if and only if ∀Ω ∈ Top_x(X), Ω ∩ S ≠ ∅, as claimed.
14.5.13 Remark: Definitions 14.5.1 and 14.5.4, and Theorem 14.5.12 may be summarized as follows.
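$$\begin{aligned}
\mathrm{Int}(S) &= \bigcup\,\{\Omega \in \mathrm{Top}(X);\ \Omega \subseteq S\} \\
&= \{x \in X;\ \exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq S\} \\
\overline{S} &= \bigcap\,\{X \setminus \Omega;\ \Omega \in \mathrm{Top}(X) \text{ and } S \subseteq X \setminus \Omega\} \\
&= \bigcap\,\{K \in \overline{\mathrm{Top}}(X);\ S \subseteq K\} \\
&= \{x \in X;\ \forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap S \ne \emptyset\},
\end{aligned}$$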
for any subset S of a topological space X.
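The agreement between the set-level and point-level expressions summarized here can be tested exhaustively on a finite example. In this sketch the space, the topology and all function names are assumptions made for illustration; `open_nbhds` plays the role of Top_x(X).

```python
from itertools import chain, combinations

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def interior_union(S, opens):
    """Int(S) as the union of the open sets included in S."""
    return frozenset(chain.from_iterable(G for G in opens if G <= S))

def closure_intersection(S, X, opens):
    """Closure of S as the intersection of the closed supersets of S."""
    result = frozenset(X)
    for G in opens:
        if S <= X - G:
            result &= X - G
    return result

def open_nbhds(x, opens):
    """Top_x(X): the open sets which contain the point x."""
    return [G for G in opens if x in G]

def interior_pointwise(S, X, opens):
    """Points with at least one open neighbourhood included in S."""
    return frozenset(x for x in X
                     if any(G <= S for G in open_nbhds(x, opens)))

def closure_pointwise(S, X, opens):
    """Points all of whose open neighbourhoods meet S."""
    return frozenset(x for x in X
                     if all(G & S for G in open_nbhds(x, opens)))

# Hypothetical example space.
X = frozenset({1, 2, 3})
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]

agree = all(
    interior_union(S, opens) == interior_pointwise(S, X, opens)
    and closure_intersection(S, X, opens) == closure_pointwise(S, X, opens)
    for S in powerset(X)
)
```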
14.5.14 Remark: Theorem 14.5.15 uses Notation 14.5.8 for the interior Int_T(S) of a set S with respect to
a topology T . Unfortunately, the notation S̄ for the closure of a set does not easily permit the addition of
a subscript to indicate the implied topology explicitly. Therefore the ad-hoc notation Closure_T(S) may be
used for the closure of a set S with respect to a topology T .
14.5.15 Theorem: Let T1 and T2 be topologies on a set X such that T1 ⊆ T2 . Then:
(1) Int_{T1}(S) ⊆ Int_{T2}(S) for all S ∈ IP(X).
(2) Closure_{T1}(S) ⊇ Closure_{T2}(S) for all S ∈ IP(X).
Proof: For part (1), note that {Ω ∈ T1 ; Ω ⊆ S} ⊆ {Ω ∈ T2 ; Ω ⊆ S}. Therefore by Theorem 5.14.7 (i) and
Definition 14.5.1, Int_{T1}(S) = ⋃{Ω ∈ T1 ; Ω ⊆ S} ⊆ ⋃{Ω ∈ T2 ; Ω ⊆ S} = Int_{T2}(S).
Part (2) follows from part (1) and Theorem 14.5.11 because Closure_T(S) = X \ Int_T(X \ S) for T = T1 and T2.
14.5.16 Remark: Theorem 14.5.15 says that strengthening the topology on a fixed set X makes interiors
of sets larger, and makes closures of sets smaller. (The inequalities in Theorem 14.5.15 are illustrated in
Figure 14.6.2.) In the extreme case of the strongest topology on X, namely the discrete topology IP(X),
both the interior and closure of S equal S itself. (This is discussed in more detail in Remark 14.8.10.) In
the opposite extreme of the trivial topology T = {∅, X} on X, the interior of any set S in IP(X) \ {∅, X}
equals ∅, and the closure of such a set S equals X. (See Remark 14.6.13 for similar comments on the exterior
and boundary of sets.)
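The monotonicity described in this remark can be observed on a chain of three topologies on one small set. The sketch below is an illustration under assumed example data (the set X and the intermediate topology T2 are hypothetical); the closure is computed from the interior via Theorem 14.5.11.

```python
from itertools import chain, combinations

def interior(S, opens):
    """Union of all open sets included in S."""
    return frozenset(chain.from_iterable(G for G in opens if G <= S))

def closure(S, X, opens):
    """Closure computed as X \\ Int(X \\ S), as in Theorem 14.5.11."""
    return X - interior(X - S, opens)

X = frozenset({1, 2, 3})
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(sorted(X), r)]

# Three topologies on X with T1 ⊆ T2 ⊆ T3 (hypothetical example data).
T1 = [frozenset(), X]                                   # trivial topology
T2 = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
T3 = subsets                                            # discrete topology

# Strengthening the topology enlarges interiors and shrinks closures.
interiors_grow = all(
    interior(S, T1) <= interior(S, T2) <= interior(S, T3) for S in subsets)
closures_shrink = all(
    closure(S, X, T1) >= closure(S, X, T2) >= closure(S, X, T3) for S in subsets)
```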

14.6. Exterior and boundary of sets

14.6.1 Remark: It seems reasonable to define the exterior of a set analogously to the interior. This is
done in Definition 14.6.2. The exterior turns out to be the same as the complement of the closure given in
Definition 14.5.4. A reasonable notation for the exterior of a set S would be Ext(S).
It would perhaps also be reasonable to define the “exterior closure” of a set as the complement of the interior.
But there seems to be little demand for this.
14.6.2 Definition: The (topological) exterior of a set S in a topological space X is the union of all open
sets in X which are included in the set X \ S. In other words, the exterior of S is the set
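$$\bigcup_{\substack{\Omega \in \mathrm{Top}(X) \\ \Omega \subseteq X \setminus S}} \Omega \;=\; \bigcup\,\{\Omega \in \mathrm{Top}(X);\ \Omega \cap S = \emptyset\}.$$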
14.6.3 Notation: Ext(S) denotes the exterior of a set S in an implicit topological space.
14.6.4 Definition: The (topological) boundary of a set S in a topological space X is the complement of
the interior of S within the closure of S; in other words, S̄ \ Int(S).
14.6.5 Notation: Bdy(S) denotes the boundary of a set S with respect to an implicit topology.
∂S is an alternative notation for Bdy(S).
14.6.6 Remark: Combining Definition 14.6.4 and Notation 14.6.5 gives ∂S = Bdy(S) = S̄ \ Int(S) for
subsets S of a topological space X.
The notation ∂S is much more common than Bdy(S) for the boundary of a set S. However, the curly-dee
symbol ∂ is also used extensively for denoting partial derivatives. In fact, the two concepts are closely
related within the context of the theory of distributions. (The gradient of the indicator function of the set S
is related to the boundary of S.)
14.6.7 Theorem: Let S, S1 and S2 be subsets of a topological space X. Then:
(1) Ext(S) = Int(X \ S).
(2) Ext(S) ∈ Top(X).
(3) Ext(S) = X \ S̄.
(4) Ext(S) ∩ S̄ = ∅.
(5) Ext(S) ⊆ X \ S.
(6) $\mathrm{Bdy}(S) = \overline{S} \cap \overline{X \setminus S}$.
(7) $\mathrm{Bdy}(S) \in \overline{\mathrm{Top}}(X)$.
(8) Bdy(S) = X \ Int(S) \ Ext(S).
(9) Bdy(S) = Bdy(X \ S).
(10) X = Int(S) ∪ Bdy(S) ∪ Ext(S).
(11) (Int(S) ∩ Bdy(S) = ∅) ∧ (Int(S) ∩ Ext(S) = ∅) ∧ (Bdy(S) ∩ Ext(S) = ∅).
(12) $\overline{X \setminus S} = \mathrm{Bdy}(S) \cup \mathrm{Ext}(S)$.
(13) Int(S) = S̄ \ Bdy(S).
(14) Int(S) = S \ Bdy(S).
(15) Ext(S) = (X \ S) \ Bdy(S).
(16) S̄ = Int(S) ∪ Bdy(S).
(17) S̄ = S ∪ Bdy(S).
(18) Ext(Ext(S)) = Int(S̄).
(19) Int(Ext(S)) = Ext(S).
(20) Bdy(Int(S)) ⊆ Bdy(S).
(21) Bdy(S̄) ⊆ Bdy(S).
(22) Bdy(Ext(S)) ⊆ Bdy(S).
(23) $\overline{\mathrm{Bdy}(S)} = \mathrm{Bdy}(S)$.
(24) Bdy(Bdy(S)) ⊆ Bdy(S).
(25) S1 ⊆ S2 ⇒ Ext(S1 ) ⊇ Ext(S2 ).
(25) S1 ⊆ S2 ⇒ Ext(S1 ) ⊇ Ext(S2 ).
Proof: Part (1) follows from Ext(S) = ⋃{Ω ∈ Top(X); Ω ∩ S = ∅} = ⋃{Ω ∈ Top(X); Ω ⊆ X \ S}, which
equals Int(X \ S) by Definition 14.5.1.
Part (2) follows from Theorem 14.5.9 (1) because Ext(S) = Int(X \ S) by part (1).
Part (4) follows from part (3) and Theorem 5.13.10 (v).
Part (6) follows from Definition 14.6.4 and Theorem 14.5.9 (8).
Part (7) follows from part (6), Theorem 14.5.9 (3) and Theorem 14.3.16 (2).
For part (8), note that Bdy(S) = S̄ \ Int(S) = X \ Int(X \ S) \ Int(S) = X \ Int(S) \ Ext(S) by part (1).
To show part (9), note that by part (8), Bdy(X \ S) = X \ Int(X \ S) \ Ext(X \ S) = X \ Ext(S) \ Int(S)
by part (1) and Theorem 5.13.10 (viii). This equals Bdy(S) by part (8).
Part (10) follows from part (8) because Bdy(S) = X \ Int(S) \ Ext(S) = X \ (Int(S) ∪ Ext(S)).
Part (11) follows from part (8) because Int(S) ∩ Ext(S) = ∅ follows from part (4) and Theorem 14.5.9 (7).
For part (12), note that Bdy(S) ∪ Ext(S) = (X \ Int(S) \ Ext(S)) ∪ Ext(S) = X \ Int(S) by part (11). But
by Theorem 14.5.9 (6), $\overline{X \setminus S} = X \setminus \mathrm{Int}(S)$.
Part (17) follows from part (16) and Theorem 14.5.9 parts (2) and (4).
Part (18) follows from Ext(Ext(S)) = Int(X \ Ext(S)), by part (1), which equals Int(S̄) by part (3).
Part (20) follows from $\mathrm{Bdy}(\mathrm{Int}(S)) = \overline{\mathrm{Int}(S)} \setminus \mathrm{Int}(\mathrm{Int}(S))$ by Definition 14.6.4, which equals $\overline{\mathrm{Int}(S)} \setminus \mathrm{Int}(S)$
by Theorem 14.5.10 (3), which is a subset of S̄ \ Int(S) by Theorem 14.5.10 (5), which equals Bdy(S).
For part (21), note that $\mathrm{Bdy}(\overline{S}) = \overline{\overline{S}} \setminus \mathrm{Int}(\overline{S})$ by Definition 14.6.4, which equals S̄ \ Int(S̄) by Theorem
14.5.10 (4), which is a subset of S̄ \ Int(S) by Theorem 14.5.10 (6), which equals Bdy(S).
For part (22), note that Bdy(Ext(S)) = Bdy(X \ S̄), by part (3), which equals Bdy(S̄) by part (9), and this
is a subset of Bdy(S) by part (21).
For part (24), note that $\mathrm{Bdy}(\mathrm{Bdy}(S)) \subseteq \overline{\mathrm{Bdy}(S)}$ by Definition 14.6.4, but $\overline{\mathrm{Bdy}(S)} = \mathrm{Bdy}(S)$ by part (23).
Part (25) follows from part (1), Theorem 14.5.9 (10) and Theorem 5.13.12 (i).
14.6.8 Remark: Parts (10) and (11) of Theorem 14.6.7 imply that the set {Int(S), Bdy(S), Ext(S)} is a
partition of the topological space X for any subset S of X.
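The partition property can be verified by enumeration on a small space. This is a sketch under assumed example data: the space, the topology and the helper names are hypothetical, and "partition" is read here as allowing empty parts, since Int(S), Bdy(S) or Ext(S) may be empty.

```python
from itertools import chain, combinations

def interior(S, opens):
    """Union of all open sets included in S."""
    return frozenset(chain.from_iterable(G for G in opens if G <= S))

def exterior(S, X, opens):
    """Ext(S) = Int(X \\ S), as in Theorem 14.6.7 (1)."""
    return interior(X - S, opens)

def boundary(S, X, opens):
    """Bdy(S) = closure(S) \\ Int(S), with closure(S) = X \\ Ext(S)."""
    return (X - exterior(S, X, opens)) - interior(S, opens)

def is_partition(parts, X):
    """True if the parts are pairwise disjoint and their union is X."""
    disjoint = all(not (A & B) for A, B in combinations(parts, 2))
    return disjoint and frozenset().union(*parts) == X

# Hypothetical example space.
X = frozenset({1, 2, 3})
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(sorted(X), r)]

all_partitions = all(
    is_partition(
        [interior(S, opens), boundary(S, X, opens), exterior(S, X, opens)], X)
    for S in subsets
)
```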
14.6.9 Remark: For any set S in a topological space X, the points in Int(S) are always elements of S (by
Theorem 14.5.9 (2)), and the points in Ext(S) are always elements of X \ S (by Theorem 14.6.7 (5)). But
the points of Bdy(S) may belong to either S or X \ S. (See Figure 14.6.1.)
If the set S is open, all points of Bdy(S) belong to X \ S, whereas if the set S is closed, all points of Bdy(S)
belong to S. Therefore one may refer to the points of Bdy(S) which are elements of X \ S as the “open
portions” of the boundary, and the points of Bdy(S) which are elements of S as the “closed portions” of the
boundary.

Figure 14.6.1 Open and closed portions of boundary of a set S (labels: closed portion, where Bdy(S) ⊆ S; open portion, where Bdy(S) ⊆ X \ S)
14.6.10 Theorem: Let X be a topological space. Then the following propositions are true.
(1) Int(∅) = ∅, Bdy(∅) = ∅ and Ext(∅) = X.
(2) Int(X) = X, Bdy(X) = ∅ and Ext(X) = ∅.
14.6.11 Remark: Notation 14.6.12 gives topology-dependent notations for the exterior and boundary of
sets analogous to Notation 14.5.8 for the interior of sets.
14.6.12 Notation: Ext_T(S) denotes the exterior of a set S with respect to a topological space (X, T ).
Bdy_T(S) denotes the boundary of a set S with respect to a topological space (X, T ).
14.6.13 Remark: Theorem 14.6.14 says that strengthening the topology on a fixed set X makes set ex-
teriors larger and set boundaries smaller. In the extreme case of the strongest topology on X, namely the
discrete topology IP(X), the exterior of S is equal to X \ S and the boundary of S is empty. In the opposite
extreme of the trivial topology T = {∅, X} on X, the exterior of any set S in IP(X) \ {∅, X} equals ∅, and the
boundary of such a set S equals X. (See Remark 14.5.16 for similar comments on the interior and closure
of sets.)
14.6.14 Theorem: Let T1 and T2 be topologies on a set X such that T1 ⊆ T2 . Then:
(1) Ext_{T1}(S) ⊆ Ext_{T2}(S) for all S ∈ IP(X).
(2) Bdy_{T1}(S) ⊇ Bdy_{T2}(S) for all S ∈ IP(X).
Proof: Part (1) follows from Theorem 14.5.15 (1) because Ext_T(S) = Int_T(X \ S) for T = T1 and T2 by
Theorem 14.6.7 (1).
Part (2) follows from Theorem 14.5.15 and Theorem 14.6.7 (8).
14.6.15 Remark: The inequalities in Theorems 14.5.15 and 14.6.14 are illustrated in Figure 14.6.2.
Figure 14.6.2 Relation between topology strength and interior/boundary/exterior of sets (rows: weaker topology T1, stronger topology T2; regions: Int(S), Bdy(S), Ext(S) and S̄ within X)
The decreasing (actually non-increasing) boundary set Bdy(S) for a fixed set S with respect to the strength of
the topology may be thought of as a process of nibbling away at the boundary by the interior Int(S) and the
exterior Ext(S) as more open sets are added to the topology. When the topology is weak, many points have
“undecided status”. That is, they are neither in the interior nor in the exterior. (Figure 14.6.1 in Remark 14.6.9 illustrates
the “undecided status” of points in Bdy(S) which are in S or X \ S, but which are not allocated to either
Int(S) or Ext(S).)
As the topology is strengthened, more and more points are decided as either interior or exterior points. The
extreme cases, the trivial and discrete topologies, are illustrated in Figure 14.6.3, where almost all sets S
in the trivial topology have Bdy(S) = X because no points in X are “decided”, whereas all sets S in the
discrete topology have Bdy(S) = ∅ because all points in X are “decided”.
14.6.16 Remark: It is reasonable to seek additional inequalities resembling Theorems 14.5.15 and 14.6.14.
For example, let S1 , S2 ∈ IP(X) for a topological space X. Then by Theorems 14.5.9 (10), 14.5.9 (11),
and 14.6.7 (25), it follows that if S1 ⊆ S2 , then Int(S1 ) ⊆ Int(S2 ), S̄1 ⊆ S̄2 and Ext(S1 ) ⊇ Ext(S2 ), but there
is no such general inequality relating Bdy(S1 ) to Bdy(S2 ).
There seem to be no such inequalities for a fixed set S in the intersection X1 ∩ X2 of two topological
spaces (X1 , T1 ) and (X2 , T2 ). It is not possible for two different sets X1 and X2 to have the same topology
(because X1 = ⋃T1 and X2 = ⋃T2 ). But a reasonable correspondence between T1 and T2 would be
{Ω ∩ X1 ; Ω ∈ T2 } = {Ω ∩ X2 ; Ω ∈ T1 }. (This means that the relative topologies of T1 and T2 on X1 ∩ X2
are the same. See Definition 14.11.13 for relative topology.) With such a correspondence, Int(S), S̄, Bdy(S)
and Ext(S) are all independent of the choice of topology. So no interesting inequalities seem to be available
for the condition X1 ⊆ X2 .
14.6.17 Theorem: Let X be a discrete topological space. Then the following propositions are true.
(1) ∀S ∈ IP(X), S ∈ Top(X).
(2) ∀S ∈ IP(X), $S \in \overline{\mathrm{Top}}(X)$.
(3) ∀S ∈ IP(X), Int(S) = S.
(4) ∀S ∈ IP(X), Ext(S) = X \ S.
(5) ∀S ∈ IP(X), Bdy(S) = ∅.
(6) ∀S ∈ IP(X), S̄ = S.
Proof: By Definition 14.3.19 for a discrete topological space, Top(X) = IP(X). This implies part (1).
Part (2) follows from part (1) and Definition 14.3.12.
Part (5) follows from parts (3) and (4) and Theorem 14.6.7 (8).
Part (6) follows from part (2) and Theorem 14.5.10 (2). (Or from part (4) and Theorem 14.6.7 (3).)
14.6.18 Theorem: Let X be a trivial topological space. Then the following propositions are true.
(1) ∀S ∈ IP(X), (S ∈ Top(X) ⇔ (S = ∅ ∨ S = X)).
(2) ∀S ∈ IP(X), ($S \in \overline{\mathrm{Top}}(X)$ ⇔ (S = ∅ ∨ S = X)).
(3) ∀S ∈ IP(X) \ {X}, Int(S) = ∅.
(4) ∀S ∈ IP(X) \ {∅}, Ext(S) = ∅.
(5) ∀S ∈ IP(X) \ {∅, X}, Bdy(S) = X.
(6) ∀S ∈ IP(X) \ {∅}, S̄ = X.
Proof: Part (1) is equivalent to Definition 14.3.18 for the trivial topology.
Part (2) follows from part (1) and the definition of closed sets.
For part (3), note that if S ∈ IP(X) \ {X} and Ω ∈ Top(X) and Ω ⊆ S, then Ω = ∅.

Figure 14.6.3 Interior/boundary/exterior of sets including trivial and discrete extremes (trivial topology {∅, X}, the weakest topology: Int(S) = ∅, Bdy(S) = X, Ext(S) = ∅, S̄ = X; discrete topology IP(X), the strongest topology: Int(S) = S, Bdy(S) = ∅, Ext(S) = X \ S, S̄ = S)
14.6.19 Remark: Theorems 14.6.17 and 14.6.18 are illustrated in Figure 14.6.3.
Note that for the trivial topology on a non-empty set X, the label Int(S) = ∅ applies only if S ≠ X; the
label Ext(S) = ∅ applies only if S ≠ ∅; the label Bdy(S) = X applies only if S ∉ {∅, X}; and the label S̄ = X
applies only if S ≠ ∅. (See Remark 14.6.15 for related comments.)
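The two extreme cases in Theorems 14.6.17 and 14.6.18 can be checked directly on a finite set. The following sketch uses a hypothetical three-point set; the helper `parts` and its return order (interior, boundary, exterior, closure) are assumptions made for this illustration.

```python
from itertools import chain, combinations

def interior(S, opens):
    """Union of all open sets included in S."""
    return frozenset(chain.from_iterable(G for G in opens if G <= S))

def parts(S, X, opens):
    """Return (Int(S), Bdy(S), Ext(S), closure(S)) for the given topology."""
    int_S = interior(S, opens)
    ext_S = interior(X - S, opens)      # Ext(S) = Int(X \ S)
    cl_S = X - ext_S                    # closure(S) = X \ Ext(S)
    return int_S, cl_S - int_S, ext_S, cl_S

X = frozenset({1, 2, 3})
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(sorted(X), r)]
empty = frozenset()

trivial = [empty, X]       # weakest topology on X
discrete = subsets         # strongest topology on X

# Discrete topology: every set is its own interior and closure.
discrete_ok = all(
    parts(S, X, discrete) == (S, empty, X - S, S) for S in subsets)

# Trivial topology: every proper non-empty subset is "all boundary".
trivial_ok = all(
    parts(S, X, trivial) == (empty, X, empty, X)
    for S in subsets if S not in (empty, X))
```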
14.6.20 Remark: Figure 14.6.4 illustrates the way in which boundary thickness of a fixed set S in a
topological space X decreases as the topology is strengthened. It is notable that both the weakest (i.e.
trivial) topology and the strongest (i.e. discrete) topology contain no information. The extremes of topology
strength add no information to the set S.
Figure 14.6.4 The influence of topology strength on boundary thickness (from the weakest topology, Bdy(S) = X, through thick and thin boundaries ∂S for weak and strong topologies, to the strongest topology, Bdy(S) = ∅)
14.7. Limit points and isolated points

14.7.1 Definition: A limit point of a set S in a topological space X is a point x ∈ X which satisfies
∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅.
A limit point is also known as an accumulation point or a cluster point.
The limit set of a set S in a topological space X is the set of limit points of S.
14.7.2 Remark: If a point x has only a finite number of neighbourhoods Ω ∈ Top_x(X), the point x can
be a limit point only if there is at least one point y distinct from x which is in the intersection ⋂Top_x(X)
of all neighbourhoods of x. (See Figure 14.7.1.)

Figure 14.7.1 Limit point x of a set S (panels: not a limit point; limit point)

But then G = ⋂Top_x(X) must be an element of Top_x(X) (i.e. an open neighbourhood of x) if the number
of neighbourhoods is finite. It would therefore follow that {x, y} ⊆ G ⊆ Ω for all non-empty Ω ∈ Top_x(X).
This would imply that the topology is extremely weak. Such a topology would not even have the extremely
weak T1 separation property in Definition 15.2.4. Therefore limit points are of real interest only when there
are infinitely many neighbourhoods at each point.
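14.7.3 Remark: The limit set of a set S in a topological space X may be written as
{x ∈ X; ∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅}.
14.7.4 Theorem: A point x is a limit point of a set S in a topological space X if and only if $x \in \overline{S \setminus \{x\}}$.
Proof: The proof follows from Definitions 14.7.1 and 14.5.1. Let x ∈ X. Then
$$\begin{aligned}
x \text{ is a limit point of } S
&\Leftrightarrow \forall \Omega \in \mathrm{Top}_x(X),\ \Omega \cap (S \setminus \{x\}) \ne \emptyset \\
&\Leftrightarrow \neg\,\exists \Omega \in \mathrm{Top}_x(X),\ \Omega \cap (S \setminus \{x\}) = \emptyset \\
&\Leftrightarrow \neg\,\exists \Omega \in \mathrm{Top}_x(X),\ \Omega \subseteq X \setminus (S \setminus \{x\}) \\
&\Leftrightarrow x \notin \mathrm{Int}(X \setminus (S \setminus \{x\})) \\
&\Leftrightarrow x \in \overline{S \setminus \{x\}}.
\end{aligned}$$
The last line follows from Theorem 14.5.11.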
14.7.5 Definition: An isolated point of a set S in a topological space X is a point x ∈ S which is not a
limit point of S.
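Limit points and isolated points can be computed directly from Definitions 14.7.1 and 14.7.5 on a finite space, and compared with the closure criterion of Theorem 14.7.4. This is a sketch only: the example space and the function names are assumptions. (As Remark 14.7.2 notes, a finite space of this kind is necessarily poorly separated.)

```python
from itertools import combinations

def closure(S, X, opens):
    """Intersection of all closed sets which include S."""
    result = frozenset(X)
    for G in opens:
        if S <= X - G:
            result &= X - G
    return result

def limit_points(S, X, opens):
    """Points x whose every open neighbourhood meets S \\ {x}."""
    return frozenset(
        x for x in X
        if all(G & (S - {x}) for G in opens if x in G))

def isolated_points(S, X, opens):
    """Points of S which are not limit points of S."""
    return S - limit_points(S, X, opens)

# Hypothetical example space.
X = frozenset({1, 2, 3})
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(sorted(X), r)]

S = frozenset({1})
limit_set = limit_points(S, X, opens)
isolated = isolated_points(S, X, opens)

# Theorem 14.7.4: x is a limit point of S iff x is in the closure of S \ {x}.
criterion_holds = all(
    (x in limit_points(A, X, opens)) == (x in closure(A - {x}, X, opens))
    for A in subsets for x in X)
```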
[ Show the relations between limit points and the continuity of functions. In some sense, continuous functions
are those which preserve limit points. That is, the limit set of the image of a set by a continuous function
includes the image of the limit set, roughly speaking. Also show the relation between limits of sets and
limits of functions. And also show the exact general relation between boundaries and limit sets. Continuous
functions preserve boundaries. So they should preserve limits also. Boundaries can usually be thought of as
the simultaneous limits of interior and exterior points. Limits of sequences have something to do with this
too. ]
14.8. Some simple topologies on countably infinite sets

14.8.1 Remark: Topology on finite sets is not very useful. It is an interesting exercise to check one’s
understanding of the axioms of topology in finite-set situations which are simple enough to analyze com-
pletely. But the real value of topology is the ability to study limiting processes. Limits of finite sequences
are essentially devoid of interest.
Topology (in the small) arose historically from the study of limits of points and functions. Limits are
the essence of analysis. The word “ana-lysis” itself means breaking up something into tiny bits, from the
Greek word “λύσις” (“loosing, dissolution, separation”) and “ἀνά” (“up, upwards”). Thus “ἀνάλυσις” means
“dissolution” in the sense that a salt may be dissolved or loosened into ions by being immersed in water.
(Related words are “cata-lysis”, “dia-lysis”, “electro-lysis”, “hydro-lysis”, “para-lysis” and “photo-lysis”.)
The English word “solve” comes from the Latin word “solvere” (“to loosen”), which comes from Latin
“se-luo” (reflexive of “luo”). The Latin word “luo” (“loosen”) comes from Greek “λύω” (“to loosen, set free,
release, dissolve, sever, destroy”), which is where the word “λύσις” comes from.
A finite set cannot be broken up into ever tinier bits. The task of topology is to assist the study of limiting
processes when sets become infinitesimal (i.e. arbitrarily small). Consequently, topology on finite sets is purely
recreational and educational.
14.8.2 Remark: The task of a topology is to separate points from each other (and sets from each other).
A stronger topology is better at separating points. A weaker topology has less ability to separate points. (See
Definition 14.3.21 for weaker and stronger topologies.) When a set is finite, the ability to separate points
from each other implies that all singletons are open sets, which implies that the topology is the discrete
topology. (See Theorem 15.2.10.) When a set is infinite, the ability to separate all pairs of singletons does
not result in the topology being discrete. This fact is demonstrated in Example 14.8.3.
14.8.3 Example: Figure 14.8.1 illustrates a topology with imperfect separation on a countably infinite
set X = ℤ⁺₀. In other words, the topology is not the discrete topology. Even though the point x = 0 is
separated from any other point y ∈ X by a set Ωy ∈ Top(X), it is not possible to construct {x} as the
intersection of a finite set of open sets Ωy .
Figure 14.8.1 illustrates the set T = {∅} ∪ {Ωy ; y ∈ ℤ⁺₀} where Ωy = {i ∈ ℤ⁺₀; i = 0 ∨ i > y} for y ∈ ℤ⁺₀.
Figure 14.8.1 Topology with poor separation on a countably infinite set (the points 0, 1, 2, ..., 9 with the nested sets Ω4 ⊇ Ω5 ⊇ Ω6 ⊇ Ω7)
It is easily verified that the range of any non-increasing sequence of elements of IP(X) (with respect to the
set-inclusion partial order) is closed under finite intersection and arbitrary union. So if ∅ and X are added,
the result is a valid topology. It follows that T is a valid topology on X.
Although x = 0 is separated from y by the open set Ωy for all y ∈ ℤ⁺₀ (because 0 ∈ Ωy and y ∉ Ωy ), the
set {0} is clearly not equal to the intersection of any finite number of sets Ωy . So {0} ∉ T although 0 is
separated from all individual elements of X.
The topology T may be extended to include all sets {y} for y ∈ ℤ⁺. Let T′ = {G1 ∪ G2 ; G1 ∈ T and G2 ∈
IP(ℤ⁺)}. (This is the same as T′ = IP(ℤ⁺) ∪ {Ω ∈ IP(ℤ⁺₀); #(ℤ⁺ \ Ω) < ∞}.) Then it is (fairly) clear that
T′ is also a valid topology on X. This larger topology completely separates all points x ∈ ℤ⁺₀ from other
elements y ∈ X \ {x}. The set {x} is in T′ for all x ∈ ℤ⁺, but {0} ∉ T′.
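The reason why no finite intersection of the sets Ωy can equal {0} is that such an intersection is again a set Ωy (for the largest index used), and so it always contains a positive integer. The sketch below represents each Ωy by a membership test, since the sets themselves are infinite; the function names, the sample indices and the finite test windows are assumptions made for illustration.

```python
def in_omega(y, i):
    """Membership test for Omega_y = {i in Z+0; i == 0 or i > y}."""
    return i == 0 or i > y

def in_finite_intersection(ys, i):
    """Membership test for the intersection of Omega_y over the finite list ys."""
    return all(in_omega(y, i) for y in ys)

# Hypothetical finite choice of indices.
ys = [3, 7, 5]

# The intersection contains 0, but it also contains max(ys) + 1,
# so it cannot be equal to the singleton {0}.
witness = max(ys) + 1
contains_zero = in_finite_intersection(ys, 0)
contains_witness = in_finite_intersection(ys, witness)

# On a finite window, the intersection agrees with Omega_y for y = max(ys).
agrees_with_largest = all(
    in_finite_intersection(ys, i) == in_omega(max(ys), i) for i in range(100))

# Each positive y is separated from 0 by Omega_y: 0 is in it, y is not.
separated = all(in_omega(y, 0) and not in_omega(y, y) for y in range(1, 50))
```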
14.8.4 Example: There is now the question of whether infinite sets such as ℤ have interesting topologies
which are invariant under various groups of permutations of the point set.
Suppose a topology T on ℤ is invariant under all translations of ℤ and contains at least one non-empty finite
set. Then T = IP(ℤ). To show this, let Ω ∈ T be a non-empty finite set. Let d = max(Ω) − min(Ω). Then
Ω ∩ (Ω + d) = {max(Ω)} contains exactly one element of ℤ, where Ω + d denotes the translate of Ω by a
distance d. It follows that {x} ∈ T for all x ∈ ℤ. From this, every subset of ℤ can be constructed as a union
of singleton sets.
The set Tk of all subsets of ℤ with period k ∈ ℤ⁺ is a translation-invariant topology on ℤ. These topologies
are simply infinite copies of the discrete topology on ℤ_k . The case k = 1 is the trivial topology.
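The key step of the argument, that a finite set meets its own translate by d = max(Ω) − min(Ω) in exactly one point, is easy to check numerically. In this sketch the sample set and the helper name `translate` are assumptions.

```python
def translate(omega, d):
    """The translate Omega + d of a finite set of integers."""
    return frozenset(i + d for i in omega)

# Hypothetical non-empty finite set of integers.
omega = frozenset({-2, 0, 5})
d = max(omega) - min(omega)

# Only max(omega) lies in both omega and omega + d.
overlap = omega & translate(omega, d)

# Any singleton {x} is then a translate of this one-point intersection.
x = 3
singleton_x = translate(overlap, x - max(omega))
```

In a translation-invariant topology containing Ω, both Ω + d and the intersection are open, so every singleton is open, which is the conclusion T = IP(ℤ) of the example.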
[ Give a general theorem about the consequences of invariance of a topology under all permutations of the
point set. General permutations include all bijections. Maybe also examine a sub-class of permutations
which only permute at most a finite set of points, or a countably infinite set of points, and so forth. ]

14.8.5 Example: The topologies in Example 14.8.4 do not exhaust all possibilities for translation-invariant
topologies on ℤ. A topology can be defined on X = ℤ by
Top(X) = {∅} ∪ {Ω ∈ IP(X); #(X \ Ω) < ∞}. (14.8.1)
This set of subsets of X = ℤ is closed under finite intersection because ℤ \ (Ω1 ∩ Ω2 ) = (ℤ \ Ω1 ) ∪ (ℤ \ Ω2 ),
which is a finite set if Ω1 , Ω2 ∈ Top(ℤ). Closure under union is equally clear. This topology is invariant
under all permutations of the point set ℤ.
The set defined by (14.8.1) is a valid topology for any set X, and it is always permutation-invariant. On
the downside, it is not very useful in applications. (However, Remark 15.2.15 does use this topology as a
borderline example.)
[ It would be interesting to apply the style of analysis in this section to uniform or translation-invariant
topologies on ℤ, IR and IR^n for n ∈ ℤ⁺₀. ]
14.8.6 Definition: The trivial closed-point topology on a set X is the set
T = {∅} ∪ {Ω ∈ IP(X); #(X \ Ω) < ∞}. (14.8.2)
A trivial closed-point topological space is a topological space (X, T ) such that T is the trivial closed-point
topology on X.
14.8.7 Remark: Definition 14.8.6 is almost certainly non-standard. The “trivial closed-point topology”
on a set X is the smallest topology on X for which singletons {x} are closed sets for all x ∈ X. (This
property of a topology is defined as the T1 separation property in Section 15.2. So a less clumsy name for
this concept would be a “trivial T1 topology”.)
14.8.8 Theorem: The trivial closed-point topology on a set X is a topology on X.
Proof: Let X be a set. Define T as in equation (14.8.2). Then clearly T ⊆ IP(X) and {∅, X} ⊆ T . So
Definition 14.3.3 (i) is satisfied.
Let Ω1 , Ω2 ∈ T . If Ω1 = ∅ or Ω2 = ∅, then Ω1 ∩ Ω2 = ∅ ∈ T . So suppose that Ω1 ≠ ∅ and Ω2 ≠ ∅. Then
Ω1 , Ω2 ∈ IP(X), and #(X \ Ω1 ) < ∞ and #(X \ Ω2 ) < ∞. But X \ (Ω1 ∩ Ω2 ) = (X \ Ω1 ) ∪ (X \ Ω2 ) by
Theorem 5.13.11. So #(X \ (Ω1 ∩ Ω2 )) < ∞ and therefore Ω1 ∩ Ω2 ∈ T , which satisfies Definition 14.3.3 (ii).
Let C ⊆ T and let Ω = ⋃C. If Ω = ∅ then Ω ∈ T . So suppose that Ω ≠ ∅. Then ∃G ∈ C, G ≠ ∅. Therefore
∃G ∈ C, (G ⊆ X ∧ #(X \ G) < ∞). But by Theorem 5.14.7 (xi), G ∈ C implies that G ⊆ ⋃C. Therefore
∃G ∈ C, (G ⊆ Ω ∧ #(X \ G) < ∞). Hence #(X \ Ω) ≤ #(X \ G) < ∞. So ⋃C ∈ T , which satisfies
Definition 14.3.3 (iii).
14.8.9 Theorem: The trivial closed-point topology on a set X is the smallest topology on X for which
{x} is a closed set for all x ∈ X.
Proof: To show that {x} is closed for all x ∈ X for the trivial closed-point topology on the set X, let
Ω = X \ {x} and note that #(X \ Ω) = 1 < ∞.
Let T be the trivial closed-point topology on a set X. To show that T ⊆ T′ for all topologies T′ on X such
that {x} is a closed subset of X with respect to T′ for all x ∈ X, let T′ be such a topology. Let Ω ∈ T .
If Ω = ∅, then Ω ∈ T′ because T′ is a topology. So let Ω be a subset of X such that #(X \ Ω) < ∞.
Let C = {X \ {x}; x ∈ X \ Ω}. Then #(C) < ∞ and C ⊆ T′ because every singleton {x} is closed
with respect to T′. If #(C) = 0 then Ω = X, and so Ω ∈ T′ because T′ is a topology on X. So assume
that 1 ≤ #(C) < ∞. Then ⋂C ∈ T′ because T′ is a topology. Thus Ω = ⋂C ∈ T′. Hence T ⊆ T′. That
is, T is smaller than (or equal to) every topology on X for which {x} is a closed set for all x ∈ X.
14.8.10 Remark: Let X be a topological space with the trivial closed-point topology. If X is a finite set,
then Top(X) = IP(X). In other words, the topology is the same as the discrete topology in Definition 14.3.19.
So for any subset S of X, the interior, boundary and exterior are as follows.

    Int(S)    Bdy(S)    Ext(S)
    S         ∅         X \ S

If the set X is infinite, the interior, boundary and exterior are as follows.

    cardinality of S            Int(S)    Bdy(S)    Ext(S)
    #(S) < ∞                    ∅         S         X \ S
    #(S) = ∞, #(X \ S) = ∞      ∅         X         ∅
    #(X \ S) < ∞                S         X \ S     ∅

These tables suggest that the trivial closed-point topologies are rather uninteresting. When X is finite,
there are no boundary points for any set, and the interior and exterior are simply S and X \ S for any
set S ∈ IP(X). So the topology gives no information about a set S other than its set of elements.
When X is infinite, any set S for which #(S) = ∞ and #(X \ S) = ∞ has Bdy(S) = X. In other words,
such sets have no interior and no exterior. So the topology says very little of interest about sets S, other
than whether they (and their complements) are finite or infinite.
It can be hoped that this simple class of topologies may be of some use in providing pathological examples
to disprove false conjectures. In particular, various trivial classes of topologies show that Definition 14.3.3
for a general topology is perhaps overly broad. This motivates the introduction of the classes of topologies
in Chapter 15, which add various sets of extra axioms to topologies to make them more likely to be useful.
Sections 14.9 and 14.11 introduce methods of generating topologies which are more interesting than the
trivial closed-point topologies because the interior, boundary and exterior of sets can be made to contain
much more information when topologies are built up in more sophisticated ways.
14.9. Generation of topologies from collections of sets

14.9.1 Theorem: Let T be a non-empty set of topologies on a set X. Then ⋂T is a topology on X.
14.9.2 Theorem: Let (Ti )i∈I be a non-empty family of topologies on a set X. Then T = ⋂_{i∈I} Ti is a
topology on X.
14.9.3 Remark: On any set X, a topology may be generated on X from any given subset S of IP(X).
(Note that two distinct subsets S1 -= S2 of IP(X) may generate the same topology on X.) This is a very
general and useful procedure for specifying topologies on sets.
14.9.4 Definition: The topology generated by S on X, for any sets X and S such that S ⊆ IP(X), is the
intersection of all topologies T on X such that S ⊆ T .
14.9.5 Theorem: Let X and S be sets which satisfy S ⊆ IP(X). Then the topology generated by S on X
is a valid topology on X.
Proof: Let 𝒯 = {T ∈ IP(IP(X)); S ⊆ T and T is a topology on X}, where X and S are sets which
satisfy S ⊆ IP(X). Then the topology generated by S on X is equal to ⋂𝒯. The set 𝒯 is non-empty because
IP(X) ∈ 𝒯 for any set X. (See Definition 14.3.19 for the discrete topology IP(X) on X.) So ⋂𝒯 is a valid
topology on X by Theorem 14.9.1.
14.9.6 Remark: In the special case that S = ∅, the topology generated by S on any set X is the trivial
topology {∅, X} on X. This is still true under the slightly weaker condition that S ⊆ {∅, X}.
14.9.7 Remark: Theorem 14.9.8 shows a method of constructing the topology generated by a set-collection
S on a set X. The condition {∅, X} ⊆ S ensures that the constructed set T is a valid topology on X.
The construction for the set T in Theorem 14.9.8 may be combined in a single line as follows:
    T = { ⋃Q; ∀U ∈ Q, ∃C ∈ IP(S), (U = ⋂C and 1 ≤ #(C) < ∞) }.
14.9.8 Theorem: Let {∅, X} ⊆ S ⊆ IP(X) for sets X and S. Define
    T′ = { ⋂C; C ∈ IP(S) and 1 ≤ #(C) < ∞ }
and
    T = { ⋃Q; Q ∈ IP(T′) }.
Then T is the topology generated by S on X.
Proof: First show that T is a topology according to Definition 14.3.3. To show that ∅ ∈ T, let Q = ∅.
Then ∅ ⊆ T′ (by Theorem 5.8.8). So ∅ = ⋃Q ∈ T. To show that X ∈ T, let C = {X} ⊆ S. Then
⋂C = X ∈ T′. So Q = {X} ⊆ T′. Therefore X = ⋃Q ∈ T. This establishes Definition 14.3.3 (i).
Let A1, A2 ∈ T. Then Ai = ⋃Qi for some Qi ⊆ T′ for i = 1, 2. Hence A1 ∩ A2 = (⋃Q1) ∩ (⋃Q2) =
⋃{U1 ∩ U2; U1 ∈ Q1, U2 ∈ Q2} (by Theorem 5.14.7 (v)). For i = 1, 2, Ui ∈ Qi implies that Ui ∈ T′
and so Ui = ⋂Ci for some collection Ci ⊆ S with 1 ≤ #(Ci) < ∞. Therefore U1 ∩ U2 = (⋂C1) ∩ (⋂C2) =
⋂(C1 ∪ C2) (by Theorem 5.14.7 (x)). Then C1 ∪ C2 ⊆ S and 1 ≤ #(C1 ∪ C2) < ∞. So ⋂(C1 ∪ C2) ∈ T′.
That is, U1 ∩ U2 ∈ T′. Therefore A1 ∩ A2 ∈ T. This proves part (ii) of Definition 14.3.3.
The closure of T under arbitrary unions is guaranteed by Theorem 5.15.4. So T satisfies all of the conditions
of Definition 14.3.3 for a topology.
To prove that T is the topology generated by S on X, show first that S ⊆ T. Note that S ⊆ T′. (This is
because C = {U} ⊆ S and #(C) = 1 for all U ∈ S. So U = ⋂C ∈ T′.) Similarly, T′ ⊆ T. (This is because
Q = {V} ⊆ T′ for all V ∈ T′. So V = ⋃Q ∈ T.) It follows that S ⊆ T.
Let T̄ be a topology on X which satisfies S ⊆ T̄. Then T′ ⊆ T̄ because T̄ is closed under finite intersection
and T′ is the closure of S under finite intersection. Similarly, T ⊆ T̄ because T̄ is closed under arbitrary
unions and T is the closure of T′ under arbitrary unions. Therefore T is included in the intersection of
all topologies T̄ on X which include S. Since T is itself such a topology, it follows that T is equal to the
intersection of all such topologies. Therefore T satisfies Definition 14.9.4 for the topology generated by S
on X.
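The two-step construction in Theorem 14.9.8 can be carried out mechanically when X is finite. The following Python sketch is an illustration under that assumption. The function names are hypothetical, and the sets ∅ and X are added to S automatically so that the condition {∅, X} ⊆ S holds.

```python
from itertools import combinations


def finite_intersections(S):
    """Intersections of all non-empty finite subcollections C of S."""
    S = [frozenset(s) for s in S]
    out = set()
    for r in range(1, len(S) + 1):
        for C in combinations(S, r):
            out.add(frozenset.intersection(*C))
    return out


def arbitrary_unions(collection):
    """Unions of all subcollections Q; the empty Q contributes the empty set."""
    collection = list(collection)
    out = set()
    for r in range(len(collection) + 1):
        for Q in combinations(collection, r):
            out.add(frozenset().union(*Q))
    return out


def generated_topology(X, S):
    """Finite intersections first, then arbitrary unions."""
    S = {frozenset(s) for s in S} | {frozenset(), frozenset(X)}
    return arbitrary_unions(finite_intersections(S))


X = {1, 2, 3, 4}
T = generated_topology(X, [{1, 2}, {2, 3}])
print(sorted(sorted(G) for G in T))
```

In this example the set {2} appears only at the intersection step and the set {1, 2, 3} only at the union step, which shows that both steps are needed.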
[ Try to show a reverse of Theorem 14.9.8, namely that the set of all finite intersections of arbitrary unions of
a set of sets is a topology. If this is not true, investigate why not. If it is true, this may need some support
from a theorem which looks like Theorem 5.15.4. ]
14.9.9 Remark: The proof of Theorem 14.9.8 does not use the axiom of choice because it was carefully
avoided in the proof of the closure of T under arbitrary unions in Theorem 5.15.4. Avoiding the axiom of
choice in topology is difficult because topology deals with such general sets and collections of sets. Measure
theory is another subject which tempts one to use the axiom of choice because of the enormous generality
of the sets.
14.9.10 Remark: It is a useful exercise to verify theorems in topology for trivial cases. See Exercise 46.7.3
for verification of Theorem 14.9.8 for X = ∅.
14.9.11 Remark: The topology generated on a set X by a set S of subsets of X is the unique topology
on X which includes S and is weaker than all other topologies on X which include S.
14.9.12 Remark: Theorem 14.9.13 is a version of Theorem 14.9.8 which assumes that the set S is closed
under finite intersections. Then only half of the construction work is required to build the topology generated
by S on X. To facilitate comparisons, the set S is denoted as T′. In fact, the proof of Theorem 14.9.8 could
have been shortened by first proving Theorem 14.9.13 and then applying it to the set T′ in Theorem 14.9.8.
14.9.13 Theorem: Let X and T′ be sets such that {∅, X} ⊆ T′ ⊆ IP(X) and T′ is closed under finite
intersections. Define
    T = { ⋃Q; Q ∈ IP(T′) }.
Then T is the topology generated by T′ on X.
Proof: To show that ∅ ∈ T, let Q = ∅ ∈ IP(T′). Then ∅ = ⋃Q ∈ T. To show that X ∈ T, let
Q = {X} ∈ IP(T′). Then X = ⋃Q ∈ T. So {∅, X} ⊆ T.
To show that T is closed under finite intersections, let A1, A2 ∈ T. Then Ai = ⋃Qi for some Qi ⊆ T′
for i = 1, 2. Hence A1 ∩ A2 = (⋃Q1) ∩ (⋃Q2) = ⋃Q with Q = {U1 ∩ U2; U1 ∈ Q1, U2 ∈ Q2} (by
Theorem 5.14.7 (v)). For i = 1, 2, Ui ∈ Qi implies that Ui ∈ T′ and so U1 ∩ U2 ∈ T′ by the closure of T′
under finite intersections. So Q ∈ IP(T′). Therefore ⋃Q ∈ T. That is, A1 ∩ A2 ∈ T. Hence T is closed
under finite intersections.
The closure of T under arbitrary unions is guaranteed by Theorem 5.15.4. So T satisfies all of the conditions
of Definition 14.3.3 for a topology on X.
To show that T is the topology generated by T′ on X, first show that T′ ⊆ T. Let U ∈ T′. Then {U} ∈ IP(T′).
So U = ⋃{U} ∈ T. Hence T′ ⊆ T.
Let T̄ be a topology on X which satisfies T′ ⊆ T̄. Then T ⊆ T̄ because T̄ is closed under arbitrary unions
and T is the closure of T′ under arbitrary unions. Therefore T is included in the intersection of all topologies
T̄ on X which include T′. Since T is itself such a topology, it follows that T is equal to the intersection of
all such topologies. Therefore T satisfies Definition 14.9.4 for the topology generated by T′ on X.
14.10. The standard topology for the real numbers

[ Maybe also present other “standard topologies” in this section. For example, standard topologies for the
integers, rational numbers and complex numbers, and products of these spaces. Could also present some
basic properties of the standard topologies in this section. ]
14.10.1 Definition: The usual topology for the real numbers is the topology generated by the set of all
real open intervals.
14.10.2 Remark: Definition 14.10.1 is not as circular as it looks. In Definition 8.3.10, an “open interval”
is defined as a set of the form (a, b), where a, b ∈ IR with a ≤ b. It turns out that open intervals are indeed
open sets in the usual topology of IR as one would expect.
14.10.3 Definition: The usual topology for IR^n for n ∈ ℤ⁺ is the topology generated by the set of all
products of real open intervals.
14.10.4 Remark: The products of real open intervals in Definition 14.10.3 are sets of the form ∏_{i=1}^n (ai, bi),
where (ai)_{i=1}^n, (bi)_{i=1}^n ∈ IR^n are sequences of real numbers such that ai ≤ bi for all i = 1 . . . n. The usual
topology on IR^n is the same as the product topology for the set product IR^n = ∏_{i=1}^n IR. (See Definition
15.1.1 for general product topologies.)
14.10.5 Remark: Topological properties of real number intervals are presented in Section 15.8.
[ Also present the standard topology on the complex numbers here. ]
14.11. Open bases and open subbases
14.11.1 Remark: In practice, people do not specify all of the open sets in a topology. A topology is
most conveniently generated from an open base or open subbase. This is similar to the way a linear space
is generated from a basis. Many operations on linear spaces can be specified for a basis, from which the
operations on the whole space follow. In the same way, many definitions and calculations for topological
spaces may be specified for an open base or open subbase, from which the corresponding operations follow
for the full topology.
14.11.2 Definition: An open base for a topological space (X, T) is a set S ⊆ IP(X) such that
T = { ⋃D; D ⊆ S }.
14.11.3 Definition: An open subbase for a topological space (X, T) is a subset S of IP(X) such that the
set { ⋂C; C ⊆ S and 1 ≤ #(C) < ∞ } of finite intersections of sets in S is an open base for (X, T).
14.11.4 Theorem: Let X and S be sets satisfying {∅, X} ⊆ S ⊆ IP(X). Let T be a topology on X. Then
S is an open subbase for the topological space (X, T ) if and only if T is the topology generated by S on X.
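The difference between an open base and an open subbase is easy to test on a finite example. The following Python sketch is illustrative only. The function names and the example sets are assumptions, not notation from the text.

```python
from itertools import combinations


def unions_of(S):
    """All unions of subcollections of S, including the empty union."""
    S = [frozenset(s) for s in S]
    return {frozenset().union(*D) for r in range(len(S) + 1) for D in combinations(S, r)}


def finite_intersections_of(S):
    """All intersections of non-empty finite subcollections of S."""
    S = [frozenset(s) for s in S]
    return {frozenset.intersection(*C) for r in range(1, len(S) + 1) for C in combinations(S, r)}


def is_open_base(S, T):
    """S is an open base for T if the unions of subcollections of S are exactly T."""
    return unions_of(S) == {frozenset(G) for G in T}


def is_open_subbase(S, T):
    """S is an open subbase for T if its finite intersections form an open base."""
    return is_open_base(finite_intersections_of(S), T)


X = frozenset({1, 2, 3, 4})
T = {frozenset(s) for s in [(), (2,), (1, 2), (2, 3), (1, 2, 3), (1, 2, 3, 4)]}
base = [X, {2}, {1, 2}, {2, 3}]
subbase = [set(), X, {1, 2}, {2, 3}]

print(is_open_base(base, T), is_open_base(subbase, T), is_open_subbase(subbase, T))
```

Here the collection `subbase` fails to be an open base because the open set {2} is not a union of its members, but it is an open subbase because {2} = {1, 2} ∩ {2, 3}.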
14.11.5 Theorem: Let (Y, TY ) be a topological space and let f : X → Y be any function from X to Y .
Then TX = {f −1 (Ω); Ω ∈ TY } is a topology for X.
Proof: ∅, X ∈ TX since ∅ = f⁻¹(∅) and X = f⁻¹(Y). If G1, G2 ∈ TX, then G1 = f⁻¹(Ω1) and
G2 = f⁻¹(Ω2) for some Ω1, Ω2 ∈ TY. So by Theorem 6.6.4 (iv), G1 ∩ G2 = f⁻¹(Ω1 ∩ Ω2) ∈ TX. Let (Gi)i∈I
be a family of sets Gi ∈ TX such that Gi = f⁻¹(Ωi) for some sets Ωi ∈ TY. Then by Theorem 6.6.8 (iii),
⋃_{i∈I} Gi = f⁻¹(⋃_{i∈I} Ωi) ∈ TX. So TX satisfies the conditions of Definition 14.3.3 for a topology.

14.11.6 Remark: Theorem 14.11.5 does not have a forward analogue which states that (Y, TY) is a topological
space with TY = {f(Ω); Ω ∈ TX} for a topological space (X, TX) with f : X → Y. The reason the inverse
map f −1 works so well is that an inverse function is always one-to-one and onto, which causes the set map
f −1 : IP(Y ) → IP(X) to send intersections to intersections and unions to unions. (See Theorems 6.6.3, 6.6.4
and 6.6.8.)
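The contrast between inverse images and forward images can be seen on a four-point example. The following Python sketch is an illustration only. The map f, the two topologies and the function names are all assumptions chosen for this example.

```python
def preimage(f, X, W):
    """Inverse image of W under the map f, which is given as a dict on X."""
    return frozenset(x for x in X if f[x] in W)


def image(f, G):
    """Forward image of G under f."""
    return frozenset(f[x] for x in G)


def is_topology(X, T):
    """Topology axioms for a finite collection T of subsets of X."""
    X = frozenset(X)
    if frozenset() not in T or X not in T:
        return False
    return all((a | b) in T and (a & b) in T for a in T for b in T)


X = frozenset({1, 2, 3, 4})
Y = frozenset("abc")
f = {1: "a", 2: "b", 3: "b", 4: "c"}  # surjective but not injective

# Inverse images of a topology on Y always form a topology on X.
TY = {frozenset(), frozenset("a"), frozenset("ab"), Y}
pullback = {preimage(f, X, W) for W in TY}

# Forward images of a topology on X need not form a topology on Y.
TX = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}
pushforward = {image(f, G) for G in TX}

print(is_topology(X, pullback), is_topology(Y, pushforward))
```

In this example the forward images {a, b} and {b, c} have intersection {b}, which is not the image of any set in TX, so closure under intersection fails even though f is surjective.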
14.11.7 Remark: Theorem 14.11.5 is generalized in Theorem 14.11.8 to an arbitrary family of functions
and target topologies.
14.11.8 Theorem: Let (Yi, Ti)i∈I be a family of topological spaces for some non-empty index set I. Let
X be a set and let fi : X → Yi be a function for all i ∈ I. Define
    T′_X = { ⋂_{j∈J} fj⁻¹(Ωj); J ⊆ I, 1 ≤ #(J) < ∞, and ∀j ∈ J, Ωj ∈ Tj }
and
    TX = { ⋃S; S ⊆ T′_X }.
Then TX is a topology for X.
Proof: ∅, X ∈ TX because ∅ = ⋃∅ and X = fi⁻¹(Yi) for any i ∈ I. Let S = ⋃_{i∈I} {fi⁻¹(Ωi); Ωi ∈ Ti} =
{fi⁻¹(Ωi); i ∈ I, Ωi ∈ Ti} and S′ = { ⋂C; C ⊆ S, 1 ≤ #(C) < ∞ }. Clearly T′_X ⊆ S′. Since any element
of S′ is a finite intersection of sets fi⁻¹(Ωi), there must be only a finite number of these sets for each i ∈ I.
The intersection ⋂_{k∈K} fi⁻¹(Ωi,k) is equal to fi⁻¹(Ωi) with Ωi = ⋂_{k∈K} Ωi,k. So T′_X = S′. Therefore TX is a
topology on X by Theorem 14.9.8.
14.11.9 Definition: The weak topology on a set X generated by a non-empty family of maps (fi)i∈I and
topological spaces (Xi, Ti)i∈I is the set TX = { ⋃_{i∈I} fi⁻¹(Ωi); ∀i ∈ I, Ωi ∈ Ti }.
14.11.10 Remark: Theorem 14.11.8 shows that the set TX in Definition 14.11.9 is a topology on X. It
is also true that any topology on X which contains all of the sets fi−1 (Ωi ) must include the weak topology
on X.
[ Should also define the strong topology on the range of a function or family of functions. This would be
useful for Definition 24.4.3. ]
[ Do a version of Theorem 14.11.8 for fi partially defined on X and X = ⋃_i Dom(fi). ]
14.11.11 Remark: Theorem 14.11.15 is referred to in Remark 27.8.10. It is applicable to the construction
of a topology on a set from an atlas.
14.11.12 Definition: An open cover of a subset S of a topological space X −< (X, T) is a set C ⊆ T such
that S ⊆ ⋃C.
A finite open cover of a set S is an open cover C of S such that #(C) < ∞.
14.11.13 Definition: The relative topology on a subset S of a topological space (X, T ) is the set of
intersections of sets of T with S. Thus Top(S) = {G ∩ S; G ∈ T }.
14.11.14 Remark: The fact that {G ∩ S; G ∈ T } is a topology for S in Definition 14.11.13 follows easily
from the distributivity of set intersection with respect to set union.
14.11.15 Theorem: Let (X, T) be a topological space and let C be an open cover of X. Then for any
set S ⊆ X, S ∈ T if and only if Ω ∩ S ∈ T for all Ω ∈ C. Hence T = ⋃_{Ω∈C} TΩ, where TΩ denotes the relative
topology on Ω.
Proof: Suppose S ∈ T. Then Ω ∩ S ∈ T for all Ω ∈ C by the closure of T under finite intersection.
Suppose Ω ∩ S ∈ T for all Ω ∈ C. Then S = X ∩ S = (⋃C) ∩ S = ⋃_{Ω∈C} (Ω ∩ S) ∈ T by the closure of T
under union. Since Ω ∩ S ∈ T if and only if Ω ∩ S ∈ TΩ, it follows that T = ⋃_{Ω∈C} TΩ.
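The first assertion of Theorem 14.11.15, that openness can be tested patch by patch on an open cover, can be checked exhaustively on a small space. The following Python sketch is illustrative only. The topology, the cover and the function names are assumptions made for this example.

```python
from itertools import combinations


def powerset(xs):
    """Return all subsets of xs as a list of frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def relative_topology(T, S):
    """The sets G ∩ S for G in T, as in Definition 14.11.13."""
    S = frozenset(S)
    return {frozenset(G) & S for G in T}


def open_on_every_patch(S, cover, T):
    """True if Ω ∩ S is open for every Ω in the cover."""
    return all((Omega & S) in T for Omega in cover)


X = frozenset({1, 2, 3, 4})
T = {frozenset(s) for s in [(), (2,), (1, 2), (2, 3, 4), (1, 2, 3, 4)]}
cover = [frozenset({1, 2}), frozenset({2, 3, 4})]  # open sets whose union is X

agree = all((S in T) == open_on_every_patch(S, cover, T) for S in powerset(X))
print(agree)
print(sorted(sorted(G) for G in relative_topology(T, {1, 2})))
```

The check runs over all sixteen subsets of X, so it tests both directions of the equivalence for this particular cover.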

14.12. Continuous functions

14.12.1 Remark: According to Bynum et alia [192], page 14, continuity of functions was first defined by
Cauchy in 1821. Definition 14.12.2 is the standard modern definition for continuous functions.
14.12.2 Definition: A continuous function from a topological space X to a topological space Y is a

function f : X → Y such that ∀Ω ∈ Top(Y ), f −1 (Ω) ∈ Top(X).
14.12.3 Theorem: All constant functions are continuous.
Proof: Let f : X → Y for topological spaces X and Y. Then f is constant if and only if ∃a ∈ Y, ∀x ∈ X,
f(x) = a. Let a ∈ Y satisfy ∀x ∈ X, f(x) = a. Let Ω ∈ Top(Y). Then either a ∈ Ω or a ∉ Ω. So either
f⁻¹(Ω) = X or f⁻¹(Ω) = ∅. In either case, f⁻¹(Ω) ∈ Top(X) by Definition 14.3.3 (i).
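On finite spaces, Definition 14.12.2 is a finite check over the open sets of the target space. The following Python sketch is an illustration only. The spaces, the three maps and the function names are assumptions made for this example.

```python
def preimage(f, X, W):
    """Inverse image of W under the map f, which is given as a dict on X."""
    return frozenset(x for x in X if f[x] in W)


def is_continuous(f, X, TX, TY):
    """True if the inverse image of every set in TY belongs to TX."""
    return all(preimage(f, X, W) in TX for W in TY)


X = frozenset({1, 2, 3})
TX = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
Y = frozenset("ab")
TY = {frozenset(), frozenset("a"), Y}

constant = {x: "a" for x in X}     # inverse images are only the empty set and X
good = {1: "a", 2: "a", 3: "b"}    # the inverse image of {a} is {1, 2}
bad = {1: "b", 2: "a", 3: "a"}     # the inverse image of {a} is {2, 3}

print(is_continuous(constant, X, TX, TY),
      is_continuous(good, X, TX, TY),
      is_continuous(bad, X, TX, TY))
```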
14.12.4 Theorem: Let X be a set, Y be a topological space, and f : X → Y . Then {f −1 (Ω); Ω ∈ Top(Y )}
is a topology on X.
Proof: Let T = {f −1 (Ω); Ω ∈ Top(Y )} for a function f : X → Y for a topological space Y . Then
{∅, X} ⊆ T because ∅ = f −1 (∅) and X = f −1 (Y ), since {∅, Y } ⊆ Top(Y ). So T satisfies Definition 14.3.3 (i)
for a topology on X.
Let S1 , S2 ∈ T . Then S1 = f −1 (Ω1 ) and S2 = f −1 (Ω2 ) for some Ω1 , Ω2 ∈ Top(Y ). But Ω1 ∩ Ω2 ∈ Top(Y )
by Definition 14.3.3 (ii). So by Theorem 6.6.4 (iv), S1 ∩ S2 = f −1 (Ω1 ) ∩ f −1 (Ω2 ) = f −1 (Ω1 ∩ Ω2 ) ∈ T . So T
satisfies Definition 14.3.3 (ii).
Let C ⊆ T. Then ∀S ∈ C, ∃ΩS ∈ Top(Y), S = f⁻¹(ΩS). But ⋃_{S∈C} ΩS ∈ Top(Y) by Definition 14.3.3 (iii). So by
Theorem 6.6.8 (iii), ⋃C = ⋃{f⁻¹(ΩS); S ∈ C} = f⁻¹(⋃_{S∈C} ΩS) ∈ T. So T satisfies Definition 14.3.3 (iii).
Hence T is a topology on X.
14.12.5 Definition: The topology induced on a set X by a function f : X → Y from a topological space Y
is the topology {f −1 (Ω); Ω ∈ Top(Y )} on X.
14.12.6 Remark: The inverse-image topology in Theorem 14.12.4 is given a name in Definition 14.12.5.
This is quite possibly not a standard name for this topology.
If f : X → Y is a constant function, the topology induced by f on X is the trivial topology. Since the
trivial topology on a set X is weaker than any other topology on X, it follows from Theorem 14.12.7 that
all constant functions are continuous, as already stated in Theorem 14.12.3.
14.12.7 Theorem: For topological spaces X and Y , let f : X → Y . Then f is continuous if and only if
the topology {f −1 (Ω); Ω ∈ Top(Y )} induced by f on X is weaker (i.e. not stronger) than Top(X).
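Theorem 14.12.7 can be checked exhaustively for all maps between two small spaces. The following Python sketch is illustrative only. The spaces and the function names are assumptions, and "weaker" is modelled as set inclusion of the collections of open sets.

```python
from itertools import product


def preimage(f, X, W):
    """Inverse image of W under the map f, which is given as a dict on X."""
    return frozenset(x for x in X if f[x] in W)


def induced_topology(f, X, TY):
    """The collection of inverse images of the sets in TY."""
    return {preimage(f, X, W) for W in TY}


def is_continuous(f, X, TX, TY):
    """True if the inverse image of every set in TY belongs to TX."""
    return all(preimage(f, X, W) in TX for W in TY)


X = [1, 2, 3]
TX = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset(X)}
Y = ["a", "b"]
TY = {frozenset(), frozenset("a"), frozenset(Y)}

# Enumerate all eight maps from X to Y.
all_maps = [dict(zip(X, values)) for values in product(Y, repeat=len(X))]

agree = all(
    is_continuous(f, X, TX, TY) == (induced_topology(f, X, TY) <= TX)
    for f in all_maps
)
n_continuous = sum(is_continuous(f, X, TX, TY) for f in all_maps)
print(agree, n_continuous)
```

For these two spaces, a map is continuous exactly when the inverse image of {a} is one of the four open sets of X.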
14.12.8 Remark: Theorem 14.12.9 is a specialization of Theorem 14.12.11 to injective functions. However,
for the purposes of proof discovery, it is helpful to first prove the injective version before the general version.
14.12.9 Theorem: For topological spaces X and Y , let f : X → Y be injective. Then the following
statements are equivalent.
(1) f is continuous.
(2) ∀S ∈ IP(X), f(Int(S)) ⊇ Int(f(S)).
(3) ∀S ∈ IP(X), f(Ext(S)) ⊇ (Int(f(X)) ∩ Ext(f(S))).
(4) ∀S ∈ IP(X), f (Bdy(S)) ⊆ Bdy(f (S)).
Proof: To show that part (1) implies part (2), let f : X → Y be an injective, continuous function,
and let S ∈ IP(X). Then Int(f (S)) ⊆ f (S) by Theorem 14.5.9 (2). So f −1 (Int(f (S))) ⊆ f −1 (f (S))
by Theorem 6.6.4 (ii). But f −1 (f (S)) = S because f is injective. So f −1 (Int(f (S))) ⊆ S. Therefore
Int(f −1 (Int(f (S)))) ⊆ Int(S) by Theorem 14.5.9 (10). But Int(f −1 (Int(f (S)))) = f −1 (Int(f (S))) since
f −1 (Int(f (S))) ∈ Top(X) because f is continuous and Int(f (S)) ∈ Top(Y ). Therefore f −1 (Int(f (S))) ⊆
Int(S). By Theorem 6.6.3 (ii), it then follows that f (f −1 (Int(f (S)))) ⊆ f (Int(S)). So Int(f (S)) ⊆ f (Int(S))
by Theorem 6.6.6 (i).

To show that part (2) implies part (1), let f : X → Y be injective and suppose that f(Int(S)) ⊇ Int(f(S))
for all S ∈ IP(X). Let S′ ∈ Top(Y). Then f⁻¹(S′) ∈ IP(X). So f(Int(f⁻¹(S′))) ⊇ Int(f(f⁻¹(S′))). But
Int(f(f⁻¹(S′))) = Int(S′) by Theorem 6.6.6 (i), and Int(S′) = S′ because S′ ∈ Top(Y). So f(Int(f⁻¹(S′))) ⊇
S′. Therefore Int(f⁻¹(S′)) = f⁻¹(f(Int(f⁻¹(S′)))) ⊇ f⁻¹(S′) by the injectivity of f and Theorem 6.6.4 (ii).
Therefore Int(f⁻¹(S′)) = f⁻¹(S′) by the ZF axiom of extension (Definition 5.1.26 (1)). So f⁻¹(S′) ∈ Top(X).
Hence f is continuous.
To show that part (2) implies part (3), note that Int(f (X)) ∩ Ext(f (S)) ⊆ Int(f (X) ∩ (Y \ f (S))) by
Theorem 14.6.7 (1) and Theorem 14.5.10 (8). This equals Int(f (X) \ f (S)) because f (X) ⊆ Y . This equals
Int(f (X \ S)) because f is injective. Then by part (2), Int(f (X \ S)) ⊆ f (Int(X \ S)) = f (Ext(S)). So
Int(f (X)) ∩ Ext(f (S)) ⊆ f (Ext(S)) as claimed.
(To be continued . . . )
To show the equivalence of parts (2) and (4), note that Int(S) = S \ Bdy(S) by Theorem 14.6.7 (14). So
f (Int(S)) = f (S) \ f (Bdy(S)). Therefore for a fixed set S ∈ IP(X) . . .
14.12.10 Remark: To show that the injectivity condition in Theorem 14.12.9 cannot be discarded, consider
Y = {a} with the discrete topology on Y. Then all subsets of Y are open. Therefore Int(A) = A and
Bdy(A) = ∅ for all A ∈ IP(Y).
By Theorem 14.12.3, all functions f : X → Y are continuous (because they are necessarily constant). Let
X be a topological space which does not have the discrete topology. Then there is a set S ∈ IP(X) such that
S ∉ Top(X) and Bdy(S) ≠ ∅. Let A = f(S). Then . . .
14.12.11 Theorem: For topological spaces X and Y , let f : X → Y . Then the following statements are
equivalent.
(1) f is continuous.
(2) ∀S ∈ IP(Y ), f (Int(f −1 (S))) ⊇ Int(S).
(3) ∀S ∈ IP(Y ), f (Bdy(f −1 (S))) ⊆ Bdy(S).
Proof: To show that part (1) implies part (2), let f : X → Y be continuous and S ∈ IP(Y ). Then Int(S) ⊆
S by Theorem 14.5.9 (2). So f −1 (Int(S)) ⊆ f −1 (S) by Theorem 6.6.4 (ii). Therefore Int(f −1 (Int(S))) ⊆
Int(f −1 (S)) by Theorem 14.5.9 (10). So f (Int(f −1 (Int(S)))) ⊆ f (Int(f −1 (S))) by Theorem 6.6.3 (ii). But
f −1 (Int(S)) ∈ Top(X) because Int(S) ∈ Top(Y ) and f is continuous. So Int(f −1 (Int(S))) = f −1 (Int(S))
by Theorem 14.5.10 (1). Therefore f (Int(f −1 (Int(S)))) = f (f −1 (Int(S))), which equals Int(S) by
Theorem 6.6.6 (i). So Int(S) ⊆ f (Int(f −1 (S))) as claimed.
To show that part (2) implies part (1), suppose that f : X → Y satisfies f (Int(f −1 (S))) ⊇ Int(S) for
all S ∈ IP(Y ). Let Ω ∈ Top(Y ). Then f (Int(f −1 (Ω))) ⊇ Int(Ω) = Ω. However, Int(f −1 (Ω)) ⊆ f −1 (Ω)
by Theorem 14.5.9 (2). So f (Int(f −1 (Ω))) ⊆ f (f −1 (Ω)) = Ω by Theorem 6.6.3 (ii) and Theorem 6.6.6 (i).
Therefore f (Int(f −1 (Ω))) = Ω by the ZF axiom of extension (Definition 5.1.26 (1)). . . .
14.12.12 Notation: C(X, Y) denotes the set of continuous functions from X to Y, for topological spaces
X and Y.
C⁰(X, Y) is an alternative notation for C(X, Y) for topological spaces X and Y.
14.12.13 Remark: The alternative notation C⁰(X, Y) in Notation 14.12.12 is often used in contexts where
the sets C^k(X, Y) of k-times continuously differentiable functions from X to Y for k ∈ ℤ⁺₀ are also defined.
Even if differentiability of functions from X to Y is not defined, the C⁰ notation is a convenient abbreviation
for the word “continuous” because the notation “C” on its own is ambiguous, whereas “C⁰” strongly suggests
the idea of continuity. It is often said that a “function is C⁰”, but rarely that a “function is C”. (See for
example Notation 18.4.10 for C^k function spaces.)
Notation 14.12.14 defines the default range Y of C(X, Y) as the real numbers. In other words, C⁰(X) is
defined to equal C⁰(X, IR).
14.12.14 Notation: C⁰(X) denotes the set of continuous functions from X to IR for a topological space X,
using the standard topology on IR.
[ Near here, define the limit of a function at a point and define continuity in terms of this (as the limit
everywhere equalling the value of the function). Then compare this definition with Definition 14.12.2. ]
[ Check that Definition 14.12.15 is correct. Prove that it is the same as more usual definitions. Maybe
the closure operations are superfluous, or even wrong. Most importantly, check that Theorem 14.12.20 is
correct. ]
14.12.15 Definition: The limit set of a function f at a point x, for topological spaces X and Y, function
f : X → Y and x ∈ X, is the set
    ⋂_{Ω∈Top_x(X)} f(Ω \ {x}) = {y ∈ Y; ∀Ω ∈ Top_x(X), y ∈ f(Ω \ {x})}.
The limit of a function f at a point x, for topological spaces X and Y , function f : X → Y and x ∈ X, is
the unique element in the limit set of f at x if the limit set contains one and only one element.
14.12.16 Notation: lim_{z→x} f(z), for a function f : X → Y, for topological spaces X and Y with x ∈ X,
denotes the limit of f at x if the limit is well defined.
14.12.17 Remark: Although Notation 14.12.16 is commonly used for the limit of a function f , it contains
a superfluous variable z. Notation 14.12.18 is preferable, but is probably non-standard.
14.12.18 Notation: lim_x f, for a function f : X → Y, for topological spaces X and Y with x ∈ X,
denotes the limit of f at x if the limit is well defined.
14.12.19 Remark: The variable z in Notation 14.12.16 is not superfluous when the function f is defined
“inline”; in other words, when f (z) is replaced by an expression in which z is a dummy variable. For example,
z is a dummy variable in the expression
    lim_{z→p} |g(z) − g(p) − v(z − p)| / |z − p|.
The above expression implicitly defines a function f inline by
    f : z ↦ |g(z) − g(p) − v(z − p)| / |z − p|.
Using Notation 14.12.16, it is not necessary to explicitly define the function f first before evaluating its limit.
14.12.20 Theorem: A function f : X → Y for topological spaces X and Y is continuous at x ∈ X if and
only if the limit lim_x f of f at x is well defined and f(x) = lim_x f.
[ Must fix up Definition 14.12.21, which is based on the direct product topology of an arbitrary cross product
which has not yet been defined! See EDM2 [35], 435.B, for the equivalence of pointwise convergence and the
product topology. ]
14.12.21 Definition: The topology of pointwise convergence on the set C(X, Y) for topological spaces
X −< (X, TX) and Y −< (Y, TY) is the restriction to IP(C(X, Y)) of the direct product topology on Y^X.
14.12.22 Remark: See Definition 15.7.9 for the compact-open topology on C(X, Y ).
14.12.23 Theorem: The weak topology T in Definition 14.11.9 is weaker than all other topologies on the
set X for which all functions fi : X → Xi are continuous.
[ See EDM2 [35], article 84, for more material on continuous functions. ]
14.12.24 Theorem: Let X be a topological space and let (fi)_{i=1}^m be a finite sequence of continuous
functions fi : X → IR. Then the functions min_{i=1}^m fi and max_{i=1}^m fi are both continuous functions on X.
14.12.25 Remark: Sequences are a special kind of function whose domain is a totally ordered set. When
the domain is the set of non-negative integers, the only limiting process that can occur is the limit as a point
in the domain “tends to infinity”. Since such a sequence has no value at infinity, it is not possible to define
continuity “at infinity”. However, limits do still make good sense as for functions on the real numbers and
other topological spaces.
14.12.26 Remark: The term “limit point” for sets has a slightly different meaning to a “limit point” of
a sequence of points in a topological space. If all of the points of a sequence are the same, then the limit or
limit point of that sequence is the constant value of that sequence, but that point is not a limit point of the
range of (i.e. the set of points in) the sequence. But if all elements of a sequence are different, a limit point
of the sequence is a limit point of the range of the sequence.
Sequences may be defined as functions (or families) whose domain is a totally ordered set. This set is not
necessarily a subset of the integers. In Definition 14.12.27, it is clear that there can be no limit point of a
finite sequence. The generality of the definition of a sequence permits functions of real numbers, for example,
to be regarded as sequences which may have limit points in accordance with Definition 14.12.27.
14.12.27 Definition: A limit (point) of a sequence (xi)i∈I of points in a topological space X is a point
y ∈ X such that ∀Ω ∈ Top_y(X), ∃n ∈ I, ∀i ≥ n, xi ∈ Ω.
A convergent sequence in a topological space X is a sequence of points in X which has a limit point in X.
A divergent sequence in a topological space X is a sequence of points in X which is not convergent in X.
14.13. Homeomorphisms
14.13.1 Definition: A homeomorphism between two topological spaces (X1 , T1 ) and (X2 , T2 ) is a bijection
f : X1 → X2 such that both f and f −1 are continuous. Two topological spaces (X1 , T1 ) and (X2 , T2 ) are
said to be homeomorphic if there exists a homeomorphism f : X1 → X2 .
A (topological) automorphism on a topological space X −< (X, TX) is a homeomorphism from X to X.
14.13.2 Notation: The notation X1 ≈ X2 means that the topological spaces X1 and X2 are homeomorphic
with respect to topologies which are implicit in the context.
The notation f : X1 ≈ X2 means that f is a homeomorphism between the topological spaces X1 and
X2 with implicit topologies. Similarly, this notation may be used for explicit topologies; for example,
(X1 , T1 ) ≈ (X2 , T2 ) and f : (X1 , T1 ) ≈ (X2 , T2 ).
14.13.3 Notation: Iso(X, Y) for topological spaces X −< (X, TX) and Y −< (Y, TY) denotes the set of all
topological isomorphisms from X to Y.
Aut(X) for a topological space X −< (X, TX) denotes the set of all topological automorphisms on X.
14.13.4 Remark: The notation “Iso” in Notation 14.13.3 seems to be non-standard. But it is a kind of
space which is often used in differential geometry; so it should have its own notation. The space Aut(X) is
the same as Iso(X, X). Spaces of morphisms apply to a very wide range of classes of structures. In a narrow
context, there is no danger of confusion, but in differential geometry, isomorphisms are often intermingled
for different classes within a single context. Therefore a subscript may sometimes be used to distinguish
structure classes.
14.13.5 Remark: It follows from Definition 14.13.1 that the set map f : IP(X1) → IP(X2) of f restricts to a
bijection between the open sets in T1 and T2. In other words, there is a one-to-one correspondence between
the topologies of the two sets. This implies that absolutely all topological properties of the two sets are
identical. The homeomorphism relation is clearly an equivalence relation, but the class of all topological
spaces is not a set! So this is not an equivalence relation in the sense of a set of ordered pairs as in
Definition 6.3.2.
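For finite spaces, the correspondence between open sets described in Remark 14.13.5 gives a direct test for homeomorphisms. The following Python sketch is illustrative only. It assumes that a map is given as a dict, and it uses the fact that a bijection f is a homeomorphism exactly when the forward images of the open sets of the first space are the open sets of the second space. The function names are hypothetical.

```python
def image(f, G):
    """Forward image of G under the map f, which is given as a dict."""
    return frozenset(f[x] for x in G)


def is_homeomorphism(f, X, TX, Y, TY):
    """True if f is a bijection from X to Y which carries TX exactly onto TY."""
    values = list(f.values())
    is_bijection = (
        set(f) == set(X)
        and set(values) == set(Y)
        and len(set(values)) == len(values)
    )
    if not is_bijection:
        return False
    return {image(f, G) for G in TX} == set(TY)


X = frozenset({1, 2, 3})
TX = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
Y = frozenset("abc")
TY = {frozenset(), frozenset("c"), frozenset("bc"), Y}

f = {1: "c", 2: "b", 3: "a"}  # matches the nested open sets of X and Y
g = {1: "a", 2: "b", 3: "c"}  # a bijection which sends {1} to the non-open set {a}

print(is_homeomorphism(f, X, TX, Y, TY), is_homeomorphism(g, X, TX, Y, TY))
```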
14.13.6 Theorem: Let f : (X1, T1) ≈ (X2, T2) be a homeomorphism. Let S1 be a subset of X1 with the
relative topology from T1. Let S2 = f(S1) have the relative topology from T2. Then f|_{S1} : S1 → S2 is a
homeomorphism.
Proof: Clearly the restriction of a bijection is always a bijection. Denote by T′1 and T′2 the relative
topologies for S1 and S2 from (X1, T1) and (X2, T2) respectively. To show the continuity of f|_{S1}, note that
if G′2 ∈ T′2, then G′2 = G2 ∩ S2 for some G2 ∈ T2. Since f is continuous, f⁻¹(G2) ∈ T1. It follows that
(f|_{S1})⁻¹(G′2) = f⁻¹(G2) ∩ S1 ∈ T′1. So f|_{S1} is continuous with respect to the relative topologies. The
continuity of the inverse follows symmetrically. Therefore f|_{S1} is a homeomorphism with respect to the
relative topologies on S1 and f(S1). That is, f|_{S1} : (S1, T′1) ≈ (S2, T′2).
[ The composite of two homeomorphisms is a homeomorphism. Use general composition of relations for this. ]

Chapter 15
Topology classes and constructions
15.1 Product and quotient topologies
15.2 Separation classes
15.3 Separation and disconnection of sets
15.4 Connectivity classes
15.5 Definition of continuity of functions using connectivity
15.6 Open bases, countability classes and separability
15.7 Compactness classes
15.8 Topological properties of real number intervals
15.9 Topological dimension
15.10 Set union topology
15.11 Topological identification spaces
15.1. Product and quotient topologies

[ The terms “product topology”, “direct product topology” may be synonymous. See EDM2 [35], 425.K. ]
15.1.1 Definition: The (direct) product topology for two topological spaces (X1 , T1 ) and (X2 , T2 ) is the
topology T for the set product X1 × X2 which is the set of all unions of sets of the form G1 × G2 such that
G1 ∈ T1 and G2 ∈ T2 . That is,
    T = { ⋃A; A ⊆ {G1 × G2; G1 ∈ T1 and G2 ∈ T2} }.
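For two finite spaces, the product topology of Definition 15.1.1 can be enumerated directly as the set of unions of open boxes G1 × G2. The following Python sketch is an illustration only. The function name is hypothetical, and the example uses a two-point space with three open sets, chosen for this example.

```python
from itertools import combinations, product


def product_topology(T1, T2):
    """All unions of collections of boxes G1 × G2 with G1 in T1 and G2 in T2."""
    boxes = list({frozenset(product(G1, G2)) for G1 in T1 for G2 in T2})
    return {
        frozenset().union(*A)
        for r in range(len(boxes) + 1)
        for A in combinations(boxes, r)
    }


# A two-point space {0, 1} in which {0} is open but {1} is not.
S = {frozenset(), frozenset({0}), frozenset({0, 1})}
T = product_topology(S, S)

print(len(T))
print(frozenset({(0, 0), (0, 1), (1, 0)}) in T)
```

The set {(0, 0), (0, 1), (1, 0)} is open in the product although it is not itself a box, which is why unions of boxes are required in the definition.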
15.1.2 Remark: The product topology T in Definition 15.1.1 is the weakest topology for which the projection
maps f1 : X1 × X2 → X1 and f2 : X1 × X2 → X2 are continuous. The fact that the set T is a topology in
both Definitions 15.1.1 and 15.1.4 follows from Theorem 14.11.8. (See Definition 6.9.8 for projection maps.
See Definition 14.12.2 for continuity.)
15.1.3 Theorem: Let X, Y be topological spaces. Let f : X → Y be a continuous function. Then the
map g : X → X × Y defined by g(x) = (x, f (x)) is continuous.
Proof: Let Ω ∈ Top(X × Y). It must be shown that g⁻¹(Ω) ∈ Top(X). Let x ∈ g⁻¹(Ω). Then
(x, f(x)) ∈ Ω. So G1 × G2 ⊆ Ω for some G1 ∈ Top_x(X) and G2 ∈ Top_{f(x)}(Y) by Definition 15.1.1. Let
G = G1 ∩ f⁻¹(G2). Then G ∈ Top_x(X) because f is continuous, and g(G) ⊆ G1 × G2 ⊆ Ω. Therefore
x ∈ G ⊆ g⁻¹(Ω). It follows that x is in the interior of g⁻¹(Ω). Hence g⁻¹(Ω) ∈ Top(X).
[ Must define product topology for arbitrary Cartesian products ×i∈I Xi . ]
15.1.4 Definition: The (direct) product topology for the set product X = ×_{i∈I} Xi of a family of topological
spaces (Xi, Ti)i∈I is the set of all unions of sets ×_{i∈I} Ωi ∈ ×_{i∈I} Ti such that #{i ∈ I; Ωi ≠ Xi} < ∞.
In other words, the (direct) product topology on X is the weak topology on X generated by the projection
maps πi : X → Xi of X onto the sets Xi as in Definition 14.11.9. That is, it is the set T defined by
    T′ = { ×_{i∈I} Ωi ∈ ×_{i∈I} Ti; #{i ∈ I; Ωi ≠ Xi} < ∞ }
and
    T = { ⋃S; S ⊆ T′ }.
15.1.5 Remark: There are some Axiom of Choice issues with defining direct products of arbitrary sets of
topological spaces. There are even more significant AC issues with theorems about arbitrary products of
topological spaces, in particular for compactness classes. In the unusual (or even pathological) case where
the product of a non-empty family of non-empty topologies turns out to be empty, many claims about the
properties of the product topology would be true simply because it is empty.
Definition 15.1.4 does not seem to have any AC issues. The set ×_{i∈I} Ti is guaranteed non-empty because
it contains at least the function f : I → ⋃_{i∈I} Ti defined by f : i ↦ ∅ for i ∈ I. (Clearly the function
g : I → ⋃_{i∈I} Ti defined by g : i ↦ Xi for i ∈ I is equally well-defined.) The set products ×_{i∈I} Ωi ∈ ×_{i∈I} Ti
are all well-defined when only a finite number of the Ωi are not equal to Xi. There is no difficulty in
“choosing” such sets of sets. Therefore the product topology has an abundance of open sets to work with.
An AC problem certainly would have arisen if the set {Ωi; Ωi ∉ {∅, Xi}} was required to be always infinite.
15.1.6 Definition: The quotient topology on a set Y induced by a surjective map f : X → Y , where
(X, TX ) is a topological space, is the set TY = {G ⊆ Y ; f −1 (G) ∈ TX }.
15.1.7 Remark: The quotient topology on Y is the strongest topology for which f is continuous. (See
Definition 14.12.2 for continuity.) The fact that f is continuous with respect to TX and TY follows
directly from the definition of continuity. If any more open sets were added to TY, then f would clearly
fail to be continuous.
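The quotient topology of Definition 15.1.6 can be computed for finite spaces by testing every subset of Y. The following Python sketch is illustrative only. The space, the surjection f and the function names are assumptions made for this example.

```python
from itertools import combinations


def powerset(xs):
    """Return all subsets of xs as a list of frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def preimage(f, X, W):
    """Inverse image of W under the map f, which is given as a dict on X."""
    return frozenset(x for x in X if f[x] in W)


def quotient_topology(f, X, TX, Y):
    """The subsets G of Y whose inverse image under f is open in X."""
    return {G for G in powerset(Y) if preimage(f, X, G) in TX}


X = frozenset({1, 2, 3})
TX = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
Y = frozenset("ab")
f = {1: "a", 2: "b", 3: "b"}  # surjective; identifies the points 2 and 3

TY = quotient_topology(f, X, TX, Y)
print(sorted(sorted(G) for G in TY))
```

The only subset of Y which is excluded is {b}, because its inverse image {2, 3} is not open in X. Adding {b} to TY would therefore make f discontinuous.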
15.1.8 Definition: The quotient topology for the quotient set Y = X/R is the quotient topology on the
set Y induced by the quotient projection f : X → Y .
15.1.9 Theorem: Let X, A and B be topological spaces. Suppose f : X → A and g : X → B are functions
such that f × g : X → A × B is a homeomorphism for the standard set-product topology on A × B. Then
for all a ∈ A, the restriction of f × g to f −1 ({a}) is a homeomorphism from f −1 ({a}) to {a} × B with respect to the
relative topologies on f −1 ({a}) and {a} × B, and f −1 ({a}) ≈ B. (See Figure 15.1.1. See Definition 6.9.12
for the pointwise direct product function f × g. See Definition 14.11.13 for relative topology.)
(Diagram: f : X → A and g : X → B, with f × g : X ≈ A × B carrying f −1 ({a}) ⊆ X onto {a} × B ⊆ A × B.)
Figure 15.1.1 Homeomorphism for restriction of a product map
Proof: It follows from Theorem 14.13.6 that for any a ∈ A, the restriction of f × g to f −1 ({a}) is a
homeomorphism from f −1 ({a}) to {a} × B with respect to the relative topologies on f −1 ({a}) ⊆ X and
({a} × B) ⊆ (A × B).
Now define h : {a} × B → B by h(a, b) = b for all b ∈ B. Then h is clearly a bijection, and h and h−1 are
continuous. To show that h is continuous, note that if G ∈ Top(B) then h−1 (G) = {a} × G ∈ Top({a} × B).
To show that h−1 is continuous, let H ∈ Top({a} × B) and note that H = {a} × G for some G ∈ Top(B)
and therefore h(H) = G ∈ Top(B). So h is a homeomorphism. The restriction of g to f −1 ({a}) is the composite
of h with the restriction of f × g to f −1 ({a}). Hence it is a homeomorphism, and f −1 ({a}) ≈ B.
[ Should simplify and shorten the proof of Theorem 15.1.9. Should provide a diagram for clarification. ]

15.2. Separation classes

15.2.1 Remark: Very confusingly, “separation” is not closely related to “separability”. A separation
property of a topological space tells you how easy it is to disconnect subsets of the space from each other.
(Connectivity is defined in Section 15.4.) Two subsets of a space are separated by covering them with a pair
of disjoint open sets. In the case of T0 and T1 spaces, only a single open set is involved.
15.2.2 Remark: The most fundamental question one can ask about a topology T on a set X is whether
it at least has a set Ω ∈ T for each pair of distinct points x, y ∈ X such that x ∈ Ω and y ∉ Ω. If such a set Ω
does not exist for a point pair, the pair of points might as well be considered to be a single point. If x and
y are always in the same open set or outside it, the topology has no capability at all to separate the points.
Therefore it is not surprising that this fundamental property of separation is called the T1 property.
On the other hand, the T0 property is weaker than the T1 property, and it does have some applications.
The T0 property is a kind of semi-T1 property which guarantees only that one point of each pair of points
has a neighbourhood which excludes the other point.
15.2.3 Definition: A T0 (topological) space is a topological space X such that
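$$\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\quad (\exists \Omega_1 \in \mathrm{Top}_{x_1}(X),\ x_2 \notin \Omega_1) \lor (\exists \Omega_2 \in \mathrm{Top}_{x_2}(X),\ x_1 \notin \Omega_2).$$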
15.2.4 Definition: A T1 (topological) space is a topological space X such that
$$\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\ \exists \Omega_1 \in \mathrm{Top}_{x_1}(X),\ x_2 \notin \Omega_1.$$
15.2.5 Remark: The T1 property means that for every pair of distinct points x1 and x2 , there is an open
set Ω1 which contains x1 but does not contain x2 . Since x1 and x2 may be swapped, this implies that
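∃Ω2 ∈ Topx2 (X), x1 ∉ Ω2 also. Therefore
$$\forall x_1 \in X,\ \forall x_2 \in X \setminus \{x_1\},\quad (\exists \Omega_1 \in \mathrm{Top}_{x_1}(X),\ x_2 \notin \Omega_1) \land (\exists \Omega_2 \in \mathrm{Top}_{x_2}(X),\ x_1 \notin \Omega_2).$$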
The T0 property in Definition 15.2.3 is weaker than this because it guarantees only the disjunction of these
two propositions rather than the conjunction. A topological space which is not T0 has two (or more) distinct
points which are always both either inside or outside any given open set. In other words, a topological space X
is non-T0 if and only if
$$\exists x_1, x_2 \in X,\ \bigl( x_1 \neq x_2 \land \forall \Omega \in \mathrm{Top}(X),\ (x_1 \in \Omega \Leftrightarrow x_2 \in \Omega) \bigr).$$
15.2.6 Remark: It follows from the T1 property in Definition 15.2.4 that there exist open sets Ω1 and
Ω2 such that x1 ∈ Ω1 \ Ω2 and x2 ∈ Ω2 \ Ω1 , but it does not follow that the sets Ω1 and Ω2 can be chosen
such that Ω1 ∩ Ω2 = ∅. (This is illustrated in Figure 15.2.1.) A useful way of thinking of the T1 separation
property is that it guarantees that all single-point sets are closed.
(Diagram, two panels. Left: a single open set Ω1 with x1 ∈ Ω1 and x2 ∉ Ω1 . Right: overlapping open sets Ω1 and Ω2
with x1 ∈ Ω1 , x2 ∈ Ω2 , x1 ∉ Ω2 and x2 ∉ Ω1 .)
Figure 15.2.1 T1 separation does not require disjoint covering pairs

15.2.7 Remark: The two-point topologies in Example 14.4.5 have the following T0 and T1 properties.
      topology                 T0     T1
  a   {∅, X}                   no     no
  b   {∅, {1}, X}              yes    no
  c   {∅, {2}, X}              yes    no
  d   {∅, {1}, {2}, X}         yes    yes
The three-point topologies in Example 14.4.6 have the following T0 and T1 properties.
      topology                 T0     T1
  a   0                        no     no
  b   1                        no     no
  c   12                       no     no
  d   1, 12                    yes    no
  e   1, 2, 12                 yes    no
  f   2, 12, 23                yes    no
  g   1, 2, 12, 23             yes    no
  h   1, 2, 3, 12, 13, 23      yes    yes
As mentioned in Remark 14.8.10, the only T1 topology on a finite set is the discrete topology.
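The T0 and T1 conditions of Definitions 15.2.3 and 15.2.4 are easy to test mechanically on a finite set. The following Python sketch recomputes the two-point table above. The function names is_t0 and is_t1, and the encoding of each topology as a set of frozensets, are assumptions made for this illustration only.

```python
fs = frozenset

def is_t0(points, top):
    """Some open set contains exactly one point of each pair of distinct points."""
    return all(any((x in o) != (y in o) for o in top)
               for x in points for y in points if x != y)

def is_t1(points, top):
    """For each ordered pair of distinct points (x, y), some open set
    contains x but not y."""
    return all(any(x in o and y not in o for o in top)
               for x in points for y in points if x != y)

X = fs({1, 2})
two_point = {
    "a": {fs(), X},
    "b": {fs(), fs({1}), X},
    "c": {fs(), fs({2}), X},
    "d": {fs(), fs({1}), fs({2}), X},
}
# Map each label to the pair (is T0, is T1).
table = {name: (is_t0(X, top), is_t1(X, top)) for name, top in two_point.items()}
```

Only the discrete topology d passes the T1 test, in agreement with the statement that the only T1 topology on a finite set is the discrete topology.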
15.2.8 Theorem: Let X be a T1 topological space. Then the following propositions are true.
(1) $\forall x \in X,\ \{x\} \in \overline{\mathrm{Top}}(X)$.
(2) $\forall x \in X,\ \overline{\{x\}} = \{x\}$.
(3) $\forall x \in X,\ \mathrm{Ext}(\{x\}) = X \setminus \{x\}$.
Proof: For part (1), let x ∈ X. Define C = {Ω ∈ Top(X); x ∉ Ω}. Then ∀y ∈ X \ {x}, ∃Ω ∈ C, y ∈ Ω
because X has the T1 property. Therefore ∀y ∈ X \ {x}, y ∈ ⋃C, by the definition of ⋃C. So X \ {x} ⊆ ⋃C.
But x ∉ ⋃C. So X \ {x} = ⋃C. But ⋃C ∈ Top(X) because C is a set of open sets. Therefore
$\{x\} = X \setminus \bigcup C \in \overline{\mathrm{Top}}(X)$.
15.2.9 Theorem: A topological space X has the T1 property if and only if {x} is closed in X for all x ∈ X.
Proof: It follows from Theorem 15.2.8 (1) that {x} is closed in X for all x ∈ X if X is a T1 space. So
suppose that X is a topological space such that {x} is closed in X for all x ∈ X. Then Ωx = X \ {x} is open
for all x ∈ X. So y ∈ Ωx for any y ∈ X \ {x}, but x ∉ Ωx . This implies the T1 property for X.
15.2.10 Theorem: Let X be a finite T1 topological space. Then Top(X) = IP(X).
15.2.11 Remark: Theorem 15.2.10 means that if a finite set has the T1 property, then all single-point sets
are open. In the case of countably infinite sets, this is not true. This is demonstrated by Example 14.8.3.
Trivial T1 topologies are mentioned in Definition 14.8.6 and Remark 14.8.7.
15.2.12 Remark: The existence of a disjoint covering pair is guaranteed by the T2 separation property,
which is also known as “Hausdorff separation”. This is given by Definition 15.2.13 and illustrated in
Figure 15.2.2.
15.2.13 Definition: A Hausdorff space is a topological space X such that
∀x1 , x2 ∈ X, x1 ≠ x2 ⇒ ∃Ω1 ∈ Topx1 (X), ∃Ω2 ∈ Topx2 (X), Ω1 ∩ Ω2 = ∅. (15.2.1)
A Hausdorff space is also known as a T2 space.

(Diagram: disjoint open sets Ω1 and Ω2 with x1 ∈ Ω1 , x2 ∈ Ω2 and Ω1 ∩ Ω2 = ∅.)
Figure 15.2.2 Hausdorff (T2 ) space requires disjoint covering for a pair of points
15.2.14 Remark: The Hausdorff property (15.2.1) in Definition 15.2.13 means that every pair of distinct
points has a pair of disjoint neighbourhoods. A useful way of thinking of the Hausdorff property is that it
guarantees that every two-point set is disconnected in the sense of Definition 15.4.3.
15.2.15 Remark: For any infinite set X, the topology defined on X in Example 14.8.5 is T1 but not
Hausdorff.
15.2.16 Remark: Non-Hausdorff topological spaces are of very limited use in differential geometry. (In
fact, they’re useless for most applications.) When topological manifolds are defined in Section 25.3, non-
Hausdorff topologies are explicitly excluded. Example 42.3.2 is a non-trivial non-Hausdorff space which gives
a hint of why one would want to exclude them. They’re just more bother than they’re worth. Differential
geometry is bothersome enough already.
Theorem 15.2.17 is an immediate application of the Hausdorff class.
15.2.17 Theorem: Let f : X → Y be a continuous function, where X is a topological space and Y is a
Hausdorff space. Then the graph of f is a closed subset of the product topological space X × Y .
15.2.18 Definition: A completely regular (topological) space is a T1 topological space X such that
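$$\forall F \in \overline{\mathrm{Top}}(X),\ \forall x \in X \setminus F,\ \exists f \in C(X, [0, 1]),\quad f(x) = 0 \text{ and } \forall y \in F,\ f(y) = 1.$$
15.2.19 Remark: Definitions 15.2.18 and 15.2.20 use Notation 14.3.15 for the set $\overline{\mathrm{Top}}(X)$ of closed subsets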
of X. Definition 15.2.18 is illustrated in Figure 15.2.3.
(Diagram: a point x and a closed set F, with level sets of f running from f (y) = 0 at x through f (y) = 0.3 and
f (y) = 0.7 to f (y) = 1 on F.)
Figure 15.2.3 Completely regular space, existence of continuous function
For any finite set of points x1 , . . . , xm ∈ X \ F in Definition 15.2.18, suitable continuous functions f1 , . . . , fm
are guaranteed to exist if the topological space is completely regular. So one may construct f : X → [0, 1]
with $f(y) = \min_{i=1}^{m} f_i(y)$ for all y ∈ X. Then f is continuous on X and has the value 1 on F and the value
0 on all points x1 , . . . , xm .
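The minimum construction can be tried out numerically in the simplest setting. The following Python sketch assumes X = IR with its usual metric and F = [a, b], and it builds each f_i from the distance to x_i. That particular formula is an assumption of this illustration, since Definition 15.2.18 only asserts that some suitable f_i exists. The names separating_function and pointwise_min are hypothetical.

```python
def separating_function(x_i, a, b):
    """A continuous f_i : R -> [0, 1] with f_i(x_i) = 0 and f_i = 1 on F = [a, b]."""
    dist_to_f = max(a - x_i, x_i - b)   # distance from x_i to F
    if dist_to_f <= 0:
        raise ValueError("x_i must lie outside F")
    return lambda y: min(1.0, abs(y - x_i) / dist_to_f)

def pointwise_min(functions):
    """The function f(y) = min_i f_i(y)."""
    return lambda y: min(f_i(y) for f_i in functions)

a, b = 2.0, 3.0
points = [0.0, 1.5, 4.0]
f = pointwise_min([separating_function(x, a, b) for x in points])
```

Each f_i equals 1 on F because every point of F is at least as far from x_i as the distance from x_i to F. The minimum vanishes at each x_i because the corresponding f_i vanishes there and all of the functions are non-negative.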
15.2.20 Definition: A T4 (topological) space is a topological space X such that for every disjoint pair of
closed sets K1 and K2 in X, there are disjoint open sets Ω1 and Ω2 such that K1 ⊆ Ω1 and K2 ⊆ Ω2 . In
other words,
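$$\forall K_1, K_2 \in \overline{\mathrm{Top}}(X),\quad K_1 \cap K_2 = \emptyset \Rightarrow \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ (K_1 \subseteq \Omega_1 \text{ and } K_2 \subseteq \Omega_2 \text{ and } \Omega_1 \cap \Omega_2 = \emptyset).$$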
15.2.21 Definition: A normal (topological) space is a topological space which is both T1 and T4 .
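On a finite set, the T4 condition can be checked by enumerating all disjoint pairs of closed sets. The following Python sketch suggests why the T1 condition is added in the definition of a normal space: a two-point space with topology {∅, {1}, X} satisfies T4 vacuously without being T1. The function names and the example topologies are hypothetical choices for this illustration.

```python
fs = frozenset

def is_t1(points, top):
    return all(any(x in o and y not in o for o in top)
               for x in points for y in points if x != y)

def is_t4(points, top):
    """Every disjoint pair of closed sets is covered by a disjoint pair of open sets."""
    closed = [points - o for o in top]
    return all(
        any(k1 <= o1 and k2 <= o2 and not (o1 & o2) for o1 in top for o2 in top)
        for k1 in closed for k2 in closed if not (k1 & k2)
    )

def is_normal(points, top):
    return is_t1(points, top) and is_t4(points, top)

X2 = fs({1, 2})
sierpinski = {fs(), fs({1}), X2}              # T4 but not T1
discrete = {fs(), fs({1}), fs({2}), X2}       # normal
X3 = fs({1, 2, 3})
not_t4 = {fs(), fs({1}), fs({1, 2}), fs({1, 3}), X3}   # closed {2}, {3} cannot be split
```

In the last example the closed sets {2} and {3} are disjoint, but every open set which contains either of them also contains the point 1.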

15.2.22 Remark: Definition 15.2.20 is illustrated in Figure 15.2.4. A T4 space has the property that every
disjoint pair of non-empty closed sets is disconnected. (See Theorem 15.4.11.)
(Diagram: disjoint open sets Ω1 and Ω2 with K1 ⊆ Ω1 , K2 ⊆ Ω2 and Ω1 ∩ Ω2 = ∅.)
Figure 15.2.4 T4 space requires disjoint covering for a pair of closed sets
15.2.23 Remark: The T4 space condition is not very different to the Hausdorff condition. This can be
made clearer by extending the notation Topx (X) = {Ω ∈ Top(X); x ∈ Ω} for points x ∈ X to a notation
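TopS (X) = {Ω ∈ Top(X); S ⊆ Ω} for sets S ⊆ X. The T4 condition may then be written as follows.
$$\forall K_1, K_2 \in \overline{\mathrm{Top}}(X),\quad K_1 \cap K_2 = \emptyset \Rightarrow \exists \Omega_1 \in \mathrm{Top}_{K_1}(X),\ \exists \Omega_2 \in \mathrm{Top}_{K_2}(X),\ \Omega_1 \cap \Omega_2 = \emptyset.$$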
This is similar to the Hausdorff space condition (15.2.1).
[ Check if the axiom of choice is required to prove any of the relations between separation classes in
Figure 15.2.5. If so, try to provide AC-free versions of the affected relations. See EDM2 [35], 425.Q,
pages 1612–1613, for numerous statements of theorems regarding the separation classes. ]
15.2.24 Remark: Relations between topological space separation classes are illustrated in Figure 15.2.5.
(Diagram: a chain of separation classes with the following node labels, from weakest to strongest.)
  T−1                                    topological space
  T0                                     Kolmogorov space
  T1                                     Kuratowski space
  T2                                     Hausdorff space
  T3     Vietoris axiom                  T0 + T3     regular space
  T3½    Tikhonov axiom                  T1 + T3½    completely regular space
  T4     Tietze’s first axiom            T1 + T4     normal space
  T5     Tietze’s second axiom           T1 + T5     completely normal space
  T6     Vedenisov axiom                 T1 + T6     perfectly normal space
                                         metrizable space
Figure 15.2.5 Family tree for separation classes of topological spaces

[ For every implication which is not indicated in Figure 15.2.5, give a counterexample to show why the
implication is not generally valid. ]
[ According to EDM2 [35], 425.Q, page 1612, a T5 space is defined by the property that
∀A, B ∈ IP(X), ((A ∩ B̄ = ∅ ∧ Ā ∩ B = ∅) ⇒ (∃Ω1 , Ω2 ∈ Top(X), (A ⊆ Ω1 ∧ B ⊆ Ω2 ∧ Ω1 ∩ Ω2 = ∅))).
In other words, if A ⊆ Ext(B) and B ⊆ Ext(A), then A and B are disconnected. I thought that this could be
proved in a normal space. Check this. ]

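15.2.25 Theorem: Let T1 and T2 be topologies on a set X such that T1 ⊆ T2 . Then the following propositions
are true.
(1) If (X, T1 ) is T0 then (X, T2 ) is T0 .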
(2) If (X, T1 ) is T1 then (X, T2 ) is T1 .
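Both assertions can be confirmed exhaustively on a small set, under the reading that T2 is a topology on X which includes T1 . The following Python sketch enumerates all topologies on a three-point set and checks that the T0 and T1 properties survive the passage to any stronger topology. The helper names are hypothetical, and the enumeration is a brute-force illustration, not a proof.

```python
from itertools import combinations

X = frozenset({1, 2, 3})

def is_topology(top):
    # On a finite set, closure under pairwise unions and intersections suffices.
    return (frozenset() in top and X in top
            and all((a | b) in top and (a & b) in top for a in top for b in top))

def all_topologies():
    proper = [frozenset(c) for r in (1, 2) for c in combinations(sorted(X), r)]
    for r in range(len(proper) + 1):
        for extra in combinations(proper, r):
            top = frozenset(extra) | {frozenset(), X}
            if is_topology(top):
                yield top

def is_t0(top):
    return all(any((x in o) != (y in o) for o in top)
               for x in X for y in X if x != y)

def is_t1(top):
    return all(any(x in o and y not in o for o in top)
               for x in X for y in X if x != y)

tops = list(all_topologies())
# For every pair with t1 ⊆ t2: T0 for t1 implies T0 for t2, and likewise for T1.
monotone = all(
    (not is_t0(t1) or is_t0(t2)) and (not is_t1(t1) or is_t1(t2))
    for t1 in tops for t2 in tops if t1 <= t2
)
```

The reason is visible in the code: the witnesses for T0 and T1 are open sets, and every open set of the weaker topology is still available in the stronger one.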
[ Add assertions for many more separation classes in Theorem 15.2.25. Also do such monotonicity results with
respect to topology strength for other kinds of topology classes. ]
15.2.26 Remark: The Tietze extension theorem is useful when one wishes to specify the value of a function
on a subset of a topological space but the value elsewhere doesn’t matter.
15.2.27 Theorem: [ Tietze extension theorem - see Simmons [140], section 28, page 135. The proof of this
theorem does not seem to require the axiom of choice, but if it does, modify it to remove this dependence. ]
15.2.28 Remark: It is notable that a locally Euclidean space is not necessarily Hausdorff, although all
Euclidean topological spaces are clearly Hausdorff. This is explained in Remark 25.3.2. (See Definition 25.2.1
for Euclidean topological spaces, Definition 25.2.3 for locally Euclidean spaces.)
15.3. Separation and disconnection of sets

15.3.1 Remark: An important role of a topology on a set X is to determine the boundary of each subset
of X. The boundary of a set may be thought of as the separation barrier between its interior and exterior.
As illustrated in Figure 14.6.1 (in Remark 14.6.9), the boundary Bdy(S) of a set S consists of elements of
both the set S and the set X \ S. The topology determines which elements of S and X \ S belong to Bdy(S).
The interior Int(S) and the exterior Ext(S) of a set S in a topological space X are both open sets in X. We
can say, then, that these two open sets are separated by the boundary Bdy(S) of S. Since Int(S) = Ext(X \S)
and Ext(S) = Int(X \S), it follows that Bdy(S) = Bdy(X \S). So Bdy(S) symmetrically separates Ext(X \S)
from Int(X \ S).
More generally, we can then say that any two open sets S1 and S2 are separated from each other if S1 ⊆
Ext(S2 ) and S2 ⊆ Ext(S1 ). (The conditions S1 ⊆ Ext(S2 ) and S2 ⊆ Ext(S1 ) are equivalent to S1 ∩ S̄2 = ∅
and S2 ∩ S̄1 = ∅ respectively.) If these two conditions are satisfied, then either Bdy(S1 ) or Bdy(S2 ) may be
chosen as the separating boundary between S1 and S2 .
This naturally leads to the question of how a topology on X can separate two general subsets S1 and S2
of X from each other. An obvious idea is to define topological separation of S1 and S2 by the same pair of
conditions S1 ⊆ Ext(S2 ) and S2 ⊆ Ext(S1 ). It turns out that this is equivalent to the standard definition of
the “disconnection” of a pair of sets in a normal topological space, but it is not the same as the standard
definition in a non-normal topological space. (See Definition 15.2.21 for normal topological spaces.)
If S1 is entirely included in the exterior of S2 , the boundary of S2 separates S1 from the interior of S2 . If S2
is entirely included in the exterior of S1 , the boundary of S1 separates S2 from the interior of S1 . In other
words, neither set strays into the boundary of the other. So it seems that these conditions are sufficient
to guarantee some intuitive idea of separation between two sets. To check if the conditions are necessary,
suppose there is a point x1 ∈ S1 ∩ S̄2 . Then all neighbourhoods of x1 contain one or more elements of S2 .
This does not match the idea of x1 being separated from S2 .
15.3.2 Remark: The words “disconnected” and “connected” are used in topology to mean, respectively,
that two sets are or are not separated. However, the word “connection” is used in the theory of parallelism
to denote differential parallel transport of vectors from one place to another in a differentiable manifold.

(See for example Chapters 35 and 36.) The use of the word “connection” for parallelism is unfortunate.
(See Remark 36.1.3.) A better term would have been “differential parallelism” or just “parallelism”. But
this terminology is unlikely to change. On the other hand, the word “separation” already has two different
meanings in general topology. (See Sections 15.6 and 15.2.) So the words “connected” and “connectivity”
are probably the best choices here. But they must be carefully distinguished from the differential manifold
concept of a “connection”.
15.3.3 Definition: Non-empty sets S1 , S2 are said to be separated in a topological space X if S1 ∩ S̄2 = ∅
and S2 ∩ S̄1 = ∅.
Non-empty sets S1 , S2 are said to be non-separated in a topological space X if S1 ∩ S̄2 ≠ ∅ or S2 ∩ S̄1 ≠ ∅.
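In a finite topological space, the closure of a set is the intersection of the closed sets which contain it, so the separation condition of Definition 15.3.3 can be evaluated directly. The following Python sketch assumes this finite setting. The names closure and separated and the three-point example topology are hypothetical.

```python
fs = frozenset

def closure(points, top, s):
    """Intersection of all closed sets (complements of open sets) that contain s."""
    result = points
    for o in top:
        closed = points - o
        if s <= closed:
            result &= closed
    return result

def separated(points, top, s1, s2):
    """Neither set meets the closure of the other."""
    return (not (s1 & closure(points, top, s2))
            and not (s2 & closure(points, top, s1)))

X = fs({1, 2, 3})
top = {fs(), fs({1}), fs({2, 3}), X}
```

With this topology, the sets {1} and {2} are separated, but {2} and {3} are not, because the closure of each of them is {2, 3}.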
15.4. Connectivity classes

[ Possibly could define pathwise connectivity in this section. But this might be better defined in a section on
curves (like Section 16.2). ]
15.4.1 Definition: A connected topological space is a topological space (X, T ) such that
∀Ω1 , Ω2 ∈ T, (X = Ω1 ∪ Ω2 and Ω1 ∩ Ω2 = ∅) ⇒ (Ω1 = ∅ or Ω2 = ∅).
In other words, X cannot be partitioned into two non-empty open sets.
15.4.2 Remark: Any set with less than two elements must be connected because it cannot be partitioned
into two different subsets. Therefore the empty set and all singletons are connected.
15.4.3 Definition: A connected subset of a topological space (X, T ) is a set Y ⊆ X such that
∀Ω1 , Ω2 ∈ T, (Y ⊆ Ω1 ∪ Ω2 and Ω1 ∩ Ω2 = ∅) ⇒ (Ω1 ∩ Y = ∅ or Ω2 ∩ Y = ∅). (15.4.1)
In other words, Y cannot be partitioned by two non-empty open sets of X.
A disconnected subset of a topological space (X, T ) is a subset which is not connected. In other words,
∃Ω1 , Ω2 ∈ T, Y ⊆ Ω1 ∪ Ω2 and Ω1 ∩ Ω2 = ∅ and Ω1 ∩ Y ≠ ∅ and Ω2 ∩ Y ≠ ∅. (15.4.2)
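Conditions (15.4.1) and (15.4.2) quantify over pairs of open sets, so in a finite topological space they can be tested by a double loop. The following Python sketch is a minimal illustration. The name is_connected_subset and the three-point example topology are hypothetical.

```python
fs = frozenset

def is_connected_subset(top, y):
    """Condition (15.4.1): no disjoint pair of open sets covers y and meets y twice."""
    return not any(
        y <= (o1 | o2) and not (o1 & o2) and bool(o1 & y) and bool(o2 & y)
        for o1 in top for o2 in top
    )

X = fs({1, 2, 3})
top = {fs(), fs({1}), fs({2, 3}), X}
```

In this example the subset {2, 3} is connected because no open set contains only one of its two points, whereas {1, 2} is disconnected by the pair ({1}, {2, 3}).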
15.4.4 Remark: Condition (15.4.1) in Definition 15.4.3 may be expressed in the following equivalent way.
∀Ω1 , Ω2 ∈ T, (Y ⊆ Ω1 ∪ Ω2 and Ω1 ∩ Y ≠ ∅ and Ω2 ∩ Y ≠ ∅) ⇒ Ω1 ∩ Ω2 ≠ ∅. (15.4.3)
In other words, if the set Y is covered by two open sets Ω1 and Ω2 , and each of these open sets covers at
least one point of Y , then Ω1 and Ω2 must have at least one point in common. This is probably closer to
one’s intuition of connectivity. Condition (15.4.3) is illustrated in Figure 15.4.1.
(Diagram, two panels, each with Y ⊆ Ω1 ∪ Ω2 , Y ∩ Ω1 ≠ ∅ and Y ∩ Ω2 ≠ ∅. Left, “disconnected”: Ω1 ∩ Ω2 = ∅.
Right, “connected”: Ω1 ∩ Ω2 ≠ ∅.)
Figure 15.4.1 Definition of connectivity of a set

Intuitively speaking, one finds the “gap” between two portions of a set and covers each portion with an open
set. If it is not possible to find any gap, the set must be connected. Difficulties arise, however, when the
“gap” has zero width, which demands great skill to position the covering sets accurately. There is no margin
for error!
The more open sets you have, the more pairs of sets you can separate. So bigger topologies have more
disconnected sets, which implies fewer connected sets.
15.4.5 Definition: A disconnection of a topological space X −< (X, T ) is a pair (Ω1 , Ω2 ) such that Ω1 , Ω2 ∈
T , X = Ω1 ∪ Ω2 , Ω1 ≠ ∅, Ω2 ≠ ∅ and Ω1 ∩ Ω2 = ∅.
In other words, a disconnection of X is a partition of X into a pair of disjoint, non-empty open subsets.
15.4.6 Remark: In terms of Definition 15.4.5, one can say that a topological space is connected if and
only if there does not exist any disconnection of the topological space.
15.4.7 Definition: A disconnection of a subset S of a topological space X −< (X, T ) is a pair (K1 , K2 )
such that S = K1 ∪ K2 , K1 ≠ ∅, K2 ≠ ∅, K1 ∩ K2 = ∅, and there are open sets Ω1 , Ω2 ∈ T such that
K1 ⊆ Ω1 , K2 ⊆ Ω2 and Ω1 ∩ Ω2 = ∅.
In other words, a disconnection of S is a partition of S into a pair of disjoint non-empty subsets which are
covered by a corresponding pair of disjoint open sets.
15.4.8 Remark: In terms of Definition 15.4.7, one can say that a subset of a topological space is connected
if and only if there does not exist any disconnection of the subset.
15.4.9 Definition: A disconnected set of sets in a topological space X is a set C of non-empty sets in X
which are pairwise disconnected.
15.4.10 Definition: A disconnection of a set of non-empty sets C in a topological space X is a pairwise
disjoint set D of open sets in X which satisfy ∀S ∈ C, ∃Ω ∈ D, S ⊆ Ω.
15.4.11 Theorem: Let X be a normal topological space. Let K1 and K2 be disjoint non-empty closed
subsets of X. Then (K1 , K2 ) is a disconnection of K1 ∪ K2 .
[ Check whether Theorem 15.4.12 is valid if S1 = ∅ or S2 = ∅. Also check generally the extent to which the
non-emptiness of the sets has to be stated explicitly for definitions of separation and disconnectedness. To
the extent that the non-empty conditions are not required, they should be dropped. ]
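15.4.12 Theorem: Let S1 , S2 be sets in a normal topological space X. Then
$$\bigl( (S_1 \cap \bar S_2 = \emptyset) \land (\bar S_1 \cap S_2 = \emptyset) \bigr) \Leftrightarrow \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ (S_1 \subseteq \Omega_1 \land S_2 \subseteq \Omega_2 \land \Omega_1 \cap \Omega_2 = \emptyset).$$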
Proof: Let S1 , S2 be sets in a topological space X and suppose that (S1 ∩ S̄2 = ∅) ∧ (S̄1 ∩ S2 = ∅). Let
Ω1 = Ext(S2 ) and Ω2 = Ext(S1 ). Then
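15.4.13 Remark: Separation of a pair of non-empty sets may be expressed as follows in a normal topological
space X.
$$\begin{aligned}
S_1 \text{ and } S_2 \text{ are separated}
&\Leftrightarrow (S_1 \cap \bar S_2 = \emptyset) \land (S_2 \cap \bar S_1 = \emptyset) \\
&\Leftrightarrow (S_1 \subseteq \mathrm{Ext}(S_2)) \land (S_2 \subseteq \mathrm{Ext}(S_1)) \\
&\Leftrightarrow \exists \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ (S_1 \subseteq \Omega_1 \land S_2 \subseteq \Omega_2 \land \Omega_1 \cap \Omega_2 = \emptyset) \\
&\Leftrightarrow S_1 \text{ and } S_2 \text{ are disconnected.}
\end{aligned}$$
Non-separation of a pair of non-empty sets may be expressed as follows in a normal topological space X.
$$\begin{aligned}
S_1 \text{ and } S_2 \text{ are non-separated}
&\Leftrightarrow (S_1 \cap \bar S_2 \neq \emptyset) \lor (S_2 \cap \bar S_1 \neq \emptyset) \\
&\Leftrightarrow (S_1 \not\subseteq \mathrm{Ext}(S_2)) \lor (S_2 \not\subseteq \mathrm{Ext}(S_1)) \\
&\Leftrightarrow \forall \Omega_1, \Omega_2 \in \mathrm{Top}(X),\ (S_1 \not\subseteq \Omega_1 \lor S_2 \not\subseteq \Omega_2 \lor \Omega_1 \cap \Omega_2 \neq \emptyset) \\
&\Leftrightarrow S_1 \text{ and } S_2 \text{ are connected.}
\end{aligned}$$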

The definitions of separation and disconnection seem more natural than connection and non-separation.
Connectedness is therefore generally defined as the logical negative of disconnectedness. In other words,
connectedness is defined as a double negative. This suggests that disconnectedness is the primary concept.
In plain English, the words “connected” and “separated” are the logical negative of each other. Therefore
the word “separation” does seem to be the natural word to use for this concept, despite the clash with other
terminology in general topology.
The difference between separation and disconnection of a pair of sets S1 , S2 is illustrated in Figure 15.4.2.
(Diagram, two panels. Left, “separated sets”: S1 ⊆ Ω1 , S2 ⊆ Ω2 , S1 ∩ Ω2 = ∅ and S2 ∩ Ω1 = ∅. Right,
“disconnected sets”: S1 ⊆ Ω1 , S2 ⊆ Ω2 and Ω1 ∩ Ω2 = ∅.)
Figure 15.4.2 Difference between separation and disconnection of sets
15.4.14 Definition: A connected component of a topological space (X, T ) is a non-empty open set Ω ∈ T
such that X \ Ω ∈ T is non-empty and Ω is a connected subset of X.
The connected component of a point x in a topological space (X, T ) is the connected component of (X, T )
which contains x.
[ In Remark 15.4.15, must prove that the set of connected components in a topological space is a partition.
Make this an exercise? ]
15.4.15 Remark: The set of connected components of a topological space (X, T ) is a partition of X. So the
connected component for a given point is well-defined. Notation 15.4.16 implies that Topcx (X) ∈ Topx (X).
Thus the connected component of x is the unique open neighbourhood of x which is a connected component
of X.
15.4.16 Notation: Topcx (X) for a topological space X and x ∈ X denotes the connected component of
X which contains x.
[ Try to find a better notation for connected components. ]
15.4.17 Theorem: Let X and Y be topological spaces, f : X → Y be continuous and A ⊆ X be connected.
Then f (A) is connected. In other words, the continuous image of a connected set is connected.
Proof: Suppose f (A) is not connected. Then there are Ω1 , Ω2 ∈ Top(Y ) such that f (A) ⊆ Ω1 ∪ Ω2 ,
f (A) ∩ Ω1 ≠ ∅, f (A) ∩ Ω2 ≠ ∅ and Ω1 ∩ Ω2 = ∅. But A ⊆ f −1 (f (A)) (for any function f and set A),
and f −1 maps disjoint sets to disjoint sets (because the inverse relation of any function is one-to-one). By
Definition 14.12.2, f −1 maps open sets in Y to open sets in X. Therefore f −1 (Ω1 ), f −1 (Ω2 ) ∈ Top(X),
A ⊆ f −1 (f (A)) ⊆ f −1 (Ω1 ∪ Ω2 ) = f −1 (Ω1 ) ∪ f −1 (Ω2 ), A ∩ f −1 (Ω1 ) ≠ ∅, A ∩ f −1 (Ω2 ) ≠ ∅ and f −1 (Ω1 ) ∩
f −1 (Ω2 ) = ∅. By Definition 15.4.3, this implies that A is not connected.
15.4.18 Remark: Theorem 15.4.19 is an immediate corollary of Theorem 15.4.17.
15.4.19 Theorem: Let X and Y be topological spaces. Let f : X → Y be continuous and A ⊆ X be
connected. Then the graph of the restriction of f to A is a connected subset of X × Y with the product topology.
Proof: By Theorem 15.1.3, the map g : X → X × Y defined by g : x ↦ (x, f (x)) is continuous if f is
continuous. Therefore the image g(A), which is the graph of the restriction of f to A, is connected by
Theorem 15.4.17.
15.4.20 Definition: A locally connected topological space is a topological space X such that
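$$\forall x \in X,\ \forall \Omega \in \mathrm{Top}_x(X),\ \exists \Omega' \in \mathrm{Top}_x(X),\quad \Omega' \subseteq \Omega \text{ and } \Omega' \text{ is connected.}$$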

15.4.21 Remark: A connected space is not necessarily locally connected, and vice versa. Examples of
connected spaces which are not locally connected are the “comb space” in EDM2 [35], section 79.A, and the
sine-of-reciprocal function in Example 15.4.23, which is presented by Simmons [140], section 34, page 151.
15.4.22 Remark: Some relations between connectivity classes are illustrated in Figure 15.4.3.
(Diagram with node labels: topological space; Hausdorff space, locally connected space and connected space;
locally Euclidean space; topological manifold.)
Figure 15.4.3 Family tree for connectivity and separation classes
15.4.23 Example: Define a topological space X as the set
$$X = \bigl\{ (x, y) \in \mathbb{R}^2 ;\ \bigl( x = 0 \land y \in [-1, 1] \bigr) \lor \bigl( x \in (0, 1] \land y = \sin(\pi/(2x)) \bigr) \bigr\}$$
with the relative topology of IR2 . (This is illustrated in Figure 15.4.4.)
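(Diagram: the graph of f (x) = sin(π/(2x)) for x > 0 with f (0) = 0, plotted for x from 0 to 1 and y from −1 to 1,
with the region near x = 0 marked “not locally connected”.)
Figure 15.4.4 Connected set which is not locally connected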
X is connected (by Theorem 15.4.17) because it is the closure of the graph of sin(π/(2x)) for x ∈ (0, 1]. Every
sufficiently small neighbourhood in X of a point in {0} × [−1, 1] has an infinite number of connected components.
Therefore X is not locally connected at any of these points.
[ Define simply connected topologies near here? ]

15.4.24 Remark: In algebraic topology (Section 16.6), there is a very wide range of definitions of con-
nectivity of a topological manifold. These are mostly based on properties of curves and families of curves
in the manifold, which are in turn defined in terms of intervals, which are the connected subsets of the real
numbers.
15.5. Definition of continuity of functions using connectivity

[ Show how other classes of separation between sets are related to continuity. This is done in Theorems 14.12.9
and 14.12.11. ]
15.5.1 Remark: Usually the continuity of a function is defined in terms of the action of the function (or
its inverse) on sets. But there are two ways of thinking about a function: either as an active map from one
set to another, or as a static set of ordered pairs, namely the “graph” of the function. Theorem 15.2.17
is an example of the graph view of continuity of a function. Theorem 15.2.17 shows that if a function is
continuous (and its range is a Hausdorff space), then the graph is closed. (This is similar to, but not the
same as, the closed graph theorem for Banach space operators.)
Definition 14.12.2 is succinct and convenient for applications. From the technical point of view, it is a good
definition. But it does not clearly correspond to the intuitive concept of continuity in everyday life. To
understand the “true nature” of continuity, it is desirable to find an alternative definition.
In elementary introductory courses, continuity is explained in terms of the continuity of the graph, meaning
the connectedness of the graph. Theorem 15.4.19 states that the graph of a continuous function is connected.
But the converse does not hold.
Theorem 15.5.8 shows that if the range of a function is a normal space, the function is continuous if and
only if the inverse function maps disconnected sets to correspondingly disconnected sets.
15.5.2 Remark: It is useful to introduce here some non-standard definitions of connectivity properties of
functions. Definitions 15.5.3 and 15.5.4 are useful for presenting some relations between connectivity and
continuity properties of functions.
15.5.3 Definition: A weakly connected function from topological space X to topological space Y is a
function f : X → Y such that:
∀C ⊆ f (X), f −1 (C) is connected ⇒ C is connected.
In other words, the image of a connected pre-image is connected.
15.5.4 Definition: A strongly connected function from topological space X to topological space Y is a
function f : X → Y such that:
∀C ⊆ f (X), ∀K1 , K2 ⊆ Y,
(K1 , K2 ) is a disconnection of C ⇒ (f −1 (K1 ), f −1 (K2 )) is a disconnection of f −1 (C).
In other words, the pre-images of a disconnection pair are a disconnection pair for the pre-image.
15.5.5 Remark: It follows from Theorem 15.4.17 that any continuous function is weakly connected. This
can be seen by setting A = f −1 (C) and noting that f (A) = f (f −1 (C)) = C for any function f and set C ⊆ f (X).
It follows from the proof of Theorem 15.4.17 that a continuous function is also strongly connected, as stated
in Theorem 15.5.6.
15.5.6 Theorem: If X and Y are topological spaces and f : X → Y is continuous, then f is strongly
connected.
Proof: See proof of Theorem 15.4.17.
15.5.7 Remark: Theorem 15.5.8 means that a function f : X → Y is continuous if and only if a subset
B of Y is connected whenever the pre-image f −1 (B) is connected, and if B is disconnected, f −1 (B) is
disconnected by the inverse images of the partition for B. Figure 15.5.1 illustrates this idea.

(Diagram, two panels, each with B = B1 ∪ B2 disconnected by open sets Ω1 and Ω2 . Left: f −1 (B) = f −1 (B1 ) ∪ f −1 (B2 )
is disconnected; f is continuous. Right: f −1 (B) = f −1 (B1 ) ∪ f −1 (B2 ) is connected; f is discontinuous.)
Figure 15.5.1 Continuous function pre-images of disconnected sets
[ Try to weaken the normal space condition in Theorem 15.5.8 to the Hausdorff condition using a construction
such as in the proof of Theorem 15.2.17 which is given in exercise answer 47.7.3. If Hausdorff doesn’t work,
try to replace the normal space with a completely regular space. In fact, a regular space does seem to be
adequate. Also see Theorems 14.12.9 and 14.12.11 for similar equivalences for a general topological space
with a weaker notion of set separation. ]
15.5.8 Theorem: Let X be a topological space and Y be a normal space. Then f : X → Y is continuous
if and only if f is strongly connected.
Proof: The forward implication of the theorem follows from Theorem 15.5.6. It remains to show that the
function f is continuous if it is strongly connected.
Let f : X → Y be strongly connected. If X = ∅, then f is trivially continuous. So assume that X ≠ ∅.
Let y ∈ f (X) and Ω ∈ Topy (Y ). To prove continuity of f , it must be shown that for any x ∈ X such
that y = f (x), there is an open neighbourhood G ∈ Topx (X) such that f (G) ⊆ Ω. (In other words, it must
be shown that f −1 (Ω) is an open subset of X.)
Let K1 = {y} and K2 = Y \ Ω. Then K1 is closed because Y is a T1 space. (See Definition 15.2.4.) K2 is
closed because Ω is open. Since K1 and K2 are disjoint closed sets and Y is a normal space, there exist
disjoint G1 , G2 ∈ Top(Y ) such that K1 ⊆ G1 and K2 ⊆ G2 . So (K1 , K2 ∩ f (X)) is a disconnection of
K = (K1 ∪ K2 ) ∩ f (X) = K1 ∪ (K2 ∩ f (X)).
By the strong connectivity of f , the pair (f −1 (K1 ), f −1 (K2 )) must be a disconnection of f −1 (K). Therefore
there are open sets H1 , H2 ∈ Top(X) such that x ∈ H1 , f −1 (K2 ) ⊆ H2 and H1 ∩ H2 = ∅. So H1 ⊆ X \ H2 ⊆
X \ f −1 (K2 ) = X \ f −1 (Y \ Ω) = f −1 (Ω). Thus f (H1 ) ⊆ Ω. Hence (or otherwise), f is continuous.
15.5.9 Remark: An immediate corollary of Theorem 15.5.8 is Theorem 15.5.10 for one-to-one functions.
15.5.10 Theorem: Let X be a topological space and Y be a normal space. If f : X → Y is one-to-one,

then f is continuous if and only if
∀A, B ⊆ X,
(f (A), f (B)) is a disconnection of f (A) ∪ f (B) ⇒ (A, B) is a disconnection of A ∪ B.
In other words, f is continuous if and only if the image of a pair which is not a disconnection is a pair which
is not a disconnection.
15.5.11 Remark: Theorem 15.5.10 means that a one-to-one function f : X → Y is continuous if and
only if the image set f (A) is connected for any connected subset A of X, and if f (A) is disconnected, A is
disconnected by the inverse images of the partition for f (A). Figure 15.5.2 illustrates this idea.
(Diagram, two panels, each with f (A) = f (A1 ) ∪ f (A2 ) disconnected by open sets Ω1 and Ω2 . Left: A = A1 ∪ A2 is
disconnected; f is continuous. Right: A = A1 ∪ A2 is connected; f is discontinuous.)
Figure 15.5.2 Continuous one-to-one function pre-images of disconnected sets
15.5.12 Remark: Very roughly speaking, a function is said to be continuous if it maps connected sets to
connected sets. In other words, the function must preserve connectedness. That is, if there is no gap in a
subset of the domain of the function, there must be no gap in the image of that subset by the function. If there
is a gap in the range of the function, there must be a corresponding gap in the domain. “Corresponding”
means that if a subset C of the range is split into A and B, then the inverse image f −1 (C) is split into
f −1 (A) and f −1 (B).
It follows from this that the concept of continuity can be presented in terms of the simple intuitive concept
of connectivity instead of the convoluted ε-δ definitions which are used for metric spaces or the open-inverse-
function definition for general topologies. Although connectivity is technically defined in terms of open
sets, people have a strong intuition of connectivity without mentioning open sets. Connectivity is generally
thought of as the absence of a “gap”, which is much easier to grasp than the usual universal/existential
quantifier combination.
Maybe continuity should be defined in elementary texts as: “A continuous function makes no new gaps.”
A continuous function may close up some gaps, but it never opens up new ones. Therefore: “Continuous
functions are the functions which preserve connectivity.” Alternatively: “Continuous functions are the
functions whose inverses preserve disconnection.” To put it simply,
f is continuous ⇔ f −1 preserves gaps.
15.5.13 Remark: Theorem 15.5.8 does not imply that f is continuous if the image of any connected set
is connected. A counterexample to this is the function f : [0, ∞) → IR defined by
\[ f(x) = \begin{cases} \sin(\pi/(2x)) & x > 0 \\ 0 & x = 0. \end{cases} \]
For any connected subset C of [0, ∞) which does not contain 0, f (C) is connected because f is continuous
on (0, ∞). If C = {0}, then f (C) = {0}, which is connected. If 0 ∈ C and C ≠ {0} is connected, then
f (C) = [−1, 1], which is connected. But f is not continuous. It is easy to show that if K1 = {y}, K2 = {1}
and C = K1 ∪ K2 , then (K1 , K2 ) is a disconnection of C for y ∈ [−1, 1), but (f −1 (K1 ), f −1 (K2 )) is not a
disconnection of f −1 (C) for y = 0. This is because it is not possible to find a neighbourhood of 0 ∈ X which
does not intersect f −1 (K2 ).
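The failure at 0 can be checked numerically. The following Python sketch is an illustration only (the function names are not from the text). It lists the points x_k = 1/(4k + 1), at which f takes the value 1, and confirms that some x_k lies inside [0, r) for each small radius r, so every neighbourhood of 0 meets f −1 ({1}).

```python
import math

def f(x):
    """The counterexample function on [0, oo): sin(pi/(2x)) for x > 0, and 0 at x = 0."""
    return math.sin(math.pi / (2 * x)) if x > 0 else 0.0

def preimage_points_of_one(count):
    """Points x_k = 1/(4k+1), k = 0..count-1, where pi/(2 x_k) = pi/2 + 2 pi k, so f(x_k) = 1."""
    return [1.0 / (4 * k + 1) for k in range(count)]

def smallest_k_inside(radius):
    """Smallest k with x_k < radius, showing that [0, radius) meets the preimage of {1}."""
    k = 0
    while 1.0 / (4 * k + 1) >= radius:
        k += 1
    return k

for x in preimage_points_of_one(6):
    print(f"x = {x:.6f}, f(x) = {f(x):.12f}")
print("f(0) =", f(0.0))
for radius in (0.1, 0.01, 0.001):
    k = smallest_k_inside(radius)
    print(f"radius {radius}: x_{k} = {1.0 / (4 * k + 1):.6f} lies in [0, {radius})")
```

Since f (0) = 0 while points with f (x) = 1 lie arbitrarily close to 0, no neighbourhood of 0 separates f −1 ({0}) from f −1 ({1}).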
15.6. Open bases, countability classes and separability

15.6.1 Remark: In computer representations, it would be very inconvenient to represent a topology as
its complete set of open sets. There are more efficient representations. In the case of metric spaces (Def-
inition 17.1.2), the topological specification is fully contained in the distance function. The corresponding
structure for a general topology is an “open base” or an “open subbase”. The entire topology can be gen-
erated from such sets by operations such as union and finite intersection. For computation, such bases
should not be too large. If a topology has a countable open base, it is said to be “second countable”. (See
Simmons [140], pages 99–100.)
Since computers can’t even cope with countable sets, topologies are more likely to be dealt with in computers
by providing a test function which examines a representation of a set to see if it can be proven by a set of
rules to be open or not. For instance, sets might be represented as unions and intersections of sets satisfying
various constraints in terms of functions which are themselves represented in symbolic algebra. Thus it is
more likely that the topological nature of sets would be determined in a rule-based manner.

[ Open bases and subbases are already defined in Section 14.11. Therefore should remove them from here? ]
15.6.2 Definition: An open base for a topological space (X, T ) is a set S of subsets of X such that
∀Ω ∈ T, ∃S′ ⊆ S, Ω = ⋃S′. In other words, every open set is the union of some subset of the open base.
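For a finite set, the generation of a topology from an open base can be carried out by brute force. The sketch below is an illustration under that finiteness assumption; the helper names `topology_from_base` and `is_topology` are hypothetical. It forms all unions of subfamilies of a candidate base and then tests the open-set axioms, because not every family of subsets is a base for a topology.

```python
from itertools import combinations

def topology_from_base(base):
    """All unions of subfamilies of `base` (the empty subfamily gives the empty set)."""
    base = [frozenset(b) for b in base]
    opens = set()
    for r in range(len(base) + 1):
        for family in combinations(base, r):
            opens.add(frozenset().union(*family))
    return opens

def is_topology(X, T):
    """Check the open-set axioms for a finite family T of subsets of a finite set X."""
    X = frozenset(X)
    if frozenset() not in T or X not in T:
        return False
    # For a finite family, closure under pairwise unions and intersections suffices.
    return all((a | b) in T and (a & b) in T for a in T for b in T)

X = {1, 2, 3}
T = topology_from_base([{1}, {2}, {1, 2, 3}])
print(sorted(sorted(G) for G in T))
print(is_topology(X, T))

# The family {{1, 2}, {2, 3}} is not a base: the unions are not closed under intersection.
print(is_topology(X, topology_from_base([{1, 2}, {2, 3}])))
```

The second family fails because {1, 2} ∩ {2, 3} = {2} is not a union of members of the family.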
15.6.3 Definition: A second countable (topological) space is a topological space (X, T ) for which there
exists a countable open base.
15.6.4 Definition: An open subbase of a topological space (X, T ) is . . .
15.6.5 Definition: A dense subset of a topological space (X, T ) is a set K ⊆ X such that the closure of
K is X in the topology T .
15.6.6 Definition: A separable topological space is a topological space (X, T ) such that X has a countable
dense subset.
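Density can be tested directly on a finite topological space, where the closure of K is the complement of the union of all open sets disjoint from K. The following sketch assumes a finite space given as a list of open sets; the names `closure` and `is_dense` are illustrative. Every finite space is trivially separable, so the example shows only the density test.

```python
def closure(X, T, K):
    """Closure of K in the finite space (X, T): remove every open set that misses K."""
    X, K = frozenset(X), frozenset(K)
    exterior = frozenset().union(*(frozenset(G) for G in T if not frozenset(G) & K))
    return X - exterior

def is_dense(X, T, K):
    """True if the closure of K is the whole of X."""
    return closure(X, T, K) == frozenset(X)

X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]   # a nested topology on X
for K in ({1}, {2}, {3}):
    print(sorted(K), "closure:", sorted(closure(X, T, K)), "dense:", is_dense(X, T, K))
```

In this nested topology the single point 1 is dense, because every non-empty open set contains it.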
[ 2009-3-3: Determine the relations between density, separability and decomposition of topological spaces
into components. The relations between interior/exterior separation and disconnectedness are already under
close study. So bring in also the density and countable dense subset ideas. ]
15.7. Compactness classes

EDM2 [35], 273.F, suggests that the term “compact” was introduced in 1906 by René Maurice Fréchet.
[ Is topological dimension a kind of compactness? If so, include it in this section. ]
15.7.1 Definition: A cover or covering of a subset A in a set X is a family (Bi )i∈I of sets Bi ⊆ X for
i ∈ I such that $A \subseteq \bigcup_{i \in I} B_i$.
An open cover or open covering of a subset A ⊆ X in a topological space (X, T ) is a covering (Bi )i∈I of A
in X such that Bi ∈ T for all i ∈ I.
15.7.2 Definition: A refinement of a covering (Bi )i∈I of a subset A in a set X is a covering C = (Cj )j∈J
of A in X such that
∀j ∈ J, ∃i ∈ I, Cj ⊆ Bi .
An open refinement of an open covering (Bi )i∈I of a subset A ⊆ X in a topological space (X, T ) is a refinement
C = (Cj )j∈J of A in the set X such that C is an open covering of A in the topological space (X, T ).
15.7.3 Definition: A locally finite subset of a topology T on a set X is a set C ⊆ T such that each point
of X has a neighbourhood which intersects at most a finite number of elements of C. That is,
∀x ∈ X, ∃G ∈ Topx (X), #({Ω ∈ C; Ω ∩ G ≠ ∅}) < ∞.
[ Need to find out under what assumptions on the topology a locally finite set of open sets is the same thing
as a pointwise finite set of open sets. ]
15.7.4 Definition: A compact set in a topological space (X, TX ) is a subset K of X such that every open
covering of K has a finite subcover; that is,
∀C ⊆ TX , ( K ⊆ ⋃C ⇒ ∃C′ ⊆ C, (K ⊆ ⋃C′ and #(C′) < ∞) ).
A compact topological space is a topological space (X, TX ) such that X is a compact set in X.
15.7.5 Remark: It seems that the characterization of compactness in Definition 15.7.4 is called the Heine-
Borel condition or Heine-Borel compactness.
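A standard example, not taken from the text, shows how the finite subcover condition can fail. The open interval (0, 1) is covered by the sets B_n = (1/n, 1) for n ≥ 2, but no finite subfamily covers it. The sketch below uses exact rational arithmetic; the function names are hypothetical, and the index list is assumed to contain only integers n ≥ 2.

```python
from fractions import Fraction

def covered(x, indices):
    """True if x lies in some B_n = (1/n, 1) with n in `indices`."""
    return any(Fraction(1, n) < x < 1 for n in indices)

def covering_index(x):
    """An index n >= 2 with x in B_n, for a rational x in (0, 1)."""
    return int(1 / x) + 1

def uncovered_witness(indices):
    """A point of (0, 1) missed by the finite subfamily (B_n) for n in `indices`."""
    return Fraction(1, max(indices))

finite_subfamily = [2, 5, 9]
w = uncovered_witness(finite_subfamily)
print("witness:", w, "covered by subfamily:", covered(w, finite_subfamily))
print("but covered by B_n with n =", covering_index(w))
```

The witness 1/N, where N is the largest index used, lies in (0, 1) but in none of the chosen sets, since 1/n ≥ 1/N for every n ≤ N.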
15.7.6 Theorem: If F is a closed subset of a compact set K, then F is compact.

Proof: Let C be an open cover for a closed subset F of a compact set K in a topological space (X, T ).
Define C′ = C ∪ {X \ F }. Then C′ is an open cover of K. So C′ has a finite subcover C1 of K.
Define C2 = C ∩ C1 . Then C2 is a finite subset of C. Since C1 ⊆ C ∪ {X \ F }, C2 must equal either C1 or
C1 \ {X \ F }. But C1 covers F , and C1 \ {X \ F } also covers F because (X \ F ) ∩ F = ∅. So C2 ⊆ C is a
finite subcover of F . The theorem follows.
15.7.7 Theorem: If X and Y are topological spaces, f : X → Y is continuous and X is compact, then f (X)
is compact. That is, the image of a compact set under a continuous map is compact.
[ Define compact-open topology for C(X, Y ). See EDM2 [35], 202.C, “mapping spaces”. EDM2 [35], 435.D,
states many useful properties of the compact-open topology. ]
15.7.8 Remark: The set of continuous functions from X to Y is denoted as C(X, Y ) in Notation 14.12.12.
Definition 15.7.9 defines a stronger topology on C(X, Y ) than the pointwise convergence topology in Defini-
tion 14.12.21.
15.7.9 Definition: The compact-open topology on the set of functions C(X, Y ) for topological spaces X
and Y is the topology generated by {ΩK,U ; K is a compact subset of X and U ∈ Top(Y )} on C(X, Y ),
where ΩK,U = {f ∈ C(X, Y ); f (K) ⊆ U } for all compact K ⊆ X and open U ⊆ Y .
[ Prove that the compact-open topology is stronger than the pointwise topology (if true). ]
15.7.10 Remark: Tikhonov’s Theorem states that the direct product of an arbitrary set of compact topo-
logical spaces is a compact topological space. (See Definition 15.1.4 for direct products of topological spaces.)
However, this theorem requires the axiom of choice. Therefore it is not quoted here.
[ Must present a form of Tikhonov’s Theorem which does not require AC. The full name of Tikhonov is
Andrei Nikolaevich Tikhonov (Андрей Николаевич Тихонов). Give Tikhonov’s theorem here, but tag it as AC-tainted. ]
15.7.11 Definition: A sequentially compact set in a topological space (X, TX ) is a subset K of X such
that every infinite sequence in K has a convergent subsequence.
15.7.12 Definition: A locally compact topology on a set X is a topology on X such that every point of
X has a compact neighbourhood. That is,
∀x ∈ X, ∃Ω ∈ Topx (X), $\overline{\Omega}$ is compact,
where $\overline{\Omega}$ means the closure of Ω in Top(X).
15.7.13 Remark: If X is a compact topological space, then X is locally compact.
15.7.14 Definition: A paracompact topology on a set X is a topology on X such that every open covering
of X has a locally finite open refinement.
15.7.15 Remark: If X is a compact topological space, then X is paracompact. [ Should give here an ex-
ample of a paracompact set which is not locally compact. ] Any metrizable topological space is paracompact.
So paracompactness is a fairly weak compactness property. (See Chapter 17 for metric spaces.)
The usefulness of paracompactness in differential geometry is that it guarantees the existence of partitions
of unity, according to Warner [50], page 8. Warner [50], lemma 1.9, page 9, says that a topological space
is paracompact if it is locally compact, Hausdorff and second countable, and since manifolds are second
countable, this means that all manifolds are paracompact. In fact, they are all metrizable. (Proof in
Kelley [118].)
[ Must check that none of the results in Remark 15.7.15 depend on the axiom of countable choice. Is the
axiom of countable choice required? ]

[Figure 15.7.1: Family tree of compactness classes. Nodes, each a class of spaces (X, TX ): topological space; Hausdorff space; locally compact space; paracompact space; locally Euclidean space; locally compact, Hausdorff, second countable space; compact space.]
15.7.16 Remark: The above classes of topological spaces are summarized in the family tree in Fig-
ure 15.7.1.
The fact that the Hausdorff property is not implied by the compactness properties is proven by the trivial
topology {∅, IR} for IR. This is clearly not Hausdorff, but it is compact because all open covers are finite. A
topological manifold is defined as a locally Euclidean Hausdorff space in Definition 25.3.1. See EDM2 [35],
article 425, page 1618, for a comprehensive set of family trees for topological spaces.
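The same phenomenon can be seen on a two-point set, which is used here as a finite stand-in for IR (an assumption made only so that the check can be run exhaustively). With the trivial topology the space has only two open sets, so every open cover is finite, yet the two points cannot be separated. The helper name `is_hausdorff` is illustrative.

```python
def is_hausdorff(X, T):
    """True if any two distinct points of the finite space (X, T) have disjoint open neighbourhoods."""
    T = [frozenset(G) for G in T]
    return all(
        any(x in G and y in H and not (G & H) for G in T for H in T)
        for x in X for y in X if x != y
    )

X = {0, 1}
trivial = [set(), {0, 1}]
discrete = [set(), {0}, {1}, {0, 1}]
print("trivial topology Hausdorff:", is_hausdorff(X, trivial))
print("discrete topology Hausdorff:", is_hausdorff(X, discrete))
```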
[ Near here, present complete spaces, sequential compactness, etc. ]
15.8. Topological properties of real number intervals

15.8.1 Remark: For intervals of real numbers in Theorem 15.8.2, see Definition 8.3.10. For the topology
on IR, see Definition 14.10.1.
15.8.2 Theorem: A set I ⊆ IR is connected in the usual topology on IR if and only if I is an interval.
Proof: The empty set and all singletons are real intervals and are connected. So assume that a set I ⊆ IR
contains at least two elements. A non-empty subset I of IR is an interval if and only if I ⊇ (inf I, sup I). (See
Theorem 8.3.11.) Since I has two or more elements, inf I < sup I. So if I is not an interval, then there
must be elements x, z ∈ I and y ∈ IR \ I such that x < y < z. (Otherwise I ⊇ (inf I, sup I) would hold.)
Hence I is disconnected by the open sets Ω1 = (−∞, y) and Ω2 = (y, ∞) which cover I.
Now suppose that I is not connected. . . .
[ Simmons [140], page 143, has a proof. Should produce a more satisfying version of that here. ]
15.8.3 Remark: From the topological point of view, two real intervals are equivalent if they are homeo-
morphic, but there is a further distinction according to whether the homeomorphism is order-preserving.
The following table classifies intervals into equivalence classes with respect to order-preserving (increasing)
homeomorphisms. It is assumed that a, b ∈ IR with a < b.
type               intervals                            properties of intervals
0.  empty          ∅                                    compact, closed, bounded
1.  single-point   [a, a]                               compact, closed, bounded
2.  closed         [a, b]                               compact, closed, bounded
3a. closed-open    [a, b), [a, ∞)                       not open; closed only when unbounded
3b. open-closed    (a, b], (−∞, b]                      not open; closed only when unbounded
4.  open           (a, b), (a, ∞), (−∞, b), (−∞, ∞)     open
There are 6 equivalence classes of intervals for oriented homeomorphisms and 5 classes for unoriented homeo-
morphisms because (3a) and (3b) are equivalent if reversals are permitted. Since all topological properties
of the image of a curve are invariant under homeomorphisms of the parameter interval, one may represent
all possibilities in terms of bounded intervals, and one may reduce these intervals to the canonical case that
a = 0 and b = 1.
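The classification in the table can be expressed as a small decision procedure. The sketch below is illustrative: the representation of an interval by two end-points and two inclusion flags, and the name `classify_interval`, are assumptions made for the example. Infinite end-points are treated as never included.

```python
import math

def classify_interval(a, b, left_closed, right_closed):
    """Return the type label (0, 1, 2, 3a, 3b or 4) of the real interval with end-points a and b."""
    # An infinite end-point is never included in the interval.
    left_closed = left_closed and a != -math.inf
    right_closed = right_closed and b != math.inf
    if a > b or (a == b and not (left_closed and right_closed)):
        return "0"   # empty
    if a == b:
        return "1"   # single point [a, a]
    if left_closed and right_closed:
        return "2"   # closed [a, b]
    if left_closed:
        return "3a"  # closed-open [a, b) or [a, oo)
    if right_closed:
        return "3b"  # open-closed (a, b] or (-oo, b]
    return "4"       # open

print(classify_interval(0, 1, True, True))
print(classify_interval(0, math.inf, True, False))
print(classify_interval(-math.inf, math.inf, False, False))
```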

15.9. Topological dimension

Topological dimension has some relevance to the determination of sufficient conditions for a locally compact
transformation group to be a Lie group. (See Section 33.1.) [ See EDM2 [35], section 117.B. ]
15.9.1 Definition: The Lebesgue dimension of a normal topological space (X, T ) is the smallest $n \in \mathbb{Z}^+_0$
such that for any finite open covering G of X (i.e. a collection $G = (G_i)_{i=1}^s$ such that $G_i \in T$ for all i = 1 . . . s
and $X = \bigcup_{i=1}^s G_i$), for some refinement H of G (i.e. a collection $H = (H_i)_{i=1}^s$ such that $H_i \subseteq G_i$ for all
i = 1 . . . s and $X = \bigcup_{i=1}^s H_i$), for all sets I ⊆ {1 . . . s} such that #(I) = n + 2, the set $\bigcap_{i \in I} H_i$ is empty.
The Lebesgue dimension of X is denoted as dim X or dim(X).
If no such $n \in \mathbb{Z}^+_0$ exists, then the dimension of (X, T ) is infinite. This is denoted dim X = ∞.
[ You can’t be serious! Definition 15.9.1 is way too convoluted. What does it really mean? Examples are
required. Whose big idea was it to invent this definition anyway? ]
[ Must define cardinality of sets in the numbers section. ]
15.9.2 Remark: Definition 15.9.1 obviously requires an example or two for clarification.
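As one possible illustration of the condition in Definition 15.9.1, consider X = [0, 1] with a cover by three open intervals which all overlap in the middle. The sketch below is an example chosen for this purpose, not a computation of the dimension. It shrinks each interval to obtain a refinement H with Hi ⊆ Gi in which every three members have empty intersection, which is the case n = 1 of the definition. The particular intervals and all function names are assumptions made for the example.

```python
from itertools import combinations

def common_intersection_nonempty(intervals):
    """Open intervals (a, b) share a point iff the largest left end is below the smallest right end."""
    return max(a for a, _ in intervals) < min(b for _, b in intervals)

def order_of_cover(cover):
    """Largest k such that some k members of the cover share a point, minus 1."""
    largest = 0
    for k in range(1, len(cover) + 1):
        if any(common_intersection_nonempty(sub) for sub in combinations(cover, k)):
            largest = k
    return largest - 1

def shrinks(H, G):
    """True if H_i is a subset of G_i for each index i."""
    return len(H) == len(G) and all(g[0] <= h[0] and h[1] <= g[1] for h, g in zip(H, G))

def covers_unit_interval(cover):
    """True if the open intervals cover [0, 1]."""
    reach = 0.0  # every point of [0, reach) is already covered
    while reach <= 1.0:
        ends = [b for a, b in cover if a < reach < b]
        if not ends:
            return False
        reach = max(ends)
    return True

G = [(-0.1, 0.6), (0.2, 0.8), (0.4, 1.1)]    # all three overlap on (0.4, 0.6)
H = [(-0.1, 0.35), (0.3, 0.7), (0.65, 1.1)]  # shrunk so that only neighbours overlap

print("order of G:", order_of_cover(G))
print("order of H:", order_of_cover(H))
print("H refines G index by index:", shrinks(H, G))
print("H covers [0, 1]:", covers_unit_interval(H))
```

For this cover the value n = 0 is not attainable, because a refinement with pairwise disjoint open members cannot cover the connected set [0, 1] unless a single member does so. This is consistent with the expectation that an interval has dimension 1, although one example does not prove it.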
[ Find out what the relation is between Lebesgue dimension and Euclidean dimension. Also find out what
this has to do with n-cells. ]
[ Near here, there could be a short section stating a fairly complete set of properties of the empty set topology
and the topologies on sets with one or two elements, even maybe with 3 elements. Alternatively, the
properties of such discrete topologies could be set as easy exercises. ]
15.10. Set union topology

15.10.1 Theorem: Let (X1 , T1 ) and (X2 , T2 ) be topological spaces such that X1 ∩ X2 = ∅. Then the set
T = {Ω1 ∪ Ω2 ; Ω1 ∈ T1 and Ω2 ∈ T2 } is the weakest topology on X1 ∪ X2 such that T ⊇ T1 ∪ T2 .
Proof: The closure of T with respect to pairwise intersection follows from the identity
$\bigcap_i (\Omega^i_1 \cup \Omega^i_2) = \bigl(\bigcap_i \Omega^i_1\bigr) \cup \bigl(\bigcap_i \Omega^i_2\bigr)$,
which holds if $\Omega^i_1 \subseteq X_1$ and $\Omega^i_2 \subseteq X_2$ for all i, and X1 ∩ X2 = ∅. The closure with respect
to arbitrary unions follows from the corresponding identity for unions. The minimality of the topology T
follows from the fact that any topology T′ on X1 ∪ X2 with T′ ⊇ T1 ∪ T2 must contain at least all of the unions of elements
of T1 ∪ T2 .
15.10.2 Definition: The disjoint set union topology for disjoint sets X1 and X2 with topologies T1 and
T2 respectively is the topology T = {Ω1 ∪ Ω2 ; Ω1 ∈ T1 and Ω2 ∈ T2 }.
15.10.3 Remark: Definition 15.10.6 introduces a sort of ‘graft’ of two or more topologies. The idea here
is to try to define a topology on the set X1 ∪ X2 , especially in the case that X1 ∩ X2 is non-empty. The
topologies of manifolds and fibre spaces are often defined as a “graft of patches”. This is a very general
mechanism for creating topologically interesting spaces out of flat, boring spaces without having to resort to
induced topologies of embeddings or various quotient topologies. The identification topology can probably
be expressed as the quotient of the “union topology” (of two nominally non-intersecting topological spaces)
with respect to an appropriate relation on the base set union.
A more general form of “identification topological space” can be defined with the aid of Definition 6.10.5,
which defines identification spaces of arbitrary families of sets. If the topologies on each member of such
a family are consistent, then a topological identification space is well-defined. This is presented in Defini-
tion 15.11.2.
[ Theorem 15.10.5 is clearly superseded by Theorem 15.10.7. ]

[ See EDM2 [35], section 425.M for discussion of “topological sums”, which seem to be related to the set-union
topology. ]
15.10.4 Remark: The conditions ∀Ω1 ∈ T1 , Ω1 ∩ X2 ∈ T2 and ∀Ω2 ∈ T2 , Ω2 ∩ X1 ∈ T1 in Theorem 15.10.5
are equivalent to requiring the identity map idX1 ∩X2 to be a homeomorphism for X1 ∩ X2 with the relative
topologies from T1 and T2 respectively.

15.10.5 Theorem: Suppose two topological spaces (X1 , T1 ) and (X2 , T2 ) are such that Ω1 ∩ X2 ∈ T2 for
all Ω1 ∈ T1 and Ω2 ∩ X1 ∈ T1 for all Ω2 ∈ T2 . Then the set T = {Ω1 ∪ Ω2 ; Ω1 ∈ T1 and Ω2 ∈ T2 } is the
weakest topology on X1 ∪ X2 such that T ⊇ T1 ∪ T2 .
Proof: It must be shown that the set T is a topology for X1 ∪ X2 . As in the proof of Theorem 15.10.1, it is
sufficient to show that T is closed under pairwise intersections and arbitrary unions. (See Definition 14.3.3.)
So consider $\Omega^1_1, \Omega^2_1 \in T_1$ and $\Omega^1_2, \Omega^2_2 \in T_2$. It must be shown that $\Omega = (\Omega^1_1 \cup \Omega^1_2) \cap (\Omega^2_1 \cup \Omega^2_2) \in T$. By
distributivity,
$\Omega = (\Omega^1_1 \cap \Omega^2_1) \cup (\Omega^1_1 \cap \Omega^2_2) \cup (\Omega^1_2 \cap \Omega^2_1) \cup (\Omega^1_2 \cap \Omega^2_2)$.
Clearly $\Omega^1_1 \cap \Omega^2_1 \in T_1$ and $\Omega^1_2 \cap \Omega^2_2 \in T_2$. Since $\Omega^1_1 \cap \Omega^2_2 = (\Omega^1_1 \cap X_2) \cap \Omega^2_2$, it follows that $\Omega^1_1 \cap \Omega^2_2 \in T_2$ by
the assumptions of the theorem. Similarly $\Omega^1_2 \cap \Omega^2_1 \in T_2$. So Ω is a union of an element of T1 with three
elements of T2 . From the closure of T2 under unions, it follows that Ω ∈ T . The closure of T under arbitrary
unions follows trivially from the associativity and commutativity of set unions.
15.10.6 Definition: Suppose two topological spaces (X1 , T1 ) and (X2 , T2 ) are such that Ω1 ∩ X2 ∈ T2 for
all Ω1 ∈ T1 and Ω2 ∩ X1 ∈ T1 for all Ω2 ∈ T2 . (This is the same as requiring the identity map idX1 ∩X2 to be
a homeomorphism for X1 ∩ X2 with the relative topologies from T1 and T2 respectively.) Then the set union topology of these two topological spaces is the
topological space (X1 ∪ X2 , T ), where T = {Ω1 ∪ Ω2 ; Ω1 ∈ T1 and Ω2 ∈ T2 }.
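The compatibility conditions and the resulting set T can be checked exhaustively for small finite spaces. The sketch below is an illustration only; the helper names `opens`, `compatible`, `union_family` and `is_topology` are hypothetical. The first pair of spaces overlaps in an open set and satisfies the conditions. The second pair does not, and its family of unions fails to be a topology.

```python
def opens(*sets):
    """Build a finite topology as a set of frozensets from tuples of points."""
    return {frozenset(s) for s in sets}

def is_topology(X, T):
    """Check the open-set axioms for a finite family T of subsets of X."""
    X = frozenset(X)
    if frozenset() not in T or X not in T:
        return False
    return all((a | b) in T and (a & b) in T for a in T for b in T)

def compatible(X1, T1, X2, T2):
    """The conditions: every T1-open set meets X2 in a T2-open set, and vice versa."""
    return all((G & X2) in T2 for G in T1) and all((G & X1) in T1 for G in T2)

def union_family(T1, T2):
    """The family of all unions of a T1-open set with a T2-open set."""
    return {G1 | G2 for G1 in T1 for G2 in T2}

# Compatible pair: the overlap {2, 3} is open in both spaces.
X1, T1 = frozenset({1, 2, 3}), opens((), (2, 3), (1, 2, 3))
X2, T2 = frozenset({2, 3, 4}), opens((), (2, 3), (2, 3, 4))
T = union_family(T1, T2)
print(compatible(X1, T1, X2, T2), is_topology(X1 | X2, T), len(T))

# Incompatible pair: the overlap {2} is open in neither trivial topology.
Y1, S1 = frozenset({1, 2}), opens((), (1, 2))
Y2, S2 = frozenset({2, 3}), opens((), (2, 3))
S = union_family(S1, S2)
print(compatible(Y1, S1, Y2, S2), is_topology(Y1 | Y2, S))
```

When X1 ∩ X2 = ∅, the conditions hold automatically because every intersection Ω1 ∩ X2 is empty, which recovers the disjoint case.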
15.10.7 Theorem: Suppose $(X_i, T_i)_{i \in I}$ is a family of topological spaces such that for all i, j ∈ I, Ωi ∩ Xj ∈
Tj for all Ωi ∈ Ti . Then the set $T = \{\bigcup_{i \in I} \Omega_i;\ \forall i \in I,\ \Omega_i \in T_i\}$ is the weakest topology on $\bigcup_{i \in I} X_i$ such
that $T \supseteq \bigcup_{i \in I} T_i$.
Proof: It is sufficient to show that T is a topology for $\bigcup_{i \in I} X_i$. Since T is clearly closed under arbitrary
unions, it is sufficient to show that it is closed under pairwise intersection. Consider two families of open
sets $(\Omega^1_i)_{i \in I}$ and $(\Omega^2_i)_{i \in I}$, where $\Omega^1_i, \Omega^2_i \in T_i$ for all i ∈ I. It must be shown that
$\Omega = \bigl(\bigcup_{i \in I} \Omega^1_i\bigr) \cap \bigl(\bigcup_{j \in I} \Omega^2_j\bigr)$
is in T . By distributivity of union and intersection operations, $\Omega = \bigcup_{i,j \in I} (\Omega^1_i \cap \Omega^2_j)$. For any i, j ∈ I,
$\Omega^1_i \cap \Omega^2_j = \Omega^1_i \cap (\Omega^2_j \cap X_i)$, and $\Omega^2_j \cap X_i \in T_i$ by the assumptions of the theorem. So $\Omega^1_i \cap \Omega^2_j \in T_i$. It follows
that Ω ∈ T .
15.10.8 Definition: Suppose $(X_i, T_i)_{i \in I}$ is a family of topological spaces such that for all i, j ∈ I, Ωi ∩ Xj ∈
Tj for all Ωi ∈ Ti . Then the set union topology of the family $(X_i, T_i)_{i \in I}$ is the topological space $\bigl(\bigcup_{i \in I} X_i, T\bigr)$,
where $T = \{\bigcup_{i \in I} \Omega_i;\ \forall i \in I,\ \Omega_i \in T_i\}$.
15.11. Topological identification spaces

“Topological identification spaces” are topological spaces constructed by grafting together patches of topo-
logical spaces. This is how topological manifolds are often defined in practice. [ See EDM2 [35], 425.L, for
identification spaces. ] (If you have something better to do, like making a cup of tea or feeding the hamster,
this would be a good time to skip a section and come back later. This is one of the less interesting sections.)
15.11.1 Remark: Definition 15.11.2 requires some explanation. It is effectively the same as Defini-
tion 15.10.8 except that the sets (Xi )i∈I are first grafted into a set X before the tests are applied to the
topologies on the pairwise intersections of the sets Xi . In other words, the identification topology is really
the same thing as a set union topology except that an equivalence relation must first be applied to the sets
in the family to determine which points in the sets are supposed to be identified.
A useful mental image for the definition of a topological identification space is a football made out of many
patches of material sewn together with overlapping flaps.
[ Maybe graft spaces could give a useful perspective on the way tangent bundles are defined by identifying
point/vector pairs according to transformation rules. ]
15.11.2 Definition: Let (Xi , Ti )i∈I be a family of topological spaces. Let $X \subseteq \mathring{\times}_{i \in I} X_i$ be an identification
space of the sets (Xi )i∈I . (See Definition 6.10.5.) The family (Xi , Ti )i∈I is said to be (topologically)
consistent with the identification space X if for all i, j ∈ I, for all Ωi ∈ Ti ,
{xj ∈ Xj ; (xk )k∈K ∈ X and xi ∈ Ωi } ∈ Tj .
When the family of topological spaces (Xi , Ti )i∈I is consistent with the identification space $X \subseteq \mathring{\times}_{i \in I} X_i$,
the identification topology on X is the set T defined by
$T = \{\bigcup_{i \in I} f_i(\Omega_i);\ \forall i \in I,\ \Omega_i \in T_i\}$,
where fi : Xi → X is defined so that for all y ∈ Xi , fi (y) is the unique x ∈ X such that y = xi . (See
Remark 6.10.6.)
The topological space (X, T ) may be called a topological identification space of the family (Xi , Ti )i∈I .
15.11.3 Remark: The ‘topological consistency’ in Definition 15.11.2 simply means that the identification
space transition functions are continuous. The topology T of an identification space X is the weakest topology
for which projections from X to the patch topologies are continuous.
15.11.4 Theorem: Let X be a topological space, and let τ : Y → X be a map from a set Y onto X. Then
τ −1 (Top(X)) = {τ −1 (G); G ∈ Top(X)} is a topology on Y , and (Y, τ −1 (Top(X))) ≈ (X, Top(X)).
Proof: [ See EDM 376.D for the relevant properties of f −1 . ]
15.11.5 Remark: Theorem 15.11.6 is used in Section 43.3. [ Explain the motivation. ]
15.11.6 Theorem: Let X be a topological space, and let τ : X → Y be a map from X onto a set Y . If
τ −1 ◦ τ (Ω) ∈ Top(X) for all Ω ∈ Top(X), then τ (Top(X)) is a topology on Y . Also τ −1 ◦ τ (Top(X)) is a
topology on X and (X, τ −1 ◦ τ (Top(X))) ≈ (Y, τ (Top(X))).
Proof: Let X be a topological space, and τ : X → Y a map from X onto Y . Suppose τ −1 ◦ τ maps open
sets of X to open sets of X. Let T = τ (Top(X)). Then
(i) ∅ = τ (∅) and Y = τ (X). So ∅ and Y are in T .
(ii) Let Ω1 , Ω2 ∈ T . Then Ω1 = τ (G1 ) and Ω2 = τ (G2 ) for some G1 , G2 ∈ Top(X). Let G′1 = τ −1 (Ω1 ) and
G′2 = τ −1 (Ω2 ). Then G′1 , G′2 ∈ Top(X). So G′1 ∩ G′2 ∈ Top(X). Therefore Ω1 ∩ Ω2 = τ (G′1 ∩ G′2 ) ∈ T .
(iii) Let C ⊆ T . Then for some S ⊆ Top(X), $\bigcup C = \bigcup_{G \in S} \tau(G) = \tau\bigl(\bigcup_{G \in S} G\bigr) \in T$.
This shows that T is a topology on Y . The remaining statements of the theorem follow immediately from
Theorem 15.11.4.
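The role of the hypothesis in Theorem 15.11.6 can be seen in a small finite example. The sketch below is illustrative; the maps, the topology and the helper names are assumptions chosen for the example. For the first map the open sets are unions of fibres, so τ −1 ◦ τ preserves openness and the image family is a topology. For the second map the hypothesis fails and the image family is not closed under intersection.

```python
def opens(*sets):
    """Build a finite topology as a set of frozensets from tuples of points."""
    return {frozenset(s) for s in sets}

def is_topology(X, T):
    """Check the open-set axioms for a finite family T of subsets of X."""
    X = frozenset(X)
    if frozenset() not in T or X not in T:
        return False
    return all((a | b) in T and (a & b) in T for a in T for b in T)

def image(tau, G):
    """Image of the set G under the map tau, given as a dict."""
    return frozenset(tau[x] for x in G)

def preimage(tau, H):
    """Inverse image of the set H under the map tau."""
    return frozenset(x for x in tau if tau[x] in H)

def saturation_is_open(tau, T):
    """The hypothesis: the inverse image of the image of each open set is open."""
    return all(preimage(tau, image(tau, G)) in T for G in T)

def image_family(tau, T):
    """The family of images of all open sets."""
    return {image(tau, G) for G in T}

T = opens((), (1, 2), (3, 4), (1, 2, 3, 4))     # a topology on X = {1, 2, 3, 4}
tau_good = {1: "a", 2: "a", 3: "b", 4: "b"}     # fibres {1, 2} and {3, 4} are open
tau_bad = {1: "a", 2: "b", 3: "b", 4: "c"}      # fibre {2, 3} straddles two open sets

for tau in (tau_good, tau_bad):
    Y = set(tau.values())
    print(saturation_is_open(tau, T), is_topology(Y, image_family(tau, T)))
```

The second map shows only that the conclusion can fail when the hypothesis fails. It does not show that the hypothesis is necessary in general.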

Chapter 16
Topological curves, paths and groups
16.1 Curve and path terminology and definition options
16.2 Curves
16.3 Path-equivalence relations for curves
16.4 Paths
16.5 Convex curvilinear interpolation in affine spaces
16.6 Algebraic topology
16.7 Topological groups
16.8 Topological transformation groups
16.9 Topological vector spaces
16.10 Network topology and continuous paths in networks
16.1. Curve and path terminology and definition options

There is some variation among authors in the use of the words “curve” and “path”. This section is a
discussion of the options for defining these concepts.
16.1.1 Remark: The candidates for definitions of curves and paths include the following styles of struc-
tures, where X is a topological space of points and I is a real-number interval. (The various kinds of real
intervals are presented in Definition 8.3.10 and Remark 15.8.3.)
(1) A map γ : I → X for intervals I ⊆ IR.
(2) An image set S = γ(I) of a map γ : I → X.
Most differential geometry texts define a “curve” in style (1). This is sometimes referred to as a “parametrized
curve” or “continuous curve”. (Examples are Ahlfors [94], Crampin/Pirani [12], Darling [14], do Carmo [17],
Frankel [19], Gallot/Hulin/Lafontaine [20], Lang [31], Lee [33], Misner/Thorne/Wheeler [38], Rudin [137],
Spivak [43] and Szekeres [45].)
Some texts (like Greenberg/Harper [114]) use the word “path” for structure style (1). Some texts (like
Lang [31]) use both words “curve” and “path” for style (1). A minority of texts use the word “curve” for
structure style (2). (Examples are EDM2 [35] and Simmons [140].)
The pattern which seems to emerge is as follows.
(i) Structure style (1) is the most popular by far. Style (2) is much less popular.
(ii) The word “curve” is used by most DG texts for structure style (1). A small number of authors use the
word “path” for style (1), either exclusively or in addition to the word “curve”.
(iii) The word “path” is used only by a small minority of DG texts.
(iv) The texts which do not use the word “curve” for structure (1) are predominantly on non-DG subjects
such as real analysis, complex analysis and topology.
The most popular names for structures (1) and (2) are “curve”, “path”, “arc”, “locus” and “contour”. Other
plausible names are “trajectory”, “track”, “route”, “journey” and “traversal”.
Styles (3) and (4) are not found at all in the author’s survey of texts, but they do have some benefits.

(3) A map γ : I → X from which some redundant information has been removed.
(4) A set S ⊆ X to which some extra information has been added.
Structure style (1) has the most information. Style (2) has the least information. Styles (3) and (4) have
intermediate amounts of information. The removal of information in case (3) may be achieved by replacing
the map γ in case (1) with an equivalence class of such maps. The addition of information in case (4) may be
achieved by attaching extra structures (such as start and end points, the direction of traversal, or an order
structure) to the image set S of case (2).
16.1.2 Remark: In the mathematical theory of networks (sometimes called “graph theory”), a “path”
generally means an ordered sequence of links (or “edges”) in the network. Although the phrase “path from
A to B” has a sense of directionality, this suggests only a sequence of traversal, not the time at which each
point should be reached. So the function concept (1) in Remark 16.1.1 has far too much information for a
path. Perhaps a total order on the path would be the best way of representing the English-language idea of
a path from one point to another.
16.1.3 Remark: In this book, the word “curve” (Definition 16.2.4) signifies the map structure (1) in
Remark 16.1.1 and “path” signifies an equivalence-class structure (3), whereas concept (2) is called simply
the “image” of the curve or path. In other words, a curve will be a map from an interval to a topological
space whereas a “path” means a curve from which some or all information about the method of traversal
has been removed.
16.1.4 Remark: There may be many different path definitions according to the choice of equivalence
relation. This is very much analogous to the way in which there is a multiplicity of definitions of manifolds –
topological, C k differentiable, analytic, and so forth. Concept (1) in Remark 16.1.1 is similar to a manifold
chart (defined in the inverse direction), whereas concept (3) resembles a maximal atlas for a manifold.
Concept (2) resembles the base set of a manifold without the charts or topology. (See Remark 16.1.8 for
comments on inverse atlases for curves and paths.)
16.1.5 Remark: What is often wanted is a curve definition with the redundant information removed.
Curves which have the same image are sometimes regarded as equivalent, but the image alone does not
usually contain all of the desired information. It is helpful to look at some examples. Consider the maps
γ1 : [0, π] → IR2 with γ1 : t → (cos t, 0) and γ2 : [0, π] → IR2 with γ2 : t → (cos 3t, 0). (See Figure 16.1.1.)
Both maps have the image set [−1, 1] × {0}, and they have the same start and end points. Given the image
of a path with self-intersections, it is impossible to determine the order in which the image is traversed.
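[Figure 16.1.1: Curves with same image set and end-points. Left panel: γ1 : t ↦ (cos t, 0) on the segment from −1 to 1 in IR, with marked parameter values t = 0 and t = π. Right panel: γ2 : t ↦ (cos 3t, 0) on the same segment, with marked parameter values t = 0, t = π/3, t = 2π/3 and t = π.]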
In the case of a space-filling curve, essentially all parametrization information is lost. (See Example 42.1.2
for space-filling curves.) Therefore even if a curve is known to be one-to-one, the traversal sequence cannot
be determined from only the image set and the start and end points. Therefore it is better to either specify
the traversal order explicitly as an abstract order relation, or to specify a single map γ together with an
equivalence relation on the set of all curves.
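The difference between γ1 and γ2 can be made concrete by sampling both curves. The sketch below is an illustration in which the sampling density and the function names are arbitrary choices. It confirms that the two curves have the same end-points and the same range of first coordinates, but that γ2 reverses direction twice while γ1 never does.

```python
import math

def gamma1(t):
    """The curve t -> (cos t, 0) on [0, pi]."""
    return (math.cos(t), 0.0)

def gamma2(t):
    """The curve t -> (cos 3t, 0) on [0, pi]."""
    return (math.cos(3 * t), 0.0)

def sample(curve, n=600):
    """Points of the curve at n + 1 equally spaced parameter values in [0, pi]."""
    return [curve(math.pi * k / n) for k in range(n + 1)]

def count_reversals(points):
    """Number of changes of direction of the first coordinate along the samples."""
    steps = [q[0] - p[0] for p, q in zip(points, points[1:])]
    signs = [1 if d > 0 else -1 for d in steps if d != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

p1, p2 = sample(gamma1), sample(gamma2)
print("start points:", p1[0], p2[0])
print("end points:", p1[-1], p2[-1])
print("reversals:", count_reversals(p1), count_reversals(p2))
```

The image set and the end-points agree, so neither can distinguish the two traversals. The reversal count is information carried only by the parametrization.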
16.1.6 Remark: The difference between sets and curves may be compared with the difference between sets
and sequences. Sets are often indexed for convenience even when the choice of index is irrelevant. Sometimes

the order of a sequence of objects is important, sometimes not. The parametrization of paths is analogous
to the indexing of sets. The choice of index map for the set doesn’t matter as long as the order is right.
The parametrization of a curve is often of little importance apart from its order. Similarly, the choice of
charts for a manifold is often of little importance as long as the topology and differentiable structure are as
intended. One can remove the superfluous details of the choice of parametrization by defining an equivalence
class of parametrizations or by simply declaring parametrizations to be equivalent with respect to some
specified equivalence relation. These are the usual ways of doing things when a mathematical structure
contains superfluous information which should be ignored.
16.1.7 Remark: Despite some similarities between curves and 1-dimensional manifolds, they are not really
the same thing. The charts of curves map from IR to the point set, whereas 1-manifold charts are from the
manifold to IR. (See Figure 16.1.2.)
[Figure 16.1.2: Contrast between curve and one-manifold. Curve: a map γ : I → M with I = Dom(γ) ⊆ IR and S = γ(I) ⊆ M . One-manifold: partial maps ψ1 and ψ2 from S to IR with Dom(ψ1 ), Dom(ψ2 ) ⊆ S ⊆ M and ψ1 (S), ψ2 (S) ⊆ IR.]
This is a necessary difference because curves may self-intersect. Therefore the map for a curve may not be
injective. A curve is not just a point-set with a given topology. A curve is a parametrized trajectory within
a topological space. A manifold structure would be more suitable for level curves of a real-valued function.
Strictly speaking, “level curves” should perhaps be called “level manifolds”.
16.1.8 Remark: A path could be fully defined by analogy with manifolds as a pair (S, A) where S ⊆ M
is a subset of a topological space M , and A is a set of continuous maps γ : Iγ → S for intervals Iγ ⊆ IR.
(See Figure 16.1.3.)
[Figure 16.1.3: Atlas of curves for a path. Three curves γ1 , γ2 , γ3 with Dom(γ1 ), Dom(γ2 ), Dom(γ3 ) ⊆ IR map into the path set $S = \bigcup_{\gamma \in A} \mathrm{Im}(\gamma) \subseteq M$.]
From this perspective, the curve map in concept (1) in Remark 16.1.1 is a chart γ ∈ A, the image set in
concept (2) is the set S, and the equivalence class suggested by concept (3) is the atlas A. For any two maps
γ1 , γ2 ∈ A, one may construct monotonic surjective continuous functions β1 : I → Iγ1 and β2 : I → Iγ2 such
that γ1 ◦ β1 = γ2 ◦ β2 . (The technicalities here are explained in Section 16.4.)
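A pair of reparametrization functions can be exhibited for a simple case. In the sketch below, γ1 is the curve t ↦ (cos t, 0) on [0, π] from Remark 16.1.5, while γ2 , β1 and β2 are hypothetical choices made for illustration: γ2 (s) = (cos(πs²), 0) on [0, 1], β1 (u) = πu and β2 (u) = √u on I = [0, 1]. Both β1 and β2 are continuous, non-decreasing and surjective, and the check confirms numerically that γ1 ◦ β1 and γ2 ◦ β2 agree on I.

```python
import math

def gamma1(t):
    """The curve t -> (cos t, 0) on [0, pi]."""
    return (math.cos(t), 0.0)

def gamma2(s):
    """A hypothetical second parametrization s -> (cos(pi s^2), 0) on [0, 1]."""
    return (math.cos(math.pi * s * s), 0.0)

def beta1(u):
    """Reparametrization from I = [0, 1] onto [0, pi]."""
    return math.pi * u

def beta2(u):
    """Reparametrization from I = [0, 1] onto [0, 1]."""
    return math.sqrt(u)

def max_gap(n=1000):
    """Largest difference between gamma1(beta1(u)) and gamma2(beta2(u)) over n + 1 samples of I."""
    return max(
        abs(gamma1(beta1(k / n))[0] - gamma2(beta2(k / n))[0])
        for k in range(n + 1)
    )

print("largest sampled difference:", max_gap())
```

The sampled difference is of the order of floating-point rounding error, which is what the identity cos(π(√u)²) = cos(πu) predicts.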
If the reparametrization functions β1 and β2 are non-decreasing for all pairs of maps, the path is oriented.
Other constraints could be put on reparametrization maps. For instance, they could be required to be affine,
C^k, analytic or an isometry. (This is related to the concept of a pseudogroup. See Section 19.4.) Each
transition map constraint would yield a different class of path. One could even have transition maps which
are of different regularity classes on different subsets of the path. In the case of rectifiable curves, Lipschitz
transition maps might be appropriate.
One could now proceed to develop all of the concepts of topological and differentiable manifolds for paths
of the form (S, A). After defining paths, one could define a family of paths as a pair (S, A) where each γ ∈ A
is a multi-parameter map γ : Iγ → S with Iγ ⊆ IR^n for n ∈ ℤ^+_0. The advantage of this kind of inverse
atlas construction for n-manifolds is that it can represent in a natural way surfaces which have complex
self-intersections. The important point to note is that the charts are only required to be continuous, not
necessarily homeomorphisms.
If one expands an atlas of curves by completing the atlas with respect to equivalence of direction only,
then the information left in the atlas corresponds to a total order on the path set. In this case, one may as
well use instead the concept of an “ordered traversal” which was introduced in Definition 7.1.17.
A simple kind of path-chart equivalence would be to declare a set A of curves for a path set to be equivalent
if γ1^{−1} ◦ γ2 : IR →˚ IR is continuous for all γ1 and γ2 in A. These curve transition maps may also be required
to have various regularity properties.
Perhaps a much more interesting concept of “path-atlas” would generalize the index set I to an open subset
of IR^n for general n ∈ ℤ^+_0. As in Definition 7.1.17, the analytical structure on the parameter set I could
be replaced with an order structure, because sometimes one is only interested in order of traversal, not in
the analytical properties of the traversal. In the interests of minimalism, the structure on I = IR^n, for
example, could be a partial order such as x ≤ y ⇔ (∀i ∈ ℕ_n, xi ≤ yi) or the lexicographic total order
x ≤ y ⇔ ∀j ∈ ℕ_n, ((∀i ∈ ℕ_n, i < j ⇒ xi = yi) ⇒ xj ≤ yj). (This lexicographic total order may also be
expressed as x ≤ y ⇔ ∀j ∈ ℕ_n, ((xj ≤ yj) ∨ (∃i ∈ ℕ_n, (i < j ∧ xi ≠ yi))), as may be readily verified by
the curious reader. See Exercise 46.7.10 and Section 7.1.)
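The equivalence of the two expressions for the lexicographic order can be checked by brute force on a small finite grid. The following Python sketch is only an illustration, not a proof: the function names and the choice of the grid {0, 1, 2}^3 are assumptions made here, and indices run from 0 rather than 1. Python's built-in tuple comparison, which is lexicographic, serves as an independent reference.

```python
from itertools import product


def lex_le_implication(x, y):
    """x <= y iff for all j: (x_i == y_i for all i < j) implies x_j <= y_j."""
    return all(
        (not all(x[i] == y[i] for i in range(j))) or x[j] <= y[j]
        for j in range(len(x))
    )


def lex_le_disjunction(x, y):
    """x <= y iff for all j: x_j <= y_j, or x_i != y_i for some i < j."""
    return all(
        x[j] <= y[j] or any(x[i] != y[i] for i in range(j))
        for j in range(len(x))
    )


def product_le(x, y):
    """Componentwise partial order: x_i <= y_i for all i."""
    return all(a <= b for a, b in zip(x, y))


# Exhaustive comparison on a small grid (an arbitrary choice for illustration).
points = list(product(range(3), repeat=3))
for x in points:
    for y in points:
        # The two formulas agree with each other and with tuple comparison.
        assert lex_le_implication(x, y) == lex_le_disjunction(x, y) == (x <= y)
        # The lexicographic order is total.
        assert lex_le_implication(x, y) or lex_le_implication(y, x)

# The componentwise order is only partial: these two points are incomparable.
assert not product_le((0, 1, 0), (1, 0, 0))
assert not product_le((1, 0, 0), (0, 1, 0))
```

The last two assertions show why the componentwise order does not by itself give an order of traversal: some pairs of parameters are incomparable.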
[ Foliations might have some relevance or relation to curve families. ]

[ Maybe try to generalize ordered traversals to multi-parameter ordered traversals. This would be analogous
to a multi-parameter family of curves, but with the parametrization information removed if the multiple
total order on the index set is induced onto the set itself. ]
16.2. Curves
16.2.1 Remark: The customary use of the symbol γ for curves is possibly due to the fact that γ is the
third Greek letter, corresponding to the Latin third letter ‘c’ for “curve”.
16.2.2 Remark: Curves are of fundamental importance to both differential geometry and physics. There-
fore they deserve careful study. Much of physics is expressed in terms of the effect of fields (electric, magnetic,
gravitational, etc.) on “test particles”, which are abstract infinitesimal particles which follow continuous tra-
jectories in some space, usually with a time parameter of some kind. Much of differential geometry is
expressed in terms of parallel transport along curves or paths. Curves and paths are similar to 1-manifolds
(defined in Chapter 25), but have important differences as discussed in Remark 16.1.8.
Curves and paths are also of fundamental importance in the study of connectivity in general topological
spaces because curves are continuous maps whose domain is an interval, and intervals are precisely those
subsets of IR which are connected. Curves are therefore used for defining pathwise connectivity. Families of
curves are also a basic tool in algebraic topology.
16.2.3 Remark: The definitions of curves in this section are meaningful in general topological spaces
although they are typically used in topological and differentiable manifolds. It is assumed that all curves are
continuous because it is difficult to think of a useful class of curves with a weaker condition than continuity.
(A curve with discontinuous jumps is probably better described as a set or sequence of curves, or an “ordered
traversal”.) However, a curve may be referred to as a “continuous curve” to emphasize that no stronger
regularity properties are expected from it.
16.2.4 Definition: A (continuous) curve in a topological space M is a continuous map γ : I → M for
some interval I ⊆ IR.

16.2.5 Remark: Intervals of IR are defined in Definition 8.3.10.

16.2.6 Remark: There’s an interesting question here as to whether empty curves should be permitted in
Definition 16.2.4. Real intervals are characterized as the connected subsets of IR. The set of all real intervals
is closed under intersection if empty intervals are permitted. So it is desirable to permit empty curves. If
I = ∅, then γ = ∅, namely the empty function.
16.2.7 Remark: Since the parameter interval I of a curve in Definition 16.2.4 is connected, it follows that
the image γ(I) is connected in the topology of the target space M . The interval I may be open, closed or
semi-closed. It may also be classed as bounded, singly infinite or doubly infinite. If I is compact (i.e. closed
and bounded), then the image γ(I) is compact. The non-empty compact intervals are the most useful for
defining parallelism on fibre bundles because the end-points are required.
In algebraic topology, it is usual to normalize a positive-length compact parameter interval of a curve to
the set [0, 1], but in differential geometry, general parameter intervals are required. The parameter may
represent, for example, the time of passing a point or the distance measured along the curve.
16.2.8 Remark: It is important to distinguish two different ways of using the words “open” and “closed”.
In the theory of curves, these words are often applied to the connectivity properties of the curve rather than
the general topological properties defined in Section 14.3. For instance, a curve γ : [a, b] → X might be
called “closed” if γ(a) = γ(b). This is an unfortunate re-use of words. Because of these multiple meanings
of “closed” and “open”, it is sometimes a matter of guesswork to interpret them. Perhaps a better word for
a curve which ends where it starts would be a “loop” or a “closed loop”.
The word “arc” in Definition 16.2.9 is sometimes a synonym for “curve” and sometimes not, which makes
things even more confusing. Historically this confusion seems to be due to the divergent usage in complex
analysis and topology. Arcs are found mostly in complex analysis. The word “arc” is avoided in this book.
The term “compact-domain curve” is probably non-standard, but is often useful.
16.2.9 Definition: Let M be a topological space.
An open curve or open arc in M is a curve γ : I → M such that I is an open interval.
A compact-domain curve in M is a curve γ : I → M such that I is a compact interval.
16.2.10 Remark: Whereas Definition 16.2.9 classifies curves in a topological sense, Definition 16.2.11
classifies curves according to the injectivity or lack of injectivity of the curve.
16.2.11 Definition: Let M be a topological space.
A closed curve in M is a curve γ : [a, b] → M such that γ(a) = γ(b).
A simple curve or Jordan arc in M is an injective curve, i.e. a curve γ such that γ(t1 ) = γ(t2 ) ⇒ t1 = t2 .
A Jordan curve or simple closed curve in M is a curve with a non-empty compact domain which is injective
except that the end-points coincide; in other words, it is a curve γ : [a, b] → M such that γ(t1 ) = γ(t2 ) ⇔
(t1 = t2 or {t1 , t2 } = {a, b}).
A constant curve in M is a curve γ : I → M such that ∀s, t ∈ I, γ(s) = γ(t).
16.2.12 Remark: It is not at all guaranteed that a curve γ : (a, b) → M can be continuously extended to
a curve γ : [a, b] → M for a, b ∈ IR with a < b. Therefore initial and terminal points in Definition 16.2.13
and Notation 16.2.14 assume a non-empty compact parameter interval [a, b]. Since concatenation of curves is
defined in terms of initial and terminal points, concatenation in Definitions 16.2.15 and 16.2.17 also assumes
non-empty compact parameter intervals.
16.2.13 Definition: Let M be a topological space.
The initial point of a curve γ : [a, b] → M is γ(a).
The terminal point of a curve γ : [a, b] → M is γ(b).
A multiple point of a curve γ : I → M is a point x ∈ M such that γ(t1) = γ(t2) = x for some t1, t2 ∈ I
with t1 ≠ t2.
16.2.14 Notation: The initial and terminal points of a curve γ : [a, b] → M may be denoted as S(γ) =
γ(a) and T (γ) = γ(b) respectively. (S is mnemonic for “source” or “start”. T is mnemonic for “terminal”
or “target”.)

16.2.15 Definition: The concatenation of two curves γ1 : [a1, b1] → M and γ2 : [a2, b2] → M in a
topological space M with b1 = a2 and γ1(b1) = γ2(a2) is the curve γ : [a1, b2] → M defined for t ∈ [a1, b2] by
    γ(t) = γ1(t) for t ∈ [a1, b1],
    γ(t) = γ2(t) for t ∈ [a2, b2].
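A minimal Python sketch of pairwise concatenation follows, under stated assumptions: a curve is modelled as a Python function together with explicit end-points of its compact parameter interval, points of M are tuples of floats, and the function name `concatenate` and the tolerance `tol` are hypothetical choices made only for this illustration.

```python
def concatenate(gamma1, a1, b1, gamma2, a2, b2, tol=1e-12):
    """Concatenate gamma1 on [a1, b1] with gamma2 on [a2, b2].

    Requires b1 == a2 and gamma1(b1) == gamma2(a2), the latter up to the
    numerical tolerance tol (an assumption of this sketch).
    Returns the concatenated curve and the end-points of its domain.
    """
    if b1 != a2:
        raise ValueError("parameter intervals must abut: b1 == a2")
    p, q = gamma1(b1), gamma2(a2)
    if any(abs(u - v) > tol for u, v in zip(p, q)):
        raise ValueError("terminal point of gamma1 must equal initial point of gamma2")

    def gamma(t):
        if not (a1 <= t <= b2):
            raise ValueError("parameter outside [a1, b2]")
        # The two branches agree at t == b1 == a2, so either may be used there.
        return gamma1(t) if t <= b1 else gamma2(t)

    return gamma, a1, b2


# Example: a horizontal unit segment followed by a vertical unit segment.
gamma, a, b = concatenate(
    lambda t: (t, 0.0), 0.0, 1.0,
    lambda t: (1.0, t - 1.0), 1.0, 2.0,
)
assert (a, b) == (0.0, 2.0)
assert gamma(0.5) == (0.5, 0.0)
assert gamma(1.5) == (1.0, 0.5)
```

The two conditions checked at the start are exactly the hypotheses b1 = a2 and γ1(b1) = γ2(a2) of the definition. Without them the piecewise formula would be ambiguous at the joining parameter.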
16.2.16 Remark: The concatenation of two continuous curves is a continuous curve. Pairwise concatenation
generalizes easily to sets, families and sequences of curves. The concatenation of an unordered family
of curves is given in Definition 16.2.17.
16.2.17 Definition: The concatenation of a sequence of curves (γj)j∈J with non-empty compact domains
Ij = [aj, bj] for j ∈ J such that I = ⋃_{j∈J} Ij is an interval and #(Ij ∩ Ik) ≤ 1 for all j, k ∈ J with j ≠ k, and
γj(t) = γk(t) for all t ∈ Ij ∩ Ik for all j, k ∈ J, is the map γ : I → M defined by γ(t) = γj(t) for all t ∈ Ij, for all j ∈ J.
[ Define single-parameter and multi-parameter families of curves here. One-parameter families of curves are
important for homotopy in algebraic topology. ]
[ Define topological foliations near here. See EDM2 [35], section 154. ]
16.2.18 Notation: Denote by C0 (M ) the set of all continuous curves in M .
16.2.19 Remark: Notation 16.2.18 is experimental. To be useful, it may need some indication of the
nature of the parameter interval.
16.3. Path-equivalence relations for curves

Some texts define any two curves in a topological space M to have the same path if they have the same
image. Such a definition discards information about the direction of the curve and exact process of traversal
in the case of self-intersections or retracing of the image set. On the other hand, if two curves are defined to
be path-equivalent when they are related by a homeomorphism between the parameter intervals, not enough
information is discarded. This is because a curve which is constant on some sub-interval actually traces the
same path as if there had been no such constant sub-interval. To give a real-life example, if you travel by
train from Paris to Moscow, your path is the same whether or not your train stops in Berlin for 5 minutes.
But parameter homeomorphisms are unable to remove such pauses in journeys. Therefore in this section a
more precise concept of path equivalence is defined. To put it simply, intervals where a curve is constant
(called “constant stretches”) are ignored when comparing curves. In particular, this implies that a constant
curve has the same path as a curve with a one-point parameter interval, which is as one would expect.
Information about the direction of a curve is not discarded because this information is needed in most
applications in differential geometry. Unoriented paths are useful for defining pathwise connectivity in
general topology, but oriented paths can do that job too. So the default definition for a path is oriented.
Alternative terms for “oriented” would be “directed” or “ordered”.
16.3.1 Remark: In the study of pathwise parallelism, it is assumed that no change of orientation of a
fibre occurs if the curve is stationary for a while. Thus if a curve γ : I → M satisfies γ(s) = γ(t) for
all s, t ∈ [a, b] ⊆ I for some a < b, then the curve could be said to be stationary on the interval [a, b]. But
the word “stationary” is usually associated with functions of one or more real or complex variables whose
derivatives vanish at a point. Therefore the more generic term “constant” is preferable.
If reparametrizations are permitted to be non-decreasing continuous maps rather than increasing homeo-
morphisms, then all constant stretches of curves may be removed. Thus a curve γ1 which is constant on the
interval [a, b] may have this constant stretch removed by expressing it as γ1 = γ2 ◦ β, where β : I → IR is
defined by β(t) = min(t, a + max(0, t − b)) for all t ∈ I, and
    γ2(t) = γ1(t) for t ≤ a,
    γ2(t) = γ1(t + b − a) for t ≥ a,
for t ∈ β(I) = Dom(γ2). A reparametrization such as this is clearly not a homeomorphism, but the curves γ1
and γ2 trace out the same set of points in the same order. Therefore they are equivalent as far as representing
a traversal of points is concerned. When curves are used for parallel transport, there is supposed to be no
change in the orientation of a fibre when the curve is constant.
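The removal of a single constant stretch by the reparametrization β described above can be illustrated numerically. The Python sketch below is an illustration under assumptions: the example curve `gamma1` on [0, 3], which pauses on [1, 2], is hypothetical, as are the helper names. Dyadic parameter values are used so that the floating-point comparison is exact.

```python
def beta(t, a, b):
    """Non-decreasing map which collapses the stretch [a, b] to the point a."""
    return min(t, a + max(0.0, t - b))


def remove_stretch(gamma1, a, b):
    """Return gamma2 with gamma1 = gamma2 o beta, given gamma1 constant on [a, b]."""
    def gamma2(u):
        return gamma1(u) if u <= a else gamma1(u + b - a)
    return gamma2


def gamma1(t):
    """Hypothetical curve on [0, 3]: move, pause at (1, 0) on [1, 2], move on."""
    if t <= 1.0:
        return (t, 0.0)
    if t <= 2.0:
        return (1.0, 0.0)
    return (t - 1.0, 0.0)


a, b = 1.0, 2.0
gamma2 = remove_stretch(gamma1, a, b)
ts = [k / 8.0 for k in range(25)]  # dyadic sample of [0, 3]

# gamma1 = gamma2 o beta at every sample point.
assert all(gamma1(t) == gamma2(beta(t, a, b)) for t in ts)

# beta is non-decreasing, but not injective, hence not a homeomorphism.
values = [beta(t, a, b) for t in ts]
assert all(u <= v for u, v in zip(values, values[1:]))
assert beta(1.25, a, b) == beta(1.75, a, b) == a
```

In this example the resulting curve `gamma2` is simply u ↦ (u, 0) on [0, 2], which traverses the same points in the same order without the pause.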
There are two obvious ways to deal with sometimes-constant curves. Either they can be simply removed
from consideration, or else they can be “equivalenced out”, which means that they can be collected together
in equivalence classes which effectively ignore the constant stretches of curves. If the latter approach is used,
it will be convenient to always be able to select a never-constant representative for each equivalence class.
In practice, this would be the same as just ignoring sometimes-constant curves completely. There remains,
therefore, the question of whether there is any use in permitting sometimes-constant curves to be members
of curve classes. The formalism chosen here uses unrestricted continuous curves, and imposes an equivalence
relation which makes all curves equivalent to some never-constant curve. Therefore all sometimes-constant
curves may be ignored since their paths are represented by never-constant curves.
16.3.2 Definition: A constant stretch of a curve γ : I → M in a topological space M is an interval
[a, b] ⊆ I with a < b and γ(s) = γ(t) for all s, t ∈ [a, b].
16.3.3 Definition: A never-constant curve in a topological space M is a curve γ : I → M such that
∀a, b ∈ I, (a < b ⇒ (∃c ∈ [a, b], γ(c) ≠ γ(a))).
A sometimes-constant curve in a topological space M is a curve γ : I → M which has a constant stretch.
16.3.4 Remark: A curve is a never-constant curve if and only if it has no constant stretches. It is not
necessarily true that a never-constant curve is injective if restricted to small enough sub-intervals. That is,
a restriction such as γ|[t−ε,t+ε] may be non-injective for all ε > 0. But a curve which does have this local
injectivity property is necessarily never-constant in the sense of Definition 16.3.3.
16.3.5 Theorem: If a curve γ is constant and never-constant, then either γ = ∅ or #(Dom(γ)) = 1.
[ Show somewhere that for any continuous function f : I → M for an interval I ⊆ IR and M a topological
space, the set A = ⋃{[a, b] ⊆ I; a < b and #(f([a, b])) = 1} is a countable union of disjoint closed intervals
of IR. This may be useful for proving Theorem 16.3.6. ]
16.3.6 Theorem: For any curve γ1 : I → M in a topological space M , there exists a never-constant curve
γ2 : J → M and a non-decreasing continuous surjection β : I → J such that γ1 = γ2 ◦ β.
Proof: . . .
16.3.7 Definition: Curves γ1 and γ2 in a topological space M are path-equivalent if there are surjective
non-decreasing continuous functions β1 : I → Dom(γ1 ) and β2 : I → Dom(γ2 ) for some interval I ⊆ IR such
that γ1 ◦ β1 = γ2 ◦ β2 .
16.3.8 Example: The curves in IR2 defined by γ1 : [0, π] → IR2 with γ1 : t → (cos t, 0) and γ2 : [0, π] → IR2
with γ2 : t → (cos 3t, 0) have the same image set [−1, 1] × {0} and the same start and finish points, but
according to Definition 16.3.7, γ1 and γ2 are not path-equivalent. This follows from Theorem 16.3.12.
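The preimage-count criterion of Theorem 16.3.12 can be made concrete for these two curves by counting how often each curve passes through a chosen point. The Python sketch below is a numerical heuristic, not a proof. It assumes that the crossings are transversal and do not fall on grid points, and the test point (0.5, 0), the grid size and the helper name `count_crossings` are arbitrary choices made for illustration. Since the second coordinate of both curves is always 0, only the first coordinate is examined.

```python
import math


def count_crossings(f, a, b, level, n=1000):
    """Count sign changes of f - level on a uniform grid of n sub-intervals of [a, b]."""
    count = 0
    prev = f(a) - level
    for k in range(1, n + 1):
        cur = f(a + (b - a) * k / n) - level
        if prev * cur < 0.0:
            count += 1
        prev = cur
    return count


# First coordinates of gamma1 and gamma2 on [0, pi].
x1 = lambda t: math.cos(t)
x2 = lambda t: math.cos(3.0 * t)

n1 = count_crossings(x1, 0.0, math.pi, 0.5)  # solutions of cos t = 1/2
n2 = count_crossings(x2, 0.0, math.pi, 0.5)  # solutions of cos 3t = 1/2

# gamma1 passes through (0.5, 0) once (t = pi/3), whereas gamma2 passes
# through it three times (t = pi/9, 5pi/9, 7pi/9).
assert n1 == 1
assert n2 == 3
```

Both curves are never-constant, so the differing preimage counts are incompatible with path-equivalence according to Theorem 16.3.12.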
16.3.9 Remark: In terms of Definition 16.3.7, Theorem 16.3.6 means that every curve in a topological
space M is path-equivalent to a never-constant curve in M .
16.3.10 Remark: The reparametrization functions β1 and β2 in Definition 16.3.7 modify the corresponding
curves γ1 and γ2 so that they have the same parameter interval I. They also insert constant stretches into
the curves so that they match correctly. Thus if p ∈ M is a point such that γ1 (t) = p for t in some positive-
length interval but γ2 does not have such a constant stretch, then β2 must insert a constant stretch with
value p into the curve γ2 . In other words, the maps β1 and β2 do not remove constant stretches – they insert
constant stretches in each curve to match the other curve.
Instead of inserting constant stretches into curves γ1 and γ2 , it would be much more satisfying to somehow
remove them. This is not possible with single-valued functions, but it can be done with the function quotient
in Definition 6.7.7. Surjective functions β1 : Dom(γ1 ) → I and β2 : Dom(γ2 ) → I may be chosen so that
the constant stretches of β1 and β2 match the constant stretches of γ1 and γ2 respectively. Then in terms of
Definition 6.7.7, γ1 ◦ β1^{−1} : I → M and γ2 ◦ β2^{−1} : I → M will be well-defined functions, and β1 and β2 may be
chosen so that γ1 ◦ β1^{−1} = γ2 ◦ β2^{−1} = γ3 for some never-constant curve γ3. Hence γ1 = γ3 ◦ β1 and γ2 = γ3 ◦ β2
as in Theorem 16.3.11. This is illustrated in Figure 16.3.1. (Any resemblance between Figure 16.3.1 and the
Millikan oil drop experiment – or an overhead view of a starship – is purely coincidental.)
Maps shown: β1 : I1 → I, β2 : I2 → I, γ1 : I1 → M, γ2 : I2 → M and γ3 : I → M.
Figure 16.3.1 Equivalence of two curves to a never-constant curve
16.3.11 Theorem: Two curves γ1 and γ2 in a topological space M are path-equivalent in M if and only
if they are both path-equivalent to some never-constant curve γ3 in M .
16.3.12 Theorem: If two never-constant curves γ1 and γ2 in a topological space M are path-equivalent,
then the sets γ1^{−1}({S}) and γ2^{−1}({S}) are equipotent for all sets S ⊆ M; that is, #(γ1^{−1}({S})) = #(γ2^{−1}({S})).
(See Definition 7.2.18 for equipotent sets.)
[ Should also list other properties of curves which are homogeneous within an equivalence class, such as the
initial and terminal points and hopefully the compactness of the parameter interval. To get this, must
probably restrict attention to never-constant curves. ]
16.3.13 Remark: When defining parallel transport on curves, it is necessary that the transport be the
same for all path-equivalent curves. That is, the parallelism should depend only on the path, not on the
particular choice of curve to represent the path. If the parallel transport is defined between two points
along equivalent curves, then the choice of curves should be irrelevant. This requirement may be stated
by specifying that the parallel transport between “corresponding points” on two curves must be the same.
Definition 16.3.14 defines the notion of “corresponding parameters” of curves for this purpose.
16.3.14 Definition: Corresponding parameters of curves γ1 : I1 → M and γ2 : I2 → M in a topological
space M are parameters t1 ∈ I1 and t2 ∈ I2 such that t1 = β1 (t) and t2 = β2 (t) for some t ∈ I for some
reparametrizations β1 : I → I1 and β2 : I → I2 with γ1 ◦ β1 = γ2 ◦ β2 .
[ Must show that “corresponding parameters” are well-defined. ]
16.3.15 Example: The correspondence of parameters in Definition 16.3.14 is not always unique. For
example, consider the curve γ : I → IR^2 for an interval I ⊆ IR where γ : t ↦ (cos t, sin t). Define γ1 = γ2 = γ
with I = IR. Then suitable reparametrizations are β1 : IR → IR and β2 : IR → IR with β1 : t ↦ t + 2n1π and
β2 : t ↦ t + 2n2π for any n1, n2 ∈ ℤ. It follows that t and t + 2(n2 − n1)π are corresponding parameters. So
each parameter t1 for γ1 has an infinite number of corresponding parameters t2 = t1 + 2(n2 − n1 )π for γ2 ,
even though γ1 and γ2 are both never-constant.
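The non-uniqueness in this example is easy to confirm numerically. The following Python sketch assumes floating-point arithmetic with a small tolerance, and the helper names and the sample values of t, n1 and n2 are arbitrary choices made for illustration.

```python
import math


def gamma(t):
    """The circle curve t -> (cos t, sin t)."""
    return (math.cos(t), math.sin(t))


def beta(t, n):
    """Reparametrization t -> t + 2 n pi for an integer n."""
    return t + 2.0 * n * math.pi


def close(p, q, tol=1e-9):
    """Componentwise comparison of two points up to the tolerance tol."""
    return all(abs(u - v) <= tol for u, v in zip(p, q))


# gamma o beta_1 = gamma o beta_2 for every choice of integers n1 and n2,
# so t1 = beta(t, n1) and t2 = beta(t, n2) are corresponding parameters.
for n1 in range(-2, 3):
    for n2 in range(-2, 3):
        for k in range(9):
            t = 0.25 * k
            assert close(gamma(beta(t, n1)), gamma(beta(t, n2)))
```

Every pair (n1, n2) passes the check, so each parameter of γ1 has one corresponding parameter of γ2 for each integer difference n2 − n1.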
[ Show that two never-constant compact-domain curves have unique corresponding parameters. ]
16.3.16 Remark: The non-uniqueness of corresponding parameters in Example 16.3.15 is not a problem for
defining parallelism. Open curves may sometimes be invariant under a reparametrization, but the parallelism
carried on such curves is independent of parametrization.
16.4. Paths
In this section, paths are defined in general topological spaces as equivalence classes of curves with respect
to the “path equivalence” of curves defined in Section 16.3.
The notation chosen for paths here is [γ]0 for the equivalence class of any given curve γ. Then [γ1 ]0 = [γ2 ]0
if and only if γ1 and γ2 are path-equivalent curves. One may say that [γ]0 is “the path of γ”, so that any two
curves are equivalent if and only if they “have the same path”. So every curve is associated with a unique
(oriented continuous) path. This path structure determines the order and general manner of traversal of
points in the image set.
16.4.1 Notation: [γ]0 denotes the set of curves in a topological space M which are path-equivalent to a
given curve γ in M .
16.4.2 Definition: A path in a topological space M is an equivalence class [γ]0 of curves which are path-
equivalent to a given curve γ in M .
A path may also be called an (oriented) (continuous) path or an (oriented) C^0 path, and the words directed
or ordered may be used instead of “oriented”.
For any curve γ, the path of γ is the equivalence class [γ]0 .
The set ⋃{Range(γ1); γ1 ∈ [γ]0} is called the image of the path [γ]0.
Any curve in a path Q = [γ]0 may be referred to as a path representative or representative curve for the
path Q.
16.4.3 Remark: For the empty curve γ = ∅, the equivalence class [γ]0 is not empty. So it cannot be called
literally the “empty path”. But it could accurately be called the “empty curve path” or the “path of the
empty curve”. This logically correct usage is too clumsy. So the terms “empty path” and “non-empty path”
will refer to the curve map, not the equivalence class. The emptiness or non-emptiness also corresponds to
the corresponding property of the image of the path.
Another moderately interesting trivial-curve issue is that of constant curves. For a fixed p ∈ M , the constant
curves γ1 : {0} → M with γ1 : t ↦ p and γ2 : [0, 1] → M with γ2 : t ↦ p are path-equivalent although their
parameter intervals are not homeomorphic. So these curves have the same path. In fact, all constant curves
with the same value are path-equivalent, for all of the topologically different kinds of non-empty oriented
intervals in the table in Section 16.2. The only kind of constant curve which is never-constant is a curve
with a singleton domain.
16.4.4 Notation: Denote by P0 (M ) the set of all continuous paths in M .
16.4.5 Remark: Notation 16.4.4 is experimental. There are some standard notations for sets of curves
and paths, but this is probably not one of them. In terms of the corresponding Notation 16.2.18 for curves,
P0(M) = {[γ]0; γ ∈ C0(M)}. Hence P0(M) is a partition of C0(M); so C0(M) = ⋃P0(M) and for
all γ1 , γ2 ∈ C0 (M ), either [γ1 ]0 = [γ2 ]0 or [γ1 ]0 ∩ [γ2 ]0 = ∅.
16.4.6 Remark: Definitions 16.4.7 and 16.4.8 are based on curve classes in Definitions 16.2.9 and 16.2.11.
16.4.7 Definition: Let M be a topological space.

The empty path in M is the path [γ]0 in M such that γ = ∅ is the empty curve.
An open path in M is a path Q in M such that γ is an open curve in M for some never-constant curve γ ∈ Q.
A closed path in M is a path [γ]0 such that γ is a closed curve in M .

A simple path in M is a path [γ]0 such that γ is a simple curve in M .
A simple closed path or Jordan path in M is a path [γ]0 such that γ is a Jordan curve in M .
A constant path in M is a path [γ]0 in M such that ∃p ∈ M, ∀t ∈ Dom(γ), γ(t) = p.
[ There are perhaps some problems with Definitions 16.4.7 and 16.4.8 because not all properties of curves
are homogeneous within equivalence classes. For example, the domain of a constant curve is an arbitrary
interval even though all constant curves with the same value are path-equivalent. Must fix this. Probably
have to specify non-constant representative curves. ]
16.4.9 Definition: The reversal of a path [γ]0 in a topological space M is the path [−γ]0 where −γ
denotes the curve −γ : t ↦ γ(−t). The reversal of a path Q = [γ]0 may be denoted as −Q or −[γ]0.
16.4.10 Remark: The definitions of initial point, terminal point and multiple point in Definition 16.4.11
are independent of the choice of path representative.

16.4.11 Definition: The initial point and terminal point of a path Q in a topological space M with
non-empty compact domain are the initial point and terminal point respectively of some representative of Q.
A multiple point of a path Q in a topological space M is a point x ∈ M such that x is a multiple point of
some path representative of Q.
16.4.12 Notation: The initial and terminal points of a path Q with non-empty compact domain may be
denoted as S(Q) = S(γ) and T (Q) = T (γ) respectively for any path representative γ of Q.
16.4.13 Definition: The concatenation of two paths Q1 and Q2 in a topological space M with non-empty
compact domains such that T (Q1 ) = S(Q2 ) is the concatenation of any representatives γ1 of Q1 and γ2 of
Q2 such that T (γ1 ) = S(γ2 ).
[ Should show here that the concatenation of two paths is well-defined. ]

[ Define sums of paths so that common stretches of paths going in opposite directions can be cancelled. This
is important for stating that pathwise curvature is additive with respect to “addition” of curves in some
simplicial complex sense. I guess that means I’ve got to read up on algebraic topology now. ]
16.4.14 Remark: Definition 16.4.2 for a path removes information about the choice of parametrization of
a curve except for the direction. The information which is preserved is the order in which the points of the
image are traversed, possibly multiple times. Definition 16.4.15 removes slightly more information because
the direction of traversal is also removed.
16.4.15 Definition: An unoriented (continuous) path in a topological space M is the set Q ∪ (−Q) for
any path Q in M .
16.4.16 Remark: Other possible terms for “unoriented” are “disoriented”, “undirected”, “unordered”,
“disordered”, and so forth.
16.4.17 Remark: As is the case of all equivalence class constructions in mathematics, an equivalence class
of curves (such as the paths in Definitions 16.4.2 and 16.4.15) may be represented in practical applications
by a single curve of the class. In practice, one need not be fastidious about the distinction between curves
and paths, as long as it is clear which equivalence relation is being used in each context.
16.4.18 Remark: The purpose of the parametrization of paths is to indicate the order and manner of
traversal of points in a topological space, although most of the information in the parametrization is irrele-
vant. The alternative of defining some sort of total ordering on the image is too clumsy in practice.
This is analogous to the issue of families of atlases versus sets of atlases. In practice, very little analysis
of paths can be done without parametrization, just as very little differential geometry can be done without
coordinate charts.
[ Define concatenation of paths. Show that concatenation is independent of the choice of path maps. ]
[ Define pathwise connectivity near here. ]
16.5. Convex curvilinear interpolation in affine spaces

16.5.1 Remark: While preparing figures for this book, the author needed some formulas for convex inter-
polation between curves in a linear space. An example is shown in Figure 16.5.1.
Since convex combinations are well defined in affine spaces, it is more efficient to consider convex interpolation
of curves in affine spaces rather than the more highly structured linear spaces.
This kind of convex interpolation implements the idea of “morphing” one curve into another. The trajectory
of the entire curve during the “morph” is determined by a given pair of end-point trajectories. A not-entirely-
obvious question here is whether the end-point trajectories determine a unique total-curve trajectory in a
natural manner.
[ Make Remark 16.5.2 into a definition and state/prove a theorem that the curvilinear interpolation is a
continuous map from the unit square [0, 1] × [0, 1] to IRn if the target space is IRn . ]

Four curves γ1, γ2, γ3, γ4 bounding a region, with corner points q0, q1, q2, q3.
Figure 16.5.1 Curvilinear interpolation between 4 curves
16.5.2 Remark: The formula which the author finally settled on for interpolation between curves is as
follows.
z(s, t) = (1 − t)γ1 (s) + tγ3 (s) + (1 − s)γ2 (t) + sγ4 (t) − ((1 − t)(1 − s)q0 + (1 − t)sq1 + t(1 − s)q2 + tsq3 )
= ct (γ1 (s), γ3 (s)) + cs (γ2 (t), γ4 (t)) − ct (cs (q0 , q1 ), cs (q2 , q3 )) (16.5.1)
= ct (γ1 (s), γ3 (s)) + cs (γ2 (t), γ4 (t)) − cs (ct (q0 , q2 ), ct (q1 , q3 )), (16.5.2)
where cλ (x, y) = (1 − λ)x + λy for λ, x, y ∈ IR.

The functions γi represent curves in IR2 which have a single real parameter in [0, 1]. The points qk are the
intersection points of the curves. Then the point z(s, t) is a point which interpolates between the four curves.
When s = 0, this interpolation matches γ2 . To be precise, z(0, t) = γ2 (t) for t ∈ [0, 1]. The complete set of
boundary conditions is as follows.
    z(s, t) = γ1(s) if t = 0,
    z(s, t) = γ2(t) if s = 0,
    z(s, t) = γ3(s) if t = 1,
    z(s, t) = γ4(t) if s = 1.
In particular, at the four corners,
    z(s, t) = q0 if s = t = 0,
    z(s, t) = q1 if s = 1, t = 0,
    z(s, t) = q2 if s = 0, t = 1,
    z(s, t) = q3 if s = t = 1.
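The boundary conditions can be checked numerically for a concrete choice of curves. The Python sketch below implements formula (16.5.1) for curves in IR^2. The four example curves are hypothetical and were chosen only so that their end-points agree at the corners, and the helper names and the tolerance are likewise assumptions of this illustration.

```python
def c(lam, x, y):
    """Convex combination c_lam(x, y) = (1 - lam) x + lam y, componentwise."""
    return tuple((1.0 - lam) * xi + lam * yi for xi, yi in zip(x, y))


def add(x, y):
    return tuple(xi + yi for xi, yi in zip(x, y))


def sub(x, y):
    return tuple(xi - yi for xi, yi in zip(x, y))


# Hypothetical bounding curves on [0, 1]: a unit square with bulging sides.
def gamma1(s): return (s, s * (1.0 - s))            # bottom, from q0 to q1
def gamma2(t): return (-t * (1.0 - t), t)           # left, from q0 to q2
def gamma3(s): return (s, 1.0 + s * (1.0 - s))      # top, from q2 to q3
def gamma4(t): return (1.0 + t * (1.0 - t), t)      # right, from q1 to q3


q0, q1, q2, q3 = gamma1(0.0), gamma1(1.0), gamma3(0.0), gamma3(1.0)


def z(s, t):
    """Interpolated point according to formula (16.5.1)."""
    edges = add(c(t, gamma1(s), gamma3(s)), c(s, gamma2(t), gamma4(t)))
    corners = c(t, c(s, q0, q1), c(s, q2, q3))
    return sub(edges, corners)


def close(x, y, tol=1e-12):
    return all(abs(xi - yi) <= tol for xi, yi in zip(x, y))


# The interpolation reproduces all four bounding curves.
for k in range(11):
    u = k / 10.0
    assert close(z(u, 0.0), gamma1(u))
    assert close(z(0.0, u), gamma2(u))
    assert close(z(u, 1.0), gamma3(u))
    assert close(z(1.0, u), gamma4(u))
```

The subtraction of the bilinear corner term is what makes each boundary condition hold: on each edge of the parameter square, one convex combination reduces to the required curve and the other cancels against the corner term.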
16.5.3 Remark: The intention of the curve family (16.5.1) is that the horizontal curves should “morph”
from curve γ1 into curve γ3 . At the same time, the vertical curves should “morph” from γ2 to γ4 . The two
sets of curves should be consistent with each other in the sense that the point with parameter t on the sth
curve t ↦ z(s, t) interpolating γ2 to γ4 should be the same as the point with parameter s on the tth curve
s ↦ z(s, t) interpolating γ1 to γ3. The interpolation should also be a polynomial of the lowest possible order
with respect to s and t. (In this case, it turned out to be bilinear.)
16.5.4 Remark: The chosen formula (16.5.1) has some interesting properties. For instance, the curve
t ↦ z(s, t) is a linear interpolation of a simple linear interpolation of γ2 and γ4 with the simple linear
interpolation of the points γ1 (s) and γ3 (s).
16.5.5 Remark: It does not necessarily follow that if the four bounding curves are non-self-intersecting
then the interpolated curves will be non-self-intersecting. Figure 16.5.2 illustrates a counter-example.
16.5.6 Remark: The curvilinear interpolation formula (16.5.1) was derived by the author as follows. First,
regard the location of the point z(s, t) for each s, t ∈ [0, 1] as a distortion of a regular rectangular grid. The
distortion at the boundary is given, and it is only necessary to find a distortion of the rectangular grid which
matches up with the given distortions at the boundary. Thus this is a kind of boundary value problem, and
the solution inside the grid should be a multinomial of the lowest possible degree, preferably bilinear.
Assuming that γ2 is the curve on the left and γ4 is the curve on the right as shown in Figure 16.5.1, define
a convex combination αs of γ2 and γ4: αs(t) = (1 − s)γ2(t) + sγ4(t). This is asymmetric with respect to s
and t. This family is curvilinear in t but linear in s. When s ∈ {0, 1}, this family matches the curves γ2
and γ4.
Four curves γ1, γ2, γ3, γ4 with corner points q0, q1, q2, q3.
Figure 16.5.2 Curvilinear interpolation between 4 curves
This family of curves αs (t) does not match the curves γ1 and γ3 when t ∈ {0, 1} as desired. Therefore
consider the deviation or “error” between the curves αs (0) and γ1 and between the curves αs (1) and γ3 .
This “error” should look like γ1(s) − αs(0) for t = 0 and like γ3(s) − αs(1) for t = 1. Therefore define
z(s, t) = αs (t) + (1 − t)(γ1 (s) − αs (0)) + t(γ3 (s) − αs (1)).
This has the form of the “erroneous” curve αs (t) plus a convex combination with respect to t of the “error
corrections” for αs (t) for t = 0 and t = 1. By substituting the formula for α into this equation, the result is
z(s, t) = (1 − s)γ2 (t) + sγ4 (t) + (1 − t)(γ1 (s) − (1 − s)γ2 (0) − sγ4 (0)) + t(γ3 (s) − (1 − s)γ2 (1) − sγ4 (1))
= (1 − s)γ2 (t) + sγ4 (t) + (1 − t)(γ1 (s) − (1 − s)q0 − sq1 ) + t(γ3 (s) − (1 − s)q2 − sq3 )
= (1 − s)γ2 (t) + sγ4 (t) + (1 − t)γ1 (s) + tγ3 (s) − (1 − t)((1 − s)q0 + sq1 ) − t((1 − s)q2 + sq3 ).
This agrees with equation (16.5.1).
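The final line of the derivation can be checked numerically. The following is a minimal sketch, not the author's code: the four sample curves g1 to g4 and the helper names lerp, interpolate and close are assumptions chosen for illustration, with corner points arranged so that γ2(0) = q0, γ4(0) = q1, γ2(1) = q2 and γ4(1) = q3 as in the derivation.

```python
def lerp(a, b, u):
    """Convex combination (1 - u) a + u b of two points in the plane."""
    return tuple((1 - u) * ai + u * bi for ai, bi in zip(a, b))

def interpolate(g1, g2, g3, g4, s, t):
    """Evaluate z(s, t) using the last line of the derivation above."""
    q0, q1, q2, q3 = g2(0.0), g4(0.0), g2(1.0), g4(1.0)
    sides = lerp(g2(t), g4(t), s)        # (1-s) g2(t) + s g4(t)
    ends = lerp(g1(s), g3(s), t)         # (1-t) g1(s) + t g3(s)
    # (1-t)((1-s) q0 + s q1) + t((1-s) q2 + s q3)
    corners = lerp(lerp(q0, q1, s), lerp(q2, q3, s), t)
    return tuple(a + b - c for a, b, c in zip(sides, ends, corners))

# Hypothetical boundary curves on a distorted unit square.
def g1(s): return (s, 0.3 * s * (1 - s))            # bottom, q0 to q1
def g3(s): return (s, 1 + 0.2 * s * (1 - s))        # top, q2 to q3
def g2(t): return (-0.1 * t * (1 - t), t)           # left, q0 to q2
def g4(t): return (1 + 0.25 * t * (1 - t), t)       # right, q1 to q3

def close(p, q, tol=1e-12):
    """Componentwise comparison up to floating-point rounding."""
    return all(abs(a - b) <= tol for a, b in zip(p, q))

grid = [i / 10 for i in range(11)]
matches_boundary = all(
    close(interpolate(g1, g2, g3, g4, u, 0.0), g1(u))
    and close(interpolate(g1, g2, g3, g4, u, 1.0), g3(u))
    and close(interpolate(g1, g2, g3, g4, 0.0, u), g2(u))
    and close(interpolate(g1, g2, g3, g4, 1.0, u), g4(u))
    for u in grid
)
print(matches_boundary)
```

The check only confirms that z(s, t) reproduces the four given boundary curves. It says nothing about self-intersection of the interior curves, which Remark 16.5.5 warns can occur.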
16.6. Algebraic topology
Some basic definitions for topics such as homotopy groups and singular homology theory will be summarized
here.
[ For homology groups, see EDM2 [35], article 201, and Federer [106], page 463. For homotopy theory, see
EDM2 [35], article 202. ]
[ See EDM2 [35], section 148.C, for the compact-open topology on the set Ω(X; x0 , x1 ) of curves from x0 to
x1 in a topological space X. This is supposed to have something to do with fibre spaces. ]
[ Exact sequences of linear spaces are defined in Section 10.11. But algebraic topology requires only exact
sequences of group homomorphisms, or something like that. ]
[ Define sheaves (EDM2 [35], 383) and sheaf cohomology (EDM2 [35], 383.E). ]

16.7. Topological groups

Topological groups are required for the specification of structure groups for fibre bundles in Chapter 23.
Topological groups are related to Lie groups. (See Chapter 33.)
Transformation groups are discussed in Section 9.4. This section deals with topological groups.
[ Near here, should refer to a family tree for topological groups and Lie groups. ]
16.7.1 Definition: A topological group is a tuple (G, TG , σG ) such that

(i) (G, σG ) is a group,
(ii) (G, TG ) is a topological space,
(iii) the group operation σG : G × G → G is continuous with respect to TG , and
(iv) the map g ↦ g⁻¹ from G to G is continuous with respect to TG .
[ See EDM [34], section 406.A. Is the g⁻¹ condition in Definition 16.7.1 superfluous? ]
16.8. Topological transformation groups

16.8.1 Remark: The difference between Definitions 16.8.2 and 16.8.3 is the extra requirement of the
topological transformation group that the action map be continuous with respect to elements of the group.
In both cases, the action of each element of the group G is a topological automorphism of the set X.
In Definition 16.8.2, the group is not a topological group. It is just a group of automorphisms. In Defini-
tion 16.8.3, the group is a topological group whose action map is continuous with respect to both the group
topology and the point-set topology.
16.8.2 Definition: A (left) transformation group of a topological space X is a tuple (G, X, TX , σG , µ) such
that (G, X, σG , µ) is a (left) transformation group of the set X, and Lg : X → X is a homeomorphism from
(X, TX ) to (X, TX ) for all g ∈ G.
16.8.3 Definition: A topological (left) transformation group of a topological space (X, TX ) is a tuple
(G, TG , X, TX , σG , µ) such that (G, TG , σG ) is a topological group and the action map µ : G × X → X is
continuous with respect to the topologies TG and TX .
16.8.4 Remark: Definitions 16.8.7 and 16.8.8 are the same as Definitions 16.8.2 and 16.8.3 respectively,
except that the action is required to be effective. (See Definition 9.4.14 for the concept of effective group
action.)
16.8.5 Remark: Homomorphisms of topological transformation groups in Definition 16.8.6 are based on
Definition 9.4.9 for general transformation groups.
16.8.6 Definition: A topological (left) transformation group homomorphism from a topological left transformation
group (G1, X1) −< (G1, TG1, X1, TX1, σ1, µ1) to a topological left transformation group (G2, X2) −<
(G2, TG2, X2, TX2, σ2, µ2) is a pair of maps (φ̂, φ) with φ̂ : G1 → G2 and φ : X1 → X2 such that
(i) The pair (φ̂, φ) is a left transformation group homomorphism;
(ii) φ̂ and φ are continuous.
16.8.7 Definition: An effective (left) transformation group of a topological space X is a (left) transformation
group G −< (G, X, TX, σG, µ) of the topological space X such that G acts effectively on X.
16.8.8 Definition: An effective topological (left) transformation group of a topological space X is a topological
(left) transformation group G −< (G, TG, X, TX, σG, µ) of X such that G acts effectively on X.
16.8.9 Remark: On the subject of specification tuples, the rule for choosing the listing order for the
components of tuples is that all algebraic operations (such as sums σ and products µ) are placed at the end
of the tuple, whereas attributes of sets such as topologies and atlases are placed immediately after the sets
they belong to, as for example the topology TX for X in Definition 16.8.8. These style rules are followed
throughout this book.

16.8.10 Remark: The following definitions are for the “right” versions of the above “left” transformation
groups. The non-topological versions of these topological right transformation groups are in Section 9.4.
16.8.11 Definition: A right transformation group of a topological space X is a tuple (G, X, TX , σG , µ)
such that (G, X, σG , µ) is a right transformation group of the set X, and Rg : X → X is a homeomorphism
from (X, TX ) to (X, TX ) for all g ∈ G.
16.8.12 Definition: A topological right transformation group of a topological space (X, TX ) is a tuple
(G, TG , X, TX , σG , µ) such that (G, TG , σG ) is a topological group and the action map µ : X × G → X is
continuous with respect to the topologies TG and TX .
16.8.13 Definition: An effective right transformation group of a topological space X is a right transfor-
mation group G of the topological space X such that G acts effectively on X.
16.8.14 Definition: An effective topological right transformation group of a topological space X is a
topological right transformation group G of X such that G acts effectively on X.
[ Put a family tree here for topological groups, although it’s very simple. This should be similar to Fig-
ure 33.7.1. ]
16.8.15 Remark: Theorem 16.8.16 is the topological version of Theorem 9.4.24. See Remark 9.4.20.
16.8.16 Theorem: Let G −< (G, TG, σG) be a topological group. Define the action map µ : G × G → G by
µ : (g1, g2) ↦ σG(g1, g2). Then the tuple (G, TG, G, TG, σG, µ) is an effective topological left transformation
group of (G, TG).
The tuple (G, TG , G, TG , σG , µ) is also an effective topological right transformation group of (G, TG ).
Proof: For a left transformation group, the action map µ : G × X → X must satisfy the associativity
rule µ(σG (g1 , g2 ), x) = µ(g1 , µ(g2 , x)) for all g1 , g2 ∈ G and x ∈ X. If the formula for µ in the theorem is
substituted into this rule with X = G, it follows easily from the associativity of σG . The continuity of µ
follows from the continuity of σG .
For a right transformation group, the action map µ : X × G → X must satisfy the associativity rule
µ(x, σG (g1 , g2 )) = µ(µ(x, g1 ), g2 ) for all g1 , g2 ∈ G and x ∈ X. This follows in exactly the same way from
the group associativity.
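These transformation groups are effective because if g ∈ G satisfies σG (g, x) = x for all x ∈ G (or
σG (x, g) = x for all x ∈ G), then substituting the identity element e for x gives g = e.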
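The algebraic part of this proof can be checked by brute force on a small non-commutative group. The sketch below is an illustration under stated assumptions, not part of the original text: it uses the permutation group on three symbols with the discrete topology, so the continuity conditions hold trivially and only the action rules and effectiveness are tested. The names compose, G and identity are hypothetical.

```python
from itertools import permutations, product

def compose(p, q):
    """Group operation on permutations of {0, 1, 2}: (p q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))   # the six permutations of three symbols
identity = (0, 1, 2)

# Left translation mu(g, x) = sigma(g, x): mu(sigma(g1, g2), x) = mu(g1, mu(g2, x)).
left_ok = all(
    compose(compose(g1, g2), x) == compose(g1, compose(g2, x))
    for g1, g2, x in product(G, repeat=3)
)

# Right translation mu(x, g) = sigma(x, g): mu(x, sigma(g1, g2)) = mu(mu(x, g1), g2).
right_ok = all(
    compose(x, compose(g1, g2)) == compose(compose(x, g1), g2)
    for g1, g2, x in product(G, repeat=3)
)

# Effectiveness: the only group element that fixes every point is the identity.
left_fixers = [g for g in G if all(compose(g, x) == x for x in G)]
right_fixers = [g for g in G if all(compose(x, g) == x for x in G)]

print(left_ok, right_ok, left_fixers, right_fixers)
```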
16.8.17 Definition: The topological left transformation group (G, G) −< (G, TG, G, TG, σG, σG) is called
the topological (left) transformation group of G acting on G by left translation, or the topological left
translation group of G (on itself).
The topological right transformation group (G, G) −< (G, TG, G, TG, σG, σG) is called the topological right
transformation group of G acting on G by right translation, or the topological right translation group of G
(on itself).
[ Define pseudogroups of transformations of topological spaces near here. See Kobayashi and Nomizu [27],
page 1. Is there such a thing as a non-topological pseudogroup of transformations? ]
16.9. Topological vector spaces

16.9.1 Remark: In this book, the term “linear space” is used in preference to “vector space” because
the word “vector” suggests an arrow with both a specified start point and a specified end point whereas a
linear space is simply an algebraic structure. However, the term “topological vector space” is the standard
terminology in the mathematics literature. So the more accurate terminology “topological linear space” is
not used in this book.
16.9.2 Remark: Definition 16.9.3 combines Definition 10.1.2 for a linear space with Definition 14.3.4 for
a topological space.

[ Check what the correct definition is for a topological vector space for a general field. It’s not quite clear
what to do with the topology on the field. The usual idea is to just use the standard topology on the field,
but what should one do if the field is not ℝ or ℂ? ]
16.9.3 Definition: A topological vector space is a tuple V −< (V, TV) −< (K, V, σK, τK, σV, µ, TK, TV) such
that
(i) K = (K, σK, τK) is a field;
(ii) V −< (K, V, σK, τK, σV, µ) is a linear space over K;
(iii) (V, TV) is a topological space;
(iv) the vector addition function σV : V × V → V is continuous with respect to the topology TV on V and
the corresponding product topology on V × V ;
(v) the scalar multiplication function µ : K × V → V is continuous with respect to the topology TV on V
and the product topology on K × V .
[ Check if the continuity of µ in Definition 16.9.3 can be dropped because of the linearity of the map. If not,
give a counterexample. Using only the linearity of µ, it is probably possible to demonstrate continuity on a
dense subspace of K, or something like that. ]
[ In this section, give only the most basic general properties and definitions for topological vector spaces.
Specific examples, such as Banach spaces, Hilbert spaces, distributions and Sobolev spaces should be given
elsewhere. ]
16.10. Network topology and continuous paths in networks

16.10.1 Remark: This section is a mere twig on the concept tree of the book. It may be skipped with no
harmful side-effects. On the other hand, it is referred to in Chapter 22, which is almost as pointless as this
section. However, it may turn out some day that some significant world-models in physics require network
topology.
16.10.2 Remark: The word “topology” is often applied to the connectivity properties of networks, also
known as graphs. Network topology is not the same thing as topology on discrete sets. Network topology
does not satisfy the conditions for a standard analytical topology, but there are many similarities. Continuous
curves can be defined. Therefore notions such as pathwise parallelism and even curvature can be defined.
[ Since continuity of functions may be redefined (as in Section 15.5) in terms of preservation of disconnected-
ness, it may be possible to use such an approach for network topology. Give definitions for this approach. ]
16.10.3 Remark: Given a set N , neighbourhoods Ωx ⊆ N can be defined for each x ∈ N such that
(i) ∀x ∈ N, x ∈ Ωx , and
(ii) ∀x ∈ N, ∀y ∈ Ωx , x ∈ Ωy .
Unfortunately, the intersections of neighbourhoods are not generally neighbourhoods, but continuous curves
may be defined in a network N as maps γ : I → N such that I is a contiguous subset of the integers and
(i) ∀i ∈ I, (i + 1 ∈ I ⇒ γ(i + 1) ∈ Ωγ(i) ), and
(ii) ∀i ∈ I, (i − 1 ∈ I ⇒ γ(i − 1) ∈ Ωγ(i) ).
A metric d : N × N → ℤ̄₀⁺ is usually defined recursively on a network N in terms of balls Bx,r with centre
x ∈ N and radius r ∈ ℤ₀⁺ as follows.
(i) ∀x ∈ N, Bx,0 = {x},
(ii) ∀x ∈ N, ∀r ∈ ℤ₀⁺, Bx,r+1 = {z ∈ N ; ∃y ∈ Bx,r , z ∈ Ωy } = ⋃_{y∈Bx,r} Ωy .
Then the metric d(x, y) is defined as min{r ∈ ℤ₀⁺ ; y ∈ Bx,r } if y ∈ ⋃_{r∈ℤ₀⁺} Bx,r ; otherwise d(x, y) = ∞. The
network N is said to be connected if d(x, y) < ∞ for all x, y ∈ N .
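The recursive construction of balls and the resulting network metric can be sketched for a finite network as follows. This is an illustrative sketch only: the dictionary nbhd, which maps each point x to its neighbourhood Ωx, and the function names are assumptions, and the example network is hypothetical.

```python
import math

def ball(nbhd, x, r):
    """B_{x,r}: B_{x,0} = {x} and B_{x,r+1} is the union of nbhd[y] over y in B_{x,r}."""
    b = {x}
    for _ in range(r):
        b = set().union(*(nbhd[y] for y in b))
    return b

def network_distance(nbhd, x, y):
    """min{r ; y in B_{x,r}}, or math.inf if no ball centred at x contains y."""
    b, r = {x}, 0
    while y not in b:
        bigger = set().union(*(nbhd[z] for z in b))
        if bigger == b:        # the balls have stopped growing, so y is unreachable
            return math.inf
        b, r = bigger, r + 1
    return r

# A hypothetical network: a path a - b - c plus an isolated point d.
# Each neighbourhood contains its own centre, and membership is symmetric.
nbhd = {
    "a": {"a", "b"},
    "b": {"a", "b", "c"},
    "c": {"b", "c"},
    "d": {"d"},
}

print(ball(nbhd, "a", 1))
print(network_distance(nbhd, "a", "c"))
print(network_distance(nbhd, "a", "d"))
```

The early exit relies on the condition x ∈ Ωx, which makes the balls increase with r. On a finite network they must therefore stabilize, and a point not reached by then has infinite distance, so this example network is not connected.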

16.10.4 Remark: A fibre bundle may be defined on a network with a network topology by attaching
a copy of a fibre space to each element of the network. Then parallelism may be defined as symmetric,
transitive, path-dependent maps between the fibres. In the context of a network topology, a connection may
be defined as the parallelism relations between neighbouring points. From the connection, parallel transport
for general continuous paths may be generated by applying the transitivity rule.

Chapter 17
Metric spaces
17.1 Distance functions and balls
17.2 Set distance and set diameter
17.3 The topology induced by a metric
17.4 Continuous functions in metric spaces
17.5 Rectifiable sets, curves and paths
17.0.1 Remark: Metric spaces may be regarded as being in a higher concept layer than topological spaces.
This is because every metric space determines a unique corresponding topological space, whereas any metriz-
able topological space generally corresponds to an infinite number of metric spaces. Since a metric space
induces a unique topology, a metric space may freely import the definitions and theorems of general topology,
but not vice versa.
Since metric spaces are very close to human intuition, many textbooks introduce metric spaces before topo-
logical spaces to make life easier for the reader. In the long term, however, this is confusing because it
is difficult to forget the “facts” of metric spaces when learning the more general subject of topological
spaces. One’s intuition for metric spaces tends to impose itself on topological spaces, leading to many false
assumptions and much confusion.
One of the principal objectives of this book is to present the foundations of differential geometry in a
disciplined systematic manner in order to discourage the incorrect application of specific facts to more
general contexts. A similar danger of false generalization occurs in many differential geometry books which
present Riemannian manifolds before general differentiable manifolds and manifolds which have only an
affine connection. Commencing a book in the middle of a subject (between the low-level foundations and the
high-level applications) may be more popular in the short term, but it leads to confusion in the long term.
17.0.2 Remark: Metric spaces (in the context of topology) should not be confused with the concept of a
Riemannian metric (in the context of differential geometry). A Riemannian manifold is a particular kind of
differentiable metric space, and a Riemannian metric is a particular kind of differential of a two-point metric
on a manifold. In terms of conceptual layering, metric spaces are lower than Riemannian manifolds. Metric
spaces are a more general concept which is closely associated with general topology.
17.0.3 Remark: Just as the Riemannian metric tensor field simplifies many differential geometry defini-
tions, so also a two-point metric function simplifies many topology definitions. Many distinct concepts in
general topology become equivalent concepts when the topology is induced by a metric. A good example of
this is compactness. Several different definitions of compactness for general topological spaces are equivalent
for metric spaces.
17.0.4 Remark: Some of the topics in this chapter, such as Lipschitz functions and rectifiable curves are
more closely related to calculus than topology. It turns out that Lipschitz functions and rectifiable curves are
differentiable almost everywhere when the metric space is a manifold. Differentiability is a calculus concept
and “almost everywhere” is a measure theory concept, but the Lipschitz and rectifiability conditions may
be stated in the absence of such higher-layer concepts.


17.1. Distance functions and balls

17.1.1 Definition: A metric (function) or distance function for a set M is a function d : M × M → ℝ₀⁺
such that
(i) ∀x, y ∈ M, d(x, y) = 0 ⇔ x = y, (identity)
(ii) ∀x, y ∈ M, d(x, y) = d(y, x), (symmetry)
(iii) ∀x, y, z ∈ M, d(x, y) + d(y, z) ≥ d(x, z). (triangle inequality)
17.1.2 Definition: A metric space is a pair (M, d) where M is a set and d is a metric function on M . A
metric space (M, d) may be abbreviated as M .
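For a finite set, the three metric conditions can be tested exhaustively. The following sketch is only an illustration: the function name is_metric and the tolerance parameter tol (included to absorb floating-point rounding in the triangle inequality) are assumptions, not part of the text.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check identity, symmetry and the triangle inequality on a finite set."""
    pts = list(points)
    for x, y in product(pts, repeat=2):
        if d(x, y) < 0:                      # values must be non-negative
            return False
        if (d(x, y) == 0) != (x == y):       # (i) identity
            return False
        if d(x, y) != d(y, x):               # (ii) symmetry
            return False
    # (iii) triangle inequality: d(x, y) + d(y, z) >= d(x, z)
    return all(d(x, y) + d(y, z) >= d(x, z) - tol
               for x, y, z in product(pts, repeat=3))

def usual(x, y):
    return abs(x - y)

def squared(x, y):
    return (x - y) ** 2

print(is_metric(range(5), usual))     # absolute difference on {0, ..., 4}
print(is_metric(range(5), squared))   # fails: d(0,1) + d(1,2) = 2 < 4 = d(0,2)
```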
17.1.3 Remark: The Riemannian metric tensor in Section 38.3 may also be called simply “the metric”,
which could be confused with the metric in Definition 17.1.1. When there is a possibility of confusion, the
metric in Definition 17.1.1 may be referred to as a “point-to-point metric”, “two-point metric” or “distance
function”. The Riemannian concept may be called a “Riemannian metric” or “metric tensor”. The two
concepts are closely related, as discussed in Section 38.4. It turns out that the Riemannian metric is actually
a differential of the two-point distance function.
17.1.4 Definition: The usual metric on ℝⁿ for n ∈ ℤ₀⁺ is the function d : ℝⁿ × ℝⁿ → ℝ₀⁺ defined by
d(x, y) = (Σ_{i=1}^n (x_i − y_i)²)^{1/2} for x, y ∈ ℝⁿ.
17.1.5 Theorem: In a metric space (M, d), d(x, z) ≥ |d(x, y) − d(y, z)| for all x, y, z ∈ M .
17.1.6 Remark: Theorem 17.1.5 puts a lower bound on distances corresponding to the upper bound in
the triangle inequality. Combining the bounds gives
∀x, y, z ∈ M, |d(x, y) − d(y, z)| ≤ d(x, z) ≤ d(x, y) + d(y, z).
One may reconcile oneself to such inequalities by drawing lines and circles on paper. Theorems 17.1.12 and
17.1.14 suggest how to draw such lines and circles.
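The lower bound in Theorem 17.1.5 is stated without proof. One way to obtain it, sketched here using only the symmetry and triangle inequality conditions of Definition 17.1.1, is to apply the triangle inequality twice with the roles of the points permuted:

```latex
\begin{align*}
d(x,y) &\le d(x,z) + d(z,y) = d(x,z) + d(y,z)
  &&\Rightarrow\quad d(x,y) - d(y,z) \le d(x,z), \\
d(y,z) &\le d(y,x) + d(x,z) = d(x,y) + d(x,z)
  &&\Rightarrow\quad d(y,z) - d(x,y) \le d(x,z).
\end{align*}
```

Since both differences are bounded above by d(x, z), so is the absolute value |d(x, y) − d(y, z)|.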
17.1.7 Definition:
The open ball in a metric space (M, d) with centre x ∈ M and radius r ∈ ℝ̄₀⁺ is the set {y ∈ M ; d(x, y) < r}.
The closed ball in a metric space (M, d) with centre x ∈ M and radius r ∈ ℝ̄₀⁺ is the set {y ∈ M ; d(x, y) ≤ r}.
17.1.8 Notation:
Bx,r , for a metric space (M, d), x ∈ M and r ∈ ℝ̄₀⁺, denotes the open ball {y ∈ M ; d(x, y) < r} in M with
centre x and radius r.
B̄x,r , for a metric space (M, d), x ∈ M and r ∈ ℝ̄₀⁺, denotes the closed ball {y ∈ M ; d(x, y) ≤ r} in M with
centre x and radius r.
Br (x), for a metric space (M, d), x ∈ M and r ∈ ℝ̄₀⁺, is an alternative notation for Bx,r .
B̄r (x), for a metric space (M, d), x ∈ M and r ∈ ℝ̄₀⁺, is an alternative notation for B̄x,r .
17.1.9 Remark: Definition 17.1.7 is illustrated in Figure 17.1.1. General metric spaces are quite different
to ℝ², but such diagrams are often helpful for understanding general theorems.
Figure 17.1.1 Open and closed balls in a metric space
(panels: y ∈ Br (x) and y ∈ B̄r (x))

17.1.10 Remark: The words “open” and “closed” in Definition 17.1.7 are only loosely related to the
concepts of open and closed sets in the topology induced by the metric in Definition 17.3.3. Likewise, the
use of the bar in the notation for a closed ball does not generally mean that it is the topological closure
of the corresponding open ball. Even in the spaces ℝⁿ, closed balls with zero radius are not topological
closures of the corresponding open balls. In general metric spaces, the relation between open and closed
balls is even looser. Finite metric spaces provide ample examples of open balls which are strictly included
in the corresponding closed balls for positive radius.
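A small finite example makes the gap between open and closed balls concrete. The sketch below is an illustration under assumptions: it uses the hypothetical metric space M = {0, 1, 2, 3, 4} with the absolute-difference metric, and the function names are not from the text.

```python
def open_ball(points, d, x, r):
    """{y ; d(x, y) < r}"""
    return {y for y in points if d(x, y) < r}

def closed_ball(points, d, x, r):
    """{y ; d(x, y) <= r}"""
    return {y for y in points if d(x, y) <= r}

def usual(x, y):
    return abs(x - y)

M = set(range(5))

# Radius 1: the open ball is a single point, the closed ball has three points.
print(open_ball(M, usual, 2, 1), closed_ball(M, usual, 2, 1))

# Radius 0: the open ball is empty, the closed ball is the centre alone.
print(open_ball(M, usual, 2, 0), closed_ball(M, usual, 2, 0))
```

In a finite metric space every singleton is an open ball of small enough radius, so every subset is both open and closed in the induced topology. The closure of the open ball of radius 1 about the point 2 is therefore the one-point set itself, which is strictly included in the closed ball of the same radius.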
[ Should have numerous exercises for finite metric spaces and other metric spaces where properties expected
from ℝⁿ do not hold. ]
17.1.11 Remark: There are many ways to express the triangle inequality in terms of balls. Some examples
are given in Theorem 17.1.12. Parts (1) and (3) are illustrated in Figure 17.1.2.
Figure 17.1.2 Triangle inequality equivalents in Theorem 17.1.12 (1) and (3)
(panels: y ∈ B̄r1 (x) ⇒ B̄r2 (y) ⊆ B̄r1+r2 (x), and B̄r1 (x) ∩ B̄r2 (y) ≠ ∅ ⇒ y ∈ B̄r1+r2 (x))
17.1.12 Theorem: The following statements are equivalent to the triangle inequality for a function d :
M × M → ℝ₀⁺ on a set M which satisfies the identity and symmetry conditions (i) and (ii) in Definition 17.1.1.
(1) ∀x, y ∈ M, ∀r1 , r2 ∈ ℝ₀⁺, y ∈ B̄r1 (x) ⇒ B̄r2 (y) ⊆ B̄r1+r2 (x).
(2) ∀x, y ∈ M, ∀r1 , r2 ∈ ℝ₀⁺, ⋃_{y∈B̄r1 (x)} B̄r2 (y) ⊆ B̄r1+r2 (x).
(3) ∀x, y ∈ M, ∀r1 , r2 ∈ ℝ₀⁺, B̄r1 (x) ∩ B̄r2 (y) ≠ ∅ ⇒ y ∈ B̄r1+r2 (x).
(4) ∀x, y ∈ M, ∀r1 , r2 ∈ ℝ₀⁺, y ∉ B̄r1+r2 (x) ⇒ B̄r1 (x) ∩ B̄r2 (y) = ∅.
17.1.13 Remark: There are many ways to express the triangle inequality lower bound in Theorem 17.1.5 in
terms of balls. Some examples are given in Theorem 17.1.14. Parts (1) and (3) are illustrated in Figure 17.1.3.
Figure 17.1.3 Triangle inequality equivalents in Theorem 17.1.14 (1) and (3)
(panels: y ∉ Br1 (x) ⇒ B̄r2 (y) ∩ Br1−r2 (x) = ∅, and B̄r2 (y) ⊈ Br1 (x) ⇒ y ∉ Br1−r2 (x))

17.1.14 Theorem: The following statements are equivalent to the triangle inequality for a function d :
M × M → ℝ₀⁺ on a set M which satisfies the identity and symmetry conditions (i) and (ii) in Definition 17.1.1.
(1) ∀x, y ∈ M, ∀r1 ∈ ℝ₀⁺, ∀r2 ∈ [0, r1 ], y ∉ Br1 (x) ⇒ B̄r2 (y) ∩ Br1−r2 (x) = ∅.
(2) ∀x, y ∈ M, ∀r1 ∈ ℝ₀⁺, ∀r2 ∈ [0, r1 ], Br1−r2 (x) ∩ ⋃_{y∉Br1 (x)} B̄r2 (y) = ∅.
(3) ∀x, y ∈ M, ∀r1 ∈ ℝ₀⁺, ∀r2 ∈ [0, r1 ], B̄r2 (y) ⊈ Br1 (x) ⇒ y ∉ Br1−r2 (x).
(4) ∀x, y ∈ M, ∀r1 ∈ ℝ₀⁺, ∀r2 ∈ [0, r1 ], y ∈ Br1−r2 (x) ⇒ B̄r2 (y) ⊆ Br1 (x).
17.1.15 Remark: The inequalities and propositions in Definition 17.1.1 and Theorems 17.1.5, 17.1.12
and 17.1.14 may seem shallow. They become more interesting, however, when they are compared to the
corresponding inequalities and propositions for a pseudo-metric (or “hyperbolic metric”).
17.1.16 Remark: It is sometimes useful to define an open and closed annulus as in Definition 17.1.17
corresponding to the open and closed ball in Definition 17.1.7. The special cases where the inner radius is 0
may be referred to as a “punctured” open or closed ball. Notation 17.1.18 is (probably) non-standard.
17.1.17 Definition: The open annulus in a metric space (M, d) with centre x ∈ M and radius pair
r1 , r2 ∈ ℝ̄₀⁺ is the set {y ∈ M ; r1 < d(x, y) < r2 }.
The closed annulus in a metric space (M, d) with centre x ∈ M and radius pair r1 , r2 ∈ ℝ̄₀⁺ is the set
{y ∈ M ; r1 ≤ d(x, y) ≤ r2 }.
The punctured open ball in a metric space (M, d) with centre x ∈ M and radius r ∈ ℝ̄₀⁺ is the set
{y ∈ M ; 0 ≠ d(x, y) < r}.
The punctured closed ball in a metric space (M, d) with centre x ∈ M and radius r ∈ ℝ̄₀⁺ is the set
{y ∈ M ; 0 ≠ d(x, y) ≤ r}.
17.1.18 Notation:
Bx,r1 ,r2 , for a metric space (M, d), x ∈ M and r1 , r2 ∈ ℝ̄₀⁺, denotes the open annulus {y ∈ M ; r1 < d(x, y) <
r2 } in M with centre x and radius pair r1 , r2 .
B̄x,r1 ,r2 , for a metric space (M, d), x ∈ M and r1 , r2 ∈ ℝ̄₀⁺, denotes the closed annulus {y ∈ M ; r1 ≤
d(x, y) ≤ r2 } in M with centre x and radius pair r1 , r2 .
Ḃx,r , for a metric space (M, d), x ∈ M and r ∈ ℝ̄₀⁺, denotes the punctured open ball {y ∈ M ; 0 ≠ d(x, y) < r}
in M with centre x and radius r.
B̄˙x,r , for a metric space (M, d), x ∈ M and r ∈ ℝ̄₀⁺, denotes the punctured closed ball {y ∈ M ; 0 ≠ d(x, y) ≤
r} in M with centre x and radius r.
Br1 ,r2 (x), for a metric space (M, d), x ∈ M and r1 , r2 ∈ ℝ̄₀⁺, is an alternative notation for Bx,r1 ,r2 .
B̄r1 ,r2 (x), for a metric space (M, d), x ∈ M and r1 , r2 ∈ ℝ̄₀⁺, is an alternative notation for B̄x,r1 ,r2 .
Ḃr (x), for a metric space (M, d), x ∈ M and r ∈ ℝ̄₀⁺, is an alternative notation for Ḃx,r .
B̄˙r (x), for a metric space (M, d), x ∈ M and r ∈ ℝ̄₀⁺, is an alternative notation for B̄˙x,r .
17.1.19 Remark: There are, of course, many interrelationships between the various definitions of balls
and annuli. For example, Bx,r1 ,r2 = Bx,r2 \ B̄x,r1 and Ḃx,r = Bx,r \ B̄x,0 .
17.1.20 Remark: There is some ambiguity between Notations 17.1.8 and 17.1.18. For example, Bx,r and
Br1 ,r2 may be confused, particularly if M is the metric space of real numbers. Such clashes are usually easy
to clarify within the application context.
17.2. Set distance and set diameter

17.2.1 Definition: The distance between two sets A and B in a metric space (M, d) is the non-negative
extended real number d(A, B) ∈ ℝ₀⁺ ∪ {∞} defined by d(A, B) = inf{d(x, y); x ∈ A, y ∈ B}.
The distance between a point and a set in a metric space (M, d) is the non-negative extended real number
d(x, A) ∈ ℝ₀⁺ ∪ {∞} defined by d(x, A) = inf{d(x, y); y ∈ A} for points x ∈ M and A ⊆ M with A ≠ ∅.

17.2.2 Remark: If A = ∅ or B = ∅ in Definition 17.2.1, then d(A, B) = inf ∅ = ∞, which is okay. Similarly,
d(x, A) = ∞ if A = ∅. Otherwise, all distances are non-negative (finite) real numbers.
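For finite sets the infimum is a minimum, so the set distance and its empty-set convention can be sketched directly. The function names and the example metric below are assumptions for illustration; `min(..., default=math.inf)` encodes the convention inf ∅ = ∞.

```python
import math

def set_distance(d, A, B):
    """d(A, B) = inf{d(x, y) ; x in A, y in B}, with inf of the empty set equal to infinity."""
    return min((d(x, y) for x in A for y in B), default=math.inf)

def point_set_distance(d, x, A):
    """d(x, A) = inf{d(x, y) ; y in A}, with inf of the empty set equal to infinity."""
    return min((d(x, y) for y in A), default=math.inf)

def usual(x, y):
    return abs(x - y)

print(set_distance(usual, {0, 1}, {4, 7}))     # closest pair is (1, 4)
print(set_distance(usual, set(), {1}))         # empty set gives infinity
print(point_set_distance(usual, 5, {1, 2}))    # closest point is 2
```

Enlarging a set can only bring it closer to a given point, which is the finite case of the monotonicity property that relates d(x, A1) and d(x, A2) when A1 ⊆ A2.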
17.2.3 Theorem: Let A1 and A2 be subsets of a metric space (M, d) with A1 ⊆ A2 . Then d(x, A1 ) ≥
d(x, A2 ) for all x ∈ M . So {x ∈ M ; d(x, A1 ) < r} ⊆ {x ∈ M ; d(x, A2 ) < r}.
Let B1 , B2 be subsets with B1 ⊆ B2 . Then d(A1 , B1 ) ≥ d(A2 , B2 ).
17.2.4 Definition: The diameter of a set S ≠ ∅ in a metric space (M, d) is the non-negative extended
real number diam(S) = sup{d(x, y); x, y ∈ S}.
17.2.5 Remark: In Definition 17.2.4, the diameter of the empty set would be sup ∅ = −∞, which would
probably be annoying. Therefore it is not defined. Any non-empty set with a finite number of elements must
have a finite diameter. The diameter of an infinite set may be infinite.
17.2.6 Remark: It follows from the triangle inequality that diam(Bx,r ) ≤ 2r for all x ∈ M and r ∈ ℝ₀⁺.
Corresponding to diam(S) in Definition 17.2.4, one could also define a radius as radius(S) = inf{r ∈ ℝ ∪
{∞}; ∃x ∈ S, S ⊆ Bx,r }. However, although diam(S) ≤ 2 radius(S), the equality diam(S) = 2 radius(S)
does not hold for all metric spaces.
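A three-point space with the discrete metric shows the strict inequality. The sketch below is an illustration with assumed names. For a finite non-empty set S, the condition S ⊆ Bx,r holds exactly when r exceeds the largest distance from x to a point of S, so the infimum in the definition of radius(S) equals the smallest such largest distance (although it is not attained).

```python
def diameter(d, S):
    """diam(S) = sup{d(x, y) ; x, y in S} for a finite non-empty set S."""
    return max(d(x, y) for x in S for y in S)

def radius(d, S):
    """inf{r ; S is inside the open ball B_{x,r} for some x in S}, for finite non-empty S."""
    return min(max(d(x, y) for y in S) for x in S)

def discrete(x, y):
    return 0 if x == y else 1

def usual(x, y):
    return abs(x - y)

S = {"a", "b", "c"}
T = {0, 1, 2}

print(diameter(discrete, S), 2 * radius(discrete, S))   # strict inequality
print(diameter(usual, T), 2 * radius(usual, T))         # equality on a line segment
```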
17.2.7 Definition: A bounded subset of a metric space is a subset whose diameter is finite.
17.3. The topology induced by a metric

17.3.1 Remark: Out of all of the possible topologies which could be attached to a metric space, there is
a single canonical topology which is generated by the set of all open balls in the metric space. This is the
“topology induced by the metric”. Historically, metric spaces preceded topological spaces. But the history
of mathematics has countless cases of “re-founding” old concepts on the basis of new concepts which are
more general. This embedding of specific concepts in a more general framework is an integral part of the
way mathematicians think. It often happens that there are benefits from the embedding of concepts in more
general frameworks. The general frameworks suggest asking questions that one might otherwise not have
asked. However, it sometimes happens that embedding in more general frameworks just makes the original
concepts less comprehensible without any benefit. Generality for its own sake is sometimes a burden imposed
by well-meaning mathematicians who want to create a new territory. But sometimes the new territory is
barren. In the case of topology though, the more general framework has been enormously fruitful. There
are many theorems for metric spaces which are even more useful when applied to more general topological
spaces. The difficult thing, however, is to keep the theorems clear in one’s own mind. There are many
theorems for metric spaces which do not generalize much or at all to topological spaces. It is very important
to state clearly the assumptions upon which each theorem is based.
17.3.2 Remark: Just as a point-to-point metric (Definition 17.1.1) induces a canonical topology (Defini-
tion 17.3.3), so also a differential metric (the Riemannian metric) induces both a canonical topology and a
canonical parallelism. It turns out that a point-to-point metric on a differentiable manifold (Definition 26.3.6)
induces a Riemannian metric (under some reasonable assumptions), and this Riemannian differential metric
therefore induces a canonical topology and a canonical parallelism. Fortunately it turns out that the induced
topology and induced parallelism are the same no matter which path you arrive at them by (under some
reasonable assumptions).
17.3.3 Definition: The topology induced by a metric d on a set M is the topology generated by the set
of all open balls with positive radius in the metric space (M, d).
The topology of a metric space (M, d) is the topology induced by d on M . This topology may be denoted
as Top(M ) when the metric d is implicit.
17.3.4 Remark: The topology on a metric space (M, d) can be written explicitly as
Top(M ) = { ⋃_{i∈S} Bxi ,ri ; x : S → M and r : S → ℝ⁺ },
where the open balls Bxi ,ri are as in Notation 17.1.8.

17.3.5 Remark: The topology in Definition 17.3.3 is well-defined because an arbitrary set of subsets of
any set M will always generate a topology on M . This does not necessarily mean that the topology will
have any nice properties. For example, if d(x, y) = 0 for all x, y ∈ M , the topology will be trivial. (See
Definition 14.3.18 for the trivial topology.)
17.3.6 Remark: When there are two or more choices of a metric on a given set M , one could use a notation
such as Topd (M ) to indicate the topology on M for a particular metric d. But this could be confused with
the notation Topx (M ) for the set of open neighbourhoods of x in Top(M ). A better notation for the topology
induced by (M, d) would be Top(M, d).
17.3.7 Remark: All of the definitions for topological spaces apply also to metric spaces by referring to the
induced topology. Thus a metric space is said to be paracompact if the induced topology is paracompact,
and so forth. Similarly, continuity between metric spaces, and between metric spaces and topological spaces,
is defined as if the metric function were replaced with the induced topology, as superfluously presented in
Definition 17.4.1.
17.3.8 Remark: Open balls in a metric space are automatically open sets by Definition 17.3.3.
17.3.9 Theorem: Closed balls in a metric space are closed sets in the induced topology.
Proof: Consider the closed ball S = {y ∈ M ; d(x, y) ≤ r}. If r = ∞, then S = M , which is closed.
So assume that r < ∞. If z ∈ M \ S, then d(x, z) > r. Therefore by Theorem 17.1.5, Bz,ε ⊆ M \ S
with ε = d(x, z) − r. But Bz,ε is an open set. So M \ S is an open set. Therefore S is closed.
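The inclusion of the ball about z in M \ S is the only step of this proof that uses Theorem 17.1.5. Written out as a sketch, with w denoting an arbitrary point of that ball (the symbol w is introduced here for illustration), the estimate is:

```latex
\[
  d(x,w) \;\ge\; |d(x,z) - d(z,w)| \;\ge\; d(x,z) - d(z,w)
  \;>\; d(x,z) - \varepsilon \;=\; r
  \qquad\text{for all } w \in B_{z,\varepsilon}.
\]
```

Hence d(x, w) > r, so no point of the ball about z lies in S.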
17.3.10 Remark: A closed ball in Definition 17.1.7 includes, but is not necessarily equal to, the closure
of the corresponding open ball. Since a closed ball {y ∈ M ; d(x, y) ≤ r} is a closed set, it follows from
Definition 14.5.4 for the closure of a set that the closure of the open ball Bx,r is included in
{y ∈ M ; d(x, y) ≤ r}. A discrete space such as the integers with the usual metric provides ample
counterexamples to the reverse inclusion.
17.3.11 Theorem: Let (M, d) be a metric space. Then for any A ⊆ M , x ∈ M , and r ∈ ℝ̄₀⁺,
(1) Bx,r ∩ A = ∅ ⇔ d(x, A) ≥ r;
(2) Bx,r ∩ A ≠ ∅ ⇔ d(x, A) < r ⇔ ∃y ∈ A, d(x, y) < r.
Hence (or otherwise), ⋃_{x∈A} Bx,r = {x ∈ M ; d(x, A) < r} for any set A ⊆ M and r ∈ ℝ̄₀⁺.
17.3.12 Theorem: For points x and closed sets K in a metric space (M, d), x ∈ K ⇔ d(x, K) = 0. In
other words, if K is closed then K = {x ∈ M ; d(x, K) = 0}.
Proof: Since K is closed, M \ K is open. Therefore
x ∉ K ⇔ x ∈ M \ K
⇔ ∃r > 0, Bx,r ⊆ M \ K
⇔ ∃r > 0, Bx,r ∩ K = ∅
⇔ ∃r > 0, d(x, K) ≥ r
⇔ d(x, K) > 0.
The theorem follows immediately.
17.3.13 Theorem: The interior of any set S in a metric space (M, d) is equal to the union of all open
balls which are included in S. In other words, Int(S) = ⋃{Bx,r ; x ∈ S, r > 0, Bx,r ⊆ S}.
17.3.14 Remark: Definitions and properties of the interior, closure, exterior and boundary of sets in a
general topological space are presented in Sections 14.5 and 14.6. The correspondence between the metric on
a metric space M and these set components in the induced topology on M is expressed in Theorem 17.3.15
in terms of the set distance function.

17.3.15 Theorem: Let S be a subset of a metric space M with metric d. Then:

(1) Int(S) = {x ∈ M ; d(x, M \ S) > 0}.
(2) S̄ = {x ∈ M ; d(x, S) = 0}.
(3) Ext(S) = {x ∈ M ; d(x, S) > 0}.
(4) Bdy(S) = {x ∈ M ; d(x, S) = 0 ∧ d(x, M \ S) = 0}.
17.3.16 Theorem: The closure Ā of a set A in a metric space (M, d) satisfies
Ā = {x ∈ M ; d(x, A) = 0}
= ⋂_{r>0} ⋃_{x∈A} Bx,r.
Proof: It follows from Theorems 17.3.12 and 17.2.3 and the fact that Ā is closed, that Ā = {x ∈
M ; d(x, Ā) = 0} ⊇ {x ∈ M ; d(x, A) = 0}.
For any A ⊆ M , the set {x ∈ M ; d(x, A) > 0} is open because for any x ∈ M with d(x, A) > 0, Bx,d(x,A) ⊆
{x ∈ M ; d(x, A) > 0}. So {x ∈ M ; d(x, A) = 0} = M \ {x ∈ M ; d(x, A) > 0} is closed. Therefore
Ā ⊆ {x ∈ M ; d(x, A) = 0}. This verifies the first equality of the theorem.
To show the second equality, let Sr = ⋃_{x∈A} Bx,r for r > 0. Then Ā ⊆ {x ∈ M ; d(x, A) < r} = Sr by
Theorem 17.3.11. So Ā ⊆ ⋂_{r>0} Sr. Now suppose that y ∉ Ā. Then d(y, A) > 0. Let r = d(y, A). Then
y ∉ Sr = {x ∈ M ; d(x, A) < r}. So y ∉ ⋂_{r>0} Sr. Hence ⋂_{r>0} Sr ⊆ Ā. This verifies the second equality.
[ Originally I proved half of the first equality of Theorem 17.3.16 as in the Remark 17.3.17. But then I realized
that it can be done in a single line. I’ll delete this remark as soon as I’m totally certain that it’s a total
waste of space. ]
17.3.17 Remark: By Definition 14.5.4, Ā = ⋂{K ∈ IP(M); K is closed and A ⊆ K}. Let K be closed
in M with A ⊆ K. Then K = {x ∈ M ; d(x, K) = 0} by Theorem 17.3.12. By Theorem 17.2.3, d(x, A) ≥
d(x, K) for all x ∈ M . Therefore {x ∈ M ; d(x, A) = 0} ⊆ {x ∈ M ; d(x, K) = 0} = K. So {x ∈ M ; d(x, A) =
0} ⊆ Ā.
The following calculation is some sort of alternative proof of the second equality in Theorem 17.3.16.
x ∈ S̄ ⇔ d(x, S) = 0
⇔ inf{d(x, y); y ∈ S} = 0
⇔ ∀r > 0, ∃y ∈ S, d(x, y) < r
⇔ ∀r > 0, ∃y ∈ S, x ∈ By,r
⇔ ∀r > 0, x ∈ ⋃_{y∈S} By,r
⇔ x ∈ ⋂_{r>0} ⋃_{y∈S} By,r.
[ I’ll get Remark 17.3.17 sorted out when I get a bit of spare time. ]
17.3.18 Theorem: A subset of a metric space is compact if and only if it is sequentially compact.
Proof: See Simmons [140], page 123, for a proof that sequentially compact implies compact, and pages
120–124 for a proof of the converse.
[ Make sure the proofs of equivalence of compact and sequentially compact do not use the axiom of choice. ]
17.3.19 Theorem: All compact subsets of a metric space are closed and bounded.
17.3.20 Remark: The converse of Theorem 17.3.19 is not true. A closed, bounded subset of a metric
space is not necessarily compact. (See Simmons [140], page 115.)
17.3.21 Theorem: A subset of IRn (with the usual metric) is compact if and only if it is closed and
bounded.

17.3.22 Definition: A Lebesgue number for an open cover (Ωi)i∈I of a set X in a metric space (M, d) is
a positive real number λ such that ∀A ⊆ X, ((diam(A) < λ) ⇒ (∃i ∈ I, A ⊆ Ωi)); in other words, every
subset of X with diameter less than λ is fully included within at least one of the covering sets Ωi.
Thus an open cover (Ωi )i∈I of a set X in a metric space (M, d) is said to have a Lebesgue number if
∃λ > 0, ∀A ⊆ X, (diam(A) < λ) ⇒ (∃i ∈ I, A ⊆ Ωi ).
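A candidate Lebesgue number can be estimated numerically for a simple cover. The sketch below is not from the text. It assumes X = [0, 1] in IR, a hypothetical cover by two open intervals, and the standard construction λ = inf over x ∈ X of max over i of d(x, IR \ Ωi), approximated by a minimum over a finite grid. The check at the end tests only two-point subsets of the grid, so it is a sanity check and not a proof.

```python
def depth(x, interval):
    """Distance from x to the complement of the open interval (a, b); zero if x is outside."""
    a, b = interval
    return max(0.0, min(x - a, b - x))

def lebesgue_number_estimate(cover, grid):
    """Minimum over grid points x of the largest depth of x in any covering interval."""
    return min(max(depth(x, om) for om in cover) for x in grid)

def pairs_respect(cover, grid, lam):
    """Check the defining condition on two-point subsets {x, y} of the grid only."""
    for x in grid:
        for y in grid:
            if abs(x - y) < lam:
                if not any(a < x < b and a < y < b for a, b in cover):
                    return False
    return True

cover = [(-0.1, 0.6), (0.4, 1.1)]        # hypothetical open cover of X = [0, 1]
grid = [k / 200.0 for k in range(201)]   # finite sample of X
lam = lebesgue_number_estimate(cover, grid)

print(lam)
print(pairs_respect(cover, grid, lam))   # the estimate passes the two-point check
print(pairs_respect(cover, grid, 0.5))   # 0.5 is too large for this cover
```

For this particular cover, any subset of diameter less than the overlap length 0.2 lies in one of the two intervals, so the estimate is a valid but not optimal Lebesgue number.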
17.3.23 Remark: Since compactness and sequential compactness are equivalent in a metric space, The-
orem 17.3.24 implies that all open covers of compact sets in metric spaces have Lebesgue numbers. This
is useful for showing that rectifiability for compact-domain paths in general topological manifolds is well-
defined.
17.3.24 Theorem: Every open cover of a sequentially compact set in a metric space has a Lebesgue
number.
Proof: See Simmons [140], page 122.
17.3.25 Theorem: All metric spaces are paracompact. [ Prove this. ]
17.3.26 Remark: The Heine-Borel theorem requires the definition of bounded sets, which requires a metric.
Therefore it cannot be stated in an earlier chapter, even though the topology on the real numbers is
defined in Section 14.10.
The Heine-Borel theorem is sometimes proved with the aid of the Axiom of Choice. (E.g. See Simmons [140],
pages 113–119.) It does not seem that this is necessary. Theorem 17.3.27 is proved here (hopefully) without
any use of AC. (See Taylor [145], page 30.)
[ Unfortunately, I have read that Heine-Borel cannot be proved in ZF without AC. Bother! On the other
hand, probably that refers to general metric spaces, not Euclidean spaces. Get references for this. ]
17.3.27 Theorem (Heine-Borel): All bounded, closed subsets of IR are compact.
Proof: First show that any bounded, closed interval [a, b] is compact. Let C be an open cover of [a, b].
Define S = {x ∈ [a, b]; ∃C1 ⊆ C, C1 is finite and C1 covers [a, x]}. Then S ≠ ∅ since a ∈ S, and S ⊆ [a, b].
So c = sup(S) is well defined and c ∈ [a, b]. Since C covers [a, b], c ∈ G for some G ∈ C. By definition of
the topology on IR, there is an open interval (d, e) ⊆ IR such that c ∈ (d, e) ⊆ G. By definition of S, the
interval [a, d′] is covered by a finite subcover C1 of C, where d′ = max(a, d). Let C2 = C1 ∪ {G}. Then C2 is
a finite subset of C which covers [a, x] for all x ∈ [a, b] with x < e. If c < b, this contradicts the definition
of c as the supremum of S. So c = b, and C2 is a finite subcover of C for [a, b]. So the assertion that [a, b]
is compact follows. (See Figure 17.3.1.)
Figure 17.3.1 Proof of Heine-Borel Theorem 17.3.27
In the case of a general bounded, closed subset K of IR, there is a closed interval [a, b] ⊆ IR with K ⊆ [a, b].
By Theorem 15.7.6, any closed subset of a compact set is compact. So the theorem follows.
17.3.28 Theorem: A subset of IRn with the usual topology is compact if and only if it is closed and
bounded.
17.3.29 Remark: The definition of compactness in terms of the existence of finite subcovers of open covers
(Definition 15.7.4) is equivalent in a metric space to sequential compactness (Definition 15.7.11), which is
also equivalent to the Bolzano-Weierstraß property. The statement that the Euclidean metric spaces IRn
have the Bolzano-Weierstraß property is called the Bolzano-Weierstraß Theorem.
17.3.30 Definition: A metric space is said to have the Bolzano-Weierstraß property if every infinite subset
has a limit point.
[ It seems like the Bolzano-Weierstraß theorem probably does not need AC. Must check. ]

17.4. Continuous functions in metric spaces

17.4.1 Definition: A continuous function from a metric space (M, d) to a topological space (X, TX ) is a
function f : M → X which is continuous with respect to the topologies Top(M ) and TX .
Continuity of a function f : X → M is defined with respect to TX and Top(M ) respectively.
A function f : M1 → M2 for metric spaces (M1 , d1 ) and (M2 , d2 ) is said to be continuous if f is continuous
with respect to Top(M1 ) and Top(M2 ).
17.4.2 Remark: Theorem 17.4.3 states that in a metric space, ε-δ continuity is the same as continuity
with respect to the induced topology of the metric space.
The ε-δ definition of limits and continuity is attributed to Karl Theodor Wilhelm Weierstraß by Bell [190],
page 294 and Bynum et alia [192], page 15. In an earlier time, continuity was defined intuitively. This made
it difficult to arrive at consensus on which kinds of functions were continuous. But more importantly, the
absence of an objective definition of continuity made deductive arguments impossible. As soon as a purely
logical expression for continuity was discovered and agreed upon, rapid progress in the subject was possible.
This shows the importance of replacing intuition with objective definitions.
17.4.3 Theorem: A function f : M1 → M2 between metric spaces (M1 , d1 ) and (M2 , d2 ) is continuous if
and only if
∀x ∈ M1, ∀ε > 0, ∃δ > 0, ∀y ∈ M1, d1(x, y) < δ ⇒ d2(f(x), f(y)) < ε. (17.4.1)
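Condition (17.4.1) can be explored numerically for a concrete function. The sketch below assumes M1 = M2 = IR with the usual metric and the hypothetical example f(x) = x². The choice δ = min(1, ε/(2|x| + 1)) is an assumption of this example: it works because |x² − y²| = |x − y| |x + y| < δ(2|x| + 1) ≤ ε whenever |x − y| < δ ≤ 1. Sampling finitely many points y can only refute the condition, never prove it.

```python
def f(x):
    """Hypothetical example function f(x) = x^2 on the real line."""
    return x * x

def delta_for(x, eps):
    """A delta that works for f at the point x; note that it depends on x."""
    return min(1.0, eps / (2.0 * abs(x) + 1.0))

def check_eps_delta(x, eps, samples=1000):
    """Sample points y with |x - y| < delta and test |f(x) - f(y)| < eps."""
    delta = delta_for(x, eps)
    for k in range(-samples + 1, samples):
        y = x + delta * k / samples   # all sampled y satisfy |x - y| < delta
        if not abs(f(x) - f(y)) < eps:
            return False
    return True

print(check_eps_delta(0.0, 0.1))
print(check_eps_delta(100.0, 1e-3))
print(delta_for(0.0, 1e-3), delta_for(100.0, 1e-3))
```

The last line shows that the δ chosen here shrinks as |x| grows, which is the point-dependence that distinguishes ordinary continuity from uniform continuity.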
17.4.4 Remark: Condition (17.4.1) in Theorem 17.4.3 is expressed in Theorem 17.4.5 in terms of open
balls. The ball notation requires superscripts to indicate in which metric space the balls are defined, although
these can generally be guessed from the context.
17.4.5 Theorem: A function f : M1 → M2 between metric spaces (M1 , d1 ) and (M2 , d2 ) is continuous if
and only if
∀x ∈ M1, ∀ε > 0, ∃δ > 0, f(B^1_{x,δ}) ⊆ B^2_{f(x),ε}. (17.4.2)
[ Near here, specialize Definition 14.12.15 to metric spaces, using distances to define limits instead of neigh-
bourhoods. Possibly also specialize Theorem 14.12.20 to metric spaces similarly. ]
17.4.6 Remark: For any continuous function f : M1 → M2, for metric spaces (M1, d1) and (M2, d2), one
may construct a function Ef : M1 × IR_0^+ → IR_0^+ ∪ {∞} by Ef(x, δ) = inf{r ∈ IR_0^+ ∪ {∞}; f(B^1_{x,δ}) ⊆ B^2_{f(x),r}}.
The characteristics of this function are the basis of various specialized definitions of continuity such as Hölder
continuity.
17.4.7 Remark: Uniform continuity is specified in Definition 17.4.8 by swapping some of the quantifiers
in Theorem 17.4.3. The condition is equivalent to ∀ε > 0, ∃δ > 0, ∀x ∈ M1 , f (Bx,δ ) ⊆ Bf (x),ε , which is
equivalent to requiring that the function Ef in Remark 17.4.6 satisfies ∀ε > 0, ∃δ > 0, ∀x ∈ M1, Ef (x, δ) ≤ ε.
[ Define modulus of continuity. See EDM2 [35], page 317, section 84.A. ]
17.4.8 Definition: A uniformly continuous function f : M1 → M2 from a metric space (M1 , d1 ) to a

metric space (M2 , d2 ) is a function f : M1 → M2 which satisfies
∀ε > 0, ∃δ > 0, ∀x, y ∈ M1 , d1 (x, y) < δ ⇒ d2 (f (x), f (y)) < ε.
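The effect of moving the quantifier ∀x inside ∃δ can be seen on a function which is continuous but not uniformly continuous. The sketch below assumes M1 = M2 = IR with the usual metric and the hypothetical example f(x) = x². For any proposed δ it produces a point x at which two arguments closer than δ have images at least ε apart. The names gap and witness are illustrative only.

```python
def f(x):
    """Hypothetical example: f(x) = x^2 is continuous on IR but not uniformly continuous."""
    return x * x

def gap(x, delta):
    """|f(x + delta/2) - f(x)|; the two arguments are at distance delta/2 < delta."""
    return abs(f(x + delta / 2.0) - f(x))

def witness(eps, delta):
    """A point x where the proposed delta fails for eps: gap = x*delta + delta^2/4 > eps."""
    return eps / delta

for delta in (1.0, 0.1, 1e-3):
    x = witness(1.0, delta)
    print(delta, x, gap(x, delta))
```

The smaller the proposed δ, the further out the witness point must be, but one always exists. So no single δ serves all x for ε = 1.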
17.4.9 Theorem: For any metric spaces (M1 , d1 ) and (M2 , d2 ), if K is a compact subset of M1 and
f : K → M2 is continuous, then f is uniformly continuous.
Proof: See Simmons [140], page 124, or Taylor [145], page 37.

17.4.10 Definition: A Lipschitz (continuous) or Lipschitzian function from a metric space (M1 , d1 ) to a
metric space (M2 , d2 ) is a function f : M1 → M2 such that
∃K ∈ IR_0^+, ∀x, y ∈ M1, d2(f(x), f(y)) ≤ K d1(x, y). (17.4.3)
A Lipschitz constant for a Lipschitz function f is any K ∈ IR_0^+ such that (17.4.3) holds.
17.4.11 Notation: Lip(f ) denotes the infimum of all Lipschitz constants for a Lipschitz function f .
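Sampling gives a lower bound for Lip(f), since every Lipschitz constant K must dominate every ratio d2(f(x), f(y))/d1(x, y) with x ≠ y. The following sketch is an illustration under assumptions: the metrics are the usual ones on IR, the sample points are arbitrary, and the function name lipschitz_lower_bound is hypothetical. It cannot give an upper bound, because unsampled pairs might have larger ratios.

```python
import itertools
import math

def lipschitz_lower_bound(f, points, d1, d2):
    """Largest ratio d2(f(x), f(y)) / d1(x, y) over sampled pairs with x != y.

    Every Lipschitz constant K for f must be at least this value."""
    best = 0.0
    for x, y in itertools.combinations(points, 2):
        dist = d1(x, y)
        if dist > 0:
            best = max(best, d2(f(x), f(y)) / dist)
    return best

def usual(a, b):
    """Usual metric on IR."""
    return abs(a - b)

sin_points = [k / 100.0 for k in range(-300, 301)]
sqrt_points = [k / 400.0 for k in range(401)]

# sin has Lipschitz constant 1, so the sampled bound stays below 1.
print(lipschitz_lower_bound(math.sin, sin_points, usual, usual))
# sqrt on [0, 1] is not Lipschitz: the bound grows as the grid is refined near 0.
print(lipschitz_lower_bound(math.sqrt, sqrt_points, usual, usual))
```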
[ Define modulus of continuity and show how this relates to Lipschitz and uniform continuity. ]
17.4.12 Theorem: For any two metric spaces (M1 , d1 ) and (M2 , d2 ), all Lipschitz functions f : M1 → M2
are uniformly continuous, and all uniformly continuous functions f : M1 → M2 are continuous.
17.4.13 Remark: Theorem 17.4.14 specializes Theorem 17.4.3 to the case where the metric spaces are
Euclidean tuple spaces with the usual metric specified in Definition 17.1.4.
[ Should generalize Theorem 17.4.14 to domains which are open subsets of IRn . This requires the induced
metric and induced topology on open subsets. This theorem is used in the proof of Theorem 18.2.15. ]
17.4.14 Theorem: A function f : IRn → IRm for m, n ∈ ℤ_0^+ is continuous if and only if
∀x ∈ IRn, ∀ε > 0, ∃δ > 0, ∀y ∈ IRn, |x − y| < δ ⇒ |f(x) − f(y)| < ε.
17.5. Rectifiable sets, curves and paths

17.5.1 Remark: Rectifiable paths are a natural basis for the definition of parallelism in differentiable man-
ifolds. The parallel transport of fibres between base points of a fibre bundle is closely related to integration.
Therefore the paths which are used for parallel transport must be almost everywhere differentiable, and that
is exactly what rectifiable paths provide.
17.5.2 Remark: Rectifiable paths are well defined in a general metric space. A more general concept is
the k-rectifiable set introduced in Definition 17.5.3.
17.5.3 Definition: A k-rectifiable set in a metric space (M, d) for k ∈ ℤ_0^+ is a set X ⊆ M such that
f (S) = X for some bounded set S ⊆ IRk and Lipschitz function f : S → M.
17.5.4 Remark: The k-rectifiable sets in Definition 17.5.3 are clearly arbitrary subsets of images of k-
rectangles under Lipschitz maps from IRk to M . So all subsets of a k-rectifiable set are also k-rectifiable.
It is equally clear that the union of any finite number of k-rectifiable sets must be a k-rectifiable set. (See
Federer [106], page 251, for further related definitions.) The map f in Definition 17.5.3 may be thought of
as rectifying the set X, because under the inverse map f⁻¹, the set X is mapped to a flat set, which in some
sense “rectifies” it; that is, it is “straightened” by the inverse map.
17.5.5 Definition: A rectifiable curve in a metric space (M, d) is a curve γ : I → M such that γ ◦ β⁻¹ is
Lipschitz continuous for some homeomorphism β : I → J, for some bounded interval J ⊆ IR.
17.5.6 Remark: The reparametrization β of a curve γ in Definition 17.5.5 is illustrated in Figure 17.5.1.
The requirement that the interval J be bounded is essential.
Figure 17.5.1 Reparametrization of rectifiable curve

17.5.7 Theorem: Any subset of the image of a rectifiable curve in a metric space is a 1-rectifiable set.
Proof: The image of a rectifiable curve γ : I → M in a metric space (M, d) is the image of a Lipschitz map
γ ◦ β⁻¹ : J → M for some bounded interval J ⊆ IR and homeomorphism β : I → J. So Range(γ) = Range(γ ◦ β⁻¹)
is a 1-rectifiable set. Any subset of a 1-rectifiable set is 1-rectifiable. So the theorem follows.
17.5.8 Remark: The converse to Theorem 17.5.7 is not generally true. If a set X ⊆ M is 1-rectifiable,
then there is a bounded set S ⊆ IR such that f (S) = X for some Lipschitz function f : S → M . Let I =
[inf S, sup S] ⊆ IR be the smallest closed interval which includes S. To show the converse of Theorem 17.5.7,
one must show that f can be extended as a Lipschitz function to all of I. An obvious counterexample is
the metric space (M, d) where M = ℤ and d : (x, y) ↦ |x − y|. Let S = X = {0, 1} and f : x ↦ x. Then
I = [0, 1], but there is no Lipschitz extension f : I → M . The converse would be true if there exists a
Lipschitz curve connecting every pair of points in the metric space.
17.5.9 Definition: The length of a curve γ ≠ ∅ in a metric space (M, d) is L(γ) defined by
L(γ) = sup{ Σ_{i=1}^n d(γ(xi), γ(xi−1)); x = (xi)_{i=0}^n ∈ Sγ } ∈ \bar{IR}_0^+,
where
Sγ = { (ai)_{i=0}^n ; n ∈ ℤ^+, Range(a) ⊆ Dom(γ), a is non-decreasing }.
The length of an ordered traversal γ ≠ ∅ in (M, d) is defined identically to the length of a curve.
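The supremum in Definition 17.5.9 can be approached from below by evaluating the sum for finer and finer sequences. The sketch below assumes a hypothetical example curve, the unit circle in IR² parametrized on [0, 2π], with the Euclidean metric and uniformly spaced sequences. Each sum is only a lower bound for L(γ).

```python
import math

def polygonal_sum(gamma, d, a):
    """Sum of d(gamma(a[i]), gamma(a[i-1])) for a non-decreasing sequence a in Dom(gamma)."""
    return sum(d(gamma(a[i]), gamma(a[i - 1])) for i in range(1, len(a)))

def euclid(p, q):
    """Euclidean metric on the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def circle(t):
    """Hypothetical example curve: the unit circle on the domain [0, 2*pi]."""
    return (math.cos(t), math.sin(t))

def uniform_sequence(n):
    """The non-decreasing sequence (2*pi*i/n) for i = 0, ..., n."""
    return [2.0 * math.pi * i / n for i in range(n + 1)]

sums = [polygonal_sum(circle, euclid, uniform_sequence(n)) for n in (4, 16, 64, 256)]
for s in sums:
    print(s)
print(2.0 * math.pi)
```

The sums increase with refinement and stay below 2π, which is consistent with the circle having length 2π as the supremum.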
17.5.10 Theorem: The length of a curve is invariant under reparametrization. That is, for any metric
space (M, d), for any curve γ : I → M , for any interval J ⊆ IR, for any homeomorphism β : I → J, the
curve γ ◦ β⁻¹ : J → M satisfies L(γ ◦ β⁻¹) = L(γ).
17.5.11 Remark: The curve length in Definition 17.5.9 uses only the total order on the domain of the
curve γ. Therefore the length may be generalized to general “ordered traversals”, which were introduced in
the non-standard Definition 7.1.17. See Definition 7.1.13 for order isomorphisms.
17.5.12 Theorem: The length of an ordered traversal is invariant under order isomorphisms. That is, for
any metric space (M, d), for any totally ordered sets I and J, for any ordered traversal γ : I → M , for any
order isomorphism β : I → J, L(γ) = L(γ ◦ β⁻¹).
17.5.13 Theorem: A curve γ in a metric space (M, d) is rectifiable if and only if L(γ) < ∞.
Proof: First let γ : I → M be a Lipschitz map from a bounded interval I to a metric space (M, d).
Then for every sequence of points (ai)_{i=0}^n in the set Sγ for γ in Definition 17.5.9,
Σ_{i=1}^n d(γ(ai), γ(ai−1)) ≤ Σ_{i=1}^n Lip(γ)(ai − ai−1) ≤ Lip(γ) diam(I). So L(γ) < ∞.

Now assume that a curve γ : I → M is a rectifiable curve in a metric space (M, d). Then by Definition 17.5.5,
there exists a bounded interval J ⊆ IR and a homeomorphism β : I → J such that γ ◦ β⁻¹ : J → M is a
Lipschitz map. Let γ̃ = γ ◦ β⁻¹. Then for every sequence of points (ai)_{i=0}^n in Sγ,
Σ_{i=1}^n d(γ(ai), γ(ai−1)) = Σ_{i=1}^n d(γ̃(β(ai)), γ̃(β(ai−1))) ≤ Σ_{i=1}^n Lip(γ̃)|β(ai) − β(ai−1)| ≤ Lip(γ̃) diam(J).
So L(γ) < ∞ again.
To show the converse, it is necessary for any γ : I → M with L(γ) < ∞ to construct a homeomorphism
β : I → J such that γ ◦ β⁻¹ : J → M is Lipschitz continuous and J is bounded. To do this, note that the
length L(γ) is a non-decreasing function of the curve γ in the sense that L(γ|K) ≤ L(γ) for any restriction
γ|K of γ to a subinterval K of I. For t ∈ I, define Kt = {x ∈ I; x ≤ t}. Then L(γ|Kt) is non-negative, finite
and non-decreasing with respect to t. Define J = [0, L(γ)], and define β : I → J by β : t ↦ L(γ|Kt).
This function β does not quite complete the proof because it may not be a homeomorphism. The curve may
be constant on subintervals of I. This can be fixed by adding a function such as tanh(t) to the function β.
Alternatively, the interval I may be mapped into a bounded interval such as [0, 1] and a term such as kt
may then be added to β for some small positive k.

17.5.14 Remark: Theorem 17.5.13 shows that Definition 17.5.5, which is a topological definition of
rectifiable curves, is equivalent to the finite-length condition based on Definition 17.5.9, which is apparently
entirely non-topological. The definition of length requires only the total order on the domain of the curve γ.
This suggests that the length and rectifiability of a curve may be generalized to the class of all maps from
totally ordered sets to the metric space. (Similar generalizations are presumably possible for k-rectifiability
and families of curves.) However, since the metric structure is required on the target space for curves,
rectifiability cannot be defined for general topological spaces.
17.5.15 Remark: Since the equivalent condition for rectifiability of a curve in Theorem 17.5.13 is clearly
independent of the parametrization (and incidentally the orientation also) of a curve, it follows that curves γ
with the same path [γ]0 must either be all rectifiable or all not rectifiable. Therefore one may define rectifiable
paths unambiguously as in Definition 17.5.16. The length of a curve is independent of parametrization. So
the length of a path in Definition 17.5.17 is well-defined.
17.5.16 Definition: A rectifiable path in a metric space (M, d) is a path Q in M whose representative
curves γ ∈ Q are all rectifiable curves in M .
17.5.17 Definition: The length of a rectifiable path Q in a metric space (M, d) is the length of any
representative curve γ ∈ Q. The length of a path Q may be denoted as L(Q).
17.5.18 Remark: The rectifiability and length properties of a curve depend only on the sequence of points
traversed and do not really require any continuous map. If the curve is simple, then a total order may be
defined on the image of the curve to specify the traversal, and the rectifiability and length may be recovered
from this. This follows from the fact that Theorem 17.5.13 uses the curve map only to determine the ordering
of points. For simple curves, this implies that the curve map could be replaced in definitions of rectifiability
and length with a total order. This could then be used in a parametrization-free definition of a directed path.
This is not done here because it only works for simple paths (which is inconvenient for defining parallelism),
and because it is too difficult to specialize to differentiable paths.
[ Define a distance parametrization for all curves, and normalize it to start at 0 if it is closed on the left. Show
that diam(Im(γ)) ≤ L(γ). ]
[ Define pseudo-metrics as hyperbolic versions of two-point metrics. ]

Chapter 18
Differential calculus
18.1 Infinitesimals
18.2 Differentiation for one variable
18.3 Unidirectional differentiability of real-to-real functions
18.4 Higher-order derivatives for real-to-real functions
18.5 Differentiation for several variables
18.6 Higher-order derivatives for several variables
18.7 Some differentiability-based function spaces
18.8 Differentiation for abstract linear spaces
18.9 Hölder continuity
Some basic calculus topics are summarized here for quick reference.
18.0.1 Remark: Calculus is a set of calculation rules for differentiating and integrating functions. Analysis
deals with limits, convergence, existence, regularity and other more advanced topics. The topics “calculus”
and “analysis” have a lot of overlap. Both words suffer some ambiguity because they are used with different
meanings in different contexts.
18.1. Infinitesimals
18.1.1 Remark: A vector may represent either a two-point displacement or a single-point infinitesimal
displacement. The application of linear spaces to two-point displacements goes back to Descartes. (See
EDM2 [35], section 101.) This involved labelling each point in space with numerical coordinates. Geometry
before the time of Descartes was mostly expressed in terms of compass and ruler constructions.
Although Cartesian coordinates are very useful for describing the trajectories of objects in 3-dimensional
space, the development of physical laws required infinitesimal displacements as introduced by Newton, who
called them “fluxions”. A two-point space vector (x, x+∆x) for a non-zero ∆x may be divided by a two-point
time vector (t, t + ∆t) to provide an estimate ∆x/∆t of velocity, but this two-point average velocity does not
have a simple algebraic relation to forces acting on an object. For the development of theories of motion, it
was necessary to develop the concept of a limit of a quotient of two quantities to arrive at measures such as
velocity or acceleration.
Whenever one analyses the rate of change of one variable with respect to another, which is how most physical
theories are expressed, one looks at limits such as lim_{∆t→0} (∆x)/(∆t). In this case, one usually denotes the
“limits” of ∆x and ∆t as dx and dt. Then the limit of the quotient is written as dx/dt. However, as everyone
knows, neither dx nor dt is well defined in isolation. The concept of a limit may be expressed more precisely
as ∀ε > 0, ∃δ > 0, |∆t| < δ ⇒ |∆x/∆t − dx/dt| < ε.
[ Should refer to the discussion of infinitesimals in Bell [190], pages 142–153, especially on the difficulties of
Newton and Leibniz in thinking about derivatives and velocities. ]
18.1.2 Remark: It is entirely understandable that it is so difficult to find an exact mathematical represen-
tation of an infinitesimal. It is one of the deepest and most significant concepts in the history of science. The

discovery of this concept marked the break between descriptive science and predictive science. By applying
Newton’s mechanical and gravitational laws, it was possible to determine on paper how a machine would
work before it was built. But the laws could only be understood with the aid of the concepts of derivatives
and integrals, both of which require the concept of infinitesimals.
18.1.3 Remark: If one thinks about velocity from the information point of view, it seems that the velocity
(or momentum) of an object or particle stores information. When an object moves in a vacuum, the
information encoded in its velocity is maintained indefinitely, and very precisely. One might ask how an
object or particle “knows” what velocity to maintain. If this information were “copied” inaccurately from
second to second, one would expect that velocity would follow a random walk, but this does not seem to
be observed.
The classical notion of infinitely well maintained velocity of a particle, taking into account
various forces such as gravity, is in opposition to some more recent ideas regarding the granularity of space
and time. If space-time does indeed turn out to be granular in nature, the ε-δ limit notion of velocity may
turn out to be inapplicable because for small enough δ, the velocity estimator ∆x/∆t may become rather
fuzzy. One may then ask what the forces (such as gravity) acting on a particle are modifying exactly, since
the instantaneous velocity may not be well defined. Then it could be better to formulate velocity in terms
of a probabilistic estimator which is not so sensitive to fuzziness in either space or time.
In applications of analysis to engineering, it is often noted that integrals smooth out the errors or noise,
whereas differentiation exacerbates errors or noise. So integral expressions are preferred, for example, in
feedback systems. It may be, in fact, that differentiation is not an applicable model for the real physical
world, whereas integration may be entirely suitable. It is probably wise to not take infinitesimal limit
processes too literally. As noted elsewhere in this book, any concept which relies heavily on infinity concepts
is liable to be inapplicable in detail, although models reliant on limiting processes may be accurate enough
in realistic application scenarios.
Concepts such as temperature and pressure of gases have undoubted applicability and reality, despite the
“granular” nature of these concepts. It is probably a good idea to define derivatives, if possible, in such a
way that they may easily be generalized to a corresponding statistical concept. In other words, it would be
desirable to develop a “noise-tolerant definition” of differentiation.
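The contrast between differentiation and integration of noisy data can be made concrete. The sketch below is an illustration under assumptions, not a noise-tolerant definition: the signal is sin on [0, 1], the "noise" is a deterministic alternating perturbation of size η, the derivative is estimated by a forward difference quotient and the integral by the trapezoidal rule. All function names are hypothetical.

```python
import math

def noisy_samples(f, n, h, eta):
    """Samples f(i*h) + eta*(-1)**i: a deterministic stand-in for measurement noise."""
    return [f(i * h) + eta * (-1) ** i for i in range(n + 1)]

def max_derivative_error(n, eta):
    """Worst error of the forward difference quotient against cos, for noisy sin samples."""
    h = 1.0 / n
    y = noisy_samples(math.sin, n, h, eta)
    return max(abs((y[i + 1] - y[i]) / h - math.cos(i * h)) for i in range(n))

def integral_error(n, eta):
    """Error of the trapezoidal integral against 1 - cos(1), for the same noisy samples."""
    h = 1.0 / n
    y = noisy_samples(math.sin, n, h, eta)
    trapezoid = h * (sum(y) - 0.5 * (y[0] + y[-1]))
    return abs(trapezoid - (1.0 - math.cos(1.0)))

eta = 1e-4
for n in (100, 1000):
    print(n, max_derivative_error(n, eta), integral_error(n, eta))
```

The perturbation enters the difference quotient with size about 2η/h, so refining the grid makes the derivative estimate worse. In the trapezoidal sum a perturbation bounded by η changes the result by at most η times the length of the interval, whatever the grid.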
18.2. Differentiation for one variable
18.2.1 Remark: Definition 18.2.2 seeks to adapt the ε-δ style of continuity condition in Theorem 17.4.5
to derivatives of real-valued functions. In the case of continuity at a point p ∈ IR, the function f must satisfy
f (Bp,δ ) ⊆ Bf (p),ε for some δ > 0. In the case of differentiability at a point p ∈ IR, the function f must
satisfy (f (x) − f (p))/(x − p) ∈ Bv,ε for x ∈ Bp,δ for some δ > 0, but the point p must be excluded from the
open ball Bp,δ to ensure that the quotient is well defined.
18.2.2 Definition: A function f : U → IR for U ∈ Top(IR) is said to be differentiable at p for p ∈ U if
∃v ∈ IR, ∀ε > 0, ∃δ > 0, ∀x ∈ (p − δ, p + δ) ∩ (U \ {p}), (f(x) − f(p))/(x − p) ∈ Bv,ε. (18.2.1)
A function f : U → IR for U ∈ Top(IR) is said to be differentiable on U if it is differentiable at p for all p ∈ U.
In other words,
∀p ∈ U, ∃v ∈ IR, ∀ε > 0, ∃δ > 0, ∀x ∈ (p − δ, p + δ) ∩ (U \ {p}), (f(x) − f(p))/(x − p) ∈ Bv,ε. (18.2.2)
18.2.3 Remark: Condition (18.2.2) may be rewritten as follows.
∀p ∈ U, ∃v ∈ IR, ∀ε > 0, ∃δ > 0, Qf,p (Ḃp,δ ) ⊆ Bv,ε . (18.2.3)

where the function Qf,p : U \ {p} → IR is defined by Qf,p : x ↦ (f (x) − f (p))/(x − p), and Ḃp,δ denotes a
punctured open ball as in Definition 17.1.17.
The quotient Qf,p (x) = (f (x) − f (p))/(x − p) is an estimator for the gradient parameter of the “best-fit”
straight line through the point (p, f (p)) of the graph of f . Condition (18.2.3) says that this estimator must
be arbitrarily close to a real number v if the domain is restricted narrowly enough around the point p.
Since there are so many families of curves to choose from for fitting to a given function f , one might ask
why a linear function is chosen. Historically, this is almost certainly because of the long history of straight
line Euclidean geometry which was taught in the universities of Europe in the 17th century, and because
in Cartesian coordinates, the graph of a line is expressed in terms of a very tiny set of add-and-multiply
operations. In fact, the general ubiquity of straight lines in mathematics is surely for the same two reasons.
The derivative of a function may be regarded as a local linearization of the function. The “goodness of fit”
improves to any specified accuracy if the “δ microscope” is zoomed in sufficiently closely on any given point
of the graph.
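The role of the quotient Qf,p as an estimator can be illustrated numerically. The sketch below assumes the hypothetical example f(x) = x³ at p = 1, where the candidate value is v = 3. Since Qf,p(x) − 3 = (x − 1)(x + 2), the choice δ = min(1, ε/4) is sufficient for this example. The test samples a punctured ball, so it can refute the inclusion in condition (18.2.3) but cannot prove it.

```python
def quotient(f, p, x):
    """Q_{f,p}(x) = (f(x) - f(p)) / (x - p), defined for x != p."""
    return (f(x) - f(p)) / (x - p)

def quotient_in_ball(f, p, v, eps, delta, samples=200):
    """Test whether Q_{f,p}(x) lies in B_{v,eps} for sampled x with 0 < |x - p| < delta."""
    for k in range(1, samples):
        offset = delta * k / samples
        for x in (p - offset, p + offset):
            if not abs(quotient(f, p, x) - v) < eps:
                return False
    return True

def cube(x):
    """Hypothetical example function f(x) = x^3."""
    return x ** 3

print(quotient_in_ball(cube, 1.0, 3.0, 0.1, 0.025))   # delta = eps/4 is narrow enough
print(quotient_in_ball(cube, 1.0, 3.0, 0.1, 1.0))     # delta = 1 is too wide for eps = 0.1
print(quotient_in_ball(abs, 0.0, 0.0, 0.5, 1e-3))     # |x| at 0: the quotient is +1 or -1
```

For the absolute value function at p = 0 the quotient takes only the values 1 and −1, so no single v is within ε of both when ε ≤ 1, however small δ is made.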
18.2.4 Remark: The ε-δ style of continuity condition in Theorem 17.4.3 is adapted for differentiability
in Definition 18.2.5, which is an alternative for Definition 18.2.2. In the case of continuity, the difference
|f (x) − f (p)| must be bounded by ε for arbitrarily small ε. For differentiability, |f (x) − f (p) − v(x − p)| must
be bounded by ε|x − p| for arbitrarily small ε. This is illustrated in Figure 18.2.1.
Figure 18.2.1 Definition of derivative of real-valued function of real variable
Since the vast majority of fundamental physics is written in terms of derivatives, one might ask which
of Definitions 18.2.2 and 18.2.5 is preferable for conveying the right meaning. A derivative is generally a
model for a velocity or the rate of change of some parameter. Definition 18.2.2 suggests that we are making
estimates of a rate-of-change parameter. Definition 18.2.5 suggests that we are determining the “goodness
of fit” of a straight line to the graph of the variation of one parameter with respect to another.
There are some rate-of-change concepts which are not easily represented as fitting two graphs to each
other. For example, the second derivative of a function may be represented as a limit of the quotient
h⁻²(f(p − h) − 2f(p) + f(p + h)). Limiting this quotient to a ball Ba,ε limits the graph of f to a set of
parabolas. These parabolas can be depicted on a graph of f , but they are somewhat untidy.
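The second-order quotient can be evaluated directly as an estimator. The sketch below assumes the hypothetical example f = exp at p = 0, where the second derivative is 1, and shows the estimate approaching that value as h decreases through moderate sizes.

```python
import math

def second_quotient(f, p, h):
    """The quotient (f(p - h) - 2 f(p) + f(p + h)) / h^2 for h != 0."""
    return (f(p - h) - 2.0 * f(p) + f(p + h)) / (h * h)

errors = [abs(second_quotient(math.exp, 0.0, h) - 1.0) for h in (0.1, 0.01, 0.001)]
for e in errors:
    print(e)
```

For this example the error behaves like h²/12 over the range shown. In floating-point arithmetic the trend does not continue for much smaller h, because rounding errors in the numerator are divided by h².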
Graphs generally do not have any physical reality. They are merely convenient for humans to represent what
is happening. It seems therefore preferable to define a derivative as the limit of a parameter estimator rather
than as the parameter of a “best-fit curve”. In other words, Definition 18.2.2 is preferable to Definition 18.2.5.
18.2.5 Definition (→ 18.2.2): A function f : U → IR for U ∈ Top(IR) is said to be differentiable at p for
p ∈ U if
∃v ∈ IR, ∀ε > 0, ∃δ > 0, ∀x ∈ (p − δ, p + δ) ∩ (U \ {p}), |f(x) − f(p) − v(x − p)| < ε|x − p|. (18.2.4)
A function f : U → IR for U ∈ Top(IR) is said to be differentiable on U if it is differentiable at p for all p ∈ U .

18.2.6 Remark: The number v in Definition 18.2.5 is unique for a given function f and point p. This is
shown in Theorem 18.2.7. The difference between equations (18.2.4) and (18.2.5) is the use of the unique
existence quantifier “∃′” in (18.2.5).
18.2.7 Theorem: Let U ∈ Top(IR) and let f : U → IR be a function which is differentiable at p ∈ U.
Then
∃′v ∈ IR, ∀ε > 0, ∃δ > 0, ∀x ∈ (p − δ, p + δ) ∩ (U \ {p}), |f(x) − f(p) − v(x − p)| < ε|x − p|. (18.2.5)
Proof 1: The uniqueness of v follows from a straightforward modus tollens. Let v1 , v2 ∈ IR satisfy equation
(18.2.4) for some p ∈ U. Suppose v1 ≠ v2 and let ε = |v1 − v2|/2. Then ε > 0. Therefore
∃δ1 > 0, ∀x ∈ (p − δ1, p + δ1) ∩ (U \ {p}), |f(x) − f(p) − v1(x − p)| < ε|x − p|
and
∃δ2 > 0, ∀x ∈ (p − δ2, p + δ2) ∩ (U \ {p}), |f(x) − f(p) − v2(x − p)| < ε|x − p|.
Let δ = min(δ1, δ2). Then
∀x ∈ (p − δ, p + δ) ∩ (U \ {p}),
|f(x) − f(p) − v1(x − p)| < ε|x − p| and |f(x) − f(p) − v2(x − p)| < ε|x − p|.
Since (p − δ, p + δ) ∩ (U \ {p}) ≠ ∅, it follows that for some x ∈ (p − δ, p + δ) ∩ (U \ {p}),
|v1 − v2|.|x − p| = |(v1 − v2)(x − p)|
= |(f(x) − f(p) − v1(x − p)) − (f(x) − f(p) − v2(x − p))|
≤ |f(x) − f(p) − v1(x − p)| + |f(x) − f(p) − v2(x − p)|
< 2ε|x − p|
≤ |v1 − v2|.|x − p|.
This is impossible if v1 ≠ v2 and x ≠ p. Therefore v1 = v2. In other words, v is unique as claimed.
Proof 2: Let v1, v2 ∈ IR satisfy equation (18.2.4) for some p ∈ U. Let ε > 0. Then
∃δ1 > 0, ∀x ∈ (p − δ1, p + δ1) ∩ (U \ {p}), |f(x) − f(p) − v1(x − p)| < ε|x − p|
and
∃δ2 > 0, ∀x ∈ (p − δ2, p + δ2) ∩ (U \ {p}), |f(x) − f(p) − v2(x − p)| < ε|x − p|.
Let δ = min(δ1, δ2). Then since (p − δ, p + δ) ∩ (U \ {p}) ≠ ∅, for some x ∈ (p − δ, p + δ) ∩ (U \ {p}),
|v1 − v2| = |(v1 − v2)(x − p)|/|x − p|
= |(f(x) − f(p) − v1(x − p)) − (f(x) − f(p) − v2(x − p))|/|x − p|
≤ (|f(x) − f(p) − v1(x − p)| + |f(x) − f(p) − v2(x − p)|)/|x − p|
< 2ε|x − p|/|x − p|
= 2ε.
Since this holds for all ε > 0, it follows that |v1 − v2| = 0. So v1 = v2. In other words, v is unique as claimed.

18.2.8 Remark: Proofs of uniqueness in analysis often follow the modus tollens pattern as in line (18.2.6).
This is very similar to the reductio ad absurdum pattern of argument in line (18.2.7). In the 19th century,
these kinds of “indirect” methods of argument were often criticised for assuming the “excluded middle”
notion of truth and falsity, although as argued in Remark 3.1.4 and Section 3.11, adoption of a sensible
ontology for logic ensures the validity of “proof by contradiction”.
A ⇒ B, ¬B ⊢ ¬A (18.2.6)
A ⇒ B, A ⇒ ¬B ⊢ ¬A. (18.2.7)
Proof 1 of Theorem 18.2.7 uses the modus tollens approach. Proof 2 of Theorem 18.2.7 apparently does not.
18.2.9 Remark: The uniqueness of the number v in Definition 18.2.5 implies that the word “the” can be
used instead of “a” for this value. Thus it may be given a name such as “the derivative of f at p”. (If the
number v was not unique, we would have to call it “a derivative of f at p”.) Hence Definition 18.2.10 makes
good sense. If the function f is differentiable on U , there is a unique well-defined function which maps each
p ∈ U to a corresponding unique value v at p. This is called “the derivative of f ”.
18.2.10 Definition: The derivative of f at p, for a function f : U → IR which is differentiable at a point
p ∈ U for U ∈ Top(IR), is the unique number v ∈ IR which satisfies
∀ε > 0, ∃δ > 0, ∀x ∈ (p − δ, p + δ) ∩ (U \ {p}),
(f (x) − f (p))/(x − p) ∈ Bv,ε .
The derivative of f , for a function f : U → IR which is differentiable on U ∈ Top(IR), is the function
f ′ : U → IR defined so that for all p ∈ U , f ′(p) is the derivative of f at p.
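The ball condition in Definition 18.2.10 can be observed numerically. The following is a minimal sketch, assuming the illustrative function f (x) = x^3 at p = 2 (a choice made here, not taken from the text), for which the derivative is v = 12. The helper name difference_quotients is hypothetical. For this f the quotient equals 12 + 6d + d^2 at offset d, so the worst deviation from v shrinks roughly in proportion to δ.

```python
def difference_quotients(f, p, offsets):
    """Return (f(p + d) - f(p)) / d for each nonzero offset d."""
    return [(f(p + d) - f(p)) / d for d in offsets]

# Illustrative assumption: f(x) = x**3 at p = 2, with derivative v = 12.
f = lambda x: x ** 3
p = 2.0
v = 12.0

for delta in (1e-1, 1e-2, 1e-3):
    # Sample points on both sides of p in the punctured interval (p - delta, p + delta).
    offsets = [s * t * delta for s in (-1, 1) for t in (0.25, 0.5, 0.99)]
    worst = max(abs(q - v) for q in difference_quotients(f, p, offsets))
    # worst is the smallest eps for which all sampled quotients lie in B_{v,eps}.
    print(delta, worst)
```

A finite sample cannot prove the quantified statement in the definition. It only shows how the admissible ε shrinks with δ for this particular function.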
18.2.11 Remark: EDM2 [35], 106.A, gives the notations dy/dx, y′, ẏ, df (x)/dx, (d/dx)f (x), f ′(x) and
Dx f (x) for the derivative of y = f (x). Traditionally, the dot notation ẏ usually meant the derivative
with respect to time, whereas the dashed notation y′ meant the derivative with respect to a space variable.
Leibniz created the dy/dx notation whereas Newton created the ẋ notation.
Some authors attribute the slow progress of analysis in England after Newton to the use of his notation. For
example, Ball [188], page 439, wrote the following in about 1893.
Towards the beginning of the last century the more thoughtful members of the Cambridge school
of mathematics began to recognize that their isolation from their continental contemporaries was a
serious evil. The earliest attempt in England to explain the notation and methods of the calculus as
used on the continent was due to Woodhouse, who stands out as the apostle of the new movement.
Ball [188], page 441, wrote the following about Charles Babbage, who was also a member of the Cambridge
Analytical Society.
It was he who gave the name to the Analytical Society, which, as he stated, was formed to advocate
“the principles of pure d-ism as opposed to the dot-age of the university.”
The d and dot here refer to the continental and Newtonian notations respectively. Struik [194], page 171,
wrote the following. (He gives the reference Dubbey [193] for the Babbage quote.)
The names of Hamilton and Cayley show that by 1840 English-speaking mathematicians had at
last begun to catch up with their continental colleagues. Until well into the nineteenth century,
the Cambridge and Oxford dons regarded any attempt at improvement of the theory of fluxions
as an impious revolt against the sacred memory of Newton. The result was that the Newtonian
school of England and the Leibnitzian school of the continent drifted apart to such an extent that
Euler, in his integral calculus (1768), considered a union of both methods of expression as useless.
The dilemma was broken in 1812 by a group of young mathematicians at Cambridge who, under
the inspiration of the older Robert Woodhouse, formed an Analytical Society to propagate the
differential notation. Its leaders were George Peacock, Charles Babbage, and John Herschel. They
tried, in Babbage’s words to advocate “the principles of pure d-ism as opposed to the dot-age of the
university.” [. . . ] The new generation in England now began to participate in modern mathematics.

Ball [188], pages 361–362, also wrote the following about Newton’s “fluxions” method as opposed to the
European “differential” method.
The controversy with Leibnitz was regarded in England as an attempt by foreigners to defraud
Newton of the credit of his invention, and the question was complicated on both sides by national
jealousies. It was therefore natural, though it was unfortunate, that in England the geometrical and
fluxional methods as used by Newton were alone studied and employed. For more than a century
the English school was thus out of touch with continental mathematicians. The consequence was
that, in spite of the brilliant band of scholars formed by Newton, the improvements in the methods
of analysis gradually effected on the continent were almost unknown in Britain. It was not until
1820 that the value of analytical methods was fully recognized in England, and that Newton’s
countrymen again took any large share in the development of mathematics.
(See also some related comments in Remarks 30.4.7 and 45.1.5.)
18.2.12 Remark: Equation (18.2.4) is the same as (18.2.8).
∃v ∈ IR, lim_{x→p} |f (x) − f (p) − v(x − p)| / |x − p| = 0. (18.2.8)
(See Definition 14.12.15 for the definition of the limit of a function at a point.)
[ Definition 14.12.15 is quite unsatisfactory. This must be fixed. ]
18.2.13 Remark: It is tempting to seek a generalization of condition (17.4.2) in Theorem 17.4.5, which
expresses continuity in a general metric space in terms of open balls, from continuity to differentiability. One
may define sets of the form Kp,a,v,ε = {(x, y) ∈ IR × IR; |y − a − v(x − p)| < ε|x − p|} for (p, a) ∈ IR × IR, v ∈ IR
and ε ∈ IR+ . Then one may re-express (18.2.4) as:
∃v ∈ IR, ∀ε > 0, ∃δ > 0, f |(p−δ, p+δ)\{p} ⊆ Kp,f (p),v,ε .
This is illustrated in Figure 18.2.2. The cone-shaped neighbourhoods Kp,f (p),v,ε are related to the definitions
of α-Hölder continuity in Section 18.9, particularly the case α = 1.
[ Figure 18.2.2: the graph of f restricted to (p − δ, p + δ) \ {p} inside the cone Kp,f (p),v,ε , which is bounded by the lines f (p) + v(x − p) ± ε|x − p| through the point (p, f (p)). ]
Figure 18.2.2 Cone-shaped neighbourhood for defining differentiability
One may also note that the definition of differentiability of f at p is equivalent to:
∃v ∈ IR, ∀ε > 0, ∃δ > 0, ∀x ∈ (p − δ, p + δ) ∩ (U \ {p}),

f (x) ∈ Bf (p)+v(x−p),ε|x−p| .
This means that f (x) is required to lie inside the open ball Bf (p)+v(x−p),ε|x−p| for each x, but this ball
depends on x. This “moving ball” concept seems to capture the meaning of the derivative even less well than
the “tangent cone” concept. All things considered, the “estimator convergence” concept in Definition 18.2.2
seems to express the derivative idea in the most natural way.

18.2.14 Remark: Differentiability is generally thought of as a stronger condition than continuity. How-
ever, in the case of multiple independent variables, Examples 18.5.6 and 18.5.11 show that partial and
directional differentiability do not imply continuity. Therefore it is prudent to prove that differentiability for
a single independent variable does imply continuity.
18.2.15 Theorem: Let U ∈ Top(IR) and p ∈ U . Suppose a function f : U → IR is differentiable at p.

Then f is continuous at p.
Proof: Suppose U ∈ Top(IR) and f : U → IR is differentiable at p ∈ U . By Theorem 17.4.14, f is

continuous at p if
∀ε > 0, ∃δ > 0, ∀x ∈ U, |x − p| < δ ⇒ |f (x) − f (p)| < ε.
Let ε > 0 and let v = f ′(p) be the derivative of f at p. Let ε′ = ε/2. Then Definition 18.2.5 implies
∃δ′ > 0, ∀x ∈ (p − δ′ , p + δ′ ) ∩ (U \ {p}),
|f (x) − f (p) − v(x − p)| < ε′ |x − p|.
Let δ = min(1, δ′ , ε/(2|v|)) if v ≠ 0, and let δ = min(1, δ′ ) if v = 0. Then |x − p| < 1 for all x ∈ (p − δ, p + δ) and |v|δ ≤ ε/2. So
∀x ∈ (p − δ, p + δ) ∩ (U \ {p}),
|f (x) − f (p)| ≤ |f (x) − f (p) − v(x − p)| + |v(x − p)|
< ε′ |x − p| + |v|.|x − p|
≤ (ε/2)|x − p| + ε/2
< ε.
The case x = p is trivial because |f (p) − f (p)| = 0 < ε. Therefore f is continuous at p.
[ Show that a Lipschitz function is differentiable almost everywhere. This really requires measure theory, but
Lipschitz functions are non-differentiable only on countable sets, which don’t need any measure theory. Also
present functions of bounded variation. ]
18.2.16 Example: [ Here give the example of a function h : IR → IR which is everywhere continuous but
differentiable nowhere. For example, see Rudin [137], theorem 7.18, page 141. This requires the use of
infinite series. ]
18.3. Unidirectional differentiability of real-to-real functions

18.3.1 Definition: A right-open subset of IR is a set U ⊆ IR which satisfies
∀p ∈ U, ∃δ > 0, (p, p + δ) ⊆ U.
18.3.2 Definition: A left-open subset of IR is a set U ⊆ IR which satisfies
∀p ∈ U, ∃δ > 0, (p − δ, p) ⊆ U.
18.3.3 Definition: A right-differentiable function at a point p ∈ U for a right-open set U ⊆ IR is a function

f : U → IR which satisfies
∃v ∈ IR, ∀ε > 0, ∃δ > 0, ∀x ∈ (p, p + δ) ∩ U,

|f (x) − f (p) − v(x − p)| ≤ ε|x − p|.
A right-differentiable function on a right-open set U ⊆ IR is a function f : U → IR which is right-differentiable

for all p ∈ U .

[ Figure 18.3.1: the graph of f on (p, p + δ) between the lines f (p) + v(x − p) ± ε|x − p| to the right of the point (p, f (p)). ]
Figure 18.3.1 Right differentiability of a real-valued function of a real variable
18.3.4 Definition: A left-differentiable function at a point p ∈ U for a left-open set U ⊆ IR is a function

f : U → IR which satisfies
∃v ∈ IR, ∀ε > 0, ∃δ > 0, ∀x ∈ (p − δ, p) ∩ U,

|f (x) − f (p) − v(x − p)| ≤ ε|x − p|.
A left-differentiable function on a left-open set U ⊆ IR is a function f : U → IR which is left-differentiable

for all p ∈ U .
18.3.5 Definition: A unidirectionally differentiable function at a point p ∈ U for an open set U ⊆ IR is a

function f : U → IR which is both right-differentiable and left-differentiable at p.
A unidirectionally differentiable function on an open set U ⊆ IR is a function f : U → IR which is both
right-differentiable and left-differentiable on U .
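A standard illustration of the one-sided definitions is the absolute value function at p = 0, which is unidirectionally differentiable there in the sense of Definition 18.3.5 but not differentiable. The sketch below is an added example, not taken from the text, and the helper name one_sided_quotient is hypothetical. It evaluates the difference quotients from each side.

```python
def one_sided_quotient(f, p, h):
    """Difference quotient (f(p + h) - f(p)) / h for a nonzero offset h."""
    return (f(p + h) - f(p)) / h

f = abs
p = 0.0
steps = (1e-1, 1e-3, 1e-6)

# Offsets h > 0 probe the interval (p, p + delta) used for right differentiability.
right = [one_sided_quotient(f, p, h) for h in steps]
# Offsets -h < 0 probe the interval (p - delta, p) used for left differentiability.
left = [one_sided_quotient(f, p, -h) for h in steps]

print(right)  # quotients from the right
print(left)   # quotients from the left
```

The right quotients are all 1 and the left quotients are all −1, so the one-sided conditions hold with v = 1 and v = −1 respectively, while no single v can satisfy the two-sided condition.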
[ Insert tree diagram for various classes of first-order differentiability at a point, and in a given domain. ]
18.4. Higher-order derivatives for real-to-real functions
18.4.1 Remark: It is tempting to define higher order differentiability of real-valued functions of a real
variable inductively as follows.
A function f : U → IR for U ∈ Top(IR) is said to be k-times differentiable at p for k ∈ ℤ⁺ and
p ∈ U if f is (k − 1)-times differentiable in a neighbourhood of p and the (k − 1)th derivative of f
is differentiable at p.
Such inductive definitions are somewhat dangerous in general. It is best to play it safe by first demonstrating
the existence of the sequences of objects which are to be defined and then giving them names in a definition.
An “inductive definition” is an infinite sequence of definitions, each of which requires the validation of all
earlier definitions in the sequence in order to be validated.
This is different to a “template definition” which is parametrized by an object from a specified class, but
where there is no dependency between the definitions for each member of the class. The list space List(X) for
an arbitrary set X in Definition 7.12.2 is a typical example of a template definition. There are infinitely many
list space definitions, but none of them depend on each other, unless of course the space X is itself defined
in terms of a list space. A space List(List(X)) is defined by the double application of Definition 7.12.2.
An inductive sequence of definitions is a template definition where there are dependencies between the
individual definitions for particular parameter values. Although such definitions may appear fairly safe for
simple induction situations, the danger is much greater in the case of multiple induction, transfinite induction
and induction with respect to more general ordered sets of definition parameter values.
18.4.2 Remark: Since the kth derivative of a function f at a point x ∈ IR depends on two variables, k
and x, one must make a choice of order for the formalisation of higher-order derivatives. One may think of
these derivatives as a sequence of partially defined functions or a sequence-valued function. In the former
case, the function is of the form ℤ⁺₀ → (IR →˚ IR). In the latter case, the form is IR → (ℤ⁺₀ →˚ IR), or
more precisely, IR → List(IR). (See Definition 6.11.3 and Notation 6.11.4 for partially defined functions. See
Definition 7.12.5 for extended list spaces.)
A sequence-valued function representation of higher-order derivatives is clumsy. In this representation, a
sequence of higher-order derivatives is attached to each point of IR. This is unnatural in the sense that each
derivative f (k) (x) is defined in terms of the function values f (k−1) (t) for t in a neighbourhood of x, not just
the value f (k−1) (x). On the other hand, the sequence-valued function representation has the advantage that
it forces the values f (ℓ) (x) to be defined for ℓ < k if f (k) (x) is defined. However, the trade-off seems to favour
the “sequence of partially defined functions” style of representation as in Definition 18.4.4. (A “list-valued
function” style of representation is presented in Remark 18.4.9.)
18.4.3 Remark: Definition 18.4.4 is expressed in terms of partially defined functions. This is because
the set of partially defined functions is closed under differentiation, whereas a set of functions with a fixed
domain is not. (This is illustrated by the example in Figure 18.4.1.)
[ Figure 18.4.1: four panels showing f (x), f ′(x), f ″(x) and f ‴(x) for x from −2 to 2. ]
Figure 18.4.1 Higher derivatives may be partially defined
18.4.4 Definition: The sequence of higher order derivatives of a partially defined real function f : IR →˚ IR
is the sequence D̄f : ℤ⁺₀ → (IR →˚ IR) which is defined inductively by the rules:
(i) (D̄f )_0 = f .
(ii) For all k ∈ ℤ⁺ , for all x ∈ IR, (D̄f )_k (x) = (d/dt)(D̄f )_{k−1} (t)|_{t=x} if x ∈ Int(Dom((D̄f )_{k−1} )) and (D̄f )_{k−1}
is differentiable at x; otherwise (D̄f )_k (x) is undefined.
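As a worked illustration of Definition 18.4.4, consider the function f (x) = x|x| on IR. This example is chosen here for illustration and is not taken from the text. The symbol \bar D below assumes that the sequence in the definition is written with an overbar. The domain is unchanged by the first differentiation and then loses the point 0, after which it stays fixed:

```latex
\begin{align*}
(\bar D f)_0(x) &= x\,|x|, & \mathrm{Dom}((\bar D f)_0) &= \mathbb{R}, \\
(\bar D f)_1(x) &= 2\,|x|, & \mathrm{Dom}((\bar D f)_1) &= \mathbb{R}, \\
(\bar D f)_2(x) &= 2\,\mathrm{sgn}(x), & \mathrm{Dom}((\bar D f)_2) &= \mathbb{R}\setminus\{0\}, \\
(\bar D f)_k(x) &= 0 \quad (k \ge 3), & \mathrm{Dom}((\bar D f)_k) &= \mathbb{R}\setminus\{0\}.
\end{align*}
```

At x = 0 the first derivative exists because the difference quotient h|h|/h = |h| tends to 0, but 2|x| is not differentiable at 0, so every derivative of order 2 or more is undefined there.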
18.4.5 Remark: Before using the sequences in Definition 18.4.4 to define individual higher derivatives
and differentiability, it’s a good idea to make sure that the sequences are always well-defined. The partially
defined functions f : IR →˚ IR include the functions f : U → IR for open sets U ∈ Top(IR) as special cases.
But Dom(f ) is not necessarily an open set for all f : IR →˚ IR.
It is clear that condition (i) is always well-defined. The zeroth element of the sequence D̄f is simply the
function f itself.
The value (D̄f )_k (x) is defined in condition (ii) if and only if
(1) (D̄f )_{k−1} (t) is defined for t ∈ (x − δ, x + δ) for some δ > 0 and
(2) (D̄f )_{k−1} is differentiable at x.
Condition (1) is the same as saying that x is in the topological interior Int(Dom((D̄f )_{k−1} )) of the domain
Dom((D̄f )_{k−1} ) of the partially defined function (D̄f )_{k−1} . (See Definition 14.5.1 for the topological interior
of a set.)
Conditions (1) and (2) are unambiguous propositions which are either true or false. So the inductive rules
(i) and (ii) in Definition 18.4.4 are well-defined propositions. In other words, if the partially defined function
(D̄f )_{k−1} is well defined, then the partially defined function (D̄f )_k is also well defined. It follows that the
infinite sequence D̄f is well defined for any partially defined function f : IR →˚ IR. Therefore Definition 18.4.4
may be safely used in other definitions.
18.4.6 Remark: Definition 18.4.4 implies that Dom((D̄f )_k ) ⊆ Int(Dom((D̄f )_{k−1} )) for all k ∈ ℤ⁺ . Hence
Dom((D̄f )_k ) ⊆ Int(Dom(f )) for all k ∈ ℤ⁺ . In other words, positive orders of differentiability are defined on
the interior of the domain of f at most. So one may as well restrict the definitions of derivatives to functions
whose domains are open sets. This is exactly what many textbooks do. However, generalizing the definitions
of derivatives to arbitrary domains sometimes saves a lot of tedious formal argument in applications.

18.4.7 Definition: A function f : IR →˚ IR is k-times differentiable at x for x ∈ Dom(f ) when x ∈
Dom((D̄f )_k ), where D̄f is the sequence of higher-order derivatives in Definition 18.4.4.
If f is k-times differentiable at x ∈ Dom(f ), then the kth derivative of f at x is the value (D̄f )_k (x). Otherwise
the kth derivative of f at x is said to be “undefined”.
A function f : IR →˚ IR is k-times differentiable when Dom(f ) ⊆ Dom((D̄f )_k ).
18.4.8 Remark: Definition 18.4.7 is equivalent to the more usual definition, which is that f is k-times
differentiable when all derivatives f (j) (t) are defined for j < k for t in a neighbourhood of x and f (k−1) is
differentiable at x.
The set inclusion condition Dom(f ) ⊆ Dom((D̄f )_k ) for the k-times differentiability of f is, of course, the
same as the equality Dom(f ) = Dom((D̄f )_k ) because the reverse inclusion is automatic.
18.4.9 Remark: As mentioned in Remark 18.4.2, the sequence of higher-order derivatives of a real function
may also be represented as a list-valued function of the real numbers, namely as a function D∗ f : IR →
List(IR), where the list space List(IR) is given by Definition 7.12.5:
List(IR) = IR^ω ∪ ⋃_{k ∈ ℤ⁺₀} IR^k .
This representation is the transpose (in the sense of Remark 6.12.3) of the sequence-of-functions representation
in Definition 18.4.4.
∀x ∈ IR, ∀k ∈ ℤ⁺₀ , (D∗ f )(x)_k = (D̄f )_k (x). (18.4.1)
Equation (18.4.1) is to be understood in the “partially defined” sense. In other words, the right-hand side
is undefined if and only if the left-hand side is undefined. By Remark 18.4.6, it follows that the domains
of the sequences (D∗ f )(x) are contiguous subsets of ℤ⁺₀ which include 0 for all x ∈ IR. In other words,
(D∗ f )(x) ∈ List(IR) for all x ∈ IR.
18.4.10 Notation: C^k (U ) for an open subset U of IR and k ∈ ℤ⁺₀ denotes the set of functions f : U → IR
for which the rth derivative d^r f (x)/dx^r is well-defined for all r ≤ k and the kth derivative function x ↦
d^k f (x)/dx^k is continuous for x ∈ U .
C^∞ (U ) for an open subset U of IR denotes the set ⋂_{k=0}^∞ C^k (U ).
18.4.11 Theorem: Let U ∈ Top(IR) and g ∈ C^1 (U ). Let f ∈ C^1 (V ) for some set V ∈ Top(IR) which
satisfies Range(g) ⊆ V . Define h : U → IR by h : x ↦ f (g(x)). Then h ∈ C^1 (U ) and h′(x) = f ′(g(x))g′(x)
for all x ∈ U .
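The formula in Theorem 18.4.11 can be checked numerically. The sketch below assumes the illustrative choices g(x) = x^2 + 1 and f (y) = sin(y) with U = V = IR, which are not taken from the text. The helper names central_difference and chain_rule are hypothetical. A symmetric difference quotient for h is compared with f ′(g(x))g′(x).

```python
import math

def central_difference(fn, x, step=1e-6):
    """Symmetric difference quotient approximating the derivative of fn at x."""
    return (fn(x + step) - fn(x - step)) / (2 * step)

# Illustrative assumptions: g(x) = x**2 + 1 and f(y) = sin(y), both C^1 on IR.
g = lambda x: x * x + 1.0
g_prime = lambda x: 2.0 * x
f = math.sin
f_prime = math.cos
h = lambda x: f(g(x))

def chain_rule(x):
    """Right-hand side f'(g(x)) g'(x) of the composition formula."""
    return f_prime(g(x)) * g_prime(x)

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, central_difference(h, x), chain_rule(x))
```

The two columns agree up to the discretisation and rounding error of the difference quotient. This is a consistency check at sample points, not a proof.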
18.4.12 Remark: The higher-order composition rules for differentiation with a single variable may be
generalized from Theorem 18.4.11 as follows.
h″ = f ″(g′)^2 + f ′g″
h‴ = f ‴(g′)^3 + 3f ″g″g′ + f ′g‴
h^(4) = f^(4) (g′)^4 + 6f ‴g″(g′)^2 + 3f ″(g″)^2 + 4f ″g‴g′ + f ′g^(4) ,
and so forth. It is assumed that g ∈ C k (U ) and f ∈ C k (V ), where k is the order of the derivative of h.
[ Show that the second order derivative can be calculated as lim_{h→0} h^{−2} (f (x − h) + f (x + h) − 2f (x)), or
something like that. Also provide such expressions for higher-order derivatives. This kind of calculation
is very much in the style of the “parameter estimator” convergence notion discussed in Remarks 18.2.3
and 18.2.4. It might even be more natural to define higher-order derivatives as these kinds of “parameter
estimators” than to take derivatives of derivatives. One can even visualize these estimators in terms of
parabolas etc. It is also possible to change the weights for points in the estimates to make them lop-sided,
and to parametrize with respect to the independent variable differently. In particular, one-sided derivatives
of any order may be defined like this. ]
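The symmetric second-difference estimator mentioned in the note above can be tried numerically. This is only a sketch with a hypothetical helper name and illustrative test functions. It also records a caution: the estimator can converge at a point where the second derivative in the sense of Definition 18.4.7 is undefined.

```python
import math

def second_difference(f, x, h):
    """Symmetric estimator h**-2 * (f(x - h) + f(x + h) - 2 f(x))."""
    return (f(x - h) + f(x + h) - 2.0 * f(x)) / (h * h)

# For f = exp the second derivative at 0 is 1, and the estimates approach it.
for h in (1e-1, 1e-2, 1e-3):
    print(h, second_difference(math.exp, 0.0, h))

# Caution: for f(x) = x|x| the estimator vanishes at 0 for every h,
# although f'(x) = 2|x| is not differentiable at 0.
odd = lambda x: x * abs(x)
print(second_difference(odd, 0.0, 1e-3))
```

So an estimator-based definition of the second derivative would not be equivalent to the derivative-of-derivative definition without further conditions.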

18.5. Differentiation for several variables

18.5.1 Remark: The definitions of differentiability for functions of several variables are not a straightfor-
ward generalization from the single variable situation. In fact, there is some substantial complexity. This
complexity is not of purely technical interest. Models which arise in physics often have solutions which are
functions of several variables with limited differentiability. In other words, not all orders of derivatives are
defined. Therefore one may ask which order of derivative is the highest order which is defined. If this order
is k, say, then clearly the order k + 1 derivative is not defined. Numerous definitions of limited differentia-
bility may be interpolated between order k and order k + 1. Many apparently paradoxical relations occur
between these definitions. For example, a function whose partial derivatives are defined everywhere in an
open domain might not be continuous at all points of the domain.
In practical situations in physics, especially for boundary value problems, the domain is very often not an
open set. In this case, differentiability on the boundary of the set must be defined. In PDE analysis, this
question is avoided by requiring a function to be extendable to an open superset of the domain. This does
not make sense in many physics models. Therefore differentiability on boundary points must be defined. In
this case, the situation becomes very much more complicated than even the interior point case.
It follows from these considerations that the technical difficulties of differentiating functions of several vari-
ables cannot be avoided if applicability of differential geometry to practical problems in physics is desired.
[ Should define limits for real functions on IRn ? ]
18.5.2 Remark: Figure 18.5.1 shows some relations between differentiability properties for real-valued
functions of several real variables.
[ Figure 18.5.1: diagram relating the properties “f continuous in Ω”, “f partially differentiable in Ω”, “f directionally differentiable in Ω”, “f totally differentiable in Ω”, and the continuously (ctsly) partially, directionally and totally differentiable variants. ]
Figure 18.5.1 Relations between differentiability properties for f : Ω → IR, Ω ∈ Top(IRn ), n ∈ ℤ⁺
18.5.3 Remark: Definition 18.2.5 for differentiability of a real-valued function of a real variable may be
generalized in many different ways to functions from IRn to IRm . The most restrictive natural generalization
requires the existence of a “total differential” at a point in IRn . This is given by Definition 18.5.17. A much
easier test to verify is the partial differentiability property in Definition 18.5.4.
18.5.4 Definition: A function f : U → IRm for U ∈ Top(IRn ) with m, n ∈ ℤ⁺₀ is said to be partially
differentiable at p ∈ U if
∀i ∈ ℕ_n , ∃w ∈ IRm , ∀ε > 0, ∃δ > 0, ∀t ∈ (−δ, δ),
p + tei ∈ U ⇒ |f (p + tei ) − f (p) − tw| ≤ ε|t|. (18.5.1)
A function f : U → IRm for U ∈ Top(IRn ) is said to be partially differentiable on U if it is partially

differentiable at all points p ∈ U .
[ Definition of partial derivatives ∂f (x)/∂xi . ]

[ Show that the partial derivative values are unique at each point. ]
18.5.5 Remark: Partially differentiable functions are not necessarily continuous. This is proved by Ex-
amples 18.5.6 and 18.5.7.
Since all totally differentiable functions are continuous, it follows that the discontinuous Examples 18.5.6
and 18.5.7 are not totally differentiable. Hence partial differentiability everywhere does not imply total
differentiability.

18.5.6 Example: Define f : IR2 → IR by
f (x) = 2x_1 x_2 /(x_1^2 + x_2^2 ) if x ≠ (0, 0),
f (x) = 0 if x = (0, 0).
Then f is partially differentiable on IR2 , but f is not continuous at (0, 0) ∈ IR2 . This function is not
directionally differentiable at (0, 0). The level curves of f are illustrated in Figure 18.5.2. Note that |f (x)| ≤ 1
for all x ∈ IR2 .
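The behaviour claimed in Example 18.5.6 can be observed directly. The sketch below is an added illustration. It evaluates the partial difference quotients at the origin, which vanish identically, and then evaluates f along the diagonal x_1 = x_2 , where the value stays at 1 however close the point is to the origin.

```python
def f(x1, x2):
    """The function of Example 18.5.6."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return 2.0 * x1 * x2 / (x1 * x1 + x2 * x2)

steps = (1e-1, 1e-3, 1e-6)

# Difference quotients along the coordinate axes at (0, 0).
axis_quotients = [(f(t, 0.0) / t, f(0.0, t) / t) for t in steps]
print(axis_quotients)

# Values along the diagonal, approaching (0, 0).
diagonal_values = [f(t, t) for t in steps]
print(diagonal_values)
```

The axis quotients are all zero, so both partial derivatives at the origin exist and equal 0. The diagonal values do not tend to f (0, 0) = 0, which exhibits the discontinuity and also the failure of directional differentiability in the direction (1, 1).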
[ Figure 18.5.2: level curves of f in the (x1 , x2 ) plane for values from −1 to 1. ]
Figure 18.5.2 Level curves of f (x) = 2x_1 x_2 /(x_1^2 + x_2^2 ), f : IR2 → IR
18.5.7 Example: Let h : ℤ⁺ → ℚ² be an enumeration of the set ℚ² of elements of IR2 which have rational
components. Define φ : IR2 → IR by
φ(x) = Σ_{k ∈ ℤ⁺} 2^{−k} f (x − h(k))
for all x ∈ IR2 , where f is as defined in Example 18.5.6 and x − a denotes the element (x1 − a1 , x2 − a2 )
of IR2 for all a ∈ IR2 . Then φ is well defined and partially differentiable on IR2 , but φ is discontinuous at
all points a ∈ ℚ² . Thus φ is discontinuous on a dense subset of its domain IR2 . Note that |φ(x)| ≤ 1 for
all x ∈ IR2 . (For all points a ∈ ℚ² , the function φ is also not directionally differentiable at a.)
[ There are some minor technicalities in Example 18.5.7. If 3^{−k} is used instead of 2^{−k} , the proof of
non-continuity at all points of ℚ² is a little easier. The proof of the properties of Examples 18.5.6 and 18.5.7
should be given in exercises. Similar comments apply to Examples 18.5.11 and 18.5.12. ]
[ There should be a whole section on enumerations of sets like ℚⁿ for n ∈ ℤ⁺ and subsets of these sets. ]
18.5.8 Definition: A function f : U → IRm for U ∈ Top(IRn ) with m, n ∈ ℤ⁺₀ is said to be directionally
differentiable at p ∈ U if
∀v ∈ IRn , ∃w ∈ IRm , ∀ε > 0, ∃δ > 0, ∀t ∈ (−δ, δ),
p + tv ∈ U ⇒ |f (p + tv) − f (p) − tw| ≤ ε|t|. (18.5.2)
A function f : U → IRm for U ∈ Top(IRn ) is said to be directionally differentiable on U if it is directionally
differentiable at all points p ∈ U .

[ Definition of directional derivatives ∂v f = lima→0 (f (x + av) − f (x))/a for v ∈ IRn . Also define one-sided
directional derivatives. See Rudin [137], 9.10, page 188–192 and 5.16, page 96. See also EDM2 [35], 106.G,
page 396 for directional derivatives. See also Section 19.6. ]
[ Show that w in line (18.5.2) is unique for each p and v. Show the relation between the directional and
partial derivatives, namely that the partial derivatives are special cases of the directional derivatives. Give
an example to show that w is not necessarily a linear function of v. Give an example to show that w may
be an unbounded function of v for a given fixed p. ]

18.5.9 Remark: It is easy to show that the m-tuple w in Definition 18.5.8 is uniquely determined by the
n-tuple v for each point p ∈ U . Therefore w ∈ IRm is a well-defined function of v ∈ IRn for each fixed p ∈ U .
Denote this well-defined function by φp : IRn → IRm . It is immediately clear from Definition 18.5.8 that
φp (u) = λφp (v) for any u, v ∈ IRn which satisfy u = λv for some λ ∈ IR \ {0}. Thus each function φp is linear
on one-dimensional linear subspaces of the linear space IRn . However, it is certainly not true in general that
φp is linear on the whole of IRn . (Remark 18.5.15 is the unidirectional analogue of this remark.)
18.5.10 Remark: Examples 18.5.11 and 18.5.12 give discontinuous functions which are everywhere direc-
tionally differentiable. These are analogous to Examples 18.5.6 and 18.5.7 respectively, which are merely
everywhere partially differentiable.
Since all totally differentiable functions are continuous, it follows that the discontinuous Examples 18.5.11
and 18.5.12 are not totally differentiable. Hence directional differentiability everywhere does not imply total
differentiability.
18.5.11 Example: Define g : IR → IR by g(t) = t exp((1 − t^2 )/2) for all t ∈ IR. (See Figure 18.5.3.) Then
g is a C^∞ function on IR, g(1) = 1 and |g(t)| ≤ 1 for all t ∈ IR.
[ Figure 18.5.3: graph of g(t) = t exp((1 − t^2 )/2) for t from −3 to 3. ]
Figure 18.5.3 The function g : IR → IR with g : t ↦ t exp((1 − t^2 )/2)
Define f : IR2 → IR by
∀x ∈ IR2 , f (x) = g(x_1^2 /x_2 ) if x_2 ≠ 0, and f (x) = 0 if x_2 = 0.
Thus f (x) = x_1^2 x_2^{−1} exp((1 − x_1^4 x_2^{−2} )/2) if x_2 ≠ 0, and f (x) = 0 if x_2 = 0.
Then f is directionally differentiable on IR2 , but f is not continuous at (0, 0) ∈ IR2 . Consequently f is not
totally differentiable at (0, 0). Note that |f (x)| ≤ 1 for all x ∈ IR2 . The level curves of this function are
illustrated in Figure 18.5.4.
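The two properties of the function f in Example 18.5.11 can be probed numerically at the origin. This is a sketch only. The helper name quotient and the sample direction v = (1, 2) are illustrative choices made here. Along a line through the origin with v_2 ≠ 0 one has f (tv) = g(t v_1^2 /v_2 ), so the difference quotient should approach g′(0) v_1^2 /v_2 with g′(0) = exp(1/2). Along the parabola x_2 = x_1^2 the value of f is g(1) = 1.

```python
import math

def g(t):
    """g(t) = t exp((1 - t**2) / 2)."""
    return t * math.exp((1.0 - t * t) / 2.0)

def f(x1, x2):
    """The function of Example 18.5.11: g(x1**2 / x2), or 0 when x2 = 0."""
    if x2 == 0.0:
        return 0.0
    return g(x1 * x1 / x2)

def quotient(v1, v2, t):
    """Difference quotient (f(t v) - f(0, 0)) / t at the origin in direction v."""
    return f(t * v1, t * v2) / t

# Directional quotients in the illustrative direction v = (1, 2).
expected = math.sqrt(math.e) * 1.0 ** 2 / 2.0
for t in (1e-2, 1e-4, 1e-6):
    print(t, quotient(1.0, 2.0, t), expected)

# Values along the parabola x2 = x1**2, approaching the origin.
parabola_values = [f(s, s * s) for s in (1e-1, 1e-2, 1e-3)]
print(parabola_values)
```

The quotients settle near the expected limit, while the parabola values stay near 1 and so do not tend to f (0, 0) = 0.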
18.5.12 Example: As in Example 18.5.7, let h : ℤ⁺ → ℚ² be an enumeration of the set ℚ² of elements
of IR2 which have rational components. Define φ : IR2 → IR by
φ(x) = Σ_{k ∈ ℤ⁺} 2^{−k} f (x − h(k))
for all x ∈ IR2 , where f is as defined in Example 18.5.11 and x − a denotes the element (x1 − a1 , x2 − a2 )
of IR2 for all a ∈ IR2 . Then φ is well defined and directionally differentiable on IR2 , but φ is discontinuous
at all points a ∈ ℚ² . Thus φ is discontinuous on a dense subset of its domain IR2 . Note that |φ(x)| ≤ 1 for
all x ∈ IR2 . (For all points a ∈ ℚ² , the function φ is also not totally differentiable at a.)
18.5.13 Remark: Definition 18.5.14 is a one-sided derivative version of Definition 18.5.8.
18.5.14 Definition: A function f : U → IRm for U ∈ Top(IRn ) with m, n ∈ ℤ⁺₀ is said to be
unidirectionally differentiable at p ∈ U if
∀v ∈ IRn , ∃w ∈ IRm , ∀ε > 0, ∃δ > 0, ∀t ∈ (0, δ),
p + tv ∈ U ⇒ |f (p + tv) − f (p) − tw| ≤ ε|t|. (18.5.3)
A function f : U → IRm for U ∈ Top(IRn ) is said to be unidirectionally differentiable on U if it is
unidirectionally differentiable at all points p ∈ U .

[ Figure 18.5.4: level curves f (x) = g(c) for c = ±1/2, ±1, ±2, and f (x) = 0, in the (x1 , x2 ) plane. ]
Figure 18.5.4 Level curves of f (x) = x_1^2 x_2^{−1} exp((1 − x_1^4 x_2^{−2} )/2), f : IR2 → IR
[ Show the uniqueness of unidirectional derivatives. Show that the directional derivative in a particular
direction is well defined if and only if the two corresponding unidirectional derivatives are well defined and
equal. ]
18.5.15 Remark: Just as in Remark 18.5.9, it is easy to show that the m-tuple w in Definition 18.5.14 is
uniquely determined by the n-tuple v for each point p ∈ U . Therefore w ∈ IRm is a well-defined function of
v ∈ IRn for each fixed p ∈ U . Denote this well-defined function by φp : IRn → IRm . It is immediately clear
from Definition 18.5.14 that φp (u) = λφp (v) for any u, v ∈ IRn which satisfy u = λv for some λ ∈ (0, ∞).
Thus each function φp is linear on one-dimensional directed linear subspaces S_v^+ = {λv; λ ∈ IR⁺₀ } of the
linear space IRn for v ∈ IRn . However, it is not true in general that φp is linear on linear subspaces
S_v = {λv; λ ∈ IR} of IRn . It is a fortiori not true in general that φp is linear on the whole of IRn .
18.5.16 Remark: Since a directionally differentiable function is necessarily unidirectionally differentiable,
it follows that Examples 18.5.11 and 18.5.12 demonstrate the existence of discontinuous functions which are
everywhere unidirectionally differentiable.
18.5.17 Definition: A function f : U → IRm for U ∈ Top(IRn ) with m, n ∈ ℤ⁺₀ is said to be totally
differentiable at p ∈ U if
∃L ∈ Lin(IRn , IRm ), ∀ε > 0, ∃δ > 0, ∀v ∈ B0,δ ,
p + v ∈ U ⇒ |f (p + v) − f (p) − L(v)| ≤ ε|v|. (18.5.4)
A function f : U → IRm for U ∈ Top(IRn ) is said to be totally differentiable on U if it is totally differentiable
at all points p ∈ U .
If a function f : U → IRm is totally differentiable at a point p ∈ U , the total differential of f at p is the
linear map L in equation (18.5.4).
If a function f : U → IRm is totally differentiable on U ∈ Top(IRn ), the total differential of f is the function
df : U → Lin(IRn , IRm ) defined so that for all p ∈ U , df (p) is the total differential of f at p.
[ Show the uniqueness of the total differential. Give an example where the total differential is not continuous
with respect to the base point p. ]
18.5.18 Remark: The total differential of a function γ : IR → IRm may be visualized as a tangent vector
to a curve. In the case of a function f : IRn → IR, the total differential may be visualized as a tangent plane
to constant-value contours of f . Figure 19.2.4 gives a rough impression of such tangent vectors and tangent
planes.
18.5.19 Theorem: If a function f : U → IRm for U ∈ Top(IRn ) with m, n ∈ ℤ⁺₀ is totally differentiable
at p ∈ U , then f has two-sided directional derivatives in every direction at p.

18.5.20 Theorem: If a function f : U → IRm for U ∈ Top(IRn ) with m, n ∈ ℤ⁺₀ has continuous partial
derivatives on U , then f is totally differentiable on U .
[ Theorem 18.5.20 implies that the directional derivatives can be replaced with a linear combination of partial
derivatives. I.e. ∂_v f = v^i ∂_i f . Present an example which shows that the continuity is required. ]
[ Show that if the total differential is defined at a point, then the directional derivatives (and unidirectional
derivatives) are well defined and are linear functions of the direction vector. Give an example to show
that the converse is not true. In other words, linearity of directional derivatives does not guarantee total
differentiability. Examples 18.5.11 and 18.5.12 demonstrate this in fact. ]
[ Give the example of $f(x, y) = x^3 (x^2 + y^2)^{-1}$ which has directional derivatives everywhere but is not
differentiable in the limiting linear sense. ]
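The behaviour of this example at the origin may be checked numerically. The following Python sketch is only an illustration: the value f(0, 0) = 0 is an assumption (the note does not state it), and the helper name directional_derivative_at_origin is hypothetical. It compares the directional derivative in the direction (1, 1) with the value predicted by the linear combination of partial derivatives.

```python
def f(x, y):
    """f(x, y) = x^3 / (x^2 + y^2), with f(0, 0) = 0 assumed for this sketch."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**3 / (x**2 + y**2)


def directional_derivative_at_origin(a, b, h=1e-6):
    """Two-sided difference quotient of f at (0, 0) in the direction (a, b)."""
    return (f(h * a, h * b) - f(-h * a, -h * b)) / (2.0 * h)


# The partial derivatives at the origin are the derivatives along the axes.
fx = directional_derivative_at_origin(1.0, 0.0)
fy = directional_derivative_at_origin(0.0, 1.0)

# Directional derivative along (1, 1), and the value which linearity would predict.
direction = (1.0, 1.0)
actual = directional_derivative_at_origin(*direction)
linear_guess = direction[0] * fx + direction[1] * fy

print("partials at origin:", fx, fy)
print("directional derivative along (1, 1):", actual)
print("linear combination of partials:", linear_guess)
```

Since f is homogeneous of degree 1, the difference quotient along (a, b) equals a^3/(a^2 + b^2) for every step size, which is not a linear function of (a, b). So the two printed values disagree, which is consistent with f not being totally differentiable at the origin.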
[ Should include the implicit function theorem and the inverse function theorem here for use in Section 33.1. ]
18.6. Higher-order derivatives for several variables

[ The definitions for multi-indices and factorial functions should be in the number chapters. ]
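18.6.1 Definition: A multi-index is an element of $(\mathbb{Z}_0^+)^k$ for some $k \in \mathbb{Z}^+$.
An addition operation is defined on multi-indices in $(\mathbb{Z}_0^+)^k$ by $\alpha + \beta = (\alpha_1 + \beta_1, \ldots, \alpha_k + \beta_k)$, where
$\alpha = (\alpha_1, \ldots, \alpha_k)$ and $\beta = (\beta_1, \ldots, \beta_k)$.
A length function is defined on $(\mathbb{Z}_0^+)^k$ by $|\alpha| = \sum_{i=1}^k \alpha_i$ for all $\alpha \in (\mathbb{Z}_0^+)^k$.
A factorial function is defined on $(\mathbb{Z}_0^+)^k$ by $\alpha! = \prod_{i=1}^k \alpha_i!$ for all $\alpha \in (\mathbb{Z}_0^+)^k$.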
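The three multi-index operations can be sketched in a few lines of Python. This is a minimal illustration only: multi-indices are represented as tuples of non-negative integers, and the function names mi_add, mi_length and mi_factorial are hypothetical.

```python
from math import factorial, prod


def mi_add(alpha, beta):
    """Componentwise sum of two multi-indices with the same number of entries."""
    if len(alpha) != len(beta):
        raise ValueError("multi-indices must have the same number of entries")
    return tuple(a + b for a, b in zip(alpha, beta))


def mi_length(alpha):
    """Length of a multi-index: the sum of its entries."""
    return sum(alpha)


def mi_factorial(alpha):
    """Factorial of a multi-index: the product of the factorials of its entries."""
    return prod(factorial(a) for a in alpha)


alpha = (2, 0, 1)
beta = (1, 3, 0)

print(mi_add(alpha, beta))      # componentwise sum
print(mi_length(alpha))         # 2 + 0 + 1
print(mi_factorial(alpha))      # 2! * 0! * 1!
```

The length function is additive with respect to the addition operation, whereas the factorial function is not.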
18.6.2 Definition: The αth derivative of a function $f : U \to \mathbb{R}$ for an open set $U \subseteq \mathbb{R}^n$ and $\alpha \in (\mathbb{N}_n)^k$
for $k \in \mathbb{Z}_0^+$ is. . .
[ Define $C^r$, $C^\infty$ and analytic functions of several variables here. For $C^k$ functions, for example, want something
like $\forall \alpha \in (\mathbb{Z}_0^+)^n,\ |\alpha| \le k \Rightarrow D^\alpha f \in C^0(U)$. Could maybe recursively define $C^{k+1}(\Omega) = \{f \in
C^k(\Omega);\ \forall i \in \mathbb{N}_n,\ \partial_i f \in C^k(\Omega)\}$. See EDM2 [35], 106.K, page 397. ]
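18.6.3 Theorem: Let $f \in C^1(\Omega)$ for some $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ and $n \in \mathbb{Z}_0^+$.
Then
$$\forall x \in \Omega,\ \forall v \in \mathbb{R}^n,\quad \sum_{i=1}^n v^i \frac{\partial f(x)}{\partial x^i} = \lim_{a \to 0} \frac{f(x + av) - f(x)}{a}.$$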
[ Discuss examples where the partial derivatives exist but do not commute. E.g. see Rudin [137], page 221. ]
[ Show that u ∈ C 2 (Ω) and p ∈ Ω a local maximum of u implies that aij uij ≤ 0 at p. ]
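18.6.4 Theorem: The composition rules for differentiation with several variables are as follows.
$$\begin{aligned}
h_i &= f_j \phi^j{}_i \\
h_{ij} &= f_{k\ell} \phi^k{}_i \phi^\ell{}_j + f_k \phi^k{}_{ij} \\
h_{ijk} &= f_{\ell mn} \phi^\ell{}_i \phi^m{}_j \phi^n{}_k + f_{\ell m} (\phi^\ell{}_{ik} \phi^m{}_j + \phi^\ell{}_{jk} \phi^m{}_i + \phi^\ell{}_{ij} \phi^m{}_k) + f_\ell \phi^\ell{}_{ijk},
\end{aligned}$$
and so forth, where $h(x) = f(\phi(x))$ for $x \in \mathbb{R}^d$, and $\phi : \mathbb{R}^d \to \mathbb{R}^d$ is a one-to-one differentiable function. It
is assumed that $f$ and $\phi$ are $C^r$, where $r$ is the order of derivative of $h$.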
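The second-order composition rule can be checked numerically for a particular pair of maps. The following Python sketch is an illustration under stated assumptions: the maps f(u, v) = u^2 v and φ(x, y) = (x + y^2, xy) on IR2, the sample point, and all function names are chosen here only for the test and do not come from the text. Repeated indices in the rule are summed. The rule is compared with central second differences of h = f ◦ φ.

```python
def phi(x):
    # Sample map phi : IR^2 -> IR^2 (an assumption for this sketch).
    return (x[0] + x[1] ** 2, x[0] * x[1])


def dphi(x):
    # dphi(x)[k][i] is the derivative of component k of phi with respect to x^i.
    return ((1.0, 2.0 * x[1]), (x[1], x[0]))


def d2phi(x):
    # d2phi(x)[k][i][j] is the second derivative of component k of phi.
    return (((0.0, 0.0), (0.0, 2.0)), ((0.0, 1.0), (1.0, 0.0)))


def f(u):
    # Sample real-valued function f(u, v) = u^2 v (an assumption for this sketch).
    return u[0] ** 2 * u[1]


def df(u):
    return (2.0 * u[0] * u[1], u[0] ** 2)


def d2f(u):
    return ((2.0 * u[1], 2.0 * u[0]), (2.0 * u[0], 0.0))


def h(x):
    return f(phi(x))


def h_second_by_chain_rule(x):
    """h_ij = f_kl phi^k_i phi^l_j + f_k phi^k_ij, summed over k and l."""
    u = phi(x)
    f1, f2, p1, p2 = df(u), d2f(u), dphi(x), d2phi(x)
    n = 2
    return [[sum(f2[k][l] * p1[k][i] * p1[l][j] for k in range(n) for l in range(n))
             + sum(f1[k] * p2[k][i][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def h_second_by_differences(x, eps=1e-4):
    """Central second differences of h, used only as an independent check."""
    n = 2

    def shifted(i, si, j, sj):
        y = list(x)
        y[i] += si * eps
        y[j] += sj * eps
        return h(y)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4.0 * eps ** 2)
             for j in range(n)] for i in range(n)]


x0 = (0.7, -0.3)
exact = h_second_by_chain_rule(x0)
approx = h_second_by_differences(x0)
print(exact)
print(approx)
```

The two matrices should agree up to the truncation and rounding error of the difference scheme. The hand-coded derivative tables are the only part which is specific to the sample maps.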
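18.6.5 Definition: A $C^k$ (differentiable) curve in $\mathbb{R}^m$ for $m \in \mathbb{Z}_0^+$, $k \in \bar{\mathbb{Z}}_0^+$ and an interval $I \subseteq \mathbb{R}$ is a
map $\gamma \in C^0(I, \mathbb{R}^m)$ such that $\gamma\big|_{\mathrm{Int}(I)}$ is of class $C^k$.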
18.6.6 Remark: Definition 18.6.5 means that a map γ : I → IRm is a C k curve if and only if it is continuous
on the whole interval I and C k on the interior of I.

18.7. Some differentiability-based function spaces

18.7.1 Remark: The linear spaces which are most used in differential geometry are tuple spaces and
function spaces. The tuple spaces are mostly finite-dimensional, typically real-valued.
The functions in the function spaces are typically real-valued and defined on finite-dimensional tuple space
domains. Notation 18.7.2 is an example of these kinds of function spaces.
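18.7.2 Notation: $C^k(\Omega)$ denotes $\{f : \Omega \to \mathbb{R};\ f \text{ is } C^k\}$ for $k \in \bar{\mathbb{Z}}_0^+$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$ and $n \in \mathbb{Z}_0^+$.
18.7.3 Notation: $C^k(\Omega, \mathbb{R}^m)$ denotes $\{f : \Omega \to \mathbb{R}^m;\ f \text{ is } C^k\}$ for $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $m, n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$.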
[ Also define such spaces as C̊ i (Ω, IRn ). See Definition 18.8.1. ]

18.7.4 Theorem: The space $C^k(\Omega, \mathbb{R}^m)$, for any $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, for any $m, n \in \mathbb{Z}_0^+$ and $k \in \bar{\mathbb{Z}}_0^+$, is
closed under the operations of pointwise multiplication by real numbers, and pointwise function addition
and multiplication.
18.7.5 Remark: The notation C ω is not used for analytic functions in this book because of the ambiguity
caused by the notation ω for the set of finite ordinal numbers, which is sometimes taken as the definition of
the countable infinity ∞.
Analytic functions are not very useful in applications of differential geometry to general relativity. Analytic
functions are completely determined globally by values on an arbitrarily small neighbourhood of an arbitrary
point. This is not realistic for macroscopic physical systems. Therefore analytic functions are not emphasized
in this book.
[ Maybe could invent some notation A(IRn ) or (IRn ) for the analytic functions? ]
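18.7.6 Theorem: Let $n \in \mathbb{Z}_0^+$, $k \in \mathbb{Z}_0^+$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $f \in C^k(\Omega)$, $p \in \Omega$ and $\alpha \in (\mathbb{N}_n)^k$. Then
$f_\alpha(p) = f_{P(\alpha)}(p)$ for all permutations $P : \mathbb{N}_k \to \mathbb{N}_k$. In other words, the value of the derivative is
independent of the order of differentiation.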
18.7.7 Remark: See Notation 7.2.33 for the sets of integers $\mathbb{N}_k$.
18.7.8 Remark: There are so many function spaces in analysis that it is sometimes useful to invent a
metanotation for them such as K. Then one could create templates for definitions such as “manifolds of class K”
or “class K manifolds”, where K might mean $C^k$, $C^{k,\alpha}$ or analytic.
A related generalization of regularity classes is the notion of a “pseudogroup of diffeomorphisms” between
open subsets of IRn . This is defined in Section 19.4.
Most of this book is written in terms of C k manifolds because this level of refinement of regularity is almost
always an adequate starting point. Such spaces are readily refined further, for example to Hölder regularity
classes C k,α if required. The important thing is to avoid the blanket use of C ∞ spaces which remove all
motivation to bring differentiability into consideration. It should not be forgotten that differential geometry is
an extension of analysis from flat space to curved space. Analysis is not a minor extension topic of differential
geometry. The real business of differential geometry is to solve differential equations. Determining whether
a manifold can be stretched into a donut or a sphere is a minor recreational consideration.
[ Define notations for alternating tensor tangent bundles of the form Λm T (V, U ). Also define sets of cross-
sections like X k (T (r,s) (V )) and X k (Λm T (V, U )). ]
[ Define Lie derivatives for flat space somewhere near here. ]
18.8. Differentiation for abstract linear spaces

[ This section needs to be fixed. Maybe it could be generalized to infinite-dimensional linear spaces. ]
Although all finite-dimensional real linear spaces are isomorphic to $\mathbb{R}^n$ for some n, there is sometimes a need
for differential calculus to be defined for general finite-dimensional spaces.
[ See Malliavin [36], section 2.2 for C k (V, W ) etc. ]

[ Maybe Definition 18.8.1 is not a good definition. Should base Definition 18.8.1 on Definition 18.2.5? ]
18.8.1 Definition: A differentiable function from a normed linear space V to a normed linear space W
is a function f : Ω → W for an open set Ω ∈ Top(V ) such that
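$$\forall x \in \Omega,\ \exists \phi \in \mathrm{Lin}(V, W),\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall x' \in \Omega,\quad |x' - x|_V < \delta \Rightarrow |f(x') - f(x) - \phi(x' - x)|_W \le \varepsilon |x' - x|_V.$$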
[ Is the derivative of a function f : A → B equivalent to a special case of the covariant derivative on a manifold
with a connection? ]
[ Define C k diffeomorphisms in flat space. See Malliavin [36], section 2.3. ]
[ Follow the notation of Federer here. See Federer [106] 3.1.11 and 3.1.1. ]
[ Define also C r (V, W ). Then get D2 f (a) ∈ Lin(V, Lin(V, W )), etc. ]
[ Also define C̊ r (V, W ). ]
[ Show the relations between the abstract linear space definitions and the n-tuple linear space definitions for
differentiability. ]
18.9. Hölder continuity

[ Hölder continuity could be defined for general metric spaces, somewhere near the definition of uniformly
continuous functions. ]
Hölder continuous functions are defined in terms of the standard norm on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$.
[ Also define Hölder continuity for the general domains $\mathbb{R}^n$. Perhaps generalize also to general ranges $\mathbb{R}^m$.
Maybe should even generalize to arbitrary metric spaces for both domain and range. ]
[ Carefully distinguish Hölder continuity definitions according to whether they are local or global, and whether
they are uniform or pointwise. ]
18.9.1 Definition: A Hölder-continuous function with exponent α or α-Hölder (continuous) function at
$x \in S$ for a set $S \subseteq \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ and $\alpha \in (0, 1]$ is a function $f : S \to \mathbb{R}$ such that
$$\exists K \in \mathbb{R},\ \forall y \in S,\quad |f(y) - f(x)| \le K |y - x|^\alpha.$$
A uniformly Hölder-continuous function with exponent α or uniformly α-Hölder (continuous) function on a
set $S \subseteq \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ and $\alpha \in (0, 1]$ is a function $f : S \to \mathbb{R}$ such that
$$\exists K \in \mathbb{R},\ \forall x, y \in S,\quad |f(y) - f(x)| \le K |y - x|^\alpha.$$
A locally Hölder-continuous function with exponent α or locally α-Hölder (continuous) function on a
set $S \subseteq \mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ and $\alpha \in (0, 1]$ is a function $f : S \to \mathbb{R}$ such that $f$ is uniformly α-Hölder
continuous on all bounded subsets of $S$.
18.9.2 Remark: As suggested in Figure 18.9.1, the α-Hölder continuity conditions with larger values of α
are the most restrictive.
So, for example, all 0.75-Hölder continuous functions are also 0.25-Hölder continuous, but not vice versa.
In terms of Notation 18.9.3, this means that C 0,3/4 (U ) ⊆ C 0,1/4 (U ) for any open set U ⊆ IRn . The most
restrictive α-Hölder condition (i.e. the condition with the greatest regularity) is the 1-Hölder condition, which
is the same as the Lipschitz property on bounded sets.
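The ordering of the Hölder conditions can be illustrated numerically with the function |x|^0.5 from Figure 18.9.1. The following Python sketch is only a spot check at the base point x = 0 with a hypothetical helper holder_quotient. It evaluates the quotient |f(y) − f(x)| / |y − x|^α for the exponents 0.25, 0.5 and 0.75 as y approaches 0. It samples a few points and proves nothing.

```python
def holder_quotient(f, x, y, alpha):
    """The quotient |f(y) - f(x)| / |y - x|^alpha which a Hoelder constant K must bound."""
    return abs(f(y) - f(x)) / abs(y - x) ** alpha


def sqrt_abs(x):
    """The sample function f(x) = |x|^0.5."""
    return abs(x) ** 0.5


# Points y = 10^-1, ..., 10^-8 approaching the base point x = 0.
scales = [10.0 ** (-k) for k in range(1, 9)]

q_weak = [holder_quotient(sqrt_abs, 0.0, y, 0.25) for y in scales]
q_exact = [holder_quotient(sqrt_abs, 0.0, y, 0.5) for y in scales]
q_strong = [holder_quotient(sqrt_abs, 0.0, y, 0.75) for y in scales]

print("alpha = 0.25:", q_weak)    # stays bounded and decreases towards 0
print("alpha = 0.50:", q_exact)   # constant
print("alpha = 0.75:", q_strong)  # grows without bound as y tends to 0
```

The quotient for exponent 0.75 equals |y|^(−0.25), which is unbounded near 0, so |x|^0.5 satisfies the 0.25 and 0.5 conditions at 0 but not the more restrictive 0.75 condition.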
[ Check the distinction between local and global Hölder continuity definitions in Remark 18.9.2 and Nota-
tion 18.9.3. ]
18.9.3 Notation: $C^{k,\alpha}(\Omega)$ for an open subset $\Omega$ of $\mathbb{R}^n$, $n, k \in \mathbb{Z}_0^+$ and $\alpha \in (0, 1]$ denotes the set of
functions $f \in C^k(\Omega)$ such that the function $x \mapsto d^k f(x)/dx^K$ is locally α-Hölder continuous for all $x \in \Omega$
and multi-indices $K \in (\mathbb{Z}_0^+)^n$ with $|K| = k$.

[Figure: graphs of $f(x) = \pm|x|^\alpha$ for α = 0.25, 0.5, 0.75 and 1 over $-2 \le x \le 2$.]
Figure 18.9.1 Fractional powers $f(x) = |x|^\alpha$, $\alpha \in (0, 1]$
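18.9.4 Remark: It seems reasonable that $C^{\infty,\alpha}(\Omega)$ for an open subset $\Omega$ of $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$ and $\alpha \in (0, 1]$
would denote the set $\bigcap_{k=0}^\infty C^{k,\alpha}(\Omega)$. However, this is identical to $C^\infty(\Omega) = \bigcap_{k=0}^\infty C^k(\Omega)$. Therefore the $C^{\infty,\alpha}$
spaces are generally not defined.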
[ Give a family tree of Hölder continuity properties. ]

[ This section should also define one-sided derivatives and upper and lower derivatives, and give theorems
about these for general functions, monotone functions, Lipschitz functions, etc. ]
[ It should also be possible to define derivatives for functions on a dense subset of IR, such as $\mathbb{Q}$ for example. ]

[447]
Chapter 19
Diffeomorphisms in Euclidean space
19.1 Tangent vectors and diffeomorphisms
19.2 Differentials and diffeomorphisms
19.3 Second-level tangent vectors and diffeomorphisms
19.4 Diffeomorphism pseudogroups
19.5 Second-order differential operators and diffeomorphisms
19.6 Directionally differentiable homeomorphisms
Diffeomorphisms between subsets of IRn are the foundation of differentiable structures on n-dimensional
manifolds. In fact, the differentiable manifold concept may be thought of as merely the addition of global
topology to the subject of local diffeomorphisms. The local properties of differentiable manifolds (i.e. the
properties within any individual chart) are the same as for Euclidean space. Differentiable manifolds simply
stitch together patches of Euclidean space. This observation does not apply to the higher-layer structures of
differential geometry, however, such as connections and Riemannian metrics, which are not so easy to simply
stitch together from Euclidean space patches.
19.1. Tangent vectors and diffeomorphisms

19.1.1 Remark: The concept of a tangent vector arises naturally as an invariant attribute of differentiable
functions and curves under diffeomorphisms. Both the gradient of a real-valued function and the tangent to
a curve are invariant under diffeomorphisms whereas higher-order differential operators are not invariant in
a comparable way.
The group of rotations of Euclidean space leaves distances, angles and the classes of spheres and lines
invariant. The general linear group leaves parallelism and lines invariant. Diffeomorphisms leave the class
of tangent vectors invariant. (The larger the class of transformations, the smaller is the set of invariants.)
Tangent vectors and diffeomorphisms are closely associated with each other. Tangent vectors are, roughly
speaking, the maximal invariant of diffeomorphisms, whereas diffeomorphisms are, roughly speaking, the
maximal class of transformations which preserves tangent vectors. For example, Lipschitz transformations
do not preserve tangent vectors; and the curvature of curves is not preserved by diffeomorphisms. (This
observation is related to the Erlanger Programm. See Remark 19.4.2.)
19.1.2 Definition: For $n \in \mathbb{Z}_0^+$ and $r \in \bar{\mathbb{Z}}_0^+$, a (local) $C^r$ diffeomorphism on $\mathbb{R}^n$ is a homeomorphism
$\phi : \Omega_1 \approx \Omega_2$ for open subsets $\Omega_1, \Omega_2$ of $\mathbb{R}^n$ such that both $\phi$ and $\phi^{-1}$ are $C^r$ differentiable maps.
The sets $\Omega_1$ and $\Omega_2$ are said to be $C^r$-diffeomorphic if there exists a $C^r$ diffeomorphism from $\Omega_1$ to $\Omega_2$.
19.1.3 Remark: The composition φ2 ◦ φ1 of any pair of C r diffeomorphisms φ1 and φ2 in Definition 19.1.2
is well defined according to the generalized composition of functions given by Definition 6.3.23 (which does
not require Range(φ1 ) ⊆ Dom(φ2 )). Then the set of C r diffeomorphisms on IRn in Definition 19.1.2 is closed
under composition. This follows from the fact that the composite of C r functions is C r . The inverse of a
C r diffeomorphism is also clearly a C r diffeomorphism. (The closure of diffeomorphisms under composition
is related to the concept of a “pseudogroup”. See Section 19.4.)


The transitivity, symmetry and reflexivity of the C r diffeomorphism relation imply that it is an equivalence
relation, and therefore the set of open sets of IRn is partitioned into equivalence classes by this relation. (See
Section 6.4 for equivalence relations and partitions.)
19.1.4 Remark: Figure 19.1.1 illustrates the transformation of tangent vectors of a curve when the curve
lies within the domain of a diffeomorphism from IRn to IRn . (C r curves in IRn are given by Definition 18.6.5.)
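[Figure: a curve γ through $p = \gamma(t)$ in $\mathbb{R}^n$ with tangent vector $v^i = \partial_t \gamma^i(t)$, and its image $\tilde\gamma = \phi \circ \gamma$ through $\tilde p = \phi(p)$ in $\mathbb{R}^n$ with tangent vector $\tilde v^i = \partial_t \tilde\gamma^i(t) = \phi^i{}_{,k}(p) v^k$. A second tangent vector $w$ at a point $q$ is mapped similarly to $\tilde w$ at $\tilde q$.]
Figure 19.1.1 Transformation of tangent vector of a curve under a diffeomorphism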
As illustrated in Figure 19.1.2, the derivative (i.e. tangent vector) of a differentiable curve $\gamma : \mathbb{R} \to \mathbb{R}^n$
is transformed according to only the first derivatives (the Jacobian matrix) of a diffeomorphism. One may
write the transformation rule more compactly as $\partial_t (\phi \circ \gamma(t))^i = \phi^i{}_{,k}\, \partial_t \gamma^k(t)$.
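[Figure: the derivative operation maps the curve γ to $V = \gamma'(t)$ and the transformed curve $\phi \circ \gamma = \tilde\gamma$ to $\tilde V = (\phi \circ \gamma)'(t)$. The induced map $V \mapsto \tilde V = (\tilde v^i)_{i=1}^n$ is given by $\tilde v^i = \frac{\partial \phi^i(x)}{\partial x^j}\Big|_{x=\gamma(t)} v^j$.]
Figure 19.1.2 Transformation of tangent to a curve under a diffeomorphism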
19.1.5 Remark: In practical terms, the transformation rule for tangents to curves means that you don’t
have to re-calculate tangent vectors when the curves are subjected to diffeomorphisms. The tangent vector
of a transformed curve may be calculated from the untransformed curve using only the derivatives of the
diffeomorphism, without needing to differentiate the curve again. This suggests that a tangent vector has
some sort of existence independent of the particular curve from which it is constructed.
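The practical content of this transformation rule can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration: the map φ(x, y) = (exp(x), y + x^2), which is a diffeomorphism from IR2 onto an open half-plane, the circular curve γ, and the function names. The sketch computes the transformed tangent vector from the Jacobian of φ alone and compares it with a direct numerical differentiation of the transformed curve.

```python
import math


def phi(p):
    # Sample diffeomorphism (an assumption for this sketch).
    x, y = p
    return (math.exp(x), y + x * x)


def jacobian(p):
    # jacobian(p)[i][k] is the derivative of component i of phi with respect to x^k.
    x, y = p
    return ((math.exp(x), 0.0), (2.0 * x, 1.0))


def gamma(t):
    # Sample curve: the unit circle.
    return (math.cos(t), math.sin(t))


def gamma_dot(t):
    return (-math.sin(t), math.cos(t))


def push_forward(p, v):
    """Transform the tangent vector v at p using only the Jacobian of phi."""
    J = jacobian(p)
    return tuple(sum(J[i][k] * v[k] for k in range(2)) for i in range(2))


def transformed_velocity_by_differences(t, eps=1e-6):
    """Differentiate the transformed curve phi(gamma(t)) directly, as a check."""
    a = phi(gamma(t + eps))
    b = phi(gamma(t - eps))
    return tuple((a[i] - b[i]) / (2.0 * eps) for i in range(2))


t0 = 0.4
v_tilde = push_forward(gamma(t0), gamma_dot(t0))
v_check = transformed_velocity_by_differences(t0)
print(v_tilde)
print(v_check)
```

The two results agree up to the error of the difference quotient, although push_forward never differentiates the transformed curve.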
19.1.6 Remark: The transformation rule for tangent vectors in Remark 19.1.4 is obtained by comparing
the derivative of a curve with the derivative of the transformed curve. This suggests that derivatives and
curves are part of the “essence” of tangent vectors. This is more or less true. However, for C 1 functions
f : IRn → IR, the following identity holds for any v ∈ IRn .
$$\sum_{i=1}^n v^i \frac{\partial f}{\partial x^i}(x) = \lim_{a \to 0} \frac{f(x + av) - f(x)}{a}. \tag{19.1.1}$$
The left-hand side of (19.1.1) is a convenient method of calculating the right-hand side, but it is the right-
hand side which is closer to the “true meaning” of the vector v ∈ IRn . The left-hand side uses a particular
orientation of axes for calculating partial derivatives whereas the right-hand side is effectively written in
terms of the vector from x to x + av, in other words the vector av at the point x. The function f has the
role of a “test function” to help express the idea of the limit of the vector av as a tends to zero.
Both the C 1 functions f and the C 1 curves γ serve to clarify the meaning of an infinitesimal vector. The limit
operation serves to neutralize the curvature effect of non-linear diffeomorphisms so that the transformation
rules will not be erroneous. Ultimately, it does seem that the only way to derive the properties of tangent
vectors (and other tangential objects) is to evaluate the effect of diffeomorphisms on them as in Remarks
19.1.4 and 19.3.2. Thus every class of tangential object on a manifold is a generalization of some differential
operator in a Euclidean space together with its transformation rules under diffeomorphisms. However, as
soon as the calculus has been performed on the concrete differential operator, it is best to abstract the
transformation rules from this and define the class of tangent objects purely in terms of transformation rules
rather than in terms of test functions or sample curves.
19.1.7 Remark: It is important to distinguish between “point transformations” and “coordinate transfor-
mations” because they are opposites even though they may have the same equations. The diffeomorphism
φ in Remark 19.1.4 is a point transformation in flat space IRn , but when the points of IRn are used as mere
coordinates for a manifold (as in Definition 26.3.2), the diffeomorphism is a coordinate transformation. If
ψ1 and ψ2 are charts for an n-dimensional manifold, the composite map $\psi_2 \circ \psi_1^{-1}$ is a local diffeomorphism
of IRn which plays the role of a coordinate transformation for the manifold, but is a point transformation in
the space IRn of coordinates.
[ Define the tangent bundle on IRn . Use this for Metadefinition 27.2.1. Definition 19.1.8 needs to be made
more rigorous. Need a notation for the space of all C r diffeomorphisms on IRn . ]
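19.1.8 Definition: The tangent bundle on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the tuple $(T(\mathbb{R}^n), \pi, \mathbb{R}^n)$, where $T(\mathbb{R}^n) =
\mathbb{R}^n \times \mathbb{R}^n$ and $\pi : T(\mathbb{R}^n) \to \mathbb{R}^n$ satisfies $\pi : (p, v) \mapsto p$ for $(p, v) \in T(\mathbb{R}^n)$.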
[ Give the rule for the Jacobian of the composite of two diffeomorphisms? ]
19.1.9 Notation: $T_x(\mathbb{R}^n)$ for $n \in \mathbb{Z}_0^+$ and $x \in \mathbb{R}^n$ denotes the subset $\{x\} \times \mathbb{R}^n$ of the tangent bundle
$T(\mathbb{R}^n)$ on $\mathbb{R}^n$.
19.1.10 Remark: The set Tx (IRn ) in Notation 19.1.9 is called the “fibre at x”. In Definition 19.1.8, the
set T (IRn ) is called the “total space” of the tangent bundle on IRn , IRn (the third element of the specification
tuple) is called the “base space” of the tangent bundle, and the map π is called the projection map of the
tangent bundle.
19.1.11 Definition: A cross-section of the tangent bundle on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is a function $f : \mathbb{R}^n \to
T(\mathbb{R}^n)$ such that $f(x) \in T_x(\mathbb{R}^n)$ for all $x \in \mathrm{Dom}(f)$.
19.1.12 Definition: A $C^r$ cross-section of the tangent bundle on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ and $r \in \bar{\mathbb{Z}}_0^+$ is a
cross-section $f$ of the tangent bundle on $\mathbb{R}^n$ such that $f$ is of class $C^r$.
19.1.13 Notation: $X^r(T(\mathbb{R}^n))$ for $n \in \mathbb{Z}_0^+$ and $r \in \bar{\mathbb{Z}}_0^+$ denotes the set of $C^r$ cross-sections of $T(\mathbb{R}^n)$.
19.1.14 Remark: Diffeomorphisms between differentiable manifolds are given by Definition 26.9.8.
19.2. Differentials and diffeomorphisms

19.2.1 Remark: The differential (or gradient) of a function is a kind of transpose of the derivative of a
function. A derivative such as $\sum_{i=1}^n v^i \frac{\partial f}{\partial x^i}(x)$ depends on two inputs, namely the sequence of n components
$(v^i)_{i=1}^n$ and the sequence of n partial derivatives $(\frac{\partial f}{\partial x^i}(x))_{i=1}^n$. The sequence of partial derivatives of f is
called the “differential of f ”. This is often denoted as df , but in applied mathematics, the usual notation
is ∇f . (The nabla symbol ∇ is just the Delta symbol ∆ upside down. The ∆ symbol is the Greek letter
“D”, which is an abbreviation for the word “derivative”.)
19.2.2 Example: Figure 19.2.1 shows the “vectors” df (or ∇f ) for a quadratic function on $\mathbb{R}^2$. The
function f is defined by $f(x, y) = x^2/4 + y^2$ for $(x, y) \in \mathbb{R}^2$. This has the differential $df(x, y) = (x/2, 2y)$.
(See also Example 42.10.1.)
19.2.3 Remark: It turns out that the gradient is not a tangent vector. It does not obey the right trans-
formation rules for a tangent vector under diffeomorphisms. To see this, consider the point diffeomorphism
which is illustrated in Figure 19.2.2.

[Figure: the contours $f(x, y) = 0.9$ and $f(x, y) = 1.0$ of $f(x, y) = x^2/4 + y^2$ over $-2 \le x \le 2$, with the differential $(f_x, f_y) = (x/2, 2y)$ indicated.]
Figure 19.2.1 Differential of quadratic function on $\mathbb{R}^2$
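[Figure: left, the contours $f(x, y) = 0.75$ and $f(x, y) = 1.0$ of $f(x, y) = x^2 + y^2$, with gradient $(f_x, f_y) = (2x, 2y)$ and a tangent vector $v$ at $p$; right, their images under $\phi : (x, y) \mapsto (2x, y)$, namely the contours $\tilde f(x, y) = 0.75$ and $\tilde f(x, y) = 1.0$ of $\tilde f(x, y) = x^2/4 + y^2$, with gradient $(\tilde f_x, \tilde f_y) = (x/2, 2y)$ and the vector $\tilde v = \phi_*(v)$ at $\tilde p$.]
Figure 19.2.2 Effect of diffeomorphism on a differential in $\mathbb{R}^2$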
The point $(0.5, 0)$ is mapped to $(1, 0)$. If the function $f : \mathbb{R}^2 \to \mathbb{R}$ is transformed, point for point, by the
diffeomorphism, the value of $\tilde f = f \circ \phi^{-1}$ satisfies $\tilde f(1, 0) = f \circ \phi^{-1}(1, 0) = f(0.5, 0) = 0.25$ if $f$ is defined
by $f : (x, y) \mapsto x^2 + y^2$.
The gradient of $f$ satisfies $df(x, y) = (2x, 2y)$. Similarly, $d\tilde f(x, y) = (x/2, 2y)$. So $df(0.5, 0) = (1, 0)$
and $d\tilde f(1, 0) = (0.5, 0)$. However, the image (or “push-forth”) $\phi_*$ of $\phi$ maps the vector $v = (1, 0)$ at
$p = (0.5, 0)$ to $\phi_*(p, v) = (\tilde p, \tilde v)$ where $\tilde p = (1, 0)$ and $\tilde v = (2, 0)$. In other words, the tangent vector $(1, 0)$ at
$p$ is doubled in length by $\phi$ whereas the gradient $df$ at $p$ is halved in length. It seems that the gradient is
scaled exactly the opposite to a tangent vector. This is true in general because the gradient is a “cotangent
vector”, not a tangent vector. To be more precise, the gradient is a linear form on the space of tangent
vectors. Alternatively, a cotangent vector may be called a “covector” for short.
Just because something has n real components doesn’t mean it’s a vector. It is important to test “things
with n components” to ensure that they transform correctly under diffeomorphisms before accepting them
as tangent vectors.
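The opposite scaling of the tangent vector and the gradient in this example can be reproduced with a short Python sketch. The function names and the difference-quotient helper gradient are hypothetical. The map φ, the function f, the point p and the vector v are those of the example.

```python
def phi(p):
    # The point transformation (x, y) -> (2x, y) of the example.
    return (2.0 * p[0], p[1])


def phi_inv(q):
    return (q[0] / 2.0, q[1])


def f(p):
    return p[0] ** 2 + p[1] ** 2


def f_tilde(q):
    # The function f transported point for point by phi.
    return f(phi_inv(q))


def gradient(func, p, eps=1e-6):
    """Central-difference approximation of the n-tuple of partial derivatives."""
    g = []
    for i in range(len(p)):
        plus, minus = list(p), list(p)
        plus[i] += eps
        minus[i] -= eps
        g.append((func(plus) - func(minus)) / (2.0 * eps))
    return tuple(g)


p = (0.5, 0.0)
p_tilde = phi(p)

# Tangent vector: multiplied by the Jacobian of phi, which is diag(2, 1).
v = (1.0, 0.0)
v_tilde = (2.0 * v[0], 1.0 * v[1])

# Gradients of f at p and of the transported function at the transported point.
df_p = gradient(f, p)
df_tilde = gradient(f_tilde, p_tilde)

print("tangent vector:", v, "->", v_tilde)
print("gradient:", df_p, "->", df_tilde)
```

The first component of the tangent vector is doubled while the first component of the gradient is halved, in agreement with the values computed in the example.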
19.2.4 Remark: The gradient of a C 2 real-valued function on a Euclidean space is everywhere orthogonal
to the level curves (if the gradient is non-zero). The “length” is inversely related to the spacing between level
curves. (This idea of “length” is not a real metric. It does not transform like a metric under transformations.)
The orthogonality is an illusion. True orthogonality would be lost under a non-orthogonal linear transformation,
whereas the pseudo-orthogonality between tangent vectors and differentials seems to hold despite any
diffeomorphisms at all. In fact, the “vector” representing the differential is merely the set of n components
of the linear form which defines a tangent plane through a point. It is this tangent plane which transforms
like a true tangent vector, not the “normal vector” to that plane. If the “normal vector” is transformed like
a true tangent vector, it is transformed to a vector which is no longer normal to the tangent plane of the
contour curves (or contour surfaces for n > 2). This gives a clue as to how to visualize and think about
differentials (and gradients, differential forms and cotangent vectors).
Differentials are best thought of as a family of parallel tangent planes with a parameter t ∈ IR which equals
zero at the point where the differential is attached. These tangent planes may be thought of as the contour
curves or surfaces of a real-valued function. This is illustrated in Figure 19.2.3.
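[Figure: parallel hyperplanes $H_{-2}, \ldots, H_3$ near the point $p$, labelled by $t = -2, \ldots, 3$, with the n-tuple $w$ at $p$ and a vector $v \in H_1$, where $H_t = \{p + v;\ \sum_{i=1}^n w_i v^i = t\}$.]
Figure 19.2.3 Visualization of a cotangent vector as a family of hyperplanes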
This view of cotangent vectors is the inverse of the “velocity of a curve” view of tangent vectors. In the same
way that a vector $v$ at $p$ can be visualized as the velocity of a $C^1$ function $\gamma : \mathbb{R} \to \mathbb{R}^n$ with $\gamma(0) = p$, so also
can a cotangent vector $\lambda$ at $p$ be visualized as the gradient of a $C^1$ function $f : \mathbb{R}^n \to \mathbb{R}$ such that $f(p) = 0$.
The n-tuple $w$ in Figure 19.2.3 is only a parameter for the linear functional $\lambda_w : T_p(\mathbb{R}^n) \to \mathbb{R}$ defined by
$\lambda_w : v \mapsto \sum_{i=1}^n w_i v^i$. One may equivalently represent the linear functional by the family $(H_t)_{t \in \mathbb{R}}$ defined
by $H_t = \{p + v;\ \lambda_w(v) = t\} = \{p + v;\ \sum_{i=1}^n w_i v^i = t\}$. It is sufficient to specify only the plane $H_1$ (and
the point $p$), from which the other planes are easily calculated. Alternatively, one could represent the linear
functional as the function $f_{p,w} : \mathbb{R}^n \to \mathbb{R}$ defined by $f_{p,w} : x \mapsto \lambda_w(x - p)$.
In effect, a cotangent vector does not have a direction. The direction is illusory since it does not transform
like a true tangent vector. It is the hyperplanes which have a well-defined direction. The tangent plane H0
does transform correctly under diffeomorphisms.
An advantage of the representation as a family of hyperplanes $(H_t)_{t \in \mathbb{R}}$ is that its transformation rule is
trivial: the family is simply mapped by the point transformation $\phi : \mathbb{R}^n \to \mathbb{R}^n$. However, since the hyperplanes
are transformed to curved hypersurfaces in general, these do need to be flattened out. The function representation $f_{p,w}$ is
similarly easy to transform. After the transformation, the transformed n-tuple $\tilde w$ can be recalculated from
the transformed function $f_{p,w} \circ \phi^{-1}$. All things considered, the n-tuple together with the cotangent vector
transformation rule is the most convenient way of doing calculations. One must, however, always remember
that the n-tuple is not a real tangent vector. It is only a parameter for a linear functional.
19.2.5 Remark: Figure 19.2.4 illustrates the similarity and difference between the methods of visualizing
tangent vectors and cotangent vectors in flat space. In physics language one may summarize vectors and
cotangent vectors as follows.
– Tangent vectors are infinitesimal displacements in the point space.
– Cotangent vectors are infinitesimal gradients of potential energy fields.
In other words, a tangent vector is a kind of best-fit line for a differentiable curve, whereas a cotangent
vector is a kind of best-fit linear function for a differentiable real-valued function.
[Figure: left, a curve x = γ(t) through p with tangent vector v = γ′(0); right, the level sets t = −2, . . . , 2 of a function t = f (x) near p with cotangent vector w = df (p).]
Figure 19.2.4 Visualization of tangent vector v and cotangent vector w
19.2.6 Remark: Since the gradient of a real-valued function at a point has a transformation rule which
is entirely determined by the Jacobian of the transformation at the point, it makes sense to define a class
of object in which gradients of functions can “live”. The derivatives of functions γ : IR → IRn “live” in the
tangent bundle of IRn , which is the set of all tangent vectors at all points in IRn . The gradient of a function
f : IRn → IR obeys a different transformation rule to the tangent bundle. So a different kind of vector bundle
is required. The name for this is the “cotangent (vector) bundle”. The cotangent vector transformation rule
is the opposite to the tangent vector rule.
19.2.7 Remark: As illustrated in Figure 19.2.5, the differential of a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$
is also transformed according to only the first derivatives of a diffeomorphism. However, in this case, the
inverse linear transformation is used. Consequently the (contraction) products $w_i v^i$ and $\tilde w_i \tilde v^i$ are equal.
Hence the differential of a real-valued function on $\mathbb{R}^n$ with respect to the tangent vector along a curve in
$\mathbb{R}^n$ is independent of the choice of local coordinates. One may write the transformation rule for differentials
of real-valued functions more compactly as $(f \circ \phi)_{,i} = f_{,k}\, \phi^k{}_{,i}$.
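[Figure: the differential operation maps the function $f$ to $W = df(x)$ and the transformed function $f \circ \phi^{-1} = \tilde f$ to $\tilde W = d(f \circ \phi^{-1})(\phi(x))$. The induced map $W \mapsto \tilde W = (\tilde w_i)_{i=1}^n$ satisfies $w_i = \frac{\partial \phi^j(x)}{\partial x^i} \tilde w_j$.]
Figure 19.2.5 Transformation of differential of a function under a diffeomorphism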
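The transformation rule for differentials and the invariance of the contraction may be checked numerically. The following Python sketch rests on assumptions chosen only for illustration: the diffeomorphism φ(x, y) = (exp(x), y + x^2) from IR2 onto an open half-plane, the function f(x, y) = xy + sin(x), the sample point and vector, and all function names. The transformed differential is obtained by differentiating f ◦ φ⁻¹ numerically, and is then tested against the rule relating w to the transformed n-tuple through the Jacobian of φ.

```python
import math


def phi(p):
    # Sample diffeomorphism (an assumption for this sketch).
    x, y = p
    return (math.exp(x), y + x * x)


def phi_inv(q):
    a, b = q
    x = math.log(a)
    return (x, b - x * x)


def jacobian(p):
    # jacobian(p)[j][i] is the derivative of component j of phi with respect to x^i.
    x, y = p
    return ((math.exp(x), 0.0), (2.0 * x, 1.0))


def f(p):
    x, y = p
    return x * y + math.sin(x)


def df(p):
    x, y = p
    return (y + math.cos(x), x)


def f_tilde(q):
    # The function f transported point for point by phi.
    return f(phi_inv(q))


def numerical_gradient(func, q, eps=1e-6):
    g = []
    for i in range(len(q)):
        plus, minus = list(q), list(q)
        plus[i] += eps
        minus[i] -= eps
        g.append((func(plus) - func(minus)) / (2.0 * eps))
    return tuple(g)


p = (0.3, -1.2)
v = (0.5, 2.0)
J = jacobian(p)

w = df(p)                                        # differential of f at p
w_tilde = numerical_gradient(f_tilde, phi(p))    # differential of f_tilde at phi(p)
v_tilde = tuple(sum(J[i][k] * v[k] for k in range(2)) for i in range(2))

# Rule from Figure 19.2.5: w_i equals the sum over j of (d phi^j / d x^i) w_tilde_j.
w_from_rule = tuple(sum(J[j][i] * w_tilde[j] for j in range(2)) for i in range(2))

pairing = sum(w[i] * v[i] for i in range(2))
pairing_tilde = sum(w_tilde[i] * v_tilde[i] for i in range(2))
print(w, w_from_rule)
print(pairing, pairing_tilde)
```

Both comparisons should agree up to the error of the difference quotient. The tangent vector is multiplied by the Jacobian while the differential is multiplied by its inverse, so the contraction is unchanged.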
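19.2.8 Definition: The cotangent bundle on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is the set $T^*(\mathbb{R}^n) = \mathbb{R}^n \times \mathbb{R}^n$ together
with the transformation rule $\hat\psi : T^*(\mathbb{R}^n) \mathrel{\mathring{\to}} T^*(\mathbb{R}^n)$ defined by $(p, w) \mapsto (\phi(p), (w_j \bar\phi^j{}_{,i}(p))_{i=1}^n)$ for $C^1$
diffeomorphisms $\phi : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^n$, where $[\bar\phi^j{}_{,i}]_{i,j=1}^n$ denotes the inverse of the Jacobian of $\phi$.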
[ The φ̄ pseudo-notation in Definition 19.2.8 for the inverse Jacobian is very dubious. It should be replaced
with a correct notation. ]
19.2.9 Notation: $T_x^*(\mathbb{R}^n)$ for $n \in \mathbb{Z}_0^+$ and $x \in \mathbb{R}^n$ denotes the subset $\{x\} \times \mathbb{R}^n$ of the cotangent bundle
$T^*(\mathbb{R}^n)$ on $\mathbb{R}^n$.
19.2.10 Definition: A cross-section of the cotangent bundle on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ is a function $f : \mathbb{R}^n \to
T^*(\mathbb{R}^n)$ such that $f(x) \in T_x^*(\mathbb{R}^n)$ for all $x \in \mathrm{Dom}(f)$.
19.2.11 Definition: A $C^r$ cross-section of the cotangent bundle on $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$ and $r \in \bar{\mathbb{Z}}_0^+$ is a
cross-section $f : \mathbb{R}^n \to T^*(\mathbb{R}^n)$ such that $f$ is of class $C^r$.
19.2.12 Notation: $X^r(T^*(\mathbb{R}^n))$ for $n \in \mathbb{Z}_0^+$ and $r \in \bar{\mathbb{Z}}_0^+$ denotes the set of $C^r$ cross-sections of $T^*(\mathbb{R}^n)$.
19.2.13 Definition: The differential of a function $f \in C^1(\mathbb{R}^n)$ at $x \in \mathbb{R}^n$ is the element $(x, (\partial_k f(x))_{k=1}^n)$
of the cotangent bundle $T^*(\mathbb{R}^n)$.
19.2.14 Definition: The differential of a function $f \in C^1(\mathbb{R}^n)$ is the cross-section $\{(x, (\partial_k f(x))_{k=1}^n);\ x \in
\mathbb{R}^n\} \in X^1(T^*(\mathbb{R}^n))$ of the cotangent bundle $T^*(\mathbb{R}^n)$.
19.2.15 Remark: It is convenient to define a basis of unit vectors for both the tangent bundle and cotangent
bundle in flat space. The unit basis for the linear space of real-valued n-tuples is given by Definition 10.2.21.
In the context of diffeomorphisms, it is usual to define the unit tangent vectors $\partial/\partial x^i$ and
cotangent vectors $dx^i$ for $i \in \mathbb{N}_n$.
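The tangent vector $\partial/\partial x^i$ is really just a cute way of referring to the tangent vector of the curve $\gamma : \mathbb{R} \to \mathbb{R}^n$
defined by $\gamma : t \mapsto t e_i$, where $e_i = (\delta_{ij})_{j=1}^n$ is the usual unit n-tuple of real numbers. Thus $\partial/\partial x^i$ means the
tangent vector $v \in \mathbb{R}^n$ given by
$$v^j = \frac{\partial \gamma^j(t)}{\partial t} = \frac{\partial (t \delta_{ij})}{\partial t} = \delta_{ij} = e_i^j.$$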
In other words, v = ei , which is not very useful. The real value of the pseudo-notation ∂/∂xi for ei is that it
facilitates the calculation of transformation matrices under diffeomorphisms and it makes a clear distinction
between tangent and cotangent vectors.
The unit cotangent vectors $dx^i$ are supposed to represent the differentials of the functions $f : \mathbb{R}^n \to \mathbb{R}$
defined by $f : x \mapsto x^i$. The differential $df$ is clearly equal to the unit n-tuple $e_i$ again, which is apparently
not very useful. The dxi pseudo-notation is useful as a mnemonic for the cotangent vector transformation
rule under a diffeomorphism. The superscript i on dxi is a mnemonic for a covariant vector or cotangent
vector whereas the subscript i of ∂/∂xi indicates a contravariant vector or tangent vector. (The indices on
unit vectors have the opposite location to their corresponding coefficients.)
The ∂/∂xi and dxi mnemonic abbreviations may be even further abbreviated to ∂i and di respectively.
Although these pseudo-notations are attractive, simple, useful and popular, they are not always helpful. In
some situations, they can lead to incorrect calculations. They often suggest the right calculations, but it
is difficult to know when their mnemonic value is beneficial and when it is harmful. Therefore it is best to
use these abbreviations only for presentation purposes. For reliable calculations, it is best to use explicit
notations.
19.2.16 Remark: Figure 19.2.6 illustrates how the vector/covector dual relationship is inherent in the
combination of a curve γ : IR → IRn and a real-valued function f : IRn → IR. Since the function loop starts
and ends in IR, it is inevitable that the derivative of f will be a dual of the derivative of γ. Even if the space
“in the middle” has no differentiable structure, there will be some sort of dual relationship in the differential
behaviour of such maps. In this case, we have $\partial_t (f \circ \gamma) = \gamma'(t) \nabla f(p)$ with $p = \gamma(t)$.
[Figure: the maps γ : IR → IRn and f : IRn → IR, composed through the point p ∈ IRn.]
Figure 19.2.6 The vector/covector loop from IR to IR via IRn
19.2.17 Remark: In conservative force fields, the potential energy of a particle is the integral of the product
of the force vector with the space displacement vector. Since the energy at each point is transformed under
diffeomorphisms, it follows that the force field vectors must be cotangent vectors. To put it another way,
a force vector may be calculated as the gradient of a scalar potential energy field. In very simple terms,
force × displacement = energy. Therefore force is a covariant vector if displacement is a contravariant vector.
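A small numerical sketch of this covariance argument follows. It assumes a linear change of coordinates with a hypothetical invertible matrix `A`. Displacements are transformed by `A`, forces by the transposed inverse of `A`, and the energy pairing is then unchanged. Transforming the force like a displacement changes the energy, which is why that rule must be rejected.

```python
def mat_vec(a, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [sum(a[i][j] * v[j] for j in range(2)) for i in range(2)]

def inverse_2x2(a):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[2.0, 1.0], [0.5, 3.0]]   # hypothetical Jacobian of a linear coordinate change
force = [1.5, -2.0]            # covariant components
displacement = [0.3, 0.4]      # contravariant components

new_displacement = mat_vec(A, displacement)
new_force = mat_vec(transpose(inverse_2x2(A)), force)

energy_old = dot(force, displacement)
energy_new = dot(new_force, new_displacement)

# Transforming the force like a displacement does not preserve the energy.
energy_wrong = dot(mat_vec(A, force), new_displacement)
```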

19.3. Second-level tangent vectors and diffeomorphisms

[ Test the concept of “second-level” tangent vectors by applying it to test functions in C 2 (IR2 ). The test for
“reality” of the object is that it gives the same effect on all f ∈ C r (IRn ). Does this make sense? ]
A first-level tangent vector may be thought of as an infinitesimal variation of a point. Similarly, a second-
level tangent vector may be thought of as an infinitesimal variation of a tangent vector. Since a tangent
vector is always attached to a unique point, it is the point/vector pair which is being varied. Whereas a
first-level tangent vector is a variation in the point space, a second-level tangent vector is a variation in a
point/vector space. In the language of manifolds, this combined point/vector space will be called a “tangent
bundle”.
A useful way to think about infinitesimal variations of points and vectors is to vary the points and vectors
with respect to a time parameter t. This is illustrated in Figure 19.3.1, where the point p(t) and vector v(t)
are varied with respect to t.
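Figure 19.3.1 Second-level tangent vector in flat space (diagram: the vector $v(t_1)$ at base point $p(t_1)$ moves to $v(t_2)$ at base point $p(t_2)$; $w_H$ is the rate of change of the base point and $w_V$ is the rate of change of the vector)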
The rate of change of the point p(t) with respect to t is a vector which is denoted here by wH . The
rate of change of the vector v(t) with respect to t is a vector which is denoted here by wV . These are
called respectively the “horizontal” and “vertical” components of the rate of change of the point/vector
pair (p(t), v(t)). Thus the picture here is of a vector moving with variable direction and variable base point.
The first task here is to provide a suitable space of objects which describe the velocity of motion of such a
vector, analogous to the velocity of the base point on its own. The second task is to determine how these
second-level velocity objects transform with respect to C 2 diffeomorphisms from IRn to IRn .
19.3.1 Remark: Second-level velocity vectors in $\mathbb{R}^n$ can be described by tuples of the form
$(x, v, w) \in \mathbb{R}^n \times \mathbb{R}^n \times (\mathbb{R}^n \times \mathbb{R}^n)$, where $x$ is the location of the base point, $v$ is the initial value of a vector with
base point $x$, and $w = (w_H, w_V) \in \mathbb{R}^n \times \mathbb{R}^n$ is the pair of horizontal and vertical components
$w_H \in \mathbb{R}^n$ and $w_V \in \mathbb{R}^n$. It would seem reasonable, then, to identify the space of second-level tangent vectors
$T(T(\mathbb{R}^n)) = T^{(2)}(\mathbb{R}^n)$ with a set such as $\mathbb{R}^n \times \mathbb{R}^n \times (\mathbb{R}^n \times \mathbb{R}^n)$.
19.3.2 Remark: The transformation rule for second-level tangent vectors under a $C^2$ diffeomorphism $\phi$
may be determined by differentiating the first-level transformation rule. The point/vector pair $(p(t), v(t))$ is
transformed to $(\tilde p(t), \tilde v(t))$ where $\tilde p(t) = \phi(p(t))$ and $\tilde v^i(t) = \phi^i{}_{,j}(p(t)) v^j(t)$. (See Figure 19.3.2.)
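Figure 19.3.2 Transformation of second-level tangent vector in flat space (diagram: $\phi$ maps the pairs $(p(t_1), v(t_1))$ and $(p(t_2), v(t_2))$ with rates of change $w_H$, $w_V$ to the pairs $(\tilde p(t_1), \tilde v(t_1))$ and $(\tilde p(t_2), \tilde v(t_2))$ with rates of change $\tilde w_H$, $\tilde w_V$)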

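Therefore the rate of change $(w_H, w_V) = \partial_t (p(t), v(t))$ is transformed to
$$\begin{aligned}
(\tilde w_H, \tilde w_V) &= \partial_t (\tilde p(t), \tilde v(t)) \\
&= \bigl( \phi^i{}_{,j}(p(t)) \, \partial_t p^j(t),\; \phi^i{}_{,jk}(p(t)) \, \partial_t p^k(t) \, v^j(t) + \phi^i{}_{,j}(p(t)) \, \partial_t v^j(t) \bigr) \\
&= \bigl( \phi^i{}_{,j}(p(t)) \, w_H^j,\; \phi^i{}_{,jk}(p(t)) \, w_H^k \, v^j(t) + \phi^i{}_{,j}(p(t)) \, w_V^j \bigr).
\end{aligned}$$
This is summarized by the following matrix equation.
$$\begin{pmatrix} \tilde w_H \\ \tilde w_V \end{pmatrix}
= \begin{pmatrix} A & 0 \\ B & A \end{pmatrix}
\begin{pmatrix} w_H \\ w_V \end{pmatrix},$$
where the $n \times n$ matrices $A$ and $B$ are defined by
$$A = \left[ \frac{\partial \phi^i(x)}{\partial x^j} \bigg|_{x=p(t)} \right]_{i,j=1}^n,$$
and
$$B = \left[ \frac{\partial^2 \phi^i(x)}{\partial x^j \partial x^k} \bigg|_{x=p(t)} v^k(t) \right]_{i,j=1}^n.$$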
The horizontal component of a second-level tangent vector transforms like a first-level tangent vector, but
the vertical component has an extra term which depends on the matrix of second derivatives of the diffeomor-
phism. It turns out that this term is where differential geometry becomes “interesting”. Simple second-level
tangent vectors do not have the kinds of invariance properties which are required for physics. To construct
well-defined tensors from second-level tangent vectors, it is necessary to add an additional term to represent
“parallel transport”. This additional term compensates for the second-order derivatives of the diffeomor-
phism so that second-level tangent objects can be made to transform purely according to the first-order
derivatives of diffeomorphisms.
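The block-matrix rule of Remark 19.3.2 can be tested against a direct finite-difference calculation. The sketch below is illustrative only. The diffeomorphism `phi`, the path `p` and the moving vector `v` are hypothetical choices with n = 2, and the Jacobian and second derivatives of `phi` are written out by hand for that particular choice.

```python
import math

def phi(x):
    """Hypothetical C^2 diffeomorphism of IR^2 (an assumption for illustration)."""
    return (x[0], x[1] + x[0] ** 2)

def jacobian(x):
    """Matrix A with entries d phi^i / d x^j at x."""
    return [[1.0, 0.0], [2.0 * x[0], 1.0]]

def second_derivatives(x):
    """Array with entries [i][j][k] = d^2 phi^i / (d x^j d x^k) at x."""
    return [[[0.0, 0.0], [0.0, 0.0]],
            [[2.0, 0.0], [0.0, 0.0]]]

def p(t):
    """Hypothetical moving base point."""
    return (math.cos(t), math.sin(t))

def v(t):
    """Hypothetical moving vector attached to p(t)."""
    return (1.0 + t, t * t)

def rate(func, t, h=1e-6):
    """Central-difference rate of change of a tuple-valued function of t."""
    lo, hi = func(t - h), func(t + h)
    return tuple((b - a) / (2 * h) for a, b in zip(lo, hi))

def transformed_pair(t):
    """The pair (phi(p(t)), A v(t)) from the first-level transformation rule."""
    a = jacobian(p(t))
    vec = v(t)
    new_v = tuple(sum(a[i][j] * vec[j] for j in range(2)) for i in range(2))
    return phi(p(t)), new_v

def block_matrix_rule(t):
    """Transformed components computed from the matrices A and B."""
    a = jacobian(p(t))
    hess = second_derivatives(p(t))
    vec, w_h, w_v = v(t), rate(p, t), rate(v, t)
    b = [[sum(hess[i][j][k] * vec[k] for k in range(2)) for j in range(2)]
         for i in range(2)]
    new_w_h = tuple(sum(a[i][j] * w_h[j] for j in range(2)) for i in range(2))
    new_w_v = tuple(sum(b[i][j] * w_h[j] + a[i][j] * w_v[j] for j in range(2))
                    for i in range(2))
    return new_w_h, new_w_v

def direct_rate(t):
    """Transformed components computed by differentiating the transformed pair."""
    new_w_h = rate(lambda s: transformed_pair(s)[0], t)
    new_w_v = rate(lambda s: transformed_pair(s)[1], t)
    return new_w_h, new_w_v
```

For this choice of `phi`, the matrix B is nonzero whenever the first component of `v(t)` is nonzero, so the vertical component picks up the extra second-derivative term.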
It is very inconvenient that second-level tangent vectors are in a different space to first-level tangent vectors.
However, second-level tangent vectors can be “dropped” into the first-level tangent vector space by defining
an affine connection on the tangent bundle. An affine connection specifies a parallelism relation in each
direction of the horizontal component of a second-level tangent vector. This enables the vertical component
to be adjusted for parallel transport in the horizontal direction, which converts the vertical component to
a net rate of change relative to parallel transport in the horizontal direction. This is, in fact, the purpose
of defining an affine connection on a differentiable manifold. If no affine connection is defined, there are
infinitely many equivalent linear maps to “drop” the vertical component of a second-level tangent vector.
An affine connection makes a particular choice of linear map to use as the drop function.
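19.3.3 Definition: The second-level tangent bundle on $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the set
$T^{(2)}(\mathbb{R}^n) = \mathbb{R}^n \times \mathbb{R}^n \times (\mathbb{R}^n \times \mathbb{R}^n)$ together with the transformation rule
$\hat\psi^{(2)} : T^{(2)}(\mathbb{R}^n) \mathrel{\mathring{\to}} T^{(2)}(\mathbb{R}^n)$ defined by $\hat\psi^{(2)} : (p, v, w) \mapsto (\phi(p), \tilde v, \tilde w)$
for $C^2$ diffeomorphisms $\phi : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^n$, where $w = (w_H, w_V)$, $\tilde w = (\tilde w_H, \tilde w_V)$ and
$$\begin{aligned}
\tilde v &= (\phi^i{}_{,j}(p) \, v^j)_{i=1}^n, \\
\tilde w_H &= (\phi^i{}_{,j}(p) \, w_H^j)_{i=1}^n, \\
\tilde w_V &= (\phi^i{}_{,jk}(p) \, w_H^k \, v^j + \phi^i{}_{,j}(p) \, w_V^j)_{i=1}^n.
\end{aligned}$$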
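As a minimal worked instance of Definition 19.3.3 (an illustration under assumed data, not part of the definition), take $n = 1$ and the hypothetical map $\phi(x) = e^x$, which is a $C^2$ diffeomorphism from $\mathbb{R}$ to the open interval $(0, \infty)$. Both the first and second derivatives of $\phi$ at $p$ equal $e^p$, so the rule reduces to the following.

```latex
\tilde v   = e^p \, v, \qquad
\tilde w_H = e^p \, w_H, \qquad
\tilde w_V = e^p \, w_H \, v + e^p \, w_V = e^p \, (w_H v + w_V).
```

In particular, a vector which is constant in the original coordinates ($w_V = 0$) has a nonzero transformed vertical component $e^p w_H v$ whenever the base point moves and $v \ne 0$.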
19.3.4 Remark: First-level tangent vectors may be applied to real-valued functions as in Section 19.2.
Similarly, second-level tangent vectors may be applied to tangent-vector-valued functions. Such functions
are called “vector fields”.
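Suppose $X$ and $Y$ are vector fields on $\mathbb{R}^n$. In other words, for all $p \in \mathbb{R}^n$, both $X(p)$ and $Y(p)$ are
elements of $T(\mathbb{R}^n) = \mathbb{R}^n \times \mathbb{R}^n$ such that $X(p) = (p, u(p))$ and $Y(p) = (p, v(p))$ for some $u(p), v(p) \in \mathbb{R}^n$.
(See Definition 19.1.8 for the tangent space $T(\mathbb{R}^n)$.) Define a function $Z$ on $\mathbb{R}^n$ by
$Z(p) = (p, w(p)) = (p, u^i(p) \partial_i v^j(p) - v^i(p) \partial_i u^j(p))$. The transformation rules of Definition 19.1.8 require
that $(p, u(p)) \mapsto (\phi(p), (\phi^i{}_{,j}(p) u^j(p))_{i=1}^n)$ and $(p, v(p)) \mapsto (\phi(p), (\phi^i{}_{,j}(p) v^j(p))_{i=1}^n)$ for $C^1$ diffeomorphisms $\phi$.
In the following calculation, $\phi^i{}_k$ abbreviates $\phi^i{}_{,k}$, and $\bar\phi^\ell{}_i$ denotes the components of the inverse of the
Jacobian matrix, so that $\phi^i{}_k \bar\phi^\ell{}_i = \delta^\ell_k$.
If $Z$ is calculated in transformed coordinates, the value becomes
$$\begin{aligned}
\bar u^i \bar\partial_i \bar v^j - \bar v^i \bar\partial_i \bar u^j
&= \phi^i{}_k u^k \bar\phi^\ell{}_i \partial_\ell (\phi^j{}_m v^m) - \phi^i{}_k v^k \bar\phi^\ell{}_i \partial_\ell (\phi^j{}_m u^m) \\
&= \phi^i{}_k u^k \bar\phi^\ell{}_i \phi^j{}_m \partial_\ell v^m - \phi^i{}_k v^k \bar\phi^\ell{}_i \phi^j{}_m \partial_\ell u^m
 + \phi^i{}_k u^k \bar\phi^\ell{}_i \phi^j{}_{m\ell} v^m - \phi^i{}_k v^k \bar\phi^\ell{}_i \phi^j{}_{m\ell} u^m \\
&= \phi^i{}_k \bar\phi^\ell{}_i \phi^j{}_m (u^k \partial_\ell v^m - v^k \partial_\ell u^m)
 + (\phi^i{}_k \bar\phi^\ell{}_i \phi^j{}_{m\ell} - \phi^i{}_m \bar\phi^\ell{}_i \phi^j{}_{k\ell}) u^k v^m \\
&= \phi^j{}_m (u^k \partial_k v^m - v^k \partial_k u^m) + (\phi^j{}_{mk} - \phi^j{}_{km}) u^k v^m \\
&= \phi^j{}_m (u^k \partial_k v^m - v^k \partial_k u^m).
\end{aligned}$$
This means that the n-tuple $(u^k \partial_k v^m - v^k \partial_k u^m)_{m=1}^n$ has the same transformation rules as a tangent vector
in $\mathbb{R}^n$. It follows that this is a chart-independent true vector. It is given the name “Poisson bracket”.
(For vector fields, it is also widely known as the “Lie bracket”.) The standard notation for this is $[X, Y]$.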
The motivation for introducing the Poisson bracket is to define a form of differentiation which is chart-
independent. In other words, the same vector is calculated regardless of diffeomorphisms of the underlying
space. The antisymmetric quantity [X, Y ] may be expressed as DX Y − DY X, where DX represents the
derivative operation ui ∂i applied to a vector field. Although DX Y is not a vector, the antisymmetrant
DX Y − DY X is a vector. If this kind of differentiation is generalized to general tensors, the result is the
“Lie derivative”, which achieves the same objective, namely to differentiate tensor fields in a manner which
commutes with diffeomorphisms.
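The claims of Remark 19.3.4 can be spot-checked numerically. In the following sketch, the diffeomorphism `phi` and the vector fields `u` and `v` are hypothetical choices on IR^2, and derivatives are approximated by central differences. It compares the bracket computed in transformed coordinates with the pushed-forward bracket, and does the same for the unsymmetrized derivative, for which the two values differ.

```python
def phi(x):
    """Hypothetical C^2 diffeomorphism of IR^2 with an explicit inverse."""
    return (x[0], x[1] + x[0] ** 2)

def phi_inverse(y):
    return (y[0], y[1] - y[0] ** 2)

def jacobian(x):
    """Matrix with entries d phi^i / d x^j at x."""
    return [[1.0, 0.0], [2.0 * x[0], 1.0]]

def u(x):
    """Hypothetical coefficients of the vector field X."""
    return (x[1], x[0])

def v(x):
    """Hypothetical coefficients of the vector field Y."""
    return (x[0] ** 2, 1.0)

def apply(matrix, vector):
    return tuple(sum(matrix[i][j] * vector[j] for j in range(2)) for i in range(2))

def push_forward(field):
    """Coefficients of a vector field in the transformed coordinates."""
    def new_field(y):
        x = phi_inverse(y)
        return apply(jacobian(x), field(x))
    return new_field

def derivative_along(f, g, x, h=1e-5):
    """Components f^i d_i g^j at x, using central differences."""
    fx = f(x)
    total = [0.0, 0.0]
    for i in range(2):
        hi = list(x)
        lo = list(x)
        hi[i] += h
        lo[i] -= h
        g_hi, g_lo = g(hi), g(lo)
        for j in range(2):
            total[j] += fx[i] * (g_hi[j] - g_lo[j]) / (2 * h)
    return tuple(total)

def bracket(f, g, x):
    """Antisymmetrized derivative f^i d_i g^j - g^i d_i f^j at x."""
    d_fg = derivative_along(f, g, x)
    d_gf = derivative_along(g, f, x)
    return tuple(a - b for a, b in zip(d_fg, d_gf))

x0 = (0.5, -0.3)
y0 = phi(x0)
u_bar, v_bar = push_forward(u), push_forward(v)

# The bracket transforms like a tangent vector.
bracket_new = bracket(u_bar, v_bar, y0)
bracket_pushed = apply(jacobian(x0), bracket(u, v, x0))

# The unsymmetrized derivative does not.
plain_new = derivative_along(u_bar, v_bar, y0)
plain_pushed = apply(jacobian(x0), derivative_along(u, v, x0))
```

With these particular choices, `bracket_new` and `bracket_pushed` agree up to finite-difference error, whereas `plain_new` and `plain_pushed` differ in their second component by the second-derivative term of `phi`.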
[ Remarks 19.3.4 and 19.3.5 require work to convert the sloppy tensor calculus to authentic mathematics. ]
19.3.5 Remark: Remark 19.3.4 shows that anti-symmetrization can make the partial derivative of a contravariant
vector (which is not a true vector) into a true contravariant vector. A similar kind of anti-symmetrization
can be applied to the partial derivative of a covariant vector. Let $(a_i)_{i=1}^n$ be the n-tuple of
coefficients of a covariant vector in $T(\mathbb{R}^n)$. Then
$$\begin{aligned}
\bar\partial_i \bar a_j - \bar\partial_j \bar a_i
&= \bar\phi^k{}_i \partial_k (\bar\phi^\ell{}_j a_\ell) - \bar\phi^k{}_j \partial_k (\bar\phi^\ell{}_i a_\ell) \\
&= \bar\phi^k{}_i \bar\phi^\ell{}_j \partial_k a_\ell - \bar\phi^k{}_j \bar\phi^\ell{}_i \partial_k a_\ell
 + \bar\phi^\ell{}_{ji} a_\ell - \bar\phi^\ell{}_{ij} a_\ell \\
&= \bar\phi^k{}_i \bar\phi^\ell{}_j (\partial_k a_\ell - \partial_\ell a_k).
\end{aligned}$$
It follows that the construction $\partial_i a_j - \partial_j a_i$ is a true covariant tensor of degree 2. This is, in fact, the “exterior
derivative” of the covariant vector field $a = (a_i)_{i=1}^n$.
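As a small worked instance of this transformation rule (an illustration, not part of the original remark), take $n = 2$ and write $\omega_{k\ell} = \partial_k a_\ell - \partial_\ell a_k$. Antisymmetry leaves only one independent component $\omega_{12}$, and the rule reduces to multiplication by the determinant of the matrix $\bar\phi$.

```latex
\bar\omega_{12}
  = \bar\phi^k{}_1 \, \bar\phi^\ell{}_2 \, \omega_{k\ell}
  = (\bar\phi^1{}_1 \, \bar\phi^2{}_2 - \bar\phi^2{}_1 \, \bar\phi^1{}_2) \, \omega_{12}
  = \det(\bar\phi) \, \omega_{12}.
```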
19.3.6 Remark: While it is very valuable to construct true vectors and tensors using antisymmetrization
to cancel out the second-order derivatives of a diffeomorphism, which yields the Poisson bracket, Lie deriva-
tives and the exterior derivative, it is also highly desirable to construct straightforward derivatives without
antisymmetrization. To achieve chart-independent differentiation without antisymmetrization requires the
introduction of differentiable parallelism, which is also called an “affine connection”.
An affine connection is an extra structure on a point space which maps the vertical components of second-
level tangent vectors to horizontal components. This map must commute with diffeomorphisms in order
to give well-defined vector and tensor constructs. Chart-independence is then achieved by ensuring that
the definition of differential parallelism, if integrated along a path to yield a pathwise parallelism, yields
chart-independent parallelism between the tangent spaces at distant points.
The Lie derivative is defined in terms of the integrated flow of a first-level contravariant vector field. This
defines a limited form of parallelism along the integral curves of the vector field. This parallelism along inte-
gral curves is chart-independent, which is why the Lie derivative and Poisson bracket are chart-independent.
But an affine connection offers parallelism along all differentiable paths.
[ Show invariance of (1) full exterior derivative and (2) full Lie derivative? (This is a mysterious old marginal
comment.) ]
[ Conclude that Stokes Theorem is very general. (This mysterious old marginal note might mean that Stokes
Theorem is independent of connections and metrics, and is therefore applicable to the broadest range of DG
structures.) ]

19.3.7 Remark: Second-level tangent bundles (which have $n + n + 2n$ coefficients) should not be confused
with second-order differential operator bundles (which have $n + n + n^2$ coefficients). The latter are defined
in Section 19.5 in terms of second-order differentials of real-valued functions, whereas the former (in this
Section) represent first-order differentials of first-level vector fields.
Roughly speaking, a second-order differential operator looks like $f \mapsto b^i \partial_i f(x) + a^{ij} \partial_{ij} f(x)$. (See also
Remark 27.10.5 for comments on the words “degree”, “order” and “level” for tangent objects.)
[ Should give some examples and an illustration to clarify the differences described in Remark 19.3.7. Also need
to clarify the terminology and make the names of the different kinds of tangent objects easier to interpret. ]
[ Define general tensor bundles near here? ]
19.4. Diffeomorphism pseudogroups

19.4.1 Remark: A small number of authors give the name “pseudogroup” to sets of diffeomorphisms
which are closed under function composition. The name seems to have originated with Marius Sophus Lie
(1842–1899). A pseudogroup of diffeomorphisms on a set X does not generally have a single identity element.
There is, instead, an identity element for each open subset of the set X. In other words, there are only local
identity elements. The diffeomorphisms in a pseudogroup have left and right inverses relative to local identity
maps. In addition to closure of a pseudogroup under composition, one may also require closure under such
operations as function domain restriction.
Pseudogroups are defined by Kobayashi/Nomizu [27], page 1 and EDM2 [35], 90.D, page 337. (This definition
should not be confused with the pseudogroup definition by Malliavin [36], Section 2.4, pages 99–102, which
is actually a semigroup.)
A pseudogroup of transformations seems very similar to a group at first. The composition of elements
is closed and associative, and all elements have an inverse. However, the inverse does not use a single
global identity. There is a separate identity operation for each of the domains of the transformations in the
pseudogroup.
19.4.2 Remark: As mentioned in Remark 14.2.3, the set of homeomorphisms between subsets of a Euclidean
space has some similarity to the concept of Felix Klein’s Erlanger Programm. (See EDM2 [35],
article 137, page 546.) But the homeomorphisms constitute a pseudogroup rather than a group because of
the lack of a global identity function.
Klein claimed that geometry consists principally of the study of congruence and invariance under groups
of transformations. Thus topology can be thought of as the study of the invariants of homeomorphisms.
He also stated that further geometries can be generated from subgroups of a transformation group. Since
C r diffeomorphisms constitute a sub-pseudogroup of homeomorphisms, one may consider the “programme”
of differential geometry to be the study of the invariant properties of $C^r$ diffeomorphisms. To some extent,
this is true. Tangent vectors, for example, may be thought of as invariant properties of curves under
diffeomorphisms. (See also Remark 36.1.5 regarding invariants of manifolds with an affine connection.)
The invariance of tangent vectors is sketched in Figure 19.4.1. The set of $C^1$ curves in $\mathbb{R}^n$ is denoted
by $C_1(\mathbb{R}^n)$. The diagram is commutative because the same tangent vector $\tilde\gamma'(0)$ is arrived at by either
transforming the curve with $\phi$ first and then differentiating or by differentiating first and then transforming
with the push-forth linear map $\phi_*$. (See also the similar diagram in Figure 19.1.2.)
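Figure 19.4.1 Invariance of tangent vector to curves under $C^1$ diffeomorphisms (commutative diagram: $\phi$ maps $\gamma$ to $\tilde\gamma$ in $C_1(\mathbb{R}^n)$; $\partial/\partial t$ at $t = 0$ maps $\gamma$ to $\gamma'(0)$ and $\tilde\gamma$ to $\tilde\gamma'(0)$ in $T(\mathbb{R}^n)$; $\phi_*$ maps $\gamma'(0)$ to $\tilde\gamma'(0)$)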

19.4.3 Remark: One may ask what properties of figures are preserved by the pseudogroup of $C^{0,1}$
diffeomorphisms. In the case of C 1 diffeomorphisms, tangent vectors are preserved, and the largest pseu-
dogroup which preserves tangent vectors is (more or less) the pseudogroup of C 1 diffeomorphisms. The
C 0,1 diffeomorphisms (probably) preserve tangent cones rather than tangent vectors. Then one might ask
whether the largest pseudogroup of homeomorphisms which preserve tangent cones is essentially the C 0,1
class. [ This requires further study. ]
19.4.4 Remark: There does not seem to be a rich literature on pseudogroups. A separate definition seems
to be quite unnecessary. (Not all clusters of attributes define an interesting class of objects.) It happens that
the transition maps of a manifold constitute a pseudogroup of homeomorphisms. This pseudogroup is not
necessarily closed under restriction and extension of maps. (Closure under restriction/extension operations
is easily obtained by completion of the manifold atlas with respect to restriction and extension.)
Classes of differentiable manifolds are specified in terms of a regularity requirement for the transition maps
between manifold charts. (For example, see Definition 26.3.6.) There is a one-to-one correspondence be-
tween classes of diffeomorphism pseudogroups and classes of differentiable manifolds. One may think of the
pseudogroup concept as derived from the manifold atlas concept or vice versa. This is related to the general
philosophical question of whether real objects lie “behind” perceptions. A manifold may be thought of as
the “real object” which lies behind the pseudogroup of diffeomorphisms in the flat coordinate spaces where
we do concrete calculations. Or we can regard the flat coordinate spaces as the “real object” and the points
of the manifold as an abstraction from the transformation pseudogroup. In this book, the author is thinking
of manifolds as being the real objects “out there” in the “real world”, while coordinates for manifolds are
part of the mental models “in here” inside the brain. But the distinction is quite arbitrary.
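19.4.5 Definition: A pseudogroup of homeomorphisms $\Gamma$ on a topological space $X$ is a set $\Gamma$ such that:

(i) $\forall \phi \in \Gamma$, ($\mathrm{Dom}(\phi) \in \mathrm{Top}(X)$ and $\mathrm{Range}(\phi) \in \mathrm{Top}(X)$).
(ii) $\forall \phi \in \Gamma$, $\phi$ is a homeomorphism.
(iii) $\forall \phi \in \Gamma$, $\phi^{-1} \in \Gamma$.
(iv) $\forall \phi_1, \phi_2 \in \Gamma$, $\phi_2 \circ \phi_1 \in \Gamma$. (Note that $\phi_2 \circ \phi_1 : \phi_1^{-1}(\mathrm{Dom}(\phi_2)) \approx \phi_2(\mathrm{Range}(\phi_1))$.)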
19.4.6 Remark: The conditions of Definition 19.4.5 are a mixture of algebraic and topological require-
ments. Condition (i) is topological and apparently unnecessary. The reason for restricting consideration
to open domains and ranges is to avoid all of the hard work that arises from questions about boundary
behaviour.
Conditions (iii) and (iv) are purely algebraic. The existence of inverses in condition (iii) is purely local. The
composition of a local homeomorphism with its inverse is a local identity map on the domain or range of the
function. In fact, $\phi^{-1} \circ \phi = \mathrm{id}_{\mathrm{Dom}(\phi)}$ and $\phi \circ \phi^{-1} = \mathrm{id}_{\mathrm{Range}(\phi)}$ for any bijection $\phi$.
The combination of conditions (iii) and (iv) implies that the pseudogroup Γ contains the identity map on
all domains and ranges of homeomorphisms in Γ. A pseudogroup is not generally a group because the local
identity maps do not meet the requirement of a single global identity element. It is true that the local
identity maps could be extended to the global identity map and identified as a single identity map. However,
the homeomorphisms of the pseudogroup would then need to be extended likewise to the whole space X,
which would usually contradict condition (ii). In order to even maintain the bijection property for all maps
in Γ, it would be necessary to transform the points of the function range to other points. So it is not easy
to automatically convert a pseudogroup into a group. But a pseudogroup is automatically a group if the
homeomorphisms are required to be global, i.e. with both domain and range equal to X.
The function composition in condition (iv) uses the general Definition 6.11.7 for partially defined functions.
Consequently, Dom(φ2 ◦ φ1 ) is a subset of Dom(φ1 ).
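The algebraic content of conditions (iii) and (iv) can be sketched with partial bijections on a finite set. This is a toy model only: it assumes a finite point set with the discrete topology, so that every subset is open and every bijection between subsets is a homeomorphism. The names `compose`, `inverse` and `identity_on`, and the maps `phi1` and `phi2`, are hypothetical.

```python
def compose(phi2, phi1):
    """Composite phi2 o phi1 of partial maps, defined only where phi1(x) lies in Dom(phi2)."""
    return {x: phi2[y] for x, y in phi1.items() if y in phi2}

def inverse(phi):
    """Inverse of an injective partial map."""
    return {y: x for x, y in phi.items()}

def identity_on(points):
    """Local identity map on a given set of points."""
    return {x: x for x in points}

phi1 = {1: 2, 2: 3, 3: 4}   # Dom = {1, 2, 3}, Range = {2, 3, 4}
phi2 = {3: 1, 4: 5, 6: 7}   # Dom = {3, 4, 6}, Range = {1, 5, 7}

composite = compose(phi2, phi1)       # {2: 1, 3: 5}
left_identity = compose(inverse(phi1), phi1)
right_identity = compose(phi1, inverse(phi1))
```

Here `left_identity` is the identity on Dom(phi1) and `right_identity` is the identity on Range(phi1). These are two different local identity maps, which illustrates why there is no single global identity element.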
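19.4.7 Definition: A complete pseudogroup of homeomorphisms $\Gamma$ on a topological space $X$ is a pseudogroup
of homeomorphisms $\Gamma$ on $X$ such that:
(i) $\forall \phi \in \Gamma$, $\forall \Omega \in \mathrm{Top}(X)$, $\phi|_\Omega \in \Gamma$.
(ii) For any family $(\Omega_\alpha)_{\alpha \in A}$ of open subsets of $X$, if $\phi : \Omega \approx G$ is a homeomorphism from $\Omega = \bigcup_{\alpha \in A} \Omega_\alpha$ to
an open subset $G \in \mathrm{Top}(X)$ and $\phi|_{\Omega_\alpha} \in \Gamma$ for all $\alpha \in A$, then $\phi \in \Gamma$.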

[ Is condition (i) a special case of condition (ii) in Definition 19.4.7? ]
19.4.8 Remark: In Definition 19.4.7, both conditions (i) and (ii) imply a kind of locality of the pseu-
dogroup membership property. One may restrict the function’s domain or extend it (by merging any family
of homeomorphisms) without losing pseudogroup membership. Conditions (i) and (ii) seem to be quite
useless in practice. The closure under restriction and extension can easily be generated from any set of
homeomorphisms satisfying the conditions of Definition 19.4.5.
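The restriction and merging operations of Definition 19.4.7 can be sketched in the same finite toy model, where partial bijections are represented as dictionaries and every subset counts as open. This is an assumption for illustration only, and the helper names `restrict` and `glue` are hypothetical.

```python
def restrict(phi, omega):
    """Restriction of the partial map phi to the set omega."""
    return {x: y for x, y in phi.items() if x in omega}

def glue(pieces):
    """Merge partial maps into one map on the union of their domains.

    Returns None if the pieces disagree on an overlap or if the merged
    map fails to be injective (so that it is not a bijection onto its range).
    """
    merged = {}
    for piece in pieces:
        for x, y in piece.items():
            if merged.setdefault(x, y) != y:
                return None
    if len(set(merged.values())) != len(merged):
        return None
    return merged

phi = {1: 2, 2: 3, 3: 4}
piece_a = restrict(phi, {1, 2})
piece_b = restrict(phi, {2, 3})
reassembled = glue([piece_a, piece_b])
```

Restricting `phi` to two overlapping sets and merging the pieces recovers `phi`, which is the finite analogue of condition (ii).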
19.4.9 Remark: A pseudogroup of homeomorphisms would usually be constructed by sculpting the pseudogroup
of all homeomorphisms on a topological space. This is done by restricting the full pseudogroup
to those homeomorphisms which satisfy a set of conditions such that the restricted set still satisfies the
closure conditions of a pseudogroup. For example, requiring the homeomorphisms to be $C^k$ diffeomorphisms
for some $k \in \bar{\mathbb{Z}}^+_0$ yields a closed subset of the pseudogroup of all homeomorphisms of $\mathbb{R}^n$. This is presented
as Definition 19.4.10. Some useful properties of this pseudogroup are presented in Section 19.1. This
pseudogroup is implicit in Definitions 26.3.4 and 26.3.6 for the differentiable structure of a differentiable
manifold.
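19.4.10 Definition: The (complete) pseudogroup of $C^k$ diffeomorphisms on $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ and $k \in \bar{\mathbb{Z}}^+_0$
is the set $\{\phi : \Omega_1 \approx \Omega_2;\ \Omega_1, \Omega_2 \in \mathrm{Top}(\mathbb{R}^n)$ and $\phi, \phi^{-1}$ are of class $C^k\}$.
19.4.11 Notation: Γk($\mathbb{R}^n$) for $n \in \mathbb{Z}^+_0$ and $k \in \bar{\mathbb{Z}}^+_0$ denotes the complete pseudogroup of $C^k$ diffeomorphisms
on $\mathbb{R}^n$.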
19.4.12 Remark: It is not necessary to specify the pseudogroup action in Definition 19.4.10 because the
action of a pseudogroup of homeomorphisms is always simply the composition operation for functions in the
set Γk (IRn ). However, it is necessary to verify that the set satisfies all of the conditions of Definition 19.4.7
for a complete pseudogroup of homeomorphisms. This follows readily from Remark 19.1.3.
19.4.13 Remark: One may also define the complete pseudogroup Γk,α($\mathbb{R}^n$) of $C^{k,\alpha}$ diffeomorphisms whose
maps have kth-order derivatives which are α-Hölder continuous. (See Definition 18.9.1.)
19.5. Second-order differential operators and diffeomorphisms
19.5.1 Remark: In the same way that tangent vectors may be thought of as differential operator maps
$f \mapsto v^i \partial_i f(x)$ for $f \in \mathring{C}^1(\mathbb{R}^n)$, $x \in \mathrm{Dom}(f)$ and $v \in \mathbb{R}^n$, it is possible to define second-order maps like
$f \mapsto b^i \partial_i f(x) + a^{ij} \partial_{ij} f(x)$ for $b \in \mathbb{R}^n$ and matrices $a = (a^{ij})_{i,j=1}^n \in \mathbb{R}^{n \times n}$. Invariance rules for these tangent
objects can be defined analogously to first-order tangent vectors. Then the transformation rules can be used
in the definition of higher-order differential operator tangent bundles on $C^2$ manifolds.
It turns out that higher-order differential operators in a space with arbitrary diffeomorphisms but no affine
connection are not very useful. One may certainly define the operators and transform them under diffeomor-
phisms, but one must keep track of not only the first-order derivatives of the diffeomorphisms (the Jacobian
matrix), but also the higher-order derivatives up to the order of the operator. This is simply too inconvenient
for most people. In effect, one must specify the chart together with the operator. People prefer operators
which require only pointwise tensorial transformations for the coefficients, and which have the same apparent
form independent of the choice of chart. Thus the gradient $v^i \partial_i f$ of a real-valued function $f$ is okay, even
though the coefficients $v = (v^i)_{i=1}^n$ are chart-dependent. The dependency is simple, and in practice the
n-tuple $v \in \mathbb{R}^n$ has some kind of physical or other interpretation in the model which is under consideration.
In the case of second-order operators, a C 2 change of coordinates causes the first-order term to be contam-
inated by an extra term which depends on the curvature of the transformation. (This problem does not
occur with linear or affine transformations.) It is very inconvenient to have to explain such chart-curvature
terms. In practice, the charts in which a second-order operator seems simplest and easiest to explain are
aligned with some kind of parallelism or metric structure on the space. (In normal coordinates, for example,
the second-order operators in some models may have a simpler form.) Then when other charts are used,
correction terms may be applied to take into account curvature and distortion of the coordinates. Covariant
derivatives are designed to automatically correct curved charts for deviation from parallelism. The metric
tensor can be used in differential operators to correct the coordinate charts for non-orthogonality.

Therefore, although higher-order operators are well-defined in the absence of a connection or metric, in practice
it is highly preferable to incorporate the connection or metric structure into the operators so that they are
easier to work with in diverse coordinate charts. In this section, the transformation rules for second-order
operators are presented, but they are of limited usefulness in the differential layer of manifolds.
19.5.2 Remark: The observations in this section are useful for defining spaces of higher-order operators
on differentiable manifolds in Chapter 31. In that context, the chart transition maps $\psi_\alpha \circ \psi_\beta^{-1}$ satisfy the
requirements of the diffeomorphism $\phi$ in Remark 19.5.3 if the manifold is $C^2$.
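19.5.3 Remark: For any open set $\Omega \subseteq \mathbb{R}^n$, symmetric matrix functions $a \in C^0(\Omega, \mathrm{Sym}(n, \mathbb{R}))$ and vector
functions $b \in C^0(\Omega, \mathbb{R}^n)$, define $L_{a,b} : C^2(\Omega) \to C^0(\Omega)$ by
$$\forall f \in C^2(\Omega),\ \forall x \in \Omega, \quad L_{a,b}(f)(x) = a^{ij}(x) f_{ij}(x) + b^i(x) f_i(x).$$
Define $D(\Omega) = \{L_{a,b};\ a \in C^0(\Omega, \mathrm{Sym}(n, \mathbb{R})),\ b \in C^0(\Omega, \mathbb{R}^n)\}$.
For open sets $\Omega, \tilde\Omega \subseteq \mathbb{R}^n$ and homeomorphisms $\phi : \Omega \to \tilde\Omega$, define the pull-back $T_\phi : C^0(\tilde\Omega) \to C^0(\Omega)$ by
$$\forall f \in C^0(\tilde\Omega), \quad T_\phi(f) = f \circ \phi.$$
Define $\hat T_\phi : D(\Omega) \to D(\tilde\Omega)$ for $C^2$ diffeomorphisms $\phi : \Omega \to \tilde\Omega$ by
$$\forall L_{a,b} \in D(\Omega),\ \forall f \in C^2(\tilde\Omega),\ \forall x \in \tilde\Omega, \quad
\hat T_\phi(L_{a,b})(f)(x) = L_{a,b}(T_\phi(f))(\phi^{-1}(x)),$$
or more concisely,
$$\forall L_{a,b} \in D(\Omega), \quad \hat T_\phi(L_{a,b}) = T_{\phi^{-1}} \circ L_{a,b} \circ T_\phi.$$
It follows from Theorem 18.6.4 that $\hat T_\phi(L_{a,b}) \in D(\tilde\Omega)$. For $f \in C^2(\tilde\Omega)$, $x \in \Omega$ and $\tilde x = \phi(x)$,
$$\begin{aligned}
T_{\phi^{-1}}(L_{a,b}(T_\phi(f)))(\tilde x) &= L_{a,b}(f \circ \phi)(x) \\
&= a^{ij}(x) \bigl( f_{k\ell}(\tilde x) \phi^k{}_i(x) \phi^\ell{}_j(x) + f_k(\tilde x) \phi^k{}_{ij}(x) \bigr) + b^i(x) f_k(\tilde x) \phi^k{}_i(x) \\
&= \tilde a^{k\ell}(\tilde x) f_{k\ell}(\tilde x) + \tilde b^k(\tilde x) f_k(\tilde x) \\
&= L_{\tilde a,\tilde b}(f)(\tilde x),
\end{aligned}$$
where
$$\tilde a^{k\ell} = (\phi^k{}_i \phi^\ell{}_j a^{ij}) \circ \phi^{-1} \qquad (19.5.1)$$
and
$$\tilde b^k = (\phi^k{}_{ij} a^{ij} + \phi^k{}_i b^i) \circ \phi^{-1}.$$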
[ Old marginal note: Must establish that La,b and Lã,b̃ have the same effect on f ∈ C 2 (IRn ). Then can say
that they are the same “object”. ]
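Clearly $\tilde a \in C^0(\tilde\Omega, \mathrm{Sym}(n, \mathbb{R}))$ and $\tilde b \in C^0(\tilde\Omega, \mathbb{R}^n)$. Therefore $L_{\tilde a,\tilde b}(f) \in C^0(\tilde\Omega)$, from which it follows that
$\hat T_\phi(L_{a,b})(f) = T_{\phi^{-1}}(L_{a,b}(T_\phi(f))) \in C^0(\tilde\Omega)$. So $\hat T_\phi(L_{a,b}) = L_{\tilde a,\tilde b} \in D(\tilde\Omega)$. This is summarized in Figure 19.5.1.
Figure 19.5.1 Action of a diffeomorphism on a space of differential operators (diagram: $\phi$ maps $x \in \Omega$ to $\phi(x) \in \tilde\Omega$; $T_\phi$ maps $f \in C^2(\tilde\Omega)$ to $f \circ \phi = T_\phi(f) \in C^2(\Omega)$; $\hat T_\phi$ maps $L_{a,b} \in D(\Omega)$ to $\hat T_\phi(L_{a,b}) = L_{\tilde a,\tilde b} \in D(\tilde\Omega)$)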

The term $\phi^k{}_{ij} a^{ij}$ in $\tilde b^k$ is very inconvenient. It is important to note that the operator $L_{a,b}$ is perfectly
well-defined, but to calculate it in all coordinate charts, it is necessary to introduce the term $\phi^k{}_{ij} a^{ij}$ which
depends on the second-order derivatives of the transformation. This just does not fit in with the tensorial
way of thinking. The solution to this problem in practice is to introduce structures such as a connection into
the operator itself so that it has the illusion of being a simple tensorial object because the curvature of the
charts is incorporated into the operator. This is the motivation behind covariant differentiation.
It is clear from equation (19.5.1) that ellipticity is preserved by $C^2$ diffeomorphisms. Hence the class of
elliptic operators as a whole is preserved under $C^2$ diffeomorphisms. If the diffeomorphisms are restricted
to affine transformations, the troublesome term $\phi^k{}_{ij} a^{ij}$ disappears. But this restriction is incompatible with
the required generality of chart transition maps for differentiable manifolds.
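The transformation rule of Remark 19.5.3 can be checked numerically in one dimension. The sketch below is illustrative only. The diffeomorphism `phi`, the coefficients `a` and `b` and the test function `f` are hypothetical choices, and the derivatives of the pulled-back function are approximated by central differences.

```python
import math

def phi(x):
    """Hypothetical C^2 diffeomorphism of IR (strictly increasing)."""
    return x + x ** 3

def phi_1(x):
    """First derivative of phi."""
    return 1.0 + 3.0 * x ** 2

def phi_2(x):
    """Second derivative of phi."""
    return 6.0 * x

def a(x):
    """Hypothetical second-order coefficient."""
    return 1.0 + x * x

def b(x):
    """Hypothetical first-order coefficient."""
    return x

def f(y):
    return math.sin(y)

def f_1(y):
    return math.cos(y)

def f_2(y):
    return -math.sin(y)

def operator_on_pullback(x, h=1e-4):
    """L_{a,b}(f o phi)(x), with derivatives taken by central differences."""
    g = lambda s: f(phi(s))
    g_1 = (g(x + h) - g(x - h)) / (2 * h)
    g_2 = (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)
    return a(x) * g_2 + b(x) * g_1

def transformed_operator(x):
    """L with transformed coefficients, applied to f at the point phi(x)."""
    y = phi(x)
    a_new = phi_1(x) ** 2 * a(x)
    b_new = phi_2(x) * a(x) + phi_1(x) * b(x)
    return a_new * f_2(y) + b_new * f_1(y)

def tensorial_guess(x):
    """Same as transformed_operator, but without the second-derivative term."""
    y = phi(x)
    return phi_1(x) ** 2 * a(x) * f_2(y) + phi_1(x) * b(x) * f_1(y)
```

At a point where the second derivative of `phi` is nonzero, `transformed_operator` agrees with `operator_on_pullback` up to finite-difference error, whereas `tensorial_guess` does not. For an affine map, `phi_2` would vanish identically and the two rules would coincide.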
The Laplacian $\Delta$ is a well-defined elliptic operator, but it is not invariant under diffeomorphisms. The
Laplacian stays the Laplacian under orthogonal transformations in O(n) and translations. Similarly, the
Gaussian curvature operator is invariant under volume-preserving transformations in SL(n). [ Check this. ]
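The non-invariance of the Laplacian can be illustrated with polar coordinates. As an assumed example, let $n = 2$, $a^{ij} = \delta^{ij}$, $b^i = 0$, and let $\phi : (x^1, x^2) \mapsto (r, \theta)$ on an open set which excludes the origin and on which $\theta$ is smooth. The formulas for $\tilde a$ and $\tilde b$ in Remark 19.5.3 then give the following coefficients, expressed in terms of $r$.

```latex
\tilde a^{rr} = \sum_i \Bigl( \frac{\partial r}{\partial x^i} \Bigr)^2 = 1, \qquad
\tilde a^{r\theta} = \sum_i \frac{\partial r}{\partial x^i} \frac{\partial \theta}{\partial x^i} = 0, \qquad
\tilde a^{\theta\theta} = \sum_i \Bigl( \frac{\partial \theta}{\partial x^i} \Bigr)^2 = \frac{1}{r^2},
\\
\tilde b^{r} = \sum_i \frac{\partial^2 r}{\partial (x^i)^2} = \frac{1}{r}, \qquad
\tilde b^{\theta} = \sum_i \frac{\partial^2 \theta}{\partial (x^i)^2} = 0,
\\
\Delta f = f_{rr} + \frac{1}{r^2} f_{\theta\theta} + \frac{1}{r} f_r .
```

The first-order term $f_r / r$ arises entirely from the second derivatives of the coordinate transformation, although the original operator has no first-order term.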
In summary, second-order operators can be defined in a chart-independent manner only with the assistance
of a connection so that covariant derivatives are defined. The Laplacian operator can only be defined chart-
independently with the assistance of a metric tensor.
[ Here do the general-order case of Remark 19.5.3 as definitions and a theorem? ]
19.6. Directionally differentiable homeomorphisms

19.6.1 Remark: Even the most regular of the C 0,α classes, namely the C 0,1 (Lipschitz) class, does not
guarantee existence of even unidirectional derivatives. So tangent vectors are difficult to define. However, if
the existence of unidirectional derivatives everywhere is the regularity test for a class of homeomorphisms,
the resulting class can support meaningful tangent vectors. Unidirectional tangent bundles are defined in
Section 27.14.
19.6.2 Definition: The unidirectional derivative of a function $f : U \to \mathbb{R}^n$ at a point $x \in U$ in a direction
$v \in \mathbb{R}^m$ for an open set $U \subseteq \mathbb{R}^m$ with $m, n \in \mathbb{Z}^+_0$ is the limit $\lim_{a \to 0^+} (f(x + av) - f(x))/a$ if this limit is
well-defined.
The (bi)directional derivative of a function $f : U \to \mathbb{R}^n$ at a point $x \in U$ in a direction $v \in \mathbb{R}^m$ for an open
set $U \subseteq \mathbb{R}^m$ with $m, n \in \mathbb{Z}^+_0$ is the limit $\lim_{a \to 0} (f(x + av) - f(x))/a$ if this limit is well-defined.
19.6.3 Definition: A unidirectionally differentiable function is a function $f : U \to \mathbb{R}^n$ for an open set
$U \subseteq \mathbb{R}^m$ with $m, n \in \mathbb{Z}^+_0$ such that $\lim_{a \to 0^+} (f(x + av) - f(x))/a$ is well-defined for all $x \in U$ and $v \in \mathbb{R}^m$.
A (bi)directionally differentiable function is a function $f : U \to \mathbb{R}^n$ for an open set $U \subseteq \mathbb{R}^m$ with $m, n \in \mathbb{Z}^+_0$
such that $\lim_{a \to 0} (f(x + av) - f(x))/a$ is well-defined for all $x \in U$ and $v \in \mathbb{R}^m$.
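The difference between the two kinds of derivative can be seen with the absolute value function on IR, taking m = n = 1. The following sketch is a crude numerical illustration, not a proof. The helper names are hypothetical, and the limit is approximated by a single small step.

```python
def one_sided_quotient(f, x, v, a):
    """Difference quotient (f(x + a v) - f(x)) / a for a nonzero real step a."""
    return (f(x + a * v) - f(x)) / a

def unidirectional_derivative(f, x, v, a=1e-8):
    """Crude estimate of the limit as a -> 0+ using one small positive step."""
    return one_sided_quotient(f, x, v, a)

# Unidirectional derivatives of abs at 0 exist in both directions.
right = unidirectional_derivative(abs, 0.0, 1.0)
left = unidirectional_derivative(abs, 0.0, -1.0)

# With a negative step, the quotient in direction v = 1 has the opposite sign,
# so the two-sided limit does not exist.
from_below = one_sided_quotient(abs, 0.0, 1.0, -1e-8)

# Scaling the direction by a non-negative factor scales the derivative.
scaled = unidirectional_derivative(abs, 0.0, 3.0)
```

Here `right` and `left` are both positive, while `from_below` is negative, so the absolute value function is unidirectionally differentiable at 0 but not (bi)directionally differentiable there.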
19.6.4 Theorem: For all $n \in \mathbb{Z}^+_0$, the set of all homeomorphisms $f : U \approx V$ between open subsets
$U, V \subseteq \mathbb{R}^n$ for which both $f$ and $f^{-1}$ are unidirectionally differentiable is a complete pseudogroup of
homeomorphisms on $\mathbb{R}^n$.
Proof: Let $f : U \approx V$ and $g : V \approx W$ be unidirectionally differentiable homeomorphisms, where $U$, $V$
and $W$ are open sets in $\mathbb{R}^m$, $\mathbb{R}^n$ and $\mathbb{R}^p$ respectively. Let $h = g \circ f : U \approx W$. Let $x \in U$, $v \in \mathbb{R}^m$
and $a \in \mathbb{R}$. Let $f_*^+(x, v)$ denote the unidirectional derivative $\lim_{a \to 0^+} (f(x + av) - f(x))/a$ for $x \in U$ and
$v \in \mathbb{R}^m$. Similarly, denote $g_*^+(y, w) = \lim_{a \to 0^+} (g(y + aw) - g(y))/a$ for $y \in V$ and $w \in \mathbb{R}^n$. Then
$$\begin{aligned}
\lim_{a \to 0^+} \frac{h(x + av) - h(x)}{a}
&= \lim_{a \to 0^+} \frac{g(f(x + av)) - g(f(x))}{a} \\
&= \lim_{a \to 0^+} \frac{g(f(x + av)) - g(f(x) + a f_*^+(x, v))}{a}
 + \lim_{a \to 0^+} \frac{g(f(x) + a f_*^+(x, v)) - g(f(x))}{a} \\
&= \lim_{a \to 0^+} \frac{g(f(x) + a f_*^+(x, v)) - g(f(x))}{a} \\
&= g_*^+(f(x), f_*^+(x, v)).
\end{aligned}$$
This follows from the observation that $\lim_{a \to 0^+} \bigl( f(x + av) - f(x) - a f_*^+(x, v) \bigr)/a = 0$ by definition of $f_*^+(x, v)$.
Hence $\lim_{a \to 0^+} \bigl( g(f(x + av)) - g(f(x) + a f_*^+(x, v)) \bigr)/a = 0$ by the uniform continuity of $g$ on compact sets.
This establishes Definition 19.4.5 (iv) for a pseudogroup. The other conditions follow without difficulty.
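The composition formula in this proof can be illustrated for a simple case. In the sketch below, `f` and `g` are hypothetical piecewise-linear homeomorphisms of IR which have a kink at 0, and the one-sided limits are approximated by a single small positive step. This is a numerical illustration for one example, not a verification of the general argument.

```python
def f(x):
    """Hypothetical homeomorphism of IR with slopes 3/2 (x > 0) and 1/2 (x < 0)."""
    return x + abs(x) / 2

def g(y):
    """Hypothetical homeomorphism of IR with slopes 4/3 (y > 0) and 2/3 (y < 0)."""
    return y + abs(y) / 3

def h(x):
    """Composite h = g o f."""
    return g(f(x))

def one_sided(func, x, v, a=1e-6):
    """Estimate of the unidirectional derivative of func at x in direction v."""
    return (func(x + a * v) - func(x)) / a

def composite_rule(x, v):
    """Right-hand side g_*^+(f(x), f_*^+(x, v)) of the composition formula."""
    return one_sided(g, f(x), one_sided(f, x, v))
```

For each direction `v`, the value `one_sided(h, 0.0, v)` should agree with `composite_rule(0.0, v)` up to rounding error. For example, in direction 1.0 both are approximately 2, and in direction -1.0 both are approximately -1/3.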

19.6.5 Remark: Unidirectionally differentiable functions have the property that the directional derivatives
$f_*^+(x, v) = \lim_{a \to 0^+} (f(x + av) - f(x))/a$ satisfy $f_*^+(x, kv) = k f_*^+(x, v)$ for all $k \ge 0$. [ Investigate the continuity
properties which follow automatically from unidirectional differentiability. ]
19.6.6 Remark: A useful short notation for $\lim_{a \to 0^+} (f(x + av) - f(x))/a$ could be $\partial_a^+ f(x + av)$. [ This
might not make good sense at all. See Metadefinition 27.14.2 for an application of this (pseudo)notation. ]
19.6.7 Definition: The (complete) pseudogroup of unidirectionally differentiable homeomorphisms on $\mathbb{R}^n$
for $n \in \mathbb{Z}^+_0$ is the set $\{\phi : \Omega_1 \approx \Omega_2;\ \Omega_1, \Omega_2 \in \mathrm{Top}(\mathbb{R}^n)$ and $\phi, \phi^{-1}$ are unidirectionally differentiable$\}$.
[ Show that one-sided tangent vectors of unidirectionally differentiable curves are invariant under the pseu-
dogroup of unidirectionally differentiable homeomorphisms. Also show that unidirectional differentials of
unidirectionally differentiable real-valued functions are invariant. ]
19.6.8 Remark: The transition maps for the Lipschitz manifold in Example 42.4.1 are a pseudogroup of
unidirectionally differentiable homeomorphisms on IRn according to Definition 19.4.5, but do not meet the
requirements for closure under restriction and extension in Definition 19.4.7 for a complete pseudogroup.
[ Show that a rectifiable curve in IRn is differentiable almost everywhere. ]

Chapter 20
Measure and integration
20.1 Lebesgue measure
20.2 Lebesgue integration
20.3 Rectangular Stokes theorem in two dimensions
20.4 Rectangular Stokes theorem in three dimensions
20.5 Differential forms
20.6 The exterior derivative
20.7 Exterior differentiation using Lie derivatives
20.8 Geometric measure theory
20.9 Stokes theorem
20.10 Radon measures
20.11 Some integrability-based function spaces
20.12 Logarithmic and exponential functions
20.13 Trigonometric functions
Measure and integration are required for Section 20.5 on differential forms and for the definition of function
spaces such as Sobolev spaces.
20.1. Lebesgue measure
20.1.1 Remark: For real functions of real variables, the Lebesgue integral is essentially the most general
integral definition possible. The “exhaustion method” of integration, which is very similar to the informal
presentation of integration in modern applied disciplines, was used in a limited way by classical Greek math-
ematicians such as Eudoxus in the 4th century BC. In the 3rd century BC, Archimedes used the exhaustion
method in a much more general way. Newton’s integral was published in 1687 in his “Philosophiae naturalis
principia mathematica”, usually referred to as the Principia mathematica or the Principia. (Bell [190],
page 132, gives the dates 1666 and 1684 for Newton and the dates 1673 and 1675 for Leibniz.) Struik [194],
page 111, says that the integral symbol “∫” was introduced by Leibniz in 1686. Leibniz invented the term
“calculus integralis” (integral calculus). According to Bell [190], page 480, Cauchy’s integral definition dates
from 1823. The Riemann generalization of the Cauchy integral was introduced in about 1850. (Bell [190]
gives the date 1854.) Lebesgue introduced his integral in 1902.
20.1.2 Remark: Integral calculus is, in a sense, infinitely more difficult than differential calculus. It is
not generally possible to find closed-form integrals. In fact, the integration of many simple-looking functions
requires the invention of “special functions”. Bell [191], page 101, says the following about the difficulty of
closed-form integration.
[. . . ] the problem of evaluating ∫ f (x) dx for comparatively innocent-looking functions f (x) may be
beyond our powers. It does not follow that an “answer” exists at all in terms of known functions
when an f (x) is chosen at random—the odds against such a chance are an infinity of the worst
sort (“non-denumerable”) to one. When a physical problem leads to one of these nightmares
approximate methods are applied which give the result within the desired accuracy.

[ See Taylor [145] for Lebesgue measure theory. Should also deal with integrals for distributions. But distri-
butions require the definition of integrals. ]
20.1.3 Remark: It is not possible to prove the existence of a subset of IR which is not Lebesgue measurable
using only the ZF axioms. With the addition of the axiom of choice, the existence may be asserted but not
demonstrated by any examples. This implies that no one will ever know what non-measurable sets and
functions look like. Sets which are not Lebesgue measurable are “dark sets”. The believers in AC “know”
that they are there. But there will never be any pictures. Theorem 20.1.4 is perhaps one of the most useless
theorems in mathematics, especially considering all the heated controversy and hard work it generates.
The real value of Theorem 20.1.4 is metamathematical. It shows that the combination of the axioms,
definitions and theorems of mathematics yields troubling objects which are “unforeseen consequences”. When
this happens, it could be an indication that this kind of mathematics is not ideally suited to the modelling of
the physical world. People who are impressed by the amazing ability of mathematics to model the physical
world should think also about the quirky, embarrassing outcomes from mathematical models.
20.1.4 Theorem [zf+ac]: There exists a subset of IR which is not Lebesgue measurable.
20.1.5 Remark: The usual set “construction” which is used to prove Theorem 20.1.4 first partitions all
elements of IR according to an equivalence relation E ⊆ IR × IR defined by
$$(x, y) \in E \Leftrightarrow x - y \in \mathbb{Q}.$$
Let [x] denote the equivalence class of x with respect to E for any x ∈ IR. Then $[x] = \mathbb{Q}$ for all $x \in \mathbb{Q}$, and
$[x] \cap [-x] = \emptyset$ for all $x \in \mathbb{R} \setminus \mathbb{Q}$. The set of equivalence classes is uncountable.
Now “construct” the set Y of equivalence classes by requiring that [0] ∈ Y and choosing one and only one
of [x] and [−x] to be an element of Y for each x ∈ IR. The existence of such a set of choices is guaranteed
by the Axiom of Choice. Define U = ∪Y . Then U is a Lebesgue unmeasurable subset of IR.
The Lebesgue unmeasurability of U relies upon the fact that there is no way to list the set of equivalence
classes [x]. There is no countable list because the set is uncountable. But there is also no uncountable listing,
e.g. by assigning one and only one equivalence class to each x ∈ IR. Therefore there is no way to specify for
each equivalence class whether it is or is not in the set Y .
20.2. Lebesgue integration

20.2.1 Remark: Theorem 20.2.3 is known as the fundamental theorem of calculus. In fact, Theorem 20.2.3
can be made stronger, but it shows the general idea.
20.2.2 Remark: The non-standard abbreviation FTOC may sometimes be used for “fundamental theorem
of calculus”.
[ Replace Theorem 20.2.3 with a sharper version. See Rudin [137], page 115 and EDM2 [35], 216.C, page 821. ]
20.2.3 Theorem: Let f : [a, b] → IR be a Lebesgue integrable function for a, b ∈ IR with a < b, and let
F : [a, b] → IR be a continuous function on [a, b] such that F is differentiable on (a, b) and F′(x) = f (x) for
all x ∈ (a, b). Then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
[ State the generalization of Theorem 20.2.3 to C 1 curves. ]
20.2.4 Remark: Theorem 20.2.3 may be easily generalized to the integral of the differential of a real-valued
function along an arbitrary C 1 curve.
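The curve version of the fundamental theorem of calculus mentioned in Remark 20.2.4 can be illustrated numerically. The sketch below assumes a particular function f on IR2 and a particular C 1 curve γ, both chosen only for illustration, and uses a simple midpoint rule rather than a Lebesgue integral.

```python
import math

def f(x, y):
    # assumed example function on R^2
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # partial derivatives of f, differentiated by hand
    return (2.0 * x * y, x * x + math.cos(y))

def gamma(t):
    # assumed example C^1 curve in R^2
    return (math.cos(t), t * t)

def gamma_dot(t):
    # velocity of gamma, differentiated by hand
    return (-math.sin(t), 2.0 * t)

def integrate_df_along_curve(t0, t1, steps=20000):
    """Midpoint-rule approximation of the integral of df(gamma'(t)) over [t0, t1]."""
    h = (t1 - t0) / steps
    total = 0.0
    for k in range(steps):
        t = t0 + (k + 0.5) * h
        gx, gy = grad_f(*gamma(t))
        vx, vy = gamma_dot(t)
        total += (gx * vx + gy * vy) * h
    return total

lhs = integrate_df_along_curve(0.0, 1.5)
rhs = f(*gamma(1.5)) - f(*gamma(0.0))
print(lhs, rhs)
```

The two printed numbers should agree up to the discretization error of the midpoint rule.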

20.3. Rectangular Stokes theorem in two dimensions

20.3.1 Remark: It turns out that the fundamental theorem of calculus is just the tip of an impressive
iceberg. It is a special case of the Gauß-Green-Stokes family of theorems, which can be generalized in many
ways.
A simple consequence of Theorem 20.2.3 is the corresponding Theorem 20.3.2 for two variables. The theorem
holds for very general regions, but it is illuminating to first prove it for a rectangular region.
20.3.2 Theorem: Let a1 , a2 , b1 , b2 ∈ IR with a1 < b1 and a2 < b2 , and let A : [a1 , b1 ] × [a2 , b2 ] → IR2 be a
continuous function such that A has partial differentials on (a1 , b1 ) × (a2 , b2 ). Then
$$\int_\Omega (\partial_1 A_2 - \partial_2 A_1)\,dx_1\,dx_2 = \int_{\partial\Omega} A.ds, \qquad (20.3.1)$$
where Ω = [a1 , b1 ] × [a2 , b2 ] and ds denotes the anti-clockwise line integral around ∂Ω.
Proof: The first term ∂1 A2 may be integrated by Theorem 20.2.3 with respect to x1 for fixed x2 . This
gives A2 (b1 , x2 ) − A2 (a1 , x2 ). (See Figure 20.3.1.) So
$$\int_\Omega \partial_1 A_2\,dx_1\,dx_2 = \int_{a_2}^{b_2} \int_{a_1}^{b_1} \partial_1 A_2(x_1, x_2)\,dx_1\,dx_2 = \int_{a_2}^{b_2} \bigl(A_2(b_1, x_2) - A_2(a_1, x_2)\bigr)\,dx_2.$$
Figure 20.3.1 Integration of exterior derivative in a rectangle (the rectangle [a1 , b1 ] × [a2 , b2 ] with sides γℓ , γr , γb , γt and a horizontal integration path for ∂1 A2 (x1 , x2 ) dx1 from x1 = a1 to x1 = b1 )
Similarly
$$\int_\Omega -\partial_2 A_1\,dx_1\,dx_2 = \int_{a_1}^{b_1} \int_{a_2}^{b_2} -\partial_2 A_1(x_1, x_2)\,dx_2\,dx_1 = \int_{a_1}^{b_1} \bigl(A_1(x_1, a_2) - A_1(x_1, b_2)\bigr)\,dx_1.$$
So the left-hand side of (20.3.1) becomes the sum of anti-clockwise line integrals:
$$\int_{\gamma_r} A.ds + \int_{\gamma_\ell} A.ds + \int_{\gamma_b} A.ds + \int_{\gamma_t} A.ds,$$
where $\gamma_r$, $\gamma_\ell$, $\gamma_b$ and $\gamma_t$ denote respectively the right, left, bottom and top sides of [a1 , b1 ] × [a2 , b2 ].
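Theorem 20.3.2 can be checked numerically on a concrete rectangle. The following is a sketch only: the vector field A, the rectangle and the midpoint-rule helper are assumptions made for illustration, and the integrand ∂1 A2 − ∂2 A1 has been differentiated by hand.

```python
import math

def A(x1, x2):
    # assumed example vector field (A1, A2)
    return (-x1 * x2 ** 2, x1 ** 2 * x2 + math.sin(x1))

def curl_A(x1, x2):
    # d1 A2 - d2 A1, computed by hand for the field above
    return (2.0 * x1 * x2 + math.cos(x1)) - (-2.0 * x1 * x2)

def midpoint(g, lo, hi, steps=400):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * h) for k in range(steps)) * h

a1, b1, a2, b2 = 0.0, 2.0, -1.0, 1.5

# left-hand side of (20.3.1): iterated integral over the rectangle
area_integral = midpoint(
    lambda x2: midpoint(lambda x1: curl_A(x1, x2), a1, b1), a2, b2)

# right-hand side of (20.3.1): anti-clockwise sum over the four sides
boundary_integral = (
    midpoint(lambda x1: A(x1, a2)[0], a1, b1)      # bottom, increasing x1
    + midpoint(lambda x2: A(b1, x2)[1], a2, b2)    # right, increasing x2
    - midpoint(lambda x1: A(x1, b2)[0], a1, b1)    # top, decreasing x1
    - midpoint(lambda x2: A(a1, x2)[1], a2, b2)    # left, decreasing x2
)

print(area_integral, boundary_integral)
```

The minus signs on the top and left sides encode the anti-clockwise orientation, exactly as in the proof.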
[ Both Theorems 20.2.3 and 20.3.2 need a lot of improvement. Also give an n-dimensional version of the
rectangle versions of the Stokes formula. ]

20.3.3 Remark: Some elementary scaling tests can be applied to Theorem 20.3.2 to ensure that it is at
least plausible. The left-hand side of equation (20.3.1) has the terms ∂1 A2 − ∂2 A1 in the integrand. These
both look like covariant tensors of degree 2. So if the coordinates are multiplied by 2, they will be multiplied
by 1/4. The differential form of the integral is dx1 dx2 , which looks like a contravariant tensor of degree 2.
So under any simple coordinate scaling, the left-hand side should remain constant. In fact, this is true for
any C 1 transformation.
The right-hand side of equation (20.3.1) has an integrand which looks like a covariant tensor of degree 1
and a differential form ds which looks like a contravariant tensor of degree 1. So this is also invariant under
coordinate scaling. This kind of “dimensional analysis” is useful as a basic sanity check for equations and
expressions in differential geometry.
20.3.4 Remark: The proof of Theorem 20.3.2 gives a clue for how to remember the form of the exterior
derivative. The term ∂1 A2 , if integrated on its own, yields the difference between A2 on the right and
left sides of the integration region, which is just like in the fundamental theorem of calculus. All that is
happening here is that the partial derivative ∂1 is cancelled by the integration ∫ . . . dx1 . Then the integral
over x2 sums this right-left difference over the whole right and left edges of the region. The term ∂2 A1
may be understood similarly, except that a minus-sign is required because the top edge integral is going
in a negative direction, i.e. in the direction of decreasing x1 . Thus one may think of ∂1 A2 − ∂2 A1 as “the
right-left difference of A2 minus the top-bottom difference of A1 ”.
20.3.5 Remark: If the vector field A in Theorem 20.3.2 is replaced with the gradient (∂1 f, ∂2 f ) of a C 2
function f : Ω → IR, the left-hand integral has a value equal to zero because ∂ 2 f /∂x1 ∂x2 = ∂ 2 f /∂x2 ∂x1 .
So the theorem implies that the integral of the gradient df around the boundary ∂Ω is zero. This is not
surprising because this integral of df signifies the difference in “height” of the function f as a point completes
a loop around the boundary. This must be zero so that the value of f comes back to where it started. This is,
in fact, a consequence of the fundamental theorem of calculus applied to the boundary curve. (This pattern
continues for higher dimensions.)
From this observation, one may draw an interpretation of the expression ∂1 A2 − ∂2 A1 as the “deviation of
the vector field A from the differential of a function f ”. In fact, if this integral is always zero, it follows that
the boundary curve integral of A is independent of path, in which case, an integral f may be determined
simply by integration of A along non-closed curves. If the vector field A represents a physical force field, the
integrals in Theorem 20.3.2 may be thought of as the energy gained by one rotation around the boundary
curve. So a zero value implies that the field is conservative.
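The contrast between a gradient field and a non-conservative field in Remark 20.3.5 can be seen in a small numerical experiment. This is a sketch under stated assumptions: the function f, the rotation field (−x2 , x1 ) and the rectangle are illustrative choices, and the helper names are hypothetical.

```python
import math

def midpoint(g, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * h) for k in range(steps)) * h

def circulation(A, a1, b1, a2, b2):
    """Anti-clockwise line integral of the field A around [a1, b1] x [a2, b2]."""
    return (
        midpoint(lambda x1: A(x1, a2)[0], a1, b1)
        + midpoint(lambda x2: A(b1, x2)[1], a2, b2)
        - midpoint(lambda x1: A(x1, b2)[0], a1, b1)
        - midpoint(lambda x2: A(a1, x2)[1], a2, b2)
    )

def grad_f(x1, x2):
    # gradient of the assumed function f(x1, x2) = x1**2 * x2 + sin(x1 * x2)
    c = math.cos(x1 * x2)
    return (2.0 * x1 * x2 + x2 * c, x1 ** 2 + x1 * c)

def rotation(x1, x2):
    # not a gradient: d1 A2 - d2 A1 = 2 everywhere
    return (-x2, x1)

conservative_loop = circulation(grad_f, 0.0, 2.0, -1.0, 1.5)
rotational_loop = circulation(rotation, 0.0, 2.0, -1.0, 1.5)
print(conservative_loop, rotational_loop)
```

The first value is zero up to discretization error. The second equals 2 times the area of the rectangle, which is the integral of ∂1 A2 − ∂2 A1 = 2 over the region.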
20.3.6 Remark: Roughly speaking, Theorem 20.3.2 suggests that the exterior derivative ∂1 A2 −∂2 A1 may
be thought of as “curl per unit area”. The kind of directional boundary path integral in Theorem 20.3.2 has
an interesting additive property. If two rectangular regions are placed side by side, the common boundary
segments cancel each other. The arrows cannot have the same direction on a common segment if they are
always oriented counterclockwise as indicated.
If a region is partitioned into many rectangles, the integral of the curl operator over the entire region may be
calculated by integrating around its boundary, ignoring all of the internal line segments where the component
rectangles coincide. In fact, this can be generalized to almost any region at all. This is not a surprising
result when it is considered that the differential operator in the interior is equal to the limit of the per-area
boundary line integral for vanishingly small rectangles. Thus, roughly speaking, one may write:
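$$(\partial_i A_j - \partial_j A_i)(p) = \lim_{\Omega\to\{p\}} \frac{1}{\mu(\Omega)} \int_\Omega (\partial_i A_j - \partial_j A_i)\,d\mu = \lim_{\Omega\to\{p\}} \frac{1}{\mu(\Omega)} \int_{\partial\Omega} A.ds,$$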
for p ∈ IR2 , where µ is the Lebesgue measure in IR2 and the expression “limΩ→{p} ” can be made precise in

terms of the diameter of Ω. This interpretation may be compared with the corresponding formula in IR1 :
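$$\partial_i f(p) = \lim_{I\to\{p\}} \frac{1}{\mathrm{length}(I)} \int_I \partial_i f\,d\mu = \lim_{I\to\{p\}} \frac{1}{\mathrm{length}(I)} \int_{\partial I} f\,d\mu = \lim_{a,b\to p} \frac{f(b) - f(a)}{b - a},$$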
where ∂I denotes the “signed boundary” of the interval I = [a, b]. This “signed boundary” is positive at b
and negative at a. Hence the boundary integral is the integral of f multiplied by the 0-form which equals 1
at b and −1 at a.
The Stokes Theorem follows very naturally from the way the exterior derivative (in this case the curl
operator) is defined. The reason for the name “exterior derivative” is clear from the “limit of per-area
boundary integral” interpretation. (See also Remark 20.6.1 for this interpretation.)
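The “limit of per-area boundary integral” interpretation can be explored numerically by shrinking a square around a point p. The sketch below assumes an example field A and point p; the squares centred at p are one possible choice of regions Ω, and the helper names are illustrative.

```python
import math

def A(x1, x2):
    # assumed example vector field (A1, A2)
    return (-x1 * x2 ** 2, x1 ** 2 * x2 + math.sin(x1))

def curl_A(x1, x2):
    # d1 A2 - d2 A1, differentiated by hand for the field above
    return 4.0 * x1 * x2 + math.cos(x1)

def midpoint(g, lo, hi, steps=200):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * h) for k in range(steps)) * h

def circulation_per_area(p, half_width):
    """Anti-clockwise line integral of A around a square centred at p, divided by its area."""
    a1, b1 = p[0] - half_width, p[0] + half_width
    a2, b2 = p[1] - half_width, p[1] + half_width
    loop = (
        midpoint(lambda x1: A(x1, a2)[0], a1, b1)
        + midpoint(lambda x2: A(b1, x2)[1], a2, b2)
        - midpoint(lambda x1: A(x1, b2)[0], a1, b1)
        - midpoint(lambda x2: A(a1, x2)[1], a2, b2)
    )
    return loop / ((b1 - a1) * (b2 - a2))

p = (0.5, 0.3)
ratios = [circulation_per_area(p, h) for h in (0.1, 0.01, 0.001)]
errors = [abs(r - curl_A(*p)) for r in ratios]
print(ratios, curl_A(*p))
```

For this field the ratios approach the value of ∂1 A2 − ∂2 A1 at p as the square shrinks.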
[ Define the physics version of “curl” and define its precise relation to the exterior derivative. ]
20.3.7 Remark: See Remark 19.3.5 for the calculation of the transformation rule for the exterior derivative
of a covariant vector field. It transforms like a covariant tensor of degree 2. This would seem to imply that
Theorem 20.3.2 holds when the point space is subjected to an arbitrary C 1 diffeomorphism.
[ Explain why the situation is not so good when the vector field is contravariant. Using the transformation rules
for the differential forms in Theorem 20.3.2 under general diffeomorphisms (even to spaces with more than
2 dimensions), it should be possible to generate a useful class of generalizations using only diffeomorphisms.
Such generalizations should give exactly the same answer as the standard general Stokes theorem. ]
20.4. Rectangular Stokes theorem in three dimensions

This section extends Section 20.3 to rectangular solids in IR3 .
20.4.1 Remark: Stokes theorem can be extended from a rectangle to a rectangular solid. Consider first
the surface integral for $S_A = \{x_1\} \times [x_2, x_2 + \Delta x_2] \times [x_3, x_3 + \Delta x_3]$. The integral of the vector field
$\lambda \in X^1(\Lambda_2 T(\mathbb{R}^3))$ is $\int_{S_A} \lambda(x)(e_2, e_3)\,dx_2\,dx_3$. In terms of coordinates, let $\lambda(x)(e_2, e_3) = a_{23}(x)$. (See
Figure 20.4.1.)
Figure 20.4.1 Stokes theorem in IR3 (two parallel faces SA and SB , spanned by e2 and e3 and separated along the x1 axis, each carrying the surface integral of λ(x)(e2 , e3 ) dx2 dx3 )
Then by subtracting the integral over SA from the integral over SB and dividing by ∆x1 ∆x2 ∆x3 and taking
the limit, the result is ∂1 a23 (x) = (∂/∂x1 )λ(x1 , x2 , x3 )(e2 , e3 ). When the other two surface pairs are added,
the result is ∂1 a23 (x) − ∂2 a13 (x) + ∂3 a12 (x). By integrating this over a non-infinitesimal rectangular solid
as for a 2-dimensional rectangle, the result is $\int_\Omega (\partial_1 a_{23}(x) - \partial_2 a_{13}(x) + \partial_3 a_{12}(x))\,dx_1\,dx_2\,dx_3 = \int_{\partial\Omega} \lambda(x)(dA)$.

This suggests that dλ(x) should be defined as ∂1 a23 (x) − ∂2 a13 (x) + ∂3 a12 (x) in order to make the Stokes
formula valid.
The Stokes formula for a rectangular solid is then
$$\int_\Omega (\partial_1 a_{23} - \partial_2 a_{13} + \partial_3 a_{12})\,dx_1\,dx_2\,dx_3 = \int_{\partial\Omega} a.dA.$$
The integral over the surface could be called an “exterior integral”, while the integral over the rectangular
region could be called an “internal integral”. The name of the “exterior derivative” is clearly related to the
external nature of the surface integral.
An easy proof of the Stokes formula for a rectangular solid follows the same method as for Theorem 20.3.2.
Each of the three terms of the integrand may be integrated along lines as in Figure 20.4.2 to get rid of
the partial derivative. Each of these line integrals may be integrated over the corresponding faces of the
rectangular solid. These solid integrals may be added to give the surface integral.
Figure 20.4.2 Stokes theorem integration paths in IR3 (line integrals of ∂1 a23 (x) dx1 from x1 = a1 to x1 = b1 , running between two opposite faces spanned by e2 and e3 )
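The Stokes formula for a rectangular solid can be checked numerically in the same way as the two-dimensional case. The sketch below assumes example coefficient functions a23 , a13 , a12 and an example box. The surface integral is computed as the signed sum of opposite-face differences, following the integration-along-lines argument above. All function names are illustrative.

```python
import math

# assumed example coefficients of the 2-form
def a23(x1, x2, x3): return x1 ** 2 * x2 + math.sin(x3)
def a13(x1, x2, x3): return x1 * x2 * x3
def a12(x1, x2, x3): return x3 ** 2 + x1 * x2

def integrand(x1, x2, x3):
    # d1 a23 - d2 a13 + d3 a12, differentiated by hand
    return 2.0 * x1 * x2 - x1 * x3 + 2.0 * x3

def midpoints(lo, hi, steps):
    h = (hi - lo) / steps
    return [lo + (k + 0.5) * h for k in range(steps)], h

def face_integral(g, lo_u, hi_u, lo_v, hi_v, steps=40):
    """Midpoint-rule integral of g(u, v) over a rectangular face."""
    us, hu = midpoints(lo_u, hi_u, steps)
    vs, hv = midpoints(lo_v, hi_v, steps)
    return sum(g(u, v) for u in us for v in vs) * hu * hv

def volume_integral(g, box, steps=40):
    """Midpoint-rule integral of g(x1, x2, x3) over a rectangular solid."""
    (a1, b1), (a2, b2), (a3, b3) = box
    xs, h1 = midpoints(a1, b1, steps)
    ys, h2 = midpoints(a2, b2, steps)
    zs, h3 = midpoints(a3, b3, steps)
    return sum(g(x, y, z) for x in xs for y in ys for z in zs) * h1 * h2 * h3

box = ((0.0, 1.0), (-1.0, 2.0), (0.5, 1.5))
(a1, b1), (a2, b2), (a3, b3) = box

lhs = volume_integral(integrand, box)
rhs = (
    face_integral(lambda x2, x3: a23(b1, x2, x3) - a23(a1, x2, x3), a2, b2, a3, b3)
    - face_integral(lambda x1, x3: a13(x1, b2, x3) - a13(x1, a2, x3), a1, b1, a3, b3)
    + face_integral(lambda x1, x2: a12(x1, x2, b3) - a12(x1, x2, a3), a1, b1, a2, b2)
)
print(lhs, rhs)
```

The alternating signs of the three face-pair terms match the signs of the three terms of the integrand.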
20.4.2 Remark: As for the rectangular Stokes theorem in IR2 (Remark 20.3.6), one may motivate the
definition of the exterior derivative by the (very rough) expression:
$$(\partial_1 a_{23} - \partial_2 a_{13} + \partial_3 a_{12})(p) = \lim_{\Omega\to\{p\}} \frac{1}{\mu(\Omega)} \int_{\partial\Omega} a.dA,$$
for p ∈ IR3 , where µ is the Lebesgue measure in IR3 .
[ Maybe derive the expression for the exterior derivative for general dimensions by using rectangular regions.
Then show the Stokes theorem for C 1 regions. Show how only the antisymmetric part of a multilinear
function of a set of n vectors is significant. This motivates the definition of alternating tensors. ]
20.5. Differential forms

The term “differential form” originates from expressions like “dx dy” which appear in integrals, especially
with respect to two or more independent variables. Just as directional derivatives may be identified with
tangent vectors, so also the differentials which appear in integrals may be identified with cotangent vectors,
namely the duals of tangent vectors.
20.5.1 Remark: This section introduces differential forms in flat space. Differential forms in physics
represent densities of physical quantities which may be scalars, vectors or tensors. The density may be linear
density along a curve, per-area density on a surface or per-volume density, and so forth for higher dimensions.
Exterior calculus may be thought of as the calculus of integration on embedded submanifolds.

A differential form represents something that one may want to integrate. A differential form φ of degree m
is an alternating multilinear function of sequences of m tangent vectors in some space, which may be a flat
space or a manifold. If m = 1, then φ would typically be integrated along a curve and the single tangent
vector would be the velocity vector of the curve. If m = 2, then φ would typically be integrated over a
2-dimensional surface, and the two tangent vectors would be tangential to the surface at each point.
Differential forms φ of degree m tell you how much of something there is for a given amount of m-area. Since
they are alternating multilinear with respect to sequences of m tangent vectors, they are therefore linear
with respect to m-area. So for m = 1, φ depends linearly on the length of a curve. For m = 2, φ represents
the amount of something per unit area at a given point. And so forth. Linearity with respect to m-area
ensures that the integral is independent of the way in which a region is subdivided. So linearity seems to be
an inescapable part of the definition.
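The alternating multilinear behaviour described above can be made concrete for the simplest case m = 2 in IR2 . The sketch below assumes the constant 2-form dx1 ∧ dx2 and three arbitrary tangent vectors; the function names are illustrative.

```python
def wedge_12(u, v):
    """Value of the constant 2-form dx1 ^ dx2 on the pair of tangent vectors (u, v) in R^2."""
    return u[0] * v[1] - u[1] * v[0]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(k, u):
    return (k * u[0], k * u[1])

u, v, w = (2.0, 0.5), (1.0, 3.0), (-1.0, 4.0)

area_uv = wedge_12(u, v)                    # signed area of the parallelogram on (u, v)
swapped = wedge_12(v, u)                    # alternating: the sign changes
summed = wedge_12(u, add(v, w))             # linear in the second argument
halves = 2.0 * wedge_12(scale(0.5, u), v)   # two half-parallelograms give the same total

print(area_uv, swapped, summed, halves)
```

The last line illustrates the subdivision remark: cutting the parallelogram into two halves and adding the values gives the value on the whole parallelogram.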
[ How do differential forms relate to general spaces of differentials such as T (M1 , M2 )? ]

[ See notes I. See also Federer [106] 4.1.6, EDM2 [35] 105.Q, and Malliavin [36], Section 7.5, page 71. See
Chapter 4 of the second part of Malliavin [36] (p.112). ]
[ Should either define “flat space” very carefully somewhere or else not mention it at all. The same goes for
“Euclidean spaces”. Particular difficulties arise with infinite-dimensional spaces. It should be pointed out
which things are okay in infinite-dimensional space and which are not. Topology gets tricky, for instance. ]
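20.5.2 Definition: The (alternating) m-form bundle on IRn for $m, n \in \mathbb{Z}_0^+$ is the set $\Lambda_m T(\mathbb{R}^n) = \mathbb{R}^n \times \Lambda_m(\mathbb{R}^n)$
together with the transformation rule $\hat\psi : \Lambda_m T(\mathbb{R}^n) \mathrel{\mathring{\to}} \Lambda_m T(\mathbb{R}^n)$ for alternating m-forms under
$C^1$ diffeomorphisms $\phi : \mathbb{R}^n \mathrel{\mathring{\to}} \mathbb{R}^n$.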
[ Must write out the full transformation rules for m-forms in Definition 20.5.2 somewhere. ]
20.5.3 Notation: $\Lambda_m T_x(\mathbb{R}^n)$ for $m, n \in \mathbb{Z}_0^+$ and $x \in \mathbb{R}^n$ denotes the subset $\{x\} \times \Lambda_m(\mathbb{R}^n)$ of the m-form
bundle $\Lambda_m T(\mathbb{R}^n)$ on IRn .
20.5.4 Definition: A cross-section of the (alternating) m-form bundle on IRn for $m, n \in \mathbb{Z}_0^+$ is a function
$f : \mathbb{R}^n \to \Lambda_m T(\mathbb{R}^n)$ such that $f(x) \in \Lambda_m T_x(\mathbb{R}^n)$ for all x ∈ Dom(f ).
A differential form of degree m on IRn for $m, n \in \mathbb{Z}_0^+$ is a cross-section of the m-form bundle on IRn .
20.5.5 Definition: A $C^r$ cross-section of the m-form bundle on IRn for $m, n \in \mathbb{Z}_0^+$ and $r \in \bar{\mathbb{Z}}_0^+$ is a
cross-section $f : \mathbb{R}^n \to \Lambda_m T(\mathbb{R}^n)$ such that f is of class $C^r$.
A $C^r$ differential form of degree m on IRn for $m, n \in \mathbb{Z}_0^+$ and $r \in \mathbb{Z}_0^+$ is a $C^r$ cross-section of the m-form
bundle on IRn .
20.5.6 Notation: $X^r(\Lambda_m T(\mathbb{R}^n))$ denotes the set of $C^r$ cross-sections of $\Lambda_m T(\mathbb{R}^n)$ for $m, n \in \mathbb{Z}_0^+$ and
$r \in \bar{\mathbb{Z}}_0^+$.
[ Remark 20.5.7 needs to be fixed. It shouldn’t refer to manifolds. See Federer [106], page 352. ]
20.5.7 Remark: The historical origin of the term “differential form” is the use in classical differential
geometry of abstract expressions such as “f (x1 , x2 )dx1 + g(x1 , x2 )dx2 ” to represent differential forms. If
abstract differentials such as dxi are given a concrete representation such as linear maps dψ i : T (M ) → IR
with dψ i : V ↦ v i (where v i is the ith component of V ∈ T (M )), then dxi = dψ i is a differential form of
degree 1. Products of differential forms such as dx1 dx2 can be similarly interpreted.
[ Define exterior product. See Crampin/Pirani [12], pages 91, 95, 97, 99, 104, 258. And interior product? ]
[ Must also define differential forms which are valued in spaces of infinitesimal translations of a fibre space F
or fibre sets Eb . These are vector fields rather than vectors. Maybe do this in the corresponding fibre space
chapter/section instead. ]

20.6. The exterior derivative

The exterior derivative could not be defined in Chapter 13 on tensors because it requires differential calculus
for its definition, and its motivation comes from integral calculus for multiple variables. For example,
Theorem 20.3.2 gives a motivation for the exterior derivative of a 1-form.
[ For exterior derivative, see also Crampin/Pirani [12], pages 120–125, 259. ]
20.6.1 Remark: The exterior derivative dω of a form ω of degree m may be interpreted as the infinitesimal
limit of the integral of ω over the m-dimensional boundary of an (m + 1)-dimensional submanifold divided
by the (m + 1)-dimensional area of the submanifold as it shrinks to a point. The Stokes theorem supports
this interpretation. The definition of exterior derivative is designed to make the Stokes theorem valid. One
may regard the Stokes theorem as the definition of the exterior derivative. Defining the exterior derivative
in this way is more satisfying than pulling it out of a hat. (See Section 20.3 for motivation of the exterior
derivative in terms of the Stokes formula.)
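20.6.2 Definition: The exterior derivative $d : X^1(\Lambda_m T(\mathbb{R}^n)) \to X^0(\Lambda_{m+1} T(\mathbb{R}^n))$ for $n, m \in \mathbb{Z}_0^+$ is
defined by
$$\forall \phi \in X^1(\Lambda_m T(\mathbb{R}^n)),\ \forall x \in \mathbb{R}^n,\ \forall i \in I_{m+1}^n,\qquad d\phi(x)(e_i) = \sum_{r=1}^{m+1} (-1)^{r-1} \partial_r \phi(x)(\mathop{\mathrm{omit}}_r e_i), \qquad (20.6.1)$$
where $e(x) \in T_x(\mathbb{R}^n)^n$ is a constant sequence of basis vectors for IRn .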

20.6.3 Remark: Definition 20.6.2 expresses the exterior derivative in terms of basis vectors e1 , . . . en in
the tangent space of the point set IRn . It is also possible to express the exterior derivative in terms of
basis vectors $e^j = e^{j_1} \wedge \ldots e^{j_m}$ of the alternating m-form space $\Lambda_m T(\mathbb{R}^n)$ for $j \in I_m^n$, where the sequence
$e^1, \ldots e^n$ is the dual basis of the sequence $e_1, \ldots e_n$. This kind of definition has the advantage of being easier
to motivate in terms of the Stokes formula, but is less concise to state.

A function $f \in X^1(\Lambda_0(\mathbb{R}^n)) = C^1(\mathbb{R}^n, \mathbb{R})$ is simply a $C^1$ real-valued function on IRn . The exterior derivative
of such a function f is the ordinary differential $df = \sum_{r=1}^n \partial_r f.e^r$. (See Definition 19.2.14.) The reason for
choosing to define the exterior derivative to equal the differential for m = 0 is to make the fundamental
theorem of calculus (Theorem 20.2.3) valid. (See Remark 20.2.4 for the FTOC on curves.)
For $j \in I_m^n$, the function $g(x) = \bigwedge_{k=1}^m e^{j_k}$ in $X^\infty(\Lambda_m(\mathbb{R}^n))$ is constant. So it is assumed that it has exterior
derivative equal to zero. The m-linear function g(x) has the value
$$g(x)(e_i) = \Bigl(\bigwedge_{k=1}^m e^{j_k}\Bigr)(e_i) = \delta^j_i = \delta^{j_1, j_2 \ldots j_m}_{i_1, i_2 \ldots i_m}$$
for all $x \in \mathbb{R}^n$ and $i \in I_m^n$.
Since f is a 0-form and g is an m-form, the exterior product f ∧g is a well-defined m-form because 0+m = m.
(See Section 9.9.15 for the exterior product.) In this case, the exterior product is the same as the pointwise
product. That is, (f ∧ g)(x) = f (x)g(x) for all x ∈ IRn . The exterior derivative of this m-form f ∧ g is
defined to be d(f ∧ g) = (df ) ∧ g because g is constant. Thus (d(f ∧ g))(x) = (df )(x) ∧ g(x) for all x ∈ IRn .
It follows immediately that
$$d(f(x).e^{j_1} \wedge \ldots e^{j_m}) = \sum_{r=1}^n \partial_r f(x).e^r \wedge e^{j_1} \wedge \ldots e^{j_m}.$$
A general m-form $\alpha \in X^1(\Lambda_m(\mathbb{R}^n))$ may be written as a sum of constant simple m-forms $g = \bigwedge_{k=1}^m e^{j_k} \in X^1(\Lambda_m(\mathbb{R}^n))$
of basis vectors $e^1, \ldots e^n$ and real-valued functions $f \in X^1(\Lambda_0(\mathbb{R}^n))$. Due to the antisymmetry
rules, it is sufficient to use only increasing index sequences $j \in I_m^n$. Thus
$$\alpha = \sum_{j \in I_m^n} f_j g^j = \sum_{j \in I_m^n} f_j \bigwedge_{k=1}^m e^{j_k}$$
for some functions $f_j \in X^1(\Lambda_0(\mathbb{R}^n))$ for $j \in I_m^n$. By linearity of the exterior derivative, it follows then that
$$d\alpha = \sum_{r=1}^n \sum_{j \in I_m^n} (\partial_r f_j).e^r \wedge \bigwedge_{k=1}^m e^{j_k} \qquad (20.6.2)$$
$$\phantom{d\alpha} = \sum_{r=1}^n \sum_{j \in I_m^n} (\partial_r f_j).e^r \wedge e^{j_1} \wedge \ldots e^{j_m}.$$
Finally the general (m + 1)-form dα may be evaluated for a sequence $e_i \in (\mathbb{R}^n)^{m+1}$ for $i \in I_{m+1}^n$ to give
$$d\alpha(e_i) = \sum_{r=1}^n \sum_{j \in I_m^n} (\partial_r f_j).\Bigl(e^r \wedge \bigwedge_{k=1}^m e^{j_k}\Bigr)(e_i) = \sum_{r=1}^n \sum_{j \in I_m^n} (\partial_r f_j).\delta^{r,j}_i,$$
where the notation “r, j” means the concatenation of the one-element sequence (r) with the m-element
sequence j to give the (m + 1)-element sequence $(r, j_1, \ldots j_m) \in I_{m+1}^n$. However,
$$\Bigl(\bigwedge_{k=1}^m e^{j_k}\Bigr)(\mathop{\mathrm{omit}}_r e_i) = \delta^j_{\mathrm{omit}_r(i)} = \delta^{r,j}_{r,\mathrm{omit}_r(i)} = (-1)^{r-1} \delta^{r,j}_i.$$
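Therefore
$$d\alpha(e_i) = \sum_{r=1}^n \sum_{j \in I_m^n} (-1)^{r-1} (\partial_r f_j).\Bigl(\bigwedge_{k=1}^m e^{j_k}\Bigr)(\mathop{\mathrm{omit}}_r e_i) = \sum_{r=1}^n (-1)^{r-1} \partial_r \alpha(\mathop{\mathrm{omit}}_r e_i)$$
for all $i \in I_m^n$. This agrees with Definition 20.6.2.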
In summary, equation (20.6.1) is applicable when an m-form is thought of as a general m-linear map on
the tangent spaces of the tangent bundle, whereas equation (20.6.2) is better suited to an m-form which is
expressed as the sum of simple m-covectors of basis vectors.
20.6.4 Theorem: The exterior derivative in Definition 20.6.2 is independent of the choice of basis vectors.
20.6.5 Theorem: For all $m \in \mathbb{Z}_0^+$, the exterior derivative of an m-form is an (m + 1)-form.
Proof: It must be shown that the exterior derivative of an m-form transforms under a diffeomorphism as
a covariant tensor of degree m + 1, and that antisymmetry holds.
20.6.6 Theorem: For all $m \in \mathbb{Z}_0^+$ and $r \in \bar{\mathbb{Z}}_0^+$, the exterior derivative of a $C^{r+1}$ m-form is a $C^r$ (m + 1)-form.
20.6.7 Remark: Theorems 20.6.4, 20.6.5 and 20.6.6 verify that Definition 20.6.2 yields a well-defined C 0
(m+1)-form. It follows from Theorem 20.6.6 that the exterior derivative of a C ∞ m-form is also of class C ∞ .
Theorem 20.6.8 refers to the linear structure of pointwise addition and scalar multiplication of m-covector
tangent bundle cross-sections. Theorem 20.6.9 shows that the exterior derivative is an extension of the
definition of the differential of a real-valued function.
20.6.8 Theorem: For all $m, n \in \mathbb{Z}_0^+$, the exterior differential in Definition 20.6.2 is a linear map
$d : X^1(\Lambda_m T(\mathbb{R}^n)) \to X^0(\Lambda_{m+1} T(\mathbb{R}^n))$.
20.6.9 Theorem: For all $n \in \mathbb{Z}_0^+$, for all $f \in X^1(\Lambda_0 T(\mathbb{R}^n))$, the exterior derivative $df \in X^0(\Lambda_1 T(\mathbb{R}^n))$
is the same thing as the differential $df \in X^0(T^*(\mathbb{R}^n))$.
20.6.10 Theorem: For all $n, m, q \in \mathbb{Z}_0^+$, for all $f \in X^1(\Lambda_m T(\mathbb{R}^n))$ and $g \in X^1(\Lambda_q T(\mathbb{R}^n))$,
$$d(f \wedge g) = (df) \wedge g + (-1)^m f \wedge (dg).$$
20.6.11 Theorem: For all $n, m \in \mathbb{Z}_0^+$, for all $f \in X^2(\Lambda_m T(\mathbb{R}^n))$,
$$d(df) = 0.$$
20.6.12 Remark: Theorem 20.6.11 may be thought of as: “The boundary of the boundary is empty.”
Alternatively: “The exterior of the exterior is zero.” Since the exterior derivative is the limit of the integral
of a differential form over the boundary of a region, it makes sense that repeating this operation yields zero.
Another way to think of Theorem 20.6.11 is: “The curl of a conservative field is zero.” In physics, conservative
force fields (i.e. fields in which a particle, loop or surface acquires no net energy when it moves in a closed
path) are often represented as the exterior derivative of a differential form representing energy. It then
follows that the exterior derivative of the force field must be zero.
[ See Frankel [19], p. 73, thm. 2.53 for a proof of Theorem 20.6.13. ]
20.6.13 Theorem: For $n \in \mathbb{Z}_0^+$, suppose that $(\mu_m)_{m=0}^\infty$ is a sequence of maps which satisfy
(i) $\mu_m : X^1(\Lambda_m T(\mathbb{R}^n)) \to X^0(\Lambda_{m+1} T(\mathbb{R}^n))$ for all $m \in \mathbb{Z}_0^+$;
(ii) $\mu_m$ is linear for all $m \in \mathbb{Z}_0^+$;
(iii) $\mu_0 f$ equals the differential df of f for all $f \in X^1(\Lambda_0 T(\mathbb{R}^n))$;
(iv) $\mu_{m+q}(f \wedge g) = \mu_m(f) \wedge g + (-1)^m f \wedge \mu_q(g)$ for all $f \in X^1(\Lambda_m T(\mathbb{R}^n))$, $g \in X^1(\Lambda_q T(\mathbb{R}^n))$ and $m, q \in \mathbb{Z}_0^+$;
(v) $\mu_{m+1}(\mu_m(f)) = 0$ for all $f \in X^2(\Lambda_m T(\mathbb{R}^n))$, for all $m \in \mathbb{Z}_0^+$.
Then $\mu_m(f) = df$ for all $f \in X^1(\Lambda_m T(\mathbb{R}^n))$, for all $m \in \mathbb{Z}_0^+$, where d is the exterior derivative in Definition 20.6.2.
In other words, the exterior derivative is uniquely determined by the above conditions.
20.6.14 Remark: The usual basis e to use in Definition 20.6.2 is the sequence (e1 , . . . en ) of unit vectors
in Tx (IRn ). (For notational convenience, the dependence on x ∈ IRn is suppressed.)
The set $I_m^n$ denotes the set of indices $\{i : \mathbb{N}_m \to \mathbb{N}_n;\ \forall j, k \in \mathbb{N}_m,\ j < k \Rightarrow i(j) < i(k)\}$. This set
contains $\#(I_m^n) = C^n_m$ sequences. (See Notation 7.11.10.) The notation $f_i$ for a function $f : \mathbb{N}_n \to A$ and
$i \in I_m^n$ for any set A means the sequence $f \circ i = (f(i(j)))_{j=1}^m = (f_{i(j)})_{j=1}^m = (f_{i_1}, \ldots f_{i_m}) : \mathbb{N}_m \to A$. (See
Notation 7.11.13.)
Denote the value φ(x)(eJ ) by aj1 ,...jm . . .
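20.6.15 Theorem: Let $m, n \in \mathbb{Z}_0^+$. Let $f \in X^1(\Lambda_m T(\mathbb{R}^n))$ be defined by $f(x) = \sum_{i \in I_m^n} a_i(x) \bigwedge_{k=1}^m e^{i_k}$,
where $a_i \in C^1(\mathbb{R}^n)$ for all $i \in I_m^n$. Then the exterior derivative $df \in X^0(\Lambda_{m+1} T(\mathbb{R}^n))$ satisfies
$$(df)(x) = \sum_{i \in I_{m+1}^n} \sum_{r=1}^{m+1} (-1)^{r-1} \partial_r a_{\mathrm{omit}_r(i)}(x) \bigwedge_{k=1}^{m+1} e^{i_k} \qquad (20.6.3)$$
for all $x \in \mathbb{R}^n$.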
20.6.16 Remark: The expression (20.6.3) in Theorem 20.6.15 is sometimes used as the definition of the
exterior derivative.
20.6.17 Example: To show how Theorem 20.6.15 works in practice, let n = 4 and m = 2. Then
$$I_2^4 = \{(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)\}.$$
Therefore
$$f(x) = a_{12}(x) e^1 \wedge e^2 + a_{13}(x) e^1 \wedge e^3 + a_{14}(x) e^1 \wedge e^4 + a_{23}(x) e^2 \wedge e^3 + a_{24}(x) e^2 \wedge e^4 + a_{34}(x) e^3 \wedge e^4,$$
where $a_{ij} \in C^1(\mathbb{R}^n)$ for all $(i, j) \in I_2^4$. Then
$$df(x) = (\partial_1 a_{23}(x) - \partial_2 a_{13}(x) + \partial_3 a_{12}(x))\, e^1 \wedge e^2 \wedge e^3$$
$$\qquad + (\partial_1 a_{24}(x) - \partial_2 a_{14}(x) + \partial_4 a_{12}(x))\, e^1 \wedge e^2 \wedge e^4$$
$$\qquad + (\partial_1 a_{34}(x) - \partial_3 a_{14}(x) + \partial_4 a_{13}(x))\, e^1 \wedge e^3 \wedge e^4$$
$$\qquad + (\partial_2 a_{34}(x) - \partial_3 a_{24}(x) + \partial_4 a_{23}(x))\, e^2 \wedge e^3 \wedge e^4.$$
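The bookkeeping in Example 20.6.17 can be generated mechanically. The following sketch enumerates the increasing index sequences and prints the signed terms for each basis (m + 1)-form. It assumes, as in the terms displayed in the example, that the derivative in the r-th term is taken with respect to the i_r-th coordinate and applied to the coefficient whose index sequence omits i_r. The function names and the output format (“d1 a23” for ∂1 a23 ) are illustrative only.

```python
from itertools import combinations

def increasing_indices(m, n):
    """All strictly increasing index sequences of length m with entries in 1..n."""
    return list(combinations(range(1, n + 1), m))

def exterior_derivative_terms(m, n):
    """For each increasing sequence i of length m + 1, build the signed terms of
    the coefficient of the basis form e^{i_1} ^ ... ^ e^{i_{m+1}}."""
    result = {}
    for i in increasing_indices(m + 1, n):
        terms = []
        for r in range(1, m + 2):
            sign = "+" if (r - 1) % 2 == 0 else "-"
            omitted = i[:r - 1] + i[r:]  # drop the r-th entry of i
            terms.append(f"{sign} d{i[r - 1]} a{''.join(map(str, omitted))}")
        result[i] = " ".join(terms)
    return result

terms = exterior_derivative_terms(2, 4)
for i, expr in terms.items():
    print(i, expr)
```

For m = 2 and n = 4 this lists four basis 3-forms with three terms each, in the same order as the four lines of the example.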
20.6.18 Remark: Curvature of a manifold is defined as the exterior derivative of parallel transport because
curvature is the deviation of parallelism around a closed path which bounds a disc. The laws of motion in
physics are generally defined in terms of some kind of curvature. Therefore exterior derivatives of differential
forms are important in applications of differential geometry to physics.
20.7. Exterior differentiation using Lie derivatives

20.7.1 Remark: The purpose of defining the exterior derivative in terms of Lie derivatives is to hide
coordinates. When the exterior derivative is defined in terms of a constant basis as in Definition 20.6.2, it is
not necessary to be concerned with the variation of the basis with respect to displacement in the point space.
When differentiating an expression φ(x)(ei1 , . . . eim ) for x ∈ IRn and φ ∈ X 1 (Λm T (IRn , IR)), the variation of
the basis vectors ei1 , . . . eim contributes no extra terms. However, if the basis vectors do vary with respect
to x, this variation must be subtracted out.
In the Lie derivative versions of exterior derivative definitions, variable basis vectors X1 , . . . , Xm+1 are
used instead of constant basis vectors. To compensate for this variability, Lie derivatives (which follow the
“flow”) are used instead of simple partial derivatives, and there are some compensatory terms involving
Poisson brackets of the variable basis vectors.
All in all, the simplicity of the constant-basis definition seems to be preferable. However, the Lie derivative
versions can be useful when Lie derivatives are easier to calculate than simple partial derivatives.
20.7.2 Remark: Lie derivatives (and the Poisson bracket in particular) are known to yield tensors from
tensors. This is an advantage relative to partial derivatives with respect to point coordinates. The partial
derivatives must be carefully balanced by antisymmetrizing so as to cancel out the non-tensorial terms which
transform according to higher derivatives (than the first derivative) of point transformations. Differentiating
with respect to a chart does not generally yield tensors from tensors because a chart is not an intrinsic
structure of a manifold.
[ A lot of things in Remark 20.7.3 need clarification. For example, the Lie derivative must be defined and
differential forms with vector fields as arguments must be defined. They must also be extended from tensor
monomials to general tensors. ]
[ Somewhere in this chapter, define vector field algebra and Lie derivatives in flat space. See Section 32.4 for
Lie derivatives in curved space. ]
[ In Remark 20.7.3, expand the Lie derivatives LXi to see what terms arise. They should cancel the undiffer-
entiated φ terms. ]
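20.7.3 Remark: For any sequence of $p + 1$ vector fields $X = (X_i)_{i=1}^{p+1}$,
$$\begin{aligned}
d\phi(X) &= d\phi(X_1, \ldots, X_{p+1}) \\
&= \sum_{i=1}^{p+1} (-1)^{i-1} L_{X_i}\bigl(\phi(\mathrm{omit}_i(X))\bigr)
 + \sum_{1 \le i < j \le p+1} (-1)^{i+j}\, \phi\bigl(\mathrm{insert}_{1,[X_i,X_j]}(\mathrm{omit}_{i,j}(X))\bigr) \\
&= \sum_{i=1}^{p+1} (-1)^{i-1} L_{X_i}\bigl(\phi(X_1, \ldots, \hat{X}_i, \ldots, X_{p+1})\bigr) \\
&\quad + \sum_{1 \le i < j \le p+1} (-1)^{i+j}\, \phi([X_i, X_j], X_1, \ldots, \hat{X}_i, \ldots, \hat{X}_j, \ldots, X_{p+1}),
\end{aligned}$$
where $L_X$ denotes the Lie derivative with respect to a vector field $X$.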
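For $p = 1$ the formula reduces to $d\phi(X, Y) = X(\phi(Y)) - Y(\phi(X)) - \phi([X, Y])$, which can be compared numerically with the constant-basis expression $(\partial_x Q - \partial_y P)(X^1 Y^2 - X^2 Y^1)$ for a 1-form $\phi = P\,dx + Q\,dy$ on $\mathbb{R}^2$. The sketch below is only a sanity check under stated assumptions: the form, the two vector fields and the step size `H` are arbitrary illustrative choices, all names are hypothetical, and derivatives are approximated by central differences.

```python
H = 1e-5  # finite-difference step (arbitrary illustrative choice)

def P(x, y):            # phi = P dx + Q dy (hypothetical 1-form)
    return x * y

def Q(x, y):
    return x * x

def X(x, y):            # hypothetical vector fields on R^2
    return (y, 1.0)

def Y(x, y):
    return (x, x * y)

def partial(f, k, x, y):
    """Central-difference partial derivative of f in coordinate k (0 or 1)."""
    if k == 0:
        return (f(x + H, y) - f(x - H, y)) / (2 * H)
    return (f(x, y + H) - f(x, y - H)) / (2 * H)

def directional(V, f, x, y):
    """V(f): derivative of the scalar function f along the field V."""
    v = V(x, y)
    return v[0] * partial(f, 0, x, y) + v[1] * partial(f, 1, x, y)

def pair(V):
    """The scalar function phi(V) = P V^1 + Q V^2."""
    def value(x, y):
        v = V(x, y)
        return P(x, y) * v[0] + Q(x, y) * v[1]
    return value

def component(V, k):
    return lambda x, y: V(x, y)[k]

def bracket(V, W):
    """[V, W]^k = V(W^k) - W(V^k)."""
    def field(x, y):
        return tuple(
            directional(V, component(W, k), x, y)
            - directional(W, component(V, k), x, y)
            for k in (0, 1)
        )
    return field

def d_phi_lie(x, y):
    """Right-hand side of the Lie-derivative formula for p = 1."""
    return (directional(X, pair(Y), x, y)
            - directional(Y, pair(X), x, y)
            - pair(bracket(X, Y))(x, y))

def d_phi_constant_basis(x, y):
    """(dQ/dx - dP/dy) * (X^1 Y^2 - X^2 Y^1)."""
    curl = partial(Q, 0, x, y) - partial(P, 1, x, y)
    a, b = X(x, y), Y(x, y)
    return curl * (a[0] * b[1] - a[1] * b[0])

print(d_phi_lie(0.7, -1.3), d_phi_constant_basis(0.7, -1.3))
```

For these particular choices both expressions equal $x^2 y^2 - x^2$ when worked by hand, so the two printed numbers should agree up to discretization error.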

[ Why is dφ(X) in Definition 20.6.2 defined for fields X instead of vectors V ? ]

20.7.4 Remark: The pointwise tensor style in Definition 20.6.2 is as in Federer [106] 4.1.6. The tensor
field style is as in EDM2 [35] 105.Q(2), Gallot/Hulin/Lafontaine [20], pages 43, 74, and Crampin/Pirani [12],
page 125.
Malliavin [36], page 117, gives the following for a vector field sequence $X = (X_i)_{i=1}^{p+1}$:
$$\begin{aligned}
d\phi(X) &= d\phi(X_1, \ldots, X_{p+1}) \\
&= \sum_{i=1}^{p+1} (-1)^{i-1} L_{X_i}\bigl(\phi(X_1, \ldots, \hat{X}_i, \ldots, X_{p+1})\bigr) \\
&\quad + \sum_{1 \le i < j \le p+1} (-1)^{i+j}\, \phi(X_1, \ldots, \hat{X}_i, [X_i, X_j], \ldots, \hat{X}_j, \ldots, X_{p+1}),
\end{aligned}$$
which is apparently different to Definition 20.6.2.

[ It is clear from Crampin/Pirani [12], page 128, that $L_{X_i}(\phi(X_1, \ldots, \hat{X}_i, \ldots, X_{p+1}))$ means
$$(L_{X_i}\phi)(X_1, \ldots, \hat{X}_i, \ldots, X_{p+1}) + \sum_{\substack{j=1 \\ j \ne i}}^{p+1} \phi(X_1, \ldots, [X_i, X_j], \ldots, \hat{X}_i, \ldots, X_{p+1}).$$
I.e. must apply $L_{X_i}$ to both $\phi$ and the parameters of $\phi$. ]
20.8. Geometric measure theory

[ Present the Gauß-Green theorem. See Federer [106] (4.5.6, page 478), EDM2 [35] (94.F, page 355, 105.U,
page 390) and Frankel [19] (3.3b, page 111, 5.1, page 155). The EDM calls it the Stokes formula or the
Green-Stokes formula. Frankel calls it Stokes’s Theorem but attributes it to Ampère, Kelvin, Green, Gauß
and others. ]
20.9. Stokes theorem

20.9.1 Remark: Theorem 20.9.2 is known as the Stokes theorem. The Stokes theorem may be regarded
as a natural generalization of the fundamental theorem of calculus. (See also Section 20.2.)
20.9.2 Theorem:
$$\int_C d\omega = \int_{\partial C} \omega,$$
where $C$ is a singular $r$-chain.
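A two-dimensional instance of this statement can be checked numerically. The sketch below assumes the simplest possible setting, which is not taken from the text: $C$ is the unit square in $\mathbb{R}^2$ with counterclockwise boundary, and $\omega = P\,dx + Q\,dy$ with the arbitrarily chosen $P = -xy^2$, $Q = x^2 y$, so that $d\omega = 4xy\,dx \wedge dy$. Both sides are approximated by the midpoint rule.

```python
def P(x, y):
    return -x * y * y

def Q(x, y):
    return x * x * y

def d_omega(x, y):
    """Coefficient of dx ^ dy in d(omega): dQ/dx - dP/dy = 4xy."""
    return 2.0 * x * y + 2.0 * x * y

N = 200                      # number of subintervals per side
h = 1.0 / N
mid = [(k + 0.5) * h for k in range(N)]

# Left-hand side: integral of d(omega) over the square.
interior = sum(d_omega(x, y) for x in mid for y in mid) * h * h

# Right-hand side: integral of omega around the boundary, counterclockwise.
boundary = (sum(P(x, 0.0) for x in mid) * h     # bottom edge, x from 0 to 1
            + sum(Q(1.0, y) for y in mid) * h   # right edge, y from 0 to 1
            - sum(P(x, 1.0) for x in mid) * h   # top edge, x from 1 to 0
            - sum(Q(0.0, y) for y in mid) * h)  # left edge, y from 1 to 0

print(interior, boundary)
```

Working the two integrals by hand for this choice of $\omega$ gives 1 on both sides.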
20.9.3 Remark: Theorem 20.9.2 is probably the deepest and most important statement in the differential
layer of differential geometry. Although it is stated here in Euclidean space, the differential layer of a
differentiable manifold is merely a topological generalization of Euclidean space, but is indistinguishable
locally. So the theorem is valid also on general differentiable manifolds.
Theorem 20.9.2 combines multi-variable differentiation, geometric measure theory and algebraic topology in
a single statement. It gives the motivation for much of the differential layer and is the basis of the important
facts about the higher layers. It is a combination of local and global concepts. The Stokes Theorem is
important enough to deserve its own chapter!
20.10. Radon measures

Radon measures are particularly useful for the study of hyperbolic first order systems of partial differential
equations.
[ Radon measures are very important in the study of various dynamic systems and some areas of probability
theory. Measures in general, and Radon measures in particular, have a non-trivial extension to differentiable
manifolds. Radon measures are defined as duals of C 0 function spaces. ]
[ Must also have a section or chapter on the use of Radon measures in the solution of a wide variety of
hyperbolic first-order partial differential equations, including certain “fluid flow models” which are useful in
some teletraffic research. This might be best placed in another book. ]

20.11. Some integrability-based function spaces

20.11.1 Notation: Let $n \in \mathbb{Z}^+$, $\Omega \in \mathrm{Top}(\mathbb{R}^n)$, $k \in \mathbb{Z}^+_0$ and $p \in [1, \infty]$. Then $W^{k,p}(\Omega)$ denotes. . .
[ See Adams [93] for Sobolev space definitions. ]
20.12. Logarithmic and exponential functions

In terms of Taylor series, the exponential function is more natural than the logarithm function. But in terms
of integrals, the logarithm is more natural. The exponential is not an integral of some simpler function such
as a quotient of polynomials, whereas the logarithm arises naturally as the integral of x−1 . Therefore in this
section, the exponential is defined in terms of the logarithm.
20.12.1 Definition: The logarithm function is the function $\ln : (0, \infty) \to \mathbb{R}$ defined by
$$\ln(x) = \int_1^x t^{-1}\, dt$$
for $x \in \mathbb{R}$ with $x > 0$.
20.12.2 Definition: The exponential function is the function exp : IR → IR defined as the inverse of the
logarithm function.
20.12.3 Remark: Definition 20.12.2 means that $\forall x \in \mathbb{R}$, $\ln(\exp(x)) = x$. In other words, $\ln \circ \exp = \mathrm{id}_{\mathbb{R}}$.
For each $x \in \mathbb{R}$, the equation $\ln(y) = x$ has one and only one solution $y \in (0, \infty)$ because the logarithm
function is one-to-one and its range is $\mathbb{R}$. Equivalently, exp may be expressed as the left inverse of ln. That is,
$\exp \circ \ln = \mathrm{id}_{(0,\infty)}$. In other words, $\forall x \in (0, \infty)$, $\exp(\ln(x)) = x$.
It may seem a little troubling that such a basic function as the logarithm is defined as an integral. Integrals
are cumbersome to calculate in practice. Defining the exponential function as the inverse of an integral
means that it is defined as the solution of an integral equation. Thus $y = \exp(x)$ is defined as the solution
of $\int_1^y t^{-1}\, dt = x$. The logarithm and exponential functions are often defined in terms of Taylor series which
are better suited to computers which primarily offer addition and multiplication operations.
As always, one’s choice of definition can be optimized for a given range of applications. Integrals have
some advantages for defining transcendental functions. Integrals don’t require convergence tests to ensure
that they are well-defined. It is easier to show that integral-defined functions are solutions to differential
equations. So for many analysis purposes, integral definitions are better. Series expansions are usually quite
easy to derive from integral definitions.
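Definitions 20.12.1 and 20.12.2 can be tried out directly on a computer. The following Python sketch is an illustration under stated assumptions rather than a recommended algorithm: the function names are hypothetical, the integral is approximated by composite Simpson's rule, and the integral equation $\int_1^y t^{-1}\,dt = x$ is solved for $y$ by bisection, which works because the logarithm is increasing.

```python
import math

def ln_integral(x, n=2000):
    """Approximate ln(x) as the integral of 1/t from 1 to x (Simpson, n even)."""
    if x <= 0.0:
        raise ValueError("ln is defined only for x > 0")
    h = (x - 1.0) / n              # negative when x < 1, giving a signed integral
    total = 1.0 + 1.0 / x          # endpoint values of 1/t
    for k in range(1, n):
        weight = 4.0 if k % 2 else 2.0
        total += weight / (1.0 + k * h)
    return total * h / 3.0

def exp_inverse(x, iterations=60):
    """Approximate exp(x) as the solution y of ln_integral(y) = x."""
    lo = hi = 1.0
    while ln_integral(hi) < x:     # widen the bracket upwards
        hi *= 2.0
    while ln_integral(lo) > x:     # widen the bracket downwards
        lo /= 2.0
    for _ in range(iterations):    # bisection on the increasing function ln
        mid = 0.5 * (lo + hi)
        if ln_integral(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(ln_integral(2.0), math.log(2.0))
print(exp_inverse(1.0), math.e)
```

The comparison with `math.log` and `math.e` is only a cross-check against the library values. The accuracy of the Simpson approximation degrades for arguments very close to zero, so the sketch is intended for moderate values of $x$.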
20.12.4 Theorem: $\forall x \in \mathbb{R}$, $\frac{d}{dx} \exp(x) = \exp(x)$.
Proof: By Definition 20.12.2, the exponential function satisfies the equation $\ln(\exp(x)) = x$ for all $x \in \mathbb{R}$.
By Definition 20.12.1 and the chain rule for differentiation, $\exp(x)^{-1} \frac{d}{dx} \exp(x) = 1$. Therefore
$\frac{d}{dx} \exp(x) = \exp(x)$ as claimed.
20.12.5 Theorem: The function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} 0 & x \le 0 \\ \exp(-x^{-1}) & x > 0 \end{cases}$$
is a $C^\infty$ function on $\mathbb{R}$. (See Figure 20.12.1.)
Figure 20.12.1 $C^\infty$ functions $f(x) = \exp(-x^{-1})$ and $g_R(x) = \exp((x^2 - R^2)^{-1})$; $R = 1$

20.12.6 Theorem: The function $g_R : \mathbb{R}^n \to \mathbb{R}$ defined for $R > 0$ by
$$g_R(x) = \begin{cases} \exp\bigl((x^2 - R^2)^{-1}\bigr) & |x| < R \\ 0 & |x| \ge R \end{cases}$$
is a $C^\infty$ function on $\mathbb{R}^n$. (See Figure 20.12.1.)
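20.12.7 Theorem: The function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} 0 & x \le 0 \\ \bigl(1 + \exp(x^{-1} - (1 - x)^{-1})\bigr)^{-1} & x \in (0, 1) \\ 1 & x \ge 1 \end{cases}$$
is a $C^\infty$ function on $\mathbb{R}$. (See Figure 20.12.2.)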
Figure 20.12.2 $C^\infty$ function which is constant outside [0, 1] (curves shown: $f(x)$ and $1 - f(x)$)
20.12.8 Remark: Theorem 20.12.7 was arrived at by first finding a function $\tanh(x)$ which is $C^\infty$ and
bounded between two finite values, and then finding another function $(1 - x)^{-1} - x^{-1}$ which maps the
finite interval $(0, 1)$ to the doubly infinite interval $(-\infty, \infty)$. When these two functions are composed, the
result is a function which has the desired properties but whose range lies in the interval $[-1, 1]$. This was
adjusted by noting that $(\tanh(x/2) + 1)/2 = (1 + e^{-x})^{-1}$. (A very similar function construction is described
in Warner [50], Lemma 1.10, page 10.)
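The construction in Remark 20.12.8 can be followed step by step in code. In the sketch below (hypothetical names, not taken from the text) the step function of Theorem 20.12.7 is evaluated through the tanh form, and compared with the direct formula at points where the exponential does not overflow.

```python
import math

def step_via_tanh(x):
    """Step function of Theorem 20.12.7, written as (tanh(y/2) + 1)/2
    with y = (1 - x)^-1 - x^-1 on the interval (0, 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    y = 1.0 / (1.0 - x) - 1.0 / x
    return 0.5 * (math.tanh(0.5 * y) + 1.0)

def step_direct(x):
    """Direct formula (1 + exp(x^-1 - (1 - x)^-1))^-1 for x in (0, 1).
    math.exp can overflow for x very close to 0, so this version is
    only used here for moderate x."""
    return 1.0 / (1.0 + math.exp(1.0 / x - 1.0 / (1.0 - x)))

for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(x, step_via_tanh(x), step_direct(x))
```

One practical reason to prefer the tanh form in floating-point arithmetic is that `math.tanh` saturates at ±1 for large arguments, whereas the exponential in the direct formula grows without bound as $x \to 0^+$.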
20.12.9 Theorem: The function $g_{r,R} : \mathbb{R}^n \to \mathbb{R}$ defined for $n \in \mathbb{Z}^+$ and $r, R \in \mathbb{R}$ with $0 \le r < R$ by
$$g_{r,R}(x) = \begin{cases} 1 & |x| \le r \\ \bigl(1 + \exp((R - |x|)^{-1} - (|x| - r)^{-1})\bigr)^{-1} & |x| \in (r, R) \\ 0 & |x| \ge R \end{cases}$$
is a $C^\infty$ function on $\mathbb{R}^n$ which is zero outside $B_{0,R}$ and equal to 1 inside $B_{0,r}$. (See Figure 20.12.3.)
Figure 20.12.3 $C^\infty$ function which is zero outside $B_{0,R}$; cross-section $x_2, \ldots, x_n = 0$; $r = 1$, $R = 2$

20.12.10 Remark: Theorem 20.12.9 is useful for constructing $C^\infty$ functions with compact support with
prescribed properties within a given region. For example, if the function $g_{r,R}$ is multiplied by a general
polynomial $P(x) = \sum_{\alpha \in \omega^n} c_\alpha \prod_{i=1}^n x_i^{\alpha_i}$ in $n$ variables, then all of the derivatives of the pointwise product
function $P.g_{r,R}$ are arbitrarily determined at $x = 0$ by the choice of coefficients $c_\alpha$.
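A one-dimensional illustration of this remark is sketched below. It assumes $n = 1$, $r = 1$, $R = 2$ and an arbitrarily chosen polynomial $P(x) = 3 + 2x - x^2$; the names are hypothetical. Since $g_{r,R}$ equals 1 near the origin when $r > 0$, the difference quotients of the product at $x = 0$ coincide with those of $P$, while the product vanishes outside the ball of radius $R$.

```python
import math

def cutoff(x, r=1.0, R=2.0):
    """One-dimensional g_{r,R}: 1 for |x| <= r, 0 for |x| >= R."""
    a = abs(x)
    if a <= r:
        return 1.0
    if a >= R:
        return 0.0
    t = 1.0 / (R - a) - 1.0 / (a - r)
    # (1 + exp(t))^-1 rewritten with tanh so that large |t| cannot overflow.
    return 0.5 * (1.0 - math.tanh(0.5 * t))

def poly(x):
    """Hypothetical polynomial P(x) = 3 + 2x - x^2."""
    return 3.0 + 2.0 * x - x * x

def product(x):
    """Pointwise product P.g_{r,R}."""
    return poly(x) * cutoff(x)

h = 1e-3
first = (product(h) - product(-h)) / (2.0 * h)                   # approximates P'(0) = 2
second = (product(h) - 2.0 * product(0.0) + product(-h)) / (h * h)  # approximates P''(0) = -2

print(product(0.0), first, second, product(2.5))
```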
20.13. Trigonometric functions

Trigonometric functions are needed for the study of spheres, which provide important examples of most
things in differential geometry.
20.13.1 Remark: In terms of Taylor series, the trigonometric functions sin, cos and tan are more natural
than the inverse trigonometric functions. But in terms of integrals, the inverse functions are more natural.
The sin, cos and tan functions cannot be constructed as integrals of simpler functions such as algebraic
functions. The inverse trigonometric functions do arise naturally as integrals of simple algebraic functions.
Therefore in this section, the trigonometric functions are defined in terms of their inverses.
[ Present all known properties of the trigonometric functions. For the trig functions, see CRC [156], pages A-2
to A-7. Also see CRC [100], pages 133–148. Also possibly slightly useful is Reinhardt [135], volume 1,
pages 178–181 . See also Gradstein/Ryzhik [113], pages 50–80, and Spiegel [143], pages 11–20. ]
20.13.2 Definition: The one-argument inverse trigonometric functions are defined as follows.
$$\forall x \in \mathbb{R}, \quad \arctan(x) = \int_0^x (1 + t^2)^{-1}\, dt$$
$$\forall x \in [-1, 1], \quad \arcsin(x) = \int_0^x (1 - t^2)^{-1/2}\, dt$$
$$\forall x \in [-1, 1], \quad \arccos(x) = \int_x^1 (1 - t^2)^{-1/2}\, dt$$
20.13.3 Remark: The inverse trigonometric functions are illustrated in Figure 20.13.1. It is clear from
the definitions that these functions are one-to-one. The arctangent, arcsine and arccosine functions are often
abbreviated to atan, asin and acos respectively.
Figure 20.13.1 The atan, asin and acos functions
20.13.4 Definition: π = 4 arctan(1).

20.13.5 Remark: The number $\pi$ in Definition 20.13.4 also satisfies $\pi = 2 \arctan(\infty) = \int_{-\infty}^{\infty} (1 + t^2)^{-1}\, dt$.
Then $\arccos(x) = \pi/2 - \arcsin(x)$ for all $x \in [-1, 1]$.
Note also that $\pi = 2 \arcsin(1) = \arccos(-1) = \int_{-1}^{1} (1 - t^2)^{-1/2}\, dt$. [ There are a zillion such formulas for π. ]
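Definition 20.13.4 can be evaluated numerically from the integral definition of the arctangent. The sketch below uses composite Simpson's rule, which is an illustrative choice and not part of the text; the function name is hypothetical.

```python
import math

def arctan_integral(x, n=1000):
    """Approximate arctan(x) as the integral of (1 + t^2)^-1 from 0 to x
    using composite Simpson's rule with n (even) subintervals."""
    h = x / n
    total = 1.0 + 1.0 / (1.0 + x * x)      # integrand at the two endpoints
    for k in range(1, n):
        t = k * h
        weight = 4.0 if k % 2 else 2.0
        total += weight / (1.0 + t * t)
    return total * h / 3.0

pi_estimate = 4.0 * arctan_integral(1.0)
print(pi_estimate, math.pi)
```

The arctangent integrand is bounded and smooth on $[0, 1]$, which makes this formula much easier to approximate than the arcsine and arccosine integrals, whose integrands are unbounded at $t = \pm 1$.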

20.13.6 Definition: A two-parameter arctangent function $\arctan : \mathbb{R}^2 \to (-\pi, \pi]$ may be defined in terms
of the standard single-parameter version as follows.
$$\forall (x, y) \in \mathbb{R}^2, \quad \arctan(x, y) = \begin{cases}
\arctan(y/x) & \text{if } x > 0 \\
\pi + \arctan(y/x) & \text{if } x < 0 \text{ and } y \ge 0 \\
-\pi + \arctan(y/x) & \text{if } x < 0 \text{ and } y < 0 \\
\pi/2 & \text{if } x = 0 \text{ and } y > 0 \\
-\pi/2 & \text{if } x = 0 \text{ and } y < 0 \\
0 & \text{if } x = y = 0.
\end{cases}$$
[ See Crampin/Pirani [12], page 41, for this arctangent function. ]
[ Define two-parameter versions of arcsin, arccos etc. also. ]
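The case analysis of Definition 20.13.6 translates directly into code. In the Python sketch below the function name `arctan2` is hypothetical. Note the argument order: the definition takes $(x, y)$, whereas the standard library function `math.atan2` takes $(y, x)$, so the comparison is with `math.atan2(y, x)`.

```python
import math

def arctan2(x, y):
    """Two-parameter arctangent of Definition 20.13.6, argument order (x, y)."""
    if x > 0:
        return math.atan(y / x)
    if x < 0 and y >= 0:
        return math.pi + math.atan(y / x)
    if x < 0 and y < 0:
        return -math.pi + math.atan(y / x)
    if y > 0:                      # remaining cases all have x == 0
        return math.pi / 2
    if y < 0:
        return -math.pi / 2
    return 0.0

grid = (-2.0, -1.0, 0.0, 1.0, 2.0)
for x in grid:
    for y in grid:
        assert abs(arctan2(x, y) - math.atan2(y, x)) < 1e-12
print("definition agrees with math.atan2(y, x) on the sample grid")
```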
20.13.7 Remark: The 2-parameter arctan function seems to be the best basis for deriving the other
trigonometric functions. For instance,
$$\forall x \in [-1, 1], \quad \arcsin(x) = \arctan\bigl(\sqrt{1 - x^2},\, x\bigr)$$
$$\forall x \in [-1, 1], \quad \arccos(x) = \arctan\bigl(x,\, \sqrt{1 - x^2}\bigr).$$
The 2-parameter arctan function satisfies the following.
$$\forall (x, y) \in \mathbb{R}^2, \quad \arctan(x, y) = \begin{cases}
\arccos\bigl(x (x^2 + y^2)^{-1/2}\bigr) & x^2 + y^2 > 0,\ y \ge 0 \\
-\arccos\bigl(x (x^2 + y^2)^{-1/2}\bigr) & x^2 + y^2 > 0,\ y < 0 \\
0 & x = y = 0.
\end{cases}$$
[ Here define sin, cos and tan in terms of the inverse functions by way of second-order ODEs or first-order
systems. Note that these abbreviated notations seem to be due to Euler (1748). See EDM2 [35], 432.C. ]
20.13.8 Remark: Definition 20.13.9 expresses the sine, cosine and tangent functions in terms of the arcsin,
arccos and arctan functions. The sawtooth functions used in this definition are discussed in Remark 8.6.19.
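20.13.9 Definition: The functions sin, cos and tan are defined as follows.
$$\forall x \in \mathbb{R}, \quad \cos(x) = \arccos^{-1}\bigl(|(x + \pi) \bmod 2\pi - \pi|\bigr)$$
$$\forall x \in \mathbb{R}, \quad \sin(x) = \arcsin^{-1}\bigl(|(x + 3\pi/2) \bmod 2\pi - \pi| - \pi/2\bigr).$$
$$\forall x \in \mathbb{R} \setminus \{(k + 1/2)\pi;\ k \in \mathbb{Z}\}, \quad \tan(x) = \arctan^{-1}\bigl((x + \pi/2) \bmod \pi - \pi/2\bigr).$$
In each case the sawtooth expression takes values in the range of the corresponding inverse function, namely
$[0, \pi]$ for arccos, $[-\pi/2, \pi/2]$ for arcsin and $(-\pi/2, \pi/2)$ for arctan.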
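The sawtooth reductions can be checked against the library trigonometric functions. The sketch below is only a consistency check with hypothetical names: it uses the reduction $|(x + \pi) \bmod 2\pi - \pi|$ for the cosine, which maps every real number into $[0, \pi]$, and it lets `math.cos`, `math.sin` and `math.tan` on the reduced arguments stand in for the inverses of arccos, arcsin and arctan.

```python
import math

def cos_argument(x):
    """Sawtooth with values in [0, pi], the range of arccos."""
    return abs((x + math.pi) % (2.0 * math.pi) - math.pi)

def sin_argument(x):
    """Sawtooth with values in [-pi/2, pi/2], the range of arcsin."""
    return abs((x + 1.5 * math.pi) % (2.0 * math.pi) - math.pi) - 0.5 * math.pi

def tan_argument(x):
    """Sawtooth with values in [-pi/2, pi/2), containing the range of arctan."""
    return (x + 0.5 * math.pi) % math.pi - 0.5 * math.pi

# Sample points are multiples of 0.25, which stay away from the poles of tan.
samples = [0.25 * k for k in range(-40, 41)]
for x in samples:
    assert abs(math.cos(cos_argument(x)) - math.cos(x)) < 1e-12
    assert abs(math.sin(sin_argument(x)) - math.sin(x)) < 1e-12
    assert abs(math.tan(tan_argument(x)) - math.tan(x)) < 1e-9
print("sawtooth reductions reproduce cos, sin and tan on the samples")
```

Python's `%` operator returns a non-negative result for a positive modulus, which matches the usual mathematical meaning of mod for negative $x$.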
20.13.10 Theorem: The sum and difference rules are:
∀a, b ∈ IR, sin(a + b) = sin a cos b + cos a sin b
∀a, b ∈ IR, cos(a + b) = cos a cos b − sin a sin b.
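20.13.11 Theorem: The product rules are:
$$\forall a, b \in \mathbb{R}, \quad \sin a \sin b = \tfrac{1}{2}\bigl(\cos(a - b) - \cos(a + b)\bigr)$$
$$\forall a, b \in \mathbb{R}, \quad \cos a \cos b = \tfrac{1}{2}\bigl(\cos(a - b) + \cos(a + b)\bigr)$$
$$\forall a, b \in \mathbb{R}, \quad \sin a \cos b = \tfrac{1}{2}\bigl(\sin(a + b) + \sin(a - b)\bigr)$$
$$\forall a, b \in \mathbb{R}, \quad \cos a \sin b = \tfrac{1}{2}\bigl(\sin(a + b) - \sin(a - b)\bigr).$$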
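20.13.12 Theorem: The double-angle rules are:
$$\begin{aligned}
\forall \theta \in \mathbb{R}, \quad \cos 2\theta &= \cos^2 \theta - \sin^2 \theta \\
&= 2 \cos^2 \theta - 1 \\
&= 1 - 2 \sin^2 \theta.
\end{aligned}$$
$$\forall \theta \in \mathbb{R}, \quad \sin 2\theta = 2 \sin \theta \cos \theta.$$
20.13.13 Theorem: The half-angle rules are:
$$\forall \theta \in \mathbb{R}, \quad \sin^2 \tfrac{1}{2}\theta = \tfrac{1}{2}(1 - \cos \theta)$$
$$\forall \theta \in \mathbb{R}, \quad \cos^2 \tfrac{1}{2}\theta = \tfrac{1}{2}(1 + \cos \theta).$$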
and so forth. . .

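20.13.14 Theorem: Some rules relating the trigonometric functions to each other are:
$$\begin{aligned}
\forall \theta \in [0, \pi/2), \quad \sin \theta &= (1 - \cos^2 \theta)^{1/2} \\
&= \tan \theta\, (1 + \tan^2 \theta)^{-1/2} \\
&= (\operatorname{cosec} \theta)^{-1} \\
&= (\sec \theta)^{-1} (\sec^2 \theta - 1)^{1/2} \\
&= (1 + \cot^2 \theta)^{-1/2}
\end{aligned}$$
$$\forall \theta \in \mathbb{R}, \quad \cos \theta = \begin{cases}
(1 + \tan^2 \theta)^{-1/2} & \text{if } \theta \bmod 2\pi \in [0, \pi/2) \cup (3\pi/2, 2\pi) \\
-(1 + \tan^2 \theta)^{-1/2} & \text{if } \theta \bmod 2\pi \in (\pi/2, 3\pi/2) \\
0 & \text{if } \theta \bmod 2\pi \in \{\pi/2, 3\pi/2\}.
\end{cases}$$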
20.13.15 Theorem: Some useful combinations are:
$$\forall \theta \in (-\pi/2, \pi/2), \quad \sin \theta \cos \theta = \tan \theta\, (1 + \tan^2 \theta)^{-1}$$
and so forth. . .
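20.13.16 Theorem: The relations between trigonometric functions can also be expressed as follows:
$$\forall x \in \mathbb{R}, \quad \sin(\arctan x) = x (1 + x^2)^{-1/2}$$
$$\forall x \in \mathbb{R}, \quad \cos(\arctan x) = (1 + x^2)^{-1/2}$$
(The signs are determined because $\arctan x \in (-\pi/2, \pi/2)$, where the cosine is positive.)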
and so forth. . .
20.13.17 Theorem: Some useful relations between the inverse trigonometric functions are:
$$\forall x \in [0, 1], \quad \arcsin(x) = \tfrac{1}{2} \arccos(1 - 2x^2)$$
$$\forall x \in [0, 1], \quad \arcsin(x^{1/2}) = \tfrac{1}{2} \arccos(1 - 2x)$$
and so forth. . .
20.13.18 Theorem: The angle translation rules are:
∀θ ∈ IR, cos θ = sin(θ + π/2)
and so forth. . .
20.13.19 Theorem: The derivatives of the trigonometric functions are:
$$\forall \theta \in \mathbb{R} \setminus (\pi/2 + \pi\mathbb{Z}), \quad \frac{d}{d\theta} \tan \theta = \sec^2 \theta$$
and so forth. . .
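20.13.20 Theorem: The derivatives of the inverse trigonometric functions are:
$$\forall x \in (-1, 1), \quad \frac{d}{dx} \arccos x = -(1 - x^2)^{-1/2}$$
$$\forall x \in \mathbb{R} \setminus \{0\},\ \forall y \in \mathbb{R}, \quad \frac{\partial}{\partial x} \arctan(x, y) = -y/(x^2 + y^2)$$
$$\forall x \in \mathbb{R} \setminus \{0\},\ \forall y \in \mathbb{R}, \quad \frac{\partial}{\partial y} \arctan(x, y) = x/(x^2 + y^2).$$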
and so forth. . .
[ Must check the quantifiers in Theorem 20.13.20. ]

20.13.21 Theorem: The equation $a \cos \theta + b \sin \theta = c$ for $\theta \in \mathbb{R}$, $(a, b) \ne (0, 0)$ and $c^2 \le a^2 + b^2$ has the
solutions $\theta = \arctan(a, b) \pm \arccos(c (a^2 + b^2)^{-1/2}) + 2n\pi$ for $n \in \mathbb{Z}$.
Proof: This follows from the formula $a \cos \theta + b \sin \theta = \sqrt{a^2 + b^2}\, \cos(\theta - \arctan(a, b))$.
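The solution formula of Theorem 20.13.21 can be tested by substituting the computed angles back into the equation. In the sketch below the function name is hypothetical, and the two-parameter $\arctan(a, b)$ of Definition 20.13.6 is evaluated as `math.atan2(b, a)` because the library function takes its arguments in the opposite order.

```python
import math

def solve_linear_trig(a, b, c, n=0):
    """Return the two solutions theta of a*cos(theta) + b*sin(theta) = c
    for the given integer n, following Theorem 20.13.21."""
    rho = math.hypot(a, b)                 # sqrt(a^2 + b^2)
    if rho == 0.0:
        raise ValueError("(a, b) must not be (0, 0)")
    ratio = c / rho
    if abs(ratio) > 1.0:
        raise ValueError("need c^2 <= a^2 + b^2")
    base = math.atan2(b, a)                # arctan(a, b) in the text's argument order
    delta = math.acos(ratio)
    shift = 2.0 * math.pi * n
    return (base + delta + shift, base - delta + shift)

for theta in solve_linear_trig(3.0, 4.0, 2.5):
    print(theta, 3.0 * math.cos(theta) + 4.0 * math.sin(theta))
```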
20.13.22 Remark: It may seem a little ludicrous to define the inverse trigonometric functions first and
then patch them together to create the familiar trigonometric functions sin, cos and tan. However, Bell [191],
page 324, said the following in 1937.
Now, in the integral calculus, the inverse trigonometric functions present themselves naturally as
definite integrals of simple algebraic irrationalities (second degree); such integrals appear when we
seek to find the length of an arc of a circle by means of the integral calculus. Suppose the inverse
trigonometric functions had first presented themselves this way. Would it not have been “more
natural” to consider the inverses of these functions, that is, the familiar trigonometric functions
themselves as the given functions to be studied and analyzed? Undoubtedly; but in shoals of more
advanced problems, the simplest of which is that of finding the length of the arc of an ellipse
by the integral calculus, the awkward inverse “elliptic” (not “circular,” as for the arc of a circle)
functions presented themselves first. It took Abel to see that these functions should be “inverted”
and studied, precisely as in the case of sin x, cos x instead of sin−1 x, cos−1 x. Simple, was it not?
Yet Legendre, a great mathematician, spent more than forty years over his “elliptic integrals” (the
awkward “inverse functions” of his problem) without ever once suspecting that he should invert.
This extremely simple, uncommonsensical way of looking at an apparently simple but profoundly
recondite problem was one of the greatest mathematical advances of the nineteenth century.
Well, maybe it wasn’t so great an advance as that. But the importance of transforming problems to facilitate
their solution is clear.

Chapter 21
Differential equations
21.1 Ordinary differential equations
21.2 Systems of linear second-order ODEs
21.3 Boundary value problems
21.4 Initial value problems
21.5 Calculus of variations
21.6 ODEs for defining exponential and trigonometric functions
21.7 Taylor series and exponentials of matrices
21.0.1 Remark: The subject of differential equations, both ordinary and partial, may be regarded as an
extension of the subject of measure and integration.
The fundamental theorem of calculus states that the simple differential equation $\forall t \in \mathbb{R}$, $F'(t) = f(t)$ has
solutions of the form $\forall t \in \mathbb{R}$, $F(t) = \int^t f(s)\, ds$. The integral on the real number line may thus be regarded
as a method of solution of a simple class of differential equations. In other words, the FTOC states that the
integral and the anti-derivative are the same thing.
The subject of differential equations may be regarded as a generalization of the simple differential equation
$F'(t) = f(t)$. The solutions of the more general differential equations may be regarded as integrals in
some sense. In other words, solutions of general differential equations may be regarded as generalized
“anti-derivatives”. Therefore it is not surprising that integration is an important tool in the solution of differential
equations. Consequently, the treatment of differential equations comes logically after the presentation of
measure and integration.
21.0.2 Remark:
DE is an abbreviation for “differential equation(s)”.
ODE is an abbreviation for “ordinary differential equation(s)”.
PDE is an abbreviation for “partial differential equation(s)”.
PDO is an abbreviation for “partial differential operator(s)”.
BVP is an abbreviation for “boundary value problem(s)”.
IVP is an abbreviation for “initial value problem(s)”.
It is customary to append an “s” to the above abbreviations to indicate the plural because this “s” is often
present in the spoken language. However, the suffix “s” is optional. Thus “partial differential equations”
may be abbreviated to either PDE or PDEs. The abbreviation PDO is not very common.
21.0.3 Remark: Ordinary differential equations are expressed in terms of a single independent variable.
Partial differential equations are expressed in terms of any number of independent variables. Therefore
ordinary differential equations are a special case of partial differential equations. When the number of
dependent variables is more than one, the equations are referred to as a “system” of equations. This basic
classification of differential equations is illustrated in Figure 21.0.1.
Figure 21.0.1 Basic classification of differential equations (boxes: ordinary differential equation; partial differential equation; system of ordinary differential equations; system of partial differential equations)
21.0.4 Remark: The vast majority of laws and models of physics are expressed as differential equations.
So most of the mathematical work in physics consists of solving differential equations. For example, Einstein’s
gravity equations are a system of partial differential equations. Newton’s gravity law applied to a single object
falling vertically is a single ordinary differential equation. Bell [191], pages 103–104, says the following.
The great majority of the important equations of mathematical physics are partial differential
equations.
21.0.5 Remark: The subject of differential equations is truly vast. This is not surprising because of the
very broad applicability to all of the sciences, technology and engineering. But there is a second reason
for the vastness of the mathematical subject of differential equations. Although most laws and models in
applications are expressed as DEs, most of the work lies in the solution of these equations. Solving DEs
turns out to be a much deeper subject than the solution of algebraic equations. For example, it is very rare
for PDEs (in more than one variable) to have explicit solutions. Generally the best that can be achieved is
the development of approximation techniques together with an analysis of error bounds. Since typically one
can never “see” the solutions, it is necessary to develop tools for the analysis of properties of PDE solutions
in terms of the specifications of models without seeing the solutions themselves. Thus, for example, there
is a very large literature concerned with merely demonstrating the existence and regularity of solutions of
PDEs.
Most of the vastness of the PDE subject is due to the difficulty of solving the equations. The mere formulation
of PDE models is not so difficult. Therefore luckily this book does not need to provide a complete summary
of PDE solution techniques as a prerequisite for differential geometry. Such a summary of PDE solution
techniques would be on an encyclopedic scale. Nevertheless some basic PDE analysis techniques such as
maximum principles are presented here. (Recall that this is a definitions book, not a theorems and techniques
book. So this book is concerned only with the formulation of models, not the methods of solution. The
author’s project to include all DG prerequisites is only feasible because this is merely a definitions book.)
21.0.6 Remark: People who study differential equations only so that they can solve them in applications
often find incomprehensible the huge research effort devoted to proving merely the existence of solutions.
Bell [191], page 528, makes the following comment on this issue.
Of what immediate use is it to a working physicist to know that a particular differential equation
occurring in his work is solvable, because some pure mathematician has proved that it is, when
neither he nor the mathematician can perform the Herculean labor demanded by a numerical
solution capable of application to specific problems?
However, the methods of existence proof often strongly suggest methods of solution. They also give a-priori
bounds which are useful for checking that numerical approximations are credible. Knowing the space in
which a solution exists is important for determining the representation framework to be used for numerical
approximations. And when serious difficulties arise in finding solutions, it is reassuring to know that the
reason for the difficulties is not the non-existence or non-uniqueness of solutions.
21.0.7 Remark: Differential geometry generalizes the flat Euclidean space (for the independent variables)
of the classical PDE literature of the 19th century to various classes of curved spaces. Thus PDEs in a
curved space constitute an extension of the already difficult subject of flat-space PDEs. This, however, is
the framework for models in general relativity and numerous other areas of physics. This helps to explain
why DG is such a difficult subject.

21.1. Ordinary differential equations

[ This section deals with existence and uniqueness for ODEs. This is required for constructing geodesic curves
from connections. Curves are maps γ : I → X from a totally ordered set I to a set X. A path is defined as
an equivalence class of curves. ]
21.1.1 Remark: One of the most important tools in ordinary and partial differential equations is the
“maximum principle”. The purpose of maximum principles is to determine bounds on functions which
satisfy ODEs and PDEs. Such bounds are very useful for proving existence, uniqueness and regularity,
which are the three primary tasks to be carried out for any class of ODEs or PDEs.
21.1.2 Remark: Many questions in PDE can be reduced to ODE questions. So it is important to understand
ODE before studying PDE. A particular example of this general idea is the topic of maximum
principles for elliptic second-order differential operators. This topic is fairly simple to deal with in a single
independent variable. A maximum principle for an ODE might say, for example, that if a real-valued function
$u \in C^0(\bar{I}) \cap C^2(I)$ for a bounded open interval $I \subseteq \mathbb{R}$ satisfies $a(x)u''(x) + b(x)u'(x) = 0$ for $x \in I$,
then $u$ has no interior minimum or maximum in $I$, under suitable conditions on $a$ and $b$. Typically it will
be assumed that $a(x) > 0$ for all $x \in I$. So the equation can be normalized so that $a(x) = 1$ for all $x \in I$.
An interesting question is then to ask what conditions on $b$ will prevent an interior minimum or maximum.
As an example, let $u(x) = \exp(-x^{-2}/2)$ for $x \ne 0$ and $u(0) = 0$. Then $u \in C^\infty(\mathbb{R})$ and
$$u''(x) + b(x)u'(x) = \begin{cases} x^{-6}\bigl(1 - 3x^2 + b(x)x^3\bigr) u(x) & \text{for } x \ne 0 \\ 0 & \text{for } x = 0. \end{cases}$$
This equals zero for all $x \in \mathbb{R}$ if
$$b(x) = \begin{cases} (3x^2 - 1)/x^3 & \text{for } x \ne 0 \\ 0 & \text{for } x = 0. \end{cases}$$
Hence $u''(x) + b(x)u'(x) = 0$ for all $x \in \mathbb{R}$ for this choice of $b$. (See Figure 21.1.1.)
Figure 21.1.1 $C^\infty$ counterexample for naive maximum principle (panels: $u(x) = \exp(-x^{-2}/2)$ and $b(x) = -u''(x)/u'(x) = (3x^2 - 1)/x^3$, with $u''(x) + b(x)u'(x) = 0$)
But the function $u$ has a local minimum at $x = 0$. Similarly, $-u$ satisfies the same equation and has a local
maximum at $x = 0$. So clearly there is no maximum principle in this case. This is perhaps a little disturbing
because the operator $L = \partial_x^2 + b(x)\partial_x$ is clearly uniformly elliptic.
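The counterexample can be checked numerically. The sketch below (hypothetical names, central differences with an arbitrarily chosen step) evaluates the residual $u''(x) + b(x)u'(x)$ at a few points away from the origin, and also shows that $u$ has its minimum value 0 at $x = 0$ while $b$ is unbounded near the origin.

```python
import math

def u(x):
    """u(x) = exp(-x^-2 / 2) for x != 0, and u(0) = 0."""
    return math.exp(-0.5 / (x * x)) if x != 0.0 else 0.0

def b(x):
    """b(x) = (3x^2 - 1)/x^3 for x != 0, and b(0) = 0."""
    return (3.0 * x * x - 1.0) / x ** 3 if x != 0.0 else 0.0

def residual(x, h=1e-4):
    """Finite-difference approximation of u''(x) + b(x) u'(x)."""
    du = (u(x + h) - u(x - h)) / (2.0 * h)
    d2u = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return d2u + b(x) * du

for x in (-0.8, 0.5, 1.0, 2.0):
    print(x, residual(x))          # close to zero, up to discretization error

print(u(0.0), u(0.3), u(-0.3))     # interior minimum at x = 0
print(b(0.01))                     # large in magnitude: b is unbounded near 0
```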
The missing ingredient in this maximum principle is a bound on the first-order coefficient b. If b is bounded,
Theorem 21.1.3 is obtained.
21.1.3 Theorem: Let $I$ be a non-empty bounded open real-number interval. Let $u \in C^0(\bar{I}) \cap C^2(I)$ be a
real-valued function on $I$ which satisfies $Lu(x) = u''(x) + b(x)u'(x) = 0$ in $I$, where $b : I \to \mathbb{R}$ is a bounded
function on $I$. Then $\sup_I(u) = \sup_{\partial I}(u)$ and $\inf_I(u) = \inf_{\partial I}(u)$.

Proof: Let $k_b = \sup_I(|b|)$. Define $v \in C^0(\bar{I}) \cap C^2(I)$ by $v(x) = \exp((k_b + 1)x)$ for $x \in \bar{I}$. Then
$Lv(x) = (k_b + 1)(k_b + 1 + b(x))v(x) \ge (k_b + 1)v(x) > 0$ for all $x \in I$. Let $w = u + \alpha v$ for $\alpha \in \mathbb{R}$. Then
for $\alpha > 0$, $Lw(x) = Lu(x) + \alpha Lv(x) > 0$ for all $x \in I$. Suppose $x \in I$ is a local maximum of $w$. Then
$w'(x) = 0$ and $w''(x) \le 0$. So $Lw(x) = w''(x) \le 0$. This is a contradiction. So $w$ has no local maximum in $I$. Therefore
$\sup_I(w) = \sup_{\partial I}(w)$. But $u$ is the uniform limit of $w$ as $\alpha \to 0^+$. So $\sup_I(u) = \sup_{\partial I}(u)$. Negating $u$ gives
$\inf_I(u) = \inf_{\partial I}(u)$.
21.1.4 Remark: A merely $C^2$ function $u$ which is also a counterexample to a naive maximum principle
(with unbounded coefficient of the first-order derivative) is the function $u(x) = |x|^k$ for $x \in \mathbb{R}$ and $k > 2$.
In this case, the equation $u''(x) + b(x)u'(x) = 0$ is satisfied for $b(x) = (1 - k)/x$ for $x \ne 0$ and $b(0) = 0$. This
is illustrated in Figure 21.1.2.
Figure 21.1.2 $C^2$ counterexample for naive maximum principle (panels: $u(x) = |x|^k$, $k = 2.1$ and $b(x) = -u''(x)/u'(x) = (1 - k)/x$, with $u''(x) + b(x)u'(x) = 0$)
It seems that functions like $b(x) = |x|^{-1}$ are close to the boundary of what makes a maximum principle of
this sort work. But this function is also at the boundary of the set of $L^1_{\mathrm{loc}}(\mathbb{R})$ functions. In fact, any solution
$u$ of equation $u''(x) + b(x)u'(x) = 0$ must satisfy $|u'(x)| = \exp\bigl(-\int^x b(x)\, dx\bigr)$, which is never equal to zero
if $b \in L^1_{\mathrm{loc}}(\mathbb{R})$. Therefore no interior maximum or minimum is possible. This suggests the possibility of a
strengthened version of Theorem 21.1.3. Such a maximum principle is Theorem 21.1.5.
21.1.5 Theorem: Let $I$ be a non-empty bounded open real-number interval. Let $u \in C^0(\bar{I}) \cap W^{2,1}(I)$
be a real-valued function on $I$ which satisfies $Lu(x) = u''(x) + b(x)u'(x) \ge 0$ for almost all $x \in I$, where
$b \in L^1_{\mathrm{loc}}(\mathbb{R})$. Then $\sup_I(u) = \sup_{\partial I}(u)$. Similarly, if $Lu(x) \le 0$ for almost all $x \in I$, then $\inf_I(u) = \inf_{\partial I}(u)$.
Proof: Define $v \in C^0(\bar{I}) \cap W^{2,1}(I)$ by $v(x) = \int^x \exp\bigl(\int^x (1 - b(x))\, dx\bigr)\, dx$ for $x \in \bar{I}$. Then
$Lv(x) = v''(x) + b(x)v'(x) = v'(x) > 0$ for all $x \in I$. Let $w = u + \alpha v$ for $\alpha \in \mathbb{R}$. Then for $\alpha > 0$,
$Lw(x) = Lu(x) + \alpha Lv(x) > 0$ for all $x \in I$. Suppose $x \in I$ is a local maximum of $w$. Then $w'(x) = 0$ and
$w''(x) \le 0$. So $Lw(x) = w''(x) \le 0$. This is a contradiction. So $w$ has no local maximum in $I$. Therefore
$\sup_I(w) = \sup_{\partial I}(w)$. But $u$ is the uniform limit of $w$ as $\alpha \to 0^+$. So $\sup_I(u) = \sup_{\partial I}(u)$.
21.1.6 Remark: Many analysis texts do not provide examples to demonstrate the necessity of some of
the odd-looking assumptions which are placed on theorems. It is important to provide such examples so
that the reader can more readily accept some of the more technical assumptions, but also to establish the
“sharpness” of results.
A “sharp theorem” is a theorem whose assumptions cannot be significantly weakened and whose assertions
cannot be significantly strengthened without significantly increasing the theorem’s complexity. As a trivial
example, the bound $\cos(x) \ge 1 - x^2$ for all $x \in \mathbb{R}$ is not sharp because the bound can easily be improved
to $\cos(x) \ge 1 - x^2/2$, which is a sharp bound because the coefficient of $x^2$ cannot be improved.
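The sharpness claim for the cosine bound can be illustrated numerically. The sketch below uses hypothetical names and an arbitrary sample grid. It confirms the bound with coefficient $1/2$ on the grid, and shows that a slightly smaller coefficient such as $0.49$ already fails near the origin.

```python
import math

def bound_holds(c, xs):
    """True if cos(x) >= 1 - c*x^2 at every sample point (tiny slack for rounding)."""
    return all(math.cos(x) >= 1.0 - c * x * x - 1e-12 for x in xs)

xs = [0.01 * k for k in range(-1000, 1001)]   # sample grid on [-10, 10]

print(bound_holds(1.0, xs))    # the weaker bound holds on the grid
print(bound_holds(0.5, xs))    # the improved bound also holds
print(bound_holds(0.49, xs))   # a smaller coefficient fails for small x

# A specific point where the coefficient 0.49 fails:
x = 0.1
print(math.cos(x), 1.0 - 0.49 * x * x)
```

A finite grid can only refute a bound, not prove it, so the first two checks are evidence rather than proof. The failure for $0.49$ is a genuine counterexample.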
If a theorem is not sharp, the values and even the form of bounds and conditions in the theorem will quite
likely be artefacts of the method of proof. By establishing sharpness, it is made clear that the bounds and
conditions are attributes of the system under study rather than the technicalities of the proof method. In a
sense, a sharp bound “hugs” the envelope of possibilities of the things which are bounded. No better bound

can be interposed between the bound and the things which are bounded. A further practical advantage of
sharp bounds is that it will not be possible for an adversary to publish a better bound and take a share of
the glory.
To establish sharpness of a theorem, it is necessary to conjecture that each assumption and assertion can be
individually improved by varying the parameters. Each such conjecture must be disproved. This can often
be achieved by the use of counterexamples, as has been done in Remarks 21.1.2 and 21.1.4.
21.1.7 Remark: It is clear that for second-order linear equations with a single independent variable (i.e. for
ordinary differential equations), explicit calculations can quickly yield maximum principles without resort to
complicated constructions. The situation is not quite so simple for multiple independent variables (i.e. partial
differential equations). It is, however, useful to study the single-variable case first to obtain some intuition so
that the higher-dimensional cases are easier to understand. (For further information on maximum principles,
see Miranda [127], section 3 and Gilbarg/Trudinger [110], chapter 3.)
21.2. Systems of linear second-order ODEs

This section is relevant to Jacobi fields. In geodesic coordinates, Jacobi fields are solutions of a system of
linear ordinary differential equations of second-order with respect to the affine parameter along the geodesic.
Of particular interest are estimates of the solutions of boundary value problems.
21.3. Boundary value problems

A boundary value problem, in the subject of differential equations, means a problem where the values of
solutions are specified on the boundary of a region and a differential equation is required to be satisfied by
solutions on the interior of the region.
The most interesting boundary value problems are those for which the solutions exist and are unique in some
sense. Then the solution may be regarded as a function of the boundary and interior conditions.
In physical models, a BVP typically describes a situation where the values of a field are known on the
boundary of a region and one wishes to know what is happening inside the region. The boundary values
may be known either because they are passively measured or because they are actively controlled in some
way.
The solutions of a BVP are typically static, particularly if uniqueness is guaranteed. Generally there is no
time parameter to give the solutions a dynamic character.
Second-order PDEs for which BVP existence and uniqueness are guaranteed in bounded domains are typically
elliptic. When the value of a BVP solution is specified on the boundary, this is called a Dirichlet problem.
When the gradient of a BVP solution is specified on the boundary, this is called a Neumann problem.
21.3.1 Remark: Bell [191], page 105, makes the following comment about the ubiquity of boundary value
problems in physics.
In a sense mathematical physics is co-extensive with the theory of boundary-value problems.
21.4. Initial value problems

An initial value problem, in the subject of differential equations, means a problem where the values of
solutions are specified at some initial time and on the (possibly empty) boundary of a region, and a differential
equation is required to be satisfied by solutions on the interior of the region at all times after the initial time.
In some initial value problems, the region under consideration has no boundary because the region occupies
the entire space.
The most interesting IVPs are those for which the solutions exist and are unique in some sense. Then the
solution may be regarded as a function of the initial, boundary and interior conditions.
In physical models, an IVP typically describes a situation where the values of a field are known at some
initial time and on the (possibly empty) boundary of a region, and one wishes to know what is happening
inside the region after the initial time. The initial and boundary values may be known either because they
are passively measured or because they are actively controlled in some way.
Second-order PDEs for which IVP existence and uniqueness are guaranteed in bounded domains are typically
parabolic or hyperbolic.

21.5. Calculus of variations

This section deals with existence and uniqueness for calculus of variations. This is required for constructing
geodesic curves from metrics.
21.6. ODEs for defining exponential and trigonometric functions

The integral expressions for logarithmic and exponential functions in Section 20.12 have the advantage
of avoiding the need to deal with the existence and uniqueness questions for the differential expressions.
However, the motivation for the differential expressions is stronger.
21.6.1 Remark: Consider Definition 20.12.1 for the logarithm function: $\ln(x) = \int_1^x t^{-1}\,dt$. The exponential
function is defined as the inverse of the logarithm. This leads to Theorem 20.12.4, which states that
$\forall x \in \mathbb{R},\ (d/dx)\exp(x) = \exp(x)$. Since $\exp(0) = 1$, this implies that the exponential function satisfies the
following ODE boundary value problem for a function $f : \mathbb{R} \to \mathbb{R}$.
$\forall x \in \mathbb{R},\ f'(x) = f(x)$
$f(0) = 1$.
The equation f (0) = 1 is the boundary condition of the BVP. It turns out that the exponential function
exp : IR → IR is the unique function which satisfies this BVP.
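As an illustration of how this BVP determines the function, substituting a power series $f(x) = \sum_n a_n x^n$ into $f' = f$ with $f(0) = 1$ gives $a_0 = 1$ and $a_{n+1} = a_n/(n+1)$. The sketch below sums this series; the function name and the number of terms are illustrative assumptions, not part of the text.

```python
import math

def exp_series(x, terms=30):
    """Sum the power series with a_0 = 1 and a_{n+1} = a_n / (n + 1)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term          # add the term a_n * x**n
        term *= x / (n + 1)    # next term a_{n+1} * x**(n+1)
    return total

series_error = abs(exp_series(1.0) - math.exp(1.0))
```

For moderate values of x the partial sums agree with the library exponential to within rounding error.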
21.6.2 Remark: The trigonometric functions sin and cos in Section 20.13 arise naturally from second-order
ODEs with constant coefficients. The sine function is the unique solution of the following BVP.
$\forall x \in \mathbb{R},\ f''(x) = -f(x)$
$f(0) = 0$
$f'(0) = 1$.
If one ignores the effort required to prove existence and uniqueness, this seems like a much simpler charac-
terization than Definition 20.13.9.
When the sine function is characterized by a BVP, the solution is most readily obtained from a Taylor
series expansion. This yields a non-algebraic function. (See EDM2 [35], article 11 for algebraic functions
and article 10 for algebraic equations.) Although power series can be manipulated to obtain a vast array of
properties, it is necessary to always be on guard with respect to convergence issues. The integral expression
$\forall x \in [-1, 1],\ \arcsin(x) = \int_0^x (1 - t^2)^{-1/2}\,dt$ for the inverse arcsin of the sine function provides a much safer
way to derive properties.
21.6.3 Remark: The cosine function is the unique solution of the following BVP.
$\forall x \in \mathbb{R},\ g''(x) = -g(x)$
$g(0) = 1$
$g'(0) = 0$.
From this BVP, it is possible to determine that the cosine is the derivative everywhere of the sine function.
The development of properties and relations of trigonometric functions usually starts with basic relations
such as $(d/dx)\sin(x) = \cos(x)$ and $(d/dx)\cos(x) = -\sin(x)$, and the initial values $\sin(0) = 0$ and $\cos(0) = 1$.
These two equations and initial values actually constitute a system of two coupled ODEs of first order, as
follows.
$\forall x \in \mathbb{R},\ f'(x) = g(x)$
$\forall x \in \mathbb{R},\ g'(x) = -f(x)$
$f(0) = 0$
$g(0) = 1$.
Such a system is a more natural starting point for development of properties than the second-order ODE,
which itself follows trivially from the coupled first-order ODEs. It is readily shown that the inverses of the
arcsin and arccos functions given by the integral expressions in Definition 20.13.2 satisfy the above system
of equations for f and g respectively.
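The coupled first-order system can be integrated numerically to see that it reproduces the sine and cosine. The sketch below uses the classical fourth-order Runge-Kutta method, which is a choice made here for illustration and is not a method used in the text; the step count is likewise an assumption.

```python
import math

def solve_sin_cos(x_end, steps=2000):
    """Integrate f' = g, g' = -f with f(0) = 0, g(0) = 1 by classical RK4."""
    h = x_end / steps
    f, g = 0.0, 1.0
    for _ in range(steps):
        k1f, k1g = g, -f
        k2f, k2g = g + 0.5 * h * k1g, -(f + 0.5 * h * k1f)
        k3f, k3g = g + 0.5 * h * k2g, -(f + 0.5 * h * k2f)
        k4f, k4g = g + h * k3g, -(f + h * k3f)
        f += (h / 6.0) * (k1f + 2.0 * k2f + 2.0 * k3f + k4f)
        g += (h / 6.0) * (k1g + 2.0 * k2g + 2.0 * k3g + k4g)
    return f, g

f2, g2 = solve_sin_cos(2.0)
```

The computed pair should agree closely with (sin 2, cos 2), and the quantity f^2 + g^2 should stay close to 1 along the solution.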

21.7. Taylor series and exponentials of matrices

[ Probably should express all the special functions in previous sections as Taylor series here. ]
This section deals with Taylor series for analytic functions, and exponentials of matrices. These matrices
turn out to be solutions of systems of ODEs with constant coefficients, which are useful for studying Lie
groups.
[ Exponentials of matrices are covered in Warner [50], pages 102–107. ]
[ Somewhere in this chapter should be the basic facts of Fourier series and transforms. It would be ridiculous
to try to include all analysis that could possibly be useful in this book. But Fourier series and transforms are
fairly fundamental. It would be nice to include a proof of the completeness of Fourier series and transforms.
Perhaps one way to do this would be to show that all simple square functions $H_{a,b} : \mathbb{R} \to \mathbb{R}$ with $H_{a,b}(x) = 1$
for $x \in (a, b)$ are almost everywhere approximated, and then argue that if $\int H_{a,b} f\,dx = 0$ for all $a, b \in \mathbb{R}$,
then such a continuous function f must equal zero. ]


Chapter 22
Non-topological fibre bundles
22.1 Non-topological fibrations
22.2 Parallelism for non-topological fibrations
22.3 Non-topological fibre bundles
22.4 Finite transformation groups as fibre bundles
This chapter may be regarded as a recreational prelude to Chapters 23 and 24 which deal with the serious
business of topological fibre bundles. The non-topological fibre bundle definitions in this chapter are (prob-
ably) non-standard, but they are a plausible reconstruction (or deconstruction) of minimal fibre bundles
from standard fibre bundles. The purpose of this chapter is to investigate the extent to which topological
and differentiable fibre bundle definitions can be meaningful in non-topological fibre bundles, especially in
the case of discrete or finite spaces. (If the reader would like to skip this chapter, there are no foreseeable
negative consequences. More productive uses of the reader’s time would include sleeping, eating, and staring
at the wall.)
The core topic of differential geometry is curvature, curvature is defined in terms of parallelism, and fibre
bundles are the natural structure on which parallelism is defined because if one asks what kind of structure
the most general notion of parallelism could apply to, it must be a set of entities attached at different points
of a base set. Parallelism specifies associations between objects which are attached to different points of a
base set. A fibre bundle specifies how these objects are attached to the base set.
Despite the apparent core role of fibre bundles as the structure which supports parallelism and connections,
a large proportion of the literature on fibre bundles seems to “build roads into the desert”. There seems
to be no need to intensively study fibre bundles as a class in the way that affine connections are studied.
Differential geometry could very likely survive without them. However, the language of fibre bundles is
almost ubiquitous in the DG literature. Therefore the definitions must be presented in any serious book on
differential geometry.
This chapter deals with non-topological fibre bundles and parallelism. Topological fibre bundles are de-
fined in Chapters 23–24. Differentiable fibre bundles are defined in Chapter 34. Differential parallelism
(“connections”) is presented in Chapters 35–37.
22.1. Non-topological fibrations

The non-uniform non-topological fibrations in Definition 22.1.1 may be the most general form of fibration on
which any sort of parallelism may be defined. This minimalist definition is based on Theorem 6.6.9, which
shows that any set is partitioned by a function on that set.
A slightly more useful structure would be a fibre bundle which requires all sets in the partition to be
equipotent, so that there would be bijections between each “fibre”. These uniform non-topological fibrations
are introduced in Definition 22.1.3. There is an implicit structure group for this fibre bundle definition: the
symmetric group of all permutations of the fibre set.
Parallelism may be defined on such fibre bundles. If all fibres have the same cardinality, absolute (path-
independent) parallelism may be defined as bijections between the fibres. In the case of non-uniform cardi-
nality of fibres, parallelism can be defined by more general relations than bijections.


Path-dependent parallelism may be defined in terms of fibre bijections which are a symmetric, transitive
function of a specified set of permitted paths. The set of paths should be closed under concatenation and
reversal. If the base space had a topology, continuous paths could be used. For non-topological fibre bundles,
the path space must be chosen in other ways.
22.1.1 Definition: A non-uniform non-topological fibration is a tuple (E, π, B) such that E and B are
sets and π is a function π : E → B.
The set E is called the total space of (E, π, B).
The set B is called the base space of (E, π, B).
The function π is called the projection map of (E, π, B).
The set Eb = π −1 ({b}) for each b ∈ B is called the fibre (set) at b.
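For finite sets, Definition 22.1.1 can be illustrated directly: the fibre sets are the preimages of the projection map, and they partition the total space. The sets and the map in this sketch are hypothetical example data.

```python
from collections import defaultdict

def fibres(E, pi):
    """Return the dict b -> E_b = pi^{-1}({b}) for the points b in the image of pi."""
    result = defaultdict(set)
    for e in E:
        result[pi(e)].add(e)
    return dict(result)

# Hypothetical example data: E = {0, ..., 9}, B = {0, 1, 2}, pi(e) = e mod 3.
E = set(range(10))
B = {0, 1, 2}
pi = lambda e: e % 3

E_b = fibres(E, pi)
fibre_sizes = {b: len(fibre) for b, fibre in E_b.items()}
```

Here the fibre at 0 has four elements and the other two fibres have three, so this fibration is non-uniform. Base points outside the image of the projection map would have empty fibres, which the dictionary simply omits.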
22.1.2 Remark: Definition 22.1.1 is illustrated in Figure 22.1.1.
Figure 22.1.1 Non-uniform non-topological fibrations (diagram labels: fibre sets $E_{b_1} = \pi^{-1}(\{b_1\})$ and $E_{b_2} = \pi^{-1}(\{b_2\})$ above the points $b_1$, $b_2$ of the base space B)
If the fibre sets π −1 ({b}) are not equipotent (do not have the same cardinality), then it is not possible
to define parallelism between fibre sets in terms of bijections. Weaker, non-bijective parallelism relations
satisfying symmetry and transitivity could be defined, but this doesn’t seem very useful for differential
geometry, although one could put a contrary argument. Bijections are not necessarily the only relations of
interest between objects which are attached to different points of a base set.
If the fibre sets π −1 ({b}) are equipotent, then there is a fixed set F such that for all b ∈ B, there exists
a bijection φ : π −1 ({b}) → F . This could be thought of as a sort of “uniform non-topological fibration
with extrinsic fibre F ” as presented in Definition 22.1.3. In physical models, one does not expect to see an
extrinsic “fibra ex machina”, but it is usual to define fibre bundles in this way, the extrinsic fibre space being
unique up to bijection.
22.1.3 Definition: A (uniform) non-topological fibration is a tuple (E, π, B) such that E and B are sets,
π is a function π : E → B, and
(i) ∀b1 , b2 ∈ B, ∃φ : π −1 ({b1 }) → π −1 ({b2 }), φ is a bijection.
A set F is said to be a fibre space for (E, π, B) if
(ii) ∀b ∈ B, ∃φ : π −1 ({b}) → F , φ is a bijection.
22.1.4 Remark: Definition 22.1.3 (i) means that the sets π −1 ({b}) are pairwise equipotent for b ∈ B.
22.1.5 Definition: A fibre chart with fibre F for a fibre bundle (E, π, B) is a map φ : π −1 (U ) → F for
some set U ⊆ B such that
(i) π × φ : π −1 (U ) → U × F is a bijection.
22.1.6 Remark: Definition 22.1.5 is illustrated in Figure 22.1.2 using the notation $E_b = \pi^{-1}(\{b\})$ for $b \in B$.
Condition (i) is equivalent to requiring that $\phi|_{\pi^{-1}(\{b\})} : \pi^{-1}(\{b\}) \to F$ be a bijection for all $b \in U$. It
is clear from the diagram that a fibre chart is analogous to a projection map in that they both project the
total space down to another space, either B or F .

Figure 22.1.2 Uniform non-topological fibration with fibre chart (diagram labels: fibre space F; fibre chart φ with restrictions $\phi|_{E_b} = \phi|_{\pi^{-1}(\{b\})}$; total space E with fibre sets $E_{b_1}$, $E_{b_2}$, $E_b$; projection map π; base space B with points $b_1$, $b_2$, b)
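The fibre chart condition of Definition 22.1.5 can be tested mechanically on finite sets. In this sketch the helper name `is_fibre_chart` and all example sets and maps are hypothetical.

```python
from itertools import product

def is_fibre_chart(phi, pi, E, U, F):
    """Check that pi x phi maps pi^{-1}(U) bijectively onto U x F."""
    domain = [e for e in E if pi(e) in U]
    image = [(pi(e), phi(e)) for e in domain]
    injective = len(set(image)) == len(domain)
    surjective = set(image) == set(product(U, F))
    return injective and surjective

# Hypothetical example: E = {0, ..., 5}, B = {0, 1, 2}, F = {0, 1}.
E = range(6)
pi = lambda e: e % 3          # projection map
phi_good = lambda e: e // 3   # restricts to a bijection on each fibre
phi_bad = lambda e: 0         # collapses each fibre to a single point
U = {0, 1}
F = {0, 1}

good = is_fibre_chart(phi_good, pi, E, U, F)
bad = is_fibre_chart(phi_bad, pi, E, U, F)
```

The first candidate passes because it is a bijection on each fibre over U, and the second fails because it is not injective on any fibre.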
22.1.7 Definition: A fibre atlas with fibre space F for a fibre bundle (E, π, B) is a set $A^F_E$ of functions
$\phi : \pi^{-1}(U) \to F$ for $U \subseteq B$ such that $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a bijection.
22.1.8 Remark: Since the base set B is unstructured, there is no difference between a chart φ : π −1 (U ) →
F for b ∈ U and a set of individual pointwise charts φ : π −1 ({b}) → F for all b ∈ U . Only when structures
such as a topology or a differentiable manifold atlas are introduced on B is it worthwhile to define non-
pointwise charts. An unstructured set is effectively equivalent to a topological space with the discrete
topology. (See Definition 14.3.19 for discrete topology.) Since the fibre space F is also unstructured, even
pointwise fibre charts contain no information, because all bijections between the fibre space and a given fibre
set are equivalent. The value of defining structures such as charts and atlases for unstructured fibre bundles
is to highlight the information that is contained in them when structure is present.
22.1.9 Definition: A uniform non-topological fibration with fibre space F is a tuple $(E, \pi, B, A^F_E)$ such
that
(i) (E, π, B) is a uniform non-topological fibration;
(ii) F is a fibre space for (E, π, B);
(iii) $A^F_E$ is a fibre atlas with fibre space F for (E, π, B).
22.2. Parallelism for non-topological fibrations
22.2.1 Remark: In the interests of pointless minimalism, it is desirable to define parallelism on non-
topological fibre bundles. Parallelism associates elements of the fibre set π −1 ({b}) at different points b of the
base space B of a fibre bundle (E, π, B). A simple global or absolute parallelism would identify an element
f1 ∈ π −1 ({b1 }) in the fibre set at b1 ∈ B corresponding to each fibre f0 ∈ π −1 ({b0 }) at b0 ∈ B. For notation,
write f1 R f2 when f1 and f2 are parallel fibres. The association rule for an absolute parallelism should
be transitive between base points; that is, (f1 R f2 and f2 R f3 ) ⇒ f1 R f3 . Symmetry and idempotence
conditions should also hold. The relation should also be a bijection between the fibre sets at any two points.
This implies that the fibre bundle must be uniform, of course, although a generalized form of parallelism
could be defined for non-uniform fibre bundles by using more general relations than bijections. In the case of
a uniform non-topological fibre space, an absolute parallelism corresponds to a simple equivalence relation
on the total space such that each fibre is equivalent to precisely one fibre in the fibre set at each other base
point.
Absolute parallelism is illustrated in Figure 22.2.1. In this case, a bijection $\theta_{bb'} : E_b \to E_{b'}$ defines parallelism
for each pair of fibre sets $E_b = \pi^{-1}(\{b\})$. These must obey the transitivity rule $\theta_{b_1 b_3} = \theta_{b_2 b_3} \circ \theta_{b_1 b_2}$ for
all $b_1, b_2, b_3 \in B$. Clearly the function $\phi|_{E_{b'}} \circ \theta_{bb'} \circ (\phi|_{E_b})^{-1} : F \to F$ is a bijection on F for all $b, b' \in B$.
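One simple way to obtain an absolute parallelism on a finite example is to build the fibre bijections from pointwise charts, and then to verify the transitivity rule by brute force. This is only one possible construction, chosen here for illustration; the base points, the fibre and the `shift` table are hypothetical.

```python
from itertools import product

B = ["b1", "b2", "b3"]
F = [0, 1, 2]

# Hypothetical pointwise charts: chart[b] maps the fibre E_b = {(b, x)} onto F.
shift = {"b1": 0, "b2": 1, "b3": 2}
chart = {b: {(b, x): (x + shift[b]) % 3 for x in F} for b in B}
chart_inv = {b: {v: k for k, v in chart[b].items()} for b in B}

def theta(b, b2):
    """Absolute parallelism E_b -> E_b2: apply chart[b], then the inverse of chart[b2]."""
    return {e: chart_inv[b2][chart[b][e]] for e in chart[b]}

def compose(second, first):
    """Return the dict-encoded map e -> second(first(e))."""
    return {e: second[first[e]] for e in first}

# Transitivity rule: theta_{b1 b3} = theta_{b2 b3} o theta_{b1 b2}.
transitive = all(
    theta(b1, b3) == compose(theta(b2, b3), theta(b1, b2))
    for b1, b2, b3 in product(B, repeat=3)
)
```

With this construction the induced map on F is the identity for every pair of base points, which is the simplest instance of the bijection on F mentioned above.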
In the case of non-absolute parallelism, it is usual to talk of “parallel transport”, which means that parallelism
between fibres at two base points in a fibre bundle depends on the choice of path between the points. This
is a kind of “pathwise parallelism”. For this to be meaningful, there must be a definition of permitted paths.
For topological fibre bundles, the special paths are chosen to be continuous. It is difficult to specify pathwise
parallelism on a fibre bundle without a topology or differentiable structure because all paths are equal in an
unstructured set. (An unstructured base set is equivalent to using the discrete topology on the base set. All
paths are continuous in the discrete topology.) If the base set B is finite, the number of non-intersecting paths
is finite. So it is then not too difficult to formalize pathwise parallelism. When the base set is countable,
questions of limits for infinite paths arise, which brings in questions about the topology. When the base set
is uncountable, it is difficult to make any sensible formalism without a topology.
Figure 22.2.1 Non-topological fibration with absolute parallelism (diagram labels: fibre space F; fibre chart φ with restrictions $\phi|_{E_{b_1}}$, $\phi|_{E_{b_2}}$, $\phi|_{E_{b_3}}$; total space E with fibre sets $E_{b_1}$, $E_{b_2}$, $E_{b_3}$ joined by the bijections $\theta_{b_1 b_2}$, $\theta_{b_2 b_3}$, $\theta_{b_1 b_3}$ and their reverses $\theta_{b_2 b_1}$, $\theta_{b_3 b_2}$, $\theta_{b_3 b_1}$; projection map π; base space B with points $b_1$, $b_2$, $b_3$)
One way to deal with this problem is to define an arbitrary set Pb1 b2 of subsets of the base space to be
the permitted paths from a point b1 ∈ B to b2 ∈ B, subject to conditions of symmetry and transitivity.
(The concatenation of two permitted paths should be a permitted path, but that is not essential.) Another
approach is to define some notion of locality in the base space. In a discrete base space, the locality could
specify a graph indicating which points are neighbours, and permitted paths could be composed of sequences
of neighbouring points. This would then be the discrete version of a topology, known as a “network topology”.
(For a network topology, the “tangent space” at each point could be simply the set of neighbours of the
point!)
If the set of permitted paths Pb1 b2 has been defined in some way on a fibre bundle for b1 , b2 ∈ B, one may
denote pathwise parallelism as f1 RQ f2 if f1 is parallel to f2 along the path Q ∈ Pb1 b2 . This is then the
definition of “parallel transport” of the fibre f1 ∈ π −1 ({b1 }) from b1 to b2 along the path Q. One could
also denote the parallelism as f2 = ΘQ (f1 ) when f1 RQ f2 . Then a minimum requirement of a parallelism
function Θ should be that ΘQ1 +Q2 = ΘQ2 ◦ ΘQ1 , where Q1 + Q2 denotes the concatenation of paths Q1
and Q2 . (This is illustrated in Figure 22.2.2.) It is even possible to define notions such as curvature in such
a framework.
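Figure 22.2.2 Non-topological fibration with pathwise parallelism (diagram labels: fibre space F; fibre chart φ with restrictions $\phi|_{E_{b_1}}$, $\phi|_{E_{b_2}}$, $\phi|_{E_{b_3}}$; parallel transport maps $\Theta_{Q_1}$, $\Theta_{Q_2}$, $\Theta_{Q_3}$, $\Theta_{Q_4}$; projection map π; base space B with points $b_1$, $b_2$, $b_3$ joined by the paths $Q_1$, $Q_2$, $Q_3$, $Q_4$)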
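The composition rule $\Theta_{Q_1+Q_2} = \Theta_{Q_2} \circ \Theta_{Q_1}$ can be sketched on a small network, where a path is a sequence of neighbouring base points and each directed edge carries a bijection of the fibre. The network, the edge bijections and the encoding of permutations as tuples are all hypothetical choices made for this illustration.

```python
# Fibre F = {0, 1, 2}; a permutation p is a tuple with p[x] the image of x.
IDENTITY = (0, 1, 2)

def compose(p, q):
    """Return p after q, that is, x -> p[q[x]]."""
    return tuple(p[q[x]] for x in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

# Hypothetical network on base points a, b, c: one fibre bijection per directed edge.
edge_map = {("a", "b"): (1, 0, 2), ("b", "c"): (0, 2, 1), ("c", "a"): (0, 1, 2)}
# Reversed edges carry the inverse bijections.
edge_map.update({(v, u): inverse(p) for (u, v), p in list(edge_map.items())})

def transport(path):
    """Parallel transport along a path given as a list of neighbouring base points."""
    total = IDENTITY
    for u, v in zip(path, path[1:]):
        total = compose(edge_map[(u, v)], total)  # later edges act on the left
    return total

Q1 = ["a", "b"]
Q2 = ["b", "c", "a"]
Q12 = Q1 + Q2[1:]  # the concatenation Q1 + Q2, a closed path at a
```

In this example the transport around the closed path is not the identity, so the parallelism is genuinely path-dependent, and transport along the reversed path is the inverse bijection.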
[ The discussion of paths in Remark 22.2.1 should be updated in view of the new definitions of paths and
curves. For discrete space, could use never-constant curves as an equivalent definition of paths. ]
22.2.2 Remark: Paths are defined in this book as equivalence classes of curves for some suitable equiv-
alence relation. The most important attribute of curves which should be retained in an equivalence class
is the direction of the curves. So curves which go in opposite directions must always be regarded as not
equivalent. Details of the parametrization of curves may be regarded as redundant information which may
be ignored in paths.

Network topology is discussed in Section 16.10. Continuous curves may be defined with respect to network
topologies. Then continuous directed paths may be defined as equivalence classes of continuous curves which
have the same direction.
22.2.3 Remark: Non-topological fibre bundles with structure groups are discussed in Section 22.3. Topo-
logical fibre bundles are presented in Chapters 23–24. Differentiable fibre bundles are presented in Chapter 34.
22.3. Non-topological fibre bundles

This section presents fibre bundles which have structure groups but no topological structure. It is particularly
interesting to see how parallelism is defined on such fibre bundles and their associated principal fibre bundles.
(See Remark 6.6.11 for some comments on non-topological fibrations.)
22.3.1 Definition: A non-topological (G, F) fibre bundle for an effective left transformation group
$(G, F) -\!\!< (G, F, \sigma_G, \mu)$ is a tuple $(E, \pi, B, A^F_E)$ such that
(i) E and B are sets and π is a function $\pi : E \to B$;
(ii) $A^F_E$ is a set of functions $\phi : \pi^{-1}(U) \to F$ for sets $U \subseteq B$ such that each $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a
bijection;
(iii) $\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2),\ \exists g \in G,\ \beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1} = L_g$, where $\beta_{b,\phi}$ denotes the function
$\phi|_{\pi^{-1}(\{b\})}$ for all $b \in B$ and $\phi \in A^F_E$.
B is called the base space of the fibre bundle.

E is called the total space. (Kobayashi/Nomizu [27], page 50, call the total space also the bundle space.)
π is called the projection map.
G is called the structure group.
F is called the fibre space.
$A^F_E$ is called the fibre atlas.
The functions $\phi \in A^F_E$ are called the fibre charts of the fibre bundle.
The sets π −1 ({b}) may be called the fibres or fibre sets of the fibre bundle.
22.3.2 Remark: Definition 22.3.1 is an extension of Definition 22.1.9. In fact, Definition 22.1.9 is equiv-
alent to the special case that the group (G, F ) in Definition 22.3.1 is the permutation group of F . (See
Example 9.4.16 for permutation groups.)
As mentioned in Remark 22.1.8, charts and atlases contain no real information if the sets B, E and F are
unstructured. (E is structured by the projection map π, but is otherwise unstructured.) In this section, the
fibre space is structured by the structure group G, but the base space B is still unstructured. Therefore the
fibre charts do contain real information about the structuring of the individual fibre sets π −1 ({b}), but the
information is the same as if the charts were specified pointwise – i.e. one chart for each fibre set π −1 ({b}).
[ Here define parallelism on non-topological fibre bundles with structure groups. See Remark 22.2.1 for ideas
on how to do this. ]
[ Here present non-topological principal fibre bundles, with the motivation coming from parallelism for as-
sociated principal fibre bundles. Discuss invariants and covariants. Define porting of parallelism between
associated fibre bundles and define curvature. ]
[ A non-topological version of the Stokes Theorem could be presented here if it is well-defined. Some sort of
generalized definition of curvature could be put here too. ]
22.4. Finite transformation groups as fibre bundles

The only non-trivial example of a non-topological fibre bundle which the author can think of is the case of
finite transformation groups. To be interesting, a class of fibre bundles must have the possibility of non-zero
curvature because curvature is the core concept of differential geometry. Therefore an interesting class of
fibre bundles must have a natural definition of parallel transport which is not trivial on closed paths.
Let (G, F ) be a finite left transformation group. (It doesn’t really need to be finite, but that makes things
simpler initially. Left transformation groups are defined in Section 9.4.) Define the base space for a fibre
bundle by $B = \mathbb{Z}^G$, the set of all integer-valued functions on the set G. The set B may be given an addition
operation $\sigma_B : B \times B \to B$ defined by $\sigma_B : (b_1, b_2) \mapsto (g \mapsto b_1(g) + b_2(g))$. Then $(B, \sigma_B)$ is a commutative
group with identity $0_B : g \mapsto 0$.
A distance function $d : B \times B \to \mathbb{Z}^+_0$ may be defined on B by $d : (b_1, b_2) \mapsto \sum_{g \in G} |b_2(g) - b_1(g)|$. (This is
related to the Hamming distance function in signal processing theory. See Proakis [213], page 415.) This
actually yields a discrete topology (or network topology) on B by defining the set of neighbourhoods of
points $b \in B$ as those points with distance 0 or 1 from b.
A local difference operation ∆ may be defined for pairs of elements of B with distance not exceeding 1. For
$b_1, b_2 \in B$ with $d(b_1, b_2) = 1$, define $\Delta(b_1, b_2) = h \in G$ if $b_1(g) - b_2(g) = \delta(g, h)$ and $\Delta(b_1, b_2) = h^{-1} \in G$ if
$b_1(g) - b_2(g) = -\delta(g, h)$, where δ denotes the Kronecker delta function for G. If $b_1 = b_2$, define $\Delta(b_1, b_2) = e$.
(The set of values of $\Delta(b_1, b_2)$ may be interpreted as the tangent space at the point $b_1$.)
Let the fibre space of the fibre bundle be F, and let the total space be $E = B \times F$. Define the projection
map $\pi : E \to B$ with $\pi : (b, x) \mapsto b$, and define a single chart $\phi : E \to F$ by $\phi : (b, x) \mapsto x$. Define the fibre
atlas as $A^F_E = \{\phi\}$.
A set of curves may now be defined in the base set B as the set C of all functions $\gamma : [a, b] \to B$ for
intervals $[a, b] = \{t \in \mathbb{Z} ;\ a \le t \le b\}$ of the integers, with the constraint that $d(\gamma(t), \gamma(t + 1)) \le 1$ for
all $t \in [a, b) = \{t \in \mathbb{Z} ;\ a \le t < b\}$. (This means that the curve is continuous with respect to the discrete
topology on B.) Never-constant curves are curves in C which satisfy $d(\gamma(t), \gamma(t + 1)) = 1$ for all $t \in [a, b)$.
The initial and terminal parameter values of a curve $\gamma : [a, b] \to B$ are $S(\gamma) = a$ and $T(\gamma) = b$ respectively.
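Now parallel transport may be defined on the (G, F) fibre bundle $(E, \pi, B, A^F_E)$ as a map $\Theta^\gamma_{s,t} : E_{\gamma(s)} \to E_{\gamma(t)}$
for all never-constant curves $\gamma \in C$ and $s, t \in \mathrm{Dom}(\gamma)$ by $\Theta^\gamma_{s,t} : (\gamma(s), f) \mapsto (\gamma(t), \sigma_G(g^\gamma_{s,t}, f)) = (\gamma(t), g^\gamma_{s,t} f)$,
where $g^\gamma_{s,t}$ is defined inductively by the rules
$\forall s \in \mathrm{Dom}(\gamma),\ g^\gamma_{s,s} = L_e$
$\forall s, t \in \mathrm{Dom}(\gamma),\ s < t \Rightarrow g^\gamma_{s,t} = \Delta(\gamma(t), \gamma(t-1))\, g^\gamma_{s,t-1}$.
In other words, $\Theta^\gamma_{s,t}$ performs a left action on the fibre element f of $(\gamma(s), f) \in E_{\gamma(s)}$ by a group element
$g^\gamma_{s,t} = \Delta(\gamma(t), \gamma(t-1))\, \Delta(\gamma(t-1), \gamma(t-2)) \cdots \Delta(\gamma(s+1), \gamma(s)) \in G$. So
$\Theta^\gamma_{s,t} : (\gamma(s), f) \mapsto \Bigl(\gamma(t), \Bigl(\prod_{u=s}^{t-1} \Delta(\gamma(u+1), \gamma(u))\Bigr) f\Bigr)$,
where the factors of the product are written with u increasing from right to left.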
The curvature of the parallelism Θ may be defined as the map $\kappa : \bar{C} \to G$ where $\bar{C} = \{\gamma \in C ;\ \gamma(S(\gamma)) = \gamma(T(\gamma))\}$
is the set of closed curves in B, and $\kappa : \gamma \mapsto g^\gamma_{S(\gamma),T(\gamma)}$.
Consider the example of the symmetric group $G = S_3$ of degree 3 acting on the set $F = \{1, 2, 3\}$. The 6
elements of this group are $e_G$, $g_1 = (12)$, $g_2 = (23)$, $g_3 = (31)$, $g_4 = (123)$ and $g_5 = (321)$. Consider the path
$\gamma : [0, 4] \to B = \mathbb{Z}^G$ defined by $\gamma(0) = 0_B$, $\gamma(1) = e_1$, $\gamma(2) = e_1 + e_2$, $\gamma(3) = e_2$ and $\gamma(4) = 0_B$, where $e_k$
denotes the element $e_k : g \mapsto \delta(g, g_k)$ of B. Then $\kappa(\gamma) = g_2^{-1} g_1^{-1} g_2 g_1 = [g_2, g_1] = g_4$, the commutator of $g_2$
and $g_1$.
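The $S_3$ example can be reproduced by direct computation. In the sketch below, permutations are encoded as tuples and elements of B as integer tuples indexed by the list `G`; these encodings and all helper names are assumptions made for illustration. Products are taken in the left action convention, so that (pq)(x) = p(q(x)), and a cycle such as (123) is read as 1 -> 2 -> 3 -> 1.

```python
# Permutations of F = {1, 2, 3} as tuples p, with p[i - 1] the image of i.
def compose(p, q):
    """Left action convention: (p q)(x) = p(q(x))."""
    return tuple(p[q[i] - 1] for i in range(3))

def inverse(p):
    inv = [0, 0, 0]
    for i, y in enumerate(p):
        inv[y - 1] = i + 1
    return tuple(inv)

e_G = (1, 2, 3)
g1 = (2, 1, 3)  # (12)
g2 = (1, 3, 2)  # (23)
g3 = (3, 2, 1)  # (31)
g4 = (2, 3, 1)  # (123): 1 -> 2 -> 3 -> 1
g5 = (3, 1, 2)  # (321): 3 -> 2 -> 1 -> 3
G = [e_G, g1, g2, g3, g4, g5]

# Elements of B = Z^G as integer tuples indexed like the list G.
def unit(k):
    """The element e_k of B: the indicator function of G[k]."""
    return tuple(1 if j == k else 0 for j in range(len(G)))

def add(b1, b2):
    return tuple(x + y for x, y in zip(b1, b2))

def dist(b1, b2):
    return sum(abs(y - x) for x, y in zip(b1, b2))

def delta(b1, b2):
    """Local difference: h if b1 - b2 = delta(., h), h^{-1} if b1 - b2 = -delta(., h)."""
    if b1 == b2:
        return e_G
    assert dist(b1, b2) == 1
    diff = [x - y for x, y in zip(b1, b2)]
    k = next(j for j, d in enumerate(diff) if d != 0)
    return G[k] if diff[k] == 1 else inverse(G[k])

def holonomy(curve):
    """Group element g_{s,t} for the whole curve; later steps act on the left."""
    g = e_G
    for prev, cur in zip(curve, curve[1:]):
        g = compose(delta(cur, prev), g)
    return g

zero = (0,) * len(G)
e1, e2 = unit(1), unit(2)
gamma = [zero, e1, add(e1, e2), e2, zero]
kappa = holonomy(gamma)
```

With these conventions the computed curvature of the closed curve equals the tuple encoding of $g_4 = (123)$, in agreement with the commutator stated above.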
For general finite groups, similar minimal cycles in the vicinity of any point in the base space yield all of the
commutators of the group. Thus curvature for this fibre bundle is very closely related to the commutator
operation of the group. (These minimal cycle curvatures seem to form some sort of Lie algebra.) In the
special case of a commutative group, the curvature is everywhere equal to the identity of the structure group.
Hence the fibre bundle is flat if and only if the group is commutative. All products of elements of the group
may be regarded as continuous paths in this fibre bundle. The difference between any two products which
have the same terms in a different order may be calculated by a kind of discrete Stokes Theorem in terms
of the curvature on minimal cycles which fill in the concatenation of one path with the reverse of the other.
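The relation between minimal cycle curvature and commutators can be checked by enumerating the values $h^{-1} g^{-1} h g$ over a whole group. The sketch below does this for $S_3$ and for a commutative subgroup of order 3; the tuple encoding (with points numbered from 0) and the function names are illustrative assumptions.

```python
from itertools import permutations, product

def compose(p, q):
    """Left action convention: (p q)(x) = p(q(x))."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, y in enumerate(p):
        inv[y] = i
    return tuple(inv)

def minimal_cycle_curvatures(G):
    """All values h^{-1} g^{-1} h g for g, h in G (four-step cycles at a base point)."""
    return {
        compose(compose(inverse(h), inverse(g)), compose(h, g))
        for g, h in product(G, repeat=2)
    }

S3 = list(permutations(range(3)))       # all 6 permutations of {0, 1, 2}
C3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # cyclic subgroup of order 3, commutative

curv_S3 = minimal_cycle_curvatures(S3)
curv_C3 = minimal_cycle_curvatures(C3)
```

For the commutative group every such curvature is the identity, while for $S_3$ the values obtained are exactly the identity and the two 3-cycles.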
The base space B could presumably be constructed by grafting together patches from the set $\mathbb{Z}^G$. That is,
manifolds with interesting topologies could be constructed from patches of $\mathbb{Z}^G$ via identification spaces. For
such manifolds, one could study analogues of exterior derivatives.
One could possibly investigate the solutions of boundary value problems in these kinds of discrete fibre
bundles. For example, one might study equations of prescribed curvature. Initial value problems may also
be of some interest.

The curvature for a differentiable fibre bundle is calculated as the limit for a sequence of curves which converge
to a point while staying (roughly speaking) within a fixed tangent plane. This then yields a generator (or
‘infinitesimal translation’) of the structure group for each pair of tangent directions rather than an element
of the group.
Further examples of non-trivial discrete fibre bundles with non-zero curvature may be obtained by tessellating
any Lie group or any differentiable fibre bundle with a connection. Thus by embedding a network of
points and curves joining points inside the base space of a differentiable fibre bundle, the connection on the
differentiable fibre bundle may be approximated by a discrete fibre bundle. (This seems to be related to
the homology of simplicial complexes or polyhedra.) This kind of construction would typically yield a set
of discrete curvature values which is not a finite subgroup of the differentiable structure group, but it would
yield a discrete subgroup. In very special cases, the subgroup could be finite.


Chapter 23
Topological fibre bundles
23.1 History, motivation and overview
23.2 Topological fibrations with intrinsic fibre spaces
23.3 Topological fibrations and fibre atlases
23.4 Fibration identification spaces
23.5 Structure groups discussion
23.6 Topological fibre bundles
23.7 Fibre bundle homomorphisms, isomorphisms and products
23.8 Structure-preserving fibre set maps
23.9 Topological principal fibre bundles
23.10 Associated topological fibre bundles
23.11 Construction of associated topological fibre bundles
23.12 Construction of associated topological fibre bundles via orbit spaces
This chapter deals with topological fibre bundles. Non-topological fibre bundles are defined in Chapter 22.
Differentiable fibre bundles are defined in Chapter 34.
The fundamental motivation for fibre bundles is to provide a structure on which parallelism may be defined.
Parallelism is defined on topological fibre bundles in Chapter 24. Differentiable parallelism on differentiable
fibre bundles, known as a “connection”, is presented in Chapters 35–37.
Although this chapter may be regarded as a subset of Chapter 34, the relative simplicity of topological fibre
bundles makes it easier to examine the fine detail of definitions in depth without the distractions of topics
such as Lie groups and vector fields which arise in the case of differentiable fibre bundles. Therefore many
issues are investigated in this chapter which are skipped over in Chapter 34.
There are broadly three species of fibre bundles: fibrations (also called groupless fibre bundles), ordinary fibre
bundles and principal fibre bundles. Fibrations are ordinary fibre bundles which have no structure group.
Principal fibre bundles are ordinary fibre bundles for which the structure group and the fibre space are the
same. (These concepts are more fully explained in Section 23.1.) Additional topics include parallelism on
fibre bundles and associations between ordinary and principal fibre bundles.
23.0.1 Remark: For brevity, OFB is short for “ordinary fibre bundle”, and PFB is short for “principal
fibre bundle”. These are non-standard abbreviations.
23.1. History, motivation and overview

23.1.1 Remark: It seems that fibre bundles as a distinct concept were first defined by Eduard Ludwig
Stiefel in 1936 in the context of pathwise parallelism in manifolds. (See EDM2 [35], 147.A.) [ Should be able
to dig up more history than this! ]
23.1.2 Remark: Fibre bundles are designed to support parallel transport in the same sense that roads are
designed to support car, truck and bus transport.


23.1.3 Remark: The “elevated” language of fibre bundles provides a level of abstraction which can be quite
annoying to people who want to understand differential geometry with the minimum of fuss. Unfortunately,
the fibre bundle language is widely used and cannot be avoided.
Roughly speaking, a fibre bundle is an attachment of multiple copies of a “fibre space” at the points of a
manifold, one copy being attached to each point in the space. (See Figure 23.2.1.) The aggregate set of
pointwise copies of the fibre space may be given structures such as a topology, a differentiable structure
and a transformation group structure. Examples of fibre bundles include tangent spaces, tensor spaces and
spaces of k-tuples of tangent vectors on differentiable manifolds.
The relevance of fibre bundles in physics arises from the fact that field theories attach various scalar, vectorial
and tensorial objects to each point in space-time. (The word “field” in physics means a function of space
or space-time. This is quite different to the mathematical “field” in Definition 9.8.8, which is an algebraic
structure.) Physical fields are “cross-sections” of fibre bundles. Thus physical fields live inside fibre bundles.
A short, snappy slogan to explain fibre bundles is that “fibre bundles are the spaces where physical fields
live”. Some physics-oriented texts on differential geometry mention no sets at all. It is possible to do
mathematics without identifying which set each object “lives in”. But clear, unambiguous thinking is very
much assisted by associating each mathematical object with a containing set.
The structures of fibre bundles are used in the formulation of the equations of motion of the fields. These
structures add a “horizontal” component to fibre bundles in addition to the “vertical” structure within each
pointwise fibre set. (See Figure 23.2.3.) The whole point of fibre bundles is to specify the horizontal structure
so that the fibre elements at nearby points in the manifold are related in some way instead of being simply
isolated copies of the fibre space with no relation to neighbouring fibres. The horizontal structure is essential
for defining “covariant derivatives” of physical fields. These are derivatives which, roughly speaking, measure
the rate of change over space or space-time of a field with respect to the horizontal structure on the fibre
bundle.
In summary, a fibre bundle may be characterized as “one copy of a fibre space F at each point of a base
space B, plus some horizontal structure to specify relations between the copies of F at different points of B”.
The horizontal relations may be visualized as a kind of glue attaching fibres at neighbouring points.
23.1.4 Remark: A fibration has a “base set” B to which is attached a “fibre set” $E_b$ at each $b \in B$. These
fibre sets are pairwise disjoint. Their union $E = \bigcup_{b \in B} E_b$ is called the “total space” of the fibration. The
function $\pi : E \to B$ which maps points in each fibre set $E_b$ to its base point b is called the “projection map”
of the fibration. It follows that $E_b = \pi^{-1}(\{b\})$ for all $b \in B$. (See Figure 22.1.1.) The customary notation
“E ” for the total space is most likely mnemonic for the French “entier” which means “whole”, “entire” or
“total”.
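The set-level content of this remark can be checked on a small finite example. The following Python sketch is only an illustration under assumed names: `projection`, `fibre` and the sample points are hypothetical and do not come from this book. It builds the fibre sets as preimages of a projection map and confirms that they are pairwise disjoint with union E. No topology is involved.

```python
# Finite, non-topological illustration: a projection map pi : E -> B
# stored as a dict from total-space points to base points (assumed data).
projection = {
    "e1": "b1", "e2": "b1",
    "e3": "b2", "e4": "b2",
    "e5": "b3", "e6": "b3",
}
E = set(projection)            # total space
B = set(projection.values())   # base set (pi is surjective by construction)


def fibre(b):
    """Return the fibre set E_b = pi^{-1}({b})."""
    return {e for e, base in projection.items() if base == b}


fibres = {b: fibre(b) for b in B}

# The fibre sets are pairwise disjoint and their union is E.
pairwise_disjoint = all(
    fibres[b1].isdisjoint(fibres[b2])
    for b1 in B for b2 in B if b1 != b2
)
union_is_E = set().union(*fibres.values()) == E
```

Both checks succeed for any dict, which reflects the fact that the partition property belongs to inverse images of functions in general and not to fibrations in particular.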
A typical fibre bundle is the set of tangent vectors at points of a manifold. (See Section 27.8 for total tangent
spaces.) The set of tangent vectors is the total space of a fibration because each tangent vector is attached
to a unique point of the manifold. (Crampin/Pirani [12], chapter 14, page 353, give a succinct motivation
for fibre bundles as a generalization of tangent bundles.)
A topological fibration has a total space E and base space B which are topological spaces such that the
projection map π : E → B is continuous. The fibre sets Eb = π −1 ({b}) are required to be homeomorphic
to each other in the relative topology of E. The full definition of fibrations (or “groupless fibre bundles”) is
given in Sections 23.2 and 23.3.
23.1.5 Remark: Definitions of parallelism are required to preserve the structure of fibre sets at different
points of a base space. The structure and symmetry of fibre sets are specified in terms of a “structure group”.
A fibre bundle is a fibration for which a structure group is specified. Fibre bundles with structure groups
(or “ordinary fibre bundles”) are defined in Section 23.6. Unless otherwise stated, a fibre bundle will usually
mean an ordinary fibre bundle, but sometimes it means any of the three species of fibre bundles.
23.1.6 Remark: Principal fibre bundles, defined in Section 23.9, are fibre bundles for which the structure
group and fibre space are the same, but additionally a “right action” map is defined. This is used in definitions of
parallelism.
23.1.7 Remark: Figure 23.1.1 is a comparative sketch of the maps and spaces which are required in the
definitions of fibrations, fibre bundles and principal fibre bundles.

[Diagram of maps and spaces in three panels, left to right: fibration; (ordinary) fibre bundle; principal fibre bundle.]
Figure 23.1.1 Fibrations, fibre bundles and principal fibre bundles
Each of these three species of fibre bundle is built on a base space B with a fibre space F (or G). There
are projection maps π or q and local charts φ mapping the total space E or P to the fibre space so that
π × φ or q × φ is a local homeomorphism. A fibration has no structure group G, whereas the fibre space F
of a principal fibre bundle is the same as the structure group G. The principal fibre bundle also has a right
action Rg on the total space P for each g ∈ G. (Any similarity between the two fibre bundle figures on the
right and the Trojan rabbit in Scene 9 of “Monty Python and the Holy Grail” (1974) is purely coincidental.
See Chapman et alia [197], page 27.)
23.1.8 Remark: There are many similarities between the definitions for topological manifolds and topo-
logical fibre bundles because their global structure may be expressed in terms of an atlas. Charts and atlases
are optional for topological manifolds and topological fibrations, but they are required for topological fibre
bundles which have a structure group. The following table summarizes the relations between some basic
definitions for topological manifolds and fibre bundles.
concept            topological manifold   topological fibration   topological fibre bundle
basic definition   Definition 25.3.1      Definition 23.2.1       Definition 23.6.4
chart              Definition 25.4.2      Definition 23.3.4       (Definition 23.3.4)
compatible chart   (Definition 25.4.2)    (Definition 23.3.4)     Definition 23.6.14
atlas              Definition 25.4.6      Definition 23.3.14      (Definition 23.6.4)
equivalent atlas   (Definition 25.4.6)    (Definition 23.3.14)    Definition 23.6.16
The author has tried as much as possible to harmonize the definitions for manifolds and fibre spaces.
23.1.9 Remark: Concerning the specification and parametrization of fibre bundles, one may compare the
usual practice for fibre bundles with that for manifolds. One does not usually include in the specification
tuple for a topological n-dimensional manifold the set $\mathbb{R}^n$ and the topology on $\mathbb{R}^n$. In the case of a $C^k$
differentiable manifold, one does not include the pseudogroup of local diffeomorphisms of $\mathbb{R}^n$ as part of
the specification tuple. These structures are assumed to exist outside the manifold. An atlas specifies the
relation of the manifold to the external Euclidean space which possesses the topological and/or differentiable
structure. Oddly, in the case of fibre bundles, the usual practice is to include the structure group and fibre
space in the specification tuple.
23.1.10 Remark: On the subject of terminology, it would be sensible to call a fibre set Eb a “fibre” at b,
but some texts use the word “fibre” for an extrinsic fibre space, which is defined in Definition 23.3.3.
It is tempting to think of each member of the fibre set Eb as a fibre, but the English-language words “thread”
and “filament” perhaps better describe elements of the fibre set, while the fibre (set) can be thought of as
consisting of many individual filaments spun together. So a fibre is more like rope or knitting wool.

23.2. Topological fibrations with intrinsic fibre spaces

Fibrations are groupless fibre bundles, but topological fibrations are the same as fibre bundles which have
a minimal structure group, namely the group of topological automorphisms of the fibre space. Therefore
fibrations have an implicit structure group. However, topological fibre bundles without an explicit structure
group are used so frequently that it is useful to have a special word for them. [ A candidate topology for the
automorphism group might be the direct product topology, which corresponds to pointwise convergence. ]
Groupless fibre bundles are called “fibrations” by Crampin/Pirani [12], page 353, although they define only
C ∞ differentiable fibrations. The Gallot/Hulin/Lafontaine [20] definition 1.91 of a topological fibre bundle is
groupless, but the authors immediately proceed to rename such fibre bundles to “fibrations” without warning.
The definition of a fibration in EDM2 [35], 148.B, requires the projection map to have a “covering homotopy
property” (and they define “fibration” and “fibre space” to be synonyms). The EDM2 [35] fibration definition
is an algebraic topology concept which is totally different to the definition in this book. It would probably
be safest to always use the term “groupless fibre bundle” instead of “fibration”. But the former term is
too long. Therefore the terminology of Crampin/Pirani [12] is adopted in this book: “fibration” will mean
a groupless fibre bundle, either non-topological (Section 22.1), topological or differentiable. The algebraic
topology usage will be ignored.
A fibration (Definition 23.2.1) is specified by a triple (E, π, B), where E and B are topological spaces and
π : E → B is a continuous function. For each b ∈ B, the set Eb = π −1 ({b}) is thought of as “the fibre at b”.
Each fibre may be pictured as attached to b as its base point. (See Figure 23.2.1.)
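[Diagram: the total space E, partitioned into the fibres $\pi^{-1}(\{b_1\})$, $\pi^{-1}(\{b_2\})$ and $\pi^{-1}(\{b\})$, with the projection map π down to the base space B containing the points $b_1$, $b_2$ and b.]
Figure 23.2.1 Partition of total space into fibres by projection map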
The fibres π −1 ({b}) are clearly disjoint subsets of E. In fact, they constitute a partition of E. (This
partitioning is a general property of inverses of functions which has nothing to do with continuity or topology,
of course. See Remark 6.9.7 for non-topological fibre bundles.) Although π is defined from E to B, it should
be thought of as a one-to-many map from B to E. The map from B to E is defined as the inverse of a
many-to-one function. But one should think in terms of a set of disjoint subsets of E, one attached to each
point of B.
The space B is called the “base space” of the fibre bundle, E is called the “total space”, and π is called the
projection map. The fibres π −1 ({b}) are required to be topologically equivalent (i.e. homeomorphic) to each
other, and they must vary continuously in some sense with respect to the base point b. This continuity is
expressed through local “fibre charts”. Since the fibres at all points of B are topologically equivalent, it is
usual to specify a separate topological space F such that π −1 ({b}) ≈ F for all b ∈ B, and F is nominated
as “the” fibre space for the fibre bundle. The choice of F is clearly arbitrary up to homeomorphism. For
clarity, spaces of the form π −1 ({b}) may be referred to as an “intrinsic fibre space”, whereas a set such as F
may be referred to as an “extrinsic fibre space”.
Definition 23.2.1 is a slightly unconventional definition for a fibre bundle. The standard definitions have an
extrinsic fibre space F such that locally the total space E is homeomorphic to the cross product of the base
space B with F .
Given a topological fibration (E, π, B) with an intrinsic fibre space according to Definition 23.2.1, the choice
of an extrinsic fibre space F is arbitrary up to homeomorphism. Therefore it is best not to include a
particular choice of F in the definition. However, in practice most people want a particular choice of fibre
space. Therefore topological fibre bundles are formulated here in the following three steps.

(1) Definition 23.2.1 defines a fibration with an intrinsic fibre space;

(2) Definition 23.3.7 defines a fibration with an extrinsic fibre space;
(3) Definition 23.3.14 defines fibre atlases for fibrations with an extrinsic fibre space.
For practical purposes a fibre atlas is very useful, but for the formal definition it is not essential except when
a structure group is specified.
23.2.1 Definition: A topological fibration with intrinsic fibre space is a tuple $(E, \pi, B) \mathrel{-\!<} (E, T_E, \pi, B, T_B)$
such that
(i) $E \mathrel{-\!<} (E, T_E)$ and $B \mathrel{-\!<} (B, T_B)$ are topological spaces,
(ii) $\pi : E \to B$ is continuous,
(iii) $\forall b \in B,\ \exists U \in \mathrm{Top}_b(B),\ \exists \phi : \pi^{-1}(U) \to \pi^{-1}(\{b\}),\ \pi \times \phi : \pi^{-1}(U) \approx U \times \pi^{-1}(\{b\})$,
(iv) $\forall b_1, b_2 \in B,\ \pi^{-1}(\{b_1\}) \approx \pi^{-1}(\{b_2\})$.
E is called the total space of (E, π, B).
π is called the projection map of (E, π, B).
B is called the base space of (E, π, B).
For any b ∈ B, the set π −1 ({b}) is called the fibre (set) of (E, π, B) at b.
The maps φ are called intrinsic fibre charts for (E, π, B).
23.2.2 Remark: The spaces and maps in Definition 23.2.1 are illustrated in Figure 23.2.2. Recall that
Topb (B) denotes the set of all open neighbourhoods of b ∈ B. (See Notation 14.3.11.) See Definition 6.9.12
for pointwise direct products of functions such as π × φ.
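[Diagram: the chart $\phi : \pi^{-1}(U) \to \pi^{-1}(\{b\})$, the projection $\pi : \pi^{-1}(U) \to U$, and the product map $\pi \times \phi : \pi^{-1}(U) \to U \times \pi^{-1}(\{b\}) \subseteq B \times \pi^{-1}(\{b\})$, where $\pi^{-1}(U) \subseteq E$ and $U \subseteq B$.]
Figure 23.2.2 A map φ for an intrinsic fibre space $\pi^{-1}(\{b\})$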
Definition 23.2.1 (iii) implies by Theorem 15.1.9 that π −1 ({b1 }) ≈ π −1 ({b2 }) for all b1 , b2 ∈ U . In the case
of a connected topology on B, this implies condition (iv). So condition (iv) is superfluous in the case that
B is connected.
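The step from the local condition (iii) to the global condition (iv) can be spelled out as follows. This is a sketch of the reasoning only; the relation symbol $\sim$ is a label introduced here for the purpose and is not notation from this book.

```latex
% Sketch: condition (iii) implies condition (iv) when B is connected.
Let $\phi : \pi^{-1}(U) \to \pi^{-1}(\{b\})$ be an intrinsic fibre chart at $b$.
For each $b' \in U$, the homeomorphism $\pi \times \phi$ maps the fibre over $b'$
onto the slice $\{b'\} \times \pi^{-1}(\{b\})$, so
\[
  \pi^{-1}(\{b'\}) \;\approx\; \{b'\} \times \pi^{-1}(\{b\}) \;\approx\; \pi^{-1}(\{b\}).
\]
Define $b_1 \sim b_2$ to mean $\pi^{-1}(\{b_1\}) \approx \pi^{-1}(\{b_2\})$.
This is an equivalence relation on $B$, and by the display above the class
of each point contains an open neighbourhood of that point. Hence every
equivalence class is open, and the classes partition $B$ into open sets.
If $B$ is connected, there is at most one non-empty class, so
\[
  \forall b_1, b_2 \in B, \quad \pi^{-1}(\{b_1\}) \approx \pi^{-1}(\{b_2\}).
\]
```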
Since the fibres π −1 ({b}) are pairwise homeomorphic, any topological space F which is homeomorphic to one
such fibre is homeomorphic to all of them. The space F is therefore uniquely defined up to homeomorphism
and is uniform over all of B.
Definition 23.2.1 (iv) means that the fibres of (E, π, B) are globally uniform. This is reminiscent of the
requirement that a manifold have the same dimension at all points. Just as a manifold could be alternatively
defined to have different dimensions on different components, so a fibration or fibre bundle could be defined
to have different fibre spaces on different components of the base space. But the benefits of the increased
generality would not outweigh the nuisance.
23.2.3 Remark: Trivial examples are often useful for checking the basic sanity of definitions and theorems.
The topological space (B, TB ) in Definition 23.2.1 could be as trivial as the space (∅, {∅}), which implies that
E = ∅ and π = ∅. This satisfies all of the conditions of Definition 23.2.1. This example could be referred to
as the trivial or empty topological fibration.
Another kind of triviality occurs if (E, TE ) = (∅, {∅}) for an arbitrary topological space (B, TB ). Then π = ∅,
and condition (iii) is satisfied by U = B and φ = ∅ for all b ∈ B.
A slightly less trivial example is where $(E, T_E) = (B, T_B)$ for any topological space $(B, T_B)$ and π is the
identity $\mathrm{id}_E$ on E. Then $\pi^{-1}(\{b\}) = \{b\}$ for all $b \in B$ and condition (iii) is satisfied by U = B and $\phi : b' \mapsto b$
for all $b' \in B$. (The discovery of further trivial examples is left as an exercise for the interested student.
Everybody else may go home when they’ve tidied their desks.)
23.2.4 Remark: A map of the form φ : π −1 (U ) → π −1 ({b}) maps the fibres at all points in U to the fibre
at b. This map is continuous. To clarify this, let F = Eb = π −1 ({b}) and consider the homeomorphism
ψ = (π × φ)−1 : U × F ≈ π −1 (U ) projected onto a fixed f ∈ F . Define ψf : U → π −1 (U ) ⊆ E by
ψf (b) = ψ(b, f ) = (π × φ)−1 (b, f ). This is illustrated in Figure 23.2.3.
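[Diagram: the fibres $\pi^{-1}(\{b_1\})$, $\pi^{-1}(\{b_2\})$ and $\pi^{-1}(\{b\})$ in E over the points $b_1$, $b_2$ and b of B, with the points $\psi_f(b_1)$, $\psi_f(b_2)$ and f marked on the respective fibres.]
Figure 23.2.3 Continuity of local chart with respect to base point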
The function $\psi_f$ is continuous because $\pi \times \phi$ is a homeomorphism. So the point in each fibre $\pi^{-1}(\{b'\})$ which
corresponds to the fixed $f \in F$ varies continuously with respect to $b' \in U \subseteq B$. This may be thought of as a
“gluing together” of the fibre sets by the maps φ. There are many ways to glue the fibre sets together, and
this is what makes a fibration more than a mere set product of two topological spaces. One might say that
the fibre sets are “continuously connected”. (This should not be confused with connections on differentiable
fibre bundles, which are definitions of differential parallelism. In fact, the maps φ could define a kind of local
parallelism, but this is usually completely irrelevant to any actual connection which is defined.)
The way in which a fibre bundle provides “glue” for the fibre sets is particularly clear in the case of a Möbius
strip where it is not possible to construct a continuous curve which passes through each fibre set exactly
once. This implies that it is not possible to cover a Möbius strip with a single fibre chart.
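The “glue” on a Möbius strip can be made concrete with two fibre charts and their transition function. The following Python sketch rests on an assumed coordinate model which is not taken from this book: the base circle is [0, 1) with 0 and 1 identified, the fibre coordinate lies in (−1, 1), and the twist is placed at s = 0. All function names are hypothetical, and continuity is argued in the comments rather than tested.

```python
# Assumed model of an open Moebius strip: points are pairs (s, t) with
# s in [0, 1) on the base circle (0 and 1 identified) and t in (-1, 1).
# The twist is placed at s = 0: approaching s = 1 from below with fibre
# coordinate t gives the same limit as approaching s = 0 from above with -t.


def pi(point):
    """Projection map onto the base circle."""
    s, _t = point
    return s


def phi1(point):
    """Fibre chart over U1 = base circle minus {0}; it avoids the twist."""
    s, t = point
    if s == 0:
        raise ValueError("phi1 is not defined over s = 0")
    return t


def phi2(point):
    """Fibre chart over U2 = base circle minus {1/2}; it crosses the twist,
    so it flips the sign on one side to stay continuous at s = 0."""
    s, t = point
    if s == 0.5:
        raise ValueError("phi2 is not defined over s = 1/2")
    return t if s < 0.5 else -t


def psi2(s, f):
    """Inverse of pi x phi2: the point over s with phi2-coordinate f."""
    return (s, f if s < 0.5 else -f)


def transition_12(s, f):
    """Fibre part of (pi x phi1) o (pi x phi2)^{-1} on the overlap of U1 and U2."""
    return phi1(psi2(s, f))
```

In this model the overlap of U1 and U2 has two components. The transition is the identity on (0, 1/2) and the flip f ↦ −f on (1/2, 1), which is the kind of gluing that the remark above says a single fibre chart cannot express.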
23.2.5 Remark: Definition 23.2.1 of a fibration as a projection map π : E → B for topological spaces
E and B is concise and tidy, but it has a problem: in many important cases, the set E is
difficult to specify directly. For example, consider the tangent fibration of a sphere. The tangent fibration
of $S^n$ is not globally equivalent as a fibration to the topological product space $S^n \times \mathbb{R}^n$ for any n ≥ 2 other than n = 3 and n = 7. It is hard
work determining constructions for tangent fibrations from standard topological spaces in terms of product
topologies, quotient topologies, relative topologies and so forth. In fact, the simplest and most general way
to construct tangent fibrations out of standard topologies (such as $\mathbb{R}^n$, $S^n$ and projective spaces) is to use
“grafting”. This construction technique is justified by Theorem 15.10.5 and Definition 15.10.6. The reason
that this kind of grafting of patches produces tangent fibrations so easily is the fact that a manifold is defined
to be locally homeomorphic to a Euclidean space, which implies that manifolds are globally homeomorphic
to a graft of Euclidean space patches.
23.2.6 Remark: A non-trivial example of a topological fibration is the set of local coordinates on $S^2$. In
a neighbourhood of each point on a 2-sphere, it is straightforward to set up a homeomorphism between the
local coordinates at each point in the neighbourhood and the given point. However, this map cannot be
extended as a homeomorphism to the tangent space for the whole of the sphere. Therefore the tangent space
of the sphere is not globally homeomorphic to the simple topological product of $S^2$ with the fibre space at
any point.
23.3. Topological fibrations and fibre atlases

This section adds fibre spaces, fibre charts and fibre atlases to the fibrations with intrinsic fibre spaces defined
in Section 23.2. The specification of an external fibre space for an intrinsic fibration (Definition 23.2.1)
yields the topological fibration with extrinsic fibre space in Definition 23.3.7. The addition of a fibre atlas
to Definition 23.3.7 yields the topological fibration with an atlas in Definition 23.3.17.

23.3.1 Remark: Contrary to traditional practice, the fibre space for a fibration is not included in specification
tuples in this book. Just as the set $\mathbb{R}^n$ is not usually included in the specification tuple for a manifold,
so it is also undesirable to include fibre spaces (and structure groups) in fibre bundle specification tuples.
This is partly because inclusion of fibre spaces (and structure groups) makes the tuples excessively long, but
mainly because these parameters are part of the class specification rather than the object specification. The
principle here is that class specifiers should not be included in object specifiers.
23.3.2 Remark: The spartan simplicity of topological fibrations with intrinsic fibre spaces in Section 23.2
is inconvenient in practice. Extrinsic fibre spaces are introduced in Definition 23.3.3. By Definition 23.2.1,
a fibre space is homeomorphic to all fibre sets π −1 ({b}) of a topological fibration if and only if it is homeo-
morphic to just one such fibre set. (A trivial exception is the empty fibration alluded to in Remark 23.2.3
where B = ∅ and therefore (F, TF ) = (∅, {∅}) is the fibre space.)
23.3.3 Definition: A fibre space or standard fibre for a topological fibration (with intrinsic fibre space)
(E, π, B) is any topological space F such that F ≈ π −1 ({b}) for all b ∈ B.
23.3.4 Definition: A fibre chart for a fibre space F for a topological fibration (with intrinsic fibre space)
(E, π, B) is any function φ : π −1 (U ) → F such that π × φ : π −1 (U ) ≈ U × F for some U ∈ Top(B). (See
Figure 23.3.1.)
φ
π −1 (U ) ⊆ E F
π
×
π
φ
U ⊆B U ×F ⊆B×F
Figure 23.3.1 Fibre chart φ for an extrinsic fibre space F
23.3.5 Remark: Many texts, such as EDM2 [35], 147.B, define fibre charts for fibrations and fibre bundles
in the inverse direction to that specified in Definition 23.3.4. They define fibre charts to have the form
$\psi : U \times F \approx \pi^{-1}(U)$. The relations between the two styles of definitions are $\pi \times \phi = \psi^{-1}$ and $\phi = \pi_2 \circ \psi^{-1}$,
where $\pi_2 : B \times F \to F$ denotes the projection $\pi_2 : (b, f) \mapsto f$. The fibre-to-total-space form of chart ψ is less
economical in the sense that an additional condition is required, namely that π(ψ(b, f )) = b for all b ∈ U .
Both Kobayashi/Nomizu [27], page 50, and Crampin/Pirani [12], page 354, define fibre charts in the same
direction (total space to standard fibre) as here. [ See also Gallot/Hulin/Lafontaine [20], 1.91, page 31. ]
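The relations between the two chart directions can be checked on a toy example. In the following Python sketch everything is an assumption made for illustration: the total space is taken to be pairs of integers, the projection is the first coordinate, and the chart is a shear. Integers are used so that the equality checks are exact. Only the algebraic identities are tested; continuity plays no role here.

```python
# Assumed example: E = Z x Z with projection onto the first coordinate,
# fibre space F = Z, and a sheared chart. Integers keep the checks exact.


def pi(e):
    """Projection map pi : E -> B."""
    x, _y = e
    return x


def phi(e):
    """Total-space-to-fibre chart (the direction used in Definition 23.3.4)."""
    x, y = e
    return y - x * x


def psi(b, f):
    """Fibre-to-total-space chart psi : U x F -> pi^{-1}(U)."""
    return (b, f + b * b)


def pi_times_phi(e):
    """Pointwise direct product pi x phi : e -> (pi(e), phi(e))."""
    return (pi(e), phi(e))


samples = [(b, f) for b in range(-3, 4) for f in range(-3, 4)]

# pi x phi = psi^{-1}: the two maps undo each other in both directions.
inverse_ok = all(pi_times_phi(psi(b, f)) == (b, f) for b, f in samples) and all(
    psi(*pi_times_phi(e)) == e for e in samples
)
# phi = pi_2 o psi^{-1}, i.e. phi(psi(b, f)) = f.
second_component_ok = all(phi(psi(b, f)) == f for b, f in samples)
# The extra condition needed for the psi-style definition: pi(psi(b, f)) = b.
extra_condition_ok = all(pi(psi(b, f)) == b for b, f in samples)
```

The last check is the additional condition mentioned above. It has to be imposed separately on ψ, whereas it holds automatically when the chart is given as φ and combined with π.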
23.3.6 Remark: The fact that there is no distinction between a fibre chart and a compatible fibre chart
in Definition 23.3.4 is due to the fact that any chart which is compatible with the topology on E and B
must also be compatible with all other such charts since the requirement for chart compatibility is purely
topological. (This is explained in more detail in Remark 23.3.15.)
Therefore all fibre atlases for topological fibrations are equivalent. So the specification of a fibre atlas is
optional for topological fibrations. Definition 23.3.7 shows that topological fibrations may be defined in
terms of local homeomorphisms rather than a global atlas. This is completely analogous to Definition 25.3.1
which defines a topological manifold without an atlas.
23.3.7 Definition: A topological fibration with fibre space F for a topological space $F \mathrel{-\!<} (F, T_F)$ is a tuple
$(E, \pi, B) \mathrel{-\!<} (E, T_E, \pi, B, T_B)$ such that
(i) $E \mathrel{-\!<} (E, T_E)$ and $B \mathrel{-\!<} (B, T_B)$ are topological spaces,
(ii) $\pi : E \to B$ is continuous,
(iii) $\forall b \in B,\ \exists U \in \mathrm{Top}_b(B),\ \exists \phi : \pi^{-1}(U) \to F,\ \pi \times \phi : \pi^{-1}(U) \approx U \times F$.

23.3.8 Definition: A cross-section over a set $B' \subseteq B$ for a topological fibration (E, π, B) is a continuous
function $f : B' \to E$ such that $\pi \circ f = \mathrm{id}_{B'}$.
A cross-section of a topological fibration (E, π, B) is a cross-section over B.
23.3.9 Notation: X(E, π, B) for a topological fibration (E, π, B) denotes the set of all cross-sections
of (E, π, B).
23.3.10 Remark: In Definition 23.3.8, a cross-section over a subset $B'$ of the base space B of a fibration
(E, π, B) represents a choice of fibre element $f(b) \in E_b = \pi^{-1}(\{b\})$ for each $b \in B'$ because $\pi \circ f = \mathrm{id}_{B'}$.
More generally, the right inverse f of any surjective function g : X → Y represents the choice of an element
f (y) ∈ g −1 ({y}) for all y ∈ Y . A cross-section is really just a continuous right inverse of the projection map.
This has nothing at all to do with any structure group. The choice of fibre space for the fibration is not
directly involved either.
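The right-inverse condition is easy to test on a finite example. The following Python sketch uses assumed data and a hypothetical helper `is_cross_section`; none of these names come from this book. Continuity is ignored, so only the condition π ∘ f = id is checked.

```python
# Assumed finite example: the projection is a dict from points of E to B.
projection = {
    ("b1", 0): "b1", ("b1", 1): "b1",
    ("b2", 0): "b2", ("b2", 1): "b2",
    ("b3", 0): "b3", ("b3", 1): "b3",
}


def is_cross_section(f, subset):
    """Check pi o f = id on `subset`, i.e. f(b) lies in the fibre over b."""
    return set(f) == set(subset) and all(projection[f[b]] == b for b in subset)


# A choice of one fibre element over each point of the subset {b1, b2}.
f_good = {"b1": ("b1", 0), "b2": ("b2", 1)}
# Not a cross-section: the value at b2 lies in the fibre over b3.
f_bad = {"b1": ("b1", 0), "b2": ("b3", 1)}
```

Neither the fibre space nor any structure group appears in the check, which matches the point made in the remark.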
It turns out that cross-sections are very important in the study of fibre bundles and differential geometry
in general. For example, parallel transport is represented as cross-sections over paths in the base space B,
and all of the fields of physics are represented as cross-sections over regions of the base space. Cross-sections
are also an important tool for investigating the global topology of the base space. A useful way to think of
cross-sections is as “fibre fields” by analogy with vector fields in physics.
23.3.11 Remark: Terminology for cross-sections is varied. Some authors (e.g. Crampin/Pirani [12]) hy-
phenate to “cross-section” and give “section” as an alternative. Other authors (e.g. Darling [14], Gal-
lot/Hulin/Lafontaine [20], Lang [31], Lee [33]) use only “section”. Some (e.g. EDM2 [35], Kobayashi/
Nomizu [27]) use only “cross section”. Some (e.g. Frankel [19], Spivak [43]) use principally “section”, but
sometimes “cross section”.
The best practical choice of name may be “cross-section”. This avoids confusion with the word “section” as
in “chapters and sections”. It also avoids the line breaks that may occur with the two words “cross section”,
which would make string search on computer files difficult. In casual writing and discussion, probably
“section” has no disadvantages, but in book-writing, the hyphenated “cross-section” seems best.
23.3.12 Remark: Since fibre charts $\phi : \pi^{-1}(U) \to F$ for a fibre space F define homeomorphisms $\pi \times \phi :
\pi^{-1}(U) \to U \times F$, the continuity of a cross-section in Definition 23.3.8 is equivalent to continuity “through
the fibre charts”. In other words, a cross-section is continuous if and only if the map $\phi \circ f\big|_U : B' \cap U \to F$ is
continuous for all fibre charts φ. However, if the set of fibre charts is constrained to a specified atlas (as in
the case of a fibre bundle with a structure group), this does not in any way constrain the class of continuous
cross-sections.
23.3.13 Remark: The fibre charts in Definition 23.3.4 are expressed in terms of the arbitrary extrinsic
fibre spaces in Definition 23.3.3. Since the choice of fibre space is arbitrary up to homeomorphism, fibre
charts could in principle be targeted at more than one fibre space. To avoid such chaos, fibre atlases in
Definition 23.3.14 are required to be targeted at a fixed choice of fibre space.
Although atlases are defined here in terms of a pre-defined topology on the fibration, this logic is the reverse
to usual practice. It is more usual to define the atlas first and then define the topology so that all of the
charts in the atlas are homeomorphisms.
23.3.14 Definition: A fibre atlas for a fibre space F for a topological fibration (E, π, B) is a set $A^F_E$ of
fibre charts for the fibre space F for (E, π, B) such that $\bigcup_{\phi \in A^F_E} \mathrm{Dom}(\phi) = E$.
An indexed fibre atlas for fibre space F for a topological fibration (E, π, B) is a family $(\phi_i)_{i \in I}$ of fibre charts
for fibre space F for (E, π, B) such that $\bigcup_{i \in I} \mathrm{Dom}(\phi_i) = E$. (That is, Range(φ) is a fibre atlas for F .)
23.3.15 Remark: The chart transition function for charts $\phi_i$ and $\phi_j$ in an indexed fibre atlas in Definition 23.3.14 is
\[ \big( (\pi \times \phi_i) \circ (\pi \times \phi_j)^{-1} \big)\big|_{(U_i \cap U_j) \times F} : (U_i \cap U_j) \times F \to (U_i \cap U_j) \times F. \]
This is necessarily continuous by Definition 23.3.4. (See Figure 23.3.2.) Therefore it is unnecessary to add
any subsidiary condition on the regularity of transition functions to Definitions 23.3.14 and 23.3.17. As in the

case of manifolds, it is necessary to equip a fibration or ordinary fibre bundle with an atlas with prescribed
regularity only in the case of higher classes of regularity than mere continuity. Similarly, it is not necessary to
define the notion of equivalent atlases, since all fibre charts on a fixed fibration are automatically compatible.
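[Diagram: the charts $\phi_i : \pi^{-1}(U_i) \to F$ and $\phi_j : \pi^{-1}(U_j) \to F$, the maps $\pi \times \phi_i$ and $\pi \times \phi_j$ from $\pi^{-1}(U_i \cap U_j)$ into $(U_i \cap U_j) \times F \subseteq B \times F$, the projection π onto $U_i \cap U_j \subseteq B$, and the transition map $\big( (\pi \times \phi_i) \circ (\pi \times \phi_j)^{-1} \big)\big|_{(U_i \cap U_j) \times F}$ between the two copies of $(U_i \cap U_j) \times F$.]
Figure 23.3.2 Chart transition map for a fibration with fibre space F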
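The transition map of Remark 23.3.15 can be expanded pointwise to show that it leaves the base coordinate fixed. The following is a sketch only; the symbols $\psi_j$ and $g_{ij}$ are labels introduced here and are not notation from this section.

```latex
% Sketch: pointwise form of the chart transition map.
Let $\psi_j = (\pi \times \phi_j)^{-1} : U_j \times F \to \pi^{-1}(U_j)$,
so that $\pi(\psi_j(b, f)) = b$ and $\phi_j(\psi_j(b, f)) = f$.
Then for all $b \in U_i \cap U_j$ and $f \in F$,
\[
  \big( (\pi \times \phi_i) \circ (\pi \times \phi_j)^{-1} \big)(b, f)
  = \big( \pi(\psi_j(b, f)),\ \phi_i(\psi_j(b, f)) \big)
  = \big( b,\ g_{ij}(b)(f) \big),
\]
where $g_{ij}(b) : F \to F$ is defined by $g_{ij}(b)(f) = \phi_i(\psi_j(b, f))$.
Each $g_{ij}(b)$ is a homeomorphism of $F$ with inverse $g_{ji}(b)$,
because exchanging $i$ and $j$ gives the inverse transition map.
```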
23.3.16 Remark: Definition 23.3.17 is an alternative to Definition 23.3.7. For the theoretical development,
the atlas-free Definition 23.3.7 is often preferable, whereas for practical applications it is often preferable to
have the atlas as in Definition 23.3.17.
23.3.17 Definition: A topological fibration with a fibre atlas for the fibre space F for a topological space
F is a tuple $(E, \pi, B) \mathrel{-\!<} (E, T_E, \pi, B, T_B, A^F_E)$ such that
(i) $E \mathrel{-\!<} (E, T_E)$ and $B \mathrel{-\!<} (B, T_B)$ are topological spaces and $\pi : E \to B$ is continuous,
(ii) $\forall \phi \in A^F_E,\ \exists U_\phi \in T_B,\ \pi \times \phi : \pi^{-1}(U_\phi) \approx U_\phi \times F$,
(iii) $\bigcup_{\phi \in A^F_E} U_\phi = B$.
E
23.3.18 Remark: Since the charts $\phi \in A^F_E$ in Definition 23.3.17 are homeomorphisms, the topologies $T_E$
and $T_B$ may be discarded without losing information. This is analogous to the fact that the atlas on a
differentiable manifold makes the specification of the topology unnecessary. So a topological fibration could
be specified as $(E, \pi, B, A^F_E)$ without loss of information. In fact, the sets E and B (and the sets G and F )
can be discarded too.
23.3.19 Remark: Numerous standard constructions may be defined for topological fibrations. For exam-
ple, the direct product fibration in Definition 23.3.20 is based on Definition 23.3.7.
23.3.20 Definition: The direct product of two topological fibrations $(E_1, \pi_1, B_1) \mathrel{-\!<} (E_1, T_{E_1}, \pi_1, B_1, T_{B_1})$
and $(E_2, \pi_2, B_2) \mathrel{-\!<} (E_2, T_{E_2}, \pi_2, B_2, T_{B_2})$ with fibre spaces $F_1 \mathrel{-\!<} (F_1, T_{F_1})$ and $F_2 \mathrel{-\!<} (F_2, T_{F_2})$ respectively
is the topological fibration $(E, \pi, B) \mathrel{-\!<} (E, T_E, \pi, B, T_B)$ with fibre space $F \mathrel{-\!<} (F, T_F)$ which satisfies the
following.
(i) (E, TE ) = (E1 , TE1 ) × (E2 , TE2 ), (B, TB ) = (B1 , TB1 ) × (B2 , TB2 ) and (F, TF ) = (F1 , TF1 ) × (F2 , TF2 )
are direct product topological spaces. (See Definition 15.1.1.)
(ii) π = π1 × π2 : E → B is the direct product of π1 and π2 . (See Definition 6.9.11.)
23.3.21 Theorem: The tuple $(E, \pi, B) \mathrel{-\!<} (E, T_E, \pi, B, T_B)$ in Definition 23.3.20 is a topological fibration
with fibre space F .

Proof: For any $b = (b_1, b_2) \in B$, there are fibre charts $\phi_1 : \pi_1^{-1}(U_1) \to F_1$ and $\phi_2 : \pi_2^{-1}(U_2) \to F_2$
for $U_1 \in \mathrm{Top}_{b_1}(B_1)$ and $U_2 \in \mathrm{Top}_{b_2}(B_2)$ for the two respective fibrations. The direct product function
$\phi = \phi_1 \times \phi_2 : \pi_1^{-1}(U_1) \times \pi_2^{-1}(U_2) \to F$ is a fibre chart for (E, π, B) with $b \in \pi(\mathrm{Dom}(\phi))$.
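The claim that the product of the two charts is again a fibre chart can be checked by expanding π × φ. The following is a sketch of that step; the coordinate-swapping map $\sigma$ is a label introduced here and does not appear in the original proof.

```latex
% Sketch: why \phi_1 \times \phi_2 is a fibre chart for the product fibration.
For $(e_1, e_2) \in \pi_1^{-1}(U_1) \times \pi_2^{-1}(U_2) = \pi^{-1}(U_1 \times U_2)$,
\[
  (\pi \times \phi)(e_1, e_2)
  = \big( (\pi_1(e_1), \pi_2(e_2)),\ (\phi_1(e_1), \phi_2(e_2)) \big)
  = \sigma\big( (\pi_1 \times \phi_1)(e_1),\ (\pi_2 \times \phi_2)(e_2) \big),
\]
where
\[
  \sigma : (U_1 \times F_1) \times (U_2 \times F_2) \to (U_1 \times U_2) \times (F_1 \times F_2),
  \qquad
  \sigma\big( (b_1, f_1), (b_2, f_2) \big) = \big( (b_1, b_2), (f_1, f_2) \big).
\]
The map $\sigma$ is a homeomorphism for the product topologies, and the product
of the homeomorphisms $\pi_1 \times \phi_1$ and $\pi_2 \times \phi_2$ is a homeomorphism.
Hence $\pi \times \phi : \pi^{-1}(U_1 \times U_2) \approx (U_1 \times U_2) \times F$,
with $U_1 \times U_2 \in \mathrm{Top}_b(B)$.
```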
23.4. Fibration identification spaces

The term “identification spaces” is used in EDM2 [35], 147.B to describe spaces which are constructed from
patches by identifying the overlaps. Atlases are not provided for the fibrations constructed in this section,
but in a sense, the constructed fibration is really an atlas itself. Structure groups are also not provided in
this section because they are really a property of a fibration with a particular choice of atlas.
23.4.1 Remark: The construction, or reconstruction, of a topological fibration from charts is discussed by
Crampin/Pirani [12], page 355. (See also Kobayashi/Nomizu [27], page 52.) There is more than one way to
construct a fibration as a graft of cross products of a base space with a fibre space. In Definition 23.4.2, a
fixed pre-defined base space B is assumed. But it is also possible to construct the base space via the graft at
the same time as constructing the fibre space structure on top of that base space. Set identification spaces
are defined in Section 6.10.
23.4.2 Definition: Let $(U_i)_{i \in I}$ be a family of open subsets of a topological space B. Let F be a topological
space. A topological fibration identification space with base space B and fibre space F is a set graft
$X \subseteq \mathring{\times}_{i \in I} (U_i \times F)$ together with a topology T, where X satisfies
(i) ∀(bi , fi )i∈J ∈ X, ∀i, j ∈ J, bi = bj ,
(ii) ∀i, j ∈ I, ∀b ∈ Ui ∩ Uj , ∃x ∈ X, {i, j} ⊆ Dom(x), and
(iii) the family of topological spaces (Ui × F, Top(Ui × F ))i∈I is topologically consistent with the graft X.
The topology T on X is the graft topology on X derived from the patch topologies (Top(Ui × F ))i∈I .
[ Perhaps should define an atlas for the fibration in Definition 23.4.2? ]
23.4.3 Remark: Definition 23.4.2 is illustrated in Figure 23.4.1. Set grafts are introduced in Defini-
tion 6.10.5. Topological grafts are presented in Definition 15.11.2.
Figure 23.4.1 Element ((b, f1 ), (b, f2 ), (b, f3 )) of graft set X
From Definition 6.10.5 (iii), it follows that all elements of all sets Ui × F are represented in the set graft X.
In other words, ∀i ∈ I, ∀b ∈ Ui , ∀f ∈ F, ∃x ∈ X, xi = (b, f ).
It follows from Definition 6.10.5 (ii) that an element of a set Ui × F matches at most one element of each set
Uj × F for i ≠ j. Hence each element in each Ui × F is in precisely one element of X.
Definition 23.4.2 (i) means that only pairs (b, f ) ∈ B × F with the same base point b are permitted to be
matched up in the set graft. Definition 23.4.2 (ii) then implies that each element (b, fi ) of each set Ui × F
must be matched to precisely one element (b, fj ) in each other set Uj × F such that b ∈ Uj . (Condition (i)

only says that if there is a matching element in Uj × F , then it must have the same base point, whereas
condition (ii) means that there is at least one such point.)
The purpose of the graft is simply to specify which fibre elements within one patch of the graft are matched
up to which fibre elements for the same base point in different patches. This means that the set graft X
does not do any grafting of the base set B. The same result could have been achieved by defining a graft
of a family of copies of F , one copy for each i ∈ I, at each point in B. But that would be quite clumsy to
define.
The set B could itself be constructed as a graft of patches. It could be convenient, in fact, to do the grafting
of both the base space and the fibres at the same time, but this is not done in Definition 23.4.2.
The topological consistency of the product topologies of the sets Ui × F with the graft X means that the
graft transition maps hij : (Ui ∩ Uj ) × F → (Ui ∩ Uj ) × F are continuous with respect to these product
topologies for all i, j ∈ I. The graft maps hij are defined so that hij (b, fi ) = (b, fj ) if and only if there
is some x ∈ X such that xi = (b, fi ) and xj = (b, fj ). The continuity of a graft map hij is equivalent to
the continuity of the projection of hij onto the fibre space F plus the continuity of this automorphism with
respect to the point b ∈ B. [ This needs to be checked and explained better. ]
23.4.4 Theorem: A topological fibration identification space is a topological fibration.
[ Here define combined set+fibre topological grafts. These are probably more useful in practice. ]
[ Somewhere define topological fibre bundle identification spaces with a structure group? Also define topolog-
ical principal fibre bundle identification spaces? ]
23.5. Structure groups discussion

Fibre bundles are similar to topological manifolds in many ways, but fibre bundles have a kind of structure
constraint imposed by a “structure group” which has no exact analogue for manifolds. However, the
“pseudogroup” of local diffeomorphisms of a Euclidean space (Kobayashi/Nomizu [27], page 1) is a kind of
structure group which represents regularity for a manifold. Such structure groups or pseudogroups constrain
the notion of a “compatible chart” in both cases.
The use of structure groups in the definition of fibre bundles helps to align differential geometry with
Felix Klein’s Erlanger Programm (1872), which proposed that each geometry should have an associated
transformation group under which all properties and relations of figures are invariant. Even though there
are generally no such global groups for a differentiable manifold, they are present at each point of a fibre
bundle such as a tangent bundle. (For at least 50 years after 1872, it was the fashion for every mathematician
to discover that their subject of study was invariant under some group or other so as to increase the status
of their work. See Bell [190], pages 445–6.)
The definition of a “fibre bundle with structure group” requires some group membership and continuity
constraints on the overlaps of fibre charts. The “structure group” is an effective topological group G of left
transformations on the fibre space F .
In the case of a fibration or groupless fibre bundle (E, π, B) with fibre space F , there is an implicit structure
group G, namely the group Aut(F ) of topological automorphisms of the fibre space F . Thus the fibres
Eb = π −1 ({b}) for b ∈ B are topological copies of F which vary continuously in E with respect to b. In the
overlap of fibre charts, the transition maps are required only to be homeomorphisms of F , i.e. elements of
the topological automorphism group of F .
One fly in the ointment here is the task of determining what topology to put on the automorphism group for
a general topological space F . This may be an unavoidable difference between a topological fibration with
fibre F and a topological (G, F ) fibre bundle whose structure group G is the group of automorphisms of F .
[ Should try to resolve the issue of the topology of implicit structure groups. One candidate for the “natural”
topology on the set of topological automorphisms of F might be the topology of pointwise convergence, which
is the same as the product topology. See EDM2 [35], 435.B. Another good candidate is the compact-open
topology. (See Definition 15.7.9.) ]
In the case of a fibre bundle with a structure group, the structure group G is some subgroup of the topological
automorphism group of F . The pointwise transition maps between fibre charts are required to be elements

of G, which will usually be a proper subgroup of the topological automorphism group of F . These pointwise
group elements are required to vary continuously with respect to position in the manifold.
The structure group of a fibre bundle is utilized not only in determining allowable transition maps for fi-
bre chart overlaps but also in determining whether two charts are compatible, and this in turn determines
whether two atlases are equivalent. Thus if, for example, F is a vector space and G is the group of linear
transformations of F , then any two fibre charts must match in their overlap region in a pointwise linear fash-
ion. If G consists of orthogonal transformations, the chart overlaps must be related pointwise orthogonally,
and so forth. The important point here is that the fibre π −1 ({b}) at each point b ∈ B is thought of as a full
copy of the fibre space F together with all of the algebraic and other structures of F . These structures are
preserved under transformations specified by the structure group G.
If the maximal fibre atlas is constructed from a given fibre atlas A for a (G, F ) fibre bundle (E, π, B), this
maximal atlas will generally not be a valid atlas if G is replaced by a subgroup of G. Thus maximal fibre
bundle atlases vary in the same direction as the structure group in the sense of the partial order of set
inclusion. The smaller the structure group, the smaller the maximal atlas, and vice versa. This is analogous
to the situation with regularity of manifolds, where the higher the regularity required of the local
diffeomorphisms of $\mathbb{R}^n$ (and hence the smaller the pseudogroup), the smaller the corresponding maximal
atlas, and vice versa.
Instead of dealing in maximal atlases (which are very infinite), it is probably better to use language such as
“the (G, F ) fibre bundle generated by atlas A on (E, π, B)”. This then leaves open the question of whether
one defines the generated fibre bundle as an equivalence class of atlases, a maximal atlas, or some other
set-theoretic construction. If a set must be decided upon as the “essence of the fibre bundle”, then probably
an equivalence class of atlases is preferable. But this equivalence class varies according to one’s choice of
structure group and regularity class. So it is best to resist the temptation to construct maximal atlases.
For any given topological fibration (E, π, B), the fibre space F is uniquely determined up to homeomorph-
ism. Given both (E, π, B) and a particular choice of F , the group G must be the group of all topological
automorphisms of F because all possible fibre charts are considered to be compatible fibre charts. If the set
of fibre charts is constrained to lie in a particular atlas, then G is constrained only to be a superset of the
set of pointwise transition maps of the charts in the atlas. Thus the choice of atlas constrains the choice of
structure group G. Conversely (actually contrapositively), the choice of structure group constrains the atlas.
23.6. Topological fibre bundles
23.6.1 Remark: A small family tree for topological fibre bundles is shown in Figure 23.6.1.
Figure 23.6.1 Family tree for topological fibre bundles
Nodes shown: topological space $(B, T_B)$; non-topological fibration $(E, \pi, B)$; topological fibration
$(E, T_E, \pi, B, T_B)$; topological fibre bundle $(E, T_E, \pi, B, T_B, A^F_E)$.
23.6.2 Remark: To distinguish a manifold atlas from a fibre atlas, the notation $A_M$ will be used for a
manifold atlas whereas the fibre space F will be indicated explicitly in the notation $A^F_E$ for a fibre atlas
on a total space E. Then differentiable fibre bundles (in Chapter 34) can be specified by tuples such as
$(E, A_E, \pi, B, A_B, A^F_E)$ where $A_E$ and $A_B$ are manifold atlases for E and B respectively.
23.6.3 Remark: For transformation group elements $g \in G$ −< $(G, F, \sigma_G, \mu)$ in Definition 23.6.4, $L_g$ denotes
the function $L_g : F \to F$ satisfying $L_g : f \mapsto \mu(g, f)$. The spaces and maps of Definition 23.6.4 are illustrated
in Figure 23.6.2.
See Definition 16.8.7 for “effective topological left transformation groups”. The map $\bar\mu$ is the function-valued
function $\bar\mu : G \to (F \to F)$ defined by $\bar\mu(g)(f) = \mu(g, f)$, where $\mu : G \times F \to F$ is the group operation of G
on F. The circle on the arrow for $\bar\mu$ in Figure 23.6.2 indicates that the map is of the form $\bar\mu : G \to (F \to F)$,
not $\bar\mu : G \to F$. That is, for all $g \in G$, $\bar\mu(g)$ is a map $\bar\mu(g) : F \to F$. In this case, the circled arrow $\bar\mu$ is the
action of a transformation group G on F.
Figure 23.6.2 Coordinate map for fibre bundle with structure group
23.6.4 Definition: A topological (G, F ) fibre bundle for an effective topological left transformation group
(G, F ) −< $(G, T_G, F, T_F, \sigma_G, \mu)$ is a tuple $(E, \pi, B)$ −< $(E, T_E, \pi, B, T_B, A^F_E)$ such that:
(i) $(E, T_E)$ and $(B, T_B)$ are topological spaces, and $\pi : E \to B$ is continuous;
(ii) $\forall \phi \in A^F_E,\ \exists U_\phi \in T_B$, $\phi : \pi^{-1}(U_\phi) \to F$ is continuous and $\pi \times \phi : \pi^{-1}(U_\phi) \approx U_\phi \times F$;
(iii) $\bigcup_{\phi \in A^F_E} U_\phi = B$;
(iv) $\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \exists g \in G,\ \beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1} = L_g$, where
$\beta_{b,\phi} = \phi|_{\pi^{-1}(\{b\})} : \pi^{-1}(\{b\}) \approx F$ for $\phi \in A^F_E$ and $b \in U_\phi$;
(v) $\forall \phi_1, \phi_2 \in A^F_E$, the function $g_{\phi_1,\phi_2} : U_{\phi_1} \cap U_{\phi_2} \to G$ defined by
$L_{g_{\phi_1,\phi_2}(b)} = \beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1}$ is continuous.
The transformation group G −< $(G, T_G, F, T_F, \sigma_G, \mu)$ is called the structure group of the fibre bundle.
The set $A^F_E$ is called a (G, F ) fibre atlas for the fibration (E, π, B).
The set $E_b = \pi^{-1}(\{b\})$ is called the fibre (set) of (E, π, B) at b ∈ B.
The map $\beta_{b,\phi}$ may be called the per-fibre chart, fibre-set chart or pointwise fibre chart for the fibre bundle
(E, π, B) at b ∈ B.
The functions $g_{\phi_1,\phi_2}$ are called fibre atlas transition functions for (E, π, B).
The function $g_{\phi_1,\phi_2}|_{\pi^{-1}(\{b\})}$ may be called the per-fibre (fibre atlas) transition function or pointwise (fibre
atlas) transition function for (E, π, B) at b ∈ B.
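23.6.5 Remark: The transition map $L_g$ for the per-fibre charts $\beta_{b,\phi_1}$ and $\beta_{b,\phi_2}$ in Definition 23.6.4 (iv) is
illustrated in Figure 23.6.3, which shows the charts $\beta_{b,\phi_1} = \phi_1|_{\pi^{-1}(\{b\})}$ and $\beta_{b,\phi_2} = \phi_2|_{\pi^{-1}(\{b\})}$ from the
fibre set $\pi^{-1}(\{b\}) \subseteq E$ over $b \in B$ to two copies of F, related by $\beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1} = L_g$ for some $g \in G$.
Figure 23.6.3 Fibre bundle transition maps are group elements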
Conditions (i), (ii) and (iii) mean that $(E, \pi, B)$ −< $(E, T_E, \pi, B, T_B)$ is a topological fibration with fibre
space F according to Definition 23.3.7 and $A^F_E$ is a fibre atlas for (E, π, B) with fibre space F according
to Definition 23.3.14. So the first three conditions of Definition 23.6.4 are the same as the conditions of
Definition 23.3.17 for a topological fibration with a fibre atlas. The extra conditions (iv) and (v) specialize
the fibre atlas $A^F_E$ so that the charts are related to each other by continuous left group actions on the fibre
space.

23.6.6 Remark: Topological fibre bundles may be empty. (This is mentioned for the case of topological
fibrations in Remark 23.2.3.) The general empty topological (G, F ) fibre bundle for non-empty F has the
form $(E, T_E, \pi, B, T_B, A^F_E) = (\emptyset, \{\emptyset\}, \emptyset, \emptyset, \{\emptyset\}, A^G_P)$, where $A^G_P = \emptyset$ or $\{\emptyset\}$. (For proof, see Exercise 46.8.1.)
A special case of this is $(E, T_E, \pi, B, T_B, A^F_E) = (\emptyset, \{\emptyset\}, \emptyset, \emptyset, \{\emptyset\}, \emptyset)$.
23.6.7 Remark: The group element gφ1 ,φ2 (b) in Definition 23.6.4 (v) may be thought of as a transformation
rule from φ2 coordinates to φ1 coordinates. So indices tend to match up as for standard matrix notation (as
opposed to the reverse convention for Markov process matrices). An example of this is Theorem 23.8.11 (ii).
23.6.8 Remark: Definition 23.6.4 (iv) may also be expressed as $\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \exists g \in G,\ \forall y \in F,\ \phi_1(\beta_{b,\phi_2}^{-1}(y)) = g.y$.
By putting $z = \beta_{b,\phi_2}^{-1}(y)$, the condition becomes
$$\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \exists g \in G,\ \forall z \in E_b,\quad \phi_1(z) = g\phi_2(z). \eqno(23.6.1)$$
Since G acts effectively on F, the group element g is uniquely determined by the transition map $\beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1}$.
This is because the map $L_g : F \to F$ uniquely determines g in an effective group. Therefore the function
$g_{\phi_1\phi_2}$ in condition (v) is well-defined. It is not well-defined if (G, F ) is not effective. Using $g_{\phi_1\phi_2}$, equation
(23.6.1) becomes:
$$\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \forall z \in E_b,\quad \phi_1(z) = g_{\phi_1\phi_2}(b)\phi_2(z) = g_{\phi_1\phi_2}(\pi(z))\phi_2(z),$$
from which it follows that:
$$\forall \phi_1, \phi_2 \in A^F_E,\ \forall z \in \mathrm{Dom}(\phi_1) \cap \mathrm{Dom}(\phi_2),\quad \phi_1(z) = g_{\phi_1\phi_2}(\pi(z))\phi_2(z).$$
The group elements $g_{\phi_1,\phi_2}(b)$ can be defined without the restricted β maps. For charts $\phi_i, \phi_j \in A^F_E$ with
$U_i = \pi(\mathrm{Dom}(\phi_i))$ and $U_j = \pi(\mathrm{Dom}(\phi_j))$, note that
$L_{g_{\phi_i,\phi_j}(b)} : y \mapsto P_2\bigl((\pi \times \phi_i) \circ (\pi \times \phi_j)^{-1}(b, y)\bigr)$ for $y \in F$, where $P_2 : B \times F \to F$ denotes the
projection map $P_2 : (b, y) \mapsto y$. This is illustrated in Figure 23.6.4.
Figure 23.6.4 Projection maps and fibre charts for transition maps
The base point b in this expression may be thought of as a tag for the fibre element y which is thrown away
by $P_2$ when it has been used in the chart transition map. This expression for $g_{\phi_i,\phi_j}$ may be further simplified
to $L_{g_{\phi_i,\phi_j}(b)} : y \mapsto \phi_i \circ (\pi \times \phi_j)^{-1}(b, y)$.
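The formula for the transition function in Remark 23.6.8 can be checked on a small finite model. The following is only a sketch under assumptions which are not in the text: the base space is B = {0, 1, 2}, the fibre is F = Z_3, the total space is E = B × F, the group G = Z_3 acts on F by addition, and the names `phi_1`, `phi_2` and `transition` are hypothetical. All topologies are discrete, so the continuity conditions hold trivially and are not tested.

```python
# Toy model (assumptions for illustration only): base B = {0, 1, 2}, fibre F = Z_3,
# total space E = B x F with pi(b, f) = b, and G = Z_3 acting on F by addition.
N = 3
B = range(N)
F = range(N)

def pi(z):
    """Projection of a point z = (b, f) of E onto the base space."""
    return z[0]

def phi_1(z):
    """First fibre chart: read off the fibre coordinate directly."""
    b, f = z
    return f

def phi_2(z):
    """Second fibre chart: the fibre coordinate shifted by a base-dependent amount."""
    b, f = z
    return (f + b) % N

def pi_times_phi_2_inverse(b, y):
    """Inverse of the map z -> (pi(z), phi_2(z))."""
    return (b, (y - b) % N)

def L(g, y):
    """Left action of g in G = Z_3 on y in F."""
    return (g + y) % N

def transition(b):
    """Find the g in G with L_g : y -> phi_1((pi x phi_2)^{-1}(b, y)) for all y in F."""
    candidates = [g for g in range(N)
                  if all(L(g, y) == phi_1(pi_times_phi_2_inverse(b, y)) for y in F)]
    assert len(candidates) == 1  # uniqueness reflects the effectiveness of the action
    return candidates[0]

g_12 = {b: transition(b) for b in B}

# Check phi_1(z) = g_12(pi(z)) . phi_2(z) for every z in E.
E = [(b, f) for b in B for f in F]
consistent = all(L(g_12[pi(z)], phi_2(z)) == phi_1(z) for z in E)
print(g_12, consistent)
```

In this model the search in `transition` finds exactly one group element at each base point because addition in Z_3 is an effective action. For a non-effective action the list of candidates could contain more than one element, which is the situation in which the transition function would not be well-defined.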
23.6.9 Notation: atlas(E, π, B) for a (G, F ) fibre bundle $(E, \pi, B, A^F_E)$ denotes the fibre atlas $A^F_E$.
Then $\mathrm{atlas}_b(E, \pi, B)$ for b ∈ B denotes the subset {φ ∈ atlas(E, π, B); b ∈ π(Dom(φ))} of atlas(E, π, B).
$A^F_{E,b}$ for b ∈ B also denotes the set {φ ∈ atlas(E, π, B); b ∈ π(Dom(φ))}.

[ Check if Remark 23.6.10 is really true. Maybe the atlas is uniquely determined by the tuple (E, TE , π, B, TB ).
It’s probably okay as it is. ]
23.6.10 Remark: The fibre atlas $A^F_E$ is an essential parameter in the specification of a topological fibre
bundle with a structure group in Definition 23.6.4. A triple (E, π, B) may be given two different atlases to
specify two different, incompatible fibre bundles with respect to the pair (G, F ). A triple (E, π, B) with a
single atlas is a (G, F ) fibre bundle for a range of choices of the group G. But this set of groups depends on
the choice of atlas $A^F_E$.
In general, the larger the atlas, the smaller the set of allowable structure groups G. A fibre bundle with a
single-chart atlas will have the widest range of possibilities for the group G, since any topological
transformation group G on F will be consistent with the atlas. If a quadruple $(E, \pi, B, A^F_E)$ is a (G, F ) fibre bundle,
then it is a $(G', F)$ fibre bundle for any topological left transformation group $G'$ of F which is a supergroup
of G.
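The dependence of the allowable structure groups on the atlas can be sketched in a finite setting. The example below is an illustration under assumptions which are not in the text: F = {0, 1, 2}, group elements are permutations of F stored as tuples, and the lists of transition values are hypothetical data rather than values computed from an actual atlas. The topological conditions on G are ignored. The smallest allowable group is taken to be the subgroup generated by the transition values, and every supergroup of it is then also allowable.

```python
from itertools import permutations

# Assumed toy data: F = {0, 1, 2}; a group element is a tuple p with p[i] the image of i.
def compose(p, q):
    """Return p o q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def generated_subgroup(generators, n=3):
    """Smallest set of permutations containing the generators and closed under
    composition. For a finite set of permutations this is a subgroup."""
    identity = tuple(range(n))
    group = {identity} | set(generators)
    while True:
        new = {compose(p, q) for p in group for q in group} - group
        if not new:
            return group
        group |= new

# Hypothetical transition values for a smaller atlas and for a larger atlas.
small_atlas_values = [(1, 0, 2)]               # a single transposition
large_atlas_values = [(1, 0, 2), (0, 2, 1)]    # an extra chart contributes a second one

min_small = generated_subgroup(small_atlas_values)
min_large = generated_subgroup(large_atlas_values)
full = set(permutations(range(3)))
print(len(min_small), len(min_large), min_small <= min_large <= full)
```

With these values the smaller atlas admits every group between a two-element group and the full permutation group, whereas the larger atlas admits only the full permutation group. This is one concrete sense in which a larger atlas leaves a smaller set of allowable structure groups.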
The purpose of the fibre atlas $A^F_E$ on each fibre set is to specify algebraic structure. The atlas is constrained
by the topology of the underlying fibration (E, π, B), although one may think of the atlas as determining
the topologies on E and B. The set of cross-sections of a fibre bundle is constrained by these topologies. If
the topologies are induced by the atlas, then one may say that the atlas determines the set of cross-sections
also. However, the choice of structure group does not directly constrain the set of cross-sections of the fibre
bundle.
To summarize, the topology of the underlying fibration constrains the atlas, and the atlas constrains the
structure group; in the other direction, the structure group constrains the atlas, and the atlas determines
the topology of the underlying fibration. The minimal structure group is a property of the atlas, not of the
underlying fibration. The maximal atlas is a property of the structure group, not of the underlying fibration.
23.6.11 Remark: It is tempting to conjecture that Definition 23.6.4 (v) could be a direct consequence of
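the other conditions. To try to prove this conjecture, note that for $b \in U_{\phi_1} \cap U_{\phi_2}$ and $f \in F$,
$$\eqalign{
(b, g_{\phi_1\phi_2}(b)(f)) &= \bigl(b, \beta_{b,\phi_1}(\beta_{b,\phi_2}^{-1}(f))\bigr) \cr
&= \bigl(b, \beta_{b,\phi_1}((\pi \times \phi_2)^{-1}(b, f))\bigr) \cr
&= (\pi \times \phi_1)\bigl((\pi \times \phi_2)^{-1}(b, f)\bigr), \cr
}$$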
which is clearly a continuous function from B × F to B × F . So gφ1 φ2 is the projection onto F of the
homeomorphism (π × φ1 ) ◦ (π × φ2 )−1 . Hence gφ1 φ2 : B × F → F is continuous. The group action
µ : G × F → F is continuous by Definition 16.8.3.
[ There must be some natural topology on the function space F → F . Probably this could be chosen so
that a function f : G × F → F is continuous if and only if the corresponding function f : G → (F → F )
is continuous. See Definition 14.12.21 for pointwise convergence topology, which is the same as the direct
product topology. This might be a suitable topology, especially for the implicit automorphism group for
topological fibrations. ]
The function µ : G × F → F may be regarded as µ̄ : G → (F → F ). Then both gφ1 φ2 and µ̄ are maps
whose range is the set of maps from F to F . It is tempting to consider the function µ̄−1 ◦ gφ1 φ2 . In fact,
by condition (iv), the range of gφ1 φ2 is a subset of the range of µ̄. So µ̄−1 ◦ gφ1 φ2 : B → G is a well-defined
function. But the continuity of µ̄ does not imply the continuity of µ̄−1 . . . .
[ Under what conditions or circumstances would Definition 23.6.4 (v) follow from the other conditions? ]
23.6.12 Remark: Fibre bundles have many similarities to manifolds. Both fibre bundles and manifolds
have topological definitions which require no extra structure such as an atlas, and more regular definitions
such as differentiable fibre bundles or manifolds which do require extra structure in their specification such
as an atlas. The specification of a (G, F ) fibre bundle requires an atlas. In a sense, the structure group G
is analogous to the local diffeomorphism group (called a “pseudogroup” in Kobayashi/Nomizu [27]) which
defines C k and other regularity for differentiable manifolds and fibre spaces.
23.6.13 Remark: For any $b \in B$ and fibre chart $\phi$ such that $b \in \pi(\mathrm{Dom}(\phi))$, define the function
$\beta_{b,\phi} : E_b \approx F$ by $\beta_{b,\phi}(x) = \phi(x)$. In other words, $\beta_{b,\phi} = \phi|_{E_b}$. Then the map $L_g \circ \beta_{b,\phi} : E_b \approx F$ is also a valid
pointwise identification of $E_b$ with the extrinsic fibre space F. Consequently, the identification of each fibre
set with the extrinsic fibre is determined only up to a group element g ∈ G. Therefore the map βb,φ should
be thought of not as a fixed identification with the fibre space F but rather an identification with F in some
indeterminate “orientation” of F . The set of all possible valid pointwise identifications of Eb with F is the
set {Lg ◦ βb,φ ; g ∈ G}.
23.6.14 Definition: A compatible fibre chart for a (G, F ) fibre bundle $(E, \pi, B, A^F_E)$ is a fibre chart φ for
(E, π, B) with fibre space F such that $(E, \pi, B, A^F_E \cup \{\phi\})$ is a (G, F ) fibre bundle.
23.6.15 Remark: Another way of stating Definition 23.6.16 is that two atlases A1 and A2 are said to be
equivalent if every fibre chart in A1 is compatible with every fibre chart in A2 .
23.6.16 Definition: Topological (G, F ) fibre atlases A1 and A2 for a topological fibration (E, π, B) are
said to be equivalent topological (G, F ) fibre atlases for (E, π, B) if A1 ∪ A2 is a topological (G, F ) fibre atlas
for (E, π, B).
[ Define a maximal (G, F ) fibre atlas, and emphasize that it is optional. ]

[ Define cross-sections of fibre bundles. ]
23.6.17 Example: The Möbius strip provides probably the simplest non-trivial example of a fibre bundle.
(See Section 43.2.) For the Möbius strip, the only possible structure group is G = {I, J} where I is the
identity on F = {−1, 1} and J : F → F swaps the elements of F . The trivial group {I} would not be
an effective group since it leaves two elements of F fixed. In this example, both the fibre space F and the
structure group G are completely determined by the triple (E, π, B). In general, the structure group G may
not be uniquely determined by this triple.
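The role of the group G = {I, J} in Example 23.6.17 can be illustrated by a small computation. This is only a sketch under assumptions which are not in the text: the base space is a circle covered by two arcs U_1 and U_2 whose overlap has two connected components (labelled "a" and "b"), the group G is modelled as {+1, −1} under multiplication with −1 in the role of J, and the names `rechart` and `reachable` are hypothetical. Since G is discrete and each arc is connected, a continuous G-valued function on an arc or on an overlap component is constant, so one group element per piece is enough.

```python
from itertools import product

# Assumed model: G = {+1, -1} under multiplication, with -1 playing the role of J.
I, J = 1, -1

# A transition function g_12 is locally constant, so it is stored as one group
# element for each connected component of the overlap of the two arcs.
twisted = {"a": I, "b": J}   # transition function with a twist on one component
trivial = {"a": I, "b": I}   # transition function of the product bundle

def rechart(g_12, h_1, h_2):
    """Transition function after replacing each chart phi_k by L_{h_k} o phi_k,
    where h_k in G is constant on the arc U_k. The new value is h_1 g h_2^{-1},
    and h_2 is its own inverse in this group."""
    return {component: h_1 * g * h_2 for component, g in g_12.items()}

def reachable(g_12):
    """All transition functions obtainable from g_12 by such changes of chart."""
    return [rechart(g_12, h_1, h_2) for h_1, h_2 in product((I, J), repeat=2)]

can_untwist = trivial in reachable(twisted)
print(can_untwist)
```

In this model no change of charts removes the twist, because the product of the two overlap values is unchanged by every rechart and equals J for the twisted transition function but I for the trivial one.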
[ Refer to the example of the tangent bundle of a differentiable manifold. ]

[ Present a worked example for $G = SO(2)$, $F = S^1$, $B = S^2$. Also give the example $B = S^1$, $F = S^0$,
G = {I, J}, where J swaps 1 with −1, which should be a Möbius strip. ]
23.7. Fibre bundle homomorphisms, isomorphisms and products
A fibre bundle homomorphism is called simply a “fibre bundle map” in EDM2 [35], 147.B. The style of
homomorphism in Definition 23.7.1 maps only the total spaces because the rest of the structure more or
less follows from this. (This is not surprising. The total space is the central component of a fibre bundle.)
The atlases are not required to be completely equivalent. They are only required to be $C^0$ equivalent. Thus
conditions (ii) and (iii) specify that the map must be consistent with the charts on each fibre bundle and
the topology of the structure group. But the fibre atlas on (E1 , π1 , B1 , A1 ) could have regularity or group
invariance properties which are not present for (E2 , π2 , B2 , A2 ).
[ May also define “exact equivalence” of fibre bundles which have an exact match of atlases. These could be
called “atlas-equivalent” fibre bundles. These would guard all regularity and group invariance properties of
the fibre bundles. ]
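23.7.1 Definition: A topological (G, F ) fibre bundle homomorphism between two topological (G, F ) fibre
bundles $(E_1, \pi_1, B_1, A_1)$ and $(E_2, \pi_2, B_2, A_2)$ is a continuous map $f : E_1 \to E_2$ such that:
(i) $\pi_2 \circ f = \tilde f \circ \pi_1$ for some continuous function $\tilde f : B_1 \to B_2$;
(ii) $\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2,\ \forall b \in U_1 \cap \tilde f^{-1}(U_2),\ \exists g \in G,\ \beta_{\tilde f(b),\phi_2} \circ f \circ \beta_{b,\phi_1}^{-1} = L_g$, where
$\beta_{b_i,\phi_i} = \phi_i|_{\pi_i^{-1}(\{b_i\})}$ for $\phi_i \in A_i$ and $b_i \in U_i = \pi_i(\mathrm{Dom}(\phi_i))$ for i = 1, 2;
(iii) $\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2$, $\rho_{\phi_1\phi_2} : U_1 \cap \tilde f^{-1}(U_2) \to G$ is continuous, where $\rho_{\phi_1\phi_2}$ is defined by
$L_{\rho_{\phi_1\phi_2}(b)} = \beta_{\tilde f(b),\phi_2} \circ f \circ \beta_{b,\phi_1}^{-1}$.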
[ Condition (iii) in Remark 23.7.2 seems to include condition (ii). So maybe these should be combined. ]
23.7.2 Remark: Definition 23.7.1 (i) is equivalent to requiring that the relation $\tilde f = \pi_2 \circ f \circ \pi_1^{-1}$ be a
well-defined continuous function. If this function exists, it is unique. The continuity of $\tilde f$ means roughly that
f continuously maps sets of the form $\pi_1^{-1}(\{b_1\})$ for $b_1 \in B_1$ onto sets of the form $\pi_2^{-1}(\{b_2\})$ for $b_2 \in B_2$.
Condition (ii) can be made to look more similar to condition (i) by writing it as follows:
$$\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2,\ \forall b \in U_1 \cap \tilde f^{-1}(U_2),\ \exists g \in G,\quad \beta_{\tilde f(b),\phi_2} \circ f|_{\pi_1^{-1}(\{b\})} = L_g \circ \beta_{b,\phi_1} : \pi_1^{-1}(\{b\}) \to F.$$
Thus $L_g : F \approx F$ in (ii) is analogous to the base space map $\tilde f : B_1 \to B_2$ in (i).

Condition (iii) is illustrated in Figure 23.7.1. It may be written equivalently as
$$\phi_2 \circ f|_{\pi_1^{-1}(U_1 \cap \tilde f^{-1}(U_2))} : z \mapsto L_{\rho_{\phi_1,\phi_2}(\pi_1(z))}(\phi_1(z))$$
for $z \in \pi_1^{-1}(U_1 \cap \tilde f^{-1}(U_2))$. This shows that f(z) is equivalent “through the charts” to a group action which
depends only on $\pi_1(z)$ for each fixed choice of charts.
Figure 23.7.1 Topological (G, F ) fibre bundle homomorphism
23.7.3 Remark: If f˜ in Definition 23.7.1 is a homeomorphism, then f is also a homeomorphism. [ This
should probably be a theorem. ]
23.7.4 Definition: A topological (G, F ) fibre bundle isomorphism between topological (G, F ) fibre bundles
(E1 , π1 , B1 , A1 ) and (E2 , π2 , B2 , A2 ) is a map f : E1 → E2 such that f and f −1 are topological (G, F ) fibre
bundle homomorphisms.
23.7.5 Definition: Equivalent topological (G, F ) fibre bundles are topological (G, F ) fibre bundles between
which there is a topological (G, F ) fibre bundle isomorphism.
23.7.6 Remark: The style of fibre bundle homomorphism in Definition 23.7.1 assumes the same topological
transformation group (G, F ) for the two fibre bundles. Definition 23.7.7 removes this restriction. (See
Definition 16.8.6 for topological transformation group homomorphisms.)
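23.7.7 Definition: A topological fibre bundle homomorphism between a topological $(G_1, F_1)$ fibre bundle
$(E_1, \pi_1, B_1, A_1)$ and a topological $(G_2, F_2)$ fibre bundle $(E_2, \pi_2, B_2, A_2)$ is a tuple of maps $(\hat f, \check f, f)$ such that:
(i) $(\hat f, \check f)$ is a topological transformation group homomorphism with $\hat f : G_1 \to G_2$ and $\check f : F_1 \to F_2$;
(ii) $f : E_1 \to E_2$ is continuous;
(iii) $\pi_2 \circ f = \tilde f \circ \pi_1$ for some continuous function $\tilde f : B_1 \to B_2$;
(iv) $\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2,\ \forall b \in U_1 \cap \tilde f^{-1}(U_2),\ \exists g \in G_1,\ \beta_{\tilde f(b),\phi_2} \circ f \circ \beta_{b,\phi_1}^{-1} = \check f \circ L_g$, where
$\beta_{b_i,\phi_i} = \phi_i|_{\pi_i^{-1}(\{b_i\})}$ for $\phi_i \in A_i$ and $b_i \in U_i = \pi_i(\mathrm{Dom}(\phi_i))$ for i = 1, 2;
(v) $\forall \phi_1 \in A_1,\ \forall \phi_2 \in A_2$, $\rho_{\phi_1\phi_2} : U_1 \cap \tilde f^{-1}(U_2) \to G_1$ is continuous, where $\rho_{\phi_1\phi_2}$ is defined by
$\check f \circ L_{\rho_{\phi_1\phi_2}(b)} = \beta_{\tilde f(b),\phi_2} \circ f \circ \beta_{b,\phi_1}^{-1}$.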

Figure 23.7.2 General topological fibre bundle homomorphism

23.7.8 Remark: If the equation $\beta_{\tilde f(b),\phi_2} \circ f \circ \beta_{b,\phi_1}^{-1} = \check f \circ L_g$ in condition (iv) is replaced with the less restrictive
$\exists g_2 \in G_2,\ \beta_{\tilde f(b),\phi_2} \circ f \circ \beta_{b,\phi_1}^{-1} = L_{g_2} \circ \check f \circ L_g$, which would be more symmetric, it would no longer be required in
general that $\beta_{\tilde f(b),\phi_2} \circ f \circ \beta_{b,\phi_1}^{-1}(F_1) \subseteq \check f(F_1)$.
23.7.9 Remark: Definitions 23.7.10 and 23.7.11 permit the structure groups (G1 , F1 ) and (G2 , F2 ) to be
different but equivalent whereas Definitions 23.7.4 and 23.7.5 require identical structure groups.
23.7.10 Definition: A topological fibre bundle isomorphism between topological fibre bundles (E1 , π1 , B1 )
and (E2 , π2 , B2 ) is a map f : E1 → E2 such that f and f −1 are topological fibre bundle homomorphisms.
23.7.11 Definition: Equivalent topological fibre bundles are topological fibre bundles between which there
is a topological fibre bundle isomorphism.
23.7.12 Remark: The term “product bundle” is generally used for trivial bundles which are constructed
as the direct product of two topological spaces. Therefore the term “direct product bundle” should be used
for the direct product of two topological fibre bundles.
23.7.13 Definition: [Definition of product bundle.]
[ See EDM2 [35], 52.G, for a definition of fibre product. See also EDM2 [35], 200.D, for the definition of Tor. ]
[ Define “Whitney sum” of fibre bundles. See Crampin/Pirani [12], pages 358–9. ]
23.7.14 Definition: [Definition of reduced bundle.]
[ Give definitions of cross-section and smooth cross-section. See Gallot/Hulin/Lafontaine [20], 1.34, page 17;
EDM2 [35], 147.L; Malliavin [36], 7.4.4, page 69. ]
[ May introduce “double fibre bundles” such as $T(M_1, M_2) = \bigcup_{p \in M_1, q \in M_2} \mathrm{Lin}(T_p(M_1), T_q(M_2))$, the “double
tangent space”, with projections $\pi_j : T(M_1, M_2) \to M_j$ and $\pi_{12} : T(M_1, M_2) \to M_1 \times M_2$. ]
[ Maybe have a new section here on “double fibre bundles”, including non-topological and topological. Define
structure groups on these. Try to construct double fibre bundles out of single fibre bundles. Double fibre
bundles are related to double tangent spaces. ]
23.8. Structure-preserving fibre set maps

Maps which “preserve structure” between fibre sets $E_b$ with $b \in B$ for (G, F ) fibre bundles $(E, \pi, B, A^F_E)$
are important for defining parallelism because parallelism must preserve structure. A map which preserves
structure means a map which is equivalent to the action of an element of the structure group “through the
charts”.

23.8.1 Remark: It is tempting to try to transfer the action of the group G from the fibre F to the total
space E. One motivation for this might be for the definition of connections on tangent bundles. This
idea doesn’t seem to work though. Initially we have a fibre bundle $(E, \pi, B)$ −< $(E, T_E, \pi, B, T_B, A^F_E)$ and a
topological transformation group (G, F ) −< $(G, T_G, F, T_F, \sigma_G, \mu_G)$. Let $\phi \in A^F_E$ and define $\beta_{b,\phi} = \phi|_{\pi^{-1}(\{b\})}$
for $b \in \pi(\mathrm{Dom}(\phi))$ as in Definition 23.6.4 so that $\beta_{b,\phi} : \pi^{-1}(\{b\}) \approx F$. The most obvious way to transfer the
action of $g' \in G$ from F to $\pi^{-1}(\{b\})$ is as follows.
$$\forall z \in \mathrm{Dom}(\phi),\quad z \mapsto \beta_{b,\phi}^{-1} \circ L_{g'} \circ \beta_{b,\phi}(z),$$
where $b = \pi(z)$. This must be tested for fibre chart independence. From Definition 23.6.4, for any fixed
$b \in \pi(\mathrm{Dom}(\phi_1)) \cap \pi(\mathrm{Dom}(\phi_2))$ there is a $g \in G$ such that $\beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1} = L_g$. From this it follows that
$\beta_{b,\phi_1} = L_g \circ \beta_{b,\phi_2}$ and $\beta_{b,\phi_1}^{-1} = \beta_{b,\phi_2}^{-1} \circ L_{g^{-1}}$. This implies that
$$\beta_{b,\phi_1}^{-1} \circ L_{g'} \circ \beta_{b,\phi_1} = \beta_{b,\phi_2}^{-1} \circ L_{g^{-1} g' g} \circ \beta_{b,\phi_2}.$$
This is only equal to the desired $\beta_{b,\phi_2}^{-1} \circ L_{g'} \circ \beta_{b,\phi_2}$ if g and $g'$ commute. So in general, the definition of the
action of $g'$ on E is not chart-independent.
This seems to be the motivation for the introduction of principal fibre bundles and associated fibre bundles.
In the case of principal fibre bundles, the action on the total space is on the right, and the action on the
right commutes with the action of chart transition functions on the left.
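The failure of chart independence in Remark 23.8.1 can be seen in a small finite case. The following sketch rests on assumptions which are not in the text: the fibre is F = {0, 1, 2}, the group G is the group of all permutations of F, a single fibre set is identified with F so that the per-fibre chart for φ2 is the identity, and the names `compose`, `inverse` and `transferred_action` are hypothetical.

```python
from itertools import permutations

# Toy model (an assumption for illustration): F = {0, 1, 2} and G = S_3, with each
# group element stored as a tuple p where p[i] is the image of i under L_p.
F = (0, 1, 2)
G = list(permutations(F))

def compose(p, q):
    """Return p o q, i.e. apply q first and then p."""
    return tuple(p[q[i]] for i in F)

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def transferred_action(beta, g_prime):
    """The candidate action beta^{-1} o L_{g'} o beta on a single fibre set."""
    return compose(inverse(beta), compose(g_prime, beta))

# Per-fibre chart for phi_2 (the fibre set is identified with F itself), and the
# per-fibre chart for phi_1 given by beta_1 = L_g o beta_2.
beta_2 = (0, 1, 2)
g = (1, 0, 2)                # transition element: swaps 0 and 1
beta_1 = compose(g, beta_2)

g_commuting = (1, 0, 2)      # commutes with g (it is g itself)
g_noncommuting = (0, 2, 1)   # swaps 1 and 2; does not commute with g

same = transferred_action(beta_1, g_commuting) == transferred_action(beta_2, g_commuting)
differ = transferred_action(beta_1, g_noncommuting) != transferred_action(beta_2, g_noncommuting)
print(same, differ)
```

The two charts give the same transferred map when the chosen group element commutes with the transition element, and different maps when it does not, in agreement with the conjugation formula in the remark.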
23.8.2 Definition: A structure-preserving fibre set map for a (G, F ) topological fibre bundle $(E, \pi, B, A^F_E)$
is a homeomorphism $f : E_{b_1} \approx E_{b_2}$ for $b_1, b_2 \in B$ such that
$$\forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \exists g \in G,\ \forall z \in E_{b_1},\quad \phi_2(f(z)) = g\phi_1(z).$$
23.8.3 Remark: The group element g ∈ G in Definition 23.8.2 and Notation 23.8.4 may be thought of as
the coordinates of the map $f : E_{b_1} \approx E_{b_2}$. This is clearer if g is written as
$g_{\phi_1,\phi_2}(f; b_1, b_2) = \beta_{b_2,\phi_2} \circ f \circ \beta_{b_1,\phi_1}^{-1}$, where g and $L_g$ are equated.
23.8.4 Notation: $\mathrm{Iso}_G(E_{b_1}, E_{b_2})$ for a $(G, F)$ topological fibre bundle $(E, \pi, B, A_E^F)$ denotes the set of all
topological isomorphisms from $E_{b_1}$ to $E_{b_2}$ for $b_1, b_2 \in B$ which are equivalent to left translations by group
elements “through the charts”. In other words,
$$\mathrm{Iso}_G(E_{b_1}, E_{b_2}) = \{ f \in \mathrm{Iso}(E_{b_1}, E_{b_2});\ \forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \exists g \in G,\ \forall z \in E_{b_1},\ \phi_2(f(z)) = g\phi_1(z) \}.$$
[ Possibly define some sort of structure on the IsoG (Eb1 , Eb2 ) spaces? There could be some sort of topological
or algebraic structure. Maybe there could be a “dual isomorphism bundle” structure which looks like a fibre
bundle of some sort. At least it should be given a name. In the case of differentiable fibre bundles, there
should be a differentiable structure. ]
[ It would be useful to have a notation such as $\mathrm{Iso}_G(E)$ for the “double fibre bundle” of all structure-preserving
fibre set maps on $(E, \pi, B)$. Thus $\mathrm{Iso}_G(E) = \bigcup_{p,q \in B} \mathrm{Iso}_G(E_p, E_q)$. ]
23.8.5 Notation: $\mathrm{Aut}_G(E_b)$ for a $(G, F)$ topological fibre bundle $(E, \pi, B, A_E^F)$ denotes the set of all
topological automorphisms of $E_b$ for $b \in B$ which are equivalent to left translations by group elements
“through the charts”. That is, $\mathrm{Aut}_G(E_b) = \mathrm{Iso}_G(E_b, E_b)$.
23.8.6 Remark: Notation 23.8.4 is based on Notation 14.13.3, by which $\mathrm{Iso}(E_{b_1}, E_{b_2})$ denotes the set
of homeomorphisms from $E_{b_1}$ to $E_{b_2}$. However, it is superfluous to require that $f \in \mathrm{Iso}(E_{b_1}, E_{b_2})$ for all
$f \in \mathrm{Iso}_G(E_{b_1}, E_{b_2})$ because $(G, F)$ is a continuous transformation group. So
$$\mathrm{Iso}_G(E_{b_1}, E_{b_2}) = \{ f : E_{b_1} \to E_{b_2};\ \forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \exists g \in G,\ \forall z \in E_{b_1},\ \phi_2(f(z)) = g\phi_1(z) \}.$$
In terms of the per-base-point chart notation $\beta_{b,\phi}$ in Definition 23.6.4, one may write
$$\mathrm{Iso}_G(E_{b_1}, E_{b_2}) = \{ \beta_{b_2,\phi_2}^{-1} \circ L_g \circ \beta_{b_1,\phi_1};\ \phi_1 \in A^F_{E,b_1},\ \phi_2 \in A^F_{E,b_2},\ g \in G \}.$$
All maps of the form $\beta_{b_2,\phi_2}^{-1} \circ L_g \circ \beta_{b_1,\phi_1}$ are automatically elements of $\mathrm{Iso}(E_{b_1}, E_{b_2})$ by the conditions of
Definition 23.6.4.

23.8.7 Definition: A (fibre set) automorphism through the charts on a $(G, F)$ fibre bundle $(E, \pi, B, A_E^F)$
is a map of the form $\beta_{b,\phi}^{-1} \circ L_g \circ \beta_{b,\phi} : E_b \approx E_b$ for some $g \in G$, $b \in B$ and $\phi \in A^F_{E,b}$.
23.8.8 Notation: $L^b_{g,\phi}$ for $b \in B$, $g \in G$ and $\phi \in A^F_{E,b}$ for a topological $(G, F)$ fibre bundle $(E, \pi, B, A_E^F)$
denotes the automorphism through the charts $L^b_{g,\phi} = \beta_{b,\phi}^{-1} \circ L_g \circ \beta_{b,\phi} : E_b \approx E_b$.
$L_{g,\phi}$ for $g \in G$ and $\phi \in A_E^F$ for a topological $(G, F)$ fibre bundle $(E, \pi, B, A_E^F)$ denotes the map
$z \mapsto \beta_{\pi(z),\phi}^{-1} \circ L_g \circ \beta_{\pi(z),\phi}(z)$ for $z \in \mathrm{Dom}(\phi)$.
23.8.9 Remark: The map $L_{g,\phi}$ in Notation 23.8.8 is an automorphism $L_{g,\phi} : \mathrm{Dom}(\phi) \approx \mathrm{Dom}(\phi)$.
[ Probably get induced vector fields on differentiable fibre bundles by differentiating $L^b_{g,\phi}$ and $L_{g,\phi}$ with respect
to $g$ via the atlas on $G$. ]
23.8.10 Remark: Theorem 23.8.11 is illustrated in Figure 23.8.1, particularly the proof of part (ii).
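Figure 23.8.1 Fibre set automorphisms (diagram omitted; it shows the fibre space $F$ with left translations $L_{g_1}$, $L_{g_2}$, the fibre charts $\beta_{b,\phi_1}$, $\beta_{b,\phi_2}$, the fibre set $E_b$ in the total space $E$ with automorphisms $L^b_{g_1,\phi_1}$, $L^b_{g_2,\phi_2}$, the projection map $\pi$, and the base point $b$ in the base space $B$)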
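23.8.11 Theorem: The fibre set automorphisms $L^b_{g,\phi}$ for a $(G, F)$ topological fibre bundle $(E, \pi, B, A_E^F)$
for $g \in G$, $\phi \in A_E^F$ and $b \in U_\phi = \pi(\mathrm{Dom}(\phi))$ have the following properties:
(i) $\forall \phi \in A_E^F,\ \forall b \in U_\phi,\ \forall g_1, g_2 \in G,\ L^b_{g_1,\phi} \circ L^b_{g_2,\phi} = L^b_{g_1 g_2,\phi}$;
(ii) $\forall \phi_1, \phi_2 \in A_E^F,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \forall g_1, g_2 \in G,\ L^b_{g_1,\phi_1} \circ L^b_{g_2,\phi_2} = L^b_{g_1 h_{12} g_2 h_{21},\phi_1} = L^b_{h_{21} g_1 h_{12} g_2,\phi_2}$, where
$h_{ij} = g_{\phi_i,\phi_j}(b)$ for $i, j = 1, 2$, and $L_{g_{\phi_i,\phi_j}(b)} = \beta_{b,\phi_i} \circ \beta_{b,\phi_j}^{-1}$ as in Definition 23.6.4 (v);
(iii) $\forall \phi_1, \phi_2 \in A_E^F,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ L^b_{h_{12},\phi_1} = L^b_{h_{12},\phi_2} = \beta_{b,\phi_2}^{-1} \beta_{b,\phi_1}$ for $h_{12}$ as in (ii);
(iv) $\forall \phi_1, \phi_2, \phi_3 \in A_E^F,\ \forall b \in U_{\phi_1} \cap U_{\phi_2} \cap U_{\phi_3},\ L^b_{h_{12},\phi_1} = L^b_{h_{12},\phi_2} = L^b_{h_{32} h_{13},\phi_3}$ for $h_{ij}$ as in (ii);
(v) $\forall \phi_1, \phi_2 \in A_E^F,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \forall g \in G,\ L^b_{g,\phi_1} = L^b_{h_{21},\phi_2} L^b_{g,\phi_2} L^b_{h_{12},\phi_2} = L^b_{h_{21} g h_{12},\phi_2}$.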
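Proof: Part (i) follows from $L^b_{g_1,\phi} L^b_{g_2,\phi} = (\beta_{b,\phi}^{-1} L_{g_1} \beta_{b,\phi})(\beta_{b,\phi}^{-1} L_{g_2} \beta_{b,\phi}) = \beta_{b,\phi}^{-1} L_{g_1} L_{g_2} \beta_{b,\phi} = \beta_{b,\phi}^{-1} L_{g_1 g_2} \beta_{b,\phi}$.
(For simplicity, juxtaposition is used instead of the “◦” symbol to indicate function composition here.)
Part (ii) may be calculated as follows. (See Figure 23.8.1.)
$$\begin{aligned}
L^b_{g_1,\phi_1} L^b_{g_2,\phi_2} &= (\beta_{b,\phi_1}^{-1} L_{g_1} \beta_{b,\phi_1})(\beta_{b,\phi_2}^{-1} L_{g_2} \beta_{b,\phi_2}) \\
&= \beta_{b,\phi_1}^{-1} L_{g_1} L_{h_{12}} L_{g_2} \beta_{b,\phi_2} (\beta_{b,\phi_1}^{-1} \beta_{b,\phi_1}) \\
&= \beta_{b,\phi_1}^{-1} L_{g_1 h_{12} g_2} L_{h_{21}} \beta_{b,\phi_1} \\
&= L^b_{g_1 h_{12} g_2 h_{21},\phi_1}.
\end{aligned}$$
The equality to $L^b_{h_{21} g_1 h_{12} g_2,\phi_2}$ follows similarly. (Note how the indices match up nicely. Spooky, huh?)
Part (iii) follows from $L^b_{h_{12},\phi_1} = \beta_{b,\phi_1}^{-1} L_{h_{12}} \beta_{b,\phi_1} = \beta_{b,\phi_1}^{-1} (\beta_{b,\phi_1} \beta_{b,\phi_2}^{-1}) \beta_{b,\phi_1} = \beta_{b,\phi_2}^{-1} \beta_{b,\phi_1}$. Similarly, $L^b_{h_{12},\phi_2} = \beta_{b,\phi_2}^{-1} (\beta_{b,\phi_1} \beta_{b,\phi_2}^{-1}) \beta_{b,\phi_2} = \beta_{b,\phi_2}^{-1} \beta_{b,\phi_1}$.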

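Part (iv) follows from the calculation:
$$\begin{aligned}
L^b_{h_{12},\phi_1} &= \beta_{b,\phi_1}^{-1} L_{h_{12}} \beta_{b,\phi_1} \\
&= (\beta_{b,\phi_3}^{-1} \beta_{b,\phi_3}) \beta_{b,\phi_1}^{-1} L_{h_{12}} \beta_{b,\phi_1} (\beta_{b,\phi_3}^{-1} \beta_{b,\phi_3}) \\
&= \beta_{b,\phi_3}^{-1} L_{h_{31}} L_{h_{12}} L_{h_{13}} \beta_{b,\phi_3} \\
&= \beta_{b,\phi_3}^{-1} L_{h_{31} h_{12} h_{13}} \beta_{b,\phi_3} \\
&= L^b_{h_{31} h_{12} h_{13},\phi_3} = L^b_{h_{32} h_{13},\phi_3}.
\end{aligned}$$
The other half of part (iv) follows from part (iii). Part (v) follows from the calculation
$$L^b_{g,\phi_1} = \beta_{b,\phi_1}^{-1} L_g \beta_{b,\phi_1} = \beta_{b,\phi_1}^{-1} (\beta_{b,\phi_2} \beta_{b,\phi_2}^{-1}) L_g (\beta_{b,\phi_2} \beta_{b,\phi_2}^{-1}) \beta_{b,\phi_1} = L^b_{h_{21},\phi_2} L^b_{g,\phi_2} L^b_{h_{12},\phi_2} = L^b_{h_{21} g h_{12},\phi_2},$$
where the third equality regroups the factors as $(\beta_{b,\phi_1}^{-1} \beta_{b,\phi_2})(\beta_{b,\phi_2}^{-1} L_g \beta_{b,\phi_2})(\beta_{b,\phi_2}^{-1} \beta_{b,\phi_1})$ and uses part (iii), and the last equality uses part (i).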
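The composition rules of Theorem 23.8.11 can be checked by brute force on a small example. The following sketch assumes a toy model that is not part of the text: G = S3 acting on F = {0, 1, 2}, a single fibre set identified with F, and chart restrictions represented by permutations. The helper names (`compose`, `inverse`, `auto`, `check_theorem`) are hypothetical. It tests parts (i), (ii), (iii) and (v) for every choice of two charts and two group elements.

```python
from itertools import permutations, product

def compose(*perms):
    """Compose permutations right to left: compose(p, q, r) means p after q after r."""
    result = perms[-1]
    for p in reversed(perms[:-1]):
        result = tuple(p[i] for i in result)
    return result

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def auto(g, beta):
    """L^b_{g,phi} = beta^{-1} o L_g o beta for the chart restriction beta at b."""
    return compose(inverse(beta), g, beta)

S3 = list(permutations(range(3)))

def check_theorem(beta1, beta2, g1, g2):
    h12 = compose(beta1, inverse(beta2))   # L_{h12} = beta1 o beta2^{-1}
    h21 = inverse(h12)
    part_i = compose(auto(g1, beta1), auto(g2, beta1)) == auto(compose(g1, g2), beta1)
    lhs = compose(auto(g1, beta1), auto(g2, beta2))
    part_ii = (lhs == auto(compose(g1, h12, g2, h21), beta1)
                   == auto(compose(h21, g1, h12, g2), beta2))
    part_iii = auto(h12, beta1) == auto(h12, beta2) == compose(inverse(beta2), beta1)
    part_v = auto(g1, beta1) == auto(compose(h21, g1, h12), beta2)
    return part_i and part_ii and part_iii and part_v

# Exhaustive check over all pairs of charts and all pairs of group elements.
all_hold = all(check_theorem(*args) for args in product(S3, repeat=4))
# The automorphism for a fixed g does depend on the chart in general.
chart_dependent = any(auto(g, b1) != auto(g, b2) for g in S3 for b1 in S3 for b2 in S3)
print(all_hold, chart_dependent)
```

A finite check of this kind is no substitute for the proof, but it is a quick way to catch an index that has been transposed in rules such as part (ii).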
23.8.12 Remark: Part (i) of Theorem 23.8.11 means that the composition rules for the automorphisms
$L_{g,\phi}$ are the same as for the left translation operators of the transformation group $(G, F)$.
The general composition rule $L^b_{g_1,\phi_1} \circ L^b_{g_2,\phi_2} = L^b_{g_1 h_{12} g_2 h_{21},\phi_1} = L^b_{h_{21} g_1 h_{12} g_2,\phi_2}$ for different charts in part (ii)
involves some sort of conjugation of group elements with the coordinate transition group elements $h_{12}$
and $h_{21} = h_{12}^{-1}$.
Part (iii) in reverse gives a chart transition rule for elements of a fibre set. Thus $\beta_{b,\phi_1}^{-1} \beta_{b,\phi_2}(z) = L^b_{h_{21},\phi_1}(z)$
for all $z \in E_b$. Hence $\beta_{b,\phi_2}(z) = \beta_{b,\phi_1}\big(L^b_{h_{21},\phi_1}(z)\big)$ for all $z \in E_b$. In terms of Notation 23.8.15, one may write
also $L^{b,b}_{e,\phi_1,\phi_2} = L^b_{h_{12},\phi_1} = L^b_{h_{12},\phi_2} = \beta_{b,\phi_2}^{-1} \beta_{b,\phi_1}$.
The fact that $L^b_{h_{12},\phi_1} = L^b_{h_{12},\phi_2}$ in part (iii) suggests that $L^b_{h_{12},\phi}$ is independent of the chart $\phi$. Part (iv)
shows that this is not true in general.
Part (v) is a general chart transition rule. This rule indicates a fundamental problem with the transfer of
group actions from the fibre space $F$ to the fibre sets $E_b$. The problem is that parallelism cannot be specified
by associating a group element $g \in G$ with each point on a curve to indicate a left action on the fibre set
at that point. The left actions $L^b_{g,\phi}$ on $E_b$ are chart-dependent. So the orientation of the fibre sets must be
indicated in general by a different group element for each chart. The purpose of principal fibre bundles is to
try to remove this problem.
23.8.13 Remark: For any topological $(G, F)$ fibre bundle $(E, \pi, B, A_E^F)$, it is tempting to believe that:
$$\forall b_1, b_2 \in B,\ \forall \phi_1 \in A^F_{E,b_1},\ \forall \phi_2 \in A^F_{E,b_2},\ \forall f \in \mathrm{Iso}_G(E_{b_1}, E_{b_2}),\ \forall g \in G,\ \forall z \in E_{b_1},\quad f(L^{b_1}_{g,\phi_1}(z)) = L^{b_2}_{g,\phi_2}(f(z)).$$
In fact, this is false in general. One would think that if you carry out a transformation $g$ on the domain of an
isomorphism, then this should commute with the same transformation on the range. The reason it doesn’t
work is the way in which the fibre charts interact with the isomorphism. Consider for example the parallelism
on $\mathbb{R}^2$ with the group O(2) of rotations and reflections. If the charts at two base points are orientation-preserving, then
the above rule is true. But if one of the charts has the opposite orientation to the other, then the rotation
$L_{g,\phi_1}$ on one fibre set will have the opposite effect to the rotation $L_{g,\phi_2}$ on the other fibre set.
The essence of the problem is the fact that all elements of IsoG (Eb1 , Eb2 ) are themselves left actions on the
fibre space, and these cannot be expected to commute with the left actions (automorphisms) through the
charts on each fibre set. It turns out that in the case of principal fibre bundles, this problem can be resolved
by using a combination of right actions and left actions. Thus a rule of the form f (Rg (z)) = Rg (f (z)) will
be obtained.
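The O(2) example in this remark can be made concrete with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the fibre sets at both base points are identified with the plane, the chart restrictions are orthogonal 2×2 matrices, and the isomorphism f is the identity map of the plane, which lies in Iso_G because the product of a chart matrix with an inverse chart matrix is in O(2). The function names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def transpose(a):
    """Transpose, which is also the inverse for an orthogonal matrix."""
    return tuple(tuple(a[j][i] for j in range(2)) for i in range(2))

def apply(a, v):
    """Apply the 2x2 matrix a to the vector v."""
    return tuple(sum(a[i][k] * v[k] for k in range(2)) for i in range(2))

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

IDENTITY = ((1.0, 0.0), (0.0, 1.0))
REFLECTION = ((1.0, 0.0), (0.0, -1.0))   # orientation-reversing element of O(2)

def through_chart(beta, g):
    """beta^{-1} o L_g o beta for an orthogonal chart matrix beta."""
    return matmul(transpose(beta), matmul(g, beta))

def rule_holds(beta1, beta2, g, z, tol=1e-12):
    """Test f(L_{g,phi1}(z)) == L_{g,phi2}(f(z)) for f = identity map of the plane."""
    lhs = apply(through_chart(beta1, g), z)   # f is the identity, so f(w) = w
    rhs = apply(through_chart(beta2, g), z)
    return all(abs(x - y) <= tol for x, y in zip(lhs, rhs))

g = rotation(math.pi / 2)
z = (1.0, 0.0)
same_orientation = rule_holds(IDENTITY, IDENTITY, g, z)
opposite_orientation = rule_holds(IDENTITY, REFLECTION, g, z)
print(same_orientation, opposite_orientation)
```

With the reflected chart, the rotation through the chart becomes the rotation by the opposite angle, so the two sides of the tempting rule disagree, whereas with two charts of the same orientation they agree.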
[ The following paragraph doesn’t make any sense until you’ve read the sections on tangent bundles. So it
really shouldn’t be here at all! ]
This issue is related to the concept of group invariants because the fibre set isomorphisms preserve all
invariants of the structure group, which is because the isomorphisms are themselves left group actions.
In the case of coordinate bundles for differentiable manifolds, it will turn out that the left action of the
structure group preserves linear combinations. That is, linear combinations are an invariant of the general
linear group. Therefore matrices which represent linear combinations of tangent vectors must be invariant.
But such matrices act on vectors from the right. And this is why coordinate frame bundles are used as
principal fibre bundles. Coordinate frames are really right actions on the tangent vector space.

23.8.14 Definition: A (fibre set) isomorphism through the charts on a $(G, F)$ fibre bundle $(E, \pi, B, A_E^F)$
is a map of the form $\beta_{b_2,\phi_2}^{-1} \circ L_g \circ \beta_{b_1,\phi_1} : E_{b_1} \approx E_{b_2}$
for some $g \in G$, $b_1, b_2 \in B$, $\phi_1 \in A^F_{E,b_1}$ and $\phi_2 \in A^F_{E,b_2}$.
23.8.15 Notation: $L^{b_1,b_2}_{g,\phi_1,\phi_2}$ for $b_1, b_2 \in B$, $g \in G$, $\phi_1 \in A^F_{E,b_1}$ and $\phi_2 \in A^F_{E,b_2}$ for a topological $(G, F)$ fibre
bundle $(E, \pi, B, A_E^F)$ denotes the isomorphism through the charts
$L^{b_1,b_2}_{g,\phi_1,\phi_2} = \beta_{b_2,\phi_2}^{-1} \circ L_g \circ \beta_{b_1,\phi_1} : E_{b_1} \approx E_{b_2}$.
23.8.16 Remark: Notation 23.8.15 is illustrated in Figure 23.8.2.
Figure 23.8.2 Fibre set isomorphisms through the charts (diagram omitted; it shows the fibre space $F$ with the left translation $L_g$, the fibre charts $\beta_{b_1,\phi_1}$, $\beta_{b_2,\phi_2}$, $\beta_{b_3,\phi_3}$, the isomorphisms $L^{b_i,b_j}_{g,\phi_i,\phi_j}$ between the fibre sets over $b_1$, $b_2$, $b_3$, the projection map $\pi$, and the base space $B$)
If the fibre set isomorphisms $L^{b_1,b_2}_{g,\phi_1,\phi_2}$ for base point pairs $(b_1, b_2) \in \pi(\mathrm{Dom}(\phi_1)) \times \pi(\mathrm{Dom}(\phi_2))$ are considered as the
maps $L_{g,\phi_1,\phi_2}$ for variable $(b_1, b_2)$, the result is not a single-valued function because each element of the fibre
set $E_{b_1}$ is mapped to a single element in every fibre set $E_{b_2}$ for $b_2 \in \pi(\mathrm{Dom}(\phi_2))$. Therefore the symbol $L_{g,\phi_1,\phi_2}$
is better thought of as an equivalence relation than as a function. For each choice of $g \in G$ and $\phi_1, \phi_2 \in A_E^F$,
the relation $L_{g,\phi_1,\phi_2}$ sets up an equivalence between one point in each fibre set of $\mathrm{Dom}(\phi_1)$ and one point
in each fibre set of $\mathrm{Dom}(\phi_2)$. In practice, the superscripts on the notations $L^b_{g,\phi}$ and $L^{b_1,b_2}_{g,\phi_1,\phi_2}$ are tedious to
write; so they may be omitted.
[ When “double fibre bundles” have been defined, Remark 23.8.16 can be phrased in such terms. ]
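23.8.17 Theorem: The fibre set isomorphisms $L^{b_1,b_2}_{g,\phi_1,\phi_2}$ for a $(G, F)$ topological fibre bundle $(E, \pi, B, A_E^F)$
for $g \in G$, $\phi_k \in A_E^F$ and $b_k \in U_{\phi_k} = \pi(\mathrm{Dom}(\phi_k))$ for $k = 1, 2$ have the following properties:
(i) $\forall \phi_1, \phi_2, \phi_3 \in A_E^F,\ \forall b_1 \in U_{\phi_1},\ \forall b_2 \in U_{\phi_2},\ \forall b_3 \in U_{\phi_3},\ \forall g_1, g_2 \in G,\ L^{b_2,b_3}_{g_2,\phi_2,\phi_3} \circ L^{b_1,b_2}_{g_1,\phi_1,\phi_2} = L^{b_1,b_3}_{g_2 g_1,\phi_1,\phi_3}$;
(ii) $\forall \phi_1, \phi_2, \phi_3, \phi_4 \in A_E^F,\ \forall b_1 \in U_{\phi_1},\ \forall b_2 \in U_{\phi_2} \cap U_{\phi_3},\ \forall b_3 \in U_{\phi_4},\ \forall g_1, g_2 \in G,\ L^{b_2,b_3}_{g_2,\phi_3,\phi_4} \circ L^{b_1,b_2}_{g_1,\phi_1,\phi_2} = L^{b_1,b_3}_{g_2 h_{32} g_1,\phi_1,\phi_4}$,
where $h_{ij} = g_{\phi_i,\phi_j}(b_2)$ for $i, j = 2, 3$, and $L_{g_{\phi_i,\phi_j}(b)} = \beta_{b,\phi_i} \circ \beta_{b,\phi_j}^{-1}$ as in Definition 23.6.4 (v);
(iii) $\forall \phi_1, \phi'_1, \phi_2, \phi'_2 \in A_E^F,\ \forall b_1 \in U_{\phi_1} \cap U_{\phi'_1},\ \forall b_2 \in U_{\phi_2} \cap U_{\phi'_2},\ \big( L^{b_1,b_2}_{g,\phi_1,\phi_2} = L^{b_1,b_2}_{g',\phi'_1,\phi'_2} \Leftrightarrow g' = g_{\phi'_2,\phi_2}(b_2) \,.\, g \,.\, g_{\phi_1,\phi'_1}(b_1) \big)$;
(iv) $\forall \phi_1, \phi_2 \in A_E^F,\ \forall b_1 \in U_{\phi_1},\ \forall b_2 \in U_{\phi_2},\ L^{b_1,b_2}_{e,\phi_1,\phi_2} = \beta_{b_2,\phi_2}^{-1} \beta_{b_1,\phi_1}$;
(v) $\forall \phi_1, \phi_2 \in A_E^F,\ \forall b_1 \in U_{\phi_1},\ \forall b_2 \in U_{\phi_2},\ L_g = \beta_{b_2,\phi_2} L^{b_1,b_2}_{g,\phi_1,\phi_2} \beta_{b_1,\phi_1}^{-1} : F \to F$.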
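Proof: Part (i) follows by simple calculation (indicating composition by juxtaposition):
$$\begin{aligned}
L^{b_2,b_3}_{g_2,\phi_2,\phi_3} L^{b_1,b_2}_{g_1,\phi_1,\phi_2} &= (\beta_{b_3,\phi_3}^{-1} L_{g_2} \beta_{b_2,\phi_2})(\beta_{b_2,\phi_2}^{-1} L_{g_1} \beta_{b_1,\phi_1}) \\
&= \beta_{b_3,\phi_3}^{-1} L_{g_2 g_1} \beta_{b_1,\phi_1} \\
&= L^{b_1,b_3}_{g_2 g_1,\phi_1,\phi_3}.
\end{aligned}$$
Part (ii) may be calculated similarly as follows:
$$\begin{aligned}
L^{b_2,b_3}_{g_2,\phi_3,\phi_4} L^{b_1,b_2}_{g_1,\phi_1,\phi_2} &= (\beta_{b_3,\phi_4}^{-1} L_{g_2} \beta_{b_2,\phi_3})(\beta_{b_2,\phi_2}^{-1} L_{g_1} \beta_{b_1,\phi_1}) \\
&= \beta_{b_3,\phi_4}^{-1} L_{g_2} L_{h_{32}} L_{g_1} \beta_{b_1,\phi_1} \\
&= L^{b_1,b_3}_{g_2 h_{32} g_1,\phi_1,\phi_4}.
\end{aligned}$$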

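The rule for changes of fibre charts in part (iii) follows from the calculation:
$$\begin{aligned}
L^{b_1,b_2}_{g,\phi_1,\phi_2} &= \beta_{b_2,\phi_2}^{-1} L_g \beta_{b_1,\phi_1} \\
&= \beta_{b_2,\phi'_2}^{-1} (\beta_{b_2,\phi'_2} \beta_{b_2,\phi_2}^{-1}) L_g (\beta_{b_1,\phi_1} \beta_{b_1,\phi'_1}^{-1}) \beta_{b_1,\phi'_1} \\
&= \beta_{b_2,\phi'_2}^{-1} L_{g_{\phi'_2,\phi_2}(b_2)} L_g L_{g_{\phi_1,\phi'_1}(b_1)} \beta_{b_1,\phi'_1}.
\end{aligned}$$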
Parts (iv) and (v) follow trivially from Definition 23.8.14.
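Parts (i), (iii) and (v) of Theorem 23.8.17 can likewise be tested on a finite model. The sketch below assumes, purely for illustration, that G = S3 acts on F = {0, 1, 2}, that every fibre set is identified with F, and that chart restrictions are permutations. The names `iso`, `coordinate`, `check_chart_change` and `check_composition` are hypothetical. It recovers the group element of an isomorphism from its charts and compares the chart change rule of part (iii) with a direct computation.

```python
from itertools import permutations, product

def compose(*perms):
    """Compose permutations right to left: compose(p, q, r) means p after q after r."""
    result = perms[-1]
    for p in reversed(perms[:-1]):
        result = tuple(p[i] for i in result)
    return result

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def iso(g, beta1, beta2):
    """L^{b1,b2}_{g,phi1,phi2} = beta2^{-1} o L_g o beta1."""
    return compose(inverse(beta2), g, beta1)

def coordinate(f, beta1, beta2):
    """Group element g with L_g = beta2 o f o beta1^{-1}, as in part (v)."""
    return compose(beta2, f, inverse(beta1))

S3 = list(permutations(range(3)))

def check_chart_change(g, beta1, beta2, beta1p, beta2p):
    f = iso(g, beta1, beta2)
    recovered = coordinate(f, beta1, beta2) == g
    g_prime = coordinate(f, beta1p, beta2p)
    # Part (iii): g' = g_{phi2',phi2}(b2) . g . g_{phi1,phi1'}(b1)
    predicted = compose(compose(beta2p, inverse(beta2)), g, compose(beta1, inverse(beta1p)))
    return recovered and g_prime == predicted

def check_composition(g1, g2, beta1, beta2, beta3):
    # Part (i): composing through a common middle chart multiplies the group elements.
    return compose(iso(g2, beta2, beta3), iso(g1, beta1, beta2)) == iso(compose(g2, g1), beta1, beta3)

chart_change_holds = all(check_chart_change(*args) for args in product(S3, repeat=5))
composition_holds = all(check_composition(*args) for args in product(S3, repeat=5))
print(chart_change_holds, composition_holds)
```

The function `coordinate` is the “coordinatization” of a fibre set isomorphism by a group element, and the check shows how that group element changes when both charts are replaced.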
23.8.18 Remark: Theorem 23.8.17 (v) implies that any fibre set isomorphism $f : E_{b_1} \to E_{b_2}$ may be
converted to a unique $g \in G$ by calculating $L_g = \beta_{b_2,\phi_2} \circ f \circ \beta_{b_1,\phi_1}^{-1}$. Any isomorphism $f = L^{b_1,b_2}_{g,\phi_1,\phi_2}$
may be regarded as a parallelism relation between the fibre sets at $b_1$ and $b_2$ in $B$. To specify absolute
(path-independent) parallelism, these fibre set isomorphisms are completely adequate. But for pathwise
(path-dependent) parallelism, the definitions of Section 24.2 are required.
Looking ahead to differentiable fibre bundles, the strategy will be to try to differentiate the parameter $g$ for
the map $f$ with respect to the point $b_2$ as it varies along a curve. This derivative will be called a “connection”.
One may think of $g$ as a function $g_{f,\phi_1,\phi_2}(b_1, b_2)$ which is a coordinatization of the map $f(b_1, b_2) : E_{b_1} \to E_{b_2}$.
Thus $g_{f,\phi_1,\phi_2}(b_1, b_2) = \beta_{b_2,\phi_2} f(b_1, b_2) \beta_{b_1,\phi_1}^{-1}$. Looking even further ahead, one may also try to calculate the
exterior derivative of the derivative of $g_{f,\phi_1,\phi_2}(b_1, b_2)$ with respect to $b_2$ to obtain a measure of the curvature
of the parallelism. This is not totally unlike the situation in elementary calculus where the curvature of a
curve in flat 2-space $\mathbb{R}^2$ is related to the second derivative of the curve regarded as a graph.
23.9. Topological principal fibre bundles

23.9.1 Remark: A topological principal fibre bundle is a particular kind of topological fibre bundle, namely
a topological $(G, F)$ fibre bundle $(P, \pi, B)$ such that $F = G$. (Topological fibre bundles were defined in
Section 23.6.) More precisely, the structure group $(G, F) \mathrel{-\!<} (G, T_G, F, T_F, \sigma_G, \mu)$ has $F = G$, $T_F = T_G$
and $\mu = \sigma_G$. So the structure group for a principal fibre bundle is the topological left transformation group
$(G, G) \mathrel{-\!<} (G, T_G, G, T_G, \sigma_G, \sigma_G)$. The corresponding topological group is $G \mathrel{-\!<} (G, T_G, \sigma_G)$. A principal fibre
bundle with structure group $G$ is usually called a “principal G-bundle” or just a “G-bundle”.
With an ordinary fibre bundle, the group $G$ acts on the left on the fibre space $F$. But when the fibre space $F$
equals the group $G$, the group can also act on the right on the fibre space. It is possible to extend the right
action of $G$ from the fibre space to the total space $P$ via the fibre charts in a chart-independent manner. This
makes possible the construction of a topological right transformation group $(G, P) \mathrel{-\!<} (G, T_G, P, T_P, \sigma_G, \mu^P_G)$
in terms of a right action $\mu^P_G : P \times G \to P$. This right transformation group is uniquely determined by the
fibre bundle $(P, \pi, B) \mathrel{-\!<} (P, T_P, \pi, B, T_B, A^G_P)$.
Some authors define principal fibre bundles in reverse. They start with the right transformation group (G, P )
and build the (G, G) fibre bundle (P, π, B) from this. (E.g. Kobayashi/Nomizu [27], page 50, construct the
base space of a principal fibre bundle as the quotient of a G-space P by the right action of a group G.
Then they construct ordinary fibre bundles as associated bundles for left transformation groups (G, F ) as in
Section 23.12.)
Most authors define connections (differential parallelism) on principal fibre bundles rather than ordinary
fibre bundles. Then they copy connections from PFBs to associated OFBs. (They often use constructions in
terms of “lifts” and “pullbacks” which are unnecessarily convoluted and indirect.) So principal fibre bundles
customarily serve as a structure on which to define connections and parallelism. Since connections and
parallelism are defined on OFBs instead of PFBs in this book, the motivation for PFBs is not so strong.
They are, however, important for understanding the literature. (Although PFBs are ubiquitous in the
mathematical DG literature, they are seldom seen in the physics literature.)
It is perhaps interesting to ask whether principal fibre bundles could be defined for $F \ne G$. For this to make
sense, $G$ would need to have two actions on $F$, a left action and a right action, and these two actions must
commute with each other. However, this might not be very useful.
Definition 23.9.2 is the topological version of Definition 34.4.1 for differentiable principal fibre bundles.

23.9.2 Definition: A topological principal (fibre) bundle with structure group $G$ for a topological group
$G \mathrel{-\!<} (G, T_G, \sigma_G)$ is a topological $(G, G)$ fibre bundle $(P, \pi, B) \mathrel{-\!<} (P, T_P, \pi, B, T_B, A^G_P)$
with $(G, G) \mathrel{-\!<} (G, T_G, G, T_G, \sigma_G, \sigma_G)$.
The right action of $G$ on $P$ is the operation $\mu^P_G : P \times G \to P$ defined by $\mu^P_G(z, g) = \beta_{\pi(z),\phi}^{-1}(\sigma_G(\phi(z), g))$ for
$(z, g) \in P \times G$ for any $\phi \in A^G_P$ with $z \in \mathrm{Dom}(\phi)$, where $\beta_{b,\phi} = \phi|_{\pi^{-1}(\{b\})}$ for $b \in \pi(\mathrm{Dom}(\phi))$.
The right transformation group of $G$ on $P$ is the topological right transformation group
$(G, P) \mathrel{-\!<} (G, T_G, P, T_P, \sigma_G, \mu^P_G)$.
A topological principal fibre bundle with structure group $G$ is also called a topological principal G-bundle or
a topological G-bundle.
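23.9.3 Remark: Figure 23.9.1 illustrates the spaces and maps in Definition 23.9.2. The map $R_g : P \to P$
in Figure 23.9.1 is defined in Notation 23.9.4, and $L_g : G \to G$ is the left translation map $L_g : g' \mapsto g g'$.
Figure 23.9.1 Principal fibre bundle spaces and maps (diagram omitted; it shows $\pi^{-1}(U) \subseteq P$ with the right action $R_g$, the group $G$ with $R_g$ and $L_g$, the maps $\pi$, $\phi$ and $\pi \times \phi$, and the sets $U \subseteq B$ and $U \times G \subseteq B \times G$)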
[ Kobayashi/Nomizu [27], page 50, says that M = P/G and π is the projection map of this quotient. ]
23.9.4 Notation: $R_g$ for $g \in G$ for a principal fibre bundle $(P, \pi, B, A^G_P)$ denotes the map $R_g : P \to P$
defined by $R_g : z \mapsto \mu^P_G(z, g)$, where $\mu^P_G$ is the right action of $G$ on $P$.
23.9.5 Theorem: The right action $\mu^P_G$ in Definition 23.9.2 is fibre chart independent.
Proof: For any fibre charts $\phi, \phi' \in A^G_P$ with $z \in \mathrm{Dom}(\phi) \cap \mathrm{Dom}(\phi')$, let $g' \in G$ be such that
$\beta_{b,\phi'} \circ \beta_{b,\phi}^{-1} = L_{g'}$. Then
$$\begin{aligned}
\beta_{b,\phi}^{-1}(\sigma_G(\beta_{b,\phi}(z), g)) &= \beta_{b,\phi'}^{-1} \circ L_{g'}(\sigma_G(L_{(g')^{-1}} \circ \beta_{b,\phi'}(z), g)) \\
&= \beta_{b,\phi'}^{-1}(\sigma_G(\beta_{b,\phi'}(z), g)).
\end{aligned}$$
The theorem follows from this.
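The contrast between Theorem 23.9.5 and Remark 23.8.1 can be seen in a finite model. The sketch below is an illustration only, under assumptions that are not in the text: G = S3, a single fibre set of the principal bundle is modelled as a copy of G, and each chart restriction is taken to be a left translation L_a, so that any two charts differ by a left translation as Definition 23.6.4 requires. The function names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'p after q'; a tuple p sends i to p[i]."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))   # S3; the fibre set is modelled as a copy of G

def right_via_chart(a, z, g):
    """beta^{-1}(beta(z) g) for the hypothetical chart beta = L_a on the fibre set."""
    return compose(inverse(a), compose(compose(a, z), g))

def left_via_chart(a, z, g):
    """beta^{-1}(g beta(z)) for the same chart beta = L_a."""
    return compose(inverse(a), compose(g, compose(a, z)))

# An action is chart-independent if all charts a give the same image of z under g.
right_independent = all(
    len({right_via_chart(a, z, g) for a in G}) == 1 for z in G for g in G)
left_independent = all(
    len({left_via_chart(a, z, g) for a in G}) == 1 for z in G for g in G)
print(right_independent, left_independent)
```

In this model the right action through any chart reduces to z ↦ zg by associativity, while the left action through the chart L_a is z ↦ a⁻¹gaz, which depends on a whenever g is not central.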
23.9.6 Remark: This chart independence in Theorem 23.9.5 is due to the fact that the fibre space is a
group whose elements can be acted on from both the left and the right. One could say that group elements
are “two-port” objects. Since groups are associative, the right and left actions commute with each other. The
elements of ordinary fibre spaces are then “single-port” objects. As noted in Remark 23.8.1, it is apparently
not possible to define a chart-independent right action for ordinary fibre bundles.
If the expression $\beta_{\pi(z),\phi}^{-1}(\sigma_G(\phi(z), g))$ for $\mu^P_G(z, g)$ in Definition 23.9.2 seems untidy, it may also be written
as $\mu^P_G(z, g) = (\pi \times \phi)^{-1}(\pi(z), \phi(z)g)$.
23.9.7 Remark: The right action $R_g : P \to P$ of $G$ on $P$ in Notation 23.9.4 may be expressed in terms
of the right action $\bar{R}_g : G \to G$ of $G$ on $G$ as $R_g = \beta_{b,\phi}^{-1} \circ \bar{R}_g \circ \beta_{b,\phi}$ for $g \in G$ and $b \in \pi(\mathrm{Dom}(\phi))$. In other
words, the action of $G$ on $P$ is the same as the action of $G$ on $G$ through the charts. Therefore the right
action $\mu^P_G$ does not provide any information that is not already in the fibre bundle.
23.9.8 Remark: Condition (ii) of Theorem 23.9.9 is often given as part of the definition of a principal fibre
bundle, whereas here it is presented as a consequence of the definition of a right action which is expressed
in terms of the components of the standard fibre bundle tuple in Definition 23.9.2.

23.9.9 Theorem: The right action $\mu^P_G$ on a topological principal fibre bundle $(P, \pi, B, A^G_P)$ is the unique
map $\mu^P_G : P \times G \to P$ which satisfies:
(i) $\forall z \in P,\ \forall g \in G,\ \pi(\mu^P_G(z, g)) = \pi(z)$; (That is, $\pi(zg) = \pi(z)$.)
(ii) $\forall \phi \in A^G_P,\ \forall z \in \mathrm{Dom}(\phi),\ \forall g \in G,\ \phi(\mu^P_G(z, g)) = \sigma_G(\phi(z), g)$. (That is, $\phi(zg) = \phi(z)g$.)
[ Show explicitly that actions $L^b_{g,\phi}$ and $R_g$ commute on $P$. ]
23.9.10 Remark: In Definition 23.6.4 for ordinary fibre bundles, the structure group $G$ is required to act
effectively on the fibre space $F$. Definition 23.9.2 does not need to specify that the right action of $G$ on $G$
is effective because a group always acts freely on itself, and a free action on a non-empty set is necessarily
effective, as mentioned in Remark 9.4.22.
The right action $\mu^P_G$ of $G$ on $P$ is free. To see this in terms of the conditions in Theorem 23.9.9, suppose that $z \in P$
and $g \in G \setminus \{e\}$ satisfy $\mu^P_G(z, g) = z$; that is, $zg = z$. Then by (i), $\pi(zg) = \pi(z)$. So $zg$ and $z$ are in the same
fibre set of $P$; that is, they have the same base point $b = \pi(zg) = \pi(z)$. From (ii) it follows that $\phi(zg) = \phi(z)g$
for any chart $\phi \in A^G_P$ such that $b \in U_\phi = \pi(\mathrm{Dom}(\phi))$. Since $G$ acts freely on $G$, this implies that $\phi(zg) \ne \phi(z)$.
But $\pi \times \phi : \pi^{-1}(U_\phi) \approx U_\phi \times F$ is a bijection. Therefore $zg = (\pi \times \phi)^{-1}(b, \phi(zg)) \ne (\pi \times \phi)^{-1}(b, \phi(z)) = z$,
which contradicts the assumption $zg = z$.
This means that non-identity elements of $G$ have no fixed points, and $G$ therefore acts freely on $P$.
23.9.11 Remark: Theorem 23.9.9 (ii) means that all fibre charts are equivariant maps between the right
transformation groups (G, P ) and (G, G). (See Definition 9.5.8 for equivariant maps.)
Condition (i) means that the action of G on P is “vertical”. Without this condition, the action of G on P
would not be uniquely determined by the fibre charts as a result of condition (ii).
[ No textbooks seem to mention that the group action on the total space in Theorem 23.9.9 must satisfy
condition (i): $\pi(zg) = \pi(z)$. Check that this is not implied by other conditions. ]
23.9.12 Remark: In contrast to the ease of constructing $\mu^P_G$ from the atlas $A^G_P$ in Remark 23.9.7, it is not
possible to construct the atlas from the right action $\mu^P_G$. It is true that if $\phi(z)$ is known for one value of
$z \in \pi^{-1}(\{b\})$ for a given $b \in \pi(\mathrm{Dom}(\phi))$, then $\phi(zg)$ is uniquely defined for all $g \in G$ as $\phi(zg) = \phi(z)g$, and
this means (by Definition 23.6.4 (ii)) that $\phi(z)$ is uniquely defined for all $z \in \pi^{-1}(\{b\})$. Thus the fibre charts
are fully defined if they are defined for just one value of $z$ in each fibre $\pi^{-1}(\{b\})$. However, this single value
cannot be obtained from $\mu^P_G$. Therefore the information in the fibre charts is partly but not fully redundant.
To specify a fibre chart $\phi$, it is sufficient to specify the set $\phi^{-1}(\{e\})$, where $e \in G$ is the identity of $G$.
The rest of the fibre chart is uniquely determined by this. One may think of the unique point
$\beta_{b,\phi}^{-1}(e) \in \phi^{-1}(\{e\}) \cap \pi^{-1}(\{b\})$ as the “identity point” in each fibre set $\pi^{-1}(\{b\})$. For a tangent bundle, this gives a
kind of “coordinate frame field” on $B$.

[ Kobayashi/Nomizu [27], page 50, construct $B = P/G$. They then derive $\beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1} = L_g$ from the right
action of $G$ on $P$ on page 51. Present this construction here also. ]
23.9.13 Remark: Structure-preserving fibre set maps are defined for principal fibre bundles in exactly the
same way as for ordinary fibre bundles in Section 23.8. As mentioned in Remark 23.8.1, left actions on
fibre sets by the structure group do not generally commute with the fibre chart transition maps. Therefore
the structure group element $g \in G$ in Definition 23.8.2 is chart-dependent. This is reflected in Notation
23.8.8 where $L_{g,\phi}$ denotes left translation of fibre sets by $g$ with respect to a fibre chart $\phi$. This is still true
for left actions on principal fibre bundles, but for a PFB it is possible to define right actions also (as in
Notation 23.9.4), and right actions do commute with fibre chart transition maps.
[ Near here, replicate Section 23.8 using right actions $R_g$ on principal fibre bundles instead of left actions?
Then show how left and right actions interact? E.g. $R^G_g \circ \phi = \phi \circ R^P_g$ for partially defined charts $\phi : P \mathrel{\mathring{\to}} G$? ]
23.10. Associated topological fibre bundles

Two fibre bundles are said to be “associated” if they have the same fibre chart transition maps. It follows
that associated fibre bundles must have the same structure group because fibre chart transition maps are
equivalent to left translations Lg of the fibre space F by elements g of the structure group G, but the fibre

spaces F may be different. Fibre chart transition maps describe how patches of the total space are glued
together. Therefore associated fibre bundles must be glued together in the same way, even though they may
have a different fibre space. The equality of fibre chart transition maps also implies that the fibre bundle
base spaces B must be the same.
Prime examples of associated fibre bundles are tensor bundles $T^{r,s}(M)$ of different types $(r, s)$ on the same
differentiable manifold $M$. (See Section 28.3 for tensors.) These have the same structure group $G$
(typically GL(n) for an n-dimensional manifold) and the same fibre chart transition rules. Tensors of
different type have different fibre spaces F and transformation rules µ : G × F → F , but the coordinate
charts φ themselves are related to each other by the same transition rules gφ1 ,φ2 for each type of tensor.
Thus the group elements are the same, but the actions of the group elements on the fibre spaces are different
because the fibre spaces are different.
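The statement that the group elements are the same while the actions differ can be illustrated with vectors and covectors in two dimensions. The sketch below is only an illustration and rests on an assumption that is not taken from the text: it uses the common convention that a matrix a in GL(2) acts on vector components by a and on covector components by the inverse transpose of a, and the book’s own conventions for tensor components may differ. All function names are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def transpose(a):
    return tuple(tuple(a[j][i] for j in range(2)) for i in range(2))

def inverse(a):
    """Exact inverse of an invertible 2x2 integer matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((Fraction(a[1][1], det), Fraction(-a[0][1], det)),
            (Fraction(-a[1][0], det), Fraction(a[0][0], det)))

def apply(a, v):
    return tuple(sum(a[i][k] * v[k] for k in range(2)) for i in range(2))

def act_on_vector(a, v):
    """Assumed action of the group element a on vector components."""
    return apply(a, v)

def act_on_covector(a, w):
    """Assumed action of the same group element a on covector components."""
    return apply(transpose(inverse(a)), w)

def pairing(w, v):
    return sum(x * y for x, y in zip(w, v))

a = ((1, 2), (0, 1))      # one group element of GL(2)
b = ((1, 0), (3, 1))      # another group element, used to test the action property
v = (3, 4)                # vector components
w = (5, 6)                # covector components

new_v = act_on_vector(a, v)
new_w = act_on_covector(a, w)
print(new_v, new_w, pairing(new_w, new_v))
```

Under this convention both rules are left actions of the same group, and the pairing of a covector with a vector is unchanged when both are transformed by the same group element, which is one way to see that the two bundles are glued together by the same transition elements.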
Associated fibre bundles could be defined to have different base spaces, but usually they have the same base
space as in Definition 23.10.5. This is understandable because the main purpose of defining them is so as to
transfer parallelism from one fibre bundle to another under the assumption that they are based on the same
underlying fibre structure. Thus, for instance, all the different kinds of tensor bundles on a differentiable
manifold are based on a single tangent bundle. If the base spaces are different, one may transfer parallelism
between fibre bundles using fibre bundle isomorphisms.
[ Maybe could transfer parallelism between fibre bundles by using the inverse of a fibre bundle homomorphism! ]
Fibre bundle associations are quite different to fibre bundle homomorphisms (which are defined in
Section 23.7). A fibre bundle association specifies a map between the fibre atlases of the fibre bundles. This
fibre atlas map is required to preserve the chart transition maps, but there is no specified map between the
total spaces of the associated bundles. A fibre bundle association may be thought of as an isomorphism of
the fibre atlases.
[ Investigate how a homomorphism h2 : ξ2 → η can be defined in terms of a homomorphism h1 : ξ1 → η if ξ1
and ξ2 are associated fibre bundles. ]
Specific methods for constructing associated fibre bundles are presented in Sections 23.11 and 23.12.
23.10.1 Remark: If $(G, F)$ and $(G, \tilde{F})$ are effective left transformation groups, then the fibre chart
transition functions uniquely determine group elements. These group elements can then be compared with each
other. If they are equal for some bijection between the atlases, then the fibre bundles are said to be associated. This
shows a very good reason why the transformation groups should be effective. Since parallelism is determined
by group elements at each base point, it follows that parallelism can be uniquely transferred from any fibre
bundle to any associated fibre bundle. In the case of differentiable fibre bundles, this means that connections
may be ported between different types of fibre bundles if they have the same base space and structure group.
In particular, affine connections on tangent bundles can be copied to all types of tensor
bundles.
23.10.2 Remark: Definitions 23.10.3 and 23.10.5 are pleasantly simple compared to the definitions of
associated fibre bundles in most textbooks. The usual definitions are actually particular methods of construction
of associated fibre bundles, and it is the methods of construction which are complicated. Definition 23.10.3
says only that associated fibre bundles must have exactly matching domains for all charts in the respective
atlases, and that all chart transition functions must be the same.
Definition 23.10.3 is illustrated in Figure 23.10.1. Note how the correspondence between the fibre bundles
involves only a map h between the charts. There are no maps between the spaces of the two fibre bundles.
This should be contrasted with fibre bundle homomorphisms as illustrated in Figure 23.7.2, which has maps
between all components of the two fibre bundles. Note also that although the spaces “on the outside”
(G and B) are the same, the spaces “on the inside” (F and E) are different. Therefore the group action
maps µ and µ̃ are different and the projection maps π and π̃ are different. In the middle, the charts φi and
φ̃i = h(φi ) are related by the fibre space association map h.
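23.10.3 Definition: A topological fibre bundle association is a bijection $h : A^F_E \to A^{\tilde{F}}_{\tilde{E}}$ between the fibre
atlases of topological $(G, F)$ and $(G, \tilde{F})$ fibre bundles $(E, \pi, B, A^F_E)$ and $(\tilde{E}, \tilde{\pi}, B, A^{\tilde{F}}_{\tilde{E}})$ respectively such that:
(i) $\forall \phi \in A^F_E,\ \pi(\mathrm{Dom}(\phi)) = \tilde{\pi}(\mathrm{Dom}(h(\phi)))$;
(ii) $\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ g_{\phi_1,\phi_2}(b) = \tilde{g}_{h(\phi_1),h(\phi_2)}(b)$, where $U_\phi$ denotes $\pi(\mathrm{Dom}(\phi))$, and $g$, $\tilde{g}$ denote
the fibre chart transition functions for the respective fibre bundle atlases.
Figure 23.10.1 Associated topological fibre bundles (diagram omitted; it shows two fibre bundles side by side with the same group $G$ and base space $B$, the group actions $\mu$ and $\tilde{\mu}$ on the fibre spaces $F$ and $\tilde{F}$, the charts $\phi_1$, $\phi_2$ on $E$ and $h(\phi_1)$, $h(\phi_2)$ on $\tilde{E}$ related by the map $h$, and the projection maps $\pi$ and $\tilde{\pi}$)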
[ What can be said about the relation between the topologies on E and Ẽ in Definition 23.10.3? ]
23.10.4 Remark: Perhaps a clearer way of presenting Definition 23.10.3 (ii) is:
$$\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \forall g \in G,$$
$$\big( \forall z \in \pi^{-1}(\{b\}),\ \phi_1(z) = g\phi_2(z) \big) \Leftrightarrow \big( \forall \tilde{z} \in \tilde{\pi}^{-1}(\{b\}),\ h(\phi_1)(\tilde{z}) = g\,h(\phi_2)(\tilde{z}) \big). \qquad (23.10.1)$$
This is similar to equation (24.3.1) in Definition 24.3.2 for associated parallelism. (For proof of this remark,
see Exercise 46.8.2.)
23.10.5 Definition: Associated topological fibre bundles are topological $(G, F)$ and $(G, \tilde{F})$ fibre bundles
$(E, \pi, B, A^F_E)$ and $(\tilde{E}, \tilde{\pi}, B, A^{\tilde{F}}_{\tilde{E}})$ for which a topological fibre bundle association $h : A^F_E \to A^{\tilde{F}}_{\tilde{E}}$ is specified.
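23.10.6 Remark: Recall from Definition 23.6.4 that the chart transition functions for Definition 23.10.3
are defined by $L_{g_{\phi_1,\phi_2}(b)} = \beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1}$ and
$L_{\tilde{g}_{h(\phi_1),h(\phi_2)}(b)} = \tilde{\beta}_{b,h(\phi_1)} \circ \tilde{\beta}_{b,h(\phi_2)}^{-1}$, where $\beta$ and $\tilde{\beta}$ are notations
which are defined by $\beta_{b,\phi} = \phi|_{\pi^{-1}(\{b\})}$ and $\tilde{\beta}_{b,\tilde{\phi}} = \tilde{\phi}|_{\tilde{\pi}^{-1}(\{b\})}$. Definition 23.10.3 (ii) means that the group
elements $g$ and $\tilde{g}$ are equal, but $L_g : F \approx F$ and $L_{\tilde{g}} : \tilde{F} \approx \tilde{F}$ are not equal in general. Even if the spaces $F$
and $\tilde{F}$ are the same, the group actions $\mu : G \times F \to F$ and $\tilde{\mu} : G \times \tilde{F} \to \tilde{F}$ might be different.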
23.10.7 Example: The fibre bundle association for fibre bundles in Definition 23.10.5 is generally not
unique even if the atlases of the fibre bundles are fixed. As a trivial example, consider $F = \tilde{F} = (-1, 1) \subseteq \mathbb{R}$
with the relative topology from $\mathbb{R}$. Let $B = \{0\} \subseteq \mathbb{R}$, let $E = \tilde{E} = F$, and let $G = \mathrm{Aut}(F)$, the group of
homeomorphisms from $F$ to $F$. Define charts $\phi_1 : E \to F$ and $\phi_2 : E \to F$ by $\phi_1 : x \mapsto x$ and $\phi_2 : x \mapsto -x$,
and let $A^F_E = A^{\tilde{F}}_{\tilde{E}} = \{\phi_1, \phi_2\}$. Of course, $\pi : x \mapsto 0$ and $\tilde{\pi} : x \mapsto 0$.
Since $\xi = (E, \pi, B, A^F_E)$ and $\tilde{\xi} = (\tilde{E}, \tilde{\pi}, B, A^{\tilde{F}}_{\tilde{E}})$ are identical fibre bundles, the identity map $h_1 : A^F_E \to A^{\tilde{F}}_{\tilde{E}}$ is
a topological fibre bundle association according to Definition 23.10.3. But another topological fibre bundle
association is the map $h_2 : A^F_E \to A^{\tilde{F}}_{\tilde{E}}$ defined by $h_2 : \phi_1 \mapsto \phi_2$ and $h_2 : \phi_2 \mapsto \phi_1$. To show that Definition
23.10.3 (ii) is satisfied, note that $\tilde{g}_{h_2(\phi_1),h_2(\phi_2)}(b) = \tilde{g}_{\phi_2,\phi_1}(0) : x \mapsto -x$. But $g_{\phi_1,\phi_2}(b) : x \mapsto -x$ also. Condition
(ii) follows for other chart combinations similarly. So $h_1$ and $h_2$ are topological fibre bundle associations
between $\xi$ and $\tilde{\xi}$ and $h_1 \ne h_2$.
Since a fibre bundle may be associated with itself via an atlas bijection which is not the identity, such a self-
association may be composed with associations to different fibre bundles to also make associations between
different fibre bundles non-unique. It follows that when parallelism is being copied between associated fibre
bundles, it is essential to specify which association map is being used between the atlases.
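The two associations in Example 23.10.7 can be checked numerically. The sketch below is a hedged illustration: homeomorphisms of (−1, 1) are represented by Python functions and compared only on a handful of sample points, which is evidence rather than proof, and the names `transition`, `same_map` and `is_association` are hypothetical. Condition (i) of Definition 23.10.3 is automatic here because every chart has domain E.

```python
def phi1(x):
    """Chart phi1 : x -> x on E = (-1, 1)."""
    return x

def phi2(x):
    """Chart phi2 : x -> -x on E = (-1, 1)."""
    return -x

inverse_of = {phi1: phi1, phi2: phi2}      # both charts are their own inverses
ATLAS = (phi1, phi2)
SAMPLES = (-0.75, -0.25, 0.0, 0.5, 0.9)    # test points in F = (-1, 1)

def transition(phi_i, phi_j):
    """Chart transition map phi_i o phi_j^{-1} : F -> F, an element of G = Aut(F)."""
    return lambda x: phi_i(inverse_of[phi_j](x))

def same_map(u, v):
    """Compare two maps on the sample points only."""
    return all(u(x) == v(x) for x in SAMPLES)

def is_association(h):
    """Check condition (ii): transition maps agree after relabelling charts by h."""
    return all(same_map(transition(a, b), transition(h[a], h[b]))
               for a in ATLAS for b in ATLAS)

h1 = {phi1: phi1, phi2: phi2}   # identity association
h2 = {phi1: phi2, phi2: phi1}   # association which swaps the two charts

print(is_association(h1), is_association(h2), h1 != h2)
```

Both atlas bijections pass the check because the transition map between the two charts is x ↦ −x in either direction, so swapping the charts leaves every transition map unchanged.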

23.10.8 Remark: Definition 23.10.5 is very strict. It requires the atlases of associated topological fibre
bundles to have closely matched domains and identical fibre chart transition maps. In practice, one may
be content to call a pair of fibre bundles associated if they are merely C^0 equivalent to a pair of strictly
associated fibre bundles. This is presented in Definition 23.10.9.
[ In Remark 23.10.8, could talk about “strongly associated” and “weakly associated” fibre bundles. ]
[ Must define C^0 equivalent topological fibre bundles in Section 23.7. ]
23.10.9 Definition: C^0 associated topological fibre bundles are topological fibre bundles ξ1, ξ̃1 with the
same base space which are C^0 equivalent to associated topological fibre bundles ξ2, ξ̃2 respectively.
23.10.10 Remark: Definition 23.10.9 is illustrated in Figure 23.10.2. Definition 23.10.9 means that if ξ2
and ξ̃2 are associated topological fibre bundles according to Definition 23.10.5, and ξ1 and ξ2 are C^0 equivalent,
and ξ̃1 and ξ̃2 are C^0 equivalent, and they all have the same base space, then ξ1 and ξ̃1 are C^0 associated
topological fibre bundles. (See Definition 23.7.11 for C^0 equivalent topological fibre bundles.) Definition
23.10.9 permits the C^0 associated fibre bundles ξ1, ξ̃1 to have different but equivalent structure groups (G1, F1)
and (G̃1, F̃1), but the base spaces are required to be identical. This is somewhat arbitrary and may be
changed in light of the requirements for any applications.
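[ Diagram omitted: one column for each of the fibre bundles ξ1, ξ2, ξ̃2, ξ̃1, showing the structure groups
G1, G2, G̃2 = G2, G̃1, the group actions µ1, µ2, µ̃2, µ̃1, the fibre spaces F1, F2, F̃2, F̃1, the fibre charts
φ1, φ2, φ̃2, φ̃1, the total spaces E1, E2, Ẽ2, Ẽ1 and the projections π1, π2, π̃2, π̃1 onto the common base space B.
The bundles ξ1 and ξ2 are related by an isomorphism, ξ2 and ξ̃2 by the association h : φ2 ↦ h(φ2),
and ξ̃2 and ξ̃1 by an isomorphism. ]

Figure 23.10.2 C^0 associated topological fibre bundles, Definition 23.10.9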
23.11. Construction of associated topological fibre bundles

Associated fibre bundles are constructed in this section as identification spaces. These are similar to the
identification spaces in Section 23.4.
23.11.1 Remark: Definition 23.11.2 constructs an associated (G, F̃ ) fibre bundle from a given (G, F )
fibre bundle. The only information that the associated bundle inherits from the given bundle is the set
of transition maps gφ1 φ2 of Definition 23.6.4 for topological fibre bundles. (See Kobayashi/Nomizu [27],
Prop. 5.2, page 52, for the related result that a principal fibre bundle may be constructed from any set of
transition maps covering the base space and satisfying a transitivity rule. An almost identical result is at
the end of EDM2 [35], 147.B.)
23.11.2 Definition: The associated topological (G, F̃) fibre bundle (identification space method) of a given
topological (G, F) fibre bundle (E, π, B) −< (E, T_E, π, B, T_B, A^F_E), for topological left transformation groups
(G, F) −< (G, T_G, F, T_F, σ, µ^F_G) and (G, F̃) −< (G, T_G, F̃, T_F̃, σ, µ^F̃_G), is the topological (G, F̃) fibre bundle
(Ẽ, π̃, B) −< (Ẽ, T_Ẽ, π̃, B, T_B, A^F̃_Ẽ) defined by:
(i) Ẽ = {[(b, y, φ)]; b ∈ B, y ∈ F̃, φ ∈ A^F_{E,b}}, where [(b, y, φ)] = {(b, g_{φ′φ}(b)y, φ′); φ′ ∈ A^F_{E,b}}, the transition
maps g_{φ1φ2} : U_{φ1} ∩ U_{φ2} → G are defined by L_{g_{φ1,φ2}(b)} = φ1 ∘ (φ2|_{π^{-1}({b})})^{-1}, and Dom(φ) = π^{-1}(U_φ)
for φ ∈ A^F_E;
(ii) π̃ : Ẽ → B is defined by π̃ : [(b, y, φ)] ↦ b;
(iii) A^F̃_Ẽ = {φ̃; φ ∈ A^F_E}, where φ̃ : π̃^{-1}(U_φ) → F̃ is defined for φ ∈ A^F_E by φ̃ : [(b, y, φ)] ↦ y;
(iv) T_Ẽ = {⋃_{φ∈A^F_E} (π̃ × φ̃)^{-1}(Ω_φ); Ω : A^F_E → IP(B × F̃) and ∀φ ∈ A^F_E, Ω_φ ∈ Top(U_φ × F̃)}.
[ Create a diagram for Definition 23.11.2. ]
23.11.3 Remark: The fibre bundle (Ẽ, π̃, B) which is constructed in Definition 23.11.2 is well defined
and satisfies Definition 23.10.3 for a fibre bundle association because the chart transition maps satisfy
φ̃2 : [(b, y1, φ1)] ↦ g_{φ2,φ1}(b)y1 by conditions (i) and (iii).
It would perhaps be more logical to use the charts φ̃ as tags for triples (b, y, φ) instead of φ, but then they
would be used in (i) although they are not defined until (iii). Besides, there is a one-to-one map between
them anyway.
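The identification space construction can be tried out on a very small finite model. The sketch below is an assumption-laden toy, not the general construction: it uses a single base point, the group G = Z/4 acting on F̃ = {0, 1, 2, 3} by addition mod 4, and three hypothetical chart labels whose transition elements are differences of arbitrary offsets (so that the transitivity rule for transition maps holds automatically).

```python
# Sketch of the identification space construction over a single base point.
# Assumptions (not from the text): G = Z/4 acting on F~ = {0,1,2,3} by
# addition mod 4, and three hypothetical charts whose transition maps are
# g_{a,b} = offset[a] - offset[b] (mod 4), which satisfies the cocycle rule.

N = 4
offset = {"phi_a": 0, "phi_b": 1, "phi_c": 3}   # hypothetical chart labels
charts = list(offset)
b = 0                                           # the only base point

def g(chart_to, chart_from):
    """Transition group element g_{chart_to, chart_from}(b)."""
    return (offset[chart_to] - offset[chart_from]) % N

def act(group_element, y):
    """Left action of G on the fibre space F~."""
    return (group_element + y) % N

def cls(y, chart):
    """Equivalence class [(b, y, chart)] as a frozenset of tagged triples."""
    return frozenset((b, act(g(other, chart), y), other) for other in charts)

def chart_tilde(chart, equivalence_class):
    """The chart φ~ reads off the y-component of the member tagged by chart."""
    (value,) = [y for (_, y, tag) in equivalence_class if tag == chart]
    return value

triples = [(y, chart) for y in range(N) for chart in charts]

# Every member of a class generates the same class, so the classes partition
# the set of tagged triples.
consistent = all(
    cls(y2, c2) == cls(y, c)
    for (y, c) in triples for (_, y2, c2) in cls(y, c)
)

# Chart transition rule: φ~2([(b, y1, φ1)]) = g_{φ2,φ1}(b) y1.
transition_rule = all(
    chart_tilde(c2, cls(y1, c1)) == act(g(c2, c1), y1)
    for (y1, c1) in triples for c2 in charts
)

print(len({cls(y, c) for (y, c) in triples}), consistent, transition_rule)
```

In this toy model the twelve tagged triples collapse to four classes, one for each element of the fibre space, and the charts φ̃ transform by the same group elements as the charts of the given bundle.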
23.11.4 Remark: If the fibre space F̃ in Definition 23.11.2 is the structure group G, then the associated
fibre bundle (Ẽ, π̃, B) = (P, q, B) is a principal G-bundle. In this case, the right action µ^P_G : P × G → P of
the G-bundle is defined by µ^P_G : ([(b, g, φ)], g′) ↦ [(b, gg′, φ)].
[ Refer to the example of the tangent bundle of a differentiable manifold. ]

[ Try to show relations between actions Rg and Rf for associated ordinary and principal fibre bundles. ]
[ Here present the 4-tuple equivalence class construction for associated fibre bundles. ]
23.12. Construction of associated topological fibre bundles via orbit spaces

The “orbit space method” of defining associated fibre bundles constructs associated ordinary fibre bundles
from a given principal fibre bundle. The orbit space method is less general than the identification space
method in Section 23.11 because the given fibre bundle must be a PFB, but it is the method most often
encountered in textbooks.
The author has spent a tremendous amount of time and energy attempting to find natural generalizations
or a deeper theoretical context for the popular “orbit space method”. It seems, however, that the orbit
space construction is a red herring which leads to nothing of interest. Therefore it is de-emphasized in this
book. Differential geometry probably does not need it. In practice, no one ever seems to use the orbit-space
construction directly because it is too abstract. Instead they show that some more concrete construction is
isomorphic to the orbit-space associated fibre bundle, and they use the more concrete construction instead.
So the orbit-space construction is perhaps one of those things which one should just learn and forget.
23.12.1 Remark: The identification space method uses fibre charts φ as tags for pairs (b, y) ∈ B × F , for a
base space B and fibre space F , to make tagged tuples (b, y, φ). The fibre chart tags determine the required
transformation of the fibre space element y when changing the fibre chart. So for an arbitrary chart φ′, the
fibre space element for (b, y, φ) is calculated as g_{φ′,φ}(b)y in terms of the chart transition map g_{φ′,φ}(b) for the
given fibre bundle.
The orbit space method, on the other hand, uses tuples of the form (z, y) ∈ P × F , for a given PFB total
space P . The component z ∈ P contains the same information as the combination of the base point b and
fibre chart φ in the identification space method. The fibre space element corresponding to a pair (z, y) is
easily calculated as φ′(z)y for an arbitrary chart φ′. This gives exactly the same answer as in the identification
space method because φ′(z) = g_{φ′,φ}(b)φ(z). The orbit space method looks simpler, but in practice exactly
the same calculation is required.
In summary, the tuple (b, y, φ) in the identification space method carries around a copy of the base point
b and fibre chart φ so that the fibre chart transition map g_{φ′,φ}(b) can be applied correctly to change y
to y′ = g_{φ′,φ}(b)y, whereas the tuple (z, y) in the orbit space method carries around a PFB total space
element z so that the group elements φ(z) and φ′(z) may be applied to y to give the fibre space elements
φ(z)y and φ′(z)y = φ′(z)φ(z)^{-1}(φ(z)y) for the charts φ and φ′ respectively. This works because
φ′(z)φ(z)^{-1} = g_{φ′,φ}(b), in agreement with the relation φ′(z) = g_{φ′,φ}(b)φ(z). (Neat, huh?)

The orbit space method does not work if the given fibre bundle is not a PFB because the product φ(z)y is
only defined if φ(z) is an element of a transformation group acting on y ∈ F , and this implies that the fibre
bundle is a PFB.
The orbit space method could be generalized to given fibre bundles which are not PFBs if their fibre space
contains enough information to make the correct transition maps. In other words, it must be possible to
extract the group element g_{φ′,φ}(b) somehow. In fact, this can be done in the case of coordinate n-frames
for tangent bundles of n-dimensional manifolds, because there is a one-to-one map between transition maps
and transformations of n-frames. This is used in the popular definition of principal tangent bundles. It can
also be done for the (n + 1)-frame bundle, but not for the (n − 1)-frame bundle. However, it seems that the
effort required to construct such intellectual curiosities does not yield any worthwhile benefits. (See Remark
34.9.4 for further comment on this.)
23.12.2 Remark: Definition 23.12.3 constructs a topological (G, F ) fibre bundle (E, π, B) from a given
topological principal G-bundle (P, q, B), where (G, F ) is an effective topological left transformation group.
The total space E of the ordinary fibre bundle (E, π, B) is constructed as a set of equivalence classes of pairs
in P × F, which is locally homeomorphic to B × F. The (G, F) fibre atlas A^F_E for (E, π, B) is constructed
from the atlas A^G_P for (P, q, B) by applying the group operation of G. The topology T_E for E is defined so
that the maps π × φ̃ will be homeomorphisms for φ̃ ∈ A^F_E.
23.12.3 Definition: The associated topological (G, F) fibre bundle (orbit space method) with structure
group (G, F) −< (G, T_G, F, T_F, σ, µ^F_G) for a given topological G-bundle (P, q, B) −< (P, T_P, q, B, T_B, A^G_P) is the
topological (G, F) fibre bundle (E, π, B) −< (E, T_E, π, B, T_B, A^F_E) defined by:
(i) E = {[(z, y)]; z ∈ P, y ∈ F}, where [(z, y)] = {(z′, y′) ∈ P × F; q(z′) = q(z), ∃φ ∈ A^G_P, φ(z′)y′ = φ(z)y};
(ii) π : E → B is defined by π : [(z, y)] ↦ q(z);
(iii) A^F_E = {φ̃; φ ∈ A^G_P}, where φ̃ : π^{-1}(U_φ) → F is defined for φ ∈ A^G_P by φ̃ : [(z, y)] ↦ φ(z)y;
(iv) T_E = {⋃_{φ∈A^G_P} (π × φ̃)^{-1}(Ω_φ); Ω : A^G_P → IP(B × F) and ∀φ ∈ A^G_P, Ω_φ ∈ Top(U_φ × F)}.
[ Create a diagram for Definition 23.12.3. ]

23.12.4 Notation: (P × F )/G denotes the orbit-space version of the associated topological fibre bundle
in Definition 23.12.3. Thus one may write E = (P × F )/G. (See Notation n53 for another popular notation.)
23.12.5 Remark: There is no axiom of choice problem with Definition 23.12.3 (iv) because it is clear that
the maps Ω : A^G_P → IP(B × F) defined by Ω : φ ↦ ∅ and Ω : φ ↦ U_φ × F are both well-defined. So T_E is
non-empty if P is non-empty. (As mentioned in Remark 23.2.3, fibre bundles may be empty.)
23.12.6 Theorem: The associated (G, F ) fibre bundle of a principal G-bundle satisfies the conditions for
the definition of a topological (G, F ) fibre bundle.
Proof: From the one-to-one correspondence between [(z, y)] ∈ E_b = π^{-1}({b}) and φ(z)y ∈ F, it follows
that the maps β_{b,φ̃} = φ̃|_{E_b} are bijections, where φ̃ is defined by condition (iii). Therefore the maps
π × φ̃ : π^{-1}(U_φ) → U_φ × F are bijections.
[ Show that φ1 (z) = gφ2 (z) ⇔ φ̃1 (z̃) = g φ̃2 (z̃) if π(z) = π(z̃). ]
[ Must show that TE is a topology etc. Use a general theorem about weak topology for partial functions? ]
...
[ Show that TE is the only possible topology for E in Definition 23.12.3. ]
[ Prove here that Definition 23.12.3 satisfies the conditions of Definition 23.10.5 for an associated fibre bundle. ]
23.12.7 Remark: The relation φ(z′)y′ = φ(z)y in Definition 23.12.3 (i) is independent of the choice of φ ∈ A^G_P.
(For proof, see Exercise 46.8.3.) If P is non-empty, then A^G_P is non-empty; so the set of (z′, y′) satisfying
“∃φ ∈ A^G_P, φ(z′)y′ = φ(z)y” is the same as the set of (z′, y′) satisfying “∀φ ∈ A^G_P, φ(z′)y′ = φ(z)y”. Thus two
pairs (z, y) and (z′, y′) are considered equivalent in Definition 23.12.3 (i) if they have the same base point
in B and the action of z on y through a chart φ is the same as the action of z′ on y′ through the same
chart φ. Hence the elements [(z, y)] of E are the same as orbits of the action of P on F except that the
action is indirect via one or more charts φ ∈ A^G_P. (For comparison, see Definition 9.4.28 for the orbit space
of a general left transformation group.)

23.12.8 Remark: Definition 23.12.3 (i) may be expressed explicitly in terms of orbit spaces by noting that
    q(z′) = q(z) and φ(z′)y′ = φ(z)y
        ⇔ q(z′) = q(z) and y′ = φ(z′)^{-1}φ(z)y
        ⇔ q(z′) = q(z) and ∃g ∈ G, (y′ = gy and g = φ(z′)^{-1}φ(z))
        ⇔ q(z′) = q(z) and ∃g ∈ G, (y′ = gy and φ(z′) = φ(zg^{-1}))
        ⇔ ∃g ∈ G, (y′ = gy and z′ = zg^{-1})
        ⇔ ∃g ∈ G, (z′, y′) = (zg^{-1}, gy)
        ⇔ ∃g ∈ G, (z′, y′) = (zg, g^{-1}y).
So [(z, y)] is the same thing as {(zg, g^{-1}y); g ∈ G}, which happens to be the orbit of (z, y) ∈ P × F under
the right action ((z, y), g) ↦ (zg, g^{-1}y) of G on P × F.
Since φ̃((zg, g^{-1}y)) = φ(zg)g^{-1}y = φ(z)gg^{-1}y = φ(z)y = φ̃((z, y)) for any g ∈ G, a fibre chart φ̃ maps all
representatives of an orbit [(z, y)] to the same element of F.
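The equality of the chart-defined classes of Definition 23.12.3 (i) with the orbits {(zg, g^{-1}y); g ∈ G} can be confirmed by brute force on a finite model. The following sketch rests on assumptions chosen purely for illustration: G is the symmetric group S3 acting on F = {0, 1, 2}, the principal bundle is the trivial bundle P = B × G over a two-point base space, and the second chart phi_twisted is a hypothetical chart obtained from the first by left multiplication with arbitrary fixed group elements.

```python
from itertools import permutations, product

G = list(permutations(range(3)))     # S3; p[i] is the image of i
F = range(3)
B = range(2)

def mul(p, q):
    """Group product p*q, meaning apply q first and then p."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    result = [0] * 3
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)

def act(p, y):
    """Left action of G on F."""
    return p[y]

P = list(product(B, G))              # total space of the trivial bundle

def q(z):
    """Projection of the principal bundle."""
    return z[0]

def right(z, h):
    """Right action of G on P."""
    return (z[0], mul(z[1], h))

def phi(z):
    """Fibre chart with values in G, satisfying phi(zh) = phi(z)h."""
    return z[1]

twist = {0: (1, 0, 2), 1: (2, 0, 1)}   # arbitrary fixed group elements

def phi_twisted(z):
    """A second (hypothetical) chart, also compatible with the right action."""
    return mul(twist[z[0]], z[1])

def chart_class(chart, z, y):
    """The class [(z, y)] as written in Definition 23.12.3 (i)."""
    return frozenset(
        (z2, y2) for z2 in P for y2 in F
        if q(z2) == q(z) and act(chart(z2), y2) == act(chart(z), y)
    )

def orbit(z, y):
    """The orbit of (z, y) under the right action (zg, g^{-1}y)."""
    return frozenset((right(z, h), act(inv(h), y)) for h in G)

pairs = [(z, y) for z in P for y in F]
same_as_orbit = all(chart_class(phi, z, y) == orbit(z, y) for z, y in pairs)
chart_independent = all(
    chart_class(phi, z, y) == chart_class(phi_twisted, z, y) for z, y in pairs
)
print(len({orbit(z, y) for z, y in pairs}), same_as_orbit, chart_independent)
```

In this model the 36 pairs in P × F fall into six orbits, one for each element of B × F, and the classes do not depend on which of the two charts is used, as Remark 23.12.7 asserts in general.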
The set E = (P ×F )/G is defined as a “right inside skew product” of transformation groups in Definition 9.6.5.
The construction of (P × F )/G is similar to the construction of a tensor product of two spaces. It is
particularly similar to the tensor product of two modules over a ring. (See EDM2 [35], 277.J.) For any ring
R and a right R-module X and left R-module Y, the tensor product X ×_R Y is defined so as to satisfy
(xa) ⊗ y = x ⊗ (ay) for a ∈ R, x ∈ X and y ∈ Y, and some other conditions. The projection map
f : P × F → E is “G-balanced” in the sense that [(zg, y)] = [(z, gy)] for all z ∈ P, g ∈ G and y ∈ F.
[ Tensor products of modules should be defined more tidily and moved to Section 9.9. ]
As mentioned in Remark 23.9.7, the information in the right action map µ^P_G is redundant because this
information may be recovered from the fibre charts. Consequently, an associated OFB constructed in
Definition 23.12.3 from a PFB may be defined without reference to µ^P_G. To be specific, the expression µ^P_G(z, g)
in condition (i) may be replaced with β^{-1}_{π(z),φ}(φ(z).g) for φ ∈ A^G_P. This shows that the associated OFB is
constructed essentially in terms of only the fibre charts of the PFB.
23.12.9 Notation: The total space E = (P × F)/G may be denoted as P ×_G F, and the fibre bundle
(E, π, B) is often denoted as η ×_G F, where η = (P, q, B). That is, (E, π, B) = (P, q, B) ×_G F.
23.12.10 Remark: If the identification space method of Definition 23.11.2 is applied to a (G, F) fibre
bundle (E, π, B, A^F_E) to construct a G-bundle (P, q, B, A^G_P), and then the orbit space method of Definition
23.12.3 is applied to (P, q, B) to construct a (G, F̃) fibre bundle (Ẽ, π̃, B, A^F̃_Ẽ), the result is a total space Ẽ
which consists of equivalence classes which look like [([(b, g, φ)], y)], where b ∈ B, g ∈ G, φ ∈ A^F_E and y ∈ F̃.
This suggests that one should combine the four components of the tuples into a single tuple to give an
equivalence class such as [(b, g, y, φ)]. This would be defined by [(b, g, y, φ)] = {(b, g′, y′, φ′); g_{φφ′}(b)(g′y′) =
gy}. (Of course, the full definition of this set requires also the constraints g′ ∈ G, y′ ∈ F̃ and φ′ ∈ A^F_{E,b}.
These boring constraints are suppressed here. The group element g_{φφ′}(b) is the usual chart transition rule.)
The group element g in [(b, g, y, φ)] may be chosen to be equal to the identity e ∈ G and can then be removed.
This yields [(b, y, φ)] as in Definition 23.11.2.


Chapter 24
Parallelism on topological fibre bundles
24.1 Parallelism path classes
24.2 Pathwise parallelism on topological fibre bundles
24.3 Associated parallelism
24.4 Other topics
Parallelism on topological fibre bundles is a natural generalization of the connections on differentiable fibre
bundles in Chapter 35.
24.1. Parallelism path classes

Parallelism in flat spaces is absolute parallelism, which means a global equivalence relation between elements
of fibre sets at each base point. By contrast, pathwise parallelism means parallelism which is absolute only
within each path. For all the points along each path, there is an equivalence relation between elements of
fibre sets on the points of the path. (Self-intersections of paths are dealt with by treating multiple crossings
of a single base point as different points.)
24.1.1 Remark: One may ask why parallelism is defined for paths and not, say, for parametric families
of paths or some other structure. If there was absolute parallelism within general 1-parameter families of
paths (essentially 2-dimensional submanifolds), then the pathwise curvature for all paths would be zero, and
this would (apparently) imply that the parallelism is absolute. For 1-dimensional submanifolds, disconnected
paths (such as ordered traversals as in Definition 7.1.17) would yield “parallelism at a distance”, which would
also seem to imply absolute parallelism. So the only way to develop a non-trivial parallelism is apparently
with connected paths, and connectedness implies that the curve domains are intervals.
One may also ask whether pathwise parallelism has any applications, and the answer is that all Riemannian
manifolds yield a well-defined parallelism via the Levi-Civita connection, and many areas of mechanics and
field theory also yield non-trivial parallelisms. So it turns out that pathwise parallelism is a very applicable
generalization of absolute parallelism, and any further generalization beyond paths is probably not very
useful.
[ In Remark 24.1.1, try to prove that sheetwise parallelism always implies zero curvature and absolute paral-
lelism. ]
24.1.2 Remark: The fibre atlas on a topological fibre bundle uniquely determines the topology, which
in turn determines which cross-sections along paths are continuous. A definition of parallelism, on the
other hand, determines which continuous cross-sections along paths correspond to parallel translation. Since
parallel translation must satisfy a group invariance property, the structure group plays a role in defining
parallelism but not in defining the set of all continuous cross-sections.
As discussed in Section 16.1, the terminology adopted for curves and paths in this book is that “curves”
are maps γ : I → M for intervals I and topological spaces M , whereas “paths” are equivalence classes of
curves. Two curves are considered equivalent if they are related to each other by an increasing parameter
homeomorphism. So a path is a set of curves which all start at the same point and take the same route to
the end point, passing every point in the same order. Parallel transport depends only on the path, not on
the particular choice of curve which represents the path.

In the case of differentiable fibre bundles, parallelism is defined on rectifiable paths because a connection
can only be integrated along a path if the tangent to the path exists almost everywhere. (See Sections 17.5
and 26.12 for rectifiable paths.) It does not seem to be possible to generalize parallelism in a satisfactory
way to all continuous paths in a topological space or topological manifold. (The set of all continuous paths
in M is denoted P0 (M ) in Notation 16.4.4.) This is not surprising because the transitivity property of
parallelism along paths gives parallelism the character of an integral, and integrals are not usually defined
for completely general functions. In this case, the integral is the kind of direction-dependent path integral
that appears in the Stokes theorem, which requires a locally rectifiable curve. Therefore it seems natural
and unavoidable that the most general definition of pathwise parallelism on a topological fibre bundle will
require the specification of a class P of paths on which parallelism may be defined. Since the path class
P will typically be defined in terms of a differentiable structure, it will not generally be definable in terms
of the topological structure alone. This is a kind of ex machina path class which requires some external
structure for its definition, and which therefore must be specified as an ad hoc set which has certain closure
and continuity properties. The paths in the class P could be thought of as being “wormholes” through
which parallelism is carried between the fibre sets at different points in the base space. This is illustrated in
Figure 24.1.1.
[ Diagram omitted: three paths, labelled wormhole 1, wormhole 2 and wormhole 3, leading from a point marked
“Start here” to a point marked “End here”. ]
Figure 24.1.1 Parallelism “wormholes” (paths carrying fibre set orientation information)
24.1.3 Remark: It is not guaranteed that every kind of transformation of fibre sets along paths in a
fibre bundle will satisfy the criteria for a definition of parallelism. In a physical system, one may imagine
that one sends test particles out into the state space of the system to measure the transformations that
occur in the fibre sets along the paths of the particles. Only if these transformations satisfy a definition
such as Definition 24.2.2 can the transformations be thought of as a kind of parallelism. In other words, a
parallelism is something that one must discover. It turns out that many mathematical models, in particular
all Riemannian metric spaces, have a natural and useful parallelism. If the parallelism can be differentiated,
then a connection is defined. If the connection has a well-defined exterior derivative, then the curvature may
be defined, and curvature is what makes differential geometry different to flat-space geometry.
24.1.4 Remark: Parallelism along paths is essential in physics for the support of polarization of light
and conservation of momentum. Since Mach’s principle (a very reasonable principle) says that momentum
must be related to the rest of the matter of the universe (and inertial frames just “coincidentally” are those
which have constant velocity relative to the “fixed stars”), one might ask if there is some causal relation
between the “fixed stars” and the parallelism or affine connection on physical space. Even though there is
no luminiferous aether in the 19th century sense, there still seems to be some sort of structure in the vacuum
which defines parallelism so that momentum and polarization are meaningful.
Space seems to require a connection in addition to mere differentiable structure, and one might reasonably
ask what this structure is composed of. It seems to obey equations which are related to gravity, but it
is not clear what the “parallelism wormholes” are. It seems almost as if space has tramlines laid down for
matter and energy to flow along, and the tramlines have some sort of “roll control” which maintains parallel
transport.

One could go further and ask the more fundamental question of why physical space (or space-time) seems
to have a differentiable structure. How are nearby points “glued” together to make a smoothish manifold?
How do nearby points “know” that they are near each other? How does space know that it must let light pass
through it at the speed of light and not some other speed? Why doesn’t light travel in a randomly crooked
path or go round in circles? The anthropic principle does not tell us how these things happen. It
only tells us that we would not be observing the world if it were otherwise.
24.1.5 Remark: Definitions 24.1.6 and 24.2.2 are, presumably, non-standard. It is reasonable to expect
that a parallelism path class will be closed under concatenation, restriction and reversal. Closure under
continuous reparametrization is taken care of by the definition of a path as an equivalence class of curves.
A class P of paths in a base space B for defining parallelism on a fibre bundle (E, π, B) must be a partition of
some set C of curves in B such that the curves in any path are path-equivalent according to Definition 16.3.7.
In other words, every path in P must be a non-empty set of path-equivalent curves in C and the paths must
be pairwise disjoint. In particular, C = ⋃P.
24.1.6 Definition: A parallelism path class for a topological fibre bundle (E, π, B) is a set P of paths in
B such that
(i) All constant paths in B are in P; (idempotence)

(ii) For all paths Q = [γ]0 ∈ P, the reverse −Q = [−γ]0 is in P; (symmetry)
(iii) For all Q1, Q2 ∈ P with T(Q1) = S(Q2), the concatenation of Q1 with Q2 is in P. (transitivity)
A parallelism curve class is the set of curves C = ⋃P in a parallelism path class P.
24.1.7 Remark: Examples of suitable parallelism path classes are the set of rectifiable paths in a Lipschitz
manifold, the set of piecewise C^k paths in a differentiable manifold for k ≥ 1, and the set of piecewise linear
paths in an affine space.
24.1.8 Remark: Just as a connection is defined on the differentiable structure of a differentiable fibre bundle
(i.e. the differentiable atlas of the fibre bundle) rather than on the differentiable manifold itself, so a
“parallelism” on a topological fibre bundle is defined on a “parallelism path class” rather than on the underlying
topological space.
24.1.9 Definition: A (topological) fibre set parallelism for base points p, q ∈ B in a topological (G, F)
fibre bundle (E, π, B, A^F_E) is a structure-preserving fibre set map α : E_p ≈ E_q of the form
α = β^{-1}_{q,φ2} ∘ L_g ∘ β_{p,φ1} for some g ∈ G, φ1 ∈ A^F_{E,p} and φ2 ∈ A^F_{E,q}. (That is, α ∈ Iso_G(E_p, E_q) in terms
of Notation 23.8.4.)
The (topological) fibre set parallelism space for a topological (G, F) fibre bundle (E, π, B, A^F_E) is the set
⋃_{p,q∈B} Iso_G(E_p, E_q) of all topological fibre set parallelisms for the fibre bundle (E, π, B, A^F_E).
24.1.10 Remark: Definition 24.2.2 is necessarily a little convoluted. In plain language, it means that a
pathwise parallelism is a set of structure-preserving maps between the fibre sets of pairs of points of curves
in the specified curve class C . These parallelism maps are equivalent for curves which are path-equivalent,
and they obey some basic rules of transitivity and symmetry.
Although general parallelism is not an absolute (i.e. path-independent) map between fibre sets, the restriction
to paths is absolute. Within a path, every point in every fibre set has a unique association with a
point in each other fibre set. (Recall that intersection points of paths are regarded as different points.)
Therefore parallelism along a path may be formalized as an equivalence relation rather than as the maps of
Definition 24.2.2. The functional representation is probably better for such tasks as differentiation though.
24.2. Pathwise parallelism on topological fibre bundles
24.2.1 Remark: See Notation 23.8.4 for the isomorphism sets IsoG (Ep , Eq ). Figure 24.2.1 illustrates some
of the structures involved in the pathwise parallelism in Definition 24.2.2.

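[ Diagram omitted: a curve γ : I_γ → B with I_γ ⊆ IR and s, t ∈ I_γ, the points γ(s) = p and q = γ(t) in
Range(γ) ⊆ B, and the map Θ^γ_{s,t} ∈ Iso_G(E_p, E_q) from the fibre set E_p to the fibre set E_q. ]
Figure 24.2.1 Pathwise parallelism structure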
24.2.2 Definition: A (topological) (pathwise) parallelism on a parallelism path class P for a topological
(G, F) fibre bundle (E, π, B, A^F_E) is a map
    Θ : C → ⋃_{γ∈C} (I_γ × I_γ → ⋃_{p,q∈B} Iso_G(E_p, E_q)),
where C = ⋃P and I_γ = Dom(γ) for γ ∈ C, which satisfies the following:
(i) ∀γ ∈ C, Dom(Θ^γ) = I_γ × I_γ;
(ii) ∀γ ∈ C, ∀s, t ∈ I_γ, Θ^γ_{s,t} ∈ Iso_G(E_{γ(s)}, E_{γ(t)});
(iii) ∀Q ∈ P, ∀γ ∈ Q, ∀I ⊆ IR, ∀β ∈ C(I, I_γ), ∀s, t ∈ I, (γ ∘ β ∈ Q ⇒ Θ^{γ∘β}_{s,t} = Θ^γ_{β(s),β(t)});
(parametrization independence)
(iv) ∀γ ∈ C, ∀s, t, u ∈ I_γ, Θ^γ_{t,u} ∘ Θ^γ_{s,t} = Θ^γ_{s,u}; (transitivity)
(v) ∀γ ∈ C, ∀s, t ∈ I_γ, Θ^{−γ}_{−s,−t} = Θ^γ_{s,t}; (reversibility)
(vi) ∀γ1, γ2 ∈ C, (γ1 ⊆ γ2 ⇒ Θ^{γ1} ⊆ Θ^{γ2}); (monotonicity)
(vii) ∀γ ∈ C, ∀φ1, φ2 ∈ A^F_E, g^γ_{φ1,φ2} is continuous, where g^γ_{φ1,φ2} : I1 × I2 → G with I_k = I_γ ∩ γ^{-1}(U_{φk}) for
k = 1, 2 is defined by φ2 ∘ Θ^γ_{s,t} = L_{g^γ_{φ1,φ2}(s,t)} ∘ φ1|_{E_{γ(s)}} for all φ1, φ2 ∈ A^F_E and (s, t) ∈ I1 × I2. (continuity)
The notation Θ^γ_{s,t} means Θ(γ)(s, t), and Θ^γ means Θ(γ).
[ Would it be better to call the monotonicity condition (vi) in Definition 24.2.2 a “restriction independence”
condition or something similar? ]
[ Try to show that for a PFB, R_{g^γ_{φ1,φ2}(s,t)^{-1}} is a parallelism if L_{g^γ_{φ1,φ2}(s,t)} is. See manuscript notes for proof. ]
[ Write out what the conditions of Definition 24.2.2 mean in terms of “coordinates” g_{γ(s),γ(t)} with Θ^γ_{s,t} =
L^{γ(s),γ(t)}_{g,φ1,φ2}? ]
24.2.3 Remark: Definition 24.2.2 (iii) has the consequence that the parallelism map is the identity map
along constant stretches of curves. That is, if β : I → I_γ is constant on [a, b], then Θ^{γ∘β}_{s,t} = id_{E_{γ(β(s))}} for
all s, t ∈ [a, b].
If (iii) is applied twice in the case of a curve equivalence γ1 ∘ β1 = γ2 ∘ β2 = γ3 with γ1, γ2, γ3 ∈ Q, the result
is Θ^{γ1}_{β1(s),β1(t)} = Θ^{γ2}_{β2(s),β2(t)} = Θ^{γ3}_{s,t}. This means that the definition of parallelism is independent of the curve
used to represent a path. So parallelism depends only on the path, not on the parametrization.
24.2.4 Remark: The transitivity rule Definition 24.2.2 (iv) with u = t implies an idempotence rule, namely
that Θ^γ_{t,t} = id_{E_{γ(t)}} for any γ ∈ C and t ∈ I_γ. Similarly, (iv) implies the rule Θ^γ_{t,s} = (Θ^γ_{s,t})^{-1}. These look like
semigroup properties, but the maps Θ^γ_{s,t} are only isomorphisms, not automorphisms.
[ Since the maps Θ^γ are not semigroups, what are they? Is there a name for this sort of thing? ]
If the fibre set isomorphisms Θ^γ_{s,t} are known for a fixed s ∈ I_γ, then the isomorphisms for all other pairs
(s, t) may be calculated.
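The steps behind Remark 24.2.4 can be written out as a short derivation. This is only a sketch in LaTeX notation (the align* environment assumes the amsmath package, which the book itself does not use); each line uses nothing beyond Definition 24.2.2 (iv) and the fact that every Θ^γ_{s,t} is a bijection.

```latex
\begin{align*}
\Theta^\gamma_{t,t} \circ \Theta^\gamma_{s,t} &= \Theta^\gamma_{s,t}
  && \text{(iv) with } u = t, \\
\Theta^\gamma_{t,t} &= \mathrm{id}_{E_{\gamma(t)}}
  && \text{composing on the right with } (\Theta^\gamma_{s,t})^{-1}, \\
\Theta^\gamma_{t,s} \circ \Theta^\gamma_{s,t} &= \Theta^\gamma_{s,s}
   = \mathrm{id}_{E_{\gamma(s)}}
  && \text{(iv) with } u = s, \\
\Theta^\gamma_{t,u} &= \Theta^\gamma_{s,u} \circ (\Theta^\gamma_{s,t})^{-1}
  && \text{(iv) rearranged.}
\end{align*}
```

The last line is the sense in which the maps Θ^γ_{s,t} for one fixed s determine the maps for every other pair of parameter values.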

24.2.5 Remark: The reversibility rule, Definition 24.2.2 (v), together with the transitivity rule (iv), implies
that the parallelism on a path is absolute. That is, it doesn’t matter how a curve gets from one point of
the path to another, it will always give the same parallelism from one point on the path to another. (When
there are self-intersections, the different traversals through the same point are regarded as different points of
the path.) In particular, if a curve starts at a point on a path and comes back to the same point, the result
is the identity map. So the parallelism is “flat” because there are no closed paths for which the parallelism
is a non-identity fibre set map.
This absolute parallelism implies that the function Θ^γ may be replaced with a simple equivalence relation
on the fibre sets over the base points of the curve γ.
24.2.6 Example: Condition (v) for Definition 24.2.2 does not follow from the other conditions. As a
counterexample, consider a trivial (G, F) fibre bundle (E, π, B, A^F_E) with G = O(2), F = IR^2, B = IR,
E = B × F, π : (x, z) ↦ x, and A^F_E = {φ} with φ : (x, z) ↦ z. Define C to be the set of rectifiable
curves γ : I → B. Define a map Θ for this fibre bundle by Θ^γ_{s,t} = R(α^γ_{s,t}), where R(α^γ_{s,t}) denotes rotation
of the fibre sets (through the chart) by angle α^γ_{s,t} = ∫_s^t |γ′(u)| du for γ ∈ C. The interested student may
verify that all of Definition 24.2.2 except condition (v) is satisfied by Θ. (It’s about time the other students
did some work too. The interested student can’t be expected to do everything!) If the rotation angles are
replaced with α^γ_{s,t} = ∫_s^t γ′(u) du = γ(t) − γ(s), then Θ satisfies all of the conditions of Definition 24.2.2.
These examples are illustrated in Figure 24.2.2 for γ : u ↦ sin u, s = 0 and t = π.
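[ Diagram omitted: two panels over the base space x ∈ IR. The first panel uses the rotation angle
α^γ_{s,t} = ∫_s^t |γ′(u)| du. The second panel uses α^γ_{s,t} = ∫_s^t γ′(u) du = γ(t) − γ(s). ]
Figure 24.2.2 Definition of parallelism without and with reversibility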
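The two choices of rotation angle in Example 24.2.6 can be compared numerically for γ : u ↦ sin u. The following is a rough sketch under stated assumptions: the reversed curve −γ is taken to be u ↦ γ(−u), the integrals are approximated by a midpoint rule, the third parameter value u = 4 is an arbitrary choice, and rotation angles are compared as real numbers rather than as rotations.

```python
import math

def integrate(f, a, b, steps=2000):
    """Midpoint rule for the oriented integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

def d_gamma(u):
    """Derivative of the curve γ : u ↦ sin u."""
    return math.cos(u)

def d_reversed(u):
    """Derivative of the reversed curve u ↦ γ(−u) (assumed meaning of −γ)."""
    return -math.cos(-u)

def angle_unsigned(derivative, s, t):
    """Rotation angle defined with |γ′|, the candidate that fails (v)."""
    return integrate(lambda u: abs(derivative(u)), s, t)

def angle_signed(derivative, s, t):
    """Rotation angle defined with γ′, equal to γ(t) − γ(s)."""
    return integrate(derivative, s, t)

s, t, u = 0.0, math.pi, 4.0
results = {}
for name, angle in [("unsigned", angle_unsigned), ("signed", angle_signed)]:
    forward = angle(d_gamma, s, t)
    # Transitivity (iv): the angles from s to t and from t to u add up.
    transitive = math.isclose(
        forward + angle(d_gamma, t, u), angle(d_gamma, s, u), abs_tol=1e-4)
    # Reversibility (v): the reversed curve on (−s, −t) against the original.
    reversible = math.isclose(angle(d_reversed, -s, -t), forward, abs_tol=1e-4)
    results[name] = (forward, transitive, reversible)
    print(name, transitive, reversible)
```

With the unsigned integrand the angle from 0 to π is approximately 2 for γ and approximately −2 for the reversed curve, so condition (v) fails, whereas the signed integrand gives approximately γ(π) − γ(0) = 0 in both directions. Only two of the conditions are tested here; the remaining conditions of Definition 24.2.2 are left to the interested student, as in the text.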
24.2.7 Remark: Definition 24.2.2 (vi) says that if γ1 is a subcurve of γ2, then Θ^{γ1} is a restriction of Θ^{γ2}.
Therefore Θ^{γ1} = Θ^{γ2}|_{I_{γ1}×I_{γ1}}.
If two curves γ1 and γ2 have a common portion γ0 so that γ0 ⊆ γ1 and γ0 ⊆ γ2 , then condition (vi) implies
that the parallelism of γ1 and γ2 will be the same along the common portion γ0 . This means that any
two curves passing through the same points will experience the same parallelism transformation, no matter
how the curves differ elsewhere. This may be thought of as a “memoryless” property. The transformation
experienced by a test particle moving along any portion of a curve is independent of anything that happens
before (or after) it passes through that portion.
This can be looked at in reverse. The opposite of a function restriction is a function extension. If a curve
$\gamma_3$ is the concatenation of curves $\gamma_1$ and $\gamma_2$, then $\gamma_1 \subseteq \gamma_3$ and $\gamma_2 \subseteq \gamma_3$. So both curves are subcurves of $\gamma_3$.
(It follows by Definition 24.1.6 that $\gamma_3 \in \mathcal{C}$.) Therefore the parallelism map $\Theta^{\gamma_3}_{s,t}$ for $s \in I_{\gamma_1}$ and $t \in I_{\gamma_2}$ is
obtained by applying $\Theta^{\gamma_1}_{s,b_1}$ first and then $\Theta^{\gamma_2}_{a_2,t}$, where $I_{\gamma_k} = [a_k, b_k]$ for k = 1, 2.
Thus $\Theta^{\gamma_3}_{s,t} = \Theta^{\gamma_2}_{a_2,t} \circ \Theta^{\gamma_1}_{s,b_1}$.
So Definition 24.2.2 (vi) may be thought of as a concatenation rule.
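The concatenation rule can be written out in fibre charts. The following is a sketch under stated assumptions: the convention $(f \circ g)(z) = f(g(z))$, that $\Theta^\gamma_{s,t}$ maps $E_{\gamma(s)}$ to $E_{\gamma(t)}$, that $L_g \circ L_h = L_{gh}$ for the left action of G on F, and that a single chart φ (a choice made here purely for illustration) is used at the junction point $p = \gamma_1(b_1) = \gamma_2(a_2)$. The symbols $g_1$, $g_2$ and p are introduced only for this sketch.

```latex
\begin{align*}
\Theta^{\gamma_1}_{s,b_1} &= \beta_{p,\phi}^{-1} \circ L_{g_1} \circ \beta_{\gamma_1(s),\phi_1},
\qquad
\Theta^{\gamma_2}_{a_2,t} = \beta_{\gamma_2(t),\phi_2}^{-1} \circ L_{g_2} \circ \beta_{p,\phi}, \\
\Theta^{\gamma_2}_{a_2,t} \circ \Theta^{\gamma_1}_{s,b_1}
  &= \beta_{\gamma_2(t),\phi_2}^{-1} \circ L_{g_2} \circ
     \bigl(\beta_{p,\phi} \circ \beta_{p,\phi}^{-1}\bigr) \circ L_{g_1} \circ \beta_{\gamma_1(s),\phi_1} \\
  &= \beta_{\gamma_2(t),\phi_2}^{-1} \circ L_{g_2 g_1} \circ \beta_{\gamma_1(s),\phi_1}.
\end{align*}
```

Under these assumptions, the group element for the concatenated portion is the product $g_2 g_1$, with the element for the later portion on the left.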
24.2.8 Remark: The group element $g^\gamma_{\phi_1,\phi_2}(s, t)$ in Definition 24.2.2 (vii) generally depends on the fibre
charts $\phi_1$ and $\phi_2$. If b = γ(s) = γ(t), one may choose a single chart $\phi = \phi_1 = \phi_2$. For such a closed curve
portion, $\Theta^\gamma_{s,t} = L^b_{g,\phi} \in \mathrm{Aut}_G(E_{\gamma(s)})$ with $g = g^\gamma_{\phi,\phi}(s, t)$. Unfortunately, by Theorem 23.8.11 (v), the group
element g depends on φ.
A simple example of this is the tangent bundle of the sphere $S^2$ with the orthogonal group G = O(2) as
the structure group. The parallel transport around a closed curve (with the standard parallelism definition)
results in a rotation of the tangent space at the common initial and terminal point through some angle, α ∈ IR say.
This angle is the same for all charts which have the same orientation, but the rotation angle is −α for a
chart with the opposite orientation.
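The sign change can be checked directly in O(2). The sketch below assumes that changing the chart at the base point b conjugates the group element, $g' = h\,g\,h^{-1}$, where $h \in O(2)$ is the value of the chart transition function at b. The symbols $R(\alpha)$ for a rotation matrix and S for a reflection are introduced here only for illustration.

```latex
\[
R(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix},
\qquad
S = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
\qquad
S\,R(\alpha)\,S^{-1}
= \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}
= R(-\alpha).
\]
\[
R(\theta)\,R(\alpha)\,R(\theta)^{-1} = R(\alpha) \quad \text{for all } \theta \in \mathbb{R}.
\]
```

An orientation-preserving change of chart therefore leaves the angle unchanged because rotations commute, while an orientation-reversing change replaces α by −α.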

[ Construct equivalence relations for fibre sets over a path from Definition 24.2.2. One can similarly construct
a map $(t, y) \mapsto \Theta^\gamma_{s,t}(\beta_{\gamma(s),\phi}^{-1}(y))$ for $(t, y) \in I_\gamma \times F$ which yields a single “lift” of γ for each y ∈ F. Each lift
is a curve in E. ]
[ Since (G, F ) is an effective group, there is a one-to-one correspondence between the group elements and
the parallelism maps between fibre sets at different points of a (G, F ) fibre bundle. The group element is
uniquely determined by the bijection through the charts. ]
24.3. Associated parallelism

24.3.1 Remark: Definition 24.3.2 shows how parallelism can be “ported” between a topological (G, F )
fibre bundle and an associated topological (G, F̃ ) fibre bundle. This concept is illustrated in Figure 24.3.1.
Figure 24.3.1 Porting parallelism between associated fibre bundles (panels: (G, F) fibre bundle; (G, F̃) fibre bundle)
This must be the real reason for defining associated fibre bundles. The idea is to achieve economy of
specifications of parallelism by specifying it just once for one fibre bundle and then copying it to all associated
fibre bundles. The prime example of this is where the parallelism on the tangent bundle of a differentiable
manifold is re-used for all of the different kinds of tensor bundles on that manifold. Most of this chapter is
intended as preparation for Definition 24.3.2.
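As an illustration of this re-use, suppose (as an assumption for this sketch) that G = GL(n, IR) acts on F = IR^n by matrix multiplication, and that in a pair of associated charts the tangent-bundle parallelism along some curve portion is represented by the matrix $g = (g^i{}_j)$. The standard induced actions of G on tensor component spaces then give the corresponding transport of components. The index conventions below are the usual ones and are not taken from the text.

```latex
\begin{align*}
\text{vectors:} \quad & v^i \mapsto g^i{}_j\, v^j, \\
\text{covectors:} \quad & w_i \mapsto (g^{-1})^j{}_i\, w_j, \\
\text{type $(1,1)$ tensors:} \quad & T^i{}_j \mapsto g^i{}_k\, (g^{-1})^l{}_j\, T^k{}_l .
\end{align*}
```

In each case the same group element g is used; only the fibre space and the action of G on it differ.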
24.3.2 Definition: The associated (topological) (pathwise) parallelism from a topological (G, F) fibre bundle
$\xi = (E, \pi, B, A^F_E)$ to an associated topological (G, F̃) fibre bundle $\tilde\xi = (\tilde E, \tilde\pi, B, A^{\tilde F}_{\tilde E})$ for a given topological
pathwise parallelism Θ on a parallelism curve class $\mathcal{C}$ is the topological pathwise parallelism $\tilde\Theta$ on $\mathcal{C}$ which
is defined by
$\forall \gamma \in \mathcal{C},\ \forall s, t \in I_\gamma,\ \forall \phi_1 \in A^F_{E,\gamma(s)},\ \forall \phi_2 \in A^F_{E,\gamma(t)},\ \forall g \in G,$
$\Theta^\gamma_{s,t} = L^{\gamma(s),\gamma(t)}_{g,\phi_1,\phi_2} \;\Leftrightarrow\; \tilde\Theta^\gamma_{s,t} = L^{\gamma(s),\gamma(t)}_{g,\tilde\phi_1,\tilde\phi_2},$   (24.3.1)
where $\tilde\phi_1 = h(\phi_1)$ and $\tilde\phi_2 = h(\phi_2)$ are the charts for $\tilde\xi$ which are associated with the charts $\phi_1$ and $\phi_2$
respectively for ξ via a topological fibre bundle association $h : A^F_E \to A^{\tilde F}_{\tilde E}$.
[ An alternative to (24.3.1) in Definition 24.3.2 might be $L_g \equiv R^{\gamma(s),\gamma(t)}_{g^{-1},\tilde\phi_1,\tilde\phi_2}$? ]
24.3.3 Remark: Definition 24.3.2 is illustrated in Figure 24.3.2. The most important thing to focus on in
this cluttered diagram is the equality g̃ = g. This means that for matching (i.e. associated) charts, the paral-
lelism is “coordinatized” by the same group element, regarding the fibre charts as a kind of coordinatization
of the space of all permitted isomorphisms of the fibre space.
[ Maybe do another diagram like Figure 24.3.2 with $R_{\tilde g}$, $\tilde g = g^{-1}$? ]

Figure 24.3.2 Associated topological pathwise parallelism (left panel, (G, F) fibre bundle: $\Theta^\gamma_{s,t} = L_{g,\phi_1,\phi_2} = \beta_{\gamma(t),\phi_2}^{-1} \circ L_g \circ \beta_{\gamma(s),\phi_1}$; right panel, (G, F̃) fibre bundle: $\tilde\Theta^\gamma_{s,t} = L_{\tilde g,\tilde\phi_1,\tilde\phi_2} = \tilde\beta_{\gamma(t),\tilde\phi_2}^{-1} \circ L_{\tilde g} \circ \tilde\beta_{\gamma(s),\tilde\phi_1}$, with $\tilde\phi_1 = h(\phi_1)$, $\tilde\phi_2 = h(\phi_2)$ and $\tilde g = g$)
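Recall from Notation 23.8.15 that $L^{\gamma(s),\gamma(t)}_{g,\phi_1,\phi_2} = \beta_{\gamma(t),\phi_2}^{-1} \circ L_g \circ \beta_{\gamma(s),\phi_1} : E_{\gamma(s)} \approx E_{\gamma(t)}$, and
$L^{\gamma(s),\gamma(t)}_{g,\tilde\phi_1,\tilde\phi_2} = \tilde\beta_{\gamma(t),\tilde\phi_2}^{-1} \circ L_g \circ \tilde\beta_{\gamma(s),\tilde\phi_1} : \tilde E_{\gamma(s)} \approx \tilde E_{\gamma(t)}$,
where $\beta_{b,\phi}$ denotes $\phi\big|_{\pi^{-1}(\{b\})}$ and so forth, and $L_g$ denotes the left action of g on F or F̃ respectively.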
Using the notation of Definition 24.2.2 (vii), the parallelism association in expression (24.3.1) may be formulated
as the equation $g^\gamma_{\phi_1,\phi_2} = \tilde g^\gamma_{\tilde\phi_1,\tilde\phi_2}$ for all associated charts $\phi_1 \leftrightarrow \tilde\phi_1$ and $\phi_2 \leftrightarrow \tilde\phi_2$. The group elements
$g^\gamma_{\phi_1,\phi_2}(s, t)$ and $\tilde g^\gamma_{\tilde\phi_1,\tilde\phi_2}(s, t)$ may be thought of as the coordinates in G of the fibre set isomorphisms $\Theta^\gamma_{s,t}$ and
$\tilde\Theta^\gamma_{s,t}$ respectively with respect to the corresponding fibre charts.
[ Could perhaps also define associated parallelism when the charts of the fibre bundles are not exactly associated?
$\tilde\Theta^\gamma_{s,t} = L_{\tilde g,\tilde\phi_1,\tilde\phi_2} = L_{\tilde g',\tilde\phi'_1,\tilde\phi'_2}$, where $\tilde g' = g \ldots$ Use Theorem 23.8.17 (iii). ]
[ Show how to define associated parallelism for orbit-space associated fibre bundles? ]
24.3.4 Remark: There are many things which are the same in the two associated fibre bundles in Defi-
nition 24.3.2. These include the structure group G, the base space B, and the curve class C . Thus both
parallelisms Θ and Θ̃ are defined for the same values of γ, s and t, and their values are left translations
through the charts by the same group element g ∈ G for each curve γ ∈ C . The difference is that these left
translations are for different fibre spaces.
24.3.5 Remark: It doesn’t seem to be possible to define associated parallelism without the use of fibre
charts because fibre bundle associations can only be defined in terms of fibre charts. Therefore, as with all
definitions which are constructed with charts, it must be verified that the definition is chart-independent.
This is done in Theorem 24.3.6.
[ Must also show that Θ̃ is a parallelism. ]

[ See manuscript notes about Theorem 24.3.6. ]
24.3.6 Theorem: The associated parallelism in Definition 24.3.2 is chart-independent.

Proof: It must be shown that the isomorphism $\tilde\Theta^\gamma_{s,t} = L^{\gamma(s),\gamma(t)}_{g,\tilde\phi_1,\tilde\phi_2}$ with $g = \tilde g^\gamma_{\tilde\phi_1,\tilde\phi_2}(s, t)$ is independent of
the choice of fibre charts. The original parallelism Θ is automatically chart-independent because the group
elements g are defined in terms of Θ rather than the other way around. Chart-independence for Θ means that
the group elements $g^\gamma_{\phi_1,\phi_2}(s, t) \in G$ obey the rule
$g^\gamma_{\phi'_1,\phi'_2}(s, t) = \bar g_{\phi'_2,\phi_2}(\gamma(t))\, g^\gamma_{\phi_1,\phi_2}(s, t)\, \bar g_{\phi_1,\phi'_1}(\gamma(s)),$
where the functions $\bar g_{\phi,\phi'} : U_\phi \cap U_{\phi'} \to G$ denote the fibre chart transition functions for charts $\phi, \phi' \in A^F_E$. This follows
from the rules for change of fibre charts for the fibre set isomorphisms $L^{b_1,b_2}_{g,\phi_1,\phi_2}$. (See Definition 23.8.17 (iii).)
The same rule must be shown to apply for $\tilde\Theta$; that is, it must be shown that
$g^\gamma_{\tilde\phi'_1,\tilde\phi'_2}(s, t) = \bar g_{\tilde\phi'_2,\tilde\phi_2}(\gamma(t))\, g^\gamma_{\tilde\phi_1,\tilde\phi_2}(s, t)\, \bar g_{\tilde\phi_1,\tilde\phi'_1}(\gamma(s)).$

By Definition 24.3.2, $g^\gamma_{\tilde\phi'_1,\tilde\phi'_2}(s, t) = g^\gamma_{\phi'_1,\phi'_2}(s, t)$ and $g^\gamma_{\tilde\phi_1,\tilde\phi_2}(s, t) = g^\gamma_{\phi_1,\phi_2}(s, t)$. From Definition 23.10.3 (ii)
for fibre bundle associations, it follows that $\bar g_{\tilde\phi'_2,\tilde\phi_2}(\gamma(t)) = \bar g_{\phi'_2,\phi_2}(\gamma(t))$ and $\bar g_{\tilde\phi_1,\tilde\phi'_1}(\gamma(s)) = \bar g_{\phi_1,\phi'_1}(\gamma(s))$. So
everything works out very nicely. (This is not a coincidence. It’s all been rigged!)
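The chart-change rule quoted in the proof can be derived in a few lines from the definition of the isomorphisms $L^{\gamma(s),\gamma(t)}_{g,\phi_1,\phi_2}$. The sketch below assumes the convention $\beta_{b,\phi'} \circ \beta_{b,\phi}^{-1} = L_{\bar g_{\phi',\phi}(b)}$ for the transition functions. This convention is consistent with the order of factors in the rule, but it is an assumption here and is not stated in this section. The abbreviations $g = g^\gamma_{\phi_1,\phi_2}(s,t)$ and $g' = g^\gamma_{\phi'_1,\phi'_2}(s,t)$ are used only for this computation.

```latex
\begin{align*}
L_{g'} &= \beta_{\gamma(t),\phi'_2} \circ \Theta^\gamma_{s,t} \circ \beta_{\gamma(s),\phi'_1}^{-1} \\
  &= \bigl(\beta_{\gamma(t),\phi'_2} \circ \beta_{\gamma(t),\phi_2}^{-1}\bigr) \circ L_g \circ
     \bigl(\beta_{\gamma(s),\phi_1} \circ \beta_{\gamma(s),\phi'_1}^{-1}\bigr) \\
  &= L_{\bar g_{\phi'_2,\phi_2}(\gamma(t))} \circ L_g \circ L_{\bar g_{\phi_1,\phi'_1}(\gamma(s))}
   = L_{\bar g_{\phi'_2,\phi_2}(\gamma(t))\, g\, \bar g_{\phi_1,\phi'_1}(\gamma(s))}.
\end{align*}
```

Since the action of G on F is effective, equality of the left translations gives equality of the group elements, which is the stated rule.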
[ Parallelism can also be ported between OFBs and PFBs as a natural extension of Definition 24.3.2. Show
how parallelism is ported to associated fibre bundles which are constructed by the orbit space (skew product)
and identification space constructions. ]
[ Should also make some comments on parallelism on topological PFBs. ]
[ Define a kind of “covariant derivative” as the change of fibre minus the parallel change. ]
[ Show the relation to right translations on PFBs. Maybe $L^b_{g,\phi} \equiv R^b_g$ for PFBs? (Closed curves??) And
perhaps $L^{b_1,b_2}_{g,\phi_1,\phi_2} \equiv R^{b_1,b_2}_{g,?,?}$? ]
[ If (E, π, B) is a PFB, try to show that $R^{\gamma(s),\gamma(t)}_{g^{-1},\tilde\phi_1,\tilde\phi_2}$ gives an associative parallelism. ]
24.4. Other topics

This section is a holding bay for some other topological parallelism topics, such as holonomy groups and
topological generalizations of curvature and Stokes Theorems.
[ Remark 24.4.1 about “homotopy continuity” (not a standard definition) of pathwise parallelism is purely
experimental. ]
24.4.1 Remark: It is reasonable to expect that as a path is continuously varied, the parallel transport
along that path should vary continuously. This can be made meaningful in terms of homotopy.
Let (E, π, B) be a topological fibre bundle. A homotopy from a curve $\gamma_0$ to a curve $\gamma_1$ may be defined as
a continuous map $\gamma : [0, 1] \times [0, 1] \to B$ such that γ(s, 0) = γ(0, 0) and γ(s, 1) = γ(0, 1) for all s ∈ [0, 1]. Denote by
$\gamma_s : [0, 1] \to B$ the curve defined by $\gamma_s : t \mapsto \gamma(s, t)$. If each curve $\gamma_s$ is a suitable curve for a pathwise
parallelism Θ on (E, π, B), then Θ may be said to be “homotopy continuous” if the map $s \mapsto \Theta^{\gamma_s}_{0,1}(z)$ is a
continuous map from [0, 1] to $E_{\gamma(0,1)}$ for each $z \in E_{\gamma(0,0)}$.
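For a concrete instance of such a homotopy, one can take B = IR^n (an assumption made only for this example) and interpolate linearly between two continuous curves $\gamma_0, \gamma_1 : [0,1] \to \mathbb{R}^n$ which have the same endpoints.

```latex
\begin{align*}
\gamma(s,t) &= (1-s)\,\gamma_0(t) + s\,\gamma_1(t), \qquad (s,t) \in [0,1]\times[0,1], \\
\gamma(s,0) &= (1-s)\,\gamma_0(0) + s\,\gamma_1(0) = \gamma_0(0) = \gamma(0,0), \\
\gamma(s,1) &= (1-s)\,\gamma_0(1) + s\,\gamma_1(1) = \gamma_0(1) = \gamma(0,1).
\end{align*}
```

Here the curve $\gamma_s = \gamma(s, \cdot)$ runs from $\gamma_0$ at s = 0 to $\gamma_1$ at s = 1, and the endpoint conditions hold because $\gamma_0(0) = \gamma_1(0)$ and $\gamma_0(1) = \gamma_1(1)$.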
[ Does “homotopy continuity” follow automatically in the compact-open topology (or some other natural
topology) from the continuity condition in Definition 24.2.2? ]
[ Should try to put a topology on $\mathrm{Iso}_G(E) = \bigcup_{p,q \in E} \mathrm{Iso}_G(E_p, E_q)$ so that continuity can be defined for
variations of γ where the endpoints are variable. This is related somehow to the task of differentiating a
connection twice on geodesics. So maybe will need a differentiable structure on IsoG (E). This space could
be interpreted as some sort of “double fibre bundle”. ]
[ Define “pathwise curvature” as the map from the set of paths which start and finish at a common point
p ∈ M to the set of automorphisms of Ep . The value of the curvature is the map Θab : Ep → Ep , where
Θst is the “pathwise parallelism” on the fibre bundle. Then must show that curvature is additive in the
sense that if two closed paths are added (in some algebraic topology sense – see Ahlfors [94], page 137, for
formal sums of curves; see EDM2 [35], 94.D, for additivity of integrals on contours; see EDM2 [35], 80.D,
for holonomy groups) to make another closed path, then the curvature value for the path sum is the sum of
the curvature values. This kind of additivity has some relation to the Stokes Theorem. Consider the case
of complex analysis, where paths not enclosing poles have absolute parallelism whereas with a pole in the
middle, the pathwise curvature is non-zero. ]
[ It seems clear that the pathwise curvature will be independent of the choice of start/end point on a path. Will
have a curvature map κ : C → G or κ : P → G. Then g = κ(γ) will be the solution of $\Theta^\gamma_{s,s} = \beta_{b,\phi}^{-1} \circ L_g \circ \beta_{b,\phi}$.
Unfortunately, this will probably depend on the fibre chart φ. Must look into using a principal fibre bundle
and then writing Θγs,s = Rg : Eb ≈ Eb , or something like that. ]
[ Try to use the chart-invariance of fibre coordinates/components of Θ for a closed path to show that κ(γ) is a
chart-independent group element $g^\gamma_{\phi_1,\phi_1}(s, t)$. This is not true, but try to prove something like this anyway. ]
[ Definition 24.4.3 has been temporarily abandoned. It may be revived some day. The idea is that parallelism
is an equivalence relation defined on a “path bundle”, which is a sort of wormhole through the fibre bundle
which carries fibre orientation from one point to another. It doesn’t seem to be needed for defining anything
yet, but there could be some use for it. ]

24.4.2 Remark: The “curve bundle” and “path bundle” definitions are non-standard. They may be
thought of as wormholes transporting fibre orientation from point to point of a fibre bundle. Definition
24.4.3 defines curve bundles as parametrized families of points in the fibre bundle. The parametrization
distinguishes multiple points of self-intersecting curves. Because of possible self-intersections, a curve bundle
is not quite the same thing as a subbundle constructed by restricting the base space to the image of the
curve.
24.4.3 Definition: The curve bundle over a never-constant curve γ : I → B in a (G, F) topological fibre
bundle $(E, \pi, B, A^F_E)$ is the (G, F) topological fibre bundle $(E_\gamma, \pi_\gamma, B_\gamma, A^F_{E_\gamma})$ defined by
(i) $E_\gamma = \{(t, z);\ t \in I,\ z \in E_{\gamma(t)}\} \subseteq I \times E$;
(ii) $\pi_\gamma : (t, z) \mapsto (t, \pi(z)) = (t, \gamma(t))$ for t ∈ I;
(iii) $B_\gamma = \{(t, \gamma(t));\ t \in I\} = \gamma \subseteq I \times B$;
(iv) $A^F_{E_\gamma} = \{\tilde\phi;\ \phi \in A^F_E\}$, where the chart $\tilde\phi : \tilde U_\phi \to F$ is defined for each $\phi \in A^F_E$ by
$\tilde U_\phi = \{(t, z) \in E_\gamma;\ \gamma(t) \in \mathrm{Dom}(\phi)\}$ and $\tilde\phi : (t, z) \mapsto \phi(z)$;
(v) The topology $T_{E_\gamma}$ on $E_\gamma$ is the weak topology induced by the maps $\pi_\gamma \times \tilde\phi$ for $\tilde\phi \in A^F_{E_\gamma}$.
(vi) The topology $T_{B_\gamma}$ on $B_\gamma$ is the strong topology induced by the map $\pi_\gamma$.
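To see how the parametrization separates self-intersections, suppose (hypothetically) that γ passes through the same base point twice, so that $\gamma(t_1) = \gamma(t_2) = b$ with $t_1 \ne t_2$. Then for any point z of the fibre set $E_b$ which lies in the domain of a chart φ, conditions (i), (ii) and (iv) give the following. The names $t_1$, $t_2$ and b are introduced only for this illustration.

```latex
\[
(t_1, z) \ne (t_2, z) \ \text{in } E_\gamma,
\qquad
\pi_\gamma(t_1, z) = (t_1, b) \ne (t_2, b) = \pi_\gamma(t_2, z),
\qquad
\tilde\phi(t_1, z) = \phi(z) = \tilde\phi(t_2, z).
\]
```

So the single fibre set $E_b$ contributes two disjoint copies to $E_\gamma$, one over each parameter value, although the chart $\tilde\phi$ assigns the same fibre coordinate to both copies of z.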
24.4.4 Remark: It is difficult to give a simple definition of curve bundles for sometimes-constant curves
because constant stretches of curves are really a kind of self-intersection. The topology and charts for
intersections should distinguish between separate intersections over a point, but a constant stretch should
be regarded as a single point. The curve bundle topology and charts for sometimes-constant curves may be
defined with reference to equivalent never-constant curves.
The topology defined for curve bundles in Definition 24.4.3 is equivalent to that of the trivial bundle I × F .
[ Define a “parallel cross-section” along a path bundle or curve bundle. Also define parallel transport and the
lift of a path or curve through a given z ∈ Ep . Parallel cross-sections may also be expressed as equivalence
relations which contain the same information. ]


[539]
Part II
Differential geometry


[541]
Chapter 25
Topological manifolds
25.1 Background  541
25.2 Euclidean and locally Euclidean spaces  543
25.3 Topological manifolds  544
25.4 Charts and atlases  545
25.5 Topological manifold constructions, attributes and relations  548
25.6 Topological identification spaces  549
25.0.1 Remark: In this chapter, topological manifolds are introduced as topological spaces which happen
to be locally homeomorphic to Euclidean spaces. Alternatively, topological manifolds may be thought of as a
“patchwork quilt”, consisting of patches of Euclidean space sewn together somehow seamlessly. In practice,
the patchwork quilt model is how manifolds are studied, but the point set of the manifold is primary in the
way a manifold is visualized.
25.0.2 Remark: A topological space (X, T ) requires no extra structure in addition to the topology T
in order to be declared to be a topological manifold. Additional structure such as an atlas is optional.
Differentiable manifolds, introduced in Chapter 26, do need extra structure (such as an atlas) for their
specification.
25.1. Background
25.1.1 Remark: Manifolds have been a big success story of mathematical generalization. Much of modern
physics is written in the language of manifolds, perhaps because no one really believes any more that the
universe is flat. Even if the universe turns out to be homeomorphic to a simple Euclidean space, the general
manifold point of view is still a useful tool for forcing one to throw away preconceived notions of flat space.
25.1.2 Remark: Probably the term “manifold” arose to describe the generalization of 1-parameter curves
to parametrized surfaces. OED [212], page 1272, gives the year 1855 for the earliest use of the word “manifold”
in Kantian philosophy. Riemann’s Über die Hypothesen welche der Geometrie zu Grunde liegen appeared in
1854, but the OED gives the year 1890 for the first use of “manifold” in mathematics.
25.1.3 Remark: There was a time when mathematicians would define a manifold as any set which could
be parametrized by a finite set of real parameters. Nowadays mathematicians are more precise about
the conditions for such parametrizations. In the olden days, manifolds were pictured as having a grid of
coordinate curves on a surface similar to longitude and latitude curves on a globe of the Earth. Points
were labelled with coordinates (x1 , . . . , xn ) and these were used as a substitute for the familiar Cartesian
coordinates for flat space, the difference being that the coordinate curves on a manifold were themselves
variable instead of the more familiar rigid Cartesian grid lines.
25.1.4 Remark: Although nowadays the topological structure on Euclidean spaces IRn is regarded as the
lowest-level structure which a manifold should have local equivalences to at every point of the manifold,
Bell [191], page 505, suggests that the existence of local maps from a set to subsets of IRn was formerly (at
least in 1937) considered sufficient to call the set an n-dimensional manifold.

A manifold is a class of objects (at least in common mathematics) which is such that any member
of the class can be completely specified by assigning to it certain numbers, in a definite order,
corresponding to “numerable” properties of the elements, the assignment in the given order corre-
sponding to a preassigned ordering of the “numerable” properties.
The existence of space-filling curves implies that local bijections do not ensure sufficient structural equivalence
to subsets of IRn . The topological structure must be equivalent also. In other words, the bijections must be
homeomorphisms.
25.1.5 Remark: A topological manifold is defined as a Hausdorff locally Euclidean topological space with
constant dimension. In other words, topological manifolds are just patched-together pieces of a Euclidean
space – up to a homeomorphism. The concept of a manifold is yet another method for creating new spaces
from old. The language used to describe manifolds clearly shows the subject’s origin in the mapping of
the Earth, which is everywhere locally sort-of-Euclidean but not Euclidean globally. Hence early attempts
to create charts of the Earth on a single sheet of flat paper were doomed to failure. The compromise was
to create an atlas which would cover the whole Earth with multiple overlapping charts. This principle of
defining a curved space in terms of an atlas of charts is capable of wide generalization.
25.1.6 Remark: Although a topological manifold is defined as being locally homeomorphic everywhere to open sets
in a fixed Euclidean space, it is interesting to speculate on the usefulness of a class of topological spaces
which is simply uniform in the sense that a fibre space is uniform in Definition 23.2.1. In other words, one
could require every two points p and q in a topological space to have open neighbourhoods, Ωp and Ωq
respectively, with a homeomorphism φ : Ωp ≈ Ωq such that φ(p) = q. For such a space, if one point was
locally Euclidean, then all points would be locally Euclidean.
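The last claim can be checked in a few lines. Suppose that p has an open neighbourhood U with a homeomorphism $\psi : U \approx G$ for some $G \in \mathrm{Top}(\mathbb{R}^n)$, and let $\phi : \Omega_p \approx \Omega_q$ be as above with φ(p) = q. The names U, ψ, W and V are introduced here only for this sketch.

```latex
\begin{align*}
W &= U \cap \Omega_p \quad \text{(open in $X$, contains $p$)}, \\
V &= \phi(W) \quad \text{(open in $\Omega_q$, hence open in $X$, contains $q$)}, \\
\psi \circ \phi^{-1}\big|_V &: V \approx \psi(W) \in \mathrm{Top}(\mathbb{R}^n).
\end{align*}
```

So q has an open neighbourhood which is homeomorphic to an open subset of the same space IR^n.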
A somewhat absurd notion of “discrete manifold” is suggested by Bell [191], page 505. Such a manifold M
would have local bijections everywhere from subsets of M to subsets of $\mathbb{Z}^n$. If this proposal were serious, it
would mean that all sets are “discrete manifolds”, even if the bijections are required to be homeomorphisms
(under very weak assumptions on the topology). However, if some sort of order structure on the manifolds
were required to correspond to the order on $\mathbb{Z}^n$, perhaps some non-trivial definition could be built from such an
idea.
25.1.7 Remark: A minor distinction in terminology is introduced in this chapter. If a topological manifold
is specified by its topology, namely a set TM of open sets, then the pair (M, TM ) is called simply a “topological
manifold”. But if the manifold’s topology is specified by an atlas AM , then the pair (M, AM ) is called a
“$C^0$ manifold”, which fits in nicely with the definitions of differentiable manifolds of class $C^r$ for general
integers r. This is discussed further in Remark 25.4.1.
25.1.8 Remark: In terms of the mathematics, probably 99% of the definitions in differential geometry
make perfect sense in a single chart of a manifold. It is really only questions of global topology which require
an atlas of charts covering the whole manifold.
Pure mathematicians who work in differential geometry are indeed strongly interested in questions relating
to the global topology of manifolds, but the definitions which they use are generally of two kinds: (1) purely
topological definitions (layer 1 in Section 1.1) relating to connectivity categories (like homotopy and homol-
ogy) and (2) single-chart definitions in the higher layers of the manifold structure (layers 2 to 4). In other
words, the full atlas is required only for the topology, which is not, strictly speaking, a geometric structure.
The DG definitions which clearly do require more than a very local chart for their definition relate to geodesics
and the convexity of sets. A convex set, however, can always be defined in a single chart, and shortest-path
geodesics can usually be fitted into a single chart. A particular situation where a geodesic might not fit in
a single chart is where the image of the geodesic is locally dense in the manifold. This can happen with
chaotic or fractal orbits, for example. In the case of integrals over the whole manifold, clearly a full atlas of
charts is required.
Even though single-chart differential geometry may seem to be an appealing simplification, when global
questions hit the fan, a full atlas is required. So it’s best to anticipate this by working with atlases as soon
as possible.
25.1.9 Remark: Hermann Weyl [51, page 104], writing in 1918–1922, strongly hinted at a three-layer

model of differential geometry structure in the following passage which appears at the end of a philosophical
discussion of the nature of physical space.
Die gedanklichen Grundlagen sind gelegt, und wir dürfen jetzt nicht länger säumen, mit dem
systematischen Aufbau der »reinen Infinitesimalgeometrie« zu beginnen, der sich naturgemäß in
drei Stockwerken vollziehen wird; vom jeder näheren Bestimmung baren Kontinuum über die affin
zusammenhängende Mannigfaltigkeit zum metrischen Raum.
This may be translated into English as follows.
The intellectual foundations are laid, and we must now no longer tarry to begin with the systematic
construction of the “pure infinitesimal geometry” which will be carried out, appropriately to its
nature, in three storeys; from the continuum which is bare of every qualification, via the affinely
connected manifold to the metric space.
Another passage [51, page 78] makes it clear that Weyl’s “continuum” means a locally Euclidean topological
space. (The word “Raum” means “space”, but it also means a “room” as in a house. Probably this was
intended humorously! The word “gedanklich” can be translated as “intellectual” or “imaginary”.) Thus
Weyl was evidently proposing a “three-storey” model as follows.
storey/floor                         structure
3   metric space                     Riemannian metric
2   affinely connected manifold      affine connection
1   continuum                        continuous charts
This may be compared to the five-layer model in Section 1.1. Weyl misses out the differential layer, although
throughout his book, he does require sufficient differentiability of the chart transition maps for his purposes.
Since not much can be done without differentiability (i.e. with only a pure topological manifold “bare of
any qualification”), it is quite understandable that he does not split his “continuum” layer into a topological
continuum and differentiable continuum.
The zeroth layer (a set without topology) is not defined by Weyl. This is also quite understandable because
differential geometry really only starts when you get to charts in the “continuum” layer. What goes on in
the zeroth layer “cellar” was quite rightly not Weyl’s concern in a book intended for physicists around 1920.
25.2. Euclidean and locally Euclidean spaces
Product topologies are defined in Section 15.1.
The most important examples of product topologies are the n-fold products of sets such as IR. The standard
topology on IRn is the product topology of the factors in the set product.
[ Define the standard metric space (IRn , d). ]
[ Section 42.2 covers Euclidean spaces with a focus on tangent bundles. ]
25.2.1 Definition: A Euclidean topological space is a Cartesian product set $\mathbb{R}^n$ for some $n \in \mathbb{Z}^+_0$ together
with the product topology on $\mathbb{R}^n$, where the topology on each set $\mathbb{R}$ is generated by the set of intervals of
the form (a, b) for $a, b \in \mathbb{R}$ with a < b.
[ Obviously topological bases must be defined for this, and a couple of other things, like the topology on IR. ]
[ Give the example here of the product of the topologies of [0, 1] ⊆ IR and {0, 1} ⊆ IR and similar examples.
Also give the example of the product of the $S^1$ and $S^0$ topologies. ]
25.2.2 Remark: Unless otherwise stated, the topology on IRn is assumed to be as in Definition 25.2.1.
25.2.3 Definition: A locally Euclidean space is a topological space X such that
$\forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists n \in \mathbb{Z}^+_0,\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G.$
25.2.4 Remark: Warner [50] defines a locally Euclidean space to require the Hausdorff property, but most
authors seem to agree with Definition 25.2.3. The Hausdorff property is not implied by Definition 25.2.3.
See Section 42.3 for examples of non-Hausdorff locally Euclidean topological spaces.
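A standard example of this kind (not necessarily the one given in Section 42.3) is the line with a doubled origin. The notation X, ∼ and $0_\pm$ is introduced here only for illustration.

```latex
\[
X = \bigl(\mathbb{R} \times \{+1,-1\}\bigr) / {\sim},
\qquad
(x, +1) \sim (x, -1) \ \text{for all } x \ne 0,
\qquad
0_\pm = [(0, \pm 1)].
\]
```

With the quotient topology, the image of each copy IR × {±1} is an open subset of X which is homeomorphic to IR, so X is locally Euclidean with n = 1. The two origins $0_+$ and $0_-$ are distinct points, but every neighbourhood of one of them contains points x ≠ 0 arbitrarily close to 0, and these points lie in every neighbourhood of the other. So the two origins cannot be separated by disjoint open sets.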

25.2.5 Remark: The concept of a homeomorphism may be thought of as a way of transferring tasks from
a general space to a special kind of space where the work is easier. This reduces duplication of effort. In the
case of Definition 25.2.3, tasks are transferred from a locally Euclidean space to open subsets of Euclidean
spaces. Thus many already-proven results in Euclidean spaces are “portable” to locally Euclidean spaces.
25.2.6 Remark: A locally Euclidean space is both locally connected and locally compact. See Exercises
46.9.1 and 46.9.2.
25.3. Topological manifolds

Subsets of topological spaces are implicitly considered to have the relative topologies from those spaces.
(See Definition 14.11.13 for relative topology.) This applies particularly to subsets of IRn and topological
manifolds.
Most texts define a manifold in terms of an atlas. A more intrinsic approach is taken here. Manifolds are
defined in terms of the existence everywhere of a local homeomorphism to a Euclidean space.
Definition 25.3.1 assumes that a set M has a pre-defined topology and a test is applied to this topology
to determine whether it satisfies the conditions for a topological manifold. It seems that Malliavin [36]
actually creates the topology on a set via an atlas as the weak topology induced by the coordinate maps.
Although this may be closer to the practical way of working, it would introduce superfluous structure into
the mathematical definition.
Manifolds could be generalized from the “locally Euclidean topological space” definition presented here to a
manifold which is locally homeomorphic to any topological space at all. This would yield a general “patching
together” definition for pieces of any given topology. In particular, spaces which are locally homeomorphic
to n or n could be studied.
Kobayashi/Nomizu [27], page 2, define manifolds more generally to be topological spaces which are locally
homeomorphic to spaces other than IRn . Such generalizations are not considered here, even to complex
spaces.
[ Could define here a locally Euclidean topological space with variable dimension, and then define a manifold
as such a space with constant dimension. On the other hand, a fibre bundle always has a constant fibre
space at all points. This raises the question of whether it would be useful to define a fibre bundle with
non-constant fibre space. ]
25.3.1 Definition: Let $n \in \mathbb{Z}^+_0$. An n-dimensional (topological) manifold is a Hausdorff space M such
that every point has an open neighbourhood which is homeomorphic to an open subset of $\mathbb{R}^n$. In other
words, M is said to be an n-dimensional (topological) manifold if
(i) M is a Hausdorff space (see Definition 15.2.13), and
(ii) $\forall x \in M,\ \exists \Omega \in \mathrm{Top}_x(M),\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G$.
[ Try to show that a topological manifold is more than just T2 , i.e. Hausdorff. Probably the extra locally
Euclidean property combined with T2 would imply something stronger than that. See Section 15.2 for
topological separation classes. ]
25.3.2 Remark: The Hausdorff condition is not superfluous. (See [20], definition 1.6, page 6.) [ See
Malliavin [36], proposition I.1.4.1. ] It is not immediately obvious why the Hausdorff property does not
follow from the local homeomorphisms to IRn , but Example 42.3.2 confirms that the Hausdorff condition is
not superfluous.
25.3.3 Remark: Warner [50] defines a locally Euclidean space to require the Hausdorff condition, which
makes his definition the same as Definition 25.3.1. He then defines a manifold to have a maximal atlas, and
requires the topology to be second countable and Hausdorff. This seems to be a rare choice of conditions.
But Crampin/Pirani [12], page 238, require a countable basis.
25.3.4 Remark: Notation 25.3.5 is a good example of a style of definition which is almost ubiquitous in
mathematics literature. Taken at face value, it suggests that dim(M ) is a function of the manifold. In
fact, as noted in Remark 25.3.7, dim(M ) has an infinite number of values for the empty manifold. So it

cannot possibly be an inferrable property of the set M together with its topology. In the case of non-empty
manifolds, the calculation of the dimension is non-trivial. One is not supposed to understand the notation
dim(M ) as requiring a calculation from the set and topology. The notation dim(M ) actually means “the
integer n which was used in the definition of M in Definition 25.3.1”. This is an ill-defined concept. But
it is also what the definition means in most textbooks. Such “properties of definitions” should always be
replaced by a “property of defined object” if one wishes to be moderately rigorous.
25.3.5 Notation: dim(M ) denotes the dimension of a manifold M .
25.3.6 Example: A trivial example of a topological manifold is the set $\mathbb{R}^n$ with its standard topology
for $n \in \mathbb{Z}^+_0$. For any $x \in \mathbb{R}^n$, the sets Ω and G in Definition 25.3.1 may be taken as the whole set $\mathbb{R}^n$, and
the homeomorphism is the identity map.
25.3.7 Remark: It could be argued that the case of dimension n = 0 is of little interest. There are,
however, some redundant cases which make use of this. A 0-dimensional manifold is just a discrete topological space;
in other words, an arbitrary set M with the maximal topology T = IP(M), the power set of M. The empty topological space
(M, T) = (∅, {∅}) is a topological manifold for all dimensions $n \in \mathbb{Z}^+_0$.
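Both statements follow directly from condition (ii) of Definition 25.3.1. A brief check is sketched below, using only the fact that $\mathbb{R}^0$ is a one-point space, written {0} here.

```latex
\begin{align*}
\mathrm{Top}(\mathbb{R}^0) &= \{\emptyset, \{0\}\}, \\
n = 0: \quad \Omega \approx G,\ x \in \Omega
  &\;\Longrightarrow\; G = \{0\} \text{ and } \Omega = \{x\},
  \text{ so every singleton } \{x\} \text{ is open in } M, \\
M = \emptyset: \quad \forall x \in \emptyset,\ \ldots
  &\quad \text{holds vacuously for every } n \in \mathbb{Z}^+_0 .
\end{align*}
```

A space in which every singleton is open is discrete, and every discrete space is Hausdorff, so the 0-dimensional manifolds are exactly the discrete spaces.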
[ Define “manifolds with boundary”. E.g. see Lang [31], pages 38–41. ]
25.3.8 Notation: C(M, IR) denotes, for any topological manifold M , the set of all continuous real-valued
functions on M together with the operations of pointwise addition and multiplication by elements of IR.
To be precise in terms of Definition 10.1.2, C(M, IR) is an abbreviation for the tuple (IR, V, σIR , τIR , σV , µ),
where
(i) IR −< (IR, σIR , τIR ) denotes the field of real numbers,
(ii) V = {f : M → IR; f is continuous},
(iii) σV : V × V → V is the pointwise addition on V , and
(iv) µ : IR × V → V is the pointwise multiplication by IR on V .
C(M, IR) may also be denoted as C(M ), C 0 (M, IR) or C 0 (M ).
25.3.9 Notation: C̊(M, IR) for a manifold M denotes the set of continuous partially defined real-valued
functions on M .
25.3.10 Notation: C̊p (M, IR) for a manifold M and p ∈ M denotes the set of continuous partially defined
real-valued functions on M whose domains contain p.
25.3.11 Remark: Notations 25.3.8, 25.3.9 and 25.3.10 are well-defined for general topological spaces, not
just topological manifolds. (See Section 6.11 for partially defined functions.) Other structures are frequently
added to C(M, IR) in Notation 25.3.8 in a fairly standard fashion, such as topological or metric space
structure for compact M .
[ Here give notation C(M, IRm ) for continuous maps between manifolds etc. ]
[ Here give notation C(M1 , M2 ) for continuous maps between manifolds etc. ]
25.3.12 Remark: The definitions of curves and paths in topological manifolds are inherited from the
topological space definitions in Sections 16.2 and 16.4.
25.4. Charts and atlases

25.4.1 Remark: A topological manifold is just a topological space M which happens to be Hausdorff and
everywhere locally homeomorphic to a Euclidean space IRn . In other words, a topological manifold is a locally
Euclidean Hausdorff topological space with constant dimension. No extra structure, such as an atlas, needs
to be specified since any such structure is implicit in the topological structure on M . This contrasts with the
case of a differentiable manifold which does require extra structure. (See also Remark 26.3.9.)

Since an atlas is a very common way of indicating the topology on a manifold, even if the manifold is not
differentiable, it is convenient to accept an alternative specification tuple (M, AM ) for a manifold (M, TM ),
where AM is an atlas according to Definition 25.4.6 and TM is the topology implied on M by AM .
The tuples (M, TM ) and (M, AM ), specifying the topology or an atlas respectively, will be used almost
interchangeably. Both specifications are used by a large number of authors as the standard for a “topological
manifold”. To distinguish between them when necessary, the topological form (M, TM ) will be referred to as
a “topological manifold” (Definition 25.3.1), and the atlas form (M, AM ) will be called a “C 0 manifold”
(Definition 25.4.11). This fits conveniently with the definition for a C r differentiable manifold (Definition 26.3.6).
25.4.2 Definition: A chart for an n-dimensional topological manifold M for $n \in \mathbb{Z}_0^+$ is a homeomorphism
$\psi : U \to G$ such that $U \in \mathrm{Top}(M)$ and $G \in \mathrm{Top}(\mathbb{R}^n)$.
A manifold chart is also called a coordinate map or a coordinate function.
Figure 25.4.1 Coordinate map ψ : U → G ∈ Top(IRn ) with U ∈ Top(M )
25.4.4 Remark: In this book, the symbol ψ usually hints at a manifold chart (i.e. a coordinate map).
Many books use the symbol φ for manifold charts. However, in this book, φ hints at a function from one
manifold to another. The author’s mnemonic for this is that ψ (“psi”) suggests the last two letters of “maps”,
whereas φ (“phi”) is the first letter of the word “function” (a Latin word which does not seem to be of Greek
origin).
25.4.5 Remark: Since the map ψ in Definition 25.4.2 is a homeomorphism, so is its inverse ψ −1 . Therefore
it would be possible to define charts as maps from the Euclidean space IRn to the manifold. Many texts
do this. However, it is better to regard coordinates as tags on geometrical points. The points are the
primary entity and the coordinates are mere labels for the points. On the other hand, in the case of curves
and families of curves, points are given as a function of real n-tuples because curves and families of curves
represent a kind of possibly self-intersecting (non-injective) motion within the manifold. Curves may
self-intersect; manifolds do not. A curve gives you a unique point for each parameter value. A manifold has a
unique set of parameters for each point.
Another way to see that it is more natural to define charts as functions from the point set to the coordinate
set than vice versa is to think of how people make real maps of the Earth. The usual procedure is to choose
which points are of interest, such as towns and mountains, and then determine the coordinates (e.g. longitude
and latitude) of these points. In other words, the coordinates are attributes of the point, not vice versa.
One does not choose a set of coordinates and then go out and see what is at those coordinates. Perhaps
an exception to this would be aerial and satellite photography where data is organized as a set of pixels,
and the points on the Earth must be determined as a function of the pixel row and column in the matrix.
However, such images are usually calibrated by identifying points on Earth which have known coordinates
and then determining the Earth-to-pixels map by interpolation.
Similarly in the case of fibre bundles, fibre charts are expressed as functions from the fibre bundle to the fibre
space rather than vice versa. On the other hand, when defining charts for particular embedded manifolds, it
is generally easier to define the inverse chart, i.e. from a Euclidean space to the manifold. (See for instance
spherical coordinates in Section 41.1.)
25.4.6 Definition: An atlas for an n-dimensional topological manifold M is a set S of charts for M such
that $\bigcup_{\psi \in S} \mathrm{Dom}(\psi) = M$.
An indexed atlas for an n-dimensional topological manifold M is a family $(\psi_i)_{i \in I}$ of charts for M such that
$\bigcup_{i \in I} \mathrm{Dom}(\psi_i) = M$. (See Figure 25.4.2.)
Figure 25.4.2 Atlas for a topological manifold M
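The definition of an atlas can be made concrete with a small numerical sketch. The Python code below describes a two-chart atlas for the unit circle, regarded as a subset of the plane. The chart names `psi1` and `psi2`, the domain tests `in_U1` and `in_U2`, and the choice of angle coordinates are all assumptions made for this illustration; they are not notation from this book.

```python
import math

# Points of M = S^1 are pairs (x, y) with x^2 + y^2 = 1.
# Assumed chart domains for this sketch:
#   U1 = S^1 minus the point (-1, 0), with psi1 : U1 -> (-pi, pi)
#   U2 = S^1 minus the point (1, 0),  with psi2 : U2 -> (0, 2*pi)

def in_U1(p, tol=1e-12):
    x, y = p
    return not (abs(y) < tol and x < 0)

def in_U2(p, tol=1e-12):
    x, y = p
    return not (abs(y) < tol and x > 0)

def psi1(p):
    """Angle coordinate in (-pi, pi) for points of U1."""
    x, y = p
    return math.atan2(y, x)

def psi2(p):
    """Angle coordinate in (0, 2*pi) for points of U2."""
    x, y = p
    return math.atan2(y, x) % (2 * math.pi)

def psi1_inv(t):
    """Inverse of psi1: the point of the circle with angle t."""
    return (math.cos(t), math.sin(t))

def transition_12(t):
    """Transition map psi2 o psi1^{-1}, defined for t in (-pi, 0) or (0, pi)."""
    return psi2(psi1_inv(t))

# Every sampled point of the circle lies in at least one chart domain.
samples = [psi1_inv(2 * math.pi * k / 16) for k in range(16)]
covered = all(in_U1(p) or in_U2(p) for p in samples)
```

In this sketch the transition map is t ↦ t on (0, π) and t ↦ t + 2π on (−π, 0), so it is a homeomorphism between open subsets of IR, as expected for any pair of charts of a topological manifold.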
25.4.7 Remark: An atlas is given two alternative formalizations in Definition 25.4.6: with and without
an index. In practical applications, the charts are usually indexed. For convenience, the family of charts is
usually referred to interchangeably as a set of charts, which then means the set of charts which is indexed
by the family. As always with indexed families, the indexing may be implicit or explicit, according to the
requirements of the context. The same issue arises for fibre bundle atlases in Definition 23.3.14.
The arguments in favour of defining an atlas as a set of charts rather than an indexed family are much
stronger than the counterarguments. In the case of an indexed atlas: (1) it is difficult to choose an index set
for a complete atlas other than the set of charts themselves, which is rather clumsy; (2) when merging two
atlases, it is difficult to choose an index for the union of the atlases, particularly if the atlases are infinite;
(3) when restricting a manifold to a subset, the restricted atlas uses a subset of the index set of the full
atlas; (4) since the content of an atlas is independent of the indexing, the extraneous index map must be
ignored when comparing atlases. All in all, it is best to simply add an index set whenever it is convenient
for applications.
[ As in Definition 23.3.17 for fibre bundles, define a “topological manifold with atlas” even though the atlas
is optional. ]
25.4.8 Notation: An atlas for a manifold M which is implicit in a particular context may be denoted
by atlas(M ). Then atlasp (M ) = {ψ ∈ atlas(M ); p ∈ Dom(ψ)} denotes the set of charts in atlas(M )
whose domain contains a given point p ∈ M . Another notation will be AM for an atlas for M , and ApM
for atlasp (M ).
25.4.9 Remark: It is not necessary to impose any additional continuity condition on the transition maps
$\psi_j \circ (\psi_i|_{U_i \cap U_j})^{-1} : \psi_i(U_i \cap U_j) \to \psi_j(U_i \cap U_j)$ because all charts are homeomorphisms by Definition 25.4.2.
Therefore any atlas for M is a $C^0$ atlas according to Definition 26.3.2.

Every topological manifold M possesses an atlas. If M is compact, then any atlas on M has a finite subset
which is also an atlas. (This may be referred to as a sub-atlas.) Therefore every compact topological manifold
has a finite atlas. But a topological manifold with uncountably many connected components cannot
have a finite atlas, because an open subset of IRn has at most countably many connected components.
[ Should show that a simply connected topological manifold must have a finite atlas, but only if it is true. ]
25.4.10 Theorem: For any given atlas S on a topological manifold M , there is one and only one topology T
on M such that S is an atlas for the topological space (M, T ). In other words, the atlas uniquely determines
the topology. Conversely, the topology determines the set of all possible charts, and hence the set of all
possible atlases.
25.4.11 Definition: A C 0 manifold is a pair (M, AM ) such that AM is an atlas for the topological manifold
(M, TM ) for some topology TM on M .

25.4.12 Remark: By Theorem 25.4.10, the topology TM in Definition 25.4.11 is uniquely determined by
the atlas AM . The atlas AM is uniquely determined up to an atlas equivalence by the topology TM . See
Remark 25.4.1 for related discussion.
25.4.13 Theorem: If S is an atlas for the topological space (M, T ), and ψ is a chart for (M, T ), then
S ∪ {ψ} is an atlas for (M, T ). The set {ψ; ψ is a chart for M } is an atlas for M .
[ See Malliavin [36], proposition I.1.3.1 for Theorem 25.4.13. ]
25.4.14 Remark: All pairs of charts for a topological manifold are automatically consistent on their
intersection. This is because the topological manifold structure is entirely determined by the topological
structure. This is different to the case of a differentiable manifold, where an atlas is required to indicate
which structure is intended. An atlas for a topological manifold is fairly superfluous unless the manifold is
actually defined in terms of an atlas using the concept of a topological graft, as in Theorem 25.6.4.
25.4.15 Definition: The maximal atlas of a topological space (M, T ) is the atlas consisting of all charts
for (M, T ), namely C̊(M, IRn ) = {ψ : U → IRn ; U ∈ Top(M ) and ψ is a homeomorphism onto an open subset of IRn }.
[ Define a “complete atlas” and show that the maximal atlas is complete? ]
25.4.16 Remark: Although the structure of a topological manifold may equally be described by its topology
or by an atlas, it seems that a differentiable manifold requires an atlas, unless there is some other structure,
such as a suitable set of subsets of the manifold, from which the differentiable structure can be recovered.
25.4.17 Remark: If one wished to generalize the concept of a topological manifold in a manner similar
to the intrinsic definition of a topological fibre bundle (Definition 23.2.1), then one could define a manifold
to be a topological space (M, T ) such that for all x1 , x2 ∈ M , there exist neighbourhoods U1 ∈ T of x1 and
U2 ∈ T of x2 such that U1 ≈ U2 . In other words, the topological space is locally homogeneous. This also would
imply the existence of a fixed extrinsic topological space (V, T′ ) such that for all x ∈ M , there is a neighbourhood U
of x such that U ≈ V . This would mean that (M, T ) is a space which is patched together from patches
of the space (V, T′ ). This is not very useful for differential geometry because only spaces which are locally
Euclidean are relevant.
25.4.18 Remark: In topological manifolds, a curve may be tested for continuity by mapping it through
the charts as in Theorem 25.4.19.
25.4.19 Theorem: If $\gamma : I \to M$ is a map from an interval $I \subseteq \mathbb{R}$ to an n-dimensional topological
manifold M , then γ is continuous if and only if $\psi \circ \gamma|_{\gamma^{-1}(U)} : \gamma^{-1}(U) \to \mathbb{R}^n$ is continuous for all continuous
charts $\psi : U \to \mathbb{R}^n$ for M .
25.5. Topological manifold constructions, attributes and relations

25.5.1 Theorem: A function f : M1 → M2 is continuous if and only if continuous through the charts etc.
[ See Malliavin [36], proposition I.1.5.3. ]
[ Define here the restriction of a manifold to an open subset. See Malliavin [36], section I.6, for submanifolds. ]
25.5.2 Remark: The direct product functions ψ1 × ψ2 in Definition 25.5.3 are given by Definition 6.9.11,
which defines $\psi_1 \times \psi_2 : (p_1, p_2) \mapsto (\psi_1(p_1), \psi_2(p_2))$ for $p_1 \in \mathrm{Dom}(\psi_1)$ and $p_2 \in \mathrm{Dom}(\psi_2)$.
If $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$, then the usual identification of $\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ with $\mathbb{R}^{n_1 + n_2}$ by
concatenation is assumed. (See Definition 7.7.6 for concatenation.)

25.5.3 Definition: The (direct) product atlas of two atlases S1 and S2 on topological manifolds M1 and
M2 respectively is the atlas S for the product topological space M1 × M2 given by
S = {ψ1 × ψ2 ; ψ1 ∈ S1 and ψ2 ∈ S2 }.
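The product atlas construction may be sketched in a few lines of Python, under the assumption that points and coordinates are represented as tuples of floats. The helper names `product_chart` and `product_atlas`, and the sample charts on IR and IR2, are hypothetical and serve only to illustrate Definition 25.5.3.

```python
def product_chart(psi1, psi2):
    """Return the map (p1, p2) -> psi1(p1) concatenated with psi2(p2)."""
    def psi(p):
        p1, p2 = p
        return tuple(psi1(p1)) + tuple(psi2(p2))
    return psi

def product_atlas(S1, S2):
    """All product charts psi1 x psi2 with psi1 in S1 and psi2 in S2.

    A list is used instead of a set only for convenience of indexing.
    """
    return [product_chart(a, b) for a in S1 for b in S2]

# Sample charts (assumptions for this sketch). Points of M1 = IR are
# 1-tuples and points of M2 = IR^2 are 2-tuples.
def identity_1(p):
    return (p[0],)

def cube_1(p):
    # x -> x^3 is a homeomorphism of IR, hence a chart for the
    # topological manifold IR, although its inverse is not differentiable at 0.
    return (p[0] ** 3,)

def identity_2(p):
    return (p[0], p[1])

S = product_atlas([identity_1, cube_1], [identity_2])
point = ((2.0,), (3.0, 4.0))
coords = [psi(point) for psi in S]
```

Each product chart takes a pair of points and returns a single coordinate tuple of length n1 + n2, which is the identification by concatenation described in Remark 25.5.2.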
25.5.4 Theorem: If (M1 , S1 ) and (M2 , S2 ) are topological manifolds, then (M1 × M2 , S1 × S2 ) is a topo-
logical manifold with the direct product atlas S1 × S2 as in Definition 25.5.3.
25.5.5 Definition: The (direct) product manifold of two topological manifolds (M1 , S1 ) and (M2 , S2 ) is
the manifold (M1 × M2 , S1 × S2 ).
25.5.6 Theorem: The topology induced on the product of two topological manifolds by a product atlas
is the same as the product of the topologies induced by the respective atlases on the manifolds.
25.5.7 Remark: Generally the direct product atlas of two maximal atlases for topological manifolds is not
itself maximal. This is another good reason to not work with maximal atlases.
[ Should say something about manifolds with boundaries here. ]
[ Also present here the quotient of a manifold with respect to an equivalence relation. ]
25.6. Topological identification spaces

25.6.1 Remark: Although the formal definition of a topological manifold is expressed in terms of a topo-
logical space with the property that charts exist everywhere, in practice, the manifold itself is often defined
in terms of charts. In other words, the manifold is actually constructed from the grafting together of an
atlas of charts. So the fact that an atlas of charts exists is a direct consequence of the method of
construction. The graft of a family of topological spaces is presented in Definition 15.11.2. If the charts in a graft
of a family of topological spaces are locally Euclidean and related to each other by homeomorphisms on
their intersections, the resulting graft will be a manifold. This is stated more precisely in Theorem 25.6.4.
Conversely, a topological manifold is homeomorphic to a topological graft of the chart spaces, as stated in
Theorem 25.6.3.
Figure 25.6.1 Topological graft of charts of a manifold
25.6.3 Theorem: Let (ψi )i∈I be an indexed atlas for an n-dimensional topological manifold M . Let
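$X_i = \mathrm{Range}(\psi_i)$ for $i \in I$. Define $X \subseteq \mathring{\times}_{i \in I} X_i$ by
\[
\begin{aligned}
X &= \bigl\{ (x_i)_{i \in J} \in \mathring{\times}_{i \in I} X_i ;\ \exists p \in M,\ (\forall i \in J,\ x_i = \psi_i(p) \text{ and } \forall i \in I \setminus J,\ p \notin \mathrm{Dom}(\psi_i)) \bigr\} \\
  &= \bigl\{ (x_i)_{i \in J} \in \mathring{\times}_{i \in I} X_i ;\ \exists p \in M,\ \forall i \in I,\ ((i \in J \text{ and } x_i = \psi_i(p)) \text{ or } (i \notin J \text{ and } p \notin \mathrm{Dom}(\psi_i))) \bigr\}.
\end{aligned}
\]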
For all i ∈ I, define the topology Ti on Xi to be the relative topology from IRn . The family (Xi , Ti )i∈I
is topologically consistent with the graft X. (See Definition 15.11.2.) Let T be the graft topology on X.
Then (X, T ) ≈ (M, Top(M )).

Proof: It is evident from Definition 6.10.5 that X is a set graft of the family (Xi )i∈I . Condition (i) of
Definition 6.10.5 requires that there be no null families in X. This follows from the fact that the atlas
covers M . The remaining conditions follow equally straightforwardly.
The fact that the family (Xi , Ti )i∈I is topologically consistent with the graft X follows directly from the
topological consistency of all charts in the atlas. Since all chart transition maps are homeomorphisms, the
image of an open set under any chart transition map is an open set.
To show the topological equivalence of (X, T ) and (M, Top(M )), define the identification map fi : Xi → X
as in Definition 15.11.2 so that fi (xi ) = x for all i ∈ I and x ∈ X. In other words, fi maps each element of
Xi to the corresponding element of the graft X.
Define the function f : M → X so that f (p) = fi (ψi (p)) for any i ∈ I such that p ∈ Dom(ψi ). This is well-defined because
the functions fi are defined so that fi (xi ) = fj (xj ) if and only if ψi−1 (xi ) = ψj−1 (xj ) for all i, j ∈ I. To
show that f is a homeomorphism, first let Ω ∈ Top(M ) and note that ψi (Ω) ∈ Ti for all i ∈ I. Therefore
f (Ω) = ∪i∈I fi (ψi (Ω)) ∈ T . Similarly, any open set of (X, T ) is of the form ∪i∈I fi (Ωi ) for some open sets
Ωi ∈ Ti for i ∈ I, by Definition 15.11.2. Each set ψi−1 (Ωi ) is open in (M, Top(M )). So f −1 (∪i∈I fi (Ωi )) =
∪i∈I f −1 (fi (Ωi )) = ∪i∈I ψi−1 (Ωi ) ∈ Top(M ). Hence f : (M, Top(M )) ≈ (X, T ).
25.6.4 Theorem: Let $n \in \mathbb{Z}^+$, and let (X, T ) be a topological graft of a family (Xi , Ti )i∈I of topological
spaces which are all homeomorphic to some open subset of IRn . Then (X, T ) is an n-dimensional topological
manifold.

Chapter 26
Differentiable manifolds
26.1 Overview of differentiable structure
26.2 Terminology and definition choices
26.3 Differentiable manifold atlases
26.4 Some standard differentiable manifold atlases
26.5 Some basic definitions for differentiable manifolds
26.6 Differentiable real-valued functions on differentiable manifolds
26.7 Differentiable curves and paths
26.8 Differentiable families of differentiable transformations
26.9 Differentiable maps between differentiable manifolds
26.10 Analytic manifolds
26.11 Unidirectionally differentiable manifolds
26.12 Lipschitz manifolds and rectifiable curves
26.13 Differentiable fibrations
26.14 Tangent space building principles
26.0.1 Remark: This chapter introduces the “differential layer” of differential geometry. (The structural
layers of differential geometry are summarized in Sections 1.1 and 26.1.) No connection or metric is assumed
to be defined. Only a differentiable structure is defined. Therefore geodesics, covariant derivatives, distances
and angles are meaningless here. The subject matter of Chapters 26 to 32 is referred to as “differential
topology” by Misner/Thorne/Wheeler [38], chapter 9.
Figure 26.0.1 illustrates the relations between the various kinds of manifolds according to the amount of
structure which is defined on them.
Figure 26.0.1 Family tree of manifolds according to structures defined on them (nodes: differentiable manifold; differentiable manifold with connection; Riemannian manifold; pseudo-Riemannian manifold)
26.0.2 Remark: Apart from global considerations, a differentiable manifold is the same as a flat space
which is subject to arbitrary differentiable changes of coordinates. Only concepts which retain their meaning
under local diffeomorphisms of the underlying space are meaningful for differentiable manifolds. For example,
the concept of a straight line is meaningless.
26.0.3 Remark: Differentiable manifold topics are distributed among the chapters as follows.

chapter                         topics
26. differentiable manifolds    differentiability tests for manifolds, functions, maps and curves
27. tangent bundles             tangent vectors; tangent operators
28. tensors, tensor fields      covariant vectors; tensors; vector fields; tensor fields; differential forms
29. higher-order vectors        higher-order tangent vectors and operators
30. differentials               differentials of functions, maps and curves
31. higher-order differentials  higher-order differentials; differentials for higher-order operators
32. vector field calculus       Poisson bracket; Lie derivatives; exterior derivative
26.1. Overview of differentiable structure

26.1.1 Remark: Vectors specify direction at each point of a set. This enables you to determine the rates
of change of functions in various directions. Vectors are defined with the aid of local coordinate charts as
illustrated in Figure 26.1.1.
Figure 26.1.1 Layer 2: Charts and vectors
There is a lot of freedom in the choice of the local charts. For example, if the charts are rotated through
any angle, the coordinates attached to points are changed, but the property of differentiability of a function
with respect to points “under the chart” is not changed by such a rotation. Any local diffeomorphism of a
local chart leaves the differentiability properties unaltered.
26.1.2 Remark: There is no correspondence between the direction of a vector at one point and a vector
at any other point. This is because of the total freedom of choice of local coordinate orientation. Even if
two vectors at different points seem to have the same direction for a particular choice of coordinates, the
directions will generally be different for other orientations of the charts.
26.1.3 Remark: The local charts in Layer 2 define a canonical topology on the set of points. This is
why charts are in a higher layer than the topology. The choice of a set of local charts for a set is called a
“differentiable structure”.
26.1.4 Remark: Vectors at points of a manifold are called “tangent vectors” for historical reasons. They
originated in the study of n-dimensional surfaces in IRn+1 for n = 1 or 2, regarded as real functions of
n variables. In this case, the vectors can be interpreted as tangent line segments which are familiar from
Euclidean geometry.
26.1.5 Remark: A “differential” at a point on a manifold is a map from the linear space of vectors at that
point to some other linear space. The simplest kind of differential map maps the tangent vectors at a point
to the 1-dimensional linear space IR. Differentials are useful for defining the rates of change of functions on
a manifold. The rate of change is a linear function of the tangent vectors.

26.1.6 Remark: The core fact about the differential layer is the Gauß-Green theorem, which is also known
as the Stokes theorem, the Stokes formula and the Green-Stokes formula. This theorem combines local and
global concepts, multi-variable derivatives, geometric measure theory and algebraic topology. An operator
called the “exterior derivative” maps the differential forms used for integration (line elements, area elements
and volume elements) to the differential forms for their boundaries; hence the adjective “exterior”. The
integral of the exterior derivative of a differential form over a region equals the integral of the form itself over
the boundary of the region. This provides a very powerful tool for converting between pointwise equations
in the interior of a region and integrals over the boundary of the region. This is very important, for example,
in electromagnetism (Maxwell’s equations).
The Stokes theorem is usually written in the following deceptively simple way:
\[ \int_C d\omega = \int_{\partial C} \omega, \]
for any singular r-chain C and differential form ω of degree r − 1. Unravelling the meaning of this formula
reveals a vast network of concepts. The Stokes theorem is the culmination of the development of the
“mathematical machinery” in the differential layer.
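A simple numerical check may help to make the Stokes formula less abstract. The Python sketch below takes the region C to be the unit disc in IR2 and the form ω = x dy, so that dω = dx ∧ dy and both sides of the formula should equal the area π of the disc. The choice of region and form, the function names, and the midpoint-rule quadrature are all assumptions made for this illustration.

```python
import math

def boundary_integral(n=2000):
    """Integral of omega = x dy over the unit circle t -> (cos t, sin t)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h          # midpoint of the k-th parameter interval
        x = math.cos(t)
        dy_dt = math.cos(t)        # derivative of y = sin t
        total += x * dy_dt * h
    return total

def interior_integral(n=2000):
    """Integral of d(omega) = dx ^ dy over the unit disc, in polar coordinates."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h          # midpoint of the k-th radial interval
        total += r * h             # radial part of the area element r dr dtheta
    return 2 * math.pi * total     # the angular integral contributes 2*pi

lhs = interior_integral()
rhs = boundary_integral()
```

Both values agree with π to within rounding error. This is the case r = 2 of the formula, which is Green's theorem in the plane.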
26.1.7 Remark: Differentiable structures are defined in Chapter 26. Vectors on manifolds are defined in
Chapters 27, 28 and 29. Differentials on manifolds are defined in Chapters 30 and 31.
26.2. Terminology and definition choices

26.2.1 Remark: The central concept of the differential layer of a manifold is the tangent space. Tangent
spaces are defined in Chapter 27. The reason for using regularity classes such as $C^r$ rather than
r-times differentiability is to simplify the definitions of derivatives. For instance, the directional derivative
$\lim_{a \to 0} (f(x + av) - f(x))/a$ for $v \in \mathbb{R}^n$ may be expressed as $\sum_{i=1}^n v^i \, \partial f(x)/\partial x^i$ if f is $C^1$.
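The directional derivative formula in Remark 26.2.1 can be checked numerically. The Python sketch below compares the difference quotient with the sum of partial derivatives for one particular smooth function on IR2. The sample function `f`, its hand-computed gradient `grad_f`, and the step size are assumptions chosen for this illustration.

```python
import math

def f(x):
    """A sample C^1 function on IR^2 (an assumption for this sketch)."""
    return math.sin(x[0]) * x[1] ** 2 + x[0] * x[1]

def grad_f(x):
    """Partial derivatives of f, computed by hand."""
    return (math.cos(x[0]) * x[1] ** 2 + x[1],
            2 * math.sin(x[0]) * x[1] + x[0])

def directional_quotient(func, x, v, a):
    """The difference quotient (func(x + a v) - func(x)) / a."""
    shifted = tuple(xi + a * vi for xi, vi in zip(x, v))
    return (func(shifted) - func(x)) / a

def directional_sum(grad, x, v):
    """The sum over i of v^i times the i-th partial derivative at x."""
    return sum(vi * gi for vi, gi in zip(v, grad(x)))

x0 = (0.3, -1.2)
v0 = (2.0, 0.5)
quotient = directional_quotient(f, x0, v0, 1e-6)
linear = directional_sum(grad_f, x0, v0)
```

For small a the two values agree up to an error of order a, which is what the C 1 assumption guarantees in the limit.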
26.2.2 Remark: The concept of a “differentiable structure” on a manifold is an abstract concept or
“essence” which does not need to be represented by any particular set construction such as a C r maximal
atlas, although many authors do represent it in this way. The best way to think of the differentiable structure is as a set of methods
and procedures for answering questions about differentiability for a manifold, such as which functions are
differentiable and which curves are differentiable. Such questions can be answered in terms of a finite atlas
or in other ways. For computational applications, it is desirable to define a manifold in terms of a finite atlas
rather than a maximal atlas. The desire to have a manifold definition which is independent of a particular
finite atlas should be resisted because the completion of any atlas is highly dependent on the level of
regularity specified. A maximal atlas also discards possibly valuable information, such as, for example, a varying
level of regularity in different regions of the manifold. Therefore a C r differentiable structure is defined in
this book as a C r differentiable atlas.
The term “differentiable structure” seems wrong although it is common usage. The expression “differential
structure” seems much more logical since the structure itself is not differentiable. Expressions such
as “metric structure”, “algebraic structure” and “topological structure” certainly suggest “differential
structure” as the preferred term. It is likely that “differentiable structure” is really a contraction of “differentiable
manifold structure”.
Some authors use the term “differentiable manifold” to mean a C ∞ manifold. This is an unfortunate
practice. When a reader refers to a book for particular results, it is very easy to make serious errors by
thinking that the assumptions or assertions of a theorem are weaker than they really are. It is an unnecessary
practice, given that the term C ∞ is easier to write and has the same number of syllables to pronounce as
“differentiable”. So no effort is economized and great harm can be done in terms of wasted time and energy
for the reader. Redefining standard terms to mean something else is nearly always a nuisance for the reader.
(A similar practice is the habit of using the term “map” to mean a continuous map. This is dangerous if the
reader is not reading a book linearly.) Neither the C 1 nor C ∞ interpretations of the word “differentiable”
agree with the standard elementary calculus definition, which is a weaker notion than C 1 . However, the C 1
interpretation will be used for the word “differentiable” in the manifold context in this book, although this
is, strictly speaking, incorrect terminology. (See Remark 26.3.8, for instance.)

26.2.3 Remark: Many differential geometers claim to use “coordinate-free” definitions and notations.
These just hide the coordinates so that you don’t see them. There is an analogy here to computer operating
systems which offer a point-and-click interface so that you never have to type textual commands on an
old-fashioned command line. What really happens is that someone sets up buttons and menus so that the
novice user can make complex commands happen with the click of a mouse, but when the experienced user
finds that the pre-programmed command sequences are insufficient, they must use a text command line. In
the same way, differential geometry can be done in a coordinate-free manner with notations set up to hide
the coordinates for common situations. Then when you want to do something unusual which is not on the
menu, you must do the hard work and go back to coordinates. Practical calculations almost always require
detailed low-level work with coordinates. Anyone who wants to take differential geometry seriously should
not avoid learning the low-level coordinate versions of everything. This does not mean that one should use
tensor calculus at all times, but one should always know how to convert all “coordinate-free” expressions
into the coordinate expressions which they hide.
26.3. Differentiable manifold atlases

26.3.1 Remark: It is a source of some frustration that a geometric object as symmetric as the 2-sphere
S 2 in IR3 which is everywhere smooth and uniform must be described in differential geometry with charts
which have edges. It is not possible to cover all of S 2 with a single chart. This is the same annoyance that
occurs when trying to print a map of the world on a single sheet of flat paper. The cause of the problem is
the fact that Cartesian coordinates, which are so appropriate for parametrising a space such as IR3 , simply
cannot cope with a sphere.
A similar problem arises even in flat spaces such as IR3 , because in physical space we do not see any grid lines.
Any set of Cartesian coordinates for IR3 will make some sets easier to describe than others. So we accept
that any translation and orientation of Cartesian coordinates is valid. But annoyingly, there is no single
“correct” translation and orientation. Physical 3-dimensional space is apparently everywhere uniform and
smooth, but Cartesian coordinates have special points and directions, so that there are an infinite number of
ways of parametrising any physical system. The fact that the Cartesian product of two intervals of real numbers
cannot parametrise a 2-sphere is an additional annoyance. Euclid’s geometry, in which all relations between
figures are relative, does not suffer from these annoyances, but for the sake of analysis, it is necessary to
work with coordinates and charts.
Maybe some day, someone will discover how to analyse the world without recourse to grids and charts. For
the present, though, coordinates are the only practical way to do differential geometry. After all, everyone
seems to accept that sentences are broken in the middle when the text flows into a new line. There is no such
break in human speech, and yet we accept these breaks in printed books. No one would seriously suggest
that books should be written on long paper rolls so as to avoid line breaks and page breaks. Similarly we
must accept that manifold charts have edges and non-uniformities.
26.3.2 Definition: A $C^r$ atlas for a topological manifold $M$ −< $(M, T_M)$ for $r \in \bar{\mathbb{Z}}_0^+$ is a topological
atlas $A_M$ for $(M, T_M)$ such that $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is $C^r$ for all $\psi_1, \psi_2 \in A_M$, where
$U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for α = 1, 2. (See Figure 26.3.1.)
An indexed C r atlas for a topological manifold M is a family (ψα )α∈I such that {ψα ; α ∈ I} is a C r atlas
for M .
A C r atlas for a topological manifold M is also called a C r differentiable structure on M .
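The differentiability condition on transition maps can be checked explicitly for the 2-sphere discussed in Remark 26.3.1. The Python sketch below uses the two stereographic projections as charts. This choice of charts, and the names `psi_N`, `psi_S`, `psi_N_inv`, `transition` and `inversion`, are assumptions for this example rather than constructions taken from this book.

```python
def psi_N(p):
    """Stereographic projection from the north pole (0, 0, 1)."""
    x, y, z = p
    return (x / (1 - z), y / (1 - z))

def psi_S(p):
    """Stereographic projection from the south pole (0, 0, -1)."""
    x, y, z = p
    return (x / (1 + z), y / (1 + z))

def psi_N_inv(u):
    """Inverse of psi_N: maps coordinates (a, b) back to a point of S^2."""
    a, b = u
    s = a * a + b * b
    return (2 * a / (s + 1), 2 * b / (s + 1), (s - 1) / (s + 1))

def transition(u):
    """The transition map psi_S o psi_N^{-1}, defined for u != (0, 0)."""
    return psi_S(psi_N_inv(u))

def inversion(u):
    """The map u -> u / |u|^2, which is smooth away from the origin."""
    a, b = u
    s = a * a + b * b
    return (a / s, b / s)

u0 = (0.7, -1.9)
via_charts = transition(u0)
closed_form = inversion(u0)
```

In this sketch the transition map agrees with the inversion u ↦ u/|u|2 , which is C ∞ on IR2 \ {0}. So these two charts form a C r atlas for every r, even though neither chart alone covers the sphere.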
26.3.3 Remark: Definition 26.3.2 defines a C r atlas as a special kind of topological atlas which satisfies a
differentiability condition. This assumes that the set M has a pre-defined topology of a topological manifold.
(Topological atlases are defined in Section 25.4.) An alternative approach is taken in Definition 26.3.4 where
no pre-existing topology is assumed. In this case, a unique topology is induced on the set M by the atlas.
It is not really important whether the topology comes from the atlas or the atlas comes from the topology,
as long as they agree with each other. Definition 26.3.4 is used in Definition 26.3.6.
Figure 26.3.1 Transition map $\psi_2 \circ \psi_1^{-1}|_{\psi_1(U_1 \cap U_2)}$, n = dim(M )
26.3.4 Definition: For $n \in \mathbb{Z}_0^+$ and $r \in \bar{\mathbb{Z}}_0^+$, an n-dimensional $C^r$ atlas for a set M is a set $A_M$ of
bijections $\psi : U \to \mathbb{R}^n$ from subsets U of M to open subsets ψ(U ) of $\mathbb{R}^n$ such that
(i) $\forall \psi_1, \psi_2 \in A_M$, $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is $C^r$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for α = 1, 2;
(ii) $\bigcup_{\psi \in A_M} \mathrm{Dom}(\psi) = M$.
An indexed C r atlas for a set M is a family (ψα )α∈I such that {ψα ; α ∈ I} is a C r atlas for M .
The topology induced by a C r atlas AM on a set M is the topology TM on M for which all of the maps in
AM are homeomorphisms.
The atlas AM is called a C r manifold atlas for the set M if the topology induced by AM on M is Hausdorff.
The indexed atlas (ψα )α∈I is called an indexed C r manifold atlas for the set M if the topology induced by
AM = {ψα ; α ∈ I} on M is Hausdorff.
[ Must show that the domains of the charts in Definition 26.3.4 are open in the induced topology. Must also
define the topology more succinctly and precisely. ]
26.3.5 Remark: The topology TM in Definition 26.3.4 is uniquely determined by the atlas AM and is the
weak topology induced by the charts ψ ∈ AM . If the topology TM on M is not Hausdorff, then the atlas
AM may be thought of as a “C r locally Euclidean space atlas”.
26.3.6 Definition: For $n \in \mathbb{Z}^+$ and $r \in \overline{\mathbb{Z}}{}^+_0$, an $n$-dimensional $C^r$ (differentiable) manifold is a pair
$(M, A_M)$ such that $M$ is a set and $A_M$ is a $C^r$ manifold atlas for the set $M$.
The topology TM induced by the atlas AM on M is called the underlying topology of the differentiable man-
ifold (M, AM ). The topological space (M, TM ) is called the underlying topological space of the differentiable
manifold (M, AM ).
26.3.7 Remark: The differentiable manifold specification tuple (M, AM ) is abbreviated to M in most texts.
This is somewhat illogical because the set M is easily recovered from the implied atlas AM as the union
$\bigcup \{\mathrm{Dom}(\psi);\ \psi \in A_M\} = M$ of the domains of the charts in the atlas. The atlas AM cannot be recovered
from the set M alone. However, it is customary to use the base set of a structure tuple as its abbreviation
rather than the added structures because people generally think about objects in the foreground and their
attributes in the background. The set M is a kind of “figure-head” for the whole structure tuple (M, AM ).
26.3.8 Remark: If the regularity class C r of a differentiable manifold is not specified, it is assumed to
be C 1 . A “differentiable manifold” means a C 1 manifold unless otherwise indicated. Some authors, unfortu-
nately, say “differentiable manifold” when they mean “C ∞ manifold”. It is better to use the term “smooth
manifold” to mean “C ∞ manifold”. Best of all is to avoid the ambiguous terms “differentiable manifold”
and “smooth manifold” and always state the regularity class explicitly as C 1 or C ∞ or some other class.
A very wide range of regularity classes could be useful instead of just “C r ” in Definition 26.3.2. However,
only the C r classes are specified here because they are adequate for most purposes of this book, and because

it is easy to substitute some other regularity class such as “analytic” or “C k,α ” for “C r ”. This is also
mentioned in Remark 18.7.8 for flat spaces. In the case of differentiable manifolds, one would need to define
a class of maps from IRn to IRn , such as a pseudogroup, for each regularity class. (See Section 19.4 for
diffeomorphism pseudogroups.)
26.3.9 Remark: Whereas a topological manifold may be defined without an atlas (see Definition 25.3.1),
it is necessary to specify an atlas for a differentiable manifold in order to indicate the choice of differentiable
structure. For any $r \in \overline{\mathbb{Z}}{}^+$, a single topological space may support an infinite number of incompatible $C^r$
atlases. By contrast, all C 0 atlases for a topological manifold are automatically compatible. So a C 0 manifold
is just a topological manifold with an arbitrary atlas. (See Remarks 25.4.14 and 26.5.11.)
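As a hedged illustration of the incompatibility described in Remark 26.3.9, the following standard textbook example (not taken from this book) gives two single-chart atlases on the real line. The names $A$, $A'$, $\psi$ and $\psi'$ are chosen only for this sketch.

```latex
\[
  A = \{\psi\},\quad \psi : \mathbb{R} \to \mathbb{R},\ \psi(x) = x;
  \qquad
  A' = \{\psi'\},\quad \psi' : \mathbb{R} \to \mathbb{R},\ \psi'(x) = x^3 .
\]
Each atlas has only the identity map as a transition map, so each is $C^\infty$.
Both induce the usual topology on $\mathbb{R}$ because $x \mapsto x^3$ is a homeomorphism.
However, the mixed transition map
\[
  \psi \circ (\psi')^{-1} : y \mapsto y^{1/3}
\]
is continuous but not differentiable at $y = 0$. So $A \cup A'$ is a $C^0$ atlas,
but it is not a $C^r$ atlas for any $r \ge 1$.
```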
26.3.10 Remark: A C 0 manifold is not, strictly speaking, a differentiable manifold. The case r = 0 is
included for notational convenience, and also to provide an alternative specification tuple for a topological
manifold. The term “C 0 manifold” will be used for any tuple (M, AM ) such that AM is a C 0 atlas for M ,
whereas if TM is the topology induced on M by the atlas AM , the pair (M, TM ) is referred to as a “topological
manifold”. (See Remark 25.4.1 for related discussion.)
26.4. Some standard differentiable manifold atlases

26.4.1 Definition: The usual atlas for an open subset $M$ of $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the atlas $A_M = \{\psi\}$
where $\psi = \mathrm{id}_M$.
The usual atlas for an open set M ⊆ IRn is also called the standard atlas for M or the usual differentiable
structure for M or the standard differentiable structure for M .
The pair $(\mathbb{R}^n, A_{\mathbb{R}^n})$ is called the differentiable manifold $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$,
where $A_{\mathbb{R}^n} = \{\mathrm{id}_{\mathbb{R}^n}\}$.
26.4.2 Remark: The differentiable manifold $\mathbb{R}^n \mathrel{-\!\!<} (\mathbb{R}^n, A_{\mathbb{R}^n})$ in Definition 26.4.1 is a $C^\infty$ manifold for
all $n \in \mathbb{Z}^+_0$. It is also an analytic manifold.
26.4.3 Remark: The set M = {x ∈ IRn+1 ; xn+1 = f (x1 , . . . xn )} ⊆ IRn+1 for any C 0,1 function f : IRn →
IR can be given manifold charts in a natural way by projecting points from M to {x ∈ IRn+1 ; xn+1 = 0},
which may be identified with IRn . For any w ∈ IRn , a function ψw : M → IRn may be defined by ψw :
(x1 , . . . xn+1 ) ↦ (x1 − w1 xn+1 , . . . xn − wn xn+1 ). This function is one-to-one if |w| < Lip(f )−1 , where Lip(f )
is the Lipschitz constant of f in Definition 17.4.10. An example of this is illustrated in Figure 42.4.1.
Manifolds which are embedded in the flat spaces IRn are said to be “regularly” embedded if they are
everywhere locally projectable in this one-to-one manner onto hyperplanes of the ambient space.
If only a single projection of a manifold onto a hyperplane is used in an atlas, the atlas will be of class
C ∞ because the only transition map will be the identity map. Therefore it is important to include enough charts in the
atlas to accurately describe the inherent regularity (or irregularity) of the manifold. In IRn , n projections
with n independent projection directions at each point should be adequate to fully describe the manifold’s
regularity.
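The projection charts of Remark 26.4.3 can be tried out numerically. The following is only a sketch for n = 1, assuming the function f(x) = |x|, which has Lipschitz constant 1. The helper names psi_w, chart_values and is_strictly_increasing are hypothetical. Strict monotonicity on a finite sample grid is used here as an indication of injectivity, not as a proof.

```python
def f(x):
    """A Lipschitz (C^{0,1}) function on IR with Lipschitz constant 1."""
    return abs(x)

def psi_w(point, w):
    """Oblique projection (x1, x2) -> x1 - w*x2 applied to a point of the graph of f."""
    x1, x2 = point
    return x1 - w * x2

def chart_values(w, xs):
    """Evaluate psi_w at the graph points (x, f(x)) for x in xs."""
    return [psi_w((x, f(x)), w) for x in xs]

def is_strictly_increasing(values):
    """True if each value is strictly less than the next one."""
    return all(a < b for a, b in zip(values, values[1:]))

xs = [k / 4.0 for k in range(-8, 9)]
ok_half = is_strictly_increasing(chart_values(0.5, xs))  # |w| < 1/Lip(f)
ok_one = is_strictly_increasing(chart_values(1.0, xs))   # |w| = 1/Lip(f)
```

With w = 1 every graph point (x, x) with x ≥ 0 is projected to 0, so the projection fails to be one-to-one exactly when the bound |w| < Lip(f )−1 is not met in this example.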
26.5. Some basic definitions for differentiable manifolds

26.5.1 Remark: Although an atlas is defined in Definition 26.3.2 as a set AM of charts, an atlas is
sometimes formalized as a family (ψα )α∈I such that AM = {ψα ; α ∈ I}. The conversion between these two
set constructions is usually handled informally. Each form has its own advantages according to context.
26.5.2 Remark: The differentiability property of any function f : IRn → IRn is a kind of “locally affine”
condition. It is therefore not surprising that affine connections are defined on manifolds satisfying Defini-
tion 26.3.2, which requires differentiable chart transition functions as indicated in Definition 26.5.7.
26.5.3 Remark: Kobayashi/Nomizu [27], page 1, give a very general class of regularity definitions for
atlases through the concept of a “pseudogroup of transformations”. Thus an atlas is said to be compatible
with a particular pseudogroup Γ of transformations if its transition maps are all elements of Γ.

26.5.4 Notation: atlas(M ) denotes the implicit atlas AM for a differentiable manifold (M, AM ).
atlasp (M ) denotes the subset {ψ ∈ atlas(M ); p ∈ Dom(ψ)} of charts ψ in (the implied atlas) AM whose
domains contain a particular point p ∈ M .
ApM is an alternative notation for atlasp (M ).
26.5.5 Theorem: Let $(M, A_M)$ be a $C^r$ manifold for $r \in \overline{\mathbb{Z}}{}^+_0$. Then $(M, A_M)$ is a $C^s$ manifold for all
$s \in \mathbb{Z}^+_0$ with $s \le r$.
26.5.6 Definition: A C r chart for a C r manifold (M, AM ) is a topological chart ψ for M such that
AM ∪ {ψ} is a C r atlas for M . Such a chart ψ is said to be C r compatible with (M, AM ).
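26.5.7 Definition: The coordinate transition matrix for charts $\psi_\alpha$ and $\psi_\beta$ in an indexed atlas $(\psi_\alpha)_{\alpha \in I}$ for
a differentiable manifold $M$ at a point $p \in \mathrm{Dom}(\psi_\alpha) \cap \mathrm{Dom}(\psi_\beta)$ is the matrix $Z_{\beta\alpha}(p) \in GL(n)$ defined by
$$ Z_{\beta\alpha}(p)^i{}_j = \left( \frac{\partial}{\partial x^j} \big( \psi_\beta \circ \psi_\alpha^{-1}(x) \big)^i \right) \bigg|_{x = \psi_\alpha(p)}. \tag{26.5.1} $$
The function $Z_{\beta\alpha} : \mathrm{Dom}(\psi_\alpha) \cap \mathrm{Dom}(\psi_\beta) \to GL(n)$ is called the coordinate transition matrix map for charts
$\psi_\alpha, \psi_\beta \in \mathrm{atlas}(M)$.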
26.5.8 Theorem: Using the notation of Definition 26.5.7, Zβα ∈ C̊ r−1 (M, GL(n)) if M is C r and r ≥ 1.
If M is C ∞ , then Zβα is C ∞ . [ This also uses Notation 26.6.6! Must fix this. ]
26.5.9 Remark: A useful mnemonic shorthand for equation (26.5.1) is
$$ Z_{\beta\alpha}(p)^i{}_j = \frac{\partial \psi_\beta^i}{\partial \psi_\alpha^j}(p). $$
This is similar to Remark 27.5.16.
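The coordinate transition matrix of Definition 26.5.7 can be illustrated by a familiar pair of charts. The sketch below assumes M is the plane with the non-positive x-axis removed, with ψα the Cartesian (identity) chart and ψβ the polar chart (r, θ). These choices, and the function names z_polar_from_cartesian, z_cartesian_from_polar and matmul2, are hypothetical and serve only this example. The two Jacobian matrices are written out by hand, and the code checks that they are mutually inverse at one point.

```python
import math

def z_polar_from_cartesian(x, y):
    """Z_{beta,alpha}: Jacobian of (x, y) -> (r, theta) evaluated at (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def z_cartesian_from_polar(r, theta):
    """Z_{alpha,beta}: Jacobian of (r, theta) -> (x, y) evaluated at (r, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -r * s],
            [s, r * c]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Both matrices are evaluated at the same point p of M, expressed in each chart.
x, y = 1.5, 2.0
r, theta = math.hypot(x, y), math.atan2(y, x)
product = matmul2(z_cartesian_from_polar(r, theta), z_polar_from_cartesian(x, y))
```

The product should agree with the 2 × 2 identity matrix up to rounding, which is what one expects from composing a transition map with its inverse.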
26.5.10 Definition: Two C r manifold atlases A1M and A2M for a set M are said to be C r equivalent atlases
on M if A1M ∪ A2M is a C r atlas for M . Then (M, A1M ) and (M, A2M ) are said to be C r equivalent manifolds.
26.5.11 Remark: Clearly (M, A1M ∪ A2M ) in Definition 26.5.10 is C r equivalent to both (M, A1M ) and
(M, A2M ). [ See Malliavin [36], definition I.1.5.1, section I.3.1. ]
All atlases on a given topological manifold are C 0 equivalent. This follows from Theorem 25.4.13. [ See
Malliavin [36], proposition I.1.4.1. Need the Hausdorff condition to get this equivalence? ]
26.5.12 Definition: The restriction of a differentiable manifold $(M, A_M)$ to an open subset $\Omega$ of $M$ is the
differentiable manifold $(\Omega, A_\Omega)$, where $A_\Omega = \{\psi\big|_{\Omega \cap \mathrm{Dom}(\psi)};\ \psi \in A_M \text{ and } \Omega \cap \mathrm{Dom}(\psi) \ne \emptyset\}$.
The atlas $A_\Omega$ may be called the relative atlas or relative differentiable structure of the subset $\Omega$ of $M$.
26.5.13 Remark: If (M, AM ) is a C r manifold, then the restricted manifold (Ω, AΩ ) is C r . [ Prove this? ]
The exclusion of empty charts from the definition of the atlas AΩ is quite arbitrary.
[ Define submanifolds near here. Do this also for topological manifolds. ]

[ Define foliations. Also for topological manifolds. See EDM2 [35], section 154. ]
26.5.14 Remark: A C r maximal atlas can be constructed for a C r manifold (M, AM ) as the set of all
C r compatible charts for (M, AM ) according to Definition 26.5.6. This C r maximal atlas is sometimes
called the “C r differentiable structure” on M −< (M, AM ). Maximal atlases are not very useful in practice.
For example, Theorem 26.5.5 would not be valid if differentiable manifolds were required to have maximal
atlases. Kobayashi/Nomizu [27], page 2, call a maximal atlas a “complete” atlas. A C r maximal atlas could
be referred to as a “C r -maximal differentiable structure”. But it is clearer to call it a C r -maximal atlas.
26.5.15 Definition: The product of two differentiable manifolds (M1 , A1 ) and (M2 , A2 ) is the differen-
tiable manifold (M1 × M2 , A), where A is the product atlas of A1 and A2 . (See Definition 25.5.3.) [ See
Malliavin [36], proposition I.3.2.7. ]
26.5.16 Theorem: Let $r \in \overline{\mathbb{Z}}{}^+_0$. Let M1 and M2 be C r manifolds. Then the product manifold of M1 and
M2 is C r .
[ Probably a large number of topological categories of C r manifolds could be collected together into a single
definition instead of the following multiple categories. ]
26.5.17 Definition: A compact C r manifold is a C r manifold (M, S) whose underlying topological space
M is compact.
26.5.18 Definition: A paracompact C r manifold is a C r manifold (M, S) whose underlying topological

space M is paracompact.
26.6. Differentiable real-valued functions on differentiable manifolds

The continuity of a real-valued function on a topological manifold M depends only on the topological
structure. It has nothing to do with the choice of atlas, but differentiability of a real-valued function
f : M → IR is meaningless if the only structure on a topological manifold M is the topology. The same
topological space M can have many incompatible differentiable structures such that f is differentiable in
some but not in others. (See Remark 26.3.9.)
The $C^r$ differentiability of a function whose domain is an open subset of $\mathbb{R}^n$ and range is $\mathbb{R}^m$ for some
$m, n \in \mathbb{Z}^+_0$ is (or will be) defined in Section 18.5. Thus the $C^r$ differentiability of functions on manifolds is
defined in terms of the same property for flat spaces.
26.6.1 Definition: A (real-valued) $C^r$ (differentiable) function for $r \in \overline{\mathbb{Z}}{}^+_0$ in an open subset $\Omega$ of a $C^r$
manifold $M \mathrel{-\!\!<} (M, A_M)$ is a function $f : \Omega \to \mathbb{R}$ such that $f \circ \psi^{-1} : \psi(\Omega \cap \mathrm{Dom}(\psi)) \to \mathbb{R}$ is of class $C^r$ for
all charts $\psi \in A_M$. (See Figure 26.6.1.)
Figure 26.6.1 Differentiability test for $f \circ \psi^{-1} : \psi(\Omega \cap \mathrm{Dom}(\psi)) \to \mathbb{R}$
[ The algebraic structures in Notation 26.6.2, Theorem 26.6.3 and Notation 26.6.12 do not belong here. Present
such algebraic structures on function spaces elsewhere. Probably they need their own section. ]
26.6.2 Notation: $C^r(M, \mathbb{R})$, for $r \in \overline{\mathbb{Z}}{}^+_0$ and $C^r$ manifolds $M \mathrel{-\!\!<} (M, A_M)$, denotes the real linear space
of all $C^r$ functions on $M$ with the usual pointwise addition and real scalar multiplication operations for
function spaces.
$C^r(M)$ is an abbreviation for $C^r(M, \mathbb{R})$.
26.6.3 Theorem: Let $M \mathrel{-\!\!<} (M, A_M)$ be a $C^r$ manifold with $r \in \overline{\mathbb{Z}}{}^+_0$. Then the set $C^r(M, \mathbb{R})$ of $C^r$
real-valued functions on $M$ is a ring under the operations of pointwise addition and multiplication.

[ Remark 26.6.4 may be related to manifolds with boundaries in some way. Check the whole remark to find
out why I wrote it. ]
26.6.4 Remark: The spaces $C^r(M)$ with $r \in \overline{\mathbb{Z}}{}^+_0$ contain at least the constant functions, but in the case of
analytic functions, these might be the only functions. Therefore it is of interest to know whether the $C^r(M)$
spaces have a richer set of functions than that.
The spaces $C^r(M)$ contain functions which match any specified derivatives of order up to $r$ at any given
point $p$ in the manifold. For instance, let $\psi \in A_M$ be a chart for $(M, A_M)$ such that $p \in \mathrm{Dom}(\psi)$, and let $R \in \mathbb{R}^+$
be a positive number such that the ball $B_{\psi(p),R} \subseteq \mathrm{Range}(\psi)$. Let $x_0 = \psi(p)$. Define $\phi : \mathrm{Range}(\psi) \to \mathbb{R}$ by
$\phi(x) = 1/(1+\exp(1/(R-|x-x_0|)-1/|x-x_0|))$ for $0 < |x-x_0| < R$, $\phi(x_0) = 1$, and $\phi(x) = 0$ for $|x-x_0| \ge R$.
(See Figure 26.6.2.) Then $\phi \in C^\infty(\mathbb{R}^n)$ and all derivatives of $\phi$ equal zero for $x = x_0$ and $|x - x_0| \ge R$.
(See Theorem 20.12.7.) Therefore if $\phi$ is multiplied by any polynomial function $P : \mathbb{R}^n \to \mathbb{R}$, the pointwise
product $P \cdot \phi$ is $C^\infty$, has the same derivatives as the polynomial $P$ at $x = x_0$, and vanishes completely
outside $B_{x_0,R}$. Then the function $(P \cdot \phi) \circ \psi : \mathrm{Dom}(\psi) \to \mathbb{R}$ may be extended with the value zero outside
$\mathrm{Dom}(\psi)$ to yield a function in $C^r(M)$ with any specified values of derivatives up to order $r$ at the point $p$.
Figure 26.6.2 $C^\infty$ function on $\mathbb{R}^n$ with compact support $B_R(x_0)$ (graph of $f(x) = 1/\big(1 + \exp\big(1/(R - |x - x_0|) - 1/|x - x_0|\big)\big)$, with $|x - x_0| = R$ marked on either side of $x_0$)
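The cut-off function of Remark 26.6.4 depends on x only through the distance |x − x0|, so it can be tabulated as a function of one real variable. The sketch below is an illustration under that assumption. The name bump and the sample radius R = 2 are hypothetical. The logistic expression is evaluated in a rearranged but algebraically equal form so that the exponential cannot overflow near the ends of the interval.

```python
import math

def bump(r, R):
    """Value of the cut-off function at distance r = |x - x0|, for radius R > 0."""
    if r <= 0.0:
        return 1.0
    if r >= R:
        return 0.0
    t = 1.0 / (R - r) - 1.0 / r
    # Evaluate 1/(1 + exp(t)) without overflowing exp when |t| is large.
    if t >= 0.0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))

R = 2.0
# Values at r = 0, R/8, 2R/8, ..., R.
profile = [bump(k * R / 8.0, R) for k in range(9)]
```

The tabulated profile starts at 1, passes through 1/2 at r = R/2 (where the exponent is zero) and reaches 0 at r = R, which matches the shape sketched in Figure 26.6.2.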
26.6.5 Remark: Notations 26.6.6 and 26.6.7 are concerned with “local functions”, namely functions which
are not necessarily defined everywhere on a manifold.
26.6.6 Notation: $\mathring{C}^r(M, \mathbb{R})$, for $r \in \overline{\mathbb{Z}}{}^+_0$ and $C^r$ manifolds $M$, denotes the set of all $C^r$ functions $f : \Omega \to \mathbb{R}$
on open sets $\Omega \in \mathrm{Top}(M)$.
$\mathring{C}^r(M)$ is an abbreviation for $\mathring{C}^r(M, \mathbb{R})$.
26.6.7 Notation: $\mathring{C}^r_p(M, \mathbb{R})$, for $r \in \overline{\mathbb{Z}}{}^+_0$ and points $p$ in a $C^r$ manifold $M$, denotes the set of all $C^r$
functions on open sets $\Omega \in \mathrm{Top}_p(M)$.
$\mathring{C}^r_p(M)$ is an abbreviation for $\mathring{C}^r_p(M, \mathbb{R})$.
[ Define compact-open topology for differentiable manifolds? See EDM2 [35], section 279.C. ]
26.6.8 Definition: A local maximum of a function u : M → IR on a C 0 manifold M is a point p ∈ M

such that for some open neighbourhood Ω of p, u(q) ≤ u(p) for all q ∈ Ω.
A local minimum of a function u : M → IR on a C 0 manifold M is a point p ∈ M such that for some open
neighbourhood Ω of p, u(q) ≥ u(p) for all q ∈ Ω.
26.6.9 Theorem: If $p \in M$ is a local maximum of a function $u \in C^1(M)$, where $M \mathrel{-\!\!<} (M, A_M)$ is a $C^1$
$n$-dimensional manifold, then $(\partial/\partial x^i)(u \circ \psi^{-1}(x))\big|_{x=\psi(p)} = 0$ for all $i = 1, \ldots n$ and $\psi \in A_M$.
If $M$ is $C^2$ and $u \in C^2(M)$, then the matrix of second derivatives $\big[(\partial^2/\partial x^i \partial x^j)(u \circ \psi^{-1}(x))\big|_{x=\psi(p)}\big]_{i,j=1}^n$ is
negative semi-definite for all $\psi \in A_M$.
[ Must prove the “negative semi-definite” assertion in Remark 26.6.10. ]
26.6.10 Remark: The matrix $(u_{ij})_{i,j=1}^n = \big[(\partial^2/\partial x^i \partial x^j)(u \circ \psi^{-1}(x))\big|_{x=\psi(p)}\big]_{i,j=1}^n$ in Theorem 26.6.9 is
negative semi-definite if and only if $\sum_{i,j=1}^n a^{ij} u_{ij} \le 0$ for all positive semi-definite real symmetric matrices
$(a^{ij})_{i,j=1}^n$. This makes maximum principles for boundary value problems possible. This helps to explain
why second-order elliptic partial differential equations are an important class of equations.
Since the first-order derivatives of the function u in Theorem 26.6.9 are zero, the second-order derivatives
of chart transition maps in Remark 19.5.3 do not play a role. So the second-order derivative matrix in
Theorem 26.6.9 is in fact tensorial. In other words, it transforms entirely according to the first-order
derivatives (the Jacobian matrix) of chart transition maps. Of course, whether or not a point is a maximum
for a C 2 function is independent of the choice of chart. So, as expected, the criteria for a maximum in
Theorem 26.6.9 are chart-independent. More than this, the criteria do not require any connection or metric,
even though second derivatives are involved here.
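The tensorial behaviour described in Remark 26.6.10 can be made explicit by a short chain-rule calculation. The following is a sketch only. The second chart $\tilde\psi$ and the abbreviations $\hat u$, $\tilde u$ and $J$ are notations introduced here for illustration and are not taken from the text.

```latex
Let $\psi, \tilde\psi \in A_M$ with $p \in \mathrm{Dom}(\psi) \cap \mathrm{Dom}(\tilde\psi)$, and write
$\hat u = u \circ \psi^{-1}$, $\tilde u = u \circ \tilde\psi^{-1}$ and $x = \psi \circ \tilde\psi^{-1}(\tilde x)$,
so that $\tilde u(\tilde x) = \hat u(x(\tilde x))$. For a $C^2$ manifold, the chain rule gives
\[
  \frac{\partial \tilde u}{\partial \tilde x^k}
  = \sum_{i=1}^n \frac{\partial \hat u}{\partial x^i}\,\frac{\partial x^i}{\partial \tilde x^k},
\]
\[
  \frac{\partial^2 \tilde u}{\partial \tilde x^k \partial \tilde x^\ell}
  = \sum_{i,j=1}^n \frac{\partial^2 \hat u}{\partial x^i \partial x^j}\,
    \frac{\partial x^i}{\partial \tilde x^k}\,\frac{\partial x^j}{\partial \tilde x^\ell}
  + \sum_{i=1}^n \frac{\partial \hat u}{\partial x^i}\,
    \frac{\partial^2 x^i}{\partial \tilde x^k \partial \tilde x^\ell}.
\]
At a local maximum $p$, $\partial \hat u / \partial x^i = 0$ at $x = \psi(p)$, so the last sum vanishes and
\[
  \tilde u_{k\ell} = \sum_{i,j=1}^n J^i{}_k\, u_{ij}\, J^j{}_\ell,
  \qquad
  J^i{}_k = \frac{\partial x^i}{\partial \tilde x^k}\bigg|_{\tilde x = \tilde\psi(p)} .
\]
Hence $\sum_{k,\ell=1}^n \tilde v^k\, \tilde u_{k\ell}\, \tilde v^\ell = \sum_{i,j=1}^n v^i\, u_{ij}\, v^j \le 0$
for all $\tilde v \in \mathbb{R}^n$, where $v^i = \sum_{k=1}^n J^i{}_k \tilde v^k$.
```

So negative semi-definiteness in one chart implies negative semi-definiteness in any other chart at the same point.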
26.6.11 Definition: An ($\mathbb{R}^m$-valued) $C^r$ (differentiable) function for $m \in \mathbb{Z}^+_0$ and $r \in \overline{\mathbb{Z}}{}^+_0$ in an open
subset $\Omega$ of a $C^r$ manifold $M \mathrel{-\!\!<} (M, A_M)$ is a function $f : \Omega \to \mathbb{R}^m$ such that $f \circ \psi^{-1} : \psi(\Omega \cap \mathrm{Dom}(\psi)) \to \mathbb{R}^m$
is of class $C^r$ for all charts $\psi \in A_M$.
26.6.12 Notation: $C^r(M, \mathbb{R}^m)$, for $r \in \overline{\mathbb{Z}}{}^+_0$, $m \in \mathbb{Z}^+_0$ and $C^r$ manifolds $M \mathrel{-\!\!<} (M, A_M)$, denotes the real
linear space of all $\mathbb{R}^m$-valued functions of class $C^r$ on $M$, with the usual pointwise addition and real scalar
multiplication operations for function spaces.
26.7. Differentiable curves and paths

This section uses the definitions in Sections 16.2 and 16.4 for curves, paths and parametric families of curves
in general topological spaces. Curves and families of curves are assumed to be continuous by definition.
26.7.1 Definition: A $C^r$ (differentiable) curve in a $C^r$ manifold $M \mathrel{-\!\!<} (M, A_M)$ for $r \in \overline{\mathbb{Z}}{}^+_0$ is an open
curve $\gamma : I \to M$ in $M$ such that $\psi \circ \gamma : \gamma^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$ is of class $C^r$ for all $\psi \in A_M$, where $n = \dim(M)$.
26.7.2 Remark: Open curves are introduced in Definition 16.2.9. Particularly note that the domain of a
curve is defined to be an interval of IR. (See Figure 26.7.1.) C r curves in IRn are given by Definition 18.6.5.
Figure 26.7.1 Curve in a differentiable manifold
26.7.3 Remark: Definition 26.7.1 seems to define $C^r$ curves in a differentiable manifold only when the
curve differentiability parameter $r$ equals the differentiability parameter $r$ of the manifold. However, a $C^s$
manifold for $s \in \overline{\mathbb{Z}}{}^+_0$ such that $r \le s$ is automatically of class $C^r$. Therefore the meaning of Definition 26.7.1
would be unchanged if the “$C^r$ manifold” requirement was replaced with a $C^s$ requirement for $r \le s$. This
observation holds in general for a large set of definitions for structures on differentiable manifolds. It is
simpler and tidier to state only the minimum regularity requirement for the manifold rather than
spelling out the more general-looking statement in detail.
26.7.4 Remark: The condition “for all ψ ∈ AM ” in Definition 26.7.1 cannot be replaced with the condition
“for some ψ ∈ AM ” because differentiability must hold for at least a set of chart domains which cover the
range of the curve. The condition could be replaced by the condition “for all ψ ∈ C”, where C ⊆ AM and
$\bigcup \{\mathrm{Dom}(\psi);\ \psi \in C\} \supseteq \mathrm{Range}(\gamma)$. Hence in many cases only a finite number of charts need to pass the test. But this is true of
the vast majority of regularity tests because the regularity implied by the differentiable structure is usually
fully determined by a small finite subset of the atlas. (This remark has some relevance to computational
differential geometry, where it is important to keep everything as finite as possible.)

26.7.5 Theorem: The range of a C r curve in a differentiable manifold M is connected in the topology
on M .
[ Define C k paths as equivalence classes [γ]k of C k curves γ. ]
[ Should extend Definition 26.7.1 to non-open intervals by requiring one-sided derivatives to be well-defined
at the end-points of the interval. The boundary conditions get trickier in the case of families of curves. ]
[ Define a nonsingular curve/path. E.g. Greene/Wu [68], page 6. ]
[ Define a piecewise C r curve/path. State that piecewise C r curves/paths are closed under concatenation.
Define rectifiable curves/paths. Show that a rectifiable curve in a C 1 manifold is differentiable almost
everywhere. ]
[ Define tangent bundles and fibrations for paths, maybe not in this section. This definition should use
curves/paths to cope with self-intersections of paths. The tangent vector at a point on a curve should be in
the path tangent bundle. So the tangent vectors for a curve should be a cross-section of the path tangent
bundle. ]
26.7.6 Theorem: If $p \in M$ is a local maximum of a function $u \in C^1(M)$, where $M$ is a $C^1$ manifold, and
$\gamma : \mathbb{R} \to M$ is a $C^1$ curve in $M$, then $(d/dt)(u \circ \gamma(t))\big|_{t=x} = 0$ for all $x \in \mathbb{R}$ such that $p = \gamma(x)$.
If $M$ is $C^2$, $u \in C^2(M)$ and $\gamma$ is $C^2$, then $(d^2/dt^2)(u \circ \gamma(t))\big|_{t=x} \le 0$ for all $x \in \mathbb{R}$ such that $p = \gamma(x)$.
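Theorem 26.7.6 can be checked numerically in a simple flat-space case. The sketch below assumes M = IR2 with the usual atlas, a function u with its maximum at the origin, and a curve γ which passes through the origin at t = 0. The function u, the curve gamma and the step size h are hypothetical choices made for this illustration. The derivatives are only approximated by central finite differences.

```python
import math

def u(p):
    """A C^2 function on M = IR^2 with a maximum at the origin."""
    x, y = p
    return -(x * x + y * y)

def gamma(t):
    """A C^2 curve in IR^2 which passes through the origin at t = 0."""
    return (t, math.sin(t))

def g(t):
    """The composite u o gamma, a real function of a real variable."""
    return u(gamma(t))

h = 1e-4
# Central difference approximation to (d/dt)(u o gamma) at t = 0.
first = (g(h) - g(-h)) / (2.0 * h)
# Central difference approximation to (d^2/dt^2)(u o gamma) at t = 0.
second = (g(h) - 2.0 * g(0.0) + g(-h)) / h**2
```

Here u ◦ γ(t) = −(t2 + sin2 t), so the exact first derivative at t = 0 is 0 and the exact second derivative is −4. The approximations first and second should be close to these values.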
[ Somewhere, but not in this section, give a definition of tangent vectors in terms of curves through a point.
Call these “tangent curve classes”. (See Darling [14], section 7.2.1, page 147.) Then refer to this from
Section 27.3. ]
[ Should define tangent vectors as Cauchy sequence classes somewhere. ]
26.7.7 Remark: The following table summarizes how curve and path topics have been distributed within
this book.
section  topics
16.1     terminology for curves and paths                      paths are equivalence classes $[\gamma]_0$ of curves $\gamma$
16.2     curves                                                definitions for curves in topological spaces
16.3     path-equivalence of curves                            definition of curves which have the same path
16.4     paths                                                 definitions for paths
17.5     rectifiable curves and paths                          almost-everywhere differentiable curves
24.2     pathwise topological parallelism                      parallelism $\Theta^\gamma_{s,t}$ for paths $\gamma$
26.7     differentiable curves and paths
26.12    rectifiable curves
28.8     vector fields along curves
29.8     higher-order vector fields for families of curves
30.4     differential of a curve                               $d\gamma = \gamma'$
31.3     higher-order differentials of curves and families     $d^k\gamma$, $k \ge 1$; $\partial_{ijk\ldots}\gamma$
31.8     differentials of curves for higher-order operators
37.1     covariant derivatives of vector fields along curves   $D\gamma$; $D^k\gamma$, $k \ge 1$
37.2     geodesic curves                                       $D\gamma = 0$
26.8. Differentiable families of differentiable transformations

Differentiable families of differentiable diffeomorphisms are required for the analysis of connections on dif-
ferentiable fibre bundles.
26.8.1 Definition: A C 1 one-parameter family of diffeomorphisms of a C 1 manifold M is a map φ : IR →
C 1 (M, M ) such that
(i) ∀t ∈ IR, φ(t) is a diffeomorphism (automorphism) from M to M ,
(ii) φ ∈ C 1 (IR × M, M ).

26.8.2 Definition: A C 1 one-parameter group of diffeomorphisms of a C 1 manifold M is a map φ : IR →

C 1 (M, M ) such that
(i) ∀t ∈ IR, φ(t) is a diffeomorphism (automorphism) from M to M ,
(ii) ∀s, t ∈ IR, φ(s) ◦ φ(t) = φ(s + t),
(iii) φ ∈ C 1 (IR × M, M ).
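A concrete instance of Definition 26.8.2 may help to fix ideas. The sketch below assumes M = IR2 with φ(t) equal to the rotation by angle t, which is a standard example and is not taken from this book. The names phi, phi_bar and close are hypothetical. The function phi_bar is the two-argument form (s, x) ↦ φ(s)(x) of the family, which is the form in which the joint regularity condition (iii) is tested.

```python
import math

def phi(t):
    """phi(t): rotation of IR^2 by the angle t, a diffeomorphism of M = IR^2."""
    c, sn = math.cos(t), math.sin(t)
    def rotate(x):
        return (c * x[0] - sn * x[1], sn * x[0] + c * x[1])
    return rotate

def phi_bar(s, x):
    """Two-argument form of the family: phi_bar(s, x) = phi(s)(x)."""
    return phi(s)(x)

def close(p, q, tol=1e-12):
    """True if two points of IR^2 agree within the tolerance tol."""
    return all(abs(a - b) <= tol for a, b in zip(p, q))

x = (1.0, 2.0)
s, t = 0.7, -1.9
group_law_holds = close(phi(s)(phi(t)(x)), phi(s + t)(x))  # condition (ii)
identity_at_zero = close(phi(0.0)(x), x)
inverse_is_phi_minus_t = close(phi(-t)(phi(t)(x)), x)       # supports condition (i)
```

The group law in condition (ii) forces φ(0) to be the identity and φ(−t) to be the inverse of φ(t), which is why condition (i) is easy to verify for this family.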
26.8.3 Remark: The function $\phi : \mathbb{R} \to C^1(M, M)$, and its transpose $\bar\phi : \mathbb{R} \times M \to M$, defined by
$\bar\phi(s, x) = \phi(s)(x)$ for $(s, x) \in \mathbb{R} \times M$, are regarded as interchangeable in Definition 26.8.2.
26.8.4 Definition: A local C 1 one-parameter group of local transformations of a C 1 manifold M is a map

φ : (a, b) → C 1 (Ω, M ) for some a, b ∈ IR such that a < 0 < b, Ω is an open subset of M , and
(i) ∀t ∈ (a, b), φ(t) is injective,
(ii) ∀s, t ∈ (a, b), (s + t ∈ (a, b) and φ(t) ∈ Ω) ⇒ φ(s) ◦ φ(t) = φ(s + t),
(iii) $\phi \in C^1((a, b) \times \Omega, M)$.
[ The above definition derives from EDM [34], 108.L, and Gallot/Hulin/Lafontaine [20], pages 23–26. ]
26.9. Differentiable maps between differentiable manifolds

Differentiable maps include diffeomorphisms as a special case. A diffeomorphism is a differentiable map
which is also a homeomorphism whose inverse is a differentiable map. Differentiable maps may be between
manifolds of arbitrary equal or unequal dimension.
Note that even though it is straightforward to define whether a map between manifolds is of class C r , it is
not so easy to state just what the rth derivative is. This contrasts with real functions of a real variable, where
the derivative of a function resides in the same space as the function being differentiated. In differential
geometry, derivatives frequently fall into a completely different space to the original function. Each order
of derivative may require a separate space to be constructed for it. The derivative of a differentiable map,
called a “differential”, is presented in Section 30.3. It is possible to avoid the question of what a differential
is in this section by defining differentiability in terms of charts, which moves the question into flat space.
26.9.1 Definition: A $C^r$ (differentiable) map from a $C^r$ manifold $M_1$ to a $C^r$ manifold $M_2$ for $r \in \overline{\mathbb{Z}}{}^+_0$ is
a map $\phi : M_1 \to M_2$ such that
$\forall \psi_1 \in A_{M_1},\ \forall \psi_2 \in A_{M_2},\ \psi_2 \circ \phi \circ \psi_1^{-1}$ is of class $C^r$.
[ See Malliavin [36], proposition I.1.5.3 for C 0 maps. ]
26.9.2 Remark: When the regularity class C r of a differentiable map in Definition 26.9.1 is not stated
explicitly, some authors assume this to mean C 1 while others assume it means C ∞ . It is generally best to be
explicit to avoid misunderstanding. The spaces and maps in Definition 26.9.1 are illustrated in Figure 26.9.1.
Figure 26.9.1 $C^r$ differentiable map “through the charts”

26.9.3 Remark: The C r regularity of maps between manifolds is often defined in terms of real-valued
test functions. This gives a correct test which is neat and tidy. In this test-function regularity definition, a
map φ : M1 → M2 is said to be C r if f ◦ φ is in C r (M1 ) for all f ∈ C r (M2 ). This kind of definition has
a few difficulties despite its formal neatness. The set of test functions f in this definition is generally
uncountably infinite, whereas for finite atlases, Definition 26.9.1 requires only a finite number of functions to be
tested for C r regularity. Another difficulty is that in practice, testing f ◦ φ for regularity requires the use of
charts on each manifold, which implies that the test-function definition turns into Definition 26.9.1 anyway.
So the apparent chart-free status of the test-function approach is actually an illusion. The equivalence of
the definitions is demonstrated in Theorem 26.9.4.
The function f ◦ φ in Theorem 26.9.4 is a kind of “pull-back” of f from M2 to M1 . This results in a “push-
forth” of tangent vectors from M1 to M2 in Definitions 30.3.1 and 30.3.18. The function ψ2k ◦ φ ◦ ψ1−1 in
the proof of Theorem 26.9.4 could be thought of as a push-forth of the function φ from the manifold to the
coordinate space via the charts, thereby making the function φ act on the coordinate space instead of the
points of the manifold.
26.9.4 Theorem: Let $M_1$ and $M_2$ be $C^r$ manifolds for some $r \in \overline{\mathbb{Z}}{}^+_0$. Then a function $\phi : M_1 \to M_2$ is of
class $C^r$ if and only if
$$ \forall f \in C^r(M_2),\quad f \circ \phi \in C^r(M_1). \tag{26.9.1} $$
Proof: Suppose that φ : M1 → M2 is of class C r . Let ψ1 ∈ atlas(M1 ) and ψ2 ∈ atlas(M2 ). Then by

Definition 26.9.1, ψ2 ◦ φ ◦ ψ1−1 is of class C r . A function f : M2 → IR is of class C r (by Definition 26.6.1)
if and only if f ◦ ψ2−1 is C r for all ψ2 ∈ AM2 . Let f : M2 → IR be C r . Then f ◦ ψ2−1 is C r . Therefore
(f ◦ ψ2−1 ) ◦ (ψ2 ◦ φ ◦ ψ1−1 ) is C r . But this equals f ◦ φ ◦ ψ1−1 . It follows that f ◦ φ is C r .
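To show the converse, suppose that $\phi : M_1 \to M_2$ satisfies line (26.9.1). The $k$th component of $\psi_2 \circ \phi \circ \psi_1^{-1}$
is $\psi_2^k \circ \phi \circ \psi_1^{-1}$. Define $f : \mathrm{Dom}(\psi_2) \to \mathbb{R}$ by $f : p \mapsto \psi_2^k(p)$. This is of class $C^\infty$ because $f \circ \psi_2^{-1} = \psi_2^k \circ \psi_2^{-1} : x \mapsto x^k$
for $x \in \mathrm{Range}(\psi_2)$. Therefore $f \circ \phi$ is $C^r$, which means that $f \circ \phi \circ \psi_1^{-1}$ is $C^r$ for $\psi_1 \in \mathrm{atlas}(M_1)$.
So $\psi_2^k \circ \phi \circ \psi_1^{-1}$ is $C^r$ for $k = 1, \ldots n_2$. Hence $\phi$ is of class $C^r$.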
26.9.5 Remark: Figure 26.9.2 illustrates the sets and maps in Theorem 26.9.4.
Figure 26.9.2 $C^r$ differentiable map via test functions
The proof of Theorem 26.9.4 exemplifies how attempts to do differential geometry in a coordinate-free fashion
really only hide the coordinates. To be precise, whenever one invokes the space of C r functions f : M → IR
as a test space on which to work “coordinate-free”, the real-valued functions f are themselves coordinates.
There is very little difference indeed between the individual coordinates ψ k of a C r chart ψ and a C r
real-valued function.
This thinking may be applied similarly to tangent operators as defined in Section 27.5. These are defined
on C 1 test functions f : M → IR, but this is equivalent to defining operators on chart coordinates ψ k :
M → IR. In fact, a tangent operator (in Definition 27.5.1) of the form ∂p,v,ψ acting on a function f = ψ k
yields ∂p,v,ψ (f ) = v k . This shows again the equivalence of chart coordinates and test functions.
26.9.6 Notation: For $r \in \overline{\mathbb{Z}}{}^+_0$, denote by $C^r(M_1, M_2)$ the set of all $C^r$ maps from $C^r$ manifold $M_1$ to $C^r$
manifold $M_2$.
26.9.7 Notation: Denote by $\mathring{C}^r(M_1, M_2)$ the set of all $C^r$ maps from open sets $\Omega \subseteq M_1$ to $M_2$.
26.9.8 Definition: For $r \in \overline{\mathbb{Z}}{}^+_0$, a $C^r$ diffeomorphism from $M_1$ to $M_2$ is a homeomorphism $\phi : M_1 \approx M_2$
such that both $\phi$ and $\phi^{-1}$ are $C^r$ differentiable maps.
The $C^r$ manifolds $M_1$ and $M_2$ are said to be $C^r$-diffeomorphic if there exists a $C^r$ diffeomorphism from $M_1$
to $M_2$.
26.9.9 Remark: If the regularity class C r is not specified, r = ∞ is often assumed. Thus two differentiable
manifolds are often said to be diffeomorphic when they are both C ∞ manifolds and are C ∞ -diffeomorphic.
It is preferable to state the regularity class of a diffeomorphism explicitly.
The flat-space version of Definition 26.9.8 is Definition 19.1.2.
26.10. Analytic manifolds

[ This section may be combined with Section 26.3 some day using a variable regularity class. ]
Analytic manifolds have no special properties that are useful in this book, except that they are used in
defining Lie groups in Chapter 33. The analytic manifold definitions are the same as the C ∞ except for a
simple text substitution.
26.10.1 Definition: An analytic atlas for an n-dimensional topological manifold M is an atlas S for M
such that ψ2 ◦ ψ1−1 is analytic for all ψ1 , ψ2 ∈ S.
26.10.2 Definition: An n-dimensional analytic manifold is a pair (M, S) such that M is an n-dimensional
topological manifold and S is an analytic atlas for M . Then M is called the underlying topological space
of (M, S).
26.10.3 Definition: An analytic chart for an analytic manifold (M, S) is a chart ψ for M such that
S ∪ {ψ} is also an analytic atlas for M .
26.10.4 Definition: Analytic equivalent atlases on a topological manifold M are analytic atlases A1 and
A2 on M such that A1 ∪ A2 is an analytic atlas on M . Then (M, A1 ) and (M, A2 ) are said to be analytic
equivalent manifolds. (Clearly (M, A1 ∪ A2 ) is then also analytic equivalent to both (M, A1 ) and (M, A2 ).)
26.10.5 Definition: A compact analytic manifold is an analytic manifold (M, AM ) whose underlying
topological space M is compact.
26.10.6 Definition: A paracompact analytic manifold is an analytic manifold (M, AM ) whose underlying
topological space M is paracompact.
26.10.7 Remark: It seems like all analytic manifolds should be paracompact. So Definition 26.10.6 is
apparently a waste of space. [ Check this. ]
[ Note that a differentiable manifold with only one chart in its atlas is automatically analytic. ]
26.11. Unidirectionally differentiable manifolds

26.11.1 Remark: For many purposes, including analysis on the graphs of solutions of boundary value
problems, it would be valuable to be able to define various generalizations of differentiable manifolds with
weaker regularity than C 1 . For example, Lipschitz and Hölder continuity would be useful, as would various
Sobolev spaces and distribution spaces. In the case of Lipschitz continuity, it is well known that the first
derivative exists almost everywhere. This could be translated into the intrinsic manifold context without
great difficulty.
In the case of Lipschitz manifolds, instead of a tangent bundle with copies of IRn attached to each point in
the manifold, there would be a tangent cone at each point. This cone would be a linear tangent space almost
everywhere.
Lipschitz manifolds are useful for defining rectifiable curves, which are the natural kind of curve for defining
parallelism. This subject is presented in Section 26.12.
For the subject of second-order boundary value problems, the case of C k,α manifolds for general k would
be more useful than the k = 0 case. Then the tangent bundle would be the same as for C 1 manifolds, but
higher-order structures would require generalizations.

[ It may be that for Lipschitz surfaces, the triples (p, v, ψ) could be easier to generalize than the tangent
operators ∂p,v,ψ . Attempt here to make these generalizations. ]
26.11.2 Remark: Definitions 26.11.3 and 26.11.4 are the unidirectional analogues of the C r manifold
Definitions 26.3.2 and 26.3.6.
26.11.3 Definition: A unidirectionally differentiable atlas for a topological manifold $M \mathrel{-\!\!<} (M, T_M)$ is
a topological atlas $A_M$ for $(M, T_M)$ such that $\psi_2 \circ \psi_1^{-1} : \psi_1(U_1 \cap U_2) \to \psi_2(U_1 \cap U_2)$ is unidirectionally
differentiable for all $\psi_1, \psi_2 \in A_M$, where $U_\alpha = \mathrm{Dom}(\psi_\alpha)$ for $\alpha = 1, 2$.
26.11.4 Definition: For $n \in \mathbb{Z}^+$, an n-dimensional unidirectionally differentiable manifold is a pair
$(M, A_M)$ such that $M$ is a set and $A_M$ is a unidirectionally differentiable manifold atlas for $M$.
26.12. Lipschitz manifolds and rectifiable curves

[ This section has been taking up too much time. The author will continue work on it later. Please ignore it
for now, because it is not yet ready to read. ]
[ Logically speaking, this section probably belongs at the beginning of Chapter 26. But weak regularity is
more difficult to understand than C k regularity. That’s why it’s here. ]
Rectifiable curves are a minimum requirement for defining parallelism and connections (and therefore covari-
ant derivatives and curvature) on a manifold, and a Lipschitz atlas is a minimum requirement for defining
rectifiable curves. Therefore Lipschitz manifolds are a minimum requirement for much of differential geom-
etry. This is the motivation for this section.
26.12.1 Remark: Example 26.12.2 shows that it is not possible to define rectifiability of curves in topo-
logical spaces in terms of the curve map “through the charts”, because even if a curve is rectifiable with
respect to one chart, there will certainly be other charts for which rectifiability does not hold.
[ If a topological manifold is metrizable, rectifiability is well-defined for each choice of metric, but the lengths
of curves will depend on the choice of metric, and the rectifiability property for each curve may depend on
the choice of metric. ]
26.12.2 Example: Define a 2-dimensional topological manifold $(M, T_M)$ by $M = \mathbb{R}^2$ with the usual
topology $T_M$ on $\mathbb{R}^2$. For any continuous function $h : \mathbb{R} \to \mathbb{R}$ define the map $\psi_h : M \to \mathbb{R}^2$ by
$\psi_h : (x_1, x_2) \mapsto (x_1, x_2 + h(x_1))$. Then $\psi_h$ is a continuous map for any continuous $h$, and the inverse of $\psi_h$ is $\psi_{-h}$,
which is also continuous. Therefore $\psi_h$ is a homeomorphism and so a valid continuous chart for $(M, T_M)$ for
any continuous $h : \mathbb{R} \to \mathbb{R}$. For the zero function $h = 0$, the map $\psi_h = \psi_0$ is the identity map on $\mathbb{R}^2$. This
is illustrated in Figure 26.12.1.
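Figure 26.12.1 Chart-dependence of rectifiability of a curve
(Diagram: the curve $\gamma : I \to M$ with image $\gamma(I)$, mapped into $\mathbb{R}^2$ by each of the charts $\psi_0$ and $\psi_h$.)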
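Now define a curve $\gamma : I \to M$ by $\gamma : x \mapsto (x, 0)$ for some interval $I \subseteq \mathbb{R}$. Consider the map “through
the charts” defined by $\psi_h \circ \gamma : I \to \mathbb{R}^2$. This must be continuous because $\psi_h$ and $\gamma$ are continuous. It is
clear that when $h = 0$, the map $\psi_0 \circ \gamma = \gamma$ is $C^\infty$ and certainly a Lipschitz function. But if $h$ is chosen
to be a continuous-everywhere, differentiable-nowhere function, then $\psi_h \circ \gamma$ is continuous but not Lipschitz
continuous. It is not rectifiable either, because the component functions of a rectifiable curve in $\mathbb{R}^2$ have
bounded variation and are therefore differentiable almost everywhere. (See Example 18.2.16 for a nowhere-differentiable function.)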
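The chart-dependence in this example can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses a truncated Weierstrass-type sum as a stand-in for a nowhere-differentiable h (a finite sum is smooth, so it only imitates the roughness down to a certain scale), and the names `weierstrass_partial_sum`, `through_chart` and `polyline_length` are invented here. It compares the lengths of inscribed polygons of ψ0 ∘ γ and ψh ∘ γ over I = [0, 1] as the partition is refined.

```python
import math


def weierstrass_partial_sum(x, a=0.5, b=7, terms=12):
    """Truncated Weierstrass-type sum; parameters a, b, terms are illustrative choices."""
    return sum(a**k * math.cos(b**k * math.pi * x) for k in range(terms))


def through_chart(h):
    """Return psi_h o gamma, where gamma(x) = (x, 0) and psi_h(x1, x2) = (x1, x2 + h(x1))."""
    return lambda x: (x, 0.0 + h(x))


def polyline_length(curve, n):
    """Length of the polygon inscribed in the curve over [0, 1] with n equal parameter steps."""
    points = [curve(i / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


steps = (10, 100, 1000, 10000)  # each partition refines the previous one
lengths_flat = [polyline_length(through_chart(lambda x: 0.0), n) for n in steps]
lengths_rough = [polyline_length(through_chart(weierstrass_partial_sum), n) for n in steps]

for n, flat, rough in zip(steps, lengths_flat, lengths_rough):
    print(f"n = {n:6d}   chart psi_0: {flat:.6f}   chart psi_h: {rough:.3f}")
```

Through the identity chart the polygon lengths stay at 1, while through the rough chart they keep growing as the partition is refined, which is the behaviour one expects of a non-rectifiable curve.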

26.12.3 Remark: One could define Lipschitz manifolds whose transition functions have the corresponding
regularity. On such manifolds, rectifiable curves and sets would be well-defined.
Rectifiable curves and paths are defined for metric spaces in Section 17.5. Manifold charts induce a chart-
dependent metric structure on a manifold. Lengths of curves cannot be defined in a chart-independent
manner in the absence of a true metric. But the rectifiability property can be defined in a chart-independent
manner if the manifold is Lipschitz continuous.
[ Maybe should define locally Lipschitz manifolds instead of Lipschitz. ]

[ Define a locally Lipschitz atlas. ]
26.12.4 Definition: A Lipschitz (continuous) manifold is a topological manifold equipped with a locally
Lipschitz atlas.
[ In Definition 26.12.5, shouldn’t need to restrict Dom(ψ) to a smaller neighbourhood? ]
26.12.5 Definition: A rectifiable compact-domain curve in a Lipschitz manifold $(M, A_M)$ is a compact-domain
curve $\gamma : I \to M$ such that for all $t \in I$, for some Lipschitz chart $\psi \in A_{M,\gamma(t)}$, for some $\delta > 0$,
$\gamma(B_{t,\delta}) \subseteq \mathrm{Dom}(\psi)$ and the map $\psi \circ \gamma\big|_{B_{t,\delta}} : I \cap B_{t,\delta} \to \mathbb{R}^n$ is a rectifiable curve in $\mathbb{R}^n$, where $n = \dim(M)$.
[ In Definition 26.12.5, by Lebesgue covering lemma, can always take a finite number of intervals K to cover I. ]
26.12.6 Remark: Since the parameter interval I in Definition 26.12.5 is compact, the cover of I by open
balls Bt,δ ⊆ IR may be replaced by a finite cover.
The set γ −1 (U ) is relatively open in I, but is not generally an interval. In general, γ −1 (U ) is a countable
disjoint union of open intervals.
26.12.7 Remark: Theorem 26.12.8 implies that a curve in a Lipschitz manifold (M, AM ) may be tested
for rectifiability by breaking up the domain of the curve into a finite number of compact subintervals whose
ranges fit within at least one of the domains of charts in the atlas. Most importantly, the test gives the same
result independent of the choice of atlas, as long as the atlases are Lipschitz equivalent.
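The subdivision step described in this remark can be sketched in code. This is a hedged illustration, not the construction used in the text: the sets γ−1(Uj) are modelled here as single open intervals (in general they are unions of intervals), the domain is taken to be [0, 1], and the function name `subdivide` and the doubling strategy are assumptions made for the sketch.

```python
from fractions import Fraction


def subdivide(cover, max_doublings=30):
    """Split [0, 1] into equal closed subintervals, each inside one interval of the cover.

    cover: list of (left, right) pairs standing for relatively open subsets of
    [0, 1]; a pair is treated as containing [s, t] when left < s and t < right.
    Returns a list of (s, t, index) triples, where index names a covering interval.
    """
    n = 1
    for _ in range(max_doublings):
        pieces = []
        for j in range(n):
            s, t = Fraction(j, n), Fraction(j + 1, n)
            index = next((i for i, (a, b) in enumerate(cover) if a < s and t < b), None)
            if index is None:
                break  # this mesh is too coarse; try a finer one
            pieces.append((s, t, index))
        else:
            return pieces
        n *= 2
    raise ValueError("no subdivision found; is the family really a cover of [0, 1]?")


# Three hypothetical chart-domain preimages covering the parameter interval.
cover = [(-0.1, 0.4), (0.3, 0.8), (0.7, 1.1)]
pieces = subdivide(cover)
for s, t, index in pieces:
    print(f"[{s}, {t}] lies in covering interval {index}")
```

Rectifiability of the whole curve could then be tested piece by piece, using on each subinterval the chart attached to the covering interval that contains it.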
[ Maybe instead of Theorem 26.12.8 should just state that Definition 26.12.5 is independent of the choice of
Lipschitz atlas for Lipschitz equivalent atlases. ]
26.12.8 Theorem: For any rectifiable compact-domain curve $\gamma : I \to M$ in a Lipschitz manifold $(M, A_M)$
with $n = \dim(M)$, there is a finite family $(K_j)_{j=1}^m$ of compact subsets of $I$ such that $I = \bigcup_{j=1}^m K_j$ and for
all $j = 1 \ldots m$, for some $\psi_j \in A_M$, $\gamma(K_j) \subseteq \mathrm{Dom}(\psi_j)$, and the map $\psi_j \circ \gamma\big|_{K_j} : K_j \to \mathbb{R}^n$ is a rectifiable curve.
[ The following proof is work in progress. It isn’t ready to be read yet. It’s actually the proof of a different
(deleted) theorem! ]
Proof: Let $\psi' : U' \to \mathbb{R}^n$ be a chart for $(M, T_M)$. Let $K' \subseteq I$ be a compact interval such that $\gamma(K') \subseteq U'$.
It must be shown that $\psi' \circ \gamma\big|_{K'} : K' \to \mathbb{R}^n$ is a rectifiable curve in $\mathbb{R}^n$.
Let $A_M = \{\psi_j;\ j \in J\}$ be the atlas in the statement of Theorem 26.12.8, and let $U_j = \mathrm{Dom}(\psi_j)$ for $j \in J$.
Then $\{\gamma^{-1}(U_j);\ j \in J\}$ is an open cover of $K'$. Since $K'$ is a compact subset of $I$, the Lebesgue covering
lemma implies that $K'$ may be divided into a finite number of compact subintervals $K_\ell$ such that each $K_\ell$ is
included entirely in one set $\gamma^{-1}(U_j)$. By the property of the atlas $A_M$, $\psi_j \circ (\gamma\big|_{K_\ell})$ is a rectifiable curve in $\mathbb{R}^n$.
Since $\psi_j(\gamma(K_\ell))$ is compact, it follows that $(\psi' \circ \psi_j^{-1})(\psi_j(\gamma(K_\ell))) = \psi'(\gamma(K_\ell))$ is compact and $\psi' \circ (\gamma\big|_{K_\ell})$
is a rectifiable curve in $\mathbb{R}^n$. Therefore the property is satisfied for the chart $\psi'$. It follows that the property
holds for all atlases for $M$.
[ Maybe, for the above proof, may need a theorem that the concatenation of rectifiable curves is rectifiable? ]
[ Is rectifiability only well-defined for compact-domain paths? Is local rectifiability well-defined for all paths?
Could use compact subsets of the parameter interval for this. ]

26.13. Differentiable fibrations

This section used to be in Chapter 34, but differentiable fibrations are used too frequently in Chapters 27
to 32 for their definition to be deferred that long. However, all other definitions of differentiable fibre
bundles remain in Chapter 34 because they require Lie groups, which are not defined until Chapter 33.
As illustrated in Figure 23.1.1, there are broadly three species of topological fibre bundles. Differentiable
fibre bundles have the same three species: fibrations, ordinary fibre bundles, and principal fibre bundles.
(Recall that fibrations are groupless fibre bundles.)
Topological fibrations with an intrinsic fibre space were defined in Section 23.2. As mentioned in Re-
mark 23.2.4, a topological fibration defines how the fibre sets at each point of the base space are “glued
together”. In the case of differentiable fibrations, the fibre sets are glued together in a differentiable fash-
ion. Definition 26.13.1 is a differentiable version of the topological fibration with intrinsic fibre space in
Definition 23.2.1. The topologies TE and TB are replaced with manifold atlases AE and AB .
The differentiable fibration in Definition 26.13.1 may seem like it is just a differentiable map between man-
ifolds which happens to satisfy some special conditions (iii) and (iv). But the purpose of the map is to
partition the total space E into fibres π −1 ({b}) at each of the points b of the base space B, and conditions
(iii) and (iv) constrain how the pointwise fibres are related to each other.
26.13.1 Definition: A $C^k$ (differentiable) fibration with intrinsic fibre space for $k \in \bar{\mathbb{Z}}^+_0$ is a tuple
$(E, \pi, B) \mathrel{-\!\!<} (E, A_E, \pi, B, A_B)$ such that
(i) $E \mathrel{-\!\!<} (E, A_E)$ and $B \mathrel{-\!\!<} (B, A_B)$ are $C^k$ manifolds,
(ii) $\pi : E \to B$ is of class $C^k$,
(iii) $\forall b \in B$, $\exists U \in \mathrm{Top}_b(B)$, $\exists \phi : \pi^{-1}(U) \to \pi^{-1}(\{b\})$, $\pi \times \phi : \pi^{-1}(U) \to U \times \pi^{-1}(\{b\})$ is a $C^k$ diffeomorphism,
(iv) $\forall b_1, b_2 \in B$, $\pi^{-1}(\{b_1\})$ is $C^k$-diffeomorphic to $\pi^{-1}(\{b_2\})$.
$E$ is called the total space of $(E, \pi, B)$.
$\pi$ is called the projection map of $(E, \pi, B)$.
$B$ is called the base space of $(E, \pi, B)$.
For any $b \in B$, the set $\pi^{-1}(\{b\})$ is called the fibre set of $(E, \pi, B)$ at $b$.
If the regularity class $C^k$ is not stated, it is assumed to be $C^1$.
26.13.2 Remark: If the structure group of a fibre bundle is not specified (as in the groupless fibre bundles
in Definitions 26.13.1 and 26.13.3), it seems reasonable that the group should be defined implicitly as the
largest group that makes sense for the given class of fibre bundle. In the case of topological fibre bundles in
Section 23.3, this implicit group is the group of topological automorphisms of the fibre space. Thus if the
C k differentiable fibre bundle in Definition 34.2.2 had no explicitly defined group G, the implied structure
group should be the group of all C k diffeomorphisms of the fibre space F . One problem with this
idea is that the group of all diffeomorphisms of a given differentiable manifold is generally not a
finite-dimensional differentiable manifold. The diffeomorphism group is usually infinite-dimensional. By
contrast, the topological automorphism group of a fibre space can be given a reasonable natural topology.
26.13.3 Definition: A $C^k$ (differentiable) fibration with fibre space $F$ for a $C^k$ differentiable manifold
$F \mathrel{-\!\!<} (F, A_F)$ for $k \in \bar{\mathbb{Z}}^+_0$ is a tuple $(E, \pi, B) \mathrel{-\!\!<} (E, A_E, \pi, B, A_B)$ such that
(i) $E \mathrel{-\!\!<} (E, A_E)$ and $B \mathrel{-\!\!<} (B, A_B)$ are $C^k$ manifolds,
(ii) $\pi : E \to B$ is a $C^k$ map,
(iii) $\forall b \in B$, $\exists U \in \mathrm{Top}_b(B)$, $\exists \phi : \pi^{-1}(U) \to F$, $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a $C^k$ diffeomorphism.
If the regularity class $C^k$ is not stated, it is assumed to be $C^1$.
26.13.4 Remark: The fibre charts $\phi$ in Definition 26.13.3 are automatically pointwise $C^k$-consistent on
their overlaps. So it is not necessary to require regularity of the transition maps in this definition. In fact,
since $(\pi \times \phi_1) \circ (\pi \times \phi_2)^{-1} : (U_{\phi_1} \cap U_{\phi_2}) \times F \approx (U_{\phi_1} \cap U_{\phi_2}) \times F$ is a $C^k$ diffeomorphism which maps $(b, y)$
to $(b, g_{\phi_1,\phi_2}(b)(y))$ for some function $g_{\phi_1,\phi_2} : U_{\phi_1} \cap U_{\phi_2} \to C^k(F, F)$, it follows that $g_{\phi_1,\phi_2}$ is of class $C^k$ in
the weak sense that $g_{\phi_1,\phi_2}(\cdot)(y) : b \mapsto g_{\phi_1,\phi_2}(b)(y) \in F$ is a $C^k$ map for each fixed $y \in F$. Without a $C^k$
structure on $C^k(F, F)$, it is not possible to assert differentiability in the stronger sense of $g_{\phi_1,\phi_2}$ being itself
of class $C^k$.
A C 0 fibration is equivalent to a topological fibration except that the topologies on the spaces are specified
via charts rather than sets of open sets.
An advantage of the absence of a fibre atlas in Definition 26.13.3 is the fact that a $C^k$ fibration $(E, \pi, B)$ for
a fibre space $F$ is also a $C^k$ fibration for any $C^k$ manifold $F'$ which is diffeomorphic to $F$. This is as one
would intuitively expect. Definition 26.13.7, on the other hand, specifies a particular atlas for a particular
fibre space F . It is straightforward to define equivalence relations among atlases so that this dependency on
a particular space F is removed.
26.13.5 Definition: A $C^k$ (differentiable) fibre chart for $k \in \bar{\mathbb{Z}}^+_0$ for a $C^k$ fibration $(E, \pi, B)$ with fibre
space $F$ is any function $\phi : \pi^{-1}(U) \to F$ such that $\pi \times \phi : \pi^{-1}(U) \to U \times F$ is a $C^k$ diffeomorphism for
some $U \in \mathrm{Top}(B)$.
26.13.6 Definition: A $C^k$ fibre atlas for $k \in \bar{\mathbb{Z}}^+_0$ for a fibre space $F$ for a $C^k$ fibration $(E, \pi, B)$ is a set
$A_E^F$ of $C^k$ fibre charts for fibre space $F$ for $(E, \pi, B)$ such that $\bigcup_{\phi \in A_E^F} \mathrm{Dom}(\phi) = E$.
An indexed $C^k$ fibre atlas for $k \in \bar{\mathbb{Z}}^+_0$ for fibre space $F$ for a $C^k$ fibration $(E, \pi, B)$ is a family $(\phi_i)_{i \in I}$ of $C^k$
fibre charts for fibre space $F$ for $(E, \pi, B)$ such that $\bigcup_{i \in I} \mathrm{Dom}(\phi_i) = E$. (That is, $\mathrm{Range}(\phi)$ is a $C^k$ fibre
atlas for $F$.)
26.13.7 Definition: A $C^k$ (differentiable) fibration with a fibre atlas for the fibre space $F$ for $k \in \bar{\mathbb{Z}}^+_0$ for
a $C^k$ manifold $F$ is a tuple $(E, \pi, B) \mathrel{-\!\!<} (E, A_E, \pi, B, A_B, A_E^F)$ such that
(i) $E \mathrel{-\!\!<} (E, A_E)$ and $B \mathrel{-\!\!<} (B, A_B)$ are $C^k$ manifolds and $\pi : E \to B$ is $C^k$,
(ii) $\forall \phi \in A_E^F$, $\exists U_\phi \in T_B$, $\pi \times \phi : \pi^{-1}(U_\phi) \to U_\phi \times F$ is a $C^k$ diffeomorphism.
%
E
26.13.8 Definition: The horizontal component of a vector W ∈ T (E) for a C 1 fibration (E, π, M ) is the
vector π∗ (W ) ∈ T (M ).
26.13.9 Definition: A vertical vector on the total space E of a C 1 fibration (E, π, M ) is a vector W ∈
T (E) whose horizontal component is zero.
26.13.10 Remark: The set $\ker((d\pi)_z) = \{W \in T_z(E);\ (d\pi)_z(W) = 0\}$ of vertical vectors in Definition
26.13.9 for a fixed point $z \in E$ is a subspace of $T_z(E)$, namely the kernel of the pointwise differential
map $(d\pi)_z : T_z(E) \to T_{\pi(z)}(M)$. The set $\bigcup_{z \in E_p} \ker((d\pi)_z)$ for a fixed $p \in M$ is not a linear space, but it
is essentially the same as the total tangent space of $E_p$. The set $\bigcup_{z \in E} \ker((d\pi)_z)$ of all vertical vectors on a
fibration $(E, \pi, M)$ is also not a linear space, but it is useful in the case of tangent fibrations (see
Section 27.8) because the vertical vectors may be identified with the space $T(M)$ via a “drop” function.
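The kernel description of vertical vectors can be checked on a small example. The sketch below is illustrative only: the fibration π : IR3 → IR2 with π(x, y, w) = (x + w², y) is a hypothetical choice made here (it is globally trivialized by the fibre coordinate w), and the differential is estimated by central differences rather than computed exactly.

```python
def projection(z):
    """A hypothetical projection pi : R^3 -> R^2, chosen only for illustration."""
    x, y, w = z
    return (x + w * w, y)


def differential(pi, z, W, step=1e-6):
    """Central-difference estimate of (d pi)_z (W), the horizontal component of W."""
    ahead = pi(tuple(zi + step * Wi for zi, Wi in zip(z, W)))
    behind = pi(tuple(zi - step * Wi for zi, Wi in zip(z, W)))
    return tuple((a - b) / (2 * step) for a, b in zip(ahead, behind))


z = (1.0, 2.0, 3.0)

# The fibre through z is {(c - w^2, 2, w)} for a constant c, so its velocity
# at z is (-2w, 0, 1) with w = 3. This vector should lie in ker((d pi)_z).
vertical = (-2.0 * z[2], 0.0, 1.0)
other = (1.0, 0.0, 0.0)

horizontal_component_of_vertical = differential(projection, z, vertical)
horizontal_component_of_other = differential(projection, z, other)

print(horizontal_component_of_vertical)
print(horizontal_component_of_other)
```

The first result is zero up to rounding, so `vertical` is a vertical vector in the sense of Definition 26.13.9, whereas `other` has a non-zero horizontal component.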
26.13.11 Remark: For a very good reason, there is no definition of vertical components or horizontal
vectors for general differentiable fibre bundles. This is because these concepts are not meaningful (i.e. chart-
independent) in the absence of a connection. It is the purpose of connections in Chapter 35 to give these
concepts meaning. Very limited definitions of vertical components and horizontality are possible in the
context of Lie derivatives. (See Section 32.4 for Lie derivatives.)
26.13.12 Remark: Cross-sections of topological fibrations are defined in Definition 23.3.8. Cross-sections
of differentiable fibre bundles are the same except that differentiability is required. Definition 26.13.13 and
Notation 26.13.14 apply also to C k fibre bundles. In general, any definition for a fibration applies to the
underlying fibration of a fibre bundle.
26.13.13 Definition: A $C^k$ cross-section of a $C^k$ fibration $(E, \pi, B)$ for $k \in \bar{\mathbb{Z}}^+_0$ is a $C^k$ map $X : B \to E$
such that $\pi \circ X = \mathrm{id}_B$.

26.13.14 Notation: $X^k(E, \pi, B)$ for $k \in \bar{\mathbb{Z}}^+_0$ and a $C^k$ fibration $(E, \pi, B)$ denotes the set of all $C^k$ cross-
sections of $(E, \pi, B)$.
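The defining condition π ∘ X = idB of a cross-section is easy to test on sample points. The following sketch is an assumption-laden illustration: the fibration π(x, y, w) = (x + w², y) from IR3 to IR2 and the particular section are hypothetical examples invented here, and only the identity on a few sample points is checked, not differentiability.

```python
import math


def projection(z):
    """A hypothetical projection pi : R^3 -> R^2, chosen only for illustration."""
    x, y, w = z
    return (x + w * w, y)


def cross_section(b):
    """A map X : B -> E which picks the fibre coordinate w as a smooth function of b."""
    w = math.sin(b[0]) + b[1]
    return (b[0] - w * w, b[1], w)


sample_points = [(0.0, 0.0), (1.0, -2.0), (-3.5, 0.25)]

# For each sample point b, measure how far pi(X(b)) is from b.
defects = [
    max(abs(p - q) for p, q in zip(projection(cross_section(b)), b))
    for b in sample_points
]
print(defects)
```

Any smooth choice of the fibre coordinate w as a function of b gives another cross-section of the same fibration, which reflects the fact that cross-sections of a trivial fibration correspond to maps from the base space to the fibre.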

26.14. Tangent space building principles

There is a bewildering array of mechanisms for building spaces from a differentiable manifold. These are
presented in Chapters 27 to 32. These spaces may be grouped together under the general title “tangent
spaces”. The following is a summary of these building principles.
[ The following table is not finished yet. Some inaccuracies need to be fixed, and there should be references to
the sections and subsections where the methods are defined. Should also indicate which kinds of differentials
live in each kind of construction. Are there meaningful constructions such as T r,s (M1 , M2 ) or P r (M1 , M2 )?
Maybe could even include concepts like direct product manifolds near here, and quotient manifolds? But
these are not really tangent space constructions. ]
(1) Tangent vectors. Input: C k manifold. Output: Vector space or C k−1 vector bundle.
For any C 1 manifold M , construct the pointwise tangent vector space Tp (M ) for all p ∈ M and the total
tangent space T (M ).
(2) Tangent operators. Input: C k manifold. Output: Vector space or set.
For any C 1 manifold M , construct the pointwise tangent operator space T̊p (M ) for all p ∈ M and the
set of tangent operators T̊ (M ). (The set T̊ (M ) is not given structure such as an atlas or topology.)
(3) Tagged tangent operators. Input: C k manifold. Output: Vector space or C k−1 vector bundle.
For any C 1 manifold M , construct the pointwise tagged tangent operator space T̂p (M ) for all p ∈ M and
the total tagged tangent operator space T̂ (M ).
(4) Coordinate frame space. Input: C k manifold. Output: C k−1 fibre bundle.
For any $C^1$ manifold $M$ and $r = 1, \ldots, \dim(M)$, construct the pointwise tangent r-frame space $P_p^r(M)$
for all $p \in M$, and the total tangent r-frame space $P^r(M)$. If $r = \dim(M) = n$, these are called the
pointwise tangent n-frame space $P_p(M)$ for all $p \in M$, and the total tangent n-frame space $P(M)$.
(5) Cotangent vectors. Input: C k manifold. Output: Vector space or C k−1 vector bundle.
For any C 1 manifold M , construct the pointwise cotangent vector space Tp∗ (M ) for all p ∈ M and the
total cotangent space T ∗ (M ).
(6) Tensors. Input: $C^k$ manifold. Output: Tensor space or $C^{k-1}$ vector bundle.
For any $C^1$ manifold $M$, construct the pointwise tensor space $T_p^{r,s}(M)$ for all $p \in M$, and the total
tensor space $T^{r,s}(M)$, where $r, s \in \mathbb{Z}^+_0$.
(7) Vector fields. Input: $C^k$ vector bundle. Output: Vector field algebra.
For any total space on a $C^k$ manifold $M$, construct the space of vector fields $X^k(\cdot)$ on that total space for
integers $k \in \mathbb{Z}^+_0$. This is the space of $C^k$ cross-sections of the total space.
(8) Map forms. Input: Two C k manifolds. Output: C k−1 vector bundle.
For any two C 1 manifolds M1 and M2 , construct the pointwise space of T (M2 )-valued forms Tp∗ (M1 , M2 )
at p ∈ M1 , and the total space of T (M2 )-valued forms T ∗ (M1 , M2 ) on M1 .
(9) Map vectors. Input: Two C k manifolds. Output: C k−1 vector bundle.
For any two C 1 manifolds M1 and M2 , construct the pointwise space of T ∗ (M2 )-valued tangent vectors
Tp (M1 , M2 ) at p ∈ M1 , and the total space of T ∗ (M2 )-valued tangent vectors T (M1 , M2 ) on M1 . (Note
that T (M1 , M2 ) = T ∗ (M2 , M1 ), roughly speaking.)
(10) Higher-order tangent vectors. Input: $C^k$ manifold. Output: $C^{k-\ell}$ vector bundle.
For any $C^\ell$ manifold $M$, construct the pointwise space of $\ell$th-order tangent vectors $T_p^{[\ell]}(M)$ at $p \in M$,
and the total space of $\ell$th-order tangent vectors $T^{[\ell]}(M)$.
(11) Higher-order map vectors. Input: Two $C^k$ manifolds. Output: $C^{k-\ell}$ vector bundle.
For any $C^\ell$ manifolds $M_1$ and $M_2$, construct the pointwise space of $\ell$th-order $T^*(M_2)$-valued tangent
vectors $T_p^{[\ell]}(M_1, M_2)$ at $p \in M_1$, and the total space of $\ell$th-order $T^*(M_2)$-valued tangent
vectors $T^{[\ell]}(M_1, M_2)$.
The above construction methods may be applied recursively. The output from each principle may be an
input to other building principles. Also, each C k vector bundle is also a C k manifold. So for example,
T (T (M )) results from applying method (1) twice to a C k manifold to yield a C k−2 vector bundle, because
the C k−1 vector bundle from the first step is also a C k−1 manifold which can be input to the second step.
Another typical example is X k (T ∗ (M )), which results from applying method (5) followed by method (7).
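The regularity bookkeeping for recursive application can be written out as a few lines of arithmetic. This is a minimal sketch under assumptions: only methods (1), (5) and (6) are modelled, each as losing one degree of regularity as stated in their Input/Output lines, and the dictionary labels and the function name `output_regularity` are invented for the illustration.

```python
import math

# Regularity lost by one application of each construction, as read from the
# "Input/Output" lines of methods (1), (5) and (6) in the list above.
# The dictionary keys are labels invented for this sketch.
REGULARITY_LOSS = {"tangent": 1, "cotangent": 1, "tensor": 1}


def output_regularity(k, methods):
    """Return the regularity class left after applying the methods in order to a C^k manifold."""
    for name in methods:
        loss = REGULARITY_LOSS[name]
        if k < loss:
            raise ValueError(f"{name!r} needs at least a C^{loss} manifold, got C^{k}")
        k -= loss
    return k


# T(T(M)) for a C^2 manifold M: method (1) applied twice.
double_tangent = output_regularity(2, ["tangent", "tangent"])

# For a C^infinity manifold no regularity is lost.
smooth_case = output_regularity(math.inf, ["cotangent", "tangent"])

print(double_tangent, smooth_case)
```

With a C^1 manifold as input, the second application of method (1) is rejected, which matches the requirement that each step receives at least a C^1 manifold.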

The main motivation and application of all tangent space constructions is to provide a “place to live” for
differentials of various kinds. For example, the differential dφ of a C 1 map φ : M1 → M2 for C 1 manifolds
M1 and M2 looks like a cotangent vector at each point of M1 and like a tangent vector at each point of M2
in the range of φ. This kind of differential requires a tangent space T ∗ (M1 , M2 ).
[ There should be a section, maybe near here, in which there is a summary of all the different kinds of
spaces, including spaces of tangent vectors, tangent operators, higher-order tangent vectors, (mixed) tensors,
differentials, cotangents, vector fields and differential forms. There should be at least a table summarising
these, and maybe one or more diagrams too. This plethora of spaces is available even without bringing in
connections or metrics. ]

Chapter 27
Tangent bundles on differentiable manifolds
27.1 Styles of representation of tangent vectors
27.2 Tangent bundle metadefinition
27.3 Tangent vectors
27.4 Computational tangent vectors
27.5 Tangent operators
27.6 Tagged tangent operators
27.7 Pointwise tangent spaces
27.8 Tangent bundles
27.9 Tangent operator bundles
27.10 The tangent bundle of a tangent bundle
27.11 Horizontal components and drop functions
27.12 Tangent frames and coordinate basis vectors
27.13 Tangent space constructions, attributes and relations
27.14 Unidirectional tangent bundles
27.15 Distributions as representations of tangent bundles
27.16 Tangent bundles on infinite-dimensional manifolds
The following table summarizes some tangent space concepts which are introduced in this chapter.
reference | concept | symbol | comments
26.3.6 | differentiable manifold | $(M, A_M)$ | $M$ a set, $A_M = \mathrm{atlas}(M)$ a $C^r$ atlas on $M$, $r \ge 1$
27.2.1 | abstract tangent bundle | $(T, \pi, \hat{A}_T, \Phi)$ | total space $T$, projection $\pi$, bundle atlas $\hat{A}_T$, lift $\Phi$
27.3.2 | tangent coordinate triple | $(p, v, \psi)$ | $p \in M$, $v \in \mathbb{R}^n$, $\psi \in \mathrm{atlas}_p(M)$
27.3.3 | tangent vector | $t_{p,v,\psi}$ | equivalence class of $(p, v, \psi)$ for fixed $p \in M$
27.7.1 | tangent (vector) space | $T_p(M)$ | $\{t_{p,v,\psi};\ p \in M, v \in \mathbb{R}^n, \psi \in \mathrm{atlas}_p(M)\}$
27.8.1 | tangent bundle | $(T(M), A_{T(M)})$ | $T(M) = \bigcup_{p \in M} T_p(M)$ with $C^{r-1}$ atlas $A_{T(M)}$
27.10.13 | tang. bundle tang. space | $T_z(T(M))$ | tangent space of a tangent bundle, $z \in T(M)$
27.10.14 | tang. bundle tang. bundle | $T(T(M))$ | has $C^{r-2}$ atlas $A_{T(T(M))}$, $r \ge 2$
27.5.1 | tangent operator | $\partial_{p,v,\psi}$ | $\partial_{p,v,\psi} : f \mapsto v^i \partial_i^{p,\psi} f$ for $f \in C^r(M, \mathbb{R})$
27.6.2 | tagged tangent operator | $(p, \partial_{p,v,\psi})$ | tangent operator $\partial_{p,v,\psi}$ with tag $p \in M$
27.7.6 | tangent operator space | $\mathring{T}_p(M)$ | $\{\partial_{p,v,\psi};\ p \in M, v \in \mathbb{R}^n, \psi \in \mathrm{atlas}_p(M)\}$; untagged
27.7.9 | tagged tang. operator sp. | $\hat{T}_p(M)$ | $\{(p, \partial_{p,v,\psi});\ p \in M, v \in \mathbb{R}^n, \psi \in \mathrm{atlas}_p(M)\}$
27.9.2 | tangent operator bundle | $(\hat{T}(M), A_{\hat{T}(M)})$ | $\bigcup_{p \in M} \hat{T}_p(M)$ with $C^{r-1}$ atlas $A_{\hat{T}(M)}$
27.12.1 | tangent k-frame | $(t_{p,v_j,\psi})_{j=1}^k$ | sequence of $k$ independent vectors in $T_p(M)$
27.12.2 | tangent n-frame set | $P_p(M)$ | set of n-frames at $p \in M$
27.12.8 | tangent n-frame bundle | $(P(M), A_{P(M)})$ | $P(M) = \bigcup_{p \in M} P_p(M)$ with $C^{r-1}$ atlas
27.0.1 Remark: A tangent bundle is not a fibre bundle. The tangent bundle concept in this chapter is

not a sub-species of the differentiable fibre bundles defined in Chapter 34. A tangent bundle is a stand-alone
concept which is closely related to topological fibre bundles and differentiable fibre bundles. (The relations
between tangent bundles and fibre bundles are roughly summarized in Figure 27.0.1.)
Figure 27.0.1 Family tree for fibre bundles and tangent bundles
(Diagram nodes: non-topological fibre bundle, tangent bundle, differentiable fibre bundle.)
Differentiable fibre bundles require differentiable groups which in turn require tangent bundles on differ-
entiable manifolds. So to avoid cyclic definitions, tangent bundles cannot be defined as a sub-species of
differentiable fibre bundles.
27.0.2 Remark: Tangent vectors are the quintessence of directionality. The word “tangent” is derived
from the Latin word “tangens” which means “touching”. A tangent vector is a vector which touches a curve
at a point. Tangent vectors in flat space have been well known since classical Greek mathematics. Tangent
vectors for manifolds (curved space) are defined in terms of flat-space tangent vectors via differentiable
charts. Since tangent vectors are well-defined and familiar for flat spaces IRn (which are the ranges of
manifold charts), it makes sense to exploit flat-space vectors to define tangent vectors for differentiable
manifolds. Charts on manifolds map points to coordinates. In the reverse direction, tangent vectors in IRn
are mapped back onto the manifold.
27.0.3 Remark: Figure 27.0.2 outlines the core idea for defining tangent vectors on manifolds in this book.
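Figure 27.0.2 Exploitation of flat-space tangent bundle to define tangent bundles on manifolds
(Diagram: a chart $\psi \in A_M$ maps $p = \pi(V) \in M$ to $x = \psi(p) \in \mathbb{R}^n$. The corresponding chart $\Psi(\psi) \in A_{T(M)}$ maps $V \in T(M)$ to $(x, v) = \Psi(\psi)(V) \in T(\mathbb{R}^n) \equiv \mathbb{R}^n \times \mathbb{R}^n \equiv \mathbb{R}^{2n}$. The projections are $\pi : T(M) \to M$ and $(x, v) \mapsto x$.)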
The idea is to use the flat-space tangent bundle $T(\mathbb{R}^n)$ alluded to in Definition 19.1.8 to define the tangent
bundle $T(M)$ on a manifold $M$. Then the charts $\Psi(\psi)$ for the tangent bundle’s total space $T(M)$ are
required to have the same transformation rules as the chart-transition diffeomorphisms $\phi = \psi_2 \circ \psi_1^{-1}$ for
charts $\psi_1, \psi_2 \in \mathrm{atlas}_p(M) \subseteq A_M$.
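The transformation rule for tangent vector components can be illustrated with two familiar charts. The sketch below is an assumption-based example, not taken from the text: ψ1 is the identity (Cartesian) chart and ψ2 the polar chart on a region of IR2 away from the origin, and the components of a vector are converted with the Jacobian matrix of the transition map ψ2 ∘ ψ1−1. The result is cross-checked against the velocity, seen through ψ2, of a straight-line curve with the given Cartesian velocity. All function names are invented for the sketch.

```python
import math


def polar_chart(point):
    """psi_2 : (x, y) -> (r, theta), valid away from the origin and the branch cut of atan2."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))


def transition_jacobian(x, y):
    """Jacobian matrix of psi_2 o psi_1^{-1} at (x, y), where psi_1 is the identity chart."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r), (-y / r2, x / r2))


def transform_components(x, y, v):
    """Components with respect to psi_2 of the vector with components v with respect to psi_1."""
    J = transition_jacobian(x, y)
    return tuple(J[i][0] * v[0] + J[i][1] * v[1] for i in range(2))


p = (3.0, 4.0)
v_cartesian = (1.0, 2.0)
v_polar = transform_components(p[0], p[1], v_cartesian)

# Cross-check: velocity through the polar chart of the curve t -> p + t * v.
step = 1e-6
ahead = polar_chart((p[0] + step * v_cartesian[0], p[1] + step * v_cartesian[1]))
behind = polar_chart((p[0] - step * v_cartesian[0], p[1] - step * v_cartesian[1]))
velocity = tuple((a - b) / (2 * step) for a, b in zip(ahead, behind))

print(v_polar)
print(velocity)
```

The triples (p, v_cartesian, ψ1) and (p, v_polar, ψ2) then describe the same tangent vector in the coordinate-triple style of representation.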
27.0.4 Remark: Many differential geometry textbooks adopt tangent operators as the fundamental defi-
nition of a tangent vector. Tangent operators have some difficulties which must be swept under the carpet.
So an old-fashioned and pedestrian definition of tangent vectors is adopted here, namely equivalence classes
of coordinate triples (p, v, ψ) ∈ M × IRn × AM for p ∈ M , v ∈ IRn and ψ ∈ AM , where M is the manifold
and AM is the atlas on M . The space of these tangent vectors is given the symbol T (M ). In second place,
the set of tagged tangent operators (p, ∂p,v,ψ ) is denoted T̂ (M ). In third place, tangent operators ∂p,v,ψ are

given the symbol T̊ (M ). The less common definition of tangent vectors as equivalence classes of C 1 curves
passing through a point p ∈ M will be referred to as “tangent curve classes”.
27.0.5 Remark: The word ‘vector’ is Latin meaning ‘carrier’. So a vector carries something from one
point to another. According to Struik [194], page 175, the word “vector” was introduced into mathematics
by William Rowan Hamilton (1805–1865) in the context of quaternions. The OED [212], page 2456, gives
the date 1865 as the first recorded occurrence of the word “vector” in the mathematical sense of a quantity
having both magnitude and direction although it was used as early as 1796 in the sense of the straight line
joining a planet to the focus of its orbit.
27.0.6 Remark: The words “coordinate”, “component” and “coefficient” are often used interchangeably,
but they have different meanings.
A coordinate is a number which tells you the location of a point in a grid, for example the numbers $x$ and
$y$ in Cartesian coordinates $(x, y)$ for the plane. A component is an element of a list or array of numbers,
for example the numbers $x_i$ in a vector $(x_1, \ldots, x_n)$ or the numbers $a_{ij}$ in a matrix $[a_{ij}]_{i,j=1}^n$. A coefficient
is typically a constant multiplier of a term in an expression, for example the numbers $a$, $b$, and $c$ in the
expression $ax^2 + bx + c$.
The numbers v i in Definition 27.5.1 for tangent operators may be described in all three ways, but with
slightly different interpretations. They are coordinates of operators ∂p,v,ψ in the natural atlas on the total
tangent operator space T̂ (M ). They are components of the n-tuple v ∈ IRn . And they are coefficients
of the first-order derivatives. These three words each suggest different relations of the numbers to their
mathematical objects rather than attributes of the numbers themselves.
In the case of Definition 27.3.3 for tangent vectors, it is not strictly correct to talk of the numbers v i as
coefficients, because they are not multipliers for terms in an expression. But they certainly are components
of an n-tuple. They may also be thought of as coordinates of tangent vectors with respect to an atlas for
the total space of a tangent bundle.
For both tangent vectors and tangent operators, it is probably preferable to refer to the numbers v i as
coordinates. In the case of tangent operators (Definition 27.5.1), the term “coefficients” may also be used.
In the case of tangent vectors (Definition 27.3.3), the term “components” may be used for the coordinates.
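The role of the numbers v i as coefficients of first-order derivatives can be seen in a small numerical sketch of a tangent operator. This is a hedged illustration only: it assumes the operator has the form f ↦ Σ v i ∂i (f ∘ ψ−1) evaluated at ψ(p), uses the identity chart on IR2, approximates the partial derivatives by central differences, and the name `tangent_operator` is invented here.

```python
def tangent_operator(p, v, chart, chart_inverse, step=1e-6):
    """Return the map f -> sum_i v^i d/dx^i (f o chart^{-1}) evaluated at x = chart(p)."""
    x = chart(p)

    def apply(f):
        total = 0.0
        for i, coefficient in enumerate(v):
            ahead = list(x)
            behind = list(x)
            ahead[i] += step
            behind[i] -= step
            partial = (f(chart_inverse(tuple(ahead))) - f(chart_inverse(tuple(behind)))) / (2 * step)
            total += coefficient * partial  # v^i multiplies the i-th partial derivative
        return total

    return apply


def identity(point):
    """The identity chart on R^2, used as both the chart and its inverse."""
    return point


def f(point):
    """A sample real-valued function, f(x, y) = x^2 * y."""
    return point[0] ** 2 * point[1]


# With p = (3, 4) and v = (1, 2): 1 * (2xy) + 2 * (x^2) = 24 + 18 = 42.
derivative = tangent_operator((3.0, 4.0), (1.0, 2.0), identity, identity)(f)
print(derivative)
```

The same tuple v appears here both as a list of components and as the coefficients in the sum, which is the distinction of viewpoint that the remark describes.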
27.1. Styles of representation of tangent vectors
This section is a discussion of the meaning of tangent vectors, some popular styles of representation of
tangent vectors, and the advantages and disadvantages of the various styles. A particular manifold may
have many tangent bundles, but they are related by unique isomorphisms.
The classical meaning of a vector is an oriented segment from a given point p to another point q. (This is
explained in the article on vectors in EDM2 [35], 442.A.) Vectors arose in physics to describe velocities and
forces. A velocity is typically a derivative $\gamma'(t)$ of a curve $\gamma$. A force is typically a gradient ∇E of a potential
function E. Both of these are defined as limits of rates of variation of functions from point to point.
In flat space, there is a unique straight line joining one point to another, but in a differentiable manifold there
is generally no uniquely specified shortest path between two points. There is no such thing as a geodesic
in a manifold without a connection. A specification of two end-points is insufficient to specify a unique,
chart-invariant shortest path in a curved space without a metric.
The closest thing to a “real meaning” of a tangent vector in a differentiable manifold M is an infinitesimal
translation of a point p ∈ M . This philosophical concept cannot be directly represented as a set construction.
(See Section 18.1 for the problem of “infinitesimals” in the linear space case, which has a long history.) But a
generator of small movements can be defined. This gives a motivation for preferring the differential operator
(case (ii) in Remark 27.1.1).
In the case of embedded manifolds, it is possible to use extrinsic tangent vectors, such as are indicated in
equations (41.2.1) and (41.2.2) for S 2 embedded in IR3 . This is useless in general, for instance for cosmology,
but extrinsic tangent vectors do meet the requirements for a tangent vector object when they are available.
When a manifold is embedded in a Euclidean space, the tangent plane to any point of a smooth enough
manifold is well defined. The vectors in this tangent plane may be used to describe the velocity of a curve in

the manifold, the gradient of a real-valued function on the manifold, and various other analytic concepts. The
problem with non-embedded manifolds is that there is no ambient space in which to construct a tangent plane.
One could artificially construct an ambient space locally for any smooth enough non-embedded manifold and
use that as a tangent space. But that would be clumsy, unnatural and restrictive. (Frankel [19], section 1.3,
page 23, mentions that Hassler Whitney showed that an n-dimensional manifold can always be embedded in
2n-dimensional Euclidean space.) Many approaches may be taken to construct intrinsic tangent vectors, such
as using the coordinates from local charts, using curves within the manifold to substitute for straight-line
tangents, or using gradient operators.
The problem of defining tangent vectors directly in terms of the points of a manifold arises because the only
thing known about the points of a manifold is that they are elements of a set. It is not at all clear how
to construct well-defined tangent vectors for abstract points. When presented with a manifold, the tangent
space is not usually part of the given structure. One must construct a tangent space somehow from the
points. (For further discussion of this issue, see Remark 27.1.11.)
27.1.1 Remark: The coordinate transition matrices for charts are presented in Definition 26.5.7. Any
object which transforms according to these matrices may be accepted as a valid tangent vector representation.
This idea is implemented in Metadefinition 27.2.1. Some examples of intrinsic tangent vector constructions
which satisfy this metadefinition are as follows.
(i) Coordinates. Almost all real-life computations use this representation. A tangent vector may be
represented as an equivalence class of triples (p, v, ψ), where p ∈ M is the vector’s base point, v ∈ IRn
is the set of tangent vector coordinates and ψ is a chart. (See Definition 27.3.3.) Alternatively, tangent
vectors may be represented by triples (x, v, α) where x is the set of coordinates of p ∈ M and α is the
index of ψ in an indexed atlas. (See Definition 27.4.2. This kind of definition is almost identical to
Darling [14], page 135, section 6.6.2 and Frankel [19], page 23, section 1.3a.)
(ii) A differential operator. This is popular for analytical and theoretical applications. The operator has
the form $f \mapsto \sum_{i=1}^n v_\psi^i (\partial/\partial x^i)(f \circ \psi^{-1}(x))\big|_{x=\psi(p)}$ for $v_\psi \in \mathbb{R}^n$. This style of definition requires a space of
functions to differentiate. (See comments in Remark 27.1.2.)
(iii) An equivalence class of curves. This representation defines a tangent vector at $p \in M$ to be an
equivalence class of curves $\gamma : \mathbb{R} \to M$ which have the same velocity vector $\gamma'(0)$ at $p = \gamma(0)$, where
$\gamma'(0) = (d/dt)\psi(\gamma(t))\big|_{t=0}$ for charts $\psi : M \to \mathbb{R}^n$. (See comments in Remark 27.1.3.) [ For equivalence
classes of curves, see Malliavin [36], proposition I.4.1. ] (See for instance Gallot/Hulin/Lafontaine [20],
page 14, section 1.25. See also Crampin/Pirani [12], page 247.)
(iv) A derivation. This representation is based on linear functionals L : C ∞ (M ) → IR which obey the
Leibniz rule. (See Section 44.1.) These are essentially the same as the differential operators in (ii), but
are defined more algebraically. They don’t work correctly for C k (M ) spaces with k < ∞. (This form
of tangent vector definition appears in EDM2 [35], section 105.F, and Gallot/Hulin/Lafontaine [20],
section 1.45–1.53, pages 20–22.)
(v) A generalized function. Schwartz distributions, for example, represent generalized functions as
elements of the duals of spaces such as $C_0^\infty(M)$. In particular, points may be represented as Dirac
delta functions and tangent vectors may be represented as directional derivatives of delta functions.
This is a broad extension of the differential operator and derivation styles of definition. This style of
representation is restricted to C ∞ manifolds.
It is straightforward to invent other reasonable definitions of tangent vectors, such as an equivalence class of
local diffeomorphisms or an equivalence class of sequences of points (converging with a specified velocity to
a point).
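The agreement between representations (i), (ii) and (iii) can be checked numerically. The following Python sketch is only an illustration under assumptions which are not part of the text: the manifold is taken to be the right half-plane in IR2, the chart ψ is the polar chart, derivatives are approximated by central differences, and all function names are hypothetical. It builds the operator of case (ii) and one representative curve of case (iii) from the components v of case (i), and compares the results.

```python
import math

def psi(p):
    """Polar chart on the right half-plane {x > 0}: p = (x, y) -> (r, theta)."""
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))

def psi_inv(c):
    """Inverse polar chart: (r, theta) -> (x, y)."""
    r, theta = c
    return (r * math.cos(theta), r * math.sin(theta))

def derivative(g, t, h=1e-6):
    """Central finite-difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

def operator_from_components(p, v):
    """Case (ii): f -> sum_i v^i d/dx^i (f o psi^-1)(x) at x = psi(p)."""
    x0 = psi(p)
    def apply(f):
        total = 0.0
        for i in range(len(v)):
            def along_axis(s, i=i):
                x = list(x0)
                x[i] += s
                return f(psi_inv(tuple(x)))
            total += v[i] * derivative(along_axis, 0.0)
        return total
    return apply

def curve_from_components(p, v):
    """Case (iii): one representative curve with psi(gamma(t)) = psi(p) + t v."""
    x0 = psi(p)
    return lambda t: psi_inv(tuple(a + t * b for a, b in zip(x0, v)))

def curve_velocity(gamma):
    """Chart velocity (d/dt) psi(gamma(t)) at t = 0."""
    return tuple(derivative(lambda t, i=i: psi(gamma(t))[i], 0.0) for i in range(2))

p = (1.0, 1.0)
v = (0.5, -2.0)   # case (i): components with respect to the polar chart
f = lambda q: q[0] ** 2 * q[1] + math.sin(q[1])   # arbitrary smooth test function

D = operator_from_components(p, v)
gamma = curve_from_components(p, v)
via_operator = D(f)                                  # operator applied to f
via_curve = derivative(lambda t: f(gamma(t)), 0.0)   # derivative of f along the curve
recovered_v = curve_velocity(gamma)                  # should reproduce v
```

Up to finite-difference error, via_operator and via_curve agree and recovered_v reproduces v, so the three objects carry the same information once a chart is fixed.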
27.1.2 Remark: The differential operator representation of tangent vectors in part (ii) of Remark 27.1.1
is often claimed to be chart-independent and “coordinate-free”, although clearly in order to specify which
vector one is talking about, one must give the vector components. The operator definition has the advantage
that the transformation rules follow automatically from its form. It is probably the style of representation
which is most widely regarded as the essence of tangent vectors in differential geometry texts.
One seldom-mentioned disadvantage of this representation is the “zero vector ambiguity problem”: the
zero vectors at all points in the manifold are represented by the same operator. (See Remark 27.5.12,
Definition 27.6.2 and Remark 27.6.1.)
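The zero vector ambiguity can be seen in a small Python sketch. The identity chart on IR2, the finite-difference step and all names are assumptions made for illustration only. The operators built from the zero component vector at two different base points give the same value on each sample test function, whereas the corresponding coordinate triples still differ.

```python
import math

def tangent_operator(p, v, h=1e-6):
    """Operator for the identity chart on IR^2: f -> sum_i v^i df/dx^i at p."""
    def apply(f):
        total = 0.0
        for i, vi in enumerate(v):
            qp = list(p); qm = list(p)
            qp[i] += h; qm[i] -= h
            total += vi * (f(tuple(qp)) - f(tuple(qm))) / (2 * h)
        return total
    return apply

# A few sample test functions (hypothetical choices).
test_functions = [
    lambda q: q[0],
    lambda q: q[0] * q[1],
    lambda q: math.exp(q[0]) * math.cos(q[1]),
]

zero_at_p = tangent_operator((0.0, 0.0), (0.0, 0.0))
zero_at_q = tangent_operator((3.0, -1.0), (0.0, 0.0))
values_p = [zero_at_p(f) for f in test_functions]
values_q = [zero_at_q(f) for f in test_functions]

# The coordinate triples (p, v, chart), by contrast, still record the base point.
triple_p = ((0.0, 0.0), (0.0, 0.0), "identity chart")
triple_q = ((3.0, -1.0), (0.0, 0.0), "identity chart")
```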

Another problem is that the space of differentiable functions on a manifold must be defined beforehand. A
differentiable function f is defined as one for which the derivatives (∂/∂xi )(f ◦ ψ −1 (x)) are well-defined for
charts ψ. This is uncomfortably close to being a circular definition. The space C 1 (M ) is a very large set of
functions. Defining a tangent vector as a linear functional on an infinite-dimensional linear space of functions
is certainly not conceptually economical. In fact, the n coordinate functions $p \mapsto \psi^i(p)$ for $i = 1 \ldots n$ (for
a fixed chart ψ whose domain contains p) suffice as test functions to fully determine the operator, but this
is a reversion to coordinates again, which the operator definition is supposed to avoid. Clearly the claims
that differential operators provide the best, simplest or most natural representation of tangent vectors are
vulnerable to scrutiny. (See also Remark 27.6.6.)
One should regard tangent operators as being abstract differentiation procedures rather than actions on a
particular class of test functions, because otherwise the chosen class will always be too large or small for
some applications.
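The claim that the n coordinate functions suffice as test functions can be illustrated as follows. This is a sketch under assumptions which are not in the text: the chart is the polar chart on the right half-plane of IR2, derivatives are central differences, and the names are hypothetical. Applying the operator to the two coordinate functions of the chart returns the component vector v, which shows how the operator is determined by its values on these n functions.

```python
import math

def psi(p):
    """Polar chart on {x > 0}: (x, y) -> (r, theta)."""
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))

def psi_inv(c):
    """Inverse polar chart: (r, theta) -> (x, y)."""
    r, theta = c
    return (r * math.cos(theta), r * math.sin(theta))

def tangent_operator(p, v, h=1e-6):
    """f -> sum_i v^i d/dx^i (f o psi^-1)(x) at x = psi(p), by central differences."""
    x0 = psi(p)
    def apply(f):
        total = 0.0
        for i, vi in enumerate(v):
            xp = list(x0); xm = list(x0)
            xp[i] += h; xm[i] -= h
            total += vi * (f(psi_inv(tuple(xp))) - f(psi_inv(tuple(xm)))) / (2 * h)
        return total
    return apply

p = (2.0, 1.0)
v = (0.3, -1.2)
D = tangent_operator(p, v)

# The n = 2 coordinate functions p -> psi^i(p) used as test functions.
coordinate_functions = [lambda q, i=i: psi(q)[i] for i in range(2)]
recovered = [D(f_i) for f_i in coordinate_functions]   # approximately equal to v
```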
27.1.3 Remark: Since the curves in part (iii) of Remark 27.1.1 are embedded within the point space of
the manifold, this definition has a strong intuitive appeal which seems to be coordinate-free, but this is
illusory because the choice of curves depends heavily on the coordinate maps. The representation is quite
uneconomical in practice because a single vector is represented as an infinite number of curves. The curves
themselves must be tested to ensure that the expression (d/dt)ψ i (γ(t)) is well-defined for all components ψ i
of the coordinate charts of the manifold. So the coordinates are not excluded from the definition. When one
wishes to indicate a particular tangent vector, in practice one must specify the components of the derivatives.
So, like the operator in part (ii), this style of definition has intuitive appeal but no practical value.
27.1.4 Remark: All of the constructions in Remark 27.1.1 have the correct transformation rules. It is
difficult to select one representation as morally superior to all others. The approach here will be to say that
a tangent space is any construction which has the right properties and relations to the corresponding
manifold. All such representations and constructions are isomorphic, equivalent and interchangeable. However,
option (i) is chosen here as the preferred representation because it is the best starting point for deriving all
of the other representations. (See Definition 27.3.3.)
For comparison, the positive integers may be represented as Babylonian, Greek, Roman or Arabic numerals,
in binary, octal, hexadecimal or sexagesimal. The best choice depends on context. The same is true for
tangent vector representations.
27.1.5 Remark: Tangent vectors are used to define tangents of curves ($\gamma'$ for $\gamma : \mathbb{R} \to M$), gradients
of real-valued functions (∇v f for f : M → IR), differentials of functions (dφ for φ : M1 → M2 ), affine
connections (ρ : T (M ) → T (P (M ))), vector fields (X : M → T (M )), and many other analytic concepts. For
the development of such definitions, the most convenient tangent vector definition is arguably the differential
operator option (ii), but this convenience can be obtained with option (i) by using the notation DV to
distinguish a tangent operator from its corresponding component-based vector V . Thus component vectors
and tangent operators may happily co-exist.
27.1.6 Remark: One might ask why $C^1$ test functions are used instead of general differentiable functions.
The reason is to guarantee the equality of the expressions $\lim_{a \to 0} (f(x + av) - f(x))/a$ and $\sum_{i=1}^n v^i\, \partial f(x)/\partial x^i$.
[ Must give a reference for this. ]
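Pending such a reference, a standard counterexample shows what can go wrong without the C 1 assumption. The function below is not taken from the text. It has a directional derivative in every direction at the origin, but it is not C 1 in any neighbourhood of the origin, and the two expressions disagree there. The Python sketch evaluates both expressions for v = (1, 1).

```python
def f(x, y):
    """x^2 y / (x^2 + y^2), extended by 0 at the origin; not C^1 near the origin."""
    return 0.0 if x == 0.0 and y == 0.0 else x * x * y / (x * x + y * y)

def directional_quotient(v, a):
    """Difference quotient (f(0 + a v) - f(0)) / a at the origin."""
    return (f(a * v[0], a * v[1]) - f(0.0, 0.0)) / a

v = (1.0, 1.0)

# First expression: the limit of difference quotients along v.
quotients = [directional_quotient(v, a) for a in (1e-1, 1e-3, 1e-6)]

# Second expression: sum_i v^i df/dx^i, using quotients along the coordinate axes.
partial_x = directional_quotient((1.0, 0.0), 1e-6)
partial_y = directional_quotient((0.0, 1.0), 1e-6)
component_sum = v[0] * partial_x + v[1] * partial_y
```

The quotients along v stay at 1/2, while both partial derivatives at the origin vanish, so the component sum is 0.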
27.1.7 Remark: It is reasonable to ask whether the existence of unidirectional derivatives everywhere
on a manifold would be sufficient to define meaningful tangent vectors like $\lim_{a \to 0^+} (f(x + av) - f(x))/a$
which are unidirectional. The examples in Section 42.4 give some hints on this subject. Example 42.4.4
shows that C 0,1 regularity is not sufficient to guarantee existence everywhere of directional derivatives.
However, if the chart transition maps have well-defined unidirectional derivatives, it should be possible to
define unidirectional tangent vectors. This could be useful for some applications. (Such generalizations are
discussed in Section 27.14.)
27.1.8 Remark: In a sense, the tangent vector representations (ii) and (iii) are inverses or duals of each
other. Just as a test function f : M → IR is one coordinate of a coordinate chart ψ : M → IRn , each
curve γ : IR → M is one inverse coordinate of an inverse coordinate chart ψ −1 : IRn → M . Both the
differential operator and curve representations of tangent vectors are pruned-down versions of coordinate
charts. Therefore one may as well go the whole hog and use full coordinate charts. The moral of this story is
that there is no such thing as a coordinate-free tangent vector. Test functions and curves are thinly disguised
coordinate charts.
27.1.9 Remark: Sometimes vectors do not transform as they should. Some quantities in physics do not
vary under all transformations in GL(n) according to the standard matrix rules. In such cases, a different
invariance group may be required. So, all things considered, it may be best to define a vector as an equivalence
class of triples (p, v, ψ), where p is a point in a space, v is a set of coordinates for the vector, ψ is an element
of the permitted set of charts for the space, and a set of transformation rules is supplied for determining the
equivalence class.
The point/coordinates/chart form of vector definition is unsatisfying. Since the coordinates must be the
coordinates of something, it leaves open the question of what that something is. But then, one can equally
observe that the Cartesian coordinates for a point in space are just numbers which depend upon the
coordinate frame in a specified way. The coordinates are certainly not points. Nor is any equivalence class
of coordinate/chart pairs (p, ψ) a point. A point is really something outside the scope of pure set theory.
But it is not really an empirical construct either. A point is a psychological construct within the minds of
mathematicians. This construct is useful for modelling the real world, and it may be given coordinates. But
the point itself is undefined, just as in the case of classical Euclidean geometry.
27.1.10 Remark: A point is defined as something which has position but no extent. But this is not very
illuminating. Since points cannot ultimately be defined within mathematics, it is no surprise that vectors are
not defined either. So if it is good enough to define a point as somehow underlying the sets of coordinates that
describe it, then surely this must be good enough for vectors. One may as well define them as equivalence
classes of triples (p, v, ψ) for p, v ∈ IRn just as points are really equivalence classes of pairs (x, ψ) for x ∈ IRn .
If coordinates are good enough for points, surely they are good enough for vectors too. It follows that for
consistency one should define vectors by coordinates rather than by differential operators on function spaces.
27.1.11 Remark: After ten years of occasional meditation on the question, the author finally decided on
22 October 2001 that tangent vectors are neither of the usual candidates. In fact, tangent vectors are a class
of object rather than a particular set or function, even if the manifold is a particular given construction.
To be precise, a tangent vector is any mathematical object which has the correct transformation rules. It
is not necessarily a differential operator, an equivalence class of curves, an equivalence class of coordinates,
nor derivations, germs, jets or distributions. All of these may qualify as tangent vectors as long as they obey
the specified transformation rules.
The following comment was written by this author shortly after arriving at this conclusion.
Solved! I’ve finally on the morning of 22 October 2001, ten years after I started trying to decide the
issue, discovered a good solution to the question of how to represent tangent vectors. The answer
is that one should not select any particular set structure as being the tangent vector structure.
Instead, one should provide a test which any given set structure must pass in order to qualify as a
tangent vector. This is what mathematical physicists do anyway. They define an object, and then
they test it to see if it qualifies as a vector or tensor of some kind.
There is nothing radical about this approach. It is exactly what was done in the case of defining a
manifold. It was nowhere said that a manifold is a member of some specified set of objects, such as
the set of topological grafts of Euclidean spaces with various properties, although that could have
been done. In fact, manifolds can be anything at all that has a suitable topology. A differentiable
manifold is just a manifold with an atlas having the right sort of property. If the points of a manifold
are just anything you like, as long as the structure has the right properties, then why shouldn’t
the tangent bundle also be any set structure which you want to put on a manifold which happens
to pass a qualification test. Therefore from now on, I intend to refer to “a” tangent bundle rather
than “the” tangent bundle of a given manifold. ‘Real tangent vectors’ lie outside the scope of set
theory, just as ‘real points’ do.
Probably the majority of mathematical classes are defined in terms of satisfying specified conditions
rather than as specific set constructions. Thus linear spaces have points in any set at all, with
operations satisfying various conditions, whereas the space IRn is a specific set construction – as
long as a particular representation of the real numbers is decided on. But even the real numbers
or the integers may be regarded as classes of objects rather than particular set constructions. Any
system which is isomorphic to a given system may then be regarded as representing the same object.
27.2. Tangent bundle metadefinition

This section presents a specification, or metadefinition, for tangent bundles. The choice of representation
for a tangent bundle is “outsourced”. That is, any representation may be freely chosen within the specified
rules. Whenever you outsource anything, you must be careful to specify exactly what you expect to get.
This is essentially an axiomatic approach as opposed to a specific representation.
The set of tangent vectors on a manifold is so important that it deserves its own name: “tangent bundle”.
Although the word “bundle” is used, this does not imply that it is a kind of fibre bundle. The word “space”
is not quite right because that would suggest a linear space, which the set of all tangent vectors on a manifold
certainly is not.
A tangent bundle requires additional structure to qualify for the name “fibre bundle”. With the addition of
a topology on the total space and a suitable topological group structure on IRn , a tangent bundle qualifies
as a topological fibre bundle. (See Section 23.6.) If the base manifold M is C 2 , the addition of an induced
differentiable structure on a tangent bundle for M qualifies it as a differentiable fibre bundle. (See
Chapter 34.) Since differentiable fibre bundles are defined in terms of tangent bundles, it is important to avoid
an infinite cycle of definitions.
Metadefinition 27.2.1 provides an acceptance test for any proposed tangent bundle definition for a differen-
tiable manifold. To pass the test, a proposed tangent bundle construction must provide a set T (the “total
tangent space”) whose elements will be called the “tangent vectors” of the manifold. A projection function
π must be provided so that each vector V ∈ T can be associated with a unique base point p = π(V ) ∈ M .
It must be possible to determine the components in IRn of any vector V ∈ T with respect to any given
chart for the manifold. That is, given any chart ψ ∈ AM , it must be possible to determine the components
φψ (V ) ∈ IRn of any vector V ∈ T . The tangent bundle (total space) atlas must be consistent with the
manifold atlas.
[ Try to adapt Metadefinition 27.2.1 to define pseudo-vectors (cross-product in IR3 ) and spinors. Also try
to adapt it for single-sided tangent vectors analogous to unidirectional derivatives. ]
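27.2.1 Metadefinition: A tangent bundle for an n-dimensional $C^1$ manifold $(M, A_M)$ must provide a
tuple $(T, \pi, \hat{A}_T, \Phi)$ −> $(T, \hat{A}_T)$ −> $T$ which satisfies the following conditions.
(i) $\pi : T \to M$ is a surjective map.
(ii) $\Phi : A_M \to \hat{A}_T$ is a bijection.
(iii) $\forall \psi \in A_M,\ \Phi(\psi) : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$.
(iv) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \Phi(\psi)\big|_{\pi^{-1}(\{p\})} : \pi^{-1}(\{p\}) \to \mathbb{R}^n$ is a bijection.
(v) $\forall p \in M,\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M),\ \forall V \in \pi^{-1}(\{p\}),\ \forall i = 1 \ldots n$,
$$\Phi(\psi_2)(V)^i = \sum_{j=1}^n \frac{\partial}{\partial x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(p)} \Phi(\psi_1)(V)^j. \eqno(27.2.1)$$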
T is called the total space of the tangent bundle.

An element of T is called a tangent vector.
π is called the projection map of the tangent bundle.
ÂT is called the tangent bundle atlas of the tangent bundle.
The maps φψ ∈ ÂT are called tangent bundle charts.
Φ is called the lift function of the tangent bundle.
A tangent vector at p ∈ M is any element of π −1 ({p}).
The tangent vector at $p \in M$ with coordinates $v \in \mathbb{R}^n$ with respect to chart $\psi \in A_M$ is the tangent vector
$(\Phi(\psi)\big|_{\pi^{-1}(\{p\})})^{-1}(v) \in T$.

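[Figure 27.2.1: diagram relating a tangent vector V ∈ T, its base point p = π(V) ∈ M, the chart coordinates x = ψ(p) ∈ IRn for ψ ∈ AM, and the components v = φψ(V) ∈ IRn ≡ Tψ(p)(IRn) for φψ ∈ ÂT.]
Figure 27.2.1 Tangent bundle metadefinition “lift” function Φ and “anchor” charts φψ = Φ(ψ)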
27.2.2 Remark: Metadefinition 27.2.1 is illustrated in Figure 27.2.1. The range set $\mathbb{R}^n$ for the functions
$\Phi(\psi)\big|_{\pi^{-1}(\{p\})}$ should be thought of as the tangent space $T_{\psi(p)}(\mathbb{R}^n)$ of $\mathbb{R}^n$ at $\psi(p)$. In other words, the maps
$\Phi(\psi)$ associate tangent vectors of the manifold $M$ at $p$ with tangent vectors of $\mathbb{R}^n$ at $\psi(p)$.
Condition (iii) implies that the functions (ψ ◦ π) × Φ(ψ) for ψ ∈ AM map from subsets of T to IRn × IRn ,
which may be identified with the set of tangent vectors of IRn . The set of the functions (ψ ◦ π) × Φ(ψ) will
be a C 0 atlas for T .
Note that upper-case V is used for tangent vectors in T whereas lower-case v is used for vector components
in IRn . Recall also (Section 5.16) that “A −> B” means that expression B is an abbreviation for expression A.
27.2.3 Remark: Metadefinition 27.2.1 (v) implies that φψ (V ) is uniquely determined for all charts ψ if
the value of φψ (V ) is given for any particular chart ψ.
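The propagation of components from one chart to all others can be sketched numerically. The following Python fragment is an illustration only. The three charts on the right half-plane of IR2 (Cartesian, polar and a linear "skew" chart), the finite-difference Jacobian and all names are assumptions introduced here. It applies rule (27.2.1) to carry components from one chart to another, and checks that going to the third chart directly or via the second chart gives the same answer.

```python
import math

# Three charts on M = {(x, y) : x > 0}, each stored as (chart, inverse chart).
CHARTS = {
    "cartesian": (lambda p: p, lambda c: c),
    "polar": (lambda p: (math.hypot(*p), math.atan2(p[1], p[0])),
              lambda c: (c[0] * math.cos(c[1]), c[0] * math.sin(c[1]))),
    "skew": (lambda p: (p[0] + p[1], p[0] - p[1]),
             lambda c: ((c[0] + c[1]) / 2, (c[0] - c[1]) / 2)),
}

def transition_jacobian(src, dst, p, h=1e-6):
    """Matrix of d/dx^j (psi_dst^i o psi_src^-1)(x) at x = psi_src(p)."""
    psi_src, psi_src_inv = CHARTS[src]
    psi_dst, _ = CHARTS[dst]
    x0 = psi_src(p)
    n = len(x0)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x0); xm = list(x0)
        xp[j] += h; xm[j] -= h
        up = psi_dst(psi_src_inv(tuple(xp)))
        um = psi_dst(psi_src_inv(tuple(xm)))
        for i in range(n):
            J[i][j] = (up[i] - um[i]) / (2 * h)
    return J

def transform(src, dst, p, v):
    """Rule (27.2.1): components in chart dst from components v in chart src."""
    J = transition_jacobian(src, dst, p)
    n = len(v)
    return tuple(sum(J[i][j] * v[j] for j in range(n)) for i in range(n))

p = (1.0, 2.0)
v_cart = (3.0, -1.0)                                # components given in one chart
v_polar = transform("cartesian", "polar", p, v_cart)
v_skew_direct = transform("cartesian", "skew", p, v_cart)
v_skew_via_polar = transform("polar", "skew", p, v_polar)
```

For this choice of p and v, the polar components are approximately (1/√5, −1.4) and both routes to the skew chart give approximately (2, 4).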
27.2.4 Remark: The tangent bundle charts φψ = Φ(ψ) ∈ ÂT in Metadefinition 27.2.1 “anchor” the
vectors of the tangent bundle to particular tangent vectors in the already-defined tangent bundle of IRn .
The familiar, well-understood tangent vectors in flat space are “leveraged” to define tangent vectors on
differentiable manifolds. The tangent bundle charts φψ answer the question: “Which direction does each
tangent vector have?” Condition (v) ensures that the answer to this question is independent of the choice
of chart.
By anchoring tangent vectors on manifolds to tangent vectors on IRn , each tangent vector has well-defined
coordinates for each manifold chart because flat-space tangent vectors have well-defined coordinates. The
projection map π similarly anchors tangent vectors to their base points. The combination of the projection
map and tangent bundle charts completely determines the base point p and the coordinates v for each
chart ψ. This leads naturally to an equivalence class of triples (p, v, ψ) describing each tangent vector. This
is the basis of Definition 27.3.3.
The inverses of tangent bundle charts induce tangent space structure from IRn onto the total space T .
The inverses of charts also induce a topology and manifold atlas on the tangent bundle’s total space. This
automatically gives the total space a topological manifold structure. If this structure is differentiable, it is
possible to build a tangent bundle on the tangent bundle. This is essential for defining the higher-order
derivatives which are required in physical models.
There are many different definitions of tangent vectors on manifolds, but as long as they are anchored to
the flat-space tangent vectors, the existence of unique isomorphisms between the various tangent bundle
definitions is guaranteed. Therefore there is no ambiguity. All tangent bundle definitions are equivalent
and interchangeable. (The same requirement for unique isomorphisms is discussed in Remark 13.5.6 for the
flat-space tensor product metadefinition.)
27.2.5 Remark: The form of Metadefinition 27.2.1 would be easy to generalize by replacing the “fibre
space” IRn with some other space and replacing the general linear group GL(n, IR) which is implicit in the
transition map rule with some other group. In fact, the tangent bundle metadefinition may be thought of
as a template for all fibre bundle definitions. In other words, fibre bundles are merely generalizations of
tangent bundles.

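[Figure 27.2.2: diagram of a tangent vector V ∈ T with base point p = π(V), two charts ψ1 and ψ2 with x1 = ψ1(p) and x2 = ψ2(p) related by ψ2 ◦ ψ1−1, and the tangent bundle charts φψ1 = Φ(ψ1) and φψ2 = Φ(ψ2) with components v1 ∈ IRn ≡ Tψ1(p)(IRn) and v2 ∈ IRn ≡ Tψ2(p)(IRn) related by $\phi_{\psi_2} \circ (\phi_{\psi_1}\big|_{\pi^{-1}(\{p\})})^{-1}$.]
Figure 27.2.2 Tangent bundle metadefinition transition maps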
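27.2.6 Remark: The transition rule condition (v) in Metadefinition 27.2.1 is illustrated in Figure 27.2.2.
The transition map $\phi_{\psi_2} \circ (\phi_{\psi_1}\big|_{\pi^{-1}(\{p\})})^{-1}$ in Metadefinition 27.2.1 (v) is a linear bijection from $\mathbb{R}^n$ to $\mathbb{R}^n$.
In other words, it is an element of $GL(n, \mathbb{R})$. For $p \in M$, let $x_1 = \psi_1(p)$, $x_2 = \psi_2(p)$, $V \in \pi^{-1}(\{p\})$,
$v_1 = \phi_{\psi_1}(V)$ and $v_2 = \phi_{\psi_2}(V)$. Then
$$x_2 = \bigl(\psi_2 \circ \psi_1^{-1}\bigr)(x_1)$$
$$v_2 = \bigl(\phi_{\psi_2} \circ (\phi_{\psi_1}\big|_{\pi^{-1}(\{p\})})^{-1}\bigr)(v_1)$$
and
$$\forall i = 1 \ldots n, \quad v_2^i = \sum_{j=1}^n \frac{\partial}{\partial x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(p)} v_1^j.$$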
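That the component transition map is an element of GL(n, IR) can be checked on a concrete pair of charts. The sketch below is an illustration under assumptions not in the text: ψ1 is the Cartesian chart and ψ2 the polar chart on the right half-plane of IR2, the Jacobians are central-difference approximations, and the names are hypothetical. It computes the matrix which takes v1 to v2, the matrix which takes v2 back to v1, and verifies that they are mutually inverse.

```python
import math

def to_polar(c):
    """psi_2 o psi_1^-1: Cartesian coordinates (x, y) with x > 0 -> (r, theta)."""
    x, y = c
    return (math.hypot(x, y), math.atan2(y, x))

def to_cartesian(c):
    """psi_1 o psi_2^-1: polar coordinates (r, theta) -> (x, y)."""
    r, theta = c
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(F, x, h=1e-6):
    """Central-difference Jacobian matrix J[i][j] = dF^i/dx^j at x (n = 2)."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp = list(x); xm = list(x)
        xp[j] += h; xm[j] -= h
        Fp, Fm = F(tuple(xp)), F(tuple(xm))
        for i in range(2):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

x1 = (3.0, 4.0)                    # x1 = psi_1(p)
x2 = to_polar(x1)                  # x2 = (psi_2 o psi_1^-1)(x1)
J21 = jacobian(to_polar, x1)       # matrix taking v1 to v2
J12 = jacobian(to_cartesian, x2)   # matrix taking v2 back to v1
product = matmul(J12, J21)         # approximately the identity matrix
determinant = det(J21)             # equals 1/r for these two charts
```

At x1 = (3, 4) the determinant is approximately 1/5, so the matrix is invertible, as the remark requires.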
27.2.7 Remark: Metadefinition 27.2.1 (iii) implies that ∀ψ ∈ AM , Dom(Φ(ψ)) = π −1 (Dom(ψ)). This
constraint could be relaxed to permit a tangent space atlas with a very general set of chart domains. But
this book is following a policy of minimal structures. If a manifold has only 2 charts, then the tangent space
gets only 2 charts and they match domain for domain. If a particular tangent space implementation has
more charts, that could reduce the regularity of the tangent space. That would be a different differentiable
structure. The approach taken here maximizes the regularity of the tangent space. [ This remark is similar
to Remark 27.8.9. ]
27.2.8 Remark: Metadefinition 27.2.1 (v) says effectively that anything is a tangent space if it transforms
like a tangent space between charts on the manifold. This kind of outsource-and-test approach permits the
coexistence of a large number of different definitions of tangent spaces. This fits well with general practice
in physics, which is to accept anything as a vector which transforms like a vector.
A good example of accepting anything as a tangent vector which transforms like a tangent vector is
Frankel [19], section 1.3a, page 23. However, physicists typically test mathematical expressions to
determine if they are vectors whereas mathematicians test mathematical set constructions to see if they are
correct representations of the class of tangent bundles. In the former case, the set construction is generally
fixed. In the latter case, it is the set construction which is to be tested, not the form of an expression which
determines the numbers which populate the set construction.
[ Define the topology induced on a tangent bundle by tangent bundle charts. If a tangent bundle definition
has a topology, it must be the same as this induced topology. Also define a differentiable structure on the
tangent bundle induced by the tangent bundle charts. Show some basic properties of the induced topology
and induced differentiable structure. ]

27.2.9 Theorem: The set of maps

AT (M ) = {Qn,n ◦ ((ψ ◦ π) × Φ(ψ)); ψ ∈ AM } ,
where Qn,n : IRn ×IRn → IR2n is the concatenation map in Definition 8.5.3, is a 2n-dimensional differentiable
structure on any tangent bundle defined under Metadefinition 27.2.1 for an n-dimensional manifold.
[ Definition 27.8.4 defines the standard atlas on a concrete tangent bundle as in Theorem 27.2.9. ]
27.3. Tangent vectors

27.3.1 Remark: The two main forms of tangent vector definition used in this book are Definition 27.3.3
(component form) and Definition 27.6.2 (differential operator form). For computational purposes, it is best
to work with coordinates only. Definition 27.4.2 (computational form) contains coordinate numbers only.
Since Metadefinition 27.2.1 requires tangent vector components to be specified for all charts, Definition 27.3.3
offers an excellent combination of simplicity and versatility. It is very easy to convert a component-triple
vector into a tangent operator, but not vice versa.
27.3.2 Definition: A tangent coordinate triple for an n-dimensional C 1 manifold (M, AM ) is a triple
(p, v, ψ) ∈ M × IRn × AM such that p ∈ Dom(ψ).
27.3.3 Definition: A tangent vector for a $C^1$ manifold $M$ −< $(M, A_M)$ with $n = \dim(M)$ is an equivalence
class $[(p, v, \psi)]$ of tangent coordinate triples in $\bigcup_{\psi \in A_M} (\mathrm{Dom}(\psi) \times \mathbb{R}^n \times \{\psi\})$, where the triples $(p_1, v_1, \psi_1)$
and $(p_2, v_2, \psi_2)$ for $\psi_1, \psi_2 \in A_M$ are said to be equivalent whenever $p_1 = p_2 = p$ and
$$\forall i = 1 \ldots n, \quad v_2^i = \sum_{j=1}^n \frac{\partial}{\partial x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(p)} v_1^j. \eqno(27.3.1)$$
27.3.4 Notation: $t_{p,v,\psi}$ for $p \in M$, $v \in \mathbb{R}^n$, $\psi \in \mathrm{atlas}_p(M)$ and $n \in \mathbb{Z}_0^+$ denotes the equivalence class
[(p, v, ψ)] in Definition 27.3.3. In other words, tp,v,ψ = [(p, v, ψ)].
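The equivalence relation of Definition 27.3.3 can be expressed as a predicate on pairs of triples. The following Python sketch is a minimal illustration, assuming two charts (Cartesian and polar) on the right half-plane of IR2, a central-difference approximation of the partial derivatives in (27.3.1), and a numerical tolerance. The class Chart and the function names are hypothetical.

```python
import math
from typing import Callable, NamedTuple, Tuple

Point = Tuple[float, float]

class Chart(NamedTuple):
    name: str
    to_coords: Callable[[Point], Point]     # psi
    from_coords: Callable[[Point], Point]   # psi^-1

CARTESIAN = Chart("cartesian", lambda p: p, lambda c: c)
POLAR = Chart("polar",
              lambda p: (math.hypot(p[0], p[1]), math.atan2(p[1], p[0])),
              lambda c: (c[0] * math.cos(c[1]), c[0] * math.sin(c[1])))

def push_components(p, v, src, dst, h=1e-6):
    """Right-hand side of (27.3.1) with psi_1 = src and psi_2 = dst."""
    x0 = src.to_coords(p)
    out = [0.0, 0.0]
    for j in range(2):
        xp = list(x0); xm = list(x0)
        xp[j] += h; xm[j] -= h
        up = dst.to_coords(src.from_coords(tuple(xp)))
        um = dst.to_coords(src.from_coords(tuple(xm)))
        for i in range(2):
            out[i] += (up[i] - um[i]) / (2 * h) * v[j]
    return tuple(out)

def equivalent(t1, t2, tol=1e-6):
    """True if the triples (p1, v1, psi1) and (p2, v2, psi2) satisfy Definition 27.3.3."""
    (p1, v1, chart1), (p2, v2, chart2) = t1, t2
    if p1 != p2:
        return False
    expected = push_components(p1, v1, chart1, chart2)
    return all(abs(a - b) < tol for a, b in zip(expected, v2))

p = (1.0, 1.0)
t_cart = (p, (1.0, 0.0), CARTESIAN)
t_polar = (p, (math.sqrt(0.5), -0.5), POLAR)   # the same vector in polar components
```

Here equivalent(t_cart, t_polar) holds, so the two triples are representatives of one tangent vector, whereas triples with a different base point or with unrelated components fail the test.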
27.3.5 Remark: The set of triples $\bigcup_{\psi \in A_M} (\mathrm{Dom}(\psi) \times \mathbb{R}^n \times \{\psi\})$ in Definition 27.3.3 is the same as the
set $\bigcup_{p \in M} (\{p\} \times \mathbb{R}^n \times \mathrm{atlas}_p(M))$.
27.3.6 Remark: It is convenient to think of the equivalence class tp,v,ψ = [(p, v, ψ)] in Definition 27.3.3
as the “tangent vector at p with components v with respect to the chart ψ”. If the reader chooses to
use a differential operator representation of tangent vectors instead of components, the expression tp,v,ψ =
[(p, v, ψ)] can be identified with that representation, or any other representation.
Definition 27.3.3 is almost the same as the tangent vector definition by Frankel [19], section 1.3a, page 23,
which is very much in the style favoured by physicists. Frankel essentially defines a tangent vector at p as
an equivalence class of the form [(ψ, v)]. (This equivalence class just happens to be the graph of a function
with domain atlasp (M ).) The Frankel definition requires the association of a coordinate n-tuple v with every
chart ψ so that equation (27.3.1) is satisfied.
One may similarly regard an equivalence class [(p, v, ψ)] as a function with domain {p} × atlasp (M ) and
range IRn by identifying each triple (p, v, ψ) with the pair ((p, ψ), v).
The main problem with regarding the components v as a function of a set of charts is that the function’s
domain is atlasp (M ), a set of charts which varies according to the point p ∈ M . Therefore this representation
would define the set of vectors on a manifold to be a set of functions with an untidy, variable set of point-
dependent domains. Therefore the “function-of-charts” representation is not used in this book.
27.3.7 Remark: There could be benefits in changing the order of tangent component triples from (p, v, ψ)
to (p, ψ, v). This would help to clarify that the components v are a function of the set of pairs (p, ψ), as
noted in Remark 27.3.6. The altered order would also be better for generalizations to higher-order tangent
objects which would then be specified by quadruples like (p, ψ, v, a), where v and a are the first-order and
second-order parts respectively, and so forth for progressively higher-order tangent objects.
An argument in favour of putting the chart ψ at the end of a tangent vector component triple is that
generally one wants to focus on the pair (p, v). Having the chart ψ in the middle is a distraction. In
practical applications, the chart is mostly fixed while the points and vectors vary. It is for this reason that
the order in Definition 27.3.3 was chosen.

27.3.8 Notation: Tp (M ) denotes the set of tangent vectors tp,v,ψ at a point p in a C 1 manifold M . In
other words,
Tp (M ) = {tp,v,ψ ; v ∈ IRn , ψ ∈ atlasp (M )}.
27.3.9 Notation: $T(M) = \bigcup_{p \in M} T_p(M)$ denotes the set of all tangent vectors on a $C^1$ manifold $M$. In
other words,
T (M ) = {tp,v,ψ ; p ∈ M, v ∈ IRn , ψ ∈ atlasp (M )}.
27.3.10 Remark: Notations 27.3.8 and 27.3.9 demonstrate an annoying difficulty with the way sets are
notated. It would have been fairly logical to write
Tp (M ) = {tp,v,ψ ; v ∈ IRn , ψ ∈ AM , p ∈ Dom(ψ)}

and
T (M ) = {tp,v,ψ ; p ∈ M, v ∈ IRn , ψ ∈ AM , p ∈ Dom(ψ)},
which is apparently the same as in Notations 27.3.8 and 27.3.9. The problem is that this would define Tp (M )
and T (M ) to be the same thing. In the above specification for Tp (M ), however, the predicate p ∈ Dom(ψ)
is supposed to indicate a condition to be satisfied, whereas the predicate p ∈ M for T (M ) indicates that p is
a free (or “dummy”) variable. What is really required is a way of indicating that p is fixed on the right-hand
side for Tp (M ), but is a free variable for T (M ).
[ This would be a good location for the definition of a tangent bundle. ]
27.3.11 Remark: A useful mnemonic for equations (27.3.1) and (27.4.1) is
$$v_\beta^i = \frac{\partial \psi_\beta^i}{\partial \psi_\alpha^j}(p)\, v_\alpha^j.$$
See Remark 27.5.16 for further comment on this.
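As a worked instance of this mnemonic, consider the following example, which is not taken from the text. Let ψα = (x, y) be the Cartesian chart and ψβ = (r, θ) the polar chart on the right half-plane of IR2. Summing over the repeated index j gives the familiar formulas for the polar components of a vector.

```latex
$$\psi_\alpha = (x, y), \qquad \psi_\beta = (r, \theta), \qquad
  r = \sqrt{x^2 + y^2}, \quad \theta = \arctan(y/x) \quad (x > 0),$$
$$v_\beta^1 = \frac{\partial r}{\partial x}\, v_\alpha^1 + \frac{\partial r}{\partial y}\, v_\alpha^2
            = \frac{x\, v_\alpha^1 + y\, v_\alpha^2}{r},$$
$$v_\beta^2 = \frac{\partial \theta}{\partial x}\, v_\alpha^1 + \frac{\partial \theta}{\partial y}\, v_\alpha^2
            = \frac{-y\, v_\alpha^1 + x\, v_\alpha^2}{r^2}.$$
```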
27.4. Computational tangent vectors
27.4.1 Definition: A computational tangent coordinate triple for an n-dimensional C 1 manifold (M, AM )
with indexed atlas (ψα )α∈I is a triple (x, v, α) ∈ IRn × IRn × I such that x ∈ Range(ψα ).
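27.4.2 Definition: A computational tangent vector for a $C^1$ manifold $(M, A_M)$ with $n = \dim(M)$ and
indexed atlas $(\psi_\alpha)_{\alpha \in I}$ is an equivalence class of coordinate triples in $\bigcup_{\alpha \in I} (\mathrm{Range}(\psi_\alpha) \times \mathbb{R}^n \times \{\alpha\})$, where
the triples $(x_\alpha, v_\alpha, \alpha)$ and $(x_\beta, v_\beta, \beta)$ for $\alpha, \beta \in I$ are said to be equivalent whenever $\psi_\alpha^{-1}(x_\alpha) = \psi_\beta^{-1}(x_\beta)$
and
$$\forall i = 1 \ldots n, \quad v_\beta^i = \sum_{j=1}^n \frac{\partial}{\partial x^j}\bigl(\psi_\beta^i \circ \psi_\alpha^{-1}(x)\bigr)\Big|_{x=x_\alpha} v_\alpha^j. \eqno(27.4.1)$$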
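A small Python sketch may help to show how such number-only triples are compared. The example is an assumption made for illustration: the manifold is the circle S1 in IR2 with n = 1, the indexed atlas consists of the two stereographic projections with index set I = {"N", "S"}, the derivative in (27.4.1) is approximated by a central difference, and all names are hypothetical. For these charts the transition map is x ↦ 1/x.

```python
# Indexed atlas for the unit circle: index -> (chart, inverse chart).
# "N" projects from the north pole (0, 1), "S" from the south pole (0, -1).
ATLAS = {
    "N": (lambda P: P[0] / (1 - P[1]),
          lambda x: (2 * x / (x * x + 1), (x * x - 1) / (x * x + 1))),
    "S": (lambda P: P[0] / (1 + P[1]),
          lambda x: (2 * x / (x * x + 1), (1 - x * x) / (x * x + 1))),
}

def transition(alpha, beta, x):
    """psi_beta o psi_alpha^-1 evaluated at the coordinate x."""
    return ATLAS[beta][0](ATLAS[alpha][1](x))

def transition_derivative(alpha, beta, x, h=1e-6):
    """Central-difference derivative of the transition map at x (n = 1)."""
    return (transition(alpha, beta, x + h) - transition(alpha, beta, x - h)) / (2 * h)

def equivalent(t_alpha, t_beta, tol=1e-6):
    """Test of Definition 27.4.2 for triples (x, v, index) in the case n = 1."""
    (xa, va, alpha), (xb, vb, beta) = t_alpha, t_beta
    Pa, Pb = ATLAS[alpha][1](xa), ATLAS[beta][1](xb)
    same_point = all(abs(a - b) < tol for a, b in zip(Pa, Pb))
    return same_point and abs(transition_derivative(alpha, beta, xa) * va - vb) < tol

t_north = (2.0, 1.0, "N")     # coordinate x = 2, component v = 1, chart index "N"
t_south = (0.5, -0.25, "S")   # x_S = 1 / x_N and v_S = -v_N / x_N**2
```

The triples t_north and t_south contain only real numbers and an index, yet they pass the equivalence test in either order, because both describe the same vector at the point (0.8, 0.6) of the circle.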
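27.4.3 Remark: The set $\bigcup_{\psi \in A_M} (\mathrm{Dom}(\psi) \times \mathbb{R}^n \times \{\psi\})$ in Definition 27.3.3 is a subset of $M \times \mathbb{R}^n \times A_M$. The
equivalence classes of component triples effectively constitute a “graft” of the product sets $\mathrm{Dom}(\psi_\alpha) \times \mathbb{R}^n$.
(See Definition 6.10.5 for graft sets.)
The set $\bigcup_{\alpha \in I} (\mathrm{Range}(\psi_\alpha) \times \mathbb{R}^n \times \{\alpha\})$ in Definition 27.4.2 is a subset of $\mathbb{R}^n \times \mathbb{R}^n \times I$. The equivalence
classes of component triples effectively constitute a “graft” of the product sets $\mathrm{Range}(\psi_\alpha) \times \mathbb{R}^n$.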
Darling [14], page 135, section 6.6.2, gives a definition of tangent vectors which is almost identical to
Definitions 27.3.3 and 27.4.2. Darling uses equivalence classes of the form [(p, α, v)] where p ∈ M , α ∈ I
and v ∈ IRn . For such a definition, the α ∈ I is convenient for computation as in Definition 27.4.2, but the
p ∈ M is an abstract point as in Definition 27.3.3. It is, in fact, probably more logical to make the chart
index precede the vector components, since then the index stays in the same place if higher-order objects
are represented in a similar way. [ This remark is related to Remark 27.3.7. ]

27.5. Tangent operators

A tangent operator is an action of a tangent vector on (real-valued) differentiable functions on a manifold.
There is an almost exact correspondence between a tangent vector defined in terms of components and the
differential operator which is constructed from this. Many authors use tangent operators as their primary
definition of tangent vectors, but as discussed in Section 27.1, tangent operators have both advantages and
disadvantages as a primary definition. The disadvantages seem to outweigh the advantages. Therefore
Definition 27.3.3 is adopted as the basic definition and Definition 27.5.1 is a secondary definition.
27.5.1 Definition: A tangent operator on a C 1 manifold M is a function ∂p,v,ψ : C 1 (M ) → IR which is
defined for p ∈ M , v ∈ IRn and ψ ∈ atlasp (M ) by
$$\forall f\in C^1(M),\quad \partial_{p,v,\psi}(f) = \sum_{i=1}^n v^i\,\frac{\partial}{\partial x^i}(f\circ\psi^{-1})(x)\bigg|_{x=\psi(p)}.$$
The n-tuple v is called the coefficient vector or coordinate vector of the tangent operator ∂p,v,ψ with respect
to the chart ψ.
The triple (p, v, ψ) is called the coordinate triple for the tangent operator ∂p,v,ψ .
The point p is called the base point of the tangent operator ∂p,v,ψ .
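The formula in Definition 27.5.1 can be tried out numerically. The following is a minimal sketch, not part of the original text: it assumes that M is the open right half-plane of IR2, that ψ is the polar coordinate chart, and that the partial derivatives are approximated by central differences. The names tangent_operator, polar and polar_inv are hypothetical.

```python
import math

def tangent_operator(p, v, psi, psi_inv, h=1e-6):
    """Return f -> sum_i v[i] * d/dx^i (f o psi_inv)(x) evaluated at x = psi(p).

    Partial derivatives are approximated by central differences (an
    assumption of this sketch, not part of Definition 27.5.1).
    """
    x0 = psi(p)

    def apply(f):
        total = 0.0
        for i, vi in enumerate(v):
            x_plus, x_minus = list(x0), list(x0)
            x_plus[i] += h
            x_minus[i] -= h
            total += vi * (f(psi_inv(x_plus)) - f(psi_inv(x_minus))) / (2.0 * h)
        return total

    return apply

# Hypothetical chart: polar coordinates on the open right half-plane.
def polar(p):
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))

def polar_inv(x):
    return (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))

f = lambda q: q[0] ** 2 + 3.0 * q[1]   # a C^1 test function on M
p = (1.0, 1.0)                         # base point
v = (0.5, -2.0)                        # coefficient vector w.r.t. the polar chart
value = tangent_operator(p, v, polar, polar_inv)(f)
print(value)
```

Here f ◦ ψ −1 (r, θ) = r2 cos2 θ + 3r sin θ, so the exact value at p = (1, 1) is 1.25 √2 − 2, which the sketch should reproduce up to discretization error.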
27.5.2 Remark: Definition 27.5.1 is illustrated in Figure 27.5.1. The symbol ∂i is shorthand for ∂/∂xi .
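[ Figure 27.5.1 omitted: the chart ψ maps U = Dom(ψ) ⊆ M to ψ(U ) ⊆ IRn, taking p to ψ(p); the function f maps M to IR, and the level curves of f ◦ ψ −1 are drawn in ψ(U ). ]
Figure 27.5.1 Tangent operator definition: $\partial_{p,v,\psi}(f)=\sum_{i=1}^n v^i\,\partial_i(f\circ\psi^{-1})(\psi(p))$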
27.5.3 Notation: T̊p (M ) denotes the set of tangent operators with base point p ∈ M in a C 1 manifold M .
27.5.4 Notation: T̊ (M ) denotes the set $\bigcup_{p\in M}\mathring{T}_p(M)$ of all tangent operators on a C 1 manifold M .
27.5.5 Remark: One of the difficulties with the action space C 1 (M ) for tangent operators is that very
often one wishes to apply tangent operators to functions which are not globally C 1 or not defined on the whole
manifold M . A tangent operator ∂p,v,ψ is well-defined for any function f ∈ C̊p1 (M ). (See Notation 26.6.7.)
In fact, the function f only needs to be differentiable at the point p. Even if f is not differentiable at p,
it could be differentiated as a generalized function or distribution. Therefore it is usually “understood”
that tangent operators are actually a formal or symbolic operation or procedure, not a function with a
predetermined domain and range. So the expression ∂p,v,ψ should be thought of as the abstract operator
$f\mapsto\lim_{a\to0} a^{-1}\bigl(f(\psi^{-1}(\psi(p)+av))-f(p)\bigr)$, where f is any sort of thing at all. This is generally accepted
in practice. In this way of thinking, a tangent operator isn’t a function. It’s a procedure. The procedure
requires test functions and a differentiable structure. The main problem with the “differentiation procedure”
view of tangent operators is the hidden dependency of the limit procedure on a differentiable structure.
The choice of C 1 (M ) as the action space for tangent operators is a reasonable compromise. It is always
well-defined on C 1 manifolds. And after all, it doesn’t matter at all because it’s only the symbolic form of
the operator that matters.
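The "differentiation procedure" view can be illustrated by evaluating the difference quotient (f (ψ −1 (ψ(p) + av)) − f (p))/a for a decreasing sequence of steps a. This is only a sketch under stated assumptions: M = IR2 with the identity chart, and a particular smooth test function chosen for illustration. The name difference_quotient is hypothetical.

```python
import math

def difference_quotient(f, p, v, psi, psi_inv, a):
    """(f(psi_inv(psi(p) + a*v)) - f(p)) / a for a non-zero step a."""
    x = psi(p)
    shifted = [xi + a * vi for xi, vi in zip(x, v)]
    return (f(psi_inv(shifted)) - f(p)) / a

# Assumed setting: M = R^2 with the identity chart.
identity = lambda q: tuple(q)

f = lambda q: math.sin(q[0]) * q[1]
p = (0.0, 2.0)
v = (1.0, 1.0)

steps = [10.0 ** (-k) for k in range(1, 6)]
quotients = [difference_quotient(f, p, v, identity, identity, a) for a in steps]

# Exact value of the tangent operator for this f, computed by hand.
exact = math.cos(p[0]) * p[1] * v[0] + math.sin(p[0]) * v[1]
errors = [abs(q - exact) for q in quotients]
print(errors)
```

The procedure itself never refers to C 1 (M ). It only needs f to be defined near p along the coordinate line through ψ(p), which is the point made in the remark.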

27.5.6 Remark: One of the most unsatisfying aspects of tangent operators as a primary definition for
tangent vectors is the difficulty of differentiating such operators. The ability to differentiate vector fields
is essential for most applications. But to differentiate an operator-valued function requires some sort of
differentiable structure on the set of tangent operators. How does one coordinatize the set of all tangent
operators on a manifold? The obvious way to do this is to use the coefficient n-tuples v of operators ∂p,v,ψ .
In other words, one arrives back at the coordinate representation of tangent vectors. Differentiating in the
general space of operators on spaces like C 1 (M ) leads down complicated, confusing pathways from which
it is difficult to return. If most common operations on tangent vectors require a reversion to coordinates
anyway, one might as well use coordinates as the primary definition. That is precisely the thinking behind
the primary definition choice in this book.
27.5.7 Notation: DV for any tangent vector V = tp,v,ψ ∈ T (M ) for a C 1 manifold M denotes the
corresponding tangent operator. In other words, DV = ∂p,v,ψ ∈ T̊ (M ).
27.5.8 Remark: The D-notation for tangent operators in Notation 27.5.7 is quite standard, but it can be
ambiguous. In Notation 27.5.7, D is presented as a function D : T (M ) → T̊ (M ). But an operator such
as DV may be applied to the C ∞ (M ) functions or the C 1 (M ) functions, or it may be applied to Schwartz
distributions or tempered distributions on the manifold. DV may also apply to vector or tensor fields rather
than real-valued functions, or it may apply to differential forms of various kinds. (In these cases, DV signifies
a covariant derivative with respect to an affine connection.) The subscript V may refer to a tangent vector
at just one point p ∈ M , or it may refer to a vector field, in which case DV would yield a function on a
whole region of definition of its argument.
It follows from this remark that one must be very careful to determine which kind of derivative is indicated
by the notation DV in each context and what the domain and range of the operator are. Mostly, when the
domain of definition of the operator is known, the appropriate definition may be determined.
[ The DV notation is used as here in Frankel [19], section 1.3b, page 24. ]
27.5.9 Example: The linear space IRn is easily given a C ∞ manifold structure by defining an atlas with
a single chart, namely the identity map on IRn . (See Definition 26.4.1.) Tangent vectors for the C ∞
manifold M = IRn with this standard differentiable structure are of the form tp,v,ψ0 , where p, v ∈ IRn and
ψ0 : IRn → IRn is the identity map. The equivalence class tp,v,ψ0 = [(p, v, ψ0 )] is the singleton set {(p, v, ψ0 )}.
The set of tangent vectors T (M ) = {tp,v,ψ ; p ∈ M, v ∈ IRn , ψ ∈ atlasp (M )} in Notation 27.3.9 is T (IRn ) =
{{(p, v, ψ0 )}; p, v ∈ IRn }.
The set T̊ (M ) of tangent operators for this manifold is the set $\mathring{T}(\mathbb{R}^n)=\{\sum_{i=1}^n v^i\partial_i^p;\ p,v\in\mathbb{R}^n\}$, where
$\sum_{i=1}^n v^i\partial_i^p$ is shorthand for the map $f\mapsto\sum_{i=1}^n v^i(\partial/\partial x^i)f(x)\big|_{x=p}$ for $f\in C^1(\mathbb{R}^n)$.
The tangent bundle total space T (IRn ) for the differentiable manifold IRn is not the same thing as the tangent
space T (IRn ) of the linear space IRn . However, it is mostly harmless to think of these two structures as being
the same thing.
27.5.10 Remark: Theorem 27.5.11 means that Definitions 27.3.3 and 27.5.1 are consistent with each other.
If the coordinate vectors are equal, then the operators are equal. In other words, the choice of representative
of the equivalence class of coordinate triples is immaterial. So the D-operation in Notation 27.5.7 commutes
with changes of chart. Theorem 29.1.3 is similar to Theorem 27.5.11.
27.5.11 Theorem: If the tangent vectors tp1 ,v1 ,ψ1 and tp2 ,v2 ,ψ2 of a C 1 manifold are equal then the tangent
operators ∂p1 ,v1 ,ψ1 and ∂p2 ,v2 ,ψ2 are equal.
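Proof: Suppose tp1 ,v1 ,ψ1 = tp2 ,v2 ,ψ2 . By Definition 27.3.3, p1 = p2 and $v_2^i=\sum_{j=1}^n v_1^j\,\partial_j(\psi_2^i\circ\psi_1^{-1})$ for all i = 1, . . . n. So by Definition 27.5.1 and the chain rule,
$$\begin{aligned}
\forall f\in C^1(M),\quad \partial_{p_1,v_1,\psi_1}(f)
&= \sum_{i=1}^n v_1^i\,\frac{\partial}{\partial x^i}(f\circ\psi_1^{-1})(x)\bigg|_{x=\psi_1(p_1)} \\
&= \sum_{i,j=1}^n v_1^i\,\frac{\partial}{\partial\tilde x^j}(f\circ\psi_2^{-1})(\tilde x)\bigg|_{\tilde x=\psi_2(p_2)}\,\frac{\partial}{\partial x^i}(\psi_2^j\circ\psi_1^{-1})(x)\bigg|_{x=\psi_1(p_1)} \\
&= \sum_{j=1}^n v_2^j\,\frac{\partial}{\partial\tilde x^j}(f\circ\psi_2^{-1})(\tilde x)\bigg|_{\tilde x=\psi_2(p_2)} \\
&= \partial_{p_2,v_2,\psi_2}(f),
\end{aligned}$$
where $v_2^j=\sum_{i=1}^n v_1^i\,\partial_i(\psi_2^j\circ\psi_1^{-1})$ for j = 1, . . . n.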
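Theorem 27.5.11 can be checked numerically for a concrete pair of charts. The sketch below is illustrative only. It assumes the open right half-plane with a Cartesian chart ψ1 and a polar chart ψ2, and it uses central differences for all derivatives. The helper names jacobian, mat_vec and tangent_operator are hypothetical.

```python
import math

H = 1e-6  # finite-difference step (assumption of this sketch)

def jacobian(g, x):
    """Central-difference Jacobian: J[i][j] approximates d g^i / d x^j at x."""
    columns = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += H
        x_minus[j] -= H
        g_plus, g_minus = g(x_plus), g(x_minus)
        columns.append([(a - b) / (2.0 * H) for a, b in zip(g_plus, g_minus)])
    return [[col[i] for col in columns] for i in range(len(columns[0]))]

def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def tangent_operator(p, v, psi, psi_inv):
    """f -> sum_i v^i d_i (f o psi_inv) at psi(p)."""
    x0 = psi(p)
    def apply(f):
        gradient = jacobian(lambda x: (f(psi_inv(x)),), x0)[0]
        return sum(vi * gi for vi, gi in zip(v, gradient))
    return apply

# Two hypothetical charts on the open right half-plane.
cart = lambda q: (q[0], q[1])
cart_inv = lambda x: (x[0], x[1])
polar = lambda q: (math.hypot(q[0], q[1]), math.atan2(q[1], q[0]))
polar_inv = lambda x: (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))

p = (1.0, 1.0)
v1 = [1.0, 2.0]
# v2^j = sum_i v1^i d_i (psi2^j o psi1^{-1}) evaluated at psi1(p)
v2 = mat_vec(jacobian(lambda x: polar(cart_inv(x)), cart(p)), v1)

f = lambda q: q[0] ** 2 + 3.0 * q[1]
lhs = tangent_operator(p, v1, cart, cart_inv)(f)
rhs = tangent_operator(p, v2, polar, polar_inv)(f)
print(lhs, rhs)
```

For this f the Cartesian computation gives 2 · 1 + 3 · 2 = 8 exactly, and the polar computation should agree with it up to discretization error.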
27.5.12 Remark: Theorem 27.5.11 has a partial converse. The converse is true if at least one of the
tangent operators is non-zero. The proof of this converse (Theorem 27.5.13) requires the construction of an
appropriate pair (or set?) of separating test functions.
27.5.13 Theorem: If tangent operators ∂p1 ,v1 ,ψ1 and ∂p2 ,v2 ,ψ2 on a C 1 manifold are equal and non-zero,
then tp1 ,v1 ,ψ1 = tp2 ,v2 ,ψ2 .
27.5.14 Theorem: Let M be an n-dimensional C 1 manifold with indexed atlas (ψα )α∈I . Let p ∈ M ,
α, β ∈ I, and vα , vβ ∈ IRn be such that ∂p,vα ,ψα = ∂p,vβ ,ψβ . Then vβ = Zβα (p)vα , where the matrix
Zβα (p) ∈ GL(n) for p ∈ Dom(ψα ) ∩ Dom(ψβ ) is as in Definition 26.5.7.
27.5.15 Remark: The matrix Zβα (p) in Theorem 27.5.14 may be thought of as the contravariant trans-
formation rule from α-coordinates to β-coordinates. The right-to-left order is due to the use of matrix
multiplication of column vectors from the left, which is the usual order in finite linear space theory. (The
reverse convention is followed in Markov probability theory.) Thus Zγα (p) = Zγβ (p)Zβα (p) for all α, β, γ ∈ I
such that p ∈ Dom(ψα ) ∩ Dom(ψβ ) ∩ Dom(ψγ ).
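The composition rule Zγα (p) = Zγβ (p)Zβα (p) is the chain rule for transition maps, and it can be checked numerically. The sketch below assumes that Zβα (p) is the Jacobian matrix of ψβ ◦ ψα −1 at ψα (p), which is how the matrix is used in Theorem 27.5.14. The three charts and all function names are hypothetical choices made for illustration.

```python
import math

H = 1e-6  # finite-difference step (assumption of this sketch)

def jacobian(g, x):
    """Central-difference Jacobian: J[i][j] approximates d g^i / d x^j at x."""
    columns = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += H
        x_minus[j] -= H
        g_plus, g_minus = g(x_plus), g(x_minus)
        columns.append([(a - b) / (2.0 * H) for a, b in zip(g_plus, g_minus)])
    return [[col[i] for col in columns] for i in range(len(columns[0]))]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Three hypothetical charts (chart, inverse) on the open right half-plane.
charts = {
    "alpha": (lambda q: (q[0], q[1]),
              lambda x: (x[0], x[1])),
    "beta": (lambda q: (math.hypot(q[0], q[1]), math.atan2(q[1], q[0])),
             lambda x: (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))),
    "gamma": (lambda q: (q[0] + q[1] ** 3, q[1]),
              lambda x: (x[0] - x[1] ** 3, x[1])),
}

def Z(to, frm, p):
    """Jacobian of psi_to o psi_frm^{-1} at psi_frm(p) (assumed meaning of Z)."""
    psi_to, _ = charts[to]
    psi_frm, psi_frm_inv = charts[frm]
    return jacobian(lambda x: psi_to(psi_frm_inv(x)), psi_frm(p))

p = (1.0, 1.0)
Z_ga = Z("gamma", "alpha", p)
product = mat_mul(Z("gamma", "beta", p), Z("beta", "alpha", p))
max_gap = max(abs(Z_ga[i][j] - product[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The right-to-left order of the indices matches the order of the matrix product, as the remark describes.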
27.5.16 Remark: The tangent operator ∂p,v,ψ in Definition 27.5.1 may be written more colloquially as
$$\partial_{p,v,\psi} = \sum_{i=1}^n v^i\,\frac{\partial}{\partial x^i}\bigg|_{x=\psi(p)} \quad\text{or}\quad \sum_{i=1}^n v^i\,\frac{\partial}{\partial\psi^i}(p) \quad\text{or}\quad v^i\partial_i^{p,\psi}.$$
These useful mnemonic notations are similar to the notations in Remark 26.5.9. When p and ψ are clear
from the context, the simpler notation $v^i\partial_i$ may be used.
27.5.17 Remark: Figure 27.5.2 shows the relations between tangent operators, real-valued functions, and
points in a manifold. In this diagram, f appears twice, first as a function from points p ∈ M to IR, and then
as a point f ∈ C 1 (M ) being mapped to IR by the operator ∂p,v,ψ . In one context, f is a function, whereas
in the other it is a ‘point’ in the domain of the operator ∂p,v,ψ .
[ Figure 27.5.2 omitted: f ∈ C 1 (M ) is mapped to IR by ∂p,v,ψ , and p ∈ M is mapped to IR by f . ]
Figure 27.5.2 Tangent operator map ∂p,v,ψ : C 1 (M ) → IR
27.5.18 Remark: Theorem 26.6.9 is rewritten in the language of tangent operators in Theorem 27.5.19.
The expression $(d/dt)(u\circ\gamma(t))\big|_{t=x} = 0$ in Theorem 26.7.6 for a C 1 curve γ : IR → M , where γ(x) = p,
equates to the expression $D_{\gamma'(x)}u$, where $\gamma'(x)$ is as in Definition 30.4.1.
27.5.19 Theorem: If p ∈ M is a local maximum of a function u ∈ C 1 (M ), where M is a C 1 manifold,
then DV (u) = 0 for all V ∈ Tp (M ).
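A small numerical illustration of Theorem 27.5.19 follows. It is a sketch under stated assumptions: M is the open right half-plane with a polar chart, u is a particular function with a maximum at (1, 2), and derivatives are approximated by central differences. The function name directional_derivative is hypothetical.

```python
import math

def directional_derivative(u, p, v, psi, psi_inv, h=1e-6):
    """Approximate sum_i v^i d_i (u o psi_inv) at psi(p) by central differences."""
    x0 = psi(p)
    total = 0.0
    for i, vi in enumerate(v):
        x_plus, x_minus = list(x0), list(x0)
        x_plus[i] += h
        x_minus[i] -= h
        total += vi * (u(psi_inv(x_plus)) - u(psi_inv(x_minus))) / (2.0 * h)
    return total

# Hypothetical chart: polar coordinates on the open right half-plane.
polar = lambda q: (math.hypot(q[0], q[1]), math.atan2(q[1], q[0]))
polar_inv = lambda x: (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))

u = lambda q: -((q[0] - 1.0) ** 2) - (q[1] - 2.0) ** 2   # maximum at (1, 2)

p_max = (1.0, 2.0)
at_max = [directional_derivative(u, p_max, v, polar, polar_inv)
          for v in [(1.0, 0.0), (0.0, 1.0), (-3.0, 0.7)]]

# At a point which is not a maximum, the radial derivative is non-zero.
away = directional_derivative(u, (1.5, 2.0), (1.0, 0.0), polar, polar_inv)
print(at_max, away)
```

Every coefficient vector gives a value close to zero at the maximum, whereas the radial derivative at (1.5, 2) is close to −0.6.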

27.6. Tagged tangent operators

27.6.1 Remark: A “tagged” tangent operator on a manifold M is a pair of the form (p, ∂p,v,ψ ), where
∂p,v,ψ is a tangent operator as in Definition 27.5.1. The reason for tagging is to resolve the ambiguity of
operators ∂p,0,ψ with zero coefficients. (See Remark 27.5.12.) It is not possible to construct a tangent bundle
from tangent operators without such tagging.
Although non-zero tangent operators can be disambiguated in principle, in practice it is very burdensome
to determine the base point p and coefficients v (for a given chart ψ) from the action of an operator on test
functions. So even without the zero-vector ambiguity, it would be a good idea to use tagging. (Malliavin [36],
section I.7.1, page 64, also comments on the topic of ambiguity of untagged tangent operators.)
27.6.2 Definition: A tagged tangent operator on a C 1 manifold (M, AM ) is a pair (p, ∂p,v,ψ ) such that
p ∈ M and ∂p,v,ψ : C 1 (M ) → IR is a tangent operator at p.
The vector v is called the coefficient vector, coordinate vector, coefficients or coordinates of the tagged tangent
operator (p, ∂p,v,ψ ) with respect to the chart ψ.
27.6.3 Notation: T̂p (M ) for a manifold M and point p ∈ M denotes the set of tagged tangent operators
(p, ∂p,v,ψ ) at p.
27.6.4 Notation: T̂ (M ) for a manifold M denotes the set $\bigcup_{p\in M}\hat{T}_p(M)$ of all tagged
tangent operators (p, ∂p,v,ψ ) on M .
[ Near here, define the standard atlas on T̂ (M ). ]
27.6.5 Notation: D̂V for a tangent vector V = tp,v,ψ ∈ T (M ) on a C 1 manifold M denotes the tagged
tangent operator (p, DV ) = (p, ∂p,v,ψ ) ∈ T̂ (M ).
27.6.6 Remark: A tangent bundle based on the tagged tangent operator in Definition 27.6.2 satisfies
Metadefinition 27.2.1, but it is certainly not an economical tangent vector definition. The space C 1 (M )
is a very large space of functions. If p is specified, it is actually sufficient to consider only the functions
f k ∈ C 1 (M ) defined by f k (p) = ψ(p)k for p ∈ Dom(ψ) and k = 1 . . . n (with appropriate smoothing
at the boundary of Dom(ψ) to ensure extendability to all of M ). Then ∂p,v,ψ (f k ) = v k for all v ∈ IRn
and k = 1 . . . n. In other words, the action of ∂p,v,ψ on a well-chosen set of n functions contains the same
information as the action on the entire space C 1 (M ). But of course, the main objective of this representation
is usefulness in theoretical applications, not economy for data structures in computer software.
It is possible to recover the base point p and the component vector v ∈ IRn from a tangent operator ∂p,v,ψ
if the point p is not known, but it is more difficult. If the operator is non-zero, it is sufficient
to apply the operator to the functions $f_k$ and $f_{k\ell}$ defined by $f_k : q\mapsto\psi(q)^k$ and $f_{k\ell} : q\mapsto\psi(q)^k\psi(q)^\ell$
for q ∈ Dom(ψ). Let $\alpha_i=\partial_{p,v,\psi}(f_i)$ and $\beta_{ij}=\partial_{p,v,\psi}(f_{ij})$, and let x = ψ(p). By the product rule,
$\beta_{ij}=v^ix^j+x^iv^j$. Then clearly $v^i=\alpha_i$ for all i = 1 . . . n, and $x^i=\beta_{ii}/2\alpha_i$ for all i such that $\alpha_i\ne0$.
Choose j such that $\alpha_j\ne0$. Then $x^i=\beta_{ij}/\alpha_j$ for all i such that $\alpha_i=0$. Thus both the point $p=\psi^{-1}(x)$
and v ∈ IRn are determined from the action of ∂p,v,ψ on at most $n+n^2$ test functions.
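The recovery procedure in this remark can be sketched as a short program. The code treats the operator as a black box which can only be applied to test functions on M. The polar chart, the finite-difference implementation of the operator, the tolerance tol and the names make_operator and recover are all assumptions made for illustration. The recovery uses the product-rule identity βij = v i xj + xi v j , where x = ψ(p).

```python
import math

def make_operator(p, v, psi, psi_inv, h=1e-6):
    """Black-box tangent operator acting on functions on M (central differences)."""
    x0 = psi(p)
    def apply(f):
        total = 0.0
        for i, vi in enumerate(v):
            x_plus, x_minus = list(x0), list(x0)
            x_plus[i] += h
            x_minus[i] -= h
            total += vi * (f(psi_inv(x_plus)) - f(psi_inv(x_minus))) / (2.0 * h)
        return total
    return apply

def recover(op, psi, n, tol=1e-6):
    """Recover (x, v) with x = psi(p) from the action of op on f_k and f_kl."""
    alpha = [op(lambda q, k=k: psi(q)[k]) for k in range(n)]
    beta = [[op(lambda q, k=k, l=l: psi(q)[k] * psi(q)[l]) for l in range(n)]
            for k in range(n)]
    j = max(range(n), key=lambda i: abs(alpha[i]))   # an index with alpha_j != 0
    if abs(alpha[j]) <= tol:
        raise ValueError("zero operator: the base point cannot be recovered")
    x = []
    for i in range(n):
        if abs(alpha[i]) > tol:
            x.append(beta[i][i] / (2.0 * alpha[i]))   # beta_ii = 2 v^i x^i
        else:
            x.append(beta[i][j] / alpha[j])           # beta_ij = x^i v^j when v^i = 0
    return x, alpha

# Hypothetical chart: polar coordinates on the open right half-plane.
polar = lambda q: (math.hypot(q[0], q[1]), math.atan2(q[1], q[0]))
polar_inv = lambda x: (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))

op = make_operator((1.0, 1.0), (0.0, 3.0), polar, polar_inv)
x_rec, v_rec = recover(op, polar, 2)
p_rec = polar_inv(x_rec)
print(p_rec, v_rec)
```

The example deliberately uses a coefficient vector with one zero component so that both branches of the recovery are exercised. With floating-point arithmetic, the test "αi = 0" has to be replaced by a tolerance, which is one practical reason why tagging is less burdensome.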
27.6.7 Remark: The set of tagged tangent operators for the manifold IRn with the standard differentiable
structure in Example 27.5.9 is $\hat{T}(\mathbb{R}^n)=\{(p,\sum_{i=1}^n v^i\partial_i^p);\ p,v\in\mathbb{R}^n\}$.
27.6.8 Remark: Definition 27.5.1 was the first definition to be written in this book. The whole book
started with defining tangent vectors and then moving forwards and backwards from there. There is a good
reason for this. The concept of an intrinsic tangent vector is the fundamental leap from a point space into
another kind of space. In a linear space, there is little difference between a point and a vector.
27.7. Pointwise tangent spaces

This section presents the linear space structure which is induced on the set of tangent vectors at a fixed
point of a differentiable manifold by a tangent bundle atlas. Definition 27.7.1 formally defines the tangent
space at a point of a manifold as in Definition 10.1.2.

27.7.1 Definition: The tangent space at a point p in a C 1 manifold (M, AM ) is the linear space tuple
Tp (M ) −< (IR, Tp (M ), σIR , τIR , σTp (M ) , µ), where
(i) Tp (M ) = {tp,v,ψ ; v ∈ IRn , ψ ∈ atlasp (M )} as in Notation 27.3.8;
(ii) σIR and τIR are the addition and multiplication operations of IR;
(iii) σTp (M ) : Tp (M ) × Tp (M ) → Tp (M ) is the vector addition operation on Tp (M ) defined by
(tp,v1 ,ψ , tp,v2 ,ψ ) ↦ tp,v1 +v2 ,ψ ;
(iv) µ : IR × Tp (M ) → Tp (M ) is the scalar multiplication operation defined by (λ, tp,v,ψ ) ↦ tp,λv,ψ .
27.7.2 Remark: The vector addition and scalar multiplication in Definition 27.7.1 are independent of
equivalence class representatives because the equivalence relation (27.3.1) in Definition 27.3.3 is linear.
27.7.3 Remark: The linear space Tp (M ) in Definition 27.7.1 can be its own tangent space. Instead of
constructing a separate tangent space for Tp (M ), the limit of an expression such as $\gamma'(0)=\lim_{t\to0}t^{-1}(\gamma(t)-\gamma(0))$
for maps γ : IR → Tp (M ) may be regarded as an element of Tp (M ) rather than some abstract tangent
space of the tangent space Tp (M ). The map which sends an abstract tangent vector $\gamma'(0)$ to the corresponding
concrete element of Tp (M ) will be called a “drop function”. It turns out that this is important in defining
Lie derivatives and covariant derivatives, especially because most textbooks “drop” abstract tangents to
concrete tangents without comment.
27.7.4 Definition: A coordinate basis vector at a point p ∈ M for a chart ψ ∈ AM in a C 1 manifold
(M, AM ) with n = dim(M ) is a tangent vector $e_i^{p,\psi}=t_{p,v_i,\psi}\in T_p(M)$, where $v_i\in\mathbb{R}^n$ is defined for
i = 1 . . . n by $v_i^j=\delta_i^j$.
27.7.5 Remark: The coordinate basis vectors in Definition 27.7.4 are a basis for the linear space of tangent
vectors at a fixed point of a differentiable manifold. The symbol $\delta_i^j$ is the Kronecker delta presented in
Definition 7.9.10. Thus $t_{p,v,\psi}=\sum_{i=1}^n v^i e_i^{p,\psi}$.
27.7.6 Definition: The tangent operator space at a point p in a C 1 manifold M is the set T̊p (M ) of
tangent operators at p ∈ M together with the operations of addition and multiplication by real numbers.
Thus the linear combination λ1 L1 + λ2 L2 : C 1 (M ) → IR of tangent operators L1 , L2 ∈ T̊p (M ) is defined by
∀f ∈ C 1 (M ), (λ1 L1 + λ2 L2 )(f ) = λ1 L1 (f ) + λ2 L2 (f ).
27.7.7 Remark: Each pointwise tangent space T̊p (M ) is a linear subspace of the global function space
Lin(C 1 (M ), IR) of linear functionals on C 1 (M ). The union $\mathring{T}(M)=\bigcup_{p\in M}\mathring{T}_p(M)$ of these subspaces is not
a linear subspace of Lin(C 1 (M ), IR) because the sum of non-zero tangent operators at two different points of
a manifold will not be a tangent operator at any point of the manifold.
27.7.8 Remark: Although the action of the operators in spaces T̊p (M ) is restricted to the space C 1 (M )
so that operators at all points can act on the same space, it is tacitly assumed that every operator in T̊p (M )
is extended to the space C̊p1 (M ) of C 1 functions which are defined in a neighbourhood of p. In fact, the
tangent operators are assumed to act on any reasonable kind of function on M or a subset of M , whether
the function is classically differentiable or not.
[ This remark looks similar to Remark 27.5.5 and some other remarks. ]
27.7.9 Definition: The tagged tangent operator space at a point p in a C 1 manifold M is the set T̂p (M ) of
all pairs (p, L) such that L ∈ T̊p (M ), together with the operations of pointwise addition and multiplication by
real numbers on the operator component. Thus the linear combination λ1 (p, L1 ) + λ2 (p, L2 ) of two tagged
tangent operators (p, L1 ), (p, L2 ) ∈ T̂p (M ) is defined by
λ1 (p, L1 ) + λ2 (p, L2 ) = (p, λ1 L1 + λ2 L2 ).
27.7.10 Remark: The chart basis operators in Definition 27.7.11 provide a natural basis for the linear
space of tangent operators at a fixed point of a differentiable manifold.

27.7.11 Definition: A tangent operator basis vector at a point p ∈ M with respect to a chart ψ ∈ AM
of a C 1 manifold (M, AM ) is a tangent operator $\partial_i^{p,\psi}\in\mathring{T}_p(M)$ defined for i = 1 . . . n = dim(M ) by
$$\forall f\in C^1(M),\quad \partial_i^{p,\psi}(f)=\frac{\partial(f\circ\psi^{-1}(x))}{\partial x^i}\bigg|_{x=\psi(p)}.$$
A tagged tangent operator basis vector is a tagged tangent operator $\hat\partial_i^{p,\psi}=(p,\partial_i^{p,\psi})$ such that $\partial_i^{p,\psi}$ is a
tangent operator basis vector.
27.7.12 Remark: The equation $\partial_{p,v,\psi}=\sum_{i=1}^n v^i\partial_i^{p,\psi}$ is satisfied for all tangent operators ∂p,v,ψ ∈ T̊ (M ).
The right-hand side of this equation is a linear combination in the linear space structure of T̊p (M ) for
each p ∈ M .
27.7.13 Theorem: Let ψα , ψβ ∈ atlas(M ) be two charts for a C 1 manifold M with n = dim(M ). Then
the tangent operator basis vectors $(\partial_i^{p,\psi_\alpha})_{i=1}^n$ and $(\partial_i^{p,\psi_\beta})_{i=1}^n$ at any point p ∈ Dom(ψα ) ∩ Dom(ψβ ) satisfy
$$\forall i = 1\ldots n,\quad \partial_i^{p,\psi_\beta}=\sum_{j=1}^n\partial_j^{p,\psi_\alpha}\,Z_{\alpha\beta}(p)^j{}_i. \qquad (27.7.1)$$
[ The Jacobian matrices Zαβ are defined where? ]
27.7.14 Remark: A useful mnemonic for ∂ip,ψ is ∂/∂ψ i (p). (See Remark 27.5.16.) Then equation (27.7.1)
may be expressed as
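$$\begin{aligned}
\forall i = 1\ldots n,\quad \partial_i^{p,\psi_\beta}=\frac{\partial}{\partial\psi_\beta^i}(p)
&= \sum_{j=1}^n\frac{\partial}{\partial\psi_\alpha^j}(p)\,Z_{\alpha\beta}(p)^j{}_i \\
&= \sum_{j=1}^n\frac{\partial}{\partial\psi_\alpha^j}(p)\,\frac{\partial\psi_\alpha^j}{\partial\psi_\beta^i}(p) \\
&= \sum_{j=1}^n\partial_j^{p,\psi_\alpha}\,\frac{\partial\psi_\alpha^j}{\partial\psi_\beta^i}(p)
 = \sum_{j=1}^n\partial_j^{p,\psi_\alpha}\,Z_{\alpha\beta}(p)^j{}_i.
\end{aligned}$$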
A useful shorthand for the symbol ∂ip,ψ is ∂i when the point p and chart ψ are implied. (See Remark 19.2.15
for the corresponding flat-space abbreviations for unit vectors.)
27.7.15 Remark: In Theorem 27.7.13, the unit basis vectors are multiplied by the matrix Zαβ (p) on the
right whereas in Theorem 27.5.14, the components are multiplied by the inverse matrix Zβα (p) on the left.
There is no contradiction here. In Theorem 27.5.14, the objects being transformed are contravariant components
for tangent vectors, whereas in Theorem 27.7.13, the objects are sequences of tangent vectors, which
are essentially covariant in nature. This ensures that the linear combination $\sum_{i=1}^n\partial_i^{p,\psi_\alpha}v_\alpha^i$ is independent of
the chart ψα .
Whenever a coordinate change occurs, the components vαi of a tangent operator ∂p,v,ψ (or vector tp,v,ψ ) are
transformed according to a linear transformation with matrix Zβα (p) ∈ GL(n) on the left, which is just what
is required for the definition of a fibre bundle in Definition 23.6.4, condition (iv).
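The interplay between the covariant rule (27.7.1) for basis operators and the contravariant rule for components can be checked numerically. This sketch assumes that Zαβ (p) is the Jacobian matrix of ψα ◦ ψβ −1 at ψβ (p), with entries Zαβ (p)j i = ∂ψαj /∂ψβi as in Remark 27.7.14. The Cartesian and polar charts and the helper names are hypothetical.

```python
import math

H = 1e-6  # finite-difference step (assumption of this sketch)

def jacobian(g, x):
    """Central-difference Jacobian: J[i][j] approximates d g^i / d x^j at x."""
    columns = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += H
        x_minus[j] -= H
        g_plus, g_minus = g(x_plus), g(x_minus)
        columns.append([(a - b) / (2.0 * H) for a, b in zip(g_plus, g_minus)])
    return [[col[i] for col in columns] for i in range(len(columns[0]))]

def basis_operators(p, psi, psi_inv, f):
    """The list of values d_i^{p,psi}(f) for i = 1..n."""
    return jacobian(lambda x: (f(psi_inv(x)),), psi(p))[0]

# Hypothetical charts alpha (Cartesian) and beta (polar) on the right half-plane.
cart = lambda q: (q[0], q[1])
cart_inv = lambda x: (x[0], x[1])
polar = lambda q: (math.hypot(q[0], q[1]), math.atan2(q[1], q[0]))
polar_inv = lambda x: (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))

p = (1.0, 1.0)
f = lambda q: q[0] ** 2 + 3.0 * q[1]
d_alpha = basis_operators(p, cart, cart_inv, f)
d_beta = basis_operators(p, polar, polar_inv, f)
Z_ab = jacobian(lambda x: cart(polar_inv(x)), polar(p))   # alpha o beta^{-1}
Z_ba = jacobian(lambda x: polar(cart_inv(x)), cart(p))    # beta o alpha^{-1}

# (27.7.1): basis operators are multiplied by Z_ab on the right.
d_beta_from_alpha = [sum(d_alpha[j] * Z_ab[j][i] for j in range(2)) for i in range(2)]

# Components are multiplied by Z_ba on the left.
v_alpha = [1.0, 2.0]
v_beta = [sum(Z_ba[i][j] * v_alpha[j] for j in range(2)) for i in range(2)]

invariant_alpha = sum(d * v for d, v in zip(d_alpha, v_alpha))
invariant_beta = sum(d * v for d, v in zip(d_beta, v_beta))
print(invariant_alpha, invariant_beta)
```

The two transformation rules cancel, so the value of the linear combination applied to f is the same in both charts.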
[ Why isn’t the symbol for Jacobian matrices a letter J? ]
27.8. Tangent bundles

This section presents a natural atlas for the set of tangent vectors at all points of a differentiable manifold.
This atlas induces a topology on the set of tangent vectors. If the manifold is C 2 , the atlas induces a
differentiable manifold structure on the bundle of tangent vectors. Whereas pointwise tangent spaces Tp (M )
are linear spaces, the tangent bundle total space $T(M)=\bigcup_{p\in M}T_p(M)$ is not a linear space although it does
have a natural topological structure.

Differentiable fibre bundles are defined in terms of differentiable (Lie) groups, which are defined in terms
of differentiable manifolds. Therefore tangent bundles cannot logically be defined as differentiable fibre
bundles until these two other topics have been presented. The full differentiable tangent bundle definition is
in Section 34.8.
The tangent bundle representation adopted for Definition 27.8.1 uses coordinates and components. The
representation in Definition 27.9.2 uses tagged differential operators (p, ∂p,v,ψ ).
27.8.1 Definition: The tangent bundle of an n-dimensional C 1 manifold (M, AM ) is the tuple
(T (M ), π, ÂT (M ) , Φ) −> (T (M ), ÂT (M ) ) −> T (M ) where:
(i) $T(M)=\bigcup_{p\in M}T_p(M)$ = {tp,v,ψ ; p ∈ M, v ∈ IRn , ψ ∈ atlasp (M )}.
(ii) π : T (M ) → M is defined by π : tp,v,ψ ↦ p.
(iii) Φ is the function with domain AM which is defined so that for all ψ ∈ AM , Φ(ψ) is the map Φ(ψ) :
π −1 (Dom(ψ)) → IRn defined by Φ(ψ) : tp,v,ψ ↦ v.
(iv) ÂT (M ) = {Φ(ψ); ψ ∈ AM }.
27.8.2 Theorem: Definition 27.8.1 satisfies Metadefinition 27.2.1.
27.8.3 Remark: Some of the maps and spaces in Definition 27.8.1 are illustrated in Figure 27.8.1. The
tangent bundle atlas ÂT (M ) (combined with the maps ψ ◦ π for ψ ∈ AM ) induces a topological and differen-
tiable structure on the total tangent vector set T (M ). This agrees with the atlas definition in EDM2 [35],
section 147.F.
[ Figure 27.8.1 omitted: the map φψ from π −1 (U ) ⊆ T (M ) to IRn , and the chart ψ from U = Dom(ψ) ⊆ M to IRn . ]
Figure 27.8.1 Spaces and maps for a tangent bundle
27.8.4 Definition: The tangent bundle total space atlas for the tangent bundle (T (M ), π, ÂT (M ) , Φ) of an
n-dimensional C 1 manifold M is the atlas AT (M ) defined by
AT (M ) = {Qn,n ◦ ((ψ ◦ π) × Φ(ψ)); ψ ∈ AM } ,
where Qn,n : IRn × IRn → IR2n is the concatenation map in Definition 8.5.3.
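The chart Qn,n ◦ ((ψ ◦ π) × Φ(ψ)) can be sketched as a small data structure. This is only an illustration under stated assumptions: a tangent vector is stored as one representative triple (p, v, chart name), components in another chart are obtained from a finite-difference Jacobian of the transition map, and tuple concatenation plays the role of the concatenation map Qn,n . The names TangentVector, CHARTS, Phi and Psi are hypothetical.

```python
import math
from dataclasses import dataclass

H = 1e-6  # finite-difference step (assumption of this sketch)

def jacobian(g, x):
    """Central-difference Jacobian: J[i][j] approximates d g^i / d x^j at x."""
    columns = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += H
        x_minus[j] -= H
        g_plus, g_minus = g(x_plus), g(x_minus)
        columns.append([(a - b) / (2.0 * H) for a, b in zip(g_plus, g_minus)])
    return [[col[i] for col in columns] for i in range(len(columns[0]))]

# Hypothetical atlas on the open right half-plane: name -> (chart, inverse).
CHARTS = {
    "cart": (lambda q: (q[0], q[1]), lambda x: (x[0], x[1])),
    "polar": (lambda q: (math.hypot(q[0], q[1]), math.atan2(q[1], q[0])),
              lambda x: (x[0] * math.cos(x[1]), x[0] * math.sin(x[1]))),
}

@dataclass(frozen=True)
class TangentVector:
    """One representative (p, v, chart name) of the class t_{p,v,psi}."""
    p: tuple
    v: tuple
    chart: str

def pi(t):
    """Projection map: t_{p,v,psi} -> p."""
    return t.p

def Phi(name):
    """Phi(psi): the components of t with respect to the chart called `name`."""
    psi, _ = CHARTS[name]
    def components(t):
        src, src_inv = CHARTS[t.chart]
        J = jacobian(lambda x: psi(src_inv(x)), src(t.p))
        n = len(t.v)
        return tuple(sum(J[i][j] * t.v[j] for j in range(n)) for i in range(n))
    return components

def Psi(name):
    """Total space chart: t -> (psi(p), v) in R^{2n}."""
    psi, _ = CHARTS[name]
    phi = Phi(name)
    return lambda t: tuple(psi(pi(t))) + phi(t)   # concatenation acts as Q_{n,n}

t = TangentVector(p=(1.0, 1.0), v=(1.0, 2.0), chart="cart")
coords_cart = Psi("cart")(t)
coords_polar = Psi("polar")(t)
print(coords_cart, coords_polar)
```

Storing a single representative rather than the whole equivalence class is a design choice of the sketch. The class itself is recovered implicitly through the transition Jacobian used in Phi.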
27.8.5 Remark: The tangent bundle charts in Definition 27.8.4 are illustrated in Figure 27.8.2.
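[ Figure 27.8.2 omitted: the maps π : π −1 (U ) → U , ψ : U → IRn , ψ ◦ π, φψ = Φ(ψ) : π −1 (U ) → IRn and Qn,n , and the chart Qn,n ◦ ((ψ ◦ π) × φψ ) from π −1 (U ) ⊆ T (M ) to IR2n , where U = Dom(ψ) ⊆ M . ]
Figure 27.8.2 Standard atlas for a tangent bundle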

The atlas for T (M ) in Definition 27.8.4 may be written as
AT (M ) = {Ψ(ψ); ψ ∈ AM } ,
where
Ψ(ψ) = Qn,n ◦ ((ψ ◦ π) × Φ(ψ))
for all ψ ∈ AM . The bijection Ψ : AM → AT (M ) associates tangent bundle (total space) charts with
(base space) manifold charts. (It is convenient to abbreviate each chart Ψ(ψ) : π −1 (Dom(ψ)) → IR2n
to ψ̂ ∈ AT (M ) .) The atlas in Definition 27.8.4 is related to the flat-space tangent bundle as in Figure 27.0.2.
27.8.6 Definition: The tangent bundle total space manifold of a C 1 manifold M is the differentiable
manifold (T (M ), AT (M ) ), where the atlas AT (M ) is given by Definition 27.8.4.
27.8.7 Remark: The tangent fibration in Definition 27.8.8 is essentially a differentiable fibration as its
name suggests. (See Section 34.1.) General differentiable fibrations are defined in Section 26.13. Differen-
tiable tangent bundles are defined in Section 34.8.
27.8.8 Definition: The tangent fibration of a C 1 manifold M −< (M, AM ) is the tuple (T (M ), π, M ) −<
(T (M ), AT (M ) , π, M, AM ), where AT (M ) is the tangent bundle total space atlas for M and π is the projection
map for T (M ) as in Definition 27.8.1.
[ Maybe should have a table of the various tangent bundle total spaces, fibrations and fibre bundles near
here? ]
27.8.9 Remark: It is tempting to think that a tangent bundle should not be defined to have a specific
atlas. Definitions 27.8.1 and 27.9.2 seem perhaps to be overly restrictive in prescribing specific atlases.
However, it would not be strictly correct to define the differentiable structure on T (M ) as some equivalence
class of AT (M ) or some sort of maximal atlas because that would weaken the regularity of the specific
atlas {Ψ(ψ); ψ ∈ AM }. There are infinitely many levels and gradations of regularity between classes C r
and C r+1 . Any addition to the differentiable structure would weaken the regularity, possibly losing some
property of interest.
The regularity inherited from the base space M might not be analytic. For instance, the atlas for M may
satisfy some group property such as local orthogonality or conformality of the transition maps.
[ See Remark 27.2.7. ]
[ Define the standard topology on the tangent operator bundle. ]
27.8.10 Remark: The definition of the induced topology on a tangent bundle total space is related to
Theorem 14.11.15, which expresses the topology on a set in terms of the topology on each member of an
open covering of that set. [ See Malliavin [36], section I.1.2 regarding the “natural topology” of a C 0 atlas. ]
27.8.11 Theorem: The topology defined for a tangent bundle in Definition 27.8.1 depends on the topology
on the base space, but is independent of the choice of atlas for the base space.
27.8.12 Theorem: The tuple (T (M ), TT (M ) , π, M, TM ), where (T (M ), TT (M ) ) is the topological total tangent
space for a C 1 n-dimensional manifold M −< (M, AM ), is a topological fibre bundle with fibre space IRn .
(See Definitions 23.2.1 and 23.3.7.)
[ Check this theorem. ]
27.8.13 Theorem: For $r\in\bar{\mathbb{Z}}_0^+$, the pair (T (M ), AT (M ) ) in Definition 27.8.1 is a C r manifold if (M, AM )
is a C r+1 manifold.

27.9. Tangent operator bundles

27.9.1 Remark: Definition 27.9.2 is an operator version of Definition 27.8.1.
[ Convert Definition 27.9.2 to make it look like Metadefinition 27.2.1. ]
27.9.2 Definition: The tangent operator bundle of a C 1 manifold M −< (M, AM ) is the C 0 manifold
T̂ (M ) −< (T̂ (M ), AT̂ (M ) ), where
(i) $\hat{T}(M)=\bigcup_{p\in M}\hat{T}_p(M)$ = {(p, ∂p,v,ψ ); ψ ∈ AM , p ∈ Dom(ψ), v ∈ IRn }, where n = dim(M ), and
(ii) AT̂ (M ) = {ψ̂; ψ ∈ AM }, where for any chart ψ ∈ AM , the chart ψ̂ : π −1 (Dom(ψ)) → IR2n is defined by
ψ̂ : (p, ∂p,v,ψ ) ↦ (ψ(p), v), where π : T̂ (M ) → M is defined by π : (p, ∂p,v,ψ ) ↦ p.
The function π : T̂ (M ) → M is called the projection map of the tangent operator bundle T̂ (M ).
[ Show that Definition 27.9.2 satisfies Metadefinition 27.2.1. ]

[ Show that Definition 27.9.2 is well-defined. See Malliavin [36], lemma I.7.2.4. ]
27.10. The tangent bundle of a tangent bundle

27.10.1 Remark: Tangent vectors are mathematical objects which correspond to first-order derivatives.
But a very large proportion of physics is expressed in terms of second-order differential equations. So
tangent vectors must be extended somehow to represent second-order derivatives if differential geometry is
to be useful in physics. However, the transition from first-order to second-order objects is precisely where
differential geometry becomes significantly more complicated than Euclidean space.
27.10.2 Remark: A second-order derivative is usually thought of as the first-order derivative of the first-
order derivative. This is fine as long as the output from the first-order derivative of a function space is the
same space you started with. For example, the first-order derivative of a C ∞ function f : IR → IR is a C ∞
function from IR to IR. So you can keep taking derivatives up to any order in the same way. But with a
function f ∈ C ∞ (IRn , IR) for integer n ≥ 2, the first-order derivative will typically be the sequence of n
first-order partial derivatives $(\partial_i f)_{i=1}^n\in C^\infty(\mathbb{R}^n,\mathbb{R}^n)$ (or some similar representation). So even in this very
basic flat-space case, the second-order derivative is not the double application of the first-order derivative.
The situation is much more difficult when defining a second-order derivative on a differentiable manifold. In
this case, the first-order derivative of a real-valued function yields a cross-section of the cotangent bundle,
which is a linear functional on the tangent vector space at each point of the manifold. The purpose of this
section is to define the first-order derivative of such cross-sections so that second-order derivatives will be
meaningful. However, not all cross-sections of first-order tangent bundles are obtained as the differential of
a real-valued function. So the applicability of the tangent bundle of a tangent bundle is much broader than
just second-order derivatives of real-valued functions.
[ Should also present the chart transition map equations for T (T ∗ (M )) and T ∗ (T ∗ (M )) in the hope that these
may be closely related to the equations for second-order tangent operators. ]
27.10.3 Remark: One obvious way to try to define mathematical objects to represent second-order deriva-
tives would be to construct a second layer of tangent vectors on top of the bundle of first-order tangent vectors.
But tangent vectors are defined with a differentiable manifold M as input and a tangent bundle T (M ) as
output. It would be possible to construct a second-level tangent object T (T (M )) if the tangent bundle T (M )
was a manifold. It happens that the tangent bundle total space T (M ) does have a natural C r−1 differentiable
structure if M is C r . This can be used as the manifold to construct a second-level tangent bundle T (T (M ))
of class C r−2 . So the steps to construct a second-level tangent bundle are as follows. (These steps are
sketched in Figure 27.10.1.)
(i) Define a tangent bundle T (M ) on a $C^r$ manifold M −< (M, AM ).
(ii) Define a $C^{r-1}$ atlas (i.e. differentiable structure) AT (M ) on the total space of T (M ).
(iii) Define a second-level tangent bundle T (2) (M ) = T (T (M )) on the total space of T (M ) −< (T (M ), AT (M ) ).
(iv) Define a $C^{r-2}$ atlas AT (T (M )) on the total space of T (T (M )).

[ Figure 27.10.1 omitted: the manifolds M , T (M ) and T (T (M )) with their atlases AM , AT (M ) and AT (T (M )) . ]
Figure 27.10.1 Building tangent bundles from manifolds and atlases
The differentiable structure on the tangent bundle of a differentiable manifold is presented in Definition 27.8.4.
The existence of a differentiable structure on the tangent bundle implies that tangent vectors can be defined
with base points in the tangent bundle T (M ) just as they were on the base space M in Definition 27.3.3.
The manifold of these tangent vectors forms a new tangent bundle, namely T (T (M )).
[ There is some discussion of T (T (M )) in Malliavin [36], chapter II.4, page 112. ]
27.10.4 Remark: When the time comes to construct the second-level tangent bundle, coordinate vectors
have a clear advantage compared to differential operator vectors. Tangent vectors are defined in terms
of coordinate charts on a manifold. Therefore second-level tangent vectors must be defined in terms of
coordinate charts on the first-level tangent bundle. It is easier to construct T (T (M )) in terms of coordinate-
based tangent vectors (Definition 27.3.3) than tangent operators (Definition 27.6.2).
27.10.5 Remark: The following concepts are easily confused.
(i) The tangent bundle of a tangent bundle: T (2) (M ) = T (T (M )). (This section.)
(ii) The space T̊ [2] (M ) of second-order derivative operators $f\mapsto v^iw^j(\partial^2f(x)/\partial x^i\partial x^j)\big|_{x=p}$. (Section 31.5.)
(iii) The space of tensors of degree 2: T 2 (M ) = T (M ) ⊗ T (M ). (See Section 28.3.)
(iv) The set of ordered pairs of tangent vectors: T (M )2 = T (M ) × T (M ). (See Section 27.12 for 2-frames.)
The words “degree”, “order” and “level” are used in this book with the following meanings.
word     meaning                              example                                    tangent spaces
degree   the multiplicity of a tensor         ⊗^1 V , ⊗^2 V , ⊗^3 V                      T^(r,0) (M )
order    the order of derivatives             ∂/∂x^i , ∂^2/∂x^i ∂x^j , ∂^3/∂x^i ∂x^j ∂x^k   T^[r] (M ), T̊^[r] (M )
level    recursive level of tangent bundles   T (M ), T (T (M )), T (T (T (M )))         T^(r) (M )
27.10.6 Remark: In the same way that first-level tangent bundles on manifolds are defined in terms of
flat-space tangent bundles (as mentioned in Remark 27.0.3), second-level tangent bundles are defined in
terms of the corresponding flat-space second-level tangent bundles. This is summarized in Figure 27.10.2.
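[ Figure 27.10.2 omitted: a chart Ψ(2) (ψ) ∈ AT (2) (M ) maps W ∈ T (2) (M ) to (x, v, w) = Ψ(2) (ψ)(W ) in T (2) (IRn ) ≡ IRn × IRn × (IRn × IRn ) ≡ IR4n ; the projection π (2) maps W to p = π(W ) ∈ M ; the map (x, v, w) ↦ x sends T (2) (IRn ) to IRn ; and the chart ψ ∈ AM maps p to x = ψ(p). ]
Figure 27.10.2 Defining second-level tangent bundle in terms of flat space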

27.10.7 Definition: A tangent vector on the total tangent space $(T(M), A_{T(M)})$ of a $C^2$ n-dimensional manifold $(M, A_M)$ at $z = t_{p,v,\psi} \in T(M)$ is an equivalence class $[(t_{p,v,\psi}, w, \hat\psi)]$, where $p \in M$, $v \in \mathbb{R}^n$, $\psi \in A_M$, $w \in \mathbb{R}^{2n}$ and $\hat\psi \in A_{T(M)}$. Thus $T_z(T(M)) = \{[(z, w, \hat\psi)];\ w \in \mathbb{R}^{2n},\ \hat\psi \in \mathrm{atlas}_z(T(M))\}$.
A reasonable sort of notation for $[(z, w, \hat\psi)]$ might be $t^{(2)}_{z,w,\hat\psi}$.
27.10.8 Remark: Theorem 27.10.9 gives chart transition rules for tangent vectors to the total tangent space T(M) which are analogous to the chart transition rules in Definition 27.3.3 for tangent vectors to the manifold M. These transition rules show that verticality of vectors in T(T(M)) is chart-independent, whereas horizontality is not. More precisely, if the components $w \in \mathbb{R}^{2n}$ of a vector $t^{(2)}_{z,w,\hat\psi} \in T_z(T(M))$ satisfy $w^i = 0$ for $i = 1, \ldots n$ for one chart (verticality), then this holds for all charts.
By contrast, if w satisfies $w^i = 0$ for $i = n+1, \ldots 2n$ for one chart (horizontality), this equality does not generally hold for other charts. But if z is a zero vector (that is, $z = t_{p,v,\psi}$ with $v = 0$), then horizontality is chart-independent.
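27.10.9 Theorem: Let M be an n-dimensional $C^2$ manifold. Let $p \in M$ and $z = t_{p,v,\psi} \in T_p(M)$. Let $\psi_1, \psi_2 \in \mathrm{atlas}_p(M)$, and let $\hat\psi_1, \hat\psi_2 \in \mathrm{atlas}_z(T(M))$ be the corresponding tangent bundle total space charts in Definition 27.8.4. Then for $w_1, w_2 \in \mathbb{R}^{2n}$, $t^{(2)}_{z,w_1,\hat\psi_1} = t^{(2)}_{z,w_2,\hat\psi_2}$ if and only if for all $i = 1, \ldots n$,
$$ w_2^i = \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} w_1^j \tag{27.10.1} $$
and
$$ w_2^{n+i} = \sum_{j,k=1}^n \partial_{x^j}\partial_{x^k}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} v_1^k w_1^j + \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} w_1^{n+j}, \tag{27.10.2} $$
where $v_1 \in \mathbb{R}^n$ denotes the component vector of z with respect to $\psi_1$, so that $z = t_{p,v_1,\psi_1}$.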
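Proof: By Definition 27.3.3 applied to the manifold T(M), $t^{(2)}_{z,w_1,\hat\psi_1} = t^{(2)}_{z,w_2,\hat\psi_2}$ if and only if
$$ \forall i = 1, \ldots 2n, \quad w_2^i = \sum_{j=1}^{2n} \partial_{y^j}\big(\hat\psi_2^i \circ \hat\psi_1^{-1}(y)\big)\Big|_{y=\hat\psi_1(z)} w_1^j. \tag{27.10.3} $$
The variable $y = \hat\psi_1(\breve z) \in \mathbb{R}^{2n}$ has the form $(\breve x, \breve v) = (\psi_1(\breve p), \breve v)$, where $\breve z = t_{\breve p,\breve v,\psi_1}$ is a point in T(M). So $y^j = \breve x^j = \psi_1^j(\breve p)$ for $j = 1, \ldots n$ and $y^j = \breve v^{j-n}$ for $j = n+1, \ldots 2n$. Thus
$$ \hat\psi_2^i \circ \hat\psi_1^{-1}(y) = \begin{cases} \psi_2^i \circ \psi_1^{-1}(\breve x) & \text{for } i = 1, \ldots n \\ \sum_{k=1}^n \partial_{x^k}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\breve x}\, \breve v^k & \text{for } i = n+1, \ldots 2n. \end{cases} \tag{27.10.4} $$
Substitution of (27.10.4) into the expression $\partial_{y^j}(\hat\psi_2^i \circ \hat\psi_1^{-1}(y))$ in (27.10.3) for $j = 1, \ldots n$ gives
$$ \partial_{y^j}\big(\hat\psi_2^i \circ \hat\psi_1^{-1}(y)\big) = \begin{cases} \partial_{\breve x^j}\big(\psi_2^i \circ \psi_1^{-1}(\breve x)\big) & \text{for } i = 1, \ldots n \\ \partial_{\breve x^j} \sum_{k=1}^n \partial_{x^k}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\breve x}\, \breve v^k & \text{for } i = n+1, \ldots 2n \end{cases} $$
$$ = \begin{cases} \partial_{x^j}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\big|_{x=\breve x} & \text{for } i = 1, \ldots n \\ \sum_{k=1}^n \partial_{x^j}\partial_{x^k}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\breve x}\, \breve v^k & \text{for } i = n+1, \ldots 2n. \end{cases} $$
A similar substitution for $j = n+1, \ldots 2n$ gives
$$ \partial_{y^j}\big(\hat\psi_2^i \circ \hat\psi_1^{-1}(y)\big) = \begin{cases} \partial_{\breve v^{j-n}}\big(\psi_2^i \circ \psi_1^{-1}(\breve x)\big) & \text{for } i = 1, \ldots n \\ \partial_{\breve v^{j-n}} \sum_{k=1}^n \partial_{x^k}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\breve x}\, \breve v^k & \text{for } i = n+1, \ldots 2n \end{cases} $$
$$ = \begin{cases} 0 & \text{for } i = 1, \ldots n \\ \partial_{x^{j-n}}\big(\psi_2^{i-n} \circ \psi_1^{-1}(x)\big)\big|_{x=\breve x} & \text{for } i = n+1, \ldots 2n. \end{cases} $$
Substitution of these expressions into (27.10.3) gives (27.10.1) and (27.10.2).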
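As a hedged illustration of the rules (27.10.1) and (27.10.2), the following sketch works through the one-dimensional case. The manifold $M = \mathbb{R}$ and the two charts $\psi_1(p) = p$ and $\psi_2(p) = e^p$ are assumptions chosen only for this example; they do not come from the text.

```latex
% Assumed example: M = \mathbb{R}, n = 1, charts \psi_1(p) = p and \psi_2(p) = e^p.
% Transition map \tau = \psi_2 \circ \psi_1^{-1} and its derivatives at x = \psi_1(p) = p:
\[
  \tau(x) = e^x, \qquad \tau'(p) = e^p, \qquad \tau''(p) = e^p .
\]
% For z = t_{p,v_1,\psi_1} and w_1 = (w_1^1, w_1^2) \in \mathbb{R}^2,
% rules (27.10.1) and (27.10.2) with n = 1 read:
\[
  w_2^1 = e^p\, w_1^1, \qquad
  w_2^2 = e^p\, v_1\, w_1^1 + e^p\, w_1^2 .
\]
% Vertical in chart 1 (w_1^1 = 0): then w_2^1 = 0, so the vector is vertical in chart 2.
% Horizontal in chart 1 (w_1^2 = 0, w_1^1 = 1): then
\[
  w_2 = (e^p,\; e^p v_1),
\]
% whose second component vanishes only if v_1 = 0.
```

In this example the second-derivative term is what destroys horizontality, and it disappears exactly when z is a zero vector, in agreement with Remark 27.10.8.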

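[Figure 27.10.3 (diagram not reproduced): the point $z = t_{p,v_1,\psi_1} = t_{p,v_2,\psi_2} \in T(M)$ over $p \in M$; the charts $\psi_1, \psi_2 : M \to \mathbb{R}^n$ with $x_1 = \psi_1(p)$ and $x_2 = \psi_2(p)$; the charts $\hat\psi_1, \hat\psi_2 : T(M) \to \mathbb{R}^{2n}$ sending z to $(x_1, v_1)$ and $(x_2, v_2)$; and the vector $t^{(2)}_{z,w_1,\hat\psi_1} = t^{(2)}_{z,w_2,\hat\psi_2} \in T_z(T(M))$.]
Figure 27.10.3 Chart transition rule for tangent vectors to T (M )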
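27.10.10 Remark: The sets, points and maps in Theorem 27.10.9 are illustrated in Figure 27.10.3.
The transformation rules (27.10.1) and (27.10.2) may be summarized as the following matrix equation. (This is similar to Remark 19.3.2.)
$$ \begin{pmatrix} w_2^H \\ w_2^V \end{pmatrix} = \begin{pmatrix} A & 0 \\ B & A \end{pmatrix} \begin{pmatrix} w_1^H \\ w_1^V \end{pmatrix}, $$
where $w_i^H$ and $w_i^V$ are respectively the horizontal and vertical parts of the component vector $w_i \in \mathbb{R}^{2n}$ and
$$ A = \left[ \frac{\partial \psi_2^i}{\partial \psi_1^j} \right]_{i,j=1}^n, \qquad B = \left[ \frac{\partial^2 \psi_2^i}{\partial \psi_1^j\, \partial \psi_1^k}\, v^k \right]_{i,j=1}^n . $$
Here summation over $k = 1, \ldots n$ is implied, and $v^k$ denotes the components of z with respect to $\psi_1$ (written $v_1^k$ in (27.10.2)).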
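The block form of the transition matrix makes its inverse easy to write down. The following short check is a sketch only; it assumes that A is invertible, which holds because A is the Jacobian matrix of a chart transition map.

```latex
% Claimed inverse of the block lower-triangular transition matrix:
\[
  \begin{pmatrix} A & 0 \\ B & A \end{pmatrix}^{-1}
  = \begin{pmatrix} A^{-1} & 0 \\ -A^{-1} B A^{-1} & A^{-1} \end{pmatrix} .
\]
% Verification by block multiplication:
\[
  \begin{pmatrix} A & 0 \\ B & A \end{pmatrix}
  \begin{pmatrix} A^{-1} & 0 \\ -A^{-1} B A^{-1} & A^{-1} \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ B A^{-1} - A A^{-1} B A^{-1} & I \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} .
\]
% Restriction to vertical vectors (w_1^H = 0):
\[
  w_2^H = 0, \qquad w_2^V = A\, w_1^V .
\]
```

The last line shows that vertical parts transform by A alone, which is the ordinary tangent vector rule; the matrix B only enters when the horizontal part is non-zero.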
27.10.11 Remark: The transformation rules for T (2) (M ) = T (T (M )) in Theorem 27.10.9 are eerily similar
to the rules for T [2] (M ) in Definition 29.3.3. This is probably not a coincidence.
[ Compare the transition map rule in Theorem 27.10.9 with the rule for $\Gamma^i_{jk}$ in EDM2 [35], 80.L? ]
27.10.12 Remark: The way in which atlases are used to mechanically arrive at the chart transition rules in
Theorem 27.10.9 shows the advantage of working with coordinate-based tangent vectors in T (T (M )) rather
than tangent operators in spaces such as T̊ (T̊ (M )) or T̂ (T̂ (M )). It was precisely the difficulties of working
with these higher-level tangent spaces which prompted the author to abandon differential operators as the
primary basis for tangent spaces. Such operators are good for intuition and bad for precise calculations with
higher-level concepts.
[ Definition 27.10.13 is probably a theorem, not a definition. ]
27.10.13 Definition: The tangent space of a tangent bundle total space T(M) at a point $z = t_{p,v,\psi} \in T(M)$ for a $C^2$ n-dimensional manifold M is the set $T_z(T(M))$ of all tangent vectors on T(M) at z together with the linear space operations of pairwise addition and scalar multiplication defined by
$$ \lambda_1 t^{(2)}_{z,w_1,\hat\psi_j} + \lambda_2 t^{(2)}_{z,w_2,\hat\psi_j} = t^{(2)}_{z,\lambda_1 w_1 + \lambda_2 w_2,\hat\psi_j} $$
for all $\lambda_1, \lambda_2 \in \mathbb{R}$, $w_1, w_2 \in \mathbb{R}^{2n}$ and $\hat\psi_j \in \mathrm{atlas}_z(T(M))$.
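The linear operations above are stated chart by chart, so some check of chart-independence is needed. The following is a minimal sketch of that check; the symbol J for the $2n \times 2n$ transition matrix of Theorem 27.10.9 is introduced here only for illustration.

```latex
% Assumption: J denotes the transition matrix from \hat\psi_1 to \hat\psi_2 at z,
% so that t^{(2)}_{z,w,\hat\psi_1} = t^{(2)}_{z,Jw,\hat\psi_2} for all w \in \mathbb{R}^{2n}.
% J depends on z and on the two charts, but not on w.
\[
  t^{(2)}_{z,\lambda_1 w_1 + \lambda_2 w_2,\hat\psi_1}
  = t^{(2)}_{z,J(\lambda_1 w_1 + \lambda_2 w_2),\hat\psi_2}
  = t^{(2)}_{z,\lambda_1 J w_1 + \lambda_2 J w_2,\hat\psi_2} .
\]
% The right-hand side is the linear combination computed in the chart \hat\psi_2
% from the \hat\psi_2-components J w_1 and J w_2 of the same two vectors.
```

So the result of the linear combination is the same vector whichever chart is used, because the transition rule is linear in w for fixed z.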

27.10.14 Definition: The total space of the second-level tangent bundle of a $C^2$ n-dimensional manifold M −< $(M, A_M)$ is the differentiable manifold tuple T(T(M)) −< $(T(T(M)), A_{T(T(M))})$, where
(i) $T(T(M)) = \bigcup_{z \in T(M)} T_z(T(M))$ is the union of the pointwise tangent spaces of $(T(M), A_{T(M)})$;
(ii) $A_{T(T(M))} = \{\check\psi;\ \psi \in A_M\}$, where for all $\psi \in A_M$, the chart $\check\psi : \hat\pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{4n}$ is defined by $\check\psi : t^{(2)}_{t_{p,v,\psi},w,\hat\psi} \mapsto (\psi(p), v, w)$, where $\hat\pi : T(T(M)) \to T(M)$ is defined by $\hat\pi = \pi_* : t^{(2)}_{t_{p,v,\psi},w,\hat\psi} \mapsto t_{p,\pi_1(w),\psi}$, where $\pi_1 : \mathbb{R}^{2n} \to \mathbb{R}^n$ is the projection map with $\pi_1 : (x^1, \ldots x^{2n}) \mapsto (x^1, \ldots x^n)$.
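[Figure 27.10.4 (diagram not reproduced): the spaces T(T(M)), T(M) and M joined by the projections $\hat\pi$ and $\pi$, together with the charts $\check\psi : T(T(M)) \to \mathbb{R}^{4n}$, $\hat\psi : T(M) \to \mathbb{R}^{2n}$ and $\psi : M \to \mathbb{R}^n$.]
Figure 27.10.4 Charts and maps for total tangent space of total tangent space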
27.10.16 Remark: It is clumsy to have to refer to a space such as T (T (T (M ))) as the “tangent bundle
of the tangent bundle of the tangent bundle of M ”. Since there is probably no standard term for this, it
is convenient to refer to such a space as the “third-level tangent bundle of M ”. Then a linear space such
as Tz (T (T (M ))) for z ∈ T (T (M )) may be referred to as a “third-level tangent space of M at z”. It is not
obvious what notation to use for kth-level tangent spaces for arbitrary $k \in \mathbb{Z}^+$. Notation 27.10.17 will be
used in this book. The parentheses in the superscript are supposed to be a mnemonic for the parentheses in
the recursive definition.
27.10.17 Notation: $T^{(k)}(M)$ for $k \in \mathbb{Z}^+$ denotes the tangent bundle $T(T^{(k-1)}(M))$ of a $C^k$ manifold M, where $T^{(0)}(M) = M$.
27.10.18 Definition: The kth-level tangent bundle of a $C^k$ manifold M is the tangent bundle $T^{(k)}(M)$ in Notation 27.10.17.
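A quick consequence of the recursive definition is a dimension count. The sketch below assumes only that the total space T(N) of an m-dimensional manifold N has dimension 2m, as for the chart maps into $\mathbb{R}^{2n}$ and $\mathbb{R}^{4n}$ used earlier in this section.

```latex
% Assumption: \dim T(N) = 2 \dim N for any manifold N on which T(N) is defined.
\[
  \dim T^{(0)}(M) = n, \qquad
  \dim T^{(k)}(M) = 2 \dim T^{(k-1)}(M)
  \quad\Longrightarrow\quad
  \dim T^{(k)}(M) = 2^k n .
\]
% Check against this section: k = 1 gives 2n (charts into \mathbb{R}^{2n}),
% and k = 2 gives 4n (charts into \mathbb{R}^{4n}).
```

The requirement that M be $C^k$ is consistent with the usual expectation, not verified here, that each application of T costs one degree of differentiability.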
27.11. Horizontal components and drop functions

27.11.1 Remark: A connection (i.e. parallelism) is constructed on a differentiable manifold M by defining,
for each direction of motion v in the tangent space at a point p ∈ M , the rate of adjustment of each vector
w in the tangent space to keep the vector w moving in a parallel fashion. For example, a point q ∈ M may
move along a curve γ : IR → M with velocity vector v as the curve passes through the point p. The vector
w(q) always lies in the tangent space at the point q. So it is necessary to be able to define the velocity of
a vector in the tangent bundle with a variable base point q. This in turn requires the definition of tangent
vectors on the tangent bundle, and more importantly, the tangent vectors on the tangent bundle must be
given a differentiable manifold structure so that all of the conditions in the definition of a connection will be
meaningful. (See Definition 35.5.3.)
Parallelism is defined for a particular path between two points p1 , p2 ∈ M as a linear map between the
tangent spaces Tp1 (M ) and Tp2 (M ). When this is differentiated, it leads naturally to a definition of affine
connections in terms of the tangent space of the tangent space of M . This seems to imply that a manifold
must be C 2 in order to have a (differential) connection on it. Probably this constraint could be weakened
somewhat.

27.11.2 Definition: The horizontal component of a vector W in a total tangent space T (T (M )) for a C 2
manifold M is the vector π∗ (W ) in T (M ), where π is the projection map of T (M ).
27.11.3 Definition: A vertical vector in a total tangent space T (T (M )) for a C 2 manifold M is a vector
W ∈ T (T (M )) whose horizontal component is zero.
27.11.4 Remark: The horizontal component of a vector in a total tangent space is chart-independent.
Therefore the verticality property is also chart-independent. Definitions 27.11.2 and 27.11.3 follow the
terminology for general differentiable fibre bundles in Section 26.13. Vertical components and horizontal
vectors are not defined here because there is no “connection” between tangent spaces at different points of
a manifold.
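The chart-independence of the horizontal component can be read off from rule (27.10.1). The following sketch assumes the component formula $\pi_*(t^{(2)}_{z,w,\hat\psi}) = t_{p,\pi_1(w),\psi}$ from Definition 27.10.14, where $\pi_1$ selects the first n components of w.

```latex
% Components of \pi_*(W) in the charts \psi_1 and \psi_2 are \pi_1(w_1) and \pi_1(w_2).
% Rule (27.10.1) for i = 1, \ldots n states
\[
  \pi_1(w_2)^i = \sum_{j=1}^n
    \partial_{x^j}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)}\,
    \pi_1(w_1)^j ,
\]
% which is the transition rule of Definition 27.3.3 for tangent vectors on M. Hence
\[
  t_{p,\pi_1(w_1),\psi_1} = t_{p,\pi_1(w_2),\psi_2} ,
\]
% and in particular \pi_1(w_1) = 0 if and only if \pi_1(w_2) = 0.
```

The vertical half of w has no such property in general, because rule (27.10.2) mixes in the horizontal half through the second-derivative term.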
27.11.5 Remark: It turns out that Definition 27.11.6 is very important in constructing a covariant deriva-
tive out of an affine connection. It may be compared with the much simpler “drop” function for a linear
space such as IRn . The tangent space Tp (IRn ) at a point p ∈ IRn is usually identified with IRn without
comment. Thus if γ : IR → IRn is a differentiable curve in IRn , the derivative of γ at p = γ(t) is defined
as $\lim_{h \to 0} (\gamma(t+h) - \gamma(t))/h$, which is the limit of a vector $(\gamma(t+h) - \gamma(t))/h$ that just happens to be in IRn
for $h \ne 0$. The fact that IRn has a complete topology implies that the limit exists and is an element of IRn .
Alternatively, one could think of IRn as a manifold with an atlas of one chart – the identity map. Then the
limit could be thought of as an element of Tp (IRn ).
In the case of a general manifold, there is no linear space structure, and therefore only the tangent space
version of a vector can be defined. But if the manifold happens to be a linear space also, then there are
two possible definitions. This is exactly what occurs when one constructs tangent vectors in Tz (T (M )) for a
C 2 manifold with z ∈ T (M ). Any vertical vector in Tz (T (M )) may be constructed either within the linear
space of vertical vectors or in the manifold structure of T (M ). Definition 27.11.6 gives the canonical map
from the latter to the former. The verticality is important because a vertical vector y ∈ Tz (T (M )) represents
a tangent vector with a constant base point p = π(z), and if the base point is not changing, the rest of the
manifold might as well not exist, and the vector therefore exists entirely within a simple linear space.
27.11.6 Definition: The drop function at a point $z \in T(M)$ for a $C^2$ manifold M is the linear map $\varpi_z : \ker((d\pi)_z) \to T(M)$ defined by $\varpi_z : (z, w, \hat\psi) \mapsto (\pi(z), \pi_2(w), \psi)$ for $w \in \mathbb{R}^{2n}$ and $\psi \in \mathrm{atlas}_{\pi(z)}(M)$, where $\pi_2 : \mathbb{R}^{2n} \to \mathbb{R}^n$ is the projection map $\pi_2 : w \mapsto (w^{n+1}, \ldots w^{2n})$, and $\pi : T(M) \to M$ is the standard projection map for T(M).
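To make the drop function concrete, the following sketch applies it to the velocity of a curve which stays in a single fibre of T(M). The curve c and the name $\mathrm{drop}_z$ are hypothetical labels used only in this example.

```latex
% Assumed setting: z = t_{p,v,\psi} \in T(M), u \in \mathbb{R}^n, and the fibre curve
\[
  c(t) = t_{p,\,v + t u,\,\psi} \in T_p(M), \qquad c(0) = z .
\]
% In the chart \hat\psi the curve has components (\psi(p), v + t u), so its velocity at t = 0 is
\[
  w = (0, u) \in \mathbb{R}^{2n}, \qquad
  c'(0) = t^{(2)}_{z,(0,u),\hat\psi} \in T_z(T(M)) .
\]
% The horizontal half \pi_1(w) is zero, so c'(0) is vertical, and
\[
  \mathrm{drop}_z\big(c'(0)\big) = t_{p,\pi_2(w),\psi} = t_{p,u,\psi} \in T_p(M) .
\]
```

This is the manifold counterpart of differentiating $v + tu$ in the linear space $\mathbb{R}^n$ and obtaining u, as in the flat-space discussion of Remark 27.11.5.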
27.11.7 Theorem: The drop function $\varpi_z$ in Definition 27.11.6 is a chart-independent linear isomorphism.
Proof: Let $z \in T(M)$, $p = \pi(z) \in M$ and $\psi_1, \psi_2 \in \mathrm{atlas}_p(M)$. It must be shown that $t_{p,\pi_2(w_1),\psi_1} = t_{p,\pi_2(w_2),\psi_2}$ if $t^{(2)}_{z,w_1,\hat\psi_1} = t^{(2)}_{z,w_2,\hat\psi_2} \in \ker((d\pi)_z)$. The components $w_1^1, \ldots w_1^n$ and $w_2^1, \ldots w_2^n$ are all zero. So by Theorem 27.10.9,
$$ w_2^{n+i} = \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i \circ \psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} w_1^{n+j}, $$
from which it follows that $\pi_2(w_1)$ and $\pi_2(w_2)$ satisfy the same chart transition rule, which happens to be the right transition rule for vectors in $T_p(M)$.
The fact that $\varpi_z$ is an isomorphism follows from the fact that $\pi_2$ is surjective and $(d\pi)_z : (z, w, \hat\psi) \mapsto (p, \pi_1(w), \psi)$ for $\psi \in \mathrm{atlas}_p(M)$. Thus $\ker((d\pi)_z)$ consists of the vectors with $\pi_1(w) = 0$, and on this subspace $\pi_2$ is injective as well as surjective.
27.11.8 Remark: The map $\varpi_z$ in Definition 27.11.6 is a bijection from the subspace $\ker((d\pi)_z)$ of vertical vectors in $T_z(T(M))$ to $T_p(M)$. Note that $\varpi_z$ cannot be extended to a chart-independent linear map on all of $T_z(T(M))$ without some means of removing the arbitrary definition of horizontality. This is precisely why connections are required. A connection on a $C^2$ manifold is equivalent to extending $\varpi_z$ to all of $T_z(T(M))$ for all $z \in T(M)$ in a way which is chart-independent. This requires some way of adjusting the map for each chart so as to remove the horizontal component in equation (27.10.2) in Theorem 27.10.9.
In fact, a connection may be thought of as a horizontal component removal rule: the connection determines a horizontal vector which is subtracted from an element of $T_z(T(M))$ to give a vertical vector. This is the basis of covariant differentiation. Definition 27.11.6 is an essential ingredient in defining covariant derivatives from connections, although most texts do not mention this explicitly. The vertical vectors constructed from connections must be "dropped" from the vertical space $\ker((d\pi)_z)$ down to the base tangent space $T_p(M)$. (This is the procedure for covariant differentiation of vector fields in $X^0(T(M))$. Similar procedures are followed for general tensor fields. If the field to be differentiated is valued in a linear space, a similar drop function can generally be defined.)
Maybe the total drop function $\varpi = \bigcup_{z \in T(M)} \varpi_z$ could be regarded as a differentiable map between differentiable manifolds if the domain $\bigcup_{z \in T(M)} \ker((d\pi)_z)$ (i.e. the set of all vertical vectors) can be regarded as a $C^1$ manifold.
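The idea of a connection as a chart-by-chart correction can be sketched in components. Everything in the block below is an illustrative assumption rather than the book's construction: the coefficient arrays $\Gamma_{(a)}$, the candidate map K and the index conventions are hypothetical, and only the transition rules (27.10.1) and (27.10.2) are taken from Theorem 27.10.9.

```latex
% Assumed notation: x_a = \psi_a(p), A^i{}_j = \partial x_2^i / \partial x_1^j,
% z = t_{p,v_1,\psi_1} = t_{p,v_2,\psi_2} with v_2^m = A^m{}_k v_1^k.
% Candidate extension of the drop map in chart a (summation over repeated indices):
\[
  K_{(a)}^i = w_a^{n+i} + \Gamma_{(a)}{}^i{}_{jk}\, v_a^k\, w_a^j .
\]
% Chart-independence means K_{(2)}^i = A^i{}_l K_{(1)}^l. Using (27.10.1) and (27.10.2),
\[
  K_{(2)}^i
  = \frac{\partial^2 x_2^i}{\partial x_1^j \partial x_1^k}\, v_1^k w_1^j
    + A^i{}_j\, w_1^{n+j}
    + \Gamma_{(2)}{}^i{}_{lm}\, A^l{}_j A^m{}_k\, v_1^k w_1^j .
\]
% Comparing the coefficients of v_1^k w_1^j on both sides gives the required rule
\[
  A^i{}_l\, \Gamma_{(1)}{}^l{}_{jk}
  = \frac{\partial^2 x_2^i}{\partial x_1^j \partial x_1^k}
    + \Gamma_{(2)}{}^i{}_{lm}\, A^l{}_j A^m{}_k .
\]
```

Under these assumptions the inhomogeneous second-derivative term is exactly what is needed to cancel the matrix B of Remark 27.10.10, which is one way to read the comparison with the rule for $\Gamma^i_{jk}$ suggested after Remark 27.10.11.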
[ Must try to put a sensible differentiable structure on the set of all vertical vectors. And maybe a fibration
and fibre bundle structure too. Maybe even give it a notation, both for total tangent spaces and for general
C 1 fibre bundles. ]
27.11.9 Remark: It is sometimes useful to work with functions f : M → T (T (M )) such that f (p) ∈
T (T (M )) and π(π∗ (f (p))) = p for all p ∈ M , where M is a C 2 manifold. Such functions are cross-sections
of the fibration (T (T (M )), π ◦ π∗ , M ). It would be useful to have a notation for the space of these functions.
One possibility is the notation “T (T (M ))/M ”, although this is ambiguous because the projection map could
be something other than π ◦ π∗ .
27.12. Tangent frames and coordinate basis vectors

This section presents the natural topological and atlas structures for the space of all sequences of linearly
independent tangent vectors of a differentiable manifold.
A tangent frame is a sequence of linearly independent vectors in the tangent vector space Tp (M ) at a single
point p of a C 1 manifold M . Tangent frames with r vectors for r ≤ n are elements of the direct product
set Tp (M )r at each p ∈ M . Tangent frames are useful for the definition of parallelism (connections) on
manifolds and also for representing forms on manifolds. The set of all tangent r-frames of a given manifold
has a natural manifold structure. It is shown in Section 34.9 that the set of all tangent n-frames of a manifold
may be regarded as a principal fibre bundle.
27.12.1 Definition: A tangent r-frame at a point $p \in M$ of an n-dimensional $C^1$ manifold M for $r \in \mathbb{Z}^+_0$ with $r \le n = \dim(M)$ is any linearly independent sequence of r elements in $T_p(M)$.
A tangent basis at a point p ∈ M is any tangent n-frame at a point p of an n-dimensional manifold M .
27.12.2 Notation: $P_p^r(M)$ denotes the set of all tangent r-frames at a point p in a $C^1$ manifold M. (Then $P_p^r(M)$ is a subset of the set $T_p(M)^r$ of r-tuples of elements of $T_p(M)$.)
$P_p(M)$ denotes the set of all n-frames at $p \in M$, where $n = \dim(M)$. (Then $P_p(M) = P_p^n(M)$.)
$P^r(M)$ denotes the set $\bigcup_{p \in M} P_p^r(M)$ of all r-frames of the manifold M.
$P(M)$ denotes the set $\bigcup_{p \in M} P_p(M)$ of all coordinate frames of the manifold M.
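27.12.3 Remark: The set $P_p^r(M)$ may be expressed symbolically as follows:
$$ P_p^r(M) = \Big\{ (z_i)_{i=1}^r \in T_p(M)^r ;\ \forall (\lambda_i)_{i=1}^r \in \mathbb{R}^r,\ \Big( \sum_{i=1}^r \lambda_i z_i = 0 \Rightarrow (\forall i = 1, \ldots r,\ \lambda_i = 0) \Big) \Big\} . $$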
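A small example may help to fix the independence condition. The sketch below assumes n = 2 and a fixed chart $\psi$ at p, and uses the coordinate notation $t_{p,v,\psi}$ for tangent vectors; the specific component vectors are chosen only for illustration.

```latex
% Assumed example: n = 2, r = 2, vectors given by components with respect to one chart \psi.
% (a) z_1 = t_{p,(1,0),\psi}, z_2 = t_{p,(1,1),\psi}:
\[
  \lambda_1 (1,0) + \lambda_2 (1,1) = (\lambda_1 + \lambda_2,\ \lambda_2) = (0,0)
  \ \Longrightarrow\ \lambda_2 = 0,\ \lambda_1 = 0 ,
\]
% so (z_1, z_2) \in P_p^2(M). Equivalently,
\[
  \det \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = 1 \ne 0 .
\]
% (b) z_1 = t_{p,(1,0),\psi}, z_2 = t_{p,(2,0),\psi}:
\[
  2 z_1 - z_2 = 0 \quad\text{with}\quad (\lambda_1, \lambda_2) = (2, -1) \ne (0,0) ,
\]
% so (z_1, z_2) \notin P_p^2(M).
```

For general r the same test amounts to requiring that the $n \times r$ matrix of components has rank r, and this condition does not depend on the chart because components in different charts are related by an invertible matrix.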
27.12.4 Definition: A tangent operator r-frame is. . .
27.12.5 Remark: An atlas is introduced for the set of coordinate frames in Theorem 27.12.6 and Defini-
tion 27.12.8 using the coordinate basis vectors presented in Definition 27.7.11.
[ Give also a version of Theorem 27.12.6 using tangent component vectors rather than tangent operators. In this case, the basis vectors will be denoted as $e_i^{p,\psi} \in T_p(M)^n$. The proof is identical to that for Theorem 27.12.6. It's best to just prove the theorem for component vectors and write the operator case as a corollary. ]

27.12.6 Theorem: Let $(M, A_M)$ be an n-dimensional $C^1$ manifold. Then for any $p \in M$ and any chart $\psi \in A_M$, the sequence $(\partial_i^{p,\psi})_{i=1}^n$ is a basis for $\mathring{T}_p(M)$. Therefore $\forall p \in M,\ \dim(\mathring{T}_p(M)) = \dim(T_p(M)) = \dim(M)$. [ Is this really worth stating? ]
Hence (or otherwise) for any $p \in M$, $\psi \in A_M$ and n-frame $B \in \mathring{P}_p(M)$, there is a unique matrix $b = (b^i{}_j)_{i,j=1}^n \in GL(n)$ such that
$$ B = \Big( \sum_{i=1}^n \partial_i^{p,\psi}\, b^i{}_j \Big)_{j=1}^n . $$
If this sequence is denoted as $B_{p,b,\psi}$, then
$$ \mathring{P}_p(M) = \{ B_{p',b,\psi} ;\ b \in GL(n),\ \psi \in A_M,\ p' \in \mathrm{Dom}(\psi),\ p' = p \} $$
and
$$ \mathring{P}(M) = \{ B_{p,b,\psi} ;\ b \in GL(n),\ \psi \in A_M,\ p \in \mathrm{Dom}(\psi) \} . $$
The n-frames $B_{p,b,\psi}$ obey the following coordinate transformation rule for $\psi_\alpha, \psi_\beta \in A_M$:
$$ B_{p,b_\alpha,\psi_\alpha} = B_{p,b_\beta,\psi_\beta} $$
if and only if
$$ (b_\beta)^i{}_j = \sum_{k=1}^n (Z_{\beta\alpha}(p))^i{}_k\, (b_\alpha)^k{}_j , $$
where $Z_{\alpha\beta}(p)$ is as in Definition 26.5.7.

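Proof: A straightforward calculation using $\partial_i^{p,\psi_\alpha} = (\partial \psi_\beta^k / \partial \psi_\alpha^i)\, \partial_k^{p,\psi_\beta} = \partial_k^{p,\psi_\beta}\, Z_{\beta\alpha}(p)^k{}_i$ (with summation over k) yields the following:
$$ B_{p,b_\alpha,\psi_\alpha} = \Big( \sum_{i=1}^n \partial_i^{p,\psi_\alpha}\, (b_\alpha)^i{}_j \Big)_{j=1}^n $$
$$ = \Big( \sum_{i=1}^n \sum_{k=1}^n \partial_k^{p,\psi_\beta}\, Z_{\beta\alpha}(p)^k{}_i\, (b_\alpha)^i{}_j \Big)_{j=1}^n $$
$$ = \Big( \sum_{k=1}^n \partial_k^{p,\psi_\beta} \sum_{i=1}^n Z_{\beta\alpha}(p)^k{}_i\, (b_\alpha)^i{}_j \Big)_{j=1}^n $$
$$ = \Big( \sum_{k=1}^n \partial_k^{p,\psi_\beta}\, (b_\beta)^k{}_j \Big)_{j=1}^n = B_{p,b_\beta,\psi_\beta} , $$
where the last line uses $(b_\beta)^k{}_j = \sum_{i=1}^n Z_{\beta\alpha}(p)^k{}_i\, (b_\alpha)^i{}_j$. The converse follows because $(\partial_k^{p,\psi_\beta})_{k=1}^n$ is a basis, so the coefficients $(b_\beta)^k{}_j$ are unique.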
27.12.7 Remark: In contrast to the situation for tangent vector spaces, there are no algebraic operations
on the sets of tangent r-frames and coordinate frames. This is because these frames are not closed under
simple vector addition. However, the right action of the group G = GL(n) on the set of frames is of some
interest. The group actions µ : Ppr (M ) × G → Ppr (M ) for p ∈ M and µ : P r (M ) × G → P r (M ) turn out to be
the actions of topological transformation groups. (See Section 16.8 for topological transformation groups.)
When the set P (M ) is regarded as a fibre bundle over the base space M , it turns out that (P (M ), π, M ) is
a principal G-bundle.
Although the treatment of the tangent frame bundle is delayed until Section 34.9 so as to use the differentiable
fibre bundle definitions of Chapter 34, it is possible to introduce here the differentiable manifold structure
on the sets of tangent r-frames. This is presented in Definition 27.12.8.
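The compatibility of the right action with chart changes can be checked in a few lines. The sketch below assumes the frame notation $B_{p,b,\psi}$ and the transformation rule $b_\beta = Z_{\beta\alpha}(p)\, b_\alpha$ of Theorem 27.12.6; the explicit formula for the action $\mu$ is an assumed convention, since the text does not spell it out here.

```latex
% Assumed convention: g \in GL(n) acts on a frame B = (B_j)_{j=1}^n by
\[
  \mu(B, g) = \Big( \sum_{k=1}^n B_k\, g^k{}_j \Big)_{j=1}^n ,
  \qquad\text{so that}\qquad
  \mu(B_{p,b,\psi}, g) = B_{p,\,bg,\,\psi} .
\]
% Chart change, with Z = Z_{\beta\alpha}(p) and b_\beta = Z b_\alpha:
\[
  b_\beta\, g = (Z b_\alpha)\, g = Z\, (b_\alpha\, g) .
\]
% Hence B_{p,b_\alpha g,\psi_\alpha} = B_{p,b_\beta g,\psi_\beta}:
% acting by g gives the same frame in either chart.
```

The chart transition acts on the component matrix from the left and the group acts from the right, so the two operations commute by associativity of matrix multiplication.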
27.12.8 Definition: The total (tangent) r-frame space of a $C^1$ n-dimensional manifold $(M, A_M)$ for $r \in \mathbb{Z}^+_0$ with $r \le \dim(M)$ is the tuple $(P^r(M), A_{P^r(M)})$, where
(i) $P^r(M) = \{ (p, z) \in M \times T(M)^r ;\ z \in T_p(M)^r$ and the vectors $(z_1, \ldots z_r)$ are independent$\}$;
(ii) $A_{P^r(M)} = \{ \hat\psi ;\ \psi \in A_M \}$, where for any chart $\psi \in A_M$, the chart $\hat\psi : q^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{n+rn}$ is defined by $\hat\psi : (p, z) \mapsto (\psi(p), v_1, \ldots v_r)$, where $z_j = t_{p,v_j,\psi}$ for $j = 1, \ldots r$ and $q : P^r(M) \to M$ satisfies $q : (p, z) \mapsto p$.
The map $q : P^r(M) \to M$ is called the projection map of the total tangent r-frame space.
The total tangent n-frame space P(M) −< $(P(M), A_{P(M)}) = (P^n(M), A_{P^n(M)})$ of the manifold M is also called the total (tangent) coordinate frame space.
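The chart transition maps for the r-frame space follow from the tangent vector rule applied to each vector of the frame. The following sketch assumes the rule of Definition 27.3.3 in Jacobian form; the symbols $\tau$ and $D\tau$ are introduced here for illustration only.

```latex
% Assumed notation: \tau = \psi_\beta \circ \psi_\alpha^{-1} and D\tau(x) is its Jacobian matrix at x.
\[
  \hat\psi_\beta \circ \hat\psi_\alpha^{-1} (x, v_1, \ldots v_r)
  = \big( \tau(x),\ D\tau(x)\, v_1,\ \ldots,\ D\tau(x)\, v_r \big) .
\]
% Image of a chart: all (x, v_1, \ldots v_r) with x \in \psi(\mathrm{Dom}(\psi)) and
\[
  \mathrm{rank} \begin{pmatrix} v_1 & \cdots & v_r \end{pmatrix} = r ,
\]
% which is an open subset of \mathbb{R}^{n + rn}. For r = n this is
% \psi(\mathrm{Dom}(\psi)) \times GL(n) \subseteq \mathbb{R}^{n + n^2}.
```

Since $D\tau(x)$ is invertible, the rank condition is preserved by the transition map, so the independence requirement in part (i) of the definition is consistent across charts.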
27.12.9 Remark: The sets and functions in Definition 27.12.8 are illustrated in Figure 27.12.1.
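[Figure 27.12.1 (diagram not reproduced): the chart $\psi : U \to \mathbb{R}^n$ for $U \subseteq M$ and the corresponding chart $\tilde\psi : q^{-1}(U) \to \mathbb{R}^{n+rn}$ for $q^{-1}(U) \subseteq P^r(M)$.]
Figure 27.12.1 Total tangent r-frame space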
[ Define total tangent r-frame fibration, as for the tangent fibration. See Definition 34.9.1. ]
[ Show that P (M ) is a topological principal fibre bundle, or at least that the operation Rg on P (M ) transforms
as it should. ]
[ Must also define T (P (M )) here for use in connections. ]
27.13. Tangent space constructions, attributes and relations

This section collects together various constructions, attributes and relations for tangent spaces which are of
a general nature. They are generally lacking in any great theoretical interest. These things have been moved
out of the earlier sections of this chapter because they tend to be a bit of a yawn.
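27.13.1 Theorem: Let $M_1$ and $M_2$ be $C^1$ manifolds. Let $n_1 = \dim(M_1)$ and $n_2 = \dim(M_2)$. Let $p_1 \in M_1$ and $p_2 \in M_2$. Let $v_1 \in \mathbb{R}^{n_1}$ and $v_2 \in \mathbb{R}^{n_2}$. Let $\psi_1 \in \mathrm{atlas}_{p_1}(M_1)$ and $\psi_2 \in \mathrm{atlas}_{p_2}(M_2)$. Then the tangent operator $\partial_{(p_1,p_2),\,v_1 \oplus v_2,\,\psi_1 \oplus \psi_2} \in T_{(p_1,p_2)}(M_1 \times M_2)$ satisfies
$$ \partial_{(p_1,p_2),\,v_1 \oplus v_2,\,\psi_1 \oplus \psi_2}(f) = \sum_{i=1}^{n_1} v_1^i \Big( \frac{\partial}{\partial x_1^i} \big( f \circ \psi_1^{-1}(x_1) \big) \Big) \Big|_{x_1 = \psi_1(p_1)} + \sum_{j=1}^{n_2} v_2^j \Big( \frac{\partial}{\partial x_2^j} \big( f \circ \psi_2^{-1}(x_2) \big) \Big) \Big|_{x_2 = \psi_2(p_2)} $$
for all $f \in \mathring{C}^1_{(p_1,p_2)}(M_1 \times M_2, \mathbb{R})$.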
27.13.2 Notation: The tangent operator ∂(p1 ,p2 ),v1 ⊕ v2 ,ψ1 ⊕ ψ2 ∈ T(p1 ,p2 ) (M1 × M2 ) will also be denoted
by ∂p1 ,v1 ,ψ1 ⊕ ∂p2 ,v2 ,ψ2 .
27.14. Unidirectional tangent bundles

A unidirectional tangent bundle looks like a cone at each point of the manifold whereas the usual definition
of a tangent bundle looks like a tangent plane at each point. The natural invariant class of curves for
unidirectionally differentiable manifolds is the class of curves with one-sided derivatives everywhere. The set
of one-sided derivatives at a point constitutes a cone of tangent vectors.
The examples in Section 42.4 give some idea of the issues which arise for manifolds with Lipschitz transition
maps. The Lipschitz (i.e. $C^{0,1}$) condition guarantees the existence of derivatives almost everywhere, but does not
guarantee even the existence of unidirectional derivatives everywhere, as shown by Example 42.4.4. Existence
of tangent vectors almost everywhere may be adequate in contexts where vector fields are to be integrated,
for example. So the manifold classes C k,1 for integer k may be useful for some applications. In other
applications, directional derivatives may be required. Such manifolds may arise from differential equations
whose force functions have well-defined one-sided limits everywhere instead of full continuity.
27.14.1 Remark: Metadefinition 27.14.2 is the unidirectional version of Metadefinition 27.2.1. Unidirec-
tionally differentiable homeomorphisms are defined in Section 19.6. Unidirectionally differentiable manifolds
are defined in Section 26.11.

[ See Remark 19.6.6 for the ∂a+ notation. It is probably a pseudo-notation. ]
27.14.2 Metadefinition: A unidirectional tangent bundle for an n-dimensional unidirectionally differentiable manifold $(M, A_M)$ must provide a tuple $(T, \pi, \hat{A}_T, \Phi)$ −> $(T, \hat{A}_T)$ −> $T$ which satisfies the following conditions.
(i) $\pi : T \to M$ is a surjective map.
(ii) $\Phi : A_M \to \hat{A}_T$ is a bijection.
(iii) $\forall \psi \in A_M,\ \Phi(\psi) : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$.
(iv) $\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\ \Phi(\psi)\big|_{\pi^{-1}(\{p\})} : \pi^{-1}(\{p\}) \to \mathbb{R}^n$ is a bijection.
(v) $\forall p \in M,\ \forall \psi_1, \psi_2 \in \mathrm{atlas}_p(M),\ \forall V \in \pi^{-1}(\{p\})$,
$$ \Phi(\psi_2)(V) = \partial_a^+ \big( \psi_2(\psi_1^{-1}(x + a v)) \big) \Big|_{x = \psi_1(p),\ v = \Phi(\psi_1)(V)} . \tag{27.14.1} $$
$T$ is called the total space of the unidirectional tangent bundle.
An element of $T$ is called a unidirectional tangent vector.
$\pi$ is called the projection map of the unidirectional tangent bundle.
$\hat{A}_T$ is called the unidirectional tangent bundle atlas of the unidirectional tangent bundle.
The maps $\phi_\psi \in \hat{A}_T$ are called unidirectional tangent bundle charts.
$\Phi$ is called the lift function of the unidirectional tangent bundle.
A unidirectional tangent vector at $p \in M$ is any element of $\pi^{-1}(\{p\})$.
The unidirectional tangent vector at $p \in M$ with coordinates $v \in \mathbb{R}^n$ with respect to chart $\psi \in A_M$ is the unidirectional tangent vector $\big( \Phi(\psi)\big|_{\pi^{-1}(\{p\})} \big)^{-1}(v) \in T$.
27.14.3 Remark: Equation (27.14.1) may be written more fully as follows.
$$ \Phi(\psi_2)(V) = \lim_{a \to 0^+} \frac{\psi_2(\psi_1^{-1}(x + a v)) - \psi_2(\psi_1^{-1}(x))}{a} \Big|_{x = \psi_1(p),\ v = \Phi(\psi_1)(V)} $$
$$ = \lim_{a \to 0^+} \frac{\psi_2(\psi_1^{-1}(\psi_1(p) + a \Phi(\psi_1)(V))) - \psi_2(p)}{a} $$
$$ = \partial_a^+\, \psi_2(\psi_1^{-1}(\psi_1(p) + a \Phi(\psi_1)(V))) . $$
Apart from condition (v), Metadefinition 27.14.2 is the same as Metadefinition 27.2.1. If the manifold M is $C^1$, the metadefinitions are the same.
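A piecewise linear transition map gives a simple case where the one-sided rule is not linear. The sketch below is an assumed example on $M = \mathbb{R}$ with two hypothetical charts; whether it satisfies every condition of the unidirectional differentiability definitions in Sections 19.6 and 26.11 is not checked here.

```latex
% Assumed example: M = \mathbb{R}, \psi_1(p) = p, and
\[
  \psi_2(p) = \begin{cases} p & \text{for } p \ge 0 \\ 2p & \text{for } p < 0 . \end{cases}
\]
% At p = 0, with v = \Phi(\psi_1)(V), the one-sided limit in Remark 27.14.3 gives
\[
  \Phi(\psi_2)(V) = \lim_{a \to 0^+} \frac{\psi_2(a v) - \psi_2(0)}{a}
  = \begin{cases} v & \text{for } v \ge 0 \\ 2v & \text{for } v < 0 . \end{cases}
\]
% The component map v \mapsto \Phi(\psi_2)(V) is positively homogeneous but not linear:
\[
  1 \mapsto 1, \qquad -1 \mapsto -2, \qquad 1 + (-2) = -1 \ne 0 .
\]
```

The failure of additivity is why the fibre at such a point is better described as a cone of tangent vectors than as a linear space whose structure is independent of the chart.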
27.14.4 Remark: It seems fairly clear that any two representations of Metadefinition 27.14.2 must be
related by a unique isomorphism. This is required if the metadefinition is to be a complete characterization
of unidirectional tangent bundles. [ Provide a theorem to confirm the unique isomorphism property. ]
[ Here might be a good place to insert a section which generalizes tangent vectors to $C^{0,1}$ manifolds, $C^{0,\alpha}$ manifolds, rectifiable manifolds, and manifolds which are simply differentiable (as opposed to continuously differentiable). For $C^{0,1}$ (Lipschitz) manifolds, the manifold would be differentiable almost everywhere. So the tangent bundle would presumably be an almost-everywhere tangent bundle. This suggests possibly using distributions instead of functions. $C^{0,1}$ and $W^{1,\infty}$ regularity are essentially equivalent. This suggests a motivation for using Sobolev spaces and weakly differentiable functions. ]
27.15. Distributions as representations of tangent bundles

[ For completeness, there should be a section on distributions as a framework for defining tangent vectors
as in Remark 27.1.1, case (v). This could be useful for partial differential equations and field theories. It
would be interesting to see how the higher derivatives of delta functions are related to higher order tangent
vectors, tensors and differentials on a differentiable manifold. Maybe a whole chapter would be needed for
this. Manifolds with Sobolev space regularity could be combined with the distribution representations. This
means that distributions must be fully defined, probably somewhere near Section 20.10. ]

27.16. Tangent bundles on infinite-dimensional manifolds

[ Maybe should present infinite-dimensional manifolds some day. Infinite dimensional manifolds have relevance
to calculus of variations, Jacobi fields on geodesics, motion of manifolds by mean curvature, vibrations of
strings and membranes and other applications. For example, the infinitesimal variations of a curve or
membrane may be regarded as elements of a tangent space at a point in the space of curves or membranes. ]
27.16.1 Remark: If infinite-dimensional manifolds were within the scope of this book, it would be impor-
tant to choose the definition of tangent vectors to maximize the range of possible manifolds on which tangent
vectors can be defined. An example of a very infinite-dimensional manifold is the set M of all C 1 diffeomor-
phisms of a given manifold S. In this case, the tangent vectors of M would be the vector fields generated on
S by families of diffeomorphisms. Such a family is a differentiable curve in M . This suggests that the most
general form of tangent vector definition might be the set of all derivatives $\gamma'(0)$ of curves $\gamma : \mathbb{R} \to M$. This
still raises the question of what space the derivatives $\gamma'(0)$ should live in. This could be some sort of Hilbert
space, Banach space or topological vector space. It is best to use differential operators for motivation only;
then use coordinates for the formal definition.

[601]
Chapter 28
Tensor bundles and tensor fields on manifolds
28.1 Contravariant tensors and tensor spaces
28.2 Cotangent vectors and cotangent spaces
28.3 Covariant and mixed tensors
28.4 Double tangent spaces
28.5 Vector fields
28.6 Tangent operator fields
28.7 Tensor fields
28.8 Vector fields and tensor fields along curves
28.9 Differential forms
28.0.1 Remark: This chapter presents covariant vectors, tensors, vector fields, tensor fields and differential
forms. Differentiable manifolds were defined in Chapter 26. Several kinds of tangent spaces were defined in Chapter 27.
[ The following table will give an overview of tangent spaces, vector fields and differentials. It isn’t ready to
look at yet. Tangent space building principles are summarized in Section 26.14. ]
type                       TS               TTS              vector fields        PD       GD    references
real function              C_p^k(M)         C^k(M)                                f(p)     f     26.6
manifold map               C_p^k(M1,M2)     C^k(M1,M2)                            φ(p)     φ     26.9
tangent                    T_p(M)           T(M)             X^r(M)               γ′(t)    γ′    27.7, 27.8, 28.5
tangent operator           T̊_p(M)           T̊(M)                                  γ′(t)    γ′
tagged tangent operator    T̂_p(M)           T̂(M)             X^r(T̂(M))            γ′(t)    γ′    28.5
tangent frame              P_p(M)           P(M)             X^r(P(M))
cotangent                  T*_p(M)          T*(M)            X^r(T*(M))           (df)_p   df    28.2
type (r,s) tensor          T_p^{r,s}(M)     T^{r,s}(M)       X^r(T^{r,s}(M))                     28.3, 28.7
map cotangent              T*_p(M1,M2)      T*(M1,M2)        X^r(T*(M1,M2))       (dφ)_p   dφ    28.4
map tangent                T_p(M1,M2)       T(M1,M2)         X^r(T(M1,M2))                       28.4
order-k tangent            T_p^[k](M)       T^[k](M)         X^r(T^[k](M))        (dφ)_p   dφ    29.7
order-k tangent operator   T̊_p^[k](M)       T̊^[k](M)                              (dφ)_p   dφ
tagged order-k tang. op.   T̂_p^[k](M)       T̂^[k](M)         X^r(T̂^[k](M))        (dφ)_p   dφ
[ Missing from this table are semi-groups like φ : IR × M → M , which generate vector fields, and higher-
order differentials like d2 f and d2 φ. Another missing class is curves and families of curves. Families of
diffeomorphisms are related to curves. Could also present multi-parameter families of differentiable maps.
Such maps are not necessarily diffeomorphisms. ]
28.0.2 Remark: As in the case of tangent bundles, it is important to remember that tensor bundles are
not defined as special cases of differentiable fibre bundles. A tensor bundle does not have the differentiable
structure group of the fully-defined differentiable fibre bundles which are introduced in Chapter 34. The
rough interrelationships are sketched in Figure 28.0.1.

[Figure 28.0.1 (diagram not reproduced): a tree whose nodes are "non-topological fibre bundle", "tangent bundle", "tensor bundle", "differentiable fibre bundle", "differentiable tangent fibre bundle" and "differentiable tensor fibre bundle".]
Figure 28.0.1 Family tree for fibre bundles and tensor bundles
28.1. Contravariant tensors and tensor spaces

Contravariant tensors are constructed from ordinary tangent spaces.
[ Maybe comment on naming of “contravariant” vectors and tensors. Maybe the history is from the forms
being covariant and the contravariant vectors varying oppositely. Check the history books! Note that
covariant coefficients require no metric, whereas contravariant coefficients do require a metric. (Check this.)
Distinguish contravariant vectors from contravariant coefficients. ]
28.2. Cotangent vectors and cotangent spaces

Cotangent spaces are defined to give differentials a “place to live”. Differentials are linear functionals on
the space of tangent vectors. The properties of linear functionals on a finite-dimensional linear space are
well-known. (See Section 10.5.) In particular, the dual space has the same dimension as the original space.
(See EDM2 [35], 256.G, for dual spaces.)
28.2.1 Remark: There are many ways to define cotangents on differentiable manifolds. Cotangents at a
point p in a manifold M may be defined as the dual of the linear space Tp (M ) of tangent vectors at p. This
dual can be defined either (1) in terms of standard (induced) linear structure on Tp (M ) or (2) as cotangent
triple equivalence classes $t^*_{p,w,\psi} = [(p, w, \psi)]$ defined so that $t^*_{p,w,\psi} . t_{p,v,\psi} = w_i v^i$ for $t_{p,v,\psi} \in T_p(M)$. In
case (1), the cotangent vectors are linear maps from Tp (M ) to IR.
Cotangents can also be defined in terms of flat-space cotangents just as tangent vectors are. Thus a cotangent
would be a triple (p, w, ψ) where w ∈ IRn is the sequence of components of a cotangent at ψ(p) ∈ IRn for
manifold charts ψ ∈ atlasp (M ).
Another way to define a cotangent vector at p ∈ M would be as an equivalence class of functions f ∈ C 1 (M )
which have the same gradient. Thus two functions f, g ∈ C 1 (M ) would be considered equivalent if DV f =
DV g for all V ∈ Tp (M ). (Notation 27.5.7 defines DV .) A cotangent would then be an equivalence class
[f ] of f ∈ C 1 (M ) with respect to this equivalence relation. This construction for cotangent vectors is an
analogue of the construction of contravariant vectors on manifolds as equivalence classes of curves with the
same derivative. (See Remark 27.1.1, paragraphs (iii) and (iv).)
Cotangent vectors can also be defined as maps from the set of $C^1$ curves $\gamma : \mathbb{R} \to M$ with $\gamma(0) = p$
to the real numbers. This is a dual of the curve-equivalence-class representation of tangent vectors. (See
Remark 27.1.1 (iii).) Every function $f \in C^1(M)$ corresponds to such a cotangent vector. The map
$\gamma \mapsto \partial_t (f(\gamma(t)))\big|_{t=0}$ has the same value for all curves $\gamma$ in a curve equivalence class at $p \in M$.
Just as in the case of (contravariant) tangent vectors, one really wants to be able to use all representations
of cotangent vectors interchangeably according to the application du jour. But for practical reasons, it is
necessary to choose one of the representations as the standard and derive all of the others from the standard
version.
This book tries to use the “economy principle” to help choose the best standard representation of each class
of object. This eliminates some of the more artistically pleasing representations involving curves, smooth
functions and linear functionals. The most economical representation is in terms of equivalence classes
[(p, w, ψ)] for w ∈ IRn and charts ψ ∈ atlasp (M ). These are the same kinds of triples as for tangent vectors,
but the computation rules are different.
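As a hedged illustration of how the computation rules differ for the two kinds of triples, the following sketch writes out the usual chart-transition rules. The symbols $J^i{}_j$, $\bar v$ and $\bar w$ are introduced here only for illustration and are not notation from the text; the tangent-vector rule is assumed to be the standard one.

```latex
% Assumption: J is the Jacobian at p of the transition map from chart \psi to chart \bar\psi.
\[
  J^i{}_j = \frac{\partial (\bar\psi^i \circ \psi^{-1})}{\partial x^j}\Big|_{x=\psi(p)} .
\]
% Tangent triples: (p, v, \psi) and (p, \bar v, \bar\psi) are equivalent when
\[
  \bar v^i = \sum_{j=1}^n J^i{}_j \, v^j .
\]
% Cotangent triples: (p, w, \psi) and (p, \bar w, \bar\psi) are equivalent when
\[
  w_j = \sum_{i=1}^n \bar w_i \, J^i{}_j .
\]
% With these two rules the pairing of a covector with a vector is chart-independent:
\[
  \sum_{i=1}^n \bar w_i \bar v^i
  = \sum_{i,j=1}^n \bar w_i \, J^i{}_j \, v^j
  = \sum_{j=1}^n w_j v^j .
\]
```

The same triple format thus carries two different equivalence relations, one using the Jacobian and the other using its transpose inverse.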

[ For better consistency and more efficient calculations, should use covectors of the form $t^*_{p,w,\psi} = [(p, w, \psi)]$
for $T^*(M) \equiv T^{0,1}(M)$, where $p \in M$, $\psi \in A_{M,p}$, $w \in (\mathbb{R}^n)^* = \mathrm{Lin}(\mathbb{R}^n, \mathbb{R})$. Then define $\mu : T^{0,1}(M) \times T^{1,0}(M) \to T^{0,0}(M)$
by $\mu : ([(p, w, \psi)], [(p, v, \psi)]) \mapsto [(p, w(v), \psi)]$. But $T_p^{0,0}(M)$ is equivalent to $\mathbb{R}$. So
$[(p, w(v), \psi)] \equiv w(v) \in \mathbb{R}$. Then for general tensors, likewise use $[(p, K, \psi)]$ with $K$ a flat-space tensor
in $\otimes^{r,s}\, \mathbb{R}^n$. Really need $\mu : T^{0,1}(M) \times T^{1,0}(M) \to \mathbb{R}$. Is there any real use for $T^{0,0}(M)$ with real numbers
attached at points of $M$? ]
[ See Crampin/Pirani [12], page 76, regarding “covector fields and the Lie derivative”. On page 37, they call
cotangent vectors “covectors”. ]
28.2.2 Definition: The cotangent (vector) space of a C 1 manifold M at a point p ∈ M is the dual linear
space Lin(Tp (M ), IR) of Tp (M ), denoted as Tp∗ (M ).
Any element of Tp∗ (M ) is called a cotangent (vector) or covector of M at p.
28.2.3 Remark: The pointwise cotangent space $T_p^*(M)$ in Definition 28.2.2 is strictly speaking an
abbreviation for the tuple $(\mathbb{R}, T_p^*(M), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma, \mu)$, where $\sigma_{\mathbb{R}}$ and $\tau_{\mathbb{R}}$ are respectively the addition and
multiplication operations on $\mathbb{R}$, $\sigma$ is the addition operation on $T_p^*(M)$, and $\mu$ is the scalar multiplication
operation of $\mathbb{R}$ on $T_p^*(M)$. (See Definition 10.1.2 for linear spaces.)
It is usual to assume that the space Tp∗ (M ) has its standard topology, but not the standard metric, inner
product or norm, because these are not invariant under chart transitions.
[ Near here, define T̊p∗ (M ) as the dual of T̊p (M ) (or otherwise). The elements of T̊p∗ (M ) should map operators
∂p,v,ψ to IR. ]
[ Probably should make the following remark into a formal definition. ]
28.2.4 Remark: Cotangents may be coordinatized in terms of coordinates on the tangent space $T_p(M)$.
If $\psi$ is a chart for $M$ at $p$, and $\omega \in T_p^*(M)$, define $\Psi : T_p^*(M) \to \mathbb{R}^n$ by $\Psi : \omega \mapsto (\omega(t_{p,e_i,\psi}))_{i=1}^n$, where for
each $i = 1 \ldots n$, $e_i \in \mathbb{R}^n$ has components $(e_i)^j = \delta_i^j$. If $w = \Psi(\omega)$, then $w_i = \omega(t_{p,e_i,\psi})$ for $i = 1 \ldots n$, and
$\omega(V) = \sum_{i=1}^n w_i v^i$ for all $V = t_{p,v,\psi} \in T_p(M)$.
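A small worked instance may make this coordinatization concrete. The sketch below assumes $M = \mathbb{R}^2$ with the identity chart $\psi$, the point $p = (1, 2)$, and the covector $\omega$ taken to be the differential at $p$ of the function $f(x^1, x^2) = (x^1)^2 x^2$. None of these choices come from the text.

```latex
% Assumptions: M = R^2, \psi = identity chart, p = (1,2), f(x^1,x^2) = (x^1)^2 x^2,
% and \omega(t_{p,v,\psi}) = \sum_i v^i \partial_i f(p).
\[
  w_1 = \omega(t_{p,e_1,\psi}) = \frac{\partial f}{\partial x^1}(p) = 2 x^1 x^2 \Big|_{(1,2)} = 4,
  \qquad
  w_2 = \omega(t_{p,e_2,\psi}) = \frac{\partial f}{\partial x^2}(p) = (x^1)^2 \Big|_{(1,2)} = 1 .
\]
% Hence \Psi(\omega) = (4, 1), and for a general tangent vector V = t_{p,v,\psi}:
\[
  \omega(V) = \sum_{i=1}^2 w_i v^i = 4 v^1 + v^2 ,
  \qquad\text{e.g. } v = (3, -1) \text{ gives } \omega(V) = 11 .
\]
```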
28.2.5 Definition: A coordinate basis covector at a point $p \in M$ for a chart $\psi \in A_M$ in a $C^1$ manifold
$(M, A_M)$ with $n = \dim(M)$ is the vector $e^i_{p,\psi} \in T_p^*(M)$ defined for $i = 1 \ldots n$ by $e^i_{p,\psi}(t_{p,v,\psi}) = v^i$ for
all $v \in \mathbb{R}^n$.
28.2.6 Remark: The unit basis cotangent vectors $e^i_{p,\psi}$ in Definition 28.2.5 are equivalent to the differentials
$(d\psi^i)_p \in \mathring{T}_p^*(M)$. One could also use the simplified notation $d^i_{p,\psi}$ or $d^i$. Then the identity $d^i_{p,\psi}(e^{p,\psi}_j) = \delta^i_j$
holds in terms of the unit vector notation $e^{p,\psi}_j$ in Definition 27.7.4. (See also Remark 30.2.8 for similar comments
on notation.) The cotangent vectors $(d\psi^i)_p$ may also be written as $d\psi^i(p)$.
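28.2.7 Theorem: The unit vectors $d^i_{p,\psi} = (d\psi^i)_p$ for $i \in \mathbb{N}_n$ at $p \in M$ on an $n$-dimensional $C^1$ manifold
$M$ satisfy the transformation rule (with summation over the repeated index $j$):
$$ d^i_{p,\tilde\psi} = \frac{\partial \tilde\psi^i(\psi^{-1}(x))}{\partial x^j}\Big|_{x=\psi(p)} \, d^j_{p,\psi}
 = \frac{\partial \tilde\psi^i}{\partial \psi^j} \, d^j_{p,\psi} $$
for charts $\psi, \tilde\psi \in \mathrm{atlas}_p(M)$.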
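As a hedged check of the transformation rule for coordinate differentials, the following sketch takes $\psi$ to be the Cartesian chart $(x, y)$ and $\tilde\psi$ the polar chart $(r, \theta)$ on an open subset of the plane where polar coordinates form a chart (for example the half-plane $x > 0$). The specific point is an arbitrary choice.

```latex
% Assumptions: \psi = (x, y), \tilde\psi = (r, \theta) with r = \sqrt{x^2 + y^2},
% \theta = \arctan(y/x) on the half-plane x > 0.
\[
  dr = \frac{\partial r}{\partial x}\,dx + \frac{\partial r}{\partial y}\,dy
     = \frac{x}{r}\,dx + \frac{y}{r}\,dy ,
  \qquad
  d\theta = \frac{\partial \theta}{\partial x}\,dx + \frac{\partial \theta}{\partial y}\,dy
     = -\frac{y}{r^2}\,dx + \frac{x}{r^2}\,dy .
\]
% At the point with Cartesian coordinates (3, 4), where r = 5:
\[
  dr = \tfrac{3}{5}\,dx + \tfrac{4}{5}\,dy ,
  \qquad
  d\theta = -\tfrac{4}{25}\,dx + \tfrac{3}{25}\,dy .
\]
```

Each new coordinate differential is a combination of the old ones whose coefficients form one row of the Jacobian matrix of the chart transition map.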
[ Maybe have a dual space basis notation $(d^i_\psi)_{i=1}^n$ analogous to $(\partial^\psi_i)_{i=1}^n$. Write an explicit formula for
$d^i_\psi(p, v, \bar\psi) = \frac{\partial (\psi \circ \bar\psi^{-1}(x))^i}{\partial x^k}\Big|_{x=\bar\psi(p)} . v^k$ ? ]
28.2.8 Definition: The total cotangent space of a $C^1$ $n$-dimensional manifold $M$ −< $(M, A_M)$ is the $C^0$
manifold $T^*(M)$ −< $(T^*(M), A_{T^*(M)})$, where
(i) $T^*(M) = \bigcup_{p \in M} T_p^*(M)$, and
(ii) $A_{T^*(M)} = \{\hat\psi^* ;\ \psi \in A_M\}$, where for any chart $\psi \in A_M$, the chart $\hat\psi^* : (\pi^*)^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{2n}$ is defined
by $\hat\psi^* : \omega \mapsto (\psi(p), (\omega(t_{p,e_i,\psi}))_{i=1}^n)$ with $p = \pi^*(\omega)$, where $\pi^* : T^*(M) \to M$ is the projection map for $T^*(M)$, and $e_i$
denotes the $i$th unit vector of $\mathbb{R}^n$ for $i = 1, \ldots n$.
28.2.9 Remark: The charts and projection maps for Definition 28.2.8 are illustrated in Figure 28.2.1.
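Figure 28.2.1 Charts and projection maps for tangent and cotangent spaces
(Diagram: the chart from T (M ) to IR2n with projection π : T (M ) → M ; the chart ψ : M → IRn ; the chart from T ∗ (M ) to IR2n with projection π ∗ : T ∗ (M ) → M .)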
28.3. Covariant and mixed tensors

The space T ∗ (M ) is defined in Section 30.2.
[ Must define the full topological and differentiable structures for all tensor spaces in this section. ]
28.3.1 Remark: In Notation 28.3.2, it would probably be better to define $T_p^{r,s}(M) = \otimes^{r,s}\, T_p(M)$. In
Remark 13.8.4, it was mentioned that there is a difficulty here in notating the fully general ordering of
mixed tensors. One could use a notation such as $T_p^{1,-1,1,1}(M) = T_p(M) \otimes T_p^*(M) \otimes T_p(M) \otimes T_p(M)$. More
generally, $T_p^J(M) = \otimes_{i=1}^n V_i$, where $J \in \{-1, 1\}^n$, and for $i = 1, \ldots n$, $V_i = T_p(M)$ if $J(i) = 1$, and
$V_i = T_p^*(M)$ if $J(i) = -1$. There doesn’t seem to be any standard notation such as this.
[ The tensor spaces in Notation 28.3.2 are constructed pointwise from multilinear functions of the pointwise
tangent vector spaces Tp (M ). It could be simpler to re-use the flat-space definitions as in [(p, K, ψ)]. ]
28.3.2 Notation: $T_p^{r,s}(M)$ denotes the tensor product of $r$ copies of the tangent space $T_p(M)$ of the $C^1$
manifold $M$ at the point $p \in M$, and $s$ copies of the dual space $T_p^*(M)$:
$$ T_p^{r,s}(M) = \Big( \bigotimes_{i=1}^r T_p(M) \Big) \otimes \Big( \bigotimes_{j=1}^s T_p^*(M) \Big), $$
where $\otimes$ denotes the tensor product of linear spaces. (See Section 13.12.)
[ Define coordinates for Tpr,s (M ). ]
28.3.3 Notation: $T^{r,s}(M)$ denotes the set $\bigcup_{p \in M} T_p^{r,s}(M)$.
[ Define the atlas for T r,s (M ). ]
28.3.4 Definition: A tensor of type (r, s) at a point p in a C 1 manifold M is any element of Tpr,s (M ).
[ Here the definitions of tensor addition and product should be imported from Chapter 13. ]
28.3.5 Theorem: For any tensor $K$ of type $(r, s)$ at a point $p$ in a $C^1$ manifold $M$, there are unique
numbers $(K^i_j)_{i \in \mathbb{N}_n^r,\, j \in \mathbb{N}_n^s}$ such that
$$ K = \sum_{i \in \mathbb{N}_n^r,\ j \in \mathbb{N}_n^s} K^i_j \, \Big( \bigotimes_{k=1}^r \partial^{p,\psi}_{i_k} \Big) \otimes \Big( \bigotimes_{\ell=1}^s (dx^{j_\ell})_{p,\psi} \Big). $$
[ This theorem has to be tidied up a bit to take care of the chart dependence properly. Are the $dx^{j_\ell}$
differentials of real-valued functions? ]

28.3.6 Definition: The numbers $(K^i_j)_{i \in \mathbb{N}_n^r,\, j \in \mathbb{N}_n^s}$ in Theorem 28.3.5 are the components of the tensor $K$
with respect to the chart $\psi$.
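To illustrate what the multi-index expansion means in the smallest mixed case, here is a hedged sketch for a tensor of type $(1, 1)$ on a 2-dimensional manifold. The abbreviations $\partial_i$ and $dx^j$ for the chart basis operators and differentials at $p$ are shorthand adopted only for this example.

```latex
% Assumptions: n = 2, (r, s) = (1, 1); \partial_i and dx^j abbreviate the basis
% operators \partial^{p,\psi}_i and differentials (dx^j)_{p,\psi} for a fixed chart \psi.
\[
  K = \sum_{i=1}^2 \sum_{j=1}^2 K^i_j \; \partial_i \otimes dx^j
    = K^1_1 \, \partial_1 \otimes dx^1
    + K^1_2 \, \partial_1 \otimes dx^2
    + K^2_1 \, \partial_2 \otimes dx^1
    + K^2_2 \, \partial_2 \otimes dx^2 .
\]
% In general the index set N_n^r x N_n^s has n^{r+s} elements, so a tensor of
% type (r, s) on an n-dimensional manifold has n^{r+s} components in each chart:
\[
  \#\big( \mathbb{N}_n^r \times \mathbb{N}_n^s \big) = n^{r+s},
  \qquad\text{here } 2^{1+1} = 4 .
\]
```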
28.3.7 Theorem: [ The coordinate transformation rule for tensors. ]
[ Define tensor bundles in two ways: as vector bundles of tensors and as the tensor product of vector bundles? ]
28.3.8 Definition: [Definition of tensor bundle. Define the atlas for T r,s (M ).]
[ Must do “double fibre bundles” in fibre.tex and lie.tex. ]
28.4. Double tangent spaces

This section deals with tangent spaces such as T (M1 , M2 ) and T ∗ (M1 , M2 ) for C 1 manifolds M1 and M2 .
Such spaces are useful for representing differentials of maps between manifolds.
[ Must define canonical atlases for these double tangent spaces. Show how the special cases M1 = IR or
M2 = IR lead to spaces T ∗ (M ) and T (M ) respectively. Thus T ∗ (M ) is associated with real-valued functions
in the same way that T (M ) is associated with differentiable curves. Discuss the fact that Tp (IR) can be
identified with IR for all p ∈ IR because IR has an absolute parallelism. Therefore Tp (M ) may be identified
with T0 (M ), and so forth. ]
28.4.1 Definition: The (pointwise) double tangent space of a pair of C 1 manifolds M1 and M2 at points
p1 ∈ M1 and p2 ∈ M2 is the linear space Tp1 ,p2 (M1 , M2 ) = Hom(Tp1 (M1 ), Tp2 (M2 )) of linear homomorphisms
between the tangent spaces Tp1 (M1 ) and Tp2 (M2 ).
28.4.2 Remark: It is quite easy to define coordinates for spaces $T_{p_1,p_2}(M_1, M_2) = \mathrm{Hom}(T_{p_1}(M_1), T_{p_2}(M_2))$
in Definition 28.4.1. Let $\psi_1 \in \mathrm{atlas}_{p_1}(M_1)$ and $\psi_2 \in \mathrm{atlas}_{p_2}(M_2)$. Basis vectors $e^{p_1,\psi_1}_i = t_{p_1,e_i,\psi_1} \in T_{p_1}(M_1)$
and $e^{p_2,\psi_2}_j = t_{p_2,e_j,\psi_2} \in T_{p_2}(M_2)$ for $i = 1, \ldots n_1 = \dim(M_1)$ and $j = 1, \ldots n_2 = \dim(M_2)$ are provided by
Definition 27.7.4. The basis covectors $e^i_{p_1,\psi_1} \in T^*_{p_1}(M_1)$ are defined in Definition 28.2.5.
Basis vectors $e^{p_2,\psi_2,i}_{p_1,\psi_1,j} \in \mathrm{Hom}(T_{p_1}(M_1), T_{p_2}(M_2))$ may be defined so that for all $V = \sum_{k=1}^{n_1} v^k e^{p_1,\psi_1}_k \in T_{p_1}(M_1)$,
$e^{p_2,\psi_2,i}_{p_1,\psi_1,j}(V) = v^i e^{p_2,\psi_2}_j$. A mnemonic for this is $e^i_j = e_j \cdot e^i$, where the operation “·” represents the scalar
multiplication on $T_{p_2}(M_2)$. Thus $e^i_j(V) = e_j \cdot e^i(V)$ for $V \in T_{p_1}(M_1)$. [ This implies $\dim(T_{p_1,p_2}(M_1, M_2)) = n_1 n_2$ ? ]
[ Define components $c^j_i$ so that $L = c^j_i e^i_j$ for $L \in \mathrm{Hom}(\ldots, \ldots)$, or something like that. Relate this to Jacobian
matrices as in Definition 30.3.15? ]
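The coordinates suggested in this remark can be sketched for a small case. The following is only an illustration under the assumption $n_1 = 2$, $n_2 = 3$, using the mnemonic $e^i_j$ and hypothetical components $c^j_i$; it is not a definition taken from the text.

```latex
% Assumptions: n_1 = 2, n_2 = 3; e^i_j(V) = v^i e_j for V = \sum_k v^k e_k;
% the components c^j_i are hypothetical names for the coefficients of L.
\[
  L = \sum_{i=1}^{2} \sum_{j=1}^{3} c^j_i \, e^i_j
  \quad\Longrightarrow\quad
  L(V) = \sum_{j=1}^{3} \Big( \sum_{i=1}^{2} c^j_i \, v^i \Big) e_j .
\]
% So the component vector of L(V) in T_{p_2}(M_2) is obtained from that of V
% by multiplying by the 3 x 2 matrix (c^j_i):
\[
  \begin{pmatrix} (L V)^1 \\ (L V)^2 \\ (L V)^3 \end{pmatrix}
  =
  \begin{pmatrix} c^1_1 & c^1_2 \\ c^2_1 & c^2_2 \\ c^3_1 & c^3_2 \end{pmatrix}
  \begin{pmatrix} v^1 \\ v^2 \end{pmatrix} .
\]
% Every linear map between the two tangent spaces has exactly one such matrix,
% so the six maps e^i_j form a basis and the dimension is n_1 n_2 = 6.
```

When $L$ is the differential of a differentiable map between the manifolds, one would expect the matrix $(c^j_i)$ to be the Jacobian matrix of the map in the charts $\psi_1$ and $\psi_2$.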
28.4.3 Definition: The total double tangent space of a pair of $C^1$ manifolds $M_1$ and $M_2$ is the space
$T(M_1, M_2) = \bigcup_{p_1 \in M_1,\, p_2 \in M_2} T_{p_1,p_2}(M_1, M_2)$ of all linear homomorphisms between tangent spaces $T_{p_1}(M_1)$
and $T_{p_2}(M_2)$.
[ Must also define an atlas and fibration structure etc. in Definition 28.4.3? Is this a fibration over M1 × M2 ?
Maybe also a fibration over M1 and M2 ? πj : T (M1 , M2 ) → Mj etc. ]
[ Define an atlas for T (M1 , M2 ) and define tangent space T (T (M1 , M2 )). Atlas ψ̂(. . .) = (ψ1 (p1 ), ψ2 (p2 ), . . .).
Bring in a Jacobian matrix. ]
28.4.4 Remark: Corresponding to the double tangent spaces are the double cotangent spaces. Clearly
T ∗ (M1 , M2 ) = T (M2 , M1 ) for any C 1 manifolds M1 and M2 .
28.4.5 Definition: The (pointwise) double cotangent space of a pair of C 1 manifolds M1 and M2 at points
p1 ∈ M1 and p2 ∈ M2 is the space Tp∗1 ,p2 (M1 , M2 ) = Hom(Tp2 (M2 ), Tp1 (M1 )) of linear homomorphisms
between the tangent spaces Tp2 (M2 ) and Tp1 (M1 ).
[ Must also define an atlas and linear space operations etc. ]
28.4.6 Definition: The total double cotangent space of a pair of $C^1$ manifolds $M_1$ and $M_2$ is the space
$T^*(M_1, M_2) = \bigcup_{p_1 \in M_1,\, p_2 \in M_2} T^*_{p_1,p_2}(M_1, M_2)$ of all linear homomorphisms between tangent spaces $T_{p_2}(M_2)$
and $T_{p_1}(M_1)$.

[ Must also define an atlas and fibration structure etc. Or basis and coordinates? ]

28.5. Vector fields

Vector fields on manifolds are vector-valued functions. The value of a vector field at any point of a manifold
is a tangent vector at that point. The symbol X is often used to denote vector fields and spaces of vector
fields. Since vector fields may be interpreted as cross-sections of tangent bundles, a useful mnemonic for the
letter X is the word “cross”. Figure 28.5.1 is an artist’s impression of a vector field.
Figure 28.5.1 A vector field on a manifold
(Diagram: a manifold with arrows attached at its points, one of which is labelled X(p) ∈ Tp (M ) at the point p.)
28.5.1 Definition: A (tangent) vector field on a C 1 manifold M is a function X : M → T (M ) such that

X(p) ∈ Tp (M ) for all p ∈ M .
A (tangent) vector field on a subset S ⊆ M of a C 1 manifold M is a function X : S → T (M ) such that
X(p) ∈ Tp (M ) for all p ∈ S.
28.5.2 Remark: A vector field on a subset S of a manifold M in Definition 28.5.1 is the same thing as a
vector field on the manifold S with the relative differentiable structure. (See Definition 26.5.12 for relative
differentiable structures.) It is generally assumed that definitions for whole manifolds also apply to open
subsets with the relative differentiable structure.
28.5.3 Remark: The atlas AT (M ) of the total tangent space T (M ) of a C 1 manifold M is defined in Defini-
tion 27.8.1. A chart ψ̂ ∈ AT (M ) is associated with each chart ψ ∈ AM . By Theorem 27.8.13, (T (M ), AT (M ) )
is a C r−1 manifold if (M, AM ) is a C r manifold with r ≥ 1.
28.5.4 Remark: It follows from Definition 26.9.1 and Theorem 27.8.13 that a vector field $X$ is of class $C^r$
on a $C^{r+1}$ manifold $M$ for $r \in \bar{\mathbb{Z}}^+_0$ if and only if the map $\hat\psi \circ X \circ \psi^{-1}$ is of class $C^r$ for all charts $\psi \in \mathrm{atlas}(M)$.
The function $\hat\psi \circ X \circ \psi^{-1}$ may be tested for regularity in terms of a coordinate representation of the
vector field $X$. Suppose $\psi \in A_M$ and $X(p) = t_{p,\xi(\psi(p)),\psi}$ for $p \in \mathrm{Dom}(\psi)$, where $\xi : \mathrm{Range}(\psi) \to \mathbb{R}^n$
with $n = \dim(M)$. Then $\hat\psi \circ X \circ \psi^{-1}(y) = (y, \xi(y))$ for all $y \in \mathrm{Range}(\psi)$. Therefore $X$ is a $C^r$ vector field if
and only if the components $\xi$ of $X$ are of class $C^r$ for all charts $\psi \in A_M$.
From the formula $\hat\psi \circ X \circ \psi^{-1}(y) = (y, \xi(y))$, it follows that $\hat\psi \circ X \circ \psi^{-1} = \mathrm{id}_{\mathrm{Range}(\psi)} \times \xi$, and therefore
$\pi_2 \circ \hat\psi \circ X \circ \psi^{-1} = \xi$, where $\pi_2 : \mathbb{R}^{2n} \to \mathbb{R}^n$ is the projection map $\pi_2 : (x_1, \ldots x_{2n}) \mapsto (x_{n+1}, \ldots x_{2n})$.
28.5.5 Definition: The component function of a vector field $X \in X(M)$ on a $C^1$ $n$-dimensional manifold
$M$ for a chart $\psi \in \mathrm{atlas}(M)$ is the function $\xi : \mathrm{Range}(\psi) \to \mathbb{R}^n$ defined by $\xi = \pi_2 \circ \hat\psi \circ X \circ \psi^{-1}$, where
$\hat\psi \in \mathrm{atlas}(T(M))$ is the chart for $T(M)$ corresponding to $\psi$ and $\pi_2 : \mathbb{R}^{2n} \to \mathbb{R}^n$ is the projection map
$\pi_2 : (x_1, \ldots x_{2n}) \mapsto (x_{n+1}, \ldots x_{2n})$.
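As a concrete instance of the component function, the following sketch assumes $M = \mathbb{R}^2$ with the identity chart $\psi$ and takes $X$ to be the rotation field with components $\xi(y) = (-y^2, y^1)$. These choices are illustrative only.

```latex
% Assumptions: M = R^2, \psi = identity chart, X(p) = t_{p,\xi(\psi(p)),\psi}
% with \xi(y) = (-y^2, y^1) (superscripts are coordinate indices, not powers).
\[
  \hat\psi \circ X \circ \psi^{-1}(y) = (y, \xi(y)) = (y^1,\ y^2,\ -y^2,\ y^1) \in \mathbb{R}^4 ,
\]
\[
  \pi_2 \circ \hat\psi \circ X \circ \psi^{-1}(y) = (-y^2,\ y^1) = \xi(y) .
\]
% The map \xi is linear, hence of class C^r for every r, so by the criterion in
% terms of component functions X is a C^r vector field with respect to this chart.
```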
28.5.6 Remark: Notation 28.5.7 seems to be slightly non-standard. Some authors (e.g. Malliavin [36],
7.4.1, page 69, Gallot/Hulin/Lafontaine [20], 1.38, page 18, Darling [14], 7.1.1, page 144) use notations such
as ΓT (M ), Γ(T M ) or ΓT M instead of X(M ). Crampin/Pirani [12], page 252, uses a notation like (M ).
EDM2 [35], section 105.M, and Kobayashi/Nomizu [27], page 5, use the notation (M ).
28.5.7 Notation: $X(T(M))$ for a $C^1$ manifold $M$ denotes the set of vector fields on $M$.
$X^r(T(M))$ for a $C^{r+1}$ manifold $M$ for $r \in \bar{\mathbb{Z}}^+_0$ denotes the set of $C^r$ vector fields on $M$.
$X(M)$ is an abbreviation for $X(T(M))$, and $X^r(M)$ is an abbreviation for $X^r(T(M))$.

28.5.8 Definition: The linear space of $C^r$ vector fields on a $C^{r+1}$ manifold $M$ for $r \in \bar{\mathbb{Z}}^+_0$ is the
tuple $X^r(M)$ −< $(\mathbb{R}, X^r(M), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_{X^r(M)}, \mu)$, where $\mathbb{R}$ −< $(\mathbb{R}, \sigma_{\mathbb{R}}, \tau_{\mathbb{R}})$ is the usual field of real numbers,
$\sigma_{X^r(M)} : X^r(M) \times X^r(M) \to X^r(M)$ is the operation of pointwise addition on $X^r(M)$ (using the linear
space addition of $T_p(M)$ for all $p \in M$), and $\mu : \mathbb{R} \times X^r(M) \to X^r(M)$ is the pointwise product operation
of $\mathbb{R}$ on $X^r(M)$ (also using the linear space structure of $T_p(M)$ for $p \in M$).
28.5.9 Remark: The linear space of C r vector fields in Definition 28.5.8 can be defined also for general
vector fields in X(T (M )), although this is not often useful.
28.5.10 Notation: DX for a vector field X ∈ X(T (M )) for a C 1 manifold M denotes the map DX :
C 1 (M ) → (M → IR) defined by (DX f )(p) = DX(p) f for all f ∈ C 1 (M ) and p ∈ M .
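A short worked example may help to show how $D_X$ acts on functions. The sketch assumes $M = \mathbb{R}^2$ with the identity chart, the rotation field $X$ with components $\xi(x) = (-x^2, x^1)$, and the usual coordinate formula for the directional derivative $D_V f$; all of these are assumptions made for illustration.

```latex
% Assumptions: M = R^2 with the identity chart; X has components \xi(x) = (-x^2, x^1);
% D_V f = \sum_i v^i \partial_i f(p) for V = t_{p,v,\psi} (superscripts on x are indices).
\[
  (D_X f)(p) = D_{X(p)} f = \sum_{i=1}^{2} \xi^i(p) \, \frac{\partial f}{\partial x^i}(p)
             = -x^2 \, \frac{\partial f}{\partial x^1} + x^1 \, \frac{\partial f}{\partial x^2} .
\]
% For f(x) = (x^1)^2 + (x^2)^2, which is constant on circles about the origin:
\[
  (D_X f)(p) = -x^2 \cdot 2 x^1 + x^1 \cdot 2 x^2 = 0 .
\]
% For the coordinate function f(x) = x^1:
\[
  (D_X f)(p) = -x^2 .
\]
```

The result $D_X f$ is again a real-valued function on $M$, which is why the target of $D_X$ is written as $M \to \mathbb{R}$.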
28.5.11 Theorem: For all $r \in \mathbb{Z}^+_0$, the space $X^r(M)$ of $C^r$ vector fields on a $C^{r+1}$ manifold $M$ is a unitary
left module over the ring $C^{r+1}(M)$ with the pointwise product $(f.X)_p = f(p) X_p$.
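The module axioms can be checked pointwise. The following is a sketch of that verification, assuming that addition of vector fields and of functions is pointwise and that $1$ denotes the constant function with value one; it is an outline, not the proof from the text.

```latex
% Assumptions: f, g \in C^{r+1}(M), X, Y \in X^r(M), all operations pointwise,
% and each T_p(M) is a real linear space.
\begin{align*}
  (f.(X + Y))_p   &= f(p)(X_p + Y_p) = f(p) X_p + f(p) Y_p = (f.X + f.Y)_p , \\
  ((f + g).X)_p   &= (f(p) + g(p)) X_p = f(p) X_p + g(p) X_p = (f.X + g.X)_p , \\
  ((f g).X)_p     &= f(p) g(p) X_p = f(p) (g.X)_p = (f.(g.X))_p , \\
  (1.X)_p         &= 1 \cdot X_p = X_p .
\end{align*}
% Closure: in a chart \psi, the component function of f.X is (f \circ \psi^{-1}) \xi,
% a product of a C^{r+1} function and a C^r function, hence of class C^r.
```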
28.5.12 Definition: The coordinate basis vector fields of a $C^1$ manifold $M$ with respect to a $C^1$ chart $\psi$
for $M$ are the maps $e^\psi_i : \mathrm{Dom}(\psi) \to T(M)$ defined for $i = 1, \ldots n = \dim(M)$ by
$$ \forall p \in \mathrm{Dom}(\psi), \quad e^\psi_i(p) = e^{p,\psi}_i . $$
28.5.13 Remark: See Definition 27.7.4 for the chart basis vectors $e^{p,\psi}_i = t_{p,v_i,\psi}$ referred to in
Definition 28.5.12.
28.5.14 Theorem: The coordinate basis vector fields of a $C^{k+1}$ manifold for a $C^{k+1}$ chart for $k \in \bar{\mathbb{Z}}^+_0$ are
of class $C^k$.
[ Around about here, should define n-frame fields, and then define the field of coordinate bases. This particular
n-frame field is apparently called the Gauß frame with respect to a given chart. ]
28.6. Tangent operator fields

Many texts use the same notation for vector fields X and the action DX of vector fields. Since this text is
being more careful about such definitions, a separate set of notations is defined in this section for tangent
operator fields.
28.6.1 Remark: In Notations 28.5.7 and 28.6.3, X(T (M )) and X(T̂ (M )) are spaces of cross-sections of
the corresponding tangent fibrations T (M ) and T̂ (M ), but X(T̊ (M )) in Notation 28.6.2 is not a space of
cross-sections because the set T̊ (M ) of untagged tangent operators is not a fibration. However, all of the
spaces in Notations 28.6.2 and 28.6.3 may be regarded as linear spaces with respect to pointwise vector
addition and scalar multiplication similarly to Definition 28.5.8.
28.6.2 Notation: $X(\mathring{T}(M))$ for a $C^1$ manifold $M$ denotes the set of maps $X : M \to \mathring{T}(M)$ such that
$X(p) \in \mathring{T}_p(M)$ for all $p \in M$.
$X^r(\mathring{T}(M))$ for a $C^{r+1}$ manifold $M$ with $r \in \bar{\mathbb{Z}}^+$ denotes the set $\{X \in X(\mathring{T}(M));\ X \text{ is } C^r\}$ of $C^r$ tangent
operator fields in $X(\mathring{T}(M))$.
$\mathring{X}(M)$ is an abbreviation for $X(\mathring{T}(M))$, and $\mathring{X}^r(M)$ is an abbreviation for $X^r(\mathring{T}(M))$.
28.6.3 Notation: $X(\hat{T}(M))$ for a $C^1$ manifold $M$ denotes the set of maps $X : M \to \hat{T}(M)$ such that
$X(p) \in \hat{T}_p(M)$ for all $p \in M$.
$X^r(\hat{T}(M))$ for a $C^{r+1}$ manifold $M$ with $r \in \bar{\mathbb{Z}}^+$ denotes the set $\{X \in X(\hat{T}(M));\ X \text{ is } C^r\}$ of $C^r$ tagged
tangent operator fields in $X(\hat{T}(M))$.
$\hat{X}(M)$ is an abbreviation for $X(\hat{T}(M))$, and $\hat{X}^r(M)$ is an abbreviation for $X^r(\hat{T}(M))$.
28.6.4 Definition: The coordinate basis operator fields in a subset $U$ of a $C^1$ manifold $M$ with respect to
a chart $\psi$ are the maps $\partial^\psi_i : \mathrm{Dom}(\psi) \to T(M)$ defined by
$$ \partial^\psi_i(p) = \partial^{p,\psi}_i $$
for all $p \in \mathrm{Dom}(\psi)$. (See Definition 27.7.11 for the chart basis operators $\partial^{p,\psi}_i$.)

28.6.5 Theorem: The coordinate basis operator fields are C 0 vector fields.
[ The following definition is probably a trivial application of the definition of the chart on the tangent space. ]
[ Use the charts ψ̂ in Definition 28.6.6. ]
28.6.6 Definition: The component function (or the set of components) of a vector field $X$ on a $C^1$ manifold
$M$ with respect to the chart $\psi \in \mathrm{atlas}(M)$ is the function from $\mathrm{Dom}(\psi)$ to $\mathbb{R}^n$ which maps $p \in \mathrm{Dom}(\psi)$ to
$\xi_\psi(p) = \phi_\psi(p, X_p)$, so that
$$ D_{X(p)} = \sum_{i=1}^n (\phi_\psi(p, X_p))^i \, \partial^{p,\psi}_i . $$
28.6.7 Theorem: Let $r \in \bar{\mathbb{Z}}^+_0$. Then a vector field $X$ in a $C^r$ manifold $M$ is a $C^r$ vector field if and only
if for all charts $\psi$ of $M$ the component function $\xi_\psi$ satisfies $\xi_\psi \in C^r(\mathrm{Dom}(\psi), \mathbb{R}^n)$. That is,
$$ X \in C^r(M, T(M)) \ \Leftrightarrow\ \forall \psi \in \mathrm{atlas}(M),\ \xi_\psi \in C^r(\mathrm{Dom}(\psi), \mathbb{R}^n). $$
28.7. Tensor fields

28.7.1 Definition: A tensor field of type $(r, s)$ in a subset $S$ of a $C^1$ manifold $M$ for $r, s \in \mathbb{Z}^+_0$ is a map
$Y : S \to T^{r,s}(M)$ such that $Y(p) \in T_p^{r,s}(M)$ for all $p \in S$.
28.7.2 Definition: A $C^k$ differentiable tensor field of type $(r, s)$ in an open subset $\Omega$ of a $C^{k+1}$ manifold
$M$ for $r, s \in \mathbb{Z}^+_0$ is a tensor field $Y$ of type $(r, s)$ in $\Omega$ such that $Y : \Omega \to T^{r,s}(M)$ is of class $C^k$ with respect
to the $C^k$ differentiable structures on $M$ and $T^{r,s}(M)$.
28.7.3 Notation: $X^{r,s}_k(\Omega)$ will denote the set of $C^k$ tensor fields of type $(r, s)$ in the open subset $\Omega$ of a
$C^{k+1}$ manifold $M$.
[ Define “naive” differentials in $T(T^{r,s}(M))$ of tensor fields in $X^{r,s}_k(M)$. ]
28.8. Vector fields and tensor fields along curves
A vector field along a curve is defined on the domain of the curve, not on the range. This is convenient in
the case of non-self-intersecting curves, but essential for curves which do have self-intersections. The phrase
“along a curve” is used rather than “on a curve” because vector fields are defined on the parameter interval
of the curve, not on the image set.
Some important vector fields along curves in differentiable manifolds are the velocity field of a curve (the
tangent to the curve at each point) and transversal vector fields induced by embedding the curve in a family
of curves.
28.8.1 Definition: A vector field along a curve γ : I → M in a C 1 manifold M is a function Y : I → T (M )
such that ∀t ∈ I, Y (t) ∈ Tγ(t) (M ).
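The need to define the field on the parameter interval can be seen in a self-intersecting curve. The sketch below assumes $M = \mathbb{R}^2$ with the identity chart $\psi$ and uses a figure-eight curve chosen only for illustration.

```latex
% Assumptions: M = R^2 with the identity chart \psi; the curve is a figure-eight,
% and Y is its velocity field written as a tangent-vector triple.
\[
  \gamma(t) = (\sin t,\ \sin 2t), \qquad
  \gamma'(t) = (\cos t,\ 2\cos 2t), \qquad
  Y(t) = t_{\gamma(t),\,\gamma'(t),\,\psi} \in T_{\gamma(t)}(M) .
\]
% The curve passes through the origin twice, at t = 0 and at t = \pi:
\[
  \gamma(0) = \gamma(\pi) = (0, 0), \qquad
  \gamma'(0) = (1,\ 2), \qquad
  \gamma'(\pi) = (-1,\ 2) .
\]
% So Y(0) and Y(\pi) are different vectors in the same tangent space T_{(0,0)}(M).
% No function defined on the image set \gamma(I) could take both values at the
% origin, whereas Y : I \to T(M) has no such difficulty.
```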
28.8.2 Definition: A continuous vector field along a curve γ : I → M in a C 1 manifold M is a vector
field Y : I → T (M ) along γ such that Y is a continuous function with respect to the usual topology on
intervals I ⊆ IR and the standard topology on T (M ).
28.8.3 Remark: Differentiable vector fields can be defined along differentiable curves. See Definition 26.7.1
for differentiable curves. Definition 28.8.4 uses the standard C r differentiable structure on the total tangent
space T (M ) of a C r+1 manifold M .
28.8.4 Definition: A $C^r$ (differentiable) vector field along an open $C^r$ curve $\gamma : I \to M$ in a $C^{r+1}$
manifold $M$ for $r \in \bar{\mathbb{Z}}^+_0$ is a vector field $Y : I \to T(M)$ along $\gamma$ such that $Y$ is of class $C^r$ with respect
to the usual differentiable structure on open intervals $I \subseteq \mathbb{R}$ and the standard $C^r$ differentiable structure
on $T(M)$.
[ Must define C r vector fields on manifolds for r ≥ 2, and then specialize to vector fields along curves. ]
[ Define vector fields on families of curves. ]

28.9. Differential forms

[ See EDM 108.Q, Malliavin, Section I.7.5, page 71. ]
28.9.1 Definition: A differential form of degree $m$ with coefficients in a linear space $W$ on a subset $N$
of a $C^\infty$ manifold $M$ is a function $\omega : N \to \bigcup_{p \in M} \Lambda_m(T_p(M), W)$ such that $\omega(p) \in \Lambda_m(T_p(M), W)$ for all
$p \in N$, and. . .
A differential form of degree m on a subset N of a C ∞ manifold M is a differential form of degree m with
coefficients in IR on N . That is, if the space of coefficients is not mentioned, then it is implicitly IR.
28.9.2 Notation: The set of differential forms of degree m with coefficients in W on a subset N of a C ∞
manifold M , together with the operations of pointwise addition and pointwise multiplication by elements of
the field of W , may be denoted by Λm (N, W ). [ Must check that this notation is okay. ]
[ Also need to deal with the case that W is an (alternating) algebra, so that the space of differential forms
can be made into an (alternating) algebra. ]
[ Additionally must deal with the spaces of C r differential forms. ]


Chapter 29
Higher-order tangent vectors
29.1 Higher-order tangent operators
29.2 Tensorization coefficients for second-order tangent operators
29.3 Higher-order tangent vectors
29.4 Higher-order tangent spaces
29.5 Drop functions for second-level tangent vectors
29.6 Elliptic second-order operators
29.7 Higher-order vector fields
29.8 Higher-order vector fields for families of curves
Very roughly speaking, first-order derivatives are of interest to geometers whereas second-order derivatives
are of interest to physicists. First-order operators are related to the geometry of a manifold, whereas second-
order operators are related to forces acting on fields.
Higher-order tangent vectors are extensions of ordinary tangent vectors to represent higher-order derivatives.
These should be distinguished from higher-order differentials of functions and maps. Higher-order tangent
vectors are dealt with separately in their own chapter because they are not commonly dealt with in differential
geometry textbooks at all.
In the same way that tangent operators are defined in Section 27.5 (Definition 27.5.1) as first-order derivatives
of test functions, it is also possible to define higher-order operators. The transformation rules for these
operators then yield similar coordinate-based tangent vector definitions.
29.0.1 Remark: It is difficult to assign a meaning to second-order tangent vectors in the sense of the
interpretation of first-order tangent vectors as geometric infinitesimal vectors. The difficulty of geometric
interpretation is perhaps one reason why it is rarely dealt with in differential geometry texts, although
second-order operators are of the greatest importance in physics.
29.0.2 Remark: It is noteworthy that higher-order tangent vectors do not require a metric, a connection, or
even the second tangent space T (T (M )). Second-order tangent vectors have chart transition rules which fulfil
the requirements of symmetry and transitivity for a well-defined differential geometric object because they
are based on second-order tangent operators, which themselves are defined as maps from C 2 (M ) functions
to IR. Although the Laplacian operator uses a metric for its definition, it is in fact a second-order tangent
operator. This may seem a little paradoxical. But the Laplacian needs the metric for the choice of its
coefficients, but it resides in a space which does not require a metric for its construction.
29.0.3 Remark: The reason for normally not defining second-order tangent operators becomes clear when
one asks what the second derivative of a real-valued function is in a particular direction V ∈ Tp (M ) in the
tangent space at a point. The first-order derivative is chart-independent (because the chart transition rules
for the vector V make the vector “follow” the chart transition map). But the second-order derivative is
chart-dependent. One way to deal with this is to define a new class of second-order tangent objects which
“follow” the chart transition maps. This is not usually done because it does not correspond to physical
intuition. We like to think of space as being essentially Euclidean, Cartesian, Galilean, Newtonian or even
Lorentzian. Physicists do not want to be bothered with second-order transformation rules. The first-order

rules are burdensome enough already. There is nothing to intuitively identify with locally parabolic-looking
curves in space, whereas locally linear-looking curves may be identified with velocities and forces.
We are accustomed to dealing with Galilean transformations of coordinates every day. Galilean relativity
is part of ordinary life. So we find it easy to accept the need for such transformations in physics. An
object which transforms according to a translation combined with a rotation seems intuitively real to us.
But everyday life does not usually require us to accept curved, accelerating or rotating coordinates on equal
terms with affine static coordinate frames. If that were part of everyday life, then differential geometry
could have been formulated in such terms. We need to remove second-order derivatives from coordinate
transformation rules in differential geometry simply because we are not accustomed to them. There is no
a-priori necessity to reject mathematical objects which transform according to higher-order derivatives of
the chart transition maps.
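The chart dependence of a naive second-order directional derivative can be seen already in one dimension. The following sketch assumes $M = (0, \infty)$ with the two charts $\psi_1(q) = q$ and $\psi_2(q) = q^2$, the test function $f(q) = q^2$, and the point $p = 1$. The "naive" second derivative $v^2 (f \circ \psi^{-1})''$ is a construction used here only for illustration.

```latex
% Assumptions: M = (0, \infty), \psi_1(q) = q, \psi_2(q) = q^2, f(q) = q^2, p = 1.
% A tangent vector V at p has component v = 1 in chart \psi_1, hence component
% \tilde v = (\psi_2 \circ \psi_1^{-1})'(1) \, v = 2 in chart \psi_2.
\[
  (f \circ \psi_1^{-1})(x) = x^2 , \qquad (f \circ \psi_2^{-1})(\tilde x) = \tilde x .
\]
% First-order derivative in the direction V: the two charts agree.
\[
  v \, (f \circ \psi_1^{-1})'(1) = 1 \cdot 2 = 2 , \qquad
  \tilde v \, (f \circ \psi_2^{-1})'(1) = 2 \cdot 1 = 2 .
\]
% Naive second-order derivative in the direction V: the two charts disagree.
\[
  v^2 \, (f \circ \psi_1^{-1})''(1) = 1 \cdot 2 = 2 , \qquad
  \tilde v^2 \, (f \circ \psi_2^{-1})''(1) = 4 \cdot 0 = 0 .
\]
```

The discrepancy comes from the second derivative of the transition map $x \mapsto x^2$, which is exactly the kind of term that a second-order transformation rule has to carry.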
29.0.4 Remark: Second-order operators are so quintessential to physics that they must be dealt with
somehow. In practice, the second-order terms in diffeomorphisms are dealt with by making them disappear
by effectively normalizing the coordinates at each point before doing calculations. The use of covariant
derivatives and Christoffel symbols provides compensation terms which always bring back second-order
calculations to a standard “uncurved” chart. If the chart is chosen to make these compensation mechanisms
disappear, such a chart is called “normal coordinates”. This means that space is assumed to have a special
class of charts at each point which are somehow physically different to the others. Relativity in physics does
not mean relativity with respect to all diffeomorphisms. There are, in effect, “grooves” in space which define
parallel translation. These geodesic curves must be used in order to define second-order derivatives.
The second-order derivative of a real-valued function in a direction V can be defined in a chart-dependent
manner by only calculating the second-order derivative in a special subset of all charts, and then using
tensorization terms (such as Christoffel symbols) to convert the calculation in general charts to the value in
a special subset of normalized charts. A reference chart for such a second-order derivative definition could
be thought of as an “anchor chart”.
In this chapter, connections are not yet defined. Therefore the “grooves in space” which enable chart-
independent second-order derivatives to be calculated are not available. So the values of second-order
derivatives are chart-dependent. However, this does not mean that they are ill-defined. They just have
burdensome transformation rules.
29.0.5 Remark: Elliptic second-order boundary value problems in flat space are essentially identical to
the same analysis on a single patch of a differentiable manifold. The arbitrariness of the chart implies that
the same problem may be attacked in a wide variety of coordinate systems. It may be that a BVP is classified
as linear or semilinear in some choices of coordinate system but not in others. The ellipticity property for
second-order equations is invariant under diffeomorphisms. Regularity classes are also essentially unchanged
by smooth enough diffeomorphisms. Therefore the existence, uniqueness and regularity theory for elliptic
second-order equations should be portable from flat space. When more than one coordinate patch is required,
the analysis of elliptic BVPs starts to look different to the situation in flat space.
29.0.6 Remark: The analysis of partial differential equations is significantly different in a differentiable
manifold (by contrast to flat space) if convexity properties are studied because it is generally not possible
to choose a coordinate system in which the geodesics are all straight lines. Similarly, a difference arises if
one wishes to define operators such as the Laplacian, which depends strongly on the metric structure of a
manifold.
The Laplacian needs not only chart-independent second derivatives, as all chart-independent second-order
operators do, but also chart-independent distance and angle specifications so that the second-order derivatives
may be correctly scaled and orthogonalized. (In other words, orthonormal coordinates are required at each
point.) To achieve this, the metric tensor is required.
Although one of the main aims of this book is to present differential geometry in such a way that geometric
properties of solutions of partial differential equations can be studied, a good starting point is to study how
much analysis can be done in the total absence of a connection or metric structure.

29.1. Higher-order tangent operators

Higher-order tangent vectors are an abstraction of higher-order tangent operators. So the operators are
defined first in order to determine the transformation rules.
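29.1.1 Definition: A second-order tangent operator on an $n$-dimensional $C^2$ manifold $M$ is any function
$\partial^{[2]}_{p,a,b,\psi} : C^2(M) \to \mathbb{R}$ defined for $p \in M$, $a \in \mathrm{Sym}(n, \mathbb{R})$, $b \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$ by
$$ \forall p \in M,\ \forall a \in \mathrm{Sym}(n, \mathbb{R}),\ \forall b \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\ \forall f \in C^2(M), $$
$$ \partial^{[2]}_{p,a,b,\psi}(f) = \sum_{i,j=1}^n a^{ij} \frac{\partial^2}{\partial x^i \partial x^j} (f \circ \psi^{-1})(x) \Big|_{x=\psi(p)}
 + \sum_{i=1}^n b^i \frac{\partial}{\partial x^i} (f \circ \psi^{-1})(x) \Big|_{x=\psi(p)} . $$
The pair $(a, b)$ is called the component pair for, or the components of, the second-order tangent operator
$\partial^{[2]}_{p,a,b,\psi}$ with respect to the chart $\psi$ at $p$.
The tuple $(p, a, b, \psi)$ is called the coefficient tuple for the second-order tangent operator $\partial^{[2]}_{p,a,b,\psi}$.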
29.1.2 Notation: $\mathring{T}^{[2]}_p(M)$ denotes the set of all second-order tangent operators $\partial^{[2]}_{p,a,b,\psi}$ at a fixed point $p \in M$ in Definition 29.1.1.
$\mathring{T}^{[2]}(M) = \bigcup_{p \in M} \mathring{T}^{[2]}_p(M)$ denotes the set of all second-order tangent operators $\partial^{[2]}_{p,a,b,\psi}$ in Definition 29.1.1.
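29.1.3 Theorem: Second-order tangent operators $\partial^{[2]}_{p_1,a_1,b_1,\psi_1}$ and $\partial^{[2]}_{p_2,a_2,b_2,\psi_2}$ on an n-dimensional $C^2$ manifold $M$ are equal if $p_1 = p_2 = p$ and
$$\forall i, j = 1 \ldots n, \quad a_2^{ij} = \sum_{k,\ell=1}^n \frac{\partial}{\partial x^k} (\psi_2 \circ \psi_1^{-1}(x))^i \Big|_{x=\psi_1(p)} \frac{\partial}{\partial x^\ell} (\psi_2 \circ \psi_1^{-1}(x))^j \Big|_{x=\psi_1(p)} a_1^{k\ell}, \tag{29.1.1}$$
$$\forall i = 1 \ldots n, \quad b_2^i = \sum_{k,\ell=1}^n \frac{\partial^2}{\partial x^k \partial x^\ell} (\psi_2 \circ \psi_1^{-1}(x))^i \Big|_{x=\psi_1(p)} a_1^{k\ell} + \sum_{k=1}^n \frac{\partial}{\partial x^k} (\psi_2 \circ \psi_1^{-1}(x))^i \Big|_{x=\psi_1(p)} b_1^k. \tag{29.1.2}$$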
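Proof: Suppose that $p_1 = p_2 \in M$. Then for all $f \in C^2(M)$,
$$\begin{aligned}
\partial^{[2]}_{p_1,a_1,b_1,\psi_1}(f)
&= \sum_{i,j=1}^n a_1^{ij} \frac{\partial^2}{\partial x^i \partial x^j} (f \circ \psi_1^{-1})(x) \Big|_{x=\psi_1(p_1)}
 + \sum_{i=1}^n b_1^i \frac{\partial}{\partial x^i} (f \circ \psi_1^{-1})(x) \Big|_{x=\psi_1(p_1)} \\
&= \sum_{i,j,k,\ell=1}^n a_1^{ij} \frac{\partial^2}{\partial \tilde{x}^k \partial \tilde{x}^\ell} (f \circ \psi_2^{-1})(\tilde{x}) \Big|_{\tilde{x}=\psi_2(p_2)}
 \frac{\partial}{\partial x^i} (\psi_2^k \circ \psi_1^{-1})(x) \Big|_{x=\psi_1(p_1)}
 \frac{\partial}{\partial x^j} (\psi_2^\ell \circ \psi_1^{-1})(x) \Big|_{x=\psi_1(p_1)} \\
&\quad + \sum_{i,j,k=1}^n a_1^{ij} \frac{\partial}{\partial \tilde{x}^k} (f \circ \psi_2^{-1})(\tilde{x}) \Big|_{\tilde{x}=\psi_2(p_2)}
 \frac{\partial^2}{\partial x^i \partial x^j} (\psi_2^k \circ \psi_1^{-1})(x) \Big|_{x=\psi_1(p_1)} \\
&\quad + \sum_{i,k=1}^n b_1^i \frac{\partial}{\partial \tilde{x}^k} (f \circ \psi_2^{-1})(\tilde{x}) \Big|_{\tilde{x}=\psi_2(p_2)}
 \frac{\partial}{\partial x^i} (\psi_2^k \circ \psi_1^{-1})(x) \Big|_{x=\psi_1(p_1)} \\
&= \sum_{k,\ell=1}^n a_2^{k\ell} \frac{\partial^2}{\partial \tilde{x}^k \partial \tilde{x}^\ell} (f \circ \psi_2^{-1})(\tilde{x}) \Big|_{\tilde{x}=\psi_2(p_2)}
 + \sum_{k=1}^n b_2^k \frac{\partial}{\partial \tilde{x}^k} (f \circ \psi_2^{-1})(\tilde{x}) \Big|_{\tilde{x}=\psi_2(p_2)} \\
&= \partial^{[2]}_{p_2,a_2,b_2,\psi_2}(f),
\end{aligned}$$
where $a_2$ and $b_2$ are as in equations (29.1.1) and (29.1.2).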
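The chain-rule computation in this proof can be checked numerically in the simplest case n = 1. The following is only a sketch under stated assumptions, not part of the original argument: the manifold is the real line, the transition map is taken to be φ = ψ2 ∘ ψ1⁻¹ = exp, the test function satisfies f ∘ ψ2⁻¹ = sin, and all function names are illustrative.

```python
import math

# Assumed illustrative setup (not from the text): n = 1, transition map
# phi = psi2 o psi1^{-1} = exp, and test function g2 = f o psi2^{-1} = sin.
def phi(x):   return math.exp(x)
def dphi(x):  return math.exp(x)      # first derivative of phi
def d2phi(x): return math.exp(x)      # second derivative of phi

def dg2(y):  return math.cos(y)       # first derivative of f in chart psi2
def d2g2(y): return -math.sin(y)      # second derivative of f in chart psi2

# f in chart psi1 is g1 = g2 o phi; its derivatives follow from the chain rule.
def dg1(x):
    return dg2(phi(x)) * dphi(x)

def d2g1(x):
    return d2g2(phi(x)) * dphi(x) ** 2 + dg2(phi(x)) * d2phi(x)

def transform(a1, b1, x):
    """Equations (29.1.1) and (29.1.2) for n = 1 at the chart point x = psi1(p)."""
    a2 = dphi(x) ** 2 * a1
    b2 = d2phi(x) * a1 + dphi(x) * b1
    return a2, b2

def operator_psi1(a1, b1, x):
    """The second-order tangent operator applied to f, computed in chart psi1."""
    return a1 * d2g1(x) + b1 * dg1(x)

def operator_psi2(a2, b2, y):
    """The same operator applied to f, computed in chart psi2."""
    return a2 * d2g2(y) + b2 * dg2(y)

x0, a1, b1 = 0.3, 2.0, -1.5           # arbitrary sample point and components
a2, b2 = transform(a1, b1, x0)
lhs = operator_psi1(a1, b1, x0)
rhs = operator_psi2(a2, b2, phi(x0))
print(abs(lhs - rhs) < 1e-12)         # the two chart computations should agree
```

The two values agree up to floating-point rounding because the second-derivative term of the transition map is absorbed into b2, exactly as in equation (29.1.2).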
29.1.4 Remark: To show the converse of Theorem 29.1.3 is not so easy. It must be shown first that $p_1 = p_2$ and $v_2^i = v_1^j \partial_j (\psi_2^i \circ \psi_1^{-1})$ for all $i = 1, \ldots, n$, and then that the symmetric parts of the second-order coefficients are the same. This requires the use of test functions.

29.1.5 Remark: The set of operators $\partial^{[2]}_{p,0,b,\psi}$ is a closed subset of the set of operators $\partial^{[2]}_{p,a,b,\psi}$ in $\mathring{T}^{[2]}_p(M)$ under chart transitions. The set of tangent operators $\partial^{[2]}_{p,a,0,\psi}$ is not closed.
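Both claims can be seen in a one-dimensional instance of equations (29.1.1) and (29.1.2). The transition map φ(x) = x² and the chart point x = 1 below are assumptions chosen only for illustration.

```latex
% Illustrative assumption: n = 1, \phi = \psi_2 \circ \psi_1^{-1}, \phi(x) = x^2, \psi_1(p) = 1.
\phi'(1) = 2, \qquad \phi''(1) = 2.
% A pure first-order operator stays first-order:
(a_1, b_1) = (0, 1) \;\longmapsto\;
(a_2, b_2) = \bigl(\phi'(1)^2 \cdot 0,\; \phi''(1) \cdot 0 + \phi'(1) \cdot 1\bigr) = (0, 2).
% A pure second-order operator acquires a first-order part:
(a_1, b_1) = (1, 0) \;\longmapsto\;
(a_2, b_2) = \bigl(\phi'(1)^2 \cdot 1,\; \phi''(1) \cdot 1 + \phi'(1) \cdot 0\bigr) = (4, 2).
```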
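29.1.6 Remark: More colloquially, the second-order tangent operator $\partial^{[2]}_{p,a,b,\psi}$ may be written as
$$\partial^{[2]}_{p,a,b,\psi} = \sum_{i,j=1}^n a^{ij} \frac{\partial^2}{\partial x^i \partial x^j} \Big|_{x=\psi(p)} + \sum_{i=1}^n b^i \frac{\partial}{\partial x^i} \Big|_{x=\psi(p)}
\quad\text{or}\quad
\sum_{i,j=1}^n a^{ij} \frac{\partial^2}{\partial \psi^i \partial \psi^j}(p) + \sum_{i=1}^n b^i \frac{\partial}{\partial \psi^i}(p),$$
or just
$$\partial^{[2]}_{p,a,b,\psi} = a^{ij} \frac{\partial^2}{\partial x^i \partial x^j} + b^i \frac{\partial}{\partial x^i}
\quad\text{or even}\quad
\partial^{[2]}_{p,a,b,\psi} = a^{ij} \partial_{ij} + b^i \partial_i.$$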
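29.1.7 Remark: Differential geometry is infested with the tedious kinds of expressions and calculations seen in Theorem 29.1.3 and its proof. One has the choice between writing the full tedious details or using ambiguous abbreviations. Generally the abbreviations, such as in Remark 29.1.6, are to be preferred, if one does not forget how to write down the full details. In abbreviated form, Theorem 29.1.3 becomes:
$$a_1^{ij} \partial_{ij} + b_1^i \partial_i = a_2^{k\ell} \tilde\partial_{k\ell} + b_2^k \tilde\partial_k$$
where
$$a_2^{ij} = \phi^i_{,k} \phi^j_{,\ell} a_1^{k\ell}$$
and
$$b_2^i = \phi^i_{,k\ell} a_1^{k\ell} + \phi^i_{,k} b_1^k.$$
Therefore
$$\begin{aligned}
a_1^{ij} \partial_{ij} + b_1^i \partial_i
&= \phi^i_{,k} \phi^j_{,\ell} a_1^{k\ell} \tilde\partial_{ij} + (\phi^i_{,k\ell} a_1^{k\ell} + \phi^i_{,k} b_1^k) \tilde\partial_i \\
&= a_1^{k\ell} (\phi^i_{,k} \phi^j_{,\ell} \tilde\partial_{ij} + \phi^i_{,k\ell} \tilde\partial_i) + b_1^k (\phi^i_{,k} \tilde\partial_i) \\
&= a_1^{ij} (\phi^k_{,i} \phi^\ell_{,j} \tilde\partial_{k\ell} + \phi^k_{,ij} \tilde\partial_k) + b_1^i (\phi^k_{,i} \tilde\partial_k).
\end{aligned} \tag{29.1.3}$$
Here $\phi = \psi_2 \circ \psi_1^{-1}$.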
[ Define “tensorial” somewhere. ]
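29.1.8 Remark: The expression (29.1.3) looks simple enough. It suggests that $\partial_{ij} = \phi^k_{,i} \phi^\ell_{,j} \tilde\partial_{k\ell} + \phi^k_{,ij} \tilde\partial_k$ and $\partial_i = \phi^k_{,i} \tilde\partial_k$, which is true and interesting. However, it is much more useful to rearrange (29.1.3) so that $a$ and $b$ transform like tensors. Thus
$$\begin{aligned}
a_1^{ij} \partial_{ij} + b_1^i \partial_i
&= (a_1^{ij} \phi^k_{,i} \phi^\ell_{,j}) (\tilde\partial_{k\ell} + \phi^m_{,rs} \tilde\phi^r_{,k} \tilde\phi^s_{,\ell} \tilde\partial_m) + (b_1^i \phi^k_{,i}) \tilde\partial_k \\
&= \tilde{a}_1^{k\ell} (\tilde\partial_{k\ell} + \phi^m_{,rs} \tilde\phi^r_{,k} \tilde\phi^s_{,\ell} \tilde\partial_m) + \tilde{b}_1^k \tilde\partial_k
\end{aligned}$$
where
$$\tilde{a}_1^{k\ell} = a_1^{ij} \phi^k_{,i} \phi^\ell_{,j}$$
and
$$\tilde{b}_1^k = b_1^i \phi^k_{,i},$$
with $\tilde\phi = \phi^{-1} = \psi_1 \circ \psi_2^{-1}$.
This gives us a nice tensorial form for the coefficients. It follows that the operators $\tilde\partial_{k\ell} + \phi^m_{,rs} \tilde\phi^r_{,k} \tilde\phi^s_{,\ell} \tilde\partial_m$ and $\tilde\partial_k$ are also tensorial. So we have constructed a tensorial kind of second-order derivative. The problem with this is that the second-order derivative operator must be calculated in terms of a single special chart (or a special subset of atlas-compatible charts).
An interesting question to ask now is how to define a second-order operator on any $C^2$ manifold so that it looks like $\tilde\partial_{k\ell} + \phi^m_{,rs} \tilde\phi^r_{,k} \tilde\phi^s_{,\ell} \tilde\partial_m$ when transformed. This question leads to Theorem 29.2.1.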
29.1.9 Definition: A tagged second-order tangent operator for a $C^2$ manifold $(M, A_M)$ is a pair $(p, \partial^{[2]}_{p,a,b,\psi})$ such that $p \in M$ and $\partial^{[2]}_{p,a,b,\psi} : C^2(M) \to \mathbb{R}$ is a second-order tangent operator at $p$.
The tuple $(p, a, b, \psi)$ is called the coefficient tuple for the tagged second-order tangent operator $(p, \partial^{[2]}_{p,a,b,\psi})$.

29.1.10 Notation: $\hat\partial^{[2]}_{p,a,b,\psi}$ for $p \in M$, $a \in \mathrm{Sym}(n, \mathbb{R})$, $b \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$ for an n-dimensional $C^2$ manifold $M$ denotes the ordered pair $(p, \partial^{[2]}_{p,a,b,\psi})$.
29.1.11 Notation: $\hat{T}^{[2]}_p(M)$ denotes the set of all tagged second-order tangent operators $(p, \partial^{[2]}_{p,a,b,\psi})$ at a fixed point $p \in M$ in Definition 29.1.9.
$\hat{T}^{[2]}(M) = \bigcup_{p \in M} \hat{T}^{[2]}_p(M)$ denotes the set of all tagged second-order tangent operators $(p, \partial^{[2]}_{p,a,b,\psi})$ in Definition 29.1.9.
[ Define higher-order tangent operator bundles near here? ]
[ Must also define higher-order operators of order greater than 2. ]
[ Show how second-order operators and vectors are related to spaces like T (T (M )), T (T ∗ (M )) and T ∗ (T ∗ (M )),
or something like that. ]
29.2. Tensorization coefficients for second-order tangent operators

[ The function spaces C 2 (M ) and C 0 (M ) in Theorem 29.2.1 should be restricted to the domain of the chart ψ. ]
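29.2.1 Theorem: For an n-dimensional $C^2$ manifold $M$ and charts $\psi \in \mathrm{atlas}(M)$, let $L_{ij}(\psi) : C^2(M) \to C^0(M)$ be the second-order operator fields on the domain of $\psi$ defined by $L_{ij}(\psi) = \partial_{ij} - \omega^k_{ij}(\psi) \partial_k$ for $i, j \in \mathbb{N}_n$. Then the matrix $L_{ij}(\psi)(f)$ transforms like the coefficients of a $T^{(0,2)}(M)$ tensor for $f \in C^2(M, \mathbb{R})$ if and only if $\omega$ satisfies
$$\omega^\ell_{ij}(\tilde\psi) = \omega^k_{rs}(\psi) \phi^r_{,i} \phi^s_{,j} \tilde\phi^\ell_{,k} + \phi^k_{,ij} \tilde\phi^\ell_{,k} \tag{29.2.1}$$
for all $\psi, \tilde\psi \in \mathrm{atlas}(M)$, where $\phi = \psi \circ \tilde\psi^{-1}$ and $\tilde\phi = \tilde\psi \circ \psi^{-1} = \phi^{-1}$.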
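Proof: It must be shown that $L_{ij}(\tilde\psi)(f) = \phi^k_{,i} \phi^\ell_{,j} L_{k\ell}(\psi)(f)$ for $i, j \in \mathbb{N}_n$, $\psi, \tilde\psi \in \mathrm{atlas}(M)$ and $f \in C^2(M)$. For $p \in \mathrm{Dom}(\psi) \cap \mathrm{Dom}(\tilde\psi)$,
$$\begin{aligned}
L_{ij}(\tilde\psi)(f)(p)
&= \frac{\partial^2 (f \circ \tilde\psi^{-1}(\tilde{x}))}{\partial \tilde{x}^i \partial \tilde{x}^j} \Big|_{\tilde{x}=\tilde\psi(p)}
 - \omega^k_{ij}(\tilde\psi)(p) \frac{\partial (f \circ \tilde\psi^{-1}(\tilde{x}))}{\partial \tilde{x}^k} \Big|_{\tilde{x}=\tilde\psi(p)} \\
&= \frac{\partial^2 (f \circ \psi^{-1} \circ \psi \circ \tilde\psi^{-1}(\tilde{x}))}{\partial \tilde{x}^i \partial \tilde{x}^j} \Big|_{\tilde{x}=\tilde\psi(p)}
 - \omega^k_{ij}(\tilde\psi)(p) \frac{\partial (f \circ \psi^{-1} \circ \psi \circ \tilde\psi^{-1}(\tilde{x}))}{\partial \tilde{x}^k} \Big|_{\tilde{x}=\tilde\psi(p)} \\
&= \phi^k_{,i} \phi^\ell_{,j} \frac{\partial^2 (f \circ \psi^{-1}(x))}{\partial x^k \partial x^\ell} \Big|_{x=\psi(p)}
 + \phi^k_{,ij} \frac{\partial (f \circ \psi^{-1}(x))}{\partial x^k} \Big|_{x=\psi(p)}
 - \omega^k_{ij}(\tilde\psi)(p) \phi^\ell_{,k} \frac{\partial (f \circ \psi^{-1}(x))}{\partial x^\ell} \Big|_{x=\psi(p)} \\
&= \phi^k_{,i} \phi^\ell_{,j} L_{k\ell}(\psi)(f)(p)
 + \phi^r_{,i} \phi^s_{,j} \omega^k_{rs}(\psi)(p) \frac{\partial (f \circ \psi^{-1}(x))}{\partial x^k} \Big|_{x=\psi(p)} \\
&\quad + \bigl( \phi^k_{,ij} - \omega^\ell_{ij}(\tilde\psi)(p) \phi^k_{,\ell} \bigr) \frac{\partial (f \circ \psi^{-1}(x))}{\partial x^k} \Big|_{x=\psi(p)}.
\end{aligned}$$
This is equal to $\phi^k_{,i} \phi^\ell_{,j} L_{k\ell}(\psi)(f)(p)$ for all $f \in C^2(M, \mathbb{R})$ if and only if
$$\phi^r_{,i} \phi^s_{,j} \omega^k_{rs}(\psi)(p) + \phi^k_{,ij} - \omega^\ell_{ij}(\tilde\psi)(p) \phi^k_{,\ell} = 0.$$
This is easily rearranged to give
$$\omega^\ell_{ij}(\tilde\psi)(p) = \phi^r_{,i} \phi^s_{,j} \tilde\phi^\ell_{,k} \omega^k_{rs}(\psi)(p) + \phi^k_{,ij} \tilde\phi^\ell_{,k},$$
which agrees with (29.2.1).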
29.2.2 Remark: Condition (29.2.2) in Definition 29.2.3 ensures that the second-order operator matrix $L_{ij}(\psi) = \partial_{ij} - \omega^k_{ij}(\psi) \partial_k$ yields the coefficients of a type (0, 2) tensor when it is applied to any real-valued function $f \in C^2(M, \mathbb{R})$. In other words, the matrix of values $L_{ij}(\psi)(f) \in C^0(\mathrm{Dom}(\psi), \mathbb{R})$ for all $f \in C^2(M, \mathbb{R})$ must satisfy
$$L_{ij}(\tilde\psi)(f) = \phi^k_{,i} \phi^\ell_{,j} L_{k\ell}(\psi)(f)$$
on $\mathrm{Dom}(\psi) \cap \mathrm{Dom}(\tilde\psi)$ for all $f \in C^2(M, \mathbb{R})$ and $\psi, \tilde\psi \in \mathrm{atlas}(M)$.

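29.2.3 Definition: Tensorization coefficients for an n-dimensional $C^2$ manifold $M$ are functions $\omega(\psi) = (\omega^k_{ij}(\psi))_{i,j,k=1}^n \in C^0(\mathrm{Dom}(\psi), \mathbb{R}^{n \times n \times n})$ for $\psi \in \mathrm{atlas}(M)$ such that
$$\forall \psi, \tilde\psi \in \mathrm{atlas}(M),\ \forall i, j, k \in \mathbb{N}_n, \quad
\omega^k_{ij}(\tilde\psi) = \tilde\phi^k_{,\ell} \phi^r_{,i} \phi^s_{,j} \omega^\ell_{rs}(\psi) + \tilde\phi^k_{,\ell} \phi^\ell_{,ij} \tag{29.2.2}$$
where $\phi = \psi \circ \tilde\psi^{-1}$ and $\tilde\phi = \tilde\psi \circ \psi^{-1} = \phi^{-1}$.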
29.2.4 Remark: Condition (29.2.2) may be written out more fully as follows.
$$\forall p \in M,\ \forall \psi, \tilde\psi \in \mathrm{atlas}_p(M),\ \forall i, j, k \in \mathbb{N}_n,$$
$$\begin{aligned}
\omega^k_{ij}(\tilde\psi)(p)
&= \tilde\phi^k_{,\ell}(p) \phi^r_{,i}(p) \phi^s_{,j}(p) \omega^\ell_{rs}(\psi)(p) + \tilde\phi^k_{,\ell}(p) \phi^\ell_{,ij}(p) \\
&= \frac{\partial}{\partial x^\ell} (\tilde\psi \circ \psi^{-1}(x))^k \Big|_{x=\psi(p)}
 \frac{\partial}{\partial x^i} (\psi \circ \tilde\psi^{-1}(x))^r \Big|_{x=\tilde\psi(p)}
 \frac{\partial}{\partial x^j} (\psi \circ \tilde\psi^{-1}(x))^s \Big|_{x=\tilde\psi(p)}
 \omega^\ell_{rs}(\psi)(p) \\
&\quad + \frac{\partial}{\partial x^\ell} (\tilde\psi \circ \psi^{-1}(x))^k \Big|_{x=\psi(p)}
 \frac{\partial^2}{\partial x^i \partial x^j} (\psi \circ \tilde\psi^{-1}(x))^\ell \Big|_{x=\tilde\psi(p)}.
\end{aligned} \tag{29.2.3}$$
The summation symbols in (29.2.3) have been omitted due to lack of space. Despite this, (29.2.3) is still considerably more difficult to write and read than (29.2.2). However, sometimes it is desirable to ensure that abbreviated notations such as those in (29.2.2) do have the intended, well-defined meaning. In particular, it often happens that there is confusion between functions which are valued in the manifold's point space and functions which are valued in the chart's range space. This is usually not too dangerous when there is only one chart in a given context, but in the presence of multiple charts, hidden errors can arise which are difficult to identify and remove.
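As a sanity check on the abbreviated rule (29.2.2), the one-dimensional case can be written out by hand. The following is a sketch, assuming n = 1 so that all indices equal 1 and are dropped; the particular transition map in the last line is an illustrative assumption.

```latex
% n = 1: \phi = \psi \circ \tilde\psi^{-1}, \tilde\phi = \phi^{-1},
% with \phi', \phi'' evaluated at \tilde{x} = \tilde\psi(p) and \tilde\phi' at x = \psi(p).
\omega(\tilde\psi) = \tilde\phi' \, (\phi')^2 \, \omega(\psi) + \tilde\phi' \, \phi''.
% Since \tilde\phi'(\psi(p)) = 1 / \phi'(\tilde\psi(p)), this reduces to
\omega(\tilde\psi) = \phi' \, \omega(\psi) + \frac{\phi''}{\phi'}.
% Illustrative assumption: \phi(\tilde{x}) = e^{\tilde{x}}, so \phi''/\phi' = 1 and
\omega(\tilde\psi)(p) = e^{\tilde\psi(p)} \, \omega(\psi)(p) + 1.
```

Even if ω(ψ) vanishes identically, the coefficient in the second chart does not, which is the inhomogeneous behaviour that distinguishes these coefficients from a tensor.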
29.2.5 Remark: The tensorization coefficients $\omega^k_{ij}(\psi)$ in Definition 29.2.3 are completely arbitrary for any fixed chart $\psi$, but then the transformation rules completely determine $\omega^k_{ij}(\tilde\psi)$ for all other charts $\tilde\psi$. The values of $\omega^k_{ij}(\psi)(p)$ are completely independent at all points $p \in M$. In particular, the values may not be continuous, bounded or even integrable.
The choice of tensorization coefficients at a point is equivalent to choosing normalized coordinates at that
point. (More precisely, it is equivalent to choosing an equivalence class of normalized coordinates. Here
“normalized” means that the tensorization coefficients vanish in normalized coordinates at the given point.)
29.2.6 Remark: Although the operators $\partial_{ij}$ yield a symmetric matrix of values $\partial_{ij} f$ when applied to any function $f \in C^2(M)$, the tensorization coefficients $\omega^k_{ij}(\psi)$ are not necessarily symmetric. Therefore the operators $L_{ij}(\tilde\psi)(f)$ in Theorem 29.2.1 do not necessarily yield a symmetric matrix. Although $\partial_{ij} f(p)$ is necessarily continuous and symmetric, $L_{ij}(\tilde\psi)(f)(p)$ may be neither continuous nor symmetric. Asymmetry of the "tensorization coefficients" turns out to be interesting enough to have its own name: "torsion".
29.2.7 Remark: The Christoffel symbol $\Gamma^k_{ij} = \frac{1}{2} g^{kl} \bigl( \partial g_{li} / \partial x^j + \partial g_{lj} / \partial x^i - \partial g_{ij} / \partial x^l \bigr)$ in Section 38.5 satisfies the requirements of Theorem 29.2.1 if $\omega^k_{ij} = \Gamma^k_{ij}$. (See Theorem 38.5.4.) Therefore second covariant derivatives which are based on the Levi-Civita connection in a Riemannian space are tensorial.
One can say more than this. Since the Christoffel symbol for the Levi-Civita connection in a Riemannian space provides a valid "tensorization term" for second-order operators, one may add any tensor of type (1, 2) to the Christoffel symbol and still have a valid tensorization term. This gives some idea of the wide range of choice available for defining an affine connection.
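The link between the transformation rule (29.2.2) and the Christoffel symbol can be illustrated on the flat plane. The sketch below is not from the original text and rests on stated assumptions: ψ is the Cartesian chart, where the flat metric has vanishing Christoffel symbols, so ω(ψ) = 0 is assumed; ψ̃ is the polar chart, so that φ(r, θ) = (r cos θ, r sin θ). The function name is hypothetical. The rule (29.2.2) then leaves only the term φ̃^k_{,ℓ} φ^ℓ_{,ij}, which can be compared with the well-known polar Christoffel symbols Γ^r_{θθ} = −r and Γ^θ_{rθ} = 1/r.

```python
import math

def tensorization_polar(r, theta):
    """Coefficients omega[k][i][j] in the polar chart obtained from (29.2.2)
    when omega vanishes identically in the Cartesian chart.

    Assumed illustrative setup: psi is the Cartesian chart on the plane and
    psi~ is the polar chart, so phi(r, theta) = (r cos theta, r sin theta).
    Index 0 means r and index 1 means theta.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Second derivatives d2phi[l][i][j] of phi^l with respect to (r, theta).
    d2phi = [
        [[0.0, -s], [-s, -r * c]],   # phi^1 = r cos(theta)
        [[0.0, c], [c, -r * s]],     # phi^2 = r sin(theta)
    ]
    # Jacobian of phi~ = phi^{-1}: dinv[k][l] = d(phi~^k)/dx^l at x = psi(p).
    dinv = [[c, s], [-s / r, c / r]]
    # omega^k_ij = phi~^k_{,l} phi^l_{,ij}; the omega(psi) term is zero here.
    return [[[sum(dinv[k][l] * d2phi[l][i][j] for l in range(2))
              for j in range(2)] for i in range(2)] for k in range(2)]

omega = tensorization_polar(2.0, 0.7)
print(omega[0][1][1])   # compare with the Christoffel symbol -r
print(omega[1][0][1])   # compare with the Christoffel symbol 1/r
```

In this flat example the coefficients produced by the chart transition alone coincide with the Levi-Civita values, consistent with the remark that the Christoffel symbol is one valid choice of tensorization coefficients.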
29.2.8 Remark: Although tensorization coefficients are apparently quite arbitrary, being defined inde-
pendently at each point of a manifold, an affine connection is not independent at each point. An affine
connection is defined as the differential of a parallelism. This constrains an affine connection to have more
properties than are imposed by equation (29.2.2) in Definition 29.2.3.
[ Determine a set of necessary and sufficient conditions for a set of tensorization coefficients to be the differential
of a parallelism. ]

[ Express the tensorization coefficients concept in terms of something more abstract to make it more similar
to connection forms and covariant derivatives. ]
29.2.9 Remark: One might well ask why there is a negative sign in the operator $L_{ij}(\psi) = \partial_{ij} - \omega^k_{ij}(\psi) \partial_k$ in Theorem 29.2.1. The sign is chosen to match the standard definition for the Christoffel symbol, but the Christoffel symbol's sign is itself chosen so that it is positive for transforming contravariant tensors and negative for transforming covariant tensors. It happens that the first-order operator $\partial_i$ is covariant. So it gets a negative sign for the coefficients when subjected to a covariant derivative.
It is a very general problem that contravariant tensors are regarded as "ordinary" while covariant tensors are regarded as "opposite" in some sense. It is too late to reverse the long course of history in differential geometry. (See also Remark 13.7.2 on this subject.) So it is necessary to simply tolerate some of the perplexing ways that signs appear in tensor calculus.
[ Near here, present the $\mathbb{R}^1$ and $\mathbb{R}^2$ versions of tensorization coefficients in explicit detail. In $\mathbb{R}^1$, there is a single term $\phi''/\phi'$, or something like that. This just happens to be the curvature of the function $\phi$. Coincidence or not? You be the judge! ]
[ Present the third-order tangent operator tensorization. ]
29.2.10 Remark: In a later chapter, differential operators $L_{a,b}$ or $t^{[2]}_{p,a,b,\psi}$ will be defined with the assistance of a connection to make the operators construct objects which have tensorially transforming coefficients. The actual tangent objects will be the same as without a connection, but the coefficients will transform tensorially because the second-order parts of the transformation rules will be incorporated into the object's construction algorithm. (This is a bit like incorporating a motion-compensation gyro unit in a camera to correct for attitude variation.)
29.3. Higher-order tangent vectors

[ The plethora of tangent object classes is a bit overwhelming. They should be summarized in a table and
dealt with in a very systematic and digestible fashion. Each class gets a bundle, a space, an object notation,
a space notation, an operator notation, a transformation rule, an atlas, and so forth. All of this should be
in a neat and tidy table with references to sections where they are defined. ]
In the following, Sym(n, IR) means the set of real symmetric n × n matrices. (See Notation 11.5.3.)
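29.3.1 Definition: A second-order tangent (component) tuple for an n-dimensional $C^2$ manifold $(M, A_M)$ is a tuple $(p, a, b, \psi) \in \bigcup_{\psi \in A_M} \bigl( \mathrm{Dom}(\psi) \times \mathrm{Sym}(n, \mathbb{R}) \times \mathbb{R}^n \times \{\psi\} \bigr)$.
A computational second-order tangent (component) tuple for an n-dimensional $C^2$ manifold $(M, A_M)$ with indexed atlas $(\psi_\alpha)_{\alpha \in I}$ is a tuple $(x, a, b, \alpha) \in \bigcup_{\alpha \in I} \bigl( \mathrm{Range}(\psi_\alpha) \times \mathrm{Sym}(n, \mathbb{R}) \times \mathbb{R}^n \times \{\alpha\} \bigr)$.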
29.3.2 Remark: The second-order tangent vector component-tuple equivalence rules in Definition 29.3.3
are based on the corresponding rules for second-order operators in Theorem 29.1.3.
Although these vector transformation rules, and probably all others, are defined to match the corresponding
rules for differential operators, it does not follow that all tangent objects are differential operators. What
does follow is that all differential operators on function spaces on manifolds yield tangent spaces which have
transformation rules in terms of chart transition maps. We use the differential operators as a quick way to
determine the transformation rules. But then the corresponding tangent spaces are general mathematical
classes which can be used for a wide variety of purposes, to represent the results of a wide variety of
constructions and calculations.
By comparison, consider ordinary vector fields X ∈ X 1 (T (M )) on manifolds. (See Section 28.5 for vector
fields.) At each point p ∈ M , the vector X(p) is an element of Tp (M ), but X(p) is not necessarily a
differential operator. Similarly, if W ∈ X 1 (T ∗ (M )) is a covector field, the covector W (p) ∈ T ∗ (M ) is not
necessarily the gradient W (p) = df (p) of some real-valued function f ∈ C 1 (M ). This would be a very strong
constraint on the field W . However, first-order operators and differentials are used to conveniently determine
the transformation rules. Then vectors like X(p) are thought of simply as indicating a direction, not an
operator. It just happens that a first-order operator has a direction, but other things have a direction too.
In the same way, all directional objects on a manifold use differential operators for their transformation rules
without actually being operators.

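29.3.3 Definition: A second-order tangent vector for an n-dimensional $C^2$ manifold $M \mathrel{-\!\!<} (M, A_M)$ is an equivalence class $[(p, a, b, \psi)]$ of tuples $(p, a, b, \psi) \in \bigcup_{\psi \in A_M} \bigl( \mathrm{Dom}(\psi) \times \mathrm{Sym}(n, \mathbb{R}) \times \mathbb{R}^n \times \{\psi\} \bigr)$, where the tuples $(p_1, a_1, b_1, \psi_1)$ and $(p_2, a_2, b_2, \psi_2)$ for $\psi_1, \psi_2 \in A_M$ are said to be equivalent whenever $p_1 = p_2 = p$ and
$$\forall i, j = 1 \ldots n, \quad a_2^{ij} = \sum_{k,\ell=1}^n \frac{\partial}{\partial x^k} (\psi_2 \circ \psi_1^{-1}(x))^i \Big|_{x=\psi_1(p)} \frac{\partial}{\partial x^\ell} (\psi_2 \circ \psi_1^{-1}(x))^j \Big|_{x=\psi_1(p)} a_1^{k\ell}, \tag{29.3.1}$$
$$\forall i = 1 \ldots n, \quad b_2^i = \sum_{k,\ell=1}^n \frac{\partial^2}{\partial x^k \partial x^\ell} (\psi_2 \circ \psi_1^{-1}(x))^i \Big|_{x=\psi_1(p)} a_1^{k\ell} + \sum_{k=1}^n \frac{\partial}{\partial x^k} (\psi_2 \circ \psi_1^{-1}(x))^i \Big|_{x=\psi_1(p)} b_1^k. \tag{29.3.2}$$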
29.3.4 Notation: $t^{[2]}_{p,a,b,\psi}$ for $p \in M$, $a \in \mathrm{Sym}(n, \mathbb{R})$, $b \in \mathbb{R}^n$ and $\psi \in \mathrm{atlas}_p(M)$ for an n-dimensional $C^2$ manifold $M$ with $n \in \mathbb{Z}^+_0$ denotes the equivalence class $[(p, a, b, \psi)]$ in Definition 29.3.3. In other words, $t^{[2]}_{p,a,b,\psi} = [(p, a, b, \psi)]$.
29.3.5 Remark: Notation 29.3.6 gives notations for sets of second-order tangent vectors. Definition 29.3.7 specifies the obvious linear structure on the pointwise set of second-order tangent vectors. Definition 29.3.10 gives the obvious tangent bundle structure for second-order tangent vectors.
29.3.6 Notation: $T^{[2]}_p(M)$ denotes the set of second-order tangent vectors $t^{[2]}_{p,a,b,\psi}$ at a point $p$ in a $C^2$ manifold $M$. That is,
$$T^{[2]}_p(M) = \bigl\{ t^{[2]}_{p,a,b,\psi} ;\ \psi \in \mathrm{atlas}_p(M),\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n \bigr\}.$$
$T^{[2]}(M) = \bigcup_{p \in M} T^{[2]}_p(M)$ denotes the set of all second-order tangent vectors for a $C^2$ manifold $M$. That is,
$$T^{[2]}(M) = \bigl\{ t^{[2]}_{p,a,b,\psi} ;\ p \in M,\ \psi \in \mathrm{atlas}_p(M),\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n \bigr\}.$$
29.3.7 Definition: The second-order tangent space at a point $p$ in a $C^2$ manifold $(M, A_M)$ is the set $T^{[2]}_p(M)$, where $n = \dim(M)$ and $t^{[2]}_{p,a,b,\psi} = [(p, a, b, \psi)]$ denotes the equivalence class of $(p, a, b, \psi)$ with respect to the equivalence relation in Definition 29.3.3, together with the linear space operations inherited from $\mathrm{Sym}(n, \mathbb{R})$ and $\mathbb{R}^n$.
29.3.8 Remark: More precisely, the second-order tangent space at $p \in M$ in Definition 29.3.7 is the tuple $(\mathbb{R}, T^{[2]}_p(M), \sigma_{\mathbb{R}}, \tau_{\mathbb{R}}, \sigma_{T^{[2]}_p(M)}, \mu)$, where $T^{[2]}_p(M)$ is as above, $\sigma_{\mathbb{R}}$ and $\tau_{\mathbb{R}}$ are the standard operations of addition and multiplication for $\mathbb{R}$, $\sigma_{T^{[2]}_p(M)} : T^{[2]}_p(M) \times T^{[2]}_p(M) \to T^{[2]}_p(M)$ is the addition operation on $T^{[2]}_p(M)$ defined by $t^{[2]}_{p,a_1,b_1,\psi} + t^{[2]}_{p,a_2,b_2,\psi} \mapsto t^{[2]}_{p,a_1+a_2,b_1+b_2,\psi}$, and $\mu : \mathbb{R} \times T^{[2]}_p(M) \to T^{[2]}_p(M)$ is the scalar multiplication operation $(\lambda, t^{[2]}_{p,a,b,\psi}) \mapsto t^{[2]}_{p,\lambda a,\lambda b,\psi}$.
29.3.9 Remark: Just as in Definition 27.7.1, the definitions of vector addition and scalar multiplication
in Definition 29.3.7 are independent of the choice of coordinates. This is because the chart transition rule in
equations (29.3.1) and (29.3.2) is linear with respect to the components a and b.
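The linearity claim can be made concrete with a small computation. The sketch below treats the rules (29.3.1) and (29.3.2) as a function of the component pair (a, b) at a single point; the arrays J and H stand for the first and second derivatives of the transition map ψ2 ∘ ψ1⁻¹ there. Their numerical values, like the function names, are hypothetical and chosen as integers so that the arithmetic is exact.

```python
def transform(a, b, J, H):
    """Apply the rules (29.3.1) and (29.3.2) at one point.

    J[i][k] is the first derivative and H[i][k][l] the second derivative of
    the i-th component of the transition map psi2 o psi1^{-1} at psi1(p).
    """
    n = len(b)
    a2 = [[sum(J[i][k] * J[j][l] * a[k][l]
               for k in range(n) for l in range(n))
           for j in range(n)] for i in range(n)]
    b2 = [sum(H[i][k][l] * a[k][l] for k in range(n) for l in range(n))
          + sum(J[i][k] * b[k] for k in range(n))
          for i in range(n)]
    return a2, b2

def combine(lam, a, b, mu, c, d):
    """Return the componentwise linear combination lam*(a, b) + mu*(c, d)."""
    n = len(b)
    a_sum = [[lam * a[i][j] + mu * c[i][j] for j in range(n)] for i in range(n)]
    b_sum = [lam * b[i] + mu * d[i] for i in range(n)]
    return a_sum, b_sum

# Hypothetical transition-map derivatives at one point (H is symmetric in k, l).
J = [[1, 2], [0, 3]]
H = [[[1, 0], [0, 2]], [[0, 1], [1, 0]]]
a, b = [[2, 1], [1, 3]], [1, -1]
c, d = [[0, 1], [1, 4]], [2, 5]

# Transform after combining, and combine after transforming.
lhs = transform(*combine(3, a, b, -2, c, d), J, H)
ta, tb = transform(a, b, J, H)
tc, td = transform(c, d, J, H)
rhs = combine(3, ta, tb, -2, tc, td)
print(lhs == rhs)   # equality expresses chart-independence of the linear operations
```

Because both orders of operation give the same component pair, a linear combination formed in one chart represents the same second-order tangent vector as the combination formed in any other chart.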
[ Define higher-order tangent bundles as for Definition 27.8.1. Definitions 29.3.10 and 29.3.11 need to be
fixed. ]
29.3.10 Definition: The second-order tangent bundle of a $C^2$ manifold $(M, A_M)$ is the $C^0$ manifold $(T^{[2]}(M), A_{T^{[2]}(M)})$, where
(i) $T^{[2]}(M) = \bigcup_{p \in M} T^{[2]}_p(M) = \{ t^{[2]}_{p,a,b,\psi} ;\ \psi \in A_M,\ p \in \mathrm{Dom}(\psi),\ a \in \mathrm{Sym}(n, \mathbb{R}),\ b \in \mathbb{R}^n \}$, where $n = \dim(M)$, and
(ii) $A_{T^{[2]}(M)} = \{ \tilde\psi ;\ \psi \in A_M \}$, where for any chart $\psi \in A_M$, the chart $\tilde\psi : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^{n + n^2 + n}$ is defined by $\tilde\psi : t^{[2]}_{p,a,b,\psi} \mapsto (\psi(p), a, b)$, where $\pi : T^{[2]}(M) \to M$ is defined by $\pi : [(p, a, b, \psi)] \mapsto p$.
The function $\pi : T^{[2]}(M) \to M$ is called the projection map of the total tangent space $T^{[2]}(M)$.

29.3.11 Definition: The topological second-order tangent bundle of a $C^2$ manifold $(M, A_M)$ is the topological space $(T^{[2]}(M), T_{T^{[2]}(M)})$, where $T_{T^{[2]}(M)}$ is the topology induced on $T^{[2]}(M)$ by the second-order total tangent space atlas $A_{T^{[2]}(M)}$.
29.3.12 Notation: $D_W$ for any second-order tangent vector $W = t^{[2]}_{p,a,b,\psi} \in T^{[2]}(M)$ denotes the corresponding tangent operator $\partial^{[2]}_{p,a,b,\psi} \in \mathring{T}^{[2]}(M)$.
29.3.13 Notation: $\hat{D}_W$ for a second-order tangent vector $W = t^{[2]}_{p,a,b,\psi} \in T^{[2]}(M)$ denotes the corresponding tagged second-order tangent operator $(p, D_W) = (p, \partial^{[2]}_{p,a,b,\psi}) \in \hat{T}^{[2]}(M)$. Thus $\hat{D}_W = (p, D_W) = \hat\partial^{[2]}_{p,a,b,\psi} = (p, \partial^{[2]}_{p,a,b,\psi})$.
29.3.14 Remark: Theorem 29.1.3 implies that Definitions 29.3.3 and 29.1.1 are consistent with each other
and the D-operation in Notation 29.3.12 commutes with changes of chart.
29.3.15 Remark: A useful mnemonic for equations (29.3.1) and (29.3.2) is
$$a_2^{ij} = \frac{\partial \psi_2^i}{\partial \psi_1^k} \frac{\partial \psi_2^j}{\partial \psi_1^\ell} a_1^{k\ell},$$
$$b_2^i = \frac{\partial^2 \psi_2^i}{\partial \psi_1^k \partial \psi_1^\ell} a_1^{k\ell} + \frac{\partial \psi_2^i}{\partial \psi_1^k} b_1^k.$$
The equations are reduced to the first-order tangent vector transformation rule if $a_1$ is zero. If the first-order component is ignored, the transformation rule reduces to that for contravariant coefficients of 2-tensors in $T^{2,0}_p(M)$. The rules can be further abbreviated as follows.
$$\bar{a}^{ij} = \bar{x}^i_k \bar{x}^j_\ell a^{k\ell},$$
$$\bar{b}^i = \bar{x}^i_{k\ell} a^{k\ell} + \bar{x}^i_k b^k.$$
Equations (29.3.1) and (29.3.2) are reminiscent of the equations in Theorem 27.10.9.
29.3.16 Remark: It is essentially true that T (M ) ⊆ T [2] (M ), and precisely true that T̊ (M ) ⊆ T̊ [2] (M )
and T̂ (M ) ⊆ T̂ [2] (M ), since the second-order tangent vectors and operators reduce to the corresponding
first-order vectors and operators when the second-order coefficient matrix a is zero.
The higher-order operators of order k > 2 may also be defined along the same pattern as the second-order
operators. The notations for these spaces would then be T [k] (M ) and so forth. Each one of these spaces has
a corresponding total tangent space and can have a higher-order tangent bundle defined for it in Chapter 34.
[ Should determine the relation between second-order tangent vectors and degree 2 tensors in $T^{2,0}_p(M)$. Is $T^{[2]}_p(M)$ possibly some sort of extension of $T^{2,0}_p(M)$? Answer: It seems like the components for the second-order operator are tensorial (i.e. they are equal to the components of tensor objects) if they are (1) defined in a fixed chart and (2) transformed between charts according to covariant transformation rules using an affine connection. The Hessian of a real-valued function at a stationary point (probably) does not need such covariant transformation to correct for 2nd order diffeomorphism derivatives. ]
29.3.17 Remark: There are good arguments against the component order (p, a, b, ψ) in Definition 29.3.3.
The ordering (ψ, p, b, a) is preferable in some ways. For example, it is sometimes useful to consider all spaces
$T^{[k]}(M)$ together in one combined set. It is useful to be able to say that $(\psi, p, a_1, a_2, a_3) \in T^{[3]}(M)$ and
$(\psi, p, a_1, a_2) \in T^{[2]}(M)$ are equivalent when the symmetric n × n × n array $a_3$ equals zero. This could be done
by considering all such sequences to be elements of a set of infinite sequences for which all but a finite number
of components equal zero. This is much easier to do if the components have increasing order. Nevertheless,
decreasing order is used here for the time being because it matches the decreasing left to right order in which
polynomials are written. (See Remark 32.2.6 for similar discussion.)

29.4. Higher-order tangent spaces

[ Maybe should present higher-order cotangent spaces here, and higher-order tensor spaces? What do the
linear and multilinear duals of higher-order tangent spaces look like? Are they useful? What’s needed is a
big classification diagram to interrelate all of the spaces! ]
These spaces include pointwise higher-order tangent spaces and total tangent spaces for both vectors and
operators. Of particular interest are the elliptic second-order operators. Higher-order operators were defined
in Section 29.1.
29.4.1 Definition: The second-order tangent operator space at a point $p$ in a $C^2$ manifold $M$ is the set $\mathring{T}^{[2]}_p(M)$ of all tangent operators at $p \in M$, together with the operations of pointwise addition and multiplication by real numbers. Thus the linear combination $\lambda_1 L_1 + \lambda_2 L_2 : C^2(M) \to \mathbb{R}$ of two second-order tangent operators $L_1, L_2 \in \mathring{T}^{[2]}_p(M)$ is defined by
$$\forall L_1, L_2 \in \mathring{T}^{[2]}_p(M),\ \forall \lambda_1, \lambda_2 \in \mathbb{R},\ \forall f \in C^2(M), \quad
(\lambda_1 L_1 + \lambda_2 L_2)(f) = \lambda_1 L_1(f) + \lambda_2 L_2(f).$$
29.4.2 Remark: Although the action of the operators in spaces $\mathring{T}^{[2]}_p(M)$ is restricted to the space $C^2(M)$ so that operators at all points can act on the same space, it is tacitly assumed that every operator in $\mathring{T}^{[2]}_p(M)$ is also defined on the space $\mathring{C}^2_p(M)$ of $C^2$ functions which are defined in a neighbourhood of $p$. In fact, the tangent operators are assumed to act on any reasonable kind of function on $M$ or a subset of $M$, whether the function is classically differentiable or not.
29.4.3 Definition: The tagged second-order tangent operator space at a point $p$ in a $C^2$ manifold $M$ is the set $\hat{T}^{[2]}_p(M)$ of all pairs $(p, L)$ such that $L \in \mathring{T}^{[2]}_p(M)$, together with the operations of pointwise addition and multiplication by real numbers on the operator component (as in Definition 29.4.1). Thus the linear combination $\lambda_1 (p, L_1) + \lambda_2 (p, L_2)$ of two tagged tangent operators $(p, L_1), (p, L_2) \in \hat{T}^{[2]}_p(M)$ is defined by
$$\forall L_1, L_2 \in \mathring{T}^{[2]}_p(M),\ \forall \lambda_1, \lambda_2 \in \mathbb{R}, \quad
\lambda_1 (p, L_1) + \lambda_2 (p, L_2) = (p, \lambda_1 L_1 + \lambda_2 L_2).$$
[ Here could present coordinate basis vectors analogous to Definition 27.7.4 etc. ]
29.4.4 Remark: Second-order total tangent vector and operator spaces may be defined as in Section 27.8
by defining a standard atlas for the sets T [2] (M ) and T̂ [2] (M ).
29.5. Drop functions for second-level tangent vectors

[ This section must be checked and rewritten. ]
29.5.1 Remark: In Definition 29.5.2, $(uv + vu)/2$ means the matrix $(a^{ij})_{i,j=1}^n$ with $a^{ij} = (u^i v^j + v^i u^j)/2$.
29.5.2 Definition: The drop function from $T^{(2)}(M)$ to $T^{[2]}(M)$ for a $C^2$ manifold $M$ is the function from $T^{(2)}(M)$ to $T^{[2]}(M)$ which maps $t^{(2)}_{V,(u,w),\tilde\psi}$ to $t^{[2]}_{p,(uv+vu)/2,w,\psi}$ for all $p \in M$, $\psi \in \mathrm{atlas}(M)$, $V = t_{p,v,\psi} \in T_p(M)$, and $u, v, w \in \mathbb{R}^n$ with $n = \dim(M)$, where $\tilde\psi$ is the chart for $T^{(2)}(M)$ corresponding to $\psi$.
29.5.3 Theorem: The drop function in Definition 29.5.2 is chart-independent.
Proof: It must be shown that the transformation rules for $T^{(2)}(M)$ in Theorem 27.10.9 match the rules for $T^{[2]}(M)$ in Definition 29.3.3. Let $\psi_1, \psi_2 \in \mathrm{atlas}(M)$ and $p \in \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)$, and let $u_1, v_1, w_1 \in \mathbb{R}^n$ and $u_2, v_2, w_2 \in \mathbb{R}^n$ be the respective components $u, v, w \in \mathbb{R}^n$ in Definition 29.5.2 for $\psi_1$ and $\psi_2$. In abbreviated notation, the transformation rules for $T^{(2)}(M)$ on lines (27.10.1) and (27.10.2) may be written as
$$u_2^i = \frac{\partial \psi_2^i}{\partial \psi_1^j} u_1^j \tag{29.5.1}$$
and
$$w_2^i = \frac{\partial^2 \psi_2^i}{\partial \psi_1^j \partial \psi_1^k} u_1^j v_1^k + \frac{\partial \psi_2^i}{\partial \psi_1^j} w_1^j. \tag{29.5.2}$$
If line (29.5.1) is combined with the transformation rule $v_2^i = (\partial \psi_2^i / \partial \psi_1^j) v_1^j$ for $v_1$ and $v_2$, the result is
$$u_2^i v_2^j = \frac{\partial \psi_2^i}{\partial \psi_1^k} \frac{\partial \psi_2^j}{\partial \psi_1^\ell} u_1^k v_1^\ell.$$
This yields $a_2^{ij} = (\partial \psi_2^i / \partial \psi_1^k)(\partial \psi_2^j / \partial \psi_1^\ell) a_1^{k\ell}$ by symmetrizing over $i$ and $j$, where $a_1, a_2$ are the respective matrices in Definition 29.3.3. This together with line (29.5.2), symmetrized over $j$ and $k$, gives an exact match with the abbreviated transformation rule in Remark 29.3.15.
29.5.4 Remark: The drop function in Definition 29.5.2 is an extension of the drop function for vertical vectors in Definition 27.11.6. To see this, note that $t^{(2)}_{V,(u,w),\tilde\psi}$ in Definition 29.5.2 is vertical if and only if $u = 0$. Then the drop function maps $t^{(2)}_{V,(0,w),\tilde\psi}$ to $t^{[2]}_{p,0,w,\psi} \equiv t_{p,w,\psi}$. This agrees with the value of the drop function for vertical vectors at $t^{(2)}_{V,(0,w),\tilde\psi}$ in Definition 27.11.6. Thus the natural extension of the drop function to general vectors in $T(T(M))$ yields second-order tangent vectors when the (chart-dependent) horizontal component is non-zero.
[ Try to extend the (vertical) drop function from T (2) (M ) to general T (k) (M ). ]
29.6. Elliptic second-order operators

[ Mention near here that the Laplacian operator at a point $p$ in a $C^2$ manifold resides in $\mathring{T}^{[2]}_p(M)$. Quote the operator here when it has been properly defined in a later chapter. ]
29.6.1 Remark: The Hessian operator in flat space is a second-order operator, but there are two kinds of Hessian for differentiable manifolds. The Hessian at a critical point of a function $f$, namely a point $p \in M$ where $(df)_p = 0$, turns out to be a covariant tensor in $\mathring{T}^{[2]}_p(M)$ which does not require a connection for its definition. However, the Hessian at a non-critical point of a real-valued function does require a connection for its definition. (See Section 28.3 for discussion of the Hessian at critical points.)
See Greene/Wu [68], page 7, for the Hessian for general functions using a connection. Since $D^2 f(X, Y) = X(Y f) - (D_X Y) f$, it seems that $(D_X Y) f = 0$ if $(df)_p = 0$. So the connection doesn't come into it. [ This must be checked since they have fields $X$ and $Y$ rather than vectors at a point. ]
29.6.2 Remark: Definite and semi-definite matrices were introduced in Definition 11.4.7. Elliptic second-
order operators have a second-order component which is positive semi-definite or positive definite. The chart
transition rules in Definitions 29.3.3 and 29.1.1 guarantee that this property is chart-independent.
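The chart-independence asserted here follows from the first transition rule alone. The short derivation below is a sketch in the abbreviated notation of Remark 29.1.7, where the symbol J for the Jacobian matrix of the transition map and the covector ξ are introduced only for this illustration.

```latex
% Assumed notation: J^i{}_k = \phi^i_{,k} with \phi = \psi_2 \circ \psi_1^{-1}, evaluated at \psi_1(p).
% Rule (29.3.1) in matrix form:
a_2 = J \, a_1 \, J^{\mathsf{T}}.
% For any \xi \in \mathbb{R}^n:
\xi_i \xi_j a_2^{ij} = (J^{\mathsf{T}} \xi)_k \, (J^{\mathsf{T}} \xi)_\ell \, a_1^{k\ell} \ge 0
\quad \text{whenever } a_1 \text{ is positive semi-definite}.
% J is invertible, so J^{\mathsf{T}} \xi \ne 0 for \xi \ne 0, and positive definiteness is preserved too.
```

The first-order components b play no role in this argument, which is why ellipticity is a condition on the matrix a only.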
29.6.3 Definition: A (weakly) elliptic second-order tangent operator at a point $p \in M$ of a $C^2$ manifold $M$ is an operator $\partial^{[2]}_{p,a,b,\psi} \in \mathring{T}^{[2]}_p(M)$ such that the matrix $a \in \mathrm{Sym}(n, \mathbb{R})$ is positive semi-definite.
A strictly elliptic second-order tangent operator at a point $p \in M$ of a $C^2$ manifold $M$ is an operator $\partial^{[2]}_{p,a,b,\psi} \in \mathring{T}^{[2]}_p(M)$ such that the matrix $a \in \mathrm{Sym}(n, \mathbb{R})$ is positive definite.
29.6.4 Theorem: If $p \in M$ is a local maximum of a function $u \in C^2(M)$, where $M$ is a $C^2$ manifold, then $D_W(u) \le 0$ for all positive semi-definite operators $W \in T^{[2]}_p(M)$.
[ Near here maybe do some vector field versions of Theorem 29.6.4 etc. ]
29.7. Higher-order vector fields

This section deals with functions on $C^k$ manifolds $M$ which are valued in spaces such as $T^{[k]}_p(M)$ (introduced in Definition 29.3.7) at points $p \in M$. In other words, these higher-order vector fields, which may be vector-valued or operator-valued, are cross-sections of fibrations such as $T^{[k]}(M)$ and $\hat{T}^{[k]}(M)$. The definitions follow the pattern of Section 28.5.
29.7.1 Definition: A second-order vector field in a subset S of a C 2 manifold M is a map X : S → T [2] (M )
such that π(X(p)) = p for all p ∈ S, where π : T [2] (M ) → M is the standard projection map.

29.7.2 Definition: The action of a second-order vector field $X$ in the manifold $M$ on a function $f \in C^2(M)$ is the map $D_X f : M \to \mathbb{R}$ defined by $D_X f : p \mapsto D_{X(p)} f$.
[ Define a vector field of order 0 for simple multiplication by a real-valued function? ]
29.7.3 Definition: A vector field $X$ of order $k \in \mathbb{Z}^+_0$ in a $C^{r+k}$ manifold $M$ is said to be of class $C^r$ for $r \in \bar{\mathbb{Z}}^+_0$ if
$$\forall f \in C^{r+k}(M), \quad D_X f \in C^r(M).$$
29.7.4 Notation: $X^r(T^{[k]}(M))$ denotes the set of $C^r$ vector fields of order $k$ on a $C^{r+k}$ manifold $M$, where $k \in \mathbb{Z}^+_0$ and $r \in \bar{\mathbb{Z}}^+_0$.
29.7.5 Remark: The invariance of the class of C r vector fields of order 2 on a C s+2 manifold is discussed
in Remark 19.5.3.
29.7.6 Definition: An elliptic second-order vector field in a C 2 manifold M is a second-order vector field
X ∈ X 0 (T [2] (M )) such that the operator X(p) is an elliptic second-order operator for all p ∈ M .
[ Give maximum principles for weakly and strongly elliptic vector fields. ]
29.7.7 Remark: Even if a metric or connection is needed for the calculation of some kinds of higher-order
vector fields, they are still well-defined vector fields in spaces X r (T [k] (M )) without the metric or connection.
For example, operators such as the Laplacian acting on real-valued functions require a metric to determine
the components of the Laplacian vector field, but the field “lives” in a space X r (T [2] (M )), which does not
itself involve any metric in its definition. It is important to distinguish between the structures required for
constructing an object and the structures required for “housing” the object.
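A concrete instance of this distinction is the flat Laplacian on the punctured plane. The worked example below is a sketch under stated assumptions: ψ1 is the Cartesian chart (x, y), ψ2 is a polar chart (r, θ) with r > 0, and the component pair in ψ1 is taken to be a_1 = diag(1, 1), b_1 = 0. The metric is used only to choose these components; the passage to the polar chart uses nothing but the rules (29.3.1) and (29.3.2).

```latex
% First derivatives of \psi_2 \circ \psi_1^{-1} : (x, y) \mapsto (r, \theta):
r_{,x} = x/r, \quad r_{,y} = y/r, \quad \theta_{,x} = -y/r^2, \quad \theta_{,y} = x/r^2.
% Rule (29.3.1) with a_1 = \mathrm{diag}(1, 1):
a_2^{rr} = r_{,x}^2 + r_{,y}^2 = 1, \quad
a_2^{r\theta} = r_{,x}\theta_{,x} + r_{,y}\theta_{,y} = 0, \quad
a_2^{\theta\theta} = \theta_{,x}^2 + \theta_{,y}^2 = 1/r^2.
% Rule (29.3.2) with b_1 = 0:
b_2^r = r_{,xx} + r_{,yy} = y^2/r^3 + x^2/r^3 = 1/r, \quad
b_2^\theta = \theta_{,xx} + \theta_{,yy} = 2xy/r^4 - 2xy/r^4 = 0.
% Hence, in the polar chart:
\partial_{xx} + \partial_{yy} = \partial_{rr} + r^{-2}\,\partial_{\theta\theta} + r^{-1}\,\partial_r.
```

The familiar first-order term in the polar form arises purely from the second derivatives of the chart transition map, with no reference to a connection.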
29.8. Higher-order vector fields for families of curves

[ Must define higher-order vector fields along curves and families of curves. These are very important for
3-point convexity maximum principles. ]
[ Apply d[2] φ to families of curves such as in Figure 29.8.1, where φ is the map from γ(1, t) to γ(s, t). This
will be applicable to convexity maximum principles. Should also say exactly what the relation is between such
differentials and vector fields along curves and families of curves. ]
Figure 29.8.1 Family of curves (labelled in the figure: curves γ(0, t), γ(1, t), γ(s, 0), γ(s, 1) and points γ(0, 0), γ(0, 1), γ(1, 0), γ(1, 1))

Chapter 30
Differentials on manifolds
30.1 Pointwise differentials versus induced maps
30.2 The differential of a real-valued function
30.3 The differential of a differentiable map
30.4 The differential of a curve
30.5 One-parameter transformation families and vector fields
In the olden days, differentials were thought of as the df and dx in expressions such as df /dx or f dx. As
mentioned in EDM2 [35], 106.B, these differentials are not well-defined, but may be thought of as the limits
of infinitesimal quantities. In differential geometry, there are well-defined mathematical objects which are
analogous to the ill-defined differentials of the 18th century. For a function f : M → IR and point p ∈ M for
a C 1 manifold M , the differential (df )p is defined as the map from differential operators at p to derivatives.
Then for a chart ψ ∈ atlas(M ), the object dψ i is well-defined for i = 1 . . . n, where n = dim(M ). Similarly,
objects of the form ∂/∂ψ i are well-defined as differential operators.
Neither dψ i nor ∂/∂ψ i is well-defined in 18th century calculus. It is important not to confuse the
well-defined concepts of differential geometry with the abstract differentials of elementary calculus.
30.1. Pointwise differentials versus induced maps

30.1.1 Remark: Small differences in the way first-order differentials are defined have a significant effect
on how higher-order differentials are defined.
There seem to be essentially two ways of defining differentials of functions and maps. These approaches may
be called “pointwise differentials” and “induced maps”. In the first approach, the differential is calculated
at each point of the domain manifold first, and this is then aggregated to construct a function whose domain
is the same as the original function. In the second approach, both the domain and range of the function are
regarded as differentiable manifolds with tangent vector fibrations, and the differential function is defined
between those two fibrations. This second form is a kind of “lift” or “push-forth” of the original function.
A variant of the second approach is the “pull-back”, which is some kind of dual or inverse of the push-forth.
30.1.2 Remark: The notation commonly used for the differential of a function f at a point p is “(df )p ”.
This may be interpreted as either the value (df )(p) of a function df at p or the restriction $(df)\big|_{T_p(M)}$ to the
tangent space Tp (M ). The first interpretation leads to pointwise differentials whereas the second leads to
induced maps.
These two approaches to defining differentials are not greatly different in the case of first-order differentials,
but they are significantly different when higher-order differentials are constructed. In the case of real-valued
functions on manifolds, the pointwise style of differential is generally adopted, whereas the “induced map”
style is more usual for the differential of a map between two manifolds. In this chapter, an attempt will be
made to disentangle the various ways in which differentials of various orders may be defined.
30.1.3 Remark: Some texts are unclear as to what the domains and ranges of the functions in their defi-
nitions are. However, the following are rough indications of the approaches to defining differentials by some
authors. Examples of the pointwise differential approach are EDM2 [35], sections 105.I–J and Kobayashi/

Nomizu [27], page 7. Examples of the induced map approach are Crampin/Pirani [12], section 10.3,
pages 250–251, Darling [14], section 2.8, page 43, and Gallot/Hulin/Lafontaine [20], section 1.64. Exam-
ples of the pull-back approach to differentials are Malliavin [36], page 90, and Gallot/Hulin/Lafontaine [20],
page 38.
30.1.4 Remark: Differentials of real-valued functions may be regarded as a special case of differentials of
maps between manifolds because IR may be regarded as a 1-dimensional manifold whose tangent space at
each point is a copy of IR. (See Darling [14], pages 40–41, for a comment on this.)
30.1.5 Remark: The naming of “induced” maps is quite appropriate because they are reminiscent of the
way electrical current is induced in a wire by a neighbouring wire via a magnetic field. The induced map
between tangent spaces is in some sense parallel to the map between the point spaces. This is illustrated in
Figure 30.1.1.
T(T(M1)) ∋ z     --φ∗∗-->    φ∗∗(z) ∈ T(T(M2))
    | π̃1                          | π̃2
    v                              v
T(M1) ∋ L        --φ∗-->     φ∗(L) ∈ T(M2)
    | π1                           | π2
    v                              v
M1 ∋ p           --φ-->      φ(p) ∈ M2
Figure 30.1.1 Induced maps for a manifold map
30.1.6 Remark: The following table indicates the domains and ranges of some forms of differentials and
induced maps. (Here a subscript star is written “_∗” and a superscript star is written “^∗”.)
manifold map                          real-valued function            curve
φ : M1 → M2                           f : M → IR                      γ : IR → M
dφ : M1 → T(M1, M2)                   df : M → T^∗(M)                 dγ : IR → T(IR, M)
φ_∗ : T(M1) → T(M2)                   f_∗ : T(M) → IR                 γ_∗ : IR → T(M)
d²φ : M1 → T(M1, T(M1, M2))           d²f : M → T(M, T^∗(M))          d²γ : IR → T(IR, T(IR, M))
(dφ)_∗ : T(M1) → T(T(M1, M2))         (df)_∗ : T(M) → T(T^∗(M))       (dγ)_∗ : IR → T(T(IR, M))
d(φ_∗) : T(M1) → T(T(M1), T(M2))      d(f_∗) : T(M) → T^∗(T(M))       d(γ_∗) : IR → T(IR, T(M))
φ_∗∗ : T(T(M1)) → T(T(M2))            f_∗∗ : T(T(M)) → IR             γ_∗∗ : IR → T(T(M))
[ Probably will interpret T (IR, M ) in this table as equivalent to T (M ). Should mention this in Section 28.4.
Also try to find a simplification of spaces like T (M, T ∗ (M )). Also must deal near here with differentials of
diffeomorphism families φ : IR × M → M . Also deal with path families γ : IRm → ˚ M. ]
30.1.7 Remark: Broadly speaking, the d operator yields a function of points on manifolds, which therefore
emphasizes the dependence on the base point alone, whereas the induced map operator yields a function on
the combined base point and tangent vector. However, the double application of the d operator does not
yield a function with a simple dependence on pairs of manifold points as might have been desired. In all
cases, there is a need to construct differentiable structures on top of other differentiable structures. This
makes the situation inconveniently complicated for all second-order differentials.
30.1.8 Remark: The relations between the two styles of first-order differentials of manifold maps are as
follows.
\[
\begin{aligned}
&\forall V \in T(M), & \phi_*(V) &= (d\phi)_{\pi(V)}(V) \\
& & \phi_* &= \bigcup_{p \in M} (d\phi)_p \\
&\forall p \in M,\ \forall V \in T_p(M), & (d\phi)_p(V) &= \phi_*(V) \\
&\forall p \in M, & (d\phi)_p &= \phi_*\big|_{\pi^{-1}(\{p\})},
\end{aligned}
\]
where π : T (M ) → M is the projection map for T (M ). The corresponding relations for real-valued functions
are the following.
\[
\begin{aligned}
&\forall V \in T(M), & f_*(V) &= (df)_{\pi(V)}(V) \\
& & f_* &= \bigcup_{p \in M} (df)_p \\
&\forall p \in M,\ \forall V \in T_p(M), & (df)_p(V) &= f_*(V) \\
&\forall p \in M, & (df)_p &= f_*\big|_{\pi^{-1}(\{p\})}.
\end{aligned}
\]
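Example: The relation between the pointwise differential and the induced map can be made concrete with a small sketch. It is assumed here, purely for illustration, that both manifolds are IR2 with the identity chart and that φ(x, y) = (x^2, xy); neither choice comes from the text. The pointwise differential needs the base point p to be specified first, whereas the induced map reads it off from the vector via π.

```latex
% Assumed: M_1 = M_2 = R^2, identity charts, phi(x,y) = (x^2, xy).
\[
  V = t_{p,v,\mathrm{id}}, \qquad p = \pi(V) = (x, y), \qquad v = (v^1, v^2),
\]
\[
  \phi_*(V) = (d\phi)_{\pi(V)}(V) = t_{\phi(p),\,w,\,\mathrm{id}},
  \qquad
  w = \begin{pmatrix} 2x & 0 \\ y & x \end{pmatrix}
      \begin{pmatrix} v^1 \\ v^2 \end{pmatrix}
    = \begin{pmatrix} 2x\,v^1 \\ y\,v^1 + x\,v^2 \end{pmatrix}.
\]
```

The same component formula serves for both maps. The only difference is whether p is an explicit argument or is recovered from V.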
30.1.9 Remark: Although the differentials df of real-valued functions f : M → IR may be interpreted
as cross-sections of the cotangent fibration, df ∈ X 0 (T ∗ (M )), such an interpretation does not seem to be
possible in the case of the differential dφ of a manifold map. In this case, the range of dφ is not the entire total
tangent space T (M1 , M2 ). The range of dφ in fact lies within
$\bigcup_{p \in M_1} T_{p,\phi(p)}(M_1, M_2) \ne \bigcup_{p \in M_1,\, q \in M_2} T_{p,q}(M_1, M_2)$.
It is just possible, though, that this could be regarded as a cross-section of some sort of fibration over M1
alone if T (M1 , M2 ) can be regarded as a fibration over M1 .
30.1.10 Remark: The simplification of the form of the maps df and f∗ relative to dφ and φ∗ has the
consequence that information is lost. The maps df and f∗ both lose the information in the function f . In
the case of df , the manifold map dφ : M → T (M, IR) is replaced with df : M → T ∗ (M ), and covectors
(df )p ∈ Tp∗ (M ) contain no information on the value of f (p). Similarly, φ∗ : T (M ) → T (IR) is replaced
with f∗ : T (M ) → IR, and f∗ (V ) ∈ IR for V ∈ Tp (M ) contains no information about f (p). A value such as
φ∗ (V ) ∈ Tφ(p) (IR), however, contains the value φ(p).
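Example: The loss of information described in Remark 30.1.10 can be seen in a one-dimensional sketch. It is assumed here, for illustration only, that M = IR with the identity chart, and that f(x) = x and g(x) = x + 1. The symbols φ and χ denote f and g regarded as manifold maps from IR to IR; these names are introduced only for this example.

```latex
% Assumed: M = R, identity chart, f(x) = x, g(x) = x + 1.
\[
  V = t_{p,v,\mathrm{id}} \in T_p(\mathbb{R}):
  \qquad
  (df)_p(V) = v = (dg)_p(V),
  \qquad
  f_*(V) = v = g_*(V),
\]
\[
  \phi_*(V) = t_{p,\,v,\,\mathrm{id}} \in T_{p}(\mathbb{R}),
  \qquad
  \chi_*(V) = t_{p+1,\,v,\,\mathrm{id}} \in T_{p+1}(\mathbb{R}).
\]
```

The real-valued forms cannot distinguish f from g, whereas the manifold-map forms retain the base points p and p + 1.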
30.2. The differential of a real-valued function
The derivative of a real-valued function of several real variables is sometimes specified as the sequence of
partial derivatives of the function with respect to the independent variables. These partial derivatives have
simple transformation rules under changes of coordinates if the function is C 1 . Another approach is to
specify the directional derivative of the function in every direction at every point. In the case of real-valued
functions on differentiable manifolds, this second approach is generally adopted. This associates with each
C 1 function f and direction V at each point p a directional derivative DV f (p). If f and p are fixed, then
DV f (p) is a function of the direction V . This is the “differential” of the function f at p. The differential
of a function is a map from the set of all vectors V at a point to the real-valued derivative of the function
in the direction V . This is clearly linear with respect to V . Therefore a differential is a linear functional on
the set of vectors. In other words, it is a member of the dual linear space.
30.2.1 Definition: For any function f ∈ C 1 (M ) and point p in a C 1 manifold M , the differential of f
at p is the map (df )p : Tp (M ) → IR defined by
∀V ∈ Tp (M ), (df )p (V ) = DV f.
The differential of f at p (for tangent operators) is the map (df )p : T̊p (M ) → IR defined by
∀L ∈ T̊p (M ), (df )p (L) = L(f ).
30.2.2 Remark: Some spaces and maps in Definition 30.2.1 are illustrated in Figure 30.2.1.
The same notation (df )p is used for both the tangent vector and operator versions of differentials to economize
on notations. (A notation such as $(\mathring{df})_p$ would have been the natural choice for the operator version,
but this would look silly. It increases the line spacing too much. But with such a notation, one could
write $(df)_p = (\mathring{df})_p \circ D$, where D is the map V ↦ DV as in Figure 30.2.1.)

Figure 30.2.1 Differential of a real-valued function at a point for vectors and operators (diagram: the map D takes V ∈ Tp (M ) to L ∈ T̊p (M ); both versions of (df )p map into IR; DV and L act on f ∈ C 1 (M ); f maps p ∈ M into IR)
30.2.3 Remark: For any V = tp,v,ψ ∈ Tp (M ), the value of (df )p (V ) may be written explicitly as
\[
(df)_p\bigl(t_{p,v,\psi}\bigr)
= \sum_{i=1}^n v^i\, \partial_{x^i}\bigl(f \circ \psi^{-1}(x)\bigr)\Big|_{x=\psi(p)}
= \partial_{p,v,\psi}(f).
\]
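Example: The component formula of Remark 30.2.3 can be checked on a small case. The manifold, chart, function, point and vector below are all assumptions chosen for illustration: M = IR2 with the identity chart, f(x, y) = x^2 y, p = (1, 2) and v = (3, −1).

```latex
% Assumed: M = R^2, psi = id, f(x,y) = x^2 y, p = (1,2), v = (3,-1).
\[
  \partial_x f = 2xy, \qquad \partial_y f = x^2,
  \qquad
  \partial_x f(1,2) = 4, \qquad \partial_y f(1,2) = 1,
\]
\[
  (df)_p\bigl(t_{p,v,\mathrm{id}}\bigr)
  = v^1\,\partial_x f(p) + v^2\,\partial_y f(p)
  = 3 \cdot 4 + (-1) \cdot 1
  = 11 .
\]
```

The result is linear in v, as expected of a linear functional on Tp (M ).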
30.2.4 Remark: Since (df )p = {(V, DV f ); V ∈ Tp (M )}, whereas DV = {(f, DV f ); f ∈ C 1 (M )}, it seems
that DV and (df )p are projections of the map (f, V ) ↦ DV f onto f and V respectively. So the derivative
and the differential are just different ways of viewing the same map.
[ For “coordinate differentials” such as (dψ i )p , see Crampin/Pirani [12], pages 37 and 249. ]
30.2.5 Remark: It is often desirable to define (df )p to act on functions in $\mathring{C}^1_p(M, \mathbb{R})$ instead of C 1 (M ),
because then it is not necessary to extend local functions to global functions with desired local properties.
Another advantage is the ability to write expressions such as
$(df)_p = \sum_{i=1}^n \partial_i^{p,\psi}(f)\,(d\psi^i)_p$ for $f \in \mathring{C}^1_p(M, \mathbb{R})$,
with n = dim(M ), as in Theorem 30.2.7.
It is not true in general that the coordinate functions ψ i are in C 1 (M ). For simplicity, Definition 27.5.1 was
written in terms of C 1 (M ), and Definition 30.2.1 follows this pattern. (See Remark 27.5.5 for discussion
of this.) It will be understood that the differentials in Definition 30.2.1 apply to functions in any natural
extension of C 1 (M ) which yields linear functionals on the tangent vector or tangent operator space.
30.2.6 Remark: The cotangent set Tp∗ (M ) is equal to {(df )p : Tp (M ) → IR; f ∈ C 1 (M )} because the
set of differentials (df )p spans Tp∗ (M ). This can be shown with functions f which are chart component
functions ψ i for ψ ∈ atlas(M ) multiplied by suitable functions with compact support. (See Remark 20.12.10
for smooth functions with compact support.)
If the differentials (df )p are defined in terms of tangent operators in T̊p (M ) rather than Tp (M ), then the
cotangent space may be defined as T̊p∗ (M ) = {(df )p : T̊p (M ) → IR; f ∈ C 1 (M )}. The difference between
these cotangent spaces (probably) may be safely glossed over in most situations.
30.2.7 Theorem: For any C 1 manifold M and chart ψ ∈ atlasp (M ), the sequence of vectors $((d\psi^i)_p)_{i=1}^n$
is a basis for Tp∗ (M ), where n = dim(M ).
30.2.8 Remark: Theorem 30.2.9 expresses the differential (df )p of a C 1 real-valued function in terms of
the unit basis vectors in Theorem 30.2.7. The cotangent vectors $(d\psi^i)_p$ may be abbreviated to $d^i_{p,\psi}$ or $d^i$.
(See Remark 27.5.16 for the corresponding tangent operator abbreviations. See Remark 28.2.6 for the unit
cotangent vector notation.)
30.2.9 Theorem: For any $f \in \mathring{C}^1_p(M, \mathbb{R})$, $(df)_p = \sum_{i=1}^n \partial_i^{p,\psi}(f)\,(d\psi^i)_p$.
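Example: As a hedged illustration of Theorem 30.2.9, assume that ψ = (r, θ) is a polar chart on an open subset of IR2 \ {0}, and that f is given in this chart by f = r^2 cos θ. Both the chart and the function are assumptions made only for this sketch.

```latex
% Assumed: polar chart psi = (r, theta), f = r^2 cos(theta) in that chart.
\[
  \partial_r f = 2r\cos\theta, \qquad
  \partial_\theta f = -r^2\sin\theta,
\]
\[
  (df)_p = 2r\cos\theta\,(dr)_p \;-\; r^2\sin\theta\,(d\theta)_p ,
  \qquad\text{so at } (r,\theta) = (1, 0): \quad (df)_p = 2\,(dr)_p .
\]
```

The coefficients are the chart partial derivatives of f, and the chart differentials (dr)p and (dθ)p play the role of the basis in Theorem 30.2.7.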
30.2.10 Remark: The pointwise differentials (df )p of a function f ∈ C 1 (M ) at points p ∈ M are combined
to construct a cotangent vector field df ∈ X(T ∗ (M )) in Definition 30.2.11. There are two ways to proceed
depending on whether (df )p is interpreted as (df )(p) or $(df)\big|_{T_p(M)}$. As mentioned in Section 30.1, the former
yields differentials whereas the latter yields induced maps.

30.2.11 Definition: For any function f ∈ C 1 (M ) for a C 1 manifold M , the differential of f is the
cotangent vector field df ∈ X(T ∗ (M )) defined by
∀p ∈ M, ∀V ∈ Tp (M ), (df )(p)(V ) = (df )p (V ).
The differential of f (for tangent operators) is the function df ∈ X(T̊ ∗ (M )) defined by

∀p ∈ M, ∀L ∈ T̊p (M ), (df )(p)(L) = (df )p (L).
The differential of f (for tagged tangent operators) is the function df ∈ X(T̂ ∗ (M )) defined by
∀p ∈ M, ∀(p, L) ∈ T̂p (M ), (df )(p)(p, L) = (df )p (L).
[ Near here, show that df ∈ X k (T ∗ (M )) if f ∈ C k+1 (M ), or something like that. Crampin/Pirani [12], pages
76 and 249, say that df ∈ X 0 (T ∗ (M )). ]
30.2.12 Remark: If f is defined only on an open subset of M in Definition 30.2.11, then the domain
of df is correspondingly restricted. Thus df : T (Dom(f )) → IR, where Dom(f ) is understood to have the
appropriate restricted atlas.
30.2.13 Theorem: If f ∈ C k+1 (M ) for a C k+1 manifold M with $k \in \mathbb{Z}^+_0$,
then df ∈ X k (T ∗ (M )).
Proof: . . .
[ Should find out if df is a special case of the exterior derivative. I think it is. ]
30.2.14 Remark: Higher-order differentials (such as differentials of differentials) of real-valued functions
are defined in Section 31.1. Differentials of real-valued functions for higher-order operators are defined in
Section 31.5.
[ Look at the action of differentials and induced maps on vector fields in X k (T (M )). ]
30.3. The differential of a differentiable map

A differentiable map is a C 1 map φ : M1 → M2 for C 1 manifolds M1 and M2 . The differential of a
differentiable map is covariant with respect to the source manifold M1 and contravariant with respect to the
target manifold M2 . That is, the differential behaves like a cotangent vector in T ∗ (M1 ) and like a tangent
vector in T (M2 ). Differentials of real-valued functions and curves may be thought of as special cases of
differentials of maps between manifolds.
Differentiable maps are defined in Section 26.9. The first-order differential of a differentiable map is defined
in this section both in terms of tangent vectors (Definition 30.3.1) and tangent operators (Definition 30.3.18).
The operator form is clearly the simpler of the two. This shows the value of tangent operators for presenting and
motivating definitions, but for practical calculations, the tangent vector version (defined in terms of components)
is required.
30.3.1 Definition: The differential at a point p ∈ M1 of a C 1 map φ : M1 → M2 , for C 1 manifolds M1
and M2 with n1 = dim(M1 ) and n2 = dim(M2 ), is the linear map (dφ)p : Tp (M1 ) → Tφ(p) (M2 ) defined by
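\[
\forall \psi_1 \in \mathrm{atlas}_p(M_1),\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),
\qquad
(d\phi)_p\bigl(t_{p,v_1,\psi_1}\bigr) = t_{\phi(p),v_2,\psi_2},
\tag{30.3.1}
\]
where $v_2 \in \mathbb{R}^{n_2}$ is defined by
\[
\forall k = 1, \ldots, n_2, \qquad
v_2^k = \sum_{i=1}^{n_1} v_1^i\, \partial_{x^i}\bigl(\psi_2^k \circ \phi \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(p)}
= \partial_{p,v_1,\psi_1}(\psi_2^k \circ \phi).
\]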
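Example: The component formula in Definition 30.3.1 can be tried on a simple map. The following sketch assumes M1 = IR2 and M2 = IR3 with identity charts, and φ(x, y) = (x, y, x^2 + y^2). These are illustrative choices, not taken from the text.

```latex
% Assumed: M_1 = R^2, M_2 = R^3, identity charts, phi(x,y) = (x, y, x^2 + y^2).
\[
  v_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 2x & 2y \end{pmatrix}
        \begin{pmatrix} v_1^1 \\ v_1^2 \end{pmatrix}
      = \begin{pmatrix} v_1^1 \\ v_1^2 \\ 2x\,v_1^1 + 2y\,v_1^2 \end{pmatrix},
\]
\[
  p = (1, 1),\ v_1 = (1, 0):
  \qquad
  (d\phi)_p\bigl(t_{p,(1,0),\mathrm{id}}\bigr) = t_{(1,1,2),\,(1,0,2),\,\mathrm{id}} .
\]
```

Each component of v2 is obtained by applying the tangent operator to one component of the chart expression of φ, as in the last line of the definition.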
30.3.2 Remark: The pointwise differential in Definition 30.3.1 is extended to the whole manifold M1 in
Definition 30.3.4. The result is a map dφ : M1 → T (M1 , M2 ) such that (dφ)(p) ∈ Tp,φ(p) (M1 , M2 ) for
all p ∈ M1 . The pointwise and total double tangent spaces Tp,q (M1 , M2 ) and T (M1 , M2 ) are given by
Definitions 28.4.1 and 28.4.3 respectively.

30.3.3 Remark: In Gallot/Hulin/Lafontaine [20], 1.36, page 17, the notation Tp f is used for (dφ)p . In
Federer [106], the notation D is used. The d notation has the advantage that dxi etc. is given real meaning
by the definition. An added advantage is that, for scalar fields (real-valued functions), this differential
coincides with the exterior derivative of the function regarded as a 0-form.
[ An attempt should be made to check whether such formulas as ds2 = cos2 θ dφ2 + dθ 2 are made meaningful
by the definition of differential of a real function. ]
30.3.4 Definition: The differential of a C 1 map φ : M1 → M2 for C 1 manifolds M1 and M2 is the map
dφ : M1 → T (M1 , M2 ) with (dφ)(p) = (dφ)p for all p ∈ M1 .
[ Should have an explicit definition of the application of tangent operators and general operators ∂xk etc. to
IRn -valued functions. ]
30.3.5 Remark: The tangent operator ∂p,v1 ,ψ1 in Definition 30.3.1 is as defined in Definition 27.5.1. It is
applied to each of the n2 components of the function ψ2 ◦ φ : M1 → IRn2 . As a shorthand, one could write
$v_2 = \partial_{p,v_1,\psi_1}(\psi_2 \circ \phi)$. In fact, this extension of differential operators to IRn -valued functions will be adopted
for convenience. Then one could write instead of equation (30.3.1):
\[
\forall \psi_1 \in \mathrm{atlas}_p(M_1),\ \forall v_1 \in \mathbb{R}^{n_1},\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),
\qquad
(d\phi)_p\bigl(t_{p,v_1,\psi_1}\bigr) = t_{\phi(p),\,\partial_{p,v_1,\psi_1}(\psi_2 \circ \phi),\,\psi_2}.
\]
30.3.6 Definition: The induced map of a map φ ∈ C 1 (M1 , M2 ) for C 1 manifolds M1 and M2 is the map
φ∗ : T (M1 ) → T (M2 ) defined by
∀z ∈ T (M1 ), φ∗ (z) = (dφ)π1 (z) (z),
where (dφ)p is as in Definition 30.3.1 with p = π1 (z), and π1 is the projection map of T (M1 ).
[ Must also define pull-back φ∗ and push-forth for tangent vectors and vector fields. ]
30.3.7 Remark: Definition 30.3.6 joins together the pointwise differentials (dφ)p of Definition 30.3.1 for
all points p ∈ M1 . In other words, $\phi_* = \bigcup_{p \in M_1} (d\phi)_p$.
The corresponding Definition 30.3.24 for tangent operators is a little less tidy because of the requirement to
add tags to the operators.
[ See Malliavin [36], proposition I.3.5 for definitions of $\phi^*$ and $\phi_*$. ]

[ The induced map of a differentiable map between differentiable manifolds must be a fibre bundle map. This
should be a theorem in the differentiable fibre bundles chapter. See Malliavin [36], proposition I.7.1.6. ]
[ Is the map φ∗ a tangent bundle map? See Malliavin [36], proposition I.7.1.3. ]
[ Should define here the extension of the induced map φ∗ to n-frames. Since φ∗ sends tangent vectors to tangent
vectors, this clearly induces a unique corresponding map of the n-tuples of tangent vectors. Therefore there
must be a map φ∗ : P (M1 ) → P (M2 ) for the principal tangent bundles. This probably has something to do
with connections on principal fibre bundles. ]
30.3.8 Theorem: If φ : M1 → M2 is a C r map between C r manifolds M1 and M2 for $r \in \overline{\mathbb{Z}}{}^+$, then the
induced map φ∗ is of class C r−1 .
Proof: [ See Malliavin [36], proposition I.7.2.8 for proof of Theorem 30.3.8. Only need to show that v2 in
Definition 30.3.1 is C r−1 . ]
30.3.9 Definition: A C 1 map φ : M1 → M2 is said to be regular at p ∈ M1 if (dφ)p is injective.
30.3.10 Definition: An immersion of a C 1 manifold M1 into a C 1 manifold M2 is a differentiable map

from M1 to M2 which is regular at all points of M1 .
30.3.11 Definition: An embedding of a C 1 manifold M1 into a C 1 manifold M2 is an immersion of M1

into M2 which is injective.
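Example: Definitions 30.3.9 to 30.3.11 can be contrasted using two plane curves. The maps φ and χ below, from IR to IR2 with identity charts, are assumptions chosen for illustration.

```latex
% Assumed: phi, chi : R -> R^2 with identity charts.
\[
  \phi(t) = (\cos t, \sin t), \qquad
  (d\phi)_t\bigl(t_{t,\alpha,\mathrm{id}}\bigr)
  = t_{\phi(t),\,\alpha(-\sin t,\ \cos t),\,\mathrm{id}},
\]
\[
  \chi(t) = (t^2, t^3), \qquad
  (d\chi)_t\bigl(t_{t,\alpha,\mathrm{id}}\bigr)
  = t_{\chi(t),\,\alpha(2t,\ 3t^2),\,\mathrm{id}} .
\]
```

Since (−sin t, cos t) is never zero, (dφ)t is injective for every t, so φ is an immersion. But φ(0) = φ(2π), so φ is not injective and hence is not an embedding in the sense of Definition 30.3.11. The map χ is injective, but (dχ)0 is the zero map, so χ is not regular at t = 0 and is not an immersion.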

[ There might be other ‘standard’ definitions of submanifolds than Definition 30.3.12. ]
30.3.12 Definition: A submanifold of a manifold M1 is a manifold M2 such that M2 ⊆ M1 and the

identity map i : M2 → M1 is an embedding of M2 into M1 .
30.3.13 Remark: The induced map φ∗ of a C r diffeomorphism φ : M1 → M2 between C r manifolds M1
and M2 for $r \in \overline{\mathbb{Z}}{}^+$ is (probably) a C r−1 diffeomorphism from T (M1 ) to T (M2 ). This has a natural extension
to a map from X r−1 (M1 ) to X r−1 (M2 ).
[ Define here the image of a vector field under a diffeomorphism. See Gallot et alia 1.63, p.24. ]
30.3.14 Definition: The induced map for vector fields of a C 1 diffeomorphism φ : M1 → M2 , where M1
and M2 are C 1 manifolds, is the map φ∗ : X 0 (M1 ) → X 0 (M2 ) defined by
∀X ∈ X 0 (M1 ), ∀p ∈ M2 , φ∗ (X)(p) = φ∗ (X(φ−1 (p))),
where φ∗ : T (M1 ) → T (M2 ) denotes the induced map of φ. That is,
∀X ∈ X 0 (M1 ), φ∗ (X) = φ∗ ◦ (X ◦ φ−1 ).
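Example: A one-dimensional sketch shows how Definition 30.3.14 moves a vector field. It is assumed, for illustration only, that M1 = M2 = IR with the identity chart, that φ(x) = 2x, and that X(x) = t_{x, x^2, id}.

```latex
% Assumed: M_1 = M_2 = R, identity chart, phi(x) = 2x, X(x) = t_{x, x^2, id}.
\[
  \phi^{-1}(y) = \tfrac{y}{2}, \qquad
  X\bigl(\phi^{-1}(y)\bigr) = t_{y/2,\ y^2/4,\ \mathrm{id}},
\]
\[
  \phi_*(X)(y)
  = \phi_*\bigl(X(\phi^{-1}(y))\bigr)
  = t_{\phi(y/2),\ 2 \cdot y^2/4,\ \mathrm{id}}
  = t_{y,\ y^2/2,\ \mathrm{id}} .
\]
```

The inverse of φ is needed to find the point of M1 whose vector is carried to the point y of M2, which is why the definition is stated for diffeomorphisms.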
[ The notation $J^\phi_{\psi_1\psi_2}$ will be used for the Jacobian matrix of a map from one manifold to another, where ψ1
and ψ2 are charts for the two different manifolds. The notation Zβα will be used for a change of coordinates
at a single point p in a single manifold M , where ψα , ψβ ∈ atlasp (M ). ]
30.3.15 Definition: Let φ : M1 → M2 be a C 1 map between C 1 manifolds M1 and M2 with n1 = dim(M1 )
and n2 = dim(M2 ). Then the Jacobian matrix of φ at p ∈ M1 with respect to charts ψ1 ∈ atlasp (M1 ) and
ψ2 ∈ atlasφ(p) (M2 ) is the matrix $J^\phi_{\psi_1\psi_2}(p) \in M_{n_2 n_1}(\mathbb{R})$ defined by
\[
J^\phi_{\psi_1\psi_2}(p)^i{}_j
= \Bigl(\frac{\partial}{\partial x^j}\bigl(\psi_2 \circ \phi \circ \psi_1^{-1}(x)\bigr)^i\Bigr)\Big|_{x=\psi_1(p)} .
\]
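Example: As a hedged illustration of Definition 30.3.15, assume that M1 is the open set (0, ∞) × IR in IR2, that M2 = IR2, that both carry identity charts, and that φ(r, θ) = (r cos θ, r sin θ). These choices are assumptions made for this sketch.

```latex
% Assumed: M_1 = (0,infty) x R, M_2 = R^2, identity charts,
% phi(r,theta) = (r cos(theta), r sin(theta)).
\[
  J^\phi_{\mathrm{id},\mathrm{id}}(r,\theta)
  = \begin{pmatrix}
      \cos\theta & -r\sin\theta \\
      \sin\theta & \phantom{-}r\cos\theta
    \end{pmatrix},
  \qquad
  \det J^\phi_{\mathrm{id},\mathrm{id}}(r,\theta) = r .
\]
```

The row index i labels components in the target chart and the column index j labels coordinates in the source chart, so the matrix has n2 rows and n1 columns.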
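30.3.16 Theorem: Let M, M1 , M2 be C 1 manifolds. Let p ∈ M . Let $\phi_1 \in \mathring{C}^1_p(M, M_1)$ and
$\phi_2 \in \mathring{C}^1_p(M, M_2)$. Let ψ ∈ atlasp (M ), ψ1 ∈ atlasφ1 (p) (M1 ) and ψ2 ∈ atlasφ2 (p) (M2 ). Then
\[
J^{\phi_1 \times \phi_2}_{\psi,\psi_1 \oplus \psi_2}(p)^i{}_j
= \Bigl(\frac{\partial}{\partial x^j}\bigl(((\psi_1 \circ \phi_1) \oplus (\psi_2 \circ \phi_2)) \circ \psi^{-1}(x)\bigr)^i\Bigr)\Big|_{x=\psi(p)}
= \operatorname{concat}\bigl(J^{\phi_1}_{\psi,\psi_1}(p)^{\cdot}{}_j,\ J^{\phi_2}_{\psi,\psi_2}(p)^{\cdot}{}_j\bigr)^i ,
\]
where the Jacobian matrices $J^{\phi_1}_{\psi,\psi_1}(p) \in M_{n_1 n}(\mathbb{R})$ and $J^{\phi_2}_{\psi,\psi_2}(p) \in M_{n_2 n}(\mathbb{R})$ are as in Definition 30.3.15,
where n = dim(M ), n1 = dim(M1 ) and n2 = dim(M2 ).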
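Example: Theorem 30.3.16 can be illustrated by a curve with two component maps. It is assumed here that M = IR, M1 = IR2 and M2 = IR, all with identity charts, that φ1(t) = (cos t, sin t) and φ2(t) = t^2, and that φ1 × φ2 means the map t ↦ (φ1(t), φ2(t)). This reading of the product notation is an assumption of the sketch.

```latex
% Assumed: M = R, M_1 = R^2, M_2 = R, identity charts,
% phi_1(t) = (cos t, sin t), phi_2(t) = t^2.
\[
  J^{\phi_1}(t) = \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix},
  \qquad
  J^{\phi_2}(t) = \begin{pmatrix} 2t \end{pmatrix},
  \qquad
  J^{\phi_1 \times \phi_2}(t)
  = \begin{pmatrix} -\sin t \\ \cos t \\ 2t \end{pmatrix}.
\]
```

Each column of the combined Jacobian is the concatenation of the corresponding columns of the two component Jacobians. Here there is one column because n = 1.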
[ Give an example to clarify Theorem 30.3.16. ]

[ See Definition 7.7.6 for the concatenation operator. ]
30.3.17 Remark: By putting L = ∂p,v1 ,ψ1 for tp,v1 ,ψ1 ∈ Tp (M1 ) and f = ψ2k ◦ φ in Definition 30.3.1,
the operator form of the differential is constructed in Definition 30.3.18. These are equivalent definitions.
Definition 30.3.18 is simpler whereas Definition 30.3.1 is more useful for computation. The operator definition
provides a convenient shorthand and mnemonic for the component version of the differential, which is defined
so as to be consistent with the operator version.
30.3.18 Definition: The differential (for tangent operators) at a point p ∈ M1 of a map φ ∈ C 1 (M1 , M2 ),
where M1 and M2 are C 1 manifolds, is the linear map (dφ)p : T̊p (M1 ) → T̊φ(p) (M2 ) defined by
\[
\forall L \in \mathring{T}_p(M_1),\ \forall f \in C^1(M_2), \qquad
\bigl((d\phi)_p(L)\bigr)(f) = L(f \circ \phi).
\]

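Figure 30.3.1 The differential of a map for first-order operators (diagram: (dφ)p maps L ∈ T̊p (M1 ) to (dφ)p (L) ∈ T̊φ(p) (M2 ); projections π1 and π2 onto M1 and M2 ; φ maps p ∈ M1 to φ(p) ∈ M2 ; charts ψ1 and ψ2 map into IRn1 and IRn2 ; f ◦ φ and f map into IR)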
30.3.19 Remark: Figure 30.3.1 shows spaces and maps relevant to Definition 30.3.18, and the projection
maps πk : T̊ (Mk ) → Mk for k = 1, 2.
Theorem 30.3.20 gives the relation between the component version of the map differential in Definition 30.3.1
and the operator version in Definition 30.3.18. The same notation (dφ)p is used for both versions, but this
should not cause confusion.
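30.3.20 Theorem: Definitions 30.3.1 and 30.3.18 are consistent with each other. That is,
$D_{(d\phi)_p(V)} = (d\phi)_p(D_V)$ for all V ∈ Tp (M1 ) and p ∈ M1 .
Proof: Let V = tp,v1 ,ψ1 and f ∈ C 1 (M2 ). Then DV = ∂p,v1 ,ψ1 and so by Definition 30.3.18,
\[
\begin{aligned}
(d\phi)_p(D_V)(f)
&= \partial_{p,v_1,\psi_1}(f \circ \phi) \\
&= \sum_{i=1}^{n_1} v_1^i\, \partial_{x^i}\bigl(f \circ \phi \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(p)} \\
&= \sum_{i=1}^{n_1} v_1^i\, \partial_{x^i}\bigl(f \circ \psi_2^{-1} \circ \psi_2 \circ \phi \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(p)} \\
&= \sum_{j=1}^{n_2} \partial_{y^j}\bigl(f \circ \psi_2^{-1}(y)\bigr)\Big|_{y=\psi_2(\phi(p))}
   \sum_{i=1}^{n_1} v_1^i\, \partial_{x^i}\bigl(\psi_2^j \circ \phi \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(p)} \\
&= \partial_{\phi(p),v_2,\psi_2}(f) = D_{(d\phi)_p(V)}(f),
\end{aligned}
\]
where v2 is as in Definition 30.3.1. Since this holds for all f ∈ C 1 (M2 ), the result follows.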
30.3.21 Remark: Definition 30.3.18 maps the “ubiquitous zero vector” of M1 to the corresponding vector
in M2 . The tangent operator L ∈ T̊p (M1 ) such that L : f ↦ 0 for all f ∈ C 1 (M1 ) is the same map
independent of p ∈ M1 , which justifies the name “ubiquitous zero vector”. Luckily, this vector is mapped to
the corresponding zero vector for M2 , no matter which point p it is attached to. Therefore the union of the
differential maps (dφ)p : T̊ (M1 ) → T̊ (M2 ) is a well-defined function $\phi_* = \bigcup_{p \in M_1} (d\phi)_p$. This is the induced
map given in Definition 30.3.23.
30.3.22 Definition: The differential (for tangent operators) of a map φ ∈ C 1 (M1 , M2 ) for C 1 manifolds
M1 and M2 is the map dφ : M1 → T̊ (M1 , M2 ) defined by
\[
\forall p \in M_1,\ \forall L \in \mathring{T}_p(M_1),\ \forall f \in C^1(M_2), \qquad
\bigl((d\phi)(p)(L)\bigr)(f) = L(f \circ \phi).
\]
30.3.23 Definition: The induced map (for tangent operators) of a map φ ∈ C 1 (M1 , M2 ) for C 1 manifolds
M1 and M2 is the linear map φ∗ : T̊ (M1 ) → T̊ (M2 ) defined by $\phi_* = \bigcup_{p \in M_1} (d\phi)_p$. That is,
\[
\forall L \in \mathring{T}(M_1),\ \forall f \in C^1(M_2), \qquad
\bigl(\phi_*(L)\bigr)(f) = L(f \circ \phi).
\]

30.3.24 Definition: The induced map (for tagged tangent operators) of a map φ ∈ C 1 (M1 , M2 ) for C 1
manifolds M1 and M2 is the map φ∗ : T̂ (M1 ) → T̂ (M2 ) defined by
\[
\forall (p, L) \in \hat{T}(M_1), \qquad
\phi_*\bigl((p, L)\bigr) = \bigl(\phi(p),\ (d\phi)_p(L)\bigr),
\]
where (dφ)p is the operator version of the differential given in Definition 30.3.18.
30.3.25 Remark: In the study of partial differential equations, a typical equation would be
$a^{ij}(x)u_{ij}(x) + b^i(x)u_i(x) + c(x)u(x) = f(x)$ for u ∈ C 2 (Ω), c, f ∈ C 0 (Ω), b ∈ C 0 (Ω, IRn ) and a ∈ C 0 (Ω, Sym(n, IR))
for some open subset Ω of IRn . In a differential geometry context, such equations must be given a
chart-independent meaning. A term such as $b^i(x)u_i(x)$ may be replaced by ∂p,b(p),ψ (u) for functions u ∈ C 2 (M )
and b : M → IRn . It is of interest to know how such expressions are transformed under differentiable maps
between manifolds.
Theorem 30.3.26 shows the correspondence between first order partial differential equations in diffeomorphic
open subsets of manifolds. The close relation between partial differential equations and tangent vectors
reveals the true analytical nature of differential geometry.
[ Maybe should do a version of Theorem 30.3.26 for second-order operators in Section 31.2. Also give the
transformations for chart transitions. Should check Theorem 30.3.26. See Section 19.5. ]
30.3.26 Theorem: Let M1 , M2 be C 1 n-dimensional manifolds and let φ : Ω1 → Ω2 be a diffeomorphism
between open sets Ω1 ⊆ M1 and Ω2 ⊆ M2 . Let b ∈ IRn and u ∈ C 1 (Ω1 , IR) and c, f ∈ C 0 (Ω1 , IR). If the
equation
\[ \partial_{p,b,\psi_1}(u) + c(p)u(p) = f(p) \]
is satisfied for some p ∈ Ω1 and ψ1 ∈ atlasp (M1 ), then the equation
\[ \partial_{q,\tilde{b},\psi_2}(\tilde{u}) + \tilde{c}(q)\tilde{u}(q) = \tilde{f}(q) \]
is satisfied for any ψ2 ∈ atlasq (M2 ), where q = φ(p), $\tilde{c} = c \circ \phi^{-1}$, $\tilde{f} = f \circ \phi^{-1}$, $\tilde{u} = u \circ \phi^{-1}$, and
\[ \tilde{b}^i = b^j\, \partial_j\bigl(\psi_2 \circ \phi \circ \psi_1^{-1}\bigr)^i\Big|_{\psi_1(p)} \]
for i = 1 . . . n.
Proof: All terms in the two equations are equal. The equality of the first-order terms is a simple conse-
quence of the definition of the differential of the map φ for operators.
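Example: Theorem 30.3.26 can be checked by hand in one dimension. The following sketch assumes that M1 = M2 = IR with identity charts, that Ω1 = Ω2 = IR, and that φ(x) = 2x. These choices are made only for illustration.

```latex
% Assumed: M_1 = M_2 = R, identity charts, phi(x) = 2x, so q = 2p.
\[
  \tilde{u}(y) = u(y/2), \qquad
  \tilde{u}'(q) = \tfrac12\,u'(p), \qquad
  \tilde{b} = b\,\phi'(p) = 2b,
\]
\[
  \partial_{q,\tilde{b},\mathrm{id}}(\tilde{u}) + \tilde{c}(q)\,\tilde{u}(q)
  = 2b \cdot \tfrac12\,u'(p) + c(p)\,u(p)
  = b\,u'(p) + c(p)\,u(p)
  = f(p) = \tilde{f}(q).
\]
```

The factor φ′(p) in the transformed coefficient cancels the factor 1/φ′(p) produced by differentiating u ◦ φ^{-1}, which is the one-dimensional form of the general statement.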
30.4. The differential of a curve

The differential of a curve is the same thing as the tangent vector or velocity vector of the curve.
Differentials of curves are contravariant vectors because the range of a curve is the manifold itself. This may
be contrasted with differentials of real-valued functions, which are covector fields because the manifold is the
domain of the map. The differential of a map between two manifolds is covariant with respect to the map’s
domain and contravariant with respect to the map’s range.
Differentiability of curves in a differentiable manifold is defined in Section 26.7. Differentiable vector fields
along curves are defined in Section 28.8.
30.4.1 Definition: The tangent vector field of a C 1 open curve γ : I → M in a C 1 manifold M −< (M, AM )
is the map γ′ : I → T (M ) defined by
\[
\forall t \in I, \qquad
\gamma'(t)
= \Bigl[\Bigl(\gamma(t),\ \Bigl(\frac{d}{du}\,\psi^i(\gamma(u))\Big|_{u=t}\Bigr)_{i=1}^n,\ \psi\Bigr)\Bigr]
= t_{\gamma(t),\,\partial_t(\psi \circ \gamma(t)),\,\psi}
\]
for any chart ψ ∈ AM such that γ(t) ∈ Dom(ψ).

30.4.2 Remark: The fact that γ′(t) in Definition 30.4.1 is a well-defined tangent vector is easily verified
in relation to Definition 27.3.3. Let ψ1 , ψ2 ∈ atlasγ(t) (M ). Then
\[
\begin{aligned}
\bigl(\partial_t(\psi_2 \circ \gamma(t))\bigr)^i
&= \partial_t\bigl(\psi_2^i \circ \psi_1^{-1} \circ \psi_1 \circ \gamma(t)\bigr) \\
&= \sum_{j=1}^n \partial_{x^j}\bigl(\psi_2^i \circ \psi_1^{-1}(x)\bigr)\Big|_{x=\psi_1(\gamma(t))}\,
   \bigl(\partial_t(\psi_1 \circ \gamma(t))\bigr)^j ,
\end{aligned}
\]
which verifies equation (27.3.1). Since the tangent vector field is tagged by the curve parameter t ∈ I, there
are no ambiguities at self-intersections of the curve. However, γ′(t) is customarily said to be “the tangent
vector at γ(t)”, which is ambiguous if the curve parameter is discarded.
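Example: The chart independence of the tangent vector of a curve can be checked on a circle. It is assumed here, for illustration only, that M = IR2 \ {0}, that γ(t) = (cos t, sin t) for t in (−π, π), that ψ1 is the identity (Cartesian) chart, and that ψ2 = (r, θ) is a polar chart on the plane with the non-positive x-axis removed.

```latex
% Assumed: gamma(t) = (cos t, sin t), psi_1 = Cartesian chart, psi_2 = polar chart.
\[
  \partial_t(\psi_1 \circ \gamma(t)) = (-\sin t,\ \cos t),
  \qquad
  \psi_2 \circ \gamma(t) = (1, t), \quad
  \partial_t(\psi_2 \circ \gamma(t)) = (0,\ 1),
\]
\[
  \Bigl(\partial_{x^j}\bigl(\psi_2^i \circ \psi_1^{-1}\bigr)\Bigr)\Big|_{(\cos t,\,\sin t)}
  = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix},
  \qquad
  \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}
  \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix}
  = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]
```

The two component tuples differ, but they are related by the chart transition matrix, so they represent the same tangent vector.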
30.4.3 Remark: The tangent vector γ′(t) in Definition 30.4.1 is really only a substitute for a limit of
the form limh→0 (γ(t + h) − γ(t))/h. If M has no linear space (or affine space) structure, then the sum and
product in the expression (γ(t + h) − γ(t))/h simply do not make sense. This is precisely why tangent vectors
for manifolds were invented. They provide a substitute for a real differential of a curve by first overlaying a
coordinate chart and then differentiating the coordinates instead of the curve itself. Then to get rid of the
arbitrariness of this procedure, equivalence classes of these derivatives are used.
The situation becomes more interesting when the manifold M does have a linear space (or affine space)
structure. In this case, the expression (γ(t + h) − γ(t))/h is well-defined, and if the limit exists, it would be
interesting to compare the result with the corresponding tangent vector to the manifold. (It is assumed here
that all finite-dimensional linear spaces are given the standard topology which makes them homeomorphic
under a linear map to a space IRn , which is topologically complete.) The results should be equivalent if the
coordinate charts match up in a differentiable manner with the linear space structure.
Suppose that M is a linear space with the standard topology, and that ψ ∈ atlas(M ) is C 1 with respect to
the linear space structure of M in the sense that for all p, v ∈ M , the derivative
$b_{p,v} = \partial_t(\psi(p + tv))\big|_{t=0} \in \mathbb{R}^n$ is well-defined and continuous with respect to v. Then define
$V_{p,v} = t_{p,b_{p,v},\psi} \in T_p(M)$. The map
L : M → Tp (M ) defined by L : v ↦ Vp,v is a linear isomorphism. Therefore the inverse linear map
8p = L−1 : Tp (M ) → M is well-defined. The map 8p might be referred to as the “drop” of Tp (M ) onto M
(analogous to “lift” functions). A special case of this kind of drop-function is the canonical identification of
Tp (IRn ) with IRn for all p ∈ IRn . Such drop-functions arise in the calculation of covariant derivatives with
respect to affine connections in Section 36.5. (See Definition 27.11.6 for drop functions.)
[ The following paragraph needs to be sorted out a bit. ]
Curves and families of curves are maps from spaces IRm to manifolds. Since the domains of such maps may
be regarded as either a manifold or a linear space, there are two possible definitions for a differential of the
map. For a curve, the value of γ′(t) is in a tangent space Tγ(t) (M ) in the manifold case, and in IRm in the
linear space case. A similar issue arises when the range of the map is IRm . The differential of a real-valued
function may be regarded as valued in some dual space Tp∗ (M ) or else a linear map between Tp (M ) and a
tangent space of IR.
30.4.4 Remark: It is interesting to compare Definition 30.4.1 with the corresponding definitions for the
differential of a differentiable map in Section 30.3. The set IR may be regarded as a differentiable manifold
(IR, AIR ) with AIR = {ψ0 }, where ψ0 = idIR is the identity chart on IR. Then the differential dγ : IR →
T (IR, M ) and induced map γ∗ : T (IR) → T (M ) are defined as follows.
\[
\begin{aligned}
&\forall t \in \mathbb{R},\ \forall \alpha \in \mathbb{R}, & (d\gamma)_t\bigl(t_{t,\alpha,\psi_0}\bigr) &= \alpha\gamma'(t) \\
&\forall t \in \mathbb{R},\ \forall \alpha \in \mathbb{R}, & \gamma_*\bigl(t_{t,\alpha,\psi_0}\bigr) &= \alpha\gamma'(t),
\end{aligned}
\]
where $\alpha\gamma'(t) = t_{\gamma(t),\,\alpha\partial_t(\psi \circ \gamma(t)),\,\psi}$ for ψ ∈ AM . Thus
$(d\gamma)_t = \gamma_*\big|_{T_t(\mathbb{R})} \in \mathrm{Hom}(T_t(\mathbb{R}), T_{\gamma(t)}(M))$ for t ∈ IR.
Conversely, the value γ′(t) can be expressed in terms of dγ and γ∗ as
\[
\forall t \in \mathbb{R}, \qquad
\gamma'(t) = (d\gamma)(t)\bigl(t_{t,1,\psi_0}\bigr) = \gamma_*\bigl(t_{t,1,\psi_0}\bigr).
\]
It is clear that γ′ contains all of the information in the maps dγ and γ∗ . Since IR has such an obvious choice
of chart and the pointwise linear spaces of IR are 1-dimensional, it seems quite unnecessary to give the full
differentials. It is entirely sensible to define γ′(t) as $\gamma_*\bigl(t_{t,1,\psi_0}\bigr)$, since the number 1 and chart ψ0 are implicit.

30.4.5 Notation: dγ denotes the tangent vector field $\gamma'$ in Definition 30.4.1.
30.4.6 Remark: In contradiction to Remark 30.4.4, Notation 30.4.5 defines the differential dγ of a $C^1$
curve to be the same as the tangent vector field of the curve. This is based on the identification of the
tangent space at a point of a real interval with the set $\mathbb{R}$.
30.4.7 Remark: The notation $\gamma'$ is used for curves instead of $\dot\gamma$ because the dot is more difficult to see.
An argument can be made for using the dot-notation $\dot\gamma$ for functions $\gamma : \mathbb{R} \to M$ from the real numbers to a
manifold M, and the dash-notation $f'$ for functions $f : M \to \mathbb{R}$ from a manifold to the real numbers.
Isaac Newton used the dot-notation specifically for differentiation with respect to a time parameter as opposed
to a space parameter. However, the modern habit of unambiguously identifying function domains and ranges
makes such notational distinctions unnecessary. In Newton’s work, symbols referred to numbers rather than
functions. So notations such as $y'$ and $\dot y$ communicated to the reader which function to differentiate, in this
case $x \mapsto y$ or $t \mapsto y$ respectively. (See Remark 18.2.11 for more on the Newtonian dot-notation.)
30.4.8 Definition: The tangent operator of a $C^1$ open curve γ : I → M in a $C^1$ manifold M for a parameter
t ∈ I is the tangent operator $D_{\gamma'(t)} \in \mathring{T}_{\gamma(t)}(M)$, where $\gamma'(t)$ is the tangent vector in Definition 30.4.1.
30.4.9 Theorem: If r ≥ 1 and γ is a $C^r$ non-self-intersecting curve, then the tangent vector field $\gamma'$ is a
$C^{r-1}$ vector field on the image of the restriction of γ to the interior of its domain.
[ Theorem 30.4.9 needs to be tidied up by extending the tangent vector field to the whole of the domain of γ. ]
30.4.10 Definition: A one-parameter family of curves induced by a vector field on a curve is. . .
[ Cover definition, existence and uniqueness of integral curves of vector field X with $\gamma'(t) = X(\gamma(t))$. ]
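30.4.11 Definition: A partial tangent vector field of a $C^1$ family of curves $\gamma : \mathbb{R}^m \to M$ with $m \in \mathbb{Z}^+$ in
a $C^1$ manifold M is a vector field $\partial_k\gamma : \mathbb{R}^m \to T(M)$ defined for k = 1 . . . m, $t \in \mathbb{R}^m$ and $\psi \in \mathrm{atlas}_{\gamma(t)}(M)$ by
$$\partial_k\gamma(t) = t_{\gamma(t),\partial_{t^k}(\psi\circ\gamma(t)),\psi}.$$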
[ Should have a diagram here showing transversal vector fields. ]
30.4.12 Remark: The differential dγ for a $C^1$ family of curves $\gamma : \mathbb{R}^m \to M$ is not quite the same thing
as the sequence $(\partial_k\gamma)_{k=1}^m$, but it does contain essentially the same information.
30.4.13 Remark: The differentials in Definition 30.4.11 may be thought of as “transversal vector fields”.
For example, when m = 2, the vector $\gamma_2(u_1, u_2)$ is transverse to the curve $t_1 \mapsto \gamma(t_1, u_2)$ at the point $\gamma(u_1, u_2)$
for $u = (u_1, u_2) \in \mathbb{R}^2$.
[ Present here the almost-everywhere differentials of rectifiable curves and families of curves. ]
30.5. One-parameter transformation families and vector fields

[ Probably the one-parameter and multi-parameter groups of diffeomorphisms should be defined near Section
26.7 on differentiable curves. Then the generated vector fields should be presented in this section. ]
Vector fields generated by families of diffeomorphisms are important because these are exactly what is
generated on fibre spaces by a connection for motion along a curve in a base space. Parallelism is always
defined as a bijection between fibre sets at points of a base space, but in the case of a connection (differential
parallelism), the fibre sets are differentiable manifolds, and the pathwise parallelism for a given curve in
the base space generates a one-parameter family of diffeomorphisms of the fibre space (through the charts).
Generally diffeomorphisms of differentiable manifolds are not of great importance or interest, but in the case
of fibre spaces, they are the essence of parallelism and connections. The Poisson bracket of the vector fields
generated by base space curves corresponds to curvature. The curvature of a connection may be defined in
general, even if the structure group is not a finite-dimensional manifold, as the Poisson bracket of the fields
generated by infinitesimal motions in the base space.

[ Should present here especially the vector fields generated by families of diffeomorphisms on fibre spaces.
This is also covered in the differentiable groups chapter, but the group doesn’t have to be a manifold. ]
[ See Crampin/Pirani [12], page 251, for one-parameter transformation groups and vector fields. ]
In this section, the Poisson bracket is given an alternative definition via local one-parameter groups of
transformations. A canonical example of this sort of correspondence between vector fields and one-parameter
transformation groups is the Taylor series expansion f (x + ε) = exp(ε(∂/∂x))(f )(x) for analytic functions f .
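The Taylor series correspondence can be checked concretely for polynomials, where the exponential series terminates. The following is a minimal sketch only: the coefficient-list representation and the helper names `derivative`, `evaluate` and `exp_shift` are illustrative assumptions, not notation from this book.

```python
from math import factorial

def derivative(coeffs):
    """Derivative of the polynomial c0 + c1 x + c2 x^2 + ... given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def exp_shift(coeffs, eps, x):
    """Sum over k of eps^k / k! times the k-th derivative of f at x.

    For a polynomial the sum is finite, because repeated differentiation
    eventually yields the empty coefficient list.
    """
    total, current, k = 0.0, list(coeffs), 0
    while current:
        total += eps ** k / factorial(k) * evaluate(current, x)
        current = derivative(current)
        k += 1
    return total

# Hypothetical example: f(x) = 1 - 2x + 4x^3, shifted by eps = 0.25 at x = 0.5.
f_coeffs = [1.0, -2.0, 0.0, 4.0]
shifted = exp_shift(f_coeffs, 0.25, 0.5)
direct = evaluate(f_coeffs, 0.75)
print(shifted, direct)
```

Here the operator exp(ε(∂/∂x)) is applied term by term, so the two printed values should agree up to rounding. For non-polynomial analytic functions the series would have to be truncated.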
[ The symbol ∂/∂x is a sort of slang term, which should be defined carefully in general somewhere. Should
also carefully define exp(ε(∂/∂x)). ]
30.5.1 Remark: It follows (hopefully) from Definition 26.8.2 (i) that $\phi_t$ is a diffeomorphism from Ω
to $\phi_t(\Omega)$.
[ The following definition is actually only a motivational definition for the material in the following comment. ]
30.5.2 Definition: The vector field generated by a $C^1$ one-parameter group φ of transformations of a $C^1$
manifold M is the vector field $X_\phi \in X^0(M)$ defined by
$$\forall f \in C^1(M),\ \forall p \in M,\quad X_\phi(f)(p) = \lim_{t\to 0} t^{-1}\big(f(\phi(t)(p)) - f(p)\big).$$
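As an illustration of Definition 30.5.2, the limit can be approximated by a difference quotient for a concrete one-parameter group. This is a rough numerical sketch under stated assumptions: M is taken to be the plane with the identity chart, φ is the rotation group, and the test function `f` and step size are arbitrary choices made here, not part of the definition.

```python
import math

def phi(t, p):
    """Rotation of the plane by angle t; t -> phi(t, .) is a one-parameter group."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def generated_field(f, p, t=1e-6):
    """Difference quotient t^{-1} (f(phi(t)(p)) - f(p)), approximating X_phi(f)(p)."""
    return (f(phi(t, p)) - f(p)) / t

def f(p):
    """Hypothetical test function f(x, y) = x^2 + 3y."""
    x, y = p
    return x * x + 3.0 * y

p = (1.0, 2.0)
approx = generated_field(f, p)
# For rotations, the generated field is -y d/dx + x d/dy, so X_phi(f)(x, y) = -2xy + 3x.
exact = -2.0 * p[0] * p[1] + 3.0 * p[0]
print(approx, exact)
```

The difference quotient is one-sided, so its error is of first order in the step t. It only approximates the limit in the definition.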
[ This may be equivalent to the image under the differential of φ of the vector field ∂/∂t on $\mathbb{R} \times M$. Of
course, this operator should be defined somewhere. Then $d\phi : T(\mathbb{R} \times M) \to T(M)$ maps $(\partial/\partial t)_{(t,p)}$ for
$(t, p) \in \mathbb{R} \times M$ to $(d\phi)(\partial/\partial t)_{(t,p)} \in T_{\phi(t,p)}(M)$.
$$(d\phi)\Big(\frac{\partial}{\partial t}\Big) : f \mapsto \lim_{t\to 0} t^{-1}\big(f(\phi(t)(p)) - f(p)\big).$$
That is,
$$X(\phi)(f)(p) = (d\phi)\Big(\frac{\partial}{\partial t}\Big)(f)(p).$$
That is,
$$X(\phi) = (d\phi)\Big(\frac{\partial}{\partial t}\Big).$$
That is, the vector field ∂/∂t on $\mathbb{R} \times M$ is mapped to X(φ) on M by dφ. In reverse, φ is generated by X(φ).
For each X ∈ T(M), want existence and uniqueness of $\phi : \mathbb{R} \times M \to M$ such that (dφ)(∂/∂t) = X. Notation
exp(tX) = φ(t).
The vector field generated by a one-parameter group $(\phi_t)_{t\in\mathbb{R}}$ is in a sense the derivative of the group at t = 0. ]
[ Should the expression X(φ) in the above comment be replaced by $X_\phi$? ]
30.5.3 Theorem: The function $X_\phi$ in Definition 30.5.2 is a vector field for all one-parameter groups φ of
transformations of a $C^1$ manifold M.
[ Define a one-parameter group of transformations of a manifold generated by X, and a local one-parameter

group of local transformations around p generated by X. ]
[ Should mention the EDM claim that the local group generated by a vector field always exists. ]
[ Will define d so that $((d\phi_t)Y)(p) = (d\phi_t)_{\phi_t^{-1}(p)}\, Y(\phi_t^{-1}(p))$ for an extension of d from $C^\infty(M)$ to $X^\infty(M)$. ]
[ Must define vector fields for multi-parameter groups of transformations. This has some relevance to curvature
and connections and fibre spaces. ]

Chapter 31
Higher-order differentials
31.1 Higher-order differentials of a real-valued function
31.2 Higher-order differentials of a differentiable map
31.3 Higher-order differentials of a curve
31.4 Higher-order differentials of curve families
31.5 Differentials of real-valued functions for higher-order operators
31.6 Hessian operators at critical points
31.7 Differentials of differentiable maps for higher-order operators
31.8 Differentials of curves for higher-order operators
This chapter presents two different topics: higher-order differentials and differentials for higher-order oper-
ators. These are distinct topics. For example, a second-order differential is the differential of a differential,
whereas the differential for a second-order operator is the “push-forth” of the operator. These topics are
presented together to (hopefully) help to clarify the difference between them.
31.1. Higher-order differentials of a real-valued function
This section is about differentials of differentials $(d^2f)_p$ of real-valued functions f, and higher-order versions
of this. By Definition 30.2.11, $df \in X(T^*(M))$, and by Theorem 30.2.13, $df \in X^1(T^*(M))$ if $f \in C^2(M)$.
Thus df is a $C^1$ map from M to $T^*(M)$, which both have a $C^1$ manifold structure if M is a $C^2$ manifold.
Therefore $df : M \to T^*(M)$ is a $C^1$ map which can be differentiated. Its differential must be a map of the
form $d^2f : T(M) \to T(T^*(M))$ such that $(d^2f)_p : T_p(M) \to T_{(df)_p}(T^*(M))$.
31.2. Higher-order differentials of a differentiable map
[ See Malliavin [36], proposition I.7.3 for higher order derivatives. ]

There are at least two classes of differential maps which may be described as second-order differentials
between manifolds $M_1$ and $M_2$. One is the differential of a differential, which is a map $d^2\phi$ from $T(T(M_1))$
to $T(T(M_2))$; the other is a differential map $d^{[2]}\phi$ from $T^{[2]}(M_1)$ to $T^{[2]}(M_2)$, where $T^{[2]}(M)$ denotes the
space of second-order partial differential operators on a $C^2$ manifold M. Differentials $d^{[k]}\phi$ for higher-order
operators are described in Section 31.5. This section is about differentials of differentials $d^2\phi$ etc. [ $d^2\phi$ is
really $\phi_{**}$. ]
The second-order differential of a diffeomorphism is of interest because an affine connection on a manifold
is a special case of this. Parallelism on a manifold is then generated by integrating such a differential. [ This
gives some sort of motivation for connections? ]
[ φ maps M1 to M2 . Then φ∗ : T (M1 ) → T (M2 ). Must calculate coefficients of φ∗∗ ? ]


31.3. Higher-order differentials of a curve

Just as the first-order differential of a curve (Section 30.4) represents the velocity of a trajectory (if the
parameter is interpreted as time), so the second-order differential represents the acceleration. Usually the
second and higher order differentials are presented in the context of an affine connection, but in this sec-
tion, the differentials are more abstract. In essence, higher order differentials may be specified in terms of
their effect on real-valued functions, just as first-order differentials of curves are. This guarantees chart-
independence. In the context of physics, chart-independence may be interpreted as observer-independence.
The facts should be objective. That is, facts about objects should be independent of the coordinate charts
used to describe them.
This section defines connection-free higher-order differentials of curves and verifies that they are chart-
independent.
[ The differential in Definition 31.3.1 is really γ∗∗ with some sort of simplification of T (T (IR))? ]
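31.3.1 Definition: The second-order tangent vector field on a $C^2$ curve $\gamma : \mathbb{R} \to M$ in a $C^2$ manifold M
with n = dim(M) is the map $\gamma'' : \mathbb{R} \to T(T(M))$ defined for $\psi \in \mathrm{atlas}_{\gamma(t)}(M)$ by
$$\forall t \in \mathbb{R},\quad \gamma''(t) = t_{\gamma'(t),w_\psi(t),\tilde\psi},$$
where $w_\psi : \mathbb{R} \to \mathbb{R}^{2n}$ is defined by
$$\forall t \in \mathbb{R},\quad w_\psi(t)^i = \begin{cases} \partial_t(\psi^i\circ\gamma(t)) & \text{for } i = 1 \ldots n \\ \partial_t^2(\psi^{i-n}\circ\gamma(t)) & \text{for } i = n+1, \ldots 2n \end{cases}$$
and $\gamma'$ is the first-order tangent vector field in Definition 30.4.1.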
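31.3.2 Remark: The expression for $\gamma''(t)$ in Definition 31.3.1 may be written out more fully as follows:
$$\forall t \in \mathbb{R},\quad \gamma''(t) = t_{\left(t_{\gamma(t),\partial_t(\psi\circ\gamma(t)),\psi}\right),\ \left(\partial_t(\psi\circ\gamma(t)),\,\partial_t^2(\psi\circ\gamma(t))\right),\ \tilde\psi},$$
where $(\partial_t(\psi\circ\gamma(t)), \partial_t^2(\psi\circ\gamma(t))) \in \mathbb{R}^{2n}$ represents the concatenation of the vectors $\partial_t(\psi\circ\gamma(t)) \in \mathbb{R}^n$
and $\partial_t^2(\psi\circ\gamma(t)) \in \mathbb{R}^n$. Definition 31.3.1 is constructed from the double application of Definition 30.4.1.
When the coordinates $\big(\psi(\gamma(t)), \partial_t(\psi\circ\gamma(t))\big) \in \mathbb{R}^{2n}$ of $\gamma'(t)$ in the total tangent space T(M) are differentiated
with respect to t, the result is simply $\partial_t\big(\psi(\gamma(t)), \partial_t(\psi\circ\gamma(t))\big) = \big(\partial_t(\psi\circ\gamma(t)), \partial_t^2(\psi\circ\gamma(t))\big)$.
It is perhaps interesting to note that the standard coordinates in $\mathbb{R}^{4n}$ for $\gamma''(t)$ with respect to the chart ψ
may be written as the quadruple concatenation $\big(\psi(\gamma(t)), \partial_t(\psi\circ\gamma(t)), \partial_t(\psi\circ\gamma(t)), \partial_t^2(\psi\circ\gamma(t))\big)$. The first
derivatives appear twice in this coordinate vector.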
31.3.3 Theorem: The vector field $\gamma''$ in Definition 31.3.1 is chart-independent. In other words, for all
$t \in \mathbb{R}$ and $\psi_1, \psi_2 \in \mathrm{atlas}_{\gamma(t)}(M)$, $t_{\gamma'(t),w_{\psi_1}(t),\tilde\psi_1} = t_{\gamma'(t),w_{\psi_2}(t),\tilde\psi_2}$, where $\tilde\psi_\alpha$ denotes the chart for T(M)
corresponding to the chart $\psi_\alpha$ for M for α = 1, 2.
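Proof: By Theorem 27.10.9, $t_{\gamma'(t),w_{\psi_1}(t),\tilde\psi_1} = t_{\gamma'(t),w_{\psi_2}(t),\tilde\psi_2}$ if and only if equations (27.10.1) and (27.10.2)
are satisfied. Thus it must be shown that for all i = 1 . . . n,
$$w_{\psi_2}^i(t) = \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(\gamma(t))} w_{\psi_1}^j(t)$$
and
$$w_{\psi_2}^{n+i}(t) = \sum_{j,k=1}^n \partial_{x^j}\partial_{x^k}\big(\psi_2^i\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(\gamma(t))} v_{\psi_1}^k(t)\, w_{\psi_1}^j(t) + \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(\gamma(t))} w_{\psi_1}^{n+j}(t),$$
where $v_{\psi_1}(t) \in \mathbb{R}^n$ is the component vector for $\gamma'(t)$ defined by $\gamma'(t) = t_{\gamma(t),v_{\psi_1}(t),\psi_1}$. The first equation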

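follows as for the first-order tangent field from the calculation
$$\begin{aligned}
w_{\psi_2}^i(t) &= \partial_t(\psi_2^i\circ\gamma(t)) \\
&= \partial_t(\psi_2^i\circ\psi_1^{-1}\circ\psi_1\circ\gamma(t)) \\
&= \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(\gamma(t))} \partial_t(\psi_1^j\circ\gamma(t)) \\
&= \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(\gamma(t))} w_{\psi_1}^j(t).
\end{aligned}$$
The second equation follows similarly from the calculation
$$\begin{aligned}
w_{\psi_2}^{n+i}(t) &= \partial_t^2(\psi_2^i\circ\gamma(t)) \\
&= \partial_t^2(\psi_2^i\circ\psi_1^{-1}\circ\psi_1\circ\gamma(t)) \\
&= \sum_{j,k=1}^n \partial_{x^j}\partial_{x^k}\big(\psi_2^i\circ\psi_1^{-1}\big)\Big|_{x=\psi_1(\gamma(t))} \partial_t(\psi_1^k\circ\gamma(t))\,\partial_t(\psi_1^j\circ\gamma(t)) \\
&\qquad + \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i\circ\psi_1^{-1}\big)\Big|_{x=\psi_1(\gamma(t))} \partial_t^2(\psi_1^j\circ\gamma(t)) \\
&= \sum_{j,k=1}^n \partial_{x^j}\partial_{x^k}\big(\psi_2^i\circ\psi_1^{-1}\big)\Big|_{x=\psi_1(\gamma(t))} w_{\psi_1}^k(t)\, w_{\psi_1}^j(t) + \sum_{j=1}^n \partial_{x^j}\big(\psi_2^i\circ\psi_1^{-1}\big)\Big|_{x=\psi_1(\gamma(t))} w_{\psi_1}^{n+j}(t).
\end{aligned}$$
This is the correct answer because $w_{\psi_1}^k(t) = v_{\psi_1}^k(t)$ for k = 1 . . . n.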
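The second transformation rule in the proof of Theorem 31.3.3 can be spot-checked numerically. The sketch below assumes a concrete setting chosen only for illustration: M is an open subset of the plane, ψ1 is the Cartesian chart, ψ2 is the polar chart, and the curve is a hypothetical sample. The vertical components in chart ψ2 are computed once from the rule and once by direct finite differencing of ψ2 ◦ γ.

```python
import math

def curve(t):
    """Coordinates psi1(gamma(t)) of a sample curve in the Cartesian chart psi1."""
    return (1.0 + t, t * t)

def velocity(t):
    """First derivatives: the horizontal components w^j in chart psi1."""
    return (1.0, 2.0 * t)

def acceleration(t):
    """Second derivatives: the vertical components w^{n+j} in chart psi1."""
    return (0.0, 2.0)

def transition(x, y):
    """Transition map psi2 o psi1^{-1}: Cartesian to polar coordinates (r, theta)."""
    return (math.hypot(x, y), math.atan2(y, x))

def jacobian(x, y):
    """First partial derivatives of the transition map."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r], [-y / r2, x / r2]]

def hessians(x, y):
    """Second partial derivatives of r and theta with respect to (x, y)."""
    r2 = x * x + y * y
    r3 = r2 ** 1.5
    r4 = r2 * r2
    h_r = [[y * y / r3, -x * y / r3], [-x * y / r3, x * x / r3]]
    h_theta = [[2 * x * y / r4, (y * y - x * x) / r4],
               [(y * y - x * x) / r4, -2 * x * y / r4]]
    return [h_r, h_theta]

def vertical_by_rule(t):
    """Vertical components in chart psi2 obtained from chart-psi1 data via the rule."""
    x, y = curve(t)
    w, a = velocity(t), acceleration(t)
    jac, hess = jacobian(x, y), hessians(x, y)
    return [
        sum(hess[i][j][k] * w[k] * w[j] for j in range(2) for k in range(2))
        + sum(jac[i][j] * a[j] for j in range(2))
        for i in range(2)
    ]

def vertical_direct(t, h=1e-3):
    """Central second difference of psi2(gamma(t)) with respect to t."""
    fm, f0, fp = (transition(*curve(s)) for s in (t - h, t, t + h))
    return [(fp[i] - 2.0 * f0[i] + fm[i]) / (h * h) for i in range(2)]

t0 = 0.5
by_rule = vertical_by_rule(t0)
direct = vertical_direct(t0)
print(by_rule, direct)
```

The quadratic term involving the second derivatives of the transition map is what prevents the vertical components from transforming like an ordinary tangent vector when the velocity is non-zero.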
31.3.4 Remark: The fact that $w_{\psi_1}^k(t) = v_{\psi_1}^k(t)$ for k = 1 . . . n in the proof of Theorem 31.3.3 means that
the horizontal component of $\gamma''$ is the same as $\gamma'$. Thus $\pi_*(\gamma''(t)) = \gamma'(t)$, where $\pi : T(M) \to M$ is the
projection map for T(M). In other words, the vector $\gamma''(t) \in T_{\gamma'(t)}(T(M))$ carries inside it a copy of the
vector $\gamma'(t) \in T_{\gamma(t)}(M)$. This is because the vector $\gamma'(t) = t_{p,v,\psi}$ contains a copy of the base point p.
Variation of p is regarded as horizontal whereas variation of v is regarded as vertical.
Figure 31.3.1 illustrates the four parts of the second-order tangent field of a curve. The first two parts are
the point γ(t) and the component vector $v \in \mathbb{R}^n$ which are combined as $\gamma'(t) = t_{\gamma(t),v,\psi}$. The third part
is the sequence of n horizontal components $w^j$ of the vector $\gamma''(t) \in T_{\gamma'(t)}(T(M))$. The fourth part is the
sequence of n vertical components $w^{n+j}$ of $\gamma''(t)$, which indicate the rate of change of the components v
with respect to t. In the limit, the third part is the same as the second part. Only the fourth part is really
a second differential. All the rest is just book-keeping.
31.3.5 Remark: The dotted arrow in Figure 31.3.1 represents a kind of parallel-translated copy of the
vector $v^k$.
Figure 31.3.1 Components of second-order tangent field of a curve
[ Diagram labels: 1 = γ(t); 2 = $v^k$; 3 = $w^j$ (horizontal); 4 = $w^{n+j}$ (vertical). ]
However, it is important to remember that this translated vector depends on the choice of coordinates. The
dotted arrow is constructed by copying the coordinates $v^k$ from γ(t) to a nearby point on the curve. But the
transition rules are different at different points of the manifold. So these copied coordinates do not transform
correctly at any point except γ(t). The purpose of an affine connection is to construct a true vector at each
point of the curve which transforms correctly at each point and which may be thought of as the parallel
translate of $\gamma'(t)$ to each point.

31.3.6 Remark: If $\gamma'(t) = 0$, then $\gamma''(t)$ is a vertical vector. This implies that the vertical part of $\gamma''(t)$
transforms in the same way as vectors in $T_p(M)$ under changes of chart. If the “drop” function for the point $\gamma'(t)$ in
Definition 27.11.6 is applied to $\gamma''(t) \in \ker((d\pi)_{\gamma'(t)})$, the result is a well-defined vector in $T_p(M)$. This
means that even in the absence of a connection, the acceleration of a trajectory is chart-independent if the
velocity is zero, and the acceleration may then be interpreted as a true tangent vector to the manifold. One
way of thinking about this is to observe that the error introduced by chart variation (as in Remark 31.3.5)
into the limiting process when differentiating $\gamma'(t)$ with respect to t converges to zero faster than t converges.
31.3.7 Notation: $d^k\gamma$ for $k \in \mathbb{Z}^+$ for a $C^k$ open curve γ : I → M in a $C^k$ manifold M recursively denotes
the differential $d(d^{k-1}\gamma)$, where $d^0\gamma = \gamma$.
31.3.8 Remark: Notation 31.3.7 implies that $d^k\gamma : I \to T^{(k)}(M)$, where the higher tangent spaces $T^{(k)}(M)$
are defined in Notation 27.10.17. For example, $d^2\gamma : I \to T(T(M))$ is defined by
$$\forall t \in I,\quad (d^2\gamma)(t) = t_{\left(t_{\gamma(t),\partial_t(\psi\circ\gamma(t)),\psi}\right),\ \partial_t^2(\psi\circ\gamma),\ \tilde\psi},$$
for all $\psi \in \mathrm{atlas}_{\gamma(t)}(M)$, where $\tilde\psi$ is the chart for T(M) corresponding to ψ as in Definition 27.8.1.
[ Similarly to Remark 30.4.4, compare $\gamma''$ with 2nd-order differentials $d^2\gamma$ or $\gamma_{**}$ for $\gamma : \mathbb{R}^2 \to M$. Relate this
to $T(T(\mathbb{R}))$. ]
[ Define $D_{\gamma''(t)}$. Relate $D_{\gamma''(t)}$ to basis tangent operators $\partial_i^{\gamma(t),\psi}$ and $\partial_{ij}^{\gamma(t),\psi}$. ]
[ Show that $\gamma''$ is $C^r$ if γ is $C^{r+2}$. ]
[ Distinguish between simple second derivatives of curves and covariant derivatives of the simple first derivative.
The first must be in T(T(M)) whereas the second may be in T(M). ]
[ Must look at the second-order tangent vector interpretation of $\gamma''(t)$. In other words, $\gamma''(t)$ may be thought
of as a second-order operator acting on real functions. Yet another possibility is to regard $\gamma''(t)$ as a chart-dependent
vector $\partial_{tt}^\psi(\gamma) \in T_{\gamma(t)}(M)$. This happens to be the first-order part of the second-order tangent
vector! The vector $\partial_{tt}^\psi(\gamma)$ could be combined with a connection to give a chart-independent tangent vector. ]
31.4. Higher-order differentials of curve families

The second-order differential of a curve may be interpreted either as the differential of the differential in
T (T (M )) or as a second-order operator in T [2] (M ). The T (T (M )) interpretation is presented in this section.
31.4.1 Remark: When derivatives do not commute, it is important to get the order right. The notation
$\gamma_{k\ell}$ means a derivative with respect to k then ℓ. Thus in Definition 31.4.2, $\gamma_{k\ell}(t)$ means $\partial_{t^\ell}(\partial_{t^k}\gamma(t))$, which
is a vector with base point $\partial_{t^k}\gamma(t) = \gamma_k(t)$.
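31.4.2 Definition: A partial second-order tangent vector field of a $C^2$ map $\gamma : \mathbb{R}^m \to M$ in a $C^2$ manifold
M with n = dim(M) and $m \in \mathbb{Z}^+$ is a map $\gamma_{k\ell} : \mathbb{R}^m \to T(T(M))$ defined for k, ℓ = 1 . . . m and
$\psi \in \mathrm{atlas}_{\gamma(t)}(M)$ by
$$\forall t \in \mathbb{R}^m,\quad \gamma_{k\ell}(t) = t^{(2)}_{\gamma_k(t),w_\psi(t),\tilde\psi},$$
where $w_\psi : \mathbb{R}^m \to \mathbb{R}^{2n}$ is defined by
$$\forall t \in \mathbb{R}^m,\quad w_\psi(t)^i = \begin{cases} \partial_{t^\ell}(\psi^i\circ\gamma(t)) & \text{for } i = 1 \ldots n \\ \partial_{t^\ell}\partial_{t^k}(\psi^{i-n}\circ\gamma(t)) & \text{for } i = n+1, \ldots 2n \end{cases}$$
and $\gamma_k$ is the first-order tangent vector field for γ as in Definition 30.4.11.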
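31.4.3 Remark: The expression for $\gamma_{k\ell}(t)$ in Definition 31.4.2 may be written out more fully as:
$$\forall t \in \mathbb{R}^m,\quad \gamma_{k\ell}(t) = t_{\left(t_{\gamma(t),\partial_{t^k}(\psi\circ\gamma(t)),\psi}\right),\ \left(\partial_{t^\ell}(\psi\circ\gamma(t)),\,\partial_{t^k}\partial_{t^\ell}(\psi\circ\gamma(t))\right),\ \tilde\psi},$$
where $(\partial_{t^\ell}(\psi\circ\gamma(t)), \partial_{t^k}\partial_{t^\ell}(\psi\circ\gamma(t))) \in \mathbb{R}^{2n}$ represents the concatenation of the vectors $\partial_{t^\ell}(\psi\circ\gamma(t)) \in \mathbb{R}^n$
and $\partial_{t^k}\partial_{t^\ell}(\psi\circ\gamma(t)) \in \mathbb{R}^n$. Definition 31.4.2 is constructed from the double application of Definition 30.4.1.
When the coordinates $\big(\psi(\gamma(t)), \partial_{t^k}(\psi\circ\gamma(t))\big) \in \mathbb{R}^{2n}$ of $\gamma_k(t)$ in the total tangent space T(M) are differentiated
with respect to $t^\ell$, the result is $\partial_{t^\ell}\big(\psi(\gamma(t)), \partial_{t^k}(\psi\circ\gamma(t))\big) = \big(\partial_{t^\ell}(\psi\circ\gamma(t)), \partial_{t^k}\partial_{t^\ell}(\psi\circ\gamma(t))\big)$.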

[ Do a diagram for second-order differentials of curve families similar to Figure 31.3.1. ]

31.4.4 Remark: The second-order tangent vector $\gamma_{k\ell}(t)$ in Definition 31.4.2 is a chart-independent vector
in $T_{\gamma_k(t)}(T(M))$. The proof of this is the same as for Theorem 31.3.3. In this case, though, the second and
third parts $v^k$ and $w^k$ of the vector are not generally the same unless k = ℓ. That is, the horizontal part of
the second-order differential is not generally the same as the first-order differential.
Related to this is the fact that the partial tangent vector fields do not generally satisfy $\gamma_{k\ell} = \gamma_{\ell k}$ when k ≠ ℓ.
Such fields are not even comparable because they are in different tangent spaces $T_{\gamma_k(t)}(T(M))$ and $T_{\gamma_\ell(t)}(T(M))$.
31.5. Differentials of real-valued functions for higher-order operators

Higher-order tangent vectors and tangent operators are defined in Section 29.3. First-order differentials of
real-valued functions are defined for first-order tangent vectors and operators in Section 30.2.
31.5.1 Definition: For any function $f \in C^2(M)$ and point p in a $C^2$ manifold M, the differential of f
at p for second-order tangent operators is the linear map $(d^{[2]}f)_p : \mathring{T}_p^{[2]}(M) \to \mathbb{R}$ defined by
$$\forall L \in \mathring{T}_p^{[2]}(M),\quad (d^{[2]}f)_p(L) = L(f).$$
The differential of f at p for second-order tangent vectors is the linear map $(d^{[2]}f)_p : T_p^{[2]}(M) \to \mathbb{R}$ defined
by
$$\forall W \in T_p^{[2]}(M),\quad (d^{[2]}f)_p(W) = D_W f.$$
31.5.2 Remark: Some of the spaces and maps in Definition 31.5.1 are shown in Figure 31.5.1. These
pointwise differentials may now be extended to the whole second-order tangent space.
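[ Diagram: the point p ∈ M, the function $f \in C^2(M)$ with $f : M \to \mathbb{R}$, the operator $L \in \mathring{T}_p^{[2]}(M)$, and the map $(d^{[2]}f)_p$ into $\mathbb{R}$. ]
Figure 31.5.1 Differential of a real-valued function for second-order operators at a point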
31.5.3 Definition: For a function $f \in C^2(M)$ on a $C^2$ manifold M, the differential of f (for second-order
tangent operators) is the map $d^{[2]}f : \mathring{T}^{[2]}(M) \to \mathbb{R}$ defined by
$$\forall L \in \mathring{T}^{[2]}(M),\quad (d^{[2]}f)(L) = L(f).$$
The differential of f (for tagged second-order tangent operators) is the map $d^{[2]}f : \hat{T}^{[2]}(M) \to \mathbb{R}$ defined by
$$\forall (p, L) \in \hat{T}^{[2]}(M),\quad (d^{[2]}f)\big((p, L)\big) = L(f).$$
The differential of f (for second-order tangent vectors) is the map $d^{[2]}f : T^{[2]}(M) \to \mathbb{R}$ defined by
$$\forall W \in T^{[2]}(M),\quad (d^{[2]}f)(W) = D_W f.$$
[ Define the “generalized Hessian” in a $C^2$ manifold as the dual of $T_p^{[2]}(M)$. I.e. this Hessian is the linear
map $t^{[2]}_{p,a,b,\psi} \mapsto \partial^{[2]}_{p,a,b,\psi}(f)$. ]

31.6. Hessian operators at critical points

[ Should determine whether “stationary” might be a more accurate word than “critical”. ]
31.6.1 Remark: It is perhaps surprising that the Hessian of a real-valued function is, under the right
circumstances, a well-defined tensor even in the absence of a connection. The usual definition of the Hessian
operator incorporates a connection. (See Remark 29.6.1 for a comment on general Hessians. See Greene/
Wu [68], page 7, for Hessians with a connection.) It will be shown here that for a real-valued
function $f \in C^2(M)$ on a $C^2$ manifold M, at a critical point of f, namely a point p ∈ M such that $(df)_p = 0$,
the Hessian of f at p is a well-defined tensor in $\mathring{T}_p^{0,2}(M)$. Obviously, since this “connectionless” version of
the Hessian is restricted to critical points, it is not very effective at generating interesting vector fields from
real-valued functions in the way that the first-order differential df does.
It is tempting to use a notation such as $(d^2f)_p$ for the Hessian of f at p, but this has some difficulties.
The differential of df should be a map from T(T(M)) to $\mathbb{R}$, as discussed in Section 31.1. In the presence
of a connection, it is usual to write $D^2f$ for the Hessian of f. This is the same thing as the Hessian in
Theorem 31.6.2 if $(df)_p = 0$. It is probably more accurate to denote the Hessian as $(d^{[2]}f)_p$, using the
notation of Section 31.5.
31.6.2 Theorem: Let M be a $C^2$ manifold. Then for any real-valued function $f \in C^2(M)$ and point
p ∈ M such that $(df)_p = 0$, the map $H_{f,p} : T_p(M) \times T_p(M) \to \mathbb{R}$ defined for all $t_{p,u,\psi}, t_{p,v,\psi} \in T_p(M)$ by
$$H_{f,p}\big(t_{p,u,\psi}, t_{p,v,\psi}\big) = \sum_{i,j=1}^n u^i v^j \frac{\partial^2}{\partial x^i \partial x^j}(f\circ\psi^{-1})\Big|_{x=\psi(p)} \qquad (31.6.1)$$
is a tensor in $T_p^{0,2}(M)$ which is independent of the chart $\psi \in \mathrm{atlas}_p(M)$.
Proof: [ See Theorem 29.1.3 for transformation rules for second-order operators. ]
[ Must show that $H_{f,p}$ is a tensor and that it is chart-independent. Should also show that when $(df)_p \ne 0$, it
is a tensor but is chart-dependent. ]
For a fixed chart ψ, the function $H_{f,p}$ is clearly bilinear with respect to $u, v \in \mathbb{R}^n$. Therefore for this fixed
chart, $H_{f,p} \in \mathscr{L}^2(T_p(M), \mathbb{R}) \cong \mathscr{L}^2(T_p^*(M), \mathbb{R})^* = T_p^{0,2}(M)$.
31.6.3 Remark: For f and ψ as in Theorem 31.6.2, define $f_{ij} = \partial^2/\partial x^i \partial x^j\,(f\circ\psi^{-1})\big|_{x=\psi(p)}$ for i, j = 1, . . . n.
Then $H_{f,p} = \sum_{i,j=1}^n f_{ij}\, e^i \otimes e^j$, where the cotangent vectors $e^i = (d\psi^i)_p \in T_p^*(M)$ are combined to obtain
$e^i \otimes e^j \in T_p^{0,2}(M)$, and the right-hand side of equation (31.6.1) may be written as $\sum_{i,j=1}^n f_{ij} u^i v^j$.
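The chart-independence claimed in Theorem 31.6.2 can be illustrated numerically at a critical point. The sketch below is only a plausibility check under assumptions made here for illustration: M is the plane, ψ1 is the identity chart, ψ2(x, y) = (x − y², y) is a hypothetical second chart, and the function `f` is an arbitrary quadratic with its critical point at (1, 1). The second derivatives of f composed with each inverse chart are estimated by central differences.

```python
def f(x, y):
    """Hypothetical test function with a critical point at p = (1, 1)."""
    X, Y = x - 1.0, y - 1.0
    return X * X + X * Y + 2.0 * Y * Y

def f_chart2(a, b):
    """f o psi2^{-1}, for the chart psi2(x, y) = (x - y^2, y)."""
    return f(a + b * b, b)

def hessian(g, a, b, h=1e-3):
    """Matrix of second partial derivatives of g at (a, b) by central differences."""
    gaa = (g(a + h, b) - 2.0 * g(a, b) + g(a - h, b)) / (h * h)
    gbb = (g(a, b + h) - 2.0 * g(a, b) + g(a, b - h)) / (h * h)
    gab = (g(a + h, b + h) - g(a + h, b - h)
           - g(a - h, b + h) + g(a - h, b - h)) / (4.0 * h * h)
    return [[gaa, gab], [gab, gbb]]

def bilinear(H, u, v):
    """Evaluate the bilinear form sum_ij H[i][j] u^i v^j."""
    return sum(H[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

def change_components(p, u):
    """Components in chart psi2 of the vector with components u in chart psi1 at p.

    The Jacobian of psi2 o psi1^{-1} at (x, y) is [[1, -2y], [0, 1]].
    """
    x, y = p
    return (u[0] - 2.0 * y * u[1], u[1])

p = (1.0, 1.0)                        # critical point of f
psi2_p = (p[0] - p[1] ** 2, p[1])     # its coordinates in chart psi2
u, v = (1.0, 2.0), (-3.0, 0.5)        # component vectors in chart psi1

h1 = bilinear(hessian(f, *p), u, v)
h2 = bilinear(hessian(f_chart2, *psi2_p),
              change_components(p, u), change_components(p, v))
print(h1, h2)
```

At a point where the first derivatives of f do not vanish, the same computation would generally give different values in the two charts, because the second derivatives of the transition map then contribute.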
[ Interpret Hessians in the context of contravariant tensors $a^{ij} e_i \otimes e_j \in T_p^{2,0}(M)$. ]
31.7. Differentials of differentiable maps for higher-order operators

Differentials for second-order operators should be useful for the second-order analysis of “geodesic leverage”
or “pantograph” maps in Section 37.9.
Differentiable maps are defined in Section 26.9. Higher-order tangent vectors and tangent operators are
defined in Section 29.3. First-order differentials of differentiable maps are defined for first-order tangent
vectors and operators in Section 30.3.
The first-order differential of a differentiable map for second-order tangent vectors and operators is defined in
this section in terms of tangent vector coordinates (Definition 31.7.1) and then in terms of tangent operators
(Definition 31.7.3).
31.7.1 Definition: The differential for second-order tangent vectors at a point $p \in M_1$ of a differentiable
map $\phi \in C^2(M_1, M_2)$, where $M_1$ and $M_2$ are $C^2$ manifolds with $m = \dim(M_1)$ and $n = \dim(M_2)$, is the
map $(d^{[2]}\phi)_p : T_p^{[2]}(M_1) \to T_{\phi(p)}^{[2]}(M_2)$ defined by
$$\forall t^{[2]}_{p,a,b,\psi_1} \in T_p^{[2]}(M_1),\ \forall \psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2),\quad (d^{[2]}\phi)_p\big(t^{[2]}_{p,a,b,\psi_1}\big) = t^{[2]}_{\phi(p),\tilde a,\tilde b,\psi_2},$$
where $\tilde a \in \mathrm{Sym}(n, \mathbb{R})$ is the real symmetric n × n matrix defined by
$$\tilde a^{k\ell} = a^{ij}\, \partial_{x^i}\big(\psi_2^k\circ\phi\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)}\, \partial_{x^j}\big(\psi_2^\ell\circ\phi\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)}$$
for all k, ℓ = 1, . . . n, and $\tilde b \in \mathbb{R}^n$ is defined by
$$\begin{aligned}
\tilde b^k &= a^{ij}\, \partial_{x^i}\partial_{x^j}\big(\psi_2^k\circ\phi\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} + b^i\, \partial_{x^i}\big(\psi_2^k\circ\phi\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} \\
&= \partial^{[2]}_{p,a,b,\psi_1}(\psi_2^k\circ\phi)
\end{aligned}$$
for all k = 1, . . . n.
31.7.2 Remark: As in the case of first-order tangent operators, the differential of a map for second-order
operators is determined in Definition 31.7.3 by simply passing the test function from $M_2$ to $M_1$ via the map.
Once again, the form of the definition appears much simpler for operators than for coordinate vectors.
The expression for $\tilde a^{k\ell}$ looks like $\sum_{i,j=1}^m a^{ij}\, \partial_{p,e_i,\psi_1}(\psi_2^k\circ\phi)\, \partial_{p,e_j,\psi_1}(\psi_2^\ell\circ\phi)$, where $e_i \in \mathbb{R}^m$ are the standard
orthonormal basis vectors. This expression resembles a $T^{2,0}(M_1)$ tensor. [ Must check whether there is any
truth in this. ]
31.7.3 Definition: The differential for second-order operators at a point $p \in M_1$ of a differentiable
map $\phi \in C^2(M_1, M_2)$, for $C^2$ manifolds $M_1$ and $M_2$, is the map $(d^{[2]}\phi)_p : \mathring{T}_p^{[2]}(M_1) \to \mathring{T}_{\phi(p)}^{[2]}(M_2)$ defined by
$$\forall L \in \mathring{T}_p^{[2]}(M_1),\ \forall f \in C^2(M_2),\quad \big((d^{[2]}\phi)_p(L)\big)(f) = L(f\circ\phi).$$
31.7.4 Remark: The map (d[2] φ)p is obviously linear. Figure 31.7.1 shows spaces and maps relevant to
Definition 31.7.3, and the projection maps πk : T̊ (Mk ) → Mk for k = 1, 2.
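[ Diagram: $(d^{[2]}\phi)_p$ maps $L \in \mathring{T}_p^{[2]}(M_1)$ to $(d^{[2]}\phi)_p(L) \in \mathring{T}_{\phi(p)}^{[2]}(M_2)$; the projections $\pi_1$, $\pi_2$ map to $p \in M_1$ and $\phi(p) \in M_2$; the charts $\psi_1$, $\psi_2$ map to $\mathbb{R}^m$, $\mathbb{R}^n$; the functions f ◦ φ and f map to $\mathbb{R}$. ]
Figure 31.7.1 The differential of a map for second-order operators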
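31.7.5 Theorem: Definitions 31.7.1 and 31.7.3 are consistent with each other. That is,
$D_{(d^{[2]}\phi)_p(W)} = (d^{[2]}\phi)_p(D_W)$ for all $W \in T_p^{[2]}(M_1)$ and $p \in M_1$.
Proof: $(d^{[2]}\phi)_p(W)$ is the result of the differential of φ applied to $W \in T_p^{[2]}(M_1)$, whereas $(d^{[2]}\phi)_p(D_W)$
is the result of the differential of φ applied to $D_W \in \mathring{T}_p^{[2]}(M_1)$. To prove Theorem 31.7.5, it is necessary to
calculate $\big((d^{[2]}\phi)_p(L)\big)(f)$ with $L = D_W$.
Let $W = t^{[2]}_{p,a,b,\psi_1} \in T_p^{[2]}(M_1)$. Then $L = D_W = \partial^{[2]}_{p,a,b,\psi_1} \in \mathring{T}_p^{[2]}(M_1)$. So for $\psi_2 \in \mathrm{atlas}_{\phi(p)}(M_2)$ and
$f \in C^2(M_2)$,
$$\begin{aligned}
\big((d^{[2]}\phi)_p(L)\big)(f) &= \partial^{[2]}_{p,a,b,\psi_1}(f\circ\phi) \\
&= (a^{ij}\partial_{x^i}\partial_{x^j} + b^i\partial_{x^i})\Big((f\circ\psi_2^{-1})\circ(\psi_2\circ\phi\circ\psi_1^{-1}(x))\Big)\Big|_{x=\psi_1(p)} \\
&= a^{ij}\, \partial_{\tilde x^k}\partial_{\tilde x^\ell}\big(f\circ\psi_2^{-1}(\tilde x)\big)\Big|_{\tilde x=\psi_2(\phi(p))}\, \partial_{x^i}\big(\psi_2^k\circ\phi\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)}\, \partial_{x^j}\big(\psi_2^\ell\circ\phi\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} \\
&\qquad + a^{ij}\, \partial_{\tilde x^k}\big(f\circ\psi_2^{-1}(\tilde x)\big)\Big|_{\tilde x=\psi_2(\phi(p))}\, \partial_{x^i}\partial_{x^j}\big(\psi_2^k\circ\phi\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} \\
&\qquad + b^i\, \partial_{\tilde x^k}\big(f\circ\psi_2^{-1}(\tilde x)\big)\Big|_{\tilde x=\psi_2(\phi(p))}\, \partial_{x^i}\big(\psi_2^k\circ\phi\circ\psi_1^{-1}(x)\big)\Big|_{x=\psi_1(p)} \\
&= \tilde a^{k\ell}\, \partial_{\tilde x^k}\partial_{\tilde x^\ell}\big(f\circ\psi_2^{-1}(\tilde x)\big)\Big|_{\tilde x=\psi_2(\phi(p))} + \tilde b^k\, \partial_{\tilde x^k}\big(f\circ\psi_2^{-1}(\tilde x)\big)\Big|_{\tilde x=\psi_2(\phi(p))} \\
&= \partial^{[2]}_{\phi(p),\tilde a,\tilde b,\psi_2}(f) \\
&= D_{t^{[2]}_{\phi(p),\tilde a,\tilde b,\psi_2}}(f) \\
&= D_{(d^{[2]}\phi)_p(W)}(f),
\end{aligned}$$
where $\tilde a$ and $\tilde b$ are as in Definition 31.7.1. The theorem follows immediately.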
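The consistency asserted by Theorem 31.7.5 can be checked by hand in a small case. The following sketch assumes, purely for illustration, that $M_1 = \mathbb{R}$ and $M_2 = \mathbb{R}^2$ with identity charts, that φ(x) = (x, x²), and that the test function is f(y1, y2) = y1² y2 + y2. None of these choices come from the text. The coefficients ã and b̃ are built as in Definition 31.7.1 and the resulting operator on $M_2$ is compared with L(f ◦ φ) as in Definition 31.7.3.

```python
def phi_d1(x):
    """First derivative of the hypothetical map phi(x) = (x, x^2)."""
    return (1.0, 2.0 * x)

def phi_d2(x):
    """Second derivative of phi(x) = (x, x^2)."""
    return (0.0, 2.0)

def push_forward(a, b, x):
    """Coefficients (a_tilde, b_tilde) of the image of a d_x^2 + b d_x at phi(x)."""
    d1, d2 = phi_d1(x), phi_d2(x)
    a_tilde = [[a * d1[k] * d1[l] for l in range(2)] for k in range(2)]
    b_tilde = [a * d2[k] + b * d1[k] for k in range(2)]
    return a_tilde, b_tilde

def apply_on_M2(a_tilde, b_tilde, y):
    """Apply a_tilde^{kl} d_k d_l + b_tilde^k d_k to f(y1, y2) = y1^2 y2 + y2 at y."""
    y1, y2 = y
    grad = (2.0 * y1 * y2, y1 * y1 + 1.0)
    hess = [[2.0 * y2, 2.0 * y1], [2.0 * y1, 0.0]]
    second = sum(a_tilde[k][l] * hess[k][l] for k in range(2) for l in range(2))
    first = sum(b_tilde[k] * grad[k] for k in range(2))
    return second + first

def apply_on_M1(a, b, x):
    """Apply a d_x^2 + b d_x to the pulled-back function (f o phi)(x) = x^4 + x^2."""
    return a * (12.0 * x * x + 2.0) + b * (4.0 * x ** 3 + 2.0 * x)

a, b, x = 0.5, -1.5, 0.7
a_tilde, b_tilde = push_forward(a, b, x)
lhs = apply_on_M1(a, b, x)                            # L(f o phi)
rhs = apply_on_M2(a_tilde, b_tilde, (x, x * x))       # pushed-forward operator applied to f
print(lhs, rhs)
```

The second derivative of φ enters only the first-order coefficient b̃, which is why a pure second-order operator on $M_1$ generally acquires a first-order part on $M_2$.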
[ Must find or make a macro to permit page-splitting in the above equation display. ]
31.7.6 Remark: Theorem 31.7.5 leads to the commutative diagram illustrated in Figure 31.7.2.
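[ Diagram: commutative square with $(d^{[2]}\phi)_p$ mapping $\mathring{T}(M_1)$ to $\mathring{T}(M_2)$ along the top, $(d^{[2]}\phi)_p$ mapping $T(M_1)$ to $T(M_2)$ along the bottom, and D mapping $T(M_k)$ to $\mathring{T}(M_k)$ on each side. ]
Figure 31.7.2 Differential map for second-order vectors and operators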
31.8. Differentials of curves for higher-order operators

This section deals with the push-forth of higher-order operators according to curves and families of curves.
[ For $\gamma : \mathbb{R} \to M$, should get $(d^{[2]}\gamma)(t) \in T_{\gamma(t)}^{[2]}(M)$. ]

Chapter 32
Vector field calculus
32.1 Naive vector field derivatives
32.2 The Poisson bracket
32.3 Vector field derivatives for curve families
32.4 Lie derivatives of vector fields
32.5 Lie derivatives of tensor fields
32.6 The exterior derivative
32.1. Naive vector field derivatives

Naive vector field derivatives are derivatives with respect to vector fields on manifolds which do not use
any definition of parallelism. This may be contrasted with Lie derivatives which use parallel transport with
respect to the vector field and covariant derivatives which use parallel transport with respect to a connection.
When there is some definition of parallel transport, it is possible to use some notion of “pull-back” to force
the function being differentiated to stay inside the fibre set at a fixed point of a manifold so that the derivative
of the function can remain in the same class of functions. However, in the absence of any notion of parallel
transport, the function and its derivative are generally in different function classes. Therefore naive vector
field derivatives are often of very limited utility. On the other hand, they are important for the definition
of other derivatives such as the Lie and covariant derivatives.
32.1.1 Remark: The action $\partial_V$ in Definition 32.1.2 is the same as the action $D_V$ in Notation 27.5.7.
The differential action with respect to vector fields in Definition 32.1.3 is defined in terms of this pointwise
differential operator $\partial_V = D_V$ for real-valued functions. The vector field spaces X(M) and $X^r(M)$ for
$r \in \overline{\mathbb{Z}}{}_0^+$ are defined in Notation 28.5.7.
32.1.2 Definition: The action of a vector V ∈ T (M ) on C 1 real-valued functions on a C 1 manifold M is

the map ∂V : C 1 (M ) → IR where ∂V f = DV f for all f ∈ C 1 (M ).
32.1.3 Definition: The action of a vector field X ∈ X(M ) on C 1 real-valued functions for a C 1 manifold
M is the map ∂X : C 1 (M ) → (M → IR) where ∂X f : M → IR is defined by (∂X f )(p) = ∂X(p) f for all
f ∈ C 1 (M ) and p ∈ M .
32.1.4 Remark: The action ∂X in Definition 32.1.3 is the same as the action DX in Notation 28.5.10. It is
redefined here with a new notation to emphasize that it is a member of a family of naive differential actions,
whereas the notation DX emphasizes membership of the family of covariant derivatives. These derivatives
just happen to be identical in the case of derivatives of real-valued functions.
32.1.5 Theorem: A vector field X in a $C^{k+1}$ manifold M is of class $C^k$ for $k \in \bar{\mathbb{Z}}^+_0$ if and only if
$\forall f \in C^{k+1}(M),\ \partial_X f \in C^k(M)$.
32.1.6 Remark: Theorem 32.1.5 suggests that one may compose two vector field actions
∂X and ∂Y to construct an action ∂X ◦ ∂Y : C k+2 (M ) → C k (M ), although this is clearly not a simple vector
field action. This is discussed in Section 32.2. It turns out that ∂X ◦ ∂Y ∈ T̊ [2] (M ).

32.1.7 Remark: Corresponding to the vector field action on real-valued functions in Definition 32.1.3 is
the vector field action on vector fields in Definition 32.1.11. This is defined in terms of the action of a vector
on a vector field in Definition 32.1.8. The partial derivative notation ∂X Y emphasizes that this derivative
is not the same as the covariant derivative which is denoted as DX Y , or the Lie derivative which is denoted
as LX Y . Both DX Y and LX Y are constructed from ∂X Y by subtracting suitable correction factors.
32.1.8 Definition: The action of a vector V ∈ T (M ) on C 1 vector fields on a C 2 manifold M is the

map ∂V : X 1 (M ) → T (T (M )) where ∂V Y ∈ TY (p) (T (M )) is defined for V ∈ Tp (M ) and Y ∈ X 1 (M ) by
∂V Y = Y∗ (V ), where Y∗ : T (M ) → T (T (M )) is the induced map of Y .
32.1.9 Remark: The differential action ∂V Y in Definition 32.1.8 is the rate of change of the vector field
Y in the direction V . This rate of change is not in the tangent space T (M ), but it is in the second
tangent space T (2) (M ) = T (T (M )). Therefore it is a well-defined vector, although as all texts point out, its
components do not transform like a vector in T (M ). The vector Y∗ (V ) is also denoted as (dY )p (V ).
32.1.10 Theorem: Let M be a C 2 manifold. Let V = tp,v,ψ ∈ T (M ) and Y ∈ X 1 (M ). Then the action
∂V Y in Definition 32.1.8 satisfies
$$\partial_V Y = \Big[\Big( Y(p),\ \big( v,\ v^k\,\partial_{x^k}\eta(x)\big|_{x=\psi(p)} \big),\ \tilde\psi \Big)\Big],$$
where η = π2 ◦ ψ̃ ◦ Y ◦ ψ −1 is the component function of Y for the chart ψ.
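A minimal worked instance of Theorem 32.1.10, under assumptions chosen only for illustration: M = IR2 , ψ is the identity chart, and Y is the rotation field with component function η(x) = (−x2 , x1 ). None of these choices come from the text.

```latex
\begin{align*}
\eta(x) &= (-x^2,\ x^1), \qquad
\partial_{x^1}\eta(x) = (0,\ 1), \qquad
\partial_{x^2}\eta(x) = (-1,\ 0), \\
v^k\,\partial_{x^k}\eta(x)\big|_{x=\psi(p)}
  &= v^1\,(0,\ 1) + v^2\,(-1,\ 0) = (-v^2,\ v^1), \\
\partial_V Y
  &= \Big[\Big( Y(p),\ \big( (v^1, v^2),\ (-v^2, v^1) \big),\ \tilde\psi \Big)\Big].
\end{align*}
```

The first pair of components simply reproduces v, while the second pair depends on the chart through the derivatives of η.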
32.1.11 Definition: The action of a vector field X ∈ X(M ) on C 1 vector fields for a C 2 manifold M is
the map ∂X : X 1 (M ) → (M → T (T (M ))) where ∂X Y : M → T (T (M )) is defined by (∂X Y )(p) = ∂X(p) Y
for all Y ∈ X 1 (M ) and p ∈ M .
32.1.12 Remark: Unfortunately, the result of differentiating a vector field Y with respect to a vector
field X in Definition 32.1.11 is not a vector field in a space such as $X^0(T(T(M)))$ as one would hope.
Instead, $\partial_X Y \in X^0\big(T(T(M)),\ \pi\circ\pi_*,\ M\big)$, where (T (T (M )), π ◦ π∗ , M ) is the fibration of double tangents
over the base space M instead of the usual T (M ). The map π ◦ π∗ is the composition of the projection maps
π : T (M ) → M and π∗ : T (T (M )) → T (M ). Thus ∂X Y is a cross-section of this space of second tangents
over M as a base space. The reason for this is that ∂X Y only specifies a single vector in T (T (M )) for each
point p ∈ M rather than a vector at all points of T (M ).
Theorem 32.1.13 is written in terms of the set X k (T (T (M )), π ◦ π∗ , M ) of C k cross-sections of the fibration
of the second tangent space T (T (M )) of M over the base space M . The general notation X k (E, π, B) for
the set of C k cross-sections of a fibration (E, π, B) is defined in Notation 26.13.14.
32.1.13 Theorem: Let M be a $C^{k+2}$ manifold for some $k \in \bar{\mathbb{Z}}^+_0$. Then
$$\forall X \in X^k(M),\ \forall Y \in X^{k+1}(M),\quad \partial_X Y \in X^k\big(T(T(M)),\ \pi\circ\pi_*,\ M\big),$$
where π : T (M ) → M and π∗ : T (T (M )) → T (M ) are the usual projection maps.
[ The notation X k (E, π, M ) for differentiable fibrations (E, π, M ) in Theorems 32.1.13 and 32.1.18 is defined
in Chapter 34. ]
32.1.14 Remark: The differential action ∂X Y in Definition 32.1.11 satisfies (∂X Y )(p) = Y∗ (X(p)) for
all p ∈ M . In other words, ∂X Y = Y∗ ◦ X. Since Y is a cross-section of T (M ), it satisfies π ◦ Y = idM , where
π : T (M ) → M is the projection map of T (M ) and idM is the identity on M . Therefore π∗ ◦ Y∗ = idT (M ) ,
from which it follows that π∗ (∂V Y ) = V for all V ∈ Tp (M ). Hence π∗ ((∂X Y )(p)) = X(p) for all p ∈ M . In
other words, π∗ ◦ (∂X Y ) = X.
32.1.15 Remark: Strictly speaking, there should be different notations for the action of vectors on real-
valued functions and vector fields. The notation ∂V refers to different operators in expressions such as ∂V f
for f ∈ C 1 (M ) and ∂V X for X ∈ X 1 (M ). Naive vector field derivatives may be generalized to differentiable
cross-sections of any differentiable fibre bundle. Of special interest are tensor fields of general type.

The space $X^k_{r,s}(M) = X^k(T^{r,s}(M))$ for $k \in \bar{\mathbb{Z}}^+_0$ consists of all $C^k$ tensor fields of type (r, s) on a $C^{k+1}$
manifold M . If a field $K \in X^1_{r,s}(M)$ is differentiated in a naive manner with respect to a vector V ∈ T (M ),
the result is a vector $\partial_V K$ in the tangent space $T_{K(p)}(T^{r,s}(M))$ of the total tensor space $T^{r,s}(M)$,
where p = π(V ). Therefore this differential operator has domain and range $\partial_V : X^1_{r,s}(M) \to T(T^{r,s}(M))$. It
seems reasonable to label the operator with the tensor type. Thus $\partial^{r,s}_V : X^1_{r,s}(M) \to T(T^{r,s}(M))$. From this
perspective, the operator on real-valued functions is notated as $\partial^{0,0}_V f = \partial_V f = D_V f$, and the operator on
vector fields is notated as $\partial^{1,0}_V X = \partial_V X$.
32.1.16 Definition: The action of a vector V ∈ T (M ) on C 1 tensor fields of type (r, s) on a C 2 manifold
M is the map ∂Vr,s : X 1 (T r,s (M )) → T (T r,s (M )) where ∂Vr,s K ∈ TK(p) (T r,s (M )) is defined for V ∈ Tp (M )
and K ∈ X 1 (T r,s (M )) by ∂Vr,s K = K∗ (V ), where K∗ : T (M ) → T (T r,s (M )) is the induced map of K.
32.1.17 Definition: The action of a vector field X ∈ X(M ) on C 1 tensor fields of type (r, s) on a C 2
manifold M is the map $\partial^{r,s}_X : X^1(T^{r,s}(M)) \to (M \to T(T^{r,s}(M)))$ where $\partial^{r,s}_X K : M \to T(T^{r,s}(M))$ is defined
by $(\partial_X K)(p) = \partial_{X(p)} K$ for all $K \in X^1(T^{r,s}(M))$ and p ∈ M .
32.1.18 Theorem: Let M be a $C^{k+2}$ manifold for some $k \in \bar{\mathbb{Z}}^+_0$. Then for all tensor types (r, s),
$$\forall X \in X^k(M),\ \forall K \in X^{k+1}(T^{r,s}(M)),\quad \partial^{r,s}_X K = \partial_X K \in X^k\big(T(T^{r,s}(M)),\ \pi\circ\hat\pi,\ M\big),$$
where π : T r,s (M ) → M and π̂ : T (T r,s (M )) → T r,s (M ) are the usual projection maps.
32.2. The Poisson bracket

The Poisson bracket is applicable to the definition of curvature of a connection for a general differentiable
fibre bundle – with or without a Lie structure group. An “infinitesimal curve” in the base space of a
differentiable fibre bundle induces a parallel motion of the fibre set over the points of the curve, and if this
parallel motion is viewed through a fibre chart, the motion is a global diffeomorphism of the fibre space, which
is a differentiable manifold. Therefore all connections are equivalent to vector fields which are generated by
diffeomorphisms. The most important characteristic of a connection is its curvature, which is defined as the
commutator of these vector fields. For example, if Xk is the field generated by the motion of a base point
with velocity Vk for k = 1, 2, then the curvature in the “plane” spanned by the ordered pair (V1 , V2 ) is the
commutator [X1 , X2 ]. Thus the fibre space vector field algebra corresponding to connections gives a very
general definition of curvature even when the customary connection form, which is defined in terms of the
Lie algebra of the structure group, is not itself defined.
In order to present the Poisson bracket of vector fields in a logically correct fashion, it is necessary to make
use of the second-order tangent vector space T [2] (M ) (Section 29.3) and the second-order tangent operator
space T̊ [2] (M ) (Section 29.1).
[ Present commuting vector fields on two-parameter curve families here. Try to show that the fields commute
if and only if they are the tangent fields for the parameters of a two-parameter curve family, or something
like that. Comment on the relation of this to “zero torsion”. ]
[ See EDM2 [35], 82.B, for canonical transformations and the Poisson bracket. See also EDM2 [35], 271.F, for
the relation to Hamilton-Jacobi equations, and 324.C,D for the relation to first-order PDEs. ]
32.2.1 Remark: Vector fields X, Y ∈ X 1 (T (M )) for a C 2 manifold M can be combined in the sense of
differential operator fields ∂X , ∂Y ∈ X 1 (T̊ (M )) acting on the space of real-valued functions C 2 (M ). Thus
∂X (∂Y f ) ∈ C 0 (M ) for f ∈ C 2 (M ). The composition $\partial_X \circ \partial_Y\big|_{C^2(M)}$ of these differential operators is in
the tangent operator field space X 0 (T̊ [2] (M )) which is defined in Section 29.7. This corresponds to the
component-oriented tangent vector field XY in X 0 (T [2] (M )) given by Definition 32.2.2.
32.2.2 Definition: The composition of vector fields X ∈ X(M ) and Y ∈ X 1 (M ) for a C 2 manifold M is
the second-order vector field XY ∈ X 0 (T [2] (M )) given by
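$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\quad (XY)(p) = t^{[2]}_{p,\,a(\psi(p)),\,b(\psi(p)),\,\psi} \in T^{[2]}_p(M)$$
with $a = (a^{ij})_{i,j=1}^n$ : Range(ψ) → Sym(n, IR) and $b = (b^j)_{j=1}^n$ : Range(ψ) → IRn for n = dim(M ) defined by
$$\forall x \in \mathrm{Range}(\psi),\ \forall i,j = 1,\ldots,n,\quad a^{ij}(x) = \tfrac12\big(\xi^i(x)\,\eta^j(x) + \eta^i(x)\,\xi^j(x)\big)$$
$$\forall x \in \mathrm{Range}(\psi),\ \forall j = 1,\ldots,n,\quad b^j(x) = \xi^i(x)\,\partial_{x^i}\eta^j(x),$$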
where ξ : Range(ψ) → IRn and η ∈ C 1 (Range(ψ), IRn ) are component functions for X and Y respectively
for the chart ψ, so that X(p) = tp,ξ(ψ(p)),ψ and Y (p) = tp,η(ψ(p)),ψ for p ∈ Dom(ψ).
32.2.3 Remark: Theorem 32.2.4 uses the drop function 8 : T (2) (M ) → T [2] (M ) in Definition 29.5.2. This
theorem follows easily by comparing the expression for bi (x) in Definition 32.2.2 with the chart-dependent
vertical component of ∂V Y in Theorem 32.1.10. The identity 8 ◦ (∂X Y ) = XY can be interpreted by
applying both sides as differential operators to a quadratic polynomial. Then the left side can be thought of
as the divergence of the gradient whereas the right side can be thought of as some sort of Hessian operator.
Theorem 32.2.4 is a generalization of the identity LX Y = [X, Y ] in Theorem 32.4.13. The generalization
probably isn’t very useful, but you never can tell with these things. Waste not, want not!
32.2.4 Theorem: 8 ◦ (∂X Y ) = XY for any X ∈ X(M ) and Y ∈ X 1 (M ) for a C 2 manifold M .
32.2.5 Remark: From Definition 32.2.2, the commutator [X, Y ] = XY − Y X for X, Y ∈ X 1 (M ) for a C 2
manifold M satisfies:
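$$\forall p \in M,\ \forall \psi \in \mathrm{atlas}_p(M),\quad [X,Y](p) = t^{[2]}_{p,\,0,\,(\xi^i\partial_i\eta^j - \eta^i\partial_i\xi^j)_{j=1}^n,\,\psi} \equiv t_{p,\,(\xi^i\partial_i\eta^j - \eta^i\partial_i\xi^j)_{j=1}^n,\,\psi} \in T_p(M),$$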
where 0 denotes the n × n zero matrix. A second-order tangent vector is equivalent to a first-order tangent
vector if the second-order component is zero. This vector [X, Y ](p) is chart-independent.
32.2.6 Remark: The calculation in Remark 32.2.5 shows that the tuple format $t^{[2]}_{p,a,b,\psi}$ = [(p, a, b, ψ)] for
second-order tangent vectors is inconvenient. It would be more logical to present the components in increasing
order, such as [(p, b, a, ψ)]. Even more logical would be an ordering such as [(ψ, x, b, a)] with x = ψ(p). Then
vectors of all orders could be regarded as infinite tuples for which only a finite number of components are
non-zero. Then an equivalence such as [(ψ, x, b, a)] ≡ [(ψ, x, b)] would seem quite natural. Probably in
computer software, things would be done this way. (Remark 29.3.17 is similar.)
[ Have a theorem stating that [X, Y ] in Remark 32.2.5 is chart-independent? ]
32.2.7 Definition: The composition of operator fields ∂X and ∂Y for X ∈ X 0 (M ) and Y ∈ X 1 (M ) for a
C 2 manifold M is the second-order operator field ∂X ∂Y ∈ X 0 (T̊ [2] (M )) given by
∀p ∈ M, ∀f ∈ C 2 (M ), (∂X ∂Y )(p)(f ) = (∂X ∂Y f )(p).
32.2.8 Theorem: Definitions 32.2.2 and 32.2.7 are equivalent. In other words, ∂XY = ∂X ∂Y .
32.2.9 Remark: Theorem 32.2.8 may be compared with the calculation ∂XY f = ∂X ∂Y f = ∂X (Y i ∂i f ) =
(∂X Y )i ∂i f + X i Y j ∂i ∂j f for functions f ∈ C 2 (M ). In other words, ∂XY = (∂X Y )i ∂i + X i Y j ∂i ∂j , where X i
means ξ i and Y j means η j . This informal calculation shows how the action of XY is related to the naive
derivative ∂X Y .
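The informal calculation in Remark 32.2.9 can be made concrete with one small case. As an illustrative assumption (not an example from the text), take M = IR2 with the identity chart, X = ∂1 and Y = x1 ∂2 , so that ξ = (1, 0) and η = (0, x1 ).

```latex
\begin{align*}
\partial_X\partial_Y f &= \partial_1\big(x^1\,\partial_2 f\big)
   = \partial_2 f + x^1\,\partial_1\partial_2 f, \\
(\partial_X Y)^i\,\partial_i f &= \big(\xi^k\partial_k\eta^i\big)\,\partial_i f = \partial_2 f, \qquad
X^i Y^j\,\partial_i\partial_j f = x^1\,\partial_1\partial_2 f, \\
\partial_Y\partial_X f &= x^1\,\partial_2\partial_1 f, \\
\partial_X\partial_Y f - \partial_Y\partial_X f &= \partial_2 f .
\end{align*}
```

The second-order terms cancel for f ∈ C 2 (M ) because the mixed partial derivatives agree, which leaves a first-order operator.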
32.2.10 Definition: The Poisson bracket on a C 2 manifold M is the operation [·, ·] : X 1 (M ) × X 1 (M ) →
X 0 (M ) defined by [X, Y ] = XY − Y X for vector fields X, Y ∈ X 1 (M ).
32.2.11 Theorem: For any C ∞ manifold M , the tuple X ∞ (M ) −< (IR, X ∞ (M ), σIR , τIR , σA , τA , µ) is a Lie
algebra, where τA is the Poisson bracket on M and (IR, X ∞ (M ), σIR , τIR , σA , µ) is the real linear space with
pointwise addition and multiplication.

Proof: See Definition 9.11.1 for Lie algebras. This theorem follows by Theorem 9.11.9.
[ Determine how much regularity the Poisson bracket inherits from the fields X and Y . ]
[ Gallot/Hulin/Lafontaine [20] has some material for this section? Proof of lemma 1.52 shows that vector
fields are closed under the Poisson bracket; defn. after lemma 1.52 for Poisson bracket; theorem 1.63 and
defn 1.64 on Lie derivative of the “push-forth”. ]

32.3. Vector field derivatives for curve families

The Poisson bracket [X, Y ] has some special properties if the vector fields X and Y are the vector fields
generated by a two-parameter curve family.
32.3.1 Remark: Suppose that γ : IR2 → M is a C 2 curve family in a C 2 manifold M , and define X :
IR2 → T (M ) by X(s, t) = ∂s γ(s, t) ∈ Tγ(s,t) (M ). Similarly define Y : IR2 → T (M ) by Y (s, t) = ∂t γ(s, t) ∈
Tγ(s,t) (M ). These fields may be expressed more precisely as
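$$\forall s,t \in \mathbb{R},\ \forall \psi \in \mathrm{atlas}_{\gamma(s,t)}(M),\quad X(s,t) = \big[\big( \gamma(s,t),\ \partial_s(\psi(\gamma(s,t))),\ \psi \big)\big]$$
$$\forall s,t \in \mathbb{R},\ \forall \psi \in \mathrm{atlas}_{\gamma(s,t)}(M),\quad Y(s,t) = \big[\big( \gamma(s,t),\ \partial_t(\psi(\gamma(s,t))),\ \psi \big)\big].$$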
For brevity, one may write simply X = γs and Y = γt . It is useful to calculate various definitions of
derivatives for such vector fields. For example, the action ∂X Y of one vector field on another in Definition
32.1.11 may be calculated as
$$(\partial_X Y)(p) = \Big[\Big( Y(s,t),\ \big( \partial_s\psi(\gamma(s,t)),\ \partial_s\partial_t\psi(\gamma(s,t)) \big),\ \tilde\psi \Big)\Big],$$
for all p = γ(s, t), where ψ̃ is the chart for T (M ) corresponding to ψ ∈ atlasp (M ). As expected from
Theorem 32.1.10 and Remark 32.1.14, (∂X Y )(p) ∈ TY (p) (T (M )) and the horizontal component of (∂X Y )(p) is π∗ ((∂X Y )(p)) = X(p). A fly in the ointment
here is the fact that Y is not always a vector field in the sense required by Definition 32.1.11 because
Dom(Y ) = Dom(γ) may not include a neighbourhood of p. A second fly in the ointment is the fact that γ
may not be injective and so Y (s, t) may have two different values for (s, t) such that γ(s, t) = p. Since X(p)
may also be multi-valued, the derivative ∂X(p) Y may become very ambiguous indeed. When there are so
many flies in the ointment, it is best to throw away the ointment and buy a new jar.
32.3.2 Theorem: Let M be a C 2 manifold with n = dim(M ) ≥ 2. Let ψ be a C 2 chart for M . Define the
vector fields X and Y on Dom(ψ) by $X = e^\psi_1$ and $Y = e^\psi_2$, where the basis vector fields $e^\psi_k$ : Dom(ψ) → T (M )
are as in Definition 28.5.12. Then the action of X on Y satisfies
$$\forall p \in \mathrm{Dom}(\psi),\quad (\partial_X Y)(p) = \Big[\Big( Y(p),\ \big( (1,0,\ldots,0),\ (0,\ldots,0) \big),\ \tilde\psi \Big)\Big].$$
The composition XY of X and Y satisfies
$$\forall p \in \mathrm{Dom}(\psi),\quad (XY)(p) = \big[\big( p,\ a(p),\ b(p),\ \psi \big)\big],$$
where a : Dom(ψ) → Sym(n, IR) is given by $a(p)^{ij} = \tfrac12(\delta^i_1\delta^j_2 + \delta^i_2\delta^j_1)$ for i, j = 1, . . . n and b(p) = 0 ∈ IRn for
all p ∈ Dom(ψ).
The Poisson bracket [X, Y ] satisfies [X, Y ](p) = 0 for all p ∈ Dom(ψ).
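Theorem 32.3.2 is stated in the chart ψ itself, where the component functions of the basis fields are constant. As a hedged cross-check in a different chart, assume M is the plane IR2 with a closed ray removed and ψ = (r, θ) is the polar chart (an illustrative choice, not from the text). In Cartesian components the basis fields are ξ for ∂r and η for ∂θ , and the bracket formula of Remark 32.2.5 gives zero there too.

```latex
\begin{align*}
\xi(x,y) &= \Big(\frac{x}{r},\ \frac{y}{r}\Big), \qquad
\eta(x,y) = (-y,\ x), \qquad r = \sqrt{x^2+y^2}, \\
\xi^i\partial_i\eta &= \frac{x}{r}\,(0,\ 1) + \frac{y}{r}\,(-1,\ 0)
   = \Big(-\frac{y}{r},\ \frac{x}{r}\Big), \\
\eta^i\partial_i\xi &= -y\,\Big(\frac{y^2}{r^3},\ -\frac{xy}{r^3}\Big)
   + x\,\Big(-\frac{xy}{r^3},\ \frac{x^2}{r^3}\Big)
   = \Big(-\frac{y}{r},\ \frac{x}{r}\Big), \\
\xi^i\partial_i\eta - \eta^i\partial_i\xi &= (0,\ 0).
\end{align*}
```

This is consistent with the chart-independence of [X, Y ](p) noted in Remark 32.2.5.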
[ Interpret Theorem 32.2.4 for X = γs , Y = γt . Also interpret this for contours etc. for functions f ∈
C 2 (IR2 , IR). ]
32.4. Lie derivatives of vector fields

In this section, the Lie derivative is defined for vector fields acting on vector fields. This is shown to be
equal to the Poisson bracket. In Section 32.5, the Lie derivative of general tensor fields is defined using the
association between tensor spaces and tangent vector spaces.
32.4.1 Remark: The first task for defining Lie derivatives of vector fields is to define the differential of a
vector under the flow generated by a vector field. This must then be subtracted from the actual differential
of the vector field in the direction of the flow. The difference between the actual differential and the “with
the flow” differential is defined as the Lie derivative of the vector field. This principle applies also to general
tensor fields, but for such fields it is more difficult to define the “with the flow” differential.

32.4.2 Remark: The “flow” of one vector field Y ∈ X 1 (M ) on a C 2 manifold with respect to a second
vector field X ∈ X 1 (M ) is defined in terms of a local family of differentiable maps φ : I → (M → M ) which
is constructed from the field X. This family φ does not necessarily have to exist or be unique. It is only
used as a construct to motivate the definition of the notion of flow. The first field Y may be thought of as
the passive field which is acted on by the active field X. Thus the passive field Y is made to flow according
to the active field X, which may be thought of as the velocity field of a fluid flow.
Now suppose that for all t ∈ I, for some open interval I ⊆ IR with 0 ∈ I, there is a C 2 diffeomorphism
φ(t) : M → M and φ(0) = idM is the identity map on M . Suppose also that for all p ∈ M , the map
t ↦ φ(t)(p) is a C 2 map (actually a curve) from I to M , and that this map satisfies ∂t (φ(t)(p)) = X(p) for
all p ∈ M . (If such a family of diffeomorphisms exists, it is said to be generated by the vector field X.)
For a fixed t ∈ I, the map φ(t) can be differentiated at each point p ∈ M . This gives the differential
(dφ(t))p : Tp (M ) → Tφ(t)(p) (M ) of φ(t) at p. For any tangent vector V = tp,v,ψ ∈ Tp (M ), the tangent vector
(dφ(t))p (V ) represents the result of making V flow under the influence of the field X. The vector V may be
thought of as the velocity of a curve γ which passes through the point p. Under the diffeomorphism φ(t),
this curve is transported to a new curve γ̄ = φ(t) ◦ γ which passes through φ(t)(p). The velocity of γ̄ as it
passes through p̄ = φ(t)(p) is then V̄ = (dφ(t))p (V ).
Let n = dim(M ). Then (dφ(t))p (V ) = tp̄,v̄,ψ2 for any ψ2 ∈ AM,p̄ , where
$$\forall j = 1,\ldots,n,\quad \bar v^j = \sum_{i=1}^n v^i\,\partial_{y^i}\big(\psi_2\circ\phi(t)\circ\psi^{-1}(y)\big)^j\Big|_{y=\psi(p)}.$$
The quantity of interest is the differential of (dφ(t))p (V ) with respect to t when t = 0, because this is the
velocity of the motion of V when it is made to flow according to the field X. The picture to have in mind is
a vector V attached to the point p which moves according to the vector field X along with all other points
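in a neighbourhood of p. The desired differential with respect to t is the vector θX V ∈ TV (T (M )) defined as
$$\theta_X V = \partial_t\big( (d\phi(t))_p(V) \big)\Big|_{t=0}$$
$$= \Big[\Big( (d\phi(t))_p(V),\ \big( \partial_t(\psi\circ\phi(t)(p)),\ \partial_t\big( v^i\,\partial_{y^i}(\psi\circ\phi(t)\circ\psi^{-1}(y))\big|_{y=\psi(p)} \big) \big),\ \tilde\psi \Big)\Big]\Big|_{t=0} \eqno(32.4.1)$$
$$= \Big[\Big( V,\ \big( \xi(\psi(p)),\ v^i\,\partial_{y^i}(\xi(y))\big|_{y=\psi(p)} \big),\ \tilde\psi \Big)\Big], \eqno(32.4.2)$$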
where ψ2 is chosen equal to ψ because φ(0)(p) = p, and ψ̃ ∈ AT (M ) is the chart for T (M ) corresponding to
ψ ∈ AM as in Definition 27.8.1. (If you think the plethora of parentheses is confusing, you obviously haven’t
done much Lisp programming!) The above calculation is possibly not instantly clear. The expression on
line (32.4.1) has the form $t^{(2)}_{V,(w_1,w_2),\tilde\psi}$ for some w1 , w2 ∈ IRn . This is the usual way in which vectors in the
tangent space T (T (M )) are expressed. (See Definitions 27.8.4 and 27.10.13 for details. Recall that vectors
in T (T (M )) have 2n components.) In this case, w1 = ∂t (ψ ◦ φ(t)(p)) is the rate of change of the n base
space coordinates with respect to t. As expected, this is equal to ξ(ψ(p)) ∈ IRn , which is the sequence of
components of X(p).
The second sequence of n components of ∂t ((dφ(t))p (V )) is w2 = ∂t (v i ∂yi (ψ ◦ φ(t) ◦ ψ −1 (y))). The important
step here is to swap the differential operators ∂t and ∂yi . This is okay because v i is constant and ψ ◦ φ(t) ◦
ψ −1 (y) is C 2 with respect to t ∈ I and y ∈ IRn . The result is then w2 = v i ∂yi ∂t (ψ ◦ φ(t) ◦ ψ −1 (y)). But
∂t (ψ ◦ φ(t) ◦ ψ −1 ) = ξ. Line (32.4.2) follows from this.
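A minimal worked instance of the calculation leading to line (32.4.2), under assumptions made only for illustration: M = IR2 , ψ is the identity chart, and φ(t) is rotation by the angle t. Only the values at t = 0 enter the result, so the family φ serves here purely as a motivating construct, as in the remark.

```latex
\begin{align*}
\phi(t)(y) &= (y^1\cos t - y^2\sin t,\ y^1\sin t + y^2\cos t), \qquad
\xi(y) = \partial_t\,\phi(t)(y)\big|_{t=0} = (-y^2,\ y^1), \\
w_1 &= \partial_t\big(\phi(t)(p)\big)\big|_{t=0} = (-p^2,\ p^1) = \xi(p), \\
w_2 &= \partial_t\big(v^1\cos t - v^2\sin t,\ v^1\sin t + v^2\cos t\big)\big|_{t=0}
     = (-v^2,\ v^1) = v^i\,\partial_{y^i}\xi(y)\big|_{y=p}, \\
\theta_X V &= \Big[\Big( V,\ \big( (-p^2, p^1),\ (-v^2, v^1) \big),\ \tilde\psi \Big)\Big].
\end{align*}
```

Since each φ(t) is linear in this example, the components of (dφ(t))p (V ) are obtained by applying the same rotation to v, which is what the third line differentiates.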
32.4.3 Remark: Although the ‘flow velocity’ θX V of a vector V for a vector field X in Remark 32.4.2 is
defined in terms of a family φ of diffeomorphisms which may or may not exist, the flow θX V is calculated
to be the vector on line (32.4.2) which does not require any such diffeomorphisms. Therefore this may be
taken as a general definition of the parallel translation velocity of a vector V for the vector field X.
The terminology “Lie connection” in Definition 32.4.4 is non-standard, but it seems reasonable enough
because the vector (dφ(t))p (V ) in Remark 32.4.2 is called the “Lie transport” of the vector V by the vector
field X. (See for example Crampin/Pirani [12], pages 64–69, for Lie transport.) The ‘Lie connection’ is the
differential of a parallel transport. So it is in fact a connection of a limited kind. However, instead of being
a per-path connection, it is a per-vector-field connection. Hence it is labelled with the vector field rather
than a curve velocity vector as is the case for an affine connection.

32.4.4 Definition: The Lie connection with respect to a C 1 vector field X on a C 2 manifold M with
n = dim(M ) is the map θX : T (M ) → T (T (M )) defined by
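$$\forall p \in M,\ \forall v \in \mathbb{R}^n,\ \forall \psi \in \mathrm{atlas}_p(M),\quad \theta_X V = \Big[\Big( V,\ \big( \xi(\psi(p)),\ v^i\,\partial_{x^i}(\xi(x))\big|_{x=\psi(p)} \big),\ \tilde\psi \Big)\Big],$$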
where V = tp,v,ψ ∈ T (M ), X(q) = tq,ξ(ψ(q)),ψ for all q ∈ Dom(ψ), and ψ̃ is the T (T (M )) chart corresponding
to ψ.
32.4.5 Definition: A Lie transport of a vector V ∈ T (M ) of a C 2 manifold M by a vector field X ∈
X 1 (M ) is a curve γ : I → T (M ) for some interval I ⊆ IR such that 0 ∈ I, $\gamma\big|_{\mathrm{Int}(I)}$ is C 1 , and
(i) γ(0) = V ;
(ii) ∀t ∈ Int(I), γ′(t) = θX (γ(t)),
where θX is the Lie connection with respect to X.
32.4.6 Remark: The Lie transport curve γ in Definition 32.4.5 is a lift of the curve π ◦ γ : I → M in the
same way that curves are lifted by affine connections. (Here π : T (M ) → M is the projection map of T (M ).)
An important difference is that the base point curve π ◦ γ cannot be freely chosen. The base point curve is
generally determined by the starting point p = π(V ) and the field X. So parallelism is defined by this “Lie
connection” only along integral curves of X. Definition 32.4.5 does not require existence or uniqueness of
integral curves. But if existence and uniqueness are guaranteed, then one may refer to the Lie transport of
a vector by a vector field. The Lie connection in Definition 32.4.4 is well-defined even if integral curves of
X are non-existent or non-unique.
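Definition 32.4.5 can be illustrated with one explicit curve. As an assumption made only for this sketch, take M = IR2 with the identity chart ψ and the rotation field X with component function ξ(y) = Jy. The candidate curve γ(t) rotates both the base point and the vector components, and the two conditions of the definition can be verified directly against Definition 32.4.4.

```latex
\begin{align*}
\xi(y) &= J y, \qquad
J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad
R(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}, \qquad
R'(t) = J\,R(t), \\
\gamma(t) &= t_{R(t)p,\ R(t)v,\ \psi}, \qquad \gamma(0) = t_{p,v,\psi} = V, \\
\gamma'(t) &= \Big[\Big( \gamma(t),\ \big( R'(t)p,\ R'(t)v \big),\ \tilde\psi \Big)\Big]
            = \Big[\Big( \gamma(t),\ \big( J R(t)p,\ J R(t)v \big),\ \tilde\psi \Big)\Big], \\
\theta_X(\gamma(t)) &= \Big[\Big( \gamma(t),\ \big( \xi(R(t)p),\ (R(t)v)^i\,\partial_{x^i}\xi \big),\ \tilde\psi \Big)\Big]
            = \Big[\Big( \gamma(t),\ \big( J R(t)p,\ J R(t)v \big),\ \tilde\psi \Big)\Big].
\end{align*}
```

The base point curve π ◦ γ is the integral curve of X through p, in line with the observation that the base curve cannot be freely chosen.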
32.4.7 Remark: The vector θX V in Definition 32.4.4 looks very similar to ∂V X. In fact,
$$\partial_V X = \Big[\Big( X(p),\ \big( v,\ v^i\,\partial_{y^i}(\xi(y))\big|_{y=\psi(p)} \big),\ \tilde\psi \Big)\Big]. \eqno(32.4.3)$$
The vectors X(p) and V in lines (32.4.2) and (32.4.3) appear in different positions. Thus in line (32.4.3),
∂V X ∈ TX(p) (T (M )) and π∗ (∂V X) = V , where π∗ is the induced map or push-forth of π, whereas in
line (32.4.2), θX (V ) ∈ TV (T (M )) and π∗ (θX (V )) = X(p). The vectors ∂V X and θX (V ) have the same
vertical component $v^i\,\partial_{y^i}(\xi(y))\big|_{y=\psi(p)}$, which is, of course, chart-dependent. This vertical component may
be interpreted as the difference between the parallel motion of V with respect to the vector field and the
parallel motion of V with respect to the coordinate system at p.
A problem with the parallel translation vector θX (V ) is the fact that it is an element of TV (T (M )) rather
than the tangent space Tp (M ) where p = π(V ). The set TV (T (M )) ⊆ T (2) (M ) = T (T (M )) contains
“tangents to tangents” as defined in Section 27.10. These vectors are chart-independent because they are
defined in terms of the charts of T (T (M )), not T (M ). Just like the affine connections in Chapter 36, these
vectors should be thought of as “tensorization terms” which are applied to the actual rate of change of vector
fields. The difference between them is then a “vertical vector” which can be converted to a vector in T (M )
by using a “drop function”.
Figure 32.4.1 shows the flow velocities θX (V1 ) and θX (V2 ) for vectors V1 , V2 ∈ Tp (M ) for a vector field X ∈
X 1 (T (M )). This emphasizes that the flow velocities are not elements of the tangent space T (M ).
32.4.8 Remark: The Lie derivative of a vector field Y with respect to a vector field X on a differentiable
manifold M is defined as the difference between the actual rate of change of Y and the rate of change it
would have if it was transported in a parallel fashion by the flow of X. This difference is ∂X(p) Y − θX (Y (p))
for p ∈ M . This is a vertical vector in TY (p) (T (M )), which means that (dπ)Y (p) (∂X(p) Y − θX (Y (p))) = 0,
where (dπ)Y (p) : TY (p) (T (M )) → Tp (M ) is the differential at Y (p) ∈ Tp (M ) of the projection map π :
T (M ) → M . Therefore a drop function 8 : T (T (M )) → T (M ) may be applied to the difference vector to
give LX (Y )(p) = 8(∂X(p) Y − θX (Y (p))) ∈ Tp (M ) for all p ∈ M . This is illustrated in Figure 32.4.2.

Figure 32.4.1 Flow velocities for vectors V1 , V2 for a vector field X (left panel: θX (V1 ) ∈ TV1 (T (M )) and θX (V2 ) ∈ TV2 (T (M )); right panel: ∂V1 X, ∂V2 X ∈ TX(p) (T (M )); both panels drawn over the projection π : T (M ) → M with V1 , V2 , X(p) ∈ Tp (M ))
Figure 32.4.2 Lie derivative of vector field Y with respect to X (shown: ∂X(p) Y and θX (Y (p)) in TY (p) (T (M )); their difference ∂X(p) Y − θX (Y (p)) ∈ ker((dπ)Y (p) ); and LX (Y )(p) = 8Y (p) (∂X(p) Y − θX (Y (p))) ∈ Tp (M ), over the projection π : T (M ) → M )
32.4.9 Definition: The Lie derivative of a C 1 vector field Y with respect to a C 1 vector field X for a C 2
manifold M is the vector field LX Y ∈ X 0 (M ) defined by
∀p ∈ M, (LX Y )(p) = 8Y (p) (∂X(p) Y − θX (Y (p))),
where θX is the Lie connection on M with respect to X and 8 is the drop function given by Definition 27.11.6.
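32.4.10 Remark: In terms of components, $(L_X Y)(p) = t_{p,\,\xi^i\partial_i\eta - \eta^i\partial_i\xi,\,\psi} \in T_p(M)$ for all p ∈ M in Definition 32.4.9
if $X(p) = t_{p,\xi,\psi}$ and $Y(p) = t_{p,\eta,\psi}$ for charts ψ ∈ atlasp (M ). This follows from the calculation
$$8\big( t^{(2)}_{V,(0,\,\xi^i\partial_i\eta - \eta^i\partial_i\xi),\,\tilde\psi} \big) = t_{p,\,\xi^i\partial_i\eta - \eta^i\partial_i\xi,\,\psi}.$$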
32.4.11 Remark: The notation LX Y for Lie derivatives could be confused with the notation Lg X for the
left translation of vector fields in Section 33.3. The former has a vector field as a subscript whereas the latter
has a group element. This will usually remove any ambiguity.
32.4.12 Remark: The hypothetical family of transformations φ in Remark 32.4.2 is more than needed. It
is sufficient to define a two-parameter curve family whose tangent vectors in one direction equal the vector
field Y at p ∈ M and equal X in the other direction. The differential ∂Y X for such a curve family is the
desired parallel flow θX Y of Y with respect to the field X, which must then be subtracted from the actual
rate of change ∂X Y of Y . It just happens that the vertical components of ∂Y X and ∂X Y are the same for
the parallel translated field Y . This leads on to the temptation to claim that LX Y = ∂X Y − ∂Y X. But this
is a misleading claim in some ways.
[ In Remark 32.4.12, put (∂s γ)(0, 0) = Y (p) and ∂t γ = X. ]

[ Try to interpret the ‘equality’ of θX Y and θY X in the case that X = γs and Y = γt by using T [2] (M ) vectors
or some sort of [(p, v, w1 , w2 , ψ)] vectors. ]
32.4.13 Theorem: LX Y = [X, Y ] for any C 1 vector fields X and Y on a C 2 manifold.
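Theorem 32.4.13 can be spot-checked numerically for one pair of fields. The sketch below assumes M = IR2 with the identity chart, takes X to be the rotation field (−y, x), whose flow is rotation by the angle t, and takes Y = (x, 0) as an arbitrary illustrative choice. It also assumes the usual flow expression for the Lie derivative, namely the t-derivative at t = 0 of (dφ(−t)) applied to Y (φ(t)(p)), rather than the construction via θX in Definition 32.4.9. All function names are hypothetical.

```python
import math

# Sketch: compare a flow-based Lie derivative with the component formula
# for [X, Y] on IR^2 (identity chart). X = (-y, x) generates rotations;
# Y = (x, 0) is an arbitrary choice made for this illustration.

def flow(t, p):
    """phi(t)(p): rotate the point p by the angle t (flow generated by X)."""
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def Y(p):
    """Component function of the field Y at the point p."""
    return (p[0], 0.0)

def pulled_back(t, p):
    """(d phi(-t)) applied to Y(phi(t)(p)); a vector at p for every t."""
    # The flow is linear, so its differential is the same rotation matrix.
    return flow(-t, Y(flow(t, p)))

def lie_derivative(p, h=1e-5):
    """Central-difference estimate of d/dt pulled_back(t, p) at t = 0."""
    a, b = pulled_back(h, p), pulled_back(-h, p)
    return tuple((u - v) / (2 * h) for u, v in zip(a, b))

def bracket_components(p):
    """xi^i d_i eta^j - eta^i d_i xi^j, worked out by hand for these fields."""
    x, y = p
    return (-y, -x)

p = (2.0, 3.0)
print(lie_derivative(p))       # close to (-3, -2)
print(bracket_components(p))   # (-3.0, -2.0)
```

The hand-computed bracket uses ξ = (−y, x) and η = (x, 0), giving ξ i ∂i η = (−y, 0) and η i ∂i ξ = (0, x).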

32.4.14 Remark: The Lie derivative with respect to a field X may be thought of as the covariant derivative
for a parallelism which is defined for a curve class C which is the set of integral curves of X. This is a very
limited kind of parallelism because it only sets up isomorphisms between tangent spaces at points which are
connected by these integral curves.
32.4.15 Remark: The identity LX Y = [X, Y ] hides the different ways in which these two expressions
are constructed. The expression LX Y for vector fields X, Y ∈ X 1 (M ) for a C 2 manifold M is formed by
subtracting a flow-parallelism connection term θX Y from the naive derivative ∂X Y to give a T (T (M ))-valued
field ∂X Y − θX Y , so that (∂X Y − θX Y )(p) ∈ T (T (M )) for p ∈ M . It happens that the horizontal component
π∗ ((∂X Y − θX Y )(p)) equals zero, and therefore a drop function 8 may be applied to this vertical vector to
give a vector field 8 ◦ (∂X Y − θX Y ) ∈ X 0 (M ).
On the other hand, the expression [X, Y ] is constructed by first subtracting the combined vectors XY, Y X ∈
X 0 (T [2] (M )) to give the commutator [X, Y ] = XY − Y X ∈ X 0 (T [2] (M )). In this case, it happens that the
second-order component of this commutator equals zero. Therefore it is equivalent to a first-order vector
field [X, Y ] ∈ X 0 (T (M )) by “dropping” the second-order vector into T (M ).
Thus LX Y and [X, Y ] are constructed in different higher spaces but are then dropped into the same
space X 0 (M ). It happens that they have the same value for all fields X, Y ∈ X 1 (M ).
The fact that two calculations return the same answer does not mean that one is necessarily calculating the
same thing. The fact that LX Y = [X, Y ] does not automatically imply that a Lie derivative and a Poisson
bracket have the same meaning. The Lie derivative expression LX Y is just one of a family of expressions
LX K for tensor fields K ∈ X 1 (T r,s (M )) of general type (r, s). The Poisson bracket expression [X, Y ] does
not seem to generalize to tensors of general type. The Poisson bracket is related to curvature concepts
since it typically measures the strength of interaction between families of transformations, whereas the Lie
derivative LX K typically measures the variation of any tensor field K with respect to a given flow field X.
32.4.16 Remark: A third expression which evaluates to the same result as LX Y and [X, Y ] in Remark
32.4.15 is “∂X Y − ∂Y X”. This is not really well defined because (∂X Y )(p) ∈ TY (p) (T (M )) and (∂Y X)(p) ∈
TX(p) (T (M )). But if the chart-dependent vertical components of these two expressions are subtracted, the
chart-independent result is the same as for LX Y and [X, Y ]. More precisely, 8(∂X Y )−8(∂Y X) = XY −Y X
is well-defined, but 8(∂X Y − ∂Y X) is meaningless in general because ∂X Y − ∂Y X is meaningless in general.
To see why ∂X Y − ∂Y X and XY − Y X are similar but different, note that for f ∈ C 2 (M ) and component
functions ξ and η for X and Y respectively, ∂XY f = ∂X ∂Y f = (ξ i ∂i )(η j ∂j )f = ξ i (∂i η j )∂j f + ξ i η j ∂i ∂j f . By
contrast, ∂X Y = ξ i (∂i η j ) looks like the first term in the component expression ∂X ∂Y = ξ i (∂i η j )∂j +ξ i η j ∂i ∂j .
Therefore $\partial_{\partial_X Y} \not\equiv \partial_X \partial_Y$.
This implies that ∂X Y must not be confused with XY or ∂XY . The vector field XY and the opera-
tor field ∂XY may be regarded as interchangeable, but ∂X Y is not equivalent to the other two. In fact,
whereas XY ∈ X 0 (T [2] (M )) and ∂XY ∈ X 0 (T̊ [2] (M )), the field ∂X Y is in the second tangent vector field
space X 0 (T (T (M )), π ◦ π∗ , M ). In other words, XY is valued in T [2] (M ) whereas ∂X Y is valued in T (2) (M ).
Despite the clear difference between ∂XY and ∂X Y , it happens that ∂XY − ∂Y X = ∂[X,Y ] and ∂X Y − ∂Y X
both give the same result when applied to a C 2 real-valued function. The reason for this is that the second
term ξ i η j ∂i ∂j in the expression for ∂XY is cancelled in the commutator ∂X ∂Y − ∂Y ∂X when it is applied to
a real-valued function.
The author would like to take this opportunity to apologize for the banality of Remark 32.4.16.
32.4.17 Theorem: LX Y = 8(∂X Y ) − 8(∂Y X) = XY − Y X for all vector fields X, Y ∈ X (M ) for a C 2
manifold M .
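The cancellation of the second-order terms described in Remark 32.4.16 can be checked numerically. The following is only a sketch under stated assumptions: the fields X = x ∂/∂y and Y = y ∂/∂x on IR^2, the test function f, the step size H and all function names are illustrative choices which do not come from the text, and derivatives are approximated by central differences.

```python
# Sketch: compare the commutator XY - YX with the bracket field [X, Y] on IR^2.
# All names and the choice of fields are hypothetical, for illustration only.

H = 1e-3  # finite-difference step (an arbitrary choice)

def X(p):
    """Components (xi^1, xi^2) of the field X = x d/dy."""
    x, y = p
    return (0.0, x)

def Y(p):
    """Components (eta^1, eta^2) of the field Y = y d/dx."""
    x, y = p
    return (y, 0.0)

def partial(f, p, i):
    """Central-difference approximation to the i-th partial derivative of f at p."""
    hi = list(p)
    lo = list(p)
    hi[i] += H
    lo[i] -= H
    return (f(tuple(hi)) - f(tuple(lo))) / (2 * H)

def apply_field(V, f):
    """Return the function V f = xi^i d_i f (the first-order operator of V)."""
    def Vf(p):
        comps = V(p)
        return sum(comps[i] * partial(f, p, i) for i in range(len(p)))
    return Vf

def bracket(V, W):
    """Return the field with components xi^i d_i eta^j - eta^i d_i xi^j."""
    def B(p):
        v, w = V(p), W(p)
        n = len(p)
        comps = []
        for j in range(n):
            t1 = sum(v[i] * partial(lambda q: W(q)[j], p, i) for i in range(n))
            t2 = sum(w[i] * partial(lambda q: V(q)[j], p, i) for i in range(n))
            comps.append(t1 - t2)
        return tuple(comps)
    return B

def commutator(V, W, g):
    """Return the function (VW - WV) g obtained by composing the two operators."""
    VWg = apply_field(V, apply_field(W, g))
    WVg = apply_field(W, apply_field(V, g))
    return lambda p: VWg(p) - WVg(p)

def f(p):
    """An arbitrary test function on IR^2."""
    x, y = p
    return x * x * y + y ** 3

if __name__ == "__main__":
    p = (1.5, 0.5)
    print(commutator(X, Y, f)(p), apply_field(bracket(X, Y), f)(p))
```

The two printed numbers should agree up to discretization error, although each of the composed operators XY and YX separately contains second derivatives of f.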
32.4.18 Remark: Continuing from Section 32.3, the Lie derivative LX Y may be calculated for vector fields
X and Y as in Theorem 32.3.2. Within the domain Dom(ψ) of a C 2 chart ψ, LX Y = 8(∂X Y − θX Y ) with
(∂X Y )(p) = t^{[2]}_{Y (p),((1,0,...0),(0,...0)),ψ̃} ∈ T [2] (M ) from Theorem 32.3.2, and θX (Y (p)) = t^{[2]}_{Y (p),((1,0,...0),(0,...0)),ψ̃}
from line (32.4.2), for all p ∈ Dom(ψ). Hence LX Y = 0. This is as expected from the identity LX Y = [X, Y ].
[ See Crampin/Pirani [12], page 77, for comparison of Lie derivative with covariant derivative. ]
[ Must do all of the Lie derivatives in this section also in the flat-space tensor chapter. ]

32.5. Lie derivatives of tensor fields

Lie derivatives are an extension of the Poisson bracket from vector fields to general tensor fields. Or perhaps
the Poisson bracket of vector fields is a special case of the Lie derivative. More precisely, the Poisson bracket
is a particular construction from two vector fields, whereas the Lie derivative is a family of operations on
fibre bundles which are associated with the tangent bundle of a manifold, and in the special case of the
Lie derivative of a vector field (a cross-section of the tangent bundle), the value of the Lie derivative just
happens to be the same as the Poisson bracket.
In most texts, the Lie derivative is expressed in a rather abstract form in terms of a family of local diffeo-
morphisms. In practice, it is essential to have a concrete expression for Lie derivatives acting on any given
kind of vector, covector or tensor field. To obtain an explicit expression for the Lie derivative for general
tensors and other fibre bundles on a given base manifold, it is necessary to use the concept of “associated
parallelism”, which is defined in Section 24.3. The Lie derivative is straightforward to define for the tangent
bundle of a differentiable manifold. The Lie derivatives for cross-sections of all other fibre bundles on the
same manifold are defined in terms of fibre bundle associations, which are defined in Section 23.10.3.
Just as covariant derivatives may be extended from cross-sections of tangent bundles (i.e. vector fields) to
cross-sections of arbitrary associated tangent fibre bundles (such as tensor fields and differential forms) by
using the concept of associated parallelism (and associated connections), so also Lie derivatives may be
extended from vector fields to general associated fibre bundles. Textbooks often perform this extension
by differentiating various contractions of vector and tensor fields, but this is just one way of defining the
association between different types of tensor fields.
[ Gallot/Hulin/Lafontaine [20], defn 1.109 gives a definition of Lie derivative for general types of tensor fields.
Must give an explicit formula for Lie derivatives of general tensor field types in terms of derivatives of K
and X. Will use associated fibre bundles for this. Will this require differentiable fibre bundles? ]
[ Maybe use a notation such as L^{r,s}_X : X^1_{r,s}(M) → X^0_{r,s}(M) for Lie derivatives of tensors of general type. ]
32.6. The exterior derivative

[ Try here to motivate or justify the definition of exterior derivative in terms of its superior properties. ]
32.6.1 Definition: The exterior derivative on the space (algebra?) Λ∗ (N, W ) (?) of C r differential forms
with coefficients in W on a subset N of a C ∞ manifold M is the map d : Λ∗ (N, W ) → Λ∗ (N, W ) (effectively)
defined by
∀ω ∈ Λm (N, W ), dω = . . .
[ See Malliavin, defn. I.4.4, p.117, Gallot et alia, and EDM 108.Q(2) for definitions of the exterior derivative. ]
[ Give the definition of Lie derivative of a differential form. ]
[ Here must present measure and integration theory for differential forms on differentiable manifolds. Present
the Gauß-Green theorem, the Stokes theorem, etc. etc. ]

Chapter 33
Differentiable groups
33.1 Lie groups
33.2 Hilbert’s fifth problem
33.3 Left invariant vector fields on Lie groups
33.4 Right invariant vector fields on Lie groups
33.5 The Lie algebra of a Lie group
33.6 Diffeomorphism groups
33.7 Lie transformation groups
33.8 Infinitesimal transformations
33.0.1 Remark: Three kinds of group are presented in this chapter:

(1) Lie group: a group which is a differentiable manifold;
(2) Lie transformation group: a Lie group which is a group of diffeomorphisms of a differentiable manifold;
(3) diffeomorphism group: a topological group of diffeomorphisms of a differentiable manifold.
Diffeomorphism groups which are not necessarily differentiable manifolds (case (3)) are perhaps not, strictly
speaking, “differentiable groups”, but it is convenient to present them together with Lie groups and Lie
transformation groups. Since a group may always be regarded as a transformation group of itself, (1) may
be considered as a special case of (2) while (2) is a special case of (3).
33.0.2 Remark: Lie groups may be viewed as groups which happen to be differentiable or as differentiable
manifolds which happen to have a group structure. In the latter perspective, the group structure puts a
very strong constraint on the manifold. In fact, the manifold is in some senses almost completely determined
by a finite set of parameters which determine the Lie algebra at the origin. From this point of view, the
relation of Lie groups to general differentiable manifolds may be regarded as analogous to the relation of
polynomials to general differentiable functions of several real variables. In other words, the Lie groups are a
(relatively) tiny subclass of general differentiable manifolds. Nevertheless, just like polynomials, Lie groups
have an enormous range of applications and have therefore been very intensively studied.
Lie groups enter into differential geometry in two distinct ways: as structure groups for fibre bundles and as
examples of differentiable manifolds. In this chapter, Lie groups are considered only in their role as structure
groups. The reason for this is the cyclical nature of the definitions for Lie groups. Differentiable manifolds
provide a substrate for differentiable fibre bundles, which require Lie groups for their full definition. But Lie
groups are themselves differentiable manifolds which have differentiable fibre bundles defined on them. So
there is an infinite tree of manifolds supporting fibre bundles, which include Lie groups which are manifolds
with fibre bundles defined on them, and so forth. To break this infinite cycle, tangent bundles were defined
on differentiable manifolds in Chapter 27 so as not to be a sub-class of differentiable fibre bundles. The resulting
tree of definition dependencies is illustrated in Figure 33.0.1.
33.0.3 Remark: The definitions for differentiable groups require the differentiable manifold definitions of
Chapter 26, and are in turn required by the definitions of differentiable fibre bundles in Chapter 34 and
connections in Chapters 35 and 36.


Figure 33.0.1 Family tree of Lie groups and differentiable fibre bundles (node labels: tangent bundle; Lie group; differentiable fibre bundle; Lie group with differentiable fibre bundle; connection; Lie group with connection)
33.0.4 Remark: The core concept of differential geometry is curvature. The most general concept of
curvature on a differentiable fibre bundle is defined in terms of the exterior derivative of vector fields (in-
finitesimal left actions) on the fibre space which are induced by motion along curves in the base space. Such
an exterior derivative is equal to a Poisson bracket of vector fields. Therefore curvature and Lie algebras
of Lie groups are closely related. Since curvature in physical models is very often identified with the force
acting on a system, equations of motion and Lie algebras of Lie groups are also closely related. Similarly,
parallelism is closely related to least action principles. It is not surprising, therefore, that Lie groups and
Lie algebras are of such importance in physics.
33.0.5 Remark: It seems that the development of Lie groups (1869–99) arose out of a desire to apply the
transformation group approach of Galois theory (1832–46) to differential equations. Lie groups are also used
as structure groups for principal fibre bundles, associated with an attempt to make differential geometry
correspond in some way to the Erlanger Programm (1872) where geometries are thought of as the study of
invariants of transformation groups. David Hilbert’s fifth problem (1900) asked the question of whether a
topological group is always a Lie group. This was followed by over 50 years of effort to show finally (1952–53)
that the answer is yes, more or less.
Marius Sophus Lie (1842–1899) defined Lie groups to require first and second order differentiability. Hilbert
asked whether this requirement could be weakened. By 1952–53, it was shown that continuity and some weak
conditions would guarantee analyticity. It is therefore customary to define Lie groups as being analytic,
and the analytic condition has accordingly been adopted in Section 33.1.
In the case of Lie transformation groups, if the space acted upon by the group is C^k for some k ∈ \bar{ℤ}^+_0 and
the group action is likewise C^k for a fixed group element, then the group action can be proved to be C^k
with respect to the group elements also. It does not seem that analyticity follows in this case. Therefore
the group actions of Lie transformation groups (Section 33.7) are defined here as having only C^k regularity
for some k ∈ \bar{ℤ}^+_0, although the groups themselves are defined to be analytic.
33.1. Lie groups

Malliavin [36], page 156, defines Lie groups to have a manifold structure of class C 3 and group action and
inverse map of class C 3 . Gallot et alia [20], page 27, require class C ∞ . EDM2 [35], section 249.A, requires
an analytic manifold and analytic group operations. Kobayashi and Nomizu [27], page 38, require C ∞
regularity, but they say on page 43 that the C ∞ condition may be replaced with analyticity.
This very confusing range of definitions for Lie groups seems to arise from the fact that they are all really
the same, if some fairly weak conditions are assumed. The simplest way to deal with this confusion is to
define Lie groups with the strongest reasonable assumptions, and then provide the theorems which guarantee
such strong assumptions in terms of weaker assumptions. Therefore Definition 33.1.1 is phrased in terms of
analyticity.
[ Define here a differentiable group of class C k . Then show that for such a group, there is an analytic atlas
for which the group is a Lie group. ]
33.1.1 Definition: A Lie group is a triple (G, AG , σ) such that

(i) (G, σ) is a group;

(ii) (G, AG ) is an analytic manifold;
(iii) the operation σ : G × G → G is analytic with respect to AG ;
(iv) the inversion map f : G → G with f : g ↦ g^{−1} is analytic with respect to AG .
33.1.2 Remark: If TG denotes the topology induced by the atlas AG in Definition 33.1.1, then (G, TG , σ)
is a locally Euclidean topological group. See Definition 16.7.1 for topological groups and Definition 25.2.3
for locally Euclidean topology.
EDM2 [35], section 249.A, requires paracompactness of the manifold (G, AG ) in Definition 33.1.1. This seems
to be superfluous. According to Warner [50], lemma 1.9, page 9, any locally compact, Hausdorff, second
countable topological space is paracompact. He says that all manifolds are second countable. So it seems
that there are adequate conditions to guarantee paracompactness without it being explicitly stated. (See
Definition 15.7.14 for paracompactness.)
33.1.3 Remark: Many texts require the group operation (x, y) ↦ σ(x, y^{−1}) = xy^{−1} from G × G to G to
be analytic for a Lie group. The purpose of this is to imply that the map y ↦ y^{−1} is analytic. However, this
is superfluous since, as remarked in Montgomery/Zippin [77], page 49, the analyticity of the map g ↦ g^{−1}
follows from the analyticity of σ by the implicit function theorem. So condition (iv) of Definition 33.1.1 is
superfluous.
[ Prove that g ↦ g^{−1} is analytic. Then remove this from Definition 33.1.1. ]
33.1.4 Remark: Lie groups, regarded as transformation groups of themselves, are particularly useful for
defining differentiable principal fibre bundles.
[ Near here, present Lie group homomorphisms, isomorphisms and automorphisms. Also define “inner
automorphisms” g′ ↦ g g′ g^{−1} or g′ ↦ g^{−1} g′ g. These are also called conjugation maps. See Definition 9.3.22. ]
33.2. Hilbert’s fifth problem

33.2.1 Remark: Apparently continuity can be substituted for analyticity in Definition 33.1.1, and the
analyticity then follows. (See Sulanke and Wintgen [86].) Kobayashi and Nomizu [27], page 38, require a
Lie group to be a C ∞ manifold and state that it follows that the manifold is real analytic (page 43).
EDM2 [35], section 423.N, says that Hilbert’s fifth problem (in 1900) posed precisely this question, and it
was resolved in the positive in 1952. “It was proved [in 1952] that any locally connected finite-dimensional
locally compact group is a Lie group.” Of course, since a topological group does not have a differentiable
structure, it must be provided. So what this means is that there exists a differentiable atlas AG , compatible
with the topology TG on the topological group, for which the group (G, AG , σ) is a Lie group.
The real reference for this problem, known as Hilbert’s 5th problem, is the book by Montgomery/Zippin [77].
In section 4.10, they give some theorems on this subject which, unfortunately, are a little perplexing. Two
of their theorems are quoted here as Theorems 33.2.2 and 33.2.4.
33.2.2 Theorem: A locally euclidean group has no small subgroups and is isomorphic to a Lie group.
33.2.3 Remark: Theorem 33.2.2 is stated in Montgomery/Zippin [77], pages 70 and 184. They attribute
this result to Gleason [67] and Montgomery/Zippin [78].
33.2.4 Theorem: A locally compact group which is finite-dimensional and locally-connected is a Lie group.
33.2.5 Remark: Theorem 33.2.4 is stated in Montgomery/Zippin [77], page 185.

A difficulty with Theorem 33.2.4 is the apparent fact that any finite-dimensional topological group must also
be locally compact and locally connected. This raises the question of why such superfluous conditions are
included. One possibility is that the authors did not know that they were superfluous. Another possibility
is that they required the locally compact and locally connected conditions principally, and added the finite-
dimensional condition as an afterthought without thinking about the dependencies among the conditions.
It is also possible that their conditions are non-standard and therefore are not interdependent somehow.

The definition of finite-dimensional used by Montgomery/Zippin [77] is in terms of homeomorphisms to
Euclidean spaces, not the more abstract topological definition of Lebesgue dimension. (See Section 15.9
for topological dimension.) [ See EDM2 [35], section 117.B. ] They go on to say that for locally compact
metric groups, they would use the more general topological dimension, except that this would be equivalent
to using “cells” to define dimension. Since they do not define “cells”, it is difficult to know exactly what
they mean. Perhaps their page 184 clarifies this a little. This seems to imply that their n-cell is an n-cube
or similar structure, or possibly a higher-dimensional tetrahedron. They also seem to be indicating that by
“dimension” they do mean some sort of generalized topological dimension from dimension theory, rather
than euclidean dimension.
Another difficulty with Theorems 33.2.2 and 33.2.4 is the fact that one of them talks about being isomorphic
to a Lie group whereas the other says that the group is a Lie group. The latter is probably just an informal
way of saying that the group is isomorphic under some change of coordinates.
All in all, it seems that Theorems 33.2.2 and 33.2.4 may be equivalent. If the generalized topological
dimension is intended in Theorem 33.2.4, then certainly Theorem 33.2.2 would follow from it, since a locally
Euclidean space is locally compact, locally connected and finite dimensional according to any reasonable
definition of topological dimension.
Figure 33.2.1 shows some family relations between topological groups and Lie groups.
Figure 33.2.1 Family tree of topological and Lie groups (node labels: topological group (G,TG ,σG ); locally compact top. group (G,TG ,σG ); locally connected top. group (G,TG ,σG ); locally Euclidean group (G,TG ,σG ); Lie group (G,AG ,σG ); edge label: EDM2, 423.N)
33.3. Left invariant vector fields on Lie groups

General invariants of transformation groups are discussed in Section 9.7.
33.3.1 Remark: The left translation operators in Definition 33.3.2 are required for the definition of the
Lie algebra of a Lie group. The superscripts on the symbols for these operators distinguish the kinds of
spaces which they operate on: G for group elements, C for continuous real-valued functions, T for tangent
vectors, and F for fields. The superscript is omitted when the application space is obvious. As one would
expect, all of the left translation operators in Definition 33.3.2 become identity maps when g equals the
identity e of G.
33.3.2 Definition: Let G be a Lie group.
The left translation operator (for group elements) L^G_g : G → G is defined for g ∈ G by
∀x ∈ G, L^G_g(x) = gx.
The left translation operator (for real-valued functions) L^C_g : C^0(G) → C^0(G) is defined for g ∈ G by
∀φ ∈ C^0(G), L^C_g(φ) = φ ◦ L^G_{g^{−1}}.
That is,
∀φ ∈ C^0(G), ∀x ∈ G, L^C_g(φ)(x) = φ(g^{−1}x).
The left translation operator (for tangent operators) L̊^T_g : T̊(G) → T̊(G) is defined for g ∈ G by
∀V ∈ T(G), L̊^T_g(D_V) = D_V ◦ L^C_{g^{−1}}.
That is,
∀V ∈ T(G), ∀φ ∈ C^1(G), L̊^T_g(D_V)(φ) = D_V(L^C_{g^{−1}} φ)
= D_V(φ ◦ L^G_g).
The left translation operator (for tangent operator fields) L̊^F_g : X̊^0(G) → X̊^0(G) is defined for g ∈ G by
∀X ∈ X^0(G), L̊^F_g(D_X) = L̊^T_g ◦ (D_X ◦ L^G_{g^{−1}}).
That is,
∀X ∈ X^0(G), ∀x ∈ G, L̊^F_g(D_X)(x) = L̊^T_g(D_X(g^{−1}x)).
That is,
∀X ∈ X^0(G), ∀x ∈ G, ∀φ ∈ C^1(G),
L̊^F_g(D_X)(x)(φ) = D_X(L^G_{g^{−1}} x)(L^C_{g^{−1}} φ)
= D_X(g^{−1}x)(φ ◦ L^G_g).
[ Should extend Definition 33.3.2 to all tensor spaces T r,s (G) and some other spaces. ]
33.3.3 Remark: The interpretation of Definitions 33.3.2 and 33.3.7 may be assisted by Figure 33.3.1.
Figure 33.3.1 Left translation operators for Lie groups (panels: L^G_g acting on x, L^C_g acting on φ, L^T_g acting on V, L^F_g acting on X)
33.3.4 Remark: Definition 33.3.2 shows the convenience of using tangent operators DV ∈ T̊ (M ) instead
of tangent vectors V ∈ T (M ). Just as in distribution theory, the translation of differential operators may
be easily expressed in terms of the reverse translation of test functions. For differentiable manifolds, it
is necessary to convert the tangent operator back into a tangent vector. This conversion is discussed in
Remark 27.6.6 and elsewhere.
In the case of tangent vector fields X ∈ X 0 (G), the notation DX means the pointwise assignment of
an operator to a vector. In other words, DX ∈ X̊ 0 (G) is defined by DX (x) = DX(x) for all x ∈ G
and X ∈ X 0 (G).
The left translation operators L̊^T_g and L̊^F_g apply to tangent operators D_V and tangent operator fields D_X, whereas
L^T_g and L^F_g apply to non-operator tangent vectors V and vector fields X respectively.

33.3.5 Remark: In Definition 33.3.2, L̊Tg (DV ) ∈ T̊gx (G) for all V ∈ Tx (G) (i.e. for all DV ∈ T̊x (G)). So
L̊Tg does not define an automorphism of T̊x (G), but it is an automorphism of the total tangent space T̊ (G).
The left translation operator for vector fields L̊^F_g is defined by taking the value of the field X at g^{−1}x, but
the test function φ must then also be moved back to g^{−1}x in order to apply the vector X(g^{−1}x) to it.
33.3.6 Remark: Definition 33.3.7 gives the non-operator tangent vector version of Definition 33.3.2 for
the left translation functions L^T_g and L^F_g. Whereas the tangent operators in Definition 33.3.2 act on test
functions, the tangent vectors in Definition 33.3.7 use the induced map (L^G_g)_∗.
The induced map (dL^G_g)_∗ in Definition 33.3.7 may be written in terms of the pointwise differential. For
all V ∈ T(G), (dL^G_g)_∗(V) = (dL^G_g)_{π(V)}(V) and (dL^G_g)_∗(X(g^{−1}x)) = (dL^G_g)_{g^{−1}x}(X(g^{−1}x)).
33.3.7 Definition: Let G be a Lie group.
The left translation operator (for tangent vectors) L^T_g : T(G) → T(G) is defined for g ∈ G by L^T_g = (L^G_g)_∗:
∀V ∈ T(G), L^T_g(V) = (L^G_g)_∗(V).
The left translation operator (for vector fields) L^F_g : X^0(G) → X^0(G) is defined for g ∈ G by
∀X ∈ X^0(G), L^F_g(X) = L^T_g ◦ X ◦ L^G_{g^{−1}}
= (L^G_g)_∗ ◦ X ◦ L^G_{g^{−1}}.
That is,
∀X ∈ X^0(G), ∀x ∈ G, L^F_g(X)(x) = L^T_g(X(g^{−1}x))
= (L^G_g)_∗(X(g^{−1}x)).
33.3.8 Theorem: Definition 33.3.7 is consistent with Definition 33.3.2. That is, D_{L^T_g(V)} = L̊^T_g(D_V) for all
g ∈ G and V ∈ T(G); and D_{L^F_g(X)} = L̊^F_g(D_X) for all g ∈ G and X ∈ X^0(G).
Proof: For g ∈ G, V ∈ T(G) and φ ∈ C^1(G), it follows from Definition 33.3.2 that L̊^T_g(D_V)(φ) = D_V(φ ◦ L^G_g).
By Definition 30.3.23, this equals (L^G_g)_∗(D_V)(φ). So L̊^T_g(D_V) = (L^G_g)_∗(D_V). By Theorem 30.3.20, this
equals D_{(L^G_g)_∗(V)}. By Definition 33.3.7, (L^G_g)_∗(V) = L^T_g(V). Therefore L̊^T_g(D_V) = D_{L^T_g(V)} as claimed.
The equivalence D_{L^F_g(X)} = L̊^F_g(D_X) follows similarly from the calculation
L̊^F_g(D_X)(x)(φ) = D_X(g^{−1}x)(φ ◦ L^G_g) = (L^G_g)_∗(D_X(g^{−1}x))(φ),
so that L̊^F_g(D_X)(x) = (L^G_g)_∗(D_X(g^{−1}x)) = D_{(L^G_g)_∗(X(g^{−1}x))} for x, g ∈ G
and X ∈ X^0(G). But by Definition 33.3.7, (L^G_g)_∗(X(g^{−1}x)) = L^F_g(X)(x) for all x ∈ G, and the result
follows from this.
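The consistency asserted by Theorem 33.3.8 can be illustrated on a concrete group. The sketch below assumes the affine group of the line, with elements (a, b), a > 0, and product (a, b)(c, d) = (ac, ad + b), in the single global chart (a, b). The group, the test function and all function names are hypothetical choices for illustration, and the tangent operator is approximated by a central difference.

```python
import math

H = 1e-5  # finite-difference step (an arbitrary choice)

def mul(g, x):
    """Product in the affine group of the line: (a, b)(c, d) = (ac, ad + b)."""
    a, b = g
    c, d = x
    return (a * c, a * d + b)

def D(V):
    """Tangent operator D_V: phi -> directional derivative of phi along V.

    A tangent vector V is modelled as (base point, components) in the chart (a, b).
    """
    (c, d), (v1, v2) = V
    def DV(phi):
        plus = phi((c + H * v1, d + H * v2))
        minus = phi((c - H * v1, d - H * v2))
        return (plus - minus) / (2 * H)
    return DV

def translate_operator(g, DV):
    """Operator version: (translated D_V)(phi) = D_V(phi o L^G_g)."""
    return lambda phi: DV(lambda x: phi(mul(g, x)))

def translate_vector(g, V):
    """Vector version: push V forward by L^G_g.

    In this chart the Jacobian of x -> gx is a times the identity for g = (a, b).
    """
    a, _ = g
    x, (v1, v2) = V
    return (mul(g, x), (a * v1, a * v2))

def phi(p):
    """An arbitrary smooth test function on the group."""
    c, d = p
    return c * c * d + math.sin(d)

if __name__ == "__main__":
    g = (2.0, 3.0)
    V = ((0.5, -1.0), (1.0, 4.0))
    print(translate_operator(g, D(V))(phi), D(translate_vector(g, V))(phi))
```

The first number translates the operator by acting on the test function; the second translates the vector and then forms its operator. They should agree up to rounding.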
33.3.9 Remark: In Definition 33.3.7, LTg (V ) ∈ Tgx (G) for all V ∈ Tx (G). So LTg does not define an
automorphism of Tx (G), but it is an automorphism of the total tangent space T (G) since L^T_g = (L^G_g)_∗.
33.3.10 Theorem: For all elements g1, g2 ∈ G of a Lie group G, L^G_{g1} ◦ L^G_{g2} = L^G_{g1 g2}, L^C_{g1} ◦ L^C_{g2} = L^C_{g1 g2},
L̊^T_{g1} ◦ L̊^T_{g2} = L̊^T_{g1 g2}, L̊^F_{g1} ◦ L̊^F_{g2} = L̊^F_{g1 g2}, L^T_{g1} ◦ L^T_{g2} = L^T_{g1 g2} and L^F_{g1} ◦ L^F_{g2} = L^F_{g1 g2}.
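The first two identities of Theorem 33.3.10 can be tried out on a small example. This is a sketch only, assuming the affine group of the line with elements (a, b), a > 0, and product (a, b)(c, d) = (ac, ad + b). The group and the helper names are hypothetical and are not taken from the text. The inverse in the definition of the function version is what makes the composition order come out the same as for group elements.

```python
def mul(g, h):
    """Product in the affine group of the line: (a, b)(c, d) = (ac, ad + b)."""
    a, b = g
    c, d = h
    return (a * c, a * d + b)

def inv(g):
    """Inverse in the affine group: (a, b)^{-1} = (1/a, -b/a)."""
    a, b = g
    return (1.0 / a, -b / a)

def left_G(g):
    """Left translation of group elements: x -> gx."""
    return lambda x: mul(g, x)

def left_C(g):
    """Left translation of functions: phi -> phi o L^G_{g^{-1}}."""
    g_inv = inv(g)
    return lambda phi: (lambda x: phi(mul(g_inv, x)))

def close(p, q, tol=1e-12):
    """Componentwise comparison of two group elements."""
    return all(abs(s - t) <= tol for s, t in zip(p, q))

if __name__ == "__main__":
    g1, g2, x = (2.0, 1.0), (0.5, -3.0), (4.0, 0.25)
    phi = lambda p: p[0] + 10.0 * p[1]
    print(left_G(g1)(left_G(g2)(x)), left_G(mul(g1, g2))(x))
    print(left_C(g1)(left_C(g2)(phi))(x), left_C(mul(g1, g2))(phi)(x))
```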
33.3.11 Definition: A left invariant vector field on a Lie group G is a vector field X ∈ X 0 (G) such that
∀g ∈ G, L^F_g(X) = X.
33.3.12 Theorem: Let G be a Lie group. Then X(g) = (L^G_g)_∗(X(e)) = (dL^G_g)_e(X(e)) for any left invariant
vector field X on G and any g ∈ G, where e is the identity of G.
Proof: Let X be a left invariant vector field on G. Then by Definitions 33.3.11 and 33.3.7, X(g) =
(L^G_h)_∗(X(h^{−1}g)) for any g, h ∈ G. Put h = g. Then X(g) = (L^G_g)_∗(X(e)) for all g ∈ G.
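The formula X(g) = (dL^G_g)_e(X(e)) can be used to build a vector field from a single vector at the identity and then to test its left invariance numerically. The following sketch assumes the affine group of the line, with elements (a, b), a > 0, product (a, b)(c, d) = (ac, ad + b) and identity (1, 0), in the global chart (a, b). The group, the step size and the function names are illustrative assumptions, and differentials are approximated by central differences.

```python
H = 1e-4          # finite-difference step (an arbitrary choice)
E = (1.0, 0.0)    # identity of the affine group

def mul(g, h):
    """Product in the affine group of the line: (a, b)(c, d) = (ac, ad + b)."""
    a, b = g
    c, d = h
    return (a * c, a * d + b)

def inv(g):
    """Inverse in the affine group: (a, b)^{-1} = (1/a, -b/a)."""
    a, b = g
    return (1.0 / a, -b / a)

def differential(F, p, v):
    """Central-difference approximation to (dF)_p(v) in chart coordinates."""
    plus = F(tuple(pi + H * vi for pi, vi in zip(p, v)))
    minus = F(tuple(pi - H * vi for pi, vi in zip(p, v)))
    return tuple((s - t) / (2 * H) for s, t in zip(plus, minus))

def field_from_identity(V):
    """Return the field g -> (dL^G_g)_e(V), whose value at e is V."""
    return lambda g: differential(lambda x: mul(g, x), E, V)

def left_F(g, X):
    """Left translate of a field: x -> (dL^G_g)_{g^{-1}x}(X(g^{-1}x))."""
    def translated(x):
        y = mul(inv(g), x)
        return differential(lambda z: mul(g, z), y, X(y))
    return translated

if __name__ == "__main__":
    X = field_from_identity((1.0, 2.0))
    g, x = (2.0, 3.0), (0.5, -1.0)
    print(X(x), left_F(g, X)(x))
```

In this chart the field obtained from V = (v1, v2) has components (a v1, a v2) at the point (a, b), so the two printed vectors should coincide up to rounding.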
33.3.13 Remark: As mentioned in Remark 33.8.8, left invariant vector fields are infinitesimal right actions.

33.3.14 Theorem: Let G be a Lie group with identity e. Then for all V ∈ Te (G) there is a unique left
invariant vector field XV ∈ X 0 (G) such that XV (e) = V . Moreover, XV ∈ X ∞ (G) and XV (g) = LTg V for
all g ∈ G. Conversely, any left invariant vector field on G is of the form XV for some V ∈ Te (G).
Proof: Define XV : G → T (G) for V ∈ Te (G) by XV (g) = LTg V for all g ∈ G. Then XV (g) ∈ Tg (G) for
all g ∈ G. So XV ∈ X(G). To show that XV is left invariant, let x, g ∈ G. Then
L^F_g(X_V)(x) = L^T_g(X_V(g^{−1}x))
= L^T_g L^T_{g^{−1}x} V
= L^T_x V
= X_V(x).
So XV is left invariant by Definition 33.3.11. (This is the same as Theorem 33.3.12.)

To prove that X_V is C^∞, it must be shown (by Remark 28.5.4) that ψ̃ ◦ X_V ◦ ψ^{−1} is C^∞ for all ψ ∈ atlas(G),
where ψ̃ ∈ atlas(T(G)) denotes the tangent space chart corresponding to each manifold chart ψ ∈ atlas(G).
(See Definition 27.8.1 for tangent space charts.) By Definition 30.3.1 for the differential of a map, X_V(g) =
(dL^G_g)_e(V) = t_{g,w,ψ2} for ψ2 ∈ atlas_g(G), where w ∈ IR^n is defined by
∀k = 1 . . . n, w^k = Σ_{i=1}^n v^i ∂_{x^i}(ψ2^k ◦ L^G_g ◦ ψ1^{−1}(x)) |_{x=ψ1(e)}
= Σ_{i=1}^n v^i ∂_{x^i}(ψ2^k(σ(g, ψ1^{−1}(x)))) |_{x=ψ1(e)},
where v ∈ IR^n satisfies V = t_{e,v,ψ1} for ψ1 ∈ atlas_e(G), and σ : G × G → G is the C^∞ group action of G.
Define f : U → IR^n by f(x, y) = ψ2(σ(ψ2^{−1}(y), ψ1^{−1}(x))) for (x, y) ∈ U = Range(ψ1) × Range(ψ2) ⊆ IR^{2n}. Then
by the definition of a C^∞ manifold map σ : G × G → G, f is a C^∞ function. Therefore the derivative
function (x, y) ↦ h(x, y) = Σ_{i=1}^n v^i ∂_{x^i} f(x, y) is also C^∞. It follows that w is C^∞ with respect to g,
since w = h(ψ1(e), ψ2(g)). But for all y ∈ Range(ψ2), ψ̃2(X_V(ψ2^{−1}(y))) = (y, h(ψ1(e), y)) ∈ IR^{2n}, which is now
obviously C^∞ with respect to y. Therefore X_V is C^∞ as claimed.
The uniqueness and the converse follow from Definition 33.3.7 with x = g and X(e) = V .
33.3.15 Remark: The proof of Theorem 33.3.14 shows that left invariant vector fields XV are analytic.
33.3.16 Remark: One may try to prove the C^∞ regularity in Theorem 33.3.14 using the test-function
version of the left translation operator L̊^T_g. In this case, the left invariant tangent operator field X_V ∈ X̊^0(G)
would be constructed as X_V(g) = L̊^T_g(D_V) for V ∈ T_e(G). In this case, the test for C^∞ regularity would
require X_V(φ) ∈ C^∞(G) for all φ ∈ C^∞(G). For any x ∈ G, X_V(φ)(x) = X_V(x)(φ) = L̊^T_x(D_V)(φ) =
D_V(φ ◦ L^G_x). It would be argued that φ ◦ L^G_x is C^∞ with respect to x, and therefore D_V(φ ◦ L^G_x) must be C^∞
with respect to x, and therefore the C^∞ regularity of X_V(φ) is proved. However, although it is “obvious”
that x ↦ D_V(φ ◦ L^G_x) is C^∞, it is not obvious how one would prove it. To show that D_V(φ(σ_G(x, ·))) is C^∞
would probably require some analysis using charts as in the proof of Theorem 33.3.14.
[ Is XL∞ (G) in Theorem 33.3.17 the same thing as XL (G)? ]
33.3.17 Theorem: The set XL∞ (G) of left invariant C ∞ vector fields on G is a subalgebra of the Lie
algebra X ∞ (G) of C ∞ vector fields on G.
Proof: It must be shown that XL∞ (G) is closed under both linear combinations and the Poisson bracket.
...
33.3.18 Remark: The tangent bundle of a Lie group G is trivial. A chart which demonstrates this is
φ : T (G) → Te (G) defined by φ : z ↦ (dL_{π(z)^{−1}})_{π(z)}(z), where e is the identity of G, π is the projection map
of T (G), and dL_{π(z)^{−1}} is the differential of the map L_{π(z)^{−1}} : G → G.

33.3.19 Remark: Left translation of vectors defines an absolute parallelism on a Lie group. More precisely,
vectors V1 ∈ Tg1 (G) and V2 ∈ Tg2 (G) for g1 , g2 in a Lie group may be said to be parallel if V2 = L^T_{g2 g1^{−1}}(V1).
Of course, right translation also defines an absolute parallelism.
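The trivialization of Remark 33.3.18 and the parallelism of Remark 33.3.19 are closely linked: two vectors are parallel in the left translation sense exactly when they are mapped to the same vector at the identity. A minimal sketch follows, assuming the affine group of the line with elements (a, b), a > 0, and product (a, b)(c, d) = (ac, ad + b) in the global chart (a, b). The group and all names below are hypothetical choices for illustration.

```python
def mul(g, h):
    """Product in the affine group of the line: (a, b)(c, d) = (ac, ad + b)."""
    a, b = g
    c, d = h
    return (a * c, a * d + b)

def inv(g):
    """Inverse in the affine group: (a, b)^{-1} = (1/a, -b/a)."""
    a, b = g
    return (1.0 / a, -b / a)

def left_T(g, z):
    """Left translation of a tangent vector z = (base point, components).

    In this chart the Jacobian of x -> gx is a times the identity for g = (a, b).
    """
    a, _ = g
    x, (v1, v2) = z
    return (mul(g, x), (a * v1, a * v2))

def trivialize(z):
    """Map z to the identity by left translation with the inverse of its base point."""
    x, _ = z
    return left_T(inv(x), z)[1]

def parallel(z1, z2, tol=1e-12):
    """Test whether z2 equals z1 left-translated by g2 g1^{-1} (gi = base point of zi)."""
    g1, g2 = z1[0], z2[0]
    moved = left_T(mul(g2, inv(g1)), z1)
    return all(abs(s - t) <= tol
               for s, t in zip(moved[0] + moved[1], z2[0] + z2[1]))

if __name__ == "__main__":
    z1 = ((2.0, 1.0), (4.0, -2.0))
    z2 = ((0.5, 7.0), (1.0, -0.5))
    print(parallel(z1, z2), trivialize(z1), trivialize(z2))
```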
[ Try to define left translations of general tensors, and left-invariant tensor fields. ]
33.4. Right invariant vector fields on Lie groups

In the interests of fair play, this section replicates Section 33.3 for the case of right invariant vector fields.
33.4.1 Remark: Right invariant vector fields are important for defining connections on principal fibre
bundles. Such connections must leave invariant the structure of fibre sets; therefore the infinitesimal trans-
formations specified by connections must be right invariant vector fields because these are equivalent to
infinitesimal left actions of the group on itself, as explained in Remark 33.8.8.
33.4.2 Definition: Let G be a Lie group.
The right translation operator (for group elements) RgG : G → G is defined for g ∈ G by
∀x ∈ G, RgG (x) = xg.
The right translation operator (for real-valued functions) RgC : C 0 (G) → C 0 (G) is defined for g ∈ G by
∀φ ∈ C 0 (G), RgC (φ) = φ ◦ RgG−1 .
That is,
∀φ ∈ C 0 (G), ∀x ∈ G, RgC (φ)(x) = φ(xg −1 ).
The right translation operator (for tangent operators) R̊gT : T̊ (G) → T̊ (G) is defined for g ∈ G by
∀V ∈ T (G), R̊gT (DV ) = DV ◦ RgC−1 .

That is,
∀V ∈ T (G), ∀φ ∈ C 1 (G), R̊gT (DV )(φ) = DV (RgC−1 φ)
= DV (φ ◦ RgG ).
The right translation operator (for tangent operator fields) R̊gF : X̊ 0 (G) → X̊ 0 (G) is defined for g ∈ G by
∀X ∈ X 0 (G), R̊gF (DX ) = R̊gT ◦ (DX ◦ RgG−1 ).

That is,
∀X ∈ X 0 (G), ∀x ∈ G, R̊gF (DX )(x) = R̊gT (DX (xg −1 )).
That is,
∀X ∈ X 0 (G), ∀x ∈ G, ∀φ ∈ C 1 (G),
R̊gF (DX )(x)(φ) = DX (RgG−1 x)(RgC−1 φ)
= DX (xg −1 )(φ ◦ RgG ).
33.4.3 Definition: Let G be a Lie group.
The right translation operator (for tangent vectors) RgT : T (G) → T (G) is defined for g ∈ G by RgT = (RgG )∗ :
∀V ∈ T (G), RgT (V ) = (RgG )∗ (V ).
The right translation operator (for vector fields) RgF : X 0 (G) → X 0 (G) is defined for g ∈ G by
∀X ∈ X 0 (G), RgF (X) = RgT ◦ X ◦ RgG−1
= (RgG )∗ ◦ X ◦ RgG−1 .

That is,
∀X ∈ X 0 (G), ∀x ∈ G, RgF (X)(x) = RgT (X(xg −1 ))
= (RgG )∗ (X(xg −1 )).
33.4.4 Remark: The interpretation of Definitions 33.4.2 and 33.4.3 may be helped by Figure 33.4.1.
Figure 33.4.1 Right translation operators for Lie groups (panels: RgG acting on x, RgC acting on φ, RgT acting on V, RgF acting on X)
33.4.5 Theorem: Definition 33.4.2 is consistent with Definition 33.4.3. That is, DRgT (V ) = R̊gT (DV ) for
all g ∈ G and V ∈ T (G); and DRgF (X) = R̊gF (DX ) for all g ∈ G and X ∈ X 0 (G).
Proof: The proof is the same as for Theorem 33.3.8.
33.4.6 Theorem: For all elements g1 , g2 ∈ G of a Lie group G, RgG1 ◦ RgG2 = RgG2 g1 , RgC1 ◦ RgC2 = RgC2 g1 ,
R̊gT1 ◦ R̊gT2 = R̊gT2 g1 , R̊gF1 ◦ R̊gF2 = R̊gF2 g1 , RgT1 ◦ RgT2 = RgT2 g1 and RgF1 ◦ RgF2 = RgF2 g1 .
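The reversal of order in Theorem 33.4.6 is easy to overlook, so a small numerical check may help. The sketch below assumes the (non-commutative) affine group of the line with elements (a, b), a > 0, and product (a, b)(c, d) = (ac, ad + b). The group and the helper names are hypothetical and serve only as an illustration.

```python
def mul(g, h):
    """Product in the affine group of the line: (a, b)(c, d) = (ac, ad + b)."""
    a, b = g
    c, d = h
    return (a * c, a * d + b)

def inv(g):
    """Inverse in the affine group: (a, b)^{-1} = (1/a, -b/a)."""
    a, b = g
    return (1.0 / a, -b / a)

def right_G(g):
    """Right translation of group elements: x -> xg."""
    return lambda x: mul(x, g)

def right_C(g):
    """Right translation of functions: phi -> phi o R^G_{g^{-1}}."""
    g_inv = inv(g)
    return lambda phi: (lambda x: phi(mul(x, g_inv)))

def close(p, q, tol=1e-12):
    """Componentwise comparison of two group elements."""
    return all(abs(s - t) <= tol for s, t in zip(p, q))

if __name__ == "__main__":
    g1, g2, x = (2.0, 1.0), (0.5, -3.0), (4.0, 0.25)
    composed = right_G(g1)(right_G(g2)(x))
    print(composed, right_G(mul(g2, g1))(x), right_G(mul(g1, g2))(x))
```

The composition agrees with right translation by g2 g1, and in this example it differs from right translation by g1 g2.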
33.4.7 Definition: A right invariant vector field on a Lie group G is a vector field X ∈ X 0 (G) such that
∀g ∈ G, RgF (X) = X.
33.4.8 Theorem: Let G be a Lie group. Then X(g) = (RgG )∗ (X(e)) = (dRgG )e (X(e)) for any right
invariant vector field X on G and any g ∈ G, where e is the identity of G.
33.4.9 Theorem: Let G be a Lie group with identity e. Then for all V ∈ Te (G) there is a unique right
invariant vector field XV ∈ X 0 (G) such that XV (e) = V . Moreover, XV ∈ X ∞ (G) and XV (g) = RgT V for
all g ∈ G. Conversely, any right invariant vector field on G is of the form XV for some V ∈ Te (G).
[ The formula XV (g) = RgT V in Theorem 33.4.9 is the same as Theorem 33.4.8. ]
33.4.10 Theorem: The set X^∞_R(G) of right invariant C^∞ vector fields on G is a subalgebra of the Lie
algebra X^∞(G) of C^∞ vector fields on G.
[ Must show the relation between left and right invariant fields. They’re probably conjugate or something. ]

33.4.11 Remark: Group elements may be thought of as “two-port” objects because they can be multiplied
both on the left and on the right, whereas the elements of the passive set of a Lie transformation group are
“one-port” objects because they can only be multiplied from the left. This turns out to be the fundamental
reason why principal fibre bundles have some advantages over ordinary fibre bundles. Theorem 33.4.12 is
an example of why the two-port property is useful: the left and right actions commute. This follows from
associativity.
33.4.12 Theorem: Let G be a Lie group. Then
(i) L^G_g R^G_h = R^G_h L^G_g for all g, h ∈ G.
(ii) L^C_g R^C_h = R^C_h L^C_g for all g, h ∈ G.
(iii) L^T_g R^T_h = R^T_h L^T_g for all g, h ∈ G.
Proof: Part (i) follows from the calculation L^G_g R^G_h x = g(xh) = (gx)h = R^G_h L^G_g x for all g, h, x ∈ G. Part (ii)
follows from the calculation L^C_g R^C_h(φ) = φ ◦ L^G_{g^{−1}} ◦ R^G_{h^{−1}} = φ ◦ R^G_{h^{−1}} ◦ L^G_{g^{−1}} = R^C_h L^C_g(φ) for all g, h ∈ G, which
follows from part (i).
Part (iii) has two forms depending on whether vector fields or operator fields are acted upon.
...
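As a concrete check of part (i), and a sketch of how the tangent-vector form of part (iii) might follow from it, take G = GL(2, ℝ) with the three matrices below. The matrices are illustrative choices, and the identifications L^T_g = (L^G_g)_* and R^T_h = (R^G_h)_* are assumed by analogy with Definition 33.7.9.

```latex
% Illustrative matrices in GL(2,R); any invertible choices would do.
\[
g = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},\quad
h = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},\quad
x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
\[
L^G_g R^G_h x = g(xh)
 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
   \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
 = \begin{pmatrix} 2 & 1 \\ 2 & 0 \end{pmatrix},
\qquad
R^G_h L^G_g x = (gx)h
 = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
   \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
 = \begin{pmatrix} 2 & 1 \\ 2 & 0 \end{pmatrix}.
\]
% Tangent-vector form of (iii), assuming L^T_g = (L^G_g)_* and R^T_h = (R^G_h)_*:
\[
L^T_g R^T_h = (L^G_g)_* (R^G_h)_* = (L^G_g \circ R^G_h)_*
            = (R^G_h \circ L^G_g)_* = (R^G_h)_* (L^G_g)_* = R^T_h L^T_g .
\]
```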
33.5. The Lie algebra of a Lie group

The Lie algebra of a Lie group is generally defined in two different ways, as tangent vectors in the tangent
space of the identity of the group (as in Definition 33.5.1) or as left invariant vector fields (as in Defini-
tion 33.5.2). By Theorem 33.3.14, these definitions are effectively equivalent.
[ Probably could use Theorem 9.11.9 to show that X ∞ (M ) is a Lie algebra with the commutator operation. ]
33.5.1 Definition: The (tangent-space version) Lie algebra of a Lie group G with identity e is the set
Te (G) together with the operation [·, ·] : Te (G) × Te (G) → Te (G) defined by
∀v, w ∈ Te (G), [v, w] = [Xv , Xw ](e),
where Xv , Xw are the left invariant vector fields in G defined in Theorem 33.3.14.
[ Must define the exponential map on a Lie group by elements of the Lie algebra: exp tA, for t ∈ ℝ and
A ∈ T_e(G). Prove that the exponential map exists and is unique. See EDM2 [35] 249.Q. ]
[ Present the full specification tuples for Definitions 33.5.1 and 33.5.2. ]
33.5.2 Definition (→ 33.5.1): The (vector-field version) Lie algebra of a Lie group G is the set X_L^∞(G)
of all left invariant vector fields in G together with the operation [·, ·] : X_L^∞(G) × X_L^∞(G) → X_L^∞(G) defined
by
∀X, Y ∈ X_L^∞(G), [X, Y] = XY − YX.
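For a matrix group the bracket in Definition 33.5.2 can be computed explicitly. The sketch below takes G = GL(n, ℝ), identifies T_e(G) with the n × n real matrices, and uses the left invariant field X_A(g) = gA. The coordinate functions x_{ij}(g) = g_{ij} are introduced only for this illustration.

```latex
% X_A acts on a C^1 function f by (X_A f)(g) = \partial_t f(g(I + tA))|_{t=0}.
% A vector field is determined by its action on the coordinate functions x_{ij}.
\begin{align*}
(X_A x_{ij})(g) &= (gA)_{ij}, \\
(X_A X_B x_{ij})(g) &= \partial_t \bigl( g(I+tA)B \bigr)_{ij}\big|_{t=0} = (gAB)_{ij}, \\
([X_A, X_B]\, x_{ij})(g) &= (gAB)_{ij} - (gBA)_{ij}
                          = \bigl(g(AB-BA)\bigr)_{ij}
                          = (X_{AB-BA}\, x_{ij})(g).
\end{align*}
% Hence [X_A, X_B] = X_{AB-BA}: the field bracket corresponds to the matrix commutator.
```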
[ Show that exp(f (x)) = f (exp(x)) for automorphisms f . See Crampin/Pirani [12], page 313. ]
33.5.3 Theorem: Definitions 33.5.1 and 33.5.2 for the Lie algebra of a Lie group are equivalent. [ That
is, there is some sort of Lie algebra isomorphism X_L^∞(G) ≅ T_e(G). Define this isomorphism. Mention the
exponential map. ]
[ Define linear representations of Lie algebras. See Remark 9.11.16. Also define irreducible representations,
adjoint representations and Killing forms. See EDM2 [35], section 248.B. For adjoint representations, see
EDM2 [35], 249.P, Crampin/Pirani [12], page 314, Fulton/Harris [109], page 106. Define ad(g) as the induced
map of the inner automorphism by g. ]
[ Theorem: If f : G → G is an automorphism, then f_* : T_e(G) → T_e(G) is a Lie algebra automorphism.
Similar theorems for homomorphisms etc. ]
[ Define general linear Lie algebras. See EDM2 [35] 248.A. ]

33.6. Diffeomorphism groups

The structure groups for parallelism on differentiable fibre bundles are groups of diffeomorphisms of a dif-
ferentiable manifold. Lie transformation groups (in Section 33.7) are diffeomorphism groups for the special
case that the group itself is a differentiable manifold. A general diffeomorphism group can be very infinite-
dimensional. So strictly speaking, diffeomorphism groups are not generally differentiable groups themselves.
Of special interest for the analysis of parallelism on differentiable fibre bundles are families of diffeomor-
phisms, generators of diffeomorphisms, and vector fields generated by families of diffeomorphisms. In the
analysis of curvature, the Lie algebra of such vector fields plays an important role.
33.6.1 Definition: The C^r diffeomorphism group of a C^r manifold M for r ∈ \bar{ℤ}^+_0 is the transformation
group G −< (G, M) −< (G, M, A_M, σ_G, µ) of all C^r diffeomorphisms from M to M.
The C r diffeomorphism group of M may also be called the C r automorphism group of M .
[ Should define a standard topology on diffeomorphism groups, probably the topology of pointwise conver-
gence. ]
33.6.2 Remark: In Definition 33.6.1, the group elements are identified with their action on the mani-
fold M . This implies automatically that the group acts effectively on M because any two group elements
which have the same action on M are the same element by their definition. Since the identity function on
M is always a C r diffeomorphism, the C r diffeomorphism group is well defined for any C r manifold.
If the class is not specified, it is assumed to be C 1 . The C 0 diffeomorphism group of a C 0 manifold M is the
same thing as the topological automorphism group of M although the topology on M is represented by an
atlas instead of open sets. In general, there is no standard differentiable structure (such as a differentiable
atlas) on diffeomorphism groups, although the space of all vector fields on M which are generated by one-
parameter families of diffeomorphisms of M could be regarded as a kind of tangent space for the group.
33.6.3 Definition: A group of C^r diffeomorphisms of a C^r manifold M for r ∈ \bar{ℤ}^+_0 is a transformation
group G −< (G, M) −< (G, M, A_M, σ_G, µ) of C^r diffeomorphisms from M to M.
A group of C r diffeomorphisms of M may also be called a group of C r automorphisms of M .
33.6.4 Remark: A group of C r diffeomorphisms of a C r manifold in Definition 33.6.3 is necessarily a
subgroup of the C r diffeomorphism group of M according to Definition 33.6.1. If the group G has a
topology TG , and the group action is continuous with respect to TG and the topology on M , then this kind
of group may be called a topological group of C r diffeomorphisms as in Definition 33.6.5. The topological
group of all C r diffeomorphisms of a manifold M is required in Definition 33.6.6 to have the compact-open
topology.
33.6.5 Definition: A topological group of C^r diffeomorphisms of a C^r manifold M for r ∈ \bar{ℤ}^+_0 is a
topological transformation group G −< (G, M) −< (G, T_G, M, A_M, σ_G, µ) of C^r diffeomorphisms from M to M
such that µ : G × M → M is continuous.

A topological group of C r diffeomorphisms of M may also be called a topological group of C r automorphisms
of M .
33.6.6 Definition: The topological C^r diffeomorphism group of a C^r manifold M for r ∈ \bar{ℤ}^+_0 is the
topological transformation group G −< (G, M) −< (G, T_G, M, A_M, σ_G, µ) of all C^r diffeomorphisms from M
to M, where the topology T_G is the compact-open topology on G.
The topological C r diffeomorphism group of M may also be called the topological C r automorphism group
of M .
[ Check that the compact-open topology is the most suitable topology for Definition 33.6.6. ]
33.6.7 Definition: A C^r family of diffeomorphisms for r ∈ \bar{ℤ}^+_0 of a C^r manifold M is a map γ : I →
(M → M) for some interval I ⊆ ℝ such that γ(t) : M → M is a C^r diffeomorphism of M for all t ∈ I, and
the map t ↦ γ(t)(z) is a C^r map from I to M for all z ∈ M.

33.6.8 Definition: The vector field generated by a C^r family γ of diffeomorphisms of a C^r manifold M
for a parameter t_0 ∈ I is the map X : M → T(M) defined by X : z ↦ ∂_t(γ(t)(z))|_{t=t_0} for z ∈ M.
33.6.9 Theorem: For all r ∈ \bar{ℤ}^+, the vector field generated by any C^r family of diffeomorphisms of a C^r
manifold is of class C^{r−1} for all parameter values of the family.
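A simple instance of Definitions 33.6.7 and 33.6.8 is the family of rotations of the plane. The manifold M = ℝ^2, the interval I = ℝ and the parameter value t_0 = 0 are choices made here for illustration.

```latex
% gamma(t) is rotation by angle t; each gamma(t) is a C^\infty diffeomorphism of R^2.
\[
\gamma(t)(z) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}
               \begin{pmatrix} z_1 \\ z_2 \end{pmatrix},
\qquad
X(z) = \partial_t \bigl(\gamma(t)(z)\bigr)\big|_{t=0}
     = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
       \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
     = \begin{pmatrix} -z_2 \\ z_1 \end{pmatrix}.
\]
```

Here the family is C^∞ and so is the generated field, which is consistent with Theorem 33.6.9.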
33.7. Lie transformation groups

Lie transformation groups are particularly useful for defining differentiable ordinary fibre bundles and con-
nections on differentiable fibre bundles.
[ See Sulanke and Wintgen [86], section I.7. ]
[ Refer back to topological transformation groups and Euclidean topological transformation groups. ]
33.7.1 Definition: A C^r Lie transformation group for r ∈ \bar{ℤ}^+_0 is a tuple
(G, M) −< (G, A_G, M, A_M, σ_G, µ) such that
(i) (G, AG , σG ) is a Lie group,
(ii) (M, AM ) is a C r manifold,
(iii) (G, TG , M, TM , σG , µ) is a topological transformation group with topologies TG and TM induced by the
atlases AG and AM respectively,
(iv) the map µ : G × M → M is C r with respect to the product differentiable structure on G × M and the
differentiable structure on M ; i.e. µ ∈ C r (G × M, M ).
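The conditions of Definition 33.7.1 can be illustrated by the standard linear action. The choice G = GL(n, ℝ), M = ℝ^n with µ(A, x) = Ax is an example supplied here, not part of the definition.

```latex
% Example: G = GL(n,R), M = R^n, \mu(A,x) = Ax.
\begin{align*}
\mu(I, x) &= x, \\
\mu(A, \mu(B, x)) &= A(Bx) = (AB)x = \mu(AB, x), \\
\mu(A, x)_i &= \sum_{j=1}^n A_{ij}\, x_j .
\end{align*}
% Each component of \mu is a polynomial in the entries of A and x,
% so \mu \in C^r(G \times M, M) for every r, and condition (iv) holds.
```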
33.7.2 Remark: A Lie transformation group is also known as a differentiable transformation group. The
manifold M in Definition 33.7.1 is sometimes called a G-manifold or G-space. In this book, it will be called
a Lie (transformation) group space, especially when the group G is not specified.
If the regularity class C r of a Lie transformation group is unspecified, it is assumed to be C 1 . (Some authors
may assume that it is C ∞ or analytic.)
33.7.3 Remark: This is a minimal-regularity definition as in EDM2 [35], section 431.C, which corresponds
to Montgomery/Zippin [77], page 195.
Malliavin [36], page 240 defines Lie groups of transformations as C ∞ transformations on the right of a C ∞
manifold. Kobayashi and Nomizu [27], page 41 also define Lie transformation groups to have a C ∞ action
on the right on a C ∞ manifold.
33.7.4 Remark: Figure 33.7.1 shows some of the relations between topological transformation groups and
Lie transformation groups. The symbols AG and AM refer to atlases on sets G and M respectively. The
atlases imply corresponding topologies for the respective sets.
Figure 33.7.1 Family tree of topological and Lie transformation groups
(Diagram relating: transformation group (G, X, σ_G, µ); topological group (G, T_G, σ_G); transformation group of a topological space (G, X, T_X, σ_G, µ); Lie group (G, A_G, σ_G); topological transformation group of a topological space (G, T_G, X, T_X, σ_G, µ); C^k Lie transformation group (G, A_G, M, A_M, σ_G, µ); with conditions cited from EDM2, 431.H(10) and 431.H(11).)

According to EDM2 [35], section 431.H: “Suppose that M is a C^1 manifold and G is a topological
transformation group of M acting effectively on M. If G is locally compact and the map x ↦ g(x) of M is of
class C^1 for each element g of G, then G is a Lie transformation group of M.” This seems to cover all of
the cases of interest. Thus if the group, the manifold and the group action are C 1 , then the group is a Lie
transformation group, which implies that it is analytic. This implies that there is not much point in defining
C r transformation groups for r > 1. However, it seems clear that there is a point in defining varying levels
of regularity for the manifold M and for the action µ : G × M → M . Of particular relevance to this is
Theorem 33.7.5, which is paraphrased from Montgomery/Zippin [77], page 212.
[ What is the relation between Theorem 33.7.5 (and Remark 33.7.4) and Hilbert’s fifth problem? ]
33.7.5 Theorem: Let (G, M, σ_G, µ) be a Lie transformation group of a manifold M. Let k ∈ \bar{ℤ}^+_0. Suppose
that M is a C^k manifold and L_g : M → M is C^k (i.e. L_g ∈ C^k(M, M)) for all g ∈ G. Then µ ∈ C^k(G × M, M).
If M is analytic and L_g : M → M is analytic, then the group action µ is analytic.
33.7.6 Notation: GL(n, ℝ) denotes the Lie group of general linear transformations of ℝ^n. That is, it is
the group of real invertible n × n matrices. An abbreviated notation is GL(n).
[ Present examples here, such as SO(2) and SO(3). SO(3) is discussed a little in Section 41.8. These are
defined in Section 10.9. Should present here the full specification tuples of the classical groups. ]
33.7.7 Remark: Definition 33.7.8 is a generalization to Lie transformation groups of Definition 33.3.2 for
Lie groups. The left translation operators in Definition 33.7.8 use the same notations as in Definition 33.3.2,
but all act on the passive space M rather than G. A similar comment applies to Definition 33.7.9.
33.7.8 Definition: Let (G, M ) be a Lie transformation group.
The left translation operator (for group elements) L^G_g : M → M is defined for g ∈ G by
∀x ∈ M, L^G_g(x) = gx.
The left translation operator (for real-valued functions) L^C_g : C^0(M) → C^0(M) is defined for g ∈ G by
∀φ ∈ C^0(M), L^C_g(φ) = φ ◦ L^G_{g^{-1}}.
That is,
∀φ ∈ C^0(M), ∀x ∈ M, L^C_g(φ)(x) = φ(g^{-1}x).
The left translation operator (for tangent operators) L̊^T_g : T̊(M) → T̊(M) is defined for g ∈ G by
∀V ∈ T(M), L̊^T_g(D_V) = D_V ◦ L^C_{g^{-1}}.
That is,
∀V ∈ T(M), ∀φ ∈ C^1(M), L̊^T_g(D_V)(φ) = D_V(L^C_{g^{-1}} φ)
= D_V(φ ◦ L^G_g).
The left translation operator (for tangent operator fields) L̊^F_g : X̊^0(M) → X̊^0(M) is defined for g ∈ G by
∀X ∈ X^0(M), L̊^F_g(D_X) = L̊^T_g ◦ (D_X ◦ L^G_{g^{-1}}).
That is,
∀X ∈ X^0(M), ∀x ∈ M, L̊^F_g(D_X)(x) = L̊^T_g(D_X(g^{-1}x)).
That is,
∀X ∈ X^0(M), ∀x ∈ M, ∀φ ∈ C^1(M),
L̊^F_g(D_X)(x)(φ) = D_X(L^G_{g^{-1}} x)(L^C_{g^{-1}} φ)
= D_X(g^{-1}x)(φ ◦ L^G_g).
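The inverse in the definition of L^C_g is what makes g ↦ L^C_g compose in the same order as the group. A short check, using only the formula L^C_g(φ)(x) = φ(g^{-1}x) from Definition 33.7.8:

```latex
\begin{align*}
L^C_g\bigl(L^C_h(\phi)\bigr)(x)
  &= L^C_h(\phi)(g^{-1}x) \\
  &= \phi(h^{-1}g^{-1}x) \\
  &= \phi\bigl((gh)^{-1}x\bigr) = L^C_{gh}(\phi)(x).
\end{align*}
% Without the inverse one would get \phi(hgx), i.e. the order of g and h reversed.
```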
33.7.9 Definition: Let (G, M ) be a Lie transformation group.

The left translation operator (for tangent vectors) L^T_g : T(M) → T(M) is defined for g ∈ G by
∀V ∈ T(M), L^T_g(V) = (L^G_g)_*(V).
The left translation operator (for vector fields) L^F_g : X^0(M) → X^0(M) is defined for g ∈ G by
∀X ∈ X^0(M), L^F_g(X) = (L^G_g)_* ◦ X ◦ L^G_{g^{-1}}.
That is,
∀X ∈ X^0(M), ∀x ∈ M, L^F_g(X)(x) = (L^G_g)_*(X(g^{-1}x)).
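For a linear action the push-forward in Definition 33.7.9 is just matrix multiplication, which makes the formula easy to evaluate. The sketch below assumes G = SO(2) acting on M = ℝ^2 by x ↦ gx, with tangent vectors identified with elements of ℝ^2. The two fields are illustrative choices.

```latex
% Linear action: (L^G_g)_* acts on tangent vectors as the matrix g.
\[
L^F_g(X)(x) = g\, X(g^{-1}x) .
\]
% Radial field X(x) = x:
\[
L^F_g(X)(x) = g\,(g^{-1}x) = x = X(x).
\]
% Constant field X(x) = c:
\[
L^F_g(X)(x) = g\,c, \quad\text{which equals } c \text{ for all } g \in SO(2) \text{ only if } c = 0 .
\]
```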
[ Define also left invariant vector fields etc. See Definition 33.3.11. Remark that except for (G, G), left
invariant fields on M are not very practical. E.g. SO(2) on ℝ^2 has no non-constant left invariant vector
fields? ]
33.8. Infinitesimal transformations

[ This topic is related to the exponential map and the “local Lie group of local transformations”. ]
In this section, vector fields on a G-manifold M are generated by differentiating transformations in a Lie
transformation group G. These vector fields may be thought of as “infinitesimal transformations” or “differ-
ential actions” of the group on the manifold, or “generators” of actions on the manifolds. These are used for
defining connections on ordinary fibre bundles because differential parallelism is represented as differential
actions on tangent fibre bundles.
33.8.1 Remark: Corresponding to left and right invariant vector fields on a Lie group, as outlined in
Sections 33.3 and 33.4, one may define vector fields on a G-manifold M with some similar properties. An
example of this for G = SO(3) and M = S 2 is presented in Remark 41.8.4.
33.8.2 Remark: Definition 33.8.4 defines an infinitesimal transformation Y_V on a G-manifold corresponding
to each element V of the Lie algebra T_e(G).
To motivate Definition 33.8.4, consider a curve γ : I → G for some open interval I ⊆ ℝ such that 0 ∈ I
and γ(0) = e ∈ G. For each p ∈ M, one may define a curve γ_p : I → M by γ_p : t ↦ γ(t)p. Since M is a
manifold, differentiability of such curves is well-defined. So assume that γ_p is a C^1 curve in M for all p ∈ M.
If G is a Lie group, one may differentiate both γ and γ_p. Then γ_p′(0) = ∂_t(γ(t)p)|_{t=0} = ∂_t(R_p(γ(t)))|_{t=0} =
(dR_p)_e(γ′(0)), where R_p is the right action map in Definition 33.8.3. Thus if G is a Lie group, then the
infinitesimal action of the curve γ on M is of the form (dR_p)_e(V) with V = γ′(0). However, even if G is
not a Lie group, it may be that the derivatives γ_p′(0) exist for all p ∈ M, in which case one may generalize
the definition to infinitesimal transformations of the form Y_γ ∈ X(M), where Y_γ : p ↦ γ_p′(0). This more
general definition is applicable to connections on fibrations, whereas the Lie group version is applicable to
connections on differentiable fibre bundles.
33.8.3 Definition: The right action on a Lie transformation group G −< (G, M, σ_G, µ) by a point p ∈ M
is the map R_p : G → M defined by R_p : g ↦ g.p = µ(g, p).
33.8.4 Definition: An infinitesimal transformation of a C^1 Lie transformation group G acting on a C^1
manifold M is a vector field Y_V ∈ X(M) defined for V ∈ T_e(G) by
∀p ∈ M, Y_V(p) = (dR_p)_e(V),
where R_p : G → M is the right action of p on G.
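For a linear action, Definition 33.8.4 reduces to a matrix product. The sketch below assumes G = GL(2, ℝ) acting on M = ℝ^2 by matrix multiplication, with T_e(G) identified with the 2 × 2 real matrices. The particular V is chosen only for illustration.

```latex
% R_p(g) = gp is linear in g, so its differential at e is V \mapsto Vp.
\[
Y_V(p) = (dR_p)_e(V) = \partial_t\bigl(\gamma(t)\,p\bigr)\big|_{t=0} = V p
\quad\text{for any } C^1 \text{ curve } \gamma \text{ in } G
\text{ with } \gamma(0)=e,\ \gamma'(0)=V .
\]
\[
V = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\;\Longrightarrow\;
Y_V(p) = \begin{pmatrix} p_1 \\ -p_2 \end{pmatrix}.
\]
```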
33.8.5 Remark: Since the right action Rp on G by each element p ∈ M is a C 1 map, the differential dRp
is well-defined. So YV (p) ∈ Tp (M ) is well defined for all p ∈ M . The regularity of YV follows from the
regularity of the group action in a similar way to the regularity of the left invariant vector field XV ∈ X(G)
in Theorem 33.3.14.
33.8.6 Theorem: [ Have a theorem to say that YV is C k if (G, M ) is C k+1 , or something like that. The
proof is probably similar to that of Theorem 33.3.14 ].

33.8.7 Remark: In principle, one could try to invert the construction in Definition 33.8.4 to obtain a
vector field on the group G. This does not seem to be useful, however. For each p ∈ S^2 and W ∈ T_p(S^2), one
may attempt to construct a vector field from the inverse map (dR_p)_g at each g ∈ G, such as (dR_p)_g^{-1}(W)
or ker((dR_p)_g). The problem with (dR_p)_g^{-1}(W) is the fact that the linear map (dR_p)_g is not generally
injective. Therefore (dR_p)_g^{-1}({W}) would be some sort of hyperplane in T_g(G). The subspace ker((dR_p)_g)
of T_g(G) could possibly hold some interest, but it is difficult to see any immediate application.
This situation contrasts with the situation where σ_G : G × G → G yields useful left and right invariant
vector fields on Lie groups G.
[ Have a definition of infinitesimal transformations for non-Lie transformation groups. Give conditions for
meaningfulness. ]
[ Look at composition and commutators of infinitesimal transformations. Should get some sort of Lie algebra
out of this, probably corresponding to Te (G) etc. ]
[ An infinitesimal right action is left invariant and vice versa. ]
[ In the case M = G, try to show that (dRp )e (v) is left or right invariant, but not usually otherwise. Try to
show that for G “bigger” than M , all left invariant vector fields are constant. If M is “bigger” than G, can
get infinitely many left invariant vector fields with the same value at a given point. But with (G, G), one
field value determines all values everywhere. ]
33.8.8 Remark: The form (dRp )e for infinitesimal transformations in Definition 33.8.4 may seem a little
surprising, but in the case M = G, so that G acts on G by left translation, Theorem 33.4.8 shows that a
right-invariant vector field X on G satisfies X(p) = (dRp )e (X(e)) for all p ∈ G, which matches perfectly
with Definition 33.8.4. Therefore an infinitesimal transformation is a generalization of right-invariant vector
fields from Lie left transformation groups of the special form (G, G) to general Lie left transformation
groups (G, M ).
It may seem odd that infinitesimal transformations of a left transformation group are so closely related to
right invariant vector fields. The reason for this is that left and right actions of groups commute, as stated
in Theorem 33.4.12. Therefore an infinitesimal left action by a group is invariant under right actions of the
group; so an infinitesimal left action is a right invariant vector field when the G-manifold M is the group G
itself.
Similarly, in the case of the Lie right transformation group (G, G) −< (G, G, σ_G, σ_G) (same tuple as the
left transformation group but with a different object class, as mentioned in Remark 5.16.6), infinitesimal
transformations are infinitesimal right actions, and these are left invariant vector fields on G. In the case
of general Lie right transformation groups (G, M), the infinitesimal transformations are of the form g ↦
(dL_g)_e(V) for V ∈ T_e(G).
Summarizing this, one may say that on a Lie group, left invariant vector fields are infinitesimal right actions,
and right invariant vector fields are infinitesimal left actions.
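On a matrix group this summary can be verified directly. The sketch assumes G = GL(n, ℝ) with T_e(G) identified with the n × n real matrices, so that the differentials of the linear maps L^G_h and R^G_h act by matrix multiplication. The symbols Y_V and \tilde{Y}_V are labels introduced here for the two kinds of field.

```latex
% Infinitesimal left action (G acting on itself by left translation): R_p(g) = gp.
\[
Y_V(p) = (dR_p)_e(V) = Vp, \qquad
(R^G_h)_*\bigl(Y_V(p)\bigr) = (Vp)h = V(ph) = Y_V(ph).
\]
% Infinitesimal right action (G acting on itself by right translation): L_p(g) = pg.
\[
\tilde{Y}_V(p) = (dL_p)_e(V) = pV, \qquad
(L^G_h)_*\bigl(\tilde{Y}_V(p)\bigr) = h(pV) = (hp)V = \tilde{Y}_V(hp).
\]
% So p \mapsto Vp is right invariant and p \mapsto pV is left invariant.
```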
33.8.9 Definition: The left action on a Lie right transformation group G −< (G, M, σ_G, µ) by a point
p ∈ M is the map L_p : G → M defined by L_p : g ↦ pg = µ(p, g).
33.8.10 Definition: An infinitesimal transformation of a C^1 Lie right transformation group G acting on
a C^1 manifold M is a vector field Y_V ∈ X(M) defined for V ∈ T_e(G) by
∀p ∈ M, Y_V(p) = (dL_p)_e(V),
where L_p : G → M is the left action of p on G.


Chapter 34
Differentiable fibre bundles
34.1 Differentiable fibre bundles with non-Lie structure group
34.2 Differentiable fibre bundles with Lie structure group
34.3 Vector fields on differentiable fibre bundles
34.4 Differentiable principal fibre bundles
34.5 Vector fields on differentiable principal fibre bundles
34.6 Associated differentiable fibre bundles
34.7 Vector bundles
34.8 Tangent bundles of differentiable manifolds
34.9 Tangent frame bundles
Chapter 23 on topological fibre bundles should be read before this chapter. Differentiable fibre bundles require
also the definitions of differentiable manifolds (Chapters 26–32) and Lie groups (Chapter 33). Differentiable
fibrations are defined in Section 26.13. The following overview table compares the regularity conditions for
definitions of topological and differentiable fibre bundles.
component         symbol           topological          C^k differentiable             analytic
                                   fibre bundle         fibre bundle                   fibre bundle
total space       E                topological space    C^k differentiable manifold    analytic manifold
base space        B                topological space    C^k differentiable manifold    analytic manifold
fibre space       F                topological space    C^k differentiable manifold    analytic manifold
structure group   G                topological group    Lie group                      Lie group
projection map    π : E → B        continuous           C^k                            analytic
fibre charts      φ : E →̊ F        continuous           C^k                            analytic
group operation   σ : G × G → G    continuous           analytic                       analytic
left action       µ : G × F → F    continuous           C^k                            analytic
Whereas a topological fibre bundle specifies a topology for each space, a differentiable fibre bundle specifies
an atlas for each space. Any level of regularity beyond continuity requires atlases. These regularity-indicating
atlases must not be confused with fibre atlases which indicate global fibre structure.
Although pathwise parallelism may be defined on topological fibre bundles, the definitions of connections
in Chapter 35 require differentiable fibre bundles because a connection is a differential representation of
pathwise parallelism. Pathwise parallelism is then calculated by integrating a connection along a path. Just
as topological fibre bundles (Chapter 23) are the natural structure for defining parallelism (Chapter 24),
differentiable fibre bundles (Chapter 34) are the natural structure for defining connections (Chapters 35–37).
[ For differentiable fibre bundles, see Sulanke and Wintgen [86], II.1, and Choquet-Bruhat [61], I.12. ]


34.1. Differentiable fibre bundles with non-Lie structure group

Differentiable fibrations (fibre bundles without a structure group) are presented in Section 26.13. This section
presents differentiable fibre bundles with a non-Lie structure group, which means that the structure group
is not a differentiable manifold.
It could be useful to examine differentiable fibre bundles whose base space is a finite-dimensional manifold but
whose total space and fibre space are more general structures. Such generality is not (currently) presented
here. Thus in this section, the base space, total space and fibre space are assumed to be manifolds, but the
structure group is assumed to be only a topological space, which may be the space of all diffeomorphisms of
the fibre space.
Definition 34.1.1 is essentially identical to Definition 26.13.7.
34.1.1 Definition: A C^k (differentiable) fibration with fibre space F for a C^k differentiable manifold
F −< (F, A_F) and k ∈ \bar{ℤ}^+_0 is a tuple (E, π, B) −< (E, A_E, π, B, A_B, A^F_E) which satisfies:
(i) (E, A_E) and (B, A_B) are C^k manifolds and π : E → B is C^k;
(ii) ∀φ ∈ A^F_E, ∃U_φ ∈ Top(B), φ : π^{-1}(U_φ) → F is C^k and π × φ : π^{-1}(U_φ) → U_φ × F is a C^k diffeomorphism;
%
E
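The simplest instance of conditions (i) and (ii) of Definition 34.1.1 is a product. The sketch assumes C^k manifolds B and F and takes E = B × F with a single fibre chart. The notation pr_1, pr_2 for the coordinate projections is introduced here.

```latex
% Product fibration: E = B \times F, one chart \phi with U_\phi = B.
\begin{align*}
\pi &= \mathrm{pr}_1 : B \times F \to B, \\
\phi &= \mathrm{pr}_2 : \pi^{-1}(B) = B \times F \to F, \\
\pi \times \phi &= \mathrm{id}_{B \times F} : \pi^{-1}(B) \to B \times F .
\end{align*}
% pr_1 and pr_2 are C^k for the product atlas, and the identity is a C^k diffeomorphism,
% so conditions (i) and (ii) hold with A^F_E = \{\phi\}.
```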
34.1.2 Remark: Definition 34.1.3 defines differentiable fibre bundles in terms of topological transformation
groups of C k automorphisms (G, F ), where the group G does not have a differentiable structure but the fibre
space F does. This kind of transformation group is given by Definition 33.6.5. A possibly suitable name for
such a fibre bundle would be a “semi-differentiable fibre bundle”.
34.1.3 Definition: A C^k (differentiable) (G, F) fibre bundle with non-Lie structure group for an effective
topological C^k left transformation group (G, F) −< (G, T_G, F, A_F, σ_G, µ^F_G) for k ∈ \bar{ℤ}^+_0 is a tuple
(E, π, B) −< (E, A_E, π, B, A_B, A^F_E) which satisfies:

(ii) ∀φ ∈ AF E , ∃Uφ ∈ Top(B), φ : π
−1
%
E
(iv) ∀φ_1, φ_2 ∈ A^F_E, ∀b ∈ U_{φ_1} ∩ U_{φ_2}, ∃g ∈ G, φ_2 ◦ (φ_1|_{E_b})^{-1} = L_g.
34.2. Differentiable fibre bundles with Lie structure group

The fibre bundles presented in this section are assumed to have structure groups which are finite-dimensional
differentiable manifolds. Therefore all of the relevant spaces are manifolds: the base space, total space, fibre
space and structure group. Although this is very convenient for analysis, the more general case of structure
groups which are not finite-dimensional manifolds may be useful in applications. This more general case is
presented in Section 34.1.
34.2.1 Remark: The definition of a topological fibre bundle in Section 23.3 involves four topological
spaces: the total space E, the base space B, the fibre space F , and the structure group G. The definition
also specifies three maps: the projection map π : E → B, the group operation σ : G × G → G and the
group action µ : G × F → F. Additionally there is an atlas A^F_E of fibre maps φ : π^{-1}(U) → F for open
sets U ∈ Top(B). A differentiable fibre bundle is the same except that the topologies are replaced with atlases
and continuity is replaced with differentiability. The additional structures to be specified are differentiable
manifold atlases for all four spaces, and the maps are required to be suitably differentiable.
The differentiable (G, F ) fibre bundle in Definition 34.2.2 satisfies the conditions for a topological (G, F )
fibre bundle in Definition 23.6.4 if the atlases AE , AB , AG and AF are replaced with the corresponding
induced topologies TE = Top(E), TB = Top(B), TG = Top(G) and TF = Top(F ). The other elements of
the specification tuple stay the same.
34.2.2 Definition: A C^k (differentiable) (G, F) fibre bundle for an effective C^k Lie left transformation
group (G, F) −< (G, A_G, F, A_F, σ_G, µ^F_G) for k ∈ \bar{ℤ}^+_0 is a tuple (E, π, B) −< (E, A_E, π, B, A_B, A^F_E) which
satisfies:


(ii) ∀φ ∈ AF E , ∃Uφ ∈ Top(B), φ : π
−1
%
E
(iv) ∀φ_1, φ_2 ∈ A^F_E, ∀b ∈ U_{φ_1} ∩ U_{φ_2}, ∃g ∈ G, β_{b,φ_2} ◦ β_{b,φ_1}^{-1} = L_g, where β_{b,φ} = φ|_{π^{-1}({b})} : π^{-1}({b}) ≈ F for
φ ∈ A^F_E and b ∈ U_φ;
(v) ∀φ_1, φ_2 ∈ A^F_E, the function g_{φ_1,φ_2} : U_{φ_1} ∩ U_{φ_2} → G defined by L_{g_{φ_1,φ_2}(b)} = β_{b,φ_1} ◦ β_{b,φ_2}^{-1} is of class C^k.
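Condition (v) defines the transition functions only implicitly, through the maps L_g. The following sketch shows why they compose as expected. It takes the convention L_{g_{φ_1,φ_2}(b)} = β_{b,φ_1} ◦ β_{b,φ_2}^{-1} as an assumption about where the inverse sits, and uses the facts that g ↦ L_g is a left action and that (G, F) is effective.

```latex
% For b \in U_{\phi_1} \cap U_{\phi_2} \cap U_{\phi_3}:
\begin{align*}
L_{g_{\phi_1,\phi_2}(b)} \circ L_{g_{\phi_2,\phi_3}(b)}
  &= \beta_{b,\phi_1} \circ \beta_{b,\phi_2}^{-1} \circ
     \beta_{b,\phi_2} \circ \beta_{b,\phi_3}^{-1} \\
  &= \beta_{b,\phi_1} \circ \beta_{b,\phi_3}^{-1}
   = L_{g_{\phi_1,\phi_3}(b)} .
\end{align*}
% Since L_g \circ L_h = L_{gh} and the action is effective (L_g = L_{g'} implies g = g'),
% g_{\phi_1,\phi_2}(b)\, g_{\phi_2,\phi_3}(b) = g_{\phi_1,\phi_3}(b).
```

Effectiveness is also what makes g_{φ_1,φ_2}(b) well defined by the equation in condition (v).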
34.2.3 Remark: The Lie transformation group in Definition 34.2.2 is defined in Section 33.7. Although
the group itself is assumed to be analytic, the action on the fibre F is only assumed to be C k . It turns out
that if the group is assumed to be only C k , it will be analytic anyway although this is non-trivial to prove.
It also turns out that if the action µ^F_G : G × F → F of G on F is C^k with respect to F, then it is also C^k
with respect to G. (See Remark 33.2.1.)

The structure group G for a fibre bundle has the role of classifying parallelism according to how much
structure is preserved. If the structure group is large, not much fibre structure is preserved under parallel
translations. A small structure group ensures that more structure is preserved.
[ Should have a theorem which can be used to prove Remark 34.2.4. ]
34.2.4 Remark: The per-fibre-set charts βb,φ : π −1 ({b}) ≈ F in Definition 34.2.2 must be of class C k
because π × φ is of class C k .
34.2.5 Definition: An analytic (G, F) fibre bundle for an effective analytic Lie left transformation group
(G, F) −< (G, A_G, F, A_F, σ_G, µ^F_G) is a differentiable (G, F) fibre bundle (E, π, B) −< (E, A_E, π, B, A_B, A^F_E)
such that
(i) (E, A_E) and (B, A_B) are analytic manifolds;
(ii) the projection map π is analytic;
(iii) for all φ ∈ A^F_E, φ : π^{-1}(U_φ) → F is analytic and π × φ : π^{-1}(U_φ) → U_φ × F is analytic;
(iv) the transition maps g_{φ_1 φ_2} for φ_1, φ_2 ∈ A^F_E are analytic.
34.2.6 Remark: The analyticity of the map µ^F_G in Definition 34.2.5 is implied by the weaker condition
that the action map µ^F_G : G × F → F be analytic with respect to F only. (See Remark 33.7.4.) But since
this implication is non-trivial, it is best to state the analyticity requirement explicitly for both G and F.
[ This is really a comment on Lie transformation groups. Should move this remark to the relevant section! ]
[ Near here define differentiable fibre bundle homomorphisms, diffeomorphisms/isomorphisms, direct prod-
ucts etc. as in Section 23.7. Also maybe define C k compatible charts, C k equivalent atlases etc. See
Definition 23.6.14. ]
34.3. Vector fields on differentiable fibre bundles

[ Show relations of these vector fields to automorphisms L^b_{g,φ} etc. Can something be done with L^{b_1,b_2}_{g,φ_1,φ_2}? ]
This section deals with vector fields on the total space of a differentiable fibre bundle, both globally and locally
on the fibre sets E_p at individual points p of the base space. Also discussed is the relation of such fields to the
structure group action on the total space via the fibre charts.
Vector fields may be generated by a Lie group on the individual fibres at points of a base space as in
Section 33.8. These may be thought of as “differential actions” of the group. The term “infinitesimal
transformations” is used by EDM2 [35], 431.G in the context of Lie transformation groups. Such vector
fields may sometimes be extended to cover an entire fibre bundle total space by using an atlas of fibre charts.
At each point the field is generated by some element of the Lie algebra of the Lie group, but this element
will generally depend on the choice of chart and the point in the base space.
If the vector field generated by a structure group is parametrized by vectors in the tangent space of the base
space, then one may define a special class of such families for which the vector field depends linearly on the
base space vector. These kinds of vector field families are used for defining connections on differentiable fibre
bundles.

[ All vector field concepts for differentiable fibre bundles should be generalized to topological fibre bundles in
the sense of transformations, as opposed to infinitesimal transformations. ]
[ Here define and discuss vector fields whose domain is Ep and range is T (E) from some total space E. These
vector fields are not valued in T (Ep ) because the vectors might not be vertical. In particular, consider vector
fields on Ep with a constant horizontal component. ]
34.3.1 Remark: Vector fields may be defined on the differentiable manifolds E, B, G and F of a differentiable
fibre bundle (E, π, B) −< (E, A_E, π, B, A_B, A^F_E) with structure group (G, F) −< (G, A_G, F, A_F, σ_G, µ^F_G).
Various kinds of invariant fields and infinitesimal transformations are of special interest on these manifolds.
On the group G, left and right invariant vector fields are defined in Sections 33.3 and 33.4. On the fibre
space F , infinitesimal transformations are defined in Section 33.8. In this section, similar fields are defined
on the total space E. Related fields are defined for principal fibre bundles in Section 34.5. Some special
kinds of vector fields (or vector bundle “cross-sections”) on ordinary fibre bundles are summarized in the
following table.
field                parameters                               formula               description
X^L_u ∈ X(G)         u ∈ T_e(G)                               g ↦ (dL^G_g)_e(u)     left invariant vector field
X^R_u ∈ X(G)         u ∈ T_e(G)                               g ↦ (dR^G_g)_e(u)     right invariant vector field
X^F_u ∈ X(F)         u ∈ T_e(G)                               f ↦ (dR_f)_e(u)       infinitesimal transformation
X^E_{u,φ} ∈ X(E)     u ∈ T_e(G), φ ∈ A^F_E                    β^*_{b,φ} X^F_u       infinitesimal transformation via charts
X^E_{u,v,φ} ∈ X(E)   u ∈ T_e(G), v ∈ T_b(B), φ ∈ A^F_E                              non-vertical infinitesimal transformation
[ In the following, must clarify exactly what T (Eb ) means in relation to T (E). A few other things need detailed
justification here too. ]
The field X^E_{u,φ} on E is defined as the pull-back via a chart φ ∈ A^F_E of the field X^F_u on F. This field on E is
generally fibre chart dependent. It is constructed from β_{b,φ} = φ|_{E_b}, where E_b = π^{-1}({b}). Since β_{b,φ} : E_b →
F is a diffeomorphism, the pull-back map β^*_{b,φ} : T(F) → T(E_b) is well-defined, and X^E_{u,φ} = (b ↦ β^*_{b,φ}(X^F_u))
is a well-defined vector field on U_φ = π(Dom(φ)) which is vertical in the sense that π_*(X^E_{u,φ}) is the zero
vector field on U_φ.
34.3.2 Remark: The fields $X^L_u$ and $X^R_u$ in Remark 34.3.1 are specific to Lie groups and have nothing to do with the fibre space or the differentiable fibre bundle. The field $X^F_u$ is specific to the Lie transformation group (G, F) and has nothing to do with the differentiable manifold. Only the fields $X^E_{u,\phi}$ and $X^E_{u,v,\phi}$ are specific to the differentiable fibre bundle. However, all of these fields are related.
34.3.3 Remark: The vector $u \in T_e(G)$ in Remark 34.3.1 may be set equal to the derivative $\gamma'(0) \in T_e(G)$ of a differentiable curve $\gamma : \mathbb{R} \to G$ with $\gamma(0) = e \in G$. This suggests the natural generalization to a differentiable fibre bundle with non-Lie structure group. In this way, the vector field $X^E_{u,\phi}$ may be generalized to the vector field X on Dom(φ) defined by
$$\forall z \in \mathrm{Dom}(\phi), \quad X(z) = \partial_t \bigl( L^{\pi(z)}_{\gamma(t),\phi}(z) \bigr) \big|_{t=0},$$
where the left action $L^b_{g,\phi} : E_b \to E_b$ is defined by $L^b_{g,\phi} : z \mapsto (\phi|_{E_b})^{-1}(g\phi(z))$. The corresponding generalization of the infinitesimal left action $X^F_u$ to non-manifold groups G is discussed in Remark 33.8.2. The corresponding left and right invariant fields $X^L_u$ and $X^R_u$, which do not require curves for their definition, are discussed in Sections 33.3 and 33.4 respectively.
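The verticality of a field of the kind defined in Remark 34.3.3 can be checked in two lines. The following is only a sketch which uses the definitions of X and $L^b_{g,\phi}$ given in that remark, and assumes that the derivative defining X(z) exists.

```latex
% At t = 0 the left action is the identity, because \gamma(0) = e:
L^{\pi(z)}_{\gamma(0),\phi}(z) = (\phi|_{E_{\pi(z)}})^{-1}(e\,\phi(z)) = z,
\qquad\text{so } X(z) \in T_z(E).
% L^b_{g,\phi} maps the fibre E_b into itself, so the projection is constant in t:
\pi\bigl(L^{\pi(z)}_{\gamma(t),\phi}(z)\bigr) = \pi(z) \ \text{ for all } t
\quad\Longrightarrow\quad
\pi_*(X(z)) = \partial_t\, \pi\bigl(L^{\pi(z)}_{\gamma(t),\phi}(z)\bigr)\Big|_{t=0} = 0 .
```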
34.3.4 Remark: The vector field $X^E_{u,\phi}$ in Remark 34.3.1 is entirely vertical by virtue of its construction. If this vector field is restricted to a single fibre set $E_b$, it generates a family of automorphisms of $E_b$, which are in the set $\mathrm{Aut}_G(E_b)$ introduced in Notation 23.8.5. A natural generalization is to vary the base point b. Then it is possible to generate a non-vertical vector field via 1-parameter families of isomorphisms in the sets $\mathrm{Iso}_G(E_{b_1}, E_{b_2})$ for $b_1, b_2 \in B$ which were introduced in Notation 23.8.4.
Consider a curve $\gamma : I \to B \times G$ for some open interval $I \subseteq \mathbb{R}$ with $0 \in I$ and $\gamma(0) = (b, e)$. A vector field may be generated from this curve if the derivative $X(z) = \partial_t \bigl( L^{b,\gamma_1(t)}_{\gamma_2(t),\phi,\phi}(z) \bigr) \big|_{t=0}$ is well-defined for all $z \in E_b$, where $\gamma_1(t)$ and $\gamma_2(t)$ are the B and G components of $\gamma(t)$ respectively, and the fibre set isomorphisms $L^{b_1,b_2}_{g,\phi_1,\phi_2}$ are defined in Notation 23.8.15. Then the curve γ generates a vector field on $E_b$ which is not necessarily vertical. In fact, $\pi_*(X(z)) = \gamma_1'(0)$ for all $z \in E_b$. This means that the horizontal component of X(z) has the same value $\gamma_1'(0) \in T_b(B)$ for all $z \in E_b$. This kind of vector field is well-defined even for differentiable fibrations and fibre bundles with non-Lie structure group.
When G is a Lie group, the vector field X defined here is given the notation $X^E_{u,v,\phi}$, where $u \in T_e(G)$ and $v \in T_b(B)$. The vectors u and v may be thought of as the vertical and horizontal components of the vector field respectively. For a non-manifold structure group G, a more general notation is required.
Vector fields of the form $X^E_{u,v,\phi}$ are exactly what are required for defining connections on general differentiable fibre bundles. Since parallelism is defined in terms of the isomorphism maps $L^{b_1,b_2}_{g,\phi_1,\phi_2}$ in Section 23.8, it is not at all surprising that connections are defined in terms of vector fields such as $X^E_{u,v,\phi}$, which are differentials of such isomorphisms.
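The identity $\pi_*(X(z)) = \gamma_1'(0)$ in Remark 34.3.4 follows in one step. The sketch below assumes that the isomorphism $L^{b_1,b_2}_{g,\phi_1,\phi_2}$ maps the fibre set $E_{b_1}$ onto $E_{b_2}$, as the sets $\mathrm{Iso}_G(E_{b_1}, E_{b_2})$ suggest. Notation 23.8.15 itself is not reproduced here, so this is an assumption.

```latex
% The image of z \in E_b under L^{b,\gamma_1(t)}_{\gamma_2(t),\phi,\phi} lies in the fibre over \gamma_1(t):
\pi\bigl(L^{b,\gamma_1(t)}_{\gamma_2(t),\phi,\phi}(z)\bigr) = \gamma_1(t)
\quad\Longrightarrow\quad
\pi_*(X(z)) = \partial_t\, \pi\bigl(L^{b,\gamma_1(t)}_{\gamma_2(t),\phi,\phi}(z)\bigr)\Big|_{t=0}
            = \gamma_1'(0) \in T_b(B).
```

The right-hand side does not involve z, which is why the horizontal component is the same at every point of $E_b$.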
34.4. Differentiable principal fibre bundles

Differentiable principal fibre bundles are the customary structure on which to define a connection. The
reason for this is that parallel transport for all associated fibre bundles of a given PFB may be defined in
terms of such a connection. However, this advantage of PFBs relative to ordinary fibre bundles is partly
illusory. Connections may be defined on any old OFB, and parallel transport is then defined on associated
fibre bundles by copying the fibre chart transition maps. This all becomes clearer in Chapter 35.
A principal fibre bundle with structure group G is the same thing as a (G, G) ordinary fibre bundle. It is always possible to define a right action $\mu^P_G : P \times G \to P$ by group elements in G acting on the total space P. This right action adds no new information because it is defined in terms of the other components of the definition. In the following definitions, the notations $L_g$ and $R_g$ are shorthand for the left and right actions of group elements respectively. So for g in a group G, $L_g : g' \mapsto gg'$ and $R_g : g' \mapsto g'g$.
34.4.1 Definition: A $C^k$ (differentiable) principal (fibre) bundle with structure group G for $k \in \overline{\mathbb{Z}}^+_0$ and a Lie group $G$ −< $(G, A_G, \sigma_G)$ is a $C^k$ (G, G) fibre bundle $(P, q, B)$ −< $(P, A_P, q, B, A_B, A^G_P)$ for $(G, G)$ −< $(G, A_G, G, A_G, \sigma_G, \sigma_G)$.
The right action of G on P is the operation $\mu^P_G : P \times G \to P$ defined by $\mu^P_G(z, g) = (\phi|_{P_{q(z)}})^{-1}(\sigma_G(\phi(z), g))$ for $(z, g) \in P \times G$ for any $\phi \in A^G_P$ with $z \in \mathrm{Dom}(\phi)$.
The right transformation group of the principal fibre bundle (P, q, B) is the $C^k$ Lie right transformation group $(G, P)$ −< $(G, A_G, P, A_P, \sigma_G, \mu^P_G)$.
A C k principal fibre bundle with structure group G is also called a C k (differentiable) principal G-bundle or
a C k (differentiable) G-bundle.
If the regularity class C k is not specified, it is assumed to be C 1 .
34.4.2 Remark: Definition 34.4.1 is illustrated in Figure 34.4.1. As usual, the notations $P_b = q^{-1}(\{b\})$ and $\beta_{b,\phi} = \phi|_{q^{-1}(\{b\})} = \phi|_{P_b}$ for $b \in q(\mathrm{Dom}(\phi))$ are adopted here for Definition 34.4.1.
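(Diagram with the spaces $q^{-1}(U) \subseteq P$, $U \subseteq B$, $G$ and $U \times G \subseteq B \times G$, and the maps $q$, $\phi$, $q \times \phi$, $R_g$ and $L_g$.)
Figure 34.4.1 Principal fibre bundle with $\mathrm{Dom}(\phi) = q^{-1}(U)$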

34.4.3 Theorem: The right action $\mu^P_G$ of a $C^k$ principal fibre bundle $(P, q, B, A^G_P)$ for $k \in \overline{\mathbb{Z}}^+_0$ is the unique map $\mu^P_G : P \times G \to P$ which satisfies:
(i) $\forall z \in P,\ \forall g \in G,\ q(\mu^P_G(z, g)) = q(z)$; (That is, q(zg) = q(z).)
(ii) $\forall \phi \in A^G_P,\ \forall z \in \mathrm{Dom}(\phi),\ \forall g \in G,\ \phi(\mu^P_G(z, g)) = \sigma_G(\phi(z), g)$. (That is, φ(zg) = φ(z)g.)
34.4.4 Remark: The analogous left action to the right action $\mu^P_G$ in Definition 34.4.1 is the map $L^P_{G,\phi} : G \times q^{-1}(U_\phi) \to q^{-1}(U_\phi)$ defined by $L^P_{G,\phi} : (g, z) \mapsto \beta^{-1}_{q(z),\phi}(\sigma_G(g, \phi(z)))$. This left action is chart-dependent.
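The contrast between the chart-independent right action of Definition 34.4.1 and the chart-dependent left action of Remark 34.4.4 can be seen in a small finite model. The following sketch is an illustration only. It assumes a toy bundle over a one-point base space with P = G = S3 (the permutations of three points), and hypothetical fibre charts phi_a : z |-> a*z, one for each a in G, whose transition maps are left multiplications by elements of G. None of these names come from the text.

```python
from itertools import permutations

def compose(p, q):
    """Group product p*q in S3: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

# Toy principal bundle over a one-point base: P = G = S3 (a single fibre).
G = list(permutations(range(3)))
IDENTITY = (0, 1, 2)

def chart(a, z):
    """Hypothetical fibre chart phi_a : P -> G, z |-> a*z."""
    return compose(a, z)

def chart_inverse(a, w):
    """Inverse of phi_a: w |-> a^{-1}*w."""
    return compose(inverse(a), w)

def right_action(a, z, g):
    """mu(z, g) computed through the chart phi_a: phi_a^{-1}(phi_a(z)*g)."""
    return chart_inverse(a, compose(chart(a, z), g))

def left_action(a, g, z):
    """L(g, z) computed through the chart phi_a: phi_a^{-1}(g*phi_a(z))."""
    return chart_inverse(a, compose(g, chart(a, z)))

# The right action gives the same result z*g in every chart ...
right_is_chart_independent = all(
    right_action(a, z, g) == compose(z, g) for a in G for z in G for g in G
)
# ... but the left action changes when the chart changes.
left_is_chart_dependent = any(
    left_action(a, g, z) != left_action(IDENTITY, g, z)
    for a in G for z in G for g in G
)
```

In this model the right action reduces to z*g by associativity, whereas the left action through phi_a is z |-> a^{-1}*g*a*z, which depends on a because S3 is not commutative.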
34.4.5 Remark: The requirements for principal fibre bundles are summarized in the following table. Topo-
logical principal fibre bundles are defined in Section 23.9.
component          symbol                         topological principal G-bundle   $C^k$ principal G-bundle   analytic principal G-bundle
total space        $P$                            topological space                $C^k$ manifold             analytic manifold
base space         $B$                            topological space                $C^k$ manifold             analytic manifold
structure group    $G$                            topological group                Lie group                  Lie group
projection map     $\pi : P \to B$                continuous                       $C^k$                      analytic
fibre charts       $\phi : P$ →˚ $G$              continuous                       $C^k$                      analytic
group operation    $\sigma : G \times G \to G$    continuous                       analytic                   analytic
right action       $\mu : P \times G \to P$       continuous                       $C^k$                      analytic
A rough and simple family tree for differentiable fibre bundles is illustrated in Figure 34.4.2.
differentiable manifold $(X, A_X)$
    differentiable fibration $(E, A_E, \pi, B, A_B)$
    differentiable transformation group $(G, F)$ −< $(G, A_G, F, A_F, \sigma_G, \mu)$
        differentiable (G, F) fibre bundle $(E, A_E, \pi, B, A_B, A^F_E)$
            differentiable G-bundle $(P, A_P, q, B, A_B, A^G_P)$
Figure 34.4.2 Family tree for differentiable fibre bundles
[ Explain here why the n-frame bundle on a differentiable manifold can be regarded as a principal GL(n)-
bundle. Also discuss principal SO(n)-bundles and so forth. ]
[ Near here, define analytic principal G-bundles. ]
34.5. Vector fields on differentiable principal fibre bundles

[ Relate the vector fields to spaces of fibre set isomorphisms $L^b_{g,\phi}$ etc., and to OFB vector fields. ]
This section prepares the way for defining connections on principal fibre bundles. This section is related
to Section 33.8 on “infinitesimal transformations of Lie transformation groups”. Of special interest is the
relation between infinitesimal transformation fields on associated principal and ordinary fibre bundles. Vector
fields on ordinary differentiable fibre bundles are defined in Section 34.3.
34.6. Associated differentiable fibre bundles

Associated fibre bundles are fibre bundles which have equivalent chart transition maps. See sections 23.10,
23.11 and 23.12 for associated topological fibre bundles. Associated differentiable fibre bundles are defined in
exactly the same way in terms of the topologies induced by the differentiable atlases. The definitions in those

sections are therefore valid for differentiable fibre bundles if the topologies are replaced with differentiable
atlases.
Associated fibre bundles are a mechanism for copying definitions of parallelism between fibre bundles. Thus
parallelism can be defined on one fibre bundle and copied to many associated fibre bundles. A typical example
of this is the copying of parallelism from a tangent bundle of a manifold to tensor bundles of arbitrary type.
A common example in the literature is copying parallelism from a principal fibre bundle to an ordinary
fibre bundle. Since a connection on a fibre bundle is a differential representation of pathwise parallelism,
connections may also be copied between associated fibre bundles, and this is the purpose of defining them.
Associated parallelism for general topological fibre bundles is presented in Section 24.3.
Most textbooks define associated fibre bundles in terms of a particular method of construction, namely the
orbit space construction method presented in Section 23.12. The orbit space method constructs abstract
ordinary fibre bundles from principal fibre bundles. Another construction method in the literature uses iden-
tification spaces as presented in Section 23.11. This method is strongly hinted at by Kobayashi/Nomizu [27],
Prop. 5.2, page 52. In practice, associated fibre bundles are rarely constructed by the methods in Definitions
34.6.7 and 34.6.10. Instead, they are constructed independently and an association map h is constructed
between their atlases as in Definition 34.6.2.
34.6.1 Remark: Definitions 34.6.2 and 34.6.3 are almost identical to Definitions 23.10.3 and 23.10.5 re-
spectively because fibre bundle associations do not specify regularity constraints. The differences lie in the
fact that topological fibre bundles specify topologies for each space whereas differentiable fibre bundles spec-
ify atlases to indicate the regularity. As mentioned in Remark 23.10.4, Definition 34.6.2 (ii) may be expressed
as:
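$$\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ \forall g \in G,$$
$$\bigl( \forall z \in \pi^{-1}(\{b\}),\ \phi_1(z) = g\phi_2(z) \bigr) \Leftrightarrow \bigl( \forall \tilde z \in \tilde\pi^{-1}(\{b\}),\ h(\phi_1)(\tilde z) = g\,h(\phi_2)(\tilde z) \bigr).$$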
34.6.2 Definition: A differentiable fibre bundle association is a bijection $h : A^F_E \to A^{\tilde F}_{\tilde E}$ between the fibre atlases of differentiable (G, F) and (G, F̃) fibre bundles $(E, \pi, B, A^F_E)$ and $(\tilde E, \tilde\pi, B, A^{\tilde F}_{\tilde E})$ respectively such that:
(i) $\forall \phi \in A^F_E,\ \pi(\mathrm{Dom}(\phi)) = \tilde\pi(\mathrm{Dom}(h(\phi)))$;
(ii) $\forall \phi_1, \phi_2 \in A^F_E,\ \forall b \in U_{\phi_1} \cap U_{\phi_2},\ g_{\phi_1,\phi_2}(b) = \tilde g_{h(\phi_1),h(\phi_2)}(b)$, where $U_\phi$ denotes $\pi(\mathrm{Dom}(\phi))$, and $g$, $\tilde g$ denote the fibre chart transition functions for the respective fibre bundle atlases.
34.6.3 Definition: Associated differentiable fibre bundles are differentiable (G, F) and (G, F̃) fibre bundles $(E, \pi, B, A^F_E)$ and $(\tilde E, \tilde\pi, B, A^{\tilde F}_{\tilde E})$ respectively for which there is specified a differentiable fibre bundle association $h : A^F_E \to A^{\tilde F}_{\tilde E}$.
34.6.4 Remark: Just as Definition 23.10.9 defines a pair of C 0 -associated topological fibre bundles as C 0 -
equivalent to a pair of associated topological fibre bundles, Definition 34.6.5 defines a pair of C k -associated
differentiable fibre bundles as C k -equivalent to a pair of associated differentiable fibre bundles. This is
34.6.5 Definition: C k associated differentiable fibre bundles are differentiable fibre bundles ξ1 , ξ˜1 with
the same base space which are C k equivalent to associated differentiable fibre bundles ξ2 , ξ˜2 respectively.
34.6.6 Remark: Definition 34.6.7 is the differentiable analogue of Definition 23.11.2. This defines a method
of constructing associated fibre bundles with the help of identification spaces.
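34.6.7 Definition: The associated $C^k$ differentiable (G, F̃) fibre bundle (identification space method) for $k \in \overline{\mathbb{Z}}^+_0$ of a given $C^k$ differentiable (G, F) fibre bundle $(E, \pi, B)$ −< $(E, A_E, \pi, B, A_B, A^F_E)$, for $C^k$ Lie left transformation groups $(G, F)$ −< $(G, A_G, F, A_F, \sigma, \mu^F_G)$ and $(G, \tilde F)$ −< $(G, A_G, \tilde F, A_{\tilde F}, \sigma, \mu^{\tilde F}_G)$, is the $C^k$ differentiable (G, F̃) fibre bundle $(\tilde E, \tilde\pi, B)$ −< $(\tilde E, A_{\tilde E}, \tilde\pi, B, A_B, A^{\tilde F}_{\tilde E})$ defined by:
(i) $\tilde E = \{[(b, y, \phi)];\ b \in B,\ y \in \tilde F,\ \phi \in A^F_{E,b}\}$, where $[(b, y, \phi)] = \{(b, g_{\phi'\phi}(b)y, \phi');\ \phi' \in A^F_{E,b}\}$, the transition maps $g_{\phi_1\phi_2} : U_{\phi_1} \cap U_{\phi_2} \to G$ are defined by $L_{g_{\phi_1,\phi_2}(b)} = \phi_1 \circ (\phi_2|_{\pi^{-1}(\{b\})})^{-1}$, and $\mathrm{Dom}(\phi) = \pi^{-1}(U_\phi)$ for $\phi \in A^F_E$;
(ii) $\tilde\pi : \tilde E \to B$ is defined by $\tilde\pi : [(b, y, \phi)] \mapsto b$;
(iii) $A^{\tilde F}_{\tilde E} = \{\tilde\phi;\ \phi \in A^F_E\}$, where $\tilde\phi : \tilde\pi^{-1}(U_\phi) \to \tilde F$ is defined for $\phi \in A^F_E$ by $\tilde\phi : [(b, y, \phi)] \mapsto y$;
(iv) $A_{\tilde E} = \{\psi^{\tilde E,\phi}_{i,j};\ \phi \in A^F_E,\ i \in I_B,\ j \in I_{\tilde F}\}$, where $I_B$ and $I_{\tilde F}$ are index sets for the atlases $A_B$ and $A_{\tilde F}$ respectively, and the charts $\psi^{\tilde E,\phi}_{i,j} : \mathrm{Dom}(\tilde\phi) \to \mathbb{R}^{n_B + n_{\tilde F}}$, with $n_B = \dim(B)$ and $n_{\tilde F} = \dim(\tilde F)$, are defined for $\phi \in A^F_E$, $i \in I_B$ and $j \in I_{\tilde F}$ by
$$\forall b \in U_\phi,\ \forall y \in \tilde F, \quad \psi^{\tilde E,\phi}_{i,j}\bigl([(b, y, \phi)]\bigr) = \bigl(\psi^B_i(b),\ \psi^{\tilde F}_j(y)\bigr),$$
where $\psi^B_i \in A_B$ and $\psi^{\tilde F}_j \in A_{\tilde F}$ correspond to indices $i \in I_B$ and $j \in I_{\tilde F}$ respectively.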
[ Check whether these C k associated fibre bundle constructions have anything to do with k. If not, then
remove the C k attribute from the definitions. ]
[ Show that Definitions 34.6.7 and 34.6.10 satisfy the conditions for associated C k fibre bundles. ]
34.6.8 Remark: Definition 34.6.10 is the differentiable analogue of Definition 23.12.3. This defines a method of constructing associated fibre bundles with the help of orbit spaces. The manifold atlases for the constructed total spaces in conditions 34.6.7 (iv) and 34.6.10 (iv) are essentially the same.
34.6.9 Remark: The fine details of regularity of the original total space P in Definition 34.6.10 are lost
in this method of construction because the atlas AE does not depend in any way on the original atlas AP .
(The same loss of regularity information from the original total space occurs in Definition 34.6.7.) In fact,
it does not seem to be possible to retain such information in general since the spaces F and G may be very
different. The only hope for retaining such information is to retain some of the irregularity with respect to
the base space B, but this must be communicated through the fibre charts somehow.
34.6.10 Definition: The associated differentiable (G, F) fibre bundle (orbit space method) with $C^k$ structure group $(G, F)$ −< $(G, A_G, F, A_F, \sigma, \mu^F_G)$ for a given $C^k$ principal G-bundle $(P, q, B)$ −< $(P, A_P, q, B, A_B, A^G_P)$ for $k \in \overline{\mathbb{Z}}^+_0$ is the differentiable (G, F) fibre bundle $(E, \pi, B)$ −< $(E, A_E, \pi, B, A_B, A^F_E)$ defined by:
(i) $E = \{[(z, y)];\ z \in P,\ y \in F\}$, where $[(z, y)] = \{(z', y') \in P \times F;\ q(z') = q(z) \text{ and } \phi(z')y' = \phi(z)y\}$;
(ii) $\pi : E \to B$ is defined by $\pi : [(z, y)] \mapsto q(z)$;
(iii) $A^F_E = \{\tilde\phi;\ \phi \in A^G_P\}$, where $\tilde\phi : \pi^{-1}(U_\phi) \to F$ is defined for $\phi \in A^G_P$ by $\tilde\phi : [(z, y)] \mapsto \phi(z)y$;
(iv) $A_E = \{\psi^{E,\phi}_{i,j};\ \phi \in A^G_P,\ i \in I_B,\ j \in I_F\}$, where $I_B$ and $I_F$ are index sets for the atlases $A_B$ and $A_F$ respectively, and the charts $\psi^{E,\phi}_{i,j} : \mathrm{Dom}(\tilde\phi) \to \mathbb{R}^{n_B + n_F}$, with $n_B = \dim(B)$ and $n_F = \dim(F)$, are defined for $\phi \in A^G_P$, $i \in I_B$ and $j \in I_F$ by
$$\forall z \in q^{-1}(U_\phi),\ \forall y \in F, \quad \psi^{E,\phi}_{i,j}\bigl([(z, y)]\bigr) = \bigl(\psi^B_i(q(z)),\ \psi^F_j(\phi(z)y)\bigr),$$
where $\psi^B_i \in A_B$ and $\psi^F_j \in A_F$ correspond to indices $i \in I_B$ and $j \in I_F$ respectively.
34.6.11 Remark: Definition 34.6.10 is illustrated in Figure 34.6.1. The customary notation for the total space E in Definition 34.6.10 is $(P \times F)/G$ or $P \times_G F$.
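The reason for the name "orbit space" and the notation $(P \times F)/G$ can be illustrated in a finite model of conditions (i) to (iii) of Definition 34.6.10. This is only a sketch under strong assumptions: the base space is a single point, P = G = S3 with the identity map as the only fibre chart φ, and F = {0, 1, 2} with the permutation action. These choices and all function names are hypothetical. The code checks that each class [(z, y)] coincides with the orbit of (z, y) under g : (z, y) |-> (z*g, g^{-1}.y).

```python
from itertools import permutations, product

def compose(p, q):
    """Group product p*q in S3: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))  # structure group S3, also the single fibre of P
F = range(3)                      # fibre space, with S3 acting by g.y = g[y]

def act(g, y):
    """Left action of g in G on y in F."""
    return g[y]

def equivalence_class(z, y):
    """[(z, y)] as in condition (i), taking phi to be the identity chart."""
    return frozenset(
        (z2, y2) for z2, y2 in product(G, F) if act(z2, y2) == act(z, y)
    )

def orbit(z, y):
    """Orbit of (z, y) under the maps (z, y) |-> (z*g, g^{-1}.y) for g in G."""
    return frozenset((compose(z, g), act(inverse(g), y)) for g in G)

# Total space E: the set of all classes [(z, y)].
E = {equivalence_class(z, y) for z, y in product(G, F)}

def phi_tilde(cls):
    """Chart of condition (iii): [(z, y)] |-> phi(z).y, checked on every representative."""
    values = {act(z, y) for z, y in cls}
    assert len(values) == 1  # the value does not depend on the representative
    return values.pop()

classes_are_orbits = all(
    equivalence_class(z, y) == orbit(z, y) for z, y in product(G, F)
)
```

In this model E has exactly three points, one for each element of F, and the chart phi_tilde maps E bijectively onto F, as expected for a fibre bundle over a one-point base space.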
[ Try to get induced “infinitesimal transformations” on PFBs from OFBs and vice versa. This is relevant
to clarifying the relation between connections on OFBs and PFBs. Do this also for general associated
differentiable fibre bundles. ]

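(Diagram in two panels, "principal fibre bundle" and "associated fibre bundle", with the spaces $G$, $F$, $P$, $E$, $B$, their coordinate spaces $\mathbb{R}^{n_G}$, $\mathbb{R}^{n_F}$, $\mathbb{R}^{n_P}$, $\mathbb{R}^{n_E}$, $\mathbb{R}^{n_B}$, the manifold charts $\psi^G_k$, $\psi^F_j$, $\psi^P_\ell$, $\psi^{E,\phi}_{i,j}$, $\psi^B_i$, the projections $q$ and $\pi$, the fibre charts $\phi$ and $h(\phi) = \tilde\phi$, the association $h$, and the actions $\mu^G_G = \sigma_G$ and $\mu^F_G$.)
Figure 34.6.1 Associated differentiable fibre bundle construction, orbit space method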
34.7. Vector bundles

Vector bundles are differentiable fibre bundles for which the structure group is the general linear analytic
Lie left transformation group (GL(n), IRn ). It is also possible to define vector bundles as topological fibre
bundles, but this does not seem to be very common in the literature. Definition 34.7.1 adds some linearity
requirements to the general differentiable fibre bundle definition. For this, the set IRn must be equipped
with the usual linear space structure, but this level of (boring) detail is omitted here (for now).
[ See EDM2 [35], 147.F, for vector bundles. Also define “principal vector bundles”? Define “orthogonal
vector bundles” with the group O(n)? Possibly should use a different kind of tuple for vector bundles
to differentiable fibre bundles to reflect the special linear structure. But this would make it slightly less
convenient for saying that vector bundles are a subclass of the differentiable fibre bundles. ]
[ Could use general linear spaces V and GL(V ) instead of IRn and GL(n, IR) in Definition 34.7.1. Maybe
should deal with linear space structures here some day. ]
34.7.1 Definition: An n-dimensional vector bundle for $n \in \mathbb{Z}^+_0$ is a differentiable $(GL(n), \mathbb{R}^n)$ fibre bundle $(E, \pi, B, A^F_E)$ −< $(E, A_E, \pi, B, A_B, A^F_E)$ such that
(i) For all $b \in B$, $\pi^{-1}(\{b\})$ is a real n-dimensional linear space;
(ii) For all $b \in B$ and $\phi \in \mathrm{atlas}_b(E, \pi, B) = A^F_{E,b}$, the map $\beta_{b,\phi} : \pi^{-1}(\{b\}) \approx F$ is a linear space isomorphism.
34.7.2 Definition: A line bundle is any 1-dimensional vector bundle.
[ Perhaps should have a table showing the requirements of all components of fibre bundles. Thus the row for
the group G would indicate that the group is the homeomorphism group, or GL(n) etc., the row for F would
indicate that it is a topological space, a manifold, a vector space etc., and so forth. ]
34.8. Tangent bundles of differentiable manifolds

The following definition of a tangent bundle is chosen to match with Definition 34.2.2 for a differentiable
fibre bundle. In fact, it is a vector bundle according to Definition 34.7.1 if suitable linear space structure
is added to the fibre sets. If the linear space structures were added, then Definition 34.8.1 would specify a
vector bundle as claimed in Theorem 34.8.4.
The differentiable manifold of tangent vectors (T (M ), AT (M ) ) is defined in Section 27.8 as the total tangent
space of the manifold M . The tangent fibration of a differentiable manifold is given by Definition 27.8.8.
The tangent bundle in Definition 34.8.1 is the same as the tangent fibration of a manifold except that a fibre atlas $A^{\mathbb{R}^n}_{T(M)}$ is added to indicate how the structure group interacts with the total space of the fibre bundle.
34.8.1 Definition: The tangent fibre bundle of a $C^1$ n-dimensional manifold $M$ −< $(M, A_M)$ is the tuple $(T(M), \pi, M)$ −< $(T(M), A_{T(M)}, \pi, M, A_M, A^{\mathbb{R}^n}_{T(M)})$ where
(i) $(T(M), A_{T(M)}, \pi, M, A_M)$ is the tangent fibration of M as in Definition 27.8.8;
(ii) $A^{\mathbb{R}^n}_{T(M)} = \{\hat\psi;\ \psi \in A_M\}$, where for each $\psi \in A_M$, the fibre chart $\hat\psi : \pi^{-1}(\mathrm{Dom}(\psi)) \to \mathbb{R}^n$ is defined so that $\hat\psi : t_{p,v,\psi} \mapsto v$.
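34.8.2 Remark: Some of the maps and spaces in Definition 34.8.1 are illustrated in Figure 34.8.1.
(Diagram with the spaces $\pi^{-1}(U) \subseteq T(M)$, $U \subseteq M$, $\mathbb{R}^n$, $U \times \mathbb{R}^n \subseteq M \times \mathbb{R}^n$ and $\mathbb{R}^n \times \mathbb{R}^n \equiv \mathbb{R}^{2n}$, and the maps $\pi$, $\psi$, $\hat\psi$, $\pi \times \hat\psi$ and $\tilde\psi = (\psi \circ \pi) \times \hat\psi$.)
Figure 34.8.1 Tangent bundle spaces and maps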
[ Show that Definition 34.8.1 is well-defined. See Malliavin [36], lemma I.7.2.4. ]
34.8.3 Remark: As stated in Theorem 27.8.13, the pair $(T(M), A_{T(M)})$ in Definition 34.8.1 is a $C^k$ differentiable manifold if $(M, A_M)$ is a $C^{k+1}$ differentiable manifold for $k \in \overline{\mathbb{Z}}^+_0$.
34.8.4 Theorem: For $k \in \overline{\mathbb{Z}}^+_0$, the tuple $(T(M), \pi, M)$ −< $(T(M), A_{T(M)}, \pi, M, A_M, A^{\mathbb{R}^n}_{T(M)})$ in Definition 34.8.1 is a $C^k$ vector bundle if $M$ −< $(M, A_M)$ is a $C^{k+1}$ differentiable manifold.
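The loss of one order of differentiability in Theorem 34.8.4 can be seen from the fibre chart transition maps. The following sketch assumes the standard transformation rule for the components of a tangent vector, namely that $t_{p,v_1,\psi_1} = t_{p,v_2,\psi_2}$ if and only if $v_1 = J_{12}(p)\,v_2$. This rule is not quoted from Section 27.8, and the symbol $J_{12}$ is introduced here only for illustration.

```latex
\begin{align*}
J_{12}(p) &= D(\psi_1 \circ \psi_2^{-1})(\psi_2(p)) \in GL(n),
  && p \in \mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2), \\
\hat\psi_1(t) &= J_{12}(\pi(t))\, \hat\psi_2(t),
  && t \in \pi^{-1}\bigl(\mathrm{Dom}(\psi_1) \cap \mathrm{Dom}(\psi_2)\bigr).
\end{align*}
```

Under this assumption, the transition function $p \mapsto J_{12}(p)$ takes values in the structure group GL(n) and acts linearly on $\mathbb{R}^n$. Since $J_{12}$ is a first derivative of the $C^{k+1}$ map $\psi_1 \circ \psi_2^{-1}$, it is of class $C^k$.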
[ Discuss vector fields (such as infinitesimal transformations) on tangent bundles here. Define drop functions. ]
[ Maybe comment on the tangent bundle of a tangent bundle here. ]
[ Define the Lie derivative for general differentiable fibre bundles by using fibre bundle associations between the
tangent bundle and an arbitrary associated fibre bundle. This uses the associated parallelism for topological
fibre bundles. ]
34.9. Tangent frame bundles

[ In this section, define the tangent PFB as both group elements and as basis fields. E.g. see Definition 23.11.2
for associated topological PFBs. ]
Fibre charts for principal fibre bundles define a particular choice of “local coordinates” at each point of the
domain of the chart. The inverse of the structure group’s identity element under a PFB chart defines a
coordinate frame field on the base space of the fibre bundle.
34.9.1 Definition: The tangent (coordinate) frame bundle of a $C^1$ n-dimensional manifold $(M, A_M)$ is the principal fibre bundle $P(M)$ −< $(P, A_P, q, M, A_M, A^G_P)$ where
(i) The components $(P, A_P, q, M, A_M)$ are as in Definition 27.12.8;
(ii) $A^G_P = \{\hat\psi;\ \psi \in A_M\}$, where. . .
34.9.2 Remark: Definition 34.9.1 is illustrated in Figure 34.9.1. The right action $\mu^P_G : P \times G \to P$ of G on P defined by . . .

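(Diagram with the spaces $q^{-1}(U)$, $U \subseteq M$, $GL(n)$, $U \times GL(n)$, $\mathbb{R}^n$ and $\mathbb{R}^n \times GL(n)$, and the maps $q$, $\psi$, $\hat\psi$, $q \times \hat\psi$ and $\tilde\psi = (\psi \circ q) \times \hat\psi$.)
Figure 34.9.1 Tangent coordinate frame bundle spaces and maps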

34.9.3 Theorem: For $k \in \overline{\mathbb{Z}}^+_0$, the tuple $(P, A_P, q, M, A_M, A^G_P)$ in Definition 34.9.1 is a $C^k$ principal GL(n)-bundle if M is a $C^{k+1}$ manifold.
34.9.4 Remark: This remark is about attempting to generalize the concept of a frame bundle as a principal fibre bundle to associated principal fibre bundles of general topological fibre bundles. Associated fibre bundles are constructed in Section 23.11, but the frame bundle concept suggests another kind of construction for PFBs in general. The idea is to try to construct a principal fibre bundle by attaching a set of objects at each point of a manifold which is homeomorphic to the structure group.
A basis of the tangent space at each point of a differentiable manifold may be regarded as a set of “test points” for a fibre atlas. Thus the total space P is defined for the tangent space of an n-dimensional manifold as the set of all n-frames $(e_i)_{i=1}^n$ at all points of the manifold. The individual vectors $e_i$ in these n-frames are elements of the associated tangent bundle total space E. If $\phi \in A^F_E$ is an OFB chart and $\tilde\phi \in A^G_P$ is the corresponding PFB chart, then $\phi(e_i) = f_i \in F = \mathbb{R}^n$ for each i = 1 . . . n and $\tilde\phi\bigl((e_i)_{i=1}^n\bigr) = g \in G = GL(n)$. It happens that the matrix of g has linearly independent column vectors $f_i = g(f^0_i)$ for i = 1 . . . n, where the vectors $f^0_i$ are the standard unit vectors of $\mathbb{R}^n$. So the full matrix of g may be obtained from the action of g on the “test points” $f^0_i \in F$. Since the group action is uniquely determined by the n test points, the total space of the PFB may be represented in terms of sequences of basis vectors. (See Remark 23.12.1 for related comments.)
To generalize the idea of representing group elements as coordinate bases to general topological fibre bundles,
it is simply a matter of identifying a subset S of the standard fibre set F for which the action of the group
G is uniquely determined on the whole of F . It is always possible, of course, to choose S = F (because
structure groups always act effectively on their fibre spaces). Then the group element g is represented as the
set of pairs (f, gf ), and P is represented as sets of pairs (f, φ−1 (gf )) for f ∈ S. In this way, it is possible to
represent P in terms of the total space E of any associated (G, F ) fibre bundle. In the case that G = GL(n),
these pairs would be (fi0 , φ−1 (gfi0 )), and the standard vectors fi0 may be replaced with the indices i = 1 . . . n.
[ There’s a technicality to be fixed up here with (f, φ−1 (gf )). ]
The idea can be clarified by looking at the structure groups SO(3) and O(3). In the case of group elements $g \in SO(3)$, it is sufficient to specify the action of g on two points, for example any two standard basis vectors of $\mathbb{R}^3$. But in the case of $g \in O(3)$, g must be tested on at least three vectors in $\mathbb{R}^3$. Thus the total space P of a principal SO(3)-bundle may be defined in terms of pairs of vectors whereas an O(3)-bundle total space requires vector triples. The important thing is to make sure that a homeomorphism can be set up with the structure group.
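The difference between SO(3) and O(3) in this respect can be checked numerically. The sketch below is an illustration with hypothetical names. It uses the fact that an element g of SO(3) preserves the cross product, so that g(e3) = g(e1) x g(e2), whereas a reflection in O(3) can agree with the identity on e1 and e2 and still differ on e3.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def apply(m, v):
    """Matrix m (a tuple of rows) applied to the vector v."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def rebuild_so3(image_e1, image_e2):
    """Recover g in SO(3) from g(e1) and g(e2), using g(e3) = g(e1) x g(e2)."""
    image_e3 = cross(image_e1, image_e2)
    # The images of the basis vectors are the columns of the matrix of g.
    return tuple((image_e1[i], image_e2[i], image_e3[i]) for i in range(3))

E1, E2, E3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
ROTATION = ((0, -1, 0), (1, 0, 0), (0, 0, 1))    # quarter turn about the z-axis, det = +1
REFLECTION = ((1, 0, 0), (0, 1, 0), (0, 0, -1))  # in O(3) but not in SO(3), det = -1

# Two test points are enough to recover an element of SO(3).
recovered = rebuild_so3(apply(ROTATION, E1), apply(ROTATION, E2))

# The reflection and the identity agree on e1 and e2 but not on e3.
agree_on_two = all(apply(REFLECTION, e) == apply(IDENTITY, e) for e in (E1, E2))
differ_on_third = apply(REFLECTION, E3) != apply(IDENTITY, E3)
```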
[ Near here, probably in a new section, define fibre bundles for tensor spaces such as T r,s (M ). Also define
tangent bundles for product manifolds M1 × M2 , manifold quotients, and so forth. ]
[ Possibly have a section on “differentiable double fibre bundles”. These are fibre bundles which include double
tangent spaces/bundles as a special case. ]


Chapter 35
Connections on differentiable fibre bundles
35.1 Naming, history and choice of definitions
35.2 Differentiation of parallel transport
35.3 Horizontal lift functions for ordinary fibre bundles
35.4 Curvature of connections on ordinary fibre bundles
35.5 Horizontal lift functions for principal fibre bundles
35.6 Connection forms for PFB connections
35.7 Covariant derivatives for general connections
35.8 Parallel displacement for PFB connections
35.9 Alternative definitions for general connections
35.0.1 Remark: The differential of parallelism is called a “connection”. Connections are defined in many
different and confusing ways, but they are all equivalent to differentials of parallel translations of fibre sets
with respect to base points of a fibre bundle. General topological parallelism is presented in Chapter 24.
35.0.2 Remark: Since a connection is the differential of the parallel transport of fibre sets in each direc-
tion V ∈ Tp (M ) at each base point p ∈ M of a differentiable manifold M , the pathwise parallelism may be
reconstructed from a connection by integration along paths. If the information in a definition of parallelism
can be reproduced in this way from its differential, then the differential representation has many advantages.
Connections represent the information more efficiently and are much easier to work with than a pathwise
definition. Some generality is lost but this is usually not a problem. Since connections are differentials, the
fibre bundle is required to be differentiable. That is why differentiable fibre bundles must be defined before
connections. Differentiable fibre bundles are presented in Chapter 34.
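The reconstruction of pathwise parallelism from a connection by integration along paths can be illustrated in the simplest non-trivial case. The following sketch rests on several assumptions which are not taken from the text: the base space is the real line, the structure group is SO(2) acting on the plane, and the connection is described by a single coefficient function a(x), so that parallel transport over a small step dx is taken to be a rotation by the angle a(x) dx. The sign convention and all names are hypothetical.

```python
import math

def rotation(theta):
    """Element of SO(2): rotation of the plane by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matmul(a, b):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def transport(a, x0, x1, steps=1000):
    """Approximate parallel transport from x0 to x1.

    The result is an ordered product of small rotations, one for each
    subinterval, with the coefficient a evaluated at the midpoint.
    """
    g = ((1.0, 0.0), (0.0, 1.0))
    dx = (x1 - x0) / steps
    for k in range(steps):
        midpoint = x0 + (k + 0.5) * dx
        g = matmul(rotation(a(midpoint) * dx), g)
    return g

def connection_coefficient(x):
    """Hypothetical connection coefficient a(x) = 2x."""
    return 2.0 * x

numeric = transport(connection_coefficient, 0.0, 1.0)
exact = rotation(1.0)  # the integral of 2x over [0, 1] is 1
error = max(abs(numeric[i][j] - exact[i][j]) for i in range(2) for j in range(2))
```

Each factor in the product is an element of the structure group, and the coefficient a(x) plays the role of a Lie algebra element. For a non-commutative structure group the order of the factors matters, which is why the product is accumulated from the start of the path.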
35.0.3 Remark: Fibre bundles have explicit or implicit structure groups. If no structure group is specified,
it is assumed to be the group of all homeomorphisms of the fibre space in the case of a topological fibre
bundle, or the group of all C k diffeomorphisms in the case of a C k fibre bundle.
Parallelism on a fibre bundle is required to preserve the structure implied by its structure group. Therefore
parallel translation of the fibre between any two points must be an element of the structure group (via the
fibre bundle chart). Since a connection is the differential of a parallel translation, the structure group must
be differentiable, which implies that it must be a Lie group. Therefore the value of a connection function
must be an element of the Lie algebra of the structure group. It follows that Lie groups and Lie algebras
are inescapable in the definition of connections. Lie groups are presented in Chapter 33.
35.0.4 Remark: As mentioned in Section 1.1, the metric layer is built on top of the connection layer. Thus
there are infinitely many choices for the metric structure on a manifold for a given connection structure, but
each metric structure determines a unique connection which supports the metric, namely the Levi-Civita
connection. So the metric structure contains all of the information required to construct the connection
layer.
It could be argued that in some sense the connection layer does determine the metric layer. Suppose
a Riemannian metric is given on a differentiable manifold. From this, the Levi-Civita connection may be


constructed. Then if the metric structure is known at just one point (and the manifold is pathwise connected),
the metric at all points may be determined by parallel transport, which preserves the length of vectors.
There are two difficulties with this argument. The weaker counter-argument is that the metric can be
arbitrarily scaled without changing the Levi-Civita connection. So there are infinitely many Riemannian
metrics for the given connection, but these metrics are related by a very simple scaling relation. The stronger
counter-argument is that it is only a very special kind of connection which is the Levi-Civita connection of
some Riemannian metric. Such a connection would need to at least preserve some kind of orthogonal
structure group for all paths, but the definition of such a structure group presupposes the definition of a
metric. Any connection which does not preserve an orthogonal structure group will be inconsistent with any
Riemannian metric.
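The weaker counter-argument in Remark 35.0.4 can be verified directly from the usual coordinate formula for the Christoffel symbols of the Levi-Civita connection. The following worked step assumes a constant scaling factor $c > 0$. The symbols $\tilde g$ and $\tilde\Gamma$ for the scaled metric and its connection coefficients are introduced here only for this check.

```latex
\begin{align*}
\tilde g_{ij} &= c\, g_{ij}
  \quad\Longrightarrow\quad \tilde g^{kl} = c^{-1} g^{kl}, \\
\tilde\Gamma^k_{ij}
  &= \tfrac12\, \tilde g^{kl}
     \bigl( \partial_i \tilde g_{jl} + \partial_j \tilde g_{il} - \partial_l \tilde g_{ij} \bigr) \\
  &= \tfrac12\, c^{-1} g^{kl}\, c\,
     \bigl( \partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij} \bigr)
   = \Gamma^k_{ij}.
\end{align*}
```

The factors $c^{-1}$ and $c$ cancel only because c is constant. A scaling factor which varies from point to point contributes extra derivative terms and in general changes the connection.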
35.0.5 Remark: For differential geometry, the most important kinds of connections are linear (called
“affine”) connections on tangent bundles of differentiable manifolds. Affine connections are presented in
Chapters 36 and 37.
35.1. Naming, history and choice of definitions

35.1.1 Remark: The word “connection” is a potentially confusing choice of name for differential paral-
lelism. Hermann Weyl called it a “Zusammenhang” in German. (See Weyl [51], section 15, page 113.) The
English word “connection” probably came from “Zusammenhang” which does mean “connection” although
it literally means “hang together”.
35.1.2 Remark: The index of the Hermann Weyl book [51, page 344] written in 1918–1922 mentions three
kinds of connection (“Zusammenhang”), namely the affine connection, metric connection and continuous
connection (“affin”, “metrisch” and “stetig”).
When the two “continuous connection” references are followed, they do not lead to the explicit phrase
“stetiger Zusammenhang”, but to discussions of continuous n-dimensional manifolds with no connection
and no metric. It is clear from the first text location ([51, page 78–79]) that Weyl is talking about a
purely topological manifold as in Definition 25.3.1 (without mentioning the technical Hausdorff space re-
quirement). At the second text location ([51, page 104]) Weyl distinguishes between a topological manifold
or “continuum” (“das Kontinuum”), an “affinely connected manifold” (“die affin zusammenhängende Man-
nigfaltigkeit”) and a “metric space” (“der metrische Raum”). (See also Weyl’s “three-storey” DG structure
model, Remark 25.1.9.)
It seems a fairly safe assumption that Weyl understands a “continuous connection” to be a structure of
topological charts, whereas an “affine connection” is a differential parallelism structure ([51, page 113]).
In other words, the term “affine connection” is contrasted to a mere “topological connection” or locally
Euclidean structure. For Weyl’s word “Zusammenhang”, we could perhaps substitute the English word
“structure”. Then the three levels of his “Zusammenhang” would be the “continuous structure” (topology),
“affine structure” and “metric structure”.
35.1.3 Remark: A more recent German text (Goenner [21], section 8.2, page 226) refers to a connection as
a “lineare Übertragung” or “Konnektion”. The word “Übertragung” means “transference” or “transmission”,
which is fairly close to “transport”. In English, there is no need for argument. Differential parallelism is
always called a “connection”. The integral of the connection along a path is always called “parallel transport”.
35.1.4 Remark: The word “connection” may be motivated by considering differentiation of vector fields on
manifolds. If there is no connection, differentiation of vector fields is chart-dependent. The difference between
two vectors at different base points can only be evaluated if they are in the same vector space. Therefore
the fibre sets at different base points of a manifold must be “connected” by chart-independent parallelism
isomorphisms between the fibre sets. Then vector subtraction is well defined. Therefore differentiation may
be defined. Connections are differentials with respect to base points of these isomorphisms which “connect”
the fibre sets at different base points.
35.1.5 Remark: Although Riemann introduced Riemannian geometry as an intrinsic generalization of two-
dimensional surfaces embedded in Euclidean 3-space in 1854, and Christoffel symbols (which are coordinates
of a connection) were used in the 19th century in covariant differentiation, the concept of a connection
which is independent of the Riemannian metric seems to have been introduced in the early 20th century by
Levi-Civita (1917) and developed by Weyl, Eddington and Cartan (1923–25). This innovation interposed
a connection layer between the differentiable layer and the metric layer. (These layers are discussed in
Section 1.1. A clear historical account of the introduction of general connections into differential geometry
may be found in EDM2 [35], article 109.) According to Bell [190], page 360, covariant differentiation was
invented by Christoffel in about 1869 and was given this name by Ricci-Curbastro in 1887.
35.1.6 Remark: Most texts define connections on principal fibre bundles. It is possible to define connec-
tions on ordinary fibre bundles and this has been done here as the primary definition. One of the problems
with principal fibre bundles is the fact that they are chart-dependent. So the apparently chart-independent
definitions of connections on PFBs are in fact illusory because they sit on top of chart-dependent PFBs.
[ An old note from 2002-11-6: “Make the point that for a fibration, the curvature expression is chart-dependent,
but although the connection form and curvature form for a PFB are apparently chart-independent, the
associated PFB of an OFB is very arbitrary – equivalent to the arbitrariness of the chart for an OFB.” ]
35.1.7 Remark: Most DG textbooks aimed at mathematicians formalize connections either as connection
forms or as horizontal subspaces on principal fibre bundles. Parallelism can be extracted from such definitions
with sufficient effort. In this book, definitions are chosen for maximum comprehensibility. Therefore the
horizontal lift function which directly specifies infinitesimal parallelism is the primary definition here. When
the horizontal lift function is integrated over a path, the result is parallel transport of fibre sets from point
to point within a manifold.
35.1.8 Remark: Among the styles of definition of a connection are the following.
style of definition        bundle type      notation     reference
pathwise parallelism       topological      Θ^γ_{s,t}    24.2
horizontal lift function   differentiable   θ_V(z)       35.3
connection form            diffbl. PFB      ω_z          35.6
horizontal subspaces       differentiable   Q_z          35.9
horizontal component map   differentiable   h_z          35.9
covariant derivative       differentiable   D_V X        36.5
Koszul connection          tangent          D_X Y
Christoffel symbols        tangent          Γ^i_{jk}     36.10
affine geodesics           tangent          γ            37.2
The styles marked “tangent” are defined only for tangent bundles. The connection form requires a Lie group
as its structure group and is defined only on differentiable principal fibre bundles.
35.1.9 Remark: Some treatments of connections in the literature are Misner/Thorne/Wheeler [38], chap-
ter 10, EDM2 [35], article 80, Gallot/Hulin/Lafontaine [20], section II.B, and Weyl [51], section 15.
[ Spivak [43], pages II.336–341, has comparisons of some alternative definitions of connections. See Darling [14],
section 9.1, page 194, for the Koszul connection. See Malliavin [36], page 89 for discussion of parallelism
with connection forms. Should have a table showing relations between the above styles of definitions of
connections. ]
[ Should try to define a “second-order connection”, which means a connection which tells how to transport
second-order derivatives on a manifold rather than first-order. Probably this can be defined in terms of the
first-order connection, but maybe it could be independently defined in a meaningful sense. ]
[ Although it is assumed here that the fibre bundles on which connections are defined are differentiable, it’s
perhaps not meaningless to ask what a connection on a merely continuous principal fibre bundle might look
like. If the manifold is only C 1 , then the tangent bundle will be only C 0 , and this should give rise to a C 0
fibre bundle. Then it should be possible to define how a fibre changes as the base point in M moves along
a continuous path. (Probably need a Lipschitz manifold and connection, with rectifiable paths.) ]

35.2. Differentiation of parallel transport

35.2.1 Remark: Parallel transport is usually defined as the integral of a connection. In this section, the
reverse procedure is carried out. The connection is defined as the differential of parallel transport, just
to show that it can be done. In real life, however, the connection is always defined first, and the parallel
transport is defined as the integral of this.
35.2.2 Remark: Parallelism on fibre bundles is formalized in Section 24.2 as fibre set isomorphisms Θγs,t :
Eγ(s) → Eγ(t) between fibre sets Eγ(s) and Eγ(t) of a (G, F ) fibre bundle (E, π, B, AF E ) for parameters
s, t ∈ Iγ of suitable curves γ : Iγ → B. When B is a differentiable manifold, the natural choice for a class
of curves to define parallelism on is the set of all rectifiable curves. (See Section 17.5 for rectifiable curves.
In the context of the Hölder continuity C k,α , a reasonable notation for these curves would be C 0,1 or C0,1 .)
Other popular choices for the curve class are piecewise C 1 and piecewise C ∞ . The important thing is that
each curve must be differentiable for almost all parameter values. In terms of a curve class C , parallelism
has the form:
$$ \Theta : \mathcal{C} \to \bigcup_{\gamma\in\mathcal{C}} \Bigl( I_\gamma \times I_\gamma \to \bigcup_{p,q\in B} \mathrm{Iso}_G(E_p, E_q) \Bigr). $$
(See Definition 24.2.2 for full details.) For a fixed z ∈ Eγ(s) for any curve γ ∈ C and s ∈ Iγ , one may
attempt to differentiate the map γ̃ z : Iγ → E defined by γ̃ z : t ↦ Θγs,t (z) ∈ Eγ(t) . (This is a lift of γ
from B to E.) Note that γ̃ z (t) is in a moving fibre set Eγ(t) . This is exactly the kind of thing which
cannot be differentiated in a fibre-chart-independent manner in the absence of a connection. However, this is
actually a good thing because the covariant derivative of a general cross-section on the curve will be defined
by subtracting (in some sense) the fibre-chart-dependent derivative of γ̃ z from the fibre-chart-dependent
derivative of the general cross-section. The deviation of a cross-section from parallel transport along a curve
will be fibre-chart-independent. (This is illustrated in Figure 35.2.1.)
Figure 35.2.1 Deviation of a cross-section V from parallel transport Θγs,t (z)
To differentiate the map γ̃ z : Iγ → E requires a differentiable structure on E, namely a C 1 atlas on E. Denote
the derivative of this map by ∂t γ̃ z = ∂t Θγs,t (z) ∈ Tz (E). This map, regarded as a function of z ∈ Eγ(t) ,
must be an infinitesimal left action on the fibre set Eγ(t) . (This means that the map must be the vector
field generated by a family of diffeomorphisms of the fibre set.) The definition of parallelism also guarantees
reparametrization independence, transitivity, reversibility and consistency on subcurves. These properties
imply some useful properties for the differential ∂t Θγs,t (z) of a parallelism specification Θ. However, there
are many commonly accepted properties of connections which do not follow from the general definition of
parallelism (Definition 24.2.2).
35.2.3 Remark: There is no guarantee that the derivative of parallelism will be the same for all curves
γ passing through a given point p = γ(t) with the same curve velocity γ′(t). Even if this assumption
is added, there is still no guarantee that ∂t Θγs,t (z) will depend linearly on γ′(t). Nevertheless, these two
properties are generally assumed, probably because the examples of interest, such as Levi-Civita connections,
usually have these properties. Therefore it is generally assumed that ∂t Θγs,t (z) = θ_{γ′(t)}(z) for some function
θV : Eγ(t) → T (E) for each V ∈ Tγ(t) (B) such that θV (z) ∈ Tz (E), and that the map V ↦ θV (z) is linear
for all fixed z ∈ Eγ(t) .
In summary, four assumptions are typically made for parallelism on a differentiable fibre bundle:
(1) the derivative ∂t Θγs,t (z) is defined whenever γ′(t) is defined;
(2) ∂t Θ^{γ1}_{s,t}(z) = ∂t Θ^{γ2}_{s,t}(z) whenever γ1(t) = γ2(t) and γ1′(t) = γ2′(t);
(3) ∂t Θγs,t (z) depends linearly on γ′(t);
(4) the structure group is a Lie transformation group.
When all of these conditions hold, the parallelism is said to be determined by a “connection”. The connection
is the map θV . The integral of this along a curve γ equals the parallel transport Θγ along the curve. In
other words, parallel transport is the integral of the connection, and the connection is the differential of the
parallel transport, but this is only true when the above conditions hold. However, most textbooks ignore
any kind of parallelism which does not satisfy the above conditions.
35.2.4 Remark: Assumption (4) in Remark 35.2.3 may be relaxed in two ways: either by permitting any
diffeomorphism of the fibre space (through the charts) to be a valid parallelism, or by defining a structure
group which is not a Lie transformation group. The groupless fibre bundle in the first case is actually a
fibration. An example of the second case is a structure group such as the group of all orientation-preserving
diffeomorphisms of a differentiable manifold such as ℝ^n or the group of isometries of a Banach space. A
minimum requirement is that the derivative ∂t Θγs,t (z) must be well-defined, and this requires some sort of
differentiable structure on the fibre space.
35.2.5 Remark: Even if assumption (4) in Remark 35.2.3 is assumed to hold, one may expect one or
more of the remaining assumptions to fail in applications to physical models, especially in general relativity.
Assumptions (1) and (2) seem quite reasonable since the physical world is generally fairly continuous. As-
sumption (3) is less credible. At boundaries of regions in physical models, parallelism can be expected to do
strange things.
Assumption (1) may be replaced by the weaker condition that the one-sided derivatives ∂t+ Θγs,t (z) are defined
whenever the one-sided derivative ∂t+ γ(t) is defined. Then (1), (2) and (3) may be replaced as follows:
(1′) the one-sided derivative ∂t+ Θγs,t (z) is defined whenever ∂t+ γ(t) is defined;
(2′) ∂t+ Θ^{γ1}_{s,t}(z) = ∂t+ Θ^{γ2}_{s,t}(z) whenever γ1(t) = γ2(t) and ∂t+ γ1(t) = ∂t+ γ2(t);
(3′) ∀λ ≥ 0, (γ1(t) = γ2(t) and ∂t+ γ1(t) = λ ∂t+ γ2(t)) ⇒ ∂t+ Θ^{γ1}_{s,t}(z) = λ ∂t+ Θ^{γ2}_{s,t}(z).
The parallelism in Example 35.2.6 satisfies (1′), (2′) and (3′), but not (1), (2) or (3).
35.2.6 Example: Define a trivial differentiable (G, F) fibre bundle ξ = (E, π, B, A^F_E) with B = ℝ^2,
F = ℝ^2 or S^1, E = B × F, π : (b, f) ↦ b, A^F_E = {φ}, φ : E → F, φ : (b, f) ↦ f, and G = SO(2). Define an
absolute parallelism Θ on ξ by Θ^γ_{s,t} : E_{γ(s)} ≈ E_{γ(t)} with Θ^γ_{s,t} : z ↦ A(γ(s), γ(t)) z, where A : B × B → G is
defined by
$$ A(b_1, b_2) = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \quad\text{with}\quad \alpha = |\psi_B^1(b_2)| - |\psi_B^1(b_1)| $$
for b1, b2 ∈ B, where ψB : B → ℝ^2 is the identity chart for B. This is illustrated in Figure 35.2.2.
Figure 35.2.2 Parallelism with one-sided directional derivatives

Write x^j = ψ_B^j(γ(t)) for j = 1, 2. Without loss of generality, assume that ψ_B^1(γ(s)) = 0. Then for x^1 > 0
and z ∈ Eγ(t) ,
$$ \partial_t \Theta^\gamma_{s,t}(z) = \partial_t \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} z^1 \\ z^2 \end{pmatrix} = \frac{\partial\alpha}{\partial t} \begin{pmatrix} -\sin\alpha & -\cos\alpha \\ \cos\alpha & -\sin\alpha \end{pmatrix} \begin{pmatrix} z^1 \\ z^2 \end{pmatrix}, $$
where ∂t α = V^1 and V = (V^1, V^2) = γ′(t), assuming the identity chart on F and its tangent space T (F ).
It follows that θV (z) = ∂t+ Θγs,t (z) depends only on V = γ′(t) for x^1 > 0 and is linear with respect to γ′(t).
Therefore all three two-sided assumptions (1), (2) and (3) above are satisfied for x^1 > 0.
For x^1 < 0, ∂t Θγs,t (z) satisfies the same equation as above with ∂t α = −V^1. However, when x^1 = 0,
∂t+ α = |V^1| and θV (z) = ∂t+ Θγs,t (z) is well-defined for V = ∂t+ γ(t), but ∂t Θγs,t (z) is not defined for V^1 ≠ 0.
Therefore the three one-sided assumptions (1′), (2′) and (3′) above are satisfied for x^1 = 0, but not the
two-sided assumptions because ∂t Θγs,t (z) is not defined when γ(t)^1 = 0 and γ′(t)^1 ≠ 0.
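The behaviour claimed in Example 35.2.6 can be checked numerically. The following is only a rough sketch, assuming straight-line curves t ↦ p + tV in B = ℝ^2 and finite difference quotients in place of true derivatives; the function names are hypothetical and are not part of the text.

```python
import math

def alpha(b_start, b_end):
    """Rotation angle of Example 35.2.6: |x^1(b_end)| - |x^1(b_start)|."""
    return abs(b_end[0]) - abs(b_start[0])

def transport(b_start, b_end, z):
    """Apply the rotation A(b_start, b_end) in SO(2) to the fibre element z."""
    a = alpha(b_start, b_end)
    c, s = math.cos(a), math.sin(a)
    return (c * z[0] - s * z[1], s * z[0] + c * z[1])

def rate(p, V, z, h):
    """Difference quotient (Theta(z) - z) / h along the straight curve t -> p + t V.
    h > 0 approximates the right derivative, h < 0 the left derivative."""
    q = (p[0] + h * V[0], p[1] + h * V[1])
    w = transport(p, q, z)
    return ((w[0] - z[0]) / h, (w[1] - z[1]) / h)

z = (1.0, 0.0)
h = 1e-6

# On the line x^1 = 0 the right and left quotients disagree (about +2 and -2
# in the second component), so the two-sided derivative does not exist.
right_on_axis = rate((0.0, 0.0), (2.0, 1.0), z, h)
left_on_axis = rate((0.0, 0.0), (2.0, 1.0), z, -h)

# Away from that line (x^1 = 1) both quotients agree.
right_off_axis = rate((1.0, 0.0), (2.0, 1.0), z, h)
left_off_axis = rate((1.0, 0.0), (2.0, 1.0), z, -h)

# The right derivative is positively homogeneous in V but not linear:
# V and -V give the same value, while 3V gives three times the value.
right_reversed = rate((0.0, 0.0), (-2.0, -1.0), z, h)
right_scaled = rate((0.0, 0.0), (6.0, 3.0), z, h)

print(right_on_axis, left_on_axis)
print(right_off_axis, left_off_axis)
print(right_reversed, right_scaled)
```

The step size h = 1e-6 is an arbitrary choice; the quotients only approximate the one-sided derivatives ∂t+ Θγs,t (z).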
35.2.7 Theorem: [ If the conditions in Remark 35.2.3 hold, then the parallel transport Θ equals the
integral of the connection θ which is the differential of the parallel transport Θ. ]
[ Show (?) how differential parallelism can be derived from pathwise parallelism using additivity of pathwise
curvature. Then use the Stokes Theorem and exterior derivatives in a differentiable manifold to derive the
Riemann curvature tensor. Try to get an explicit formula for the horizontal lift functions as derivatives of
the group element gf,φ1 ,φ2 (b1 , b2 ) for a parallelism f . It should then be possible to express curvature directly
as some sort of second-order derivative of the parallelism. ]
[ Must also present differentiability of parallelism for differentiable fibrations or differentiable fibre bundles
with non-Lie structure group. ]
[ Put a new section here for “Horizontal lift functions for differentiable fibrations”? ]
35.2.8 Remark: [ Since the connection is the derivative of parallel transport, try to generalize connections
to almost-everywhere functions classes like L1 . Then use some sort of Sobolev space for the parallelism
and see if the conversions between connections and parallelisms gives back the same structures for such
generalized function classes. Similar considerations arise for conversions between the two-point metric and
Riemannian metric (tensor field) in Section 38.4, particularly Remark 38.4.1. ]
35.3. Horizontal lift functions for ordinary fibre bundles

35.3.1 Remark: Section 33.8 discusses vector fields (infinitesimal actions) generated on a G-manifold by
the action of a Lie group G. The extension of this concept to differentiable fibre bundles will be discussed
in Section 34.3. A connection may be represented as a family of vector fields on the total space of a
differentiable fibre bundle, which are generated by structure group actions and parametrized by base space
direction vectors.
[ Near here, define families of vector fields on the total space of a differentiable fibre bundle which depend
linearly on tangent vectors in the base space. Then one may define a special class of these which are generated
by the structure group. Condition (ii) requires that (V ↦ θV (z)) ∈ Lin(Tπ(z) (M ), Tz (E)) for all z ∈ E, but
the map V ↦ θV for fixed p ∈ M is from Tp (M ) to a space of vector fields whose domain is Ep and range
is T (E). So this is a family of fields which depend linearly on the family parameter. See Section 34.3. ]
35.3.2 Remark: Definition 35.3.5 defines connections on ordinary fibre bundles instead of principal fibre
bundles. This makes it possible, for example, to define affine connections on tangent bundles of differentiable
manifolds rather than the more usual coordinate frame bundles.
The connection introduced in Definitions 35.3.5 and 35.3.12 could be called an “OFB connection” to distin-
guish it from the “PFB connection” in Definition 35.5.3.
35.3.3 Remark: Definition 35.3.5 specifies a connection by fixing a vector V in the base space and stating
how all points in the fibre attached to the base point p move in a parallel fashion when the point p is moved
in the direction V . This is illustrated in Figure 35.3.1. This may be thought of as a vector field on the total
space for each base space velocity.
Figure 35.3.1 Horizontal lift function on an ordinary fibre bundle
Definition 35.3.12 does the reverse. In Definition 35.3.12, an element of the fibre at a point in the base space
is fixed, and then the parallel motion of that element is specified for each velocity of the base point. This
may be thought of as a linear map from the space of base space velocities to the space of fibre velocities for
each fibre element.
35.3.4 Remark: Part (iv) of Definition 35.3.5 is specified in terms of the differential of the right action
Rf of points f ∈ F on elements of the group G. For all f ∈ F , Rf : G → F is defined by Rf : g ↦ gf . Such
“differential actions” dRf are discussed more fully in Section 33.8.
[ (dRφ(z) )e (u) is right-invariant. Therefore it is an infinitesimal left action!? ]

[ Show how the conditions in Definition 35.3.5 follow from conditions on parallel transport. See Remark 35.2.3. ]
35.3.5 Definition: A (horizontal) lift function on a C^1 (G, F) fibre bundle (E, πE, M) is a function
θ : T(M) → ⋃_{p∈M} (E_p → T(E)) which satisfies
(i) ∀p ∈ M, ∀V ∈ T_p(M), ∀z ∈ E_p, θ_V(z) ∈ T_z(E),
(ii) ∀z ∈ E, (V ↦ θ_V(z)) ∈ Lin(T_{πE(z)}(M), T_z(E)),
(iii) ∀p ∈ M, ∀V ∈ T_p(M), ∀z ∈ E_p, (dπE)_z(θ_V(z)) = V,
(iv) ∀p ∈ M, ∀V ∈ T_p(M), ∀φ ∈ A^F_{E,p}, ∃u ∈ T_e(G), ∀z ∈ E_p, (dφ)_z(θ_V(z)) = (dR_{φ(z)})_e(u).
[ Re-express (iv) as an invariance rule for vector fields rather than in terms of a specific choice of u ∈ Te (G),
which is unnecessarily restrictive. ]
[ How does (dRφ(z) )e (u) in (iv) compare to infinitesimal left actions γ′(0)? ]
35.3.6 Remark: The maps and spaces in Definition 35.3.5 are illustrated in Figure 35.3.2.
35.3.7 Remark: Condition (ii) means that for fixed p ∈ M , θ|_{T_p(M)} is a linear map. [ Section 34.3 will
discuss this kind of field on Ep with values in T (E). It is not entirely clear why this map should be linear.
Should give some explanation of why this condition is required. ]
Condition (iii) means that the horizontal component of θV is V . This is because a fibre moving in a parallel
fashion along a path must move so that the base point of the fibre follows the path.
Condition (iv) of Definition 35.3.5 is a kind of invariance condition. It means that the vertical component
(with respect to a particular choice of fibre chart φ) of the connection value θV (z) for fixed V is the action
of an element of the Lie algebra Te (G) on the fibre space F . The connection is not really invariant under
the group action. The connection is a group action. So the fibre structure is invariant under the action of
the connection.
Although the connection θ is chart-independent, the vector u ∈ Te (G) depends on the choice of fibre chart.
[ There should be some sort of transition rule for calculating u for all fibre charts if it is given for only one
fibre chart. ]

Figure 35.3.2 Maps and spaces for a horizontal lift function on an ordinary fibre bundle
In a sense, elements of the Lie algebra of a Lie group are really members of the group itself. This may
be compared with distribution theory and Radon measures, where continuous density functions and point
masses are considered to be members of the same space. Similarly, the elements of the Lie algebra may be
considered as the continuous part whereas the group elements are the discrete part of a combined space.
The linear map (dφ)z in condition (iv) takes the vertical component of θV (z) relative to a fibre chart φ. (More
objectively, (dφ)z removes the horizontal component. The vertical component depends on the fibre chart.)
Each fibre chart may define a different vertical component. The map (dφ)z is “orthogonal” with respect to
(dπE )z in the sense that ker(dφ)z ∩ ker(dπE )z = {0}. (This follows from the C 1 diffeomorphism πE × φ :
πE^{−1}(U ) ≈ U × F for U = πE (Dom(φ)).) The map (dπE )z removes the vertical component of vectors in Tz (E).
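Condition (iv) can be made concrete on a trivial bundle. The following is a minimal sketch under assumptions not taken from the text: M = ℝ^2, F = ℝ^2, G = SO(2), the fibre chart φ(p, f) = f, and a hypothetical connection coefficient `rotation_rate`. Since T_e(SO(2)) is spanned by the single generator J = ((0, −1), (1, 0)), the check fits one rate r at a sample fibre point and then tests whether u = rJ reproduces the vertical component at every other point of the same fibre.

```python
def rotation_rate(p, V):
    """Hypothetical connection coefficient: linear in the base velocity V at p."""
    return -p[1] * V[0] + p[0] * V[1]

def vertical_good(p, V, f):
    """Vertical part r * J f, which is (dR_f)_e(u) for u = r J in so(2)."""
    r = rotation_rate(p, V)
    return (-r * f[1], r * f[0])

def vertical_bad(p, V, f):
    """Vertical part whose rate also depends on |f|^2, so no single u can serve the fibre."""
    r = rotation_rate(p, V) * (f[0] ** 2 + f[1] ** 2)
    return (-r * f[1], r * f[0])

def fit_rate(vert, p, V, f0):
    """Solve vert(f0) = r * J f0 for the scalar r by projecting onto J f0."""
    Jf = (-f0[1], f0[0])
    w = vert(p, V, f0)
    return (w[0] * Jf[0] + w[1] * Jf[1]) / (Jf[0] ** 2 + Jf[1] ** 2)

def single_u_serves_fibre(vert, p, V, fibre_points, tol=1e-9):
    """True if the u fitted at the first fibre point works for all listed points."""
    r = fit_rate(vert, p, V, fibre_points[0])
    for f in fibre_points:
        w = vert(p, V, f)
        if abs(w[0] + r * f[1]) > tol or abs(w[1] - r * f[0]) > tol:
            return False
    return True

p, V = (0.5, -1.0), (2.0, 3.0)
points = [(1.0, 0.0), (0.0, 2.0), (-2.0, 0.5)]
good_ok = single_u_serves_fibre(vertical_good, p, V, points)
bad_ok = single_u_serves_fibre(vertical_bad, p, V, points)
print(good_ok, bad_ok)
```

In this sketch the first vertical field passes and the second fails, which illustrates the order of quantifiers in (iv): the element u is chosen before z, so it must be the same for the whole fibre over p.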
35.3.8 Remark: The role of structure groups of fibre bundles is shown by the definition of a connection.
A connection is required to preserve the structure of the fibre which is indicated by the structure group.
Therefore the value of the connection is the differential of a group action acting on the fibre bundle.
[ Must define the lift function for vector fields. Definition 35.3.9 should not be expressed in terms of a vector
field lift. There should be a more basic way of doing this. It should be a theorem that a C k connection will
lift a C k vector field to C k vector field on the total space. ]
35.3.9 Definition: A connection of class C^k for k ∈ ℤ̄^+_0 on a C^{k+1} differentiable (G, F) fibre bundle
ξ = (E, πE, M, A^F_E) is a connection θ on ξ such that lift_θ(X) ∈ X^k(E) for all X ∈ X^k(M).
35.3.10 Remark: If a connection is C k , then it seems reasonable that the integral of the connection along
a C k+1 path should be a C k+1 function of the base point. In particular, if the connection is continuous, then
the parallel transport along a C 1 path should be C 1 . [ Should express this more precisely in a theorem. ]
[ Define the lift of a path. Maybe do this in Section 35.8. ]
35.3.11 Remark: The connection in Definition 35.3.5 is transposed in the equivalent Definition 35.3.12.
If θ satisfies Definition 35.3.5, then θ̄ defined by θ̄z (V ) = θV (z) satisfies Definition 35.3.12 and vice versa.
Definition 35.3.12 uses the same notation Rf for the right action of an element of the fibre space F as
Definition 35.3.5.
35.3.12 Definition: A (transposed) (horizontal) lift function on a C^1 (G, F) fibre bundle (E, πE, M) is a
map θ̄ : E → ⋃_{z∈E} Lin(T_{πE(z)}(M), T_z(E)) which satisfies
(i) ∀z ∈ E, θ̄_z ∈ Lin(T_{πE(z)}(M), T_z(E)),
(ii) ∀z ∈ E, (dπE)_z ◦ θ̄_z = id_{T_{πE(z)}(M)},
(iii) ∀p ∈ M, ∀V ∈ T_p(M), ∀φ ∈ A^F_E, ∃u ∈ T_e(G), ∀z ∈ πE^{−1}({p}), (dφ)_z(θ̄_z(V)) = (dR_{φ(z)})_e(u).
[ Should u in (iii) be notated as u_{V,φ} ? ]
35.3.13 Remark: The vector u in Definition 35.3.12 (iii) is (probably) not defined in case G is the group
of all diffeomorphisms. But γ′(0) is defined for such a group.
35.3.14 Definition: A connection θ̄ on a C^{k+1} (G, F) fibre bundle (E, πE, M) for k ∈ ℤ̄^+_0 is said to be
of class C^k if lift_θ̄(X) ∈ X^k(E) for all X ∈ X^k(M).
[ Must define C k regularity of a connection θ̄ first in terms of a chart, and then give a theorem relating this
to the effect on vector fields. ]
[ Show that (1) the integral of the connection which is the differential of a parallelism equals the original
parallelism; (2) the differential of the integral of a connection equals the original connection. Then: (3) give
necessary and sufficient conditions for a parallelism to be generated by an affine connection. ]
35.4. Curvature of connections on ordinary fibre bundles

35.4.1 Remark: Curvature is, broadly speaking, the deviation of parallelism from flatness. In the case
of affine connections and tangent bundles of manifolds, the curvature may be measured with the Riemann
curvature tensor. In the case of connections on general fibre bundles, a more general measure of curvature
is required. If a connection is defined on a general differentiable fibration whose structure group is not a
finite-dimensional Lie group, then curvature must be defined in terms of vector fields on the fibre space
because then there is no finite-dimensional Lie algebra which can be used for defining connection forms and
curvature forms.
35.4.2 Remark: Suppose γ : ℝ^2 → M is a 2-parameter C^2 family of curves in the base space M of a C^2
differentiable (G, F) fibre bundle ξ = (E, π, M, A^F_E). Suppose that θ is a C^1 horizontal lift function on ξ.
Then one may define parallel transport on ξ along two different paths starting at γ(0, 0) and ending at γ(s, t),
the first path γ1 initially following the s axis, the other path γ2 initially following the t axis. Let γ̃_k^z : ℝ^2 → E
denote the lift function with value γ̃_k^z(0, 0) = z ∈ E_{γ(0,0)} along path γk for k = 1, 2. Then (probably)
$$ \tilde\gamma_1^z(s,t) = z + \int_0^s \theta_{V(u,0)}\bigl(\tilde\gamma_1^z(u,0)\bigr)\,du + \int_0^t \theta_{W(s,u)}\bigl(\tilde\gamma_1^z(s,u)\bigr)\,du $$
and
$$ \tilde\gamma_2^z(s,t) = z + \int_0^t \theta_{W(0,u)}\bigl(\tilde\gamma_2^z(0,u)\bigr)\,du + \int_0^s \theta_{V(u,t)}\bigl(\tilde\gamma_2^z(u,t)\bigr)\,du, $$
where V(s, t) = ∂s γ(s, t) and W(s, t) = ∂t γ(s, t) for (s, t) ∈ ℝ^2. The difference between γ̃_1^z and γ̃_2^z is a
measure of curvature. (This is roughly illustrated in Figure 35.4.1.) Using the Stokes Theorem, it is possible
to express this difference in terms of a suitable differential of the horizontal lift function.
Figure 35.4.1 Curvature concept for connection on ordinary fibre bundle
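The two-path comparison can be tried numerically in the simplest case. The following is a minimal sketch, assuming M = ℝ^2 with γ(s, t) = (s, t), fibre F = ℝ^2, and the abelian structure group SO(2), so that parallel transport along a path is a rotation by the integral of an angular rate. The coefficient functions `a` and `b` are hypothetical choices for illustration; through the trivial fibre chart, the vertical part of θV (z) is taken to be (a V^1 + b V^2) J z with J the generator of so(2).

```python
def a(x, y):
    """Hypothetical angular rate per unit displacement in the s direction."""
    return -y

def b(x, y):
    """Hypothetical angular rate per unit displacement in the t direction."""
    return x

def integrate(f, lo, hi, n=1000):
    """Midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def angle_path1(s, t):
    """Total rotation along the path that follows the s axis first, then moves in t."""
    return integrate(lambda u: a(u, 0.0), 0.0, s) + integrate(lambda u: b(s, u), 0.0, t)

def angle_path2(s, t):
    """Total rotation along the path that follows the t axis first, then moves in s."""
    return integrate(lambda u: b(0.0, u), 0.0, t) + integrate(lambda u: a(u, t), 0.0, s)

s, t = 0.3, 0.5
holonomy = angle_path1(s, t) - angle_path2(s, t)

# For this choice, d(b)/dx - d(a)/dy = 2 everywhere, so the Stokes Theorem
# predicts that the angle difference is 2 * (area of the rectangle).
stokes_prediction = 2.0 * s * t
print(holonomy, stokes_prediction)
```

Because SO(2) is abelian, the difference between the two transports reduces to a single angle; for a non-abelian structure group the comparison would require path-ordered integration, which this sketch does not attempt.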

35.5. Horizontal lift functions for principal fibre bundles

[ Can define curvature by fixing an initial z, then taking the Poisson bracket of vector fields of f in directions
V1 , V2 . ]
[ Must derive all PFB definitions from OFB definitions. This may be done by converting between OFBs and
PFBs using associated fibre bundles, or by defining PFBs as a special case of an OFB, with the additional
right group action on the total space. Should then re-derive the OFB definitions from PFB definitions. It is
likely that the Lie algebra element u for a PFB will be exactly the same as for the OFB. ]
[ Define PFB connections in terms of OFB connections and vice versa via the definition of “parallel fibre
transports”, i.e. parallel cross-sections. ]
35.5.1 Remark: Three alternative definitions of a connection on a principal fibre bundle are given in this
chapter: Definitions 35.5.3, 35.9.2 and 35.9.4. They each have advantages and disadvantages.
35.5.2 Remark: Suppose (P, πP , M ) is a C 1 principal G-bundle. (See Definition 34.2.2 for “differentia-
bility of a fibre bundle”. See Definition 34.4.1 for “principal G-bundle”.) Then the map πP : P → M is a
C 1 map between the C 1 manifolds P and M . Hence the differential (dπP )z : Tz (P ) → TπP (z) (M ) of the
map πP is well-defined for every z ∈ P . Similarly, if Rg denotes the action of a group element g ∈ G on the
manifold P , then Rg is a C 1 map from P to P . Hence the differential (dRg )z : Tz (P ) → Tzg (P ) of the map
Rg is well-defined for every z ∈ P and g ∈ G.
[ Do a transposed version of Definition 35.5.3 with ρ : T (M ) → (P →˚ T (P )). ]
35.5.3 Definition: A (transposed) (horizontal) lift function on a C^1 principal G-bundle (P, πP, M) is a
map ρ̄ : P → ⋃_{z∈P} Lin(T_{πP(z)}(M), T_z(P)) which satisfies
(i) ∀z ∈ P, ρ̄_z ∈ Lin(T_{πP(z)}(M), T_z(P)),
(ii) ∀z ∈ P, (dπP)_z ◦ ρ̄_z = id_{T_{πP(z)}(M)},
(iii) ∀z ∈ P, ∀g ∈ G, ρ̄_{zg} = (dR_g)_z ◦ ρ̄_z.
[ Show how the conditions in Definition 35.5.3 follow from conditions for a parallel transport on (P, πP , M ). ]
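Conditions (ii) and (iii) of Definition 35.5.3 can be tested in coordinates on a trivial principal bundle. The following is a rough sketch under assumptions that are not in the text: P = ℝ^2 × SO(2) with a point written as z = (x, y, phi), the right action R_g : phi ↦ phi + g, and tangent vectors written as coordinate triples, so that (dR_g)_z is the identity matrix in these coordinates. The functions `omega_equivariant` and `omega_broken` are hypothetical vertical components.

```python
import math

def lift(z, v, omega):
    """Transposed lift: base velocity v -> tangent vector at z = (x, y, phi)."""
    x, y, phi = z
    return (v[0], v[1], omega(x, y, phi, v))

def d_projection(w):
    """(d pi_P)_z in coordinates: drop the vertical (phi) component."""
    return (w[0], w[1])

def right_action(z, g):
    """R_g for the angle coordinate of SO(2): (x, y, phi) -> (x, y, phi + g)."""
    x, y, phi = z
    return (x, y, phi + g)

def omega_equivariant(x, y, phi, v):
    """Vertical component independent of phi, hence compatible with (iii)."""
    return -y * v[0] + x * v[1]

def omega_broken(x, y, phi, v):
    """Vertical component depending on phi, hence violating (iii)."""
    return math.cos(phi) * v[0]

def satisfies_iii(omega, z, g, v, tol=1e-12):
    """Compare the lift at zg with (dR_g)_z applied to the lift at z."""
    lhs = lift(right_action(z, g), v, omega)
    rhs = lift(z, v, omega)  # (dR_g)_z is the identity in (x, y, phi) coordinates
    return all(abs(p - q) <= tol for p, q in zip(lhs, rhs))

z, g, v = (0.3, -1.2, 0.4), 0.7, (1.0, 2.0)
projected = d_projection(lift(z, v, omega_equivariant))
ok_equivariant = satisfies_iii(omega_equivariant, z, g, v)
ok_broken = satisfies_iii(omega_broken, z, g, v)
print(projected, ok_equivariant, ok_broken)
```

Both lifts are linear in v and project back to v, so they satisfy (i) and (ii); only the first also satisfies (iii), because its vertical component does not change along the fibre.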
35.5.4 Remark: The maps in Definition 35.5.3 are illustrated in Figures 35.5.1 and 35.5.2. The map
Rg : P → P is defined for g ∈ G by Rg : z ↦ z.g = µ^P_G(z, g).
Figure 35.5.1 Maps for transposed horizontal lift function on a PFB
[ In Figure 35.5.1, X is no longer so relevant? ]

[ In Definition 35.5.3 (iii), try to get also ρ̄gz = (dLg )ρ̄z ◦ ρ̄z or something similar. ]
[ Show that Definition 35.5.3 (iii) implies that ρ· (v) is an infinitesimal left translation. ]
[ The lift function in Definition 35.5.3 is equivalent to a right invariant vector field through the charts. There-
fore equivalent to an infinitesimal left action. ]

Figure 35.5.2 Right actions and the lift function of a connection
35.5.5 Definition: A connection ρ̄ on a C^{k+1} fibre bundle (P, πP, M) is said to be of class C^k for k ∈ ℤ̄^+_0
if lift_ρ̄(X) ∈ X^k(P) for all X ∈ X^k(M).
35.5.6 Remark: The conditions of Definition 35.5.3 may be interpreted as follows: (i) the velocity of
change of n-frame is a linear function of the velocity of the base point; (ii) the “horizontal component” of
the connection is the identity; (iii) the velocity of transformation of the tangent space is the differential of a
group action on the fibre bundle.
35.5.7 Remark: If Definition 35.5.3 is studied line by line, it is less complex than it seems at first. In
part (i), ρ̄z (v) means the direction in which to move z relative to the “coordinates” when moving the point
p = πP (z) in the direction v ∈ TπP (z) (M ). In other words, ρ̄z (v) means the rate of change of z required in
the direction v in order to keep z parallel to the starting value. A practical example of this would be an
airplane flying along a geodesic from New York to Paris. To keep the airplane in an orientation parallel to
the initial orientation, it is necessary to adjust the bearing of the airplane relative to the longitude/latitude
coordinate system. In this case, z would be the airplane’s orientation, v is the direction of travel of the
airplane, and ρ̄z (v) is the rate at which the orientation must be changed to keep the airplane moving in a
parallel fashion.
Figure 35.5.3 illustrates roughly how the connection ρ̄ determines parallelism at a short distance from a
point z ∈ P .
Figure 35.5.3 Local parallelism determined by a connection ρ̄
The projection of z onto M is πP (z) ∈ M . The vector v ∈ TπP (z) (M ) is a tangent vector at πP (z) which
indicates a direction of movement for the point p = πP (z). If this point p is moved by the amount v to p + v,
then the vector in πP−1 ({p + v}) which is parallel to z will look like z + ρ̄z (v) + o(v). In other words, the
vector z ∈ P is moved by the small amount ρ̄z (v) + o(v) ∈ Tz (P ). (This interpretation is not very rigorous.
It is only intended to give an intuitive interpretation of the connection.) The term o(v) is of order smaller than
v as v → 0.
Note that ρ̄z (v) has horizontal and vertical components. The horizontal component is the displacement
horizontally by the vector v, which implies that it contains no real information. The vertical component is
the deviation of z away from the point it would have been at p + v if the vertical component of z had been
left unchanged relative to the coordinate system. Similarly, when the point p is moved in the direction −v
to p − v, the vertical component of z is translated by φz (−v) + o(−v) to z + φz (−v) + o(−v). The dashed
vertical lines at p ± v represent the parallel transport of z in the case that ρ̄ has no vertical component. The
horizontal dashed lines represent the parallel transportation of the vector z under translations ±v of p, which
consist of a horizontal component (the straight portion) and a vertical component (the curved portion). This
explains part (i) of Definition 35.5.3.

The fact that condition (i) requires linearity of the connection with respect to tangent vectors in TπP (z) (M )
implies a reduced amount of information in the connection. The whole map ρ̄z is fully specified on any
TπP (z) (M ) if it is known for any n linearly independent vectors in that space.
Part (ii) of Definition 35.5.3 says that (dπP )z (ρ̄z (v)) = v for all v ∈ TπP (z) (M ). This just means that if p is
translated to p + v, then the value z + ρ̄z (v) + o(v) has a horizontal component equal to v. In other words,
ρ̄z always translates z to a point in P which has the same base point as the point p + v. (Once again, this
is a very rough first-order description. This description can be made precise, but it is more useful to think
here in the language of small displacements which is usual in physics texts.) This condition implies that the
horizontal component of the connection actually contains no information.
Part (iii) of Definition 35.5.3 states that the parallel transport vector ρ̄zg can be obtained from ρ̄z by applying
the linear transformation (dRg )z , which is (very roughly speaking) a vertical linear transformation factor
of g. In other words, all vectors zg + ρ̄zg (v) + o(v) can be obtained from z + ρ̄z (v) + o(v) by applying the
transformation g. The situation when a group element g is applied is illustrated in Figure 35.5.4.
Figure 35.5.4 Local parallelism under group action (figure not reproduced; labels: z, zg, p = π(z), v, −v, ρ̄z(±v) + o(±v), ρ̄zg(±v) + o(±v))
Elements g of the group G act on the set P . The vector z.g is therefore an element of P such that
πP (z.g) = πP (z) = p, as shown. The figure shows how the connection ρ̄ transports a vector zg in a parallel
fashion for small displacements ±v from the base point p. For instance, zg is transported to zg + ρ̄zg (v) + o(v)
for a base point translation v. Just as for parallel transport of vector z, the parallel transport of zg has a
horizontal and vertical component. The vertical component of the connection is invariant under the group G.
This implies a high level of redundancy of the information in the connection.
Condition (iii) specifies not so much invariance as conservation or preservation. It specifies that the structure
of the fibre space must be preserved under parallel translations. The connection ρ̄ is not really invariant in
any sense, but it does guarantee that the invariants of the structure group are preserved. This is analogous to
the fact that the Christoffel symbols for a connection are not themselves tensors, but their use does preserve
the tensor property of tensors which are differentiated with them.
[ To define differentiability of the connections ρ̄, h and Q in Definition 35.5.5, could say that ρ̄ is C^k when
∀X ∈ X^k(M), liftρ̄(X) ∈ X^k(P).
Or could use the condition
∀X ∈ X^k(P), hz ◦ X ∈ X^k(P).
The latter form looks the most natural. ]
35.5.8 Definition: The lift of a vector field X ∈ X(M ) by a connection ρ̄ on a principal G-bundle
(P, πP , M ) is the vector field X ∗ = liftρ̄ (X) ∈ X(P ) defined by
∀z ∈ P, X ∗ (z) = liftρ̄ (X)(z) = ρ̄z (X(πP (z))).

[ The lift is defined after it is used in Definition 35.5.3. Should fix this. ]
35.5.10 Definition: A vertical vector at z ∈ P, in a C^1 principal G-bundle P −< (P, πP, M), is a vector
v ∈ Tz(P) which satisfies (dπP)z(v) = 0.

Figure 35.5.5 Definition 35.5.8 of lift of a vector field (figure not reproduced; labels: liftρ̄(X), Tz(P), z, P, ρ̄z, (dπP)z, πP, TπP(z)(M), πP(z), M, X)
35.5.11 Notation: Vz(P) denotes the set of vertical vectors at z, for a C^1 principal G-bundle
P −< (P, πP, M) and z ∈ P.
35.5.12 Remark: Vz(P) = ker((dπP)z) ⊆ Tz(P) for any z ∈ P, for any C^1 principal G-bundle
P −< (P, πP, M).
[ Possibly insert a new section on extension of connections to associated fibre bundles near here. ]
[ Insert new section here on connection forms for differentiable fibrations and ordinary fibre bundles? ]
35.6. Connection forms for PFB connections

[ Should do absolutely everything for OFB connections first, including this section. Then do everything for
PFB connections.
Theorem 35.6.5 should generalize to OFBs. The OFB connection form should be of the
form ω : E → ⋃_{z∈E} Lin(Tz(E), Te(G)), or something like that. ]
35.6.1 Remark: A connection form tells you how much a given motion deviates from parallel. In other
words, the connection form is the difference between the actual velocity of a curve in the total space and
the velocity that the curve should have for parallel motion. Therefore the connection form is zero for
parallel motion. Connection forms may be calculated from horizontal lift functions by simply subtracting
such functions from an identity function. (Specifically, the vector field dRg in Definition 35.5.3 should be
subtracted from the actual fibre motion.)
35.6.2 Remark: A connection form may be interpreted as a measure of angular velocity because it measures
the deviation of the rate of change of orientation of fibres relative to parallel motion. Therefore it is
not at all surprising that the symbol ω is commonly used both for connection forms and for angular velocity
in mechanics. The analogy is accurate when the structure group is an orthogonal group. For other kinds of
groups, the analogy is still useful.
35.6.3 Definition: The connection form on a principal G-bundle (P, πP, M) corresponding to a connection
ρ̄ on (P, πP, M) is the map ω : P → ⋃_{z∈P} Lin(Tz(P), Te(G)) such that for all z ∈ P,
(i) ωz ∈ Lin(Tz(P), Te(G)),
(ii) ker ωz = ρ̄z(TπP(z)(M)), and
(iii) ωz ◦ (dLz)e = idTe(G),
where (dLz)e ∈ Lin(Te(G), Tz(P)) is the differential at e ∈ G of the map Lz : g ↦ z.g from G to P.
[ Show the precise relation between the connection form and horizontal lift functions. Probably get ω =
id −ρ̄ ◦ (dπP ) or something. This should be the definition of ω, and most of Definition 35.6.3 should then be
a theorem. ]
[ Can a curvature form be defined for a general connection form? ]
[ The conditions in Definition 35.6.3 should be expressed in plain English also. ]
35.6.4 Remark: In Definition 35.6.3, condition (iii) means that the vertical component of the connection
form is the identity.

[ Note that ρ̄z is not a differential form on M or P . ]

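35.6.5 Theorem: Let µz = (dLz)e. Then the diagram
$$0 \rightleftarrows T_{\pi_P(z)}(M) \underset{(d\pi_P)_z}{\overset{\bar\rho_z}{\rightleftarrows}} T_z(P) \underset{\mu_z}{\overset{\omega_z}{\rightleftarrows}} T_e(G) \rightleftarrows 0,$$
or equivalently
$$0 \rightleftarrows T_e(G) \underset{\omega_z}{\overset{\mu_z}{\rightleftarrows}} T_z(P) \underset{\bar\rho_z}{\overset{(d\pi_P)_z}{\rightleftarrows}} T_{\pi_P(z)}(M) \rightleftarrows 0,$$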
is exact in both directions. (The arrows to and from 0 are zero maps.) That is,
ρ̄z is injective
ρ̄z (TπP (z) (M )) = ker ωz
ωz is surjective
µz is injective
µz (Te (G)) = ker ((dπP )z )
and (dπP )z is surjective.
In addition,
(dπP )z ◦ ρ̄z = idTπP (z) (M )
and ωz ◦ µz = idTe (G) .
hz = ρ̄z ◦ (dπP )z is the horizontal component operator on Tz (P ), and µz ◦ ωz is the vertical component
operator. Im(ρ̄z ) = ker(ωz ) is the space of horizontal vectors, and Im(µz ) = ker ((dπP )z ) is the space of
vertical vectors at z.
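The identities of Theorem 35.6.5 can be checked in a toy linear model. The following is only a minimal sketch, assuming coordinates at a single point z in which Tz(P) is identified with ℝ^(n+m), TπP(z)(M) with ℝ^n and Te(G) with ℝ^m, so that (dπP)z = [I 0] and µz = [0; I]. The matrix name `gamma` and all helper names are hypothetical and do not come from the text; `gamma` stands in for arbitrary connection coefficients at z.

```python
# Toy model of the split exact sequences at one point z (assumed adapted coordinates).

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]

def hstack(a, b):
    return [ra + rb for ra, rb in zip(a, b)]

def vstack(a, b):
    return a + b

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def negate(a):
    return [[-x for x in row] for row in a]

n, m = 2, 2
gamma = [[3, -1], [2, 5]]  # hypothetical m x n connection coefficients

d_pi = hstack(identity(n), zeros(n, m))   # (d pi_P)_z : R^(n+m) -> R^n
mu = vstack(zeros(n, m), identity(m))     # mu_z       : R^m -> R^(n+m)
rho = vstack(identity(n), negate(gamma))  # rho-bar_z  : R^n -> R^(n+m)
omega = hstack(gamma, identity(m))        # omega_z    : R^(n+m) -> R^m

h = matmul(rho, d_pi)      # horizontal component operator h_z
vert = matmul(mu, omega)   # vertical component operator mu_z o omega_z

print(matmul(d_pi, rho) == identity(n))   # (d pi_P)_z o rho-bar_z = id
print(matmul(omega, mu) == identity(m))   # omega_z o mu_z = id
print(matmul(omega, rho) == zeros(m, n))  # Im(rho-bar_z) lies in ker(omega_z)
print(matmul(d_pi, mu) == zeros(n, m))    # Im(mu_z) lies in ker((d pi_P)_z)
print(add(h, vert) == identity(n + m))    # horizontal + vertical = identity
```

In this model the horizontal and vertical component operators sum to the identity on Tz(P), which is the splitting Tz(P) = Im(ρ̄z) ⊕ Im(µz) expressed by the theorem.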
[ ωz has been made independent of the choice of fibre chart by shifting it to e ∈ G. The action of G on P
gives a fibre-chart-free association between Te (G) and Tz (P ) for any z ∈ P . ]
[ In the following comment, could write ρ̄z (TπP (z) (M )) = Qz = ker(ωz ). See Spivak [43], book II, page 336.
Qz is the horizontal space at z. Also give a formula for ρ in terms of ω? ]
[ Should try to show that ωz is uniquely determined by
ωz ◦ µz = idTe (G)
and ρ̄z (TπP (z) (M )) ⊆ ker ωz .
This should follow from Section 10.11 on exact sequences of linear maps. It should therefore be possible to
write ωz in terms of µz and ρ̄z , like for instance
ωz(y) = µz⁻¹(y − hz(y))
= µz⁻¹(y − ρ̄z ◦ (dπP)z(y))
for y ∈ Tz(P). (This looks very much like the definition of a covariant derivative!) Something similar should
be possible in the reverse direction, like hz(y) = y − µz ◦ ωz(y), and some expression for ρ̄z. These expressions
could be added to those in Remark 35.9.6.
This existence and uniqueness result is necessary for showing that the connection form contains the same
information as a connection. ]
[ Mention somewhere the connection of exact sequences with algebraic topology. See for instance Greenberg/
Harper [114], page 54, and EDM2 [35] 277.E. ]
[ Mention the relation between the connection form and the exponential map of elements of Te (G) acting
on P . Define this action to be Rexp tA : P → P . This is a 1-parameter group of transformations on P ,
which therefore induces a vector field A∗ on P . Then can replace condition (iii) of Definition 35.6.3 by
∀A ∈ Te (G), ωz (A∗ (z)) = A. See EDM2 [35] 105.N. ]
[ Interpret the connection form as the rate of change of (position + coordinates) in a manifold to a rate of
change of coordinates. The “output” rate of change of coordinates is in fact a correction term which equals
zero if the “input” rate of change of coordinates corresponds to parallel (motion + rotation). ]
[ Here define the relation of Christoffel symbols to the connection form. See EDM2 [35], page 1573, 417.B.
Probably ω^i_j = Γ^i_{kj} dx^k. ]

35.7. Covariant derivatives for general connections

[ See manuscript notes. Get γ′′(t) − θγ′(t)(γ′(t)) etc. ]
35.7.1 Remark: It is possible to define a covariant derivative for general connections which is completely
analogous to the standard covariant derivative for tangent bundles. The difference is that instead of the
covariant derivative being valued in the tangent space of the base manifold, it is valued in the tangent
bundle of the fibre space. It just happens in the case of tangent bundles that this fibre space is ℝ^n, which
makes it easy to identify the covariant derivative with the tangent space of the base manifold. In the case
of a principal fibre bundle, the fibre is the structure group, which happens to be a Lie group. Therefore
the covariant derivative for a principal fibre bundle is valued in the tangent space of the Lie group, and this
tangent space is identified with the Lie algebra of the group.
[ Can some sort of Riemann curvature tensor be defined for general connections? ]
35.8. Parallel displacement for PFB connections

[ In this section cover parallel displacement as in EDM2 [35] 82.C. This is meaningful for general connections. ]
[ Fibre-to-fibre homeomorphisms should be in the topological fibre bundle chapter maybe. Then cover only
the case of connections on differentiable fibre bundles in this section. ]
35.8.1 Remark: The set {f : πE⁻¹({b1}) ≈ πE⁻¹({b2}); b1, b2 ∈ B} is useful for the discussion of parallel
displacement, where (E, πE, B) is a topological fibre bundle of some sort. Since πE⁻¹({b1}) ≈ πE⁻¹({b2}) holds
for all b1, b2 ∈ B, the set is clearly nonempty for all pairs (b1, b2).
[ The set H(E, πE, B) = ⋃_{b1,b2∈B} Hb1,b2(E, πE, B) could be some sort of “double fibre bundle” analogous to
double tangent bundles T(M1, M2). ]
Let Hb1,b2(E, πE, B) denote the set {f : πE⁻¹({b1}) ≈ πE⁻¹({b2})} of homeomorphisms between the fibres at
b1 and b2, and define H(E, πE, B) = ⋃_{b1,b2∈B} Hb1,b2(E, πE, B) = {f : πE⁻¹({b1}) ≈ πE⁻¹({b2}); b1, b2 ∈ B}.
Define π̄E : H(E, πE, B) → B1 × B2 by π̄E : f ↦ (b1, b2) if f : πE⁻¹({b1}) ≈ πE⁻¹({b2}). Then the triple
η = (H(E, πE, B), π̄E, B1 × B2) looks a little like a fibre bundle. In fact, π̄E⁻¹({(b1, b2)}) = Hb1,b2(E, πE, B)
for all b1, b2 ∈ B. It would be straightforward to construct a natural topology for H(E, πE, B) so that η is
a topological fibre bundle. Definition 35.8.3 is an experimental definition of this sort of thing.
[ It’s pretty clear that fully general continuous connections are not possible. Must have rectifiable curves. ]
35.8.2 Remark: A definition of parallel translation should associate an element of the homeomorphism
space Hb1,b2(E, πE, B) with every continuous curve from b1 to b2 for b1, b2 ∈ B. The set of all curves in B may
be represented as the set C^0([0, 1], B) of continuous maps from [0, 1] ⊆ ℝ to B. Then a definition of parallel
transport could be represented as a map α : C^0([0, 1], B) → H(E, πE, B) such that α(γ) ∈ Hb1,b2(E, πE, B)
whenever γ ∈ C^0([0, 1], B) satisfies γ(0) = b1 and γ(1) = b2.
For this definition to be satisfactory, it should satisfy a transitivity rule, namely that if γ is the concatenation
of two curves γ1 and γ2 such that γ1(0) = b1, γ1(1) = γ2(0) and γ2(1) = b2, then α(γ) = α(γ2) ◦ α(γ1).
It would be interesting to know whether mere differentiability and transitivity for a pathwise parallelism as
above would suffice to give the complete standard definition for a connection on a differentiable principal
fibre bundle. Perhaps the group invariance properties would force the parallelism into the right form.
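The transitivity rule α(γ) = α(γ2) ◦ α(γ1) can be illustrated in a familiar special case. The sketch below assumes the unit 2-sphere in ℝ^3 with its usual (Levi-Civita) parallel transport, restricted to paths made of great-circle arcs, where transport along one arc from a to b is the rotation about the axis a × b which carries a to b. The function names `leg_transport` and `alpha` are hypothetical, and a path is represented simply by its list of vertices.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leg_transport(a, b):
    """Rotation matrix about a x b carrying unit vector a to unit vector b.

    On tangent vectors at a this is parallel transport along the shorter
    great-circle arc from a to b (a and b must not be parallel).
    """
    n = cross(a, b)
    s = math.sqrt(dot(n, n))   # sine of the arc angle, since |a| = |b| = 1
    c = dot(a, b)              # cosine of the arc angle
    k = tuple(x / s for x in n)
    K = [[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]]
    # Rodrigues' formula: R = c*I + s*K + (1 - c) * k k^T
    return [[c * (i == j) + s * K[i][j] + (1 - c) * k[i] * k[j]
             for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(r, v):
    return tuple(sum(r[i][j] * v[j] for j in range(3)) for i in range(3))

def alpha(path):
    """Transport map along a great-circle polyline given by its vertices."""
    r = [[float(i == j) for j in range(3)] for i in range(3)]
    for a, b in zip(path, path[1:]):
        r = matmul(leg_transport(a, b), r)   # later legs act after earlier ones
    return r

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

start, east, pole = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
gamma1, gamma2 = [start, east], [east, pole]
gamma = gamma1 + gamma2[1:]                  # concatenation of gamma1 and gamma2

print(close(alpha(gamma), matmul(alpha(gamma2), alpha(gamma1))))  # transitivity
print(apply(alpha([start, east, pole, start]), (0.0, 0.0, 1.0)))  # closed loop
```

For the closed loop around one octant of the sphere, the transported vector does not return to its initial value. This is the path-dependence which a connection is meant to capture.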
[ In the case of a vector bundle, Definition 35.8.3 should be replaced with ⋃_{b1,b2} Hom(Eb1, Eb2). ]
35.8.3 Definition: The fibre-to-fibre homeomorphism space of a topological fibre bundle (E, πE, B) is the
set H(E, πE, B) defined by
H(E, πE, B) = {f : πE⁻¹({b1}) ≈ πE⁻¹({b2}); b1, b2 ∈ B}.
The fibre-to-fibre homeomorphism bundle of the topological fibre bundle (E, πE, B) −< (E, TE, πE, B, TB) is
the tuple (H(E, πE, B), π̄E, B1 × B2) −< (H(E, πE, B), TH, π̄E, B, TB), where π̄E : H(E, πE, B) → B1 × B2
is defined by π̄E : f ↦ (b1, b2) if f : πE⁻¹({b1}) ≈ πE⁻¹({b2}), and the topology TH is defined by . . .

35.9. Alternative definitions for general connections

35.9.1 Remark: Definitions 35.9.2 and 35.9.4 are non-standard. The author prefers 35.5.3.
35.9.2 Definition (→ 35.5.3): A connection on a C^∞ principal G-bundle (P, πP, M) is a map h : P →
⋃_{z∈P} Lin(Tz(P), Tz(P)) such that
(i) ∀z ∈ P, hz ∈ Lin(Tz(P), Tz(P)),
(ii) ∀z ∈ P, (dπP)z ◦ hz = (dπP)z,
(iii) ∀z ∈ P, ∀g ∈ G, hzg = (dRg)z ◦ hz ◦ ((dRg)z)⁻¹.
The function h is called the horizontal map function of the connection.
[ The conditions in Definition 35.9.2 should also be expressed in plain English. Define C^k regularity of
horizontal component function. ]
35.9.3 Theorem: Definitions 35.5.3 and 35.9.2 contain equivalent information under the correspondence
hz = ρ̄z ◦ (dπP )z . More precisely. . .
35.9.4 Definition (→ 35.5.3): A connection on a C^∞ principal G-bundle (P, πP, M) is a map Q : P →
ℙ(T(P)) such that
(i) ∀z ∈ P, Qz is a subspace of Tz(P),
(ii) ∀z ∈ P, Tz(P) = Vz(P) ⊕ Qz,
(iii) ∀z ∈ P, ∀g ∈ G, (dRg)z Qz = Qzg,
(iv) the map z ↦ Qz is C^∞ in some sense.
The vectors of Qz are said to be horizontal at z.
[ The conditions in Definition 35.9.4 should be expressed in plain English also. Must define C^k regularity of
the horizontal subspace function. ]
35.9.5 Theorem: Definitions 35.5.3 and 35.9.4 contain equivalent information under the correspondence
Qz = ρ̄z(TπP(z)(M)). More precisely, given a map ρ̄ which satisfies Definition 35.5.3, the function Q : P →
ℙ(T(P)) defined by Qz = ρ̄z(TπP(z)(M)) for all z ∈ P satisfies Definition 35.9.4. Conversely, if the map
Q satisfies Definition 35.9.4, then the map ρ̄ : P → ⋃_{z∈P} Lin(TπP(z)(M), Tz(P)), where for each z ∈ P, ρ̄z
is the unique right inverse of (dπP)z such that ρ̄z(TπP(z)(M)) = Qz (guaranteed by Theorem 10.11.10),
satisfies Definition 35.5.3.
Proof: First assume that ρ̄ : P → ⋃_{z∈P} Lin(TπP(z)(M), Tz(P)) satisfies Definition 35.5.3, and define
Qz for each z ∈ P by Qz = ρ̄z(TπP(z)(M)). Then clearly Qz is a subspace of Tz(P) for all z ∈ P. So
Q : P → ℙ(T(P)), and condition (i) of Definition 35.9.4 is satisfied. From Theorem 10.11.9, it follows that
Tz(P) = Vz(P) ⊕ Qz for all z ∈ P, which verifies condition (ii).
The transformation rule (dRg)z Qz = Qzg of Definition 35.9.4 follows from the corresponding rule ρ̄zg =
(dRg)z ◦ ρ̄z of Definition 35.5.3. Indeed
(dRg)z Qz = (dRg)z(ρ̄z(TπP(z)(M)))
= ((dRg)z ◦ ρ̄z)(TπP(z)(M))
= ρ̄zg(TπP(z)(M))
= Qzg,
where the third equality follows from the fact that ρ̄zg = (dRg)z ◦ ρ̄z, and the last from πP(zg) = πP(z).

To show that Qz is differentiable if ρ̄z is differentiable, it is necessary to define clearly what differentiability
means for each of these.
Now assume that a connection is given according to Definition 35.9.4. For z ∈ P , define ρ̄z : TπP (z) (M ) →
Tz (P ) to be a linear function such that (dπP )z ◦ ρ̄z = idTπP (z) (M ) and ρ̄z (TπP (z) (M )) ⊆ Qz . The existence
and uniqueness of ρ̄z are guaranteed by Theorem 10.11.10. Thus (i) and (ii) of Definition 35.5.3 are satisfied.

To demonstrate condition (iii) of Definition 35.5.3, let z ∈ P , g ∈ G and v ∈ TπP (z) (M ). Then ρ̄zg (v) ∈
Qzg . . .
[ Show that ρ̄zg = (dRg)z ◦ ρ̄z. ]
[ To generalize Definition 35.5.5, need to define the derivatives of Qz . ]
35.9.6 Remark: The relations between the definitions of connection are:
ρ̄z ↔ hz ◦ (dπP)z⁻¹ ↔ ...
ρ̄z ◦ (dπP)z ↔ hz ↔ proj. of Tz(P) onto subspace Qz
ρ̄z(TπP(z)(M)) ↔ hz(Tz(P)) ↔ Qz
Note that Q is more easily derived from ρ̄ or h than vice versa, and between ρ̄ and h, h is more easily derived
from ρ̄. So ρ̄ gives the best definition. But h is in some senses more natural. For instance, differentiability
is probably easier to define in terms of h.
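The harder direction, from Q back to ρ̄ and h, can be sketched in the same kind of toy model. The example below assumes Tz(P) = ℝ^3 with (dπP)z equal to the projection onto the first two coordinates, so that the vertical space is the third axis. The horizontal subspace Qz is given by a 3 × 2 matrix whose columns span it. The helper name `lift_from_subspace` is hypothetical. The formula used, ρ̄z = B ((dπP)z B)⁻¹ for a basis matrix B of Qz, is simply the unique right inverse of (dπP)z with image in Qz, written in these assumed coordinates.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("Q is not complementary to the vertical space")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def lift_from_subspace(basis):
    """Right inverse of d_pi = [I 0] whose image is the column span of basis."""
    top = [row[:] for row in basis[:2]]        # d_pi applied to the basis columns
    return matmul(basis, inverse_2x2(top))

d_pi = [[F(1), F(0), F(0)], [F(0), F(1), F(0)]]

# Two different bases for the same (hypothetical) horizontal subspace Q_z.
basis_a = [[F(1), F(1)], [F(0), F(2)], [F(3), F(-1)]]
basis_b = matmul(basis_a, [[F(2), F(1)], [F(1), F(1)]])

rho = lift_from_subspace(basis_a)   # rho-bar_z recovered from Q_z
h = matmul(rho, d_pi)               # h_z = rho-bar_z o (d pi_P)_z

print(rho == lift_from_subspace(basis_b))   # independent of the basis chosen for Q_z
print(matmul(d_pi, rho))                    # the 2 x 2 identity
print(matmul(h, h) == h)                    # h_z is a projection onto Q_z
```

The exact rational arithmetic is used only to make the equality checks exact. Nothing in the text prescribes this representation.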
[ Should try to include the connection form ω in the above table. ]
35.9.7 Remark: Definition 35.6.3 is equivalent to Definitions 35.5.3, 35.9.2 and 35.9.4 in the sense that
a connection form contains the same information as each of the three definitions of a connection. Thus a
connection form may be regarded as an alternative definition for a connection.
[ Give a fourth or fifth definition of connection: see Gallot/Hulin/Lafontaine [20], Definition 2.49, page 69. ]


Chapter 36
Affine connections and covariant derivatives
36.1 Concepts, history and terminology
36.2 Overview of affine connections
36.3 Motivation for defining connections on manifolds
36.4 Affine connections on tangent bundles
36.5 Covariant derivatives
36.6 Hessian operators
36.7 Elliptic second-order operator fields
36.8 Curvature and torsion
36.9 Affine connections on principal fibre bundles
36.10 Coefficients of affine connections on principal fibre bundles
36.11 Connections for Lagrangian mechanics
36.0.1 Remark: Affine connections are a special case of the general connections in Chapter 35. For affine
connections, the structure group is a general linear group GL(n), and the differentiable fibre bundle is the
tangent bundle of an n-dimensional manifold or some fibre bundle associated with the tangent bundle. So
affine connections are effectively defined on tangent vector bundles of differentiable manifolds.
36.0.2 Remark: If one had to explain differential geometry to a non-mathematician, a simple phrase to
summarize the subject might be: “The geometry of curvature.” One might also say: “The geometry of
curved spaces.” Curvature is really the core concept of differential geometry which distinguishes it from
“normal geometry”. Curvature in turn may be defined as “the deviation of parallel lines from parallel”. In
other words, curvature is the tendency of parallel lines to get closer or further away from each other as they
are extended. So curvature requires parallelism as a prior concept. In the olden days of Euclidean geometry,
parallel lines stayed the same distance apart no matter how far they were extended. (This is related to
Euclid’s fifth postulate. See EDM2 [35], 139.A and 285.A.) In a space of positive curvature, parallel lines
get closer together. In a space of negative curvature, they move apart. Since parallelism clearly cannot be
defined in terms of equidistance in a curved space, alternative definitions are required which do not rely on
the definition of distance.
36.0.3 Remark: In this chapter, there are no definitions for distance on spaces. Parallelism is defined here
in terms of “connections”, which are an abstraction from the familiar concept of parallelism in flat Euclidean
space. It turns out that curvature and parallelism can be very comprehensively defined with no definition
of distance at all. Curvature in the absence of a metric is the subject of Section 36.8. (Some minimalist
definitions of parallelism and curvature are discussed in Chapter 22 for non-topological manifolds, and in
Chapter 23 for topological manifolds.)


36.1. Concepts, history and terminology

36.1.1 Remark: An affine connection defines differential pathwise parallelism for a differentiable manifold.
Whereas parallelism for a flat vector space is a simple equivalence relation between the vectors at all points
in the space, parallelism for curved spaces is path-dependent. Given any two points in a differentiable
manifold, there is generally no absolute relation of parallelism between the tangent vectors at the two points.
An example of this is illustrated in Figure 36.1.1. If a vector is transported in a parallel fashion from the
equator to the north pole of the earth, the vector will not be parallel to the same vector transported along
a different path.
Figure 36.1.1 Path-dependent parallel transport on a 2-sphere (figure not reproduced; labels: Start here, Finish here, München, Hà Nội)
If you stand at longitude 0°, latitude 0°, on the Earth on a surfboard pointed northwards and surf towards
the north pole in a parallel fashion, the surfboard will point towards longitude 180°. But if you move in a
parallel fashion first along the equator to longitude 90° and then to the north pole, the surfboard will be pointing
towards longitude −90° when you arrive at the north pole. (This proves that the Earth is not flat.)
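The surfboard example can be reproduced numerically. The following sketch assumes that the Earth is a unit sphere in ℝ^3 and that each leg of the journey is a great-circle arc, along which parallel transport of a tangent vector is rotation about the axis perpendicular to the plane of the arc. The function names `point`, `transport` and `heading_at_pole` are hypothetical helpers introduced for this illustration.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def point(lon_deg, lat_deg):
    """Unit vector for a given longitude and latitude in degrees."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def transport(v, a, b):
    """Parallel transport of tangent vector v at a to b along a great-circle arc.

    Uses Rodrigues' rotation formula about the axis a x b; a and b are unit
    vectors which are neither equal nor antipodal.
    """
    n = cross(a, b)
    s = math.sqrt(dot(n, n))       # sine of the arc angle
    c = dot(a, b)                  # cosine of the arc angle
    k = tuple(x / s for x in n)
    kxv, kdv = cross(k, v), dot(k, v)
    return tuple(c * v[i] + s * kxv[i] + (1 - c) * kdv * k[i] for i in range(3))

def heading_at_pole(d):
    """Longitude (degrees, in [0, 360)) towards which a vector at the north pole points."""
    return math.degrees(math.atan2(d[1], d[0])) % 360

start, east, pole = point(0, 0), point(90, 0), point(0, 90)
north = (0.0, 0.0, 1.0)            # surfboard direction at the start: due north

direct = transport(north, start, pole)
via_east = transport(transport(north, start, east), east, pole)

print(round(heading_at_pole(direct)) % 360)
print(round(heading_at_pole(via_east)) % 360)
```

With these inputs the two printed headings should be 180 and 270 (that is, −90°), in agreement with the description above.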
36.1.2 Remark: The notion of a connection formalizes the path-dependence of “parallelism at a distance”
in terms of infinitesimal coordinate frame motions for infinitesimal paths. In other words, a connection spec-
ifies parallelism in a differential sense. The notion of a connection was abstracted from Riemannian spaces,
but is meaningful for a larger class of differentiable manifolds. The standard connection on a Riemannian
manifold is called a Levi-Civita connection, which is defined in Section 38.5.
36.1.3 Remark: The name “affine” comes from affine spaces in which parallelism is an invariant of the
affine transformation group. This is a misnomer. Many authors call an affine connection more correctly a
linear connection. Affine spaces are presented in Sections 12.1 and 12.2. The historical origin of the term
“affine” is discussed in Section 45.3.
As mentioned in Remark 15.3.2, the choice of the word “connection” for differential parallelism is also
unfortunate. So the term “affine connection” is doubly unfortunate, and perplexing.
[ It may be a good idea to introduce the terminology “affine (differentiable) manifold” or “differentiable affine
manifold” to mean “differentiable manifold with an affine connection”. If only a tiny majority of people are
using “affine manifold” for torsion-free connections, it might be a good idea to usurp this term for a higher
and more noble purpose! Who cares if there’s a bit of torsion? And besides, who exactly is using “affine
manifold” to imply torsion-free? ]
36.1.4 Remark: Another misfortune of terminology in the topic of affine connections is the lack of a
standard term for “differentiable manifolds with an affine connection”. Terms such as “affine space” and
“affine geometry” refer to flat-space concepts. The term “affine manifold” apparently refers to manifolds
with a torsion-free flat affine connection. Maybe the shortest term which can be used safely is an “affine
connection manifold”. In Misner/Thorne/Wheeler [38], chapter 10, the subject of geometry with affine
connections but no metric is referred to as “affine geometry”.

[ Should have a table showing conversions between definitions of connections, or maybe later in chapter. See
Spivak [43], book II, page 336 diagram. ]
36.1.5 Remark: As mentioned in Remark 19.4.2, tangent vectors may be regarded as invariants of the
“figures” (e.g. curves) with respect to C^1 diffeomorphisms. When an affine connection is defined on a
C^2 manifold, the definition of parallelism is the same for all charts in the manifold’s atlas. An affine
connection is transformed in such a way that parallel transport works the same regardless of the chart
transition diffeomorphisms applied to the manifold. Hence parallelism may be regarded as an invariant of
the pseudogroup of chart transition maps of a C^2 manifold which has a well-defined affine connection. Since
curvature is determined by parallel transport, it follows that curvature is then a chart invariant also.
36.2. Overview of affine connections

36.2.1 Remark: The first obstacle to understanding affine connections is the unfortunate choice of name.
The words “affine” and “connection” are both confusing. A better name would be “differentiable parallelism”
or “parallelism differential”. Parallel transport on a manifold may be defined as an integral of an affine
connection. But that is getting ahead of the story. So let’s start from the beginning. . .
36.2.2 Remark: The set of second derivatives of a real-valued function on a manifold is, unhappily, not
a tensorial object. The tuple of first-order derivatives is tensorial, namely a first-degree tensor. But the
second-order derivatives require a “tensorization” term to counteract the second-order derivatives which
enter into the transformation rules for transition maps between charts. This tensorization requirement leads
directly to the definition of an affine connection. The overwhelming importance of second-order derivatives
in physics makes it essential to be able to define tensorial second derivatives. Derivatives which are corrected
by “tensorization terms” are called “covariant derivatives”.
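The non-tensorial behaviour of second derivatives can be made explicit with the chain rule. The computation below is a standard one, given here only as a sketch. It assumes two overlapping charts with coordinates x^i and \tilde x^a, a C^2 function f, and connection coefficients written \Gamma^k_{ij} with an index convention chosen for this illustration (the text has not yet fixed one).

```latex
% First derivatives transform tensorially:
\frac{\partial f}{\partial \tilde x^a}
  = \frac{\partial x^i}{\partial \tilde x^a}\,\frac{\partial f}{\partial x^i}.

% Second derivatives pick up an extra term involving second derivatives
% of the chart transition map:
\frac{\partial^2 f}{\partial \tilde x^a\,\partial \tilde x^b}
  = \frac{\partial x^i}{\partial \tilde x^a}\,
    \frac{\partial x^j}{\partial \tilde x^b}\,
    \frac{\partial^2 f}{\partial x^i\,\partial x^j}
  + \frac{\partial^2 x^k}{\partial \tilde x^a\,\partial \tilde x^b}\,
    \frac{\partial f}{\partial x^k}.

% If the coefficients \Gamma^k_{ij} transform by the rule
\tilde\Gamma^c_{ab}
  = \frac{\partial \tilde x^c}{\partial x^k}\,
    \frac{\partial x^i}{\partial \tilde x^a}\,
    \frac{\partial x^j}{\partial \tilde x^b}\,\Gamma^k_{ij}
  + \frac{\partial \tilde x^c}{\partial x^k}\,
    \frac{\partial^2 x^k}{\partial \tilde x^a\,\partial \tilde x^b},

% then the extra terms cancel and the corrected second derivative is tensorial:
\frac{\partial^2 f}{\partial \tilde x^a\,\partial \tilde x^b}
  - \tilde\Gamma^c_{ab}\,\frac{\partial f}{\partial \tilde x^c}
  = \frac{\partial x^i}{\partial \tilde x^a}\,
    \frac{\partial x^j}{\partial \tilde x^b}
    \left( \frac{\partial^2 f}{\partial x^i\,\partial x^j}
           - \Gamma^k_{ij}\,\frac{\partial f}{\partial x^k} \right).
```

The term subtracted in the last line is the kind of correction which the remark calls a tensorization term.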
36.2.3 Remark: Parallelism connects up the definitions of direction at different points in a space by
“carrying” vectors from one place to another. Layer 2 does not have any definition of parallel motion. This
is very unfortunate for physics. Momentum, for example, requires parallel motion of objects to be defined.
Newton’s first law says that in the absence of forces, an object’s momentum does not change. It travels in
a straight line at an even speed. But if you apply a diffeomorphism to the space coordinates, a straight line
may become curved and vice versa. If the notion of “straight ahead” is not well defined, an object does not
know which direction to travel in. The games of cricket and billiards are not possible in such a world. An
affine connection must be added to the differentiable structure in order to define parallelism at a distance. It
is sufficient to define parallel translation along curves. This permits objects to determine how curved their
path is.
36.2.4 Remark: The observant reader will notice that the parallel transport of chart axes from point A
to point B in Figure 36.2.1 does not result in agreement at B.
Figure 36.2.1 Layer 3: Path-dependent parallel transport of vectors (figure not reproduced; labels: A, B)
The reader may think that this is most unfortunate and may wish that things were defined differently so
as to exclude this inconvenience. However, the difference in parallel transport between two paths with the

same pair of end-points is called “curvature”. The curvature of a manifold equals zero everywhere if and
only if all paths between each pair of end-points give the same parallel transport. A space where curvature
is everywhere zero is called “flat”. The flat spaces are the ones which Euclid wrote about. There is no
need of differential geometry when curvature is zero. The curvature of space-time in general relativity is
zero only if there is no gravitational field. Flat space-time corresponds to an empty universe. Therefore, far
from being an inconvenient nuisance, path-dependent parallelism is the core idea which lies at the heart of
differential geometry. It is the raison d’être of this book. It is the engine inside the chassis. Curvature is
the DNA which defines the organism. It is the Penelope which brings Odysseus home. The entire subject
of differential geometry exists to make sense of curvature. The reader who does not wish to accept path-
dependent parallelism is studying the wrong subject.
36.2.5 Remark: The most important point about pathwise parallelism in Layer 3 is not whether it shows
zero or non-zero curvature. The important thing is that it must be defined. When the differentiable structure
is already defined in Layer 2, and the parallelism is differentiable with respect to that differentiable structure,
the parallelism is called an “affine connection”. The word “affine” arose from a thinking error by Euler when
he was writing about similarity transformations. It is related to the word “affinity”. The choice of the word
“affine” is unfortunate, but we are stuck with it. A better terminology would have been “linear connection”.
(See Section 45.3 for history of the word “affine”.)
36.2.6 Remark: Affine connections are defined on differentiable manifolds in Chapter 36. More general
connections are defined in Chapter 35. A very general kind of parallelism is defined in Chapter 24, but the
full generality of parallelism is not required for differential geometry.
36.2.7 Remark: The natural, most general kind of mathematical structure on which one can define par-
allelism is the “fibre bundle”. This concept tends to make differential geometry quite confusing because the
concept is very general and abstract. Since much DG literature is written in the language of fibre bundles,
it is necessary to present fibre bundle definitions in this book. Roughly speaking, fibre bundles are just
structures which are attached at points of a manifold. Different textbooks define fibre bundles differently.
Therefore a wide range of definitions is given in this book. The best approach to understanding fibre
bundles is to think always about particular examples. The most basic non-trivial example of a fibre bundle
is the space of all tangent vectors on a manifold. Fibre bundles are defined in Layer 2, but they fulfil their
purpose only when parallelism is defined on them in Layer 3.
36.3. Motivation for defining connections on manifolds

[ This section probably belongs in Chapter 35, but needs to be adapted for general connections. ]
36.3.1 Remark: One way to clearly motivate connections on differentiable manifolds is to try to calculate
the second derivative of a C^2 map γ : ℝ^2 → M for a C^2 manifold M. The first derivative with respect
to the first parameter is easily expressed in terms of tangent vectors as a map γ1 : ℝ^2 → T(M) such that
γ1(s, t) ∈ Tγ(s,t)(M) for all (s, t) ∈ ℝ^2. If this is differentiated with respect to the second parameter, the
result is a map γ12 : ℝ^2 → T(T(M)) such that γ12(s, t) ∈ Tγ1(s,t)(T(M)) for (s, t) ∈ ℝ^2. If the derivatives are
performed in the opposite order, the result is a map γ21 : ℝ^2 → T(T(M)) such that γ21(s, t) ∈ Tγ2(s,t)(T(M))
for (s, t) ∈ ℝ^2. The problem with this is that the derivatives γ12(s, t) and γ21(s, t) are not even in the same
tangent space unless γ1(s, t) = γ2(s, t). Therefore there is no way of comparing these quantities. This is not
a problem of simple non-commutativity. The problem is that the two second derivatives are not comparable
at all.
The core of this problem is the fact that the space T (T (M )) consists merely of formal ‘infinitesimals’ of
motions within the manifold structure of T (M ). A second derivative of the form γ12 says nothing about
the rate of change of γ1 with respect to the second parameter because there is no way of comparing vectors
γ1 (s, t) for different parameter values, in particular for values (s, t) and (s, t + h) for small h ∈ IR. It is
clearly desirable to have γ1 (s, t) and γ1 (s, t + h) in the same vector space so that they can be subtracted to
yield some sort of derivative.
The obvious way to solve this problem is to define a linear isomorphism between Tγ(s,t) (M ) and Tγ(s,t+h) (M )
for all h ∈ IR so that γ1 (s, t) and γ1 (s, t + h) can be subtracted. If this plan is carried through, the result in
the limit as h → 0 is a linear map from Tγ1 (s,t) (T (M )) to Tγ(s,t) (M ). This maps γ12 (s, t) into Tγ(s,t) (M ). If
the derivatives are done in the reverse order, then γ21 (s, t) is also mapped into Tγ(s,t) (M ). In this way, the
two second-order derivatives can be compared.
A further difficulty now arises. There is no obvious way to define the linear maps αγ2 (s,t) : Tγ1 (s,t) (T (M )) →
Tγ(s,t) (M ) and αγ1 (s,t) : Tγ2 (s,t) (T (M )) → Tγ(s,t) (M ). (The subscripts for α are chosen to indicate the
direction of the second derivative.) Therefore the difference between the two second-order derivatives has an
arbitrary value, αγ2 (s,t) (γ12 (s, t)) − αγ1 (s,t) (γ21 (s, t)) ∈ Tγ(s,t) (M ).
Although the differential parallelism along curves thus defined is arbitrary in the mathematical sense, the
situation is not hopeless because in a flat space, an absolute parallelism is defined independent of paths,
whereas in general relativity and mechanics, there are definitions of pathwise parallelism which arise from
the physics being modelled. In mechanics, parallelism within a system state space may arise from a least
action principle, whereas parallelism is derived from the metric structure in general relativity.
From the above discussion, it is clear that connections arise naturally as a way of giving meaning to second
(and higher) order derivatives. A very large number (probably the vast majority) of physical models require
second order derivatives. Such derivatives are meaningless if vectors at different points cannot be compared.
(More precisely, second order derivatives are defined in the absence of a connection, but they lie in the
space T (T (M )), which is useless because this is a space of abstract derivatives, and the derivatives in an
equation would then all be in different, incomparable spaces Tz (T (M )).) So “parallelism at a distance” is
a prerequisite for defining the second-order derivatives which are required by most physical models. When
working in flat space, it is difficult to be aware of the role of parallelism in defining second-order derivatives.
Curved space exposes this role.
The term “curvature” is applied to the extent to which derivatives do not commute. If the curvature is
non-zero, the order of applying derivatives becomes important, which is not so in flat space if the function
is twice continuously differentiable. Figure 36.3.1 illustrates path-dependent parallelism.
Figure 36.3.1 Path-dependent parallel transport of a tangent vector
36.3.2 Remark: As mentioned in Section 38.2, in the connection layer (layer 3), the affine connection is a
differential of parallel transport and the parallel transport is an integral of the affine connection. Similarly,
in the metric layer (layer 4), the Riemannian metric tensor is a differential of the pointwise distance function
and the distance function is an integral of the metric tensor.
However, the pointwise distance function is calculated by extremizing the path integral over all paths between
two points. The parallel transport is calculated by simply integrating the affine connection over a single path.
This raises the question of whether something interesting and useful is obtained by extremizing the path
integral of the affine connection. Certainly specially distinguished paths are available in layer 3, namely the
geodesic paths. In layer 4, the geodesics which follow the Levi-Civita connection are paths which extremize
the point-to-point distance. So is there something special in the parallel transport which is carried by a
geodesic curve? If not, why not?

36.4. Affine connections on tangent bundles

[ Define affine connections on Lie transformation groups and on differentiable fibre bundles? ]
[ Is it true that θv ∈ X 0 (T (π −1 ({p})))? Maybe θv (z) ∈ Tz (E), linear with respect to v. ]
36.4.1 Remark: If the tangent bundle T (M ) is substituted for the total space E in Definition 35.3.12,
the result is Definition 36.4.2. In fact, a vector bundle structure is used here. (See Section 34.8 for vector
bundles.) [ Change Definition 36.4.2 to use vector bundles. ]
[ Also do Definition 36.4.2 first for θ? ]
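36.4.2 Definition: An affine connection on a $C^1$ tangent bundle $(T(M), \pi, M)$ for a $C^2$ manifold $M$ is a
map $\bar\theta : T(M) \to \bigcup_{z \in T(M)} \mathrm{Lin}(T_{\pi(z)}(M), T_z(T(M)))$ such that

(i) $\forall z \in T(M)$, $\bar\theta_z \in \mathrm{Lin}(T_{\pi(z)}(M), T_z(T(M)))$,
(ii) $\forall z \in T(M)$, $(d\pi)_z \circ \bar\theta_z = \mathrm{id}_{T_{\pi(z)}(M)}$,
(iii) $\forall p \in M$, $\forall V \in T_p(M)$, $\forall \phi \in A^{\mathbb{R}^n}_{T(M)}$, $\exists u \in \mathrm{gl}(n)$, $\forall z \in T_p(M)$, $(d\phi)_z(\bar\theta_z(V)) = (dR_{\phi(z)})_e(u)$, where for
all $y \in \mathbb{R}^n$, $R_y : \mathrm{GL}(n) \to \mathbb{R}^n$ is defined by $R_y : g \mapsto gy$.
The function $\bar\theta$ is called the lift function of the connection.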
36.4.3 Remark: The conditions of Definition 36.4.2 may be summarized as (i) linearity with respect to the
translation vector, (ii) equality of the horizontal component of the connection to the translation vector,
(iii) preservation of fibre space structure under the connection.
[ Must define the lift of a vector field by an affine connection. Should find a way to define regularity of affine
connections without using the lift of vector fields. Then have a theorem that a C k connection lifts C k fields
to C k fields. ]
36.4.4 Definition: An affine connection of class $C^k$ on a $C^{k+1}$ manifold $M$ for $k \in \bar{\mathbb{Z}}^+_0$ is an affine
connection $\bar\theta$ on $M$ such that $\mathrm{lift}_{\bar\theta}(X) \in X^k(T(M))$ for all $X \in X^1(M)$.
[ Should regenerate parallel displacement from the OFB connection. ]
36.4.5 Definition: [Definition of parallel displacement of a tangent space along a path.]
[ See EDM2 [35] 80.C and 80.H for parallel displacement. ]
[ Must show the existence and uniqueness of the parallel transport φt along a path map γ in M mapping
φt : Tγ(0) (M ) → Tγ(t) (M ). This leads to a map Φ : Γ(M ) → G, where Γ(M ) is the set of piecewise smooth
paths in M . ]
[ Give formula for connection form in terms of connection lift functions. ]
[ Somewhere define general “associated connections” for associated fibre bundles of the initial OFB or PFB,
using associated parallelism. ]
36.5. Covariant derivatives

36.5.1 Remark: Simple derivatives of vectors with respect to the coordinates of a manifold are not in-
variant under changes of coordinates. Derivatives of vectors cannot be made coordinate-independent in the
same way as derivatives of real-valued functions by applying the coordinate transformation rule in Defi-
nition 26.5.7. Naive derivatives of vectors have coordinates which depend on the second-order derivatives
of chart transition maps. This is the principal motivation for defining a connection. Parallel transport is
defined in order to be able to define covariant derivatives rather than because parallelism is of interest in
itself.
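The chart-dependence described in Remark 36.5.1 can be made explicit by a short calculation. The following is only a sketch. It assumes two overlapping charts with coordinates $x^i$ and $\tilde x^a$ (names chosen here for illustration) and the usual transformation rule $v^j = (\partial x^j/\partial\tilde x^b)\,\tilde v^b$ for the components of a tangent vector field.

```latex
\begin{align*}
\frac{\partial v^j}{\partial x^i}
  &= \frac{\partial\tilde x^a}{\partial x^i}\,
     \frac{\partial}{\partial\tilde x^a}
     \Bigl(\frac{\partial x^j}{\partial\tilde x^b}\,\tilde v^b\Bigr) \\
  &= \frac{\partial\tilde x^a}{\partial x^i}\,
     \frac{\partial x^j}{\partial\tilde x^b}\,
     \frac{\partial\tilde v^b}{\partial\tilde x^a}
   \;+\;
     \frac{\partial\tilde x^a}{\partial x^i}\,
     \frac{\partial^2 x^j}{\partial\tilde x^a\,\partial\tilde x^b}\,
     \tilde v^b .
\end{align*}
```

The first term is the transformation rule which a type (1, 1) tensor would satisfy. The second term contains the second-order derivatives of the chart transition map. It vanishes for all vector fields only if the transition map is affine, and it is this term which the connection coefficients are designed to cancel.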
36.5.2 Remark: Although the covariant derivative is often used as a definition of a connection, it is
applicable only to affine connections. Although a covariant derivative does yield an affine connection on
the tangent bundle, it is in a sense the opposite of a connection because the covariant derivative gives the
deviation of a vector field from parallelism, whereas a connection gives a definition of parallelism itself.
Covariant derivatives are defined by subtracting the connection from naive derivatives of vectors.

36.5.3 Remark: The term “covariant derivative” is yet another unfortunate choice of words in the subject
of differential geometry. On a differentiable manifold without any connection, a covariant vector transforms
as the dual of contravariant vectors, and the contravariant vectors are the ordinary tangent vectors of the
manifold’s tangent bundle. Thus “covariant” and “contravariant” vectors are duals of each other. It happens
that the differential of a real-valued function on a manifold is a covariant vector. But this has nothing at all
to do with affine connections or parallel transport.
The “covariant derivative” of a real-valued function, when an affine connection is defined, is identical to
the differential of the real-valued function in the differential layer 2 with no connection. But the differential
of a vector field (in the absence of a connection) is a mixed covariant/contravariant tensor which is chart-
dependent (unless it is “anchored” to a particular chart as suggested in Remark 29.0.4).
When an affine connection is available, the “covariant derivative” of a contravariant vector field on a manifold
has a value which is a contravariant vector or vector field. Thus the so-called “covariant derivative” yields a
contravariant object, which seems somewhat confusing. However, the value of the covariant derivative of a
contravariant vector field varies as the dual of the vector (or vector field) which is used for the differentiating.
In this sense, the value of the covariant derivative does vary “covariantly”.
Unfortunately there is a further confusion of terminology here because a “covariant vector” varies as the
dual of the “contravariant vectors”, and the contravariant vectors are the ordinary tangent vectors of the
tangent bundle. So “covariant” really means “contravariant” and vice versa. (This source of confusion is
also discussed in Remark 13.7.2.)
In tensor calculus, the term “covariant” is applied to anything which uses a subscript index and the term
“contravariant” corresponds to superscript indices. Since the application of the “covariant derivative” adds
a subscript index to the components of the object being differentiated, the word “covariant” does seem
appropriate. But the important attribute of the “covariant derivative” is that it takes into account the
affine connection. Thus $\partial_i v^j$ is not the covariant derivative of $v$, whereas the components $\partial_i v^j + \Gamma^j_{ik} v^k$ do
correspond to the covariant derivative of a vector field $v$.
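The difference between $\partial_i v^j$ and $\partial_i v^j + \Gamma^j_{ik} v^k$ can be checked numerically in a simple case. The following minimal sketch is not part of the book's framework. It assumes the flat plane in polar coordinates $(r, \theta)$ with the standard Christoffel symbols of the flat connection, and it takes for $v$ the constant Cartesian field $\partial/\partial x$. All function names are hypothetical. The naive derivatives are non-zero, although the field is constant, whereas the corrected components vanish.

```python
import math

# Assumption: flat plane in polar coordinates x = (r, theta), with the usual
# Christoffel symbols of the flat (Levi-Civita) connection.
def christoffel(r, theta):
    """Return gamma with gamma[j][i][k] = Gamma^j_{ik} at (r, theta)."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    gamma[0][1][1] = -r         # Gamma^r_{theta theta}
    gamma[1][0][1] = 1.0 / r    # Gamma^theta_{r theta}
    gamma[1][1][0] = 1.0 / r    # Gamma^theta_{theta r}
    return gamma

def v(r, theta):
    """Polar components (v^r, v^theta) of the constant Cartesian field d/dx."""
    return [math.cos(theta), -math.sin(theta) / r]

def partial(field, x, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative of field at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(field(*xp), field(*xm))]

def naive_derivative(x):
    """result[i][j] approximates d_i v^j (no connection term)."""
    return [partial(v, x, i) for i in range(2)]

def covariant_derivative(x):
    """result[i][j] approximates d_i v^j + Gamma^j_{ik} v^k."""
    g = christoffel(*x)
    vx = v(*x)
    d = naive_derivative(x)
    return [[d[i][j] + sum(g[j][i][k] * vx[k] for k in range(2))
             for j in range(2)] for i in range(2)]

point = [2.0, 0.7]
naive = naive_derivative(point)
cov = covariant_derivative(point)
print("naive    :", naive)
print("covariant:", cov)
```

The index order `gamma[j][i][k]` follows the convention $\Gamma^j_{ik}$ used in the remark above, with $i$ the direction of differentiation.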
36.5.4 Remark: The covariant derivative of a vector field is defined in terms of an affine connection θ on
a C 2 manifold in Definition 36.5.5.
[ Regarding $\bar\theta_{X(\pi(V))}(V)$ in Definition 36.5.5, see Spivak [43], II. page 317 for $h(Y) = Y - \sum_j \omega^j(Y)$ etc.
Instead of $\bar\theta_{X(\pi(V))}(V)$, could write $\bar\theta_V(X(\pi(V)))$ etc.?
The vector field $X \in X^1(M)$ can probably be a general diff. cross-section of a diff. fibre bundle?
Perhaps the drop function $\varpi$ should be a different letter rather than ω?
Must summarize the relations between θ, ρ, ω (connection form), $\Gamma^i_{jk}$ and $D_V X$. ]
36.5.5 Definition: The covariant derivative of a vector field $X \in X^1(M)$ on a $C^2$ manifold $M$ with
respect to a connection θ on $M$ is the map $D : T(M) \times X^1(M) \to T(M)$ defined by
$\forall V \in T(M),\ \forall X \in X^1(M),\quad D_V X = \varpi_{X(\pi(V))}\bigl(d_V X - \bar\theta_{X(\pi(V))}(V)\bigr)$,
where $\varpi$ is the “drop” function for $T(M)$ and π is the projection map for $T(M)$.
[ Must remember to define dV X in Definitions 36.5.5 and 36.5.7. ]
36.5.6 Remark: Definition 36.5.5 is probably clearer in terms of a fixed $p = \pi(V)$.
$\forall V \in T_p(M),\ \forall X \in X^1(M),\quad D_V X = \varpi_{X(p)}\bigl(d_V X - \bar\theta_{X(p)}(V)\bigr)$.
This is illustrated in Figure 36.5.1. It certainly is not necessary that V be a vector field as many texts
require. Definition 36.5.7 gives the case that V is a vector field.
36.5.7 Definition: The covariant derivative of a vector field $X \in X^1(M)$ on a $C^2$ manifold $M$ with
respect to a connection θ on $M$ is the map $D : X^0(M) \times X^1(M) \to X^0(M)$ defined by
$\forall V \in X^0(M),\ \forall X \in X^1(M),\ \forall p \in M,$
$(D_V X)(p) = \varpi_{X(p)}\bigl(d_V X(p) - \bar\theta_{X(p)}(V(p))\bigr)$.
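The lift-and-drop formula for $D_V X$ can be compared with the classical component formula by a chart-level calculation. The following is only a sketch under stated assumptions, not a definition taken from this book. It assumes induced coordinates $(x^i, y^j)$ on $T(M)$ and $(x^i, y^j; \xi^i, \eta^j)$ on $T(T(M))$, and it assumes that the lift function $\bar\theta$ has the component form shown in the first line, with the sign chosen so as to agree with the components $\partial_i v^j + \Gamma^j_{ik} v^k$ in Remark 36.5.3. The symbol $\varpi$ denotes the drop function.

```latex
\begin{align*}
\bar\theta_{X(p)}(V) &= \bigl(x,\ X(x);\ v^i,\ -\Gamma^j_{ik}(x)\,v^i X^k(x)\bigr)
  && \text{(assumed component form of the lift)} \\
d_V X &= \bigl(x,\ X(x);\ v^i,\ v^i\,\partial_i X^j(x)\bigr)
  && \text{(differential of $X$ applied to $V$)} \\
d_V X - \bar\theta_{X(p)}(V)
  &= \bigl(x,\ X(x);\ 0,\ v^i(\partial_i X^j + \Gamma^j_{ik} X^k)(x)\bigr)
  && \text{(vertical, so in $\ker((d\pi)_{X(p)})$)} \\
D_V X = \varpi_{X(p)}\bigl(d_V X - \bar\theta_{X(p)}(V)\bigr)
  &= v^i\,(\partial_i X^j + \Gamma^j_{ik} X^k)(x)\;\partial_j .
\end{align*}
```

Under these assumptions, the horizontal components $v^i$ cancel because both $d_V X$ and $\bar\theta_{X(p)}(V)$ project to $V$ under $d\pi$. The remaining vertical vector is then identified with an element of $T_p(M)$ by the drop function.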
36.5.8 Definition: [Definition of covariant differential of a tensor field.]

Figure 36.5.1 Covariant derivative of a vector field by a fixed vector
36.5.9 Remark: This simple relation between the abstract definition of an affine connection and the
concrete definition of a covariant derivative is not shown in any text consulted by the author. The texts
which do give a relation all explain it in terms of integral curves and parallel transport. (Note also that the
OFB connection is used here, not the PFB connection.)
The negative sign in the above equations for the covariant derivative is due to the fact that covariant
derivatives measure deviation from parallelism whereas the connection defines parallelism itself. Therefore
they must be in an opposite relation to each other.
[ Give the rule for explicitly converting a connection θ into $\Gamma^i_{jk}$ via the covariant derivative $D_V$. In other words,
find the relation between the components of θ and $\Gamma^i_{jk}$ for any given chart. (See EDM2 [35], page 1573,
section 417.B.) The connection form ω is just the identity map minus the connection θ or ρ. The Christoffel
symbol is just the matrix of components of the connection form ω. Should have a table showing how to
convert between all formulations of an affine connection. ]
36.5.10 Remark: The following definition is a little abstract. However, it should be possible to use this
high-level definition to develop an explicit definition of covariant derivatives in terms of the connection ρ̄.
In fact, it should be possible to derive the covariant derivatives of all kinds of tensor fields from this kind
of high-level definition. The problem here is to somehow generate the transport function φt,h out of ρ̄ and
then use φt,h to develop DX Y .
36.5.11 Definition: The covariant derivative of a vector field $Y$ along a curve γ in a $C^\infty$ manifold $M$ is
the map $Y' : \mathrm{Int}(\mathrm{Dom}(\gamma)) \to T(M)$ such that $Y'_t \in T_{\gamma(t)}(M)$ is defined for $t \in \mathrm{Int}(\mathrm{Dom}(\gamma))$ by
$Y'_t = \lim_{h \to 0} h^{-1}\bigl(\phi_{t,h}^{-1}(Y_{t+h}) - Y_t\bigr)$,
where $\phi_{t,h}$ is the “parallel displacement” along γ from $T_{\gamma(t)}(M)$ to $T_{\gamma(t+h)}(M)$. The limit is taken in $T_{\gamma(t)}(M)$.
36.5.12 Definition: The covariant derivative of a vector field $Y$ in the direction of a vector field $X$ is the
vector field $D_X Y$ defined at a point $p \in M$ by
$(D_X Y)_p = \lim_{t \to 0} t^{-1}\bigl(\phi_t^{-1}(Y_{\gamma(t)}) - Y_{\gamma(0)}\bigr)$,
where γ is an integral curve of X such that γ(0) = p, and φ is the function which effects parallel transport
along γ.
[ Definition 36.5.12 is very incomplete. It needs a definition of integral curve, parallel transport, and limits
in Tp (M ). Also required is a proof of independence with respect to the integral curve if it is not unique,
plus a proof of existence of the integral curve etc. Also need proof of existence and differentiability of the
connection. ]
[ The following definition looks suspiciously like the curvature form on a principal bundle. ]
[ Do the covariant derivative of general fibre bundle cross-sections using associated fibre bundles and associated
parallelism and associated connections. ]

36.5.13 Definition: [Define covariant differential of a differential form on a principal G-bundle (P, π, M ).
Get something like
(Dα)(X1 , . . . , Xk+1 ) = (dα)(hX1 , . . . , hXk+1 ),
where h : X ∞ (P ) → X ∞ (P ) is one of the three definitions of a connection.]
[ Try to re-express the covariant derivative in terms of the lift function ρ̄ and in terms of the other equivalent
forms of connection. See EDM2 [35] 80.G.
In the above definition for covariant differential of a differential form on a principal fibre bundle, have: F is a
finite-dimensional vector space, $\alpha \in X^\infty_{0,k}(P, F)$ (that is, α is a C ∞ (alternating) k-form on the C ∞ manifold
P with coefficients in F ), for i = 1, . . . , k + 1, Xi is a vector field on P (that is, Xi ∈ X ∞ (P )), h : X ∞ (P ) →
X ∞ (P ) is the horizontal projection operator, and so h(X) ∈ X ∞ (P ) and h(X)(z) = hz (X(z)) ∈ Tz (P ).
Could use the identity hz = ρ̄z ◦ (dπ)z (or h = ρ̄ ◦ dπ) to simplify the expression. Thus hXi = h(Xi ) =
ρ̄ ◦ (dπ)(Xi ). For z ∈ P , have α ∈ Λk (Tz (P ), F ). ]
[ See notes J for covariant derivatives. Also see EDM2 [35] 80.I and 417.B, Gallot/Hulin/Lafontaine [20] 2.58,
2.60, and 2.68. ]
36.5.14 Remark: [The motivation for the definition of covariant derivative: $(d/dt)(\phi_{t,h}^{-1}(\ldots) - \ldots)$ etc.
Make a comparison with the Lie derivative.]
[ Is it possible to define covariant derivatives on T (T (M ))? ]

[ Define covariant derivatives of general-type tensors. Maybe use notations $D^{r,s}_X$ and $D^{r,s}_V$. ]
36.6. Hessian operators

36.6.1 Remark: As mentioned in Section 29, second-order tangent operators are well-defined in the ab-
sence of a connection, but they have a first-order term which depends on the choice of coordinate chart
because the transition rules for second-order operators have a term depending on the second derivatives of
the chart transition maps. When a connection is defined, however, second-order derivatives may be defined
with reference to local parallel transport so that the chart-dependent first-order component of the operator is
hidden. If a second-order derivative is written in terms of coordinate derivatives instead of covariant deriva-
tives, the first-order term reappears. Thus covariant second-order operators are really special cases of the
differentiable manifold operators in Section 29.1, but they are written with the assistance of the connection
to hide the first-order derivatives.
36.6.2 Remark: As mentioned in Section 31.6, the first-order derivative term in a second-order operator
at a critical point of a real-valued function is zero with respect to one chart of a C 2 manifold if and only if
it is zero with respect to all charts. Therefore the Hessian operator is well-defined at a critical point even
in the absence of a connection, but when the derivative of a real-valued function is non-zero, the Hessian is
chart-dependent unless some arbitrary choice of special coordinates is made. A connection effectively selects
geodesic coordinates at each point as the “right” coordinates. The Hessian is calculated with respect to these
coordinates, and the Hessian is calculated in all other coordinates by including correction terms to make the
calculation agree with geodesic coordinates.
[ The product DV1 DV2 in Definition 36.6.3 doesn’t commute? Probably the product commutes if the connec-
tion is symmetric. ]
36.6.3 Definition: The Hessian operator at a point $p$ in a $C^2$ manifold $M$ with a $C^1$ connection is the
map $H_p : T_p(M) \times T_p(M) \to \mathring{T}^{[2]}_p(M)$ defined by $H_p : (V_1, V_2) \mapsto (f \mapsto D_{V_1} D_{V_2} f)$, for $V_1, V_2 \in T_p(M)$
and $f \in C^2(M)$, where $D$ denotes the covariant derivative on $M$.
36.6.4 Remark: In terms of Christoffel symbols, the Hessian operator looks like $\partial_{ij} - \Gamma^k_{ij}\partial_k$. Therefore
$H_p(V_1, V_2) = v_1^i v_2^j (\partial_{ij} - \Gamma^k_{ij}(\psi)\partial_k)$ for vectors $V_m = t_{p,v_m,\psi} \in T_p(M)$ for $m = 1, 2$.
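The role of the first-order correction term in $\partial_{ij} - \Gamma^k_{ij}\partial_k$ may be seen in a small worked example. This is only an illustrative sketch. It assumes the flat plane in polar coordinates $(r, \theta)$, for which the non-zero Christoffel symbols of the flat connection are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$, and it takes the function $f = r\cos\theta$, which is the Cartesian coordinate $x$ and therefore has zero Hessian in Cartesian coordinates.

```latex
\begin{align*}
(\partial_{rr} - \Gamma^k_{rr}\partial_k) f
  &= 0 - 0 = 0, \\
(\partial_{r\theta} - \Gamma^k_{r\theta}\partial_k) f
  &= -\sin\theta - \tfrac{1}{r}\,(-r\sin\theta) = 0, \\
(\partial_{\theta\theta} - \Gamma^k_{\theta\theta}\partial_k) f
  &= -r\cos\theta - (-r)\cos\theta = 0 .
\end{align*}
```

The naive second derivative $\partial_{\theta\theta} f = -r\cos\theta$ is non-zero, so it is the $\Gamma$ term which brings the polar calculation into agreement with the Cartesian one.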
[ Show how to derive the Hessian at p ∈ M from an arbitrary linear combination of second derivatives of a
function f ∈ C 2 (M ) via C 2 curves γ : IR → M passing through p. Define $D_\gamma f = \partial_s^2 f(\gamma(s))\big|_{s=0}$ for p = γ(0).
The natural bilinear extension of such derivatives is the Hessian. The same approach can be used for maps φ. ]

[ See Greene/Wu [68], page 7 for a vector-field based definition of Hessian. ]

[ Must also define Hessian of maps φ : M → M (φ ∈ C 2 (M, M )) and maps φ ∈ C 2 (M1 , M2 ) for C 2 manifolds
with C 1 affine connections. ]
36.7. Elliptic second-order operator fields

36.7.1 Remark: In flat space, the theory of boundary value problems for elliptic second-order partial
differential equations (as described in Gilbarg/Trudinger [110]) is concerned with the “actors”, namely the
BVP solution u, the coefficients of the PDE a, b, c and right-hand side f and the boundary conditions.
An important actor in this scenario is the domain Ω itself, and its boundary ∂Ω. The flat-space “theatre”
of action is a space IRn , which is always firmly in the background. In differential geometry on the other
hand, the “theatre” is of very great significance. The theatre is then the manifold M together with an
optional affine connection and (even more) optional Riemannian metric. All of the assumptions in Gilbarg/
Trudinger [110] must be re-examined and re-worked to adapt them to a curved theatre of action.
36.7.2 Remark: The Hessian is clearly a bilinear function on the tangent space Tp (M ). This suggests
that it may be contracted with contravariant degree-2 tensors to yield a well-defined scalar object.
Tensorial second-order operators may be defined in $\mathring{T}^{[2]}_p(M)$ by contracting the second-order covariant derivative
with symmetric second-degree contravariant tensors. (The antisymmetric component has no effect. So
it is best to eliminate it from consideration.) For any symmetric tensor $A \in T^{2,0}_p(M)$, with coefficients
$[a^{ij}]_{i,j=1}^n$, the operator $a^{ij}(\partial_{ij} - \Gamma^k_{ij}(\psi)\partial_k)$ gives a chart-independent number when applied to a particular
function $f \in C^2(M)$. This is the contraction of the type (2, 0) tensor $A$ with the type (0, 2) tensor $D^2 f(p)$.
As noted in Remark 29.6.2, the positive definite and semi-definite properties of degree-2 tensors are chart-
independent. Therefore the classification of second-order operators as elliptic, weakly elliptic, hyperbolic
etc. is chart-independent.
36.7.3 Definition: A (weakly) elliptic second-order operator at a point $p \in M$ of a $C^2$ manifold $M$ with
an affine connection is an operator $A D^2 f(p)$ such that the tensor $A \in T^{2,0}_p(M)$ is positive semi-definite.
36.7.4 Remark: In terms of tensor components with respect to a chart $\psi \in \mathrm{atlas}_p(M)$, the tensor
$A D^2 f(p)$ in Definition 36.7.3 may be written as $a^{ij}(p)(\partial_{ij} f(p) - \Gamma^k_{ij}(\psi)\partial_k f(p))$.
36.7.5 Remark: Whereas it was not possible to distinguish between pure second-order operators and
mixed first-and-second-order operators in Definition 29.6.3, when an affine connection is available, pure
second-order operators as in Definition 36.7.3 are well-defined.
36.7.6 Remark: The “lower norm” matrix function λ− : Mn,n (IR) in Theorem 36.7.7 is given by Defini-
tion 11.4.2.
[ 2007-3-21: Theorem 36.7.7 is work in progress. Please ignore it. ]
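36.7.7 Theorem: Let $M$ be an $n$-dimensional $C^2$ manifold with an affine connection. Assume:
(i) $\Omega \subseteq M$ is an open subset of $M$.
(ii) The contravariant tensor field $A \in X(T^{(2,0)}(\Omega))$ satisfies
$\forall \psi \in \mathrm{atlas}(M),\ \exists k_a \in \mathbb{R}^+,\ \forall x \in \psi(\Omega),\quad \lambda^-(a(\psi)(x)) \ge k_a$.
(iii) The contravariant vector field $B \in X(T(\Omega))$ satisfies
$\forall \psi \in \mathrm{atlas}(M),\ \exists k_b \in \mathbb{R}^+,\ \forall x \in \psi(\Omega),\ \forall i \in \mathbb{N}_n,\quad |b^i(\psi)(x)| \le k_b$.
(iv) The Christoffel symbol Γ for the affine connection on $M$ satisfies
$\forall \psi \in \mathrm{atlas}(M),\ \exists k_c \in \mathbb{R}^+,\ \forall x \in \psi(\Omega),\ \forall i, j, k \in \mathbb{N}_n,\quad |\Gamma^k_{ij}(\psi)(x)| \le k_c$.
(v) $L : C^2(\Omega, \mathbb{R}) \to (\Omega \to \mathbb{R})$ is an operator defined by $Lu(p) = A(p) D^2 u(p) + B(p) Du(p)$ for $p \in \Omega$.
(vi) $u \in C^0(\bar\Omega) \cap C^2(\Omega)$ satisfies $Lu(p) \ge 0$ for all $p \in \Omega$.
Then $\sup_\Omega(u) \le \sup_{\partial\Omega}(u)$.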

36.7.8 Remark: Theorem 36.7.7 is an example of how the connection on a differentiable manifold becomes
an actor in PDE problems. The “coordinate-free” expressions for partial differential equations hide the
coordinates and the connection, but when the analysis starts, the hidden terms and factors must be brought
out into the open and pinned against the wall.
36.7.9 Remark: The Christoffel symbol Γ in Theorem 36.7.7 may be replaced by arbitrary tensorization
coefficients as defined in Section 29.2. An affine connection is not necessary to make the theorem valid. The
connection (or tensorization coefficient field) is required only in order to allow the differential operator to be
expressed in terms of a tensorial second-order coefficient field A.
36.7.10 Remark: General tensorization coefficients (and general connection coefficients) are not necessar-
ily symmetric. (In other words, they may not be torsion-free.) In this case, the antisymmetric component of
the tensor A(p) in Theorem 36.7.7 cannot be automatically disregarded. If A(p) is required to be symmetric,
then the anti-symmetric component of Γ may be disregarded. If Γ is required to be symmetric, then the
anti-symmetric component of A(p) may be disregarded.
It is not clear whether the case where both are anti-symmetric would have interesting applications. If a
connection is not torsion-free, it cannot be reconstructed from the set of geodesics using the Schild’s ladder
approach. Second-order equations which depend on the torsion of the underlying space seem unlikely to be
interesting. The Levi-Civita connection is, of course, torsion-free.
36.7.11 Remark: It often happens that the assumptions, the input conditions of a theorem, are difficult
to provide. The output results, the assertions, are only obtained when the input conditions are satisfied.
This raises the question of how one controls the input parameters. Sometimes this control is achieved by
using another theorem B whose outputs give the required inputs for theorem A. In the case of second-order
PDE on differentiable manifolds, one of the required inputs is a bound on the geometry itself.
For Theorem 36.7.7, bounds on the Christoffel symbol Γ are required. Such a bound is not usually available.
Bounds on a geometry may be given as, for example, a constant, positive or negative sectional curvature,
bounds on the Ricci curvature, and so forth. It is then necessary to find relations between the bounds which
are provided and the bounds which are required.
36.8. Curvature and torsion
36.8.1 Remark: The central concepts of differential geometry are manifolds, tangent vectors, pathwise
parallelism, curvature and metric tensors. The core concept is curvature. If the curvature is everywhere
zero in a manifold, then there is little of real geometric interest. A manifold with zero curvature may have
topological and analytical interest, but geometrically speaking, such a manifold is not much different to
Euclidean space with curvilinear coordinates and possibly some topological complexity.
36.8.2 Remark: The curvature of an affine connection is something like the “curl” of the connection. That
is, it measures how much the parallel transport varies around the boundary of an infinitesimal area element
divided by the area of the element. This is what the exterior derivative calculates.
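The description of curvature as the variation of parallel transport around a boundary, divided by the enclosed area, can be tested numerically. The following minimal sketch is an illustration under assumptions which are not made in the text: the unit sphere with colatitude and longitude coordinates $(\theta, \phi)$, its Levi-Civita connection with $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$, and the transport equation $\dot v^j + \Gamma^j_{ik}\dot x^i v^k = 0$. A vector is transported once around a circle of constant colatitude, and the resulting rotation angle is divided by the area of the polar cap which the circle bounds. All function names are hypothetical.

```python
import math

def transport_around_latitude(theta0, steps=2000):
    """Parallel-transport the vector (v^theta, v^phi) = (1, 0) once around the
    circle theta = theta0 on the unit sphere, using classical RK4 steps.
    Assumption: Levi-Civita connection of the round metric."""
    def rhs(v):
        v_theta, v_phi = v
        # dv^theta/dt = -Gamma^theta_{phi phi} v^phi,
        # dv^phi/dt   = -Gamma^phi_{phi theta} v^theta, along phi = t.
        return (math.sin(theta0) * math.cos(theta0) * v_phi,
                -(math.cos(theta0) / math.sin(theta0)) * v_theta)

    v = (1.0, 0.0)
    h = 2.0 * math.pi / steps
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

def holonomy_angle(theta0):
    """Angle between the initial and the transported vector, measured in the
    orthonormal frame (e_theta, e_phi / sin(theta0))."""
    v_theta, v_phi = transport_around_latitude(theta0)
    return math.atan2(math.sin(theta0) * v_phi, v_theta)

def cap_area(theta0):
    """Area of the polar cap bounded by the circle theta = theta0."""
    return 2.0 * math.pi * (1.0 - math.cos(theta0))

for theta0 in (0.1, 0.3, 0.5):
    print(theta0, holonomy_angle(theta0) / cap_area(theta0))
```

Under these assumptions the ratio stays close to 1 for each of the three circles, which is the Gaussian curvature of the unit sphere. The sketch uses a metric connection only for convenience. The remark itself concerns affine connections in general.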
[ For both curvature and torsion, see Crampin/Pirani [12], page 268 and 273. ]
[ Note that geodesics don’t belong in this section because they are defined in Section 37.2. ]
[ It is very likely that curvature and torsion are meaningful for a general connection. If so, then there should
be a separate section on this before affine connections are defined. ]
[ See EDM2 [35] 80.G, 80.J (and 80.E ?) for curvature form, and 364.D for curvature form on Riemannian
manifolds. The curvature form should look something like Ω = Dω, which requires first the general definition
of covariant derivative. ]
[ Should try to define a generalized curvature for the case that the connection is weakly differentiable. ]
36.8.3 Definition: The curvature form on a C 2 manifold with a C 1 connection is . . .
[ Should consider as an example (Dα)(X1 , X2 ) = (dα)(hX1 , hX2 ), where α is a 1-form and hz (X) = (ρ̄z ◦
(dπ)z )(X). Also consider the case of a 0-form α, so that (Dα)(X) = (dα)(hX). See EDM2 [35] 417.B for
clues. ]

36.8.4 Definition: The canonical 1-form is . . .
36.8.5 Definition: The torsion form of a C 2 manifold with a C 1 connection is . . .
36.8.6 Definition: The curvature tensor of a C 2 manifold with a C 1 connection is . . .
36.8.7 Definition: The torsion tensor of a C 2 manifold with a C 1 connection is . . .
[ Give the significance of the torsion tensor in great detail. See in particular Weyl [51], page 113. ]
[ Probably the Schild’s ladder construction deserves its own section. ]
36.8.8 Remark: Note that the torsion of an affine connection has no influence on the geodesics of the
manifold. So if the connection is regarded as a way of generating geodesics, then the torsion component of
the connection is superfluous. Thus the connection can be regenerated from knowledge of just the geodesics,
but not the torsion component. But if the torsion is required to be zero, then the connection is uniquely
determined by the set of geodesics. However, knowledge of the covariant derivative determines the whole
connection, including the torsion component.
If the geodesics (together with affine parameters on all geodesics) are known, then the Schild’s ladder
approach regenerates the definition of parallelism. The minimal unit of such a ladder is a “cross-brace”. A
cross-brace is a pair of geodesics which meet in their mid-points. Then two pairs of opposite ends of the
resulting “X” are joined by geodesics, giving a bow tie shape: ⋈. In the limit of a small cross-brace, the
opposing geodesics are supposed to be parallel. If the torsion is zero, then this is so. But otherwise, the
torsion causes such pairs of geodesics to be non-parallel to first order with respect to the distance between
them. The first-order rate of deviation of the two geodesics from parallel transport is in fact equal to the
torsion in the plane of the cross-brace.
[ Maybe (and hopefully) there is some sort of relation between the cross-brace method of measuring torsion
and the triangle method of measuring the sectional curvature. But the sectional curvature requires a metric,
doesn’t it? ]
If the torsion is non-zero, then a Schild construction will generate a twisting (or expanding etc.) ladder
which may resemble a DNA double helix or may simply expand or compress without rotating. It would be
interesting to know how the behaviour of the double helix changes with respect to orientation. A better way
of thinking of the Schild’s ladder is as an extendable handle ⋈⋈⋈⋈, similar to the type used in old-fashioned
lift doors (as in the Hôtel des Vosges in Strasbourg).
[ Give a detailed account of Schild’s ladder near here. Show how the geodesics with affine parameters can be
used for successive approximation to parallel transport. ]
A useful way to demonstrate torsion is to change coordinates around a point in a 2-manifold so that the
Christoffel symbol is antisymmetric. Then a pair of basis vectors at the origin gets distorted in a clear way
as it is moved around under parallel transport. A bow tie can be formed in geodesic coordinates from the
four points (±1, ±1) or from the four points (1, 0), (1, 1), (−1, 0) and (−1, −1). If the torsion is non-zero,
then a bow tie with parallel ends can be formed by varying the cross-over ratio from 0.5 to something else
which varies linearly with distance apart of the end lines. But in greater than 2 dimensions, this variation of
the cross-over ratio does not suffice. It is useful to demonstrate explicitly how the geodesics are unaffected
by the torsion in the case that all Christoffel symbol entries are zero except for $\Gamma^1_{12} = -\Gamma^1_{21} = \alpha$. Then a
geodesic in the direction (1, 1) is still a straight line in geodesic coordinates. All geodesics through the origin
are still radial.
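The claim that the geodesics are unaffected in this example can be verified directly. The following is a sketch which assumes the usual component forms of the geodesic equation, $\ddot x^k + \Gamma^k_{ij}\dot x^i\dot x^j = 0$, and of the parallel transport equation for a vector $w$ along a curve $x$, $\dot w^j + \Gamma^j_{ik}\dot x^i w^k = 0$, with all entries of Γ zero except $\Gamma^1_{12} = -\Gamma^1_{21} = \alpha$.

```latex
\begin{align*}
\ddot x^1 + \Gamma^1_{12}\,\dot x^1\dot x^2 + \Gamma^1_{21}\,\dot x^2\dot x^1
  &= \ddot x^1 + \alpha\,\dot x^1\dot x^2 - \alpha\,\dot x^2\dot x^1
   = \ddot x^1 = 0, \\
\ddot x^2 &= 0, \\[4pt]
\dot w^1 + \Gamma^1_{12}\,\dot x^1 w^2 + \Gamma^1_{21}\,\dot x^2 w^1
  &= \dot w^1 + \alpha\,(\dot x^1 w^2 - \dot x^2 w^1) = 0, \\
\dot w^2 &= 0 .
\end{align*}
```

The product $\dot x^i\dot x^j$ is symmetric in $i$ and $j$, so only the symmetric part of $\Gamma^k_{ij}$ enters the geodesic equation and the geodesics are straight lines in these coordinates. The transport equation retains the term $\alpha(\dot x^1 w^2 - \dot x^2 w^1)$, which vanishes only when $w$ is proportional to $\dot x$. So the parallel transport of other vectors is altered by the antisymmetric part although the geodesics are not.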
[ Here have a diagram showing how the standard basis at the origin is parallel displaced as the origin is moved
in each of 8 compass directions away from the origin. More than one example should be given, corresponding
to different possibilities for the Christoffel symbol. ]
[ Should show the relation between the torsion tensor and the asymmetry of the Christoffel symbol. The
Christoffel symbol is symmetric if the connection is torsion-free. ]
[ There is still the question of the relation between the parallelogram law in linear spaces and the bow tie
construction. In linear spaces, the inner product can be reconstructed from the norm if the parallelogram
law is assumed. Is this equivalent to saying that a connection can be reconstructed out of the geodesics? If
not, then what is the analogue of the parallelogram law for connections? ]

[ Also the important question must be dealt with of whether or not zero torsion is equivalent to the existence
locally of a vector field such that the Lie derivative with respect to this field is equal to the covariant
derivative. It could be that all Lie derivatives have zero torsion, or else that they all generate a local
connection, with or without zero torsion, or else that there is no particularly neat relation between Lie
derivatives and covariant derivatives. After all, the relation $D_X Y - D_Y X = L_X Y$ suggests that one should
not try to look for such correspondences between covariant and Lie derivatives. ]
[ Should have a section with some theorems giving necessary or sufficient intrinsic conditions on a set of
(parametrized) curves for these curves to be suitable for use with the Schild’s ladder construction to generate
an affine connection. It is clear that the construction will yield a torsion-free affine connection if and only if
the set of geodesics was originally constructed from such a connection, but this condition is not intrinsic to
the set of geodesics. A first step towards this kind of investigation might be to determine intrinsic conditions
on the set of geodesics which make the set self-consistent with respect to a metric. ]
36.8.9 Theorem: [Bianchi identities. Give geometric significance. Misner/Thorne/Wheeler [38], chap-
ter 15, make a big fuss over these. So does everybody else. They must be important!]
[ Give the components of the torsion tensor, curvature tensor and covariant differential. ]
[ This is a very interesting passage from EDM2 [35], section 80.K: “For a Riemannian metric g on M , there
exists a unique affine connection on M such that (i) ∇g = 0, and (ii) the torsion tensor T vanishes. This
connection is called the ‘Riemannian connection’ corresponding to g.” The rest of what they say here is
equally useful. ]
36.9. Affine connections on principal fibre bundles

[ The n-frame bundle is defined in Section 34.9. ]
36.9.1 Definition: Let M be a $C^1$ n-dimensional manifold. Let (P(M), π, M) be the bundle of tangent
n-frames over M with structure group GL(n) and the standard fibre atlas. An affine connection on M is
any connection on the tangent n-frame bundle (P(M), π, M).
36.9.2 Definition: [Definition of affine transformation. See EDM2 [35] 80.J.]
36.9.3 Definition: [Definition of the product of manifolds with affine connections.]
[ See notes J. ]
[ Is it possible to replace GL(n) with O(n) and so forth, and then get a different sort of connection etc.? ]
[ Should regenerate parallel displacement from the PFB connection. ]
[ Maybe have a section on orthogonal and conformal connections here. Maybe projective connections too. ]
36.10. Coefficients of affine connections on principal fibre bundles

36.10.1 Remark: Suppose that M is a C 1 n-dimensional manifold with an affine connection ρ̄. Then for
all p ∈ M and z ∈ π −1 ({p}), ρ̄z : Tp (M ) → Tz (P (M )) is a linear map between tangent spaces of M at p
and P (M ) at z, where P (M ) is the n-frame bundle of M , and π : P (M ) → M is the standard projection
map for P (M ). Let ψ ∈ C̊ 1 (M, IRn ) be a C 1 chart for M , and let ψ̂ ∈ C̊ 1 (P (M ), IRn ) be the corresponding
chart for P (M ). [ Note that it is not necessarily true that ψ ∈ atlas(M ). It is only required that ψ be
C 1 -compatible with the atlas on M . The reason for this weak condition on ψ is not actually very good. It
was introduced here out of a fear that smooth n-frame fields might not exist on chart domains that are too
large. But actually this is not a problem, because the domain of a chart is always C 1 -diffeomorphic to an
open subset of IRn . ] Then ψ̂ : π −1 (Dom(ψ)) → IRn × GL(n).
Let $e : \mathrm{Dom}(\psi) \to P(M)$ be a $C^1$ n-frame field on M. Then for all $p \in \mathrm{Dom}(\psi)$, the map
$(de)_p : T_p(M) \to T_{e(p)}(P(M))$ can be expressed in coordinates as follows. For $v \in T_p(M)$, $v = v^i \partial_i^{p,\psi}$, and. . .
For $p \in M$, the n-frame $e(p) = (e(p)_i)_{i\in n} \in P_p(M)$ may be expressed in terms of the basis $(\partial_i^{p,\psi})_{i\in n}$ by
$e_i(p) = \partial_j^{p,\psi}\, a(p)^j{}_i$ for a unique map $a : \mathrm{Dom}(\psi) \to GL(n)$. Thus $a = \hat\psi \circ e$. The n-frame field e is parallel at
p in the direction $v \in T_p(M)$ if and only if $(de)_p(v) = \bar\rho_{e(p)}(v)$. That is, the lift function $\bar\rho$ specifies the rate
of change of an n-frame in a given direction which keeps that n-frame parallel.

Now for p ∈ Dom(ψ) and v ∈ Tp (M ), (de)p (v) can be expressed in terms of the chart ψ̂ for P (M ) by
(de)p (v) =. . .
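Define the vectors $\partial_i^{z,\hat\psi}$ and $\partial_{j,k}^{z,\hat\psi}$ for $i, j, k \in n$ by
$$\partial_i^{z,\hat\psi}(f) = \frac{\partial}{\partial x^i}\bigl(f \circ \hat\psi^{-1}(x, a)\bigr)\Big|_{(x,a)=\hat\psi(z)}$$
and
$$\partial_{j,k}^{z,\hat\psi}(f) = \frac{\partial}{\partial a^j{}_k}\bigl(f \circ \hat\psi^{-1}(x, a)\bigr)\Big|_{(x,a)=\hat\psi(z)}$$
for $f \in \mathring{C}^1_z(P(M), \mathbb{R})$. This gives a set of $n + n^2$ basis vectors for $T_z(P(M))$.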
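36.10.2 Theorem: Let $M −< (M, A_M)$ be a $C^1$ n-dimensional manifold. Let $p \in M$, $z \in P_p(M)$ and
$w \in T_z(P(M))$. Then for all $\psi \in A_M$, for some $b \in \mathbb{R}^n$ and $a \in GL(n)$,
$$w = b^i \partial_i^{z,\hat\psi} + a^j{}_k\, \partial_{j,k}^{z,\hat\psi}.$$
In particular, for any affine connection $\bar\rho$ on M, there are components $\bar\rho^{\hat\psi}_z(v) \in \mathbb{R}^{n\times n}$ such that
$$\bar\rho_z(v) = v^i \partial_i^{z,\hat\psi} + \bar\rho^{\hat\psi}_z(v)^j{}_k\, \partial_{j,k}^{z,\hat\psi}$$
for all $v \in \mathbb{R}^n$.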
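36.10.3 Theorem: If ρ is an affine connection on M, then for all $p \in M$ and $v \in T_p(M)$,
$$(dq)_z\bigl(b^i \partial_i^{z,\hat\psi} + a^j{}_k\, \partial_{j,k}^{z,\hat\psi}\bigr) = b^i \partial_i^{p,\psi}$$
and
$$\bar\rho_z(v) = v^i \partial_i^{z,\hat\psi} + \bar\rho^{\hat\psi}_z(v)^j{}_k\, \partial_{j,k}^{z,\hat\psi},$$
for some $\bar\rho^{\hat\psi}_z(v) \in \mathbb{R}^{n\times n}$. [ This theorem is possibly meaningless. Check this. ]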
36.10.4 Theorem: Let ρ be an affine connection on the tangent n-frame bundle P(M) of a $C^1$ n-dimensional
manifold M. Then there exists a map $\Gamma : \mathring{C}(\mathbb{R}, M) \times \mathrm{atlas}(M) \to \mathbb{R}^{n\times n\times n}$ such that. . .
[ Probably Definition 36.10.5 of Γ in terms of D will eventually be replaced by a definition of D in terms of Γ,
which will in turn be defined in terms of ρ̄. Alternatively, D could be expressed directly in terms of ρ̄. But
the main task of this section will be to express Γ in terms of ρ̄. ]
[ In Definition 36.10.5, must derive Γ directly from ρ and θ too. ]
36.10.5 Definition: Let ψ be a chart for a $C^\infty$ manifold M with a connection, and let $(\partial_i)_{i=1}^n = (\partial_i^{p,\psi})_{i=1}^n$
denote the coordinate basis vectors at $p \in \mathrm{Dom}(\psi)$. Then the Christoffel symbol for ψ with respect to the
given connection is defined to be the set of components of the tangent vectors $(D_{\partial_j}(\partial_k))_{j,k=1}^n$ in the coordinate
basis:
$$D_{\partial_j}(\partial_k) = \sum_{i=1}^n \Gamma^i_{jk}\,\partial_i. \qquad (36.10.1)$$
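As an illustration of equation (36.10.1), the following sketch computes the components $\Gamma^i_{jk}$ numerically for the polar chart on the punctured plane. It assumes the flat connection of IR2, for which $D_{\partial_j}(\partial_k)$ is the ordinary derivative of the Cartesian components of $\partial_k$ along the j-th coordinate. That choice of connection, and all function names, are assumptions made for this example only.

```python
import math

def chart_inverse(r, phi):
    # Inverse of the polar chart: coordinates (r, phi) -> Cartesian point.
    return (r * math.cos(phi), r * math.sin(phi))

def basis(q, h=1e-5):
    # Coordinate basis vectors (d_0, d_1) at coordinates q, as Cartesian vectors.
    vecs = []
    for i in range(2):
        qp, qm = list(q), list(q)
        qp[i] += h
        qm[i] -= h
        a, b = chart_inverse(*qp), chart_inverse(*qm)
        vecs.append(((a[0] - b[0]) / (2 * h), (a[1] - b[1]) / (2 * h)))
    return vecs

def christoffel(q, h=1e-4):
    # gamma[i][j][k] approximates Gamma^i_{jk}, the i-th component of D_{d_j}(d_k).
    e = basis(q)
    det = e[0][0] * e[1][1] - e[1][0] * e[0][1]
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for j in range(2):
        qp, qm = list(q), list(q)
        qp[j] += h
        qm[j] -= h
        ep, em = basis(qp), basis(qm)
        for k in range(2):
            # Flat connection: differentiate d_k along the j-th coordinate.
            w = ((ep[k][0] - em[k][0]) / (2 * h), (ep[k][1] - em[k][1]) / (2 * h))
            # Expand w = c0 * e[0] + c1 * e[1] by Cramer's rule.
            gamma[0][j][k] = (w[0] * e[1][1] - w[1] * e[1][0]) / det
            gamma[1][j][k] = (e[0][0] * w[1] - e[0][1] * w[0]) / det
    return gamma
```

With index 0 for r and index 1 for φ, the only non-zero entries should be close to $\Gamma^r_{\varphi\varphi} = -r$ and $\Gamma^\varphi_{r\varphi} = \Gamma^\varphi_{\varphi r} = 1/r$.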
[ To give meaning to Definition 36.10.5, must have a definition for DX Y . ]

[ Equation (36.10.1) in Definition 36.10.5 could be $D_{\partial_j(p)}(\partial_k) = \sum_{i=1}^n \Gamma^i_{jk}(p)\,\partial_i(p)$? This is because the
subscript of D would then be a vector and $D_{\partial_j(p)}$ would act on the vector field $\partial_k$. ]
[ Try to express the connection form ω in terms of the Christoffel symbol. This might be possible by firstly
expressing the connection ρ̄ in terms of the Christoffel symbol and then using the relation between ω and ρ̄.
In fact, the whole range of expressions for the connection could be expressed somehow in terms of coordinates,
maybe. ]

36.11. Connections for Lagrangian mechanics

[ See Crampin/Pirani [12], page 345, section 13.8, and EDM2 [35], 271.F. ]
36.11.1 Remark: Least action principles yield shortest paths in phase spaces. These are geodesics from
which a torsion-free connection may be derived.
[ Show how to derive/calculate connections in terms of Lagrangians and/or least action principles. Also
present Hamiltonian mechanics and phase space. Maybe the origin of the term “phase space” is the fact that
the trajectories of an oscillatory system in Lagrangian coordinates sometimes look like circles or ellipses, and
the motion goes through phases like sine/cosine pairs. ]


Chapter 37
Geodesics, convexity and Jacobi fields
37.1 Covariant derivatives of vector fields along curves
37.2 Geodesic curves
37.3 Jacobi fields
37.4 Convex sets
37.5 Convex combinations
37.6 Convex curvilinear interpolation in affine manifolds
37.7 Families of geodesic interpolations
37.8 Exponential maps
37.9 Convex functions
37.0.1 Remark: This chapter defines geodesic curves, Jacobi fields, convex sets and convex functions for
a manifold with an affine connection. The author’s current research interests are related to these topics. The
requirements of this chapter have provided the motivation to write the rest of the book. The calculation of
the parallel transport of second-order differential operators along geodesic curves has required a thorough
analysis of the concepts of tangent spaces, differentials, connections and curvature. It is fortuitous that
the development of definitions related to second-order Jacobi fields requires the prior clarification of such a
wide range of concepts in differential geometry. Almost the whole book has been generated by recursively
developing the definitions required for second-order Jacobi fields. Therefore this is in a sense the core chapter
of the book – from the author’s perspective.
37.0.2 Remark: A curve is defined in Section 16.2 as a map γ : I → M from a real interval to a topological
space M . A (continuous oriented) path is defined in Section 16.4 as an equivalence class of curves which
are related by oriented homeomorphisms of the parameter interval. In the context of geodesic curves, two
curves are considered equivalent only if they are related by an affine parameter transformation. Therefore it
may be convenient to define an “affine path” as an equivalence class of curves which are affine related. Then
two curves may be said to “have the same affine path” if they are in the same equivalence class.
37.1. Covariant derivatives of vector fields along curves

[ Also try to define covariant derivatives along curves in general differentiable fibre bundles. ]
37.1.1 Remark: This section presents covariant derivatives along curves and families of curves. The phrase
“along curves” is used instead of “on curves” to emphasize that covariant derivatives are not defined on the
image set of a curve. Covariant derivatives are well-defined at self-intersections of curves because they are
defined in terms of the parameter interval, not the image set.
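37.1.2 Definition: The covariant derivative of a $C^1$ vector field $X : \mathbb{R} \to T(M)$ along a curve $\gamma : \mathbb{R} \to M$
in a $C^1$ manifold M with respect to an affine connection θ on M is the function $t \mapsto (D_{\gamma'(t)}X)(t)$ defined by
$$(D_{\gamma'(t)}X)(t) = \operatorname{drop}_{X(t)}\bigl(X'(t) - \theta_{\gamma'(t)}(X(t))\bigr),$$
where $\operatorname{drop}_z : \ker((d\pi)_z) \to T_{\pi(z)}(M)$ denotes the drop function for T(M) at $z \in T(M)$. (See Definition 27.11.6.)
[ The notation $(D_{\gamma'(t)}X)(t)$ in Definition 37.1.2 needs to be fixed. The RHS is okay though. Maybe a better
notation would be just DX(t), since the curve is implicit in the function X. ($\gamma(t) = \pi(X(t))$ for all $t \in \mathbb{R}$.) ]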

Figure 37.1.1 Covariant derivative of a vector field along a curve. (Diagram labels: $X'(t) \in T_{X(t)}(T(M))$; $\theta_{\gamma'(t)}(X(t)) \in T_{X(t)}(T(M))$; their difference $X'(t) - \theta_{\gamma'(t)}(X(t)) \in \ker((d\pi)_{X(t)})$; the projection $\pi : T(M) \to M$; and $DX(t) \in T_{\gamma(t)}(M)$, the image of this difference under the drop function at X(t).)
[ Must settle on some standard notations. Might use $\gamma_{**}$ instead of $\gamma''$ for the abstract (connection-free)
second-order tangent vector field. Might use something like $D^2\gamma$ instead of $D_{\gamma'(t)}\gamma'(t)$. Might add the
connection θ to the D-symbol, as for example $D^\theta_{\gamma'(t)}$. Can then compare $D^{\theta_1}_{\gamma'(t)}$ with $D^{\theta_2}_{\gamma'(t)}$. ]
37.1.4 Definition: The (covariant) acceleration of a $C^2$ open curve $\gamma : I \to M$ in a $C^2$ manifold M with
respect to an affine connection θ on M is the map $D_{\gamma'}(\gamma') : \mathrm{Int}(I) \to T(M)$ defined by:
$$\forall t \in I,\quad D_{\gamma'(t)}\gamma'(t) = \operatorname{drop}_{\gamma'(t)}\bigl(\partial_t^2\gamma(t) - \theta_{\gamma'(t)}(\gamma'(t))\bigr),$$
where $\partial_t^2\gamma : I \to T(T(M))$ satisfies $\partial_t^2\gamma(t) \in T_{\gamma'(t)}(T(M))$ for $t \in I$, and $\theta_{\gamma'(t)} \in T_{\gamma'(t)}(T(M))$ also. The
function $\operatorname{drop}_z : \ker((d\pi)_z) \to T_{\gamma(t)}(M)$ for $z \in T(M)$ is the drop-function for T(M). (See Definition 27.11.6.)
37.2. Geodesic curves
37.2.1 Remark: Differentiable curves are defined in Section 26.7. Curves and families of curves are always
assumed to be continuous unless otherwise stated.
37.2.2 Remark: The geodesic curves in Definition 37.2.3 are affine-parametrized. It is understood that a
C 2 curve I → M means a curve which is C 2 on the interior of I and continuous on all of I.
37.2.3 Definition: A geodesic curve in a $C^2$ manifold M is a $C^2$ curve $\gamma : I \to M$ such that $(D_{\gamma'(t)}\gamma')(t) = 0$
for all $t \in I$, where I is an interval of IR.
[ Also define geodesic curves in terms of θ, ρ, ω etc. ]
37.2.4 Definition: A geodesic path in a $C^2$ manifold M is the equivalence class of any geodesic curve in M
with respect to affine reparametrization.
37.2.5 Theorem: In terms of local coordinates, have
$$\frac{d^2x^i}{dt^2} + \Gamma^i_{jk}\,\frac{dx^j}{dt}\frac{dx^k}{dt} = 0,$$
for a geodesic γ, in terms of x = ψ ◦ γ. [ Note that this sort of geodesic is an affinely parametrized geodesic.
There’s probably a simple formula also for a generally parametrized geodesic. ]
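The coordinate form of the geodesic equation can be integrated numerically. The sketch below is an illustration under stated assumptions: it uses the unit sphere with spherical coordinates (θ, φ) and its Levi-Civita connection, whose non-zero Christoffel symbols are $\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^\varphi_{\theta\varphi} = \Gamma^\varphi_{\varphi\theta} = \cot\theta$, together with a classical fourth-order Runge-Kutta step. The function names are hypothetical.

```python
import math

def acceleration(theta, dtheta, dphi):
    # Geodesic equation solved for the second derivatives:
    #   theta'' = -Gamma^theta_{phi phi} phi'^2     = sin(theta) cos(theta) phi'^2
    #   phi''   = -2 Gamma^phi_{theta phi} theta' phi' = -2 cot(theta) theta' phi'
    ddtheta = math.sin(theta) * math.cos(theta) * dphi * dphi
    ddphi = -2.0 * (math.cos(theta) / math.sin(theta)) * dtheta * dphi
    return ddtheta, ddphi

def rhs(s):
    theta, phi, dtheta, dphi = s
    ddtheta, ddphi = acceleration(theta, dtheta, dphi)
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(s, h):
    def shifted(k, c):
        return tuple(a + c * b for a, b in zip(s, k))
    k1 = rhs(s)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(a + h / 6 * (p + 2 * q + 2 * r + w)
                 for a, p, q, r, w in zip(s, k1, k2, k3, k4))

def geodesic(theta0, phi0, dtheta0, dphi0, t_end, steps=2000):
    # Returns (theta, phi, theta', phi') at affine parameter t_end.
    s = (theta0, phi0, dtheta0, dphi0)
    h = t_end / steps
    for _ in range(steps):
        s = rk4_step(s, h)
    return s

def embed(theta, phi):
    # Position in IR^3 of the point with spherical coordinates (theta, phi).
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```

Starting on the equator with velocity (θ′, φ′) = (0.3, 0.4), the computed curve should stay on the great circle through the starting point in the starting direction, and its speed should stay equal to 0.5, as expected for an affinely parametrized geodesic of a metric connection.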
[ For subscripts and superscripts as in the derivatives of xk above, should have some macros to make the scripts
extend over the side of the fraction. For instance, \rlapsuper could mean a superscript which overlaps to
the right. ]
[ Show near here a precise way of constructing the connection from affine-parametrized geodesics. ]

37.3. Jacobi fields

[ See especially Klingenberg [26], section 1.12, for Jacobi fields. ]
37.3.1 Remark: The most important motivation for writing this book is to obtain some really good
estimates for the Jacobi field and its first and second derivatives. Of interest are not only the magnitudes of
the derivatives, but also the amount of rotation and skewing of the Jacobi field relative to parallel transport
along a path.
In this section, derivative estimates should be found for the Jacobi field in terms of the curvature, without
using the metric. Then in the next chapter, the metric should be used to get better estimates.
[ Here define families of geodesic curves. Then find the derivatives in terms of the equations for geodesics.
Use this to derive the equations for Jacobi fields. Do this both with the covariant derivative D and with
Christoffel symbols. ]
37.3.2 Definition: The Jacobi field along a geodesic curve γ in a $C^\infty$ manifold M is the vector field Y
along the curve γ such that
$$Y'' + R(\gamma', Y)\gamma' = 0.$$
[ Each part of the equation in Definition 37.3.2 should be checked to see if it is sensible. There are also the
questions of existence and uniqueness. The equation might be related to $D_{\gamma'}\gamma' = 0$ etc. ]
[ The curvature tensor R of M is defined in EDM2 [35] 80.J. The covariant derivative $Y'$ of Y along γ and
the tangent vector $\gamma'$ will hopefully also be defined somewhere some day. Jacobi fields are in EDM [34] 48.C
and EDM2 [35] 178. ]
[ Mention the equation of geodesic variation. See Greene/Wu [68], page 6. The coverage by Gallot/Hulin/
Lafontaine [20], pages 118–124, is very useful for Jacobi fields in the context of Riemannian manifolds. Should
cover the equation of geodesic variation thoroughly, especially covering the case of the map $\bar\Omega \times \bar\Omega \times [0, 1] \to \bar\Omega$ in
Section 37.7. ]
[ Estimates for the Jacobi field should go here! ]
[ Give examples here of exact solutions of Jacobi field equations for “constant curvature” connections. Do
this for various dimensions of manifolds. ]
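As a hedged illustration of such exact solutions, suppose (this is an assumption, since it uses a metric) that M is a Riemannian manifold of constant sectional curvature K, that γ is a unit-speed geodesic, and that Y = y E for a parallel unit field E normal to γ′. Then the equation of Definition 37.3.2 reduces to the scalar equation y″ + K y = 0, whose solution with y(0) = 0 and y′(0) = 1 is sin(√K t)/√K, t or sinh(√−K t)/√−K according to the sign of K. The sketch below integrates the scalar equation with a Runge-Kutta step and compares it with these closed forms. The function names are hypothetical.

```python
import math

def exact_jacobi(K, t):
    # Closed-form solution of y'' + K y = 0 with y(0) = 0, y'(0) = 1.
    if K > 0:
        r = math.sqrt(K)
        return math.sin(r * t) / r
    if K < 0:
        r = math.sqrt(-K)
        return math.sinh(r * t) / r
    return t

def integrate_jacobi(K, t_end, steps=1000):
    # Classical RK4 for the first-order system (y, v)' = (v, -K y).
    h = t_end / steps
    y, v = 0.0, 1.0
    for _ in range(steps):
        k1y, k1v = v, -K * y
        k2y, k2v = v + h / 2 * k1v, -K * (y + h / 2 * k1y)
        k3y, k3v = v + h / 2 * k2v, -K * (y + h / 2 * k2y)
        k4y, k4v = v + h * k3v, -K * (y + h * k3y)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return y
```

For K > 0 the field vanishes again at t = π/√K, which is the first conjugate point along the geodesic in this model.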
37.4. Convex sets

[ See EDM2 [35], 364.C, for convex neighbourhoods. ]
37.4.1 Definition: A convex subset of a $C^1$ manifold M with an affine connection is a subset K of M
such that for all x, y ∈ K, there is a unique geodesic in M which has endpoints x and y and lies in K. [ See
Klingenberg [26], definition 1.9.9, p.84. ]
37.4.2 Definition: A starlike subset of a $C^1$ manifold M with an affine connection is a subset K of M
such that for some x ∈ K, for all y ∈ K, there is a unique geodesic in M which has endpoints x and y and
lies in K.
[ Perhaps it would be useful to have a definition of “starlike” which does not require uniqueness of the geodesics
between the star-centre point and the points of the set. As long as the set is equal to the union of the ranges
of a set of geodesic curves with the star-centre as end-point, the set could reasonably be called starlike.
Perhaps the terms “weakly starlike” and “strongly starlike” could be used for the non-unique and unique
cases respectively. ]
37.4.3 Remark: An open hemisphere of a sphere is convex, but its closure is not convex. This is because
a closed hemisphere contains pairs of conjugate points. A whole sphere, minus a single point, is starlike,
centred on the point opposite the excluded point. [ This example must be in the section on $S^n$ also. Refer
from here to that much fuller treatment. ]
37.4.4 Theorem: If a subset of a C 1 manifold with an affine connection is convex, then it is also starlike.

[ Is it true that the closure of a convex set is starlike? And is it true that a closed convex subset is covered
by a single map? And what happens in manifolds with boundaries? ]
37.4.5 Theorem: Let K be an open starlike subset of a C 1 manifold with an affine connection. Then K
is homeomorphic to the ball B1 (0) of IRn . [ Use normal coordinates? See Klingenberg [26] around about
Definition 1.9.9. ]
37.4.6 Theorem: Let K be a starlike subset of a C 1 manifold with an affine connection. Then there exists
a chart ψ ∈ C̊(M, IRn ) such that ψ is C 1 -compatible with atlas(M ) and K ⊆ Dom(ψ).
Proof: [ This requires a construction of some sort, presumably from some sort of gluing together of charts
which cover K. ]
37.5. Convex combinations

[ A better term for convex combinations might be “geodesic interpolations”? ]
[ For parametrized families of geodesics, see Klingenberg [26], section 1.9. ]
[ The image of a family of geodesic curves parametrized by endpoints in convex subsets K1 and K2 of a
manifold M with an affine connection may be a convex subset of M under some circumstances. Determine
what these are. ]
37.5.1 Theorem: Let Ω be an open set in a $C^\infty$ manifold M such that $\bar\Omega$ is convex in M. Then there
exists a unique family $\gamma : \bar\Omega \times \bar\Omega \times [0, 1] \to \bar\Omega$ of geodesic curves parametrized by their endpoints in $\bar\Omega$. The
restriction of this family to Ω × Ω × [0, 1] is $C^\infty$.
[ Near here have a graphic showing two points x1 and x2 near each other, and two points y1 and y2 also near
each other. Show the geodesics joining each x-y pair, and some representative point γ(0.4), say, on each
geodesic. ]
37.5.2 Definition: Let M be a C ∞ manifold with an affine connection. Let N denote the subset of M ×M
consisting of those pairs of points in M for which there exists a unique geodesic joining the points in M .
Then the convex combination function M : N × [0, 1] → M is defined for (x, y) ∈ N by setting M (x, y, λ)
equal to the image of γ(λ) for the unique geodesic curve γ : [0, 1] → M with γ(0) = x and γ(1) = y. The
notation may be used for M when the manifold is implicit in the context.
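A concrete instance of this definition may be sketched as follows, under assumptions which are not in the text: M is taken to be the open upper hemisphere of the unit sphere in IR3 with the Levi-Civita connection, which is convex by Remark 37.4.3, so that N = M × M and the unique geodesic joining two points in M is the shorter great-circle arc. The function name `convex_combination` is hypothetical.

```python
import math

def convex_combination(x, y, lam):
    # Point gamma(lam) on the great-circle arc gamma : [0, 1] -> M with
    # gamma(0) = x and gamma(1) = y, for unit vectors x, y in the open hemisphere.
    c = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, y))))
    omega = math.acos(c)  # geodesic distance from x to y
    if omega < 1e-12:
        return tuple(x)   # x and y coincide: the constant curve
    s = math.sin(omega)
    wx = math.sin((1.0 - lam) * omega) / s
    wy = math.sin(lam * omega) / s
    return tuple(wx * a + wy * b for a, b in zip(x, y))
```

The parameter λ is affine along the arc, so λ = 0.5 gives the geodesic midpoint. Unlike the flat case, the weights wx and wy do not in general sum to 1.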
37.5.3 Theorem: Let Ω be an open subset of a $C^\infty$ manifold M such that $\bar\Omega$ is a convex subset of M. Then
$\bar\Omega \times \bar\Omega \subseteq N$, and the restriction of the convex combination function of M (Definition 37.5.2) to
$\bar\Omega \times \bar\Omega \times [0, 1]$ is the unique family of geodesic curves parametrized by their endpoints in $\bar\Omega$. Moreover, the
convex combination function is continuous, and probably $C^1$, and quite likely $C^\infty$, if the connection
is $C^\infty$. [ This differentiability question deserves to be looked into. Find out at least some sufficient conditions
for the continuity of the convex combination function. ]
[ Should try to define here the centroid of a set of points, and more generally, a convex combination of a set
of points, and the convex hull of a set. Quite likely this is not so simple as in flat space, because the order in
which convex combinations are performed may have an influence. So the curvature might cause a blowing
out of the set of combinations from a submanifold into a set of variable thickness. ]
37.6. Convex curvilinear interpolation in affine manifolds

37.6.1 Remark: Convex curvilinear interpolation may be generalized to a general differentiable manifold
with an affine connection using geodesics and convex combinations. However, the resulting curve interpola-
tion would be expected to be different for the two different ways of interpolating the curves.
If γ1 and γ3 are the start and finish curves respectively, and γ2 and γ4 are the end-point trajectory specifi-
cations as in Remark 16.5.2, there may be a geodesic joining γ1 (s) to γ3 (s) for all s ∈ [0, 1], and a geodesic
joining γ2 (t) to γ4 (t) for all t ∈ [0, 1]. But the point sums and differences in (16.5.1) are not well defined in
a general affine manifold. Furthermore, the third term in (16.5.1) will not in general be invariant under the
interchange of the roles of curve-pairs γ1 -γ3 and γ2 -γ4 .

37.7. Families of geodesic interpolations

37.7.1 Remark: This section deals with families of geodesic curves which are parametrized by their end-
points. Of special interest is the way in which first and second order derivatives are transmitted along the
family of curves.
37.7.2 Remark: Of special interest is the differential of the map from M to M which sends x to the convex
combination of x and y with parameter λ (Definition 37.5.2), for fixed y and λ. This probably should be
analyzed in terms of the full convex combination map from M × M × [0, 1] to M. This generates a whole
bunch of Jacobi fields.
[ What does ⊕ mean in Ty (M ) ⊕ Tλ (IR) in the following? ]
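If the convex combination function of Definition 37.5.2 is $C^\infty$, write $z = z(x, y, \lambda)$ for the convex combination
of x and y with parameter λ. Then the differential
$$(dz)_{x,y,\lambda} : T_x(M) \oplus T_y(M) \oplus T_\lambda(\mathbb{R}) \to T_z(M)$$
is a well-defined linear map for all (x, y, λ) ∈ Ω × Ω × (0, 1). For each (x, y, λ), the
map $(dz)_{x,y,\lambda}$ may be decomposed into three components as follows:
$$\frac{\partial z}{\partial x}(x, y, \lambda) = (dz)_{x,y,\lambda}(\cdot, 0, 0)$$
$$\frac{\partial z}{\partial y}(x, y, \lambda) = (dz)_{x,y,\lambda}(0, \cdot, 0)$$
$$\frac{\partial z}{\partial \lambda}(x, y, \lambda) = (dz)_{x,y,\lambda}(0, 0, \cdot),$$
so that
$$\frac{\partial z}{\partial x}(x, y, \lambda) : T_x(M) \to T_z(M)$$
$$\frac{\partial z}{\partial y}(x, y, \lambda) : T_y(M) \to T_z(M)$$
$$\frac{\partial z}{\partial \lambda}(x, y, \lambda) : T_\lambda(\mathbb{R}) \to T_z(M).$$
For bases $(e^w_i)_{i=1}^n$ at w = x, y and z, will then have $(\partial z/\partial x)e^x_i = a^j{}_i\, e^z_j$, $(\partial z/\partial y)e^y_i = b^j{}_i\, e^z_j$, and
$(\partial z/\partial\lambda) = h^j e^z_j$, for some arrays $[a^j{}_i]_{i,j=1}^n$, $[b^j{}_i]_{i,j=1}^n$, and $[h^j]_{j=1}^n$.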
[ In above “decompositions into three components”, find Jacobi fields J(v, w, λ) for (x, y, λ). Check the
component equations too. ]
[ Under what conditions is it guaranteed that the convex combination of x and y with parameter λ is a $C^k$ function of x, y ∈ M? ]
37.7.3 Definition: A family of geodesic curves parametrized by their endpoints is a continuous map γ :
S1 × S2 × [0, 1] → M such that S1 and S2 are subsets of a C ∞ manifold M and
(i) ∀x ∈ S1 , ∀y ∈ S2 , ∀λ ∈ [0, 1], γ(x, y, 0) = x and γ(x, y, 1) = y,
(ii) ∀x ∈ S1 , ∀y ∈ S2 , γx,y is a geodesic curve in M , where γx,y : [0, 1] → M is defined for x ∈ S1 and
y ∈ S2 by γx,y (λ) = γ(x, y, λ) for λ ∈ [0, 1].
The family of geodesics is said to be C r when S1 and S2 are open subsets of M , and the restriction of γ to
S1 × S2 × (0, 1) is a C r map.
[ The derivatives d in the following remark should be partial derivatives. ]
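37.7.4 Remark: In terms of local coordinates, have
$$\frac{d^2z^i}{d\lambda^2} + \Gamma^i_{jk}\,\frac{dz^j}{d\lambda}\frac{dz^k}{d\lambda} = 0,$$
$$g_{ij}\,\frac{dz^i}{d\lambda}\frac{dz^j}{d\lambda} = d(x, y)^2$$
etc. etc. Clearly also have
$$\frac{\partial z^i}{\partial x^j}\Big|_{\lambda=0} = \delta^i_j, \qquad \frac{\partial z^i}{\partial x^j}\Big|_{\lambda=1} = 0,$$
$$\frac{\partial z^i}{\partial y^j}\Big|_{\lambda=0} = 0, \qquad \frac{\partial z^i}{\partial y^j}\Big|_{\lambda=1} = \delta^i_j,$$
if the same chart is used over a neighbourhood of the geodesic joining x and y.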
[ In a space of constant sectional curvature, the matrix ∂z/∂x is effectively diagonal, with diagonal elements
α, . . . , α, (1 − λ). This puts limits on the maximum “magnification factor” along the geodesic, depending on
what the sectional curvature actually is. It should be checked whether the sectional curvature is meaningful
in this way with just an affine connection. Maybe not! ]
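The diagonal form suggested in this note can be checked numerically in one model case. The sketch assumes the unit sphere with its Levi-Civita connection, which is constant curvature 1 and an assumption of this example, and differentiates the geodesic interpolation z with respect to x by central differences. For a displacement of x normal to the geodesic, the magnification should be sin((1 − λ)d)/sin(d), where d is the distance from x to y. For a displacement along the geodesic it should be 1 − λ. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def normalize(v):
    n = norm(v)
    return tuple(c / n for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def slerp(x, y, lam):
    # Geodesic interpolation between distinct, non-antipodal unit vectors.
    omega = math.acos(max(-1.0, min(1.0, dot(x, y))))
    s = math.sin(omega)
    wx = math.sin((1.0 - lam) * omega) / s
    wy = math.sin(lam * omega) / s
    return tuple(wx * a + wy * b for a, b in zip(x, y))

def dz_dx(x, y, lam, direction, s=1e-5):
    # Central difference of z = slerp(x, y, lam) as x moves along the geodesic
    # through x with unit initial tangent `direction`.
    xp = tuple(math.cos(s) * a + math.sin(s) * d for a, d in zip(x, direction))
    xm = tuple(math.cos(s) * a - math.sin(s) * d for a, d in zip(x, direction))
    zp, zm = slerp(xp, y, lam), slerp(xm, y, lam)
    return tuple((p - q) / (2 * s) for p, q in zip(zp, zm))

def magnifications(x, y, lam):
    # Returns (normal magnification, tangential magnification, distance d).
    d = math.acos(dot(x, y))
    along = normalize(tuple(b - dot(x, y) * a for a, b in zip(x, y)))
    across = cross(x, along)
    return norm(dz_dx(x, y, lam, across)), norm(dz_dx(x, y, lam, along)), d
```

At λ = 0 and λ = 1 both magnifications reduce to 1 and 0 respectively, in agreement with the boundary values of ∂z/∂x in Remark 37.7.4.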
[ Jacobi fields should be defined near here in terms of the map. This sort of Jacobi field should be just
Yy,z (λ) = (d )(y, z, λ)(Ly , Lz , Lλ ), where Ly ∈ Ty (M ), Lz ∈ Tz (M ) and Lλ = 0 ∈ Tλ (IR), and y and z are
fixed. In other words, the Jacobi field for given y, z ∈ M and Ly ∈ Ty (M ) and Lz ∈ Tz (M ) is the image
under d of the vector (Ly , Lz , 0) ∈ T(y,z,λ) (M × M × [0, 1]). This could be made the primary definition, and
the definition in terms of the Riemann curvature tensor should be presented as a mere property of the Jacobi
field. A difficulty with this is that the geodesics must exist and be uniquely defined in a neighbourhood of
(y, z) for the Jacobi field to be defined, whereas the Riemann tensor definition is a local definition. Perhaps
the convex combination definition should be used only as a motivational definition. Should have a theorem
that Yy,z (·) is a Jacobi field. ]
37.8. Exponential maps

[ Must clarify the difference between geodesic coordinates and the sorts of coordinates you get by parallel
transporting vectors along a geodesic. ]
37.8.1 Remark: Exponential maps may also be called normal coordinates or geodesic coordinates. For
geodesic coordinates, see EDM2 [35], section 80.J. For normal coordinates, see Gallot/Hulin/Lafontaine [20],
2.86, page 84, and section II.C (page 80) generally.
[ In this section, must cover families of geodesics, in particular. Also vector fields on families of curves. This
section is intended as preparation for Jacobi fields. ]
37.8.2 Definition: [Definition of exponential map.]
37.8.3 Definition: [Definition of geodesic coordinates. See end of EDM2 [35] 80.J.]
37.8.4 Definition: [Definition of conjugate point.]
37.8.5 Definition: [Definition of convex neighbourhood. This might have something to do with conjugate
points. That is, a convex set should have no pairs of conjugate points in it, maybe.]
37.9. Convex functions

[ The concepts of α-concavity and harmonic concavity should be given here. Must also replicate all of the
properties of α-concavity in Kennington [120]. ]
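37.9.1 Definition: A convex function on a convex subset K of a $C^\infty$ manifold is a function f : K → IR
such that
$$\forall x, y \in K,\ \forall\lambda \in [0, 1],\quad f(z) \le (1 - \lambda)f(x) + \lambda f(y),$$
where z denotes the convex combination of x and y with parameter λ. (See Definition 37.5.2.)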
[ See Greene/Wu [68], page 7, for a definition of a convex function in terms of the Hessian. Also should define
“strictly convex” here. ]

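37.9.2 Definition: The convexity test function on a manifold M with affine connection is the function
$c : \mathbb{R}^M \to \mathbb{R}^{N\times[0,1]}$ defined by
$$\forall u \in \mathbb{R}^M,\ \forall (x, y, \lambda) \in N \times [0, 1],\quad c(u)(x, y, \lambda) = (1 - \lambda)u(x) + \lambda u(y) - u(z),$$
where z denotes the convex combination of x and y in M with parameter λ (Definition 37.5.2), and N is the
set of pairs of non-conjugate points in M. (This definition is non-standard.)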
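The simplest instance of this definition is flat space, where geodesics are straight lines. The following sketch assumes M = IRn with the standard flat connection, so that the convex combination is (1 − λ)x + λy. That choice and the function names are assumptions made for illustration.

```python
def convex_combination(x, y, lam):
    # Flat case: the geodesic from x to y is the straight segment.
    return tuple((1.0 - lam) * a + lam * b for a, b in zip(x, y))

def convexity_test(u):
    # Returns the function c(u) : (x, y, lam) -> (1 - lam) u(x) + lam u(y) - u(z).
    def c_u(x, y, lam):
        z = convex_combination(x, y, lam)
        return (1.0 - lam) * u(x) + lam * u(y) - u(z)
    return c_u

def squared_norm(x):
    return sum(a * a for a in x)
```

For u equal to the squared norm, c(u)(x, y, λ) equals λ(1 − λ)|x − y|², which is non-negative, as expected for a convex function. The negative of the squared norm gives a non-positive test function.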
37.9.3 Remark: The convexity test function in Definition 37.9.2 may be viewed as a “geodesic leverage
map” because a variation of either x or y causes a variation in c(u)(x, y, λ) which is reminiscent of the action
of a lever. The design of a pantograph is similar. So it could also be called a “pantograph map”. (In the
1960s, there was a cheap kind of plastic pantograph called a “sketchagraph”. The author received one as a
birthday present and was not made happy by it!)
The convexity test function seems to have been first used to prove geometric properties of solutions of BVPs
by Korevaar [123]. A generalization to the α-convexity test function $c_\alpha : \mathbb{R}^M \to \mathbb{R}^{N\times[0,1]}$ is defined and
used in Kennington [119]:
$$\forall u \in \mathbb{R}^M,\ \forall (x, y, \lambda) \in N \times [0, 1],\quad c_\alpha(u)(x, y, \lambda) = (1 - \lambda)u^\alpha(x) + \lambda u^\alpha(y) - u^\alpha(z),$$
for α > 0, where z denotes the convex combination of x and y in M with parameter λ. In fact, this may be
generalized to α ∈ [−∞, ∞].
37.9.4 Theorem: Let c be the convexity test function on a manifold M with affine connection, and
let $u \in \mathbb{R}^K$ be a real-valued function on a convex subset K of M. Then u is convex in K if and only if c(u) is
non-negative in K × K × [0, 1]. Similarly, u is concave in K if and only if c(u) is non-positive in K × K × [0, 1].
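37.9.5 Theorem: Let c be the convexity test function on a manifold M with affine connection, and
let $u \in C^1(\Omega)$ be a function on a convex open subset Ω of M. Then
$$\frac{\partial c(u)}{\partial x}(x, y, \lambda) = (1 - \lambda)\frac{\partial u}{\partial x} - \frac{\partial u}{\partial z}\frac{\partial z}{\partial x}$$
$$\frac{\partial c(u)}{\partial y}(x, y, \lambda) = \lambda\frac{\partial u}{\partial y} - \frac{\partial u}{\partial z}\frac{\partial z}{\partial y}$$
$$\frac{\partial c(u)}{\partial \lambda}(x, y, \lambda) = (1 - \lambda)\frac{\partial u}{\partial \lambda} - \frac{\partial u}{\partial z}\frac{\partial z}{\partial \lambda}$$
where. . .
[ The last formula above is missing at least one term! ]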

[ Clearly it would be interesting to know if ∂z/∂x and ∂z/∂y are invertible, and to know also something about
their eigenvalues (if eigenvalues are of relevance).
Also of great interest are the second derivatives of z with respect to x, y and λ. Then should get
$$\frac{\partial^2 c}{\partial x\,\partial x} = (1 - \lambda)\frac{\partial^2 u(z)}{\partial x\,\partial x} \ldots$$
and so forth. All of these ideas on transport of second order differential operators should be worked out
properly. In particular, work out what kind of object a second order operator is. Calculate all second order
derivatives of the convex combination map. ]


Chapter 38
Riemannian manifolds
38.1 Historical notes on Riemannian geometry
38.2 Overview of Riemannian geometry
38.3 The Riemannian metric
38.4 The point-to-point distance function
38.5 The Levi-Civita connection
38.6 Curvature tensors
38.7 Differential operators
38.8 Inner product
38.9 Embedded Riemannian manifolds
38.10 Information geometry
38.0.1 Remark: The Riemannian metric has been banned from all chapters prior to this to avoid confusing
pre-metric geometry with the convenient special formulas of Riemannian spaces. Too many textbooks begin
with Riemannian spaces and then ask readers to forget what they have just learned so that metric-free affine
connections can be presented. In this book, Riemannian manifolds are presented after the chapters on
general connections. All results on general connections may be applied without restraint to the special kinds
of connections on Riemannian manifolds, including Levi-Civita connections and metric connections.
38.0.2 Remark: A Riemannian manifold is essentially a metric space for which the Hessian of the square
of the distance function is defined everywhere. (The conditions can probably be made weaker than this.)
The Riemannian metric is defined to be half the Hessian of the square of the distance function. Thus a
Riemannian manifold is the same as the familiar metric space studied in topology, with the constraint that
it must have a differentiable structure, and its distance function must satisfy a differentiability condition
with respect to its charts.
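The statement that the metric is half the Hessian of the squared distance can be tested numerically in a familiar case. The sketch below assumes the unit sphere with the great-circle distance and the chart (θ, φ), for which the expected result is the matrix diag(1, sin²θ). The example and its function names are illustrative assumptions, not part of the text. The Hessian is taken with respect to the second argument of the distance function at the diagonal, using central second differences.

```python
import math

def embed(q):
    theta, phi = q
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def distance(q1, q2):
    # Great-circle distance, computed with atan2 for accuracy at small separations.
    a, b = embed(q1), embed(q2)
    cr = (a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0])
    sin_d = math.sqrt(sum(c * c for c in cr))
    cos_d = sum(p * r for p, r in zip(a, b))
    return math.atan2(sin_d, cos_d)

def metric_from_distance(q, h=1e-3):
    # g_ij(q) is approximated by half the Hessian at y = q of y -> d(q, y)^2.
    def f(d0, d1):
        return distance(q, (q[0] + d0, q[1] + d1)) ** 2
    g00 = 0.5 * (f(h, 0.0) - 2.0 * f(0.0, 0.0) + f(-h, 0.0)) / h ** 2
    g11 = 0.5 * (f(0.0, h) - 2.0 * f(0.0, 0.0) + f(0.0, -h)) / h ** 2
    g01 = 0.5 * (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h ** 2)
    return ((g00, g01), (g01, g11))
```

The same recipe applies to any distance function which is smooth enough near the diagonal. Only the function `distance` would need to change.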
[ This chapter covers all of EDM2 [35] 364, and a small section of EDM2 [35] 105. Federer [106], section 5.4.12,
page 634, has an interesting instant summary of Riemannian geometry for submanifolds of IRn . ]
[ The most urgent thing to calculate in this chapter is a set of bounds on the length of Jacobi fields. In the
particular case of constant sectional curvature, it is well known that the lengths of Jacobi fields vary with
the sine or hyperbolic sine of the distance along the geodesic. The second most urgent thing is probably to
find out what happens to the inner product of any pair of Jacobi fields on a single geodesic. In the case of
constant sectional curvature, the fields are parallel transported along the geodesic so that angle is preserved. ]
[ Probably the conformal sublayer should be treated at the end of this chapter or after the pseudo-Riemannian
space chapter. ]
38.0.3 Remark: A “conformal sublayer” may be interpolated between the connection and metric layers.
In this sublayer, angles between vectors are defined globally but distances are not.


724 38. Riemannian manifolds
38.1. Historical notes on Riemannian geometry

38.1.1 Remark: The Riemannian style of differential geometry commenced with the systematic study of
quadratic differential forms by Gauß in “Disquisitiones generales circa superficies curvas”, (1827). These
were line elements on 2-dimensional surfaces embedded in $\mathbb{R}^3$. This was generalized to intrinsic geometry of
arbitrary dimensions by Riemann in “Über die Hypothesen welche der Geometrie zu Grunde liegen”, (1854).
Christoffel invented covariant differentiation in 1869, and Ricci-Curbastro called it “covariant differentiation”
in his development of tensor calculus, published 1887–88. (See Bell [190], pages 210, 354, 358–360.)
38.1.2 Remark: Bell [191], pages 263–264, said the following in 1937 about the early origins of differential
geometry.
During the period 1821–1848 Gauss was scientific adviser to the Hanoverian (Göttingen was then
under the government of Hanover) and Danish governments in an extensive geodetic survey. Gauss
threw himself into the work. His method of least squares and his skill in devising schemes for
handling masses of numerical data had full scope but, more importantly, the problems arising in
the precise survey of a portion of the earth’s surface undoubtedly suggested deeper and more general
problems connected with all curved surfaces. These researches were to beget the mathematics of
relativity. The subject was not new; several of Gauss’ predecessors, notably Euler, Lagrange, and
Monge, had investigated geometry on certain types of curved surfaces, but it remained for Gauss
to attack the problem in all its generality, and from his investigations the first great period of
differential geometry developed.
Differential geometry may be roughly described as the study of properties of curves, surfaces, etc.,
in the immediate neighbourhood of a point, so that higher powers than the second of distances
can be neglected. Inspired by this work, Riemann in 1854 produced his classic dissertation on the
hypotheses which lie at the foundations of geometry, which, in its turn, began the second great
period of differential geometry, that which is today of use in mathematical physics, particularly in
the theory of general relativity.
38.1.3 Remark: The Riemannian metric arose as an attempt to give differential geometry an intrinsic
framework as opposed to the extrinsic treatment of embedded manifolds. Gauß had found that the Gaußian
curvature of a manifold was independent of embedding. (The date 1827 is given by do Carmo [17], pages 36–37,
for this.) The Riemannian metric made this possible by abstracting the metric properties inherited by a
surface from its embedding within a Euclidean space. It turns out that by knowing only the metric tensor,
it is possible to derive all other aspects of the intrinsic geometry of a manifold. It follows, therefore, that the
embedding can be dispensed with. The Riemannian metric framework was then the obvious candidate for
the basis of the curved space-time generalization of Einstein’s special relativity to general relativity. As soon
as the extrinsic framework of an embedded manifold is discarded, however, many questions arise as to how
to redefine all extrinsic geometric objects in terms of the Riemannian metric. A useful picture to have in
mind is that of a population of geometers who are trapped inside a surface embedded in a Euclidean space,
but who cannot experience anything outside the surface at all.
38.1.4 Remark: The historical origin of the Riemannian metric seems to be the line element
$$\sqrt{E\,dp^2 + 2F\,dp\cdot dq + G\,dq^2}$$
of the first fundamental form, arising out of curvature calculations by Gauß for a 2-manifold embedded in
$\mathbb{R}^3$. For this, see Gauß: “Disquisitiones generales circa superficies curvas” (1827), translated in
Spivak [43] book II, pages 55–111, particularly pages 87–95. The Riemannian metric is a generalization of
the first fundamental form for such embedded surfaces. The key passage for this is in article 12 of the
“Disquisitiones” as follows (Spivak [43], book II, pages 91–93).
Since we always have
$$dx^2 + dy^2 + dz^2 = E\,dp^2 + 2F\,dp\cdot dq + G\,dq^2$$
it is clear that $\sqrt{(E\,dp^2 + 2F\,dp\cdot dq + G\,dq^2)}$ is the general expression for the linear element on
the curved surface. The analysis developed in the preceding article thus shows us that for finding
the measure of curvature there is no need of finite formulæ, which express the coordinates x, y, z

as functions of the indeterminants p, q; but that the general expression for the magnitude of any
linear element is sufficient. Let us proceed to some applications of this very important theorem.
In other words, the quadratic form contains all of the information required for analysis of intrinsic geometric
properties of the surface. This is the fundamental idea behind the Riemannian metric, namely that the
(quadratic) Riemannian metric tensor contains all of the information required about distance elements for
the surface. Therefore the particular choice of embedding does not need to be known.
38.1.5 Remark: There are many questions which naturally arise from the definition of the Riemannian
metric (Definition 38.3.3). For example, why is it quadratic rather than cubic or linear, or some more
general form of function? Why is it a tensor? Why is it symmetric? Why is it positive definite? Why is the
Riemannian metric not path-dependent like parallelism? In the physical world, how do the distances and
angles at all points in space-time get synchronized to give a single universal standard of length?
Even to this day, it is not at all clear why space should have a quadratic character. Lengths add linearly in a
straight line and quadratically at right angles. It is difficult to experience how surprising this fact is because
human beings have never existed in a space which does not obey the Pythagoras theorem. As someone once
observed, whoever discovered water was not a fish. (Albert Einstein [183] is supposed to have said: “Was
weiß der Fisch von dem Wasser, in dem er sein Leben lang herumschwimmt?” In other words: “What does
the fish know about the water in which it swims around all its life?”) It must have been very mysterious to
the first discoverers of the $5^2 = 4^2 + 3^2$ rule for a right angle that these numbers should have such a strange
relation to each other, and that there are so few of these integer formulas. Struik [194], page 28, says the
following regarding Babylonian geometry around 1750 BC.
The texts show that the Babylonian geometry of the Semitic period was in possession of formulas
for the areas of simple rectilinear figures and for the volumes of simple solids, although the volume
of a truncated pyramid had not yet been found. The so-called theorem of Pythagoras was known,
not only for special cases, but in full generality, as a numerical relation between the sides of a right
triangle. This led to the discovery of “Pythagorean triples” such as (3, 4, 5), (5, 12, 13) etc.
The Pythagoras theorem was therefore one of the earliest non-trivial discoveries of the arithmètic nature of
space. It is still at the core of our mathematical representation of space-time.
The concept of the Riemannian metric is a generalization of the Pythagoras theorem. The Riemannian
metric effectively sets up a Pythagoras law at every point in space. Unlike the definition of parallelism over
vectors at a distance, the lengths of vectors at different points are assumed to have an absolute relation to
each other, independent of the path taken for the comparison. One might reasonably ask why vector length
is not also path-dependent. A physical mechanism for parallel transport can be imagined, like photons
carrying information about orientation, but it is not clear how physical space would transport vector length
information between points to keep the definitions synchronized throughout a universe.
However, just as Bertrand Russell had to reluctantly accept Euclid’s axioms when he was young (in 1883
as mentioned in Remark 2.1.9), one must (at least temporarily) accept Riemannian geometry’s global and
quadratic assumptions if one is to understand much of modern physics, either willingly or reluctantly. Of
course, it turned out 30 years later that Euclid’s assumptions were wrong and Russell was right to be sceptical!
(Some recent observations suggesting that the speed of light may not be constant could conceivably be related
to the question of globality of the pseudo-Riemannian metric of physical space-time. Perhaps some notion of
“metric transport” could be more realistic than an absolute global Riemannian metric. Then one would need
to determine how a metric “transport mechanism” could yield a Riemannian metric as an approximation at
short to medium distances.)
38.2. Overview of Riemannian geometry
38.2.1 Remark: A Riemannian metric makes possible the comparison of lengths and angles at different
points in a space, as illustrated in Figure 38.2.1.
The distance and angle comparisons are global, not path-dependent. In physics, it is unusual to have an
absolutely synchronized quantity throughout the universe for all time. But this is what the Riemannian
metric offers.

Figure 38.2.1 Layer 4: Riemannian metric defines distance and angles globally
38.2.2 Remark: The Riemannian metric tensor’s component matrix $g = [g_{ij}]_{i,j=1}^n$ and the two-point
metric function d, familiar from topological metric spaces, are related by the following formula.
$$g_{ij}(x) = \frac{1}{2}\,\frac{\partial^2 d(x,y)^2}{\partial y^i\,\partial y^j}\bigg|_{y=x}. \qquad (38.2.1)$$
This may be abbreviated to $g_{ij}(x) = \frac{1}{2}\partial_{ij} d^2(x,\cdot)$. That is, the Riemannian metric tensor’s component
matrix g at a point x equals half the matrix of second-order derivatives of the square of the two-point
distance function d. (To avoid technicalities, points and coordinates are used interchangeably here.)
A Riemannian manifold is a combined differentiable manifold and metric space such that the square of
the distance function is twice differentiable. Hence a Riemannian metric is a sub-class of the familiar class
of two-point metrics. If the distance function has the right kind of differentiability, half the Hessian of its
square is a Riemannian metric. (An interesting technicality here is the fact that the Hessian of a function
at a stationary point is tensorial even in the absence of a connection.) It follows that Riemannian manifolds
are a sub-class of the familiar topological metric spaces, assuming that a differentiable structure is specified
on the metric space.
Conversely, the two-point distance function is recovered by integrating the metric tensor.
$$d(x,y) = \min_{\gamma\in C^1_{x,y}} \int_\gamma \sqrt{g_{ij}\,dx^i\,dx^j}.$$
In other words, the distance from x to y equals the minimum integral of the metric tensor over all differentiable
curves from x to y. Contrary to popular belief, a Riemannian manifold may be defined in terms of either a
metric tensor g or a two-point distance function d.
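The distance-to-tensor relation (38.2.1) can be checked numerically in a simple case. The following is a minimal sketch which is not part of the original text. It assumes the Euclidean plane in polar coordinates (r, θ) as the test case, with the two-point distance given by the law of cosines. The names polar_distance and metric_from_distance are hypothetical, and the second derivatives are approximated by central differences with an arbitrarily chosen step h. Half the Hessian of the squared distance should then be close to the familiar polar-coordinate metric diag(1, r^2).

```python
import math

def polar_distance(x, y):
    """Euclidean distance between two points of the plane given as (r, theta)."""
    (r1, t1), (r2, t2) = x, y
    # Law of cosines; max() guards against tiny negative rounding errors.
    sq = r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(t1 - t2)
    return math.sqrt(max(0.0, sq))

def metric_from_distance(dist, x, h=1e-3):
    """Approximate g_ij(x) = (1/2) d^2/dy^i dy^j [dist(x, y)^2] at y = x."""
    n = len(x)

    def squared(y):
        return dist(x, y) ** 2

    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate the squared distance at x + si*h*e_i + sj*h*e_j.
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return squared(tuple(y))
            # Central difference for the mixed second derivative
            # (for i == j this is the second difference with step 2h).
            second = (shifted(1, 1) - shifted(1, -1)
                      - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
            g[i][j] = 0.5 * second
    return g
```

For instance, metric_from_distance(polar_distance, (2.0, 0.7)) should return a matrix close to [[1, 0], [0, 4]], up to discretization error.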
38.2.3 Remark: In layer 4, a Riemannian metric is a kind of differential of a distance function. In layer 3,
an affine connection is a kind of differential of the parallel transport function. In both layers, the global
point-to-point function is obtained by integrating the differential. Parallel transport is path-dependent.
The distance function is path-dependent too, but the distance of greatest interest is the minimal distance
obtained by extremizing the integral over a class of paths. In layer 2, a tangent vector is the differential of
displacement along a curve. So the principal structure in each layer is the differential of something which
varies along a curve.
layer   pathwise concept      differential concept
2       displacement          tangent vector
3       parallel transport    affine connection
4       distance              metric tensor

Fundamental physics is written mostly in terms of differential equations. So it is the differential specification
of structures in layers 2, 3 and 4 which is of greatest relevance to the formulation of physics theories.
38.2.4 Remark: Although orientations at different points are not comparable, the ability to unambigu-
ously compare angles and local distances is a very tight constraint on the manifold. Clearly a Riemannian
metric is a big step beyond what an affine connection offers. A conformal metric specifies angles globally
but not distances. So a conformal metric lies between layers 3 and 4.
38.2.5 Remark: A Riemannian metric induces a canonical affine connection. This is called the Levi-Civita
connection. (There is also a more general class of affine connections called “metric connections” which are
more weakly consistent with a Riemannian metric. This is mentioned in Section 38.5.) The Levi-Civita
connection is an orthogonal connection. This means that the connection preserves angles and length.
38.2.6 Remark: Since the Riemannian metric determines a canonical parallelism one may use this paral-
lelism to define geodesic curves. A geodesic curve, also called simply a “geodesic”, is a curve whose direction
is parallel at all points. In other words, if you transport the direction of the curve at any point to a second
point, then the direction at the second point is the same as what you transported. The geodesic curves
can be combined with the metric to determine distance. Thus the distance between two points A and B
in a Riemannian manifold is obtained by adding the distances along a geodesic curve from A to B. This is
illustrated in Figure 38.2.2.
Figure 38.2.2 Levi-Civita connection determines geodesic curves and long distances
This is how the local Riemannian distance function is extended to measure distances globally. (It turns out
that the extremal paths with respect to distance are the geodesics. So the “self-parallel” paths are the ones
which determine the point-to-point distance function.)
38.3. The Riemannian metric

[ Maybe could make this section work for L1 functions or some sort of Sobolev function classes for the metric
tensor. Alternatively try Lipschitz or rectifiable functions. ]
38.3.1 Remark: At each point p ∈ M of a $C^1$ manifold M, a covariant tensor field $g \in T^{0,2}(M)$ evaluated
at p is a bilinear form $g_p : T_p(M) \times T_p(M) \to \mathbb{R}$. It is said to be “positive definite” when $g_p$ is positive
definite for all p ∈ M.
38.3.2 Remark: The tensor field in Definition 38.3.3 is not necessarily continuous. However, continuity
guarantees that the tensor field is equal to the derivative of the integral of itself, which is highly desirable.
If a Riemannian metric were defined always as the derivative of a distance function, this would ensure an
adequate level of continuity.
38.3.3 Definition: A Riemannian metric (tensor field) on a $C^1$ manifold M is any positive-definite
symmetric covariant tensor field of degree 2 on M.
38.3.4 Definition: A Riemannian metric (tensor field) of class $C^r$ on a $C^{r+1}$ manifold M is a Riemannian
metric g on M such that g is of class $C^r$ on M.
38.3.5 Remark: The fact that the word “covariant” appears in Definition 38.3.3 does not imply that a
metric tensor requires the prior definition of a connection. It has nothing to do with covariant derivatives.
The $C^1$ condition on M could possibly be weakened to a Hölder $C^{0,1}$ condition with the metric tensor being
defined almost everywhere. This would still permit distance to be calculated.

38.3.6 Definition: A $C^r$ Riemannian manifold (or Riemannian space) for $r \in \mathbb{Z}^+$ is a pair (M, g) such
that M is a $C^r$ differentiable manifold and g is a $C^{r-1}$ Riemannian metric on M. The tensor g is called the
metric tensor or fundamental tensor of M.
38.3.7 Definition: The length $\|L\|$ of any vector $L \in T_p(M)$ for any point p in a Riemannian manifold
M is defined by
$$\|L\| = \sqrt{g_p(L, L)}.$$
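As a worked instance of Definition 38.3.7, which is not in the original text, assume the Euclidean plane with polar coordinates (r, θ), where the metric components are g_rr = 1, g_θθ = r^2 and g_rθ = 0. The coefficients a and b below are illustrative names for the components of L with respect to the coordinate basis.

```latex
L = a\,\partial_r + b\,\partial_\theta \in T_p(M), \qquad p = (r, \theta),
\qquad
\|L\| = \sqrt{g_p(L, L)}
      = \sqrt{g_{rr}\,a^2 + 2 g_{r\theta}\,ab + g_{\theta\theta}\,b^2}
      = \sqrt{a^2 + r^2 b^2}.
```

So a unit coordinate step in the θ direction has length r, which is why the same coordinate vector components can represent different lengths at different points.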
38.4. The point-to-point distance function

[ This section is very woolly right now. ]
[ Maybe could make this section work for L1 functions or some sort of Sobolev function classes for the metric
tensor. Alternatively try Lipschitz or rectifiable functions. ]
[ Maybe weak necessary conditions and sufficient conditions for a two-point distance function to determine a
Riemannian metric can be discovered by using normal coordinates. ]
38.4.1 Remark: The objective of this section is to determine conditions, preferably necessary and sufficient,
to place on a distance function d for a manifold so that the manifold has a Riemannian metric g with
the same point-to-point distance function. It is apparently sufficient that d be $C^2$ on a suitably differentiable
manifold and satisfy some constraints on the second derivatives. It seems to be necessary that d should be
at least $C^{0,1}$.
Function classes should be determined for the distance function and metric tensor so that the distance-to-
tensor and tensor-to-distance conversions are inverses of each other. (This is similar to the parallelism-to-
connection and connection-to-parallelism conversions mentioned in Remark 35.2.8.)
38.4.2 Remark: The task here is to determine the precise relation between the Riemannian metric and the
point-to-point distance function of topological space theory. The latter is defined as a function
$d : M \times M \to \mathbb{R}_0^+$ on a set M which satisfies the three conditions of identity d(x, x) = 0, symmetry d(x, y) = d(y, x) and
the triangle inequality d(x, y) + d(y, z) ≥ d(x, z). This may be referred to as a “two-point distance function”.
[ Check and justify conditions (38.4.1) and (38.4.2). ]
When the set M is a manifold, it is possible to regard the distance function as a function
$\bar d : \mathrm{Range}(\psi) \times \mathrm{Range}(\psi) \to \mathbb{R}_0^+$ defined by $\bar d : (x, y) \mapsto d(\psi^{-1}(x), \psi^{-1}(y))$, where ψ is a chart for M. Then for the distance
function d to correspond with a Riemannian metric $g = (g_{ij})_{i,j=1}^n$ in terms of local coordinates, one would
expect equation (38.4.1) to be satisfied.
$$\forall x, y \in \mathrm{Range}(\psi),\quad \bar d(x, y) = \sqrt{g_{ij}(x)(y^i - x^i)(y^j - x^j)} + o(|y - x|) \ \hbox{ as } y \to x. \qquad (38.4.1)$$
This implies, and (probably) is implied by equation (38.4.2).
$$\forall x, y \in \mathrm{Range}(\psi),\quad \bar d(x, y)^2 = g_{ij}(x)(y^i - x^i)(y^j - x^j) + o(|y - x|^2) \ \hbox{ as } y \to x. \qquad (38.4.2)$$
Here |y − x| denotes the standard Euclidean norm in $\mathbb{R}^n$. It is probably true that the manifold M is a
Riemannian manifold whose metric tensor is g if and only if equation (38.4.2) is satisfied for all x in every
chart and g is continuous and positive definite.
It may follow from (38.4.2) that the second derivatives of d(x, y)2 with respect to y exist for all x by using
the continuity of g. With a bit of luck, these derivatives might be continuous with respect to x too.
The above equations may be equivalent to the following.
$$\partial_v^+\bigl(\bar d(x, \cdot)\bigr) = \sqrt{g_{ij}(x)\,v^i v^j},$$
which means
$$\lim_{t\to 0^+} t^{-1}\,\bar d(x, x + tv) = \sqrt{g_{ij}(x)\,v^i v^j},$$
for all $v \in \mathbb{R}^n$. It may be that if all conditions are taken together, such as continuity of d and the triangle
inequality, then the second derivatives of $\bar d^2$ may exist and be continuous.
In the other direction, the distance function can be generated from the Riemannian metric by minimization
over curves as follows.
$$d(x, y) = \min_{\gamma\in C^1_{x,y}} \int_\gamma \sqrt{g_{ij}\,dx^i\,dx^j},$$
where the minimization is over suitable $C^1$ curves from x to y in the usual way. If g is continuous, then for
y in a small enough neighbourhood of x, the minimising curve γ should be unique and $C^1$. By aligning the
minimising curves with the radial directions out of x (normal coordinates), equations (38.4.1) and (38.4.2)
should be recovered.
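The tensor-to-distance direction can be illustrated by evaluating the length integral numerically for particular curves. The following is a minimal sketch which is not part of the original text. It assumes the Euclidean plane in polar coordinates with metric diag(1, r^2), and it only compares two hand-picked curves between the same endpoints rather than performing the minimization. The names polar_metric, curve_length, arc and chord are hypothetical.

```python
import math

def polar_metric(x):
    """Component matrix g_ij of the Euclidean plane at x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def curve_length(metric, curve, steps=2000):
    """Approximate the integral of sqrt(g_ij dx^i dx^j) along curve: [0, 1] -> coordinates."""
    total = 0.0
    for k in range(steps):
        a = curve(k / steps)
        b = curve((k + 1) / steps)
        mid = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]   # metric evaluated at the midpoint
        dx = [bi - ai for ai, bi in zip(a, b)]            # coordinate increment
        g = metric(mid)
        n = len(dx)
        total += math.sqrt(sum(g[i][j] * dx[i] * dx[j]
                               for i in range(n) for j in range(n)))
    return total

def arc(t):
    """Quarter of the unit circle from (r, theta) = (1, 0) to (1, pi/2)."""
    return (1.0, t * math.pi / 2.0)

def chord(t):
    """Straight Euclidean segment between the same endpoints, in polar coordinates."""
    px, py = 1.0 - t, t
    return (math.hypot(px, py), math.atan2(py, px))
```

Under these assumptions, curve_length(polar_metric, arc) should be close to π/2 and curve_length(polar_metric, chord) should be close to the square root of 2. The chord is shorter, as expected for the curve which is a geodesic of the flat metric.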
38.4.3 Remark: One problem with this is that if the Riemannian manifold is not connected, then there
will be no curves at all between some pairs of points. Even if the manifold is connected, there may be no
minimum-length geodesic. An example of this is a Euclidean space with a closed subset removed from it.
[ Should have a preliminary section on calculus of variations so as to be able to analyse the distance function
d obtained from a Riemannian metric g. ]
38.4.4 Example: A simple example shows that a Riemannian distance function is a very special kind of
distance function. Consider the set $\mathbb{R}^n$ for n ≥ 2 with the distance function $d : (x, y) \mapsto |y - x|_p$, where
the p-norm is defined as usual by $|x|_p = \bigl(\sum_{i=1}^n |x^i|^p\bigr)^{1/p}$ for 1 ≤ p < ∞, and $|x|_\infty = \max_{i=1}^n |x^i|$ in the
p = ∞ case. Clearly d corresponds to a Riemannian metric if and only if p = 2. (See Definition 10.8.1 for
the p-norm.)
Consider the value of d(0, y) for n = 2. The value is $\bigl(|y^1|^p + |y^2|^p\bigr)^{1/p}$ for 1 ≤ p < ∞. The square of a
Riemannian distance function d(0, y) must be asymptotic to a quadratic form in y as y → 0. This can only
happen for p = 2.
A change of coordinates can remove the problem at a single point, but not at all points in $\mathbb{R}^n$. This kind
of example makes it clear that distance functions can only be Riemannian if they are in some sense locally
affine distortions of Euclidean space with the 2-norm.
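One way to see the special role of p = 2 numerically is the parallelogram law, which holds for all pairs of vectors exactly when a norm arises from an inner product, that is, when its square is a quadratic form. The following minimal sketch is not part of the original text, and the names p_norm and parallelogram_defect are hypothetical.

```python
def p_norm(x, p):
    """The p-norm of a coordinate tuple x, with p = float('inf') for the max-norm."""
    if p == float("inf"):
        return max(abs(c) for c in x)
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

def parallelogram_defect(u, v, p):
    """|u+v|^2 + |u-v|^2 - 2|u|^2 - 2|v|^2, which vanishes for an inner-product norm."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    return (p_norm(plus, p) ** 2 + p_norm(minus, p) ** 2
            - 2.0 * p_norm(u, p) ** 2 - 2.0 * p_norm(v, p) ** 2)
```

With u = (1, 0) and v = (0, 1), the defect is 4 for p = 1, zero (up to rounding) for p = 2, and −2 for p = ∞. So among these three cases only the 2-norm has a quadratic square.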
38.4.5 Remark: Theorem 38.4.6, which may not be perfectly correct in detail, is an attempt to determine
the relation between two-point distance functions and Riemannian metrics. It seems that the Riemannian
metric tensor is simply half the Hessian of the square of the distance function. When this Hessian exists, it
is a well-defined tensor in $T^{0,2}(M)$. Theorem 38.4.6 requires twice differentiability of the distance function.
This is probably much stronger than is required. It’s quite possible that the manifold only needs to be $C^{0,1}$.
[ Although Theorem 38.4.6 may be almost right if a topological metric space is assumed, in the case of a given
Riemannian space it is necessary to consider various aspects of pathwise connectivity. ]
[ Maybe should split Theorem 38.4.6 into two theorems. (1) If d is derived from g, then g can be recovered
from d. (2) If g is derived from d, then d can be recovered from g. Weak conditions should be given for each
of these theorems. For example, it may be that g and the Hessian of $d^2$ only need to be defined almost
everywhere in the sense of Lebesgue measure or something like that. ]
38.4.6 Theorem: Suppose M is a metric space with distance function $d : M \times M \to \mathbb{R}_0^+$, and that M has
a $C^2$ atlas $A_M$. Then $(M, A_M)$ is a Riemannian manifold with distance function d if and only if the matrix
of second derivatives
$$\biggl[\frac{\partial^2 d(p, q)^2}{\partial\psi(q)^i\,\partial\psi(q)^j}\bigg|_{q=p}\biggr]_{i,j=1}^n \qquad (38.4.3)$$
exists and is invertible for all p ∈ M and $\psi \in \mathrm{atlas}_p(M)$.
38.4.7 Theorem: If condition (38.4.3) in Theorem 38.4.6 holds, then for all p ∈ M, the metric tensor
$[g_{ij}(p)]_{i,j=1}^n$ of M for each chart $\psi \in \mathrm{atlas}_p(M)$ is given by
$$\forall i, j = 1 \ldots n,\quad g_{ij}(p) = \frac{1}{2}\,\frac{\partial^2 d(p, q)^2}{\partial\psi(q)^i\,\partial\psi(q)^j}\bigg|_{q=p}.$$
[ Since the first derivative of $d^2$ with respect to q is zero at q = p, it follows that the tensor $g = g_{ij} e^i e^j$ is the
same tensor for all charts. See Section 31.6. ]

38.4.8 Remark: Note that $g_{ij}$ has not been defined to be
$\frac{1}{2}\bigl((\partial/\partial q^i)d(p, q)\,(\partial/\partial q^j)d(p, q)\bigr)\big|_{q=p}$ because
d(p, ·) is generally not differentiable at p.
[ Remark 38.4.8 should be clarified considerably. ]

[ In this section, must show that the definition of a Riemannian manifold somewhere in the concavity book is
equivalent to the one given in this section. ]
[ In the metric space on $\mathbb{R}^n$ using the p-norm, it looks like the geodesics would be the same as with the
2-norm. Does this imply that the Riemannian connection is the same? ]
38.5. The Levi-Civita connection

38.5.1 Remark: This is the section where the affine connection on a manifold is generated uniquely from
the Riemannian metric. The connection, of course, does not uniquely determine the metric tensor. This is
why the Riemannian metric is in a higher layer than the affine connection.
[ If you construct the Levi-Civita connection from a Riemannian metric, there is an infinite set of Riemannian
metrics which have the same Levi-Civita connection. For example, Riemannian metrics which differ by
a constant multiplier have the same Levi-Civita connection. Determine the full generality of Riemannian
metrics which can share the same Levi-Civita connection. This will then answer the question of whether it
is possible to reconstruct the Riemannian metric from its Levi-Civita connection. For example, it would be
interesting to know if the Riemannian metric is uniquely defined globally if you specify the metric tensor at
one point and the affine connection globally. ]
[ Given a distance function $d : M \times M \to \mathbb{R}_0^+$, is it possible to determine unique shortest paths for all point
pairs and therefore generate an affine connection out of this? The differentiability conditions on a distance
function to generate a Riemannian metric are quite strong, but maybe much weaker conditions could yield
an affine connection. ]
[ Define a “length-parametrized geodesic”. See Gallot et alia, p. 116, 3.34. This is called a “normal geodesic”
in Greene and Wu, page 6. ]
[ For the following 4 definitions of connections, see the EDM2 [35] 80. ]
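38.5.2 Definition: The Levi-Civita connection for a $C^2$ manifold M with a $C^1$ Riemannian metric tensor
field g is the affine connection on M which has Christoffel symbol given by
$$\Gamma^k_{ij} = \frac{1}{2}\,g^{kl}\biggl(\frac{\partial g_{li}}{\partial x^j} + \frac{\partial g_{lj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l}\biggr).$$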
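The Christoffel symbol formula in Definition 38.5.2 can be evaluated numerically for a given metric. The following is a minimal sketch which is not part of the original text. It assumes a 2-dimensional metric, tested on the Euclidean plane in polar coordinates with g = diag(1, r^2), and it approximates the derivatives of g by central differences with an arbitrarily chosen step h. The names polar_metric, inverse_2x2 and christoffel are hypothetical.

```python
def polar_metric(x):
    """Component matrix g_ij of the Euclidean plane at x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix, giving the contravariant components g^{kl}."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def christoffel(metric, x, h=1e-5):
    """Return G with G[k][i][j] = (1/2) g^{kl} (d_j g_li + d_i g_lj - d_l g_ij)."""
    n = 2

    def dg(a):
        # Central-difference derivative of the matrix g with respect to x^a.
        xp, xm = list(x), list(x)
        xp[a] += h
        xm[a] -= h
        gp, gm = metric(xp), metric(xm)
        return [[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(n)]
                for i in range(n)]

    d = [dg(a) for a in range(n)]          # d[a][i][j] = d g_ij / d x^a
    ginv = inverse_2x2(metric(x))
    return [[[0.5 * sum(ginv[k][l] * (d[j][l][i] + d[i][l][j] - d[l][i][j])
                        for l in range(n))
              for j in range(n)]
             for i in range(n)]
            for k in range(n)]
```

With indices 0 and 1 standing for r and θ, christoffel(polar_metric, (2.0, 0.3)) should give G[0][1][1] close to −2 (that is, −r) and G[1][0][1] = G[1][1][0] close to 0.5 (that is, 1/r), with the remaining components close to zero.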
[ The Levi-Civita connection is the unique torsion-free connection which makes geodesics length-minimizing?
See Gallot/Hulin/Lafontaine [20], page 70, sections 2.51 and 2.53. ]
38.5.3 Remark: Although the Christoffel symbol is not a tensor, it is required to transform in a specified
way under changes of chart ψ. According to Theorem 29.2.1, the differential of a parallelism must satisfy
equation (29.2.1) in order for the operator $L(\psi) = \partial_{ij} - \Gamma^k_{ij}(\psi)\partial_k$ to be a second-degree covariant tensor
operator when applied to functions $f \in C^2(M)$.
2
38.5.4 Theorem: The Christoffel symbol $\Gamma^k_{ij}$ in Definition 38.5.2 satisfies condition (29.2.1) for the
second-order operator $\partial_{ij} - \Gamma^k_{ij}(\psi)\partial_k$ to be a second-degree covariant tensor in Theorem 29.2.1.
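Proof: Let $\psi, \tilde\psi \in \mathrm{atlas}_p(M)$ be charts at p ∈ M for a $C^2$ Riemannian manifold M. Denote the respective
Christoffel symbols by $\Gamma$ and $\tilde\Gamma$. Then
$$\begin{aligned}
\Gamma^k_{ij} &= \frac{1}{2}\,g^{k\ell}\biggl(\frac{\partial g_{\ell i}}{\partial x^j} + \frac{\partial g_{\ell j}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^\ell}\biggr), \\
\tilde\Gamma^k_{ij} &= \frac{1}{2}\,\tilde g^{k\ell}\biggl(\frac{\partial \tilde g_{\ell i}}{\partial \tilde x^j} + \frac{\partial \tilde g_{\ell j}}{\partial \tilde x^i} - \frac{\partial \tilde g_{ij}}{\partial \tilde x^\ell}\biggr),
\end{aligned}$$
where $\tilde g^{k\ell} = \tilde\phi^k_{,i}\tilde\phi^\ell_{,j} g^{ij}$, $\tilde g_{ij} = \phi^k_{,i}\phi^\ell_{,j} g_{k\ell}$, $\phi = \psi \circ \tilde\psi^{-1}$ and $\tilde\phi = \tilde\psi \circ \psi^{-1} = \phi^{-1}$. Then
$$\begin{aligned}
\frac{\partial \tilde g_{\ell i}}{\partial \tilde x^j}
&= \frac{\partial}{\partial \tilde x^j}\bigl(\phi^m_{,i}\phi^k_{,\ell}\,g_{mk}\bigr) \\
&= \phi^m_{,ij}\phi^k_{,\ell}\,g_{mk} + \phi^k_{,\ell j}\phi^m_{,i}\,g_{mk} + \phi^m_{,i}\phi^k_{,\ell}\phi^r_{,j}\frac{\partial g_{mk}}{\partial x^r}.
\end{aligned}$$
Therefore
$$\begin{aligned}
\tilde\Gamma^k_{ij}
&= \frac{1}{2}\,\tilde\phi^k_{,t}\tilde\phi^\ell_{,u}\,g^{tu}\Bigl(
   \phi^m_{,ij}\phi^s_{,\ell}\,g_{ms} + \boxed{\phi^s_{,\ell j}\phi^m_{,i}\,g_{ms}} + \phi^m_{,i}\phi^s_{,\ell}\phi^r_{,j}\frac{\partial g_{ms}}{\partial x^r} \\
&\qquad\qquad + \phi^m_{,ji}\phi^s_{,\ell}\,g_{ms} + \boxed{\phi^s_{,\ell i}\phi^m_{,j}\,g_{ms}} + \phi^m_{,j}\phi^s_{,\ell}\phi^r_{,i}\frac{\partial g_{ms}}{\partial x^r} \\
&\qquad\qquad - \boxed{\phi^m_{,i\ell}\phi^s_{,j}\,g_{ms}} - \boxed{\phi^s_{,j\ell}\phi^m_{,i}\,g_{ms}} - \phi^m_{,i}\phi^s_{,j}\phi^r_{,\ell}\frac{\partial g_{ms}}{\partial x^r}\Bigr) \\
&= \frac{1}{2}\,\tilde\phi^k_{,t}\tilde\phi^\ell_{,u}\,g^{tu}\Bigl(
   2\phi^m_{,ij}\phi^s_{,\ell}\,g_{ms} + \phi^m_{,i}\phi^s_{,\ell}\phi^r_{,j}\frac{\partial g_{ms}}{\partial x^r}
   + \phi^m_{,j}\phi^s_{,\ell}\phi^r_{,i}\frac{\partial g_{ms}}{\partial x^r} - \phi^m_{,i}\phi^s_{,j}\phi^r_{,\ell}\frac{\partial g_{ms}}{\partial x^r}\Bigr) \\
&= \phi^m_{,ij}\tilde\phi^k_{,m} + \frac{1}{2}\,\tilde\phi^k_{,t}\tilde\phi^\ell_{,u}\,g^{tu}\Bigl(
   \phi^m_{,i}\phi^s_{,\ell}\phi^r_{,j}\frac{\partial g_{ms}}{\partial x^r}
   + \phi^m_{,j}\phi^s_{,\ell}\phi^r_{,i}\frac{\partial g_{ms}}{\partial x^r} - \phi^m_{,i}\phi^s_{,j}\phi^r_{,\ell}\frac{\partial g_{ms}}{\partial x^r}\Bigr) \\
&= \phi^m_{,ij}\tilde\phi^k_{,m} + \frac{1}{2}\,\tilde\phi^k_{,t}\phi^m_{,i}\phi^r_{,j}\,g^{ts}\biggl(
   \frac{\partial g_{ms}}{\partial x^r} + \frac{\partial g_{rs}}{\partial x^m} - \frac{\partial g_{mr}}{\partial x^s}\biggr) \\
&= \tilde\phi^k_{,t}\phi^m_{,i}\phi^r_{,j}\,\Gamma^t_{mr} + \phi^m_{,ij}\tilde\phi^k_{,m}. \qquad (38.5.1)
\end{aligned}$$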
(The boxed terms cancel to zero.) This matches equation (29.2.1).
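As a sanity check on the transformation rule (38.5.1), here is a worked instance which is not in the original text. It assumes that ψ is the Cartesian chart on the plane, where the metric components are constant and hence Γ = 0, and that ψ̃ is the polar chart with coordinates (r, θ), so that φ(r, θ) = (r cos θ, r sin θ). Only the second-derivative term of (38.5.1) then survives.

```latex
\tilde\Gamma^k_{ij} = \phi^m_{,ij}\,\tilde\phi^k_{,m},
\qquad
\tilde\phi^r_{,1} = \cos\theta,\quad
\tilde\phi^r_{,2} = \sin\theta,\quad
\tilde\phi^\theta_{,1} = -r^{-1}\sin\theta,\quad
\tilde\phi^\theta_{,2} = r^{-1}\cos\theta,
\\[1ex]
\phi^1_{,\theta\theta} = -r\cos\theta,\quad
\phi^2_{,\theta\theta} = -r\sin\theta,\quad
\phi^1_{,r\theta} = -\sin\theta,\quad
\phi^2_{,r\theta} = \cos\theta,\quad
\phi^m_{,rr} = 0,
\\[1ex]
\tilde\Gamma^r_{\theta\theta}
  = (-r\cos\theta)(\cos\theta) + (-r\sin\theta)(\sin\theta) = -r,
\qquad
\tilde\Gamma^\theta_{r\theta}
  = (-\sin\theta)(-r^{-1}\sin\theta) + (\cos\theta)(r^{-1}\cos\theta) = r^{-1},
\\[1ex]
\tilde\Gamma^r_{r\theta}
  = (-\sin\theta)(\cos\theta) + (\cos\theta)(\sin\theta) = 0,
\qquad
\tilde\Gamma^\theta_{\theta\theta}
  = (-r\cos\theta)(-r^{-1}\sin\theta) + (-r\sin\theta)(r^{-1}\cos\theta) = 0.
```

These values agree with what Definition 38.5.2 gives when applied directly to the polar-coordinate metric diag(1, r^2), which is what the theorem requires.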
38.5.5 Remark: It is the term $\phi^m_{,ij}\tilde\phi^k_{,m}$ in equation (38.5.1) which makes the Christoffel symbol a
non-tensorial object. That is, the symbol does not correspond to the coefficients of a tensor of any type. The
Christoffel symbol is in fact a family of “tensorization coefficients”. (See Definition 29.2.3 for general ten-
sorization coefficients.) The non-tensorial Christoffel symbol may be combined with non-tensorial higher-
order derivatives to produce tensorial objects. The second-order derivative term in (38.5.1) is what allows
the Christoffel symbol to convert non-tensorial partial derivatives into tensorial covariant derivatives.
Equation (29.2.1) does not uniquely determine the Levi-Civita connection. In fact, equation (29.2.1) places
a fairly weak constraint on the tensorization coefficients which specify a connection.
38.5.6 Definition: A metric connection on a Riemannian manifold M is a connection for which ∇g = 0,
which means that parallel transport maps the tangent bundle between points in M so that inner products
are preserved. In particular, orthonormal frames are mapped to orthonormal frames.
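The condition ∇g = 0 can be written out in components. The formula below is not stated in the text above. It assumes the usual coordinate expression for the covariant derivative of a covariant tensor of degree 2, and the check uses the polar-coordinate metric diag(1, r^2) on the plane with its Levi-Civita Christoffel symbols as an illustrative case.

```latex
(\nabla g)_{kij}
  = \frac{\partial g_{ij}}{\partial x^k}
    - \Gamma^m_{ki}\,g_{mj} - \Gamma^m_{kj}\,g_{im} = 0,
\\[1ex]
\text{polar case: } g_{rr} = 1,\ g_{\theta\theta} = r^2,\
\Gamma^r_{\theta\theta} = -r,\
\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = r^{-1},
\\[1ex]
(\nabla g)_{r\theta\theta}
  = \partial_r(r^2) - 2\,\Gamma^\theta_{r\theta}\,g_{\theta\theta}
  = 2r - 2\,r^{-1} r^2 = 0,
\qquad
(\nabla g)_{\theta r\theta}
  = 0 - \Gamma^\theta_{\theta r}\,g_{\theta\theta} - \Gamma^r_{\theta\theta}\,g_{rr}
  = -r + r = 0.
```

The remaining components vanish in the same way, so in this case the Levi-Civita connection is a metric connection, as expected.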
38.5.7 Remark: A Riemannian connection is a metric connection such that the torsion is everywhere zero.
38.5.8 Definition: [Definition of coefficients of the Riemannian connection.]
[ Give the components of the torsion tensor, curvature tensor and covariant differential. Also give the coordi-
nate equation for a geodesic. ]
38.5.9 Definition: [Definition of tangent n-frame orthogonal bundle.]
38.5.10 Definition: [Definition of normal coordinates.]
[ Show that all the ways of expressing the connection in a Riemannian space are equivalent. For example,
show that Schild’s ladder (using geodesics) reconstructs the same parallelism from which the geodesics are
constructed. ]

38.6. Curvature tensors

38.6.1 Remark: This section includes Riemann curvature, Ricci curvature, sectional curvature and Gauß
curvature. Actually, the Riemann curvature tensor is uniquely defined in terms of the affine connection. The
Gauß curvature is just the two-dimensional version of the sectional curvature.
38.6.2 Definition: [Definition of curvature form.]
38.6.3 Definition: [Definition of curvature tensor.]
[ Some questions to be answered near here. Is the Riemann curvature for a Riemannian manifold in some
sense an element of the Lie algebra of SO(n)? Is the Riemann curvature the exterior derivative of the
connection form in some sense? The exterior derivative of a differential form is some sort of commutator of
Lie derivatives? ]
38.6.4 Definition: [Definition of sectional curvature.]
[ Around here, define spaces of constant sectional curvature, and classify such spaces. Probably the spaces of
positive constant sectional curvature are isometric to a sphere. Define the negative equivalent too. ]
38.6.5 Definition: [Definition of Ricci tensor.]
38.6.6 Definition: [Definition of Ricci curvature.]
38.6.7 Definition: [Definition of scalar curvature.]
[ Present the second variation formula for the length of a family of geodesics. See Greene/Wu [68], page 6. ]
38.6.8 Definition: An Einstein space is a Riemannian manifold in which the Ricci tensor is a scalar
multiple of the metric tensor.
[ According to EDM2 [35], 364.D, the scalar multiple for an Einstein space must be constant if dim(M ) ≥ 3. ]
38.7. Differential operators
[ Especially do the Laplacian and the modulus of the gradient. See EDM2 [35] 194.B. Probably could use the
conservation of mass to determine the correct form of the Laplacian, in particular in the context of the heat
equation. Must show that a conservation law applies to the heat equation. The Laplacian is nicely defined
by Frankel [19], pages 305 and 93. ]
[ Hodge theory operators ∗ and δ. See Greene/Wu [68], pages 7 and 8. Also present harmonic, subharmonic
and superharmonic functions, page 8. Also see Warner [50], chapter 6, for Hodge theory, including the ∗ and
Laplace-Beltrami operators. ]
[ Here define elliptic, hyperbolic and parabolic operators. Show that the Laplacian is elliptic. ]
38.7.1 Remark: The Laplacian operator on an n-dimensional $C^2$ manifold M calculates the sum of second
derivatives of a real-valued function in n orthogonal directions at a point p ∈ M. The individual
second derivatives are effectively calculated along geodesic curves passing through p. The term $-\Gamma^k_{ij}\partial_k$ in
Definition 38.7.2 ensures that the second derivatives “follow the geodesics”. The factor $g^{ij}$ takes care of
orthogonality and scaling of the second derivatives. Clearly the Laplacian operator requires both the affine
connection and the Riemannian metric for its definition, whereas general elliptic operators require only the
affine connection.
38.7.2 Definition: The Laplacian (operator) on a $C^2$ manifold with Riemannian metric g is the operator
$\Delta : C^2(M, \mathbb{R}) \to C^0(M, \mathbb{R})$ defined by
$$ \Delta = g^{ij} \bigl( \partial_{ij} - \Gamma_{ij}^k \partial_k \bigr), $$
where Γ is the Levi-Civita connection for g.
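As a sanity check of Definition 38.7.2, the following sketch evaluates the formula in polar coordinates on the punctured Euclidean plane. The chart $(r, \theta)$ and the test function $f$ are assumptions made only for this illustration. The Christoffel symbol values quoted are the standard ones for the metric ${\rm diag}(1, r^2)$.

```latex
% Polar coordinates (r, \theta) on the punctured plane; illustration only.
% Metric: g_{rr} = 1, g_{\theta\theta} = r^2, so g^{rr} = 1, g^{\theta\theta} = r^{-2}.
% Non-zero Christoffel symbols: \Gamma_{\theta\theta}^r = -r,
% \Gamma_{r\theta}^\theta = \Gamma_{\theta r}^\theta = r^{-1}.
\begin{align*}
  \Delta f
    &= g^{rr} \bigl( \partial_{rr} f - \Gamma_{rr}^k \partial_k f \bigr)
     + g^{\theta\theta} \bigl( \partial_{\theta\theta} f
       - \Gamma_{\theta\theta}^k \partial_k f \bigr) \\
    &= \partial_{rr} f + r^{-2} \bigl( \partial_{\theta\theta} f + r \, \partial_r f \bigr) \\
    &= \partial_{rr} f + r^{-1} \partial_r f + r^{-2} \partial_{\theta\theta} f .
\end{align*}
```

The first-order term $r^{-1} \partial_r f$ comes entirely from the connection term, which is the sense in which the second derivatives “follow the geodesics”.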

38.7.3 Remark: The Laplace-Beltrami operator generalizes the flat-space Laplace operator not only to
Riemannian manifolds but also to differential forms of general degree.
[ Define Ricci flow. I used to do research, which I did not publish except in seminars, on flow by mean
curvature for embedded manifolds. I should dig up my old research on this and define the relevant concepts
here. ]
38.8. Inner product

[ Should cover orthogonality here, and also the dot product of two vectors, like ∇u · γ̇(s). Also cover the
lengths of vectors here, like the length of the gradient vector |∇u| and |γ̇(s)|. ]
[ Definition of trace. E.g. $\Delta f = \mathop{\rm trace} D^2 f$. See Greene/Wu [68], page 7. ]
[ There should be a section on Finsler metrics somewhere near here. ]
38.9. Embedded Riemannian manifolds

38.9.1 Definition (→ 38.3.6): [Alternative definition of Riemannian manifold.]
[ Here do a definition of the inherited metric tensor for embedded manifolds in a Riemannian manifold. See
notes G, page 3. Also define normal vectors to tangent spaces of submanifolds. See EDM2 [35] 364.A. ]
38.10. Information geometry

[ The particular benefit of the differential geometry perspective is apparently the fact that the point-to-point
distance function is an invariant under changes of coordinates. I have been told that the affine connections
which are used for information geometry are not the same as the Levi-Civita connection. ]
38.10.1 Remark: A Riemannian metric arises out of statistics as the Fisher information matrix. (See
Amari/Nagaoka [54]. See also EDM2 [35], 399.D, page 1489.) The Fisher information matrix is defined by:
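$$ \forall \theta \in M, \quad g_{ij}(\theta) = \int_S \frac{\partial \log f(x, \theta)}{\partial \theta^i} \, \frac{\partial \log f(x, \theta)}{\partial \theta^j} \, f(x, \theta) \, dx, \tag{38.10.1} $$
where $f : S \times M \to \mathbb{R}$ is a family of probability densities on the set S with parameters in the manifold M.
Thus $\int_S f(x, \theta) \, dx = 1$ for all θ ∈ M.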
[ In equation (38.10.1), the coordinates θi and points θ of the manifold M are mixed up in the colloquial
fashion. This must be fixed. ]
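To illustrate equation (38.10.1), here is a sketch for the family of normal densities. The choices $S = \mathbb{R}$, $M = \mathbb{R} \times (0, \infty)$ and the parameter names $\theta = (\mu, \sigma)$ are assumptions introduced for this example only. The integrals use the standard central moments $\int (x-\mu)^2 f\,dx = \sigma^2$, $\int (x-\mu)^3 f\,dx = 0$ and $\int (x-\mu)^4 f\,dx = 3\sigma^4$.

```latex
% Hypothetical example: S = \mathbb{R}, M = \mathbb{R} \times (0, \infty), \theta = (\mu, \sigma).
\begin{align*}
  f(x, \theta) &= \frac{1}{\sigma \sqrt{2\pi}}
      \exp\Bigl( -\frac{(x - \mu)^2}{2 \sigma^2} \Bigr), \\
  \frac{\partial \log f}{\partial \mu} &= \frac{x - \mu}{\sigma^2}, \qquad
  \frac{\partial \log f}{\partial \sigma} = -\frac{1}{\sigma} + \frac{(x - \mu)^2}{\sigma^3}, \\
  g_{\mu\mu} &= \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}, \qquad
  g_{\mu\sigma} = 0, \qquad
  g_{\sigma\sigma} = \frac{1}{\sigma^2} - \frac{2 \sigma^2}{\sigma^4}
      + \frac{3 \sigma^4}{\sigma^6} = \frac{2}{\sigma^2} .
\end{align*}
```

In this example the component matrix is positive definite at every point of M, as required of a Riemannian metric.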


[735]
Chapter 39
Pseudo-Riemannian manifolds
39.1 Overview of pseudo-Riemannian geometry . . . . . . . . . . . . . . . . . . . . . . . . . . 735

39.2 The pseudo-Riemannian metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
39.3 General relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
39.4 Singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
39.5 Global solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738
[ This is currently just a place-holder for a chapter which I will write some time soon. ]
Pseudo-Riemannian manifolds are a generalization of Riemannian manifolds. Therefore it makes sense to
deal with the special case of Riemannian manifolds first (Chapter 38).
[ This chapter should particularly deal with the case of (1, 3) manifolds. ]
[ Another particular topic of interest is the way in which (1, n) manifolds (i.e. manifolds with a Minkowski
metric) can be characterized locally by inequalities for d(x, y) analogous to the triangle inequality for Rie-
mannian manifolds. There are notes on this somewhere. This sort of result should be generalized to general
(m, n) manifolds. Then hopefully pseudo-Riemannian manifolds can also be given a one-sentence definition
as for Riemannian manifolds. ]
39.1. Overview of pseudo-Riemannian geometry

39.1.1 Remark: As in the case of the Riemannian manifold, a pseudo-Riemannian metric tensor field may
be defined either explicitly or as half the Hessian (the matrix of second derivatives) of the square of a
two-point distance function as in equation (38.2.1). In the pseudo-Riemannian case, the distance is hyperbolic
rather than elliptic. Thus the symmetric metric tensor component matrix has eigenvalues of mixed sign,
whereas the eigenvalues are all positive in the case of a Riemannian manifold.
39.1.2 Remark: Special relativity is formulated in terms of Minkowski space-time, which is a hyperbolic
version of Euclidean space. When flat Minkowski space-time is generalized to manifolds, the corresponding
concept is a pseudo-Riemannian metric. This is the mathematical framework of general relativity.
A pseudo-Riemannian metric permits distances to be negative. Distances are determined globally with a
pseudo-Riemannian metric as described in Remark 38.2.6. Geodesic curves do not necessarily minimize
distance. They are defined by parallelism, not by minimizing or maximizing the length of the curve.
39.1.3 Remark: Pseudo-Riemannian geometry is the final stage in the presentation of differential geome-
try. The most important concept to generalize to a pseudo-Riemannian metric (for the purposes of general
relativity) is the Riemann curvature tensor. When all of the machinery of differential geometry has been
generalized from the Riemannian metric to the pseudo-Riemannian metric, the framework is then finally
ready for the presentation of Einstein’s equations and other physics models which are expressed in terms of
Riemannian and pseudo-Riemannian geometry.
39.1.4 Remark: General relativity is defined in terms of a generalized Riemannian metric called a “pseudo-
Riemannian metric”. The absolute global metric in general relativity is based on the assumption that all
of time-space is uniform. In particular, the speed of light is assumed to be constant at all points, in all

directions, and for all time. However, observational evidence in 2002 suggested that the speed of light is not
constant after all. This may imply that a Riemannian (or pseudo-Riemannian) metric is not a suitable basis
for geometry and gravity in cosmology. If this is so, physics which requires only layer 3 may survive while
the global metric in layer 4 does not. Therefore it is important to express geometry and physics as much as
possible in terms of layer 3. This implies a need to keep very clear which definitions require a metric and
which definitions require only an affine connection.
Surprisingly, most of differential geometry does not require a Riemannian metric. This may turn out to be
a good thing for cosmology if it turns out that a path-independent Riemannian metric is not a valid model
for global geometry. Alternatively, the relative lack of necessity of a metric may be a hint that the metric
is indeed not valid in cosmology. There is a precedent for this in the discovery that Euclid’s fifth postulate
did not follow from the other axioms. The abandonment of the equidistance of parallel lines facilitated
the development of geometries such as Riemannian manifolds. If the Riemannian and pseudo-Riemannian
metrics have to be abandoned, it will be important to base as much geometry as possible on connections.
That is why this book introduces the Riemannian metric as late as possible. (The Riemannian metric is
defined in Chapter 38. The pseudo-Riemannian metric is in Chapter 39.)
39.1.5 Remark: Since distance and angles are determined globally in a Riemannian or pseudo-Riemannian
manifold by the transport of the local metric tensor along geodesic curves which are determined by the Levi-
Civita connection, it would be tempting to think that a universe model based on such geometry has a clear
means of synchronizing the laws of physics at different points. However, the metric tensor is assumed a priori
to be a global structure in space-time. It would be much more satisfying if the universe somehow “knew”
how to calculate the global Riemannian metric using parallel transport of the laws of physics along geodesics.
Since it seems now from experimental evidence that the laws of physics may indeed vary in space and time,
it may be that in future, the laws of gravity will be expressed in terms of layer 3 affine connections, which
just happen to give an illusion of the existence of a global Riemannian metric. If there is some deviation of
the transport mechanisms of physics from an ideal Levi-Civita connection, then one would expect the global
Riemannian metric to be replaced by a metric which is path dependent. Therefore the length of a distant
object may depend on which path its length was communicated to you along. Thus the illusion of a global
Riemannian metric may be an artefact of a connection which is approximately a Levi-Civita connection.
The connection may determine the metric instead of vice versa. This could help account for an apparent
variable speed of light. It could also help to bring gravity theory into conformance with Mach’s principle.
This remark is, of course, pure conjecture.
39.1.6 Remark: General relativity was the principal driving force for differential geometry in the early
20th century, particularly for Riemannian geometry. This is a case where a body of mathematics was first
developed for mathematical reasons, then became useful – or indispensable – for physics, and then was
rapidly and richly developed to meet the needs of applications. Bell [190], page 370, says the following.
General relativity [. . . ] was directly responsible for the direction taken by differential geometry
about 1920. This newer geometry might have been developed almost forty years earlier. All the
necessary technique was available; but it was not until the successes of relativity showed that
Riemannian space and the tensor calculus were of more than mathematical interest that differential
geometers noticed what they had been overlooking.
Riemann lived from 1826 to 1866. Tensor calculus was developed about 1890 by Ricci-Curbastro. Einstein’s
general relativity was published in 1915 and 1916.
39.2. The pseudo-Riemannian metric

39.2.1 Definition: A pseudo-Riemannian metric on a C 1 manifold M is any continuous non-degenerate
symmetric covariant tensor field of degree 2 on M .
A pseudo-Riemannian metric of class C r on a C r+1 manifold M is a pseudo-Riemannian metric g on M
such that g is of class C r on M .
39.2.2 Remark: The fact that the word “covariant” appears in Definition 39.2.1 does not imply that a
metric tensor requires the prior definition of a connection. The C 1 condition could possibly be weakened
to a Hölder C 0,1 condition with the metric tensor being defined almost everywhere. This would still permit
distance to be calculated.
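To make Definition 39.2.1 concrete, the following sketch writes out the flat Minkowski metric on $\mathbb{R}^4$. The symbol $\eta$, the coordinate labels and the choice of signature $(1, 3)$ with the time direction positive are assumptions made for this illustration. The opposite sign convention is equally common.

```latex
% Flat Minkowski metric on \mathbb{R}^4 with coordinates (x^0, x^1, x^2, x^3); illustration only.
\[
  \eta = \mathrm{diag}(1, -1, -1, -1), \qquad
  \eta(v, w) = v^0 w^0 - v^1 w^1 - v^2 w^2 - v^3 w^3 .
\]
% Non-degenerate (the component matrix is invertible), but not positive definite:
\[
  \det \eta = -1 \neq 0, \qquad
  \eta(v, v) = 0 \quad \text{for the non-zero vector } v = (1, 1, 0, 0) .
\]
```

The constant field $\eta$ is symmetric, continuous and non-degenerate, so it satisfies the definition, although it fails the positivity required of a Riemannian metric.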

[ For definition 39.2.1, the concept of a C ∞ symmetric tensor of type (0, 2) is already defined. Still to be defined
is non-degeneracy. This can be done in terms of contraction of tensors by requiring that g.x = 0 ⇒ x = 0
for contravariant vectors x. Should define non-degeneracy in the linear chapter. ]
39.3. General relativity
39.3.1 Remark: Bell [191], pages 503–504, quotes a fascinating 1870 paper by Clifford, “On the space-
theory of matter”, which seems very close indeed to the central idea of general relativity, namely that matter
and the curvature of space are connected.
Riemann has shown that as there are different kinds of lines and surfaces, so there are different
kinds of space of three dimensions; and that we can only find out by experience to which of these
kinds the space in which we live belongs. In particular, the axioms of plane geometry are true
within the limits of experiment on the surface of a sheet of paper, and yet we know that the sheet is
really covered with a number of small ridges and furrows, upon which (the total curvature being not
zero) these axioms are not true. Similarly, he says, although the axioms of solid geometry are true
within the limits of experiment for finite portions of our space, yet we have no reason to conclude
that they are true for very small portions; and if any help can be got thereby for the explanation of
physical phenomena, we may have reason to conclude that they are not true for very small portions
of space.
I wish here to indicate a manner in which these speculations may be applied to the investigation
of physical phenomena. I hold in fact
(1) That small portions of space are in fact of a nature analogous to little hills on a surface which
is on average flat; namely, that the ordinary laws of geometry are not valid in them.
(2) That this property of being curved or distorted is continually being passed on from one portion
of space to another after the manner of a wave.
(3) That this variation of the curvature of space is what really happens in that phenomenon which
we call the motion of matter , whether ponderable or ethereal.
(4) That in the physical world nothing else takes place but this variation, subject (possibly) to the
law of continuity.
Unfortunately, Clifford died at the age of 33 in 1879 (apparently from over-work), whereas Riemann died
aged 39 in 1866 (from tuberculosis). If they had lived long enough to learn about the Michelson-Morley
experiment [158] in 1881–1887, they might have developed a space-time theory of gravity in the 19th century,
possibly quite different to Einstein’s popular theory. At the very least, it is clear that very many of the ideas
of Einstein’s gravity theory were “in the air” long before the publication of his general relativity.
[ Present here Einstein’s equations, including the Einstein fudge factor Λ. Present a formulation of a push
theory of gravity in a different section. ]
39.3.2 Remark: Einstein’s equations look something like
$$ R_{\mu\nu} - \tfrac{1}{2} R \, g_{\mu\nu} + \Lambda g_{\mu\nu} = \kappa_0 T_{\mu\nu}, $$
where Λ is the famous fudge factor which “explains” the discrepancy between theory and observations by
invoking an ad-hoc variation in the cosmological expansion rate.
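As a small worked consequence of the equation in Remark 39.3.2, the following sketch takes the trace in the vacuum case. It assumes a 4-dimensional manifold, $T_{\mu\nu} = 0$, and the usual conventions $R = g^{\mu\nu} R_{\mu\nu}$ and $g^{\mu\nu} g_{\mu\nu} = 4$. None of these assumptions is stated in the remark itself.

```latex
% Assumptions: dimension 4, T_{\mu\nu} = 0, R = g^{\mu\nu} R_{\mu\nu}, g^{\mu\nu} g_{\mu\nu} = 4.
\begin{align*}
  g^{\mu\nu} \bigl( R_{\mu\nu} - \tfrac{1}{2} R \, g_{\mu\nu} + \Lambda g_{\mu\nu} \bigr)
    = R - 2R + 4\Lambda = 0
    \quad &\Longrightarrow \quad R = 4\Lambda, \\
  R_{\mu\nu} - \tfrac{1}{2} (4\Lambda) g_{\mu\nu} + \Lambda g_{\mu\nu} = 0
    \quad &\Longrightarrow \quad R_{\mu\nu} = \Lambda g_{\mu\nu} .
\end{align*}
```

Under these assumptions the Ricci tensor is a constant multiple of the metric tensor, which is the pseudo-Riemannian analogue of the Einstein space condition in Definition 38.6.8.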
39.4. Singularities
[ Present here the basic definitions for black hole solutions. ]

39.5. Global solutions

[ Present here the big bang hypothesis. ]

[739]
Chapter 40
Tensor calculus
40.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740

40.2 Differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
40.3 Manifolds with affine connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
40.4 Equations of geodesic variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
40.5 Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
40.6 Pseudo-Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
40.7 Submanifolds of Euclidean space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
[ This is a place-holder for a chapter which I will write some day. Please ignore this chapter for now. ]
40.0.1 Remark: Tensor calculus is principally concerned with local analysis on differentiable manifolds in
terms of coordinate charts and components of vectors and tensors. Tensor calculus is a collection of practical
notations and methods for differential geometry calculations. The previous chapters have been concerned
mainly with the meaning of the objects which underlie the calculations. It is important to have both an
understanding of the objects and the ability to perform calculations with them. The earlier chapters have
been to a great extent motivated by the desire to give unambiguous meaning to all of the expressions in
tensor calculus so that calculations can be performed with confidence.
The hazards of working with coordinates and components in tensor calculus include the following.
(1) Geometrical object confusion. The principal hazard of tensor calculus is that it is easy to lose track of
the associations between numerical coordinates and the objects which are coordinatized. The symbols
of tensor calculus refer to arrays of numbers. In an n-dimensional manifold, an array of n numbers
may be the coordinates of a tangent vector with respect to a basis, or the coordinates of a cotangent
vector. An n × n array could be the matrix of coordinates of a tangent tensor or a cotangent tensor;
or the coefficients of a linear transformation of point coordinates or its inverse transformation; or the
coefficients of a linear transformation of basis vectors or its inverse. Equality should never be asserted
between objects from different object classes. Likewise, addition and other operations should only be
applied to the objects with matching classes.
(2) Structural layer confusion. When using coordinates, it is particularly easy to mix up the structural
layers of a differentiable manifold. For example, equations which are valid in a Riemannian manifold
(layer 4) may be erroneously applied to a manifold which possesses only an affine connection (layer 3)
or only a differentiable structure (layer 2).
(3) Coordinate chart confusion. When multiple coordinate charts are being used for a single problem, there
is a further danger of losing track of which chart is being used for each expression.
(4) Transformation rule confusion. Each kind of object has its own transformation rules between coordinate
charts. Identification of the wrong object class for a component array may yield incorrect transforma-
tions.
Every array in every equation can be a source of ambiguity and error. But despite the hazards of tensor
calculus, it is difficult to do serious practical calculations without it. The best policy is to use it, but
frequently check that each expression has the intended, unambiguous association with a geometrical object.
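Hazards (1) and (4) can be seen already in one dimension. The sketch below uses two hypothetical charts on the same open interval, related by $\tilde x = 2x$. The component names $v$ (tangent vector) and $w$ (cotangent vector) are assumptions for this illustration.

```latex
% Two charts on the same interval, related by \tilde x = 2x; illustration only.
\begin{align*}
  \text{tangent vector component:} \quad
    \tilde v &= \frac{\partial \tilde x}{\partial x} \, v = 2 v, \\
  \text{cotangent vector component:} \quad
    \tilde w &= \frac{\partial x}{\partial \tilde x} \, w = \tfrac{1}{2} w, \\
  \text{pairing (chart-independent):} \quad
    \tilde w \, \tilde v &= w \, v .
\end{align*}
```

If the single number $w$ were mistaken for a tangent vector component, the tangent rule would give $2w$ instead of $\tfrac{1}{2} w$, and the pairing would come out wrong by a factor of 4.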


40.0.2 Remark: In this chapter, everything in the previous chapters on differential geometry is presented
in terms of coordinates and components. A manifold is defined here to be an open subset of IRn . A metric is
defined as a symmetric matrix function on the set. Then a geodesic is defined to be any curve which satisfies
the appropriate equations. Everything else is similarly defined purely in terms of coordinate equations and
components.
[ Some references for tensor calculus are Misner/Thorne/Wheeler [38], pages 223–224, and EDM2 [35], pages
1730–1733 (Appendix A, Table 4) and article 417. ]
40.1. History
40.1.1 Remark: According to Bell [190], page 360, tensor calculus was developed by Eugenio Beltrami
and Gregorio Ricci-Curbastro around the year 1890, in particular in a publication by Ricci in 1887 or 1888.
This implies that tensor calculus was developed after Riemannian geometry, but before the development of
affine connections and fibre bundles.
40.1.2 Remark: Einstein reportedly had serious difficulties understanding tensor calculus. He had to
understand it in order to formulate his gravity theory. Bell [191], page 256, made the following comment on
how Einstein arrived at general relativity.
What gave Einstein his idea was the hard labor he expended for several years mastering the tensor
calculus of two Italian mathematicians, Ricci and Levi-Civita, themselves disciples of Riemann and
Christoffel, both of whom in their turn had been inspired by the geometrical work of Gauss.
40.2. Differentiable manifolds

[ In this section, deal with those parts of tensor calculus which are valid in the absence of a connection and
Riemannian metric. ]
[ A “tensor”, such as “$a_{ij}$”, means the map $\lambda \in L_2(T_p(M), \mathbb{R})$ with $\lambda : (v_1, v_2) \to a_{ij} v_1^i v_2^j$, where
$(v_1^i)_{i=1}^n = \hat\psi(v_1)$, $(v_2^j)_{j=1}^n = \hat\psi(v_2)$, with $\hat\psi$ the manifold’s “lift” function. ]
40.3. Manifolds with affine connection
[ It is assumed here that Γ is symmetric. Does this require the connection to be torsion-free? Should express
Γ in terms of the connection form ω. See EDM2 [35], page 1732, App. A.4. ]
40.3.1 Theorem: Let $x : [a, b] \to M$ be a $C^2$ curve in a $C^2$ manifold M with a $C^1$ affine connection. Let
$\Gamma_{ij}^k$ be the Christoffel symbol of the connection with respect to a coordinate map $\psi : \Omega \to \mathbb{R}^n$ for some open
subset Ω of M such that $x([a, b]) \subseteq \Omega$. Then
(i) x is an affinely parametrized geodesic curve in M if and only if
$$ \forall t \in (a, b), \quad \frac{d^2 x^i}{dt^2}(t) + \Gamma_{jk}^i(x(t)) \, \frac{dx^j}{dt}(t) \, \frac{dx^k}{dt}(t) = 0. \tag{40.3.1} $$
(ii) x is a freely parametrized geodesic curve in M if and only if
$$ \forall u \in (a, b), \ \exists k(u) \in \mathbb{R}, \quad \frac{d^2 x^i}{du^2}(u) + \Gamma_{jk}^i(x(u)) \, \frac{dx^j}{du}(u) \, \frac{dx^k}{du}(u) = k(u) \, \frac{dx^i}{du}(u). $$
Furthermore, k is $C^1$ if x is $C^2$, and the re-parametrization u = f(t) makes t an affine parameter for
the curve if f satisfies
$$ f''(u) + k(u) f'(u)^2 = 0. $$
This equation has solution
$$ f(u) = \int^u \Bigl( \int^y k(z) \, dz \Bigr)^{-1} dy. $$
(iii) If dim(M) = 2, and the curve can be locally expressed as a graph with respect to $x^1$, then the function
$x^2 = h(x^1)$ satisfies
$$ h'' - \Gamma_{22}^1 (h')^3 + (\Gamma_{22}^2 - \Gamma_{12}^1 - \Gamma_{21}^1)(h')^2 + (\Gamma_{12}^2 + \Gamma_{21}^2 - \Gamma_{11}^1) h' + \Gamma_{11}^2 = 0. $$
Proof: Part (ii) follows from part (i) on substitution of u = f(t), for any $C^2$ function f for which $f'(t) \neq 0$
for all t ∈ (a, b). This gives
$$ \forall t \in (a, b), \quad \frac{d^2 x^i(t)}{dt^2} + \Gamma_{jk}^i(x(t)) \, \frac{dx^j(t)}{dt} \, \frac{dx^k(t)}{dt} = -\frac{f''(t)}{f'(t)^2} \, \frac{dx^i(t)}{dt}. $$
This may be interpreted as the parallelism of $D_{\dot x} \dot x$ and $\dot x$. In other words, the covariant derivative of the
tangent vector along the curve is parallel to the tangent vector.
Part (iii) follows from a comparison of the two component equations (i = 1 and i = 2) specified in part (ii).
Since they both must have the same value for k(u) for each u, k(u) may be eliminated between the two
equations to solve for $x^2$ in terms of $x^1$. Indeed, if $x^1 = u$ and $x^2 = h(u)$ are substituted into the two
equations in part (ii), they simplify to
$$ \begin{aligned}
\Gamma_{22}^1 (h')^2 + (\Gamma_{12}^1 + \Gamma_{21}^1) h' + \Gamma_{11}^1 &= k(u) \\
h'' + \Gamma_{22}^2 (h')^2 + (\Gamma_{12}^2 + \Gamma_{21}^2) h' + \Gamma_{11}^2 &= k(u) h'.
\end{aligned} $$
Now substituting k(u) from the first equation into the second gives
$$ h'' - \Gamma_{22}^1 (h')^3 + (\Gamma_{22}^2 - \Gamma_{12}^1 - \Gamma_{21}^1)(h')^2 + (\Gamma_{12}^2 + \Gamma_{21}^2 - \Gamma_{11}^1) h' + \Gamma_{11}^2 = 0. $$
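Part (iii) of Theorem 40.3.1 can be checked on a case where the geodesics are known in advance. The sketch below assumes the Euclidean plane in polar coordinates $x^1 = r$, $x^2 = \theta$ with its Levi-Civita connection, and tests the straight line $r \cos\theta = 1$. The choice of chart and of test curve is an assumption made for this illustration only.

```latex
% Euclidean plane, polar coordinates x^1 = r, x^2 = \theta. Non-zero Christoffel symbols:
% \Gamma_{22}^1 = -r and \Gamma_{12}^2 = \Gamma_{21}^2 = 1/r.
% The graph equation of part (iii) then reduces to
\[
  h'' + r \, (h')^3 + \frac{2}{r} \, h' = 0 .
\]
% Test curve: the vertical line r \cos\theta = 1, i.e. h(r) = \arccos(1/r) for r > 1.
\begin{align*}
  h' &= \frac{1}{r \sqrt{r^2 - 1}}, \qquad
  h'' = -\frac{2 r^2 - 1}{r^2 (r^2 - 1)^{3/2}}, \\
  h'' + r \, (h')^3 + \frac{2}{r} \, h'
     &= \frac{-(2 r^2 - 1) + 1 + 2 (r^2 - 1)}{r^2 (r^2 - 1)^{3/2}} = 0 .
\end{align*}
```

So the straight line, written as a graph over the radial coordinate, satisfies the graph equation, as it should.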
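40.3.2 Theorem: The curvature tensor components $R^i{}_{jk\ell}$ are given by
$$ \begin{aligned}
R^i{}_{jk\ell} &= \frac{\partial \Gamma_{j\ell}^i}{\partial x^k} - \frac{\partial \Gamma_{jk}^i}{\partial x^\ell} + \Gamma_{mk}^i \Gamma_{j\ell}^m - \Gamma_{m\ell}^i \Gamma_{jk}^m \\
&= \Gamma_{j\ell,k}^i - \Gamma_{jk,\ell}^i + \Gamma_{mk}^i \Gamma_{j\ell}^m - \Gamma_{m\ell}^i \Gamma_{jk}^m.
\end{aligned} \tag{40.3.2} $$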
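As an illustration of formula (40.3.2), the following sketch evaluates one component on the unit 2-sphere. The coordinates $(\theta, \phi)$ (colatitude and longitude) and the Levi-Civita connection of the metric ${\rm diag}(1, \sin^2\theta)$ are assumptions made for this example.

```latex
% Unit 2-sphere, coordinates (\theta, \phi), metric diag(1, \sin^2\theta). Non-zero Christoffel symbols:
% \Gamma_{\phi\phi}^\theta = -\sin\theta\cos\theta,
% \Gamma_{\theta\phi}^\phi = \Gamma_{\phi\theta}^\phi = \cot\theta.
% Formula (40.3.2) with i = \theta, j = \phi, k = \theta, \ell = \phi:
\begin{align*}
  R^\theta{}_{\phi\theta\phi}
    &= \Gamma_{\phi\phi,\theta}^\theta - \Gamma_{\phi\theta,\phi}^\theta
       + \Gamma_{m\theta}^\theta \Gamma_{\phi\phi}^m
       - \Gamma_{m\phi}^\theta \Gamma_{\phi\theta}^m \\
    &= -(\cos^2\theta - \sin^2\theta) - 0 + 0
       - (-\sin\theta\cos\theta)(\cot\theta) \\
    &= \sin^2\theta .
\end{align*}
```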
40.4. Equations of geodesic variation
[ This section is at the core of the author’s interest in differential geometry. The rest of the book has been
written to try to make sense of these formulas. The author got completely stuck on these calculations. Then
he decided to go back and do all of differential geometry at a pure mathematical level of correctness so that
he never gets stuck in this way again. ]
This section presents some derivations of first and second order equations of geodesic variation in a manifold
with an affine connection.
For any $C^1$ curve $\gamma : \mathbb{R} \to M$ in a $C^1$ manifold M, the velocity of the curve at a point γ(t) ∈ M is strictly
defined as the vector $t_{\gamma(t), \partial_t(\psi \circ \gamma(t)), \psi} \in T_{\gamma(t)}(M)$ for any chart $\psi \in \mathrm{atlas}_{\gamma(t)}(M)$. In tensor calculus, the
coordinates $\partial_t(\psi^i(\gamma(t)))$ for i = 1 . . . n = dim(M) are brought into the foreground. For simplicity, $\psi^i \circ \gamma$ is
written as $\gamma^i$ for i = 1 . . . n, and $\partial_t(\psi^i \circ \gamma)$ is written as $\gamma^i_t$. So strictly speaking, $\gamma^i_t$ represents $t_{\gamma(t), \partial_t(\psi \circ \gamma(t)), \psi}$.
For a $C^2$ curve γ in a $C^2$ manifold M, the derivatives $\gamma^i_{tt}$ are well defined, but they are not components of
a vector in $T_{\gamma(t)}(M)$.
[ Theorem 40.4.1 should also be proved somewhere without tensor calculus, preferably both in affine connec-
tions and in some sort of generalization to general connections. It should be possible to reduce the manifold
regularity, maybe from C 3 to C 2,1 . ]
40.4.1 Theorem: Let $\gamma : \mathbb{R}^2 \to M$ be a one-parameter family of geodesics in a $C^3$ manifold with a
torsion-free $C^2$ connection with Christoffel symbol Γ. Then the transverse field $\gamma_t$ satisfies
$$ D_{\gamma_s}^2 \gamma_t = R(\gamma_s, \gamma_t) \gamma_s, \tag{40.4.1} $$
where $\gamma_s$ denotes $\partial_s \gamma(s, t) \in T_{\gamma(s,t)}(M)$ and $\gamma_t$ denotes $\partial_t \gamma(s, t) \in T_{\gamma(s,t)}(M)$.

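Proof: A family of geodesics satisfies $\gamma^i_{ss} + \Gamma_{jk}^i \gamma^j_s \gamma^k_s = 0$. The derivative with respect to t is:
$$ \gamma^i_{sst} + \Gamma_{jk,\ell}^i \gamma^j_s \gamma^k_s \gamma^\ell_t + 2 \Gamma_{jk}^i \gamma^j_{st} \gamma^k_s = 0. \tag{40.4.2} $$
For a general one-parameter $C^3$ family of curves in a $C^3$ manifold,
$$ (D_{\gamma_s}^2 \gamma_t)^i = \gamma^i_{sst} + \Gamma_{jk,\ell}^i \gamma^j_s \gamma^k_t \gamma^\ell_s + \Gamma_{jk}^i (\gamma^j_{ss} \gamma^k_t + 2 \gamma^j_s \gamma^k_{st}) + \Gamma_{\ell m}^i \Gamma_{jk}^m \gamma^j_s \gamma^k_t \gamma^\ell_s. $$
Substitution of $\gamma^i_{sst}$ from (40.4.2) into this gives:
$$ (D_{\gamma_s}^2 \gamma_t)^i = (\Gamma_{jk,\ell}^i - \Gamma_{j\ell,k}^i) \gamma^j_s \gamma^k_t \gamma^\ell_s + \Gamma_{jk}^i \gamma^j_{ss} \gamma^k_t + \Gamma_{\ell m}^i \Gamma_{jk}^m \gamma^j_s \gamma^k_t \gamma^\ell_s. $$
Substitution of $\gamma^j_{ss} = -\Gamma_{\ell m}^j \gamma^\ell_s \gamma^m_s$ into this and swapping j with m gives:
$$ \begin{aligned}
(D_{\gamma_s}^2 \gamma_t)^i &= (\Gamma_{jk,\ell}^i - \Gamma_{j\ell,k}^i + \Gamma_{\ell m}^i \Gamma_{jk}^m - \Gamma_{mk}^i \Gamma_{\ell j}^m) \gamma^j_s \gamma^k_t \gamma^\ell_s \\
&= (\Gamma_{\ell k,j}^i - \Gamma_{\ell j,k}^i + \Gamma_{\ell k}^m \Gamma_{mj}^i - \Gamma_{\ell j}^m \Gamma_{mk}^i) \gamma^j_s \gamma^k_t \gamma^\ell_s.
\end{aligned} $$
This is the same as the expression
$$ \begin{aligned}
(R(\gamma_s, \gamma_t) \gamma_s)^i &= R^i{}_{\ell jk} \gamma^j_s \gamma^k_t \gamma^\ell_s \\
&= (\Gamma_{\ell k,j}^i - \Gamma_{\ell j,k}^i + \Gamma_{\ell k}^m \Gamma_{mj}^i - \Gamma_{\ell j}^m \Gamma_{mk}^i) \gamma^j_s \gamma^k_t \gamma^\ell_s
\end{aligned} $$
obtained from Theorem 40.3.2.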
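Equation (40.4.1) can be checked by hand on a simple family of geodesics. The sketch below assumes the unit 2-sphere with coordinates $(\theta, \phi)$, the Levi-Civita connection of the metric ${\rm diag}(1, \sin^2\theta)$, and the family of meridians $\gamma(s, t) = (\theta, \phi) = (s, t)$ for $0 < s < \pi$. All of these choices are assumptions made for this illustration.

```latex
% Unit 2-sphere; non-zero Christoffel symbols:
% \Gamma_{\phi\phi}^\theta = -\sin\theta\cos\theta,
% \Gamma_{\theta\phi}^\phi = \Gamma_{\phi\theta}^\phi = \cot\theta.
% Family of meridians \gamma(s, t) = (s, t), so \gamma_s = \partial_\theta and \gamma_t = \partial_\phi.
\begin{align*}
  (D_{\gamma_s} \gamma_t)^i
    &= \gamma^i_{ts} + \Gamma_{jk}^i \gamma^j_s \gamma^k_t = \Gamma_{\theta\phi}^i
    \quad \Longrightarrow \quad D_{\gamma_s} \gamma_t = \cot s \; \partial_\phi, \\
  (D_{\gamma_s}^2 \gamma_t)^\phi
    &= \partial_s (\cot s) + \Gamma_{\theta\phi}^\phi \cot s
     = -\csc^2 s + \cot^2 s = -1, \\
  (R(\gamma_s, \gamma_t) \gamma_s)^\phi
    &= R^\phi{}_{\theta\theta\phi}
     = \Gamma_{\theta\phi,\theta}^\phi + \Gamma_{m\theta}^\phi \Gamma_{\theta\phi}^m
     = -\csc^2\theta + \cot^2\theta = -1 .
\end{align*}
```

The $\theta$ components of both sides vanish, so both sides equal $-\gamma_t$ for this family, in agreement with (40.4.1).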
[ Look at simplification of equations for Jacobi fields and their derivatives by using normal coordinates to
parallel-translate vectors along the geodesic. Then most Christoffel symbol values become zero. ]
[ Next look at the transmission of first derivatives along a family of geodesics when minimizing c(u). This
transmission may be called “leverage”. The map from one point to another along a geodesic is a “leverage
map”. Using the properties of systems of linear second-order ODEs, it should be possible to deduce some
useful estimates for γt in terms of the Riemannian curvature. ]
[ The first equation in Remark 40.4.2 is the Hessian of $\phi_\lambda : M \to M$? Looks like $(D^2 \phi_\lambda)_{jk} \gamma^j_t \gamma^k_u$. Something
to do with $H_p(\ldots)$? ]
[ What seems to be needed for convexity theory is the “generalized Hessian” of γ, or the “covariant Hessian”. ]
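40.4.2 Remark: For a general $C^4$ family of curves $\gamma : \mathbb{R}^3 \to M$,
$$ \begin{aligned}
(D_{\gamma_u} \gamma_t)^i ={}& \gamma^i_{tu} + \Gamma_{jk}^i \gamma^j_t \gamma^k_u \\
(D_{\gamma_s} D_{\gamma_u} \gamma_t)^i ={}& \gamma^i_{tus} + \Gamma_{jk,\ell}^i \gamma^j_t \gamma^k_u \gamma^\ell_s + \Gamma_{jk}^i (\gamma^j_{ts} \gamma^k_u + \gamma^j_t \gamma^k_{us} + \gamma^j_{tu} \gamma^k_s) + \Gamma_{\ell m}^i \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \\
(D_{\gamma_s}^2 D_{\gamma_u} \gamma_t)^i ={}& \gamma^i_{tuss} + \Gamma_{jk,\ell m}^i \gamma^j_t \gamma^k_u \gamma^\ell_s \gamma^m_s + \Gamma_{jk,\ell}^i (2 \gamma^j_{ts} \gamma^k_u \gamma^\ell_s + 2 \gamma^j_t \gamma^k_{us} \gamma^\ell_s + \gamma^j_t \gamma^k_u \gamma^\ell_{ss} + \gamma^j_{tu} \gamma^k_s \gamma^\ell_s) \\
&+ \Gamma_{jk}^i (\gamma^j_{tss} \gamma^k_u + 2 \gamma^j_{ts} \gamma^k_{us} + \gamma^j_t \gamma^k_{uss} + 2 \gamma^j_{tus} \gamma^k_s + \gamma^j_{tu} \gamma^k_{ss}) \\
&+ \Gamma_{\ell m,n}^i \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s + \Gamma_{\ell m}^i \Gamma_{jk,n}^\ell (2 \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s) \\
&+ \Gamma_{\ell m}^i \Gamma_{jk}^\ell (2 \gamma^j_{ts} \gamma^k_u \gamma^m_s + 2 \gamma^j_t \gamma^k_{us} \gamma^m_s + \gamma^j_t \gamma^k_u \gamma^m_{ss} + \gamma^j_{tu} \gamma^k_s \gamma^m_s) + \Gamma_{np}^i \Gamma_{\ell m}^n \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \gamma^p_s,
\end{aligned} \tag{40.4.3} $$
where the curve family parameters are $(s, t, u) \in \mathbb{R}^3$. If the family is geodesic with respect to the first
parameter s, all double-s derivative terms may be substituted from the geodesic equation and its derivatives:
$$ \begin{aligned}
\gamma^i_{ss} &= -\Gamma_{jk}^i \gamma^j_s \gamma^k_s \\
\gamma^i_{tss} &= -\Gamma_{jk,\ell}^i \gamma^j_s \gamma^k_s \gamma^\ell_t - 2 \Gamma_{jk}^i \gamma^j_{st} \gamma^k_s \\
\gamma^i_{uss} &= -\Gamma_{jk,\ell}^i \gamma^j_s \gamma^k_s \gamma^\ell_u - 2 \Gamma_{jk}^i \gamma^j_{su} \gamma^k_s \\
\gamma^i_{tuss} &= -\Gamma_{jk,\ell m}^i \gamma^j_s \gamma^k_s \gamma^\ell_t \gamma^m_u - \Gamma_{jk,\ell}^i (2 \gamma^j_{su} \gamma^k_s \gamma^\ell_t + \gamma^j_s \gamma^k_s \gamma^\ell_{tu} + 2 \gamma^j_{st} \gamma^k_s \gamma^\ell_u) - \Gamma_{jk}^i (2 \gamma^j_{stu} \gamma^k_s + 2 \gamma^j_{su} \gamma^k_{st}).
\end{aligned} $$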

[ In the above equation, maybe the formula $\gamma^i_{tuss} = -\Gamma_{jk,\ell m}^i \gamma^j_s \gamma^k_s \gamma^\ell_u \gamma^m_t - 2 \Gamma_{jk,\ell}^i$ could be useful? ]
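Substitution of $\gamma^i_{tuss}$ into (40.4.3) gives
$$ \begin{aligned}
(D_{\gamma_s}^2 D_{\gamma_u} \gamma_t)^i ={}& (\Gamma_{jk,\ell m}^i - \Gamma_{\ell m,jk}^i) \gamma^j_t \gamma^k_u \gamma^\ell_s \gamma^m_s + \Gamma_{jk,\ell}^i \gamma^j_t \gamma^k_u \gamma^\ell_{ss} \\
&+ (\Gamma_{jk,\ell}^i - \Gamma_{j\ell,k}^i)(2 \gamma^j_{ts} \gamma^k_u \gamma^\ell_s + 2 \gamma^j_{us} \gamma^k_t \gamma^\ell_s + \gamma^j_s \gamma^k_{tu} \gamma^\ell_s) \\
&+ \Gamma_{jk}^i (\gamma^j_{tss} \gamma^k_u + \gamma^j_t \gamma^k_{uss} + \gamma^j_{tu} \gamma^k_{ss}) \\
&+ \Gamma_{\ell m,n}^i \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s + \Gamma_{\ell m}^i \Gamma_{jk,n}^\ell (2 \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s) \\
&+ \Gamma_{\ell m}^i \Gamma_{jk}^\ell (2 \gamma^j_{ts} \gamma^k_u \gamma^m_s + 2 \gamma^j_t \gamma^k_{us} \gamma^m_s + \gamma^j_t \gamma^k_u \gamma^m_{ss} + \gamma^j_{tu} \gamma^k_s \gamma^m_s) + \Gamma_{np}^i \Gamma_{\ell m}^n \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \gamma^p_s.
\end{aligned} \tag{40.4.4} $$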
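Substitution of $\gamma^j_{tss}$ and $\gamma^k_{uss}$ into (40.4.4) gives
$$ \begin{aligned}
(D_{\gamma_s}^2 D_{\gamma_u} \gamma_t)^i ={}& (\Gamma_{jk,\ell m}^i - \Gamma_{\ell m,jk}^i) \gamma^j_t \gamma^k_u \gamma^\ell_s \gamma^m_s + \Gamma_{jk,\ell}^i \gamma^j_t \gamma^k_u \gamma^\ell_{ss} \\
&+ (\Gamma_{jk,\ell}^i - \Gamma_{j\ell,k}^i)(2 \gamma^j_{ts} \gamma^k_u \gamma^\ell_s + 2 \gamma^j_{us} \gamma^k_t \gamma^\ell_s + \gamma^j_s \gamma^k_{tu} \gamma^\ell_s) + \Gamma_{jk}^i \gamma^j_{tu} \gamma^k_{ss} \\
&+ \Gamma_{\ell m,n}^i \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s + (2 \Gamma_{\ell m}^i \Gamma_{jk,n}^\ell - \Gamma_{\ell k}^i \Gamma_{mn,j}^\ell - \Gamma_{\ell j}^i \Gamma_{mn,k}^\ell) \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s \\
&+ 2 (\Gamma_{\ell m}^i \Gamma_{jk}^\ell - \Gamma_{\ell k}^i \Gamma_{jm}^\ell) \gamma^j_{ts} \gamma^k_u \gamma^m_s + 2 (\Gamma_{\ell m}^i \Gamma_{kj}^\ell - \Gamma_{\ell j}^i \Gamma_{km}^\ell) \gamma^j_t \gamma^k_{us} \gamma^m_s \\
&+ \Gamma_{\ell m}^i \Gamma_{jk}^\ell (\gamma^j_t \gamma^k_u \gamma^m_{ss} + \gamma^j_{tu} \gamma^k_s \gamma^m_s) + \Gamma_{np}^i \Gamma_{\ell m}^n \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \gamma^p_s.
\end{aligned} \tag{40.4.5} $$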
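Substitution of $\gamma_{ss}$ into (40.4.5) gives
$$ \begin{aligned}
(D_{\gamma_s}^2 D_{\gamma_u} \gamma_t)^i ={}& (\Gamma_{jk,\ell m}^i - \Gamma_{\ell m,jk}^i) \gamma^j_t \gamma^k_u \gamma^\ell_s \gamma^m_s + (\Gamma_{jk,\ell}^i - \Gamma_{j\ell,k}^i)(2 \gamma^j_{ts} \gamma^k_u \gamma^\ell_s + 2 \gamma^j_{us} \gamma^k_t \gamma^\ell_s + \gamma^j_s \gamma^k_{tu} \gamma^\ell_s) \\
&+ (\Gamma_{\ell m,n}^i \Gamma_{jk}^\ell - \Gamma_{jk,\ell}^i \Gamma_{mn}^\ell + 2 \Gamma_{\ell m}^i \Gamma_{jk,n}^\ell - \Gamma_{\ell k}^i \Gamma_{mn,j}^\ell - \Gamma_{\ell j}^i \Gamma_{mn,k}^\ell) \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s \\
&+ 2 (\Gamma_{\ell m}^i \Gamma_{jk}^\ell - \Gamma_{\ell k}^i \Gamma_{jm}^\ell) \gamma^j_{ts} \gamma^k_u \gamma^m_s + 2 (\Gamma_{\ell m}^i \Gamma_{kj}^\ell - \Gamma_{\ell j}^i \Gamma_{km}^\ell) \gamma^j_t \gamma^k_{us} \gamma^m_s \\
&+ (\Gamma_{\ell m}^i \Gamma_{kj}^\ell - \Gamma_{\ell j}^i \Gamma_{km}^\ell) \gamma^j_{tu} \gamma^k_s \gamma^m_s + (\Gamma_{np}^i \Gamma_{m\ell}^n - \Gamma_{n\ell}^i \Gamma_{mp}^n) \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \gamma^p_s.
\end{aligned} \tag{40.4.6} $$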
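Recognition of the Riemann curvature tensor in (40.4.6) leads to the following:
$$ \begin{aligned}
(D_{\gamma_s}^2 D_{\gamma_u} \gamma_t)^i ={}& (\Gamma_{jk,\ell m}^i - \Gamma_{\ell m,jk}^i) \gamma^j_t \gamma^k_u \gamma^\ell_s \gamma^m_s + R^i{}_{mn\ell} \Gamma_{jk}^\ell \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s \\
&+ (\Gamma_{mn,\ell}^i \Gamma_{jk}^\ell - \Gamma_{jk,\ell}^i \Gamma_{mn}^\ell + 2 \Gamma_{\ell m}^i \Gamma_{jk,n}^\ell - \Gamma_{\ell k}^i \Gamma_{mn,j}^\ell - \Gamma_{\ell j}^i \Gamma_{mn,k}^\ell) \gamma^j_t \gamma^k_u \gamma^m_s \gamma^n_s \\
&+ R^i{}_{k\ell j} \gamma^j_{tu} \gamma^k_s \gamma^\ell_s + 2 R^i{}_{j\ell k} \gamma^j_{ts} \gamma^k_u \gamma^\ell_s + 2 R^i{}_{j\ell k} \gamma^j_{us} \gamma^k_t \gamma^\ell_s.
\end{aligned} \tag{40.4.7} $$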
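[ From notes: $\Gamma_{\ell n,m}^i \Gamma_{jk}^m - \Gamma_{jk,m}^i \Gamma_{\ell n}^m + 2 \Gamma_{m\ell}^i \Gamma_{jk,n}^m - \Gamma_{mk}^i \Gamma_{\ell n,j}^m - \Gamma_{mj}^i \Gamma_{\ell n,k}^m$ ? ]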
[ This derivation of second-order equations of geodesic variation will be continued some time soon. These
calculations are the prime motivation for the whole book. Around 2002-6-20, I finally got a “handle” on this
problem. So I’m expecting to get this all worked out soon. ]
40.5. Riemannian manifolds

[ This section is very old and totally useless. It’s just a place-holder for future work. ]
A Riemannian manifold is a (topological) metric space which can be locally coordinatized in such a way
that the square of the distance between two points is twice continuously differentiable with respect to the
coordinates of the points and the matrix of second derivatives is invertible.
[ Should non-degeneracy be used here instead of invertibility? ]
Let $d : M \times M \to \mathbb{R}_0^+ = [0, \infty)$ be the distance function on an n-dimensional Riemannian manifold M.
Then the metric tensor on M at a point p ∈ M (for a particular local coordinate map (U, ψ)) is the matrix
$[g_{ij}(x)]_{i,j=1}^n$ defined by
$$ g_{ij}(x) = \frac{1}{2} \, \frac{\partial^2 d(x, y)^2}{\partial y^i \, \partial y^j} \bigg|_{y=x}, $$
where x = ψ(p). [ It should be explained why the obvious simplification does not apply. The obvious
simplification is to say that $(d^2/dy^2) f(y)^2 = 2(f f'' + (f')^2)$, and that since d(x, y) = 0 at y = x, the formula
for $g_{ij}$ must reduce to $((d/dy) d(x, y)|_{y=x})^2$. However, the derivative of d(x, y) generally does not exist in a
neighbourhood of y = x. But then again, maybe the right derivative of d(x, y) would do the job. ]
[ It should be possible to show that g is C k if d has some regularity property. ]
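The recipe “half the matrix of second derivatives of the squared distance” can be tried out on the Euclidean plane in polar coordinates. In the sketch below, the chart and the labels $x = (r_1, \theta_1)$, $y = (r_2, \theta_2)$ are assumptions introduced for this illustration; the distance formula is the law of cosines.

```latex
% Euclidean plane in polar coordinates; x = (r_1, \theta_1), y = (r_2, \theta_2).
\begin{align*}
  d(x, y)^2 &= r_1^2 + r_2^2 - 2 r_1 r_2 \cos(\theta_2 - \theta_1), \\
  g_{rr}(x) &= \tfrac{1}{2} \, \partial_{r_2}^2 \, d(x, y)^2 \big|_{y=x} = 1, \\
  g_{r\theta}(x) &= \tfrac{1}{2} \, \partial_{r_2} \partial_{\theta_2} \, d(x, y)^2 \big|_{y=x}
      = r_1 \sin(\theta_2 - \theta_1) \big|_{y=x} = 0, \\
  g_{\theta\theta}(x) &= \tfrac{1}{2} \, \partial_{\theta_2}^2 \, d(x, y)^2 \big|_{y=x}
      = r_1 r_2 \cos(\theta_2 - \theta_1) \big|_{y=x} = r_1^2 .
\end{align*}
```

This recovers the familiar component matrix ${\rm diag}(1, r^2)$. Note that it is the squared distance which is differentiated twice here, not the distance itself.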
For integer r ≥ 0, if the matrix elements gij are C r with respect to the coordinates, then M is said to
be C r+1 .
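The Christoffel symbol $\Gamma_{ij}^k$ in a Riemannian metric space satisfies
$$ \Gamma_{ij}^k = \frac{1}{2} \, g^{kl} \Bigl( \frac{\partial g_{li}}{\partial x^j} + \frac{\partial g_{lj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l} \Bigr), $$
where the matrix $[g^{ij}(x)]_{i,j=1}^n$ is the inverse of the matrix $[g_{ij}(x)]_{i,j=1}^n$.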
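For a concrete instance of the Christoffel symbol formula, the sketch below uses polar coordinates $(r, \theta)$ on the Euclidean plane. The choice of chart is an assumption for illustration only.

```latex
% Polar coordinates (r, \theta): [g_{ij}] = diag(1, r^2), [g^{ij}] = diag(1, r^{-2}).
% The only non-zero derivative of a metric component is \partial_r g_{\theta\theta} = 2r.
\begin{align*}
  \Gamma_{\theta\theta}^r
    &= \tfrac{1}{2} \, g^{rr} \bigl( \partial_\theta g_{r\theta} + \partial_\theta g_{r\theta}
       - \partial_r g_{\theta\theta} \bigr) = -r, \\
  \Gamma_{r\theta}^\theta
    &= \tfrac{1}{2} \, g^{\theta\theta} \bigl( \partial_\theta g_{\theta r} + \partial_r g_{\theta\theta}
       - \partial_\theta g_{r\theta} \bigr) = \tfrac{1}{2} \, r^{-2} \cdot 2r = r^{-1} .
\end{align*}
```

By the symmetry of the formula in i and j, $\Gamma_{\theta r}^\theta = r^{-1}$ also, and all remaining components vanish.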
If M is $C^2$, then a continuously differentiable curve $\gamma : [0, L] \to M$, mapping $s \mapsto (x^i(s))_{i=1}^n$ (in terms of
local coordinates), is a normal geodesic (or a length-parametrized geodesic) if L ≥ 0 and for all s ∈ (0, L),
$$ \frac{d^2 x^i}{ds^2} + \Gamma_{jk}^i(x(s)) \, \frac{dx^j}{ds} \, \frac{dx^k}{ds} = 0 $$
as in an affine connection space, and
$$ g_{ij}(x(s)) \, \frac{dx^i}{ds} \, \frac{dx^j}{ds} = 1. $$
Then L = L(γ) is the length of the curve γ. The image of a normal geodesic is a geodesic.
If x, y ∈ M are points such that there is a unique geodesic γx,y in M with endpoints x and y satisfying
L(γx,y ) = d(x, y), then the convex combination (x, y, λ) of x and y is uniquely defined for λ ∈ [0, 1] by
(x, y, λ) = γx,y (λd(x, y)).
A subset K of a Riemannian manifold M is convex if for all x, y ∈ K, there exists a unique geodesic γx,y
with endpoints x and y such that L(γx,y ) = d(x, y). Then convex combinations are well-defined in a convex
set K, and so : K × K × [0, 1] → K is a well-defined function.
A function f : K → IR on a convex set K is a convex function if for all x, y ∈ K and λ ∈ [0, 1],
f ( (x, y, λ)) ≤ (1 − λ)f (x) + λf (y).
A function f : K → IR is said to be concave if −f is convex.

[ Define harmonic concavity around here somewhere. ]
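The gradient ∇f, the length of the gradient |∇f|, and the Laplacian ∆f, of a twice differentiable function
$f : S \to \mathbb{R}$ on a subset S of a Riemannian manifold M are defined by
$$ \begin{aligned}
(\nabla f)_i &= \frac{\partial f}{\partial x^i}, \\
|\nabla f| &= \Bigl( g^{ij} \, \frac{\partial f}{\partial x^i} \, \frac{\partial f}{\partial x^j} \Bigr)^{1/2}, \text{ and} \\
\Delta f &= g^{ij} \Bigl( \frac{\partial^2 f}{\partial x^i \, \partial x^j} - \Gamma_{ij}^k \, \frac{\partial f}{\partial x^k} \Bigr) \\
&= \frac{1}{\sqrt{g}} \, \frac{\partial}{\partial x^i} \Bigl( \sqrt{g} \, g^{ij} \, \frac{\partial f}{\partial x^j} \Bigr),
\end{aligned} $$
where $g = \det([g_{ij}]_{i,j=1}^n)$. If M is a $C^{2+k}$ manifold and $g \in X^1(T^{0,2}(M))$, then (probably) $\Delta \in X^k(T^{[2]}(M))$.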
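The agreement of the two expressions for ∆f can be checked on an example. The sketch below assumes the unit 2-sphere with coordinates $(\theta, \phi)$, $0 < \theta < \pi$, and the metric ${\rm diag}(1, \sin^2\theta)$, for which $\sqrt{g} = \sin\theta$. Subscripts on $f$ denote partial derivatives. These choices are assumptions for illustration.

```latex
% Unit 2-sphere: g^{\theta\theta} = 1, g^{\phi\phi} = 1/\sin^2\theta, \sqrt{g} = \sin\theta.
% Non-zero Christoffel symbols: \Gamma_{\phi\phi}^\theta = -\sin\theta\cos\theta,
% \Gamma_{\theta\phi}^\phi = \Gamma_{\phi\theta}^\phi = \cot\theta.
\begin{align*}
  g^{ij} \Bigl( \frac{\partial^2 f}{\partial x^i \partial x^j}
      - \Gamma_{ij}^k \frac{\partial f}{\partial x^k} \Bigr)
    &= f_{\theta\theta} + \frac{1}{\sin^2\theta}
       \bigl( f_{\phi\phi} + \sin\theta\cos\theta \, f_\theta \bigr)
     = f_{\theta\theta} + \cot\theta \, f_\theta + \frac{f_{\phi\phi}}{\sin^2\theta}, \\
  \frac{1}{\sqrt{g}} \frac{\partial}{\partial x^i}
      \Bigl( \sqrt{g} \, g^{ij} \frac{\partial f}{\partial x^j} \Bigr)
    &= \frac{1}{\sin\theta} \, \partial_\theta (\sin\theta \, f_\theta)
       + \frac{f_{\phi\phi}}{\sin^2\theta}
     = f_{\theta\theta} + \cot\theta \, f_\theta + \frac{f_{\phi\phi}}{\sin^2\theta} .
\end{align*}
```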
[ The following two definitions should only be presented if they are needed for results. ]
[ Definition of parallelism. ]

[ Definition of normal coordinates. ]

Two vectors $X_{(1)}, X_{(2)} \in T_x(M)$ are said to be orthonormal at $x \in M$ if $g_{ij}(x) X_{(k)}^i X_{(l)}^j = \delta_{kl}$.
If $M$ is $C^3$, then the sectional curvature of $M$ at $x \in M$ in the “plane” of the orthonormal vectors $X, Y \in \mathbb{R}^n$
is defined to be
$$ K_x(X, Y) = g_{im}(x) R^m{}_{jkl}(x) X^i Y^j X^k Y^l, $$
where the curvature tensor $R^i{}_{jkl}$ is given by Theorem 40.3.2.
A Riemannian manifold of constant sectional curvature is one for which κ = Kx (X, Y ) is independent of the
point x and the pair (X, Y ) of orthonormal vectors.
40.6. Pseudo-Riemannian manifolds

[ This section should include tensor calculus for special and general relativity. ]
40.7. Submanifolds of Euclidean space

[ This should, in particular, deal with Gaußian curvature and mean curvature. Also, a whole bunch of
things have simplified formulas when the manifold is embedded in IRn . And the book by Gallot/Hulin/
Lafontaine [20] has lots of material on this subject. It seems like in a 2-dimensional embedded manifold, the
Gauß curvature is the same as sectional curvature. The mean curvature just doesn’t mean anything in an
intrinsic geometry. ]
40.7.1 Definition: A subset M ⊆ IRn is said to be a C r k-dimensional submanifold of IRn if
∀x ∈ M, ∃U ∈ Top(IRn ),
x ∈ U and ∃f ∈ C r (U, IRn−k ), U ∩ M = f −1 (0) and f is a submersion.
(A submersion is a map whose differential is everywhere surjective.) [ See Gallot/Hulin/Lafontaine [20],

page 2. ]


[747]
Chapter 41
Geometry of the 2-sphere
41.1 Terrestrial coordinates
41.2 Tensor calculus in terrestrial coordinates
41.3 Metric tensor calculation from the distance function
41.4 The principal fibre bundle in terrestrial coordinates
41.5 The Riemannian connection in terrestrial coordinates
41.6 Coordinates for polar exponential maps
41.7 The global tangent bundle
41.8 Isometries of $S^2$
41.9 Geodesic curves
41.10 Affinely parametrized geodesics
41.11 Convex sets and functions
41.12 Normal coordinates
41.13 Jacobi fields
41.14 Circles on the sphere
41.15 Calculation of the “hours of daylight”
41.16 Some standard map projections
41.17 Projection of a sphere onto a plane
41.0.1 Remark: The 2-sphere S 2 is among the simplest non-trivial geometries to study from the dif-
ferential geometry perspective. It demonstrates a large number of features which are generalized through
differential geometry to more general spaces. Although the 2-sphere has been studied for thousands of years,
it still has a rich variety of properties which provide a testing ground for theoretical concepts. (Of course,
there was a break in its study in Europe during the Dark Ages when people were told that the Earth was
flat.) Understanding the 2-sphere is fundamental to many areas of physics from the quantum mechanics
of atoms and elementary particles to astrophysics and cosmology. Therefore the 2-sphere deserves its own
chapter, or maybe a whole book of its own. After all, the word “geometry” does mean the measurement of
the Earth, and the Earth is a 2-sphere (approximately).
The manifold S 2 is a good test of the practicality of all differential geometry definitions. Any definition
which cannot be applied to S 2 probably should be changed.
[ See CRC [100], page 312 for spherical coordinates. See also Cohen-Tannoudji/Diu/Laloë [155], volume 1,
especially the end-papers. ]
[ Many of the statements in the chapter would make nice exercises! ]
41.1. Terrestrial coordinates

[ Probably should use (x1 , x2 , x3 ) instead of (x, y, z) in the following. ]
41.1.1 Remark: Coordinates really are necessary for differential geometry. How else would one indicate
points in manifolds? One could use the methods of synthetic geometry, such as “the point on the intersection

of this line and that”. But after a while, it becomes clear that these ways of indicating points are really
coordinate systems in disguise. So those people who think that coordinates are bad are really fooling
themselves. The “coordinate-free” concepts in differential geometry always turn out to contain coordinates
in disguise.
41.1.2 Remark: Define the 2-sphere to be the subset $S^2 = \{(x, y, z) \in \mathbb{R}^3;\ x^2 + y^2 + z^2 = 1\}$ of $\mathbb{R}^3$.
Define terrestrial spherical coordinates for $S^2$ by
$$ \psi : S^2 \to \bigl( (-\pi, \pi] \times (-\pi/2, \pi/2) \bigr) \cup \bigl\{ (0, -\pi/2), (0, \pi/2) \bigr\} $$
$$ \psi : (x, y, z) \mapsto (\phi, \theta) = (\arctan(x, y), \arcsin(z)). \tag{41.1.1} $$
(See Section 20.13 for definitions of trigonometric functions.) The range of $\psi$ is illustrated in Figure 41.1.1.
$\phi$ is called the longitude and $\theta$ is called the latitude.
Figure 41.1.1 Range of terrestrial coordinates for $S^2$. (Axes: longitude $\phi$ from $-\pi$ to $\pi$ horizontally, latitude $\theta$ from $-\pi/2$ to $\pi/2$ vertically. Marked points: North Pole, München, Melbourne.)
41.1.3 Remark: The map $\psi$ has a left inverse $\bar\psi$ defined by
$$ \bar\psi : \mathbb{R}^2 \to S^2 $$
$$ \bar\psi : (\phi, \theta) \mapsto (x, y, z) = (\cos\theta \cos\phi, \cos\theta \sin\phi, \sin\theta). \tag{41.1.2} $$
It is noteworthy that this kind of inverse chart ψ̄ does not have the “seams” which necessarily appear in the
forward chart ψ.
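The relationship between the chart and its left inverse can be checked numerically. The following Python sketch is an illustration only. It assumes that the two-argument arctan(x, y) in equation (41.1.1) means the polar angle of the point (x, y), which is `math.atan2(y, x)` in Python. The function names `psi` and `psi_bar` are chosen here and do not come from the text.

```python
import math

def psi(x, y, z):
    """Terrestrial coordinates (41.1.1): point of the unit sphere -> (phi, theta)."""
    # Assumption: the two-argument arctan(x, y) is the polar angle of (x, y).
    phi = math.atan2(y, x)
    theta = math.asin(max(-1.0, min(1.0, z)))  # clamp against rounding error
    return phi, theta

def psi_bar(phi, theta):
    """Map (41.1.2): any (phi, theta) in R^2 -> point of the unit sphere."""
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))

# Left inverse property: psi_bar(psi(p)) = p for a point p on the sphere.
p = psi_bar(2.5, -0.7)
q = psi_bar(*psi(*p))
print(max(abs(a - b) for a, b in zip(p, q)))

# psi_bar has no seams: it is defined on all of R^2 and simply covers the
# sphere repeatedly, so a longitude shift of 2*pi gives the same point.
r = psi_bar(2.5 + 2.0 * math.pi, -0.7)
print(max(abs(a - b) for a, b in zip(p, r)))
```

Both printed differences should be at the level of floating-point rounding error.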
[ Maybe the kind of multiple-covering chart in Remark 41.1.3 should be formalized somehow in general? ]
41.1.4 Remark: Figure 41.1.2 illustrates lines of constant longitude and latitude for a 2-sphere. The
curves φ = 0 and θ = 0 are emphasized. The longitude intervals are 15◦ . The latitude intervals are 10◦ .
Figure 41.1.2 Domain of terrestrial coordinates for $S^2$. (Marked points: Palo Alto, München, Rām Allāh, Ouagadougou, La Habana, Kinshasa.)

41.1.5 Remark: The terrestrial coordinates presented here do not constitute a chart for the manifold $S^2$,
but they can be made into a chart by restricting $\bar\psi : \mathbb{R}^2 \to S^2$ to the set
$$ V_0 = (-\pi, \pi) \times (-\pi/2, \pi/2) = \mathrm{Int}(\mathrm{Range}(\psi)), $$
or equivalently, by restricting $\psi : S^2 \to \mathbb{R}^2$ to
$$ \begin{aligned}
U_0 &= \bigl\{ (x, y, z) \in S^2;\ x > -\sqrt{1 - z^2} \bigr\} \\
&= \{ (x, y, z) \in S^2;\ x > 0 \text{ or } y \ne 0 \} \\
&= \psi^{-1}(V_0),
\end{aligned} $$
which removes the poles and the international dateline from the domain of $\psi$. Define $\psi_0 = \psi|_{U_0}$ and
$\bar\psi_0 = \bar\psi|_{V_0}$. Then $\psi_0$ is a chart for $S^2$.
41.1.6 Remark: Astronomical spherical coordinates are obtained by forcing φ into the range [0, 2π). This
is done by adding 2π to φ when y < 0. In quantum mechanics, it is customary to define θ = arccos(z), so
that θ ∈ [0, π]. (See Cohen-Tannoudji/Diu/Laloë [155].) Terrestrial-style coordinates with φ ∈ (−π, π] and
θ ∈ [−π/2, π/2] are assumed in this chapter unless otherwise indicated.
41.2. Tensor calculus in terrestrial coordinates

[ Should split this section into those things which can be defined with an affine connection but no metric,
versus those things which require a metric. Maybe also have a section on those things which don’t even
require a connection. ]
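The extrinsic tangent vectors generated by the parameters $\phi$ and $\theta$ are as follows.
$$ \frac{\partial}{\partial\phi} = \begin{pmatrix} -\cos\theta \sin\phi \\ \cos\theta \cos\phi \\ 0 \end{pmatrix}, \qquad
\frac{\partial}{\partial\theta} = \begin{pmatrix} -\sin\theta \cos\phi \\ -\sin\theta \sin\phi \\ \cos\theta \end{pmatrix}. \tag{41.2.1} $$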
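The lengths of these vectors satisfy $|\partial/\partial\phi| = \cos\theta$ and $|\partial/\partial\theta| = 1$. A convenient abbreviation for these
tangent vectors is $\partial_\phi = \partial/\partial\phi$ and $\partial_\theta = \partial/\partial\theta$. Strictly speaking, these vectors are defined by $\partial_\phi = \partial_{p,e_1,\psi}$
and $\partial_\theta = \partial_{p,e_2,\psi}$, where $\partial_{p,v,\psi} : f \mapsto \partial_v (f \circ \psi^{-1})|_{\psi(p)}$ (the derivative in the direction $v$) for $v \in \mathbb{R}^2$, $f \in C^1(S^2)$ and $p \in S^2$. Tangent vectors
are always attached to some base point in the manifold, in this case $p \in S^2$. For convenience, the base point
is omitted in the notation when there is no confusion.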
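The vectors in equation (41.2.1) are associated with the unit vectors in equation (41.2.2).
$$ e_\phi(\phi, \theta) = (\cos\theta)^{-1} \frac{\partial}{\partial\phi} = \begin{pmatrix} -\sin\phi \\ \cos\phi \\ 0 \end{pmatrix}, \qquad
e_\theta(\phi, \theta) = \frac{\partial}{\partial\theta} = \begin{pmatrix} -\sin\theta \cos\phi \\ -\sin\theta \sin\phi \\ \cos\theta \end{pmatrix}. \tag{41.2.2} $$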
The metric for the 2-sphere is inherited from its embedding in $\mathbb{R}^3$.
[ Show that $\bar d(p_1, p_2)^2 \le (\theta_1 - \theta_2)^2 + (\phi_1 - \phi_2)^2$ in Theorem 41.2.1. ]
41.2.1 Theorem: The distance $\bar d(p_1, p_2)$ within $\mathbb{R}^3$ between two points $p_1$ and $p_2$ in $S^2$, whose spherical
coordinates are respectively $(\phi_1, \theta_1)$ and $(\phi_2, \theta_2)$, is given by
$$ \begin{aligned}
\bar d(p_1, p_2)^2 &= 2\bigl(1 - \cos(\theta_1 - \theta_2) + \cos\theta_1 \cos\theta_2 (1 - \cos(\phi_1 - \phi_2))\bigr) \\
&= 4 \sin^2\bigl(\tfrac12(\theta_1 - \theta_2)\bigr) + 4 \cos\theta_1 \cos\theta_2 \sin^2\bigl(\tfrac12(\phi_1 - \phi_2)\bigr).
\end{aligned} $$
The distance $d(p_1, p_2)$ between the same two points within the sphere surface $S^2$ is given by
$$ \begin{aligned}
d(p_1, p_2) &= \arccos\bigl(\cos(\theta_1 - \theta_2) + \cos\theta_1 \cos\theta_2 (\cos(\phi_1 - \phi_2) - 1)\bigr) \\
&= \arccos\bigl(\sin\theta_1 \sin\theta_2 + \cos\theta_1 \cos\theta_2 \cos(\phi_1 - \phi_2)\bigr) \\
&= 2 \arcsin\bigl(\tfrac12 \bar d(p_1, p_2)\bigr) \\
&= 2 \arcsin\Bigl( \bigl( \sin^2(\tfrac12(\theta_1 - \theta_2)) + \cos\theta_1 \cos\theta_2 \sin^2(\tfrac12(\phi_1 - \phi_2)) \bigr)^{1/2} \Bigr).
\end{aligned} $$
Proof: [ The distance $d(p_1, p_2)$ may be calculated as either $2 \arcsin(\tfrac12 \bar d(p_1, p_2))$ or $\arccos(p_1 \cdot p_2)$. ]
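The formulas of Theorem 41.2.1 can be compared numerically with the distances computed directly from the embedding in three-dimensional space. The following Python sketch is only a sanity check under that interpretation. The helper names `to_xyz`, `chord` and `arc` are introduced here for illustration.

```python
import math

def to_xyz(phi, theta):
    """Point of the unit sphere with longitude phi and latitude theta."""
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))

def chord(p1, p2):
    """Straight-line distance from the half-angle formula of Theorem 41.2.1."""
    (phi1, th1), (phi2, th2) = p1, p2
    s = (math.sin(0.5 * (th1 - th2)) ** 2
         + math.cos(th1) * math.cos(th2) * math.sin(0.5 * (phi1 - phi2)) ** 2)
    return 2.0 * math.sqrt(s)

def arc(p1, p2):
    """Distance within the sphere surface: 2 arcsin(chord / 2)."""
    return 2.0 * math.asin(min(1.0, 0.5 * chord(p1, p2)))

p1, p2 = (0.3, 0.4), (-1.1, 0.9)
a, b = to_xyz(*p1), to_xyz(*p2)
dot = sum(u * v for u, v in zip(a, b))

# Difference between the formula and the Euclidean distance in R^3.
print(abs(chord(p1, p2) - math.dist(a, b)))
# Difference between 2 arcsin(chord / 2) and arccos(p1 . p2).
print(abs(arc(p1, p2) - math.acos(dot)))
```

Both differences should be at rounding-error level for points that are neither coincident nor antipodal.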
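41.2.2 Theorem: With respect to the coordinate map assumed in this chapter for $S^2$, namely the terrestrial
coordinates in Section 41.1, the components $g_{ij}$ of the metric tensor, $\Gamma_{ij}^k$ of the Christoffel symbol for
the Levi-Civita connection, and $R^i{}_{jkl}$ of the Riemann curvature tensor, satisfy
$$ \begin{aligned}
g_{ij} &= \cos^2\theta\, \delta_i^1 \delta_j^1 + \delta_i^2 \delta_j^2 \\
g^{ij} &= \sec^2\theta\, \delta_1^i \delta_1^j + \delta_2^i \delta_2^j \\
\Gamma_{ij}^k &= -\tan\theta\, (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) \delta_1^k + \sin\theta \cos\theta\, \delta_i^1 \delta_j^1 \delta_2^k \\
\Gamma_{ij,l}^k &= \bigl( -\sec^2\theta\, (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) \delta_1^k + (\cos^2\theta - \sin^2\theta) \delta_i^1 \delta_j^1 \delta_2^k \bigr) \delta_l^2 \\
R^i{}_{jkl} &= (\delta_1^i \delta_j^2 - \cos^2\theta\, \delta_2^i \delta_j^1)(\delta_k^1 \delta_l^2 - \delta_k^2 \delta_l^1) \\
R_{ijkl} &= \cos^2\theta\, (\delta_i^1 \delta_j^2 - \delta_i^2 \delta_j^1)(\delta_k^1 \delta_l^2 - \delta_k^2 \delta_l^1) \\
R_{ij} &= g_{ij} \\
R &= 2 \\
G_{ij} &= 0 \\
K(X, Y) &= \cos^2\theta\, (X^1 Y^2 - X^2 Y^1)^2.
\end{aligned} $$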
[ Also calculate sectional curvature in Theorem 41.2.2. ]

[ For $R^i{}_{jkl}$ in Theorem 41.2.2, note that $R_{jk} = R^\ell{}_{jk\ell} = -\delta_j^2 \delta_k^2 - \cos^2\theta\, \delta_j^1 \delta_k^1 = -g_{jk}$ !? ]
Proof: The formula for gij can be determined directly from the metric function on S 2 from Theorem 41.2.1,
or else from the general formula for embedded manifolds. [ This should be done explicitly for at least one of
the methods of proof. ]
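To prove the formula for $R^i{}_{jkl}$, note that
$$ \Gamma_{jl,k}^i - \Gamma_{jk,l}^i = (\sec^2\theta\, \delta_1^i \delta_j^2 - (\cos^2\theta - \sin^2\theta) \delta_2^i \delta_j^1)\, (\delta_k^1 \delta_l^2 - \delta_k^2 \delta_l^1) $$
and
$$ \Gamma_{jl}^m \Gamma_{mk}^i - \Gamma_{jk}^m \Gamma_{ml}^i = -(\tan^2\theta\, \delta_1^i \delta_j^2 + \sin^2\theta\, \delta_2^i \delta_j^1)\, (\delta_k^1 \delta_l^2 - \delta_k^2 \delta_l^1). $$
The formula for $R^i{}_{jkl}$ follows immediately by adding these two expressions, since $\sec^2\theta - \tan^2\theta = 1$ and $(\cos^2\theta - \sin^2\theta) + \sin^2\theta = \cos^2\theta$.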
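The Christoffel symbols in Theorem 41.2.2 can be cross-checked against the general formula in terms of the metric, using finite differences instead of symbolic differentiation. The following Python sketch is an illustration under stated assumptions: indices 0 and 1 in the code stand for the indices 1 (longitude) and 2 (latitude) of the text, and the step size `h` is an arbitrary choice.

```python
import math

def metric(x):
    """Metric g_ij at x = (phi, theta) in terrestrial coordinates."""
    _, theta = x
    return [[math.cos(theta) ** 2, 0.0], [0.0, 1.0]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def d_metric(x, l, h=1e-5):
    """Central-difference approximation of the derivative of g_ij along x^l."""
    xp, xm = list(x), list(x)
    xp[l] += h
    xm[l] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(2)]
            for i in range(2)]

def christoffel(x):
    """gamma[k][i][j] = (1/2) g^{kl} (d_j g_li + d_i g_lj - d_l g_ij)."""
    ginv = inverse_2x2(metric(x))
    dg = [d_metric(x, l) for l in range(2)]  # dg[l][i][j] = d_l g_ij
    return [[[0.5 * sum(ginv[k][l] * (dg[j][l][i] + dg[i][l][j] - dg[l][i][j])
                        for l in range(2))
              for j in range(2)]
             for i in range(2)]
            for k in range(2)]

theta = 0.6
gamma = christoffel((0.2, theta))
# Compare with -tan(theta) and sin(theta) cos(theta) from Theorem 41.2.2.
print(gamma[0][0][1] + math.tan(theta))
print(gamma[1][0][0] - math.sin(theta) * math.cos(theta))
```

Both printed values should be close to zero, limited only by the finite-difference error.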
[ Also must calculate the operators $\nabla$ and $\Delta$ in spherical coordinates. And also the equation of geodesic
variation, and the Jacobi fields, and the second variation of length and of energy. Also calculate the sectional
curvature, the Ricci curvature, and other curvatures. Also must calculate the sum of the angles of an arbitrary
triangle, and relate this to the area of the triangle and the sectional curvature. Also must calculate the area and
circumference of an arbitrary circle, and the ratio of area to radius, etc. And also must calculate operators
such as $\nabla(\nabla u |\nabla u|^p)$. And it would be nice to have the solution of such equations as $\Delta u + u^\gamma = 0$ in a circle,
with zero Dirichlet data, for $0 \le \gamma \le 1$. ]
[ Here must state a theorem on the slope $m = \tan\beta$ of a curve – that it is given by
$$ m = \tan\beta = \sec\theta\, \frac{d\theta}{d\phi}. $$
The proof should be done in terms of the inner product $g_{ij} X^i Y^j$. ]

41.3. Metric tensor calculation from the distance function

This section deals with the correspondence between the standard point-to-point metric (the distance
function) and the standard Riemannian metric on $S^2$.
It is well known how the point-metric $d : M \times M \to \mathbb{R}_0^+$ is derived from the Riemannian metric $g \in
T^{0,2}(M)$ for any Riemannian manifold $M$. The reverse calculation is not often shown in textbooks. (See
Theorem 38.4.6.)
It follows from Theorem 41.2.1 that the point-to-point distance function $d : S^2 \times S^2 \to [0, \pi]$ satisfies
$$ \sin^2\bigl(\tfrac12 d(p_1, p_2)\bigr) = \sin^2\bigl(\tfrac12(\theta_1 - \theta_2)\bigr) + \cos\theta_1 \cos\theta_2 \sin^2\bigl(\tfrac12(\phi_1 - \phi_2)\bigr), \tag{41.3.1} $$
or equivalently,
$$ \cos(d(p_1, p_2)) = \cos(\theta_1 - \theta_2) + \cos\theta_1 \cos\theta_2 (\cos(\phi_1 - \phi_2) - 1), \tag{41.3.2} $$
where $\psi_0 : U_0 \to V_0 \subseteq \mathbb{R}^2$ is the terrestrial coordinates chart defined in Section 41.1, $p_1, p_2 \in U_0 \subseteq S^2$ are
points in $S^2$ with coordinates $(\phi_1, \theta_1) = \psi_0(p_1)$ and $(\phi_2, \theta_2) = \psi_0(p_2)$. Unfortunately, as is usual with
distance functions, the function $d(p_1, p_2)$ is not differentiable at $p_2 = p_1$. Therefore the elementary
rule for differentiating the square of a differentiable function is of little use.
It follows from equation (41.3.1) that $\lim_{p_2 \to p_1} d(p_1, p_2) = 0$. So $d(p_1, p_2)$ is continuous with respect to $p_2$ in
a neighbourhood of $p_1$. It also follows that $d(p_1, p_2) \le 2 \arcsin\bigl(\tfrac12 ((\phi_1 - \phi_2)^2 + (\theta_1 - \theta_2)^2)^{1/2}\bigr)$ whenever the argument of the arcsine is at most 1. So all directional
derivatives of $d^2$ with respect to $\phi_2$ and $\theta_2$ are zero at $p_2 = p_1$.
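The second derivatives of $d(p_1, p_2)^2$ with respect to the coordinates of $p_2$ for fixed $p_1$ may be determined
from the following calculations.
$$ \begin{aligned}
\sin d\, \frac{\partial d}{\partial\phi_2} &= -\cos\theta_1 \cos\theta_2 \sin(\phi_1 - \phi_2) \\
\sin d\, \frac{\partial d}{\partial\theta_2} &= -\sin(\theta_1 - \theta_2) + \cos\theta_1 \sin\theta_2 (\cos(\phi_1 - \phi_2) - 1) \\
\sin d\, \frac{\partial^2 d}{\partial\phi_2^2} + \cos d\, \Bigl( \frac{\partial d}{\partial\phi_2} \Bigr)^2 &= \cos\theta_1 \cos\theta_2 \cos(\phi_1 - \phi_2) \\
\sin d\, \frac{\partial^2 d}{\partial\phi_2 \partial\theta_2} + \cos d\, \frac{\partial d}{\partial\phi_2} \frac{\partial d}{\partial\theta_2} &= \cos\theta_1 \sin\theta_2 \sin(\phi_1 - \phi_2) \\
\sin d\, \frac{\partial^2 d}{\partial\theta_2^2} + \cos d\, \Bigl( \frac{\partial d}{\partial\theta_2} \Bigr)^2 &= \cos(\theta_1 - \theta_2) + \cos\theta_1 \cos\theta_2 (\cos(\phi_1 - \phi_2) - 1).
\end{aligned} $$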
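It follows that
$$ \begin{aligned}
\lim_{\phi_2 \to \phi_1} \frac{\partial d^2}{\partial\phi_2} &= 2 \lim_{\phi_2 \to \phi_1} d\, \frac{\partial d}{\partial\phi_2} \\
&= 2 \lim_{\phi_2 \to \phi_1} \frac{d}{\sin d} \bigl( -\cos\theta_1 \cos\theta_2 \sin(\phi_1 - \phi_2) \bigr) \\
&= 0.
\end{aligned} $$
This shows that $\partial d^2/\partial\phi_2$ is continuous for $p_2$ in a neighbourhood of $p_1$. The same is true for $\partial d^2/\partial\theta_2$. It
follows that $d^2$ is $C^1$ with respect to $p_2$ in a neighbourhood of $p_1$. The second derivative with respect to $\phi_2$
may then be calculated as follows.
$$ \begin{aligned}
\frac{\partial^2 d^2}{\partial\phi_2^2} &= \lim_{\phi_2 \to \phi_1} \frac{1}{\phi_2 - \phi_1}\, 2d\, \frac{\partial d}{\partial\phi_2} \\
&= 2 \lim_{\phi_2 \to \phi_1} \frac{d}{\sin d}\, \frac{-\cos\theta_1 \cos\theta_2 \sin(\phi_1 - \phi_2)}{\phi_2 - \phi_1} \\
&= 2 \cos\theta_1 \cos\theta_2,
\end{aligned} $$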

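which equals $2 \cos^2\theta_1$ for $\theta_2 = \theta_1$. This agrees with the limit of $\partial^2 d^2/\partial\phi_2^2$ as $p_2 \to p_1$. So this partial
derivative is continuous at $p_2 = p_1$. It may similarly be shown that $\partial^2 d^2/\partial\phi_2 \partial\theta_2 = 0$ and $\partial^2 d^2/\partial\theta_2^2 = 2$
at $p_2 = p_1$, which also agree with the limits of the corresponding second derivatives. Therefore $d^2$ is a $C^2$
function for $p_2$ in a neighbourhood of $p_1$, and the Hessian matrix for $d^2(p_1, p_2)$ with respect to $p_2$ satisfies
$$ \frac{1}{2} \begin{pmatrix}
\dfrac{\partial^2 d^2}{\partial\phi_2^2} & \dfrac{\partial^2 d^2}{\partial\phi_2 \partial\theta_2} \\
\dfrac{\partial^2 d^2}{\partial\theta_2 \partial\phi_2} & \dfrac{\partial^2 d^2}{\partial\theta_2^2}
\end{pmatrix}
= \begin{pmatrix} \cos^2\theta_1 & 0 \\ 0 & 1 \end{pmatrix}, $$
which agrees with the standard Riemannian metric tensor in accordance with Theorem 38.4.6.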
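The recovery of the metric tensor from the distance function can be illustrated numerically. The following Python sketch approximates half the Hessian of the squared distance at coincident points by central differences, using the arcsine form of the distance from Theorem 41.2.1. The step size `h` and the function names are choices made here for illustration.

```python
import math

def dist(p1, p2):
    """Distance within the sphere surface between points given as (phi, theta)."""
    (phi1, th1), (phi2, th2) = p1, p2
    s = (math.sin(0.5 * (th1 - th2)) ** 2
         + math.cos(th1) * math.cos(th2) * math.sin(0.5 * (phi1 - phi2)) ** 2)
    return 2.0 * math.asin(math.sqrt(min(1.0, s)))

def half_hessian_d2(p1, h=1e-3):
    """Central-difference estimate of (1/2) Hessian of d(p1, p2)^2 at p2 = p1."""
    def f(dphi, dth):
        return dist(p1, (p1[0] + dphi, p1[1] + dth)) ** 2
    f0 = f(0.0, 0.0)
    h_pp = (f(h, 0.0) - 2.0 * f0 + f(-h, 0.0)) / h ** 2
    h_tt = (f(0.0, h) - 2.0 * f0 + f(0.0, -h)) / h ** 2
    h_pt = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h ** 2)
    return [[0.5 * h_pp, 0.5 * h_pt], [0.5 * h_pt, 0.5 * h_tt]]

p1 = (0.5, 0.8)
estimate = half_hessian_d2(p1)
expected = [[math.cos(p1[1]) ** 2, 0.0], [0.0, 1.0]]
for row_est, row_exp in zip(estimate, expected):
    print(row_est, row_exp)
```

The estimated matrix should agree with the diagonal matrix of the metric components to within the accuracy of the finite-difference scheme.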
[ There should be a general theorem to guarantee that the square $d^2$ of a distance function $d$ is $C^2$ at the origin if $d$
has the form $d(p_1, p_2) = f(g_1(p_1, p_2) h_1(x_2 - x_1) + g_2(p_1, p_2) h_2(y_2 - y_1))$, where $(x_k, y_k) = \psi(p_k)$ for $k = 1, 2$,
the functions $f$, $g_1$, $g_2$, $h_1$ and $h_2$ are $C^\infty$ with $f'(0) = h_1'(0) = h_2'(0)$, $g_1$ and $g_2$ are non-negative, and
$f(z) \ge 0$ for all $z \ge 0$. Something along these lines might be useful for dealing with metrics in Riemannian
spaces. ]
41.4. The principal fibre bundle in terrestrial coordinates

Affine connections are defined on principal fibre bundles. (See Definition 23.9.2 for topological PFBs.) The
PFB of interest is the tangent n-frame bundle of an n-dimensional manifold, acted on by the group GL(n).
Three sets of manifold charts are required for the PFB (P, π, B) for B = S 2 . These charts are of the form
ψB : B → IR2 for the manifold B, ψP : P → IR6 for the total space P , and ψG : G → IR4 for the structure
group G.
The chart ψB is defined in equation (41.1.1). Its inverse ψ̄B is defined by equation (41.1.2). This chart leaves
gaps at the poles, but this is not a serious problem because this section is only intended to demonstrate the
basic principles.
[ Maybe in the following should use α, β, γ instead of u, v and w. Then use u, v instead of current α and β. ]
The set $P$ of coordinate frames for the manifold $B$ is the set of ordered pairs of independent tangent vectors
at each point of $B$. This set may be expressed as
$$ P = \bigl\{ (v_{11} e_\phi(b) + v_{21} e_\theta(b),\ v_{12} e_\phi(b) + v_{22} e_\theta(b));\ v \in M_{2,2}(\mathbb{R}),\ \det(v) \ne 0,\ b \in B \bigr\}, $$
where $e_\phi(b)$ and $e_\theta(b)$ are the unit tangent vectors defined in equation (41.2.2) for terrestrial coordinates
at $b \in B$. Note that the coefficients $v_{21}$ and $v_{12}$ are intentionally ‘out of order’, because row vectors and
matrix multiplication on the right are assumed. Thus
$$ (v_{11} e_\phi(b) + v_{21} e_\theta(b),\ v_{12} e_\phi(b) + v_{22} e_\theta(b)) = (e_\phi(b), e_\theta(b)) \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}. $$
$P$ is a 6-dimensional manifold which can be coordinatized by the chart $\psi_P : P \to \mathbb{R}^6$ defined by
$$ \psi_P : (v_{11} e_\phi(b) + v_{21} e_\theta(b),\ v_{12} e_\phi(b) + v_{22} e_\theta(b)) \mapsto (\phi, \theta, v_{11}, v_{12}, v_{21}, v_{22}), $$
where $(\phi, \theta) = \psi_B(b)$. Standard lexicographic ordering is used for the elements of the matrix $v$.

The projection map $\pi : P \to B$ satisfies $\pi : (v_{11} e_\phi(b) + v_{21} e_\theta(b),\ v_{12} e_\phi(b) + v_{22} e_\theta(b)) \mapsto b$.
The structure group $G$ is $GL(2) = \{ v \in M_{2,2}(\mathbb{R});\ \det(v) \ne 0 \}$. This set is adequately coordinatized by the
matrix elements. Thus $\psi_G : v \mapsto (v_{11}, v_{12}, v_{21}, v_{22})$.
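The action $\mu : G \times P \to P$ is defined with matrix multiplication. For $g \in G$ and $p \in P$, define
$$ \begin{aligned}
\mu(g, p) = p.g &= (w_{11} e_\phi(b) + w_{21} e_\theta(b),\ w_{12} e_\phi(b) + w_{22} e_\theta(b)) \\
&= (e_\phi(b), e_\theta(b)) \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix},
\end{aligned} $$
where
$$ p = (e_\phi(b), e_\theta(b)) \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}, \qquad
g = \begin{pmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{pmatrix}, $$
and
$$ w = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}
= \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix} \cdot \begin{pmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{pmatrix} = v \cdot u. $$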
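The right action of the structure group on frames can be modelled with plain matrices. The following Python sketch is a toy model under stated assumptions: a frame is represented as a pair consisting of the base-point coordinates and the coefficient matrix v, and the function names `act` and `project` are introduced here for illustration only.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def act(p, g):
    """Right action p.g: the base point is kept and the matrix v becomes v.g."""
    b, v = p
    if det(g) == 0:
        raise ValueError("g must be an invertible 2x2 matrix")
    return (b, matmul(v, g))

def project(p):
    """Projection from the total space to the base space."""
    return p[0]

p = ((0.3, 0.4), [[1, 2], [3, 4]])  # a frame at (phi, theta) = (0.3, 0.4)
g = [[0, 1], [1, 1]]
h = [[2, 0], [1, 3]]

# Acting by g and then by h agrees with acting once by the product g.h.
print(act(act(p, g), h) == act(p, matmul(g, h)))
# The action moves frames only within the fibre over the same base point.
print(project(act(p, g)) == project(p))
```

Both comparisons print `True`, since integer matrix multiplication is exactly associative and the base point is untouched by the action.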
Additionally, the principal fibre bundle $(P, \pi, B)$ requires a fibre atlas $A_{P,G}$, each of whose charts maps the
total space $P$ to the structure group $G$. The single fibre bundle chart $\psi_P^G : P \to G$ may be defined so that
$\psi_P^G : (v_{11} e_\phi(b) + v_{21} e_\theta(b),\ v_{12} e_\phi(b) + v_{22} e_\theta(b)) \mapsto v$, where $v = \psi_G^{-1}(v_{11}, v_{12}, v_{21}, v_{22}) \in GL(2)$.
The tangent bundle on $B$ can be coordinatized with $\psi_{T(B)} : T(B) \to \mathbb{R}^4$, where for $b \in B$ and $\alpha \in T_b(B)$,
$\psi_{T(B)} : (b, \alpha) \mapsto (\phi, \theta, \alpha_\phi, \alpha_\theta)$, where $(\alpha_\phi, \alpha_\theta)$ is a tuple of contravariant coordinates for $\alpha$ so that $\alpha =
\alpha_\phi e_\phi(b) + \alpha_\theta e_\theta(b)$.
The definition of an affine connection on the principal fibre bundle $(P, \pi, B)$ will require a tangent bundle
on $P$. Since there is a differentiable manifold chart $\psi_P : P \to \mathbb{R}^6$, it is easy to construct a tangent bundle
$T(P)$ of $P$ as the set of $(p, \beta)$ such that $p \in P$ and $\beta \in T_p(P)$. This tangent bundle $T(P)$ must now be
coordinatized with a chart.
[ Maybe put eφ , eθ instead of ∂φ , ∂θ in the next paragraph? ]
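The sequence of six vectors $(\partial_\phi(p), \partial_\theta(p), \partial_{v_{11}}(p), \partial_{v_{12}}(p), \partial_{v_{21}}(p), \partial_{v_{22}}(p))$ is a basis for $T_p(P)$ for each $p \in P$.
For brevity, this basis will be denoted $(\partial_\phi, \partial_\theta, \partial_{v_{11}}, \partial_{v_{12}}, \partial_{v_{21}}, \partial_{v_{22}})$, when the point $p$ is implicit. Each vector
$\beta \in T_p(P)$ may be mapped to $\mathbb{R}^6$ by $\beta \mapsto (\beta_\phi, \beta_\theta, \beta_{11}, \beta_{12}, \beta_{21}, \beta_{22})$, where these contravariant coordinates
are chosen so that
$$ \beta = \beta_\phi \partial_\phi(p) + \beta_\theta \partial_\theta(p) + \beta_{11} \partial_{v_{11}}(p) + \beta_{12} \partial_{v_{12}}(p) + \beta_{21} \partial_{v_{21}}(p) + \beta_{22} \partial_{v_{22}}(p). $$
The coordinatization of $P$ is provided already by $\psi_P$. So now this can be combined with the above coordinates
for $T_p(P)$ to give the combined chart $\psi_{T(P)} : T(P) \to \mathbb{R}^{12}$ defined by
$$ \psi_{T(P)} : (p, \beta) \mapsto (\phi, \theta, v_{11}, v_{12}, v_{21}, v_{22}, \beta_\phi, \beta_\theta, \beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}). $$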
41.4.1 Remark: A rather amusing thing about the approach to connections taken in this section is the
fact that this is the coordinate-free way of doing it. But it involves heaps more coordinates than the traditional
coordinate method using the Γ symbol as in Section 41.2. This just shows that the obsession that some
people have with “coordinate-free” methods actually makes things worse in practice. It seems like the popular
“coordinate-free” philosophy arose from the 19th century battle between the synthetic geometers (coordinate-
free) and the analytical geometers (using coordinates). Some people are still trying to do differential geometry
along 19th century synthetic lines. It’s a kind of nostalgia maybe. But it’s all educational.
41.5. The Riemannian connection in terrestrial coordinates

41.5.1 Remark: The Riemannian connection for S 2 defines parallel transport of tangent vectors on S 2 .
Figure 41.5.1 shows parallel transport of a vector along two paths in S 2 . Note that the region bounded by
the paths has area π/4, which equals the difference in orientation of the axes at the end-points.
Figure 41.5.1 Parallel transport in $S^2$ along the boundary of $\phi \in (0, \pi/2)$, $\theta \in (\pi/6, \pi/2)$. (Labels in the figure: Start here, Finish here, Hamburg, Hà Nội.)

This section uses the notation of Section 41.4 for the principal fibre bundle (P, π, B) for B = S 2 with
structure group G = GL(2).
41.5.2 Remark: An affine connection ρ̄ is a function defined on the 6-dimensional total space P so that
for each p ∈ P , ρ̄p is a map from the 2-dimensional tangent space Tπ(p) (B) to the 6-dimensional tangent
space Tp (P ).
The components of $\bar\rho_p$ in the horizontal direction are quite easy to determine. As a point $b$ moves in $B$,
the corresponding element $p \in P$ must move so that $\pi(p) = b$. Therefore the partial derivative of the $\phi$
component of the 6-tuple $(\phi, \theta, v_{11}, v_{12}, v_{21}, v_{22})$ for a point in $P$ with respect to the $\phi$ component of $(\phi, \theta)$
for a point in $B$ must equal 1. Its derivative with respect to $\theta$ must be 0. The derivatives of the $\theta$
component are similar.
It remains to determine the derivatives of the coordinates (v11 , v12 , v21 , v22 ) of elements in P with respect to
coordinates (φ, θ) of points in B. The v-coordinates must change in such a way that the vectors (v11 , v12 ) and
(v21 , v22 ) maintain parallelism as the base point (φ, θ) changes. In the case of the Levi-Civita connection,
these vectors will remain orthogonal if they are initially orthogonal. This implies that the derivative of
(v11 , v12 , v21 , v22 ) is an element of the Lie algebra of the Lie group SO(2). In other words, the derivative
must be an anti-symmetric matrix.
[ Use superscripts for α and β in the following? The map ρ̄p should be from IR2 to IR6 . Must use the
coordinate vectors ∂φ and ∂θ so that the matrices are as follows. ]
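$$ \alpha_\phi \begin{pmatrix} 0 & -\tan\theta \\ \sin\theta \cos\theta & 0 \end{pmatrix}
+ \alpha_\theta \begin{pmatrix} -\tan\theta & 0 \\ 0 & 0 \end{pmatrix}. $$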
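41.5.3 Remark: The Riemannian connection $\bar\rho$ for $S^2$ satisfies the following.

$$ \bar\rho_p\bigl(\psi_{T(B)}^{-1}(\phi, \theta, \alpha_\phi, \alpha_\theta)\bigr) = \psi_{T(P)}^{-1}(\phi, \theta, v_{11}, v_{12}, v_{21}, v_{22}, \alpha_\phi, \alpha_\theta, \beta_{11}, \beta_{12}, \beta_{21}, \beta_{22}), $$
where $p = \psi_P^{-1}((\phi, \theta, v_{11}, v_{12}, v_{21}, v_{22})) \in P$ and

$$ \begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{pmatrix}
= \Bigl( \alpha_\phi \begin{pmatrix} 0 & -\sin\theta \\ \sin\theta & 0 \end{pmatrix}
+ \alpha_\theta \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \Bigr)
\cdot \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}. \tag{41.5.1} $$
The $\beta$ matrix is clearly linear with respect to the tangent vector coordinates $\alpha$, and invariant under right
action by the structure group.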
[ Really should do the map ρ̄p only from one tangent space to the other and ignore the point coordinates. So
it should map a 2-d space to a 6-d space. ]
The fact that this connection is an orthogonal connection is due to the fact that the matrices in equation
(41.5.1) are real and anti-symmetric. Anti-symmetric $2 \times 2$ matrices are in fact the generators of $SO(2)$,
which follows from the well-known formulas from linear ODE systems theory:
$$ \exp\Bigl( \lambda \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \Bigr)
= \begin{pmatrix} \cos\lambda & -\sin\lambda \\ \sin\lambda & \cos\lambda \end{pmatrix} $$
and
$$ \frac{d}{d\lambda} \begin{pmatrix} \cos\lambda & -\sin\lambda \\ \sin\lambda & \cos\lambda \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \cos\lambda & -\sin\lambda \\ \sin\lambda & \cos\lambda \end{pmatrix}. $$
With the constraint that the connection be orthogonal, there are at most 2 independent parameters in
the expression for the $\beta$ matrix. In the case of general affine connections, there are clearly 8 independent
parameters.
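The exponential formula for the generator of rotations can be checked by summing the power series of the matrix exponential directly. The following Python sketch is a numerical illustration only. The number of series terms and the value of λ are arbitrary choices made here.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm_series(a, terms=30):
    """Truncated power series: sum of a^n / n! for n = 0, ..., terms - 1."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

lam = 1.2
generator = [[0.0, -lam], [lam, 0.0]]  # lam times the anti-symmetric generator
rotation = [[math.cos(lam), -math.sin(lam)],
            [math.sin(lam), math.cos(lam)]]

approx = expm_series(generator)
error = max(abs(approx[i][j] - rotation[i][j])
            for i in range(2) for j in range(2))
print(error)
```

The printed error should be at rounding-error level, since thirty terms of the series are far more than enough for this value of λ.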
[ Should cover linear ODE systems theory in the Lie groups chapter. ]
With the above connection $\bar\rho$, it is possible to generate parallel transport along paths. For example, consider
the curve $\gamma : \mathbb{R} \to B$ defined by $\gamma : t \mapsto \psi_B^{-1}(t, \theta_1)$. This is a curve around the latitude line $\theta = \theta_1$
for $\theta_1 \in (-\pi/2, \pi/2)$. As discussed in Section 35.8, it should be possible to define a suitable function
$$ \hat\gamma : \mathbb{R}^2 \to \{ f : \pi^{-1}(\{b_1\}) \approx \pi^{-1}(\{b_2\});\ b_1, b_2 \in B \}, $$
such that $\mathrm{Dom}(\hat\gamma_{s,t}) = \pi^{-1}(\{\gamma(s)\})$ and $\mathrm{Range}(\hat\gamma_{s,t}) = \pi^{-1}(\{\gamma(t)\})$ for all $s, t \in \mathbb{R}$.
The parallel transport function $\hat\gamma$ will be such that for $s = 0$, each frame in $\pi^{-1}(\{\gamma(0)\})$ is rotated by
angle $t \sin\theta_1$. So the difference in angle for any given $s, t \in \mathbb{R}$ will be $(t - s) \sin\theta_1$. Must show that this
satisfies the equations for parallel transport in terms of $\bar\rho$. In terms of $\psi_B$, obtain $\gamma'(t) = \cos\theta\, e_\phi(\gamma(t))$. This
will justify the parallel translation in Figure 41.5.1. The matrix for the parallel translation operation from
$\gamma(s)$ to $\gamma(t)$ is
$$ \begin{pmatrix} \cos\lambda & -\sin\lambda \\ \sin\lambda & \cos\lambda \end{pmatrix}
= \begin{pmatrix} \cos\bigl((t - s) \sin\theta_1\bigr) & -\sin\bigl((t - s) \sin\theta_1\bigr) \\
\sin\bigl((t - s) \sin\theta_1\bigr) & \cos\bigl((t - s) \sin\theta_1\bigr) \end{pmatrix}. $$
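The rotation angle along a latitude line can be examined by integrating the classical parallel transport equation with the Christoffel symbols of Theorem 41.2.2. The following Python sketch is an illustration under stated assumptions: it uses the coordinate form $dV^k/dt = -\Gamma_{ij}^k (dx^i/dt) V^j$ of the transport equation rather than the map $\bar\rho$, a fourth-order Runge-Kutta integrator, and an arbitrary step count. Only the magnitude of the rotation angle is compared, because the sign depends on whether components or frames are being rotated.

```python
import math

def transport_along_latitude(theta1, v0, t_end, steps=2000):
    """Parallel transport of coordinate components along phi = t, theta = theta1.

    With dx/dt = (1, 0), the only contributing Christoffel symbols are
    Gamma^1_12 = -tan(theta1) and Gamma^2_11 = sin(theta1) cos(theta1).
    """
    tan_t = math.tan(theta1)
    sc = math.sin(theta1) * math.cos(theta1)

    def rhs(v):
        return (tan_t * v[1], -sc * v[0])

    h = t_end / steps
    v = tuple(v0)
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0,
             v[1] + h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0)
    return v

theta1, t = math.pi / 6, 2.0
# Start with the unit vector e_phi, whose coordinate components are (1/cos, 0).
v = transport_along_latitude(theta1, (1.0 / math.cos(theta1), 0.0), t)
# Components with respect to the unit vectors e_phi and e_theta.
a, b = math.cos(theta1) * v[0], v[1]

print(abs(math.atan2(b, a)), t * math.sin(theta1))  # rotation angle magnitude
print(a * a + b * b)                                # length is preserved
```

The two numbers on the first line should agree closely, and the squared length on the second line should remain close to 1.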
[ Must show that this satisfies the parallel transport equation for given ρ̄. ]
[ Putting s = 0, t = φ above gives λ = φ sin θ. ]
[ The map for a given fixed initial p ∈ π −1 ({γ(s)}) for a given s ∈ IR is called a “lift” in EDM2 [35],
section 80.C. This is denoted γp∗ and is a curve in P such that γp∗ (s) ∈ π −1 ({γ(s)}) for all s ∈ Dom(γ). ]
[ Also deal here with the “horizontal subspace” style of connection definition Qx . ]
41.6. Coordinates for polar exponential maps

Terrestrial coordinates have bad discontinuities at the poles which are difficult to remove. Therefore to
construct a 2-chart atlas for $S^2$, it is necessary to introduce charts which are better behaved at the poles.
Define the 2-chart atlas $(\psi_1, \psi_2)$ for $S^2$, where $\psi_1 : S^2 \setminus \{(0, 0, -1)\} \to \mathbb{R}^2$ and $\psi_2 : S^2 \setminus \{(0, 0, 1)\} \to \mathbb{R}^2$ are
defined by
$$ \psi_1(x, y, z) = \begin{pmatrix} x (x^2 + y^2)^{-1/2} \arccos(z) \\ y (x^2 + y^2)^{-1/2} \arccos(z) \end{pmatrix}
= \begin{pmatrix} x \arccos(z) (1 - z^2)^{-1/2} \\ y \arccos(z) (1 - z^2)^{-1/2} \end{pmatrix} $$
$$ \psi_2(x, y, z) = \begin{pmatrix} x (x^2 + y^2)^{-1/2} (\pi - \arccos(z)) \\ y (x^2 + y^2)^{-1/2} (\pi - \arccos(z)) \end{pmatrix}
= \begin{pmatrix} x (\pi - \arccos(z)) (1 - z^2)^{-1/2} \\ y (\pi - \arccos(z)) (1 - z^2)^{-1/2} \end{pmatrix} $$
for $|z| \ne 1$, and $\psi_1(0, 0, 1) = \psi_2(0, 0, -1) = (0, 0)$. These are exponential maps radiating from $(0, 0, 1)$ and
$(0, 0, -1)$ respectively. The ranges of the charts $\psi_1$ and $\psi_2$ are illustrated in Figure 41.6.1.
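Figure 41.6.1 Ranges of charts $\psi_1$ and $\psi_2$ for $S^2$. (Two panels with axes $\xi = \psi_i^1(x, y, z)$ and $\eta = \psi_i^2(x, y, z)$. Left panel: chart $\psi_1$, centred on the North Pole $\theta = \pi/2$, with outer boundary $\theta = -\pi/2$. Right panel: chart $\psi_2$, centred on the South Pole $\theta = -\pi/2$, with outer boundary $\theta = \pi/2$. The circle $\theta = 0$ and the points $(1, 0, 0)$ and $(0, 1, 0)$ are marked in both panels.)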
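The inverses of the charts $\psi_1$ and $\psi_2$ are as follows.

$$ \bar\psi_1(\xi, \eta) = \psi_1^{-1}(\xi, \eta) = \begin{pmatrix}
\xi (\xi^2 + \eta^2)^{-1/2} \sin\bigl((\xi^2 + \eta^2)^{1/2}\bigr) \\
\eta (\xi^2 + \eta^2)^{-1/2} \sin\bigl((\xi^2 + \eta^2)^{1/2}\bigr) \\
\cos\bigl((\xi^2 + \eta^2)^{1/2}\bigr)
\end{pmatrix} $$
$$ \bar\psi_2(\xi, \eta) = \psi_2^{-1}(\xi, \eta) = \begin{pmatrix}
\xi (\xi^2 + \eta^2)^{-1/2} \sin\bigl((\xi^2 + \eta^2)^{1/2}\bigr) \\
\eta (\xi^2 + \eta^2)^{-1/2} \sin\bigl((\xi^2 + \eta^2)^{1/2}\bigr) \\
-\cos\bigl((\xi^2 + \eta^2)^{1/2}\bigr)
\end{pmatrix}. $$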
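The two polar exponential charts and their inverses can be tested by a numerical round trip. The following Python sketch is illustrative only. It takes the third component of the inverse of the second chart to be $-\cos r$ with $r = (\xi^2 + \eta^2)^{1/2}$, which is what $z = \cos(\pi - r)$ gives and what is needed for the image to lie on the unit sphere. The function names are chosen here and are not from the text.

```python
import math

def psi1(x, y, z):
    """Chart centred on the north pole (0, 0, 1)."""
    s = math.hypot(x, y)
    if s == 0.0:
        if z < 0.0:
            raise ValueError("psi1 is undefined at the south pole")
        return (0.0, 0.0)
    r = math.acos(z)
    return (x * r / s, y * r / s)

def psi1_inv(xi, eta):
    r = math.hypot(xi, eta)
    if r == 0.0:
        return (0.0, 0.0, 1.0)
    return (xi * math.sin(r) / r, eta * math.sin(r) / r, math.cos(r))

def psi2(x, y, z):
    """Chart centred on the south pole (0, 0, -1)."""
    s = math.hypot(x, y)
    if s == 0.0:
        if z > 0.0:
            raise ValueError("psi2 is undefined at the north pole")
        return (0.0, 0.0)
    r = math.pi - math.acos(z)
    return (x * r / s, y * r / s)

def psi2_inv(xi, eta):
    r = math.hypot(xi, eta)
    if r == 0.0:
        return (0.0, 0.0, -1.0)
    # Assumption: the third component is -cos(r), so that z = cos(pi - r).
    return (xi * math.sin(r) / r, eta * math.sin(r) / r, -math.cos(r))

zeta = (0.9, -1.4)
for chart, inverse in ((psi1, psi1_inv), (psi2, psi2_inv)):
    point = inverse(*zeta)
    back = chart(*point)
    norm2 = sum(c * c for c in point)
    print(norm2, max(abs(u - v) for u, v in zip(zeta, back)))
```

For each chart the squared norm should be close to 1 and the round-trip error should be at rounding-error level.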

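The maps $\tau_i = \psi_i \circ \psi_0^{-1}$, where the terrestrial coordinate chart $\psi_0$ is as defined in Section 41.1, are as follows.
It is convenient to add subscripts to $\xi$ and $\eta$ to indicate which chart $\psi_i$ they belong to.
$$ \begin{pmatrix} \xi_1 \\ \eta_1 \end{pmatrix} = \tau_1(\phi, \theta) = (\psi_1 \circ \psi_0^{-1})(\phi, \theta)
= \begin{pmatrix} (\pi/2 - \theta) \cos\phi \\ (\pi/2 - \theta) \sin\phi \end{pmatrix} $$
$$ \begin{pmatrix} \xi_2 \\ \eta_2 \end{pmatrix} = \tau_2(\phi, \theta) = (\psi_2 \circ \psi_0^{-1})(\phi, \theta)
= \begin{pmatrix} (\pi/2 + \theta) \cos\phi \\ (\pi/2 + \theta) \sin\phi \end{pmatrix}. $$
The maps $\tau_i^{-1} = \psi_0 \circ \psi_i^{-1}$, where the map $\psi_0$ is as defined in Section 41.1, are as follows.
$$ \begin{pmatrix} \phi \\ \theta \end{pmatrix} = \tau_1^{-1}(\xi_1, \eta_1) = (\psi_0 \circ \psi_1^{-1})(\xi_1, \eta_1)
= \begin{pmatrix} \arctan(\xi_1, \eta_1) \\ \pi/2 - (\xi_1^2 + \eta_1^2)^{1/2} \end{pmatrix} $$
$$ \begin{pmatrix} \phi \\ \theta \end{pmatrix} = \tau_2^{-1}(\xi_2, \eta_2) = (\psi_0 \circ \psi_2^{-1})(\xi_2, \eta_2)
= \begin{pmatrix} \arctan(\xi_2, \eta_2) \\ -\pi/2 + (\xi_2^2 + \eta_2^2)^{1/2} \end{pmatrix}. $$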
For embedded manifolds, it is not necessary to introduce the concept of a set graft (as in Definitions 6.10.5
and 15.11.2) to construct the set of points in the manifold. It is, however, generally very useful to use set
grafts to construct tangent bundles for such spaces. Nevertheless, it is instructive to construct S 2 here from
the graft of the two polar exponential maps ψ1 and ψ2 .
A set graft indicates which points on two patch sets are to be identified and regarded as the same point. In
this case, the set graft must be a subset $X$ of the partial Cartesian product $\mathring{\times}_{i=1,2} V_i$, where $V_i = \mathrm{Range}(\psi_i)$
for $i = 1, 2$. (See Definition 6.10.1 for ‘partial Cartesian product’.)
Let $(\xi_1, \eta_1) \in V_1 \setminus \{(0, 0)\}$. The corresponding point in $V_2$ is $(\xi_2, \eta_2) = \alpha_{12}(\xi_1, \eta_1) \in V_2 \setminus \{(0, 0)\}$, where
$\alpha_{12} : V_1 \setminus \{(0, 0)\} \to V_2 \setminus \{(0, 0)\}$ is defined by
$$ \alpha_{12}(\xi_1, \eta_1) = (\pi - |(\xi_1, \eta_1)|)\, \frac{(\xi_1, \eta_1)}{|(\xi_1, \eta_1)|}. \tag{41.6.1} $$
Let $\zeta_1 = (\xi_1, \eta_1)$ and $\zeta_2 = (\xi_2, \eta_2)$. Then
$$ \zeta_2 = \alpha_{12}(\zeta_1) = (\pi - |\zeta_1|)\, \frac{\zeta_1}{|\zeta_1|}. $$
Coincidentally, it happens that $\alpha_{12}^{-1} = \alpha_{12}$. The graft set $X$ is defined by
$$ X = \bigl\{ ((0, 0), \cdot),\ (\cdot, (0, 0)) \bigr\} \cup \bigl\{ ((\xi_1, \eta_1), \alpha_{12}(\xi_1, \eta_1));\ (\xi_1, \eta_1) \in V_1 \setminus \{(0, 0)\} \bigr\}, $$
where the dot ‘·’ represents an undefined value in a partial sequence in the partial Cartesian product $\mathring{\times}_{i=1,2} V_i$.
(See Definition 6.10.5.) If $X$ is given the graft topology from $V_1$ and $V_2$, then $X \approx S^2$.
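The identification rule for the two polar charts can be checked numerically: applying the transition map twice should return the original coordinates, and a pair of identified coordinates should describe the same point of the sphere. The following Python sketch is an illustration only. The helper names `alpha12`, `point_from_chart1` and `point_from_chart2` are introduced here, and the second helper assumes that the inverse of the south-pole chart has third component $-\cos r$.

```python
import math

def alpha12(xi, eta):
    """Transition map (41.6.1) between the two polar chart ranges."""
    r = math.hypot(xi, eta)
    if not 0.0 < r < math.pi:
        raise ValueError("alpha12 is defined only for 0 < |zeta| < pi")
    scale = (math.pi - r) / r
    return (scale * xi, scale * eta)

def point_from_chart1(xi, eta):
    """Sphere point with north-pole chart coordinates (xi, eta), r > 0."""
    r = math.hypot(xi, eta)
    return (xi * math.sin(r) / r, eta * math.sin(r) / r, math.cos(r))

def point_from_chart2(xi, eta):
    """Sphere point with south-pole chart coordinates (xi, eta), r > 0."""
    r = math.hypot(xi, eta)
    return (xi * math.sin(r) / r, eta * math.sin(r) / r, -math.cos(r))

zeta1 = (0.9, -1.4)
zeta2 = alpha12(*zeta1)

# The map is its own inverse.
back = alpha12(*zeta2)
print(max(abs(u - v) for u, v in zip(zeta1, back)))

# Identified coordinates describe the same point of the sphere.
p, q = point_from_chart1(*zeta1), point_from_chart2(*zeta2)
print(max(abs(u - v) for u, v in zip(p, q)))
```

Both printed differences should be at rounding-error level.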
This graft set $X$ may be used as the base set for the manifold $S^2$, but in practice it is too clumsy. It is
unnecessary because we already have the embedded subset of $\mathbb{R}^3$ to focus on as the set of points in the
manifold. However, it is not quite so clear what one should take to be the set of points in the tangent
bundle. The bundle itself may be defined as a set of ordered pairs $(p, \partial_{p,v,\psi})$ where $\partial_{p,v,\psi}$ is the tangent
vector at $p$ with coordinates $v \in \mathbb{R}^2$ for chart $\psi$. But when it is time to talk about the tangent bundle of
the tangent bundle, things become somewhat muddled. Then it is better to have a concrete set to point to
as representing the tangent bundle in a coordinate-independent sense, since the form $(p, \partial_{p,v,\psi})$ can be valid
only in the context of a single chart.
[ Near here, should show that ψ2 ◦ ψ1−1 is C ∞ etc., and calculate the derivatives of these chart transition
functions. ]

41.7. The global tangent bundle

This section uses the 2-chart atlas {ψ1 , ψ2 } for S 2 which is defined in Section 41.6.
The manifold charts $\psi_1$ and $\psi_2$ may be extended to tangent bundle charts which look like $B_{0,\pi}^2 \times \mathbb{R}^2$ for $\psi_1$
and $\psi_2$ with $G = GL(2)$ or $O(2)$. (The groups $SL(2)$ and $SO(2)$ are unsuitable unless the orientation of $\psi_2$
is flipped.)
For $C^1$ functions $f : S^2 \to \mathbb{R}$, consider the derivatives $v^i (\partial/\partial\zeta_j^i)(f \circ \psi_j^{-1}(\zeta_j))$, where $\zeta_j = (\xi, \eta) \in \mathrm{Range}(\psi_j)$.
There is a serious problem here. The general definition of a tangent vector regards the points of a manifold
as being abstract points. But when calculating derivatives such as $(\partial/\partial\zeta_j^i)(f \circ \psi_j^{-1}(\zeta_j))$, it is only possible
to apply rules such as the composition rule for derivatives if the points of the manifold, such as $S^2$, are
coordinatized. In the case of an embedded manifold, the coordinates of the ambient space, such as $\mathbb{R}^3$, are
not completely adequate. This is especially true in the case of derivatives of order greater than 1, but even
first order derivatives have the difficulty that they must first be extended from the manifold into the ambient
space.
The requirement for coordinates to be defined on a manifold in order to define tangent vectors as derivative
operators means that a chart is required. In other words, there is in practice really no such thing as abstract
points. For practical calculations, it is always necessary to work within particular charts. Even defining the
value of a function on a manifold requires coordinates, unless it is some simple form of function, such as
“distance from a given point”, for instance. Thus everything is ultimately reduced to Cartesian coordinates
in any real calculations. All of the theoretical definitions for manifolds in terms of abstract points are valuable
only for clear thinking, not for any practical calculations.
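Using the chart ψ in Section 41.1, the tangent vectors from the chart ψ1 may be calculated by elementary
calculus from the formula $f \circ \psi_1^{-1} = (f \circ \psi^{-1}) \circ (\psi \circ \psi_1^{-1})$ as follows.
$$\begin{aligned}
v^1 \frac{\partial}{\partial\xi_1} + v^2 \frac{\partial}{\partial\eta_1}
&= v^1 \Bigl( \frac{\partial\phi}{\partial\xi} \frac{\partial}{\partial\phi}
            + \frac{\partial\theta}{\partial\xi} \frac{\partial}{\partial\theta} \Bigr)
 + v^2 \Bigl( \frac{\partial\phi}{\partial\eta} \frac{\partial}{\partial\phi}
            + \frac{\partial\theta}{\partial\eta} \frac{\partial}{\partial\theta} \Bigr) \\
&= v^1 \Bigl( \frac{-\eta}{\xi^2 + \eta^2} \frac{\partial}{\partial\phi}
            + \frac{-\xi}{(\xi^2 + \eta^2)^{1/2}} \frac{\partial}{\partial\theta} \Bigr)
 + v^2 \Bigl( \frac{\xi}{\xi^2 + \eta^2} \frac{\partial}{\partial\phi}
            + \frac{-\eta}{(\xi^2 + \eta^2)^{1/2}} \frac{\partial}{\partial\theta} \Bigr) \\
&= v^1 \Bigl( \frac{-\sin\phi}{\pi/2 - \theta} \frac{\partial}{\partial\phi}
            - \cos\phi \frac{\partial}{\partial\theta} \Bigr)
 + v^2 \Bigl( \frac{\cos\phi}{\pi/2 - \theta} \frac{\partial}{\partial\phi}
            - \sin\phi \frac{\partial}{\partial\theta} \Bigr).
\end{aligned}$$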
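Similarly, differentiation of $f \circ \psi_2^{-1}$ yields
$$v^1 \frac{\partial}{\partial\xi_2} + v^2 \frac{\partial}{\partial\eta_2}
= v^1 \Bigl( \frac{-\sin\phi}{\pi/2 + \theta} \frac{\partial}{\partial\phi}
           + \cos\phi \frac{\partial}{\partial\theta} \Bigr)
+ v^2 \Bigl( \frac{\cos\phi}{\pi/2 + \theta} \frac{\partial}{\partial\phi}
           + \sin\phi \frac{\partial}{\partial\theta} \Bigr).$$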
Subscripts have been added for the vectors ∂/∂ξ and ∂/∂η to indicate which chart they belong to. These
tangent vectors may be expressed in terms of the terrestrial chart tangent vectors as follows.
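$$\begin{aligned}
\frac{\partial}{\partial\xi_1} &= \frac{-\sin\phi}{\pi/2 - \theta} \frac{\partial}{\partial\phi} - \cos\phi \frac{\partial}{\partial\theta}
& \frac{\partial}{\partial\xi_2} &= \frac{-\sin\phi}{\pi/2 + \theta} \frac{\partial}{\partial\phi} + \cos\phi \frac{\partial}{\partial\theta} \\
\frac{\partial}{\partial\eta_1} &= \frac{\cos\phi}{\pi/2 - \theta} \frac{\partial}{\partial\phi} - \sin\phi \frac{\partial}{\partial\theta}
& \frac{\partial}{\partial\eta_2} &= \frac{\cos\phi}{\pi/2 + \theta} \frac{\partial}{\partial\phi} + \sin\phi \frac{\partial}{\partial\theta}.
\end{aligned}$$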
These can be solved for ∂φ and ∂θ as follows.
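$$\begin{aligned}
\frac{\partial}{\partial\phi}
&= (\pi/2 - \theta) \Bigl( -\sin\phi \frac{\partial}{\partial\xi_1} + \cos\phi \frac{\partial}{\partial\eta_1} \Bigr)
 = (\pi/2 + \theta) \Bigl( \sin\phi \frac{\partial}{\partial\xi_2} - \cos\phi \frac{\partial}{\partial\eta_2} \Bigr) \\
\frac{\partial}{\partial\theta}
&= -\cos\phi \frac{\partial}{\partial\xi_1} - \sin\phi \frac{\partial}{\partial\eta_1}
 = -\cos\phi \frac{\partial}{\partial\xi_2} - \sin\phi \frac{\partial}{\partial\eta_2}.
\end{aligned}$$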
These formulas tell you which tangent vectors in the two charts must be identified in the set graft of the two
charts. Equivalent formulas in terms of the V1 and V2 coordinates are as follows.
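$$\begin{aligned}
\frac{\partial}{\partial\phi}
&= -\eta_1 \frac{\partial}{\partial\xi_1} + \xi_1 \frac{\partial}{\partial\eta_1}
 = \eta_2 \frac{\partial}{\partial\xi_2} - \xi_2 \frac{\partial}{\partial\eta_2} \\
\frac{\partial}{\partial\theta}
&= \frac{-\xi_1}{(\xi_1^2 + \eta_1^2)^{1/2}} \frac{\partial}{\partial\xi_1}
 + \frac{-\eta_1}{(\xi_1^2 + \eta_1^2)^{1/2}} \frac{\partial}{\partial\eta_1}
 = \frac{-\xi_2}{(\xi_2^2 + \eta_2^2)^{1/2}} \frac{\partial}{\partial\xi_2}
 + \frac{-\eta_2}{(\xi_2^2 + \eta_2^2)^{1/2}} \frac{\partial}{\partial\eta_2}.
\end{aligned}$$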

[ To construct the tangent bundle topological space E = T(S) for M = S², use Theorem 15.10.5,
Definition 15.10.6 (set union topology), and Definition 15.1.1 (product topology) to construct T(M) from
patches Ui × ℝ². ]
The base-point grafting equivalence rule for the atlas (ψ1 , ψ2 ) is shown in Equation (41.6.1). The grafting
rule for tangent vectors is obtained from this by differentiating line 41.6.1.
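$$\begin{aligned}
\frac{\partial}{\partial\xi_2}
&= \Bigl( \frac{\pi\eta_2^2}{|\zeta_2|^3} - 1 \Bigr) \frac{\partial}{\partial\xi_1}
 - \frac{\pi\xi_2\eta_2}{|\zeta_2|^3} \frac{\partial}{\partial\eta_1} \\
\frac{\partial}{\partial\eta_2}
&= -\frac{\pi\xi_2\eta_2}{|\zeta_2|^3} \frac{\partial}{\partial\xi_1}
 + \Bigl( \frac{\pi\xi_2^2}{|\zeta_2|^3} - 1 \Bigr) \frac{\partial}{\partial\eta_1}.
\end{aligned}$$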
The matrix for the basis transformation is then as follows.
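$$\begin{pmatrix} \partial/\partial\xi_2 \\ \partial/\partial\eta_2 \end{pmatrix}
= \begin{pmatrix}
\pi\eta_2^2 |\zeta_2|^{-3} - 1 & -\pi\xi_2\eta_2 |\zeta_2|^{-3} \\
-\pi\xi_2\eta_2 |\zeta_2|^{-3} & \pi\xi_2^2 |\zeta_2|^{-3} - 1
\end{pmatrix}
\cdot \begin{pmatrix} \partial/\partial\xi_1 \\ \partial/\partial\eta_1 \end{pmatrix}.
\tag{41.7.1}$$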
To construct the tangent bundle graft of the two charts, it is now sufficient to identify point-vector pairs
which correspond to the same base point and tangent vector. The determinant of the matrix in line 41.7.1
is $1 - \pi(\xi_2^2 + \eta_2^2)^{-1/2}$. This is negative because |ζ2| < π. So the group for the tangent bundle cannot be SL(2). However,
replacing chart ψ2 with a mirror image of itself makes the determinant positive. It is interesting to note that
if the tangent space of the mirror image of chart ψ2 is given appropriate basis vectors, the transition matrix
above becomes a simple rotation in SO(2) with angle 2φ.
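The determinant claim can be checked numerically. The following is a minimal Python sketch, not part of the original construction: the function names and the sample point are illustrative assumptions, and the mirror image of ψ2 is modelled simply by negating the second basis vector, which negates the second row of the matrix.

```python
import math

def transition_matrix(xi2, eta2):
    """Matrix of line (41.7.1) at the point (xi2, eta2) in the range of psi_2."""
    r = math.hypot(xi2, eta2)            # |zeta_2|, assumed to satisfy 0 < r < pi
    k = math.pi / r ** 3
    return [[k * eta2 ** 2 - 1.0, -k * xi2 * eta2],
            [-k * xi2 * eta2, k * xi2 ** 2 - 1.0]]

def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

xi2, eta2 = 0.7, -1.1                    # arbitrary sample point (hypothetical values)
m = transition_matrix(xi2, eta2)
closed_form = 1.0 - math.pi / math.hypot(xi2, eta2)

# Mirror image of psi_2: the second basis vector changes sign, so the
# second row of the matrix is negated and the determinant changes sign.
mirrored = [m[0], [-m[1][0], -m[1][1]]]

print(det2(m), closed_form, det2(mirrored))
```

The first two printed values should agree, and the third should be the positive counterpart of the first.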
[ Clearly it is now required to have a general definition in the differentiable structure chapter for the tangent
bundle constructed from charts, in particular in the case of embedded manifolds. ]
[ Here there should be a treatment of T(T(S²)), the tangent space of the tangent space of S². This should be
followed by the Riemannian connection on S². Also of interest would be the space T²(M) of second-order
derivatives on S². ]
41.8. Isometries of S 2
This section presents rotations of the 2-sphere in IR3 about various axes. These rotations are elements of
the classical group SO(3).
[ This subject is related to spinors. See Misner/Thorne/Wheeler [38], chapter 41, pages 1135–1165. ]
Since the Riemannian manifold S 2 is symmetric with respect to elements of the orthogonal group O(3), this
group can be used for generating new geodesics out of simple geodesics. For instance, the equator of S 2 is
E = {(x, y, z) ∈ ℝ³; x² + y² = 1 and z = 0}, which is the image set of a geodesic. Therefore the image gE
of E for any group element g ∈ O(3) is also a geodesic of S 2 .
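41.8.1 Remark: Define three generators of SO(3) to be the antisymmetric matrices
$$A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}; \quad
A_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}; \quad
A_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$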
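Define the corresponding single parameter families of rotation matrices R1, R2 and R3 by
$$\begin{aligned}
R_1(\alpha_1) &= \exp(\alpha_1 A_1) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_1 & -\sin\alpha_1 \\ 0 & \sin\alpha_1 & \cos\alpha_1 \end{pmatrix} \\
R_2(\alpha_2) &= \exp(\alpha_2 A_2) = \begin{pmatrix} \cos\alpha_2 & 0 & \sin\alpha_2 \\ 0 & 1 & 0 \\ -\sin\alpha_2 & 0 & \cos\alpha_2 \end{pmatrix} \\
R_3(\alpha_3) &= \exp(\alpha_3 A_3) = \begin{pmatrix} \cos\alpha_3 & -\sin\alpha_3 & 0 \\ \sin\alpha_3 & \cos\alpha_3 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\end{aligned}$$
For k = 1, 2, 3, let Rk(t) denote the corresponding linear transformations on ℝ³, defined by x ↦ Rk(t)x.
Then Rk(t)(S²) = S² for all t ∈ ℝ and k = 1, 2, 3.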
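The relation between the generators and the rotation families can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the exponential is approximated by a truncated power series (30 terms is an arbitrary choice), and all helper names are hypothetical.

```python
import math

# The three antisymmetric generators A1, A2, A3.
A = {
    1: [[0, 0, 0], [0, 0, -1], [0, 1, 0]],
    2: [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
    3: [[0, -1, 0], [1, 0, 0], [0, 0, 0]],
}

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_series(a, alpha, terms=30):
    """Truncated power series for exp(alpha * a)."""
    total = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in total]
    for n in range(1, terms):
        # term becomes (alpha * a)^n / n!
        term = matmul(term, [[alpha * x / n for x in row] for row in a])
        total = [[total[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return total

def rotation(k, t):
    """Closed-form matrix R_k(t)."""
    c, s = math.cos(t), math.sin(t)
    if k == 1:
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if k == 2:
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

def apply(m, x):
    return [sum(m[i][j] * x[j] for j in range(3)) for i in range(3)]

alpha = 0.8                                   # sample angle (hypothetical value)
errors = [max_abs_diff(exp_series(A[k], alpha), rotation(k, alpha)) for k in (1, 2, 3)]

p = [0.6, 0.0, 0.8]                           # a point of the unit sphere
norms = [math.sqrt(sum(c * c for c in apply(rotation(k, alpha), p))) for k in (1, 2, 3)]
print(errors, norms)
```

The errors should be at rounding level, and each rotated point should again have unit length.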

41.8.2 Remark: Euler’s angles θ, φ and ψ are the parameters of transformations of ℝ³ of the form
$e'_i = R_3(\phi) R_2(\theta) R_3(\psi) e_i$, where the $e_i$ are orthonormal basis vectors of ℝ³. The coordinate transformation
has the matrix R3(−ψ)R2(−θ)R3(−φ). Therefore the coordinates (x′, y′, z′) with respect to the modified
basis vectors are related to the original coordinates (x, y, z) by
$$\begin{aligned}
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
&= R_3(-\psi) R_2(-\theta) R_3(-\phi) \begin{pmatrix} x \\ y \\ z \end{pmatrix} \\
&= \begin{pmatrix}
\cos\psi\cos\theta\cos\phi - \sin\psi\sin\phi & \sin\psi\cos\phi + \cos\psi\cos\theta\sin\phi & -\cos\psi\sin\theta \\
-\sin\psi\cos\theta\cos\phi - \cos\psi\sin\phi & \cos\psi\cos\phi - \sin\psi\cos\theta\sin\phi & \sin\psi\sin\theta \\
\sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\end{aligned}$$
This transformation is useful, for example, for analysing spinning tops. The idea is to align
the z′ coordinate (unit vector $e'_3$) with the axis of the top.
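The explicit Euler matrix can be compared with the product R3(−ψ)R2(−θ)R3(−φ) numerically. The following Python sketch does this at one sample triple of angles; the helper names and the sample angles are assumptions made for illustration only.

```python
import math

def R2(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def R3(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_product(theta, phi, psi):
    """Coordinate transformation matrix R3(-psi) R2(-theta) R3(-phi)."""
    return matmul(R3(-psi), matmul(R2(-theta), R3(-phi)))

def euler_explicit(theta, phi, psi):
    """The same matrix written out entry by entry."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    cs, ss = math.cos(psi), math.sin(psi)
    return [
        [cs * ct * cp - ss * sp, ss * cp + cs * ct * sp, -cs * st],
        [-ss * ct * cp - cs * sp, cs * cp - ss * ct * sp, ss * st],
        [st * cp, st * sp, ct],
    ]

theta, phi, psi = 0.3, 1.1, -0.7              # sample angles (hypothetical values)
product = euler_product(theta, phi, psi)
explicit = euler_explicit(theta, phi, psi)
max_error = max(abs(product[i][j] - explicit[i][j])
                for i in range(3) for j in range(3))
print(max_error)
```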
Although Euler’s angles are useful for mechanics, they are not at all useful as coordinates for the differentiable
manifold structure of the group SO(3) in a neighbourhood of the identity. For that purpose, the three
parameters α1 , α2 and α3 are perfectly suited. For instance, the transformation R3 (α3 )R1 (α1 )R2 (α2 ) ∈
SO(3) of unit vectors for α ∈ IR3 provides useful coordinates in a neighbourhood of the identity of SO(3).
This particular coordinatization may be thought of in terms of an aeroplane flying in the direction of the
X-axis: α2 is the forward-tilt of the plane (the “pitch”), α1 is the right-roll of the plane (the “roll”), α3 is
the left-drift of the plane (the “yaw”). Each of the 6 orderings of the 3 rotations gives a different but
analytic-compatible chart in a neighbourhood of the identity.
41.8.3 Remark: It is tempting to look for better parameters than Euler’s angles. One interesting possibility
would be to define two parameters as the location of the fixed point in S² of a group element, with
the third parameter being the amount of rotation around that fixed point. The calculations for this are
not completely trivial. For instance, even if the Euler angle ψ is set to zero, it may be shown (matrix
diagonalization) that the fixed points of transformation R3(−ψ)R2(−θ)R3(−φ) satisfy
$$x = \frac{\sin\theta\,(1 - \cos\phi)(1 - \cos\phi\cos\theta)}{\cos\phi\cos\theta}\, z$$
and
$$y = \frac{\sin\phi\sin\theta}{(1 - \cos\phi)(1 + \cos\theta)}\, z.$$
This leads to complicated expressions for the terrestrial coordinates of the fixed point. This does not seem
to be a useful way to parametrize rotations, although it does have the advantage of removing the special
significance of the axial directions.
[ It would be useful to have some idea of how charts such as ψ123 should be restricted so that they are well-
defined. Specifically, α = (α1 , α2 , α3 ) must be restricted. What are the conditions for two values of α to
map to the same rotation? The robotics literature should have some information on this. ]
41.8.4 Remark: For every vector $V \in T_e(G)$ for G = SO(3), a vector field is induced on S² by the map
$p \mapsto (dR_p)_e(V)$ for p ∈ S², where $R_p : G \to S^2$ is defined by $R_p : g \mapsto g.p$ for g ∈ G and p ∈ S². This map
is chart-independent.
Define a chart $\psi_{123} : G \mathrel{\mathring{\to}} \mathbb{R}^3$ so that $\psi_{123} : R_3(\alpha_3) R_2(\alpha_2) R_1(\alpha_1) \mapsto (\alpha_1, \alpha_2, \alpha_3)$ in some neighbourhood
of e ∈ G. The rotation matrix is:
$$R_3(\alpha_3) R_2(\alpha_2) R_1(\alpha_1) = \begin{pmatrix}
c_2 c_3 & s_1 s_2 c_3 - c_1 s_3 & c_1 s_2 c_3 + s_1 s_3 \\
c_2 s_3 & s_1 s_2 s_3 + c_1 c_3 & c_1 s_2 s_3 - s_1 c_3 \\
-s_2 & s_1 c_2 & c_1 c_2
\end{pmatrix},$$
where $s_k = \sin\alpha_k$ and $c_k = \cos\alpha_k$ for k = 1, 2, 3.

Let $V = t_{e,v,\psi_{123}} \in T_e(G)$ for v ∈ ℝ³. Define the chart $\psi_0 : S^2 \mathrel{\mathring{\to}} \mathbb{R}^2$ for terrestrial coordinates as in
Section 41.1. Then to calculate $(dR_p)_e(V)$, it is necessary to differentiate the position of a point in S² with
respect to coordinates of G. To be precise,
$\bigl( (dR_p)_e(V) \bigr)^i = \sum_j V^j \partial_{x^j} (\psi_0^i \circ R_p \circ \psi_{123}^{-1}(x)) \big|_{x = \psi_{123}(e)}$, where $\psi_0^1 = \phi$ and $\psi_0^2 = \theta$.
When v = (1, 0, 0), $(dR_p)_e(V) = -\cos\phi\tan\theta\, e^p_\phi + \sin\phi\, e^p_\theta$, where $e^p_\phi$ and $e^p_\theta$ are the coordinate
basis vectors at p ∈ S² with respect to ψ0. Similarly for v = (0, 1, 0), $(dR_p)_e(V) = -\sin\phi\tan\theta\, e^p_\phi - \cos\phi\, e^p_\theta$,
and for v = (0, 0, 1), $(dR_p)_e(V) = e^p_\phi$. Therefore for general $V \in T_e(G)$,
$$(dR_p)_e(V) = [\, e^p_\phi \;\; e^p_\theta \,]
\begin{pmatrix} -\cos\phi\tan\theta & -\sin\phi\tan\theta & 1 \\ \sin\phi & -\cos\phi & 0 \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}.$$
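The three columns of this matrix can be checked by finite differences. The sketch below is a hedged illustration: it assumes terrestrial coordinates with (x, y, z) = (cos θ cos φ, cos θ sin φ, sin θ), uses the fact that the derivative of the chart ψ123 at the identity in the direction of the k-th coordinate is the generator of Rk, and all function names are hypothetical.

```python
import math

def to_xyz(phi, theta):
    """Terrestrial coordinates (longitude phi, latitude theta) to a unit vector."""
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))

def to_terrestrial(p):
    x, y, z = p
    return (math.atan2(y, x), math.asin(z))

def rotate(k, t, p):
    """Apply the rotation R_k(t) to the point p."""
    x, y, z = p
    c, s = math.cos(t), math.sin(t)
    if k == 1:
        return (x, c * y - s * z, s * y + c * z)
    if k == 2:
        return (c * x + s * z, y, -s * x + c * z)
    return (c * x - s * y, s * x + c * y, z)

def induced_numeric(k, phi, theta, h=1e-6):
    """Central-difference estimate of the (phi, theta) components of the induced field."""
    p = to_xyz(phi, theta)
    plus = to_terrestrial(rotate(k, h, p))
    minus = to_terrestrial(rotate(k, -h, p))
    return ((plus[0] - minus[0]) / (2 * h), (plus[1] - minus[1]) / (2 * h))

def induced_formula(k, phi, theta):
    """Column k of the matrix in the text."""
    columns = {
        1: (-math.cos(phi) * math.tan(theta), math.sin(phi)),
        2: (-math.sin(phi) * math.tan(theta), -math.cos(phi)),
        3: (1.0, 0.0),
    }
    return columns[k]

phi, theta = 0.4, 0.3                          # sample point away from the poles
for k in (1, 2, 3):
    print(k, induced_numeric(k, phi, theta), induced_formula(k, phi, theta))
```

The sample point is chosen away from the poles and from the branch cut of the longitude, where the finite differences would be unreliable.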
[ Calculate induced fields on S² for α coordinates. Also do this with the Euler pseudo-chart? Do a diagram
of the induced fields, one sphere showing the induced field for each axis of rotation. ]
[ Give the general formula for rotations around axes through arbitrary points (φ0 , θ0 ). These can be expressed
in terms of the rotations Rk (t). See Notes Q. ]
[ Define exponential map and semigroups in SO(3). ]
41.9. Geodesic curves

A curve is defined as a map γ : I → X. A path is defined as an equivalence class of curves.
This section deals with geodesic curves which can be expressed in the form θ = g(φ) for some function g.
The next section deals with affinely parametrized curves.
[ Have a graphic here of the image set of the general geodesic curve, showing the intersection point φ = φ0
with the equator, and the angle β = arctan(k). ]
41.9.1 Theorem: If a geodesic curve in S 2 is locally expressible as a function θ = g(φ), then for some
k, φ0 ∈ IR, the function satisfies
θ = arctan(k sin(φ − φ0 )).
Conversely, any function of this form is the graph of a geodesic curve.
Proof: It follows from the equations for a freely parametrized geodesic curve in 2-dimensional manifolds
with affine connection (Theorem 40.3.1) that
$$\theta'' - \Gamma^1_{22} (\theta')^3 + (\Gamma^2_{22} - 2\Gamma^1_{12}) (\theta')^2 + (2\Gamma^2_{12} - \Gamma^1_{11}) \theta' + \Gamma^2_{11} = 0.$$
On substituting the Christoffel symbol
$\Gamma^k_{ij} = -\tan\theta\, (\delta_i^1 \delta_j^2 + \delta_i^2 \delta_j^1) \delta^k_1 + \sin\theta\cos\theta\, \delta_i^1 \delta_j^1 \delta^k_2$, the differential
equation for θ becomes
$$\theta'' + 2\tan\theta\, (\theta')^2 + \sin\theta\cos\theta = 0.$$
However, $\theta'' + 2\tan\theta\, (\theta')^2 = \cos^2\theta\, (\tan\theta)''$. So $(\tan\theta)'' + \tan\theta = 0$, which has the general solution
tan θ = k sin(φ − φ0) as claimed.
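The differential equation can be checked numerically for a sample solution. The following Python sketch evaluates the residual θ″ + 2 tan θ (θ′)² + sin θ cos θ for θ = arctan(k sin(φ − φ0)) using central differences; the parameter values and the step size are arbitrary assumptions.

```python
import math

def theta(phi, k, phi0):
    """The claimed solution theta = arctan(k sin(phi - phi0))."""
    return math.atan(k * math.sin(phi - phi0))

def residual(phi, k, phi0, h=1e-4):
    """Finite-difference estimate of theta'' + 2 tan(theta) theta'^2 + sin(theta) cos(theta)."""
    t0 = theta(phi, k, phi0)
    tp = theta(phi + h, k, phi0)
    tm = theta(phi - h, k, phi0)
    d1 = (tp - tm) / (2 * h)                  # first derivative
    d2 = (tp - 2 * t0 + tm) / (h * h)         # second derivative
    return d2 + 2 * math.tan(t0) * d1 ** 2 + math.sin(t0) * math.cos(t0)

k, phi0 = 1.7, 0.3                            # sample parameters (hypothetical values)
worst = max(abs(residual(0.1 * n, k, phi0)) for n in range(-30, 31))
print(worst)
```

The residual is not exactly zero because of the finite-difference error, but it should be small compared with the individual terms of the equation.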
41.9.2 Remark: Another way to calculate the coordinates of great circle lines is to consider a great circle
through (φ, θ) = (0, 0) with inclination α to the equator to be the intersection of the sphere S 2 with the
plane {(x, y, z); z cos α = y sin α}. Then the equation in terms of (φ, θ) must be
tan θ = sin φ tan α.
That is,
θ = arctan(sin φ tan α).
This agrees with the particular case (φ1 , θ1 ) = (0, 0) in the following theorem.

41.9.3 Theorem: The geodesic curve through the point (φ1, θ1) with slope α has the equation
$$\tan\theta = \frac{\tan\alpha}{\cos\theta_1} \sin(\phi - \phi_1) + \tan\theta_1 \cos(\phi - \phi_1).$$
[ The point on this geodesic which has distance r from (φ1 , θ1 ) is. . . ]
[ In the above equation, solve for φ to get φ = φ1 + . . .. ]
This curve passes through the equator at points (φ0, 0) with slope β, where φ0 and β satisfy
$$\begin{aligned}
\tan(\phi_0 - \phi_1) &= -\sin\theta_1 \cot\alpha \\
\tan\beta &= \pm \frac{\sqrt{\sin^2\theta_1 + \tan^2\alpha}}{\cos\theta_1}.
\end{aligned}$$
[ It is necessary to say which β goes with which φ0 . Must also deal with special cases of the pair (θ1 , α). ]
Hence the geodesic curve can also be expressed as
$$\tan\theta = \tan\beta \sin(\phi - \phi_0).$$
The slope at any point (φ, θ) of the geodesic is β such that
$$\tan\beta = \operatorname{sign}(\cos(\phi - \phi_0^+)) \cos\theta \sqrt{\tan^2\alpha \sec^2\theta_1 + \tan^2\theta_1 - \tan^2\theta},$$
where $\phi_0^+$ is a zero of θ at which the slope of the geodesic is non-negative.
Proof: By Theorem 41.9.1, the equation for θ describes a geodesic curve, since tan θ is a linear combination
of sin φ and cos φ. Clearly θ = θ1 when φ = φ1. To show that the coefficients are correct for the given slope
α at (φ1, θ1), note that
$$\sec^2\theta\, \frac{d\theta}{d\phi} = \frac{\tan\alpha}{\cos\theta_1} \cos(\phi - \phi_1) - \tan\theta_1 \sin(\phi - \phi_1).$$
Hence at φ = φ1, the slope of the geodesic is
$$\sec\theta\, \frac{d\theta}{d\phi} \Big|_{\phi = \phi_1} = \tan\alpha,$$
as claimed.
The geodesic curve passes through the equator when θ = 0, which occurs for φ = φ0, where tan(φ0 − φ1) =
− sin θ1 cot α. (If α = 0 and θ1 ≠ 0, this may be interpreted to mean that (φ0 − φ1) mod 2π = π/2 or 3π/2.
If α = 0 and θ1 = 0, the value of φ0 is indeterminate, and may be taken to have any real value.) When θ1 ≠ 0,
there are clearly two points (φ0, 0) on the equator through which the geodesic curve passes.
The slope m = tan β = sec θ dθ/dφ of the curve at any point (φ, θ) on the curve satisfies
$$m = \cos\theta \Bigl( \frac{\tan\alpha}{\cos\theta_1} \cos(\phi - \phi_1) - \tan\theta_1 \sin(\phi - \phi_1) \Bigr).$$
Hence at a point (φ0, 0), the slope m satisfies
$$\begin{aligned}
m = \sec\theta\, \frac{d\theta}{d\phi} \Big|_{\phi = \phi_0}
&= \frac{\tan\alpha}{\cos\theta_1} \cos(\phi_0 - \phi_1) - \tan\theta_1 \sin(\phi_0 - \phi_1) \\
&= \operatorname{sign}(\cos(\phi_0 - \phi_1)) \frac{\sqrt{\sin^2\theta_1 + \tan^2\alpha}}{\cos\theta_1}.
\end{aligned}$$
The slope m = tan β at a general given point on the geodesic satisfies
$$\begin{aligned}
\tan\beta &= \frac{1}{\cos\theta} \frac{d\theta}{d\phi} \\
&= \cos\theta \Bigl( \frac{\tan\alpha}{\cos\theta_1} \cos(\phi - \phi_1) - \tan\theta_1 \sin(\phi - \phi_1) \Bigr) \\
&= \operatorname{sign}(\cos(\phi - \phi_0^+)) \cos\theta \sqrt{\tan^2\alpha \sec^2\theta_1 + \tan^2\theta_1 - \tan^2\theta},
\end{aligned}$$
where $\phi_0^+$ is a zero of θ at which the slope of the geodesic is non-negative.
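The statements of Theorem 41.9.3 can be tested numerically at a sample point. The sketch below is an illustration under stated assumptions: the slope sec θ dθ/dφ is estimated by central differences, the sample values of (φ1, θ1, α) are arbitrary with tan α > 0, and the function names are hypothetical.

```python
import math

def tan_theta(phi, phi1, theta1, alpha):
    """Right-hand side of the geodesic equation of Theorem 41.9.3."""
    a = math.tan(alpha) / math.cos(theta1)
    return a * math.sin(phi - phi1) + math.tan(theta1) * math.cos(phi - phi1)

def theta_of(phi, phi1, theta1, alpha):
    return math.atan(tan_theta(phi, phi1, theta1, alpha))

def slope(phi, phi1, theta1, alpha, h=1e-6):
    """Finite-difference estimate of sec(theta) dtheta/dphi."""
    d = (theta_of(phi + h, phi1, theta1, alpha)
         - theta_of(phi - h, phi1, theta1, alpha)) / (2 * h)
    return d / math.cos(theta_of(phi, phi1, theta1, alpha))

phi1, theta1, alpha = 0.2, 0.3, 0.5           # sample data (hypothetical values)

# Equator crossing from tan(phi0 - phi1) = -sin(theta1) cot(alpha).
phi0 = phi1 + math.atan(-math.sin(theta1) / math.tan(alpha))
equator_slope = math.sqrt(math.sin(theta1) ** 2 + math.tan(alpha) ** 2) / math.cos(theta1)

print(theta_of(phi1, phi1, theta1, alpha), theta1)
print(slope(phi1, phi1, theta1, alpha), math.tan(alpha))
print(theta_of(phi0, phi1, theta1, alpha))
print(slope(phi0, phi1, theta1, alpha), equator_slope)
```

Each printed pair should agree to within the finite-difference error, and the third line should be close to zero.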

41.9.4 Theorem: The geodesic through two points (φ1, θ1) and (φ2, θ2) which are neither coincident nor
antipodal has the equation
$$\tan\theta = \frac{\tan\theta_2 \sin(\phi - \phi_1) + \tan\theta_1 \sin(\phi_2 - \phi)}{\sin(\phi_2 - \phi_1)}. \tag{41.9.1}$$
This geodesic passes through the equator at points (φ0, 0) at slope β0, with
$$\begin{aligned}
\tan(\phi_0) &= \frac{\tan\theta_2 \sin\phi_1 - \tan\theta_1 \sin\phi_2}{\tan\theta_2 \cos\phi_1 - \tan\theta_1 \cos\phi_2} \\
\tan\beta_0 &= \ldots
\end{aligned}$$
The slope of the geodesic at φ = φ1 is
$$\beta_1 = \arctan\Bigl( \frac{\cos\theta_1 \tan\theta_2}{\sin(\phi_2 - \phi_1)} - \frac{\sin\theta_1}{\tan(\phi_2 - \phi_1)} \Bigr).$$
The slope of the geodesic at φ = φ2 is
$$\beta_2 = \arctan\Bigl( \frac{\sin\theta_2}{\tan(\phi_2 - \phi_1)} - \frac{\cos\theta_2 \tan\theta_1}{\sin(\phi_2 - \phi_1)} \Bigr).$$
Proof: (41.9.1) defines a geodesic curve since tan θ is expressed as a linear combination of translated sines
of φ. The fact that the curve passes through both specified points follows immediately by substitution.
The formula for φ0 follows by setting θ = 0 in the formula for tan θ and using the difference rule for the sine
function, then solving for φ.
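Equation (41.9.1) can be checked against the embedding in ℝ³: every point of the curve should lie in the plane through the origin spanned by the two given points. The Python sketch below tests this, together with the formula for β1, at one pair of sample points. The embedding (x, y, z) = (cos θ cos φ, cos θ sin φ, sin θ), the sample points and the helper names are assumptions for illustration.

```python
import math

def theta_on_geodesic(phi, p1, p2):
    """Latitude on the geodesic (41.9.1) through p1 = (phi1, th1) and p2 = (phi2, th2)."""
    (phi1, th1), (phi2, th2) = p1, p2
    t = (math.tan(th2) * math.sin(phi - phi1)
         + math.tan(th1) * math.sin(phi2 - phi)) / math.sin(phi2 - phi1)
    return math.atan(t)

def to_xyz(phi, theta):
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

p1, p2 = (0.2, 0.3), (1.1, -0.4)              # sample points (hypothetical values)
normal = cross(to_xyz(*p1), to_xyz(*p2))      # normal of the plane through 0, p1, p2

# Largest deviation of the curve from that plane over a range of longitudes.
worst = max(abs(dot(normal, to_xyz(0.1 * n, theta_on_geodesic(0.1 * n, p1, p2))))
            for n in range(-15, 16))

# Slope at phi1: formula for beta1 against a central-difference estimate.
d = p2[0] - p1[0]
beta1 = math.atan(math.cos(p1[1]) * math.tan(p2[1]) / math.sin(d)
                  - math.sin(p1[1]) / math.tan(d))
h = 1e-6
slope1 = ((theta_on_geodesic(p1[0] + h, p1, p2) - theta_on_geodesic(p1[0] - h, p1, p2))
          / (2 * h)) / math.cos(p1[1])
print(worst, math.tan(beta1), slope1)
```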
41.10. Affinely parametrized geodesics

[ Here have theorems on affine parametrized and length-parametrized geodesics. This may be tricky, because
when solving for f(u) with k(u) = −2 tan θ g′ and g(u) = arctan(c sin(φ − φ0)), f(u) comes out like
$$f(u) = \int^u \bigl( c_1 - \ln|1 + c_2 \sin^2(\phi - \phi_0)| \bigr)^{-1} \, du.$$
This, hopefully, is an error! See notes A. ]

The equations for affinely parametrized geodesics may be obtained in several ways. One way is to re-
parametrize the equations for a geodesic curve parametrized by φ. Another way is to solve the equations for
a geodesic curve directly. And a third way is to use the symmetries of S 2 to generate families of geodesic
curves out of a simple geodesic, such as the equator, parametrized by φ. This last approach gives
$$\begin{aligned}
\phi(t) &= \phi_0 + \arctan(\cos\alpha \tan(t - t_0)) \\
\theta(t) &= \arctan\Bigl( \frac{\sin\alpha \sin(t - t_0)}{\sqrt{1 - \sin^2\alpha \sin^2(t - t_0)}} \Bigr) \\
&= \arctan\Bigl( \frac{\sin\alpha \tan(t - t_0)}{\sqrt{1 + \cos^2\alpha \tan^2(t - t_0)}} \Bigr) \\
&= \arcsin\Bigl( \frac{\cos\alpha \tan(t - t_0)}{\sqrt{1 + \cos^2\alpha \tan^2(t - t_0)}} \Bigr),
\end{aligned}$$
by applying successively Rz(−t0), Rx(α) and Rz(φ0) to the curve given by φ(t) = t and θ(t) = 0.
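The construction by rotations can be reproduced numerically. The following Python sketch compares the formula for φ(t) and the first two expressions for θ(t) with the rotated equator, and estimates the speed of the curve by central differences. It assumes |t − t0| < π/2 (so that the arctan branches agree), takes Rx(α) to be the rotation R1(α) of Remark 41.8.1, and uses hypothetical helper names and sample values.

```python
import math

def geodesic(t, alpha, t0, phi0):
    """phi(t) and the first expression for theta(t)."""
    s = t - t0
    phi = phi0 + math.atan(math.cos(alpha) * math.tan(s))
    q = math.sin(alpha) * math.sin(s)
    theta = math.atan(q / math.sqrt(1.0 - q * q))
    return phi, theta

def theta_alt(t, alpha, t0):
    """The second expression for theta(t)."""
    u = math.tan(t - t0)
    return math.atan(math.sin(alpha) * u / math.sqrt(1.0 + (math.cos(alpha) * u) ** 2))

def rotated_equator(t, alpha, t0, phi0):
    """Equator point (cos s, sin s, 0) rotated by alpha about the X-axis, then by phi0 about Z."""
    s = t - t0
    x, y, z = math.cos(s), math.cos(alpha) * math.sin(s), math.sin(alpha) * math.sin(s)
    return phi0 + math.atan2(y, x), math.asin(z)

def speed_squared(t, alpha, t0, phi0, h=1e-5):
    """cos^2(theta) phi'^2 + theta'^2, estimated by central differences."""
    pp, tp = geodesic(t + h, alpha, t0, phi0)
    pm, tm = geodesic(t - h, alpha, t0, phi0)
    _, th = geodesic(t, alpha, t0, phi0)
    return (math.cos(th) * (pp - pm) / (2 * h)) ** 2 + ((tp - tm) / (2 * h)) ** 2

alpha, t0, phi0 = 0.6, 0.25, 0.4              # sample parameters (hypothetical values)
samples = [t0 + 0.1 * n for n in range(-14, 15)]
position_error = max(
    max(abs(a - b) for a, b in zip(geodesic(t, alpha, t0, phi0),
                                   rotated_equator(t, alpha, t0, phi0)))
    for t in samples)
alt_error = max(abs(theta_alt(t, alpha, t0) - geodesic(t, alpha, t0, phi0)[1])
                for t in samples)
speed_error = max(abs(speed_squared(t, alpha, t0, phi0) - 1.0) for t in samples)
print(position_error, alt_error, speed_error)
```

A speed close to 1 indicates that the parameter t is arc length on the unit sphere, which is in particular an affine parameter.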
[ For affine parametrization, need $\ddot X^i + \Gamma^i_{jk} \dot X^j \dot X^k = 0$. That is, $\ddot\theta + \sin\theta\cos\theta\, \dot\phi^2 = 0$ and $\ddot\phi - 2\tan\theta\, \dot\phi \dot\theta = 0$. For
length-parametrized geodesics, need additionally $g_{ij} \dot X^i \dot X^j = 1$. That is, $\cos^2\theta\, \dot\phi^2 + \dot\theta^2 = 1$. (Note: Should
use the dash-notation, not the dot-notation.) ]
41.10.1 Theorem: Given two non-antipodal points on S 2 with spherical coordinates (φ0 , θ0 ) and (φ1 , θ1 ),
the point (φλ , θλ ) which divides the line joining the two points in the ratio λ is given by
φλ = . . .
θλ = . . .

41.11. Convex sets and functions

[ In this section, it should be stated which sets are convex and which are not. ]
41.12. Normal coordinates

[ Should also deal in this section with the topic of geodesic coordinates. ]
41.12.1 Theorem: The set of points in S 2 with distance r from a given point with coordinates (φ0 , θ0 )
satisfies. . .
41.12.2 Theorem: The set of points on the geodesic whose points are equidistant from a given point
(φ0 , θ0 ) satisfies. . .
41.13. Jacobi fields

[ This section should describe the entire family of Jacobi fields for all geodesics. The magnitude of Jacobi
fields should be calculated, and also the first derivatives of the Jacobi field. ]
[ Write out the equation of geodesic variation and the equation of energy minimization etc. ]
[ Also look at the curvature of curves. Curvature equals the rate of change of direction of the curve with
respect to distance. Write out the formulas for general curves and circles. ]
41.14. Circles on the sphere

This section deals with the description of circles on S 2 in terms of terrestrial coordinates. A circle is defined
as the set of points which are equidistant from a given point. Let this distance be γ ∈ [0, π] for the unit
sphere embedded in IR3 . Let (φ0 , θ0 ) ∈ [−π, π] × [−π/2, π/2] be terrestrial coordinates for the centre of the
circle. The set of points (φ, θ) on the circle must satisfy:
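$$\begin{aligned}
\cos\gamma &= \cos(\theta - \theta_0) + \cos\theta \cos\theta_0 (\cos(\phi - \phi_0) - 1) \\
&= \cos\theta \cos\theta_0 \cos(\phi - \phi_0) + \sin\theta \sin\theta_0.
\end{aligned}$$
This equation may be solved in terms of either φ or θ as follows.
$$\begin{aligned}
\phi &= \phi_0 \pm \arccos\Bigl( \frac{\cos\gamma - \cos(\theta - \theta_0)}{\cos\theta \cos\theta_0} + 1 \Bigr)
      = \phi_0 \pm \arccos\Bigl( \frac{\cos\gamma - \sin\theta \sin\theta_0}{\cos\theta \cos\theta_0} \Bigr) \\
\theta &= \arctan(\cos\theta_0 \cos(\phi - \phi_0), \sin\theta_0)
        \pm \arccos\bigl( \cos\gamma\, (\cos^2\theta_0 \cos^2(\phi - \phi_0) + \sin^2\theta_0)^{-1/2} \bigr).
\end{aligned} \tag{41.14.1}$$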
The formula for θ follows from Theorem 20.13.21.
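Both solved forms of the circle equation can be checked by computing the angular distance back to the centre. The Python sketch below is illustrative only: it assumes that the 2-parameter function arctan(x, y) denotes the angle of the point (x, y), which corresponds to `math.atan2(y, x)`, and the sample centre, radius and helper names are hypothetical.

```python
import math

def phi_on_circle(theta, centre, gamma, sign=1):
    """Longitude of a circle point at latitude theta."""
    phi0, theta0 = centre
    c = ((math.cos(gamma) - math.sin(theta) * math.sin(theta0))
         / (math.cos(theta) * math.cos(theta0)))
    return phi0 + sign * math.acos(c)

def theta_on_circle(phi, centre, gamma, sign=1):
    """Latitude of a circle point at longitude phi."""
    phi0, theta0 = centre
    a = math.cos(theta0) * math.cos(phi - phi0)
    b = math.sin(theta0)
    # Assumption: arctan(a, b) in the text is the angle of the point (a, b).
    return math.atan2(b, a) + sign * math.acos(math.cos(gamma) / math.hypot(a, b))

def angular_distance(p, q):
    """Great-circle distance between p = (phi, theta) and q = (phi0, theta0)."""
    (phi, theta), (phi0, theta0) = p, q
    return math.acos(math.cos(theta) * math.cos(theta0) * math.cos(phi - phi0)
                     + math.sin(theta) * math.sin(theta0))

centre, gamma = (0.5, 0.4), 0.6               # sample circle (hypothetical values)
distances = []
for sign in (1, -1):
    distances.append(angular_distance((phi_on_circle(0.7, centre, gamma, sign), 0.7), centre))
    distances.append(angular_distance((0.8, theta_on_circle(0.8, centre, gamma, sign)), centre))
print(distances)
```

All four distances should equal γ. For latitudes or longitudes which the circle does not reach, the argument of the arccos leaves [−1, 1] and `math.acos` raises an error.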
One application of circles on a sphere is to horizon lines for satellite pictures. If a camera is placed at a
distance r0 from the centre of a sphere of radius r, the horizon circle has radius γ satisfying cos γ = r/r0.
Thus if a satellite’s ground point is (φ0, θ0), the region which can be imaged by the satellite satisfies the
above equations with cos γ replaced by r/r0. The distance r0 = |(x0, y0, z0)| and the angles φ0 = arctan(x0, y0)
and θ0 = arcsin(z0/r0) can be derived from the Cartesian coordinates (x0, y0, z0) for the camera viewpoint in
terms of the 2-parameter length and arctan functions by the following algorithm.
$$\begin{aligned}
\phi_0 &= \arctan(x_0, y_0) \\
r_0' &= |(x_0, y_0)| \\
\theta_0 &= \arctan(r_0', z_0) \\
r_0 &= |(r_0', z_0)|.
\end{aligned}$$
This is useful for software like MetaPost which offers limited trigonometric functions. This procedure is
readily extended to higher-dimensional spherical coordinates. The resulting angle φ0 here lies in (−π, π].
(Section 20.13 has further details on trigonometric functions.) The coordinates (r0 , φ0 , θ0 ) can be substituted
into Equations (41.14.1) to obtain the horizon circle for the given camera viewpoint. (This is used in the
line-hiding algorithm for Figure 41.1.2, for example.)
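The four-step algorithm translates directly into code. The following Python sketch is a hedged illustration rather than the MetaPost code used for the figures: it assumes that arctan(x, y) means the angle of the point (x, y), that is `math.atan2(y, x)`, and the function names are hypothetical.

```python
import math

def camera_angles(x0, y0, z0):
    """Return (r0, phi0, theta0) for a camera at Cartesian position (x0, y0, z0)."""
    phi0 = math.atan2(y0, x0)        # arctan(x0, y0) in the notation of the text
    r0_dash = math.hypot(x0, y0)     # |(x0, y0)|
    theta0 = math.atan2(z0, r0_dash) # arctan(r0', z0)
    r0 = math.hypot(r0_dash, z0)     # |(r0', z0)|
    return r0, phi0, theta0

def horizon_radius(r, r0):
    """Angular radius gamma of the horizon circle, from cos(gamma) = r / r0."""
    return math.acos(r / r0)

r0, phi0, theta0 = camera_angles(1.0, 1.0, math.sqrt(2.0))
gamma = horizon_radius(1.0, r0)
print(r0, phi0, theta0, gamma)
```

For this sample viewpoint the distance is 2 and both angles are π/4, so a unit sphere has a horizon circle of angular radius π/3.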
[ Give here the general formula for the intersection of two circles on a sphere. ]
[ Calculate here with the curvature of the circles and calculate parallel transport along them. ]

41.15. Calculation of the “hours of daylight”

[ Should do a “length of day” calculation, which gives length of day in terms of latitude and time of year etc.
$$\begin{aligned}
\text{length of day} &= \frac{24}{\pi} \arccos(\tan\delta \tan\theta) \text{ hours} \\
&= \frac{\pi/2 - \phi}{\pi/2} \cdot 12 \text{ hours},
\end{aligned}$$
where θ is the latitude of the place at which the day is observed, and δ is the declination angle of the
Sun, which reaches a maximum of about 23.5°. (The mean value was 23° 26′ 45″ on 1 January 1950. See
Norton [211], page 6. The angle should also be represented as a decimal, and probably also in radians.) ]
Let the declination of the Sun be δ. Then the path of the Sun satisfies
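$$(x, y, z) = (-\sin\theta_1, 0, \cos\theta_1).$$
For δ = 0, have (− sin θ1, 0, cos θ1).(cos θ cos φ, cos θ sin φ, sin θ) = 0. That is,
$$\begin{aligned}
-\sin\theta_1 \cos\theta \cos\phi + \cos\theta_1 \sin\theta &= 0 \\
\cos\phi &= \frac{\cos\theta_1 \sin\theta}{\sin\theta_1 \cos\theta} = \frac{\tan\theta}{\tan\theta_1} \\
\tan\theta &= \cos\phi \tan\theta_1.
\end{aligned}$$
For general δ,
$$\begin{aligned}
-\sin\theta_1 \cos\theta \cos\phi + \cos\theta_1 \sin\theta &= \sin\delta \\
\cos\phi &= \frac{\cos\theta_1 \sin\theta - \sin\delta}{\sin\theta_1 \cos\theta}.
\end{aligned}$$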
The length of day calculation should be done after calculating the circle with a given centre and radius.
Then take radius equal to π/2 − δ. Then solve for z = 0. That is, θ = 0. Then get cos φ = − sin δ/ sin θ1 .
41.16. Some standard map projections

[ Should also have a section on map projections for the sphere. E.g. Mercator’s projection. ]
[ Should find out what the Bessel ellipsoidal coordinates are, and work out the geodesics for this, and other
similar things. ]
[ Also in the chapter, do calculations of area of circles, the area above a curve θ = f (φ) and so forth. Also
deal with the Gauß theorem, Green’s theorem etc. Show that area of a region is related to the change of
angle for parallel transport around a boundary. ]
41.17. Projection of a sphere onto a plane

41.17.1 Remark: The projection of points and lines from IR3 onto a plane which is identified with IR2
is not difficult. Points and lines are projected to points and lines. The projection of a 2-sphere onto a
plane requires a little more work. The outline of the image under projection of a sphere from IR3 to IR2
may be determined from the tangent lines to the sphere which pass through the viewpoint. Like many
Cartesian coordinate calculations, the task is made much easier by guessing a method of attack which avoids
unnecessary complexity in the intermediate constructions.
41.17.2 Remark: Define a projection map P : ℝ³ → ℝ² by $P : x \mapsto (L(x)_1, L(x)_2)/L(x)_3$, where
L : ℝ³ → ℝ³ is defined by L : x ↦ A(x − b), where A ∈ M3,3 (IR3 ) is an invertible 3 × 3 matrix and b ∈
ℝ³. The matrix A represents a rotation (or other linear transformation) of the points in ℝ³, while b is a
translation. The point b ∈ ℝ³ is the viewpoint where the camera is placed. If A is orthogonal, the row
vector a3 = (a31, a32, a33) is the direction in which the camera is pointed and the row vectors a1 and a2 are
the directions in which the X and Y axes are oriented.
Assume that the matrix is orthogonal. Then the sphere projection problem reduces to the calculation of
the orientation of lines through a point b ∈ ℝ³ which are tangent to a sphere with centre q ∈ ℝ³ and
radius R ∈ ℝ⁺. Such a line has distance R from q. Such lines must be calculated for all directions
eθ = a1 cos θ + a2 sin θ from q for θ ∈ [0, 2π]. The problem is solved if a number r(θ) ∈ ℝ can be determined
for each θ so that the line segment from b to q + r(θ)eθ is tangent to the sphere.
Now focus on a single orientation eθ. The plane through b and q oriented in the direction eθ is a plane in
which the problem looks much simpler. In ℝ², the value of t ∈ ℝ for which the line through (d, 0) and
t(c1, c2) has distance R from (0, 0) satisfies $t = d / \bigl( c_1 + c_2 (d^2/R^2 - 1)^{1/2} \bigr)$ if the denominator is positive.
(The denominator is always positive in this application.)
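The planar formula for t can be verified by measuring the distance from the origin to the resulting line. The Python sketch below does this for a few sample directions; the helper names and the sample values of d and R are assumptions made for illustration.

```python
import math

def tangent_parameter(d, R, c1, c2):
    """Value of t for which the line through (d, 0) and t*(c1, c2) has distance R from (0, 0)."""
    denominator = c1 + c2 * math.sqrt(d * d / (R * R) - 1.0)
    if denominator <= 0.0:
        raise ValueError("the formula requires a positive denominator")
    return d / denominator

def distance_origin_to_line(p, q):
    """Distance from (0, 0) to the straight line through the points p and q."""
    cross = p[0] * q[1] - p[1] * q[0]
    return abs(cross) / math.hypot(q[0] - p[0], q[1] - p[1])

d, R = 3.0, 1.0                               # viewpoint distance and radius (hypothetical values)
distances = []
for angle in (0.3, 1.0, 2.0):
    c1, c2 = math.cos(angle), math.sin(angle)
    t = tangent_parameter(d, R, c1, c2)
    distances.append(distance_origin_to_line((d, 0.0), (t * c1, t * c2)))
print(distances)
```

Each distance should equal R, which confirms that the line through (d, 0) and t(c1, c2) is tangent to the circle of radius R about the origin.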
To apply the simplified ℝ² case to calculate r(θ), let d = |q − b|, c1 = a1 · (b − q)/|b − q| and $c_2 = (|e_\theta|^2 - c_1^2)^{1/2}$.
Then $r(\theta) = d / \bigl( c_1 + c_2 (d^2/R^2 - 1)^{1/2} \bigr)$. Hence $r(\theta)^{-1} = c_1 d^{-1} + c_2 (R^{-2} - d^{-2})^{1/2}$. Therefore the point
$v_\theta = q + r(\theta) e_\theta \in \mathbb{R}^3$ will appear to be on the edge of the sphere for an observer at b. But vθ will be
projected by the map L to L(vθ ) = A(q − b) + r(θ)(cos θ, sin θ, 0) ∈ IR3 if A is orthonormal. This is then
projected to P (vθ ) = P (q) + r(θ)(cos θ, sin θ)/L(vθ )3 ∈ IR2 . This formula was used for hiding a curve behind
the sphere image on the front cover of this book. The outline of a projected sphere may be drawn by joining
points P (vθ ) for a range of θ values using a cubic spline. The shape is always an ellipse. So probably there’s
a much easier way of doing this!


Chapter 42
Examples of manifolds
42.1 Topological space examples
42.2 Euclidean spaces
42.3 Non-Hausdorff locally Euclidean spaces
42.4 Hölder-continuous manifolds
42.5 Torus
42.6 General sphere
42.7 Conical coordinates for Euclidean spaces
42.8 Hyperboloid
42.9 Tractrix
42.10 Analysis on Euclidean spaces
This chapter presents various practical examples of manifolds. These examples are presented separately
from the theoretical chapters so that the layers of structure of each example geometry can be presented in
a unified fashion in one place rather than scattering the details of, say, the geometry of a sphere among the
theoretical chapters. The geometry of S 2 is dealt with separately in Chapter 41. Application of the theory
to practical examples is a good test of the value of particular choices of definitions. If a definition is so
abstract that it cannot be applied to practical geometries, probably the definition should be revised.
42.0.1 Remark: In many fields, particularly in topology and real analysis, it turns out that a very small
set of examples serves a very broad set of applications, mostly as counterexamples to conjectures and to show
that some theorems cannot be easily improved. (Theorems which cannot be improved, in some specified
sense, are called “sharp”.) Differential geometry is similar. A small set of examples suffices for numerous
roles, not only as counterexamples, but also to demonstrate how definitions and theorems play out in practice.
This is a further justification for collecting popular examples together in their own chapters at the end of
the book.
[ Maybe should have a chapter dealing with S n for n ≥ 3 also! ]

[ Also should have a section on spherical coordinates for all of IR3 instead of just S 2 . But that would just be
a study of curvilinear coordinates for the flat space IR3 rather than a curved geometry. ]
[ Perhaps should have a chapter on classical differential geometry, just embedded manifolds. See EDM2 [35],
App. A.4, page 1730. ]
42.1. Topological space examples

Topological spaces are not all manifolds. But some examples of topological spaces are presented in this
chapter. Of special interest are pathological examples.
42.1.1 Remark: Pathological sets and functions are very useful for disproving conjectures, thereby saving
a lot of research time trying to prove false conjectures. Research on an open question often proceeds
by an alternation between two directions of attack: (A) trying to prove that the conjecture is true, and
(B) trying to find a counterexample. Usually trying to find a proof of the conjecture helps to show how to

construct a counterexample. Conversely, attempting to construct a counterexample helps to find a proof of
the conjecture.
Pathological examples are useful for teaching mathematics. They are useful as “borderline examples” to
show what is the most extreme kind of example which is just able to satisfy a particular definition. Extreme
examples are also useful for showing why every condition of a definition is necessary. Quite often, a conjecture
which is shown to be false by counterexample can be made true by adding an extra condition which excludes
the counterexample.
Therefore it is useful to learn a large repertoire of pathological sets and functions for use in testing definitions
and for disproving and improving conjectures.
42.1.2 Example: [ Present a space-filling curve here. Try to explicitly calculate t so that f(t) = (π, e) for
a space-filling curve f : [0, 1] → [0, 1]² to help make the case that all points are covered, and that the inverse
function is practically calculable. ]
42.2. Euclidean spaces

It is not entirely superfluous to study Euclidean spaces IRn from the perspective of differential geometry. In
fact, it is useful because we already know all the answers. So we only have to ensure that the differential
geometry approach gives the right answers. This helps to clarify the theory without being distracted by any
real geometric interest.
Let M = ℝⁿ for some n ≥ 1. Let X = C^∞(M). A one-chart atlas for M uses the identity map id_M
on M as the only chart ψ. Tangent operators $\partial_{p,v} : X \to \mathbb{R}$ for p ∈ M and v ∈ ℝⁿ are defined so that
$\partial_{p,v} : f \mapsto v^i\, \partial f(x)/\partial x^i \big|_{x=p}$. Then the tangent operator space of M is $T(M) = \{ \partial_{p,v};\ p \in M,\ v \in \mathbb{R}^n \}$.
It is not easy to define the space T(T(M)) in a concrete sense because tangent vectors in T(M) are defined
to have a form such as $\partial_{p,v} = v^i\, \partial/\partial x^i \big|_{x=p}$. This tangent vector is an element of the dual of C^∞(M). Its
derivatives with respect to the base point p may be defined in the sense of Schwartz distributions as follows.
$$\partial_{p,v} \otimes w : f \mapsto -v^i w^j \frac{\partial^2 f(x)}{\partial x^i \partial x^j} \Big|_{x=p}.$$
But this is not an element of T (T (M )). This functional ∂p,v ⊗ w is a map from C ∞ (M ) to IR, just like ∂p,v ,
whereas an element of T (T (M )) should be a map from C ∞ (T (M )) to IR.
The space T(M) may be parametrized by the pair (p, v) for tangent vectors $\partial_{p,v}$ within a given fixed chart
for M. This defines a chart θ : Dom(ψ) × ℝⁿ → ℝ²ⁿ for T(M). In this case, the domain of ψ = id_M is all
of ℝⁿ. So θ : T(M) → ℝ²ⁿ, $\theta : \partial_{p,v} \mapsto (p, v)$. Therefore elements of T(T(M)) have the form
$$\partial^{(2)}_{p,v,\alpha,\beta} : g \mapsto \Bigl( \alpha^j \frac{\partial}{\partial q^j} + \beta^k \frac{\partial}{\partial w^k} \Bigr) (g \circ \theta^{-1}(q, w)) \Big|_{(q,w)=(p,v)}.$$
Each vector $\partial^{(2)}_{p,v,\alpha,\beta}$ has a horizontal component $\alpha^j\, \partial/\partial q^j$ and a vertical component $\beta^k\, \partial/\partial w^k$. It is difficult
to think how a function g : T (M ) → IR could have much real significance. (This may be contrasted with
the definition of a simple tangent vector ∂p,v which acts on a function f : M → IR, which does have a clear
significance and application.)
A real problem here is that the representation of a tangent vector as an element ∂p,v : C ∞ (M ) → IR does not
really match one’s intuitive idea of a tangent vector. It does happen to have the right transformation laws
under changes of coordinates, but when it is time to look at spaces like T (T (M )) and the n-frame bundle of
a manifold M , which is required for defining a connection, the generalized function style of definition is quite
difficult to interpret. One relatively minor inconvenience is the fact that ∂p,0 = ∂q,0 for all p, q ∈ M . This
provided a hint already that the tangent vector definition as a pointwise derivative operator is not right.
The difficulty in defining the space T (T (M )) is a confirmation that the definition is not suitable for general
purposes.
Elements of T 2 (M ) = T (M ) ⊗ T (M ) must be of the form (v i ∂i ) ⊗ (wj ∂j ) for v, w ∈ IRn .
42.2.1 Example: Let Ω be an open subset of ℝⁿ for some n ∈ ℤ⁺. Define an atlas on Ω by S = {ψ} with
ψ : Ω → ℝⁿ defined by ψ(x) = x for all x ∈ Ω. Then (Ω, S) is an analytic manifold.

42.3. Non-Hausdorff locally Euclidean spaces

42.3.1 Remark: This section deals with spaces formed from Euclidean spaces by quotient operations which
cause one or more points in the resulting spaces to be non-Hausdorff.
42.3.2 Example: An example of a topological space which is a non-Hausdorff locally Euclidean space
may be constructed as follows. Define X = ℝ × {0, 1} to have the relative topology from ℝ². Define an
equivalence relation R on X so that (x1, y1) R (x2, y2) whenever x1 = x2 ≠ 0 or (x1, y1) = (x2, y2). Let
Y = X/R have the standard quotient topology. (See Definition 15.1.8.) The set Y may be identified with
the set Y′ = (ℝ × {0}) ∪ {(0, 1)}. Clearly the two points (0, 0) and (0, 1) cannot be separated. So Y′ is not
Hausdorff although it is T1.
[ Probably any locally Euclidean space is T1 . Should prove this. ]
Define a topological atlas (which just happens to be analytic) on Y′ to have two charts: φi : Ui → ℝ with
U0 = ℝ × {0}, φ0 : (x, 0) ↦ x for x ∈ ℝ, U1 = ((ℝ \ {0}) × {0}) ∪ {(0, 1)} and φ1 : (x, y) ↦ x for (x, y) ∈ U1.
Then Definition 25.2.3 (for a locally Euclidean space) is satisfied but Y′ is not Hausdorff. This example is
illustrated in Figure 42.3.1.
[ Figure 42.3.1 Non-Hausdorff locally Euclidean space example 42.3.2. The identification map f takes
X = (ℝ × {0}) ∪ (ℝ × {1}) to Y′, which consists of ℝ × {0} together with the point (0, 1). The charts φ0 and φ1 map Y′ to ℝ. ]
42.3.3 Remark: It is shown in Example 42.3.2 that the quotient topological space formed by identifying
two real lines at all points except one – the “real line with two origins” – is locally Euclidean but non-
Hausdorff. The rest of this section presents the natural generalization of this to the quotient space of two
copies of a Euclidean space IRn at all points except some subset S of IRn . The set S could be chosen to be
dense in IRn , for instance. This kind of quotient topological space construction seems to be the same thing
as a topological “graft” as defined in Sections 15.10 and 15.11.
42.4. Hölder-continuous manifolds

As mentioned in Section 26.11, there seem to be no differential geometry textbooks which treat the subject
of Hölder-continuity of manifolds and other scales of fractional regularity which interpolate the discrete
steps of C k regularity for integer k. For example, Lipschitz manifolds (Definition 26.12.4) are rarely defined
or treated. Manifolds with fractional differentiability are not at all pathological. They arise naturally as
integrals of systems which have discontinuous force functions.
42.4.1 Example: The set M = {x ∈ IRn+1 ; xn+1 = |x1 |} is a simple example of a set which is naturally
modelled as an n-dimensional C 0,1 manifold. (See Figure 42.4.1.)
The most obvious chart for this set is ψ0 : M → IRn with ψ0 : (x1 , . . . xn+1 ) ↦ (x1 , . . . xn ). The atlas
{ψ0 } containing only this chart makes this set a C ∞ manifold. One might ask why this very smooth state
of affairs should be upset by adding further charts. The fly in this ointment is that this would not be an
accurate description of the manifold. Problems would arise when the embedding of M in IRn+1 is used as a
diffeomorphism. Tangent vectors and higher order differential constructions would not map as expected.

Figure 42.4.1 Projection maps for a Lipschitz manifold (the set M in the (x1 , xn+1 ) plane, projected onto the x1 axis for several values of α)
To expose the non-C ∞ nature of the set M , it suffices to project the set onto IRn in different directions. (See
Remark 26.4.3 for general projections for graphs of functions.) For α ∈ (−1, 1), define the chart ψα : M → IRn
by ψα : (x1 , . . . xn+1 ) ↦ (x1 − αxn+1 , x2 , . . . xn ). This map is clearly C ∞ with respect to the ambient
space IRn+1 , but when xn+1 = |x1 | is substituted, this yields ψα : (x1 , . . . xn+1 ) ↦ (x1 − α|x1 |, x2 , . . . xn ).
The transition map ψα ◦ ψ0−1 : IRn → IRn is defined by ψα ◦ ψ0−1 : (x1 , . . . xn ) ↦ (x1 − α|x1 |, x2 , . . . xn ),
which is clearly only C 0,1 . (See Figure 42.4.2.)
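The failure of C 1 regularity at x1 = 0 can be seen numerically from the one-sided slopes of the first component of the transition map. The following is only a minimal sketch in Python; the function names and the step size h are illustrative assumptions, not part of the text.

```python
def transition(x1, alpha):
    """First component of psi_alpha o psi_0^{-1}: x1 -> x1 - alpha*|x1|."""
    return x1 - alpha * abs(x1)


def one_sided_slopes(alpha, h=1e-6):
    """Left and right difference quotients of the transition map at x1 = 0.

    h is a hypothetical small step; the map is piecewise linear, so any
    positive h gives the same slopes up to rounding.
    """
    right = (transition(h, alpha) - transition(0.0, alpha)) / h
    left = (transition(0.0, alpha) - transition(-h, alpha)) / h
    return left, right


for alpha in (0.0, 0.4, 0.8):
    left, right = one_sided_slopes(alpha)
    # The slopes are 1 + alpha on the left and 1 - alpha on the right.
    print(f"alpha={alpha}: left slope {left:.3f}, right slope {right:.3f}")
```

The two slopes agree only for α = 0, so the map is Lipschitz with constant 1 + |α| but is not differentiable at x1 = 0 for α ≠ 0.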
Figure 42.4.2 Transition maps for a Lipschitz manifold (graphs of x1 ↦ x1 − α|x1 | for α = 0, α = 0.4 and α = 0.8)
42.4.2 Remark: The important thing to note here is that for general embedded manifolds, the direction
of projection has no natural choice. It would be deceptive to use only one projection of a set because this
would give the manifold a structure which depends on the choice of projection chart. To honestly reflect the
structure of an embedded manifold, all local projections onto hyperplanes should be included in the atlas
so that there will be no perplexing chart-dependent properties when relations between the manifold and the
ambient space are examined.
42.4.3 Remark: An interesting question to ask about the xn+1 = |x1 | manifold is how the lack of C 1
regularity affects the definition of the tangent bundle. In fact, this example manifold has an extra property
which is not shared by general C 0,1 manifolds; namely its transition maps have unidirectional derivatives at
all points. This kind of manifold is discussed in Section 27.14.
42.4.4 Example: Figure 42.4.3 shows a function which is C 0,1 but which has no one-sided derivatives
at x = 0.

Figure 42.4.3 Lipschitz function without one-sided derivatives at x = 0 (graph of f (x) = x sin(π log2 |x|) between the lines y = x and y = −x)
The function f : IR → IR defined by f (x) = x sin(k ln |x|) for x ≠ 0 and f (0) = 0 has derivative f′(x) =
sin(k ln |x|) + k cos(k ln |x|) for x ≠ 0. So the C 0,1 norm of f is ‖f‖0,1 = (1 + k^2)^(1/2) . In this case, k = π/ ln 2.
Define the set M ⊆ IRn+1 by M = {x ∈ IRn+1 ; xn+1 = f (x1 )}. When this is projected at various angles
onto the xn+1 = 0 plane as in Example 42.4.1, the resulting charts will have C 0,1 transition maps, but the
points on M with x1 = 0 will have no one-sided tangent vectors in most directions.
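The absence of one-sided derivatives at x = 0 can be checked by evaluating the difference quotient f (x)/x = sin(k ln |x|) along two sequences tending to 0. This is a minimal sketch in Python using k = π/ ln 2 as in the text; the function names are illustrative assumptions.

```python
import math

K = math.pi / math.log(2.0)  # the value k = pi / ln 2 used in the text


def f(x):
    """f(x) = x sin(k ln|x|) for x != 0, and f(0) = 0."""
    return 0.0 if x == 0 else x * math.sin(K * math.log(abs(x)))


def quotient(x):
    """Difference quotient (f(x) - f(0)) / x, which equals sin(k ln|x|)."""
    return f(x) / x


for j in range(1, 6):
    at_power = quotient(2.0 ** -j)            # close to 0
    at_half = quotient(2.0 ** -(j + 0.5))     # alternates between +1 and -1
    print(j, round(at_power, 6), round(at_half, 6))
```

Along x = 2^(−j) the quotient vanishes, while along x = 2^(−(j+1/2)) it alternates between 1 and −1, so the quotient has no limit as x tends to 0 from the right.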
42.5. Torus
42.5.1 Definition: The n-torus T n may be defined to have the base space
$$ T^n = \{ z \in \mathbb{C}^n ;\ \forall i = 1 \ldots n,\ |z_i|^2 = 1 \}. $$
42.6. General sphere
[ Should present the Jacobian matrix for IRn in spherical coordinates. ]
The n-sphere S n is defined for n ∈ ℤ+ to have the base set
S n = {x ∈ IRn+1 ; |x| = 1}.
This will be given the differentiable structure of a C ∞ n-manifold.

[ Use the differentiable structure induced by the embedding of S n in IRn+1 . This means that the differentiable
structures induced by embeddings will have to be defined somewhere. This probably requires taking as an
atlas the set of all local coordinate maps which are projections onto tangential planes. If too small a number
of such projection maps is used, then a non-C 2 manifold could easily appear to be a C ∞ manifold, because
it is the overlap of maps which determines the regularity of an atlas. Somewhere there should also be a good
treatment of how the “maximal atlas” generated by a given atlas may actually be of variable regularity at
different parts of the manifold. There probably should also be a treatment somewhere of manifolds which
are actually graphs of functions f : IRn → IR. ]
To generate a general set of spherical coordinates for all dimensions, it is convenient to look for a pattern in
the following formulas for IR4 .
x1 = R cos ψ cos θ cos φ
x2 = R cos ψ cos θ sin φ
x3 = R cos ψ sin θ
x4 = R sin ψ.

This is the same as the formulas for IR3 with the difference that the first three coordinates are multiplied by
cos ψ and the fourth coordinate is sin ψ. Thus in IRn , the formulas would be
$$ x_i = \begin{cases} R \prod_{j=2}^{n} \cos\theta_j & \text{for } i = 1 \\ R \sin\theta_i \prod_{j=i+1}^{n} \cos\theta_j & \text{for } i > 1, \end{cases} $$
where θ2 , . . . θn are n − 1 angle parameters. It is interesting to note that R fits in logically as angle parame-
ter θ1 in this scheme. In fact, it is traditional to make all of the angles lie in the set [−π/2, π/2] except for
the first angle, which is in this case θ2 . The first angle is generally taken to lie in the set [0, 2π) or some such
set of width 2π. One might ask why the pattern is broken by this exception. The exception disappears if
R is allowed to be negative. For instance, in IR2 this gives x1 = R cos θ and x2 = R sin θ where R ∈ IR and
θ ∈ [−π/2, π/2]. Whether or not R is permitted to be negative, the idea of letting R be the first coordinate
θ1 causes the coordinates θ1 , . . . θn to be a right-handed set of coordinates with respect to x1 , . . . xn , and in
fact the Jacobian of the map from the n-tuple θ to the n-tuple x is the diagonal matrix diag(1, R, . . . R) when
all angles (but not R) are zero. This is the identity matrix for R = 1.
A consequence of the above discussion is a preferred order for spherical coordinates for
IR3 and higher-dimensional spaces. For IR3 , the order should be (R, φ, θ), where φ is the longitude angle and
θ is the latitude angle. Each time the dimension of the space is increased, the angle which brings in the new
axis direction is added to the end of the parameter list.
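The general formula for spherical coordinates in IRn can be written out directly. The following is a minimal sketch in Python, assuming the parameter order (R, θ2 , . . . θn ) described above; the function names and the finite-difference step are illustrative assumptions.

```python
import math


def spherical_to_cartesian(R, angles):
    """Map (R, theta_2, ..., theta_n) to (x_1, ..., x_n).

    x_1 = R * prod_{j=2..n} cos(theta_j)
    x_i = R * sin(theta_i) * prod_{j=i+1..n} cos(theta_j)   for i > 1
    `angles` holds theta_2 .. theta_n in that order.
    """
    n = len(angles) + 1
    theta = [None, None] + list(angles)  # theta[j] is theta_j for j = 2..n
    x = []
    for i in range(1, n + 1):
        value = R if i == 1 else R * math.sin(theta[i])
        start = 2 if i == 1 else i + 1
        for j in range(start, n + 1):
            value *= math.cos(theta[j])
        x.append(value)
    return x


def jacobian(R, angles, h=1e-6):
    """Central-difference Jacobian; entry [i][k] is d x_(i+1) / d param_(k+1),
    where the parameters are ordered (R, theta_2, ..., theta_n)."""
    params = [R] + list(angles)
    n = len(params)
    cols = []
    for k in range(n):
        plus = params[:]
        minus = params[:]
        plus[k] += h
        minus[k] -= h
        xp = spherical_to_cartesian(plus[0], plus[1:])
        xm = spherical_to_cartesian(minus[0], minus[1:])
        cols.append([(a - b) / (2 * h) for a, b in zip(xp, xm)])
    return [[cols[k][i] for k in range(n)] for i in range(n)]
```

For example, `spherical_to_cartesian(R, [phi, theta, psi])` reproduces the IR4 formulas above, and `jacobian(R, [0.0, 0.0])` is numerically diag(1, R, R), which is the identity matrix when R = 1.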
42.7. Conical coordinates for Euclidean spaces

[ These should be the sort of things I used for barrier functions in the old days. They involve hypergeometric
functions. This section is really about curvilinear coordinates for flat space. Such stuff should really be in a
different section probably. ]
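The Laplacian in conical coordinates. For x ∈ IRn , let r = |x| and φ = xn r −1 . If a function u is conically
symmetric, then one can write u(x) = r λ g(φ). Then, with ui denoting ∂u/∂xi ,
$$ u_i = \lambda r^{\lambda-2} x_i g(\phi) + r^{\lambda} g'(\phi) (\delta_{in} r^{-1} - r^{-3} x_i x_n) $$
$$ \Delta u = r^{\lambda-2} \bigl[ (1 - \phi^2) g''(\phi) - (n-1) \phi g'(\phi) + \lambda(\lambda + n - 2) g(\phi) \bigr]. $$
Let u(x) = f (r, φ). Then
$$ \Delta u = f_{rr} + \frac{n-1}{r} f_r + \frac{1-\phi^2}{r^2} f_{\phi\phi} - \frac{n-1}{r^2} \phi f_{\phi} . $$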
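The closed form for the Laplacian of u(x) = r λ g(φ) can be compared with a finite-difference Laplacian as a sanity check. This is a minimal sketch in Python; the function names, the sample point and the step size are illustrative assumptions.

```python
import math


def conical_laplacian(x, lam, g, dg, d2g):
    """Closed form r^(lam-2) * [(1-phi^2) g'' - (n-1) phi g' + lam (lam+n-2) g]."""
    n = len(x)
    r = math.sqrt(sum(v * v for v in x))
    phi = x[-1] / r
    return r ** (lam - 2) * ((1 - phi * phi) * d2g(phi)
                             - (n - 1) * phi * dg(phi)
                             + lam * (lam + n - 2) * g(phi))


def make_u(lam, g):
    """Return the conically symmetric function u(x) = r^lam * g(x_n / r)."""
    def u(x):
        r = math.sqrt(sum(v * v for v in x))
        return r ** lam * g(x[-1] / r)
    return u


def numeric_laplacian(u, x, h=1e-3):
    """Second-order central-difference Laplacian of u at the point x."""
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += (u(xp) - 2.0 * u(x) + u(xm)) / (h * h)
    return total
```

For instance, with λ = 2 and g(φ) = φ^2 the function is u(x) = (xn)^2 , and the closed form gives Δu = 2 in every dimension, as expected.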
42.8. Hyperboloid
Hyperboloids are sets of the form
$$ H^n_{s,c} = \Bigl\{ x \in \mathbb{R}^{n+1} ;\ \sum_{i=1}^{n+1} s_i x_i^2 = c \Bigr\}, $$
where c ∈ IR and s ∈ {−1, +1}n+1 .

[ Should give a basic classification of these manifolds, and also put on a differentiable structure. ]
42.9. Tractrix
The tractrix curve is defined in EDM2 [35], 93.H, page 351. In Cartesian coordinates, the tractrix has a
parametric formula f : (0, π) → IR2 with f (t) = a(log tan(t/2) + cos t, sin t).
A surface of constant negative curvature may be constructed as a surface of revolution generated by a tractrix.
This surface is called a “pseudo-sphere”. (See Bell [191], pages 303–305, EDM2 [35], 111.I, page 419, 285.E,
page 1072.)
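A well-known characteristic property of the tractrix, not stated above, is that the tangent segment from any point of the curve to the asymptote (here the x-axis) has constant length a. The following is a minimal sketch in Python which checks this for the parametric formula above; the function names are illustrative assumptions, and the velocity formula is obtained by differentiating f by hand.

```python
import math


def tractrix(t, a=1.0):
    """f(t) = a (log tan(t/2) + cos t, sin t) for t in (0, pi)."""
    return (a * (math.log(math.tan(t / 2.0)) + math.cos(t)), a * math.sin(t))


def tractrix_velocity(t, a=1.0):
    """Derivative f'(t) = a (cos^2 t / sin t, cos t)."""
    return (a * math.cos(t) ** 2 / math.sin(t), a * math.cos(t))


def tangent_length_to_axis(t, a=1.0):
    """Length of the tangent segment from f(t) to the x-axis (t != pi/2)."""
    x, y = tractrix(t, a)
    dx, dy = tractrix_velocity(t, a)
    s = -y / dy  # parameter at which the tangent line meets y = 0
    return abs(s) * math.hypot(dx, dy)


for t in (0.3, 1.0, 2.0, 2.9):
    print(t, tangent_length_to_axis(t, a=2.5))  # each value is close to 2.5
```

The point t = π/2 is excluded because the curve has a cusp there and the velocity vanishes.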
[ Calculate the curvatures of a tractrix of revolution. Also determine the form of all geodesics. ]

42.10. Analysis on Euclidean spaces

42.10.1 Example: Figure 42.10.1 shows the “vectors” df (or ∇f ) for a simple function on a subset of IR2 .
The function f is defined by f (x, y) = (y/(1 − x^2))^2 for x ∈ (−1, 1) and y ∈ IR. This has the differential
df (x, y) = (4xy^2 (1 − x^2)^−3 , 2y(1 − x^2)^−2 ). (Any resemblance between this diagram and a Viking opera house
is purely coincidental.)
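The formula for df can be checked against central differences. This is a minimal sketch in Python; the function names and the step size h are illustrative assumptions.

```python
def f(x, y):
    """f(x, y) = (y / (1 - x^2))^2 for x in (-1, 1)."""
    return (y / (1.0 - x * x)) ** 2


def df(x, y):
    """Differential (f_x, f_y) = (4 x y^2 (1-x^2)^-3, 2 y (1-x^2)^-2)."""
    return (4.0 * x * y * y * (1.0 - x * x) ** -3,
            2.0 * y * (1.0 - x * x) ** -2)


def numeric_df(x, y, h=1e-6):
    """Central-difference approximation of (f_x, f_y)."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))


print(df(0.5, 0.8))
print(numeric_df(0.5, 0.8))
```

The two printed pairs agree to within the accuracy of the finite-difference approximation.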
Figure 42.10.1 Differential of real-valued function on IR2 (level curves of f (x, y) = (y/(1 − x^2))^2 from f = 0.1 to f = 1.0, with the vectors (fx , fy ) = (4xy^2 , 2y(1 − x^2))/(1 − x^2)^3 )
[ Could have a section on Einstein manifolds near here. See Section 38.6 for the definition of an Einstein
space. This or some other section should cover such concepts as the de Sitter universe and the Friedman
universe, if these concepts are not mere figments of my imagination. Should have a specific model of the
universe using current ideas about its age and the Λ kludge factor and other such things. Probably should
have a whole chapter on cosmology. ]
[ Should have a section on Schwarzschild singularities near here. Give coordinates and solutions for black hole
problems. ]
[ Should have a section on information geometry near here. See EDM2 [35], 399.D, pages 1488–1489. ]


Chapter 43
Examples of fibre bundles
43.1 Euclidean fibre bundles on Euclidean spaces
43.2 The Möbius strip as a fibre bundle
43.3 The Möbius strip fibre bundle on S 1
This chapter presents various practical examples of fibre bundles.
43.1. Euclidean fibre bundles on Euclidean spaces

This section concerns fibre bundles with B = IRm , F = IRn and E = IRm+n with m, n ≥ 1, for a wide range
of structure groups on F from the trivial group up to the group of topological automorphisms of F . The
projection map π : E → B is defined as π : (x1 , . . . xm , xm+1 , . . . xm+n ) ↦ (x1 , . . . xm ). A full specification
tuple for a fibre bundle has the form
(E, π, B, A^F_E , F, G) −< (E, TE , π, B, TB , A^F_E ; G, TG , F, TF , σG , µ).
With this section’s assumptions, the specification tuple is as follows.
(IRm+n , π, IRm , A, IRn , G) −< (IRm+n , TIRm+n , π, IRm , TIRm , A; G, TG , IRn , TIRn , σG , µ),
where A is the IRn -fibre atlas for E = IRm+n . The topologies on the Euclidean spaces are the standard
Euclidean topologies. Let G be any subgroup of the group of homeomorphisms of the fibre space F = IRn .
Define the topology TG on G as the weak topology induced by µ, namely
$$ T_G = \bigl\{ \{ g \in G ;\ \mu(g, x) \in \Omega \} ;\ \Omega \in T_{\mathbb{R}^n} \text{ and } x \in \mathbb{R}^n \bigr\}. $$
The operation σG is completely determined by the action of elements g ∈ G on elements of F = IRn .

This operation is simply function composition. Hence the topological transformation group (G, F ) −<
(G, TG , F, TF , σG , µ) is well defined.
When the group G is chosen, the only freedom remaining in the construction of these fibre bundles is the
choice of the atlas A. This atlas consists of continuous maps φ : π −1 (U ) → F for open sets U ∈ TB = TIRm
such that π × φ : π −1 (U ) ≈ U × F . These charts are constrained by the requirement that their transition
maps must be elements of G acting on F . The smaller the group G is, the stronger the constraint on the
charts in the atlas.
First let the structure group be the trivial group G = {I}, where I : F → F is the identity map I : x ↦
x. Now define a single chart for E = IRm+n by φ0 : E → F with φ0 : (x1 , . . . xm , xm+1 , . . . xm+n ) ↦
(xm+1 , . . . xm+n ). Then A = {φ0 } is clearly a fibre bundle atlas for the fibre bundle.
For this group G = {I} and atlas A = {φ0 }, it is interesting to determine how much freedom remains
for defining compatible fibre charts for this fibre bundle. Let φ1 : π −1 (U ) → F be a fibre chart which is
compatible with A for some U ∈ TB . Then for all b ∈ U , the maps φb,1 ◦ φb,0^{−1} : F ≈ F and
φb,0 ◦ φb,1^{−1} : F ≈ F must both be elements of G, where φb,i denotes the restriction of φi to π −1 ({b})
for i = 0, 1. This clearly implies that φb,1 = φb,0 for all b ∈ U , from which it follows that φ1 is the
restriction of φ0 to π −1 (U ). In other words, all compatible charts for the particular choice of
atlas A must agree everywhere on their domain with the function φ0 .

Given G = {I} and any (G, F ) fibre chart at all for the fibre bundle (E, π, B), all other fibre charts agree
everywhere with the given fibre chart. This implies that there are infinitely many different, mutually
incompatible (G, F ) atlases for this fibre bundle. There is a very wide range of choice of fibre chart φ0 : E → F .
For example, φ′0 may be defined by φ′0 : (x1 , . . . xm , xm+1 , . . . xm+n ) ↦ (xm+1 + y 1 , . . . xm+n + y n ), where y ∈ F
is an arbitrary element of IRn . The set A′ = {φ′0 } still meets all of the requirements for a (G, F ) fibre bundle
atlas because φ′0 is still continuous and satisfies π × φ′0 : E ≈ B × F , and A′ has no problems with
transition maps because there are none. In fact, given any homeomorphism θ : F ≈ F , the set A′ = {θ ◦ φ0 }
is a valid (G, F ) atlas for (E, π, B) if φ0 : E → F is any valid fibre chart.
Now let G = {I, J}, where I is as above and J : F ≈ F is defined by J : x ↦ −x. Then clearly G is a
group of homeomorphisms of F . Let the atlas be A = {φ0 } again, where φ0 is as above. Then a
chart φ1 : π −1 (U ) → F for U ∈ TB is a compatible chart if and only if φb,1 = I ◦ φb,0 or φb,1 = J ◦ φb,0 for
all b ∈ U , where the choice of map I or J is constant with respect to b on connected subsets of U . If U = B,
clearly either φ1 = φ0 or φ1 = J ◦ φ0 .
In the case of the two-element group G = {I, J}, the choice of fibre charts which are compatible with a given
single fibre chart is very limited.
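The compatibility condition for the two-element group can be tried out on sample points. The following is only a minimal sketch in Python, assuming hypothetical dimensions m = 2 and n = 3 and a finite set of sample fibre vectors; it tests whether the transition map from the projection chart to a candidate chart agrees with I or with J on those samples, which is a necessary check rather than a proof. All names are illustrative.

```python
M, N = 2, 3  # hypothetical dimensions m and n


def phi0(e):
    """Fibre chart on E = IR^(m+n): keep the last n coordinates."""
    return tuple(e[M:])


def phi0_inv(b, v):
    """Inverse of phi0 restricted to the fibre over b."""
    return tuple(b) + tuple(v)


def I(v):
    return tuple(v)


def J(v):
    return tuple(-c for c in v)


def flipped(e):
    """Candidate chart J o phi0."""
    return J(phi0(e))


def shifted(e):
    """Candidate chart which translates the fibre by (1, ..., 1)."""
    return tuple(c + 1.0 for c in phi0(e))


def transition(chart, b, v):
    """Value of chart_b o (phi0_b)^{-1} at v in F."""
    return chart(phi0_inv(b, v))


def in_G(chart, b, samples):
    """True if the transition map at b agrees with I or with J on all samples."""
    return (all(transition(chart, b, v) == I(v) for v in samples)
            or all(transition(chart, b, v) == J(v) for v in samples))
```

With this sketch, `phi0` and `flipped` pass the test at any base point, whereas `shifted` fails, because a translation of the fibre is neither I nor J.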
Consider now the group G = {θ : F ≈ F ; θ is linear} = GL(n). In this case, it is not necessary that all
charts φ : π −1 (U ) → F for U ∈ TB should be linear, but if any one of these fibre charts is linear for a
given point b ∈ U , then all other fibre charts must be linear for this value of b. The fibre chart φ0 does not
necessarily map from origin to origin.
The largest possible group G is the set of all homeomorphisms of the topological space F = IRn . Then given
any fibre chart φ0 : U0 → F for the fibre bundle, any other fibre chart φ1 : U1 → F will have the form
e ↦ µ(θ(π(e)), φ0 (e)) for e ∈ π −1 (U0 ∩ U1 ) for some continuous map θ : U0 ∩ U1 → G. In this case, all fibre
charts are compatible. Therefore it is not necessary to specify an atlas.
One of the important points that comes clearly out of this example is the fact that the total space E is
only a topological space and has no other structure than that imposed by the fibre charts. The structure on
the structure group imposes structure on the total space, but it does not have to respect any structure on
E apart from its topological structure. Thus even if the group is a linear group such as GL(n), the point
(π × φ)−1 (b, 0) in each fibre π −1 ({b}) which is mapped by φ to 0 ∈ F is not necessarily any sort of zero
point such as (b1 , . . . bm , 0 . . . 0) ∈ π −1 ({b}). This “origin point” may be any point in π −1 ({b}), but in the
case G = GL(n), this point must be the same for all fibre charts in a given atlas, and this “origin point”
must vary smoothly with respect to b ∈ B.
This shows the significance of specifying an atlas for a fibre bundle. As soon as one fibre chart is given in a
neighbourhood of a point, all other fibre charts are determined up to a continuously varying group operation
in that neighbourhood. There is no unique identification of the fibre space F with the fibre π −1 ({b}) at each
point b ∈ B, except for the trivial structure group, but in the case of an effective structure group, the set of
all possible maps from the fibre at a point to the fibre space is homeomorphic to the structure group itself.
For instance, if the group has two elements, then there are two possible chart maps at b, and so forth.
43.2. The Möbius strip as a fibre bundle

The Möbius strip is perhaps the simplest non-trivial example of a fibre bundle. (The date 1865 is given by
do Carmo [17], page 36, for the first presentation of the Möbius strip by August Ferdinand Möbius.) Let
B = E = S 1 = {x ∈ IR2 ; |x| = 1} with the standard topology induced by IR2 , and define π : E → B by
π(eθ ) = e2θ , where eθ ∈ S 1 is defined by eθ = (cos θ, sin θ). It follows that π −1 ({eθ }) = {eθ/2 , eπ+θ/2 } for all
θ ∈ [0, 2π), where S 1 is identified with the set [0, 2π) with the usual equivalent topology. (See Figure 43.2.1.)
Clearly each set π −1 ({eθ }) consists of two points with the discrete topology. (With the trivial topology, the
set F would need the trivial topology, and the product topology of B ×F would not be locally homeomorphic
to the required open sets of E. [ Check to make certain that the trivial topology on F is impossible. If F
has the trivial topology, then G must be the trivial group. ]) Any fibre space for (E, π, B) would have to be
equivalent to F = S 0 = {1, −1} ⊆ IR with its standard topology.
The domains of fibre bundle charts are required to be of the form π −1 (U ) for some open subset U of B. For
example, consider open sets of the form Ut,r = {θ ∈ [0, 2π); |θ − t| < r} for t ∈ [0, 2π) and r ∈ [0, π],
using the obvious folding of θ into [0, 2π). Then π −1 (Ut,r ) = Ut/2,r/2 ∪ Uπ+t/2,r/2 . When r ≤ π, this set
has two components, and the value of φ(e) must be constant for e in each component, because F = S 0
has the discrete topology and φ must be continuous. (Of course, if F has the trivial topology, all functions
φ : E → F are continuous.) Since all homeomorphisms are bijections, F must have two elements and the
value of φ must be different for the two components of π −1 (Ut,r ). This also makes U = B impossible. And
this means that there must be at least 2 charts in the fibre atlas for (E, π, B). [ In fact, it looks like 3 charts
will be required, and two of these will flip the values of φ in their intersection. Must check this! ] . . .
Figure 43.2.1 Fibre bundle map for the Möbius strip (π −1 : B → E drawn between E = S 1 and B = S 1 , each parametrized by angles from 0 to 2π)
[ The following sets U1 etc. might contain an error. Check this. Also check φ1 and φ2 . ]
Figure 43.2.2 shows a fibre bundle map for the Möbius strip with U1 = (2π − ε, 2π − 2ε) ⊆ B and π −1 (U1 ) =
(2π − ε/2, π − ε) ∪ (π − ε/2, 2π − ε) ⊆ E for some small ε > 0. (Note that intervals are all denoted left-to-right
modulo 2π.)
Figure 43.2.2 Fibre bundle chart for the Möbius strip (φ1 : π −1 (U1 ) → F = {−1, 1} drawn above π : E → B, with U1 = (2π − ε, 2π − 2ε) and π −1 (U1 ) = (2π − ε/2, π − ε) ∪ (π − ε/2, 2π − ε))
By following the arrows, it is clear that for each b ∈ B, the two elements of π −1 ({b}) map to the two
different elements of F = {−1, 1}. A similar fibre chart can be defined for an open subset of B such as
U2 = (π − ε, π − 2ε). Then {U1 , U2 } forms an open covering of B, and on the overlap components of the
sets π −1 (Ui ), the map to F may differ, but the transition function is an element of the structure group. The
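fibre chart represented in Figure 43.2.2 is φ1 : π −1 (U1 ) → F defined as follows.
$$ \phi_1(\theta) = \begin{cases} -1 & \text{for } \theta \in [0, \pi - \varepsilon) \cup (2\pi - \varepsilon/2, 2\pi) \\ 1 & \text{for } \theta \in (\pi - \varepsilon/2, 2\pi - \varepsilon). \end{cases} $$
If φ2 : π −1 (U2 ) → F is defined by
$$ \phi_2(\theta) = \begin{cases} -1 & \text{for } \theta \in [0, \pi/2 - \varepsilon) \cup (3\pi/2 - \varepsilon/2, 2\pi) \\ 1 & \text{for } \theta \in (\pi/2 - \varepsilon/2, 3\pi/2 - \varepsilon), \end{cases} $$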
then φ1 and φ2 agree on the sets [0, π/2 − ε) ∪ (2π − ε/2, 2π) and (π − ε/2, 3π/2 − ε), but they disagree on
(π/2 − ε/2, π − ε) and (3π/2 − ε/2, 2π − ε). Hence the group element gφ1 φ2 (b) ∈ G is constant on each of
these intervals, and therefore is a continuous function of b ∈ B. (See Figure 43.2.3.)
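The agreement and disagreement sets of φ1 and φ2 can be checked numerically. The following is a minimal sketch in Python, assuming a hypothetical value ε = 0.1 and representing points of both E = S 1 and B = S 1 by angles in [0, 2π); the function names are illustrative only.

```python
import math

EPS = 0.1  # hypothetical small epsilon
TWO_PI = 2.0 * math.pi


def phi1(theta):
    """Chart phi_1 on pi^{-1}(U_1); returns None outside its domain."""
    if 0.0 <= theta < math.pi - EPS or TWO_PI - EPS / 2 < theta < TWO_PI:
        return -1
    if math.pi - EPS / 2 < theta < TWO_PI - EPS:
        return 1
    return None


def phi2(theta):
    """Chart phi_2 on pi^{-1}(U_2); returns None outside its domain."""
    if 0.0 <= theta < math.pi / 2 - EPS or 3 * math.pi / 2 - EPS / 2 < theta < TWO_PI:
        return -1
    if math.pi / 2 - EPS / 2 < theta < 3 * math.pi / 2 - EPS:
        return 1
    return None


def fibre(b):
    """The two points of E lying over the base angle b in [0, 2*pi)."""
    return (b / 2.0, b / 2.0 + math.pi)


def transition(b):
    """+1 if phi1 and phi2 agree on the fibre over b (identity element),
    -1 if they differ (the flip), None if b is outside the overlap."""
    signs = set()
    for theta in fibre(b):
        v1, v2 = phi1(theta), phi2(theta)
        if v1 is None or v2 is None:
            return None
        signs.add(v1 * v2)
    if len(signs) != 1:
        raise ValueError("the two fibre points give different group elements")
    return signs.pop()
```

With these values, `transition(b)` is the identity on the component of the overlap containing b = 0 and the flip on the component containing b = 3π/2, so it is constant on each component of U1 ∩ U2 .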

Figure 43.2.3 Fibre bundle charts for the Möbius strip (φ1 : π −1 (U1 ) → F and φ2 : π −1 (U2 ) → F on E = S 1 , with F = {−1, 1}, compared over a base point b)
43.2.1 Remark: Even though the sets B and E are both equal to S 1 in this example, they are regarded
as being two “copies” of the same set S 1 rather than the same set. This can be made self-consistent by
introducing the concept of a “labelled set”, which is really an ordered pair (L, S) where L is the label and
S is the set. Then two sets with different labels are actually different. This would make life too difficult. So
it is best to deal with this aspect of set naming in an informal way. However, it is important to know that
this situation can be made self-consistent if required. One good way of dealing with this would be to say that B
and E are labels in some label space, and the values of both of these labels are the same set S 1 .
[ The rest of these examples need to go in their own sections. ]
43.2.2 Example: [A torus: E = S 1 × S 1 , B = S 1 , F = S 1 . Turn this into a Klein bottle by turning the
tube inside out at the join?]
43.2.3 Example: [Some sort of multi-Möbius strip: B = S 1 , F = n = {e2πki/n ; k = 0, . . . n − 1}.]
43.2.4 Example: [B = S 2 , F = n = {e2πki/n ; k = 0, . . . n − 1}. Some sort of 2-d multi-Möbius strip?]
43.2.5 Example: [A sphere: B = S 2 , F = S 1 .]
43.2.6 Remark: The worst joke in mathematics is probably: “Why did the chicken cross the Möbius strip?
Answer: To get to the same side.”
43.3. The Möbius strip fibre bundle on S 1

[ This is the original section on the Möbius strip written in the early 1990s. ]
A simple non-trivial example of a fibre bundle is the Möbius strip fibre bundle on the circle S 1 as the base
space. Define E = [0, 1) × (0, 1) ⊆ IR2 , with the topology T = τ (Top(IR × (0, 1))), where IR × (0, 1) has the
topology induced by IR2 , and τ : IR × (0, 1) → E is defined by
$$ \tau(x, y) = \begin{cases} (x \bmod 1,\ y), & x \bmod 2 \in [0, 1) \\ (x \bmod 1,\ 1 - y), & x \bmod 2 \in [1, 2). \end{cases} $$
Then T is a topology on E. This can be seen by noting that for any set G ∈ Top(IR × (0, 1)),
$$ \tau^{-1} \circ \tau(G) = \bigcup_{i \in \mathbb{Z}} (G + 2i) \cup \bigcup_{j \in \mathbb{Z}} (G' + 2j + 1) \in \mathrm{Top}(\mathbb{R} \times (0, 1)), $$
where G′ = {(x, y) ∈ IR2 ; (x, 1 − y) ∈ G}. Then it follows by Theorem 15.11.6 that T is indeed a topology
on E. So denote it by Top(E) = T .
Similarly define the topology on B = [0, 1) to be Top(B) = σ(Top(IR)), where IR has the usual topology and
σ : IR → B is defined by σ(x) = [x]. Define p : E → B by p(x, y) = x. Then p is continuous.
Let F = (0, 1) have the topology induced by IR, and let G = {1, g} be the group acting on F with g : y ↦ 1−y.
Define the atlas S = {ψ1 , ψ2 } by
p(Dom(ψ1 )) = (0, 1)
p(Dom(ψ2 )) = [0, 1) \ {1/2}
∀(x, y) ∈ (0, 1) × (0, 1), ψ1 (x, y) = y
∀(x, y) ∈ [0, 1/2) × (0, 1), ψ2 (x, y) = y
∀(x, y) ∈ (1/2, 1) × (0, 1), ψ2 (x, y) = 1 − y.
Then p(Dom(ψ1 )), p(Dom(ψ2 )) ∈ Top(B) and ⋃ψ∈S p(Dom(ψ)) = B.
Also note that p × ψ : Dom(ψ) ≈ p(Dom(ψ)) × F for ψ ∈ S, and
$$ g_{\psi_i \psi_j} = \psi_{i,b} \circ \psi_{j,b}^{-1} = \begin{cases} 1, & b \in (0, 1/2) \text{ or } i = j \\ g, & b \in (1/2, 1) \text{ and } i \ne j, \end{cases} $$
for i, j ∈ {1, 2}, and for any topology on G, gij : Ui ∩ Uj → G is continuous, since each gij is constant on
the components of its domain.
Thus all the conditions for a fibre bundle atlas are satisfied.
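The atlas and its transition functions can be written out concretely. This is a minimal sketch in Python of the maps τ , ψ1 and ψ2 defined above; the function names and the use of None for points outside a chart domain are illustrative assumptions.

```python
def tau(x, y):
    """Quotient map IR x (0,1) -> E = [0,1) x (0,1), flipping y on odd unit intervals."""
    if x % 2.0 < 1.0:
        return (x % 1.0, y)
    return (x % 1.0, 1.0 - y)


def psi1(x, y):
    """Chart psi_1 on (0,1) x (0,1)."""
    if 0.0 < x < 1.0:
        return y
    return None  # outside the chart domain


def psi2(x, y):
    """Chart psi_2 on ([0,1/2) u (1/2,1)) x (0,1)."""
    if 0.0 <= x < 0.5:
        return y
    if 0.5 < x < 1.0:
        return 1.0 - y
    return None  # outside the chart domain


def transition(b, y):
    """psi_{1,b} o (psi_{2,b})^{-1} applied to y, for b in (0,1/2) u (1/2,1)."""
    # Invert psi_2 on the fibre over b, then apply psi_1.
    y_in_E = y if b < 0.5 else 1.0 - y
    return psi1(b, y_in_E)
```

For b in (0, 1/2) the transition is the identity, and for b in (1/2, 1) it is g : y ↦ 1 − y. The chart ψ2 is also consistent across the glued edge: `psi2(*tau(-0.125, 0.25))` and `psi2(*tau(0.125, 0.25))` both equal 0.25.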
[ Try to create a PostScript graphic for this fibre bundle. ]


Chapter 44
Derivations, gradient operators, germs and jets
44.1 Definitions
44.2 Some elementary examples
44.3 Further elementary examples
44.4 Spaces of differentiable functions
44.5 Spaces of smooth functions
44.6 The space of analytic functions
44.7 The Hölder spaces
44.8 Further topics on derivations
44.9 Germs
44.10 Jets
This chapter might be deleted in the first release of the book. The author originally thought that derivations
were the best way to define tangent spaces. After long contemplation on this subject, he concluded that
derivations were the worst way to define tangent spaces. This chapter has been placed near the end in the
hope that everybody will ignore it. The valiant attempt to convert tangent bundle analysis into algebra
leads to a mass of convoluted theory which is unnatural and difficult to use. It is better for algebra fans to
just grit their teeth and put up with all the limits and derivatives, or just stick to algebra.
The topic of the dimensionality of derivation-style tangent spaces, though apparently peripheral, takes up
most of this chapter, just to make sure that the rough edges are ironed out properly. This chapter may
be completely ignored. In fact, it probably should be completely ignored. A proof of the dimensionality
of the space of derivations is given by Warner [50], Theorem 1.17, page 13. See also Crampin/Pirani [12],
pages 247–248. The difficulty of just proving the dimensionality of the derivation-style tangent space is
sufficient reason to reject this approach. The fact that this approach can only be applied to C ∞ manifolds
is the coup de grâce.
In an attempt to get coordinate-free in relation to differential geometry, the modern Leibniz-rule definition
of a tangent vector has been widely adopted. This corresponds to the concept of a “derivation”. (See [20],
p.20.) It is not really possible to deal with manifolds in a totally coordinate-free manner, since manifolds
are defined to be topological spaces which can be coordinatized. Attempts to work without coordinates only
succeed in hiding the coordinates. But all definitions lead back to coordinates eventually. The “coordinate-
free” notations are nevertheless often tidier-looking than their coordinate-oriented equivalents. So it could
be useful to establish the correspondence between the two sets of definitions carefully.
This chapter commences with some basic definitions and examples. The examples are intentionally ele-
mentary, in order to demonstrate the surprising complexity of the product-rule definition of tangent space.
Following the simple examples, realistic function spaces like the smooth functions are dealt with.
[ The “synthetic differential geometry” approach gives a tangent space dimension theorem of more or less the
same sort as given in this chapter. See [50], Theorem 1.17. ]
[ See Klingenberg [26], lemma 1.4.6, p.33 for a proof that all derivations are gradient operators. ]


44.1. Definitions
[ Gallot/Hulin/Lafontaine [20] has definitions for this section. Defn. 1.45 derivations, defn. 1.50 derivation,
theorems 1.49, 1.51 Lie derivatives for derivations. ]
All of the difficulties arise even in the case of the manifold IRn . So it is not necessary to deal with general
differentiable manifolds here. In order for the Leibniz rule to make much sense, the function space in question
should be closed under the pointwise product of functions.
[ The following definition should explicitly define derivations, not just the tangent space. ]
44.1.1 Definition: Let F ⊆ IR^(IRn) be a real vector space of functions f : IRn → IR under pointwise
addition and scalar multiplication which is closed under the operation of pointwise multiplication,
∀x ∈ IRn , (f.g)(x) = f (x).g(x),
for all f, g ∈ F . For such a space F and any p ∈ IRn , the tangent space of IRn at p with respect to F , denoted
TpF (IRn ), is the vector space of linear maps L : F → IR such that
∀f, g ∈ F, L(f.g) = L(f )g(p) + f (p)L(g).
Any such linear map L is called a tangent vector of IRn at p with respect to F .
44.1.2 Remark: This definition is quite standard, although it strongly emphasizes the function space as
opposed to the underlying manifold. In EDM [34] and Szekeres [87], F is taken to be in C ∞ (M ) for any C ∞
manifold M .
The definition of tangent space makes sense even if the base space M is generalized from IRn to any space
at all. The space F can also be generalized in various ways, for instance by replacing IR with a ring, but life
is already complicated enough as it is.
44.2. Some elementary examples

In this section, the most elementary examples of function spaces are examined – those which are generated
by a single member of the space.
The smallest vector space F of functions closed under pointwise multiplication is the trivial space F = {0},
for which TpF (IRn ) = {0}, where 0 represents, according to context, the zero function 0 : IRn → IR or the
zero operator 0 : F → IR.
Suppose F contains at least one function f : IRn → IR such that f ≠ 0. Then F also contains the functions
x ↦ f (x)^i for positive integers i, and all linear combinations of such powers of f . That is, all polynomial
expressions of f (without a constant term) are in F . This may be denoted by F [f ], the space generated
by f :
$$ F[f] = \Bigl\{ \sum_{i=1}^{m} a_i f^i ;\ m > 0 \text{ and } \forall i = 1 \ldots m,\ a_i \in \mathbb{R} \Bigr\}. $$
F [f ] is a vector space which is closed under pointwise multiplication, and is therefore an appropriate space
with respect to which a tangent space on IRn can be defined.
If f is a non-zero constant function, then F [f ] is 1-dimensional. More generally, if the range of f is finite,
then there will always be a non-trivial linear combination of the powers of f which is constant. Indeed,
suppose the range of f consists of the m distinct non-zero values yj and possibly also 0. Then the formal
polynomial in φ,
$$ P(\phi) = \phi \prod_{j=1}^{m} (\phi - y_j), $$
has the value 0 at φ = f . This may be expanded in terms of powers of φ as
$$ P(\phi) = \sum_{k=1}^{m+1} c_k \phi^k = \phi^{m+1} + \sum_{k=1}^{m} c_k \phi^k, $$
where the real coefficients ck are given by c_{m+1} = 1, c_m = Σ_{j=1..m} (−yj ), c_1 = Π_{j=1..m} (−yj ) and so forth. Thus
f^{m+1} can be written as a linear combination of lower positive powers of f , and so the dimension of F [f ]
is at most m. The converse fact that no lower power of f can be expressed in terms of yet lower
powers of f follows from the unique factorization theorem. So dim(F [f ]) = #(f (IRn ) \ {0}).
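The coefficients ck can be computed by multiplying out the factors of P one at a time. The following is a minimal sketch in Python; the function names and the sample values yj are illustrative assumptions.

```python
def expand_P(values):
    """Coefficients c_0 .. c_{m+1} of P(phi) = phi * prod_j (phi - y_j),
    listed from the lowest power to the highest."""
    coeffs = [0.0, 1.0]  # the polynomial phi
    for y in values:
        shifted = [0.0] + coeffs                    # phi * current polynomial
        scaled = [-y * c for c in coeffs] + [0.0]   # -y * current polynomial
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs


def evaluate(coeffs, t):
    """Value of the polynomial with the given coefficients at t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


# Hypothetical range {0, 1, 2, -3}, so m = 3 and (y_1, y_2, y_3) = (1, 2, -3).
c = expand_P([1.0, 2.0, -3.0])
print(c)  # coefficients of phi^4 - 7 phi^2 + 6 phi, lowest power first
```

Here c_4 = 1, c_3 = −(1 + 2 − 3) = 0 and c_1 = (−1)(−2)(3) = 6, in agreement with the formulas above, and P vanishes at every value in the range of f .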
If the range of f is infinite, F [f ] clearly has countably infinite dimension. It turns out that the dimension of
F [f ] has an influence on the dimension of the tangent space T_p^{F[f]} (IRn ).
44.2.1 Theorem: The tangent space T_p^{F[f]} (IRn ) is 0-dimensional if the range of f is finite.
Proof: In the case of a constant function f, it is easy to show that L(f) = 0 for all $L \in T_p^{F[f]}(\mathbb{R}^n)$. In
fact, L(1) = L(1.1) = L(1).1 + 1.L(1) = 2L(1), so L(1) = 0, and L(f) = 0 follows by linearity. Hence $T_p^{F[f]}(\mathbb{R}^n) = \{0\}$.
If the range of f is finite, then $P(f) \in F[f]$ and any $L \in T_p^{F[f]}(\mathbb{R}^n)$ may be applied to P(f):
$$\begin{aligned}
0 = L(P(f)) &= L\Bigl( \sum_{k=1}^{m+1} c_k f^k \Bigr) \\
&= \sum_{k=1}^{m+1} c_k L(f^k) \\
&= L(f) \sum_{k=1}^{m+1} k c_k f(p)^{k-1}.
\end{aligned}$$
The last line follows by the inductive application of the Leibniz rule to the product $f^k = f.f^{k-1}$.
Viewed as a polynomial in f(p), the coefficient of L(f) in the last line above is the derivative of the formal
polynomial P evaluated at f(p). And since P has no multiple zeros, the derivative of P must be non-zero at
each of its zeros, which in this case are the possible values of f(p). So the coefficient of L(f) is not equal to 0,
and it therefore follows that L(f) = 0. Since $L(f^k) = k f(p)^{k-1} L(f)$ for all $k \ge 1$, L then vanishes on all of F[f].
This has shown that $T_p^{F[f]}(\mathbb{R}^n)$ is 0-dimensional if the range of f is finite.
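The key step of the proof is that P′ does not vanish at any value in the range of f. A minimal sketch of this check follows, again assuming the hypothetical non-zero values 1, 2, 5 for f; the function names `P` and `dP` are illustrative only.

```python
from fractions import Fraction

def P(t, ys):
    """P(t) = t * prod_j (t - y_j)."""
    out = Fraction(t)
    for y in ys:
        out *= Fraction(t) - y
    return out

def dP(t, ys):
    """Derivative of P at t, by the product rule over the factors t, (t - y_1), ..."""
    roots = [0] + list(ys)
    total = Fraction(0)
    for skip in range(len(roots)):
        term = Fraction(1)
        for idx, r in enumerate(roots):
            if idx != skip:
                term *= Fraction(t) - r
        total += term
    return total

ys = [1, 2, 5]       # hypothetical non-zero values of f
values = [0] + ys    # every possible value of f(p)
derivs = {v: dP(v, ys) for v in values}

for v in values:
    print(f"f(p) = {v}: P = {P(v, ys)}, P' = {derivs[v]}")
```

Each listed value is a zero of P at which P′ is non-zero, so in the identity 0 = L(f).P′(f(p)) the factor L(f) must be the one that vanishes.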
44.2.2 Remark: Now suppose that the range of f is infinite. Any function g in F[f] may be written as
$$g = \sum_{i=1}^m a_i f^i$$
for some integer m and coefficients $a_i \in \mathbb{R}$. An inductive application of the Leibniz rule then reduces the
expression for L(g) to the following:
$$L(g) = L(f) \sum_{i=1}^m i a_i f(p)^{i-1},$$
by induction on i. Thus $T_p^{F[f]}(\mathbb{R}^n)$ is at most 1-dimensional, since the value of any tangent vector L on a
function in F[f] is completely determined by its value on f.
Although it is clear that the tangent space $T_p^{F[f]}(\mathbb{R}^n)$ is at most 1-dimensional, it is perhaps not so obvious
that it is at least 1-dimensional. To show this, it is essentially necessary to find at least one non-zero map
L which obeys the Leibniz rule for all pairs of functions g and h in F[f]. This is equivalent to showing
that no further reductions of the above expression for L(g) are possible by using the Leibniz product rule.
This difficulty increases as the size of F increases. It is often difficult to know whether further reductions
are possible. It is for this reason that it is useful to examine trivial examples first in depth, so as to find
techniques for showing the non-existence of further reductions.
44.2.3 Theorem: The tangent space $T_p^{F[f]}(\mathbb{R}^n)$ is 1-dimensional if the range of f is infinite.

Proof: It is sufficient to demonstrate the existence of a non-zero linear map $L : F[f] \to \mathbb{R}$ which obeys
the Leibniz rule for all functions $g, h \in F[f]$. If the range of f is infinite, then any $g \in F[f]$ may be written
as
$$g = \sum_{i=1}^{m_1} a_i f^i,$$
for some coefficients $(a_i)_{i=1}^{m_1}$. These coefficients are uniquely determined by g, because the vectors $f^i$ are
linearly independent for $i \ge 1$ if the range of f is infinite. Define a map $L : F[f] \to \mathbb{R}$ in terms of these
unique coefficients for g by
$$L(g) = \alpha \sum_{i=1}^{m_1} i a_i f(p)^{i-1}, \eqno(44.2.1)$$
for some $\alpha \in \mathbb{R}$. Similarly, any $h \in F[f]$ uniquely determines coefficients $(b_j)_{j=1}^{m_2}$ such that
$$h = \sum_{j=1}^{m_2} b_j f^j.$$
(Of course the numbers $m_1$ and $m_2$ are not uniquely determined. They are actually irrelevant, and serve
only to emphasize that the sequences are finite.) By expanding the product g.h, the value of L(g.h) can be
calculated as:
$$\begin{aligned}
L(g.h) &= L\Bigl( \sum_{i=1}^{m_1} a_i f^i \sum_{j=1}^{m_2} b_j f^j \Bigr) \\
&= \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} a_i b_j L(f^{i+j}) \\
&= \alpha \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} (i + j) a_i b_j f(p)^{i+j-1}.
\end{aligned}$$
The right hand side of the Leibniz rule is
$$\begin{aligned}
L(g)h(p) + g(p)L(h) &= L\Bigl( \sum_{i=1}^{m_1} a_i f^i \Bigr) . \sum_{j=1}^{m_2} b_j f(p)^j + \sum_{i=1}^{m_1} a_i f(p)^i . L\Bigl( \sum_{j=1}^{m_2} b_j f^j \Bigr) \\
&= \alpha \sum_{i=1}^{m_1} i a_i f(p)^{i-1} . \sum_{j=1}^{m_2} b_j f(p)^j + \sum_{i=1}^{m_1} a_i f(p)^i . \alpha \sum_{j=1}^{m_2} j b_j f(p)^{j-1} \\
&= \alpha \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} (i + j) a_i b_j f(p)^{i+j-1}.
\end{aligned}$$
Both of these calculations use only the definition (44.2.1) for g. This verifies the product rule for L for all
$g, h \in F[f]$, for any $\alpha \in \mathbb{R}$. Taking $\alpha \ne 0$ gives a non-zero map, since $L(f) = \alpha$.
The tangent space $T_p^{F[f]}(\mathbb{R}^n)$ is thus at least 1-dimensional. So the dimension of the tangent space is
precisely 1.
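The two calculations in the proof can be compared mechanically. The sketch below makes some illustrative assumptions: an element of F[f] is represented by its coefficient list $(a_1, \ldots, a_m)$, the symbol `t` stands for the number f(p), and the helper names are hypothetical. It evaluates both sides of the Leibniz rule for the map defined by (44.2.1) on randomly chosen rational data.

```python
from fractions import Fraction
import random

def L(coeffs, t, alpha):
    """Formula (44.2.1): alpha * sum_i i * a_i * t^(i-1), with coeffs[i-1] = a_i."""
    return alpha * sum(i * a * t ** (i - 1) for i, a in enumerate(coeffs, start=1))

def value_at_p(coeffs, t):
    """g(p) = sum_i a_i * t^i, where t = f(p)."""
    return sum(a * t ** i for i, a in enumerate(coeffs, start=1))

def product(a, b):
    """Coefficient list of g.h, indexed from power 1 (the power-1 entry is 0)."""
    out = [Fraction(0)] * (len(a) + len(b))
    for i, ai in enumerate(a, start=1):
        for j, bj in enumerate(b, start=1):
            out[i + j - 1] += ai * bj
    return out

def leibniz_defect(a, b, t, alpha):
    """L(g.h) - (L(g) h(p) + g(p) L(h)); zero when the Leibniz rule holds."""
    lhs = L(product(a, b), t, alpha)
    rhs = L(a, t, alpha) * value_at_p(b, t) + value_at_p(a, t) * L(b, t, alpha)
    return lhs - rhs

rng = random.Random(0)
defects = []
for _ in range(200):
    a = [Fraction(rng.randint(-5, 5)) for _ in range(rng.randint(1, 4))]
    b = [Fraction(rng.randint(-5, 5)) for _ in range(rng.randint(1, 4))]
    t = Fraction(rng.randint(-3, 3), rng.randint(1, 4))
    alpha = Fraction(rng.randint(-3, 3))
    defects.append(leibniz_defect(a, b, t, alpha))

print("Leibniz rule holds in all trials:", all(d == 0 for d in defects))
```

Exact rational arithmetic is used so that a zero defect is a genuine identity rather than a rounding coincidence. The check relies on the coefficient lists being unique representations, which is exactly what the infinite range of f provides.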
44.2.4 Remark: It is perhaps not quite obvious why this form of proof would not work when the space
F[f] is finite-dimensional. The operator L defined by (44.2.1) is not uniquely defined when $\dim(F[f]) < \infty$,
since the value of L(g) for $g \in F[f]$ depends on whether g is reduced to lower powers of f before having L
applied to it. One way around this is to require all functions in F[f] to be reduced to the lowest possible
powers of f before the application of L:
$$g = \sum_{i=1}^m a_i f^i.$$
Such a representation of g is then unique, but the Leibniz rule fails for the resulting operator when $\alpha \ne 0$.
To see this, let $g = f^m$ and $h = f$. Then $g.h = f^{m+1}$ is equal to some linear combination of the powers $f^i$
for $1 \le i \le m$:
$$f^{m+1} = \sum_{i=1}^m c_i f^i,$$
for some real numbers $c_i$. Hence the definition of L gives
$$L(g.h) = \alpha \sum_{i=1}^m i c_i f(p)^{i-1}.$$
But
$$\begin{aligned}
L(g)h(p) + g(p)L(h) &= \alpha m f(p)^{m-1}.f(p) + f(p)^m.\alpha \\
&= \alpha (m + 1) f(p)^m.
\end{aligned}$$
If the Leibniz rule is to hold, then these two calculations of L(g.h) should give the same result. Up to sign,
their difference is $\alpha$ times the derivative at f(p) of the polynomial $\phi^{m+1} - \sum_{i=1}^m c_i \phi^i$, which vanishes
on the range of f. But as noted in the proof of Theorem 44.2.1, the derivative of a polynomial cannot be zero
at a simple zero of the polynomial. So the Leibniz rule does not hold for the product g.h.
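A small case makes the failure concrete. The sketch below assumes a hypothetical f whose only values are 0, 1 and 2, so that m = 2 and f³ = 3f² − 2f, and takes α = 1. It compares L applied to the reduced form of f³ with the Leibniz expression for g = f² and h = f.

```python
from fractions import Fraction

alpha = Fraction(1)
m = 2
# Assumed reduction: f^3 = 3 f^2 - 2 f when f takes only the values 0, 1, 2.
c = {1: Fraction(-2), 2: Fraction(3)}

def L_of_reduced_f_cubed(t):
    """L applied to the reduced form of f^(m+1): alpha * sum_i i * c_i * t^(i-1)."""
    return alpha * sum(i * ci * t ** (i - 1) for i, ci in c.items())

def leibniz_rhs(t):
    """L(g) h(p) + g(p) L(h) for g = f^m and h = f, which is alpha * (m+1) * t^m."""
    return alpha * (m + 1) * t ** m

possible_values = (0, 1, 2)  # the possible values of t = f(p)
relation_holds = all(Fraction(t) ** 3 == 3 * Fraction(t) ** 2 - 2 * Fraction(t)
                     for t in possible_values)
defects = {t: L_of_reduced_f_cubed(Fraction(t)) - leibniz_rhs(Fraction(t))
           for t in possible_values}

print("reduction f^3 = 3 f^2 - 2 f holds on the range:", relation_holds)
for t, d in defects.items():
    print(f"f(p) = {t}: L(reduced f^3) - Leibniz value = {d}")
```

The defect is non-zero at every possible value of f(p), and in each case equals −α times the derivative of φ³ − 3φ² + 2φ at that value.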
44.2.5 Remark: Clearly, if the elements of the function space F are required to be continuous, then the
only kind of function with a finite range is a constant function, on a connected region at least.
44.3. Further elementary examples

In this section, the case $F = F[f_1, f_2]$ and similar function spaces are treated, like $F[x, \sin x]$. It is shown
that the tangent space can have very large dimension even for $\mathbb{R}^1$.
[ The case $F = F[f_1, f_2]$. If this space is infinite-dimensional and there are no relations between $f_1$ and $f_2$,
then the tangent space $T_p^{F[f_1,f_2]}(\mathbb{R}^n)$ should turn out to be 2-dimensional. It should then be possible to
come to some reasonable conclusions when there are relations between $f_1$ and $f_2$. Then spaces generated by
any number of functions should be dealt with. The case where one of these functions is constant (or
finite-valued) should make it possible to subtract g(p) from any function $g \in F$, and therefore concentrate on those
functions which vanish at p. There should be some connections with abstract algebras here somewhere. ]
[ Next deal with the concrete case F[x]. This is the space of all polynomials. The concrete case $F[x, \sin x]$
may or may not be instructive. Could try for instance
$$L(x^i) = \begin{cases} 1, & i = 1 \\ 0, & i \ne 1 \end{cases}$$
$$L(\sin^j x) = \begin{cases} 2, & j = 1 \\ 0, & j \ne 1 \end{cases}$$
$$L(x^i \sin^j x) = \begin{cases} 1, & i = 1,\ j = 0 \\ 2, & i = 0,\ j = 1 \\ 0 & \text{otherwise} \end{cases}$$
where
$$F = \Bigl\{ f : \mathbb{R} \to \mathbb{R};\ f(x) = \sum_{i,j=1}^{m,n} a_{ij} x^i \sin^j x,\ m, n \ge 0,\ a_{ij} \in \mathbb{R} \text{ for all } i, j \Bigr\}.$$
Thus $L(f) = a_{10} + 2a_{01}$. Comparison with the space $F[1, x, \sin x]$ may be interesting. ]
44.4. Spaces of differentiable functions

Warner [50], Theorem 1.17, page 13, proves the dimensionality of the space of derivations. He shows
that the dimension of the space of derivations equals the dimension of the manifold in the case of $C^\infty$
test functions. But in the case of $C^k$ functions for $k < \infty$, he says on page 16 that the dimension of the
derivation space is infinite. (This is shown in Newns/Walker [81].) Then he says that the way to fix this is
to define the derivations as standard derivatives. The conclusion, then, is that one can replace derivatives
with derivations in a convoluted manner in the $C^\infty$ case, but must revert to derivatives otherwise. Since
the derivation approach is so restrictive and artificial, it is very difficult indeed to understand why anyone
would use this approach. Maybe it is designed to make algebraists more comfortable.
In this section, the case $F = \{f : \mathbb{R}^n \to \mathbb{R};\ f$ is differentiable in a neighbourhood of $p\}$ for $p \in \mathbb{R}^n$ is treated.
This shows the difficulties which arise when a space is not closed under the quotient operator.
Here is a hopeful form of proof of the dimension of the tangent space for the space of differentiable functions.
To clarify the argument, the simplifying assumption that n = 1 is made.
Suppose F is the space of real functions differentiable in a neighbourhood of $p \in \mathbb{R}$. For any $f \in F$, for x
in a suitable neighbourhood of p,
$$\begin{aligned}
f(x) &= f(p) + \int_0^1 f'(p + t(x - p)).(x - p) \, dt \\
&= f(p) + (x - p). \int_0^1 f'(p + t(x - p)) \, dt \\
&= f(p) + (x - p).g(x),
\end{aligned}$$
where $g(x) = \int_0^1 f'(p + t(x - p)) \, dt$. The function g is well-defined in a neighbourhood of p. So now the Leibniz
rule may be applied to f as follows:
$$\begin{aligned}
L(f) &= L(f(p)) + L((x - p).g) \\
&= 0 + (x - p)\Big|_{x=p} .L(g) + L(x - p).g(p) \\
&= 0.L(g) + f'(p).L(x - p) \\
&= f'(p).L(x - p).
\end{aligned}$$
Thus the value of L(f) is just whatever you get when you apply L to the function $x \mapsto x - p$ multiplied by
the derivative of f at p. This would imply that the tangent space is 1-dimensional. It doesn’t matter what
L(g) is, because it gets multiplied by zero anyway. But that’s exactly where the proof falls down. The value
of L(g) is of no relevance as long as it is a real number. But in fact, it is not necessarily even defined. L(g)
is only defined if g ∈ F . An example demonstrates that this is not so in general.
The function g defined above is actually just the differential quotient of f. For any $p \in \mathbb{R}$, define the
(non-linear) operator $Q_p : \mathbb{R}^{\mathbb{R}} \to (\overline{\mathbb{R}})^{\mathbb{R}}$ by
$$Q_p(f)(x) = \begin{cases}
\dfrac{f(x) - f(p)}{x - p}, & x \ne p \\[1ex]
\limsup\limits_{x \to p} \dfrac{f(x) - f(p)}{x - p}, & x = p,
\end{cases}$$
for all $x \in \mathbb{R}$, for any $f \in \mathbb{R}^{\mathbb{R}}$. (This operator is linear on subspaces of $\mathbb{R}^{\mathbb{R}}$ in which all functions are
differentiable at p.) Then if f is differentiable,
$$Q_p(f)(x) = \int_0^1 f'(p + t(x - p)) \, dt.$$
To show that $Q_p(F) \not\subseteq F$ if F is the space of differentiable functions, consider $\phi_\alpha \in F$ defined by
$$\phi_\alpha(x) = \begin{cases} x^\alpha \sin x^{-1}, & x \ne 0 \\ 0, & x = 0. \end{cases} \eqno(44.4.1)$$
For $\alpha > 1$,
$$Q_0(\phi_\alpha)(x) = \begin{cases} x^{\alpha-1} \sin x^{-1}, & x \ne 0 \\ 0, & x = 0. \end{cases}$$
$Q_0(\phi_\alpha)$ is differentiable if and only if $\alpha > 2$, whereas $\phi_\alpha$ is differentiable if and only if $\alpha > 1$. In fact, for
$\alpha > 1$,
$$\phi'_\alpha(x) = \begin{cases} \alpha x^{\alpha-1} \sin x^{-1} - x^{\alpha-2} \cos x^{-1}, & x \ne 0 \\ 0, & x = 0. \end{cases}$$
So in particular, if $\alpha = 2$ then the above form of proof that the tangent space dimension is 1 does not work.
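The case α = 2 can be illustrated numerically. The following sketch uses floating point, so it is an illustration rather than a proof, and the function names are hypothetical. It evaluates the difference quotients at 0 of φ₂ and of Q₀(φ₂) along the sample points $x_n = 1/((n + 1/2)\pi)$, at which $\sin x_n^{-1} = (-1)^n$.

```python
import math

def phi(alpha, x):
    """phi_alpha(x) = x^alpha * sin(1/x) for x != 0, and 0 at x = 0 (x >= 0 here)."""
    return 0.0 if x == 0 else x ** alpha * math.sin(1.0 / x)

def q0_phi(alpha, x):
    """Differential quotient Q_0(phi_alpha)(x); the value 0 at x = 0 assumes alpha > 1."""
    return 0.0 if x == 0 else phi(alpha, x) / x

def difference_quotient_at_zero(func, x):
    """(func(x) - func(0)) / x, whose limit as x -> 0 is the derivative at 0 if it exists."""
    return (func(x) - func(0.0)) / x

# Sample points tending to 0 at which sin(1/x) alternates between +1 and -1.
samples = [1.0 / ((n + 0.5) * math.pi) for n in range(10)]

dq_phi = [difference_quotient_at_zero(lambda x: phi(2, x), x) for x in samples]
dq_q0 = [difference_quotient_at_zero(lambda x: q0_phi(2, x), x) for x in samples]

for x, a, b in zip(samples, dq_phi, dq_q0):
    print(f"x = {x:.5f}: quotient of phi_2 = {a:+.5f}, quotient of Q_0(phi_2) = {b:+.5f}")
```

The quotients of φ₂ shrink in proportion to x, consistent with φ₂′(0) = 0, while those of Q₀(φ₂) keep alternating near +1 and −1, so Q₀(φ₂) has no derivative at 0.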
Essentially, the above form of proof of the dimension of the tangent space proceeds by supposing that the
space F is closed under the quotient operator $Q_p$ and then applying the Leibniz rule to the product of
the function $x \mapsto x - p$ with the quotient function. This then leads to the conclusion that the value of
any tangent vector L applied to the function depends only on the value of the quotient function at p, that
is $f'(p)$.
The requirement that F be closed under $Q_p$ is equivalent to the condition that, under the ring structure
of F, the function $x \mapsto x - p$ be a divisor of all functions in F which vanish at p. Clearly this is not so for
the differentiable functions on $\mathbb{R}$.
Two questions are immediately raised by this. Firstly, which spaces F are closed under the quotient operator,
so that the above method of proof can be made to work? And secondly, if the above method of proof cannot
be made to work, is the dimension of the tangent space no longer equal to n, or are there alternative methods
of proof? In Section 44.5, it is shown that the quotient operator method works for the $C^\infty$ functions, but
not for the $C^k$ functions for integer k.
44.5. Spaces of smooth functions

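In the lead-up to the $C^\infty$ functions, it is useful to deal with the $C^k$ functions first.
44.5.1 Theorem: $\forall k \ge 0,\ \forall p \in \mathbb{R},\ Q_p(C^k(\mathbb{R})) \not\subseteq C^k(\mathbb{R})$.

Proof: To show that $Q_p(C^k(\mathbb{R})) \not\subseteq C^k(\mathbb{R})$, consider the function $\phi_\alpha$ defined in (44.4.1). $\phi_\alpha \in C^1(\mathbb{R})$ if and
only if $\alpha > 2$, and $Q_0(\phi_\alpha) = \phi_{\alpha-1} \in C^1(\mathbb{R})$ if and only if $\alpha > 3$. So for $\alpha = 3$, $\phi_\alpha$ provides a counterexample
to the closure of $C^1(\mathbb{R})$ under $Q_0$.
To find a counterexample for $k > 1$, it suffices to consider the $(k - 1)$-fold integral $I_0^{k-1}(\phi_\alpha)$ of $\phi_\alpha$, where $I_0$
denotes the integral operator
$$I_0(f)(x) = \int_0^x f(t) \, dt.$$
Then clearly $I_0^{k-1}(\phi_\alpha) \in C^k(\mathbb{R})$ if and only if $\alpha > 2$ and
$$\begin{aligned}
\Bigl( {d \over dx} \Bigr)^k Q_0(I_0^{k-1}(\phi_\alpha)) &= \Bigl( {d \over dx} \Bigr)^k \bigl( x^{-1} I_0^{k-1}(\phi_\alpha) \bigr) \\
&= \sum_{i=0}^k \binom{k}{i} (-1)^i i! \, x^{-1-i} I_0^{i-1}(\phi_\alpha).
\end{aligned}$$
Since $I_0^{i-1}(\phi_\alpha)$ is majorized by $|x|^{\alpha+i-1}/(i - 1)!$ for $i \ge 1$, $(Q_0(I_0^{k-1}(\phi_\alpha)))^{(k)}$ is continuous at $x = 0$ when
$\alpha > 2$ if and only if the term $x^{-1} I_0^{-1}(\phi_\alpha)$ is continuous at 0. But $I_0^{-1}(\phi_\alpha) = \phi'_\alpha$. Inspection of the terms
of $\phi'_\alpha$ shows that $x^{-1} I_0^{-1}(\phi_\alpha)$ is thus discontinuous for $\alpha \le 3$. So if $\alpha = 3$, $Q_0(I_0^{k-1}(\phi_\alpha)) \notin C^k(\mathbb{R})$, but
$I_0^{k-1}(\phi_\alpha) \in C^k(\mathbb{R})$. Hence for all $k \ge 1$, $Q_p(C^k(\mathbb{R})) \not\subseteq C^k(\mathbb{R})$. (Note that the case $k = 0$ is a pushover.)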
44.5.2 Theorem: $\forall k \ge 1,\ \forall p \in \mathbb{R},\ Q_p(C^k(\mathbb{R})) \subseteq C^{k-1}(\mathbb{R})$.
Proof: Now to show that $Q_0(C^k(\mathbb{R})) \subseteq C^{k-1}(\mathbb{R})$ for all $k \ge 1$, it is helpful to recall that a function
$h : \mathbb{R} \to \mathbb{R}$ with $h(0) = 0$ is continuous at 0 if and only if
$$\forall \varepsilon > 0,\ \exists \delta > 0,\ |x| < \delta \Rightarrow |h(x)| < \varepsilon.$$
Suppose $f \in C^k(\mathbb{R})$, and define $g \in C^k(\mathbb{R})$ by
$$g(x) = f(x) - \sum_{i=0}^k {x^i \over i!} f^{(i)}(0).$$
Then $g = I_0^k(g^{(k)})$. But $h = g^{(k)}$ is continuous at 0 and $h(0) = 0$. So for any integer i with $0 \le i \le k$,
$$\forall \varepsilon > 0,\ \exists \delta > 0,\ |x| < \delta \Rightarrow |g^{(k-i)}(x)| < \varepsilon |x|^i.$$
So the $(k - 1)$-th derivative of $Q_0(g)$ satisfies
$$\begin{aligned}
\Bigl| \Bigl( {d \over dx} \Bigr)^{k-1} Q_0(g) \Bigr| &= \Bigl| \Bigl( {d \over dx} \Bigr)^{k-1} (x^{-1} g(x)) \Bigr| \\
&= \Bigl| \sum_{i=0}^{k-1} \binom{k-1}{i} (-1)^i i! \, x^{-1-i} g^{(k-1-i)}(x) \Bigr| \\
&\le \sum_{i=0}^{k-1} \binom{k-1}{i} i! \, |x|^{-1-i} |g^{(k-1-i)}(x)| \\
&\le \sum_{i=0}^{k-1} \binom{k-1}{i} i! \, |x|^{-1-i} \varepsilon_i |x|^{i+1} \\
&= \sum_{i=0}^{k-1} \binom{k-1}{i} i! \, \varepsilon_i,
\end{aligned}$$
for any set of positive numbers $\varepsilon_i$, for small enough $|x|$. Hence $Q_0(g)^{(k-1)}(x) \to 0$ as $x \to 0$. It remains to
show that $Q_0(g)^{(k-1)}(0) = 0$. Since $g(0) = g'(0) = 0$,
$$Q_0(g)(0) = \limsup_{x \to 0} {g(x) \over x} = 0.$$
Since $g''(0) = 0$ also,
$$Q_0(g)'(0) = \lim_{x \to 0} {Q_0(g)(x) \over x} = \lim_{x \to 0} x^{-2} g(x) = 0.$$
By induction, since $g^{(i)}(0) = 0$ for $0 \le i \le k$,
$$\begin{aligned}
Q_0(g)^{(k-1)}(0) &= \lim_{x \to 0} {Q_0(g)^{(k-2)}(x) \over x} \\
&= \lim_{x \to 0} \sum_{i=0}^{k-2} \binom{k-2}{i} (-1)^i i! \, x^{-2-i} g^{(k-2-i)}(x) \\
&= 0.
\end{aligned}$$
So $Q_0(g) \in C^{k-1}(\mathbb{R})$. But $Q_0(f) = Q_0(g) + Q_0(f - g)$, and $f - g$ is a polynomial. So $Q_0(f) \in C^{k-1}(\mathbb{R})$,
since the differential quotient of any polynomial is a polynomial.
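44.5.3 Remark: An immediate consequence of the above is the closure of $C^\infty(\mathbb{R})$ under the quotient
operator. Hence $\dim\bigl( T_p^{C^\infty(\mathbb{R})}(\mathbb{R}) \bigr) = 1$.
44.5.4 Theorem: $\forall p \in \mathbb{R},\ \dim\bigl( T_p^{C^\infty(\mathbb{R})}(\mathbb{R}) \bigr) = 1$.
44.5.5 Theorem: $\forall n \ge 1,\ \forall p \in \mathbb{R}^n,\ \dim\bigl( T_p^{C^\infty(\mathbb{R}^n)}(\mathbb{R}^n) \bigr) = n$.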

44.6. The space of analytic functions

[ Then deal with all analytic functions, which are basically the pointwise limits of functions in F [x], if the
pointwise limit exists. This raises the question of whether the domain should be restricted from IRn to some
ball around p. Anyway, the tangent space should turn out to have the right dimension for the analytic
functions, because the analytic functions are closed under the quotient operator. That is, if f is analytic,
then Qp (f ) is analytic for any p. ]
44.7. The Hölder spaces

The spaces $C^{k,\alpha}(\mathbb{R}^n)$ and $W^{k,p}(\mathbb{R}^n)$ probably should be considered.
As an example, let $F = C^{1/2}(\mathbb{R})$, and try to define the linear map $L \in T_0^F(\mathbb{R})$ so that $L(|x|^{1/2}) = 1$ and
also $L(\mathrm{sign}(x)|x|^{1/2}) = 1$. Then if this can be done, it should result in
$$L(x) = 2|x|^{1/2} \Big|_{x=0} L(|x|^{1/2}),$$
and therefore $L(ax) = 0$ for all constants $a \in \mathbb{R}$. Thus the generators of this space are effectively the
functions which are only just in $C^{1/2}(\mathbb{R})$. This would make an interesting example maybe.
44.8. Further topics on derivations

Also to be considered is the issue of whether demanding that the tangent maps L be continuous on some
topology on F makes things better and/or simpler.
Should try to show via some sort of Hahn-Banach theorem that $\dim\bigl( T_p^F(\mathbb{R}^n) \bigr) > n$ for $F = C^n(\mathbb{R}^n)$, etc.
In the case of $C^k$ manifolds, this raises the question of just how tangent spaces should be defined in spaces
larger than $C^\infty(M)$. Maybe the derivative will have to be defined in terms of coordinates!
44.9. Germs
[ Warner [50], pages 11–22, has a good treatment of derivations and germs. See also Bump [97], section 6. ]
[ This section should be written up as formal definitions and theorems. ]
This is an e-mail I once received on this subject.
“The idea I was telling you about is really due to Grothendieck [69]. It is hinted at in Spencer’s
article [142]. It is more in evidence in algebraic geometry than in differential geometry, e.g. Fulton [108],
exercise 13-3 or, indeed in analytic geometry, e.g. Gunning [70], corollary 1 to theorem 15.
“Anyway, if $p \in M$ a smooth manifold, let W be the ideal of germs vanishing at p inside O, the ring
of germs of smooth functions at p. Then $O/W = \mathbb{R}$ (induced by evaluation of a function at p). So
$W/W^2$ is an $\mathbb{R}$-module, i.e. a vector space. I claim that it is a vector space of dimension equal to
the dimension of M. You can define this as the cotangent space at p. For example, if f is a germ
at p, then $f - f(p)$ is in W, so defines an element of $W/W^2$, and this is df(p). Similarly, if X is a
derivation of O, e.g. a germ of a smooth vector field at p, then by the Leibnitz rule, $X\big|_{W^2}$ vanishes.
So X defines a linear functional on $W/W^2$, i.e. a tangent vector at p.
“This is all ‘explained’ by ‘synthetic differential geometers’. I remember that Reyes & Koch were
involved in the basics (but beware that they use intuitionistic logic). The MSC classification number
is 51 K 10. So you’ll be able to find lots of this stuff in the library.
“P.S. Anders Kock [73] seems to have written a book on it.”
Let M be an n-dimensional $C^\infty$ manifold. Let $p \in M$. Let
$$W = \{f \in C^\infty(M);\ f(p) = 0\},$$
which is an ideal of the ring $O = C^\infty(M)$. Then $O/W = \mathbb{R}$. [ See Section 9.8 for rings and ideals. ]
O is a ring because it is closed under pointwise addition and multiplication. W is an ideal in O because for
all $f \in O$ and $g \in W$, $f.g \in W$. O/W is the set $\{f + W;\ f \in O\}$, which is a 1-dimensional vector space
under multiplication. For all $f_1, f_2 \in O$, the product of $f_1 + W$ with $f_2 + W$ is equal to $f_1.f_2 + W$. Clearly
$f_1 + W = f_2 + W$ if and only if $f_1(p) = f_2(p)$.
Now look at $W/W^2$. The set $W^2$ is the set of self-products of elements of W: $W^2 = \{f.f;\ f \in W\}$. Clearly
all such functions must be non-negative. Then $W/W^2 = \{f + W^2;\ f \in W\}$. Then two (representatives of)
elements of $W/W^2$ are equal if and only if they differ by a square of a function which vanishes at p. That is,
$$\forall f_1, f_2 \in W,\ (f_1 + W^2 = f_2 + W^2 \Leftrightarrow \exists g \in W,\ f_1 = f_2 + g.g).$$
Is $W^2$ really an ideal of W? Check that equivalence classes of the form $f + W^2$ for $f \in W$ have well-defined
sums and products. Let $f_1, f_2 \in W$. Then $(f_1 + W^2) + (f_2 + W^2) = (f_1 + f_2) + W^2$ if the sum of any pair
of elements of $W^2$ can be written as an element of $W^2$ and vice versa. And so forth, and so forth. . .
See Gallot/Hulin/Lafontaine [20], page 19, for a discussion of germs. They define the space of germs as
W = {f : M → IR; ∃Ω ∈ Top(M ), p ∈ Ω and f (p) = 0}.
Gallot et alia then go on to look at the set of derivations (Definition 1.45) on W. Then this set of derivations
is an n-dimensional vector space, which can be defined to be the tangent space of M at p. This is apparently
a different definition to the one indicated above.
44.10. Jets
[ See EDM 108.X. This is a set of definitions of tensors which is useful for defining a C ∞ manifold structure
on tensor bundles of arbitrary order. ]
[ Do jets have anything to do with sheaves? ]
[ Are jets really just equivalence classes of curves? ]

Chapter 45
History
45.1 Chronology of mathematicians
45.2 Origins of words and notations
45.3 Etymology of affine spaces
45.4 Logical language in ancient literature
45.1. Chronology of mathematicians

This section outlines the chronology of mathematicians who contributed directly or indirectly to the
development of the topics in this book. The tables are based on Bynum et alia [192], EDM2 [35], Bell [190],
KEM [122] and other sources. Names are sorted by date of death.
45.1.1 Remark: Ancient history

dates name contribution
c639–c546bc Thales of Miletus first proof of geometric theorem; learned mathematics in Egypt;
KEM [122] gives dates c625–c545bc
572–492bc Pythagoras of Samos taught geometry; maybe first to prove Pythagoras’ theorem; fl.
510bc; another source gives dates c.569–c.475bc; KEM [122]
gives c580–c496bc
c490–c425bc Zeno of Elea paradoxes regarding infinitesimals
480–411bc Antiphon the Sophist atomistic calculation of the area of circles; proposed a method
of exhaustion; fl.430?
c470–c410bc Hippocrates of Chios wrote “Elements of Geometry” (lost)
c460–c370bc Democritus computation of the volume of pyramids by dividing them into
‘atomistic’ laminas; fl.430?
c400–347bc Eudoxus of Cnidus attributed as having developed ‘method of exhaustion’.
(EDM [34] and KEM [122] give dates c408–c355bc.)
c427–c347bc Plato
384–322bc Aristotle first sum of infinite series $\sum_{n=0}^\infty 4^{-n}$; provided basis for Euclid’s
Elements? Basic formal logic.
c325–c265bc Euclid of Alexandria Elements; organized ruler/compass geometry axiomatically. (fl.
c280bc; EDM [34] gives c300bc; KEM [122] gives c365–c300bc)
c287–212bc Archimedes of Syracuse rigorous treatment of areas and volumes bounded by curved
lines and surfaces using the ‘method of exhaustion’. (EDM [34]
gives dates c.282–212bc.)
c276–c194bc Eratosthenes of Cyrene measured distance of 1◦ on Earth
c262–c190bc Apollonius of Perga Konikon Biblia; conic sections.
fl.146–126bc Hipparchus Founder of trigonometry.


The Mesopotamians and Egyptians made much progress in geometry before classical Greek mathematics,
but no personal names are associated with this very early geometry. It is not known, for example, who
discovered Pythagoras’ theorem between about 2500bc and 1850bc. According to Bell [190], page 70, the
first deductive proof of a geometric theorem is traditionally ascribed to Thales about 600bc. Maybe Thales
needed to use deduction to fill in the gaps in his knowledge which he learned on a visit to Egypt. He probably
forgot a few things while sailing back to Anatolia, the Egyptian priests didn’t like to tell everything, and
Egyptian and Mesopotamian mathematics texts never gave proofs of rules and theorems.
Euclid was more important as a collector and organizer of geometrical knowledge than as an inventor or
discoverer. It is the logical organization of Euclid’s “Elements” which has had a profound effect on mathematics
and physics, not so much the particular set of theorems. Bell [190], page 71, says:
With the completion of Euclid’s Elements, Greek elementary geometry, exclusive of the conics,
attained its rigid perfection. It was wholly synthetic and metric. Its lasting contribution—and
Euclid’s—to mathematics was not so much the rich store of 465 propositions which it offered as the
epoch-making methodology of it all.
For the first time in history masses of isolated discoveries were unified and correlated by a single
guiding principle, that of rigorous deduction from explicitly stated assumptions. Some of the
Pythagoreans and Eudoxus before Euclid had executed important details of the grand design, but
it remained for Euclid to see it all and see it whole. He is therefore the great perfector, if not the
sole creator, of what is today called the postulational method, the central nervous system of living
mathematics.
Unification and organization of mathematics is still an important task in the 21st century. Bell [191],
page 299, says the following.
Geometrical teaching was dominated by Euclid for over 2200 years. His part in the Elements
appears to have been principally that of a coordinator and logical arranger of the scattered results
of his predecessors and contemporaries, and his aim was to give a connected, reasoned account of
elementary geometry such that every statement in the whole long book could be referred back to
the postulates. Euclid did not attain this ideal or anything even distantly approaching it, although
it was assumed for centuries that he had.
45.1.2 Remark: Dark Ages
99bc–1bc
1–99
c85–c168 Ptolemy of Alexandria wrote Almagest on astronomy and geometry.
[Claudius Ptolemaeus]
200–299
fl.300–350 Pappus of Alexandria Mathematical Collection referred to lost works on geometry.
400–499
500–599
600–699
700–799
800–899
900–999
1000–1099
1100–1199
1200–1299
1300–1399
Progress in mathematics was woefully slow under the Roman Empire and Catholic church until the
Renaissance and Reformation. There was some progress in algebra in these Dark Ages, but geometry and analysis
seem to have gone backwards. Bell [190], page 85, describes this as follows.
It is customary in mathematical history to date the beginning of the sterile period from the onset
of the Dark Ages in Christian Europe. But mathematical decadence had begun much earlier, in
one of the greatest material civilizations the world has known, in the Roman Empire at the height
of its splendor. Mathematically, the Roman mind was crass.
45.1.3 Remark: Renaissance

1404–1472 Leone Battista Alberti theory of perspective; vanishing line. (b. Feb 18, d. Apr 3.)
1512–1558 Robert Recorde 1557: invented the “ = ” sign for equality.
1526–1572 Rafael Bombelli 1572: negative number arithmetic.
1540–1603 François Viète symbolic algebra; Newton’s method. (d. Dec 13.)
1571–1630 Johannes Kepler principle of continuity; points at infinity.
1564–1642 Galileo Galilei 1591/1612: dynamics. (b. Feb 15, d. Jan 8.)
1596–1650 René Descartes 1637: la Géométrie; analytic geometry. (b. Mar 31, d. Feb 11.)
1591–1661 Girard Desargues 1636–39: invented projective geometry; points at infinity.
(b. Feb 21, d. Sep.)
1623–1662 Blaise Pascal 1636–39: synthetic projective geometry. (b. Jun 19, d. Aug 19.)
1601–1665 Pierre de Fermat 1629: analytic geometry; 1657/61: tangents to curves as limits
of secants; var. principle in optics. (b. Aug 17, d. Jan 12.)
1630–1677 Isaac Barrow taught calculus to Newton. (b. Oct, d. May 4.)
1629–1695 Christiaan Huygens (b. Apr 14, d. Jul 8.)
The 17th century is notable for the rapid development of analysis, which is distinguished from other
mathematics by the use of limits and infinite processes. The big contribution of Newton was in the application
of analysis to physics, for which he required some clarification and much development of the methods of
analysis.
Practical analysis was developed initially by Archimedes. Limits and derivatives were written about by
some authors in the century or two before Newton. But these notions were elevated from curiosities to
fundamental physical modelling tools by Newton. It was then perhaps inevitable that analysis would be
applied to geometry to produce differential geometry.
Bell [191], page 96, has the following comment on how Newton learned calculus from Isaac Barrow.
Barrow’s geometrical lectures dealt among other things with his own methods for finding areas
and drawing tangents to curves—essentially the key problems of the integral and the differential
calculus respectively, and there can be no doubt that these lectures inspired Newton to his own
attack.
Bell [190], pages 120–121, has the following comment on the earlier development of Newton’s method for
solution of polynomial equations by François Viète.
Improving on the devices of his European predecessors, Vieta gave a uniform method for the
numerical solution of algebraic equations. Its nature is sufficiently recalled here by noting that it
was essentially the same as Newton’s (1669) given in textbooks.
45.1.4 Remark: Enlightenment

1616–1703 John Wallis generalization of superscript exponential notation
1646–1716 Gottfried Wilhelm Leibniz 1673/75: diff/int. calculus; fundamental theorem of calculus;
calculus notation. (b. Jul 1, d. Nov 4.)
1642–1727 Isaac Newton 1666/84: diff/int. calculus; 1687: Principia; fund. theorem of
calculus; celestial mechanics. (b. Dec 25, d. Mar 20.)
1685–1731 Brook Taylor
1698–1746 Colin Maclaurin
1707–1783 Leonhard Euler coined the term “affine”, 1748. (b. Apr 15, d. Sep 18.)
The 18th century was known as the “Age of Reason” because during this time the objections of European
religious authorities to scientific progress were overcome and finally made irrelevant. Once again, as during
the golden age of classical Greece, critical thinking and insightful discovery replaced ignorant authority. An
important step in this was the publication of Lagrange’s work on mechanics in 1788. Bell [190], page 362,
wrote the following on this subject.
The eighteenth century has been called the Age of Reason, also an age of enlightenment, partly
because the physical science of that century attained its freedom from theology. In the hundred
years from the death of Newton in 1727 to that of Laplace in 1827, dogmatic authority suffered
the most devastating of all defeats at the hands of scientific inquiry: indifference. It simply ceased
to matter, so far as science was concerned, whether the assertions of the dogmatists were true or
whether they were false.
45.1.5 Remark: Nineteenth century

1724–1804 Immanuel Kant “Proved” that Euclidean geometry was “known a priori ”.
(b. Apr 22, d. Feb 12.)
1736–1813 Joseph-Louis Lagrange calculus of variations; Lagrangian mechanics. (b. Jan 25,
d. Apr 10.)
1746–1818 Gaspard Monge introduced differential geometry; created descriptive geometry,
representing solids by means of projections on a plane and
forming the basis of engineering drawing. (b. May 9, d. Jul 28.)
1753–1823 Lazare Nicholas Marguerite 1803: Géométrie de position; 1806: Essai sur les transversales;
Carnot projective geometry.
1749–1827 Pierre-Simon Laplace analysis, celestial mechanics, potential theory. (b. Mar 23,
d. Mar 5.)
1802–1829 Niels Henrik Abel (b. Aug 5, d. Apr 6.)
1768–1830 Jean Baptiste Joseph (b. Mar 21, d. May 16.)
Fourier
1811–1832 Evariste Galois finite groups. (b. Oct 25, d. May 31.)
1752–1833 Adrien Marie Legendre
1781–1840 Siméon Denis Poisson
1781–1848 Bernard Placidus Johann Bolzano-Weierstraß theorem. (b. Oct 5, d. Dec 18.)
Nepomuk Bolzano
1804–1851 Carl Gustav Jacob Jacobi Hamilton-Jacobi equation; Jacobian determinant. (b. Dec 10,
d. Feb 18.)
1777–1855 Johann Carl Friedrich perfected differential geometry; Gaußian curvature enabled
Gauß classic non-Euclidean geometries to be described without
embedding in Euclidean space. (b. Apr 30, d. Feb 23.)
1789–1857 Augustin Louis Cauchy partial differential equations; Cauchy sequences. (b. Aug 21,
d. May 23)
1805–1859 Johann Peter Gustav boundary value problems. (b. Feb 13, d. May 3)
Lejeune Dirichlet
1796–1863 Jakob Steiner contributor to projective geometry. (b. Mar 18, d. Apr 1.)
1805–1865 William Rowan Hamilton Hamiltonian mechanics. (b. Aug 4, d. Sep 2.)
1826–1866 Georg Friedrich Bernhard 1854: Über die Hypothesen welche der Geometrie zu Grunde
Riemann liegen; generalized Gaußian curvature to higher dimensions.
(b. Sep 17, d. Jul 20.)
1798–1867 Karl Georg Christian von elimination of metrical considerations from projective geometry
Staudt
1788–1867 Jean-Victor Poncelet 1822: Traité des propriétés projectives des figures; following
Desargues, effectively created modern projective geometry;
introduced imaginary points. (b. Jul 1, d. Dec 22.)
1790–1868 August Ferdinand Möbius 1827: Der barycentrische Calcul, includes many of his results
on projective and affine geometry; introduced barycentric
coordinates. (b. Nov 17, d. Sep 26.)
1811–1874 Ludwig Otto Hesse

1809–1877 Hermann Günter development of a general calculus for vectors (b. Apr 15,
Grassmann d. Sep 26.)
1845–1879 William Kingdon Clifford (b. May 4, d. Mar 3.)
1793–1880 Michel Chasles 1852: Traité de Géométrie discusses cross ratio. (b. Nov 15,
d. Dec 18.)
1808–1882 Johann Benedict Listing 1847: Vorstudien zur Topologie; first printed use of word
‘topology’. (b. Jul 25, d. Dec 24.)
1823–1891 Leopold Kronecker arithmetization of mathematics. (b. Dec 7, d. Dec 29.)
1821–1895 Arthur Cayley reduction of metrical geometry to projective geometry; ‘invented’
matrices. (b. Aug 16, d. Jan 26)
1815–1897 Karl Theodor Wilhelm (b. Oct 31, d. Feb 19.)
Weierstraß
1842–1899 Marius Sophus Lie continuous transformation groups. (b. Dec 17, d. Feb 18.)
The dearth of British names among mathematicians who died between 1750 and 1900 is quite striking. This is
sometimes attributed to the isolation of British mathematicians after the silly arguments about who invented
the calculus. Newton ostensibly won the argument, but it was a self-defeating victory. British mathematics
went into decline. Europeans dominated the development of mathematics thereafter. Bell [191], page 144,
makes the following comment on this subject.
The upshot of it all was that the obstinate British practically rotted mathematically for all of a
century after Newton’s death, while the more progressive Swiss and French, following the lead of
Leibniz, and developing his incomparably better way of merely writing the calculus, perfected the
subject and made it the simple, easily applied implement of research that Newton’s immediate
successors should have had the honor of making it.
(See also some related comments in Remark 18.2.11.)
45.1.6 Remark: Twentieth century

1835–1900  Eugenio Beltrami  (b. Nov 16, d. Feb 18.)
1829–1900  Elwin Bruno Christoffel  covariant differentiation. (b. Nov 10, d. Mar 15.)
1822–1901  Charles Hermite
1819–1903  George Gabriel Stokes  the Stokes theorem (?)
1832–1903  Rudolf Otto Sigismund Lipschitz  (b. May 14, d. Oct 7.)
1864–1909  Hermann Minkowski  1908: introduced Minkowski space-time formulation of special relativity (b. Jun 22, d. Jan 12.)
1854–1912  Jules Henri Poincaré  ‘opened up the road to algebraic topology’; Poincaré conjecture. (b. Apr 29, d. Jul 17.)
1831–1916  (Julius Wilhelm) Richard Dedekind  set theory; real numbers. (b. Oct 6, d. Feb 12.)
1838–1916  Ernst Mach  Mach’s principle [term coined 1918 by Einstein]. (b. Feb 18, d. Feb 19.)
1873–1916  Karl Schwarzschild  singularities in general relativity. (b. Oct 9, d. May 11.)
1845–1918  Georg Ferdinand Ludwig Philipp Cantor  Set theory 1870, 1883. (b. Mar 3, d. Jan 6.)
1849–1925  Felix Klein  Erlanger Programm; unification of Euclidean, projective and other non-Euclidean geometries. (b. May 25, d. Jun 22.)
1848–1925  Friedrich Ludwig Gottlob Frege  set theory foundations; victim of Russell’s paradox (b. Nov 8, d. Jul 26.)
1853–1925  Gregorio Ricci-Curbastro  developed tensor calculus; absolute differential calculus (b. Jan 12, d. Aug 6.)
1843–1930  Moritz Pasch  1882: statement of geometry as a hypothetico-deductive system
1861–1931  Cesare Burali-Forti  Burali-Forti paradox. (b. Aug 13, d. Jan 21)
1875–1932  Giuseppe Vitali  proved “existence” of Lebesgue non-measurable sets. (b. Aug 26, d. Feb 29)
1858–1932  Giuseppe Peano  1891–95: reduction of considerable part of mathematics to symbolism (b. Aug 27, d. Apr 20.)
1879–1932  John Wesley Young  gave a strict axiomatic basis for projective geometry
1878–1936  Marcel Grossmann  discovered relevance of tensor calculus to relativity? (b. Apr 9, d. Sep 7)
1875–1941  Henri Léon Lebesgue  measure and integration (b. Jun 28, d. Jul 26.)
1873–1941  Tullio Levi-Civita  1917: infinitesimal parallel transport; developed tensor calculus (b. Mar 29, d. Dec 29.)
1862–1943  David Hilbert  1899: Grundlagen der Geometrie. (b. Jan 23, d. Feb 14.)
1882–1944  Arthur Stanley Eddington  1916: Riemannian and affine connections. (b. Dec 28, d. Nov 22.)
1892–1945  Stefan Banach
1861–1947  Alfred North Whitehead  1910–13: Principia Mathematica.
1873–1950  Constantin Carathéodory  geometric measure theory
1869–1951  Élie Cartan  1901: exterior derivative; 1923–25: defined general connections. (b. Apr 9, d. May 6.)
1871–1953  Ernst Friedrich Ferdinand Zermelo  1908: set theory axioms
1879–1955  Albert Einstein  1916: general relativity. (b. Mar 14, d. Apr 18.)
1885–1955  Hermann Klaus Hugo Weyl  1916: Riemannian and affine connections. (b. Nov 9, d. Dec 8.)
1871–1956  Félix Edouard Justin Emile Borel  (b. Jan 7, d. Feb 3.)
1903–1957  John (János) von Neumann  1922: early version of Bernays-Gödel set theory. Ordinal numbers using successor sets.
1880–1960  Oswald Veblen  gave a strict axiomatic basis for projective geometry
1891–1965  Abraham Adolf Fraenkel  1922: completed Zermelo’s set theory axioms; KEM [122] gives name order “Adolf Abraham”
1881–1966  Luitzen Egbertus Jan Brouwer  1908: “the unreliability of the principles of logic”
1872–1970  Bertrand Arthur William Russell  1910–13: Principia Mathematica; Russell’s paradox.
1878–1973  René Maurice Fréchet  1906: topological compactness
1921–1977  Alfred Schild  1970: Schild’s ladder. (d. May 24.)
1888–1977  Paul Isaak Bernays  Set theory. (b. Oct 17, d. Sep 18.)
1906–1978  Kurt Gödel  set theory (b. Apr 28, d. Jan 14.)
1909–1978  Eduard Ludwig Stiefel  1936: introduced fibre bundles as a distinct concept
1896–1980  Kazimierz (Casimir) Kuratowski  represented ordered pair (a, b) as {{a}, {a, b}}
1918–1988  Richard Phillips Feynman  (b. May 11, d. Feb 15.)
1906–1993  Андрей Николаевич Тихонов  Andrei Nikolaevich Tikhonov. Compactness theorem. (b. Oct 30, d. Oct 7.)
1915–2001  Frederick (Fred) Hoyle  Astrophysics. Cosmology. (b. Jun 24, d. Aug 20.)
1915–2002  Laurent Schwartz  theory of distributions. (b. Mar 5, d. Jul 4.)
[ Maybe should have a chronology of events here too. ]
45.2. Origins of words and notations
45.2.1 Remark: The word mathematics comes from the Greek word μάθημα meaning “the act of learning;
knowledge, learning, science, art, doctrine”, from μανθάνω meaning “to learn, have learnt, know; to ask,
inquire, hear, perceive; to understand”. (See for example Feyerabend [202] for Greek translations.) At
Plato’s Academy until 529 AD, the “mathemata” meant the quadrivium of subjects: music, astronomy,
geometry and arithmetic. (See Remark 2.9.8 and Figure 2.9.1.)

45.2.2 Remark: The word geometry (from Greek γεωμετρία meaning “land-surveying, geometry”, literally
“land measurement” from γῆ meaning “earth, land; soil, ground, field; empire, home”) was used by Herodotus
(c. 484–c. 425 BC) for the Egyptian methods of redetermining land boundaries after the annual flooding of the
Nile.
[ Try to use Arabic font for Arabic names. ]
45.2.3 Remark: The word algorithm comes from “al Khwārizmi”, a nickname of the 9th century Arab
mathematician Abū Ja’far Mohammed ibn Mūsā from the town of Khwārizm (died c.850).
45.2.4 Remark: The word algebra comes from the Arabic word “al Jebr”, which means “bone-setting”.
This seems to come from the book “al jebr w’almuquabala” (“restoration and reduction”) written by al
Khwārizmi. (See Bell [190], page 99.) Presumably algebra is likened to “setting bones” because bone-setting
restores the patient to the normal condition, while algebra restores the unknown variables which have been
modified by arithmètic operations.
45.2.5 Remark: The addition and subtraction symbols “ + ” and “ − ” were invented in 1489 by J.W.
Widmann in Germany, according to Bell [190], page 97. Ball [188], page 206, attributes these symbols to
Johannes Widman, born about 1460. (Yet another source gives the name Johannes Widman, 1462–1498,
and the name of the 1489 publication as “Behende und hupsche Rechnung auf allen kauffmanschafft”.)
45.2.6 Remark: The equals sign “ = ” was invented by Robert Recorde, “Whetstone of witte”, 1557.
According to Bell [190], page 129, Recorde said that: “no two things could be ‘moare equalle’ than ‘a paire
of parelleles’.” According to Beckmann [189], page 12, Recorde wrote that “noe .2. thynges, can be moare
equalle.”
45.2.7 Remark: Rafael Bombelli’s book “Algebra” in 1572 introduced a consistent theory of imaginary
complex numbers, including arithmetic for negative numbers. (The name is also spelled Raffael and
Rafaello.) He also introduced an unusual notation for variables such as $x$ and $x^2$. (See Bell [190], page 174;
Lakoff/Núñez [173], page 73; Struik [194], page 85; Ball [188], page 228.)
45.2.8 Remark: Descartes introduced the notations $x$, $xx$, $x^3$, $x^4$, etc. for powers of $x$ (Bell [190],
page 144). Gauß wrote $xx$ instead of $x^2$. Euler still used the $xx$ notation in 1748. Gauß said that “neither
is more wasteful of space than the other”, according to Bell [190], page 125.
45.2.9 Remark: The notations $x^{-n}$ and $x^{1/n}$, instead of $1/x^n$ and $\sqrt[n]{x}$, were introduced by John Wallis
in 1655, according to Bell [190], page 129.
45.2.10 Remark: Newton developed the dot notation ẏ, ÿ, etc. for derivatives. (Bell [190], page 127.)
45.2.11 Remark: Leibniz invented the derivative notation $\frac{dy}{dx}$ and the integral sign $\int$ from the Latin
“summa”. (See Bell [190], page 153.) According to Bell [191], page 99, referring to the notation $\frac{dy}{dx}$: “This
symbolism is due (essentially) to Leibniz and is the one in common use today; Newton used another (ẏ)
which is less convenient.”
Similarly, the sum symbol $\sum$ is the Greek letter “S”, related to the word “sum”.
45.2.12 Remark: The word function in the mathematical sense was apparently introduced by Leibniz.
Bell [191], page 98, says: “The word function (or its Latin equivalent) seems to have been introduced into
mathematics by Leibniz in 1694; the concept now dominates much of mathematics and is indispensable in
science. Since Leibniz’ time the concept has been made precise.”
45.2.13 Remark: Beckmann [189], page 12, says that the notation π was not used for the ratio of circumference
to diameter of a circle until the 18th century. Beckmann [189], page 145, says that a Welsh
mathematician, William Jones (1675–1749), used the π notation in 1706 in Synopsis Palmariorum Matheseos.
He says also that Euler first used the π notation in 1737 in his Variae observationes circa series
infinitas.

45.2.14 Remark: The word affine was introduced by Leonhard Euler for affine spaces and affine maps in
1748 in his book Introductio in analysin infinitorum [104], volume 2, chapter 18, section 442, as explained
in Section 45.3.
45.2.15 Remark: The term group as an algebraic structure was invented by Evariste Galois (1811–32),
according to Bell [190], page 234.
45.2.16 Remark: The integer congruence notation a ≡ b (mod m) was apparently invented by Gauß.
(Bell [191], page 225.)
45.2.17 Remark: The word topology was first used in print by Johann Benedict Listing in “Vorstudien zur
Topologie” (1847), according to EDM2 [35], article 426. The earlier term for this word was “analysis situs”,
which is Latin for “analysis of place”. (The word “topology” is a Greek-based construction meaning “study
of place” from τόπος meaning “place, spot; passage in a book; region, district; space, locality; position,
rank, opportunity” and λόγος meaning “thought, reasoning, computation, reckoning, deliberation, account,
consideration, opinion” [and 44 other meanings in my dictionary].) It is not immediately obvious why the
word “topology” is used for the study of continuity. The connection can be seen by thinking about what a
set would be without a topology. Then every point is equivalent; for instance, there would be no concept of a
continuous curve. But with a topology, continuous curves are constrained to move smoothly from one point
to another. Thus if progressively shorter segments of a curve are considered, they are progressively more local
to their starting point. In other words, a topology gives a set a sense of place or locality. This becomes even
clearer when considering how neighbourhoods are used in defining interior, exterior and boundary points of
sets. In fact, the aptness of the word “topology” is most clear when a topology is defined in terms of open
bases at all points rather than open sets.
45.2.18 Remark: The assertion symbol ⊢ was perhaps (?) invented in 1879 by Frege.
45.2.19 Remark: The word homeomorphism was introduced by Henri Poincaré in 1895. (See EDM2 [35],
section 425.G.) This comes from Greek ὅμοιος meaning “like, similar, resembling; the same, of the same
rank; equal citizen; equal; common, mutual; a match for; agreeing, convenient”, and μορφή meaning “form,
shape, figure, appearance, fashion, image; beauty, grace”.
[ Is it possible that the QED symbol “ ” at the end of proofs in modern texts originated from a bold right
square bracket? Taylor [145] used a very solid bold right square bracket in 1966. Olaf told me in 2005
that Paul Halmos claimed in “I want to be a mathematician” that he (Halmos) invented this style of QED
symbol. ]
45.3. Etymology of affine spaces

This section deals with the historical origin of the word “affine” in expressions such as “affine spaces” and
“affine transformations”. It seems that the term was introduced by Euler in a slightly erroneous fashion
in 1748, and was later defined in the modern sense by Möbius in 1827.
The word “affine” does not appear in many English dictionaries. The word “affin/affine” meant “similar”
in French from the 12th to the 16th centuries and then disappeared, but reappeared in the mid-19th
century [214]. Some sample definitions from various dictionaries are summarized in the following table.
language  dictionary            word     definition
Latin     White [218]           affinis  bordering upon, adjacent to, allied, kindred; a connection or relation by marriage.
English   Oxford shorter [212]  affin/e  1509. A relation by marriage; a connection; closely related.
French    Petit Robert [214]    affin/e  which conserves invariant, by linear relations, transformations in the plane or in space
German    Wahrig [216]          affin/e  parallel-related (from Latin “affinis”: adjacent, adjoining)
German    Duden [200]           affin    produced by parallel projection of one plane onto a second
Italian   Sansoni [208]         affine   similar, allied, kindred, alike
Spanish   Cassell [199]         afin     contiguous, adjacent; allied, related, similar

The term “affine” in geometry seems to have been introduced by Leonhard Euler in 1748 in his book
Introductio in analysin infinitorum [104], volume 2, chapter 18, section 442. This is quoted by August
Ferdinand Möbius in 1827 in his book Der barycentrische Calcul [76], pages 194–195, as the source for his
adoption of the term. So apparently Euler was the first mathematician to use the word “affine” for general
linear transformations and Möbius was the second. But the truth is more complex than this.
Euler defined two figures to be affine if they could be oriented and translated so that one could be obtained
from the other by a scaling in $\mathbb{R}^2$ such as $(x, y) \mapsto (ax, by)$ (whereas a similarity transformation has the
form $(x, y) \mapsto (ax, ay)$, of course). Euler’s thinking on this seems to have been rather woolly. This relation
is not an equivalence relation since it is not transitive. (For instance a square can be deformed to a rectangle
through one axis, or to a diamond through an axis at 45° to this, but there is no single two-scale scaling
which can transform the diamond into the rectangle.) Therefore the set of such transformations does not
form a closed group. Euler’s use of this non-transitive relation is quite understandable since geometric
thinking in terms of transformation groups, equivalence relations and invariants did not really take off until
the 19th century.
Möbius noted that Euler claimed that for any two affine-related figures in the plane, there must always
be a pair of axes for which by scaling the figure differently in those two directions the two figures could
be matched. In other words, Euler effectively implied that an affine relation between two figures could be
expressed as a translation combined with a transformation matrix such as
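\[
\begin{pmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{pmatrix}
\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}
\begin{pmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{pmatrix},
\]
for some $a, b, \theta_1, \theta_2 \in \mathbb{R}$. The composition of such matrices yields matrices with non-zero off-diagonal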
entries. The non-closure of Euler’s transformations implies that the relation is not transitive, and therefore
his relation is not an equivalence relation as he probably had assumed it would be.
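The non-closure can be checked numerically. The following Python sketch is an illustration only. It assumes (this is an assumption made here, not a statement by Euler or Möbius) that each two-scale scaling acts along a single orthogonal pair of axes, so that its matrix has the symmetric form R D R^T for a rotation R and a diagonal matrix D. The helper names and the scale factors 2 and 1 are hypothetical. The sketch composes a scaling along the coordinate axes with a scaling along axes rotated by 45° and checks that the product has unequal off-diagonal entries. Since every matrix of the form R D R^T is symmetric, the product is not a scaling of the assumed form.

```python
import math

def rotation(theta):
    """Matrix of a rotation of the plane by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(p):
    return [[p[j][i] for j in range(2)] for i in range(2)]

def scaling_along_axes(a, b, theta):
    """Scale by a and b along orthogonal axes rotated by theta: R D R^T."""
    r = rotation(theta)
    return matmul(matmul(r, [[a, 0.0], [0.0, b]]), transpose(r))

# Assumed example: the same scale factors, applied along two different axis pairs.
rectangle_map = scaling_along_axes(2.0, 1.0, 0.0)          # square -> rectangle
diamond_map = scaling_along_axes(2.0, 1.0, math.pi / 4)    # square -> diamond
composite = matmul(diamond_map, rectangle_map)

# Each factor is symmetric, but the composite is not.
print([[round(v, 6) for v in row] for row in composite])
```

With these assumed values the composite is approximately the matrix with rows (3, 0.5) and (1, 1.5), whose off-diagonal entries are non-zero and unequal.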
Between Euler in 1748 and Möbius in 1827, no one seems to have taken much interest in affine spaces.
Although Euler apparently coined the word, Möbius was probably the real inventor of affine transformations
as a subject for study. The central concerns in geometry between the times of Desargues and Möbius
were clearly metric-invariant, conformal-invariant and projective geometries, and somehow there was no
motivation to consider affine-invariant geometry as a special topic. But Möbius found applications to the
engineering problem of determining the centre of mass of a structure. The centre of mass is preserved under
affine transformations but not under projections. In fact, the affine transformations make up precisely the
group under which convex combinations such as the centre of gravity are invariant.
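The behaviour of the centre of mass under the two kinds of map can be illustrated with a small exact computation. The following Python sketch assumes three equal point masses at the vertices of a triangle; the particular affine map and projective map are hypothetical choices made for this illustration and do not come from Möbius. It compares the centre of mass of the transformed points with the transform of the original centre of mass.

```python
from fractions import Fraction as F

def centroid(points):
    """Centre of mass of equal point masses at the given points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def affine_map(p):
    # Hypothetical affine map: linear part [[2, 1], [0, 3]], translation (5, -1).
    x, y = p
    return (2 * x + y + 5, 3 * y - 1)

def projective_map(p):
    # Hypothetical projective (non-affine) map: divide by x + 1.
    x, y = p
    return (x / (x + 1), y / (x + 1))

triangle = [(F(0), F(0)), (F(2), F(0)), (F(1), F(3))]
c = centroid(triangle)

affine_commutes = centroid([affine_map(p) for p in triangle]) == affine_map(c)
projective_commutes = centroid([projective_map(p) for p in triangle]) == projective_map(c)
print(affine_commutes, projective_commutes)
```

For this example the affine map carries the centre of mass to the centre of mass of the image points, whereas the projective map does not. Exact rational arithmetic is used so that the comparison is not affected by rounding.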
Within a few decades, the projective, affine, conformal and metric transformation groups were systematized
within the framework of the so-called Erlanger Programm (named after Erlangen University, where Felix
Klein proposed the program in 1872). The idea of the Erlanger Programm was to study a wide range of
geometries, each specified as the set of properties and relations of special subsets (the “figures”) of a given
set X which are invariant under a group G of transformations of X which define a generalized notion of
congruence. In the case of affine spaces, X is a linear space and G is the set of all affine transformations – the
group of all combinations of translations and invertible linear transformations. It seems that this sort of meta-
geometrical point of view did not originate in the Erlanger Programm but was rather merely systematized
in Klein’s proposal.
On reading Euler’s original text on the “affinity” relation, it becomes clear that he was not much interested
in the significance for geometry. He was interested in the graphs of parametrized families of algebraic
functions rather than in the geometry of those graphs as geometrical objects. Here is paragraph 442 of Euler’s
Introductio in analysin infinitorum [104], volume 2, chapter 18.
442. Quemadmodum in curvis similibus abscissae et applicatae homologae in eadem ratione sive
augentur sive diminuuntur, ita, si abscissae aliam sequantur rationem, aliam vero applicatae, curvae
non amplius orientur similes. Verum tamen, quia curvae hoc modo ortae inter se quandam
affinitatem tenent, has curvas affines vocabimus; complectitur ergo affinitas sub se similitudinem
tanquam speciem, quippe curvae affines in similes abeunt, si ambae illae rationes, quas abscissae
et applicatae seorsim sequuntur, evadant aequales. Ex curva ergo quacunque data AMB innumerabiles
curvae affines (Fig. 88 et 89) amb reperientur hoc modo: sumatur abscissa ap, ita ut
sit AP : ap = 1 : m; harum rationum 1 : m et 1 : n vel alterutram vel utramque, innumerabiles
prodibunt curvae, quae primae AMB erunt affines.

This may be translated as follows.

442. Just as the corresponding abscissae and ordinates in similar curves are augmented or diminished
in the same ratio, so, if the abscissae follow one ratio and the ordinates follow a different
ratio, the curves are no longer similar. Nevertheless, because curves arising in this way have a
certain affinity to each other, we will call these curves affine; affinity therefore encompasses the
similarity idea, so to speak; in fact, affine curves change to similar curves if both of those ratios,
which the abscissae and ordinates separately follow, happen to be equal. Therefore from any given
curve AMB, countless affine curves amb (Fig. 88 and 89) are found in this way: the abscissa ap is
chosen so that AP : ap = 1 : m; then the ordinate pm is determined so that PM : pm = 1 : n; and
thus by changing these ratios 1 : m and 1 : n, either both or one at a time, countless curves will be
produced which are affine to the first AMB.
Euler’s figures 88 and 89 are illustrated in Figure 45.3.1. The X-axis is vertical and the Y-axis is oriented to
the left. The lines PM and pm represent the ordinates, whereas AP and ap represent the abscissae.
[Figure: two panels labelled Fig. 88 and Fig. 89]
Figure 45.3.1 Euler’s figures 88 and 89 in the Introductio
The editor, Andreas Speiser [105], of the complete works of Euler makes this comment regarding chapter 18.
Kapitel 18 handelt von ähnlichen und affinen Kurven, ersteres mit der Substitution x = mu,
y = mv, letzteres mit der allgemeineren x = mu, y = nv. Der Ausdruck “affin” ist wohl hier von
Euler eingeführt worden.
Chapter 18 deals with similar and affine curves, the former with the substitution x = mu, y = mv,
the latter with the more general x = mu, y = nv. The expression “affine” is no doubt introduced
here by Euler.
In other words, this passage appears to be the origin of the term “affine” in this geometrical sense. But
it seems that Euler did not do much with it. The subject seems to have not taken off until Möbius took
it up and used the same term that Euler had introduced. Here is the relevant comment by Möbius [76],
section 147, page 195, just after quoting Euler’s paragraph 442.
Der von Euler hier aufgestellte Begriff der Affinitas ist also ganz mit dem vorhin entwickelten
einerlei, und ich will daher gleichfalls diese allgemeinere Verwandtschaft Affinität, und Figuren,
zwischen denen sie statt findet, affine Figuren nennen.
Translated into English, this is as follows.
The concept of affinity proposed here by Euler is thus entirely the same as that which was developed
above, and likewise, I want to call this more general relation affinity, and call figures between which
this relation exists affine figures.

In paragraph 443, Euler discusses how to substitute scaled variables in place of the x and y variables in a
given equation. But in paragraph 444, he talks about the supposed distinction between similar and affine
curve relationships.
444. Discrimen autem inter curvas similes et affines hoc potissimum est notandum, quod curvae,
quae sunt similes respectu unius axis vel puncti fixi, eaedem similes sint futurae respectu aliorum
quorumvis axium seu punctorum homologorum. Curvae autem, quae tantum sunt affines, tales
tantum sunt respectu eorum axium, ad quos referentur, neque pro lubitu alii axes seu puncta
homologa in ipsis dantur, ad quae affinitas referri possit. [. . . ]
444. Now the most powerful distinction to be noted between similar and affine curves is that curves
which are similar in respect of a single axis or fixed point are going to be similar with respect to
any other axes or corresponding points. On the other hand, curves which are only affine, are such
only in respect of the axes to which they are referred, and other axes or corresponding points are
not given in themselves arbitrarily, to which the affinity may be referred. [. . . ]
Euler says here that the axes to be used for a similarity transformation are arbitrary, which is true, but that
the axes for an affinity transformation cannot be chosen arbitrarily. Möbius states that in fact the axes
for an affinity transformation are not only of arbitrary orientation but are also not necessarily orthogonal,
although he continues to use two scale factors m and n for the scalings in the two axial directions. Thus
Möbius does not resort to linear combinations of coordinates. He still uses a diagonal matrix effectively, but
axes are chosen at different angles in each of the two figures which are supposed to be affine. Clearly Euler
thought only in terms of orthogonal coordinates in both figures, with no off-diagonal components in the
transformation matrix. Oddly, though, in paragraphs 452–454, Euler discusses orthogonal transformations
using sines and cosines of rotation angles. But he did not say anything about making both the diagonal
components different and also the off-diagonal components non-zero at the same time. This just shows that
Euler was not thinking at all in terms of what we think of as affine transformations today. But in terms of
etymology, there seems little doubt that Euler was the originator of the term “affine” for a concept which
directly developed into the transformation group that we know today.
45.4. Logical language in ancient literature

45.4.1 Remark: Logical language in the epic of Gilgamesh
The incidence of logical language in the epic of Gilgamesh is discussed in Remark 3.5.2. The epic of Gilgamesh
contains the following 11 instances of the word “if”.
The last line of the following passage [203, page 19] (repeated as lines II.284–286) is an IF-clause.
‘So to keep safe the cedars, II.227
Enlil made it his lot to terrify men; II.228
if you penetrate his forest you are seized by the tremors.’ II.229
The following passage is in lines VI.96–100 [203, page 51].
‘If you do not give me the Bull of Heaven, VI.96
I shall smash the gates of the Netherworld, right down to its dwelling, VI.97
to the world below I shall grant manumission, VI.98
I shall bring up the dead to consume the living, VI.99
I shall make the dead outnumber the living.’ VI.100
A little later, there is the following [203, page 51].
‘If you want from me the Bull of Heaven, VI.103
let the widow of Uruk gather seven years’ chaff, VI.104
and the farmer of Uruk grow seven years’ hay.’ VI.105
And a little later, there is the following [203, page 52].
‘Had I caught you too, I’d have treated you likewise, VI.156
I’d have draped your arms in its guts!’ VI.157
On the next tablet, there is the following [203, page 56].
‘Had I but known, O door, that so you would repay me, VII.47
had I but known, O door, that so you would reward me, VII.48
I would have lifted my axe, I would have cut you down, VII.49
I would have floated you down as a raft to Ebabbara.’ VII.50
On tablet X, there is the following [203, page 77].
‘If you and Enkidu were the ones who slew the Guardian, X.36
destroyed Humbaba, who dwelt in the Forest of Cedar, X.37
killed lions in the mountain passes, X.38
seized and slew the Bull come down from heaven – X.39
why are your cheeks so hollow, your face so sunken, X.40
your mood so wretched, your visage so wasted?’ X.41
The form of lines X.76–77 [203, page 78] (repeated as lines X.153–154) is (A ⇒ B) ∧ ((¬A) ⇒ C). This is a
two-way decision tree. It suggests the proposition A ∨ ¬A or the corresponding exclusive disjunction.
‘If it may be done, I will cross the ocean, X.76
if it may not be done, I will wander in the wild!’ X.77
A little later, there is the following echo [203, page 79].
‘If it may be done, go across with him, X.90
if it may not be done, turn around and go back!’ X.91
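The logical form (A ⇒ B) ∧ ((¬A) ⇒ C) noted above for lines X.76–77 can be tabulated mechanically. The following Python sketch is an illustration only, and the function names are hypothetical. It enumerates all eight truth assignments and confirms that the formula selects B when A is true and C when A is false, which is the sense in which it is a two-way decision tree. It also confirms that A ∨ ¬A holds in every row.

```python
from itertools import product

def implies(p, q):
    """Material implication p => q."""
    return (not p) or q

def two_way(a, b, c):
    """The form (A => B) and ((not A) => C)."""
    return implies(a, b) and implies(not a, c)

# One row (A, B, C, value) for each of the eight truth assignments.
rows = [(a, b, c, two_way(a, b, c))
        for a, b, c in product([True, False], repeat=3)]

# The formula agrees with "B if A holds, otherwise C" in every row.
is_decision_tree = all(value == (b if a else c) for a, b, c, value in rows)

# The excluded middle A or (not A) holds in every row.
excluded_middle = all(a or (not a) for a, _, _, _ in rows)

for row in rows:
    print(row)
```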
Later, there is an unfinished IF-sentence [203, page 86].
‘If, Gilgamesh, the temples of the gods [have no] provisioner , X.288
the temples of the goddesses . . . ’ X.289
On tablet XI, there is the following [203, page 98].
‘There is a plant that [looks] like a box-thorn, XI.283
it has prickles like a dogrose, and will [prick one who plucks it.] XI.284
But if you can possess this plant, XI.285
[you’ll be again as you were in your youth.] XI.286
45.4.2 Remark: Logical language in Bēowulf

As mentioned in Remark 3.5.4, there are 16 OR-constructions, 26 IF-constructions, and numerous “unless”
and “except” constructions in the 3182 lines of Bēowulf. So to quote them all would be burdensome, especially
if presented in both Old English and modern English. Therefore only a small sample of logical expressions
from that work is given here.
The following instance of “if” occurs in lines 1380–1382 (Wrenn [219], page 153).
“Ic þē þā fǣhðe fēo lēanige, 1380
eald-gestrēonum, swā ic ǣr dyde, 1381
wundini golde, gyf þū on weg cymest.” 1382
The translation of this passage by Alexander [195], page 94, is as follows.
‘I shall reward the deed, as I did before,
with wealthy gifts of wreathèd ore,
treasures from the hoard, if you return again.’
The following instance of “or” occurs in lines 1490–1491 (Wrenn [219], page 157).
“ic mē mid Hruntinge 1490
dōm gewyrce, oþðe mec dēað nimeð.” 1491
‘With Hrunting shall I
achieve this deed – or death shall take me!’
The following quotation shows how many “or” alternatives may be chained together (Wrenn [219], page 165).
“Nū is þīnes mægnes blǣd 1761
āne hwīle; eft sōna bið 1762
þæt þec ādl oððe ecg eafoþes getwǣfeð, 1763
oððe fȳres feng oððe flōdes wylm 1764
oððe gripe mēces oððe gāres fliht 1765
oððe atol yldo, oððe ēagena bearhtm 1766
forsiteð ond forsworceð; semninga bið, 1767
þæt ðec, dryht-guma, dēað oferswȳðeð.” 1768
The translation of this passage by Alexander [195], pages 106–107, is as follows.
‘The noon of your strength
shall last for a while now, but in a little time
sickness or a sword will strip it from you:
either enfolding flame or a flood’s billow
or a knife-stab or the stoop of a spear
or the ugliness of age; or your eyes’ brightness
lessens and grows dim. Death shall soon
have beaten you then, O brave warrior!’
The following quotation shows an example of the word “nefne”, meaning “unless” (Wrenn [219], page 178).
“Gēn is eall æt ðē 2149
lissa gelong; ic lȳt hafo 2150
hēafod-māga, nefne Hygelāc ðec!” 2151
‘Joy, for me, always
lies in your gift. Little family
do I have in the world, Hygelac, besides yourself.’
In the above case, the word “nefne” is best translated as “except” or “besides” rather than “unless”. In the
following example, the opposite is true: the word “būtan”, meaning “except” or “without”, is best translated
as “unless” (Wrenn [219], page 136).
“Ic hine hrædlīce heardan clammum 963
on wæl-bedde wrīþan þōhte, 964
þæt hē for mund-gripe mīnum scolde 965
licgean līf-bysig, būtan his līc swice.” 966
‘I had meant to catch him, clamp him down
with a cruel lock to his last resting-place;
with my hands upon him, I would have him soon
in the throes of death – unless he disappeared!’

Chapter 46
Exercise questions
46.1 Logic
46.2 Sets, relations and functions
46.3 Numbers
46.4 Algebra
46.5 Linear algebra
46.6 Tensor algebra
46.7 Topology
These exercises are intended for the reader’s self-study, as active learning for the concepts in the book. They
are not intended for use in exam questions or any other form of assessment. The answers will be given in
full (in Chapter 47) to ensure that there is no incompleteness in the presentation due to missing proofs of
theorems which the reader is expected to provide.
Since answers will be given for all exercises, teachers may find these exercises useless as homework and exam
questions. Questions for assessment can easily be constructed by morphing the exercises in this book (and
all other DG books). The difficult part of designing exam questions is to ensure that the answers will not be
too difficult or too easy. So it is helpful to see the answers (as in this book) to see what the level of difficulty
might be. Hopefully a slightly morphed question will have a slightly morphed answer. (Of course this is not
always so!)
This chapter is organized into sections which do not correspond exactly to the chapters of the book, but the
topic order should be roughly the same.
46.1. Logic
46.1.1 Exercise (→ 47.1.1): Give a logical statement which is equivalent to τ(A) + τ(B) + τ(C) ≤ 2 for
propositions A, B and C. (See Remark 4.3.15.)
46.1.2 Exercise (→ 47.1.2): Give a logical statement which is equivalent to τ(A) + τ(B) + τ(C) = 2 for
propositions A, B and C. (See Remark 4.3.15.)
46.1.3 Exercise (→ 47.1.3): Determine the possible truth values of proposition variables A, B and C,
given that A ∧ (B ∨ C) is true and (A ∨ B) ∧ C is false. (See Remark 4.4.2.)
46.1.4 Exercise (→ 47.1.4): Prove Theorem 4.10.2 directly without the “Deduction Theorem”. Theorems
which are not tainted by the naive “Deduction Theorem” may be used in the proof.
46.1.5 Exercise (→ 47.1.5): Prove Theorem 4.11.11.
46.1.6 Exercise (→ 47.1.6): Write down a logical expression which means “there exist at least 3 things
x, y and z with property P”. In other words, ∃_3 x, P(x). (See Remark 4.16.9.)

46.1.7 Exercise (→ 47.1.7): Write down a logical expression which means “there exist at most 3 things
with property P”. In other words, ∃^3 x, P(x). (See Remark 4.16.9.)
46.1.8 Exercise (→ 47.1.8): Write down a logical expression which means “there exist exactly 3 things
with property P”. In other words, ∃_3^3 x, P(x). (See Remark 4.16.9.)
46.1.9 Exercise (→ 47.1.9): Write down a logical expression which means “there exist at least 2 and at most
3 things with property P”. In other words, ∃_2^3 x, P(x). (See Remark 4.16.9.)
46.1.10 Exercise (→ 47.1.10): For general non-negative integer n, write down a logical expression which
means “there exist at least n things with property P”. In other words, ∃_n x, P(x). (See Remark 4.16.9.)
46.1.11 Exercise (→ 47.1.11): For general non-negative integer n, write down a logical expression which
means “there exist at most n things with property P”. In other words, ∃^n x, P(x). (See Remark 4.16.9.)
46.1.12 Exercise (→ 47.1.12): For general non-negative integer n, write down a logical expression which
means “there exist exactly n things with property P”. In other words, ∃_n^n x, P(x). (See Remark 4.16.9.)
46.1.13 Exercise (→ 47.1.13): For general non-negative integers m and n with m < n, write down a
logical expression which means “there exist at least m and at most n things with property P”. In other words,
∃_m^n x, P(x). (See Remark 4.16.9.)
46.2. Sets, relations and functions

46.2.1 Exercise (→ 47.2.1): Show that ∃x ∈ S, P (x) is equivalent to ¬(∀x ∈ S, ¬P (x)). (See Re-
mark 5.1.15.)
46.2.2 Exercise (→ 47.2.2): Show that the combination of the specification axiom in line (5.4.1) and
the replacement axiom in line (5.4.2) implies the single replacement axiom in Definition 5.1.26 (6). (See
Remark 5.4.3 for discussion.)
46.2.3 Exercise (→ 47.2.3): Show that the ZF replacement axiom, Definition 5.1.26 (6), implies the spec-
ification axiom in line (5.4.1) of Remark 5.4.3.
46.2.4 Exercise: In ZF set theory (Section 5.1), an infinite number of set membership relations “on the
left” is prohibited by the regularity axiom (7), but an infinite number of such relations “on the right” is
perfectly okay. Why is there this asymmetry?
46.2.5 Exercise: Rewrite the ZF set theory axioms to be consistent with ∀x, x ∈ x, but only if it is
possible to do this. (See Remark 5.7.17.)

46.3. Numbers
46.3.1 Exercise (→ 47.3.1): Write the ordinal numbers 0, 1, 2, 3, 4, 5 explicitly in terms of the empty
set. (See Remark 7.2.14.)
46.3.2 Exercise (→ 47.3.3): Draw a diagram of the ordinal number 10 in the style of Figure 7.2.1. (See
Remark 7.2.15.)
46.3.3 Exercise: Show that the successor set S⁺ = S ∪ {S} in Theorem 7.2.17 is well-defined for all sets
S in ZF set theory.
46.3.5 Exercise: Prove Theorem 8.6.18.
46.4. Algebra
46.4.1 Exercise (→ 47.4.1): Construct a group G and a subgroup S of G such that ∀g ∈ G, gSg⁻¹ =
g⁻¹Sg, but ∃h ∈ G, hS ≠ Sh. In other words, the two possible definitions for the conjugate of S agree,
but S is not a normal subgroup of G. (Hint: h²Sh⁻² = S, but hSh⁻¹ ≠ S. So an element h of order 2
would be a good hunch.)
46.4.2 Exercise: Referring to Example 9.3.17, construct a similar example with SO(3) on ℝ³ with h
equal to an arbitrary rotation or reflection and g equal to an arbitrary rotation.
46.4.3 Exercise: Show that if h = S_φ, then g⁻¹hg = S_{φ−θ} and ghg⁻¹ = S_{φ+θ}. (See Example 9.3.17.)
46.4.4 Exercise: Prove the statements in Remark 9.3.19.
46.4.6 Exercise: Prove the statements in Remark 9.4.15.
46.4.7 Exercise: Referring to Example 9.4.32, show by calculation in coordinates that for rotations around
each of the axes in ℝ³, points on the sphere S² ⊆ ℝ³ are mapped to other points which lie on circles on the
sphere with circle centres on the respective axes.
46.4.8 Exercise: Verify that the tuples (G, X, σ, µ) in Definitions 9.4.36 and 9.4.37 satisfy Definition 9.4.4
for a left transformation group.
46.4.9 Exercise: Verify that the skew products in Definitions 9.6.4, 9.6.5, 9.6.8 and 9.6.9 satisfy Defini-
tions 9.4.4 and 9.5.2 for left and right transformation groups.
46.5. Linear algebra

46.5.1 Exercise: Show that for any one-dimensional linear space V over a field K,
∀φ ∈ End(V ), ∃λ ∈ K, ∀v ∈ V, φ(v) = λv.
See Remark 10.4.1 for context.

46.6. Tensor algebra

46.7. Topology
46.7.1 Exercise: Show that the trivial and discrete topologies in Definitions 14.3.18 and 14.3.19 satisfy
the requirements of Definition 14.3.3.
46.7.2 Exercise: Tabulate the set of all possible topologies on a four-element set. (See Example 14.4.6.)
46.7.3 Exercise (→ 47.7.1): Show by direct calculation that Theorem 14.9.8 is valid for X = ∅.
46.7.5 Exercise: Determine whether the empty topology (X, T ) = (∅, {∅}) is second countable, separable,
Hausdorff, normal, connected, locally connected, simply connected, compact, sequentially compact, locally
compact or paracompact. See Chapter 15.
46.7.6 Exercise: Determine whether the single point topology (X, T ) = ({x}, {∅, {x}}) is second count-
able, separable, Hausdorff, normal, connected, locally connected, simply connected, compact, sequentially
compact, locally compact or paracompact. See Chapter 15.
46.7.7 Exercise: Determine whether the trivial two-point topology (X, T ) = ({x, y}, {∅, {x, y}}) is second
countable, separable, Hausdorff, normal, connected, locally connected, simply connected, compact, sequen-
tially compact, locally compact or paracompact. See Chapter 15.
46.7.8 Exercise: Determine whether the discrete two-point topology (X, T) = ({x, y}, {∅, {x}, {y}, {x, y}}) is
second countable, separable, Hausdorff, normal, connected, locally connected, simply connected, compact,
sequentially compact, locally compact or paracompact. See Chapter 15.
46.7.10 Exercise: Verify that the lexicographic total order on any Cartesian product X^n for any totally
ordered set X and integer n ∈ ℤ⁺₀ which is defined by x ≤ y ⇔ ∀j ∈ ℕ_n, ((∀i ∈ ℕ_n, i < j ⇒ x_i = y_i) ⇒
x_j ≤ y_j) is equivalent to the order x ≤ y ⇔ ∀j ∈ ℕ_n, ((x_j ≤ y_j) ∨ (∃i ∈ ℕ_n, (i < j ∧ x_i ≠ y_i))). (See
Section 7.1 and Remark 16.1.8 for context.)
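The equivalence of the two lexicographic order conditions can be sanity-checked by brute force on a small case. The following sketch assumes X = {0, 1, 2} with its usual order, tuple length 3, and 0-based indices in place of the book's index set; the function names are illustrative only. A finite check of this kind is evidence, not a proof.

```python
from itertools import product

def lex_le_implication(x, y):
    """x <= y iff for all j: (for all i < j, x_i = y_i) implies x_j <= y_j."""
    n = len(x)
    return all(
        (not all(x[i] == y[i] for i in range(j))) or x[j] <= y[j]
        for j in range(n)
    )

def lex_le_disjunction(x, y):
    """x <= y iff for all j: x_j <= y_j, or some i < j has x_i != y_i."""
    n = len(x)
    return all(
        x[j] <= y[j] or any(x[i] != y[i] for i in range(j))
        for j in range(n)
    )

X = range(3)   # assumed small totally ordered set
N = 3          # assumed tuple length
tuples = list(product(X, repeat=N))

# Compare both conditions with each other and with Python's built-in
# lexicographic comparison of tuples.
definitions_agree = all(
    lex_le_implication(x, y) == lex_le_disjunction(x, y) == (x <= y)
    for x in tuples
    for y in tuples
)
print(definitions_agree)
```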

46.8.1 Exercise (→ 47.8.1): Prove the statement in Remark 23.6.6 that the general empty topological
(G, F) fibre bundle for non-empty F has the form (E, T_E, π, B, T_B, A^F_E) = (∅, {∅}, ∅, ∅, {∅}, A^F_E), where
A^F_E = ∅ or {∅}.
46.8.2 Exercise (→ 47.8.2): Show that condition (23.10.1) in Remark 23.10.4 and Definition 23.10.3 (ii)
are equivalent.
46.8.3 Exercise (→ 47.8.3): Prove the statement in Remark 23.12.7 that the relation φ(z′)y′ = φ(z)y in
Definition 23.12.3 (i) is independent of the choice of φ ∈ A^G_P. In other words, prove that
∀b ∈ B, ∀z, z′ ∈ q⁻¹({b}), ∀y, y′ ∈ F, ∀φ1, φ2 ∈ A^G_P,
φ1(z′)y′ = φ1(z)y ⇔ φ2(z′)y′ = φ2(z)y.

46.9. Topological manifolds

46.9.1 Exercise (→ 47.9.1): Show that any locally Euclidean space is locally connected.
(See Remark 25.2.6.)
46.9.2 Exercise (→ 47.9.2): Show that any locally Euclidean space is locally compact.
(See Remark 25.2.6.)



Chapter 47
Exercise answers
47.1 Logic
47.2 Sets, relations and functions
47.3 Numbers
47.4 Algebra
47.5 Linear algebra
47.6 Tensor algebra
47.7 Topology
Generally exercises have many possible answers. Take these answers seriously at your peril.
[ The numbering scheme for exercise questions and answers obviously needs fixing. The same number should
somehow be used for corresponding questions and answers. But for now, arrows point from the answers to
the questions. ]
47.1. Logic
47.1.1 Answer (→ 46.1.1): (A ∧ B) ∨ (A ∧ C) ∨ (B ∧ C).
47.1.2 Answer (→ 46.1.2): ((A ∧ B) ∨ (A ∧ C) ∨ (B ∧ C)) ∧ ¬(A ∧ B ∧ C).
Alternatively, (A ∧ B ∧ ¬C) ∨ (A ∧ ¬B ∧ C) ∨ (¬A ∧ B ∧ C).
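The formulas in Answers 47.1.1 and 47.1.2 can be compared with truth-value sums over all eight truth assignments. The sketch below assumes that τ maps a true proposition to 1 and a false one to 0; the function names are hypothetical labels for the formulas above.

```python
from itertools import product

def tau(p):
    """Assumed truth value function: 1 for true, 0 for false."""
    return 1 if p else 0

def answer_47_1_1(a, b, c):
    return (a and b) or (a and c) or (b and c)

def answer_47_1_2(a, b, c):
    return answer_47_1_1(a, b, c) and not (a and b and c)

def answer_47_1_2_alt(a, b, c):
    return (a and b and not c) or (a and not b and c) or (not a and b and c)

rows = list(product([False, True], repeat=3))

# The first formula holds exactly when at least two propositions are true.
at_least_two_ok = all(
    answer_47_1_1(*r) == (sum(map(tau, r)) >= 2) for r in rows
)
# Both forms of the second formula hold exactly when two propositions are true.
exactly_two_ok = all(
    answer_47_1_2(*r) == answer_47_1_2_alt(*r) == (sum(map(tau, r)) == 2)
    for r in rows
)
print(at_least_two_ok, exactly_two_ok)
```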
47.1.3 Answer (→ 46.1.3): Since A ∧ (B ∨ C) is true, both A and B ∨ C must be true. Since (A ∨ B) ∧ C
is false, either A ∨ B is false or C is false. But since A is true, A ∨ B must be true. Therefore C must be
false. Then since B ∨ C is true, B must be true. Conclusion: A and B are true, but C is false. There are
no other possible truth-value combinations.
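The argument in Answer 47.1.3 can be cross-checked by enumerating all truth assignments. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import product

# Keep the assignments where A ∧ (B ∨ C) is true and (A ∨ B) ∧ C is false.
solutions = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if (a and (b or c)) and not ((a or b) and c)
]
print(solutions)
```

The list contains one assignment, with A and B true and C false, in agreement with the conclusion above.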
47.1.4 Answer (→ 46.1.4): To prove Theorem 4.10.2 (i) directly without the “Deduction Theorem”:
⊢ (α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (α ⇒ γ))
(1) (β ⇒ γ) ⇒ (α ⇒ (β ⇒ γ)) PC 1
(2) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) PC 2
(3) (β ⇒ γ) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) Theorem 4.8.3 (iii) (1,2)
(4) (α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (α ⇒ γ)) Theorem 4.8.3 (iv) (3)
47.1.5 Answer (→ 46.1.5): Theorem 4.11.11, part (i) may be shown as follows.
(A ⇒ B) ⇔ ((A ⇒ A) ∧ (A ⇒ B)) (47.1.1)
⇔ (A ⇒ (A ∧ B))
⇔ ((A ⇒ (A ∧ B)) ∧ ((A ∧ B) ⇒ A)) (47.1.2)
⇔ (A ⇔ (A ∧ B)).

Line (47.1.1) follows from the tautology A ⇒ A. Line (47.1.2) follows from the tautology (A ∧ B) ⇒ A.
[ The proof of Theorem 4.11.11 in the above answer is most unsatisfactory. The theorem isn’t very useful
either. So should scrap it or do it properly. ]
47.1.6 Answer (→ 46.1.6): ∃_3 x, P(x) may be written as:
∃x, ∃y, ∃z, P(x) ∧ P(y) ∧ P(z) ∧ (x ≠ y) ∧ (y ≠ z) ∧ (x ≠ z).

47.1.7 Answer (→ 46.1.7): ∃^3 x, P(x) may be written as:
∀w, ∀x, ∀y, ∀z,
(P(w) ∧ P(x) ∧ P(y) ∧ P(z)) ⇒ (w = x ∨ w = y ∨ w = z ∨ x = y ∨ x = z ∨ y = z).
Alternatively,
∀w, ∀x, ∀y, ∀z,
(w ≠ x ∧ w ≠ y ∧ w ≠ z ∧ x ≠ y ∧ x ≠ z ∧ y ≠ z) ⇒ (¬P(w) ∨ ¬P(x) ∨ ¬P(y) ∨ ¬P(z)).
Of course, saying that there are at most 3 things x such that P(x) is true is not really an existential statement
at all. All uniqueness statements (in this case a “tripliqueness” statement) are negative existence statements.
Therefore universal quantifiers are used rather than existential quantifiers.
The similarity of form to Answer 47.1.6 is not coincidental, because ∃^3 x, P(x) means
the same thing as ¬(∃_4 x, P(x)). There exist at most three x if and only if there do not exist four x. It
follows by simple negation of ∃_4 x, P(x) that there is a third equivalent form for ∃^3 x, P(x), namely
∀w, ∀x, ∀y, ∀z,
¬P(w) ∨ ¬P(x) ∨ ¬P(y) ∨ ¬P(z) ∨ w = x ∨ w = y ∨ w = z ∨ x = y ∨ x = z ∨ y = z.
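The "at most 3" formulas can be tested against a direct count on a small finite universe. The sketch below assumes a five-element domain and runs through every predicate P on it; restricting the quantifiers to a finite domain is an assumption of the sketch, and the function names are illustrative.

```python
from itertools import product

def form_implication(domain, P):
    """(P(w) ∧ P(x) ∧ P(y) ∧ P(z)) ⇒ (some two of w, x, y, z are equal)."""
    return all(
        (not (P(w) and P(x) and P(y) and P(z)))
        or (w == x or w == y or w == z or x == y or x == z or y == z)
        for w, x, y, z in product(domain, repeat=4)
    )

def form_contrapositive(domain, P):
    """(w, x, y, z pairwise distinct) ⇒ (P fails for at least one of them)."""
    return all(
        (not (w != x and w != y and w != z and x != y and x != z and y != z))
        or (not P(w) or not P(x) or not P(y) or not P(z))
        for w, x, y, z in product(domain, repeat=4)
    )

def count_at_most_three(domain, P):
    return sum(1 for x in domain if P(x)) <= 3

def make_predicate(bits):
    """Predicate on range(len(bits)) that is true where bits[x] is True."""
    return lambda x: bits[x]

DOMAIN = range(5)  # assumed finite universe
forms_agree = all(
    form_implication(DOMAIN, P) == form_contrapositive(DOMAIN, P)
    == count_at_most_three(DOMAIN, P)
    for P in map(make_predicate, product([False, True], repeat=len(DOMAIN)))
)
print(forms_agree)
```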

47.1.8 Answer (→ 46.1.8): ∃_3^3 x, P(x) may be written as:
(∃x, ∃y, ∃z, (P(x) ∧ P(y) ∧ P(z) ∧ (x ≠ y) ∧ (y ≠ z) ∧ (x ≠ z)))
∧ (∀w, ∀x, ∀y, ∀z, ((P(w) ∧ P(x) ∧ P(y) ∧ P(z)) ⇒ (w = x ∨ w = y ∨ w = z ∨ x = y ∨ x = z ∨ y = z))).
This is clearly the same as the conjunction of ∃_3 x, P(x) and ∃^3 x, P(x) in Answers 47.1.6 and 47.1.7. It is
probably not possible to reduce the complexity of the statement by exploiting some sort of redundancy in
the combination of statements.
47.1.9 Answer (→ 46.1.9): ∃_2^3 x, P(x) may be written as:
(∃x, ∃y, (P(x) ∧ P(y) ∧ (x ≠ y)))
∧ (∀w, ∀x, ∀y, ∀z, ((P(w) ∧ P(x) ∧ P(y) ∧ P(z)) ⇒ (w = x ∨ w = y ∨ w = z ∨ x = y ∨ x = z ∨ y = z))).
This may be interpreted as: “There are at least 2, but fewer than 4, x such that P(x) is true.” More informally,
one could write 2 ≤ #{x; P(x)} < 4.
47.1.10 Answer (→ 46.1.10): It is tempting to approach this exercise using induction. However, that
would not yield a closed formula for the desired statement ∃_n x, P(x). It is easier to think of the ordinal
number n as a general set. We want to require the existence of a unique x_i such that P(x_i) is true for
each i ∈ n. The logical expression (47.1.3) is one way of writing ∃_n x, P(x).
∃f, (∀i ∈ n, ∃x, ((i, x) ∈ f ∧ P(x))) ∧ (∀i ∈ n, ∀j ∈ n, ∀x, (((i, x) ∈ f ∧ (j, x) ∈ f) ⇒ i = j)). (47.1.3)
In other words, there exists a set f which includes an injective relation on n, such that P(f(i)) is true
for all i ∈ n. (It is only required that f restricted to n be an injective relation on n. The purpose of
this generality is to simplify the logic.) Thus statement (47.1.3) means that there exists a distinct x which
satisfies P for each i ∈ n. As a bonus, the logical expression (47.1.3) is valid for any set n, even if n is
countably or uncountably infinite.
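Expression (47.1.3) can be evaluated literally on a small finite universe and compared with a direct count. The sketch below assumes that x ranges over a three-element domain and that f ranges over subsets of n × domain; both restrictions are assumptions of the sketch, since in the text the quantifiers range over all sets.

```python
from itertools import combinations, product

def subsets(items):
    """Yield every subset of items as a set."""
    items = list(items)
    for r in range(len(items) + 1):
        for chosen in combinations(items, r):
            yield set(chosen)

def exists_at_least(n, domain, P):
    """Evaluate (47.1.3) with x restricted to domain and f to subsets of n × domain."""
    ordinal = range(n)  # the ordinal n as the set {0, ..., n-1}
    pairs = [(i, x) for i in ordinal for x in domain]
    for f in subsets(pairs):
        covers = all(
            any((i, x) in f and P(x) for x in domain) for i in ordinal
        )
        injective = all(
            (not ((i, x) in f and (j, x) in f)) or i == j
            for i in ordinal for j in ordinal for x in domain
        )
        if covers and injective:
            return True
    return False

DOMAIN = range(3)  # assumed finite universe
formula_matches_count = all(
    exists_at_least(n, DOMAIN, lambda x, bits=bits: bits[x])
    == (sum(bits) >= n)
    for n in range(4)
    for bits in product([False, True], repeat=len(DOMAIN))
)
print(formula_matches_count)
```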

47.1.11 Answer (→ 46.1.11): Note that ∃^n x, P(x) is equivalent to ¬(∃_{n+1} x, P(x)), which can be
obtained from Answer 47.1.10 by simple negation as in the following expression, where n⁺ = n + 1.
∀f, (∀i ∈ n⁺, ∀j ∈ n⁺, ∀x, (((i, x) ∈ f ∧ (j, x) ∈ f) ⇒ i = j)) ⇒ (∃i ∈ n⁺, ∀x, ((i, x) ∈ f ⇒ ¬P(x))).
This may be interpreted as: “For any injective function f on n+ , for some i in n+ , P (f (i)) is false.” The
generalization of this expression to infinite n does not seem to be as straightforward as in Answer 47.1.10.
47.1.12 Answer (→ 46.1.12): This question can be answered by combining Answers 47.1.10 and 47.1.11.
(∃f, (∀i ∈ n, ∃x, ((i, x) ∈ f ∧ P(x))) ∧ (∀i ∈ n, ∀j ∈ n, ∀x, (((i, x) ∈ f ∧ (j, x) ∈ f) ⇒ i = j)))
∧ (∀g, (∀i ∈ n⁺, ∀j ∈ n⁺, ∀y, (((i, y) ∈ g ∧ (j, y) ∈ g) ⇒ i = j)) ⇒ (∃i ∈ n⁺, ∀y, ((i, y) ∈ g ⇒ ¬P(y)))).
47.1.13 Answer (→ 46.1.13): The expression ∃_m^n x, P(x) is a simple variation of Answer 47.1.12.
(∃f, (∀i ∈ m, ∃x, ((i, x) ∈ f ∧ P(x))) ∧ (∀i ∈ m, ∀j ∈ m, ∀x, (((i, x) ∈ f ∧ (j, x) ∈ f) ⇒ i = j)))
∧ (∀g, (∀i ∈ n⁺, ∀j ∈ n⁺, ∀y, (((i, y) ∈ g ∧ (j, y) ∈ g) ⇒ i = j)) ⇒ (∃i ∈ n⁺, ∀y, ((i, y) ∈ g ⇒ ¬P(y)))).
47.2. Sets, relations and functions

47.2.1 Answer (→ 46.2.1): By Remark 5.1.15, ∃x ∈ S, P (x) means ∃x, (x ∈ S ∧ P (x)). By Re-
mark 4.13.2, this means ¬(∀x, ¬(x ∈ S ∧ P (x))). By Remark 5.1.15, this is equivalent to ¬(∀x, (x ∈
S ⇒ ¬P (x))). By Remark 5.1.15, this means ¬(∀x ∈ S, ¬P (x)).
47.2.2 Answer (→ 46.2.2): First assume the specification axiom in line (5.4.1) and the replacement axiom
in line (5.4.2) of Remark 5.4.3. That is, assume
∃Y, ∀z, (z ∈ Y ⇔ (z ∈ X ∧ P (z))) (47.2.1)
for any set X and boolean 1-formula P , and
∃Y, ∀x, ((x ∈ X ∧ ∃a, R(x, a)) ⇒ ∃b, (b ∈ Y ∧ R(x, b))) (47.2.2)
for any set X and boolean 2-formula R. It must be shown that

(∀x, ∀y, ∀z, ((f(x, y) ∧ f(x, z)) ⇒ y = z)) ⇒ ∃B, ∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ f(x, y))) (47.2.3)
for any set A and boolean 2-formula f .

Let A be a set and let f be a boolean 2-formula such that ∀x, ∀y, ∀z, ((f (x, y) ∧ f (x, z)) ⇒ y = z). (In other
words, f is a function.) Then by (47.2.2), there is a set Y which satisfies
∀x, ((x ∈ A ∧ ∃a, f (x, a)) ⇒ ∃b, (b ∈ Y ∧ f (x, b))) (47.2.4)
Define the boolean 1-formula P by P (z) = “∃x ∈ A, f (x, z)”. Then by (47.2.1), there is a set Z which
satisfies ∀z, (z ∈ Z ⇔ (z ∈ Y ∧ ∃x ∈ A, f (x, z))).
Now let y satisfy ∃c ∈ A, f (c, y). Then there is a c such that f (c, y). This c then satisfies (c ∈ A) ∧
(∃z, f (c, z)). Therefore ∃b, (b ∈ Y ∧ f (c, b)) by (47.2.4). Thus there is a b such that b ∈ Y and f (c, b). But
f is a function. So b = y. Therefore y ∈ Y .
Since the above argument shows that any y which satisfies ∃c ∈ A, f(c, y) also satisfies y ∈ Y, it follows that
∀y, ((∃c ∈ A, f(c, y)) ⇒ (y ∈ Y)). Therefore (z ∈ Y ∧ ∃x ∈ A, f(x, z)) ⇔ (∃x ∈ A, f(x, z)). (This follows
from the logical tautology (B ⇒ A) ⇒ ((A ∧ B) ⇔ B).) By substituting this equivalence into the definition
of Z, it follows that ∀z, (z ∈ Z ⇔ ∃x, (x ∈ A ∧ f(x, z))). This is the right hand side of (47.2.3). So (47.2.3)
is verified.

47.2.3 Answer (→ 46.2.3): It must be shown that (47.2.1) in Remark 5.4.3 follows from line (47.2.3) in
Answer 47.2.2.
47.2.4 Answer (→ 46.2.6): To prove Theorem 5.13.4 part (i), note that
x ∈ A ∪ ∅ ⇔ (x ∈ A ∨ x ∈ ∅)
⇔ x ∈ A. (47.2.5)
Line (47.2.5) follows from Remark 4.11.9 and the fact that ∀x, x ∉ ∅ by the definition of an empty set.
To prove part (ii), note that
x ∈ A ∩ ∅ ⇔ (x ∈ A ∧ x ∈ ∅)
⇔ x ∈ ∅. (47.2.6)
Line (47.2.6) follows from Remark 4.11.9 and the fact that ∀x, x ∉ ∅ by the definition of an empty set.
To prove part (iii), note that
A ⊆ B ⇔ (z ∈ A ⇒ z ∈ B)
⇔ (z ∈ A ⇒ (z ∈ A ∧ z ∈ B))
⇔ (z ∈ A ⇔ (z ∈ A ∧ z ∈ B)).
47.2.5 Answer (→ 46.2.7): These formulas follow from Theorem 5.13.6. For example, (A ∪ B) ∩ (C ∪ D) =
((A ∪ B) ∩ C) ∪ ((A ∪ B) ∩ D) = (A ∩ C) ∪ (B ∩ C) ∪ (A ∩ D) ∪ (B ∩ D).
47.2.6 Answer (→ 46.2.8): For Theorem 5.14.7 part (i), let S1 and S2 be sets of sets such that S1 ⊆ S2.
Let x ∈ ⋃S1. Then ∃A ∈ S1, x ∈ A. But A ∈ S1 ⇒ A ∈ S2. So ∃A ∈ S2, x ∈ A, which means that x ∈ ⋃S2.
To prove part (ii), let S1 and S2 be non-empty sets of sets such that S1 ⊆ S2. Let x ∈ ⋂S2. Then
∀A ∈ S2, x ∈ A. But A ∈ S1 ⇒ A ∈ S2. So ∀A ∈ S1, x ∈ A, which means that x ∈ ⋂S1.
To prove part (iii), let A be a set and S1 be a set of sets. Then
x ∈ A ∩ (⋃S1) ⇔ (x ∈ A) ∧ (∃X ∈ S1, x ∈ X)
⇔ ∃X ∈ S1, (x ∈ A ∧ x ∈ X)
⇔ ∃X ∈ S1, x ∈ A ∩ X
⇔ x ∈ ⋃{A ∩ X; X ∈ S1}.
To prove part (iv), let A be a set and S1 be a non-empty set of sets. Then
x ∈ A ∪ (⋂S1) ⇔ (x ∈ A) ∨ (∀X ∈ S1, x ∈ X)
⇔ ∀X ∈ S1, (x ∈ A ∨ x ∈ X)
⇔ ∀X ∈ S1, x ∈ A ∪ X
⇔ x ∈ ⋂{A ∪ X; X ∈ S1}.
To prove part (v), let S1 and S2 be sets of sets. Then
x ∈ (⋃S1) ∩ (⋃S2) ⇔ (∃X1 ∈ S1, x ∈ X1) ∧ (∃X2 ∈ S2, x ∈ X2)
⇔ ∃X1 ∈ S1, ∃X2 ∈ S2, (x ∈ X1 ∧ x ∈ X2)
⇔ ∃X1 ∈ S1, ∃X2 ∈ S2, x ∈ X1 ∩ X2
⇔ x ∈ ⋃{X1 ∩ X2; X1 ∈ S1, X2 ∈ S2}.

To prove part (vi), let S1 and S2 be non-empty sets of sets. Then
x ∈ (⋂S1) ∪ (⋂S2) ⇔ (∀X1 ∈ S1, x ∈ X1) ∨ (∀X2 ∈ S2, x ∈ X2)
⇔ ∀X1 ∈ S1, ∀X2 ∈ S2, (x ∈ X1 ∨ x ∈ X2)
⇔ ∀X1 ∈ S1, ∀X2 ∈ S2, x ∈ X1 ∪ X2
⇔ x ∈ ⋂{X1 ∪ X2; X1 ∈ S1, X2 ∈ S2}.
To prove part (vii), let S1 and S2 be non-empty sets of sets. Then
x ∈ (⋃S1) ∪ (⋃S2) ⇔ (∃X1 ∈ S1, x ∈ X1) ∨ (∃X2 ∈ S2, x ∈ X2)
⇔ ∃X1 ∈ S1, ∃X2 ∈ S2, (x ∈ X1 ∨ x ∈ X2)
⇔ ∃X1 ∈ S1, ∃X2 ∈ S2, x ∈ X1 ∪ X2
⇔ x ∈ ⋃{X1 ∪ X2; X1 ∈ S1, X2 ∈ S2}.
To prove part (viii), let S1 and S2 be non-empty sets of sets. Then
x ∈ (⋂S1) ∩ (⋂S2) ⇔ (∀X1 ∈ S1, x ∈ X1) ∧ (∀X2 ∈ S2, x ∈ X2)
⇔ ∀X1 ∈ S1, ∀X2 ∈ S2, (x ∈ X1 ∧ x ∈ X2)
⇔ ∀X1 ∈ S1, ∀X2 ∈ S2, x ∈ X1 ∩ X2
⇔ x ∈ ⋂{X1 ∩ X2; X1 ∈ S1, X2 ∈ S2}.
To prove part (ix), let S1 and S2 be sets of sets. Then
x ∈ (⋃S1) ∪ (⋃S2) ⇔ (x ∈ ⋃S1) ∨ (x ∈ ⋃S2)
⇔ (∃X, (X ∈ S1 ∧ x ∈ X)) ∨ (∃X, (X ∈ S2 ∧ x ∈ X))
⇔ ∃X, ((X ∈ S1 ∧ x ∈ X) ∨ (X ∈ S2 ∧ x ∈ X))
⇔ ∃X, ((X ∈ S1 ∨ X ∈ S2) ∧ x ∈ X)
⇔ ∃X, ((X ∈ S1 ∪ S2) ∧ x ∈ X)
⇔ ∃X ∈ S1 ∪ S2, x ∈ X
⇔ x ∈ ⋃(S1 ∪ S2).
To prove part (x), let S1 and S2 be non-empty sets of sets. Then
x ∈ (⋂S1) ∩ (⋂S2) ⇔ (x ∈ ⋂S1) ∧ (x ∈ ⋂S2)
⇔ (∀X, (X ∉ S1 ∨ x ∈ X)) ∧ (∀X, (X ∉ S2 ∨ x ∈ X))
⇔ ∀X, ((X ∉ S1 ∨ x ∈ X) ∧ (X ∉ S2 ∨ x ∈ X))
⇔ ∀X, ((X ∉ S1 ∧ X ∉ S2) ∨ x ∈ X)
⇔ ∀X, ((X ∉ S1 ∪ S2) ∨ x ∈ X)
⇔ ∀X ∈ S1 ∪ S2, x ∈ X
⇔ x ∈ ⋂(S1 ∪ S2).
The above calculations use the fact that the proposition ∀x ∈ X, P(x), for any set X and set-theoretic
formula P, means ∀x, (x ∈ X ⇒ P(x)), which is equivalent to the proposition ∀x, (x ∉ X ∨ P(x)). (See
Notation 5.1.14 and Remark 5.1.15.)
To prove part (xi), let S be a set of sets, let A ∈ S and let z ∈ A. It follows that ∃y, (z ∈ y ∧ y ∈ S) because
this is true for y = A. Therefore z ∈ {x; ∃y, (x ∈ y ∧ y ∈ S)} = ⋃S. Hence A ⊆ ⋃S.
To prove part (xii), let S be a non-empty set of sets, let A ∈ S and let z ∈ ⋂S = {x; ∀y, (y ∈ S ⇒ x ∈ y)}.
Then ∀y, (y ∈ S ⇒ z ∈ y). From A ∈ S it then follows that z ∈ A. Hence ⋂S ⊆ A.

To show the part (xiii) left-to-right implication, let S be a set of sets and assume ∀X ∈ S, X ⊆ A. That is,
∀X, (X ∈ S ⇒ X ⊆ A). Let z ∈ ⋃S. Then ∃y ∈ S, z ∈ y. That is, ∃y, (z ∈ y ∧ y ∈ S). From y ∈ S, it
follows that y ⊆ A. So ∃y, (z ∈ y ∧ y ⊆ A). Therefore z ∈ A. Hence ⋃S ⊆ A.
To show the part (xiii) right-to-left implication, let S be a set of sets and assume ⋃S ⊆ A. Let X ∈ S.
Then X ⊆ ⋃S by part (xi). So X ⊆ A. Hence ∀X ∈ S, X ⊆ A.
To show the part (xiv) left-to-right implication, let S be a non-empty set of sets and assume ∀X ∈ S, X ⊇ A.
That is, ∀X, (X ∈ S ⇒ X ⊇ A). Let z ∈ A. Then ∀X ∈ S, z ∈ X. That is, z ∈ ⋂S. Hence A ⊆ ⋂S.
To show the part (xiv) right-to-left implication, let S be a non-empty set of sets and assume ⋂S ⊇ A. Let
X ∈ S. Then X ⊇ ⋂S by part (xii). So X ⊇ A. Hence ∀X ∈ S, X ⊇ A.
To prove part (xv), let S1 and S2 be sets of sets. By part (v), (⋃S1) ∩ (⋃S2) = ⋃{X1 ∩ X2; X1 ∈ S1, X2 ∈
S2}. But ⋃{X1 ∩ X2; X1 ∈ S1, X2 ∈ S2} = {y; ∃X1 ∈ S1, ∃X2 ∈ S2, y ∈ X1 ∩ X2}. This equals the empty
set if and only if the proposition “∃X1 ∈ S1, ∃X2 ∈ S2, y ∈ X1 ∩ X2” is false for all y. In other words,
∀y, ¬(∃X1 ∈ S1, ∃X2 ∈ S2, y ∈ X1 ∩ X2). That is, ∀y, ∀X1 ∈ S1, ∀X2 ∈ S2, y ∉ X1 ∩ X2. This may be
rearranged as ∀X1 ∈ S1, ∀X2 ∈ S2, ∀y, y ∉ X1 ∩ X2. But ∀y, y ∉ X1 ∩ X2 means precisely that X1 ∩ X2 = ∅.
Hence (⋃S1) ∩ (⋃S2) = ∅ if and only if ∀X1 ∈ S1, ∀X2 ∈ S2, X1 ∩ X2 = ∅.
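Several of the union and intersection identities proved above can be spot-checked by enumeration. The sketch below assumes a two-element universe and runs over all pairs of non-empty families of its subsets; the helper names are illustrative, and a finite check does not replace the proofs.

```python
from itertools import combinations
from functools import reduce

def powerset(items):
    """All subsets of items, each as a frozenset."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def big_union(family):
    return frozenset().union(*family)

def big_intersection(family):
    # Only defined here for non-empty families, as in the text.
    return reduce(frozenset.intersection, family)

UNIVERSE = (0, 1)  # assumed small universe
families = [fam for fam in powerset(powerset(UNIVERSE)) if fam]

def check(S1, S2):
    part_v = big_union(S1) & big_union(S2) == big_union(
        {X1 & X2 for X1 in S1 for X2 in S2})
    part_vi = big_intersection(S1) | big_intersection(S2) == big_intersection(
        {X1 | X2 for X1 in S1 for X2 in S2})
    part_ix = big_union(S1) | big_union(S2) == big_union(S1 | S2)
    part_x = big_intersection(S1) & big_intersection(S2) == big_intersection(S1 | S2)
    part_xv = (not (big_union(S1) & big_union(S2))) == all(
        not (X1 & X2) for X1 in S1 for X2 in S2)
    return part_v and part_vi and part_ix and part_x and part_xv

identities_hold = all(check(S1, S2) for S1 in families for S2 in families)
print(identities_hold)
```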
47.2.7 Answer (→ 46.2.9): To prove Theorem 5.14.11 part (i), note that
x ∈ ⋃{A ∈ S; P(A)} ⇔ x ∈ ⋃{B ∈ S; P(B)}
⇔ ∃A, (x ∈ A ∧ A ∈ {B ∈ S; P(B)}) (47.2.7)
⇔ ∃A, (x ∈ A ∧ A ∈ S ∧ P(A))
⇔ ∃A ∈ S, (x ∈ A ∧ P(A)). (47.2.8)
To prove part (ii), note that
x ∈ ⋂{A ∈ S; P(A)} ⇔ x ∈ ⋂{B ∈ S; P(B)}
⇔ ∀A, (x ∈ A ∨ A ∉ {B ∈ S; P(B)}) (47.2.9)
⇔ ∀A, (x ∈ A ∨ ¬(A ∈ S ∧ P(A)))
⇔ ∀A, (x ∈ A ∨ A ∉ S ∨ ¬P(A))
⇔ ∀A ∈ S, (x ∈ A ∨ ¬P(A)). (47.2.10)
Line (47.2.7) follows from Theorem 5.14.4 (i). Line (47.2.8) follows from Notation 5.1.14. Line (47.2.9)
follows from Theorem 5.14.4 (ii). Line (47.2.10) follows from Remark 5.1.15.
47.2.8 Answer (→ 46.2.10): To prove part (i) of Theorem 5.14.13, let A1 be a set. Then A1 ⊆ A1.
Therefore A1 ∈ ℙ(A1) by Definition 5.8.18. (See also Remark 5.8.22.)
To prove part (ii), let A1 and A2 be sets such that A1 ⊆ A2. Let X be a set such that X ∈ ℙ(A1). Then
X ⊆ A1. So X ⊆ A2. So X ∈ ℙ(A2).
To prove part (iii), let A1 be a set. Then A1 ∈ ℙ(A1). So A1 ⊆ ⋃(ℙ(A1)). To show the reverse
inclusion, let x ∈ ⋃(ℙ(A1)). Then ∃X ∈ ℙ(A1), x ∈ X. So ∃X, (X ⊆ A1 ∧ x ∈ X). Hence x ∈ A1.
Therefore ⋃(ℙ(A1)) ⊆ A1. It follows that ⋃(ℙ(A1)) = A1.
Part (iv) follows immediately from Theorem 5.14.7 (i).
To prove part (v), let S1 be a set of sets. Let X ∈ S1. Then X ⊆ ⋃S1. So X ∈ ℙ(⋃S1). It follows
that S1 ⊆ ℙ(⋃S1).
To prove part (vi), suppose S1 ⊆ ℙ(S2) and let A ∈ ⋃S1. Then ∃U, (U ∈ S1 ∧ A ∈ U). So ∃U, (U ∈ ℙ(S2) ∧
A ∈ U) (because S1 ⊆ ℙ(S2)). That is, A ∈ ⋃(ℙ(S2)). But ⋃(ℙ(S2)) = S2 by Theorem 5.14.13 (iii).
So A ∈ S2. Therefore ⋃S1 ⊆ S2 as claimed.
47.2.9 Answer (→ 46.2.11): To prove Theorem 5.14.16, part (i), note that
⋃_{y∈Y} {x ∈ X; P(x, y)} = {z; ∃y ∈ Y, z ∈ {x ∈ X; P(x, y)}}
= {x; ∃y ∈ Y, (x ∈ X ∧ P(x, y))}
= {x; x ∈ X ∧ ∃y ∈ Y, P(x, y)}
= {x ∈ X; ∃y ∈ Y, P(x, y)}.

To prove part (ii), note that
⋂_{y∈Y} {x ∈ X; P(x, y)} = {z; ∀y ∈ Y, z ∈ {x ∈ X; P(x, y)}}
= {x; ∀y ∈ Y, (x ∈ X ∧ P(x, y))}
= {x; x ∈ X ∧ ∀y ∈ Y, P(x, y)} (47.2.11)
= {x ∈ X; ∀y ∈ Y, P(x, y)}.
Line (47.2.11) follows from Y ≠ ∅, which follows from the assumption ∃x ∈ X, ∃y ∈ Y, P(x, y).
47.2.10 Answer (→ 46.2.12): This proof uses the idea that two finite sets that are equal must have
the same number of elements. Suppose (a, b) = (c, d). By Definition 6.1.3, (a, b) = {{a}, {a, b}} and
(c, d) = {{c}, {c, d}}. So {a} ∈ (a, b). Therefore {a} ∈ (c, d). So by definition of {{c}, {c, d}}, either
{a} = {c} or {a} = {c, d}. If {a} = {c}, then a = c. If {a} = {c, d}, then a = c = d. In either case, a = c.
To show that b = d, first suppose that a = b. Then (a, b) = {{a}}. Therefore (c, d) = {{c}}. So c = d.
Therefore b = d.
Now suppose that a ≠ b. Then {a, b} ∈ (c, d) since {a, b} ∈ (a, b). So {a, b} = {c} or {a, b} = {c, d}.
Therefore {a, b} = {c, d} since a ≠ b. Hence c ≠ d. So b = c or b = d. But a = c and a ≠ b. So b = d.
(See also proofs of Theorem 6.1.12 by Halmos [160], page 23 and Mendelson [165], page 162.)
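The characteristic property of the ordered pair (a, b) = {{a}, {a, b}} can be illustrated with nested frozensets. In this sketch the name `kpair` and the three-element test universe are assumptions made for the illustration.

```python
from itertools import product

def kpair(a, b):
    """The ordered pair (a, b) modelled as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

UNIVERSE = range(3)  # assumed small test universe

# (a, b) = (c, d) should hold exactly when a = c and b = d.
pair_property_holds = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(UNIVERSE, repeat=4)
)
print(pair_property_holds)
print(kpair(1, 1) == frozenset({frozenset({1})}))  # the case a = b collapses to {{a}}
```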
47.3. Numbers
47.3.1 Answer (→ 46.3.1):
0=∅
1 = {∅}
2 = {∅, {∅}}
3 = {∅, {∅}, {∅, {∅}}}
4 = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}
5 = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}}
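The list above can be regenerated mechanically from the successor rule S⁺ = S ∪ {S}. The following sketch models sets as nested frozensets; the functions `ordinal` and `show` are illustrative helpers, not notation from the text.

```python
def ordinal(n):
    """Build the ordinal n by applying the successor S ∪ {S} to ∅ n times."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset({s})
    return s

def show(s):
    """Render a nested frozenset with ∅ for the empty set, smaller elements first."""
    if not s:
        return "∅"
    return "{" + ", ".join(show(e) for e in sorted(s, key=len)) + "}"

for n in range(6):
    print(n, "=", show(ordinal(n)))
```

Sorting by size is enough to order the elements here, because the elements of the ordinal n are the ordinals 0 to n − 1, which have distinct sizes.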
47.3.2 Answer (→ 46.3.4): For all x ∈ ℝ,
ceiling(x) = inf{i ∈ ℤ; i ≥ x}
= − sup{−i; i ∈ ℤ ∧ i ≥ x}
= − sup{i; −i ∈ ℤ ∧ −i ≥ x}
= − sup{i ∈ ℤ; i ≤ −x}
= − floor(−x).
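The identity ceiling(x) = −floor(−x) is easy to spot-check numerically. The sketch below uses exact rational samples (quarter steps from −3 to 3, a choice made here) to avoid floating-point rounding questions.

```python
import math
from fractions import Fraction

# Quarter-step sample points from -3 to 3, represented exactly.
samples = [Fraction(k, 4) for k in range(-12, 13)]

identity_holds = all(math.ceil(x) == -math.floor(-x) for x in samples)
print(identity_holds)
```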
47.3.3 Answer (→ 46.3.2): The ordinal number 10 is illustrated in Figure 47.3.1.
Please check that you got all of the boxes right. Deduct one point for each wrong box.
47.4. Algebra
47.4.1 Answer (→ 46.4.1): For the set X = ℤ², define the bijections τ_{i,j} : X → X and σ : X → X for
i, j ∈ ℤ by τ_{i,j} : (x, y) ↦ (x + i, y + j) and σ : (x, y) ↦ (y, x). These are translation and coordinate-swap
actions respectively. Define the group G as the set of actions {τ_{i,j}; i, j ∈ ℤ} ∪ {στ_{i,j}; i, j ∈ ℤ} with the
operation of function composition, where στ_{i,j} denotes the composition σ ∘ τ_{i,j}.
Let H = {τ_{n,n}; n ∈ ℤ} ∪ {στ_{n,n}; n ∈ ℤ}. This is a subgroup of G. Let g = στ_{a,b} for any a, b ∈ ℤ. It is
straightforward to show that gHg⁻¹ = g⁻¹Hg for all a, b ∈ ℤ, but gH = Hg if and only if a = b. (When the
author verified this example on 2004-12-25, he vowed to sacrifice 3 oxen and a fat hamster to the memory of
Pythagoras, who famously sacrificed a hundred oxen to the deities in gratitude for his discovery of a general
proof of the Pythagoras theorem. See Struik [194], page 42. The author was assisted by a strong hint from
Bill Moran, by the way.)
The example would probably have the required properties if ℤ was replaced by the finite group ℤ_k for
suitable k.
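The finite variant suggested in the last sentence can be tried by machine. The sketch below assumes that the translations and the coordinate swap act on pairs of integers modulo k, with k = 4; the value 4 and all function names are choices made for this sketch, not taken from the text. Each group element is stored as the tuple of images of the 16 points.

```python
from itertools import product

K = 4  # assumed modulus; other values of k may behave differently
POINTS = list(product(range(K), repeat=2))
INDEX = {p: n for n, p in enumerate(POINTS)}

def make(s, i, j):
    """The map sigma^s ∘ tau_{i,j} on pairs modulo K, as a tuple of images."""
    def act(p):
        x, y = (p[0] + i) % K, (p[1] + j) % K
        return (y, x) if s else (x, y)
    return tuple(act(p) for p in POINTS)

def compose(f, g):
    """f ∘ g: apply g first, then f."""
    return tuple(f[INDEX[g[n]]] for n in range(len(POINTS)))

def inverse(f):
    inv = [None] * len(POINTS)
    for n, q in enumerate(f):
        inv[INDEX[q]] = POINTS[n]
    return tuple(inv)

G = {make(s, i, j) for s in (0, 1) for i in range(K) for j in range(K)}
H = {make(s, n, n) for s in (0, 1) for n in range(K)}

def conj(g, S):
    """g S g⁻¹"""
    gi = inverse(g)
    return frozenset(compose(compose(g, h), gi) for h in S)

def conj_inv(g, S):
    """g⁻¹ S g"""
    gi = inverse(g)
    return frozenset(compose(compose(gi, h), g) for h in S)

# Do the two conjugates of H agree for every g in G?
conjugates_agree = all(conj(g, H) == conj_inv(g, H) for g in G)
# Is there some g with gH different from Hg?
not_normal = any(
    {compose(g, h) for h in H} != {compose(h, g) for h in H} for g in G
)
print(conjugates_agree, not_normal)
```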

Figure 47.3.1 The ordinal number 10
47.5. Linear algebra

47.5.1 Answer (→ 46.5.3): Let n ∈ ℤ⁺₀ and A, B ∈ Sym(n, ℝ). Then
∀i, j ∈ ℕ_n, ((AB)^T)_{ij} = (AB)_{ji} = Σ_{k=1}^n a_{jk} b_{ki}
= Σ_{k=1}^n a_{kj} b_{ik}
= Σ_{k=1}^n b_{ik} a_{kj} = (BA)_{ij}.
Therefore (AB)^T = BA as claimed.
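A concrete instance of this identity can be computed with plain nested lists. The two symmetric 3 × 3 integer matrices below are arbitrary examples chosen for the sketch.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Two arbitrary symmetric matrices (each equals its own transpose).
A = [[1, 2, 0],
     [2, 3, 4],
     [0, 4, 5]]
B = [[2, -1, 3],
     [-1, 0, 1],
     [3, 1, 7]]

print(transpose(matmul(A, B)) == matmul(B, A))
```

The product AB itself need not be symmetric, which is why the transpose gives BA rather than AB.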
47.6. Tensor algebra

47.6.1 Answer (→ 46.6.1): Let g, h ∈ L((V_α)_{α∈A}; U) and let f = g + h be the pointwise sum of g and h.
The expression f(u) in Theorem 13.2.12 evaluates to g(u) + h(u) = λ1 g(v) + λ2 g(w) + λ1 h(v) + λ2 h(w) =
λ1 f(v) + λ2 f(w), as required. Similarly, for λ ∈ K and f = λg, f(u) evaluates to λg(u) = λ(λ1 g(v) +
λ2 g(w)) = λ1 f(v) + λ2 f(w), as required.
47.6.2 Answer (→ 46.6.2): Let g, h ∈ L_m^+(V; U) and let f = g + h be the pointwise sum of g and h. Then
f ∈ L_m(V; U) by Theorem 13.3.2. For permutations P : N_m → N_m and vector sequences (v_i)_{i=1}^m ∈ V^m,
f((v_{P(i)})_{i=1}^m) = g((v_{P(i)})_{i=1}^m) + h((v_{P(i)})_{i=1}^m) = g((v_i)_{i=1}^m) + h((v_i)_{i=1}^m) = f((v_i)_{i=1}^m), as required.
47.6.3 Answer (→ 46.6.3): Let g, h ∈ L_m^−(V; U) and let f = g + h be the pointwise sum of g and h. Then
f ∈ L_m(V; U) by Theorem 13.3.2. For permutations P : N_m → N_m and vector sequences (v_i)_{i=1}^m ∈ V^m,
f((v_{P(i)})_{i=1}^m) = g((v_{P(i)})_{i=1}^m) + h((v_{P(i)})_{i=1}^m)
= parity(P) g((v_i)_{i=1}^m) + parity(P) h((v_i)_{i=1}^m)
= parity(P) f((v_i)_{i=1}^m).
Therefore f is antisymmetric. So f ∈ L_m^−(V; U) as claimed.

47.7. Topology
47.7.1 Answer (→ 46.7.3): Let X = ∅. Then ℙ(X) = {∅}. This gives two possibilities for subsets
S ⊆ ℙ(X) of the power set of X, namely S = ∅ or S = {∅}. However, {∅, X} = {∅}. So the requirement
that {∅, X} ⊆ S implies that S = {∅}. Therefore ℙ(S) = {∅, {∅}}.
In the equation T′ = {⋂C; C ∈ ℙ(S), 1 ≤ #(C) < ∞}, the only possibility for C is {∅} because 1 ≤ #(C).
Then ⋂C = ∅. So T′ = {∅} and ℙ(T′) = {∅, {∅}}.
In the equation T = {⋃D; D ∈ ℙ(T′)}, the only choices for D are ∅ or {∅}. Then ⋃D = ∅ in both cases.
Therefore T = {∅}. This is the same as the only possible topology on X, namely the set {∅, X} = {∅}.
To fully verify Theorem 14.9.8 for X = ∅, it must be shown that the topology T = {∅} on X = ∅ is the
intersection of all topologies on X. This follows immediately from the fact that there is one and only one
possible topology on X = ∅.
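The two-stage construction used in this answer (finite non-empty intersections first, then arbitrary unions) can be carried out mechanically for small finite sets. The sketch below takes the construction from the two equations quoted in the answer; the function names are illustrative, and the second test case on a two-point set is an extra example added here.

```python
from itertools import combinations
from functools import reduce

def subfamilies(family, min_size=0):
    """Yield every subfamily with at least min_size members, as a tuple."""
    members = list(family)
    for r in range(min_size, len(members) + 1):
        for chosen in combinations(members, r):
            yield chosen

def big_union(family):
    return frozenset().union(*family)

def big_intersection(family):
    return reduce(frozenset.intersection, family)  # family must be non-empty

def generated_topology(X, S):
    """Unions of finite non-empty intersections of members of S."""
    X = frozenset(X)
    S = {frozenset(A) for A in S}
    assert frozenset() in S and X in S, "S must contain the empty set and X"
    T_prime = {big_intersection(C) for C in subfamilies(S, min_size=1)}
    return {big_union(D) for D in subfamilies(T_prime)}

T_empty = generated_topology(set(), [set()])
T_small = generated_topology({0, 1}, [set(), {0}, {0, 1}])
print(T_empty == {frozenset()})
print(sorted(sorted(A) for A in T_small))
```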
47.7.2 Answer (→ 46.7.4): Let X be a finite T1 topological space. If X = ∅, then there is only one
topology on X, namely the set Top(X) = {∅} = ℙ(∅) = ℙ(X).
Let x ∈ X. By the T1 property, there is a set Ω_y ∈ Top(X) for each y ∈ I = X \ {x} such that x ∈ Ω_y
and y ∉ Ω_y. Then ⋂_{y∈I} Ω_y = {x}. Since this is a finite intersection, it follows that {x} ∈ Top(X). Since
every subset of X can be written as a union of such singleton sets, it follows that Top(X) = ℙ(X).
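For a very small set the conclusion can be confirmed by listing every topology and testing the T1 property directly. The sketch below assumes a three-point set and uses the T1 condition in the form stated in the answer; the helper names are illustrative.

```python
from itertools import combinations

POINTS = (0, 1, 2)  # assumed three-point set

def powerset(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

SUBSETS = powerset(POINTS)
FULL = frozenset(POINTS)

def is_topology(family):
    """On a finite set, closure under pairwise unions and intersections suffices."""
    if frozenset() not in family or FULL not in family:
        return False
    return all((a | b) in family and (a & b) in family
               for a in family for b in family)

def is_t1(family):
    """For all x ≠ y there is an open set containing x but not y."""
    return all(
        any(x in omega and y not in omega for omega in family)
        for x in POINTS for y in POINTS if x != y
    )

topologies = [fam for fam in powerset(SUBSETS) if is_topology(fam)]
t1_topologies = [fam for fam in topologies if is_t1(fam)]
discrete = frozenset(SUBSETS)

print(len(topologies), t1_topologies == [discrete])
```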
47.7.3 Answer (→ 46.7.9): Let (x, y) ∈ X × Y satisfy y ≠ f(x) in Theorem 15.2.17. Since Y is a Hausdorff
space, there exist Ω1, Ω2 ∈ Top(Y) such that f(x) ∈ Ω1, y ∈ Ω2 and Ω1 ∩ Ω2 = ∅. Then
graph(f) ∩ (f⁻¹(Ω1) × Ω2) = {(x, y) ∈ X × Y; y = f(x) and f(x) ∈ Ω1 and y ∈ Ω2}
= {(x, y) ∈ X × Y; f(x) ∈ Ω1 and f(x) ∈ Ω2}
= ∅.
But (x, y) ∈ f⁻¹(Ω1) × Ω2 and f⁻¹(Ω1) × Ω2 ∈ Top(X × Y). Therefore (x, y) is in the interior of G =
(X × Y) \ graph(f). Since all points of G are in the interior of G, it follows that G is open in the topology
of X × Y. Therefore graph(f) is closed in the topology of X × Y.
To make the last statement a little more rigorous, let A = ⋃_{(x,y)∈G} {Ω ∈ Top(X × Y); Ω ⊆ G and (x, y) ∈ Ω}.
Then A ∈ Top(X × Y) and A = G. Therefore graph(f) is closed.

47.8.1 Answer (→ 46.8.1): An empty fibre bundle means a fibre bundle whose total space is empty. If E
in the topological (G, F) fibre bundle (E, T_E, π, B, T_B, A^F_E) has E = ∅, then T_E = {∅} is the only possible
topology on E. (See Remarks 14.3.17 and 14.3.10 regarding the empty topological space.) The only possible
function π : ∅ → B is the empty function π = ∅. (See Remark 6.5.18.) The empty function on the empty
topological space (∅, {∅}) is continuous for any target topology (B, T_B). So Definition 23.6.4 (i) is satisfied.
To satisfy Definition 23.6.4 (ii), it is necessary to set U_φ = ∅ because φ must be the empty function, π⁻¹(U) =
∅ and π × φ is the empty function, which can only be a bijection π × φ : π⁻¹(U_φ) ≈ U_φ × F for F ≠ ∅ if
U_φ = ∅. So the empty chart φ = ∅ is the only permissible chart. It then follows by Definition 23.6.4 (iii)
that B = ∅.
If A^F_E = {∅} (i.e. if the atlas contains only the empty function), then Definition 23.6.4 (ii) requires that for
φ1 = φ2 = ∅ and U_φ1 = U_φ2 = ∅, the condition ∀b ∈ U_φ1 ∩ U_φ2, ∃g ∈ G, β_{b,φ1} ∘ β_{b,φ2}⁻¹ = L_g holds. But this
is always true because U_φ1 ∩ U_φ2 = ∅. The single transition map g_{φ1,φ2} in Definition 23.6.4 (v) is the empty
function. Therefore A^F_E = {∅} is a valid atlas. The fibre atlas A^F_E = ∅ similarly satisfies all of the conditions.
47.8.2 Answer (→ 46.8.2): To show that condition (23.10.1) in Remark 23.10.4 follows from condition (ii)
of Definition 23.10.3, first let φ1, φ2 ∈ A^F_E, b ∈ U_φ1 ∩ U_φ2 and g ∈ G, and suppose that ∀z ∈ π⁻¹({b}), φ1(z) =
gφ2(z). Since β_{b,φ2} = φ2|_{π⁻¹({b})} : π⁻¹({b}) → F is a bijection, one may write ∀y ∈ F, gy = gφ2(β_{b,φ2}⁻¹(y)) =
φ1(β_{b,φ2}⁻¹(y)). But by definition of g_{φ1,φ2}, it is also true that ∀z ∈ π⁻¹({b}), φ1(z) = g_{φ1,φ2}(b)φ2(z). So
∀y ∈ F, g_{φ1,φ2}(b)y = φ1(β_{b,φ2}⁻¹(y)). So ∀y ∈ F, gy = g_{φ1,φ2}(b)y. Since G acts effectively on F, it follows (by
Remark 9.4.15) that g = g_{φ1,φ2}(b). Therefore by Definition 23.10.3 (ii) and the definition of g̃_{h(φ1),h(φ2)},
∀z̃ ∈ Ẽ_b, h(φ1)(z̃) = g̃_{h(φ1),h(φ2)}(b)h(φ2)(z̃)
= g_{φ1,φ2}(b)h(φ2)(z̃)
= gh(φ2)(z̃).
This has proved the forward implication of condition (23.10.1):
∀φ1, φ2 ∈ A^F_E, ∀b ∈ U_φ1 ∩ U_φ2, ∀g ∈ G,
(∀z ∈ π⁻¹({b}), φ1(z) = gφ2(z)) ⇒ (∀z̃ ∈ π̃⁻¹({b}), h(φ1)(z̃) = gh(φ2)(z̃)).
The reverse implication follows in the same way because h is a bijection.

To show the converse, suppose now that
∀φ1, φ2 ∈ A^F_E, ∀b ∈ U_φ1 ∩ U_φ2, ∀g ∈ G,
(∀z ∈ π⁻¹({b}), φ1(z) = gφ2(z)) ⇔ (∀z̃ ∈ π̃⁻¹({b}), h(φ1)(z̃) = gh(φ2)(z̃)). (47.8.1)
Let φ1, φ2 ∈ A^F_E, b ∈ U_φ1 ∩ U_φ2 and g ∈ G. By definition of g_{φ1,φ2}(b), ∀z ∈ E_b, φ1(z) = g_{φ1,φ2}(b)φ2(z). Insert
g = g_{φ1,φ2}(b) into the left hand side of condition (47.8.1) to obtain ∀z̃ ∈ π̃⁻¹({b}), h(φ1)(z̃) = gh(φ2)(z̃).
Since h(φ2)|_{Ẽ_b} : Ẽ_b → F is a bijection and G acts effectively on F̃, it follows as above that g = g̃_{h(φ1),h(φ2)}(b).
Therefore g_{φ1,φ2}(b) = g̃_{h(φ1),h(φ2)}(b), which was to be proved.
47.8.3 Answer (→ 46.8.3): Let $(P, q, B)$ −< $(P, T_P, q, B, T_B, A^G_P)$ be a topological $G$-bundle, let
$(G, F)$ −< $(G, T_G, F, T_F, \sigma, \mu^G_F)$ be an effective topological left transformation group, let $z, z' \in P$, $y, y' \in F$ and
$\phi_1, \phi_2 \in A^G_P$, and suppose that $\phi_1(z')y' = \phi_1(z)y$. Then
$$\begin{aligned}
\phi_2(z')y' &= \phi_2(z')\phi_1(z')^{-1}\phi_1(z')y' \\
&= \phi_2(z')\phi_1(z')^{-1}\phi_1(z)y \\
&= g_{\phi_2,\phi_1}(b)\phi_1(z')\phi_1(z')^{-1}\phi_1(z)y \\
&= g_{\phi_2,\phi_1}(b)\phi_1(z)y \\
&= \phi_2(z)y,
\end{aligned}$$
which proves the implication $\phi_1(z')y' = \phi_1(z)y \Rightarrow \phi_2(z')y' = \phi_2(z)y$. The reverse implication follows in the
same way. Therefore the choice of $\phi$ in Definition 23.12.3 (i) does not affect the definition.

47.9.1 Answer (→ 46.9.1): Let $(X, T)$ be a locally Euclidean space. By Definition 25.2.3, this means
that $\forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists n \in \mathbb{Z}^+,\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G$. Let $y = \phi(x)$, where $\phi : \Omega \approx G$ is the
homeomorphism in Definition 25.2.3. Since $G$ is open in $\mathbb{R}^n$, there is an $r > 0$ such that $B_{y,r} \subseteq G$. Define
$\Omega' = \phi^{-1}(B_{y,r})$. Then $\Omega'$ is connected because $B_{y,r}$ is connected. Therefore $\Omega'$ satisfies the requirements of
Definition 15.4.20 for the local connectedness of $X$.
47.9.2 Answer (→ 46.9.2): Let $(X, T)$ be a locally Euclidean space. By Definition 25.2.3, this means
that $\forall x \in X,\ \exists \Omega \in \mathrm{Top}_x(X),\ \exists n \in \mathbb{Z}^+,\ \exists G \in \mathrm{Top}(\mathbb{R}^n),\ \Omega \approx G$. Let $y = \phi(x)$, where $\phi : \Omega \approx G$ is the
homeomorphism in Definition 25.2.3. Since $G$ is open in $\mathbb{R}^n$, there is an $r > 0$ such that $\bar B_{y,r} \subseteq G$. Define
$\Omega' = \phi^{-1}(B_{y,r})$. Then $\bar\Omega' = \phi^{-1}(\bar B_{y,r})$ because $\phi$ is a homeomorphism, and $\bar\Omega'$ is a compact subset of $(X, T)$
because $\bar B_{y,r}$ is a compact subset of $\mathbb{R}^n$. Therefore $\Omega'$ satisfies the requirements of Definition 15.7.12 for the
local compactness of $X$.

47.10. Differentiable manifolds

47.10.1 Answer (→ 46.10.1): Let $M$ −< $(M, A_M)$ be a $C^r$ manifold with $r \in \bar{\mathbb{Z}}^+_0$. Let $f_1, f_2 \in C^r(M, \mathbb{R})$.
Then the sum-function $f_1 + f_2 : M \to \mathbb{R}$ is defined by $f_1 + f_2 : p \mapsto f_1(p) + f_2(p)$. Let $\psi \in \mathrm{atlas}(M)$. Then
the function $g = (f_1 + f_2) \circ \psi^{-1} : \mathrm{Range}(\psi) \to \mathbb{R}$ is the pointwise sum $g = (f_1 \circ \psi^{-1}) + (f_2 \circ \psi^{-1})$ on the
open subset $\mathrm{Range}(\psi)$ of $\mathbb{R}^n$, where $n = \dim(M)$. So $g$ is $C^r$ on $\mathrm{Range}(\psi)$ because $(f_1 \circ \psi^{-1})$ and $(f_2 \circ \psi^{-1})$
are $C^r$ on $\mathrm{Range}(\psi)$ by Theorem 18.7.4. Therefore $C^r(M, \mathbb{R})$ is closed under the operation of pointwise
function addition. Similarly, $C^r(M, \mathbb{R})$ is closed under the operation of pointwise function multiplication.
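The algebraic identity used in this answer, that composing a pointwise sum with the inverse chart gives the pointwise sum of the composed functions, can be checked numerically. The sketch below assumes a hypothetical chart on the open upper half of the unit circle and two arbitrary sample functions; none of these choices come from the book, and the check says nothing about regularity, which is supplied by Theorem 18.7.4.

```python
import math

# Hypothetical chart on the open upper half of the unit circle M (an assumption
# for illustration): psi projects a point (x, y) with y > 0 to x in (-1, 1).
def psi(p):
    return p[0]


def psi_inv(x):
    return (x, math.sqrt(1.0 - x * x))


# Two sample real-valued functions on M, written in the ambient coordinates.
def f1(p):
    return p[0] * p[1]


def f2(p):
    return math.sin(p[1])


def pointwise_sum(u, v):
    """Return the function p -> u(p) + v(p)."""
    return lambda p: u(p) + v(p)


def compose(u, v):
    """Return the function x -> u(v(x))."""
    return lambda x: u(v(x))


# (f1 + f2) o psi^-1, the function g of the answer above.
g = compose(pointwise_sum(f1, f2), psi_inv)
# (f1 o psi^-1) + (f2 o psi^-1), the pointwise sum of the coordinate expressions.
g_split = pointwise_sum(compose(f1, psi_inv), compose(f2, psi_inv))

samples = [-0.9, -0.5, 0.0, 0.3, 0.8]
max_gap = max(abs(g(x) - g_split(x)) for x in samples)
round_trip_ok = all(psi(psi_inv(x)) == x for x in samples)

print(max_gap, round_trip_ok)
```

Replacing `pointwise_sum` by a pointwise product gives the corresponding check for the multiplication case mentioned at the end of the answer.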


Chapter 48
Notations and abbreviations
48.1 Notations
48.2 Abbreviations
48.1. Notations
The following notations are defined or used in this book. The two-number references are section numbers.
The three-number references are definitions, notations, remarks or theorems. Three-number references in
parentheses are equation numbers. Three-number references in square brackets are figures.
Abbreviations are currently listed in the index in Chapter 49, or else in Section 48.2.
notation reference meaning
1.6.6 end of proof; quod erat demonstrandum
T 3.7.16 proposition-tag “true”
F 3.7.16 proposition-tag “false”
¬ 4.3.3 logical “not” (negation)
∧ 4.3.3 logical “and” (conjunction)
∨ 4.3.3 logical “or” (disjunction)
⇒ 4.3.3 logical implication operator (“implies”)
⇔ 4.3.3 logical equivalence operator (“if and only if”)
| 4.3.9 alternative denial operator (NAND operator, Sheffer stroke)
↓ 4.3.9 joint denial operator (NOR operator, Peirce arrow, Quine dagger)
! 4.3.9 exclusive-or operator (XOR operator)
⊤ 4.3.19 logical operator whose value is always true
⊥ 4.3.19 logical operator whose value is always false
⊢ 4.5.8 assertion
⊣⊢ 4.5.9 two-way assertion
⊤ 4.12.10 logical predicate whose value is always true
⊥ 4.12.10 logical predicate whose value is always false
∀ 4.13.3 for all
∃ 4.13.3 for some
∃′ x, P (x) 4.16.4 P (x) is true for one and only one x; unique existence



x∈A 5.1.3 x is an element (or member) of set A
x ∉ A 5.1.4 x is not an element (or member) of set A
A ⊆ B 5.1.8 A is a subset of B
A ⊇ B 5.1.8 A is a superset of B; same as B ⊆ A
A ⊈ B 5.1.9 A is not a subset of B
A ⊉ B 5.1.9 A is not a superset of B; same as B ⊈ A
x ≠ y 5.1.10 x does not equal y; i.e. ¬(x = y)
A ⊊ B 5.1.12 A is a proper subset of B
A ⊋ B 5.1.12 A is a proper superset of B; same as B ⊊ A
∃x ∈ S, P (x) 5.1.14 P (x) is true for some element x of a set S
∀x ∈ S, P (x) 5.1.14 P (x) is true for all elements x of a set S
∅ 5.8.4 the empty set; satisfies ∀x, x ∉ ∅
{x} 5.8.11 singleton set S satisfying z ∈ S ⇔ z = x
{x; P (x)} 5.8.12 set S satisfying z ∈ S ⇔ P (z)
{x ∈ A; P (x)} 5.8.15 set S satisfying z ∈ S ⇔ (z ∈ A ∧ P (z))
IP(X) 5.8.19 the power set of set X; i.e. {A; A ⊆ X}
A∪B 5.13.3 union of sets A and B
A∩B 5.13.3 intersection of sets A and B
A\B 5.13.9 complement of set B within set A
A △ B 5.13.15 symmetric set difference of sets A and B
⋃S 5.14.2 union of set of sets S
⋂S 5.14.2 intersection of non-empty set of sets S
X −< Y 5.16 X is an abbreviation for Y
Y >− X 5.16 X is an abbreviation for Y
(a, b) 6.1.4 the ordered pair {{a}, {a, b}} for any a and b
A×B 6.2.2 Cartesian product of sets A and B
Dom(R) 6.3.7 domain of relation or function R
Range(R) 6.3.7 range of relation or function R
Im(R) 6.3.7 image of relation or function R
R1 ◦ R2 6.3.24 composition of relations (or functions) R1 and R2
R−1 6.3.28 inverse of relation (or function) R
f :X→Y 6.5.7 f is a function from X to Y
f (x) 6.5.12 the value of a function f for an argument x of f
Y^X 6.5.13 the set of functions from X to Y
id_X 6.5.21 identity function on set X
f|_A 6.5.28 the restriction of function f to set A
f × g 6.9.11 direct product of functions f and g
f ×̇ g 6.9.12 pointwise direct product of functions f and g
X →̊ Y 6.11.4 set of partially-defined functions from X to Y
f : X →̊ Y 6.11.4 f is a partially-defined function from X to Y
f : G −→◦ F 6.11.6 f is a function from G to F → F ; same as f : G → (F → F )
G −→◦ F 6.11.6 the set of functions from G to F → F ; same as G → (F → F )
X → Y 6.12.2 the set of functions from X to Y ; same as Y^X


ω 7.2.12 the finite ordinal numbers {0, 1, 2, . . .}
S+ 7.2.16 the successor set S ∪ {S} of any set S
ω+ 7.2.29 the extended finite ordinal numbers ω ∪ {ω}
ℕ 7.2.31 the natural numbers {1, 2, 3, . . .}
ℕ_n 7.2.33 the set {1, 2, . . . n} for n ∈ ω
IP_k(X) 7.2.34 the set {S ∈ IP(X); #(S) ≤ k} for any set X and k ∈ ℤ⁺₀
ℕ̄ the extended natural numbers ℕ ∪ {∞}
m|k 7.4.3 m divides k, where m, k ∈
ℤ 7.5.3 the integers {. . . , −2, −1, 0, 1, 2, . . .}
ℤ⁺ 7.5.4 the positive integers {1, 2, 3, . . .}; equivalent to ℕ
ℤ⁺₀ 7.5.4 non-negative integers {0, 1, 2, . . .}; equivalent to ω
ℤ⁻ 7.5.4 the negative integers {−1, −2, −3, . . .}
ℤ⁻₀ 7.5.4 non-positive integers {0, −1, −2, . . .}
ℤ_n 7.5.6 the set {i ∈ ℤ; 0 ≤ i < n} = {0, 1, . . . n − 1} for n ∈ ℤ⁺₀
∞ 7.6.1 the positive infinite pseudo-integer
−∞ 7.6.1 the negative infinite pseudo-integer
ℤ̄ 7.6.3 the extended integers ℤ ∪ {∞, −∞}
ℤ̄⁺ 7.6.4 the positive extended integers; equivalent to ℕ̄
ℤ̄⁺₀ 7.6.4 non-negative extended integers; equivalent to ω⁺
ℤ̄⁻ 7.6.4 the negative extended integers
ℤ̄⁻₀ 7.6.4 non-positive extended integers
X^n 7.7.1 Cartesian product of n copies of set X for n ∈ ℤ⁺₀
χ_A 7.9.3 indicator function of a set A
2^n 7.9.8 power-of-two function for integer argument n ∈ ℤ⁺₀
δ(i, j) 7.9.10 Kronecker delta function
δij , δ ij , δji 7.9.11 Kronecker delta function; same as δ(i, j)
perm(X) 7.10.4 set of permutations of a set X
parity(f ) 7.10.9 parity of a permutation f : X → X for a set X
n! 7.10.14 the value of the factorial function for argument n
(n)k 7.10.17 the value of the Jordan factorial function for argument (n, k)
ε(f ) 7.10.20 Levi-Civita alternating symbol
ε_{i1,...in} 7.10.21 Levi-Civita alternating symbol
ε^{i1,...in} 7.10.21 Levi-Civita alternating symbol
Crn 7.11.3 combination symbol
Irn 7.11.10 set of increasing maps from r to n
Jrn 7.11.10 set of non-decreasing maps from r to n
List(X) 7.12.2 list space of a set X
List(X) 7.12.5 extended list space of a set X


ℚ 8.1.2 the set of rational numbers
ℚ⁺ 8.1.4 the set of positive rational numbers
ℚ⁺₀ 8.1.4 the set of non-negative rational numbers
ℚ⁻ 8.1.4 the set of negative rational numbers
ℚ⁻₀ 8.1.4 the set of non-positive rational numbers
ℚ̄ 8.2.2 the set of extended rational numbers
ℚ̄⁺ 8.2.4 the set of positive extended rational numbers
ℚ̄⁺₀ 8.2.4 the set of non-negative extended rational numbers
ℚ̄⁻ 8.2.4 the set of negative extended rational numbers
ℚ̄⁻₀ 8.2.4 the set of non-positive extended rational numbers
ℝ 8.3.7 the set of real numbers
ℝ⁺ 8.3.8 the set of positive real numbers {x ∈ ℝ; x > 0}
ℝ⁺₀ 8.3.8 the set of non-negative real numbers {x ∈ ℝ; x ≥ 0}
ℝ⁻ 8.3.8 the set of negative real numbers {x ∈ ℝ; x < 0}
ℝ⁻₀ 8.3.8 the set of non-positive real numbers {x ∈ ℝ; x ≤ 0}
(a, b) 8.3.10 the open interval of real numbers {x ∈ ℝ; a < x < b}
[a, b] 8.3.10 the closed interval of real numbers {x ∈ ℝ; a ≤ x ≤ b}
[a, b) 8.3.10 the closed-open interval of real numbers {x ∈ ℝ; a ≤ x < b}
(a, b] 8.3.10 the open-closed interval of real numbers {x ∈ ℝ; a < x ≤ b}
ℝ̄ 8.4.2 the set of extended real numbers ℝ ∪ {−∞, ∞}
ℝ̄⁺ 8.4.5 the set of positive extended real numbers {x ∈ ℝ̄; x > 0}
ℝ̄⁺₀ 8.4.5 the set of non-negative extended real numbers {x ∈ ℝ̄; x ≥ 0}
ℝ̄⁻ 8.4.5 the set of negative extended real numbers {x ∈ ℝ̄; x < 0}
ℝ̄⁻₀ 8.4.5 the set of non-positive extended real numbers {x ∈ ℝ̄; x ≤ 0}
ℝⁿ 8.5.1 the set of real-number n-tuples for n ∈ ℤ⁺₀
Q_{m,n} 8.5.3 the m, n-concatenation operator for real number tuples for m, n ∈ ℤ⁺₀
|x| 8.6.1 the absolute value of x ∈ ℝ
sign(x) 8.6.2 the sign of x ∈ ℝ
H(x) 8.6.6 the Heaviside function of x ∈ ℝ
floor(x) 8.6.8 the floor function of x ∈ ℝ
ceiling(x) 8.6.9 the ceiling function of x ∈ ℝ
frac(x) 8.6.13 the fractional part function of x ∈ ℝ
round(x) 8.6.14 the round function of x ∈ ℝ
x mod m 8.6.16 x modulo m for x ∈ ℝ and m ∈ ℝ \ {0}
ℂ 8.7.1 the set of complex numbers


Lg 9.2.7 left action by a group element g ∈ G on elements of G
Rg 9.2.7 right action by a group element g ∈ G on elements of G
e 9.2.10 identity element of a group G
g −1 9.2.15 inverse of an element g of a group G
Hom(G1 , G2 ) 9.2.23 the set of group homomorphisms from G1 to G2
Iso(G1 , G2 ) 9.2.23 the set of group isomorphisms from G1 to G2
End(G) 9.2.23 the set of group homomorphisms from G to G
Aut(G) 9.2.23 the set of group isomorphisms from G to G
Mon(G1 , G2 ) 9.2.23 the set of group monomorphisms from G1 to G2
Epi(G1 , G2 ) 9.2.23 the set of group epimorphisms from G1 to G2
gH 9.3.4 the left coset of a subgroup H of a group G by g ∈ G
Hg 9.3.4 the right coset of a subgroup H of a group G by g ∈ G
G/H 9.3.10 quotient of group G with respect to normal subgroup H of G
Sg 9.3.18 the conjugate of subset S of group G by g ∈ G
N (S) 9.3.28 the normalizer of a subset S of a group G
Z(S) 9.3.28 the centralizer of a subset S of a group G
Lg 9.4.6 left transformation group action by g ∈ G for a group G
Rg 9.5.5 right transformation group action by g ∈ G for a group G
HomA (M1 , M2 ) 9.9.12 set of A-homomorphisms from module M1 to module M2 over set A
EndA (M ) 9.9.12 set of A-endomorphisms from module M to M over set A
AutA (M ) 9.9.12 set of A-automorphisms from module M to M over set A
GL(M ) 9.9.12 same as AutA (M ) for module M and set A
gl(V ) 9.11.14 Lie algebra associated with associative algebra AutK (V ) of K-automorphisms
of K-module V
ad(X) 9.11.20 adjoint of element X of Lie algebra A under the adjoint representation of A
ad(A) 9.11.23 adjoint Lie algebra {ad(X); X ∈ A} of Lie algebra A
dim(V ) 10.2.6 dimension of linear space V
vi 10.2.17 the ith component of a vector v for a given basis
Lin(V, W ) 10.3.2 set of linear maps from linear space V to linear space W
Hom(V1 , V2 ) 10.3.7 the set of linear space homomorphisms from V1 to V2
Iso(V1 , V2 ) 10.3.7 the set of linear space isomorphisms from V1 to V2
End(V ) 10.3.7 the set of linear space homomorphisms from V to V
Aut(V ) 10.3.7 the set of linear space isomorphisms from V to V
Mon(V1 , V2 ) 10.3.7 the set of linear space monomorphisms from V1 to V2
Epi(V1 , V2 ) 10.3.7 the set of linear space epimorphisms from V1 to V2
fi 10.5.13 the ith component of linear functional f for a given basis
V1 ⊕ V2 10.6.2 external direct sum of linear spaces V1 and V2
V1 ⊕ V2 10.6.9 internal direct sum of linear spaces V1 and V2
V /W 10.7.2 quotient of linear space V over linear space W
|x|p 10.8.2 p-norm of x ∈ IRn
(x, y) 10.8.6 inner product of vectors x, y ∈ IRn
x·y 10.8.6 inner product of vectors x, y ∈ IRn
⟨x, y⟩ 10.8.6 inner product of vectors x, y ∈ IRn


Mm,n (K) 11.1.3 the set of m × n matrices over a field K
Mm,n (IR) 11.1.3 the set of m × n real-valued matrices
Mm,n 11.1.4 same as Mm,n (IR)
A^T 11.1.13 the transpose of matrix A ∈ Mm,n (K), m, n ∈ ℤ⁺₀, field K
AB 11.1.17 the product of matrices A and B
In 11.1.21 the identity matrix in Mn,n (K) for n ∈ ℤ⁺₀ and field K
Mn (K) 11.3.1 same as Mn,n (K)
Tr(A) 11.3.3 the trace of a square matrix A
det(A) 11.3.7 the determinant of a square matrix A
A−1 11.3.16 the inverse of an invertible square matrix A
λ+ (A) 11.4.2 the upper norm of a real square matrix A
λ− (A) 11.4.2 the lower norm of a real square matrix A
Sym(n, IR) 11.5.3 the set of real symmetric n × n matrices
Sym⁺₀(n, IR) 11.6.1 the set of positive semi-definite real symmetric n × n matrices
Sym⁻₀(n, IR) 11.6.1 the set of negative semi-definite real symmetric n × n matrices
Sym⁺(n, IR) 11.6.1 the set of positive definite real symmetric n × n matrices
Sym⁻(n, IR) 11.6.1 the set of negative definite real symmetric n × n matrices
[P, Q] 12.2.18 line segment through points P, Q
[P0 , P1 . . . Pk ] 12.2.25 hyperplane segment through points P0 , P1 , . . . Pk for k ∈ ℤ⁺₀
L ((Vα )α∈A ; U ) 13.2.5 set of multilinear maps from ×α∈A Vα to U
L (V1 , . . . Vm ; U ) 13.2.6 the set L ((Vα )α∈A ; U ) with A = ℕ_m = {1, . . . m}
Lm (V, U ) 13.2.7 same as L ((Vα )α∈A ; U ) with A = ℕ_m and Vα = V for all α
L_m^+(V, U ) 13.4.4 the set of symmetric multilinear maps from V^m to U
L_m^−(V, U ) 13.4.5 the set of antisymmetric multilinear maps from V^m to U
⊗α∈A Vα 13.5.8 tensor product of linear spaces (Vα )α∈A
⊗^m_{i=1} Vi 13.5.8 tensor product of linear spaces (Vi )^m_{i=1}
V1 ⊗ . . . Vm 13.5.8 tensor product of linear spaces (Vi )^m_{i=1}
⊗α∈A vα 13.6.7 tensor monomial corresponding to (vα )α∈A
⊗^m_{i=1} vi 13.6.11 tensor monomial corresponding to (vi )^m_{i=1}
v1 ⊗ . . . vm 13.6.12 tensor monomial corresponding to (vi )^m_{i=1}
⊗m V 13.6.14 tensor product of m copies of linear space V
Lr,s (V, W ) 13.8.8 set of multilinear maps for mixture of V and V ∗
⊗∗ V 13.9.7 tensor algebra of linear space V
Λm (V, W ) 13.10.3 the set L_m^−(V, W ) with pointwise vector addition and scalar product
Λ
Km V 13.10.5 same as Λm (V, K), where K is the field of V
m
V 13.10.7 alternating tensor product of m copies of V ; same as Λm (V, K)∗
∧mi=1 vi 13.10.11 a simple m-vector


Top(X) 14.3.5 the topology on a topological space X
Topx (X) 14.3.11 the set of open neighbourhoods of x ∈ X in a topological space X
Top(X) 14.3.15 the set of closed sets in a topological space X
Int(S) 14.5.2 the interior of a set S in a topological space X
S̄ 14.5.5 the closure of a set S in a topological space X
IntT (S) 14.5.8 the interior of a set S in a topological space (X, T )
ClosureT (S) 14.5.14 the closure of a set S in a topological space (X, T )
Ext(S) 14.6.3 the exterior of a set S in a topological space X
Bdy(S) 14.6.5 the boundary of a set S in a topological space X
∂S 14.6.5 the boundary of a set S in a topological space X
ExtT (S) 14.6.12 the exterior of a set S in a topological space (X, T )
BdyT (S) 14.6.12 the boundary of a set S in a topological space (X, T )
C(X, Y ) 14.12.12 set of continuous functions from X to Y , for topological spaces X, Y
C 0 (X, Y ) 14.12.12 same as C(X, Y )
C 0 (X) 14.12.14 same as C(X, IR)
limz→x f (z) 14.12.16 limit of function f : X → Y at x ∈ X
limx f 14.12.18 limit of function f : X → Y at x ∈ X
X≈Y 14.13.2 topological spaces X and Y are homeomorphic
f :X≈Y 14.13.2 f is a homeomorphism from X to Y
Iso(X, Y ) 14.13.3 set of all topological isomorphisms from X to Y
Aut(X) 14.13.3 set of all topological automorphisms on X
Topcx (X) 15.4.16 the connected component of X which contains x
S(γ) 16.2.14 the initial point of a curve γ
T (γ) 16.2.14 the terminal point of a curve γ
C0 (M ) 16.2.18 the set of C 0 curves in topological space M
[γ]0 16.4.1 the set of curves which are path-equivalent to a given curve γ
P0 (M ) 16.4.4 the set of C 0 paths in topological space M
B_{x,r} 17.1.8 open ball {y ∈ M ; d(x, y) < r}, metric space M , centre x ∈ M , radius r ∈ ℝ̄⁺₀
B̄_{x,r} 17.1.8 closed ball {y ∈ M ; d(x, y) ≤ r}, metric space M , centre x ∈ M , radius r ∈ ℝ̄⁺₀
B_r(x) 17.1.8 open ball {y ∈ M ; d(x, y) < r}, metric space M , centre x ∈ M , radius r ∈ ℝ̄⁺₀
B̄_r(x) 17.1.8 closed ball {y ∈ M ; d(x, y) ≤ r}, metric space M , centre x ∈ M , radius r ∈ ℝ̄⁺₀
B_{x,r1,r2} 17.1.18 open annulus {y ∈ M ; r1 < d(x, y) < r2 }, x ∈ M , r1 , r2 ∈ ℝ̄⁺₀
B̄_{x,r1,r2} 17.1.18 closed annulus {y ∈ M ; r1 ≤ d(x, y) ≤ r2 }, x ∈ M , r1 , r2 ∈ ℝ̄⁺₀
Ḃ_{x,r} 17.1.18 punctured open ball {y ∈ M ; 0 ≠ d(x, y) < r}, x ∈ M , r ∈ ℝ̄⁺₀
B̄̇_{x,r} 17.1.18 punctured closed ball {y ∈ M ; 0 ≠ d(x, y) ≤ r}, x ∈ M , r ∈ ℝ̄⁺₀
B_{r1,r2}(x) 17.1.18 open annulus {y ∈ M ; r1 < d(x, y) < r2 }, x ∈ M , r1 , r2 ∈ ℝ̄⁺₀
B̄_{r1,r2}(x) 17.1.18 closed annulus {y ∈ M ; r1 ≤ d(x, y) ≤ r2 }, x ∈ M , r1 , r2 ∈ ℝ̄⁺₀
Ḃ_r(x) 17.1.18 punctured open ball {y ∈ M ; 0 ≠ d(x, y) < r}, x ∈ M , r ∈ ℝ̄⁺₀
B̄̇_r(x) 17.1.18 punctured closed ball {y ∈ M ; 0 ≠ d(x, y) ≤ r}, x ∈ M , r ∈ ℝ̄⁺₀
Lip(f ) 17.4.11 the infimum of Lipschitz constants for a Lipschitz function f
C^k(U ) 18.4.10 set of k-times differentiable functions on open set U ⊆ ℝ for k ∈ ℤ̄⁺₀
C^k(Ω) 18.7.2 set of k-times differentiable functions on Ω ∈ Top(ℝⁿ) for k ∈ ℤ̄⁺₀, n ∈ ℤ⁺₀
C^k(Ω, ℝ^m) 18.7.3 set of k-times ℝ^m-valued differentiable functions on Ω ∈ Top(ℝⁿ), m, n ∈ ℤ⁺₀,
k ∈ ℤ̄⁺₀
K 18.7.8 regularity class such as C^k, C^∞ or analytic
C^{k,α}(U ) 18.9.3 set of functions in C^k(U ) with α-Hölder kth derivative, k ∈ ℤ⁺₀, α ∈ (0, 1]


T (ℝⁿ) 19.1.8 tangent bundle ℝⁿ × ℝⁿ on ℝⁿ, n ∈ ℤ⁺₀
X^r(T (ℝⁿ)) 19.1.13 set of C^r cross-sections of T (ℝⁿ), n ∈ ℤ⁺₀, r ∈ ℤ̄⁺₀
T∗(ℝⁿ) 19.2.8 cotangent bundle ℝⁿ × ℝⁿ on ℝⁿ, n ∈ ℤ⁺₀
X^r(T∗(ℝⁿ)) 19.2.12 set of C^r cross-sections of T∗(ℝⁿ), n ∈ ℤ⁺₀, r ∈ ℤ̄⁺₀
X^r(Λm T (ℝⁿ)) 20.5.6 set of C^r cross-sections of Λm T (ℝⁿ) for m, n ∈ ℤ⁺₀, r ∈ ℤ̄⁺₀
X(E, π, B) 23.3.9 the set of cross-sections of topological fibration (E, π, B)

atlas(E, π, B) 23.6.9 the fibre atlas of fibre bundle (E, π, B)
IsoG (Eb1 , Eb2 ) 23.8.4 the set of topological isomorphisms from fibre Eb1 to Eb2
AutG (Eb ) 23.8.5 the set of topological automorphisms of fibre set Eb
Lbg,φ 23.8.8 automorphism through the charts β_{b,φ}^{−1} ◦ L_g ◦ β_{b,φ} : E_b ≈ E_b
Rg 23.9.4 right action map on a principal fibre bundle
(P × F )/G 23.12.4 orbit-space version of an associated topological fibre bundle
P ×G F 23.12.9 same as (P × F )/G
Θγs,t 24.2.2 parallelism map between parameters s and t of a curve γ
atlas(M ) 25.4.8 the atlas AM for a differentiable manifold M −< (M, AM )
atlasp (M ) 25.4.8 set of charts ψ ∈ atlas(M ) such that p ∈ Dom(ψ)
C r (M ) 26.6.2 the set of C r real functions on a C r manifold M ; same as C r (M, IR)
C̊ r (M ) 26.6.6 C r real functions on open subsets of a C r manifold M ; same as C̊ r (M, IR)
C̊pr (M ) 26.6.7 C r real functions on open neighbourhoods of p ∈ M of a C r manifold M ; same
as C̊pr (M, IR)
C r (M, IRm ) 26.6.12 the set of C r IRm -valued functions on a C r n-dimensional manifold M
C r (M, N ) the set of C r maps from a C r manifold M to a C r manifold N
C̊ r (M, N ) the set of C r maps from open subsets of a C r manifold M to a C r manifold N
Tp (M ) 27.3.8 vector space of tangent vectors at p in a C 1 manifold M
T (M ) 27.3.9 the total space of tangent (coordinate) vectors of a C 1 manifold M
T̊ (M ) 27.3.9 the total space of untagged tangent operators of a C 1 manifold M
T̊p (M ) 27.5.3 vector space of untagged tangent operators at p in a C 1 manifold M
T̂p (M ) 27.6.3 vector space of tagged tangent operators at p in a C 1 manifold M
T̂ (M ) 27.6.4 the total space of tagged tangent operators of a C 1 manifold M
e^{p,ψ}_i 27.7.4 coordinate tangent vector in direction i at p with respect to coordinate map ψ
for a C 1 manifold
∂^{p,ψ}_i 27.7.11 coordinate tangent operator in direction i at p with respect to coordinate map ψ
for a C 1 manifold
T ∗ (M ) 28.2.2 the union of the dual tangent spaces of a C 1 manifold M
Tpr,s (M ) 28.3.2 the space of tangent (r, s)-tensors at a point p in a C 1 manifold M
X k (M ) 28.5.7 the set of C k vector fields in a C k manifold M
GL(n, IR) 33.7.6 group of general linear transformations of IRn
GL(n) 33.7.6 group of general linear transformations of IRn
GL(V ) group of general linear transformations of vector space V
Vz (P ) 35.5.11 set of vertical vectors ker((dπP )z ) at z ∈ P ; C 1 principal G-bundle (P, πP , M )
Γij
k
36.10.5 components of the Christoffel symbol of an affine connection
Ri jkl components of the Riemann curvature tensor of an affine connection
gij components of the metric tensor of a Riemannian manifold

48.2. Abbreviations
abbreviation reference meaning

AC 5.0.9 axiom of choice
AD Anno Domini [i.e. Common Era]
AMS 1.10 American Mathematical Society
BC Before Christ [i.e. Before Common Era]
BCE Before Common Era
BG 5.0.9 Bernays-Gödel (set theory)
BVP 21.0.2 boundary value problem
CC 5.0.9 axiom of countable choice
CE Common Era
DE 21.0.2 differential equation
DG differential geometry
DNA 36.8.8 deoxyribonucleic acid
EDM 49.1 Encyclopedic dictionary of mathematics
FOL + EQ 5.2.3 first order language with equality
FTOC 20.2.2 fundamental theorem of calculus
GL 11.7.1 general linear group
GR general relativity
HOL 6.5.16 higher-order logic
HTTP 2.5.7 hypertext transfer protocol
IQ 1.10 intelligence quotient
IVP 21.0.2 initial value problem
KEM 49.3 Kleine Enzyklopädie Mathematik
LHS left hand side
MP 4.6.1 modus ponens
MSC 1.8 mathematics subject classification
NAND 4.3.6 not AND
NBG 5.0.9 Neumann-Bernays-Gödel (set theory)
NOR 4.3.6 not OR
O 11.7.1 orthogonal group
ODE 21.0.2 ordinary differential equation
OED 1.6.7 Oxford English Dictionary
OFB 23.0.1 ordinary fibre bundle
PC 4.7.2 propositional calculus
PDE 21.0.2 partial differential equation
PDO 21.0.2 partial differential operator
PFB 23.0.1 principal fibre bundle
QC 4.14.1 predicate calculus
QED 1.6.6 quod erat demonstrandum
RAA 3.11.1 reductio ad absurdum
RHS right hand side


SL 11.7.1 special linear group
SNMP 2.5.7 Simple Network Management Protocol
SO 11.7.1 special orthogonal group
SU 11.7.1 special unitary group
TS 28.0.1 tangent space
U 11.7.1 unitary group
URL 49.0.1 Uniform Resource Locator
USA 2.5.20 United States of America
UTC 2.5.20 Universal Time Coordinated
wf 4.5.2 well-formed formula

wff 4.5.2 well-formed formula
XOR 4.3.7 exclusive OR
ZF 5.0.9 Zermelo-Fraenkel (set theory)

ZFC 5.9.5 Zermelo-Fraenkel set theory with axiom of choice

Chapter 49
Bibliography
49.1 Differential geometry introductory texts
49.2 Other differential geometry references
49.3 Other mathematics references
49.4 Physics
49.5 Logic and set theory
49.6 Anthropology and linguistics
49.7 Philosophy and ancient history
49.8 History of mathematics
49.9 Other references
49.10 Comments on other people’s books
Section 49.1 consists mostly of differential geometry books which may be useful as introductory texts. Some
of these introductory works are on other subjects such as general relativity, but have substantial introductory
material on differential geometry. Section 49.2 contains differential geometry references. These are mostly
on specialized topics, or have historical interest, or are research monographs. Section 49.3 contains texts on
mathematical topics other than differential geometry.
49.0.1 Remark: Internet locations (“URLs”) are avoided as much as possible in this book because Internet
resources and locations are much more ephemeral than books and journals.
49.1. Differential geometry introductory texts

1. Louis Auslander, Differential geometry, Harper & Row, New York, 1967?.
2. Marcel Berger, Bernard Gostiaux, Differential geometry: manifolds, curves, and surfaces, Springer,
New York, 1988?, translated from Géométrie différentielle, P.U.F., Paris, 1986.
3. Marcel Berger, Bernard Gostiaux, Géométrie différentielle, P.U.F., Paris, 1986.
4. Richard L. Bishop, Richard J. Crittenden, Geometry of manifolds, Academic Press, New York, 1964.
5. Richard L. Bishop, Samuel I. Goldberg, Tensor analysis on manifolds, Dover Publications, New
York, 1968, 1980.
6. Wilhelm Blaschke, K. Leichtweiss, Elementare Differentialgeometrie, 5th. ed., Springer, Berlin, 1973.
7. William Munger Boothby, An introduction to differentiable manifolds and Riemannian geometry,
Academic Press, New York, 1975.
8. William Munger Boothby, An introduction to differentiable manifolds and Riemannian geometry,
2nd. ed., Academic Press, Orlando, Florida, 1986.
9. Nicolas Bourbaki, Variétés différentielles et analytiques, Hermann et Cie, Paris, 1967/71.
10. F. Brickell, R. S. Clark, Differentiable manifolds, Van Nostrand Reinhold, London, 1970.
11. Robert Coquereaux, Riemannian geometry, fiber bundles etc., ??, ??, 19??.
12. M. Crampin, F.A.E. Pirani, Applicable differential geometry, Cambridge U.P., Cambridge, England,
1986, 1994.
13. W. D. Curtis, F. R. Miller, Differential manifolds and theoretical physics, Academic Press, Orlando,
Florida, 1985.

14. R.W.R. Darling, Differential forms and connections, Cambridge U.P., Cambridge, England, 1994.
15. Manfredo Perdigão do Carmo, Differential forms and applications, Springer, Berlin, 1994, translated
from Formas diferenciais e aplicações.
16. Manfredo Perdigão do Carmo, Differential geometry of curves and surfaces, Prentice-Hall, Englewood
Cliffs, N.J, 1976.
17. Manfredo Perdigão do Carmo, Riemannian geometry, Birkhäuser, Boston, 1992, translated from
Geometria Riemanniana, Instituto de Matematica Pura e Aplicada, 1979, 1988. ISBN 0-8176-3490-8.
18. A. T. Fomenko, Differential geometry and topology, Consultants Bureau, New York, 1987?.
19. Theodore Frankel, The geometry of physics, an introduction, 1st. ed., Cambridge University Press,
Cambridge, 1997, 1999, 2001. ISBN 0-521-38753-1.
20. Sylvestre Gallot, Dominique Hulin, Jacques Lafontaine, Riemannian Geometry, Springer, Berlin,
1990.
21. Hubert Goenner, Einführung in die spezielle und allgemeine Relativitätstheorie, Spektrum
Akademischer Verlag, Heidelberg, 1996.
22. Heinrich Walter Guggenheimer, Differential geometry, McGraw-Hill, New York, 1963.
23. Robert Clifford Gunning, Lectures on Riemann surfaces, Princeton University Press, Princeton, New
Jersey, 1966.
24. Sigurdur Helgason, Differential geometry, Lie groups and symmetric spaces, Academic Press, New
York, 1978.
25. Morris William Hirsch, Differential Topology, Springer, New York, 1976.
26. Wilhelm Klingenberg, Riemannian geometry, Walter de Gruyter, Berlin, 1982.
27. Shoshichi Kobayashi, Katsumi Nomizu, Foundations of differential geometry, volumes 1,2, Wiley
Interscience, New York, 1963/69.
28. Shoshichi Kobayashi, Transformation groups in differential geometry, Erg. math. 70, Springer, Berlin,
1972.
29. Serge Lang, Introduction to differentiable manifolds, John Wiley and Sons, New York, 1962.
30. Serge Lang, Differential manifolds, Addison-Wesley, Reading, Massachusetts, 1972.
31. Serge Lang, Fundamentals of differential geometry, Springer, New York, 1991. ISBN 0-387-98593-X.
32. Detlef Laugwitz, Differential and Riemannian geometry, Academic Press, New York, 1965, translated
from Differentialgeometrie.
33. John M. Lee, Riemannian manifolds: An introduction to curvature, Springer-Verlag, New York, 1997.
ISBN 0-387-98271-X.
34. Mathematical Society of Japan, Encyclopedic dictionary of mathematics, 1st. ed., (ed. Shôkichi
Iyanaga, Yukiyosi Kawada), MIT Press, Cambridge MA, 1980.
35. Mathematical Society of Japan, Encyclopedic dictionary of mathematics, 2nd. ed., (ed. Kiyosi Itô),
MIT Press, Cambridge MA, 1993. ISBN 0-262-59020-4.
36. Paul Malliavin, Géométrie différentielle intrinsèque, Hermann, Paris, 1972.
37. Richard S. Millman, George D. Parker, Elements of differential geometry, Prentice-Hall, Englewood
Cliffs, N.J, 1977?.
38. Charles W. Misner, Kip S. Thorne, John Archibald Wheeler, Gravitation, W. H. Freeman, New
York, 1970.
39. Tanjiro Okubo, Differential geometry, Marcel Dekker, New York, 1987.
40. Barrett O’Neill, Elementary Differential Geometry, ?, ?, 19??.
41. Walter A. Poor, Differential geometric structures, McGraw Hill, New York, 1981.
42. Mikhail Mikhailovich Postnikov, The variational theory of geodesics, (ed. Bernard R. Gelbaum),
Saunders, Philadelphia, 1967, translated from Variatsionnaia teoriia geodezicheskikh.
43. Michael David Spivak, A comprehensive introduction to differential geometry, volumes I-V, 3rd. ed.,
Publish or Perish, Berkeley, 1999.
44. Norman Earl Steenrod, The topology of fibre bundles, Princeton Univ. Press, Princeton, N.J., 1951.
45. Peter Szekeres, A course in modern mathematical physics: groups, Hilbert space, and differential
geometry, Cambridge University Press, Cambridge, UK, 2004. ISBN 0-521-82960-7.
46. Andrzej Trautman, Differential geometry for physicists: Stony Brook lectures, Bibliopolis, Napoli,
1984?.
47. Izu Vaisman, A first course in differential geometry, M. Dekker, New York, 1984?.
48. Robert M. Wald, General relativity, University of Chicago Press, Chicago, 1984.

49. Frank Wilson Warner, Foundations of differentiable manifolds and Lie groups, Scott, Foresman and
Co., Glenview, Illinois, 1971.
50. Frank Wilson Warner, Foundations of differentiable manifolds and Lie groups, Springer, New York,
1983.
51. Hermann Weyl, Raum, Zeit, Materie, 7th. ed., Springer, Berlin, 1923, 1988.
52. Hermann Weyl, Gruppentheorie und Quantenmechanik, Hirzel, Leipzig, 1928.
53. Thomas J. Willmore, Introduction to differential geometry, Oxford University Press, London, 1959.
49.2. Other differential geometry references

54. Shun-ichi Amari, Hiroshi Nagaoka, Methods of information geometry, American Math Society,
Providence, Rhode Island, 1993, 2000, 2007. ISBN 9780821843024.
55. Thierry Aubin, Nonlinear analysis on manifolds, Monge-Ampère equations, Springer, Berlin, 1982.
56. Arthur L. Besse, Einstein manifolds, Springer, Berlin, 1986.
57. William L. Burke, Applied differential geometry, Cambridge University Press, Cambridge, New York?,
1985.
58. Herbert Busemann, The geometry of geodesics, Academic Press, New York, 1955.
59. Shiing Shen Chern, Complex manifolds without potential theory, Springer, New York, 1979.
60. Shiing Shen Chern, Differentiable manifolds, ?, ?, 198?.
61. Yvonne Choquet-Bruhat, Géométrie différentielle et systèmes extérieurs, Dunod, Paris, 1968.
62. Gaston Darboux, Leçons sur la théorie générale des surfaces et les applications géométriques du calcul
infinitésimal, 3rd. ed., Chelsea, New York, 1972.
63. Georges de Rham, Variétés différentiables, Hermann et Cie, Paris, 1955.
64. C. T. J. Dodson, T. Poston, Tensor geometry: the geometric viewpoint and its uses, Pitman, London,
1977.
65. C. Ehresmann, Les connexions infinitésimales dans un espace fibré différentiable, Colloque de
Topologie, Brussels (1950), 29–55.
66. Harley Flanders, Differential forms with applications to the physical sciences, Dover, New York, 1963,
1989.
67. A. Gleason, Groups without small subgroups, Ann. of Math. 56 (1952), 193–212.
68. R. E. Greene, H. Wu, Function theory on manifolds which possess a pole, Lecture Notes in
Mathematics 699, Springer, Berlin, 1979.
69. Alexandre Grothendieck, Éléments de géométrie algébrique IV, IHES Sci. Pub. Math. 24 (1965),
???–???.
70. Robert Clifford Gunning, Complex algebraic varieties, PUP, 1970.
71. Robert Hermann, Geometry, physics, and systems, M. Dekker, New York, 1973.
72. Shoshichi Kobayashi, Differential geometry of complex vector bundles, Iwanami Shoten, and Princeton
University Press, Tokyo, 1987.
73. Anders Kock, Synthetic differential geometry, Cambridge University Press, 1981.
74. Tullio Levi-Civita, Nozione di parallelismo in una varietà qualunque e conseguente specificazione
geometrica della curvatura Riemanniana, Rend. Circ. Mat. Palermo 42 (1917), 173–205.
75. John Legat Martin, General relativity: a guide to its consequences for gravity and cosmology, E.
Horwood Halsted Press, Chichester, New York, 1988.
76. August Ferdinand Möbius, Der barycentrische Calcul, ein neues Hülfsmittel zur analytischen
Behandlung der Geometrie, Verlag von Johann Ambrosius Barth, Leipzig, 1827. (Republished:
Georg Olms Verlag, Hildesheim, 1976.)
77. Deane Montgomery, Leo Zippin, Topological transformation groups, Interscience Publishers, London,
1955.
78. Deane Montgomery, Leo Zippin, Small subgroups of finite-dimensional groups, Ann. of Math. 56
(1952), 213–241.
79. Gaspard Monge, Application de l’analyse à la géométrie, 4th. ed., ?, Paris, 1809.
80. Mikio Nakahara, Geometry, topology, and physics, A. Hilger, Bristol, England, 1990?.
81. N. Newns, A. Walker, Tangent planes to a differentiable manifold, J. London Math. Soc. 31 (1956),
400–407.
82. Georg Friedrich Bernhard Riemann, Über die Hypothesen, welche der Geometrie zu Grunde liegen,
Habilitationsschrift, ?, ?, 1854.

83. Jacob T. Schwartz, Differential geometry and topology, Gordon and Breach, New York, 1968.
84. Shlomo Sternberg, Lectures on differential geometry, Prentice-Hall, Englewood Cliffs, N.J, 1964.
85. Dirk Jan Struik, Lectures on classical differential geometry, 2nd. ed., Addison-Wesley, Reading,
Massachusetts, 1961.
86. R. Sulanke, P. Wintgen, Differentialgeometrie und Faserbündel, Birkhäuser Verlag, Basel, 1972.
87. Peter Szekeres, General relativity, Lecture notes, Adelaide University, 1974.
88. Tracy Yerkes Thomas, Concepts from tensor analysis and differential geometry, 2nd. ed., Academic
Press, New York, 1965.
89. R. Walter, Konvexität in riemannschen Mannigfaltigkeiten, Jahresber. DMV 83 (1981), 1–31.
90. Thomas J. Willmore, Total curvature in Riemannian geometry, Ellis Horwood, Chichester, England,
1982.
91. Joseph Albert Wolf, Spaces of constant curvature, McGraw-Hill, New York, 1967.
92. Kentaro Yano, Shigeru Ishihara, Tangent and cotangent bundles: differential geometry, Marcel
Dekker, New York, 1973.
49.3. Other mathematics references

93. Robert A. Adams, Sobolev spaces, Academic Press, New York, 1975.
94. Lars Valerian Ahlfors, Complex analysis, an introduction to the theory of analytic functions of one
complex variable, 2nd. ed., McGraw-Hill Kogakusha, Tokyo, 1966.
95. G. E. Bredon, Sheaf Theory, McGraw-Hill, New York, 1967.
96. G. E. Bredon, Introduction to compact transformation groups, Academic Press, New York, 1972.
97. Daniel Bump, Lie groups, Springer, New York, 2004.
98. Claude Chevalley, Introduction to the theory of algebraic functions of one variable, Amer. Math. Soc.,
New York, 1951.
99. Claude Chevalley, Theory of Lie groups I, Princeton University Press, Princeton, 1946.
100. CRC standard mathematical tables, 27th. ed., (ed. William H. Beyer), CRC Press, Boca Ratón,
Florida, 1964, 1981, 1984.
101. Charles W. Curtis, Linear algebra, 2nd. ed., Allyn and Bacon, Inc., Boston, 1968.
102. René Descartes, Géométrie, ?, Paris, 1637.
103. Tammo tom Dieck, Transformation groups and representation theory, Lecture notes in Math., 766,
Springer, Berlin, 1979.
104. Leonhard Euler, Introductio in analysin infinitorum, Marcus-Michael Bousquet, Lausanne, 1748.
105. Leonhard Euler, Leonhardi Euleri opera omnia, first series, volume 9, (ed. Andreas Speiser), B. G.
Teubner, 1945.
106. Herbert Federer, Geometric measure theory, Springer, Berlin, 1969.
107. William Feller, An introduction to probability theory and its applications, volume 1, 3rd. ed., John
Wiley & Sons, New York, 1950, 1957, 1968.
108. William Fulton, Algebraic curves, an introduction to algebraic geometry, W. A. Benjamin, New York,
1969.
109. William Fulton, Joe Harris, Representation theory: A first course, Springer, New York, 1991, 2004.
110. David Gilbarg, Neil Sidney Trudinger, Elliptic partial differential equations of second order, 2nd.
ed., Springer, New York, 1983.
111. David Gilbarg, Neil Sidney Trudinger, Elliptic partial differential equations of second order, 3rd. ed.,
Springer, New York, 1998, 2001.
112. Robert Gilmore, Lie groups, Lie algebras, and some of their applications, Wiley, New York, 1974.
113. I. S. Gradstein, I. M. Ryzhik, Tables of series, products, and integrals, Verlag Harri Deutsch, Thun,
Germany, 1981, translated from «Таблицы интегралов, сумм, рядов и произведений», И. С.
Градштейн, И. М. Рыжик, Наука, Москва, 1971.
114. Marvin J. Greenberg, John R. Harper, Algebraic topology, a first course, Benjamin/Cummings,
London, 1981.
115. Alexandre Grothendieck, J. A. Dieudonné, Éléments de géométrie algébrique I, Springer, Berlin,
1971.
116. B. Hartley, T. O. Hawkes, Rings, modules and linear algebra, Chapman and Hall, London, 1970.
117. Lester La Verne Helms, Introduction to potential theory, Robert E. Krieger Publishing Company,
Huntington, New York, 1975.

118. John L. Kelley, General topology, Van Nostrand Company, Princeton, 1955. (Republished: Graduate
Texts in Mathematics 27, Springer-Verlag, New York 1975.)
119. Alan U. Kennington, An improved convexity maximum principle and some applications, Ph.D.
Thesis, University of Adelaide, South Australia, 1984.
120. Alan U. Kennington, Power concavity and boundary value problems, Indiana Univ. Math. J. 34
(1985), 687–704.
121. Alan U. Kennington, Convexity of level curves for an initial value problem, J. Math. Anal. Appl. 133
(1988), 324–330.
122. Kleine Enzyklopädie Mathematik, 2nd. ed., (ed. W. Gellert, H. Küstner, M. Hellwich, H. Kästner),
Verlag Harri Deutsch, Thun und Frankfurt/M, 1984.
123. Nicholas J. Korevaar, Capillary surface convexity above convex domains, Indiana Univ. Math. J. 32
(1983), 73–81.
124. Erwin Kreyszig, Advanced engineering mathematics, 9th. ed., John Wiley & Sons (Wiley International
Edition), New York, 2006.
125. Olga Alexandrovna Ladyzhenskaya, Nina N. Ural’tseva, Linear and quasilinear elliptic equations,
Academic Press, New York, 1968, translated from «Линейные и квазилинейные уравнения
эллиптического типа», О. А. Ладыженская, Н. Н. Уральцева, Наука, Москва, 1964.
126. Serge Lang, Introduction to algebraic geometry, Interscience, New York, 1958.
127. Carlo Miranda, Partial differential equations of elliptic type, Springer, New York, 1970.
128. Frank Morgan, Geometric measure theory, a beginner’s guide, Academic Press, New York, 1988.
129. Hanna Neumann, Schwartz distributions, Notes in Pure Mathematics 3, Department of Pure
Mathematics, ANU, Canberra, 1969.
130. Blaise Pascal, Traité dv triangle arithmetiqve, avec qvelqves avtres petits traitez svr la mesme matiere,
Guillaume Desprez, Paris, 1665.
URL: http://www.lib.cam.ac.uk/deptserv/rarebooks/PascalTraite/
131. Lev Semenovich Pontryagin, Topological groups, 1st. ed., ?, ?, 1939.
132. Lev Semenovich Pontryagin, Topological groups, 2nd. ed., Gordon & Breach, ?, 1954, 1966.
133. Lev Semenovich Pontryagin, Topological groups, (German translation) 2nd. ed., ?, ?, 1957–58.
134. Lev Semenovich Pontryagin, Topological groups, 2nd. ed., ?, ?, 19??. (Republished in “Selected
works”)
135. Fritz Reinhardt, Heinrich Soeder, dtv-Atlas zur Mathematik, Deutscher Taschenbuch Verlag,
München, 1974, 1977, 1987, 1990.
136. Alex P. Robertson, Wendy J. Robertson, Topological vector spaces, 2nd. ed., Cambridge
University Press, London, 1964, 1973.
137. Walter Rudin, Principles of mathematical analysis, 2nd. ed., McGraw-Hill, New York, 1964.
138. Walter Rudin, Functional analysis, 2nd. ed., McGraw-Hill, New York, 1973, 1991.
139. Abraham Seidenberg, Elements of the theory of algebraic curves, Addison-Wesley, Reading,
Massachusetts, 1968.
140. George F. Simmons, Introduction to topology and modern analysis, McGraw-Hill, Tokyo, 1963.
141. Isadore Manuel Singer, John A. Thorpe, Lecture notes on elementary topology and geometry,
Springer, New York, 1967/76.
142. Spencer, Overdetermined systems of linear partial differential equations, Bull. AMS 75 (1969),
179–239.
143. Murray R. Spiegel, Mathematical handbook of formulas and tables, McGraw-Hill Book Company, New
York, 1968.
144. Michael David Spivak, Calculus, W. A. Benjamin, New York, 1967.
145. Samuel James Taylor, Introduction to measure and integration, Cambridge University Press, London,
1973.
146. John A. Thorpe, Elementary topics in differential geometry, Springer, Berlin, 1979.
147. François Treves, Basic linear partial differential equations, Academic Press, New York, 1975.
148. Bartel Leendert van der Waerden, Einführung in die algebraische Geometrie, Springer, Berlin, 1939.
149. R. J. Walker, Algebraic curves, Dover, New York, 1962[1950?].
150. André Weil, Foundations of algebraic geometry, Amer. Math. Soc., New York, 1946, 1962.
151. Kôsaku Yosida, Functional analysis, 6th. ed., Springer-Verlag, Berlin, 1965, 1971, 1974, 1978, 1980,
1995.

152. Oscar Zariski, Pierre Samuel, Commutative algebra, Van Nostrand, Princeton, N.J., 1958.
153. Ernst Friedrich Ferdinand Zermelo, Untersuchungen über die Grundlagen der Mengenlehre I, Math.
Ann. 65 (1908), 261–281.
49.4. Physics
154. Vladimir Igorevich Arnold, Mathematical methods of classical mechanics, Graduate texts in
Mathematics 60, Springer, New York, 1978.
155. Claude Cohen-Tannoudji, Bernard Diu, Franck Laloë, Quantum Mechanics, Wiley-Interscience,
New York, 1977, translated from Mécanique quantique, Hermann, Paris, 1977.
156. CRC handbook of chemistry and physics, (ed. Robert C. Weast), CRC Press, Boca Ratón, Florida,
1988.
157. H. A. Lorentz, Electromagnetic phenomena in a system moving with any velocity smaller than that of
light, Proc. Acad. Sci. Amsterdam 6 (1904), 809–835.
158. Albert Abraham Michelson, Edward Williams Morley, On the relative motion of the Earth and the
luminiferous ether, Amer. Journ. Sci. 34 (1887), 333–345.
49.5. Logic and set theory

159. Lewis Carroll, Symbolic logic and the game of logic, Dover Publications, New York, 1896, 1958.
160. Paul Richard Halmos, Naive set theory, Springer, New York, 1974.
161. John Harrison, Formal proof—theory and practice, Notices of the American Mathematical Society 55
(2008), 1395–1406.
URL: http://www.ams.org/notices/200811/index.html
162. Stephen Cole Kleene, Introduction to metamathematics, Van Nostrand, 1952.
163. Stephen Cole Kleene, Mathematical logic, Wiley, 1967.
164. Edward John Lemmon, Beginning Logic, Thomas Nelson, London, 1965, 1971.
165. Elliott Mendelson, Introduction to mathematical logic, D. Van Nostrand, New York, 1964.
166. Tobias Nipkow, Lawrence C. Paulson, Markus Wenzel, Isabelle/HOL: A proof assistant for
higher-order logic, Springer-Verlag, Berlin, 2008.
URL: http://www4.in.tum.de/~nipkow/LNCS2283/
167. Chris E. Mortensen, Inconsistent mathematics, Springer-Verlag, New York, 1994. ISBN 9780792331865.
168. Bertrand Arthur William Russell, Alfred North Whitehead, Principia mathematica, Volumes I–III,
Cambridge University Press, Cambridge, 1910–1913.
169. Joseph Robert Shoenfield, Mathematical logic, Association for Symbolic Logic, AK Peters, Natick,
Massachusetts, 1967, 2001. ISBN 1-56881-135-7.
49.6. Anthropology and linguistics

170. Leslie C. Aiello, Robin Ian MacDonald Dunbar, Neocortex size, group size, and the evolution of
language, Current Anthropology 34 (1993), 184–193.
171. Robin Ian MacDonald Dunbar, Coevolution of neocortical size, group size and language in humans,
Behavioral and Brain Sciences 16 (1993), 681–735.
172. William A. Foley, Anthropological linguistics: an introduction, Blackwell Publishers Ltd., Malden,
Massachusetts, 1997, 2002.
173. George Lakoff, Rafael E. Núñez, Where mathematics comes from: how the embodied mind brings
mathematics into being, Basic Books, New York, 2000. ISBN 978-0-465-03771-1.
174. Steven J. Mithen, The prehistory of the mind: The cognitive origins of art and science, Thames and
Hudson, London, 1996, 1999. ISBN 0-500-28100-9.
175. Bronisław Kasper Malinowski, The problem of meaning in primitive languages, in “The meaning of
meaning”, (ed. C. Ogden, I. Richards), pp. 296–336, Harcourt, Brace and World, New York,
1923.
176. Nicholas Ostler, Empires of the word: A language history of the world, Harper Perennial, London,
2005, 2006.
177. Bruce Richman, Some vocal distinctive features used by gelada baboons, Journal of the Acoustical
Society of America 60 (1972), 718–724.
178. Bruce Richman, The synchronization of voices by gelada monkeys, Primates 19 (1978), 569–581.
179. Bruce Richman, Rhythm and melody in gelada vocal exchanges, Primates 28 (1987), 199–223.

49.7. Philosophy and ancient history

180. Ronald W. Clark, The life of Bertrand Russell, Penguin Books, Harmondsworth, England, 1975, 1978.
181. Henry Bernard Cotterill, Ancient Greece: myth & history, Geddes and Grosset, New Lanark, Scotland,
1913, 2004.
182. Albert Einstein, Essays in science, Wisdom Library, Philosophical Library, New York, 1934,
translated from Mein Weltbild, Querido Verlag, Amsterdam, 1933.
183. Albert Einstein, Aus meinen späten Jahren, Ullstein, Frankfurt am Main, 1950, 1993.
ISBN 3548347215.
184. Colin McEvedy, The new Penguin atlas of ancient history, Penguin Books, London, 1967, 2002.
185. Stephen Palmquist, Kant on Euclid: Geometry in Perspective, Philosophia Mathematica II 5:1/2
(1990), 88–113.
URL: http://www.hkbu.edu.hk/~ppp/srp/arts/KEGP.html
186. Bertrand Arthur William Russell, History of Western philosophy, George Allen & Unwin, London,
1946, 1961, 1974.
187. J. A. K. Thomson, Hugh Tredennick, Jonathan Barnes, The ethics of Aristotle: the Nicomachean
ethics, Penguin Books, London, 1953, 1976, Aristotle.
49.8. History of mathematics

188. Walter William Rouse Ball, A short account of the history of mathematics, Dover, New York, 1893,
1908, 1960.
189. Petr Beckmann, A history of π, St. Martin’s Press, New York, 1971.
190. Eric Temple Bell, The development of mathematics, 2nd. ed., McGraw-Hill, New York, 1940, 1945.
191. Eric Temple Bell, Men of mathematics, Simon & Schuster, New York, 1937, 1965, 1986.
192. W. F. Bynum, E. J. Browne, Roy Porter, Dictionary of the history of science, Princeton University
Press, Princeton, New Jersey, 1985.
193. J. B. Dubbey, The introduction of the differential notation in Great Britain, Annals of Science 19
(1963), 37–48.
194. Dirk Jan Struik, A concise history of mathematics, Dover, New York, 1948, 1967, 1987.
ISBN 0-486-60255-9 (pbk.).
49.9. Other references
195. Beowulf, Penguin, Harmondsworth, England, 1973, 1977, translated by Michael Alexander.
196. Beryl T. Atkins, etc., Collins-Robert French-English, English-French dictionary, 3rd. ed.,
HarperCollins, London, 1993.
197. Graham Chapman, Terry Jones, Terry Gilliam, Michael Palin, Eric Idle, John Cleese, Monty
Python and the Holy Grail (book), Methuen, London, 1977, 1989.
198. Mark Collier, Bill Manley, How to read Egyptian hieroglyphs: a step-by-step guide to teach yourself,
The British Museum Press, London, 1998, 1999, 2003.
199. Angel García de Paredes, Cassell’s Spanish-English English-Spanish Dictionary, Cassell, London,
1978.
200. Günther Drosdowski, Duden Deutsches Universalwörterbuch, Dudenverlag, Mannheim, 1989.
201. Maurits Cornelis Escher, The world of M. C. Escher, Harry N. Abrams, New American Library, New
York, 1971, 1974.
202. Karl Feyerabend, Langenscheidt’s pocket Greek dictionary: Greek-English, 7th. ed., Langenscheidt,
Berlin, 1963.
203. The epic of Gilgamesh, (ed. Andrew George?), Penguin, London, 1999, 2003.
204. The Guinness encyclopedia, (ed. Ian Crofton), Guinness Publishing, Enfield, Middlesex, England,
1990.
205. Patrick Hanks, etc., Collins dictionary of the English language, 2nd. ed., Collins Publishers,
Sydney, 1979, 1986.
206. Brian W. Kernighan, Dennis M. Ritchie, The C programming language, 2nd. ed., Prentice Hall,
Englewood Cliffs, New Jersey, 1978, 1988.
207. Donald Ervin Knuth, The TeXbook, Addison Wesley Publishing Company, Reading, Massachusetts,
1984, 1986, 1991.

208. Vladimiro Macchi, I Dizionari Sansoni Inglese-Italiano Italiano-Inglese, 2nd. ed., Sansoni Editore,
Firenze, 1983.
209. Alfred Mann, The study of counterpoint, W.W. Norton, New York, 1965, 1943, 1971, translated from
Gradus ad Parnassum, Johann Joseph Fux, Austrian Empire, Vienna, 1725.
210. Shu Lin, Daniel J. Costello, Error control coding: Fundamentals and applications, Prentice-Hall,
Englewood Cliffs, New Jersey, USA, 1983. ISBN 0-13-283796-X.
211. Arthur P. Norton, Norton’s star atlas, 16th. ed., (ed. Gilbert E. Satterthwaite), Gall & Inglis,
Edinburgh, 1910, 1973.
212. C. T. Onions, etc., The shorter Oxford English dictionary on historical principles, Oxford University
Press, Oxford, 1933, 1973, 1992.
213. John G. Proakis, Digital Communications, 3rd. ed., McGraw-Hill, New York, 1983, 1989, 1995.
214. Alain Rey, Josette Rey-Debove, etc., Le Petit Robert 1, dictionnaire alphabétique et analogique, Le
Robert, Paris, 1983.
215. Ruth Schumann-Antelme, Stéphane Rossini, Illustrated hieroglyphics handbook, Sterling Publishing,
New York, 2002, translated from Lecture illustrée des hieroglyphes, Éditions du Rocher, Paris, 1998.
216. Gerhard Wahrig, Deutsches Wörterbuch, Bertelsmann Lexikon Verlag, Gütersloh, 1991.
217. Larry Wall, Tom Christiansen, Randal L. Schwartz, Programming Perl, O’Reilly & Associates,
Sebastopol, California, 1991, 1996.
218. John T. White, A complete Latin-English and English-Latin dictionary for the use of junior students,
Longmans, Green and Co., London, 1889.
219. Beowulf: with the Finnesburg fragment, (ed. C. L. Wrenn, W. F. Bolton), University of Exeter, 1953,
1958, 1973, 1988.
49.10. Comments on other people’s books

The comments in this section are the current personal opinions of this author. These comments should be
regarded with maximum scepticism. The author’s opinions on other people’s books change over time and
are biased by his own favourite application areas and personal preferences. The order in which books are
mentioned in this section should not be interpreted as order of preference. Any comments which seem to be
negative should be disregarded. The author accepts no responsibility at all for any purchase choices of the
reader which may be influenced by the comments in this section.
The book by Frankel [19] is an excellent exposition of a full range of differential geometry topics which
combines applicability to physics with a higher level of mathematical precision than is usual in DG texts
which are aimed at physicists.
The Crampin/Pirani [12] book is a more mathematically oriented version of differential geometry although
it is intended for applications in physics, particularly mechanics. The first 163 pages (42.7%) present the
differential layer in the absence of any metric or connection. The following 72 pages (18.8%) present metrics
and connections. Manifolds (charts and atlases) are defined only in the last 147 pages (38.5%) of the book.
Thus this book is broadly organized according to structural layers.
The Misner/Thorne/Wheeler [38] book presents the physicists’ view of differential geometry (in addition to
general relativity). This book is noteworthy for apparently using no function spaces at all. Mathematical
objects are presented in isolation without function classes to contain them. This contrasts with the modern
approach in mathematics which associates almost all mathematical objects with container classes. This book
uses intuition rather than the axiomatic/deductive approach to differential geometry.
The book by Szekeres [45] presents a wide range of mathematical physics topics in a systematic way which
attempts to include the principal mathematical prerequisites in the earlier chapters. This is similar to my
own attempt to include prerequisites in the early chapters. (Peter Szekeres was one of my mathematical
physics lecturers at the University of Adelaide in the 1970s.)
The five-volume Spivak [43] book is not organized according to structural layers. Nor is it a systematic
deductive development of differential geometry although it is oriented to mathematics rather than physics
applications. This book is useful for its wide range of mathematical applications topics and the analysis of
historical DG texts.
The “Encyclopedic dictionary of mathematics” [35] by the Mathematical Society of Japan is a very
comprehensive set of definitions covering all of mathematics. The articles on differential geometry are a useful
reference for most of the basic definitions.

[841]
Chapter 50
Index
The references in this index are not page numbers. (Page numbers will be added in future.) The references
are of three kinds: chapter number (simple integer), section number (two dotted integers) or subsection
number (three dotted integers). A subsection may be a theorem, definition, remark or other text unit with
three dotted integers. Subsections which are underlined are definitions.
a-fortiori, 4.8.6, 18.5.15
a-priori geometry, 3.4.4
a-priori knowledge, 2.1.3, 2.5.4
a-priori mathematics, 2.12.1, 6.0.3
abbreviation, structure tuple, 26.3.7
abbreviations, 48.2
Abel, Niels Henrik, 9.2.1, 9.2.21, 20.13.22, 45.1.5
Abelian group, 9.2.21
abgeschlossene Hülle, 14.5.6
aborigine, Australian, 2.2.7
absolute knowledge, 2.5.14
absolute parallelism, 12.0.2
absolute value function, 8.6.1
abstract direct sum of linear spaces, 10.6.3
abstract discussion context, 4.3.1
abstract groups, 9.4.1
abstract logic, 3.9.6, 4.1.11
abstract logic, crisp perfection, 3.4.3
abstract-to-concrete map, logical operation, 3.13.1
abstract variable name space, 2.11.17
abstraction, three stages, 3.9.6
absurdity, 3.11.2
absurdly large infinity, 2.0.3
absurdum, reductio ad, 3.10.12, 3.11.2
Abū Ja’far Mohammed ibn Mūsā, 45.2.3
abundance of integers, 2.11.11
AC (axiom of choice), 5.0.9
AC-enhanced theorem, 5.9.1
AC-tainted theorem, 5.9.1, 7.2.27, 7.8.4
AC-tainted theorem example, 10.2.25, 20.1.4
AC-tainted theorem examples, 5.9.10
Academy, 45.2.1
acceleration of curve, covariant, 37.1.4
acceptance/rejection model, proposition, 3.7.2
accumulation point, 14.7.1
Achilles and the tortoise, 2.11.3, 14.0.4, 14.1.4
acknowledgements, 1.10
action, differential, 33.8.0
action, infinitesimal, 35.3.1
action, right, 23.9.2
action of group, differentiable, 34.3.0
action of vector, 32.1.2, 32.1.8, 32.1.16
action of vector field, 32.1.3, 32.1.11, 32.1.17
active set, 9.9.0
acyclic graph, 5.7.19
acyclic network, concepts, 2.1.7
ad-hoc kludge, 5.7.27
ADC (Analogue Digital Conversion), 2.4.3
addition, vector, 10.1.2
adherent set, 14.5.6
adjoint, etymology, 9.11.24
adjoint Lie algebra, 9.11.22
adjoint representation of Lie algebra, 9.11.19
adjunct, etymology, 9.11.24
advocate, devil’s, 3.10.5
aether, luminiferous, 24.1.4
affine connection, 12.0.3, 36
affine connection, overview, 36.2
affine connection, tensor calculus, 40.3
affine connection, terminology, 36.1, 36.1.3, 36.1.4
affine connection on principal fibre bundle, 36.9, 36.9.1
affine connection on principal fibre bundle, coefficients, 36.10
affine connection on tangent bundle, 36.4, 36.4.2
affine connection on tangent bundle, differentiability, 36.4.4
affine-invariant geometry, 45.3.0
affine manifold, 12.0.3, 36.1.4
affine manifolds, convex curvilinear interpolation, 37.6
affine path, 37.0.2
affine space, 12, 12.0.3, 12.2.3
affine space, convex combination of points, 12.2.24
affine space, convex curvilinear interpolation, 16.5
affine space, etymology, 45.3
affine space, hyperplane segment through points, 12.2.22
affine space, hyperplane through points, 12.2.21
affine space, line segment, 12.2.16
affine space, line segment, properties, 12.2.19
affine space, line through points, 12.2.13
affine space, line through points, properties, 12.2.14
affine space, manifold chart, 12.2.11
affine space, span of points, 12.2.23
affine space definitions, 12.2
affine space discussion, 12.1
affine space point space, 12.2.3
affine space vector space, 12.2.3
affine structure function, 12.2.5
affine transformation, 12.0.3, 36.9.2
affine transformation group, 45.3.0
affinely connected manifold, 35.1.2
affinely parametrized geodesic on two-sphere, 41.10
Ages, Dark, 41.0.1
aggregate, uncountable, name, 2.10.1
Akkadian, 3.5.2
al Khwārizmi, 45.2.3, 45.2.4
Alberti, Leone Battista, 45.1.3
Alcibiades, 3.4.2
Alexander the Great, 3.4.2
algebra, 9
algebra, alternating, 13.11.3

algebra, alternating tensor, 13.11
algebra, associative, 9.10
algebra, Boole’s, 3.6.3
algebra, Boolean, 4.0.1, 4.4.1
algebra, etymology, 3.8.3, 45.2.4
algebra, exterior, 13.11.2
algebra, general linear, 9.10.4
algebra, general tensor, 13.9
algebra, Grassman, 13.1.7
algebra, inverse problem, 4.4.2
algebra, Lie, 9.11, 9.11.1
algebra, Lie, real, 9.11.11
algebra, linear, 10
algebra, logic, 3.1.2
algebra, logical, 4.14.4
algebra, logical, formalized, 3.1.2
algebra, matrix, 11
algebra, mixed tensor, 13.9.14
algebra, multilinear, 13.1.7
algebra, predicate, 4.4.1
algebra, propositional, 4.4.1
algebra, rectangular matrix, 11.1
algebra, symbolic, 10.10.4
algebra, tensor, 13, 13.9.6
algebra of sets, 5.13, 5.14
algebra representations, 9.10.6
algebraic closure of group, 9.2.2
algebraic operation symbol, 9.9.1
algebraic real number, 14.2.8
algebraic style, symbolic logic, 4.0.2
algebraic system, 4.1.1, 9.9.0
algebraic topology, 14.1.10, 14.2.3, 15.4.24, 16.2.2, 16.6
algorithm, etymology, 45.2.3
algorithms, 2.10.10
alien minds, 2.5.6
all sets, 5.7.23
allocation, dynamic memory, 2.10.10
allowed homomorphism, 9.9.10
Alpha Centauri, 4.12.1
alphabet, Greek, 14.3.13
alternating algebra, 13.11.3
alternating form, 13.10.4
alternating form bundle, cross-section, 20.5.4
alternating form bundle on Euclidean space, 20.5.2
alternating symbol, Levi-Civita, 7.9.1, 7.10.20
alternating tensor, 13.10
alternating tensor algebra, 13.11
alternating tensor product, 13.10.6
alternative denial, 4.3.6, 4.6.4
alternative denial operator, 4.3.8
always-false proposition, 4.3.20
always-true proposition, 4.3.20
always-true set-theoretic formula, 6.5.16
amazing coincidence, 3.4.5
ambiguity, left/right transformation group, 2.5.4, 5.16.6, 33.8.8
ambiguity, zero tangent operator, 2.6.5, 27.5.12, 27.6.1
ambiguous conjunct, logical expression, 4.3.12
American Mathematical Society, 1.10
analogue digital conversion, 2.4.3
analysis, etymology, 14.8.1
analysis, local continuity, 14.2.2
analysis, numerical, 2.11.6
analysis on Euclidean space, 42.10
analysis situs, 14.2.1, 45.2.17
analytic atlas, 26.10.1
analytic chart, 26.10.3
analytic equivalent atlas, 26.10.4
analytic fibre bundle for Lie left transformation group, 34.2.5
analytic function space, 44.6
analytic logic, 3.9.7
analytic manifold, 26.10, 26.10.2
analytic real function, 8.7.3
anchor chart, 29.0.4, 36.5.3
ancient Greece, 2.1.6
ancient Greek logic, 3.7.11
ancient Greeks, 3.2.8, 3.7.18, 5.0.2
ancient history, law, 3.10.14
ancient literature, logic in, 3.5
ancient literature, logical language, 45.4
ancient Olympic games, 3.2.8
ancient Roman, 2.5.8
and, 3.7.7, 4.3.3
and-introduction rule, 4.6.2
angle brackets, 10.8.7
angles, Euler’s, 41.8.2
Anglo-Saxon, 3.5.4
Anglo-Saxon, otherwise, 3.7.17
animal, cognition, 5.7.17
animal, domesticated, 2.2.6
animal, half, 1.4.7
animal, multi-celled, 14.1.11
animal, world model, 5.7.25
animal communication, 2.5.1
animal learning, 7.2.1
animal logic, 3.5.10
animal mind, 2.3.3, 3.4.5
annulus, closed, 17.1.17
annulus, open, 17.1.17
Anschauung, 3.4.4
answer, yes/no, 3.9.5
ant, 1.8.1, 2.10.3, 2.10.7
antecedent subexpression, 4.3.14
antediluvian soup, 5.1.21
anthropocentric, 2.2.6
anthropological linguistics, 2.2.4
anthropological observable, 3.4.1
anthropologist, 2.2.7, 2.2.8
anthropology, 2.5.1, 2.5.17, 3.0.1
anthropology of logic, 3.2.8
anthropology of mathematics, 2.2.5
anthropomorphic principle, 24.1.4
anti-derivative, 21.0.1
anti-reflexive relation, 5.7.10
anticommutative product, 9.11.0
Antiphon the Sophist, 45.1.1
antipodal points on two-sphere, 41.9.4
antisymmetric multilinear effect of vector sequence, 13.1.5
antisymmetric multilinear map, 13.4, 13.4.3
antisymmetric relation, 5.7.10
Apollonius of Perga, 45.1.1
applicability, logic, 3.9.8
application, logic, 3.3.3
application rule, theorem, 4.9.2
applied mathematician, 2.9.9
arc, 16.1.1
arc, Jordan, 16.2.11
arc, open, 16.2.9
archaeologist, future, 2.3.4, 2.3.5
Archimedes of Syracuse, 20.1.1, 45.1.1, 45.1.3
architect, 5.0.6
architecture, 1.4.8
arctangent function, two-parameter, 20.13.6
Arctic, 3.10.6
area, directed, 13.1.11
argument, logical, 3.7.8

argument of function, 6.5.11 aut, 4.3.5

argumentation, logical, 4.4 automated logic, 4.3.10
Aristotelian logic, 3.4.2, 3.11.9 automorphism, idempotent linear, 10.1.5
Aristotle, 3.4.2, 45.1.1 automorphism, inner, 9.3.22
arithmetic, unsigned integer, 7.4 automorphism, Lie algebra, 9.11.8
arithmètic arts, 2.9.6 automorphism, linear space, 10.3.6
arithmètic equivalent, logic operator, 4.3.15 axiom, comprehension, ZF, 5.4.3
arithmètic triangle, 7.11.6 axiom, empty set, 5.3.2
arrow, Peirce, 4.3.6, 4.3.8 axiom, extension, ZF, 5.1.3, 5.2
arrow, portable, 10.1.6 axiom, extensionality, 5.2.1
arrow of time, 2.2.1, 2.2.6 axiom, infinity, ZF, 2.11.13, 5.6
art, logic, 3.2.3 axiom, naive comprehension, 5.7.2, 5.7.6, 5.7.8
art, non-representational, 3.1.3 axiom, power set, 5.3.5
artificial intelligence, 2.3.1 axiom, reflexivity of equality, 4.15.1
ass, 3.5.3 axiom, regularity, 5.7.27
assertion, 4.5.7 axiom, regularity, ZF, 5.5, 5.7.19
assertion, delayed, 3.7.17 axiom, replacement, ZF, 5.4
assertion, double-negative, 3.7.4 axiom, separation, ZF, 5.4.3
assertion, etymology, 3.7.1 axiom, set existence, ZF, 5.1.17, 5.1.18
assertion symbol, 4.5.8, 4.6.6, 45.2.18 axiom, set theory, CC, 5.10.1
assertion symbol, two-way, 4.5.9 axiom, singleton, 5.3.4
assertion trigger, 3.7.17, 3.10.14 axiom, specification, 4.12.6
assertions, uninteresting, 4.8.1 axiom, specification, ZF, 5.4.1, 5.4.2, 5.7.19, 47
associated differentiable fibre bundle, 34.6, 34.6.3, 34.6.7, axiom, substitution of equality, 5.2.5
34.6.10 axiom, substitutivity of equality, 4.15.1
associated fibre bundle, 34.9.4 axiom, union, 5.3.4
associated parallelism, 24.3 axiom, unordered pair, 5.3.3
associated topological fibre bundle, 23.10, 23.10.5, 23.10.9 axiom, ZF, productive, 5.1.17, 5.1.19
associated topological fibre bundle, construction, 23.11 axiom of choice, 2.10.3, 2.12.3, 4.13.10, 5.9, 5.9.6, 5.11.5,
associated topological fibre bundle, orbit space method, 5.15, 6.9.5, 7.8.0, 10.2.24, 14.2.9, 14.9.9, 15.1.5, 15.7.10,
23.12, 23.12.3 17.3.26, 23.12.5
associated topological pathwise parallelism, 24.3.2 axiom of choice, useless, 5.1.18
associative algebra, 9.10 axiom of choice and Lebesgue measure, 20.1.3
associative algebra, associated Lie algebra, 9.11.10 axiom of comprehension, 5.11.2
associative algebra, linear representation, 9.10.7
associative algebra over commutative unitary ring, 9.10.1
axiom of countable choice, 5.9.1, 5.9.11, 5.10, 5.10.1, 5.10.2, 5.10.3, 7.8.3, 7.8.4
associativity, logical operator, 4.3.12 axiom of dependent choice, 5.0.10
associativity rule, 4.3.12 axiom of foundation, 5.5.1
asterisk method of learning, 1.9.1 axiom of infinity, 2.10.6, 2.10.7
asteroid, 2.5.6 axiom of reckless comprehension, 5.7.24
astronomical spherical coordinates, 41.1.6 axiom of replacement, 5.11.1
astronomy, 2.9.8 axiom of separation, 5.11.2
astronomy, Ptolemaic, 3.4.2 axiom of specification, 5.11.1
asymmetric logical operator, 4.6.2 axiom of subsets, 5.11.2
Athens, 3.4.2 axiom of substitution, ZF, 5.11.3
atlas, analytic, 26.10.1 axiom schema, 4.5.2
atlas, analytic equivalent, 26.10.4 axiom style, ZF, 5.6.2
atlas, C^r-maximal, 26.5.14 axiom system, classical, 2.8.3
atlas, differentiable, 26.3, 26.3.2, 26.3.4 axiom system, modern, 2.8.3
atlas, fibre, non-topological fibration, 22.1.7 axiom system, self-consistency, 3.11.4
atlas, manifold, differentiable, 26.3 axiom systems, summary of experience, 2.5.18
atlas, maximal, topological, 25.4.15 axiom template, 5.11.5
atlas, standard, for Euclidean space, 26.4.1 axiomatic approach, definitions, 5.0.4
atlas, tangent bundle, 27.2.1 axiomatic method of definition, 2.8.1, 2.8.2
atlas, topological, 25.4, 25.4.6 axiomatic reformulation of logic, 3.13.4, 7.13
atlas, usual, for Euclidean space, 26.4.1 axiomatic specification, 2.7.2
atlas direct product, 25.5.3 axiomatic system, 2.5.13, 4.5.2
atlas of curves for a path, 16.1.8 axiomatic system, incomplete, 3.8.2
atlas of tangent bundle total space, 27.8.4 axiomatic system, spartan, 4.6.4
atlas product, 25.5.3 axiomatic system for propositional calculus, 4.7.4
atom, 2.12.3 axiomatization, 5.0.7
atoms, Universe, 2.11.15 axiomatization, credibility, 3.2.6
attributes, database object, 2.4.4 axiomatization, Euclid’s geometry, 3.2.7
attributes, set, 5.2.6 axioms, baseless, 2.2.8
audio cable connectors, 2.5.13 axioms, Euclidean geometry, 2.1.9
audio system, positive feedback, 3.3.7 axioms, inconsistent, 4.1.9, 4.12.7
Australian aborigine, 2.2.7 axioms, linear space, 2.8.7
Auswahlaxiom, 5.9.3 axioms, non-standard, 5.0.10

axioms, Peano, 2.5.13, 7.3.3 binary set intersection properties, 5.13

axioms, set theory, Zermelo-Fraenkel, 5.1.26 binary set union properties, 5.13
axioms, ZF set theory, 5.3 biology, discipline, 2.5.17
axioms and constructions, 2.8 bird, 2.0.4
bird, tree, 5.7.25
Babbage, Charles, 18.2.11
birth, 2.4.2, 2.11.19
baboon, gelada, 2.2.4
bistable transistor circuit, 4.1.1
Babylon, 3.4.2
black hole, 39.4
Babylonian geometry, 38.1.5
black number, 2.10.5, 2.11.12
background claim, 3.5.4
blindingly obvious, 1.5.2
background proposition, 3.6.1, 3.7.3
blocking proposition, 3.10.1, 3.10.14
backwards-deductive search, 4.8.5
blow-out, unknowns, 3.8.2
bacteria, 2.2.1
bodies, communications standardization, 2.8.9
ball, closed, 17.1.7, 17.3.9
bogus theorem, 4.9.10
ball, closed, punctured, 17.1.17
Bolzano, Bernard Placidus Johann Nepomuk, 45.1.5
ball, closed, with zero radius, 17.1.10
Bolzano-Weierstraß property, 17.3.29, 17.3.30
ball, open, 17.1.7, 17.3.8
Bolzano-Weierstraß theorem, 17.3.29
ball, open, punctured, 17.1.17
Bombelli, Rafael, 7.5.11, 45.1.3, 45.2.7
ball, tennis, 2.10.10
bone-setting, 45.2.4
ball centre, 17.1.7
Bonn, 0.0
ball radius, 17.1.7
book, definition-centric, 1.4.5
Banach, Stefan, 45.1.6
books, other people, 49.10
Banach space, 10.2.22
bandwidth, finite, 2.5.13 Boole’s algebra, 3.6.3
bandwidth, finite, human communication, 2.11.14 Boolean algebra, 4.0.1, 4.4.1
barking dog, Harry’s, 3.6.3 boolean formula, 47
Barrow, Isaac, 45.1.3 boot-strap, 3.14.1, 4.7.7
base, open, 14.11, 14.11.2, 15.6.1 boot-strap, integer definitions, 7.2.5
base point of tangent operator, 27.5.1 boot-strap layer, 2.1.6
base space, tangent bundle, 19.1.10 boot-strapping mathematics, 2.9.6
baseless axioms, 2.2.8 boot-strapping of definitions, 2.1.1, 2.1.3, 6.1.1
basis, dual, canonical, 10.5.7 borderline examples, 42.1.1
basis, tangent, 27.12.1 Borel, Félix Edouard Justin Emile, 5.9.3, 45.1.6
basis existence, linear space, 5.9.10, 5.9.12, 10.2.25 bottle, Klein, 43.2.2
basis for linear space, 10.2.23 bottleneck, information, cosmic, 2.11.17
basis for linear space, finite, 10.2.9 bottleneck, naming, 2.10.1
basis vector, 10.2 bound of partially ordered set, lower, 7.1.10
basis vector, coordinate, 27.7.4, 27.12 bound of partially ordered set, upper, 7.1.10
bedrock of knowledge, 2.0.1 bound variable, 5.1.24
bedrock of mathematics, 2.1 boundary, zero-thickness, 14.1.9
bedrock of physics, 2.12.3 boundary, zero-width, 14.0.3
behaviour, grooming, 2.2.4 boundary conditions, 2.11.5
behaviour control, social, 3.7.12
boundary differentiability of functions of several variables, 18.5.1
behaviourism, 2.5.20
belief, 3.10.15 boundary of boundary, 20.6.12
beliefs, 3.4.3 boundary of set, 14.6
Beltrami, Eugenio, 40.1.1, 45.1.6 boundary of set, open/closed portions, 14.6.9
Bēowulf, 3.5.4, 3.12.3 boundary of set, topological, 14.6.4
Bēowulf, logical language, 45.4.2 boundary point of set, 14.0.2
Bernays, Paul Isaak, 2.9.7, 45.1.6 boundary value problem, physical models, 21.3.0
Bernays-Gödel set theory, 5.1.6, 5.5.1, 5.12 boundary value problems, 21.3
BG (Bernays-Gödel), 5.0.9
boundary value problems, differentiability at boundary points, 18.5.1
BG set theory, 5.12
Bianchi identities, 36.8.9 bounded set in metric space, 17.2.7
bibliography, 49 bracket, Poisson, 9.11.12, 32.2, 32.2.10, 32.3.2, 32.5.0, 33.0.4
biconditional expression, 4.3.14 brackets, angle, 10.8.7
bidirectional derivative, 19.6.2 brain, human, 4.2.8
bidirectionally differentiable function, 19.6.3 Brazilian forest, 3.2.8
bidual of linear space, 10.5.19 bricklaying, 1.4.8
big bang, 39.5 brilliant guesswork, 1.4.7
big ideas, 1.4.11 British spelling, 1.6.7
biggest number, 5.7.18 Brouwer, Luitzen Egbertus Jan, 3.11.2, 45.1.6
bijection, 6.5.23, 14.1.10 bucket/set metaphor, 5.7.17
bijective function, 6.5.23 bug, 4.7.7
bilinear map, canonical, 13.5.3 bug fix, 3.10.4
bilinear map, universal, 13.5.3 buggy set theory, 3.10.4
binary number, 7.5.8 building safety regulations, 5.0.6
binary numbers, 2.10.6 bulk conjunctions, 3.1.2
binary operator, 4.3.12 bulk disjunctions, 3.1.2

bundle, alternating form, cross-section, 20.5.4 Cartesian product projection map, 6.9.8
bundle, curve, 24.4.3 Cartesian space, 29.0.3
bundle, differentiable fibre, 34 cat, sat, mat, 3.10.10
bundle, line, 34.7.2 cat, Schrödinger’s, 3.2.8, 5.7.12
bundle, orthogonal, tangent, 38.5.9 catalysis, 14.8.1
bundle, second-order tangent, 29.3.10 catchment area of differential geometry, 1.4.3
bundle, tangent, 27.0.1, 27.8, 27.8.1 categories of mathematics ontologies, 2.3.8
bundle, tangent, Euclidean space, 19.1.8 category, rabbit, 2.11.14
bundle, tangent operator, 27.9, 27.9.2 cattle counting, prehistoric, 2.11.19
bundle, topological second-order tangent, 29.3.11 Cauchy, Augustin Louis, 3.11.9, 9.2.1, 14.12.1, 20.1.1, 45.1.5
bundle, vector, 34.7, 34.7.1 Cauchy sequence of rational numbers, 8.3.5
Burali-Forti, Cesare, 3.2.1, 45.1.6 causality violation, self-containing set, 5.7.24
Burali-Forti paradox, 3.2.1, 5.7.13, 5.7.15, 7.2.3 caveat emptor, 4.1.1
bus, 23.1.2 Cayley, Arthur, 9.2.1, 10.1.8, 11.0.1, 45.1.5
būtan, Old English, 3.5.4 CC (axiom of countable choice), 5.0.9
BVP (boundary value problem), 21.0.2 CC set theory axiom, 5.10.1
CC-tainted theorem, 5.9.1, 7.2.27
C^r-maximal atlas, 26.5.14
CC-tainted theorem example, 7.2.26, 7.2.28, 7.2.36
calculation, 1.4.8
CC-tainted theorem examples, 5.9.10, 5.10.4
calculus, differential, 18
ceiling function, 8.6.9
calculus, differential, and logic, 4.4.2
cell, 33.2.5
calculus, exterior, 13.1.7
Centauri, Alpha, 4.12.1
calculus, fundamental theorem, 20.2.3, 20.3.1, 20.6.3, 20.9.1, 21.0.1
central concepts of differential geometry, 36.8.1
calculus, integral, 20 centralizer of group, 9.3.27
calculus, integral, and logic, 4.4.2 centre of ball, 17.1.7
calculus, predicate, 3.1.2, 4.12, 4.13, 4.14 centre of group, 9.3.27
calculus, propositional, 3.1.2 centroid, 9.7.0
calculus, propositional, axiomatic system, 4.7.4 CGI graphics, 2.4.6
calculus, propositional, formalization, 4.5 chain, cyclic, 5.7.10
calculus, propositional, implication-based, 4.7 chain, implication operators, 4.6.5
calculus, propositional, semantics-free, 4.5.1 chain, set membership, 5.5.3
calculus, propositional, theorems, 4.8 chain, singular, 26.1.6
calculus, tensor, 40 challenge/response, 2.11.21
calculus, vector field, 32 channel, mathematics communication, 2.5.11
calculus of variations, 21.5 chapter groups, 1.3
camel, straw, 2.11.15, 4.13.12 chapter page counts, 1.3.1
canonical multilinear map for tensor space, 13.5.1 chapters, overview, 1.2
canonical 1-form, 36.8.4 characteristic function, 7.9.4
canonical bilinear map, 13.5.3 characteristic polynomial of matrix, 11.5.7
canonical construction, 2.8.6 characterization, 2.5.8
canonical dual basis, 10.5.7 chart, analytic, 26.10.3
canonical map for tensor space, extended, 13.13.5 chart, differentiable, 26.5.6
canonical map from linear space to second dual, 10.5.20 chart, Earth, 25.1.5
canonical multilinear map, 13.6.4 chart, fibre, 23.3.4
canonical representation, 2.8.6 chart, fibre, non-topological fibration, 22.1.5
Cantor, Georg Ferdinand Ludwig Philipp, 2.9.7, 3.2.1, 5.0.5, 7.2.7, 14.2.8, 45.1.6
chart, manifold, affine space, 12.2.11
chart, per-fibre-set, 34.2.4
Cantor’s paradox, 3.2.1 chart, tangent bundle, 27.2.1
capacity, cranial, 2.2.4 chart, topological, 25.4, 25.4.2
capital, intellectual, 4.7.7 chart for path, 16.1.8
car, 23.1.2 Chasles, Michel, 45.1.5
car parts, 2.4.3, 2.4.4 chemistry, 2.1.5
Carathéodory, Constantin, 45.1.6 chemistry, discipline, 2.5.17
cardinal number comparability, 5.9.10 chess, 3.4.3, 3.9.3, 4.8.1
cardinality, 14.1.10 chicken, 2.5.6, 2.5.8, 43.2.6
cardinality, uniqueness, 4.16.7 chicken, egg, 5.7.21
Carnot, Lazare Nicholas Marguerite, 45.1.5 chicken-foot symbol, 5.16.4, 6.3.20
Carroll, Lewis, 4.0.1 Chinese mathematics, 7.11.7
Cartan, Élie, 35.1.5, 45.1.6 choice, dependent, axiom, 5.0.10
Cartesian coordinates, 18.1.1, 25.1.3 choice, multiple, 3.7.3, 3.9.5
Cartesian product, partial, 6.10, 6.10.1, 41.6.0
Cartesian product, sequence, 7.7
Cartesian product, standard identification map, 7.7.6
Cartesian product of family of functions, 6.9
Cartesian product of family of sets, 6.9, 6.9.1
choice axiom, 2.10.3, 2.12.3, 4.13.10, 5.9, 5.9.6, 5.11.5, 5.15, 6.9.5, 7.8.0, 10.2.24, 14.2.9, 14.9.9, 15.1.5, 15.7.10, 17.3.26, 23.12.5
choice axiom, countable, 5.9.1, 5.9.11, 5.10, 5.10.1, 5.10.2, 5.10.3, 7.8.3, 7.8.4
Cartesian product of sets, 6.2, 6.2.1 choice axiom, useless, 5.1.18
Cartesian product of sets, properties, 6.2.5 choice function, 7.8.0
Cartesian product p-norm, 10.8.1 choice functions without axiom of choice, 7.8

Christoffel, Elwin Bruno, 35.1.5, 45.1.6 cognitive theory, 2.11.14

Christoffel symbol, 36.8.8, 36.10.5, 38.5.2, 38.5.4, 40.3.1, 40.5.0
coherence, recursive model, 3.3.6, 3.3.7
coherent, etymology, 2.1.7
Christoffel symbol, non-tensorial, 38.5.5 coherent network of concepts, 2.1.7
Christoffel symbol, tensorization, 29.2.7 cohesion, social, 2.2.4
Christoffel symbol on two-sphere, 41.2.2 coin, two-sided, proposition analogy, 3.7.4
Christoffel symbols, 29.0.4 coincidence, amazing, 3.4.5
chronology of mathematicians, 45.1 collection, 5.1.5
circle, great, 41.9.2 colloquial logic confusion, 3.5.7, 3.13.7
circle on two-sphere, 41.14 colonies, Spanish, 2.2.7
circuit, bistable transistor, 4.1.1 colour perception, 3.4.5
circuit, digital electronics, 4.1.8 column vector map, 11.1.10
circuit, electronic logic, 4.5.5 combination, 7.11
circuit voltages, transistor, 3.10.6 combination, convex, 10.2.27, 12.1.1, 37.5, 40.5.0
circularity, logic and set theory, 3.3.8 combination, convex, points in affine space, 12.2.24
civilisation, extra-terrestrial, 2.2.6 combination, linear, 9.7.0, 9.7.1, 10.2.8, 10.2.22
civilisations, inter-galactic, 2.0.3 combination, linear, formal, 10.10.4
claim, foreground/background, 3.5.4 combination, truth-functional, 4.2.1, 4.2.3
clash of definitions, pathological, 2.6.7 combination function, convex, 37.5.2
class, mathematical, 5.16.2 combination symbol, 7.11.2
class, proper, NBG, 5.7.23 combinatorics, topology on finite set, 14.4.6
class of objects, 5.16.7 communication, animal, 2.5.1
class ontology, mathematical, 2.5.4 communication, human, finite bandwidth, 2.11.14
class operational procedure, 4.1.4 communication channel, mathematics, 2.5.11
class tag, set, 5.2.6 communication protocol, 2.5.7
class test procedure, 4.1.4 communication systems interoperability, 2.8.9
class/object model, 5.7.25 communications, computer, 2.1.4
classes, topology, 14.2.6, 14.8.10, 15 communications, socio-mathematical network, 2.5
classical axiom system, 2.8.3 communications engineering, 2.8.9
classification, global connectivity, 14.2.2 communications standardization bodies, 2.8.9
classification of set, 6.4.5 communities of minds, 2.2.3
classification of topologies, 14.1.10 commutative group, 9.2.20
clay tablet, Mycenaean, 2.3.5 commutative ring, 9.8.5
Clifford, William Kingdon, 39.3.1, 45.1.5 commutative ring with identity, 9.11.1
closed curve, simple, 16.2.11 commutative ring, 9.8.5
closed annulus, 17.1.17 commutative unitary ring, associative algebra, 9.10.1
closed ball, 17.1.7, 17.3.9 commutator of vector fields, 32.2.5
closed ball, punctured, 17.1.17 compact analytic manifold, 26.10.5
closed ball with zero radius, 17.1.10 compact differentiable manifold, 26.5.17
closed curve, 16.2.11 compact-domain curve, 16.2.9
closed interval, 8.3.10
closed-open interval, 8.3.10
compact-domain curve, rectifiable, in Lipschitz manifold, 26.12.5
closed path, 16.4.7 compact-open topology, 15.7.9
closed path, simple, 16.4.8 compact set, 15.7.4
closed-point topology, trivial, 14.8.6 compact topological space, 5.9.10
closed portion of boundary of set, 14.6.9 compactness, Heine-Borel, 15.7.5
closed set, 14.3.12 compactness, sequential, 17.3.29
closed set symbol F, 14.3.13 compactness classes of topological spaces, 15.7
closure, concrete proposition domain, 4.9.1 compass direction, 3.7.3
closure, exterior, topology, 14.6.1 compatible fibre chart, 23.6.14
closure, topological, 14.5.4 competition, 2.2.2, 2.2.3
closure, topology, notation, 14.5.6 complement, double, 3.6.1, 3.11.6
closure of set, 14.5 complement, set, 5.13.8
closure of set unions under arbitrary unions, 5.15 complete pseudogroup of diffeomorphisms, 19.4.10
cloth, 2.5.14 complete pseudogroup of homeomorphisms, 19.4.7
cloud of statistical variations, 2.3.9
cluster point, 14.7.1
complete pseudogroup of unidirectionally differentiable homeomorphisms, 19.6.7
Code of Laws, Hammurabi, 3.5.3 completely regular topological space, 15.2.18
codomain of relation, 6.3.6 completeness, logic, 4.1.7
coefficient, 27.0.6 completeness of modelling, mathematical logic, 3.2.5
coefficient tuple, 27.5.1 complex number, 8.7
coefficient tuple of tagged tangent operator, 27.6.2 complexity of vocalizations, 2.2.4
coefficient vector, 27.5.1 component, 27.0.6
coefficient vector of tagged tangent operator, 27.6.2 component, connected, 15.4.14
coefficients, tensorization, 38.5.5 component, horizontal, of vector, 26.13.8
coefficients of affine connection on principal fibre bundle, 36.10
component, vector, 10.5.13
component function of vector field, 28.5.5, 28.6.6
cognitive science, 5.7.17 component map, dual, 10.5.11

component map for basis of linear space, 10.2.14 conditioning, operant, 3.5.10
component matrix of a linear map, 11.2.1 conditioning, psychology, 7.2.1
component matrix of linear map, 11.2 cone-shaped neighbourhood, 18.2.13
component tuple, computational second-order tangent, 29.3.1 confidence level, logic, 3.9.3
component tuple, second-order tangent, 29.3.1 conformal connection, 12.1.5
component tuple of vector with respect to basis, 10.2.16 conformal-invariant geometry, 45.3.0
components of tensor, 28.3.6 conformal map, 9.4.10
composite integer, 7.4.2 conformal metric, 38.2.4
composite number, 14.4.9 conformal sublayer, 38.0.3
composite of functions, 6.7.1 conformal transformation group, 45.3.0
composite of partially defined functions, 6.11.7 conformality of transition maps, 27.8.9
composition, diffeomorphisms, 19.1.3 confusion, colloquial logic, 3.5.7, 3.13.7
composition of functions, 6.7, 6.7.1 confusion, logic, 3.3.4
composition of operator fields, 32.2.7 conical coordinates for Euclidean space, 42.7
composition of partially defined functions, 6.11.7 conical coordinates Laplacian, 42.7.0
composition of relations, 6.3.23 conjecture, Poincaré, 14.1.10
composition of vector fields, 32.2.2 conjugate of a subset of a group, 9.3.13, 9.3.15
composition rule for differentiation, 18.4.11, 18.4.12 conjugate operator, 10.5.24
compound proposition, 3.13.2 conjugate point, 37.8.4
compound proposition, reception, 3.7.14 conjugation map, 9.3.22
compound proposition decomposition, 3.7.14 conjunct, 4.3.11
compound proposition space, 4.2.2 conjunct, logical expression, ambiguous, 4.3.12
compound propositions, on-demand construction, 3.13.5 conjunction, logical, 4.3.3
comprehension, naive, axiom, 5.7.2, 5.7.6, 5.7.8 conjunction, proposition list, 3.13.6
comprehension, reckless, axiom, 5.7.24 conjunction, triple, 3.5.8
comprehension axiom, 5.11.2 conjunctions, bulk, 3.1.2
comprehension axiom, ZF, 5.4.3 conjunctive normal form, 4.11.3
compressible integer, 2.11.6 connected component, 15.4.14
compressible number, 2.11.2 connected subset, 15.4.3
computational second-order tangent component tuple, 29.3.1 connected topological space, 15.4.1
computational tangent vector, 27.4, 27.4.2 connectedness and continuity of functions, 14.1.8
computational tangent vector triple, 27.4.1 connection, affine, 12.0.3, 36
computations, mechanistic, 2.9.1 connection, affine, overview, 36.2
computer communications, 2.1.4 connection, affine, terminology, 36.1.3, 36.1.4
computer file system directory, 5.7.19 connection, choice of definitions, 35.1
computer hardware, 3.13.8 connection, conformal, 12.1.5
computer language, 2.9.2 connection, continuous, 35.1.2
Computer Modern Roman font, 1.10 connection, general, alternative definitions, 35.9
computer operating system, 4.7.7 connection, history, 35.1
computer programming, 2.6.9, 5.16.7 connection, Lagrangian mechanics, 36.11
computer proof, 3.6.2 connection, Levi-Civita, 38.5, 38.5.2, 38.5.5, 38.7.2, 41.5.2
computer simulation, 3.13.4, 3.13.8, 3.14.2 connection, Levi-Civita, Euclidean space, 12.4.1
computer software, 2.3.5, 3.13.8 connection, Levi-Civita, globality, 39.1.5
concatenation of curves, 16.2.15 connection, Levi-Civita, metric layer, 38.0.1
concatenation of lists, 7.12.2 connection, Levi-Civita, on two-sphere, 41.2.2
concatenation of paths, 16.4.13 connection, Levi-Civita, orthogonal, 38.2.5
concatenation of sequence of curves, 16.2.17 connection, Levi-Civita, parallelism, 35.2.3, 36.1.2
concatenation operator for tuples, 8.5.3 connection, Levi-Civita, parallelism at a distance, 24.1.1
concave function, 40.5.0 connection, Levi-Civita, tensorization, 29.2.7
concept, primitive, 3.4.3 connection, Levi-Civita, tensorization coefficients, 36.7.10
concept network, coherent, 2.1.7 connection, Lie, 32.4.3, 32.4.4
concepts, acyclic network, 2.1.7 connection, metric, 38.5.6
concepts, lowest-level, 3.0.1 connection, naming, 35.1
conceptual economy, 6.5.2 connection, OFB, 35.3.2
concrete discussed context, 4.3.1 connection, orthogonal, 12.1.5, 41.5.3
concrete equality relation, 2.5.11 connection, orthogonal, Levi-Civita, 38.2.5
concrete equality relation, import, 4.15.2 connection, PFB, 35.3.2
concrete proposition, 3.3.3, 3.10.6 connection, PFB, connection form, 35.6
concrete proposition domain, 3.1.2, 4.1, 4.1.3, 4.5.4 connection, PFB, parallel displacement, 35.8
concrete proposition domain, dynamic, 4.1.6 connection, terminology, 15.3.2
concrete proposition domain, static, 4.1.6 connection, torsion-free, 36.1.4, 36.7.10
concrete proposition domain closure, 4.9.1 connection definition styles, 35.1.8
concrete proposition domain examples, 4.1.8 connection differentiability, 35.3.9, 35.3.14
concrete set domain, 5.7.23 connection form, PFB connection, 35.6
concrete variable domain, 5.7.3 connection form on principal fibre bundle, 35.6.3
concrete variable space, 5.2.3 connection layer, 1.1, 35.0.4
conditional expression, 4.3.14 connection on differentiable fibre bundle, 35
conditional statements, Gilgamesh epic, 3.5.2, 45.4.1 connection on ordinary fibre bundle, curvature, 35.4

connection on principal fibre bundle, alternative definition, 35.9.2, 35.9.4
control, social behaviour, 3.7.12
convergent sequence, 14.12.27
connection on principal fibre bundle, differentiability, 35.5.5 conversion, analogue digital, 2.4.3
connections on manifolds, motivation, 36.3 convex combination, 12.1.1, 37.5, 40.5.0
connective, primitive, 4.5.2, 4.7.4, 4.11.1 convex combination, points in affine space, 12.2.24
connective, principal, logical operator, 4.3.11, 4.3.12 convex combination function, 37.5.2
connective, propositional, 4.3.4 convex combination of vectors, 9.7.0, 10.2.27
connectivity classes of topological spaces, 15.4 convex curvilinear interpolation, affine manifolds, 37.6
connectivity classification, global, 14.2.2 convex curvilinear interpolation, affine space, 16.5
consequent subexpression, 4.3.14 convex function, 37, 37.9, 37.9.1, 40.5.0
conservation equations, 14.0.3 convex function on two-sphere, 41.11
conservative force field, 20.6.12 convex neighbourhood, 37.8.5
consistent, etymology, 2.1.7 convex set, 37, 37.4
constant, individual, 4.12.12 convex set on two-sphere, 41.11
constant curvature, surface, 42.9 convex subset, 37.4.1
constant curve, 16.2.11 convex subset of Riemannian manifold, 40.5.0
constant function, continuous, 14.12.3 convexity, 37
constant logical function, 4.12.12 convexity test function, 37.9.2
constant name, definition, 4.12.13 cook, meat-centric, 1.4.5
constant path, 16.4.8 cook, vegetarian, 1.4.5
constant predicate, 4.12.12 cookbook, 1.5.3
constant stretch of curve, 16.3.2 cooperation, 2.2.2, 2.2.3
construction, canonical, 2.8.6 coordinate, 27.0.6
construction, on-demand, compound propositions, 3.13.5 coordinate basis covector, 28.2.5
construction, set, dynamic perspective, 5.7.24 coordinate basis operator field, 28.6.4
construction of associated topological fibre bundle, 23.11 coordinate basis vector, 27.7.4, 27.12
construction stage, ZF set theory, 5.5.4 coordinate basis vector field, 28.5.12
constructional method of definition, 2.8.1, 2.8.4 coordinate frame bundle, tangent, 34.9.1
constructions, topology, 15
constructions and axioms, 2.8
coordinate-free, 2.9.6, 11.2.0, 11.2.4, 26.2.3, 26.9.5, 27.1.2, 27.1.3, 27.1.8, 36.7.8, 41.1.1, 41.4.1, 44.0.0
constructivism, 2.2.8 coordinate-free linear space, 12.0.1
container, physical, 2.6.2 coordinate function, topological, 25.4.2
container metaphor, set, 5.7.17, 5.7.25 coordinate-independent, 36.5.1
content, determinable, set, 5.5.3 coordinate map, topological, 25.4.2
context, discussed, 3.3.4 coordinate transformation, 19.1.7
context, discussed, concrete, 4.3.1 coordinate transition matrix, 26.5.7
context, discussion, 3.3.4 coordinate triple, tangent, 27.3.2
context, discussion, abstract, 4.3.1 coordinate triple of tangent operator, 27.5.1
context, meta-discussion, 3.3.4 coordinate tuple, 27.5.1
context stack, 3.10.5 coordinate tuple of tagged tangent operator, 27.6.2
continuity, epsilon-delta, 17.4.2, 17.4.3, 17.4.5, 17.4.14 coordinate vector, 27.5.1
continuity, Hölder, 18.9, 18.9.1 coordinate vector of tagged tangent operator, 27.6.2
continuity analysis, local, 14.2.2 coordinates, Cartesian, 18.1.1, 25.1.3
continuity of functions in terms of connectivity, 15.5 coordinates, conical, for Euclidean space, 42.7
continuous connection, 35.1.2 coordinates, geodesic, 37.8.1, 37.8.3
continuous curve, 16.2.4 coordinates, normal, 37.8.1, 38.5.10
continuous function, 14.12, 14.12.2 coordinates, spherical, 42.6.0
continuous function, constant, 14.12.3 coordinates, spherical, astronomical, 41.1.6
continuous function, uniformly, 17.4.8 coordinates, tangent vector, 27.2.1
continuous function on metric space, 17.4, 17.4.1 coordinates, tensor calculus, 40
continuous manifold, 25.4.11 coordinates, terrestrial, 41.1, 41.6.0
continuous path, directed, 16.4.2 coordinates for polar exponential map, 41.6
continuous path, oriented, 16.4.2 corps, 9.8.10
continuous path, unoriented, 16.4.15 corpus, 9.8.10
continuous vector field along curve, 28.8.2 corpus, mathematical knowledge, 3.2.5, 5.0.3
continuum, 25.1.9, 35.1.2 corpus, PDE, 1.4.1
continuum hypothesis, 5.0.10, 7.2.7 corpus, theorem, 5.6.2
contour, 16.1.1 correct logic, 3.1.3
contradiction, 3.10.3, 4.3.22, 4.3.23 correctness, plodding, 1.4.7
contradiction, proof by, 2.11.5, 3.1.4, 3.11, 4.1.7 corresponding parameters of curves, 16.3.14
contradiction, proof by, equivalents, 3.11.1 coset of subgroup, 9.3.3
contradiction, proof by, validity, 3.11.10 cosine function, 20.13.9
contradiction, proof by, world-model ontology, 3.11.8 cosmic information bottleneck, 2.11.17
contradiction-free logic, 3.10.12 cotangent bundle on Euclidean space, 19.2.8
contradictione sequitur quodlibet, Ex, 3.11.1 cotangent space, 28.2, 28.2.2
contravariant, terminology, 13.7.2 cotangent space, double, pointwise, 28.4.5
contravariant tensor, 28.1 cotangent space, double, total, 28.4.6
contravariant tensor space, 28.1 cotangent space, total, 28.2.8

cotangent vector, 28.2, 28.2.2 curve, closed, 16.2.11

cotangent vector space, 28.2.2 curve, closed simple, 16.2.11
cothrom, Domhan, 3.10.8 curve, compact-domain, 16.2.9
countability class, 15.6 curve, constant, 16.2.11
countable choice axiom, 5.9.1, 5.9.11, 5.10, 5.10.1, 5.10.2, 5.10.3, 7.8.3, 7.8.4
curve, continuous, 16.2.4
curve, differentiable, 26.7, 26.7.1
counterexamples and sharpness of theorems, 42.0.1 curve, differential, 30.4
counting cattle, prehistoric, 2.11.19 curve, differential, for higher-order operator, 31.8
court, tennis, 2.10.10 curve, geodesic, 37.2, 37.2.3
covariant, terminology, 13.7.2 curve, higher-order differential, 31.3
covariant acceleration of curve, 37.1.4 curve, infinitesimal, 32.2.0
covariant derivative, 32.1.0, 36, 36.5 curve, initial point, 16.2.13
covariant derivative, terminology, 36.5.3 curve, level, 16.1.7
covariant derivative for general connection, 35.7 curve, Lie transport, 32.4.6
covariant derivative of vector field, 36.5.5, 36.5.7, 36.5.12 curve, multiple point, 16.2.13
covariant derivative of vector field along curve, 36.5.11, 37.1, 37.1.2
curve, multiple point, 16.2.13
curve, never-constant, 16.3.3, 24.4.4
covariant tensor, 13.7, 28.3 curve, open, 16.2.9
covector, 13.10.8, 28.2.2 curve, rectifiable, 17.5, 17.5.5, 26.12
covector, coordinate basis, 28.2.5 curve, representative, 16.4.2
covector, unit coordinate basis, 28.2.6 curve, simple, 16.2.11
cover, open, 14.11.12, 15.7.1 curve, sometimes-constant, 16.3.3
cover, open, finite, 14.11.12 curve, space-filling, 16.1.5, 25.1.4, 42.1.2
covering, open, 15.7.1 curve, tangent operator, 30.4.8
covering, open refinement, 15.7.2 curve, tangent vector, transformation rule, 19.1.4
cow, 2.4.2, 2.4.3, 2.11.19 curve, tangent vector field, 30.4.1
craft, 2.5.16 curve, terminal point, 16.2.13
cranial capacity, 2.2.4 curve, topological, 16
cream, ice, 3.3.10 curve atlas, 16.1.4, 16.1.8
credibility, axiomatization, 3.2.6 curve bundle, 24.4.3
crime, 3.10.6 curve class, parallelism, 24.1.6
crisp perfection, abstract logic, 3.4.3 curve class, tangent, 27.0.4
critical point, Hessian operator, 31.6 curve concatenation, 16.2.15
cross-brace, 36.8.8 curve constant stretch, 16.3.2
cross product, 13.1.1 curve corresponding parameters, 16.3.14
cross product, naive, 6.1.6 curve equivalence class, 27.1.1
cross-section, terminology, 23.3.11 curve family, 16.2.2
curve family, higher-order differential, 31.4
cross-section of a topological fibration, 23.3.8
cross-section of alternating form bundle, 20.5.4 curve family, vector field derivative, 32.3
cross-section of alternating form bundle, differentiable, 20.5.5 curve in Euclidean space, differentiable, 18.6.5
cross-section of cotangent bundle on Euclidean space, 19.2.10 curve length, 17.5.9, 40.5.0
cross-section of fibration, differentiable, 26.13.13 curve morphing trajectory, 16.5.1
cross-section of fibre bundle, 6.9.7 curve on two-sphere, geodesic, 41.9
cross-section of tangent bundle, 32.5.0 curve parametrization, 16.1.6
cross-section of tangent bundle on Euclidean space, 19.1.11 curve path-equivalence, 16.3, 16.3.7
curve reparametrization, 17.5.6
cross-section of tangent bundle on Euclidean space, differentiable, 19.1.12
curve sequence, concatenation, 16.2.17
curve terminology, 16.1
cross-section of tangent fibration, 28.6.1
curve topics summary, 26.7.7
cross-section of topological fibration, 26.13.12
curve traversal order, 16.1.5
cross-section over a set, 23.3.8
curved space, PDE techniques, 1.4.1
cryptographic puzzle, 5.15.3
curvilinear interpolation, convex, affine manifolds, 37.6
cuneiform writing, 3.5.1
curvilinear interpolation, convex, affine space, 16.5
curl, 20.3.6, 20.6.12, 36.8.2
customer, logic, 3.3.8
curvature, 36.8
cut, Dedekind, 2.7.2, 2.7.3, 8.3.4
curvature, constant, surface, 42.9
cycle of disciplines, 2.5.17
curvature, Gaussian, 19.5.3
cyclic, logic and set theory, 2.1.6
curvature, positive, space, 36.0.2
cyclic chain, 5.7.10
curvature, Ricci, 38.6.6
cyclic definition, 2.0.1
curvature, scalar, 38.6.7
cyclic inclusion of sets, 4.1.9
curvature, sectional, 38.6.4
cyclic knowledge, 2.1.3
curvature, sectional, on Riemannian manifold, 40.5.0
curvature, topological generalization, 24.4.0 da Vinci, Leonardo, 3.9.8
curvature form, 36.8.3, 38.6.2 dagger, Quine, 4.3.6, 4.3.8
curvature of connection on ordinary fibre bundle, 35.4 danger, reductio ad absurdum, 3.11.4
curvature tensor, 36.8.6, 38.6, 38.6.3 dark age, 3.4.2
curvature tensor, Riemann, 35.4.1 Dark Ages, 2.3.5, 41.0.1, 45.1.2
curvature tensor components, affine connection, 40.3.2 dark energy, 2.10.5
curve, 16.2 dark matter, 2.10.5

dark number, 2.10, 2.11.12 derivative, directional, 19.6.2

dark set, 2.10 derivative, exterior, 20.6, 20.6.2
dark sets, 20.1.3 derivative, exterior, on manifold, 32.6, 32.6.1
data, experimental, 2.9.9 derivative, Lie, 32.4.9
database object attributes, 2.4.4 derivative, real-valued function, 18.2.10
dateline, international, 41.1.5 derivative, unidirectional, 19.6.2
datur, tertium non, 3.6.4 derivative, vector field, naive, 32.1
daylight hours calculation, 41.15 derivative for curve family, vector field, 32.3
DE (differential equation), 21.0.2 derivative notation, 18.2.11, 30.4.7
de Morgan’s law, 5.13.11, 5.14.8 derivatives for several variables, higher-order, 18.6
death, 2.4.2, 2.11.19, 3.5.3 Desargues, Girard, 45.1.3
decimal classification, Dewey, 1.8.1 Descartes, René, 2.1.6, 2.4.5, 2.9.6, 18.1.1, 45.1.3, 45.2.8
decision-making, mammoth hunters, 3.6.1 descriptive interpretation, logical assertion, 3.7.13, 3.12.1
decomposition, compound proposition, 3.7.14 determinable content, set, 5.5.3
Dedekind, (Julius Wilhelm) Richard, 2.9.10, 2.12.4, 4.9.12, 9.2.1, 45.1.6
determinant of a matrix, 11.3.6
deviation from flatness, parallelism, 35.4.1
Dedekind cut, 2.7.2, 2.7.3, 8.3.4, 13.1.8 devil’s advocate, 3.10.5
Dedekind-finite set, 5.9.10, 7.2.24 Dewey decimal classification, 1.8.1
Dedekind-infinite set, 7.2.24 diagonal argument, 2.11.20
deduction, 3.7.8 diagonal map, 6.9.13
deduction, mathematical, 2.9.7 diagonalization of matrix, 41.8.3
deduction rule, 4.9.1 diagram, Venn, 5.7.17, 5.7.25
deduction rule, meta-theorem, 4.9.4 diagrams, intuition, 3.6.2
deduction rules, 4.4.2, 4.6 diagrams, semantics, 3.6.2
deduction rules, defining logical operators, 4.6.3 dialysis, 14.8.1
deduction theorem, 4.6.7, 4.9, 4.10.1 diameter of a set, 17.2
define-before-use ordering, 1.4.12 diameter of set, 17.2.4
definite, negative, 11.4.7 dictionary, 2.2.7
definite, positive, 11.4.7 diffeomorphic, 19.1.2
definite real symmetric matrix, 11.6 diffeomorphism, 19.1.2
definition, clash, pathological, 2.6.7 diffeomorphism, composition, 19.1.3
definition, constructional method, 2.8.4 diffeomorphism, Euclidean space, 19, 19.1
definition, cyclic, 2.0.1 diffeomorphism differentiability class, 26.9.8
definition, function, inline, 14.12.19 diffeomorphism family, vector field, 33.6.8
definition, inductive, 18.4.1 diffeomorphism group, 33.0.1, 33.6
definition, operational, 2.11.8, 2.11.9 diffeomorphism group, differentiable, 33.6.1
definition, template, 18.4.1 diffeomorphism invariant, 36.1.5
definition breeding rules, 3.3.7 diffeomorphism on Euclidean space, 19.2
definition-centric book, 1.4.5 diffeomorphism pseudogroup, 18.7.8, 19.4, 23.1.9, 23.5.0, 23.6.12, 26.3.8
definition notation, 1.6.5
definition of “constant”, 4.12.13 diffeomorphism pseudogroup, complete, 19.4.10
definitions, boot-strapping, 2.1.1, 2.1.3, 6.1.1 diffeomorphisms, differentiable, topological group, 33.6.5
definitions, mathematical, naturalism, 5.0.4 diffeomorphisms, differentiable group, 33.6.3
definitions, mathematical systems, 2.8 diffeomorphisms, family, differentiable, 33.6.7
definitions book, 2.7.1 diffeomorphisms, one-parameter family, 26.8.1
deflationary theory of truth, 3.10.8 diffeomorphisms, one-parameter group, 26.8.2
delayed assertion, 3.7.17 differences of this book from other DG texts, 1.7
deliverables, mathematics, 3.2.3 differentiability, fractional, 42.4.0
delta function, Dirac, 27.1.1 differentiability, higher-order, 18.4.1
delta function, Kronecker, 7.9.1, 7.9.10, 11.1.22, 22.4.0, 27.7.5 differentiability-based function space, 18.7
demand, on, unbounded modelling, 5.7.25 differentiability class of diffeomorphism, 26.9.8
Democritus, 45.1.1 differentiability implies continuity, 18.2.15
demonstration prototypes, 2.8.9 differentiability of connection, 35.3.9, 35.3.14
denial, alternative, 4.3.6, 4.6.4 differentiability of connection on principal fibre bundle, 35.5.5
denial, joint, 4.3.6, 4.6.4 differentiable action of group, 34.3.0
denial operator, alternative, 4.3.8 differentiable atlas, 26.3.4
denial operator, joint, 4.3.8 differentiable chart, 26.5.6
dense subset, 18.5.7, 18.5.12 differentiable cross-section of alternating form bundle, 20.5.5
dense subset of topological space, 15.6.5 differentiable cross-section of cotangent bundle on Euclidean space, 19.2.11
dependent choice, axiom, 5.0.10
derivation, 27.1.1, 27.1.11, 44, 44.8 differentiable cross-section of fibration, 26.13.13
derivation examples, 44.2 differentiable cross-section of tangent bundle on Euclidean space, 19.1.12
derivation rules, logic, 4.14.4
derivative, bidirectional, 19.6.2 differentiable curve, 26.7, 26.7.1
derivative, covariant, 32.1.0, 36, 36.5 differentiable curve in Euclidean space, 18.6.5
derivative, covariant, for general connection, 35.7 differentiable diffeomorphism group, 33.6.1
derivative, covariant, of vector field, 36.5.5, 36.5.7, 36.5.12 differentiable diffeomorphisms, group, 33.6.3
derivative, covariant, of vector field along curve, 36.5.11 differentiable diffeomorphisms, topological group, 33.6.5

differentiable differential form, 20.5.5 differential form in flat space, 20.5

differentiable family of diffeomorphisms, 33.6.7 differential geometry history, 45
differentiable family of differentiable transformations, 26.8 differential layer, 1.1, 26.0.1
differentiable fibration, 26.13 differential manifold tensor calculus, 40.2
differentiable fibration with differentiable fibre space, 34.1.1 differential of curve, 30.4
differentiable fibration with fibre atlas for specified fibre space, 26.13.7 differential of curve, higher-order, 31.3
differential of curve family, higher-order, 31.4
differentiable fibration with intrinsic fibre space, 26.13.1 differential of curve for higher-order operator, 31.8
differentiable fibration with specified fibre space, 26.13.3 differential of differentiable map, 30.3, 30.3.1, 30.3.4
differentiable fibre atlas, 26.13.6 differential of differentiable map, higher-order, 31.2
differentiable fibre bundle, 27.8.0, 34 differential of differentiable map for higher-order operator, 31.7
differentiable fibre bundle, associated, 34.6, 34.6.3, 34.6.7, 34.6.10 differential of function on Euclidean space, 19.2.13
differentiable fibre bundle, horizontal lift function, 35.3.5 differential of real-valued function, 30.2, 30.2.1, 30.2.11
differentiable fibre bundle, lift function, 35.3.12 differential of real-valued function, higher-order, 31.1
differentiable fibre bundle, vector field, 34.3 differential of real-valued function for higher-order operator, 31.5
differentiable fibre bundle association, 34.6.2
differentiable fibre bundle connection, 35 differential of real-valued function for second-order operator, 31.5.1, 31.5.3
differentiable fibre bundle with Lie structure group, 34.2, 34.2.2 differential on Euclidean space, 19.2
differentiable fibre bundle with non-Lie structure group, 34.1, 34.1.3 differential on manifold, 30
differential operator in Riemann space, 38.7
differentiable fibre chart, 26.13.5 differential operator on Euclidean space, second-order, 19.5
differentiable function, directionally, 18.5.8 differential parallelism, 15.3.2, 35.1.1
differentiable function, partially, 18.5.4 differential quotient, 44.4.0
differentiable function, totally, 18.5.17 differential topology, 26.0.1
differentiable function, unidirectionally, 18.3.5, 18.5.14 differentiation, composition rule, 18.4.11, 18.4.12
differentiable function space, 44.4 differentiation, linear spaces, 18.8
differentiable group, 27.8.0, 33 differentiation, one variable, 18.2
differentiable manifold, 26, 26.3.6 differentiation, several variables, 18.5
differentiable manifold, compact, 26.5.17 differentiation of parallel transport, 35.2
differentiable manifold, paracompact, 26.5.18 digit sequence, 7.2.5
differentiable manifold, tangent bundle, 27, 34.8 digital electronics, 3.6.2
differentiable manifold atlas, 26.3, 26.3.2 digital electronics circuit, 4.1.8
differentiable manifold product, 26.5.15 digital images, 2.9.1
differentiable manifold restriction, 26.5.12 dimension, Lebesgue, 15.9.1, 33.2.5
differentiable map, differential, 30.3, 30.3.1, 30.3.4 dimension, topological, 15.9, 33.2.5
differentiable map, higher-order differential, 31.2 dimension of dual linear space, 10.5.10
differentiable map, induced map, 30.3.6 dimension of finite-dimensional linear space, 10.2.5
differentiable map between differentiable manifolds, 26.9, 26.9.1 dimensional analysis, 20.3.3
dinosaur, 2.4.6
differentiable map for higher-order operator, differential, 31.7 dip, lucky, 5.11.5
differentiable path, 26.7 Dirac delta function, 27.1.1
differentiable principal fibre bundle, 34.4, 34.4.1 direct product atlas, 25.5.3
differentiable principal fibre bundle, vector field, 34.5 direct product manifold, 25.5.5
differentiable real-valued function, 18.2.2, 18.2.5 direct product of family of groups, 9.3.30
differentiable real-valued function on differentiable manifold, 26.6, 26.6.1 direct product of functions, 6.9.11
direct product of functions, pointwise, 6.9.12
differentiable second-order vector field, 29.7.3 direct product of groups, 9.3.30
differentiable structure, 26.2.2 direct product of two topological fibrations, 23.3.20
differentiable structure, overview, 26.1 direct product topology, 15.1.4
differentiable tensor field, 28.7.2 direct sum of linear space sequence standard injection, 10.6.4
differentiable transformation, differentiable family, 26.8 direct sum of linear spaces, 10.6
differentiable vector field along curve, 28.8.4 direct sum of linear spaces, abstract, 10.6.3
differential, higher-order, 31 direct sum of linear spaces, external, 10.6.1
differential, pointwise, 30.1 direct sum of linear spaces, formal, 10.6.1
differential action, 33.8.0 direct sum of linear spaces, internal, 10.6.5
differential calculus, 18 directed area, 13.1.11
differential calculus and logic, 4.4.2 directed continuous path, 16.4.2
differential equation, ordinary, 21.1 direction, compass, 3.7.3
differential equations, logic analogy, 4.0.3 directional derivative, 19.6.2
differential equations, partial, 36.7.1 directionally differentiable function, 18.5.8, 19.6.3
differential equations, systems, 21.0.3 directionally differentiable homeomorphism, 19.6
differential for second-order operator, 31.7.3 directory, file system, computer, 5.7.19
differential for second-order tangent vector, 31.7.1 Dirichlet, Johann Peter Gustav Lejeune, 45.1.5
differential for tangent operator, 30.3.18, 30.3.22 Dirichlet problem, 21.3.0
differential form, 28.9, 28.9.1 dirt, 2.2.1
differential form, differentiable, 20.5.5 disbelief, 3.10.15

disciplines, cycle, 2.5.17 double cotangent space, pointwise, 28.4.5

disclaimer, 0.0 double cotangent space, total, 28.4.6
discomfort level, 2.11.13 double dual of linear space, 10.5.19
discomfort levels, 2.10.3, 2.12.5, 5.9.3 double negation, 3.6.1, 3.10.8
discomfort levels with infinities, 2.9.5 double negation, logical, 3.11.6
disconnected set of sets, 15.4.9 double-negative assertion, 3.7.4
disconnected subset, 15.4.3 double tangent space, 28.4
disconnection of a set of sets, 15.4.10 double tangent space, pointwise, 28.4.1
disconnection of sets, 15.3 double tangent space, total, 28.4.3
disconnection of subset of topological space, 15.4.7 doubt, active, 3.9.4
disconnection of topological space, 15.4.5 dough-nut and tea-cup topology, 14.2.3
discontinuous function example, 18.5.6, 18.5.7, 18.5.11, 18.5.12 dramatis personae, 1.0
dreary argumentation, 4.0.2
discovery of proof, 4.8.4, 4.8.5 drop function, 30.4.3
discrete manifold, 25.1.6 drop function, Lie derivative, 32.4.15
discrete topological space, 14.3.19 drop function for second-level tangent vector, 29.5, 29.5.2
discrete topology, 14.3.19 drop function of total tangent space, 27.11, 27.11.6
discussed context, 3.3.4 drop-in replacement, 2.8.9
discussed context, concrete, 4.3.1 dry argumentation, 4.0.2
discussion, logic, 3.9.4 dual, multilinear, 13.6.5, 13.7.5
discussion context, 3.3.4 dual basis, canonical, 10.5.7
discussion context, abstract, 4.3.1 dual component map, 10.5.11
discussion language, 3.10.6 dual linear space dimension, 10.5.10
discussions, network, 3.3.4 dual map, 10.5.24
disjoint sets, 5.13.7 dual of linear space, double, 10.5.19
disjoint union, 6.4.2 dual of linear space, second, 10.5.18
disjunct, 4.3.11 dual operator, 10.5.24
disjunction, 3.13.7 dual order, 7.1.7
disjunction, exclusive, 3.5.5 dual space, 10.5, 10.5.5, 13.8.4
disjunction, inclusive, 3.5.5 dual vector, 13.7.2
disjunction, logical, 4.3.3 duality, logic quantifier, 4.13.10
disjunction, meta-proposition, 3.13.6 duality, topology, 14.4.7
disjunction, triple, 3.5.8 dummy variable, 4.3.16, 5.1.23, 5.8.16, 5.8.24
disjunctions, bulk, 3.1.2 dummy variable, limit of function, 14.12.19
disjunctive normal form, 3.5.9, 4.11.3 dummy variable, naked, 5.8.28
distance between point and set, 17.2.1 dummy variable, superfluous, 6.5.16
distance between sets, 17.2, 17.2.1 duple, ordered, 7.7.3
duplicity, 4.16.11
distance function, 17.1.1
distance function, Hessian of second derivatives, 39.1.1 duplique, 4.16.11
distance function, nearest integer, 8.6.19 dynamic concrete proposition domain, 4.1.6
distance function, point-to-point, 38.4 dynamic memory allocation, 2.10.10
distance function in metric space, 17.1 dynamic perspective, set construction, 5.7.24
distribution, 26.11.1 dynamics, population, 3.7.14
distribution, Schwartz, 27.1.1, 27.5.8 dynasty, Sung, 7.11.7
distribution, tempered, 27.5.8 Earth, flat, 41.0.1
distribution theory, 14.6.6, 33.3.4 Earth, round, 3.10.6
distributions representation of tangent bundle, 27.15 Earth chart, 25.1.5, 25.4.5
distributivity logic axiom, 4.7.5 Earthlings, 2.0.3
distrust, 3.9.4 easy to show, 1.5.2
divergent sequence, 14.12.27 economy, conceptual, 6.5.2
divides (integer relation), 7.4.4 economy, intellectual, 2.11.14
DNA (deoxyribonucleic acid), 36.2.4, 36.8.8 Eddington, Arthur Stanley, 35.1.5, 45.1.6
dog, 2.4.2, 2.5.5 education, medieval university, 2.9.8
dog, Harry’s, barking, 3.6.3 effective computability, 2.12.3
dogma, prisoner, 1.4.6 effective left transformation group of a topological space, 16.8.7
domain, concrete proposition, 3.1.2, 4.1, 4.1.3, 4.5.4
domain, concrete variable, 5.7.3 effective right transformation group of topological space, 16.8.13
domain, set, concrete, 5.7.23
domain of function, 6.5.9 effective topological left transformation group of topological space, 16.8.8
domain of interpretation, propositional calculus, 4.12.8
domain of relation, 6.3.6, 6.3.7 effective topological right transformation group, 16.8.14
domain restriction, relation, 6.3.33 egg, chicken, 5.7.21
domain/range specification, function, 6.5.5 eggs, 5.15.6
domesticated animal, 2.2.6 eggs, goose, golden, 2.10.1
Domhan cothrom, 3.10.8 Egypt, 3.11.9
dotage, Newton’s notation, 18.2.11 Egyptian geometry, 45.2.2
double-angle rules, trigonometry, 20.13.12 Egyptian hieroglyphics, 2.5.5, 3.10.6
double complement, 3.6.1, 3.11.6 Egyptian mathematics, 1.4.4, 45.1.1

Egyptian priests, 3.2.8 equation, self-referential, 5.7.14

eigen, etymology, 11.5.7 equation of geodesic variation, 40.4
eigenspace, linear space, 10.4.1 equations, logical, simultaneous, 4.4.2
eigenspace of linear space endomorphism, 10.4 Equator, 3.10.6
eigenvalue, linear space, 10.4.1 equilibrium, 2.11.5
eigenvalue, matrix, 11.1.0, 11.5.7 equinumerosity, 5.9.10
eigenvalues of metric tensor, 39.1.1 equinumerous sets, 7.2.18
eigenvector, linear space, 10.4.1 equipotent sets, 7.2.18
eigenvector, matrix, 11.1.0, 11.5.7 equipotent theorems, 4.9.2
Einstein, Albert, 2.3.4, 2.3.6, 7.2.7, 39.1.6, 39.3.1, 40.1.2, 45.1.6 equivalence, logical, 4.3.3
equivalence class of curves, 27.1.1
Einstein index convention, 7.11.15, 13.8.5, 13.9.18, 13.10.2 equivalence of topological spaces, 14.1.10
Einstein space, 38.6.8 equivalence relation, 4.15.3, 6.4, 6.4.1
Einstein’s equations, 1.4.11, 39.1.3, 39.3.2 equivalent logical expressions, 3.10.13
Einstein’s equations, tensor terms, 13.1.1 equivalent manifolds, 26.5.10
electrolysis, 14.8.1 equivalent topological fibre atlases, 23.6.16
electromagnetic radiation, 3.4.5 equivalent topological fibre bundles, 23.7.5, 23.7.11
electronic logic circuit, 4.5.5 era, pre-logical, 3.2.8
electronics, digital, 3.6.2 Eratosthenes of Cyrene, 45.1.1
electronics circuit, digital, 4.1.8 erectus, Homo, 2.2.4
elementary particle, 2.0.1 Erlanger Programm, 12.2.2, 14.2.3, 19.1.1, 19.4.2, 23.5.0, 33.0.5, 45.3.0
Elements, Euclid, 5.0.7
ellipse projected by sphere, 41.17.2 erroneous use of symbol, exists, 4.13.4
elliptic functions, 20.13.22 error, execution, 3.10.4
elliptic integrals, 20.13.22 error, logic, 3.9.3
elliptic PDE, 21.3.0 essence of a group, 5.16.2
elliptic second-order operator, 29.6, 36.7.3 essence of integers, 2.5.6
elliptic second-order operator field, 36.7 essential nature, 1.4.7
elliptic second-order PDE, 26.6.10 essential nature, set, 5.2.6
elliptic second-order tangent operator, 29.6.3 ethics, logic, 3.3.10
elliptic second-order vector field, 29.7.6 etymology, adjoint, 9.11.24
embedded manifold, 26.4.3 etymology, adjunct, 9.11.24
embedded Riemannian manifold, 38.9 etymology, affine space, 45.3
embedding of manifold, 30.3.11 etymology, algebra, 3.8.3, 45.2.4
embodied mind theory, 2.5.21 etymology, algorithm, 45.2.3
empirical proposition, 3.3.10, 4.13.10 etymology, analysis, 14.8.1
emptor, caveat, 4.1.1 etymology, coherent, 2.1.7
empty function, 6.5.17 etymology, consistent, 2.1.7
empty path, 16.4.7 etymology, eigen, 11.5.7
empty product of family of sets, pathological, 15.1.5 etymology, naive, 2.1.2
empty set, 5.8.3, 5.8.6 etymology, sequence, 7.1.16
empty set, uniqueness, 5.3.2, 5.8.1 etymology, solve, 14.8.1
empty set axiom, 5.3.2 etymology, tangent, 27.0.2
empty topology, 14.3.10, 14.4.2 etymology, topology, 14.2.1, 45.2.17
endeavour, minimalist, 5.0.7 etymology, vector, 27.0.5
endomorphism, Lie algebra, 9.11.8 Euclid, Elements, 5.0.7
endomorphism, linear space, 10.3.6 Euclid of Alexandria, 45.1.1
endomorphism module, 9.9.27 Euclid’s Elements, 45.1.1
endomorphisms of a module, ring, 9.9.7 Euclid’s fifth postulate, 36.0.2
endomorphisms of module, ring, 9.9.15 Euclid’s geometry, axiomatization, 3.2.7
energy, dark, 2.10.5 Euclidean fibre bundle on Euclidean space, 43.1
energy and time, terrible waste, 4.4.4 Euclidean geometry, 2.1.9, 2.3.4, 26.1.4, 26.3.1, 36.0.2, 38.1.5
English, Old, 3.5.4 Euclidean inner product, standard, 10.8.5
epimorphism, Lie algebra, 9.11.8 Euclidean linear space, 10.2.20
epimorphism, linear space, 10.3.6 Euclidean linear space, standard basis, 10.2.21
epsilon, 5.1.20 Euclidean norm, 10.8.4
epsilon-delta continuity, 17.4.2, 17.4.3, 17.4.5, 17.4.14 Euclidean space, 25.2, 42.2
equality, 4.15 Euclidean space, atlas, standard, 26.4.1
equality, definition, 4.15.1 Euclidean space, atlas, usual, 26.4.1
equality, first order language with, 5.2.3 Euclidean space, Levi-Civita connection, 12.4.1
equality, reflexivity, axiom, 4.15.1 Euclidean space, locally, non-Hausdorff, 42.3
equality, substitution of, 5.2.3 Euclidean space analysis, 42.10
equality, substitution of, axiom, 5.2.5 Euclidean space concepts, 12.4
equality, substitutivity, axiom, 4.15.1 Euclidean space conical coordinates, 42.7
equality relation, concrete, 2.5.11 Euclidean space diffeomorphism, 19
equality relation, concrete, import, 4.15.2 Euclidean space fibre bundle, 43.1
equation, differential, ordinary, 21.1 Euclidean space submanifold, 40.7, 40.7.1
equation, logical, 4.4.3 Euclidean space submersion, 40.7.1

Euclidean space tangent vector, 19.1 exterior derivative using Lie derivative, 20.7
Euclidean topological space, 25.2.1 exterior of set, 14.6
Eudoxus of Cnidus, 20.1.1, 45.1.1 exterior of set, topological, 14.6.2
Euler, Leonhard, 12.1.1, 36.2.5, 45.1.4, 45.2.8, 45.3.0 exterior point of set, 14.0.2
Euler’s angles, 41.8.2 external direct sum of linear spaces, 10.6.1
European renaissance, 2.3.5, 45.1.2 extra-terrestrial civilisation, 2.2.6
even integer, 4.13.10 extraneous properties management, 2.7.4
even permutation, 7.10.10 extraneous properties of mathematical objects, 2.7.2
Ex contradictione sequitur quodlibet, 3.11.1 extrapolation, concept, 7.2.1
Ex falso sequitur quodlibet, 3.11.1 extrinsic tangent vector, 27.1.0
exact sequence of linear maps, 10.11, 10.11.1
factorial function, 7.10.13, 18.6.1
example fibre bundle, 43
factorial function, Jordan, 7.10.16
example of manifold, 42
faith, formalist existence proof, 2.2.8
excluded middle, 3.1.4, 3.6.3, 3.11.2, 5.7.17
false, abstract label, 3.2.5
excluded middle, logical machine ontology, 3.11.6
false, proposition tag, 3.7.16
excluded middle, Russell’s paradox, 5.7.11
false application of theorem, 14.2.6
exclusive disjunction, 3.5.5
false generalization, 17.0.1
exclusive or, 3.5.2, 3.5.7, 3.13.7, 4.3.5
false-store, proposition, 3.7.19
exclusive or, notation, 4.3.7
false zero-operand operator, 4.3.18, 4.3.19
exclusive-or operator, 4.3.8
false zero-parameter predicate, 4.12.10
execution error, 3.10.4
falsity, ontology, 3.6.1
exercise answers, 47
falsity, proposition tagging, 3.7.5
exercise questions, 46
falsity, semantics, 3.9
exhaustion method, 20.1.1 falsity, subjective, 3.7.2
exhaustive substitution, logical expression, 4.14.3 falso sequitur quodlibet, Ex, 3.11.1
existence, unique, 4.16.3 families, parametrized propositions, 4.12
existence, unique, notation, 4.16.4, 4.16.6 family, 6.5.4
existence, unknowable, Lebesgue non-measurable set, 2.10.8 family, proposition, parametrized, 3.1.2
existence axiom, set, ZF, 5.1.17, 5.1.18 family of curves, 16.2.2
existence proof, formalist, faith, 2.2.8 family of curves, higher-order vector field, 29.8
existential quantifier, 4.13.2 family of diffeomorphisms, differentiable, 33.6.7
existential/universal quantifier, information content, 4.13.9 family of diffeomorphisms, vector field, 33.6.8
exists, erroneous use of symbol, 4.13.4 family of functions, 6.8, 6.8.14
experience, sensible, 2.3.6 family of functions, Cartesian product, 6.9
experimental physicist, 2.9.9 family of geodesic curves parametrized by endpoints, 37.7.3
exponential function, 20.12, 20.12.2 family of geodesic interpolations, 37.7
exponential map, 37.8, 37.8.2 family of geodesics, one-parameter, 40.4.1
exponential of linear map, 10.4.1 family of local diffeomorphisms, 32.5.0
exponential of matrix, 21.7 family of sets, 6.8, 6.8.1
expression, logical, 4.3 family of sets, Cartesian product, 6.9, 6.9.1
expression, logical, exhaustive substitution, 4.14.3 family of sets, intersection, 6.8.5
expression, logical, non-atomic, 3.5.4 family of sets, union, 6.8.5
expression evaluation, logical, 4.4 family of systems, parametrized, 2.8.5
expression parse-tree, logic, 4.3.10 feedback, positive, audio system, 3.3.7
expressions, equivalent, logical, 3.10.13 feedback, sensor/motor, 3.3.2
extended canonical map for tensor space, 13.13.5 feet, 2.2.1
extended integer, 7.6, 7.6.2 female slave, 3.5.3
extended integer, negative, 7.6.4 Fermat, Pierre de, 45.1.3
extended integer, positive, 7.6.4 Feynman, Richard Phillips, 2.0.4, 45.1.6
extended integer set notations, 7.6.4 fibration, differentiable, 26.13
extended list space, 7.12.5 fibration, differentiable, with fibre atlas for specified fibre space, 26.13.7
extended number notation, 1.6.1
extended rational number, 8.2 fibration, differentiable, with intrinsic fibre space, 26.13.1
extended rational number notation, 1.6.1 fibration, differentiable, with specified fibre space, 26.13.3
extended real number, 8.4, 8.4.1 fibration, non-topological, 22.1
extended real number notation, 1.6.1 fibration, non-topological, parallelism, 22.2
extended truth table, 3.8.1, 4.2.6 fibration, non-uniform non-topological, 22.1.1
extension axiom, ZF, 5.1.3, 5.2 fibration, tangent, 27.8.8
extension of a function, 6.5.31 fibration, topological, 26.13.0
extension theorem, Tietze, 15.2.27 fibration, topological, cross-section, 26.13.12
extensionality, axiom, 5.2.1 fibration, uniform non-topological, 22.1.3
extent, point with no, 2.4.3 fibration, uniform non-topological with fibre space F, 22.1.9
exterior algebra, 13.11.2 fibration identification space, 23.4
exterior calculus, 13.1.7 fibre, standard, 23.3.3
exterior closure, topology, 14.6.1 fibre, tangent bundle, 19.1.10
exterior derivative, 20.6, 20.6.2 fibre atlas, 23.3
exterior derivative of parallel transport, 20.6.18 fibre atlas, differentiable, 26.13.6
exterior derivative on manifold, 32.6, 32.6.1 fibre atlas, non-topological fibration, 22.1.7

fibre atlas, topological, equivalence, 23.6.16 finite open cover, 14.11.12

fibre atlas for a topological fibration, 23.3.14 finite ordinal number, 7.2.12
fibre bundle, associated, differentiable, 34.6, 34.6.3, 34.6.7 finite paper, 2.10.7
fibre bundle, cross-section, 6.9.7 finite set, 7.2.20
fibre bundle, differentiable, 27.8.0, 34 finite set, Dedekind, 7.2.24
fibre bundle, differentiable, associated, 34.6.10 finite set topology, combinatorics, 14.4.6
fibre bundle, differentiable, connection, 35 finite transformation group as fibre bundle, 22.4
fibre bundle, differentiable, vector field, 34.3 finitely populated proposition store, 3.3.7
fibre bundle, non-topological, 22, 22.3, 22.3.1 first order language with equality, 5.2.3
fibre bundle, principal, affine connection, 36.9 first order logic, NBG set theory, 4.13.13
fibre bundle, principal, differentiable, 34.4, 34.4.1 first order logic, semantic space, 4.13.13
fibre bundle, principal, differentiable, vector field, 34.5 first order logic, ZF set theory, 4.13.13
fibre bundle, principal, topological, 23.9 fish, 5.15.6, 38.1.5
fibre bundle, tangent, 34.8.1 Fisher information matrix, 38.10.1
fibre bundle, topological, 23, 23.6, 23.6.4, 34.0.0 five layers, linguistic structure, predicate calculus, 4.12.4
fibre bundle, topological, associated, 23.10, 23.10.5 five layers, linguistic structure, propositional calculus, 4.5.4
fibre bundle, topological, history, 23.1 fix, bug, 3.10.4
fibre bundle, topological, motivation, 23.1 fixed stars, 24.1.4
fibre bundle, topological, overview, 23.1 flat Earth, 41.0.1
fibre bundle, topological, pathwise parallelism, 24.2 flat paper, 25.1.5
fibre bundle association, differentiable, 34.6.2 flat paradox, 5.7.23
fibre bundle example, 43 flat space, 12.0.2
fibre bundle homomorphism, 23.7 flatness deviation, parallelism, 35.4.1
fibre bundle isomorphism, 23.7 flavour, topology, 14.2.2
fibre bundle on Euclidean space, 43.1 flavours, logic, 4.0.1
fibre bundle product, 23.7 floating-point number, 5.16.1, 8.3.1
fibre chart, 23.3.4 flooding, Nile, 45.2.2
fibre chart, compatible, 23.6.14 floor function, 8.6.8
fibre chart, differentiable, 26.13.5 flow, information, incomplete, 3.8.2
fibre chart, non-topological fibration, 22.1.5 flow diagram, topics, 1.2
fibre set automorphism through the charts, 23.8.7 flow of vector field, 32.4.2
fibre set glue, 26.13.0 flow velocity, 32.4.3
fibre set isomorphism through the charts, 23.8.14 fluxions, 18.1.1, 18.2.11
fibre set map, structure-preserving, 23.8 fly, 5.15.3, 23.5.0, 32.3.1, 42.4.1
fibre set parallelism, topological, 24.1.9 fly, ointment, 7.2.5
fibre set parallelism space, topological, 24.1.9 foe, friend or, 2.2.4
fibre space, 23.3.3 foisting propositions, 3.7.12
fibre space, non-topological fibration, 22.1.3 folder, file system, computer, 5.7.19
fibre space of tangent bundle, 27.2.5 font, Computer Modern Roman, 1.10
fibre-to-fibre homeomorphism space of topological fibre bundle, 35.8.3 font, Fraktur, 14.3.14
font, rsfs, 1.10
field, 9.8, 9.8.8 font, wncyr, 1.10
field, gravitational, 2.11.18 food, poisonous, 3.10.15
field, Jacobi, 37.3, 37.3.2 football, 2.9.4
field, metric tensor, 10.2.18 foreground claim, 3.5.4
field, operator, composition, 32.2.7 foreground proposition, 3.6.1, 3.7.3
field, ordered, affine space, 12.2.15 forest, Brazilian, 3.2.8
field, Riemannian metric tensor, 38.3.3 form, alternating, 13.10.4
field, tangent vector, partial, 30.4.11 form, curvature, 36.8.3, 38.6.2
field, tensor, 28.7, 28.7.1 form, differential, 28.9, 28.9.1
field, tensor, differentiable, 28.7.2 form, differential, in flat space, 20.5
field, vector, 28.5, 30.5 form, second fundamental, 38.1.4
field, vector, composition, 32.2.2 form, statement, 4.5.2, 4.5.4
fifth problem, Hilbert’s, 33.0.5, 33.2 form, statement-form-name, 4.5.4
figure, 36.1.5 form, torsion, 36.8.5
figure-head set in structure tuple, 26.3.7 form bundle, alternating, cross-section, 20.5.4
figure of transformation group, 9.7, 9.7.0 formal direct sum of linear spaces, 10.6.1
file system directory, computer, 5.7.19 formal linear combination of vectors, 10.10.4
finite bandwidth, 2.5.13 formal notation introduction, 5.8.6
finite bandwidth, human communication, 2.11.14 formalism, 5.9.3
finite basis for linear space, 10.2.9 formalism, secure, 1.5.1
finite-dimensional linear space, 10.2.5 formalism, topology, 14.3.1
finite-dimensional linear space dimension, 10.2.5 formalisms and notations, plethora, 1.4.1
finite group, 7.4.5 formalist existence proof, faith, 2.2.8
finite machine, 2.10.9 formalization, logic procedures, 4.5.1
finite measurement resolution, 2.12.1 formalization, propositional calculus, 4.5
finite mind, 2.10.2 formalized logical algebra, 3.1.2
finite naive set theory, 3.13.3 formicarium, 2.10.3, 2.10.6

Forms, Platonic, 2.4.1 function restriction, 6.5.27

formula, boolean, 47 function sequence, 7.1.14
formula, set-theoretic, 6.3.3 function set, notation, 6.5.13, 6.5.15, 6.5.16
formula, well-formed, 4.3.10, 4.5.2 function set map, 6.6
fortune, wheel of, 1.8.0 function set notations, 6.12
foundation, logic, 3.0.1 function-set versus function-predicate, 6.5.1
foundation, mathematics, 5.0.1 function space, differentiability-based, 18.7
foundation axiom, 5.5.1 function space, integrability-based, 20.11
foundations of mathematics, 1.5.1 function target set, 6.5.10
Fourier, Jean Baptiste Joseph, 2.9.10, 45.1.5 function template, 6.5.22
fractional differentiability, 42.4.0 function transpose, 6.12.3
fractional part function, 8.6.13 function tree, 4.3.12
Fraenkel, Abraham Adolf, 2.9.7, 45.1.6 function value, 6.5.11
Fraktur font, 14.3.14 function-valued function, 6.12.1
frame, inertial, 24.1.4 functional, linear, 10.5, 10.5.1
frame, orthogonal, 38.5.6 functional module, 4.1.1
frame, tangent, 27.12, 27.12.1 fundamental form, second, 38.1.4
frame, tangent operator, 27.12.4 fundamental tensor, 38.3.6
frame bundle, tangent, 34.9, 34.9.1 fundamental theorem of calculus, 20.2.3, 20.3.1, 20.6.3, 20.9.1, 21.0.1
framework, logic, 4.0.1
framework system, meta-logical, 5.7.22 Fux, Johann Joseph, 1.4.14
Fréchet, René Maurice, 45.1.6 fuzzy ideas, 2.4.3
free group, 10.10.4 Gaelic, Irish, 3.9.5
free linear space, 10.10, 10.10.1, 10.10.5 Galilean relativity, 29.0.3
free linear space standard immersion, 10.10.1 Galileo Galilei, 45.1.3
free linear space used to define tensor product, 13.12 Galileo transformation, 29.0.3
free variable, 5.1.24, 5.8.16 Galois, Evariste, 9.2.1, 45.1.5, 45.2.15
Frege, Friedrich Ludwig Gottlob, 2.4.6, 2.9.7, 45.1.6, 45.2.18 Galois theory, 33.0.5
friend or foe, 2.2.4 garden, 2.10.6
frog, 7.2.7 Gauß, Johann Carl Friedrich, 38.1.1, 38.1.2, 38.1.3, 38.1.4, 45.1.5, 45.2.8
FTOC (fundamental theorem of calculus), 20.2.2
function, 6, 6.5, 6.5.6 Gauß-Green theorem, 26.1.6
function, bijective, 6.5.23 Gaussian curvature operator, 19.5.3
function, continuous, 14.12 gelada baboon, 2.2.4
function, convex, 37.9, 37.9.1 general connection, alternative definitions, 35.9
function, domain/range specification, 6.5.5 general connection, covariant derivative, 35.7
function, empty, 6.5.17 general linear algebra, 9.10.4
function, function-valued, 6.12.1 general linear group, 10.9, 11.7, 36.0.1
function, identity, 6.5.19 general linear transformations, Lie group, 33.7.6
function, injective, 6.5.23 general relativity, 38.1.3, 39.1.4, 39.3
function, inverse, 6.5.25 general relativity global solution, 39.5
function, local, 6.11.3 general set intersection properties, 5.14
function, logical, constant, 4.12.12 general set union properties, 5.14
function, partial, 6.11, 6.11.3 general tensor algebra, 13.9
function, partially defined, 6.11, 6.11.3 generalization, false, 17.0.1
function, predicate logic, 4.12.11 generalized function, 27.1.1
function, real-valued, basic, 8.6 generated topology, 14.9.4
function, real-valued, higher-order differential, 31.1 geodesic, 36.8.8, 37
function, set-theoretic, 6.5.1 geodesic, length-parametrized, 40.5.0
function, surjective, 6.5.23 geodesic, minimum-length, 38.4.3
function, truth, 3.10.9, 4.2.4 geodesic, normal, 40.5.0
function argument, 6.5.11 geodesic coordinates, 37.8.1, 37.8.3
function composite, 6.7.1 geodesic curve, 37.2, 37.2.3
function composition, 6.7, 6.7.1 geodesic curve on two-sphere, 41.9, 41.9.3
function composition notation, 6.7.2 geodesic curves parametrized by endpoints, 37.7.1
function definition, inline, 14.12.19 geodesic interpolations, family, 37.7
function direct product, 6.9.11 geodesic leverage, 31.7.0
function direct product, pointwise, 6.9.12 geodesic leverage map, 37.9.3
function domain, 6.5.9 geodesic on two-sphere, affinely parametrized, 41.10
function extension, 6.5.31 geodesic path, 37.2.4
function family, 6.8.14 geodesic variation, equation, 40.4
function family, Cartesian product, 6.9 geodesics, family, one-parameter, 40.4.1
function image, 6.5.9, 6.5.10 geographical terrain, 3.5.10
function inverse set map, 6.6 geometric arts, 2.9.6
function-predicate versus function-set, 6.5.1 geometric measure theory, 13.10.0, 20.8
function product, direct, 6.9.11 geometric properties of solutions of BVPs, 37.9.3
function quotient, 6.7.7 geometry, a-priori, 3.4.4
function range, 6.5.9, 6.5.10 geometry, Babylonian, 38.1.5

geometry, Egyptian, 45.2.2 group, left transformation, 9.4

geometry, Euclid’s, axiomatization, 3.2.7 group, Lie, 33.0.1, 33.1, 33.1.1
geometry, Euclidean, 2.1.9, 26.1.4, 36.0.2, 38.1.5 group, Lie transformation, 33.7, 33.7.1
geometry, information, 38.10 group, locally Euclidean, 33.2.2
geometry, pre-metric, 38.0.1 group, matrix, 11.7
geometry, synthetic, 3.4.4 group, orthogonal, 24.2.8, 35.6.2, 41.8.0
geometry of the 2-sphere, 41 group, right transformation, 9.5
geophysics, 8.7.3 group, structure, 9.7.0, 43.1.0
germ, 27.1.11, 44, 44.9 group, structure, parallelism, 35.0.3
gif, Old English, 3.5.4 group, topological, 16, 16.7, 16.7.1
Gilgamesh, 3.2.8, 3.5.2, 3.12.3 group, topological, locally Euclidean, 33.1.2
Gilgamesh epic, conditional statements, 3.5.2, 45.4.1 group, transformation, Lie, 33.0.1
Gilgamesh epic, logical language, 3.5.2, 45.4.1 group, trivial, 43.1.0
global connectivity classification, 14.2.2 group centralizer, 9.3.27
global solution, general relativity, 39.5 group centre, 9.3.27
global tangent bundle on two-sphere, 41.7 group identity, 9.2.10
global warming, 3.10.6 group inverse, 9.2.13
glue, fibre set, 26.13.0 group left inverse, 9.2.13
glue, topological, 14.1.2 group morphism notations, 9.2.23
goblin, 2.10.6, 2.10.7 group morphisms, 9.2.22
Gödel, Kurt, 2.9.7, 45.1.6 group normalizer, 9.3.25
gold, 3.5.3 group of differentiable diffeomorphisms, 33.6.3
golden eggs, goose, 2.10.1 group of differentiable diffeomorphisms, topological, 33.6.5
golden tablets, 3.2.7 group of linear transformations of ℝⁿ, 10.9
goose, golden eggs, 2.10.1 group right inverse, 9.2.13
gradient in Riemannian space, 40.5.0 group size, social, 2.2.4
gradient operator, 19.2.3, 44 groups, abstract, 9.4.1
Gradus ad Parnassum, 1.4.14 groups, holonomy, 24.4.0
graft, set, 41.6.0 groups, transformation, 9.4.1
graft, topological, 42.3.3 groups of chapters, 1.3
graft of sets, 6.10.6 guesswork, brilliant, 1.4.7
graft topology, 25.6.1 Guinea, New, 3.2.8
grammar, 2.9.8 gyf, Old English, 3.5.4
grammar book, 2.2.7, 2.2.8 habilis, Homo, 2.2.4
grammatically correct, 2.5.18 Hadamard, Jacques Salomon, 5.9.3
granular space-time, 2.12.1 half-angle rules, trigonometry, 20.13.13
graph, acyclic, 5.7.19 half animal, 1.4.7
graphics, CGI, 2.4.6 half robot, 1.4.7
Grassmann, Hermann Günther, 13.1.7, 45.1.5 hall of mirrors, 5.7.19
Grassmann algebra, 13.1.7 Halmos, Paul Richard, 5.0.1
gravitational field, 2.11.18 Hamilton, William Rowan, 27.0.5, 45.1.5
gravity law, Newton’s, 21.0.4 Hammurabi Code of Laws, 3.5.3
great circle line, 41.9.2 hamster, 47.4.1
Greece, ancient, 2.1.6 handshake procedure, 2.1.4
Greek, ancient, logic, 3.7.11 hang, system, 3.10.4
Greek alphabet, 14.3.13 hard link, computer, 5.7.19
Greeks, ancient, 3.2.8, 3.7.18, 5.0.2 hardware, computer, 3.13.4, 3.13.8
Green-Stokes formula, 26.1.6 Harry’s dog, barking, 3.6.3
Gregorian year, 2.10.3 Hausdorff condition, topological manifold, 25.3.2
grey number, 2.10.1, 2.10.5, 2.11.12, 3.4.3 Hausdorff separation, 15.2.12
grey set, 2.10.1 Hausdorff space, 15.2.13, 25.2.4
grid, rectangular, 16.5.6 Hausdorff space condition, topological manifold, 35.1.2
grooming behaviour, 2.2.4 Hausdorff topology, 15.2.13
grooves in space, 29.0.4 Haydn, Joseph, 1.4.14
Grossmann, Marcel, 45.1.6 heaven, mathematical, ontology, 2.3.10
group, 9.2, 9.2.4, 9.2.18 Heaviside function, 8.6.6
group, Abelian, 9.2.21 Heine-Borel compactness, 15.7.5
group, commutative, 9.2.20 Heine-Borel theorem, 5.9.10, 5.9.13, 17.3.26, 17.3.27
group, diffeomorphism, 33.0.1, 33.6 Hermite, Charles, 2.4.5, 45.1.6
group, differentiable, 27.8.0, 33 Hesse, Ludwig Otto, 45.1.5
group, direct product, 9.3.30 Hessian matrix, 26.6.9, 26.6.10, 38.2.2
group, direct product of family, 9.3.30 Hessian of second derivatives of distance function, 39.1.1
group, essence, 5.16.2 Hessian operator, 29.6.1, 36.6, 36.7.2
group, finite, 7.4.5 Hessian operator at a point, 36.6.3
group, free, 10.10.4 Hessian operator at critical point, 31.6
group, general linear, 10.9, 11.7, 36.0.1 hieroglyphics, Egyptian, 2.5.5, 3.10.6
group, homotopy, 16.6.0 high-level set language, 5.6.2
group, left module, 9.9.17 higher-order derivatives for real-to-real functions, 18.4

higher-order derivatives for several variables, 18.6 human procedure, logic, 3.4.3
higher-order differentiability, 18.4.1 human speech, 2.2.4
higher-order differential, 31 hunters, mammoth, decision-making, 3.6.1
higher-order differential of curve, 31.3 Huygens, Christiaan, 45.1.3
higher-order differential of curve family, 31.4 hydrogen, 2.1.5
higher-order differential of differentiable map, 31.2 hydrolysis, 14.8.1
higher-order differential of real-valued function, 31.1 hyperbolic first order systems of PDE, 20.10.0
higher-order operator, differentiable map, differential, 31.7 hyperbolic metric, 17.1.15
higher-order operator, differential of curve, 31.8 hyperbolic PDE, 21.4.0
higher-order operator, real-valued function, differential, 31.5 hyperboloid, 42.8
higher-order tangent operator, 29.1 hyperplane segment through points, affine space, 12.2.22
higher-order tangent space, 29.4 hyperplane through points, affine space, 12.2.21
higher-order tangent vector, 29, 29.3 hypothesis, continuum, 5.0.10
higher-order vector field, 29.7
ibn Mūsā, Abū Ja’far Mohammed, 45.2.3
higher-order vector field for family of curves, 29.8
ice cream, 3.3.10
Hilbert, David, 33.0.5, 45.1.6
ideal, Platonic, 2.3.3
Hilbert space, 10.2.22
ideal of ring, 9.8.7
Hilbert’s fifth problem, 33.0.5, 33.2
ideas, big, 1.4.11
Hipparchus, 45.1.1
ideas, fuzzy, 2.4.3
Hippocrates of Chios, 7.4.6, 45.1.1
ideas, Plato’s theory, 2.4
history of differential geometry, 45
idempotent linear automorphism, 10.1.5
history of tensor calculus, 40.1
identification map, standard, for Cartesian product, 7.7.6
history of topology, 14.2 identification set, 6.4.5
Hobby, John D., 1.10 identification space, 6.4.5, 6.10, 6.10.5
Hoekwater, Taco, 1.10 identification space, topological, 25.6.0
Hölder condition on Riemannian space, 39.2.2 identification space method, 23.12.10
Hölder continuity, 18.9 identification space representation, 6.10.6
Hölder continuous function, 18.9.1 identity, 2.4.3, 2.4.4
Hölder-continuous manifold, 42.4 identity, group, 9.2.10
Hölder function space, 44.7 identity, Jacobi, 9.11.0, 9.11.1, 9.11.3
holonomy groups, 24.4.0 identity, set, 5.2.6
homeomorphism, 14.1.10, 14.13 identity element of unitary ring, 9.8.2
homeomorphism, directionally differentiable, 19.6 identity function, 6.5.19
homeomorphism pseudogroup, 14.2.3, 19.4.2, 19.4.3, 19.4.5, 19.4.6, 19.4.9, 19.6.4, 19.6.7
identity matrix, 11.1.20
if and only if, 4.3.3
homeomorphism pseudogroup, complete, 19.4.7 IF-construction, 3.5.4
Homer, 3.2.8 Iliad, 5.0.2
hominid, 2.2.4 image of function, 6.5.9, 6.5.10
Homo erectus, 2.2.4 image of path, 16.4.2
Homo habilis, 2.2.4 image of relation, 6.3.6, 6.3.7
homology theory, singular, 16.6.0 image of set by relation, 6.3.13
homomorphism, Lie algebra, 9.11.6, 9.11.16 images, digital, 2.9.1
homomorphism, linear space, 10.3.6 imaginary world, model, 2.10.1
homomorphism, order, 7.1.13 imaginative process, predicate calculus, 4.14.4
homomorphism between modules, 9.9.10 immersion of manifold, 30.3.10
homomorphism module, 9.9.5, 9.9.14, 9.9.26 imperative verb mood, 3.10.14, 3.12.1, 3.12.3
homotopy continuous, 24.4.1 imperfect universe, 3.4.3
homotopy group, 16.6.0 implication, logical, 4.3.3
horizontal component of total tangent space, 27.11, 27.11.2 implication-based propositional calculus, 4.7
horizontal component of vector, 26.13.8 implication operator, primacy, 4.6.5
horizontal lift function for ordinary fibre bundle, 35.3 implication operator chain, 4.6.5
horizontal lift function for principal fibre bundle, 35.5 implies, 4.3.3
horizontal lift function on differentiable fibre bundle, 35.3.5, 35.3.12
import, concrete equality relation, 4.15.2
importation, logical operator, 4.2.1, 4.3.1
horizontal lift function on principal bundle, transposed, 35.5.3 imported mathematical structure, 2.6.2
horizontal map function for principal fibre bundle, 35.9.2 inclusion of sets, cyclic, 4.1.9
horizontal vector, 10.11.11 inclusive disjunction, 3.5.5
horizontal vector on principal fibre bundle, 35.9.4 inclusive or, 3.5.2, 3.5.7, 3.13.7, 4.3.5
horse, 1.5.4 incomplete axiomatic system, 3.8.2
hours of daylight calculation, 41.15 incomplete information flow, 3.8.2
Hoyle, Frederick (Fred), 45.1.6 incomplete information transfer, 3.8
Hrunting, 45.4.2 incompressible number, 2.10.4, 2.11.2
Hülle, abgeschlossene, 14.5.6 inconsistency, world-model ontology, 5.7.12
human brain, 4.2.8 inconsistency-tolerant logic, 4.1.5, 5.7.11, 5.7.12
human communication, finite bandwidth, 2.11.14 inconsistent axioms, 4.1.9, 4.12.7
human language, 3.2.8 inconsistent logic machine, 3.10.4
human literature, 3.8.2 inconsistent truth value function, 4.1.5
human minds, 2.5.7 index convention, Einstein, 7.11.15, 13.8.5, 13.9.18, 13.10.2

index function, permutation, 7.10.11 information geometry, 38.10

index lowering, 13.9.18 information matrix, Fisher, 38.10.1
index of this book, 50 information transfer, incomplete, 3.8
indexed, differentiable, 26.3.2 initial index, list, 7.12.3
indexed atlas, differentiable, 26.3.2 initial point of curve, 16.2.13
indexed differentiable atlas, 26.3.2 initial point of path, 16.4.11
indicative verb mood, 3.7.1, 3.10.11, 3.12.1, 3.12.2 initial value problems, 21.4
indicator function, 7.9.2 injection, 6.5.23
indirect method of proof, 3.11.1, 3.11.9 injective function, 6.5.23
individual constant, 4.12.12 injective relation, 6.3.31
Indo-European languages, 3.12.2 inline function definition, 14.12.19
induced map, 30.1 inline proof, 4.9.10
induced map for tagged tangent operator, 30.3.24 inner automorphism, 9.3.22
induced map for tangent operator, 30.3.23 inner product, 10.8
induced map for vector field, 30.3.14 inner product, Euclidean, standard, 10.8.5
induced map of differentiable map, 30.3.6 inner product on Riemannian manifold, 38.8
induced topology of differentiable atlas, 26.3.4 insert function for list, 7.12.2
induced topology on a set by a function, 14.12.5 insertion, proposition list, 3.7.14, 3.10.2
induction, 3.7.8 inside, 5.7.17
induction, mathematical, 2.11.1, 7.3.4 inside skew product of transformation groups, left, 9.6.8
induction, mathematical, principle, 2.10.1 inside skew product of transformation groups, right, 9.6.9
induction, mathematical, validity, 2.11.17 insight, 1.9.2
induction, naive, 6.1.15, 7.2.6 integer, 7
induction, transfinite, 18.4.1 integer, composite, 7.4.2
induction argument example, 14.3.8 integer, compressible, 2.11.6
induction-capable machine, 5.7.25 integer, even, 4.13.10
induction principle invalidity, 2.11.3 integer, extended, 7.6, 7.6.2
inductive definition, 18.4.1 integer, largest, 2.11.5, 2.11.7
inequality, triangle, 17.1.12, 17.1.14 integer, nearest, distance function, 8.6.19
inference from experience, 7.2.1 integer, nearest, function, 8.6.14
infimum of partially ordered set, 7.1.10 integer, negative, 7.5.4
infinite, 2.11.20 integer, negative extended, 7.6.4
infinite-dimensional linear space, 10.2.5 integer, philosophy, 2.11
infinite-dimensional manifold, tangent bundle, 27.16 integer, positive, 7.5.4
infinite paper, 2.10.6 integer, positive extended, 7.6.4
infinite proposition space, 4.12.2 integer, prime, 7.4.2
infinite sequence, termination, 2.11.22 integer, semi-pixie, 2.11.10
infinite set, Dedekind, 7.2.24 integer, signed, 7.5, 7.5.2
infinite versus unbounded, 5.5.2 integer arithmetic, unsigned, 7.4
infinite world, 2.10.9 integer definitions, boot-strap, 7.2.5
infinitesimal, 36.3.1 integer notation summary, 1.6.1
infinitesimal action, 35.3.1 integer representation, 27.1.4
infinitesimal curve, 32.2.0 integer set notations, 7.5.4
infinitesimal transformation, 33.8, 34.3.0 integer set notations, extended, 7.6.4
infinitesimal transformation of Lie right transformation group, 33.8.10
infinitesimal transformation of Lie transformation group, 33.8.4
integers, quantum mechanics, 3.4.3
integrability-based function space, 20.11
integral, Lebesgue, 20.2
integral calculus, 20
infinitesimal vector, 19.1.6 integral calculus and logic, 4.4.2
infinitesimals, 18.1 integral of Riemannian metric, 38.4.2
infinity, 8.4.4 integration, 20
infinity, absurdly large, 2.0.3 integration, Lebesgue, 20.2
infinity, concept, 7.2.1 intellect, superior, 3.3.4
infinity, logical quantifier, 4.13.10, 4.13.12 intellectual capital, 4.7.7
infinity, ontology, 2.10.1 intellectual economy, 2.11.14
infinity, philosophically troubling, 14.2.5 intelligence, artificial, 2.3.1
infinity, philosophy, 2.11 inter-galactic civilisations, 2.0.3
infinity axiom, 2.10.6, 2.10.7, 7.2.6 interior of set, 14.5
infinity axiom, ZF, 2.11.13, 5.6 interior of set, topological, 14.5.1
infix notation, 6.3.22 interior point of open set, 14.3.9
information bottleneck, cosmic, 2.11.17 interior point of set, 14.0.2
information content, 2.5.15 internal direct sum of linear spaces, 10.6.5
information content, logical conjunction/disjunction, 3.5.9 international dateline, 41.1.5
information content, logical operator, 3.5.5 Internet protocols, 2.5.7
information content, proposition, 3.8.1 Internet resources, 49.0.1
information content, topology, 14.8.10 Internet standards, 2.5.13
information content, universal/existential quantifier, 4.13.9 interoperability of communication systems, 2.8.9
information flow, incomplete, 3.8.2 interoperability tests, 2.5.2

interpolation, convex curvilinear, affine manifolds, 37.6 k-combination, 7.11.1

interpolation, convex curvilinear, affine space, 16.5 k-permutation, 7.11.1
interpretation, domain, propositional calculus, 4.12.8 kangaroo, roast, 2.2.7
interpretation, logic, 3.3.3 Kant, Immanuel, 3.4.4, 45.1.5
intersection of family of sets, 6.8.5 Kantian philosophy, 25.1.2
intersection of sets, 5.13.2 Kepler, Johannes, 45.1.3
intersection of sets, properties, binary, 5.13 key, secondary, 2.6.1
intersection of sets, properties, general, 5.14 Khwārizmi, al, 45.2.3, 45.2.4
inertial frame, 24.1.4 Klein, Felix, 14.2.3, 19.4.2, 23.5.0, 45.1.6, 45.3.0
interval, open, 14.10.2 Klein bottle, 43.2.2
interval, real numbers, 8.3.10 kludge, ad-hoc, 5.7.27
interval, unit, 8.3.10 knitting, 2.9.6
introduction, formal notation, 5.8.6 knowledge, 1.8.1
introduction to the book, 1 knowledge, a-priori, 2.1.3, 2.5.4
introspection, 2.5.20 knowledge, absolute, 2.5.14
intuition, 5.0.3 knowledge, cyclic, 2.1.3
intuition, diagrams, 3.6.2 knowledge, mathematical, corpus, 3.2.5, 5.0.3
intuitionism, 2.2.8, 3.6.4 knowledge, prior, 3.0.1
intuitionism, 5.9.3 knowledge bedrock, 2.0.1
intuitive topology, 14.3.9 knowledge wheel, 2.0.1
invariant of diffeomorphism, 36.1.5 known unknown, 4.2.8
invariant of transformation group, 9.7 Knuth, Donald Ervin, 1.10
invariant vector field, right, 33.4.7 Konnektion, 35.1.3
inverse, group, 9.2.13 Kontinuum, 25.1.9
inverse function, 6.5.25 Körper, 9.8.10
inverse image of set by relation, 6.3.13 Kronecker, Leopold, 2.12.5, 7.2.7, 45.1.5
inverse-image topology, 14.12.6 Kronecker delta function, 7.9.1, 7.9.10, 11.1.22, 22.4.0, 27.7.5
inverse matrix, left, 11.1.24 Kuratowski, Kazimierz (Casimir), 6.1.5, 45.1.6
inverse matrix, right, 11.1.24
inverse of linear map, 10.4.1 lacto-vegetarian, 5.15.6
inverse problem, algebra, 4.4.2 ladder, Schild’s, 36.7.10, 36.8.8
inverse problem, logic, 4.4.2 Lagrange, Joseph-Louis, 2.9.10, 45.1.4, 45.1.5
inverse relation, 6.3.27, 6.5.25 Lagrangian mechanics, connection, 36.11
inverse set map, function, 6.6 land measurement, 45.2.2
inverse set map corresponding to a function, 6.6.1 language, 3.5.1
inverse trigonometric function, 20.13.2 language, discussion, 3.10.6
invertible linear transformation, 9.11.15 language, first order, with equality, 5.2.3
invertible matrix, 11.3.15 language, human, 3.2.8
invisible pixie, 2.2.8, 2.10.7 language, location, 2.5.7
Iraq, 3.5.2 language, logic, low-level, 5.6.2
Irish Gaelic, 3.9.5 language, logical, 3.11.7
irrational numbers, 2.9.3, 2.12.4, 2.12.5 language, logical, ancient literature, 45.4
is-proposition, 3.3.10 language, logical, Bēowulf, 45.4.2
isolated point of set, 14.7.5 language, logical, Gilgamesh epic, 3.5.2, 45.4.1
isometry on two-sphere, 41.8 language, mathematical, 2.5.7
isomorphism, Lie algebra, 9.11.8 language, natural, 3.10.6, 3.12.1
isomorphism, linear space, 10.3.6 language, natural, logic, 3.13.1
isomorphism, order, 7.1.13 language, predicate, name-to-object map, 4.12.3
isomorphism test, 2.8.6 language, set, high-level, 5.6.2
IVP (initial value problem), 21.0.2 language, steering, 2.5.7
language families, natural, 1.4.13
Ja’far Mohammed ibn Mūsā, Abū, 45.2.3 languages, Indo-European, 3.12.2
Jacobi, Carl Gustav Jacob, 45.1.5 Laplace, Pierre-Simon, 2.9.10, 45.1.4, 45.1.5
Jacobi field, 37, 37.3, 37.3.2 Laplace-Beltrami operator, 38.7.3
Jacobi field on two-sphere, 41.13 Laplacian in conical coordinates, 42.7.0
Jacobi identity, 9.11.0, 9.11.1, 9.11.3 Laplacian operator, 19.5.3, 29.0.6, 29.7.7, 38.7.1, 38.7.2
Jacobian matrix, 19.1.4, 26.6.10, 30.3.15 Laplacian operator in Riemannian space, 40.5.0
Jacquard, Joseph-Marie, 2.5.14 large infinity, absurdly, 2.0.3
Jacquard loom design, 2.5.14 largest integer, 2.11.5, 2.11.7
jet, 27.1.11, 44, 44.10 largest topology, 14.4.1
jigsaw puzzle, 0.0 Latin, 2.1.2, 3.9.5, 4.6.8
joint denial, 4.3.6, 4.6.4 latitude, 25.4.5
joint denial operator, 4.3.8 law, ancient history, 3.10.14
joke, mathematics, 43.2.6 law, de Morgan’s, 5.13.11, 5.14.8
Jones, William, 45.2.13 law, Scottish, not proven, 3.10.15
Jordan arc, 16.2.11 Laws, Code, Hammurabi, 3.5.3
Jordan factorial function, 7.10.16 laws of motion, 20.6.18
journey, 16.1.1 layer, boot-strap, 2.1.6
juxtaposition, symbolic, 2.5.12 layer, magma, 3.14.1

layers, five, linguistic structure, predicate calculus, 4.12.4 length of multi-index, 18.6.1
layers, five, linguistic structure, propositional calculus, 4.5.4 length of ordered traversal, 17.5.9
layers of structure of differential geometry, 1.1 length of rectifiable path, 17.5.17
learning, animal, 7.2.1 length of vector in Riemannian manifold, 38.3.7
learning, asterisk method, 1.9.1 length-parametrized geodesic, 40.5.0
learning, rote, 1.9.2 Leonardo da Vinci, 3.9.8
learning mathematics, 1.9 level curve, 16.1.7
least action principle, 36.11.1 level manifold, 16.1.7
Lebesgue, Henri Léon, 20.1.1, 45.1.6 leverage, geodesic, 31.7.0
Lebesgue dimension, 15.9.1, 33.2.5 leverage map, geodesic, 37.9.3
Lebesgue integration, 20.2 Levi-Civita, Tullio, 35.1.5, 45.1.6
Lebesgue measure, 13.10.0, 20.1 Levi-Civita alternating symbol, 7.9.1, 7.10.20
Lebesgue non-measurable set, 2.10.6, 2.11.9, 4.13.10, 5.9.3, 5.9.10, 5.9.11, 20.1.3, 20.1.4
Levi-Civita connection, 38.5, 38.5.2, 38.5.5, 38.7.2, 41.5.2
Levi-Civita connection, Euclidean space, 12.4.1
Lebesgue non-measurable set, unknowable existence, 2.10.8 Levi-Civita connection, globality, 39.1.5
Lebesgue number, 17.3.22, 17.3.24 Levi-Civita connection, metric layer, 35.0.4, 38.0.1
Lebesgue non-measurable set, 3.2.1 Levi-Civita connection, orthogonal, 38.2.5
left A-module, 9.9.8 Levi-Civita connection, parallelism, 35.2.3, 36.1.2
left action on Lie right transformation group, 33.8.9 Levi-Civita connection, parallelism at a distance, 24.1.1
left-associative operator, 4.3.12 Levi-Civita connection, tensorization, 29.2.7
left conjugate of a subset of a group, 9.3.15 Levi-Civita connection, tensorization coefficients, 36.7.10
left conjugation map, 9.3.22 Levi-Civita connection on two-sphere, 41.2.2
left conjunct, 4.3.11 Levi-Civita symbol, 7.10.11, 7.10.21, 7.10.22
left coset of subgroup, 9.3.3 Levi-Civita tensor, 7.10.20
left-differentiable function, 18.3.4 lexicographic order, 16.1.8
left disjunct, 4.3.11 liar’s paradox, 5.7.13, 5.7.22
left inside skew product of transformation groups, 9.6.8 library, software, 4.7.7
left invariant vector field, Lie group, 33.3.11 Lie, Marius Sophus, 19.4.1, 33.0.5, 45.1.5
left invariant vector field on Lie group, 33.3 Lie algebra, 9.11, 9.11.1
left inverse, group, 9.2.13 Lie algebra, adjoint, 9.11.22
left inverse matrix, 11.1.24 Lie algebra, adjoint representation, 9.11.19
left module, unitary, 10.1.3 Lie algebra, real, 9.11.11
left module over a ring, 9.9.19 Lie algebra associated with associative algebra, 9.11.10
left module over a ring, unitary, 9.9.21 Lie algebra homomorphism, 9.11.6, 9.11.16
left module over group, 9.9.17 Lie algebra linear representation, 9.11.17
left-open set, 18.3.2 Lie algebra morphisms, 9.11.8
left outside skew product of transformation groups, 9.6.4 Lie algebra of Lie group, 35.3.7, 41.5.2
left-side membership relation, 5.2.6 Lie algebra of Lie group, tangent-space version, 33.5.1
left transformation group, 9.4, 9.4.4 Lie algebra of Lie group, vector-field version, 33.5.2
left transformation group, topological, 16.8.17 Lie algebra on Lie group, 33.5
left transformation group homomorphism, 9.4.9 Lie algebra representation, 9.11.17
left transformation group homomorphism, topological, 16.8.6 Lie algebra representation space, 9.11.17
left transformation group mirror image, 9.6.1 Lie connection, 32.4.3, 32.4.4
left transformation group of a topological space, effective, 16.8.7
left transformation group of topological space, 16.8.2
left transformation group of topological space, effective topological, 16.8.8
left transformation group of topological space, topological, 16.8.3
Lie derivative, 32.4.9
Lie derivative expression for exterior derivative, 20.7
Lie derivative of tensor field, 32.5
Lie derivative of vector field, 32.4
Lie group, 33.0.1, 33.1, 33.1.1
Lie group, left invariant vector field, 33.3
Lie group, left translation operator, 33.3.2, 33.7.9
left transformation semigroup, 9.4.2 Lie group, right invariant vector field, 33.4
left translation operator, 33.3.1 Lie group, right translation operator, 33.4.2
left translation operator for tangent vectors, 33.3.7
left translation operator of Lie transformation group, 33.7.8
Lie group, right translation operator for tangent vectors, 33.4.3
left translation operator on Lie group, 33.3.2, 33.7.9 Lie group left invariant vector field, 33.3.11
left/right transformation group ambiguity, 2.5.4, 5.16.6, 33.8.8
Lie group Lie algebra, 33.5, 35.3.7
Lie group Lie algebra, tangent-space version, 33.5.1
leftward path, set membership, 5.5.2 Lie group Lie algebra, vector-field version, 33.5.2
legal system, Mesopotamia, 3.7.11 Lie group of general linear transformations, 33.7.6
Legendre, Adrien Marie, 2.9.10, 20.13.22, 45.1.5 Lie group space, 33.7.2
Leibniz, Gottfried Wilhelm, 18.2.11, 20.1.1, 45.1.4, 45.1.5, 45.2.11
Leibniz rule, 44.0.0
Lie left transformation group, analytic fibre bundle, 34.2.5
Lie right transformation group, infinitesimal transformation, 33.8.10
Leibniz rule for derivative, 27.1.1 Lie right transformation group, left action, 33.8.9
Leibniz rule tangent vector, 44.1.1 Lie structure group, 32.2.0
length, Planck, 2.12.1 Lie structure group, differentiable fibre bundle, 34.2, 34.2.2
length of curve, 17.5.9, 40.5.0 Lie subalgebra, 9.11.5
length of list, 7.12.2 Lie transformation group, 33.0.1, 33.7, 33.7.1

Lie transformation group, infinitesimal transformation, 33.8.4 linear space direct sum, abstract, 10.6.3
Lie transformation group, left translation operator, 33.7.8 linear space direct sum, external, 10.6.1
Lie transformation group, right action, 33.8.3 linear space direct sum, formal, 10.6.1
Lie transformation group space, 33.7.2 linear space double bidual, 10.5.19
Lie transport, 32.4.5 linear space dual dimension, 10.5.10
Lie transport curve, 32.4.6 linear space eigenspace, 10.4.1
lift function, 10.11.11 linear space eigenvalue, 10.4.1
lift function, horizontal, for ordinary fibre bundle, 35.3 linear space eigenvector, 10.4.1
lift function, horizontal, on differentiable fibre bundle, 35.3.5 linear space endomorphism eigenspace, 10.4
lift function, tangent bundle, 27.2.1 linear space exact sequence, 10.11.1
lift function, tangent bundle, unidirectional, 27.14.2 linear space homomorphism linear space, 10.3.4
lift function for principal fibre bundle, horizontal, 35.5 linear space morphism notations, 10.3.7
lift function on differentiable fibre bundle, 35.3.12 linear space morphisms, 10.3.6
lift function on principal bundle, horizontal, transposed, 35.5.3
lift of vector field by connection on principal fibre bundle, 35.5.8
linear space of linear space homomorphisms, 10.3.4
linear space of multilinear maps, 13.3, 13.3.3
linear space of rectangular matrices, 11.1.7
linear space of vector fields, 28.5.8
light, speed, 38.1.5 linear space quotient, 10.7
light, visible, 3.4.5 linear space second dual, 10.5.18
limit, philosophically troubling, 14.2.5 linear space sequence direct sum standard injection, 10.6.4
limit notation, 14.12.16 linear space subspace, 10.2.1
limit of function at a point, 14.12.15 linear space tensor product, 13.6
limit of sequence of points, 14.12.27 linear subspace, 10.2
limit point of sequence, 14.12.26 linear transformation, invertible, 9.11.15
limit point of sequence of points, 14.12.27 linear transformations, general, Lie group, 33.7.6
limit point of set, 14.7.1, 14.12.26 linear transformations of ℝⁿ, group, 10.9
limit processes, 14.8.1 linguist, 2.2.8
limit set of function at a point, 14.12.15 linguistic structure, predicate calculus, five layers, 4.12.4
limits, metaphysics, 4.13.12 linguistic structure, propositional calculus, five layers, 4.5.4
line bundle, 34.7.2 linguistic style, symbolic logic, 4.0.2
line segment, affine space, 12.2.16 linguistics, 2.2.7
line segment, properties, affine space, 12.2.19 linguistics, anthropological, 2.2.4
line through points, affine space, 12.2.13 link, hard, computer, 5.7.19
line through points, properties, affine space, 12.2.14 Lipschitz, Rudolf Otto Sigismund, 45.1.6
linear algebra, 10 Lipschitz constant, 17.4.10
linear automorphism, idempotent, 10.1.5 Lipschitz continuity, 17.4.10
linear combination of vectors, 9.7.0, 9.7.1, 10.2.8, 10.2.22 Lipschitz curve transition map, 16.1.8
linear combination of vectors, formal, 10.10.4 Lipschitz function, 17.0.4, 17.4.10
linear functional, 10.5, 10.5.1
linear group, general, 36.0.1
Lipschitz manifold, 19.6.8, 24.1.7, 26.11.1, 26.12, 26.12.4, 42.4.0
linear map, 10.3, 10.3.1 Lipschitzian function, 17.4.10
linear map, component matrix, 11.2 Lisa, Mona, 3.9.8
linear map, exact sequence, 10.11 list, initial index, 7.12.3
linear map between modules over a ring, 9.9.25 list concatenation function, 7.12.2
linear map component matrix, 11.2.1 list element insert function, 7.12.2
linear map exponential, 10.4.1 list element substitute function, 7.12.2
linear map for a component matrix, 11.2.3 list element swap function, 7.12.2
linear map inverse, 10.4.1 list insertion, proposition, 3.7.14, 3.10.2
linear map transpose, 10.5.23 list length function, 7.12.2
linear representation of a Lie algebra, 9.11.17 list notation, 4.12.3
linear representation of associative algebra, 9.10.7 list omit function, 7.12.2
linear space, 9.9.2, 10.1, 10.1.2 list operation, 7.12.2
linear space, coordinate-free, 12.0.1 list operation on ring, 9.12.2
linear space, Euclidean, 10.2.20 list operation on semigroup, 9.12.1
linear space, finite-dimensional, 10.2.5 list operations for sets with algebraic structure, 9.12
linear space, free, 10.10, 10.10.1, 10.10.5 list product function, 9.12.2
linear space, infinite-dimensional, 10.2.5 list projection function, 9.12.1
linear space, internal direct sum, 10.6.5 list restriction function, 7.12.2
linear space, quotient, 10.7.1 list space, 7.12.2
linear space, topological, 16.9.1 list space, extended, 7.12.5
linear space axioms, 2.8.7 list space for general sets, 7.12
linear space basis, 10.2.23 list subsequence function, 7.12.2
linear space basis, finite, 10.2.9 Listing, Johann Benedict, 14.2.1, 45.1.5, 45.2.17
linear space basis existence, 5.9.10, 5.9.12, 10.2.25 listing, putative, 2.11.21
linear space bidual, 10.5.19 literature, 1.4.8
linear space differentiation, 18.8 literature, ancient, logic in, 3.5
linear space dimension, 10.2.5 literature, ancient, logical language, 45.4
linear space direct sum, 10.6 literature, human, 3.8.2

literature, mathematical, 5.0.3 logic expression parse-tree, 4.3.10

local continuity analysis, 14.2.2 logic flavours, 4.0.1
local diffeomorphism family, 32.5.0 logic function, predicate, 4.12.11
local function, 6.11.3 logic in ancient literature, 3.5
local maximum, 26.7.6 logic interpretation, 3.3.3
local maximum on differentiable manifold, 26.6.8 logic language, low-level, 5.6.2
local minimum on differentiable manifold, 26.6.8 logic machine, 3.7.5, 3.7.7, 3.10.1, 4.2.8
local transformations, one-parameter group, 26.8.4 logic machine, inconsistent, 3.10.4
locally compact topology, 15.7.12 logic machine, temporal parameter, 3.7.9
locally connected topological space, 15.4.20 logic machine model, 3.6.1
locally Euclidean group, 33.2.2 logic machine model, recursive, 3.3.6
locally Euclidean space, 25.2, 25.2.3 logic machine network, 3.7.14
locally Euclidean space, non-Hausdorff, 42.3 logic machine reboot, 3.10.5
locally Euclidean topological group, 33.1.2 logic metaphor, 3.5.10
locally Euclidean topological space, 25.3.0 logic methods, 4
locally finite subset of a topological space, 15.7.3 logic model, 3.3.3
location, mathematical objects, 2.7.1 logic ontology, proposition-store, 3.6, 3.7
location language, 2.5.7 logic ontology, world-view, 3.6
logarithm function, 20.12, 20.12.1 logic operator, arithmetic equivalent, 4.3.15
logic, 4 logic operator notations, 4.3.6
logic, abstract, 3.9.6, 4.1.11 logic problem, 4.1.7
logic, analytic, 3.9.7 logic procedures, 4
logic, animal, 3.5.10 logic procedures formalization, 4.5.1
logic, anthropology, 3.2.8 logic quantifier duality, 4.13.10
logic, applicability, 3.9.8 logic quantifier notations, 4.13.7
logic, Aristotelian, 3.4.2, 3.11.9 logic remarks, 2.9
logic, art, 3.2.3 logic service provider, 3.3.8
logic, automated, 4.3.10 logic structure, 3.3.3
logic, axiomatic reformulation, 3.13.4, 7.13 logic styles, 4.0.1
logic, colloquial, confusion, 3.5.7, 3.13.7 logic substitution rule, 4.9.1
logic, completeness, 4.1.7 logic symbols, 4.3.5
logic, contradiction-free, 3.10.12 logic syntax/semantics, 4.3.10
logic, correct, 3.1.3 logical algebra, 4.14.4
logic, foundation, 3.0.1 logical algebra, formalized, 3.1.2
logic, human procedure, 3.4.3 logical argument, 3.7.8
logic, inconsistency-tolerant, 4.1.5, 5.7.11, 5.7.12 logical argumentation, 4.4
logic, inverse problem, 4.4.2 logical conjunction, 4.3.3
logic, mathematical, tour, 3.1.2 logical disjunction, 4.3.3
logic, mathematical, true nature, 3.1.2 logical equation, 4.4.3
logic, mechanization, 3.7.5, 4.0.2 logical equations, simultaneous, 4.4.2
logic, mental process model, 3.11.3 logical expression, 4.3
logic, modern, universality, 3.4 logical expression, exhaustive substitution, 4.14.3
logic, naive, 3.2.1, 4.9.5 logical expression, non-atomic, 3.5.4
logic, natural language, 3.13.1 logical expression evaluation, 4.4
logic, paraconsistent, 4.1.5, 5.7.11 logical expressions, equivalent, 3.10.13
logic, predicate, 3.1.2 logical function, constant, 4.12.12
logic, proposition-store ontology, 3.11.6 logical language, 3.11.7
logic, propositional, world-model ontology, 3.6.3 logical language, ancient literature, 45.4
logic, science, 3.2.5 logical language, Bēowulf, 45.4.2
logic, self-consistency, 4.1.7 logical language, Gilgamesh epic, 3.5.2, 45.4.1
logic, semantics, 3 logical negation, 3.10.1
logic, semantics-free, 4.5.1 logical negation, double, 3.11.6
logic, symbolic, set theory, 5.0.3 logical negation, semantics, 3.10
logic, synthetic, 3.9.7 logical operation, abstract-to-concrete map, 3.13.1
logic, three-truth-value, 3.11.7 logical operation tree, 3.10.13
logic, two-truth-value, 3.11.7 logical operator, 3.10.10, 4.3, 4.11
logic, wild, 2.2.7 logical operator, associativity, 4.3.12
logic, world-model ontology, 3.11.6 logical operator, asymmetric, 4.6.2
logic algebra, 3.1.2 logical operator, information content, 3.5.5, 3.5.9
logic and set theory, circularity, 3.3.8 logical operator, principal connective, 4.3.11, 4.3.12
logic and set theory, cyclic, 2.1.6 logical operator, zero-operand, 4.3.18, 4.3.19
logic application, 3.3.3 logical operator importation, 4.2.1, 4.3.1
logic applications, 3.2.5 logical operators, defined by deduction rules, 4.6.3
logic axiom of distributivity, 4.7.5 logical predicate, zero-parameter, 4.12.10
logic axiom of restriction, 4.7.5 logical proposition, verb mood, 3.12
logic circuit, electronic, 4.5.5 logical quantifier, infinity, 4.13.10, 4.13.12
logic customer, 3.3.8 logical quantifiers, 4.13
logic discussion, 3.3.4 logical sub-expression, parenthesized, 4.3.12

logical thinking, 3.2.8 map, differentiable, higher-order differential, 31.2

logical voltage, 4.1.8 map, exponential, 37.8, 37.8.2
longitude, 25.4.5 map, name-to-object, predicate language, 4.12.3
loom design, Jacquard, 2.5.14 map, regular, 30.3.9
loop, modelling, 3.3.7 map, set, function, 6.6
Lorentz transformation, 29.0.3 map, truth value, 4.1.3
Lorentzian relativity, 29.0.3 map from linear space to second dual, canonical, 10.5.20
low-level logic language, 5.6.2 map projection for two-sphere, standard, 41.16
lower bound of partially ordered set, 7.1.10 mapping, 6.5.4
lowering the indices, 13.9.18 martian robots, 3.4.1
lowest-level concepts, 3.0.1 mat, cat, sat, 3.10.10
lucky dip, 5.11.5 mathematical class, 5.16.2
Łukasiewicz, Jan, 4.7.1 mathematical class notations, 5.16.5
luminiferous aether, 24.1.4 mathematical class ontology, 2.5.4
mathematical class parameters, 5.16.5
Mach, Ernst, 45.1.6
mathematical deduction, 2.9.7
Mach’s principle, 24.1.4, 39.1.5
mathematical definitions, naturalism, 5.0.4
machine, finite, 2.10.9
mathematical induction, 2.11.1, 7.3.4
machine, induction-capable, 5.7.25
mathematical induction, principle, 2.10.1
machine, logic, 3.7.5, 3.7.7, 3.10.1, 4.2.8
mathematical induction, validity, 2.11.17
machine, logic, network, 3.7.14
mathematical knowledge, corpus, 3.2.5, 5.0.3
machine, mathematical thinking, 4.2.8
mathematical language, 2.5.7
machine, world-model, 3.6.1
mathematical logic, modelling, 3.2.5
machine model, logic, 3.6.1
machine model, logic, recursive, 3.3.6 mathematical logic, span, 3.2.5
machine reboot, logic, 3.10.5 mathematical logic, tour, 3.1.2
machines, virtual, 2.5.15 mathematical logic, true nature, 3.1.2
Maclaurin, Colin, 45.1.4 mathematical object, 5.16.2
Maclaurin series, 13.1.2 mathematical objects, extraneous properties, 2.7.2
magma, molten, 2.1.1, 2.1.3 mathematical objects location, 2.7.1
magma layer, 3.14.1 mathematical physicist, 2.9.9
male slave, 3.5.3 mathematical structure, imported, 2.6.2
Malus, Étienne-Louis, 3.11.9 mathematical symbols, intellectual content, 2.3.7
mammoth hunters, decision-making, 3.6.1 mathematical system definitions, 2.8
management of extraneous properties, 2.7.4 mathematical thinking machine, 4.2.8
management of propositions, predicate calculus, 4.12.1 mathematician, applied, 2.9.9
manifold, affine, 12.0.3, 36.1.4 mathematicians, chronology, 45.1
manifold, affinely connected, 35.1.2 mathematicians, logic, 3.13.1
manifold, analytic, 26.10, 26.10.2 mathematics, a-priori, 2.12.1, 6.0.3
manifold, analytic, compact, 26.10.5 mathematics, bedrock, 2.1
manifold, analytic, paracompact, 26.10.6 mathematics, deliverables, 3.2.3
manifold, C^0, 25.4.11 mathematics, naive, 2.1.1, 3.14, 6.0.3
manifold, differentiable, 26.3.6 mathematics, naive, non-axiomatic, 3.14.2
manifold, differentiable, compact, 26.5.17 mathematics, recreational, 2.9.9
manifold, differentiable, paracompact, 26.5.18 mathematics, rigorous, 2.1.1
manifold, discrete, 25.1.6 mathematics, self-interest, 2.10.1
manifold, embedded, 26.4.3 mathematics, semantics, 3.2.4
manifold, Hölder-continuous, 42.4 mathematics communication channel, 2.5.11
manifold, level, 16.1.7 mathematics foundation, 5.0.1
manifold, Lipschitz, 19.6.8, 24.1.7, 26.12, 26.12.4, 42.4.0 mathematics heaven, 2.4.1
manifold, paracompact, 33.1.2 mathematics joke, 43.2.6
manifold, pathological, 42.4.0 mathematics learning, 1.9
manifold, pseudo-Riemannian, 39, 40.6 mathematics ontology, 2.3
manifold, Riemannian, 38, 38.3.6 mathematics ontology categories, 2.3.8
manifold, topological, 25, 25.3, 25.3.1 mathematics outputs, 3.2.3
manifold, unidirectionally differentiable, 26.11 mathematics package, computerized, 1.4.8
manifold atlas, differentiable, 26.3 mathematics philosophy, 2
manifold chart, affine space, 12.2.11 mathematics remarks, 2.9
manifold direct product, 25.5.5 mathematics software packages, symbolic, 2.12.6
manifold embedding, 30.3.11 matrix, characteristic polynomial, 11.5.7
manifold equivalence, 26.5.10 matrix, coordinate transition, 26.5.7
manifold example, 42 matrix, definite real symmetric, 11.6
manifold immersion, 30.3.10 matrix, Fisher information, 38.10.1
manifold product, 25.5.5 matrix, Hessian, 26.6.9, 26.6.10
manifold product, differentiable, 26.5.15 matrix, invertible, 11.3.15
manifold with affine connection, tensor calculus, 40.3 matrix, Jacobian, 19.1.4, 26.6.10, 30.3.15
map, 6.5.4 matrix, orthogonal, 11.1.25, 11.5.7, 41.17.2
map, differentiable, between differentiable manifolds, 26.9, 26.9.1 matrix, real definite, 11.4
matrix, real negative definite, 11.4.7

matrix, real negative semi-definite, 11.4.7 meta-discussion context, 3.3.4

matrix, real positive definite, 11.4.7 meta-function, 6.5.22, 7.9.1, 7.9.12
matrix, real positive semi-definite, 11.4.7 meta-language, 3.10.6
matrix, real semi-definite, 11.4 meta-language, semantics, 3.10.6
matrix, real symmetric, 11.5.2 meta-language, syntax, 3.10.6
matrix, rectangular, 11.1.1 meta-logic, 3.10.12
matrix, semi-definite real symmetric, 11.6 meta-logical framework system, 5.7.22
matrix, sparse, 10.10.5 meta-logical proof, 3.13.8
matrix, symmetric, 11.3.19 meta-meta-language, 3.10.8
matrix algebra, 11 meta-model, 3.3.5
matrix algebra, real square, 11.4 meta-modelling, 3.3
matrix algebra, real symmetric, 11.5 meta-proof, 4.9.4
matrix algebra, rectangular, 11.1 meta-proposition, 3.10.7, 3.10.10
matrix algebra, square, 11.3 meta-proposition, disjunction, 3.13.6
matrix determinant, 11.3.6 meta-sentence, 3.10.7
matrix diagonalization, 41.8.3 meta-set, 5.7.23
matrix eigenvalue, 11.1.0, 11.5.7 meta-set-theory, 5.7.21
matrix eigenvector, 11.1.0, 11.5.7 meta-theorem, 4.9, 4.9.5
matrix exponential, 21.7 meta-theorem, deduction rule, 4.9.4
matrix group, 11.7 meta-theorem, statement about proofs, 4.9.3
matrix identity, 11.1.20 metadefinition, tangent bundle, 27.2, 27.2.1
matrix inverse, left, 11.1.24 metadefinition, tensor product, 13.5
matrix inverse, right, 11.1.24 metaphor, container, set, 5.7.17, 5.7.25
matrix linear space, rectangular, 11.1.7 metaphor, logic, 3.5.10
matrix product, 11.1.16 metaphysical universe, 2.7.1
matrix trace, 11.3.2 metaphysics, limits, 4.13.12
matrix transpose, 11.1.12 method of definition, axiomatic, 2.8.1, 2.8.2
matter, dark, 2.10.5 method of definition, constructional, 2.8.1, 2.8.4
maximal atlas, C^r, 26.5.14 method of exhaustion, 20.1.1
maximal atlas, topological, 25.4.15 method of proof, indirect, 3.11.1, 3.11.9
maximum, local, 26.7.6 methods, logic, 4
maximum, local, on differentiable manifold, 26.6.8 metric, induced topology, 17.3, 17.3.3
maximum of partially ordered set, 7.1.10 metric, point-to-point, 17.3.2
maximum principle, 21.0.5, 21.1.1, 21.1.4 metric, pseudo-Riemannian, 39.2, 39.2.1
Maxwell’s equations, 26.1.6 metric, pseudo-Riemannian, overview, 39.1
measure, Lebesgue, 13.10.0, 20.1 metric, Riemannian, 38.3, 38.3.3
measure, Radon, 20.10 metric, Riemannian, differentiability, 38.3.4
measure theory, 20 metric, Riemannian, overview, 38.2
measure theory, geometric, 13.10.0, 20.8 metric, two-point, 17.1.3
measurement, land, 45.2.2 metric connection, 38.5.6
measurement resolution, finite, 2.12.1 metric function, 17.1.1
meat, 5.15.6 metric function, two-point, 38.2.2
meat-centric cook, 1.4.5 metric-invariant geometry, 45.3.0
mechanics, Lagrangian, connection, 36.11 metric layer, 1.1
mechanics, quantum, 3.4.5 metric layer, Levi-Civita connection, 35.0.4
mechanistic computations, 2.9.1 metric space, 17, 17.1.2
mechanization, logic, 4.0.2 metric space, bounded set, 17.2.7
mechanization of logic, 3.7.5 metric space, continuous function, 17.4, 17.4.1
mechanization of mathematics, 2.9.1 metric space, paracompact, 17.3.7
medieval Europe, 3.4.2 metric space, topology, 17.3.3
medieval university education, 2.9.8 metric space distance function, 17.1
membership, tribal, 2.2.2 metric tensor, 38.3.6, 40.5.0
membership chain, set, 5.5.3 metric tensor calculation from distance function, 41.3
membership relation, 5.1.3 metric tensor field, 10.2.18
membership relation, concrete propositions, 4.1.9 metric tensor field, Riemannian, 38.3.3
membership relation, left-side, 5.2.6 metric tensor on two-sphere, 41.2.2
membership relation, sets, 4.15.1 metric transformation group, 45.3.0
membership relation network traversal, 5.5.3 metric transport, 38.1.5
membership relation on the left, 5.2.1 microscope, 1.8.1, 2.12.2
membership symbol, set, 5.1.20 microscope analogy, 2.10.1
membership theory, 5.1.3 middle, excluded, 3.1.4, 3.6.3, 3.11.2, 5.7.17
memory, virtual, 2.10.10 middle, excluded, Russell’s paradox, 5.7.11
memory allocation, dynamic, 2.10.10 Middle Ages, 3.4.2
mental process model, logic, 3.11.3 min/max equivalent, logic operator, 4.3.15
Mesopotamia, legal system, 3.7.11 mind, animal, 2.3.3, 3.4.5
Mesopotamian mathematics, 45.1.1 mind, embodied, 2.5.21
meta-assertion, 3.9.4 mind, finite, 2.10.2
meta-discussion, logic, 3.9.4 mind, human, finite bandwidth, 2.11.14

mind, mathematical, ontology, 2.3.10 module with operator domain, 9.9.8

mind, Roman, 45.1.2 module without operator domain, 9.9.4
mind states, 2.5.3 modules over a ring, linear map, 9.9.25
mind stretching, 2.3.5 modulo function, 8.6.16
minds, 2.9.2 modulus function, 8.6.3
minds, alien, 2.5.6 modus ponendo ponens, 4.6.8
minds, communities, 2.2.3 modus ponens, 4.4.3, 4.5.2, 4.6.1, 4.6.5
minds, human, 2.5.7 modus tollendo tollens, 4.6.8
mini-logic-machine, 3.10.5 Mohammed ibn Mūsā, Abū Ja’far, 45.2.3
minimalist, 4.6.4, 4.11.3 molten magma, 2.1.1, 2.1.3
minimalist endeavour, 5.0.7 Mona Lisa, 3.9.8
minimalist principle, 4.9.4 Monge, Gaspard, 45.1.5
minimum, local, on differentiable manifold, 26.6.8 monkey, gelada, 2.2.4
minimum-length geodesic, 38.4.3 monkey tribes, 2.2.3
minimum of partially ordered set, 7.1.10 monomial, tensor, 13.13.2
Minkowski, Hermann, 45.1.6 monomorphism, Lie algebra, 9.11.8
Minkowski space-time, 39.1.1 monomorphism, linear space, 10.3.6
Minoan tablets, ancient, 2.5.6 monster, self-referential, 5.7.27
mirror image of left transformation group, 9.6.1 monstrosities, 4.2.7, 4.2.8
mirror image of right transformation group, 9.6.1 mood, verb, 3.10.6
mirrors, hall of, 5.7.19 mood, verb, imperative, 3.10.14, 3.12.1, 3.12.3
missing steps, 1.5.2 mood, verb, indicative, 3.7.1, 3.10.11, 3.12.1, 3.12.2
mixed tensor, 13.8, 28.3 mood, verb, logical proposition, 3.12
mixed tensor algebra, 13.9.14 mood, verb, subjunctive, 3.7.1, 3.10.11, 3.12.2
mixed tensor product operation, 13.9.11 mood, verb, symbolic logic, 3.12.1
mixed tensor space, 13.8.2 Moran, Bill, 47.4.1
mixed transformation group, 9.6 Morgan’s law, de, 5.13.11, 5.14.8
mobile vector, 10.1.6 morphing curves, trajectory, 16.5.1
Möbius, August Ferdinand, 12.1.1, 43.2.0, 45.1.5, 45.3.0 morphism notations, linear space, 10.3.7
Möbius strip, 23.6.17, 43.2.0 morphisms, groups, 9.2.22
Möbius strip as fibre bundle, 43.2 morphisms, Lie algebra, 9.11.8
Möbius strip fibre bundle on one-sphere, 43.3 morphisms, linear space, 10.3.6
model, acceptance/rejection, proposition, 3.7.2 motion, laws, 20.6.18
model, imaginary world, 2.10.1 motivations of this book, 1.4
model, logic, 3.3.3 motor/sensor, feedback, 3.3.2
model, logic machine, 3.6.1 Mount Olympus, 5.0.2
model, logic machine, recursive, 3.3.6 mouse, 7.2.7
model, mental process, logic, 3.11.3 mouse click, 26.2.3
model, object/class, 5.7.25 MP, equivalent to RAA, 4.6.2
model, perfect, 3.4.3 MSC 2000 subject classification, 1.8
model, recursive, coherence, 3.3.6, 3.3.7 multi-celled animal, 14.1.11
model, set theory, 5.2.3 multi-index, 18.6.1
model, socio-mathematical network, 2.5.3 multi-index derivative, 18.6.2
model, world, animal, 5.7.25 multi-level tangent bundle, 27.10.18
model, world, organism, 3.3.2 multilinear algebra, 13.1.7
modelling, 3.3 multilinear dual, 13.6.5, 13.7.5
modelling, mathematical logic, 3.2.5 multilinear effect of sequence of vectors, 13.1.2, 13.1.11
modelling, recursive, 3.3 multilinear effect of vector sequence, 13.1.6
modelling, unbounded, on demand, 5.7.25 multilinear map, 13.2, 13.2.3
modelling loop, 3.3.7 multilinear map, antisymmetric, 13.4, 13.4.3
modelling mathematical thinking, 2.6.3 multilinear map, canonical, 13.6.4
modelling ontology, 2.10.1 multilinear map, canonical, tensor space, 13.5.1
models, physics, 21.0.4 multilinear map, symmetric, 13.4, 13.4.2
models, world, multiple, 3.11.2 multilinear map linear space, 13.3, 13.3.3
modern axiom system, 2.8.3 multilinear quintessence, 13.5.4
modern logic, universality, 3.4 multiple choice, 3.7.3, 3.9.5
module, 9.9 multiple contexts, logic, 3.3.4
module, functional, 4.1.1 multiple point of curve, 16.2.13
module, unitary left, 10.1.3 multiple point of path, 16.4.11
module automorphism, 9.9.10 multiple-valued function, 6.11.1
module homomorphism, 9.9.10 multiple world models, 3.11.2
module morphism notations, 9.9.12 multiplication, scalar, 10.1.2
module of endomorphisms, 9.9.27 multiplicative axiom, set theory, 5.9.14
module of homomorphisms, 9.9.5, 9.9.14, 9.9.26 multiplicity quantifier, 4.16.9
module operator domain, 9.9.8 Mūsā, Abū Ja’far Mohammed ibn, 45.2.3
module over a set, 9.9.8 music, 2.9.8
module over group, left, 9.9.17 Mycenaean clay tablet, 2.3.5
module structures summary table, 9.9.0 myth, 5.0.2

naive, etymology, 2.1.2 network synchronization, socio-mathematical, 2.5.3

naive comprehension, axiom, 5.7.2, 5.7.6, 5.7.8 network topology, 16.10
naive cross product, 6.1.6 network traversal, membership relation, 5.5.3
naive induction, 6.1.15, 7.2.6 Neumann-Bernays-Gödel set theory, 5.1.1, 5.2.2
naive logic, 3.2.1, 4.9.5 Neumann problem, 21.3.0
naive mathematics, 2.1.1, 2.1.2, 3.14, 6.0.3, 7.2.5 neurophysiology, 2.5.17
naive mathematics, non-axiomatic, 3.14.2 neuropsychology, 2.5.17
naive natural number, 3.14.4 never-constant curve, 16.3.3, 24.4.4
naive set, 3.14.4, 4.1.3, 4.1.4, 4.1.9, 4.12.6 New Guinea, 3.2.8
naive set theory, 5.1.1, 5.2.2, 5.7.2 Newton, Isaac, 18.1.1, 18.1.2, 18.2.11, 20.1.1, 30.4.7, 45.1.3, 45.1.4, 45.1.5, 45.2.10, 45.2.11
naive set theory, finite, 3.13.3
naive theorem, 4.9.5 Newton’s gravity law, 21.0.4
naive vector field derivative, 32.1 Nile flooding, 45.2.2
naked dummy variable, 5.8.28 non-AC mathematician, 5.15.6
name, constant, definition, 4.12.13 non-atomic logical expression, 3.5.4
name, statement, 4.5.2 non-axiomatic naive mathematics, 3.14.2
name, statement-form, 4.5.4 non datur, tertium, 3.6.4
name, uncountable aggregate, 2.10.1 non-existence, proof, 3.10.12
name map, proposition, 4.1.10, 4.1.11 non-Hausdorff locally Euclidean space, 42.3
name space, abstract variable, 2.11.17 non-Hausdorff topology, 15.2.15
name space, proposition, 4.1.1, 4.1.10, 4.1.11 non-Lie structure group, differentiable fibre bundle, 34.1, 34.1.3
name-to-object map, predicate language, 4.12.3
naming bottleneck, 2.10.1 non-linear operator, 44.4.0
NAND (not-and), 4.6.4, 4.7.1, 4.11.3 non-measurable set, Lebesgue, 2.10.6, 2.11.9, 3.2.1, 4.13.10, 5.9.3, 5.9.10, 5.9.11, 20.1.3, 20.1.4
NAND operator, 4.3.6, 4.3.8
Napoléon Bonaparte, 3.11.9 non-measurable set, Lebesgue, unknowable existence, 2.10.8
natural language, 3.10.6, 3.12.1 non-membership notation, set, 5.1.4
natural language, logic, 3.13.1 non-representational art, 3.1.3
natural language families, 1.4.13 non-self-intersecting curve, 30.4.9
natural number, 7.2.31, 7.3, 7.3.2, 7.3.5 non-separated pair of sets, 15.3.3
natural number, naive, 3.14.4 non-tensorial Christoffel symbol, 38.5.5
naturalism, mathematical definitions, 5.0.4 non-topological fibration, 22.1
nature, essential, 1.4.7 non-topological fibration, parallelism, 22.2
nature, essential, set, 5.2.6 non-topological fibration, uniform, 22.1.3
nature, set, 5.5.3 non-topological fibre bundle, 22, 22.3, 22.3.1
nature, true, mathematical logic, 3.1.2 non-trivial topology, 14.8.10
navigation, terrain, 3.5.10 non-uniform non-topological fibration, 22.1.1
NBG (Neumann-Bernays-Gödel), 5.0.9, 5.2.2 nonsense, pure, 5.7.10
NBG proper class, 5.7.23 NOR (not-or), 4.6.4
NBG set theory, 4.1.5, 5.1.1, 5.2.2, 5.7.5, 5.12 NOR operator, 4.3.6, 4.3.8
NBG set theory, first order logic, 4.13.13 norm, 10.8
nearest integer distance function, 8.6.19 norm, Euclidean, 10.8.4
nearest integer function, 8.6.14 normal coordinates, 29.0.4, 37.8.1, 38.5.10
nefne, Old English, 3.5.4 normal coordinates on two-sphere, 41.12
negated proposition, 3.10.10 normal form, conjunctive, 4.11.3
negation, double, 3.6.1, 3.10.8 normal form, disjunctive, 3.5.9, 4.11.3
negation, logical, 3.10.1, 4.3.3 normal geodesic, 40.5.0
negation, logical, double, 3.11.6 normal space, 14.1.8
negation, logical, semantics, 3.10 normal subgroup, 9.3.8
negation, truth table, 3.10.10 normal topological space, 15.2.21
negation operator, 3.10.3 normalizer of group, 9.3.25
negative definite, 11.4.7 not, 4.3.3
negative number one’s complement representation, 7.5.9 not-knowing, tolerance, 3.2.8
negative number two’s complement representation, 7.5.8 not proven, Scottish law, 3.10.15
negative semi-definite, 11.4.7 notation, derivative, 18.2.11, 30.4.7
neighbourhood, cone-shaped, 18.2.13 notation for definitions, 1.6.5
neighbourhood, convex, 37.8.5 notation introduction, formal, 5.8.6
neighbourhood, per-point, 14.1.6 notations, 48.1
neighbourhood, per-set, 14.1.6 notations for sets of functions, 6.12
neighbourhood, topological, 14.1.3 noumena, 2.12.1
neighbourhood of point in topological space, 14.3.11 noumena, phenomena, 2.5.10
nemne, Old English, 3.5.4 number, biggest, 5.7.18
network, acyclic, concepts, 2.1.7 number, binary, 7.5.8
network, logic machine, 3.7.14 number, black, 2.10.5, 2.11.12
network communications, socio-mathematical, 2.5 number, complex, 8.7
network model, socio-mathematical, 2.5.3 number, composite, 14.4.9
network of concepts, coherent, 2.1.7 number, compressible, 2.11.2
network of discussions, 3.3.4 number, dark, 2.10, 2.11.12

number, extended rational, 8.2 ontology, Platonic, 2.4.1

number, extended real, 8.4, 8.4.1 ontology, proposition-store, logic, 3.7, 3.11.6
number, floating-point, 5.16.1, 8.3.1 ontology, proposition-store-machine, 3.6.2
number, grey, 2.10.1, 2.10.5, 2.11.12, 3.4.3 ontology, world-model, 5.7.25
number, incompressible, 2.10.4, 2.11.2 ontology, world-model, inconsistency, 5.7.12
number, Lebesgue, 17.3.22 ontology, world-model, logic, 3.11.6
number, natural, 7.2.31, 7.3, 7.3.2, 7.3.5 ontology, world-model, proof by contradiction, 3.11.8
number, natural, naive, 3.14.4 ontology, world-model, propositional logic, 3.6.3
number, ordinal, 7.2, 14.1.10 ontology, world-model-machine, 3.6.2
number, prime, 7.4.2 ontology categories, mathematics, 2.3.8
number, rational, 8.1, 8.1.1 ontology for logic, proposition-store, 3.6
number, real, 8.3, 8.3.6 ontology for logic, world-view, 3.6
number, real, algebraic, 14.2.8 ontology of truth and falsity, 3.6.1
number, real, philosophy, 2.12 open annulus, 17.1.17
number, real, unknown, 4.2.8 open arc, 16.2.9
number, unmentionable, 2.10.4 open ball, 17.1.7, 17.3.8
number heaven, 2.4.1 open ball, punctured, 17.1.17
number mysticism, 2.4.5 open base, 14.11, 14.11.2, 15.6.1
number notation summary, 1.6.1 open base for topological space, 15.6.2
number representation, real, 8.3.3 open-closed interval, 8.3.10
number tuple, real, 8.5 open cover, 14.11.12, 15.7.1
numbers, binary, 2.10.6 open cover, finite, 14.11.12
numerical analysis, 2.11.6 open curve, 16.2.9
numerology, Pythagorean, 2.4.6 open curve, tangent operator, 30.4.8
nymþe, Old English, 3.5.4 open curve, tangent vector field, 30.4.1
object, attributes, database, 2.4.4 open interval, 8.3.10, 14.10.2
object, mathematical, 5.16.2 open path, 16.4.7
object class, 5.16.7 open portion of boundary of set, 14.6.9
object/class model, 5.7.25 open refinement of covering, 15.7.2
objectives of this book, 1.4 open set, 14.3.12
objects, underlying, 4.16.1 open set symbol G, 14.3.13
objects location, mathematical, 2.7.1 open set symbol Ω, 14.3.13
oblique projection, 42.4.1, 42.4.2 open subbase, 14.11.3
observable, anthropological, 3.4.1 open subbase of topological space, 15.6.4
obvious, 1.5.2, 3.0.1 opera house, Viking, 42.10.1
odd permutation, 7.10.10 operant conditioning, 3.5.10
ODE (ordinary differential equation), 21.0.2 operating system, computer, 4.7.7
Odysseus, 36.2.4 operation tree, logical, 3.10.13
OED (Oxford English Dictionary), 1.6.7 operational definition, 2.11.8, 2.11.9
OFB (ordinary fibre bundle), 23.0.1 operational procedure, class, 4.1.4
OFB connection, 35.3.2 operator, binary, 4.3.12
ointment, 5.15.3, 23.5.0, 32.3.1, 42.4.1 operator, differential, in Riemann space, 38.7
ointment, fly, 7.2.5 operator, gradient, 44
Old English, 3.5.4 operator, Hessian, 36.6, 36.7.2
Olduvai 9, 2.2.4 operator, Hessian, at critical point, 31.6
Olympic games, ancient, 3.2.8 operator, implication, primacy, 4.6.5
Olympus, Mount, 5.0.2 operator, Laplace-Beltrami, 38.7.3
Omega, 14.3.13 operator, Laplacian, 19.5.3, 38.7.1, 38.7.2
omicron, 14.3.13 operator, left-associative, 4.3.12
omit function for list, 7.12.2 operator, left translation, 33.3.1
on demand, unbounded modelling, 5.7.25 operator, logic, arithmetic equivalent, 4.3.15
on-demand construction, compound propositions, 3.13.5 operator, logical, 3.10.10, 4.3, 4.11
1–1, 6.5.23 operator, logical, associativity, 4.3.12
one-parameter family of diffeomorphisms, 26.8.1 operator, logical, importation, 4.2.1, 4.3.1
one-parameter family of geodesics, 40.4.1 operator, logical, principal connective, 4.3.11, 4.3.12
one-parameter group of diffeomorphisms, 26.8.2 operator, logical, zero-operand, 4.3.18, 4.3.19
one-parameter group of local transformations, 26.8.4 operator, NAND, 4.3.6
one-parameter transformation family, 30.5 operator, negation, 3.10.3
one-parameter transformation group, vector field, 30.5.2 operator, NOR, 4.3.6
one-sphere Möbius strip fibre bundle, 43.3 operator, right-associative, 4.3.12
one-to-one, 6.5.23 operator, second-order, 36.6.1
one’s complement representation of negative numbers, 7.5.9 operator, second-order, differential, 31.7.3
ontologically empty, 2.5.18 operator, second-order, elliptic, 29.6, 29.6.3
ontology, definition, 2.3.1 operator, second-order elliptic, 36.7.3
ontology, infinity, 2.10.1 operator, second-order weakly elliptic, 36.7.3
ontology, mathematical classes, 2.5.4 operator, tangent, 27.0.4, 27.5, 27.5.1
ontology, mathematics, 2.3 operator, tangent, higher-order, 29.1
ontology, modelling, 2.10.1 operator, tangent, second-order, 29.1.1

operator, tangent, second-order, tagged, 29.1.9 oþðe, Old English, 3.5.4

operator, tangent, tagged, 27.6, 27.6.2 outputs, mathematics, 3.2.3
operator chain, implication, 4.6.5 outside, 5.7.17
operator domain of module, 9.9.8 outsourcing definitions, 27.2.0
operator field, coordinate basis, 28.6.4 overview of chapters, 1.2
operator field, second-order elliptic, 36.7 ox, 3.5.3, 47.4.1
operator field, tangent, 28.6 oxygen, 2.1.5
operator field composition, 32.2.7
page counts, chapters, 1.3.1
operator frame, tangent, 27.12.4
pair, ordered, 6.1, 6.1.3, 7.7.3
operator homomorphism, 9.9.10
pair axiom, unordered, 5.3.3
operator notations, logic, 4.3.6
pair category, 6.1.2
operator space, second-order tangent, 29.4.1
pantograph, 31.7.0, 37.9.3
operator space, tagged second-order tangent, 29.4.3
paper, finite, 2.10.7
operator space, tangent, 27.7.6
paper, flat, 25.1.5
operator space, tangent, tagged, 27.7.9
paper, infinite, 2.10.6
or, 4.3.3
papers, research, 5.0.3
or, exclusive, 3.5.2, 3.5.7, 3.13.7, 4.3.5
Pappus of Alexandria, 45.1.2
or, exclusive, notation, 4.3.7
papyrus, 2.5.5
or, inclusive, 3.5.2, 3.5.7, 3.13.7, 4.3.5
parabolic PDE, 21.4.0
OR-construction, 3.5.4
paracompact analytic manifold, 26.10.6
orbit space method for associated fibre bundles, 23.12.0
paracompact differentiable manifold, 26.5.18
order, 7
paracompact manifold, 33.1.2
order, dual, 7.1.7
paracompact metric space, 17.3.7
order, lexicographic, 16.1.8 paracompact topology, 15.7.14
order, partial, 7.1.1 paraconsistent logic, 4.1.5, 5.7.11
order, total, 7.1.4 paradox, 3.2.7
order, word, 3.10.6 paradox, Burali-Forti, 3.2.1, 5.7.13, 5.7.15, 7.2.3
order homomorphism, 7.1.13 paradox, Cantor, 3.2.1
order isomorphism, 7.1.13 paradox, flat, 5.7.23
ordered field, affine space, 12.2.15 paradox, liar’s, 5.7.13, 5.7.22
ordered pair, 6.1, 6.1.3, 7.7.3 paradox, recursion-style, 2.11.3
ordered quadruple, 6.1.14 paradox, Russell’s, 3.7.10, 3.10.4, 4.1.9, 5.1.6, 5.5.1, 5.6.4, 5.7, 5.7.7, 5.8.28, 5.12.1, 5.12.3
ordered sample, 7.11.1
ordered selection, 7.11, 7.11.1 paradox, Zeno, 2.11.3, 7.2.7, 14.0.4, 14.2.5
ordered set, 7.1, 7.1.1 parallel displacement for PFB connection, 35.8
ordered set, totally, 7.1.4 parallel transport, 22.2.1, 32.1.0, 35.0.2, 36.1.1, 36.2.1, 36.5.1, 36.8.8
ordered traversal, 7.1.17, 7.1.18, 16.1.8, 16.2.3, 17.5.11
ordered traversal, length, 17.5.9 parallel transport, differentiation, 35.2
ordered triple, 6.1.14 parallel transport, exterior derivative, 20.6.18
ordered tuple, 7.7.3 parallelism, absolute, 12.0.2
ordering, define-before-use, 1.4.12 parallelism, associated, 24.3
ordinal number, 7.2, 14.1.10 parallelism, associated topological pathwise, 24.3.2
ordinal number, finite, 7.2.12 parallelism, deviation from flatness, 35.4.1
ordinal number 10, 47.3.3 parallelism, differential, 15.3.2, 35.1.1
ordinal numbers, von Neumann construction, 7.2.5 parallelism, fibre set, topological, 24.1.9
ordinary differential equation, 21.1 parallelism, Levi-Civita connection, 35.2.3, 36.1.2
ordinary fibre bundle, connection, curvature, 35.4 parallelism, pathwise, 34.0.0
ordinary fibre bundle, horizontal lift function, 35.3 parallelism, pathwise, topological, 24.2.2
organism, 5.7.25 parallelism, structure group, 35.0.3
organism, multi-celled, 14.1.11 parallelism at a distance, Levi-Civita connection, 24.1.1
organism, world model, 3.3.2 parallelism curve class, 24.1.6
oriented continuous path, 16.4.2 parallelism for non-topological fibration, 22.2
ornithology, 2.0.4 parallelism on topological fibre bundle, 24
orthogonal bundle, tangent, 38.5.9 parallelism path class, 24.1, 24.1.6
orthogonal connection, 12.1.5, 41.5.3 paralysis, 14.8.1
orthogonal connection, Levi-Civita, 38.2.5 parameter, proposition, 4.12.3
orthogonal frame, 38.5.6 parametrization of curve, 16.1.6
orthogonal group, 24.2.8, 35.6.2, 41.8.0 parametrization of path, 16.4.18
orthogonal matrix, 11.1.25, 11.5.7, 41.17.2 parametrized family of systems, 2.8.5
orthogonal transformation, 11.5.7, 19.2.4, 19.5.3, 23.5.0, 45.3.0 parametrized proposition families, 4.12
parametrized proposition family, 3.1.2
orthogonality of transition maps, 27.8.9 parentheses, 4.3.16
orthonormal vectors on Riemannian manifold, 40.5.0 Paris, raining, 3.6.3
other people’s books, 49.10 parity, 7.10.9
otherwise, Anglo-Saxon, 3.7.17 parity function, permutation, 7.10.11
otherwise, logic, 3.5.4 parse-tree, logic expression, 4.3.10
oððe, Old English, 3.5.4 part, fractional, function, 8.6.13
oðþe, Old English, 3.5.4 partial Cartesian product, 6.10, 6.10.1, 41.6.0

partial differential equations, 29.0.6, 36.7.1 PDE, elliptic second-order, 26.6.10

partial function, 6.11, 6.11.3 PDE, hyperbolic, 21.4.0
partial order, 7.1.1 PDE, parabolic, 21.4.0
partial second-order tangent vector field, 31.4.2 PDE (partial differential equation), 21.0.2
partial sequence, 6.10.2 PDE corpus, 1.4.1
partial tangent vector field, 30.4.11 PDE techniques, curved space, 1.4.1
partially defined function, 6.11, 6.11.3 PDO (partial differential operator), 21.0.2
partially defined function, composite, 6.11.7 Peano, Giuseppe, 2.5.13, 2.9.7, 45.1.6
partially defined function, composition, 6.11.7 Peano axioms, 2.5.13, 7.2.32, 7.3.2, 7.3.3
partially differentiable function, 18.5.4 Peirce arrow, 4.3.6, 4.3.8
partially ordered set, 7.1.1 Penelope, 36.2.4
particle, elementary, 2.0.1 pentuple, ordered, 7.7.3
particle trajectory, 16.2.2 people, pre-scientific, 3.2.8
partition, 6.4 per-fibre-set chart, 34.2.4
partition of set, 6.4.2, 6.4.5 per-point neighbourhood, 14.1.6
parts, car, 2.4.3, 2.4.4 per-set neighbourhood, 14.1.6
Pascal, Blaise, 7.11.6, 45.1.3 perception, colour, 3.4.5
Pascal’s triangle, 7.11.6, 7.11.7 perfect model, 3.4.3
Pasch, Moritz, 45.1.6 perfection, crisp, abstract logic, 3.4.3
passive set, 9.9.0 permutation, 7.10, 7.10.3, 7.11
patch, software, 3.10.4 permutation-invariant topology, 14.8.5
patchwork quilt, 25.0.1 perspective, dynamic, set construction, 5.7.24
path, 16.4, 16.4.2 PFB (principal fibre bundle), 23.0.1
path, affine, 37.0.2 PFB connection, 35.3.2
path, closed, 16.4.7 PFB connection, connection form, 35.6
path, constant, 16.4.8 PFB connection, parallel displacement, 35.8
path, continuous, directed, 16.4.2 phase space, 36.11.1
path, continuous, oriented, 16.4.2 phenomena, 2.12.1
path, continuous, unoriented, 16.4.15 phenomena, noumena, 2.5.10
path, differentiable, 26.7 philosophy of infinity, 2.11
path, empty, 16.4.7 philosophy of integers, 2.11
path, geodesic, 37.2.4 philosophy of mathematics, 2, 2.5.21
path, open, 16.4.7 philosophy of real numbers, 2.12
path, rectifiable, 17.5, 17.5.16, 24.1.7 photolysis, 14.8.1
path, simple, 16.4.8 photon, 2.12.3
path, simple closed, 16.4.8 physical container, 2.6.2
path, topological, 16 physicist, experimental, 2.9.9
path atlas, 16.1.4, 16.1.8 physicist, mathematical, 2.9.9
path chart, 16.1.8 physicist, theoretical, 2.9.9
path class, parallelism, 24.1, 24.1.6 physicists and reality, 2.9.9
path concatenation, 16.4.13 physics, 2-sphere, 41.0.1
path-equivalence of curves, 16.3 physics, affine spaces, 10.1.1
path-equivalent curves, 16.3.7 physics, algebraic expressions, 7.4.6
path image, 16.4.2 physics, approximated by real world, 3.4.3
path initial point, 16.4.11 physics, bedrock, 2.12.3, 5.9.11
path multiple point, 16.4.11 physics, chart-independence, 31.3.0
path parametrization, 16.4.18 physics, discipline, 2.5.17
path representative, 16.4.2 physics, fibre bundles, 23.1.3
path reversal, 16.4.9 physics, Galileo transformation, 29.0.3
path terminal point, 16.4.11 physics, Lie groups, 33.0.1
path terminology, 16.1 physics, logical reasoning, 4.9.12
path topics summary, 26.7.7 physics, mathematical foundation, 2.10.9
pathological clash of definitions, 2.6.7 physics, mixed tensors, 13.8.4
pathological empty product of family of sets, 15.1.5 physics, network topology, 16.10.1
pathological examples, 42.1.1 physics, parallelism, 24.1.4
pathological examples, axiomatic system, 3.2.7 physics, power series, 8.7.3
pathological manifold, 42.4.0 physics, Riemannian geometry, 39.1.6
pathological set, 6.6.2, 6.9.5 physics, Riemannian metric, 38.2.1, 39.1.5
pathwise parallelism, 34.0.0 physics, second-order derivatives, 36.2.2
pathwise parallelism, associated topological, 24.3.2 physics, second-order operators, 29.0.1
pathwise parallelism, reversibility rule, 24.2.5 physics, vector, 27.2.8
pathwise parallelism, topological, 24.2.2 physics, vector fields, 16.2.2, 23.3.10
pathwise parallelism, transitivity rule, 24.2.4 physics, vectors, 27.1.0
pathwise parallelism on topological fibre bundle, 24.2 physics models, 21.0.4
pattern recognition, 2.11.14 π, 2.3.9, 2.7.2, 2.10.4, 2.10.6, 2.11.9, 2.11.13, 2.11.18, 5.9.11, 20.13.4, 45.2.13
PC (propositional calculus), 4.7.2
PC theorem, 4.7.2 pig, poke, 5.11.5
PDE, elliptic, 21.3.0 pitch, 41.8.2

pixie, invisible, 2.2.8, 2.10.7 predicate, logical, zero-parameter, 4.12.10

pixie number, 2.10.5 predicate algebra, 4.4.1
pixies, 2.10.3, 2.12.3 predicate calculus, 3.1.2, 4.12, 4.13, 4.14
pixies at the bottom of the garden, 5.9.3 predicate calculus, imaginative process, 4.14.4
plain TeX, 1.5.3, 1.10 predicate calculus, linguistic structure, five layers, 4.12.4
Planck length, 2.12.1 predicate calculus, management of propositions, 4.12.1
Plato, 2.4.1, 3.4.2, 45.1.1, 45.2.1 predicate function, 5.7.2
Plato’s theory of ideas, 2.4 predicate language, name-to-object map, 4.12.3
Platonic Forms, 2.4.5 predicate logic, 3.1.2
Platonic ideal, 2.3.3 predicate logic function, 4.12.11
Platonic ontology, 2.4.1 prehistoric cattle counting, 2.11.19
plethora, contradictions, 5.7.4 prerequisites, 1.0
plethora, formalisms and notations, 1.4.1, 1.4.2 prescriptive interpretation, logical assertion, 3.7.13, 3.12.1
plethora, parentheses, 32.4.2 priests, Egyptian, 3.2.8
plodding correctness, 1.4.7 primacy, implication operator, 4.6.5
Poincaré, Jules Henri, 5.9.3, 45.1.6, 45.2.19 primal space, 13.8.4
Poincaré conjecture, 14.1.10 primal vector, 13.7.2
point-set layer, 1.1 primates, 2.2.4
point space of affine space, 12.2.3 prime integer, 7.4.2
point-to-point distance function, 38.4, 41.3.0 prime number, 2.11.8
point-to-point metric, 17.3.2 primitive concept, 3.4.3
point transformation, 19.1.7 primitive connective, 4.5.2, 4.7.4, 4.11.1
point with no extent, 2.4.3 primitive symbol, 4.3.12, 4.6.4, 4.7.1
pointwise convergence topology, 14.12.21, 15.7.8 principal connective, logical operator, 4.3.11, 4.3.12
pointwise differential, 30.1 principal fibre bundle, 34.9.4
pointwise direct product of functions, 6.9.12 principal fibre bundle, affine connection, 36.9, 36.9.1
pointwise double cotangent space, 28.4.5 principal fibre bundle, connection, differentiability, 35.5.5
pointwise double tangent space, 28.4.1 principal fibre bundle, connection form, 35.6.3
pointwise tangent space, 27.7, 27.7.1 principal fibre bundle, differentiable, 34.4, 34.4.1
poisonous food, 3.10.15 principal fibre bundle, differentiable, vector field, 34.5
Poisson, Siméon Denis, 45.1.5 principal fibre bundle, lift function, horizontal, 35.5
Poisson bracket, 9.11.12, 20.7.2, 32.2, 32.2.10, 32.3.2, 32.5.0, 33.0.4 principal fibre bundle, topological, 23.9
principal fibre bundle, vertical vector, 35.5.10
poke, pig, 5.11.5 principal fibre bundle in terrestrial coordinates, 41.4
polar exponential map coordinates, 41.6 principal G-bundle, topological, 23.9.2
polarization of light, 3.11.9 principle, anthropomorphic, 24.1.4
Polish notation, 4.3.16 principle, Mach’s, 24.1.4
politics, 3.7.18 principle, principle, 4.9.4
polynomial, characteristic, 11.5.7 principle of mathematical induction, 2.10.1, 7.3.4
polynomial representation of tensor, 13.13.3 prior knowledge, 3.0.1
Poncelet, Jean-Victor, 45.1.5 prisoner of dogma, 1.4.6
ponendo ponens, modus, 4.6.8 probability, undefined concept, 3.9.1, 5.1.2
ponens, modus, 4.4.3, 4.6.1, 4.6.5 probability notation, 5.8.26
ponere, 4.6.8 probability theory, 7.11.1
populated, finitely, proposition store, 3.3.7 problem, Dirichlet, 21.3.0
population dynamics, 3.7.14 problem, Hilbert’s fifth, 33.0.5, 33.2
portable arrow, 10.1.6 problem, logic, 4.1.7
portable vector, 10.1.6 problem, Neumann, 21.3.0
portion of boundary of set, open/closed, 14.6.9 problem, somebody else’s, 3.3.8
positive curvature, space, 36.0.2 procedure, class test, 4.1.4
positive definite, 11.4.7 procedure, operational, class, 4.1.4
positive feedback, audio system, 3.3.7 procedures, 2.10.10
positive semi-definite, 11.4.7 procedures, logic, 4
postulates, 2.5.18 procedures, logic, formalization, 4.5.1
postulational method, 45.1.1 product, alternating tensor, 13.10.6
power, etymology, 7.4.6 product, anticommutative, 9.11.0
power-of-two function, 7.9.7 product, Cartesian, partial, 6.10.1
power series, physics, 8.7.3 product, Cartesian, sequence, 7.7
power set, universe set, 5.7.26 product, cross, 13.1.1
power set axiom, 5.3.5 product, inner, 10.8
power set properties, 5.14.13 product, tensor, 2.5.12
pre-image of set by relation, 6.3.13 product, wedge, 13.10.2
pre-logical era, 3.2.8 product atlas, 25.5.3
pre-metric geometry, 38.0.1 product bundle, 23.7.13
pre-scientific people, 3.2.8 product function, list, 9.12.2
precedence rules, 4.7.4 product manifold, 25.5.5
precision of modelling, mathematical logic, 3.2.5 product of differentiable manifolds, 26.5.15
predicate, constant, 4.12.12 product of matrices, 11.1.16

product operation, mixed tensor, 13.9.11 proposition store, 3.7.12

product operation, tensor, 13.9.2 proposition store, finitely populated, 3.3.7
product rules, trigonometry, 20.13.11 proposition-store-machine ontology, 3.6.2
product topology, 15.1, 15.1.1 proposition-store ontology, logic, 3.7, 3.11.6
product topology, direct, 15.1.4 proposition-store ontology for logic, 3.6
productive ZF axiom, 5.1.17, 5.1.19 proposition tag, 3.7.15
programming, computer, 2.6.9, 5.16.7 proposition tagging, 3.9.2
projection, oblique, 42.4.1, 42.4.2 proposition tagging, truth and falsity, 3.7.5
projection function, list, 9.12.1 proposition template, 4.12.3, 5.1.25
projection map, 27.9.2, 41.17.2 proposition testing unit, 3.7.6
projection map, tangent bundle, 19.1.10, 27.2.1 propositional algebra, 4.4.1
projection map for Cartesian product, 6.9.8 propositional bearing function, 2.2.4
projection of sphere onto plane, 41.17 propositional calculus, 3.1.2, 4.8.3
projective geometry, 45.3.0 propositional calculus, domain of interpretation, 4.12.8
projective transformation group, 45.3.0 propositional calculus, implication-based, 4.7
pronoun, 4.3.16 propositional calculus, linguistic structure, five layers, 4.5.4
proof, indirect method, 3.11.1, 3.11.9 propositional calculus, semantics-free, 4.5.1
proof, inline, 4.9.10 propositional calculus axiomatic system, 4.7.4
proof, meta-logical, 3.13.8 propositional calculus formalization, 4.5
proof, non-existence, 3.10.12 propositional calculus theorems, 4.8
proof by computer, 3.6.2 propositional connective, 4.3.4
proof by contradiction, 2.11.5, 3.1.4, 3.11, 4.1.7 propositional logic, world-model ontology, 3.6.3
proof by contradiction, equivalents, 3.11.1 propositions, foisting, 3.7.12
proof by contradiction, validity, 3.11.10 prototypes, demonstration, 2.8.9
proof by contradiction, world-model ontology, 3.11.8 provability, theorem, truth table, 4.4.4
proof discovery, 4.8.4, 4.8.5 proven, not, Scottish law, 3.10.15
proof symbol, 1.6.6 provider, service, logic, 3.3.8
proofs, statement about, meta-theorem, 4.9.3 pseudo-instruction, 3.10.2
proper class, NBG, 5.7.23 pseudo-integer, 3.8.2
properties management, extraneous, 2.7.4 pseudo-metric, 17.1.15
properties of mathematical objects, extraneous, 2.7.2 pseudo-name, proposition, 3.10.7
proponent, 3.7.18 pseudo-notation, 1.4.7, 12.4.1, 13.9.18, 19.2.15
proposition, always-false, 4.3.20 pseudo-number, 4.2.8
proposition, always-true, 4.3.20 pseudo-real-number, 3.8.2
proposition, compound, 3.13.2 pseudo-Riemannian manifold, 39, 40.6
proposition, compound, decomposition, 3.7.14 pseudo-Riemannian metric, 39.2, 39.2.1
proposition, concrete, 3.3.3, 3.10.6 pseudo-Riemannian metric, overview, 39.1
proposition, empirical, 3.3.10, 4.13.10 pseudo-sphere, 42.9
proposition, etymology, 3.7.1 pseudo-theorem, 4.9.5
proposition, foreground/background, 3.6.1, 3.7.3 pseudo-truth-value, 4.2.5
proposition, information content, 3.8.1 pseudogroup, 36.1.5
proposition, logical, verb mood, 3.12 pseudogroup for curves, 16.1.8
proposition, negated, 3.10.10 pseudogroup of diffeomorphisms, 18.7.8, 19.4, 23.1.9, 23.5.0, 23.6.12, 26.3.8
proposition, tautological, 4.3.22
proposition, tautologous, 4.3.22 pseudogroup of diffeomorphisms, complete, 19.4.10
proposition, undecidable, 3.7.19, 3.8 pseudogroup of homeomorphisms, 14.2.3, 19.4.2, 19.4.3, 19.4.5, 19.4.6, 19.4.9, 19.6.4, 19.6.7
proposition, undecided, 3.10.1
proposition, vacuum, 3.6.1 pseudogroup of homeomorphisms, complete, 19.4.7
proposition, value, 3.3.10 pseudogroup of transformations, 26.5.3
proposition acceptance/rejection model, 3.7.2 psychology, conditioning, 7.2.1
proposition analogy, two-sided coin, 3.7.4 Ptolemaic astronomy, 3.4.2
proposition blocking, 3.10.1, 3.10.14 Ptolemy of Alexandria [Claudius Ptolemaeus], 45.1.2
proposition breeding rules, 3.3.7 pull-back, 32.1.0
proposition domain, concrete, 3.1.2, 4.1, 4.1.3, 4.5.4 pull-back, transformation groups, 9.3.16
proposition domain, concrete, closure, 4.9.1 pullback operator, 10.5.24
proposition domain, concrete, dynamic, 4.1.6 punch cards, 2.5.14
proposition domain, concrete, examples, 4.1.8 punctured closed ball, 17.1.17
proposition domain, concrete, static, 4.1.6 punctured open ball, 17.1.17
proposition families, parametrized, 4.12 pure nonsense, 5.7.10
proposition family, parametrized, 3.1.2 pure set, 5.5.4
proposition list, conjunction, 3.13.6 pure set theory, 5.7.25
proposition list insertion, 3.7.14, 3.10.2 push-forth, 31.0.0
proposition name map, 4.1.10, 4.1.11 push-forth, transformation groups, 9.3.16
proposition name space, 4.1.1, 4.1.10, 4.1.11 putative listing, 2.11.21
proposition parameter, 4.12.3 puzzle, cryptographic, 5.15.3
proposition pseudo-name, 3.10.7 Pythagoras of Samos, 45.1.1, 47.4.1
proposition space, compound, 4.2.2 Pythagoras theorem, 1.4.4, 38.1.5, 47.4.1
proposition space, infinite, 4.12.2 Pythagorean numerology, 2.4.6

Pythagorean triples, 38.1.5 real number tuple, 8.5, 8.5.1

Pythagoreans, 7.4.6, 45.1.1 real numbers, virtual, 2.10.10
real numbers, well-ordering, 5.9.3
QC (predicate calculus), 4.14.1 real numbers usual metric, 17.1.4
QED (quod erat demonstrandum), 1.6.6 real positive definite matrix, 11.4.7
QED symbol, 1.6.6 real positive semi-definite matrix, 11.4.7
quadriplicity, 4.16.11 real semi-definite matrix, 11.4
quadriplique, 4.16.11 real square matrix algebra, 11.4
quadrivium, 2.9.8, 45.2.1
real stuff of mathematics, 2.5.19
quadruple, ordered, 6.1.14, 7.7.3
real symmetric matrix, 11.5.2
quantifier, existential, 4.13.2
real symmetric matrix algebra, 11.5
quantifier, logical, infinity, 4.13.10, 4.13.12
real-valued function, basic, 8.6
quantifier, multiplicity, 4.16.9
real-valued function, differential, 30.2, 30.2.1, 30.2.11
quantifier, universal, 4.13.2
real-valued function, higher-order differential, 31.1
quantifier duality, logic, 4.13.10
real-valued function for higher-order operator, differential, 31.5
quantifier notations, logic, 4.13.7
quantifiers, logical, 4.13
real-valued function for second-order operator, differential, 31.5.1, 31.5.3
quantum mechanics, 2.4.2, 3.4.5
quantum mechanics, integers, 3.4.3
real world, approximation to physics, 3.4.3
quantum mechanics of cattle, 2.11.19
reality and physicists, 2.9.9
quaternion, 8.7.3, 27.0.5
realm, pure mathematical, 3.2.5
quilt, patchwork, 25.0.1
reboot, logic machine, 3.10.5
Quine dagger, 4.3.6, 4.3.8
reboot, system, 3.10.4
quintessence, multilinear, 13.5.4
reception of compound proposition, 3.7.14
quod erat demonstrandum, 1.6.6
recipient of proposition, 3.7.13, 3.12.1
quodlibet, Ex contradictione sequitur, 3.11.1
reckless comprehension, axiom, 5.7.24
quodlibet, Ex falso sequitur, 3.11.1
recognition, pattern, 2.11.14
quotient group, 9.3.10
Recorde, Robert, 45.1.3, 45.2.6
quotient linear space, 10.7.1
recreational mathematics, 2.9.9
quotient of functions, 6.7.7
rectangular grid, 16.5.6
quotient of linear spaces, 10.7
rectangular matrix, 11.1.1
quotient set, 6.4.4
rectangular matrix algebra, 11.1
quotient topology, 15.1, 15.1.6, 15.1.8
rectangular matrix linear space, 11.1.7
RAA, equivalent to MP, 4.6.2 rectangular Stokes theorem in two dimensions, 20.3
RAA (reductio ad absurdum), 3.7.10, 3.7.18, 3.11.2 rectifiable compact-domain curve in Lipschitz manifold, 26.12.5
rabbit, 23.1.7
rabbit category, 2.11.14 rectifiable curve, 17.0.4, 17.5, 17.5.5, 26.12
radiation, electromagnetic, 3.4.5 rectifiable path, 17.5, 17.5.16, 24.1.7
radio, 1.9.1 rectifiable path, length, 17.5.17
radius of ball, 17.1.7 rectifiable set, 17.5, 17.5.3
Radon measure, 9.7.0, 20.10 recursion-style paradox, 2.11.3
raining, Paris, 3.6.3 recursive logic machine model, 3.3.6
random sample, 2.10.9 recursive model, coherence, 3.3.6, 3.3.7
range, meanings, 6.3.17 recursive modelling, 3.3
range of function, 6.5.9, 6.5.10 reduced bundle, 23.7.14
range of relation, 6.3.6, 6.3.7 reductio ad absurdum, 3.7.10, 3.7.18, 3.10.5, 3.10.12, 3.11.2, 3.11.9, 4.5.2, 4.7.6
range restriction, relation, 6.3.33
range/domain specification, function, 6.5.5 reductio ad absurdum, danger, 3.11.4
rat, 4.9.12 reductionism, 2.0.1
rational number, 8.1, 8.1.1 reductionist, 4.11.3
rational number, Cauchy sequence, 8.3.5 redundancy, specification tuple, 14.3.2
rational number, extended, 8.2 redundancy theory of truth, 3.10.8
rational number notation summary, 1.6.1 redundant axiom, ZF set theory, 5.1.17
real-analytic function, 8.7.3 refinement of covering, 15.7.2
real definite matrix, 11.4 reflexive relation, 6.3.29
real Lie algebra, 9.11.11 reflexivity, 6.4.1
real negative definite matrix, 11.4.7 reflexivity of equality axiom, 4.15.1
real negative semi-definite matrix, 11.4.7 reformulation of logic, axiomatic, 3.13.4, 7.13
real number, 8.3, 8.3.6 regular map, 30.3.9
real number, algebraic, 14.2.8 regularity, weak, 1.4.10
real number, extended, 8.4, 8.4.1 regularity axiom, 5.7.27
real number, philosophy, 2.12 regularity axiom, ZF, 5.5, 5.7.19
real number, unknown, 4.2.8 relation, 6, 6.3, 6.3.2
real number interval topology, 15.8 relation, anti-reflexive, 5.7.10
real number notation summary, 1.6.1 relation, antisymmetric, 5.7.10
real number representation, 8.3.3 relation, codomain, 6.3.6
real number topology, 14.10.1 relation, composition, 6.3.23
real number topology, standard, 14.10 relation, domain, 6.3.6, 6.3.7

relation, equality, concrete, 2.5.11 Riemannian connection in terrestrial coordinates, 41.5

relation, equality, concrete, import, 4.15.2 Riemannian manifold, 38, 38.3.6, 40.5.0
relation, equivalence, 6.4.1 Riemannian manifold, alternative definition, 38.9.1
relation, image, 6.3.6, 6.3.7 Riemannian manifold, embedded, 38.9
relation, injective, 6.3.31 Riemannian manifold inner product, 38.8
relation, inverse, 6.3.27, 6.5.25 Riemannian manifold orthonormal vectors, 40.5.0
relation, membership, 5.1.3 Riemannian manifold sectional curvature, 40.5.0
relation, membership, left-side, 5.2.6 Riemannian manifold tensor calculus, 40.5
relation, predicate versus set, 6.3.4 Riemannian manifold vector length, 38.3.7
relation, range, 6.3.6, 6.3.7 Riemannian metric, 38.3, 38.3.3
relation, reflexive, 6.3.29 Riemannian metric, overview, 38.2
relation, set-theoretic, 6.3.18 Riemannian metric differentiability, 38.3.4
relation, source set, 6.3.14, 6.3.15, 6.3.16 Riemannian metric integral, 38.4.2
relation, symmetric, 6.3.29 Riemannian space, 38.3.6
relation, target set, 6.3.14, 6.3.15, 6.3.16 Riemannian space gradient, 40.5.0
relation, transitive, 6.3.29 Riemannian space Laplacian operator, 40.5.0
relation, univocal, 4.16.2, 6.11.2 right action, 23.9.2
relation network traversal, membership, 5.5.3 right action on Lie transformation group, 33.8.3
relation on the left, membership, 5.2.1 right-associative operator, 4.3.12
relation-predicate, 6.3.3 right conjugate of a subset of a group, 9.3.13
relation-predicate versus relation-set, 6.3.4 right conjugation map, 9.3.22
relation-set, 6.3.3 right conjunct, 4.3.11
relation-set versus relation-predicate, 6.3.4 right coset of subgroup, 9.3.3
relation tuple, 6.3.20 right-differentiable function, 18.3.3
relative set complement, 5.13.8 right disjunct, 4.3.11
relative topology, 14.11.13 right inside skew product of transformation groups, 9.6.5, 9.6.9
relativity, Galilean, 29.0.3
relativity, general, 38.1.3, 39.1.4, 39.3 right invariant vector field, 33.4.7
relativity, Lorentzian, 29.0.3 right invariant vector field on Lie group, 33.4
relativity, special, 38.1.3, 39.1.1 right inverse, group, 9.2.13
relativity, truth-value status, 3.3.9 right inverse matrix, 11.1.24
Renaissance, 3.4.2 right-open set, 18.3.1
renaissance, European, 2.3.5, 45.1.2 right transformation group, 9.5, 23.9.2
reparametrization of curve, 17.5.6 right transformation group, effective topological, 16.8.14
replacement axiom, 5.11.1 right transformation group, topological, 16.8.12, 16.8.17
replacement axiom, ZF, 5.4 right transformation group mirror image, 9.6.1
representation, canonical, 2.8.6 right transformation group of topological space, 16.8.11
representation, polynomial, of tensor, 13.13.3 right transformation group of topological space, effective, 16.8.13
representation, real number, 8.3.3
representation of a Lie algebra, 9.11.17 right transformation semigroup, 9.5
representation of associative algebra, linear, 9.10.7 right translation operator for tangent vectors, Lie group, 33.4.3
representation of identification space, 6.10.6
representation of integer, 27.1.4 right translation operator on Lie group, 33.4.2
representation of Lie algebra, adjoint, 9.11.19 rigorous mathematics, 2.1.1
representation of tangent vector, 27.1, 27.1.1 ring, 9.8, 9.8.1
representation space of Lie algebra representation, 9.11.17 ring, commutative, 9.8.5
representational art, 3.1.3 ring, commutative unitary, 9.8.6
representations of algebras, 9.10.6 ring, list operation, 9.12.2
representative curve, 16.4.2 ring, unitary, 9.8.2
research papers, 5.0.3 ring, zero, 9.8.4
resolution, measurement, finite, 2.12.1 ring ideal, 9.8.7
resources, Internet, 49.0.1 ring of endomorphisms of a module, 9.9.7
restriction logic axiom, 4.7.5 ring of endomorphisms of module, 9.9.15
restriction of a function, 6.5.27 ring with unity, 9.8.2
restriction of differentiable manifold, 26.5.12 roast kangaroo, 2.2.7
restriction of domain of relation, 6.3.33 robot, half, 1.4.7
restriction of list, 7.12.2 robot task, 1.4.8
reversal of path, 16.4.9 robots, martian, 3.4.1
reverse Polish, 4.3.16 rogue project, 5.0.6
reversibility rule for pathwise parallelism, 24.2.5 roll, 41.8.2
rhetoric, 2.9.8 Roman, ancient, 2.5.8
Ricci-Curbastro, Gregorio, 35.1.5, 38.1.1, 39.1.6, 40.1.1, 45.1.6 Roman Empire, 45.1.2
Ricci curvature, 38.6.6 Roman mind, 45.1.2
Ricci tensor, 38.6.5 rotation of two-sphere, 41.8.0
Riemann, Georg Friedrich Bernhard, 25.1.2, 35.1.5, 38.1.1, 38.1.2, 39.1.6, 39.3.1, 45.1.5 rote learning, 1.9.2
roulette wheel, 5.9.4
Riemann curvature tensor, 35.4.1 roulette wheel, ex machina, 5.9.9
Riemannian connection coefficients, 38.5.8 round, Earth, 3.10.6

round function, 8.6.14 second-order tangent bundle, 29.3.10

route, 16.1.1 second-order tangent bundle, topological, 29.3.11
row vector map, 11.1.11 second-order tangent component tuple, 29.3.1
rsfs font, 1.10 second-order tangent operator, 29.1.1
rule, and-introduction, 4.6.2 second-order tangent operator, elliptic, 29.6.3
rule, associativity, 4.3.12 second-order tangent operator, tagged, 29.1.9
rule, deduction, 4.9.1 second-order tangent operator, tensorization coefficients, 29.2, 29.2.3
rule, deduction, meta-theorem, 4.9.4
rule, self-consistency, 3.10.3 second-order tangent operator space, 29.4.1
rule, substitution, logic, 4.9.1 second-order tangent operator space, tagged, 29.4.3
rule, theorem application, 4.9.2 second-order tangent space, 29.3.7
Ruler, Universe, 2.11.22 second-order tangent vector, 29.3.3
rules, deduction, 4.4.2, 4.6 second-order tangent vector, differential, 31.7.1
rules, derivation, logic, 4.14.4 second-order tangent vector field, 31.3.1
Russell, Bertrand Arthur William, 2.1.9, 2.4.6, 2.9.7, 3.2.1, 38.1.5, 45.1.6 second-order tangent vector field, partial, 31.4.2
second-order vector field, 29.7.1
Russell, Francis Stanley (Frank), 2.1.9 second-order vector field, differentiable, 29.7.3
Russell’s paradox, 3.2.1, 3.7.10, 3.10.4, 4.1.9, 5.1.6, 5.5.1, 5.6.4, 5.7, 5.7.7, 5.8.28, 5.12.1, 5.12.3 second-order vector field, elliptic, 29.7.6
secondary key, 2.6.1
Russell’s paradox, excluded middle, 5.7.11 section, terminology, 23.3.11
Russia, Soviet Union, 2.4.4 sectional curvature, 38.6.4
sectional curvature on Riemannian manifold, 40.5.0
salt, 14.8.1
secure formalism, 1.5.1
same-group left inside skew product of transformation groups, 9.6.8
selection, ordered, 7.11, 7.11.1
same-group right inside skew product of transformation groups, 9.6.9
selection, unordered, 7.11.1
self-consistency, axiom system, 3.11.4
sample, ordered, 7.11.1 self-consistency, logic, 4.1.7
sample, random, 2.10.9 self-consistency rule, 3.10.3
sample, unordered, 7.11.1 self-containing set, causality violation, 5.7.24
sand, 1.5.1 self-discipline, 12.0.1
sat, mat, cat, 3.10.10 self-interest, mathematics, 2.10.1
satellite image analogy, 2.10.1 self-referential equation, 5.7.14
sawtooth functions, 8.6.19 self-referential monster, 5.7.27
scalar curvature, 38.6.7 semantic space, first order logic, 4.13.13
scalar multiplication, 10.1.2 semantics, 2.2.3, 2.3.1
scepticism, 2.2.8, 3.9.4, 3.10.15 semantics, diagrams, 3.6.2
schema, axiom, 4.5.2 semantics, logic, 3
Schild, Alfred, 45.1.6 semantics, mathematics, 3.2.4
Schild’s ladder, 36.7.10, 36.8.8 semantics, meta-language, 3.10.6
Schrödinger’s cat, 3.2.8, 5.7.12 semantics, truth and falsity, 3.9
Schwartz, Laurent, 45.1.6 semantics-free logic, 4.5.1
Schwartz distribution, 27.1.1, 27.5.8 semantics-free propositional calculus, 4.5.1
Schwarzschild, Karl, 45.1.6 semantics of logical negation, 3.10
Schwarzschild singularity, 39.4 semantics/syntax, logic, 4.3.10
science, cognitive, 5.7.17 semi-closed interval, 8.3.10
science, logic, 3.2.5 semi-definite, negative, 11.4.7
scientific truth, 3.9.3 semi-definite, positive, 11.4.7
scope, wff name, 4.9.3 semi-definite real symmetric matrix, 11.6
Scottish law, not proven, 3.10.15 semi-open interval, 8.3.10
search, backwards-deductive, 4.8.5 semi-pixie integer, 2.11.10
second countable topological space, 15.6.3 semicolon set notation, 5.8.26
second derivatives of distance function, Hessian, 39.1.1 semigroup, 9.1, 9.1.1
second dual of linear space, 10.5.18 semigroup, list operation, 9.12.1
second fundamental form, 38.1.4 semigroup, right transformation, 9.5
second-level tangent bundle, 27.10.3 sender of proposition, 3.7.13, 3.12.1
second-level tangent bundle on Euclidean space, 19.3.3 sensible experience, 2.3.6
second-level tangent bundle total space, 27.10.14 sensor/motor, feedback, 3.3.2
second-level tangent vector, 19.3 separability of topological space, 15.6
second-level tangent vector, drop function, 29.5, 29.5.2 separable topological space, 15.6.6
second-order differential operator on Euclidean space, 19.5 separated pair of sets, 15.3.3
second-order operator, 36.6.1 separation, Hausdorff, 15.2.12
second-order operator, differential, 31.7.3 separation axiom, 5.11.2
second-order operator, elliptic, 29.6, 36.7.3 separation axiom, ZF, 5.4.3
second-order operator, real-valued function, differential, 31.5.1, 31.5.3 separation class, completely regular, 15.2.18
separation class, Hausdorff, 15.2.13
second-order operator, weakly elliptic, 36.7.3 separation class, normal, 15.2.21
second-order operator field, elliptic, 36.7 separation class, T4, 15.2.20
second-order PDE, elliptic, 26.6.10 separation class T0, topology, 15.2.3

separation class T1, topology, 14.3.9, 14.7.2, 15.2.4 set sequence, 7.1.14

separation class T2, 15.2.13 set-theoretic formula, 6.3.3
separation classes of topological spaces, 15.2 set-theoretic formula, always-true, 6.5.16
separation of sets, 15.3 set-theoretic function, 6.5.1
sequence, convergent, 14.12.27 set-theoretic relation, 6.3.18
sequence, digit, 7.2.5 set theory, 5
sequence, divergent, 14.12.27 set theory, Bernays-Gödel, 5.1.6, 5.5.1, 5.12
sequence, etymology, 7.1.16 set theory, BG, 5.12
sequence, infinite, termination, 2.11.22 set theory, buggy, 3.10.4
sequence, partial, 6.10.2 set theory, naive, 5.1.1, 5.2.2, 5.7.2
sequence of curves, concatenation, 16.2.17 set theory, naive, finite, 3.13.3
sequence of functions, 7.1.14 set theory, NBG, 4.1.5, 5.1.1, 5.2.2, 5.7.5, 5.12
sequence of sets, 7.1.14 set theory, Neumann-Bernays-Gödel, 5.1.1, 5.2.2
sequent, propositional calculus, 4.14.4 set theory, pure, 5.7.25
sequential compactness, 17.3.29 set theory, symbolic logic, 5.0.3
sequentially compact set, 15.7.11 set theory, Zermelo, 5.11, 5.11.1
sequitur quodlibet, Ex contradictione, 3.11.1 set theory, Zermelo-Fraenkel, 4.1.8, 5.1, 5.7.2
sequitur quodlibet, Ex falso, 3.11.1 set theory, Zermelo-Fraenkel, with axiom of choice, 5.9.6
series, Maclaurin, 13.1.2 set theory, Zermelo-Fraenkel, with axiom of countable choice, 5.10.3
series, Taylor, 8.7.3, 20.13.1, 21.7
service provider, logic, 3.3.8 set theory, Zermelo-Skolem-Fraenkel, 5.1.1
set, container metaphor, 5.7.17, 5.7.25 set theory, ZF, propositions, 4.1.9
set, dark, 2.10 set theory, ZF, redundant axiom, 5.1.17
set, determinable content, 5.5.3 set theory 8-line summary, ZF, 5.1.27, 5.1.28
set, essential nature, 5.2.6 set theory and logic, cyclic, 2.1.6
set, grey, 2.10.1 set theory axiom, CC, 5.10.1
set, incompressible, 2.10.4 set theory axioms, Zermelo-Fraenkel, 5.1.26
set, naive, 3.14.4, 4.1.3, 4.1.4, 4.1.9, 4.12.6 set theory axioms, ZF, 5.3
set, nature, 5.5.3 set theory construction stage, ZF, 5.5.4
set, pure, 5.5.4 set theory customer, 3.3.8
set, rectifiable, 17.5 set theory model, 5.2.3
set, self-containing, causality violation, 5.7.24 set theory service provider, 3.3.8
set, undefined concept, 5.1.2 set translate, 14.8.4
set, universal, 4.1.9 set union, 5.13.2
set, unmentionable, 2.10.4 set union closure, 5.15
set algebra, 5.13, 5.14 set union properties, binary, 5.13
set attributes, 5.2.6 set union properties, general, 5.14
set boundary, topological, 14.6.4 set union topology, 15.10
set category, 6.1.2 set universe, 5.7.14
set class tag, 5.2.6 setting bones, 45.2.4
set complement, 5.13.8 several variables, differentiation, 18.5
set construction, dynamic perspective, 5.7.24 several variables, higher-order derivatives, 18.6
set diameter, 17.2, 17.2.4 Shakespeare, William, 1.0
set distance, 17.2 sharpness of bounds, 21.1.6
set domain, concrete, 5.7.23 sharpness of theorems, 42.0.1
set existence axiom, ZF, 5.1.17, 5.1.18 sheep, 3.5.3
set exterior, topological, 14.6.2 Sheffer stroke, 4.3.6, 4.3.8, 4.6.4
set family, 6.8.1 should-proposition, 3.3.10
set family, Cartesian product, 6.9, 6.9.1 sign function, 8.6.2
set graft, 6.10.6, 41.6.0 sign function, permutation, 7.10.11
set identity, 5.2.6 signed integer, 7.5, 7.5.2
set interior, topological, 14.5.1 signum function, 8.6.3
set intersection, 5.13.2 silver, 3.5.3
set intersection properties, binary, 5.13 simple closed curve, 16.2.11
set intersection properties, general, 5.14 simple closed path, 16.4.8
set language, high-level, 5.6.2 simple curve, 16.2.11
set map, function, 6.6 simple m-vector, 13.10.10
set map, inverse, function, 6.6 simple path, 16.4.8
set map corresponding to a function, 6.6.1 simple tensor, 13.6.6
set map corresponding to a function, inverse, 6.6.1 simple topology, countably infinite set, 14.8
set membership chain, 5.5.3 simple topology, finite set, 14.4
set membership relation, 4.15.1 simulation, computer, 3.13.4, 3.13.8, 3.14.2
set membership symbol, 5.1.20 simultaneous logical equations, 4.4.2
set non-membership notation, 5.1.4 sine function, 20.13.9
set notation definition, uniqueness, 5.8.6 singleton, 5.3.3, 5.8.9
set partition, 6.4.2 singleton axiom, 5.3.4
set product, Cartesian, 6.2, 6.2.1 singular homology theory, 16.6.0
set quotient, 6.4.4 singular r-chain, 26.1.6

singularity, Schwarzschild, 39.4 stack, context, 3.10.5

situs, analysis, 14.2.1, 45.2.17 stage, ZF set theory construction, 5.5.4
size, social group, 2.2.4 stages of abstraction, three, 3.9.6
skew product of transformation groups, left inside, 9.6.8 standard atlas for Euclidean space, 26.4.1
skew product of transformation groups, right inside, 9.6.9 standard basis of Euclidean linear space, 10.2.21
slave, 3.5.3 standard fibre, 23.3.3
smallest topology, 14.4.1 standard identification map for Cartesian product, 7.7.6
smiling, 3.9.8 standard immersion, free linear space, 10.10.1
smooth function space, 44.5 standard immersion in tensor product using free linear space, 13.12.1
snake, 2.1.1
snakes and ladders, 2.1.3 standard injection for direct sum of linear space sequence, 10.6.4
Sobolev space, 26.11.1
social behaviour control, 3.7.12 standard map projection for two-sphere, 41.16
social group size, 2.2.4 standard topology for the real numbers, 14.10
socio-mathematical network communications, 2.5 standardization bodies, communications, 2.8.9
socio-mathematical network model, 2.5.3 standardization processes, 2.5.13
socio-mathematical network synchronization, 2.5.3 starlike subset, 37.4.2
Socrates, 3.4.2 stars, fixed, 24.1.4
software, computer, 3.13.4, 3.13.8 state, successor, 2.4.4
software library, 4.7.7 statement about proofs, meta-theorem, 4.9.3
software packages, symbolic mathematics, 2.12.6 statement form, 4.5.2, 4.5.4
software patch, 3.10.4 statement-form name, 4.5.4
soil, 1.5.1 statement-form-name form, 4.5.4
solidarity, 2.2.4 statement name, 4.5.2, 4.7.4
solve, etymology, 14.8.1 statements, conditional, Gilgamesh epic, 3.5.2, 45.4.1
somebody else’s problem, 3.3.8 states, world, 3.6.3
sometimes-constant curve, 16.3.3 states of mind, 2.5.3
soup, antediluvian, 5.1.21 static concrete proposition domain, 4.1.6
source set of relation, 6.3.14, 6.3.15, 6.3.16 statistical variations, cloud, 2.3.9
Soviet Union, Russia, 2.4.4 statistics, geometry, 38.10.1
space, affine, 12, 12.2.3 steering language, 2.5.7
space, flat, 12.0.2 Steiner, Jakob, 45.1.5
space, identification, 6.10.5 step function, 8.6.5
space, linear, 10.1, 10.1.2 steps, missing, 1.5.2
space, Riemannian, 38.3.6 stetiger Zusammenhang, 35.1.2
space, vector, 10.1.1 Stiefel, Eduard Ludwig, 23.1.1, 45.1.6
space-filling curve, 16.1.5, 25.1.4, 42.1.2 Stokes, George Gabriel, 45.1.6
space grooves, 29.0.4 Stokes theorem, 14.0.3, 20.3.6, 20.6.1, 20.9, 20.9.2, 24.1.2, 24.4.0, 26.1.6
space of positive curvature, 36.0.2
space-time, granular, 2.12.1 Stokes theorem, rectangular, in three dimensions, 20.4, 20.4.1
space-time, Minkowski, 39.1.1 Stokes theorem, rectangular, in two dimensions, 20.3, 20.3.2
span of mathematical logic, 3.2.5 store, proposition, 3.7.12
span of points, affine space, 12.2.23 store, proposition, finitely populated, 3.3.7
Spanish colonies, 2.2.7 straw, camel, 2.11.15, 4.13.12
spanning set of linear subspace, 10.2.4 stretch of curve, constant, 16.3.2
sparse matrix, 10.10.5 strictly elliptic second-order tangent operator, 29.6.3
spartan axiomatic system, 4.6.4 stroke, Sheffer, 4.3.6, 4.3.8, 4.6.4
special functions, 20.1.2 stronger topology, 14.3.21
special relativity, 38.1.3, 39.1.1 strongly connected function, 15.5.4
specification, axiomatic, 2.7.2 structural layers of differential geometry, 1.1
specification axiom, 4.12.6, 5.11.1 structure, differentiable, 26.2.2
specification axiom, ZF, 5.4.1, 5.4.2, 5.7.19, 47 structure, differentiable, overview, 26.1
specification tuple, 5.16 structure, logic, 3.3.3
specification tuple redundancy, 14.3.2 structure, mathematical, imported, 2.6.2
spectral decomposition, 11.5.7 structure function, affine, 12.2.5
speech, human, 2.2.4 structure group, 9.7.0, 43.1.0
speed of light, 38.1.5 structure group, Lie, 32.2.0
spelling, British, 1.6.7 structure group, Lie, differentiable fibre bundle, 34.2, 34.2.2
sphere, 43.2.5 structure group, non-Lie, differentiable fibre bundle, 34.1, 34.1.3
sphere, tangent, 41.17.2
sphere of general dimension, 42.6 structure group, parallelism, 35.0.3
sphere projection onto plane, 41.17 structure groups discussion, 23.5
spherical coordinates, 42.6.0 structure-preserving fibre set map, 23.8, 23.8.2
spherical coordinates, astronomical, 41.1.6 structure tuple, abbreviation, 26.3.7
spherical coordinates, terrestrial, 41.1 structure tuple, figure-head set, 26.3.7
Spivak, Michael David, 1.4.2 stuff of mathematics, 2.0.3
square function, 8.6.20 style, symbolic logic, 4.0.2
square matrix algebra, 11.3 style, ZF axioms, 5.6.2

style of this book, 1.5 symbolic mathematics software packages, 2.12.6

styles, logic, 4.0.1 symbols, intellectual content, 2.3.7
sub-expression, logical, parenthesized, 4.3.12 symbols, logic, 4.3.5
subalgebra, Lie, 9.11.5 symmetric matrix, 11.3.19
subbase, open, 14.11.3, 15.6.1 symmetric multilinear map, 13.4, 13.4.2
subexpression, antecedent, 4.3.14 symmetric relation, 6.3.29
subexpression, consequent, 4.3.14 symmetry, 6.4.1
subgroup, 9.3, 9.3.1 synchronization, socio-mathematical network, 2.5.3
subject classification, MSC 2000, 1.8 syntax, meta-language, 3.10.6
subjective truth, 3.7.2 syntax/semantics, logic, 4.3.10
subjunctive verb mood, 3.7.1, 3.10.11, 3.12.2 synthetic geometry, 3.4.4
sublayer, conformal, 38.0.3 synthetic logic, 3.9.7
submanifold, 30.3.12 system, algebraic, 4.1.1
submanifold in Euclidean space, 40.7.1 system, axiomatic, 4.5.2
submanifold of Euclidean space, 40.7 system, axiomatic, incomplete, 3.8.2
submersion in Euclidean space, 40.7.1 system, meta-logical framework, 5.7.22
subsequence function for list, 7.12.2 system definitions, mathematical, 2.8
subsets axiom, 5.11.2 system hang, 3.10.4
subspace, trivial, 10.2.7 system of linear second-order ODEs, 21.2
subspace of linear space, 10.2.1 system reboot, 3.10.4
subspace spanned by subset of linear space, 10.2.4, 10.6.6 systems, parametrized family, 2.8.5
substitute function for list, 7.12.2 systems of differential equations, 21.0.3
substitution, exhaustive, logical expression, 4.14.3 T0 separation class, topology, 15.2.3
substitution axiom, ZF, 5.11.3 T1 separation class, topology, 14.3.9, 14.7.2, 15.2.4
substitution of equality, 5.2.3 T1 topology, trivial, 14.8.7
substitution of equality axiom, 5.2.5 T1 topology on finite set, 15.2.10
substitution operator, 13.2.2 T2 space, 15.2.13
substitution rule, logic, 4.9.1 T4 topological space, 15.2.20
substitutivity of equality axiom, 4.15.1 Tá an Domhan cothrom, 3.10.8
successor function, 7.3.3 table, truth, 3.10.1, 4.4.2
successor set, 5.7.15, 7.2.2, 7.2.4, 7.3.1 table, truth, extended, 3.8.1, 4.2.6
successor state, 2.4.4 table, truth, theorem provability, 4.4.4
sugar, 1.8.1 tablet, clay, Mycenaean, 2.3.5
sum of linear spaces, direct, 10.6 tablets, ancient Minoan, 2.5.6
Sumerian, 3.5.2 tablets, golden, 3.2.7
Sun rises in the East, 3.10.6 tag, class, set, 5.2.6
Sung dynasty, 7.11.7 tag, proposition, 3.7.15
super-user, 5.7.19 tagged second-order tangent operator, 29.1.9
superfluous dummy variable, 6.5.16 tagged second-order tangent operator space, 29.4.3
superior intellect, 3.3.4 tagged tangent operator, 27.6, 27.6.2
supremum of partially ordered set, 7.1.10 tagged tangent operator, differential, 30.3.24
surface of constant curvature, 42.9 tagged tangent operator basis operator, 27.7.11
surface of revolution, 42.9 tagged tangent operator space, 27.7.9
surjection, 6.5.23 tagging propositions, 3.9.2
surjective function, 6.5.23 tainted theorem, AC, 5.9.1, 7.8.4
suspicion, truth and falsity, 3.9.4 tainted theorem, CC, 5.9.1
swan, white, 4.13.10 tainted theorem example, AC, 10.2.25, 20.1.4
swap function for list, 7.12.2 tainted theorem example, CC, 7.2.26, 7.2.28, 7.2.36
syllogism, 4.0.1, 4.6.5 tangent, etymology, 27.0.2
symbol, assertion, 4.5.8, 4.6.6, 45.2.18 tangent basis, 27.12.1
symbol, chicken-foot, 5.16.4, 6.3.20 tangent bundle, 27.0.1, 27.8, 27.8.1
symbol, Christoffel, 36.8.8, 36.10.5 tangent bundle, affine connection, 36.4, 36.4.2
symbol, Levi-Civita, 7.10.11, 7.10.21, 7.10.22 tangent bundle, affine connection, differentiability, 36.4.4
symbol, primitive, 4.3.12, 4.6.4, 4.7.1 tangent bundle, Euclidean space, 19.1.8
symbol, two-way assertion, 4.5.9 tangent bundle, multi-level, 27.10.18
symbol −, 45.2.5 tangent bundle, second-level, 27.10.3
symbol +, 45.2.5 tangent bundle, second-order, 29.3.10
symbol =, 45.2.6 tangent bundle, topological second-order, 29.3.11
symbol ⇔, 4.11.2 tangent bundle, unidirectional, 27.14, 27.14.2
symbol γ for curves, 16.2.1 tangent bundle atlas, 27.2.1
symbol ∧, 4.11.2 tangent bundle atlas, unidirectional, 27.14.2
symbol ∨, 4.11.2 tangent bundle chart, 27.2.1
symbolic algebra, 10.10.4 tangent bundle chart, unidirectional, 27.14.2
symbolic juxtaposition, 2.5.12 tangent bundle fibre space, 27.2.5
symbolic logic, 2.9.7 tangent bundle lift function, 27.2.1
symbolic logic, set theory, 5.0.3 tangent bundle lift function, unidirectional, 27.14.2
symbolic logic, verb mood, 3.12.1 tangent bundle metadefinition, 27.2, 27.2.1
symbolic logic style, 4.0.2 tangent bundle of differentiable manifold, 34.8

tangent bundle of tangent bundle, 27.10 tangent vector, unidirectional, 27.14.2
tangent bundle on differentiable manifold, 27 tangent vector coordinates, 27.2.1
tangent bundle on Euclidean space, cross-section, 19.1.11 tangent vector field, 28.5.1
tangent bundle on Euclidean space, cross-section, differentiable, 19.1.12 tangent vector field, partial, 30.4.11
tangent vector field, second-order, 31.3.1
tangent bundle on infinite-dimensional manifold, 27.16 tangent vector field, second-order, partial, 31.4.2
tangent bundle on two-sphere, global, 41.7 tangent vector field of curve, 30.4.1
tangent bundle projection map, 27.2.1 tangent vector on total tangent space, 27.10.7
tangent bundle projection map, unidirectional, 27.14.2 tangent vector representation, 27.1, 27.1.1
tangent bundle total space, 27.2.1 tangent vector triple, computational, 27.4.1
tangent bundle total space, unidirectional, 27.14.2 tangent vector using Leibniz rule, 44.1.1
tangent bundle total space atlas, 27.8.4 target set of function, 6.5.10
tangent bundle total space manifold, 27.8.6 target set of relation, 6.3.14, 6.3.15, 6.3.16
tangent bundle transition map, 27.2.6 tautological proposition, 4.3.22
tangent component tuple, computational second-order, 29.3.1 tautologous proposition, 4.3.22
tangent component tuple, second-order, 29.3.1 tautology, 4.3.22, 4.3.23
tangent coordinate frame bundle, 34.9.1 Taylor, Brook, 45.1.4
tangent coordinate triple, 27.3.2 Taylor series, 7.11.1, 8.7.3, 20.13.1, 21.7
tangent curve class, 27.0.4 tea-cup and dough-nut topology, 14.2.3
tangent fibration, 27.8.8 techniques, PDE, curved space, 1.4.1
tangent fibre bundle, 34.8.1 tedium, 5.15.8
tangent frame, 27.12, 27.12.1 tempered distribution, 27.5.8
tangent frame bundle, 34.9, 34.9.1 template, axiom, 5.11.5
tangent frame space, total, 27.12.8 template, proposition, 4.12.3, 5.1.25
tangent function, 20.13.9 template definition, 18.4.1
tangent operator, 27.0.4, 27.5, 27.5.1 template function, 6.5.22
tangent operator, differential, 30.3.18, 30.3.22, 30.3.23 temporal parameter, logic machine, 3.7.9
tangent operator, higher-order, 29.1 tennis ball, 2.10.10
tangent operator, second-order, 29.1.1 tensor, 13.6.2, 28.3.4
tangent operator, second-order, differential, 31.7.3 tensor, alternating, 13.10
tangent operator, second-order, elliptic, 29.6.3 tensor, contravariant, 28.1
tangent operator, second-order, tagged, 29.1.9 tensor, covariant, 13.7, 28.3
tangent operator, second-order, tensorization coefficients, 29.2, 29.2.3 tensor, curvature, 36.8.6, 38.6, 38.6.3
tensor, curvature, Riemann, 35.4.1
tangent operator, tagged, 27.6, 27.6.2 tensor, fundamental, 38.3.6
tangent operator, tagged, differential, 30.3.24 tensor, Levi-Civita, 7.10.20
tangent operator, zero, ambiguity, 2.6.5, 27.5.12, 27.6.1 tensor, meaning, 13.1
tangent operator basis operator, 27.7.11 tensor, metric, 38.3.6, 40.5.0
tangent operator basis operator, tagged, 27.7.11 tensor, mixed, 13.8, 28.3
tangent operator bundle, 27.9, 27.9.2 tensor, Ricci, 38.6.5
tangent operator field, 28.6 tensor, simple, 13.6.6
tangent operator frame, 27.12.4 tensor, torsion, 36.8.7
tangent operator of curve, 30.4.8 tensor algebra, 13, 13.9.6
tangent operator space, 27.7.6 tensor algebra, alternating, 13.11
tangent operator space, second-order, 29.4.1 tensor algebra, general, 13.9
tangent operator space, tagged, 27.7.9 tensor algebra, mixed, 13.9.14
tangent operator space, tagged second-order, 29.4.3 tensor bundle on manifold, 28
tangent orthogonal bundle, 38.5.9 tensor calculus, 40
tangent space, double, 28.4 tensor calculus, history, 40.1
tangent space, double, pointwise, 28.4.1 tensor calculus for differential manifold, 40.2
tangent space, double, total, 28.4.3 tensor calculus for manifold with affine connection, 40.3
tangent space, higher-order, 29.4 tensor calculus for Riemannian manifold, 40.5
tangent space, pointwise, 27.7, 27.7.1 tensor calculus in terrestrial coordinates, 41.2
tangent space, second-order, 29.3.7 tensor components, 28.3.6
tangent space building principles, 26.14 tensor diagrams, 13.1.1
tangent space of tangent bundle total space, 27.10.13 tensor field, 28.7, 28.7.1
tangent space using Leibniz rule, 44.1.1 tensor field, differentiable, 28.7.2
tangent to curve, transformation rule, 19.1.4 tensor field, Lie derivative, 32.5
tangent to sphere, 41.17.2 tensor field, metric, 10.2.18
tangent vector, 27.2.1, 27.3, 27.3.3 tensor field, Riemannian metric, 38.3.3
tangent vector, computational, 27.4, 27.4.2 tensor field, Riemannian metric, differentiability, 38.3.4
tangent vector, Euclidean space, 19.1 tensor field along curve, 28.8
tangent vector, extrinsic, 27.1.0 tensor field on manifold, 28
tangent vector, higher-order, 29, 29.3 tensor interpretation, 13.1.1
tangent vector, second-level, 19.3 tensor metadefinition, 13.5.1
tangent vector, second-level, drop function, 29.5, 29.5.2 tensor monomial, 13.13.2
tangent vector, second-order, 29.3.3 tensor polynomial representation, 13.13.3
tangent vector, second-order, differential, 31.7.1 tensor product, 2.5.12

tensor product, alternating, 13.10.6 thinking machine, mathematical, 4.2.8
tensor product defined via free linear space, 13.12 Thomson, William (Lord Kelvin), 20.8
tensor product defined via lists of tensor monomials, 13.13 three-storey model for differential geometry, 25.1.9
tensor product in terms of free linear space, 13.12.1 three-truth-value logic, 3.11.7
tensor product metadefinition, 13.5 Tietze extension theorem, 15.2.27
tensor product of linear spaces, 13.6 Tikhonov, Andrei Nikolaevich, 45.1.6
tensor product operation, 13.9.2 Tikhonov’s theorem, 5.9.10, 15.7.10
tensor product operation, mixed, 13.9.11 time and energy, terrible waste, 4.4.4
tensor product space, 13.6.2 tolerance of not-knowing, 3.2.8
tensor product space metadefinition, 13.5.1 tollendo tollens, modus, 4.6.8
tensor product standard immersion using free linear space, 13.12.1 tollere, 4.6.8
topic flow diagram, 1.2
tensor space, 13.6.2 topological atlas, 25.4, 25.4.6
tensor space, contravariant, 28.1 topological boundary of set, 14.6.4
tensor space, mixed, 13.8.2 topological chart, 25.4, 25.4.2
tensor space canonical multilinear map, 13.5.1 topological closure, 14.5.4
tensor space extended canonical map, 13.13.5 topological coordinate function, 25.4.2
tensorization, 36.2.2 topological coordinate map, 25.4.2
tensorization, Levi-Civita connection, 29.2.7 topological curve, 16
tensorization coefficients, 36.7.10, 38.5.5 topological dimension, 15.9, 33.2.5
tensorization coefficients for second-order tangent operator, 29.2, 29.2.3 topological exterior of set, 14.6.2
topological fibration, 23.3, 26.13.0
terminal point of curve, 16.2.13 topological fibration, cross-section, 23.3.8, 26.13.12
terminal point of path, 16.4.11 topological fibration, direct product, 23.3.20
termination, infinite sequence, 2.11.22 topological fibration with a fibre atlas, 23.3.17
terminology, affine connection, 36.1.3, 36.1.4 topological fibration with fibre space F, 23.3.7
terminology, connection, 15.3.2 topological fibration with intrinsic fibre space, 23.2, 23.2.1
terminology, contravariant, 13.7.2 topological fibre atlases, equivalence, 23.6.16
terminology, covariant, 13.7.2 topological fibre bundle, 23, 23.6, 23.6.4, 34.0.0
terminology, covariant derivative, 36.5.3 topological fibre bundle, associated, 23.10, 23.10.5, 23.10.9
terminology, differentiable manifolds, 26.2 topological fibre bundle, associated, orbit space method, 23.12.3
terra firma, 2.1.3
terrain navigation, 3.5.10 topological fibre bundle, fibre-to-fibre homeomorphism space, 35.8.3
terrestrial coordinates, 41.1, 41.2.2, 41.6.0, 41.14.0
terrestrial coordinates principal fibre bundle, 41.4 topological fibre bundle, parallelism, 24
terrestrial coordinates Riemannian connection, 41.5 topological fibre bundle, pathwise parallelism, 24.2
terrestrial coordinates tensor calculus, 41.2 topological fibre bundle association, 23.10.3
terrible waste of time and energy, 4.4.4 topological fibre bundle homomorphism, 23.7.1, 23.7.7
tertium non datur, 3.6.4 topological fibre bundle isomorphism, 23.7.4, 23.7.10
test function, convexity, 37.9.2 topological fibre set parallelism, 24.1.9
test particle, 16.2.2 topological fibre set parallelism space, 24.1.9
test procedure, class, 4.1.4 topological G-bundle, 23.9.2
testing unit, proposition, 3.7.6 topological glue, 14.1.2
tetrahedron, volume, 13.1.2 topological graft, 25.6.1, 42.3.3
TeX, plain, 1.5.3, 1.10 topological group, 16, 16.7, 16.7.1
text-level argumentation, 4.0.2 topological group, locally Euclidean, 33.1.2
Thales of Miletus, 1.4.4, 3.2.8, 45.1.1 topological group of differentiable diffeomorphisms, 33.6.5
the, 4.16.8 topological identification space, 15.11, 25.6.0
theorem, bogus, 4.9.10 topological interior of set, 14.5.1
theorem, deduction, 4.9 topological layer, 1.1
theorem, false application, 14.2.6 topological left transformation group, 16.8.17
theorem, naive, 4.9.5 topological left transformation group homomorphism, 16.8.6
theorem, PC, 4.7.2 topological left transformation group of topological space, 16.8.3
theorem application rule, 4.9.2
theorem corpus, 5.6.2 topological left transformation group of topological space, effective, 16.8.8
theorem provability, truth table, 4.4.4
theorems, equipotent, 4.9.2 topological manifold, 25, 25.3, 25.3.1
theorems, propositional calculus, 4.8 topological neighbourhood, 14.1.3
theoretical physicist, 2.9.9 topological path, 16
theory, membership, 5.1.3 topological pathwise parallelism, 24.2.2
theory of distributions, 14.6.6 topological principal fibre bundle, 23.9
theory of ideas, Plato, 2.4 topological principal fibre bundle with structure group, 23.9.2
theory of truth, deflationary, 3.10.8 topological principal G-bundle, 23.9.2
theory of truth, redundancy, 3.10.8 topological right transformation group, 16.8.12, 16.8.17
theory of types, 4.1.9 topological right transformation group, effective, 16.8.14
there, 2.10.1 topological second-order tangent bundle, 29.3.11
thief, 3.5.3 topological space, 14.3, 14.3.3
thinking, logical, 3.2.8 topological space, compact, 15.7.4

topological space, connected, 15.4.1 topology of real number intervals, 15.8
topological space, Euclidean, 25.2.1 topology on finite set, T1, 15.2.10
topological space, locally connected, 15.4.20 topology on finite set, uniform, 14.4.8
topological space, neighbourhood of point, 14.3.11 torsion, 36.8, 38.5.7
topological space, normal, 14.1.8 torsion form, 36.8.5
topological space, separable, 15.6.6 torsion-free connection, 36.1.4, 36.7.10
topological space compactness classes, 15.7 torsion tensor, 36.8.7
topological space connectivity classes, 15.4 tortoise, Achilles, 2.11.3, 14.0.4, 14.1.4
topological space disconnection, 15.4.5 torus, 42.5, 43.2.2
topological space equivalence, 14.1.10 total cotangent space, 28.2.8
topological space examples, 42.1 total double cotangent space, 28.4.6
topological space product, 15.1.1 total double tangent space, 28.4.3
topological space subset disconnection, 15.4.7 total order, 7.1.4
topological transformation group, 16.8 total space, tangent bundle, 19.1.10
topological vector space, 16.9, 16.9.3 total space atlas, tangent bundle, 27.8.4
topology, 14, 14.3.3 total space manifold, tangent bundle, 27.8.6
topology, algebraic, 14.1.10, 14.2.3, 16.2.2, 16.6 total space of second-level tangent bundle, 27.10.14
topology, compact-open, 15.7.9 total space of tangent bundle, 27.2.1
topology, differential, 26.0.1 total tangent frame space, 27.12.8
topology, direct product, 15.1.4 total tangent space, drop function, 27.11, 27.11.6
topology, discrete, 14.3.19 total tangent space, horizontal component, 27.11, 27.11.2
topology, duality, 14.4.7 total tangent space, vertical vector, 27.11.3
topology, empty, 14.3.10, 14.4.2 total tangent space tangent vector, 27.10.7
topology, etymology, 14.2.1, 45.2.17 totally differentiable function, 18.5.17
topology, Hausdorff, 15.2.13 totally ordered set, 7.1.4
topology, history, 14.2 tour, mathematical logic, 3.1.2
topology, information content, 14.8.10 trace of a matrix, 11.3.2
topology, intuitive, 14.3.9 track, 16.1.1
topology, isolated point, 14.7.5 tractrix, 42.9
topology, largest, 14.3.17, 14.4.1 trajectory, 7.1.18, 16.1.1
topology, locally compact, 15.7.12 trajectory, curve morphing, 16.5.1
topology, non-Hausdorff, 15.2.15 transcendental functions, 20.12.3
topology, non-trivial, 14.8.10 transcendental numbers, 2.9.3, 2.12.5
topology, overview, 14.1 transfer, incomplete information, 3.8
topology, paracompact, 15.7.14 transfinite induction, 18.4.1
topology, permutation-invariant, 14.8.5 transformation, affine, 12.0.3
topology, product, 15.1 transformation, differentiable, differentiable family, 26.8
topology, purpose, 14.8.1 transformation, Galileo, 29.0.3
topology, quotient, 15.1, 15.1.6, 15.1.8 transformation, infinitesimal, 33.8, 34.3.0
topology, relative, 14.11.13 transformation, Lorentz, 29.0.3
topology, simple, countably infinite set, 14.8 transformation, orthogonal, 11.5.7, 19.2.4, 19.5.3, 23.5.0, 45.3.0
topology, simple, finite set, 14.4
topology, smallest, 14.3.17, 14.4.1 transformation family, one-parameter, 30.5
topology, strength, 14.3.20 transformation group, 9.4.4
topology, stronger, 14.3.21 transformation group, left, 9.4
topology, T0 separation class, 15.2.3 transformation group, Lie, 33.0.1
topology, T1 separation class, 14.3.9, 14.7.2, 15.2.4 transformation group, mixed, 9.6
topology, tea-cup and dough-nut, 14.2.3 transformation group, one-parameter, vector field, 30.5.2
topology, translation-invariant, 14.8.4, 14.8.5 transformation group, right, 9.5, 23.9.2
topology, trivial, 14.3.18, 17.3.5 transformation group, topological, 16.8
topology, trivial closed-point, 14.8.6 transformation group ambiguity, left/right, 2.5.4, 5.16.6, 33.8.8
topology, trivial T1, 14.8.7
topology, underlying, 26.3.6 transformation group as fibre bundle, finite, 22.4
topology, weaker, 14.3.21 transformation group figure, 9.7, 9.7.0
topology, weakness, 14.3.20 transformation group homomorphism, 9.4.9
topology classes, 14.2.6, 14.8.10, 15 transformation group homomorphism, topological left, 16.8.6
topology classification, 14.1.10 transformation group invariant, 9.7
topology constructions, 15 transformation group of a topological space, effective left, 16.8.7
topology flavour, 14.2.2
topology for real numbers, 14.10.1 transformation group of topological space, left, 16.8.2
topology for the real numbers, standard, 14.10 transformation group of topological space, topological left, 16.8.3
topology formalism, 14.3.1
topology generated by collection of subsets of set, 14.9.4 transformation groups, 9.2.3, 9.4.1
topology induced by a metric, 17.3, 17.3.3 transformation groups, left outside skew product, 9.6.4
topology induced by differentiable atlas, 26.3.4 transformation groups, right inside skew product, 9.6.5
topology induced on a set by a function, 14.12.5 transformation pseudogroup, 26.5.3
topology of metric space, 17.3.3 transformation rule, tangent to curve, 19.1.4
topology of pointwise convergence, 14.12.21, 15.7.8 transformation semigroup, 9.4.2

transformation semigroup, right, 9.5 truth, ontology, 3.6.1
transformations, general linear, Lie group, 33.7.6 truth, proposition tagging, 3.7.5
transistor circuit, 4.6.4 truth, scientific, 3.9.3
transistor circuit, bistable, 4.1.1 truth, semantics, 3.9
transistor circuit voltages, 3.10.6 truth, subjective, 3.7.2
transition map, tangent bundle, 27.2.6 truth, theory of, deflationary, 3.10.8
transition map conformality, 27.8.9 truth, theory of, redundancy, 3.10.8
transition map orthogonality, 27.8.9 truth, undefined concept, 3.9.1
transitive relation, 6.3.29 truth function, 3.10.9, 4.2.4
transitivity, 6.4.1 truth function, unary, 3.10.10
transitivity rule for pathwise parallelism, 24.2.4 truth-functional combination, 4.2.1, 4.2.3
translate of set, 14.8.4 truth table, 3.10.1, 4.4.2
translation-invariant topology, 14.8.4, 14.8.5 truth table, extended, 3.8.1, 4.2.6
translation operator, left, 33.3.1 truth table, negation, 3.10.10
translation operator, right, on Lie group, 33.4.2 truth table, theorem provability, 4.4.4
transport, Lie, 32.4.5 truth table applicability, predicate calculus, 4.14.3
transport, metric, 38.1.5 truth value, 3.7.15
transport, parallel, 32.1.0, 35.0.2, 36.1.1, 36.8.8 truth value, unknown, 3.2.5, 3.8.2, 4.2.5, 4.2.7
transpose of a matrix, 11.1.12 truth value function, inconsistent, 4.1.5
transpose of linear map, 10.5.23 truth value map, 3.1.2, 4.1.3
transposed horizontal lift function on differentiable fibre bundle, 35.3.12 truth-value status, relativity, 3.3.9
tuple, computational second-order tangent component, 29.3.1
transposed horizontal lift function on principal bundle, 35.5.3 tuple, real number, 8.5
transposed map, 10.5.24 tuple, second-order tangent component, 29.3.1
transposition of function, 6.12.3 tuple, specification, 5.16
transposition operation on set, 7.10.7 tuple, specification, redundancy, 14.3.2
traversal, 16.1.1 tuple, structure, figure-head set, 26.3.7
traversal, membership relation network, 5.5.3 tuple concatenation operator, 8.5.3
traversal, ordered, 7.1.17, 7.1.18, 16.1.8, 16.2.3, 17.5.11 tuple of real numbers, 8.5.1
tree, bird, 5.7.25 two-parameter arctangent function, 20.13.6
tree, function, 4.3.12 two-point metric, 17.1.3
tree, logical operation, 3.10.13 two-port object, 33.4.11
triangle, area, 13.1.2 two-sided coin, proposition analogy, 3.7.4
triangle, arithmetic, 7.11.6 two-sphere antipodal points, 41.9.4
triangle inequality, 17.1.12, 17.1.14 two-sphere geodesic curve, 41.9, 41.9.3
triangle of Pascal, 7.11.6 two-sphere geometry, 41
tribal membership, 2.2.2 two-sphere global tangent bundle, 41.7
tribes, monkey, 2.2.3 two-sphere isometry, 41.8
trichotomy, 5.9.14 two-sphere rotation, 41.8.0
trigger, assertion, 3.7.17, 3.10.14 two-truth-value logic, 3.11.7
trigonometric function, 20.13 two-way assertion symbol, 4.5.9
trigonometric function, inverse, 20.13.2 two’s complement representation of negative numbers, 7.5.8
trigonometry double-angle rules, 20.13.12 types, theory, 4.1.9
trigonometry half-angle rules, 20.13.13 typing, 1.4.8
trigonometry product rules, 20.13.11 typographic arts, 2.9.7
triple, ordered, 6.1.14, 7.7.3 Übertragung, 35.1.3
triple, tangent coordinate, 27.3.2 ugly set-construction definitions, 2.8.8
triple conjunction, 3.5.8 unary truth function, 3.10.10
triple disjunction, 3.5.8 unbounded modelling, on demand, 5.7.25
triplicity, 4.16.11 unbounded versus infinite, 5.5.2
triplique, 4.16.11 uncertainty, logic, 3.9.3
tripliqueness, 47.1.7 uncountable, 2.11.20
trivial bundle, 24.4.4 uncountable aggregate, name, 2.10.1
trivial closed-point topology, 14.8.6 uncut space, 10.10.4
trivial group, 43.1.0 undecidable proposition, 3.7.19, 3.8
trivial subspace, 10.2.7 undecided proposition, 3.10.1
trivial T1 topology, 14.8.7 undefined concept, probability, 3.9.1, 5.1.2
trivial topological space, 14.3.18 undefined concept, set, 5.1.2
trivial topology, 14.3.18, 17.3.5 underlying objects, 4.16.1
trivium, 2.9.8 underlying topology, 26.3.6
truck, 23.1.2 unidirectional derivative, 19.6.2
true, abstract label, 3.2.5 unidirectional tangent bundle, 27.14, 27.14.2
true, proposition tag, 3.7.16 unidirectional tangent bundle atlas, 27.14.2
true nature, mathematical logic, 3.1.2 unidirectional tangent bundle chart, 27.14.2
true-store, proposition, 3.7.19 unidirectional tangent bundle lift function, 27.14.2
true zero-operand operator, 4.3.18, 4.3.19 unidirectional tangent bundle projection map, 27.14.2
true zero-parameter predicate, 4.12.10 unidirectional tangent bundle total space, 27.14.2
truth, definition, 3.4.3 unidirectional tangent vector, 27.14.2

unidirectionally differentiable atlas for topological manifold, 26.11.3 validity, proof by contradiction, 3.11.10
value, absolute, function, 8.6.1
unidirectionally differentiable function, 18.3.5, 18.5.14, 19.6.3 value, truth, map, 3.1.2
unidirectionally differentiable homeomorphisms, complete pseudogroup, 19.6.7 value of function, 6.5.11
value proposition, 3.3.10
unidirectionally differentiable manifold, 26.11, 26.11.4 variable, bound, 5.1.24
uniform non-topological fibration, 22.1.3 variable, dummy, 4.3.16, 5.1.23, 5.8.16, 5.8.24
uniform non-topological fibration with fibre space F, 22.1.9 variable, dummy, limit of function, 14.12.19
uniform topology on finite set, 14.4.8 variable, dummy, naked, 5.8.28
uniformly continuous function, 17.4.8 variable, dummy, superfluous, 6.5.16
uniformly Hölder continuous function, 18.9.1 variable, free, 5.1.24, 5.8.16
uninteresting assertions, 4.8.1 variable domain, concrete, 5.7.3
union, disjoint, 6.4.2 variable name space, abstract, 2.11.17
union axiom, 5.3.4 variable space, concrete, 5.2.3
union of family of sets, 6.8.5 variation, geodesic, equation, 40.4
union of sets, 5.13.2 Veblen, Oswald, 45.1.6
union of sets, properties, binary, 5.13 vector, action, 32.1.2
union of sets, properties, general, 5.14 vector, basis, 10.2
unique existence, 4.16.3 vector, coordinate basis, 27.12
unique existence notation, 4.16.4, 4.16.6 vector, cotangent, 28.2
uniqueness, 4.16 vector, dual, 13.7.2
uniqueness, cardinality, 4.16.7 vector, etymology, 27.0.5
uniqueness, empty set, 5.3.2, 5.8.1 vector, horizontal, 10.11.11
uniqueness, set notation definition, 5.8.6 vector, infinitesimal, 19.1.6
unit coordinate basis covector, 28.2.6 vector, mobile, 10.1.6
unit element of unitary ring, 9.8.2 vector, portable, 10.1.6
unit interval, 8.3.10 vector, primal, 13.7.2
unit step function, 8.6.5 vector, second-order tangent, 29.3.3
unitary left module, 10.1.3 vector, simple, 13.10.10
unitary left module over a ring, 9.9.21 vector, tangent, 27.2.1, 27.3, 27.3.3
unitary ring, 9.8.2 vector, tangent, Euclidean space, 19.1
unitary ring, commutative, 9.8.6 vector action, 32.1.8, 32.1.16
unity of unitary ring, 9.8.2 vector addition, 10.1.2
universal bilinear map, 13.5.3 vector bundle, 34.7, 34.7.1
universal quantifier, 4.13.2 vector component, 10.5.13
universal set, 4.1.9 vector field, 28.5, 30.5
universal/existential quantifier, information content, 4.13.9 vector field, action, 32.1.3, 32.1.11
universality of modern logic, 3.4 vector field, component function, 28.5.5
Universe, atoms, 2.11.15 vector field, composition, 32.2.2
universe, imperfect, 3.4.3 vector field, coordinate basis, 28.5.12
universe, mathematical, ontology, 2.3.10 vector field, differentiable second-order, 29.7.3
universe, metaphysical, 2.7.1 vector field, elliptic second-order, 29.7.6
Universe, Ruler, 2.11.22 vector field, higher-order, 29.7
universe set, 5.7.14, 5.7.23 vector field, induced map, 30.3.14
universe set, power set, 5.7.26 vector field, Lie derivative, 32.4
university education, medieval, 2.9.8 vector field, Lie group, left invariant, 33.3.11
univocal relation, 4.16.2, 6.11.2 vector field, Lie group, right invariant, 33.4
unknowable existence, Lebesgue non-measurable set, 2.10.8 vector field, lift, by connection on principal fibre bundle, 35.5.8
unknown known, 4.2.8
unknown real number, 4.2.8 vector field, linear space, 28.5.8
unknown truth value, 3.2.5, 3.8.2, 4.2.5, 4.2.7 vector field, right invariant, 33.4.7
unknown unknown, 3.8.2, 4.2.8 vector field, second-order, 29.7.1
unknowns, blow-out, 3.8.2 vector field, tangent, 28.5.1
UNLESS-construction, 3.5.4 vector field, tangent, partial, 30.4.11
unmentionable number, 2.10.4 vector field, tangent, second-order, 31.3.1
unordered pair axiom, 5.3.3 vector field, tangent, second-order, partial, 31.4.2
unordered sample, 7.11.1 vector field action, 32.1.17
unordered selection, 7.11.1 vector field along curve, 28.8, 28.8.1
unoriented continuous path, 16.4.15 vector field along curve, continuous, 28.8.2
unsigned integer arithmetic, 7.4 vector field along curve, differentiable, 28.8.4
upper bound of partially ordered set, 7.1.10 vector field calculus, 32
URL (Uniform Resource Locator), 49.0.1 vector field commutator, 32.2.5
useless axiom of choice, 5.1.18 vector field component function, 28.6.6
usual atlas for Euclidean space, 26.4.1 vector field derivative, naive, 32.1
usual metric on real numbers, 17.1.4
usual topology for real numbers, 14.10.1 vector field derivative for curve family, 32.3
usual topology for ℝⁿ, 14.10.3 vector field flow, 32.4.2
vector field for family of curves, higher order, 29.8
vacuum, proposition, 3.6.1 vector field generated by family of diffeomorphisms, 33.6.8

vector field generated by one-parameter transformation group, 30.5.2 wff (well-formed formula), 4.5.2
wff name, 4.7.4
vector field on differentiable fibre bundle, 34.3 wff name scope, 4.9.3
vector field on differentiable principal fibre bundle, 34.5 wheel, roulette, ex machina, 5.9.9
vector field on Lie group, left invariant, 33.3 wheel of fortune, 1.8.0
vector map, column, 11.1.10 wheel of knowledge, 2.0.1
vector map, row, 11.1.11 white swan, 4.13.10
vector sequence, antisymmetric multilinear effect, 13.1.5 Whitehead, Alfred North, 45.1.6
vector sequence multilinear effect, 13.1.2, 13.1.6 Widman, Johannes, 45.2.5
vector space, 9.9.2, 10.1.1 Widmann, J.W., 45.2.5
vector space, cotangent, 28.2.2 wild logic, 2.2.7
vector space, topological, 16.9, 16.9.3 wncyr font, 1.10
vector space of affine space, 12.2.3 word order, 3.10.6
vegetarian cook, 1.4.5 world, imaginary, model, 2.10.1
vel, 4.3.5 world, infinite, 2.10.9
velocity of flow, 32.4.3 world model, animal, 5.7.25
Venn diagram, 5.7.17, 5.7.25 world model, organism, 3.3.2
verb mood, 3.10.6 world-model machine, 3.6.1
verb mood, imperative, 3.10.14, 3.12.1, 3.12.3 world-model-machine ontology, 3.6.2
verb mood, indicative, 3.7.1, 3.10.11, 3.12.1, 3.12.2 world-model ontology, 5.7.25
verb mood, logical proposition, 3.12 world-model ontology, inconsistency, 5.7.12
verb mood, subjunctive, 3.7.1, 3.10.11, 3.12.2 world-model ontology, logic, 3.11.6
verb mood, symbolic logic, 3.12.1 world-model ontology, proof by contradiction, 3.11.8
vertical vector, principal fibre bundle, 35.5.10 world-model ontology, propositional logic, 3.6.3
vertical vector of total tangent space, 27.11.3 world models, multiple, 3.11.2
vertical vector on total space of fibration, 26.13.9 world states, 3.6.3
Viète, François, 45.1.3 world-view ontology for logic, 3.6
viewpoint of projection, 41.17.2 writing, cuneiform, 3.5.1
Viking opera house, 42.10.1 XOR (exclusive OR), 4.3.7, 4.3.8
Vinci, Leonardo da, 3.9.8
violation, causality, self-containing set, 5.7.24 Yáng Huī, 7.11.7
virtual machines, 2.5.15 yaw, 41.8.2
virtual memory, 2.10.10 year, Gregorian, 2.10.3
visible light, 3.4.5 yes/no answer, 3.9.5
visual art, 3.1.3 Young, John Wesley, 45.1.6
Vitali, Giuseppe, 3.2.1, 45.1.6
vocalization pattern, 2.2.4 Zeno of Elea, 2.11.3, 7.2.7, 14.0.4, 14.1.4, 14.2.5, 45.1.1
vocalizations, complexity, 2.2.4 Zeno’s paradox, 14.1.4
voltage, audio system, 3.3.7 Zermelo, Ernst Friedrich Ferdinand, 2.9.7, 5.9.3, 45.1.6
voltage, logic, 4.5.5 Zermelo-Fraenkel set theory, 4.1.8, 5.1, 5.7.2
voltage, logical, 4.1.8 Zermelo-Fraenkel set theory axioms, 5.1.26
voltages, 2.5.8 Zermelo-Fraenkel set theory with axiom of choice, 5.9.6
voltages, transistor circuit, 3.10.6 Zermelo-Fraenkel set theory with axiom of countable choice, 5.10.3
von Neumann, John (János), 5.12.1, 7.2.4, 45.1.6
von Neumann construction, ordinal numbers, 7.2.5 Zermelo set theory, 5.11, 5.11.1
von Staudt, Karl Georg Christian, 45.1.5 zero-operand operator, logical, 4.3.18, 4.3.19
Wallis, John, 45.1.4, 45.2.9 zero-parameter predicate, logical, 4.12.10
warming, global, 3.10.6 zero ring, 9.8.4
Washington University, 1.10 zero tangent operator ambiguity, 2.6.5, 27.5.12, 27.6.1
waste of time and energy, terrible, 4.4.4 zero-thickness boundary, 14.1.9
water, 2.1.5, 38.1.5 zero-width boundary, 14.0.3
weak regularity, 1.4.10 ZF (Zermelo-Fraenkel), 5.0.9
weak topology, 14.11.9 ZF axiom, productive, 5.1.17, 5.1.19
weaker topology, 14.3.21 ZF axiom of substitution, 5.11.3
weakly connected function, 15.5.3 ZF axiom style, 5.6.2
weakly elliptic second-order operator, 36.7.3 ZF-clean theorem, 5.9.1
weakly elliptic second-order tangent operator, 29.6.3 ZF comprehension axiom, 5.4.3
weaving, 2.5.14 ZF extension axiom, 5.1.3, 5.2
Webster, Noah, 1.6.7 ZF infinity axiom, 2.11.13, 5.6
wedge product, 13.10.2 ZF regularity axiom, 5.5, 5.7.19
Weierstraß, Karl Theodor Wilhelm, 2.9.7, 17.4.2, 45.1.5 ZF replacement axiom, 5.4
well-defined function, 6.11.1 ZF separation axiom, 5.4.3
well-formed formula, 4.3.10, 4.5.2 ZF set existence axiom, 5.1.17, 5.1.18
well-ordering of real numbers, 5.9.3 ZF set theory, first order logic, 4.13.13
Weyl, Hermann Klaus Hugo, 1.5.4, 10.1.7, 12.2.1, 25.1.9, ZF set theory, propositions, 4.1.9
35.1.1, 35.1.2, 35.1.5, 45.1.6 ZF set theory, redundant axiom, 5.1.17
wf (well-formed formula), 4.5.2 ZF set theory 8-line summary, 5.1.27, 5.1.28
wff, 4.7.4 ZF set theory axioms, 5.3

ZF set theory construction stage, 5.5.4
ZF specification axiom, 5.4.1, 5.4.2, 5.7.19, 47
ZF-unreachable, 2.12.3
ZFC (Zermelo-Fraenkel set theory with axiom of choice), 5.9.5
Zhū Shìjié, 7.11.7
ziggurat, 1.4.15
zoology, 2.2.5
Zorn’s lemma, 5.9.13, 5.9.14, 10.2.25
Zusammenhang, 25.1.9, 35.1.1, 35.1.2
Zusammenhang, stetiger, 35.1.2
number      value    number        value    number       value

pages         903    definitions     725    theorems       404
chapters       50    notations       232    proofs         207
sections      457    examples         51    exercises       69
diagrams      283    lemmas            2    solutions       40
remarks      2023    metadefns         3    [comments]    1219

