Alan U. Kennington
Differential geometry reconstructed. Copyright (C) 2017, Alan U. Kennington. All Rights Reserved. You may print this book draft for personal use. Public redistribution of this book draft in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.
[Cover diagram: a network of academic disciplines, including mathematics, physics, logic, chemistry, anthropology, biology and neuroscience.]
For $i, j \in \mathbb{N}_m$ and $k, \ell \in \mathbb{N}_n$:
$$R^i_{jk\ell} = \big( (D_{e_k} D_{e_\ell} - D_{e_\ell} D_{e_k})(e_j) \big)^i = \Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} + \Gamma^p_{j\ell} \Gamma^i_{pk} - \Gamma^p_{jk} \Gamma^i_{p\ell}.$$
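The cover formula is the classical coordinate expression for the Riemann curvature tensor in terms of Christoffel symbols. As an illustrative sketch (not part of the book), it can be checked symbolically with sympy on the unit 2-sphere, where the only independent component is $R^\theta_{\phi\theta\phi} = \sin^2\theta$:

```python
# Sketch: verifying the coordinate formula for the Riemann curvature
# tensor on the unit 2-sphere, using sympy (symbols and indices here
# are illustrative, not the book's notation).
import sympy as sp

th, ph = sp.symbols('theta phi')
x = [th, ph]
g = sp.Matrix([[1, 0], [0, sp.sin(th)**2]])   # round metric on S^2
ginv = g.inv()
n = 2

# Christoffel symbols: Gamma^i_{jk} = (1/2) g^{ip} (g_{pj,k} + g_{pk,j} - g_{jk,p})
Gamma = [[[sum(ginv[i, p] * (sp.diff(g[p, j], x[k]) + sp.diff(g[p, k], x[j])
                             - sp.diff(g[j, k], x[p])) / 2 for p in range(n))
           for k in range(n)] for j in range(n)] for i in range(n)]

# Riemann tensor from the coordinate formula:
# R^i_{jkl} = Gamma^i_{jl,k} - Gamma^i_{jk,l}
#             + Gamma^p_{jl} Gamma^i_{pk} - Gamma^p_{jk} Gamma^i_{pl}
def R(i, j, k, l):
    val = sp.diff(Gamma[i][j][l], x[k]) - sp.diff(Gamma[i][j][k], x[l])
    val += sum(Gamma[p][j][l] * Gamma[i][p][k] - Gamma[p][j][k] * Gamma[i][p][l]
               for p in range(n))
    return sp.simplify(val)

print(R(0, 1, 0, 1))   # R^theta_{phi theta phi} = sin(theta)**2
```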
First printing, April 2017 [work in progress]
Copyright © 2017, Alan U. Kennington.
All rights reserved.
The author hereby grants permission to print this book draft for personal use.
Public redistribution of this book draft in PDF or other electronic form on the Internet is forbidden.
Public distribution of this book draft in printed form is forbidden.
You may not charge any fee for copies of this book draft.
This book was typeset by the author with the plain TeX typesetting system.
The illustrations in this book were created by the author with MetaPost.
This book is available on the Internet: www.geometry.org/dg.html
Preface
There are no tall towers of advanced differential geometry in this book. The objective of this book is to dig
deep tunnels beneath the tall towers of advanced theory to investigate and reconstruct the foundations.
This book has not been peer-reviewed, and it probably never will be. The economic efficiency argument for
pre-publication peer review is not so strong when the electronic version is free. Nowadays the best reviewer
is the reader. The peers will have little interest in this book because it is primarily concerned with the
meaning of mathematics, not with the serious business of discovering new theorems. The purpose here is
not to fit in with the current literature, but rather to examine some of that literature with the intention of
reconstructing it into a more unified systematic framework. Doing things in the usual way is not the prime
objective. (The author thinks of himself more as an Enkidu than as a Gilgamesh [430, 431].)
Part IV of this book is currently being rewritten because it is too incomplete and inconsistent. Part III is
also incomplete in some chapters. Anything which looks unfinished probably is unfinished. So please just
skip over it for now. The creative process for producing this book is illustrated in the following diagram.
All of these processes are happening concurrently.
During the current consolidation phase, everything will (hopefully) become tidy, comprehensible and
coherent. The book is being assembled like a jigsaw puzzle. Most of the pieces are fitting together nicely
already, but some pieces are still in the box waiting to be thrown on the table. Inconsistencies, repetition,
self-indulgence, frivolity and repetition are being progressively removed. All of the theorems will be proved.
Formative chaos will yield to serene order. It won't happen overnight, but it will happen.
Biography
The author was born in England in 1953 to a German mother and Irish father. The family migrated in 1963
to Adelaide, South Australia. The author graduated from the University of Adelaide in 1984 with a Ph.D. in
mathematics. He was a tutor at University of Melbourne in 1984, research assistant at Australian National
University (Canberra) in early 1985, Assistant Professor at University of Kentucky for the 1985/86 academic
year, and visiting researcher at University of Heidelberg, Germany, in 1986/87. From 1987 to 2010, the
author carried out research and development in communications and information technologies in Australia,
Germany and the Netherlands. His time and energy since then have been consumed mostly by this book.
Chapter titles
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Part I. Foundations
2. General comments on mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3. Propositional logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4. Propositional calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5. Predicate logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6. Predicate calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7. Set axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
8. Set operations and constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
9. Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
10. Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
11. Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
12. Ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
13. Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
14. Natural numbers and integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
15. Rational and real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
16. Real-number constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
Part II. Algebra
17. Semigroups and groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
18. Rings and fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
19. Modules and algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
20. Transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
21. Non-topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
22. Linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
23. Linear maps and dual spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677
24. Linear space constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
25. Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
26. Affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
27. Multilinear algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
28. Tensor spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823
29. Tensor space constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
30. Multilinear map spaces and symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . 867
Part III. Topology and analysis
31. Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895
32. Topological space constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 939
33. Topological space classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 967
34. Topological connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1001
35. Continuity and limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1029
36. Topological curves, paths and groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059
37. Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089
38. Metric space continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1113
39. Topological linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1135
40. Differential calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1147
41. Multi-variable differential calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169
42. Higher-order derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1193
43. Integral calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1211
44. Lebesgue measure zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1249
45. Exterior calculus for Cartesian spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 1269
46. Topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1291
47. Parallelism on topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . 1321
Part IV. Differential geometry
48. Locally Cartesian spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1345
49. Topological manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1375
50. Differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1397
51. Philosophy of tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1433
52. Tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1455
53. Covector bundles and frame bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1489
54. Tensor bundles and multilinear map bundles . . . . . . . . . . . . . . . . . . . . . . . . 1515
55. Vector fields, tensor fields, differential forms . . . . . . . . . . . . . . . . . . . . . . . . 1533
56. Differentials of functions and maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1549
57. Recursive tangent bundles and differentials . . . . . . . . . . . . . . . . . . . . . . . . 1573
58. Higher-order tangent operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1595
59. Vector field calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1607
60. Lie groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1631
61. Differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1663
62. Connections on differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . 1699
63. Connections on principal fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . 1725
64. Curvature of connections on fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . 1753
65. Affine connections on tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . 1773
66. Geodesics and Jacobi fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1801
67. Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1809
68. Levi-Civita parallelism and curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . 1827
69. Pseudo-Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1843
70. Spherical geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1851
Appendices
71. History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1865
72. Notations, abbreviations, tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1879
73. Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1903
74. Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1921
Acknowledgements
The author is indebted to Donald E. Knuth [433] for the gift of the TeX typesetting system, without which
this book would not have been created. (A multi-decade project requires a stable typesetting environment!)
Thanks also to John D. Hobby for the MetaPost illustration tool which is used for all of the diagrams.
The fonts used in this book include cmr (Computer Modern Roman, American Mathematical Society),
msam/msbm (AMSFonts, American Mathematical Society), rsfs (Taco Hoekwater), bbm (Gilles F. Robert),
wncyr Cyrillic (Washington University), grmn (CB Greek fonts, Claudio Beccari), and AR PL SungtiL GB
(Arphic Technology, Taiwan).
The author is grateful to Christopher J. Barter (University of Adelaide) for creating the Ludwig text editor
which has been used for all of the composition of this book (except for 6 months using an Atari ST in 1992).
The author is grateful to his Ph.D. supervisor, James Henry Michael (1920–2001), especially for his insistence
that all gaps in proofs must be filled, no matter how tedious this chore might be, because intuitively obvious
assertions often become unproven (or false) conjectures when the proofs must be written out in full, and
because filling the gaps often leads to more interesting results than the original obvious assertions.
The author is grateful to his father, Noel Kennington (1925–2010), for demonstrating the intellectual life,
for the belief that knowledge is a primary good, not just a means to other ends, and for a particular interest
in mathematics, sciences, languages, philosophies and histories.
Table of contents
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Structural layers of differential geometry . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Topic flow diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Chapter groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Objectives and motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Some minor details of presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Differences from other differential geometry books . . . . . . . . . . . . . . . . . . . . 11
1.7 How to learn mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.8 MSC 2010 subject classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Part I. Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Concrete proposition domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Proposition name spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Knowledge and ignorance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Basic logical expressions combining concrete propositions . . . . . . . . . . . . . . . . 54
3.6 Truth functions for binary logical operators . . . . . . . . . . . . . . . . . . . . . . . . 59
3.7 Classification and characteristics of binary truth functions . . . . . . . . . . . . . . . . 64
3.8 Basic logical operations on general logical expressions . . . . . . . . . . . . . . . . . . 65
3.9 Logical expression equivalence and substitution . . . . . . . . . . . . . . . . . . . . . . 67
3.10 Prefix and postfix logical expression styles . . . . . . . . . . . . . . . . . . . . . . . . 70
3.11 Infix logical expression style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.12 Tautologies, contradictions and substitution theorems . . . . . . . . . . . . . . . . . . 77
3.13 Interpretation of logical operations as knowledge set operations . . . . . . . . . . . . . 79
3.14 Knowledge set specification with general logical operations . . . . . . . . . . . . . . . . 81
3.15 Model theory for propositional logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Chapter 8. Set operations and constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.1 Binary set unions and intersections . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.2 The set complement operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
8.3 The symmetric set difference operation . . . . . . . . . . . . . . . . . . . . . . . . . . 254
8.4 Unions and intersections of collections of sets . . . . . . . . . . . . . . . . . . . . . . . 256
8.5 Closure of set unions under arbitrary unions . . . . . . . . . . . . . . . . . . . . . . . 262
8.6 Set covers and partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.7 Specification tuples and object classes . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
Chapter 12. Ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
12.1 Extended finite ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
12.2 Order on extended finite ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . 371
12.3 Enumerations of sets by extended finite ordinal numbers . . . . . . . . . . . . . . . . . 374
12.4 General ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
12.5 The transfinite recursive cumulative hierarchy of sets. . . . . . . . . . . . . . . . . . . 380
16.5 Some basic real-valued functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
16.6 Powers and roots of real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
16.7 Sums and products of finite sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
16.8 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
20.4 Transitive left transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
20.5 Orbits and stabilisers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591
20.6 Left transformation group homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . 592
20.7 Right transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
20.8 Right transformation group homomorphisms . . . . . . . . . . . . . . . . . . . . . . . 596
20.9 Figures and invariants of transformation groups . . . . . . . . . . . . . . . . . . . . . 597
20.10 Baseless figure/frame bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
20.11 Associated baseless figure bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
24.3 Natural isomorphisms for duals of subspaces . . . . . . . . . . . . . . . . . . . . . . . 711
24.4 Affine transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
24.5 Exact sequences of linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 716
24.6 Seminorms and norms on linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 717
24.7 Inner products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726
24.8 Hyperbolic inner products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728
28.1 Tensor product spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823
28.2 The canonical tensor map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826
28.3 Universal factorisation property for tensor products . . . . . . . . . . . . . . . . . . . 829
28.4 Simple tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831
28.5 Tensor space definition styles and terminology . . . . . . . . . . . . . . . . . . . . . . 834
28.6 Tensor product space metadefinition . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
28.7 Tensor product spaces defined via free linear spaces . . . . . . . . . . . . . . . . . . . 840
32.11 Set union topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 964
32.12 Topological collage spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 966
37.6 Some basic metric space topology properties . . . . . . . . . . . . . . . . . . . . . . . 1104
37.7 Compactness and convergence in metric spaces . . . . . . . . . . . . . . . . . . . . . . 1106
37.8 Cauchy sequences and completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1109
43.1 History and styles of integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1212
43.2 Gregory-Barrow-Leibniz-Newton integral . . . . . . . . . . . . . . . . . . . . . . . . . 1214
43.3 Cauchy integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1216
43.4 Cauchy-Riemann integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1218
43.5 Cauchy-Riemann-Darboux integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1222
43.6 Fundamental theorem of calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1227
43.7 Riemann integral for vector-valued integrands . . . . . . . . . . . . . . . . . . . . . . 1230
43.8 Cauchy-Riemann-Darboux-Stieltjes integral . . . . . . . . . . . . . . . . . . . . . . . . 1231
43.9 Stieltjes integral for vector-valued integrands . . . . . . . . . . . . . . . . . . . . . . . 1232
43.10 Stieltjes integral for linear operator integrands . . . . . . . . . . . . . . . . . . . . . . 1233
43.11 Ordinary differential equations, Peano method . . . . . . . . . . . . . . . . . . . . . . 1234
43.12 Ordinary differential equations, Picard method . . . . . . . . . . . . . . . . . . . . . . 1236
43.13 Logarithmic and exponential functions . . . . . . . . . . . . . . . . . . . . . . . . . . 1241
43.14 Trigonometric functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1243
47.3 Pathwise parallelism on topological fibre bundles . . . . . . . . . . . . . . . . . . . . . 1327
47.4 Associated parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1329
52.6 Tangent velocity vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1476
52.7 Tangent differential operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1478
52.8 Tagged tangent operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1486
52.9 Unidirectional tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1487
57.7 Higher-order differentials of real-valued functions . . . . . . . . . . . . . . . . . . . . . 1590
57.8 Hessian operators at critical points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1591
57.9 Higher-order differentials of maps between manifolds . . . . . . . . . . . . . . . . . . . 1592
Chapter 62. Connections on differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . 1699
62.1 History of parallel transport and connections . . . . . . . . . . . . . . . . . . . . . . . 1699
62.2 Specification styles for differential parallelism . . . . . . . . . . . . . . . . . . . . . . . 1702
62.3 Reconstruction of parallel transport from its differential . . . . . . . . . . . . . . . . . 1704
62.4 Horizontal lift functions on differentiable fibrations . . . . . . . . . . . . . . . . . . . . 1706
62.5 Horizontal lift functions on ordinary fibre bundles . . . . . . . . . . . . . . . . . . . . 1708
62.6 Horizontal component maps and horizontal subspaces . . . . . . . . . . . . . . . . . . 1715
62.7 Covariant derivatives for ordinary fibre bundles . . . . . . . . . . . . . . . . . . . . . . 1720
62.8 Parallel transport on ordinary fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . 1722
62.9 Connections on vector bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1722
66.5 Equations of geodesic deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1807
Appendices
Chapter 1
Introduction
This book is not "Differential Geometry Made Easy". If you think differential geometry is easy, you haven't
understood it! This book aims to be "Differential Geometry Without Gaps", although a finite book cannot
fill every gap. Recursive gap-filling entails a broad and deep traversal of the catchment area of differential
geometry. This catchment area includes a large proportion of undergraduate mathematics.
This is a definitions book, not a theorems book. Definitions introduce concepts and their names. Theorems
assert properties and relations of concepts. Most mathematical texts give definitions to clarify the meaning
of theorems. In this book, theorems are given to clarify the meaning of definitions. To be meaningful,
many definitions require prior proofs of existence, uniqueness or regularity. So some basic theorems are
unavoidable. Some theorems are also given to motivate definitions, or to clarify the relations between them.
The intention of this definition-centred approach is to assist comprehension of the theorem-centred literature.
The focus is on the foundations of differential geometry, not advanced techniques, but the reader who has a
confident grasp of basic definitions should be better prepared to understand advanced theory.
To study differential geometry, the reader should already have a good understanding of mathematical logic,
set theory, number systems, algebra, calculus and topology. These topics are presented in preliminary
chapters here, but it is preferable to have made prior acquaintance with them. This book is targeted at the
graduate perspective although much of the subject matter is typically taught in undergraduate courses.
The central concepts of differential geometry are coordinate charts, tangent vectors, the exterior calculus,
fibre bundles, connections, curvature and metric tensors. The intention of this book is to give the reader
a deeper understanding of these concepts and the relations between them. The presentation strategy is to
systematically stratify all differential geometry concepts into structural layers.
This is not a textbook. It is a thoughtful investigation into the meanings of differential geometry concepts.
Differential geometry imports a very wide range of definitions and theorems from other areas. Imperfect
understanding of fundamental concepts can result in an insecure or even false understanding of advanced
concepts. Therefore it is desirable to better understand the foundations. But the recursive definition of a
concept in terms of other concepts inevitably leads to a dead end, a loop, or an infinite chain of definitions.
Ultimately one must ask the ontology question: "What is this mathematical concept in itself?" The ontology
trail leads outside pure mathematics to other disciplines and into the minds of mathematicians.
The author intends to use this book as a resource for writing other books. When this unabridged version
has been released, the author may write a shorter version which omits the least popular technicalities. The
shorter version may be titled "Differential Geometry Made Easy". It might sell a lot of copies!
The following table shows structural layers required by some important concepts. For example, Jacobi fields
may be defined for an affine connection or Riemannian metric, but not if given only differentiable structure.
concepts                           structural layers where concepts are meaningful
                                   point   topology   differentiable   affine       riemannian
                                   set                structure        connection   metric
0  cardinality of sets             yes     yes        yes              yes          yes
1  interior/boundary of sets               yes        yes              yes          yes
   connectivity of sets                    yes        yes              yes          yes
   continuity of functions                 yes        yes              yes          yes
2  tangent vectors                                    yes              yes          yes
   differentials of functions                         yes              yes          yes
   tensor bundles                                     yes              yes          yes
   differential forms                                 yes              yes          yes
   vector field calculus                              yes              yes          yes
   Lie derivative                                     yes              yes          yes
   exterior derivative                                yes              yes          yes
   Stokes theorem                                     yes              yes          yes
3a connections/parallelism                                             yes+         yes
   covariant derivative                                                yes+         yes
   curvature form [pfb]                                                yes+         yes
   Riemann curvature [vb]                                              yes+         yes
3b geodesic curves                                                     yes          yes
   geodesic coordinates                                                yes          yes
   Jacobi fields                                                       yes          yes
   torsion tensor                                                      yes          yes
   convex sets and functions                                           yes          yes
   Ricci curvature                                                     yes          yes
4  distance between points                                                          yes
   length of a vector                                                               yes
   angle between vectors                                                            yes
   volume of a region                                                               yes
   area of a surface                                                                yes
   normal coordinates                                                               yes
   sectional curvature                                                              yes
   scalar curvature                                                                 yes
   Laplace-Beltrami operator                                                        yes
Concepts marked yes+ are meaningful for general connections on differentiable fibre bundles, not just for
affine connections on tangent bundles. Thus 3a denotes the sub-layer of connections on differentiable fibre
bundles, whereas 3b denotes the sub-layer of affine connections on tangent bundles. There are also some
other sub-layers and sub-sub-layers, for example for vector bundles [vb], differentiable fibrations without Lie
structure groups, principal fibre bundles [pfb], and so forth.
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
pages   chapter group                                   chapters
  14    introduction                                    1
 480    Part I: foundations                             2–16
  18      general comments                              2
 172      mathematical logic                            3–6
 124      sets, relations, functions                    7–10
 166      order, numbers                                11–16
 372    Part II: algebra                                17–30
 126      semigroups, groups, rings, fields, modules    17–21
 246      linear algebra, tensor algebra                22–30
 438    Part III: topology and analysis                 31–47
 252      topology and metric spaces                    31–39
 144      differential and integral calculus            40–45
  42      topological fibre bundles                     46–47
 520    Part IV: differential geometry                  48–70
  52      topological manifolds                         48–49
 234      differentiable manifolds                      50–59
  68      Lie groups, differentiable fibre bundles      60–61
 110      general and affine connections                62–66
  42      Riemannian/pseudo-Riemannian manifolds        67–69
  14      spherical geometry                            70
  56    appendices                                      71–73
  73    index                                           74
19. Modules and algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
20. Transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
21. Non-topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
Chapters 22 to 30 introduce linear algebra and multilinear/tensor algebra.
22. Linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
23. Linear maps and dual spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677
24. Linear space constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
25. Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
26. Affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
27. Multilinear algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
28. Tensor spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823
29. Tensor space constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
30. Multilinear map spaces and symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . 867
Part III. Topology and analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 885
Chapters 31 to 39 introduce topology. A metric space is a particular kind of topological space.
31. Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895
32. Topological space constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 939
33. Topological space classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 967
34. Topological connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1001
35. Continuity and limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1029
36. Topological curves, paths and groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059
37. Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089
38. Metric space continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1113
39. Topological linear spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1135
Chapters 40 to 45 introduce analytical topics, namely the differential and integral calculus.
40. Differential calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1147
41. Multi-variable differential calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169
42. Higher-order derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1193
43. Integral calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1211
44. Lebesgue measure zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1249
45. Exterior calculus for Cartesian spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 1269
Chapters 46 and 47 introduce fibre bundles which have topology but no differentiable structure.
46. Topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1291
47. Parallelism on topological fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . 1321
Part IV. Differential geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1333
Chapters 48 and 49 introduce topological manifolds. This is layer 1 in the five-layer DG structure model.
48. Locally Cartesian spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1345
49. Topological manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1375
Chapters 50 to 59 add differentiable structure (i.e. charts) to manifolds. This commences layer 2.
50. Differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1397
51. Philosophy of tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1433
52. Tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1455
53. Covector bundles and frame bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1489
54. Tensor bundles and multilinear map bundles . . . . . . . . . . . . . . . . . . . . . . . . 1515
55. Vector fields, tensor fields, differential forms . . . . . . . . . . . . . . . . . . . . . . . . 1533
56. Differentials of functions and maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1549
57. Recursive tangent bundles and differentials . . . . . . . . . . . . . . . . . . . . . . . . 1573
58. Higher-order tangent operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1595
59. Vector field calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1607
Chapters 60 and 61 introduce Lie (i.e. differentiable) groups, which are used in the definition of differentiable
fibre bundles, which are required for defining general connections. (This may be thought of as layer 2+.)
60. Lie groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1631
61. Differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1663
Chapters 62 to 66 introduce connections on manifolds without a metric. This is layer 3.
62. Connections on differentiable fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . 1699
63. Connections on principal fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . 1725
64. Curvature of connections on fibre bundles . . . . . . . . . . . . . . . . . . . . . . . . . 1753
65. Affine connections on tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . 1773
66. Geodesics and Jacobi fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1801
Chapters 67 to 69 introduce Riemannian and pseudo-Riemannian manifold structures, which are required
for general relativity. This is layer 4.
67. Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1809
68. Levi-Civita parallelism and curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . 1827
69. Pseudo-Riemannian manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1843
Chapter 70 applies differential geometry concepts to spheres, the quintessential Riemannian manifolds.
70. Spherical geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1851
Appendices
concepts more deeply, not just acquire skill in their application, has led him reluctantly into a prolonged
investigation, like a journalist persistently sniffing out a story over many years because a routine report
turned up some surprising developments. This book is a report of that journalistic investigation.
A classic example of this genre is George Boole's 1854 book, An investigation of the laws of thought [293].
In philosophy, it is not unusual to see the words essay, treatise or enquiry in a book title. (For example,
de Montaigne [398, 399]; Locke [405]; Berkeley [396]; Hume [402].) This book is thus an investigation, essay,
treatise or enquiry. Implicit in such titles is the possibility, or even the expectation, of failure! To judge this
book as a textbook or monograph would be like judging the cat as fox-hunter and the dog as mouse-catcher.
1.4.4 Remark: The original motivation for this book was elliptic second-order boundary value problems.
In 1986 the author started trying to generalise his PhD research on certain geometric properties of solutions
of second-order boundary and initial value problems from flat space to differentiable manifolds. This required
some computations for parallel transport of second-order differential operators along geodesics in terms of
the curvature of the point-space. The author could not find the answer in the literature.
Looking beyond the author's own research area, it seemed that the existence and regularity theory for partial
differential equations in flat space, particularly for boundary and initial value problems, should be extended
to curved space. If the universe is not flat, much PDE theory could be inapplicable. Many of the techniques of
PDE theory depend on special properties of Euclidean space. So it seemed desirable to formulate differential
geometry so as to facilitate porting PDE theory to curved space. For this task, the author could not
find suitable differential geometry texts. The more he read, the more confusing the subject became because
of the multitude of contradictory definitions and formalisms. The origins and motivations of differential
geometry concepts are largely submerged under a century of continuous redefinition and rearrangement. The
differential geometry literature contains a plethora of mutually incomprehensible formalisms and notations.
Writing this book has been like creating a map of the world from a hundred regional maps which use different
coordinate systems and languages for locating and naming geographical features.
1.4.6 Remark: Computation is the robot's task. Understanding mathematics is the human's task.
To suppose that mathematics is the art of calculation is like supposing that architecture is the art of
bricklaying, or that literature is the art of typing. Calculation in mathematics is necessary, but it is an
almost mechanical procedure which can be automated by machines. The mathematician does much more
than a computerised mathematics package. The mathematician formulates problems, reduces them to a form
which a computer can handle, checks that the results correspond to the original problem, and interprets and
applies the answers. Therefore the mathematician must not be too concerned with mere computation. That
is the task of the robot. The task of the human is to understand mathematics.
Most mathematics is concerned with answering questions, like what 2 + 2 equals and how to prove it. But
another important part of mathematics is questioning answers, like what "2", "+" and "=" mean, how these
concepts can be better defined, and how they can be generalised. This book is concerned as much with
questioning answers as with answering questions.
1.4.7 Remark: It is important to identify the class of each mathematical object before using it.
For every mathematical object, one may ask the following questions.
(1) How do I perform calculations with this object?
(2) In which space does this object live?
(3) What is the essential nature of this object?
Students who need mathematics only as a tool for other subjects may be taught not much more than how to
perform calculations. This often leads to incorrect or meaningless calculations because of a lack of knowledge
of which kind of space each object lives in. Different spaces have different rules. Question (3) is important
to help guide one's intuition to form conjectures and discover proofs and refutations. One must have some
kind of mental concept of every class of object. This book tries to give answers to all three of the above
questions. Human beings are half animal, half robot. It is important to satisfy the animal half's need for
motivation and meaning as well as the robot half's need to make computations.
If one cannot determine the class of object to which a symbolic expression refers, it may be that the
expression is a pseudo-notation. That is, it may be a meaningless expression. Such expressions are
frequently encountered in differential geometry (and in theoretical physics and applied mathematics). This
book tries to avoid pseudo-notations and pseudo-concepts. They should be replaced with well-defined,
meaningful expressions and concepts. An effective tactic for making sense of difficult ideas in the DG
literature is to determine which set each object in each mathematical expression belongs to. It sometimes
happens that a difficult problem is made easy, trivial or vacuous by carefully identifying the meanings of all
symbolic expressions and sub-expressions in the statement of the problem. Plodding correctness is better
than brilliant guesswork. (Best of all, of course, is brilliant guesswork combined with plodding correctness.)
self-contradictory. Dissecting a difficult mathematical idea occasionally reveals that the presenter needs to
do more homework. To paraphrase a famous song title, it don't mean a thing if it don't mean a thing.
1.4.9 Remark: Minimalism: This book attempts to remove unnecessary constraints on definitions.
In this book, an attempt is made to determine what happens if many of the apparently arbitrary choices
and restrictions in definitions are dropped or relaxed. If removing or weakening a requirement results in a
useless or meaningless definition, this helps to clarify the meaning and motivate the definition.
There is a kind of duality between definitions and theorems. If a definition is generalised, the set of theorems
is typically restricted. If a definition is restricted, the set of theorems is typically increased. Conversely, if
a theorem is strengthened, this puts a constraint on the generality of the definitions for the inputs to the
theorem, and vice versa. For each input to a fixed theorem, there is a maximal definition for that input for
which the theorem is valid (for a given fixed output). In this sense, there is a natural maximal definition for
each input to that theorem, assuming that the other inputs and the output are fixed. In many places in this
book, such maximal definitions are sought. (A maximal definition here means a definition which is true for
a maximal set of objects, which means that the definition has minimal constraints.) Since different authors
have different favourite theorems, many of the most important definitions in mathematics have multiple
versions, each optimised for a different set of favourite target theorems.
can be obtained for a C^{k-1} input, the theorem requiring a C^k input would not be sharp. Similarly, if the
output is shown to be C^k when in fact it is possible to prove a C^{k+1} output, such a theorem would not
be sharp. A theorem can be shown to be sharp by demonstrating the existence of a counterexample which
makes strengthening or sharpening of the theorem impossible within the specified class of possibilities. (In
this example, the class of possibilities is the set of all C^k regularity levels.)
The assumptions of definitions can also be sharpened. For example, a continuous real function can be
integrated, but much weaker assumptions than continuity still yield a meaningful integral. Similarly, many
definitions for linear spaces may be easily extended to general modules over rings, and many definitions for
Riemannian manifolds are perfectly meaningful for affinely connected manifolds.
Whenever it has been relatively easy to do so, theorems and definitions have been made as sharp as possible
in this book, although this has been avoided when the cost has clearly outweighed the benefits. Sometimes
one must sacrifice some truth for beauty, although mostly it is better to sacrifice some beauty for truth.
1.4.13 Remark: Folklore should be avoided. Methods and assumptions should be explicitly documented.
Differential geometry has developed multiple languages and dialects for expressing its extensive network of
concepts during the last 190 years. Familiarity with the folklore of each school of thought in this subject
sometimes requires a lengthy apprenticeship to learn its languages and methods. It is the author's belief
that a competent mathematician should be able to learn differential geometry from books alone without the
need for initiation into the esoteric mysteries by the illuminated ones. The validity of theorems should follow
from objective logical argument, not subjective geometric intuition. Differential geometry should receive the
same kind of logically rigorous treatment that algebra, topology and analysis are subjected to.
The author is reminded of the foreword to the counterpoint tutorial Gradus ad Parnassum ([435], page 17)
by Johann Joseph Fux, published in 1725:
I do not believe that I can call back composers from the unrestrained insanity of their writing to
normal standards. Let each follow his own counsel. My object is to help young persons who want
to learn. I knew and still know many who have fine talents and are most anxious to study; however,
lacking means and a teacher, they cannot realize their ambition, but remain, as it were, forever
desperately athirst.
Joseph Haydn was one of the composers who learned counterpoint from Fux's book because he lacked means
and a teacher. This differential geometry book is also aimed at those who lack means and a teacher. The
Gradus ad Parnassum was much more successful than previous counterpoint tutorials because Fux arranged
the ideas in a logical order, starting from the most basic two-part counterpoint (with gradually increasing
rhythmic complexity), then three-part counterpoint in the same way, and finally four-part counterpoint in
the same graduated manner. Fux [435], page 18, said about this systematic approach: When I used this
method of teaching I observed that the pupils made amazing progress within a short time. This book tries
similarly to arrange all of the ideas which are required for differential geometry in a clear systematic order.
1.4.14 Remark: Concentrated focus on definitions gives better understanding than peripheral perception.
A vegetarian cook will generally cook vegetables better than the meat-centric cook who regards vegetables
as a necessary but uninteresting background. And a definition-centric book will generally explain definitions
better than a theorem-centric book which regards definitions as a necessary but uninteresting background.
1.4.15 Remark: Truth is not decided by majority vote. Seek truth from facts.
This author has strongly believed since the age of 16 years that truth is not decided by majority vote. After
a test had been handed back to the chemistry class, he noticed that the two questions where he had lost
points had been marked incorrectly. So he raised this with the teacher. After about 20 minutes of open
discussion, the teacher agreed that it had been marked incorrectly. Then this 16-year-old student started to
make the case for the other lost mark. The teacher lost patience and asked for a show of hands to decide
where the teacher was right or the student. The majority voted on the side of the teacher. The teacher said
this settled the matter and the aggrieved student was wrong. Since this student had studied chemistry by
reading more broadly and intensively than expected, it seemed somewhat unjust that a majority vote had
decided the truth. The disillusionment with science that arose from this experience is doubtless one of
the historical causes of the present book. If the majority of mathematicians, teachers and textbooks agree
on a point, that does not necessarily imply that it is correct. Assertions should be either true or false in
themselves, not because a majority vote goes one way or the other. All majority beliefs should be examined
and re-examined by free-thinking individuals because the majority might be wrong! As Mao Ze-Dong
and Deng Xiao-Ping used to say: Seek truth from facts. (Shí shì qiú shì. 实事求是.)
[Table: the number systems and their extensions: the integers Z (with N, Z^+, Z^+_0 and the extended
integers Z̄), the rationals Q (with Q^+, Q^+_0, Q^- and the extended rationals Q̄), and the reals R (with
R^+, R^+_0, R^-, R^-_0 and the extended reals R̄).]
The notation ω is used for the non-negative integers Z^+_0 when the ordinal number representation is meant.
Thus 0 = ∅, 1 = {0}, 2 = {0, 1} and so forth. Then ω^+ = ω ∪ {ω}. The notation Z^+_0 is preferable when
the representation is not important. Note that the plus-symbol in Z^+_0 indicates the inclusion of positive
numbers, whereas the plus-symbol in ω^+ means that an extra element is added to the set ω.
The notation N is mnemonic for the natural numbers Z^+. (Some authors include zero in the natural
numbers. See Remark 14.1.1.) The bar over a set of numbers indicates that the base set is extended by the
elements −∞ and ∞. So Z̄ = Z ∪ {−∞, ∞} and R̄ = R ∪ {−∞, ∞}. The positivity and negativity restrictions
are then applied to these extended base sets. Thus Z̄^+_0 = Z^+_0 ∪ {∞} and R̄^- = R^- ∪ {−∞}.
The notations N_n = {1, 2, . . . n} and ω_n = {0, 1, . . . n − 1} are used as index sets for finite sequences. These
index sets are often used interchangeably in an informal manner. Thus n usually means N_n in practice.
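The ordinal representation sketched above (0 = ∅, 1 = {0}, 2 = {0, 1}, with ω^+ = ω ∪ {ω} as the pattern for adding an extra element) can be illustrated concretely. The following Python sketch (not from the book; the function names are hypothetical) models finite von Neumann ordinals with frozensets, together with the two index-set conventions N_n and ω_n:

```python
def ordinal(n):
    """The von Neumann ordinal n: 0 = {} and k+1 = k ∪ {k}."""
    s = frozenset()
    for _ in range(n):
        s = frozenset(s | {s})
    return s

def successor(a):
    """The successor a ∪ {a}, the same construction as omega+ = omega ∪ {omega}."""
    return frozenset(a | {a})

def N_n(n):
    """Index set N_n = {1, 2, ..., n}."""
    return set(range(1, n + 1))

def omega_n(n):
    """Index set omega_n = {0, 1, ..., n-1}."""
    return set(range(n))

# 0 = ∅, 1 = {0}, 2 = {0, 1}; each ordinal n has exactly n elements.
assert ordinal(0) == frozenset()
assert ordinal(2) == frozenset({ordinal(0), ordinal(1)})
assert successor(ordinal(2)) == ordinal(3)
assert N_n(3) == {1, 2, 3} and omega_n(3) == {0, 1, 2}
```

The same successor operation generates every finite ordinal, which is why each ordinal is literally the set of all smaller ordinals.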
1.5.2 Remark: Mathematical symbols at the beginning of a sentence are avoided.
When mathematical symbols appear immediately after punctuation, the sentence structure can become
unclear. Therefore a concerted effort has been made to commence all sentences with natural language words.
For similar reasons, end-of-sentence mathematical symbols at the beginning of a line are also avoided. These
punctuation-related considerations sometimes lead to slightly unnatural grammatical structure of sentences.
Likewise, a concerted effort has been made to rearrange sentences in order to avoid hyphenation at line
breaks. The text is also often rewritten to avoid a single word on a final line, which would look bad.
1.5.3 Remark: There are no foot-notes or end-notes.
Mathematics is painful enough already without having to go backwards and forwards between the main text
and numerous foot-notes and/or end-notes. Such archaic parenthetical devices cause sore eyes and loss of
concentration. Their original purposes are no longer so relevant in the era of computer typesetting. They
were formerly used to save money when typesetting postscripts, after-thoughts and revision notes by editors.
The bibliography, on the other hand, does belong at the end of a book or article, pointed to from the text.
1.5.4 Remark: All theorems are based on Zermelo-Fraenkel axioms except those which are tagged.
Theorems which are based on non-Zermelo-Fraenkel axioms are tagged to indicate the axiomatic framework.
This is like listing ingredients on food labels. (See Remarks 4.5.5 and 7.1.11 for some axiom-system tags.)
1.5.5 Remark: Pseudo-notations to introduce definitions are avoided.
A definition is not a logical implication or equivalence. Nor is it an axiom or a theorem. A definition
is a shorthand for an often-used concept. Therefore the symmetric symbols "=" for equality and "≡" for
equivalence are not used in this book for definitions, but the popular asymmetric definition assignment
notations, such as "=def" and ":=", are clumsy, unattractive and incorrect. Plain language is used instead.
1.5.6 Remark: An empty-square symbol indicates the end of each proof.
The square symbol □ is placed at the end of each completed proof, with the same meaning as the Latin
abbreviation QED (Quod Erat Demonstrandum) which was customary in old-style Euclidean geometry
textbooks. The invention of this kind of QED symbol is generally attributed to Paul Richard Halmos. In
the 1990 edition of a 1958 book, Eves [303], page 149, wrote: The modern symbol ∎, suggested by Paul R.
Halmos, or some variant of it, is frequently used to signal the end of a proof. In 1955, Kelley [97], page vi,
wrote the following.
In some cases where mathematical content requires if and only if and euphony demands something
less I use Halmos' iff. The end of each proof is signalized by ∎. This notation is also due to Halmos.
Many authors have used a solid rectangle. (This leaves unanswered the question of who first used the more
attractive, understated empty square symbol instead of the brash, ink-wasting solid bar.)
1.5.7 Remark: Italicisation of Latin and other non-English phrases.
It is conventional to typeset non-English phrases in italics, e.g. modus ponens and modus tollens, as a hint
to readers that these are imported from other languages. This convention is frequently adopted here also.
1.5.8 Remark: The chicken-foot symbol indicates specification-tuple abbreviations.
The non-standard chicken-foot symbols < and > indicate abbreviations of specification tuples for
mathematical objects. (See Section 8.7.) This avoids embarrassing absurdities such as: Let X = (X, T ) be
a topological space. Clearly abbreviation, not equality, is intended in such situations.
(1) This is not a textbook. (See Remark 1.4.3.)
(2) Definitions are the main focus rather than theorems. Theorems are presented only to support the
presentation of definitions.
(3) Substantial preliminary chapters present most of the prerequisites for the book. This avoids having to
weave elementary material into more advanced material wherever needed.
(4) Classes of mathematical objects are generally defined in terms of specification tuples.
(5) An effort has been made to identify clearly which class each mathematical object belongs to, and the
domain and range classes for all functions. This kind of tidy book-keeping helps to ensure that every
expression is meaningful.
(6) Predicate logic notation is used extensively for the presentation of definitions and theorems. Predicate
logic is more precise than explanations in natural language, but most of the formal definitions are
accompanied by ample natural-language commentary, especially when the logic could be perplexing.
(7) Differential geometry concepts are presented in a progressive order within a framework of structural
layers and sublayers. In particular, the Riemannian metric is not introduced until affine connections have
been presented, and concepts which are meaningful without a connection are defined in the differentiable
manifold chapters before connections are presented. (See Section 1.1 for further details.)
(8) There are no exercises (because this is not a textbook). So the reader is not required to remedy numerous
gaps in the presentation. Proving theorems is the best kind of exercise. Readers should write or sketch
proofs for most theorems themselves before reading the solutions provided here. (Most of the proofs
in this book would be set as exercises in typical textbooks.) Proof-writing is the principal skill of pure
mathematics. Anyone can make conjectures. The ability to provide a proof is the difference between
the mathematician and the dilettante. The dilettante conjectures. The mathematician proves. Creating
examples and diagrams to elucidate theorems and definitions is also an important kind of exercise. And
possibly most educational of all is to rewrite everything in ones own personal choice of notation.
1.6.3 Remark: Topics which are outside the scope of this book.
All books have loose ends. Otherwise they would never end. Even encyclopedias must limit their scope.
(1) Model theory. Although model theory is alluded to, and some basic models (the cumulative hierarchy
and the constructible universe) are defined, this topic is not developed beyond some basic definitions.
(2) Category theory. The bulk handling facilities for morphisms in category theory are not used here.
(3) Complex numbers. As far as possible, complex numbers are avoided. However, they are required for
the unitary Lie groups which are unavoidable in gauge theory.
(4) Topological vector spaces. Presentation of general topological vector spaces is restricted to a brief
outline in Section 39.1. Schwartz distributions and Sobolev spaces are not presented.
(5) Algebraic topology. Apart from some brief allusions, algebraic topology and combinatorial topology
in general are avoided. Only point set topology is presented.
(6) Lie groups. Although Chapter 60 presents some basic definitions for Lie groups, the deeper group-
theoretic, combinatorial topology and representation theory aspects of the subject are avoided.
(7) Probability theory. Almost nothing is said about probability at all. Consequently information
geometry is not presented.
(8) Theoretical physics. Although one of the principal motivations for this book is to elucidate and
illuminate the mathematical foundations which underlie general relativity and cosmology, applications to
physics are restricted to some basic definitions in Chapter 69 concerning pseudo-Riemannian manifolds.
Likewise gauge theory, which is related to Lie groups and fibre bundles, is not presented.
⟪ 2016-11-15. Part (8) will need to be updated if the current plan to include basic gauge theory succeeds. ⟫
A book is like a refrigerator. When there is too little food in the fridge, it is advantageous to add more food.
But too much food means you can't find things at the back, and the stale food eventually becomes inedible.
(10) If you discuss your work with other people, especially with teachers or tutors, show them your notes
and the lines with the asterisks. Try to get them to explain these lines to you.
If you keep working like this, you will find that your study becomes very efficient. This is because you do
not waste your time studying things which you have already understood.
I used to notice that I would spend most of my time reading the things which I did understand. To learn
efficiently, it is necessary to focus on the difficult things which you do not understand. That's why it is so
important to mark the incomprehensible lines with an asterisk.
Copying material by hand is important because this forces the ideas to go through the mind. The mind is
on the path between the eyes and the hands. So when you copy something, it must go through your mind!
It is also important to develop an awareness of whether you do or do not really understand something. It is
important to remove the mental blind spots which hide the difficult things from the conscious mind.
When copying material, it is important to determine which is the first word where it becomes difficult. Read
a difficult sentence until you find the first incomprehensible word. Focus on that word. If a mathematical
expression is too difficult to understand, read each symbol one by one and ask yourself if you really know
what each symbol means. Make sure you know which set or space each symbol belongs to. Look up the
meaning of any symbol which is not clear.
1.7.2 Remark: First learn. Then understand. Insight requires ideas to be uploaded to the mind first.
It is often said that learning is most effective when the student has insight. This assertion is sometimes
given as a reason to avoid rote learning, because insight must precede the acquisition of ideas. But the
best way to get insight into the properties and relations of ideas is to first upload them into the mind and
then make connections between them. For example, a child learning multiplication tables will notice many
patterns and redundancies during or after rote learning. The best strategy is to learn first, then understand
more deeply. Connections can only be made between ideas when they are in the mind to be connected.
[Figure: a circular chart of the mathematical sciences, labelled with two-digit AMS Mathematics Subject Classification codes (00-97), ranging over mathematical logic and foundations, combinatorics, algebra, number theory, real and complex analysis, differential equations, geometry and topology, probability and statistics, numerical analysis, computer science, mechanics of particles and systems, and mathematical physics.]
Part I
Foundations
The informal outlines at the beginning of Parts I, II, III and IV of this book are not intended to be read.
However, these outlines could possibly be of some value as an adjunct to the table of contents and index.
Part I: Foundations
(1) Part I introduces logic, sets and numbers. Each of these foundation topics requires the other two. So
it is not possible to present them in strict define-before-use order. (This would be like trying to write
a dictionary where every word must be defined using only words which have already been defined.)
The definitions and theorems before and including Definition 7.2.4 are tagged as metamathematical or
logical, but most definitions and theorems thereafter are based on Zermelo-Fraenkel set theory and are
presented in define-before-use order.
(2) Chapters 3, 4, 5 and 6 on mathematical logic are presented before all other topics because logic is
the most fundamental, ubiquitous and deeply pervasive part of mathematics. In particular, it is the
necessary foundation for axiomatic set theory.
(3) Chapter 7 introduces the Zermelo-Fraenkel set theory axioms, which provide the fundamental framework
for modern mathematics. Chapters 8, 9 and 10 present the most basic structures and theorems of
mathematics in terms of the ZF set theory foundation.
(4) Chapters 11, 12 and 13 introduce order, ordinal numbers and cardinality.
(5) Chapters 14, 15 and 16 introduce natural numbers, integers, rational numbers and real numbers.
(6) Section 3.5 introduces logical expressions which can be used to describe one's current knowledge about
a given concrete proposition domain. These logical expressions use familiar logical operations such as
"and", "or", "not" and "implies".
(7) Definition 3.6.2 introduces truth functions, which are arbitrary maps from {F, T}^n to {F, T}. These
are used to present some basic binary logical operators in Remark 3.6.12. The wide variety of notations
for logical operators is demonstrated by the survey in Table 3.6.1, described in Remark 3.6.14.
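As an informal editorial illustration (not part of the book's formalism), truth functions and some basic binary logical operators can be tabulated directly in Python; the names `truth_table`, `AND`, `OR` and `IMPLIES` are illustrative choices, not the book's notation.

```python
from itertools import product

# A truth function of arity n is an arbitrary map from {F, T}^n to {F, T}.
# Truth values are modelled here as Python booleans.
def truth_table(f, n):
    """Tabulate the truth function f over all 2**n argument tuples."""
    return {args: f(*args) for args in product((False, True), repeat=n)}

# Some familiar binary logical operators, viewed as truth functions.
AND     = lambda p, q: p and q
OR      = lambda p, q: p or q
IMPLIES = lambda p, q: (not p) or q

assert len(truth_table(IMPLIES, 2)) == 4
assert truth_table(IMPLIES, 2)[(True, False)] is False
assert truth_table(IMPLIES, 2)[(False, False)] is True
```

Since each of the 2^n argument tuples may map to either truth value, there are 2^(2^n) distinct truth functions of arity n, of which the familiar binary operators are only a handful.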
(8) Sections 3.8 and 3.9 introduce symbol-strings which may be used as names of propositional logic
expressions. This allows names which refer to particular knowledge functions to be combined into new
knowledge functions. When particular meanings are defined for logical symbol-strings, the result is a
kind of syntax or language for describing knowledge functions. Symbol-strings are a compact way
of systematically naming knowledge functions (i.e. logical expressions).
(9) Section 3.10 introduces the prefix and postfix styles for logical expressions, which are relatively easy to
manage. Section 3.11 introduces the infix style for logical expressions, which is relatively difficult to
manage, but this is the most popular style!
(10) Section 3.12 introduces tautologies and contradictions, which are important kinds of logical expressions.
It also mentions substitution theorems for tautologies and contradictions.
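Whether a propositional expression is a tautology or a contradiction can be decided mechanically by exhausting all 2^n truth assignments. A small Python sketch (an editorial illustration with hypothetical names, not the book's method):

```python
from itertools import product

def is_tautology(expr, n):
    """True if expr, a truth function of n variables, holds under every assignment."""
    return all(expr(*vals) for vals in product((False, True), repeat=n))

def is_contradiction(expr, n):
    """True if expr holds under no assignment."""
    return not any(expr(*vals) for vals in product((False, True), repeat=n))

implies = lambda p, q: (not p) or q

# Peirce's law ((p -> q) -> p) -> p is a classical tautology.
assert is_tautology(lambda p, q: implies(implies(implies(p, q), p), p), 2)
assert is_contradiction(lambda p: p and not p, 1)
assert not is_tautology(lambda p, q: implies(p, q), 2)
```

This brute-force check works only for propositional logic; as the later chapters on predicate calculus observe, no analogous exhaustive procedure exists once quantifiers over infinite domains appear.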
(11) Section 3.13 is concerned with the interpretation of propositional logic expressions as
logical operations on knowledge sets.
(12) Section 3.14 mentions some of the limitations of the standard logical syntax systems for describing
complicated knowledge sets.
(13) Section 3.15 states that model theory for propositional logic has very limited interest. It also explains
very briefly what model theory is, and gives a broad classification of logical systems.
Chapter 4: Propositional calculus
(1) Chapter 4 describes the argumentative approach to logic, where axioms and rules of inference are
first asserted (and presumably accepted by the reader), and logical propositions are then progressively
inferred from those axioms and rules. In this way, the reader can be led to accept propositions that
they otherwise might reject if they had foreseen the consequences of accepting the axioms and rules. In
propositional logic, the argumentative approach requires much more effort than truth-table computation.
However, the argumentative method is presented here, in the Hilbert style and without the assistance
of the deduction theorem, to demonstrate just how much hard work is required. This is a kind of
mental preparation for predicate calculus, where the argumentative method seems to be unavoidable
for a logical system which has non-logical axioms.
(2) Remark 4.1.5 mentions that the knowledge set concept is applied to the interpretation of propositional
logic argumentation in much the same way as it is to propositional logic expressions. This gives a
semantic basis for predicate logic argumentation. (In other words, it gives a basis for testing whether
all arguments infer only true conclusions from true assumptions.)
(3) Remark 4.1.6 gives a survey of the wide range of terminology which is found in the literature to refer
to various aspects of formalisations of logic.
(4) Table 4.2.1 in Remark 4.2.1 surveys the wide range of styles of axiomatisation of propositional calculus.
(5) Definition 4.3.9 and Notations 4.3.11, 4.3.12 and 4.3.15 define and introduce notations for unconditional
and conditional assertions.
(6) The modus ponens inference rule is introduced in Remark 4.3.17.
(7) Definition 4.4.3 presents the chosen axiomatisation for propositional calculus in this book. This consists
of the three Łukasiewicz axioms together with the modus ponens and substitution rules.
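The three Łukasiewicz axiom schemas and the modus ponens rule can be made concrete with a toy Hilbert-style derivation. The following Python sketch is an editorial illustration (the tuple encoding of formulas is an assumption, not the book's notation); it reproduces the classical five-line proof of p ⇒ p.

```python
# Formulas as nested tuples: ('->', A, B) for implication, ('not', A) for negation.
def imp(a, b): return ('->', a, b)
def neg(a): return ('not', a)

# The three Lukasiewicz axiom schemas, instantiated for given formulas.
def ax1(a, b):    return imp(a, imp(b, a))
def ax2(a, b, c): return imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c)))
def ax3(a, b):    return imp(imp(neg(a), neg(b)), imp(b, a))

def modus_ponens(premise, implication):
    """From A and A -> B, infer B; raise if the shapes do not match."""
    if implication[0] == '->' and implication[1] == premise:
        return implication[2]
    raise ValueError('modus ponens does not apply')

# A Hilbert-style proof of p -> p, using only the axioms and modus ponens.
p = 'p'
s1 = ax2(p, imp(p, p), p)   # (p->((p->p)->p)) -> ((p->(p->p))->(p->p))
s2 = ax1(p, imp(p, p))      # p -> ((p->p)->p)
s3 = modus_ponens(s2, s1)   # (p->(p->p)) -> (p->p)
s4 = ax1(p, p)              # p -> (p->p)
s5 = modus_ponens(s4, s3)   # p -> p
assert s5 == imp(p, p)
```

Even this trivial theorem needs five lines and two axiom instantiations, which illustrates the book's point about how much hard work the Hilbert style demands without the deduction theorem.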
(8) The first propositional calculus theorem in this book is Theorem 4.5.6, which bootstraps the system
by proving some very elementary technical assertions. These are applied throughout the book to prove
other theorems. Because of the way in which logic is formulated here in terms of very general concrete
proposition domains and knowledge sets, the theorems in Chapter 4 are applicable to all logic and
mathematics. Further theorems for the implication operator are given in Section 4.6.
(9) Section 4.7 presents some theorems for the "and", "or" and "if and only if" operators, which are
themselves defined in terms of the implication and negation operators.
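One common way of defining these operators from implication and negation (the book's exact definitions may differ) can be verified against the usual truth tables:

```python
from itertools import product

implies = lambda p, q: (not p) or q
neg     = lambda p: not p

# Standard classical definitions in terms of implication and negation:
AND = lambda p, q: neg(implies(p, neg(q)))   # p and q  :=  not(p -> not q)
OR  = lambda p, q: implies(neg(p), q)        # p or q   :=  (not p) -> q
IFF = lambda p, q: AND(implies(p, q), implies(q, p))

# Check all four assignments against Python's built-in operators.
for p, q in product((False, True), repeat=2):
    assert AND(p, q) == (p and q)
    assert OR(p, q) == (p or q)
    assert IFF(p, q) == (p == q)
```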
(10) The deduction theorem, which is actually a metatheorem, is discussed in Section 4.8, but it is
not used in Chapter 4 for proving any theorems. However, the conditional proof rule CP is included
in Definition 6.3.9 as an axiom because it is semantically obvious. It is not obvious that it follows from
the argumentative approach to logic, but in terms of the knowledge set semantics, there is little room
for doubt.
(1) Figure 5.0.1 in Remark 5.0.1 shows the relations between propositional logic, predicate logic, first-order
languages and Zermelo-Fraenkel set theory. Mathematics is viewed in this book, in principle, as the set
of all systems which can be modelled within ZF set theory. (This is a somewhat approximate delineation
which can be made more precise.)
(2) Section 5.1 discusses the underlying assumptions and interpretations of predicate logic. Remark 5.1.1
describes predicate logic as a framework for the bulk handling of parametrised sets of propositions,
particularly infinite sets of propositions.
(3) Notation 5.2.2 introduces the universal and existential quantifiers ∀ (for all) and ∃ (for some). These
are applied to parametrised propositions to form predicates, which are propositions which may contain
these quantifiers. A survey of notations for these quantifiers is given in Table 5.2.1 in Remark 5.2.4.
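For a finite parameter domain, the two quantifiers reduce to finite conjunction and disjunction, which Python expresses as `all()` and `any()`. This is an editorial sketch only; for infinite domains no such exhaustive check is possible, which is why predicate calculus needs the argumentative approach.

```python
# A finite parameter domain for a parametrised family of propositions.
domain = range(100)

def P(n):
    # A parametrised proposition P(n): "n squared has the same parity as n".
    return (n * n) % 2 == n % 2

# "for all n in the domain, P(n)" and "for some n in the domain, n is even":
assert all(P(n) for n in domain)
assert any(n % 2 == 0 for n in domain)
```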
(4) Section 5.3 gives some informal arguments in favour of using natural deduction systems for predicate
calculus instead of Hilbert-style systems. Natural deduction systems use conditional assertions (called
sequents) for each line of an argument, whereas Hilbert-style systems use unconditional assertions.
Natural deduction systems correspond closely to the efficient way in which real mathematicians construct
proofs of real-world theorems, whereas Hilbert-style systems are often oppressively difficult to work with
because they are so unintuitive. Real mathematical proofs rely heavily on the inference methods known
as Rule G and Rule C (discussed in Remark 6.3.19 and elsewhere), but the Hilbert approach does
not permit these methods, except by way of metatheorems. The semantics of sequents is summarised
in Table 5.3.1 in Remark 5.3.10 for simple propositions, and in Remark 5.3.11 for logical quantifiers.
(1) It is explained in Section 6.1 that there is no general truth-table-like automatic method for evaluating
validity of logical predicates. Therefore the argumentative method must be used.
(2) Table 6.1.2 in Remark 6.1.5 is a survey of styles of axiomatisation of predicate calculus according to the
numbers of basic logical operators, axioms and inference rules they use. The style chosen here is based
on the four symbols ¬, ⇒, ∀ and ∃, together with the three Łukasiewicz axioms, the modus ponens and
substitution rules, and five natural deduction rules. The propositional calculus theorems in Chapter 4
are assumed to apply to predicate calculus because the same three Łukasiewicz axioms are used.
(3) The chosen predicate calculus system for this book is presented in Definition 6.3.9.
(4) Section 6.4 discusses the semantic interpretation of the chosen predicate calculus. Remark 6.5.9 gives
an example of the semantics of some particular lines of logical argument in this natural deduction style.
(5) Section 6.6 presents the first bootstrap theorems for the chosen predicate calculus. Surprisingly
difficult are the assertions in Theorem 6.6.4 for zero-variable predicates. These must be handled very
carefully because of the subtleties of interpretation. The other theorems in Section 6.6 demonstrate
some of the strategies which can be used for this natural deduction system, and also show how some
kinds of false theorems can be easily avoided. (In some predicate calculus systems, false theorems
are not so easy to avoid!)
(6) Section 6.7 introduces predicate calculus with equality. An equality relation is needed in every logical
system where the proposition parameters represent objects in some universe. In the case of ZF set
theory, these objects are the sets.
(7) In terms of the equality relation, uniqueness and multiplicity are introduced in Section 6.8.
(8) Section 6.9 introduces first-order languages, which are distinguished from predicate calculus by having
more relations than just equality. In the ZF set theory case, the element-of relation is required.
Chapter 9: Relations
(1) Chapter 9 introduces relations as arbitrary sets of ordered pairs in Section 9.5.
(2) Some basic theorems about ordered pairs are given in Section 9.2.
(3) Section 9.3 contains some comments on extensions of the ordered pair concept to more general finite
ordered tuples.
(4) General Cartesian set-products and their basic properties are introduced in Section 9.4.
(5) Basic definitions and properties of general relations are presented in Section 9.5, including important
concepts such as the domain, range, source set, target set and images of relations.
(6) Section 9.6 presents various properties of composites and inverses of relations.
(7) Section 9.7 defines direct products of relations. These are precursors of the direct products of
functions in Section 10.14, which are used for direct products of manifolds in Sections 49.4 and 50.13.
(8) Equivalence relations, which are almost ubiquitous in mathematics, are defined in Section 9.8.
(8) Sections 10.9 and 10.10 concern the somewhat neglected topic of partially defined functions. These
are almost ubiquitous in differential geometry, and yet their general properties are rarely presented
systematically. Various useful properties of partial functions and their composites are given.
(9) Section 10.11 gives some basic properties of Cartesian products of sets. Since the axiom of choice is not
adopted in this book, it is not necessarily guaranteed that non-empty products of non-empty families
of sets are non-empty. However, this is not a problem in practice.
(10) Sections 10.12 and 10.13 define projection maps, slices and lifts for binary and general Cartesian
products of sets. The slices and lifts are generally presented informally as required, but here they are
given names and notations.
(11) Sections 10.14 and 10.15 present two different kinds of direct products of functions. The double-
domain style of direct product is used for direct products of manifolds and other such structures. The
common-domain style of direct product is used for fibre charts for fibre bundles.
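The two styles of direct product can be sketched in Python (the function names are illustrative, not the book's notation): the double-domain product pairs the values of f and g at independent arguments, while the common-domain product pairs their values at the same argument.

```python
# Double-domain direct product: (f x g)(a, b) = (f(a), g(b)).
def double_domain_product(f, g):
    return lambda a, b: (f(a), g(b))

# Common-domain direct product: <f, g>(a) = (f(a), g(a)).
def common_domain_product(f, g):
    return lambda a: (f(a), g(a))

f = lambda x: x + 1
g = lambda x: x * 2
assert double_domain_product(f, g)(3, 4) == (4, 8)
assert common_domain_product(f, g)(3) == (4, 6)
```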
(12) Section 10.16 presents equivalence relations and equivalent kernels, which are often used to define various
kinds of quotient space structures.
(13) Section 10.17 introduces partial Cartesian products and collage spaces, which are used for constructing
spaces by gluing together patches, as is often done in differential geometry.
(14) The indexed set covers in Section 10.18 are an obvious extension of the set covers in Section 8.6.
(15) Section 10.19 presents some useful extensions to the standard notations for sets of functions and map-
rules. For example, f : A → (B → C) means that f is a function with domain A and values which are
functions from B to C. Consistent with this, f : a ↦ (b ↦ g(a, b)) means that f is the function which
maps a to the function which maps b to g(a, b). These notations are often used in this book.
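This is the familiar currying of a two-argument function. A minimal Python sketch (editorial illustration):

```python
# f : A -> (B -> C), written as f : a |-> (b |-> g(a, b)).
def curry(g):
    """Convert a two-argument function g into its curried one-argument form."""
    return lambda a: (lambda b: g(a, b))

g = lambda a, b: a + 10 * b
f = curry(g)          # f(a) is itself a function from B to C
assert f(3)(2) == g(3, 2) == 23
```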
(3) Section 11.3 defines upper and lower bounds, infimum and supremum, and the minimum and maximum
for functions whose range is a partially ordered set.
(4) Section 11.4 defines lattices as a particular kind of partially ordered set. Although lattices have some
applications to inferior and superior limits in Section 35.8, they are not much needed for differential
geometry.
(5) Section 11.5 defines total order, which is very broadly applicable.
(6) Section 11.6 defines well ordered sets, which are required for transfinite induction. According to the
axiom of choice, a well-ordering exists for every set, but this is about as useful as knowing that there's
at least one needle in the haystack when you're not sure which galaxy the haystack is located in. This
doesn't help you to find a needle.
(7) Section 11.7 is on transfinite induction, which has applications in set theory, but not many applications
in differential geometry.
(3) In Theorem 12.3.2, it is shown without the axiom of choice that there exists an enumeration of any subset
of ω by some element of ω ∪ {ω}. This shows that all subsets of ω are countable. Theorem 12.3.13 shows
that there is no injection from ω to an element of ω. This implies that ω is infinite.
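The idea of an explicit, choice-free enumeration can be illustrated for bounded subsets of the natural numbers (an editorial sketch, not the book's set-theoretic construction): listing the elements in increasing order requires no choices at all.

```python
# An explicit, choice-free enumeration of a subset of the natural numbers:
# list its elements below a bound in increasing order.
def enumerate_subset(indicator, bound):
    """Enumerate {n < bound : indicator(n)} in increasing order."""
    return [n for n in range(bound) if indicator(n)]

# The perfect squares below 50, enumerated explicitly.
squares = enumerate_subset(lambda n: int(n ** 0.5) ** 2 == n, 50)
assert squares == [0, 1, 4, 9, 16, 25, 36, 49]
```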
(4) Section 12.4 mentions general ordinal numbers, but only to observe that they have very little relevance
to differential geometry beyond ω + ω.
(5) Section 12.5 presents the so-called von Neumann universe of sets, also known as the cumulative hierarchy
of sets. This vast universe of sets is a kind of standard model for ZF set theory, although the stages
beyond V_{ω+ω} have little relevance to differential geometry.
(6) Table 12.5.2 in Remark 12.5.5 summarises the lower inductive stages of both the ordinal numbers and
the corresponding transfinite cumulative hierarchy of sets. The main purpose of presenting these is to
give some idea of why they are not useful.
well suited to the classification of all possible well-orderings of infinite sets, a topic which has dubious
relevance to applicable mathematics.)
(4) Section 13.3 introduces finite sets.
(5) Section 13.4 defines addition for finite ordinal numbers.
(6) Section 13.5 looks at the basic properties of infinite and countably infinite sets. Various difficulties arise
because (without the axiom of countable choice) it is not possible to guarantee that a set which is not
finite is necessarily equinumerous to ω.
(7) Theorem 13.6.4 shows that, without the axiom of choice, the union of an explicit countably infinite
family of finite sets is countable. The explicitness means that a family of enumerations must be given for
the sets. In practice, such a family is typically known, but in many contexts in measure theory, for
example, such explicit families cannot be guaranteed to exist. A consequence of this kind of issue is
that a large number of basic theorems in measure theory must be modified to make them independent
of the axiom of choice. However, as mentioned in Remark 13.6.8, avoiding the axiom of choice is often
a simple matter of quantifier swapping.
(8) Section 13.7 gives some explicit enumerations of subsets of ω. These are useful in relation to measure
theory in the proofs of Theorems 44.3.3 and 44.4.2.
(9) Section 13.8 is about the frustrating subject of the Dedekind definitions for finite and infinite sets, and
mediate cardinals, whose cardinality is infinite but not comparable with the cardinality of ω. Unfortunately, such
issues arise frequently in topology and measure theory.
(10) To pour oil onto the fire, Section 13.9 gives eight different definitions of finite sets. If the axiom of
choice is adopted, these definitions are all equivalent. This kind of chaos helps to explain why the vast
majority of mathematicians prefer to adopt the axiom of choice, even though AC is in many ways even
more disturbing than the consequences of not adopting it.
(11) Section 13.10 presents some definitions and properties of various kinds of cardinality-constrained power
sets. These are useful for some topics in topology and measure theory.
it is assumed as the basis of most science and engineering, and a substantial proportion of mathematics.
So it is a good idea to understand them well. They are not as simple as they seem.
(2) Section 15.1 introduces the rational numbers in the usual way, as equivalence classes of ratios of integers.
In most situations, when people think they are using real numbers, they are really using rational
numbers, particularly in computer hardware and software contexts.
(3) Section 15.2 considers the cardinality of the rational numbers, and explicit enumerations which prove
that they are countably infinite.
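One well-known explicit enumeration of the positive rationals is the Calkin-Wilf sequence; the book's Section 15.2 may use a different enumeration, but the principle of an explicit, choice-free listing is the same. This editorial sketch generates the sequence exactly, using rational arithmetic.

```python
from fractions import Fraction
from math import floor

def calkin_wilf(count):
    """Yield the first `count` terms of the Calkin-Wilf sequence, which
    enumerates every positive rational number exactly once, via the
    recurrence q' = 1 / (2*floor(q) - q + 1), starting from q = 1."""
    q = Fraction(1)
    for _ in range(count):
        yield q
        q = 1 / (2 * floor(q) - q + 1)

first_six = list(calkin_wilf(6))
assert first_six == [Fraction(1), Fraction(1, 2), Fraction(2),
                     Fraction(1, 3), Fraction(3, 2), Fraction(2, 3)]
# No repetitions over an initial segment, as expected of an enumeration.
assert len(set(calkin_wilf(1000))) == 1000
```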
(4) Section 15.3 discusses a variety of representations of the real numbers, none of which is truly satisfactory
for even modest requirements.
(5) Sections 15.4, 15.5, 15.6, 15.7, 15.8 and 15.9 present the definitions and basic operations and properties
of real numbers using the Dedekind cut representation. This representation was chosen because it has
the most elementary prerequisites. However, even the easiest option is not as easy as it seems! The real
number system is a complete ordered field. But verifying every detail is not a trivial task.
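The lower set of a Dedekind cut can be modelled as a predicate on the rationals. This sketch (an editorial illustration, not the book's formal development) represents the square root of 2 as the cut { q ∈ Q : q < 0 or q² < 2 }, using only rational arithmetic.

```python
from fractions import Fraction

# The lower set of the Dedekind cut for sqrt(2): all rationals q with
# q < 0 or q*q < 2. Membership is decidable with exact rational arithmetic.
def sqrt2_lower(q):
    return q < 0 or q * q < 2

assert sqrt2_lower(Fraction(7, 5))          # 7/5 = 1.4 lies below sqrt(2)
assert not sqrt2_lower(Fraction(3, 2))      # 3/2 = 1.5 lies above sqrt(2)

# Sampling on a grid of hundredths: the cut boundary falls between
# 141/100 and 142/100, and the lower set has no greatest element.
assert all(sqrt2_lower(Fraction(n, 100)) for n in range(-100, 142))
assert not any(sqrt2_lower(Fraction(n, 100)) for n in range(142, 200))
```

Comparison of cuts reduces to set inclusion of their lower sets, and arithmetic on cuts is defined setwise, which is where the non-trivial verification work mentioned above begins.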
(4) Sections 16.5, 16.6 and 16.7 define some familiar real-valued functions which are useful from time to
time, particularly for constructing examples and counterexamples.
(5) Section 16.8 very briefly defines complex numbers, but their use is avoided as much as possible. They
are required only for defining unitary groups, which are relevant to gauge theory.
Chapter 2
General comments on mathematics
[Figure: diagram of mathematics among the neighbouring sciences: physics, chemistry, biology, neuroscience and anthropology.]
The author started writing this book with the intention of putting differential geometry on a firm, reliable
footing by filling in all of the gaps in the standard textbooks. After reading a set theory book by Halmos [307]
in 1975, it had seemed that all mathematics must be reducible to set constructions and properties. But
decades later, after delving more deeply into set theory and logic, that happy illusion faded and crumbled.
It seems to be a completely reasonable idea to seek the meaning of each mathematical concept in terms of
the meanings of its component concepts. But when this procedure is applied recursively, the inevitable result
is either a cyclic definition at some level, or a definition which refers outside of mathematics. Any network
of definitions can only be meaningful if at least one point in the network is externally defined.
The reductionist approach to mathematics is very successful, but cannot be carried to its full conclusion.
Even if all of mathematics may be reduced to set theory and logic, the foundations of set theory and logic
will still require the importation of meaning from extra-mathematical contexts.
The impossibility of a globally hierarchical arrangement of definitions may be due to the network organisation
of the human mind. Whether the real world itself is a strict hierarchy of layered systems, or is rather a
network of systems, is perhaps unknowable since we can only understand the world via human minds.
to promptly identify where further support is required before it topples and sinks into the sand, or if the
deficiencies cannot be remedied, one may give up and recommend the evacuation of the building.
This remark is more than mere idle philosophising. The foundations of mathematics are examined in Part I
of this book because the author found numerous cases where a trace-back of definitions or theorems led to
serious weaknesses in the foundations. Even some fairly intuitively obvious distance-function constructions
for Riemannian manifolds in Chapter 67 lead directly to questions in set theory regarding constructibility
for arcane concepts such as mediate cardinals, which can only be resolved (or at least understood) within
mathematical logic and model theory. Consequently it is now this author's belief that the foundations of
mathematics should not be ignored, especially in applied mathematics.
2.0.3 Remark: Philosophy of mathematics. But this is not the professional philosopher's philosophy.
Chapter 2 was originally titled "Philosophy of mathematics", but the professional philosopher's idea of
philosophy has almost nothing in common with the philosophy in this book. Every profession has its own
philosophical frameworks within which ideas are developed and discussed. Mathematicians in particular
have a need to develop and discuss ideas which arise directly from the day-to-day concerns of the metier itself.
Philosophers have contemplated mathematics for thousands of years, possibly because it is simultaneously
exceedingly abstract and yet extremely applicable. But as Richard Feynman is widely reputed (but not
proved) to have said: Philosophy of science is about as useful to scientists as ornithology is to birds.
The same is largely true of the philosophy of mathematics, except that some philosophers have in fact been
mathematicians investigating philosophy, not philosophers investigating mathematics. Any philosophy in
this book arises directly from the real, inescapable quandaries which are encountered while formulating and
organising practical mathematics, not from the desire to have something to philosophise about.
[Figure: rigorous mathematics depicted as sedimentary layers (set theory and the structures built upon it) resting on a naive substrate: naive logic, naive sets, naive functions, naive order and naive numbers.]
Mathematics is boot-strapped into existence by first assuming socially and biologically acquired naive notions;
then building logical machinery on this wobbly basis; then rigorously redefining the naive notions of sets,
functions and numbers with the logical machinery. Terra firma floats on molten magma. Mathematics and
logic are like two snakes biting each other's tails. They cannot both swallow the other (although at first they
might think they can). The way in which rigorous logic is boot-strapped from naive logic is summarised in
the opening paragraph of Suppes/Hill [346], page 1.
In the study of logic our goal is to be precise and careful. The language of logic is an exact one. Yet
we are going to go about building a vocabulary for this precise language by using our sometimes
confusing everyday language. We need to draw up a set of rules that will be perfectly clear and
definite and free from the vagueness we may find in our natural language. We can use English
sentences to do this, just as we use English to explain the precise rules of a game to someone who
has not played the game. Of course, logic is more than a game. It can help us learn a way of
thinking that is exact and very useful at the same time.
A mathematical education involves many iterations of such redefinition of relatively naive concepts in terms
of relatively rigorous concepts until one's thinking is synchronised with the current mathematics culture.
Philosophers who speculate on the true nature of mathematics possibly make the mistake of assuming a
static picture. They ignore the cyclic and dynamic nature of knowledge, which has no ultimate bedrock
apart from the innate human capabilities in logic, sets, order, geometry and numbers.
At the beginning of the 21st century, it would probably be fair to say that the majority of mathematics
is (or can be) expressed within the framework of Zermelo-Fraenkel or Neumann-Bernays-Gödel set theory
(appealing to axioms of choice when absolutely necessary). ZF and NBG set theory may be expressed within
the framework of first-order language theory, and first-order languages derive their semantics from model
theory. Model theory is so abstract, abstruse and arcane that few mathematicians immerse themselves in it.
But most mathematicians seem to accept the famous metatheorems of Kurt Gödel, Paul Cohen and others,
which are part of model theory. Hence model theory could currently be viewed as the ultimate bedrock of
mathematics. This is convenient because, being abstract, abstruse and arcane, model theory is examined by
very few mathematicians closely enough to ascertain whether it provides a solid ultimate bedrock for their
work. Thus the solidity of their discipline is outsourced. So it is somebody else's problem.
remembered; . . . why should I admit these things if they can't be proved? His brother warned that
unless he could accept them it would be impossible to continue. Russell capitulated provisionally.
In later life, Russell tried, together with Whitehead [350, 351, 352], to provide a solid bedrock for all of
mathematics, but this attempt was only partially successful.
In addition to the semi-arbitrariness of the axioms for set theory, there is also arbitrariness in the choice
of model for the chosen set theory axioms. Such questions, and many other related questions, are the
motivation for thinking long and hard about the bedrock of mathematics.
Although it could be concluded that all problems in the foundations of mathematics could be removed by
drastic measures, such as rejecting the infinity axiom, or the power-set axiom, or both, this would remove
essentially all of modern analysis. The best policy is, all things considered, to manage the metaphysicality of
mathematics, because when the metaphysical is removed from mathematics, almost nothing useful remains.
If only concrete mathematics is accepted, then mathematics loses most of its power. The whole point of
mathematics is to take human beings beyond the concrete. Restricting mathematics to the most concrete
concepts defeats its prime purpose. Without infinities, and infinities of infinities, mathematics could not
progress far beyond the abacus and the ruler-and-compass. But the price to pay to go far beyond directly
perceptible realities is eternal vigilance. Mathematics needs to be anchored in reality to some extent.
Mathematics spans a spectrum of metaphysicality from the intuitively clear elementary arithmetic to bizarre
concepts such as inaccessible cardinals. Situated in between the extremes lie mathematical analysis and
differential geometry. One does want to be able to assert the existence of solutions to problems of many
kinds, but if the existence is of the shadowy axiom-of-choice kind, that won't be of much practical use when
one tries to design approximation algorithms for such solutions, or draw graphs of them.
[Figure: quality of life resting on technology, engineering, science, mathematics, and the foundations of mathematics.]
that technology. Since mathematics has been such an important factor in technological progress in the last
500 years, surely the foundations of mathematics deserve some attention also, since mathematics is the
foundation of science, which is the foundation of modern technology, the principal factor which differentiates
life in the 21st century from life in the 16th century. A large book could easily be filled with a list of the main
contributions of mathematics to science and technology. Unfortunately it is only mathematicians, scientists
and engineers who can appreciate the full extent of the contributions of mathematics to the modern quality
of life.
2.2.2 Remark: Albert Einstein's explanation of the ontological problem for mathematics.
Albert Einstein [400], pages 154-155, wrote the following comments about the ontological problem in a
1934 essay, Das Raum-, Äther- und Feldproblem der Physik, which originally appeared in 1930. He used
the terms "Wesensproblem" and "Wesensfrage" (literally "essence-problem" or "essence-question") for the
"ontological problem" or question.
Es ist die Sicherheit, die uns bei der Mathematik soviel Achtung einflößt. Diese Sicherheit aber ist
durch inhaltliche Leerheit erkauft. Inhalt erlangen die Begriffe erst dadurch, daß sie, wenn auch
noch so mittelbar, mit den Sinneserlebnissen verknüpft sind. Diese Verknüpfung aber kann keine
logische Untersuchung aufdecken; sie kann nur erlebt werden. Und doch bestimmt gerade diese
Verknüpfung den Erkenntniswert der Begriffssysteme.
Beispiel: Ein Archäologe einer späteren Kultur findet ein Lehrbuch der euklidischen Geometrie
ohne Figuren. Er wird herausfinden, wie die Worte Punkt, Gerade, Ebene in den Sätzen gebraucht
sind. Er wird auch erkennen, wie letztere auseinander abgeleitet sind. Er wird sogar selbst neue
Sätze nach den erkannten Regeln aufstellen können. Aber das Bilden der Sätze wird für ihn ein
leeres Wortspiel bleiben, solange er sich unter Punkt, Gerade, Ebene usw. nicht etwas denken
kann. Erst wenn dies der Fall ist, erhält für ihn die Geometrie einen eigentlichen Inhalt. Analog
wird es ihm mit der analytischen Mechanik gehen, überhaupt mit Darstellungen logisch-deduktiver
Wissenschaften.
Was meint dies: sich unter Gerade, Punkt, schneiden usw. etwas denken können? Es bedeutet
das Aufzeigen der sinnlichen Erlebnisinhalte, auf die sich jene Worte beziehen. Dies außerlogische
Problem bildet das Wesensproblem, das der Archäologe nur intuitiv wird lösen können, indem
er seine Erlebnisse durchmustert und nachsieht, ob er da etwas entdecken kann, was jenen
Urworten der Theorie und den für sie aufgestellten Axiomen entspricht. In diesem Sinn allein kann
vernünftigerweise die Frage nach dem Wesen eines begrifflich dargestellten Dinges sinnvoll gestellt
werden.
This may be translated into English as follows.
It is the certainty which so much commands our respect in mathematics. This certainty is purchased,
however, at the price of emptiness of content. Concepts can acquire content only when, no matter
how indirectly, they are connected with sensory experiences. But no logical investigation can
uncover this connection; it can only be experienced. And yet it is precisely this connection which
determines the knowledge-value of the systems of concepts.
Example: An archaeologist from a later culture finds a textbook on Euclidean geometry without
diagrams. He will find out how the words point, line and plane are used in the theorems. He
will also discern how these are derived from each other. He will even be able to put forward new
theorems himself according to the discerned rules. But the formation of theorems will be an empty
word-game for him so long as he cannot have something in mind for a point, line, plane, etc. Until
this is the case, geometry will hold no actual content for him. He will have a similar experience
with analytical mechanics, in fact with descriptions of logical-deductive sciences in general.
What does this "having something in mind" for a line, point, intersection etc. mean? It means
indicating the sensory experience content to which those words relate. This extralogical problem
constitutes the essence-problem which the archaeologist will be able to solve only intuitively, by
sifting through his experiences to see if he can find something which corresponds to the fundamental
terms of the theory and the axioms which are proposed for them. Only in this sense can the question
of the essence of an abstractly described entity be meaningfully posed in a reasonable way.
(For an alternative English-language translation, see Einstein [401], pages 61-62, The problem of space,
ether, and the field in physics, where "Wesensfrage" is translated in the paragraph immediately following
the above quotation as "ontological problem".)
Mathematics cannot be defined in a self-contained way. Some science-fiction writers claim that a dialogue
with extra-terrestrial life-forms will start with lists of prime numbers and gradually progress to discussions
of interstellar space-ship design. But the ancient Greeks didn't even have the same definitions for odd and
even numbers that we have now. (See for example Heath [224], pages 70-74.) So it is difficult to make the
case that prime numbers will be meaningful to every civilisation in the galaxy. Since only one out of millions
of species on Earth has developed language in the last half a billion years (ignoring extinct hominins), it
is not at all certain that even language is universal. The fact that one species out of millions can understand
numbers and differential calculus doesn't imply that they are somehow universally self-evident, independent
of our biology and culture.
Einstein is probably not entirely right in saying that all meaning must be connected to sensory experience
unless one includes internal perception as a kind of sensory experience. Pure mathematical language refers
to activities which are perceived within the human brain, but internal perception originates in the ambient
culture. So it is probably more accurate to say that words must be connected to cultural experience. A
thousand years ago, no one on Earth could have understood Lebesgue integration, no matter how well it was
explained, because the necessary intellectual foundations did not exist in the culture at that time.
One of the main purposes of this book is to identify the meanings of many of the definitions in mathematics
without the need for so much communication by osmosis as an apprentice in the mathematics trade. Anyone
who reads older historical works will inevitably find concepts which make little or no sense in terms of
modern mathematical thought. Often the ontology has been lost. Ancient authors had concepts in their
minds which are no longer generally even thought about in the 21st century. Even in the relatively new
subject of differential geometry, many older works are difficult to make sense of now. Hence it is important
to attempt to explicitly state the meanings of mathematical concepts at this time so that they may be
understood in future centuries which do not have access to our current osmosis through apprenticeship.
(It may often seem that some of the explanations in this book are excessively explicit, but people in future
centuries might not be able to fill the gaps in a way which we now take for granted.)
translation between the languages. To achieve this objective, an ontology must contain representations of all
the objects, classes, relations, attributes and other things which are signified by the languages in question.
In the context of mathematics, an ontology must be able to represent all of the ideas which are signified by
mathematical language. The big question is: What should an ontology for mathematics contain? In other
words, what are the things to which mathematics refers and what are the relations between them?
It is certain that mathematics does exist in the human mind. This mathematics corresponds to observations
of, and interactions with, the physical universe. Those observations and interactions are limited by the
human sensory system and the human cognitive system. We cannot be at all certain that the mathematics
of our physical models is inherent in the observed universe itself. We can only say that our mathematics
is well suited to describing our observations and interactions, which are themselves limited by the channels
through which we make our observations.
The correspondence between models and the universe seems to be good, but we view the universe through
a cloud of statistical variations in all of our measurements. All observations of the physical universe require
statistical inference to discern the noumena underlying the phenomena. This inference may be explicit, with
reference to probabilistic models, or it may be implicit, as in the case of the human sensory system.
the understanding, but according to Plato it can be done by reason, which shows that there is a
rectilinear triangle in heaven, of which geometrical propositions can be affirmed categorically, not
hypothetically.
Written numerals are only ink or graphite smeared onto paper. If billions of people are writing these symbols,
they must mean something. They must refer to something in the world of experience of the people who
write the symbols to communicate with each other.
Since nothing in the physical world corresponds exactly to the idea of a number, and written numbers must
refer to something, there must be a non-physical world where numbers exist. This non-physical world must
be perceived by all humans because otherwise they could not communicate about numbers with each other.
This proves the existence of a non-physical world where all numbers are perfect and eternal. This world
may be referred to as a mathematics heaven. This number-world can be perceived by the minds of human
beings. So the human mind is a sensory organ which can perceive the Platonic ideal universe in the same
way that the eyes see physical objects. (See Plato [407], pages 240-248, for the simile of the cave.)
2.2.7 Remark: Lack of unanimity in perception of the Platonic universe of Forms.
The history of arguments over the last 150 years about the foundations of mathematics, particularly in regard
to intuitionism versus formalism or logicism, casts serious doubt on the claim that everyone perceives the same
mathematical universe. There have even been serious disagreements as to the reality of negative numbers,
zero, irrational numbers, transcendental numbers, complex numbers, and transfinite ordinal numbers.
People born into a monolinguistic culture may have the initial impression that their language is absolute
because everyone agrees on its meaning and usage, but may later discover that language is mostly relative,
i.e. determined by culture. In the same way, Plato might have thought that mathematics is absolute, but
the more one understands of other cultures, the more one perceives ones own culture to be relative.
2.2.8 Remark: Descartes and Hermite supported the Platonic ontology for mathematics.
Descartes and Hermite supported the Platonic Forms view of mathematics. Bell [213], page 457, published
the following comment in 1937.
Hermite's number-mysticism is harmless enough and it is one of those personal things on which
argument is futile. Briefly, Hermite believed that numbers have an existence of their own above
all control by human beings. Mathematicians, he thought, are permitted now and then to catch
glimpses of the superhuman harmonies regulating this ethereal realm of numerical existence, just
as the great geniuses of ethics and morals have sometimes claimed to have visioned the celestial
perfections of the Kingdom of Heaven.
It is probably right to say that no reputable mathematician today who has paid any attention
to what has been done in the past fifty years (especially the last twenty five) in attempting to
understand the nature of mathematics and the processes of mathematical reasoning would agree
with the mystical Hermite. Whether this modern skepticism regarding the other-worldliness of
mathematics is a gain or a loss over Hermites creed must be left to the taste of the reader. What is
now almost universally held by competent judges to be the wrong view of mathematical existence
was so admirably expressed by Descartes in his theory of the eternal triangle that it may be quoted
here as an epitome of Hermite's mystical beliefs.
I imagine a triangle, although perhaps such a figure does not exist and never has existed anywhere
in the world outside my thought. Nevertheless this figure has a certain nature, or form, or
determinate essence which is immutable or eternal, which I have not invented and which in no way
depends on my mind. This is evident from the fact that I can demonstrate various properties of
this triangle, for example that the sum of its three interior angles is equal to two right angles, that
the greatest angle is opposite the greatest side, and so forth. Whether I desire it or not, I recognize
very clearly and convincingly that these properties are in the triangle although I have never thought
about them before, and even if this is the first time I have imagined a triangle. Nevertheless no one
can say that I have invented or imagined them. Transposed to such simple eternal verities as
1 + 2 = 3, 2 + 2 = 4, Descartes' everlasting geometry becomes Hermite's superhuman arithmetic.
2.2.9 Remark: Bertrand Russell abandoned the Platonic ontology for mathematics.
Bertrand Russell once believed the ideal Forms ontology, but later rejected it. Bell [214], page 564, said the
following about Bertrand Russell's change of mind.
In the second edition (1938) of the Principles, he recorded one such change which is of particular
interest to mathematicians. Having recalled the influence of Pythagorean numerology on all subse-
quent philosophy and mathematics, Russell states that when he wrote the Principles, most of it in
1900, he shared Frege's belief in the Platonic reality of numbers, which, in my imagination, peopled
the timeless realm of Being. It was a comforting faith, which I later abandoned with regret.
the basic conceptual mechanisms of the embodied human mind as it has evolved in the real world.
Mathematics is a product of the neural capacities of our brains, the nature of our bodies, our
evolution, our environment, and our long social and cultural history.
The mathematics-in-the-brain ontology implies that every concept in mathematics may be mapped to states
and activities within the brain. This implies strict limitations on what may be called intuitive mathematics.
However, our modern mathematics culture has extended beyond the intuitive. It is still possible to map all
mathematics to brain states and activities, but the capabilities of the brain are stretched by the modern
mathematics education. (That might explain why almost all people find advanced mathematics painful.) So
an idea which is intuitive to a person immersed in advanced mathematics may be impossible to grasp for a
person who lacks that immersion. The necessary structures and functions must be built up in the brain over
time. This viewpoint has consequences for the issues in Remark 2.2.12.
The comment on this meeting by Hans Lewy seems as correct now as it was then. (See Reid [227], page 184;
Van Dalen [229], pages 491-492.)
The feeling of most mathematicians has been informally expressed by Hans Lewy, who as a
Privatdozent was present at Brouwer's talk in Göttingen:
It seems that there are some mathematicians who lack a sense of humor or have an over-swollen
conscience. What Hilbert expressed there seems exactly right to me. If we have to go through so
much trouble as Brouwer says, then nobody will want to be a mathematician any more. After all,
it is a human activity. Until Brouwer can produce a contradiction in classical mathematics, nobody
is going to listen to him.
That is the way, in my opinion, that logic has developed. One has accepted principles until such
time as one notices that they may lead to contradiction and then he has modified them. I think
this is the way it will always be. There may be lots of contradictions hidden somewhere; and as
soon as they appear, all mathematicians will wish to have them eliminated. But until then we will
continue to accept those principles that advance us most speedily.
Both ends of the spectrum lack credibility. Pure formalism apparently accepts as mathematics anything that
can be written down in symbolic logic without disobeying some arbitrary set of rules. But obeying the rules
is only a necessary attribute of mathematics, certainly not sufficient to deliver meaningfulness. By contrast,
intuitionism rejects any mathematics which has insufficient intuitive immediacy. But it is difficult to draw
a clear line so that meaningful intuitive mathematics is on one side and meaningless mystical mathematics
lies on the other. Certainly to get more results cannot be the overriding consideration.
There is a gradation of meaningfulness from the most intuitive concepts, such as finite integers, to clearly
non-intuitive concepts, such as real-number well-orderings or Lebesgue non-measurable sets. As concepts
become more and more metaphysical and lose their connection to reality, one must simply be aware that
one is studying the properties of objects which cannot have any concrete representation. One will never see
a Lebesgue non-measurable set of points in the real world. One will never see a well-ordering of the real
numbers. But one can meaningfully discuss the properties of such fantasy concepts in an axiomatic fashion,
just as one may discuss the properties of a Solar System where the Moon is twice as heavy as it is, or a
universe where the gravitational constant G is twice as high as it is, or a universe which has a thousand
spatial dimensions. Discussing properties of metaphysical concepts is no more harmful than playing chess.
Formalism is harmless as long as one is aware, for example, that it is impossible to write a computer programme
which can print out a well-ordering of all real numbers.
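For reference, the two assertions being weighed in this discussion can be stated in their standard forms (these are standard textbook formulations, not taken from this book):

```latex
% Axiom of choice (AC): every family of non-empty sets admits a choice function.
\forall X \bigl( \varnothing \notin X \implies
    \exists f\colon X \to \textstyle\bigcup X \;\;
    \forall A \in X,\; f(A) \in A \bigr).
% Well-ordering theorem (equivalent to AC over ZF): every set S carries a
% linear order in which every non-empty subset of S has a least element.
% In particular, AC implies that some well-ordering of \mathbb{R} exists,
% although no formula of ZF provably defines a particular one.
```

The last comment is the sense in which such a well-ordering is a "dark" object: its existence follows from the axiom of choice, but no explicit instance can be exhibited.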
The solution to the intuitionism-versus-formalism debate adopted here is to permit mildly metaphysical
concepts, but to always strive to maintain awareness of how far each concept diverges from human intuition.
(In concrete terms, this book is based on standard mathematical logic plus ZF set theory without the axiom
of choice.) The approach adopted here might be thought of as Hilbertian semi-formalism, since Hilbert
apparently thought of formalism as a way to extend mathematics in a logically consistent manner beyond
the bounds of intuition. By proving self-consistency of a set of axioms for mathematics, he had hoped to
at least show that such an extrapolation beyond intuition would be acceptable. When Godel essentially
showed that the axioms for arithmetic could not be proven by finitistic methods to be self-consistent, the
response of later formalists, such as Curry, was to ignore semantics and focus entirely on symbolic languages
as the real stuff of mathematics. (Note, however, that Gentzen proved that arithmetic is complete and
self-consistent, directly contradicting Godels incompleteness metatheorem.) Model theory has brought some
semantics back into mathematical logic. So modern mathematical logic, with model theory, cannot be said
to be semantics-free. However, the ZF models which are used in model theory are largely non-constructible
and no more intuitive than the languages which are interpreted with those models.
The view adopted here, then, is that formalism is a good idea in principle, which facilitates the extension
of mathematics beyond intuition. But the further mathematics is extrapolated, the more dubious one must
be of its relevance and validity. It is generally observed in most disciplines that extrapolation is less reliable
than interpolation. Euclidean geometry, for example, cannot be expected to be meaningful or accurate when
extrapolated to a diameter of a trillion light-years. Every theory is limited in scope.
mathematics ontology. In the former case, numbers, sets and functions refer to mental states and activities.
In the latter case, numbers, sets and functions refer to real-world systems which are being modelled.
Zermelo-Fraenkel set theory and other pure mathematical theories, such as those of geometry, arithmetic and
algebra, are abstractions from the concrete models which are constructed for real-world systems in physics,
chemistry, biology, engineering, economics and other areas. Such mathematical theories provide frameworks
within which models for extra-mathematical systems may be constructed. Thus mathematical theories are
like model-building kits such as are given to children for play or are used by architects or engineers to model
buildings or bridges. For example, the interior of a planet or the structure of a human cell can be modelled
within Zermelo-Fraenkel set theory. Even abstract geometry, arithmetic and algebra can be modelled within
Zermelo-Fraenkel set theory.
Whenever a mathematical theoretical framework is applied to some extra-mathematical purpose, the entities
of the pure mathematical framework acquire meaning through their association with extra-mathematical
objects. Thus in applied mathematics, the answer to the question What do these numbers, sets and
functions refer to? may be answered by following the arrow from the model to the real-world system.
In pure mathematics, one studies mathematical frameworks in themselves, just as one might study a molecule
construction kit or the components which are used in an architects model of a building. One may study the
general properties of the components of such modelling kits even when the components are not currently
being utilised in a particular model of a particular real-world system. In this pure mathematical case, one
cannot determine the meanings of the components by following the arrow from the model to the real-world
system. In this case, one is dealing with a human abstraction from the universe of all possible models of all
systems which can be modelled with the abstract framework.
The distinction between pure and applied mathematics ontologies is not clear-cut. All real-world categories,
such as buildings, bridges, dogs, etc., are human mental constructs. There is no machine which can be
pointed at an animal to say if it is a dog or not a dog. The set of dogs is a human classification of the
world. There may be boundary cases, such as hybrids of dogs, dingos, wolves, foxes, and other dog-related
animals. The situation is even more subjective in regard to breeds of dogs. A physical structure may be
both a building and a bridge, or some kind of hybrid. Is a cave a building if someone lives in it and puts
a door on the front? Is a caravan a building? Is a tree trunk a bridge if it has fallen across a river? Is a
building a building if it hasn't been built yet, or if it has been demolished already? It becomes rapidly clear
that there is a second level of modelling here. The mathematical model points to an abstract world-model,
and the world-model points in some subjective way to the real world. Ultimately there is no real world apart
from what is perceived through observations from which various conceptions of the real world are inferred.
As mentioned in Remark 3.3.1, the word "dog" is given real-world meaning by pointing to a real-world dog.
The abstract number 3, written on paper, does not point to a real-world external object in the same way,
since it refers to something which resides inside human minds. However, when the number 3 is applied
within a model of the number of sheep in a field, it acquires meaning from the application context.
One may conclude from these reflections that there is no unsolvable riddle or philosophical conundrum
blocking the path to an ontology for mathematics. If anyone asks what any mathematical concept refers to,
two answers may be given. In the abstract context, the answer is a concept in one or more human minds.
In the applied context, the answer is some entity which is being modelled. Of course, some mathematics is
so pure that one may have difficulty finding any application to anything resembling the real world, in which
case only one of these answers may be given.
their brain. Learning mathematics requires the steady accumulation and interconnection of idea-structures.
The ability to write answers on paper is not the main objective. The answers will appear on paper if the
idea-structures in the brain are correctly installed, configured and functioning. Whereas the chemist studies
chemicals which are out there in the physical world, mathematics students must not only look inside their
own minds for the objects of their study, but also create those objects as required.
Mathematics has strong similarities to computer programming. Theorems are analogous to library procedures
which are invoked by passing parameters to them. If the parameters correctly match the assumptions of the
theorem, the logic of the theorems proof is executed and the output is an assertion. The proof is analogous
to the body or code of a computer program procedure. However, whereas computer code is executed on
a semiconductor-based computer, mathematical theorems are executed inside human brains.
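The analogy can be sketched in code (a hypothetical illustration, not anything from this book; the function names and the sample theorem are invented): a theorem corresponds to a procedure whose parameters are checked against the hypotheses, and whose body, the "proof", produces the asserted conclusion.

```python
# Hypothetical sketch of the theorem-as-procedure analogy.
# The "theorem": if a divides b and b divides c, then a divides c.

def divides(a: int, b: int) -> bool:
    """Hypothesis check: does a divide b?"""
    return b % a == 0

def transitivity_of_divisibility(a: int, b: int, c: int) -> bool:
    """A 'theorem' as a library procedure: the parameters are checked
    against the hypotheses, then the 'proof' body is executed and the
    output is the asserted conclusion."""
    if not (divides(a, b) and divides(b, c)):
        raise ValueError("hypotheses not satisfied")
    # The 'proof': b = k*a and c = m*b, so c = (m*k)*a.
    k = b // a
    m = c // b
    assert c == (m * k) * a  # the conclusion, verified constructively
    return divides(a, c)

print(transitivity_of_divisibility(3, 6, 24))  # prints: True
```

The point is only that invoking a theorem resembles calling a procedure whose preconditions are its hypotheses; passing arguments that violate the hypotheses is analogous to applying a theorem outside its stated assumptions.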
2.3.2 Remark: Unnameable sets, mystical faith, Ockhams razor and unnecessary entities.
An example of an unnameable set would be a well-ordering of the real numbers. The believers know by
the axiom of choice that such well-orderings exist, but no individual well-ordering of the real numbers can
be named. The existence of such dark objects is a matter of mystical faith, which is therefore a matter of
personal preference. It has been shown, more or less, that the axiom of choice is consistent with the more
constructive axioms of set theory. Therefore it is not possible to disprove the faith of believers. Dark sets
either exist or do not exist in an inaccessible zone where no one can identify individual objects. No light
can ever be shone on this region. It is eternally in total darkness. If one accepts Ockham's razor, one surely
does not wish to multiply entities beyond necessity. Newton [260], page 387, put it very well as follows.
REGULÆ PHILOSOPHANDI
REGULA I.
Causas rerum naturalium non plures admitti debere, quam quæ & veræ sint & earum phænomenis
explicandis sufficiant.
Dicunt utique philosophi: Natura nihil agit frustra, & frustra fit per plura quod fieri potest per
pauciora. Natura enim simplex est & rerum causis superfluis non luxuriat.
This may be translated as follows.
PHILOSOPHISING RULES
RULE I.
One must not admit more causes of natural things than are both true and sufficient to explain their
appearances.
The philosophers say anyway: Nature does nothing without effect, and it is without effect to make
something with more which can be made with less. Indeed, nature is simple and does not luxuriate
in superfluous causes of things.
The intuitionists rejected more of the dark entities in mathematics than the logicists, but some of these
dark entities are of great practical utility, particularly the existence of infinite sets, which are the basis of
limits and all of analysis. Limits are no more harmless than the points at infinity of the artist's rules of
perspective. Points at infinity are convenient, and have a strong intuitive basis. But well-orderings of the
real numbers are useless because one cannot approach them as limits in any way. They simply are totally
inaccessible. They are therefore not a necessity.
2.3.3 Remark: Unnameable numbers and sets may be thought of as dark numbers and dark sets.
The constraints of signal processing place a finite upper bound on the number of different symbols which
can be communicated on electromagnetic channels, in writing or by spoken phonemes. Therefore in a finite
time and space, only a finite amount of information can be communicated, depending on the length of time,
the amount of space, and the nature of the communication medium. Consequently, mathematical texts can
communicate only a finite number of different sets or numbers. The upper bound applies not only to the
set of objects that can be communicated, but also to the set of objects from which the chosen objects are
selected. In other words, one can communicate only a finite number of items from a finite menu.
Since symbols are chosen from a finite set, and only a finite amount of time and space is available for
communication, it follows that only a finite number of different sets and numbers can ever be communicated.
Similarly, only a finite number of different sets and numbers can ever be thought about. Even if one imagines
that an infinite amount of time and space is available, the total number of different sets and numbers that
can be communicated or thought about is limited to countable infinity. Since symbols can be combined to
signify an arbitrarily wide range of sets and numbers, it seems sensible to generously accept a countably
infinite upper bound for the cardinality of the sets and numbers that can be communicated or thought about.
It follows that almost all elements of any uncountably infinite set are incapable of being communicated or
thought about. One could refer to those sets and numbers which cannot be communicated or thought about
as unnameable, uncommunicable, unthinkable, unmentionable, and so forth.
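The countability claim behind this remark can be illustrated with a small sketch (illustrative only, not from the text): enumerating all finite strings over a finite alphabet, shortest first, assigns every possible name a finite index, which is precisely a name map into the natural numbers.

```python
from itertools import count, product

def names(alphabet):
    """Enumerate all finite strings over a finite alphabet, shortest
    first. Every string eventually appears at some finite index, so
    the entire name space is countable."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

gen = names("ab")
print([next(gen) for _ in range(6)])  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Since any real-world communication medium offers only finitely many symbols, every communicable name occurs somewhere in such an enumeration, and an uncountable set must contain elements which never receive a name.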
Unnameable numbers and sets could be described as dark numbers and dark sets by analogy with dark
matter and dark energy. (This is illustrated in Figure 2.3.1.) The supposed existence of dark matter and
dark energy is inferred in aggregate from their effects in the same way that dark numbers and sets may be
constructed in aggregate. Unnameable numbers and sets could also be called "pixie numbers" and "pixie
sets", because we know they are out there, but we can never see them.
[Figure diagram: a name map from the countable name space of names N into the universe of numbers U illuminates the nameable numbers; the remainder of U consists of dark numbers.]
Figure 2.3.1 Nameable numbers and dark numbers
The unnameability issue can be thought of as a lack of names in the name space which can be spanned
by human texts. The name spaces used by human beings are necessarily countable. These names may be
thought of as illuminating a countable subset of elements of any given set. Then if a set is uncountable,
almost all of the set is necessarily dark.
Since most elements of an uncountable set can never be thought of or mentioned, any uncountable set is
metaphysical. One may write down properties which are satisfied by uncountable sets, but this is analogous
to a physics model for a 500-dimensional universe. Whether or not such a model is self-consistent, the
model still refers to a universe which we cannot experience. In fact, such a ridiculous physical model cannot
be absolutely rejected as impossible, but the possibility of an uncountable set of names for things can be
absolutely rejected.
In particular, standard Zermelo-Fraenkel set theory is a world-model for a mathematical world which is
almost entirely unmentionable and unthinkable. The use of ZF set theory entails a certain amount of self-
deception. One pretends that one can construct sets with uncountable cardinality, but in reality these sets
can never be populated. In other words, ZF set theory is a world-model for a fantasy mathematical universe,
not the real mathematical universe of real mathematicians.
This does not imply that ZF set theory must be abandoned. The sets in ZF set theory which can be named
and mentioned and thought about are perfectly valid and usable. It is important to keep in mind, however,
that ZF set theory models a universe which contains a very large proportion of wasted space. But there is
no real problem with this because it costs nothing.
2.3.4 Remark: Arbitrary mathematical foundations need to be anchored.
One of the most powerful capabilities of the human mind is its ability to imagine things which are not
directly experienced. For example, we believe that the food in a refrigerator continues to exist when the
door is closed. This ability to believe in the existence of things which are not directly perceived is an
enormous advantage when the belief turns out to be true. However, modern mathematics and science have
harnessed that ability, and have extrapolated it so that human beings can now believe in total absurdities
when led by reason and logic.
In the sciences, extrapolations and hypotheses are subject to the discipline of evidence and experiment.
In mathematics, the only discipline is the requirement for self-consistency, which is not a very restrictive
discipline. Hence there are concepts in pure mathematics which have an extremely tenuous connection with
practical mathematics. The idea that mathematics is located in the human mind assists in making choices
of foundations for mathematics from amongst the infinity of possible self-consistent foundations. Otherwise
mathematics would be completely arbitrary, as some people do claim.
2.3.5 Remark: Unnameable sets and numbers are incompressible.
Numbers and sets which cannot be referred to individually by any construction, nor by any specific individual
definition, may be described as unmentionable numbers or sets. They could also be described as incompressible
numbers and sets because the ability to reduce an infinite specification to a finite specification is a
kind of compression. (The word incompressible was suggested to the author in a phone call on 2008-3-12
or earlier [437].) For example,
the real number π has an infinite number of digits in its decimal expansion,
but the expression
π = 4 ∑_{n=0}^{∞} (−1)^n / (2n + 1)
is a finite specification for π which uniquely defines it, so
that it can be distinguished from any other specified real number.
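The claim that this finite formula determines the infinite decimal expansion can be checked numerically. The following minimal sketch (illustrative only) computes partial sums of the series, which converge, slowly, to π.

```python
import math

def leibniz_partial_sum(terms: int) -> float:
    """Partial sum of pi = 4 * sum over n >= 0 of (-1)**n / (2*n + 1).
    By the alternating series bound, the error after N terms is at
    most 4 / (2*N + 1), so the finite formula pins down every digit
    of the infinite expansion in the limit."""
    return 4 * sum((-1) ** n / (2 * n + 1) for n in range(terms))

approx = leibniz_partial_sum(100_000)
print(abs(approx - math.pi) < 1e-4)  # True: within the stated error bound
```

The slowness of the convergence does not matter for the point at issue: a reader given only the finite formula can in principle recover arbitrarily many digits of π, which is what distinguishes it from a dark number.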
2.3.6 Remark: The price to pay for the power of mathematics.
Bell [213], pages 521–522, made the following comment about irrational numbers and infinite concepts.
It depends upon the individual mathematician's level of sophistication whether he regards these
difficulties as relevant or of no consequence for the consistent development of mathematics. The
courageous analyst goes boldly ahead, piling one Babel on top of another and trusting that no
outraged god of reason will confound him and all his works, while the critical logician, peering
cynically at the foundations of his brother's imposing skyscraper, makes a rapid mental calculation
predicting the date of collapse. In the meantime all are busy and all seem to be enjoying themselves.
But one conclusion appears to be inescapable: without a consistent theory of the mathematical
infinite there is no theory of irrationals: without a theory of irrationals there is no mathematical
analysis in any form even remotely resembling what we now have; and finally, without analysis
the major part of mathematics, including geometry and most of applied mathematics, as it now
exists would cease to exist.
The most important task confronting mathematicians would therefore seem to be the construction
of a satisfactory theory of the infinite.
Mathematics sometimes seems like an exhausting series of mind-stretches. First one has to stretch one's
ideas about integers to accept enormously large integers. Then one must accept negative numbers. Then
fractional, algebraic and transcendental numbers. After this, one must accept complex numbers. Then there
are real and complex vectors of any dimension, infinite-dimensional vectors and seminormed topological
linear spaces. Then one must accept a dizzying array of transfinite numbers which are more infinite than
infinity. Beyond this are dark numbers and dark sets which exist but can never be written down. If there
were no benefits to this exhausting series of mind-stretches, only crazy people would study such stuff. It
is only the amazing success of applications to science and engineering which justifies the towering edifice of
modern mathematics. Intellectual discomfort is the price to pay for the analytical power of mathematics.
not fall from the sky on golden tablets!
2.4.3 Remark: The logic literature uses the word "model" differently.
The word "model" is widely used in the logic literature in the reverse sense to the meaning in this book.
(See also Remark 2.4.2 on this matter.)
Shoenfield [340], page 18, first defines a structure for an abstract (first-order) language to be a set of
concrete predicates, functions and variables, together with maps between the abstract and concrete spaces,
as outlined in Remark 5.1.4. If such a structure maps true abstract propositions only to true concrete
propositions, it is then called a model. (Shoenfield [340], page 22.)
E. Mendelson [320], page 49, defines an interpretation with the same meaning as a structure, just
mentioned. Then a model is defined as an interpretation with valid maps in the same way as just mentioned.
(See E. Mendelson [320], page 51.) Shoenfield [340], pages 61, 62, 260, seems to define an interpretation as
more or less the same as a model.
Here the abstract language is regarded as the model for the concrete logical system which is being discussed.
It is perhaps understandable that logicians would regard the abstract logic as the real thing and the
concrete system as a mere structure, interpretation or model. But in the scientific context, generally
the abstract description is regarded as a model for the concrete system being studied. It would be accurate
to use the word application for a valid structure or interpretation of a language rather than a model.
2.4.4 Remark: Mathematical logic is an art of modelling, whether the subject matter is concrete or not.
There is a parallel between the history of mathematics and the history of art. In the olden days, paintings
were all representational. In other words, paintings were supposed to be modelled on visual perceptions of
the physical world. But although the original objective of painting was to represent the world as accurately
as possible, or at least convince the viewer that the representation was accurate, the advent of cameras made
this representational role almost redundant. Photography produced much more accurate representations
for a much lower price than any painter could compete with. So the visual arts moved further and further
away from realism towards abstract non-representational art. In non-representational art, the methods and
techniques were retained, but the original objective to accurately model the visual world was discarded. With
modern computer generated imagery, the capability to portray things which are not physically perceived has
increased enormously.
Mathematics was originally supposed to represent the real world, for example in arithmetic (for counting and
measuring), geometry (for land management and three-dimensional design) and astronomy (for navigation
and curiosity). The demands of science required rapid development of the capabilities of mathematics to
represent ever more bizarre conceptions of the physical world. But as in the case of the visual arts, the
methods and techniques of mathematics took on a life of their own. Increasingly during the 19th and 20th
centuries, mathematics took a turn towards the abstract. It was no longer felt necessary for pure mathematics
to represent anything at all. Sometimes fortuitously such non-representational mathematics turned out to
be useful for modelling something in the real world, and this justified further research into theories which
modelled nothing at all.
The fact that a large proportion of logic and mathematics theories no longer represent anything in the
perceived world does not change the fact that the structures of logic and mathematics are indeed theories.
Just as paintings are not necessarily paintings of physical things, but are still paintings, so also the
structures of mathematics are not necessarily theories of physical things, but they are still theories. It follows
from this that mathematics is an art of modelling, not a science of eternal truth. Therefore it is pointless to
ask whether mathematical logic is correct or not. It simply does not matter. Mathematical logic provides
tools, techniques and methods for building and studying mathematical models.
2.4.5 Remark: Theorems don't need to be proved. Nor do they need to be read.
In everyday life, the truth of a proposition is often decided by majority vote or by a kind of adversarial
system. Thus one person may claim that a proposition is true, and if no one can find a counterexample
after many years of concerted and strenuous efforts, it may be assumed that the proposition is thereby
established. After all, if counterexamples are so hard to find, maybe they just don't matter. And besides,
delay in establishing validity has its costs also. The safety of new products could be determined in this way
for example, especially if the new products have some benefits. Thus a certain amount of risk is generally
accepted in the real world.
By contrast, it is rare that propositions are accepted in mathematics without absolute proof, although the
adjective "absolute" can sometimes be somewhat elastic. One way to avoid the necessity of absolutely
proving a proposition is to make it an axiom. Then it is beyond question! Propositions are also sometimes
established as folk theorems, but these are mostly reluctantly accepted, and generally the users of folk
theorems are conscious that such theorems still await proof or possible disproof.
Theorems are also sometimes proved by sketches, typically including comments like "it is easy to see how to
do X" or "by a well-known procedure, we may assert X". Then anyone who does not see how to do it is
exposing themselves as a rank amateur. Sometimes, such proofs can be made strictly correct only by finding
some additional technical conditions which have been omitted because every professional mathematician
would know how to fill in such gaps. Thus there is a certain amount of flexibility and elasticity in even the
mathematical standards for proof of theorems.
In textbooks, it is not unusual to leave most of the theorems unproved so that the student will have something
to do. This is quite usual in articles in academic journals also, partly to save space because of the high cost
of printing in the olden days, but also to avoid insulting the intelligence of expert readers.
So one might fairly ask why the proofs of theorems should be written out at all. Every proposition could
be regarded as an exercise for the reader to prove. Alternatively, each proposition could be thought of as
a challenge to the reader to find counterexamples. If the reader does find a counterexample, that's a boost
of reputation for the reader and a blow to the writer. If a proposition survives many decades without a
successful contradiction, that boosts the reputation of the writer. Most of the time, a false theorem can be
fixed by adding some technical assumption anyway. So there's not much harm done. Thus mathematics
could, in principle, be written entirely in terms of assertions without proofs. The fear of losing reputation
would discourage mathematicians from publishing false assertions.
Even when proofs are provided, it is mostly unnecessary to read them. Proofs are mostly tedious or difficult
to read almost as tedious or difficult as they are to write! Therefore proofs mostly do not need to be
read. A proof is analogous to the detailed implementation of computer software, whereas the statement of a
theorem is analogous to the user interface. One does not need to read a proof unless one suspects a bug or
one wishes to learn the tricks which are used to make the proof work. (Analogously, when people download
application software to their mobile computer gadgets, they rarely insist on reading the source before using
the app, but authors should still provide the source code in case problems arise, and to give confidence
that the software does what is claimed.)
2.4.6 Remark: Lemmas.
A lemma in ancient Greek mathematics appears to have had more than one interpretation, and those
interpretations were not quite the same as the modern meanings. A lemma could have meant a kind of
folk theorem which was used in proofs, but which no one had yet rigorously proved because it was deemed
unnecessary to prove an assertion which is obviously true. Then at some later time, the lemma might have
been given a proof. Heath [224], page 373, wrote the following on this subject.
The word lemma (λῆμμα) simply means "something assumed". Archimedes uses it of what is now
known as the Axiom of Archimedes, the principle assumed by Eudoxus and others in the method
of exhaustion; but it is more commonly used of a subsidiary proposition requiring proof, which,
however, it is convenient to assume in the place where it is wanted in order that the argument
may not be interrupted or unduly lengthened. Such a lemma might be proved in advance, but the
proof was often postponed till the end, the assumption being marked as something to be afterwards
proved by some such words as , as will be proved in due course.
Elsewhere, Heath quotes the following later view of Proclus (412–485 AD). (See Euclid/Heath [195], page 133.)
The term lemma, says Proclus, is often used of any proposition which is assumed for the
construction of something else: thus it is a common remark that a proof has been made out of
such and such lemmas. But the special meaning of lemma in geometry is a proposition requiring
confirmation. For when, in either construction or demonstration, we assume anything which has not
been proved but requires argument, then, because we regard what has been assumed as doubtful
in itself and therefore worthy of investigation, we call it a lemma, differing as it does from the
postulate and the axiom in being matter of demonstration, whereas they are immediately taken for
granted, without demonstration, for the purpose of confirming other things.
In a footnote to this quotation from Proclus, Heath makes the following comment on the relative modernity
of the Proclus concept of a lemma as a theorem whose proof is delayed to a later time to avoid disrupting a
proof. (He mentions the first century ad Greek mathematician Geminus.)
This view of a lemma must be considered as relatively modern. It seems to have had its origin
in an imperfection of method. In the course of a demonstration it was necessary to assume a
proposition which required proof, but the proof of which would, if inserted in the particular place,
break the thread of the demonstration: hence it was necessary either to prove it beforehand as a
preliminary proposition or to postpone it to be proved afterwards ( ). When,
after the time of Geminus, the progress of original discovery in geometry was arrested, geometers
occupied themselves with the study and elucidation of the works of the great mathematicians who
had preceded them. This involved the investigation of propositions explicitly quoted or tacitly
assumed in the great classical treatises; and naturally it was found that several such remained to be
demonstrated, either because the authors had omitted them as being easy enough to be left to the
reader himself to prove, or because books in which they were proved had been lost in the meantime.
Hence arose a class of complementary or auxiliary propositions which were called lemmas.
According to this view, lemmas are gaps which have been found in proofs. Nowadays a lemma often means an
uninteresting, difficult, technical theorem which is only of use to prove a single interesting, useful theorem.
Zorn's lemma, in particular, is a kind of gap-filler, replacing a common, reasonable assumption with a
rigorous proof from some other basis.
If lemmas are theorems whose assertions are obvious, but whose proofs are tedious, then certainly this book
has a lot of lemmas. Such lemmas are needed in great numbers if one tries to fill all the gaps. History
has shown that interesting non-trivial mathematics sometimes arises from the effort to prove the obvious.
Therefore the plethora of lemmas in this book can perhaps be excused.
Chapter 3
Propositional logic
possible to systematically present any one of these three subjects without first presenting the other two.
Logic seems to be the best starting point for several reasons.
(1) In mathematics, logical operations and expressions are required in almost every sentence and in all
definitions and theorems, explicitly or implicitly, formally or informally, expressed in symbolic logic or
in natural language.
(2) Symbolic logic provides a compact, unambiguous language for the communication of mathematical
assertions and definitions, helping to clear the fog of ambiguity and subjectivity which envelops natural
language explanations. (This book uses symbolic logic more than most books about differential geometry.
Definitions and notations vary from decade to decade, from topic to topic, and from author to author,
but symbolic logic makes possible precise, objective comparisons of concepts and arguments.)
(3) Although the presentation of mathematics requires some knowledge of the integers, some understanding
of set membership concepts, and the ability to order and manipulate symbols in sequences of lines on
a two-dimensional surface of some kind, most people studying mathematics have adequate capabilities
in these areas. However, the required sophistication in logic operations and expressions is beyond the
naive level, especially when infinite or infinitesimal concepts of any kind are introduced.
(4) The history of the development of mathematics in the last 350 years has shown that more than naive
sophistication in logic is required to achieve clarity and avoid confusion in modern mathematical analysis.
It is true that one may commence with numbers or sets, and that this is how mathematics is often
introduced at an elementary level, but lack of sophistication in logic is very often a major obstacle to
progress beyond the elementary level. Therefore the best strategy for more advanced mathematics is to
commence by strengthening ones capabilities in logic.
(5) Logic is wired into all brains at a very low level, human and otherwise. This is demonstrated in particular
by the phenomenon of conditioning, whereby all animals (which have a nervous system) may associate
particular stimuli with particular reactions. This is in essence a logical implication. Associations may
be learned by all such animals. So logic is arguably the most fundamental
mathematical skill. (At a low level, brains look like vast pattern-matching machines, often producing
true/false outputs from analogue inputs. See for example Churchland/Sejnowski [393].)
(6) Logic is wired into human brains at a very high level. Ever since the acquisition of language (probably
about 250,000 years ago), humans have been able to give names to individual objects and categories
of objects, to individual locations, regions and paths, and to various relations between objects and
locations (e.g. the lion is behind the tree, or Mkoko is the child of Mbobo). Through language, humans
can make assertions and deny assertions. The linguistic style of logic is arguably the capability which
gives humans the greatest advantage relative to other animals.
Commencing with logic does not imply that all mathematics is based upon a bedrock of logic. Logic is not
so much a part of the subject matter of mathematics, but is rather an essential and predominant part of the
process of thinking in general. Arguably logic is essential for all disciplines and all aspects of human life.
The special requirement for in-depth logic in mathematics arises from the extreme demands placed upon
logical thinking when faced with the infinities and other such semi-metaphysical and metaphysical concepts
which have been at the core of mathematics for the last 350 years. Mathematical logic provides a framework
of language and concepts within which it is easier to explain and understand mathematics efficiently, with
maximum precision and minimum ambiguity. Therefore strengthening one's capabilities first in logic is a
strategic choice to build up one's intellectual reserves in preparation for the ensuing struggle.
was necessitated by evolutionary pressures to increase social group size.
In groups of the size typical of non-human primates (and of vocal-grooming hominids), social
knowledge is acquired by direct, first-hand interaction between individuals. This would not be
possible in the large groups characteristic of modern humans, where cohesion can only be maintained
if individuals are able to exchange information about behaviour and relationships of other group
members. By the later part of the Middle Pleistocene (about 250,000 years ago), groups would
have become so large that language with a significant social information content would have become
essential.
Regarding three possible explanations for the evolutionary pressure to increase human social group size,
Aiello/Dunbar [392], page 191, make the following comment.
The second possibility is that human groups are in fact larger than necessary to provide protection
against predators because they are designed to provide protection against other human groups [. . . . ].
Competition for access to resources might be expected to lead to an evolutionary arms race because
the larger groups would always win. Some evidence to support this suggestion may lie in the fact
that competitive aggression of this kind resulting in intraspecific murder has been noted in only
one other taxon besides humans, namely, chimpanzees.
The need for human social groups to maintain cohesion in competition with other social groups would have
been particularly great in times of ecological stress such as megadrought in eastern Africa during the Late
Pleistocene. (See for example Cohen et alii [429].) Proficiency in a particular language would be a strong
marker of "us" versus "them", with the consequence that major wars between groups for access to resources
would be much more likely than if individuals were not aggregated into groups by their language. (Any
individual who could not speak one's own language would be one of "them", evoking hostility rather than
cooperation.) Thus it seems quite likely that basic propositional logic, which is predicated on the ability to
express ideas in verbal form, arose from the evolutionary benefits of forming and demarcating social groups
for success in genocidal wars during the last 250,000 years. (See also related discussion of the evolution of
language by Foley [394], pages 4378.)
deserves at least some investment of time and effort. Many of the colossal blunders in science, technology
and economics can be attributed to bad logic. So the ability to think logically is surely at least a valuable
tool in the intellectual toolbox. The specific facts and results of mathematical logic are mostly not directly
applicable in science, technology and economics, but the clarity and depth of thinking are directly applicable.
Similarly, advanced mathematics in general is widely applicable in the sense that the mental skills and the
clarity and depth of thinking are applicable.
3.0.5 Remark: Apologa for the assumption of a-priori knowledge of set theory and logic.
It is assumed in Chapters 3, 4, 5 and 6 that the reader is familiar with the standard notations of elementary
set theory and elementary symbolic logic. Since it is not possible to present mathematical logic without
assuming some prior knowledge of the elements of set theory and logic, the circularity of definitions cannot
be avoided. Therefore there is no valid reason to avoid assuming prior knowledge of some useful notations
and basic definitions of set theory and logic, which enormously facilitate the efficient presentation of logic.
Readers who do not have the required prior knowledge of set theory may commence reading further on in
this book (where the required notations and definitions are presented), and return to this chapter later.
If the cyclic dependencies between logic and set theory seem troubling, one could perhaps consider that "left"
and "right" cannot be defined in a non-cyclic way. "Left" is the opposite of "right", and "right" is the opposite
of "left". It is not clear that one concept may be defined independently prior to the other. And yet most
people seem comfortable with the mutual dependency of these concepts. Similarly for "up" and "down",
and "inside" and "outside". Few people complain that they cannot understand "inside" until "outside"
has been defined, and they cannot understand "outside" until "inside" has been defined, and therefore they
cannot understand either!
The purpose of Chapter 3 is more to throw light upon the true nature of mathematical logic than to try to
rigorously boot-strap mathematical logic into existence from minimalist assumptions. The viewpoint adopted
here is that logical thinking is a physical phenomenon which can be modelled mathematically. Logic becomes
thereby a branch of applied mathematics. This viewpoint clearly suffers from circularity, since mathematics
models the logic which is the basis of mathematics. This is inevitable. Reality is like that.
To a great extent, the concept of a set or class is a purely grammatical construction. Roughly speaking,
properties and classes are in one-to-one correspondence with each other. Ignoring for now the paradoxes
which may arise, the unrestricted comprehension rule says that for every predicate φ, there is a set
X = {x; φ(x)} which contains precisely those objects which satisfy φ, and for every set X, there is a
predicate φ such that φ(x) is true if and only if x ∈ X. It is entirely possible to do mathematics without any
sets or classes at all. This was always done in the distant past, and is still possible in modern times. (See for
example Misner/Thorne/Wheeler [258], which apparently uses no sets.) Since a predicate may be regarded
as an attribute or adjective, and a set or class is clearly a noun, the unrestricted comprehension rule simply
associates adjectives (or predicates) with nouns. Thus the adjective "mammalian", or the predicate "is a
mammal", may be converted to the set of mammals, which is a nounal phrase. In this sense, one may say
that prior knowledge of set theory is really only prior knowledge of logic, since sets are linguistic fictions.
Hence only a-priori knowledge of logic is assumed in this presentation of mathematical logic, although the
nounal language of sets and classes is used throughout because it improves the efficiency of communication
and thinking.
(The Greek word "apologa" is used here in its original meaning as "a speech in defence", not as
a regretful acceptance of blame or fault as the modern English word "apology" would imply. Somehow the
word has reversed its meaning over time! See Liddell/Scott [420], page 102; Morwood/Taylor [422], page 44.)
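The correspondence between predicates and sets described above can be illustrated with a small sketch. Python is used here purely as an illustration, with a finite universe and a sample predicate chosen for the example (both are hypothetical, and the finiteness sidesteps the paradoxes mentioned above):

```python
# Comprehension over a finite universe: for a predicate phi, form the
# set X = {x ; phi(x)}; conversely, recover a predicate from a set
# via the membership test.

universe = range(20)  # a small finite "universe" of objects

def phi(x):
    """A sample predicate: 'x is an even number'."""
    return x % 2 == 0

# Predicate -> set: the comprehension {x ; phi(x)}.
X = {x for x in universe if phi(x)}

# Set -> predicate: psi(x) is true if and only if x is in X.
def psi(x):
    return x in X

# The two directions agree on the universe.
assert all(phi(x) == psi(x) for x in universe)
print(sorted(X))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The adjective-to-noun conversion in the text corresponds to the step from the function `phi` to the object `X`.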
3.0.6 Remark: Why propositional logic deserves more pages than predicate logic.
It could be argued that propositional logic is a "toy logic", as in fact some authors do explicitly state. (See for
example Chang/Keisler [298], page 4.) Some authors of logic textbooks do not present a pure propositional
calculus, preferring instead to present only a predicate calculus with propositional calculus incorporated.
As suggested in Remarks 4.1.1 and 5.0.3, the main purpose of predicate calculus is to extend mathematics
from intuitively clear concrete concepts to metaphysically unclear abstract concepts, in particular from the
finite to the uncountably infinite. Since most interesting mathematics involves infinite classes of objects, it
might seem that propositional logic has limited interest. However, as mentioned in Remark 3.3.5, there is
a naming bottleneck in mathematics (which is caused by the finite bandwidth of human communications
and thinking). So in reality, all mathematics is finite in the language layer, even when it is intended to
be infinite in the semantics layer. Therefore the apparently limited scope of propositional logic is not as
limited as it might seem. Another reason for examining the toy propositional logic very closely is to ensure
that the meanings of concepts are clearly understood in a simple system before progressing to the predicate
logic where concepts are often difficult to grasp.
It seems plausible that, when human language first developed about 250,000 years ago, assertions developed
very early, such as: "There's a lion behind the tree." It is possible that imperatives developed even earlier,
such as "Come here.", or "Go away.", or "Give me some of that food." But there is a fine line between
assertions and imperatives. Every request or command contains some informational content, and assertions
mostly carry implicit or explicit requests or commands. For example, "There's a lion!" means much the
same as "Run away from that lion.", and "Give me food." carries the information "I am hungry."
The earliest surviving written stories, such as Gilgamesh in Mesopotamia, include many descriptions of
deception using language. In fact, the first writing is generally believed to have originated in Mesopotamia
about 3500 BC as a means of enforcing contracts. Verbal contracts were often denied by one or both parties.
So it made sense to record contracts in some way. Thus assertions, denials and deceptions have a long
history. It seems plausible that these were part of human life for well over a hundred thousand years.
It seems likely, therefore, that a sharp distinction between truth and falsity of natural-language assertions
must have been part of daily human life since shortly after the advent of spoken language. So, like the
numbers 1, 2 and 3, the logical notions of truth and falsity, and assertions and denials, may be assumed as
naive knowledge. Hence it is unnecessary to define such concepts. In fact, it is probably impossible. The
basic concepts of logic are deeply embedded in human language.
3.1.2 Remark: All propositions are either true or false. A proposition cannot be both true and false.
In a strictly logical model, all propositions are either true or false. This is the definition of a logical model.
(If you disagree with this statement, you are in a universe where your belief is true and my belief is false!)
Mathematical logic is the study of propositions which are either true or false, never both and never neither.
Each proposition shall have one and only one truth value, and the truth value shall be either true or false.
Nothing in the real world is clear-cut: "true" or "false". Perceptions and observations of the real world
are subjective, variable, indirect and error-prone. So one could argue that a strictly logical model cannot
correctly represent such a real world. The real world is also apparently probabilistic according to
quantum physics. We have imperfect knowledge of the values of parameters for even deterministic models,
which suggests that propositions should also have possible truth values such as "unknown" or "maybe" or
"not very likely", or numerical probabilities or confidence levels could be given for all observations. But this
would raise serious questions about how to define such concepts. Inevitably, one would need to define fuzzy
truth values in terms of crisp true/false propositions. So it seems best to first develop two-valued logic, and
then use this to define everything else.
If someone claims that no proposition is either absolutely true or absolutely false, one could ask them
whether they are absolutely certain of this. If they answer "yes", then they have made an assertion which is
absolutely true, which contradicts their claim. So they must answer "no", which leaves open the possibility
that absolutely true or absolutely false propositions do exist. More seriously, it is in the metalanguage that
people are more likely to make categorical claims of truth or falsity, especially when there is no possibility
of a proof. This supports the notion that a core foundation logic should be categorical and clear-cut. Then
application-specific logics may be constructed within such a categorical metalogic. Therefore clear-cut logic
does appear to have an important role as a kind of foundation layer for all other logics.
The logic presented in this book is simple and clear-cut, with no probabilities, no confidence values, and no
ambiguities. More general styles of logic may be defined within a framework of clear-cut true-or-false logic,
but they are outside the scope of this book. One may perhaps draw a parallel with the history of computers,
where binary zero-or-one computers almost completely displaced analogue designs because it was very much
more convenient to convert analogue signals to and from binary formats and do the processing in binary than
to design and build complex analogue computers. (There was a time long ago when digital and analogue
computers were considered to be equally likely contenders for the future of computing.)
3.1.3 Remark: The truth values of propositions exist prior to argumentation.
There is a much more important sense in which all propositions are either true or false. Most of the
mathematical logic literature gives the impression that the truth or falsity of propositions comes via logical
argumentation from axioms and deduction rules. Consequently, if it is not possible to prove a proposition
true or false within a theory, the proposition is effectively assumed to have no truth value, because where
else could a proposition obtain a truth value except through logical argumentation? This kind of thinking
ignores the original purpose of logical argumentation, which was to deduce unknown truth values from known
truth values. Just because one cannot prove that a lion is or isn't behind the tree does not mean that the
proposition has no truth value. It only means that one's argumentation is insufficient to determine the
matter. Similarly, in algebra for example, the inability of the axioms of a field to prove whether a field K is
finite means that the axioms are insufficient, not that the proposition "K is finite" has no truth value. The
proposition does have a truth value when a particular concrete field has been fully defined.
This point may seem pedantic, but it removes any doubt about the excluded middle rule, which underlies
the validity of reductio ad absurdum (RAA) arguments. So it will not be necessary to give any serious
consideration to objections to the RAA method of argument.
3.1.4 Remark: RAA has absurd consequences. Therefore it is wrong. (Spot the logic error!)
Some people in the last hundred years have put the argument that the excluded middle is unacceptable
because if RAA is accepted as a mode of argument, then any proposition can be shown to be false from
any contradiction. But this argument uses RAA itself, by apparently proving that RAA is false from the
absurd consequence that any proposition can be shown to be false by a single contradiction. (The discomfort
with the apparent arbitrariness of RAA is sometimes given the Latin name "ex falso (sequitur) quodlibet",
meaning "from falsity, anything (follows)". See for example Curry [301], pages 264, 285.)
There is a more serious reason to reject the arguments against RAA. If a mathematical theory does yield
a contradiction, by applying inference methods which purportedly give only true conclusions from true
assumptions, then the whole system should be rejected. The arrival at a contradiction means that either
the inference methods contain a fault or the axioms contain a fault, in which case all conclusions arrived at
by this system are thrown into total doubt. In fact, in case of a contradiction, it becomes much more likely
than not that a substantial proportion of false conclusions may already have been (falsely) inferred. As an
analogy, if a computer system yielded the output "3 + 2 = 7", one would either re-boot the computer, rewrite
the software, or consider purchasing different hardware. One would also consider that all computations up
to that time have been thrown into serious doubt, necessitating a discard of all outputs from that system.
Contradictions are regularly arrived at in practical mathematical logic by making tentative assumptions
which one wishes to disprove, while maintaining a belief that all axioms, inference methods and prior theorems
are correct. Then it is the tentative assumption which is rejected if a contradiction arises. It is difficult to
think of another principle of inference which is more important to mathematics than RAA. This principle
should not be accepted because it is indispensable. It should be accepted because it is clearly valid.
RAA may be thought of as an application to logic of the regula falsi method, which was known to the
ancient Egyptians and is similar to the tentative assumption method of Diophantus. (See Cajori [221],
pages 12–13, 61, 91–93, 103.) The regula falsi idea is to make a first guess of the value of a variable, and
then use the resulting error to assist discovery of the true value. In logic, one makes a first guess of the truth
value of a proposition. If this yields a contradiction, this implies that the only other possible truth value
must have been correct. This is a special case of the notion of negative feedback.
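The numerical regula falsi idea can be sketched as follows. This is only an illustrative implementation, and the example function, bracket and tolerances are arbitrary choices for the sketch:

```python
def regula_falsi(f, a, b, tol=1e-12, max_iter=100):
    """Find a root of f in [a, b] by the rule of false position.

    Two 'false' guesses a and b bracket the root; the error of each
    guess is used to choose the next trial value, in the spirit of
    the ancient method described by Cajori.
    """
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "the two guesses must bracket a root"
    c = a
    for _ in range(max_iter):
        # The secant through (a, fa) and (b, fb) crosses zero at c.
        c = b - fb * (b - a) / (fb - fa)
        fc = f(c)
        if abs(fc) < tol:
            break
        if fa * fc < 0:
            b, fb = c, fc   # root lies in [a, c]
        else:
            a, fa = c, fc   # root lies in [c, b]
    return c

# Example: solve x^2 - 2 = 0, i.e. approximate the square root of 2.
root = regula_falsi(lambda x: x * x - 2, 1.0, 2.0)
print(root)  # approximately 1.4142135623...
```

The logical analogue described above replaces the continuum of trial values by the two possible truth values: a guess which yields a contradiction identifies the other value as correct.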
propositions. When one reverses the deductive method to seek these origins, there are inevitably gaps where
first causes should be. There seems to be some kind of instinct to fill such gaps.
3.1.6 Remark: Valid logic must not permit false conclusions to be drawn from true assumptions.
Any logic system which permits false conclusions to be drawn from true assumptions is clearly not valid.
Such a logic system would lack the principal virtue which is expected of it. However, if it is supposed that
truth is decided by argumentation within a logic system, the system would be deciding the truthfulness of
its own outputs, which would make it automatically valid because it is the judge of its own validity. Thus
all logic systems would be valid, and the concept of validity would be vacuous. Therefore to be meaningful,
validity of a logic system must have some external test (by an impartial judge not connected with the case).
This observation reinforces the assertion in Remark 3.1.5. Logical argumentation must be subservient to the
extra-logical truth or falsity of propositions.
3.2.3 Definition [mm]: A truth value map is a map τ : P → {F, T} for some (naive) set P.
The concrete proposition domain of a truth value map τ : P → {F, T} is its domain P.
3.2.5 Remark: Propositions have truth values. This is the essence of propositions.
In Definition 3.2.3, a truth value map is defined on a concrete proposition domain P which may be a set
defined in some kind of set theory, or it may be a naive set. Each concrete proposition p ∈ P has a truth
value τ(p) which may be true (T) or false (F). Some, all, or none of the truth values τ(p) may be known.
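A finite instance of Definition 3.2.3 can be modelled directly, with propositions represented as strings and the truth value map as a dictionary. The propositions below are hypothetical examples chosen for the sketch:

```python
# A finite concrete proposition domain P, modelled as strings, and a
# truth value map tau : P -> {F, T}, modelled as a dictionary.
P = ["Mars is heavier than Venus", "2 + 2 = 4"]

tau = {
    "Mars is heavier than Venus": "F",
    "2 + 2 = 4": "T",
}

# Definition 3.2.3 requires tau to assign exactly one of T or F to
# every proposition in its domain P. (Which values are *known* to an
# observer is a separate matter, modelled by knowledge sets later.)
assert set(tau) == set(P)
assert all(v in ("F", "T") for v in tau.values())
print(tau["2 + 2 = 4"])  # T
```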
3.2.6 Remark: Propositions obtain their meaning from applications outside pure logic.
In applications of mathematical logic in particular contexts, the truth values for concrete propositions are
typically evaluated in terms of a mathematical model of some kind. For example, the truth value of the
proposition "Mars is heavier than Venus" may be evaluated in terms of the mass-parameters of planets in
a Solar System model. (It's false!) Pure logic is an abstraction from applied logic. Abstract logic has the
advantage that one can focus on the purely logical issues without being distracted by particular applications.
Abstract logic can be applied to a wide variety of contexts. A disadvantage of abstraction is that it removes
intuition as a guide. When intuition fails, one must return to concrete applications to seek guidance.
Concrete propositions are located in human minds, in animal minds or in computer minds. The discipline
of mathematical logic may be considered to be a collection of descriptive and predictive methods for human
thinking behaviour, analogous to other behavioural sciences. This is reflected in the title of George Booles
1854 publication An investigation of the laws of thought [293]. Human logical behaviour is in some sense
the art of thinking. Logical behaviour aims to derive conclusions from assumptions, to guess unknown
truth values from known truth values. The discipline of mathematical logic aims to describe, systematise,
prescribe, improve and validate methods of logical argument.
3.2.7 Remark: Logical language refers to concrete propositions, which refer to logic applications.
The lowest level of logic, concrete proposition domains, is the most concrete, but it is also the most difficult
to formalise because it is informal by nature, being formulated typically in natural language in a logic
application context. This is the semantic level for logic, where real logical propositions reside. A logical
language L obtains meaning by pointing to concrete propositions. Similarly, a concrete proposition domain
P obtains meaning by pointing to a mathematical system S or a world-model for some kind of world.
This idea is illustrated in Figure 3.2.1.
[Figure 3.2.1. Relation of logical language to concrete propositions and mathematical systems: a propositional calculus language L (example: "p1 or p2", where p1 = "π < 3" and p2 = "π > 3") points to concrete propositions, which point to a mathematical system S containing the objects π, 2, 3, <, . . .]
The idea here is that abstract proposition symbols such as p1 and p2 refer to concrete propositions such as
"π < 3" and "π > 3", while symbols such as "π" and "3" refer to objects in some kind of mathematical
system, which itself resides in some kind of mind. The details are not important here. The important thing
to know is that despite the progressive (and possibly excessive) levels of abstraction in logic, the symbols do
ultimately mean something.
In case (1), the domain P is a well-defined set in the Zermelo-Fraenkel sense. In the other cases above, P is
the class of all ZF sets.
Examples (6) and (7) relating to the physical world are not so crisply defined as the others.
(6) P = the propositions P(x, t) defined as "the voltage of transistor x is high at time t" for transistors x
in a digital electronic circuit for times t. The truth value τ(P(x, t)) is not usually well defined for all
pairs (x, t) for various reasons, such as settling time, related to latching and sampling. Nevertheless,
propositional calculus can be usefully applied to this system.
(7) P = the propositions P(n, t, z) defined as "the voltage Vn(t) at time t equals z" for locations n in an
electronic circuit, at times t, with logical voltages z equal to 0 or 1. The truth value τ(P(n, t, z)) is
usually not well defined at all times t. Propositions P(n, t, z) may be written as "Vn(t) = z".
Some of these examples of concrete proposition domains and truth value maps are illustrated in Figure 3.2.2.
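The partially-defined truth values in examples (6) and (7) can be sketched as a small function. The threshold voltages below are hypothetical and not taken from any particular logic family:

```python
def logic_value(voltage, v_low=0.8, v_high=2.0):
    """Truth value of the proposition 'the voltage is high'.

    Returns 'T' or 'F' where the proposition is well defined, and
    None in the intermediate band, modelling times at which the
    signal is still settling and the truth value is undefined.
    """
    if voltage >= v_high:
        return "T"
    if voltage <= v_low:
        return "F"
    return None

print(logic_value(3.3))  # T
print(logic_value(0.1))  # F
print(logic_value(1.4))  # None: truth value not well defined
```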
[Figure 3.2.2. Concrete proposition domains and truth value maps.]
to say that all logic is about symbol-strings. In this book, it is claimed that proposition names are always
labels for concrete propositions, while logical expressions always point to knowledge sets.
The distinction between names and objects in logic is as important as distinguishing between points in
manifolds and the coordinates which refer to those points. A hundred years ago, it was common to confuse
points with coordinates. Such confusion is not generally accepted in modern differential geometry. (This
confusion was introduced about 1637 by Descartes [194], and was formalised as an axiom by Cantor and
Dedekind about 1872. See Boyer [216], pages 92, 267; Boyer [215], pages 291–292.)
In this book, proposition names are admittedly often confused with the propositions which they point to.
This mostly does as little harm as substituting the word "dog" for a real dog. When the difference does
matter, the concepts of a "proposition name space" and a "proposition name map" will be invoked. In
Section 3.4, for example, propositions are identified with their names. This is no more harmful than the
standard practice of saying "let p be a point" instead of the more correct "let p be a label for a point". But
in the axiomatisation of logic in Chapter 4, the distinction does matter.
The issue of the distinction between names and objects is called "use versus mention" by Quine [330], page 23.
3.3.2 Definition [mm]: A proposition name map is a map μ : N → P from a (naive) set N to a concrete
proposition domain P.
The proposition name space of a proposition name map μ : N → P is its domain N.
[Figure 3.3.1. A proposition name space N = {A, B, C, . . .} is mapped by a proposition name map to a concrete proposition domain P, which is mapped by a truth value map to the truth value space {F, T}. The composition of the two maps is the abstract truth value map t : N → {F, T}.]
From the formalist perspective, the name space N and the abstract truth value map t : N → {F, T} are
the principal subject of study, while the domain P and the name map μ : N → P are considered to be
components of an interpretation of the language.
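The factorisation of the abstract truth value map through the name map and the truth value map can be sketched concretely. The names, propositions and truth values below are placeholder examples:

```python
# Name space N, concrete proposition domain P, and the two maps.
N = ["A", "B"]
P = ["pi < 3", "pi > 3"]

name_map = {"A": "pi < 3", "B": "pi > 3"}         # N -> P
truth_value_map = {"pi < 3": "F", "pi > 3": "T"}  # P -> {F, T}

def t(name):
    """Abstract truth value map: the composition N -> P -> {F, T}."""
    return truth_value_map[name_map[name]]

print(t("A"), t("B"))  # F T

# Reinterpreting the language means changing name_map (and P) while
# keeping the name space N and the values t(name) under study.
```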
restored respectively. Therefore the proposition name maps in Definition 3.3.2 are themselves variable in
the sense that they are context-dependent. In Figure 3.3.1, then, the map is variable, while the domain P
may be fixed.
Scopes are typically organised in a hierarchy. If a scope S1 is defined within a scope S0, then S0 may be said
to be the "enclosing scope" or "parent scope", while S1 may be referred to as the "enclosed scope", "child
scope" or "local scope". The constant name assignments in any enclosed child scope are generally inherited
from the enclosing parent scope.
A theorem, proof or discussion often commences with language like: "Let P be a proposition." This means
that the variable name P is to be fixed within the following scope. Then P is constant within the enclosed
scope, but is a free variable in the enclosing scope.
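The scope hierarchy described here behaves like lexical scoping in programming languages, which a minimal Python sketch can illustrate (the names and values are arbitrary):

```python
P = "outer value"        # 'P' is assigned in the enclosing (parent) scope

def enclosed_scope():
    # The child scope inherits 'P' from the parent scope, and may
    # introduce names of its own which are local to this scope.
    Q = "local value"    # 'Q' is constant within the enclosed scope only
    return (P, Q)        # 'P' is inherited, 'Q' is local

print(enclosed_scope())  # ('outer value', 'local value')

# 'Q' is not visible in the enclosing scope:
try:
    Q
except NameError:
    print("Q is not defined in the enclosing scope")
```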
The discussion of constants and scopes may seem to be of no real importance. However, it is relevant to
issues such as the axiom of choice. Every time a choice of object is made, an assignment is made in the local
scope between a name and a chosen object. This requires the name map to be modified so that a locally
constant name is assigned to the chosen object.
Shoenfield [340], page 7, describes the context-dependence of proposition name maps by saying that "[...] a
syntactical variable may mean any expression of the language; but its meaning remains fixed throughout
any one context." (His "syntactical variable" is equivalent to a proposition name.)
3.3.5 Remark: The naming bottleneck.
Constant names cannot always be given to all propositions in a domain P. Variable names can in principle be
made to point at any individual proposition, but some propositions may be unmentionable as individuals
because the selective specification of some objects may require an infinite amount of information. (For
example, the digits of a truly random real number in decimal representation almost always cannot be
specified by any finite rule.)
Proposition name spaces in the real world are finite. Countably infinite name spaces are a modest extension
from reality to the slightly metaphysical. Anything larger than this is somewhat fanciful. Some authors
explicitly state that their proposition name spaces are countable. (See for example E. Mendelson [320],
page 29.) Most authors indicate implicitly that their proposition name spaces are countable by using letters
indexed by integers, such as p1 , p2 , p3 , . . ., for example.
3.3.6 Remark: Application-independent logical axioms and logical theorems.
A single proposition name map as in Definition 3.3.2 may be mapped to many different concrete proposition
domains. (This is illustrated in Figure 3.3.2.)
Propositional logic may be formalised abstractly without specifying the concrete proposition domain or the
proposition name map. Then it is intended that the theorems will be applicable to any concrete proposition
domain and any valid proposition name map. The axioms and theorems of such abstract propositional logic
are called logical axioms and logical theorems to distinguish them from axioms and theorems which are
specific to particular applications.
wild. The knowledge-set concept is a model for human knowledge (or beliefs, to be more precise). The
main task in the formulation of logic is to correctly describe valid, self-consistent logical thinking in terms of
general laws for relations between proposition truth values, and also the processes of argumentation which
aim to deduce unknowns from knowns. This task is carried out here in terms of the knowledge-set concept.
Although the words "set" and "subset" are used in Definition 3.4.4, these are only naive sets because by
Definition 3.3.2, a concrete proposition domain is only a naive set. These sets may or may not be sets or
classes in the sense of set theories, but this does not matter because only the most naive properties of sets
are required here. (The term "knowledge space" in Definition 3.4.3 is rarely used in this book.)
3.4.3 Definition [mm]: The knowledge space for a concrete proposition domain P is the set 2^P.
3.4.4 Definition [mm]: A knowledge set for a concrete proposition domain P is a subset of 2^P.
3.4.5 Example: A knowledge set for a concrete proposition domain with 3 propositions.
As an example of a knowledge set, let P = {p1, p2, p3} be a concrete proposition domain. If it is known only
that p1 is false or p2 is true (or both), then the knowledge set is
K = {τ ∈ 2^P ; τ(p1) = F or τ(p2) = T}
  = {(F, F, F), (F, F, T), (F, T, F), (F, T, T), (T, T, F), (T, T, T)},
where the ordered truth-value triples are defined in the obvious way.
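The knowledge set of this example can be computed by brute-force enumeration of the knowledge space 2^P as ordered truth-value triples:

```python
from itertools import product

# Represent each truth value map tau in 2^P as an ordered triple
# (tau(p1), tau(p2), tau(p3)) with entries 'F' or 'T'.
knowledge_space = list(product("FT", repeat=3))   # all 8 maps

# The knowledge: 'p1 is false or p2 is true (or both)'.
K = [tau for tau in knowledge_space
     if tau[0] == "F" or tau[1] == "T"]

print(len(K))  # 6
for tau in K:
    print(tau)
```

The six triples printed are exactly those listed in Example 3.4.5.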
3.4.8 Remark: Knowledge can assert individual propositions or relations between propositions.
Although not all truth values τ(p) for p ∈ P may be directly observable, relations between truth-values may
be known or conjectured. The objective of logical argumentation is to solve for unknown truth-values in
terms of known truth-values, using known (or conjectural) relations between truth-values. In Example 3.4.7,
knowledge can have the form of a relation such as "where there is smoke, there is fire". Knowledge is
stored in human minds not only as the truth values of individual propositions, but also as relations between
truth values of propositions. If no knowledge could ever be expressed as relations between truth values of
individual propositions, there would be no role for propositional calculus.
It could be argued that someone whose knowledge is only the above logical relation has no knowledge at all.
Does this person know whether there is smoke? No. Does this person know whether there is fire? No. There
are only two questions, and the person does not know the answer to either question. Therefore they know
nothing at all. Zero knowledge about smoke plus zero knowledge about fire equals zero total knowledge.
The purpose of Remark 3.4.6 and Example 3.4.7 is to resolve this apparent paradox. The knowledge set
concept in Remark 3.4.1 is an attempt to give a concrete representation for the meaning of the partial
knowledge in compound logical propositions.
means that p is true. The meaning of the statement "p is true" lies in the application context. In other
words, it is undefined and irrelevant from the point of view of pure logic. However, one could regard the
statement "is not excluded" as a concrete proposition to which pure logic is to be applied.
It is important to avoid confusion about the levels of propositions. If one constructs propositions in some way
from concrete propositions, one must not simultaneously regard a particular proposition as both concrete
and higher-level. One may work with multiple levels, but they must be kept distinct. Logic and set theory
are circular enough already without adding unnecessary confusions.
3.5.2 Notation [mm]: Basic unary and binary logical operator symbols.
(i) ¬ means "not" (logical negation).
(ii) ∧ means "and" (logical conjunction).
(iii) ∨ means "or" (logical disjunction).
(iv) ⇒ means "implies" (logical implication).
(v) ⇐ means "is implied by" (reverse logical implication).
(vi) ⇔ means "if and only if" (logical equivalence).
to defining the meaning of logical expressions.
3.5.5 Notation [mm]: Let P be a concrete proposition domain. Then for any propositions p, q ∈ P,
(1) Kp denotes the knowledge set {τ ∈ 2^P ; τ(p) = T},
(2) K¬p denotes the knowledge set {τ ∈ 2^P ; τ(p) = F},
(3) Kp∧q denotes the knowledge set {τ ∈ 2^P ; (τ(p) = T) ∧ (τ(q) = T)},
(4) Kp∨q denotes the knowledge set {τ ∈ 2^P ; (τ(p) = T) ∨ (τ(q) = T)},
(5) Kp⇒q denotes the knowledge set {τ ∈ 2^P ; (τ(p) = T) ⇒ (τ(q) = T)},
(6) Kp⇐q denotes the knowledge set {τ ∈ 2^P ; (τ(p) = T) ⇐ (τ(q) = T)},
(7) Kp⇔q denotes the knowledge set {τ ∈ 2^P ; (τ(p) = T) ⇔ (τ(q) = T)}.
3.5.6 Remark: Construction of knowledge sets from basic logical operations on sets.
The sets Kp and K¬p in Notation 3.5.5 are unambiguous, but the sets in lines (3)–(7) require some explanation because they are specified in terms of the symbols in Notation 3.5.2, which are abbreviations for natural-language terms. These sets may be clarified as follows for any p, q ∈ P, where P is a concrete proposition domain.
(1) Kp = {τ ∈ 2^P ; τ(p) = T}.
(2) K¬p = 2^P \ Kp.
(3) Kp∧q = Kp ∩ Kq.
(4) Kp∨q = Kp ∪ Kq.
(5) Kp⇒q = K¬p ∪ Kq.
(6) Kp⇐q = Kp ∪ K¬q.
(7) Kp⇔q = Kp⇒q ∩ Kp⇐q.
In line (5), Kp⇒q denotes the set {τ ∈ 2^P ; (τ(p) = T) ⇒ (τ(q) = T)}, which means the knowledge that if τ(p) = T, then τ(q) = T. In other words, if τ(p) = T, then the possibility τ(q) = F can be excluded. In other words, one excludes the joint occurrence of τ(p) = T and τ(q) = F. Therefore the knowledge set Kp⇒q equals 2^P \ (Kp ∩ K¬q). Hence Kp⇒q = K¬p ∪ Kq. Lines (6) and (7) follow similarly.
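The set identities in lines (5)–(7) can be checked mechanically by enumerating truth-value maps. The following sketch (not from the book; the three-element domain and the proposition names are invented for illustration) verifies them in Python.

```python
from itertools import product

# Hypothetical concrete proposition domain P.
P = ("smoke", "fire", "rain")

# Each truth-value map tau : P -> {F, T} is modelled as a tuple of
# booleans, so ALL_MAPS plays the role of 2^P.
ALL_MAPS = set(product([False, True], repeat=len(P)))

def K(pred):
    """Knowledge set: the truth-value maps in 2^P which satisfy pred."""
    return {tau for tau in ALL_MAPS if pred(dict(zip(P, tau)))}

p, q = "smoke", "fire"
Kp   = K(lambda t: t[p])
Knp  = K(lambda t: not t[p])
Kq   = K(lambda t: t[q])
Knq  = K(lambda t: not t[q])
Kimp = K(lambda t: (not t[p]) or t[q])   # p => q: excludes tau(p)=T, tau(q)=F
Krev = K(lambda t: t[p] or (not t[q]))   # p <= q
Kiff = K(lambda t: t[p] == t[q])         # p <=> q

assert Kimp == Knp | Kq      # line (5)
assert Krev == Kp | Knq      # line (6)
assert Kiff == Kimp & Krev   # line (7)
```

The truth values of the third proposition vary freely in every one of these sets, which illustrates how a constraint on two propositions leaves the rest of the domain arbitrary.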
Using basic set operations to define basic logic concepts and then prove relations between them is clearly
cheating because basic set operations are first introduced systematically in Chapter 8. This circularity is
not as serious as it may at first seem. The purpose of introducing the knowledge set concept is only to
give meaning to logical expressions. It is possible to present formal logic without giving any meaning to
logical expressions at all. Some people think that meaningless logic (i.e. logic without semantics) is best.
Some people think it is inevitable. However, even the most semantics-free presentations of logic are inspired
directly by the underlying meaning of the formalism. Making the intended meaning explicit helps to avoid
the development of useless theories which are exercises in pure formalism.
The sets in Notation 3.5.5 may be written out explicitly if P has only two elements. Let P = {p, q}. Denote each truth value map τ : P → {F, T} by the corresponding pair (τ(p), τ(q)). Then the sets (1)–(7) may be written explicitly as follows.
(1) Kp = {(T, F), (T, T)}.
(2) K¬p = {(F, F), (F, T)}.
(3) Kp∧q = {(T, T)}.
(4) Kp∨q = {(F, T), (T, F), (T, T)}.
(5) Kp⇒q = {(F, F), (F, T), (T, T)}.
(6) Kp⇐q = {(F, F), (T, F), (T, T)}.
(7) Kp⇔q = {(F, F), (T, T)}.
The fact that the knowledge sets may be written out as explicit lists implies that there are no real difficulties
with circularity of definitions here. The concept of a list may be considered to be assumed naive knowledge.
The use of the set brackets { and } is actually superfluous. Thus any unary or binary logical expression
may be interpreted as a specific constraint on the possible combinations of truth values of one or two specified
propositions. Such a constraint may be specified as a list of possible truth-value combinations, while the truth
values of all other propositions in the concrete proposition domain may vary arbitrarily. The truth-value
combination list is finite, no matter how infinite the proposition domain is.
3.5.7 Remark: Escaping the circularity of interpretation between logical and set operations.
The circularity of interpretation between logical operations and set operations may seem inescapable within
a framework of symbols printed on paper. The circularity can be escaped by going outside the written page,
as one does in the case of defining a dog or any other object which is referred to in text. According to
Lakoff/Núñez [395], pages 39–45, 121–152, the objects and operations for both logic and sets come from
real-world experience of containers (for sets) and locations (for states or predicates of objects), as in Venn
diagrams. There may be some truth in this, but for most people, logical operations are learned as operational
procedures. For example, the meaning of the intersection A ∩ B of two sets A and B (for some concrete examples of A and B, such as "the spider has a black body" and "the spider has a red stripe on its back"), may be learned somewhat as in the following kind of operational definition scheme. (This example may be thought of as defining either the set expression A ∩ B or the logical expression (x ∈ A) ∧ (x ∈ B).)
(1) Consider an object x. (Either think about it, write it down, hold it in your hand, look at it from a
distance, or talk about it.)
(2) If you have previously determined whether x is in A ∩ B, and x is in the list of things which are in A ∩ B, or in the list of things which are not in A ∩ B, that's the answer. End of procedure.
(3) Consider whether x is in A. If x is in A, go to step (4). If x is not in A, add x to the list of things which are not in A ∩ B. That's the answer. End of procedure.
(4) Consider whether x is in B. If x is in B, add x to the list of things which are in A ∩ B. That's the answer. End of procedure. Otherwise, if x is not in B, add x to the list of things which are not in A ∩ B. That's the answer. End of procedure.
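The sequential procedure above can be sketched in a few lines of Python. The membership criteria for A and B are hypothetical examples, and the memo dictionary plays the role of the lists of already-classified things.

```python
memo = {}  # x -> True if already classified as in A-and-B, False if not

def in_A(x):
    # Hypothetical easier first test.
    return x % 2 == 0

def in_B(x):
    # Hypothetical harder second test.
    return x % 3 == 0

def in_A_and_B(x):
    if x in memo:               # step (2): previously determined
        return memo[x]
    if not in_A(x):             # step (3): first test fails, second is skipped
        memo[x] = False
        return False
    memo[x] = in_B(x)           # step (4): second test decides
    return memo[x]

assert in_A_and_B(6) is True    # passes both tests
assert in_A_and_B(4) is False   # in A but not in B
assert in_A_and_B(3) is False   # fails the first test; the second never runs
```

Note that the second test is evaluated only when the first test succeeds, which mirrors the "short-circuit" evaluation of conjunctions in most programming languages.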
This kind of procedure is somewhat conjectural and introspective, but the important point is that in operational terms, two tests must be applied to determine whether x is in A ∩ B. In principle, one may be able to
perform two tests concurrently, but in practice, generally one test will be more difficult than the other, and a
person will know the answer to one of the tests before knowing the answer to the other test. The conclusion
about the logical conjunction of the two propositions may be drawn in step (3) or in step (4), depending on
the outcome of the first test. Typically the easier test will be performed first. One presumably retains some
kind of memory of the outcomes of previous tests. So memory must play a role. If the answer to one test is
known already, only one test remains to be performed, or it may be unnecessary to perform any test at all
(if the first test failed).
In early life, the meaning of the word "and" is generally learned as such a sequential procedure. Often a child will think that the criteria for some outcome have been met, but will be told that not only the criterion which has been clearly met, but also one more criterion, must be met before the outcome is achieved. E.g. "I've finished eating the potatoes. Can I have dessert now?" Response: "No, you must eat all of the potatoes and all of the carrots." Result: The child learns that two criteria must be met for the word "and". In practice, the criteria are generally met in a temporal sequence. The operational procedure for the word "or" is similar. (The English word "or" comes from the same source as the word "other", meaning that if the first assertion is false, there is another assertion which will be true in that case, which suggests once again a temporal sequence of condition testing.)
Basic logic operations such as "and" and "or" are learned at a very early age as concrete operational procedures to be performed to determine whether some combination of logical criteria has been met. Basic set operations such as union and intersection likewise correspond to concrete operational procedures. The difference between these set operations and the corresponding logical operations is more linguistic than substantial. Whether one reports that "x is a rabbit" or "x is in the set of rabbits" could be regarded as a question of grammar or linguistic style.
It is perhaps of some value to mention that in digital computers, basic boolean operations on boolean variables
may be carried out in a more-or-less concurrent manner by the settling of certain kinds of electronic circuits
which are designed to combine the states of the bits in pairs of registers. However, a large amount of the logic
in computer programs is of the sequential kind, especially if the information for individual tests becomes
available at different times. Sometimes, the amount of work for two boolean tests is very different. So it is
convenient to perform the easy test first, and to perform the second (more costly) test only if the outcome is still unknown. Such sequential testing corresponds more closely to real-life human logic.
The static kinds of logical operations which are defined in mathematical logic have their ontological source
in human sequential logical operations, not static combinations of truth values. However, this difference can
be removed by considering that mathematical logic applies to the outcomes of logical operations, not to the
procedures which are performed to yield such outcomes.
3.5.9 Definition [mm]: Let P be a concrete proposition domain. Then for any propositions p, q ∈ P, the following are interpretations of logical expressions in terms of concrete propositions.
(1) p signifies the knowledge set Kp = {τ ∈ 2^P ; τ(p) = T}.
(2) ¬p signifies the knowledge set K¬p = 2^P \ Kp = {τ ∈ 2^P ; τ(p) = F}.
(3) p ∧ q signifies the knowledge set Kp∧q = Kp ∩ Kq = {τ ∈ 2^P ; (τ(p) = T) ∧ (τ(q) = T)}.
(4) p ∨ q signifies the knowledge set Kp∨q = Kp ∪ Kq = {τ ∈ 2^P ; (τ(p) = T) ∨ (τ(q) = T)}.
(5) p ⇒ q signifies the knowledge set Kp⇒q = K¬p ∪ Kq = {τ ∈ 2^P ; (τ(p) = T) ⇒ (τ(q) = T)}.
(6) p ⇐ q signifies the knowledge set Kp⇐q = Kp ∪ K¬q = {τ ∈ 2^P ; (τ(p) = T) ⇐ (τ(q) = T)}.
(7) p ⇔ q signifies the knowledge set Kp⇔q = Kp⇒q ∩ Kp⇐q = {τ ∈ 2^P ; (τ(p) = T) ⇔ (τ(q) = T)}.
[Diagram: on the language side, the proposition name space N (with names p, q) and the space W of logical expressions; the proposition name map carries names to the concrete propositions, which are embedded as atomic knowledge sets; the interpretation gives the semantics of logical expressions as knowledge sets.]
Figure 3.5.1 Proposition names, concrete propositions, logical expressions and knowledge sets
The proposition name map μ in Figure 3.5.1 is variable. Therefore the interpretation map is also variable. To interpret a logical expression, one carries out the following steps.
(1) Parse the logical expression (e.g. p ∧ q) to extract the proposition names in N (e.g. p and q).
(2) Follow the map μ : N → P to determine the named concrete propositions in P (e.g. μ(p) and μ(q)).
(3) Follow the embedding from P to the power set of 2^P to determine the atomic knowledge sets (e.g. Kμ(p) and Kμ(q)).
(4) Apply the set-constructions to the atomic knowledge sets in accordance with the structure of the logical expression to obtain the corresponding compound knowledge set (e.g. Kμ(p) ∩ Kμ(q)).
3.5.12 Remark: Logical operators acting on propositions versus acting on logical expressions.
The logical operations which act on proposition names to form logical expressions in Definition 3.5.9 must
be distinguished from logical operations which act on general logical expressions in Section 3.8.
In Definition 3.5.9, a logical operator such as implication (⇒) is combined with proposition names p, q ∈ N to form logical expressions. Let W denote the space of logical expressions. Then the implication operator ⇒ is effectively a map from N × N to W. By contrast, the logical operator ⇒ in Section 3.8 is effectively a map from W × W to W, mapping each pair (ω1, ω2) ∈ W × W to a logical expression of the form ω1 ⇒ ω2.
3.5.16 Definition [mm]: Let P be a concrete proposition domain. Then the following are interpretations of logical expressions in terms of concrete propositions.
(1) ⊥ signifies the knowledge set ∅.
(2) ⊤ signifies the knowledge set 2^P.
Although the logical operator symbols ⊥ and ⊤ may appear in logical expressions on their own, they may also
be used in combination with other logic symbols, as mentioned in Remark 3.12.6.
3.6.2 Definition [mm]: A truth function with n parameters, or n-ary truth function, θ, for a non-negative integer n, is a function θ : {F, T}^n → {F, T}.
3.6.3 Remark: Justification for the range-set {F, T} for truth functions.
It is not entirely obvious that the set {F, T} is the best range-set for truth functions in Definition 3.6.2.
One advantage of this range-set is that pairs of propositions are effectively combined to produce compound
propositions. To see this, consider the following equalities.
Kp = {τ ∈ 2^P ; τ(p) = T} = {τ ∈ 2^P ; fp(τ) = T},
K^θ_{p,q} = {τ ∈ 2^P ; θ(τ(p), τ(q)) = T} = {τ ∈ 2^P ; f^θ_{p,q}(τ) = T},
where fp(τ) = τ(p) and f^θ_{p,q}(τ) = θ(τ(p), τ(q)) for τ ∈ 2^P.
3.6.5 Remark: An ad-hoc notation for binary truth functions.
For any binary truth function θ : {F, T}^2 → {F, T}, let θ be denoted temporarily by the sequence of four values θ(F, F) θ(F, T) θ(T, F) θ(T, T) of θ. Thus for example, "FFTF" denotes the binary truth function θ which satisfies θ(A, B) = T if and only if (A, B) = (T, F).
The five kinds of binary logical expressions p ⊗ q in Definition 3.5.9 have the following truth functions.
truth     logical
function  expression
FFFT      p ∧ q
FTTT      p ∨ q
TFFT      p ⇔ q
TFTT      p ⇐ q
TTFT      p ⇒ q
This shows the meaning of truth functions more clearly. Compound logical expressions signify exclusions
of some combinations of truth values. As long as the excluded truth-value combinations do not occur, the
logical expressions are not invalidated.
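The four-letter value sequences of Remark 3.6.5 can be decoded mechanically. A minimal Python sketch (the function name truth_function is an invented label, not the book's notation):

```python
def truth_function(code):
    """Decode the value sequence theta(F,F) theta(F,T) theta(T,F) theta(T,T)
    into a binary truth function, with False for F and True for T."""
    table = {
        (False, False): code[0] == "T",
        (False, True):  code[1] == "T",
        (True,  False): code[2] == "T",
        (True,  True):  code[3] == "T",
    }
    return lambda a, b: table[(a, b)]

implies = truth_function("TTFT")        # the code listed above for p => q
assert implies(True, False) is False    # the only invalidating combination
assert all(implies(a, b)
           for (a, b) in [(False, False), (False, True), (True, True)])
```

The assertion exhibits exactly the exclusion described here: only the combination (T, F) invalidates the implication.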
In terms of the smoke-and-fire example, the proposition "if there is smoke, then there is fire" cannot be invalidated if there is no smoke, whether there is fire or not. Likewise, the proposition cannot be invalidated if there is fire, whether there is smoke or not. The only combination which invalidates the proposition is a situation where there is smoke without fire. If this situation occurs, the proposition is invalidated. Conversely, if the proposition is valid, then a smoke-without-fire situation is excluded. Thus it is not correct to say that a compound proposition such as p ⇒ q is "true" or "false". It is correct to say that such a proposition is excluded or not excluded. In other words, it is invalidated or not invalidated.
3.6.7 Remark: Additional binary logical operations.
Since there are 2^(2^2) = 16 theoretically possible binary truth functions in Remark 3.6.5, one naturally asks
whether some of the remaining 11 binary truth functions might be useful. Three additional useful binary
logical operations, all of which are symmetric, are introduced in Definition 3.6.10.
The logical operator symbols introduced in Notation 3.6.8 have multiple alternative names, as indicated.
In electronics engineering, "nand", "nor" and "xor" are generally capitalised as "NAND", "NOR" and "XOR" respectively. In this book, both upper and lower case versions are used.
3.6.8 Notation [mm]: Additional binary logical operator symbols.
(i) ⊼ means "alternative denial" (NAND, Sheffer stroke).
(ii) ⊽ means "joint denial" (NOR, Peirce arrow, Quine dagger).
(iii) △ means "exclusive or" (XOR, exclusive disjunction).
3.6.9 Remark: History of the Sheffer stroke operator.
The discovery of the properties of the Sheffer stroke operator is attributed by Hilbert/Bernays [309], page 48
footnote, to an 1880 manuscript by Charles Sanders Peirce, although the discovery only became generally
known much later through a 1913 paper by Henry Maurice Sheffer.
3.6.10 Definition [mm]: Let P be a concrete proposition domain. Then for any propositions p, q ∈ P, the following are interpretations of logical expressions in terms of concrete propositions.
(1) p ⊼ q signifies the knowledge set 2^P \ (Kp ∩ Kq).
(2) p ⊽ q signifies the knowledge set 2^P \ (Kp ∪ Kq).
(3) p △ q signifies the knowledge set (Kp \ Kq) ∪ (Kq \ Kp).
3.6.11 Remark: Truth functions for the additional binary logical operators.
The alternative denial operator ⊼ has the truth function θ : {F, T}^2 → {F, T} satisfying θ(p, q) = F for (p, q) = (T, T), and otherwise θ(p, q) = T.
The joint denial operator ⊽ has the truth function θ : {F, T}^2 → {F, T} satisfying θ(p, q) = T for (p, q) = (F, F), and otherwise θ(p, q) = F.
The exclusive-or operator △ has the truth function θ : {F, T}^2 → {F, T} satisfying θ(p, q) = T for (p, q) = (T, F) or (p, q) = (F, T), and otherwise θ(p, q) = F.
3.6.12 Remark: Truth functions for eight binary logical operators.
The following table combines the binary logical operators in Definition 3.6.10 with the table in Remark 3.6.5.
truth     logical
function  expression
FFFT      p ∧ q
FTTF      p △ q
FTTT      p ∨ q
TFFF      p ⊽ q
TFFT      p ⇔ q
TFTT      p ⇐ q
TTFT      p ⇒ q
TTTF      p ⊼ q
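Each row of this table can be verified by computing the operator's value sequence from its truth function. A sketch in Python (the English operator spellings in the comments are ad-hoc labels for the symbols ∧, △, ∨, ⊽, ⇔, ⇐, ⇒, ⊼):

```python
OPS = {  # four-letter code -> binary truth function
    "FFFT": lambda p, q: p and q,          # and
    "FTTF": lambda p, q: p != q,           # xor
    "FTTT": lambda p, q: p or q,           # or
    "TFFF": lambda p, q: not (p or q),     # nor
    "TFFT": lambda p, q: p == q,           # if and only if
    "TFTT": lambda p, q: p or not q,       # is implied by
    "TTFT": lambda p, q: (not p) or q,     # implies
    "TTTF": lambda p, q: not (p and q),    # nand
}

# Argument order (F,F), (F,T), (T,F), (T,T), as in Remark 3.6.5.
ORDER = [(False, False), (False, True), (True, False), (True, True)]

for code, op in OPS.items():
    computed = "".join("T" if op(a, b) else "F" for (a, b) in ORDER)
    assert computed == code
```

Running the loop confirms that each operator reproduces its four-letter code.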
3.6.14 Remark: A survey of symbols in the literature for some basic logical operations.
The logical operation symbols in Notations 3.5.2 and 3.6.8 are by no means universal. Table 3.6.1 compares
some logical operation symbols which are found in the literature.
The NOT-symbols in Table 3.6.1 are prefixes, except for the superscript dash (Graves [83]; Eves [303]) and the over-line notation (Weyl [152]; Hilbert/Bernays [309]; Hilbert/Ackermann [308]; Quine [331]; EDM [108]; EDM2 [109]). In these cases, the empty-square symbol □ indicates where the logical expression would be. The reverse implication operator ⇐ is omitted from this table because it is rarely used. When it is used, it is usually denoted as the mirror image of the forward implication symbol.
The popular choice of the Sheffer stroke symbol "|" for the logical NAND operator is unfortunate. It clashes with usage in number theory and probability theory, and with the vertical bars |...| of the modulus, norm and absolute value functions. The less popular vertical arrow notation ↑ has many advantages. Luckily this operator is rarely needed.
[Signum C significat est consequentia; ita b C a legitur b est consequentia propositionis a. Sed hoc signo nunquam utimur].
Signum ⊃ significat deducitur; ita a ⊃ b idem significat quod b C a.
This may be translated as follows.
[The symbol C means "is a consequence"; thus b C a is read as: b is a consequence of proposition a. But this symbol is never used].
The symbol ⊃ means "is deduced"; thus a ⊃ b means the same as b C a.
According to Quine [331], page 26, Peano's notation ⊃ for logical implication was revived by Peano from an earlier usage of the same symbol in 1816 by Gergonne [366]. The same origin is identified by Cajori [222], volume II, page 288, who wrote that Gergonne's C stood for "contains" and his 180°-rotated C stood for "is contained in". Peano [325], page XI, also uses these symbols for class inclusion relations as follows.
Signum ⊃ significat continetur. Ita a ⊃ b significat classis a continetur in classi b. [. . . ]
[Formula b C a significare potest classis b continet classem a; at signo C non utimur].
This may be translated as follows.
The symbol ⊃ means "is contained". Thus a ⊃ b means class a is contained in class b. [. . . ]
[Formula b C a could mean class b contains class a; but the symbol C is not used].
Unfortunately this is the reverse of the modern convention for the meaning of ⊃. But it does help to explain why the logical operator ⊃ which is often seen in the logic literature is apparently around the wrong way. In fact, the omitted text in the above quotation is as follows in modern notation.
∀a, b ∈ K, (a ⊃ b) ⇔ ∀x, (x ∈ a ⇒ x ∈ b),
where K denotes the class of all classes. This makes the logic and set theory symbols match, but it contradicts the modern convention where the smaller side of the symbol corresponds to the smaller set.
Table 3.6.1: Logical operation symbols in the literature, comparing the symbols for "not", "and", "or", "implies", "if and only if", NAND, NOR and XOR used by: 1889 Peano [325]; 1910 Whitehead/Russell [350]; 1918 Weyl [152]; 1919 Russell [339]; 1928 Hilbert/Ackermann [308]; 1931 Gödel [306]; 1934 Hilbert/Bernays [309]; 1940 Quine [330]; 1941 Tarski [348]; 1944 Church [299]; 1946 Graves [83]; 1950 Quine [331]; 1950 Rosenbloom [336]; 1952 Kleene [315]; 1952 Wilder [353]; 1953 Rosser [337]; 1953 Tarski [349]; 1954 Carnap [296]; 1955 Allendoerfer/Oakley [47]; 1958 Bernays [292]; 1958 Eves [303]; 1958 Nagel/Newman [323]; 1960 Körner [403]; 1960 Suppes [345]; 1963 Curry [301]; 1963 Quine [332]; 1963 Stoll [343]; 1964 E. Mendelson [320]; 1964 Suppes/Hill [346]; 1965 KEM [99]; 1965 Lemmon [317]; 1966 Cohen [300]; 1967 Kleene [316]; 1967 Margaris [319]; 1967 Shoenfield [340]; 1968 Smullyan [341]; 1969 Bell/Slomson [290]; 1969 Robbin [334]; 1970 Quine [333]; 1971 Hirst/Rhodes [311]; 1973 Chang/Keisler [298]; 1973 Jech [314]; 1974 Reinhardt/Soeder [120]; 1982 Moore [321]; 1990 Roitman [335]; 1993 Ben-Ari [291]; 1993 EDM2 [109]; 1996 Smullyan/Fitting [342]; 1998 Howard/Rubin [312]; 2004 Huth/Ryan [313]; 2004 Szekeres [266]; and this book (¬, ∧, ∨, ⇒, ⇔, ⊼, ⊽, △).
truth     logical
function  expression  type          sub-type   equivalents        atoms
FFFF      ⊥           always false                                0
FFFT      A ∧ B       conjunction              A, B               2
FFTF      A ∧ ¬B      conjunction              A, ¬B              2
FFTT      A           atomic                                      1
FTFF      ¬A ∧ B      conjunction              ¬A, B              2
FTFT      B           atomic                                      1
FTTF      ¬A ⇔ B      implication   exclusive  A △ B, A ⇔ ¬B
FTTT      ¬A ⇒ B      implication   inclusive  A ∨ B, ¬B ⇒ A
TFFF      ¬A ∧ ¬B     conjunction              ¬A, ¬B; A ⊽ B      2
TFFT      A ⇔ B       implication   exclusive  ¬A △ B, A △ ¬B
TFTF      ¬B          atomic                                      1
TFTT      ¬A ⇒ ¬B     implication   inclusive  A ∨ ¬B, A ⇐ B
TTFF      ¬A          atomic                                      1
TTFT      A ⇒ B       implication   inclusive  ¬A ∨ B, ¬B ⇒ ¬A
TTTF      A ⇒ ¬B      implication   inclusive  ¬A ∨ ¬B, B ⇒ ¬A
TTTT      ⊤           always true                                 0
Table 3.7.1 Classification of the sixteen binary truth functions
The binary logical expressions in Table 3.7.1 may be initially classified as follows.
(1) The always-false (⊥) and always-true (⊤) truth functions convey no information. In fact, the always-false truth function is not valid for any possible combination of truth values. The always-true truth function excludes no possible combinations of truth values at all.
(2) The 4 atomic propositions (A, B, ¬B and ¬A) convey information about only one of the propositions. These are therefore not, strictly speaking, binary operations.
(3) The 4 conjunctions (A ∧ B, A ∧ ¬B, ¬A ∧ B and ¬A ∧ ¬B) are equivalent to simple lists of two atomic propositions. So the information in the operators corresponding to these truth functions can be conveyed by two-element lists of individual propositions.
not how most people spontaneously think of exclusive disjunctions.)
It follows that every binary logical expression is equivalent to either a list of conditionals or a list of atomic propositions, modulo negations of concrete propositions. In other words, the implication operator is, apparently, the most fundamental building block for general binary logical operators. This tends to justify the choice of the basic logical operators ¬ and ⇒ for propositional calculus in Definition 4.4.3.
3.8.5 Definition [mm]: The (single) substitution into a symbol string ω0 of a symbol string ω1 for the symbol in position q is the symbol string ω″ = (ω″_j)_{j=0}^{m″−1} which is constructed as follows.
(1) Let ωi = (ωi,j)_{j=0}^{mi−1} for i = 0 and i = 1.
(2) Let m″ = m0 + m1 − 1.
(3) Let ω″_k = ω0,k if k < q, ω″_k = ω1,k−q if q ≤ k < q + m1, and ω″_k = ω0,k−m1+1 if k ≥ q + m1, for k ∈ Z_{m″}.
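For symbol strings represented as Python strings, the three cases of Definition 3.8.5 reduce to slicing. A minimal sketch (the ASCII strings and operator characters are placeholders for illustration):

```python
def substitute(w0, w1, q):
    """Replace the single symbol at position q of w0 by the whole of w1.

    The prefix w0[:q] supplies the first case, w1 supplies the middle
    case, and the suffix w0[q+1:] supplies the shifted third case."""
    return w0[:q] + w1 + w0[q + 1:]

# Substitute "(C|D)" for the symbol "A" at position 1 of "(A>B)".
result = substitute("(A>B)", "(C|D)", 1)
assert result == "((C|D)>B)"

# The result length is m0 + m1 - 1, as in line (2) of the definition.
assert len(result) == len("(A>B)") + len("(C|D)") - 1
```

The length check makes the bookkeeping of line (2) concrete: exactly one symbol of the outer string is consumed by the substitution.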
3.8.6 Remark: Logical operator binding precedence rules and omission of parentheses.
Parentheses are omitted whenever the meaning is clear according to the usual binding precedence rules. Conversely, when a logical expression name is unquoted, the expanded expression must be enclosed in parentheses unless the usual binding precedence rules make them unnecessary. For example, if ω = p1 ∨ p2, then the symbol-string ¬ω is understood to equal ¬(p1 ∨ p2).
3.8.7 Remark: Construction of knowledge sets from basic logical operations on knowledge sets.
Knowledge sets may be constructed from other knowledge sets as follows.
(i) For a given knowledge set K1 ⊆ 2^P, construct the complement 2^P \ K1 = {τ ∈ 2^P ; τ ∉ K1}.
(ii) For given K1, K2 ⊆ 2^P, construct the intersection K1 ∩ K2 = {τ ∈ 2^P ; τ ∈ K1 and τ ∈ K2}.
(iii) For given K1, K2 ⊆ 2^P, construct the union K1 ∪ K2 = {τ ∈ 2^P ; τ ∈ K1 or τ ∈ K2}.
By recursively applying the construction methods in lines (i), (ii) and (iii) to basic knowledge sets of the form Kp = {τ ∈ 2^P ; τ(p) = T} in Notation 3.5.5, any knowledge set K ⊆ 2^P may be constructed if P is a finite set. (In fact, it is clear that this may be achieved using only the two methods (i) and (ii), or only the two methods (i) and (iii).)
It is inconvenient to work directly at the semantic level with knowledge sets. It is more convenient to define logical expressions to symbolically represent constructions of knowledge sets. For example, if ω1 and ω2 are the names of logical expressions which signify knowledge sets K1 and K2 respectively, then the expressions ¬ω1, ω1 ∧ ω2 and ω1 ∨ ω2 signify the knowledge sets 2^P \ K1, K1 ∩ K2 and K1 ∪ K2 respectively.
This is an extension of the logical expressions which were introduced in Section 3.5 from expressions signifying atomic knowledge sets Kp for p ∈ P to expressions signifying general knowledge sets K ⊆ 2^P. Logical expressions of the form p1 ⊗ p2 for p1, p2 ∈ P (as in Remark 3.6.12), where ⊗ is any binary logical operator, are extended to logical expressions of the form ω1 ⊗ ω2, where ω1 and ω2 signify subsets of 2^P. (Note that p1 and p2 in the expression p1 ⊗ p2 must not be dereferenced.)
3.8.8 Remark: Logical expression templates.
Symbol-strings such as ω1 ∧ ω2 which require zero or more name substitutions to become true logical
expressions may be referred to as logical expression templates. A logical expression may be regarded as a
logical expression template with no required name substitutions.
It is often unclear whether one is discussing the template symbol-string before substitution or the logical
expression which is obtained after substitution. (The confusion of symbolic names with their values is a
constant issue in natural-language logic and mathematics!) This can generally be clarified (at least partially)
by referring to such symbol-strings as templates if the before-substitution string is meant, or as logical
expressions if the after-substitution string is meant.
3.8.9 Definition [mm]: A logical expression template is a symbol string which becomes a logical expression
when the names in it are substituted with their values in accordance with Notation 3.8.3.
3.8.10 Remark: Extended interpretations are required for logical operators applied to logical expressions.
The natural-language words "and" and "or" in Remark 3.8.7 lines (ii) and (iii) cannot be simply replaced with the corresponding logical operators ∧ and ∨ in Definition 3.5.9 because the expressions "τ ∈ K1" and "τ ∈ K2" are not elements of the concrete proposition domain P. (Note, however, that one could easily construct a different context in which these expressions would be concrete propositions, but these would be elements of a different domain P′.)
Similarly, the words "and" and "or" in Remark 3.8.7 lines (ii) and (iii) cannot be easily interpreted using binary truth functions in the style of Definition 3.6.2 because the expressions "τ ∈ K1" and "τ ∈ K2" cannot be given truth values which could be the arguments of such truth functions. The interpretation of
these expressions requires set theory, which is not defined until Chapter 7. (Admittedly, one could convert
knowledge sets to knowledge functions as in Definition 3.4.12, but this approach would be unnecessarily
untidy and indirect.)
3.8.11 Remark: Recursive syntax and interpretation for infix logical expressions.
A general class of logical expressions is defined recursively in terms of unary and binary operations on
a concrete proposition domain in Definition 3.8.12. (A non-recursive style of syntax specification for infix
logical expressions is given in Definition 3.11.2, but the non-recursive specification is somewhat cumbersome.)
3.8.12 Definition [mm]: An infix logical expression on a concrete proposition domain P, with proposition name map μ : N → P, is any expression which may be constructed recursively as follows.
(i) For any p ∈ N, the logical expression p, which signifies {τ ∈ 2^P ; τ(μ(p)) = T}.
(ii) For any logical expressions ω1, ω2 signifying K1, K2 on P, the following logical expressions with the following meanings.
(1) (¬ω1), which signifies 2^P \ K1.
(2) (ω1 ∧ ω2), which signifies K1 ∩ K2.
(3) (ω1 ∨ ω2), which signifies K1 ∪ K2.
(4) (ω1 ⇒ ω2), which signifies (2^P \ K1) ∪ K2.
(5) (ω1 ⇐ ω2), which signifies K1 ∪ (2^P \ K2).
(6) (ω1 ⇔ ω2), which signifies (K1 ∩ K2) ∪ ((2^P \ K1) ∩ (2^P \ K2)) = ((2^P \ K1) ∪ K2) ∩ (K1 ∪ (2^P \ K2)).
(7) (ω1 ⊼ ω2), which signifies 2^P \ (K1 ∩ K2).
(8) (ω1 ⊽ ω2), which signifies 2^P \ (K1 ∪ K2).
(9) (ω1 △ ω2), which signifies (K1 \ K2) ∪ (K2 \ K1) = (K1 ∪ K2) \ (K1 ∩ K2).
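The recursive clauses above can be implemented directly as an evaluator from expression trees to knowledge sets. A sketch for a two-element domain, implementing clauses (i) and (1)–(4) and (6); the tuple encoding of expressions and the English operator names are invented for illustration, and the remaining clauses follow the same pattern.

```python
from itertools import product

P = ["p", "q"]                                       # concrete propositions
ALL = list(product([False, True], repeat=len(P)))    # 2^P as value tuples
FULL = frozenset(ALL)

def interpret(expr):
    """Map a nested-tuple expression tree to the knowledge set it signifies."""
    if isinstance(expr, str):                        # clause (i): a name
        i = P.index(expr)
        return frozenset(t for t in ALL if t[i])
    op, *args = expr
    Ks = [interpret(a) for a in args]
    if op == "not":                                  # clause (1)
        return FULL - Ks[0]
    if op == "and":                                  # clause (2)
        return Ks[0] & Ks[1]
    if op == "or":                                   # clause (3)
        return Ks[0] | Ks[1]
    if op == "implies":                              # clause (4)
        return (FULL - Ks[0]) | Ks[1]
    if op == "iff":                                  # clause (6)
        return (Ks[0] & Ks[1]) | ((FULL - Ks[0]) & (FULL - Ks[1]))
    raise ValueError(op)

# p => q signifies {(F,F), (F,T), (T,T)}, as listed in Section 3.5.
assert interpret(("implies", "p", "q")) \
       == {(False, False), (False, True), (True, True)}
```

The recursion mirrors the definition exactly: names produce atomic knowledge sets, and each operator clause applies the corresponding set-construction to the sets produced by its sub-expressions.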
3.9.2 Definition [mm]: Equivalent logical expressions ω1, ω2 on a concrete proposition domain P are logical expressions ω1, ω2 on P whose interpretations as subsets of 2^P are the same.
3.9.3 Notation [mm]: ω1 ≡ ω2, for logical expressions ω1, ω2 on a concrete proposition domain P, means that ω1 and ω2 are equivalent logical expressions on P.
¬¬ω1 ≡ ω1
¬(ω1 ∧ ω2) ≡ ¬ω1 ∨ ¬ω2
ω1 ⇒ ω2 ≡ ¬ω1 ∨ ω2.
Discovering and proving such equivalences can give hours of amusement and recreation. Alternatively one
may write a computer programme to print out thousands or millions of them.
Equivalences between logical expressions may be efficiently demonstrated using truth tables or by deducing
them from other equivalences. The deductive method is used to prove theorems in Sections 4.4, 4.6 and 4.7.
The equivalence relation ≡ between logical expressions is different to the logical equivalence operation ⇔.
The logical operation ⇔ constructs a new logical expression from two given logical expressions. The
equivalence relation ≡ is a relation between two logical expressions (which may be thought of as a
boolean-valued function of the two logical expressions).
3.9.5 Remark: All logical expressions may be expressed in terms of negations and conjunctions.
The logical expression templates in Definition 3.8.12 lines (3)–(9) may be expressed in terms of the logical
negation and conjunction operations in lines (1) and (2) as follows. When logical expression names (such
as α₁ and α₂) are unspecified, it is understood that the equivalences are asserted for all logical expressions
which conform to Definition 3.8.12.
(1) (α₁ ∨ α₂) ≡ (¬((¬α₁) ∧ (¬α₂)))
(2) (α₁ ⇒ α₂) ≡ (¬(α₁ ∧ (¬α₂)))
(3) (α₁ ⇐ α₂) ≡ (¬((¬α₁) ∧ α₂)).
(4) (α₁ ⇔ α₂) ≡ (¬((¬(α₁ ∧ α₂)) ∧ (¬((¬α₁) ∧ (¬α₂)))))
≡ ((¬(α₁ ∧ (¬α₂))) ∧ (¬((¬α₁) ∧ α₂))).
(5) (α₁ ⊼ α₂) ≡ (¬(α₁ ∧ α₂)).
(6) (α₁ ⊽ α₂) ≡ ((¬α₁) ∧ (¬α₂)).
(7) (α₁ △ α₂) ≡ (¬((¬(α₁ ∧ (¬α₂))) ∧ (¬((¬α₁) ∧ α₂))))
≡ ((¬((¬α₁) ∧ (¬α₂))) ∧ (¬(α₁ ∧ α₂))).
In abbreviated infix logical expression syntax, these equivalences are as follows.
(1) α₁ ∨ α₂ ≡ ¬(¬α₁ ∧ ¬α₂).
(2) α₁ ⇒ α₂ ≡ ¬(α₁ ∧ ¬α₂).
(3) α₁ ⇐ α₂ ≡ ¬(¬α₁ ∧ α₂).
(4) α₁ ⇔ α₂ ≡ ¬(¬(α₁ ∧ α₂) ∧ ¬(¬α₁ ∧ ¬α₂))
≡ ¬(α₁ ∧ ¬α₂) ∧ ¬(¬α₁ ∧ α₂).
(5) α₁ ⊼ α₂ ≡ ¬(α₁ ∧ α₂).
(6) α₁ ⊽ α₂ ≡ ¬α₁ ∧ ¬α₂.
(7) α₁ △ α₂ ≡ ¬(¬(α₁ ∧ ¬α₂) ∧ ¬(¬α₁ ∧ α₂))
≡ ¬(¬α₁ ∧ ¬α₂) ∧ ¬(α₁ ∧ α₂).
These equivalences have the interesting consequence that the left-hand-side expressions may be defined to
mean the right-hand-side expressions. Then there would be only two basic logical operators, ¬ and ∧,
and the remaining operators would be mere abbreviations. Such an approach is in fact often adopted.
Similarly, all logical expressions are equivalent to expressions which use only the negation and implication
operators, or only the negation and disjunction operators. Economising the number of basic logical operators
in this way has some advantages in the boot-strapping of logic because some kinds of recursive proofs of
metamathematical theorems thereby require fewer steps.
All logical expressions can in fact be written in terms of a single binary operator, namely the "nand"
operator ⊼ or the "nor" operator ⊽, since both negations and conjunctions may be written in terms of
either one of these operators. However, defining all operators either in terms of the nand-operator or in
terms of the nor-operator generally makes recursive-style metamathematical theorem proofs more difficult.
Logical operator minimalism does not correspond to how people think. After the early stages of boot-
strapping logic have been completed, it is best to use a full range of intuitively meaningful logical operators.
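The single-operator reduction mentioned above can be checked mechanically. The following minimal Python sketch (function names are illustrative) defines negation and conjunction in terms of "nand" alone and confirms by brute force that the derived operators agree with the usual ones:

```python
from itertools import product

def nand(a, b):
    return not (a and b)

def neg(a):
    # ¬a ≡ a ⊼ a
    return nand(a, a)

def conj(a, b):
    # a ∧ b ≡ (a ⊼ b) ⊼ (a ⊼ b)
    return nand(nand(a, b), nand(a, b))

# Verify the reductions on all truth value combinations.
neg_ok  = all(neg(a) == (not a) for a in (False, True))
conj_ok = all(conj(a, b) == (a and b)
              for a, b in product((False, True), repeat=2))
```

Since every operator is expressible via ¬ and ∧, and both of these are expressible via ⊼, every operator is expressible via ⊼ alone, as claimed.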
The recursively defined logical expressions in Definition 3.8.12 are merely symbolic representations for par-
ticular kinds of knowledge sets. They are helpful because they are intuitively clearer and more convenient
than set-expressions involving the set-operators ∖, ∩ and ∪, but logical expressions are not the real thing.
Logical expressions merely provide an efficient, intuitive symbolic language for communicating knowledge,
beliefs, assertions or conjectures.
3.9.6 Remark: Uniform substitution of symbol strings for symbols in a symbol string.
Propositional calculus makes extensive use of substitution rules. These rules permit symbols in logical
expressions to be substituted with other logical expressions under specified circumstances. Definition 3.9.7
presents a general kind of uniform substitution for symbol strings. (See Notation 14.3.7 for the sets
ℤ_n = {0, . . . n − 1} for n ∈ ℤ₀⁺.)
Uniform substitution is not the same as the symbol-by-symbol dereferencing concept which is discussed in
Remark 3.8.2 and Notation 3.8.3. Single-symbol substitution is described in Definition 3.8.5.
3.9.7 Definition [mm]: The uniform substitution into a symbol string α₀ of symbol strings α₁, α₂ . . . αₙ
for distinct symbols p₁, p₂ . . . pₙ, where n is a non-negative integer, is the symbol string α′₀ = (α′_{0,j})_{j=0}^{m′₀−1}
which is constructed as follows.
(1) α_i = (α_{i,j})_{j=0}^{m_i−1} for all i ∈ ℤ_{n+1}.
(2) ℓ_k = m_i if α_{0,k} = p_i, and ℓ_k = 1 if α_{0,k} ∉ {p₁, . . . pₙ}, for all k ∈ ℤ_{m_0}.
(3) m′₀ = Σ_{k=0}^{m_0−1} ℓ_k.
(4) σ(k) = Σ_{j=0}^{k−1} ℓ_j for all k ∈ ℤ_{m_0}.
(5) α′_{0,σ(k)+j} = α_{i,j} if α_{0,k} = p_i, and α′_{0,σ(k)+j} = α_{0,k} if α_{0,k} ∉ {p₁, . . . pₙ},
for all j ∈ ℤ_{ℓ_k}, for all k ∈ ℤ_{m_0}.
3.9.8 Notation [mm]: Subs(α₀; p₁[α₁], p₂[α₂], . . . pₙ[αₙ]), for symbol strings α₀, α₁, α₂ . . . αₙ and symbols
p₁, p₂ . . . pₙ, for a non-negative integer n, denotes the uniform substitution into α₀ of the symbol strings
α₁, α₂ . . . αₙ for the symbols p₁, p₂ . . . pₙ.
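The construction of Definition 3.9.7 amounts to a single simultaneous left-to-right pass over the string. A minimal Python sketch (representing symbol strings as lists of one-character symbols; the names are illustrative) is:

```python
def subs(alpha0, rules):
    """Uniform substitution: rules maps distinct symbols p_i to
    replacement symbol strings alpha_i. All occurrences are replaced
    simultaneously, so inserted symbols are never rewritten again."""
    out = []
    for symbol in alpha0:
        if symbol in rules:
            out.extend(rules[symbol])   # splice in the whole string alpha_i
        else:
            out.append(symbol)          # copy the symbol unchanged
    return out

# Subs("pq&"; p["r~"], q["p"]): the inserted "p" is not substituted again,
# which is the point of the substitution being uniform and simultaneous.
result = subs(list("pq&"), {"p": list("r~"), "q": list("p")})
```

The index function σ(k) of the definition corresponds here to the length of `out` at the moment symbol k is processed.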
[Figure 3.10.1: Representations of an example logical expression. The figure shows tree, infix, prefix and postfix representations of the expression ((¬(p₁ ∧ (¬p₂))) ⇒ p₃), together with the functional expression f₂(⇒, f₁(¬, f₂(∧, p₁, f₁(¬, p₂))), p₃).]
All nodes are required to have a parent node except for a single node which is called the root node. Each
of these methods has advantages and disadvantages. Probably the postfix notation is the easiest to manage.
It is unfortunate that the infix notation is generally used as the basis for axiomatic treatment since it is so
difficult to manipulate.
Recursive definitions and theorems, which seem to be ubiquitous in mathematical logic, are not truly intrinsic
to the nature of logic. They are a consequence of the argumentative style of logic, which is generally expressed
in natural language or some formalisation of natural language. If history had taken a different turn, logic
could have been expressed mostly in terms of truth tables which list exclusions in the style of Remark 3.6.6.
Ultimately all logical expressions boil down to exclusions of combinations of the truth values of concrete
propositions. Each such exclusion may be expressed in infinitely many ways as logical expressions. This
is analogous to the description of physical phenomena with respect to multiple observation frames. In
each frame, the same reality is described with different numbers, but there is only one reality underlying
the frame-dependent descriptions. In the same way, logical expressions have no particular significance in
themselves. They only have significance as convenient ways of specifying lists of exclusions of combinations
of truth values.
3.10.2 Remark: Postfix logical expressions.
Both the syntax and semantics rules for postfix logical expressions are very much simpler than for the infix
logical expressions in Definition 3.8.12. In particular, the syntax does not need to be specified recursively.
The space of all valid postfix logical expressions is neatly specified by lines (i) and (ii).
3.10.3 Definition [mm]: A postfix logical expression on a concrete proposition domain P, with operator
classes O₀ = {⊥, ⊤}, O₁ = {¬}, O₂ = {∧, ∨, ⇒, ⇐, ⇔, ⊼, ⊽, △}, using a name map
µ : N → P, is any symbol-sequence α = (αᵢ)_{i=0}^{n−1} ∈ Xⁿ for positive integer n, where X = N ∪ O₀ ∪ O₁ ∪ O₂,
which satisfies
(i) L(α, n) = 1,
(ii) a(αᵢ) ≤ L(α, i) for all i = 0 . . . n − 1,
where L(α, i) = Σ_{j=0}^{i−1} (1 − a(αⱼ)) for all i = 0 . . . n, and
a(αᵢ) = 0 if αᵢ ∈ O₀ ∪ N, a(αᵢ) = 1 if αᵢ ∈ O₁, a(αᵢ) = 2 if αᵢ ∈ O₂,
for all i = 0 . . . n − 1. The meanings of these logical expressions are defined recursively as follows.
(iii) For any p ∈ N, the postfix logical expression "p" signifies the knowledge set {τ ∈ 2^P; τ(µ(p)) = T}.
(iv) For any postfix logical expressions α₁, α₂ signifying K₁, K₂ on P, the following postfix logical expressions
with the following meanings.
(1) α₁ ¬ signifies 2^P ∖ K₁.
(2) α₁ α₂ ∧ signifies K₁ ∩ K₂.
(3) α₁ α₂ ∨ signifies K₁ ∪ K₂.
(4) α₁ α₂ ⇒ signifies (2^P ∖ K₁) ∪ K₂.
(5) α₁ α₂ ⇐ signifies K₁ ∪ (2^P ∖ K₂).
(6) α₁ α₂ ⇔ signifies (K₁ ∩ K₂) ∪ ((2^P ∖ K₁) ∩ (2^P ∖ K₂)) = ((2^P ∖ K₁) ∪ K₂) ∩ (K₁ ∪ (2^P ∖ K₂)).
(7) α₁ α₂ ⊼ signifies 2^P ∖ (K₁ ∩ K₂).
(8) α₁ α₂ ⊽ signifies 2^P ∖ (K₁ ∪ K₂).
(9) α₁ α₂ △ signifies (K₁ ∖ K₂) ∪ (K₂ ∖ K₁) = (K₁ ∪ K₂) ∖ (K₁ ∩ K₂).
3.10.9 Remark: Abbreviated notation for proposition name maps for postfix logical expressions.
The considerations in Remark 3.8.13 regarding implicit proposition name maps apply to Definition 3.10.3
also. Although this definition is presented in terms of a specified proposition name map µ : N → P, it will
be mostly assumed that N = P and µ is the identity map.
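The semantic rules (iii) and (iv) of Definition 3.10.3 correspond directly to a stack evaluator. The following minimal Python sketch (names and the ASCII stand-ins `~ & >` for ¬ ∧ ⇒ are illustrative) evaluates a postfix expression on a truth value map, and computes its knowledge set as the set of truth value maps which are not excluded:

```python
from itertools import product

OPS2 = {"&": lambda x, y: x and y,          # ∧
        "|": lambda x, y: x or y,           # ∨
        ">": lambda x, y: (not x) or y}     # ⇒

def eval_postfix(expr, tau):
    """Evaluate a postfix expression, a list of symbols, on a truth
    value map tau which assigns True/False to proposition names."""
    stack = []
    for s in expr:
        if s == "~":                        # unary ¬ pops one operand
            stack.append(not stack.pop())
        elif s in OPS2:                     # binary operators pop two
            y, x = stack.pop(), stack.pop()
            stack.append(OPS2[s](x, y))
        else:                               # proposition name
            stack.append(tau[s])
    assert len(stack) == 1                  # the condition L(alpha, n) = 1
    return stack[0]

def knowledge_set(expr, names):
    """All truth value tuples (over the given names) not excluded."""
    return {vals for vals in product([False, True], repeat=len(names))
            if eval_postfix(expr, dict(zip(names, vals)))}
```

For the example expression of Figure 3.10.1, written in postfix as `p q ~ & ~ r >`, the knowledge set over three propositions has five of the eight possible truth value maps.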
3.10.10 Remark: Prefix logical expressions.
The syntax and interpretation rules for prefix logical expressions in Definition 3.10.11 are very similar to
those for postfix logical expressions in Definition 3.10.3. The prefix logical expression syntax specification in
lines (i) and (ii) differs in that the stack length L(α, i) has an upper limit instead of a lower limit. A
non-positive stack length may be interpreted as an expectation of future arguments for operators, which must be
fulfilled by the end of the symbol-string, at which point there must be exactly one object on the stack. Note
that for both postfix and prefix syntaxes, an empty symbol-string is prevented by the condition L(, n) = 1.
(The function L is identical in the two definitions.)
3.10.11 Definition [mm]: A prefix logical expression on a concrete proposition domain P, with operator
classes O₀ = {⊥, ⊤}, O₁ = {¬}, O₂ = {∧, ∨, ⇒, ⇐, ⇔, ⊼, ⊽, △}, using a name map
µ : N → P, is any symbol-sequence α = (αᵢ)_{i=0}^{n−1} ∈ Xⁿ for positive integer n, where X = N ∪ O₀ ∪ O₁ ∪ O₂,
which satisfies
(i) L(α, n) = 1,
(ii) L(α, i) ≤ 0 for all i = 0 . . . n − 1,
where L(α, i) = Σ_{j=0}^{i−1} (1 − a(αⱼ)) for all i = 0 . . . n, and
a(αᵢ) = 0 if αᵢ ∈ O₀ ∪ N, a(αᵢ) = 1 if αᵢ ∈ O₁, a(αᵢ) = 2 if αᵢ ∈ O₂,
for all i = 0 . . . n − 1. The meanings of these logical expressions are defined recursively as follows.
(iii) For any p ∈ N, the prefix logical expression "p" signifies the knowledge set {τ ∈ 2^P; τ(µ(p)) = T}.
(iv) For any prefix logical expressions α₁, α₂ signifying K₁, K₂ on P, the following prefix logical expressions
with the following meanings.
(1) ¬ α₁ signifies 2^P ∖ K₁.
(2) ∧ α₁ α₂ signifies K₁ ∩ K₂.
(3) ∨ α₁ α₂ signifies K₁ ∪ K₂.
(4) ⇒ α₁ α₂ signifies (2^P ∖ K₁) ∪ K₂.
(5) ⇐ α₁ α₂ signifies K₁ ∪ (2^P ∖ K₂).
(6) ⇔ α₁ α₂ signifies (K₁ ∩ K₂) ∪ ((2^P ∖ K₁) ∩ (2^P ∖ K₂)) = ((2^P ∖ K₁) ∪ K₂) ∩ (K₁ ∪ (2^P ∖ K₂)).
(7) ⊼ α₁ α₂ signifies 2^P ∖ (K₁ ∩ K₂).
(8) ⊽ α₁ α₂ signifies 2^P ∖ (K₁ ∪ K₂).
(9) △ α₁ α₂ signifies (K₁ ∖ K₂) ∪ (K₂ ∖ K₁) = (K₁ ∪ K₂) ∖ (K₁ ∩ K₂).
(See Notation 3.9.8 for the uniform substitution notation.)
3.10.15 Theorem [mm]: Substitution of expressions for proposition names in prefix logical expressions.
Let P be a concrete proposition domain with name map µ : N → P. Let α₀, α₁, . . . αₙ ∈ W⁺(N, O₀, O₁, O₂)
be prefix logical expressions on P, and let p₁, . . . pₙ ∈ N be proposition names, where n is a non-negative
integer. Then Subs(α₀; p₁[α₁], . . . pₙ[αₙ]) ∈ W⁺(N, O₀, O₁, O₂).
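The stack-length function L(α, i) of Definitions 3.10.3 and 3.10.11 gives a purely arithmetic, non-recursive syntax check. A minimal Python sketch (ASCII stand-ins `~ & | >` for the operators are illustrative) is:

```python
def arity(s):
    """The function a: 0 for names and constants, 1 for ¬, 2 for binary ops."""
    if s == "~":
        return 1
    if s in "&|>":
        return 2
    return 0

def L(alpha, i):
    """L(alpha, i) = sum over j < i of (1 - a(alpha_j))."""
    return sum(1 - arity(alpha[j]) for j in range(i))

def valid_postfix(alpha):
    n = len(alpha)
    return n > 0 and L(alpha, n) == 1 and all(
        arity(alpha[i]) <= L(alpha, i) for i in range(n))

def valid_prefix(alpha):
    n = len(alpha)
    return n > 0 and L(alpha, n) == 1 and all(
        L(alpha, i) <= 0 for i in range(n))
```

For postfix syntax the stack length is bounded below by each operator's arity; for prefix syntax it is bounded above by zero, with the deficit fulfilled by the end of the string, as Remark 3.10.10 describes.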
[Figure 3.11.1: Bracket levels B(α, i) of an infix logical expression example, plotted against the symbol index i.]
The difficulties of non-recursively specifying the space of infix logical expressions in Definition 3.8.12 can
be gleaned from syntax conditions (i)–(v) in Definition 3.11.2. These conditions make use of the quantifier
notation which is introduced in Section 5.2 and the set notation ℤ_m = {0, 1, . . . m − 1} for integers m ≥ 0.
3.11.2 Definition [mm]: An infix logical expression on a concrete proposition domain P, with operator
classes O₀ = {⊥, ⊤}, O₁ = {¬}, O₂ = {∧, ∨, ⇒, ⇐, ⇔, ⊼, ⊽, △}, and parenthesis
classes B⁺ = {"("} and B⁻ = {")"}, using a name map µ : N → P, is any symbol-sequence α = (αᵢ)_{i=0}^{n−1} ∈ Xⁿ
for positive integer n, where X = N ∪ O₀ ∪ O₁ ∪ O₂ ∪ B⁺ ∪ B⁻, which satisfies
(i) ∀i ∈ ℤₙ, B(α, i) ≥ 1, where ∀i ∈ ℤₙ, B(α, i) = Σ_{j=0}^{i} b⁺(αⱼ) − Σ_{j=0}^{i−1} b⁻(αⱼ), where
b⁺(αᵢ) = 1 if αᵢ ∈ N ∪ O₀ ∪ B⁺, and b⁺(αᵢ) = 0 otherwise, and
b⁻(αᵢ) = 1 if αᵢ ∈ N ∪ O₀ ∪ B⁻, and b⁻(αᵢ) = 0 otherwise.
(The symbol string α is assumed to be extended with empty characters outside its domain ℤₙ.)
(ii) B(α, n) = 0.
(iii) For all i₁, i₂ ∈ ℤₙ with i₁ ≤ i₂ which satisfy
(1) B(α, i₁ − 1) < B(α, i₁) = B(α, i₂) > B(α, i₂ + 1) and
(2) B(α, i₁) ≤ B(α, i) for all i ∈ ℤₙ with i₁ < i < i₂,
either
(3) i₁ = i₂, or
(4) there is one and only one i₀ ∈ ℤₙ with i₁ < i₀ < i₂ and B(α, i₁) = B(α, i₀), and for this i₀,
αᵢ₀ ∈ O₁ and i₁ + 1 = i₀ < i₂ − 1, or
(5) there is one and only one i₀ ∈ ℤₙ with i₁ < i₀ < i₂ and B(α, i₁) = B(α, i₀), and for this i₀,
αᵢ₀ ∈ O₂ and i₁ + 1 < i₀ < i₂ − 1.
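The bracket-level function B(α, i) is easily computed in code. The following minimal Python sketch (using alphanumeric characters as proposition names and `( )` as the parenthesis classes, an illustrative choice) evaluates B for the simple expression (p ∧ q), written here as `(p&q)`:

```python
def B(alpha, i):
    """Bracket level: sum of b+ over j <= i, minus sum of b- over j < i.
    Names and constants count for both b+ and b-; "(" only for b+,
    ")" only for b-. Indices outside the string contribute nothing."""
    def bplus(s):
        return 1 if (s.isalnum() or s == "(") else 0
    def bminus(s):
        return 1 if (s.isalnum() or s == ")") else 0
    return (sum(bplus(alpha[j]) for j in range(min(i + 1, len(alpha))))
            - sum(bminus(alpha[j]) for j in range(min(i, len(alpha)))))

expr = list("(p&q)")
levels = [B(expr, i) for i in range(len(expr))]   # stays >= 1 inside
```

Condition (i) requires every level in `levels` to be at least 1, and condition (ii) requires the level to return to 0 one position past the end of the string.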
(See Notation 3.9.8 for the uniform substitution notation.)
3.11.6 Theorem [mm]: Substitution of expressions for proposition names in infix logical expressions.
Let P be a concrete proposition domain with name map µ : N → P. Let α₀, α₁, . . . αₙ ∈ W⁰(N, O₀, O₁, O₂)
be infix logical expressions on P, and let p₁, . . . pₙ ∈ N be proposition names, where n is a non-negative
integer. Then Subs(α₀; p₁[α₁], . . . pₙ[αₙ]) ∈ W⁰(N, O₀, O₁, O₂).
(3) Subexpressions which satisfy Definition 3.11.2 condition (iii) (5) have the binary expression form in
Definition 3.8.12 lines (2)–(9).
3.11.9 Remark: The principal operator and matching parentheses in an infix logical subexpression.
Let α be an infix logical expression according to Definition 3.11.2, with domain ℤₙ. For any i ∈ ℤₙ which
satisfies αᵢ ∈ O₁ ∪ O₂ ∪ B⁺ ∪ B⁻, the matching left and right parentheses are defined respectively by
3.11.10 Remark: Outer versus inner parenthesisation for infix logical expressions.
An alternative to the outer parenthesisation style for infix logical expressions in Definition 3.11.2 semantic
conditions (iv)–(v) is the inner parenthesisation style as follows.
(iv′) For any p ∈ N, the logical expression "p" signifies {τ ∈ 2^P; τ(µ(p)) = T}.
(v′) For any logical expressions α₁, α₂ signifying K₁, K₂ on P, the following logical expressions have the
following meanings.
(1′) ¬(α₁) signifies 2^P ∖ K₁.
(2′) (α₁) ∧ (α₂) signifies K₁ ∩ K₂.
(3′) (α₁) ∨ (α₂) signifies K₁ ∪ K₂.
And so forth. Then the outer-style example ((¬(p₁ ∧ (¬p₂))) ⇒ p₃) in Figure 3.10.1 becomes the inner-style
expression (¬((p₁) ∧ (¬(p₂)))) ⇒ (p₃). The inner style has an extra two parentheses for each binary
operator in the expression. This makes it even more unreadable than the outer style.
3.11.11 Remark: Many difficulties arise in logic because logical expressions imitate natural language.
A conclusion which may be drawn from Section 3.10 is that many of the difficulties of logic arise from the way
in which knowledge is typically described in natural language. In other words, the placing of constraints on
combinations of truth values of propositions does not inherently suffer from the complexities and ambiguities
of the forms of natural language used to communicate those constraints. The features of natural language
which are imitated in propositional logic include the use of recursive-binary logical expressions, and the use
of infix syntax.
An even greater range of difficulties arises when the argumentative method is applied to logic in Chapter 4.
The very long tradition of seeking to extend knowledge from the known to the unknown by logical argu-
mentation has resulted in a formalist style of modern logic which focuses excessively on the language of
argumentation while relegating semantics to a secondary role. Knowledge is sometimes identified in the logic
literature with the totality of propositions which can be successfully argued for, using rules which are only
distantly related to their meaning. The rules of logic are justifiable only if they lead to valid results. The
rules of logic must be tested against the validity of the consequences, not vice versa.
As an alternative to the sequential syntaxes of Definitions 3.10.3 and 3.10.11, one may define logical
expressions in terms of functions. In functional form, the example in Figure 3.10.1 in Remark 3.10.1 would
have the form
α = f₂(⇒, f₁(¬, f₂(∧, p₁, f₁(¬, p₂))), p₃),     (3.11.1)
where the functions f₁ and f₂ are defined according to some syntax style. For example, in the infix syntax
style, f₁(o, α₁) expands to (o α₁) and f₂(o, α₂, α₃) expands to (α₂ o α₃). In the postfix syntax style,
f₁(o, α₁) expands to α₁ o and f₂(o, α₂, α₃) expands to α₂ α₃ o.
The observant reader will no doubt have noticed that the symbols in Equation (3.11.1) appear in the sequence
⇒ ¬ ∧ p₁ ¬ p₂ p₃, which (not) coincidentally is the same order as in the prefix syntax in Figure 3.10.1.
In fact, the conversion between functional notation and prefix notation is very easy to automate. The prefix
and functional notations are well suited to proofs of metamathematical theorems. This suggests that for
facilitating such proofs, the prefix or functional notations should be preferred.
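The ease of this conversion can be seen from a minimal Python sketch (representing a functional expression as nested tuples; the names are illustrative): since the operator already precedes its arguments in functional form, flattening the nesting yields the prefix string directly.

```python
def to_prefix(expr):
    """Flatten a functional expression, given as nested tuples of the
    form (operator, arg) or (operator, arg, arg), into prefix notation."""
    if isinstance(expr, tuple):
        op, *args = expr
        return [op] + [s for a in args for s in to_prefix(a)]
    return [expr]                       # a proposition name leaf

# alpha = f2(">", f1("~", f2("&", p1, f1("~", p2))), p3), with ASCII
# stand-ins > ~ & for the operators ⇒ ¬ ∧.
alpha = (">", ("~", ("&", "p1", ("~", "p2"))), "p3")
prefix = to_prefix(alpha)   # [">", "~", "&", "p1", "~", "p2", "p3"]
```

The inverse conversion, from prefix to functional form, is equally mechanical: each operator symbol consumes the following one or two complete subexpressions.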
(iii) (¬α₂ ⇒ ¬α₁) ⇒ (α₁ ⇒ α₂).
Proof: To verify these claims, let P be the concrete proposition domain on which α₁, α₂ and α₃ are
defined, and let K₁, K₂ and K₃ be the respective knowledge sets.
To verify (i), note that α₂ ⇒ α₁ signifies the knowledge set (2^P ∖ K₂) ∪ K₁ by Definition 3.8.12 (ii)(4).
Therefore α₁ ⇒ (α₂ ⇒ α₁) signifies the knowledge set (2^P ∖ K₁) ∪ ((2^P ∖ K₂) ∪ K₁) by the same definition.
By the commutativity and associativity of the union operation, this set equals ((2^P ∖ K₁) ∪ K₁) ∪ (2^P ∖ K₂),
which equals 2^P ∪ (2^P ∖ K₂), which equals 2^P. Hence α₁ ⇒ (α₂ ⇒ α₁) is a tautology by Definition 3.12.2 (ii).
To verify (ii), it follows from Definition 3.8.12 (ii)(4) that (α₁ ⇒ α₂) ⇒ (α₁ ⇒ α₃) signifies the set
(2^P ∖ ((2^P ∖ K₁) ∪ K₂)) ∪ ((2^P ∖ K₁) ∪ K₃) = (K₁ ∩ (2^P ∖ K₂)) ∪ ((2^P ∖ K₁) ∪ K₃)
= (K₁ ∪ ((2^P ∖ K₁) ∪ K₃)) ∩ ((2^P ∖ K₂) ∪ ((2^P ∖ K₁) ∪ K₃))
= (2^P ∪ K₃) ∩ ((2^P ∖ K₂) ∪ ((2^P ∖ K₁) ∪ K₃))
= 2^P ∩ ((2^P ∖ K₂) ∪ ((2^P ∖ K₁) ∪ K₃))
= (2^P ∖ K₂) ∪ ((2^P ∖ K₁) ∪ K₃),
while α₁ ⇒ (α₂ ⇒ α₃) signifies the same knowledge set (2^P ∖ K₁) ∪ ((2^P ∖ K₂) ∪ K₃). Therefore the logical
expression (α₁ ⇒ (α₂ ⇒ α₃)) ⇒ ((α₁ ⇒ α₂) ⇒ (α₁ ⇒ α₃)) signifies the knowledge set
(2^P ∖ ((2^P ∖ K₂) ∪ ((2^P ∖ K₁) ∪ K₃))) ∪ ((K₁ ∩ (2^P ∖ K₂)) ∪ ((2^P ∖ K₁) ∪ K₃)) = 2^P.
Hence the logical expression (α₁ ⇒ (α₂ ⇒ α₃)) ⇒ ((α₁ ⇒ α₂) ⇒ (α₁ ⇒ α₃)) is a tautology.
For part (iii), ¬α₂ ⇒ ¬α₁ signifies the knowledge set (2^P ∖ (2^P ∖ K₂)) ∪ (2^P ∖ K₁) by Definition 3.8.12
parts (ii)(1) and (ii)(4), which equals K₂ ∪ (2^P ∖ K₁), whereas α₁ ⇒ α₂ signifies (2^P ∖ K₁) ∪ K₂, which is the
same set. Therefore (¬α₂ ⇒ ¬α₁) ⇒ (α₁ ⇒ α₂) signifies (2^P ∖ ((2^P ∖ K₁) ∪ K₂)) ∪ ((2^P ∖ K₁) ∪ K₂) = 2^P.
Hence (¬α₂ ⇒ ¬α₁) ⇒ (α₁ ⇒ α₂) is a tautology.
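The three set computations above can be cross-checked by brute force on a finite domain: an expression is a tautology when its knowledge set is all of 2^P. A minimal Python sketch (names are illustrative) over a three-proposition domain is:

```python
from itertools import product

# The space 2^P for a domain of three concrete propositions, with a
# truth value map represented as a tuple of three booleans.
space = set(product([False, True], repeat=3))

def K(f):
    """Knowledge set of a boolean function of the three truth values."""
    return {t for t in space if f(*t)}

implies = lambda x, y: (not x) or y

# (i)   a1 ⇒ (a2 ⇒ a1)
taut_i = K(lambda a, b, c: implies(a, implies(b, a))) == space
# (ii)  (a1 ⇒ (a2 ⇒ a3)) ⇒ ((a1 ⇒ a2) ⇒ (a1 ⇒ a3))
taut_ii = K(lambda a, b, c:
            implies(implies(a, implies(b, c)),
                    implies(implies(a, b), implies(a, c)))) == space
# (iii) (¬a2 ⇒ ¬a1) ⇒ (a1 ⇒ a2)
taut_iii = K(lambda a, b, c:
             implies(implies(not b, not a), implies(a, b))) == space
```

Of course such a check only confirms the result for one finite domain; the set-algebraic proof above establishes it for every concrete proposition domain.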
tautology. In other words, the tautology property depends only on the form of the logical expression, not
on the particular concrete propositions to which it refers. This also applies to contradictions.
It is not entirely obvious that this should be so. For example, the expression p₁ ⇒ (p₂ ⇒ p₁) is a tautology
for any proposition names p₁, p₂ ∈ N, for a name map µ : N → P. This follows from the interpretation
(2^P ∖ K_{p₁}) ∪ ((2^P ∖ K_{p₂}) ∪ K_{p₁}), which equals 2^P. It seems from the form of the proof that the validity does
not depend on the choice of proposition names. So one would expect that the proposition names may be
substituted with any other names. But arguments based on the form of a proof are intuitive and unreliable.
Asserting that the form of a natural-language proof has some attributes would be meta-metamathematical.
If one substitutes p₃ for only the first appearance of p₁, the resulting expression p₃ ⇒ (p₂ ⇒ p₁) is not
a tautology. So uniformity of substitution is important. Using Definition 3.9.7 for uniform substitution, it
should be possible to provide a merely metamathematical proof that uniform substitution converts tautologies
into tautologies. This is stated as (unproved) Propositions 3.12.8 and 3.12.10.
3.12.8 Proposition [mm]: Substitution of proposition names into tautologies.
Let P be a concrete proposition domain with name map µ : N → P. Let α₀ ∈ W⁰(N, O₀, O₁, O₂) be an
infix logical expression on P, let p₁, . . . pₙ ∈ N be distinct proposition names, and let p′₁, . . . p′ₙ ∈ N be
proposition names, where n is a non-negative integer. If α₀ is a tautology on P, then the substituted symbol
string Subs(α₀; p₁[p′₁], . . . pₙ[p′ₙ]) ∈ W⁰(N, O₀, O₁, O₂) is a tautology on P.
3.12.9 Remark: Substitution of logical expressions into tautologies yields more tautologies.
It may be observed that when any of the letters in a tautology are uniformly substituted with arbitrary
logical expressions, the logical expression which results from the substitution is itself a tautology. However,
this conjecture requires proof, even if the proof is only informal.
3.12.10 Proposition [mm]: Substitution of logical expressions into tautologies.
Let P be a concrete proposition domain with name map µ : N → P. Let α₀, α₁, . . . αₙ ∈ W⁰(N, O₀, O₁, O₂)
3.13.2 Remark: A test-function analogy for knowledge sets as interpretations of logical operators.
The original difficulties in the interpretation of the Dirac delta distribution concept were resolved by applying
distributions to smooth test functions. Then derivatives of Dirac delta functions could be interpreted by
applying differentiation to the test functions instead of the symbolic functions. By analogy, one may regard
knowledge sets as test functions for logical operators. Then symbolic logical operations may be interpreted
as concrete operations on knowledge sets.
3.13.3 Remark: Logic operations and set operations are duals of each other.
The interpretation of logic operations in terms of set operations is somewhat suspicious because the set
operations are themselves defined in terms of natural-language logic operations. So whether logic is defined
in terms of sets or vice versa is a moot point. However, the framework proposed here has definite consequences
for logic. For example, this framework implies (without any shadow of a doubt) that two logical expressions
of the form (¬p₁) ∨ p₂ and p₁ ⇒ p₂ have exactly the same meaning. Similarly, logical expressions of the
form p₁ and ¬(¬p₁) have exactly the same meaning (without any shadow of a doubt). More generally, any
two logical expressions are considered to have identical meaning if and only if they put exactly the same
constraints on the truth value map for the relevant concrete proposition domain.
Although logic operations and set operations are apparently duals of each other, since set operations may
be defined in terms of logical operations (via a naive axiom of comprehension), and logical operations
may be defined in terms of set operations, nevertheless the set operations are more concrete. Whenever
there is ambiguity, one may easily demonstrate set operations on finite sets in a very concrete way by listing
those elements which are in a resultant set, and those which are not. By contrast, clarifying ambiguities in
logic operations can be exasperating. (A well-known example of this is the difficulty of convincing a naive
audience that a conditional proposition P ⇒ Q is true if P is false.)
It could be argued that symbolic logic expressions are more concrete than set expressions because symbolic
logic manipulates sequences of symbols on lines and such manipulation can be checked automatically by
computers because the rules are so objective that they can be programmed in software. However, the
corresponding set operators are equally automatable in the case of finite sets of propositions, although for
infinite sets, only symbolic logic can be automated. This is because a finite sequence of symbols may be
agreed to refer to an infinite set. This is a somewhat illusory advantage for symbolic logic because sets can
be defined otherwise than by comprehensive lists of their elements. For example, very large sparse matrices
are typically represented in computers by listing only the non-zero elements. If an infinite matrix has only
a finite number of non-zero elements, clearly one may easily represent such a matrix in a finite computer.
Similarly, the number π may be represented symbolically in a computer rather than in terms of its infinite
binary or decimal expansion. If one continues to apply and extend such methods of representing infinite
objects by finite objects in computer software, the result is not very much different to symbolic logic. The
real difference is how the computer implementation is thought about.
In the knowledge-set table in Figure 3.13.1, one symbol signifies that a truth-value-map possibility is
excluded, and the other symbol signifies that a truth-value-map possibility is not excluded. This helps to
make clear the meanings of the entries in the right column of the truth table. Each "F" in the right column
signifies that the truth-value combination to the left is "excluded". Each "T" in the right column signifies
that the truth-value tuple to the left is "not excluded". Thus the knowledge represented by the compound
proposition A ⇒ B is very limited. It excludes only one of the four possibilities. It says nothing at all
about the other three possibilities.
It seems clear from Figure 3.13.1 (in Remark 3.13.4) that there is a one-to-one association between finite
knowledge sets and truth tables. This association is easy to demonstrate by associating subsets K of 2^P
with functions from 2^P to {F, T}.
A knowledge set K for a concrete proposition domain P may be written as K = {τ ∈ 2^P; ψ_K(τ) = T},
where ψ_K is the unique function ψ_K : 2^P → {F, T} such that for all τ ∈ 2^P, ψ_K(τ) = T if and only if τ ∈ K.
Conversely, any function ψ : 2^P → {F, T} determines a unique knowledge set K = {τ ∈ 2^P; ψ(τ) = T}.
Thus there is a one-to-one association between knowledge sets and functions from 2^P to {F, T}. (This is
analogous to the indicator function concept in Section 14.6.)
A function ψ : 2^P → {F, T} for finite P is essentially the same thing as a truth table, since a truth table
lists the values ψ(τ) in the right column for the truth value maps τ ∈ 2^P which are listed in the left columns.
This applies to any finite concrete proposition space P.
Each truth table for a fixed finite concrete proposition space P may be associated in a one-to-one way with
a function θ : {F, T}ⁿ → {F, T} if a fixed bijection β : ℕₙ → P (i.e. β : {1, 2, . . . n} → P) is given for some
non-negative integer n. Such a bijection β induces a unique bijection R_β : {F, T}^P → {F, T}ⁿ which satisfies
R_β : f ↦ f ∘ β for f ∈ 2^P = {F, T}^P. This fixed bijection R_β may be composed with each function
θ : {F, T}ⁿ → {F, T} by the rule θ ↦ ψ = θ ∘ R_β. Then ψ : {F, T}^P → {F, T} is a well-defined truth
table. Conversely, the maps ψ : {F, T}^P → {F, T} are associated in a one-to-one manner with the composite
maps θ = ψ ∘ R_β⁻¹ : {F, T}ⁿ → {F, T}. The association is one-to-one because R_β is a bijection.
In summary, for a fixed bijection β : {1, 2, . . . n} → P, there is a unique one-to-one association between the
truth functions θ : {F, T}ⁿ → {F, T} in Definition 3.6.2 and the knowledge sets K ⊆ 2^P and truth table
maps ψ : 2^P → {F, T}. Therefore it is a matter of convenience which form of presentation is used to define
knowledge on a finite concrete proposition domain P.
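The three presentations can be exhibited concretely for a two-proposition domain. In the following minimal Python sketch (names such as `psi` and `theta` echo the discussion above but the representation choices are illustrative), the knowledge set, its indicator map, and the ordered truth function determine one another:

```python
from itertools import product

names = ("p", "q")      # the fixed bijection beta, as an ordering of P
space = [dict(zip(names, vals))
         for vals in product([False, True], repeat=2)]   # the space 2^P

# A knowledge set: all truth value maps compatible with "p or q".
K = [tau for tau in space if tau["p"] or tau["q"]]

def psi(tau):
    """The truth table map 2^P -> {F, T} associated with K."""
    return tau in K

def theta(*vals):
    """The ordered truth function {F, T}^n -> {F, T}: psi composed
    with the inverse of the reordering bijection R_beta."""
    return psi(dict(zip(names, vals)))

# Recovering K from psi demonstrates the one-to-one association.
recovered = [tau for tau in space if psi(tau)]
```

Each of the three objects carries exactly the same information; only the bookkeeping differs.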
3.13.6 Remark: Knowledge has more the nature of exclusion than inclusion.
It seems clear from Remarks 3.6.4 and 3.13.4 that knowledge is more accurately thought of in terms of
exclusion of possibilities rather than inclusion. The knowledge in multiple knowledge sets is combined by
intersecting them (as in Remark 3.4.10) because the set of possibilities which are excluded by the intersection
of a collection of sets equals the union of the possibilities excluded by each individual set in the collection.
In other words, knowledge accumulates by excluding more and more possibilities.
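This accumulation rule is a one-line set identity, illustrated by the following minimal Python sketch (the propositions and knowledge sets chosen are illustrative):

```python
from itertools import product

# 2^P for P = {p1, p2}, with truth value maps as pairs of booleans.
space = set(product([False, True], repeat=2))

K1 = {t for t in space if (not t[0]) or t[1]}   # knowledge: p1 ⇒ p2
K2 = {t for t in space if t[0]}                 # knowledge: p1

combined = K1 & K2                              # conjunction of knowledge
excluded = space - combined

# The possibilities excluded by the intersection are the union of the
# possibilities excluded by each individual knowledge set.
assert excluded == (space - K1) | (space - K2)
```

Here the combined knowledge leaves only the single possibility in which both p₁ and p₂ are true; each new piece of knowledge enlarged the set of exclusions.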
Similarly, statistical inference only excludes hypotheses. Statistical inference cannot prove hypotheses. It can
only exclude hypotheses at some confidence level (assuming a framework of a-priori modelling assumptions).
Thus statistical confidence level refers to the confidence of exclusion, not the probability of inclusion.
One might perhaps draw a further parallel with Karl Popper's philosophy of science, which famously asserts
that science progresses by refuting conjectures, not by proving them. (See Popper [409].) Thus scientific
knowledge progresses by accumulating exclusions, not by accumulating inclusions.
option has the advantage of full generality, but gives little insight into the meaning of one's knowledge.
In practical applications, one's knowledge is typically a conjunction of multiple pieces of knowledge of a
relatively simple kind. This conjunction-of-a-list approach to logic has the great advantage that if some item
on a knowledge list turns out to be false, it may be removed and the combined knowledge may be recalculated.
(This is analogous to the way in which lists of constraints are managed in linear programming.)
To describe knowledge which is more complex than simple binary operations, one could define notations and
n 2
methods for managing the 2(2 ) general n-ary logical operations for n 3 in the same way that the 2(2 ) = 16
general binary operations are presented in Section 3.7. This approach is unsuitable largely because human
beings do not think or talk using n-ary logical operations for n ≥ 3. In particular, the English language
describes general logical relations by combining unary and binary logical operations recursively.
In practice, general n-ary logical operations are expressed as combinations of binary operations. For example,
the assertion that "if the Sun is up and the Moon is up, then a solar eclipse is not possible" has the form
(p1 ∧ p2) ⇒ ¬p3, where p1 = "the Sun is up", p2 = "the Moon is up" and p3 = "a solar eclipse is possible".
This assertion could in principle be described in terms of a ternary logical operation with its own eight-line
truth table. (In this special case, there is in fact an alternative to compound binary operations because the
assertion is equivalent to ¬(p1 ∧ p2 ∧ p3), which is equivalent to ¬p1 ∨ ¬p2 ∨ ¬p3, which may be expressed
as "at least one of these propositions is false", but this is not strictly a ternary logical operation.)
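The agreement of these formulations may be verified by enumerating the eight-line truth table. The following Python sketch is illustrative only.

```python
from itertools import product

rows = []
for p1, p2, p3 in product([False, True], repeat=3):
    a = (not (p1 and p2)) or (not p3)       # (p1 and p2) implies not-p3
    b = not (p1 and p2 and p3)              # not (p1 and p2 and p3)
    c = (not p1) or (not p2) or (not p3)    # not-p1 or not-p2 or not-p3
    rows.append((a, b, c))

# All three formulations agree on every line of the eight-line truth
# table, and only the all-true assignment is excluded.
assert all(a == b == c for a, b, c in rows)
assert [a for a, _, _ in rows].count(False) == 1
```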
The most usual way of describing finite knowledge sets is by compound propositions, which are binary
operations whose terms are themselves either concrete propositions or compound propositions, and so
forth recursively. This approach is equivalent to extending a concrete proposition domain P by adding
binary logical operations to the domain. Then binary operations are applied to this extended domain to
obtain a further extension of the domain, and so forth indefinitely. Since each such compound proposition
may be identified with a knowledge set K = {τ ∈ 2^P; φ(τ) = T}, the end result of such recursion is a
second-order proposition domain which may be identified with 2^(2^P) if P is a finite set.
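For a finite domain, this recursive extension can be simulated directly. In the following illustrative Python sketch (the tuple representation of propositions as truth functions is an ad-hoc device), closing a two-element domain under negation and conjunction produces all 2^(2^2) = 16 truth functions, one for each knowledge set.

```python
from itertools import product

# The four truth assignments for a two-element domain P = {p1, p2}.
assigns = list(product([False, True], repeat=2))
p1 = tuple(a[0] for a in assigns)   # p1 viewed as a truth function
p2 = tuple(a[1] for a in assigns)   # p2 viewed as a truth function

# Close {p1, p2} under negation and conjunction (a functionally complete pair).
funcs = {p1, p2}
while True:
    new = {tuple(not x for x in f) for f in funcs}
    new |= {tuple(x and y for x, y in zip(f, g)) for f in funcs for g in funcs}
    if new <= funcs:
        break
    funcs |= new

assert len(funcs) == 2 ** (2 ** 2)   # all 16 binary truth functions arise
```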
3.14.3 Example: Simple binary logical operations are inadequate to describe some small knowledge sets.
A knowledge set K ⊆ 2^P for a concrete proposition domain P cannot always be expressed as a single
binary logic operation, nor even as the conjunction of a sequence of binary logic operations. For example,
if P = {p1, p2, p3, p4}, then the knowledge that precisely two of the four propositions are true (and the
other two are false) corresponds to the knowledge set K = {τ ∈ 2^P; #{p ∈ P; τ(p) = T} = 2}. (This set
contains C^4_2 = 6 elements. See Notation 14.8.3 for combination symbols.) This set K cannot be described
as the conjunction or disjunction of any list of binary logic operations because for all pairs X = {pi, pj} of
propositions chosen from P = {p1, p2, p3, p4}, for all binary truth functions of pi and pj, there is always at
least one combination of truth values for the remaining pair in P \ X which is possible, and at least one which is excluded.
The condition #{p ∈ P; τ(p) = T} = 2 for P = {p1, p2, p3, p4} corresponds to the logical expression
((p1 ∧ p2) ∧ ¬(p3 ∨ p4)) ∨ ((p1 ⊻ p2) ∧ (p3 ⊻ p4)) ∨ (¬(p1 ∨ p2) ∧ (p3 ∧ p4)).
This compound proposition does not give a natural description of the knowledge set. Recursively compounded
binary logical operations seem unsuitable for such sets. If this knowledge set example is written in disjunctive
normal form, it requires 6 terms, but this representation also does not communicate the intended meaning of
the knowledge set. Propositional logic is traditionally presented in terms of binary operations and compounds
of binary operations possibly because such forms of logical expressions are prevalent in natural-language
logic usage. However, the natural-language-inspired compounded binary logic expressions are inefficient and
clumsy for many situations.
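The claim that the three-case compound expression captures "exactly two of four" may be checked by brute force. In the following illustrative Python sketch, the function name exactly_two is ad hoc, and "!=" plays the role of the exclusive-or operator.

```python
from itertools import product

def exactly_two(p1, p2, p3, p4):
    # The three disjuncts split by how many of the first pair are true:
    # both of the first pair, one of each pair, or both of the second pair.
    return (((p1 and p2) and not (p3 or p4))
            or ((p1 != p2) and (p3 != p4))
            or ((not (p1 or p2)) and (p3 and p4)))

K = [t for t in product([False, True], repeat=4) if exactly_two(*t)]
assert len(K) == 6                   # C(4,2) = 6 possibilities remain
assert all(sum(t) == 2 for t in K)   # each has exactly two true propositions
```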
3.14.4 Example: Recursive binary logical operations are unsuitable for most large knowledge sets.
As a more extreme example, suppose one knows that 70 out of a set of 100 propositions are true (and the rest
are false). Then disjunctive normal form requires C^100_70 = 29,372,339,821,610,944,823,963,760 terms, which
is clearly too many for practical purposes, whereas one may specify such a knowledge set fairly compactly in
terms of the existence of a bijection from the first 70 integers to the set of propositions. (Such an example
is not entirely artificial. For example, a test may have 100 questions, and the assessment may state that 70
of the answers are correct. Then one knows that 70 of the propositions of the candidate are true and 30 are
false. Similarly, one might know that the score is in the range from 70 to 80, which makes the knowledge
set structure even more difficult to describe with compound binary logic operations.)
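The term count quoted above may be confirmed directly. (Illustrative Python only; the book does not use software.)

```python
import math

# The number of disjunctive normal form terms needed to express
# "exactly 70 of 100 propositions are true".
terms = math.comb(100, 70)
assert terms == 29_372_339_821_610_944_823_963_760
```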
As a slightly more natural approach to this 70 out of 100 example, one could attempt a divide-and-conquer
strategy to generate a compound expression. If τ(p1) = T, then 69 of the remaining 99 propositions must
be true. If τ(p1) = F, then 70 of the remaining propositions must be true. So the logical operation may
be written as (p1 ∧ P(99, 69)) ∨ (¬p1 ∧ P(99, 70)), where P(m, k) means that k of the last m propositions
are true. Then P (100, 70) may be expanded recursively into a binary tree. Unfortunately, this tree has
approximately C^100_70 terms. So it is not more compact than disjunctive normal form or an exhaustive listing
of a truth table, but it does have the small advantage that it gives some insight. However, it still requires
many trillions of terabytes of disk space to store the result.
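The leaf count of this expansion tree can be computed without building the tree, since the leaf counts obey the same recurrence as the binomial coefficients. A small memoised Python sketch (the function name leaves is ad hoc) confirms the claim.

```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def leaves(m, k):
    # Leaf count of the binary expansion tree for P(m, k), following
    # the recursion P(m, k) = (p and P(m-1, k-1)) or (not-p and P(m-1, k)).
    if k == 0 or k == m:
        return 1
    return leaves(m - 1, k - 1) + leaves(m - 1, k)

# The tree is no smaller than the disjunctive normal form.
assert leaves(100, 70) == math.comb(100, 70)
```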
Another strategy to deal with the 70 out of 100 example is to recursively split the set of propositions in the
middle. Thus Q(1, 100, 70) may be written as ⋁_{k=20}^{50} (Q(1, 50, k) ∧ Q(51, 100, 70 − k)), where Q(i, j, k) means
the assertion that exactly k of the propositions from pi to pj inclusive are true. This strategy also fails to
deliver a more compact logical expression.
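Why the middle split fails to save anything can be seen from Vandermonde's identity: the disjunctive normal form term counts of the individual disjuncts sum back to the original count. (Illustrative Python.)

```python
import math

# Each disjunct Q(1,50,k) and Q(51,100,70-k) expands to
# comb(50,k) * comb(50,70-k) disjunctive normal form terms;
# k ranges over 20..50 so that both halves are feasible.
total = sum(math.comb(50, k) * math.comb(50, 70 - k) for k in range(20, 51))
assert total == math.comb(100, 70)   # Vandermonde's identity: no saving overall
```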
The arithmetic expression τ(A) + τ(B) + τ(C) ≥ 2 is equivalent to (A ∧ B) ∨ (A ∧ C) ∨ (B ∧ C), and
τ(A) + τ(B) + τ(C) = 2 is equivalent to ((A ∧ B) ∨ (A ∧ C) ∨ (B ∧ C)) ∧ ¬(A ∧ B ∧ C), which is
equivalent to (A ∧ B ∧ ¬C) ∨ (A ∧ ¬B ∧ C) ∨ (¬A ∧ B ∧ C).
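These equivalences may be checked over the eight truth assignments. In the following illustrative Python sketch, truth values are coerced to 0/1 for the arithmetic.

```python
from itertools import product

checks = []
for A, B, C in product([False, True], repeat=3):
    s = int(A) + int(B) + int(C)                      # tau(A) + tau(B) + tau(C)
    maj = (A and B) or (A and C) or (B and C)         # "at least two are true"
    two = ((A and B and not C) or (A and not B and C)
           or (not A and B and C))                    # "exactly two are true"
    checks.append((s >= 2) == maj
                  and (s == 2) == (maj and not (A and B and C))
                  and (s == 2) == two)

assert all(checks)   # all equivalences hold on every truth assignment
```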
Negatives, conjunctions and disjunctions may also be expressed in terms of minimum and maximum operators
as in Table 3.14.2.
Table 3.14.2 Equivalent min/max expressions for some logical expressions
An advantage of these min/max operators is that they are also valid for infinite sets of propositions. This is
important in the predicate calculus, where propositions are organised into parametrised families which are
typically infinite.
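In concrete terms (an illustrative Python sketch only, with truth values coded as 0/1), the min/max encoding and its extension to families looks as follows.

```python
from itertools import product

# Negation is 1 - p, conjunction is min(p, q), disjunction is max(p, q).
for p, q in product([0, 1], repeat=2):
    assert 1 - p == int(not p)
    assert min(p, q) == int(p and q)
    assert max(p, q) == int(p or q)

# min and max extend to arbitrary (even infinite) families of truth
# values, which is what the quantifiers of the predicate calculus require.
family = [1, 1, 0, 1]      # a parametrised family of truth values
assert min(family) == 0    # "for all": true iff the minimum is 1
assert max(family) == 1    # "there exists": true iff the maximum is 1
```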
Also in pure predicate logic and predicate calculus, there is no way to determine the underlying class of
propositions and predicate-parameters from the language, although it is usual to assume that the class of
predicate-parameters is non-empty to avoid needing to deal with this trivial case. Therefore in pure predicate
logic, model theory is of little interest because of the extreme generality of possible models.
When non-logical axioms are introduced, the model theory question becomes genuinely interesting. In set
theories in particular, the interest focuses on the class of sets in the system. This class is generally required
to have at least one element, such as the empty set, and is also expected to contain all sets which can
be constructed recursively from this according to some rules. Therefore the set of objects in the system
must include at least those sets which can be constructed according to the rules, but may contain additional
objects. Model theory is concerned with determining the properties of those objects.
It would be possible to artificially construct a kind of model theory for propositional logic by introducing some
"constant propositions", analogous to the constant relation "∈" and the constant set "∅" in set theory. But then to
make this interesting, one would need some kind of recursive rule-set for constructing new propositions from
old propositions, starting with the given constant propositions. It is true that one may recursively construct
a hierarchy of compound propositions using binary logical operators. But these are quite redundant because
they would simply shadow the compound propositions of the language itself.
Set theory generates interesting classes of propositions by first generating interesting classes of sets, and these
sets may be combined into propositions using the set membership relation, thereby defining an interesting
class of concrete propositions. Thus in order to make model theory interesting, propositions must have
parameters which are chosen from an interesting class of objects, and the non-logical axioms must then give
rules which can be used to recursively construct the interesting class of objects. This is in essence what a
first-order language is, and this explains why model theory is interesting for first-order languages, but is
not of any serious interest for propositional logic.
3.15.2 Remark: Broad classification of logical systems.
Some styles of logical systems where the model is given a-priori may be classified as follows.
(1) Pure propositional logic. A concrete proposition domain P is given. Each proposition in the language
signifies a subset of the knowledge space 2^P. The theorems of the language concern relations between
such propositions. The language is typically built recursively from unary and binary operations. The
logical axioms for such a system are tautologies which are valid for any subset of 2^P.
(2) Propositional logic with non-logical axioms. A concrete proposition domain P is given, and a
set of constraints on the knowledge space 2^P reduces it a-priori to a subset K0 of 2^P. The constraints
may be specified in the language as non-logical axioms. This is equivalent to asserting a-priori a single
proposition which signifies this reduced knowledge space K0 . This is a kind of first-order language for
propositional logic. The logical axioms for 2^P in (1) are also valid for K0.
(3) Pure predicate logic. A space V of predicate parameters (called variables) is given. A space Q of
predicates is given. These are proposition-valued functions with zero or more parameters in V. (These
may be referred to as constant predicates to distinguish them from the predicates which are formed
in the language from logical operators.) A concrete proposition domain P is defined to be the space
of all values of all predicates. (For example, V could be a class of sets and Q could consist of the two
binary predicates "=" and "∈". Then P would contain all propositions of the form "x = y" or "x ∈ y"
for x and y in the class V.) Then every proposition in the language signifies a subset of the knowledge
space 2^P, and the theorems of the language concern relations between such propositions, exactly as
in (1). Because of the opportunity opened up by the parametrisation of propositions, the unary and
binary operations in (1) are enhanced by the introduction of universal and existential quantifiers. The
logical axioms for a pure predicate logic are tautologies which are valid for any subset of 2^P.
(4) Predicate logic with non-logical axioms. This is the same as the pure predicate logic in (3)
except that the knowledge space 2^P is reduced a-priori to a subset K0 by a set of constraints which are
typically written in the language as non-logical axioms. These are combined with the pure predicate
logic logical axioms in (3). Non-logical axioms are written in terms of constant predicates, and possibly
also constant objects and constant object-maps. (Otherwise the non-logical axioms would have nothing
to refer to.)
(5) Pure predicate logic with object-maps. This is the same as the pure predicate logic in (3)
except that a space of object-maps is added to the system. These object-maps are maps from V^n to
V for some non-negative integer n. These potentially make the concrete proposition domain P larger
because concrete propositions may be constructed as a combination of predicates and object-maps. For
example, in axiomatic group theory (not based on ZF set theory), one might write μ(g1, μ(g2, g3)) =
μ(μ(g1, g2), g3), which would be a concrete proposition formed from the variables g1, g2, g3, the object-
map μ, and the predicate "=".
(6) Pure predicate logic with object-maps and non-logical axioms. This is the same as (5),
but with the addition of non-logical axioms as in (4). This corresponds to a fairly typical description of
a first-order language in mathematical logic and model theory textbooks.
Logical system styles (5) and (6) are not required for ZF set theory, and are therefore not formally presented
in this book. They are more applicable to the axiomatisation of algebra without set theory. Style (1) is the
subject of Chapters 3 and 4. Style (3) is the subject of Chapters 5 and 6. Style (4) is the logical framework
for ZF set theory.
Model theory is concerned with inverting the order of specification in these logical systems. Instead of
defining the objects and propositions in advance, model theory takes a language or theory, including
axioms, as the input, and then studies properties of the possible object spaces V (and the constrained
knowledge spaces K0 ⊆ 2^P) as output. In the case of a set theory, model theory is concerned with the
properties of models which fit each set of axioms. Each model has a universe of sets and a set-membership
relation for that universe. Some of the questions of interest are whether at least one model does exist,
whether a model for one theory is also a model for another theory, and so forth. Unfortunately, model
theory generally relies upon an underlying set theory as a framework in which to define models, which
makes the whole endeavour seem very circular. This is a credibility issue for model theory.
Chapter 4
Propositional calculus
supposed facts which do not fit the pattern, and to formulate new hypotheses. But in mathematics, the
replacement of intuitive truth with inferred truth has arguably gone too far in some areas.
Propositional calculus is an unnecessary expenditure of time and effort from the point of view of applications
to propositional logic, which can be managed much more efficiently with truth tables. Propositional calculus
must be endured for the sake of its extension to applications with infinite sets of propositions.
For example, infinite sets cannot be demonstrated concretely, but they may be managed as abstract properties
in an abstract language which refers to a non-demonstrable, non-concrete universe. In other words, the
axiomatic, argumentative, deductive approach extends our knowledge from concepts which we can know
by other means to concepts which we cannot know by other means.
4.1.2 Remark: Veracity and credibility should arise from the mechanisation of logic.
The underlying assumption of all formal logical systems is the belief (or hope) that the veracity, or at least
the credibility, of a very wide range of assertions may be assured by first focusing intensely on the selection
and close examination of a small set of axioms and inference rules, so that afterwards, by merely monitoring
and enforcing conformance to the axioms and rules, all outputs from the process will inherit veracity or
credibility from those axioms and rules.
This is somewhat similar to the care which is taken in the design of machines which manufacture products
and other machines. The manufacturing machines must be calibrated and verified carefully, and one assumes
(or hopes) that the ensuing mass production of products and more machines will be reliable. However, real-
world manufacturing is almost always monitored by quality control processes. If the analogy is valid, then
the products of mathematical theories also need to be monitored and subjected to quality control.
4.1.4 Remark: All propositional calculus formalisations must be equivalent.
All formalisations of propositional calculus must be equivalent to each other because they must all correctly
describe the same model, which in this case consists of a concrete proposition domain, the space of
knowledge sets on this domain, and the collection of basic logical operations on these knowledge sets.
variables (Nagel/Newman [323], page 46), sentence symbols (Chang/Keisler [298], page 5), atoms
(Curry [301], page 55), or propositional atoms (Huth/Ryan [313], page 32).
(3) Basic logical operators: The most often used basic logic operators are denoted here by the sym-
bols ¬, ∧, ∨, ⇒ and ⇔. Basic logical operators are also called logical operators (Margaris [319],
page 30), sentence-forming operators on sentences (Lemmon [317], page 6), operations (Curry [301],
page 55; Eves [303], page 250), connectives (Chang/Keisler [298], page 5; Rosenbloom [336], page 32),
sentential connectives (Nagel/Newman [323], page 46; Robbin [334], page 1), statement connectives
(Margaris [319], page 26), propositional connectives (Kleene [316], page 5; Kleene [315], page 73;
Cohen [300], page 4), logical connectives (Bell/Slomson [290], page 32; Lemmon [317], page 42;
Smullyan [341], page 4; Roitman [335], page 26), primitive connectives (E. Mendelson [320], page 30),
fundamental logical connectives (Hilbert/Ackermann [308], page 3), logical symbols (Bernays [292],
page 45), or fundamental functions of propositions (Whitehead/Russell, Volume I [350], page 6).
(Operator symbols are identified here with the operators themselves. Strictly speaking a distinction
should be made. Another distinction which should be made is between the formally declared basic
logical operators and the non-basic logical operators which are defined in terms of them. However, the
distinction is not always clear. Nor is the distinction necessarily of real importance. The non-basic
operators may be defined explicitly in terms of the basic operators, or alternatively their meanings may
be defined via axioms.)
(4) Punctuation symbols: In the modern literature, punctuation symbols typically include parentheses
such as "(" and ")", although parentheses are dispensable if the operator syntax style is prefix or
postfix for example. Punctuation symbols are called punctuation symbols by Bell/Slomson [290],
page 33, or just parentheses by Chang/Keisler [298], page 5. In earlier days, many authors used
various styles of dots for punctuation. (See for example Whitehead/Russell, Volume I [350], page 10;
Rosenbloom [336], page 32; Robbin [334], pages 5, 7.) Some authors used square brackets ("[" and "]")
instead of parentheses. These were called brackets by Church [299], page 69.
(5) Logical expression symbols: These are the symbols which are permitted in logical expressions.
These symbols are of three disjoint symbol classes: proposition names, basic operators and punctuation
symbols. Logical expression symbols are also known as basic symbols (Bell/Slomson [290], page 32),
formal symbols (Stoll [343], page 375), primitive symbols (Church [299], page 48; Robbin [334],
page 4; Stoll [343], page 375), or symbols of the propositional calculus (Lemmon [317], page 43).
(6) Logical expressions: These are the syntactically correct symbol-sequences. Logical expressions are
also known as formulas (Bell/Slomson [290], page 33; Kleene [316], page 5; Kleene [315], pages 72, 108;
Margaris [319], page 31; Shoenfield [340], page 4; Smullyan [341], page 5; Nagel/Newman [323], page 45;
Stoll [343], pages 375–376), well-formed formulas (Church [299], page 49; Lemmon [317], page 44; Huth/
Ryan [313], page 32; E. Mendelson [320], page 29; Robbin [334], page 5; Cohen [300], page 6), proposition
letter formulas (Kleene [315], page 108), statement forms (E. Mendelson [320], page 15), sentences
(Chang/Keisler [298], page 5; Rosenbloom [336], page 38), elementary sentences (Curry [301], page 56),
or sentential combinations (Hilbert/Ackermann [308], page 28).
Logical expressions which are not proposition names may be called composite formulas (Kleene [316],
page 5), compound formulas (Smullyan [341], page 8), or molecules (Kleene [316], page 5).
(7) Logical expression names: The permitted labels for well-formed formulas. (The labels for well-formed
formulas could be referred to as wff names or wff letters.) Logical expression names are also called
sentential symbols (Chang/Keisler [298], page 5), sentence symbols (Chang/Keisler [298], page 7),
syntactical variables (Shoenfield [340], page 7), metalogical variables (Lemmon [317], page 49), or
metamathematical letters (Kleene [315], pages 81, 108).
(8) Syntax rules: The rules which decide the syntactic correctness of logical expressions. Syntax rules are
also known as formation rules (Church [299], page 50; Lemmon [317], page 42; Nagel/Newman [323],
page 45; Robbin [334], page 4; Kleene [315], page 72; Kleene [316], page 205; Smullyan [341], page 6).
(9) Axioms: The collection of axioms or axiom schemes (or schemas). Axiom schemes (or schemas, or
schemata) are templates into which arbitrary well-formed formulas may be substituted. (Axiom schemes
are attributed by Kleene [315], page 140, to a 1927 paper by von Neumann [384].) Axioms can also be
defined in terms of logical expression names for which any logical expressions may be substituted. Both
of these two variants are equivalent to an axiom-set plus substitution rule.
Almost everybody calls axioms "axioms", but axioms are also called primitive formulas (Nagel/
Newman [323], page 46), primitive logical formulas (Hilbert/Ackermann [308], page 27), primitive
tautologies (Eves [303], page 251), primitive propositions (Whitehead/Russell, Volume I [350], pages
13, 98), prime statements (Curry [301], page 192), or initial formulas (Bernays [292], page 47).
(10) Inference rules: The set of deduction rules. Inference rules are also known as rules of inference
(Kleene [316], page 34; Kleene [315], page 83; Church [299], page 49; Margaris [319], page 2; Robbin [334],
page 14; Shoenfield [340], page 4; Stoll [343], page 376; Smullyan [341], page 80; Suppes/Hill [346],
page 43; Tarski [348], page 132; Bernays [292], page 47; Bell/Slomson [290], page 36; E. Mendelson [320],
page 29), rules of proof (Tarski [348], page 132), rules of procedure (Church [299], page 49), rules of
derivation (Lemmon [317], page 8; Suppes [344], page 25; Bernays [292], page 47), rules (Curry [301],
page 56; Eves [303], page 251), primitive rules (Hilbert/Ackermann [308], page 27), natural deduction
rules (Huth/Ryan [313], page 6), deductive rules (Kleene [315], page 80; Kleene [316], page 205),
derivational rules (Curry [301], page 322), transformation rules (Kleene [315], page 80; Kleene [316],
page 205; Nagel/Newman [323], page 46), or principles of inference (Russell [338], page 16).
The reader will (hopefully) gain the impression from this brief survey that the terms in use for the components
of a propositional calculus formalisation are far from being standardised! In fact, it is not only the descriptive
language terms that are unstandardised. There seems to be no majority in favour of any one way of doing
almost anything in either the propositional or the predicate calculus, and yet the resulting logics of all authors
are essentially equivalent. In order to read the literature, one must, therefore, be prepared to see familiar
concepts presented in unfamiliar ways.
4.1.7 Remark: The naming of concrete propositions and logical expressions.
The relations between concrete propositions, proposition names, logical expressions and logical expression
names in Remark 4.1.6 are illustrated in Figure 4.1.1.
The concrete propositions in layer 1 in Figure 4.1.1 are in some externally defined concrete proposition
domain. (See Section 3.2 for concrete proposition domains.) Concrete propositions may be natural-language
statements, symbolic mathematics statements, transistor voltages, or the states of any other kind of two-state
elements of a system.
[Figure 4.1.1. The naming layers: layer 5, logical expression formulas; layer 4, logical expression names (layers 4 and 5 form the metamathematics); layer 3, logical expressions; layer 2, proposition names (layers 2 and 3 form the propositional calculus); layer 1, concrete propositions (linked to layer 2 by semantics); layer 0, the real world.]
Concrete propositions are abstractions from a physical or social system. Therefore the fact that the real
world does not have sharply two-state components is a modelling or verisimilitude question which does
not affect the validity of argumentation within the logical system. (The real world may be considered to
be in layer 0 in this diagram.)
The proposition names in layer 2 are associated with concrete propositions. Two proposition names may
refer to the same concrete proposition. The association may vary over time and according to context.
The proposition names in layer 2 belong to the discussion context, whereas the concrete propositions in
layer 1 belong to the discussed context. Layer 3 also belongs to the discussion context. (Layers 4 and 5
belong to a meta-discussion context, which discusses the layer 2/3 context.)
In layer 3, the proposition names in layer 2 are combined into logical expressions. Layers 2 and 3 are the
layers where the mechanisation of propositional calculus is executed.
In layer 4, the metamathematical discussion of logical expressions, for example in axioms, inference rules,
theorems and proofs, is facilitated by associating logical expression names with logical expressions in layer 3.
This association is typically context-dependent and variable over time in much the same way as for proposition
name associations.
In layer 5, the logical expression names in layer 4 may be combined in logical expression formulas. These
are formulas which are the same as the logical expression formulas in layer 3, except that the symbols
in the formulas are logical expression names instead of concrete proposition names. The apparent logical
expressions in axioms, inference rules, and most theorems and proofs, are in fact templates in which each
symbol may be replaced with any valid logical expression.
1958 Eves [303], page 256 1 4 rules
1958 Nagel/Newman [323], pages 45–49 , , , 4 Subs, MP
1963 Curry [301], pages 55–56 , 3 MP
1963 Stoll [343], pages 375–377 , 3 MP
1964 E. Mendelson [320], pages 30–31 , 3 MP
1964 E. Mendelson [320], page 40 , 4 MP
1964 E. Mendelson [320], page 40 , 3 MP
1964 E. Mendelson [320], page 40 , 3 Subs, MP
1964 E. Mendelson [320], page 40 , , , 10 MP
1964 E. Mendelson [320], page 42 , 1 MP
1964 E. Mendelson [320], page 42 1 MP
1964 Suppes/Hill [346], pages 12–109 , , , 0 16 rules
1965 Lemmon [317], pages 5–40 , , , 0 10 rules
1967 Kleene [316], pages 33–34 , , , , 13 MP
1967 Margaris [319], pages 47–51 , 3 MP
1969 Bell/Slomson [290], pages 32–37 , 9 MP
1969 Robbin [334], pages 3–14 , 3 MP
1973 Chang/Keisler [298], pages 4–9 , Taut MP
1993 Ben-Ari [291], page 55 , 3 MP
2004 Huth/Ryan [313], pages 3–27 , , , 0 9 rules
Kennington , 3 Subs, MP
The basic operator notations in Table 4.2.1 have been converted to the notations used in this book for easy
comparison. Usually the remaining (non-basic) logical operators are defined in terms of the basic operators.
Sometimes the axioms are defined in terms of both basic and non-basic operators. So it is sometimes slightly
subjective to designate a subset of operators to be basic.
The axioms for some systems are really axiom templates. The number of axioms is in each case somewhat
subjective since axioms may be grouped or ungrouped to decrease or increase the axiom count. The axiom
set "Taut" means "all tautologies". (Such a formalisation is not strictly of the same kind as is being discussed
here. Axioms are usually chosen as a small finite subset of tautologies from which all other tautologies can
be deduced via the rules.)
The inference rules MP and Subs are abbreviations for Modus Ponens and Substitution respectively.
For systems with large numbers of rules, or rules which have unclear names, only the total number of rules is
given. Usually the number of rules is very subjective because similar rules are often grouped under a single
name or split into several cases or even sub-cases.
Propositional logic formalisations differ also in other ways. For example, some systems maintain dependency
lists as part of the argumentation to keep track of the premises which are used to arrive at each assertion.
Despite all of these caveats, Table 4.2.1 gives some indication of the wide variety of styles of formalisations
for propositional calculus. Perhaps the primary distinction is between those systems which use a minimal
rule-set (only MP and possibly the substitution rule) and those which use a larger rule-set. The large rule-
set systems typically use fewer or no axioms. The minimum number of basic operators is one operator,
but this makes the formalisation tedious and cumbersome. Single-operator systems have more academic
than practical interest. Systems with four or more operators are typically chosen for first introductions to
logic, whereas the more spartan two-operator logics are preferable for in-depth analysis, especially for model
theory. For the applications in this book, the best style seems to be a two-operator system with minimal
inference rules and 3 or 4 axioms or axiom templates. Such a system is a good compromise between intuitive
comprehensibility and analytical precision.
[. . . ] there are certain advantages to keeping our list of symbols short. [. . . ] proofs by induction
based on it are shorter this way. At the other extreme, we could have managed with only a single
connective, whose English translation is neither . . . nor . . . . We did not do this because neither
. . . nor . . . is a rather unnatural connective.
The single connective which they refer to is the joint denial operator in Definition 3.6.10. A contrary
view is taken by Russell [339], page 146.
When our minds are fixed upon inference, it seems natural to take implication as the primitive
fundamental relation, since this is the relation which must hold between p and q if we are to be able
to infer the truth of q from the truth of p. But for technical reasons this is not the best primitive
idea to choose.
He chose the single alternative denial operator instead, but is then obliged to immediately define four
other operators (including the implication operator) in terms of this operator in order to state 5 axioms in
terms of implication.
predicate calculus, and to set theory, and to general mathematics beyond that. It could be argued that the
choice does not matter at all. No matter which basic operators are chosen, all of the other operators are
soon defined in terms of them, and it no longer matters which set one started with. Similarly, a small set of
axioms is soon extended to the set of all tautologies. So once again, the starting set makes little difference.
Likewise the inference rules are generally supplemented by derived rules, so that the final set of inference
rules is the same, no matter which set one commenced with.
Since the end-point of propositional calculus theorem-proving is always the same, particularly in view of
Remark 4.1.4, one might ask whether there is any criterion for choosing formalisations at all. (There are
logic systems which are not equivalent to the standard propositional calculus and predicate calculus, but
these are outside the scope of this book, being principally of interest only for philosophy and pure logic
research.) In this book, the principal reason for presenting mathematical logic is to facilitate trace-back
as far as possible from any definition or theorem to the most basic levels. Therefore the formalisation is
chosen to make the trace-back as meaningful as possible. Each step of the trace-back should hopefully add
meaning to any definition or theorem.
If the number of fundamental inference rules is too large, one naturally asks where these rules come from,
and whether they are really all valid. Therefore a small number of intuitively clear rules is preferable. The
modus ponens and substitution rules are suitable for this purpose.
Similarly, a small set of meaningful axioms is preferable to a comprehensive set of axioms. The smaller the
set of axioms which one must be convinced of, the better. If the main inference rule is modus ponens, then
the axioms should be convenient for modus ponens. This suggests that the axioms should have the form
α ⇒ β for some logical expressions α and β. This strongly suggests that the basic logical operators should
include the implication operator ⇒.
The propositional calculus described by Bell/Slomson [290], pages 32–37, uses ¬ and ∨ as the basic operators,
but their axioms are all written in terms of ⇒, which they define in terms of their two basic operators. Similarly,
Whitehead/Russell, Volume I [350], pages 4–13, Hilbert/Ackermann [308], pages 27–28, Wilder [353],
pages 222–225, and Eves [303], pages 250–252, all use ¬ and ∨ as basic operators, but they also write all of
their axioms in terms of the implication operator after defining it in terms of the basic operators. This hints
at the difficulty of designing an intuitively meaningful axiom-set without the implication operator.
If it is agreed that implication should be one of the basic logical operators, then the addition of the
negation operator is both convenient and sufficient to span all other operators. It then remains to choose
the axioms for these two operators.
The three axioms in Definition 4.4.3 are well known to be adequate and minimal (in the sense that none of the
three are superfluous). They are not intuitively the most obvious axioms. Nor is it intuitively obvious that
they span all tautologies. These three axioms also make the boot-strapping of logical theorems somewhat
austere and sometimes frustrating. However, they are chosen here because they are fairly popular, they give
a strong level of assurance that theorems are correctly proved, and they provide good exercise in a wide
range of austere techniques employed in formal (or semi-formal) proof. Then the extension to predicate logic
follows much the same pattern. Zermelo-Fraenkel set theory can then be formulated axiomatically within
predicate logic using the same techniques and applying the already-defined concepts and already-proved
theorems of propositional and predicate logic.
activity. The set of accumulated theorems is augmented by new theorems as they are discovered, and the
logical deduction may exploit these accumulated theorems. The theorem accumulation set is generally not
formalised explicitly. Theorems may be exchanged between the theorem accumulation sets of people who are
using the same logical deduction system and trust each other not to make mistakes. Thus academic research
journals become part of the logical deduction process.
Typically a number of short-cuts are formulated as metatheorems or derived inference rules to automate
tediously repetitive sequences of steps in proofs. These metatheorems and derived rules are maintained and
exchanged in metatheorem or derived rule accumulation sets.
It is rare that logical deduction systems are fully specified. Until one tries to implement a system in computer
software, one does not become aware of many of the implicit state variables which must be maintained. (One
notable computer implementation of mathematical logic is the Isabelle/HOL proof assistant for higher-order
logic, Nipkow/Paulson/Wenzel [324].)
no truth value maps. Thus if τ ∈ K_φ, then τ ∈ K_ψ. In other words, K_φ ⊆ K_ψ.
After a hypothesis φ has been introduced in a formal argument, all propositions ψ following this introduction
are only required to signify knowledge sets K_ψ which are supersets of K_φ. Then instead of writing ` ψ,
one writes φ ` ψ. This can then be claimed as a theorem which asserts that ψ may be inferred from φ.
By uniformly applying substitution rules to φ and ψ, one may then apply such a theorem in other proofs.
Whenever a proposition of the form φ appears in an argument, it is permissible to introduce ψ as an inference.
The moral justification for this is that one could easily introduce the lines of the proof of φ ` ψ into the
importing argument with the desired uniform substitutions. To save time, one prepares in advance numerous
stretches of argument which follow a common pattern. Such patterns may be exploited by invocation, without
literally including a substituted copy into the importing argument.
A second departure from the notion that every line is a tautology is the adoption of non-logical axioms.
These are axioms which are not tautologies. Such axioms are adopted for the logical modelling of concrete
proposition domains P where some relations between the concrete propositions in P are given a-priori. As
a simple example, P could include propositions of the form φ(x) = "x is a dog" and ψ(x) = "x is an animal"
for x in some universe U. Then one might define K1 = {τ ∈ 2^P; ∀x ∈ U, τ(φ(x)) ⇒ τ(ψ(x))}. Usually
this would make K1 a proper subset of 2^P. This is equivalent to adopting all propositions of the form
φ(x) ⇒ ψ(x) as non-logical axioms. In a system which has non-logical axioms, all arguments are made
relative to these axioms, which in semantic terms means that the global space 2^P is replaced by a proper
subset which corresponds to a-priori knowledge.
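The dog/animal example above can be made concrete by brute-force enumeration over a tiny proposition domain. The following sketch is not from the book: the universe, the encoding of propositions as pairs, and all names are illustrative assumptions, with truth value maps modelled as subsets of P.

```python
from itertools import chain, combinations

# A tiny concrete proposition domain: for each individual x in a universe U,
# two concrete propositions "x is a dog" and "x is an animal".
U = ["rex", "tweety"]
P = [(x, prop) for x in U for prop in ("dog", "animal")]

def all_subsets(s):
    """All subsets of s, modelling the space 2^P of truth value maps."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# K1 = the truth assignments tau in 2^P such that, for every x in U,
# tau making "x is a dog" true forces "x is an animal" true.
K1 = [set(t) for t in all_subsets(P)
      if all((x, "dog") not in set(t) or (x, "animal") in set(t) for x in U)]

print(len(list(all_subsets(P))))  # 16 truth assignments in 2^P
print(len(K1))                    # 9: K1 is a proper subset of 2^P
```

As the text says, adopting the non-logical axioms cuts the global space 2^P (here 16 assignments) down to a proper subset (here 9 assignments).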
In the case of predicate calculus, it transpires, at least in the natural deduction style of logical argument
which is presented in Section 6.3, that every line is a conditional tautology. In other words, every line has
the implicit form φ_1, φ_2, . . . φ_m ` ψ. Such a form of conditional assertion has the semantic interpretation
⋂_{i=1}^{m} K_{φ_i} ⊆ K_ψ, which is equivalent to the set-equality (2^P \ ⋂_{i=1}^{m} K_{φ_i}) ∪ K_ψ = 2^P, which effectively means
that the proposition (φ_1 ∧ φ_2 ∧ . . . ∧ φ_m) ⇒ ψ is a tautology. In such a predicate calculus, the same two kinds
of departures from the rule "every line is a tautology" are encountered, namely hypotheses and non-logical
axioms.
An assertion may now be regarded as a quotation of an unconditionally asserted logical proposition which
has appeared in the main scroll. For this purpose, a separate scroll which lists theorems may be convenient.
If challenged to produce a proof of a theorem, the keeper of the main scroll may locate the line in the scroll
and perform a trace-back through the justifications to determine a minimal proof for the assertion. Since
the justification of a line must quote the earlier line numbers which it requires, one may collect all of the
relevant earlier lines into a complete proof.
Thus an assertion is effectively a claim that someone has a scroll, or can produce a scroll, which gives
a proof of the assertion. In practice, mathematicians are almost always willing to accept mere sketches
of proofs, which indicate how a proof could be produced. True proofs are almost never given, except for
elementary logic theorems.
4.3.5 Remark: Why logic and mathematics are organised into theorems. The mind is a microscope.
Recording all of the world's logic and mathematics on a single scroll would have the advantage of ensuring
consistency. In a very abstract sense, all of the argumentation is maintained on a single scroll for each logical
system. In principle, all of the theorems scattered around the world could be merged into a single scroll.
One obvious reason for not doing this is that there are so many people in different places independently
contributing to the main scroll. But even individuals break up their logical arguments into small scrolls
which each have a statement of a theorem followed by a proof. This kind of organisation is not merely
convenient. It is necessitated by the very limited bandwidth of the human mind. While working on one
theorem, an individual temporarily focuses on providing a correct (or convincing) proof of that assertion.
During that time, maximum focus can be brought to bear on a small range of ideas. The mind is like
a microscope which has a very limited field of view, resolution and bandwidth, but by breaking up big
questions into little questions, it is possible to solve them. Theorems correspond, roughly speaking, to the
mind's capacity to comprehend ideas in a single chunk. A million lines of argument cannot be comprehended
at one time, but it is possible in tiny chunks, like exploring and mapping a whole country with a microscope.
readers are interested only in the highlighted assertions.
In metamathematics, the assertion symbol generally indicates that an asserted logical proposition template
(not a single explicit logical proposition) is claimed to have a proof which the claimant can produce on demand
if particular logical propositions are substituted into the template. In this case, an explicit algorithm should
be stated for the production of the proof. In the absence of an effective procedure for producing an explicit
proof, some scepticism could be justified.
The assertion symbol ` is in all cases a metamathematical concept. In the metaphor of Remark 4.3.4,
the assertion symbol is not found on the main scroll, unless one considers that all unconditional lines in
the main scroll have an implicit assertion symbol in front of them, and quoting such lines in other scrolls
quotes the assertion symbols along with the asserted logical propositions.
semantic implication. This meaning for the assertion symbol in sequents is confirmed by Church [299],
page 165. Contradicting this is a statement by Curry [301], page 184, that Gentzen's elementary statements
are statements of deducibility. However, this may be a kind of revisionist view of history after model theory
came to dominate mathematical logic. The original paper by Gentzen is very clear and cannot be understood
as Curry claims.
In model theory, options (2) and (3) are sharply distinguished because model theory would have no purpose
if they were the same.
4.3.10 Remark: Unconditional assertions are tautologies. Conditional assertions are derived rules.
Unconditional assertions in Definition 4.3.9 may be thought of as the special case of conditional assertions
with n = 0. However, there is a qualitative difference between the two kinds of assertion. An unconditional
assertion claims that the asserted logical expression is a consequence of the axioms and rules alone, which
means that it is a fact about the system. In pure propositional calculus (without nonlogical axioms),
all such valid assertions signify tautologies.
A conditional assertion is a claim that if the premises appear as lines on the main scroll of an axiomatic
system, then the asserted logical expression may be produced on a later line by applying the axioms and
rules of the system. Thus if the premises are not facts of the system, then the claimed consequence is not
claimed to be a fact. A conditional assertion may be valid even when the premises are not facts. This
is somewhat worrying when the main scroll is supposed to list only facts.
` φ denotes the unconditional assertion of the logical expression φ in an implicit axiomatic system.
4.4. A three-axiom implication-and-negation propositional calculus
4.4.1 Remark: The chosen axiomatic system for propositional calculus for this book.
Section 4.4 concerns the particular formalisation of propositional calculus which has been selected as the
starting point for propositional logic theorems (and all other theorems) in this book.
Definition 4.4.3 defines a one-rule, two-symbol, three-axiom system. It uses ⇒ and ¬ as primitive symbols,
and modus ponens and substitution as the inference rules.
The three axioms (PC 1), (PC 2) and (PC 3) are attributed to Jan Łukasiewicz by Hilbert/Ackermann [308],
page 29, Rosenbloom [336], pages 38–39, 196, Church [299], pages 119, 156, and Curry [301], pages 55–56, 84.
These axioms are also presented by Margaris [319], pages 49, 51, and Stoll [343], page 376. (According
to Hilbert/Ackermann [308], page 29, Łukasiewicz obtained these axioms as a simplification of an earlier
6-axiom system published in 1879 by Frege [305], but it is difficult to verify this because of Frege's almost
inscrutable quipu-like notations for logical expressions.)
The chosen three axioms are a compromise between a minimalist NAND-based axiom system (as described
in Remark 4.2.3), and an easy-going 4-symbol, 10-axiom natural system which would be suitable for a
first introduction to logic. Definition 4.4.3 incorporates these axioms in a one-rule, two-symbol, three-axiom
system for propositional calculus.
Wffs, wff-names and wff-wffs must be carefully distinguished (although in practice they are often confused).
The wffs in Definition 4.4.3 (v) are strings such as ((¬A) ⇒ (B ⇒ C)), which are built recursively from
proposition names, logical operators and punctuation symbols. The wff-names in Definition 4.4.3 (iv) are
letters which are bound to particular wffs in a particular scope. Thus the wff-name α may be bound
to the wff ((¬A) ⇒ (B ⇒ C)), for example. The wff-wffs in Definition 4.4.3 (vi) are strings such as
(α ⇒ (¬β)) which appear in axiom templates and rules in Definition 4.4.3 (vii, viii), and also in
theorem assertions and proofs. Thus wff-wffs are templates containing wff-names which may be substituted
with wffs, and the wffs contain proposition names.
4.4.3 Definition [mm]: The following is the axiomatic system PC for propositional calculus.
(i) The proposition name space NP for each scope consists of lower-case and upper-case letters of the
Roman alphabet, with or without integer subscripts.
(ii) The basic logical operators are ⇒ (implies) and ¬ (not).
(iii) The punctuation symbols are "(" and ")".
(iv) The logical expression names (wff names) are lower-case letters of the Greek alphabet, with or without
integer subscripts. Each wff name which is used within a given scope is bound to a unique wff.
(v) Logical expressions (wffs) are specified recursively as follows.
(1) A is a wff for all A in N_P.
(2) (ω1 ⇒ ω2) is a wff for any wffs ω1 and ω2.
(3) (¬ω) is a wff for any wff ω.
Any expression which cannot be constructed by recursive application of these rules is not a wff. (For
clarity, parentheses may be omitted in accordance with customary precedence rules.)
(vi) The logical expression formulas (wff-wffs) are defined recursively as follows.
(1) φ is a wff-wff for any wff-name φ which is bound to a wff.
(2) (φ1 ⇒ φ2) is a wff-wff for any wff-wffs φ1 and φ2.
(3) (¬φ) is a wff-wff for any wff-wff φ.
(vii) The axiom templates are the following logical expression formulas for any wffs α, β and γ.
(PC 1) α ⇒ (β ⇒ α).
(PC 2) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)).
(PC 3) (¬α ⇒ ¬β) ⇒ (β ⇒ α).
(viii) The inference rules are substitution and modus ponens. (See Remark 4.3.17.)
(MP) α, α ⇒ β ` β.
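The recursive wff grammar of Definition 4.4.3 (v) is simple enough to check mechanically. The following recogniser is a sketch, not from the book: it uses my own ASCII stand-ins ">" for ⇒ and "-" for ¬, and assumes fully parenthesised compound wffs, as the grammar requires before the precedence convention is applied.

```python
def is_wff(s):
    """Recogniser for the wff grammar: a single proposition name letter,
    or (w1>w2) for implication, or (-w) for negation."""
    if len(s) == 1:
        return s.isalpha()          # rule (1): a proposition name
    if not (s.startswith("(") and s.endswith(")")):
        return False
    inner = s[1:-1]
    if inner.startswith("-"):       # rule (3): (-w) for negation
        return is_wff(inner[1:])
    depth = 0                       # rule (2): split at the top-level '>'
    for i, c in enumerate(inner):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c == ">" and depth == 0:
            return is_wff(inner[:i]) and is_wff(inner[i + 1:])
    return False

print(is_wff("((-A)>(B>C))"))   # True: the example wff used in the text
print(is_wff("(A>)"))           # False: not constructible by the rules
```

Any string not produced by recursive application of the three rules is rejected, matching the closing clause of Definition 4.4.3 (v).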
4.4.4 Remark: The lack of obvious veracity and adequacy of the propositional calculus axioms.
Axioms PC 1, PC 2 and PC 3 in Definition 4.4.3 are not obvious consequences of the intended meanings of
the logical operators. (See Theorem 3.12.3 for a low-level proof of the validity of the axioms.) Nor is it
obvious that all intended properties of the logical operators follow from these three axioms.
Axiom PC 1 is one half of the definition of the ⇒ operator. Axiom PC 2 looks like a distributivity axiom
or restriction axiom, which says how the ⇒ operator is distributed or restricted by another ⇒ operator.
Axiom PC 3 looks like a modus tollens rule. (See Remark 4.8.15 for the modus tollens rule.)
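Although the axioms are not intuitively obvious, their validity under the usual two-valued semantics is mechanical to confirm. The following brute-force truth-table check is a sketch of that confirmation (it is not the book's Theorem 3.12.3; the function names are mine):

```python
from itertools import product

def implies(a, b):
    """Two-valued semantics of the implication operator."""
    return (not a) or b

def is_tautology(f, nvars):
    """Check f over all 2^nvars truth assignments."""
    return all(f(*vals) for vals in product([False, True], repeat=nvars))

pc1 = lambda a, b:    implies(a, implies(b, a))
pc2 = lambda a, b, c: implies(implies(a, implies(b, c)),
                              implies(implies(a, b), implies(a, c)))
pc3 = lambda a, b:    implies(implies(not a, not b), implies(b, a))

print(is_tautology(pc1, 2), is_tautology(pc2, 3), is_tautology(pc3, 2))
# True True True
```

This establishes the veracity of the axioms semantically; it says nothing, of course, about their adequacy for generating all tautologies.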
4.4.5 Remark: The interpretation of axioms, axiom schemas and axiom templates.
Since this book is concerned more with meaning than with advanced techniques of computation, it is
important to clarify the meanings of the logical expressions which appear, for example, in Definition 4.4.3 and
Theorem 4.5.6 before proceeding to the mechanical grind of logical deduction. (The modern formalist fashion
in mathematical logic is to study logical deduction without reference to semantics, regarding semantics as
an optional extra. In this book, the semantics is primary.)
Broadly speaking, the axioms in axiomatic systems may be true axioms or axiom schemas, and the proofs
of theorems may be true proofs or proof schemas. In the case of true axioms, the letters in the logical
expression in an axiom have variable associations with concrete propositions. Each letter is associated with
at most one concrete proposition, the association depending on context. In this case of true axioms, the
axioms are logical expressions in layer 3 in Figure 4.1.1 in Remark 4.1.7. An infinite number of tautologies
may be generated from true axioms by means of a substitution rule. Such a rule states that a new tautology
may be generated from any tautology by substituting an arbitrary logical expression for any letter in the
tautology. In particular, axioms may yield infinitely many tautologies in this way.
In the case of axiom schemas, the axioms are logical expression formulas (i.e. wff-wffs), which are in layer 5
in Figure 4.1.1. There are (at least) two interpretations of how axiom schemas are implemented. In the
first interpretation, the infinite set of all instantiations of the axiom schemas is notionally generated before
deductions within the system commence. In this view, an axiom instance, obtained by substituting a
particular logical expression for each letter in an axiom schema, exists in some sense before deductions
commence, and is simply taken off the shelf or out of the toolbox. In the second interpretation, each
axiom schema is a mere axiom template from which tautologies are generated on demand by substituting
logical expressions for each letter in the template. In this axiom template interpretation, a substitution
rule is required, but the source for substitution is in this case a wff-letter in a wff-wff rather than in a wff.
For a standard propositional calculus (without nonlogical axioms), the choice between axioms, axiom schemas
and axiom templates has no effect on the set of generated tautologies. The axioms in Definition 4.4.3 are
intended to be interpreted as axiom templates.
In the case of logic theorems such as Theorem 4.5.6, each assertion may be regarded as an assertion schema,
and each proof may be regarded as a proof schema. Then each theorem and proof may be applied by
substituting particular logical expressions into these schemas to produce a theorem instantiation and proof
instantiation. However, this interpretation does not seem to match the practical realities of logical theorems
and proofs. By an abuse of notation, one may express the true meaning of an axiom such as α ⇒ (β ⇒ α)
as ∀α ∈ W, ∀β ∈ W, α ⇒ (β ⇒ α), where W denotes the set of all logical expressions. In other words,
the wff-wff α ⇒ (β ⇒ α) is thought of as a parametrised set of logical expressions which are handled
simultaneously as a single unit.
This is analogous to the way in which two real-valued functions f, g : R → R may be added to give a
function f + g : R → R by pointwise addition f + g : x ↦ f(x) + g(x). Similarly, one might write f ≤ g to
mean ∀x ∈ R, f(x) ≤ g(x). Thus a wff-wff α ⇒ (β ⇒ α) may be regarded as a function of the arguments
α and β, and the logical expression formula is defined pointwise for each value of α and β. Thus an
infinite number of assertions may be made simultaneously. The quantifier expressions ∀α ∈ W and
∀β ∈ W are implicit in the use of Greek letters in the axioms in Definition 4.4.3 and in the theorems and
the proof-lines in Theorem 4.5.6. To put it another way, one may interpret an axiom such as α ⇒ (β ⇒ α)
as signifying the map f : W × W → W given by f(α, β) = (α ⇒ (β ⇒ α)) for all α ∈ W and β ∈ W.
The quantifier-language "for all" and the set-language "∈ W" may be removed here by interpreting the axiom
α ⇒ (β ⇒ α) to mean:
if α and β are wffs, then the substitution of α and β into the formula α ⇒ (β ⇒ α) yields a tautology.
Thus the substitution rule is implicitly built into the meaning of the axiom because the use of wff letters in the
logical expression formula implies the claim that the formula yields a valid tautology for all wff substitutions.
There is a further distinction here between wff-functions and wff-rules. A function is thought of as a set of
ordered pairs, whereas a rule is an algorithm or procedure. Here the rule interpretation will be adopted.
In other words, an axiom such as α ⇒ (β ⇒ α) will be interpreted as an algorithm or procedure to
construct a wff from zero or more wff-letter substitutions. This may be referred to as an axiom template
interpretation.
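The "axiom template" reading, in which an axiom is an algorithm producing a wff from given substitutions, can be sketched as follows (using my own ASCII stand-ins ">" for ⇒ and "-" for ¬; the function name is mine, not the book's):

```python
def pc1_template(alpha, beta):
    """Axiom template PC 1 as a rule: given wff strings alpha and beta,
    construct the wff (alpha > (beta > alpha))."""
    return f"({alpha}>({beta}>{alpha}))"

# The 'value' for one particular choice of wffs, as opposed to the
# 'rule' pc1_template itself, which is a map from pairs of wffs to wffs:
print(pc1_template("A", "(-B)"))   # (A>((-B)>A))
```

The distinction drawn in the text is visible here: `pc1_template` is the rule (α, β) ↦ (α ⇒ (β ⇒ α)), while each returned string is a single value of the template.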
Yet another fine distinction here is the difference between the value of the axiom template α ⇒ (β ⇒ α)
for a particular choice of logical expressions α and β and the rule (α, β) ↦ (α ⇒ (β ⇒ α)). The value
is a particular logical expression in W. The rule is a map from W × W to W. (This is analogous to the
distinction between the value of a real-valued function x² + 1 for a particular x ∈ R and the rule x ↦ x² + 1.)
The intended meaning is generally implicit in each context.
It may seem that the rules stated here for propositional calculus lack precision. To state the rules with total
precision would make them look like computer programs, which would be difficult for humans to read. The
rules are hopefully stated precisely enough so that a competent programmer could translate them into a form
which is suitable for a computer to read. However, this book is written for humans. The reader needs to be
convinced only that the rules could be converted to a form suitable for verification by computer software.
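The claim that the rules could be converted to machine-checkable form can be illustrated with a toy checker for modus-ponens steps. This is a sketch under my own encoding, not the book's: wffs are nested tuples such as ('imp', w1, w2), and hypotheses and axiom instances are simply taken on trust.

```python
def check_proof(lines):
    """Each line is (formula, justification). A justification is either
    ('axiom',) for an assumed axiom instance or hypothesis, or
    ('mp', i, j), meaning: line i is X and line j is ('imp', X, formula)."""
    for n, (formula, just) in enumerate(lines):
        if just[0] == "axiom":
            continue
        if just[0] == "mp":
            i, j = just[1], just[2]
            if not (i < n and j < n):
                return False          # may only cite earlier lines
            minor, major = lines[i][0], lines[j][0]
            if major != ("imp", minor, formula):
                return False          # major premise must be minor => formula
            continue
        return False                  # unknown justification
    return True

# alpha, alpha => beta |- beta
proof = [
    ("a", ("axiom",)),
    (("imp", "a", "b"), ("axiom",)),
    ("b", ("mp", 0, 1)),
]
print(check_proof(proof))   # True
```

A full checker would also have to verify that each cited axiom line really is a substitution instance of an axiom template, which is exactly the implicit state that the text says only becomes apparent when one tries to implement such a system.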
axiom instances. The phrase potentially infinite is often used to describe this situation. The application
of substitutions may generate any finite number of axiom instances, and from each axiom instance, any finite
number of concrete logical expressions may be generated. One could say that the number is unbounded or
potentially infinite, but the existence of aggregate sets of axiom instances and concrete logical expressions
is not required.
4.5.5 Remark: Tagging of theorems which are proved within the propositional calculus.
Logic theorems which are deduced within the propositional calculus axiomatic system in Definition 4.4.3 will
be tagged with the abbreviation PC, as in Theorem 4.5.6 for example. (Theorems and definitions which
are not tagged are, by default, stated within Zermelo-Fraenkel set theory. See Remark 7.1.11.)
4.5.6 Theorem [pc]: The following assertions follow from Definition 4.4.3.
(i) β ` α ⇒ β.
(ii) α ⇒ (β ⇒ γ) ` (α ⇒ β) ⇒ (α ⇒ γ).
(iii) ¬α ⇒ ¬β ` β ⇒ α.
(iv) α ⇒ β, β ⇒ γ ` α ⇒ γ.
(v) α ⇒ (β ⇒ γ) ` β ⇒ (α ⇒ γ).
(vi) ` (β ⇒ γ) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)).
(vii) ` (α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (α ⇒ γ)).
(viii) β ⇒ γ ` (α ⇒ β) ⇒ (α ⇒ γ).
(ix) α ⇒ β ` (β ⇒ γ) ⇒ (α ⇒ γ).
(x) ` (α ⇒ β) ⇒ ((α ⇒ (β ⇒ γ)) ⇒ (α ⇒ γ)).
(xi) ` α ⇒ α.
(xii) α ` α.
(xiii) ` α ⇒ ((α ⇒ β) ⇒ β).
(xiv) ` (α ⇒ (β ⇒ γ)) ⇒ (β ⇒ (α ⇒ γ)).
(xv) ` ¬α ⇒ (α ⇒ β).
(xvi) ` α ⇒ (¬α ⇒ β).
(xvii) ` (¬α ⇒ α) ⇒ α.
(xviii) ` (¬α ⇒ β) ⇒ ((¬α ⇒ ¬β) ⇒ α).
(xix) ` (¬α ⇒ ¬β) ⇒ ((¬α ⇒ β) ⇒ α).
(xx) ` ((α ⇒ β) ⇒ β) ⇒ (¬α ⇒ β).
(xxi) ` ((α ⇒ β) ⇒ β) ⇒ (¬β ⇒ α).
(xxii) ` ¬¬α ⇒ α.
(xxiii) ` α ⇒ ¬¬α.
(xxiv) ¬¬α ` α.
(xxv) α ` ¬¬α.
(xxvi) ` (α ⇒ ¬β) ⇒ (β ⇒ ¬α).
(xxvii) ` (¬α ⇒ β) ⇒ (¬β ⇒ α).
(xxviii) ` (α ⇒ β) ⇒ (¬β ⇒ ¬α).
(xxix) α ⇒ ¬β ` β ⇒ ¬α.
(xxx) ¬α ⇒ β ` ¬β ⇒ α.
(xxxi) α ⇒ β ` ¬β ⇒ ¬α.
(xxxii) α ⇒ β, ¬β ` ¬α.
(xxxiii) ` (¬α ⇒ ¬β) ⇒ ((¬β ⇒ α) ⇒ α).
(xxxiv) ` (α ⇒ β) ⇒ ((¬α ⇒ β) ⇒ β).
(xxxv) ` (α ⇒ β) ⇒ ((¬β ⇒ α) ⇒ β).
(xxxvi) ` (α ⇒ β) ⇒ (¬¬α ⇒ β).
(xxxvii) ` (α ⇒ β) ⇒ (α ⇒ ¬¬β).
(xxxviii) α ⇒ β ` ¬¬α ⇒ β.
(xxxix) α ⇒ β ` α ⇒ ¬¬β.
(xl) ¬¬α ⇒ β ` α ⇒ β.
(xli) α ⇒ ¬¬β ` α ⇒ β.
(xlii) ¬(α ⇒ β) ` α.
(xliii) ¬(α ⇒ β) ` ¬β.
(xliv) ` α ⇒ (¬β ⇒ ¬(α ⇒ β)).
(xlv) α, ¬β ` ¬(α ⇒ β).
To prove part (iv): α ⇒ β, β ⇒ γ ` α ⇒ γ
(1) α ⇒ β Hyp
(2) β ⇒ γ Hyp
(3) α ⇒ (β ⇒ γ) part (i) (2): [β ⇒ γ], [α]
(4) (α ⇒ β) ⇒ (α ⇒ γ) part (ii) (3): [α], [β], [γ]
(5) α ⇒ γ MP (4,1)
To prove part (v): α ⇒ (β ⇒ γ) ` β ⇒ (α ⇒ γ)
(1) α ⇒ (β ⇒ γ) Hyp
(2) (α ⇒ β) ⇒ (α ⇒ γ) part (ii) (1): [α], [β], [γ]
(3) β ⇒ (α ⇒ β) PC 1: [β], [α]
(4) β ⇒ (α ⇒ γ) part (iv) (3,2): [β], [α ⇒ β], [α ⇒ γ]
To prove part (vi): ` (β ⇒ γ) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ))
(1) (β ⇒ γ) ⇒ (α ⇒ (β ⇒ γ)) PC 1: [β ⇒ γ], [α]
(2) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) PC 2: [α], [β], [γ]
(3) (β ⇒ γ) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) part (iv) (1,2): [β ⇒ γ], [α ⇒ (β ⇒ γ)], [(α ⇒ β) ⇒ (α ⇒ γ)]
To prove part (vii): ` (α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (α ⇒ γ))
(1) (β ⇒ γ) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) part (vi): [α], [β], [γ]
(2) (α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (α ⇒ γ)) part (v) (1): [β ⇒ γ], [α ⇒ β], [α ⇒ γ]
To prove part (viii): β ⇒ γ ` (α ⇒ β) ⇒ (α ⇒ γ)
(1) β ⇒ γ Hyp
(2) α ⇒ (β ⇒ γ) part (i) (1): [β ⇒ γ], [α]
(3) (α ⇒ β) ⇒ (α ⇒ γ) part (ii) (2): [α], [β], [γ]
To prove part (ix): α ⇒ β ` (β ⇒ γ) ⇒ (α ⇒ γ)
(1) α ⇒ β Hyp
(2) (α ⇒ β) ⇒ ((β ⇒ γ) ⇒ (α ⇒ γ)) part (vii): [α], [β], [γ]
(3) (β ⇒ γ) ⇒ (α ⇒ γ) MP (2,1)
To prove part (x): ` (α ⇒ β) ⇒ ((α ⇒ (β ⇒ γ)) ⇒ (α ⇒ γ))
(1) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) PC 2: [α], [β], [γ]
(2) (α ⇒ β) ⇒ ((α ⇒ (β ⇒ γ)) ⇒ (α ⇒ γ)) part (v) (1): [α ⇒ (β ⇒ γ)], [α ⇒ β], [α ⇒ γ]
To prove part (xi): ` α ⇒ α
(1) α ⇒ ((α ⇒ α) ⇒ α) PC 1: [α], [α ⇒ α]
(2) (α ⇒ (α ⇒ α)) ⇒ (α ⇒ α) part (ii) (1): [α], [α ⇒ α], [α]
(3) α ⇒ (α ⇒ α) PC 1: [α], [α]
(4) α ⇒ α MP (2,3)
To prove part (xii): α ` α
(1) α Hyp
(2) α ⇒ α part (xi): [α]
(3) α MP (2,1)
To prove part (xiii): ` α ⇒ ((α ⇒ β) ⇒ β)
(1) (α ⇒ β) ⇒ (α ⇒ β) part (xi): [α ⇒ β]
(2) α ⇒ ((α ⇒ β) ⇒ β) part (v) (1): [α ⇒ β], [α], [β]
To prove part (xiv): ` (α ⇒ (β ⇒ γ)) ⇒ (β ⇒ (α ⇒ γ))
(1) (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)) PC 2: [α], [β], [γ]
(2) β ⇒ (α ⇒ β) PC 1: [β], [α]
(3) ((α ⇒ β) ⇒ (α ⇒ γ)) ⇒ (β ⇒ (α ⇒ γ)) part (ix) (2): [β], [α ⇒ β], [α ⇒ γ]
(4) (α ⇒ (β ⇒ γ)) ⇒ (β ⇒ (α ⇒ γ))
part (iv) (1,3): [α ⇒ (β ⇒ γ)], [(α ⇒ β) ⇒ (α ⇒ γ)], [β ⇒ (α ⇒ γ)]
To prove part (xv): ` ¬α ⇒ (α ⇒ β)
(1) ¬α ⇒ (¬β ⇒ ¬α) PC 1: [¬α], [¬β]
(2) (¬β ⇒ ¬α) ⇒ (α ⇒ β) PC 3: [β], [α]
(3) ¬α ⇒ (α ⇒ β) part (iv) (1,2): [¬α], [¬β ⇒ ¬α], [α ⇒ β]
To prove part (xvi): ` α ⇒ (¬α ⇒ β)
(1) ¬α ⇒ (α ⇒ β) part (xv): [α], [β]
(2) α ⇒ (¬α ⇒ β) part (v) (1): [¬α], [α], [β]
To prove part (xvii): ` (¬α ⇒ α) ⇒ α
(1) ¬α ⇒ (α ⇒ ¬(α ⇒ α)) part (xv): [α], [¬(α ⇒ α)]
(2) (¬α ⇒ α) ⇒ (¬α ⇒ ¬(α ⇒ α)) part (ii) (1): [¬α], [α], [¬(α ⇒ α)]
(3) (¬α ⇒ ¬(α ⇒ α)) ⇒ ((α ⇒ α) ⇒ α) PC 3: [α], [α ⇒ α]
(4) (¬α ⇒ α) ⇒ ((α ⇒ α) ⇒ α) part (iv) (2,3): [¬α ⇒ α], [¬α ⇒ ¬(α ⇒ α)], [(α ⇒ α) ⇒ α]
(5) (α ⇒ α) ⇒ ((¬α ⇒ α) ⇒ α) part (v) (4): [¬α ⇒ α], [α ⇒ α], [α]
(6) α ⇒ α part (xi): [α]
(7) (¬α ⇒ α) ⇒ α MP (5,6)
To prove part (xxiv): ¬¬α ` α
(1) ¬¬α Hyp
(2) ¬¬α ⇒ α part (xxii): [α]
(3) α MP (2,1)
To prove part (xxv): α ` ¬¬α.
(1) α Hyp
(2) α ⇒ ¬¬α part (xxiii): [α]
(3) ¬¬α MP (2,1)
To prove part (xxvi): ` (α ⇒ ¬β) ⇒ (β ⇒ ¬α)
(1) ¬¬α ⇒ α part (xxii): [α]
(2) (α ⇒ ¬β) ⇒ (¬¬α ⇒ ¬β) part (ix) (1): [¬¬α], [α], [¬β]
(3) (¬¬α ⇒ ¬β) ⇒ (β ⇒ ¬α) PC 3: [¬α], [β]
(4) (α ⇒ ¬β) ⇒ (β ⇒ ¬α) part (iv) (2,3): [α ⇒ ¬β], [¬¬α ⇒ ¬β], [β ⇒ ¬α]
To prove part (xxvii): ` (¬α ⇒ β) ⇒ (¬β ⇒ α)
(1) β ⇒ ¬¬β part (xxiii): [β]
(2) (¬α ⇒ β) ⇒ (¬α ⇒ ¬¬β) part (viii) (1): [¬α], [β], [¬¬β]
(3) (¬α ⇒ ¬¬β) ⇒ (¬β ⇒ α) PC 3: [α], [¬β]
(4) (¬α ⇒ β) ⇒ (¬β ⇒ α) part (iv) (2,3): [¬α ⇒ β], [¬α ⇒ ¬¬β], [¬β ⇒ α]
To prove part (xxviii): ` (α ⇒ β) ⇒ (¬β ⇒ ¬α)
(1) (¬α ⇒ ¬β) ⇒ ((¬α ⇒ β) ⇒ α) part (xix): [α], [β]
(2) (¬β ⇒ α) ⇒ (¬α ⇒ β) part (xxvii): [β], [α]
(3) ((¬α ⇒ β) ⇒ α) ⇒ ((¬β ⇒ α) ⇒ α) part (ix) (2): [¬β ⇒ α], [¬α ⇒ β], [α]
(4) (¬α ⇒ ¬β) ⇒ ((¬β ⇒ α) ⇒ α) part (iv) (1,3): [¬α ⇒ ¬β], [(¬α ⇒ β) ⇒ α], [(¬β ⇒ α) ⇒ α]
To prove part (xxxiv): ` (α ⇒ β) ⇒ ((¬α ⇒ β) ⇒ β)
(1) (α ⇒ β) ⇒ (¬β ⇒ ¬α) part (xxviii): [α], [β]
(2) (¬β ⇒ ¬α) ⇒ ((¬α ⇒ β) ⇒ β) part (xxxiii): [β], [α]
(3) (α ⇒ β) ⇒ ((¬α ⇒ β) ⇒ β) part (iv) (1,2): [α ⇒ β], [¬β ⇒ ¬α], [(¬α ⇒ β) ⇒ β]
To prove part (xxxv): ` (α ⇒ β) ⇒ ((¬β ⇒ α) ⇒ β)
(1) (α ⇒ β) ⇒ (¬β ⇒ ¬α) part (xxviii): [α], [β]
(2) (¬β ⇒ ¬α) ⇒ ((¬β ⇒ α) ⇒ β) part (xix): [β], [α]
(3) (α ⇒ β) ⇒ ((¬β ⇒ α) ⇒ β) part (iv) (1,2): [α ⇒ β], [¬β ⇒ ¬α], [(¬β ⇒ α) ⇒ β]
To prove part (xxxvi): ` (α ⇒ β) ⇒ (¬¬α ⇒ β)
(1) α ⇒ ((α ⇒ β) ⇒ β) part (xiii): [α], [β]
(2) ¬¬α ⇒ α part (xxii): [α]
(3) ¬¬α ⇒ ((α ⇒ β) ⇒ β) part (iv) (2,1): [¬¬α], [α], [(α ⇒ β) ⇒ β]
(4) (α ⇒ β) ⇒ (¬¬α ⇒ β) part (v) (3): [¬¬α], [α ⇒ β], [β]
To prove part (xxxvii): ` ( ) ( )
(2) ( ) part (xv): [], []
(3) ( ) part (xxx) (2): [], [ ]
(4) MP (3,1)
To prove part (xliii): ( ) `
(1) ( ) Hyp
(2) ( ) PC 1: [], []
(3) ( ) part (xxxi) (2): [], [ ]
(4) MP (3,1)
To prove part (xliv): ( ( ))
(1) (( ) ) part (xiii): [], []
(2) (( ) ) ( ( )) part (xxviii): [ ], []
(3) ( ( )) part (iv) (1,2): [], [( ) ], [ ( )]
To prove part (xlv): , ` ( )
(1) Hyp
(2) Hyp
(3) ( ( )) part (xliv): [], []
(4) ( ) MP (3,1)
(5) ( ) MP (4,2)
This completes the proof of Theorem 4.5.6.
(2) Subst (1): []
It is the trivial borderline cases which often provide the impetus to properly formalise the methods of a
logical or mathematical framework. This is perhaps such a case.
4.5.8 Remark: Equivalence of a logical implication expression to its contrapositive.
An implication expression of the form ¬β ⇒ ¬α is known as the contrapositive of the corresponding
implication expression α ⇒ β. Theorem 4.5.6 parts (iii) and (xxxi) demonstrate that the contrapositive
of a logical expression is equivalent to the original (i.e. positive) logical expression. (For the definition
of contrapositive, see for example Kleene [316], page 13; E. Mendelson [320], page 21. For the law of
contraposition, see for example Suppes [344], pages 34, 204-205; Margaris [319], pages 2, 4, 71; Church [299],
pages 119, 121; Kleene [315], page 113; Curry [301], page 287. The equivalence is also called the "principle
of transposition" by Whitehead/Russell, Volume I [350], page 121. It is called both "transposition" and
"contraposition" by Carnap [296], page 32.)
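The semantic content of this equivalence can be confirmed by brute force over truth assignments. The following sketch (the function name is illustrative, not from this book) checks that an implication and its contrapositive always have the same truth value.

```python
from itertools import product

def implies(a, b):
    # Material implication: a => b is false only when a is true and b is false.
    return (not a) or b

# (a => b) and (not b => not a) agree under every truth assignment.
for a, b in product([False, True], repeat=2):
    assert implies(a, b) == implies(not b, not a)
```

Of course, such a truth-table check establishes only the tautology, not a derivation within the propositional calculus of Definition 4.4.3.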
4.5.9 Remark: Notation for the burden of hypotheses.
In the proofs of assertions in Theorem 4.5.6, the dependence of argument-lines on previous argument-lines is
not indicated. Any assertion which follows from axioms only is clearly a direct, unconditional consequence of
the axioms. But assertions which depend on hypotheses are only conditionally proved. An assertion which
has a non-empty list of propositions on the left of the assertion symbol is in effect a deduction rule. Such
an assertion states that if the propositions on the left occur in an argument, then the proposition on the
right may be proved. In other words, the assertion is a derived rule which may be applied in later proofs,
not an assertion of a tautology.
Strictly speaking, one should distinguish between argument-lines which are unconditional assertions of
tautologies and conditional argument-lines. In some logical calculus frameworks, this is done systematically.
For example, one might rewrite part (i) of Theorem 4.5.6 as follows.
calculus, but the predicate calculus in Chapter 6 often requires both accumulation and elimination (or
discharge) of hypotheses from such dependency lists. Then inspired guesswork must be replaced by a
systematic discipline.
4.5.10 Remark: Comparison of propositional calculus argumentation with computer software libraries.
The gradual build-up of theorems in an axiomatic system (such as propositional calculus, predicate calculus
or Zermelo-Fraenkel set theory) is analogous to the way programming procedures are built up in software
libraries. In both cases, there is an attempt to amass a hoard of re-usable intellectual capital which can
be used in a wide variety of future work. Consequently the work gets progressively easier as user-friendly
theorems (or programming procedures) accumulate over time.
Accumulation of theorem libraries sounds like a good idea in principle, but a single error in a single re-usable
item (a theorem or a programming procedure) can easily propagate to a very wide range of applications. In
other words, bugs can creep into re-usable libraries. It is for this reason that there is so much emphasis
on total correctness in mathematical logic. The slightest error could propagate to all of mathematics.
The development of the propositional calculus is also analogous to boot-strapping a computer operating
system. The propositional calculus is the lowest functional layer of mathematics. Everything else is based
on this substrate. Logic and set theory may be thought of as a kind of operating system of mathematics.
Then differential geometry is a user application in this operating system.
such a systematic, exhaustive approach is about as useful as generating the set of all possible chess games
according to the number of moves n. As n increases, the number of possible games increases very rapidly.
But a more serious problem is that the vast majority of such games are worthless. Similarly in logic, the
vast majority of true assertions are uninteresting.
assert the validity of anything which follows from a set of axioms, as if deduction from axioms were the
golden test of validity. As in the sciences, mathematicians should be more willing to question their golden
axioms if the consequences are too absurd.
If a result is questioned on the grounds of absurdity, some people defend the result on the grounds that it
follows inexorably from axioms, and the axioms are defended on the grounds that if they are not accepted,
then one loses certain highly desirable results which follow from them. Thus one is offered a package of axioms
which have mixed results, including both desirable results and undesirable results. The heavy investment in
the argumentative method yields profits when one is able to make assertions with them. The choice of axioms
confers the power to decide which results are true and which are false. Perhaps it is this power over truth
and falsity which makes the argumentative method so attractive. The ability to arrive at absurd conclusions
from minimal assumptions is a power which is difficult to resist, despite the necessary expenditure of effort
on arduous, frustrating proofs. If one merely wrote out all of the desired tautologies of propositional logic
as simple calculations from their semantic definitions, they would not have the same power of persuasion as
a lengthy argument from first principles.
4.5.14 Remark: Identifying the logical conjunction operator in the propositional calculus.
Theorem 4.5.15 establishes that the logical expression ¬(α ⇒ ¬β) has the properties that one would
expect of the conjunction α ∧ β, or equivalently of the list of wffs α and β.
4.5.15 Theorem [pc]: The following assertions follow from Definition 4.4.3.
(i) ⊢ ¬(α ⇒ ¬β) ⇒ α.
(ii) ⊢ ¬(α ⇒ ¬β) ⇒ β.
(iii) ⊢ α ⇒ (β ⇒ ¬(α ⇒ ¬β)).
(iv) α, β ⊢ ¬(α ⇒ ¬β).
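That ¬(α ⇒ ¬β) really does behave as a conjunction can be checked semantically: under the truth-table interpretation it has exactly the truth table of α ∧ β. A brute-force sketch (function names are illustrative only):

```python
from itertools import product

def implies(a, b):
    # Material implication.
    return (not a) or b

def conj_via_implies(a, b):
    # not (a => not b): conjunction expressed with negation and implication only.
    return not implies(a, not b)

# The derived connective agrees with 'and' under all truth assignments.
for a, b in product([False, True], repeat=2):
    assert conj_via_implies(a, b) == (a and b)
```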
4.6.1 Remark: The purpose of Theorem 4.6.2.
The purpose of Theorem 4.6.2 is to prove Theorem 4.7.6 (viii), which is equivalent to Theorem 4.6.2 (vi).
4.6.2 Theorem [pc]: The following assertions follow from the propositional calculus in Definition 4.4.3.
(i) ` ( ) (( ) ).
(ii) ` ( ) (( ) ).
(iii) ( ), ` ( ).
(iv) ` ( ) (( ) (( ) )).
(v) ( ( )) ` ( ( )).
(vi) ` ( ) (( ) (( ) )).
(3) ( ) (( ) (( ) )) part (v) (2)
(4) ( ) (( ) (( ) )) Theorem 4.5.6 (v) (3)
This completes the proof of Theorem 4.6.2.
4.6.4 Theorem [pc]: The following assertions follow from the propositional calculus in Definition 4.4.3.
(i) ` ( ) (( ) ).
(ii) , ` .
(iii) , ` .
(iv) , ` .
(v) , ` .
(vi) , ` .
(vii) , ` .
(viii) ( ), ` .
(ix) , , ( ) ` .
(x) ( ), ` ( ).
(xi) ` ( ( )) (( ) (( ) ( ))).
(xii) ( ) ` ( ) (( ) ( )).
(3) ( ) ( ) Theorem 4.5.6 (xxvi)
(4) MP (3,2)
(5) MP (4,1)
To prove part (v): , `
(1) Hyp
(2) Hyp
(3) Theorem 4.5.6 (xxxi) (2)
(4) MP (3,1)
To prove part (vi): , `
(1) Hyp
(2) Hyp
(3) part (iv)
(4) Theorem 4.5.6 (xxii)
(5) MP (4,3)
To prove part (vii): , `
(1) Hyp
(2) Hyp
(3) part (v)
(4) Theorem 4.5.6 (xxii)
(5) MP (4,3)
(1) ( ) Hyp
(2) ( ( )) (( ) (( ) ( ))) part (xi)
(3) ( ) (( ) ( )) MP (1,2)
This completes the proof of Theorem 4.6.4.
4.7.4 Remark: Arguments for and against a minimal set of basic logical operators.
Definition 4.7.2 is not how logical connectives are defined in the real world. It just happens that there is a lot
of redundancy among the operators in Notation 3.5.2. So it is possible to define the full set of operators in
terms of a proper subset. Defining the operators in terms of a minimal set of operators is part of a minimalist
mode of thinking which is not necessarily useful or helpful.
Reductionism has been enormously successful in the natural sciences in the last couple of centuries. But
minimalism is not the same thing as reductionism. Reductionism recursively reduces complex systems to
fundamental principles and synthesises entire systems from these simpler principles. (E.g. Solar System
dynamics can be synthesised from Newton's laws.) However, it cannot be said that the operators ¬ and ⇒
are more fundamental than the operators ∧ and ∨. The best way to think of the basic logical connectives
is as a network of operators which are closely related to each other.
In many contexts, the set of three operators ¬, ∧ and ∨ is preferred as the fundamental operator set.
(For example, there are useful decompositions of all truth functions into disjunctive normal form and
conjunctive normal form.) In the context of propositional calculus, the operator ⇒ is more fundamental
because it is the basis of the modus ponens inference rule. But the modus ponens rule could be replaced by
an equivalent set of rules which use one or more different logical connectives.
A propositional calculus based on the single NAND operator with a single axiom and modus ponens is
possible, but it requires a lot of work for nothing. The NAND operator is in no way the fundamental
operator underlying all other operators. It is minimal, not fundamental.
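The functional completeness of NAND mentioned above is easy to illustrate concretely: negation, conjunction, disjunction and implication are all expressible from NAND alone, and each definition can be verified by exhaustive truth-table comparison. A sketch (not from this book):

```python
from itertools import product

def nand(a, b):
    return not (a and b)

def neg(a):          # not a  =  a NAND a
    return nand(a, a)

def conj(a, b):      # a and b  =  not (a NAND b)
    return nand(nand(a, b), nand(a, b))

def disj(a, b):      # a or b  =  (not a) NAND (not b)
    return nand(neg(a), neg(b))

def implies(a, b):   # a => b  =  a NAND (not b)
    return nand(a, neg(b))

for a, b in product([False, True], repeat=2):
    assert neg(a) == (not a)
    assert conj(a, b) == (a and b)
    assert disj(a, b) == (a or b)
    assert implies(a, b) == ((not a) or b)
```

This shows minimality in the sense of expressibility, which, as the remark argues, is a different matter from being fundamental.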
4.7.5 Remark: Demonstrating a larger set of natural axioms for non-basic logical operators.
The ten assertions of Theorem 4.7.6 are the same as the axiom schemas of the propositional calculus described
by E. Mendelson [320], page 40.
4.7.6 Theorem [pc]: Proof of 10 axioms for an alternative propositional calculus with ¬, ⇒, ∧ and ∨.
The following assertions for wffs α, β and γ follow from the propositional calculus in Definition 4.4.3.
(i) ⊢ α ⇒ (β ⇒ α).
(ii) ⊢ (α ⇒ (β ⇒ γ)) ⇒ ((α ⇒ β) ⇒ (α ⇒ γ)).
(iii) ⊢ (α ∧ β) ⇒ α.
(iv) ⊢ (α ∧ β) ⇒ β.
(v) ⊢ α ⇒ (β ⇒ (α ∧ β)).
(vi) ⊢ α ⇒ (α ∨ β).
(vii) ⊢ β ⇒ (α ∨ β).
(viii) ⊢ (α ⇒ γ) ⇒ ((β ⇒ γ) ⇒ ((α ∨ β) ⇒ γ)).
(ix) ⊢ (α ⇒ β) ⇒ ((α ⇒ ¬β) ⇒ ¬α).
(x) ⊢ ¬¬α ⇒ α.
Proof: Parts (i) and (ii) are identical to axiom schemas PC 1 and PC 2 respectively in Definition 4.4.3.
Part (iii) is the same as ⊢ ¬(α ⇒ ¬β) ⇒ α by Definition 4.7.2 (ii). But this assertion is identical to
Theorem 4.5.15 (i). Similarly, part (iv) is the same as ⊢ ¬(α ⇒ ¬β) ⇒ β by Definition 4.7.2 (ii), and this
assertion is identical to Theorem 4.5.15 (ii).
To prove part (v): ⊢ α ⇒ (β ⇒ (α ∧ β))
(1) (α ⇒ ¬β) ⇒ (α ⇒ ¬β)  Theorem 4.5.6 (xi)
(2) α ⇒ ((α ⇒ ¬β) ⇒ ¬β)  Theorem 4.5.6 (v) (1)
(3) ((α ⇒ ¬β) ⇒ ¬β) ⇒ (β ⇒ ¬(α ⇒ ¬β))  Theorem 4.5.6 (xxvi)
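All ten schemas of Theorem 4.7.6 (read in their standard Kleene-style form, which is an assumption of this sketch) can be confirmed to be tautologies by exhaustive truth-table evaluation. Such a semantic check is of course no substitute for the syntactic derivations, but it is a useful sanity test.

```python
from itertools import product

def imp(a, b):
    # Material implication.
    return (not a) or b

schemas = [
    lambda a, b, c: imp(a, imp(b, a)),                                   # (i)
    lambda a, b, c: imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c))),  # (ii)
    lambda a, b, c: imp(a and b, a),                                     # (iii)
    lambda a, b, c: imp(a and b, b),                                     # (iv)
    lambda a, b, c: imp(a, imp(b, a and b)),                             # (v)
    lambda a, b, c: imp(a, a or b),                                      # (vi)
    lambda a, b, c: imp(b, a or b),                                      # (vii)
    lambda a, b, c: imp(imp(a, c), imp(imp(b, c), imp(a or b, c))),      # (viii)
    lambda a, b, c: imp(imp(a, b), imp(imp(a, not b), not a)),           # (ix)
    lambda a, b, c: imp(not (not a), a),                                 # (x)
]

# Every schema evaluates to True under every truth assignment.
for f in schemas:
    assert all(f(a, b, c) for a, b, c in product([False, True], repeat=3))
```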
4.7.7 Remark: Joining and splitting tautologies to and from logical expressions.
If a wff α is a tautology, it may be combined in a disjunction with any wff without altering the truth or
falsity of the enclosing wff. Thus if α is a sub-wff of a wff, it may be replaced with the combined wff (α ∨ β).
This fact follows from Theorem 4.7.6 (vi) and the theorem α ⊢ (α ∨ β) ⇒ α. Conversely, a tautology may
be removed from a conjunction. Thus β may be substituted for (β ∧ α) if α is a tautology.
4.7.8 Remark: Joining and splitting contradictions to and from logical expressions.
There is a corresponding observation to Remark 4.7.7 in terms of a contradiction instead of a tautology. (See
Definition 3.12.2 (i) for contradictions.) A contradiction can be combined with any proposition without
altering its truth or falsity. Thus the combined proposition α ∨ κ is equivalent to α for any proposition α
and contradiction κ.
4.7.9 Theorem [pc]: More assertions for propositional calculus with the operators ¬, ⇒, ∧, ∨ and ⇔.
The following assertions for wffs α, β and γ follow from the propositional calculus in Definition 4.4.3.
(i) , ` .
(ii) , ` .
(iii) ` ( ( )) ( ( )).
(iv) ` ( ) ( ).
(v) ` ( ) (() ).
(vi) ` () (( ) ).
(vii) ` (( )) ( ).
(viii) ` (( )) ( ).
(ix) ` .
(x) ` .
(xi) ` .
(xii) ` .
(xiii) ` .
(xiv) ` .
(xv) ` .
(xvi) , ` .
(xvii) , ` .
(xviii) , ` .
(xix) ` .
(xx) ` .
(xxi) ` ( ) ( ).
(xxii) ` ( ) ( ).
(xxiii) ` ( ) ( ).
(xxiv) ` ( ) ( ).
(xxv) ` ( ) ( ).
(xxvi) ` ( ) ( ).
(xxvii) ` ( ) ( ).
(xxviii) ` .
(xxix) ` ( ) ( ).
(xxx) ` ( ) ( ).
(xxxi) ` ( ) ( ).
(xxxii) ` ( ( )) ( ( )).
(xxxiii) ` (( ) ) ( ( )).
(xxxiv) ` (( ) ) ( ( )).
(xxxv) ` ( ) (( ) (( ) )).
(xxxvi) , , ` .
(xxxvii) , ` ( ) .
(xxxviii) ` ( ) .
(xxxix) ` ( ) .
(xl) ` ( ).
(xli) ` ( ) .
(xlii) ` ( ) (( ) ).
(xliii) ` (( ) ) ( ( )).
(xliv) ` ( ( )) (( ) ).
(xlv) ( ) ` ( ) .
(xlvi) ` (( ) ) ( ( )).
(xlvii) ` (( ) ) ( ( )).
(xlviii) ` (( ) ) (( ) ).
(xlix) ` (( ) ) (( ) ).
(l) ` (( ) ) (( ) ).
(li) ` (( ) ) (( ) ).
(lii) ` ( ( )) (( ) ).
(liii) ` ( ( )) ( ( )).
(liv) ` ( ( )) ( ( )).
(lv) ` ( ( )) (( ) ).
(lvi) ` .
(lvii) ` .
(lviii) ` .
(lix) ` .
(lx) a ` .
(lxi) ( ) ` .
(lxii) ` ( ).
(lxiii) ` ( ) .
(lxiv) ` ( ) (( ) ).
(lxv) ` ( ) ( ( )).
(lxvi) ` ( ) (( ) ( ( ))).
(lxvii) , ` ( ).
(lxviii) ( ), ( ) ` ( ).
(lxix) ` ( ) (( ) ).
(lxx) ` ( ) ( ).
(lxxi) ` ( ) ( ).
(lxxii) ` ( ) (( ) ( )).
(lxxiii) ` ( ( )) (( ) ( )).
(lxxiv) ` ( ( )) (( ) ( )).
(lxxv) ` (( ) ( )) ( ( )).
(lxxvi) ` ( ( )) (( ) ( )).
(lxxvii) ` ( ( )) (( ) ( )).
(lxxviii) ` (( ) ( )) ( ( )).
(lxxix) ` ( ( )) (( ) ( )).
(lxxx) ` ( ( )) .
(lxxxi) ` ( ( )) .
To prove part (ii): , `
(1) Hyp
(2) Hyp
(3) ( ) ( ) part (i) (1,2)
(4) Definition 4.7.2 (iii) (3)
To prove part (iii): ` ( ( )) ( ( ))
(1) ( ( )) ( ( )) Theorem 4.5.6 (xiv)
(2) ( ( )) ( ( )) Theorem 4.5.6 (xiv)
(3) ( ( )) ( ( )) part (ii) (1,2)
To prove part (iv): ` ( ) ( )
(1) ( ) ( ) Theorem 4.5.6 (xxvi)
(2) ( ) ( ) Theorem 4.5.6 (xxvi)
(3) ( ) ( ) part (ii) (1,2)
To prove part (v): ` ( ) (() )
(1) (() ) (() ) Theorem 4.5.6 (xi)
(2) ( ) (() ) Definition 4.7.2 (i) (1)
To prove part (vi): ` () (( ) )
(1) ( ) (() ) part (v)
(2) () (( ) ) Theorem 4.5.6 (v) (1)
To prove part (xiii): `
(1) Hyp
(2) ( ) ( ) Definition 4.7.2 (iii) (1)
(3) part (ix) (2)
To prove part (xiv): `
(1) Hyp
(2) ( ) ( ) Definition 4.7.2 (iii) (1)
(3) part (x) (2)
To prove part (xv): `
(1) Hyp
(2) part (xiv) (1)
(3) part (xiii) (1)
(4) part (ii) (2,3)
To prove part (xvi): , `
(1) Hyp
(2) Hyp
(3) part (xiii) (1)
(4) part (xiii) (2)
(5) Theorem 4.5.6 (iv) (3,4)
(2) part (xiii) (1)
(3) part (xiv) (1)
(4) Theorem 4.5.6 (iii) (3)
(5) Theorem 4.5.6 (iii) (2)
(6) part (ii) (4,5)
To prove part (xxi): ` ( ) ( )
(1) Hyp
(2) part (xiii) (1)
(3) part (xiv) (1)
(4) ( ) ( ) Theorem 4.5.6 (viii) (2)
(5) ( ) ( ) Theorem 4.5.6 (viii) (3)
(6) ( ) ( ) part (ii) (4,5)
To prove part (xxii): ` ( ) ( ).
(1) Hyp
(2) part (xiii) (1)
(3) part (xiv) (1)
(4) ( ) ( ) Theorem 4.5.6 (ix) (3)
(5) ( ) ( ) Theorem 4.5.6 (ix) (2)
(6) ( ) ( ) part (ii) (4,5)
(2) ( ) ( ) Definition 4.7.2 (i) (1)
To prove part (xxviii): `
(1) Hyp
(2) ( ) ( ) part (xxvii)
(3) MP (2,1)
To prove part (xxix): ` ( ) ( )
(1) ( ) ( ) part (xxvii)
(2) ( ) ( ) part (xxvii)
(3) ( ) ( ) part (ii) (1,2)
To prove part (xxx): ` ( ) ( )
(1) ( ) ( ) Theorem 4.5.6 (xxvi)
(2) ( ) ( ) Theorem 4.5.6 (xxxi) (1)
(3) ( ) ( ) Definition 4.7.2 (ii) (2)
To prove part (xxxi): ` ( ) ( )
(1) ( ) ( ) part (xxx)
(2) ( ) ( ) part (xxx)
(3) ( ) ( ) part (ii) (1,2)
To prove part (xxxii): ` ( ( )) ( ( ))
To prove part (xxxvi): α ⇒ γ, β ⇒ γ, α ∨ β ⊢ γ
(1) α ⇒ γ  Hyp
(2) β ⇒ γ  Hyp
(3) α ∨ β  Hyp
(4) (α ⇒ γ) ⇒ ((β ⇒ γ) ⇒ ((α ∨ β) ⇒ γ))  part (xxxv)
(5) (β ⇒ γ) ⇒ ((α ∨ β) ⇒ γ)  MP (4,1)
(6) (α ∨ β) ⇒ γ  MP (5,2)
(7) γ  MP (6,3)
To prove part (xxxvii): α ⇒ γ, β ⇒ γ ⊢ (α ∨ β) ⇒ γ
(1) α ⇒ γ  Hyp
(2) β ⇒ γ  Hyp
(3) (α ⇒ γ) ⇒ ((β ⇒ γ) ⇒ ((α ∨ β) ⇒ γ))  Theorem 4.7.6 (viii)
(4) (β ⇒ γ) ⇒ ((α ∨ β) ⇒ γ)  MP (3,1)
(5) (α ∨ β) ⇒ γ  MP (4,2)
To prove part (xxxviii): ⊢ (α ∨ α) ⇒ α
(1) α ⇒ α  Theorem 4.5.6 (xi)
(2) (α ∨ α) ⇒ α  part (xxxvii) (1,1)
To prove part (xxxix): ⊢ α ⇒ (α ∨ α)
To prove part (xliv): ` ( ( )) (( ) )
(1) ( ) Theorem 4.7.6 (iii)
(2) ( ( )) (( ) ( )) Theorem 4.5.6 (ix) (1)
(3) ( ) (( ) ) part (xlii)
(4) (( ) ( )) (( ) ) PC 2
(5) ( ( )) (( ) ) Theorem 4.5.6 (iv) (2,4)
To prove part (xlv): ( ) ` ( )
(1) ( ) Hyp
(2) ( ( )) (( ) ) part (xliv)
(3) ( ) MP (1,2)
To prove part (xlvi): ` (( ) ) ( ( ))
(1) (( ) ) ( ( )) part (xliii)
(2) ( ( )) (( ) ) part (xliv)
(3) (( ) ) ( ( )) part (ii) (1,2)
To prove part (xlvii): ` (( ) ) ( ( ))
(1) (( ) ) ( ( )) part (xlvi)
(2) ( ( )) ( ( )) part (iii)
(3) (( ) ) ( ( )) part (xvi) (1,2)
To prove part (xlviii): ` (( ) ) (( ) )
(4) (( ) ) (( ) ) part (xlix)
(5) ( ( )) (( ) ) part (xvi) (3,4)
To prove part (liii): ` ( ( )) ( ( ))
(1) ( ( )) ( ( )) Theorem 4.5.6 (xiv)
(2) ( ( )) ( ( )) Definition 4.7.2 (i) (1)
To prove part (liv): ` ( ( )) ( ( ))
(1) ( ( )) ( ( )) part (iii)
(2) ( ( )) ( ( )) Definition 4.7.2 (i) (1)
To prove part (lv): ` ( ( )) (( ) )
(1) ( ( )) ( ( )) part (liv)
(2) ( ( )) (( ) ) part (xxix)
(3) ( ( )) (( ) ) part (xvi) (1,2)
(4) ( ) ( ) part (xxix)
(5) ( ( )) ( ( )) part (xxiii) (4)
(6) ( ( )) (( ) ) part (xvi) (5,3)
To prove part (lvi): `
(1) Hyp
(2) Theorem 4.5.6 (xxxviii) (1)
(3) Definition 4.7.2 (i) (2)
(4) ( ) Theorem 4.5.6 (xxv) (3)
(5) ( ) Definition 4.7.2 (ii) (4)
To prove part (lxiii): ` ( )
(1) Hyp
(2) ( ) Theorem 4.7.6 (iii)
(3) ( ( )) Theorem 4.7.6 (v)
(4) ( ) ( ( )) Theorem 4.5.6 (ii) (3)
(5) ( ) MP (1,4)
(6) ( ) part (ii) (2,5)
To prove part (lxiv): ` ( ) (( ) )
(1) ( ) Theorem 4.7.6 (iii)
(2) ( ) (( ) ) Theorem 4.5.6 (i)
To prove part (lxv): ` ( ) ( ( ))
(1) ( ( )) Theorem 4.7.6 (v)
(2) ( ) ( ( )) Theorem 4.5.6 (ii) (1)
To prove part (lxvi): ` ( ) (( ) ( ( )))
(1) ( ( )) Theorem 4.7.6 (v)
(2) ( ) (( ) ( ( ))) Theorem 4.6.4 (xii) (1)
To prove part (lxvii): , ` ( )
(1) Hyp
(2) Hyp
(3) ( ) (( ) ( ( ))) part (lxvi)
(4) ( ) ( ( )) MP (1,3)
(5) ( ) MP (2,4)
To prove part (lxviii): ( ), ( ) ` ( )
(1) ( ) Hyp
(2) ( ) Hyp
(3) (( ) ( )) part (lxvii)
(4) ( ) Definition 4.7.2 (iii) (3)
To prove part (lxix): ` ( ) (( ) )
(1) ( ) (( ) ) part (lxiv)
(2) ( ) ( ( )) part (lxv)
(3) ( ) (( ) ) part (lxviii) (1,2)
To prove part (lxx): ` ( ) ( )
(1) Hyp
(2) (() ) (() ) Theorem 4.5.6 (viii) (1)
(3) ( ) ( ) Definition 4.7.2 (i) (2)
To prove part (lxxi): ` ( ) ( )
(1) Hyp
(2) () () Theorem 4.5.6 (xxxi) (1)
(3) ( ) ( ) Theorem 4.5.6 (viii) (2)
(4) (( )) (( )) Theorem 4.5.6 (xxxi) (3)
(5) ( ) ( ) Definition 4.7.2 (ii) (4)
To prove part (lxxii): ` ( ) (( ) ( ))
(1) ( ) (() ()) Theorem 4.5.6 (xxviii)
(2) (() ()) (( ) ( )) Theorem 4.5.6 (vi)
(3) (( ) ( )) (( ) ( )) Theorem 4.5.6 (xxviii)
(4) (( ) ( )) (( ) ( )) Definition 4.7.2 (ii) (3)
(5) ( ) (( ) ( )) Theorem 4.5.6 (iv) (1,2)
(6) ( ) (( ) ( )) Theorem 4.5.6 (iv) (5,4)
To prove part (lxxiii): ` ( ( )) (( ) ( ))
(1) ( ) (() ()) Theorem 4.5.6 (xxviii)
(2) ( ( )) ( (() ())) Theorem 4.5.6 (viii) (1)
(3) ( (() ())) (( ()) ( ())) PC 2
(4) ( ( )) (( ()) ( ())) Theorem 4.5.6 (iv) (2,3)
(5) (( ()) ( ())) (( ()) ( ())) Theorem 4.5.6 (xxviii)
(6) (( ()) ( ())) (( ) ( )) Definition 4.7.2 (ii) (5)
(7) ( ( )) (( ) ( )) Theorem 4.5.6 (iv) (4,6)
To prove part (lxxiv): ` ( ( )) (( ) ( ))
(1) ( ) Theorem 4.7.6 (vi)
(2) ( ) ( ( )) part (lxxi) (1)
(3) ( ) Theorem 4.7.6 (vii)
(4) ( ) ( ( )) part (lxxi) (3)
(5) (( ) ( )) ( ( )) part (xxxvii) (2,4)
To prove part (lxxix): ` ( ( )) (( ) ( ))
(1) ( ( )) (( ) ( )) part (lxxvii)
(2) (( ) ( )) ( ( )) part (lxxviii)
(3) ( ( )) (( ) ( )) part (ii) (1,2)
To prove part (lxxx): ⊢ (α ∨ (α ∧ β)) ⇔ α
(1) α ⇒ α  Theorem 4.5.6 (xi)
(2) (α ∧ β) ⇒ α  Theorem 4.7.6 (iii)
(3) (α ∨ (α ∧ β)) ⇒ α  part (xxxvii) (1,2)
(4) α ⇒ (α ∨ (α ∧ β))  Theorem 4.7.6 (vi)
(5) (α ∨ (α ∧ β)) ⇔ α  part (ii) (3,4)
To prove part (lxxxi): ` ( ( ))
(1) ( ( )) Theorem 4.7.6 (iii)
(2) Theorem 4.5.6 (xi)
(3) ( ) Theorem 4.7.6 (vi)
(4) ( ( )) part (ii) (1,3)
4.7.10 Remark: Some basic logical equivalences which shadow the corresponding set-theory equivalences.
The purpose of Theorem 4.7.11 is to supply logical-operator analogues for the basic set-operation identities
in Theorem 8.1.6.
4.7.11 Theorem [pc]: The following assertions for wffs α, β and γ follow from the propositional calculus
in Definition 4.4.3.
(i) ⊢ (α ∨ β) ⇔ (β ∨ α). (Commutativity of disjunction.)
(ii) ⊢ (α ∧ β) ⇔ (β ∧ α). (Commutativity of conjunction.)
(iii) ⊢ (α ∨ α) ⇔ α.
(iv) ⊢ (α ∧ α) ⇔ α.
(v) ⊢ (α ∨ β) ∨ γ ⇔ α ∨ (β ∨ γ). (Associativity of disjunction.)
(vi) ⊢ (α ∧ β) ∧ γ ⇔ α ∧ (β ∧ γ). (Associativity of conjunction.)
(vii) ⊢ α ∨ (β ∧ γ) ⇔ (α ∨ β) ∧ (α ∨ γ). (Distributivity of disjunction over conjunction.)
(viii) ⊢ α ∧ (β ∨ γ) ⇔ (α ∧ β) ∨ (α ∧ γ). (Distributivity of conjunction over disjunction.)
(ix) ⊢ α ∨ (α ∧ β) ⇔ α. (Absorption of disjunction over conjunction.)
(x) ⊢ α ∧ (α ∨ β) ⇔ α. (Absorption of conjunction over disjunction.)
Part (viii) is the same as Theorem 4.7.9 (lxxix).
Part (ix) is the same as Theorem 4.7.9 (lxxx).
Part (x) is the same as Theorem 4.7.9 (lxxxi).
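The Boolean-algebra identities of Theorem 4.7.11 can likewise be confirmed semantically: each biconditional evaluates to true under every truth assignment. A brute-force sketch (not a derivation in the calculus):

```python
from itertools import product

def iff(a, b):
    # Biconditional under the truth-table interpretation.
    return a == b

for a, b, c in product([False, True], repeat=3):
    assert iff(a or b, b or a)                         # commutativity of disjunction
    assert iff(a and b, b and a)                       # commutativity of conjunction
    assert iff(a or a, a)                              # idempotence
    assert iff((a or b) or c, a or (b or c))           # associativity of disjunction
    assert iff((a and b) and c, a and (b and c))       # associativity of conjunction
    assert iff(a or (b and c), (a or b) and (a or c))  # distributivity
    assert iff(a and (b or c), (a and b) or (a and c)) # distributivity
    assert iff(a or (a and b), a)                      # absorption
    assert iff(a and (a or b), a)                      # absorption
```

These are exactly the laws which make the Lindenbaum algebra of propositional logic a Boolean algebra, mirroring the set-operation identities of Theorem 8.1.6.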
a tautology. One could make the claim that the argumentative method of propositional calculus mimics the
way in which people logically argue points, but this does not justify the excessive effort compared to truth-
table calculations. One could also make the argument that the axiomatic method gives a strong guarantee of
the validity of theorems. But the fact that the theorems are all derived from three axioms does not increase
the likelihood of their validity. In fact, one could regard reliance on a tiny set of axioms to be vulnerable in
the sense that a biological monoculture is vulnerable. Either they all survive or none survive. A population
which is cloned from a very small set of ancestor axioms could be entirely wiped out by a tiny fault in a
single axiom! It is somewhat disturbing to ponder the fact that the most crucial axioms and rules of the
whole edifice of mathematics are themselves completely incapable of being proved or verified in any way.
(See Remark 2.1.4 for Bertrand Russell's similar observations on this issue.)
Definition 6.3.9.)
To prove: α ⇒ β, α ⊢ β
(1) α ⇒ β  Hyp
(2) α  Hyp
(3) β  MP (1,2)
In other words, if the logical expression α ⇒ β can be proved, then β may be deduced from α. Perhaps
surprisingly, the reverse implication is not so easy to show because it is not declared to be a rule of the system.
However, the deduction metatheorem states that if β can be inferred from α, then the logical expression
α ⇒ β is a provable theorem in the axiomatic system. Unfortunately, there is no corresponding deduction
theorem which would prove this result within the axiomatic system. (The deduction metatheorem requires
mathematical induction in its proof, for example, which is not available within propositional calculus.) The
deduction metatheorem and modus ponens may be summarised as follows.
inference rule name      inference rule                      Curry code
modus ponens             given α and α ⇒ β, infer β          Pe
deduction metatheorem    given α ⊢ β, infer ⊢ α ⇒ β          Pi
Curry [301], page 176, gives the abbreviation Pe for modus ponens and Pi for the deduction metatheorem.
These signify implication elimination and implication introduction respectively. In a rule-based natural
deduction system, both of these are included in the logical framework, whereas in an assertional system
with only the MP inference rule, implication introduction is a metatheorem.
The conclusion is the same as the hypothesis. So the formal proof is finished as soon as it is started. When
attempting to prove ⊢ α ⇒ α, one has no hypothesis to the left of the assertion symbol. So one must arrive at
conclusions from the axioms and MP alone. Since the compound logical expression α ⇒ α does not appear
amongst the axiom templates, the proof must have at least two lines as follows.
To prove: ⊢ α ⇒ α
(1) [logical expression?]  [axiom instance?]
(2) α ⇒ α  [justification?]
Here the logical expression on line (1) must be an axiom instance, since there is no other source of logical
expressions. This must lead to α ⇒ α on line (2) by MP, since there is only one inference rule. However,
MP requires two (different) inputs to generate one output. So at least three lines of proof are required.
To prove: ⊢ α ⇒ α
(1) [logical expression?]  [axiom instance?]
(2) [logical expression?]  [axiom instance?]
(3) α ⇒ α  MP (2,1)
With a little effort, one may convince oneself that such a 3-line proof is impossible. The MP rule requires
one input with the ⇒ symbol between two logical subexpressions, where the subexpression on the left
is on the other input line, and α ⇒ α is the subexpression on the right. This can only be achieved with
the axiom instances α ⇒ (α ⇒ α) or (¬α ⇒ ¬α) ⇒ (α ⇒ α). In neither case can the left subexpression
be obtained from an axiom instance. Therefore a 3-line proof is not possible. One may continue in this
fashion, systematically increasing the number of lines and considering all possible proofs of that length. The
following is a solution to the problem.
To prove: ⊢ α ⇒ α
(1) α ⇒ ((α ⇒ α) ⇒ α)  PC 1: [α], [α ⇒ α]
(2) (α ⇒ ((α ⇒ α) ⇒ α)) ⇒ ((α ⇒ (α ⇒ α)) ⇒ (α ⇒ α))  PC 2: [α], [α ⇒ α], [α]
(3) (α ⇒ (α ⇒ α)) ⇒ (α ⇒ α)  MP (2,1)
(4) α ⇒ (α ⇒ α)  PC 1: [α], [α]
(5) α ⇒ α  MP (4,3)
The proof could possibly be shortened by a line (although it is difficult to see how). But it is clear that a
quick conversion of the proof of α ⊢ α into a proof of ⊢ α ⇒ α is not likely. If it is not obvious how to
convert the most trivial one-line proof, it is difficult to imagine that a 20-line proof will be easy either.
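The five-line proof above can also be verified mechanically. The following sketch (the tuple representation and function names are illustrative, not from this book) builds each line from the axiom schemas PC 1 and PC 2 and checks every MP step.

```python
# Represent wffs as nested tuples: ('->', A, B) for an implication, 'a' for the atom.
def imp(a, b):
    return ('->', a, b)

def pc1(a, b):
    # PC 1 schema: A => (B => A)
    return imp(a, imp(b, a))

def pc2(a, b, c):
    # PC 2 schema: (A => (B => C)) => ((A => B) => (A => C))
    return imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c)))

def mp(major, minor):
    # Modus ponens: from A => B and A, infer B.
    op, ante, cons = major
    assert op == '->' and ante == minor, "modus ponens does not apply"
    return cons

A = 'a'
line1 = pc1(A, imp(A, A))     # a => ((a => a) => a)
line2 = pc2(A, imp(A, A), A)  # (a => ((a => a) => a)) => ((a => (a => a)) => (a => a))
line3 = mp(line2, line1)      # (a => (a => a)) => (a => a)
line4 = pc1(A, A)             # a => (a => a)
line5 = mp(line3, line4)      # a => a
assert line5 == imp(A, A)
```

Each `mp` call raises an error unless the minor premise matches the antecedent of the major premise exactly, so reaching the final assertion certifies the proof.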
ν is a truth value map as in Definition 3.2.3, and Kα ⊆ 2^P is the knowledge set denoted by α.
The assertion α ⊢ β for compound propositions α and β has the meaning ν ∈ (2^P \ Kα) ∪ Kβ, where
Kβ ⊆ 2^P is the knowledge set denoted by β.
The assertion ⊢ α ⇒ β for compound propositions α and β has the meaning Kα ⇛ Kβ, where the
triple arrow ⇛ denotes implication in the semantic (metamathematical) framework. This is equivalent to
the (naive) set-inclusion Kα ⊆ Kβ. From this, it follows that any ν ∈ 2^P must satisfy ν ∈ (2^P \ Kα) ∪ Kβ.
(To see this, note that ν ∈ 2^P implies ν ∈ Kα ∪ (2^P \ Kα) ⊆ Kβ ∪ (2^P \ Kα).) However, it does not follow
that the validity of ν ∈ (2^P \ Kα) ∪ Kβ for a single truth value map ν implies the set-inclusion Kα ⊆ Kβ. To
obtain this conclusion, one requires the validity of ν ∈ (2^P \ Kα) ∪ Kβ for all possible truth value maps ν.
Thus in semantics terms, the assertion α ⊢ β implies ν ∈ (2^P \ Kα) ∪ Kβ, which implies the validity of the
assertion ⊢ α ⇒ β. So in terms of semantics, the deduction metatheorem is necessarily true. Therefore
one might reasonably ask why implication introduction is not a fundamental rule of any logical system. Any
reasonable logical system should guarantee that ⊢ α→β follows from α ⊢ β if it accurately describes
the underlying semantics. The validity of the implication introduction rule is thrown into doubt in minimalist
logical systems which are based on modus ponens as the sole inferential rule. Natural deduction systems
which have rules that mimic real-world mathematics do not need to have such doubt because they can
prescribe implication introduction as an a-priori rule.
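The semantic argument above is easy to spot-check by brute force. The following sketch (names hypothetical) computes knowledge sets over a two-element concrete proposition space and confirms that K_α ⊆ K_β holds exactly when (2^P \ K_α) ∪ K_β exhausts 2^P:

```python
from itertools import combinations

# Concrete proposition space P = {p, q}.  A truth value map tau is
# identified with the subset of P which it makes true, so the set of
# all truth value maps is 2^P.
P = ('p', 'q')
maps = [frozenset(c) for r in range(len(P) + 1) for c in combinations(P, r)]

def holds(w, tau):
    """Evaluate a wff: an atom, ('not', a), or ('->', a, b)."""
    if isinstance(w, str):
        return w in tau
    if w[0] == 'not':
        return not holds(w[1], tau)
    return (not holds(w[1], tau)) or holds(w[2], tau)

def K(w):
    """Knowledge set of w: the truth value maps under which w holds."""
    return frozenset(tau for tau in maps if holds(w, tau))

# "alpha |- beta" reads as K(alpha) <= K(beta); "|- alpha -> beta" reads
# as (2^P \ K(alpha)) union K(beta) = 2^P.  The two readings always agree.
allmaps = frozenset(maps)
for alpha in ['p', ('not', 'p'), ('->', 'p', 'q')]:
    for beta in ['q', 'p', ('->', 'q', 'p')]:
        lhs = K(alpha) <= K(beta)
        rhs = (allmaps - K(alpha)) | K(beta) == allmaps
        assert lhs == rhs
```

This is only a finite illustration of the semantic equivalence, of course; the burden discussed in the text is to prove the syntactic counterpart within the formal system itself.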
The requirement to prove the deduction metatheorem may be thought of as a burden of proof for any
logical system which declines to include the implication introduction rule amongst its a-priori rules. This
includes all systems which rely upon modus ponens as the sole rule. This provides an incentive to abandon
assertional systems in favour of rule-based natural deduction systems.
Non-MP rules become even more compelling in predicate calculus, where Rule G and Rule C, for example,
must be proved as metatheorems in assertional systems. (See Remark 6.1.7 for Rules G and C.) This suggests
that the austere kind of axiomatic framework in Definition 4.4.3 is too austere for practical mathematics.
In fact, even for the purposes of pure logic, it seems of dubious value to discard a necessary rule and then
have to work hard to prove it in a precarious metatheorem. Such an approach only seems reasonable if one
starts from the assumption that the theory or language is the primary reality and that the underlying
meaning (or model) is of secondary importance. Making the deduction metatheorem an a-priori rule only
seems like cheating if one assumes that the language is the real thing and the underlying space of concrete
propositions is a secondary construction.
4.8.6 Remark: Literature for the deduction metatheorem for propositional calculus.
Presentations of the deduction metatheorem (and its proof) for various axiomatisations of propositional
calculus are given by several authors as indicated in Table 4.8.1. Most authors refer to the deduction
metatheorem as the "deduction theorem". (The propositional algebra used in the presentation by Curry [301],
pages 178–181, lacks the negation operator. So it is apparently not equivalent to the propositional calculus
in Definition 4.4.3.)
The deduction metatheorem is attributed to Herbrand [368, 369] by Kleene [315], page 98; Kleene [316],
page 39; Curry [301], page 249; Lemmon [317], page 32; Margaris [319], page 191; Suppes [344], page 29.
4.8.7 Remark: Different deduction metatheorem proofs for different axiomatic systems.
Each axiomatic system requires its own deduction metatheorem proof because the proofs depend very much
on the formalism adopted. (See Remark 6.6.25 for the deduction metatheorem for predicate calculus.)
cannot generally progress in mathematics without accepting it. A concept which is taught at a young
enough age is generally accepted as obvious.)
One could call these sorts of logic theorems by various names, like "pseudo-theorems", "metatheorems"
or "fantasy theorems". In this book, they will be called "naive theorems" since they are proved using naive
logic and naive set theory.
4.8.9 Remark: Notation and the space of formulas for the deduction metatheorem.
Theorem 4.8.11 attempts to provide a corollary for modus ponens.
To formulate naive Theorem 4.8.11, let W denote the set of all possible wffs in the propositional calculus in
Definition 4.4.3,
let W^n denote the set of all sequences of n wffs for non-negative integers n, and let List(W)
denote the set ⋃_{n=0}^∞ W^n of sequences of wffs with non-negative length. An element of W^n is said to be a wff
sequence of length n. The concatenation of two wff sequences ∆₁ and ∆₂ is denoted with a comma as "∆₁, ∆₂".
To prove P(1), it must be shown that ∆ ⊢ α→β₁ is provable by some proof ∆₀ ∈ List(W). Every line of
the proof must be either (a) an axiom (possibly with some substitution for the wff names), (b) an element
of the wff sequence ∆, (c) the wff α, or (d) the output of a modus ponens rule application. Line 1 of the
proof cannot be a modus ponens output. So there are only three possibilities. Suppose β₁ is an axiom.
Then the following argument is valid.
∆ ⊢ α→β₁
(1) β₁    (from some axiom)
(2) β₁→(α→β₁)    PC 1
(3) α→β₁    MP (2,1)
If β₁ is an element of the wff sequence ∆, the situation is almost identical.
∆ ⊢ α→β₁
(1) β₁    Hyp (from ∆)
(2) β₁→(α→β₁)    PC 1
(3) α→β₁    MP (2,1)
If β₁ equals α, the following proof works.
∆ ⊢ α→β₁
(1) α→α    Theorem 4.5.6 (xi)
When a theorem is quoted as above, it means that a full proof could be written out inline according to the
proof of the quoted theorem. Now the proposition P(1) is proved. So it remains to show that P(k−1) ⇒ P(k)
for all k > 1.
Now suppose that P(k−1) is true for an integer k with 1 < k ≤ m. That is, assume that the assertion
∆ ⊢ α→β_i is provable for all integers i with 1 ≤ i < k. Line k of the original proof must have been
justified by one of the four reasons (a) to (d) given above. Cases (a) to (c) lead to a proof of ∆ ⊢ α→β_k
exactly as for establishing P(1) above.
In case (d), line k of the proof is arrived at by modus ponens from two lines i and j with 1 ≤ i < k
and 1 ≤ j < k with i ≠ j, where β_j has the form β_i→β_k. By the inductive hypothesis P(k−1), there
are valid proofs ∆′ ∈ W^{m′} for ∆ ⊢ α→β_i and ∆′′ ∈ W^{m′′} for ∆ ⊢ α→β_j. (For the principle of
mathematical induction, see Theorem 12.2.11.) Then the concatenated argument ∆′, ∆′′ ∈ W^{m′+m′′} has
α→β_i on line (m′) and α→(β_i→β_k) on line (m′+m′′). A proof of ∆ ⊢ α→β_k may then be constructed
as an extension of the argument ∆′, ∆′′ as follows.
∆ ⊢ α→β_k
(m′) α→β_i    (above lines of ∆′)
(m′+m′′) α→(β_i→β_k)    (above lines of ∆′′)
(m′+m′′+1) (α→(β_i→β_k))→((α→β_i)→(α→β_k))    PC 2
(m′+m′′+2) (α→β_i)→(α→β_k)    MP (m′+m′′+1, m′+m′′)
(m′+m′′+3) α→β_k    MP (m′+m′′+2, m′)
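This induction is effectively an algorithm: it mechanically rewrites a proof of ∆, α ⊢ β into a proof of ∆ ⊢ α→β. The following sketch (hypothetical tuple encoding, names illustrative) emits the converted line sequence, using the three-line pattern for cases (a) and (b), the quoted proof of ⊢ α→α for case (c), and the PC 2 extension for case (d):

```python
def imp(a, b):
    """Build the implication wff a -> b."""
    return ('->', a, b)

def convert(a, proof):
    """Rewrite a proof of "delta, a |- beta" into one of "delta |- a -> beta".

    `proof` is the original line sequence (wffs), each line assumed to be
    justified as (a) an axiom, (b) a member of delta, (c) the wff a, or
    (d) modus ponens from two earlier lines.  Returns the converted line
    sequence, whose last line is a -> beta.
    """
    out = []
    for k, b in enumerate(proof):
        target = imp(a, b)
        # Detect case (d): b follows by MP from some earlier beta_i and
        # beta_j = beta_i -> b.
        mp = next((i for i, bi in enumerate(proof[:k])
                   if imp(bi, b) in proof[:k]), None)
        if b == a:
            # Case (c): quote the 5-line proof of |- a -> a.
            aa = imp(a, a)
            out += [imp(a, imp(aa, a)),
                    imp(imp(a, imp(aa, a)), imp(imp(a, aa), aa)),
                    imp(imp(a, aa), aa),
                    imp(a, aa),
                    aa]
        elif mp is None:
            # Cases (a) and (b): b itself, then a PC 1 instance, then MP.
            out += [b, imp(b, target), target]
        else:
            # Case (d): a PC 2 instance, then two MP steps, using the
            # earlier converted lines a -> beta_i and a -> (beta_i -> b).
            bi = proof[mp]
            out += [imp(imp(a, imp(bi, b)), imp(imp(a, bi), target)),
                    imp(imp(a, bi), target),
                    target]
    return out

# The one-line proof of "a |- a" becomes the 5-line proof of |- a -> a.
assert convert('p', ['p'])[-1] == imp('p', 'p')
# A modus ponens proof of "q -> r, q |- r" becomes one of "q -> r |- q -> r".
assert convert('q', ['q', imp('q', 'r'), 'r'])[-1] == imp('q', 'r')
```

Note the blow-up: each original line expands to three or five converted lines, which quantifies the remark that deduction-metatheorem conversions are mechanical but not short.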
⊢ (A ∧ (A→B))→B.
A ∧ (A→B) ⊢ B.
A, (A→B) ⊢ B.
A ⊢ (A→B)→B.
More generally, logical expressions may be freely moved between the left and the right of the assertion symbol
"⊢" in this fashion.
4.8.13 Remark: Substitution in a theorem proof yields a proof of the substituted theorem.
If all lines of a theorem are uniformly substituted with arbitrary logical expressions, the resulting substituted
deduction lines form a valid proof of the similarly substituted theorem. (See for example Kleene [315],
pages 109–110.)
4.9.1 Remark: The original selection of a propositional calculus for this book.
Definition 4.9.2 is the propositional calculus system which was originally selected as the basis for propositional
logic theorems in this book. It differs from Definition 4.4.3 only in axiom (PC′ 3). (This system is also used
by E. Mendelson [320], pages 30–31.) Like Definition 4.4.3, Definition 4.9.2 is a one-rule, two-symbol, three-
axiom system with "→" and "¬" as logical operators and modus ponens as the sole inference rule.
4.9.2 Definition [mm]: The following is the axiomatic system PC′ for propositional calculus.
(i) The proposition names are lower-case and upper-case letters of the Roman alphabet, with or without
integer subscripts.
(ii) The basic logical operators are "→" (implies) and "¬" (not).
(iii) The punctuation symbols are "(" and ")".
(iv) Logical expressions (wffs) are specified recursively. Any proposition name is a wff. For any wff α, the
substituted expression (¬α) is a wff. For any wffs α and β, the substituted expression (α→β) is a wff.
Any expression which cannot be constructed by recursive application of these rules is not a wff. (For
clarity, parentheses may be omitted in accordance with customary precedence rules.)
(v) The logical expression names (wff names) are lower-case letters of the Greek alphabet, with or without
integer subscripts.
(vi) The logical expression formulas (wff-wffs) are logical expressions whose letters are logical expression
names.
(vii) The axiom templates are the following logical expression formulas:
(PC′ 1) α→(β→α).
(PC′ 2) (α→(β→γ))→((α→β)→(α→γ)).
(PC′ 3) (¬β→¬α)→((¬β→α)→β).
(viii) The inference rules are substitution and modus ponens. (See Remark 4.3.17.)
(MP) α, α→β ⊢ β.
4.9.4 Theorem [pc′]: The following assertions follow from Definition 4.9.2.
(i) α→β, β→γ ⊢ α→γ.
(ii) α→(β→γ) ⊢ β→(α→γ).
(iii) ⊢ (¬β→¬α)→(α→β).
(2) β→γ    Hyp
(3) (β→γ)→(α→(β→γ))    PC′ 1: [β→γ], [α]
(4) α→(β→γ)    MP (3,2)
(5) (α→(β→γ))→((α→β)→(α→γ))    PC′ 2: [α], [β], [γ]
(6) (α→β)→(α→γ)    MP (5,4)
(7) α→γ    MP (6,1)
To prove part (ii): α→(β→γ) ⊢ β→(α→γ)
(1) α→(β→γ)    Hyp
(2) β→(α→β)    PC′ 1: [β], [α]
(3) (α→(β→γ))→((α→β)→(α→γ))    PC′ 2: [α], [β], [γ]
(4) (α→β)→(α→γ)    MP (3,1)
(5) β→(α→γ)    part (i) (2,4): [β], [α→β], [α→γ]
To prove part (iii): ⊢ (¬β→¬α)→(α→β)
(1) α→(¬β→α)    PC′ 1: [α], [¬β]
(2) (¬β→¬α)→((¬β→α)→β)    PC′ 3: [α], [β]
(3) (¬β→α)→((¬β→¬α)→β)    part (ii) (2): [¬β→¬α], [¬β→α], [β]
(4) α→((¬β→¬α)→β)    part (i) (1,3): [α], [¬β→α], [(¬β→¬α)→β]
(5) (¬β→¬α)→(α→β)    part (ii) (4): [α], [¬β→¬α], [β]
This completes the proof of Theorem 4.9.4.
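As a sanity check on the reconstruction, axiom PC′ 3 and the derived assertion of part (iii) can be confirmed to be tautologies by exhausting the four truth assignments. A minimal sketch (function names are illustrative):

```python
from itertools import product

def implies(a, b):
    """Classical material implication over {False, True}."""
    return (not a) or b

def pc3(a, b):
    # PC' 3: (not b -> not a) -> ((not b -> a) -> b)
    return implies(implies(not b, not a), implies(implies(not b, a), b))

def part_iii(a, b):
    # Part (iii): (not b -> not a) -> (a -> b)
    return implies(implies(not b, not a), implies(a, b))

# Both schemas hold under all four assignments to (a, b).
assert all(pc3(a, b) and part_iii(a, b)
           for a, b in product([False, True], repeat=2))
```

Semantic validity is of course a much weaker check than the syntactic derivation given above; it only confirms that nothing false has been proved.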
4.9.5 Remark: Single axiom schema for the implication and negation operators.
It is stated by E. Mendelson [320], page 42, that the single axiom schema
((((α→β)→(¬γ→¬δ))→γ)→ε)→((ε→α)→(δ→α))
was shown by Meredith [374] to be equivalent to the three-axiom Definition 4.9.2. (Consequently it is
equivalent also to Definition 4.4.3.) The minimalism of a single-axiom formalisation is clearly past the point
of diminishing returns. Even the three-axiom sets are difficult to regard as intuitively obvious.
It should be remembered that the main justification for an axiomatic system is that one merely needs to
accept the obvious validity of the axioms and rules (as for example in the case of the axioms of Euclidean
geometry), and then all else follows from the axioms and rules. If the axioms are opaque, the whole raison
d'être of an axiomatic system is null and void. An axiomatic system should lead from the obvious to the
not-obvious, not from the not-obvious to the obvious, which is what these formalisms in fact achieve.
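Brute-force truth tables at least confirm that Meredith's schema, as reconstructed above, is semantically sound: it evaluates to true under all 32 assignments of its five letters. A small sketch:

```python
from itertools import product

def implies(a, b):
    """Classical material implication over {False, True}."""
    return (not a) or b

def meredith(a, b, c, d, e):
    # ((((a -> b) -> (not c -> not d)) -> c) -> e) -> ((e -> a) -> (d -> a))
    antecedent = implies(
        implies(implies(implies(a, b), implies(not c, not d)), c), e)
    return implies(antecedent, implies(implies(e, a), implies(d, a)))

# The schema is a tautology: true under every one of the 2^5 assignments.
assert all(meredith(*v) for v in product([False, True], repeat=5))
```

Being a tautology is necessary but not sufficient for Meredith's result; the hard part is that all of classical implication-negation logic is derivable from this one schema with modus ponens.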
Chapter 5
Predicate logic
Figure 5.0.1 Classes of logic systems
5.0.3 Remark: Predicate logic attempts to formalise theories which describe infinities.
Mathematical logic would be a shallow subject if infinities played no role. It is the infinities which make
mathematical logic interesting. (One could perhaps make the even bolder claim that it is the infinities
which prevent mathematics in general from being shallow!) Since infinities range from slightly metaphysical
to extremely metaphysical, it is not possible to settle questions regarding infinite logic and infinite sets by
reference to concrete examples. At best, one may hope to make the theory correspond accurately to the
metaphysical universes which are imagined to exist beyond the finite world which we can perceive. Predicate
logic is primarily an attempt to formalise logical argumentation for systems where infinities play a role.
5.1.1 Remark: Predicate logic is parametrised propositional logic with bulk handling facilities.
Predicate logic introduces parameters for the propositions of propositional logic. If the concrete proposition
domain is finite, the use of parameters is a mere management convenience, permitting bulk handling of
large classes as opposed to individual treatment, but for an infinite set of propositions, the parametrised
management of propositions and their truth values is unavoidable. It is true that all of propositional logic is
immediately and directly applicable to arbitrary infinite collections of propositions, but propositional logic
lacks the bulk handling facilities. The additional facilities of predicate logic consist of just two quantifiers,
namely the universal and existential quantifiers. (So it would be fairly justified to refer to predicate logic as
"quantifier logic" or "propositional logic with quantifiers".)
In practice, proposition parameters are typically thought of as objects of some kind. Examples of kinds
of objects are numbers, sets, points, lines, and instants of time. Although the parameters for propositions
may be in multiple object-universes, it is usually most convenient to define the universal and existential
quantifiers to apply to a single combined object-universe. Then individual classes of objects are selected by
applying preconditions to parameters in order to restrict the application of proposition quantifiers to the
relevant object classes.
relations of objects. This suggests some kind of dual relationship between objects and propositions, similar
to the relationship between rows and columns of matrices. One may focus on either the propositions or
the objects as the primary entities in a logical system. In predicate logic, the focus shifts to a great extent
from the propositions to the objects. This is related to the concept of object-oriented programming in
computer science, where the focus is shifted from functions and procedures to objects. Then functions and
procedures become secondary concepts which derive their significance from the objects to which they are
attached. Similarly in predicate logic, attributes and relations become secondary concepts which derive their
significance from the objects on which they act.
5.1.4 Remark: Name maps for predicates, variables and propositions in predicate logic.
Predicate calculus requires two kinds of names, namely (1) names which refer to predicates and (2) names
which refer to parameters for the predicates. Thus the notation "P(x)" refers to the proposition which
is output by the predicate referred to by "P" when the input is a variable which is referred to by the
symbol "x". In pure predicate calculus, there are no constant predicates (such as "=" and "∈" in set
theory). Therefore the symbols "P" and "x" may refer to any predicate and variable respectively. However,
within the scope of these symbols, their meaning or binding does not change. Similarly, the notation "Q(x, y)"
refers to the proposition which is output by the predicate bound to "Q" from inputs which are bound to "x"
and "y". And so forth for any number of predicate parameters.
The name-to-object maps (or bindings) for a predicate language are summarised in the following table.
The maps are fixed within their scope.
name map    name space    object domain    object type
µV          NV            V                variables
µQ          NQ            Q                predicates
µP          NP            P                propositions
The choice of the words "space" and "domain" here is fairly arbitrary. (The most important thing is to
avoid using the word "set".) The spaces and maps in the above table are illustrated in Figure 5.1.1.
In Figure 5.1.1, the proposition name "Q(y, z)" would be mapped by µP to µP(Q(y, z)), which is the
output from the predicate µQ(Q) when it is applied to the variables µV(y) and µV(z). Thus one could
write µP(Q(y, z)) = µQ(Q)(µV(y), µV(z)). Then if "Q" is "∈", "y" is "∅" and "z" is "{∅}", one
could write µP(∅ ∈ {∅}) = µQ(∈)(µV(∅), µV({∅})).
[Figure 5.1.1: the name spaces NV, NQ, NP, the name maps µV, µQ, µP, and the concrete spaces V, Q, P.]
The space Q of predicates may be partitioned according to the number of parameters of each predicate.
Thus Q = Q₀ ∪ Q₁ ∪ Q₂ ∪ . . . , where Qₖ is the space of predicates which have k parameters. The predicates
f ∈ Qₖ have the form f : Vᵏ → {F, T}. In terms of the list notation in Section 14.11, the predicates f ∈ Q
have the form f : List(V) → {F, T}.
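The arity partition of Q can be modelled directly in programming terms, with each k-parameter predicate a total map from V^k to {F, T}. A small sketch with a hypothetical three-element variable space (the names and the membership table are illustrative, not from the text):

```python
from itertools import product

# Hypothetical concrete variable space V: three toy "sets".
V = ['0', '{0}', '{0,{0}}']

# A predicate in Q_2 is a map from V^2 to {F, T}.  Here q2 encodes a
# hand-chosen membership relation for the toy universe above.
member = {('0', '{0}'), ('0', '{0,{0}}'), ('{0}', '{0,{0}}')}

def q2(x, y):
    """Two-parameter predicate: "x is a member of y" in the toy universe."""
    return (x, y) in member

# q2 is total on V^2: it assigns a truth value to each of the 9 pairs.
table = {(x, y): q2(x, y) for x, y in product(V, repeat=2)}
assert len(table) == 9
assert table[('0', '{0}')] and not table[('{0}', '0')]
```

A predicate in the List(V) formulation would instead accept a tuple of any length; the fixed-arity form above corresponds to the partition Q = Q₀ ∪ Q₁ ∪ Q₂ ∪ . . . described in the text.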
5.1.6 Remark: Five layers of linguistic structure. Logical expressions and logical expression names.
The five layers of metamathematical language for propositional calculus in Figure 4.1.1 in Remark 4.1.7 are
applicable also to predicate calculus. The main difference is that the logical operators are extended by the
addition of universal and existential quantifiers in the predicate calculus. When writing axioms, for example,
one uses logical expression names to refer to logical expressions whose individual letters refer to entire
proposition names. For example, in the logical expression name ∀x, (α(x)→β(x)), the letters α and β
could refer to logical expressions such as P(x) ∧ A and Q(x) ∧ A respectively. Then ∀x, (α(x)→β(x))
would mean ∀x, ((P(x) ∧ A)→(Q(x) ∧ A)) in the language of the theory. Thus ∀x, (α(x)→β(x)) is
a kind of template for logical expressions which have a particular form. The distinction becomes important
when writing axioms and theorems for a language. Axioms and theorems are generally written in template
form, and proofs of theorems are really templates of proofs which are instantiated by replacing logical
expression names with specific logical expressions in the language. (This idea is illustrated in Figure 5.1.2.)
Figure 5.1.2 Concrete objects, names, logical expressions and logical expression names
As an extension of notation, logical predicates which are always true or always false and which have one or more
parameters may be denoted as for functions. Thus, for example, ⊤(x, y, z) would be a true proposition for
any variables x, y and z, and ⊥(a, b, c, d) would be a false proposition for any variables a, b, c and d. Luckily
such notation is rarely needed.
Figure 5.1.3 Mapping the constant name ∅ to a concrete predicate
C is constant with respect to y, but not with respect to x. Thus one may typically write C(x) to informally
suggest that C may depend on x but not on y.
So the question arises in the case of names of predicates, functions and variables as to what a constant name
should be constant with respect to. The simple answer is that constant names are constant with respect to
name maps, but this requires some explanation.
Consider first the name maps µV : NV → V from the individual name space NV to the individual object
space V. (The individual spaces are usually called "variable spaces", but this is confusing when one is
talking about constant variables. A more accurate term would be "predicate parameter spaces".) In
principle, the language defined in terms of the name space should be invariant under arbitrary choices of the
name map µV. That is, all of the axioms and rules should be valid for any given name map. Therefore the
axioms and rules should be invariant under any permutation of the elements of the object space V.
If all elements of a concrete space are equivalent, there is no need to be concerned with the choice of names.
But this is not usually the case. More typically, each element of a concrete space has a unique character
which is of interest in the study of that space. In this case, one often wishes to have fixed names for some
concrete objects.
Consider the example of the empty set ∅ in ZF set theory. Let a ∈ V be the empty set in a concrete ZF set
theory (often called a "model" of the theory). Then the name A ∈ NV in the parameter name space NV
points to a if and only if µV(A) =c a, where "=c" : V × V → P denotes the equality relation on the concrete
parameter space V. (See Figure 5.1.4.)
Figure 5.1.4 Definition of a constant name
A variable name C ∈ NV can be said to be constant if ∀µ1V, µ2V ∈ V^NV, µ1V(C) =c µ2V(C), where V^NV
space Q.
The constant empty set name "∅" is characterised by the predicate P(X) = "∀y, y ∉ X", which depends on the
variable name "∈" for a membership relation which satisfies the ZF axioms. However, this is not a problem
which anyone worries about. The choice of predicate name map µQ : NQ → Q is very strongly constrained
by the ZF set axioms. Then in terms of the choice of the membership relation "∈", the name "∅" is uniquely
constrained by the predicate P(∅), which is true only if the name "∅" points to the unique empty set for the
particular choice of predicate name map for "∈". Then we have a =c µV(∅).
In conclusion, the definition of a constant name (for a predicate parameter) requires a definition of equality
on the concrete parameter space. This implies that a first-order language without equality cannot have
constant names. Moreover, at least one additional constant predicate must be specified in order to distinguish
one parameter from another, so that one will know which concrete object is being pointed to by the constant.
Many logic textbooks become particularly confusing when they write their inference rules in the abbreviated
style. In this book, in Definition 6.3.9 rule (UI) for example, one sees a sentence similar to: "From ⊢ α(y),
infer ⊢ ∀x, α(x)." In the abbreviated notation convention, one must add a subsidiary note such as "where
x is not free in α" to a sentence such as: "From ⊢ α, infer ⊢ ∀x, α." When multiple inference rules
must be applied in complex practical proofs of logic theorems, the verification of the correct application of
informally written subsidiary conditions is error-prone, which tends to defeat the main purpose of formal
logic, which is to guarantee correctness as far as possible.
The habit of suppressing implicit parameters is seen almost universally in the mathematics literature. This
is particularly so in differential geometry and partial differential equations. Thus one writes "∆u = f" as an
abbreviation for "∀x ∈ ℝⁿ, ∆u(x) = f(x)", for example. Free variables, such as x in this case, are often
suppressed, and their universes are generally indicated somewhere in the context. Complicated tensor
expressions are quite often tedious to write out in full. For example, the equation
$R^i_{jk\ell} = \Gamma^i_{j\ell,k} - \Gamma^i_{jk,\ell} + \Gamma^m_{j\ell} \Gamma^i_{mk} - \Gamma^m_{jk} \Gamma^i_{m\ell}$
suppresses the quantifiers and index ranges for all of its free indices.
As a rule of thumb, such expressions are interpreted to mean that the parametrised proposition is being
asserted for all values of all free variables in the universes defined within the context for those variables.
This is only a problem if multiple contexts are combined and the meanings therefore become ambiguous.
The beneficial purpose of suppressing proposition and object parameters, and universal quantifiers for free
variables, is to focus the minds of readers and writers on particular aspects of expressions and propositions,
1968 Smullyan [341]            (∀x)P x          (∃x)P x
1969 Bell/Slomson [290]        (∀x)P (x)        (∃x)P (x)
1969 Robbin [334]              ∀xP(x)           ∃xP(x)
1970 Quine [333]               ∀x F x           ∃x F x
1971 Hirst/Rhodes [311]        ∀x, P (x)        ∃x, P (x)
1973 Chang/Keisler [298]       (∀x)φ(x)         (∃x)φ(x)
1973 Jech [314]                ∀x φ(x)          ∃x φ(x)
1974 Reinhardt/Soeder [120]    ⋀x P (x)         ⋁x P (x)
1979 Levy [318]                ∀x φ             ∃x φ
1980 EDM [108]                 ∀xF (x), (x)F (x), ⋀xF (x), ΛxF (x)    ∃xF (x), (Ex)F (x), ⋁xF (x), VxF (x)
1982 Moore [321]               ∀xP (x)          ∃xP (x)
1990 Roitman [335]             ∀x φ(x)          ∃x φ(x)
1993 Ben-Ari [291]             ∀x P (x)         ∃x P (x)
1993 EDM2 [109]                ∀xF (x)          ∃xF (x)
1996 Smullyan/Fitting [342]    (∀x)P (x)        (∃x)P (x)
1998 Howard/Rubin [312]        (∀x)P (x)        (∃x)P (x)
2004 Huth/Ryan [313]           ∀x P (x)         ∃x P (x)
2004 Szekeres [266]            ∀x(P (x))        ∃x(P (x))
Kennington                     ∀x, P (x)        ∃x, P (x)
while placing routine parameters in the background. There is a trade-off here between precision and focus.
One may sometimes wonder whether a writer even knows how to write out an abbreviated expression in
full. When mathematical expressions seem difficult to comprehend, it is often a rewarding exercise to seek
out all of the variables, and the universes for those variables, and write down fully explicit versions of those
expressions, showing all quantifiers, all variables, and all universes for those variables.
As mentioned also in Remark 6.6.3, the abbreviated style of predicate logic notation where proposition
parameters are suppressed is not adopted in this book. Thus if a proposition parameter list is not shown,
then the proposition has no parameters. If one parameter is shown, then the proposition belongs to a single-
parameter family with one and only one parameter. And so forth for any number of indicated parameters.
This rule implies that so-called "bound variables" are suppressed. Thus one may write, for example, A(x)
to mean ∀y ∈ U, P(x, y), which roughly speaking may be written as "A(x) = ∀y ∈ U, P(x, y)". If one
expands the meaning of A(x), the bound variable y appears. But since it is bound by the quantifier "∀y", it
is not a free variable in A(x). Therefore it is not indicated, because (A(x))_{x∈U} is a single-parameter family
of propositions. The expression "∀y ∈ U, P(x, y)" yields a single proposition for each choice of the free
variable x.
To be a little more precise, a two-parameter family of propositions (P(x, y))_{x,y∈U} is (in metamathematical
language) effectively a map P : U × U → P from pairs (x, y) ∈ U × U to concrete propositions P(x, y) ∈ P,
for some universe of parameters U and some concrete proposition domain P. The expression "A(x) =
∀y ∈ U, P(x, y)" then denotes a one-parameter family of propositions (A(x))_{x∈U}, which is effectively
a map A : U → P. The logical expression "∀y ∈ U, P(x, y)" determines one and only one proposition
in P for each choice of x ∈ U. (The semantic interpretation of "A(x) = ∀y ∈ U, P(x, y)" is discussed
for example in Remark 5.3.11.) Therefore it is valid to denote the (proposition pointed to by the) logical
expression "∀y ∈ U, P(x, y)" by a single-parameter expression A(x) for each x ∈ U, and so (A(x))_{x∈U} is
a well-defined proposition family A : U → P.
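This convention can be mirrored in code: quantifying away y leaves a function of the single free parameter x. A sketch with a hypothetical finite universe U and a hypothetical concrete predicate P:

```python
U = range(5)   # hypothetical finite parameter universe

def P(x, y):
    """A hypothetical two-parameter proposition family P : U x U -> {F, T}."""
    return (x * y) % 2 == 0

def A(x):
    """A(x) = "for all y in U, P(x, y)".  The variable y is bound by the
    quantifier, so A has exactly one free parameter, namely x."""
    return all(P(x, y) for y in U)

assert A(2)        # 2*y is even for every y in U
assert not A(1)    # fails at y = 1, since 1*1 is odd
```

The closure A is literally a map from U to truth values, matching the statement that (A(x))_{x∈U} is a well-defined proposition family A : U → P with the bound variable y nowhere visible in its parameter list.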
Although universal and existential quantifiers are superficially similar, since they are in some sense duals of
each other, the differences in information content between them show that they are fundamentally different.
for determining what the elements of such a set are. If the axiom of choice is added to the ZF set theory
axioms, it can be proved by means of axioms and deduction rules that such sets must exist. In other words,
it is shown that "∃x, P(x)" is true for the predicate P(x) = "the set x is Lebesgue non-measurable". But no
examples can be found. This does not prove that the assertion is false, which is very frustrating if one thinks
about it too much.
The difficulties with universal and existential quantifiers in mathematics are in one sense the reverse of the
difficulties for empirical propositions. The empirical claim that "all swans are white" is always vulnerable
to the discovery of new cases. So empirical universals can never be certain. But in the case of mathematics,
universal propositions are often quite robust. For example, one may be entirely certain of the proposition:
"All squares of even integers are even." If anyone did find a counterexample x whose square x² is not
even, one could apply a quick deduction (from the definitions) that x is not even if x² is not even,
which would be a contradiction. More generally, universal mathematical propositions are usually proved by
demonstrating the absurdity of any counterexample. There is no corresponding method of proof to show the
total absurdity of non-white swans. (Popper [409], page 381, attributes the use of swans in logic to Carnap.
See also Popper [408], page 47, for ravens.)
the universe's ability to represent an integer would break down.
5.3.2 Remark: Natural deduction argues with conditional assertions instead of unconditional assertions.
Conditional assertions of the form A₁, A₂, . . . , Aₘ ⊢ B for m ∈ ℤ₀⁺ were presented by Gentzen [361, 362]
as the basis for a predicate calculus system which he called natural deduction. This style of logical calculus
had well over a dozen inference rules, but no axioms. This was in conscious contrast to the Hilbert-style
systems which used absolutely minimal inference rules, usually only modus ponens and possibly some kind
of substitution rule. The natural deduction style of logical calculus permits proofs to be mechanised in a
way which closely resembles the way mathematicians write proofs, and also tends to permit shorter proofs
than with the more austere Hilbert-style calculus. The natural deduction style of logical calculus is adapted
for the presentation of predicate calculus in this book. It is both rigorous and easy to use for practical
theorem-proving, especially in the tabular form presented in comprehensive introductions to logical calculus
by Suppes [344] and Lemmon [317]. Some texts in the logic literature which present tabular systems of
natural deduction are as follows.
(1) 1940. Quine [330], pages 91–93, indicated antecedent dependencies by line numbers in square brackets,
anticipating Suppes' 1957 line-number notation.
(2) 1944. Church [299], pages 164–165, 214–217, gives a very brief description of natural deduction and
sequent calculus systems.
(3) 1950–1982. Quine [331], pages 241–255. It is unclear which of the four editions introduced tabular
presentations of natural deduction systems. This book by Quine demonstrates a technique of using
one or more asterisks to the left of each line of a proof to indicate dependencies. This is equivalent to
Kleene's vertical bar-lines. However, this system is not applied to a wide range of practical situations,
and does not use the very convenient and flexible numbered reference lines as in the books by Suppes
and Lemmon.
(4) 1952. Kleene [315], page 182, uses vertical bars on the left of the lines of an argument to indicate
antecedent dependencies. On pages 408–409, he uses explicit dependencies on the left of each argument
line. On pages 98–101, Kleene gives many derived rules for a Hilbert system which look very much like
the basic rules for a natural deduction system.
(5) 1957. Suppes [344], pages 25–150, seems to give the first presentation of the modern tabular form of
natural deduction with numbered reference lines. It is intended for introductory logic teaching, but also
has some theoretical depth.
(6) 1963. Stoll [343], pages 183–190, 215–219, used sets of line numbers to indicate antecedent dependencies
for the lines of tabular logical arguments based on natural deduction inference rules.
(7) 1965. Lemmon [317] gives a full presentation of practical propositional calculus and predicate calculus
which uses many rules, but no axioms. It emphasises practical introductory theorem-proving, but does
have some theoretical treatment. It is derived directly from Suppes [344].
Deduction systems based on these styles of assertion statements show enormous variety in the literature. Any deduction system
which infers true assertions from true assertions is acceptable. And of course, such deduction systems should
preferably deliver all, or as many as possible, of the true inferences that are semantically correct.
The corresponding styles of assertions, independent of the styles of inference systems, are as follows.
(1) Unconditional assertions: ⊢ β for some logical formula β.
    Meaning: β is true.
(2) Conditional assertions (also called sequents):
(2.1) Simple sequents: α1, α2, ..., αm ⊢ β for formulas αi and β, for i ∈ N_m, where m ∈ Z_0^+.
      Meaning: If αi is true for all i ∈ N_m, then β is true.
(2.2) Compound sequents: α1, α2, ..., αm ⊢ β1, β2, ..., βn for formulas αi and βj, for i ∈ N_m
      and j ∈ N_n, where m, n ∈ Z_0^+.
      Meaning: If αi is true for all i ∈ N_m, then βj is true for some j ∈ N_n.
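For finitely many unquantified formulas, both meanings can be checked mechanically by exhaustive substitution over truth assignments. The following sketch (the function name and the encoding of formulas as predicates on assignment tuples are illustrative, not from this book) tests a simple sequent in this way:

```python
from itertools import product

def simple_sequent_holds(antecedents, consequent, num_props):
    """A simple sequent a1, ..., am |- b holds iff every truth assignment
    which makes all antecedents true also makes the consequent true."""
    for vs in product([False, True], repeat=num_props):
        if all(a(vs) for a in antecedents) and not consequent(vs):
            return False
    return True

# Modus ponens as a sequent: A, A => B |- B.
valid = simple_sequent_holds(
    [lambda v: v[0],                   # A
     lambda v: (not v[0]) or v[1]],    # A => B
    lambda v: v[1],                    # B
    2)

# An invalid sequent: A |- B.
invalid = simple_sequent_holds([lambda v: v[0]], lambda v: v[1], 2)
print(valid, invalid)  # True False
```

Exhaustive substitution is feasible here only because the number of propositions is finite; this is exactly the limitation discussed for predicate calculus later in the chapter.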
5.3.4 Remark: It is convenient to define a sequent to be a single-consequent assertion.
In 1965, Lemmon [317], page 12, defined 'sequent' with the same meaning as the 'simple sequent' in
Definition 5.3.14.
Thus a sequent is an argument-frame containing a set of assumptions and a conclusion which is
claimed to follow from them. [...] The propositions to the left of '⊢' become assumptions of
the argument, and the proposition to the right becomes a conclusion validly drawn from those
assumptions.
Huth/Ryan [313], page 5, also defines a sequent to have one and only one consequent proposition. This seems
far preferable in a natural deduction context. The fact that Gentzen did not define it in this way in 1934
should not be an obstacle to sensible terminology in the 21st century. This is how the term sequent will
be understood in this book.
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
5.3.6 Remark: History of the meaning of the assertion symbol. Truth versus provability.
The assertion symbol has changed meaning since the 1930s. Gentzen [361], page 180, wrote the following in
1934. (In his notation, the arrow symbol '→' took the place of the modern assertion symbol '⊢', and his
superset symbol '⊃' took the place of the modern implication operator symbol '⇒'.)
2.4. The sequent A1, ..., Ar → B1, ..., Bs means in content exactly the same as the formula
(A1 & ... & Ar) ⊃ (B1 ∨ ... ∨ Bs), in which the numbers r and s are different from 0; [. . . ]
equivalent in meaning to the implication [. . . ]
Church [299], page 165, commented as follows on Gentzen's use of Sequenzen.
Employment of the deduction theorem as primitive or derived rule must not, however, be confused
with the use of Sequenzen by Gentzen.[361, pp. 190–210] ([. . . ]) For Gentzen's arrow, →, is not
comparable to our syntactical notation, `, but belongs to his object language (as is clear from the
fact that expressions containing it appear as premisses and conclusions in applications of his rules
of inference).
In other words, Gentzen's apparent assertion symbol is not metamathematical (in the meta-language, as
Church calls it), but is rather a part of the language in which propositions are written. That is, it is the
same as the implication operator, not an assertion of provability.
After the 1944 book by Church [299], page 165, all books on mathematical logic have apparently defined the
assertion symbol in a sequent to mean provability, not truth. In other words, even though an implication
may be true for all possible models, the assertion is not valid if there exists no proof of the assertion within
the proof system. A distinction arose between what can be proved and what is true. An incomplete theory
is a theory where some assertions are true but not provable. The double-turnstile notation ⊨ is often used
for truth of an assertion in all possible models, whereas the notation ⊢ is reserved for assertions which can
be proved within the system. (Assertions that cannot be proved within a theory can sometimes be proved
by metamathematical methods outside the theory.)
In books in 1963 by Curry [301], page 184, in 1965 by Lemmon [317], page 12, and in 2004 by
Huth/Ryan [313], page 5, it is stated that sequents do signify assertions of provability. (See Remark 5.3.4 for the
relevant statement by Lemmon.) This is possibly because the assertion symbol had been redefined during
that time, particularly under the influence of model theory.
Consequently the modern meaning of sequent is a single-consequent assertion which is provable within the
proof system in which it is written.
5.3.7 Remark: Gentzen-style multi-consequent sequent calculus is not used in this book.
The sequent calculus logic framework which is presented in Remark 5.3.9 is not utilised in this book.
Gentzens multi-output sequent calculus has theoretical applications which are not required in this book. It
is the simple (single-output) conditional assertions which are used here because of their great utility for
rigorous practical theorem-proving.
It is a misfortune that the word 'sequent' is so closely associated with Gentzen's sequent calculus, because this
word is so much shorter than the phrase 'simple conditional assertion'. Nevertheless, the word 'sequent'
(or the phrase 'simple sequent') is used in this book in connection with natural deduction, not the
sequent calculus which Gentzen sharply counterposed to the natural deduction systems which he described.
The real innovation in the sequent concept lies not in the form and semantics of the sequent itself, but rather
in the use of sequents as the lines of arguments instead of unconditional assertions as was the case in the
earlier Hilbert-style logical calculus. In other words, every line of a Gentzen-style argument is, or represents,
a valid sequent, and valid sequents are inferred from other valid sequents. This is quite different to the
Hilbert-style argument where every line represents a true proposition, and true propositions are inferred
from other true propositions. It is the transition from the 'one valid assertion per line' style of argument to
the 'one valid conditional assertion per line' style which is really valuable for practical logical proofs, not
the disjunctive right-hand-side semantics.
If the two sides of a sequent are sets instead of sequences, and the right side is a single-element sequence,
it is questionable whether the term sequent still evokes the correct concept. However, its meaning and
historical associations are close enough to make the term useful in the absence of better candidates.
5.3.9 Remark: Sequent calculus and disjunctive semantics for consequent propositions.
Gentzen generalised single-output assertions to multi-output assertions which he called 'sequents'.
(Actually he called them 'Sequenzen' in German, and this has been translated into English as 'sequents' to
avoid a clash with the word 'sequences'. See Kleene [316], page 441, for a similar comment.) Sequents have
the fairly obvious form A1, A2, ..., Am ⊢ B1, B2, ..., Bn for m, n ∈ Z_0^+, but the meaning of Gentzen-style
sequents is not at all obvious.
The commas on the left of a sequent A1, A2, ..., Am ⊢ B1, B2, ..., Bn are interpreted according to the
unsurprising convention that each comma is equivalent to a logical conjunction operator, but each comma
on the right is interpreted as a logical disjunction operator. In other words, such a sequent means that if all
of the prior propositions are true, then at least one of the posterior propositions is true. This is equivalent
to the simple unconditional assertion ⊢ (A1 ∧ A2 ∧ ... ∧ Am) ⇒ (B1 ∨ B2 ∨ ... ∨ Bn). (This is consequently
equivalent to the disjunction of the n sequents A1, A2, ..., Am ⊢ Bj for j ∈ N_n.)
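For small m and n this equivalence can be confirmed by exhaustive substitution. A minimal sketch, with m = n = 2 and propositions encoded as positions in a truth assignment tuple (an illustrative encoding, not from this book):

```python
from itertools import product

# Compare two readings of A1, A2 |- B1, B2 on every truth assignment:
# (1) the sequent reading: if all Ai are true then at least one Bj is true;
# (2) the formula (A1 and A2) => (B1 or B2).
m, n = 2, 2
readings_agree = all(
    ((not all(vs[:m])) or any(vs[m:]))                  # sequent reading
    == ((not (vs[0] and vs[1])) or (vs[2] or vs[3]))    # formula reading
    for vs in product([False, True], repeat=m + n))
print(readings_agree)  # True
```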
The historical origin of the somewhat perplexing disjunctive semantics on the right of the assertion symbol
is the sequent calculus which was introduced by Gentzen [361], pages 190–193. The adoption of disjunctive
semantics on the right has various theoretical advantages. The 22 inference rules for predicate calculus with
such semantics have a very symmetric appearance and are easily modified to obtain an intuitionistic inference
system. But Gentzens principal reason for introducing this style of sequent was its usefulness for proving a
completeness theorem for predicate calculus.
   Sequent                         Relation interpretation             Set interpretation
1. ⊢ B                             2^P ⊆ K_B                           K_B = 2^P
2. A ⊢ B                           K_A ⊆ K_B                           (2^P \ K_A) ∪ K_B = 2^P
3. A1, A2, ..., Am ⊢ B             ∩_{i=1}^m K_Ai ⊆ K_B                (2^P \ ∩_{i=1}^m K_Ai) ∪ K_B = 2^P
4. A1, A2, ..., Am ⊢ B1, ..., Bn   ∩_{i=1}^m K_Ai ⊆ ∪_{j=1}^n K_Bj     (2^P \ ∩_{i=1}^m K_Ai) ∪ ∪_{j=1}^n K_Bj = 2^P
   "                               "                                   ∪_{i=1}^m (2^P \ K_Ai) ∪ ∪_{j=1}^n K_Bj = 2^P
5. ⊢ B1, B2, ..., Bn               2^P ⊆ ∪_{i=1}^n K_Bi                ∪_{i=1}^n K_Bi = 2^P
6. A1, A2, ..., Am ⊢               ∩_{i=1}^m K_Ai ⊆ ∅                  ∪_{i=1}^m (2^P \ K_Ai) = 2^P
7. A ⊢                             K_A ⊆ ∅                             2^P \ K_A = 2^P
8. ⊢                               2^P ⊆ ∅                             ∅ = 2^P
Table 5.3.1   The semantics of conditional and unconditional assertions
In each row in Table 5.3.1, the inclusion relation in the relation interpretation column is equivalent to
the equality in the set interpretation column. (This follows from the fact that all knowledge sets are
necessarily subsets of 2P .) The inclusion relations have the advantage that they visually resemble the form
of the corresponding sequents. The assertion symbol '⊢' may be replaced by the inclusion relation '⊆', and the
commas may be replaced by intersection and union operations on the left and right hand sides respectively.
The sets on the left of the equalities in the set interpretation column are asserted to be tautologies. (The
correspondence between tautologies and the set 2^P is given by Definition 3.12.2 (ii).) One may think of each
sequent as signifying a set. When a sequent is asserted, the statement being asserted is that each truth value
map in 2^P is an element of the sequent's knowledge set.
The assertion ⊢ A may be interpreted as membership of the truth value map in K_A. The empty left hand side of this sequent signifies the
intersection of a list of zero knowledge sets (or an empty union of copies of 2^P), which yields 2^P, the set of
all possible truth value maps. Thus ⊢ A has the meaning that membership of a truth value map in 2^P implies
its membership in K_A. Since the truth value map is an implicit universal variable, the logical implication
is equivalent to the set-inclusion relation 2^P ⊆ K_A.
Since the constraint K_A ⊆ 2^P is always in force, this inclusion relation is equivalent to the equality K_A = 2^P.
Thus logical expressions which are asserted without preconditions are always tautologies in a purely logical
system. (In a system with non-logical axioms, the set 2^P is replaced with some subset of 2^P.)
The sequent A ⊢ B may be interpreted as the inclusion relation K_A ⊆ K_B, which means that if A is true
then B is true. The sequent A ⊢ B may also be interpreted as the set (2^P \ K_A) ∪ K_B, which is also the
knowledge set for the compound proposition A ⇒ B. If this sequent is asserted in a purely logical system
(i.e. without non-logical axioms), then it is being asserted that (2^P \ K_A) ∪ K_B equals the set 2^P.
It is easily seen that K_A ⊆ K_B if and only if (2^P \ K_A) ∪ K_B = 2^P for any sets K_A, K_B ∈ P(2^P).

    (2^P \ K_A) ∪ K_B = 2^P  ⇔  2^P \ ((2^P \ K_A) ∪ K_B) = ∅
                             ⇔  K_A ∩ (2^P \ K_B) = ∅
                             ⇔  K_A \ K_B = ∅
                             ⇔  K_A ⊆ K_B.
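The displayed equivalence can also be spot-checked exhaustively over a small finite universe standing in for 2^P. In the following sketch the universe U and the name powerset are illustrative choices:

```python
from itertools import combinations

def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    xs = sorted(universe)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

U = frozenset({0, 1, 2})   # a small stand-in for 2^P
pairs_checked = 0
for KA in powerset(U):
    for KB in powerset(U):
        # K_A subset of K_B  iff  (U \ K_A) union K_B = U
        assert (KA <= KB) == ((U - KA) | KB == U)
        pairs_checked += 1
print(pairs_checked)  # 64
```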
This justifies the set interpretations in the above table where there are one or more propositions on the left
of the assertion symbol.
Every sequent in a purely logical system is effectively a tautology. This fact is important in regard to the
design of a predicate calculus. In the Hilbert-style predicate calculus systems, there are no left-hand sides
for the lines of a proof. Each line is asserted to be a tautology in a purely logical system (or it is asserted to
be a consequence of the full set of axioms in a system with non-logical axioms). In a sequent-based natural
deduction system, each line of an argument signifies a (single-output) sequent, and the inference rules are
designed to efficiently manipulate those sequents. The principal advantage of such sequent-based argument
systems is that they more closely resemble the way in which mathematicians construct proofs. As a bonus,
the proofs are generally shorter and easier to discover in a sequent-based system.
The idea that the intersection of a sequence of zero knowledge sets yields the set 2^P is consistent with
the inductive rule that each additional knowledge set K_Ai in the sequence K_A1, K_A2, ..., K_Am restricts the
resultant set. Since all knowledge sets are subsets of 2^P, the natural resultant set for m = 0 is 2^P. Anything
larger would go outside the domain of interest. Another way of looking at this is to think of ∩_{i=1}^m K_Ai
as 2^P \ ∪_{i=1}^m (2^P \ K_Ai). Then the empty intersection for m = 0, which is normally not well defined, is
now well defined as 2^P \ ∪_{i=1}^0 (2^P \ K_Ai), which equals 2^P \ ∅, which equals 2^P. (The union of an empty
family of sets is well defined and equal to the empty set by Theorem 8.4.6 (i).) Therefore a sequent of the
form A1, A2, ..., Am ⊢ B1, B2, ..., Bn with m = 0 is consistent with an unconditional sequent of the form
⊢ B1, B2, ..., Bn. In other words, the unconditional style of sequent is a special case of the corresponding
conditional sequent with m = 0.
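This convention for the empty intersection can be illustrated concretely. The sketch below (the universe U and the function name are illustrative) defines the intersection through 2^P \ (union of complements), which remains well defined for an empty family:

```python
from functools import reduce

U = frozenset({0, 1, 2})   # a small stand-in for 2^P

def restricted_intersection(family):
    """Intersection of a family of subsets of U, defined as
    U \\ (union of complements).  This is well defined even for an
    empty family, because the union of an empty family is the empty set."""
    union_of_complements = reduce(frozenset.union,
                                  [U - K for K in family], frozenset())
    return U - union_of_complements

empty_case = restricted_intersection([])   # zero knowledge sets: yields U
two_sets = restricted_intersection([frozenset({0, 1}), frozenset({1, 2})])
print(sorted(empty_case), sorted(two_sets))  # [0, 1, 2] [1]
```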
Similarly, when n = 0, so that the right-hand-side proposition list B1, B2, ..., Bn is empty, the natural
limit of the knowledge-set union ∪_{j=1}^n K_Bj is the empty set ∅. Then the interpretation of the sequent is
that the left-hand-side knowledge set ∩_{i=1}^m K_Ai is a subset of ∅, which is only true if there is no truth value
map which is an element of all of the knowledge sets K_Ai. This means that the composite logical expression
A1 ∧ A2 ∧ ... ∧ Am is a contradiction.
The case n = 0 is of only peripheral interest here. In this book, only sequents with exactly one consequent
formula will be used in the development of predicate calculus and first-order logic.
In the degenerate case m = n = 0, the almost-empty sequent '⊢' signifies ∅ = 2^P, which is not possible
even if P = ∅, because 2^∅ = {∅} ≠ ∅ by Theorem 10.2.22 (ii). The set containing only the empty function is
not the same as the empty set, which contains no functions. (It's okay to make forward references to set-theory
theorems here because anything goes in metamathematics!)
A conditional assertion of the form ∀x, α(x) ⊢ ∃y, β(y) is very similar to a conditional assertion of the
form A1, A2, ..., Am ⊢ B1, B2, ..., Bn. Their interpretations are also very similar. Instead of finite sets of
indices 1 ... m and 1 ... n, however, the quantified expressions ∀x, α(x) and ∃y, β(y) have the implied
index set U, which is the universe of parameters for propositions.
The similarity between the sequents ∀x, α(x) ⊢ ∃y, β(y) and A1, A2, ..., Am ⊢ B1, B2, ..., Bn has some
relevance for the design of a predicate calculus for quantified expressions, based on inspiration from the much
simpler propositional calculus for unquantified expressions.
5.3.14 Definition [mm]: A (simple) sequent is an expression of the form A1, A2, ..., Am ⊢ B for some
sequence of logical expressions (Ai)_{i=1}^m, for some m ∈ Z_0^+, and some logical expression B.
A compound sequent is an expression of the form A1, A2, ..., Am ⊢ B1, B2, ..., Bn for some sequence
of logical expressions (Ai)_{i=1}^m, for some m ∈ Z_0^+, and some sequence of logical expressions (Bj)_{j=1}^n, for
some n ∈ Z_0^+.
The meaning of a simple sequent A1, A2, ..., Am ⊢ B is the knowledge set inclusion relation
∩_{i=1}^m K_Ai ⊆ K_B, where K_Ai is the meaning of Ai for all i ∈ N_m, and K_B is the meaning of B.
The meaning of a compound sequent A1, A2, ..., Am ⊢ B1, B2, ..., Bn is the knowledge set inclusion
relation ∩_{i=1}^m K_Ai ⊆ ∪_{j=1}^n K_Bj, where K_Ai is the meaning of Ai for all i ∈ N_m, and K_Bj is the
meaning of Bj for all j ∈ N_n.
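For a finite proposition space, the knowledge-set meaning of a simple sequent in this definition can be computed directly. In the following sketch the proposition space P and the encoding of formulas as functions are illustrative choices, not part of the definition:

```python
from itertools import product

P = ("A", "B")   # a small proposition name space (illustrative)

def knowledge_set(formula):
    """The set of truth value maps on P under which the formula is true."""
    return frozenset(vs for vs in product([False, True], repeat=len(P))
                     if formula(dict(zip(P, vs))))

# Meaning of the simple sequent  A, A => B |- B  as a set inclusion:
KA = knowledge_set(lambda e: e["A"])
K_impl = knowledge_set(lambda e: (not e["A"]) or e["B"])
KB = knowledge_set(lambda e: e["B"])
inclusion_holds = (KA & K_impl) <= KB   # intersection of K_Ai inside K_B
print(inclusion_holds)  # True
```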
Chapter 6
Predicate calculus
predicate calculus, which takes no account of any constant predicates, object-maps or non-logical axioms.
6.1.2 Remark: The need for semantics to guide the axiomatic method for predicate calculus.
Truth tables are a tabular representation of the method of exhaustive substitution. When there are N
propositions in a logical expression, there are 2^N combinations to test. When there are infinitely many
propositions, the exhaustive substitution method is clearly not applicable. However, since even predicate
calculus logical formulas contain only finitely many symbols, it seems reasonable to hope for the discovery
of some finite procedure for determining whether an expression is a tautology or not.
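The method of exhaustive substitution for finitely many propositions can be sketched in a few lines. The function name and example formulas below are illustrative:

```python
from itertools import product

def is_tautology(expr, n):
    """Truth table test: evaluate expr on all 2^n truth assignments
    of its n propositions (the method of exhaustive substitution)."""
    return all(expr(*vs) for vs in product([False, True], repeat=n))

# (A => B) v (B => A) is a tautology; A => B on its own is not.
taut = is_tautology(lambda a, b: ((not a) or b) or ((not b) or a), 2)
non_taut = is_tautology(lambda a, b: (not a) or b, 2)
print(taut, non_taut)  # True False
```

The loop runs 2^n times, which is exactly why this procedure cannot extend to infinitely many propositions.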
E. Mendelson [320], page 56, comments as follows on the existence of a truth-table-like procedure for the
evaluation of logical expressions involving parametrised families of propositions.
In the case of the propositional calculus, the method of truth tables provides an effective test as to
whether any given statement form is a tautology. However, there does not seem to be any effective
process to determine whether a given wf is logically valid, since, in general, one has to check the
truth of a wf for interpretations with arbitrarily large finite or infinite domains. In fact, we shall see
later that, according to a fairly plausible definition of 'effective', it may actually be proved that
there is no effective way to test for logical validity. The axiomatic method, which was a luxury in
the study of the propositional calculus, thus appears to be a necessity in the study of wfs involving
quantifiers, and we therefore turn now to the consideration of first-order theories.
This observation does not imply that parametrised families of propositions can derive their validity only
by deduction from axioms. The semantics of parametrised propositions is not entirely different to the
semantics of unparametrised propositions. The introduction of a space of variables, together with universal
and existential quantifiers to indicate infinite conjunctions and disjunctions respectively, does not imply
that the semantic foundations are irrelevant to the determination of truth and falsity of logical expressions.
The axioms of a predicate calculus are justified by semantic considerations. It is therefore possible to
similarly examine all logical expressions in predicate calculus to determine their validity by reference to their
meaning. However, the pursuit of this approach rapidly leads to a set of procedures which closely resemble
the axiomatic approach. (This is demonstrated particularly by the semantic interpretations for predicate
calculus in Section 6.4.)
The axiomatisation of logic is merely a formalisation of a kind of logical algebra. There is an analogy
here to numerical algebra, where a proposition such as x^2 + 2x = (x + 1)^2 − 1 may be shown to be true
for all x ∈ R without needing to substitute all values of x to ensure the validity of the formula. One uses
algebraic manipulation rules which preserve the validity of propositions, such as 'add the same number to
both sides of an equation' and 'multiply both sides of the equation by a non-zero number', together with
the rules of distributivity, commutativity, associativity and so forth. But these rules are based directly on
the semantics of addition and multiplication. The symbol manipulation rules are valid if and only if they
match the semantics of the arithmetic operations. In the same way, the axioms of predicate calculus are not
arbitrary or optional. The predicate calculus axioms and rules derive their validity from the propositional
calculus, which derives its axioms and rules from the semantics of propositional logic.
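This analogy can be made concrete: the identity x^2 + 2x = (x + 1)^2 − 1 can be established by finitely many coefficient-manipulation rules rather than by substituting every value of x. A minimal sketch, representing polynomials as coefficient lists (an illustrative encoding):

```python
def poly_mul(p, q):
    """Multiply coefficient lists (index = power of x) by distributivity."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add coefficient lists term by term."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

lhs = [0, 2, 1]                                  # x^2 + 2x
rhs = poly_add(poly_mul([1, 1], [1, 1]), [-1])   # (x + 1)^2 - 1
print(lhs == rhs)  # True
```

The equality of coefficient lists certifies the identity for all x at once, just as valid inference rules certify logical propositions without exhaustive substitution.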
Consequently, the validity of predicate calculus is derived from exhaustive substitution in principle. It is not
possible to carry out exhaustive substitution for infinite families of propositions. But the rules and axioms
are determined from a study of the meanings of logical expressions.
Thus predicate calculus is a formalisation of the algebra of parametrised families of propositions, and the
objective of this algebra is to solve for the truth values of particular propositions and logical expressions,
and also to show equality (or other relations) between the truth value functions represented by various logical
expressions which may involve quantifiers.
There is a temptation in predicate calculus to regard truth as arising solely from manipulations of symbols
according to apparently somewhat arbitrary rules and axioms. The truth of a proposition does not arise
from line-by-line deduction rituals. The truth arises from the underlying semantics of the proposition, which
is merely discovered or inferred by means of line-by-line argument.
The above comment by E. Mendelson [320], page 56, continues into the following footnote.
There is still another reason for a formal axiomatic approach. Concepts and propositions which
involve the notion of interpretation, and related ideas such as truth, model, etc., are often called
semantical to distinguish them from syntactical concepts, which refer to the symbols and expressions of precise formal languages. Since semantical notions
are set-theoretic in character, and since set theory, because of the paradoxes, is considered a rather
shaky foundation for the study of mathematical logic, many logicians consider a syntactical ap-
proach, consisting in a study of formal axiomatic theories using only rather weak number-theoretic
methods, to be much safer.
Placing all of one's hopes on abstract languages, detached from semantics, seems like abandoning the original
problem because it is difficult. The validity of predicate calculus is derived from set-theory semantics. This
cannot be avoided. The detachment of logic from set theory creates vast opportunities to deviate from the
natural meanings of logical propositions and walk off into territory populated by ever more bizarre paradoxes
of a logical kind. At the very least, the axioms of predicate calculus should be based on a consideration of
some kind of underlying semantics. This should prevent aimless wandering from the original purposes of the
study of mathematical logic.
approach breaks down; indeed there is known to be no mechanical means for sifting expressions
at this level into the valid and the invalid. Hence we are required there to use techniques akin to
derivation for revealing valid sequents, and we shall in fact take over the rules of derivation for
the propositional calculus, expanding them to meet new needs. The propositional calculus is thus
untypical: because of its relative simplicity, it can be handled mechanically; indeed, [. . . ] we can
even generate proofs mechanically for tautologous sequents. For richer logical systems this aid is
unavailable, and proof-discovery becomes, as in mathematics itself, an imaginative process.
The idea that there is no effective means of mechanical computation to determine the truth values of
propositions in the predicate calculus is illustrated in Figure 6.1.1.
[Figure 6.1.1. Mechanical computation of truth values is impossible, and logical argumentation is difficult, for infinite quantification over objects, predicates, relations, operators and functions in the predicate calculus.]
The requirement for the use of the rules of derivation is not very different to the requirement for deduction
and reduction procedures in numerical algebra. The task of predicate calculus is to solve problems in logical
algebra. If infinite sets of variables are present in any kind of algebra, numerical or logical, it is fairly clear
that one must use rules to handle the infinite sets rather than particular instances. But this does not imply
that the methods of argument, devoid of semantics, have a life of their own. By analogy, one may specify an
arbitrary set of rules for manipulating numerical algebra expressions, equations and relations, but if those
rules do not correspond to the underlying meaning of the expressions, equations and relations, those rules
will be of recreational interest at best, and a waste of time and resources at worst. To put it simply, semantics
does matter.
Mathematics, at one time, was entirely an imaginative process. Then the communications between
mathematicians were codified symbolically to the extent that some mechanisation and automation was possible.
But now, when mechanical methods are inadequate, the application of the imaginative process is almost
regarded as a necessary evil, whereas it was originally the whole business.
One could mention a parallel here with the industrial revolution, during which a large proportion of human
productive activity was mechanised and automated. The fact that the design, use and repair of machines
requires human intervention, including manual dexterity and an imaginative process, is perhaps
inconvenient at times, but one should not forget that humans were an endangered species of monkey only a couple
of hundred thousand years ago. The automation of a large proportion of human thought by mechanising
logical processes can be as beneficial as the automation of economic goods and services. But semantics
defines the meaning and purposes of logical processes. So semantics is as necessary to logic as human needs
are to the totality of economic production. One should not permit mechanised methods of logic to force
mathematicians to conclusions which seem totally ridiculous. If the conclusions are too bizarre, then maybe
it is the mechanised logic which needs to be fixed, not the minds of mathematicians.
6.1.4 Remark: The decision problem.
Rosser [337], page 161, said the following.
The problem of finding a systematic procedure which, for a given set of axioms and rules, will tell
whether or not any particular arbitrarily chosen statement can be derived from the given set of
axioms and rules is called the decision problem for the given set of axioms and rules. The method
of truth-value tables gives a solution to the decision problem for the statement calculus [. . . ]. What
about a solution to the decision problem for the predicate calculus [. . . ]. So far none has been
discovered.
Shortly after this remark, Rosser [337], page 162, said the following.
As we said, no solution of the decision problem for the predicate calculus has yet been discovered.
We can say much more. There is no solution to be discovered! This result was proved by Alonzo
Church (see Church, 1936).
Let there be no misunderstanding here. Church's theorem does not say merely that no solution
has been found or that a solution is hard to find. It says that there simply is no solution at all.
What is more, the theorem continues to hold as we add further axioms (unless by mischance we
add axioms which lead to a contradiction, so that every statement becomes derivable). Applying
Church's theorem to our complete set of axioms for mathematics, we conclude that there can never
be a systematic or mechanical procedure for solving all mathematical problems. In other words,
the mathematician will never be replaced by a machine.
This makes clear the underlying motivation of formal mathematical logic, which is to mechanise mathematics.
In this connection, the following paragraph by Curry [301], page 357, is of interest. (The system HK* is a
formulation of predicate calculus.)
For a long time the study of the predicate calculus was dominated by the decision problem (Entscheidungsproblem).
This is the problem of finding a constructive process which, when applied to a given
proposition of HK , will determine whether or not it is an assertion. [. . . ] Since a great variety of
mathematical problems can be formulated in the system, this would enable many important math-
ematical problems to be turned over to a machine. Much effort in the early 1930s went into special
results bearing on this problem. In 1936, Church showed [. . . ] that the problem was unsolvable.
This shows once again the mechanisation motivation in the formalisation of mathematical logic.
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
The symbol notations in Table 6.1.2 have been converted to the notations of this book for easy comparison.
In the axioms column, the first number in each pair is the number of propositional calculus axioms, and the
second is the number of axioms which contain quantifiers. "Taut" means that the propositional axioms are all
tautologous propositions, which are determined by truth tables or some other means. The inference rules are
abbreviated as "Subs" (one or more substitution rules), "MP" (Modus Ponens), "Gen" (the generalisation
rule). Since the hypothesis introduction rule is assumed to be available in all systems, it is not mentioned
here. For any additional rules, only the number of them is given. (See Remark 4.2.1 for the propositional
calculus version of this table.)
to providing all required methods. The chosen logical system should be as compact and pruned-down as
possible with no superfluous concepts, but it should cover all of the requirements of day-to-day mathematics.
The formulations of predicate calculus in the overview in Figure 6.1.2 (in Remark 6.1.5) give some idea of
the wide variety in the literature. However, the variety is very much greater than suggested by such an
overview. Most of these systems are incompatible without major adjustments to accommodate them. Proofs
in one logical system mostly cannot be imported into others. The axioms in one system are often expressed
as rules or definitions in another system, and vice versa. Hence it is important to choose a system which
makes the import and export of theorems and definitions as painless as possible (although the pain cannot
be reduced to zero).
6.1.8 Remark: Desirability of harmonising predicate calculus with conjunction and disjunction axioms.
Another criterion for the choice of a predicate calculus is compatibility with intuition. The axioms and rules
should be obvious, and their application should be obvious. In terms of the underlying semantics, a universal
quantifier is a multiple-AND operator, and an existential quantifier is a multiple-OR operator. Therefore for
a finite universe to quantify over, one would expect the axioms and rules for the quantifiers to resemble the
axioms for the ∧-operator and ∨-operator respectively. There are various propositional calculus frameworks
in the literature which give axioms for the basic operators ∧, ∨, and ¬. (See for example Kleene [315],
page 82; E. Mendelson [320], pages 40–41; Kleene [316], pages 15–16.) Since all properties of these operators
can be deduced from their axioms, one could reasonably expect that similar axioms for the ∀-quantifier and
∃-quantifier would yield the expected deductions for a predicate calculus.
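The multiple-AND and multiple-OR semantics can be made concrete for a finite universe. In this sketch the universe U and the predicates are invented purely for illustration:

```python
U = [1, 2, 3]
P = lambda x: x > 0    # sample predicate, true for every element of U
Q = lambda x: x > 2    # sample predicate, true only for x = 3

# Over the finite universe U, the universal quantifier is just the conjunction
# P(1) AND P(2) AND P(3), and the existential quantifier is just the
# disjunction Q(1) OR Q(2) OR Q(3).
forall_P = all(P(x) for x in U)
exists_Q = any(Q(x) for x in U)

print(forall_P == (P(1) and P(2) and P(3)))   # True
print(exists_Q == (Q(1) or Q(2) or Q(3)))     # True
```

For an infinite universe the conjunction and disjunction cannot be written out, which is precisely why the quantifiers must be given their own axioms rather than being defined away.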
6.1.9 Remark: Outline of practical procedures for manipulating universal and existential propositions.
A predicate calculus needs four kinds of fundamental rules for exploiting universal and existential expressions.
These are the UI (universal introduction), UE (universal elimination), EI (existential introduction) and EE
(existential elimination) rules. The concept of operation of these rules may be broadly summarised as follows.
(UI) Assumption: A(y) for arbitrary y. Conclusion: ∀x, A(x). (Requires some work to discover a proof.)
* One picks an arbitrary element y from a class U and demonstrates that the proposition A(y) is true.
From this one concludes that A(x) is true for all x in U because the proof procedure did not depend on
the choice of y.
* For example, one may be able to prove that if y is a dog, then y is an animal, without knowing or making
use of any particular characteristics of the dog y. If so, one may conclude that all dogs are animals.
(UE) Assumption: ∀x, A(x). Conclusion: A(y) for arbitrary y. (Requires no work at all. Just substitution.)
* If one knows that A(x) is true for all x, then one may conclude that A(y) holds for an arbitrary choice of y.
* For example, if one knows that all dogs are animals, one may pick any dog at random and conclude
that it is an animal without needing to consider any of its individual characteristics.
(EI) Assumption: A(y) for some y. Conclusion: ∃x, A(x). (Requires some work to find an example.)
* One finds some element y from a class U and discovers that A(y) is true. From this one concludes
that A(x) is true for some x in U .
* For example, if one discovers a single black swan y, this immediately justifies the conclusion that x is
black for some swan x. So this inference rule is very easy to apply after the individual y has been found,
but it is not always clear how to discover y in the first place. Sometimes there is no known procedure
for discovering it.
(EE) Assumption: ∃x, A(x). Conclusion: A(y) for some y. (May be very difficult or even impossible.)
* If one knows that A(x) is true for some x, then one may conclude that A(y) holds for some choice of y.
However, it is not at all clear how to discover this y. There may be no known discovery procedure.
* For example, even if one is told that there is some swan that is black, one may not be able to find
it in one lifetime. In the case of a cryptographic puzzle, for example, all the computing power on Earth
may be insufficient to find the answer, even though it is known for certain that an answer exists.
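The asymmetry between these four rules shows up even in a toy computation. In this sketch the universe of swans and their colours are invented for illustration; the point is that EI needs only a single witness, while EE amounts to a search which may be infeasible in general:

```python
# Invented finite universe of swans with their colours.
swans = {"alice": "white", "bob": "white", "carol": "black"}

def is_black(s):
    return swans[s] == "black"

# (UE) Instantiating a universal is free; checking one over a finite
# universe is mere enumeration.
all_white = all(not is_black(s) for s in swans)   # False: carol is black

# (EI) A single observed witness immediately justifies the existential claim.
assert is_black("carol")
exists_black = any(is_black(s) for s in swans)    # True

# (EE) Knowing only that a black swan exists, producing one requires a
# search; for cryptographic-sized search spaces this may be infeasible.
witness = next(s for s in swans if is_black(s))
print(witness)   # carol
```

Here the search succeeds instantly because the universe has three elements; the remark's cryptographic example corresponds to a universe far too large to enumerate.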
Following the application of an elimination rule, the individual proposition which has been stripped of its
quantifiers may be manipulated according to propositional calculus methods. So the real interest in predicate
calculus lies in the four kinds of procedures outlined above. The universal rules are essentially concerned with
theorems which prove the validity of propositions for a large class of parameters. The existential rules are
generally more concerned with examples and counterexamples. The discovery of a counterexample makes
the corresponding theorem impossible to prove. Conversely, the discovery of a proof for a theorem excludes
the possibility of finding a counterexample. Mathematical research often consists of a mixture of theorem
proving and counterexample discovery. Whichever succeeds first precludes the possibility of the other.
As a matter of proof strategy, the purpose of the elimination rules is to permit the de-quantified expressions
to be manipulated according to propositional calculus, which is already well defined in Chapter 4. Following
such propositional manipulation, the introduction rules are applied to return to quantified expressions. Hence
a proof typically commences with one or more elimination rule applications, followed by some propositional
calculus rule applications, followed by one or more introduction rule applications. There could be two or
more elimination/manipulation/introduction cycles in a single proof.
calculus axioms are not shown since they all yield the same theorems. The purpose of this summary is to
assist comparison of the systems. (The notations and terminology are harmonised to this book. The free
variables in each predicate are listed after its name. For example, A(x, y) indicates that A has the two free
variables x and y and no others.) The natural deduction systems of Suppes [344] and Lemmon [317] (which
are listed in Table 6.1.2) are not included in the following summary.
1928. Hilbert/Ackermann [308], pages 65–70.
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: A(y) ⇒ (∃x, A(x)).
(3) Rule: From A ⇒ B(x), obtain A ⇒ (∀x, B(x)).
(4) Rule: From B(x) ⇒ A, obtain (∃x, B(x)) ⇒ A.
1944. Church [299], page 172.
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
(3) Rule: From A, infer ∀x, A(x).
1952. Kleene [315], page 82. (Same as Hilbert/Ackermann [308].)
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: A(y) ⇒ (∃x, A(x)).
(3) Rule: From A ⇒ B(x), infer A ⇒ (∀x, B(x)).
(4) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A.
1952. Wilder [353], page 236. (Same as Hilbert/Ackermann [308].)
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: A(y) ⇒ (∃x, A(x)).
(3) Rule: From A ⇒ B(x), infer A ⇒ (∀x, B(x)).
(4) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A.
1953. Rosser [337], pages 101–102.
(1) Axiom: A ⇒ (∀x, A). (No free occurrences of x in A.)
(2) Axiom: ((∀x, A(x)) ⇒ A(y)) ⇒ ((∀x, A(x)) ⇒ (∀y, A(y))).
(3) Axiom: (∀x, A(x, y)) ⇒ A(y, y).
(4) Derived Rule G: From A(y), infer ∀x, A(x). (Rosser [337], pages 124–126.)
(5) Derived Rule C: From ∃x, A(x), infer A(y). (Rosser [337], pages 126–133.)
1958. Bernays [292], page 47. (Same as Hilbert/Ackermann [308], who attribute their system to Bernays.)
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: A(y) ⇒ (∃x, A(x)).
(3) Rule: From A ⇒ B(x), infer A ⇒ (∀x, B(x)).
(4) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A.
1963. Curry [301], page 342.
(1) Rule: From ⊢ ∀x, A(x), infer ⊢ A(y).
(2) Rule: From ⊢ A(y), infer ⊢ ∀x, A(x). (Rule G.)
(3) Rule: From A(y) ⊢ B, infer ∃x, A(x) ⊢ B. (Rule C.)
(4) Rule: From ⊢ A(y), infer ⊢ ∃x, A(x).
1963. Curry [301], page 344. (Same as Church [299], plus two extra axioms.)
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
(3) Axiom: A(y) ⇒ (∃x, A(x)).
(4) Axiom: (∀x, (A(x) ⇒ B)) ⇒ ((∃x, A(x)) ⇒ B).
(5) Rule: From A(y), infer ∀x, A(x).
(4) Derived Rule C: From ⊢ ∃x, A(x) and A(x) ⊢ B, infer ⊢ B. (E. Mendelson [320], pages 73–75.)
1967. Kleene [316], page 107. (Same as Hilbert/Ackermann [308].)
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: A(y) ⇒ (∃x, A(x)).
(3) Rule: From A ⇒ B(x), infer A ⇒ (∀x, B(x)).
(4) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A.
1967. Margaris [319], page 49.
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: A ⇒ (∀x, A). (No free occurrences of x in A.)
(3) Axiom: ((∀x, A(x)) ⇒ A(y)) ⇒ ((∀x, A(x)) ⇒ (∀y, A(y))).
(4) Axiom: ∀x, A(x) for any axiom A.
(5) Derived Rule C: From ⊢ ∃x, A(x) and A(x) ⊢ B, infer ⊢ B. (Margaris [319], pages 40, 78–79.)
1967. Shoenfield [340], page 21.
(1) Rule: From B(x) ⇒ A, infer (∃x, B(x)) ⇒ A. (Rule C.)
1969. Bell/Slomson [290], pages 57–58. (Same as Church [299].)
(1) Axiom: (∀x, A(x)) ⇒ A(y).
(2) Axiom: (∀x, (A ⇒ B(x))) ⇒ (A ⇒ (∀x, B(x))).
(3) Rule: From A, infer ∀x, A(x).
6.1.11 Remark: Ambiguity in notations for universal and existential free variables.
A disagreeable aspect of many of the predicate calculus frameworks in Remark 6.1.10 is the lack of distinction
between universal and existential free variables. When a free variable is introduced in a proof, precisely the
same notation denotes a universal free variable, which is completely arbitrary, and an existential free variable,
which is chosen from a restricted set (or class) of solutions of a logical predicate. These two kinds of free
variables have an entirely different character, and yet they are notated identically. Therefore such systems
are highly vulnerable to error and subjectivity, since the context determines the meaning of symbols in a
subtle way.
When a universal free variable is introduced into a proof, it is obtained from a universal logical expression of
the form ∀x, P(x). If U denotes the universe of individual indices or objects, this expression has the informal
interpretation ∀x ∈ U, P(x), which is equivalent to ∀x, (x ∈ U ⇒ P(x)). But an existential expression of
the form ∃x, Q(x) has the informal interpretation ∃x ∈ U, Q(x), which is equivalent to ∃x, (x ∈ U ∧ Q(x)).
It is significant that the implicit logical operator is ⇒ in the first case, and ∧ in the second case.
      logical            informal            equivalent
      expression         interpretation      expression
      ∀x, P(x)           ∀x ∈ U, P(x)        ∀x, (x ∈ U ⇒ P(x))
      ∃x, P(x)           ∃x ∈ U, P(x)        ∃x, (x ∈ U ∧ P(x))
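These two equivalences can be verified mechanically when the ambient domain is finite. This is an illustrative sketch only; the domain, the class U and the predicate P are all invented:

```python
domain = range(-5, 6)                  # ambient universe of individuals
U = [x for x in domain if x >= 0]      # the restricted class U
P = lambda x: x * x >= x               # sample predicate

# Universal: "for all x in U, P(x)" equals "for all x, (x in U) implies P(x)",
# where the implication is rendered as (not (x in U)) or P(x).
restricted_forall = all(P(x) for x in U)
implicit_forall = all((x not in U) or P(x) for x in domain)

# Existential: "exists x in U, P(x)" equals "exists x, (x in U) and P(x)".
restricted_exists = any(P(x) for x in U)
implicit_exists = any((x in U) and P(x) for x in domain)

print(restricted_forall == implicit_forall)   # True
print(restricted_exists == implicit_exists)   # True
```

The implication in the universal case and the conjunction in the existential case are not interchangeable: swapping them breaks both equivalences as soon as U is a proper subset of the domain.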
It seems reasonable to adopt the universal quantifier elimination rule ∀x0 ∈ U, ∀x, P(x) ⊢ P(x0). In other
words, for any x0 ∈ U, the proposition P(x0) follows from the proposition ∀x, P(x). The analogous
introduction rule for the existential quantifier is ∀x0 ∈ U, P(x0) ⊢ ∃x, P(x). In other words, for any x0 ∈ U, the
proposition ∃x, P(x) follows from the proposition P(x0). This gives two rules which seem reasonable and
natural for the universal and existential quantifiers.
      quantifier         introduction                   elimination
      expression         rule                           rule
      ∀x, P(x)           Rule G                         ∀x0 ∈ U, ∀x, P(x) ⊢ P(x0)
      ∃x, P(x)           ∀x0 ∈ U, P(x0) ⊢ ∃x, P(x)      Rule C
The real difficulties commence when one attempts to fill the gaps in this table. These two gaps, labelled
"Rule G" and "Rule C", have a very different nature to each other. An introduction rule for the universal
quantifier (i.e. Rule G) would require the prior proof of P(x) for every x ∈ U. This can be achieved by
using a form of proof where x is unknown but arbitrary, and the proof argument is known (by intuition
or assumption) to be independent of the choice of x. By contrast, when one attempts to eliminate the
existential quantifier (i.e. Rule C), the resulting assertion must be that P (x) is true for some choice of x, but
how one may make such a choice is often unknowable. (For example, the discovery of a valid choice might
require more computing power than is available in the universe if the problem is hard.)
Many predicate frameworks have ∀-elimination rules which look like ∀x, P(x) ⊢ P(y), and ∃-elimination rules
which look like ∃x, P(x) ⊢ P(y). It is clear that P(y) has a totally different significance in these two cases,
but notationally they look the same. Hence a safer and more credible method of inference is required.
A clue to the meaning of an elimination rule like ∃x, P(x) ⊢ P(y) is the fact that in practice, no proof ends
with the proposition P(y) for an indeterminate object y, whereas one often ends a proof with a conclusion
of the form ∃x, P(x). The last step in such a proof would be an existential introduction of some kind, not an
elimination. So existential elimination should be thought of only as an intermediate step in a proof. In other
words, one eliminates the existential quantifier, then makes some use of the liberated proposition, and then
introduces an existential quantifier again in some way. This helps to explain the unusual form of Rule C.
6.1.12 Remark: The contrasting roles of elimination and introduction rules for quantifiers.
Quantifier elimination rules are generally used in the body of a proof to narrow the focus to the quantified
logical expression, which can then be manipulated with a propositional calculus. Following such manipula-
tion, quantifier introduction rules are required to conclude the proof. In predicate calculus proofs, one does
not usually wish to conclude with an intermediate variable in the final expression.
In applications, however, one often wishes to apply a quantified expression to a particular concrete variable.
For example, if one proves that the square of an odd integer is an odd integer, one may wish to apply this
theorem to a particular odd integer, and if one proves that there exists a prime number greater than any
given prime number, one may wish to find an example of a particular prime number which is larger than a
particular given prime number.
says that "the dog bit the cat", the subject is "the dog" and the predicate is "bit the cat". In the subject-
predicate model for clauses, the subject is thought of as a member of a class of objects, while the predicates
are thought of as functions which take objects as arguments and deliver true-or-false values. A clause as a
whole is then thought of as a proposition which may be true or false. So the first task of a predicate calculus
is to provide a framework for describing propositions which have at least such a subject-predicate structure.
A further level of intra-proposition structure which must be described by a predicate calculus is the object-
relation-object form. For example, the clause "the dog is bigger than the cat" is a clause with two objects
and one relation. Such clause structure may be thought of as a boolean-valued function of two objects. The
space of objects for such relations is typically the same as the space of objects for the subject-predicate
clause-form. Both clause-forms may be thought of as boolean-valued functions of objects, with one or two
parameters respectively. The inductive instinct of mathematicians leads one to immediately embed such
clause-forms in a framework with predicates which take any non-negative number of parameters.
A second category of requirements for a predicate calculus is the need to be able to describe infinities. One of
the most quintessential instincts of mathematicians is to generalise everything to infinite classes and infinite
classes of infinite classes, and so forth. The unary and binary operators from which compound propositions
are constructed in the propositional calculus are inadequate to describe conjunctions or disjunctions of infinite
collections of propositions. The use of quantifiers to extend unary and binary operators to infinite classes is
generally considered to be an integral part of any predicate calculus.
Thus there are two principal technical requirements for a predicate calculus, which are independent.
(1) Proposition structuring. Require object-predicate and object-relation structure for propositions.
(2) Quantifier operations. Require quantifiers for infinite collections of propositions.
One could design a calculus with predicate and relation structuring for propositions, but with no quantifiers
for infinite collections of propositions. But one could also design a calculus with quantifiers for infinite
collections of propositions, but with no structuring of those propositions.
At the very minimum, a predicate calculus for mathematics should be able to describe the language of
Zermelo-Fraenkel set theory. ZF set theory has the membership relation ∈, which is a two-parameter
relation. Although individual sets cannot be true or false, in NBG set theory, there is a single-parameter
predicate which is true or false according to whether a class is a set. Otherwise, it seems that set theory
requires only relations for objects called "sets" in some universe of sets. However, it is customary to
additionally provide functions in a predicate calculus. These are not the set-of-ordered-pairs style of functions within
set theory. Predicate calculus functions are object-valued functions with zero or more object parameters
(as contrasted with boolean-valued functions). Although these are apparently not required for ZF set theory,
they are required for algebra which is not explicitly based on set theory. Algebra is largely concerned with
operations which deliver a unique object from one or two parameter objects. Thus the requirements for a
predicate calculus customarily include the following.
(1) Proposition structuring. Require object-predicate and object-relation structure for propositions.
(2) Quantifier operations. Require quantifiers for infinite collections of propositions.
(3) Object functions. Require object-functions which map from object-tuples to objects.
To meet all of these requirements, one needs at least the following classes of entities.
(1) Objects. A universe of objects.
(2) Relations. Classes of boolean-valued relations with zero or more object-parameters.
(3) Quantifiers. Quantifier operations for boolean-valued relations.
(4) Functions. Classes of object-valued functions with zero or more object-parameters.
A pure predicate calculus provides only quantifiers for parametrised propositions, with no explicit support
for constant relations and functions which take objects as parameters. A pure predicate calculus also has
no non-logical axioms. Non-logical axioms generally state some a-priori properties of constant relations and
functions, but if such constant relations and functions are not present, it is clearly not possible to assert
properties for them.
may introduce index sets I and organise propositions into families (Pk)k∈I.
It is a small step from the indexed families of propositions to the introduction of a space of objects so that
propositions can be interpreted as properties and relations of objects. (Properties and relations may be
grouped together as predicates.) Thus one may introduce a set of objects U , and the propositions may
be organised into families of the form (P1(i))i∈U, (P2(i, j))i,j∈U, (P3(i, j, k))i,j,k∈U, and so forth. There may
be any number of proposition families with a given number of parameters, and some families may not be
defined for the whole universe of objects U . Partial definition may be managed by defining a default value
(such as false) for the truth-value of undefined propositions. Since such proposition families have some
significance in particular applications, they can be given notations which suggest their meaning.
With the introduction of objects and predicates, there is still in fact no significant difference between such
a minimalist predicate logic and the propositional logic in Chapters 3 and 4. Even if there are finite
rules which specify infinite sets of relations between the truth-values of propositions, these rules are part
of the application context, not explicitly supported by the propositional calculus. For example, one might
specify that P(x1, x2) is a false proposition for every x1 in U, for some x2 in U. (Such a situation arises,
for example, with the ZF empty set axiom ∃x2, ∀x1, x1 ∉ x2.) A propositional calculus does not permit any
conclusions to be drawn from such a universal/existential statement. Nor does it have language in which
such a specification can be expressed.
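The nested universal/existential structure of the empty set example can be evaluated directly when the universe is finite. This sketch builds a miniature invented universe of hereditarily finite sets as frozensets:

```python
# A miniature universe of sets: the empty set and two sets built from it.
a = frozenset()          # corresponds to the empty set
b = frozenset({a})       # corresponds to {a}
c = frozenset({a, b})    # corresponds to {a, b}
universe = [a, b, c]

# The empty set axiom asserts: there exists x2 such that for all x1,
# x1 is not a member of x2. Over a finite universe this nested quantifier
# structure becomes a nested any/all.
empty_set_exists = any(all(x1 not in x2 for x1 in universe)
                       for x2 in universe)
print(empty_set_exists)   # True: witnessed by the empty set a
```

A propositional calculus, which sees only unanalysed propositions, has no language for this nested quantification over x1 and x2; expressing and exploiting it is exactly what a predicate calculus adds.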
Although in a literal sense, a logical framework which explicitly supports notions of objects and predicates
may be called a predicate logic, any predicate calculus for such a logic would not say much more than
is said by a simple propositional logic. Therefore one may safely reject such a very literal interpretation of
the concept of a predicate logic.
some infinite index set I, one might specify some constraints on the truth values of the propositions with a
universal-quantifier expression such as ∀i ∈ I, (Pi ⇒ Pi+1), where I = Z₀⁺. This simply makes formal and
explicit a rule which could otherwise have been specified informally in the application context.
It is important to indicate the scope of each quantifier. The scope of a quantifier must be a valid proposition
for each choice of a free variable in the scope. By convention, the scope of a quantifier must be a contiguous
substring of the full string. So in expressions of the form ∀i, (A(i)) or ∃j, (B(j)), the subexpressions A(i)
and B(j) must be valid propositions for all choices of the variables i and j. The scope of each quantifier is
delimited by parentheses.
One possible difficulty with this approach is the assignment of meaning to subexpressions of a logical expres-
sion. In the case of pure propositional logic, each logical subexpression is identified with a knowledge set.
(See Sections 3.4 and 3.5 for knowledge sets.) Therefore every subexpression in propositional logic may be
interpreted as a knowledge set. Since quantifier logic merely extends propositional logic by the addition of
quantifier operators, it seems plausible that a similar kind of knowledge set could be defined for each valid
subexpression of a valid expression which contains quantifiers.
possibility of introducing an equality relation. Thus when two labels x and y point to the same object, one
may write x = y, or some such expression. (This is the subject of Section 6.7.)
(2) the variables to which each quantifier applies (which implies linkage if more than one variable is substi-
tuted by a single quantifier),
(3) the exact scope of each quantifier.
6.3.1 Remark: Objectives of a good predicate calculus.
A good predicate calculus should permit proofs which are short, simple, comprehensible and clearly correct.
Proofs should also be easy to discover, construct, verify and modify. These objectives underlie the choices
which have been made in the design of the predicate calculus in Definition 6.3.9. (See also Section 6.2 for
some discussion of the requirements for a predicate calculus.)
6.3.2 Remark: The importance of the safe management of temporary assumptions in proofs.
In real-life mathematics, one generally keeps mental track of the assumptions upon which all assertions are
based. Some of the assumptions are fairly permanent within a particular subject. Some of the assumptions
are restricted to a narrower context than the whole subject. Some assumptions may be restricted to a
single chapter or section of a work. Sometimes the assumptions are present only in the statement of a
particular theorem. And quite often, assumptions are adopted and discharged during the development of a
proof. At the beginning of a proof, it is usually fairly clear which assumptions are in force at that point,
but as a long proof progresses, it may become unclear which temporary assumptions are still in force and
which temporary assumptions have been discharged already. A good predicate calculus should make the
management of temporary assumptions clear, safe and easy. (The safe, tidy management of temporary
assumptions in proofs is discussed particularly in Section 6.5.)
6.3.3 Remark: A predicate calculus which uses true sequents instead of tautologies.
The predicate calculus in Definition 6.3.9 is based on the propositional calculus in Definition 4.4.3. The
axioms (PC 1), (PC 2) and (PC 3) are incorporated essentially verbatim. To these axioms, 7 inference rules
are added.
The 7 inference rules in Definition 6.3.9 are written in terms of sequents rather than propositions. A sequent
is a sequence of zero or more proposition templates followed by an assertion symbol, which is followed by a
single proposition template. (All of the sequents in Chapter 6 are simple sequents as in Definition 5.3.14.
Simple sequents have one and only one proposition on the right of the assertion symbol.) Each such sequent
essentially means that if the input templates at the left of the assertion symbol match propositions which are
found in a logical argument, then the corresponding filled-in output template at the right of the assertion
symbol may be inserted into the argument at any point after the appearance of the inputs.
When a rule is applied, the list of inputs of the sequent is indicated by line numbers which refer to earlier
propositions. Thus every line of an argument is a sequent in an abbreviated notation. (This style of
abbreviated logical calculus was used in textbooks by Suppes [344], pages 25–150, and Lemmon [317].) In
propositional calculus, or in a Hilbert-style predicate calculus, every line of an argument is a tautology. In a
natural deduction style of argument, every line represents a true sequent, which is not very different to the
true-proposition style of argument.
6.3.5 Remark: Design of a user-friendly predicate calculus based on true sequents.
Natural deduction predicate calculus systems are based on true sequents rather than the Hilbert-style true
propositions. In sequents, there are two kinds of operator-like connectives which separate propositions.
Every sequent contains, in principle, a single assertion symbol "⊢", and zero or more commas before the
assertion symbol. The meaning of the assertion symbol is very closely related to the implication symbol ⇒.
The meaning of the comma separator is very closely related to the conjunction symbol ∧. Thus for example
the sequent α1, α2, α3 ⊢ β has a meaning which is closely related to the single compound proposition
(α1 ∧ α2 ∧ α3) ⇒ β.
It is quite straightforward to represent universal quantifier elimination and introduction procedures in terms
of implication and conjunction operators. Thus if the universe of objects is a 3-element set U = {x1, x2, x3}
for example, then the predicate formula ∀x ∈ U, α(x) may be written as α(x1) ∧ α(x2) ∧ α(x3),
and this can be rewritten as ∀x, (x ∈ U ⇒ α(x)). In the first case, only conjunctions are required,
whereas in the second case, an implication is required. The formula ∀x ∈ U, α(x) corresponds roughly
to the sequent x ∈ U ⊢ α(x). In other words, from the assumption x ∈ U one infers α(x), no matter
which element x ∈ U is chosen. So the sequents Γ ⊢ ∀x, α(x) and Γ, x ∈ U ⊢ α(x) may be regarded as
essentially interchangeable for any ambient assumption-set Γ. It is important to note that any sequent such as
α(x) ⊢ β(x) is implicitly universal, which means that such a sequent is interpreted as ∀x, (α(x) ⇒ β(x)).
Therefore the pseudo-sequent x ∈ U ⊢ α(x) is interpreted as ∀x, (x ∈ U ⇒ α(x)).
The real difficulty starts with the representation of existential quantifier elimination and introduction in
terms of sequents. The predicate formula ∃x ∈ U, α(x) for the 3-element universe U = {x1, x2, x3}, for
example, may be written as α(x1) ∨ α(x2) ∨ α(x3), or as ∃x, (x ∈ U ∧ α(x)). The appearance of the
disjunction operator ∨ cannot easily be implemented in terms of implication and conjunction operators,
and the existential quantifier ∃x cannot easily be implemented in terms of a universal quantifier. Therefore
it seems that existential logic may be very difficult to implement in terms of sequents. In fact, predicate
calculus systems in general seem to be not very well suited to the implementation of existential logic. To
a great extent the cause of this is the very limited language of logic arguments. Conjunctions, implications
and universals are apparently inherent in the culture of logical argumentation, whereas disjunctions and
existentials are alien to the culture.
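The corresponding disjunctive rewriting of the existential formula can be checked in the same style (again a sketch; the universe and the extra element are invented for illustration).

```python
from itertools import product

U = ["x1", "x2", "x3"]
domain = U + ["y"]  # "y" lies outside the restricted universe U

# The restricted formula (exists x in U) phi(x), read as the
# disjunction phi(x1) | phi(x2) | phi(x3), agrees with the
# relativised form exists x, (x in U & phi(x)) over the whole domain.
for values in product([False, True], repeat=len(domain)):
    phi = dict(zip(domain, values))
    disjunction_form = phi["x1"] or phi["x2"] or phi["x3"]
    relativised_form = any((x in U) and phi[x] for x in domain)
    assert disjunction_form == relativised_form
```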
One of the keys to the solution of this problem is the observation that one rarely, if ever, ends a proof
with a particular choice of a free variable x which exemplifies an existential statement of the form ∃x, φ(x).
Typically any such choice will be introduced with a phrase such as: "Let x be such that φ(x) is true."
Then the particular choice of x is used for some purpose, and some conclusion is drawn from it. But the
particular choice of x does not usually appear in the conclusion of the argument. Thus choices of variables
are generally stages of arguments which lead to particular conclusions. As mentioned in Remark 6.1.10, the
1928 work by Hilbert/Ackermann [308], pages 65-70, proposes a predicate calculus which includes rules for
the introduction of universal and existential quantifiers in the following style.
(3) Rule: From A → B(x), obtain A → (∀x, B(x)).
(4) Rule: From B(x) → A, obtain (∃x, B(x)) → A.
In the existential case (4), the quantified expression is introduced as a premise for a conclusion A, whereas
in the universal case (3), the quantified expression is the conclusion of a premise A. This effectively means
that the proposition B(x) → A is interpreted as ∀x, (B(x) → A), which is equivalent to (∃x, B(x)) →
A. This demonstrates that expressions containing free variables are generally interpreted with an implicit
universal quantifier. By making the variable proposition B(x) the premise, the universal quantifier yields the
existential expression ∃x, B(x) as the premise for A. This kind of reverse logic for existential quantifiers
is required because of the default universal interpretation when free variables are present.
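The equivalence underlying rule (4), namely that a universalised implication with a quantifier-free conclusion agrees with an implication from an existential premise, can be verified over a small finite universe (a Python sketch with invented names):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

U = ["x1", "x2", "x3"]

# For every assignment to B on U and every truth value of A, the
# universalised implication forall x, (B(x) -> A) agrees with the
# existential-premise form (exists x, B(x)) -> A.
for values in product([False, True], repeat=len(U) + 1):
    *b_values, a = values
    B = dict(zip(U, b_values))
    universal_form = all(implies(B[x], a) for x in U)
    existential_premise_form = implies(any(B[x] for x in U), a)
    assert universal_form == existential_premise_form
```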
In natural deduction systems (where true sequents appear on every line instead of true propositions), the
argumentation for existential quantifier introduction and elimination is implemented principally as Rule C,
which is presented in the following form by E. Mendelson [320], pages 73-75, and Margaris [319], pages 78-79.
Derived Rule C: From ⊢ ∃x, A(x) and A(x) ⊢ B, infer ⊢ B.
When the implicit universal quantifier is applied to the sequent A(x) ⊢ B (and the assertion symbol is
replaced with the conditional operator), the result is ∀x, (A(x) → B), which is equivalent to (∃x, A(x)) →
B, which may be combined with the sequent ⊢ ∃x, A(x) to obtain the assertion ⊢ B. Thus an existential
quantifier is again represented via the implicit universal quantifier by placing the quantified predicate on the
left of the implication operator (whose role is in this case performed by the assertion symbol).
In a typical example of existential quantifiers in ordinary mathematical usage, one might have a function
f : ℝ → ℝ which is somehow known to satisfy ∃K ∈ ℝ, ∀t ∈ ℝ, |f(t)| < K. To exploit such an assumption, one may
argue along the lines: "Choose K ∈ ℝ such that ∀t ∈ ℝ, |f(t)| < K." Then one may perform manipulations
with this number K, and later on may draw a conclusion about the function f from this which is independent
of the choice of K. Thus one obtains a conclusion which does not refer to the arbitrary choice of K. In the
abstract predicate logic context, choosing K could be written as ⊢ x ∈ U ∧ A(x) or ⊢ x ∈ U, A(x).
Then one may manipulate the arbitrary choice of x to obtain a conclusion, after which the arbitrary choice
is removed because the validity of the conclusion does not depend on this choice. Such a form of argument
may be written abstractly as the following existential elimination and introduction rules.
(EE) From ⊢ ∃x, φ(x), infer ⊢ x ∈ U and ⊢ φ(x).
(EI) From ⊢ x ∈ U and ⊢ φ(x), infer ⊢ ∃x, φ(x).
In principle, there is no problem with such an approach. The fly in the ointment here is the fact that
the standard implicit interpretation of the sequents ⊢ x ∈ U and ⊢ φ(x) would be ∀x, x ∈ U and
∀x, φ(x), which is quite nonsensical. One really wants the interpretation ∃x, (x ∈ U ∧ φ(x)), which is
equivalent to ∃x ∈ U, φ(x). There is no easy way to turn off this standard interpretation. Although the
logical argumentation style which results from this approach is valid, since it is equivalent to the Rule C
approach, it does not result in lines which are all valid sequents. Hence the rather clumsy Rule C approach
is to be preferred. Its clumsiness is not inherent in the notion of an existential quantifier nor in the arbitrary
choice of an object for such a quantifier. The clumsiness arises from the default interpretation of sequents,
which is unfairly biased in favour of universal quantifiers.
Some logical inference systems actually use different ranges of the alphabet for free variables to indicate
which ones should be universalised and which ones are constants. This is even clumsier than using, say, hat
and check notations such as x̂ and x̌ to indicate universal and existential free variables respectively.
and the map (y) ↦ ∀x, φ(x, y) would be a soft predicate with one parameter. Since rules, theorems and
proofs are written in terms of soft predicates, they must be (more or less) formally specified.
6.3.7 Remark: Predicate names, variable names, wffs, wff-names, wff-wffs and soft predicates.
The relations between predicate names, variable names, wffs, wff-names and wff-wffs for predicate calculus
are an extension of the corresponding concepts for propositional calculus which are described in Remark 4.4.2.
In Definition 6.3.9 (v), wffs are defined as strings such as (∀x₁, (∃x₂, (A(x₁, x₂) → B(x₂, x₃)))), which are
built recursively from predicate names, variable names, logical operators and punctuation symbols. Each wff-
name in Definition 6.3.9 (iv) is a letter which may be bound to a particular wff within each scope. Thus the
wff-name φ may be bound to the wff (∀x₁, (∃x₂, (A(x₁, x₂) → B(x₂, x₃)))), for example. The wff-wffs
in Definition 6.3.9 (vi) are strings such as (∀x₃, (¬φ(x₃))), which may appear in axioms, inference rules,
theorem assertions and proofs. Each wff-wff is a template into which particular wffs may be substituted.
The wff-wff construction rule (1) in Definition 6.3.9 (vi) requires the list of variable names following a
wff-name to contain all of the free variables in the wff which the wff-name is bound to. Thus, for example,
the wff-wff φ(x₃) is valid if φ is bound to a wff which contains only x₃ as a free variable, such as
(∀x₁, (∃x₂, (A(x₁, x₂) → B(x₂, x₃)))). The set of free variables in a wff may be determined by means of
the recursive formulas at the right of each rule in Definition 6.3.9 (v). Expressions which are defined accord-
ing to this rule are soft predicates, which may be thought of as software-defined predicates by contrast
with the hard predicates with names in NQ.
Compared to other predicate calculus systems, this one is relatively simple and intuitive to use, even compared
to other natural deduction systems. It is not designed for theoretical analysis, but rather for its ease of use as
a practical deduction system for all mathematics. It is directly applicable to first-order logic systems which
have specified constant predicates and non-logical axioms. Proofs are relatively easy to discover, compared
to other systems, and errors in deduction are relatively easy to detect.
Definition 6.3.9 is quite informal, and has many rough edges. This is an inevitable consequence of the lack
of a formal language for metamathematics in this book. In terms of such a formal language, for example for
a suitable computer software package, it should be possible to specify all of the rules much more precisely.
The informal specification given here is sufficient for the proofs of some basic theorems in Section 6.6, which
are useful for the rest of the book.
6.3.9 Definition [mm]: The following is the axiomatic system QC for predicate calculus.
(i) The predicate name space NQ for each scope consists of upper-case letters of the Roman alphabet, with
or without integer subscripts. The variable name space NV for each scope consists of lower-case letters
of the Roman alphabet, with or without integer subscripts.
(ii) The basic logical operators are → (implies), ¬ (not), ∀ (for all) and ∃ (for some).
(iii) The punctuation symbols are ( and , and ).
(iv) The logical predicate expression names (wff names) are lower-case letters of the Greek alphabet, with
or without integer subscripts. Within each scope, each wff name may be bound to a wff.
(v) Logical predicate expressions (wffs), and their free-variable name-sets S, are built recursively according
to the following rules.
(1) A(x₁, . . . xₙ) is a wff for any A in NQ^n and x₁, . . . xₙ in NV. [S = {x₁, . . . xₙ}]
(2) (α → β) is a wff for any wffs α and β. [S = S_α ∪ S_β]
(3) (¬α) is a wff for any wff α. [S = S_α]
(4) (∀x, α) is a wff for any wff α and x in NV. [S = S_α \ {x}]
(5) (∃x, α) is a wff for any wff α and x in NV. [S = S_α \ {x}]
Any expression which cannot be constructed by recursive application of these rules is not a wff. (For
clarity, parentheses may be omitted in accordance with customary precedence rules. In particular, the
parentheses for the parameters of a zero-parameter predicate may be omitted.)
The free-variable name-set of a wff is the set S defined recursively by the rules used for its construction.
A k-parameter wff , for non-negative integer k, is a wff whose free-variable name-set contains k names.
A free variable (name) in a wff is any element of its free-variable name-set.
A bound variable (name) in a wff is any variable name in the wff which is not an element of its free-
variable name-set.
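The recursive free-variable computation of rules (1)-(5) in part (v) can be sketched as follows (the tuple encoding of wffs is invented for illustration; it is not part of Definition 6.3.9):

```python
# A minimal sketch of the recursive free-variable name-set S from
# rules (1)-(5) of part (v), using invented tuple tags for wffs.
def free_vars(wff):
    tag = wff[0]
    if tag == "pred":                 # rule (1): A(x1, ..., xn)
        _, _, args = wff
        return set(args)
    if tag == "imp":                  # rule (2): (a -> b)
        return free_vars(wff[1]) | free_vars(wff[2])
    if tag == "not":                  # rule (3): (not a)
        return free_vars(wff[1])
    if tag in ("all", "exists"):      # rules (4), (5): quantifiers
        _, x, body = wff
        return free_vars(body) - {x}
    raise ValueError("not a wff")

# (forall x1, (exists x2, (A(x1, x2) -> B(x2, x3)))) has only x3 free.
w = ("all", "x1",
     ("exists", "x2",
      ("imp", ("pred", "A", ("x1", "x2")),
              ("pred", "B", ("x2", "x3")))))
assert free_vars(w) == {"x3"}
```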
(vi) The logical expression formulas (wff-wffs) are defined recursively as follows.
(1) φ(x₁, . . . x_k) is a wff-wff for any wff-name φ, for any finite sequence x₁, . . . x_k in NV, where φ is
bound to a wff whose free variable names are all contained in the set {x₁, . . . x_k}. Logical expression
formulas constructed according to this rule are soft predicates.
(2) (ω₁ → ω₂) is a wff-wff for any wff-wffs ω₁ and ω₂.
(3) (¬ω) is a wff-wff for any wff-wff ω.
(4) (∀x, ω) is a wff-wff for any wff-wff ω and x in NV.
(5) (∃x, ω) is a wff-wff for any wff-wff ω and x in NV.
(vii) The axiom templates are the following sequents for any soft predicates φ, ψ and χ with any numbers of
parameters, applied uniformly for each soft predicate.
(PC 1) ⊢ φ → (ψ → φ).
(PC 2) ⊢ (φ → (ψ → χ)) → ((φ → ψ) → (φ → χ)).
(PC 3) ⊢ ((¬φ) → (¬ψ)) → (ψ → φ).
(viii) The inference rules are uniform substitution and the following rules for any soft predicates φ and ψ with
any numbers of extra parameters, applied uniformly for each soft predicate.
(MP) From Δ₁ ⊢ φ → ψ and Δ₂ ⊢ φ, infer Δ₁ ∪ Δ₂ ⊢ ψ.
(CP) From Δ, φ ⊢ ψ, infer Δ ⊢ φ → ψ.
(Hyp) Infer φ ⊢ φ.
(UE) From Δ(x̂) ⊢ ∀x, φ(x, x̂), infer Δ(x̂) ⊢ φ(x̂, x̂).
(UI) From Δ ⊢ φ(x̂), infer Δ ⊢ ∀x, φ(x).
(EE) From Δ₁ ⊢ ∃x, φ(x) and Δ₂, φ(x̂) ⊢ ψ, infer Δ₁ ∪ Δ₂ ⊢ ψ.
(EI) From Δ(x̂) ⊢ φ(x̂, x̂), infer Δ(x̂) ⊢ ∃x, φ(x, x̂).
Each symbol Δ indicates a dependency list, which is a sequence of zero or more wffs which may have
x as a free variable only if so indicated. The wffs φ and ψ, and also the wffs in dependency lists, may
have free variables as indicated and also any free variables other than x, applied uniformly in each rule.
Δ₁ ∪ Δ₂ means the concatenation of dependency lists Δ₁ and Δ₂, with removal of duplicates.
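The operation Δ₁ ∪ Δ₂ on dependency lists can be sketched as an order-preserving concatenation with duplicate removal (a minimal illustration; the function name `concat_deps` is invented):

```python
def concat_deps(d1, d2):
    # Concatenate two dependency lists, removing duplicates while
    # preserving the order of first occurrence, as in the rule
    # "concatenation with removal of duplicates".
    result = []
    for wff in list(d1) + list(d2):
        if wff not in result:
            result.append(wff)
    return result

# Wffs shared by both lists appear only once in the combined list.
assert concat_deps(["p", "q"], ["q", "r"]) == ["p", "q", "r"]
assert concat_deps([], ["p"]) == ["p"]
```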
6.3.11 Remark: Selective quantification of free variables in the predicate calculus rules.
The rules UE and EI in Definition 6.3.9 (viii) permit selective quantification of free variables. In other
words, if the wff indicated by φ has more than one occurrence of the free variable x̂, then the quantified
form of this wff may substitute any number of instances of x̂ with the quantified variable x. In these two
rules, the wff list Δ(x̂) is not substituted in any way with the quantified free variable.
6.3.12 Remark: Extraneous free variables for the three propositional calculus axioms.
The three axioms in part (vii) of Definition 6.3.9 are copied from the Lukasiewicz axioms in part (vii) of
Definition 4.4.3. However, the wff specifications are different. So the meaning is slightly different. In the
predicate calculus case, the wffs may have free variables, which are indicated in Definition 6.3.9 (v). In
the special case of a wff φ with an empty free-variable name-set S_φ, it is not difficult to believe that the
propositional calculus axioms of Definition 4.4.3 are applicable here, and that therefore all of the theorems
for that calculus are valid also.
If a wff appearing in a sequent has one or more free variables, these free variables are interpreted according
to the universalisation semantics in Remark 6.4.1. This effectively implies that the sequent asserts that
the result of any uniform substitution of particular variables for the free variables yields a logical formula
which is true for the underlying knowledge space. Since the axioms in Definition 6.3.9 (vii) are deemed to
be tautologies for any substitution of wffs for φ, ψ and χ, every uniform substitution for their free variables
must yield a valid tautology. Therefore they continue to be tautologies when they are universalised.
Another way to think of this is that when free variables are seen in sequents, they are in fact not free
variables. They are in fact always implicitly universalised and are therefore actually bound variables.
It follows from these observations that all of the theorems in Sections 4.5, 4.6 and 4.7 for the propositional
calculus in Definition 4.4.3 are applicable for proving theorems for the predicate calculus in Definition 6.3.9.
A typical example of this is the proof of Theorem 6.6.7 (iii), where lines (2) and (5) assert ¬A(x₀) and
(∀x, A(x)) → A(x₀) respectively, and Theorem 4.6.4 (v) is then applied to assert ¬(∀x, A(x)) on line (6).
The input propositions both contain the free variable x0 , which is implicitly substituted in each proposition
so that the free variable is no longer free. Then the conclusion follows by propositional calculus for each
possible substitution.
6.3.13 Remark: Extraneous free variables in the predicate calculus inference rules.
The inference rules in Definition 6.3.9 (viii) permit free variables in much the same way as for the axioms,
as mentioned in Remark 6.3.12. However, there are some differences. The inference rules contain both wffs
and dependency lists, and these have both explicit indications of free variables and implicit permission for
extra or extraneous free variables. For the same reasons as given in Remark 6.3.12, the extraneous free
variables do not diminish the validity of the rules. If a rule is valid for one substitution of particular values
for the extraneous free variables, then the universalisation of the rule is also valid.
An example of the application of extraneous free variables to an inference rule may be found in the proof
of Theorem 6.6.22 (i) in lines (2) and (3). The (consequent) proposition in line (2) is ∀y, A(x₀, y), and
the (consequent) proposition in line (3) is A(x₀, y₀), which follows by the UE rule. The first proposition
contains the extraneous free variable x₀, which is retained in the second proposition. The rule acts only on
the variable y, ignoring the variable x₀ as extraneous. Since the inference is valid for each fixed value
of x₀, it is therefore valid for all fixed values. Hence universalisation semantics may be validly applied to x₀.
The antecedent dependency list for both lines (2) and (3) in the proof of Theorem 6.6.22 (i) is the single wff
∀x, ∀y, A(x, y), which has no free variables. Therefore the form of rule being applied is as follows.
From Δ ⊢ ∀y, φ(x₀, y), infer Δ ⊢ φ(x₀, y₀).
This is a special case of the UE rule, but with the extraneous free variable x₀ indicated explicitly.
The additional (optional) dependence of the soft predicate φ on the free variable x̂ in rules UE and EI in
Definition 6.3.9 (viii) effectively means that any number of instances of x̂ may be bound to the quantified
variable x or left in place, as desired.
(EE) From the sequent A(x̂) ⊢ B, the sequent ⊢ (∃x, A(x)) → B can be inferred because the sequent A(x̂) ⊢ B
means ∀x ∈ U, (A(x) → B), which is equivalent to (∃x, A(x)) → B.
(EI) The sequent A(x̂, x̂) ⊢ ∃x, A(x, x̂) follows from the tautology ∀x ∈ U, (A(x, x) → ∃x, A(x, x)). So
⊢ ∃x ∈ U, A(x, x) may be inferred from ⊢ A(x̂, x̂).
Even more briefly, the basic properties of the quantifiers may be written as follows purely in terms of sequents.
(UE) Infer ∀x, A(x, x̂) ⊢ A(x̂, x̂).
(UI) From ⊢ A(x̂) infer ⊢ ∀x, A(x).
(EE) From A(x̂) ⊢ B infer ⊢ (∃x, A(x)) → B.
(EI) Infer A(x̂, x̂) ⊢ ∃x, A(x, x̂).
In condensed form, these properties encapsulate the meaning of the two quantifiers. The duality between
the UE and EI properties is clear. The duality between the UI and EE properties is less clear because the
default universalisation semantics for sequents breaks the symmetry here. However, the symmetry can be
restored by observing that the UI property is equivalent to: From B ⊢ A(x̂) infer ⊢ B → ∀x, A(x). (Then
B may be replaced by the always-true proposition ⊤.)
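The restored symmetry rests on the equivalence of ∀x, (B → A(x)) with B → ∀x, A(x) when B does not contain x free. This can be checked over a small finite universe (a Python sketch with invented names):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

U = ["x1", "x2", "x3"]

# For every assignment to A on U and every truth value of B, the
# universalised sequent reading forall x, (B -> A(x)) agrees with
# the implication form B -> forall x, A(x).
for values in product([False, True], repeat=len(U) + 1):
    *a_values, b = values
    A = dict(zip(U, a_values))
    universalised_sequent = all(implies(b, A[x]) for x in U)
    implication_form = implies(b, all(A[x] for x in U))
    assert universalised_sequent == implication_form
```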
The sequents Δ(x̂) ⊢ ∀x, φ(x, x̂) and Δ(x̂) ⊢ φ(x̂, x̂) in the UE rule in Definition 6.3.9 are expanded as
lines (2) and (3) respectively as follows, where line (1) is the basic (naive predicate calculus) property (UE)
of the universal quantifier indicated above.
(1) ∀x ∈ U, ((∀x, φ(x, x)) → φ(x, x)) Hyp
(2) ∀x ∈ U, (Δ(x) → ∀x, φ(x, x)) Hyp
(3) ∀x ∈ U, (Δ(x) → φ(x, x)) from (1,2)
It is clear that line (3) follows from lines (1) and (2). In other words, the UE rule in Definition 6.3.9 follows
from the basic property of the universal quantifier in line (1). Significantly, the antecedent condition list Δ
is permitted to depend on the free variable x̂.
Similarly, the sequents Δ(x̂) ⊢ φ(x̂, x̂) and Δ(x̂) ⊢ ∃x, φ(x, x̂) in the EI rule in Definition 6.3.9 are expanded
as lines (2) and (3) respectively as follows, where line (1) is the basic (naive predicate calculus) property (EI)
of the existential quantifier indicated above.
(1) ∀x ∈ U, (φ(x, x) → ∃x, φ(x, x)) Hyp
(2) ∀x ∈ U, (Δ(x) → φ(x, x)) Hyp
(3) ∀x ∈ U, (Δ(x) → ∃x, φ(x, x)) from (1,2)
Once again, it is clear that line (3) follows from lines (1) and (2). In other words, the EI rule in Definition 6.3.9
follows from the basic property of the existential quantifier in line (1). Significantly, the antecedent condition
list Δ is permitted to depend on the free variable x̂.
The UI rule may be interpreted as follows. There is no mystery to be solved here.
(1) ∀x ∈ U, (Δ → φ(x)) Hyp
(2) Δ → ∀x, φ(x) from (1)
The reason for requiring the antecedent conditions list Δ to be independent of x for the UI and EE rules,
but not for the UE and EI rules, is explained in Remark 6.4.5. The EE rule may be interpreted as follows.
(1) Δ₁ → ∃x, φ(x) Hyp
(2) ∀x ∈ U, ((Δ₂ ∧ φ(x)) → ψ) Hyp
(3) Δ₂ → ∀x, (φ(x) → ψ) from (2)
(4) Δ₂ → ((∃x, φ(x)) → ψ) from (3)
(5) (Δ₁ ∧ Δ₂) → ψ from (1,4)
Here line (5) is inferred from lines (1) and (2), which validates the EE rule. Note that the expression
Δ₁ ∧ Δ₂ in line (5) denotes the logical conjunction of the sequences of wffs in Δ₁ and Δ₂, which corresponds
to the union of the wffs in the two sequences, which is denoted here as Δ₁ ∪ Δ₂. The reason for apparent
disagreement here is that conjunction semantics are applied to lists of antecedent wffs. (Strictly speaking,
the set-union operator is incorrect because the correct operator is list-concatenation.)
Thus expanding sequent lines according to universalisation semantics and then applying well-known logical
rules for the quantifiers validates all four of the quantifier rules for predicate calculus.
6.3.16 Remark: The accents on universal and existential variables are superfluous in principle.
In Definition 6.3.9, the hat accent "ˆ" serves only to hint that a variable is free on a line of a logical
argument. It is unnecessary and may be omitted without changing the meaning. The variables which are
free or bound are objectively discernible by reading the text of each line.
As a mnemonic for the meaning of the hat accent "ˆ", one may think of it as the logical conjunction
operator symbol ∧. A universally quantified proposition ∀x, A(x) may be thought of as A(x₁) ∧
A(x₂) ∧ A(x₃) ∧ . . . (if it is assumed that the elements of the parameter-universe may be enumerated).
Similarly, one may think of the check accent "ˇ" as a hint for the logical disjunction operator ∨ because
an existentially quantified proposition ∃x, A(x) may be thought of as A(x₁) ∨ A(x₂) ∨ A(x₃) ∨ . . .
(Such existentialised free variables are not used for the predicate calculus in Definition 6.3.9.)
6.3.17 Remark: Alternative form for the existential quantifier elimination rule.
The following is a simpler-looking alternative for the existential elimination rule EE in Definition 6.3.9.
rules (EE) and (EI) are not exactly the reverse of each other. The underlying cause of this is that
according to the de-facto standard interpretation, sequents are assumed to be universalised with respect
to free variables. The reverse of rule (EI) seems like it should be somewhat as follows.
(EE″) From Δ ⊢ ∃x, φ(x), infer Δ ⊢ φ(x̌).
This rule would be perfectly valid if an existential-style sequent interpretation were available for x̌, but in
terms of the current conventions, there is no way to express the idea that the variable x in a sequent is
existentialised. Therefore rule (EE″) cannot be used in Definition 6.3.9.
an arbitrary list of further dependencies Δ₂ (which must all be fixed relative to x). Then Rule C consists in
the inference that the sequent Δ₁ ∪ Δ₂ ⊢ ψ must be valid under these conditions. This is clearly the inference
of Δ₁ ∪ Δ₂ ⊢ ψ from the sequents Δ₁ ⊢ ∃x, φ(x) and Δ₂, φ(x̂) ⊢ ψ, which is exactly what the EE rule in
Definition 6.3.9 says. An application of Rule C looks somewhat as follows in practice.
(i) ∃x, φ(x) ⊣ (Δ₁)
(j) φ(x̂) Hyp ⊣ (j)
(k) ψ ⊣ (Δ₂, j)
(n) ψ EE (i,j,k) ⊣ (Δ₁ ∪ Δ₂)
Any commas between propositions in the proposition lists at the left and right of the assertion symbol are
converted to logical conjunction operators before the quantifiers are applied. Each of the universal variables
may occur on the left or the right of the assertion symbol any number of times, including zero times.
Strictly speaking, sequents are not truly universally quantified for each free variable which they contain.
Strictly speaking, a sequent which contains one or more free variables is a different sequent for each com-
bination of free variables. In other words, the sequent may be thought of as a kind of template into which
the free variables may be inserted. One may refer to such templates as parametrised sequents. Since
the combinations of free variables may be freely chosen, there is not much difference between saying that
the sequent is true for each combination and saying that it is true for all combinations of free variables.
The difference here is between metamathematical quantification and object language quantification. (The
object language is the name sometimes given to the language in which the wffs of the predicate calculus
are written.)
for some m₂ ∈ ℤ₀⁺. Then it follows from the sequents Δ₁ ⊢ φ → ψ and Δ₂ ⊢ φ that
K_{Δ₁∪Δ₂} = K_{Δ₁} ∩ K_{Δ₂}
⊆ ((2^P \ K_φ) ∪ K_ψ) ∩ K_φ
= K_ψ ∩ K_φ
⊆ K_ψ.
K_Δ = K_Δ ∩ 2^P
= K_Δ ∩ ((2^P \ K_φ) ∪ K_φ)
= (K_Δ ∩ (2^P \ K_φ)) ∪ (K_Δ ∩ K_φ)
⊆ (K_Δ ∩ (2^P \ K_φ)) ∪ K_ψ
⊆ (2^P \ K_φ) ∪ K_ψ.
The knowledge-set interpretation of Δ(x̂) ⊢ ∀x, φ(x, x̂) is ∀x ∈ U, (K_{Δ(x)} ⊆ ∩_{y∈U} K_{φ(y,x)}), where U
is the universe of parameters of the predicate logic system which is being referred to. The interpretation
of Δ(x̂) ⊢ φ(x̂, x̂) is ∀x ∈ U, (K_{Δ(x)} ⊆ K_{φ(x,x)}). A universal quantifier is implied for all free variables
because there is an implicit claim that the assertion is true for all values of x in the universe of parameters.
Let τ ∈ 2^P be a truth value map. Then τ ∈ ∩_{x∈U} K_{φ(x,x)} if and only if τ ∈ K_{φ(x,x)} for all x ∈ U. (See
Notation 10.8.9 for the intersection of a family of sets.) Therefore
∀x ∈ U, (K_{Δ(x)} ⊆ ∩_{y∈U} K_{φ(y,x)})
⇔ ∀x ∈ U, ∀τ ∈ 2^P, (τ ∈ K_{Δ(x)} → τ ∈ ∩_{y∈U} K_{φ(y,x)})
⇔ ∀x ∈ U, ∀τ ∈ 2^P, (τ ∈ K_{Δ(x)} → ∀y ∈ U, τ ∈ K_{φ(y,x)})
⇔ ∀x ∈ U, ∀τ ∈ 2^P, ∀y ∈ U, (τ ∈ K_{Δ(x)} → τ ∈ K_{φ(y,x)}) (6.4.1)
⇔ ∀x ∈ U, ∀y ∈ U, ∀τ ∈ 2^P, (τ ∈ K_{Δ(x)} → τ ∈ K_{φ(y,x)}) (6.4.2)
⇔ ∀x ∈ U, ∀y ∈ U, (K_{Δ(x)} ⊆ K_{φ(y,x)}) (6.4.3)
⇒ ∀x ∈ U, (K_{Δ(x)} ⊆ K_{φ(x,x)}), (6.4.4)
which is the same as the knowledge-set interpretation for the sequent Δ(x̂) ⊢ φ(x̂, x̂). (Line (6.4.1)
follows from Theorem 6.6.12 (xviii). Line (6.4.2) follows from Theorem 6.6.22 (i). Line (6.4.3) follows from
Definition 7.3.2 and Notation 7.3.3.) Hence the inference of the sequent Δ(x̂) ⊢ φ(x̂, x̂) from the sequent
Δ(x̂) ⊢ ∀x, φ(x, x̂) is always valid. This establishes the validity of the UE rule.
In the case of the UI rule, the same reasoning applies as for the UE rule, except that the final line (6.4.4)
cannot be reversed if Δ depends on x or φ has an extraneous dependence on x. Therefore the dependence
on x must be removed. The resulting semantic calculation is then as follows.
K_Δ ⊆ ∩_{x∈U} K_{φ(x)}
⇔ ∀τ ∈ 2^P, (τ ∈ K_Δ → τ ∈ ∩_{x∈U} K_{φ(x)})
⇔ ∀τ ∈ 2^P, (τ ∈ K_Δ → ∀x ∈ U, τ ∈ K_{φ(x)})
⇔ ∀τ ∈ 2^P, ∀x ∈ U, (τ ∈ K_Δ → τ ∈ K_{φ(x)})
⇔ ∀x ∈ U, ∀τ ∈ 2^P, (τ ∈ K_Δ → τ ∈ K_{φ(x)})
⇔ ∀x ∈ U, (K_Δ ⊆ K_{φ(x)}),
which is the same as the knowledge-set interpretation for the sequent Δ ⊢ φ(x̂). This gives some insight
into why the UE rule permits the precondition Δ to depend on x, and the wff φ to have an extraneous
dependence on x, whereas the UI rule does not.
of others. Such inclusions and exclusions may be identified with knowledge sets, where each knowledge set
excludes some subset of the set 2^P of all possible combinations of true and false values for all propositions
in a concrete proposition domain P. It is helpful to have something to think of as the significance of
each logical assertion, and if that significance is fully consistent with all practical applications of that logic,
so much the better!
Fourth, although the logical axioms for predicate calculus may seem to be not much more than a convenient
shorthand for the corresponding set-manipulation operations, the real advantage of the predicate
calculus argument language is its extension to general first-order languages, where non-logical axioms are
introduced. Such non-logical axioms restrict the full space 2^P to a proper subset in ways which could be
excessively complex for convenient practical argumentation. Mathematical literature is almost entirely writ-
ten in an argumentative language from assumptions to assertions. Even though in principle such arguments
are equivalent to calculations of constraints above or below given target knowledge sets, that is not how
mathematicians generally think. (Recall that the subset relation for knowledge sets corresponds to the im-
plication operator for logical propositions. So constraints above or below knowledge sets correspond to
implication relations.)
freely choose which inference to draw. This seems perhaps somewhat paradoxical. One might reasonably
expect the conditions for the inference of Δ ⊢ ∃x, φ(x) to be significantly weaker than the conditions
for the inference of Δ ⊢ ∀x, φ(x). In fact, the antecedent proposition list Δ is permitted to depend
on x̂ in the EI case, and the wff φ may retain one or more copies of the free variable x̂ in the consequent
proposition ∃x, φ(x, x̂). These are in practice two significant weakenings of the conditions under which the
rule may be applied, which enormously increases the power of the rule.
which is the same as the knowledge-set interpretation for the sequent Δ(x̂) ⊢ ∃x, φ(x, x̂). (Line (6.4.5) follows
from Theorem 6.6.12 (xxvii). Line (6.4.6) follows from Theorem 6.6.22 (iii).) Hence the inference of the
sequent Δ(x̂) ⊢ ∃x, φ(x, x̂) from the sequent Δ(x̂) ⊢ φ(x̂, x̂) is always valid. This establishes the
validity of the EI rule.
of the EI rule isn't valid. This is partly because the existential meaning of the logical expression φ(x̌)
cannot be interpreted within the standard universalised sequent semantics, but also because the swapping of
universal and existential quantifiers in line (6.4.6) in Remark 6.4.8 cannot be reversed. A different semantic
system could remove both of these show-stoppers, but within the standard interpretation, these obstacles
cannot be removed. However, existential semantics can be effectively procured by locating the free-variable-
containing term φ(x̂) on the left of the assertion symbol. (This left-hand-side free-variable trick can be
seen as early as the 1928 axioms and rules of Hilbert/Ackermann [308], pages 65-70, which are summarised
in Remark 6.1.10.)
The sequent Δ₂, φ(x̂) ⊢ ψ has the interpretation ∀x ∈ U, ((Δ₂ ∧ φ(x)) → ψ) using standard universalised
free variable semantics, where any commas in the proposition list Δ₂ are converted to conjunction operators.
By Theorem 6.6.12 (iii), this expression is equivalent to (∃x ∈ U, (Δ₂ ∧ φ(x))) → ψ. Thus an existential
meaning is very conveniently manufactured using the standard universalised free variable interpretation. By
Theorem 6.6.16 (vi), this expression is also equivalent to (Δ₂ ∧ (∃x ∈ U, φ(x))) → ψ, which is equal to
the expansion of the sequent Δ₂, ∃x, φ(x) ⊢ ψ. This may quite plausibly be combined with the sequent
Δ₁ ⊢ ∃x, φ(x) to produce the sequent Δ₁ ∪ Δ₂ ⊢ ψ. This may be more carefully verified using the
knowledge-set interpretation.
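The propositional equivalence used here, namely that ∀x, ((Δ₂ ∧ φ(x)) → ψ) agrees with (Δ₂ ∧ ∃x, φ(x)) → ψ, can be checked over a finite universe (a Python sketch; Δ₂ is modelled as a single truth value, and all names are invented):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

U = ["x1", "x2", "x3"]

# For every assignment to phi on U and every truth value of the
# antecedent list d2 and the conclusion psi, the universalised form
# forall x, ((d2 & phi(x)) -> psi) agrees with the existential form
# (d2 & exists x, phi(x)) -> psi.
for values in product([False, True], repeat=len(U) + 2):
    *phi_values, d2, psi = values
    phi = dict(zip(U, phi_values))
    universalised = all(implies(d2 and phi[x], psi) for x in U)
    existential = implies(d2 and any(phi[x] for x in U), psi)
    assert universalised == existential
```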
In terms of knowledge sets, the sequent Δ₁ ⊢ ∃x, α(x) has the interpretation K₁ ⊆ ⋃x∈U Kα(x), the
sequent Δ₂, α(x) ⊢ β has the interpretation ∀x ∈ U, (K₂ ∩ Kα(x) ⊆ Kβ), and the sequent Δ₁ ∪ Δ₂ ⊢ β
has the interpretation K₁ ∩ K₂ ⊆ Kβ. Therefore the EE rule may be interpreted as follows.
(EE) From K₁ ⊆ ⋃x∈U Kα(x) and ∀x ∈ U, (K₂ ∩ Kα(x) ⊆ Kβ), infer K₁ ∩ K₂ ⊆ Kβ.
The second precondition may be rewritten as follows.
∀x ∈ U, (K₂ ∩ Kα(x) ⊆ Kβ)  ⟺  ⋃x∈U (K₂ ∩ Kα(x)) ⊆ Kβ,     (6.4.7)
where line (6.4.7) follows from Theorem 10.8.13 (i). Therefore the conjunction of the two preconditions
implies
K₁ ∩ K₂ ⊆ ⋃x∈U (Kα(x) ∩ ((2^P \ Kα(x)) ∪ Kβ))
        = ⋃x∈U (Kα(x) ∩ Kβ)
        ⊆ Kβ.
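The inclusion chain above can be checked mechanically on a small finite model. The following Python sketch is purely illustrative (the three-element proposition space P and the two-element universe are invented for the example): it enumerates every choice of knowledge sets and confirms that the two EE preconditions always force K₁ ∩ K₂ ⊆ Kβ.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

P = {0, 1, 2}     # a tiny space of "knowledge states" (invented for illustration)
U = ["a", "b"]    # a two-element universe of parameters

ALL = subsets(P)  # all 8 candidate knowledge sets

def ee_rule_holds():
    # Check: K1 <= union of Ka[x], and K2 & Ka[x] <= Kb for every x,
    # together imply K1 & K2 <= Kb, for every choice of knowledge sets.
    for K1 in ALL:
        for K2 in ALL:
            for Kb in ALL:
                for Ka0 in ALL:
                    for Ka1 in ALL:
                        Ka = {"a": Ka0, "b": Ka1}
                        pre1 = K1 <= (Ka0 | Ka1)
                        pre2 = all(K2 & Ka[x] <= Kb for x in U)
                        if pre1 and pre2 and not (K1 & K2 <= Kb):
                            return False
    return True

print(ee_rule_holds())  # True: 8^5 = 32768 cases, no counterexample
```

This is an exhaustive semantic check on one small model, not a proof, but it exercises exactly the set-inclusion form of the EE rule stated above.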
K_Δ ⊆ (2^P \ ⋃x∈U Kα(x)) ∪ Kβ,
which is the knowledge-set interpretation for the sequent Δ ⊢ (∃x, α(x)) ⇒ β. Hence rule EE′ is plausible.
is as follows for Theorem 4.5.6 (vi).
(1) (α ⇒ β) ⇒ (γ ⇒ (α ⇒ β))   PC 1: [α ⇒ β], [γ]   ⊣
Axiom (PC 1) is the proposition template α ⇒ (β ⇒ α), which is here uniformly substituted with the
wff α ⇒ β for the letter α, and with the wff γ for the letter β. The resulting substituted formula
(α ⇒ β) ⇒ (γ ⇒ (α ⇒ β)) is then appended to the list of argument lines. Substitutions into axioms
are always tautologies. So the list of dependencies for an axiom substitution line is always the empty list.
(Note that argument line dependency lists are not shown in Chapter 4 because the lists are always empty
for propositional calculus.)
if j < i, the line numbers in the justification will be in reverse numerical order. An example application of
the MP rule appears on line (4) of the proof of Theorem 6.6.12 part (i) as follows.
(1) (∃x, A(x)) ⇒ B   Hyp   ⊣ (1)
(2) A(x₀)   Hyp   ⊣ (2)
(3) ∃x, A(x)   EI (2)   ⊣ (2)
(4) B   MP (1,3)   ⊣ (1,2)
In this example, i = 1, j = 3, n = 4, Δᵢ = (1), Δⱼ = (2), Δₙ = (1, 2), α = ∃x, A(x) and β = B.
(5) A ⇒ B(x₀)   CP (2,4)   ⊣ (1)
In this example, i = 2, j = 4, n = 5, Δᵢ = (2), Δⱼ = (1, 2), Δₙ = (1), α = A and β = B(x₀).
(i) α(x)   Jᵢ   ⊣ (Δᵢ)
(n) ∀x, α(x)   UI (i)   ⊣ (Δᵢ)
This is almost the exact converse of the UE rule. (The only difference is that the dependency list Δᵢ may
depend on x in the UE case.)
Both the UE and UI rules are well exemplified in the proof of Theorem 6.6.12 part (xvii) as follows. (This
example also neatly demonstrates the interaction between the Hyp, MP and CP rules.)
(1) A ⇒ ∀x, B(x)   Hyp   ⊣ (1)
(2) A   Hyp   ⊣ (2)
(3) ∀x, B(x)   MP (1,2)   ⊣ (1,2)
(4) B(x₀)   UE (3)   ⊣ (1,2)
(5) A ⇒ B(x₀)   CP (2,4)   ⊣ (1)
(6) ∀x, (A ⇒ B(x))   UI (5)   ⊣ (1)
A very important point to note about the application of the UI rule is that the dependency list must contain
no instances of the free variable which is being removed by the application of the rule. This is a very easy
requirement to overlook. The dependencies of each argument line are abbreviated as a list of line numbers in
parentheses at the far right of the line. The lines referred to by these numbers must be examined to ensure
that the removed free variable is not present. In the above example, the UI rule is applied to line (5) to
remove the free variable x₀. Line (5) depends on line (1), which luckily does not contain the free variable x₀.
Two examples of attempted pseudo-proofs where this non-dependency check could easily be overlooked are
described in Remarks 6.6.8 and 6.6.24. This kind of error is particularly easy to make because it is not the
line to which the rule is being applied which must be checked. It is the line or lines referred to by the number
or numbers in parentheses at the right which must be checked!
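The non-dependency check described above is entirely mechanical, which suggests how a proof verifier might implement it. The following Python fragment is a minimal sketch under invented assumptions (the data structures and the crude substring occurrence test are made up for illustration; a real checker would walk a parse tree of each formula):

```python
def ui_check(lines, line_no, var):
    """Return True if the UI rule may remove `var` from line `line_no`.

    `lines` maps line numbers to (formula_text, dependency_list) pairs.
    The rule requires that NO line in the dependency list of `line_no`
    contains the free variable being universalised.
    """
    _, deps = lines[line_no]
    return all(var not in lines[d][0] for d in deps)

# The valid example from the text: UI may be applied to line (5),
# because its only dependency, line (1), does not contain x0.
proof = {
    1: ("A => all x, B(x)", [1]),   # Hyp
    2: ("A", [2]),                  # Hyp
    3: ("all x, B(x)", [1, 2]),     # MP (1,2)
    4: ("B(x0)", [1, 2]),           # UE (3)
    5: ("A => B(x0)", [1]),         # CP (2,4)
}

# A failing case in the style of the pseudo-proof of Remark 6.6.24:
# line (3) depends on itself, and it contains y0.
pseudo = {
    1: ("all y, some x, A(x, y)", [1]),   # Hyp
    2: ("some x, A(x, y0)", [1]),         # UE (1)
    3: ("A(x0, y0)", [3]),                # Hyp
}

print(ui_check(proof, 5, "x0"))   # True: UI is permitted
print(ui_check(pseudo, 3, "y0"))  # False: UI must be refused
```

The point of the sketch is only that the check inspects the lines referred to by the dependency list, not the line to which the rule is applied.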
6.5.6 Remark: Limited usefulness of the free variable dependency permission for the UE rule.
The UE rule seems to only rarely exploit the permission to make the dependency list Δᵢ depend on the free
variable x. This may be because when this happens, considerable information is thrown away, and mostly
one does not wish to throw away information in a proof. The weakness of this case can be seen if one
considers that the UE rule allows one to infer a sequent of the form Δ(y) ⊢ α(x) from a sequent of the
form Δ(y) ⊢ ∀x, α(x). A sequent Δ(y) ⊢ α(x) is valid both if x and y are different and if they are the
same, but requiring them to be the same clearly throws away a large range of possibilities. It is always
permitted to uniformly substitute one free variable for another in a sequent, and if this makes two of the free
variables the same, the range of propositions which are asserted by the sequent is considerably restricted.
The appearance of the free variable x in the statement of Definition 6.3.9 rule (UE) is merely permitting
the possibility of discarding useful information, which one can easily do in other ways. By contrast, the
appearance of the free variable x in Definition 6.3.9 rule (EI) offers a considerable strengthening of the rule
because in that case, x is being eliminated from the consequent of the sequent, not being introduced.
The following is a contrived example application of the UE rule where the dependency list contains the same
free variable which is being eliminated from the universally quantified predicate.
(1) ∀x, (A(x) ⇒ ∀y, B(y))   Hyp   ⊣ (1)
(2) A(x₀) ⇒ ∀y, B(y)   UE (1)   ⊣ (1)
(3) A(x₀)   Hyp   ⊣ (3)
(4) ∀y, B(y)   MP (2,3)   ⊣ (1,3)
(5) B(x₀)   UE (4)   ⊣ (1,3)
(6) A(x₀) ⇒ B(x₀)   CP (3,5)   ⊣ (1)
(7) ∀x, (A(x) ⇒ B(x))   UI (6)   ⊣ (1)
In this argument, clearly a lot of information has been given away by the application of the UE rule in
line (5) because the same free variable x₀ was chosen for the universal elimination as was used in the universal
elimination in line (2). This kind of choice is rarely useful in a proof, whereas in the case of the EI rule, it is
very common to see an existential quantifier introduction use the same free variable as a proposition in the
dependency list.
(7) (∀x, A(x)) ⇒ B   CP (3,6)   ⊣ (1)
The hypothesis in line (2) is introduced so that it can be discharged by the EE rule in line (6). The hypothesis
in line (3) is introduced so that it can be discharged by the CP rule in line (7). The justification "EE (1, 2, 5)"
for line (6) lists line (1) as the existentially quantified proposition ∃x, (A(x) ⇒ B) which will become the
new dependency for the proposition B which is input on line (5), while discharging the dependency on the
proposition A(x₀) ⇒ B on line (2). Thus the dependency list Δ₅ = (2, 3) is replaced with Δ₆ = (1, 3). This
is an important psychological step. It means that the argument from line (2) to line (5), which proves B from
A(x₀) ⇒ B for an arbitrary choice of x₀, justifies the conclusion that B follows from the existential proposition
∃x, (A(x) ⇒ B). In other words, no matter which value of x₀ "exists" in line (1), the conclusion B followed
from A(x₀) ⇒ B. Therefore the conclusion B follows from ∃x, (A(x) ⇒ B), independent of the choice of x₀,
and therefore the dependency on line (2) may be discharged. It is noteworthy that at no point is it stated
in this argument that x₀ "exists", nor that x₀ is "chosen", even though a mathematician composing such a
proof may often think of the value of x₀ as "existing" or "having been chosen".
(6) x, (A(x) B) UI (5) a (1) ((x, A(x)) B) (x, (A(x) B))
The argument lines on the left are expanded as sequents on the right. The dependency lists in parentheses at
the right of each argument line are fully expanded on the right, with commas replaced by logical conjunction
operators. Assertion symbols are converted to implication operators and free variables are assumed to be
universally quantified. Visual inspection of the sequent expansions reveals that they are all tautological.
(See Remark 6.6.24 for an example of how such sequent expansions can be used to reveal inference errors.)
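Such tautology checks can also be delegated to a brute-force model checker. The following Python sketch verifies the expansion of the final line, namely ((∃x, A(x)) ⇒ B) ⇒ (∀x, (A(x) ⇒ B)), over every predicate A and proposition B on a small universe (the three-element universe is invented purely for illustration):

```python
from itertools import product

U = [0, 1, 2]  # a small nonempty universe, invented for the check

def implies(p, q):
    """Material implication on booleans."""
    return (not p) or q

def taut_holds():
    # ((exists x, A(x)) => B)  =>  (forall x, (A(x) => B))
    for B in (False, True):
        for A_vals in product((False, True), repeat=len(U)):
            A = dict(zip(U, A_vals))
            lhs = implies(any(A[x] for x in U), B)
            rhs = all(implies(A[x], B) for x in U)
            if not implies(lhs, rhs):
                return False
    return True

print(taut_holds())  # True: no predicate/proposition combination falsifies it
```

A check over one finite universe is of course weaker than the syntactic proof given above; it is shown only as a practical way to detect inference errors of the kind discussed in Remark 6.6.24.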
The meaning of all lines in the above argument may be expressed in terms of the knowledge set concept in
Section 3.4 as follows. (To save space, the universal quantifier "∀x₀ ∈ U" is abbreviated as "∀x₀", where U
is the universe of predicate parameters.)
argument lines                              knowledge set semantics
(1) (∃x, A(x)) ⇒ B   Hyp   ⊣ (1)           (2^P \ ⋃x∈U KA(x)) ∪ KB ⊆ (2^P \ ⋃x∈U KA(x)) ∪ KB
(2) A(x₀)   Hyp   ⊣ (2)                    ∀x₀, KA(x₀) ⊆ KA(x₀)
(3) ∃x, A(x)   EI (2)   ⊣ (2)              ∀x₀, KA(x₀) ⊆ ⋃x∈U KA(x)
(4) B   MP (1,3)   ⊣ (1,2)                 ∀x₀, ((2^P \ ⋃x∈U KA(x)) ∪ KB) ∩ KA(x₀) ⊆ KB
(5) A(x₀) ⇒ B   CP (2,4)   ⊣ (1)           ∀x₀, (2^P \ ⋃x∈U KA(x)) ∪ KB ⊆ (2^P \ KA(x₀)) ∪ KB
(6) ∀x, (A(x) ⇒ B)   UI (5)   ⊣ (1)        (2^P \ ⋃x∈U KA(x)) ∪ KB ⊆ ⋂x∈U ((2^P \ KA(x)) ∪ KB)
The above table exemplifies the core concept underlying predicate logic in this book. As mentioned in
Remarks 2.1.1, 3.0.5 and 3.13.3, there is a kind of duality between logic and naive set theory where logical
language may be regarded as a shorthand for logical operations on knowledge sets, and knowledge sets may
be regarded as a model for a logical language. Some samples of this duality between the language world
and the model world for predicate logic are given in the table in Remark 5.3.11. The naive set theory
referred to here is the very minimalist set theory which is hinted at in Remark 3.2.9.
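This duality can be illustrated concretely by modelling knowledge sets as sets of truth assignments. In the following Python sketch (the two-atom world space is invented for illustration), conjunction, disjunction, negation and implication on propositions correspond to intersection, union, complement and complement-union on their knowledge sets:

```python
from itertools import product

# Worlds are truth assignments to two atomic propositions p and q.
worlds = [dict(zip("pq", bits)) for bits in product((False, True), repeat=2)]

def K(f):
    """Knowledge set of a formula: the set of worlds in which it holds."""
    return frozenset(i for i, w in enumerate(worlds) if f(w))

ALL = frozenset(range(len(worlds)))
p = lambda w: w["p"]
q = lambda w: w["q"]

# Logical operators become set operations on knowledge sets.
assert K(lambda w: p(w) and q(w)) == K(p) & K(q)            # conjunction ~ intersection
assert K(lambda w: p(w) or q(w)) == K(p) | K(q)             # disjunction ~ union
assert K(lambda w: not p(w)) == ALL - K(p)                  # negation ~ complement
assert K(lambda w: (not p(w)) or q(w)) == (ALL - K(p)) | K(q)  # implication
print("duality checks passed")
```

This is only a toy model of the correspondence, not the minimalist naive set theory of Remark 3.2.9 itself, but it makes the shorthand/model duality tangible.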
dependencies. Thus, for example, this assertion (under some complicated conditions) appears at the end of
a theorem which appears in Gilbarg/Trudinger [79], page 95.
Then
|u|′_{2,α;T} ≤ C(|u|_{0;Ω} + |f|^{(2)}_{0,α;T}),
where C = C(n, α, λ, Λ).
This informal practice contradicts the convention that C denotes the function and C(n, α, λ, Λ) denotes the
value of the function. (See Notation 10.2.8 for this convention for functions and their values.) The purpose
of this style of notation is to suggest that an expression depends only on the indicated list of parameters,
or at least it is intended to suggest that any other dependencies are irrelevant to the context in which the
expression appears.
It is the informal practice which is adopted for predicate calculus here. Thus "∀x, A", for example, indicates
that the logical expression A has no occurrence of the variable x when it is fully expanded. This is essentially
equivalent to writing ∀x, A(x) with the proviso that A(x) is the same proposition for any x ∈ U, for the
relevant universe U. One may think of the absence of a parameter x in a logical expression A as meaning
that the proposition ∀x, ∀y, (A(x) ⇔ A(y)) is being asserted. In mathematical logic, as in mathematics in
general, one must always be alert for hidden dependencies. The standard logic textbooks abound in tedious
conditions regarding bound and free variables. It is often not at all clear that the technical conditions
regarding variables in predicate calculus are not subjective or intuitive.
The convention adopted in this book for the indication of proposition parameter lists is also described in
Remark 5.2.5.
(i) ∀x, A ⊢ A.
(ii) ∃x, A ⊢ A.
(iii) A ⊢ ∀x, A.
(iv) A ⊢ ∃x, A.
To prove Theorem 6.6.4 part (i): ∀x, A ⊢ A
(1) ∀x, A(x)   Hyp   ⊣ (1)
(2) A(x₀)   UE (1)   ⊣ (1)
This is a correct proof of part (i) because A(x₀) is the same as A for any choice of x₀. In the proof provided
for part (i) following the statement of Theorem 6.6.4, the redundant dependency on the parameter of A has
been omitted. As mentioned in Remark 6.6.3, the convention is adopted here that the absence of an explicit
parameter for a predicate means that the predicate is independent of that parameter.
To prove Theorem 6.6.4 part (ii): ∃x, A ⊢ A
(1) ∃x, A(x)   Hyp   ⊣ (1)
(2) A(x₀)   Hyp   ⊣ (2)
(3) A(x₀)   Theorem 4.5.6 (xii) (2)   ⊣ (2)
(4) A(x₀)   EE (1,2,3)   ⊣ (1)
Once again, the fact that A(x₀) is independent of x₀ implies that the final line (4) is the same as A with
the redundant dependency suppressed. However, there is no need to resort to such artifices. The proof of
Theorem 6.6.4 follows the specified rules.
(3) ∀x, A(x)   Hyp   ⊣ (3)
(4) A(x₀)   UE (3)   ⊣ (3)
(5) (∀x, A(x)) ⇒ A(x₀)   CP (3,4)   ⊣
(6) ¬(∀x, A(x))   Theorem 4.6.4 (v) (2,5)   ⊣ (2)
(7) ¬(∀x, A(x))   EE (1,2,6)   ⊣ (1)
To prove part (iv): ∀x, A(x) ⊢ ¬(∃x, ¬A(x))
(1) ∀x, A(x)   Hyp   ⊣ (1)
(2) ∃x, ¬A(x)   Hyp   ⊣ (2)
(3) ¬(∀x, A(x))   part (iii) (2)   ⊣ (2)
(4) (∃x, ¬A(x)) ⇒ ¬(∀x, A(x))   CP (2,3)   ⊣
(5) ¬(∃x, ¬A(x))   Theorem 4.6.4 (iv) (1,4)   ⊣ (1)
To prove part (v): ∀x, ¬A(x) ⊢ ¬(∃x, A(x))
(1) ∀x, ¬A(x)   Hyp   ⊣ (1)
(2) ∃x, A(x)   Hyp   ⊣ (2)
(3) ¬(∀x, ¬A(x))   part (ii) (2)   ⊣ (2)
(4) (∃x, A(x)) ⇒ ¬(∀x, ¬A(x))   CP (2,3)   ⊣
(5) ¬(∃x, A(x))   Theorem 4.6.4 (iv) (1,4)   ⊣ (1)
This completes the proof of Theorem 6.6.7.
6.6.9 Remark: Some practically useful relations between universal and existential quantifiers.
Theorem 6.6.10 gives some practically useful equivalent forms of assertions (ii) and (iv) in Theorem 6.6.7.
(v) ¬(∃x, A(x)) ⊢ ∀x, ¬A(x).
(vi) ¬(∀x, ¬A(x)) ⊢ ∃x, A(x).
(vii) ¬(∃x, ¬A(x)) ⊢ ∀x, A(x).
(viii) ¬(∀x, A(x)) ⊢ ∃x, ¬A(x).
(ix) ⊢ ∀x, (A(x) ∨ ¬A(x)).
(2) (¬(∀x, A(x))) ⇒ (∃x, ¬A(x))   part (iv)   ⊣
(3) ∃x, ¬A(x)   MP (2,1)   ⊣ (1)
To prove part (ix): ⊢ ∀x, (A(x) ∨ ¬A(x)).
(1) A(x₀) ∨ ¬A(x₀)   Theorem 4.5.6 (xi)   ⊣
(2) ∀x, (A(x) ∨ ¬A(x))   UI (1)   ⊣
This completes the proof of Theorem 6.6.10.
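The quantifier negation equivalences of Theorem 6.6.10 are easily confirmed by exhaustive model checking. The following Python sketch tests the two De Morgan-style dualities over every predicate on a small universe, including the empty universe (the particular universes are invented for the check):

```python
from itertools import product

def duality_holds(U):
    """Check  not(exists x, A(x)) <-> forall x, not A(x)
    and      not(forall x, A(x)) <-> exists x, not A(x)
    for every predicate A on the finite universe U."""
    for A_vals in product((False, True), repeat=len(U)):
        A = dict(zip(U, A_vals))
        if (not any(A[x] for x in U)) != all(not A[x] for x in U):
            return False
        if (not all(A[x] for x in U)) != any(not A[x] for x in U):
            return False
    return True

print(duality_holds([0, 1, 2]), duality_holds([]))  # True True
```

Note that, unlike some of the quantifier-distribution laws discussed later, these negation dualities survive even when the universe is empty.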
6.6.11 Remark: Some basic assertions for a single-parameter predicate and a single proposition.
In Theorem 6.6.12, part (x) is the contrapositive of part (ix), and is therefore essentially equivalent. However,
the method of proof is notably different.
6.6.12 Theorem [qc]: Basic assertions for one single-parameter predicate and one simple proposition.
The following assertions follow from the predicate calculus in Definition 6.3.9.
(i) (∃x, A(x)) ⇒ B ⊢ ∀x, (A(x) ⇒ B).
(ii) ∀x, (A(x) ⇒ B) ⊢ (∃x, A(x)) ⇒ B.
(iii) (∃x, A(x)) ⇒ B ⊣⊢ ∀x, (A(x) ⇒ B).
(iv) ∃x, ¬A(x) ⊢ ∃x, (A(x) ⇒ B).
(v) ⊢ (∃x, ¬A(x)) ⇒ (∃x, (A(x) ⇒ B)).
(vi) B ⊢ ∀x, (A(x) ⇒ B).
(vii) B ⊢ ∃x, (A(x) ⇒ B).
(viii) ⊢ B ⇒ (∃x, (A(x) ⇒ B)).
(1) (∃x, A(x)) ⇒ B   Hyp   ⊣ (1)
(2) A(x₀)   Hyp   ⊣ (2)
(3) ∃x, A(x)   EI (2)   ⊣ (2)
(4) B   MP (1,3)   ⊣ (1,2)
(5) A(x₀) ⇒ B   CP (2,4)   ⊣ (1)
(6) ∀x, (A(x) ⇒ B)   UI (5)   ⊣ (1)
To prove part (ii): ∀x, (A(x) ⇒ B) ⊢ (∃x, A(x)) ⇒ B
(1) ∀x, (A(x) ⇒ B)   Hyp   ⊣ (1)
(2) ∃x, A(x)   Hyp   ⊣ (2)
(3) A(x₀)   Hyp   ⊣ (3)
(4) A(x₀) ⇒ B   UE (1)   ⊣ (1)
(5) B   MP (4,3)   ⊣ (1,3)
(6) B   EE (2,3,5)   ⊣ (1,2)
(7) (∃x, A(x)) ⇒ B   CP (2,6)   ⊣ (1)
Part (iii) follows from parts (i) and (ii).
To prove part (iv): ∃x, ¬A(x) ⊢ ∃x, (A(x) ⇒ B)
(1) ∃x, ¬A(x)   Hyp   ⊣ (1)
(2) ¬A(x₀)   Hyp   ⊣ (2)
(3) ¬A(x₀) ∨ B   Theorem 4.7.9 (xi) (2)   ⊣ (2)
(4) A(x₀) ⇒ B   Theorem 4.7.9 (lvii) (3)   ⊣ (2)
(5) ∃x, (A(x) ⇒ B)   part (iv) (4)   ⊣ (3)
(6) (¬(∀x, A(x))) ⇒ (∃x, (A(x) ⇒ B))   CP (3,5)   ⊣
(7) B ⇒ (∃x, (A(x) ⇒ B))   part (viii)   ⊣
(8) ∃x, (A(x) ⇒ B)   Theorem 4.7.9 (xxxvi) (2,6,7)   ⊣ (1)
To prove part (x): ¬(∃x, (A(x) ⇒ B)) ⊢ ¬((∀x, A(x)) ⇒ B)
(1) ¬(∃x, (A(x) ⇒ B))   Hyp   ⊣ (1)
(2) ∀x, ¬(A(x) ⇒ B)   Theorem 6.6.10 (v) (1)   ⊣ (1)
(3) ¬(A(x₀) ⇒ B)   UE (2)   ⊣ (1)
(4) A(x₀)   Theorem 4.5.6 (xlii) (3)   ⊣ (1)
(5) ∀x, A(x)   UI (4)   ⊣ (1)
(6) ¬B   Theorem 4.5.6 (xliii) (3)   ⊣ (1)
(7) ¬((∀x, A(x)) ⇒ B)   Theorem 4.5.6 (xlv) (5,6)   ⊣ (1)
To prove part (xi): ∃x, (A(x) ⇒ B) ⊢ (∀x, A(x)) ⇒ B
(1) ∃x, (A(x) ⇒ B)   Hyp   ⊣ (1)
(2) A(x₀) ⇒ B   Hyp   ⊣ (2)
(3) ∀x, A(x)   Hyp   ⊣ (3)
(4) A(x₀)   UE (3)   ⊣ (3)
(5) B   MP (2,4)   ⊣ (2,3)
(6) B   EE (1,2,5)   ⊣ (1,3)
(7) (∀x, A(x)) ⇒ B   CP (3,6)   ⊣ (1)
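Part (xi) can also be confirmed semantically by brute force. The following Python sketch (the universes are invented for the check) model-checks the sequent on a finite universe, and also illustrates that the reverse direction, part (ix), genuinely requires a nonempty universe:

```python
from itertools import product

def part_xi_holds(U):
    """Model-check  (exists x, (A(x) => B))  entails  ((forall x, A(x)) => B)."""
    for B in (False, True):
        for A_vals in product((False, True), repeat=len(U)):
            A = dict(zip(U, A_vals))
            lhs = any((not A[x]) or B for x in U)        # exists x, (A(x) => B)
            rhs = (not all(A[x] for x in U)) or B        # (forall x, A(x)) => B
            if lhs and not rhs:
                return False
    return True

print(part_xi_holds([0, 1, 2]))  # True

# The reverse direction (part (ix)) fails on the empty universe with B true:
U0 = []
premise = (not all(True for x in U0)) or True   # (forall x, A(x)) => B, with B = True
conclusion = any(True for x in U0)              # exists x, (A(x) => B)
print(premise, conclusion)  # True False
```

The finite check is no substitute for the syntactic proof above, but it localises exactly where an existence assumption enters.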
(5) A ⇒ B(x₀)   CP (2,4)   ⊣ (1)
(6) ∀x, (A ⇒ B(x))   UI (5)   ⊣ (1)
Part (xviii) follows from parts (xvi) and (xvii).
To prove part (xix): ¬A ⊢ ∀x, (A ⇒ B(x))
(1) ¬A   Hyp   ⊣ (1)
(2) A ⇒ B(x₀)   Theorem 4.5.6 (xl) (1)   ⊣ (1)
(3) ∀x, (A ⇒ B(x))   UI (2)   ⊣ (1)
To prove part (xx): ⊢ ¬A ⇒ (∀x, (A ⇒ B(x)))
(1) ¬A   Hyp   ⊣ (1)
(2) ∀x, (A ⇒ B(x))   part (xix) (1)   ⊣ (1)
(3) ¬A ⇒ (∀x, (A ⇒ B(x)))   CP (1,2)   ⊣
To prove part (xxi): ¬A ⊢ ∃x, (A ⇒ B(x))
(1) ¬A   Hyp   ⊣ (1)
(2) A ⇒ B(x₀)   Theorem 4.5.6 (xl) (1)   ⊣ (1)
(3) ∃x, (A ⇒ B(x))   EI (2)   ⊣ (1)
To prove part (xxii): ⊢ ¬A ⇒ (∃x, (A ⇒ B(x)))
(1) ¬A   Hyp   ⊣ (1)
(2) ∃x, (A ⇒ B(x))   part (xxi) (1)   ⊣ (1)
(3) ¬A ⇒ (∃x, (A ⇒ B(x)))   CP (1,2)   ⊣
(5) A ⇒ B(x₀)   Theorem 4.5.6 (i) (4)   ⊣ (4)
(6) ∃x, (A ⇒ B(x))   EI (5)   ⊣ (4)
(7) ∃x, (A ⇒ B(x))   EE (3,4,6)   ⊣ (1,2)
(8) A ⇒ (∃x, (A ⇒ B(x)))   CP (2,7)   ⊣ (1)
(9) ¬A ⇒ (∃x, (A ⇒ B(x)))   part (xxii)   ⊣
(10) ∃x, (A ⇒ B(x))   Theorem 4.6.4 (iii) (8,9)   ⊣ (1)
To prove part (xxvi): A ⇒ ∃x, B(x) ⊢ ∃x, (A ⇒ B(x)) [second proof]
(1) A ⇒ ∃x, B(x)   Hyp   ⊣ (1)
(2) ¬A ∨ (∃x, B(x))   Theorem 4.7.9 (lvi) (1)   ⊣ (1)
(3) ¬A ⇒ (∃x, (A ⇒ B(x)))   part (xxii)   ⊣
(4) (∃x, B(x)) ⇒ (∃x, (A ⇒ B(x)))   part (xxiv)   ⊣
(5) (¬A ∨ (∃x, B(x))) ⇒ (∃x, (A ⇒ B(x)))   Theorem 4.7.9 (xxxvii) (3,4)   ⊣
(6) ∃x, (A ⇒ B(x))   MP (5,2)   ⊣ (1)
Part (xxvii) follows from parts (xxv) and (xxvi).
To prove part (xxviii): ¬(∀x, (A(x) ⇒ B)) ⊢ ¬((∃x, A(x)) ⇒ B)
(1) ¬(∀x, (A(x) ⇒ B))   Hyp   ⊣ (1)
(2) (∃x, A(x)) ⇒ B   Hyp   ⊣ (2)
(3) ∀x, (A(x) ⇒ B)   part (i) (2)   ⊣ (2)
(4) ((∃x, A(x)) ⇒ B) ⇒ (∀x, (A(x) ⇒ B))   CP (2,3)   ⊣
(5) ¬((∃x, A(x)) ⇒ B)   Theorem 4.5.6 (xxxii) (4,1)   ⊣ (1)
is provably complete, then one can have confidence that further analysis will some day achieve a result one
way or the other. And if one can prove somehow that no proof within the system is possible, one can state
with some confidence that the desired theorem is false. But within an incomplete system, it would not be
clear whether the expenditure of further effort is justified after a hundred or more years of intense research
has achieved no result.
In this book, it is generally assumed that there is a fixed underlying logical system (or model), and that
the logical language or calculus is merely a convenient means for discovering theorems for that logical system.
If the calculus is not capable of proving all theorems which are true within the underlying system, this raises
the possibility that other systems might satisfy all of the provable theorems of the calculus, but differing from
the chosen system only in those theorems which are not provable. This kind of speculation leads down the
path to model theory, where one fixes the axioms and rules of the language and investigates the possible
systems which satisfy the language. In model theory, the language is fixed and the underlying model is
variable. But if the model is fixed, and the language is regarded as a mere tool for studying it, then model
theory loses much of its allure. Model theory is less interesting if the model is a single fixed model.
Figure 6.6.1 illustrates one way in which a logical language could be used either with only one interpretation
or with many interpretations. In the case of the transmitting (TX) entity, there is only one model, which is
axiomatised for communicating ideas efficiently. The receiving (RX) entity only has access to the language
and therefore cannot be certain about the model from which the TX entity created the language. So for the
RX entity, there is ambiguity in the interpretation, while for the TX entity there is no such ambiguity.
6.6.15 Remark: Some basic assertions for the conjunction and disjunction operators.
The assertions of Theorem 6.6.12 for a single-parameter predicate and a single proposition all concern the
implication and negation logical operators. (Some of the proofs for Theorem 6.6.12 did admittedly make
use of the conjunction and disjunction operators for convenience, but these operators did not appear in the
assertions of the theorem themselves.) Theorem 6.6.16 concerns the conjunction and disjunction operators.
[Figure 6.6.1: axiomatisation semantics (TX model) versus interpretation semantics (RX models).]
Theorem 6.6.16 parts (ii) and (iii) are not valid if the universe is empty. Therefore they cannot be extended
to obtain the corresponding set-theoretic theorems of the form ∀x ∈ X, (A(x) ∧ B) ⊢ (∀x ∈ X, A(x)) ∧ B
and (∀x ∈ X, A(x)) ∧ B ⊣⊢ ∀x ∈ X, (A(x) ∧ B) unless it is known that X ≠ ∅. The same comment
applies to parts (vii) and (viii).
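The empty-universe failure of part (ii) can be seen concretely. The following Python sketch model-checks the sequent ∀x, (A(x) ∧ B) ⊢ (∀x, A(x)) ∧ B: it holds for every predicate on a nonempty universe (the two-element universe is invented for the check), but fails when the universe is empty and B is false.

```python
def distribute_forall_and(U, A, B):
    """Check one instance of  forall x, (A(x) and B)  |-  (forall x, A(x)) and B."""
    lhs = all(A[x] and B for x in U)
    rhs = all(A[x] for x in U) and B
    return (not lhs) or rhs   # premise entails conclusion?

# Nonempty universe: the sequent holds for every predicate A and proposition B.
U = [0, 1]
ok = all(distribute_forall_and(U, {0: a0, 1: a1}, B)
         for a0 in (False, True) for a1 in (False, True) for B in (False, True))
print(ok)  # True

# Empty universe: the left side is vacuously true, but B may be false.
print(distribute_forall_and([], {}, False))  # False: the inference fails
```

The counterexample is exactly the one implicit in the remark: a vacuously true universal premise cannot deliver the unquantified conjunct B.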
6.6.16 Theorem [qc]: The following assertions follow from the predicate calculus in Definition 6.3.9.
(i) (∀x, A(x)) ∧ B ⊢ ∀x, (A(x) ∧ B).
(ii) ∀x, (A(x) ∧ B) ⊢ (∀x, A(x)) ∧ B.
(iii) (∀x, A(x)) ∧ B ⊣⊢ ∀x, (A(x) ∧ B).
(iv) (∃x, A(x)) ∧ B ⊢ ∃x, (A(x) ∧ B).
(v) ∃x, (A(x) ∧ B) ⊢ (∃x, A(x)) ∧ B.
(vi) (∃x, A(x)) ∧ B ⊣⊢ ∃x, (A(x) ∧ B).
(vii) (∀x, A(x)) ∨ B ⊢ ∀x, (A(x) ∨ B).
(viii) ∀x, (A(x) ∨ B) ⊢ (∀x, A(x)) ∨ B.
(ix) (∀x, A(x)) ∨ B ⊣⊢ ∀x, (A(x) ∨ B).
To prove part (i): (∀x, A(x)) ∧ B ⊢ ∀x, (A(x) ∧ B)
(1) (∀x, A(x)) ∧ B   Hyp   ⊣ (1)
(2) ¬((∀x, A(x)) ⇒ ¬B)   Definition 4.7.2 (ii) (1)   ⊣ (1)
(3) ¬(∃x, (A(x) ⇒ ¬B))   Theorem 6.6.12 (xiv) (2)   ⊣ (1)
(4) ∀x, ¬(A(x) ⇒ ¬B)   Theorem 6.6.10 (v) (3)   ⊣ (1)
(5) ∀x, (A(x) ∧ B)   Definition 4.7.2 (ii) (4)   ⊣ (1)
To prove part (ii): ∀x, (A(x) ∧ B) ⊢ (∀x, A(x)) ∧ B
(1) ∀x, (A(x) ∧ B)   Hyp   ⊣ (1)
(2) ∀x, ¬(A(x) ⇒ ¬B)   Definition 4.7.2 (ii) (1)   ⊣ (1)
(3) ¬(∃x, (A(x) ⇒ ¬B))   Theorem 6.6.7 (v) (2)   ⊣ (1)
(4) ¬((∀x, A(x)) ⇒ ¬B)   Theorem 6.6.12 (x) (3)   ⊣ (1)
(5) (∀x, A(x)) ∧ B   Definition 4.7.2 (ii) (4)   ⊣ (1)
Part (iii) follows from parts (i) and (ii).
To prove part (iv): (∃x, A(x)) ∧ B ⊢ ∃x, (A(x) ∧ B) [first proof]
(1) (∃x, A(x)) ∧ B   Hyp   ⊣ (1)
(2) ¬((∃x, A(x)) ⇒ ¬B)   Definition 4.7.2 (ii) (1)   ⊣ (1)
(3) ¬(∀x, (A(x) ⇒ ¬B))   Theorem 6.6.12 (xxix) (2)   ⊣ (1)
(4) ∃x, ¬(A(x) ⇒ ¬B)   Theorem 6.6.10 (viii) (3)   ⊣ (1)
(5) ∃x, (A(x) ∧ B)   Definition 4.7.2 (ii) (4)   ⊣ (1)
To prove part (iv): (∃x, A(x)) ∧ B ⊢ ∃x, (A(x) ∧ B) [second proof]
(3) ¬B ⇒ A(x₀)   Theorem 4.7.9 (lviii) (2)   ⊣ (1)
(4) ∀x, (¬B ⇒ A(x))   UI (3)   ⊣ (1)
(5) ¬B ⇒ ∀x, A(x)   Theorem 6.6.12 (xvi) (4)   ⊣ (1)
(6) (∀x, A(x)) ∨ B   Theorem 4.7.9 (lix) (5)   ⊣ (1)
Part (ix) follows from parts (vii) and (viii).
6.6.17 Remark: Importation of two-proposition theorems into predicate calculus.
All of the propositional calculus theorems which are valid for two logical expressions can be immediately
imported into predicate calculus by making both logical expressions depend on the same variable object.
Theorem 6.6.18 gives a sample of such imported assertions. Such results are a dime a dozen, and can
obviously be generated automatically.
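Such automatic generation is easy to sketch. The following Python fragment (the list of two-letter equivalences is a small invented sample) imports propositional equivalences into predicate form by replacing the letters with A(x), B(x), and model-checks the universally quantified versions on a finite universe:

```python
from itertools import product

# Pairs of equivalent two-letter propositional forms, to be "imported"
# into predicate calculus by quantifying over a shared parameter.
EQUIVALENCES = [
    (lambda a, b: (not a) or b, lambda a, b: (not a) or b),              # a => b
    (lambda a, b: a and b, lambda a, b: not ((not a) or (not b))),       # De Morgan
]

def imported_form_holds(lhs, rhs, U):
    """Check  forall x, lhs(A(x), B(x))  <->  forall x, rhs(A(x), B(x))."""
    for A_vals in product((False, True), repeat=len(U)):
        for B_vals in product((False, True), repeat=len(U)):
            A, B = dict(zip(U, A_vals)), dict(zip(U, B_vals))
            if all(lhs(A[x], B[x]) for x in U) != all(rhs(A[x], B[x]) for x in U):
                return False
    return True

print(all(imported_form_holds(l, r, [0, 1, 2]) for l, r in EQUIVALENCES))  # True
```

Since the two sides of each imported assertion agree pointwise, the quantified forms necessarily agree too, which is why such results can be churned out mechanically.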
6.6.18 Theorem [qc]: The following assertions follow from the predicate calculus in Definition 6.3.9.
(i) ∀x, (A(x) ⇒ B(x)) ⊢ ∀x, (¬A(x) ∨ B(x)).
(ii) ∀x, (¬A(x) ∨ B(x)) ⊢ ∀x, (A(x) ⇒ B(x)).
(iii) ∀x, (A(x) ⇒ B(x)) ⊣⊢ ∀x, (¬A(x) ∨ B(x)).
(iv) ∀x, B(x) ⊢ ∀x, (A(x) ⇒ B(x)).
Proof: To prove part (i): ∀x, (A(x) ⇒ B(x)) ⊢ ∀x, (¬A(x) ∨ B(x))
(1) ∀x, (A(x) ⇒ B(x))   Hyp   ⊣ (1)
(2) A(x₀) ⇒ B(x₀)   UE (1)   ⊣ (1)
(3) ¬A(x₀) ∨ B(x₀)   Theorem 4.7.9 (lvi) (2)   ⊣ (1)
(4) ∀x, (¬A(x) ∨ B(x))   UI (3)   ⊣ (1)
soft predicates with any numbers of parameters to be substituted for the wff-names. So the rules are in fact
followed correctly here.
6.6.20 Theorem [qc]: The following assertions follow from the predicate calculus in Definition 6.3.9.
(i) ∀x, (A(x) ∧ B(x)) ⊢ (∀x, A(x)) ∧ (∀x, B(x)).
(ii) (∀x, A(x)) ∧ (∀x, B(x)) ⊢ ∀x, (A(x) ∧ B(x)).
(iii) ∀x, (A(x) ∧ B(x)) ⊣⊢ (∀x, A(x)) ∧ (∀x, B(x)).
Proof: To prove part (i): ∀x, (A(x) ∧ B(x)) ⊢ (∀x, A(x)) ∧ (∀x, B(x))
(1) ∀x, (A(x) ∧ B(x))   Hyp   ⊣ (1)
(2) A(x₀) ∧ B(x₀)   UE (1)   ⊣ (1)
(3) A(x₀)   Theorem 4.7.9 (ix) (2)   ⊣ (1)
(4) ∀x, A(x)   UI (3)   ⊣ (1)
(5) B(x₀)   Theorem 4.7.9 (x) (2)   ⊣ (1)
(6) ∀x, B(x)   UI (5)   ⊣ (1)
(7) (∀x, A(x)) ∧ (∀x, B(x))   Theorem 4.7.9 (i) (4,6)   ⊣ (1)
To prove part (ii): (∀x, A(x)) ∧ (∀x, B(x)) ⊢ ∀x, (A(x) ∧ B(x))
(1) (∀x, A(x)) ∧ (∀x, B(x))   Hyp   ⊣ (1)
(2) ∀x, A(x)   Theorem 4.7.9 (ix) (1)   ⊣ (1)
(3) ∀x, B(x)   Theorem 4.7.9 (x) (1)   ⊣ (1)
(4) A(x₀)   UE (2)   ⊣ (1)
(4) ∃x, A(x, y₀)   EI (3)   ⊣ (3)
(5) ∃y, ∃x, A(x, y)   EI (4)   ⊣ (3)
(6) ∃y, ∃x, A(x, y)   EE (2,3,5)   ⊣ (2)
(7) ∃y, ∃x, A(x, y)   EE (1,2,6)   ⊣ (1)
To prove part (iii): ∃x, ∀y, A(x, y) ⊢ ∀y, ∃x, A(x, y)
(1) ∃x, ∀y, A(x, y)   Hyp   ⊣ (1)
(2) ∀y, A(x₀, y)   Hyp   ⊣ (2)
(3) A(x₀, y₀)   UE (2)   ⊣ (2)
(4) ∃x, A(x, y₀)   EI (3)   ⊣ (2)
(5) ∀y, ∃x, A(x, y)   UI (4)   ⊣ (2)
(6) ∀y, ∃x, A(x, y)   EE (1,2,5)   ⊣ (1)
This completes the proof of Theorem 6.6.22.
6.6.24 Remark: An attempt to reverse a non-reversible theorem.
In Remark 6.6.8, an attempt is made to reverse the non-reversible theorem ∀x, A(x) ⊢ ∃x, A(x), which is
given as Theorem 6.6.7 part (i). Similarly, Theorem 6.6.22 part (iii) is non-reversible. It is of some interest
to determine whether the chosen predicate calculus in Definition 6.3.9 will permit a false reversal in this
case, and if not, why not.
False converse of Theorem 6.6.22 part (iii): ∀y, ∃x, A(x, y) ⊢ ∃x, ∀y, A(x, y) [pseudo-theorem]
(1) ∀y, ∃x, A(x, y)   Hyp   ⊣ (1)
(2) ∃x, A(x, y₀)   UE (1)   ⊣ (1)
(3) A(x₀, y₀)   Hyp   ⊣ (3)
(4) ∀y, A(x₀, y)   (*) UI (3)   ⊣ (3)
(5) ∃x, ∀y, A(x, y)   EI (4)   ⊣ (3)
(6) ∃x, ∀y, A(x, y)   EE (2,3,5)   ⊣ (1)
It is quite difficult to see why this form of proof fails. Lines (1) to (6) have the following meanings when
they are explicitly expanded as sequents. (See Remark 6.5.9 regarding such sequent expansions.)
(1) (∀y, ∃x, A(x, y)) ⇒ (∀y, ∃x, A(x, y))   ⊣ (1)
(2) ∀y₀ ∈ U, ((∀y, ∃x, A(x, y)) ⇒ (∃x, A(x, y₀)))   ⊣ (1)
(3) ∀x₀ ∈ U, ∀y₀ ∈ U, (A(x₀, y₀) ⇒ A(x₀, y₀))   ⊣ (3)
(4) ∀x₀ ∈ U, ∀y₀ ∈ U, (A(x₀, y₀) ⇒ (∀y, A(x₀, y)))   (*)   ⊣ (3)
(5) ∀x₀ ∈ U, ∀y₀ ∈ U, (A(x₀, y₀) ⇒ (∃x, ∀y, A(x, y)))   (*)   ⊣ (3)
(6) (∀y, ∃x, A(x, y)) ⇒ (∃x, ∀y, A(x, y))   (*)   ⊣ (1)
When expanded as explicit sequents like this, it is clear that line (4) is not a valid sequent. (As a consequence,
lines (5) and (6) are also not valid.) The cause of this is the incorrect application of the UI rule. This rule
states that from Δ ⊢ α(x) one may infer Δ ⊢ ∀x, α(x). The dependency list for line (3) contains the single
proposition A(x₀, y₀), which clearly does depend on the parameter y₀. So the UI rule cannot be applied.
Thus the error in the above pseudo-proof lies in the incorrect application of the UI rule, not the EE rule as
one could have suspected. In fact, it is always invalid to apply the UI rule to a hypothesised sequent, since
the free variable to which the universalisation is to be applied always appears in the dependency list if it
appears in the consequence. (See Remark 6.6.8 for some similar comments in regard to the application of
Rule G in the context of a related pseudo-proof.) Rosser [337], page 124, referred to the UI rule as "Rule G",
meaning the "generalisation rule".
At first sight, it might seem that the inference of line (4) from line (3) is correct because line (3) (in the
abbreviated argument lines) seems to say that A(x₀, y₀) is true for all x₀ ∈ U and y₀ ∈ U. But it does not
say this. Line (3) actually says that A(x₀, y₀) ⇒ A(x₀, y₀) for all x₀ ∈ U and y₀ ∈ U, which really asserts
nothing about A at all. One might also think that the fully expanded sequent on line (4) asserts that the
statement ∀x₀ ∈ U, ∀y, A(x₀, y) follows from the statement ∀x₀ ∈ U, ∀y₀ ∈ U, A(x₀, y₀). That would
be a correct assertion. But line (4) does not say this! The key point here is that the universal quantifiers
for the free variables are applied to the entire sequent, not to the consequent and antecedent propositions
of the sequent separately.
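A concrete counterexample shows why line (6) of the pseudo-proof must fail. In the following Python sketch, the two-element universe and the predicate A(x, y) := (x = y) are invented for illustration: every y has some x making A(x, y) true, but no single x works for all y.

```python
# A(x, y) := (x == y) on a two-element universe.
U = [0, 1]
A = lambda x, y: x == y

premise = all(any(A(x, y) for x in U) for y in U)      # forall y, exists x, A(x, y)
conclusion = any(all(A(x, y) for y in U) for x in U)   # exists x, forall y, A(x, y)
print(premise, conclusion)  # True False
```

Since the premise is true and the conclusion false in this model, no correct calculus can derive the pseudo-theorem, which confirms that the argument must break down somewhere, and the sequent expansions locate the break at line (4).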
This pseudo-proof example demonstrates that it is not sufficient to examine only the clearly visible consequent
proposition of a sequent before applying an inference rule. The consequent proposition appears clearly in the
foreground of an argument line, but in the background is a list of dependencies at the right of the line. Every
dependency in that list must be examined to ensure that the variable to be universalised in an application of
the UI rule is not present. The dependency list is an integral part of the sequent. This observation applies
also to the EE rule. The absence of a free variable parameter (x) appended to a dependency list in
Definition 6.3.9 means that the propositions in the dependency list must not contain the free variable x.
It is perhaps worth mentioning that although the free variable y₀ appears in both lines (2) and (3) in the
above pseudo-proof, this does not in any way link the two lines, except as a mnemonic for the person reading
or writing the logical argument. Each line is independent. The scope of any free variables in a line is restricted
to that line. One might think of the free variable y₀ in the formula A(x₀, y₀) on line (3) as meaning
variable y0 which was chosen in line (2)", which is what it would be in standard mathematical argumentation
style. In fact, the formula A(x0, y0) on line (3) commences a new argument, whose lines are linked only
by the line numbers in parentheses at the right of each line. The free variables may be arbitrarily renamed
on each line without affecting the validity or outcome of the argument in any way. Thus the meaning of
every line is history-independent. In other words, there is no "entanglement" between the lines of a predicate
calculus argument! (This is hugely beneficial for the robustness and verifiability of proofs.) Hence the error
at line (4) in the pseudo-proof has nothing at all to do with the apparent linkage of the variable y0 (which
is removed by the application of the UI rule) to the apparent fixed choice of this variable in line (2). The
variable isn't fixed, and it isn't linked!
It is perhaps also worth mentioning that the error in the above pseudo-proof has nothing to do with the
invocation of the EE rule on line (6). Nor is the error due to the fact that the UI rule is applied to a line
which is within the apparent scope of the application of the EE rule from line (2) through to line (5). The
fact that the UI rule is applied to a variable y0 which appears in the existential formula ∃x, A(x, y0) in
line (2) is irrelevant. (See E. Mendelson [320], page 74, line (III), for a statement which does suggest this kind
of linkage for a closely related variant of the predicate calculus which is presented here.)
the correctness of mathematical proofs to be fully automated, thereby taking the human out of the loop.
If some metatheorems seem indispensable for efficient logical deduction, one may as well convert those
metatheorems into primary inference rules, which is more or less what a natural inference system does. Since
a billion lines of inference may be checked every second by modern computers, it seems quite unnecessary
to invoke efficiency as a motivation for metatheorems. The deduction metatheorem, in particular, says only
that an equivalent proof exists without the benefit of the metatheorem. If metatheorem methods are justified
by their equivalence to methods which do not use metatheorems, it could be argued that it is somewhat
superfluous to adopt them.
In QC+EQ, the concrete predicate space Q includes at least the binary equality predicate =, which is
axiomatised. Other concrete predicates may also exist, but they do not have specified axioms or rules. In a
predicate calculus with equality, any additional predicates are only axiomatised with respect to substitutivity
of variable names which refer to the same variable.
In a pure predicate calculus QC (without equality), there may be any number of predicates, but there are
no non-logical axioms at all for these. In a FOL, there may be any number of concrete predicates in Q
and concrete object-maps in F, and these are typically all axiomatised. ZF is a particular FOL which
is a predicate calculus with equality with one additional axiomatised concrete predicate and no concrete
object-maps. In other words, Q = {=, ∈} and F = ∅.
As outlined in Remark 5.1.4, a pure predicate calculus has any number of concrete predicates and variables,
which are combined to produce concrete propositions. These concrete propositions are then suitable for
formalisation as a concrete proposition domain P as in Section 3.2, with a proposition name space NP as
in Section 3.3. In a predicate calculus with equality, the equality predicate is axiomatised both in relation
to itself and in relation to all other predicates. This requires some non-logical or extralogical axioms in
addition to the purely logical axioms which are given in Definition 6.3.9 for pure predicate calculus.
The two kinds of predicates are distinguished here as concrete and abstract, but some books refer to them
as "constant" and "variable" predicates. Another way to think of them is as "hard" and "soft" predicates,
analogous to the hardware and software in computers. The "hard" predicates are built-in, constant and
concrete, whereas the "soft" predicates are constructed as required as logical formulas from the basic logical
operators and quantifiers. (See also Remark 6.3.7 for soft predicates.)
6.7.2 Remark: Equality in predicate calculus means that two names are bound to the same object.
As in the cases of pure propositional calculus and pure predicate calculus, so also in the case of predicate
calculus with equality, the formulation in terms of a language must be guided by the underlying semantics.
In Figure 5.1.2 in Remark 5.1.6, concrete object names are shown in a separate layer to the concrete objects
themselves. The name map µV : NV → V for variables may or may not be injective in a particular scope.
There is an underlying assumption in the formalisation of equality that the space of concrete variables V
has some kind of "native" equality relation which is extralogical, meaning that it is an application issue.
(I.e. it's somebody else's problem.) Then two names of variables x and y are said to be equal in the name
space NV whenever µV(x) and µV(y) are equal in the concrete object space V. In other words, x and y
are bound to the same object. (The object binding map µV is scope-dependent.)
6.7.3 Remark: A concrete variable space (almost) always has an equality relation.
If a space of concrete variables V is given for a logical system, one usually does have the ability to determine
if two objects in V are the same or different. This is surely one of the most basic properties of any set or class
or space of objects. In the physical world, two electrons may be indistinguishable. They have no identity.
But abstract objects constructed by humans generally do not have that problem.
An equality relation may be thought of as a map E : V × V → P which has the symmetry and transitivity
properties of an equivalence relation as in Definition 9.8.2. In other words, E(v1, v2) = E(v2, v1) for all
v1, v2 ∈ V, and (E(v1, v2) ∧ E(v2, v3)) ⇒ E(v1, v3) for all v1, v2, v3 ∈ V. However, a difficulty arises when
one attempts to require the reflexivity property. To require E(v, v) to be true for any v ∈ V exposes the
issue of the difference between names and objects. This requirement is equivalent to requiring that E(v1, v2)
be true if v1 = v2, which can only be determined by reference to an even more concrete equality relation. An
equality relation E should satisfy E(v1, v2) if and only if v1 = v2. In other words, if v1 ≠ v2, then E(v1, v2)
must be false. But this means that E must be the same relation as =, which raises the question of where
this pre-defined relation = comes from.
When using a variable name v in some context in some language, it is assumed that it has a fixed binding
within its scope. For example, when one writes x ≤ x, one assumes that the first x refers to the same
object (such as a number or an element of an ordered set) as the second x. This means that the
name-to-object map µV maps the two instances of x to the same object. In other words, µV(x1) = µV(x2),
where x1 and x2 are the first and second occurrences of the symbol x. But this equality of µV(x1) and
µV(x2) is only meaningful if an equality relation is already defined on the object space V. So it is not even
possible to say what it means that the binding of a variable name stays the same within its scope unless an
equality relation is available for the object space. Consequently, if the symbol v is a name in the variable
name space NV, and E is a relation on this name space, then the requirement that E(v, v) must be true for
all v ∈ NV is meaningless, because it is not possible to say that the binding of v has stayed the same within
its scope. To put this another way, the definition of a function requires it to have a unique value for each
element of its domain, but uniqueness cannot be defined without an equality relation on the target space.
So it is impossible to say whether the name map µV : NV → V is well defined.
add some extra condition to distinguish equality from equivalence. Condition (i) in Definition 9.8.2 for an
equivalence relation R on a set X requires x R x to be true for all x in X, which means that x R y must
be true whenever x = y. Since all equivalence relations are implicitly or explicitly defined in terms of an
underlying equality relation, the circularity of definitions is difficult to overcome, as noted in Remark 6.7.3.
There are two main ways to formalise equality within Zermelo-Fraenkel set theory. Approach (1) is applicable
to general predicate calculus with equality. Approach (2) is specific to set theories. (The ZF axiom of
extension has two different forms depending on which approach is adopted. See Remark 7.5.13 for more
detailed discussion of these two approaches to defining equality.)
(1) Require the equality relation to satisfy substitutivity of equality with respect to all other predicates.
This means that any two names which refer to the same object may be used interchangeably in any
other predicate.
(2) Define the equality relation in terms of set membership. The equality x = y may be defined in a set
theory to be an abbreviation for the logical expression ∀z, ((z ∈ x) ⇔ (z ∈ y)), where ∈ is the name
bound to the concrete set membership relation.
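Approach (2) can be illustrated with a small computational sketch, using Python frozensets as hereditarily finite sets. (The quantifier over z is necessarily truncated to a finite universe here, and the universe chosen is illustrative only.)

```python
# Extensional equality: x = y abbreviates "for all z, (z in x) iff (z in y)".
# Here the "all z" is approximated by a finite universe of candidate sets.

def extensionally_equal(x, y, universe):
    return all((z in x) == (z in y) for z in universe)

empty = frozenset()                 # 0 = {}
one = frozenset({empty})            # 1 = {0}
two = frozenset({empty, one})       # 2 = {0, 1}
universe = [empty, one, two]

assert extensionally_equal(one, frozenset({empty}), universe)   # same members
assert not extensionally_equal(one, two, universe)              # 1 differs from 2
```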
The axioms in Definition 6.7.5 (ii) require the equality relation to satisfy the three basic conditions for an
equivalence relation and a substitutivity condition with respect to any concrete predicate whose name A is in
NQn , the set of names of concrete predicates with n parameters, where n is a non-negative integer. (See the
proof of Theorem 7.3.5 (iv) for an application of the substitutivity axiom, Definition 6.7.5 (ii) (Subs). The
substitutivity axiom is usually applied as a self-evident standard procedure without quoting any axioms.)
(i) The predicate name = in NQ is bound to the same binary concrete predicate in all scopes.
(ii) The additional axiom templates are the following logical expression formulas.
(EQ 1) ⊢ x1 = x1 for x1 ∈ NV.
(EQ 2) ⊢ (x1 = x2) ⇒ (x2 = x1) for x1, x2 ∈ NV.
(EQ 3) ⊢ (x1 = x2) ⇒ ((x2 = x3) ⇒ (x1 = x3)) for x1, x2, x3 ∈ NV.
(Subs) ⊢ (xi = yi) ⇒ (A(x1, . . . xi, . . . xn) ⇒ A(x1, . . . yi, . . . xn)) for A ∈ NQn, xi, yi ∈ NV, i = 1, . . . n.
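As a sanity check (a sketch only, not a proof of anything), the axiom templates EQ 1, EQ 2, EQ 3 and Subs can be verified to hold in a finite interpretation in which "=" is read as identity and A is an arbitrarily chosen predicate:

```python
from itertools import product

# Illustrative finite model: "=" is identity on a three-element domain,
# and A is an arbitrary binary predicate chosen for the test.

def eq(a, b):
    return a == b

domain = [0, 1, 2]
def A(a, b):
    return (a + b) % 2 == 0

assert all(eq(x, x) for x in domain)                                   # EQ 1
assert all((not eq(x, y)) or eq(y, x)
           for x, y in product(domain, repeat=2))                      # EQ 2
assert all((not eq(x, y)) or (not eq(y, z)) or eq(x, z)
           for x, y, z in product(domain, repeat=3))                   # EQ 3
assert all((not eq(x, y)) or (not A(x, b)) or A(y, b)
           for x, y, b in product(domain, repeat=3))                   # Subs
```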
6.7.6 Notation: x ≠ y, for variable names x and y in a predicate calculus with equality, means ¬(x = y).
6.7.8 Theorem [qc+eq]: Some propositions for predicate calculus with equality.
The following assertions follow from the predicate calculus with equality in Definition 6.7.5.
(i) ⊢ ∀z, z = z.
(ii) ⊢ ∃z, z = z.
(iii) ⊢ ∃x, ∃y, x = y.
(iv) ⊢ ∀x, ∃y, x = y.
(v) y1 = y2 ⊢ y2 = y1.
(vi) y1 = y2, A(x, y1) ⊢ A(x, y2).
(vii) y1 = y2, A(x, y2) ⊢ A(x, y1).
(viii) y1 = y2 ⊢ ∀x, (A(x, y1) ⇒ A(x, y2)).
(ix) ∃z, (z = y ∧ A(x, z)) ⊢ A(x, y).
(x) A(x, y) ⊢ ∃z, (z = y ∧ A(x, z)).
(xi) ∃z, (z = y ∧ A(x, z)) ⊣⊢ A(x, y).
(xii) ∀z, (z = y ⇒ A(x, z)) ⊢ A(x, y).
(xiii) A(x, y) ⊢ ∀z, (z = y ⇒ A(x, z)).
(xiv) ∀z, (z = y ⇒ A(x, z)) ⊣⊢ A(x, y).
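Before the formal proofs below, parts (ix)-(xiv) can be sanity-checked semantically over a finite domain (a sketch only, with "=" read as identity and A an arbitrary binary predicate): "there exists z with z = y and A(x, z)" and "for all z, z = y implies A(x, z)" both evaluate to exactly A(x, y).

```python
from itertools import product

# Finite-domain check of the equivalences in parts (xi) and (xiv).
domain = [0, 1, 2]
def A(x, z):
    return x <= z          # an arbitrary predicate chosen for the test

for x, y in product(domain, repeat=2):
    direct = A(x, y)
    exists_form = any(z == y and A(x, z) for z in domain)    # part (xi)
    forall_form = all(z != y or A(x, z) for z in domain)     # part (xiv)
    assert exists_form == direct == forall_form
```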
To prove part (iii): ⊢ ∃x, ∃y, x = y
(1) x = x    EQ 1    ⊣
(2) ∃y, x = y    EI (1)    ⊣
(3) ∃x, ∃y, x = y    EI (2)    ⊣
To prove part (iv): ⊢ ∀x, ∃y, x = y
(1) x = x    EQ 1    ⊣
(2) ∃y, x = y    EI (1)    ⊣
(3) ∀x, ∃y, x = y    UI (2)    ⊣
To prove part (v): y1 = y2 ⊢ y2 = y1
(1) y1 = y2    Hyp    ⊣ (1)
(2) y1 = y2 ⇒ y2 = y1    EQ 2    ⊣
(3) y2 = y1    MP (1,2)    ⊣ (1)
To prove part (vi): y1 = y2, A(x, y1) ⊢ A(x, y2)
(1) y1 = y2    Hyp    ⊣ (1)
(2) A(x, y1)    Hyp    ⊣ (2)
(3) (y1 = y2) ⇒ (A(x, y1) ⇒ A(x, y2))    Subs    ⊣
(4) A(x, y1) ⇒ A(x, y2)    MP (1,3)    ⊣ (1)
(5) A(x, y2)    MP (2,4)    ⊣ (1,2)
To prove part (vii): y1 = y2, A(x, y2) ⊢ A(x, y1)
(1) y1 = y2    Hyp    ⊣ (1)
(2) A(x, y2)    Hyp    ⊣ (2)
(3) y2 = y1    part (v) (1)    ⊣ (1)
(4) A(x, y1)    part (vi) (3,2)    ⊣ (1,2)
To prove part (viii): y1 = y2 ⊢ ∀x, (A(x, y1) ⇒ A(x, y2))
(1) y1 = y2    Hyp    ⊣ (1)
(2) A(x, y1)    Hyp    ⊣ (2)
(3) A(x, y2)    part (vi) (1,2)    ⊣ (1,2)
(4) A(x, y1) ⇒ A(x, y2)    CP (2,3)    ⊣ (1)
(5) ∀x, (A(x, y1) ⇒ A(x, y2))    UI (4)    ⊣ (1)
To prove part (ix): ∃z, (z = y ∧ A(x, z)) ⊢ A(x, y)
(1) ∃z, (z = y ∧ A(x, z))    Hyp    ⊣ (1)
(2) z = y ∧ A(x, z)    Hyp    ⊣ (2)
(3) z = y    Theorem 4.7.9 (ix) (2)    ⊣ (2)
(4) A(x, z)    Theorem 4.7.9 (x) (2)    ⊣ (2)
(5) A(x, y)    part (vi) (3,4)    ⊣ (2)
(6) A(x, y)    EE (1,2,5)    ⊣ (1)
To prove part (x): A(x, y) ⊢ ∃z, (z = y ∧ A(x, z))
(1) A(x, y)    Hyp    ⊣ (1)
(2) y = y    EQ 1    ⊣
(3) (y = y) ∧ A(x, y)    Theorem 4.7.9 (i) (2,1)    ⊣ (1)
(4) ∃z, ((z = y) ∧ A(x, z))    EI (3)    ⊣ (1)
Part (xi) follows from parts (ix) and (x).
To prove part (xii): ∀z, (z = y ⇒ A(x, z)) ⊢ A(x, y)
In other words, if a logical calculus is unable to demonstrate some true propositions, that is a much less serious
problem than the ability to prove even a single false proposition.
6.7.10 Remark: The history of the equality symbol.
The equals sign "=" was first used by Robert Recorde, Whetstone of witte, 1557. (See Cajori [222],
Volume I, pages 165, 297–309.) His symbol ===== ("a paire of parelleles") was somewhat longer than the
modern version. Recorde wrote that he chose the symbol "bicause noe .2. thynges, can be moare equalle".
6.8.7 Remark: Using the cardinality operator notation to denote uniqueness in a set theory.
In terms of the cardinality notation in set theory, one might write ∃′ x, P(x) informally as #{x; P(x)} = 1,
meaning that there is precisely one thing x such that P(x) is true. (Existence means that #{x; P(x)} ≥ 1,
whereas uniqueness means that #{x; P(x)} ≤ 1.) But the notation ∃′ is well defined in a general predicate
calculus with equality, which is much more general than set theory.
6.8.8 Remark: Generalised multiplicity quantifiers.
There are some (rare) occasions where it would be convenient to have notations for concepts such as "there
are at least 2 things x such that P(x) is true", notated ∃₂ x, P(x). (Margaris [319], page 174, gives the
notation ∃!n for essentially the same concept as the quantifier ∃′ₙ presented here.) More generally, "there
are between m and n things such that P(x) is true" for cardinal numbers m ≤ n, notated ∃ₘⁿ x, P(x).
Then ∃′ₙ would be a sensible shorthand for ∃ₙⁿ. Hence the quantifiers ∃′, ∃′₁ and ∃₁¹ are equivalent.
The notation ∃ⁿ x, P(x) should mean: "There are at most n things such that P(x) is true." So ∃ⁿ should
mean the same as ∃₀ⁿ. (See Remarks 6.8.9, 6.8.10, 6.8.11, 6.8.12, 6.8.13, 6.8.14, 6.8.15, 6.8.16 regarding
expressions for ∃ₘⁿ.)
Such generalised uniqueness and existence quantifiers could be referred to as "multiplicity quantifiers".
Perhaps generalised existential quantifier notations are best written out in longhand. The simple case
∃₂ x, P(x) may be defined as ∃x, ∃y, (P(x) ∧ P(y) ∧ (x ≠ y)), but the more general cases become rapidly
more complex. The slightly more complex case ∃′₂ x, P(x) (that is, ∃₂² x, P(x)) would mean
∃x, ∃y, (P(x) ∧ P(y) ∧ (x ≠ y)) ∧ ∀x, ∀y, ∀z, ((P(x) ∧ P(y) ∧ P(z)) ⇒ (x = y ∨ y = z ∨ x = z)).
The statement ∃ⁿ x, P(x) is equivalent to ¬(∃ₙ₊₁ x, P(x)) for non-negative integers n. So ∃ₘⁿ x, P(x)
is equivalent to (∃ₘ x, P(x)) ∧ ¬(∃ₙ₊₁ x, P(x)).
The statement ∃ₙ x, P(x) is equivalent to ¬(∃ⁿ⁻¹ x, P(x)) for positive integers n.
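The intended counting semantics of these multiplicity quantifiers, together with the negation identities just stated, can be sketched over a finite domain (the function names here are illustrative, not the book's notation):

```python
# "At least n", "at most n" and "between m and n" read off from a count.

def count(P, domain):
    return sum(1 for x in domain if P(x))

def at_least(n, P, domain):
    return count(P, domain) >= n

def at_most(n, P, domain):
    return count(P, domain) <= n

def between(m, n, P, domain):
    return m <= count(P, domain) <= n

domain = range(10)
def P(x):
    return x % 3 == 0                  # satisfied by 0, 3, 6, 9

assert at_least(4, P, domain) and not at_least(5, P, domain)
# "at most n" is the negation of "at least n + 1":
assert at_most(4, P, domain) == (not at_least(5, P, domain))
# "between m and n" conjoins "at least m" with "not at least n + 1":
assert between(2, 4, P, domain) == (at_least(2, P, domain)
                                    and not at_least(5, P, domain))
```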
6.8.9 Remark: The expression ∃₃ x, P(x), which means "there exist at least 3 things with property P",
may be written as:
∃x, ∃y, ∃z, (P(x) ∧ P(y) ∧ P(z) ∧ (x ≠ y) ∧ (y ≠ z) ∧ (x ≠ z)).
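This formula can be compared against direct counting on small finite domains (a brute-force sketch):

```python
from itertools import product

# "At least 3" written with three existential quantifiers and distinctness.
def at_least_3(P, domain):
    return any(P(x) and P(y) and P(z) and x != y and y != z and x != z
               for x, y, z in product(domain, repeat=3))

# Compare with counting, for predicates having exactly k satisfiers.
for k in range(6):
    P = lambda v, k=k: v < k
    assert at_least_3(P, range(5)) == (k >= 3)
```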
6.8.10 Remark: The expression ∃³ x, P(x), which means "there exist at most 3 things with property P",
may be written as:
∀w, ∀x, ∀y, ∀z,
((P(w) ∧ P(x) ∧ P(y) ∧ P(z)) ⇒ (w = x ∨ w = y ∨ w = z ∨ x = y ∨ x = z ∨ y = z)).
Alternatively,
∀w, ∀x, ∀y, ∀z,
(((w ≠ x) ∧ (w ≠ y) ∧ (w ≠ z) ∧ (x ≠ y) ∧ (x ≠ z) ∧ (y ≠ z)) ⇒ ¬(P(w) ∧ P(x) ∧ P(y) ∧ P(z))).
Saying that there are at most 3 things x such that P(x) is true is not really an existential statement. All
uniqueness statements (in this case a "tripliqueness" statement) are negative existence statements. Therefore
universal quantifiers are used rather than existential quantifiers.
The similarity of form to Remark 6.8.9 is not surprising because ∃³ x, P(x) means the same thing as
¬(∃₄ x, P(x)). There exist at most three x if and only if there do not exist four x. It follows by simple
negation of ∃₄ x, P(x) that there is a third equivalent form for ∃³ x, P(x), namely
∀w, ∀x, ∀y, ∀z,
(¬P(w) ∨ ¬P(x) ∨ ¬P(y) ∨ ¬P(z) ∨ w = x ∨ w = y ∨ w = z ∨ x = y ∨ x = z ∨ y = z).
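The agreement of these universal forms with the counting reading of "at most 3" can be checked by brute force on small domains (a sketch):

```python
from itertools import product

# Implication form of "at most 3 things satisfy P".
def at_most_3_impl(P, d):
    return all((not (P(w) and P(x) and P(y) and P(z)))
               or w == x or w == y or w == z or x == y or x == z or y == z
               for w, x, y, z in product(d, repeat=4))

# Disjunctive form obtained by negating "at least 4".
def at_most_3_disj(P, d):
    return all((not P(w)) or (not P(x)) or (not P(y)) or (not P(z))
               or w == x or w == y or w == z or x == y or x == z or y == z
               for w, x, y, z in product(d, repeat=4))

for k in range(6):
    P = lambda v, k=k: v < k           # exactly min(k, 5) satisfiers in range(5)
    expected = (k <= 3)
    assert at_most_3_impl(P, range(5)) == expected == at_most_3_disj(P, range(5))
```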
6.8.11 Remark: The expression ∃₃³ x, P(x), which means "there exist exactly 3 things with property P",
may be written as:
(∃x, ∃y, ∃z, (P(x) ∧ P(y) ∧ P(z) ∧ (x ≠ y) ∧ (y ≠ z) ∧ (x ≠ z)))
∧ (∀w, ∀x, ∀y, ∀z, ((P(w) ∧ P(x) ∧ P(y) ∧ P(z)) ⇒ (w = x ∨ w = y ∨ w = z ∨ x = y ∨ x = z ∨ y = z))).
This is the same as the conjunction of ∃₃ x, P(x) and ∃³ x, P(x) in Remarks 6.8.9 and 6.8.10. It is probably not
possible to reduce the complexity of the statement by exploiting some sort of redundancy in the combination
of statements.
6.8.12 Remark: The expression ∃₂³ x, P(x), which means "there exist at least 2 and at most 3 things with
property P", may be written as:
(∃x, ∃y, (P(x) ∧ P(y) ∧ (x ≠ y)))
∧ (∀w, ∀x, ∀y, ∀z, ((P(w) ∧ P(x) ∧ P(y) ∧ P(z)) ⇒ (w = x ∨ w = y ∨ w = z ∨ x = y ∨ x = z ∨ y = z))).
This may be interpreted as: "There are at least 2, but fewer than 4, x such that P(x) is true." More informally,
one could write 2 ≤ #{x; P(x)} < 4.
6.8.13 Remark: To write a logical formula for ∃ₙ x, P(x) for a general non-negative integer n, which
means "there exist at least n things with property P", is more difficult. It is tempting to approach this task
using induction. However, that would not yield a closed formula for the desired statement ∃ₙ x, P(x). It is
easier to think of the ordinal number n as a general set. We want to require the existence of a distinct xi
such that P(xi) is true for each i ∈ n. The logical expression (6.8.1) is one way of writing ∃ₙ x, P(x).
∃f, ((∀i ∈ n, ∃x, ((i, x) ∈ f ∧ P(x))) ∧ (∀i ∈ n, ∀j ∈ n, ∀x, (((i, x) ∈ f ∧ (j, x) ∈ f) ⇒ i = j))).   (6.8.1)
In other words, there exists a set f which includes an injective relation on n, such that P(f(i)) is true
for all i ∈ n. (It is only required that f restricted to n be an injective relation on n. The purpose of this
generality is to simplify the logic.) Thus statement (6.8.1) means that there exists a distinct x which satisfies
P for each i ∈ n. As a bonus, the logical expression (6.8.1) is valid for any set n, even if n is countably or
uncountably infinite.
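Expression (6.8.1) can be exercised computationally (a sketch: rather than quantifying over all sets f, which is infeasible, one candidate relation f is built from the satisfiers and the two clauses are then verified for it):

```python
from itertools import product

def at_least_n_via_relation(n, P, domain):
    """Build a relation f pairing each i in n with a distinct P-witness,
    in the manner of (6.8.1), and verify its two clauses."""
    witnesses = [x for x in domain if P(x)]
    if len(witnesses) < n:
        return False                     # no such relation can exist
    f = {(i, witnesses[i]) for i in range(n)}
    clause1 = all(any((i, x) in f and P(x) for x in domain)
                  for i in range(n))     # each index has a witness
    clause2 = all(i == j for (i, x), (j, y) in product(f, repeat=2)
                  if x == y)             # no witness serves two indices
    return clause1 and clause2

domain = range(10)
def P(x):
    return x % 2 == 0                    # five satisfiers: 0, 2, 4, 6, 8

assert at_least_n_via_relation(5, P, domain)
assert not at_least_n_via_relation(6, P, domain)
```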
6.8.14 Remark: Consider the task of writing a logical formula for ∃ⁿ x, P(x) for general non-negative
integer n, which means "there exist at most n things with property P". Note that ∃ⁿ x, P(x) is equivalent to
¬(∃ₙ₊₁ x, P(x)), which can be obtained from Remark 6.8.13 by simple negation as in the following expression,
where n⁺ = n + 1.
∀f, ((∀i ∈ n⁺, ∀j ∈ n⁺, ∀x, (((i, x) ∈ f ∧ (j, x) ∈ f) ⇒ i = j)) ⇒ (∃i ∈ n⁺, ∀x, ((i, x) ∈ f ⇒ ¬P(x)))).
This may be interpreted as: "For any injective function f on n⁺, for some i in n⁺, P(f(i)) is false." The
generalisation of this expression to infinite n does not seem to be as straightforward as in Remark 6.8.13.
6.8.15 Remark: Consider the task of writing a logical formula for ∃ₙⁿ x, P(x) for general non-negative
integer n, which means "there exist exactly n things with property P". This task can be tackled by combining
Remarks 6.8.13 and 6.8.14.
(∃f, ((∀i ∈ n, ∃x, ((i, x) ∈ f ∧ P(x))) ∧ (∀i ∈ n, ∀j ∈ n, ∀x, (((i, x) ∈ f ∧ (j, x) ∈ f) ⇒ i = j))))
∧
(∀g, ((∀i ∈ n⁺, ∀j ∈ n⁺, ∀y, (((i, y) ∈ g ∧ (j, y) ∈ g) ⇒ i = j)) ⇒ (∃i ∈ n⁺, ∀y, ((i, y) ∈ g ⇒ ¬P(y))))).
6.8.16 Remark: Consider the task of writing a logical formula for ∃ₘⁿ x, P(x), for general non-negative
integers m and n with m < n, which means "there exist at least m and at most n things with property P".
This logical expression is a simple variation of Remark 6.8.15.
(∃f, ((∀i ∈ m, ∃x, ((i, x) ∈ f ∧ P(x))) ∧ (∀i ∈ m, ∀j ∈ m, ∀x, (((i, x) ∈ f ∧ (j, x) ∈ f) ⇒ i = j))))
∧
(∀g, ((∀i ∈ n⁺, ∀j ∈ n⁺, ∀y, (((i, y) ∈ g ∧ (j, y) ∈ g) ⇒ i = j)) ⇒ (∃i ∈ n⁺, ∀y, ((i, y) ∈ g ⇒ ¬P(y))))).
that two things exist. (The corresponding adjective might be "duplicate" or "duplex".) The nouns for 3 and
4 might be "triplicity" and "quadruplicity" respectively.
6.8.18 Remark: An equality relation is required for both upper and lower bounds on numbers of objects.
In Remark 6.8.8, both upper and lower limits on the cardinality of objects satisfying a given predicate require
an equality relation on the space of objects. (Here "cardinality" is meant in the logical sense for non-negative
integers only, not the set-theoretic sense.) Whereas uniqueness places an upper bound of 1 on the number of
objects satisfying a predicate by requiring all such objects to be equal, the notion of "duplicity" puts a lower
bound on cardinality by requiring the existence of at least two unequal objects. In each case, the equality
relation is required for counting objects.
in the sequent. (It means that if "the truth" lies in all of the knowledge sets Kφi, then "the truth" must also
lie in the knowledge set Kψ.)
In a first-order language, the non-logical axioms constrain the knowledge space 2P a priori to some subset
KZ of 2P, where Z is the logical conjunction of all of the axioms of the language. This is like preceding
all sequents by the proposition Z. Thus an assertion ⊢ ψ has the meaning KZ ⊆ Kψ, because ⊢ ψ
is tacitly converted to the sequent Z ⊢ ψ. Similarly, a general sequent T of the form φ1, . . . φn ⊢ ψ is
tacitly converted to Z, φ1, . . . φn ⊢ ψ, which has the meaning KZ ∩ Kφ1 ∩ · · · ∩ Kφn ⊆ Kψ. Consequently the
theorems of a first-order logic with non-logical axioms are not necessarily tautologies. Such theorems are
all consequences of the non-logical axioms. However, all of the theorems from the pure predicate calculus
continue to be tautologies. So pure predicate calculus theorems are valid for any first-order logic.
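The set-inclusion reading of sequents described above can be made concrete with a small sketch (the world space and knowledge sets here are arbitrary illustrative data, not derived from any actual axioms):

```python
# Each proposition carries a knowledge set of "possible worlds" where it
# holds; a sequent holds iff the intersection of the antecedents' knowledge
# sets is contained in the consequent's knowledge set.

worlds = set(range(8))
K = {
    'Z': {0, 1, 2, 3, 4},    # conjunction of the non-logical axioms
    'p': {0, 1, 2, 5},
    'q': {0, 1, 2, 3, 6},
}

def sequent_holds(antecedents, consequent):
    meet = set(worlds)
    for a in antecedents:
        meet &= K[a]
    return meet <= K[consequent]

# In a first-order language, "p entails q" is tacitly read with Z prepended:
assert sequent_holds(['Z', 'p'], 'q')    # {0, 1, 2} lies inside K['q']
assert not sequent_holds(['p'], 'q')     # world 5 satisfies p but not q
```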
In principle, it is possible to determine a complete universe of objects for such a system, and to then
determine all of the true/false entries in the tables. This does not make set theory simple for several reasons.
Determining even one complete universe for ZF set theory, for example, is not entirely trivial. (The study
of such universes leads to model theory.) More importantly, the propositions of interest are not these simple
concrete propositions, but rather the soft predicates which can be built from them. For example, the
proposition ∀x, x ∉ ∅ means that the entire first column of the table contains X. This very simple
"soft" proposition is the conjunction of the falsity of the infinitely many concrete propositions in the first
column. The proposition ∀x, x ∉ x is the conjunction of the falsity of all diagonal propositions in the
table. Most of the interesting theorems and conjectures in mathematics are vastly more complicated
logical combinations of concrete propositions than that. As in the game of chess, the basic rules seem simple,
but playing the game is complex.
Chapter 7
Set axioms
[. . . ] general set theory is pretty trivial stuff really, but, if you want to be a mathematician, you
need some, and here it is; read it, absorb it, and forget it.
It is true that mathematics can be done successfully without studying set theory or mathematical logic, but
some people have a desire to understand what everything means, not just how to do it. Since this book is
strongly oriented towards meaning and understanding, as opposed to just proving theorems, the presentation
here commences with the foundations. Set theory is explained in terms of an even lower foundation layer,
mathematical logic, which is presented in Chapters 3, 4, 5 and 6. Although this is all "pretty trivial stuff",
it can help to give meaning to an otherwise seemingly endless treadmill of abstract theorem-proving.
7.1.2 Remark: The importance of not examining the foundations of mathematics too closely.
Mathematics is full of perplexing and incomprehensible ambiguities. So it is always useful to be able to
claim that everything can be made meaningful using rigorous set theory. This is like the ancient Greeks
explaining events (e.g. in the Iliad) in terms of the whims of deities on Mount Olympus. As long as no
one went there, no one could prove that the whims of deities on Mount Olympus didn't explain events.
Similarly, one should not examine set theory too closely, because the myth of a solid basis in rigorous set
theory is useful to maintain. Even in times when most logicians thought that set theory was irreparably
self-inconsistent, mathematics progressed regardless. Mathematics did not stop for a decade while waiting
for paradoxes in set theory to be resolved.
7.1.3 Remark: Arguments for and against defining mathematics in terms of sets.
Constructing all mathematical objects from ZF sets is like building all houses, cars and computers out of
empty boxes. Although it is possible to build mathematics from highly convoluted constructions which are
defined entirely in terms of the empty set, this is not at all natural.
Alan U. Kennington, Differential geometry reconstructed: A unified systematic framework. www.geometry.org/dg.html
Copyright 2017, Alan U. Kennington. All rights reserved. You may print this book draft for personal use.
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.
An alternative to the sets-only approach for the definition of mathematical objects is the axiomatic approach,
where objects such as integers, real numbers and other low-level structures are defined by axioms. Then
the particular set-based constructions of these structures are merely representations for the corresponding
axiomatic systems.
By allowing structures other than sets to be defined axiomatically, mathematical constructions will not then
all be derivable from empty sets as their core underlying object. Some minimalism is thereby sacrificed for
the sake of naturalism. However, minimalism in the foundations implies a vast amount of pointless work
in the construction phase, constructing all mathematical objects out of the empty set. The fact that it
can be done does not imply that it should be done. Suitable candidates for axiomatisation (i.e. definition
independent of pure set theory) include numbers, groups, fields, linear spaces, tensors and tangent vectors.
Allowing some classes of mathematical objects to be defined independently of set theory could be regarded
as a kind of diversification strategy or insurance policy. If it turns out later that there is some fundamental
flaw in pure set theory which cannot be fixed, it would be useful to have an independent basis for at least
some of the structures which are required for mathematics.
7.1.4 Remark: ZF is a model validation framework for theories. Mathematics does not reside inside ZF.
It is not true to say that all mathematics is about ZF set theory. Nor is all mathematics expressed within the
framework of ZF set theory. Mathematics thrived for thousands of years before set theory was invented, but
towards the end of the 19th century and the beginning of the 20th century, the accumulation of paradoxes in
mathematics motivated attempts to axiomatise all of mathematics as Euclid had done for plane geometry.
Peano, Frege, Whitehead and Russell, Hilbert, and others attempted to bring mathematics within a logical
system which could be shown to be at least consistent, and preferably complete, so that mathematics in
general could be validated, and future discoveries of paradoxes could be absolutely excluded.
Although the logicist and formalist research programmes were ultimately considered to have failed to deliver
absolute certainty to mathematics, the Zermelo-Fraenkel set theory framework remains as a comprehensive
framework within which almost all mathematics can be represented. When most of the key questions of
independence of axioms (such as the various axioms of choice and continuum hypotheses) had been reasonably
well resolved in the 1960s by Paul Cohen and others, mathematicians largely lost interest in research into
the validity of the foundations of mathematics.
Now in the post-1960s mathematical world, ZF set theory has a much more limited role. The original
motivation for trying to fit all of mathematics into ZF set theory was to define models whose validity could
be established within a strict axiomatic framework. One can say now that if a mathematical theory can be
modelled within ZF set theory, then probably there are no serious fundamental problems which can endanger
the theory. For example, Russells paradox is generally believed to have been banished from ZF set theory.
If a contradiction does in fact arise in ZF set theory, it will have major effects on all of mathematics, not
merely on a single isolated area of mathematics. Such a discovery would probably bring some fame to the
discoverer, which would more than compensate for the loss of validity of some small area of mathematics. So
ZF set theory is a fairly safe framework within which to build mathematical models. There is a kind of a
built-in insurance scheme which will pay out more than adequate compensation for any damage.
At the beginning of the 21st century, one may regard (almost) all mathematics as being formulated within
the framework of a first-order language (FOL) or in some related kind of logical language. (See Section 6.9 for
first-order language.) Any valid FOL for a mathematical theory typically has infinitely many interpretations
by models within ZF set theory if it has any at all. But the ability to find a model for a theory within the
ZF framework is not the sole criterion for validity. Other kinds of models can be defined. The principal
advantage of finding a model for a theory within ZF is the confidence that one obtains from this. For
example, the real numbers may be implemented as Cauchy sequences or as Dedekind cuts, both within ZF
set theory. Likewise, the integers, rational numbers, complex numbers, group theory, topological spaces,
linear spaces, Schwartz distributions, Caratheodory measure theory and partial differential equations can all
be modelled very straightforwardly within the ZF set theory framework.
Mathematical theories and structures are not all modelled within ZF set theory. The important point is
that almost all mathematical theories and structures can be modelled within ZF set theory. As soon as this
feasibility has been demonstrated, the immersion of a theory within ZF set theory may be safely discarded.
Thus for example the non-negative integers may be modelled as sets 0 = ∅, 1 = {∅}, and so forth. But this
is not really the definition of the non-negative integers. It is merely a feasibility demonstration. Similarly,
representations of real numbers as Cauchy sequences or Dedekind cuts may be safely discarded as soon as the
theory of real numbers has been validated for the particular model which is chosen. The real real numbers
are extra-mathematical entities, not elements of the von Neumann universe or any other set theory universe.
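The feasibility demonstration mentioned above is easy to mimic with nested frozensets in Python, with 0 = ∅ and each successor defined as n ∪ {n}. This is a sketch only; the function name succ and the variable names are illustrative, not from the book.

```python
# Von Neumann encoding of the non-negative integers as hereditarily
# finite sets: 0 = {} and succ(n) = n | {n}.  A feasibility sketch,
# not a definition of the integers.

def succ(n: frozenset) -> frozenset:
    """Return n | {n}, the von Neumann successor of n."""
    return n | frozenset([n])

ZERO = frozenset()      # 0 = {}
ONE = succ(ZERO)        # 1 = {{}}
TWO = succ(ONE)         # 2 = {{}, {{}}}
THREE = succ(TWO)       # 3 = {{}, {{}}, {{}, {{}}}}

# Each encoded integer n has exactly n elements, namely 0, 1, ..., n-1.
assert len(THREE) == 3
assert ZERO in THREE and ONE in THREE and TWO in THREE
```

As the remark says, once this construction is seen to work, the encoding may be safely discarded; only the arithmetic behaviour matters.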
7.1.5 Remark: ZF set theory does not exist. It's only a theory. Or a model . . .
Whereas Remark 7.1.4 asserts that ZF set theory is a testing and validation framework which is a kind of
factory that provides models for all of mathematics, ZF set theory itself is a theory which requires models. In
practice, the models for ZF set theory are provided by ZF set theory! (See for example Chang/Keisler [298],
page 560, for some issues in the choice of an underlying set theory for model theory.) In other words, it
is a self-modelling theory. This should, perhaps, be alarming because almost all of mathematics depends on
the consistency of ZF set theory.
The 20th century literature on set theory shows broad disagreement regarding which axioms should be part
of axiomatic set theory. Several axioms have oscillated between broad acceptance and broad scepticism. (Some
of the more controversial axioms are the choice, continuum hypothesis, infinity, replacement and regularity
axioms.) The situation in mathematical logic is not much different. As a result, there are now dozens
of variants of set theory and hundreds of variants of mathematical logic, each with their own cohort of
believers who defend their own systems and denigrate other systems. The idea that mathematics is the
most logical, water-tight and bullet-proof of all intellectual disciplines is inconsistent with the literature.
It is probably fairest to say that ZF set theory provides a well-studied core set of axioms which are accepted
by enough mathematicians to make it a standard point of reference or trig point for most of mathematics.
In this sense, one may regard ZF set theory as a unifying basic credo for mathematics. Whether one
actually believes in it or not, it does provide a common language to communicate mathematical ideas.
At the opposite extreme, one may think of mathematics as a completely atomised subject where every
definition is a mini-theory which can have its own mini-models. In other words, mathematics is a network
of definitions, and ZF set theory is only one theory amongst millions. Some other fundamental theories are
the integers and the real numbers, which are also used as models for the interpretation of many theories.
Since both the integers and real numbers can themselves be modelled within ZF set theory, this may give the
impression that the definitions of mathematics may be organised as a directed acyclic graph of definitions
which all ultimately derive their meaning from ZF set theory. Such an outcome would have pleased Peano,
Frege, Russell and Hilbert. However, the validity of mathematics does not nowadays reside in a single bucket
into which all other definitions ultimately drain.
Axiomatic systems such as first-order logics provide a very general intellectual framework for constructing
mathematical theories and concepts, clearly much more general than ZF set theory. However, axiomatic
systems are usually modelled in terms of some kind of underlying set theory, which is typically chosen as
some variant of ZF set theory.
7.1.8 Remark: The roles of set theory in mathematics language and semantics.
It is strongly emphasised in Chapters 4 and 6 (and elsewhere) that mathematical logic language can be based
upon semantics which is expressed in terms of elementary naive set theory and logic. The very broad picture
looks like Figure 2.1.1, which shows rigorous mathematics based upon a bedrock layer of mathematical
logic, which in turn is supported by naive logic and set theory. Figure 7.1.1 shows a slightly more concrete
version of this picture.
Figure 7.1.1 Mathematics language and semantics, supported by set theory language and semantics, in turn supported by logic language and semantics, all resting on naive set theory and naive logic
The naive set theory here is nothing like Zermelo-Fraenkel axiomatic set theory. The naive logic is intuitively
clear, trivial logic. But the important assertion here is that mathematics language can be backed up by
semantics to which the language points. In principle, it should be possible to follow an arrow from any
mathematical language construct to the supporting semantics to discover the meaning of the construct.
Very roughly speaking, naive logic is described in Sections 3.1, 3.2 and 3.3, the relevant naive set theory is
described in Section 3.4, logic semantics is described in Sections 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13
and 3.14, and in Chapter 5, and logic language is described in Chapters 4 and 6.
mathematics literature which gives a kind of life to abstract set theory. Therefore set theory is more than
just a first-order language. The fact that a grammar and specification can be written for the language of
mathematics does not mean that mathematics is just a language. It is the ideas which are communicated in
the language which are the real mathematics, in the same way that the ideas communicated in English and
on the Internet are the real life which underlies the grammar books and the technical specifications.
closely related systems are mentioned. The ZF system is adequate for most of differential geometry.
The rapid development of axiomatic set theory and mathematical logic around the beginning of the 20th
century followed some occasionally vicious warfare between mathematicians in the 19th century. Axiomatic
set theory proscribes some behaviours and gives licence to others. This does not mean that the axioms are
universally true. The axioms help to keep the peace. ZF set theory is a kind of mainstream law-book which
may be amended as required for special applications.
7.2.2 Definition:
A set-theoretic formula is a soft predicate in a predicate calculus with predicate name space {=, ∈}.
The ZF set theory axioms in Definition 7.2.4 are proposition templates. Concrete individuals of the indicated
type may be substituted for any of the free variables. All of the quantifiers refer to the set of individuals,
not predicates or functions. All individuals are sets. So all quantifiers refer to sets.
7.2.4 Definition [mm]: Zermelo-Fraenkel set theory is a predicate calculus with equality, with a binary
relation ∈ which satisfies the following axioms.
(1) The extension axiom: For any sets A and B,
(∀x, (x ∈ A ⇔ x ∈ B)) ⇒ (A = B).
(2) The empty set axiom:
∃A, ∀x, ¬(x ∈ A).
(3) The unordered pair axiom: For any sets A and B,
∃C, ∀z, (z ∈ C ⇔ (z = A ∨ z = B)).
(4) The union axiom: For any set K,
∃X, ∀z, (z ∈ X ⇔ ∃S, (z ∈ S ∧ S ∈ K)).
(5) The power set axiom: For any set X:
∃P, ∀A, (A ∈ P ⇔ ∀x, (x ∈ A ⇒ x ∈ X)).
(6) The replacement axiom: For any two-parameter set-theoretic formula f and set A,
(∀x, ∀y, ∀z, ((f(x, y) ∧ f(x, z)) ⇒ y = z)) ⇒ ∃B, ∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ f(x, y))).
(7) The regularity axiom: For any set X,
(∃y, y ∈ X) ⇒ ∃y, (y ∈ X ∧ ∀z, (z ∈ X ⇒ ¬(z ∈ y))).
(8) The infinity axiom:
∃X, ∀z, (z ∈ X ⇔ ((∀u, ¬(u ∈ z)) ∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))).
7.2.5 Notation: x ∉ y means ¬(x ∈ y).
7.2.6 Remark: Domain-restricted universal and existential quantifiers.
The meanings of the logical predicate expressions ∀x ∈ S, P(x) and ∃x ∈ S, P(x) in Notation 7.2.7 are of
a different kind. The former uses ⇒ while the latter uses ∧. This difference ensures that ∃x ∈ S, P(x)
is equivalent to ¬(∀x ∈ S, ¬P(x)). This matches the expected form of an existential proposition. (See
Remark 5.2.1 and Theorems 6.6.7 (ii) and 6.6.10 (vi).)
7.2.7 Notation:
A proposition of the form ∀x ∈ S, P(x) for a set S and set-theoretic formula P means ∀x, (x ∈ S ⇒ P(x)).
A proposition of the form ∃x ∈ S, P(x) for a set S and set-theoretic formula P means ∃x, (x ∈ S ∧ P(x)).
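On finite sets, the two expansions in Notation 7.2.7 can be mimicked with Python's all and any, which makes the ⇒/∧ asymmetry and the negation equivalence easy to check mechanically. A sketch only; the helper names forall and exists and the sample set and predicate are arbitrary choices, not from the book.

```python
# Domain-restricted quantifiers over a finite set S:
#   forall(S, P) mimics  ∀x ∈ S, P(x),  i.e.  ∀x, (x ∈ S ⇒ P(x))
#   exists(S, P) mimics  ∃x ∈ S, P(x),  i.e.  ∃x, (x ∈ S ∧ P(x))

def forall(S, P):
    return all(P(x) for x in S)   # the implication collapses to a filter over S

def exists(S, P):
    return any(P(x) for x in S)   # the conjunction collapses to a filter over S

S = {1, 2, 3, 4}
P = lambda x: x % 2 == 0

# ∃x ∈ S, P(x) is equivalent to ¬(∀x ∈ S, ¬P(x)).
assert exists(S, P) == (not forall(S, lambda x: not P(x)))

# On the empty domain, ∀ is vacuously true while ∃ is false.
assert forall(set(), P) and not exists(set(), P)
```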
7.2.8 Remark: Compact summary of ZF axioms.
The following is a compact summary of the ZF set theory axioms in Definition 7.2.4. The word formula
on the left means two-parameter set-theoretic formula.
(1) ∀ sets A, B: (∀x, (x ∈ A ⇔ x ∈ B)) ⇒ (A = B)
(2) ∃A, ∀x, x ∉ A
(3) ∀ sets A, B: ∃C, ∀z, (z ∈ C ⇔ (z = A ∨ z = B))
(4) ∀ set K: ∃X, ∀z, (z ∈ X ⇔ ∃S ∈ K, z ∈ S)
(5) ∀ set X: ∃P, ∀A, (A ∈ P ⇔ ∀x, (x ∈ A ⇒ x ∈ X))
(6) ∀ formula f, ∀ set A: (∀x, ∀y, ∀z, ((f(x, y) ∧ f(x, z)) ⇒ y = z)) ⇒ ∃B, ∀y, (y ∈ B ⇔ ∃x ∈ A, f(x, y))
(7) ∀ set X: (∃y, y ∈ X) ⇒ ∃y ∈ X, ∀z ∈ X, z ∉ y
(8) ∃X, ∀z, (z ∈ X ⇔ ((∀u, u ∉ z) ∨ ∃y ∈ X, ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y)))).
It is quite impressive that the entire basis of set theory can be written in 8 lines. The rest of mathematics
is just definitions, notations, theorems and remarks.
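As a toy illustration (not a consistency proof), several of the set existence axioms can be checked mechanically in the model of hereditarily finite sets, with frozensets standing in for sets. The sample sets A, B, C, K, X, P below are illustrative choices, not part of the theory.

```python
from itertools import chain, combinations

# Hereditarily finite sets modelled as (nested) frozensets.
A = frozenset([frozenset()])        # {emptyset}
B = frozenset([frozenset(), A])     # {emptyset, {emptyset}}

# Axiom (3), unordered pair: C = {A, B} exists.
C = frozenset([A, B])
assert A in C and B in C and len(C) == 2

# Axiom (4), union: X contains exactly the members of members of K.
K = frozenset([A, B])
X = frozenset(chain.from_iterable(K))
assert all(any(z in S for S in K) for z in X)

# Axiom (5), power set: P contains exactly the subsets of B.
P = frozenset(frozenset(c) for r in range(len(B) + 1)
              for c in combinations(B, r))
assert all(sub <= B for sub in P) and len(P) == 2 ** len(B)
```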
the notation
x ε a.
Here ε is chosen as the initial of the word ἐστί. x ε a may be read x is an a. Thus x ε man
will mean x is a man, and so on.
And the rest, as they say, is history.
7.2.10 Remark: Set theory is really membership theory. Membership is the only attribute of a set.
Set theory provides axioms for the membership relation rather than for the sets themselves. So it would
perhaps be more accurate to call set theory membership theory or containment theory.
In set theory, it is considered that the only property of a set is its membership relations on the left. In
other words, two sets with names A and B must be the same set if they satisfy ∀x, (x ∈ A ⇔ x ∈ B). The
assertion that sets have no other properties than their on-the-left membership relations is enforced by the
ZF extension axiom, Definition 7.2.4 (1), which states that ∀x, (x ∈ A ⇔ x ∈ B) implies A = B. (This issue
is also discussed in Remark 6.7.4 and Section 7.5.)
In practice, mathematicians typically think of sets as having more properties than are indicated by their
membership. The cause of this is that set theory is used as a model-construction factory for most of the
systems in mathematics, but the sets in the models of different systems often overlap, so that the same set has
different meanings in different systems. However, this is an application issue. Sets have many applications,
and their meanings are different in each application. This is inevitable. In pure set theory, extraneous
attributes of sets are totally ignored. (This issue is also mentioned in Section 8.7.)
7.3.2 Definition:
A subset of a set B is a set A such that ∀x, (x ∈ A ⇒ x ∈ B).
A superset of a set A is a set B such that ∀x, (x ∈ A ⇒ x ∈ B).
A set A is said to be included in a set B when A is a subset of B.
A set A is said to include a set B when A is a superset of B.
7.3.3 Notation:
A ⊆ B, for sets A and B, means that A is a subset of B.
In other words, A ⊆ B means ∀x, (x ∈ A ⇒ x ∈ B).
A ⊇ B, for sets A and B, means that A is a superset of B.
In other words, A ⊇ B means ∀x, (x ∈ B ⇒ x ∈ A).
A ⊈ B, for sets A and B, means that A is not a subset of B.
In other words, A ⊈ B means ¬(∀x, (x ∈ A ⇒ x ∈ B)).
A ⊉ B, for sets A and B, means that A is not a superset of B.
In other words, A ⊉ B means ¬(∀x, (x ∈ B ⇒ x ∈ A)).
(i) A ⊆ A.
(ii) A ⊇ A.
(iii) (A ⊆ B ∧ B ⊆ A) ⇒ A = B.
(iv) A = B ⇒ A ⊆ B.
(v) A = B ⇒ A ⊇ B.
(vi) (A ⊆ B ∧ B ⊆ C) ⇒ A ⊆ C.
Proof: For part (i), let A be a set. Then ∀x, (x ∈ A ⇒ x ∈ A) by Theorem 6.6.10 (ix). So A ⊆ A by
Notation 7.3.3.
Part (ii) may be proved as for part (i).
Part (iii) follows from Definitions 7.2.4 (1), 7.3.2 and 4.7.2 (iii).
For part (iv), define a two-parameter predicate P by P(y, z) = ∀x, (x ∈ y ⇒ x ∈ z). By the predicate
calculus with equality axiom of substitutivity, Definition 6.7.5 (ii) (Subs), P(A, A) ⇒ P(A, B) since A = B.
But P(A, A) = ∀x, (x ∈ A ⇒ x ∈ A) (which is the same as A ⊆ A) follows from Theorem 6.6.10 (ix), and
P(A, B) = ∀x, (x ∈ A ⇒ x ∈ B) then follows from this. Hence A = B ⇒ A ⊆ B by Definition 7.3.2.
Part (v) may be proved as for part (iv).
For part (vi), let A, B and C be sets with A ⊆ B ∧ B ⊆ C. Then by Definition 7.3.2, ∀x, (x ∈ A ⇒ x ∈ B)
and ∀x, (x ∈ B ⇒ x ∈ C). Let x be a set. Then (x ∈ A ⇒ x ∈ B) ∧ (x ∈ B ⇒ x ∈ C). Therefore
x ∈ A ⇒ x ∈ C by Theorem 4.5.6 (iv). Consequently ∀x, (x ∈ A ⇒ x ∈ C) by Definition 6.3.9 (UI).
Therefore A ⊆ C by Definition 7.3.2. So (A ⊆ B ∧ B ⊆ C) ⇒ A ⊆ C.
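The six assertions of this theorem can be spot-checked on sample finite sets in Python, where frozenset's <= and >= operators play the roles of ⊆ and ⊇. This is a sanity check on arbitrary examples, not a substitute for the proof.

```python
# Spot-check of the theorem's parts (i)-(vi) on sample finite sets,
# using <= and >= as the subset and superset relations.
A = frozenset({1, 2})
B = frozenset({1, 2})
C = frozenset({1, 2, 3})

assert A <= A                                   # (i)   A is a subset of A
assert A >= A                                   # (ii)  A is a superset of A
assert (A <= B and B <= A) == (A == B)          # (iii) mutual inclusion gives equality
assert (not A == B) or (A <= B)                 # (iv)  equality implies inclusion
assert (not A == B) or (A >= B)                 # (v)   equality implies reverse inclusion
assert (not (A <= B and B <= C)) or (A <= C)    # (vi)  transitivity of inclusion
```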
7.3.6 Definition:
A proper subset of a set B is a set A such that A ⊆ B and A ≠ B.
A proper superset of a set A is a set B such that B ⊇ A and A ≠ B.
7.3.7 Notation:
A ⫋ B, for sets A and B, means that A is a proper subset of B.
A ⫌ B, for sets A and B, means that A is a proper superset of B.
7.3.10 Remark: Some sample proofs of ZF set theory assertions using formal predicate calculus.
The above proof for Theorem 7.3.5 uses the standard informal sketch style of argumentation in which the
vast majority of mathematical proofs are written (by human beings). The reader should, in principle, be able
to build a rigorous proof from an informal sketch. Formal line-by-line logical calculus becomes rapidly more
tedious and incomprehensible as the theory progresses from the basics to more advanced topics. However,
the very elementary Theorem 7.3.5 is not too onerous to prove, as follows, in the predicate calculus fashion
as in Section 6.7. (Note that accents on free variables, i.e. sets, are optional because they are merely hints
that the variables are free. The sets A and B are free variables in the following proofs.)
To prove Theorem 7.3.5 (i): ⊢ A ⊆ A
(1) ∀x, (x ∈ A ⇒ x ∈ A)      Theorem 6.6.10 (ix)       ⊣
(2) A ⊆ A                     Notation 7.3.3 (1)        ⊣
To prove Theorem 7.3.5 (ii): ⊢ A ⊇ A
(1) ∀x, (x ∈ A ⇒ x ∈ A)      Theorem 6.6.10 (ix)       ⊣
(2) A ⊇ A                     Notation 7.3.3 (1)        ⊣
To prove Theorem 7.3.5 (iii): ⊢ (A ⊆ B ∧ B ⊆ A) ⇒ A = B
(1) A ⊆ B ∧ B ⊆ A            Hyp                       ⊣ (1)
(2) A ⊆ B                     Theorem 4.7.9 (ix) (1)    ⊣ (1)
(3) ∀x, (x ∈ A ⇒ x ∈ B)      Notation 7.3.3 (2)        ⊣ (1)
(4) x ∈ A ⇒ x ∈ B            UE (3)                    ⊣ (1)
(5) B ⊆ A                     Theorem 4.7.9 (x) (1)     ⊣ (1)
(6) ∀x, (x ∈ B ⇒ x ∈ A)      Notation 7.3.3 (5)        ⊣ (1)
(7) x ∈ B ⇒ x ∈ A            UE (6)                    ⊣ (1)
(8) x ∈ A ⇔ x ∈ B            Theorem 4.7.9 (ii) (4,7)  ⊣ (1)
(9) ∀x, (x ∈ A ⇔ x ∈ B)      UI (8)                    ⊣ (1)
(9) ∀x, (x ∈ A ⇒ x ∈ C)      UI (8)                    ⊣ (1)
(10) A ⊆ C                    Notation 7.3.3 (9)        ⊣ (1)
(11) (A ⊆ B ∧ B ⊆ C) ⇒ A ⊆ C    CP (1,10)             ⊣
Such formality in proofs does not greatly assist comprehension of the ideas, but it is occasionally useful
to convert some lines of informal sketch-proofs to rigorous predicate calculus when the logic is unclear or
uncertain. In principle, all pure mathematics is no more than a long sequence of predicate calculus arguments
as above. Certainly no one would read a book which is written entirely in this way. Nor would they learn
much about mathematics. This naturally raises the question of what essence would be lost in millions of lines
of rigorous predicate calculus without commentary. If a super-computer could be programmed to execute
a trillion lines of predicate calculus per second and store all of the resulting theorems in a database, why
would that not be doing mathematics? Logical calculus without commentary would be almost meaningless,
like bones without flesh. But commentary without the iron discipline of logical calculus would be dubious
speculation, like flesh without bones.
certificate of compliance if it satisfies the axioms. The choice of representation is a mere implementation
issue to be resolved by the supplier. A model (or implementation or interpretation) which is outsourced is
then somebody else's problem.
7.4.2 Remark: The different qualities of the ZF axioms.
Definition 7.2.4 specifies the properties of the membership relation between sets. It also specifies construction
methods which yield new sets from given sets. So it provides a combination of membership relation properties
and set existence rules. The set existence axioms 2, 3, 4, 5, 6 and 8 guarantee the existence of various kinds
of sets. They effectively anoint sets which are constructed according to specified rules. Axioms 1 and 7
specify constraints which limit set theory to safe ground. Thus there are two technical axioms and six
anointment axioms. (This is perhaps clearer in the high-level summary in Remark 7.3.9.)
All of the ZF set existence axioms guarantee the existence of sets which are constructed with the aid of
pre-existing sets except for axioms 2 and 8, which each make a single set out of nothing.
The productive axioms 2, 3, 4, 5, 6 and 8 may be thought of as defining lower bounds on the sets which may
exist in ZF set theory. The restrictive axioms 1 and 7 may be thought of as defining upper bounds.
Axiom 2 is implied by the other axioms. Axiom 6 allows you to prune back any set to the empty set. Since
the infinity axiom 8 guarantees the existence of at least one set, it follows that an empty set exists. Thus
axiom 2 follows from axioms 6 and 8. So it is technically redundant, but it is generally retained so that if
the infinity axiom is removed, the remaining axioms guarantee that the universe of sets is non-empty.
7.4.3 Remark: The definiteness of sets whose existence is asserted by ZF set theory.
The ZF set existence axioms 2, 3, 4, 5, 6 and 8 each specify a set in terms of a specific set-theoretic formula.
So each individual application of these axioms yields one and only one set, whose elements are precisely
determined by the parameters (if any) of the axiom. In each case, the anointed set X is specified by a logical
expression of the form ∃X, ∀z, (z ∈ X ⇔ P(z, . . .)) for some predicate P with all parameters specified except
for z, which is bound to a universal quantifier. (The special case of the empty set axiom 2 is equivalent to
∃X, ∀z, (z ∈ X ⇔ ⊥(z)), where ⊥ denotes the always-false predicate. See Notation 5.1.10 for ⊥.) The form
of these logical expressions implies their uniqueness by Theorem 7.5.2.
The combination of existence and uniqueness permits a specific name or notation to be given to a set. So
all of the ZF set existence axioms yield unique, well-specified sets. This contrasts with the axiom of choice,
which merely claims that a specified class of sets is non-empty. This is why the axiom of choice is so useless.
(See also Remark 7.12.1.)
7.4.4 Remark: ZF consistency and existence of models.
One of the first things that a mathematician asks about a set of axioms is the existence question. Does
there exist at least one system which satisfies the axioms? Existence is an extremely woolly philosophical
notion, but it is generally agreed (almost universally) that a prerequisite for existence is consistency of the
axioms. Conversely, if a model (or exemplar) for the axioms can be found, then one concludes (rightly or
wrongly) that the axioms must be consistent. Since it is generally not possible (for various reasons) to prove
within an axiomatic system that the system itself is consistent, mathematical logicians have resorted to the
construction of models for theories as a way of demonstrating consistency. Unfortunately, most of these
model constructions rely upon the consistency of similar sets of axioms for their existence proofs. The
word construction here does not mean literally constructive methods, but rather the definition of classes
of objects within other axiomatic systems. Since essentially all of the axiomatic systems used for defining
models are themselves in doubt, all proofs of model-existence and theory-consistency are necessarily relative.
Among the hundreds of models which have been defined for ZF set theory (mostly to answer questions about
relative consistency or independence of some axioms relative to others), there are two models which are
the most basic, obvious and intuitively clear. Although these two models have large numbers of difficulties
and unanswered questions, they are at least simple in principle. They are generally called the constructible
universe, denoted L, and the von Neumann universe, denoted V . Both of these ZF universes require ordinal
numbers for their definition. So their presentation is delayed until Chapter 12. (The von Neumann universe
is discussed in Section 12.5. The constructible universe is mentioned very briefly in Remark 12.5.8.)
Although there is no absolute proof of the consistency of the ZF axioms, most mathematicians accept as
a matter of faith that at least one model does exist in some sense. This is not the kind of faith which is
required in order to accept the axiom of choice. In the AC case, one must accept the existence of entities
for which no concrete examples will ever be seen. In the ZF consistency case, all of mathematics has made
extensive use and application of the axioms every minute of every day for a hundred years without finding any
inconsistency. While AC is incapable of being tested, ZF is continually tested by every line of mainstream
mathematics. Thus one can say that ZF has a superlative test record, whereas AC has no record of testing
at all. Hence the choice between ZF and ZFC as a basis for mathematics is very clear. Of course, ZFC yields
more theorems, which is the motivation for its wide acceptance, but the extra theorem-proving power is
purchased at the cost of extreme metaphysicality. It is a Faustian pact, obtaining a richness of theorems in
return for ones mathematical integrity.
(∀x, (x ∈ A ⇔ x ∈ B)) ⇒ (A = B)
Figure 7.5.1 ZF extension axiom
If two names refer to the same object, all predicates of that object must be identical. This is formalised as
the axiom of substitutivity for predicate calculus with equality in Definition 6.7.5 (ii) (Subs). The converse
is formalised by the ZF extension axiom. In ZF set theory, the only property of a set is what it contains
because equality and set membership are the only predicates for sets in Definition 7.2.4.
By the extension axiom, if ∀x, (x ∈ A ⇔ x ∈ B) then A = B. So the predicate calculus with equality
substitutivity axiom implies ∀x, (A ∈ x ⇔ B ∈ x). In other words, if two sets have the same membership
relations on the left, then they also have the same set membership relations on the right. The converse
is also true, as shown by Theorem 7.5.11.
Proof: Let x be a set. Then ∃S, ∀z, (z ∈ S ⇔ (z = x ∨ z = x)) by the pair axiom, Definition 7.2.4 (3)
(by letting A = x and B = x). Therefore ∃S, ∀z, (z ∈ S ⇔ z = x). The uniqueness of S follows from
Theorem 7.5.2. Hence ∃′S, ∀z, (z ∈ S ⇔ z = x) for all sets x.
7.5.7 Notation: {x}, for a set x, denotes the set S which satisfies ∀z, (z ∈ S ⇔ z = x). In other words,
{x} denotes the singleton which contains x.
Proof: Let x be a set. Then {x} is a well-defined set by Theorem 7.5.4 and Notation 7.5.7, and x ∈ {x}
by Notation 7.5.7. Hence ∀x, x ∈ {x}.
7.5.10 Remark: Same membership relations on the left imply same membership relations on the right.
From Theorem 7.5.11, it follows that two sets are equal if and only if they have the same membership
relations on the right. But the extension axiom, Definition 7.2.4 (1), states only that two sets are equal if
and only if they have the same membership relations on the left.
Proof: Let A, B be sets such that ∀z, (A ∈ z ⇔ B ∈ z). Let z1 = {A}. Then ∀x, (x ∈ z1 ⇔ x = A).
So A ∈ z1 . Therefore B ∈ z1 . Let x = B. Then x = A. Hence A = B.
7.5.12 Remark: Equality in set theories which are not based on predicate calculus with equality.
As mentioned in Remark 6.7.4, if the object equality predicate for sets is not axiomatised within a predicate
calculus with equality (QC+EQ), then the equality relation may be defined in terms of set membership.
(See for example the presentation of NBG (Neumann-Bernays-Gödel) set theory by E. Mendelson [320],
pages 159-170.) Then the extension axiom in Definition 7.2.4 (1) must be replaced by an axiom which
replaces the substitutivity axiom in QC+EQ. In other words, the equality relation may be defined in terms
of the membership relation by defining A = B to mean ∀x, (x ∈ A ⇔ x ∈ B). In terms of this defined
equality relation, the extension axiom then requires this relation to have the following relation to the set
membership relation on the right.
A = B ⇒ ∀x, (A ∈ x ⇔ B ∈ x). (7.5.1)
This would follow automatically from the QC+EQ axiom Definition 6.7.5 (EQ 1) (Subs). At first sight,
line (7.5.1) could easily be confused with the extension axiom in Definition 7.2.4 (1).
7.5.13 Remark: Set theories based on predicate calculus with or without equality.
Two ways of introducing the extension axiom are summarised in the following table.
proposition                                  concrete    abstract
1. (A = B) ⇒ ∀z, (z ∈ A ⇔ z ∈ B)            QC + EQ     definition
2. (A = B) ⇒ ∀z, (A ∈ z ⇔ B ∈ z)            QC + EQ     axiom
3. (∀z, (z ∈ A ⇔ z ∈ B)) ⇒ (A = B)          axiom       definition
4. (∀z, (A ∈ z ⇔ B ∈ z)) ⇒ (A = B)          theorem     theorem
The more concrete approach, based on predicate calculus with equality, is adopted in this book and also by
Shoenfield [340], pages 238-240, and by EDM2 [109], 33.B, pages 147-148. In this approach, it is assumed
that the names in the variable name space NV of the language refer to objects in a concrete variable space V.
(In the case of set theory, this means that the names of sets refer to concrete sets in some externally defined
space.) An equality relation is assumed to be defined already on the concrete variable space, and this relation
is imported into the abstract space via the variable name map V : NV V.
Therefore in the concrete approach, the equality relation = is defined as an import of a concrete equality
relation. Since it also assumed that the membership relation is also imported from a concrete membership
relation, and the concrete relation is assumed to be well-defined for the concrete variables, it automatically
follows that in the abstract name space, (A = B) ⇒ (z ∈ A ⇔ z ∈ B) and (A = B) ⇒ (A ∈ z ⇔ B ∈ z).
In fact, B may be substituted for any instance of A like this without changing the truth value of the
proposition. That is, (A = B) ⇒ (F(A) ⇔ F(B)) for any predicate F . (This is called the substitutivity of
equality axiom. See Remark 6.7.4.) So lines (1) and (2) in the above table follow immediately. (QC + EQ
abbreviates predicate calculus with equality.) In this case, line (3), (∀z, (z ∈ A ⇔ z ∈ B)) ⇒ (A = B),
is specified as an axiom of extension. This is required because it is not a priori obvious that the concrete
membership relation would imply the concrete equality relation in this way.
The abstract approach is used by E. Mendelson [320], pages 159-170. In this approach, the equality relation is
not imported from a concrete space. So the equality relation needs to be defined in terms of the membership
relation, which is the only relation imported from the concrete space. In this approach, A = B is defined to
mean ∀z, (z ∈ A ⇔ z ∈ B). This yields lines (1) and (3) in the table.
In the abstract approach, since the equality relation is no more than a definition in terms of the membership
relation, there is no guarantee at all that this abstract equality of A and B implies that they both refer to
the same concrete object, which therefore would have all attributes identical. Therefore this substitutivity
of equality property must be specified as an axiom of extension.
This explains why line (3) is an axiom in the concrete approach whereas it is given by a definition in the
abstract approach, and the reverse is true for line (1).
7.6. The ZF empty set, pair, union and power set axioms
7.6.1 Remark: The empty set axiom.
The empty set axiom states that there exists an empty set. It is unique by Theorem 7.6.2. Therefore it can
be given a name and a notation.
by the extension axiom, Definition 7.2.4 (1).
Proof: Let x and y be sets. Then ∃S, ∀z, (z ∈ S ⇔ (z = x ∨ z = y)) by the pair axiom, Definition 7.2.4 (3).
To show uniqueness, suppose that Sk satisfies ∀z, (z ∈ Sk ⇔ (z = x ∨ z = y)) for k = 1, 2. Then
∀z, (z ∈ S1 ⇔ z ∈ S2 ) by Theorem 4.7.9 (xvii). So S1 = S2 by the extension axiom, Definition 7.2.4 (1).
Hence ∃′S, ∀z, (z ∈ S ⇔ (z = x ∨ z = y)) for all sets x and y.
7.6.9 Notation: {x, y}, for sets x and y, denotes the set S such that ∀z, (z ∈ S ⇔ (z = x ∨ z = y)). In
other words, {x, y} denotes the unordered pair which contains x and y.
∃X, ∀z, (z ∈ X ⇔ ∃S, (z ∈ S ∧ S ∈ K))
Figure 7.6.1 ZF union axiom for a general collection of sets
The uniqueness of the set X in this proposition, for any given set K, may be proved exactly as for singletons
in Theorem 7.5.4, the empty set in Theorem 7.6.2, and unordered pairs in Theorem 7.6.7. (There is no need
to formally prove an assertion which is obvious. It is the custom in mathematics that any unproved assertion
which the reader does not believe automatically becomes an exercise question for which the author will not
provide an answer!) Thus clearly ∀K, ∃′X, ∀z, (z ∈ X ⇔ ∃S ∈ K, z ∈ S). And as usual, this existence and
uniqueness implies that it is possible to give a name and notation to this set.
7.6.12 Definition: The union of a set K is the set X which satisfies ∀z, (z ∈ X ⇔ ∃S ∈ K, z ∈ S).
7.6.13 Notation: ⋃K, for a set K, denotes the set X which satisfies ∀z, (z ∈ X ⇔ ∃S ∈ K, z ∈ S). In
other words, ⋃K denotes the union of K.
∃X, ∀z, (z ∈ X ⇔ (z ∈ A ∨ z ∈ B))
Figure 7.6.2 ZF union axiom for a collection of two sets
7.6.15 Definition: The union of two sets A and B is the set ⋃{A, B}.
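Definition 7.6.12 translates directly into the frozenset model used above. (A sketch only; `union_of` is a name chosen here, not the book's notation.) An element z belongs to ⋃K exactly when z ∈ S for some S ∈ K, and the union of two sets A and B is the union of the pair {A, B}.

```python
from functools import reduce

empty = frozenset()

def union_of(K):
    """Union of a set of sets K: all z with z in S for some S in K."""
    return reduce(lambda acc, S: acc | S, K, frozenset())

one = frozenset({empty})            # 1 = {emptyset}
two = frozenset({empty, one})       # 2 = {emptyset, {emptyset}}
# Union of A and B as the union of the pair {A, B} (Definition 7.6.15).
assert union_of(frozenset({one, two})) == two   # 1 union 2 = 2
assert union_of(frozenset()) == empty           # union of emptyset is emptyset
```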
7.6.17 Remark: The intersection of a pair of sets requires the specification theorem.
It may seem straightforward to define the intersection of two sets A and B as the set C which satisfies
∀z, (z ∈ C ⇔ (z ∈ A ∧ z ∈ B)), but the ZF union axiom does not imply the existence of such a set, and
there is no ZF intersection axiom. However, the existence of the intersection of two sets does follow from the
specification theorem which follows from the ZF replacement axiom. (See Theorem 7.7.2 and Remark 7.7.20.)
7.6.18 Remark: A singleton axiom would be weaker than the unordered pair axiom.
It may seem that if the unordered pair axiom were replaced with a singleton axiom which guarantees the
existence of the set {x} for any set x, this could be combined with the union axiom to construct an unordered
pair {x, y} as the union {x} ∪ {y} of the two sets {x} and {y}. However, this would require the existence of
the unordered pair {{x}, {y}} in Definition 7.6.15.
Thus the ZF axioms stipulate the existence of a set containing exactly 0 elements and the existence of sets
containing any given 2 elements. Then existence for all other finite numbers of elements can be demonstrated
from these. Oddly, there is a gap in the middle between 0 and 2 here! If one attempted to stipulate only
the existence of 1-element and 2-element sets in terms of given sets, this would not exclude the possibility
of an empty universe if the infinity axiom is omitted.
7.6.20 Theorem: Let A be a single-parameter set-theoretic formula. Let S1, S2 be sets which satisfy
∀z, (z ∈ Sk ⇔ A(z)) for k = 1, 2. Then S1 = S2.
Proof: Let Sk be sets which satisfy ∀z, (z ∈ Sk ⇔ A(z)) for k = 1, 2. Then ∀z, (z ∈ S1 ⇔ z ∈ S2) by
Theorem 4.7.9 (xvii). So S1 = S2 by the extension axiom, Definition 7.2.4 (1).
7.6.21 Definition: The power set of a set X is the set P which satisfies ∀z, (z ∈ P ⇔ z ⊆ X).
7.6.24 Remark: Using the power set notation to clarify quantified logical expressions.
Propositions of the form ∀z, (z ⊆ X ⇒ A(z)), for some set-theoretic formula A, are often seen in practical
mathematics. Since the power set 𝒫(X) is defined by ∀z, (z ∈ 𝒫(X) ⇔ z ⊆ X), the proposition z ⊆ X is
interchangeable with the proposition z ∈ 𝒫(X). So the proposition ∀z, (z ⊆ X ⇒ A(z)) is equivalent to
∀z, (z ∈ 𝒫(X) ⇒ A(z)). This may be written in terms of Notation 7.2.7 as ∀z ∈ 𝒫(X), A(z), which
has considerable advantages when other quantifiers come before or after the quantifier ∀z ∈ 𝒫(X).
Figure 7.7.1 ZF replacement axiom: ∃B, ∀y, (y ∈ B ⇔ ∃x ∈ A, f(x, y))
Although each individual set y which satisfies f(x, y) for some x ∈ A is clearly contained in at least one
pre-existing set because y ∈ {y}, it does not follow in general from the other axioms that the collection of
all such sets y is contained in or included in a pre-existing set.
The replacement axiom is directly useful for showing that infinite collections of sets which are determined
by some set-theoretic rule may be considered to be sets. The union axiom is not helpful here because to
apply the union axioms would require the sets in question to already constitute a well-defined set so that
their union may be shown to exist.
Some authors call the replacement axiom the substitution axiom, but the word substitution is best
reserved for text-level substitution of expressions into other expressions. (See for example Sections 3.9
and 3.12, and Definitions 4.4.3 (viii) and 6.3.9 (viii).)
The most useful practical applications of the replacement axiom are indirect by way of the specification
theorem, Theorem 7.7.2, which is used for the great majority of set constructions in mathematics.
7.7.2 Theorem: For any set A and single-parameter set-theoretic formula P,
∃B, ∀y, (y ∈ B ⇔ (y ∈ A ∧ P(y))).
Proof: Let A be a set. Let P be a single-parameter set-theoretic formula. Define the two-parameter
set-theoretic formula f by f(x, y) = (P(x) ∧ (x = y)) for sets x and y. Then the ZF replacement axiom,
Definition 7.2.4 (6), asserts the following for any set A and two-parameter set-theoretic formula f.
(∀x, y1, y2, ((f(x, y1) ∧ f(x, y2)) ⇒ y1 = y2)) ⇒ ∃B, ∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ f(x, y))).
Let x, y1, y2 be sets such that f(x, y1) ∧ f(x, y2) is true. Then P(x) ∧ (x = y1) ∧ P(x) ∧ (x = y2) is true
by the definition of f. So y1 = y2. Therefore f satisfies ∀x, ∀y1, ∀y2, ((f(x, y1) ∧ f(x, y2)) ⇒ y1 = y2).
Consequently ∃B, ∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ f(x, y))). In other words, there exists a set B which satisfies
∀y, (y ∈ B ⇔ ∃x, (x ∈ A ∧ P(x) ∧ x = y)). This is equivalent to ∀y, (y ∈ B ⇔ ∃x, (y ∈ A ∧ P(y) ∧ x = y)),
which is equivalent to ∀y, (y ∈ B ⇔ (y ∈ A ∧ P(y))).
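The specification theorem has a direct computational analogue: a Python set comprehension is exactly the {x ∈ A; P(x)} construction, forming a subset of an already-given set A rather than an unrestricted class. (A sketch only; the predicate P chosen below is arbitrary.)

```python
# Specification: given a set A and a predicate P, form {x in A : P(x)}.
A = frozenset(range(10))
P = lambda x: x % 2 == 0             # an arbitrary single-parameter predicate

B = frozenset(x for x in A if P(x))  # the set B guaranteed by the theorem
assert B == frozenset({0, 2, 4, 6, 8})
assert B <= A                        # specification never leaves the set A
```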
The replacement axiom extends the class of sets which can be proved to exist. It was noticed by Fraenkel
and Skolem that the range of the recursively defined function f given by f(0) = ∅ and f(i + 1) = 𝒫(f(i))
for all i ∈ ω, was not guaranteed to exist in Z. In other words, the set {∅, 𝒫(∅), 𝒫(𝒫(∅)), . . .} could not be
guaranteed to exist in Z, but its existence was guaranteed in ZF. (See for example Halmos [307], page 76;
Moore [321], page 262; Stoll [343], page 303.)
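The first few stages of the Fraenkel-Skolem example can be computed in the frozenset model. (An illustrative sketch; the `powerset` helper is written here with `itertools.combinations` and is not the book's notation.) The stage sizes 0, 1, 2, 4 reflect the iteration f(0) = ∅, f(i+1) = 𝒫(f(i)).

```python
from itertools import combinations

def powerset(S):
    """The power set of a frozenset S, as a frozenset of frozensets."""
    elems = list(S)
    return frozenset(
        frozenset(c) for r in range(len(elems) + 1)
                     for c in combinations(elems, r)
    )

f = frozenset()                      # f(0) = emptyset
sizes = []
for _ in range(4):
    sizes.append(len(f))
    f = powerset(f)                  # f(i+1) = powerset of f(i)

assert sizes == [0, 1, 2, 4]         # stage sizes of the iterated power set
```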
The extension of the guaranteed universe of sets in ZF is required for various pure mathematical set theory
purposes, particularly to guarantee the existence of sets beyond Vω+ω in the cumulative hierarchy in Sec-
tion 12.5. However, for most purposes in applicable mathematics, the sets whose existence is guaranteed
in Z are adequate. (See Remark 12.5.3 for some comments on the smaller minimum universe of sets for
Zermelo set theory.)
One of the advantages of Zermelo set theory is that the specification axiom (6′) in Definition 7.7.4 is much
simpler than the replacement axiom (6) in Definition 7.2.4. This suggests that it is more likely to be a
fundamental property of sets. The specification axiom is also known as the axiom of comprehension or the
axiom of separation.
7.7.4 Definition [mm]: Zermelo set theory is the same as Zermelo-Fraenkel set theory (Definition 7.2.4)
except that the replacement axiom (6) is substituted with Axiom (6′).
(6′) The specification axiom: For any set A and single-parameter set-theoretic formula P,
∃B, ∀y, (y ∈ B ⇔ (y ∈ A ∧ P(y))).
7.7.5 Remark: Name for two-parameter set-theoretic formulas which are function-like.
It is convenient to give a name to the kind of two-parameter set-theoretic formula which is required for
the ZF replacement axiom. Such a formula f satisfies ∀x, ∀y1, ∀y2, ((f(x, y1) ∧ f(x, y2)) ⇒ y1 = y2). It is
difficult to call this a set-theoretic function because this is very easily confused with two-parameter set-
theoretic object-maps in general first-order languages. However, within ZF set theory, the opportunities for
confusion should be minimal. There is a more serious danger of confusion with single-parameter set-theoretic
object-maps because this is essentially what they are. (At least they may be identified with each other.)
One fairly safe solution to this problem is to allow free substitution of one form for the other, regarding
the difference as a mere notational issue. Notation 7.7.7 permits a boolean-valued two-parameter function
formula to be notated as if it were a set-valued single-parameter formula. Although this is a harmless
fiction, it could seem to contradict the claim that ZF set theory has no set-theoretic object-maps. Note that
Notation 7.7.7 also clashes with the conventions for the ordered-pair-set functions in Definition 10.2.2, which
are based on the ordered-pair-set relations in Definition 9.5.2.
7.7.7 Notation: f (x), for a set x and a two-parameter set-theoretic function formula f , denotes the set
y for which f (x, y) is true.
In case (1), it must be known that there is a well-defined set which satisfies P. To be more precise, the
proposition ∃X, ∀z, (z ∈ X ⇔ P(z)) must be provable in ZF set theory. Provability is, unfortunately, a
slightly slippery notion. It is
difficult to prove that a proposition cannot be proved. Nevertheless, the onus is on the user to demonstrate
that the notation is well-defined before using it.
In case (2), there is generally no serious difficulty with the validity of its use. The specification theorem
guarantees that {x ∈ A; P(x)} represents a valid set whenever it can be shown that A is a set. Any
well-formed predicate P combined with any well-defined set A will yield a valid set.
In case (3), the ZF replacement axiom must be applied unless it is known that all of the sets f(x) for x ∈ A
are contained in a single well-defined set. This notation is mostly used when all of the elements of the range
of f are in fact elements of a set whose validity is known. Therefore the application of the replacement axiom
is, most of the time, superfluous in applicable mathematics.
It should be noted that in all of these cases, there is no absolute general guarantee that the notations actually
denote valid sets. They always require implicit or explicit justification, especially the forms (1) and (3), which
almost always require explicit justification. In case (2), the only point which needs to be checked is that A
is a set, which is typically clear within the context.
7.7.9 Remark: Case (1): Notation for sets which are justified by semi-naive comprehension.
Although Zermelo-Fraenkel set theory does not permit completely general naive comprehension as a basis
for specifying sets, it is convenient to introduce a form of notation which at least appears to specify sets
in such a naive way. (Naive comprehension means that {x; P (x)} specifies a valid set or class for any
set-theoretic formula P .) To ensure that this form of notation does not introduce non-sets into mathematical
texts, the use of Notation 7.7.10 must always be justified with a proof that the object referred to is a genuine
ZF set. One could perhaps refer to such careful use of this notation as semi-naive comprehension.
Theorem 7.6.20 shows that there is at most one ZF set which is specified by a single-parameter set-theoretic
formula. So if at least one set can be proved to satisfy the formula, it must be unique. Therefore any
proposition which specifies at least one set specifies exactly one set. Hence Notation 7.7.10 is well defined if
the stipulated condition for P is met.
7.7.10 Notation: {x; P(x)}, for a single-parameter set-theoretic formula P which specifies at least one set,
denotes the set X which satisfies ∀z, (z ∈ X ⇔ P(z)).
7.7.11 Remark: Variant notations for sets which are specified by semi-naive comprehension.
There are some variants of Notation 7.7.10 in the literature. For example, Shoenfield [340], page 242, uses
[x | P (x)] to denote {x; P (x)}. (See also Remark 7.7.17 for discussion of similar alternative notations.)
7.7.13 Remark: Case (2): Notation for sets which are justified by the specification theorem.
There are no validity issues for Notation 7.7.14 if A denotes a well-defined set. The set which is denoted by
{x ∈ A; P(x)} is the same as the set which is denoted by {x; x ∈ A ∧ P(x)} in terms of Notation 7.7.10.
This set is well defined by Theorems 7.6.20 and 7.7.2.
7.7.15 Theorem: Let A be a set and let P be a single-parameter set-theoretic formula. Then
∀y, (y ∈ {x ∈ A; P(x)} ⇔ (y ∈ A and P(y))).
Proof: Let A be a set and P be a single-parameter set-theoretic formula. Let S = {x ∈ A; P(x)}. Then
S = {x; x ∈ A and P(x)} by Notation 7.7.14. So ∀y, (y ∈ S ⇔ (y ∈ A and P(y))) by Notation 7.7.10.
Hence ∀y, (y ∈ {x ∈ A; P(x)} ⇔ (y ∈ A and P(y))).
7.7.16 Theorem:
(i) A = {x; x ∈ A} = {x ∈ A; x = x} = {x ∈ A; ⊤} for any set A.
(ii) {x ∈ A; P(x)} ⊆ A, for any set A and any single-parameter set-theoretic formula P.
Proof: For part (i), let A be a set. Then ∀y, (y ∈ {x; x ∈ A} ⇔ y ∈ A) by Theorem 7.7.12. Therefore
A = {x; x ∈ A} by Definition 7.2.4 (1). By Notation 7.7.14, {x ∈ A; x = x} = {x; x ∈ A and x = x}, which
equals {x; x ∈ A} because x = x is true for all sets. (See Section 6.7 and Remark 7.5.13 for equality.) Then
{x ∈ A; x = x} = {x ∈ A; ⊤} because the always-true proposition ⊤ is always true. (See Notation 5.1.10.)
For part (ii), let A be a set and let P be a single-parameter set-theoretic formula. Let y ∈ {x ∈ A; P(x)}.
Then y ∈ A and P(y) by Theorem 7.7.15. Therefore y ∈ A. Hence {x ∈ A; P(x)} ⊆ A.
7.7.18 Remark: Case (3): Notation for sets which are justified by the replacement axiom.
The set notation {f(x); x ∈ A} is well defined by the ZF replacement axiom, Definition 7.2.4 (6).
7.7.20 Remark: Existence of intersections of sets follows from the specification theorem.
The intersection of a pair of sets A and B may be defined by restricting elements of one of the sets to be
contained in the other as in Definition 7.7.21, using Notation 7.7.14. This is well-defined by Theorem 7.7.2.
7.7.21 Definition: The intersection of two sets A and B is the set {z ∈ A; z ∈ B}.
7.7.22 Notation: A ∩ B, for sets A and B, denotes the intersection of the two sets A and B.
7.7.24 Notation: {A ⊆ X; P(A, X)}, for a set X and set-theoretic formula P, means {A ∈ 𝒫(X); P(A, X)}.
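Definition 7.7.21 defines intersection by specification, so computationally it is again a comprehension restricted to one of the two sets. (A sketch; `intersection` is a name chosen here, and the von Neumann numerals 1 and 2 are used as test data.)

```python
empty = frozenset()

def intersection(A, B):
    """Intersection {z in A : z in B}, following Definition 7.7.21."""
    return frozenset(z for z in A if z in B)

one = frozenset({empty})             # 1 = {emptyset}
two = frozenset({empty, one})        # 2 = {emptyset, {emptyset}}
assert intersection(one, two) == one          # 1 intersect 2 = 1
assert intersection(one, two) == (one & two)  # agrees with the built-in operator
```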
7.8. The ZF regularity axiom
· · · ∈ z5 ∈ z4 ∈ z3 ∈ z2 ∈ z1 ∈ z0
Figure 7.8.1 Infinite chain of set memberships on the left
The axiom of regularity implies that any infinite sequence of set memberships · · · ∈ zn ∈ · · · ∈ z2 ∈ z1 ∈ z0
terminates on the left for some n. This can be proven by letting X be this sequence of sets.
The regularity axiom differs significantly from the other axioms by not actually anointing any new sets. It
simply outlaws certain kinds of sets which are considered objectionable. Therefore there are no definitions
or notations for sets which are constructed in accordance with this axiom because it produces no new sets.
The negative nature of this axiom can be seen more clearly by rewriting line (7.8.1) as:
∀X, ((X ≠ ∅ ∧ ∀z ∈ X, z ∩ X ≠ ∅) ⇒ ⊥),
where ⊥ denotes the always-false predicate in Notation 5.1.10. In other words, X ≠ ∅ ∧ ∀z ∈ X, z ∩ X ≠ ∅
is forbidden. Alternatively, this may be written as:
∀X, ((∀z ∈ X, z ∩ X ≠ ∅) ⇒ X = ∅).
7.8.2 Theorem: Let A, B and C be sets.
(i) A ∉ A.
(ii) A ∉ B or B ∉ A. In other words, if A ∈ B then B ∉ A.
(iii) ¬((A ∈ B) ∧ (B ∈ C) ∧ (C ∈ A)).
Proof: For part (i), let A be a set. Let X = {A}. Then X ≠ ∅. So ∃z ∈ X, z ∩ X = ∅ by the regularity
axiom. But the only element of X is z = A. So A ∩ X = ∅. That is, A ∩ {A} = ∅. Hence A ∉ A.
For part (ii), let A and B be sets. Let X = {A, B}. Then X ≠ ∅. So ∃z ∈ X, z ∩ X = ∅ by the regularity
axiom. There are only two elements in X. So either z = A or z = B. Suppose that z = A. Then A ∩ X = ∅.
That is, A ∩ {A, B} = ∅. So A ∉ A and B ∉ A. In particular, B ∉ A. Similarly, from z = B it follows
that A ∉ B. Hence A ∉ B or B ∉ A.
For part (iii), let A, B and C be sets. Let X = {A, B, C}. Then X ≠ ∅ and so ∃z ∈ X, z ∩ X = ∅. But then
z = A or z = B or z = C. Suppose that z = A. Then A ∩ {A, B, C} = ∅. So C ∉ A. Similarly, setting z = B
yields A ∉ B, and setting z = C yields B ∉ C. In each case, the proposition (A ∈ B) ∧ (B ∈ C) ∧ (C ∈ A)
is contradicted. Hence ¬((A ∈ B) ∧ (B ∈ C) ∧ (C ∈ A)).
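For hereditarily finite sets the regularity axiom holds automatically, and the ∈-minimal element z ∈ X with z ∩ X = ∅ can be found by a simple search in the frozenset model. (A sketch; `epsilon_minimal` is a name chosen here.)

```python
def epsilon_minimal(X):
    """Return some z in X with z intersect X empty, per the regularity axiom."""
    for z in X:
        if not (z & X):              # z has no element in common with X
            return z
    raise ValueError("no minimal element: X would violate regularity")

empty = frozenset()
one = frozenset({empty})             # 1 = {emptyset}
two = frozenset({empty, one})        # 2 = {emptyset, 1}
X = frozenset({one, two})            # X = {1, 2}
z = epsilon_minimal(X)
assert z == one and not (z & X)      # 1 intersect {1, 2} is empty
```

Only `one` qualifies here: `two & X = {one}` is nonempty, since 1 ∈ 2 and 1 ∈ X.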
Figure 7.8.2 A folder which contains a hard link to its parent folder
Theorem 7.8.2 (i) prevents Russell's paradox by excluding the existence of a set U satisfying ∀x, x ∈ U.
Russell's paradox is thus resolved in ZF set theory by simply forbidding the kinds of sets which cause
embarrassment. There are many ways to exclude Russell's paradox. Neumann-Bernays-Gödel set theory
avoids the paradox by demoting the problematic sets such as U to classes, which are forbidden to
be elements of sets. An alternative is to not exclude the paradox, but to let it propagate into the deeper
foundations of mathematics by formalising inconsistency-tolerant logic. (See for example Mortensen [322].)
Except in very elementary introductions, the concept of a set or universe of all sets is not often seen in serious
mathematics. So not much is lost by excluding it. The metaphor for a set in everyday life is a container,
which clearly cannot contain itself. If the word "container" had been adopted instead of "set" or "class", the
concept of a container which contains all containers would have been less appealing, and the temptation
would have been easier to resist.
7.8.5 Remark: The depth of sets on the left can be unbounded, but not infinite.
If any chain of membership relations on the left is required to be finite by the regularity axiom, then
every such chain is equinumerous to one and only one finite ordinal number. (See Definition 12.1.32 and
Section 12.2 for finite ordinal numbers. See Section 13.1 for equinumerosity.) The finite stages Vn of the
cumulative hierarchy of sets (i.e. the von Neumann universe) have the interesting property that they contain
precisely those sets of finite ordinal numbers which have maximum depth n on the left. (This is clear from
the method of construction.) However, the first infinite von Neumann universe stage, denoted Vω, does not
contain sets with infinite depth. The depth is unbounded, but never infinite for a single set in the universe
stage Vω. Thus the regularity axiom is not contravened.
The set membership tree for ω (which is a subset of Vω) is illustrated in Figure 7.8.3. Any particular path
downwards from ω terminates after a finite number of steps, although this number of steps is unbounded.
Figure 7.8.3 Set membership tree for the set ω of finite ordinal numbers
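The finite von Neumann ordinals and their leftward membership depth can be computed directly in the frozenset model; the ordinal n has depth exactly n, so the depth within Vω is unbounded but always finite. (A sketch; the function names are chosen here.)

```python
def ordinal(n):
    """The von Neumann ordinal n = {0, 1, ..., n-1} as a frozenset."""
    result = frozenset()
    for _ in range(n):
        result = result | frozenset({result})   # successor: x union {x}
    return result

def depth(x):
    """Length of the longest leftward membership chain below x."""
    return 0 if not x else 1 + max(depth(y) for y in x)

assert [len(ordinal(n)) for n in range(5)] == [0, 1, 2, 3, 4]   # |n| = n
assert [depth(ordinal(n)) for n in range(5)] == [0, 1, 2, 3, 4] # depth(n) = n
```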
7.8.6 Remark: Infinite set membership chains to the right are permitted.
Although the regularity axiom prevents an infinite chain of set memberships to the left, there is nothing
to stop an infinite chain of set memberships to the right. In fact, the set ω of finite ordinal numbers is such
an infinite sequence. By Theorem 7.5.9, the infinite sequence in line (7.8.3) is valid for any set x.
· · · ∋ z4 ∋ z3 ∋ z2 ∋ z1 ∋ z0. (7.8.3)
One might ask why there is an asymmetry here. One way to look at this is to consider that the nature and
meaning of a set is determined by what it contains, not by what it is contained in. Thus to determine the
nature of a set, one follows the membership relation network to the left. By looking at the members, and
the members of the members, and so forth, one should be able to find an end-point of any traversal along
a leftwards path in the membership relation network within a finite number of steps. The regularity axiom
guarantees that this is so. Therefore the nature of any set may be determined in this way.
One might ask also why the nature of a set is not determined by the sets which it is contained in. That
would seem to be not entirely unreasonable. However, this approach would require a very large number of
top-level sets to be given meaning outside pure set theory. In the ZF approach to set theory, all membership
relation traversals to the left ultimately end with the empty set. Therefore one only needs to give meaning
to one set, namely the empty set. All other sets then derive their meanings from this one set by following
set membership chains on the left. It is not at all clear how one could start with a single universe set and
give meaning to its members, and members of members, and so forth.
So one possible answer to the question of why the regularity axiom is needed is that it provides certainty
that all sets have determinable content.
7.8.7 Remark: Constructible sets are guaranteed to satisfy the ZF regularity axiom.
If all sets in ZF set theory are built up by a finite number of stages from the empty set (axiom 2) and
the (infinite) set ω of finite ordinals (axiom 8), it seems fairly clear that the ZF regularity axiom should
hold automatically. (See for example Shoenfield [340], pages 270-276 for a constructible universe for ZF set
theory. See also Remark 12.5.8.) Each of the set construction axioms 3, 4, 5 and 6 seems to build sets which
obey the regularity axiom if the source-sets out of which they are built obey the regularity axiom. The fly
in the ointment here is the observation, made in Remark 7.8.3, that the set ω requires the regularity axiom
to force it to be regular. Since all infinite sets require ω at some stage of their construction, this means that
all infinite sets owe their regularity to the regularity of ω.
∃X, ∀z, (z ∈ X ⇔ (z = ∅ ∨ ∃y ∈ X, ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y)))).
This has the form ∃X, ∀z, (z ∈ X ⇔ P(z, X)). It is because of this self-reference in the defining predicate that
so much effort is required to demonstrate uniqueness, which is the main subject of Section 12.1. (Such a
self-reference appears also in the infinity axioms of other authors which are described in Remark 7.9.2.)
Consequently the unique set of ordinal numbers whose existence is stipulated by axiom 8 is not given a name
and notation until Section 12.1. (See Definition 12.1.26 and Notation 12.1.27.)
∃X, (∃y, (y ∈ X ∧ ∀z, z ∉ y)
∧ ∀u, (u ∈ X ⇒ ∃v, (v ∈ X ∧ ∀w, (w ∈ v ⇔ (w ∈ u ∨ w = u))))). (7.9.1)
∃X, (∅ ∈ X ∧ (∀y ∈ X, (y ∪ {y} ∈ X))) (7.9.2)
Line (7.9.1) is given by Shoenfield [340], page 240. Line (7.9.2), which is equivalent to line (7.9.1), is given
by EDM2 [109], 33.B. Line (7.9.3), which is given by E. Mendelson [320], page 169, and EDM [108], 35.B,
may be written equivalently as line (7.9.4).
7.9.3 Remark: Defining finite ordinal numbers in terms of predecessors instead of successors.
A style of infinity axiom which asserts the existence of predecessors rather than successors is as follows.
∃X, ∀z, (z ∈ X ⇔ (z = ∅ ∨ ∃y ∈ X, z = y ∪ {y})). (7.9.5)
∃X, ∀z, (z ∈ X ⇔ ((∀u, u ∉ z)
∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))). (7.9.6)
This superficially resembles the successor-style axioms in Remark 7.9.2, but in Section 12.1 it becomes clear
that this adjustment to the infinity axiom makes proofs of basic properties of the ordinal numbers quite
different. The successor style asserts that X contains ∅ and all successors of all elements of X, whereas the
predecessor style asserts that X contains ∅ and all successors of all elements of X, and nothing else. This
is enforced by requiring each element (except ∅) to have a predecessor. Roughly speaking, the conventional
successor style implies that ω ⊆ X for some set X, whereas the predecessor style asserts that ω is a set.
The author has not yet found any set theory textbook which formulates the infinity axiom in terms of
predecessors as here. E. Mendelson [320], page 175, does define ω in terms of predecessors, but he first
defines general ordinal numbers in the usual way (in terms of transitivity and well-ordering), and he states
the infinity axiom in terms of successors. (See E. Mendelson [320], page 169.)
7.9.4 Remark: Historical note on the adoption of the predecessor-style infinity axiom in this book.
Since the author cannot find the version of the infinity axiom which is used in Definition 7.2.4 (8) in the
literature, there is some mystery as to where it could have originated from. After developing all of the
essential theory for this variant definition in Section 12.1, it was perplexing that no reference could be found
for it. Since the variant axiom yielded all of the expected results, although with unexpectedly large effort,
it was surprising that it was even valid.
The infinity axiom in terms of predecessors in Remark 7.9.3, line (7.9.6), which is adopted here as one of the
ZF axioms (Definition 7.2.4 (8)), was added to this book on 2009-1-26, replacing an earlier successor-style
infinity axiom. Infinity axioms are expressed in most textbooks in the successor style as in Remark 7.9.2.
The 2009-1-26 predecessor-style infinity axiom was apparently based on a finite ordinal numbers definition
which was added to this book on 2005-3-14. That definition, in turn, seems very likely to have been inspired
by pure intuitive conjecture (i.e. guesswork) by inspecting earlier place-holder text, which was as follows.
The finite ordinal numbers are 0 = ∅, 1 = {0}, 2 = {0, 1}, n+ = n ∪ {n}, etc.
The replacement text for this on 2005-3-14 was as follows.
The set ω of finite ordinal numbers is the set defined by
The question marks indicated that the author was unsure of the new place-holder definition, suggesting that
it was a hastily concocted conjecture to be either verified, corrected or deleted later.
The change from successor axiom to predecessor axiom could plausibly have been influenced by reading
E. Mendelson [320], page 175, where finite ordinal numbers are defined in a similar way, but the similarity is
not instantly obvious. Also, Mendelson requires the elements of ω to be a-priori ordinal numbers, whereas
here there is no such a-priori assumption. In other words, his definition only says which ordinals are in ω,
not how the ordinals in ω can be defined directly in terms of ZF without first defining general ordinals. So
it seems unlikely that this was the source. It seems most likely that the form of infinity axiom chosen for
this book was a lucky guess, based on a combination of intuition, optimism and naivety!
One might reasonably ask why the very general concept of infinite sets is described in ZF set theory in terms
of the von Neumann ordinal number construction, where each element in the set is constructed as x ∪ {x}
from each preceding set x.
The construction of an infinite set X requires the specification of a new set which is different to all preceding
sets, for each given subset of X. To be able to state that a set X is infinite, we need to be able to assert
that no matter how large any subset Y ⊆ X is, we can always generate a set S(Y) which is different to all
elements of Y. So the general requirement is to find a general construction rule which yields a set S(Y) for
any given set Y, such that S(Y) ∉ Y for any set Y.
The construction Y ↦ Y ∪ {Y} happens to satisfy the requirement. To see this, let S(Y) = Y ∪ {Y} and
note that Y ∈ S(Y), but Y ∉ Y because this is forbidden by the regularity axiom. Therefore S(Y) ≠ Y, by
the substitution of equality axiom of a first-order language with equality. So each generated set is different
to the previous set.
We also need to prove that S(Y) is different to all preceding elements of the sequence of sets. To show
this, note that Y ∈ S(Y). In fact, by mathematical induction, it is clear that all generated elements of the
sequence include all previous elements of the sequence. In fact, each element Y of the sequence equals the
union of all elements in the sequence up to and including Y. Therefore the proposition Y ∉ Y also implies
that S(Y) is different to all of the preceding elements of the sequence.
The construction S(Y) = {Y} looks simpler, and the inequality S(Y) ≠ Y is guaranteed by the regularity
axiom. (This construction was presented in a 1908 paper by Zermelo.) One may easily show that the sets
in the sequence ∅, {∅}, {{∅}}, etc., are different by applying the regularity axiom inductively, but such
induction would have to be naive induction since ZF induction is not available before the integers have been
defined. (The Zermelo style of number construction is also discussed in Remark 12.1.13.)
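The two successor constructions can be compared directly in the frozenset model. (A sketch; `iterate`, `von_neumann` and `zermelo` are names chosen here.) Both S(Y) = Y ∪ {Y} and S(Y) = {Y} generate pairwise distinct sets starting from ∅, but the von Neumann numeral n has n elements while every nonzero Zermelo numeral is a singleton.

```python
def iterate(succ, n):
    """Apply a successor operation n times, starting from the empty set."""
    x = frozenset()
    for _ in range(n):
        x = succ(x)
    return x

von_neumann = lambda y: y | frozenset({y})   # S(Y) = Y union {Y}
zermelo = lambda y: frozenset({y})           # S(Y) = {Y}

vn = [iterate(von_neumann, n) for n in range(5)]
zm = [iterate(zermelo, n) for n in range(5)]
assert len(set(vn)) == 5 and len(set(zm)) == 5   # pairwise distinct in both styles
assert [len(x) for x in vn] == [0, 1, 2, 3, 4]   # von Neumann: |n| = n
assert [len(x) for x in zm] == [0, 1, 1, 1, 1]   # Zermelo: singletons after 0
```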
Figure 7.10.1 The naming bottleneck and dark numbers
7.10.2 Remark: Socio-mathematical network virtual infinity interpretation of ZF set theory axioms.
The set theory axioms in general may be thought of as giving a social licence for the creation of sets by
mathematicians. If a mathematician agrees to only claim the creation of sets which are in conformance with
the axioms, other mathematicians will not make objections.
Not all sets which are permitted by ZF set theory actually exist. The ZF axioms are like legislation which
licenses anyone to construct sets within specified constraints without hindrance. But sets do not exist in any
real sense. They exist only in the minds and literature and discussions of communities of mathematicians.
Different communities have different laws about what is permitted. As in the case of legislation in other
areas of social behaviour, the laws of set theory are chosen so as to help resolve arguments about what
is right and wrong. The social harmony of the community is an important consideration. Clarity and
consistency of the laws tends to minimise social discord and vexatious litigation. Admittance to citizenship
in each mathematical community is subject to acceptance of their constitution and laws.
Infinite sets, such as the integers, grant a very broad licence to create and discuss numbers. It is not possible
for a finite number of finite minds to simultaneously contemplate an infinite set of integers, but the infinite
freedom to create integers ad-libitum is required. This has some similarities to the way computer virtual
memory works. A computer process may have the freedom to access a thousand times as much memory as
physically exists in the computer, but the memory is only allocated on demand. When the process wishes
to write to a block of memory, it is instantiated on demand. Thus any particular portion of the potentially
infinite virtual memory is accessible, but the whole virtual memory cannot be instantiated simultaneously.
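The on-demand behaviour described here can be mimicked with lazy evaluation in any modern programming language. The following Python fragment (purely illustrative, not from the book) holds a potentially infinite stream of integers in constant memory and instantiates elements only when they are demanded.

```python
from itertools import count, islice

# A lazy stream of "all" the integers: none exist until demanded,
# and only finitely many are ever instantiated at any one time.
naturals = count(0)                  # potentially infinite, O(1) memory
window = list(islice(naturals, 5))   # instantiate just the first five
print(window)  # [0, 1, 2, 3, 4]
```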
On-demand set creation is like walking on rolls of red carpet in a desert. You do not need to cover the whole
desert with a single red carpet. You only need a dozen rolls of carpet and a few helpers. Whenever you
want to walk in some direction, the helpers roll out a carpet in the direction you want to walk. When the
carpets are all used, the helpers roll up one of the used carpets and roll it out in front of you. In this way,
you can walk anywhere in an infinite desert with a small fixed finite number of carpets. (This is similar to
how the human mind's short-term consciousness works.) Thus one has the happy illusion that the whole
infinite desert is covered in red carpet.
The axioms of set theory are like helpers rolling out pieces of red carpet in the desert. They allow you to
walk in an infinite universe of sets, but you can only walk in a finite area at any one time. Sets do not
exist until someone thinks of them. Sets and numbers come into existence and persist as long as you want
them to. But they do not all exist permanently in some universe of sets. Sets and numbers exist only in the
human socio-mathematical network, and they exist only in the dynamical sense of on-demand set creation.
The red-carpet-in-the-desert metaphor is related to the way in which computer users may view an infinite
amount of information on a finite computer screen by scrolling, and one may achieve any level of resolution
by zooming in. In this way, one has the illusion that the objects being viewed have infinite resolution and
infinite extent, although the viewing window has a fixed finite number of pixels.
7.10.3 Remark: The infinity axiom is less metaphysical than the axiom of choice.
Whereas the axiom of choice (in Section 7.11) requires faith because it does not give you any set at all, the
axiom of infinity gives you as much of an infinite set as you wish, and you can employ any means, including
computers, to generate the elements of an infinite set. It does require some faith to believe that an infinite
set goes on forever, even though we can never add up all of the terms of a series such as is employed for a
number such as π. But since we can have as many digits as we wish, it seems to be somewhat metaphysical
to suppose that an infinite set or series stops at some point. No one has ever encountered an infinite series
which reaches its final term. Calculation of the decimal digits of π has never encountered a show-stopper
which cannot be removed by employing larger and faster computers.
In the case of the infinity axiom, it is very difficult indeed to imagine that there is a point at which the
integers suddenly stop existing. The induction principle has a very strong intuitive basis indeed. It would
require enormous faith to believe that there is a point at which the digits of π are exhausted, or to believe
that there is a largest integer beyond which there are no more integers. It is a common riddle for 5 year old
children to name the largest number. By the age of 10, very few children are in any doubt that numbers
may be arbitrarily large and that there are infinitely many numbers.
On the other hand, the axiom of choice delivers precisely nothing. In general, the axiom of choice delivers an
infinite set of sets if it delivers only one set. But it does not deliver the first set from which the others may
be constructed. This is because an infinite set of choices can generally be modified in an infinite number of
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
ways to generate more choice sets. The ZF axioms can generate infinitely many choice sets if we are delivered
only one. So it is very frustrating that the first one is never delivered.
It is tempting to think that it is a small hurdle to obtain the first choice set because it is only a single set.
If someone tells you that they will give you a billion dollars if they give you even one single dollar, it is very
tempting to feel cheerful about this. But if the first dollar is impossible to deliver, one must restrain one's
irrational exuberance. No technological development in the size and speed of computers will ever deliver the
first choice set. The axiom of infinity gives you as many dollars as you want. The axiom of choice also gives
you as many dollars as you want, but only if it gives you the first dollar, which it never does.
It may be concluded, then, that only the most extreme sceptic would reject the infinity axiom, whereas
enormous trust is required to accept the choice axiom. One could perhaps call the axiom of choice the
rabbit out of a hat axiom. But there is no rabbit! All you get is an IOU for a rabbit!
If one takes the view that the axiom of choice is actually false, then all of the theorems which make use of
sets whose existence is proved by invoking the axiom of choice are in reality only asserting properties of
the empty set. Thus they may be completely correct, but also totally useless.
an imaginary parachute, it's better to become aware of this before jumping from the plane, even though a
real parachute is more expensive.
7.11.4 Definition: A (set-collection) choice set for a set X is a set C such that ∀A ∈ X, ∃! x ∈ C, x ∈ A.
7.11.5 Definition: A choosable collection (of sets) is a pairwise disjoint collection of sets X which satisfies
∃C, ∀A ∈ X, ∃! x ∈ C, x ∈ A. In other words, it is a pairwise disjoint collection of sets for which a set-
collection choice set exists.
7.11.6 Theorem: ∅ ∉ X for any choosable collection of sets X.
Proof: Let X be a choosable collection of sets. Suppose that ∅ ∈ X. Then there is a set C which satisfies
∀A ∈ X, ∃! x ∈ C, x ∈ A. Let A = ∅. Then A ∈ X. So ∃! x ∈ C, x ∈ A. Therefore ∃x ∈ C, x ∈ A. In other
words, ∃x, (x ∈ C and x ∈ A). So ∃x, x ∈ A, which contradicts the emptiness of A. Hence ∅ ∉ X.
7.11.7 Remark: Using choice sets instead of choice functions for the axiom of choice.
There are two main (equivalent) approaches to formulating the axiom of choice, in terms of choice functions
or choice sets. It is more natural to think of choice axioms in terms of choice functions, but functions
are defined in Chapter 10 in terms of ordered pairs, which adds to the complexity. The option of using
set-theoretic functions as in Definition 7.7.6 is not feasible because this would require a different kind of
predicate calculus, where predicates can be subject to quantifiers. (That would be second-order logic,
which is a different kettle of fish.) A second and more important reason to not use set-theoretic functions is
that there are not enough of them. More precisely, they are capable of representing only finitely specifiable
rules, whereas sets-of-ordered-pairs functions are almost unlimited. (They may represent rules which require
an infinite, even uncountably infinite, amount of information to specify.)
The main advantage of formulating the axiom of choice in terms of choice sets is that it can be stated as
a low-level set-theoretic formula. The principal disadvantage is that the set-collection to choose from must
contain pairwise disjoint sets. This constraint is not necessary for choice functions. All in all, it seems best
to commence with choice sets, such as C in Definitions 7.11.5 and 7.11.10. However, the axiom is still quite
long when expressed as a pure set-theoretic formula.
The choice-function approach does not require a choosable set-collection to have the pairwise disjoint property
because the choice function associates each choice of element with the set it has been chosen from. In the
choice-set approach, pairwise disjointness is required because, for example, two of the sets A1 and A2 in the
set-collection could be subsets of a third set A3 . Then if A1 and A2 are disjoint, they would each contain a
single element of C, but then A3 would contain two elements of C.
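For finite collections, the choice-set condition can be tested by exhaustive search, and the search also exhibits the failure mode just described when pairwise disjointness is dropped. The following Python sketch is illustrative only; the function names are invented for this example.

```python
from itertools import combinations

def is_choice_set(C, X):
    """Test whether C contains exactly one element of each set A in X."""
    return all(len(A & C) == 1 for A in X)

def find_choice_set(X):
    """Brute-force search for a choice set within the union of X."""
    universe = set().union(*X)
    for r in range(len(universe) + 1):
        for subset in combinations(sorted(universe), r):
            if is_choice_set(set(subset), X):
                return set(subset)
    return None

# A pairwise disjoint collection: a choice set exists.
print(find_choice_set([frozenset({1, 2}), frozenset({3, 4})]))  # e.g. {1, 3}

# A1 and A2 are disjoint subsets of A3: any C meeting A1 and A2
# exactly once must meet A3 twice, so no choice set exists.
A1, A2, A3 = frozenset({1}), frozenset({2}), frozenset({1, 2})
print(find_choice_set([A1, A2, A3]))  # None
```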
and the decidability of the existence of the von Neumann stage Vω+ω requires both the axioms of infinity
and replacement, and the decidability of the existence or non-existence of mediate cardinals requires some
form of the continuum hypothesis. So undecidability is just a fact of life.
Definition 7.11.5 is expressed in terms of a proposition of the form ∀X, ∃C, P(X, C) for some set-theoretic
predicate P . The validity of this assertion can be made more likely by either constraining the class of sets
X for which it is asserted, or else by expanding the universe of sets C with which the proposition P (X, C)
can be satisfied. For example, if X is constrained to be finite, then the induction principle makes the
proposition true. But if X is an unrestricted collection of non-empty sets, the universe of sets C must be
expanded to improve the chances of the proposition being true. Since different models of ZF set theory have
different universes of sets, it is not surprising that a given class of eligible set-collections may be guaranteed
to be always choosable in one model, but guaranteed to be not always choosable in another model, and the
choosability may be unknown or unknowable in yet other models. (See Howard/Rubin [312], pages 91–103,
145–221, for comprehensive details. Their list includes 95 forms of choice axioms which have not yet been
shown to be equivalent in ZF set theory, and most of these axioms are known to be independent of most of
the others. A sample of this complexity is shown in Figure 13.6.1 in Remark 13.6.7.)
The plausibility of axioms of choice no doubt derives from the fact that in practical mathematics, choosing
elements of sets according to a finitely expressible rule is always feasible because the sets follow an explicit
pattern which facilitates the choice of elements according to a rule.
7.11.9 Remark: Adding AC to the ZF axioms.
Definition 7.11.10 adds the axiom of choice to the ZF set theory in Definition 7.2.4. This can be useful when
courage and moral fortitude fail. (If it's too difficult to prove, make it an axiom. Because axioms don't
need to be proved!) The set of axioms ZF plus AC is typically abbreviated as ZFC. (Definition 7.11.10 (9)
is illustrated in Figure 7.11.2.)
Figure 7.11.2 Choice of one element zi from each set Ai of a disjoint collection X
7.11.10 Definition [mm]: Zermelo-Fraenkel set theory with the axiom of choice is Zermelo-Fraenkel set
theory as in Definition 7.2.4, together with the additional Axiom (9).
(9) The choice axiom: For every set X,
((∀A, (A ∈ X ⇒ ∃x, x ∈ A)) ∧ ∀A, ∀B, ((A ∈ X ∧ B ∈ X) ⇒ (A = B ∨ ¬∃x, (x ∈ A ∧ x ∈ B))))
⇒ ∃C, ∀A, (A ∈ X ⇒ ∃z, (z ∈ A ∧ z ∈ C ∧ ∀y, ((y ∈ A ∧ y ∈ C) ⇒ y = z))).
In other words,
∀X, ((∅ ∉ X ∧ ∀A, B ∈ X, (A ≠ B ⇒ A ∩ B = ∅)) ⇒ (∃C, ∀A ∈ X, ∃! z ∈ C, z ∈ A)). (7.11.1)
In other words, every pairwise disjoint collection of non-empty sets is choosable.
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
to provide a means of proving the well-ordering theorem. (See Moore [321], page 330.)
(3) Trichotomy. For any sets X and Y , either X is equinumerous to a subset of Y , or Y is equinumerous
to a subset of X. In other words, either there exists a bijection from X to a subset of Y , or vice versa.
In other words, cardinal numbers are totally ordered. (See E. Mendelson [320], page 198; Moore [321],
page 330.)
(4) Every set which spans a linear space includes a basis for the space. (See Moore [321], page 332.)
(5) Tychonoff's compactness theorem. The product of any family of compact topological spaces is compact.
(Moore [321], page 333.)
tempting because it asserts the existence of sets, but no method of determining the contents. (See also
Rudin [126], page 391, for the Tychonoff theorem.)
(5) Hausdorff's maximality theorem. Every non-empty partially ordered set contains a totally ordered subset
which is maximal with respect to the property of being totally ordered. (See Rudin [126], page 391.)
(6) Every proper ideal in a commutative unitary ring lies in a maximal ideal. (See Rudin [126], page 391.)
(7) The general Hahn-Banach theorem. (See Rudin [126], page 391.)
(8) The Krein-Milman theorem. (See Rudin [126], page 391.)
(9) Urysohn's lemma. (See Simmons [133], pages 135–137.)
(10) Tietze extension theorem. (See Simmons [133], pages 135–137.)
(11) Generalised Stone-Weierstraß theorem. (See Rudin [126], page 121.)
(12) Stone-Čech compactification. (See Simmons [133], pages 141, 331–332; Rudin [126], page 411.)
The following assertions are CC consequences which are relevant to this book.
(13) The union of a countable number of countable sets is countable. (See Theorem 13.7.9 and Jech [314],
page 142.)
(14) R is not equal to a countable union of countable sets. (Howard/Rubin [312], pages 30, 132.)
(15) Any Dedekind-finite set (Definition 13.8.3) is a finite set as defined by equinumerosity to finite ordinal
numbers. (See Theorem 13.8.11 (iii). It is shown by Jech [314], pages 119–131, that the axiom of countable
choice implies the equivalence of finite and Dedekind-finite sets, but that this finiteness equivalence
does not imply the CC axiom.)
(16) Eight independent definitions of finiteness in ZF are equivalent in ZF+CC. (See Table 13.9.1 in Re-
mark 13.9.1. Two of these definitions are ordinary finiteness and Dedekind-finiteness.)
(17) Every infinite set has a subset which is equinumerous to the set ω of finite ordinal numbers. That is,
if X is an infinite set, then ∃f, ((∀i ∈ ω, f(i) ∈ X) ∧ (∀i, j ∈ ω, (i ≠ j ⇒ f(i) ≠ f(j)))). (See
Theorem 13.8.10 and Jech [314], page 141.)
(18) Every sequentially compact subset of R is closed. (See Remark 35.6.11 and Jech [314], page 142.)
(19) Every sequentially compact subset of R is bounded. (See Remark 35.6.11 and Jech [314], page 142.)
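Item (13) is worth a comment: when each countable set comes with an explicitly given enumeration, no choice axiom is needed, because a pairing bijection from ω × ω to ω assembles the enumerations uniformly. A small illustrative Python sketch (the function name is invented here) using Cantor's pairing function:

```python
def cantor_pair(i, j):
    """Cantor's pairing bijection from pairs of non-negative
    integers to non-negative integers."""
    return (i + j) * (i + j + 1) // 2 + j

# Enumerate the union of countably many explicitly indexed countable
# sets A_i = {(i, j) : j in omega} with no appeal to choice: the
# enumeration rule is given uniformly by the pairing function.
seen = {cantor_pair(i, j) for i in range(10) for j in range(10)}
print(len(seen))  # 100, since the pairing function is injective
```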
These benefits of AC and CC must surely be irresistible to any mathematician. Why would anyone give up
such huge benefits if they cost nothing? (One possible reason to resist temptation could be that these axioms
are in fact false.) A comprehensive list of several hundred consequences of AC, CC, DC and many other AC
variants, is given by Howard/Rubin [312], pages 9–136. A more approachable list of (almost) irresistible AC
consequences (and the bad things which can happen without AC) is given by Jech [314], pages 141–166.
7.11.15 Remark: The difference between Zermelo and Zermelo-Fraenkel set theory.
The set theory published by Zermelo in 1908 had an axiom of specification instead of an axiom of replacement.
(See Zermelo [390].) It was eventually noticed that the existence of some ordinal numbers was not guaranteed
without a stronger axiom. It was Fraenkel who much later advocated the replacement axiom to strengthen
the specification axiom in order to guarantee this existence. (See the 1922 paper by Fraenkel [360].) In most
of differential geometry, there is little need of this stronger axiom. The von Neumann universe Vω+ω is more
than adequate for applicable mathematics, and this set-universe is a model for Zermelo set theory.
7.12.2 Remark: Bertrand Russell's boots-and-socks metaphor for the axiom of choice.
In 1919, Russell [339], pages 125–127, explained the axiom of choice in terms of a millionaire who was so rich
that he had a countable infinity of pairs of boots and a countable infinity of pairs of socks.
Among boots we can distinguish right and left, and therefore we can make a selection of one out
of each pair, namely, we can choose all the right boots or all the left boots; but with socks no such
principle of selection suggests itself, and we cannot be sure, unless we assume the multiplicative
axiom, that there is any class consisting of one sock out of each pair.
Russell used the term multiplicative axiom for the axiom of choice. Referring to the ordering of a countably
infinite set of pairs of objects, he wrote:
There is no difficulty in doing this with the boots. The pairs are given as forming an ℵ0, and
therefore as the field of a progression. Within each pair, take the left boot first and the right
second, keeping the order of the pair unchanged; in this way we obtain a progression of all the
boots. But with the socks we shall have to choose arbitrarily, with each pair, which to put first;
and an infinite number of arbitrary choices is an impossibility. Unless we can find a rule for selecting,
i.e. a relation which is a selector, we do not know that a selection is even theoretically possible.
Russell then went on to suggest that the difficulty could be resolved by using the location of the centre
of mass of each sock as a selector. The kind of axiom of choice described in the socks metaphor is for a
countable sequence of pairs, which is probably the weakest of all choice axioms.
No set-theoretically definable well-ordering of the continuum can be proved to exist from the
Zermelo-Fraenkel axioms together with the axiom of choice and the generalized continuum
hypothesis.
If the axiom of choice were valid, every set would have a well-ordering. But no set-theoretically definable
well-ordering is possible. In other words, with finite bandwidth, it will never be possible to write down
such a thing nor even contemplate it. An axiom which asserts a proposition which is known to be factually
false seems to be a somewhat unwise axiom. One could argue that the well-orderability of the real numbers
is valid metamathematically in a larger system within which ZF is modelled, but that well-ordering is not
accessible within the language.
If one adopts some constructible universe model for Zermelo-Fraenkel set theory so that the axiom of choice
is true for this model, the universe of all sets becomes countable metamathematically. If it were true that
the set of all real numbers was countable within the theory, this would contradict the Cantor diagonalisation
procedure which shows that the set of real numbers cannot be countable. Since the well-orderability of the
real numbers in the constructible universe follows from the countability of the set of real numbers in the
model, it is not possible to say within the ZF theory that the real numbers are well-orderable, nor that they
are countable, unless one states this simply as an axiom.
Another way to put this is that if a well-ordering of the real numbers existed, one would not need any
axiom of choice to assert its existence. For example, it is easily shown that the non-negative integers are
well ordered. There is no need of an axiom of choice in this case. If a well-ordering of the real numbers
really existed, why should an axiom be invented to support the assertion? This would be relying on ad-hoc
hypotheses when no genuine proof is possible. As Newton [260], page 530, once wrote: "hypotheses non
fingo". In other words, one should not make up metaphysical stories which are not justified by the evidence.
Rationem vero harum gravitatis proprietatum ex phænomenis nondum potui deducere, & hypotheses
non fingo. Quicquid enim ex phænomenis non deducitur, hypothesis vocanda est; & hypotheses seu
metaphysicæ, seu physicæ, seu qualitatum occultarum, seu mechanicæ, in philosophia experimentali
locum non habent. In hac philosophia propositiones deducuntur ex phænomenis, & redduntur
generales per inductionem.
This may be translated as follows.
I have not yet been able to deduce the true reason of these properties of gravity from phenomena, and
I do not contrive hypotheses. Indeed whatever is not deduced from phenomena should be called a
hypothesis; and hypotheses, be they metaphysical or physical, of occult qualities, or mechanical, have
no place in experimental philosophy. In this philosophy, propositions are deduced from phenomena,
and are made general by induction.
This kind of thinking may equally be applied to mathematics. The rules of mathematics should be inferred
from everyday mathematical practice, not drawn from the imagination. If no well-orderings for the real
numbers are observed in real-world mathematics, it seems somewhat arbitrary to suppose that they exist
in some metaphysical universe to which there is no possible access.
7.12.5 Remark: Discomfort with the axiom of choice. It's not a bug, it's a feature.
Choice functions whose existence is asserted by a choice axiom are like pixies at the bottom of the garden.
You know they're there, but you never see them! A set which exists only by virtue of an axiom of
choice has unknown contents. (A good example of this is a Lebesgue non-measurable function. The author
discovered this issue as an undergraduate when he tried to write a computer program to print a graph of
such a function to see what it looked like.) Some mathematicians feel quite comfortable with this while
others find the discomfort intolerable. E. Mendelson [320], page 197, said:
The Axiom of Choice is one of the most celebrated and contested statements in the theory of sets.
Struik [228], page 201, said the following about the 1904 proof of the well-ordering theorem by Zermelo [389].
Since Zermelo based his proof on the Auswahlaxiom, which states that from each subset of a given
set one of its elements can be singled out, mathematicians differed on the acceptability of a proof
where no constructive procedure could be given for the finding of such an element. Hilbert and
Hadamard were willing to accept it; Poincaré and Borel were not.
Most modern mathematicians have lost interest in the question because it has been known since 1938 that
AC is relatively consistent with the ZF axioms. AC also makes very numerous questions very much simpler.
AC is convenient, and is apparently not logically inconsistent with ZF. So most mathematicians acquiesce,
accepting an apparently vacuous axiom because it makes some theorems more elegant and easier to prove.
The axiom of choice is untestable because it gives you sets which you can never see. Just as Ockham's razor
is invoked in science to eliminate arbitrary and untestable theorising, so also should the axiom of choice
be rejected in mathematics. Real mathematicians solve problems by constructing solutions, not by pulling
them out of a hat.
When some anti-AC mathematicians discovered that they had implicitly relied upon choice axioms in their
own publications for the construction of choice functions, this was understandably embarrassing. The way
in which the acceptance of AC developed in the early 20th century is reminiscent of the attitude which is
sometimes adopted in the computer software world, where a design blunder or implementation error which
is difficult to fix is marketed as a feature. Since it was too onerous, and embarrassing, to remove or disown
all of the published results which tacitly invoked AC, the axiom transmuted from a bug into a feature. As
mentioned in Remark 7.11.9, if it can't be proved, make it an axiom. The progress of the controversy
is well documented by Moore [321], including this comment on page 64.
It is a historic irony that many of the mathematicians who later opposed the Axiom of Choice
had used it implicitly in their own researches. This occurred chiefly to those who were pursuing
Cantorian set theory or applying it in real analysis. At the turn of the century such analysts included
René Baire, Émile Borel, and Henri Lebesgue in France, as well as W. H. Young in Germany. [. . . . ]
At times these various mathematicians used the Assumption in the guise of infinitely many arbitrary
choices, and at times it appeared indirectly via the Countable Union Theorem and similar results.
Certainly the indirect uses did not indicate any conscious adoption of non-constructive premises.
The 1938 relative consistency proof by Gödel [367] showed that AC was in some sense harmless. But the
real harm lies in the diversion of much human creativity into a fruitless metaphysical domain. If the bug
had been taken more seriously ten or twenty years earlier, it could have been eradicated. Now it is too late.
Abstention from choice axioms is now almost as difficult and rare as vegetarianism amongst the Eskimo.
Chapter 8
Set operations and constructions
operator. There is a one-to-one correspondence between propositional calculus tautologies and identities for
simple set-operation expressions. A much easier option is to use boolean arithmetic, representing each set
by a single-bit boolean variable. The proofs of set-expression identities in Chapter 8 are informal. There
is no attempt to apply formal methods here because the theorems are all of a very elementary character.
Most of the theorems in Sections 8.1, 8.2 and 8.3 are limited to three sets, and yet the complexity of the
statements and proofs of the theorems is somewhat burdensome. The reader should try to imagine the
complexity which would arise for four or five sets or more. The reason for the complexity becomes clearer
when one considers that there are 2^(2^n − 1) possible sets which can be constructed from n sets for each
non-negative integer n, which generates the sequence 1, 2, 8, 128, 32768, 2147483648, 9223372036854775808
(for 6 sets), and so forth.
Each of these sets can be represented by an infinite number of set-operation expressions. So the possibilities
for discovering set-expression identities are endless. Therefore this chapter presents only a selection of useful
identities. The infinite sets of sets which are permitted by the general unions and intersections in Section
8.4 open up an enormous new can of worms. It becomes rapidly clear that elementary set algebra is not a
walk in the park.
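The count 2^(2^n − 1) can be confirmed by brute force for small n: represent each non-empty region of the Venn diagram for n generic sets as an atom, and close the family of the n sets under union, intersection and difference. The following Python sketch is illustrative only (the function name is invented here); it confirms the counts 2, 8 and 128 for n = 1, 2, 3.

```python
def closure_count(n):
    """Count the sets constructible from n 'generic' sets by union,
    intersection and set difference. Each set is represented as a
    frozenset of Venn-diagram regions, where a region is a bitmask of
    membership (excluding the region outside all n sets)."""
    regions = range(1, 2 ** n)
    start = [frozenset(r for r in regions if r & (1 << i)) for i in range(n)]
    sets = set(start)
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            for b in current:
                for c in (a | b, a & b, a - b):
                    if c not in sets:
                        sets.add(c)
                        changed = True
    return len(sets)

for n in range(1, 4):
    print(n, closure_count(n), 2 ** (2 ** n - 1))
# prints:
# 1 2 2
# 2 8 8
# 3 128 128
```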
8.1.2 Definition:
The union of sets A and B is the set {x; x ∈ A ∨ x ∈ B}.
The intersection of two sets A and B is the set {x; x ∈ A ∧ x ∈ B}.
8.1.3 Notation:
A ∪ B for sets A and B denotes the union of A and B.
A ∩ B for sets A and B denotes the intersection of A and B.
8.1.4 Theorem:
(i) A ∪ ∅ = A for any set A.
(ii) A ∩ ∅ = ∅ for any set A.
(iii) A ⊆ B ⇔ (∀z, (z ∈ A ⇔ (z ∈ A ∧ z ∈ B))) for any sets A and B.
(iv) A ⊆ ∅ ⇒ A = ∅ for any set A.
Proof: To prove part (i), note that
x ∈ A ∪ ∅ ⇔ (x ∈ A ∨ x ∈ ∅)
⇔ x ∈ A. (8.1.1)
Line (8.1.1) follows from Remark 4.7.8 and the fact that ∀x, x ∉ ∅ by the definition of an empty set.
To prove part (ii), note that
x ∈ A ∩ ∅ ⇔ (x ∈ A ∧ x ∈ ∅)
⇔ x ∈ ∅. (8.1.2)
Line (8.1.2) follows from Remark 4.7.8 and the fact that ∀x, x ∉ ∅ by the definition of an empty set.
To prove part (iii), note that
A ⊆ B ⇔ (∀z, z ∈ A ⇒ z ∈ B)
⇔ (∀z, (z ∈ A ⇒ (z ∈ A ∧ z ∈ B)))
⇔ (∀z, (z ∈ A ⇔ (z ∈ A ∧ z ∈ B))).
To prove part (iv), let A be a set which satisfies A ⊆ ∅. Suppose that A ≠ ∅. Then x ∈ A for some x by
Definition 7.6.3. So x ∈ ∅ by Notation 7.3.3 and Definition 7.3.2. But this contradicts Definition 7.6.3 for
the empty set. Hence A = ∅.
8.1.5 Remark: The logical equivalent of a set-operation theorem.
Theorem 8.1.4 (iii) is equivalent to the tautology (α ⇒ β) ⇔ (α ⇔ (α ∧ β)) with the meanings α = "z ∈ A"
and β = "z ∈ B".
8.1.6 Theorem: The following identities hold for all sets A, B and C.
(i) A ∪ B ⊇ A.
(ii) A ∪ B ⊇ B.
(iii) A ∩ B ⊆ A.
(iv) A ∩ B ⊆ B.
(v) A ∪ B = B ∪ A. [Commutativity of union]
(vi) A ∩ B = B ∩ A. [Commutativity of intersection]
(vii) A ∪ A = A.
(viii) A ∩ A = A.
(ix) A ∪ (B ∪ C) = (A ∪ B) ∪ C. [Associativity of union]
(x) A ∩ (B ∩ C) = (A ∩ B) ∩ C. [Associativity of intersection]
(xi) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). [Distributivity of union over intersection]
(xii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). [Distributivity of intersection over union]
(xiii) A ∪ (A ∩ B) = A. [Absorption of union over intersection]
(xiv) A ∩ (A ∪ B) = A. [Absorption of intersection over union]
Proof: These formulas follow from the corresponding formulas for the logical operators ∨ and ∧.
Part (i) follows from Theorem 4.7.6 (vi). Part (ii) follows from Theorem 4.7.6 (vii). Part (iii) follows from
Theorem 4.7.6 (iii). Part (iv) follows from Theorem 4.7.6 (iv). Part (v) follows from Theorem 4.7.11 (i).
Part (vi) follows from Theorem 4.7.11 (ii). Part (vii) follows from Theorem 4.7.11 (iii). Part (viii) follows from
Theorem 4.7.11 (iv). Part (ix) follows from Theorem 4.7.11 (v). Part (x) follows from Theorem 4.7.11 (vi).
Part (xi) follows from Theorem 4.7.11 (vii). Part (xii) follows from Theorem 4.7.11 (viii). Part (xiii) follows
from Theorem 4.7.11 (ix). Part (xiv) follows from Theorem 4.7.11 (x).
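The single-bit boolean method mentioned at the start of this chapter can be mechanised: an identity between set-operation expressions holds if and only if the corresponding boolean identity holds for all truth values of "is x a member?". A purely illustrative Python sketch (not the book's proof method), checking a sample of the identities in Theorem 8.1.6:

```python
from itertools import product

# Each set is represented at a given point x by one boolean value:
# membership of x. A set identity holds iff the corresponding boolean
# identity holds for every assignment of truth values.
identities = [
    lambda a, b, c: (a or (b and c)) == ((a or b) and (a or c)),   # (xi)
    lambda a, b, c: (a and (b or c)) == ((a and b) or (a and c)),  # (xii)
    lambda a, b, c: (a or (a and b)) == a,                         # (xiii)
    lambda a, b, c: (a and (a or b)) == a,                         # (xiv)
]
assert all(f(a, b, c)
           for f in identities
           for a, b, c in product([False, True], repeat=3))
print("all sampled identities verified")
```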
Proof: For part (i), suppose that A ⊆ B. Let x ∈ A ∪ C. Then x ∈ A or x ∈ C (or both). Suppose x ∈ A.
Then x ∈ B. So x ∈ B ∪ C. Suppose x ∈ C. Then x ∈ B ∪ C also. Therefore (A ∪ C) ⊆ (B ∪ C).
For part (ii), suppose that A ⊆ B. Let x ∈ A ∩ C. Then x ∈ A and x ∈ C. So x ∈ B. So x ∈ B ∩ C.
Therefore (A ∩ C) ⊆ (B ∩ C).
For part (iii), let A ⊆ B. Then A ∪ B ⊆ B ∪ B = B by part (i). But B ⊆ A ∪ B by Theorem 8.1.6 (ii).
So A ∪ B = B. To show the converse, suppose that A ∪ B = B. Then A ∪ B ⊆ B by Theorem 7.3.5 (iv).
Let x ∈ A. Then x ∈ A ∪ B. Therefore x ∈ B. Hence A ⊆ B.
For part (iv), let A ⊆ B. Then A ∩ B ⊇ A ∩ A = A by part (ii). But A ∩ B ⊆ A by Theorem 8.1.6 (iii).
So A ∩ B = A. To show the converse, suppose that A ∩ B = A. Then A ⊆ A ∩ B by Theorem 7.3.5 (iv).
Let x ∈ A. Then x ∈ A ∩ B. Therefore x ∈ B. Hence A ⊆ B.
Part (v) follows from Theorem 8.1.6 (iii) and Theorem 8.1.6 (i).
Part (vi) follows from parts (v) and (iii) and Theorem 8.1.6 (v).
For part (vii), suppose that A = B ∪ C. Then C ⊆ A by Theorem 8.1.6 (ii). So if A = B, then C ⊆ B.
But if C ⊆ B, then A = B ∪ C ⊆ B ∪ B by part (i), and B ∪ B = B by Theorem 8.1.6 (vii). So A ⊆ B.
But A = B ∪ C implies B ⊆ A by Theorem 8.1.6 (i). Therefore A = B by the ZF extension axiom. Hence
A = B ⇔ C ⊆ B.
For part (viii), assume that A ⊆ C and B ⊆ C. Let x ∈ A ∪ B. Then x ∈ A or x ∈ B. Suppose that x ∈ A.
Then x ∈ C because A ⊆ C. Similarly, if x ∈ B then x ∈ C. Therefore (A ⊆ C ∧ B ⊆ C) ⇒ (A ∪ B ⊆ C).
Now suppose that A ∪ B ⊆ C. Let x ∈ A. Then x ∈ A ∪ B, and so x ∈ C. Therefore A ⊆ C. Similarly, B ⊆ C.
Therefore A ⊆ C and B ⊆ C. In other words, A ⊆ C ∧ B ⊆ C. Therefore (A ∪ B ⊆ C) ⇒ (A ⊆ C ∧ B ⊆ C).
Hence (A ⊆ C ∧ B ⊆ C) ⇔ (A ∪ B ⊆ C).
For part (ix), the implication (A ∪ B ⊆ C) ⇒ (A ∩ B ⊆ C) follows from part (v) and Theorem 7.3.5 (vi).
Hence (A ⊆ C ∧ B ⊆ C) ⇒ (A ∩ B ⊆ C) by part (viii).
For part (x), assume that A ⊆ C or B ⊆ C. Suppose that A ⊆ C. Then A ∩ B ⊆ C because A ∩ B ⊆ A by
Theorem 8.1.6 (iii). Similarly, if B ⊆ C then A ∩ B ⊆ C because A ∩ B ⊆ B by Theorem 8.1.6 (iv). Hence
(A ⊆ C ∨ B ⊆ C) ⇒ (A ∩ B ⊆ C).
8.2.3 Theorem (de Morgan's law): The following identities hold for all sets A, B and X.
(i) X \ (A ∪ B) = (X \ A) ∩ (X \ B).
(ii) X \ (A ∩ B) = (X \ A) ∪ (X \ B).
Proof: For part (i), let x ∈ X \ (A ∪ B). Then x ∈ X and x ∉ A ∪ B. So ¬(x ∈ A ∨ x ∈ B). So
x ∉ A ∧ x ∉ B. So x ∈ X \ A and x ∈ X \ B. So x ∈ (X \ A) ∩ (X \ B). Now suppose that x ∈ (X \ A) ∩ (X \ B).
Then x ∈ X \ A and x ∈ X \ B. So x ∈ X and x ∉ A and x ∉ B. So ¬(x ∈ A ∨ x ∈ B). So x ∉ A ∪ B.
So x ∈ X \ (A ∪ B). Hence X \ (A ∪ B) = (X \ A) ∩ (X \ B).
As an alternative approach for part (i), note that x ∈ A \ (B ∪ C) ⇔ (x ∈ A) ∧ (x ∉ B ∧ x ∉ C)
⇔ (x ∈ A ∧ x ∉ B) ∧ (x ∈ A ∧ x ∉ C) ⇔ (x ∈ A \ B) ∧ (x ∈ A \ C) ⇔ x ∈ (A \ B) ∩ (A \ C).
For part (ii), let x ∈ X \ (A ∩ B). Then x ∈ X and x ∉ A ∩ B. So ¬(x ∈ A ∧ x ∈ B). So x ∉ A ∨ x ∉ B. So
x ∈ X \ A or x ∈ X \ B. So x ∈ (X \ A) ∪ (X \ B). Now suppose that x ∈ (X \ A) ∪ (X \ B). Then x ∈ X \ A
or x ∈ X \ B. So x ∈ X, and x ∉ A or x ∉ B. So ¬(x ∈ A ∧ x ∈ B). So x ∉ A ∩ B. So x ∈ X \ (A ∩ B).
Hence X \ (A ∩ B) = (X \ A) ∪ (X \ B).
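Since these identities involve only finitely many sets, they can also be checked mechanically. The following Python sketch (an informal illustration, not part of the book's deductive development) verifies both de Morgan identities over all subsets of a small universe, which amounts to a truth-table-style check of the underlying propositional forms.

```python
from itertools import chain, combinations

def subsets(u):
    """Return all subsets of u as frozensets."""
    s = list(u)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

X = frozenset(range(3))
for A in subsets(range(3)):
    for B in subsets(range(3)):
        # (i) X \ (A ∪ B) = (X \ A) ∩ (X \ B)
        assert X - (A | B) == (X - A) & (X - B)
        # (ii) X \ (A ∩ B) = (X \ A) ∪ (X \ B)
        assert X - (A & B) == (X - A) | (X - B)
```

Checking all subsets of a 3-element universe exercises every membership pattern of x relative to A, B and X, so this exhaustive check covers the same cases as a truth table.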
8.2.4 Theorem: Let A and B be sets.
(i) A ∩ B = (A ∪ B) \ ((A \ B) ∪ (B \ A)).
Proof: For part (i), let x ∈ A ∩ B. Then x ∈ A and x ∈ B. So x ∉ A \ B and x ∉ B \ A. So
x ∉ (A \ B) ∪ (B \ A). But x ∈ A ∪ B. Therefore x ∈ (A ∪ B) \ ((A \ B) ∪ (B \ A)). To show the
reverse inclusion, suppose that x ∈ (A ∪ B) \ ((A \ B) ∪ (B \ A)). Then x ∈ A ∪ B, and so x ∈ A
or x ∈ B. Suppose that x ∈ A, and assume that x ∉ B. Then x ∈ A \ B. So x ∈ (A \ B) ∪ (B \ A). So
x ∉ (A ∪ B) \ ((A \ B) ∪ (B \ A)), which is a contradiction. Therefore x ∈ B. Similarly, the assumption
x ∈ B implies x ∈ A. Therefore x ∈ A ∩ B in both cases. Hence A ∩ B = (A ∪ B) \ ((A \ B) ∪ (B \ A)).
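As a quick sanity check, this identity can be tested exhaustively on small sets with Python's built-in set operations (an informal supplement; the deductive proof above is what carries the weight).

```python
from itertools import chain, combinations

def subsets(u):
    """All subsets of u, as frozensets."""
    s = list(u)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

# Theorem 8.2.4 (i): A ∩ B = (A ∪ B) \ ((A \ B) ∪ (B \ A)).
for A in subsets(range(3)):
    for B in subsets(range(3)):
        assert A & B == (A | B) - ((A - B) | (B - A))
```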
For part (iv), note that X \ A = ∅ if and only if ∀x, ¬(x ∈ X \ A), if and only if ∀x, ¬(x ∈ X ∧ x ∉ A)
by Definition 8.2.1, if and only if ∀x, (x ∉ X ∨ x ∈ A), if and only if ∀x, (x ∈ X ⇒ x ∈ A), if and only
if X ⊆ A.
For part (v), let x ∈ (X \ A) ∩ A. Then x ∈ X \ A. So x ∉ A by Definition 8.2.1. So x ∉ A and x ∈ A,
which is a contradiction. Hence (X \ A) ∩ A = ∅.
Part (vi) follows from part (v) and Theorem 8.1.7 (ii), since A \ X ⊆ A by part (iii).
For part (vii), note that X \ A ⊆ X. So (X \ A) ∪ A ⊆ X ∪ A by Theorem 8.1.7 (i). Now suppose that x ∈ X ∪ A.
Then x ∈ X or x ∈ A (or both). If x ∈ A, then x ∈ (X \ A) ∪ A by the definition of set union. If x ∉ A,
then x ∈ X \ A by Definition 8.2.1. So x ∈ (X \ A) ∪ A by the definition of set union.
For part (viii), suppose that A ⊆ X. Then (X \ A) ∪ A = X ∪ A by part (vii). But X ∪ A = X by
Theorem 8.1.7 (iii). So (X \ A) ∪ A = X.
For part (ix), x ∈ ((X \ A) ∪ X) ⇔ ((x ∈ X ∧ x ∉ A) ∨ x ∈ X) ⇔ x ∈ X by Theorem 4.7.11 (ix).
For part (x), let x ∈ X. Then x ∈ A or x ∉ A. If x ∈ A, then x ∈ X ∩ A. If x ∉ A, then x ∈ X \ A. So
x ∈ (X ∩ A) ∪ (X \ A). Conversely, suppose that x ∈ (X ∩ A) ∪ (X \ A). Then x ∈ X ∩ A or x ∈ X \ A.
Then x ∈ X in either case. So (X ∩ A) ∪ (X \ A) = X.
Part (xi) follows directly from part (x) and Theorem 8.1.4 (i).
Part (xii) follows directly from part (v).
For part (xiii), note that (X \ A) \ A ⊆ X \ A by Definition 8.2.1. Suppose x ∈ X \ A. Then x ∉ A by
Definition 8.2.1. So x ∈ (X \ A) \ A by Definition 8.2.1. Hence X \ A ⊆ (X \ A) \ A.
For part (xiv), let x ∈ X \ (X \ A). Then x ∈ X and x ∉ X \ A. From x ∉ X \ A, it follows that x ∉ X
or x ∈ A. But x ∉ X contradicts x ∈ X. So x ∈ A. Thus x ∈ X ∩ A. Hence X \ (X \ A) ⊆ X ∩ A. Now
suppose that x ∈ X ∩ A. Then x ∈ X and x ∈ A. So x ∉ X \ A, by Definition 8.2.1. So x ∈ X \ (X \ A), by
Definition 8.2.1. Hence X ∩ A ⊆ X \ (X \ A).
(xiv) (A ∪ C) \ (B \ C) = (A \ B) ∪ C.
(xv) If A ⊆ B, then A \ C ⊆ B \ C.
(xvi) If A ⊆ B, then C \ B ⊆ C \ A.
(xvii) C \ A ⊇ C \ B ⇔ C ∩ A ⊆ C ∩ B.
(xviii) (A \ C) ∩ (B \ C) = ∅ ⇔ A ∩ B ⊆ C.
(xix) (A ∩ C ⊆ B ∩ C ∧ A \ C ⊆ B \ C) ⇔ A ⊆ B.
(xx) If A ⊆ X, then (X \ B) ∩ A = A \ B.
(xxi) If A, B ⊆ X, then (C ∩ A ⊆ C ∩ B ∧ (X \ C) ∩ A ⊆ (X \ C) ∩ B) ⇔ A ⊆ B.
Proof: Part (i) follows from x ∈ (A ∩ (B \ C)) ⇔ x ∈ A ∧ (x ∈ B ∧ x ∉ C) ⇔ (x ∈ A ∧ x ∈
B) ∧ x ∉ C ⇔ x ∈ ((A ∩ B) \ C).
Part (ii) follows from a double application of part (i). Suppose A ∩ C = ∅. Then A ∩ (B \ C) = (A ∩ B) \ C =
(B ∩ A) \ C = B ∩ (A \ C) = B ∩ A by Theorem 8.2.5 (xi).
For part (iii), let A, B and C be sets. Then x ∈ (A ∪ B) \ C if and only if x ∈ A ∪ B and x ∉ C, if and
only if (x ∈ A or x ∈ B) and x ∉ C, if and only if (x ∈ A and x ∉ C) or (x ∈ B and x ∉ C), if and only if
x ∈ A \ C or x ∈ B \ C, if and only if x ∈ (A \ C) ∪ (B \ C).
For part (iv), let B ⊆ C. Then B \ C = ∅ by Theorem 8.2.5 (iv). Therefore (A ∪ B) \ C = (A \ C) ∪ (B \ C) =
A \ C by part (iii) and Theorem 8.1.4 (i).
For part (v), let A \ B ⊆ C. Then by Theorem 8.2.5 (x), A = (A ∩ B) ∪ (A \ B) ⊆ (A ∩ B) ∪ C ⊆ B ∪ C.
Conversely, suppose that A ⊆ B ∪ C. Then by Theorem 8.2.5 (xviii), A \ B ⊆ (B ∪ C) \ B = C \ B ⊆ C.
For the first equality in part (vi), let A, B and C be sets. Then x ∈ (A ∩ B) \ C if and only if x ∈ A ∩ B
and x ∉ C, if and only if x ∈ A and x ∈ B and x ∉ C, if and only if (x ∈ A and x ∉ C) and (x ∈ B and
x ∉ C), if and only if x ∈ A \ C and x ∈ B \ C, if and only if x ∈ (A \ C) ∩ (B \ C).
The second equality in part (vi) follows from the fact that (A \ C) ∩ C = ∅ by Theorem 8.2.5 (v), and so
(A \ C) ∩ (B \ C) = (A \ C) ∩ B by part (ii). Alternatively, the equality (A ∩ B) \ C = (A \ C) ∩ B follows
directly from part (i).
For the first equality in part (vii), let A, B and C be sets. Then x ∈ (A \ B) \ C if and only if x ∈ (A \ B)
and x ∉ C, if and only if (x ∈ A and x ∉ B) and x ∉ C, if and only if x ∈ A and x ∉ B ∪ C, if and only if
x ∈ A \ (B ∪ C).
For the second equality in part (vii), note that B ⊆ B ∪ C by Theorem 8.1.6 (ii). Hence (A ∪ B) \ (B ∪ C) =
A \ (B ∪ C) by part (iv).
For part (viii), (A \ B) \ B = A \ (B ∪ B) by part (vii). But B ∪ B = B by Theorem 8.1.6 (vii). Hence
(A \ B) \ B = A \ B.
For part (ix), note that (A \ B) \ C = A \ (B ∪ C) by part (vii). Similarly, (A \ C) \ B = A \ (C ∪ B).
But B ∪ C = C ∪ B. Therefore (A \ B) \ C = (A \ C) \ B.
For part (x), x ∈ (A \ C) \ (B \ C) ⇔ (x ∈ A ∧ x ∉ C) ∧ ¬(x ∈ B ∧ x ∉ C) ⇔ (x ∈ A ∧ x ∉ C) ∧ (x ∉ B ∨ x ∈
C) ⇔ (x ∈ A ∧ x ∉ C ∧ x ∉ B) ∨ (x ∈ A ∧ x ∉ C ∧ x ∈ C) ⇔ (x ∈ A ∧ x ∉ C ∧ x ∉ B) ⇔ x ∈ A \ (B ∪ C).
Part (xi) follows from parts (vii) and (x).
For the first equality of part (xii), note that (A \ C) \ (A \ B) = (A \ (A \ B)) \ C by part (ix). But
A \ (A \ B) = A ∩ B by Theorem 8.2.5 (xiv). Therefore (A \ C) \ (A \ B) = (A ∩ B) \ C. The second equality
follows from the first by swapping A and B because A ∩ B = B ∩ A.
For part (xiii), let A, B and C be sets. Then x ∈ A \ (B \ C) if and only if x ∈ A and x ∉ B \ C, if and
only if x ∈ A and (x ∉ B or x ∈ C), if and only if (x ∈ A and x ∉ B) or (x ∈ A and x ∈ C), if and only if
x ∈ A \ B or x ∈ A ∩ C, if and only if x ∈ (A \ B) ∪ (A ∩ C).
For part (xiv), (A ∪ C) \ (B \ C) = ((A ∪ C) \ B) ∪ ((A ∪ C) ∩ C) by part (xiii). This equals ((A ∪ C) \ B) ∪ C
by Theorem 8.1.6 (xiv). Then this equals ((A \ B) ∪ (C \ B)) ∪ C = (A \ B) ∪ ((C \ B) ∪ C) by part (iii).
And this equals (A \ B) ∪ C by Theorem 8.2.5 (ix).
For part (xv), let A, B and C be sets such that A ⊆ B. Let x ∈ A \ C. Then x ∈ A and x ∉ C. So x ∈ B
and x ∉ C. So x ∈ B \ C. Hence A \ C ⊆ B \ C.
For part (xvi), let A, B and C be sets such that A ⊆ B. Let x ∈ C \ B. Then x ∈ C and x ∉ B.
Therefore x ∉ A. So x ∈ C \ A. Hence C \ B ⊆ C \ A.
For part (xvii), let A, B and C be sets such that C \ A ⊇ C \ B. Let x ∈ C ∩ A. Then x ∈ C and x ∈ A.
Then x ∉ C \ A. So x ∉ C \ B. So x ∈ B because x ∈ C. So x ∈ C ∩ B. Hence C ∩ A ⊆ C ∩ B. To show the
converse, suppose that C ∩ A ⊆ C ∩ B. Then C \ (C ∩ A) ⊇ C \ (C ∩ B) by part (xvi). Hence C \ A ⊇ C \ B
by Theorem 8.2.5 (xvi).
For part (xviii), note that (A \ C) ∩ (B \ C) = ∅ if and only if (A ∩ B) \ C = ∅ by part (vi), if and only if
A ∩ B ⊆ C by Theorem 8.2.5 (iv).
For part (xix), let A, B and C be sets such that A ∩ C ⊆ B ∩ C and A \ C ⊆ B \ C. Let
x ∈ A. Then x ∈ A ∩ C or x ∈ A \ C by Theorem 8.2.5 (x). If x ∈ A ∩ C then x ∈ B ∩ C. If x ∈ A \ C then
x ∈ B \ C. So x ∈ (B ∩ C) ∪ (B \ C). Therefore x ∈ B by Theorem 8.2.5 (x). Hence A ⊆ B. Conversely,
suppose that A ⊆ B. Let x ∈ A ∩ C. Then x ∈ A and x ∈ C. Therefore x ∈ B and x ∈ C. So x ∈ B ∩ C.
Hence A ∩ C ⊆ B ∩ C. Now let x ∈ A \ C. Then x ∈ A and x ∉ C. Therefore x ∈ B and x ∉ C. So x ∈ B \ C.
Hence A \ C ⊆ B \ C. Therefore (A ∩ C ⊆ B ∩ C ∧ A \ C ⊆ B \ C) ⇔ A ⊆ B.
For part (xx), let A, B and X be sets such that A ⊆ X. Let x ∈ (X \ B) ∩ A. Then x ∈ X \ B and x ∈ A.
So x ∉ B and x ∈ A. So x ∈ A \ B. Conversely, suppose that x ∈ A \ B. Then x ∈ A and x ∉ B. But x ∈ X
because A ⊆ X. So x ∈ X \ B and x ∈ A. So x ∈ (X \ B) ∩ A.
For part (xxi), let A, B, C and X be sets with A, B ⊆ X. Then (X \ C) ∩ A = A \ C and (X \ C) ∩ B = B \ C
by part (xx). So the proposition (X \ C) ∩ A ⊆ (X \ C) ∩ B is equivalent to A \ C ⊆ B \ C. Hence the
proposition (C ∩ A ⊆ C ∩ B ∧ (X \ C) ∩ A ⊆ (X \ C) ∩ B) ⇔ A ⊆ B follows from part (xix).
8.3.2 Definition: The (symmetric) set difference of two sets A and B is the set (A \ B) ∪ (B \ A).
8.3.3 Notation: A △ B, for sets A and B, denotes the symmetric set difference of A and B.
8.3.4 Theorem: The symmetric set difference operation has the following properties for sets A, B and C.
(i) A △ A = ∅.
(ii) A △ ∅ = A.
(iii) A △ B = B △ A.
(iv) A ⊆ B if and only if A △ B = B \ A.
(v) A △ B = (A ∪ B) \ (A ∩ B) = (A ∪ B) △ (A ∩ B).
(vi) If A ∩ B = ∅, then A △ B = A ∪ B.
(vii) (A △ B) ∩ (A ∩ B) = ∅.
(viii) (A △ B) \ (A ∩ B) = A △ B.
(ix) (A △ B) ∪ (A ∩ B) = A ∪ B.
(x) (A △ B) △ (A ∩ B) = A ∪ B.
(xi) (A △ B) △ C = (A \ (B ∪ C)) ∪ (B \ (A ∪ C)) ∪ (C \ (A ∪ B)) ∪ (A ∩ B ∩ C).
(xii) (A △ B) △ C = A △ (B △ C).
(xiii) (A △ B) △ A = B.
(xiv) (A △ B) ∩ C = (C \ A) △ (C \ B).
(xv) A \ (B △ C) = (A \ (B ∪ C)) ∪ (A ∩ B ∩ C).
Proof: For part (i), let A be a set. Let x ∈ A △ A. Then x ∈ (A \ A) ∪ (A \ A) by Definition 8.3.2. So
x ∈ A \ A. So x ∈ A and x ∉ A, by Definition 8.2.1. This is a contradiction. Hence A △ A = ∅.
For part (ii), let A be a set. Then x ∈ A △ ∅ if and only if x ∈ (A \ ∅) ∪ (∅ \ A) (by Definition 8.3.2), if and
only if x ∈ A \ ∅ or x ∈ ∅ \ A, if and only if x ∈ A \ ∅, if and only if x ∈ A and x ∉ ∅, if and only if x ∈ A.
So A △ ∅ = A.
Part (iii) follows from the symmetry of the union operation in Definition 8.3.2.
For part (iv), let A ⊆ B. Then A △ B = (A \ B) ∪ (B \ A) = ∅ ∪ (B \ A) = B \ A by Theorem 8.2.5 (iv) and
Theorem 8.1.4 (i).
For the converse of part (iv), suppose that A △ B = B \ A. Then (A \ B) ∪ (B \ A) = B \ A. Therefore
A \ B ⊆ B \ A by Theorem 8.1.7 (iii). So A \ B ⊆ B by Theorem 8.2.5 (iii). So (A \ B) \ B ⊆ B \ B by
Theorem 8.2.6 (xv). But B \ B = ∅ by Theorem 8.2.5 (i) and (A \ B) \ B = A \ B by Theorem 8.2.5 (xiii).
So A \ B ⊆ ∅. Therefore A \ B = ∅. Hence A ⊆ B by Theorem 8.2.5 (iv).
For the first equality in part (v), let A and B be sets. Then
x ∈ A △ B ⇔ x ∈ (A \ B) ∪ (B \ A)
⇔ (x ∈ A ∧ x ∉ B) ∨ (x ∈ B ∧ x ∉ A)
⇔ (x ∈ A ∨ x ∈ B) ∧ (x ∉ B ∨ x ∈ B) ∧ (x ∈ A ∨ x ∉ A) ∧ (x ∉ A ∨ x ∉ B)
⇔ (x ∈ A ∨ x ∈ B) ∧ (x ∉ A ∨ x ∉ B)
⇔ x ∈ A ∪ B ∧ x ∉ A ∩ B
⇔ x ∈ (A ∪ B) \ (A ∩ B).
The second equality in part (v) follows from (A ∩ B) ⊆ (A ∪ B) and part (iv).
Part (vi) follows directly from the first equality in part (v).
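Python's `^` operator implements the symmetric set difference directly, so several parts of Theorem 8.3.4 can be checked exhaustively over a small universe. This is an informal supplement to the proofs, not a substitute for them.

```python
from itertools import chain, combinations

def subsets(u):
    """All subsets of u, as frozensets."""
    s = list(u)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

E = frozenset()  # the empty set
U = subsets(range(3))
for A in U:
    for B in U:
        assert A ^ A == E                      # part (i)
        assert A ^ E == A                      # part (ii)
        assert A ^ B == B ^ A                  # part (iii)
        assert A ^ B == (A | B) - (A & B)      # part (v), first equality
        assert (A ^ B) ^ A == B                # part (xiii)
        for C in U:
            assert (A ^ B) ^ C == A ^ (B ^ C)  # part (xii), associativity
```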
8.3.5 Remark: The simplicity of the properties of the symmetric difference operation.
Properties (i), (ii), (iii) and (xii) in Theorem 8.3.4 for the symmetric difference operation are satisfyingly
simple and symmetric. This is because a symmetric set difference contains a point if and only if the number
of sets which contain the point is odd. This observation implies both the symmetry rule (iii) and the
associativity rule (xii). The set A △ A in part (i) contains no points because every point is in either zero or
two of the sets A and A.
Theorem 8.3.7, by contrast, contains few satisfyingly simple symmetric rules because the symmetric set
difference operator is arithmetic in nature, whereas the union and intersection operators are logical in nature.
(Admittedly, parts (i) and (ii) are quite satisfying.) The logical operators have simple relations amongst
themselves, and the arithmetic operator △ has simple relations with itself. But combinations of these two kinds of
operators yield very few simple formulas.
8.3.6 Remark: Truth tables give quick proofs of set-operation theorems, but deduction has advantages.
Theorems 8.3.4 and 8.3.7 can be more easily proved using truth tables or Venn diagrams, but the logical
deduction method can be more easily extended to assertions which involve infinite collections of sets.
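The truth-table method can itself be mechanised: membership of a point x in any combination of A, B and C depends only on the three propositions x ∈ A, x ∈ B, x ∈ C, so it suffices to check all eight boolean patterns. The following sketch (an illustration only, not from the text) does this for the associativity rule, using the fact that membership in a symmetric difference is the exclusive-or of the memberships.

```python
from itertools import product

# Truth-table check of Theorem 8.3.4 (xii): (A △ B) △ C = A △ (B △ C).
for a, b, c in product([False, True], repeat=3):
    assert ((a ^ b) ^ c) == (a ^ (b ^ c))
```

Since the check covers every membership pattern, it constitutes a complete (if informal) proof of the identity for arbitrary sets.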
8.3.7 Theorem: The symmetric set difference operation has the following properties for sets A, B and C.
(i) (A △ B) ∩ C = (A ∩ C) △ (B ∩ C).
(ii) (A △ B) \ C = (A ∪ C) △ (B ∪ C) = (A \ C) △ (B \ C).
(iii) (A △ B) ∪ C = (A ∪ B ∪ C) \ ((A ∩ B) \ C).
(iv) (A △ B) ∪ B = A ∪ B.
(v) A △ (A ∩ B) = A \ B.
(vi) (A △ B) ∩ A = A \ B.
(vii) (A △ B) \ A = B \ A.
(viii) (A ∩ B) \ (A △ C) = A ∩ B ∩ C.
(ix) (A ∪ B) \ (A △ B) = A ∩ B.
(x) (A △ B) ∪ (A △ C) = (A ∪ B ∪ C) \ (A ∩ B ∩ C).
For part (x), (A △ B) ∪ (A △ C) = ((A ∪ B) \ (A ∩ B)) ∪ (A △ C) by part (v) of Theorem 8.3.4, and this equals
((A ∪ B) ∪ (A △ C)) \ ((A ∩ B) \ (A △ C)) by Theorem 8.2.6 (xiv). But (A ∪ B) ∪ (A △ C) = A ∪ B ∪ C by
part (iv), and (A ∩ B) \ (A △ C) = A ∩ B ∩ C by part (viii). Hence (A △ B) ∪ (A △ C) = (A ∪ B ∪ C) \ (A ∩ B ∩ C).
For part (xi), (A △ B) ∩ (A △ C) = ((A \ B) ∪ (B \ A)) ∩ (A △ C) = ((A \ B) ∩ (A △ C)) ∪ ((B \ A) ∩ (A △ C)).
But (A \ B) ∩ (A △ C) = (A ∩ (A △ C)) \ B = (A \ C) \ B = A \ (B ∪ C) by part (vi) and Theorem 8.2.6 (vii).
Moreover, (B \ A) ∩ (A △ C) = ((B \ A) ∩ A) △ ((B \ A) ∩ C) by part (i), which equals ∅ △ ((B \ A) ∩ C) =
(B \ A) ∩ C = (B ∩ C) \ A by Theorem 8.2.6 (i). Hence (A △ B) ∩ (A △ C) = (A \ (B ∪ C)) ∪ ((B ∩ C) \ A).
For part (xii), note that (A △ B) ∪ (A △ C) = (A ∪ B ∪ C) \ (A ∩ B ∩ C) by part (x). So ((A △ B) ∪ (A △ C)) \
(A ∩ (B △ C)) = ((A ∪ B ∪ C) \ (A ∩ B ∩ C)) \ (A ∩ (B △ C)) = (A ∪ B ∪ C) \ ((A ∩ B ∩ C) ∪ (A ∩ (B △ C)))
by Theorem 8.2.6 (vii). But (A ∩ B ∩ C) ∪ (A ∩ (B △ C)) = A ∩ ((B ∩ C) ∪ (B △ C)) by Theorem 8.1.6 (xii),
which equals A ∩ (B ∪ C) by Theorem 8.3.4 (ix). Hence ((A △ B) ∪ (A △ C)) \ (A ∩ (B △ C)) = (A ∪ B ∪
C) \ (A ∩ (B ∪ C)) = A △ (B ∪ C) by Definition 8.3.2, as claimed.
For part (xiii), (A △ B) ∩ (A △ C) = (A \ (B ∪ C)) ∪ ((B ∩ C) \ A) by part (xi). So ((A △ B) ∩ (A △ C)) ∪ ((B △
C) \ A) = (A \ (B ∪ C)) ∪ ((B ∩ C) \ A) ∪ ((B △ C) \ A). This equals (A \ (B ∪ C)) ∪ (((B ∩ C) ∪ (B △ C)) \ A)
by Theorem 8.2.6 (iii), which equals (A \ (B ∪ C)) ∪ ((B ∪ C) \ A) by Theorem 8.3.4 (ix), and this equals
A △ (B ∪ C) by Definition 8.3.2.
8.4.2 Notation:
⋃S denotes the set {x; ∃A ∈ S, x ∈ A} for any set S.
⋂S denotes the set {x; ∀A ∈ S, x ∈ A} for any non-empty set S.
8.4.4 Theorem:
(i) x ∈ ⋃S ⇔ ∃A ∈ S, x ∈ A ⇔ ∃A, (x ∈ A ∧ A ∈ S) for any x and any set of sets S.
(ii) x ∈ ⋂S ⇔ ∀A ∈ S, x ∈ A ⇔ ∀A, (x ∈ A ∨ A ∉ S) for any x and any non-empty set of sets S.
Proof: For part (i), the first equivalence (x ∈ ⋃S) ⇔ (∃A ∈ S, x ∈ A) follows directly from Notation 8.4.2.
The second equivalence (∃A ∈ S, x ∈ A) ⇔ (∃A, (x ∈ A ∧ A ∈ S)) follows from Notation 7.2.7 (which defines
the meanings of the restricted quantifiers ∀x ∈ S and ∃x ∈ S).
For part (ii), the first equivalence (x ∈ ⋂S) ⇔ (∀A ∈ S, x ∈ A) follows directly from Notation 8.4.2. The
equivalence (∀A ∈ S, x ∈ A) ⇔ (∀A, (A ∈ S ⇒ x ∈ A)) follows from Notation 7.2.7, from which the second
equivalence (∀A ∈ S, x ∈ A) ⇔ (∀A, (x ∈ A ∨ A ∉ S)) follows by Theorem 6.6.18 (iii).
8.4.5 Remark: Checking some trivial properties of unions and intersections of set-collections.
Despite the triviality of Theorem 8.4.6, it is important to confirm the validity of trivial boundary cases
for definitions and theorems. Part (i) is proved in unnecessary semi-formality in order to demonstrate to
some extent how propositional calculus and predicate calculus may be applied in such simple cases. A fully
formal proof would, of course, be excessively tedious. Mathematicians give only sketches of proofs. Life is
too short to waste on completely formal proofs. (Computer programs can do that if required.) The details
of a step in a proof only need to be filled in if some scepticism is expressed as to its validity.
8.4.6 Theorem:
(i) ⋃∅ = ∅.
(ii) ⋃{A} = A for any set A.
(iii) ⋂{A} = A for any set A.
(iv) ⋃{A, B} = A ∪ B for any sets A and B.
(v) ⋂{A, B} = A ∩ B for any sets A and B.
Proof: For part (i), let x ∈ ⋃∅. Then ∃A, (x ∈ A ∧ A ∈ ∅) by Theorem 8.4.4 (i). But ∀A, A ∉ ∅ by
Definition 7.6.3. So ∀A, ¬(A ∈ ∅ ∧ x ∈ A) by Theorem 4.7.9 (lxii). Therefore ¬(∃A, (A ∈ ∅ ∧ x ∈ A)) by
Theorem 6.6.7 (v). This contradiction gives x ∉ ⋃∅ by Theorem 4.6.4 (v). Therefore ∀x, x ∉ ⋃∅ by the
universal introduction axiom, Definition 6.3.9 (UI). Hence ⋃∅ = ∅ by Definition 7.6.3.
For part (ii), let A be a set. Then y ∈ {A} if and only if y = A, for any given y, by Definition 7.5.6. So
x ∈ ⋃{A} if and only if ∃y ∈ {A}, x ∈ y, which holds if and only if x ∈ A. Hence ⋃{A} = A.
For part (iii), let A be a set. Then y ∈ {A} if and only if y = A, for any given y. So x ∈ ⋂{A} if and only
if ∀y ∈ {A}, x ∈ y, which holds if and only if x ∈ A. Hence ⋂{A} = A.
For part (iv), let A and B be sets. Then y ∈ {A, B} if and only if y = A or y = B, for any given y. So
x ∈ ⋃{A, B} if and only if ∃y ∈ {A, B}, x ∈ y, which holds if and only if x ∈ A or x ∈ B. But x ∈ A ∪ B if
and only if x ∈ A or x ∈ B, by Definition 8.1.2 and Notation 8.1.3. Hence ⋃{A, B} = A ∪ B.
For part (v), let A and B be sets. Then y ∈ {A, B} if and only if y = A or y = B, for any given y. So
x ∈ ⋂{A, B} if and only if ∀y ∈ {A, B}, x ∈ y, which holds if and only if x ∈ A and x ∈ B. But x ∈ A ∩ B
if and only if x ∈ A and x ∈ B, by Definition 8.1.2 and Notation 8.1.3. Hence ⋂{A, B} = A ∩ B.
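For finite collections of finite sets, Notation 8.4.2 and Theorem 8.4.6 can be modelled directly. In the following sketch the names `big_union` and `big_intersection` are my own, not the book's: ⋃∅ = ∅ falls out of the fold's initial value, while ⋂ is left undefined for an empty collection, matching the restriction to non-empty S in Notation 8.4.2.

```python
from functools import reduce

def big_union(S):
    """⋃S = {x; ∃A ∈ S, x ∈ A}. Defined for any finite collection; ⋃∅ = ∅."""
    return reduce(frozenset.union, S, frozenset())

def big_intersection(S):
    """⋂S = {x; ∀A ∈ S, x ∈ A}. Requires a non-empty collection."""
    S = list(S)
    if not S:
        raise ValueError("intersection of the empty collection is undefined")
    return reduce(frozenset.intersection, S)

A = frozenset({1, 2})
B = frozenset({2, 3})
assert big_union([]) == frozenset()        # part (i):   ⋃∅ = ∅
assert big_union([A]) == A                 # part (ii):  ⋃{A} = A
assert big_intersection([A]) == A          # part (iii): ⋂{A} = A
assert big_union([A, B]) == A | B          # part (iv):  ⋃{A, B} = A ∪ B
assert big_intersection([A, B]) == A & B   # part (v):   ⋂{A, B} = A ∩ B
```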
But A ∈ S1 ⇒ A ∈ S2. So ∃A ∈ S2, x ∈ A, which means that x ∈ ⋃S2.
To prove part (ii), let S1 and S2 be non-empty sets of sets such that S1 ⊆ S2. Let x ∈ ⋂S2. Then
∀A ∈ S2, x ∈ A. But A ∈ S1 ⇒ A ∈ S2. So ∀A ∈ S1, x ∈ A, which means that x ∈ ⋂S1.
To prove part (iii), let A be a set and S be a set of sets. Then
x ∈ A ∩ (⋃S) ⇔ (x ∈ A) ∧ (∃X ∈ S, x ∈ X)
⇔ ∃X ∈ S, (x ∈ A ∧ x ∈ X)
⇔ ∃X ∈ S, x ∈ A ∩ X
⇔ x ∈ ⋃{A ∩ X; X ∈ S}.
To prove part (v), let A be a set and S be a non-empty set of sets. Then
x ∈ A ∩ (⋂S) ⇔ (x ∈ A) ∧ (∀X ∈ S, x ∈ X)
⇔ ∀X ∈ S, (x ∈ A ∧ x ∈ X)
⇔ ∀X ∈ S, x ∈ A ∩ X
⇔ x ∈ ⋂{A ∩ X; X ∈ S}.
To prove part (vi), let A be a set and S be a non-empty set of sets. Then
x ∈ A ∪ (⋂S) ⇔ (x ∈ A) ∨ (∀X ∈ S, x ∈ X)
⇔ ∀X ∈ S, (x ∈ A ∨ x ∈ X)
⇔ ∀X ∈ S, x ∈ A ∪ X
⇔ x ∈ ⋂{A ∪ X; X ∈ S}.
To prove part (vii), let S1 and S2 be sets of sets. Then
x ∈ (⋃S1) ∩ (⋃S2) ⇔ (∃X1 ∈ S1, x ∈ X1) ∧ (∃X2 ∈ S2, x ∈ X2)
⇔ ∃X1 ∈ S1, ∃X2 ∈ S2, (x ∈ X1 ∧ x ∈ X2)
⇔ ∃X1 ∈ S1, ∃X2 ∈ S2, x ∈ X1 ∩ X2
⇔ x ∈ ⋃{X1 ∩ X2; X1 ∈ S1, X2 ∈ S2}.
To prove part (viii), let S1 and S2 be non-empty sets of sets. Then
x ∈ (⋂S1) ∩ (⋂S2) ⇔ (∀X1 ∈ S1, x ∈ X1) ∧ (∀X2 ∈ S2, x ∈ X2)
⇔ ∀X1 ∈ S1, ∀X2 ∈ S2, (x ∈ X1 ∧ x ∈ X2)
⇔ ∀X1 ∈ S1, ∀X2 ∈ S2, x ∈ X1 ∩ X2
⇔ x ∈ ⋂{X1 ∩ X2; X1 ∈ S1, X2 ∈ S2}.
To prove part (ix), let S1 and S2 be non-empty sets of sets. Then
x ∈ (⋃S1) ∪ (⋃S2) ⇔ (∃X1 ∈ S1, x ∈ X1) ∨ (∃X2 ∈ S2, x ∈ X2)
⇔ ∃X1 ∈ S1, ∃X2 ∈ S2, (x ∈ X1 ∨ x ∈ X2)
⇔ ∃X1 ∈ S1, ∃X2 ∈ S2, x ∈ X1 ∪ X2
⇔ x ∈ ⋃{X1 ∪ X2; X1 ∈ S1, X2 ∈ S2}.
To prove part (x), let S1 and S2 be non-empty sets of sets. Then
x ∈ (⋂S1) ∪ (⋂S2) ⇔ (∀X1 ∈ S1, x ∈ X1) ∨ (∀X2 ∈ S2, x ∈ X2)
⇔ ∀X1 ∈ S1, ∀X2 ∈ S2, (x ∈ X1 ∨ x ∈ X2)
⇔ ∀X1 ∈ S1, ∀X2 ∈ S2, x ∈ X1 ∪ X2
⇔ x ∈ ⋂{X1 ∪ X2; X1 ∈ S1, X2 ∈ S2}.
To prove part (xi), let S1 and S2 be sets of sets. Then
x ∈ (⋃S1) ∪ (⋃S2) ⇔ (x ∈ ⋃S1) ∨ (x ∈ ⋃S2)
⇔ (∃X, (X ∈ S1 ∧ x ∈ X)) ∨ (∃X, (X ∈ S2 ∧ x ∈ X))
⇔ ∃X, ((X ∈ S1 ∧ x ∈ X) ∨ (X ∈ S2 ∧ x ∈ X))
⇔ ∃X, ((X ∈ S1 ∨ X ∈ S2) ∧ x ∈ X)
⇔ ∃X, ((X ∈ S1 ∪ S2) ∧ x ∈ X)
⇔ ∃X ∈ S1 ∪ S2, x ∈ X
⇔ x ∈ ⋃(S1 ∪ S2).
To prove part (xii), let S1 and S2 be non-empty sets of sets. Then
x ∈ (⋂S1) ∩ (⋂S2) ⇔ (x ∈ ⋂S1) ∧ (x ∈ ⋂S2)
⇔ (∀X, (X ∉ S1 ∨ x ∈ X)) ∧ (∀X, (X ∉ S2 ∨ x ∈ X))
⇔ ∀X, ((X ∉ S1 ∨ x ∈ X) ∧ (X ∉ S2 ∨ x ∈ X))
⇔ ∀X, ((X ∉ S1 ∧ X ∉ S2) ∨ x ∈ X)
⇔ ∀X, ((X ∉ S1 ∪ S2) ∨ x ∈ X)
⇔ ∀X ∈ S1 ∪ S2, x ∈ X
⇔ x ∈ ⋂(S1 ∪ S2).
The above calculations use the fact that the proposition ∀x ∈ X, P(x), for any set X and set-theoretic
formula P, means ∀x, (x ∈ X ⇒ P(x)), which is equivalent to the proposition ∀x, (x ∉ X ∨ P(x)). (See
Notation 7.2.7 and Remark 7.2.6.)
To prove part (xiii), let S be a set of sets, let A ∈ S and let z ∈ A. It follows that ∃y, (z ∈ y ∧ y ∈ S)
because this is true for y = A. Therefore z ∈ {x; ∃y, (x ∈ y ∧ y ∈ S)} = ⋃S. Hence A ⊆ ⋃S.
To prove part (xiv), let S be a non-empty set of sets, let A ∈ S and let z ∈ ⋂S = {x; ∀y, (y ∈ S ⇒ x ∈ y)}.
Then ∀y, (y ∈ S ⇒ z ∈ y). From A ∈ S it then follows that z ∈ A. Hence ⋂S ⊆ A.
To show the part (xv) left-to-right implication, let S be a set of sets and assume ∀X ∈ S, X ⊆ A. That is,
∀X, (X ∈ S ⇒ X ⊆ A). Let z ∈ ⋃S. Then ∃y ∈ S, z ∈ y. That is, ∃y, (z ∈ y ∧ y ∈ S). From y ∈ S, it
follows that y ⊆ A. So ∃y, (z ∈ y ∧ y ⊆ A). Therefore z ∈ A. Hence ⋃S ⊆ A.
To show the part (xv) right-to-left implication, let S be a set of sets and assume ⋃S ⊆ A. Let X ∈ S. Then
X ⊆ ⋃S by part (xiii). So X ⊆ A. Hence ∀X ∈ S, X ⊆ A.
To show the part (xvi) left-to-right implication, let S be a non-empty set of sets and assume ∀X ∈ S, X ⊇ A.
That is, ∀X, (X ∈ S ⇒ X ⊇ A). Let z ∈ A. Then ∀X ∈ S, z ∈ X. That is, z ∈ ⋂S. Hence A ⊆ ⋂S.
To show the part (xvi) right-to-left implication, let S be a non-empty set of sets and assume ⋂S ⊇ A. Let
X ∈ S. Then X ⊇ ⋂S by part (xiv). So X ⊇ A. Hence ∀X ∈ S, X ⊇ A.
To prove part (xvii), let S1 and S2 be sets of sets. By part (vii), (⋃S1) ∩ (⋃S2) = ⋃{X1 ∩ X2; X1 ∈
S1, X2 ∈ S2}. But ⋃{X1 ∩ X2; X1 ∈ S1, X2 ∈ S2} = {y; ∃X1 ∈ S1, ∃X2 ∈ S2, y ∈ X1 ∩ X2}. This equals
the empty set if and only if the proposition ∃X1 ∈ S1, ∃X2 ∈ S2, y ∈ X1 ∩ X2 is false for all y. In other
words, ∀y, ¬(∃X1 ∈ S1, ∃X2 ∈ S2, y ∈ X1 ∩ X2). That is, ∀y, ∀X1 ∈ S1, ∀X2 ∈ S2, y ∉ X1 ∩ X2. This
may be rearranged as ∀X1 ∈ S1, ∀X2 ∈ S2, ∀y, y ∉ X1 ∩ X2. But ∀y, y ∉ X1 ∩ X2 means precisely that
X1 ∩ X2 = ∅. Hence (⋃S1) ∩ (⋃S2) = ∅ if and only if ∀X1 ∈ S1, ∀X2 ∈ S2, X1 ∩ X2 = ∅.
(ii) A \ ⋂S = ⋃{A \ X; X ∈ S} = ⋃_{X∈S} (A \ X) if S ≠ ∅.
Proof: For part (i), let A be a set, and let S be a non-empty set of sets. Then A \ ⋃S = {x ∈ A; x ∉ ⋃S}
= {x ∈ A; ∀X ∈ S, x ∉ X} = {x; x ∈ A ∧ ∀X ∈ S, x ∉ X} = {x; ∀X ∈ S, x ∈ A \ X} = ⋂_{X∈S} (A \ X).
For part (ii), let A be a set, and let S be a non-empty set of sets. Then A \ ⋂S = {x ∈ A; x ∉ ⋂S}
= {x ∈ A; ∃X ∈ S, x ∉ X} = {x; x ∈ A ∧ ∃X ∈ S, x ∉ X} = {x; ∃X ∈ S, x ∈ A \ X} = ⋃_{X∈S} (A \ X).
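Both distribution rules can be checked on a sample collection with Python's set operations (an informal check, under the assumption that S is finite and non-empty):

```python
# Check A \ ⋃S = ⋂{A \ X; X ∈ S} and A \ ⋂S = ⋃{A \ X; X ∈ S}
# for a sample non-empty finite collection S.
A = {1, 2, 3, 4, 5}
S = [{1, 2}, {2, 3}, {2, 6}]

union_S = set().union(*S)
inter_S = set(S[0]).intersection(*S[1:])
diffs = [A - X for X in S]

assert A - union_S == set(A).intersection(*diffs)  # part (i)
assert A - inter_S == set().union(*diffs)          # part (ii)
```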
8.4.12 Theorem:
(i) x ∈ ⋃{A ∈ S; P(A)} ⇔ ∃A ∈ S, (x ∈ A ∧ P(A)) ⇔ ∃A, (x ∈ A ∧ A ∈ S ∧ P(A))
for any set of sets S and set-theoretic formula P.
(ii) x ∈ ⋂{A ∈ S; P(A)} ⇔ ∀A ∈ S, (x ∈ A ∨ ¬P(A)) ⇔ ∀A, (x ∈ A ∨ A ∉ S ∨ ¬P(A))
for any non-empty set of sets S and set-theoretic formula P which satisfies ∃A ∈ S, P(A).
Line (8.4.1) follows from Theorem 8.4.4 (i). Line (8.4.2) follows from Notation 7.2.7. Line (8.4.3) follows
from Theorem 8.4.4 (ii). Line (8.4.4) follows from Remark 7.2.6.
8.4.14 Theorem: Let A1 and A2 be sets. Let S1 and S2 be sets (of sets).
(i) A1 ∈ 𝒫(A1).
(ii) A1 ⊆ A2 ⇒ 𝒫(A1) ⊆ 𝒫(A2).
(iii) A1 ⊆ 𝒫(A2) ⇒ 𝒫(A1) ⊆ 𝒫(𝒫(A2)).
(iv) ⋃(𝒫(A1)) = A1.
(vi) S1 ⊆ 𝒫(⋃S1).
Proof: To prove part (i), let A1 be a set. Then A1 ⊆ A1. Therefore A1 ∈ 𝒫(A1) by Definition 7.6.21.
(See also Remark 7.6.24.)
To prove part (ii), let A1 and A2 be sets. Let X be a set such that X ∈ 𝒫(A1). Then X ⊆ A1. So X ⊆ A2.
So X ∈ 𝒫(A2).
Part (iii) follows from part (ii) because it is just another way of saying the same thing.
To prove part (iv), let A1 be a set. Then A1 ∈ 𝒫(A1). So A1 ⊆ ⋃(𝒫(A1)). To show the reverse inclusion,
let x ∈ ⋃(𝒫(A1)). Then ∃X ∈ 𝒫(A1), x ∈ X. So ∃X, (X ⊆ A1 ∧ x ∈ X). Hence x ∈ A1.
Therefore ⋃(𝒫(A1)) ⊆ A1. It follows that ⋃(𝒫(A1)) = A1.
Part (v) follows immediately from Theorem 8.4.8 (i).
To prove part (vi), let S1 be a set of sets. Let X ∈ S1. Then X ⊆ ⋃S1. So X ∈ 𝒫(⋃S1). It follows
that S1 ⊆ 𝒫(⋃S1).
To prove part (vii), suppose S1 ⊆ 𝒫(A1) and let B ∈ ⋃S1. Then ∃U, (U ∈ S1 ∧ B ∈ U). So
∃U, (U ∈ 𝒫(A1) ∧ B ∈ U) (because S1 ⊆ 𝒫(A1)). That is, B ∈ ⋃(𝒫(A1)). But ⋃(𝒫(A1)) = A1 by part (iv).
So B ∈ A1. Therefore ⋃S1 ⊆ A1 as claimed.
To prove part (viii), suppose that ⋃S1 ⊆ A1. Then 𝒫(⋃S1) ⊆ 𝒫(A1) by part (ii). But S1 ⊆ 𝒫(⋃S1) by
part (vi). Hence S1 ⊆ 𝒫(A1).
Part (ix) follows trivially from parts (vii) and (viii).
To prove part (x), suppose S1 ⊆ 𝒫(A1) and S1 ≠ ∅. Then ⋃S1 ⊆ A1 by part (vii). But ⋂S1 ⊆ ⋃S1
because S1 ≠ ∅. So ⋂S1 ⊆ A1 as claimed.
To prove part (xi), let Z = {⋃C; C ∈ 𝒫(𝒫(A1))}. Then Z ⊇ {⋃{a}; {a} ∈ 𝒫(𝒫(A1))} = {a; a ∈ 𝒫(A1)} =
𝒫(A1).
Proof: Let T = {⋃C; C ∈ 𝒫(S)}. Let Q ⊆ T. Let D = {A ∈ S; ∃C ∈ 𝒫(S), (A ∈ C ∧ ⋃C ∈ Q)}. Then
∀x, x ∈ ⋃D ⇔ ∃A, (x ∈ A ∧ A ∈ D)
⇔ ∃A, (x ∈ A ∧ A ∈ S ∧ ∃C ∈ 𝒫(S), (A ∈ C ∧ ⋃C ∈ Q))
⇔ ∃A, ∃C, (x ∈ A ∧ A ∈ S ∧ C ⊆ S ∧ A ∈ C ∧ ⋃C ∈ Q)   (8.5.1)
⇔ ∃A, ∃C, (x ∈ A ∧ A ∈ C ∧ C ⊆ S ∧ ⋃C ∈ Q)   (8.5.2)
⇔ ∃C, (x ∈ ⋃C ∧ ⋃C ∈ Q ∧ C ⊆ S)
⇔ ∃C, ∃E, (x ∈ E ∧ E ∈ Q ∧ C ⊆ S ∧ E = ⋃C)
⇔ ∃E, (x ∈ E ∧ E ∈ Q ∧ ∃C ∈ 𝒫(S), E = ⋃C)
⇔ ∃E, (x ∈ E ∧ E ∈ Q ∧ E ∈ T)
⇔ ∃E, (x ∈ E ∧ E ∈ Q)   (8.5.3)
⇔ x ∈ ⋃Q.
8.6.7 Definition: A cover of a set X is a collection C ∈ 𝒫(𝒫(X)) such that X = ⋃C.
illustrated in Figure 8.6.1.)
Figure 8.6.1: cover of a subset, cover of a set, partition
Partitions are closely related to equivalence relations, which are defined in Section 9.8.
8.6.14 Definition:
A partition C′ of a set X is (weakly) finer than a partition C of X if ∀x ∈ C′, ∃y ∈ C, x ⊆ y.
A partition C′ of a set X is (weakly) coarser than a partition C of X if ∀y ∈ C, ∃x ∈ C′, y ⊆ x.
To distinguish one object from another, identifiers (name tags) are used. Since names from different contexts
may clash, scope identifiers are often used implicitly or explicitly. Strictly speaking, mathematics should also
have all five of these components: (1) a set to encode the object itself; (2) a class tag to
indicate the human significance of the object class; (3) an encoding tag to indicate the chosen representation;
(4) a name tag to indicate which object of a class is indicated; (5) a scope tag to remove ambiguity from
multiple uses of a name tag. All of this is done informally in most mathematics, but it is helpful to be aware
of the limits of the expressive power of sets. Sets should be thought of as the mathematical equivalent of
the zeros and ones of computer data. Either explicitly or implicitly, this raw data must be brought to life.
8.7.2 Remark: Particular objects within a mathematical class may be singled out with specification tuples.
Mathematical objects may be organised into classes. The objects in each mathematical class may be indicated
by a specification tuple, a parameter sequence which uniquely determines a single object in the class.
Names for things are often confused with the things which they refer to. Mathematical names are no
exception. A specification tuple is often thought of as the definition of the object itself. For example, a pair
(G, σG) may be defined to be a group if the function σG : G × G → G satisfies the axioms of a group. The
trivial group with identity 0 would then be the tuple ({0}, {((0, 0), 0)}). However, given only this pair of
sets, it would be difficult to guess that it is supposed to be a group. There is something missing in the bare
parameter list. One could think of the missing significance as the essence of a group.
8.7.5 Remark: A more precise alternative specification notation which is not used in this book.
In a more formal presentation, one might indicate mathematical classes explicitly. For example, the class
of all topological principal G-bundles might be denoted as TPFB[G, TG, σG], where G −< (G, TG, σG) is a
topological group. (In this case, there is a different class TPFB[G, TG, σG] for each choice of the triple of
parameters (G, TG, σG), where G is a group, TG is a topology on G and σG is an algebraic operation on G.)
An individual member of this class might be denoted as TPFB[G, TG, σG](P, TP, q, B, TB, A^G_P, σ^P_G), which
indicates both the parameters of the class and the parameters of the particular object. The 3 class parameters
are (G, TG, σG). The 7 object parameters are (P, TP, q, B, TB, A^G_P, σ^P_G).
With such a notation, the reader would see at all times the clear division between class and object parameters.
In practice, such an object is denoted as (P, q, B) or (P, TP, q, B, TB, A^G_P, σ^P_G), and the class membership is
indicated in the informal context. Although this book is not written in such a formal way, an effort has been
made to ensure that most class and object specifications could be formalised if required. Most texts freely
mix class and object parameters, which can make it difficult to think clearly about what one is doing.
8.7.6 Remark: Idea for replacing specification tuples with specification trees or networks.
Another possible idea for upgrading the information level of specification tuples would be to replace them
with specification trees or specification networks as discussed very briefly in Remark 46.7.3.
8.7.7 Remark: Example of ambiguity which results from not indicating the class of an object.
Suppose LTG(G, X, σ, µ) denotes a left transformation group G with group operation σ : G × G → G, acting
on X with action µ : G × X → X. Let RTG(G, X, σ, µ) denote the corresponding right transformation group.
Then for any group GP(G, σ), the parameters of LTG(G, G, σ, σ) and RTG(G, G, σ, σ) are identical. Yet
they are respectively the left and right transformation groups of G acting on G. So they are different classes
of structure with identical parameters. (This ambiguity is also mentioned in Remark 60.13.9.)
This shows that there must be something extra which indicates the class of a specification tuple. Thus, for
example, when this book talks about the group (G, ), what is really meant is the structure GP(G, ),
where the meaning of GP is explained only in non-technical human-to-human language. The meaning of
the class of a structure lies outside formal mathematics in the socio-mathematical context. So the reader
should have no illusions that the pair (G, ) is a group. It is just a pair of parameters.
8.7.8 Remark: The words class and object are not new in mathematics.
The idea that mathematical definitions refer to classes or objects does not derive from the corresponding
terminology in computer programming. For example, Bell [213], page 505, published the following in 1937.
A manifold is a class of objects (at least in common mathematics) which is such that any member
of the class can be completely specified by assigning to it certain numbers [. . . ]
Chapter 9
Relations
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
important consequences for the representation of functions.
[Figure 9.0.1. Family tree of relations: Cartesian product, relation, and their specialisations.]
Figure 9.0.1 includes equivalence relations, which are defined in Section 9.8, functions, which are defined in
Chapter 10, and some types of order, which are defined in Chapter 11.
9.1.2 Remark: Graphs of relations and functions.
If relations and functions are identified with the set of ordered pairs associating elements of the domain with
elements of the range (as in Sections 9.5 and 10.2), then it is difficult to say what the "graph of a relation"
or "graph of a function" means, since the graph (i.e. the set of ordered pairs) is identical to the function.
The term "graph" for a relation or function does have a non-vacuous meaning, however, if a function is
defined to be a triple consisting of a domain, a range and a rule (or rule-set). Then the graph is the set of
element-pairs determined by the rule.
9.1.3 Remark: Explicit specification of domains and ranges of relations and functions.
Some authors specify relations and functions as an ordered triple such as f = (X, Y, S) to mean that X is the
domain, Y is the range, and S is the set of ordered pairs of the function. Then two functions with the same
domain set and ordered-pair set are considered to be equal functions if and only if the range sets are the
same. (See for example MacLane/Birkhoff [106], page 4.) The sets X and Y are superfluous. Nevertheless,
if this formalism is adopted, the set S may be referred to as the graph of the function f to distinguish it
from the domain/range/graph triple.
In contexts where the domain and range are fixed in the context background, one naturally ascribes properties
such as "well-defined" and "surjective" to the ordered-pair sets of functions because the domain and range
sets are not varying. So one attributes these properties to the things which do vary. But in contexts where
the domain and range sets are highly variable, one becomes aware that the "well-defined" and "surjective"
properties are actually relations between the ordered-pair sets and the domain and range sets. In some
contexts, each individual function may be specified as a fixed rule which is interpreted differently according
to the domain and range.
Thus, for example, the squaring rule maps n to n · n for integers, but the same rule may map z to z² for
complex numbers or A to A² for (finite or infinite) square matrices over the quaternions. Indicator maps
on sets X map elements of X to the range Y = {0, 1}, but this range may be interpreted as a subset of the
ordinal numbers ω, or the signed integers ℤ, or the real numbers ℝ, and so forth. Thus a single function
may have many different ranges.
Another way to specify ordered pairs would be to introduce a new kind of object into the set theory frame-
work, namely the class of pairs of objects. These could be axiomatised in the same way as sets, but then
one would have to describe also pairs whose elements may be pairs. (See Remark 9.2.4.) Similarly, one
could define relations and functions as axiomatised classes of objects, which would make the system even
more unwieldy. According to E. Mendelson [320], page 162, Kazimierz (Casimir) Kuratowski discovered the
representation of ordered pairs in Definition 9.2.2. (See also Quine [332], page 58, and Kuratowski [372],
page 171.) It has the advantage that pairs are implemented in terms of sets, thereby avoiding the nuisance
of axiomatising multiple classes of objects, but it has the disadvantage that several kinds of ambiguities
occur.
For any sets a and b, the ordered pair construction {{a}, {a, b}} in Definition 9.2.2 is a well-defined set.
(This follows from ZF Axiom (3). See Definition 7.2.4.) A peculiarity of this representation is the fact that
the ordered pair (a, a) is represented by {{a}}. One would not immediately think of a set of the form {{a}}
as an ordered pair. The difference is a matter of interpretation and context. The advantage of representing
pairs as ordinary sets is the spartan economy of ZF set theory. The disadvantage is that each set has a
meaning only when accompanied by commentary.
9.2.2 Definition: An ordered pair is a set of the form {{a}, {a, b}} for any sets a and b.
9.2.3 Notation: (a, b) denotes the ordered pair {{a}, {a, b}} for any a and b.
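The Kuratowski construction can be modelled directly with Python frozensets, which gives a quick way to experiment with Definition 9.2.2. This is an illustrative sketch (the name `kpair` is not from the book):

```python
def kpair(a, b):
    # Kuratowski representation: (a, b) = {{a}, {a, b}} (Definition 9.2.2).
    # frozenset is used because a set must be hashable to be a set element.
    return frozenset({frozenset({a}), frozenset({a, b})})

# The peculiarity noted above: (a, a) is represented by {{a}},
# since {a, a} = {a} collapses the two elements to one.
assert kpair(1, 1) == frozenset({frozenset({1})})

# Order matters: (1, 2) and (2, 1) are distinct sets.
assert kpair(1, 2) != kpair(2, 1)
```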
9.2.4 Remark: Alternative representation of ordered pairs as a class outside ZF set theory.
One could axiomatically define a system of pairs of sets, called Pairs, say, to contain all of the ordered
pairs of sets in a ZF set theory Sets, say. The relations between sets and pairs could be defined in terms
of a function P : Sets × Sets → Pairs and functions L : Pairs → Sets and R : Pairs → Sets such that
L(P(x, y)) = x, R(P(x, y)) = y and P(L(p), R(p)) = p for sets x and y and pairs p. (Of course, functions and
cross-products have not been defined yet, and Sets and Pairs are classes, not sets. So these are merely naive
notions here, not axiomatic.) We could call P (x, y) the ordered pair of sets x and y. Then Definition 9.2.2
would be just one possible representation of the class Pairs in terms of sets. The set expression {{a}, {a, b}}
is merely a parametrisation of the concept of an ordered pair.
The set-theoretic functions P , L and R for the ordered pair representation in Definition 9.2.2 must satisfy
P (a, b) = {{a}, {a, b}}, L({{a}, {a, b}}) = a and R({{a}, {a, b}}) = b. But the operators L(p) and R(p)
must be able to extract a and b from p without being given them in advance. Theorems 9.2.5 and 9.2.6 show
that the left and right elements of an ordered pair can be extracted without knowing them in advance.
9.2.5 Theorem: The left element of an ordered pair p is the set a defined by ∀x ∈ p, a ∈ x.
Proof: Let p = (a, b) be an ordered pair. Then p = {{a}, {a, b}}. Therefore ∀x ∈ p, a ∈ x. But it must
be shown that this uniquely determines the set a. In other words, it must be shown that if ∀x ∈ p, a′ ∈ x,
then a′ = a. Suppose a = b. Then a′ ∈ {a}. Therefore a′ = a. Suppose a ≠ b. Then a′ ∈ {a} and a′ ∈ {a, b}.
In particular, a′ ∈ {a} again. So a′ = a.
9.2.6 Theorem: The right element of an ordered pair p is the set b defined by ∃′x ∈ p, b ∈ x.
Proof: The proposition ∃′x ∈ p, b′ ∈ x means (∃x ∈ p, b′ ∈ x) and ∀x, y ∈ p, ((b′ ∈ x ∧ b′ ∈ y) ⇒ x = y).
(See Notation 6.8.3 for the unique existence notation.) This clearly holds with b′ = b. So it remains to show
that any b′ which satisfies ∃′x ∈ p, b′ ∈ x must equal b. Suppose a = b. Then b′ ∈ {a} = {b}. So b′ = b.
Suppose a ≠ b. Then either b′ ∈ {a} or b′ ∈ {a, b}. So either b′ = a or b′ = b. Suppose b′ = a. Then x = {a}
and y = {a, b} implies that (b′ ∈ x ∧ b′ ∈ y) and x ≠ y, which contradicts the assumption. Hence b′ = b.
9.2.7 Remark: The difficulty of extracting the left and right items from an ordered pair.
Theorems 9.2.5 and 9.2.6 are unsatisfying. One would ideally like to have simple logical expressions "left"
and "right" which yield a and b from p as a = left(p) and b = right(p). Instead we have expressions only for
{a} and {b} in terms of p.

{a} = {y; ∀x ∈ p, y ∈ x} = ⋂p
{b} = {y; ∃′x ∈ p, y ∈ x}
    = ⋃p \ ⋂p   if ⋃p ≠ ⋂p
    = ⋃p        if ⋃p = ⋂p.
These are clumsy constructions, but there seem to be no expressions for a and b themselves at all. In fact,
more generally, there seems to be no expression to simply construct a from {a}.
One way out of this difficulty would be to invent a form of expression such as "x; x ∈ X" to extract a
from X = {a}. But this logical expression only yields a single thing if the set is a singleton. It is desirable
to have an operator which yields "the thing inside the set". The closest one can come to x = "the thing
inside X" is x ∈ X. Within the application context, one must also then assert that ∃′x, x ∈ X. In a
formal logic approach, this is essentially what is done. Thus for example, the proposition ∀x ∈ p, a ∈ x
may be introduced to mean that a is the left element of p.
The left and right operators may be written as L(p) = "a; ∀x ∈ p, a ∈ x" and R(p) = "b; ∃′x ∈ p, b ∈ x".
This is equivalent to writing ∀x ∈ p, L(p) ∈ x and ∃′x ∈ p, R(p) ∈ x.
9.2.8 Remark: Extraction of elements from singletons and ordered pairs if the elements are sets.
Kelley [97], pages 258–259, gives the methods in Theorem 9.2.9 for extracting the elements of singletons and
ordered pairs. These methods depend very much on the assumption that the objects a and b in an ordered
pair (a, b) are themselves sets. By the axiom of extension, Definition 7.2.4 (1), {x; x ∈ a} = a for any set a.
(In a set theory with atoms, this equality is not valid.)
For part (xi), let a and b be sets. Then ⋂⋃(a, b) ∪ (⋃⋃(a, b) \ ⋃⋂(a, b)) = (a ∩ b) ∪ ((a ∪ b) \ a) = b
by parts (viii), (vii) and (ix) and Theorem 8.2.5 (xix).
9.2.10 Remark: Unnaturalness of left and right element extraction from ordered pairs.
The left and right element extraction operations for ordered pairs in Theorem 9.2.9 parts (ix), (x) and (xi) are
quite unnatural. Certainly an expression such as ⋂⋃z ∪ (⋃⋃z \ ⋃⋂z) does not prompt a spontaneous
association with the idea of the second element of the ordered pair z. However, this representation for
ordered pairs has been adopted as the de-facto standard, which has the advantage that everyone suffers the
same inconvenience. Theorem 9.2.11 expands the expressions for a and b in Theorem 9.2.9 as (very slightly
abbreviated) set-theoretic formulas.
Proof: For part (i), it follows from Theorem 9.2.9 (ix) that

x ∈ a ⇔ x ∈ ⋂⋂P
      ⇔ ∀y ∈ ⋂P, x ∈ y
      ⇔ ∀y, (y ∈ ⋂P ⇒ x ∈ y)
      ⇔ ∀y, (x ∈ y ∨ ∃z ∈ P, y ∉ z).

Part (iv) follows from the identity b = (a ∩ b) ∪ ((a ∪ b) \ a) and parts (i), (ii) and (iii).
9.2.12 Remark: Left and right element extraction operations for ordered pairs.
It is occasionally convenient to encapsulate the expressions in Theorems 9.2.9 (x, xi) and 9.2.11 (i, iv) as the
left and right element operators Left and Right in Definition 9.2.13 and Notation 9.2.14.
9.2.13 Definition: The left element of an ordered pair of sets p is the set ⋂⋂p.
The right element of an ordered pair of sets p is the set ⋂⋃p ∪ (⋃⋃p \ ⋃⋂p).
9.2.14 Notation: Left(p), for an ordered pair of sets p, denotes the left element of p.
Right(p), for an ordered pair of sets p, denotes the right element of p.
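Definition 9.2.13 can be checked mechanically. In the sketch below (illustrative names, with the pair components restricted to frozensets, since by Remark 9.2.8 the extraction formulas require the components to be sets themselves), `left` computes ⋂⋂p and `right` computes ⋂⋃p ∪ (⋃⋃p \ ⋃⋂p):

```python
from functools import reduce

def kpair(a, b):
    # (a, b) = {{a}, {a, b}}; a and b must themselves be frozensets here.
    return frozenset({frozenset({a}), frozenset({a, b})})

def bunion(s):
    # Big union: the set of all elements of elements of s.
    return frozenset().union(*s)

def binter(s):
    # Big intersection of a non-empty collection of sets.
    return reduce(lambda x, y: x & y, s)

def left(p):
    # Left(p) = ⋂⋂p.
    return binter(binter(p))

def right(p):
    # Right(p) = ⋂⋃p ∪ (⋃⋃p \ ⋃⋂p).
    return binter(bunion(p)) | (bunion(bunion(p)) - bunion(binter(p)))

a = frozenset({1})
b = frozenset({1, 2})
assert left(kpair(a, b)) == a and right(kpair(a, b)) == b
# The degenerate pair (a, a) = {{a}} also extracts correctly.
assert left(kpair(a, a)) == a and right(kpair(a, a)) == a
```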
Passing all of these tests excludes the possibility that z could have the form {{a}, {a, b}, {b}} with a ≠ b
because this would have an empty intersection although its union would equal {a, b}.
After a set has passed all of these tests, one may safely apply the left and right element extraction operations
in Theorem 9.2.9 parts (ix), (x) and (xi). This strengthens the impression mentioned in Remark 9.2.10 that
the standard representation for ordered pairs is unnatural.
Another way of testing a given set z to verify that it is an ordered pair is by way of the proposition
∃a, ∃b, z = {{a}, {a, b}}. This may be written out more fully as ∃a, ∃b, ∀c, (c ∈ z ⇔ (c = {a} ∨ c = {a, b})),
which may be written out even more fully as

∃a, ∃b, ∀c, (c ∈ z ⇔ ((∀x, (x ∈ c ⇔ x = a)) ∨ (∀x, (x ∈ c ⇔ (x = a ∨ x = b))))).
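The brute-force test ∃a, ∃b, z = {{a}, {a, b}} can be run directly by searching over the field ⋃z; a hedged sketch (names illustrative):

```python
def kpair(a, b):
    # Kuratowski pair (a, b) = {{a}, {a, b}}.
    return frozenset({frozenset({a}), frozenset({a, b})})

def is_kuratowski_pair(z):
    # Tests the proposition "there exist a, b with z = {{a}, {a, b}}"
    # by trying every candidate pair of elements of the union of z.
    if not isinstance(z, frozenset) or not z:
        return False
    if not all(isinstance(x, frozenset) for x in z):
        return False
    field = frozenset().union(*z)
    return any(z == kpair(a, b) for a in field for b in field)

assert is_kuratowski_pair(kpair(1, 2))
assert is_kuratowski_pair(kpair(3, 3))
# The set {{a}, {a, b}, {b}} with a != b is not an ordered pair.
assert not is_kuratowski_pair(
    frozenset({frozenset({1}), frozenset({1, 2}), frozenset({2})}))
```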
9.2.16 Remark: Ordered pairs are equal if and only if both components are equal.
Theorem 9.2.17 asserts the property of ordered pairs which motivates the representation of ordered pairs in
Definition 9.2.2. Any alternative representation which has the same property would be equally acceptable,
but the representation in Definition 9.2.2 seems to be universally accepted. (See also proofs of Theorem 9.2.17
by Halmos [307], page 23 and E. Mendelson [320], page 162.)
9.2.17 Theorem: For any sets a, b, c and d, (a, b) = (c, d) if and only if a = c and b = d.
Proof: This proof uses the idea that two finite sets that are equal must have the same number of elements.
Suppose (a, b) = (c, d). By Definition 9.2.2, (a, b) = {{a}, {a, b}} and (c, d) = {{c}, {c, d}}. So {a} ∈ (a, b).
Therefore {a} ∈ (c, d). So by definition of {{c}, {c, d}}, either {a} = {c} or {a} = {c, d}. If {a} = {c},
then a = c. If {a} = {c, d}, then a = c = d. In either case, a = c.
To show that b = d, first suppose that a = b. Then (a, b) = {{a}}. Therefore (c, d) = {{c}}. So c = d.
Therefore b = d.
Now suppose that a ≠ b. Then {a, b} ∈ (c, d) since {a, b} ∈ (a, b). So {a, b} = {c} or {a, b} = {c, d}.
Therefore {a, b} = {c, d} since a ≠ b. Hence c ≠ d. So b = c or b = d. But a = c and a ≠ b. So b = d.
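The characteristic property just proved can be verified exhaustively over a small universe; a quick sketch:

```python
def kpair(a, b):
    # (a, b) = {{a}, {a, b}} (Definition 9.2.2).
    return frozenset({frozenset({a}), frozenset({a, b})})

# (a, b) = (c, d) if and only if a = c and b = d, checked for all
# choices from a three-element universe (81 cases).
U = [0, 1, 2]
for a in U:
    for b in U:
        for c in U:
            for d in U:
                assert (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
```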
and ordered 2-tuples. The former are defined in Definition 9.2.2 whereas the latter are represented as
functions, which cannot be defined until ordered pairs have been defined. In practice, the n-tuple function
representation is most often used because this is uniform and convenient. To avoid circularity, a
boot-strapping procedure is followed. First, low-level ordered pairs are defined as in Definition 9.2.2. Then
general functions are defined in terms of these low-level ordered pairs. Then general n-tuples are defined in
terms of functions.
It is possible to define ordered triples and quadruples directly in terms of ordered pairs as in Definition 9.3.3
without first defining functions. These are quite inelegant, however, and are not the usual way of defining
general ordered tuples.
9.3.3 Definition: An ordered triple is a set of the form (a, (b, c)) for any three sets a, b and c. This is
denoted as (a, b, c).
An ordered quadruple is a set of the form (a, (b, c, d)) for any four sets a, b, c and d. This is denoted
as (a, b, c, d).
The corresponding constructions for a different permutation of the components are even less attractive. (This
alternative representation for ordered triples and quadruples is described by E. Mendelson [320], page 162.)
On the other hand, this is not very much more unattractive than the standard representation for the ordinal
numbers. (See Remark 12.2.1.) The formulas for extracting the components of these ordered triples and
quadruples are even more unattractive. This is a further motivation for not using this representation.
It would seem reasonable to use a representation {{a}, {a, b}, {a, b, c}} for the ordered triple (a, b, c), and a
similar expression for ordered quadruples etc. However, when two of the components of such a triple are
equal, the construction is ambiguous.
{{a}, {a, b}, {a, b, c}} = {{a}, {a, c}} = (a, c) = (b, c)   if a = b
{{a}, {a, b}, {a, b, c}} = {{a}, {a, b}} = (a, b) = (a, c)   if b = c
{{a}, {a, b}, {a, b, c}} = {{a}, {a, b}} = (a, b) = (c, b)   if a = c.
This would imply that (b, b, c) = (c, b, c) and (a, a, c) = (a, c, c), and so forth. Therefore it is best to abandon
such low-level constructions for ordered tuples except in the case of ordered pairs, which are required to
boot-strap the definitions of Cartesian products, relations and functions.
9.4.2 Definition: The Cartesian product of two sets A and B is the set {(a, b); a ∈ A, b ∈ B} of all
ordered pairs (a, b) such that a ∈ A and b ∈ B.
Proof: For part (i), note that A × B = {(a, b); a ∈ A, b ∈ B} by Definition 9.4.2. It follows that
∀x, (x ∈ A × B ⇔ ∃a, ∃b, (x = (a, b) ∧ a ∈ A ∧ b ∈ B)). Therefore
A × B = ∅ ⇔ ∀x, x ∉ A × B
        ⇔ ∀x, ∀a, ∀b, (x ≠ (a, b) ∨ a ∉ A ∨ b ∉ B)
        ⇔ ∀x, ∀a, ∀b, (x = (a, b) ⇒ (a ∉ A ∨ b ∉ B))
        ⇔ ∀a, ∀b, (a ∉ A ∨ b ∉ B)
        ⇔ ∀a, (a ∉ A ∨ (∀b, b ∉ B))        (9.4.1)
        ⇔ (∀a, a ∉ A) ∨ (∀b, b ∉ B)        (9.4.2)
        ⇔ (A = ∅) ∨ (B = ∅).
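The emptiness criterion derived above (A × B = ∅ if and only if A = ∅ or B = ∅) can be spot-checked with a small sketch, with Python tuples standing in for Kuratowski pairs:

```python
from itertools import product

def cartesian(A, B):
    # A × B = {(a, b); a in A, b in B} (Definition 9.4.2).
    return frozenset(product(A, B))

# A × B is empty exactly when A or B is empty.
for A in [frozenset(), frozenset({1}), frozenset({1, 2})]:
    for B in [frozenset(), frozenset({3}), frozenset({3, 4})]:
        is_empty = cartesian(A, B) == frozenset()
        assert is_empty == (A == frozenset() or B == frozenset())
```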
(x, y) ∈ (A × B) ∪ (C × D) ⇔ (x ∈ A ∧ y ∈ B) ∨ (x ∈ C ∧ y ∈ D)
⇔ ((x ∈ A ∧ y ∈ B) ∨ x ∈ C) ∧ ((x ∈ A ∧ y ∈ B) ∨ y ∈ D)
⇔ (x ∈ A ∨ x ∈ C) ∧ (y ∈ B ∨ x ∈ C) ∧ (x ∈ A ∨ y ∈ D) ∧ (y ∈ B ∨ y ∈ D)
⇒ (x ∈ A ∨ x ∈ C) ∧ (y ∈ B ∨ y ∈ D)
⇔ (x, y) ∈ (A ∪ C) × (B ∪ D).
Hence (A × B) ∪ (C × D) ⊆ (A ∪ C) × (B ∪ D).
9.4.7 Theorem: Properties of unions and intersections of collections of set-products.
Let C1 and C2 be (collections of) sets.
(i) ⋃_{A∈C1} (A × B) = (⋃C1) × B.
(ii) ⋂_{A∈C1} (A × B) = (⋂C1) × B if C1 ≠ ∅.
(iii) ⋃_{B∈C2} (A × B) = A × (⋃C2).
(iv) ⋂_{B∈C2} (A × B) = A × (⋂C2) if C2 ≠ ∅.
(v) ⋃_{A∈C1} ⋃_{B∈C2} (A × B) = (⋃C1) × (⋃C2).
(vi) ⋂_{A∈C1} ⋃_{B∈C2} (A × B) = (⋂C1) × (⋃C2) if C1 ≠ ∅.
(vii) ⋃_{A∈C1} ⋂_{B∈C2} (A × B) = (⋃C1) × (⋂C2) if C2 ≠ ∅.
(viii) ⋂_{A∈C1} ⋂_{B∈C2} (A × B) = (⋂C1) × (⋂C2) if C1 ≠ ∅ and C2 ≠ ∅.
⇔ (a ∈ ⋃C1) ∧ b ∈ B        (9.4.3)
⇔ (a, b) ∈ (⋃C1) × B,
where line (9.4.3) follows from Theorem 6.6.16 (iii). Hence ⋃_{A∈C1} (A × B) = (⋃C1) × B.
Parts (ii), (iii) and (iv) may be proved as for part (i).
Part (v) follows from parts (i) and (iii).
Part (vi) follows from parts (ii) and (iii).
Part (vii) follows from parts (i) and (iv).
Part (viii) follows from parts (ii) and (iv).
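Parts (i) and (ii) of Theorem 9.4.7 can be checked numerically; an illustrative sketch with tuples standing in for ordered pairs:

```python
from itertools import product

def cartesian(A, B):
    # A × B modelled with Python tuples standing in for ordered pairs.
    return frozenset(product(A, B))

def bunion(C):
    # Big union of a collection of sets.
    return frozenset().union(*C)

def binter(C):
    # Big intersection of a non-empty collection of sets.
    C = list(C)
    result = C[0]
    for s in C[1:]:
        result &= s
    return result

C1 = [frozenset({1, 2}), frozenset({2, 3})]
B = frozenset({9})

# Part (i):  the union of A × B over A in C1 equals (union of C1) × B.
assert bunion(cartesian(A, B) for A in C1) == cartesian(bunion(C1), B)

# Part (ii): the intersection of A × B over A in C1 equals
# (intersection of C1) × B, for non-empty C1.
assert binter(cartesian(A, B) for A in C1) == cartesian(binter(C1), B)
```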
9.4.8 Remark: Projection maps for binary Cartesian products.
Since the definitions for projection maps for binary Cartesian products require the concept of a function,
this important construction for Cartesian set-products is delayed until Section 10.12.
9.4.9 Remark: Emphatic version of the Cartesian set-product notation.
The product symbol × in Notation 9.4.3 is very much over-used in mathematics. It can mean a plain
Cartesian set-product as in Definition 9.4.2, a relation product as in Definition 9.7.2, a function product as
in Definitions 10.14.3 and 10.15.2, a group product as in Definition 17.7.14, a topological space product as in
Definition 32.8.4, or a differentiable manifold product as in Definition 50.12.2. Most of the time, the product
construction which is intended is clear from the object classes of the component spaces, but there are also
situations where the class-specific interpretation must be overridden. (See for example Remark 50.12.6.)
Notation 9.4.10 is provided to override all such class-specific interpretations.
9.4.10 Notation: Strict notation for binary Cartesian set products.
A ×_S B denotes the Cartesian product of sets A and B.
9.5. Relations
9.5.1 Remark: The definition of a relation between two sets.
Remark 9.1.1 mentions the difference between a membership-predicate relation (defined by a logical predicate
expression or rule) and an ordered-pair-set relation (defined as a set of ordered pairs of related objects). In
this terminology, Definition 9.5.2 defines an ordered-pair-set relation. Each pair (a, b) in a relation defines a
directed association between the components a and b.
9.5.2 Definition: A relation is a set of ordered pairs. A relation from a set X to a set Y is a subset of X × Y .
9.5.3 Example: Some relations which are defined for all sets.
The following sets are relations from X to Y for any sets X and Y .
(1) ∅. The empty relation. Nothing is related to anything else.
(2) X × Y . Every element of X is related to every element of Y .
(3) {(x, y) ∈ X × Y ; x = y}. The equality relation.
(4) {(x, y) ∈ X × Y ; x ≠ y}. The inequality relation.
(5) {(x, y) ∈ X × Y ; x ∈ y}. The containment or element-of relation.
(6) {(x, y) ∈ X × Y ; x ⊆ y}. The set-inclusion relation.
The following sets are relations from X to Y for any sets X, Y , A and B.
(7) {(x, y) ∈ X × Y ; x ∈ A}. This is the same as (X ∩ A) × Y .
(8) {(x, y) ∈ X × Y ; x ∉ A}. This is the same as (X \ A) × Y .
(9) {(x, y) ∈ X × Y ; y ∈ B}. This is the same as X × (Y ∩ B).
(10) {(x, y) ∈ X × Y ; y ∉ B}. This is the same as X × (Y \ B).
The set {(x, y) ∈ X × Y ; P(x, y)} is a relation from X to Y for any sets X and Y and binary predicate P .
The relations "=", "≠", "∈" and "⊆" in the above list are set-theoretic relations. The relations "=" and "∈" are
primary relations in ZF set theory, whereas "≠" and "⊆" are composed from two or more primary relations.
(Thus A ≠ B means ¬(A = B), and A ⊆ B means ∀x, (x ∈ A ⇒ x ∈ B) for example.) These set-theoretic
relations must be distinguished from the corresponding ordered-pair-set relations {(x, y) ∈ X × Y ; x = y},
{(x, y) ∈ X × Y ; x ≠ y}, {(x, y) ∈ X × Y ; x ∈ y} and {(x, y) ∈ X × Y ; x ⊆ y}. Usually the context makes
clear which kind of relation is meant.
Set-theoretic relations typically apply to all sets without restriction, whereas ordered-pair-set relations are
generally defined only within a specified Cartesian product of sets.
9.5.4 Definition: The domain of a relation R is the set {a; ∃b, (a, b) ∈ R}.
The range (or the "image" or the "codomain") of a relation R is the set {b; ∃a, (a, b) ∈ R}.
9.5.6 Notation:
Dom(R) denotes the domain of a relation R.
Range(R) denotes the range of a relation R.
Img(R) is an alternative notation for the range of a relation R.
9.5.7 Remark: The domain and range of a relation are well-defined sets.
Since Definition 9.5.2 specifies that a relation is a set, it follows that the domain and range in Definition 9.5.4
are also sets. It is perhaps not immediately clear why this is so. The specification axiom (which follows from
the ZF replacement axiom, Definition 7.2.4 (6)) implies that any expression of the form {x X; P (x)} is a
genuine set if X is a set and P is a set-theoretic function. Hence S S it is sufficient to find suitable X and P to
represent the domain and range of a function. The set X = R includes both the domain and range for
any relation R. Theorem 9.5.8 expresses the domain and range of any relation in terms of this set X and
suitable predicates P .
9.5.8 Theorem: The domain and range of a relation are sets. For any relation R,

Dom(R) = {a ∈ ⋃⋃R; ∃b ∈ ⋃⋃R, (a, b) ∈ R}

and

Range(R) = {b ∈ ⋃⋃R; ∃a ∈ ⋃⋃R, (a, b) ∈ R}.
Proof: Let R be a relation. Suppose that (a, b) ∈ R. By definition of an ordered pair, this means that
{{a}, {a, b}} ∈ R. It follows that {a} ∈ (a, b) for some b. Let y = (a, b). Then {a} ∈ y and y ∈ R for some y.
That is, ∃y, ({a} ∈ y ∧ y ∈ R). By the definition of set union, this is equivalent to {a} ∈ ⋃R. Let z = {a}.
Then a ∈ z and z ∈ ⋃R. So by the definition of set union again, this implies a ∈ ⋃⋃R. This argument
may be summarised as follows.

(a, b) ∈ R
⇒ {{a}, {a, b}} ∈ R
⇒ ∃y, ({a} ∈ y ∧ y ∈ R)
⇒ {a} ∈ ⋃R
⇒ ∃z, (a ∈ z ∧ z ∈ ⋃R)
⇒ a ∈ ⋃⋃R.

An almost identical argument shows that (a, b) ∈ R ⇒ b ∈ ⋃⋃R as follows.

(a, b) ∈ R
⇒ {{a}, {a, b}} ∈ R
⇒ ∃y, ({a, b} ∈ y ∧ y ∈ R)
⇒ {a, b} ∈ ⋃R
⇒ ∃z, (b ∈ z ∧ z ∈ ⋃R)
⇒ b ∈ ⋃⋃R.
By a double application of the ZF union axiom (Definition 7.2.4 (4)), ⋃⋃R is a set. By the replacement
axiom (Definition 7.2.4 (6)), {a ∈ ⋃⋃R; ∃b ∈ ⋃⋃R, (a, b) ∈ R} and {b ∈ ⋃⋃R; ∃a ∈ ⋃⋃R, (a, b) ∈ R}
are both sets also. But since (a, b) ∈ R implies that a ∈ ⋃⋃R and b ∈ ⋃⋃R, these sets are the same
as {a; ∃b, (a, b) ∈ R} and {b; ∃a, (a, b) ∈ R} respectively, which are simply the definitions of Dom(R)
and Range(R). It follows that the domain and range of R are sets for any relation R.
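Theorem 9.5.8 is concrete enough to execute: the field ⋃⋃R really does contain every pair component, so the domain and range can be sieved out of it. An illustrative sketch using the Kuratowski representation (names not from the book):

```python
def kpair(a, b):
    # Kuratowski pair (a, b) = {{a}, {a, b}}.
    return frozenset({frozenset({a}), frozenset({a, b})})

def bunion(s):
    # Big union of a collection of sets.
    return frozenset().union(*s)

def dom(R):
    # Dom(R) = {a in UUR; exists b in UUR with (a, b) in R} (Theorem 9.5.8).
    field = bunion(bunion(R))
    return frozenset(a for a in field if any(kpair(a, b) in R for b in field))

def rng(R):
    # Range(R) = {b in UUR; exists a in UUR with (a, b) in R}.
    field = bunion(bunion(R))
    return frozenset(b for b in field if any(kpair(a, b) in R for a in field))

R = frozenset({kpair(1, 2), kpair(1, 3), kpair(4, 4)})
assert dom(R) == frozenset({1, 4})
assert rng(R) == frozenset({2, 3, 4})
```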
9.5.9 Remark: The method of proof that the domain and range of a relation are well-defined sets.
The steps in the proof of Theorem 9.5.8 may be thought of as progressively replacing the set-braces {
and } with the set-union operator ⋃. The two levels of nesting of a and b in set-braces are replaced
by two levels of set-union operators. This theorem depends very particularly on the representation chosen
for ordered pairs in Definition 9.2.2.
9.5.10 Theorem: Let R be a relation.
(i) R ⊆ (⋃⋃R) × (⋃⋃R).
(ii) R ⊆ Dom(R) × Range(R).
Proof: Part (i) follows from Theorem 9.5.8.
For part (ii), let x ∈ R. Then x = (a, b) for some a and b by Definition 9.5.2. So a satisfies ∃b, (a, b) ∈ R,
and b satisfies ∃a, (a, b) ∈ R. Therefore a ∈ Dom(R) and b ∈ Range(R) by Definition 9.5.4. So (a, b) ∈
Dom(R) × Range(R) by Definition 9.4.2. Hence R ⊆ Dom(R) × Range(R).
9.5.11 Theorem: Let R be a relation.
(i) For all (a, b) ∈ R, a ∈ Dom(R).
(ii) For all (a, b) ∈ R, b ∈ Range(R).
Proof: For part (i), let (a, b) ∈ R. Then ∃b, (a, b) ∈ R by Definition 6.3.9 (EI). So a ∈ Dom(R) by
Definition 9.5.4 and Notation 9.5.6. The assertion follows.
For part (ii), let (a, b) ∈ R. Then ∃a, (a, b) ∈ R by Definition 6.3.9 (EI). So b ∈ Range(R) by Definition 9.5.4
and Notation 9.5.6. The assertion follows.
9.5.13 Definition: The image of a set A by a relation R is the set {b; ∃a ∈ A, (a, b) ∈ R}.
The pre-image (or inverse image) of a set B by a relation R is the set {a; ∃b ∈ B, (a, b) ∈ R}.
9.5.14 Notation: R(A), for a relation R and set A, denotes the image of A by R.
R⁻¹(B), for a relation R and set B, denotes the pre-image of B by R.
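Definition 9.5.13 translates directly into comprehensions; a sketch with tuples standing in for ordered pairs (names illustrative):

```python
def image(R, A):
    # R(A) = {b; exists a in A with (a, b) in R} (Definition 9.5.13).
    return frozenset(b for (a, b) in R if a in A)

def preimage(R, B):
    # The pre-image of B: {a; exists b in B with (a, b) in R}.
    return frozenset(a for (a, b) in R if b in B)

R = frozenset({(1, 2), (1, 3), (4, 4)})
assert image(R, {1}) == frozenset({2, 3})
assert preimage(R, {3, 4}) == frozenset({1, 4})
```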
Proof: Parts (i) and (ii) follow from Definitions 9.5.4 and 9.5.13.
For part (iii), suppose that A ⊆ B. Let b ∈ R(A). Then ∃a ∈ A, (a, b) ∈ R by Definition 9.5.13. So
∃a ∈ B, (a, b) ∈ R. Therefore b ∈ R(B). Hence R(A) ⊆ R(B).
For part (iv), suppose that A ⊆ B. Let a ∈ R⁻¹(A). Then ∃b ∈ A, (a, b) ∈ R by Definition 9.5.13. So
∃b ∈ B, (a, b) ∈ R. Therefore a ∈ R⁻¹(B). Hence R⁻¹(A) ⊆ R⁻¹(B).
Proof: For part (i), Range(R ∩ (A × Range(R))) = {b; ∃a, (a, b) ∈ R ∩ (A × Range(R))} by Definition 9.5.4,
and (a, b) ∈ R ∩ (A × Range(R)) if and only if (a, b) ∈ R and (a, b) ∈ A × Range(R). But (a, b) ∈
A × Range(R) if and only if a ∈ A and b ∈ Range(R). Since (a, b) ∈ R implies b ∈ Range(R), it follows
from Theorem 4.7.9 (lxiii) that (a, b) ∈ R ∩ (A × Range(R)) if and only if (a, b) ∈ R and a ∈ A. Hence
Range(R ∩ (A × Range(R))) = {b; ∃a ∈ A, (a, b) ∈ R}, which equals R(A) by Definition 9.5.13.
For part (ii), Dom(R ∩ (Dom(R) × B)) = {a; ∃b, (a, b) ∈ R ∩ (Dom(R) × B)} by Definition 9.5.4, and
(a, b) ∈ R ∩ (Dom(R) × B) if and only if (a, b) ∈ R and (a, b) ∈ Dom(R) × B. But (a, b) ∈ Dom(R) × B if and
only if a ∈ Dom(R) and b ∈ B. Since (a, b) ∈ R implies a ∈ Dom(R), it follows from Theorem 4.7.9 (lxiii)
that (a, b) ∈ R ∩ (Dom(R) × B) if and only if (a, b) ∈ R and b ∈ B. Hence Dom(R ∩ (Dom(R) × B)) =
{a; ∃b ∈ B, (a, b) ∈ R}, which equals R⁻¹(B) by Definition 9.5.13.
For part (iii), R(A ∩ Dom(R)) ⊆ R(A) by Theorem 9.5.15 (iii). Let b ∈ R(A). Then ∃a ∈ A, (a, b) ∈ R by
Definition 9.5.13. But (a, b) ∈ R implies a ∈ Dom(R) by Theorem 9.5.11 (i). So ∃a ∈ A, (a ∈ Dom(R) and
(a, b) ∈ R). Thus ∃a ∈ A ∩ Dom(R), (a, b) ∈ R. So b ∈ R(A ∩ Dom(R)) by Definition 9.5.13. Therefore
R(A) ⊆ R(A ∩ Dom(R)). Hence R(A) = R(A ∩ Dom(R)).
For part (iv), R⁻¹(B ∩ Range(R)) ⊆ R⁻¹(B) by Theorem 9.5.15 (iv). Let a ∈ R⁻¹(B). Then ∃b ∈ B,
(a, b) ∈ R by Definition 9.5.13. But (a, b) ∈ R implies b ∈ Range(R) by Theorem 9.5.11 (ii). So ∃b ∈ B,
(b ∈ Range(R) and (a, b) ∈ R). Thus ∃b ∈ B ∩ Range(R), (a, b) ∈ R. So a ∈ R⁻¹(B ∩ Range(R)) by
Definition 9.5.13. Therefore R⁻¹(B) ⊆ R⁻¹(B ∩ Range(R)). Hence R⁻¹(B) = R⁻¹(B ∩ Range(R)).
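Parts (iii) and (iv) above say that points outside the domain (or range) contribute nothing to images (or pre-images); a small sketch, with tuples standing in for ordered pairs:

```python
def image(R, A):
    # R(A) = {b; exists a in A with (a, b) in R}.
    return frozenset(b for (a, b) in R if a in A)

def preimage(R, B):
    # Pre-image of B: {a; exists b in B with (a, b) in R}.
    return frozenset(a for (a, b) in R if b in B)

def dom(R):
    return frozenset(a for (a, b) in R)

def rng(R):
    return frozenset(b for (a, b) in R)

R = frozenset({(1, 2), (4, 4)})
A = frozenset({1, 7})        # 7 lies outside Dom(R)
B = frozenset({2, 9})        # 9 lies outside Range(R)
# R(A) = R(A ∩ Dom(R)) and the analogous identity for pre-images.
assert image(R, A) == image(R, A & dom(R)) == frozenset({2})
assert preimage(R, B) == preimage(R, B & rng(R)) == frozenset({1})
```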
9.5.17 Definition: A source set for a relation R is any set A such that A ⊇ Dom(R).
A target set for a relation R is any set B such that B ⊇ Range(R).
9.5.18 Remark: Source sets and target sets for relations.
The terms "source set" and "target set" in Definition 9.5.17 are possibly non-standard. These sets are often
referred to colloquially as "the domain" and "the range" of the relation. If a relation is defined to be merely a
set R of ordered pairs, as opposed to the tuple (R, X, Y ), then the sets X and Y are not uniquely determined
properties of the set R. These sets are merely part of the context in which the relation is discussed. Thus
Dom(R) and Range(R) are uniquely determined by R, whereas Source(R) and Target(R) are not well defined
because they are not uniquely determined. Nevertheless it is very often useful to be able to refer to the source
and target sets. Therefore they need names which are different to "domain" and "range". (Other suitable
names for the source and target sets would be the "origin set" and "destination set" respectively.)
When a relation is defined to be a tuple (R, X, Y ), incorporating the source and target sets X and Y into
the specification, the set of ordered pairs R is called the graph of the relation.
When relations are defined as merely the set of ordered pairs R, the word "graph" is a synonym for the
relation itself, although the use of the word "graph" then draws attention to the fact that the relation is
represented as a set of ordered pairs. One generally thinks of sets and relations as being in different classes
in some sense. (Class tags for sets are discussed in Remark 9.1.4.) According to this perspective, sets merely
provide a space of parameters for the objects in object classes. Consequently, using the term graph is
supposed to indicate that the set of ordered pairs should be thought of as a member of the Set class rather
than the Relation class. (Object classes here are only distantly related to the categories of category theory.)
Thus it is useful to be able to refer to the set of ordered pairs of a relation as a "graph" because this
signals that one should remove the metamathematical "Relation" tag for the set, and replace it with the
metamathematical "Set" tag.
In defence of the relation triples (R, X, Y ), one may argue that in practice, most relations and functions
are introduced into mathematical contexts by specifying the source and target sets and a rule for effectively
determining which ordered pairs are to be included. However, this is not always so. A rule may be specified
for the set of ordered pairs, and it may sometimes be a non-trivial task to determine the domain and range.
This is more of an issue for functions, which are required to be defined for all elements of the domain.
9.5.19 Remark: Redundant, but useful, definition and notation for the graph of a relation.
Definition 9.5.20 and Notation 9.5.21 are, strictly speaking, entirely redundant, as is proved by the equally
redundant Theorem 9.5.22. However, as suggested in Remark 9.5.18, both the definition and the notation
have some value in changing the perception of a relation from an association between elements of two sets
to a simple subset of a Cartesian product set.
9.5.20 Definition: The graph of a relation R is the set {(a, b) ∈ Dom(R) × Range(R); (a, b) ∈ R}.
9.5.21 Notation: graph(R), for a relation R, denotes the graph of R.
9.5.22 Theorem: Let R be a relation. Then graph(R) = R.
Proof: Let R be a relation. Then R is a set of ordered pairs by Definition 9.5.2, and it follows from
Notation 9.5.21 and Definition 9.5.20 that graph(R) = {(a, b) ∈ Dom(R) × Range(R); (a, b) ∈ R}. Let
x ∈ graph(R). Then x = (a, b) for some (a, b) ∈ R. So x ∈ R. Therefore graph(R) ⊆ R. Now let x ∈ R.
Then x = (a, b) for some ordered pair (a, b), and so (a, b) ∈ R. Therefore a ∈ Dom(R) and b ∈ Range(R) by
Notation 9.5.6 and Definition 9.5.4. So (a, b) ∈ Dom(R) × Range(R) by Notation 9.4.3 and Definition 9.4.2.
Therefore x ∈ graph(R). Hence graph(R) = R.
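Since a relation is modelled as a plain set of ordered pairs, Theorem 9.5.22 can be checked mechanically for small finite examples. The following Python sketch is illustrative only (the names dom, ran and graph are not the book's notations); it transcribes Notation 9.5.6 and Definition 9.5.20 directly:

```python
# A relation is modelled as a Python set of ordered pairs (Definition 9.5.2).
# The helper names below are illustrative, not notations from the book.

def dom(R):
    """Dom(R): the set of left elements of the pairs of R."""
    return {a for (a, b) in R}

def ran(R):
    """Range(R): the set of right elements of the pairs of R."""
    return {b for (a, b) in R}

def graph(R):
    """Definition 9.5.20: {(a, b) in Dom(R) x Range(R); (a, b) in R}."""
    return {(a, b) for a in dom(R) for b in ran(R) if (a, b) in R}

R = {(1, 'x'), (2, 'y'), (2, 'z')}
assert graph(R) == R    # Theorem 9.5.22: graph(R) = R
```

As the theorem says, filtering the Cartesian product of the domain and range returns exactly the original set of pairs, which is why the definition is redundant but harmless.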
9.5.25 Remark: Including source and target sets in specification tuples for relations.
A relation is sometimes defined as a triple of sets (R, A, B), where A and B are sets and R ⊆ A × B. This
is then generally abbreviated to R. Thus R < (R, A, B), using the chicken-foot notation for abbreviations
introduced in Section 8.7. In such a formalism, the reader must decide according to context whether R
means the full tuple (R, A, B) or just the set of ordered pairs R ⊆ A × B.
The tuple formalism (R, A, B) has the advantage of communicating the intended context to the reader, but
it is often clumsy and confusing. The contextual sets A and B are probably best communicated in the
surrounding text.
9.5.27 Remark: Justification of the infix notation for relation expressions.
The infix notation a R b is abstracted from the well-known notations for relations such as a = b, a ≠ b, a ≤ b,
a < b, a > b, a ≥ b, a ⊆ b, a ⊇ b, a ∈ b, a ≡ b, a ∼ b, a ≃ b, a ≈ b and a ≅ b.
9.6.3 Notation: R2 ∘ R1 denotes the composition of two relations R1 and R2. In other words,
R2 ∘ R1 = {(a, c); ∃b, ((a, b) ∈ R1 and (b, c) ∈ R2)}.
Proof: Let R1 and R2 be relations. By Theorem 9.5.8, Dom(Ri) and Range(Ri) are sets for i = 1, 2. Thus
Ri ⊆ Dom(Ri) × Range(Ri) for i = 1, 2. Let R = R2 ∘ R1. Let (a, b) ∈ R. Then (a, c) ∈ R1 and (c, b) ∈ R2
for some c. Therefore a ∈ Dom(R1) and b ∈ Range(R2). Hence R ⊆ Dom(R1) × Range(R2).
∀x, x ∈ Dom(R2 ∘ R1) ⇔ ∃y, (x, y) ∈ R2 ∘ R1
⇔ ∃y, ∃z, ((x, z) ∈ R1 and (z, y) ∈ R2)
⇔ ∃z, ((x, z) ∈ R1 and ∃y, (z, y) ∈ R2)
⇔ ∃z, ((x, z) ∈ R1 and z ∈ Dom(R2))
⇔ ∃z, ((z, x) ∈ R1⁻¹ and z ∈ Dom(R2))
⇔ x ∈ R1⁻¹(Dom(R2)).
∀y, y ∈ Range(R2 ∘ R1) ⇔ ∃x, (x, y) ∈ R2 ∘ R1
⇔ ∃x, ∃z, ((x, z) ∈ R1 and (z, y) ∈ R2)
⇔ ∃z, ((∃x, (x, z) ∈ R1) and (z, y) ∈ R2)
⇔ ∃z, (z ∈ Range(R1) and (z, y) ∈ R2)
⇔ y ∈ R2(Range(R1)).
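The two derivations above say that Dom(R2 ∘ R1) = R1⁻¹(Dom(R2)) and Range(R2 ∘ R1) = R2(Range(R1)), where R(S) denotes the image of a set S under a relation R. For finite relations both identities can be checked directly; this Python sketch uses illustrative helper names, not the book's notations:

```python
# Relations as sets of ordered pairs; helper names are illustrative.

def compose(R2, R1):
    """Definition 9.6.2: R2 o R1 = {(a, c); exists b, (a, b) in R1 and (b, c) in R2}."""
    return {(a, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

def inverse(R):
    return {(b, a) for (a, b) in R}

def image(R, S):
    """R(S) = {b; exists a in S, (a, b) in R}."""
    return {b for (a, b) in R if a in S}

def dom(R):
    return {a for (a, b) in R}

def ran(R):
    return {b for (a, b) in R}

R1 = {(1, 'a'), (2, 'b'), (3, 'c')}
R2 = {('a', 10), ('b', 20), ('d', 30)}
C = compose(R2, R1)                             # {(1, 10), (2, 20)}
assert dom(C) == image(inverse(R1), dom(R2))    # Dom(R2 o R1) = R1^-1(Dom(R2))
assert ran(C) == image(R2, ran(R1))             # Range(R2 o R1) = R2(Range(R1))
```

The pair (3, 'c') of R1 contributes nothing to the composite, because 'c' is not in Dom(R2); the two image formulas account for exactly this kind of loss.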
Proof: Let R1, R2 and R3 be relations. Then R3 ∘ R2 = {(b, d); ∃c, ((b, c) ∈ R2 ∧ (c, d) ∈ R3)} and
(R3 ∘ R2) ∘ R1 = {(a, d); ∃b, ((a, b) ∈ R1 ∧ (b, d) ∈ R3 ∘ R2)} by Definition 9.6.2. Therefore
9.6.9 Definition: The identity relation on a set X is the relation {(x, x); x ∈ X}.
9.6.10 Notation: id_X denotes the identity relation on a set X. In other words, id_X = {(x, x); x ∈ X}.
If R is a relation from A to B, then R⁻¹ is a relation from B to A.
9.6.12 Definition: The inverse of a relation R is the set {(b, a); (a, b) ∈ R}.
9.6.15 Theorem:
(i) (R⁻¹)⁻¹ = R for any relation R.
(ii) R⁻¹ ∘ R is a symmetric relation for any relation R.
(iii) R⁻¹ ∘ R ⊇ id_Dom(R) for any relation R.
(iv) R ∘ R⁻¹ ⊇ id_Range(R) for any relation R.
(v) (R2 ∘ R1)⁻¹ = R1⁻¹ ∘ R2⁻¹ for any relations R1 and R2.
Proof: For part (i), let R be a relation. Then R⁻¹ = {(b, a); (a, b) ∈ R} by Definition 9.6.12. Therefore
(R⁻¹)⁻¹ = {(b, a); (a, b) ∈ R⁻¹} = {(b, a); (b, a) ∈ R} = R.
To prove part (ii), let R be a relation. Let S denote the composite R⁻¹ ∘ R of R with its inverse. Suppose
(x1, x2) ∈ S = R⁻¹ ∘ R. Then for some y, (x1, y) ∈ R and (y, x2) ∈ R⁻¹. So (x2, y) ∈ R by the
definition of R⁻¹. Similarly, (y, x1) ∈ R⁻¹. Hence (x2, x1) ∈ S by the definition of the composite. Therefore
S is symmetric.
For part (iii), let R be a relation. Then R⁻¹ ∘ R = {(a, c); ∃b, ((a, b) ∈ R ∧ (b, c) ∈ R⁻¹)} by Definition 9.6.2.
So by Definition 9.6.12,
R⁻¹ ∘ R = {(a, c); ∃b, ((a, b) ∈ R ∧ (c, b) ∈ R)}
⊇ {(a, a); ∃b, ((a, b) ∈ R ∧ (a, b) ∈ R)}
= {(a, a); a ∈ Dom(R)}
= id_Dom(R).
For part (iv), let R be a relation. Then R ∘ R⁻¹ = {(a, c); ∃b, ((a, b) ∈ R⁻¹ ∧ (b, c) ∈ R)} by Definition 9.6.2.
So by Definition 9.6.12,
R ∘ R⁻¹ = {(a, c); ∃b, ((b, a) ∈ R ∧ (b, c) ∈ R)}
⊇ {(a, a); ∃b, ((b, a) ∈ R ∧ (b, a) ∈ R)}
= {(a, a); a ∈ Range(R)}
= id_Range(R).
For part (v), let R1 and R2 be relations. Then (R2 ∘ R1)⁻¹ = ({(a, c); ∃b, ((a, b) ∈ R1 ∧ (b, c) ∈ R2)})⁻¹ =
{(c, a); ∃b, ((a, b) ∈ R1 ∧ (b, c) ∈ R2)} = {(c, a); ∃b, ((b, a) ∈ R1⁻¹ ∧ (c, b) ∈ R2⁻¹)} = R1⁻¹ ∘ R2⁻¹.
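All five parts of Theorem 9.6.15 can be tested exhaustively on small finite relations. A Python sketch follows (illustrative helper names; `<=` is Python's subset test on sets):

```python
def compose(S, R):
    """S o R, as in Definition 9.6.2: apply R first, then S."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def inverse(R):
    return {(b, a) for (a, b) in R}

def identity(X):
    """id_X = {(x, x); x in X} (Definition 9.6.9)."""
    return {(x, x) for x in X}

def dom(R):
    return {a for (a, b) in R}

def ran(R):
    return {b for (a, b) in R}

R = {(1, 'a'), (2, 'a'), (3, 'b')}   # deliberately not injective
assert inverse(inverse(R)) == R                             # part (i)
S = compose(inverse(R), R)
assert all((y, x) in S for (x, y) in S)                     # part (ii): symmetric
assert identity(dom(R)) <= S                                # part (iii): superset of id
assert identity(ran(R)) <= compose(R, inverse(R))           # part (iv)
R2 = {('a', 'p'), ('b', 'q')}
assert inverse(compose(R2, R)) == compose(inverse(R), inverse(R2))   # part (v)
```

Because R relates both 1 and 2 to 'a', the composite R⁻¹ ∘ R properly contains id_Dom(R), matching the ⊇ (rather than =) in part (iii).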
∀x1, x2, ∀y, (((x1, y) ∈ R ∧ (x2, y) ∈ R) ⇒ x1 = x2).
9.6.18 Theorem:
(i) The composite of any two injective relations is an injective relation.
(ii) R⁻¹ ∘ R = id_Dom(R) if and only if R is injective, for any relation R.
(iii) R ∘ R⁻¹ = id_Range(R) if and only if R⁻¹ is injective, for any relation R.
Proof: For part (i), let R1 and R2 be injective relations. By Definition 9.6.2, the composite of R1 and R2
is the relation R = {(a, b); ∃c, ((a, c) ∈ R2 ∧ (c, b) ∈ R1)}. Suppose (a1, b) ∈ R and (a2, b) ∈ R. Then for
some c1 and c2, (a1, c1), (a2, c2) ∈ R2 and (c1, b), (c2, b) ∈ R1. Since R1 is an injective relation, c1 = c2. So
a1 = a2 because R2 is an injective relation. Hence R is an injective relation.
For part (ii), let R be an injective relation. Then
Line (9.6.1) follows from the injectivity of R. Now suppose that R is not injective. Then (a1, b), (a2, b) ∈ R
for some a1, a2 and b with a1 ≠ a2, and so (b, a1), (b, a2) ∈ R⁻¹. Therefore ∃b, ((a1, b) ∈ R ∧ (b, a2) ∈ R⁻¹).
So (a1, a2) ∈ R⁻¹ ∘ R. But (a1, a2) ∉ id_Dom(R). So R⁻¹ ∘ R ≠ id_Dom(R).
Line (9.6.2) follows from the injectivity of R⁻¹. Suppose that R⁻¹ is not injective. Then (a1, b), (a2, b) ∈ R⁻¹
for some a1, a2 and b with a1 ≠ a2, and so (b, a1), (b, a2) ∈ R. Therefore ∃b, ((a1, b) ∈ R⁻¹ ∧ (b, a2) ∈ R).
So (a1, a2) ∈ R ∘ R⁻¹. But (a1, a2) ∉ id_Range(R). So R ∘ R⁻¹ ≠ id_Range(R).
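Both directions of Theorem 9.6.18 (ii) can be seen on a pair of small examples, one injective and one not. The sketch below codes the injectivity condition straight from the displayed formula (helper names are illustrative):

```python
def compose(S, R):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def inverse(R):
    return {(b, a) for (a, b) in R}

def identity(X):
    return {(x, x) for x in X}

def dom(R):
    return {a for (a, b) in R}

def injective(R):
    """x1 = x2 whenever (x1, y) in R and (x2, y) in R."""
    return all(x1 == x2 for (x1, y1) in R for (x2, y2) in R if y1 == y2)

R_inj = {(1, 'a'), (2, 'b')}
R_non = {(1, 'a'), (2, 'a')}
assert injective(R_inj)
assert compose(inverse(R_inj), R_inj) == identity(dom(R_inj))   # equality holds
assert not injective(R_non)
assert compose(inverse(R_non), R_non) != identity(dom(R_non))   # proper superset only
```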
(v) If Dom(R) ⊆ ⋃C1 and Range(R) ⊆ ⋃C2, then R = ⋃_{A∈C1} ⋃_{B∈C2} (id_B ∘ R ∘ id_A).
Proof: For part (i), it follows from Notations 9.6.3 and 9.6.10 that
Parts (ii) and (iii) may be proved in a similar way to part (i).
For part (iv), ⋃_{A∈C1} ⋃_{B∈C2} (id_B ∘ R ∘ id_A) = ⋃_{A∈C1} ⋃_{B∈C2} (R ∩ (A × B)) by part (iii), which equals
R ∩ (⋃_{A∈C1} ⋃_{B∈C2} (A × B)) by Theorem 8.4.8 (iii), which equals R ∩ ((⋃C1) × (⋃C2)) by Theorem 9.4.7 (v).
Part (v) follows from part (iv) because R ⊆ Dom(R) × Range(R).
9.7.2 Definition: The direct product of two relations R1 and R2 is the relation:
{((x1, x2), (y1, y2)); (x1, y1) ∈ R1 and (x2, y2) ∈ R2}.
9.7.3 Notation: R1 × R2, for relations R1 and R2, denotes the direct product of R1 and R2.
Range(R1 × R2) = {b; ∃a, (a, b) ∈ R1 × R2}
= {(y1, y2); ∃(x1, x2), ((x1, x2), (y1, y2)) ∈ R1 × R2}
= {(y1, y2); ∃(x1, x2), ((x1, y1) ∈ R1 and (x2, y2) ∈ R2)}
= {(y1, y2); (∃x1, (x1, y1) ∈ R1) and (∃x2, (x2, y2) ∈ R2)}
= {(y1, y2); y1 ∈ Range(R1) and y2 ∈ Range(R2)}
= Range(R1) × Range(R2).
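Definition 9.7.2 and the identities proved around it (e.g. Range(R1 × R2) = Range(R1) × Range(R2), and R1⁻¹ × R2⁻¹ = (R1 × R2)⁻¹) can be checked on finite examples. A Python sketch with illustrative names:

```python
def direct_product(R1, R2):
    """Definition 9.7.2: {((x1, x2), (y1, y2)); (x1, y1) in R1 and (x2, y2) in R2}."""
    return {((x1, x2), (y1, y2)) for (x1, y1) in R1 for (x2, y2) in R2}

def dom(R):
    return {a for (a, b) in R}

def ran(R):
    return {b for (a, b) in R}

def inverse(R):
    return {(b, a) for (a, b) in R}

R1 = {(1, 'a'), (2, 'b')}
R2 = {(3, 'c'), (3, 'd')}
P = direct_product(R1, R2)
assert dom(P) == {(x1, x2) for x1 in dom(R1) for x2 in dom(R2)}
assert ran(P) == {(y1, y2) for y1 in ran(R1) for y2 in ran(R2)}
assert inverse(P) == direct_product(inverse(R1), inverse(R2))
```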
9.7.6 Remark: Alternative expression for the direct product of two relations.
By Theorem 9.7.5, one may write:
Proof: For part (i), it follows from Definitions 9.6.2 and 9.7.2 that for all
((x1, x2), (y1, y2)) ∈ Dom((S1 ∘ R1) × (S2 ∘ R2)) × Range((S1 ∘ R1) × (S2 ∘ R2)),
((x1, x2), (y1, y2)) ∈ (S1 ∘ R1) × (S2 ∘ R2)
⇔ (x1, y1) ∈ (S1 ∘ R1) ∧ (x2, y2) ∈ (S2 ∘ R2)
⇔ (∃z1, ((x1, z1) ∈ R1 ∧ (z1, y1) ∈ S1)) ∧ (∃z2, ((x2, z2) ∈ R2 ∧ (z2, y2) ∈ S2))
⇔ ∃(z1, z2), ((x1, z1) ∈ R1 ∧ (x2, z2) ∈ R2 ∧ (z1, y1) ∈ S1 ∧ (z2, y2) ∈ S2)
⇔ ∃(z1, z2), (((x1, x2), (z1, z2)) ∈ R1 × R2 ∧ ((z1, z2), (y1, y2)) ∈ S1 × S2)
⇔ ((x1, x2), (y1, y2)) ∈ (S1 × S2) ∘ (R1 × R2).
Hence (S1 ∘ R1) × (S2 ∘ R2) = (S1 × S2) ∘ (R1 × R2).
For part (ii), it follows from Definitions 9.6.2, 9.6.12 and 9.7.2 that for all
((x1, x2), (y1, y2)) ∈ Dom(R1⁻¹ × R2⁻¹) × Range(R1⁻¹ × R2⁻¹),
((x1, x2), (y1, y2)) ∈ R1⁻¹ × R2⁻¹ ⇔ (x1, y1) ∈ R1⁻¹ ∧ (x2, y2) ∈ R2⁻¹
⇔ (y1, x1) ∈ R1 ∧ (y2, x2) ∈ R2
⇔ ((y1, y2), (x1, x2)) ∈ R1 × R2
⇔ ((x1, x2), (y1, y2)) ∈ (R1 × R2)⁻¹.
Hence R1⁻¹ × R2⁻¹ = (R1 × R2)⁻¹.
Part (iii) follows from parts (i) and (ii).
9.7.9 Remark: Common-domain direct products of relations.
The common-domain direct products of functions in Definition 10.15.2 may be generalised to relations, but
in the absence of any obvious applications to differential geometry, they are not defined here.
a partition is defined in Section 8.6.
9.8.2 Definition: An equivalence relation on a set X is a relation R on X such that
(i) ∀x ∈ X, x R x; [reflexivity]
(ii) ∀x, y ∈ X, (x R y ⇒ y R x); [symmetry]
(iii) ∀x, y, z ∈ X, ((x R y and y R z) ⇒ x R z). [transitivity]
9.8.3 Remark: Equi-informational definitions of partitions and equivalence relations.
Theorem 9.8.4 could be thought of as the fundamental theorem of partitions and equivalence relations.
Partitions and equivalence relations are equi-informational in much the same way that subsets of sets and
indicator functions on sets are equi-informational. (See Remark 14.6.6 for the equi-informationality of the
set of subsets P(S) and the set of functions 2^S for any set S.)
Another structure which is equi-informational to a partition of X or an equivalence relation on X is a
function defined on X. This function defines an equivalence kernel on X which is an equivalence relation.
(See Definition 10.16.7.)
9.8.4 Theorem: Interdefinability of partitions and equivalence relations.
(i) Let S be a partition of a set X. Then the relation R on X defined by
∀x, y ∈ X, (x R y ⇔ ∃A ∈ S, (x ∈ A and y ∈ A)) (9.8.1)
is an equivalence relation on X.
(ii) Let R be an equivalence relation on a set X. Then
{A ∈ P(X) \ {∅}; (∀x, y ∈ A, x R y) and (∀x ∈ A, ∀y ∈ X \ A, ¬(x R y))} (9.8.2)
is a partition of X.
Proof: For part (i), let S be a partition of a set X, and define a relation R on X as in line (9.8.1). Let
x ∈ X. Then x ∈ A for some A ∈ S by Definition 8.6.11 (i). So ∃A ∈ S, (x ∈ A and x ∈ A). Therefore x R x,
which verifies Definition 9.8.2 (i). The symmetry condition Definition 9.8.2 (ii) follows from the symmetry
of the form of the relation in line (9.8.1). Now suppose that x, y, z ∈ X satisfy x R y and y R z. Then
∃A ∈ S, (x ∈ A and y ∈ A) and ∃B ∈ S, (y ∈ B and z ∈ B). By Definition 8.6.11 (ii), either A = B or
A ∩ B = ∅. If A = B, then x ∈ A and z ∈ B = A, and so x R z. But if A ∩ B = ∅, then either y ∉ A or y ∉ B,
and so either x R y is false or y R z is false, which contradicts the assumption. Thus Definition 9.8.2 (iii) is
verified. Hence R is an equivalence relation on X.
For part (ii), let R be an equivalence relation on X. Let S be the subset of P(X) \ {∅} as indicated in
line (9.8.2). Then ⋃S ⊆ X. Let x ∈ X. Let A = {y ∈ X; x R y}. Then A ⊆ X, and A ≠ ∅ because x ∈ A
by Definition 9.8.2 (i). So A ∈ P(X) \ {∅}. Let x, y ∈ A. Then x R y by the definition of A. Let x ∈ A and
y ∈ X \ A. Then x R y is false by the definition of A. So A ∈ S. Therefore X ⊆ ⋃S and so ⋃S = X,
which verifies Definition 8.6.11 (i). Now let A, B ∈ S satisfy A ∩ B ≠ ∅. Then x ∈ A ∩ B for some x ∈ X.
Let y ∈ A. Then x R y by line (9.8.2). Suppose that y ∉ B. Then x R y is false by line (9.8.2). So A ⊆ B.
Similarly B ⊆ A. So A = B. This verifies Definition 8.6.11 (ii). Hence S is a partition of X.
9.8.5 Definition: The quotient set of a set X with respect to an equivalence relation R is the set of
equivalence classes of X with respect to R.
9.8.6 Notation: X/R, for an equivalence relation R on a set X, denotes the quotient (set) of X with
respect to R.
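Both directions of Theorem 9.8.4 can be illustrated computationally for a small set: line (9.8.1) turns a partition into a relation, and the equivalence classes A = {y ∈ X; x R y} used in the proof of part (ii) turn the relation back into a partition, which is the quotient set X/R of Definition 9.8.5. A Python sketch, with illustrative function names:

```python
def partition_to_relation(S):
    """Line (9.8.1): x R y iff some block A in S contains both x and y."""
    return {(x, y) for A in S for x in A for y in A}

def quotient_set(X, R):
    """X/R: the set of equivalence classes {y in X; x R y} for x in X."""
    return {frozenset(y for y in X if (x, y) in R) for x in X}

X = {1, 2, 3, 4, 5}
S = {frozenset({1, 2}), frozenset({3}), frozenset({4, 5})}
R = partition_to_relation(S)

# R is reflexive, symmetric and transitive (Definition 9.8.2):
assert all((x, x) in R for x in X)
assert all((y, x) in R for (x, y) in R)
assert all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)

# The equivalence classes recover the original partition:
assert quotient_set(X, R) == S
```

This round trip is exactly the equi-informationality discussed in Remark 9.8.3: the partition and the equivalence relation carry the same information in different forms.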
Chapter 10
Functions
10.18 Indexed set covers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
10.19 Extended notations for function-sets and map-rules . . . . . . . . . . . . . . . . . . . . . 335
well-defined operation on all sets. But the set of all sets is not a set. This kind of logic-layer set-theoretic
function is not the subject of Section 10.1.
A reasonable name for the kind of function in part (i) would be a function-predicate by analogy with the
relation-predicate alluded to in Remark 9.1.1. Then a reasonable name for the kind of function in part (ii)
would be a function-set by analogy with the corresponding relation-set.
10.1.3 Remark: The active nature of functions.
Functions are generally represented in set theory as particular kinds of sets, but they are usually thought
of as being a separate class of object, something like a machine which produces outputs for given inputs.
The representation of functions as sets is an economical measure which keeps the number of object classes
low. (Conceptual economy is discussed by Halmos [307], section 6. Dissatisfaction with the passive definition
of a function as a set is discussed by Halmos [307], section 8.)
10.1.4 Remark: Function procedures versus function look-up tables.
It may be that the feeling which mathematicians have that a function is different to a set is due to the
historical origins of functions. In the olden days, for instance, the square of an integer x was defined by a
procedure of multiplication of x by itself, which was an active process of generating one number from another.
But the set definition of the square function is more like a look-up table in computing. The set definition is
a set of ordered pairs (x, x2 ). So to calculate the square of a number with the set definition, you look up the
value in the set of ordered pairs. In a more active definition of functions, you would specify an algorithm or
procedure. It seems to have been necessary historically to abandon functions defined as procedures in favour
of functions defined as look-up tables in order to remove an arbitrary limitation on the set of functions that
one can discuss. The down side of this has been the loss of active mood in the definition of functions.
10.1.5 Remark: Other terminology for functions.
The nouns function and map are used synonymously in this book. In many contexts, the word map
indicates a function between sets which are peers in some sense (such as differentiable manifolds), whereas
function is then used to indicate a more light-weight function such as a real-valued function. The noun
mapping is synonymous with map, and a family is really the same thing as a general function except
that it is thought about differently. All of these synonyms for function are useful for putting the focus on
different aspects of functions.
10.1.6 Remark: Historical origin of the word function.
The word function in the mathematical sense was apparently introduced by Leibniz. Bell [213], page 98, says:
The word function (or its Latin equivalent) seems to have been introduced into mathematics by Leibniz in
1694; the concept now dominates much of mathematics and is indispensable in science. Since Leibniz's time
the concept has been made precise.
In an important 1875 paper, Darboux [166], page 59, felt it necessary to strongly assert that a well-defined
function f is any correspondence of a unique value f (x) to each value of x, and that each individual value
f(x1), f(x2), ... may be defined in a completely arbitrary manner. This was at a time when it was still
believed that all continuous functions are differentiable because the notion of a function was much narrower
than is now accepted. Some impression of the gulf separating the concept of a function 130 years ago from
the present day may be gained from this opening paragraph of a paper by Darboux [166], page 57, where he
referred to an 1854 treatise by Riemann [178], posthumously published in 1868.
Jusqu'à l'apparition du Mémoire de Riemann sur les séries trigonométriques aucun doute ne s'était
élevé sur l'existence de la dérivée des fonctions continues. D'excellents, d'illustres géomètres, au
nombre desquels il faut compter Ampère, avaient essayé de donner des démonstrations rigoureuses
de l'existence de la dérivée. Ces tentatives étaient loin sans doute d'être satisfaisantes; mais, je
le répète, aucun doute n'avait été formulé sur l'existence même d'une dérivée pour les fonctions
continues.
This may be translated as follows.
Until the appearance of Riemann's memoir on trigonometric series, no doubt had been raised about
the existence of the derivative of continuous functions. Excellent, illustrious geometers, amongst
whose number one must count Ampere, had tried to give rigorous demonstrations of the existence
of the derivative. Without doubt, these attempts were far from being satisfactory; but, I repeat,
no doubt had been formulated about the existence itself of a derivative for continuous functions.
10.2.4 Remark: Separation of conditions for functions into existence and uniqueness conditions.
A relation f is a function f : X → Y if and only if ∀x ∈ X, ∃!y ∈ Y, (x, y) ∈ f. This can be expressed in
terms of set cardinality as ∀x ∈ X, #{y ∈ Y; (x, y) ∈ f} = 1. In other words, there is one and only one y
for each x ∈ X such that (x, y) ∈ f. In terms of the set-map f in Definition 10.6.3 (i), the condition for a
relation f to be a function f : X → Y may be written as ∀x ∈ X, #(f({x})) = 1.
The first two conditions in Definition 10.2.2 may be expressed in terms of image set cardinality as follows.
(i′) ∀x ∈ X, #{y; (x, y) ∈ f} ≤ 1. [uniqueness]
(ii′) ∀x ∈ X, #{y; (x, y) ∈ f} ≥ 1. [existence]
If condition (i′) is not required, the relation is a multiple-valued function. If condition (ii′) is not required,
the relation is a partial function. Neither of these classes of function has the huge importance of the
well-defined function in Definition 10.2.2. The reason for the core significance of well-defined functions
in mathematics is the fact that one may use the word the for the value of the function. One may give a
name and a notation to an object which exists and is unique. A function introduces a unique object into a
mathematical discussion for each element of the domain. A very large proportion of definitions and notations
depend for their well-definition on the existence and uniqueness of function values. This point cannot be
over-emphasised. Existence and uniqueness proofs are two of the core preoccupations of the mathematician,
especially the pure mathematician. When one says that something is well defined, one generally means that
it exists and is unique. The two conditions in the definition of a function are therefore constantly employed
in all of mathematics.
10.2.5 Remark: Domain, range and image of functions are inherited from relations.
The domain, range and image of a function are defined exactly as for relations in Definition 9.5.4. The
notations Dom(f ), Range(f ) and Img(f ) are defined for functions exactly as for relations in Notation 9.5.6.
Likewise, the definition and notation for graph(f ), the graph of a function f , are inherited from general
relations, namely from Definition 9.5.20 and Notation 9.5.21. (Since graph(f ) = f for any function f , this
definition and notation are, of course, totally redundant, as discussed in Remarks 9.5.18 and 9.5.19.)
10.2.8 Notation: f(x) denotes the value of a function f for an argument x ∈ Dom(f).
f_x denotes the value of a function f for an argument x ∈ Dom(f).
(f_i)_{i∈X} is an alternative notation for a function f with domain X.
10.2.9 Theorem: Let f and g be functions. Then f = g if and only if ∀x, (f(x) = g(x)).
Proof: Let f = g. Let x be a set. Let y = f(x) and z = g(x). Then (x, y) ∈ f and (x, z) ∈ g = f by
Definition 10.2.7. So y = z by Definition 10.2.2 (i). Thus f(x) = g(x). Therefore ∀x, (f(x) = g(x)).
Now assume ∀x, (f(x) = g(x)). Let p ∈ f. Then p = (a, b) for some a ∈ Dom(f) and b ∈ Range(f) by
Definitions 10.2.2 and 9.5.2. So b is the value of f for argument a by Definition 10.2.7. Therefore b = f(a)
by Notation 10.2.8. So b = g(a) by the assumption. So (a, b) ∈ g by Definition 10.2.7 and Notation 10.2.8.
Consequently f ⊆ g. Similarly g ⊆ f. Hence f = g by Theorem 7.3.5 (iii).
Mathematics texts often talk about a function of several variables. There is no specific definition of this
concept as a kind of set-construction in ZF set theory. Presumably one could define such set-constructions
by generalising ordered pairs to ordered tuples, but usually the single domain set is simply replaced by a
Cartesian set-product of some kind. (By contrast, two-parameter functions are explicitly defined in first-
order languages, where the set membership predicate, for example, is a function with two set-arguments, not
a function with a single set-pair as an argument. But the domain of the set-membership predicate is not a
ZF set, nor is it a pair of ZF sets.)
A function of two variables, where the variables are in sets X1 and X2, and the range is a set Y, may
be formalised as a function with domain X = X1 × X2 and range Y. Thus there is no difference between
a function with a single variable x = (x1, x2) ∈ X = X1 × X2 and a function of two variables x1 ∈ X1
and x2 ∈ X2. The general definition of functions of more than two variables requires the construction of
Cartesian products of finite families of sets. (See Section 10.11.)
10.2.11 Remark: Notation for the set of functions with a given domain and range.
The function set Y^X is the set of functions f such that Dom(f) = X and Range(f) ⊆ Y.
10.2.12 Definition: The function set from X to Y , for sets X and Y , is the set of functions from X to Y .
10.2.13 Notation: Y^X, for sets X and Y, denotes the function set from X to Y.
10.2.14 Notation: The template {f : X → Y; P(f)} denotes the set {f ∈ Y^X; P(f)} for any sets X and
Y and any set-theoretic formula P.
10.2.16 Remark: Alternative notations for the set of functions with a given domain and range.
If ⊤ denotes the always-true set-theoretic formula with zero arguments (as in Notation 5.1.10), then the set
Y^X may be written in terms of Notation 10.2.14 as {f : X → Y; ⊤}. But then the function name f does
not appear in the expression P(f) = ⊤. So it seems superfluous to write the letter f at all. (One could make
use here of the single-parameter always-true logical predicate which is alluded to in Remark 5.1.9.) The set
Y^X is sometimes written as {f : X → Y}, which also contains the superfluous function name f.
In this book, the notation X → Y is proposed as an equivalent for Y^X. (See Notation 10.19.2.) This
notation has the advantage that it avoids the superfluous function name, but it is slightly non-standard.
One place where the author has found this sort of notation is in a computer software user manual for the
Isabelle/HOL proof assistant for higher-order logic [324], page 5.
10.2.19 Theorem: Let X and Y be sets, and let f : X → Y. Then f = ∅ if and only if X = ∅.
Proof: Let X and Y be sets, and let f : X → Y. Suppose that f = ∅. Then Dom(f) = {x; ∃y, (x, y) ∈ f} =
{x; ∃y, (x, y) ∈ ∅}. But ∀z, z ∉ ∅ by Definition 7.6.3. So ∀x, ∀y, (x, y) ∉ ∅. Therefore ∀x, ¬(∃y, (x, y) ∈ ∅)
by Theorem 6.6.7 (v). So ∀x0, x0 ∉ {x; ∃y, (x, y) ∈ ∅}. Therefore {x; ∃y, (x, y) ∈ ∅} = ∅ by Definition 7.6.3.
Hence X = ∅.
Now suppose that X = ∅. Then Dom(f) = ∅. That is, {x; ∃y, (x, y) ∈ f} = ∅. So ∀x, ¬(∃y, (x, y) ∈ f).
Therefore ∀x, ∀y, (x, y) ∉ f by Theorem 6.6.10 (v). But all elements of a function are ordered pairs. So
∀z, z ∉ f. Hence f = ∅ by Definition 7.6.3.
Whichever choice one makes will break one rule or the other (or both). Using some analysis, one may
resolve the issue by observing that lim_{x→0+} x^x = lim_{x→0+} e^{x ln x} = exp(lim_{x→0+} x ln x) = 1. Coincidentally,
Theorem 10.2.21 gives 1 as the cardinality of the set ∅^∅. This coincidence seems to justify the answer 0^0 = 1.
Proof: By Theorem 10.2.19, if f = ∅ then f : ∅ → ∅. In other words, ∅ ∈ ∅^∅. To show that this is the
only element of ∅^∅, let f ∈ ∅^∅. Then f = ∅ by Definitions 10.2.2 and 9.5.2. So ∅^∅ contains
one and only one element, namely the empty set ∅.
10.2.22 Theorem:
(i) ∅^X = ∅ for any non-empty set X.
(ii) Y^∅ = {∅} for any set Y.
(iii) Y^({x}) = {{(x, y)}; y ∈ Y} for any object x and set Y.
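For finite sets X and Y, the whole function set Y^X can be enumerated as sets of ordered pairs, which makes the degenerate cases in Theorems 10.2.21 and 10.2.22 easy to check. This sketch is one illustrative enumeration strategy, not the book's construction:

```python
from itertools import product

def function_set(X, Y):
    """Y^X: all functions from X to Y, each modelled as a frozenset of pairs."""
    X = list(X)
    return {frozenset(zip(X, values)) for values in product(Y, repeat=len(X))}

empty = set()
assert function_set(empty, {1, 2}) == {frozenset()}      # Y^X = {empty function} when X is empty
assert function_set({1, 2}, empty) == set()              # no functions into the empty set, part (i)
assert function_set(empty, empty) == {frozenset()}       # one element, matching 0^0 = 1
assert len(function_set({1, 2}, {'a', 'b', 'c'})) == 9   # #(Y^X) = (#Y)^(#X)
```

When X is empty, `product(Y, repeat=0)` yields exactly one empty tuple, so the empty function is the sole element of Y^∅ even when Y itself is empty, which is the content of Theorem 10.2.21.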
10.2.23 Definition: The identity function on a set X is the function f : X → X with ∀x ∈ X, f(x) = x.
choice functions cannot be proved to exist (without AC), it is usually a simple matter to put preconditions
on theorems or definitions to require choice functions to be provided as a prerequisite to make the theorems
or definitions valid.
Adding a condition to a theorem that a suitable kind of choice function must exist for the theorem to be
valid is very similar to requiring a number to be non-zero before dividing another number by it. This is
requiring the multiplicative inverse of the number to exist. But choice functions are the same (or essentially
the same) as right inverses of given surjective functions. (See Remark 7.11.12 (1) and Theorem 10.5.14 for
this equivalence. See Definition 10.5.2 for surjective functions.) So an axiom of choice is always guaranteeing
that a right inverse of a function exists. If mathematicians can be disciplined enough to require a number
to be non-zero to make a theorem valid, then surely requiring a function to have a right inverse is not much
more onerous.
A very large proportion of all the hard problems in mathematics, which require so much time, effort, ingenuity
and luck to solve, sometimes for a century or more, are inverse problems. Solving any algebraic, differential
or integral equation, for example, is solving an inverse problem. It could therefore be perceived as extremely
lazy for someone to say that a two-line axiom gives them an inverse to any surjective function without
making any effort at all, whereas so many mathematicians devote their whole lives and creativity to finding
inverses. By renouncing all axioms of choice, mathematicians would be forced to construct inverses instead
of merely fantasising them into existence. Making this effort sometimes reveals features of the inverses and
their construction which had not previously been noticed, and which may lead to further developments.
The choice-function and choice-set formulations are equivalent, but it is the choice-function version which
is generally the most applicable and most comprehensible. The first application of the axiom of choice is to
Theorem 10.5.14, which makes much more sense in terms of choice functions than choice sets.
10.3.3 Remark: Styles of choice sets and choice functions.
The styles of choice sets and choice functions include the following.
(1) Set-collection choice set. Given a pairwise disjoint set of non-empty sets X:
Choose a set C such that ∀A ∈ X, ∃!z ∈ C, z ∈ A. (See Definition 7.11.4.)
(2) Set-collection choice function. Given a set of non-empty sets X:
Choose a function f : X → ⋃X such that ∀A ∈ X, f(A) ∈ A. (See Definition 10.3.4.)
(3) Power-set choice function. Given a set X:
Choose a function f : P(X) \ {∅} → X such that ∀A ∈ P(X) \ {∅}, f(A) ∈ A. (See Definition 10.3.5.)
(4) Set-family choice function. Given a family of non-empty sets (X_α)_{α∈I}:
Choose a function f : I → ⋃_{α∈I} X_α such that ∀α ∈ I, f(α) ∈ X_α. In other words, choose f ∈ ∏_{α∈I} X_α.
(See Definition 10.11.8.)
The axiom of choice in Definition 7.11.10 (9) is the hypothesis that every eligible set-collection (i.e. pairwise
disjoint set of non-empty sets) in ZF set theory has a set-collection choice set. Style (4), which requires the
definition of families of sets in Section 10.8, is equivalent to choosing an element of the Cartesian set-product
∏_{A∈X} A when it is non-empty. (Cartesian set-products of set-families are introduced in Section 10.11.)
The AC version which asserts that every Cartesian product of non-empty sets is non-empty, often called
the multiplicative axiom, is sometimes given as the standard AC definition, possibly because it is so
intuitively appealing. Both the power-set and set-family choice function formulations were given in 1904 by
Zermelo [389]. (The choice-set version of AC in Definition 7.11.10 (9) is called the multiplicative axiom by
E. Mendelson [320], page 198, but this disagrees with the terminology of most authors.)
10.3.4 Definition: A set-collection choice function for a set of non-empty sets S is a function f : S → ⋃S
which satisfies ∀A ∈ S, f(A) ∈ A.
10.3.5 Definition: A power-set choice function for a set X is a function f : P(X) \ {∅} → X which
satisfies ∀A ∈ P(X) \ {∅}, f(A) ∈ A.
10.3.6 Remark: Power-set choice functions.
Definition 10.3.5 is the special case of Definition 10.3.4 with S = P(X) \ {∅}. (Definition 10.3.5 is illustrated
in Figure 10.3.1.) Therefore for any set X, any set-collection choice function for P(X) \ {∅} is a power-set
choice function for X.
Figure 10.3.1 Choice of one element f(Aᵢ) for every subset Aᵢ of a given set X
Conversely, let S be a set of non-empty sets, and let X = ⋃S. Then S ⊆ P(X) by Theorem 8.4.14 (vi). Let
f be a power-set choice function for X, and let g = {(x, y) ∈ f ; x ∈ S}. (That is, g is the restriction of f
to S as in Definition 10.4.2.) Then g is a function with domain S, and g(A) ∈ A for all A ∈ S. So g is a
set-collection choice function for S.
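The restriction construction above can be traced on a small finite example. The following Python sketch (editorial, ad-hoc names) builds a power-set choice function for X = ⋃S and restricts it to S:

```python
# Sketch: a power-set choice function for X = union(S), restricted to S,
# yields a set-collection choice function for S, as in the remark above.
from itertools import combinations

S = {frozenset({1, 2}), frozenset({3})}           # a set of non-empty sets
X = set().union(*S)                               # X = union of S
nonempty_subsets = [frozenset(c) for r in range(1, len(X) + 1)
                    for c in combinations(sorted(X), r)]
f = {A: min(A) for A in nonempty_subsets}         # power-set choice function for X
g = {A: y for A, y in f.items() if A in S}        # g = restriction of f to S
```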
10.3.7 Remark: Axiom of choice for power-set choice functions.
An axiom of choice may be stated in terms of power-set choice functions as:
∀X, ∃f : P(X) \ {∅} → X, ∀A ∈ P(X) \ {∅}, f(A) ∈ A.
To be suitable for use as an axiom in Section 7.11, this would have to be expanded to a much lower-level
set-theoretic formula. For this, an ordered pair (x, y) must be written as {{x}, {x, y}}, and the condition
(x, y) ∈ f must be expanded accordingly.
This expansion must be performed for the expression (x, z) ∈ f also. But this is not the end of the story.
The expression A ⊆ X may be written as ∀a ∈ A, a ∈ X, which is simple enough. If f is a function,
then f(A) ∈ A may be written as:
∃p, (p ∈ f ∧ ((∀a ∈ p, A ∈ a) ∧ (∃a ∈ p, ∃b ∈ a, b ∈ A))).
But there's more. To ensure that f really is a function, all elements of f must also be required to be ordered
pairs. The proposition ∀p ∈ f, ∃x, ∃y, p = (x, y) may be written as:
∀p ∈ f, ∃x, ∃y, ((∃a ∈ p, x ∈ a) ∧ (∃a ∈ p, y ∈ a) ∧ (∀a ∈ p, x ∈ a) ∧ (∀a ∈ p, ∀b ∈ a, (b = x ∨ b = y))).
It is fairly clear that the axiom of choice for power-set choice functions cannot be written on a single line as
a set-theoretic formula. (It seems to require about five lines.) This is ample reason to use the set-collection
choice sets for Definition 7.11.10, where the axiom requires only two lines as a set-theoretic formula. It is
difficult to believe that a five-line set-theoretic formula could be a fundamental property of sets.
10.4.1 Definition: A restriction of a function f is any function g which satisfies g ⊆ f.
10.4.2 Definition: The restriction of a function f to a set A is the function {(x, y) ∈ f ; x ∈ A}.
10.4.3 Notation: f↾A, for a function f and set A, denotes the restriction of f to A.
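Since a function here is literally a set of ordered pairs, Definition 10.4.2 can be transcribed directly. A Python sketch (editorial, ad-hoc names):

```python
# A function as a set of ordered pairs, restricted to a set A (Definition 10.4.2).
def restrict(f, A):
    """Return the restriction f|A = {(x, y) in f : x in A}."""
    return {(x, y) for (x, y) in f if x in A}

f = {(1, "a"), (2, "b"), (3, "c")}
g = restrict(f, {1, 3})
```

Any such g is a subset of f, so it is also a restriction in the sense of Definition 10.4.1.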
Proof: The assertions follow directly from Definition 10.4.2 and Notation 10.4.3.
10.4.5 Remark: Mild clash of terminology between function restrictions and relation restrictions.
The restriction of a function f to a set A in Definition 10.4.2 is the same as the domain restriction of the
relation f which is given in Definition 9.6.19. Thus there is a mild clash of terminology here. The restriction
of a relation to a set in Definition 9.6.19 restricts both the domain and range to the same set. However,
the relation restriction definition specifies that the source and target sets of the relation are the same. (This
is in itself a source of ambiguity, since source and target sets are highly non-unique, being defined in each
context for convenience, as a kind of hint.)
Definition 10.4.9 defines extensions of functions so that g is an extension of f if and only if f is a restriction
of g. By contrast with restrictions of functions, there is no such thing as the extension of a function f to a
set A which includes the domain of f, because obviously this would not be unique.
10.4.10 Definition: An extension of a function f to a set A such that Dom(f) ⊆ A is any function g
which satisfies g ⊇ f and has domain A.
In this example, the variable x is bound to the partial differential operator. After the differentiation is exe-
cuted, the result becomes a function of a free variable x, which is then substituted with the expression φ(p).
The substitution cannot occur before differentiation is carried out because substitution for x in the partial
differential operator would be very difficult to interpret. Therefore it is understood that the expression φ(p)
is substituted after differentiation is carried out.
The difficulty arises here because of an anomaly in the standard way in which differentiation is denoted in
the literature. Notations with a form similar to (d/dx) cos²(x) commonly signify the result of carrying
out a differentiation on a function, not just its value at x, and then regarding the result as a functional
expression for the same variable x. Thus the notation d/dx effectively means that the following expression
is expected to be an inline definition of a function (which may contain multiple instances of the variable x,
as in x² + x), which in this example is the composition of the cosine and square functions. Then this
inline-defined function is to be differentiated, and the resulting function is understood to have the same
variable x. A more formally correct notation would be d(x ↦ cos²(x))(x), where the inline function is now
defined explicitly as x ↦ cos²(x), then this is differentiated by d, and the resulting derivative is evaluated
at x. Sadly history did not go down this path. But if it had, one could have written, instead of the expression
in line (10.4.3), the expression
∂j (x ↦ ∂i (φ⁻¹(x)))(φ(p)).
Thus Notation 10.4.12 may be thought of as a post-evaluation expression substitution notation.
Notation 10.4.12 must not be confused with function restriction. The notation F(x)|x=a denotes substitution
of a value a, which yields a value for the expression F when it is restricted to the set {a} ⊆ Dom(F). Clearly
F(a) is not exactly the same thing as F↾{a}, which equals the set {(a, F(a))}.
10.4.12 Notation: F(x)|x=a, for an expression F with variable x, and an expression a, denotes the result
of substituting a for x into F after it has been evaluated.
10.4.14 Definition: The composition or composite of two functions f and g such that Range(f) ⊆ Dom(g)
is the function h defined by h(x) = g(f(x)) for all x ∈ Dom(f).
10.4.15 Notation: g ∘ f denotes the composition of two functions f and g.
10.4.18 Theorem: Let f, g, h be functions with Range(f) ⊆ Dom(g) and Range(g) ⊆ Dom(h).
(i) (h ∘ g) ∘ f and h ∘ (g ∘ f) are well-defined functions with domain equal to Dom(f).
(ii) (h ∘ g) ∘ f = h ∘ (g ∘ f).
Proof: For part (i), let Range(f) ⊆ Dom(g) and Range(g) ⊆ Dom(h). Then h ∘ g and g ∘ f are well-
defined functions, and Dom(h ∘ g) = Dom(g) and Range(g ∘ f) ⊆ Range(g). So Range(f) ⊆ Dom(h ∘ g) and
Range(g ∘ f) ⊆ Dom(h). Hence (h ∘ g) ∘ f and h ∘ (g ∘ f) are well-defined functions with Dom((h ∘ g) ∘ f) =
Dom(f) and Dom(h ∘ (g ∘ f)) = Dom(g ∘ f) = Dom(f).
For part (ii), let x ∈ Dom(f). Then ((h ∘ g) ∘ f)(x) = (h ∘ g)(f(x)) = h(g(f(x))) by Definition 10.4.14.
Similarly (h ∘ (g ∘ f))(x) = h((g ∘ f)(x)) = h(g(f(x))). Thus ∀x ∈ Dom(f), ((h ∘ g) ∘ f)(x) = (h ∘ (g ∘ f))(x).
Hence (h ∘ g) ∘ f = h ∘ (g ∘ f).
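The associativity argument can be checked mechanically on finite functions. A Python sketch (editorial, ad-hoc names), representing each function as a dict:

```python
# Finite check of Theorem 10.4.18: composition of functions (as dicts) is associative.
def compose(g, f):
    """Return g o f, defined when Range(f) is a subset of Dom(g)."""
    assert set(f.values()) <= set(g.keys())
    return {x: g[f[x]] for x in f}

f = {1: "a", 2: "b"}
g = {"a": 10, "b": 20}
h = {10: "x", 20: "y"}
lhs = compose(compose(h, g), f)   # (h o g) o f
rhs = compose(h, compose(g, f))   # h o (g o f)
```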
Figure 10.5.1 (diagram not reproduced): relation; partial function (#(y) ≤ 1); function; injective, surjective
and bijective function (#(x) = 1, #(y) = 1).
The inequalities in Figure 10.5.1 are abbreviations, where #(x) means #{x; (x, y) ∈ f} and #(y) means
#{y; (x, y) ∈ f}, and the inequalities indicate constraints on the numbers of values as in Remark 10.5.20.
(See Chapter 13 for cardinality.) Thus ≤ 1 indicates uniqueness, ≥ 1 indicates existence, and = 1
indicates unique existence.
10.5.2 Definition:
A function f : X → Y is injective when ∀x₁, x₂ ∈ X, (f(x₁) = f(x₂) ⟹ x₁ = x₂).
A function f : X → Y is surjective when ∀y ∈ Y, ∃x ∈ X, f(x) = y.
A function f : X → Y is bijective when it is injective and surjective.
An injection from a set X to a set Y is a function f : X → Y which is injective.
A surjection from a set X to a set Y is a function f : X → Y which is surjective.
A bijection from a set X to a set Y is a function f : X → Y which is bijective.
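For finite functions these definitions are decidable properties. A Python sketch (editorial, ad-hoc names), with f : X → Y stored as a dict:

```python
# Definition 10.5.2 for finite functions f : X -> Y stored as dicts.
def is_injective(f):
    return len(set(f.values())) == len(f)          # distinct x give distinct f(x)

def is_surjective(f, Y):
    return set(f.values()) == set(Y)               # every y in Y is attained

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

f = {1: "a", 2: "b"}
```

Note that surjectivity is a property of f together with a target set Y, not of the pair-set f alone.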
Proof: By Definition 10.2.23 and Notation 10.2.25, Dom(idX) = Range(idX) = X. Injectivity follows
from the observation that if idX(x₁) = idX(x₂), then x₁ = x₂ by the definition of idX. For surjectivity, let
y ∈ X. Let x = y. Then idX(x) = y. So idX : X → X is a bijection by Definition 10.5.2.
Proof: For part (i), let f : X → Y and g : Y → Z be injections. Let g(f(x₁)) = g(f(x₂)) for some
x₁, x₂ ∈ X. Then f(x₁) = f(x₂) because g is injective. So x₁ = x₂ because f is injective. Hence g ∘ f is an
injection.
For part (ii), let f : X → Y and g : Y → Z be surjections. Let h = g ∘ f. Let z ∈ Z. Then z = g(y)
for some y ∈ Y because Range(g) = Z. So y = f(x) for some x ∈ X because Range(f) = Y. Thus
∀z ∈ Z, ∃x ∈ X, z = g(f(x)). Therefore Range(g ∘ f) ⊇ Z. But Range(g ∘ f) ⊆ Z. So Range(g ∘ f) = Z by
Theorem 7.3.5 (iii). Hence g ∘ f : X → Z is a surjection.
Part (iii) follows from parts (i) and (ii).
is a preposition requiring an object. Since onto does not require a grammatical object in mathematical
English, it suffers from the same hazards as the word surjective.
10.5.7 Theorem: Let f : X → Y be a function. Then f : X → Y is a bijection if and only if its inverse
relation f⁻¹ is a function from Y to X.
Now suppose that f : X → Y is a function and its inverse relation f⁻¹ is a function from Y to X. Then by
Definition 10.2.2 (i), ∀x, y₁, y₂, (((x, y₁) ∈ f⁻¹ ∧ (x, y₂) ∈ f⁻¹) ⟹ y₁ = y₂). Therefore f has the property
∀y, x₁, x₂, (((x₁, y) ∈ f ∧ (x₂, y) ∈ f) ⟹ x₁ = x₂). So ∀y, x₁, x₂, ((f(x₁) = y ∧ f(x₂) = y) ⟹ x₁ = x₂),
which is equivalent to ∀x₁, x₂, (f(x₁) = f(x₂) ⟹ x₁ = x₂). So ∀x₁, x₂ ∈ X, (f(x₁) = f(x₂) ⟹ x₁ = x₂), and
therefore f is injective by Definition 10.5.2. From Definition 10.2.2 (ii), it follows that Dom(f⁻¹) = Y. So
f⁻¹ satisfies ∀y ∈ Y, ∃x ∈ X, f⁻¹(y) = x. So f satisfies ∀y ∈ Y, ∃x ∈ X, f(x) = y. Therefore f is surjective
by Definition 10.5.2. Hence f is a bijection.
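Theorem 10.5.7 is easy to test on finite examples. A Python sketch (editorial, ad-hoc names), with relations as sets of pairs:

```python
# Theorem 10.5.7 on a finite example: f is a bijection iff the inverse relation
# f^{-1} = {(y, x) : (x, y) in f} is itself a function from Y to X.
def inverse_relation(f):
    return {(y, x) for (x, y) in f}

def is_function(rel, domain):
    """rel is a function on `domain`: exactly one pair (x, y) for each x in domain."""
    xs = [x for (x, _) in rel]
    return set(xs) == set(domain) and len(xs) == len(set(xs))

f = {(1, "a"), (2, "b")}            # a bijection from {1, 2} to {"a", "b"}
finv = inverse_relation(f)
g = {(1, "a"), (2, "a")}            # not injective, so g^{-1} is not a function
```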
10.5.11 Definition:
A left inverse of a function f : X → Y is a function g : Y → X such that g ∘ f = idX.
A right inverse of a function f : X → Y is a function g : Y → X such that f ∘ g = idY.
(ii) A function f : X → Y is an injection if and only if f has a left inverse.
(iii) A function f : X → Y is a surjection if f has a right inverse.
(iv) A function f : X → Y is a bijection if and only if f has an inverse function.
(v) A function f : X → Y is an injection if and only if (f ∘ g = f ∘ h) ⟹ (g = h) for all functions g : Z → X
and h : Z → X and sets Z.
(vi) A function f : X → Y is a surjection if and only if (g ∘ f = h ∘ f) ⟹ (g = h) for all functions g : Y → Z
and h : Y → Z and sets Z.
Proof: For part (i), let f : X → Y be a function for sets X and Y. Suppose that there exists a function g
which is the inverse of f. Then by Definition 10.5.8, g = f⁻¹. Let x ∈ X. Then g(f(x)) = z if and only
if (f(x), z) ∈ g, if and only if (z, f(x)) ∈ g⁻¹ = f, if and only if f(z) = f(x). So z = x because f is an
injection, because f is a bijection by Theorem 10.5.7. Therefore g(f(x)) = x for all x ∈ X. So g ∘ f = idX.
Therefore g is a left inverse of f by Definition 10.5.11. Similarly, f(g(y)) = z if and only if (g(y), z) ∈ f,
if and only if (z, g(y)) ∈ f⁻¹, if and only if f⁻¹(z) = f⁻¹(y). So z = y because f is a function. Therefore
f(g(y)) = y for all y ∈ Y. So f ∘ g = idY. Therefore g is a right inverse of f by Definition 10.5.11.
Now suppose that f : X → Y is a function for sets X and Y, and that g is both a left inverse and right inverse
of f. Then g : Y → X is a function, and g ∘ f = idX and f ∘ g = idY by Definition 10.5.11. Let (x, y) ∈ f.
Then y = f(x). So g(y) = g(f(x)) = idX(x) = x. So (y, x) ∈ g. Conversely, let (y, x) ∈ g. Then
x = g(y). So f(x) = f(g(y)) = idY(y) = y. So (x, y) ∈ f. Thus (x, y) ∈ f if and only if (y, x) ∈ g, which
means that g is the inverse of f by Definitions 10.5.8 and 9.6.12.
For part (ii), let f : X → Y be an injection. Let h be the inverse relation of f. Then h : f(X) → X is
a function. Define g : Y → X by g(y) = h(y) if y ∈ f(X) and g(y) = x₀ if y ∉ f(X), where x₀ is any fixed
element of X. Then g ∘ f = idX. Therefore f has a left inverse.
Let f : X → Y be a function which has a left inverse g : Y → X. Then g ∘ f = idX. Let x₁, x₂ ∈ X. Suppose
f(x₁) = f(x₂). Then g(f(x₁)) = g(f(x₂)). But g(f(x₁)) = x₁ and g(f(x₂)) = x₂. So x₁ = x₂. Therefore f
is injective.
For part (iii), let f : X → Y be a function which has a right inverse g : Y → X. Then f ∘ g = idY. Let y ∈ Y.
Then f(g(y)) = y. Let x = g(y). Then f(x) = y. Therefore f is surjective.
For part (iv), let f : X → Y be a bijection for sets X and Y. Then the relation f⁻¹ is a function by
Theorem 10.5.7. So f has an inverse function. Now suppose that f : X → Y is a function and that f has an
inverse function g. Then g is both a left inverse and right inverse of f by part (i). So f is both an injection
and a surjection by parts (ii) and (iii). Therefore f is a bijection by Definition 10.5.2.
For part (v), let f : X → Y be an injection for sets X and Y. Let Z be a set, and let g : Z → X and
h : Z → X be functions. Suppose that f ∘ g = f ∘ h. Let z ∈ Z. Then f(g(z)) = f(h(z)). So g(z) = h(z) by
Definition 10.5.2 because f is injective. Therefore ∀z ∈ Z, g(z) = h(z), which means that g = h.
Now let f : X → Y be a function such that (f ∘ g = f ∘ h) ⟹ (g = h) for all functions g : Z → X
and h : Z → X and sets Z. Let x₁, x₂ ∈ X be such that f(x₁) = f(x₂). Let Z = {z} for any z, and define
functions g : Z → X and h : Z → X by g : z ↦ x₁ and h : z ↦ x₂. Then f ∘ g = f ∘ h because f(g(z)) =
f(x₁) = f(x₂) = f(h(z)), and so ∀z ∈ Z, f(g(z)) = f(h(z)). So g = h. Therefore x₁ = g(z) = h(z) = x₂. So
f is injective.
For part (vi), let f : X → Y be a surjection for sets X and Y. Let Z be a set. Let g : Y → Z and h : Y → Z
be functions. Suppose that g ∘ f = h ∘ f. Let y ∈ Y. Then y = f(x) for some x ∈ X because f is surjective.
So g(f(x)) = h(f(x)), which implies that g(y) = h(y). Therefore ∀y ∈ Y, g(y) = h(y). So g = h.
Now let X and Y be sets, and let f : X → Y be a function such that (g ∘ f = h ∘ f) ⟹ (g = h) for all
functions g : Y → Z and h : Y → Z and sets Z. Suppose that f is not surjective. Then there is an element
y₀ ∈ Y such that ∀x ∈ X, f(x) ≠ y₀. In other words, y₀ ∉ Range(f). Let Z = {z₁, z₂} be a set containing
two different elements z₁ and z₂. Define g : Y → Z by g(y) = z₁ for all y ∈ Y. Define h : Y → Z by
h(y) = z₁ for all y ∈ Y \ {y₀}, and h(y₀) = z₂. Then g ∘ f = h ∘ f because g(f(x)) = h(f(x)) = z₁ for
all x ∈ X. But g ≠ h because g(y₀) = z₁ ≠ z₂ = h(y₀).
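The constructive direction of part (ii) can be exercised on a finite injection. A Python sketch (editorial; the default value x₀ is an arbitrary fixed element of X, as in the proof):

```python
# Theorem 10.5.12 (ii) on a finite example: an injection f : X -> Y has a left
# inverse g : Y -> X with g o f = id_X.
def left_inverse(f, Y):
    """Build g : Y -> X for an injective f given as a dict (X assumed non-empty)."""
    x0 = next(iter(f))                  # any fixed element x0 of X
    h = {y: x for x, y in f.items()}    # inverse relation; a function since f is injective
    return {y: h.get(y, x0) for y in Y}

f = {1: "a", 2: "b"}                    # injective, not surjective onto Y
Y = {"a", "b", "c"}
g = left_inverse(f, Y)
```

The values of g outside f(X) are irrelevant to the identity g ∘ f = idX.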
10.5.13 Remark: Proof of existence of right inverse for surjective functions requires axiom of choice.
It is not possible to show in Theorem 10.5.12 (iii) that a function f : X → Y is a surjection only if f has a
right inverse unless the axiom of choice is invoked. (See for example Remark 7.11.12.) The axiom of choice
is used for AC-tainted Theorem 10.5.14, but if the domain set X can be given a well-ordering, the axiom of
choice is not required. (See Theorem 11.7.4.) Note, however, that the necessary and sufficient condition for
surjectivity in part (vi) of Theorem 10.5.12 does not require the axiom of choice, and its proof is in fact very
elementary.
10.5.14 Theorem [zf+ac]: A function f : X → Y is a surjection if and only if f has a right inverse.
Proof: Let f : X → Y be a surjective function. Let P(y, x) be the proposition x ∈ X ∧ f(x) = y for
all x and y. Then the proposition ∀y ∈ Y, ∃x, P(y, x) is equivalent to ∀y ∈ Y, ∃x ∈ X, f(x) = y, which
is true by the definition of the surjectivity of f. The axiom of choice implies ∀y ∈ Y, P(y, g(y)) for some
function g. This is equivalent to the proposition ∀y ∈ Y, (g(y) ∈ X ∧ f(g(y)) = y). This is equivalent to
g(Y) ⊆ X and f ∘ g = idY. In other words, g : Y → X and g is a right inverse of f. The converse follows
from Theorem 10.5.12 (iii).
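For finite sets the "choice" of preimages is effective, so no choice axiom is needed. A Python sketch (editorial, ad-hoc names):

```python
# Theorem 10.5.14 on a finite example: a surjection f : X -> Y has a right inverse g
# with f o g = id_Y. Here the "choice" is just picking any one preimage per y.
def right_inverse(f, Y):
    g = {}
    for x, y in f.items():
        g.setdefault(y, x)              # choose one preimage x for each attained y
    assert set(g) == set(Y)             # f must be surjective onto Y
    return g

f = {1: "a", 2: "a", 3: "b"}            # surjective onto {"a", "b"}, not injective
g = right_inverse(f, {"a", "b"})
```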
10.5.15 Remark: Slightly paradoxical existence of inverse of bijections without axiom of choice.
There seems to be a small paradox between parts (iii) and (iv) of Theorem 10.5.12. On the one hand, it cannot
be asserted (within ZF set theory) that every surjection has a right inverse. On the other hand, a bijection
has both a left and right inverse. An AC-free mathematician might suspect that part (iv) could make some
use of the axiom of choice. However, it is the injectivity which guarantees that no choice is required when
constructing the inverse.
Composing a function with its inverse relation does have some useful properties, as indicated in Theorem 10.5.17. In
particular, the inverse relation is more or less the right inverse of an original function which is a surjection.
Proof: For part (i), let (y, z) ∈ f ∘ f⁻¹. Then by Definition 9.6.2, (y, x) ∈ f⁻¹ and (x, z) ∈ f for some
x ∈ X. So (x, y) ∈ f and (x, z) ∈ f for some x ∈ X by Definitions 10.5.8 and 9.6.12. So y = z ∈ Range(f)
by Definitions 10.2.2 and 9.5.4. Therefore (y, z) ∈ idRange(f) by Definition 9.6.9. Thus f ∘ f⁻¹ ⊆ idRange(f).
For the reverse inclusion, let (y, z) ∈ idRange(f). Then y ∈ Range(f) and z = y by Definition 9.6.9. So
(x, z) = (x, y) ∈ f for some x ∈ X by Definition 9.5.4. So (y, x) ∈ f⁻¹ by Definitions 10.5.8 and 9.6.12. So
(y, z) ∈ f ∘ f⁻¹ by Definition 9.6.2. Thus f ∘ f⁻¹ ⊇ idRange(f). Hence f ∘ f⁻¹ = idRange(f).
Part (ii) follows from part (i) and Definitions 10.5.2 and 9.5.4.
For part (iii), let (x, w) ∈ idX. Then x ∈ X and w = x by Definition 9.6.9. So (x, y) ∈ f for some y ∈ Y
by Definition 10.2.2. Then (y, x) ∈ f⁻¹ by Definitions 10.5.8 and 9.6.12. So (x, w) = (x, x) ∈ f⁻¹ ∘ f by
Definition 9.6.2. Hence f⁻¹ ∘ f ⊇ idX.
A set endomorphism of a set S is a set homomorphism φ : S → S. Alternative name: "endofunction".
A set automorphism of a set S is a set isomorphism φ : S → S.
10.5.20 Remark: Common names for the six kinds of set morphisms.
In the case of plain sets, which have no order structure, nor algebraic, topological, differentiable or any other
kind of structure, most kinds of morphisms are better known by their common names.

               common name   morphism name      alternative    x-values  y-values
φ : S1 → S2    function      set homomorphism                            = 1
φ : S1 → S2    injection     set monomorphism                  ≤ 1       = 1
φ : S1 → S2    surjection    set epimorphism                   ≥ 1       = 1
φ : S1 → S2    bijection     set isomorphism                   = 1       = 1
φ : S → S                    set endomorphism   endofunction             = 1
φ : S → S      permutation   set automorphism                  = 1       = 1

Only the term "set endomorphism" requires an alternative name "endofunction" because all of the other
kinds of morphisms have well-known common names. The x-values and y-values columns indicate the
constraints on x and y values. (More precisely, the x-values column signifies #{x; φ(x) = y} for arbitrary y,
while the y-values column signifies #{y; φ(x) = y} for arbitrary x.) Thus all classes of set morphisms
must have one and only one y-value in the target set for each x-value in the source set. Injections must have
at most one x-value for each y-value. Surjections must have at least one x-value for each y-value.
These morphism terms may also be defined uniformly for all categories. Then the terms "endomorphism" and "automorphism" mean respectively homomorphisms
and isomorphisms whose domain and target set are the same. Authors who define mono/epi/isomorphisms
in this way explicitly for all categories include MacLane/Birkhoff [106], page 38; S. Warner [151], page 82.
Monomorphisms and epimorphisms are defined by Lang [104], page 120, as the exactness of certain diagrams,
which are then equivalent to the simple injectivity/surjectivity criteria. Authors who define them in this
way for multiple individual categories (including groups, rings and modules) include Grove [86], pages 8, 49,
127; Ash [49], pages 19, 37, 89.
These terms are defined differently in some accounts of category theory. For example, EDM2 [109], page 204,
defines monomorphisms by the left cancellative property which appears in Theorem 10.5.12 (v), which is
equivalent to injectivity for set homomorphisms, but may not necessarily be equivalent for some higher
categories. They also define epimorphisms by the right cancellative property in Theorem 10.5.12 (vi), which
differs from surjectivity of homomorphisms for some higher categories, such as rings for example.
As mentioned in Remark 1.6.3 item (2), category theory is outside the scope of this book.
this singleton. This is formalised in Definition 10.5.25. (It is tempting to go wobbly at the knees here and
reach for an axiom of choice to construct a right inverse f⁻¹ for f from which g ∘ f⁻¹ may be constructed.
Luckily that's not the only way to do it.) Definition 10.5.25 is illustrated in Figure 10.5.2.
Figure 10.5.2 (diagram not reproduced): the function quotient g ∘ f⁻¹ : B → C for f : A → B and g : A → C.
10.5.25 Definition: The function quotient of a function g : A → C with respect to a surjective function
f : A → B such that ∀x₁, x₂ ∈ A, f(x₁) = f(x₂) ⟹ g(x₁) = g(x₂) is the function g ∘ f⁻¹ : B → C defined
by g ∘ f⁻¹ = {(f(x), g(x)); x ∈ A}.
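The well-definedness condition in Definition 10.5.25 can be checked while building the quotient. A Python sketch (editorial, ad-hoc names):

```python
# Definition 10.5.25 on a finite example: the function quotient g o f^{-1} : B -> C,
# defined when f(x1) = f(x2) implies g(x1) = g(x2).
def function_quotient(g, f):
    q = {}
    for x in f:
        b, c = f[x], g[x]
        assert q.get(b, c) == c         # well-definedness: f(x1)=f(x2) => g(x1)=g(x2)
        q[b] = c
    return q                            # q = {(f(x), g(x)) : x in A}

f = {1: "p", 2: "p", 3: "q"}            # surjective A -> B
g = {1: 10, 2: 10, 3: 30}               # constant on the fibres of f
q = function_quotient(g, f)
```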
Figure 10.6.1 Set-map between power sets: f : X → Y, x ↦ f(x), induces f̄ : P(X) → P(Y),
A ↦ f̄(A) = {f(x); x ∈ A}.
10.6.5 Remark: Colloquial notations for function set-maps and inverse set-maps.
The set-maps f̄ and f̄⁻¹ in Definition 10.6.3 are often denoted simply as f and f⁻¹ respectively. This
notation re-use, although economical, leads to real ambiguity when the domain or range of f has elements
which are contained in other elements. Thus for instance, if f : X → Y and ∅ ∈ X, then f(∅) may refer to
the original (point-map) function f or the corresponding set-map f̄. If this seems to be a problem only for
pathological sets X and Y, consider X = ω, the set of ordinal numbers. In this case, every element of ω is also
a subset of ω, which makes it absolutely essential to distinguish between a point-map function f : X → Y
and its corresponding set-map function f̄. Another common circumstance where serious ambiguity arises is
when the domain or range is a power set.
Despite the real danger of ambiguity and confusion, most texts do use the same notation for the function
and its set-map. For clarity in Theorems 10.6.6, 10.6.9, 10.7.1 and 10.7.6, set-maps are notated differently
to the corresponding point-map functions.
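In a programming language the distinction between a point-map and its set-map is unavoidable, because the two must be separate functions. A Python sketch of Definition 10.6.3 (editorial; the bar notation becomes explicit names):

```python
# The set-map and inverse set-map of Definition 10.6.3 for finite functions.
def set_map(f, A):
    """The image of A under f, i.e. {f(x) : x in A}."""
    return {f[x] for x in A}

def inverse_set_map(f, B):
    """The preimage of B under f, i.e. {x in Dom(f) : f(x) in B}."""
    return {x for x in f if f[x] in B}

f = {1: "a", 2: "a", 3: "b"}
```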
(iv′) ∀A, B ∈ P(X), f̄(A ∩ B) = f̄(A) ∩ f̄(B) if and only if f is injective.
(v) ∀A ∈ P(X), f̄(X \ A) ⊇ f̄(X) \ f̄(A).
(v′) ∀A ∈ P(X), f̄(X \ A) = f̄(X) \ f̄(A) if and only if f is injective.
Proof: For part (i), note that f̄(∅) = {f(x); x ∈ ∅} = ∅. A function f : X → Y satisfies f(x) ∈ Y for all
x ∈ X, by Definition 10.2.2 (iii). So f̄(X) = {f(x); x ∈ X} ⊆ Y.
For part (i′), let f : X → Y be surjective. Then ∀y ∈ Y, ∃x ∈ X, f(x) = y, by Definition 10.5.2. Let y ∈ Y.
Then ∃x ∈ X, f(x) = y. So y ∈ {f(x); x ∈ X} = {z; ∃x ∈ X, z = f(x)}. So Y ⊆ {f(x); x ∈ X} = f̄(X).
Now let f̄(X) = Y. Then {f(x); x ∈ X} = Y. So ∀y ∈ Y, ∃x ∈ X, f(x) = y. So f is surjective.
For part (ii), let A, B ∈ P(X). Suppose that A ⊆ B. Let y ∈ f̄(A). Then ∃x ∈ A, y = f(x). Let x ∈ A
satisfy y = f(x). Then x ∈ B. So ∃x, (x ∈ B ∧ y = f(x)). In other words, ∃x ∈ B, y = f(x). So
y ∈ {z; ∃x ∈ B, z = f(x)} = f̄(B).
For part (ii′), suppose that f is injective. Let A, B ∈ P(X). Then A ⊆ B ⟹ f̄(A) ⊆ f̄(B) by part (ii).
Suppose that f̄(A) ⊆ f̄(B). Let x ∈ A. Then f(x) ∈ f̄(A). So f(x) ∈ f̄(B). So f(x) = f(x′) for some x′ ∈ B,
by Definition 10.6.3 (i). Then x = x′ by the injectivity of f. So x ∈ B. Hence f̄(A) ⊆ f̄(B) ⟹ A ⊆ B.
So A ⊆ B ⟺ f̄(A) ⊆ f̄(B). To show the converse, assume that ∀A, B ∈ P(X), A ⊆ B ⟺ f̄(A) ⊆ f̄(B).
Let x, x′ ∈ X with f(x) = f(x′). Let A = {x} and B = {x′}. Then f̄(A) = {f(x)} = {f(x′)} = f̄(B).
Therefore A = B. So x = x′. Hence f is injective.
For part (iii), let A, B ∈ P(X). Let y ∈ f̄(A ∪ B). Then y = f(x) for some x ∈ A ∪ B. So x ∈ A or x ∈ B
(or both). Suppose x ∈ A. Then f(x) ∈ f̄(A). So y ∈ f̄(A) ∪ f̄(B). The same follows if x ∈ B. Hence
f̄(A ∪ B) ⊆ f̄(A) ∪ f̄(B). Now suppose that y ∈ f̄(A) ∪ f̄(B). Then either y ∈ f̄(A) or y ∈ f̄(B). Suppose
that y ∈ f̄(A). Then y = f(x) for some x ∈ A. Then x ∈ A ∪ B. So f(x) ∈ f̄(A ∪ B). Similarly if y ∈ f̄(B).
Hence f̄(A) ∪ f̄(B) ⊆ f̄(A ∪ B). So f̄(A ∪ B) = f̄(A) ∪ f̄(B).
For part (iv), let A, B ∈ P(X). Let y ∈ f̄(A ∩ B). Then y = f(x) for some x ∈ A ∩ B. So x ∈ A and x ∈ B.
So f(x) ∈ f̄(A) and f(x) ∈ f̄(B). So y ∈ f̄(A) ∩ f̄(B). Hence f̄(A ∩ B) ⊆ f̄(A) ∩ f̄(B).
For part (iv′), suppose that f is injective. Let A, B ∈ P(X). Then f̄(A ∩ B) ⊆ f̄(A) ∩ f̄(B) by part (iv).
Let y ∈ f̄(A) ∩ f̄(B). Then y ∈ f̄(A) and y ∈ f̄(B). So y = f(x) for some x ∈ A, and y = f(x′) for
some x′ ∈ B. Then x = x′ by the injectivity of f. So x = x′ ∈ A ∩ B. So y ∈ f̄(A ∩ B). Therefore
f̄(A) ∩ f̄(B) ⊆ f̄(A ∩ B). Hence f̄(A ∩ B) = f̄(A) ∩ f̄(B). To show the converse, assume that ∀A, B ∈
P(X), f̄(A ∩ B) = f̄(A) ∩ f̄(B). Let x, x′ ∈ X with f(x) = f(x′). Let A = {x} and B = {x′}. Then
f̄(A) = {f(x)} = {f(x′)} = f̄(B). So f̄(A ∩ B) = f̄(A) ∩ f̄(B) = {f(x)} ≠ ∅. So A ∩ B ≠ ∅. So x = x′.
Hence f is injective.
For part (v), note that X = (X \ A) ∪ A by Theorem 8.2.5 (viii). So f̄(X) = f̄((X \ A) ∪ A) = f̄(X \ A) ∪ f̄(A)
by part (iii). So f̄(X) \ f̄(A) = (f̄(X \ A) ∪ f̄(A)) \ f̄(A) = f̄(X \ A) \ f̄(A) ⊆ f̄(X \ A) by Theorem 8.2.5 (xviii).
For part (v′), suppose that f is injective. Let A ∈ P(X). Then f̄(X \ A) ⊇ f̄(X) \ f̄(A) by part (v).
Let y ∈ f̄(X \ A). Then y = f(x) for some x ∈ X \ A. Then x ∈ X and x ∉ A for such x. Suppose
that y ∈ f̄(A). Then y = f(x′) for some x′ ∈ A. But then x = x′ by the injectivity of f. So x ∈ A,
which is a contradiction. Therefore y ∉ f̄(A). So y ∈ f̄(X) \ f̄(A). Hence f̄(X \ A) ⊆ f̄(X) \ f̄(A), and
so f̄(X \ A) = f̄(X) \ f̄(A). To show the converse, assume that ∀A ∈ P(X), f̄(X \ A) = f̄(X) \ f̄(A).
Let x, x′ ∈ X satisfy f(x) = f(x′) = y. Let A = {x}. Then f̄(A) = {y}. So y ∉ f̄(X) \ f̄(A). Suppose
that x ≠ x′. Then x′ ∈ X \ A. So y ∈ f̄(X \ A), which contradicts the assumption. So x = x′. Hence f is
injective.
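The role of injectivity in part (iv′) is easy to see on a two-point counterexample. A Python sketch (editorial, ad-hoc names):

```python
# Finite check of Theorem 10.6.6 (iv): the image of an intersection is contained in
# the intersection of the images, with equality failing when f is not injective.
def image(f, A):
    return {f[x] for x in A}

f = {1: "a", 2: "a"}                    # not injective
A, B = {1}, {2}
lhs = image(f, A & B)                   # image of A intersect B = image of empty set
rhs = image(f, A) & image(f, B)         # intersection of images = {"a"}
```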
10.6.7 Remark: Relations between function restrictions, domains, ranges and set-maps.
In terms of the function restriction operation in Definition 10.4.2, the relations in Theorem 10.6.8 follow
almost trivially. But checking the logic is a good exercise.
Proof: For part (i), f̄(Dom(f)) = {y; ∃x ∈ Dom(f), (x, y) ∈ f} by Definition 10.6.3 (i). But x ∈ Dom(f)
if and only if ∃z, (x, z) ∈ f, by Definition 9.5.4. So f̄(Dom(f)) = {y; ∃x, ((∃z, (x, z) ∈ f) and (x, y) ∈ f)}.
But (x, y) ∈ f implies ∃z, (x, z) ∈ f. So the proposition (∃z, (x, z) ∈ f) and (x, y) ∈ f is equivalent to the
proposition (x, y) ∈ f by Theorem 4.7.9 (lxiii). Therefore f̄(Dom(f)) = {y; ∃x, (x, y) ∈ f}, which equals
Range(f) by Definition 9.5.4.
For part (ii), f↾A = {(x, y) ∈ f ; x ∈ A} by Definition 10.4.2. So Range(f↾A) = {y; ∃x ∈ A, (x, y) ∈ f} by
Definition 9.5.4, which equals f̄(A) by Definition 10.6.3 (i).
10.6.9 Theorem: Properties of function inverse set-maps.
Let f : X → Y be a function. Let f̄⁻¹ : P(Y) → P(X) denote the inverse set-map for f.
(i) f̄⁻¹(∅) = ∅ and f̄⁻¹(Y) = X.
(i′) ∀A ∈ P(Y), (f̄⁻¹(A) = ∅ ⟹ A = ∅) if and only if f is surjective.
(ii) ∀A, B ∈ P(Y), A ⊆ B ⟹ f̄⁻¹(A) ⊆ f̄⁻¹(B).
(ii′) ∀A, B ∈ P(Y), A ⊆ B ⟺ f̄⁻¹(A) ⊆ f̄⁻¹(B) if and only if f is surjective.
(ii″) ∀A, B ∈ P(Y), A = B ⟺ f̄⁻¹(A) = f̄⁻¹(B) if and only if f is surjective.
(iii) ∀A, B ∈ P(Y), f̄⁻¹(A ∪ B) = f̄⁻¹(A) ∪ f̄⁻¹(B).
(iv) ∀A, B ∈ P(Y), f̄⁻¹(A ∩ B) = f̄⁻¹(A) ∩ f̄⁻¹(B).
(v) ∀A, B ∈ P(Y), f̄⁻¹(A \ B) = f̄⁻¹(A) \ f̄⁻¹(B).
(v′) ∀A ∈ P(Y), f̄⁻¹(Y \ A) = X \ f̄⁻¹(A).
Proof: For part (i), suppose that x ∈ f̄⁻¹(∅). Then x ∈ X and f(x) ∈ ∅ by Definition 10.6.3 (ii). But
this contradicts Definition 10.2.2 (iii) for a function f : X → Y. So x ∉ f̄⁻¹(∅). Hence f̄⁻¹(∅) = ∅. By
Definition 10.6.3 (ii), f̄⁻¹(Y) = {x ∈ X; f(x) ∈ Y}. But this equals X because f(x) ∈ Y for all x ∈ X.
For part (i′), suppose that f is surjective. Let A ∈ P(Y). If A = ∅, then f̄⁻¹(A) = ∅ by part (i). Suppose
that f̄⁻¹(A) = ∅. Let y ∈ A. Then y = f(x) for some x ∈ X, by the surjectivity of f. Then x ∈ f̄⁻¹(A),
which contradicts the assumption. So A = ∅. To show the converse, assume that ∀A ∈ P(Y), f̄⁻¹(A) = ∅ ⟹
A = ∅. Let y ∈ Y. Let A = {y}. Then A ≠ ∅. So f̄⁻¹(A) ≠ ∅. So x ∈ f̄⁻¹(A) for some x ∈ X. Thus
y = f(x) for some x ∈ X. Hence f is surjective.
For part (ii), let A, B ∈ P(Y) satisfy A ⊆ B. Let x ∈ f̄⁻¹(A). Then x ∈ X and f(x) ∈ A by Defini-
tion 10.6.3 (ii). So f(x) ∈ B. So x ∈ f̄⁻¹(B) by Definition 10.6.3 (ii). Hence f̄⁻¹(A) ⊆ f̄⁻¹(B).
For part (ii′), suppose that f is surjective. Let A, B ∈ P(Y) satisfy A ⊆ B. Then f̄⁻¹(A) ⊆ f̄⁻¹(B)
by part (ii). Suppose that f̄⁻¹(A) ⊆ f̄⁻¹(B). Let y ∈ A. Then y = f(x) for some x ∈ f̄⁻¹(A), by
Definition 10.6.3 (ii). Then x ∈ f̄⁻¹(B) for such x. So y = f(x) ∈ B by Definition 10.6.3 (ii). Hence A ⊆ B.
To show the converse, assume that ∀A, B ∈ P(Y), A ⊆ B ⟺ f̄⁻¹(A) ⊆ f̄⁻¹(B). Let y ∈ Y. Let A = {y}.
Suppose that f̄⁻¹(A) = ∅. Let B = ∅. Then f̄⁻¹(B) = ∅ by part (i). So f̄⁻¹(A) ⊆ f̄⁻¹(B). So A ⊆ B. That
is, {y} ⊆ ∅, which is a contradiction. So f̄⁻¹(A) ≠ ∅. So y = f(x) for some x ∈ X. Hence f is surjective.
For part (ii″), suppose that f is surjective. Let A, B ∈ P(Y). Then A = B ⟺ f̄⁻¹(A) = f̄⁻¹(B) by a double
application of part (ii′). To show the converse, assume that ∀A, B ∈ P(Y), A = B ⟺ f̄⁻¹(A) = f̄⁻¹(B).
Let y ∈ Y. Let A = {y}. Suppose that f̄⁻¹(A) = ∅. Let B = ∅. Then f̄⁻¹(B) = ∅ by part (i). So
f̄⁻¹(A) = f̄⁻¹(B). So A = B. That is, {y} = ∅, which is a contradiction. So f̄⁻¹(A) ≠ ∅. So y = f(x) for
some x ∈ X. Hence f is surjective.
For part (iii), let A, B ∈ P(Y). Let x ∈ f̄⁻¹(A ∪ B). Then x ∈ X and f(x) ∈ A ∪ B. So f(x) ∈ A or f(x) ∈ B
(or both). If f(x) ∈ A, then x ∈ f̄⁻¹(A). So x ∈ f̄⁻¹(A) ∪ f̄⁻¹(B). This follows similarly for f(x) ∈ B.
Hence f̄⁻¹(A ∪ B) ⊆ f̄⁻¹(A) ∪ f̄⁻¹(B). Now suppose that x ∈ f̄⁻¹(A) ∪ f̄⁻¹(B). Then x ∈ f̄⁻¹(A)
or x ∈ f̄⁻¹(B) (or both). Suppose that x ∈ f̄⁻¹(A). Then f(x) ∈ A. So f(x) ∈ A ∪ B. So x ∈ f̄⁻¹(A ∪ B),
and similarly if x ∈ f̄⁻¹(B). Hence f̄⁻¹(A ∪ B) ⊇ f̄⁻¹(A) ∪ f̄⁻¹(B). So f̄⁻¹(A ∪ B) = f̄⁻¹(A) ∪ f̄⁻¹(B).
For part (iv), let A, B ∈ P(Y). Let x ∈ f̄⁻¹(A ∩ B). Then x ∈ X and f(x) ∈ A ∩ B. So f(x) ∈ A and f(x) ∈
B. So x ∈ f̄⁻¹(A) and x ∈ f̄⁻¹(B). So x ∈ f̄⁻¹(A) ∩ f̄⁻¹(B). Hence f̄⁻¹(A ∩ B) ⊆ f̄⁻¹(A) ∩ f̄⁻¹(B).
Now suppose that x ∈ f̄⁻¹(A) ∩ f̄⁻¹(B). Then x ∈ f̄⁻¹(A) and x ∈ f̄⁻¹(B). So f(x) ∈ A and f(x) ∈ B.
So f(x) ∈ A ∩ B. So x ∈ f̄⁻¹(A ∩ B). Hence f̄⁻¹(A ∩ B) ⊇ f̄⁻¹(A) ∩ f̄⁻¹(B).
For part (v), let A, B ∈ ℘(Y). Let x ∈ f⁻¹(A \ B). Then x ∈ X and f(x) ∈ A \ B. So f(x) ∈ A and f(x) ∉ B. So x ∈ f⁻¹(A) and x ∉ f⁻¹(B) by Definition 10.6.3 (ii). So x ∈ f⁻¹(A) \ f⁻¹(B). Now suppose that x ∈ f⁻¹(A) \ f⁻¹(B). Then x ∈ f⁻¹(A) and x ∉ f⁻¹(B). So f(x) ∈ A and f(x) ∉ B by Definition 10.6.3 (ii). So f(x) ∈ A \ B. So x ∈ f⁻¹(A \ B). Hence f⁻¹(A \ B) = f⁻¹(A) \ f⁻¹(B).
Part (v′) follows directly from part (v) by substituting Y for A and A for B, and noting that f⁻¹(Y) = X by part (i).
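As an informal aside (not part of the book's formal development), the inverse set-map identities proved above can be checked mechanically on a small finite function, here modelled as a Python dict; the particular sets X, Y, A, B are arbitrary choices for the illustration.

```python
# Finite-set check of the inverse set-map identities:
# f⁻¹(A ∪ B) = f⁻¹(A) ∪ f⁻¹(B), f⁻¹(A ∩ B) = f⁻¹(A) ∩ f⁻¹(B),
# f⁻¹(A \ B) = f⁻¹(A) \ f⁻¹(B), f⁻¹(Y \ A) = X \ f⁻¹(A).
f = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}   # a function f : X -> Y as a dict
X, Y = set(f), {'a', 'b', 'c', 'd'}

def inv(S):
    """Inverse set-map: f⁻¹(S) = {x in X; f(x) in S}."""
    return {x for x in X if f[x] in S}

A, B = {'a', 'b'}, {'b', 'd'}
assert inv(A | B) == inv(A) | inv(B)
assert inv(A & B) == inv(A) & inv(B)
assert inv(A - B) == inv(A) - inv(B)
assert inv(Y - A) == X - inv(A)        # part (v')
```

No special care is needed for non-injective f: the inverse set-map is defined for any function, which is exactly why these identities hold without side conditions.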
10.6.10 Remark: Inverse set-maps have simpler properties than function set-maps.
The reason for the simplicity of Theorem 10.6.9 parts (iii), (iv) and (v′), compared to the corresponding Theorem 10.6.6 parts (iii), (iv), (iv′), (v) and (v′), is the fact that the inverse of a function is automatically injective and surjective, even if the inverse is not a function.
(iii) ∀S ∈ ℘(Y), f⁻¹(S) = f⁻¹(S ∩ f(X)).
(iii′) ∀S ∈ ℘(Y), (f⁻¹(S) = ∅ ⇔ S ∩ f(X) = ∅).
(iii″) ∀S ∈ ℘(f(X)), (f⁻¹(S) = ∅ ⇔ S = ∅).
Proof: For part (i), let f : X → Y for sets X and Y. Let S ∈ ℘(Y). Then
∀y ∈ Y,  y ∈ f(f⁻¹(S)) ⇔ ∃x ∈ f⁻¹(S), f(x) = y
⇔ ∃x, (x ∈ f⁻¹(S) ∧ f(x) = y)
⇔ ∃x, (f(x) ∈ S ∧ f(x) = y)
⇔ y ∈ S ∧ ∃x, f(x) = y
⇔ y ∈ S ∧ y ∈ f(X)
⇔ y ∈ S ∩ f(X).
For part (iii), let S ∈ ℘(Y). Then f⁻¹(S ∩ f(X)) = f⁻¹(S) ∩ f⁻¹(f(X)) ⊇ f⁻¹(S) ∩ X = f⁻¹(S) by Theorem 10.6.9 (iv) and part (ii). But f⁻¹(S ∩ f(X)) ⊆ f⁻¹(S) by Theorems 10.6.9 (iv) and 8.1.6 (iii). Therefore f⁻¹(S) = f⁻¹(S ∩ f(X)).
For part (iii′), let S ∈ ℘(Y). Suppose that f⁻¹(S) = ∅. Then f(f⁻¹(S)) = ∅ by Theorem 10.6.6 (i). Therefore S ∩ f(X) = ∅ by part (i). Now suppose that S ∩ f(X) = ∅. Then f⁻¹(S) = ∅ by part (iii) and Theorem 10.6.9 (i).
For part (iii″), let S ∈ ℘(f(X)). Then S ∩ f(X) = S by Theorem 8.1.7 (iv). Therefore f⁻¹(S) = ∅ ⇔ S = ∅ by part (iii′).
10.7.2 Theorem: Properties of compositions of unmixed function set-maps and inverse set-maps.
Let f : X → Y and g : Y → Z for sets X, Y and Z. Let h = g ∘ f. Let f, f⁻¹, g, g⁻¹, h and h⁻¹ be the corresponding function set-maps and inverse set-maps as in Definition 10.6.3. Then:
(i) ∀S ∈ ℘(X), h(S) = g(f(S)).
(ii) ∀S ∈ ℘(Z), h⁻¹(S) = f⁻¹(g⁻¹(S)).
Proof: For part (i), let S ∈ ℘(X). Then by Definition 10.6.3, h(S) = {h(x); x ∈ S} = {g(f(x)); x ∈ S} = {g(y); ∃x ∈ S, y = f(x)} = {g(y); y ∈ f(S)} = g(f(S)).
For part (ii), let S ∈ ℘(Z). Then h⁻¹(S) = {x ∈ X; h(x) ∈ S} = {x ∈ X; g(f(x)) ∈ S} = {x ∈ X; f(x) ∈ g⁻¹(S)} = f⁻¹(g⁻¹(S)) by Definition 10.6.3.
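As a quick informal check (an aside, not part of the book), the two composition identities of Theorem 10.7.2 can be exercised on small finite functions stored as dicts; the particular f, g and test sets are arbitrary choices for the illustration.

```python
# Finite check: for h = g ∘ f, the set-maps satisfy h(S) = g(f(S))
# and the inverse set-maps satisfy h⁻¹(S) = f⁻¹(g⁻¹(S)).
f = {1: 'a', 2: 'b', 3: 'a'}           # f : X -> Y
g = {'a': 10, 'b': 20}                 # g : Y -> Z

def setmap(fn, S):
    """Function set-map: image of S under fn."""
    return {fn[x] for x in fn if x in S}

def invmap(fn, S):
    """Inverse set-map: preimage of S under fn."""
    return {x for x in fn if fn[x] in S}

h = {x: g[f[x]] for x in f}            # h = g ∘ f
S, T = {1, 2}, {10}
assert setmap(h, S) == setmap(g, setmap(f, S))
assert invmap(h, T) == invmap(f, invmap(g, T))
```

Note that the inverse set-maps compose in the reverse order, mirroring the usual rule (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ for inverse relations.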
Proof: For part (i), let A ∈ ℘(X) and B ∈ ℘(Y). Suppose f(A) ⊆ B. Then f⁻¹(f(A)) ⊆ f⁻¹(B) by Theorem 10.6.9 (ii). But f⁻¹(f(A)) ⊇ A by Theorem 10.7.1 (ii). Hence A ⊆ f⁻¹(B).
Now suppose that A ⊆ f⁻¹(B). Then f(A) ⊆ f(f⁻¹(B)) by Theorem 10.6.6 (ii). But f(f⁻¹(B)) ⊆ B by Theorem 10.7.1 (i′). Hence f(A) ⊆ B.
Proof: For part (i), let f : X → Y be a function, and let S ∈ ℘(℘(X)). Then
f(⋃S) = {y ∈ Y; ∃x ∈ ⋃S, f(x) = y}
= {y ∈ Y; ∃x, (x ∈ ⋃S ∧ f(x) = y)}   (10.7.1)
= {y ∈ Y; ∃x, ((∃A ∈ S, x ∈ A) ∧ f(x) = y)}   (10.7.2)
= {y ∈ Y; ∃A ∈ S, ∃x, (x ∈ A ∧ f(x) = y)}   (10.7.3)
= {y ∈ Y; ∃A ∈ S, y ∈ f(A)}   (10.7.4)
= ⋃{f(A); A ∈ S}.
Line (10.7.1) follows from Theorem 6.6.16 (iii). Line (10.7.2) follows from Theorem 6.6.22 (iii). Line (10.7.3) follows from Theorem 4.7.9 (xxxiv). Line (10.7.4) follows from Theorem 6.6.12 (xxvii). Then the equality ⋃{f(A); A ∈ S} = ⋃f(S) follows directly from Definition 10.7.5.
For part (iii), let f : X → Y be a function, and let S′ ∈ ℘(℘(Y)). Then
f⁻¹(⋃S′) = {x ∈ X; f(x) ∈ ⋃S′}
= {x ∈ X; ∃A ∈ S′, f(x) ∈ A}
= {x ∈ X; ∃A ∈ S′, x ∈ f⁻¹(A)}
= ⋃{f⁻¹(A); A ∈ S′}.
a function for all i ∈ I.
10.8.5 Notation: fi, for a family of functions f and i ∈ Dom(f), denotes f(i). In other words, fi = f(i).
(fi)i∈I, for a family of functions f such that Dom(f) = I, denotes f. In other words, f = (fi)i∈I.
10.8.6 Remark: The hidden agenda of families of sets and functions.
For the axiom-of-choice believer, any set of objects may be indexed by an index set which is an ordinal
number. Then the well-ordering of the index set may be exploited in applications for the traversal of the
whole family, visiting each element of the family once and only once. Families of sets or functions are often
used as if the function which maps the index set to the target set gives a kind of handle by which the
elements of the target set may be manipulated. For example, a set X may be known to be infinite, and
one might write: "Let (xi)i∈ℕ be a sequence of elements of X." This immediately invokes the axiom of (countable) choice if it is being tacitly asserted that such a sequence exists.
Every set X can be trivially indexed by itself. The identity function idX : X → X on a set X, defined by idX : x ↦ x, satisfies the requirements for an injective family of sets whose range is the set X. Thus (Si)i∈I, with I = X and Si = i for all i ∈ I, is an injective family of sets with range X. However, such a useless indexing of a set X is often not what is implicitly signified by the introduction of a family of sets or functions. Whenever introducing a family of sets or functions, one should be conscious of whether one is implicitly assuming the existence of such a family when it cannot necessarily be readily justified.
10.8.7 Notation: As a convenience in notation, if two families (Ai)i∈I and (Bi)i∈I have the same index set I, then the family of pairs ((Ai, Bi))i∈I may be denoted in abbreviated fashion as (Ai, Bi)i∈I instead of the explicit notation ((Ai, Bi))i∈I.
An n-tuple of families (A¹ᵢ)i∈I, (A²ᵢ)i∈I, . . . (Aⁿᵢ)i∈I for n ≥ 1 may be denoted as (A¹ᵢ, A²ᵢ, . . . Aⁿᵢ)i∈I instead of the explicit notation ((A¹ᵢ, A²ᵢ, . . . Aⁿᵢ))i∈I.
10.8.8 Definition: The union of a family of sets S = (Si)i∈I is the union of the range of S. In other words, it is the set ⋃{Si; i ∈ I} = {x; ∃i ∈ I, x ∈ Si}.
The intersection of a family of sets S = (Si)i∈I such that I ≠ ∅ is the intersection of the range of S. In other words, it is the set ⋂{Si; i ∈ I} = {x; ∀i ∈ I, x ∈ Si}.
10.8.9 Notation: ⋃i∈I Si, for a family of sets S = (Si)i∈I, denotes the union of S.
⋂i∈I Si, for a family of sets S = (Si)i∈I, denotes the intersection of S.
Alternative notations: ⋃i∈I S(i) and ⋂i∈I S(i).
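As an informal aside, the union and intersection of a finite family can be computed directly from the range of the family, matching Definition 10.8.8; here the family is modelled as a Python dict from the index set to sets (an assumption of this illustration only).

```python
# The union ⋃_{i∈I} S_i and intersection ⋂_{i∈I} S_i of a family (Si)i∈I,
# computed from the range of the family.
S = {1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3, 4}}   # family with I = {1, 2, 3}
union = set().union(*S.values())                  # union of the range of S
inter = set.intersection(*S.values())             # needs I ≠ ∅, as in the text
assert union == {0, 1, 2, 3, 4}
assert inter == {2}
```

The non-emptiness condition I ≠ ∅ for the intersection shows up concretely: `set.intersection(*S.values())` raises an error when the family is empty, just as ⋂ of an empty family is undefined in the definition above.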
10.8.10 Remark: The union and intersection of the range of a set-valued formula.
The replacement axiom, Definition 7.2.4 (6), guarantees that {y; ∃x ∈ X, y = A(x)} = {A(x); x ∈ X} is a well-defined set via Theorem 7.7.2. The union axiom, Definition 7.2.4 (4), then guarantees that the union of this set is also a set. So the construction ⋃x∈X A(x) in Notation 10.8.9 is a well-defined set.
The set construction ⋂x∈X A(x) in Notation 10.8.9 is well defined by the axiom of specification because it can be written as a restriction of the set ⋃x∈X A(x) by a set-theoretic formula.
10.8.11 Theorem: Different ways of writing the union and intersection of a family of sets.
Let A = (Ax)x∈X = (A(x))x∈X be a family of sets.
(i) ⋃{A(x); x ∈ X} = {y; ∃x ∈ X, y ∈ A(x)} = ⋃x∈X A(x) = ⋃{y; ∃x ∈ X, y = A(x)}.
(ii) ⋂{A(x); x ∈ X} = {y; ∀x ∈ X, y ∈ A(x)} = ⋂x∈X A(x) = ⋂{y; ∃x ∈ X, y = A(x)} if X ≠ ∅.
Proof: From Notation 8.4.2, it follows that ⋃{A(x); x ∈ X} = {y; ∃z ∈ {A(x); x ∈ X}, y ∈ z}. Therefore
⋃{A(x); x ∈ X} = {y; ∃z, (z ∈ {A(x); x ∈ X} ∧ y ∈ z)}   (10.8.1)
= {y; ∃z, (z ∈ {w; ∃x ∈ X, A(x) = w} ∧ y ∈ z)}   (10.8.2)
= {y; ∃z, ((∃x ∈ X, A(x) = z) ∧ y ∈ z)}   (10.8.3)
= {y; ∃z, ∃x ∈ X, (A(x) = z ∧ y ∈ z)}   (10.8.4)
= {y; ∃z, ∃x, (x ∈ X ∧ A(x) = z ∧ y ∈ z)}   (10.8.5)
= {y; ∃x, ∃z, (x ∈ X ∧ A(x) = z ∧ y ∈ z)}   (10.8.6)
= {y; ∃x, (x ∈ X ∧ (∃z, (A(x) = z ∧ y ∈ z)))}   (10.8.7)
= {y; ∃x ∈ X, ∃z, (A(x) = z ∧ y ∈ z)}   (10.8.8)
= {y; ∃x ∈ X, y ∈ A(x)},   (10.8.9)
where line (10.8.1) follows from Notation 7.2.7, line (10.8.2) follows from Notation 7.7.19, line (10.8.3) follows from Notation 7.7.10, line (10.8.4) follows from Theorem 6.6.16 (vi), line (10.8.5) follows from Notation 7.2.7, line (10.8.6) follows from Theorem 6.6.22 (ii), line (10.8.7) follows from Theorem 6.6.16 (vi), line (10.8.8) follows from Notation 7.2.7, and line (10.8.9) follows from Theorem 6.7.8 (xi).
The equality {y; ∃x ∈ X, y ∈ A(x)} = ⋃x∈X A(x) follows from Notation 10.8.9.
Notation 7.7.19 gives the equality {A(x); x ∈ X} = {y; ∃x ∈ X, y = A(x)}. Taking the union of both sides of this equality gives ⋃{A(x); x ∈ X} = ⋃{y; ∃x ∈ X, y = A(x)}. This completes the proof of part (i).
For part (ii), note that {A(x); x ∈ X} is non-empty because X is non-empty. Then
⋂{A(x); x ∈ X} = {y; ∀z ∈ {A(x); x ∈ X}, y ∈ z}   (10.8.10)
= {y; ∀z, (z ∈ {A(x); x ∈ X} ⇒ y ∈ z)}   (10.8.11)
= {y; ∀z, (z ∈ {w; ∃x ∈ X, A(x) = w} ⇒ y ∈ z)}   (10.8.12)
= {y; ∀z, ((∃x ∈ X, A(x) = z) ⇒ y ∈ z)}   (10.8.13)
= {y; ∀z, ((∃x, (x ∈ X ∧ A(x) = z)) ⇒ y ∈ z)}   (10.8.14)
= {y; ∀z, ∀x, ((x ∈ X ∧ A(x) = z) ⇒ y ∈ z)}   (10.8.15)
= {y; ∀z, ∀x, (x ∈ X ⇒ (A(x) = z ⇒ y ∈ z))}   (10.8.16)
= {y; ∀x, ∀z, (x ∈ X ⇒ (A(x) = z ⇒ y ∈ z))}   (10.8.17)
Proof: Part (i) follows from Theorem 8.4.8 (iii). Part (ii) follows from Theorem 8.4.8 (v). Part (iii) follows
from Theorem 8.4.8 (vii). Part (iv) follows from Theorem 8.4.8 (viii).
10.8.14 Remark: Swapping unions and intersections of a doubly indexed family of sets.
In the degenerate case of Theorem 10.8.15 with I = ∅, the double index set is I × J = ∅ × J = ∅. Then A = ∅, and x ∈ ⋃i∈I ⋂j∈J Ai,j ⇔ ∃i ∈ I, ∀j ∈ J, x ∈ Ai,j ⇔ ∃i, (i ∈ ∅ ∧ ∀j ∈ J, x ∈ Ai,j) ⇔ ⊥, where ⊥ denotes the always-false proposition. (See Notation 3.5.15.) Therefore ⋃i∈I ⋂j∈J Ai,j = ∅. Similarly,
x ∈ ⋂j∈J ⋃i∈I Ai,j ⇔ ∀j ∈ J, ∃i ∈ I, x ∈ Ai,j ⇔ ∀j ∈ J, ∃i, (i ∈ ∅ ∧ x ∈ Ai,j) ⇔ ∀j ∈ J, ⊥.
Therefore ⋂j∈J ⋃i∈I Ai,j = ∅.
10.8.15 Theorem: Let A = (Ai,j)i∈I,j∈J be a doubly indexed family of sets such that J ≠ ∅. Then ⋃i∈I ⋂j∈J Ai,j ⊆ ⋂j∈J ⋃i∈I Ai,j.
Proof: Let A = (Ai,j)i∈I,j∈J be a doubly indexed family of sets such that J ≠ ∅. Then
x ∈ ⋃i∈I ⋂j∈J Ai,j ⇔ ∃i ∈ I, ∀j ∈ J, x ∈ Ai,j
⇒ ∀j ∈ J, ∃i ∈ I, x ∈ Ai,j   (10.8.21)
⇔ x ∈ ⋂j∈J ⋃i∈I Ai,j,
where line (10.8.21) follows from Theorem 6.6.22 (iii). Hence ⋃i∈I ⋂j∈J Ai,j ⊆ ⋂j∈J ⋃i∈I Ai,j.
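As an informal aside, Theorem 10.8.15 gives only an inclusion, and a small finite example shows why the reverse inclusion can fail; the particular family below is an arbitrary choice for the illustration.

```python
# ⋃_{i∈I} ⋂_{j∈J} A_{i,j} ⊆ ⋂_{j∈J} ⋃_{i∈I} A_{i,j}, and strictness is possible.
I, J = {0, 1}, {0, 1}
A = {(0, 0): {1}, (0, 1): {2}, (1, 0): {2}, (1, 1): {1}}
lhs = set().union(*(set.intersection(*(A[i, j] for j in J)) for i in I))
rhs = set.intersection(*(set().union(*(A[i, j] for i in I)) for j in J))
assert lhs <= rhs
assert lhs != rhs     # strict here: lhs is empty, rhs is {1, 2}
```

The failure of the reverse inclusion is exactly the failure of the converse quantifier swap: for every j some i works, but no single i works for all j.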
10.8.16 Remark: Application of a double set-map theorem to families of sets.
Theorem 10.8.17 is an indexed version of Theorem 10.7.6. The single and double set-map notations for f and f⁻¹ are introduced in Definitions 10.6.3 and 10.7.5 respectively. The proof given here for Theorem 10.8.17 is essentially the same as the proof given for Theorem 10.7.6.
10.8.17 Theorem: Properties of double set-maps for indexed families of sets.
Let f : X → Y be a function. Let A = (Ai)i∈I ∈ ℘(X)^I and B = (Bi)i∈I ∈ ℘(Y)^I be families of sets.
(i) f(⋃i∈I Ai) = ⋃i∈I f(Ai) = ⋃f({Ai; i ∈ I}) = ⋃f(Range(A)).
(ii) f(⋂i∈I Ai) ⊆ ⋂i∈I f(Ai) = ⋂f({Ai; i ∈ I}) = ⋂f(Range(A)) if I ≠ ∅.
(iii) f⁻¹(⋃i∈I Bi) = ⋃i∈I f⁻¹(Bi) = ⋃f⁻¹({Bi; i ∈ I}) = ⋃f⁻¹(Range(B)).
(iv) f⁻¹(⋂i∈I Bi) = ⋂i∈I f⁻¹(Bi) = ⋂f⁻¹({Bi; i ∈ I}) = ⋂f⁻¹(Range(B)) if I ≠ ∅.
Proof: For part (i), let f : X → Y be a function, and let (Ai)i∈I be a family of subsets of X. Then
y ∈ f(⋃i∈I Ai) ⇔ ∃x ∈ ⋃i∈I Ai, y = f(x)   (10.8.22)
⇔ ∃i ∈ I, ∃x ∈ Ai, y = f(x)   (10.8.23)
⇔ ∃i ∈ I, y ∈ f(Ai)   (10.8.24)
⇔ y ∈ ⋃i∈I f(Ai).   (10.8.25)
Line (10.8.22) follows from Theorem 6.6.16 (iii). Line (10.8.23) follows from Theorem 6.6.22 (iii). Line (10.8.24) follows from Theorem 4.7.9 (xxxiv). Line (10.8.25) follows from Theorem 6.6.12 (xxvii). Hence f(⋃i∈I Ai) = ⋃i∈I f(Ai). Then ⋃i∈I f(Ai) = ⋃f({Ai; i ∈ I}) = ⋃f(Range(A)) follows directly from Definition 10.7.5.
For part (ii), the same argument with ∃i ∈ I replaced by ∀i ∈ I gives only the forward implication, so f(⋂i∈I Ai) ⊆ ⋂i∈I f(Ai). Then ⋂i∈I f(Ai) = ⋂f({Ai; i ∈ I}) = ⋂f(Range(A)) follows directly from Definition 10.7.5.
For part (iii), let f : X → Y be a function, and let (Bi)i∈I be a family of subsets of Y. Then
x ∈ f⁻¹(⋃i∈I Bi) ⇔ f(x) ∈ ⋃i∈I Bi
⇔ ∃i ∈ I, f(x) ∈ Bi
⇔ ∃i ∈ I, x ∈ f⁻¹(Bi)
⇔ x ∈ ⋃i∈I f⁻¹(Bi).
Hence f⁻¹(⋃i∈I Bi) = ⋃i∈I f⁻¹(Bi). Then ⋃i∈I f⁻¹(Bi) = ⋃f⁻¹({Bi; i ∈ I}) = ⋃f⁻¹(Range(B)) follows directly from Definition 10.7.5.
For part (iv), let f : X → Y be a function, and let (Bi)i∈I be a non-empty family of subsets of Y. Then
x ∈ f⁻¹(⋂i∈I Bi) ⇔ f(x) ∈ ⋂i∈I Bi
⇔ ∀i ∈ I, f(x) ∈ Bi
⇔ ∀i ∈ I, x ∈ f⁻¹(Bi)
⇔ x ∈ ⋂i∈I f⁻¹(Bi).
Hence f⁻¹(⋂i∈I Bi) = ⋂i∈I f⁻¹(Bi). Then ⋂i∈I f⁻¹(Bi) = ⋂f⁻¹({Bi; i ∈ I}) = ⋂f⁻¹(Range(B)) follows directly from Definition 10.7.5.
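As an informal aside, the asymmetry in Theorem 10.8.17 between the set-map and the inverse set-map can be seen on a small non-injective function; the particular families below are arbitrary choices for the illustration.

```python
# The set-map commutes with family unions, but for intersections gives only ⊆
# (strict for this non-injective f), while the inverse set-map commutes with both.
f = {1: 'a', 2: 'a', 3: 'b'}
fwd = lambda S: {f[x] for x in S}                  # set-map on subsets of X
inv = lambda S: {x for x in f if f[x] in S}        # inverse set-map
A = {1: {1, 3}, 2: {2, 3}}                         # family of subsets of X
B = {1: {'a'}, 2: {'a', 'b'}}                      # family of subsets of Y
assert fwd(A[1] | A[2]) == fwd(A[1]) | fwd(A[2])
assert fwd(A[1] & A[2]) < fwd(A[1]) & fwd(A[2])    # strict inclusion
assert inv(B[1] | B[2]) == inv(B[1]) | inv(B[2])
assert inv(B[1] & B[2]) == inv(B[1]) & inv(B[2])
```

This is the concrete content of Remark 10.6.10: the inverse of a function behaves like an injective and surjective relation, so its set-map respects intersections exactly.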
10.8.18 Remark: Some theorems about unions of unions.
Theorems 10.8.19 and 10.8.21 are applicable to topology. Theorem 10.8.19 means that the union of the unions ⋃f(A) of individual set-collections f(A) in a family of set-collections (f(A))A∈I is equal to the union of the combined collection ⋃{f(A); A ∈ I} of the sets in each of the individual collections f(A). In other words, merging each collection f(A) individually into a single set first and then merging these merged sets gives the same result as combining all of the collections f(A) into one big collection and merging that big combined collection.
10.8.19 Theorem: Let f : I → ℘(℘(X)) for sets I and X. Then ⋃{⋃f(A); A ∈ I} = ⋃⋃{f(A); A ∈ I}.
In other words, ⋃A∈I ⋃f(A) = ⋃⋃A∈I f(A).
Proof: The left-hand side expression ⋃{⋃f(A); A ∈ I} satisfies
x ∈ ⋃{⋃f(A); A ∈ I} ⇔ ∃A ∈ I, x ∈ ⋃f(A)
⇔ ∃A ∈ I, ∃B ∈ f(A), x ∈ B.
The right-hand side expression ⋃⋃{f(A); A ∈ I} satisfies
x ∈ ⋃⋃{f(A); A ∈ I} ⇔ ∃B ∈ ⋃{f(A); A ∈ I}, x ∈ B
⇔ ∃B, (B ∈ ⋃{f(A); A ∈ I} ∧ x ∈ B)
⇔ ∃B, ∃A ∈ I, (B ∈ f(A) ∧ x ∈ B)
⇔ ∃A ∈ I, ∃B ∈ f(A), x ∈ B.
Hence ⋃{⋃f(A); A ∈ I} = ⋃⋃{f(A); A ∈ I}.
10.8.21 Theorem: Let X be a set and Q ∈ ℘(℘(X)). Let f : Q → ℘(℘(X)) be a function such that A = ⋃f(A) for all A ∈ Q. Then ⋃Q = ⋃{⋃f(A); A ∈ Q} = ⋃⋃{f(A); A ∈ Q}.
Proof: Clearly ⋃Q = ⋃{A; A ∈ Q} = ⋃{⋃f(A); A ∈ Q}, and from Theorem 10.8.19, it follows that ⋃{⋃f(A); A ∈ Q} = ⋃⋃{f(A); A ∈ Q}.
10.8.22 Remark: Unions and intersections of collections of sets constrained by a predicate.
The set ⋃y∈Y {x ∈ X; P(x, y)} in Theorem 10.8.23 may be thought of as the domain of the set-theoretic formula P if it is subject to the restrictions x ∈ X and y ∈ Y on P(x, y).
The set ⋂y∈Y {x ∈ X; P(x, y)} is perhaps less useful. If the set {(x, y) ∈ X × Y; P(x, y)} is interpreted as the graph of a function f : X → Y, the set ⋂y∈Y {x ∈ X; P(x, y)} is the set of x ∈ X for which f({x}) = Y.
Line (10.8.26) follows from Y ≠ ∅, which follows from the assumption ∀x ∈ X, ∃y ∈ Y, P(x, y).
10.8.24 Theorem: Let ((Ai, Bi))i∈I be a family of set-pairs. Then ⋂i∈I (Ai × Bi) = (⋂i∈I Ai) × (⋂i∈I Bi).
Proof: Let (x, y) be any ordered pair. Then
(x, y) ∈ ⋂i∈I (Ai × Bi) ⇔ ∀i ∈ I, (x, y) ∈ Ai × Bi
⇔ ∀i ∈ I, ((x ∈ Ai) and (y ∈ Bi))
⇔ (∀i ∈ I, x ∈ Ai) and (∀i ∈ I, y ∈ Bi)   (10.8.27)
⇔ x ∈ ⋂i∈I Ai and y ∈ ⋂i∈I Bi
⇔ (x, y) ∈ (⋂i∈I Ai) × (⋂i∈I Bi),
where line (10.8.27) follows from Theorem 6.6.20 (iii). Hence ⋂i∈I (Ai × Bi) = (⋂i∈I Ai) × (⋂i∈I Bi).
fully-defined functions.
Conditions (i), (ii) and (iii) in Definition 10.9.2 correspond to conditions (i), (ii) and (iii) for a fully-defined
function in Definition 10.2.2. E. Mendelson [320], page 168, uses the adjective univocal to refer to ordered-
pair relations which have the uniqueness property in Definition 10.9.2 (i).
10.9.3 Notation: X ⇸ Y denotes the set of all partially defined functions from a set X to a set Y.
f : X ⇸ Y means that f is a partially defined function from a set X to a set Y.
10.9.5 Remark: Possible alternative notations for sets of partially defined functions.
There is a notation Y^X for the set of functions f : X → Y, but there is no simple notation for the set of partially defined functions {f : U → Y; U ⊆ X}. This could be denoted in the following ways.
{f : U → Y; U ⊆ X} = ⋃U⊆X Y^U
≅ (Y ∪ {Y})^X
= (Y⁺)^X.
The second notation has the advantage of making it obvious that the cardinality of the set is (#(Y) + 1)^#(X), but it is not good, meaningful set notation.
10.9.8 Definition:
A partial function f : X ⇸ Y is injective when ∀x1, x2 ∈ Dom(f), (f(x1) = f(x2) ⇒ x1 = x2).
A partial function f : X ⇸ Y is surjective when ∀y ∈ Y, ∃x ∈ Dom(f), f(x) = y.
A partial function f : X ⇸ Y is bijective when it is injective and surjective.
10.9.9 Remark: Set-maps for partial functions and inverses of partial functions.
The definitions and notations for function set-maps and inverse set-maps in Section 10.6 are immediately applicable to partial functions. (Of course, they may also be applied to general relations.) Definition 10.9.10 is a partial function version of Definition 10.6.3. As mentioned in Remark 10.6.5, the notations f and f⁻¹ are employed, in practice, instead of f̄ and f̄⁻¹ respectively, despite the potential for confusion and error.
10.9.10 Definition:
(i) The set-map for a partial function f : X ⇸ Y, for sets X and Y, is the function f̄ : ℘(X) → ℘(Y) defined by f̄(A) = {y ∈ Y; ∃x ∈ A, (x, y) ∈ f} = {f(x); x ∈ A} for all A ⊆ X.
(ii) The inverse set-map for a partial function f : X ⇸ Y, for sets X and Y, is the function f̄⁻¹ : ℘(Y) → ℘(X) defined by f̄⁻¹(B) = {x ∈ X; ∃y ∈ B, (x, y) ∈ f} = {x ∈ X; f(x) ∈ B} for all B ⊆ Y.
10.9.11 Remark: Difference between properties of function set-maps and partial function set-maps.
It is perhaps remarkable that there is almost no difference between Theorem 10.6.6 for function set-maps and Theorem 10.9.12 for partial function set-maps. The only noticeable difference is seen in part (ii′) of Theorem 10.9.12, where the equivalence of set inclusions forces the partial function to be a function.
Theorem 8.2.6 (iii). But f((X \ Dom(f)) \ A) ⊆ f(X \ Dom(f)) = ∅ by part (ii). So f(X \ A) = f(Dom(f) \ A) for all A ∈ ℘(X), by part (iii). Similarly f(X) = f(Dom(f)). So ∀A ∈ ℘(X), f(X \ A) = f(X) \ f(A) if and only if ∀A ∈ ℘(X), f(Dom(f) \ A) = f(Dom(f)) \ f(A). But f(A) = f(A ∩ Dom(f)) and f(Dom(f) \ A) = f(Dom(f) \ (A ∩ Dom(f))) for all A ∈ ℘(X). So X may be replaced by Dom(f) in the statement of part (v′), and f is a function on Dom(f). Hence the assertion follows from the application of Theorem 10.6.6 (v′) to this function.
10.9.13 Remark: Differences between function and partial function inverse set-map properties.
Theorem 10.9.14 for partial function inverse set-maps is almost identical to Theorem 10.6.9 for function
inverse set-maps. The only noticeable difference is that X is replaced by Dom(f ).
Proof: Let f : X ⇸ Y be a partial function. Then f is a function from Dom(f) to Y. So Theorem 10.6.9 is directly applicable to f with X replaced by Dom(f). Hence part (i) follows from Theorem 10.6.9 (i), part (i′) follows from Theorem 10.6.9 (i′), part (ii) follows from Theorem 10.6.9 (ii), part (ii′) follows from Theorem 10.6.9 (ii′), part (ii″) follows from Theorem 10.6.9 (ii″), part (iii) follows from Theorem 10.6.9 (iii), part (iv) follows from Theorem 10.6.9 (iv), part (v) follows from Theorem 10.6.9 (v), and part (v′) follows from Theorem 10.6.9 (v′).
10.9.15 Remark: Comments on partially defined functions.
A relation f : X ⇸ Y is an injective partially defined function if and only if the inverse relation f⁻¹ : Y ⇸ X is an injective partially defined function. Such a function could be referred to as a "partial bijection" or a "local bijection". Such names could be justified by observing that f : Dom(f) → Range(f) is a bijection.
Part (i′) follows trivially from part (i).
For part (i″), suppose that f : X ⇸ Y is surjective. Then f(X) = Y by Definition 10.9.8. So it follows from part (i) that ∀S ∈ ℘(Y), f(f⁻¹(S)) = S. To show the converse, assume ∀S ∈ ℘(Y), f(f⁻¹(S)) = S. Let S = Y. Then f(f⁻¹(Y)) = Y. So for all y ∈ Y, for some x ∈ f⁻¹(Y), f(x) equals y. But f⁻¹(Y) ⊆ Dom(f) by Definition 10.9.10 (ii) for f⁻¹. So for all y ∈ Y, for some x ∈ Dom(f), f(x) equals y. Hence f is surjective by Definition 10.9.8.
For part (ii), suppose that Dom(f) = X. Then f is a function. So ∀S ∈ ℘(X), f⁻¹(f(S)) ⊇ S follows from Theorem 10.7.1 (ii). To show the converse, suppose that f : X ⇸ Y satisfies ∀S ∈ ℘(X), f⁻¹(f(S)) ⊇ S. Let S = X. Then f⁻¹(f(X)) ⊇ X. So for all x ∈ X, for some y ∈ f(X), f(x) = y by Definition 10.9.10 (ii) for f⁻¹. But f(X) ⊆ Y by Definition 10.9.10 (i) for f. So for all x ∈ X, for some y ∈ Y, f(x) = y. Therefore X ⊆ Dom(f). Hence X = Dom(f).
For part (ii′), suppose that f : X ⇸ Y is injective. Let S ∈ ℘(X). Let x ∈ f⁻¹(f(S)). Then by Definition 10.9.10 (ii) for f⁻¹, there is a y ∈ f(S) such that f(x) = y, and then by Definition 10.9.10 (i)
10.10.5 Theorem: The composite of any two partially defined functions is a partially defined function.
Proof: Let f1 and f2 be partially defined functions. By Definition 9.6.2 for the composite of relations, the composite of f1 and f2 is the relation f = {(a, b); ∃c, ((a, c) ∈ f1 ∧ (c, b) ∈ f2)}. Suppose (a, b1) ∈ f and (a, b2) ∈ f. Then (a, c1), (a, c2) ∈ f1 and (c1, b1), (c2, b2) ∈ f2 for some c1 and c2. Since f1 is a partially defined function, c1 = c2. So b1 = b2 because f2 is a partially defined function. Hence f is a partially defined function by Definition 10.9.2.
10.10.6 Definition: The composition or composite of two partially defined functions f1 and f2 is the partially defined function {(a, b); ∃c, ((a, c) ∈ f1 ∧ (c, b) ∈ f2)}.
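As an informal aside, Definition 10.10.6 translates directly into code when partial functions are stored as dicts: a pair (a, b) belongs to the composite exactly when some intermediate value exists; the particular f1 and f2 below are arbitrary choices for the illustration.

```python
# Composite of two partial functions: (a, b) is in f2 ∘ f1 exactly when
# some c has (a, c) ∈ f1 and (c, b) ∈ f2.
f1 = {1: 'x', 2: 'y'}                 # partial function, Dom(f1) = {1, 2}
f2 = {'x': 10}                        # partial function, Dom(f2) = {'x'}

def compose(g, f):
    """Dict form of g ∘ f for partial functions."""
    return {a: g[f[a]] for a in f if f[a] in g}

h = compose(f2, f1)
assert h == {1: 10}                   # 2 drops out: f1(2) = 'y' is not in Dom(f2)
```

The shrinking of the domain in this example is exactly the phenomenon depicted in Figure 10.10.1: the composite of two total functions on their stated domains may still be only partially defined.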
which cannot be gleaned from truth tables alone.)
Proof: Let f1 : A1 ⇸ B1 and f2 : A2 ⇸ B2 be partially defined functions. It follows from Theorem 10.10.5 that f2 ∘ f1 is a partially defined function. It must be shown that Dom(f2 ∘ f1) ⊆ A1 and Range(f2 ∘ f1) ⊆ B2.
Let a ∈ Dom(f2 ∘ f1). Then (a, b) ∈ f2 ∘ f1 for some b. So ∃c, ((a, c) ∈ f1 ∧ (c, b) ∈ f2) by Definition 10.10.6. So (a, c) ∈ f1 for some c. So a ∈ Dom(f1).
Let b ∈ Range(f2 ∘ f1). Then (a, b) ∈ f2 ∘ f1 for some a. So ∃c, ((a, c) ∈ f1 ∧ (c, b) ∈ f2) by Definition 10.10.6. So (c, b) ∈ f2 for some c. So b ∈ Range(f2).
Therefore Dom(f2 ∘ f1) ⊆ Dom(f1) and Range(f2 ∘ f1) ⊆ Range(f2). But Dom(f1) ⊆ A1 and Range(f2) ⊆ B2 by Notation 10.9.3 and Definition 10.9.2 (ii, iii). So Dom(f2 ∘ f1) ⊆ A1 and Range(f2 ∘ f1) ⊆ B2. Hence f2 ∘ f1 : A1 ⇸ B2 by Notation 10.9.3 and Definition 10.9.2 (ii, iii).
[Figure: partial functions f1 : A1 ⇸ B1 and f2 : A2 ⇸ B2, with composite f2 ∘ f1 : A1 ⇸ B2.]
Figure 10.10.1 Composite of two functions is a partially defined function
(ix) Dom(f2 ∘ f1⁻¹) = f1(Dom(f1) ∩ Dom(f2)).
(x) Range(f2 ∘ f1⁻¹) = f2(Dom(f1) ∩ Dom(f2)).
10.11.2 Definition: The Cartesian product of a family of sets (Si)i∈I is the set of functions
{f : I → ⋃i∈I Si; ∀i ∈ I, fi ∈ Si}.
10.11.3 Notation: ∏i∈I Si for a family of sets (Si)i∈I denotes the Cartesian product of the family of sets according to Definition 10.11.2.
10.11.4 Remark: Alternative expression for the Cartesian product of a family of sets.
The Cartesian product in Definition 10.11.2 may be written as
∏i∈I Si = {f ∈ ℘(I × ⋃i∈I Si); (∀j ∈ I, ∃′y ∈ ⋃i∈I Si, (j, y) ∈ f) ∧ (∀j ∈ I, ∃y ∈ Sj, (j, y) ∈ f)},
which shows that ∏i∈I Si is a well-defined ZF set by Theorem 7.7.2, the ZF specification theorem. Otherwise, this formula seems to have no obvious value.
10.11.5 Remark: The special case of the Cartesian product of a constant family of sets.
If Si = X for all i ∈ I, then ∏i∈I Si = X^I for any sets X and I. (See Notation 10.2.13 for the set of functions X^I.)
If I = ℕn for some n ∈ ℤ₀⁺ and Si = X for all i ∈ I, then ∏i∈I Si = X^n. (See Notation 14.1.21 for index sets ℕn.)
10.11.6 Theorem: Cartesian products ∏i∈I Si of families of sets (Si)i∈I have the following properties.
(i) If I = ∅, then ∏i∈I Si = {∅}.
(ii) If Si = ∅ for some i ∈ I, then ∏i∈I Si = ∅.
(iii) If I = {j}, then ∏i∈I Si = {{(j, x)}; x ∈ Sj}.
Proof: For part (i), let (Si)i∈I be a family of sets. Then ∏i∈I Si = {f : I → ⋃i∈I Si; ∀i ∈ I, fi ∈ Si} by Definition 10.11.2. Let I = ∅. Then ∏i∈I Si = {f : ∅ → ∅; ⊤} = ∅^∅ = {∅} by Theorem 10.2.21. (The symbol ⊤ denotes the always-true predicate. See Notation 5.1.10.)
For part (ii), let (Si)i∈I be a family of sets such that Sj = ∅ for some j ∈ I. Then ∏i∈I Si = {f : I → ⋃i∈I Si; ∀i ∈ I, fi ∈ Si} = {f : I → ⋃i∈I Si; fj ∈ ∅ ∧ ∀i ∈ I \ {j}, fi ∈ Si} = ∅.
For part (iii), let (Si)i∈I be a family of sets with I = {j} for some j. Then ∏i∈I Si = {f : {j} → Sj; fj ∈ Sj} by Definition 10.11.2. Hence ∏i∈I Si = {{(j, x)}; x ∈ Sj}.
10.11.8 Definition: A set-family choice function for a family of non-empty sets (XA)A∈I is a function f : I → ⋃A∈I XA which satisfies ∀A ∈ I, f(A) ∈ XA. In other words, it is an element of ∏A∈I XA.
10.11.9 Remark: Axiom of choice and non-emptiness of Cartesian products of families of non-empty sets.
The Cartesian product in Definition 10.11.2 is a well-defined set because it is specified as a subset of the well-defined set ℘(I × ⋃i∈I Si) which is restricted by a well-defined predicate. However, it is not guaranteed to be non-empty in general unless one adds an axiom of choice to the Zermelo-Fraenkel axioms. Except when one is trying very hard to construct pathological sets, it will usually be true that a Cartesian product is non-empty if all of the member sets are non-empty. But the non-emptiness of the Cartesian product of an infinite family of non-empty sets should not be assumed without proof.
If the index set is finite, then the product of a family of non-empty sets is non-empty by an inductive
argument in Zermelo-Fraenkel set theory. If the index set is countably infinite, then the product of a family
of non-empty sets is non-empty by an inductive argument in Zermelo-Fraenkel set theory with the axiom
of countable choice. For a general index set, the product of a family of non-empty sets is non-empty in
Zermelo-Fraenkel set theory with the general axiom of choice. Since finite and countably infinite sets are
defined in Sections 13.3 and 13.5 respectively, the statement and proof of those cases must wait until then,
but the general case can be stated here as Theorem 10.11.10.
If one possesses a well-ordering on the union of a family of non-empty sets, the axiom of choice is unnecessary to prove that the Cartesian product is non-empty. Thus if one possesses a well-ordering on the set ⋃i∈I Si corresponding to the family (Si)i∈I in the proof of Theorem 10.11.10, one may define the choice function A as the set {(i, min(Si)); i ∈ I}. (See Section 11.6 for well-orderings.)
10.11.10 Theorem [zf+ac]: The Cartesian product of a family of non-empty sets is non-empty.
Proof: Some formulations of ZF+AC set theory give the statement of this theorem as the definition of the axiom of choice. Axiom (9) in Definition 7.11.10 means that there exists a set which intersects each member of an arbitrary set of disjoint non-empty sets in exactly one element. For an arbitrary family of non-empty sets (Si)i∈I, the sets {i} × Si for i ∈ I are disjoint. Therefore the axiom of choice implies that there exists a set A ⊆ ⋃i∈I ({i} × Si) such that A ∩ ({i} × Si) contains one and only one element for each i ∈ I. Such a set A is a function from I to ⋃i∈I Si such that A(i) ∈ Si for all i ∈ I. In other words, A ∈ ∏i∈I Si. Hence ∏i∈I Si ≠ ∅.
10.11.11 Remark: The axiom of choice permits universal and existential quantifiers to be swapped.
In predicate logic, from a statement ∃x, ∀y, A(x, y), one may infer the corresponding statement with quantifiers swapped, namely ∀y, ∃x, A(x, y). (See Theorem 6.6.22 (iii).) But the converse inference is not justified. Similarly, if the quantified variables are restricted to specific sets (as in Notation 7.2.7), then from a statement ∃x ∈ S, ∀y ∈ T, A(x, y), one may also infer the corresponding statement with quantifiers swapped, namely ∀y ∈ T, ∃x ∈ S, A(x, y), but the converse inference is not justified. An axiom of choice may be thought of as reversing the quantifier order in a way which, at first sight, might seem to be not justified. With a suitable choice axiom, one may infer (2) from (1), where (Tx)x∈S is a family of sets.
(1) ∀x ∈ S, ∃y ∈ Tx, A(x, y).
(2) ∃y ∈ ∏x∈S Tx, ∀x ∈ S, A(x, y(x)).
In statement (2), the simple variable y has been replaced with a function y : S → ⋃x∈S Tx such that ∀x ∈ S, y(x) ∈ Tx. This is the kind of choice function which is delivered by an axiom of choice. (The axiom of countable choice requires the set S to be countable.) It is very tempting to think that since there exists at least one y in each set Tx in line (1), then y may be regarded as a function of x in the same way that Tx is. But in the absence of a suitable axiom of choice, this kind of functional dependence cannot be proved in general in ZF set theory.
In some common scenarios, swapping quantifiers is easy to achieve without an axiom of choice. A well known example is the ε-δ limit definition. Thus if the proposition ∀ε ∈ ℝ⁺, ∃δ ∈ ℝ⁺, f(B_{a,δ}) ⊆ B_{f(a),ε} is given, one may infer the proposition ∃δ̄ : ℝ⁺ → ℝ⁺, ∀ε ∈ ℝ⁺, f(B_{a,δ̄(ε)}) ⊆ B_{f(a),ε} without invoking any axiom of choice. (This example is discussed further in Remark 44.2.1.)
One may sometimes have the impression that the need for an axiom of choice in a particular proof is a somewhat subjective matter. This is not true. When the quantifiers are reversed (or swapped), the choice function, which is δ̄ : ℝ⁺ → ℝ⁺ in the real function limit example, is supposed to be an object in the set theory universe in which Zermelo-Fraenkel set theory is being interpreted. This object either does or does not exist in that universe. One may not personally know currently whether that object is contained in that universe, but once the universe has been specified, it is a clear-cut true/false question whether the object is contained in it.
10.12.2 Definition: The projection (map) of a Cartesian product S1 × S2 onto a component k ∈ N2 is
the map (x1, x2) ↦ xk from S1 × S2 to Sk.
10.12.3 Notation: πk, for k ∈ N2, denotes the projection map of a Cartesian product S1 × S2 onto the
component k. In other words, πk : S1 × S2 → Sk is defined by (x1, x2) ↦ xk.
10.12.6 Notation: Slice^k_p(A), for k ∈ N2, p ∈ S1 × S2 and A ⊆ S1 × S2, denotes the slice of A through p
along component k. In other words, Slice^k_p(A) = {(x1, x2) ∈ A; ∀i ∈ N2 \ {k}, xi = pi}.
(i) ∀p ∈ S1 × S2, ∀C ∈ P(P(S1 × S2)), π1(Slice^1_p(∪C)) = ∪A∈C π1(Slice^1_p(A)).
(ii) ∀p ∈ S1 × S2, ∀C ∈ P(P(S1 × S2)), π2(Slice^2_p(∪C)) = ∪A∈C π2(Slice^2_p(A)).
Proof: For part (i), let p ∈ S1 × S2 and C ∈ P(P(S1 × S2)). Then π1(Slice^1_p(∪C)) = {x1; x ∈ Slice^1_p(∪C)}
= {x1; x ∈ ∪C and x2 = p2} = {x1; (∃A ∈ C, x ∈ A) and x2 = p2} = {x1; ∃A ∈ C, (x ∈ A and x2 = p2)}
= ∪A∈C {x1; x ∈ A and x2 = p2} = ∪A∈C {x1; x1 ∈ π1(Slice^1_p(A))} = ∪A∈C π1(Slice^1_p(A)).
Part (ii) may be proved as for part (i).
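The slice and projection operations, and the union identity of part (i), can be checked mechanically on small finite sets. This Python sketch uses ad hoc names (`proj`, `slice_`) which are not notation from the text:

```python
# Sketch of the two-component slice/projection identity (illustrative only):
# pi_1(Slice^1_p(union of C)) = union over A in C of pi_1(Slice^1_p(A)).
def proj(k, A):              # pi_k applied elementwise to a set of pairs
    return {x[k - 1] for x in A}

def slice_(k, p, A):         # Slice^k_p(A): fix the other coordinate at p
    other = 2 if k == 1 else 1
    return {x for x in A if x[other - 1] == p[other - 1]}

p = (1, 'a')
C = [{(1, 'a'), (2, 'b')}, {(2, 'a'), (3, 'a')}]   # a set of subsets of S1 x S2
union = set().union(*C)

lhs = proj(1, slice_(1, p, union))
rhs = set().union(*(proj(1, slice_(1, p, A)) for A in C))
assert lhs == rhs
```

The key step, as in the proof, is that membership in a union commutes with the slice condition x2 = p2.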
in general. However, πk⁻¹ is a valid relation, which is the inverse of the function πk : S1 × S2 → Sk, and
when πk⁻¹ is composed with the restricted function f|Slice^k_p(A), the result is a well-defined function from the
subset πk(Slice^k_p(A)) of Sk to Y.
10.12.9 Definition: The projected slice of a function f : A → Y through a point (p1, p2) along component k,
for A ⊆ S1 × S2, a set Y, (p1, p2) ∈ S1 × S2 and k ∈ N2, is the function f|Slice^k_p(A) ∘ πk⁻¹ on the
subset πk(Slice^k_p(A)) of Sk.
10.12.10 Notation: Slice^k_p(f), for k ∈ N2, p ∈ S1 × S2, f : A → Y, A ⊆ S1 × S2 and a set Y, denotes the
slice of f through p along component k. In other words, Slice^k_p(f) = f|Slice^k_p(A).
10.12.11 Remark: Projected function slices, substitution operators and lift maps.
The projected function slice operation in Definition 10.12.9 has the same effect as the substitution of one
of the parameters of the function. Thus
∀t ∈ Sk, (Slice^k_p(f) ∘ πk⁻¹)(t) = (f|Slice^k_p(A) ∘ πk⁻¹)(t)
= (f ∘ πk⁻¹|Slice^k_p(A))(t)
= f(subs_{k,t}(p)),
where subs_{k,t}(p) means the result of substituting t for element k of p. (See Definition 14.11.6 (vii).) However,
it is more accurate to think of projected function slices as lift maps because they are right inverses of
projection maps.
10.12.13 Notation: Lift^k_p, for k ∈ N2 and p ∈ S1 × S2, denotes the lift map for component k of the
point p. In other words, Lift^k_p : Sk → S1 × S2 is the map defined by
∀t ∈ Sk, ∀i ∈ N2, Lift^k_p(t)(i) = { t if i = k; pi if i ≠ k }.
simpler and more meaningful expression f ∘ Lift^k_p. Lift maps are right inverses of projection maps as follows.
In other words, ∀k ∈ N2, ∀p ∈ S1 × S2, πk ∘ Lift^k_p = id_{Sk}. Note, however, that this assumes the partial
function composition concept in Definition 10.10.6.
Substitution operators are distinguished from lift maps by the fact that substitution operators map from
tuples to tuples, whereas a lift map maps from values of individual components to the tuple space. Thus
Lift^k_p(t) gives the same output as the expression subs_{k,t}(p). In other words, p is a parameter for the lift
map, but an input for the substitution operator, whereas t is the other way around.
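The relation πk ∘ Lift^k_p = id_{Sk} can be illustrated on small finite sets. In this Python sketch the names `proj` and `lift` are ad hoc, not notation from the text:

```python
# Sketch: lift maps are right inverses of projection maps,
# i.e. pi_k(Lift^k_p(t)) = t for all t in S_k.
def proj(k, x):                     # pi_k(x) for a pair x
    return x[k - 1]

def lift(k, p, t):                  # Lift^k_p(t): put t in slot k, p elsewhere
    x = list(p)
    x[k - 1] = t
    return tuple(x)

p = (0, 'a')
for t in (1, 2, 3):                 # component 1
    assert proj(1, lift(1, p, t)) == t
for t in ('a', 'b'):                # component 2
    assert proj(2, lift(2, p, t)) == t
```

Note that lift followed by projection is the identity, but projection followed by lift generally is not, since the other component is overwritten with the value from p.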
10.13.2 Definition: The projection map of a Cartesian product ∏i∈I Si onto a component k ∈ I is the
map from ∏i∈I Si to Sk defined by x ↦ xk for x = (xi)i∈I ∈ ∏i∈I Si.
10.13.3 Notation: πk, for k ∈ I, denotes the projection map of a Cartesian set-product ∏i∈I Si onto the
component k. In other words, πk : ∏i∈I Si → Sk is defined by πk : x ↦ xk.
10.13.10 Definition: Slices of subsets of Cartesian set-products through given points.
The slice of a set A through a point p along component k, for A ⊆ ∏i∈I Si, p = (pi)i∈I ∈ ∏i∈I Si and k ∈ I,
is the subset {x ∈ A; ∀i ∈ I \ {k}, xi = pi} of A.
The projected slice of a set A through a point p along component k, for A ⊆ ∏i∈I Si, p = (pi)i∈I ∈ ∏i∈I Si
and k ∈ I, is the subset πk({x ∈ A; ∀i ∈ I \ {k}, xi = pi}) = {xk; x ∈ A and ∀i ∈ I \ {k}, xi = pi} of Sk.
10.13.11 Notation: Slice^k_p(A), for k ∈ I, p ∈ ∏i∈I Si and A ⊆ ∏i∈I Si, denotes the slice of A through p
along component k. In other words, Slice^k_p(A) = {x ∈ A; ∀i ∈ I \ {k}, xi = pi}.
10.13.12 Remark: Slice sets for empty and singleton index sets.
If I = ∅ in Definition 10.13.10, then ∏i∈I Si = {∅}. Therefore p = ∅ and either A = ∅ or A = {a} = {∅}.
However, the condition k ∈ I cannot be met. So Definition 10.13.10 is vacuous in this case. Similarly, there
are no projected slices either.
Let I be a singleton, such as {j}. Then ∏i∈I Si = {{(j, xj)}; xj ∈ Sj}. (This is not precisely the same
as Sj = {xj; xj ∈ Sj}, but it is often identified with it. This identification is not assumed here.) In this
case, p = {(j, xj)} for some xj ∈ Sj, and A is a subset of {{(j, xj)}; xj ∈ Sj}. The only possibility for k is j.
So the slice of A through p along component k is Slice^k_p(A) = {x ∈ A; ∀i ∈ I \ {k}, xi = pi}, which equals
A because I \ {k} = ∅. In other words, the intuitively expected result is obtained.
10.13.13 Theorem: Let (Si)i∈I be a family of sets.
(i) If I = {j}, then ∀k ∈ I, ∀p ∈ ∏i∈I Si, ∀A ∈ P(∏i∈I Si), πk(Slice^k_p(A)) = πk(A).
(ii) If I = {j}, then ∀k ∈ I, ∀p ∈ ∏i∈I Si, ∀C ∈ P(P(∏i∈I Si)), πk(Slice^k_p(∪C)) = ∪A∈C πk(Slice^k_p(A)).
(iii) ∀k ∈ I, ∀p ∈ ∏i∈I Si, ∀C ∈ P(P(∏i∈I Si)), πk(Slice^k_p(∪C)) = ∪A∈C πk(Slice^k_p(A)).
Proof: Part (i) follows from the observation in Remark 10.13.12 that if I is a singleton, then Slice^k_p(A) = A
for all k ∈ I, p ∈ ∏i∈I Si and A ∈ P(∏i∈I Si). Hence πk(Slice^k_p(A)) = πk(A).
For part (ii), let I = {j}, k ∈ I, p ∈ ∏i∈I Si and C ∈ P(P(∏i∈I Si)). Then by part (i) and Theorem 10.13.4,
πk(Slice^k_p(∪C)) = πk(∪C) = ∪A∈C πk(A) = ∪A∈C πk(Slice^k_p(A)).
For part (iii), let k ∈ I, p ∈ ∏i∈I Si and C ∈ P(P(∏i∈I Si)). Then
πk(Slice^k_p(∪C)) = {xk; x ∈ Slice^k_p(∪C)}
= {xk; x ∈ ∪C and ∀i ∈ I \ {k}, xi = pi}
= {xk; (∃A ∈ C, x ∈ A) and ∀i ∈ I \ {k}, xi = pi}
= {xk; ∃A ∈ C, (x ∈ A and ∀i ∈ I \ {k}, xi = pi)}
= ∪A∈C {xk; x ∈ A and ∀i ∈ I \ {k}, xi = pi}
= ∪A∈C {xk; xk ∈ πk(Slice^k_p(A))}
= ∪A∈C πk(Slice^k_p(A)).
In the case I = ∅, the proposition k ∈ I is always false, and so the proposition is true. The case that I is
a singleton (i.e. when I \ {k} = ∅) is additionally verified by part (ii).
10.13.15 Notation: Slice^k_p(f), for k ∈ I, p ∈ ∏i∈I Si, f : A → Y, A ⊆ ∏i∈I Si and a set Y, denotes the
slice of f through p along component k. In other words, Slice^k_p(f) = f|Slice^k_p(A).
10.13.16 Definition: Lift maps from component spaces to Cartesian set-products.
The lift map for component k to a point p ∈ ∏i∈I Si, for k ∈ I, is the map from Sk to ∏i∈I Si defined by
t ↦ x with xk = t and ∀i ∈ I \ {k}, xi = pi.
10.13.17 Notation: Lift^k_p, for k ∈ I and p ∈ ∏i∈I Si, denotes the lift map for component k of the point p.
In other words, Lift^k_p : Sk → ∏i∈I Si is the map defined by
∀t ∈ Sk, ∀i ∈ I, Lift^k_p(t)(i) = { t if i = k; pi if i ≠ k }.
10.14.2 Theorem: Let f and g be functions. Let h be the direct product of f and g, regarded as relations.
(See Definition 9.7.2.) In other words, let h = {((x, y), (u, v)); (x, u) ∈ f and (y, v) ∈ g}.
(i) h is a function.
(ii) Dom(h) = Dom(f) × Dom(g).
(iii) Range(h) = Range(f) × Range(g).
(iv) ∀(x, y) ∈ Dom(h), h(x, y) = (f(x), g(y)).
Proof: For part (i), let ((x, y), (u1, v1)) ∈ h and ((x, y), (u2, v2)) ∈ h. Then (x, u1) ∈ f, (y, v1) ∈ g,
(x, u2) ∈ f and (y, v2) ∈ g by Definition 9.7.2. So u1 = u2 and v1 = v2 by Definition 10.2.2 (i). Therefore
(u1, v1) = (u2, v2) by Theorem 9.2.17. Hence h is a function by Definition 10.2.2 (i).
Part (ii) follows from Theorem 9.7.5 (i).
Part (iii) follows from Theorem 9.7.5 (ii).
For part (iv), let (x, y) ∈ Dom(h). Then x ∈ Dom(f) and y ∈ Dom(g) by part (ii). So (x, f(x)) ∈ f and
(y, g(y)) ∈ g. Therefore ((x, y), (f(x), g(y))) ∈ h. Hence h(x, y) = (f(x), g(y)).
10.14.3 Definition: The (double-domain) direct product of two functions f and g is the function from
Dom(f) × Dom(g) to Range(f) × Range(g) defined by (x, y) ↦ (f(x), g(y)) for (x, y) ∈ Dom(f) × Dom(g).
10.14.4 Notation: f × g, for functions f and g, denotes the double-domain direct product of f and g.
Thus
∀(x, y) ∈ Dom(f) × Dom(g), (f × g)(x, y) = (f(x), g(y)).
Alternative notation: f ×̈ g.
10.14.5 Remark: Emphatic notation for double-domain direct product of functions.
In some contexts, the double-domain function product in Definition 10.14.3 and the common-domain function
product in Definition 10.15.2 must be distinguished. In this case, the notation f ×̈ g may be used, where
the double dot is a mnemonic for the double domain.
10.14.6 Theorem: Let f and g be functions.
(i) If f and g are injective, then f × g is injective.
(ii) If f : X1 → Y1 and g : X2 → Y2 are bijections, then f × g : X1 × X2 → Y1 × Y2 is a bijection.
Proof: For part (i), let (x1, y1), (x2, y2) ∈ Dom(f) × Dom(g) satisfy (f × g)(x1, y1) = (f × g)(x2, y2). Then
(f(x1), g(y1)) = (f(x2), g(y2)) by Definition 10.14.3. So f(x1) = f(x2) and g(y1) = g(y2) by Theorem 9.2.17.
Therefore x1 = x2 and y1 = y2 by Definition 10.5.2. So (x1, y1) = (x2, y2) by Theorem 9.2.17. Hence f × g
is injective by Definition 10.5.2.
For part (ii), let f : X1 → Y1 and g : X2 → Y2 be bijections. Then f × g is injective by part (i).
But Dom(f × g) = Dom(f) × Dom(g) = X1 × X2 and Range(f × g) = Range(f) × Range(g) = Y1 × Y2 by
Theorem 10.14.2 (ii, iii). So f × g : X1 × X2 → Y1 × Y2 is a surjective function. Hence f × g : X1 × X2 → Y1 × Y2
is a bijection.
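Theorem 10.14.2 (ii, iii) and Theorem 10.14.6 can be checked on small finite functions represented as dictionaries. This is an illustrative sketch, not the text's formalism:

```python
# Sketch of the double-domain direct product f x g of two finite functions,
# represented as dictionaries (illustrative names only).
def direct_product(f, g):
    """Double-domain direct product: (x, y) |-> (f(x), g(y))."""
    return {(x, y): (fx, gy) for x, fx in f.items() for y, gy in g.items()}

f = {1: 'a', 2: 'b'}          # a bijection X1 -> Y1
g = {10: 'u', 20: 'v'}        # a bijection X2 -> Y2
h = direct_product(f, g)

# Dom(h) = Dom(f) x Dom(g).
assert set(h) == {(x, y) for x in f for y in g}
# Range(h) = Range(f) x Range(g).
assert set(h.values()) == {('a', 'u'), ('a', 'v'), ('b', 'u'), ('b', 'v')}
# f and g injective implies f x g injective (no repeated values).
assert len(set(h.values())) == len(h)
```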
10.14.7 Remark: Composition of double-domain direct products of functions.
Unsurprisingly, the composition of two double-domain direct products of functions yields a single double-
domain direct product of two composites of functions. (See Definition 10.4.14 for composition of functions.)
This is shown in Theorem 10.14.8. This kind of construction is relevant to transition maps for direct products
of manifolds. (See for example Theorem 50.12.5.)
10.14.8 Theorem: Let f1, f2, g1, g2 be functions with Range(f1) ⊆ Dom(f2) and Range(g1) ⊆ Dom(g2).
Then (f2 ∘ f1) × (g2 ∘ g1) = (f2 × g2) ∘ (f1 × g1).
Proof: From Definitions 10.4.14 and 10.14.3, it follows that
∀(x, y) ∈ Dom(f1) × Dom(g1),
((f2 ∘ f1) × (g2 ∘ g1))(x, y) = (f2(f1(x)), g2(g1(y)))
= (f2 × g2)(f1(x), g1(y))
= (f2 × g2)((f1 × g1)(x, y))
= ((f2 × g2) ∘ (f1 × g1))(x, y).
Hence (f2 ∘ f1) × (g2 ∘ g1) = (f2 × g2) ∘ (f1 × g1).
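The composition identity of Theorem 10.14.8 can be verified numerically for sample functions. A Python sketch (the functions are ad hoc, for illustration only):

```python
# Numerical check of (f2 . f1) x (g2 . g1) = (f2 x g2) . (f1 x g1)
# for some ad hoc integer functions.
f1, f2 = (lambda x: x + 1), (lambda x: 2 * x)
g1, g2 = (lambda y: y * y), (lambda y: y - 3)

def prod(f, g):
    """Double-domain direct product as a two-argument function."""
    return lambda x, y: (f(x), g(y))

lhs = prod(lambda x: f2(f1(x)), lambda y: g2(g1(y)))   # (f2.f1) x (g2.g1)
rhs = lambda x, y: prod(f2, g2)(*prod(f1, g1)(x, y))   # (f2 x g2) . (f1 x g1)

for x in range(5):
    for y in range(5):
        assert lhs(x, y) == rhs(x, y)
```

Both sides compute (2(x + 1), y² − 3), as the chain of equalities in the proof predicts.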
10.14.9 Remark: Definition and notation for the double-domain direct product of partial functions.
The direct product of partial functions in Definition 10.14.10 is generalised from Definition 10.14.3 for fully
defined functions. This kind of direct product is useful for the definition of atlases for direct products
of manifolds. (See Definition 49.4.6.) Definition 10.14.10 is specialised from Definition 9.7.2 for general
relations.
10.14.10 Definition: The (double-domain) direct product of two partial functions f and g is the partial
function {((x1, x2), (y1, y2)); (x1, y1) ∈ f and (x2, y2) ∈ g}.
10.14.11 Notation: f × g, for partial functions f and g, denotes the double-domain direct product of f
and g.
10.14.12 Remark: Domain and range of the direct product of two partial functions.
It is easily seen from Definition 10.14.10 and Notation 10.14.11 that Dom(f × g) = Dom(f) × Dom(g) and
Range(f × g) = Range(f) × Range(g). One may then write:
∀((x1, x2), (y1, y2)) ∈ Dom(f × g) × Range(f × g),
((x1, x2), (y1, y2)) ∈ f × g ⇔ ((x1, y1) ∈ f and (x2, y2) ∈ g).
Part (iii) follows from parts (ii) and (iii).
Proof: For part (i), let y ∈ (f1 × f2)(S). Then y = (f1 × f2)(x) for some x ∈ S. So by Definition 10.15.2,
y = (f1(x), f2(x)) for some x ∈ S. Therefore y ∈ f1(S) × f2(S) by Definition 9.4.2. It then follows that
(f1 × f2)(S) ⊆ f1(S) × f2(S).
Part (ii) follows from part (i).
For part (iii), let S1 ⊆ Y1 and S2 ⊆ Y2. Then by Definition 10.15.2, x ∈ (f1 × f2)⁻¹(S1 × S2) if and only
if (f1(x), f2(x)) ∈ S1 × S2, which holds if and only if f1(x) ∈ S1 and f2(x) ∈ S2, which holds if and only if
x ∈ f1⁻¹(S1) ∩ f2⁻¹(S2). Hence (f1 × f2)⁻¹(S1 × S2) = f1⁻¹(S1) ∩ f2⁻¹(S2).
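The inverse-image identity of part (iii) can be checked on a finite common domain. A Python sketch with ad hoc functions:

```python
# Sketch: for the common-domain product, the inverse image of a product set
# is the intersection of the inverse images.
X = range(10)
f1 = lambda x: x % 3
f2 = lambda x: x % 2

S1, S2 = {0, 1}, {0}

product_set = {(a, b) for a in S1 for b in S2}          # S1 x S2
lhs = {x for x in X if (f1(x), f2(x)) in product_set}    # (f1 x f2)^{-1}(S1 x S2)
rhs = ({x for x in X if f1(x) in S1}
       & {x for x in X if f2(x) in S2})                  # f1^{-1}(S1) n f2^{-1}(S2)
assert lhs == rhs
```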
Figure 10.15.1 Maps and spaces for a common-domain direct function product
10.16.2 Definition: The quotient map of an equivalence relation R on a set X is the map f : X → X/R
given by f : x ↦ {y ∈ X; x R y}.
∀y1, y2 ∈ Y, y1 ≠ y2 ⇒ f⁻¹({y1}) ∩ f⁻¹({y2}) = ∅.
Figure 10.16.1 Partitioning of a set X by an inverse function f⁻¹
Functions provide a useful tool for partitioning sets. The value of a function f on a set X may be thought
of as a tag which identifies elements of X which belong to the same part of the partition. In other words, a
function effectively defines an equivalence relation on its domain.
10.16.6 Remark: The equivalence relation induced by an inverse function is called an equivalence kernel.
The relation R on X in Definition 10.16.7 is an equivalence relation by Theorem 10.16.3. (This definition of
equivalence kernel is given, for example, by MacLane/Birkhoff [106], page 33.)
Proof: For part (i), let R be the equivalence kernel of a function f : X → Y. For x ∈ X, let g(x) =
f⁻¹({f(x)}). Then g(x) = {x′ ∈ X; f(x′) ∈ {f(x)}} = {x′ ∈ X; f(x′) = f(x)} = {x′ ∈ X; x′ R x} ∈ X/R.
So g is well defined.
Let S ∈ X/R. Then S = {x′ ∈ X; x′ R x} = {x′ ∈ X; f(x′) = f(x)} for some x ∈ X, and so S = g(x).
Therefore g : X → X/R is surjective.
For part (ii), define h to be the set of ordered pairs {(g(x), f(x)); x ∈ X}. Then h ⊆ (X/R) × f(X) because
Range(g) = X/R by part (i). So h is a relation from X/R to f(X). In fact, Range(g) = X/R implies
that Dom(h) = X/R. To show that h(S) has a unique value for all S ∈ X/R, let S = g(x1) = g(x2)
for x1, x2 ∈ X. Then {x ∈ X; f(x) = f(x1)} = {x ∈ X; f(x) = f(x2)}. So f(x1) = f(x2). Therefore h is
well defined.
The surjectivity of h follows from the definition of f(X) = {f(x); x ∈ X}. To show that h is injective,
suppose that S1, S2 ∈ X/R satisfy h(S1) = h(S2). Then S1 = g(x1) and S2 = g(x2) for some x1, x2 ∈ X
by the surjectivity of g. So h(g(x1)) = h(g(x2)), and so f(x1) = f(x2). So g(x1) = g(x2), which means
that S1 = S2. Therefore h is injective.
Hence h is a well-defined bijection.
For part (iii), let f : X → Y be a function, and define g and h as in parts (i) and (ii). Since g : X → X/R and
h : X/R → f(X) are well-defined surjections, it follows that h ∘ g : X → f(X) is a well-defined surjection.
Let x ∈ X. Then (h ∘ g)(x) = h(g(x)) = f(x) by the definition of h. Hence h ∘ g = f.
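The factorisation f = h ∘ g through the equivalence kernel can be computed explicitly for finite sets. In this Python sketch (names are ad hoc), g sends each point to its equivalence class and h sends each class to the common f-value:

```python
# Sketch of the factorisation f = h . g through the equivalence kernel of f.
X = range(8)
f = lambda x: x % 3                       # any function on X

# Quotient map g: each x goes to its equivalence class f^{-1}({f(x)}).
g = {x: frozenset(x2 for x2 in X if f(x2) == f(x)) for x in X}
# Induced map h on classes; well defined since g(x1) = g(x2) => f(x1) = f(x2).
h = {g[x]: f(x) for x in X}

assert all(h[g[x]] == f(x) for x in X)    # h . g = f
assert len(h) == len(set(h.values()))     # h is a bijection onto f(X)
```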
10.17.2 Definition: The partial Cartesian product of a family of sets (Si)i∈I is the set of functions
{x : J → ∪i∈I Si; J ⊆ I and ∀i ∈ J, xi ∈ Si}.
10.17.3 Notation: ∏̃i∈I Si denotes the partial Cartesian product of a family of sets (Si)i∈I. Thus
∏̃i∈I Si = {x : J → ∪i∈I Si; J ⊆ I and ∀i ∈ J, xi ∈ Si}.
(iii) ∀x, y ∈ X, ∀i ∈ Dom(x) ∩ Dom(y), (xi = yi ⇒ x = y).
In other words, the elements of X are non-empty elements of ∏̃i∈I Si such that every element of each set Si
is the value xi of one and only one partial choice function x ∈ X.
10.17.9 Remark: Gluing sets onto each other.
Collage spaces are closely related to quotient sets, which are introduced in Definition 9.8.5. Definition 10.17.8
requires some interpretation. Strictly speaking, a subset X of the set ∏̃i∈I Si is only a representation of a
collage space of the family of sets (Si)i∈I. The collage space of a family of sets is a kind of equivalence class
of the sets in the family. It is easier to interpret Definition 10.17.8 if the sets Si are pairwise disjoint. In
this case, there is a canonical map f : ∪i∈I Si → X defined so that f(s) is the unique element x of X such
that xi = s for some i ∈ I. (See Figure 10.17.1, where a dot is used in tuples such as (w1, ·) for the
undefined elements of sequences in X.)
Figure 10.17.1 The canonical map f : ∪i∈I Si → X for a collage space of sets S1 and S2
Since f is a well-defined surjective function, the sets of the form f⁻¹({x}) for x ∈ X are non-empty and
constitute a partition of ∪i∈I Si. The elements of each set f⁻¹({x}) are regarded as identified. In other
words, the elements in such a set are regarded as glued onto each other.
If the sets Si are not disjoint, it is straightforward to attach a tag i to elements of each Si so as to force the
sets to be disjoint. If the sets Si are disjoint, it is possible to formalise the elements of the collage space as a
partition of the set ∪i∈I Si rather than as a set of tagged elements of the partial Cartesian product ∏̃i∈I Si.
In fact, the partial Cartesian product set ∏̃i∈I Si is simply the set of subsets of ∪i∈I Si with tags on each
element to indicate which set Si it was drawn from.
The idea of superposing sets to construct collage spaces is fundamental to differential geometry. Differ-
entiable and topological manifolds are constructed by gluing portions of Cartesian spaces onto each other
to create more general classes of spaces. The sets (Si )iI are in this case the domains of charts in an atlas.
The domain of each chart is a patch. The patches are glued together to form an abstract manifold. It is
necessary to also define a collage-space topology and a collage-space differentiable structure, and so forth,
in order to build up the structural layers of manifolds.
(Cj)j∈J of the subset A of X such that ∀j ∈ J, ∃i ∈ I, Cj ⊆ Bi.
10.18.5 Remark: Confusion of families of sets with their ranges.
There is a subtle issue with the inclusion (Cj)j∈J ⊆ (Bi)i∈I in Definition 10.18.3. It is not important that
the indexing of the two families (Cj)j∈J and (Bi)i∈I should be the same. In fact, only the ranges of these
two families are intended to satisfy the inclusion. In other words, {Cj; j ∈ J} ⊆ {Bi; i ∈ I}.
10.18.6 Remark: Indexed set covers and collage spaces.
Theorem 10.18.7 shows that indexed covers of sets are in essence the same thing as the collage spaces in
Definition 10.17.8. However, a collage space is more general because it permits the gluing together of disjoint
sets without even having a substrate (such as the set A in Definition 10.18.2) to attach the patches to. A
collage space builds a set from patches in the air.
10.18.7 Theorem: Let (Bi)i∈I be an indexed cover of A ⊆ X. Define x(s) = {(i, s); i ∈ I and s ∈ Bi} for
all s ∈ A. Let Y = {x(s); s ∈ A}. Then Y is a collage space of (Bi)i∈I.
Proof: For all s ∈ A, let J(s) = {i ∈ I; s ∈ Bi}. Then x(s) is a function with Dom(x(s)) = J(s) ⊆ I and
∀i ∈ J(s), x(s)(i) ∈ Bi. So x(s) ∈ ∏̃i∈I Bi by Notation 10.17.3. Thus Y ⊆ ∏̃i∈I Bi.
Let s ∈ A. Then s ∈ ∪i∈I Bi by Definition 10.18.2 (ii). So s ∈ Bi for some i ∈ I. So x(s) ≠ ∅. So ∅ ∉ Y.
Therefore Y satisfies Definition 10.17.8 (i).
Let i ∈ I and s ∈ Bi. Then x(s) ∈ Y and x(s)(i) = s. So ∀i ∈ I, ∀s ∈ Bi, ∃y ∈ Y, yi = s. Therefore Y
satisfies Definition 10.17.8 (ii).
Let y¹, y² ∈ Y. Then y¹ = x(s¹) and y² = x(s²) for some s¹, s² ∈ A. Let i ∈ Dom(y¹) ∩ Dom(y²).
Suppose that y¹_i = y²_i. Then x(s¹)(i) = x(s²)(i). So s¹ = s². So x(s¹) = x(s²). So y¹ = y². Consequently
∀y¹, y² ∈ Y, ∀i ∈ Dom(y¹) ∩ Dom(y²), (y¹_i = y²_i ⇒ y¹ = y²). Therefore Y satisfies Definition 10.17.8 (iii). Hence
Y is a collage space of (Bi)i∈I.
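The construction of Theorem 10.18.7 can be carried out explicitly for a small indexed cover, checking conditions (i), (ii) and (iii) of Definition 10.17.8. A Python sketch (ad hoc data; each x(s) is represented as a set of pairs (i, s)):

```python
# Sketch: build the collage space Y = {x(s); s in A} from an indexed cover (B_i).
A = {1, 2, 3, 4}
B = {'i': {1, 2, 3}, 'j': {3, 4}}                 # indexed cover of A

def x(s):
    """x(s) = {(i, s); i in I and s in B_i}, a partial choice function."""
    return frozenset((i, s) for i in B if s in B[i])

Y = {x(s) for s in A}

# (i) The empty function is not an element of Y.
assert frozenset() not in Y
# (ii) Every element s of every B_i is the value at i of some y in Y.
assert all(any((i, s) in y for y in Y) for i in B for s in B[i])
# (iii) If two elements of Y agree at a common index, they are equal.
for y1 in Y:
    for y2 in Y:
        common = {i for i, s in y1} & {i for i, s in y2}
        for i in common:
            s1 = next(s for j, s in y1 if j == i)
            s2 = next(s for j, s in y2 if j == i)
            if s1 == s2:
                assert y1 == y2
```

Here the point 3 lies in both patches, so x(3) = {('i', 3), ('j', 3)} records the gluing of the two copies of 3.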
map-rule notation in Remark 10.19.3. As a practical example, a tangent operator field on a differentiable
manifold M is of the form X : M → ((M → R) → R). (Here A = M, B = (M → R) and C = R.) This
may be transposed as the function Xᵗ : (M → R) → (M → R) defined by Xᵗ(f)(x) = X(x)(f) for all
f ∈ (M → R) and x ∈ M. (In terms of the double-map-rule notation, Xᵗ : f ↦ (x ↦ X(x)(f)).) In fact,
whenever a function is function-valued, and the target functions all have the same domain, the base function
may be transposed in this way to make a new function.
It often happens in differential geometry that function-valued functions are transposed in this way for
convenience according to context. Noteworthy examples of this are differentials and connections. There are
four different ways of representing the information in a function f : A × B → C using transposition and
domain-splitting. These are as follows.
f : A × B → C
fᵗ : B × A → C defined by fᵗ(b, a) = f(a, b) or fᵗ : (b, a) ↦ f(a, b)
f̄ : A → (B → C) defined by f̄(a)(b) = f(a, b) or f̄ : a ↦ (b ↦ f(a, b))
f̄ᵗ : B → (A → C) defined by f̄ᵗ(b)(a) = f(a, b) or f̄ᵗ : b ↦ (a ↦ f(a, b)),
where f̄ denotes the domain-split of f. Any one of these functions may be defined in terms of any of the
others. Therefore they all contain the same information. Functions are often freely converted between these
forms with little or no comment.
Any function defined on a cross-product may be regarded as a function-valued function by fixing the value
of one coordinate and constructing a function using the remaining coordinate. (See Theorem 10.19.7.) The
functions f̄ : A → (B → C) and f̄ᵗ : B → (A → C) may be thought of as projections of f onto A and B
respectively. One could even invent notations such as π1 f for f̄ and π2 f for f̄ᵗ. Obviously there are very
many more ways of doing this in the case of a Cartesian product of many sets.
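The four representations of f : A × B → C correspond to transposition and currying, which can be written down directly. A Python sketch with an ad hoc function:

```python
# Sketch of the four equivalent representations of f : A x B -> C.
f   = lambda a, b: 10 * a + b             # f   : A x B -> C
ft  = lambda b, a: f(a, b)                # f^t : B x A -> C (transpose)
fc  = lambda a: (lambda b: f(a, b))       # f-bar   : A -> (B -> C) (domain-split)
fct = lambda b: (lambda a: f(a, b))       # f-bar^t : B -> (A -> C)

# All four contain the same information.
for a in range(3):
    for b in range(3):
        assert f(a, b) == ft(b, a) == fc(a)(b) == fct(b)(a)
```

Fixing one argument and abstracting the other, as in `fc` and `fct`, is exactly the currying operation formalised in Theorem 10.19.7.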
10.19.7 Theorem: Let X, Y and A be sets. Define Φ1 : A^{X×Y} → (A^Y)^X and Φ2 : A^{X×Y} → (A^X)^Y by
Φ1 : f ↦ (x ↦ (y ↦ f(x, y))) and Φ2 : f ↦ (y ↦ (x ↦ f(x, y))). Then Φ1 and Φ2 are bijections (with respect
to their indicated domains and ranges).
Proof: Let X, Y and A be sets. Define Φ1 : A^{X×Y} → (A^Y)^X by Φ1 : f ↦ (x ↦ (y ↦ f(x, y))). This is
a well-defined map because for all functions f : X × Y → A and elements x ∈ X and y ∈ Y, f(x, y) has a
well-defined value.
Let g ∈ (A^Y)^X. Then g(x) ∈ A^Y for all x ∈ X. Define a function f : X × Y → A by f(x, y) = g(x)(y) for
all (x, y) ∈ X × Y. Then Φ1(f)(x)(y) = f(x, y) = g(x)(y) for all (x, y) ∈ X × Y. This means that Φ1(f) = g.
So g ∈ Range(Φ1). Therefore Φ1 : A^{X×Y} → (A^Y)^X is a surjection.
Now suppose that a function g ∈ (A^Y)^X satisfies Φ1(f1) = g and Φ1(f2) = g for some f1, f2 ∈ A^{X×Y}. Then
Φ1(f1)(x)(y) = g(x)(y) for all x ∈ X and y ∈ Y. But Φ1(f1)(x)(y) = f1(x, y) by the definition of Φ1. So
f1(x, y) = g(x)(y) for all (x, y) ∈ X × Y. Similarly f2(x, y) = g(x)(y) for all (x, y) ∈ X × Y. Therefore
f1(x, y) = f2(x, y) for all (x, y) ∈ X × Y, which means that f1 = f2. Therefore Φ1 is an injection. Hence
Φ1 is a bijection.
It may be proved that Φ2 : A^{X×Y} → (A^X)^Y is a bijection in the same way as for Φ1, or one may simply
transpose the function argument, apply the result for Φ1, and then transpose back to the original map.
10.19.8 Remark: Circled arrow notation for the action of one set on another set.
As a minor innovation (if it is indeed new), actions of sets on other sets are indicated in diagrams (such as
Figure 46.7.2) by a circled arrow or William Tell notation .
Thus the expression G F denotes
G (F F ) or G F F . Then if : G F F is a left group action, for example, this could be
written as : G
F instead of : G (F F ). (This is more useful for arrows in diagrams than in
linear text.)
Chapter 11
Order
(Diagram: species of order relations: relation, partial order, lattice order, total order, well-ordering.)
11.1.3 Theorem: A relation R on a set X is a partial order on X if and only if the inverse relation R⁻¹
is a partial order on X.
Proof: Let R be a partial order on X. Then R satisfies Definition 11.1.2 conditions (i), (ii) and (iii). So
R⁻¹ satisfies
(i′) ∀x ∈ X, x R⁻¹ x;
(ii′) ∀x, y ∈ X, (y R⁻¹ x ∧ x R⁻¹ y) ⇒ x = y;
(iii′) ∀x, y, z ∈ X, (y R⁻¹ x ∧ z R⁻¹ y) ⇒ z R⁻¹ x.
Condition (i′) for R⁻¹ is the same as condition (i) for R⁻¹. The equivalence of conditions (ii′) and (ii)
for R⁻¹ follows by swapping the dummy variables x and y. The equivalence of conditions (iii′) and (iii)
for R⁻¹ follows by swapping the dummy variables x and z and observing that the logical AND-operator is
symmetric. Hence R⁻¹ is a partial order on X. The converse follows similarly.
11.1.4 Remark: Equivalent expressions for the conditions for partial orders.
If idX denotes the identity map on a set X, and R is a relation on X, then condition (i) in Definition 11.1.2
is equivalent to idX ⊆ R, condition (ii) is equivalent to R ∩ R⁻¹ ⊆ idX, and condition (iii) is equivalent
to R ∘ R ⊆ R. This is formalised as Theorem 11.1.5.
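The three relation-algebra conditions can be tested directly for a concrete partial order. This Python sketch uses the divisibility order on {1, ..., 6} as an illustrative example:

```python
# Sketch: check id_X subset R, R n R^{-1} subset id_X, R . R subset R
# for the divisibility order on {1, ..., 6}.
X = set(range(1, 7))
R = {(x, y) for x in X for y in X if y % x == 0}      # x R y iff x divides y

id_X = {(x, x) for x in X}
R_inv = {(y, x) for (x, y) in R}
R_comp_R = {(x, y) for (x, z1) in R for (z2, y) in R if z1 == z2}

assert id_X <= R                    # reflexivity:   id_X is a subset of R
assert (R & R_inv) <= id_X          # antisymmetry:  R n R^{-1} within id_X
assert R_comp_R <= R                # transitivity:  R . R within R
```

Representing the order as a set of pairs makes the three conditions literal subset tests, exactly as in Theorem 11.1.5.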
11.1.5 Theorem: Let R be a relation on a set X. Then R is a partial order on X if and only if the
following conditions all hold.
(i) idX ⊆ R. [weak reflexivity]
(ii) R ∩ R⁻¹ ⊆ idX. [antisymmetry]
(iii) R ∘ R ⊆ R. [transitivity]
Proof: For part (i), note that the proposition x R x in Definition 11.1.2 (i) means (x, x) ∈ R. But
idX = {(x, x); x ∈ X}. So the proposition ∀x ∈ X, (x, x) ∈ R is equivalent to ∀z ∈ idX, z ∈ R, which is
equivalent to idX ⊆ R.
For part (ii), note that the proposition x R y ∧ y R x in Definition 11.1.2 (ii) means (x, y) ∈ R ∧ (y, x) ∈ R,
which is equivalent to (x, y) ∈ R ∧ (x, y) ∈ R⁻¹, which is equivalent to (x, y) ∈ R ∩ R⁻¹. But for x, y ∈ X,
the proposition x = y is equivalent to (x, y) ∈ idX. So Definition 11.1.2 (ii) is equivalent to R ∩ R⁻¹ ⊆ idX.
For part (iii), note that R ∘ R = {(x, y); ∃z, ((x, z) ∈ R ∧ (z, y) ∈ R)} by Definition 9.6.2. So R ∘ R ⊆ R
if and only if ∀x, y, ((∃z, ((x, z) ∈ R ∧ (z, y) ∈ R)) ⇒ ((x, y) ∈ R)). By Theorem 6.6.12 (iii), this is
equivalent to ∀x, y, ∀z, (((x, z) ∈ R ∧ (z, y) ∈ R) ⇒ ((x, y) ∈ R)). Since R ⊆ X × X, this is equivalent to
∀x, y, z ∈ X, (((x, z) ∈ R ∧ (z, y) ∈ R) ⇒ ((x, y) ∈ R)), which is equivalent to Definition 11.1.2 (iii).
11.1.7 Remark: Redundancy of the base sets of the specification tuple for a partial order.
As often happens, the base set X in the specification pair (X, R) for a partial order in Definition 11.1.2 is
redundant. The set X may be determined from the set R as X = {x; (x, x) ∈ R}.
11.1.9 Definition: The dual order of an order R for a set X is the inverse R⁻¹ of the relation R. In other
words, the dual of an order R is the relation R⁻¹ = {(x, y); (y, x) ∈ R}.
11.1.10 Notation: The symbol ≤ may be used for a partial order R. Then x ≤ y means x R y and the
following notations apply.
The symbol ≥ means the dual of the order ≤. Thus ≥ equals R⁻¹. Thus x ≥ y means y ≤ x.
Figure 11.1.2 Inclusion relation for sets of two, three or four elements
The symbol < denotes the relation which satisfies x < y ⇔ (x ≤ y ∧ x ≠ y).
The symbol > denotes the relation which satisfies x > y ⇔ (x ≥ y ∧ x ≠ y).
The symbols ≰, ≱, ≮ and ≯ denote the logical negations of ≤, ≥, < and > respectively.
Thus x ≰ y means ¬(x ≤ y), x ≱ y means ¬(x ≥ y), x ≮ y means ¬(x < y), and x ≯ y means ¬(x > y).
11.1.11 Remark: Relations which contain the same information as a partial order.
A partial order ≤ contains the same information as the corresponding relations <, ≥ and >.
Therefore the specification of a partial order may be freely chosen from these four representations. To avoid
defining everything four times, it is usual to standardise the ≤ representation as meaning "less than or equal
to". Then < may be defined by (x < y) ⇔ (x ≤ y ∧ x ≠ y), with similar expressions for ≥ and >.
Proof: Let (X, ≤) be a partially ordered set. For part (i), let x, y ∈ X with x ≤ y. Suppose that x > y.
Then y ≤ x and x ≠ y by Notation 11.1.10. So x = y by Definition 11.1.2 (ii). This contradicts x ≠ y.
Therefore ¬(x > y). The proof of part (ii) is essentially identical.
11.1.14 Remark: Closure of partial orders under restriction of the domain and range.
Theorem 11.1.15 uses Definition 9.6.19 for the restriction of a relation.
11.1.17 Definition: An order homomorphism between two ordered sets X and Y is a map f : X → Y
such that ∀x1, x2 ∈ X, (x1 ≤ x2 ⇒ f(x1) ≤ f(x2)).
An order isomorphism between two ordered sets X and Y is a bijection f : X → Y such that ∀x1, x2 ∈
X, (x1 ≤ x2 ⇔ f(x1) ≤ f(x2)).
A minimal element of A in X is an element b ∈ A such that ∀a ∈ A, ¬(a < b).
A maximal element of A in X is an element b ∈ A such that ∀a ∈ A, ¬(b < a).
Figure 11.2.1 Example of minimal and maximal elements of a partially ordered set, drawn with transitivity
arrows (left) and without transitivity arrows (right); the minimal elements are f, g, h and i in both cases.
In the graph on the left of Figure 11.2.1, all pairs (x, y) with x < y are shown. Strictly speaking, an order
relation should also show self-arrows, which are arrows pointing from each node to itself. For clarity,
these are usually omitted. The arrow diagram for a partial order omitting self-arrows is always a directed
acyclic graph. (This follows from the transitivity and antisymmetry properties.) Omitting the self-arrows is,
of course, equivalent to graphing the strict relation < instead of ≤.
Arrow diagrams for partial orders are usually further simplified by omitting arrows for pairs (x, z) for
which there is an element y with x < y and y < z. Thus an arrow is shown from x to y if and only if
(x < y) ∧ ∀z, (¬(x < z) ∨ ¬(z < y)). In other words, the transitivity is usually implicit. The full order relation
can always be reconstructed by adding transitivity and self-arrows. The abbreviated graph may be thought
of as having all of the short-cuts removed. That is, only the longest (directed) paths are shown because
the short-cuts are implicit.
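The abbreviated diagram described above is the transitive reduction of the strict order, and the reconstruction step is its transitive closure. A small Python sketch (the poset of divisors of 12 under divisibility is an assumed example, not one from the text):

```python
# Keep an arrow x -> y exactly when x < y and no z satisfies x < z < y;
# adding transitivity back recovers the full strict order.
X = {1, 2, 3, 4, 6, 12}
lt = {(x, y) for x in X for y in X if x != y and y % x == 0}

arrows = {(x, y) for (x, y) in lt
          if not any((x, z) in lt and (z, y) in lt for z in X)}

def transitive_closure(r):
    r = set(r)
    while True:
        extra = {(x, w) for (x, y) in r for (z, w) in r if y == z} - r
        if not extra:
            return r
        r |= extra

assert (1, 12) not in arrows        # short-cut removed: 1 < 2 < 4 < 12
assert transitive_closure(arrows) == lt
```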
It is important to note that minimal and maximal elements of a set are non-unique in general. Thus neither
existence nor uniqueness is guaranteed for minimal and maximal elements. This contrasts with the infimum,
supremum, minimum and maximum of a set in Definition 11.2.4, which are all unique if they exist.
11.2.4 Definition: Let (X, ≤) be a partially ordered set. Let A be a subset of X.
A lower bound for A in X is an element b ∈ X such that ∀a ∈ A, b ≤ a.
An upper bound for A in X is an element b ∈ X such that ∀a ∈ A, a ≤ b.
An infimum or greatest lower bound of A in X is a lower bound b for A in X such that x ≤ b for all lower
bounds x for A in X. In other words, (∀a ∈ A, b ≤ a) and ∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b).
A supremum or least upper bound of A in X is an upper bound b for A in X such that b ≤ x for all upper
bounds x for A in X. In other words, (∀a ∈ A, a ≤ b) and ∀x ∈ X, ((∀a ∈ A, a ≤ x) ⇒ b ≤ x).
A minimum of A in X is an element b ∈ A such that ∀a ∈ A, b ≤ a.
A maximum of A in X is an element b ∈ A such that ∀a ∈ A, a ≤ b.
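Definition 11.2.4 can be transcribed directly for a finite poset given by a set and a comparison predicate. In this Python sketch (added for illustration; the divisors of 24 under divisibility are an assumed example), the infimum of {4, 6} is their greatest common divisor:

```python
# Direct transcription of Definition 11.2.4 for a finite poset (X, leq).
X = {1, 2, 3, 4, 6, 8, 12, 24}
leq = lambda x, y: y % x == 0        # divisibility as the partial order

def lower_bounds(A): return {b for b in X if all(leq(b, a) for a in A)}
def upper_bounds(A): return {b for b in X if all(leq(a, b) for a in A)}

def infimum(A):
    # greatest lower bound: a lower bound which is above every lower bound
    lb = lower_bounds(A)
    cands = [b for b in lb if all(leq(x, b) for x in lb)]
    return cands[0] if cands else None

A = {4, 6}
assert lower_bounds(A) == {1, 2}
assert infimum(A) == 2               # gcd(4, 6)
```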
11.2.5 Definition: A bounded subset of a partially ordered set (X, ≤) is a set A ∈ 𝒫(X) such that X
contains both a lower bound for A and an upper bound for A.
An unbounded subset of a partially ordered set (X, ≤) is a set A ∈ 𝒫(X) which is not a bounded subset of X.
A bounded-above subset of a partially ordered set (X, ≤) is a set A ∈ 𝒫(X) such that X contains an upper
bound for A.
A bounded-below subset of a partially ordered set (X, ≤) is a set A ∈ 𝒫(X) such that X contains a lower
bound for A.
11.2.6 Remark: Relations between classes of extrema and bounds of partially ordered sets.
The relations between the classes of bounds in Definitions 11.2.2 and 11.2.4 are summarised in Figure 11.2.2
and asserted in Theorem 11.2.7. A minimum is a particular kind of infimum, and an infimum is a particular
kind of lower bound. A minimum is also a particular kind of minimal element. (For brevity, the logical
conditions illustrated in Figure 11.2.2 for the infimum and supremum are the shorter forms which are
demonstrated in Theorem 11.2.31.)
[Figure 11.2.2 Relations between classes of bounds for a subset A of an ordered set X. Infimum: b ∈ X with ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a). Supremum: b ∈ X with ∀x ∈ X, (b ≤ x ⇔ ∀a ∈ A, a ≤ x). Minimum: b ∈ A with ∀a ∈ A, b ≤ a. Maximum: b ∈ A with ∀a ∈ A, a ≤ b.]
Proof: Let (X, ≤) be a partially ordered set. Let A ⊆ X. For part (i), let x ∈ X be a minimum of A.
Then x ∈ A and ∀a ∈ A, x ≤ a. So ∀a ∈ A, ¬(a < x) by Theorem 11.1.12 (i). Hence x is a minimal element
of A by Definition 11.2.2. The proof of part (ii) is essentially identical to the proof of part (i).
For part (iii), let x1 be a minimum of A in X. Then x1 is a lower bound for A in X by Definition 11.2.4.
Suppose that x2 is also a lower bound for A in X. Then ∀a ∈ A, x2 ≤ a. So x2 ≤ x1. Hence x1 is an
infimum of A in X. Part (iv) follows as for part (iii).
For part (v), let x ∈ A be an infimum of A in X. Then x is a lower bound for A by Definition 11.2.4. So x
is a minimum of A in X. The converse follows from part (iii). Part (vi) follows as for part (v).
Proof: Let (X, ≤) be a partially ordered set. Let A ⊆ X. To show part (i), suppose that x1, x2 ∈ X are
both an infimum of A in X. Then x1, x2 are lower bounds for A in X. But an infimum of a set is greater than
or equal to all lower bounds for that set. So x1 ≤ x2 and x2 ≤ x1. Hence x1 = x2 by Definition 11.1.2 (ii).
The uniqueness of the supremum of A follows similarly.
For part (ii), suppose that x1, x2 ∈ X are both minimums of A in X. Then by Theorem 11.2.7 (iii), x1 and
x2 are both infimums of A in X. Hence x1 = x2 by part (i). The uniqueness of the maximum of A follows
similarly.
11.2.10 Remark: Conditions for existence of the infimum and supremum of a set.
For any subset A of a partially ordered set (X, ≤), the infimum of A is defined as the maximum of the set
of lower bounds of A. (This is fairly obvious from the alternative name "greatest lower bound", if for no
other reason.) Therefore the infimum of A exists if and only if the maximum of the set of lower bounds of A
exists, and they are equal if they do exist. Similarly, the supremum of A exists if and only if the minimum
of the set of upper bounds of A exists, and they are equal if they do exist. These observations are formalised
in Theorem 11.2.11.
Proof: Let (X, ≤) be a partially ordered set. Let A ∈ 𝒫(X). Let A⁻ = {x ∈ X; ∀a ∈ A, x ≤ a}. Then
b ∈ X is an infimum of A if and only if ∀a ∈ A, b ≤ a and ∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b) by
Definition 11.2.4. This holds if and only if b ∈ A⁻ and x ≤ b for all x ∈ A⁻, which holds if and only if b is a
maximum of A⁻. This shows part (i), and part (ii) follows similarly.
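The identity inf(A) = max(A⁻) of Theorem 11.2.11 can be spot-checked numerically. The following Python sketch (an added illustration; divisors of 36 under divisibility are an assumed example, in which the infimum is known independently to be the gcd) compares max over the lower-bound set with the gcd:

```python
# Spot-check of inf(A) = max(A-), where A- is the set of lower bounds of A.
from math import gcd
from functools import reduce

X = {1, 2, 3, 4, 6, 9, 12, 18, 36}
leq = lambda x, y: y % x == 0        # divisibility order

def maximum(S):
    # the element of S above all of S, if it exists
    cands = [b for b in S if all(leq(s, b) for s in S)]
    return cands[0] if cands else None

for A in [{12, 18}, {4, 9}, {6, 9, 12}]:
    A_minus = {b for b in X if all(leq(b, a) for a in A)}   # lower bounds
    assert maximum(A_minus) == reduce(gcd, A)
```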
11.2.15 Theorem: Let (X, ≤) be a partially ordered set. Let A ⊆ X.
(i) If A ≠ ∅ and the infimum and supremum of A both exist, then inf(A) ≤ sup(A).
(ii) If the minimum of A exists, then min(A) = inf(A).
(iii) If the maximum of A exists, then sup(A) = max(A).
(iv) If the minimum and maximum of A both exist, then min(A) = inf(A) ≤ sup(A) = max(A).
Proof: For part (i), ∀a ∈ A, (inf(A) ≤ a and a ≤ sup(A)), because by Definition 11.2.4, inf(A) is a lower
bound for A and sup(A) is an upper bound for A. But A ≠ ∅. So inf(A) ≤ sup(A).
Part (ii) follows from Theorems 11.2.7 (iii) and 11.2.9 (i, ii).
Part (iii) follows from Theorems 11.2.7 (iv) and 11.2.9 (i, ii).
Part (iv) follows from parts (i), (ii) and (iii). (The existence of the minimum and maximum implies that the
set is non-empty by Definition 11.2.4.)
11.2.16 Remark: Existence of the infimum, supremum, minimum and maximum of a set.
There is an implicit caveat when using any of the notations in Notation 11.2.14. Thus for example, a
statement such as "b = inf(A)" means (1) "the infimum of A exists and equals b". Alternatively, it could
mean (2) "b is equal to the infimum of A if it exists", which is a different meaning. Interpretation (2) does
not assert the existence of the infimum, and so it does not give a value to b if the infimum does not exist.
Therefore interpretation (1) is to be preferred.
The need to repetitively provide the caveat "if it exists" is generally avoided in the case of topological limits
by restricting attention to continuous functions, for example, so that the limits are guaranteed to exist. In
some naive introductions to analysis, one sometimes sees language such as "inf(A) = undefined", using
the pseudo-value "undefined". In fact, one needs three pseudo-values. One needs a plus infinity which is
added to the ordered set X as an upper bound for all elements of X, a negative infinity which is a lower
bound for all of X, and an "undefined" value which means that there is neither an infimum nor extremum
amongst the elements of X, nor amongst the two pseudo-infinities.
The issue of infinities and undefined values in the extended-real numbers is not solved by introducing even
three of the pseudo-values for limits. And then one has the further issue of how to combine these pseudo-
values. The arithmetic of pseudo-values is defective. For example, adding, subtracting, multiplying or dividing these three pseudo-values with each other gives rise to yet further pseudo-values. In the case of the bounds
in Notation 11.2.14, one often wishes to evaluate combinations of them. For example, Notation 11.3.11 uses
the returned values of one kind of bound in the evaluation of another bound.
The same kind of issue arises when dealing with functions in general. One often sees the caveat "if it is
defined" in the case of partially defined functions such as the tangent in Definition 43.14.13 and the majority
of complex-analytic functions. One wishes to write f(x) for the value of f at x even when this is undefined
for some x. The expression f(x) is well defined only if it has one and only one value. It is possible to
wave one's hands when dealing with functions which occasionally have more or less than one value, but
then one is skating on thin ice. In some contexts, the functions may be defined "almost everywhere", and
yet one still wishes to use the notation f(x), even when f may be undefined on a dense subset of its
domain. Such hand-waving is acceptable only if one knows how to provide a rigorous treatment. In the
case of divergent series, the divergence of a series to plus or minus infinity can be remedied by adding the
pseudo-values −∞ and +∞ to the real numbers, or by adding a single point at infinity (or a point at
infinity for each direction) to the complex numbers. But this does not fix the problem of bounded series
which do not converge, for example.
In the case of the bounds in Notation 11.2.14, there are at least two remedies to the undefined value
problem (in addition to the remedy where one employs pseudo-values). One approach would be to introduce
the non-standard notation "inf set(A)", for example, to mean {x ∈ X; x is an infimum for A}. Such a notation
has the disadvantage of mixing English with the more traditional Latin notations, which are arguably more
international. Another alternative would be a notation such as Inf(A), capitalising the initial letter to
indicate that the set of possible infimum values is intended, not necessarily a well-defined single value. Yet
another possible notation would be an over-bar, such as i̅n̅f̅(A), for example, to indicate the set of infimum
values. Probably such an over-bar is the least bad notation for such sets.
A second approach is to regard the bounds in Notation 11.2.14 as functions from 𝒫(X) to X, which is in
fact what they are! In most texts, these bounds are regarded almost as unary operators acting on sets (or
on functions), whereas they actually depend on the choice of order on the containing set (or on the function
range). When these bounds are recognised as functions, it becomes possible to deal with them in the same
way as general functions, namely by referring to the domain of the function in any caveat. In other words,
one may define the domain of these bounds to be the set of subsets of X for which the bound exists. Then
A ∈ Dom(inf) means that inf(A) exists. Since the bound depends on the order, one may add a tag for the
ordered set X, as for example inf_X(A) or inf^X(A), although the subscript is already used for other purposes
in the standard notation.
From these considerations, one arrives at the non-standard Definitions 11.2.17 and 11.2.19, and Notations
11.2.18 and 11.2.20.
11.2.17 Definition:
The infimum set-map for a partially ordered set (X, ≤) is the map i̅n̅f̅_X : 𝒫(X) → 𝒫(X) defined by
i̅n̅f̅_X(A) = {b ∈ X; b is an infimum for A} for all A ∈ 𝒫(X).
The supremum set-map for a partially ordered set (X, ≤) is the map s̅u̅p̅_X : 𝒫(X) → 𝒫(X) defined by
s̅u̅p̅_X(A) = {b ∈ X; b is a supremum for A} for all A ∈ 𝒫(X).
The minimum set-map for a partially ordered set (X, ≤) is the map m̅i̅n̅_X : 𝒫(X) → 𝒫(X) defined by
m̅i̅n̅_X(A) = {b ∈ X; b is a minimum for A} for all A ∈ 𝒫(X).
The maximum set-map for a partially ordered set (X, ≤) is the map m̅a̅x̅_X : 𝒫(X) → 𝒫(X) defined by
m̅a̅x̅_X(A) = {b ∈ X; b is a maximum for A} for all A ∈ 𝒫(X).
11.2.18 Notation: The notations i̅n̅f̅, s̅u̅p̅, m̅i̅n̅ and m̅a̅x̅ are abbreviations for i̅n̅f̅_X, s̅u̅p̅_X, m̅i̅n̅_X and m̅a̅x̅_X
when the partially ordered set (X, ≤) is implicit in the context.
11.2.19 Definition:
The infimum map for a partially ordered set (X, ≤) is the map inf_X : {A ∈ 𝒫(X); i̅n̅f̅_X(A) ≠ ∅} → X defined
by inf_X(A) = inf(A) for all A ∈ 𝒫(X) such that i̅n̅f̅_X(A) ≠ ∅.
The supremum map for a partially ordered set (X, ≤) is the map sup_X : {A ∈ 𝒫(X); s̅u̅p̅_X(A) ≠ ∅} → X defined
by sup_X(A) = sup(A) for all A ∈ 𝒫(X) such that s̅u̅p̅_X(A) ≠ ∅.
The minimum map for a partially ordered set (X, ≤) is the map min_X : {A ∈ 𝒫(X); m̅i̅n̅_X(A) ≠ ∅} → X defined
by min_X(A) = min(A) for all A ∈ 𝒫(X) such that m̅i̅n̅_X(A) ≠ ∅.
The maximum map for a partially ordered set (X, ≤) is the map max_X : {A ∈ 𝒫(X); m̅a̅x̅_X(A) ≠ ∅} → X
defined by max_X(A) = max(A) for all A ∈ 𝒫(X) such that m̅a̅x̅_X(A) ≠ ∅.
11.2.21 Remark: Alternative expressions for some properties of extrema and bounds.
Theorem 11.2.7 (iii, iv) and Theorem 11.2.9 (i, ii) may be rewritten as follows for any partially ordered set
(X, ≤), using the numerosity notation # for finite sets in Notation 13.3.6.
(i) Dom(min_X) ⊆ Dom(inf_X).
(ii) Dom(max_X) ⊆ Dom(sup_X).
(iii) ∀A ∈ 𝒫(X), #(m̅i̅n̅_X(A)) ≤ #(i̅n̅f̅_X(A)) ≤ 1.
(iv) ∀A ∈ 𝒫(X), #(m̅a̅x̅_X(A)) ≤ #(s̅u̅p̅_X(A)) ≤ 1.
Parts (i) and (ii) follow from Theorem 11.2.7 (iii, iv).
Parts (iii) and (iv) follow from Theorem 11.2.7 (iii, iv) and Theorem 11.2.9 (i, ii).
The value of each map is the unique value of b ∈ X which satisfies the corresponding condition in the
expression defining the map's domain.
11.2.23 Remark: Expressions for bound-sets in terms of extremum and bound operators.
Theorem 11.2.24 is a rewrite of Theorem 11.2.11 using Notation 11.2.14.
Proof: Parts (i) and (ii) follow immediately from Theorem 11.2.11 and Notation 11.2.14.
[Figure: A set A between its lower-bound set A⁻ and upper-bound set A⁺, with c = min(A⁺) = sup(A) and b = max(A⁻) = inf(A).]
From Theorems 11.2.24 (i) and 11.2.7 (v), it follows that A ∈ 𝒫(X) has a minimum in an ordered set (X, ≤)
if and only if max(A⁻) exists and max(A⁻) ∈ A. Then inf(A) = min(A) = max(A⁻). Similarly A ∈ 𝒫(X)
has a maximum if and only if min(A⁺) exists and min(A⁺) ∈ A. Then sup(A) = max(A) = min(A⁺).
11.2.26 Remark: Alternative expressions for relations between bound-sets and extrema.
Applying the comments in Remark 11.2.22 to Theorem 11.2.24, one obtains the equations inf(A) = max(A⁻)
for all A ∈ Dom(inf), and sup(A) = min(A⁺) for all A ∈ Dom(sup). By referring to the domains of the
bounds, one avoids using the word "exists" and the term "well defined". Theorem 11.2.24 also says that
A⁻ ∈ Dom(max) if and only if A ∈ Dom(inf), and A⁺ ∈ Dom(min) if and only if A ∈ Dom(sup).
The set-maps in Definition 11.2.17 and Notation 11.2.18 may be extended to define the lower-bound set-map
lb : 𝒫(X) → 𝒫(X) and the upper-bound set-map ub : 𝒫(X) → 𝒫(X) by
∀A ∈ 𝒫(X), lb(A) = {b ∈ X; ∀a ∈ A, b ≤ a}
∀A ∈ 𝒫(X), ub(A) = {b ∈ X; ∀a ∈ A, a ≤ b}.
Then A⁻ = lb(A) and A⁺ = ub(A) for all A ∈ 𝒫(X). Hence Theorem 11.2.24 may be expressed in symbolic
logic as follows for any partially ordered set (X, ≤).
(i) ∀A ∈ 𝒫(X), (A ∈ Dom(inf) ⇔ lb(A) ∈ Dom(max)).
(ii) ∀A ∈ Dom(inf), inf(A) = max(lb(A)).
(iii) ∀A ∈ 𝒫(X), (A ∈ Dom(sup) ⇔ ub(A) ∈ Dom(min)).
(iv) ∀A ∈ Dom(sup), sup(A) = min(ub(A)).
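Statements (i)-(iv) lend themselves to an exhaustive check on a small example. In this Python sketch (added here; divisors of 12 under divisibility, where infima and suprema are known independently as gcd and lcm, are an assumed example), every non-empty subset of size at most three is tested:

```python
import itertools
from functools import reduce
from math import gcd

X = {1, 2, 3, 4, 6, 12}
leq = lambda x, y: y % x == 0

lb = lambda A: {b for b in X if all(leq(b, a) for a in A)}
ub = lambda A: {b for b in X if all(leq(a, b) for a in A)}

def greatest(S):   # max(S) in the divisibility order, or None
    cands = [b for b in S if all(leq(s, b) for s in S)]
    return cands[0] if cands else None

def least(S):      # min(S) in the divisibility order, or None
    cands = [b for b in S if all(leq(b, s) for s in S)]
    return cands[0] if cands else None

lcm = lambda a, b: a * b // gcd(a, b)
for r in (1, 2, 3):
    for A in map(set, itertools.combinations(sorted(X), r)):
        assert greatest(lb(A)) == reduce(gcd, A)   # inf(A) = max(lb(A))
        assert least(ub(A)) == reduce(lcm, A)      # sup(A) = min(ub(A))
```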
11.2.27 Remark: Alternative expressions for some properties of extrema and bounds.
Most of Theorem 11.2.13 may be rewritten as in Theorem 11.2.28.
Proof: Parts (i), (ii), (iii) and (iv) follow immediately from Theorem 11.2.13 (ii, iii, iv, v).
the logical negative of this proposition is true. This gives the following expression for b = sup(A):
∀x ∈ X, (b ≤ x ⇔ ∀a ∈ A, a ≤ x).
These comments are formalised in Theorem 11.2.31.
11.2.31 Theorem: Let (X, ≤) be a partially ordered set. Let A ⊆ X.
(i) b = inf(A) if and only if ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a).
(ii) b = sup(A) if and only if ∀x ∈ X, (b ≤ x ⇔ ∀a ∈ A, a ≤ x).
(iii) inf(A) exists if and only if ∃b ∈ X, ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a).
(iv) sup(A) exists if and only if ∃b ∈ X, ∀x ∈ X, (b ≤ x ⇔ ∀a ∈ A, a ≤ x).
Proof: Let X be a partially ordered set. Let A ⊆ X. Then by Definition 11.2.4, b = inf(A) if and
only if (1) ∀a ∈ A, b ≤ a and (2) ∀x ∈ X, ((∀a ∈ A, x ≤ a) ⇒ x ≤ b). Proposition (2) is equivalent to
∀x ∈ X, (¬(x ≤ b) ⇒ ¬(∀a ∈ A, x ≤ a)). In other words, if x > b, or x and b are not related by the partial
order, then the proposition ∀a ∈ A, x ≤ a is false.
If x = b, then proposition (1) implies ∀a ∈ A, x ≤ a. If x < b, then ∀a ∈ A, x ≤ a by transitivity, using
proposition (1). So this covers all of the four possible relations (or non-relations) between x and b. The case
x > b and the unrelated case imply ¬(∀a ∈ A, x ≤ a). The other two cases imply ∀a ∈ A, x ≤ a. Hence
b = inf(A) implies ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a), as claimed.
To show the converse, let b ∈ X satisfy (3) ∀x ∈ X, (x ≤ b ⇔ ∀a ∈ A, x ≤ a). Substituting b for x
gives ∀a ∈ A, b ≤ a. So b is a lower bound for A. Let x be any lower bound for A. Then proposition (3)
implies x ≤ b. Therefore b is an infimum for A in X by Definition 11.2.4. This verifies part (i).
The proof of part (ii) follows similarly.
Parts (iii) and (iv) are immediate corollaries of parts (i) and (ii) respectively.
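Theorem 11.2.31 (i) can also be verified exhaustively on a small finite poset. The following Python sketch (an added illustration; divisors of 30 under divisibility are an assumed example) compares the characterisation with Definition 11.2.4 for every small subset and every candidate b:

```python
import itertools

X = {1, 2, 3, 5, 6, 10, 15, 30}
leq = lambda x, y: y % x == 0

def is_inf(b, A):
    # Definition 11.2.4: b is a lower bound which is above all lower bounds.
    lb = {x for x in X if all(leq(x, a) for a in A)}
    return b in lb and all(leq(x, b) for x in lb)

def char(b, A):
    # Theorem 11.2.31 (i): for all x, (x <= b  <=>  x is a lower bound of A).
    return all(leq(x, b) == all(leq(x, a) for a in A) for x in X)

for r in (1, 2, 3):
    for A in map(set, itertools.combinations(sorted(X), r)):
        for b in X:
            assert is_inf(b, A) == char(b, A)
```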
11.2.36 Theorem: Let (X, ≤) be a partially ordered set. Let A ⊆ X.
(i) If inf(A) exists and b is a lower bound for A, then inf(A) ≥ b.
(ii) If A ≠ ∅ and inf(A) exists, and c is an upper bound for A, then inf(A) ≤ c.
(iii) If A ≠ ∅ and sup(A) exists, and b is a lower bound for A, then sup(A) ≥ b.
(iv) If sup(A) exists and c is an upper bound for A, then sup(A) ≤ c.
Proof: Let (X, ≤) be a partially ordered set. Let A ∈ 𝒫(X) \ {∅}. For part (i), suppose that inf(A) exists
and that b ∈ X satisfies ∀a ∈ A, b ≤ a. Then A⁻ = {x ∈ X; ∀a ∈ A, x ≤ a} is a non-empty subset of X
and inf(A) = max(A⁻) by Theorem 11.2.24. Since b ∈ A⁻, it follows that b ≤ max(A⁻) = inf(A).
For part (ii), suppose that inf(A) exists and that c ∈ X satisfies ∀a ∈ A, a ≤ c. Then since A is non-empty,
there is an a ∈ A with a ≤ c. So by transitivity, inf(A) ≤ a ≤ c.
Part (iii) follows as for part (ii). Part (iv) follows as for part (i).
11.2.37 Remark: Some alternative expressions for properties of the infimum and supremum.
One may express Theorem 11.2.36 in terms of Notation 11.2.20 and Remark 11.2.26 as follows for any
partially ordered set (X, ≤).
(i) ∀A ∈ Dom(inf), ∀b ∈ lb(A), inf(A) ≥ b.
(ii) ∀A ∈ Dom(inf), ∀c ∈ ub(A), inf(A) ≤ c.
(iii) ∀A ∈ Dom(sup), ∀b ∈ lb(A), sup(A) ≥ b.
(iv) ∀A ∈ Dom(sup), ∀c ∈ ub(A), sup(A) ≤ c.
11.2.38 Theorem: Let (X, ≤) be a partially ordered set.
(i) ∀A, B ∈ Dom(inf), (A ⊆ B ⇒ inf(A) ≥ inf(B)).
(ii) ∀A, B ∈ Dom(sup), (A ⊆ B ⇒ sup(A) ≤ sup(B)).
(iii) ∀A, B ∈ Dom(min), (A ⊆ B ⇒ min(A) ≥ min(B)).
(iv) ∀A, B ∈ Dom(max), (A ⊆ B ⇒ max(A) ≤ max(B)).
Proof: Let (X, ≤) be a partially ordered set. For part (i), let A, B ∈ Dom(inf). Suppose that A ⊆ B.
Let b = inf(B). Then ∀x ∈ B, b ≤ x. So ∀x ∈ A, b ≤ x. So b ∈ lb(A). Let a = inf(A). Then a = max(lb(A)).
So b ≤ a. Thus inf(A) ≥ inf(B).
The proof of part (ii) parallels the proof of part (i).
For part (iii), let A, B ∈ Dom(min). Suppose that A ⊆ B. Let b = min(B). Then ∀x ∈ B, b ≤ x. So
∀x ∈ A, b ≤ x. Let a = min(A). Then a ∈ A. Therefore b ≤ a. Thus min(A) ≥ min(B). (Alternatively use
part (i) together with Theorem 11.2.7 (iii).)
The proof of part (iv) parallels the proof of part (iii).
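Part (i) says that enlarging a set can only lower (or preserve) its infimum, and part (iv) that it can only raise the maximum. On a finite chain every non-empty subset has a minimum and maximum, so the hypotheses hold automatically and the theorem can be checked exhaustively in Python (an added illustration using small integer sets as an assumed example):

```python
import itertools

# On a finite chain, inf = min and sup = max for non-empty subsets,
# so Theorem 11.2.38 reduces to: A subset of B implies
# min(A) >= min(B) and max(A) <= max(B).
X = range(1, 8)
for B in map(set, itertools.combinations(X, 4)):
    for r in (1, 2, 3):
        for A in map(set, itertools.combinations(sorted(B), r)):
            assert min(A) >= min(B)
            assert max(A) <= max(B)
```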
11.2.39 Remark: Order relations between points and sets induced by order relations on point-pairs.
For any partially ordered set (X, ≤), corresponding relations R1 on X × 𝒫(X) and R2 on 𝒫(X) × X may
be defined by
∀x ∈ X, ∀A ∈ 𝒫(X), (x R1 A ⇔ ∀a ∈ A, x ≤ a)
∀A ∈ 𝒫(X), ∀x ∈ X, (A R2 x ⇔ ∀a ∈ A, a ≤ x).
The same symbol ≤ may be used for R1 and R2 when there is no danger of confusion. Then one may
write, for example, x ≤ A to mean that x is a lower bound for A. Then Theorem 11.2.31 (i) may be
written as: b = inf(A) if and only if ∀x ∈ X, (x ≤ b ⇔ x ≤ A). Much of Section 11.2 would look much
simpler if lower and upper bound conditions were written in this extended notation.
To take this style of notation further, one may also define a relation R0 on 𝒫(X) × 𝒫(X) by
∀A, B ∈ 𝒫(X), (A R0 B ⇔ ∀a ∈ A, ∀b ∈ B, a ≤ b).
Then R0 may also use the same symbol ≤. This notation is not so useful for Section 11.2, although it does
at least have some of the properties of a partial order on 𝒫(X). Unfortunately, the empty set is problematic
because one would have ∅ ≤ A ≤ ∅ for all A ∈ 𝒫(X), which implies that X = ∅. One may restrict the relation
R0 to (𝒫(X) \ {∅}) × (𝒫(X) \ {∅}), but the restricted relation is still not a valid partial order on 𝒫(X) \ {∅}
because the weak reflexivity condition A ≤ A is not valid in general. (In fact, A ≤ A if and only if A is a
singleton {x} for some x ∈ X.) However, the transitivity and antisymmetry conditions do seem to be valid.
11.3.2 Definition: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y.
A lower bound for f is a lower bound for f(X) in Y.
An upper bound for f is an upper bound for f(X) in Y.
An infimum or greatest lower bound for f is an infimum for f(X) in Y.
A supremum or least upper bound for f is a supremum for f(X) in Y.
A minimum for f is a minimum for f(X) in Y.
A maximum for f is a maximum for f(X) in Y.
11.3.3 Notation: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y.
inf(f) denotes the infimum of f, if it exists. Alternatives: inf f, inf_{x∈X} f(x).
sup(f) denotes the supremum of f, if it exists. Alternatives: sup f, sup_{x∈X} f(x).
min(f) denotes the minimum of f, if it exists. Alternatives: min f, min_{x∈X} f(x).
max(f) denotes the maximum of f, if it exists. Alternatives: max f, max_{x∈X} f(x).
11.3.4 Notation: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y and A ⊆ X.
inf_{x∈A} f(x) denotes the infimum of f|_A, if it exists. Alternatives: inf_A(f), inf_A f.
sup_{x∈A} f(x) denotes the supremum of f|_A, if it exists. Alternatives: sup_A(f), sup_A f.
min_{x∈A} f(x) denotes the minimum of f|_A, if it exists. Alternatives: min_A(f), min_A f.
max_{x∈A} f(x) denotes the maximum of f|_A, if it exists. Alternatives: max_A(f), max_A f.
11.3.5 Theorem: Let f : X → Y for a set X and a partially ordered set (Y, ≤). Let A be a subset of X.
(i) If A ≠ ∅ and the infimum and supremum of f|_A both exist, then inf_A(f) ≤ sup_A(f).
(ii) If the minimum of f|_A exists, then min_A(f) = inf_A(f).
(iii) If the maximum of f|_A exists, then sup_A(f) = max_A(f).
(iv) If the minimum and maximum of f|_A both exist, then min_A(f) = inf_A(f) ≤ sup_A(f) = max_A(f).
Proof: For part (i), f(A) ≠ ∅ because A ≠ ∅. So inf_A(f) ≤ sup_A(f) by Theorem 11.2.15 (i). Similarly,
parts (ii), (iii) and (iv) follow from Theorem 11.2.15 (ii, iii, iv).
11.3.6 Remark: Extremal notions for functions are defined on their range sets.
The various kinds of bounds in Definition 11.3.2 are the same as the corresponding bounds in Definition 11.2.4
applied to the image of each function, not the graph. For most purposes, it is not necessary to know that a
set of ordered pairs is a function. Most of the time, one may regard the graph (i.e. the set of ordered pairs)
of a function either as a set or as a function. But in the case of bounds, the distinction is important. When
the set of ordered pairs is thought of as a function f : X → Y, the bounds are applied to the image set
f(X) ⊆ Y of the function, not the graph f = {(x, f(x)); x ∈ X}. However, one may also define bounds for
the graph as a subset of X × Y if the Cartesian set product has an order defined on it.
11.3.7 Definition: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y.
f is bounded below (in Y ) when f (X) is bounded below in Y .
f is bounded above (in Y ) when f (X) is bounded above in Y .
f is bounded (in Y ) when f (X) is bounded in Y .
11.3.8 Theorem: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f : X → Y. Let S ∈ 𝒫(X).
(i) If inf(f(S)) exists and b ∈ Y is a lower bound for f(S), then inf(f(S)) ≥ b.
(ii) If S ≠ ∅ and inf(f(S)) exists, and c ∈ Y is an upper bound for f(S), then inf(f(S)) ≤ c.
(iii) If S ≠ ∅ and sup(f(S)) exists, and b ∈ Y is a lower bound for f(S), then sup(f(S)) ≥ b.
(iv) If sup(f(S)) exists and c ∈ Y is an upper bound for f(S), then sup(f(S)) ≤ c.
Proof: Let (Y, ≤) be a partially ordered set. Let f : X → Y. For part (i), suppose S is a non-empty
subset of X such that inf(f(S)) exists, and let b ∈ Y satisfy ∀x ∈ S, b ≤ f(x). Then b is a lower bound for the
non-empty subset A = f(S) of Y. So inf(f(S)) = inf(A) ≥ b by Theorem 11.2.36 (i).
Similarly, parts (ii), (iii) and (iv) follow from Theorem 11.2.36 (ii, iii, iv).
11.3.9 Theorem: Let X be a set. Let (Y, ≤) be a partially ordered set. Let f, g : X → Y.
(i) If ∀x ∈ X, f(x) ≤ g(x), and inf(f) and inf(g) both exist, then inf(f) ≤ inf(g).
(ii) If ∀x ∈ X, f(x) ≤ g(x), and sup(f) and sup(g) both exist, then sup(f) ≤ sup(g).
Proof: For part (i), let X be a set, let (Y, ≤) be a partially ordered set, and let f, g : X → Y satisfy
∀x ∈ X, f(x) ≤ g(x). Suppose that inf(f) and inf(g) exist. Then b = inf(f) ∈ Y is a lower bound for f(X),
by Definition 11.2.4. So b ≤ f(x) for all x ∈ X. So b ≤ g(x) for all x ∈ X, by Definition 11.1.2 (iii). So
b ≤ inf(g) by Theorem 11.3.8 (i). Hence inf(f) ≤ inf(g).
Part (ii) may be proved as for part (i).
11.3.10 Remark: Mixed infimums and supremums for functions of several variables.
For functions of two or more variables, one may mix infimum and supremum operations. The infimum over
all variables individually is the same as the infimum over the combined cross-product domain. (Similarly
for the supremum.) Notation 11.3.11 gives some samples of mixed infimum/supremum combinations. Of
course, there are infinitely many variations and combinations generalising this notation.
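The remark can be illustrated numerically. In this Python sketch (added here, with a small grid and the function f(x, y) = (x - y)² as assumed examples), iterating the infimum variable by variable agrees with the infimum over the product domain, while a mixed sup/inf combination only satisfies an inequality:

```python
# Iterated versus joint infimum, plus a mixed inf/sup combination.
xs = [0, 1, 2]
ys = [0, 1, 2]
f = lambda x, y: (x - y) ** 2

inf_iterated = min(min(f(x, y) for y in ys) for x in xs)
inf_joint = min(f(x, y) for x in xs for y in ys)
assert inf_iterated == inf_joint == 0

# Mixing the operations is order-sensitive in general:
inf_sup = min(max(f(x, y) for x in xs) for y in ys)   # inf_y sup_x f
sup_inf = max(min(f(x, y) for y in ys) for x in xs)   # sup_x inf_y f
assert sup_inf <= inf_sup      # max-min inequality; here 0 <= 1
```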
11.4. Lattices
11.4.1 Remark: Usefulness of complete lattices for defining extrema.
Complete lattices are well suited to be the range of functions for which inferior and superior limits are to be
defined, for example in Section 35.8. The existence of mixed extremum expressions such as in Notation 11.3.11
is also guaranteed when the function target space is a complete lattice. (See Figure 11.6.1 for a family tree
including lattices and other classes of partial orders.)
11.4.2 Definition: A lattice is a partially ordered set X such that inf{a, b} and sup{a, b} both exist in X
for all a, b ∈ X.
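Definition 11.4.2 can be checked directly for a finite example. In this Python sketch (added here; the divisors of 60 under divisibility are an assumed example), every pair has an infimum, which coincides with the gcd:

```python
from math import gcd

# The divisors of 60 under divisibility: pairwise infima exist and equal
# the gcd (dually, pairwise suprema equal the lcm), so this poset is a
# lattice in the sense of Definition 11.4.2.
X = {d for d in range(1, 61) if 60 % d == 0}
leq = lambda x, y: y % x == 0

def glb(a, b):
    lb = {x for x in X if leq(x, a) and leq(x, b)}
    cands = [x for x in lb if all(leq(y, x) for y in lb)]
    return cands[0] if cands else None

for a in X:
    for b in X:
        assert glb(a, b) == gcd(a, b)   # the infimum exists for every pair
```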
11.4.4 Definition: A complete lattice is a partially ordered set X such that inf A and sup A both exist
in X for all A ∈ 𝒫(X) \ {∅}.
By Definition 11.5.1 (ii), ∀x, y ∈ X, ((x R y ∧ y R x) ⇒ x = y). This is also clearly symmetric with respect
to x and y. So R⁻¹ satisfies the same condition.
By Definition 11.5.1 (iii), ∀x, y, z ∈ X, ((x R y ∧ y R z) ⇒ x R z). So from the definition of an inverse relation,
∀x, y, z ∈ X, ((y R⁻¹ x ∧ z R⁻¹ y) ⇒ z R⁻¹ x). So ∀x, y, z ∈ X, ((y R⁻¹ z ∧ x R⁻¹ y) ⇒ x R⁻¹ z) by swapping
dummy variables x and z. Hence R⁻¹ satisfies Definition 11.5.1 (iii) by symmetry of the AND-operator.
11.5.4 Remark: Automatic application of partial order properties to total orders.
The comments on partial orders in Remark 11.1.4 (and proved in Theorem 11.1.5) apply also to total orders.
Thus a partial order R on a set X is a relation on X which satisfies the three conditions: id_X ⊆ R,
R ∩ R⁻¹ ⊆ id_X and R ∘ R ⊆ R. The additional condition (i) in Definition 11.5.1 for a total order may be
written in the same idiom as R ∪ R⁻¹ = X × X.
11.5.5 Remark: Contrast between partial orders and total orders.
A total order has a stronger reflexivity condition than a partial order. Every total order is a partial order.
So all theorems and definitions which require a partial order apply also to a total order. A family tree for
some classes of order relations is illustrated in Figure 11.1.1. (See Figure 9.0.1 for a family tree showing
order relations in the context of more general relations.)
11.5.6 Theorem: Let (X, ≤) be a totally ordered set.
(i) The set {a, b} has both a minimum and a maximum in X for all a, b ∈ X.
(ii) (X, ≤) is a lattice.
Proof: Let (X, ≤) be a totally ordered set. For part (i), let a, b ∈ X. Then a ≤ b or b ≤ a (or both).
Let A = {a, b}. If a ≤ b, then ∀x ∈ A, a ≤ x and ∀x ∈ A, x ≤ b. So a is the minimum of A and b is the
maximum of A. Similarly if b ≤ a. So {a, b} always has a minimum and a maximum in X for all a, b ∈ X.
Part (ii) follows immediately from part (i), Definition 11.4.2 and Theorem 11.2.7 (iii, iv).
A semi-infinite closed interval of X is a set of the form {x ∈ X; a ≤ x} or {x ∈ X; x ≤ b} for some a, b ∈ X.
A semi-infinite open interval of X is a set of the form {x ∈ X; a < x} or {x ∈ X; x < b} for some a, b ∈ X.
An interval of X is any open, closed, semi-open or semi-closed interval, any semi-infinite interval of X, or
the whole set X.
A bounded interval of X is any bounded open, closed, closed-open or open-closed interval of X.
In other words,
∀x, y ∈ X^I, (x ≤ y ⇔ ∀j ∈ I, ((x_j ≤ y_j) ∨ ∃i ∈ I, (i < j ∧ x_i ≠ y_i))). (11.5.2)
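Formula (11.5.2) can be transcribed directly for finite families. Python's built-in tuple comparison implements the lexicographic order, which provides an independent check in the following sketch (added for illustration; the index set is range(n)):

```python
import itertools

def lex_leq(x, y):
    # Transcription of (11.5.2): for every index j, either x_j <= y_j
    # or the families already differ at some earlier index i < j.
    n = len(x)
    return all(x[j] <= y[j] or any(x[i] != y[i] for i in range(j))
               for j in range(n))

# Compare against Python's built-in lexicographic tuple comparison.
for x in itertools.product((0, 1, 2), repeat=3):
    for y in itertools.product((0, 1, 2), repeat=3):
        assert lex_leq(x, y) == (x <= y)
```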
11.5.22 Definition: A totally ordered family of elements of a set X is a family (x_i)_{i∈I} ∈ X^I such that
the index set I is totally ordered.
11.5.23 Definition: A totally ordered family of sets is a family of sets (X_i)_{i∈I} such that the index set I
is totally ordered.
11.5.24 Definition: A totally ordered family of functions is a family of functions (f_i)_{i∈I} such that the
index set I is totally ordered.
11.5.27 Definition: The projection map of a Cartesian product ∏_{i∈I} Si onto a range of components J =
{j ∈ I; n1 ≤ j ≤ n2}, where I is a totally ordered set and n1, n2 ∈ I, is the map from ∏_{i∈I} Si to ∏_{j∈J} Sj
defined by (xi)_{i∈I} ↦ (xj)_{j∈J}.
11.5.28 Notation: π_{n1}^{n2}, for n1, n2 ∈ I, denotes the projection map of ∏_{i∈I} Si onto the component range
{j ∈ I; n1 ≤ j ≤ n2}. Thus π_{n1}^{n2} : ∏_{i∈I} Si → ∏_{j∈J} Sj is defined by π_{n1}^{n2} : (xi)_{i∈I} ↦ (xj)_{j∈J}.
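Definition 11.5.27 can be illustrated concretely. In the following Python sketch (names are ad hoc), a family (xi)_{i∈I} is modelled as a dictionary from the index set to values, and the projection keeps the components in the range n1..n2:

```python
# Projection of a Cartesian product onto a range of components, as in
# Definition 11.5.27. A family (x_i)_{i in I} is modelled as a dict
# from the index set I to values.

def projection(x, n1, n2):
    """Project the family x onto the component range {j; n1 <= j <= n2}."""
    return {j: x[j] for j in x if n1 <= j <= n2}

x = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
assert projection(x, 2, 3) == {2: 'b', 3: 'c'}
```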
11.5.29 Remark: Projection maps from Cartesian products to component ranges.
Definition 11.5.27 and Notation 11.5.28 are useful for various definitions for charts for tangent bundles. If
a manifold has dimension n, then the tangent bundle is a manifold of dimension 2n. The horizontal and
vertical components of the standard charts on the tangent bundle can then be separated using the projection
maps π_1^n : ℝ^{2n} → ℝ^n and π_{n+1}^{2n} : ℝ^{2n} → ℝ^n. (A standard canonical map is assumed from the index set
J = {i ∈ ℕ_{2n}; i ≥ n + 1} = ℕ_{2n} \ ℕ_n to ℕ_n as in Definition 14.5.9.)
11.5.31 Definition: An ordered traversal of a set X is a bijection f : I → X for some totally ordered
set I.
11.6. Well-ordering
11.6.1 Definition: A (left) well-ordering on a set X is a partial order R on X such that all non-empty
subsets of X have a minimum. That is,
∀A ∈ ℙ(X) \ {∅}, ∃b ∈ A, ∀a ∈ A, b R a.
A right well-ordering on a set X is a partial order R on X such that all non-empty subsets of X have a
maximum. That is,
∀A ∈ ℙ(X) \ {∅}, ∃b ∈ A, ∀a ∈ A, a R b.
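Definition 11.6.1 can be tested exhaustively on a finite set: a relation is a left well-ordering precisely when every non-empty subset contains an element related to all of its elements. A brute-force Python sketch (illustrative only; the check is exponential in the size of X):

```python
# Check Definition 11.6.1 on a finite example: a relation R is a (left)
# well-ordering iff every non-empty subset has a minimum.
from itertools import combinations

def is_left_well_ordering(X, R):
    """True if every non-empty subset A of X has b with b R a for all a in A."""
    for r in range(1, len(X) + 1):
        for A in combinations(X, r):
            if not any(all(R(b, a) for a in A) for b in A):
                return False
    return True

X = [0, 1, 2, 3]
assert is_left_well_ordering(X, lambda x, y: x <= y)      # usual order: yes
assert not is_left_well_ordering(X, lambda x, y: x == y)  # equality: pairs have no minimum
```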
11.6.2 Theorem: If R is a left or right well-ordering for a set X, then R is a total order for X.
Proof: Let R be a left well-ordering on a set X. Let x, y ∈ X. Let S = {x, y}. Then S has a minimum
element, which must be either x or y. So either x R y or y R x. Therefore R is a total order on X.
Let R be a right well-ordering on a set X. Let x, y ∈ X. Let S = {x, y}. Then S has a maximum element,
which must be either x or y. So either x R y or y R x. Therefore R is a total order on X.
Figure 11.6.1 Family tree for classes of order relations, including various well-orderings
There is some similarity between total orders and lattice orders. A total order on a set X guarantees the
existence of min{a, b} and max{a, b} for any a, b ∈ X. A lattice order on X guarantees the existence of
inf{a, b} and sup{a, b} for any a, b ∈ X. Hence a total order is necessarily a lattice order.
11.6.5 Theorem:
(i) Every non-empty bounded-below subset of a right well-ordered set (X, ≤) has an infimum in X.
(ii) Every non-empty bounded-above subset of a left well-ordered set (X, ≤) has a supremum in X.
Proof: For part (i), let (X, ≤) be a right well-ordered set. Let A be a non-empty bounded-below subset
of X. Then A has a lower bound in X. So the set of lower bounds A′ = {x ∈ X; ∀a ∈ A, x ≤ a} is non-empty. Therefore
A′ has a maximum by Definition 11.6.1. Hence A has an infimum by Theorem 11.2.11.
Part (ii) follows similarly to part (i).
Proof: To show part (i), let (X, R) be a non-empty ordered set, where R is a left well-ordering. Then
∃b ∈ X, ∀a ∈ X, b R a by Definition 11.6.1 because X ∈ ℙ(X) \ {∅}. In other words, there is an element
b ∈ X which is a lower bound for X. Hence X is bounded below by Definition 11.2.34.
Part (ii) follows similarly.
Proof: Since this is an AC theorem which is not entirely trivial to prove, no proof is given here. Proofs
directly from the axiom of choice, or indirectly via Zorn's lemma, or a numeration theorem (showing
equinumerosity to ordinal numbers), or some other equivalent axiom, are given by Zermelo [389], pages 514–516;
Stoll [343], pages 116–117; Pinter [327], pages 120–121; Levy [318], pages 160–161; E. Mendelson [320],
pages 197–199; Kelley [97], page 35; Shoenfield [340], page 253; Roitman [335], page 82; Bernays [292],
pages 138–139; Halmos [307], pages 68–69.
11.7.3 Remark: An AC-free version of the theorem that surjections have right inverses.
Theorem 11.7.4 is an AC-free version of AC-tainted Theorem 10.5.14. To put this theorem more briefly,
every surjection on a well-ordered set has a right inverse. (The converse is elementary. So it does not need
to be stated.)
11.7.4 Theorem: Let X be a well-ordered set. Then a function f : X → Y is a surjection if and only if f
has a right inverse.
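The AC-free direction of Theorem 11.7.4 is effectively an algorithm: map each y ∈ Y to the minimum of its preimage under f. The following Python sketch (names ad hoc) models the well-ordering of X by traversing X in increasing order, so the first preimage found is the minimal one:

```python
# Theorem 11.7.4, AC-free direction: if the domain X is well-ordered,
# a surjection f : X -> Y has a right inverse g with f(g(y)) = y,
# obtained by choosing the minimum of each preimage.

def right_inverse(f, X, Y):
    """X is an iterable in its well-ordering; returns g : Y -> X with f.g = id."""
    g = {}
    for x in X:                      # traverse X in increasing order
        y = f(x)
        if y not in g:               # first (i.e. minimal) preimage wins
            g[y] = x
    assert set(g) == set(Y), "f is not surjective onto Y"
    return g

f = lambda x: x % 3                  # surjection from {0..8} onto {0,1,2}
g = right_inverse(f, range(9), range(3))
assert all(f(g[y]) == y for y in range(3))
```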
Chapter 12
Ordinal numbers
of an ordered sequence with the elements of a set, whereas the cardinal numbers merely tell you the total
number of elements. The cardinality of a set depends only on the last number reached, not on the particular
order in which the elements were counted, whereas the ordinal numbers are used for associating particular
number-tags with particular elements.
Since ordinal numbers provide the means by which the elements of sets are counted, it seems that ordinal
numbers should be defined before cardinal numbers, because elements can be counted only by associating
numbers in a one-to-one (i.e. bijective) manner with the elements of the set. In other words, the counting
procedure is a prerequisite for defining the cardinality of a set.
It is true that one may compare the cardinalities of sets without counting them (i.e. bijectively associating
them) with the elements of a standard set of ordinal numbers. So it could be argued that cardinality
is the prior concept. But in practice, one usually wants to know cardinalities of sets in some absolute
sense, measured with respect to sets with known, standardised cardinality, not merely comparing relative
cardinalities. Therefore ordinal numbers are presented here as the prior concept.
Mathematics thrived without set theory until the time of Cantor in the late nineteenth century. Before the
introduction of Cartesian coordinates, most mathematics was expressed in terms of integers, points and lines,
but line segments, in particular the theory of proportion, played a role similar to the modern concept of real
numbers. Most of the progress in pure and applied mathematics in the last 400 years has been expressed in
terms of numbers.
Despite the apparent primacy of numbers in mathematics, and the consequent secondary role of sets, numbers
have also been defined in pure mathematics for the last hundred years in terms of sets. In other words, sets
have been regarded as the primary concept, and numbers have been defined within a set theory framework.
This is a kind of set-ism, analogous to the early 20th century logicism which claimed that all mathematics
is reducible to logic.
It is perfectly possible, and very much more natural, to axiomatise numbers independently of sets. In this
book, ordinal numbers are defined in terms of sets in the usual way, and sets are axiomatised as the primary
conceptual framework. A very strong argument in favour of defining at least the ordinal numbers within set
theory is the requirement for sufficient standard-cardinality sets to use in equinumerosity tests to measure
cardinality. Nevertheless, numbers are not sets. Although it is possible to represent numbers as particular set
constructions, it is also possible to arithmetise set theory. Numbers and sets are complementary conceptual
frameworks. They have shared primacy. In fact, it seems even more true to say that all three conceptual
domains, logic, sets and numbers, have shared primacy.
12.0.3 Remark: The special role of ordinal numbers in Zermelo-Fraenkel set theory.
Ordinal numbers are not merely a useful construction within ZF set theory for measuring the cardinality of
finite sets, for providing order types for comparison with general well-ordered sets, and for parametrising
general transfinite recursion procedures. The ordinal numbers also provide a kind of backbone for the
class of all sets. This special role for the ordinal numbers in ZF set theory is discussed in Section 12.5,
where von Neumann's transfinite recursion procedure for ordinal numbers is applied to sets. Then the
ordinal numbers act as the rank parameter for general sets. Each single step in the transfinite inductive
procedure is associated with an application of the power-set axiom to expand the class of all sets. Each
limit operation in this procedure is associated with a similar application of the union axiom. In this way,
every set in a more or less minimal ZF universe is associated with a unique ordinal number, which is known
as the rank of the set.
There is some similarity between the von Neumann style of ordinal number construction and a construction
which arises when trying to resolve Russell's paradox. If one requires a universe of sets U to be an element of
itself, one way to (try to) achieve this is to extend the universe to the set U ∪ {U}. But then this expanded
universe needs to be included in itself, which requires a further expansion to the set U ∪ {U} ∪ {U ∪ {U}},
and so forth. This gives rise to a construction which somewhat resembles a von Neumann style of hierarchy
built from two atoms, one of them being the empty set ∅, the other being the universe U.
Despite the unquestioned importance of the ordinal numbers for general set theory, the portion of set theory
which is applicable to differential geometry does not require most of the breathtaking size and often exotic
phenomena which are associated with ordinal numbers beyond the first few stages of the transfinite recursion.
(The apparent irrelevance of most of ordinal number theory for practical mathematics is also mentioned in
Remarks 12.4.2 and 12.4.3.)
The finite ordinal numbers are defined inductively as 0 = ∅, 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}, n⁺ = n ∪ {n},
and so forth. (This construction for the ordinal numbers is due to von Neumann [382].) Unfortunately,
induction requires the prior definition of all finite ordinal numbers. So induction cannot be used to define
ordinal numbers. Construction of ordinal numbers must be achieved using only the Zermelo-Fraenkel axioms.
The von Neumann construction for finite ordinal numbers has various useful properties. For example,
0 ∈ 1 ∈ 2 ∈ 3 ∈ . . . and 0 ⊆ 1 ⊆ 2 ⊆ 3 ⊆ . . .. But x ∈ y ⇒ x ≠ y. Therefore 0 ≠ 1 ≠ 2 ≠ 3 ≠ . . .. The membership
relation is transitive, and in fact corresponds exactly to the usual notion of order on the integers.
∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}). (12.1.1)
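The von Neumann construction can be modelled directly with Python frozensets, which makes the properties just listed and condition (12.1.1) checkable by machine (an illustrative sketch, not part of the formal development):

```python
# The von Neumann finite ordinals modelled with frozensets:
# 0 = {}, and the successor is n+ = n ∪ {n}. This checks the properties
# quoted above and condition (12.1.1).

def successor(n):
    return n | frozenset({n})        # n+ = n ∪ {n}

ordinals = [frozenset()]             # 0 = the empty set
for _ in range(5):
    ordinals.append(successor(ordinals[-1]))

for m, n in zip(ordinals, ordinals[1:]):
    assert m in n and m <= n and m != n    # m ∈ m+, m ⊆ m+, m ≠ m+

# Condition (12.1.1): every element of an ordinal N is empty or a successor
# of another element of N.
N = ordinals[5]
assert all(m == frozenset() or any(m == successor(a) for a in N) for m in N)
```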
12.1.4 Remark: Every non-empty element of an extended finite ordinal number has a predecessor.
A ZF set is an extended finite ordinal number if and only if it is either a finite ordinal number or else the
single infinite ordinal number ω, which is equal to the set of all finite ordinal numbers. (This is shown in
Theorem 12.1.36 (i). See Notation 12.1.27 for ω. See Definition 12.1.32 for finite ordinal numbers.) These
finite ordinal numbers and one infinite ordinal number have in common the fact that each element, except
the empty element, is the successor of another element of the set. (See Definition 12.2.14 for successor sets.)
So these may both be regarded as sequences, since all of their elements, except the empty element, have
a predecessor within the set.
element. In other words, the elements of ω are precisely those which are equal to the empty set or else are
the successor of some element. However, this is not enough to prevent infinite sequences of containments
on the left. In other words, there could be a sequence of containments which is not terminated on the
left by the empty set. The ZF regularity axiom requires every containment sequence to terminate with the
empty set. This implies the uniqueness of the set ω. Thus, roughly speaking, the regularity axiom guarantees
termination on the left, and the infinity axiom guarantees non-termination on the right. (This is illustrated
in Figure 12.1.1.)
Figure 12.1.1 Complementary roles of the ZF axioms of regularity and infinity
Termination on the left is just as important as non-termination on the right. (Termination on the left is
what makes the induction principle work, and induction is the principal principle in mathematics!)
Most textbook definitions of ordinal numbers are stated in terms of well-ordering and the transitivity of
the set-membership relation. (A transitive set is a set X whose elements are all subsets of X. That is,
∀x ∈ X, ∀y ∈ x, y ∈ X. In other words, ∀x ∈ X, x ⊆ X.) Suppes [345], page 131, defines ordinal numbers
to be both well-ordered by the set-membership relation and transitive (which he calls "completeness").
Jech [314], page 33, E. Mendelson [320], page 172, and Quine [332], page 157, also require transitivity and
set-membership well-ordering. Bernays [292], page 80, defines ordinals to be transitive, totally ordered and
well-founded (i.e. satisfying the ZF regularity axiom). According to Chang/Keisler [298], page 580, the
definition which has by now become fairly standard in the literature defines an ordinal as a set X which is
well-ordered by set-membership and satisfies ⋃X ⊆ X. The condition ⋃X ⊆ X is equivalent to transitivity.
So the Chang/Keisler definition is also in terms of well-ordering and transitivity, but they comment that:
"This definition of ordinals is, of course, quite artificial." According to a footnote by Quine [332], page 157,
the earliest formulation of this well-ordering/transitivity kind was in 1937 by Robinson [379].
Slightly different definitions are given by some other authors. Shoenfield [340], page 246, defines ordinals
as transitive sets whose elements are all transitive. Stoll [343], page 307, and Halmos [307], page 75, define
them as well-ordered sets X for which each element x ∈ X equals its initial segment {y ∈ X; y < x}.
Generally ordinal numbers are defined to make proofs convenient rather than to maximise intuitive appeal.
Definition 12.1.3 is chosen for maximum intuitive appeal, although the cost is a not-so-easy proof of basic
properties in Theorems 12.1.15, 12.1.17, 12.1.19 and 12.1.21. The transitivity property of extended finite
ordinal numbers is proved in Theorem 12.1.15 (iii).
12.1.9 Theorem: Let N be a set which satisfies ∀m ∈ N, ∃a ∈ N, m = a ∪ {a}. Then N = ∅.
12.1.10 Remark: A useful form of the axiom of regularity for proving ordinal number theorems.
Theorem 12.1.11 is a contrapositive form of Theorem 12.1.9 which is useful for proving theorems.
12.1.11 Theorem: Let N be an extended finite ordinal number. Let N′ be a non-empty subset of N
which satisfies ∅ ∉ N′. Then ∃m ∈ N′, ∃a ∈ N \ N′, m = a ∪ {a}. In other words, ∃a ∈ N \ N′, a ∪ {a} ∈ N′.
Proof: Let N be an extended finite ordinal number. Let N′ be a set such that ∅ ≠ N′ ⊆ N and ∅ ∉ N′.
Then ∀m ∈ N′, ∃a ∈ N, m = a ∪ {a} by Definition 12.1.3, because N′ ⊆ N and ∀m ∈ N′, m ≠ ∅. By
Theorem 12.1.9, ¬(∀m ∈ N′, ∃a ∈ N′, m = a ∪ {a}) because N′ ≠ ∅. So ∃m ∈ N′, ∀a ∈ N′, m ≠ a ∪ {a}.
Let m ∈ N′ satisfy ∀a ∈ N′, m ≠ a ∪ {a}. Then ∃a ∈ N, m = a ∪ {a}. Let a ∈ N satisfy m = a ∪ {a}.
Then a ∉ N′. So ∃a ∈ N \ N′, m = a ∪ {a}. Hence ∃m ∈ N′, ∃a ∈ N \ N′, m = a ∪ {a}.
12.1.12 Theorem: Let m, a and b be sets such that m = a ∪ {a} and m = b ∪ {b}. Then a = b. In other
words, for any set m, there is at most one set a such that m = a ∪ {a}.
Proof: Let m, a and b be sets such that m = a ∪ {a} and m = b ∪ {b}. Then a ∈ m. So a ∈ b ∪ {b}. So
a ∈ b or a = b. Similarly, b ∈ m and so b ∈ a or a = b. Suppose that a ≠ b. Then a ∈ b and b ∈ a. This is
excluded by the ZF axiom of regularity. (See Theorem 7.8.2 (ii).) Hence a = b.
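Theorem 12.1.12 says that the successor operation a ↦ a ∪ {a} is injective. This can be observed concretely on von Neumann-style sets modelled as Python frozensets (an illustrative sketch):

```python
# Theorem 12.1.12 in miniature: the successor operation a |-> a ∪ {a}
# is injective on hereditarily finite sets modelled as frozensets.

def successor(a):
    return a | frozenset({a})

zero = frozenset()
one = successor(zero)
two = successor(one)

# Distinct sets have distinct successors...
assert successor(zero) != successor(one) != successor(two)
# ...so each successor determines its predecessor uniquely.
pred = {successor(a): a for a in (zero, one, two)}
assert pred[two] == one
```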
(ix) If m ∈ N, then m is an extended finite ordinal number.
(x) If m ∈ N, then m ∪ {m} is an extended finite ordinal number.
(xi) If m ∈ N, then {x ∈ N; m ∉ x} is an extended finite ordinal number.
(xii) ∀m ∈ N, m ∪ {m} = {x ∈ N; m ∉ x}.
(xiii) ∀a, b ∈ N, (a ∈ b ∨ a = b ∨ b ∈ a). [N is totally ordered by ∈.]
(xiv) ∀a, b ∈ N, (a ∈ b ⇔ (a ⊆ b ∧ a ≠ b)).
Proof: For part (i), let N be an extended finite ordinal number. Suppose ∅ ∉ N. Then it follows from
Definition 12.1.3 that ∀m ∈ N, ∃a ∈ N, m = a ∪ {a}. Hence N = ∅ by Theorem 12.1.9.
For part (ii), let N be an extended finite ordinal number. Let N′ = {m ∈ N \ {∅}; ∅ ∉ m}. Let m ∈ N′.
Then m ∈ N \ {∅}. So ∃a ∈ N, m = a ∪ {a} by Definition 12.1.3. But ∅ ∉ m. So for a such that m = a ∪ {a},
both ∅ ∉ a and ∅ ≠ a. So a ∈ N′. Therefore ∃a ∈ N′, m = a ∪ {a}. So ∀m ∈ N′, ∃a ∈ N′, m = a ∪ {a}.
Hence N′ = ∅ by Theorem 12.1.9.
For part (iii), let N be an extended finite ordinal number. Suppose that m ⊈ N for some m ∈ N. In
other words, (⋃N) \ N ≠ ∅. Let x ∈ (⋃N) \ N, and let N′ = {m ∈ N; x ∈ m}. Then N′ ≠ ∅, and
∀m ∈ N′, m ≠ ∅, and ∀m ∈ N′, ∃a ∈ N, m = a ∪ {a} because N′ ⊆ N. If m = a ∪ {a} with m ∈ N′
and a ∈ N, then x ∈ m, and so x ∈ a or x = a. But x ∉ N and a ∈ N. So x ≠ a. Therefore x ∈ a.
So a ∈ N′. Consequently, ∀m ∈ N′, ∃a ∈ N′, m = a ∪ {a}. Therefore N′ = ∅ by Theorem 12.1.9. Hence
m ⊆ N for all m ∈ N.
For part (iv), let N be an extended finite ordinal number. Let m ∈ N. Then m ⊆ N by part (iii). But
m = N would imply N ∈ N, which is excluded by the ZF axiom of regularity. Hence m ≠ N.
For parts (v), (vi) and (vii), let N be an extended finite ordinal number. Let m ∈ N and b ∈ m. Define
N′ = {x ∈ N; b ∈ x ∧ x ⊆ m}. Then N′ ≠ ∅ because m ∈ N′. Clearly ∅ ∉ N′ because b ∉ ∅. So by
Then ∅ ∉ N′m because ∅ ∈ N by part (i), since N ≠ ∅, and m ∉ ∅ and ∅ ∈ m ∪ {m} by parts (x) and (i).
Assume that N′m ≠ ∅. Since Nm is an extended finite ordinal number by part (xi), Theorem 12.1.11 implies
∃y ∈ N′m, ∃z ∈ Nm \ N′m, y = z ∪ {z}. Then m ∉ y, m ∉ z, y ∉ m ∪ {m}, z ∈ m ∪ {m}, and y = z ∪ {z} for
such y and z. From z ∈ m ∪ {m}, it follows that z ∈ m or z = m. From m ∉ y = z ∪ {z}, it follows that
m ∉ z and m ≠ z. So z = m is excluded. Therefore z ∈ m. Note that z ∪ {z} = y ∉ m ∪ {m}. (This first
stage of the proof of part (xii) has found a branch point z between the sets m ∪ {m} and Nm.)
For the second stage of the proof of part (xii), let N′m,z = {w ∈ m ∪ {m}; z ∈ w}. Then N′m,z ≠ ∅
because m ∈ N′m,z. But ∅ ∉ N′m,z because z ∉ ∅, and m ∪ {m} is an extended finite ordinal number by
part (x). So ∃w ∈ N′m,z, ∃v ∈ (m ∪ {m}) \ N′m,z, w = v ∪ {v} by Theorem 12.1.11. Then v ∈ m ∪ {m},
w ∈ m ∪ {m}, z ∉ v and z ∈ w = v ∪ {v}. So z ∈ v or z = v. But z ∈ v is excluded by z ∉ v. So z = v. So
z ∪ {z} = v ∪ {v} = w ∈ m ∪ {m}. This contradicts the conclusion that z ∪ {z} ∉ m ∪ {m}. So N′m = ∅.
Hence m ∪ {m} = {x ∈ N; m ∉ x}. (The proof of part (xii) is illustrated in Figure 12.1.2. The question
marks in the diagram indicate what would happen if the axiom of regularity was not enforced.)
Figure 12.1.2 Strategy for a proof that Nm ⊆ m ∪ {m}
For part (xiii), let N be an extended finite ordinal number, and let a, b ∈ N. Suppose that a ∉ b. Let
Na = {x ∈ N; a ∉ x}. Then b ∈ Na. But Na = a ∪ {a} by part (xii). So b ∈ a or b = a.
For part (xiv), let N be an extended finite ordinal number, and let a, b ∈ N. Suppose that a ∈ b. Then
a ⊆ b by parts (ix) and (iii), and a ≠ b by part (iv). Now suppose that a ⊆ b and a ≠ b. Then a ∈ b or
b ∈ a by part (xiii). Suppose that b ∈ a. Then b ⊆ a by parts (ix) and (iii), so a = b. This contradicts the
assumption a ≠ b. So a ∈ b.
12.1.16 Remark: The ZF axiom of regularity guarantees extended finite ordinals have a lower bound.
In Theorem 12.1.15, the ZF axiom of regularity, via Theorems 12.1.9 and 12.1.11, plays a pivotal role which
is analogous to the principle of mathematical induction. Since the elements of extended finite ordinals are
totally ordered by the membership relation, the ZF axiom of regularity effectively guarantees that every
non-empty subset of such an ordinal must have a minimum element. This is a kind of well-ordering property. Hence
induction-style proofs are able to be used in Theorem 12.1.9, arriving at contradictions by examining the
minimum object of a set of objects which do not satisfy some asserted proposition.
Proof: For part (i), let N1 and N2 be extended finite ordinal numbers and let N = N1 ∩ N2. Let m ∈ N.
Then (m = ∅ ∨ (∃a1 ∈ N1, m = a1 ∪ {a1})) and (m = ∅ ∨ (∃a2 ∈ N2, m = a2 ∪ {a2})) by Definition 12.1.3.
Therefore either m = ∅ or (∃a1 ∈ N1, m = a1 ∪ {a1}) ∧ (∃a2 ∈ N2, m = a2 ∪ {a2}). So either m = ∅ or
∃a1 ∈ N1, ∃a2 ∈ N2, m = a1 ∪ {a1} = a2 ∪ {a2}. But a1 = a2 for such a1 and a2 by Theorem 12.1.12.
Therefore for all m ∈ N, either m = ∅ or ∃a ∈ N, m = a ∪ {a}. Hence N = N1 ∩ N2 is an extended finite
ordinal number.
For part (ii), let N1 and N2 be extended finite ordinal numbers and let N = N1 ∪ N2. Let m ∈ N. Then either
m ∈ N1 ∩ N2 or m ∈ N1 \ N2 or m ∈ N2 \ N1. If m ∈ N1 ∩ N2, then either m = ∅ or ∃a ∈ N, m = a ∪ {a},
as in the proof of part (i). Suppose that m ∈ N1 \ N2. Then either m = ∅ or ∃a1 ∈ N1, m = a1 ∪ {a1} by
Definition 12.1.3. Therefore either m = ∅ or ∃a1 ∈ N, m = a1 ∪ {a1}. If m ∈ N2 \ N1, the same consequence
follows. Hence N = N1 ∪ N2 is an extended finite ordinal number by Definition 12.1.3.
For part (iii), let N1 and N2 be extended finite ordinal numbers, and suppose that N1 ⊈ N2 and N2 ⊈ N1.
Let x1 ∈ N1 \ N2 and x2 ∈ N2 \ N1. By part (ii), N1 ∪ N2 is an extended finite ordinal number. So by
Theorem 12.1.15 (xiii), x1 ∈ x2, x1 = x2 or x2 ∈ x1. Suppose that x1 ∈ x2. Then x1 ∈ N2 because x2 ⊆ N2
by Theorem 12.1.15 (iii). This contradicts the assumption x1 ∈ N1 \ N2. So either N1 ⊆ N2 or N2 ⊆ N1.
The same conclusion follows from the assumptions x1 = x2 or x2 ∈ x1. Hence N1 ⊆ N2 or N2 ⊆ N1.
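Parts (i)-(iii) can be observed concretely on von Neumann finite ordinals modelled as Python frozensets: intersections and unions of ordinals are again ordinals, and any two ordinals are comparable under inclusion (an illustrative sketch; the helper name ordinal is ad hoc):

```python
# Intersections and unions of von Neumann finite ordinals are ordinals,
# and any two ordinals are comparable under ⊆ (inclusion).

def ordinal(n):
    """The von Neumann ordinal n as a frozenset: 0 = {}, k+ = k ∪ {k}."""
    m = frozenset()
    for _ in range(n):
        m = m | frozenset({m})
    return m

a, b = ordinal(3), ordinal(5)
assert a & b == ordinal(3)           # intersection is the smaller ordinal
assert a | b == ordinal(5)           # union is the larger ordinal
assert a <= b or b <= a              # comparability: N1 ⊆ N2 or N2 ⊆ N1
```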
(vi) If ∃x ∈ N, x ∪ {x} ∉ N, then N ∪ {N} is an extended finite ordinal number.
(vii) If ∃x ∈ N, x ∪ {x} ∉ N, then ⋃N ≠ N.
(viii) If x ∈ N and x ∪ {x} ∉ N, then ⋃N = x.
(ix) If ∀x ∈ N, x ∪ {x} ∈ N, then ⋃N = N.
(x) ⋃N is an extended finite ordinal number.
(xi) (∀x ∈ N, x ∪ {x} ∈ N) ⇔ ⋃N = N.
(xii) (∃x ∈ N, x ∪ {x} ∉ N) ⇔ ⋃N ≠ N.
Proof: For part (i), let N be an extended finite ordinal number, and let x ∈ N satisfy x ∪ {x} ∉ N.
Suppose that N \ {x} is not an extended finite ordinal number. Then ∃m0 ∈ N \ {x}, (m0 ≠ ∅ ∧ ∀a ∈
N \ {x}, m0 ≠ a ∪ {a}) follows from Definition 12.1.3. So some m0 ∈ N satisfies m0 ≠ x, m0 ≠ ∅, and
∀a ∈ N \ {x}, m0 ≠ a ∪ {a}. But ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}) by Definition 12.1.3 because
N is an extended finite ordinal number. So m0 satisfies ∃a ∈ N, m0 = a ∪ {a}. Combining this with
∀a ∈ N \ {x}, m0 ≠ a ∪ {a} gives a = x and m0 = x ∪ {x}. But this contradicts the assumption x ∪ {x} ∉ N.
Therefore N \ {x} is an extended finite ordinal number.
For part (ii), let N be an extended finite ordinal number, and let x ∈ N be such that N \ {x} is an extended
finite ordinal number. Then ∀m ∈ N \ {x}, (m = ∅ ∨ (∃a ∈ N \ {x}, m = a ∪ {a})) by Definition 12.1.3.
Suppose that x ∪ {x} ∈ N. Then x ∪ {x} ∈ N \ {x} because x ≠ x ∪ {x}. Let m = x ∪ {x}. Then m ≠ ∅.
So ∃a ∈ N \ {x}, m = a ∪ {a}. But x ∪ {x} = a ∪ {a} implies x = a by Theorem 12.1.12. So x ∈ N \ {x},
which is a contradiction. Hence x ∪ {x} ∉ N.
For part (iii), let N be an extended finite ordinal number. Let x ∈ N satisfy x ∪ {x} ∉ N. Let N′ =
N ∪ {x ∪ {x}}. Let m ∈ N′. Then m ∈ N or m = x ∪ {x}. If m ∈ N, then m = ∅ ∨ ∃a ∈ N, m = a ∪ {a} by
Definition 12.1.3. So m = ∅ ∨ ∃a ∈ N′, m = a ∪ {a}. If m = x ∪ {x}, then m = ∅ ∨ ∃a ∈ N′, m = a ∪ {a}
holds with a = x ∈ N′. So ∀m ∈ N′, (m = ∅ ∨ ∃a ∈ N′, m = a ∪ {a}). Hence N′ is an extended finite
ordinal number by Definition 12.1.3.
For part (iv), let N be an extended finite ordinal number. Let x ∈ N satisfy x ∪ {x} ∉ N. Then x ∪ {x} ⊆ N
since x ⊆ N by Theorem 12.1.15 (iii). Suppose that y ∈ N \ (x ∪ {x}). Then y ∉ x and y ≠ x. Therefore
x ∈ y by Theorem 12.1.15 (xiii). So x ∪ {x} ∈ N by Theorem 12.1.15 (vii). This is a contradiction. Therefore
N \ (x ∪ {x}) = ∅. Hence N = x ∪ {x}.
For part (v), let N be an extended finite ordinal number. Suppose that ∃x ∈ N, x ∪ {x} ∉ N. Then
∃x ∈ N, N = x ∪ {x} by part (iv). Now suppose that ∃x ∈ N, N = x ∪ {x}. Then ∃x ∈ N, x ∪ {x} ∉ N
because N ∉ N by the axiom of regularity.
Part (vi) follows immediately from parts (iii) and (iv).
For part (vii), let N be an extended finite ordinal number which satisfies ∃x ∈ N, x ∪ {x} ∉ N. Let x ∈ N
satisfy x ∪ {x} ∉ N. Suppose that ⋃N = N. Then x ∪ {x} = N by part (iv). So x ∈ y for some y ∈ N.
Therefore x ∪ {x} ∈ N by Theorem 12.1.15 (vii). This is a contradiction. So N ⊈ ⋃N. But ⋃N ⊆ N by
Theorem 12.1.15 (iii). Hence ⋃N ≠ N.
For part (viii), let N be an extended finite ordinal number. Let x ∈ N satisfy x ∪ {x} ∉ N. Then x ⊆ ⋃N
by the definition of ⋃N because x ∈ N. Suppose that x ≠ ⋃N. Then there exists y ∈ ⋃N with y ∉ x.
Therefore, since ⋃N ⊆ N by Theorem 12.1.15 (iii), there exist y, z ∈ N such that y ∈ z and y ∉ x. But
then x = y or x ∈ y by Theorem 12.1.15 (xiii). So either x ∈ z or else x ∈ y and y ∈ z. In either case, it
follows from Theorem 12.1.15 (vii) that x ∪ {x} ∈ N. This is a contradiction. Hence ⋃N = x.
For part (ix), let N be an extended finite ordinal number which satisfies ∀x ∈ N, x ∪ {x} ∈ N. Then
∀x ∈ N, ∃y ∈ N, x ∈ y because x ∈ x ∪ {x} for any set x. So ∀x ∈ N, x ∈ ⋃N. Therefore N ⊆ ⋃N. But
⋃N ⊆ N by Theorem 12.1.15 (iii). Hence ⋃N = N.
For part (x), let N be an extended finite ordinal number. Then N must satisfy either ∃x ∈ N, x ∪ {x} ∉ N
or ∀x ∈ N, x ∪ {x} ∈ N, since these propositions are logical negations of each other. In the former case,
(xii) If N1 ≠ ∅, N2 ≠ ∅, ⋃N1 = N1 and ⋃N2 = N2, then N1 = N2.
Proof: For part (i), let N1 and N2 be extended finite ordinal numbers with N1 ⊊ N2. Define N′ = N2 \ N1.
Then ∅ ≠ N′ ⊆ N2, and ∅ ∉ N′ because ∅ ∈ N1 by Theorem 12.1.15 (i). So ∃x ∈ N2 \ N′, x ∪ {x} ∈ N′ by
Proof: The ZF infinity axiom, Definition 7.2.4 (8), asserts the existence of a set X which satisfies:
∀z, (z ∈ X ⇔ ((∀u, u ∉ z) ∨ ∃y, (y ∈ X ∧ ∀v, (v ∈ z ⇔ (v ∈ y ∨ v = y))))).
∀z, (z ∈ X ⇔ (z = ∅ ∨ (∃y ∈ X, z = y ∪ {y}))). (12.1.2)
Any set X which satisfies line (12.1.2) must satisfy ∀z ∈ X, (z = ∅ ∨ (∃y ∈ X, z = y ∪ {y})). So any set X
which satisfies line (12.1.2) is an extended finite ordinal number according to Definition 12.1.3.
Any set X satisfying line (12.1.2) clearly must satisfy ∅ ∈ X because (z = ∅) ⇒ (z ∈ X). So X ≠ ∅. For
any X satisfying line (12.1.2), let y ∈ X and let z = y ∪ {y}. This z satisfies ∃y ∈ X, z = y ∪ {y}. Therefore
z ∈ X for this z. Thus z = y ∪ {y} ∈ X for any y ∈ X. In other words, ∀y ∈ X, ∃z ∈ X, z = y ∪ {y}. So
∀y ∈ X, ∃z ∈ X, y ∈ z. This is equivalent to ∀y ∈ X, y ∈ ⋃X. And this is equivalent to X ⊆ ⋃X. Hence
⋃X = X because ⋃X ⊆ X by Theorem 12.1.15 (iii).
12.1.23 Theorem: There is a unique non-empty extended finite ordinal number N such that ⋃N = N.
Proof: Existence follows from Theorem 12.1.22. Uniqueness follows from Theorem 12.1.21 (xii).
12.1.24 Remark: The regularity and infinity axioms shape the set of finite ordinal numbers.
Roughly speaking, the axiom of regularity guarantees that the set of finite ordinal numbers is bounded
on the left, while the axiom of infinity guarantees that it is unbounded on the right. In other words,
the axiom of regularity forbids infinite chains of set-membership relations on the left, while the axiom of
infinity requires the existence of infinite chains of set-membership relations on the right.
be exactly the same except that a caveat would need to be attached to calculations involving infinite sets.
For example, instead of the existential quantifier ∃z, one could write instead the tagged quantifier ∃∞z,
signifying that existence is conditional on the infinity axiom. Then someone who rejects the infinity axiom, an
ultrafinitist for example, can accept the proposition as conditional on an existence which may or may not be
true, whereas the infinity axiom believer may ignore the caveat. The same calculations are made regardless.
In this sense, the infinity axiom is an insubstantial philosophical matter, one might say academic. The
axiom of infinity merely anoints the set of all ordinal numbers, accepting it into the mathematical arena
as a dinkum set. It is possible for people to work for causes in which they do not believe. Similarly, one may
carry out calculations concerning conjectural metaphysical entities in which one does not believe.
One may argue similarly that the axiom of choice could have a special existential quantifier ∃ac z, for
example. Once again, the calculations would be the same. However, almost nothing could have been done
in the applied mathematics of the last 400 years without infinity-related concepts such as limits, whereas
very little of a concrete practical nature ensues from AC-dependent existence assertions. Therefore axiom-dependence
tagging has a practical benefit in the case of the axiom of choice, whereas infinity-axiom tagging
would be annoyingly ubiquitous.
12.1.26 Definition: The set ω of finite ordinal numbers is the unique non-empty extended finite ordinal
number N such that ⋃N = N.
In other words, the set ω of finite ordinal numbers is the unique set N such that:
(i) N ≠ ∅,
(ii) ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}),
(iii) ⋃N = N.
12.1.28 Theorem:
(i) ∅ ∈ ω.
(ii) ∀m ∈ ω, m ∪ {m} ∈ ω.
(iii) ∀m1, m2 ∈ ω, ((m1 ∪ {m1} = m2 ∪ {m2}) ⇒ m1 = m2).
Proof: Part (i) follows from Theorem 12.1.15 (i) and Definition 12.1.26 (i).
Part (ii) follows from Theorem 12.1.19 (xi) and Definition 12.1.26 (iii).
For part (iii), let m1, m2 ∈ ω satisfy m1 ∪ {m1} = m2 ∪ {m2}. Then m1 ∈ m1 ∪ {m1}. So m1 ∈ m2 or
m1 = m2. Similarly, m2 ∈ m1 or m2 = m1. Suppose that m1 ≠ m2. Then m1 ∈ m2 and m2 ∈ m1, which
contradicts Theorem 7.8.2 (ii). Hence m1 = m2.
12.1.29 Remark: Existence and uniqueness of ω justifies a constant name.
Existence and uniqueness of the set N in Definition 12.1.26 is guaranteed by Theorem 12.1.23. So the symbol
ω is a constant name for a fixed object in Zermelo-Fraenkel set theory. This constant name for a fixed object
probably ranks second in importance next to the constant symbolic name ∅ for the empty set.
12.1.30 Remark: Notation for the set of ordinal numbers.
The notation for the set of finite ordinal numbers is generally attributed to Cantor. (See for example
Cajori [222], Volume 2, page 45; Quine [332], page 151.) Cantor used for the first number of the second
number-class in an 1883 article, explaining his choice in a footnote as follows. (See Cantor [355], page 577,
also in Cantor [295], page 195. Part 2 refers to his 1880 paper, Cantor [354], also in Cantor [295], page 147.)
Das Zeichen , welches ich in Nr. 2 dieses Aufsatzes (Bd. XVII, pag. 357) gebraucht habe, ersetze
ich von nun an durch , weil das Zeichen schon vielfach zur Bezeichnung von unbestimmten
Unendlichkeiten verwandt wird.
This may be translated as follows.
The symbol , which I have used in part 2 of this essay (Vol. 17, page 357), I replace from now on
with , because the symbol is already employed in many cases for the indication of indefinite
infinities.
Cantor was emphasising that his ω notation indicated an actual infinity, as opposed to the indefinite infinity or potential infinity notation ∞.
Differential geometry reconstructed. Copyright (C) 2017, Alan U. Kennington. All Rights Reserved. You may print this book draft for personal use. Public redistribution of this book draft in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.
12.1.31 Remark: The relation between ω and the finite ordinal numbers.
The name "set of finite ordinal numbers" in Definition 12.1.26 is justified by the name "finite ordinal number" in Definition 12.1.32, together with Theorem 12.1.34 (i).
12.1.32 Definition:
A finite ordinal number is an extended finite ordinal number N such that N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N.
12.1.33 Theorem: Let N be an extended finite ordinal number.
(i) N is a finite ordinal number if and only if N = ∅ or N ≠ ⋃N.
(ii) N is a finite ordinal number if and only if N ≠ ω.
(iii) N is a finite ordinal number if and only if N ∪ {N} is an extended finite ordinal number.
(iv) N is a finite ordinal number if and only if N = ∅ or ∃x ∈ N, N = x ∪ {x}.
(v) N is a finite ordinal number if and only if N = ∅ or N \ {x} is an extended finite ordinal number for some x ∈ N.
(vi) N is a finite ordinal number if and only if N ∈ ω.
Proof: For part (i), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. If N ≠ ∅, then ∃x ∈ N, x ∪ {x} ∉ N by Definition 12.1.32. So N ≠ ⋃N by Theorem 12.1.19 (ix). Now suppose that N = ∅ or N ≠ ⋃N. Then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N by Theorem 12.1.19 (ix). So N is a finite ordinal number by Definition 12.1.32.
For part (ii), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. Then N = ∅ or N ≠ ⋃N by part (i). So N ≠ ω. Now suppose that N is not a finite ordinal number. Then N ≠ ∅ and N = ⋃N by part (i). So N = ω by Definition 12.1.26 and Notation 12.1.27.
For part (iii), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. Then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N by Definition 12.1.32. So N = ∅ or N ∪ {N} is an extended finite ordinal number by Theorem 12.1.19 (vi). But N = ∅ implies that N ∪ {N} = {∅} is an extended finite ordinal number by Remark 12.1.13. So N ∪ {N} is an extended finite ordinal number. Now suppose that N ∪ {N} is an extended finite ordinal number. Then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N by Theorem 12.1.19 (xiv). Hence N is a finite ordinal number by Definition 12.1.32.
For part (iv), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. If N ≠ ∅, then ∃x ∈ N, x ∪ {x} ∉ N. So ∃x ∈ N, N = x ∪ {x} by Theorem 12.1.19 (iv). Therefore N = ∅ or ∃x ∈ N, N = x ∪ {x}. Now suppose that N = ∅ or ∃x ∈ N, N = x ∪ {x}. Then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N because N ∉ N by the axiom of regularity. So N is a finite ordinal number by Definition 12.1.32.
For part (v), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. Then N = ∅ or ∃x ∈ N, x ∪ {x} ∉ N by Definition 12.1.32. So by Theorem 12.1.19 (i), N = ∅ or N \ {x} is an extended finite ordinal number for some x ∈ N. Now suppose that N = ∅ or N \ {x} is an extended finite ordinal number for some x ∈ N. If N = ∅, then N is a finite ordinal number by Definition 12.1.32. So let x ∈ N be such that N \ {x} is an extended finite ordinal number. Then x ∪ {x} ∉ N by Theorem 12.1.19 (ii). Therefore N is a finite ordinal number by Definition 12.1.32.
For part (vi), let N be an extended finite ordinal number. Suppose that N is a finite ordinal number. Then N ≠ ω by part (ii). So N ⊊ ω or ω ⊊ N by Theorem 12.1.17 (iii). Suppose that ω ⊊ N. Then ∃x ∈ ω, x ∪ {x} ∈ N \ ω by Theorem 12.1.21 (i). So ∃x ∈ ω, x ∪ {x} ∉ ω. So ⋃ω ≠ ω by Theorem 12.1.19 (vii). This contradicts Definition 12.1.26. So ω ⊊ N is impossible. Therefore N ∈ ω by Theorem 12.1.21 (iii). Now suppose that N ∈ ω. Then N ≠ ω by the axiom of regularity. So N is a finite ordinal number by part (ii).
12.1.34 Theorem:
(i) N is a finite ordinal number if and only if N ∈ ω.
(ii) ∀N ∈ ω, N ∪ {N} ∈ ω.
(iii) ∀N ∈ ω, ⋃N ∈ ω.
Proof: For part (i), suppose that N is a finite ordinal number. Then N is an extended finite ordinal number by Definition 12.1.32. So N ∈ ω by Theorem 12.1.33 (vi). Now suppose that N ∈ ω. Then N is an extended finite ordinal number by Definition 12.1.26. So N is a finite ordinal number by Theorem 12.1.33 (vi).
For part (ii), let N ∈ ω. Then N is a finite ordinal number by part (i). So N ∪ {N} is an extended finite ordinal number by Theorem 12.1.33 (iii). Let x = N. Then x ∈ N ∪ {N}, but x ∪ {x} = N ∪ {N} ∉ N ∪ {N} by Theorem 7.8.2 (i). So N ∪ {N} is a finite ordinal number by Definition 12.1.32. Hence N ∪ {N} ∈ ω by part (i).
For part (iii), let N ∈ ω. Then N is a finite ordinal number by part (i). So ⋃N is an extended finite ordinal number by Theorem 12.1.19 (x). But N = x ∪ {x} for some x ∈ N by Theorem 12.1.33 (iv), and so ⋃N = (⋃x) ∪ x. Suppose that ⋃N = N. Then from x ∈ N, it follows that x ∈ ⋃N = (⋃x) ∪ x. But x ∉ x by Theorem 7.8.2 (i). So x ∈ ⋃x. Therefore x ∈ y for some y ∈ x by Notation 8.4.2. But this is impossible by Theorem 7.8.2 (ii). So ⋃N ≠ N. Therefore ⋃N is a finite ordinal number by Theorem 12.1.33 (i). Hence ⋃N ∈ ω by part (i).
12.1.35 Remark: Precise identification of the set of extended finite ordinal numbers.
Theorem 12.1.36 shows that ω ∪ {ω} = {N; N is an extended finite ordinal number}. In other words,
ω ∪ {ω} = {N; ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a})}.
This set is an extension of the set ω of finite ordinal numbers in the sense that the element ω of ω ∪ {ω} is a kind of point at infinity or limit for the finite elements in ω. (For comparison, see extended integers in Section 14.4 and extended real numbers in Section 16.2.)
One may similarly write ω as follows.
ω = {N; (N = ∅) ∨ (⋃N ≠ N ∧ ∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}))}
  = {N; ((N = ∅) ∨ (⋃N ≠ N)) ∧ (∀m ∈ N, (m = ∅ ∨ ∃a ∈ N, m = a ∪ {a}))}.
The expression for ω ∪ {ω} seems simpler than the expression for ω. This is because both sets have the same interior equation, but the set ω requires a boundary condition.
12.1.36 Theorem:
(i) N ∈ ω ∪ {ω} if and only if N is an extended finite ordinal number.
(ii) ω ∪ {ω} is not an extended finite ordinal number.
Proof: For part (i), let N ∈ ω ∪ {ω}. If N ∈ ω, then N is a finite ordinal number by Theorem 12.1.33 (vi), and so N is an extended finite ordinal number by Definition 12.1.32. If N = ω, then N is an extended finite ordinal number by Definition 12.1.26. Therefore N is an extended finite ordinal number. Now suppose that N is an extended finite ordinal number. Then either ⋃N ≠ N or ⋃N = N by Theorem 12.1.15 (iii). If ⋃N ≠ N, then N is a finite ordinal number by Theorem 12.1.33 (i), and so N ∈ ω by Theorem 12.1.33 (vi). If ⋃N = N, then N ∈ ω if N = ∅, and N = ω if N ≠ ∅ by Definition 12.1.26. Hence N ∈ ω ∪ {ω}.
For part (ii), suppose that ω ∪ {ω} is an extended finite ordinal number. Then ω ∪ {ω} ∈ ω ∪ {ω} by part (i), which contradicts the axiom of regularity. Hence ω ∪ {ω} is not an extended finite ordinal number.
0 = ∅
1 = {∅}
2 = {∅, {∅}}
3 = {∅, {∅}, {∅, {∅}}}
4 = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}
5 = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}}
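The nested-set representation above can be imitated computationally. The following sketch (illustrative only; not from the book) models each finite ordinal as a Python frozenset, so that the successor operation, the membership order and the element count can all be checked directly:

```python
# Von Neumann finite ordinals modelled as nested frozensets.
# Illustrative sketch only: each ordinal n is the set of all smaller ordinals.

def ordinal(n):
    """Return the finite ordinal n as a nested frozenset."""
    result = frozenset()              # 0 is the empty set
    for _ in range(n):
        result = result | {result}   # successor: N becomes N u {N}
    return result

three = ordinal(3)
assert len(three) == 3                            # n has exactly n elements
assert ordinal(2) in three                        # m < n iff m is an element of n
assert ordinal(2) < three                         # frozenset '<' is proper inclusion
assert ordinal(4) == ordinal(3) | {ordinal(3)}    # the successor relation
```

Note that the three order relations agree here: m ∈ n, m ⊊ n and "m smaller than n" coincide for these finite ordinals, which is what makes the set inclusion order of Definition 12.2.5 a total order.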
(Figure: the finite ordinals 0, 1 and 2 drawn as nested boxes; a box containing only an empty box represents {∅}, and so forth.)
Theorem 12.1.21 (ii). So ⋂S ∈ ⋂S by the definition of ⋂S. This contradicts the axiom of regularity. So ⋂S ∈ S.
12.2.4 Remark: Order for the ordinal numbers in terms of the set membership relation.
The standard order relation on the ordinal numbers in Definition 12.2.5 is a total order on the ordinal numbers by Theorem 12.2.6. Theorem 12.2.7 asserts that this order is a well-ordering. The principle of mathematical induction follows from this.
12.2.5 Definition: The standard order ≤ on the finite ordinal numbers is the relation which is defined on ω by ∀x, y ∈ ω, (x ≤ y ⇔ x ⊆ y). In other words, ≤ = {(x, y) ∈ ω × ω; x ⊆ y}.
12.2.6 Theorem: The standard order on the ordinal numbers is a total order.
12.2.8 Remark: The set of finite ordinal numbers satisfies transitivity and well-ordering.
Since the set membership relation on ω is transitive by Theorem 12.1.21 (v), and the set inclusion relation on ω is a well-ordering by Theorem 12.2.7, the two conditions which typically define general ordinal numbers in textbooks are now established. (These conditions are discussed in Remark 12.1.6.)
12.2.9 Remark: Total order on extended finite ordinal numbers.
The total order in Definition 12.2.5 is extended to the extended finite ordinal numbers in Definition 12.2.10.
12.2.10 Definition: The standard order ≤ on the extended finite ordinal numbers is the relation which is defined on ω⁺ by ∀x, y ∈ ω⁺, (x ≤ y ⇔ x ⊆ y). In other words, ≤ = {(x, y) ∈ ω⁺ × ω⁺; x ⊆ y}.
12.2.11 Theorem: The principle of mathematical induction.
Let S be a set such that S ⊆ ω, ∅ ∈ S, and ∀x ∈ S, x ∪ {x} ∈ S. Then S = ω.
For part (ii), let X₁, X₂ ∈ ω⁺. Let f: X₁ → X₂ be an increasing bijection. Suppose that X₁ ∈ ω and X₂ = ω. Then Range(f) = X₁ and f = id_{X₁} as in the proof of part (i). So f is not surjective. Therefore X₂ ∈ ω. So X₁ = X₂ and f = id_{X₁} by part (i). Now suppose that X₁ = ω. Then by the inductive method of part (i), X₂ = Range(f) = ω and f = id_{X₁}.
12.2.13 Remark: Successor sets are well defined.
For any set X, the successor set X ∪ {X} is a well-defined set. This follows from the unordered pair axiom and the union axiom in ZF set theory. The inequality X ∪ {X} ≠ X follows from the ZF regularity axiom, which implies that X ∉ X for sets X.
12.2.14 Definition: The successor set of a set X is the set X ∪ {X}.
12.2.15 Notation: X⁺, for any set X, denotes the successor set X ∪ {X} of X.
12.2.16 Remark: The successor set notation.
The notation X⁺ = X ∪ {X} for successor sets clashes negligibly with notations such as ℝ⁺ and ℤ⁺. Peano used the notation a⁺ for the abstract successor of a positive integer a in an 1891 work (Peano [377], page 20), but not in an earlier 1889 work (Peano [325]).
12.2.17 Theorem:
(i) ∀X ∈ ω, X⁺ ∈ ω.
(ii) ∀X ∈ ω⁺, X ⊆ ω.
(iii) ∀X ∈ ω, ω ⊈ X.
(iv) ∀N ∈ ω, ∀X ∈ ω⁺, (X ⊆ N ⇒ X ∈ ω).
(v) ∀X, Y ∈ ω, (X⁺ = Y⁺ ⇒ X = Y).
Proof: Part (i) follows directly from Theorem 12.1.34 (ii) and Notation 12.2.15.
For part (ii), let X ∈ ω⁺. If X = ∅, then clearly X ⊆ ω. So suppose that X ≠ ∅. Let m ∈ X. Then m is an extended finite ordinal number by Theorem 12.1.15 (ix). So m ∈ ω⁺ by Theorem 12.1.36 (i). But m ≠ ω because the containment ω ∈ X is impossible by the axiom of regularity. (See Theorem 7.8.2 (ii).) So m ∈ ω. Therefore ∀m ∈ X, m ∈ ω. In other words, X ⊆ ω.
For part (iii), let X ∈ ω. Suppose that ω ⊆ X. Then X = ω because X ⊆ ω by part (ii). So ω ∈ ω. But this is impossible by the axiom of regularity. (See Definition 7.2.4 (7).) Hence ω ⊈ X.
For part (iv), let N ∈ ω, X ∈ ω⁺ and X ⊆ N. Suppose that X = ω. Then ω ⊆ N, which is impossible by part (iii). Therefore X ≠ ω. Hence X ∈ ω.
Part (v) follows from Theorem 12.1.28 (iii).
12.2.18 Remark: Notation for the set of extended finite ordinal numbers.
Using Notation 12.2.15, the set of extended finite ordinal numbers may be written as ω⁺ = ω ∪ {ω}. This notation can be useful to refer to sequences which have either a finite or infinite domain. The domain of a finite sequence is an element of ω, whereas the domain of an infinite sequence is the set ω itself.
12.2.21 Definition: A subsequence of a sequence (X_i)_{i∈I} is a sequence (Y_j)_{j∈J} such that Y = X ∘ φ for some increasing function φ: J → I.
A finite subsequence of a sequence (X_i)_{i∈I} is a subsequence (Y_j)_{j∈J} of (X_i)_{i∈I} such that J ∈ ω.
An infinite subsequence of a sequence (X_i)_{i∈I} is a subsequence (Y_j)_{j∈J} of (X_i)_{i∈I} such that J = ω.
Proof: For part (i), let (Y_j)_{j∈J} be a subsequence of an injective sequence (X_i)_{i∈I} with Y = X ∘ φ₁ and Y = X ∘ φ₂, where φ₁: J → I and φ₂: J → I are increasing functions. Then X(φ₁(j)) = X(φ₂(j)) for all j ∈ J. So φ₁(j) = φ₂(j) for all j ∈ J because X is injective. Hence φ₁ = φ₂.
For part (ii), let X = (X_i)_{i∈I} be a sequence for some I ∈ ω⁺. Let Y = (Y_j)_{j∈J} be a subsequence of X for some J ∈ ω⁺. Then there is an increasing function φ₁: J → I such that Y = X ∘ φ₁. Let Z = (Z_k)_{k∈K} be a subsequence of Y for some K ∈ ω⁺. Then there is an increasing function φ₂: K → J such that Z = Y ∘ φ₂. Let φ = φ₁ ∘ φ₂. Then Z = X ∘ φ, where φ: K → I is an increasing function. Hence Z is a subsequence of X.
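The composition argument in part (ii) can be checked concretely. In the following sketch (illustrative only; the list-based encoding of the increasing index maps is not from the book), a subsequence is selected by a list of increasing indices, and composing two selections yields a selection from the original sequence:

```python
# Transitivity of the subsequence relation (cf. part (ii) above):
# composing two increasing index maps gives an increasing index map.
# Illustrative sketch only; the names are not from the book.

x = ['a', 'b', 'c', 'd', 'e', 'f']      # the original sequence X
phi1 = [0, 2, 3, 5]                     # increasing map J -> I, so Y = X o phi1
y = [x[i] for i in phi1]                # Y = ['a', 'c', 'd', 'f']
phi2 = [1, 3]                           # increasing map K -> J, so Z = Y o phi2
z = [y[j] for j in phi2]                # Z = ['c', 'f']

phi = [phi1[j] for j in phi2]           # phi = phi1 o phi2 = [2, 5]
assert z == [x[i] for i in phi]         # Z = X o phi: Z is a subsequence of X
assert all(phi[k] < phi[k + 1] for k in range(len(phi) - 1))  # phi is increasing
```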
because a surprisingly large proportion of intuitively appealing cardinality assertions regarding countable
sets do require some kind of axiom of choice. The simple construction idea in the proof of Theorem 12.3.2
is illustrated in Figure 12.3.1.
(Figure 12.3.1. The increasing enumeration f: X → Y of a subset Y of ω.)
where Y_k denotes the subset Y \ Range(g_k) = Y \ {g_k(i); i ∈ k} of ω for all k ∈ ω. In other words, each g_{k+1} extends g_k by adding the ordered pair (k, ⋂Y_k), where the minimum element ⋂Y_k of the non-empty subset Y_k of ω is a well-defined element of ω because ω is well ordered. Since (g_k)_{k∈ω} is a non-decreasing sequence of subsets of ω × Y, it follows that (Y_k)_{k∈ω} is a non-increasing sequence of subsets of ω.
Let X = {k ∈ ω; Y_k ≠ ∅}. Then X ⊆ ω. Suppose that X ≠ ω. Then ω \ X ≠ ∅, and so N = ⋂(ω \ X) is a well-defined element of ω because ω is well ordered. But ω \ X = {k ∈ ω; Y_k = ∅}. So Y_k ≠ ∅ for all k ∈ N, and Y_k = ∅ for all k ∈ ω \ N because (Y_k)_{k∈ω} is non-increasing. Therefore X = N if X ≠ ω. Thus X ∈ ω⁺.
Since g₀ = ∅, g₀ is a function with domain ∅ = 0. For k ∈ X, suppose that g_k is a function whose domain is k. Then g_{k+1} = g_k ∪ {(k, ⋂Y_k)} is a function with domain k + 1 = k ∪ {k} because Y_k ≠ ∅. Therefore by induction, g_{k+1} is a function with domain k + 1 for all k ∈ X.
Let f = ⋃_{k∈X} g_{k+1}. If X ≠ ω, then X ∈ ω and so f = g_X because (g_k)_{k∈ω} is a non-decreasing sequence of subsets of ω × Y and X = ⋃{i + 1; i ∈ X} for all X ∈ ω (since each ordinal number is equal to 1 more than the largest of its elements). So Dom(f) = X. Then Range(f) = Y because Range(g_k) = Y \ Y_k for all k ∈ ω, and so Range(g_X) = Y \ Y_X = Y because X ∈ ω \ X since X ∉ X. But f is an injection because f(k) = ⋂Y_k = ⋂(Y \ Range(g_k)) = ⋂(Y \ Range(f|k)) for all k ∈ X, and so f(k) ∉ Range(f|k). Therefore f: X → Y is a bijection.
If X = ω, then f = ⋃_{k∈X} g_{k+1} is a function with Dom(f) = ω because f contains one and only one ordered pair for each k ∈ ω, and f is an injection because f(k) = ⋂(Y \ Range(f|k)) for all k ∈ X, and so f(k) ∉ Range(f|k). To show that f: X → Y is a surjection, let z = ⋂(Y \ Range(f)), the smallest element of Y which is not in the range of f, and let y = ⋃{j ∈ Range(f); j < z}, the largest element of Range(f) which is less than z. (Note that z = ⋂Y in the exceptional case where {j ∈ Range(f); j < z} = ∅, and so f(0) = z, which is a contradiction. So this case can be ignored.) Let k = f⁻¹(y). Then f(k + 1) = ⋂(Y \ {f(i); i ≤ k}) by the definition of f. But z ∈ Y \ {f(i); i ≤ k} because Y \ {f(i); i ≤ k} ⊇ Y \ Range(f) and z ∈ Y \ Range(f).
Note that f is a strictly increasing function because (Y_k)_{k∈ω} is a strictly decreasing sequence of sets (with respect to the set-inclusion partial order on ℙ(ω)).
Suppose that z > f(i) for some i ∈ ω with i > k. Then f(i) would be an element of {j ∈ Range(f); j < z} with f(k) < f(i), which would contradict the definition of k. Therefore z ≤ f(i) for all i ∈ ω with i > k. So z = ⋂(Y \ Range(f)) = ⋂((Y \ {f(i); i ≤ k}) ∩ (Y \ {f(i); i > k})) = ⋂(Y \ {f(i); i ≤ k}). Therefore f(k + 1) = z, which is a contradiction. So Range(f) = Y. Hence f is a bijection.
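For a finite subset Y of ω, the construction in the proof of Theorem 12.3.2 can be imitated directly: f(k) is the minimum of what remains of Y after the earlier values are removed. A sketch (illustrative only; the function name is not from the book):

```python
# Imitation of the proof's construction: f(k) = min(Y \ {f(i); i < k}).
# Illustrative sketch only, for a finite subset Y of the natural numbers.

def standard_enumeration(Y):
    """Return the increasing bijection f: X -> Y as the list [f(0), f(1), ...]."""
    remaining = set(Y)            # Y_k = Y \ Range(g_k), shrinking at each step
    f = []
    while remaining:
        f.append(min(remaining))  # adjoin the pair (k, min Y_k)
        remaining.remove(f[-1])
    return f

Y = {7, 2, 19, 4}
f = standard_enumeration(Y)
assert f == [2, 4, 7, 19]                  # strictly increasing
assert set(f) == Y and len(f) == len(Y)    # a bijection from X = {0,1,2,3} onto Y
```

The well-ordering of ω is what makes `min` total here, so no choice principle is needed.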
induction.
12.3.7 Remark: Equinumerosity of sets of finite ordinal numbers with elements of ω⁺.
In the terminology of equinumerosity in Section 13.1, Theorem 12.3.8 means that every set of finite ordinal numbers is equinumerous to a finite ordinal number or to the set ω of all finite ordinal numbers. In other words, every subset of ω is equinumerous to some element of ω⁺.
Similarly, Theorem 12.3.10 means that every subset of a finite ordinal number N is equinumerous to an element of N or to N itself.
12.3.8 Theorem: Equinumerosity of subsets of ω to elements of ω⁺.
∀Y ∈ ℙ(ω), ∃X ∈ ω⁺, ∃f: X → Y, f is a bijection.
Proof: The assertion follows immediately from Theorem 12.3.2.
12.3.9 Remark: Enumeration of an arbitrary subset of a finite ordinal number.
Theorem 12.3.10 is a finite-set analogue of Theorem 12.3.8 which, for example, is useful for showing that all
finite sets are Dedekind finite in Section 13.8.
12.3.10 Theorem: Equinumerosity of subsets of N to elements of N⁺.
∀N ∈ ω, ∀Y ∈ ℙ(N), ∃X ∈ N⁺, ∃f: X → Y, f is a bijection.
Proof: Let N ∈ ω and Y ∈ ℙ(N). By Theorem 12.3.2, there is a unique increasing function f: X → Y with X ∈ ω⁺. But i ≤ f(i) for all i ∈ X by Theorem 12.3.6 (ii). So X ⊆ N. Hence X ∈ N⁺.
12.3.11 Theorem: Let f: X → Y be the standard enumeration for a set Y ∈ ℙ(N) for some N ∈ ω.
(i) X ≤ N.
(ii) If Y ≠ N, then X < N.
Proof: For part (i), let Y ∈ ℙ(N) for some N ∈ ω. Then i ≤ f(i) for all i ∈ X by Theorem 12.3.6 (ii). So X ≤ N.
For part (ii), let Y ≠ N for some N ∈ ω. Then X ≤ N by Theorem 12.3.10. Suppose that X = N. Let y ∈ N \ Y. Define a bijection g: N \ {y} → N − 1 by
∀i ∈ N \ {y},  g(i) = i if i < y,  g(i) = i − 1 if i > y.
Then Y ⊆ Dom(g) and g is increasing on Y. But f: N → Y is increasing. So g ∘ f: N → N − 1 is increasing. So X ≤ N − 1 by part (i), which contradicts the assumption X = N. Hence X < N.
12.3.12 Remark: Injective and surjective finite set endomorphisms must be bijections.
In the proof of Theorem 12.3.13 (i), the strategy follows the typical inductive pattern, namely to state first
that if something goes wrong, then there must be a least integer for which it goes wrong, and then to
show that there is an even smaller integer where it goes wrong, which is a contradiction.
A possibly interesting aspect of the proof of Theorem 12.3.13 (ii) is the constructive choice of a right inverse
of a surjective function. As mentioned in Remark 10.5.13, the general existence of right inverses of surjections
in Theorem 10.5.14 requires the axiom of choice, but in this case, an explicit right inverse may be constructed
because every finite set has an explicit well-ordering.
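The explicit right inverse mentioned above can be written out for a small example. In this sketch (illustrative only; the particular surjection is an arbitrary assumption), ψ picks the minimum preimage of each point, which is well defined precisely because the finite domain carries an explicit well-ordering:

```python
# Explicit right inverse of a surjection between finite ordinals, without
# the axiom of choice: psi(j) = min{i; phi(i) = j}. Illustrative sketch only;
# the particular surjection phi is an arbitrary example.

phi = [0, 1, 0, 2, 1]      # a surjection from 5 = {0,1,2,3,4} onto 3 = {0,1,2}

def psi(j):
    """The minimum element of the preimage of j: an explicit, choice-free pick."""
    return min(i for i in range(len(phi)) if phi[i] == j)

assert all(phi[psi(j)] == j for j in range(3))   # phi o psi = id: psi is a right inverse
assert psi(0) == 0 and psi(1) == 1 and psi(2) == 3
```

No appeal to Theorem 10.5.14 (and hence to the axiom of choice) is needed, because `min` selects one preimage uniformly.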
12.3.13 Theorem:
(i) Let φ: N → N be an injection for some N ∈ ω. Then φ is a bijection.
(ii) Let φ: N → N be a surjection for some N ∈ ω. Then φ is a bijection.
(iii) There is no injection from ω to a finite ordinal number.
Proof: For part (i), suppose that for some N ∈ ω, there exists an injection φ: N → N which is not a bijection. Let N₀ be the minimum N ∈ ω such that there exists an injection φ: N → N which is not a bijection. (N₀ is well defined because ω is well ordered.) Let φ₀: N₀ → N₀ be an injection which is not a bijection. Then y ∈ N₀ \ Range(φ₀) for some y. Define g: N₀ \ {y} → N₀ − 1 by
∀i ∈ N₀ \ {y},  g(i) = i if i < y,  g(i) = i − 1 if i > y.
Then g is an injection from Range(φ₀) to N₀ − 1. Define φ₁: N₀ − 1 → N₀ − 1 to be the restriction of g ∘ φ₀ to N₀ − 1. Then φ₁ is an injection. But φ₁ is not a bijection because g(φ₀(N₀ − 1)) ∉ Range(φ₁). This contradicts the hypothesis that N₀ is the minimum finite ordinal number with the specified property. Hence all injections φ: N → N for N ∈ ω are bijections. (See Figure 12.3.2.)
(Figure 12.3.2. Construction of an injection φ₁: N₀ − 1 → N₀ − 1 which is not a bijection from the injection φ₀: N₀ → N₀, where y ∈ N₀ \ Range(φ₀) and z = g(φ₀(N₀ − 1)) ∉ Range(φ₁).)
For part (ii), suppose that for some N ∈ ω, there exists a surjection φ: N → N. Define ψ: N → N by ψ(j) = min{i ∈ N; φ(i) = j} for all j ∈ N. Then ψ: N → N is a well-defined function because φ is a surjection, and ψ: N → N is an injection because φ is a well-defined function. Therefore ψ: N → N is a bijection by part (i). Hence φ: N → N is a bijection.
For part (iii), let f: ω → N be an injection for some N ∈ ω. Let g = f|N. Then g: N → N is an injection. Therefore g is a bijection from N to N. Let x ∈ ω \ N. Then f(x) ∈ N. Let y = g⁻¹(f(x)). Then f(y) = f(x) because g is a bijection. But y ∈ N because Dom(g) = N. So x ≠ y. Therefore f is not injective, which is a contradiction. Hence there is no injection from ω to N.
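Parts (i) and (ii) of Theorem 12.3.13 can also be verified exhaustively for a small finite ordinal by brute force over all functions; a sketch (illustrative only, not from the book):

```python
# Exhaustive finite check of Theorem 12.3.13 (i) and (ii): for functions
# from a finite ordinal N to itself, injective and surjective coincide.
# Illustrative sketch only; N = 4 gives 4^4 = 256 functions to test.
from itertools import product

N = 4
for f in product(range(N), repeat=N):    # f as a tuple: f[i] is the image of i
    injective = len(set(f)) == N         # no two arguments share a value
    surjective = set(f) == set(range(N)) # every value is attained
    assert injective == surjective       # each property implies the other
```

Of course this is only a finite spot check, not a proof; the theorem's induction is what covers every N ∈ ω.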
12.3.14 Remark: The first-instance subsequence of a sequence is injective and has the same range.
It is often useful to be able to replace a sequence with an injective subsequence which has the same range.
This is easily achieved by selecting the first occurrence of each value in the sequence. This construction is
shown in Theorem 12.3.15 to have the desired properties. It is given a name in Definition 12.3.16.
12.3.15 Theorem: Let (x_i)_{i∈I} be a sequence for some I ∈ ω⁺. Let S = {i ∈ I; ∀j ∈ i, x_j ≠ x_i}. Let φ: J → S be the standard enumeration for S. Then x ∘ φ is an injective sequence which is a subsequence of x, and Range(x ∘ φ) = Range(x).
Proof: Let (x_i)_{i∈I} be a sequence for some I ∈ ω⁺. Let S = {i ∈ I; ∀j ∈ i, x_j ≠ x_i}. Then S ⊆ ω because I ⊆ ω. Let Y = Range(x) and Z = {x_i; i ∈ S} = Range(x|S). Clearly Z ⊆ Y. Let y ∈ Y. Then y = x_i for some i ∈ I. Let j = min({k ∈ I; x_k = x_i}). Then j is well defined because {k ∈ I; x_k = x_i} ≠ ∅. Clearly x_j = x_i. But x_k ≠ x_i for all k ∈ j by the definition of j. So j ∈ S. Therefore x_j ∈ Z. So x_i ∈ Z. Thus Y ⊆ Z, and so Y = Z.
Let φ be the standard enumeration for S. (See Definition 12.3.5.) Then φ: J → S is an increasing bijection for some J ∈ ω⁺. Therefore x ∘ φ = (x_{φ(j)})_{j∈J} is a sequence by Definition 12.2.19, which is a subsequence of x by Definition 12.2.21. Suppose that j₁, j₂ ∈ J with j₁ ≠ j₂. Then φ(j₁), φ(j₂) ∈ S, and φ(j₁) ≠ φ(j₂) because φ is increasing. Suppose that φ(j₁) < φ(j₂). Then ∀j ∈ φ(j₂), x_j ≠ x_{φ(j₂)} by the definition of S because φ(j₂) ∈ S. So x_{φ(j₁)} ≠ x_{φ(j₂)}. Similarly, φ(j₁) > φ(j₂) implies x_{φ(j₁)} ≠ x_{φ(j₂)}. Therefore x ∘ φ is injective. The equality Range(x ∘ φ) = Range(x) follows from the equality Range(x ∘ φ) = Range(x|Range(φ)) = Range(x|S) = Range(x).
12.3.16 Definition: The first-instance subsequence of a sequence (x_i)_{i∈I} is the subsequence x ∘ φ = (x_{φ(j)})_{j∈J}, where φ: J → S is the standard enumeration of S = {i ∈ I; ∀j ∈ i, x_j ≠ x_i}.
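The first-instance subsequence can be computed directly: an index i survives exactly when no earlier term has the same value. A sketch (illustrative only; the sample sequence is an arbitrary assumption):

```python
# The first-instance subsequence of Definition 12.3.16, computed directly:
# keep index i only if no earlier term equals x_i. Illustrative sketch only.

x = [5, 3, 5, 8, 3, 5, 9]
S = [i for i in range(len(x)) if all(x[j] != x[i] for j in range(i))]
first_instance = [x[i] for i in S]    # x composed with the enumeration of S

assert S == [0, 1, 3, 6]
assert first_instance == [5, 3, 8, 9]             # injective
assert set(first_instance) == set(x)              # same range as x
```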
12.3.17 Remark: Equinumerosity of finite ordinal numbers implies equality.
In the terminology of Section 13.1, the existence of a bijection between finite ordinal numbers N1 and N2 in
Theorem 12.3.18 (ii) means that they are equinumerous. Thus Theorem 12.3.18 (ii) implies that two finite
ordinal numbers are equinumerous if and only if they are equal. (Note that part (ii) of Theorem 12.3.18 can
also be proved by a double application of part (i).)
12.3.18 Theorem:
(i) Let N₁, N₂ ∈ ω. Let φ: N₁ → N₂ be an injection. Then N₁ ⊆ N₂. Hence N₁ ≤ N₂.
(ii) Let N₁, N₂ ∈ ω. Let φ: N₁ → N₂ be a bijection. Then N₁ = N₂.
Proof: For part (i), let N₁, N₂ ∈ ω. Then N₁ ⊆ N₂ or N₂ ⊆ N₁ by Theorem 12.1.21 (xi). So N₁ ⊆ N₂ or N₂ ⊊ N₁. Let φ: N₁ → N₂ be an injection and suppose that N₂ ⊊ N₁. Define φ′ = φ|N₂. Then φ′: N₂ → N₂ is an injection. So φ′: N₂ → N₂ is a bijection by Theorem 12.3.13 (i), and so Range(φ′) = N₂.
12.4.2 Remark: The usefulness of general ordinal numbers as yardsticks for cardinality.
When the cardinality of a set must be measured by comparison with a specific yardstick (or metre rule)
set, so as to avoid using abstract non-set cardinal numbers, it is the general ordinal numbers which are
usually used. However, this only works for all sets if all sets are well-ordered, which they are not in practice,
although choice-axiom-believers claim that well-orderings do exist in some inaccessible universe. Hence in
practice, ordinal numbers are fairly useless as cardinality yardsticks except for finite and countably infinite sets. Even the set ℝ of real numbers has no demonstrable injection f: ℝ → α for a general ordinal number α, because the construction of such a map would induce a well-ordering on the real numbers, which is generally
agreed to be impossible to construct. (Anyone who could achieve such a construction explicitly would deserve
a Nobel prize for literature because it would be an outstanding work of fiction!) A map which can only be
explicitly constructed for sets which can be explicitly well-ordered is of limited practical value. Even if the
axiom of choice is adopted, it still only delivers a vaguely defined cardinal number ω₁, which is supposed to be the smallest ordinal number which has higher cardinality than ω.
It is very much more natural and practical to use beta sets as cardinality yardsticks for infinite sets. This
approach will be adopted here. (See Section 13.2 for beta-cardinality.) Therefore general ordinal numbers
are not directly required in their role as cardinality yardsticks. General ordinal numbers are useful, however,
for building ZF set theory models such as the von Neumann universe and the constructible universe. In
conjunction with the construction of such ZF models, one easily defines also the beta-sets. Each beta-set B_α has a subscript α which is a general ordinal number. So general ordinal numbers are required for defining general cardinality, but only as subscripts for beta-sets and as the ranks of stages in the construction of the von Neumann universe. On the other hand, one rarely requires ranks above ω + 3. But it is nice to be able
to say that the cardinality of any ZF set can be represented by a particular yardstick-set of some kind, no
matter how absurdly big the sets-to-be-measured are.
12.4.3 Remark: A personal view of the importance (or otherwise) of transfinite ordinal numbers.
It is this author's personal view that the transfinite ordinal numbers are one of the monumental dead-ends
of mathematics. There are two issues to be addressed here. First, whether or not general transfinite ordinal
numbers have any really useful role to play in mathematics. And second, how did it happen that such a
huge amount of intellectual energy and heavy machinery has been brought to bear on a cul-de-sac for over
a hundred years?
The importance of the finite ordinal numbers is unquestionable. They serve as yardsticks for both order and cardinality. The set ω of finite ordinal numbers is unquestionably important. But it should be asked whether we need any more than these. The ordinal numbers beyond ω are worse than useless as cardinality yardsticks. The cardinal number for the set of real numbers is an ordinal number ω₁ somewhere beyond ω₀. But even the existence of this mysterious ordinal number is only demonstrable with the aid of the axiom of choice. Its existence requires faith, and what you actually get is unknown even if you do have faith!
The role of transfinite ordinal numbers as order-yardsticks is highly questionable because the existence of well-orderings on sets is generally proved only by invoking the axiom of choice, which delivers only an IOU instead of a real rabbit out of a hat. The importance of the ordinal numbers in the time of Cantor, Hilbert and von Neumann depended on the value of well-orderings on all sets, which clearly cannot be constructed in general. (If they could be constructed, you wouldn't need the axiom of choice!)
Since virtually all of real-world mathematics (and most fanciful mathematics) can be very easily and comfortably represented within the von Neumann partial universe V_{ω+ω}, there seems no need to go beyond this. For cardinalities, the existence of sets with cardinality between the beth numbers cannot be demonstrated, since this has been shown to be model-dependent. Therefore the beta-sets B_α defined recursively by B₀ = ω and B_{α+1} = ℙ(B_α) (and so forth) are adequate cardinality yardsticks for all sets. (See Section 13.2 for
beta-sets and beta-cardinality.) One may remove the fear of not being able to find an exact match amongst
these sets by invoking the generalised continuum hypothesis, which is yet another rabbit out of a hat.
(This particular magic trick guarantees the non-existence of some kinds of sets, whereas the axiom of choice
guarantees the existence of particular kinds of sets!) However, such fear is not important. It would be a
truly important discovery if one could in fact find such mediate cardinals between the cardinalities of
the beta-sets. Knowing the smallest beta-set which dominates a given set provides more than adequate
measurement of the size of sets.
The remaining question is why so much intellect has been expended in the last 120 years on transfinite ordinal
numbers which bring no obvious benefits but cause so much obvious trouble. Probably the endorsement
of "Cantor's paradise" by Hilbert was an important factor influencing the investment of research effort. (See Hilbert [370], page 170: "Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können." Literally: "Out of this Paradise, that Cantor created for us, no one must be able to expel us.")
Von Neumanns work on the ordinals undoubtedly added to their prestige as a mountain to be climbed.
Understandably, real mathematicians have largely lost interest in the foundations of mathematics since the
end of the 1960s, when the most important questions seemed to have been resolved. It is not surprising at
all that set theory and model theory split off into a separate domain of mathematics with limited relevance
for the mainstream. What is regrettable, however, is that the teaching of set theory continues to include
substantial coverage of arcane aspects of transfinite ordinal numbers, and that the axiom of choice is taught
as an article of unquestioned mainstream belief. In practice, the axiom of choice is false because no one can
construct an explicit well-ordering of the set of real numbers (including all non-constructible real numbers).
Conclusion: The von Neumann partial universe V_{ω+ω} is adequate for almost all mathematics. (It doesn't contain ω+ω as an element though!) The generalised continuum hypothesis should be ignored as an unnecessary distraction. And the axiom of choice should be resisted wherever and whenever it arises. Mathematics can live without it. No serious theorems will be lost. And the beta-sets ω, 2^ω, 2^(2^ω), . . . should be adopted as transfinite cardinality yardsticks.
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
Existence and uniqueness for the general transfinite recursive construction of hierarchies of sets was proved
by von Neumann in two papers in 1928 for Zermelo-Fraenkel set theory (von Neumann [385]), and for von Neumann's own set theory (von Neumann [386]), which later developed into NBG set theory. In neither
of these papers did he apply his transfinite recursive method to construct a universe of all sets.
Presentations of the von Neumann universe by Bernays [292], pages 203–209, and E. Mendelson [320],
page 202, give credit to von Neumann for the transfinite induction construction method which they use,
but not for its particular application to the construction of a universe of sets. Roitman [335], page 79, states (without references) that: "The realization that regularity is equivalent to V = ∪_{α∈ON} V_α is due to von Neumann."
The standard notation for a universe of sets is V , which apparently originates with the early use in set theory
of the symbol Λ (an upside-down V) for the empty set, and ⋁ or V for the universe of sets. (The former is the intersection of all sets, while the latter is the union of all sets. See for example Peano [325], page XI; Whitehead/Russell, Volume I [350], page 229; Carnap [296], pages 125–126; Bernays [292], pages 33, 58.) It
is possible that this use of the letter V could have been taken as a hint by some mathematicians that the
universe was being attributed to von Neumann.
V4 = 𝒫(V3)
   = { 0, 1, {1}, 2, {{1}}, {0, {1}}, {1, {1}}, {0, 1, {1}},
       {2}, {0, 2}, {1, 2}, 3, {{1}, 2}, {0, {1}, 2}, {1, {1}, 2}, {0, 1, {1}, 2} }.
Thus V4 contains 16 = 2^4 elements. It is even more unpleasant when written out in full.
V4 = { ∅, {∅}, {{∅}}, {∅, {∅}},
       {{{∅}}}, {∅, {{∅}}}, {{∅}, {{∅}}}, {∅, {∅}, {{∅}}},
       {{∅, {∅}}}, {∅, {∅, {∅}}}, {{∅}, {∅, {∅}}}, {∅, {∅}, {∅, {∅}}},
       {{{∅}}, {∅, {∅}}}, {∅, {{∅}}, {∅, {∅}}}, {{∅}, {{∅}}, {∅, {∅}}}, {∅, {∅}, {{∅}}, {∅, {∅}}} }.
The first 5 mini-universes are illustrated in Figure 12.5.1 using the same visualisation conventions as in
Figure 12.2.1. (See Remark 12.2.2.)
Figure 12.5.1 The first five mini-universes V0, V1, V2, V3, V4
The mini-universe V5 contains 2^16 = 65536 elements, which is too many to usefully list here. The first few elements of V5 are as follows (in lexicographic order).
V5 = 𝒫(V4)
   = { 0, 1, {1}, 2, {{1}}, {0, {1}}, {1, {1}}, {0, 1, {1}},
       {2}, {0, 2}, {1, 2}, 3, {{1}, 2}, {0, {1}, 2}, {1, {1}, 2}, {0, 1, {1}, 2},
{{{1}}}, {0, {{1}}}, {1, {{1}}}, {0, 1, {{1}}},
{{1}, {{1}}}, {0, {1}, {{1}}}, {1, {1}, {{1}}}, {0, 1, {1}, {{1}}},
{2, {{1}}}, {0, 2, {{1}}}, {1, 2, {{1}}}, {0, 1, 2, {{1}}},
{{1}, 2, {{1}}}, {0, {1}, 2, {{1}}}, {1, {1}, 2, {{1}}}, {0, 1, {1}, 2, {{1}}},
{{0, {1}}}, {0, {0, {1}}}, {1, {0, {1}}}, {0, 1, {0, {1}}},
{{1}, {0, {1}}}, {0, {1}, {0, {1}}}, {1, {1}, {0, {1}}}, {0, 1, {1}, {0, {1}}},
{2, {0, {1}}}, . . . .
The universe stage V6 contains 2^65536 ≈ 2.0035 × 10^19728 sets. This is very significantly greater than the number of electrons in the visible physical universe. The set Vk expands astonishingly quickly as k increases. This shows how deceptive the word "finite" is. It is clearly not possible to represent the arbitrary choice of a single element of V7 using two spin states of every electron in the known Universe. So V10 is totally
beyond description. This gives some idea of the strength of the power-set axiom. In a very small number of
stages, the bounds of human comprehension are exceeded, long before the infinity axiom is applied.
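The growth of the finite stages can be checked directly. The following Python sketch (the names `powerset` and `stages` are our own, not the book's) computes V0 through V5 as nested frozensets and confirms the element counts 0, 1, 2, 4, 16 and 65536 quoted above.

```python
# Finite stages of the von Neumann hierarchy: V_0 = {} and V_{k+1} = P(V_k).
# Illustrative sketch only; 'powerset' and 'stages' are our own names.
from itertools import combinations

def powerset(s):
    """Return the set of all subsets of the frozenset s, as frozensets."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )

def stages(n):
    """Return the list [V_0, V_1, ..., V_n]."""
    V = [frozenset()]                 # V_0 is the empty set
    for k in range(n):
        V.append(powerset(V[k]))      # V_{k+1} = P(V_k)
    return V

V = stages(5)
print([len(stage) for stage in V])    # [0, 1, 2, 4, 16, 65536]
```

Computing V6 this way is already hopeless, exactly as the text observes: it would require 2^65536 frozensets.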
The finite universe stages Vk for k ∈ ω satisfy k ⊆ Vk and k ∈ V_{k+1} \ Vk. Therefore Vk is the smallest universe stage which includes k, and V_{k+1} is the smallest stage which contains k.
For all k ∈ ω \ {0}, the difference set Vk \ V_{k−1} consists of all set-membership trees which have depth exactly equal to k. It is straightforward to arrange all elements of V_ω = ∪_{k∈ω} Vk in a lexicographic order so that all elements of Vk come before all elements of V_{k+1} \ Vk, for all k ∈ ω.
12.5.3 Remark: Some slightly infinite stages of the von Neumann universe
It is intuitively clear that all possible finite sets which can be built from the empty set will be constructed in stage Vk for some k ∈ ω, as discussed in Remark 12.5.2. Therefore the union V_ω = ∪_{k∈ω} Vk of all such finite universes is the set of all possible finite ZF sets which can be constructed from the empty set. In fact, it follows from the ZF regularity axiom that no other finite sets are possible. In particular, ω ⊆ V_ω. So V_ω is an infinite set. It is clearly possible to write an explicit bijection f : ω → V_ω by continually extending bijections fk : nk → Vk, where nk ∈ ω has the same cardinality as Vk. This is not difficult because Vk ⊆ V_{k+1} for all k ∈ ω. Hence V_ω is countable.
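A minimal sketch of this stage-by-stage extension process in Python (all names are ours): each fk simply lists the elements of Vk, and f_{k+1} extends fk by appending the new elements of V_{k+1} \ Vk, so that every hereditarily finite set eventually receives exactly one index.

```python
# Enumerate V_omega by extending bijections f_k : n_k -> V_k stage by stage.
from itertools import combinations

def powerset(s):
    items = list(s)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))

def enumerate_stages(n):
    """List the elements of V_n in order of first appearance, i.e. a prefix
    of an enumeration of V_omega."""
    V, listing, seen = frozenset(), [], set()
    for _ in range(n + 1):
        for x in sorted(V - seen, key=repr):   # deterministic order within a stage
            listing.append(x)
            seen.add(x)
        V = powerset(V)                        # move on to the next stage
    return listing

f4 = enumerate_stages(4)
print(len(f4))   # 16, the cardinality of V_4
```

Because each listing is a prefix of the next, the union of these finite bijections is a single bijection from ω onto V_ω, as claimed.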
The set V_ω is a model for ZF set theory without the axiom of infinity. This is stated in simple terms as
follows by Cohen [300], page 54.
The first really interesting axiom [of ZF set theory] is the Axiom of Infinity. If we drop it, then we can take as a model for ZF the set M of all finite sets which can be built up from ∅. [...] It is clear that M will be a model for the other axioms, since none of these lead out of the class of finite sets.
In fact, V_λ is a model for Zermelo set theory without the axiom of infinity for any limit ordinal λ. (See Smullyan/Fitting [342], page 96, for a proof of this. They call any universe which satisfies the Zermelo set theory axioms apart from the infinity axiom a "basic universe".)
The cardinalities of n and Vn are the same for n = 0, n = 1 and n = 2, but n is smaller than Vn for n ∈ ω with n ≥ 3. However, the cardinalities of ω and V_ω are the same, whereas n is smaller than Vn for all ordinal numbers n > ω.
The set V_{ω+1} = 𝒫(V_ω) breaks through into a universe where infinite sets are possible. From ω ⊆ V_ω, it follows that ω + 1 = ω ∪ {ω} ⊆ 𝒫(V_ω) = V_{ω+1}. The inclusion 𝒫(ω) ⊆ V_{ω+1} implies that V_{ω+1} is uncountable (by Cantor's diagonalisation procedure). It is well known that 𝒫(ω) is equinumerous to the set ℝ of real numbers. So V_{ω+1} is an effective cardinality yardstick for ℝ. It is convenient to define B_0 = ω, and B_{α+1} = 𝒫(B_α) for all ordinal numbers α, and B_λ = ∪_{α<λ} B_α for limit ordinal numbers λ. (See Section 13.2 for beta-sets and beta-cardinality.) Then in particular, B_α is equinumerous to V_{ω+α} for all finite ordinal numbers α.
The set V_{ω·2} is a model for Zermelo set theory. (See Remark 7.11.15 for Zermelo set theory. See Smullyan/Fitting [342], page 96, for a proof that V_{ω·2} is a model for Zermelo set theory. They show furthermore that V_λ is a Zermelo universe for any limit ordinal λ with λ ≥ ω·2.) The fact that V_{ω·2} is not a model for Zermelo-Fraenkel set theory is clear from the fact that ω·2 ∉ V_{ω·2}, although ω + i ∈ V_{ω·2} for all i ∈ ω, and ω·2 = ω + ω = ∪_{i∈ω}(ω + i). The ZF replacement axiom implies that ω·2 is a ZF set by considering the replacement map i ↦ ω + i for i ∈ ω.
12.5.4 Remark: The relation between ordinal numbers and von Neumann universe stages.
The cumulative hierarchies of ordinal numbers and sets are generated by the same transfinite recursion
procedure. The parallel development of these two hierarchies is illustrated in Table 12.5.2. (For brevity,
some stages have been omitted.)
The stage number, or rank, of each stage in the cumulative hierarchy in Table 12.5.2 is the same as the
ordinal number which is constructed at each stage. Therefore the left column is superfluous. The ordinal
numbers may be considered to be the backbone of set theory because of the central role they play as the
rank parameter in this construction of the class of all sets.
At each stage of the hierarchy, non-limit ordinals α + 1 are constructed as α ∪ {α}, whereas set-universe stages V_{α+1} are constructed as 𝒫(V_α), but both the limit ordinal numbers λ and the von Neumann universe limit stages V_λ are constructed by the same method, namely as the union of all previous stages. Thus λ = ∪_{α<λ} α and V_λ = ∪_{α<λ} V_α for all limit ordinals λ.
The ad-hoc notation ϖ_i in Table 12.5.2 means the solution, for i ∈ ω, of the inductive rule ϖ_{i+1} = ω^{ϖ_i} with ϖ_0 = ω. (Note that ϖ is a variant of the Greek letter π, not the Greek letter ω. A possible mnemonic for the letter ϖ in this application is the word "power".)
The beta-cardinality numbers β_k in Table 12.5.2 are defined for all ordinal numbers k by β_k = #(V_{ω+k}) for ordinal numbers k < ω·2 and β_k = #(V_k) for ordinal numbers k ≥ ω·2. (See Section 13.2 for beta-cardinality.) The principal idea of beta-cardinality is that the cardinality of sets is measured by the ordinal numbers for finite and countably infinite sets. For uncountable infinite sets, the ordinal numbers are totally inadequate to the task of acting as cardinality yardsticks. Therefore beta-cardinality switches to the von Neumann set hierarchy as a source of cardinality yardsticks for all uncountable sets. Since the services of the axiom of choice, and therefore also of the generalised continuum hypothesis, are declined in this book, it is not possible to assert that all sets are equinumerous to some stage of the von Neumann hierarchy. However, the concrete existence of such sets with mediate cardinality has never been demonstrated in a constructive way, because if they could be explicitly constructed, they would have a well-defined cardinality
ω·2        ω·2 = ∪_{i∈ω}(ω + i)                     V_{ω·2} = ∪_{i∈ω}(V_{ω+i})                      β_{ω·2}
ω·3        ω·3 = ∪_{i∈ω}(ω·2 + i)                   V_{ω·3} = ∪_{i∈ω}(V_{ω·2+i})                    β_{ω·3}
...        ...                                       ...                                              ...
ω²         ω² = ∪_{i∈ω}(ω·i)                        V_{ω²} = ∪_{i∈ω}(V_{ω·i})                       β_{ω²}
ω³         ω³ = ∪_{i∈ω}(ω²·i)                       V_{ω³} = ∪_{i∈ω}(V_{ω²·i})                      β_{ω³}
...        ...                                       ...                                              ...
ω^ω        ω^ω = ∪_{i∈ω}(ω^i)                       V_{ω^ω} = ∪_{i∈ω}(V_{ω^i})                      β_{ω^ω}
(ω^ω)²     (ω^ω)² = ∪_{i∈ω}((ω^ω)·ω^i)              V_{(ω^ω)²} = ∪_{i∈ω}(V_{(ω^ω)·ω^i})             β_{(ω^ω)²}
(ω^ω)³     (ω^ω)³ = ∪_{i∈ω}((ω^ω)²·ω^i)             V_{(ω^ω)³} = ∪_{i∈ω}(V_{(ω^ω)²·ω^i})            β_{(ω^ω)³}
...        ...                                       ...                                              ...
ω^(ω²)     ω^(ω²) = (ω^ω)^ω = ∪_{i∈ω}((ω^ω)^i)      V_{ω^(ω²)} = V_{(ω^ω)^ω} = ∪_{i∈ω}(V_{(ω^ω)^i})  β_{ω^(ω²)}
ω^(ω^ω)    ω^(ω^ω) = ∪_{i∈ω}(ω^(ω^i)) = ϖ_2         V_{ω^(ω^ω)} = ∪_{i∈ω}(V_{ω^(ω^i)})              β_{ω^(ω^ω)}
...        ...                                       ...                                              ...
ε₀         ε₀ = ∪_{i∈ω} ϖ_i                         V_{ε₀} = ∪_{i∈ω} V_{ϖ_i}                        β_{ε₀}
ε₀+1       ε₀+1 = ε₀ ∪ {ε₀}                         V_{ε₀+1} = 𝒫(V_{ε₀})                            β_{ε₀+1}
...        ...                                       ...                                              ...
Table 12.5.2  The cumulative hierarchies of ordinal numbers and sets
representative amongst the von Neumann universe stages. (In fact, if the universe of sets is restricted to
constructible sets, both AC and GCH are true.) Therefore beta-cardinality is well defined for all constructible
sets, which is adequate for all practical purposes.
∀i ∈ ω,   ω + (i + 1) = (ω + i)⁺,        V_{ω+(i+1)} = 𝒫(V_{ω+i}),                    (12.5.1)
ω·2 = ω + ω = ∪_{i∈ω}(ω + i),            V_{ω·2} = V_{ω+ω} = ∪_{i∈ω} V_{ω+i}.          (12.5.2)
In ordinal arithmetic, the "plus" operator + signifies that the two well-ordered sets on the left and right of the operator, α and β respectively say, are to be concatenated, maintaining their order internally, and extending order to the concatenated set α + β by defining x < y for all x ∈ α and y ∈ β. According to this addition rule, the ordered set ω + 1 is isomorphic (as an ordered set) to the ordinal number ω⁺ = ω ∪ {ω}. (This is actually a theorem about ω + 1, not a definition for ω + 1.) Similarly, the ordered set α + 1 is order-isomorphic to the ordinal number α⁺ = α ∪ {α}. (Once again, this is a theorem about α + 1, not a definition for α + 1.) Thus strictly speaking, the equalities in line (12.5.1) are actually order-isomorphisms. However, it is customary to identify ordinal number concatenations with their order-isomorphic representatives in the class of ordinal numbers. (These representatives are always unique.) This addition operation is not commutative in general. (E.g. 1 + ω ≠ ω + 1.)
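The non-commutativity of this concatenation rule can be made concrete for ordinals below ω^ω by representing each ordinal in Cantor normal form as a finite map from exponents to coefficients. The Python sketch below is our own toy encoding, not the book's; it implements only the addition rule, under which the left operand's terms below the right operand's leading exponent are absorbed.

```python
# Toy ordinal addition below omega^omega.  An ordinal is a dict {exponent: coefficient}
# representing omega^e1 * c1 + ... in Cantor normal form (our own encoding).

def o_add(a, b):
    """Ordinal sum a + b: terms of a below the leading exponent of b are absorbed."""
    if not b:
        return dict(a)
    d = max(b)                                       # leading exponent of b
    r = {e: c for e, c in a.items() if e > d}        # high terms of a survive
    r[d] = a.get(d, 0) + b[d]                        # leading coefficients add
    r.update({e: c for e, c in b.items() if e < d})  # low terms come from b
    return r

ONE, OMEGA = {0: 1}, {1: 1}
print(o_add(ONE, OMEGA))   # {1: 1}         i.e. 1 + omega = omega
print(o_add(OMEGA, ONE))   # {1: 1, 0: 1}   i.e. omega + 1 > omega
```

The two printed results exhibit exactly the failure of commutativity mentioned in the text: the finite term is absorbed on the left but survives on the right.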
(4) For any i, j ∈ ω, the ordinal number (ω·j) + (i + 1) = ((ω·j) + i) ∪ {(ω·j) + i}, and the hierarchy stage V_{(ω·j)+(i+1)} = 𝒫(V_{(ω·j)+i}), are well defined as in step (3). Then ω·(j + 1) = ω·j + ω = ∪_{i∈ω}((ω·j) + i) and V_{ω·(j+1)} = ∪_{i∈ω} V_{(ω·j)+i} are also well defined as in step (3). Taking the limit of the stages ω·j for j ∈ ω yields ω² = ω·ω = ∪_{j∈ω}(ω·j) and V_{ω²} = V_{ω·ω} = ∪_{j∈ω} V_{ω·j}. This is summarised as follows.
The multiplication operator "·" in ordinal arithmetic signifies that if the well-ordered sets on the left and right of the operator are α and β respectively, then α·β is the concatenation of β copies of the set α. The order for this concatenation is inherited from within each copy of α, and x < y if x ∈ α_L and y ∈ α_R are elements of copies α_L and α_R of α, where α_L is strictly to the left of α_R. This multiplication operation is not commutative in general. (E.g. 2·ω ≠ ω·2.) The operator symbol "·" is often omitted to save space. Thus for example, ωj + i means (ω·j) + i.
(5) In the quadratic tranche of stages, one obtains quadratic polynomials in ω as follows.
These quadratic polynomials, which lead up to ω³, may be extended to cubic polynomials which lead up to ω⁴. The cubic polynomials in ω may be written out as follows. (To save space, addition of 1 has been abbreviated to the superscript +. Thus i⁺ means (i + 1), and so forth.)
hinted at in part (5). The order of the summation must be from the highest-degree monomials down to the lowest, as indicated. The multipliers i_m are all elements of ω, and k ranges over the elements of ω which satisfy k ≤ m, where m ∈ ω is the degree of this polynomial. All of the stages up to and including the stage ω^ω may now be repeated with an additional term ω^ω on the left. In other words, one forms the polynomials ω^ω + ∑_{k=0}^{m} ω^k i_k for all coefficient tuples (i_m, . . . , i_0), for all m ∈ ω. The union of all of these stages is then ω^ω·2 = ω^ω + ω^ω. One may repeat this procedure any finite number of times to produce ω^ω·j for any j ∈ ω. The union of these stages is then ω^{ω+1} = ω^ω·ω = ∪_{j∈ω}(ω^ω·j). This procedure may be repeated to produce ω^{ω+2} = ω^{ω+1}·ω. (The notation ω^{ω+1} is not merely a convenient notation for ω^ω·ω. It is verifiable that this follows from the definitions of ordinal number addition and multiplication. In this special case, the associative law for ordinal number multiplication is valid.) One may then proceed to produce ω^{ω+ℓ} for ℓ ∈ ω, and the union (i.e. limit) of these is (ω^ω)² = ω^{ω·2} = ∪_{ℓ∈ω}(ω^{ω+ℓ}). (It is essential to distinguish between (ω^ω)² and ω^{(ω²)} here!)
(7) Having arrived at (ω^ω)² in part (6), one may now repeat the procedure to arrive at (ω^ω)^k for any k ∈ ω. The union (i.e. limit) of all such ordinal numbers is then (ω^ω)^ω = ∪_{k∈ω}((ω^ω)^k). One may now add arbitrary polynomials to this term as in part (6) to produce (ω^ω)^ω + ∑_{k=0}^{m} ω^k i_k for tuples (i_m, . . . , i_0), for all m ∈ ω. The union of these is (ω^ω)^ω + ω^ω = ∪_{j∈ω}((ω^ω)^ω + ω^j). Repeating this procedure yields (ω^ω)^ω + ω^ω·k for k ∈ ω, and taking the limit of these gives (ω^ω)^ω + ω^ω·ω = ∪_{k∈ω}((ω^ω)^ω + ω^ω·k). Repeating this gives (ω^ω)^ω + ((ω^ω·ω)·2) = ((ω^ω)^ω + ω^ω·ω) + ω^ω·ω. But (ω^ω·ω)·2 = ω^ω·(ω·2). (This is another special case where associativity gives the right answer!) So (ω^ω)^ω + ((ω^ω·ω)·2) = (ω^ω)^ω + ω^ω·(ω·2). This procedure may be repeated to give (ω^ω)^ω + ω^ω·(ω·k) for k ∈ ω, and the limit of these stages yields (ω^ω)^ω + ω^ω·(ω²). This procedure may be repeated to give (ω^ω)^ω + ω^ω·(ω^ℓ) for ℓ ∈ ω, and in the limit, this yields (ω^ω)^ω + ω^ω·(ω^ω), which may be written as (ω^ω)^ω + (ω^ω)². When this procedure is repeated, the result is (ω^ω)^ω + (ω^ω)^m for m ∈ ω, and in the limit, this yields (ω^ω)^ω + (ω^ω)^ω, which is clearly equal to (ω^ω)^ω·2. This procedure may be repeated to produce (ω^ω)^ω·k for k ∈ ω, and in the limit, this gives (ω^ω)^ω·ω. When this procedure is repeated, one obtains ((ω^ω)^ω·ω)·ω, which equals (ω^ω)^ω·ω² by the fortuitous associativity which was noted earlier. This may be repeated to produce (ω^ω)^ω·ω^ℓ for ℓ ∈ ω, which yields (ω^ω)^ω·ω^ω in the limit. If this procedure is repeated, one obtains ((ω^ω)^ω·ω^ω)·ω^ω, which equals (ω^ω)^ω·(ω^ω)² by fortuitous associativity once again. This procedure may be repeated to produce (ω^ω)^ω·(ω^ω)^m for m ∈ ω, and this yields (ω^ω)^ω·(ω^ω)^ω in the limit. Clearly this may be written as ((ω^ω)^ω)². This procedure may be repeated to produce ((ω^ω)^ω)^k for k ∈ ω, which yields ((ω^ω)^ω)^ω in the limit.
(8) Having arrived at ((ω^ω)^ω)^ω in part (7), one naturally asks whether exponentiation by ω can be simplified fortuitously by analogy with function-sets of the form Y^X, which denotes the set of functions from X to Y. (See Notation 10.2.13.) Such function-sets have the property that there is a canonical bijection h : (Z^X)^Y → Z^{X×Y}. (Let f ∈ (Z^X)^Y. Then one may define h(f) ∈ Z^{X×Y} by h(f)(x, y) = f(y)(x) for all (x, y) ∈ X × Y.) Therefore one may, if one wishes, identify the sets (Z^X)^Y and Z^{X×Y} with each other. By analogy, one would hope to identify the ordinal number (ω^ω)^ω with a well-ordered set of the form ω^{(ω²)}, for example.
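The canonical bijection h can be checked exhaustively on small finite sets. In the Python sketch below (the set sizes and names are our own illustration), functions are modelled as finite dictionaries, and the round trip through h and its inverse is the identity.

```python
# The canonical bijection h : (Z^X)^Y -> Z^(X x Y), with h(f)(x, y) = f(y)(x),
# checked on small finite sets.  Functions are modelled as dicts.
X, Y, Z = [0, 1], ['a', 'b', 'c'], [False, True]

def h(f):
    """Uncurry: f maps each y to a function X -> Z; h(f) maps (x, y) to Z."""
    return {(x, y): f[y][x] for x in X for y in Y}

def h_inv(g):
    """Curry: the inverse of h."""
    return {y: {x: g[(x, y)] for x in X} for y in Y}

f = {'a': {0: True, 1: False},
     'b': {0: False, 1: False},
     'c': {0: True, 1: True}}
g = h(f)
print(g[(0, 'a')], g[(1, 'b')])   # True False
print(h_inv(g) == f)              # True
```

Since h and h_inv compose to the identity in both directions, the sets (Z^X)^Y and Z^{X×Y} may indeed be identified, which is the analogy the text appeals to.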
The first fly in the ointment here is the fact that an ordinal number of the form ω^ω is not the same as the set of functions from ω to ω. The ordinal number ω^ω may be visualised as an infinite-dimensional hypercube with ω as the parameter-set along each axis, but the non-zero coordinates are restricted to a finite subset of the set of axis-labels ω. Each point in this infinite-dimensional hypercube represents a polynomial in ω (with coefficients in ω), similar to the polynomials mentioned in part (6). The elements of the ordinal number ω^ω may be identified with the functions from ω to ω which have finite support. (This means functions whose value is zero outside a finite set.) By contrast, general functions from ω to ω have unbounded support. In other words, general functions from ω to ω are like power series in ω with coefficients in ω. This difference underlies the fact that the function-set ω^ω is uncountable, whereas the ordinal number ω^ω is countable. So it will clearly not be possible to construct a bijection between the ordinal number ω^ω and the set of functions from ω to ω.
The second fly in the ointment here is the fact that an ordinal number is an ordered set. The well-ordering on ω^ω is the lexicographic order determined by the lowest component where two tuples differ. Thus if x, y ∈ ω^ω, then x < y if and only if ∃n ∈ ω, ((∀i ∈ ω, (i < n ⇒ x_i = y_i)) ∧ (x_n < y_n)). The bare set of functions from ω to ω has no such order relation.
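This well-ordering on finite-support functions can be implemented directly. In the Python sketch below (our own encoding, with exponents as dictionary keys rather than tuple positions), two ordinals below ω^ω are compared at the largest exponent where their coefficients differ, which is the same rule as the book's lexicographic comparison when tuples are written leading term first.

```python
# Comparison of ordinals below omega^omega, encoded as finite-support maps
# {exponent: coefficient}.  The larger ordinal wins at the highest exponent
# where the two maps differ (our own encoding of the lexicographic rule).

def ord_less(x, y):
    """Return True if the ordinal x is strictly less than the ordinal y."""
    diffs = [e for e in set(x) | set(y) if x.get(e, 0) != y.get(e, 0)]
    if not diffs:
        return False                 # equal ordinals
    e = max(diffs)                   # highest exponent where they differ
    return x.get(e, 0) < y.get(e, 0)

# omega*5 + 3 < omega^2,  and  omega^2 < omega^2 + 1
print(ord_less({1: 5, 0: 3}, {2: 1}))   # True
print(ord_less({2: 1}, {2: 1, 0: 1}))   # True
print(ord_less({2: 1}, {1: 7}))         # False
```

The finite-support restriction is what makes this computable at all: a general function from ω to ω has no largest differing exponent to inspect.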
Despite the imperfect analogy between ordinal numbers of the form α^β and the corresponding sets of functions from β to α, the analogy does fortuitously give the right answer. An ordinal number of the form (ω^ω)^ω may be visualised as an infinite-dimensional hypercube where the coordinate along each axis is itself an infinite-dimensional hypercube. The lower-level hypercube represents polynomials in ω with coefficients in ω, whereas the higher-level hypercube represents polynomials in ω with coefficients in ω^ω. In other words, the coefficients for polynomials in the higher-level hypercube are themselves polynomials in the lower-level hypercube. Any element of (ω^ω)^ω may be uniquely identified by a function g : ω × ω → ω with finite support. If one defines h : (ω^ω)^ω → ω^{ω×ω} and h̄ : ω^{ω×ω} → (ω^ω)^ω by h(f)(x, y) = f(y)(x) and h̄(g)(y)(x) = g(x, y) for all f ∈ (ω^ω)^ω and g ∈ ω^{ω×ω}, and x, y ∈ ω, then h and h̄ are inverses of each other, and they map functions with finite support to functions with finite support. (In the case of f ∈ (ω^ω)^ω, the function f(y) also has finite support for all y ∈ ω.) Hence h and h̄ are bijections between (ω^ω)^ω and ω^{(ω²)} if the ordinal number ω^{(ω²)} is defined to contain only those functions from ω² to ω which have finite support.
One has therefore the identification (ω^ω)^ω = ω^{(ω²)}, where ω^{(ω²)} is the well-ordered set constructed as the set of binomials in ω with lexicographic order. Similarly, ω^{(ω³)} = ω^{((ω²)·ω)} = (ω^{(ω²)})^ω = ((ω^ω)^ω)^ω. The elements of this set may be thought of as multinomials in three variables, each of which is an element of ω. In other words, they are trinomials in ω. By repeating this procedure, one obtains ω^{(ω^k)} for k ∈ ω. These are multinomials in k variables which are elements of ω. Hence one obtains ω^{(ω^ω)} = ∪_{k∈ω} ω^{(ω^k)}, which may be thought of as the set of all multinomials in ω. Let ϖ_2 = ω^{(ω^ω)}. (For comparison, ϖ_1 consists of all single-variable polynomials in ω.)
(9) The procedure from ϖ_1 = ω^ω to ϖ_2 = ω^{(ω^ω)} may be repeated to produce ϖ_3 = ω^{(ω^{(ω^ω)})}. By induction, one may produce ϖ_ℓ for ℓ ∈ ω satisfying ϖ_{ℓ+1} = ω^{(ϖ_ℓ)} for all ℓ ∈ ω. Then one may define ε₀ = ∪_{ℓ∈ω} ϖ_ℓ.
12.5.6 Remark: The ordinal number ε₀.
One somewhat surprising observation about the ordinal number ε₀ is that it is countable. In other words, #(ε₀) = #(V_ω), although ε₀ ∈ V_{ε₀+1} \ V_α for all ordinal numbers α ≤ ε₀. Therefore when it is claimed that the set of real numbers can be well-ordered, it is being claimed that the real numbers can be put into a one-to-one association with all of the ordinal numbers up to ε₀ and well beyond that point in Table 12.5.2.
Chapter 13
Cardinality
13.1. Equinumerosity
13.1.1 Remark: Defining cardinality in terms of equinumerosity.
Cardinal numbers are difficult to define, but the relation of equal cardinality, or equinumerosity, is easy to define in a strictly logical manner. However, there is ambiguity in the word "exists" in Definition 13.1.2, which is the root cause of very deep and frustrating issues regarding the comparability of sets.
13.1.2 Definition: Sets X and Y are equinumerous or equipotent if there exists a bijection f : X → Y.
13.1.3 Remark: There is no explicit logical predicate which counts the elements in a set.
Two sets X and Y are defined (in terms of the unique existence quantifier ∃!) to be equinumerous if and only if
    ∃f ⊆ X × Y, (∀x ∈ X, ∃! y ∈ Y, (x, y) ∈ f) ∧ (∀y ∈ Y, ∃! x ∈ X, (x, y) ∈ f).
This is a well-defined logical expression for any sets X and Y because the Cartesian set product X × Y is a well-defined set. This is important to know because the relation of equinumerosity is defined for all sets, and the set of sets is not a set! However, there is evidently no such logical expression for the numerosity or cardinality of a set. That is, there is no logical expression of the form N(X) such that N(X) is a set for all sets X, and N(X) = N(Y) if and only if X and Y are equinumerous.
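For small finite sets, the defining formula can be executed literally: search every subset f of X × Y for one which relates each x to exactly one y and each y to exactly one x. A brute-force Python sketch (our own helper names; exponential in |X×Y|, so for illustration only):

```python
# Equinumerosity tested literally against the formula: does some f, a subset of
# X x Y, relate each element of X to exactly one element of Y and vice versa?
from itertools import combinations

def is_bijection_graph(f, X, Y):
    """Check the two 'exactly one' clauses of the equinumerosity formula."""
    return (all(sum(1 for (a, b) in f if a == x) == 1 for x in X) and
            all(sum(1 for (a, b) in f if b == y) == 1 for y in Y))

def equinumerous(X, Y):
    """Search all subsets f of X x Y for a bijection graph.  Tiny sets only."""
    pairs = [(x, y) for x in X for y in Y]
    return any(is_bijection_graph(set(c), X, Y)
               for r in range(len(pairs) + 1)
               for c in combinations(pairs, r))

print(equinumerous({0, 1, 2}, {'a', 'b', 'c'}))   # True
print(equinumerous({0, 1}, {'a', 'b', 'c'}))      # False
```

The point of the exercise is that the formula quantifies over subsets of the fixed set X × Y, so no class of all sets is ever needed, exactly as the text observes.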
It is possible to construct logical expressions to test for each cardinality. For example, ∀x, x ∉ X is an effective test for zero cardinality. The logical expression (∃x, x ∈ X) ∧ (∀x₁, x₂ ∈ X, x₁ = x₂) is an effective
test for cardinality equal to 1. For small finite cardinalities, one may use logical multiplicity quantifiers
as in Remark 6.8.8, but these become rapidly impractical and tedious. (See for example Remark 6.8.11.)
More generally, one may test sets for cardinality equal to any ordinal number (or any other set) by applying
the above logical expression for equinumerosity of sets X and Y with Y equal to the comparison set. This
is quite simple and convenient, but it requires the provision of a class of comparison sets, one comparison
set for each cardinality. And that is where some of the deeper issues in the subject of cardinality begin to
arise. Even the smaller ordinal numbers become rapidly tedious to write out in full. (For examples, see
Remark 12.2.1.)
Alan U. Kennington, Differential geometry reconstructed: A unified systematic framework. www.geometry.org/dg.html
Then Xi ∩ Xj = ∅ for all i, j ∈ ω with i ≠ j. (That is, the sets Xi are pairwise disjoint for i ∈ ω.)
Proof: Let X and Y be sets. Let f : X → Y and g : Y → X be injective functions. Inductively define the sequences (Ai)_{i=0}^∞ and (Bi)_{i=0}^∞ of subsets of X by A0 = X, B0 = g(Y), A_{i+1} = g(f(Ai)) and B_{i+1} = g(f(Bi)) for all i ∈ ω. Then A0 ⊇ B0, and Ai ⊇ Bi implies A_{i+1} ⊇ B_{i+1} for all i ∈ ω by Theorem 10.6.6 (ii). Therefore by induction, Ai ⊇ Bi for all i ∈ ω. Similarly, B0 ⊇ A1, and Bi ⊇ A_{i+1} implies B_{i+1} ⊇ A_{i+2} for all i ∈ ω by Theorem 10.6.6 (ii). Therefore by induction, Bi ⊇ A_{i+1} for all i ∈ ω.
Inductively define (Xi)_{i=0}^∞ by X0 = g(Y \ f(X)) and X_{i+1} = g(f(Xi)) for i ∈ ω. Then X0 = B0 \ A1 because g(Y \ f(X)) = g(Y) \ g(f(X)) due to the injectivity of g, by Theorem 10.6.6 (v′). Also by Theorem 10.6.6 (v′), Xi = Bi \ A_{i+1} implies X_{i+1} = B_{i+1} \ A_{i+2} for all i ∈ ω. Therefore by induction, Xi = Bi \ A_{i+1} for all i ∈ ω. (This is illustrated in Figure 13.1.1.)
[Figure: the nested sets A0 = X ⊇ B0 = g(Y) ⊇ A1 = g(f(X)) ⊇ B1 = g(f(g(Y))) ⊇ A2 ⊇ B2 ⊇ . . . , with Xi = Bi \ A_{i+1}, together with the corresponding sets C0 = Y ⊇ D0 = f(X) ⊇ C1 = f(g(Y)) ⊇ D1 = f(g(f(X))) ⊇ . . . ]
Figure 13.1.1 Set-sequence construction for the Schroder-Bernstein theorem
Proof: Let X and Y be sets. Let f : X → Y and g : Y → X be injective functions. Inductively define (Xi)_{i=0}^∞ by X0 = g(Y \ f(X)) and X_{i+1} = g(f(Xi)) for i ∈ ω. Then Xi ∩ Xj = ∅ for all i, j ∈ ω with i ≠ j, by Theorem 13.1.7. Define h : X → Y by
    ∀x ∈ X,   h(x) = f(x)    if x ∈ X \ ∪_{i=0}^∞ Xi,
              h(x) = g⁻¹(x)  if x ∈ ∪_{i=0}^∞ Xi.
Then h(x) is well defined for all x ∈ X because g is injective and Xi ⊆ g(Y) for all i ∈ ω, and so g⁻¹ is well defined on ∪_{i=0}^∞ Xi.
To show that h is injective, let x₁, x₂ ∈ X with h(x₁) = h(x₂). Suppose that x₁ ≠ x₂. Since f is injective on X \ ∪_{i=0}^∞ Xi and g⁻¹ is injective on ∪_{i=0}^∞ Xi, one of the points x₁ and x₂ must be in each of these subsets of the domain. Suppose (without loss of generality) that x₁ ∈ X \ ∪_{i=0}^∞ Xi and x₂ ∈ ∪_{i=0}^∞ Xi. Then f(x₁) = g⁻¹(x₂). So g(f(x₁)) = x₂. So x₂ ∈ g(f(X)) = A1. So x₂ ∉ X0. But x₂ ∈ Xk for some k ∈ ω, and g ∘ f is a bijection from X to A1 = g(f(X)) with the property that g(f(Xi)) = X_{i+1} for all i ∈ ω. Therefore x₁ = (f⁻¹ ∘ g⁻¹)(x₂) ∈ X_{k−1} because x₂ ∉ X0. This contradicts the assumption that x₁ ∈ X \ ∪_{i=0}^∞ Xi. Hence x₁ = x₂, and so h is injective.
To show that h : X → Y is surjective, note that h(X) = U ∪ V, where U = f(X \ ∪_{i=0}^∞ Xi) and V = g⁻¹(∪_{i=0}^∞ Xi). Since f is injective on X, and g⁻¹ is injective on g(Y), it follows that U = f(X) \ ∪_{i=0}^∞ f(Xi) and V = ∪_{i=0}^∞ g⁻¹(Xi). But g⁻¹(Xi) = f(X_{i−1}) for i ≥ 1. So V = (∪_{i=1}^∞ f(X_{i−1})) ∪ (Y \ f(X)) because g⁻¹(X0) = Y \ f(X). Therefore U ∪ V = (f(X) \ ∪_{i=0}^∞ f(Xi)) ∪ (∪_{i=0}^∞ f(Xi)) ∪ (Y \ f(X)) = Y. So h is surjective. Hence h is a bijection.
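The proof's construction is effective enough to run. The Python sketch below uses our own illustrative choice X = Y = the natural numbers, with injections f(x) = 2x and g(y) = 2y + 1; it decides membership of the union of the sets Xi by unwinding g ∘ f, builds h exactly as in the proof, and checks bijectivity on an initial segment.

```python
# Schroeder-Bernstein construction for X = Y = the natural numbers,
# with injections f(x) = 2x and g(y) = 2y + 1 (an illustrative choice of ours).

def f(x):
    return 2 * x

def g(y):
    return 2 * y + 1

def in_union_X(x):
    """Decide x in X_0 u X_1 u ..., where X_0 = g(Y \\ f(X)) = {n : n = 3 mod 4}
    and X_{i+1} = g(f(X_i)) = {4t + 1 : t in X_i}."""
    while True:
        if x % 4 == 3:
            return True          # x lies in X_0
        if x % 4 == 1:
            x = (x - 1) // 4     # unwind one application of g o f
        else:
            return False         # even numbers are in no X_i

def h(x):
    """The bijection from the proof: g^{-1} on the union of the X_i, f elsewhere."""
    return (x - 1) // 2 if in_union_X(x) else f(x)

values = [h(x) for x in range(1000)]
print(len(values) == len(set(values)))             # injective on 0..999: True
print(all(y in set(values) for y in range(200)))   # 0..199 all covered: True
```

The membership test terminates because each unwinding step strictly decreases x, mirroring the fact that every element of X_{i+1} determines a unique ancestor in X_i.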
13.1.11 Notation:
#(X) ≥ #(Y), for sets X and Y, means that X dominates Y.
#(X) ≤ #(Y), for sets X and Y, means that Y dominates X.
#(X) > #(Y), for sets X and Y, means that X strictly dominates Y.
#(X) < #(Y), for sets X and Y, means that Y strictly dominates X.
13.1.12 Theorem: Let X and Y be sets such that X ⊆ Y. Then #(X) ≤ #(Y).
Proof: Let X and Y be sets such that X ⊆ Y. Then the identity function id_X is an injection from X to Y. (See Definition 10.2.23 and Notation 10.2.25 for the identity function.) Hence #(X) ≤ #(Y) by Definition 13.1.9 and Notation 13.1.11.
Definition 13.1.9. There is a purely logical expression for the relation of domination between sets. A set X dominates a set Y if and only if
    ∃f ⊆ X × Y, (∀x ∈ X, ∀y₁, y₂ ∈ Y, ((x, y₁) ∈ f ∧ (x, y₂) ∈ f) ⇒ y₁ = y₂)
        ∧ (∀y ∈ Y, ∃x ∈ X, (x, y) ∈ f) ∧ (∀y ∈ Y, ∀x₁, x₂ ∈ X, ((x₁, y) ∈ f ∧ (x₂, y) ∈ f) ⇒ x₁ = x₂).
This is a relation between all sets. Therefore it is not a relation in the sense of Definition 9.5.2.
Let B(X, Y) = {f ⊆ X × Y ; f is a bijection} and I(X, Y) = {f ⊆ X × Y ; f is an injection} for all sets X and Y. Then X and Y are equinumerous if and only if B(X, Y) ≠ ∅, and X dominates Y if and only if I(Y, X) ≠ ∅. So X strictly dominates Y if and only if B(X, Y) = ∅ and I(Y, X) ≠ ∅. However, the
existence of a purely logical expression for a set property does not necessarily imply that the property can be
calculated. The axiom of choice is invoked precisely because the emptiness or otherwise of sets is sometimes
impossible to determine within ZF set theory. In the case of cardinality, ZF set theory has acquired some
optional extra axioms for resolving equinumerosity and domination questions. However, as in the case of
the choice axiom, these optional numerosity axioms are a matter of personal faith. They do not construct
anything new. They merely state, without any justification, that certain sets are empty or non-empty. (It
takes courage to admit that one does not know something.)
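For tiny finite sets, the sets B(X, Y) and I(X, Y) can be materialised outright, which makes the domination tests computable. A Python sketch (our own function names; feasible only for very small sets):

```python
# Domination via the sets B(X,Y) and I(X,Y) of bijection and injection graphs,
# materialised by brute force for tiny finite sets (our own illustration).
from itertools import permutations

def injections(X, Y):
    """All graphs of injections from X into Y."""
    X = sorted(X)
    return [set(zip(X, p)) for p in permutations(sorted(Y), len(X))]

def bijections(X, Y):
    """The injections from X into Y which are also surjective."""
    return [f for f in injections(X, Y) if len(f) == len(Y)]

def dominates(X, Y):           # X dominates Y: I(Y, X) is non-empty
    return bool(injections(Y, X))

def strictly_dominates(X, Y):  # ... and B(X, Y) is empty
    return dominates(X, Y) and not bijections(X, Y)

print(dominates({1, 2, 3}, {1, 2}))            # True
print(strictly_dominates({1, 2, 3}, {1, 2}))   # True
print(strictly_dominates({1, 2}, {1, 2}))      # False
```

For infinite sets no such enumeration exists, which is precisely why the emptiness or otherwise of these sets can be undecidable within ZF, as the text explains.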
13.1.14 Remark: Equinumerosity is a kind of equivalence relation.
The assertion in Theorem 13.1.8 resembles the antisymmetry condition for a partial order in Definition 11.1.2.
The theorem states that if X is dominated by Y and Y is dominated by X, then X and Y are equinumerous.
However, there are two important differences. Firstly, Definition 11.1.2 requires a partial order to be a
subset of a Cartesian set product, which is not true in the case of a relation on the class of all sets. Secondly,
Theorem 13.1.8 does not assert that X and Y are equal. It states only that they are equinumerous.
The equinumerosity relation amongst sets apparently satisfies the conditions for an equivalence relation in
Definition 9.8.2. Since equinumerosity is defined on all sets, it is not a relation in the sense of Definition 9.5.2.
However, equinumerosity of sets is a genuine equivalence relation on the class of all sets.
If one wishes, one may think of the class of all sets as being partitioned into equivalence classes which
have the same numerosity. These classes are not sets (except for the equivalence class of sets which are
equinumerous to the empty set). So it is dangerous to think of them as sets. However, it is perfectly safe to
think of equinumerosity as an equivalence relation.
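Although the class of all sets is too large to be a set, the three equivalence properties are easy to exhibit concretely for finite sets, where a bijection can be modelled as a Python dictionary. The following sketch (with illustrative helper names, not notation from this book) verifies reflexivity via identity maps, symmetry via inverses, and transitivity via composition:

```python
# Sketch: equinumerosity on finite sets, with bijections as dicts.
# All names here are illustrative; the text itself works with arbitrary ZF sets.

def is_bijection(f, X, Y):
    """Test whether dict f is a bijection from set X onto set Y."""
    return (set(f.keys()) == X and set(f.values()) == Y
            and len(set(f.values())) == len(f))

def inverse(f):
    """Inverse of a bijection (symmetry of equinumerosity)."""
    return {v: k for k, v in f.items()}

def compose(g, f):
    """g after f (transitivity of equinumerosity)."""
    return {x: g[f[x]] for x in f}

X, Y, Z = {"a", "b"}, {0, 1}, {"p", "q"}
f = {"a": 0, "b": 1}          # witnesses X ~ Y
g = {0: "p", 1: "q"}          # witnesses Y ~ Z

assert is_bijection({x: x for x in X}, X, X)   # reflexivity
assert is_bijection(inverse(f), Y, X)          # symmetry
assert is_bijection(compose(g, f), X, Z)       # transitivity
```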
Since one cannot refer to the equivalence classes of the equinumerosity relation as sets, one cannot define
the is-dominated-by relation on such equivalence classes. However, one may observe that this relation
satisfies the three conditions for a partial order in Definition 11.1.2 if one accepts equinumerosity as a
substitute for equality in the antisymmetry condition. Those who accept the axiom of choice can say that
the is-dominated-by relation is then a total order.
13.1.15 Remark: The trichotomy comparability theorem for cardinality of sets.
The well-known trichotomy theorem in ZF+AC set theory states that all pairs of sets are comparable.
In other words, for any pair of sets, either one dominates the other or else they are equinumerous. In
pure ZF set theory, this theorem is not provable. (See for example Smullyan/Fitting [342], pages 41, 113; Wilder [353], pages 132, 135–138; Moore [321], pages 46–52; E. Mendelson [320], page 198; Levy [318], page 162; Halmos [307], page 89.)
13.1.16 Remark: Which ordinal number is ω₁?
ω₁ is the standard notation for the least uncountable ordinal number. One might be tempted to guess that it equals ω^ω. After all, the ordinal numbers ω^k for k ∈ ω are all clearly countable, and ω^ω looks like it should be at least as big as 2^ω, which is well known to be uncountable. However, ω^ω is equivalent to the set of all finite sequences of non-negative integers, not the set of all infinite sequences of non-negative integers. Hence ω^ω is countable. The same observation applies to the higher powers ω^(ω^ω), ω^(ω^(ω^ω)), and so forth. According to Quine [332], page 212, the ordinal number ω₁ lies far beyond all of these. One may conclude from this that a startlingly high percentage of the infinite ordinal numbers are useless as cardinality yardsticks, since only ordinal numbers which have different cardinality are required as yardsticks.
The confusing fact that the set 2^ω is uncountable while the set ω^ω is countable may be clarified to some extent by using the more strictly correct notation P(ω) instead of 2^ω. Then if one writes ⋃_{n∈ω} ω^n instead of ω^ω, it is no longer so surprising that P(ω) is uncountable while ⋃_{n∈ω} ω^n is countable.
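The countability of the set of all finite sequences of non-negative integers can be made concrete: the sequences can be listed in finite blocks, which yields an injection into pairs of natural numbers, and pairs of naturals are themselves countable. A minimal Python sketch (all function names are illustrative, not from the text):

```python
from itertools import product

def finite_sequences(bound):
    """All sequences of length < bound with entries < bound, in a fixed order."""
    for length in range(bound):
        for seq in product(range(bound), repeat=length):
            yield seq

def index_of(seq):
    """An injection from finite sequences into pairs of natural numbers.

    Each sequence is located inside the smallest block that contains it;
    the pair (block, position) identifies it uniquely.
    """
    bound = max(max(seq, default=0) + 1, len(seq) + 1)
    for i, s in enumerate(finite_sequences(bound)):
        if s == seq:
            return (bound, i)

assert index_of(()) == (1, 0)       # the empty sequence heads its block
assert index_of((0,)) == (2, 1)
```

Each block is finite, so the whole listing is a countable union of finite sets; no such trick is available for infinite sequences, which is why 2^ω escapes countability.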
13.1.17 Remark: The parting of the ways between ordinal numbers and cardinality.
The association between finite ordinal numbers and the cardinality of finite sets is very close. The finite ordinal numbers provide an excellent yardstick for the cardinality of finite sets, and the set ω is an indispensable yardstick for countably infinite sets. However, it is clear that this close relationship suffers a severe parting of the ways beyond ω. As noted in Remark 13.1.16, finding the second infinite cardinal number amongst the infinite ordinals is highly problematic. Not only is there a breathtaking waste of space between ℵ₀ = ω and ω₁; the identity of ω₁ cannot be even approximately determined. It is quite astonishing, or at least surprising, that so much attention is paid in the literature to the use of ordinal numbers for measuring cardinality. The approach outlined in Remark 13.2.2 is far preferable.
The cardinality of ordinal numbers far beyond ω has many other problems apart from sparsity and uselessness.
The study of this subject depends very much on the axiom of choice to solve problems, and even so, the
subject is plagued by exotic classes of cardinals which fortunately are outside the scope of this book. The
ordinal numbers in full generality may be safely ignored.
13.1.18 Remark: Cardinality of well-orderable sets.
If a set can be well-ordered, there exists an ordinal number which is equinumerous to it. Since the ordinal
numbers are well ordered, there is a least ordinal number which is equinumerous to it. This is defined to be
the cardinality of the set in Definition 13.1.19.
13.1.19 Definition:
The cardinality of a well-orderable set X is the least ordinal number which is equinumerous to X.
13.1.20 Notation: card(X), for a well-orderable set X, denotes the cardinality of X.
13.2. Beta-cardinality
13.2.1 Remark: Cardinality is problematic, but not a problem in practice.
Cardinality is a problematic topic. Often it is not known whether two sets are equinumerous or if one of
them dominates the other. (This is the trichotomy issue. See Remark 13.1.15.) Sometimes it is not even
knowable whether suitable injections, surjections or bijections exist to establish cardinality relations between
sets. And sometimes it is not even known whether it is knowable or not. (The axiom of choice merely
muddies the waters by claiming that suitable functions exist, which seems to fill the gaps in knowledge,
but is an empty claim. No axiom of choice ever produced a choice function.) Thus one must live with
some uncertainty and ignorance. On the positive side, sets which arise in applicable mathematics are almost
always straightforward to assign a cardinality to. Sets which are not finite are usually provably equinumerous
to some von Neumann universe stage V_β, where β is almost never beyond ω + ω.
13.2.2 Remark: The beta-cardinality of a set.
Many common sets are not well-orderable. For example, the real numbers cannot be well-ordered. (See
Remark 7.12.4.) Sets which have no well-ordering cannot be assigned a cardinality using general ordinal
numbers as yardsticks. Consequently, the ordinal numbers are of limited use for measuring cardinality.
The beta-cardinality of a set in Definition 13.2.9 is a hybrid of the standard cardinality using elements of ω ∪ {ω} as yardsticks for finite and countably infinite sets, combined with infinite von Neumann universe stages V_β for uncountable sets. The two ranges of this hybrid definition overlap (and agree) in the case of countably infinite sets.
The mode of operation of beta-cardinality for infinite sets is to determine the smallest β for which a given set X is equinumerous to a subset of V_β. In other words, X would have beta-cardinality V_β if there exists an injection from X to V_β, but no injection exists to any smaller universe stage. (It is provable in ZF that #(V_β) < #(V_{β+1}). So there is no danger of assigning X to two different universe stages.) For example, ω has beta-cardinality ω, R has beta-cardinality P(ω), and P(R) has beta-cardinality P(P(ω)).
There is a theoretical danger that a set could have mediate cardinality between the cardinality of two
von Neumann universe stages, but no explicit expression for a mediate cardinal can be constructed in ZF
set theory. Their existence requires the negation of the continuum hypothesis. Even in model theory, it is
not easy to demonstrate existence of mediate cardinals. In the worst-case scenario, it may be unknown what
the precise beta-cardinality of a set is. This is not a disaster. There are many unknowns and unknowables
in mathematics which do not hinder the progress of the subject at all.
13.2.3 Definition: The rank of a set X is the smallest ordinal number β such that the von Neumann universe stage V_β includes X.
13.2.6 Definition: The cardinality-rank of a set X is the smallest ordinal number β such that stage V_β of the von Neumann universe dominates X.
For finite sets, the cardinality-rank is almost useless. As mentioned in Remark 12.5.2, #(V₀) = 0, #(V₁) = 2⁰ = 1, #(V₂) = 2¹ = 2, #(V₃) = 2² = 4, #(V₄) = 2⁴ = 16, #(V₅) = 2¹⁶ = 65536, and so forth. This is why the elements of ω, the finite ordinal numbers, are used instead.
The rank is even less useful for measuring cardinality than the cardinality-rank because the rank indicates the maximum membership-depth of a set. (This is related to the ZF axiom of regularity discussed in Remark 7.8.1. Roughly speaking, the rank of a set is the ordinal number of the maximum-length path from the set down to the empty set by membership relations on the left.) Thus {{{∅}}} has depth 3, and therefore rank 3, although it contains only one element, which gives it cardinality-rank equal to 1.
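For hereditarily finite sets, the stages V_n, the membership-depth rank, and the doubly exponential sizes #(V_n) can all be computed directly. A small Python sketch using frozensets (illustrative code, not from the text):

```python
def powerset(s):
    """Power set of a frozenset, again as a frozenset of frozensets."""
    elems = list(s)
    return frozenset(
        frozenset(e for i, e in enumerate(elems) if mask & (1 << i))
        for mask in range(1 << len(elems))
    )

def V(n):
    """Finite von Neumann stage: V_0 is empty and V_{n+1} = powerset(V_n)."""
    stage = frozenset()
    for _ in range(n):
        stage = powerset(stage)
    return stage

def rank(x):
    """Maximum membership-depth; the rank of the empty set is 0."""
    return 0 if not x else 1 + max(rank(e) for e in x)

# #(V_0), ..., #(V_4) = 0, 1, 2, 4, 16, matching the sizes quoted above.
assert [len(V(n)) for n in range(5)] == [0, 1, 2, 4, 16]

# The set {{{0}}} has rank 3 but only one element (cardinality-rank 1).
empty = frozenset()
nested = frozenset([frozenset([frozenset([empty])])])
assert rank(nested) == 3 and len(nested) == 1
```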
where β̄(X) denotes the smallest rank of the von Neumann stages which dominate X.
between them, as is done in Notation 13.1.5. This is not necessarily the case for the beta-cardinality concept. In the case of infinite beta-cardinalities, the equality β̄(A) = β̄(B) does not necessarily imply that A and B are equinumerous. (The bar on β indicates that the cardinality has been rounded up to the nearest exact beta-cardinality.) The existence of mediate cardinals in some ZF models implies that two non-equinumerous sets could have the same beta-cardinality. Therefore the cardinality symbol # is used here as a kind of pseudo-notation which does not associate a particular yardstick set with its argument. Thus the expression #(A) is effectively meaningless in isolation if A is not known to be countable. But the pseudo-equality expression #(A) = #(B) is meaningful. (See also Remarks 13.1.4 and 13.1.10 for set-numerosity pseudo-notations.)
Except in the case of sets with mediate cardinality (which are very difficult to concoct within ZF set theory), it is generally found that sets will have a kind of exact beta-cardinality. In other words, they are equinumerous to either a finite set or some infinite von Neumann stage V_β. Two such sets will clearly be equinumerous. For example, both of the sets R and 2^ω have exact beta-cardinality equal to V_{ω+1} = P(V_ω). Therefore they are equinumerous.
13.2.12 Remark: The beth numbers and beta-sets.
The beth number ℶ_α with index α is equal to the cardinality (i.e. the smallest equinumerous ordinal number) of the von Neumann universe stage V_{ω+α}. In other words, it equals card(V_{ω+α}). Thus the beth numbers with indices 0, 1 and 2 are respectively the ordinal numbers card(ω), card(P(ω)) and card(P(P(ω))). (All except the first of these ordinals require the axiom of choice to prove their existence.)
The beta-set B(α) is defined by transfinite recursion with the rules B(0) = ω, B(α + 1) = P(B(α)) for any ordinal α, and B(α) = ⋃{B(γ); γ < α} for any limit ordinal α. Replacing each beta-set B(α), which is equinumerous to V_{ω+α}, with the minimal equinumerous ordinal number card(B(α)) = card(V_{ω+α}) seems to have no benefit at all, but it does have significant disadvantages. Such ordinal numbers are not defined without the axiom of choice, and even when they are defined, their locations in the ordinal number sequence cannot be determined in general. Hence it is highly desirable to ignore the beth numbers and use the well-defined beta-sets as cardinality yardsticks instead. (For beth numbers, see Bell/Slomson [290], pages 6, 204–205; Roitman [335], page 109.)
13.3.1 Remark: The ordinal number yardstick definition for finiteness of sets.
The ordinal number equinumerosity criterion in Definition 13.3.2 is fairly generally agreed to be the basis
of the concept of finiteness for sets. Despite the inconvenience of needing to first carefully construct the finite
ordinal numbers as a cardinality yardstick, it has the great advantage of being obviously correct. Since
we can construct the finite ordinal numbers almost by hand, this hands-on experience gives a high level
of confidence in the definition.
Theorem 13.9.4 gives an equivalent definition for finite sets which does not use the finite ordinal numbers for
equinumerosity tests. The Kuratowski finite set definition which is mentioned in Remark 13.9.7 also does not
require ordinal number equinumerosity tests. (There are also various other non-equinumerosity definitions
for finite sets.) In practice, however, it is most often finite ordinal number equinumerosity which is easiest
to demonstrate and apply.
Definitions of finite sets essentially the same as Definition 13.3.2 are given for example by Halmos [307],
page 53; Roitman [335], page 103; S. Warner [151], page 138; Shoenfield [340], page 255; Stoll [343], page 83;
Pinter [327], page 144; Suppes [345], page 149; Bernays [292], page 97; Lang [104], page 876; MacLane/Birkhoff [106], pages 156–157; Curtis [63], page 14; Smullyan/Fitting [342], page 4.
13.3.2 Definition: A finite set is a set S which satisfies ∃n ∈ ω, ∃f : n → S, f is bijective.
13.3.3 Theorem: Let S be a finite set. Then ∃! n ∈ ω, ∃f : n → S, f is bijective. In other words, there is one and only one finite ordinal number n for which S and n are equinumerous.
Proof: Let S be a finite set. Then ∃n ∈ ω, ∃f : n → S, f is bijective, by Definition 13.3.2. Suppose that n₁, n₂ ∈ ω satisfy ∃f : n_k → S, f is bijective, for k = 1, 2. Then there exist f₁ : n₁ → S such that f₁ is bijective and f₂ : n₂ → S such that f₂ is bijective. So f₂⁻¹ ∘ f₁ : n₁ → n₂ is a bijection by Theorem 10.5.4 (iii) because f₂⁻¹ is a bijection by Theorem 10.5.9. Therefore n₁ = n₂ by Theorem 12.3.18 (ii). Thus n is unique. Hence ∃! n ∈ ω, ∃f : n → S, f is bijective.
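The finiteness criterion and the uniqueness of n can be checked mechanically for small sets, modelling the ordinal n by {0, ..., n−1} and bijections by dictionaries. A hedged Python sketch with illustrative names:

```python
from itertools import permutations

def bijections(n, S):
    """All bijections f : n -> S, where n stands for the ordinal {0, ..., n-1}."""
    S = list(S)
    if n != len(S):
        return []       # no bijection exists between sets of different size
    return [dict(zip(range(n), perm)) for perm in permutations(S)]

def cardinality(S):
    """The unique n admitting a bijection n -> S, as in the theorem above."""
    candidates = [n for n in range(len(S) + 2) if bijections(n, S)]
    assert len(candidates) == 1     # uniqueness of the enumerating ordinal
    return candidates[0]

assert cardinality({"a", "b", "c"}) == 3
assert cardinality(set()) == 0      # the empty function enumerates the empty set
```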
13.3.4 Remark: Definition and notation for the numerosity of a finite set.
Since the finite ordinal number n which exists for any given finite set S according to Definition 13.3.2 is
unique by Theorem 13.3.3, it is possible to give this number a name and notation as in Definition 13.3.5 and
Notation 13.3.6.
Note that Notation 13.3.6 is compatible with Notation 13.1.5 for equinumerosity of sets because if two
finite sets S1 and S2 have the same value of numerosity according to Definition 13.3.5, then they must be
equinumerous since they will then be equinumerous to the same finite ordinal number. (This may seem
intuitively obvious, but intuition is not a proof!) It is easily verified that Notation 13.3.6 is also compatible
with Notation 13.1.11 in the case of finite sets.
13.3.5 Definition: The numerosity or cardinality of a finite set S is the unique number n ∈ ω for which a bijection exists from n to S.
13.3.6 Notation: #(S), for a finite set S, denotes the numerosity (or cardinality) of S.
13.3.7 Remark: Pseudo-notation for finiteness of sets.
Notation 13.3.8 is a frequently used pseudo-notation. In terms of Notations 13.1.11 and 13.5.10, the finiteness
of a set S may also be indicated by writing #(S) < #(ω) or #(S) < ∞. This means, by Definition 13.1.9, that there exists an injection from S to ω, and that S and ω are not equinumerous. It then follows from Theorem 12.3.8 that S is equinumerous to an element of ω. In other words, S is finite. (Luckily there is no danger of mediate cardinals between the finite sets and ω as there is between ω and P(ω).)
The apparent inequality in the expression #(S) < ∞ has limited validity because the symbol ∞ has no specific meaning. However, it may be safely interpreted to mean #(S) < #(ω). (The corresponding pseudo-notation #(S) = ∞ for infinite sets in Notation 13.5.4 may then be interpreted as the logical negation, ¬(#(S) < #(ω)).)
13.3.8 Notation: #(S) < ∞, for a set S, means that S is a finite set.
13.3.9 Theorem: ∀N ∈ ω, #(N) = N.
Proof: Let N ∈ ω. Then the identity function id_N : N → N is a bijection. (See Notation 10.2.25 for id_N.) Therefore #(N) = N.
13.3.10 Remark: Ad-hoc notation for set of possible enumerations of a given set.
In terms of Notation 13.3.11, a set S is finite when Enum(ω, S) ≠ ∅. In other words, S is a finite set if and only if there exists an enumeration of S by some element of ω.
Enum(K, Y) is a ZF set for any ZF sets K and Y. To see this, note that Bij(X, Y) ⊆ P(X × Y) for all sets X and Y. (See Notation 10.5.23 for Bij(X, Y), the set of bijections from X to Y.) Therefore Enum(K, Y) ⊆ ⋃_{X∈K} P(X × Y) ⊆ P((⋃K) × Y) because ∀X ∈ K, X ⊆ ⋃K.
13.3.11 Notation: Enum(K, Y), for sets K and Y, denotes the set of bijections from X to Y for X ∈ K. In other words,
Enum(K, Y) = ⋃_{X∈K} Bij(X, Y).
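The Enum construction can be modelled directly for finite sets. In this sketch the names are illustrative, bijections are Python dictionaries, and the "set" Enum(K, Y) is returned as a list because dictionaries are unhashable:

```python
from itertools import permutations

def Bij(X, Y):
    """All bijections from X to Y, represented as dictionaries."""
    X, Y = sorted(X), list(Y)
    if len(X) != len(Y):
        return []                  # sets of different size admit no bijection
    return [dict(zip(X, p)) for p in permutations(Y)]

def Enum(K, Y):
    """Union over X in K of Bij(X, Y), as a list of dictionaries."""
    return [f for X in K for f in Bij(X, Y)]

# S is finite precisely when some finite ordinal enumerates it. Here the
# ordinals 0..3 are modelled as frozensets of smaller natural numbers.
K = [frozenset(range(n)) for n in range(4)]
assert len(Enum(K, {"x", "y"})) == 2   # {"x","y"} is enumerated by the ordinal 2
```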
13.3.13 Theorem:
(i) Every subset of a finite set is finite.
(ii) If S is a finite set and A ⊆ S, then #(A) ≤ #(S).
(iii) There are no injections from ω to a finite set.
Proof: For part (i), let S be a finite set. Then by Definition 13.3.2, there exists a bijection f : n → S for some n ∈ ω. Let A be a subset of S. Let A′ = f⁻¹(A). Then A′ ⊆ n. Let g : X → A′ be the standard enumeration for A′. Then X ∈ ω⁺ and g is a bijection by Definition 12.3.5. But X ⊆ n by Theorem 12.3.11 (i). So X ∈ ω by Theorem 12.2.17 (iv). But f∣_{A′} : A′ → A is a bijection. (It is injective because f is injective, and it is surjective because f(A′) = f(f⁻¹(A)) = A by Theorem 10.7.1 (i″).) Therefore f ∘ g : X → A is a bijection. Hence A is a finite set.
For part (ii), let S be a finite set and A ⊆ S. Then A is a finite set by part (i). So there exist N_S, N_A ∈ ω and bijections φ_S : N_S → S and φ_A : N_A → A by Definition 13.3.2. Let φ = φ_S⁻¹ ∘ id_A ∘ φ_A. Then φ : N_A → N_S is a well-defined function because Dom(φ) = Dom(φ_A) = N_A and Range(φ) ⊆ Range(φ_S⁻¹) = N_S, and φ is injective because φ_A, id_A and φ_S⁻¹ are injective. So N_A ≤ N_S by Theorem 12.3.18 (i). Hence #(A) ≤ #(S) by Definition 13.3.5.
For part (iii), let S be a finite set. Then there is a bijection f : n → S for some n ∈ ω by Definition 13.3.2. Let g : ω → S be an injection. Then f⁻¹ ∘ g : ω → n is an injection. This is impossible by Theorem 12.3.13 (iii). Hence there are no injections from ω to S.
the cardinality of the disjoint union. However, one must first prove that this union is finite. This is where induction seems to be inescapable. The construction in Theorem 13.4.2 exploits the well-defined subtraction operation to inductively prove the existence of N₃ such that N₂ = N₃ − N₁, since N₃ − N₁ is apparently so straightforward to define.
To put it another way, if the solution N₃ of the equation N₃ − N₁ = N₂ is known for given N₁ and N₂, then it is simple to verify that the equation is satisfied. The more difficult task is to solve the inverse problem. (The majority of difficult problems in mathematics are such inverse problems.) In other words, N₁ + N₂ must somehow be constructed from given N₁ and N₂. Some clues can be gleaned from Figure 13.4.1.
[Figure 13.4.1. The ordinal numbers 0, 1 and 2, and the sums 2 + 1 = 3, 2 + 2 = 4 and 2 + 3 = 5, drawn as nested boxes.]
It can be seen in Figure 13.4.1 that the ordinal number 3 may be constructed by replacing each instance of the empty set (i.e. the number 0) in the number 2 with the elements of the number 2. In other words, wherever there is an empty box, replace this with ∅ and {∅}, which are the elements of 2. After the replacement, there are 4 empty boxes instead of 2.
To construct the number 4, one may likewise substitute the elements of 2 for each empty box in 3, and to construct 5, one may substitute the elements of 3 for each empty box in 3. The general rule is that N₁ + N₂ is obtained by substituting the elements of N₁⁺ for each empty box in N₂. This suggests that there should be a set-theoretic formula which constructs the sum of two finite ordinal numbers. However, this would require a recursive descent through the boxes in the diagrams (i.e. the sets and sets within sets recursively), somehow replacing the empty sets during the recursion. Such recursion is certainly no simpler than induction. There is a simple set-theoretic formula to construct N⁺ from N, but apparently no non-recursive set-theoretic formula for adding general N₁, N₂ ∈ ω.
The simplest way to define addition seems to be to exploit the explicit construction for successor sets,
combined with induction, to prove the existence of a solution N₃ for the equation N₃ − N₁ = N₂ for any given N₁ and N₂ in ω. An existence proof by induction is less satisfying than an explicit set-theoretic
formula, but infinitely more satisfying than appealing to an axiom of choice.
13.4.2 Theorem: Existence and uniqueness of the sum of two finite ordinal numbers.
∀N₁, N₂ ∈ ω, ∃! N₃ ∈ ω \ N₁, #(N₃ \ N₁) = N₂.
Proof: Let N₁ ∈ ω. Let X = {N₂ ∈ ω; ∃N₃ ∈ ω \ N₁, #(N₃ \ N₁) = N₂}. (See Notation 13.3.6 for #(S) for finite sets S.) Clearly X ⊆ ω, and 0 ∈ X because #(N₁ \ N₁) = #(∅) = 0 and N₁ ∉ N₁ by Theorem 7.8.2 (i).
Let N ∈ X. Then there exists N₃ ∈ ω such that #(N₃ \ N₁) = N and N₃ ∉ N₁. For such N₃, there exists a bijection φ : N → N₃ \ N₁. Using Notation 12.2.15 for successor sets, define φ⁺ : N⁺ → N₃⁺ \ N₁ by
∀x ∈ N⁺, φ⁺(x) = φ(x) if x ∈ N, and φ⁺(x) = N₃ if x = N.
Then φ⁺ is a well-defined function from N⁺ to N₃⁺ \ N₁ because N⁺ is the disjoint union of N and {N}, and N₃ ∈ N₃⁺ \ N₁ because N₃ ∈ N₃⁺ = N₃ ∪ {N₃} and N₃ ∉ N₁. To show that φ⁺ : N⁺ → N₃⁺ \ N₁ is a bijection, let x₁, x₂ ∈ N⁺ satisfy φ⁺(x₁) = φ⁺(x₂). If x₁, x₂ ∈ N, then x₁ = x₂ because φ : N → N₃ \ N₁ is a bijection. If x₁ ∈ N and x₂ ∉ N, then x₂ = N and φ⁺(x₂) = N₃, which cannot be an element of N₃ \ N₁. So this case is impossible because φ⁺(x₁) ≠ φ⁺(x₂). The only remaining case to check is x₁ ∉ N and x₂ ∉ N, which implies x₁ = N = x₂. Thus φ⁺ is injective. But Range(φ) = N₃ \ N₁ and so Range(φ⁺) = Range(φ) ∪ {N₃} = (N₃ \ N₁) ∪ {N₃}. But N₃ ∉ N₁. So Range(φ⁺) = (N₃ ∪ {N₃}) \ N₁ = N₃⁺ \ N₁. Therefore φ⁺ : N⁺ → N₃⁺ \ N₁ is a bijection. Thus #(N₃⁺ \ N₁) = N⁺ for N₃⁺ ∈ ω. Therefore N⁺ ∈ X. It then follows by Theorem 12.2.11 that X = ω.
13.4.3 Definition: The addition operation on the finite ordinal numbers is the map + : ω × ω → ω for which +(N₁, N₂) is the unique N₃ ∈ ω \ N₁ which satisfies #(N₃ \ N₁) = N₂, for all N₁, N₂ ∈ ω.
The sum N₁ + N₂ of N₁ and N₂ is the finite ordinal number +(N₁, N₂), for N₁, N₂ ∈ ω.
13.4.4 Definition: The subtraction operation on the finite ordinal numbers is the map − : {(N₁, N₂) ∈ ω × ω; N₂ ⊆ N₁} → ω given by −(N₁, N₂) = #(N₁ \ N₂) for all N₁, N₂ ∈ ω with N₂ ⊆ N₁.
The difference N₁ − N₂ between N₁ and N₂ is the finite ordinal number −(N₁, N₂), for N₁, N₂ ∈ ω with N₂ ⊆ N₁.
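These two definitions can be exercised on the finite ordinals modelled as {0, ..., n−1}: addition searches for the unique N₃ with #(N₃ \ N₁) = N₂, while subtraction is a plain set difference. A Python sketch (function names are illustrative):

```python
def as_set(n):
    """The finite ordinal n modelled as the set {0, ..., n-1}."""
    return set(range(n))

def add(n1, n2):
    """Search upward from n1 for the unique n3 with #(n3 \\ n1) = n2."""
    for n3 in range(n1, n1 + n2 + 1):
        if len(as_set(n3) - as_set(n1)) == n2:
            return n3
    raise AssertionError("unreachable: n1 + n2 always qualifies")

def sub(n1, n2):
    """Difference n1 - n2 = #(n1 \\ n2), defined only for n2 <= n1."""
    assert n2 <= n1
    return len(as_set(n1) - as_set(n2))

assert add(2, 3) == 5
assert sub(5, 2) == 3
# Subtraction inverts addition, as in Theorem 13.4.6 (iv).
assert all(sub(add(a, b), a) == b for a in range(6) for b in range(6))
```

The brute-force search in add mirrors the inverse-problem character of addition noted above: verifying a candidate N₃ is easy; producing it is the harder step.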
13.4.6 Theorem: Some properties of addition and subtraction of finite ordinal numbers.
(i) ∀N₁, N₂ ∈ ω, N₁ + N₂ = N₂ + N₁.
(ii) ∀N₁, N₂ ∈ ω, ∃φ : N₂ → (N₁ + N₂) \ N₁, φ is a bijection.
In other words, ∀N₁, N₂ ∈ ω, Bij(N₂, (N₁ + N₂) \ N₁) ≠ ∅.
(iii) ∀N₁, N₂ ∈ ω, ∃φ : N₂ → (N₂ + N₁) \ N₁, φ is a bijection.
In other words, ∀N₁, N₂ ∈ ω, Bij(N₂, (N₂ + N₁) \ N₁) ≠ ∅.
(iv) ∀N₁, N₂ ∈ ω, (N₁ + N₂) − N₁ = N₂ = (N₂ + N₁) − N₁.
(v) ∀N₁ ∈ ω, ∀N₂ ∈ ω \ N₁, (N₂ − N₁) + N₁ = N₂.
(vi) ∀N₀, N₁, N₂ ∈ ω, (N₁ ≤ N₂ ⟹ N₀ + N₁ ≤ N₀ + N₂).
Proof: For part (i), let N₁, N₂ ∈ ω. Let N₃ = N₁ + N₂ and N₃′ = N₂ + N₁. Then by Definition 13.4.3, there exist bijections φ : N₂ → N₃ \ N₁ and ψ : N₁ → N₃′ \ N₂. Define χ : N₃ → N₃′ by
∀x ∈ N₃, χ(x) = ψ(x) if x ∈ N₁, and χ(x) = φ⁻¹(x) if x ∈ N₃ \ N₁.
(N₂ − N₁) + N₁ is a bijection. Hence (N₂ − N₁) + N₁ = N₂ by Theorem 12.3.18 (ii).
For part (vi), let N₀, N₁, N₂ ∈ ω with N₁ ≤ N₂. Then N₁ ⊆ N₂ by Definition 12.2.5. By part (ii), there are bijections φ_k : N_k → (N₀ + N_k) \ N₀ for k = 1, 2. Then φ₂ ∘ φ₁⁻¹ : (N₀ + N₁) \ N₀ → (N₀ + N₂) \ N₀ is a well-defined function because φ₁⁻¹ and φ₂ are functions and Dom(φ₂ ∘ φ₁⁻¹) = Dom(φ₁⁻¹) = (N₀ + N₁) \ N₀ and Range(φ₂ ∘ φ₁⁻¹) ⊆ Range(φ₂) = (N₀ + N₂) \ N₀, and φ₂ ∘ φ₁⁻¹ is an injection by Theorem 10.5.4 (i) because φ₁⁻¹ and φ₂ are injective. Let ρ = id_{N₀} ∪ (φ₂ ∘ φ₁⁻¹). Then ρ : N₀ + N₁ → N₀ + N₂ is a well-defined injection because Dom(id_{N₀}) = N₀ and Dom(φ₂ ∘ φ₁⁻¹) are disjoint and Range(id_{N₀}) = N₀ and Range(φ₂ ∘ φ₁⁻¹) are disjoint. Hence N₀ + N₁ ≤ N₀ + N₂ by Theorem 12.3.18 (ii).
13.4.7 Theorem: Let A and B be disjoint finite sets. Then #(A ∪ B) = #(A) + #(B).
Proof: Let A and B be disjoint finite sets. Then N₁ = #(A) and N₂ = #(B) are well-defined elements of ω by Theorem 13.3.3 and Notation 13.3.6, and there exist bijections φ₁ : N₁ → A and φ₂ : N₂ → B. There exists a bijection φ₃ : N₂ → (N₁ + N₂) \ N₁ by Theorem 13.4.6 (ii). Let φ = φ₁ ∪ (φ₂ ∘ φ₃⁻¹). Then φ : N₁ + N₂ → A ∪ B is a well-defined function because N₁ + N₂ is the disjoint union of Dom(φ₁) = N₁ and Dom(φ₂ ∘ φ₃⁻¹) = Dom(φ₃⁻¹) = (N₁ + N₂) \ N₁, and φ₁ and φ₂ ∘ φ₃⁻¹ are well-defined functions. Since Range(φ₁) = A and Range(φ₂ ∘ φ₃⁻¹) = φ₂(Range(φ₃⁻¹)) = φ₂(N₂) = B, it follows that Range(φ) = A ∪ B. The injectivity of φ follows from the injectivity of φ₁ and φ₂ ∘ φ₃⁻¹. Therefore φ : N₁ + N₂ → A ∪ B is a bijection. Hence #(A ∪ B) = N₁ + N₂ = #(A) + #(B) by Definition 13.3.5.
13.4.8 Remark: Application of addition and subtraction of cardinalities of finite sets.
Theorem 13.4.9 is a fairly basic assertion regarding the cardinality of the symmetric difference of two finite sets, but it demonstrates the application of the basic addition and subtraction arithmetic for finite ordinal numbers. Since time does not permit the presentation of multiplication and division for finite ordinal numbers, it is tacitly assumed that the subtraction of 2 · #(A ∩ B) is the same as subtracting #(A ∩ B) twice. (Theorem 13.4.9 is applied in the proof of Theorem 14.7.14 (iv).)
13.4.9 Theorem: #(A △ B) = #(A) + #(B) − 2 · #(A ∩ B) for any finite sets A and B.
Proof: (A △ B) ∪ (A ∩ B) = A ∪ (B \ (A ∩ B)) by Theorems 8.3.4 (ix) and 8.2.5 (vii, xvi) for general sets A and B. The unions in this equation are disjoint by Theorems 8.3.4 (vii) and 8.2.5 (v, xvi). Therefore for finite sets A and B, #(A △ B) + #(A ∩ B) = #(A) + #(B \ (A ∩ B)) by Theorem 13.4.7. But B is the disjoint union of B \ (A ∩ B) and A ∩ B because A ∩ B ⊆ B. So #(B) = #(B \ (A ∩ B)) + #(A ∩ B) by Theorem 13.4.7. Therefore #(B \ (A ∩ B)) = (#(B \ (A ∩ B)) + #(A ∩ B)) − #(A ∩ B) = #(B) − #(A ∩ B) by Theorem 13.4.6 (iv). Hence #(A △ B) = (#(A △ B) + #(A ∩ B)) − #(A ∩ B) = #(A) + #(B \ (A ∩ B)) − #(A ∩ B) = #(A) + #(B) − #(A ∩ B) − #(A ∩ B) by Theorem 13.4.6 (iv).
13.4.10 Theorem: Let A and B be finite sets. Then #(A ∪ B) ≤ #(A) + #(B).
Proof: Let A and B be finite sets. Let C = B \ A. Then A and C are disjoint sets. But C is a finite set by Theorem 13.3.13 (i). So A and C are disjoint finite sets. So #(A ∪ C) = #(A) + #(C) by Theorem 13.4.7. But C ⊆ B. So #(C) ≤ #(B) by Theorem 13.3.13 (ii), and A ∪ C = A ∪ B. Hence #(A ∪ B) = #(A ∪ C) = #(A) + #(C) ≤ #(A) + #(B) by Theorem 13.4.6 (vi).
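Theorems 13.4.7, 13.4.9 and 13.4.10 are easy to spot-check on small sets, since Python's set operators |, ^, & and - correspond to union, symmetric difference, intersection and set difference. An illustrative sketch:

```python
# Spot-check of the three cardinality results on a small example.
A = {1, 2, 3, 4}
B = {3, 4, 5}

assert len(A | B) <= len(A) + len(B)                   # subadditivity of union
assert len(A ^ B) == len(A) + len(B) - 2 * len(A & B)  # symmetric difference

C = B - A          # C is disjoint from A, and A | C == A | B
assert len(A | C) == len(A) + len(C)                   # disjoint-union additivity
```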
13.4.12 Theorem: There is a unique function φ : ω → ω with φ(0) = 1 and ∀i ∈ ω, φ(i + 1) = φ(i) + φ(i).
must be elements of F_n, for which uniqueness is assumed by inductive hypothesis, and the value of φ(n + 1) is then uniquely determined by the value of φ(n). Therefore #(F_{n+1}) = 1. Thus 0 ∈ S and n + 1 ∈ S whenever n ∈ S. So by Theorem 12.2.11, S = ω.
Now let φ = ⋃{ψ; ∃n ∈ ω, ψ ∈ F_n} = ⋃_{n∈ω} ⋃F_n. Then φ : ω → ω is a well-defined function because φ_n ⊆ φ_{n+1} for all n ∈ ω, where φ_n denotes the unique function in F_n. The uniqueness of φ : ω → ω satisfying φ(0) = 1 and ∀i ∈ ω, φ(i + 1) = φ(i) + φ(i) follows easily by induction.
13.4.13 Definition: The nth power of 2, for n ∈ ω, is the value of φ(n), where φ : ω → ω is the function which satisfies φ(0) = 1 and ∀i ∈ ω, φ(i + 1) = φ(i) + φ(i).
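This recursion uses only the addition operation already constructed, yet computes 2^n. A minimal Python sketch (the name phi is purely illustrative):

```python
def phi(n):
    """The nth power of 2, built from repeated addition (doubling) alone."""
    value = 1                   # phi(0) = 1
    for _ in range(n):
        value = value + value   # phi(i + 1) = phi(i) + phi(i)
    return value

assert [phi(n) for n in range(6)] == [1, 2, 4, 8, 16, 32]
assert all(n <= phi(n) for n in range(20))   # cf. Theorem 13.4.15
```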
13.4.15 Theorem: ∀n ∈ ω, n ≤ 2^n.
and implications where it is difficult to find a foothold. Unfortunately, any attempt to untangle the Gordian knots of infinity concepts with magic axioms leads to a kind of undecidability denial. Some things just can't be decided. There is some short-term benefit in denying the non-existence of bijections when they are sorely wanted. But in the longer term, learning to live with undecidability is best.
13.5.2 Definition: An infinite set is a set which is not finite.
13.5.3 Remark: Pseudo-notation for infiniteness of sets.
Notation 13.5.4 is a pseudo-notation, based on Notation 13.3.8, which means that a set is not finite. The
apparent equality in this expression has very limited validity.
13.5.4 Notation: #(S) = ∞, for a set S, means that S is an infinite set.
13.5.5 Remark: The ambiguous boundary between finite and infinite.
It cannot be proved within ZF set theory that every infinite set (according to Definition 13.5.2) includes a subset which is equinumerous to the set of ordinal numbers ω. This is an extremely inconvenient fact. Definition 13.5.2 says what an infinite set is not, not what it is. The existence of an infinite sequence of distinct elements in an infinite set can be proved with the assistance of the axiom of countable choice, but this is useless because axioms of choice never deliver concrete examples of the choice functions which are claimed to exist.
One way to resolve this issue would be to simply define an infinite set to mean a set which includes a set equinumerous to ω, but a non-standard meaning for a standard definition generally creates more problems than it solves, in this case because there would still remain a mystery as to what lies in between the finite and the infinite. (With the standard definition of infinite, the mystery relates to what lies between the finite and the ω-infinite.)
Definition 13.5.6 introduces the non-standard class of ω-infinite sets, which is fairly close to an intuitive
idea of an infinite set, and also the standard countable and uncountable cardinality classes. (The relations
between these classes are illustrated in Figure 13.5.1 with rough indications of their meanings.)
[Figure 13.5.1. Relations between the finite, infinite, ω-infinite, countable and uncountable classes of sets.]
By Theorem 13.8.6, a ZF set is ω-infinite if and only if it is Dedekind-infinite. However, these concepts have an important philosophical difference. The test for the ω-infinite class is comparative or extrinsic, in the sense that the set must be shown to be equinumerous to a given yardstick set ω. The test for
the Dedekind-infinite class is intrinsic because it stipulates only the existence of a particular kind of set
endomorphism. (See Definition 10.5.19 for set endomorphisms.) Moore [321], page 24, says the following.
(See Definition 13.8.3 for Dedekind-finite sets.)
In 1882 Cantor still did not believe that a simple definition of finite set was necessary or even
possible. Therefore he was surprised when Dedekind communicated his own definition [. . . ], the
first adequate definition of finite set which did not presuppose the natural numbers.
13.5.7 Theorem:
(i) ω is an infinite set.
(ii) Every ω-infinite set is infinite.
(iii) Every subset of a countable set is countable.
Proof: For part (i), suppose that ω is a finite set. Then there exists a bijection f : n → ω for some n ∈ ω by Definition 13.3.2. Then f⁻¹ : ω → n is a bijection. So f⁻¹ : ω → n is an injection. But this is impossible by Theorem 13.3.13 (iii). Therefore ω is not a finite set. Hence ω is an infinite set by Definition 13.5.2.
For part (ii), let S be an ω-infinite set. Suppose that S is finite. Then by Definition 13.3.2, there exists a bijection g : n → S for some n ∈ ω. But by Definition 13.5.6, there exists an injection f : ω → S. Then g⁻¹ ∘ f : ω → n is an injection. But this is impossible by Theorem 13.3.13 (iii). Therefore S is not a finite set. Hence S is an infinite set by Definition 13.5.2.
Part (iii) follows from Theorem 12.3.8.
13.5.8 Remark: Extending the finite set cardinality notation to countably infinite sets.
Definition 13.3.5 and Notation 13.3.6 may now be extended from finite sets to include countably infinite sets
as in Definition 13.5.9 and Notation 13.5.10. It is easy to verify that these definitions and notations are
compatible with each other, and also with Notations 13.1.5 and 13.1.11.
13.5.9 Definition: The numerosity or cardinality of a countably infinite set S is the set ω.
13.5.10 Notation: #(S), for a countably infinite set S, denotes the numerosity (or cardinality) of S. In
other words, #(S) = ω if and only if S is a countably infinite set.
13.5.13 Theorem:
(i) A set S is finite if and only if ∃n ∈ ω, ∃f : S → n, f is injective.
(ii) A set S is finite if and only if ∃n ∈ ω, ∃f : n → S, f is surjective.
(iii) A set S is countable if and only if ∃f : S → ω, f is injective.
(iv) A set S is countable if and only if S = ∅ or ∃f : ω → S, f is surjective.
Proof: For part (i), let S be a set which satisfies ∃n ∈ ω, ∃f : S → n, f is injective. Then there exists an
injective function g : S → n for some n ∈ ω. The set g(S) is a well-defined subset of ω. Suppose that g(S) = ∅.
(In other words, n = 0.) Then S = ∅. So the empty function from n to S with n = ∅ satisfies Definition 13.3.2
(because the empty function is a bijection from ∅ to ∅). Now suppose that g(S) ≠ ∅. Then n ≥ 1 and S ≠ ∅.
Define a function h : ω → ω inductively by h(0) = min(g(S)) and h(i) = min{j ∈ g(S) ∪ (ω \ n) ; j > h(i−1)}
for all i ∈ ω \ {0}. This function is well defined because {j ∈ g(S) ∪ (ω \ n) ; j > h(i−1)} is a well-defined non-
empty subset of ω if h(i−1) is well defined. Let m = {k ∈ ω ; h(k) ∈ n}. Define φ : m → S by φ = g⁻¹ ∘ h.
Then m ∈ ω and φ is a bijection. Therefore S is finite by Definition 13.3.2. The converse follows immediately
from Definition 13.3.2.
For part (ii), let S be a set which satisfies ∃n ∈ ω, ∃f : n → S, f is surjective. Then there exists a surjective
function f : n → S for some n ∈ ω. Define g : S → ω by g(x) = min{i ∈ n ; f(i) = x} for all x ∈ S. This
is well defined because the set f⁻¹({x}) = {i ∈ n ; f(i) = x} is a non-empty subset of ω, which contains a
well-defined minimum because the standard order on ω is a well-ordering. To see that the function g is injective,
let x, y ∈ S satisfy g(x) = g(y). Then g(x) = min(f⁻¹({x})) = min(f⁻¹({y})) = g(y). So g(x) ∈ f⁻¹({y})
and g(y) ∈ f⁻¹({x}). Therefore f(g(x)) = y and f(g(y)) = x. So x = f(g(y)) = f(g(x)) = y. Hence g is
injective. But Range(g) ⊆ n. So g : S → n. Hence S is finite by part (i). The converse assertion follows
directly from Definition 13.3.2.
For part (iii), let S be a set which satisfies ∃f : S → ω, f is injective. Let Y = Range(f). Then Y ⊆ ω.
So there exists a bijection g : N → Y for some set N ∈ ω⁺ by Theorem 12.3.8. Then g⁻¹ ∘ f : S → N is
a bijection. Therefore S is a countable set by Definition 13.5.6. Now suppose that S is countable. Then
∃N ∈ ω⁺, ∃g : N → S, g is bijective by Definition 13.5.6. Let f = g⁻¹. If N = ω, then f : S → ω is a
bijection which satisfies the proposition ∃f : S → ω, f is injective. Otherwise, N ∈ ω and f : S → N is
an injection which satisfies the proposition.
For part (iv), let S satisfy ∃f : ω → S, f is surjective. Let Y = {j ∈ ω ; ∀i ∈ ω, (i < j ⇒ f(i) ≠ f(j))}.
Then f|Y : Y → S is a bijection. There exists a bijection g : N → Y for some set N ∈ ω⁺ by Theorem 12.3.8.
Then f|Y ∘ g : N → S is a bijection. Therefore S is a countable set by Definition 13.5.6. Now suppose that
S is countable. Then ∃N ∈ ω⁺, ∃g : N → S, g is bijective by Definition 13.5.6. If N = 0, then S = ∅. So
assume that N ≠ 0. Define f : ω → S by f(i) = g(i) for all i ∈ N, and f(i) = g(0) for all i ∈ ω \ N. Then f
is surjective.
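The constructions in parts (ii) and (iv) are effectively computable. As an illustrative sketch (not from the book; Python lists and dictionaries stand in for the set-theoretic functions, and the function names are invented for the illustration), the following computes the injection g(x) = min{i ; f(i) = x} from part (ii) and the first-occurrence index set Y from part (iv):

```python
# Sketch of the constructions in Theorem 13.5.13 (ii) and (iv),
# with a surjection f : n -> S represented as a Python list.
def injection_from_surjection(f):
    """Return g with g[x] = min{i : f[i] = x}, as in part (ii)."""
    g = {}
    for i, x in enumerate(f):    # indices scanned in increasing order,
        g.setdefault(x, i)       # so the first hit is the minimum
    return g

def first_occurrence_indices(f):
    """Return Y = {j : f(i) != f(j) for all i < j}, as in part (iv)."""
    return [j for j, x in enumerate(f) if x not in f[:j]]

f = ['a', 'b', 'a', 'c', 'b']    # a surjection from 5 onto {a, b, c}
g = injection_from_surjection(f)
Y = first_occurrence_indices(f)
```

Restricting f to the index set Y gives the bijection f|Y of part (iv), just as composing g with the inclusion Range(g) ⊆ n gives the injection of part (ii).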
13.5.14 Remark: The relations between some definitions of infinite.
Definition 13.5.6 means that an ω-infinite set is a set which contains a subset which is equinumerous to ω.
Every ω-infinite set is infinite. The converse follows with the assistance of the axiom of countable choice.
(See Definition 13.5.19 and Theorem 13.8.10.) It is not easy to construct a set which is infinite but not
ω-infinite. There are ZF models in which all infinite sets are ω-infinite, and other ZF models in which some
infinite sets are not ω-infinite. (See particularly Levy [373] for the latter.)
The following table expresses some classes of finite and infinite sets in terms of Notations 10.5.23 and 13.3.11.
class                 definition                     enumeration space
finite                ∃n ∈ ω, Bij(n, S) ≠ ∅          Enum(ω, S) ≠ ∅
infinite              ∀n ∈ ω, Bij(n, S) = ∅          Enum(ω, S) = ∅
ω-infinite            Inj(ω, S) ≠ ∅
countably infinite    Bij(ω, S) ≠ ∅                  Enum({ω}, S) ≠ ∅
countable             ∃n ∈ ω⁺, Bij(n, S) ≠ ∅         Enum(ω⁺, S) ≠ ∅
The relations between these classes of finite and infinite sets are illustrated in Figure 13.5.2.
13.5.15 Remark: The need to be able to produce the maps which verify infinite cardinality.
Merely knowing that a set S satisfies the proposition ∃n ∈ ω, ∃f : n → S, f is bijective is not sufficient
for some practical purposes. Often one must be in possession of the number n and the map f if one wishes
to use the finiteness of the set in a proof or calculation.
This is a general ontological issue regarding propositions which contain existential quantifiers. It is not
enough to merely know that something exists. One must be able to produce an explicit example on demand.
One may think of a universal quantifier as meaning that no matter what example you produce, I can show
that my assertion about it is true. An existential quantifier, by contrast, may be thought of as meaning:
[Figure diagram: the class of all sets S, the ω-infinite sets (Inj(ω, S) ≠ ∅), and the countable sets (∃n ∈ ω⁺, Bij(n, S) ≠ ∅).]
Figure 13.5.2 Relations between classes of finite and infinite sets (in ZF set theory)
If you require an explicit example, I can produce one for you. This is a kind of constructivism. Or maybe
it is productivism, or production on demand.
13.5.16 Remark: Cartesian products of finite and countable families of non-empty sets.
The non-emptiness of Cartesian products of general families of non-empty sets requires the axiom of choice.
(See Theorem 10.11.10.) For finite families, Zermelo-Fraenkel set theory suffices. For general countably
infinite families, the axiom of countable choice is required. Of course, there are many explicit means of
showing non-emptiness of Cartesian products of infinite set-families which are not too unusual. For example,
if a well-ordering is defined on the union of the range of the set-family, an element of the Cartesian product
may be chosen as the map from each index value to the minimum of the corresponding non-empty set.
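The well-ordering trick in this remark is directly computable when the ambient well-ordering is available. A minimal sketch (Python, not from the book; the usual order on the natural numbers plays the role of the well-ordering, and the family is a finite surrogate for a countably infinite one):

```python
# Sketch of the explicit choice described in Remark 13.5.16: choose the
# minimum of each non-empty set under a well-ordering of the union of
# the range of the set-family, with no choice axiom required.
def choice_by_minimum(family):
    """Map each index i to min(family[i]) under the ambient well-ordering."""
    return {i: min(A) for i, A in family.items()}

family = {0: {3, 7}, 1: {2}, 2: {9, 4, 6}}   # non-empty sets of naturals
chooser = choice_by_minimum(family)           # an element of the product
```

The resulting map i ↦ min(Sᵢ) is one explicit element of the Cartesian product of the family.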
⟪ 2017-3-17. Near here, also give a finite choice theorem in the style of Definition 7.11.10 item (9). ⟫
13.5.17 Theorem: Finite choice theorem for Cartesian products.
The Cartesian product of a finite family of non-empty sets is non-empty.
Proof: Let (Si)i∈n be a family of non-empty sets for some n ∈ ω. If n = 0, then ⨉i∈n Si = {∅} ≠ ∅ by
Theorem 10.11.6 (i). Assume that ⨉i∈k Si ≠ ∅ for some k ∈ n. Then {f : k → ⋃i∈k Si ; ∀i ∈ k, fi ∈ Si} ≠ ∅
by Definition 10.11.2. So there exists a function f : k → ⋃i∈k Si such that ∀i ∈ k, fi ∈ Si. But k + 1 ⊆ n.
So Sk ≠ ∅. So ∃x, x ∈ Sk. Then f′ = f ∪ {(k, x)} is a function f′ : k + 1 → ⋃i∈k+1 Si which satisfies
∀i ∈ k + 1, f′i ∈ Si. So f′ ∈ ⨉i∈k+1 Si. Therefore ⨉i∈k+1 Si ≠ ∅. Hence by induction, ⨉i∈n Si ≠ ∅.
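The induction in this proof can be mirrored directly as a computation: the partial choice function f on k is extended to f′ = f ∪ {(k, x)} by adjoining one witness x ∈ Sₖ. A minimal sketch (Python, not from the book; `next(iter(S))` stands in for the existential "∃x, x ∈ Sₖ"):

```python
# Sketch of the induction in Theorem 13.5.17: extend a choice function
# on k to one on k + 1 by adjoining a single pair (k, x) with x in S_k.
def finite_choice(sets):
    """Return a tuple (f_0, ..., f_{n-1}) with f_i in sets[i]."""
    f = ()                    # the empty function, for the case n = 0
    for S in sets:
        x = next(iter(S))     # any witness x with x in S (S is non-empty)
        f = f + (x,)          # f' = f union {(k, x)}
    return f

sets = [{1, 2}, {3}, {4, 5}]
f = finite_choice(sets)       # an element of the finite Cartesian product
```

No choice axiom is invoked: the loop performs finitely many concrete selections, which is exactly what the ZF induction provides.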
13.5.18 Remark: The countable axiom of choice.
It may seem somewhat incongruous that an apparently very basic set theorem is introduced so long after the
basic set theory definitions in Chapter 7. This is an inevitable consequence of the need to define concepts
before they are used because finiteness is a fairly high-level concept, not part of the ZF axiom framework.
The unfundamental character of the finite choice theorem exposes also how unfundamental the countable
choice axiom is. One would usually define all axioms up front and then start developing the theory from
there. But the countable choice axiom requires a fair amount of theoretical development before countability
of sets can be defined. Finiteness requires even more theoretical development.
13.5.19 Definition [mm]: Zermelo-Fraenkel set theory with countable axiom of choice is Zermelo-Fraenkel
set theory as in Definition 7.2.4, together with the additional Axiom (9′).
(9′) The countable choice axiom: For every countable set X,
(∀A, (A ∈ X ⇒ ∃x, x ∈ A)) ∧ (∀A, ∀B, ((A ∈ X ∧ B ∈ X) ⇒ (A = B ∨ ¬∃x, (x ∈ A ∧ x ∈ B))))
⇒ ∃C, ∀A, (A ∈ X ⇒ ∃z, (z ∈ A ∧ z ∈ C ∧ ∀y, ((y ∈ A ∧ y ∈ C) ⇒ y = z))).
In other words, for every countable set X,
(∅ ∉ X ∧ ∀A, B ∈ X, (A = B ∨ A ∩ B = ∅)) ⇒ (∃C, ∀A ∈ X, ∃! z ∈ C, z ∈ A).   (13.5.1)
In other words, every countable pairwise disjoint collection of non-empty sets is choosable.
13.5.20 Theorem [zf+cc]: The Cartesian product of a countable family of non-empty sets is non-empty.
Proof: The assertion may be proved as for Theorem 10.11.10, with the difference that the family of
non-empty sets is required to be countable.
13.6.2 Remark: The union of an explicit countably infinite family of finite sets is countable.
It is fairly simple to show that the union of an explicit countably infinite family of finite sets is countable by
first defining a natural injection from the union of the family to the set ω × ω (e.g. by mapping each finite set
to a row of the doubly infinite array), and then exploiting an explicit bijection between ω × ω and ω such as
is presented in Theorem 13.7.2. This shows that there exists a surjection from a subset of ω to the union of
the countably infinite family. From this, it is possible to deduce that the union is either finite or countably
infinite. However, there is no need to use such abstract methods when concrete methods are available. The
proof of Theorem 13.6.4 inductively constructs an explicit bijection for a given countably infinite family of
finite sets.
Proof: Let S = (Si)i∈ω be an explicit countably infinite family of finite sets. Then there are functions
n : ω → ω and β : ω → ⋃i∈ω Bij(ni, Si) such that βi : ni → Si is a bijection for all i ∈ ω. Inductively define
functions I : ω → ω⁺, J : ω → ω⁺ and g : ω → ⋃(Range(S)) by specifying g(0) together with the recursion rules:
13.6.6 Theorem: The union of a countable family of finite subsets of a totally ordered set is countable.
Proof: Let X be a totally ordered set with order <. Let S : ω → ℘(X) be such that Si is finite for
all i ∈ ω. Then there exists a unique function n : ω → ω such that there exists a bijection from ni to Si for
all i ∈ ω. From amongst the possible bijections for each i ∈ ω, choose the unique bijection βi : ni → Si
which preserves order. In other words, ∀i ∈ ω, ∀j, k ∈ ni, (j < k ⇒ βi(j) < βi(k)). (Thus βi(0) = min(Si),
βi(1) = min(Si \ {min(Si)}), and so forth by induction.) Then S is an explicit countably infinite family of
finite sets by Definition 13.6.3. Hence ⋃Range(S) is a countable set by Theorem 13.6.4.
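For a finite truncation of such a family, the order-preserving bijections βi are simply the sorted enumerations of the sets, and the union can then be enumerated row by row, discarding repeats, as in the proof of Theorem 13.6.4. A sketch (Python, not from the book; a finite list of finite sets of integers stands in for the countably infinite family, and the usual integer order stands in for the total order on X):

```python
def enumerate_union(family):
    """Enumerate the union of finite subsets of a totally ordered set,
    using the unique order-preserving bijection (sorting) for each set."""
    beta = [sorted(S) for S in family]  # beta_i : n_i -> S_i, order-preserving
    seen, out = set(), []
    for row in beta:                    # row by row, as in Theorem 13.6.4
        for x in row:
            if x not in seen:           # discard elements already enumerated
                seen.add(x)
                out.append(x)
    return out

family = [{5, 1}, set(), {1, 8, 2}, {8}]
union_enum = enumerate_union(family)
```

No choice is invoked: `sorted` plays the role of the canonical (order-preserving) selection of a bijection for each set.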
13.6.7 Remark: Weaker conditions to obtain countable union of finite sets theorem.
Whereas Theorem 13.6.4 requires an explicit countably infinite family of sets in order to avoid invoking
a choice axiom, Theorem 13.6.6 requires the target set (the union of the range of the set map) to be
totally ordered, which is a stronger requirement. (The countable choice axiom version of the theorem is
Theorem 13.6.12.) The morbidly curious reader might like to know that the result can be obtained from
a choice axiom, denoted C(ω, < ω), which is weaker than either the countable choice axiom or the total
order requirement. The relations between these requirements are illustrated in Figure 13.6.1. (A much more
detailed diagram is given by Moore [321], page 324.)
[Figure diagram: the axiom of choice [1] and weaker choice axioms, connected by proved and disproved implications.]
Figure 13.6.1 Implications and impossible implications between some ZF choice axioms
The arrows in Figure 13.6.1 mean that the implication can be proved in ZF set theory. The crossed arrows
mean that there is a known ZF model in which the implication is false. (The numbers in square brackets
are the form numbers in Howard/Rubin [312], pages 964.) The notation UT(ω, < ω, ω) means that the
union of a countable family of finite sets is countable. (This is essentially the same as Theorem 13.6.12.)
It can be seen in Figure 13.6.1 that form 30, the existence of total (i.e. linear) orderings on all sets, implies
form 10, the choice axiom C(ω, < ω), which means that there exists a choice function for any countable
family of finite sets. But this is equivalent to UT(ω, < ω, ω) (form 10A), which is the desired theorem. The
diagram also shows that the countable choice axiom (CC), denoted here as C(ω, ∞) (form 8), implies the
required weaker choice axiom C(ω, < ω). (This topic is also mentioned in Remark 33.4.20.)
The reader who desires further gory details may consult Howard/Rubin [312]. One point of possible further
interest is that their form 14 implies form 49, which implies form 10. Thus Theorem 13.6.12 can be
proved in ZF with the additional axiom known as the boolean prime ideal theorem (form 14), which states
that every boolean algebra has a prime ideal. This subject is, luckily, outside the scope of this book. (See
Remark 1.6.3 part (1) for this scope constraint.)
13.6.8 Remark: Implicit versus explicit countable families of finite sets. Quantifier swapping.
A countably infinite family of finite sets has the form S : ω → ℘(X) for some map S and set X. (By the
replacement axiom, the set X can always be chosen as ⋃Range(S), which is necessarily a ZF set.) The finite
set condition in the implicit case in Theorem 13.6.12 may be written as
∀i ∈ ω, ∃n ∈ ω, Bij(n, Si) ≠ ∅.   (13.6.1)
In terms of Notation 13.3.11, this condition may be written as ∀i ∈ ω, Enum(ω, Si) ≠ ∅. The explicit case
in Definition 13.6.3 and Theorem 13.6.4 may be written as
∃n : ω → ω, ∃β : ω → ⋃i∈ω Bij(ni, Si), ∀i ∈ ω, βi ∈ Bij(ni, Si).   (13.6.2)
Thus the implicit form in line (13.6.1) specifies only that the set of bijections Bij(n, Si) is non-empty, whereas
the explicit form in line (13.6.2) makes a choice βi for the bijection in each set of bijections Bij(ni, Si).
In a very rough way, one may say that the provision of a choice function enables the quantifiers ∀i and ∃n
to be swapped. When a choice function is available, one may write ∃n, ∀i instead of ∀i, ∃n. Slightly
more precisely, a proposition of the form ∀i, ∃n, ∃β, β ∈ Bij(n, Si) can be replaced with a proposition of
the form ∃n, ∃β, ∀i, βi ∈ Bij(ni, Si). Thus the existential variables n and β are replaced by explicit
choice functions n and β when the quantifiers are swapped.
13.6.10 Remark: The union of a countably infinite family of finite sets is countable.
Some kind of axiom of countable choice is required to be added to the ZF axioms in order to prove that the
union of a countably infinite family of finite sets is countable. A succinct and definitive account of this issue
is given by Howard/Rubin [312], pages 19–20. (See under Form 10.) Theorem 13.6.12 is in fact equivalent
(modulo the ZF axioms) to the axiom that every countably infinite family of non-empty finite sets has a
choice function. (This is weaker than the full axiom of countable choice.)
It is assumed in Theorem 13.6.12 that all of the sets in the family are subsets of a single given set. This is
unnecessary because the ZF replacement axiom guarantees that the union of the range of the family is a set.
(See Section 7.7 for the replacement axiom.)
13.6.11 Theorem [zf+cc]: Let S be a countably infinite family of finite sets. Then S is an explicit
countably infinite family of finite sets.
Proof: Let S be a countably infinite family of finite sets. Then as discussed in Remark 13.6.8, there is a
set X such that S : ω → ℘(X), and ∀i ∈ ω, ∃n ∈ ω, Bij(n, Si) ≠ ∅. There is a unique function n : ω → ω
such that ∀i ∈ ω, Bij(ni, Si) ≠ ∅, because Bij(n, Si) can be non-empty for at most one value of n ∈ ω. Then
by the axiom of countable choice, there is a map β : ω → ⋃i∈ω Bij(ni, Si) such that ∀i ∈ ω, βi ∈ Bij(ni, Si).
Hence S is an explicit countably infinite family of finite sets by Definition 13.6.3.
13.6.12 Theorem [zf+cc]: For some set X, let S : ω → ℘(X) be such that S(i) is a finite set for all i ∈ ω.
Then ⋃Range(S) is a finite or countably infinite set. In other words, for any set X, for any S : ω → ℘(X),
(∀i ∈ ω, ∃n ∈ ω, ∃g : n → S(i), g is a bijection) ⇒ (∃m ∈ ω⁺, ∃h : m → ⋃Range(S), h is a bijection).
[Diagram: on the left, ZF axioms, input conditions and a choice axiom feed a ZF theorem, which yields an output assertion; on the right, ZF axioms, input conditions and a user-provided choice function feed a ZF theorem, which yields the same output assertion.]
the same theorems are obtained, but a clean interface (to use the engineering term) is created between
ZF and ZF+AC theorems so that mathematicians who prefer a more constructive style of mathematics can
avoid the most non-constructive theorems.
Even if ZF and ZF+AC theorems are not split in this way, one may regard all AC-invoking theorems as
being virtually split. In other words, one may interpret all AC-invoking theorems as having an invisible
extra condition which states that the user must provide whatever choice function is required, and an invisible
shim theorem which invokes an AC axiom to provide the choice function. When an AC axiom is invoked
in a proof with language such as "by the axiom of choice, X is true", one may replace this with "using
the provided choice function, X is true". Each invocation of an AC axiom creates an implicit IOU which
must be paid by the user of the theorem. Such invocations are place-holders which must be filled with user-
provided choice functions. Thus choice axiom invocation becomes a matter of linguistic style which does not
materially affect the structure of mathematical proofs. Belief in choice axioms then becomes a matter of
personal interpretation and faith. Non-believers understand that choice functions must really be provided.
Further clear examples of shim theorems are Theorems 23.5.9 and 23.5.10, which are front ends for
Theorems 23.5.3 and 23.5.7 respectively.
13.6.14 Remark: Tooth fairies, Father Christmas, the Easter Bunny, and the axiom of choice.
The axiom of choice is comparable to the tooth fairy myth. Parents used to tell young children in Britain
in the olden days that when they lost a tooth, they could put it under their pillow at night and a tooth
fairy would replace the tooth with a sixpence. Of course, with the benefit of hindsight, it was the parents
who had to remove the tooth and put a sixpence under the pillow while the child slept. No tooth fairy ever
delivered a sixpence. And no axiom of choice ever delivered a choice function.
Similarly, Father Christmas supposedly put presents under the Christmas tree, but actually the parents
had to do this while the child was asleep. Another such myth was the Easter Bunny, which supposedly hid
coloured eggs in scattered locations around the back garden each Easter for the children to find. In each case,
the child perceived this as magic or supernatural, but ultimately it was the adults who had to provide the
sixpence, the presents and the Easter eggs. The axiom of choice is entirely analogous because it supposedly
delivers choice functions, but in every case, the concrete choice function which is found under the pillow,
under the tree, or in the garden, is put there by an adult. This is the moral of the story in Remark 13.6.13.
Moore [321], pages 200–201, makes the following comment about the controversy related to this issue in
about 1918. (Here "the Axiom" means the axiom of choice.)
Much of the controversy surrounding the Axiom, Sierpinski contended, arose from divergent interpretations
of its meaning. In this vein he considered the name "Axiom of Choice" to be inappropriate.
For the Axiom did not enable a mathematician to choose an element effectively even from
a single non-empty set. In fact there existed sets, proven non-empty by the Axiom, in which it was
not known how to determine a single element uniquely, that is, to give a characteristic property
distinguishing this element from all others in the set. One such set was that of all non-measurable
subsets of ℝ. On the other hand he knew of no set, shown to be non-empty without the Axiom, in
which a particular element could not be determined.
Exactly! When a set is defined without the axiom of choice, the same human being who specified the set
can generally choose at least one element of the set. But when the set is delivered by the axiom of choice,
there is no such information. Effectively the ZF axioms (except extension and regularity) merely anoint sets
whose identity is already known, accepting such sets into the arena of ZF set theory. The axiom of choice
does not anoint a human-made set, but rather claims that a set exists without anyone knowing what it is.
[Figure diagram: a triangular enumeration (left) and a rectangular enumeration (right).]
Figure 13.7.1 Two enumeration styles for ω × ω
Proof: For part (i), define a relation < on ω × ω as in line (13.7.1). Let (i1, j1), (i2, j2) ∈ ω × ω. Then
either i1 ∪ j1 ⫋ i2 ∪ j2 or i1 ∪ j1 = i2 ∪ j2 or i1 ∪ j1 ⫌ i2 ∪ j2, and only one of these cases can occur, because
set inclusion is a total order on ω. If i1 ∪ j1 ⫋ i2 ∪ j2, then (i1, j1) < (i2, j2) and (i2, j2) ≮ (i1, j1). Similarly,
if i1 ∪ j1 ⫌ i2 ∪ j2, then (i2, j2) < (i1, j1) and (i1, j1) ≮ (i2, j2). Let k = i1 ∪ j1 = i2 ∪ j2. Then k ∈ ω,
and i1 ⊆ k, j1 ⊆ k, i2 ⊆ k and j2 ⊆ k. Since set inclusion is a total order on ω, either i1 ⫋ i2 or i1 = i2
or i1 ⫌ i2, and only one of these cases can occur. Suppose that i1 ⫋ i2. Then j1 ⊇ j2. So (i2, j2) < (i1, j1)
and (i1, j1) ≮ (i2, j2). Suppose that i1 = i2. Then either j1 ⫋ j2 or j1 = j2 or j1 ⫌ j2, and only one of these
cases can occur. If j1 ⫋ j2, then (i1, j1) < (i2, j2) and (i2, j2) ≮ (i1, j1). If j1 = j2, then (i1, j1) = (i2, j2)
and (i1, j1) ≮ (i2, j2) and (i2, j2) ≮ (i1, j1). If j1 ⫌ j2, then (i1, j1) ≮ (i2, j2) and (i2, j2) < (i1, j1). Finally,
suppose that i1 ⫌ i2. Then j1 ⊆ j2. So (i1, j1) < (i2, j2) and (i2, j2) ≮ (i1, j1). In all cases, exactly one of
the three propositions (i1, j1) < (i2, j2), (i1, j1) = (i2, j2) and (i2, j2) < (i1, j1) holds. Hence < is a strict
total order on ω × ω.
For part (ii), first note that the case conditions i ⫌ j, i ⊆ j and i ≠ 0, and i = 0 constitute a partition
of the elements of ω × ω. (In other words, one and only one of them is true for each element of ω × ω.)
The ordinal number j + 1 is well defined as j ∪ {j} for all j ∈ ω by Theorem 12.1.34 (ii). The ordinal
number i − 1 is well defined as ⋃i for all i ∈ ω \ {0} by Theorem 12.1.34 (iii). So the value assigned by
line (13.7.2) lies in ω × ω for all (i, j) ∈ ω × ω. Since h(0) ∈ ω × ω, it follows by induction that h(n) ∈ ω × ω
for all n ∈ ω. So h : ω → ω × ω is a well-defined function.
For part (iii), let n ∈ ω and h(n) = (i, j), and suppose that i ⫌ j. Then i ∪ j = i = i ∪ (j + 1) because
j + 1 ⊆ i, and h(n + 1) = (i, j + 1) by equation (13.7.2). Since j ⫋ j + 1, it follows from equation (13.7.1)
that (i, j) < (i, j + 1). Suppose that i ⊆ j and i ≠ 0. Then i − 1 ⫋ i, and so i ∪ j = j = (i − 1) ∪ j. So
(i, j) < (i − 1, j) by equation (13.7.1). Suppose that i = 0. Then i ∪ j = j and (j + 1) ∪ 0 = j + 1. So
(i, j) < (j + 1, 0) by equation (13.7.1). Thus h(n) < h(n + 1) for all of the cases in equation (13.7.2).
For part (iv), suppose that n1, n2 ∈ ω satisfy n1 < n2 and h(n1) ≥ h(n2). Let n2′ = min{n ∈ ω ; n > n1 and h(n1) ≥ h(n)}.
(In other words, n2′ is the minimum n > n1 with h(n1) ≥ h(n).) Then n2′ is a well defined element of ω
because {n ∈ ω ; n > n1 and h(n1) ≥ h(n)} is non-empty and ω is well ordered. Clearly n1 < n2′, and h(n1) ≥ h(n2′) and
h(n1) ≤ h(n2′ − 1). (Note that n2′ − 1 might equal n1. So equality cannot be excluded here.) So h(n2′ − 1) ≥ h(n2′).
But this contradicts part (iii). Therefore h(n1) < h(n2). Hence ∀n1, n2 ∈ ω, (n1 < n2 ⇒ h(n1) < h(n2)).
Part (v) follows from parts (ii) and (iii).
For part (vi), it is clear that Range(h) ⊇ 0 × 0. Suppose that Range(h) ⊇ k × k for some k ∈ ω. Then
(0, k) ∈ Range(h). So h(n) = (0, k) for some n ∈ ω, and so h(n + 1) = (k + 1, 0) ∈ Range(h). Therefore
(k + 1, j) ∈ Range(h) for all j ∈ ω with j ⊆ k + 1 by induction on j. (To see this, note that if (k + 1, j) ∈ Range(h)
and j < k + 1, then (k + 1, j + 1) ∈ Range(h) by the case i ⫌ j in line (13.7.2).) So (k + 1, k + 1) ∈ Range(h).
But then (i, k + 1) ∈ Range(h) for all i ∈ ω with i ⊆ k + 1. This follows by induction on i. (To see this, note
that if (i, k + 1) ∈ Range(h) and i ⊆ k + 1 with i ≠ 0, then (i − 1, k + 1) ∈ Range(h) by the case i ⊆ j and i ≠ 0 in
line (13.7.2).) Therefore (0, k + 1) ∈ Range(h). But all elements (i, j) of ω × ω which satisfy i ∪ j = k + 1 are
either of the form (i, k + 1) with i ⊆ k + 1, or (k + 1, j) with j ⊆ k + 1, and all of these elements are in Range(h).
So Range(h) ⊇ {(i, j) ∈ ω × ω ; (i ∪ j) = k + 1}. Then since Range(h) ⊇ k × k = {(i, j) ∈ ω × ω ; (i ∪ j) ∈ k},
it follows that Range(h) ⊇ {(i, j) ∈ ω × ω ; (i ∪ j) ∈ k + 1} = (k + 1) × (k + 1). By induction on k ∈ ω, it
follows that Range(h) ⊇ k × k for all k ∈ ω. Therefore Range(h) = ω × ω. So h : ω → ω × ω is a surjection.
Hence h : ω → ω × ω is a bijection by part (v).
Part (vii) follows from part (vi) and Definition 13.5.6.
This explicit enumeration is exploited in the proofs of Theorems 44.3.3 and 44.4.2.
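The single-step recursion of line (13.7.2) is a small state machine whose behaviour is easy to check computationally. A minimal sketch (Python, not from the book; the starting value h(0) = (0, 0) and the rendering of the ordinal relations as integer comparisons are assumptions of the simulation):

```python
def successor(p):
    """One step of the state machine in line (13.7.2) of Theorem 13.7.2."""
    i, j = p
    if i > j:            # case i properly contains j: climb the column
        return (i, j + 1)
    elif i != 0:         # case i subset of j, i != 0: move along the row
        return (i - 1, j)
    else:                # case i = 0: jump to the next 'shell' (j + 1, 0)
        return (j + 1, 0)

# Iterate from the assumed start h(0) = (0, 0).
values = [(0, 0)]
for _ in range(8):
    values.append(successor(values[-1]))
```

After 9 steps the orbit has visited exactly the 3 × 3 square, illustrating the claim of part (vi) that Range(h) ⊇ k × k for every k.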
A corresponding explicit enumeration for the rectangular enumeration of ω × ω which is illustrated on the
right of Figure 13.7.1, and has the state machine described in Theorem 13.7.2 (ii), can be defined explicitly
as the map n ↦ (I(n), J(n)), where the functions f, g, I, J : ω → ω are given by
∀n ∈ ω, g(n) = floor(n^(1/2))
∀n ∈ ω, f(n) = g(n)²
∀n ∈ ω, I(n) = g(n) + min(0, f(n) + g(n) − n)
∀n ∈ ω, J(n) = min(g(n), n − f(n)).
Unfortunately, this explicit enumeration cannot be used for the proof of Theorem 13.7.2 because it is based on
higher-level real-number algebraic functions and operations. The floor of the square root could be replaced
with an integer function, for example by defining g(n) = max{k ∈ ω ; k · k ≤ n}, but this requires the
definition of the integer multiplication operation. Similarly, the complicated expression for g in the triangular
enumeration case may be replaced with the formula g(n) = max{k ∈ ω ; k(k + 1) ≤ 2n}.
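As a concreteness check on these formulas, the rectangular enumeration n ↦ (I(n), J(n)) can be implemented directly, with `math.isqrt` playing the role both of floor(n^(1/2)) and of the integer characterisation max{k ; k·k ≤ n}. A sketch (Python, not from the book):

```python
import math

def rect(n):
    """Closed-form rectangular enumeration n |-> (I(n), J(n))."""
    g = math.isqrt(n)              # floor(n^(1/2)) = max{k : k*k <= n}
    f = g * g
    I = g + min(0, f + g - n)
    J = min(g, n - f)
    return (I, J)

def g_by_max(n):
    """The integer formulation g(n) = max{k : k*k <= n}."""
    return max(k for k in range(n + 2) if k * k <= n)

pairs = [rect(n) for n in range(16)]   # should fill the 4 x 4 square
```

The first (k+1)² values fill the (k+1) × (k+1) square, matching the rectangular 'shells' of the state machine in Theorem 13.7.2 (ii).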
It could be argued that as soon as addition and multiplication have been defined (using countable induction)
for the ordinal numbers, these explicit formulas for bijections from ω to ω × ω are at least as concrete and
explicit as the bijections presented in Theorem 13.7.2 (ii) and Remark 13.7.3. However, the expressions for
g(n) require the solution of a kind of maximisation problem, which is not fully concrete and explicit.
13.7.7 Theorem: Let (Si)i=1,…,n be a family of countably infinite sets for some n ∈ Z⁺. Then the Cartesian
set-product ⨉i=1,…,n Si is countably infinite.
Proof: The assertion follows from Theorem 13.7.6 and Definition 13.5.6.
Proof: For a proof of Theorem 13.7.9, see Pinter [327], page 148; Halmos [307], pages 91–92. See also
Moore [321], page 9, for a brief outline proof. (See Remark 13.7.10 for the excuse for the absence of a proof here.)
This is well defined because h(n) is defined for each n ∈ ω in terms of the values of h(k) for k ∈ n. Then the
set {(i, j) ∈ ω × ω ; ∀k ∈ n, h(k) < (i, j)} is a non-empty subset of ω × ω, which has a well-defined minimum.
Thus any well-ordering of ω × ω induces an enumeration on ω × ω.
The Dedekind-finite set definition is attractive because it uses a simple intrinsic rule instead of the extrinsic
machinery of finite ordinal numbers, but the magic wand of the axiom of choice is the greater evil. (See
Remark 7.11.13 (15) for further comments on Dedekind-finite sets. See Definition 13.3.2 for finite sets.)
13.8.2 Definition: A Dedekind-infinite set is a set S such that f (S) 6= S for some injection f : S S.
13.8.3 Definition: A Dedekind-finite set is a set S such that every injection from S to S is a surjection.
Proof: Let S be a finite set. Then there is a bijection f : N → S for some ordinal number N ∈ ω. Let
g : S → S be an injection. Then g′ = f⁻¹ ∘ g ∘ f : N → N is an injection. So g′ : N → N is a bijection by
Theorem 12.3.13 (i). Therefore g : S → S is a bijection. Hence S is Dedekind-finite.
Proof: Let S be a Dedekind-infinite set. Then there exists an injection f : S → S with f(S) ≠ S. So S \ f(S) ≠ ∅.
Therefore there exists x0 such that x0 ∈ S \ f(S). Let x1 = f(x0). Then x1 ∈ f(S) because x0 ∈ S, and
x1 ∉ f(f(S)) because x0 ∉ f(S) and f is injective. So x1 ∈ f(S) \ f(f(S)). Define the sequence of functions
(f^k)k∈ω inductively by f⁰ = idS and f^(k+1) = f ∘ f^k for all k ∈ ω. Define the sequence (xk)k∈ω inductively
by xk+1 = f(xk) for k ∈ ω, where x0 is as stated. Then it follows by induction that xk ∈ f^k(S) \ f^(k+1)(S)
for all k ∈ ω. Therefore (xk)k∈ω is an infinite sequence of distinct points in S. Hence S is ω-infinite.
To show the converse, let S be an ω-infinite set. Then there is an infinite sequence x = (xk)k∈ω with xk ∈ S
for all k ∈ ω, and xk ≠ xℓ for all k, ℓ ∈ ω with k ≠ ℓ. Define f : S → S by f(xk) = xk+1 for all k ∈ ω, and
f(z) = z for all z ∈ S \ Range(x). Then f is a well-defined function from S to S, and Range(f) = S \ {x0}.
Therefore f(S) ≠ S. Hence S is Dedekind-infinite.
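The converse construction in this proof (a shift along the marked injective sequence, identity elsewhere) can be simulated on a finite window. A sketch (Python, not from the book; a window of integers stands in for the infinite set S, with the non-negative integers playing the role of the sequence (xk) and the negatives playing the role of S \ Range(x)):

```python
def shift(z):
    """Dedekind map: f(x_k) = x_{k+1} on the marked sequence x_k = k >= 0,
    and f(z) = z off the sequence (here: the negative integers)."""
    return z + 1 if z >= 0 else z

window = range(-5, 5)                    # finite window into the 'infinite' set
image = {shift(z) for z in window}
# f is injective on the window, but 0 = x_0 is never attained: f(S) != S
```

The image omits exactly the first marked point x0 = 0, which is what makes the map injective but not surjective.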
13.8.7 Remark: The mysterious mediate cardinals.
Sets which are neither finite nor Dedekind-infinite were referred to as "mediate cardinals" by Whitehead/
Russell, Volume II [351], page 288, where "inductive" means the same as "finite", and "reflexive" means the
same as "Dedekind-infinite".
The following properties of cardinals which are neither inductive nor reflexive (supposing there are
such) are easily proved. Let us put
NC med = N₀C ∩ −NC induct ∩ −NC refl   Df,
Cls med = −Cls induct ∩ −Cls refl   Df,
where "med" stands for "mediate".
The authors then give some properties of these mediate cardinals. (The abbreviations in the quotation
have the following meaning. NC means the class of cardinal numbers [351, page 5]. N0 C means the class
of homogeneous cardinals [351, page 8]. Cls means class [350, page 198]. NC med means the class of
mediate cardinals. Cls med means the class of mediate sets. The symbol indicates the unary or binary
class complement operation. Df means that the left term is defined to mean the right-hand expression.)
13.8.8 Remark: Mediate cardinals versus finite and Dedekind-infinite sets.
The relations between mediate cardinals and the two most commonly used definitions of finite and infinite
sets are illustrated in Figure 13.8.1.
[Figure 13.8.1: finite sets, Dedekind-infinite (ω-infinite) sets, and the mediate cardinals lying between the finite and Dedekind-infinite cardinals.]
Proof: Let X be an infinite set. For N ∈ ω, define S_N = Inj(N + 1, X). Then S_N ≠ ∅ for all N ∈ ω
because X is infinite. So ∏_{N∈ω} S_N ≠ ∅ by the axiom of countable choice. Let φ ∈ ∏_{N∈ω} S_N. In other
words, φ : ω → ⋃_{N∈ω} S_N satisfies φ(N) ∈ Inj(N + 1, X) for all N ∈ ω. Now define a function h : ω → X as
the concatenation of all of the sequences φ(N). In other words, h is defined as follows.
Thus h(k) is the I(k)-th element in the finite sequence φ(N(k)) for each k ∈ ω. Then h is a countably
infinite sequence of elements of X, and a countably infinite subsequence of h is injective. Duplicates may be
removed from h to yield an injective countably infinite sequence f : ω → X as follows.
The set {j ∈ ω; ∀i ∈ ω, (i < j ⇒ h(i) ≠ h(j))} is guaranteed to be non-empty for all k ∈ ω. Thus the pair
(J(k), f(k)) is defined inductively for all k ∈ ω. Clearly f satisfies line (13.8.1).
Proof: Part (i) follows from Theorem 13.8.10 and Definition 13.5.6.
Part (ii) follows from part (i) and Theorem 13.8.6.
Part (iii) follows as the contrapositive of part (ii).
13.8.13 Remark: The benefits of substituting Dedekind infinity for traditional infinity.
In practice, when one demonstrates that a set is infinite, the core of the demonstration is typically a
construction of an injective map from the ordinal numbers to the set. This clearly implies that the set is
non-finite, and is therefore infinite according to Definition 13.5.2. But such a construction also has the
stronger consequence that the set is thereby proved to be ω-infinite.
It is difficult to think of any practical situation where a set is demonstrated to not be finite while
simultaneously there is no clear possibility of showing that the set is ω-infinite. Consequently, it seems
reasonable to replace the term infinite with either ω-infinite or the equivalent term Dedekind-infinite.
13.8.14 Remark: Relations between various definitions of infinite cardinality.
The relations between some notions of infinity are illustrated in Figure 13.8.2. (The arrows in Figure 13.8.2
indicate specialisation of definitions. See Definition 13.5.6 for ω-infinite, countable and uncountable sets.)
Here an uncountable ω-infinite set means an ω-infinite set which is not countably infinite, whereas an
uncountable infinite set means an infinite set which is not countable. Thus an uncountable infinite set in ZF
set theory could have mediate cardinality. This conundrum is illustrated in Figure 13.8.3.
[Figures 13.8.2 and 13.8.3: specialisation relations between finite, countably infinite, mediate and uncountably ω-infinite sets.]
It is not possible to say whether mediate cardinal sets are smaller or larger than countably infinite sets
because they are not comparable. A related issue arises with uncountable ω-infinite sets because these may
be comparable to standard uncountable yardstick sets, but they might not be comparable. In other words,
sets which are ω-infinite and not countably infinite are not necessarily comparable to some standard yardstick
set such as a general ordinal number or beta-set. (See Section 13.2 for beta-cardinality.)
13.8.15 Remark: Avoiding higher-order mediate cardinals.
Although an uncountable ω-infinite set is at least as big as ω, but not of the same size as ω, this leads to
further uncertainties which are similar to the mediate cardinality issue. (The existence of mediate cardinals
of higher degree was investigated in 1923 by Wrinch [388]. See Moore [321], pages 211–212.) To be really
sure, in a positive way, that a set is bigger than ω, a safe way to do this is to show that the set has a
subset which is equinumerous to P(ω). Similarly, to ensure that a set is bigger than P(ω), it is safest to
demonstrate that it contains a subset equinumerous to P(P(ω)), and so forth.
name        definition
I-finite    Every non-empty set of subsets of A has a maximal element.
Ia-finite   In every two-set partition of A, at least one of the two sets is I-finite.
II-finite   Every non-empty monotone set of subsets of A has a maximal element.
III-finite  The power set of A is Dedekind-finite.
IV-finite   A is Dedekind-finite.
V-finite    |A| = 0, or 2·|A| > |A|.
VI-finite   |A| = 0, or |A| = 1, or |A|² > |A|.
VII-finite  A is I-finite or not well-orderable.
Table 13.9.1   Variant definitions of finiteness
All of the definitions in Table 13.9.1 are set-theoretic. In other words, they are logical predicates in terms
of the equality and set-membership relations, not referring in any way to the set ω of ordinal numbers.
The I-finite set definition is equivalent to the standard ω-based finite set concept in Definition 13.3.2.
(This is shown in Theorem 13.9.4.) Only two of these finite set definitions feature prominently in applicable
analysis, namely the I-finite and IV-finite definitions.
These eight finite-set classes are distinct in ZF set theory. In other words, they define independent concepts.
They are not merely alternative definitions for a single concept. The definitions in Table 13.9.1 are sorted
from strongest to weakest. (This progression is illustrated in Figure 13.9.2.) Thus, for example, a set which
is I-finite is necessarily IV-finite in all ZF models. No pair of distinct concepts amongst these eight finite-set
concepts can be proven to be equivalent in ZF because for each pair, there exists a ZF model in which there
exist sets which satisfy one definition but not the other.
[Figure 13.9.2: progression of the eight finiteness definitions from strongest (I-finite) to weakest (VII-finite), with their corresponding infiniteness concepts and the mediate cardinals.]
All eight of these finite-set definitions are equivalent if the axiom of choice is adopted. (The relations between
these definitions are discussed in particular by Levy [373], pages 2–3; Howard/Rubin [312], pages 278–280;
Moore [321], pages 22–30, 130, 209–213, 276–277; Suppes [345], pages 98–108, 149. Some exercises for I-finite,
II-finite and IV-finite sets are given by Jech [314], page 52, for two Fraenkel models and one Mostowski model.
These are permutation models for ZF with atoms, which are called models N1, N2 and N3 respectively by
Howard/Rubin [312], pages 175–183. Note that the II-finite definition is also referred to as T-finite.)
The finiteness definitions I, II, III, IV and V were published in 1924 by Tarski [380], pages 49, 93, and the
forward implications were proved there. Proofs of all of the forward implications and reverse non-implications
in ZF set theory (with ur-elements) were given in 1958 by Levy [373], who attributed most of the results to
earlier papers by Mostowski and Lindenbaum. (The reverse non-implications are proved by counterexamples
which are based on countably infinite sets of ur-elements in Mostowski models.) A presentation of the model
theory underlying these results would be well within the scope of this book because the mediate cardinal
gap between I-finite and IV-finite sets (which is mentioned in Remark 13.9.6) has a pervasive influence on
topology, measure and integration, and mathematical analysis generally. (See Remark 7.11.13 for a sample
of the consequences of not adopting the countable axiom of choice.) However, a presentation of model theory
would significantly expand the size of this book and unacceptably delay its publication.
13.9.2 Theorem: Let S be a non-empty set of subsets of n for some n ∈ ω. Then S has a minimal element.
In other words, ∀n ∈ ω, ∀S ∈ P(P(n)) \ {∅}, ∃y ∈ S, ∀x ∈ S, (x ⊆ y ⇒ x = y).
Proof: Let n = 0 = ∅. Then P(n) = {∅} and P(P(n)) = {∅, {∅}}. So P(P(n)) \ {∅} = {{∅}}. So the only
possibility for S is {∅}. Then ∅ is clearly the minimal element of S.
Assume that the proposition is true for n = k ∈ ω. Let n = k + 1 = k ∪ {k}. Let S ∈ P(P(k + 1)) \ {∅}.
Then T ∈ P(k + 1) for all T ∈ S. Suppose that T ∈ P(k) for some T ∈ S. Let S′ = {U ∈ S; U ⊆ k}. Then
S′ ∈ P(P(k)) \ {∅}. So S′ has a minimal element U₀ by the inductive hypothesis for n = k. This is also
minimal for all of S because all elements U of S \ S′ contain the element k, which is not itself an element
of k. So U ⊈ U₀ for such U.
Now suppose T ∉ P(k) for all T ∈ S. In other words, k ∈ T for all T ∈ S. Let S′ = {U \ {k}; U ∈ S}.
Then S′ ∈ P(P(k)) \ {∅}. So S′ has a minimal element U₀ by the inductive hypothesis for n = k. But then
U₀ ∪ {k} is a minimal element of S because U ⊆ U₀ ∪ {k} if and only if U \ {k} ⊆ U₀ for all U ∈ S, since k ∈ U
for all such U.
Therefore in both cases, the proposition for n = k + 1 follows from the proposition for n = k. Hence the
proposition is true for all n ∈ ω by Theorem 12.2.11.
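Theorem 13.9.2 can be spot-checked by brute force for a small n (an illustrative test, not part of the book's proof): every non-empty collection of subsets of n = 3 should have a minimal element under inclusion.

```python
from itertools import chain, combinations

def subsets(X):
    """All subsets of X as frozensets."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1)
            for c in combinations(X, r)]

def has_minimal(S):
    """True if some y in S has no proper subset x in S."""
    return any(all(not (x < y) for x in S) for y in S)

P3 = subsets(range(3))  # the 8 subsets of 3 = {0, 1, 2}
# every one of the 255 non-empty collections of subsets of 3
collections = chain.from_iterable(
    combinations(P3, r) for r in range(1, len(P3) + 1))
all_have_minimal = all(has_minimal(S) for S in collections)
```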
13.9.3 Theorem: Let S be a non-empty set of subsets of n for some n ∈ ω. Then S has a maximal
element. In other words, ∀n ∈ ω, ∀S ∈ P(P(n)) \ {∅}, ∃y ∈ S, ∀x ∈ S, (y ⊆ x ⇒ y = x).
Proof: Let n ∈ ω, and let S ∈ P(P(n)) \ {∅}. Let S′ = {n \ U; U ∈ S}. Then S′ ∈ P(P(n)) \ {∅}. So S′
has a minimal element U₀′ ∈ S′ by Theorem 13.9.2. Let U₀ = n \ U₀′. Then U₀ is a maximal element of S
because U₀ ⊆ U for U, U₀ ∈ P(n) if and only if (n \ U) ⊆ (n \ U₀).
13.9.4 Theorem: A set A is equinumerous to a finite ordinal number if and only if every non-empty set
of subsets of A has a maximal element with respect to the relation of set inclusion. In other words,
(∃n ∈ ω, ∃f : n → A, f is a bijection) ⇔ (∀S ∈ P(P(A)) \ {∅}, ∃y ∈ S, ∀x ∈ S, (y ⊆ x ⇒ y = x)).
Proof: Suppose that f : n → A is a bijection for some n ∈ ω. Let S be a non-empty set of subsets of A.
Let S′ = {f⁻¹(U); U ∈ S}. Then S′ ∈ P(P(n)) \ {∅}. So S′ has a maximal element U₀ by Theorem 13.9.3.
Therefore f(U₀) is a maximal element of S because f is a bijection.
Now suppose that a set A satisfies ∀S ∈ P(P(A)) \ {∅}, ∃y ∈ S, ∀x ∈ S, (y ⊆ x ⇒ y = x). Define a
set S = {y ∈ P(A); ∃n ∈ ω, ∃f : n → y, f is a bijection}. This is the set of all subsets of A which are
equinumerous to some element n of ω. Then S ∈ P(P(A)) \ {∅} because ∅ ∈ S, and so S ≠ ∅. Therefore
∃y ∈ S, ∀x ∈ S, (y ⊆ x ⇒ y = x). Let y₀ ∈ S be such an element, namely such that ∀x ∈ S, (y₀ ⊆ x ⇒ y₀ = x).
Suppose that A \ y₀ ≠ ∅. Let z ∈ A \ y₀. Let x = y₀ ∪ {z}. Then x ∈ S because x is equinumerous to a finite
ordinal number, and y₀ ⊆ x. So y₀ = x. This is a contradiction. Therefore A \ y₀ = ∅. But
y₀ ∈ S and S ⊆ P(A). So y₀ ∈ P(A), and so y₀ ⊆ A. Therefore y₀ = A. So A ∈ S. So by the definition of S,
∃n ∈ ω, ∃f : n → y₀, f is a bijection. In other words, ∃n ∈ ω, ∃f : n → A, f is a bijection.
    P(A) \ {∅} = ⋂{S ∈ P(P(A) \ {∅}); (∀x ∈ A, {x} ∈ S) and (∀X, Y ∈ S, X ∪ Y ∈ S)}.   (13.9.1)
In other words, A is Kuratowski-finite when all of its non-empty subsets are generated from singletons by
pairwise union operations. (To be precise, Kuratowski actually expressed this by requiring P(A) \ {∅} to be
the unique subset S of P(A) \ {∅} which satisfies the conditions ∀x ∈ A, {x} ∈ S and ∀X, Y ∈ S, X ∪ Y ∈ S.
But this is clearly equivalent to line (13.9.1).)
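The Kuratowski characterisation can be illustrated computationally (an illustrative sketch, not from the book): for a small finite A, closing the singletons under pairwise unions generates exactly the set of all non-empty subsets of A.

```python
from itertools import combinations

def kuratowski_closure(A):
    """Smallest collection of subsets of A containing every singleton
    {x} and closed under pairwise unions X | Y."""
    S = {frozenset([x]) for x in A}
    changed = True
    while changed:
        changed = False
        for X in list(S):
            for Y in list(S):
                U = X | Y
                if U not in S:
                    S.add(U)
                    changed = True
    return S

def nonempty_subsets(A):
    """P(A) \\ {empty set} for a finite A."""
    A = list(A)
    return {frozenset(c) for r in range(1, len(A) + 1)
            for c in combinations(A, r)}
```

For an infinite A the closure would contain only the finite non-empty subsets, which is exactly why the condition characterises finiteness.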
Kuratowski showed that this definition is equivalent to the standard definition (i.e. equinumerosity to an
element of ω) by a simple induction argument to show that finite sets are Kuratowski-finite (similar to
the proof given here for Theorem 13.9.2), and by examining the set of all finite subsets of A to show that
Kuratowski-finite sets are finite (similarly to the second half of the proof given here for Theorem 13.9.4).
Finite sets are equivalent in ZF to I-finite and Kuratowski-finite sets. However, Tarski's I-finite definition
seems to be easier to interpret and manipulate than Kuratowski's. Therefore the Kuratowski definition is
not explained further here.
In 1924, Tarski [380], pages 48–58, showed that I-finiteness is equivalent to Kuratowski-finiteness, which
consequently implied that I-finiteness is equivalent to the standard numerical finiteness.
13.9.8 Remark: A set is finite if and only if the power set of its power set is Dedekind-finite.
It was shown in 1912 by Whitehead/Russell, Volume II [351], page 288, that a set is finite if and only if the
power set of its power set is Dedekind-finite. (This equivalence was stated in more modern language in 1924
by Tarski [380], pages 73–74.) This gives an interesting progression of finiteness concepts as follows.
name        definition or equivalent criterion
I-finite    P(P(A)) is Dedekind-finite.
III-finite  P(A) is Dedekind-finite.
IV-finite   A is Dedekind-finite.
13.10.2 Notation:
P^m(X), for a set X and m ∈ ω, denotes the set {S ∈ P(X); #(S) ≤ m}.
P_n(X), for a set X and n ∈ ω, denotes the set {S ∈ P(X); n ≤ #(S)}.
P^m_n(X), for a set X and m, n ∈ ω, denotes the set {S ∈ P(X); n ≤ #(S) and #(S) ≤ m}.
13.10.3 Remark: Some special subsets of general power sets with infinite bounds.
Notation 13.10.5 uses the infinity symbol ∞ to extend Notation 13.10.2 to subsets S of X which have
unbounded finite cardinality. Notation 13.10.6 uses the symbol ω to include subsets S of X which are
countably infinite. (See Definition 13.5.6 for countably infinite sets.)
A constraint of the form #(S) < ∞ is shorthand which means ∃i ∈ ω, Bij(i, S) ≠ ∅. A constraint of
the form #(S) ≤ #(ω) is shorthand which means ∃i ∈ ω + 1, Bij(i, S) ≠ ∅. A constraint of the form
#(S) = #(ω) is shorthand which means Bij(ω, S) ≠ ∅.
The cardinality upper bound ∞ in Notation 13.10.5 indicates that the cardinality ranges over all finite
values, whereas the cardinality upper bound ω in Notation 13.10.6 indicates that the cardinality includes
all finite values and also the actually infinite value #(ω). A potential infinity is never reached, as for
example in a limit such as lim_{i→∞} a_i or a sum such as Σ_{n=0}^∞ x_n, but an actual infinity can be reached.
13.10.5 Notation:
P^∞(X), for a set X, denotes the set {S ∈ P(X); #(S) < ∞}.
P^∞_n(X), for a set X and n ∈ ω, denotes the set {S ∈ P(X); n ≤ #(S) and #(S) < ∞}.
13.10.6 Notation:
P^ω(X), for a set X, denotes the set {S ∈ P(X); #(S) ≤ #(ω)}.
P_ω(X), for a set X, denotes the set {S ∈ P(X); #(ω) ≤ #(S)}.
P^ω_n(X), for a set X and n ∈ ω, denotes the set {S ∈ P(X); n ≤ #(S) and #(S) ≤ #(ω)}.
P^ω_ω(X), for a set X, denotes the set {S ∈ P(X); #(S) = #(ω)}.
13.10.7 Remark: Applications and properties of cardinality-constrained power sets.
Constrained power sets of the form P^∞_1(X) are useful for topology. Some elementary general properties are
given in Theorem 13.10.8.
13.10.8 Theorem: Properties of the set of all non-empty finite subsets of a given set.
(i) P^∞_1(∅) = ∅.
(ii) P^∞_1({A}) = {{A}} for any set A.
(iii) ∀A ∈ X, {A} ∈ P^∞_1(X) for any set X.
(iv) P^∞_1(X) = P(X) \ {∅} for any finite set X.
(v) X ⊆ Y ⇒ P^∞_1(X) ⊆ P^∞_1(Y) for any sets X, Y.
(vi) ⋃(P^∞_1(X)) = X for any set X.
(vii) X ⊆ P^∞_1(Y) ⇒ ⋃X ⊆ Y for any sets X, Y.
(viii) ∀C ∈ P^∞_1(P^∞_1(X)), ⋃C ∈ P^∞_1(X) for any set X.
(ix) ∀C ∈ P^∞_1(P^∞_1(X)), ⋂C ∈ P(X) for any set X.
(x) {⋃C; C ∈ P^∞_1(P^∞_1(X))} = P^∞_1(X) for any set X.
(xi) {⋂C; C ∈ P^∞_1(P^∞_1(X))} = P^∞(X) for any set X with #(X) ≥ 2.
(xii) {⋂C; C ∈ P^∞_1(P^∞_1(X))} = P^∞_1(X) for any set X with #(X) ≤ 1.
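Several parts of this theorem can be spot-checked for a small finite X, writing P1(X) for the set of all non-empty finite subsets of X (an illustrative check, not a proof; for a finite X every subset is finite, so P1(X) is just the non-empty power set):

```python
from itertools import combinations

def P1(X):
    """Non-empty (finite) subsets of a finite set X."""
    X = list(X)
    return {frozenset(c) for r in range(1, len(X) + 1)
            for c in combinations(X, r)}

X = {0, 1, 2}
part_i = P1(set()) == set()                      # (i)
part_ii = P1({'A'}) == {frozenset({'A'})}        # (ii)
part_vi = set().union(*P1(X)) == X               # (vi): union of P1(X) is X
part_viii = all(frozenset().union(*C) in P1(X)   # (viii): union of C in P1(X)
                for C in P1(P1(X)))
```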
Chapter 14
Natural numbers and integers
whereas in general mathematics they typically commence with 1. The blackboard-bold notation ℕ seems
to be the most modern. (The use of blackboard bold font for special sets like ℕ, ℤ, ℚ, ℝ and ℂ signifies
that these are fundamental sets. This frees up the corresponding letters N, Z, Q, R and C for general use.)
When an author uses the lower-case omega symbol ω for the natural numbers, this is usually because they
are identifying natural numbers with finite ordinal numbers. In this case, the natural numbers automatically
commence with 0 rather than 1. However, natural numbers and finite ordinal numbers are really two different
systems. Natural numbers have arithmetic operations such as addition and multiplication, whereas the main
purpose of ordinal numbers is to represent equivalence classes of well-ordered sets. Ordinal numbers have
been represented as ZF sets only since von Neumann's 1923 paper. (See von Neumann [382].) Earlier, ordinal
numbers were some kind of abstract essence of the equivalence classes of well-orderings. It is probably best
to keep the two concepts distinct.
1967 Kleene [316], page 176            {0, 1, 2, 3, . . .}
1967 Shoenfield [340], page 249        {0, 1, 2, 3, . . .}
1967 Spivak [136], page 21             {1, 2, 3, . . .}      ℕ
1968 Curtis [63], page 11              {1, 2, 3, . . .}      ℕ
1968 Rosenlicht [124], pages 10, 21    {1, 2, 3, . . .}
1969 Bell/Slomson [290], page 7        {1, 2, 3, . . .}      ℕ
1969 Robbin [334], page 66             {0, 1, 2, 3, . . .}   ℕ
1971 Friedman [72], pages 3–4          {1, 2, 3, . . .}
1971 Kane et alii [94], page 3         {1, 2, 3, . . .}      ℕ
1971 Pinter [327], page 125            {0, 1, 2, 3, . . .}
1971 Shilov [130], page 2              {1, 2, 3, . . .}
1973 Chang/Keisler [298], page 582     {0, 1, 2, 3, . . .}
1973 Jech [314], page 20               {0, 1, 2, 3, . . .}
1979 Levy [318], page 56               {0, 1, 2, 3, . . .}
1982 Moore [321], page xxiii           {0, 1, 2, 3, . . .}   ℕ
1990 Roitman [335], page 2             {0, 1, 2, 3, . . .}   ℕ
1993 Ben-Ari [291], page 327           {0, 1, 2, 3, . . .}   ℕ
1993 EDM2 [109], 294.B, page 1104      {1, 2, 3, . . .}      ℕ
1996 Smullyan/Fitting [342], page 32   {0, 1, 2, 3, . . .}
1998 Mattuck [110], page 399           {1, 2, 3, . . .}      ℕ
2004 Szekeres [266], page 13           {1, 2, 3, . . .}      ℕ
2005 Penrose [263], page xxvii         {0, 1, 2, 3, . . .}   ℕ
Kennington                             {1, 2, 3, . . .}      ℕ
Table 14.1.1   Survey of definitions and notations for natural numbers
columns of matrices starting with 1. Therefore the natural numbers are defined to be the positive integers
in this book.
Probably a better term for the positive integers would be the counting numbers, which is essentially
how they are named in German. The German noun Zahl literally means count, closely related to the
verb zählen meaning to count. These words are derived from a word which meant a notch or incision,
suggesting the use of notches to keep count. (See Wahrig [426], page 1452.) According to White [427],
page 409, the Latin word numerus, from which the English word number is derived, is akin to a Greek
word which means to divide or to distribute. (See Morwood/Taylor [422], page 219.) One
cannot divide a thing into zero parts! Some people claim that zero is a natural number. If zero is so natural,
why did it take so long to invent?
14.1.3 Remark: The nostalgic value of the Peano axioms for natural numbers.
The Peano natural number system axioms were introduced before the axiomatisation of set theory in the
early 20th century. (Cantor did not formally axiomatise his set theory.) Although Peano did give a list of
axioms for the foundations of mathematics, his axiomatisation of classes, functions and numbers expressed
a naive form of set theory without existence axioms, in particular no axiom of infinity. Although it may
appear at first sight that arithmetic may be axiomatised without an underlying set theory, the language of
sets must be used explicitly or implicitly in any such axiomatisation, and when the naive approach to sets
encounters contradictions, as it did, it becomes inevitable that naive sets must be replaced by a more formal
theory. Definition 14.1.5 is stated within the framework of ZF set theory. This may seem like a historical
anachronism, but the alternative is to use a naive set theory instead. The idea that the Peano axioms form
a kind of alternative approach to numbers not based on set theory is illusory.
Nowadays the main value of the Peano axioms is nostalgia for a simpler time when set theory was naive
and informal, and the paradoxes and urgency to resolve issues of consistency and independence through
axiomatisation and model theory were over the horizon. It would be easy to declare that the natural numbers
are nothing other than the positive finite ordinal numbers, but there is some historical and theoretical interest
in approaching natural numbers from the perspective of the Peano axioms. This approach is demonstrated
in Section 14.1, but it is not an essential part of the modern approach to numbers.
14.1.4 Remark: Origin of the Peano axioms for the natural numbers.
Peano presented 9 axioms (in Latin) for the natural numbers in 1889. (See Peano [325], page 1.) In 1891, he
presented the better-known set of 5 axioms (in Italian) for the natural numbers. (See Peano [377], page 90.)
In more modern notation, these 5 axioms are presented in Definition 14.1.5. (These axioms are also presented,
amongst many others, by Graves [83], pages 18–23; Halmos [307], page 46; E. Mendelson [320], page 102;
EDM2 [109], 294.B, page 1104; Stoll [343], pages 58–59; Russell [339], pages 5–6.)
Peano possibly had in mind Newton's Philosophiæ naturalis principia mathematica [260] when he chose the
language and title for his own Arithmetices principia [325], where he presented in 36 pages an axiomatisation
of propositional calculus, classes (i.e. set theory), functions, natural number arithmetic, rational number
arithmetic, real numbers and limits. This seems to have been an attempt to repeat in the area of mathematics
what had been achieved 200 years earlier in physics and 2200 years earlier in geometry.
in Italian, and numerus integer positivus in Latin). Nowadays, zero is often used as the first integer. If no
arithmetic operations are applied, the choice of initial integer makes no real difference to Definition 14.1.5.
14.1.7 Remark: The principle of mathematical induction is one of the Peano axioms.
Condition (5) in Definition 14.1.5 is known as the principle of mathematical induction. The property of
the same name for finite ordinal numbers is proved in Theorem 12.2.11.
14.1.8 Remark: All systems of natural numbers are isomorphic. At least one model exists in ZF.
In order to speak of the (system of) natural numbers, it must be shown that all systems of natural numbers
are isomorphic and that there is only one isomorphism between each pair of systems of natural numbers.
Then each element of any such system may be unambiguously identified with a unique element of each other
system. This is shown in Theorem 14.1.16. But first it must be shown that at least one system of natural
numbers exists within ZF set theory. This is shown in Theorem 14.1.9.
14.1.9 Theorem: Existence of a ZF model for the Peano natural number system axioms.
Let ℕ = ω \ {∅} and 1 = {∅}, and define s : ℕ → ℕ by s(x) = x ∪ {x} for all x ∈ ℕ. Then (ℕ, s, 1) is a
system of natural numbers.
Proof: For Definition 14.1.5 (1), ∅ ∈ ω by Theorem 12.1.28 (i). Therefore {∅} ∈ ω by Theorem 12.1.28 (ii).
But {∅} ≠ ∅ because ∅ ∈ {∅} and ∅ ∉ ∅. Hence 1 = {∅} ∈ ω \ {∅} = ℕ.
For Definition 14.1.5 (2), s(ℕ) ⊆ ℕ by Theorem 12.2.17 (i).
For Definition 14.1.5 (3), ∀x, y ∈ ℕ, (s(x) = s(y) ⇒ x = y) by Theorem 12.2.17 (v).
For Definition 14.1.5 (4), suppose that 1 ∈ s(ℕ). Then {∅} = x ∪ {x} for some x ∈ ℕ = ω \ {∅}. So x ∈ {∅}
because x ∈ x ∪ {x}. Therefore x = ∅, which contradicts the assumption that x ∈ ω \ {∅}. Hence 1 ∉ s(ℕ).
For Definition 14.1.5 (5), let X ⊆ ℕ be a set which satisfies 1 ∈ X and s(X) ⊆ X. Let S = (X ∪ {∅}) ∩ ω. Then
S ⊆ ω, and ∅ ∈ S because ∅ ∈ ω. Now let x ∈ S. Then x ∈ ω and so x ∪ {x} ∈ ω by Theorem 12.1.28 (ii).
If x = ∅, then x ∪ {x} = 1 ∈ X, and so x ∪ {x} ∈ S. If x ≠ ∅, then x ∈ X and so x ∪ {x} ∈ ℕ and
x ∪ {x} ∈ X since s(X) ⊆ X. So x ∪ {x} ∈ S. Thus ∀x ∈ S, x ∪ {x} ∈ S. Therefore S = ω by the principle
of mathematical induction, Theorem 12.2.11. So ω ⊆ X ∪ {∅}. Therefore ω \ {∅} ⊆ X by Theorem 8.2.6 (v).
Thus ℕ ⊆ X. Hence ∀X ⊆ ℕ, ((1 ∈ X and s(X) ⊆ X) ⇒ ℕ ⊆ X).
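This model can be animated with hereditarily finite sets (a sketch, not from the book): von Neumann ordinals are represented as nested frozensets, with successor s(x) = x ∪ {x}, and the Peano conditions can be spot-checked on the first few elements.

```python
def s(x):
    """Successor s(x) = x | {x} on von Neumann ordinals."""
    return x | frozenset([x])

empty = frozenset()       # the ordinal 0
one = s(empty)            # 1 = {0}
elems = [one]             # first few elements of N = omega \ {0}
for _ in range(4):
    elems.append(s(elems[-1]))
```

On this sample the successor is injective, never maps an element to itself, and never produces 1, in agreement with Definition 14.1.5 (3) and (4).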
14.1.10 Remark: Defining leading intervals of a natural number system.
Since no order relation is defined on a natural number system, it is not easy to define leading intervals,
meaning the intervals [1, x] for x ∈ ℕ. These intervals must be defined inductively, as in Theorem 14.1.11.
14.1.11 Theorem: Let (ℕ, s, 1) be a system of natural numbers.
(i) ∀x ∈ ℕ, s(x) ≠ x.
(ii) ∀x ∈ ℕ, ∃I ∈ P(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z)).
In other words, for all x ∈ ℕ, there exists at least one I ∈ P(ℕ) which satisfies x ∈ I and s(x) ∉ I and
∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).
(iii) Let I ⊆ ℕ satisfy 1 ∈ I and s(1) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z). Then I = {1}.
(iv) Let x ∈ ℕ and I ⊆ ℕ satisfy s(x) ∈ I and s(s(x)) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z). Then
I′ = I \ {s(x)} satisfies x ∈ I′ and s(x) ∉ I′ and ∀y ∈ I′ \ {1}, ∃z ∈ I′, y = s(z).
(v) ∀x ∈ ℕ, ∃!I ∈ P(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z)).
In other words, for all x ∈ ℕ, there is one and only one set I ∈ P(ℕ) which satisfies x ∈ I and s(x) ∉ I
and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).
Proof: For part (i), let X = {x ∈ ℕ; s(x) ≠ x}. Then 1 ∈ X by Definition 14.1.5 (4) because s(1) ∈ s(ℕ).
Let x ∈ X. Then s(x) ≠ x. So s(s(x)) ≠ s(x) by Definition 14.1.5 (3). So s(x) ∈ X. So ℕ ⊆ X by
Definition 14.1.5 (5). Therefore X = ℕ. Hence ∀x ∈ ℕ, s(x) ≠ x.
For part (ii), let X = {x ∈ ℕ; ∃I ∈ P(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z))}. Let
x = 1 and I = {1}. Then I ∈ P(ℕ) and x ∈ I and s(x) ∉ I because 1 ∉ s(ℕ) by Definition 14.1.5 (4), and
∀y ∈ I \ {1}, ∃z ∈ I, y = s(z) is satisfied because I \ {1} = ∅. Therefore 1 ∈ X.
Let x ∈ X. Then there exists I ∈ P(ℕ) such that x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).
Let I⁺ = I ∪ {s(x)}. Then s(x) ∈ I⁺. Suppose that s(s(x)) ∈ I⁺. Then s(s(x)) ∈ I since s(s(x)) ≠ s(x)
by part (i). Let y = s(s(x)). Then y = s(z) for some z ∈ I. But z ≠ s(x) because s(x) ∉ I, and by
Definition 14.1.5 (3), s(x) = z because s(s(x)) = s(z), which is a contradiction. So s(s(x)) ∉ I⁺. Let
y ∈ I⁺ \ {1}. Then y ∈ I \ {1} or y = s(x). If y ∈ I \ {1}, then y = s(z) for some z ∈ I ⊆ I⁺. If y = s(x), then the
proposition ∃z ∈ I⁺, y = s(z) is validated by z = x. Thus s(x) ∈ X. So ℕ ⊆ X by Definition 14.1.5 (5).
Therefore X = ℕ. Hence for all x ∈ ℕ, there is at least one set I ∈ P(ℕ) which satisfies x ∈ I and s(x) ∉ I
and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).
For part (iii), assume that I ⊆ ℕ and 1 ∈ I and s(1) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z). Define
X = {y ∈ ℕ; y = 1 or y ∉ I}. Then X ⊆ ℕ and 1 ∈ X. (It must now be shown that ∀y ∈ X, s(y) ∈ X.)
Let y ∈ X. If y = 1, then s(y) ∈ X because s(1) ∉ I. So suppose that y ∈ X and y ≠ 1. Then y ∉ I by
the definition of X. Suppose that s(y) ∈ I. Then s(y) ∈ I \ {1} because s(y) ≠ 1 by Definition 14.1.5 (4),
and so s(y) = s(z) for some z ∈ I by the definition of I. But then y = z by Definition 14.1.5 (3). So y ∈ I,
which is a contradiction. Therefore s(y) ∉ I. So s(y) ∈ X.
Therefore s(X) ⊆ X. So X = ℕ by Definition 14.1.5 (5). So ∀y ∈ I, y = 1 because I ⊆ ℕ. Hence I = {1}.
For part (iv), let x ∈ ℕ and I ⊆ ℕ satisfy s(x) ∈ I, s(s(x)) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z). Let I′ = I \ {s(x)}.
By Definition 14.1.5 (4), s(x) ≠ 1. So s(x) ∈ I \ {1}. Therefore ∃z ∈ I, s(x) = s(z). So x = z for such z by
Definition 14.1.5 (3). So x ∈ I. But x ≠ s(x) by part (i). Therefore x ∈ I′. Note also that s(x) ∉ I′.
To show that ∀y ∈ I′ \ {1}, ∃z ∈ I′, y = s(z), let y ∈ I′ \ {1}. Then y ∈ I \ {1} and y ≠ s(x). So y = s(z)
for some z ∈ I. To show that z ∈ I′, it must be shown that z ≠ s(x). So suppose that z = s(x). Then
y = s(s(x)), and so s(s(x)) ∈ I because y ∈ I. But this contradicts the assumption s(s(x)) ∉ I. So z ≠ s(x),
and so z ∈ I′. Hence ∀y ∈ I′ \ {1}, ∃z ∈ I′, y = s(z).
For part (v), let Y = {x ∈ ℕ; ∃!I ∈ P(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z))}. (In
other words, let Y be the set of x ∈ ℕ for which the relevant set I ∈ P(ℕ) is unique.) For x ∈ ℕ, let
Z_x = {I ∈ P(ℕ); x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z)}. Then ∀x ∈ ℕ, Z_x ≠ ∅ by part (ii).
Let I, I′ ∈ Z₁. Then I = {1} = I′ by part (iii). So 1 ∈ Y.
Let x ∈ Y. Let I₁, I₂ ∈ Z_{s(x)}. Let I₁′ = I₁ \ {s(x)} and I₂′ = I₂ \ {s(x)}. Then I₁′, I₂′ ∈ Z_x by part (iv). So
I₁′ = I₂′ because of the inductive assumption that Z_x contains only one set. Therefore I₁ = I₁′ ∪ {s(x)} =
I₂′ ∪ {s(x)} = I₂, and so Z_{s(x)} contains only one set. Thus ∀x ∈ ℕ, (x ∈ Y ⇒ s(x) ∈ Y). Since 1 ∈ Y, it
follows from Definition 14.1.5 (5) that ℕ ⊆ Y.
Hence ∀x ∈ ℕ, ∃!I ∈ P(ℕ), (x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z)).
Since the set of natural numbers I in Theorem 14.1.11 (v) exists and is unique for each x ∈ ℕ, it may be
given a name and notation, parametrised by x, as in Definition 14.1.13 and Notation 14.1.14.
14.1.13 Definition: The leading interval up to x in a natural number system (ℕ, s, 1), for x ∈ ℕ, is the
subset I of ℕ which satisfies x ∈ I and s(x) ∉ I and ∀y ∈ I \ {1}, ∃z ∈ I, y = s(z).
14.1.14 Notation:
ℕ[1, x], for x ∈ ℕ, for a natural number system (ℕ, s, 1), denotes the leading interval up to x in ℕ.
[1, x] is an abbreviation for ℕ[1, x] when ℕ is indicated in the context.
N N N N
Proof: Let X = {x ; 0 f : [1, s(x)] 0 , (f (1) = 10 and y [1, x], f (s(y)) = s0 (f (y))) }. To
N N
show that 1 X, define f : [1, s(1)] 0 by f (1) = 10 and f (s(1)) = s0 (10 ). This is a well-defined function
N N
because [1, s(1)] = {1, s(1)} by Theorem 14.1.15 (iii), and f is a bijection because 0 [10 , s0 (10 )] = {10 , s0 (10 )}
N N
similarly. Clearly y [1, 1], f (s(y)) = s0 (f (y)) since [1, 1] = {1} by Theorem 14.1.15 (i). The uniqueness
N
of f follows from f (1) = 10 and y [1, x], f (s(y)) = s0 (f (y)), which implies f (s(1)) = s0 (10 ).
Let x ∈ X. Then there is a unique f : ℕ[1, s(x)] → ℕ′ with f(1) = 1′ and ∀y ∈ ℕ[1, x], f(s(y)) = s′(f(y)).
Define f⁺ : ℕ[1, s(s(x))] → ℕ′ by f⁺(y) = f(y) for y ∈ ℕ[1, s(x)] and f⁺(s(s(x))) = s′(f(s(x))). Then
f⁺ : ℕ[1, s(s(x))] → ℕ′ satisfies f⁺(1) = 1′ and ∀y ∈ ℕ[1, s(x)], f⁺(s(y)) = s′(f⁺(y)). To see that
f⁺ is unique, let f₁⁺ and f₂⁺ both satisfy these conditions. Then f₁⁺↾ℕ[1,s(x)] and f₂⁺↾ℕ[1,s(x)] must be equal
because they satisfy the conditions of the inductive hypothesis, and then f₁⁺(s(s(x))) = f₂⁺(s(s(x))) follows
from the requirement ∀y ∈ ℕ[1, s(x)], f⁺(s(y)) = s′(f⁺(y)), which implies f₁⁺(s(s(x))) = s′(f₁⁺(s(x))) =
s′(f(s(x))) = s′(f₂⁺(s(x))) = f₂⁺(s(s(x))). So f₁⁺ = f₂⁺. Thus s(x) ∈ X. So ℕ ⊆ X by Definition 14.1.5 (5).
Therefore ∀x ∈ ℕ, ∃! f : ℕ[1, s(x)] → ℕ′, (f(1) = 1′ and ∀y ∈ ℕ[1, x], f(s(y)) = s′(f(y))).
For each x ∈ ℕ, denote by f_x the unique function f : ℕ[1, s(x)] → ℕ′ which satisfies f(1) = 1′ and
∀y ∈ ℕ[1, x], f(s(y)) = s′(f(y)). Then f_x ⊆ f_{s(x)} for all x ∈ ℕ. So f̄ = ⋃_{x∈ℕ} f_x is a well-defined function
from ℕ to ℕ′ which satisfies f̄(1) = 1′ and ∀y ∈ ℕ, f̄(s(y)) = s′(f̄(y)). The uniqueness of f̄ follows from
the uniqueness of f̄↾ℕ[1,x] = f_x on the leading intervals ℕ[1, x] for x ∈ ℕ, using Theorem 14.1.15 (iv). The
injectivity and surjectivity of f̄ : ℕ → ℕ′ follow from the injectivity and surjectivity of each f_x on the
leading intervals ℕ[1, x] for x ∈ ℕ. Hence f̄ is the unique bijection from ℕ to ℕ′ which satisfies f̄ ∘ s = s′ ∘ f̄ and
f̄(1) = 1′.
14.1.17 Remark: The identification of natural numbers with positive finite ordinal numbers.
Since all models for the natural number system are isomorphic in the sense defined by Theorem 14.1.16, one
is free to either choose one particular model to call the model or else to say that the model is of no importance.
The main advantage of being model-free is that no artefacts of the model can introduce extraneous side-
effects. This issue arises with every number system and algebraic system in mathematics. The approach
taken here is to present one concrete model, but then to ignore the chosen ZF model unless it is convenient
to use it. Thus the symbol ℕ in Notation 14.1.19 refers, from now on, to the particular natural number
system in Definition 14.1.18. This implementation inherits the order relation from ω in Definition 12.2.5 and
the (suitably restricted) addition and subtraction operations in Definitions 13.4.3 and 13.4.4.
14.1.18 Definition: The standard system of natural numbers is the tuple (ℕ, s, 1) with ℕ = ω \ {∅},
s : ℕ → ℕ defined by ∀x ∈ ℕ, s(x) = x ∪ {x}, and 1 = {∅}.
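As an illustrative aside (not part of the book's formal development), Definition 14.1.18 can be sketched computationally. The names `successor`, `nat` and `less_than` below are illustrative, not the book's notation; `frozenset` stands in for a ZF set so that sets can contain sets.

```python
# Sketch of the standard (von Neumann) model: each natural number is the
# ZF set of its predecessors, starting from 1 = {empty set}.
EMPTY = frozenset()

def successor(x):
    """s(x) = x union {x}, the von Neumann successor."""
    return x | {x}

ONE = frozenset({EMPTY})  # 1 = {emptyset}

def nat(n):
    """ZF-set representation of the natural number n >= 1."""
    x = ONE
    for _ in range(n - 1):
        x = successor(x)
    return x

# The order relation is inherited from the ordinals: m < n iff m is an element of n.
def less_than(m, n):
    return m in n
```

A pleasant side effect of this model is that the number n, as a set, has exactly n elements.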
14.1.21 Notation: ℕ_n, for n ∈ ℕ, denotes the set {i ∈ ℕ; i ≤ n}. That is, ℕ_n = {1, 2, . . . , n}.
A presentation of Gentzen's consistency proof for number theory is given by Kleene [315], pages 476–499.
Weyl [153], page 144, summarised the situation very well in 1946 as follows.
It is likely that all mathematicians ultimately would have accepted Hilbert's approach had he been
able to carry it out successfully. The first steps were inspiring and promising. But then Gödel
dealt it a terrific blow (1931), from which it has not yet recovered. Gödel enumerated the symbols,
formulas, and sequences of formulas in Hilbert's formalism in a certain way, and thus transformed
the assertion of consistency into an arithmetic proposition. He could show that this proposition
can neither be proved nor disproved within the formalism. This can mean only two things: either
the reasoning by which a proof of consistency is given must contain some argument that has no
formal counterpart within the system, i.e., we have not succeeded in completely formalizing the
procedure of mathematical induction; or hope for a strictly finitistic proof of consistency must be
given up altogether. When G. Gentzen finally succeeded in proving the consistency of arithmetic
he trespassed those limits indeed by claiming as evident a type of reasoning that penetrates into
Cantor's second class of ordinal numbers.
A slightly modified proof of Gentzen's consistency result for arithmetic was given in 1939 by Hilbert/
Bernays [310], pages 372–387. They mention the relation between the Gödel incompleteness results and
Gentzen's consistency theorem, citing a 1943 paper by Gentzen [365].
14.3. Integers
14.3.1 Remark: Approaches to the definition of number systems.
Although the ordinal numbers are essentially always defined directly in terms of ZF sets as in Section 12.1,
other number systems may be defined in multiple ways.
(1) Axiomatically, as abstract objects defined only by predicate logic.
(2) Directly as ZF sets, identifying each number in the system with a particular ZF set.
(3) By construction, constructing larger number systems from smaller systems.
(4) By restriction, restricting a larger number system to a smaller sub-system.
In the restriction approach, one may commence with complex numbers and define the real numbers, rational
numbers, signed integers, unsigned integers and natural numbers by successive restrictions. In the construc-
tive approach, one successively builds up number systems in the opposite order, for example, first natural
numbers, then unsigned integers, signed integers, rational numbers, real numbers and complex numbers.
Alternatively, one may give a direct representation of numbers in terms of ZF sets, which is generally only
done for the smaller systems. Or one may define a number system purely abstractly in terms of axioms
written either formally in terms of a predicate logic or informally in terms of rules for existence and the
properties of arithmetic operations.
A substantial advantage of the restriction approach is that the numbers in all systems may be freely mixed,
as they often must be. One could, in principle, use an extension approach, whereby a lower-level system
is extended as required for a higher-level system. This has the advantage of mixability, but then there
are burdensome technicalities which arise with arithmetic operations, which must frequently swap between
multiple styles of representation.
In practice, yet another approach is actually followed, namely the identification approach, whereby the
differences in representations are ignored most of the time so that all systems can be regarded as restrictions
from one big number system (such as the complex numbers), but specific representations can be invoked
whenever convenient. However, when presenting the foundations of mathematics, one generally chooses some
combination of the pure axiomatic approach, the direct ZF-set-based approach, and construction of larger
systems from smaller systems. As mentioned in Remark 12.0.2, neither logic, sets nor numbers have clear
primacy over the others. They have shared primacy. Therefore the choice of approach to defining number
systems is a matter of personal taste and convenience.
A fairly sensible approach here may be as follows.
(1) Define the set of finite ordinal numbers directly in terms of ZF sets. (See Section 12.1.)
(2) Define natural numbers both axiomatically and directly as ZF sets. (See Section 14.1.)
(3) Define signed integers both axiomatically and as ZF sets (equivalence classes of ordinal number pairs).
(4) Define unsigned integers by restriction of the signed integers.
(5) Define rational numbers as equivalence classes of pairs of integers, and optionally axiomatically also.
(6) Define real numbers as equivalence classes of rational number sequences, and optionally axiomatically.
(7) Define complex numbers as pairs of real numbers.
(8) Identify the elements of all number systems with the corresponding elements of higher number systems.
(9) Define extended natural numbers, signed integers, unsigned integers, rational numbers, real numbers
and complex numbers by adding one or more points at infinity to these number systems.
14.3.2 Remark: Definition of signed integers as equivalence classes of unsigned integer pairs.
Figure 14.3.1 illustrates the equivalence classes which are used in one popular construction of the signed
integers from unsigned integers. This is formalised in Definition 14.3.3.
14.3.3 Definition: A space of (signed) integers is the set of equivalence classes of ℕ × ℕ for any system
ℕ of natural numbers, where (m₁, n₁) ∼ (m₂, n₂) whenever m₁ + n₂ = m₂ + n₁.
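As an illustrative sketch of Definition 14.3.3 (names are not the book's notation): a pair (m, n) of natural numbers stands for the difference m − n, and two pairs are equivalent precisely when they represent the same difference.

```python
# Signed integers as equivalence classes of pairs of natural numbers,
# with (m1, n1) ~ (m2, n2) whenever m1 + n2 = m2 + n1.

def equivalent(p, q):
    (m1, n1), (m2, n2) = p, q
    return m1 + n2 == m2 + n1

def canonical(p):
    """Pick the representative of the class of p = (m, n) with min entry 1."""
    m, n = p
    d = min(m, n) - 1  # keep both entries >= 1, i.e. natural numbers
    return (m - d, n - d)

def add(p, q):
    """Addition of classes: (m1, n1) + (m2, n2) = (m1 + m2, n1 + n2)."""
    return canonical((p[0] + q[0], p[1] + q[1]))
```

The addition rule is well defined on classes because the equivalence condition is preserved term by term.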
Figure 14.3.1 Construction of signed integers as equivalence classes of pairs of unsigned integers
General bounded sub-intervals of the integers may be denoted as in Notation 14.3.10. General unbounded
sub-intervals of the integers may be denoted as in Notation 14.3.11.
14.3.10 Notation:
ℤ[m, n] denotes the set {i ∈ ℤ; m ≤ i ≤ n} = {m, m + 1, . . . , n} for m, n ∈ ℤ.
ℤ[m, n) denotes the set {i ∈ ℤ; m ≤ i < n} = {m, m + 1, . . . , n − 1} for m, n ∈ ℤ.
ℤ(m, n] denotes the set {i ∈ ℤ; m < i ≤ n} = {m + 1, m + 2, . . . , n} for m, n ∈ ℤ.
ℤ(m, n) denotes the set {i ∈ ℤ; m < i < n} = {m + 1, m + 2, . . . , n − 1} for m, n ∈ ℤ.
14.3.11 Notation:
ℤ(−∞, n] denotes the set {i ∈ ℤ; i ≤ n} for n ∈ ℤ.
ℤ(−∞, n) denotes the set {i ∈ ℤ; i < n} for n ∈ ℤ.
ℤ[m, ∞) denotes the set {i ∈ ℤ; m ≤ i} for m ∈ ℤ.
ℤ(m, ∞) denotes the set {i ∈ ℤ; m < i} for m ∈ ℤ.
14.3.12 Remark: Computer representations of integers.
The two's complement representation of negative integers in computers, which is by far the most popular in
personal computers, represents the value p − 2^n by the bit pattern of p, where p ∈ ℤ⁺₀ is a non-negative
integer satisfying 2^(n−1) ≤ p < 2^n, and n is the number of bits. The bit patterns for 0 ≤ p < 2^(n−1)
represent themselves. So the value represented by a two's complement binary number p is
((p + 2^(n−1)) mod 2^n) − 2^(n−1) in terms of the modulo operation in Definition 16.5.19.
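The formula can be checked directly; the following is a small illustrative sketch with the bit width n as a parameter.

```python
def twos_complement_value(p, n):
    """Value represented by the n-bit two's complement bit pattern p,
    computed as ((p + 2^(n-1)) mod 2^n) - 2^(n-1)."""
    assert 0 <= p < 2 ** n
    return ((p + 2 ** (n - 1)) % 2 ** n) - 2 ** (n - 1)
```

For p below 2^(n−1) the shift and the subtraction cancel, so p represents itself; for larger p the modulo operation subtracts 2^n, giving the negative value p − 2^n.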
14.3.13 Remark: Construction of signed integers from unsigned integers with a sign bit.
Figure 14.3.2 illustrates another popular way of representing signed integers.
This kind of representation is sometimes used in computers. It is used within Internet packets in particular.
(For example, n-bit ones' complement representation has one sign bit followed by n − 1 value bits.) In this
Figure 14.3.2 Construction of signed integers from unsigned integers and sign
case, each signed integer corresponds to a unique value-sign pair (v, s), where v is a non-negative integer and
s = 0 or 1. The pair (0, 1) (meaning −0) may or may not be excluded from the set. If −0 is included
in the set, it is defined to be equal to 0 anyway.
14.3.15 Remark: How to avoid the technicalities of signed integers, rationals and real numbers.
The number definition strategy adopted by Graves [83], pages 17–39, is first to define positive integers, then
positive rational numbers, then positive real numbers, and then finally to include non-positive numbers in the
real number system. (See Graves [83], page 31, for this final step.) This strategy has the substantial advantage
of avoiding numerous tedious technicalities in the theoretical development. According to Graves [83], page 26:
"The second step in the historical development of the concept of number was the introduction of fractions."
That is true. The algebraic structure of positive rational and real numbers is actually quite satisfying, and
the technical overheads of including negative numbers at the end are not very great. It is then possible
either to define signed integers and rational numbers as sub-systems of the real numbers, or to include
negative numbers in both the integers and the rational numbers in the same way as for the real numbers.
Although this "positive numbers first" strategy has many advantages, it is not the way in which numbers
are generally developed in modern introductory courses. All in all, the arguments on each side are about
equal. Therefore the strategy adopted here is to include negative numbers in the integers, rationals and real
numbers as soon as possible rather than leaving it until the last moment. The cost of this strategy is some
avoidable, distracting technicalities.
14.3.17 Notation: ∑_{i=m}^n a_i, for m, n ∈ ℤ and any function a : I → ℤ with ℤ[m, n] ⊆ I ⊆ ℤ, denotes the
sum defined inductively with respect to n − m by
(1) ∑_{i=m}^n a_i = 0 if n − m < 0,
(2) ∑_{i=m}^n a_i = a_m if n − m = 0,
(3) ∑_{i=m}^n a_i = (∑_{i=m}^{n−1} a_i) + a_n if n − m > 0.
14.3.18 Notation: ∏_{i=m}^n a_i, for m, n ∈ ℤ and any function a : I → ℤ with ℤ[m, n] ⊆ I ⊆ ℤ, denotes the
product defined inductively with respect to n − m by
(1) ∏_{i=m}^n a_i = 1 if n − m < 0,
(2) ∏_{i=m}^n a_i = a_m if n − m = 0,
(3) ∏_{i=m}^n a_i = (∏_{i=m}^{n−1} a_i) a_n if n − m > 0.
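The inductive clauses of Notations 14.3.17 and 14.3.18 can be transcribed directly. The following is an illustrative sketch (not the book's notation), with the family a passed as a Python function of the index:

```python
# Direct transcription of the inductive clauses: empty sum is 0, empty
# product is 1, and each step peels off the last term a(n).
def sum_m_n(a, m, n):
    if n - m < 0:
        return 0                         # clause (1): empty sum
    if n - m == 0:
        return a(m)                      # clause (2)
    return sum_m_n(a, m, n - 1) + a(n)   # clause (3)

def prod_m_n(a, m, n):
    if n - m < 0:
        return 1                         # clause (1): empty product
    if n - m == 0:
        return a(m)                      # clause (2)
    return prod_m_n(a, m, n - 1) * a(n)  # clause (3)
```

The base cases make the conventions for empty sums and products explicit.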
14.5.3 Remark: The redundant case of zero-length sequences of sets.
Notation 14.5.1 has the interesting consequence that X^0 = Y^0 for any sets X and Y. This shows how
necessary it is to have class tags and even name tags for sets, since the set value of a combination of symbols
does not contain the entire meaning of those symbols.
14.5.4 Definition: An n-tuple of elements of a set X for any n ∈ ℤ⁺₀ is any element of X^n.
A 2-tuple may be referred to as a duple (rarely) or ordered pair, a 3-tuple may be called a triple, a 4-tuple
may be called a quadruple and a 5-tuple may be called a pentuple.
14.5.5 Remark: Two representations for the ordered-pair concept.
The phrase ordered pair should be used with care because it can also mean Definition 9.2.2.
14.5.6 Notation: The expressions (a, b), (a, b, c), (a, b, c, d) and (a, b, c, d, e) for a, b, c, d, e X denote
respectively a duple, triple, quadruple and pentuple of elements of X. (The ordering on the printed line
represents the order within each tuple.) Notations for general n-tuples are defined inductively on n.
14.5.7 Remark: Standard concatenation and subsequence map.
Definitions 14.5.8 and 14.5.9 assume that sequences start with 1. This is the de-facto standard initial
index value, but in some contexts, the sequences may start with 0 as in Definition 14.11.6 (iii, ix). Note also
that the notation in Definition 14.5.9 clashes with Notation 11.5.28, which does not shift the index.
14.5.8 Definition: The standard concatenation map for a set X and m, n ∈ ℤ⁺₀ is the map concat :
X^m × X^n → X^{m+n} which is defined for a ∈ X^m and b ∈ X^n by concat : (a, b) ↦ c, where c ∈ X^{m+n} is
defined by c_i = a_i for i ≤ m and c_{i+m} = b_i for i ≤ n. In other words,
∀a ∈ X^m, ∀b ∈ X^n, ∀i ∈ ℕ_{m+n},
concat(a, b)(i) = a_i if i ∈ ℕ_m, and concat(a, b)(i) = b_{i−m} if i ∈ ℕ_{m+n} \ ℕ_m.
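As an illustrative sketch of Definition 14.5.8 (Python tuples are 0-indexed, so the 1-based index shift i ↦ i − m of the definition is checked explicitly):

```python
# Concatenation of finite sequences, represented as tuples.
def concat(a, b):
    return tuple(a) + tuple(b)

# The defining property, stated with 1-based indices i in N_{m+n}.
def concat_property(a, b):
    c, m = concat(a, b), len(a)
    return all(
        c[i - 1] == (a[i - 1] if i <= m else b[i - m - 1])
        for i in range(1, len(a) + len(b) + 1)
    )
```

The property holds trivially when either sequence is empty, matching the m = 0 or n = 0 cases of the definition.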
[ www.geometry.org/dg.html ] [ draft: UTC 201749 Sunday 13:30 ]
432 14. Natural numbers and integers
14.5.9 Definition: The standard subsequence map for a set X and p ∈ ℤ⁺₀ and m, n ∈ ℤ⁺ with m ≤ n ≤ p
is the map from X^p to X^{n−m+1} which is defined by a ↦ (a_{m+i−1})_{i=1}^{n−m+1}.
14.6.2 Definition: The indicator function for a subset A of a set S, with value set V, is the function
f : S → V defined by
f(x) = 1_V if x ∈ A, and f(x) = 0_V if x ∈ S \ A.
14.6.3 Notation: χ_A, for a subset A of a set S, with value set V, denotes the indicator function of A in S.
Thus the function χ_A : S → V satisfies
χ_A(x) = 1_V for x ∈ A, and χ_A(x) = 0_V for x ∈ S \ A,
for any subset A of S.
14.6.5 Theorem: For any set S, the sets ℙ(S) and 2^S are equinumerous.
Proof: Define f : ℙ(S) → 2^S by f : A ↦ χ_A for all A ⊆ S, where χ_A denotes the indicator function for
A as a subset of S. Then f is a bijection.
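Theorem 14.6.5 can be checked exhaustively for a small finite set. This is an illustrative sketch only; functions S → {0, 1} are represented as Python dicts, and the names are not the book's notation.

```python
from itertools import product

# The map A -> chi_A from subsets of S to {0,1}-valued functions on S.
S = ("a", "b", "c")

def indicator(A):
    return {x: (1 if x in A else 0) for x in S}

# All 2^3 = 8 subsets of S, and their images under the indicator map.
subsets = [frozenset(x for x, bit in zip(S, bits) if bit)
           for bits in product((0, 1), repeat=len(S))]
functions = [indicator(A) for A in subsets]
```

Injectivity shows up as 8 distinct functions for the 8 subsets; surjectivity follows by counting, since there are exactly 2^#(S) functions S → {0, 1}.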
14.6.7 Definition: The (integer) power-of-two function is the function f : ℤ⁺₀ → ℤ⁺ which satisfies
f(0) = 1 and f(n + 1) = 2f(n) for n ∈ ℤ⁺₀.
14.6.8 Notation: 2^n for n ∈ ℤ⁺₀ denotes the value of the integer power-of-two function for argument n ∈ ℤ⁺₀.
14.6.9 Theorem: #(2^∅) = 1.
Proof: By Theorem 10.2.21, 2^∅ = {∅}. Therefore #(2^∅) = 1.
14.6.10 Definition: The (Kronecker) delta function on a set S, for target set V, is the function
f : S × S → V which is defined by
f(x, y) = 1_V if x = y, and f(x, y) = 0_V if x ≠ y.
14.6.11 Notation: δ, for sets S and V, denotes the Kronecker delta function on S with any value set V.
δ_{ij}, δ_i^j, δ^i_j and δ^{ij} denote the value δ(i, j) of the Kronecker delta function δ : S × S → V for i, j ∈ S, for any
set S with any value set V.
14.6.12 Remark: The Kronecker delta function is a function template.
The Kronecker delta function in Definition 14.6.10 is often applied to sets V which are subsets or supersets
of the integers. The Kronecker delta function for V = ℤ is illustrated in Figure 14.6.2.
As mentioned in Remark 14.6.1, the range of the Kronecker delta function must be interpreted according to
context. The symbols 0 and 1 must be defined within the context. Thus the Kronecker delta function
is really a function template.
14.6.13 Remark: Customary notation for the Kronecker delta function template.
An irony of the Kronecker delta function notation is that δ is apparently a mnemonic for the word
"difference", whereas the value 1 is generally associated with the meaning "true" and the value 0 is generally
associated with the meaning "false". But δ_{ij} equals 1 if and only if i and j are the same!
14.6.14 Remark: Relation between indicator functions and Kronecker delta functions.
Theorem 14.6.15 shows that the indicator function and delta function for any set S are closely related. A
Kronecker delta function may be thought of as an indicator function which indicates a singleton set.
14.6.15 Theorem: For any set S,
∀i, j ∈ S, χ_{{i}}(j) = δ_{ij}.
Proof: Let S be a set. Let i, j ∈ S. Let A = {i}. Then A ⊆ S. By Definition 14.6.2 and Notation 14.6.3,
χ_{{i}}(j) = χ_A(j) = 1_V if j ∈ A = {i}, and χ_{{i}}(j) = 0_V if j ∉ A = {i}. So χ_{{i}}(j) = 1_V if i = j, and
χ_{{i}}(j) = 0_V if i ≠ j. So χ_{{i}}(j) = δ_{ij} by Definition 14.6.10 and Notation 14.6.11.
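Theorem 14.6.15 can also be verified mechanically on a small set; an illustrative sketch (value set {0, 1}, names not the book's):

```python
# The Kronecker delta and the indicator function of a singleton agree:
# chi_{{i}}(j) = delta(i, j) for all i, j in S.
def delta(i, j):
    return 1 if i == j else 0

def indicator_singleton(i, j):
    """chi_{{i}}(j) with value set {0, 1}."""
    return 1 if j in {i} else 0

S = range(5)
agree = all(indicator_singleton(i, j) == delta(i, j) for i in S for j in S)
```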
14.6.17 Definition: The support of a function f : S → V is the set {x ∈ S; f(x) ≠ 0_V}, where the zero
element 0_V ∈ V is determined by the context.
14.6.19 Theorem: Let S and V be sets. Let 0_V, 1_V ∈ V be distinct. Then ∀A ∈ ℙ(S), supp(χ_A) = A.
Proof: The assertion follows directly from Definitions 14.6.2 and 14.6.17.
There are two problems with this. First, the formula f(x, y) = δ(y = x²) specifies neither the domain nor
the range of f . This problem is not too serious because the Kronecker delta function also has an unspecified
domain and range, which are generally indicated in the surrounding context. The second problem is more
serious, namely that the argument of δ is a proposition. This breaks the rules of ZF set theory, which is a
first-order language. Even worse than this is the fact that, in the example, x and y must be substituted into
the proposition somehow. However, if this form of notation is understood to be a pseudo-notation template,
it can be used reasonably unambiguously as a shorthand.
Despite its customary notation δ, the Kronecker delta function for propositions is more similar to
an indicator function than a delta function. In any given context, the delta pseudo-notation template in
Notation 14.6.21 can be converted to valid ZF language by copying the proposition P into the context. Note,
however, that the string of characters in the text representing the proposition must be significant within the
context. This is metamathematical enough to require the [mm] tag.
14.7. Permutations
14.7.1 Remark: Permutations within one set versus permutations between two sets.
The word permutation is used for two different but related concepts. The first kind of permutation, given
in Definition 14.7.2, is a bijection P from a set X to itself.
The second kind of permutation is an injection f from a set X to a set Y . This kind of permutation is called
a k-permutation by EDM2 [109], article 330, to distinguish it from the set-bijection style of permutation.
In the special case that X = ℕ_k, the two concepts are the same. Feller [68], page 28, uses the term "ordered
sample" for an ordered selection and uses the word "permutation" for the special case #(X) = k. The
second kind of permutation is called an "ordered selection" in this book. (This is defined in Section 14.9.)
14.7.7 Notation: perm₀(X), for a set X, denotes the set of finite permutations of X. In other words,
perm₀(X) = {P ∈ perm(X); #({x ∈ X; P(x) ≠ x}) < ∞}.
14.7.8 Remark: Transpositions are permutations which swap only two elements.
A permutation f of a set X is a transposition if and only if exactly two elements of the set are swapped; in
other words, #{x ∈ X; f(x) ≠ x} = 2. If the set X in Definition 14.7.9 has fewer than two elements, there
are no transpositions on X because the only permutation is the identity map. So clearly all transpositions
and the identity map are finite permutations.
Definition 14.7.9 is related to the swap-item list operation in Definition 14.11.6.
14.7.11 Theorem: Every finite permutation f of a set X may be expressed as f = T_k ∘ T_{k−1} ∘ ⋯ ∘ T_1 for
a sequence (T_i)_{i=1}^k of transpositions of X with k ∈ ℤ⁺₀.
Proof: Let A = {x ∈ X; f(x) ≠ x}. Then n = #(A) < ∞ by Definition 14.7.6. Let β : ℕ_n → A be a
bijection, which exists because A is finite. Let P = f↾A. If n = 0, then the assertion is valid with k = 0.
The case n = 1 is impossible. If n = 2, then P = T_1 with T_1(β(1)) = β(2) and T_1(β(2)) = β(1).
Now assume that the assertion is true for n ≤ n₀ for some n₀ ∈ ℤ⁺ \ {1}. Suppose that #(A) = n₀ + 1,
and let a = β(n₀ + 1). Define T : A → A by
∀x ∈ A, T(x) = a if x = P(a), T(x) = P(a) if x = a, and T(x) = x otherwise,
which is a transposition of A by Definition 14.7.9 because P(a) ≠ a, since P(x) ≠ x for all x ∈ A. Let
P′ = T ∘ P. Then
∀x ∈ A, P′(x) = T(P(x)) = a if P(x) = P(a), P(a) if P(x) = a, and P(x) otherwise
= a if x = a, P(a) if x = P⁻¹(a), and P(x) otherwise.
Let A′ = {x ∈ A; P′(x) ≠ x}. Then #(A′) < n₀ + 1 because P′(a) = a. So P′ = T_k ∘ T_{k−1} ∘ ⋯ ∘ T_1 for a
sequence (T_i)_{i=1}^k of transpositions of A′ with k ∈ ℤ⁺₀. Therefore P = T_{k+1} ∘ T_k ∘ ⋯ ∘ T_1 with T_{k+1} = T, which
verifies the assertion for n ≤ n₀ + 1. Hence by induction, the assertion follows for all finite permutations
of X.
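The induction in the proof above is effectively an algorithm: repeatedly compose with a transposition that fixes one more point. An illustrative sketch (not the book's notation), with permutations of {0, ..., n−1} represented as tuples P where P[x] = P(x):

```python
def apply_transposition(t, P):
    """Compose T∘P, where T swaps the two values in t."""
    u, v = t
    return [v if p == u else u if p == v else p for p in P]

def transpositions(P):
    """Return transpositions t_1, ..., t_k with P = t_1 ∘ t_2 ∘ ... ∘ t_k."""
    P = list(P)
    result = []
    for a in range(len(P) - 1, -1, -1):  # largest non-fixed point first
        if P[a] != a:
            t = (a, P[a])
            result.append(t)
            P = apply_transposition(t, P)  # the new P fixes a
    return result

def compose_all(ts, n):
    """Rebuild the permutation by applying the transpositions right to left."""
    P = list(range(n))
    for t in reversed(ts):
        P = apply_transposition(t, P)
    return tuple(P)
```

Since each transposition is its own inverse, the recorded transpositions recompose to the original permutation.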
14.7.14 Theorem: Let X be a finite set with a total order <. For all permutations P of X, define
S(P) = {(x, y) ∈ X × X; x < y and P(x) > P(y)}, N(P) = #(S(P)) and ε(P) = (−1)^{N(P)}. Then
(i) N(P⁻¹) = N(P) for all permutations P of X.
(ii) #(S(P ∘ Q) △ S(Q)) = #(S(P)) for all permutations P and Q of X. (See Definition 8.3.2 and
Notation 8.3.3 for the symmetric set difference operator △.)
(iii) N(P ∘ Q) = #(S(P) △ S(Q⁻¹)) for all permutations P and Q of X.
(iv) ε(P ∘ Q) = ε(P) ε(Q) for all permutations P and Q of X.
(v) ε(T) = −1 for all transpositions T of X.
(vi) ε(P) = 1 for all even permutations of X and ε(P) = −1 for all odd permutations of X.
Line (14.7.1) follows by relabelling (x, y) as (y, x). The line (14.7.2) follows by replacing (y, x) with
(P(y), P(x)), exploiting the assumption that P is a bijection. Hence S(P⁻¹) = f(S(P)), where f : X × X →
X × X is the map f : (x, y) ↦ (P(y), P(x)). But f is a bijection. So N(P⁻¹) = N(P).
For part (ii), let P and Q be permutations of X. Then
S(Q) = {(Q⁻¹(x), Q⁻¹(y)) ∈ X × X; Q⁻¹(x) < Q⁻¹(y) and x > y}.
So
S(P ∘ Q) \ S(Q) = {(Q⁻¹(x), Q⁻¹(y)) ∈ X × X; Q⁻¹(x) < Q⁻¹(y) and P(x) > P(y) and x < y}
and
S(Q) \ S(P ∘ Q) = {(Q⁻¹(x), Q⁻¹(y)) ∈ X × X; Q⁻¹(x) < Q⁻¹(y) and x > y and P(x) < P(y)}
= {(Q⁻¹(y), Q⁻¹(x)) ∈ X × X; Q⁻¹(x) > Q⁻¹(y) and x < y and P(x) > P(y)}
= g({(Q⁻¹(x), Q⁻¹(y)) ∈ X × X; Q⁻¹(x) > Q⁻¹(y) and x < y and P(x) > P(y)}),
where g : (x, y) ↦ (y, x) is a bijection. Therefore
#(S(P ∘ Q) △ S(Q)) = #{(Q⁻¹(x), Q⁻¹(y)) ∈ X × X; x < y and P(x) > P(y)}
= #{(x, y) ∈ X × X; x < y and P(x) > P(y)}
= #(S(P)).
For part (iii), let P and Q be permutations of X. Then #(S(P) △ S(Q⁻¹)) = #(S(P ∘ Q)) by replacing Q
with Q⁻¹ in part (ii) and then replacing P with P ∘ Q. Thus N(P ∘ Q) = #(S(P) △ S(Q⁻¹)).
For part (iv), let P and Q be permutations of X. Note that #(A △ B) = #(A) + #(B) − 2 #(A ∩ B) for
any finite sets A and B by Theorem 13.4.9. Let A = S(P) and B = S(Q⁻¹). Then #(S(P) △ S(Q⁻¹)) =
#(S(P)) + #(S(Q⁻¹)) − 2 #(S(P) ∩ S(Q⁻¹)) = N(P) + N(Q) − 2 #(S(P) ∩ S(Q⁻¹)) by part (i). So
N(P ∘ Q) = N(P) + N(Q) − 2 #(S(P) ∩ S(Q⁻¹)) by part (iii). Hence ε(P ∘ Q) = ε(P)ε(Q) because
2 #(S(P) ∩ S(Q⁻¹)) is an even integer.
For part (v), let X be a finite set with total order <. Let T be a transposition of X with T(i) = j and
T(j) = i for i, j ∈ X with i < j. Then T(x) > T(y) for x < y if and only if
((i < x and x < j) and y = j) or (x = i and (i < y and y < j)) or (x = i and y = j).
Part (vi) follows from parts (iv) and (v) and Definition 14.7.13.
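The inversion-count definitions of N(P) and ε(P), and the multiplicativity asserted in part (iv), can be checked exhaustively for small permutations. An illustrative sketch (names are not the book's; permutations of {0, ..., n−1} are tuples):

```python
from itertools import permutations

def N(P):
    """Number of inversions: pairs x < y with P(x) > P(y)."""
    n = len(P)
    return sum(1 for x in range(n) for y in range(x + 1, n) if P[x] > P[y])

def eps(P):
    return (-1) ** N(P)

def compose(P, Q):
    """(P∘Q)(x) = P(Q(x))."""
    return tuple(P[Q[x]] for x in range(len(P)))

# Part (iv): eps(P∘Q) = eps(P) eps(Q), checked for all pairs on 4 points.
multiplicative = all(
    eps(compose(P, Q)) == eps(P) * eps(Q)
    for P in permutations(range(4))
    for Q in permutations(range(4))
)
```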
Proof: For part (i), let X be a set. Suppose P ∈ X⁺ ∩ X⁻. Then P = S_k ∘ S_{k−1} ∘ ⋯ ∘ S_1 for some sequence
(S_i)_{i=1}^k of transpositions of X, where k is an even integer, and P = T_ℓ ∘ T_{ℓ−1} ∘ ⋯ ∘ T_1 for some sequence
(T_i)_{i=1}^ℓ of transpositions of X, where ℓ is an odd integer. So id_X = P ∘ P⁻¹ = U_m ∘ U_{m−1} ∘ ⋯ ∘ U_1 for a
sequence (U_i)_{i=1}^m of transpositions of X, where m = k + ℓ is an odd integer. Thus id_X ∈ X⁺ ∩ X⁻.
Let Y = {x ∈ X; ∃i ∈ ℕ_m, U_i(x) ≠ x}. Then Y is a finite set. Every finite set has a total order. So let <
be a total order on Y. Define N(Q) = #{(x, y) ∈ Y × Y; x < y and Q(x) > Q(y)} and ε(Q) = (−1)^{N(Q)} for
all permutations Q of X which satisfy ∀x ∈ X \ Y, Q(x) = x. Then by Theorem 14.7.14 (vi) applied to the
restrictions of such permutations to Y, ε(P) = 1 and ε(P) = −1, which is impossible. Hence X⁺ ∩ X⁻ = ∅.
Part (ii) may be proved by induction on #(X). The assertion is trivially true for #(X) = 1. Suppose that
the assertion is true for sets Y with #(Y) ≤ n ∈ ℕ. Let X be a set with #(X) = n + 1. Let P be a
permutation of X. Let x ∈ X. Define Q : X → X to equal P if P(x) = x. Otherwise define Q = T_{x,P(x)} ∘ P,
where T_{x,P(x)} is the transposition of x and P(x). Then Q(x) = x, and so Q↾X\{x} is a permutation of
X \ {x}. But #(X \ {x}) = n. So Q↾X\{x} is a product of an even or odd number of transpositions. Therefore
P is a product of an even or odd number of transpositions. Hence X⁺ ∪ X⁻ = perm(X).
Part (iii) follows from parts (i) and (ii), and Definition 8.6.11.
14.7.16 Remark: Partition of permutations into odd and even does not require an ordered set.
Theorem 14.7.15 applies to all finite permutations on an arbitrary set X. Even though a total order is defined
in the proof of part (i), the assertion does not require any order to be given on X, or on any subset of X, and
the assertions of Theorem 14.7.15 are order-independent because Definition 14.7.13 is order-independent.
Thus the parity of finite permutations in Definition 14.7.17 is a well-defined function from the set perm₀(X)
of all finite permutations on an arbitrary set X to the set {−1, 1}.
14.7.17 Definition: The parity function for a set X is the integer-valued function on the set of finite
permutations on X with value −1 for odd permutations and value 1 for even permutations.
14.7.18 Notation: parity : perm₀(X) → {−1, 1}, for a set X, denotes the parity function for X. In other
words, for any set X,
∀P ∈ perm₀(X), parity(P) = 1 if P is an even permutation of X, and parity(P) = −1 if P is an odd
permutation of X.
where N(P) = #({(x, y) ∈ X × X; x < y and P(x) > P(y)}) for all P ∈ perm₀(X).
Proof:
⟨⟨ 2017-4-9. To be continued . . . ⟩⟩
14.7.21 Definition: The factorial function is the map f : ℤ⁺₀ → ℤ⁺₀ defined inductively by f(0) = 1 and
∀n ∈ ℤ⁺, f(n) = nf(n − 1).
14.7.22 Notation: n! denotes the value of the factorial function for argument n.
14.7.24 Definition: The Jordan factorial function is the map f : ℤ⁺₀ × ℤ⁺₀ → ℤ⁺₀ defined inductively by
∀n ∈ ℤ⁺₀, f(n, 0) = 1 and ∀n, k ∈ ℤ⁺₀, f(n, k + 1) = (n − k)f(n, k).
14.7.25 Notation: (n)_k denotes the value of the Jordan factorial function for argument (n, k).
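The two recursions of Definitions 14.7.21 and 14.7.24 can be transcribed directly; an illustrative sketch (names are not the book's):

```python
# n! by the recursion f(0) = 1, f(n) = n f(n-1).
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

# The Jordan (falling) factorial (n)_k = n (n-1) ... (n-k+1),
# by the recursion f(n, 0) = 1, f(n, k+1) = (n - k) f(n, k).
def jordan_factorial(n, k):
    return 1 if k == 0 else (n - (k - 1)) * jordan_factorial(n, k - 1)
```

Note that (n)_n = n!, so the factorial is the diagonal case of the Jordan factorial.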
Define : perm(Nn ) Nn+1 perm(Nn+1 ) by (Q, j) = Tj,n+1 (Q{(n+1, n+1)}) for all Q perm(Nn )
and j Nn+1 . Then (Q, j) perm(Nn+1 ) for all Q perm(Nn ) and j Nn+1 because Tj,n+1 and
Q {(n + 1, n + 1)} are both permutations of Nn+1 . So ((P )) perm(Nn+1 ) for all P Nn+1 . Summary:
N
Let P n+1 . Then (P ) = (Q, j), where Q = TP (n+1),n+1 P Nn and j = P (n+1). So ((P )) = (Q, j).
Therefore Q {(n + 1, n + 1)} = TP (n+1),n+1 P = Tj,n+1 P . So (Q, j) = Tj,n+1 Tj,n+1 P = P . Thus
N
((P )) = P for all P n+1 . In other words, = idNn+1 .
N N N
Let (Q, j) perm( n ) n+1 . Let P = (Q, j). Then P perm( n+1 ). Therefore ((Q, j)) = (P )
N N N N N N
perm( n ) n+1 . Thus : perm( n ) n+1 perm( n ) n+1 is a well-defined function. Since
P = Tj,n+1 (Q {(n + 1, n + 1)}), it follows that P (n + 1) = Tj,n+1 (n + 1) = j and
TP (n+1),n+1 P Nn = Tj,n+1 (Tj,n+1 (Q {(n + 1, n + 1)}))Nn
= (Q {(n + 1, n + 1)}) N
n
= Q.
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
Thus ψ(φ(Q, j)) = (Q, j) for all (Q, j) ∈ perm(ℕ_n) × ℕ_{n+1}. In other words, ψ ∘ φ = id_{perm(ℕ_n)×ℕ_{n+1}}.
Therefore ψ is the inverse of φ by Theorem 10.5.12 (i). So ψ : perm(ℕ_{n+1}) → perm(ℕ_n) × ℕ_{n+1} is a bijection
by Theorem 10.5.12 (iv). Therefore #(perm(ℕ_{n+1})) = #(perm(ℕ_n) × ℕ_{n+1}) by Definition 13.1.2 and
Notation 13.1.5. So #(perm(ℕ_{n+1})) = #(perm(ℕ_n)) · #(ℕ_{n+1}) = (n+1) n! = (n+1)! by Definition 14.7.21.
Hence #(perm(ℕ_n)) = n! for all n ∈ ℤ⁺₀ by induction on n.
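The bijection argument above can be spot-checked by brute force for a small n. This sketch is not from the book: it represents permutations of ℕ_n as dicts, the names phi and transpose_after are my own (phi standing for the map φ of the proof, transpose_after for composition with the transposition T_{j,n+1}), and it confirms that φ reaches all (n+1)! permutations of ℕ_{n+1}.

```python
from itertools import permutations
from math import factorial

def transpose_after(p, j, k):
    """Compose the transposition T_{j,k} with p, i.e. swap the values j and k."""
    return {i: (k if v == j else j if v == k else v) for i, v in p.items()}

def phi(q, j, n):
    """phi(Q, j) = T_{j,n+1} o (Q u {(n+1, n+1)}), as in the counting proof."""
    q1 = dict(q)
    q1[n + 1] = n + 1  # extend Q to a permutation of N_{n+1} fixing n+1
    return transpose_after(q1, j, n + 1)

n = 3
perms_n = [dict(zip(range(1, n + 1), p)) for p in permutations(range(1, n + 1))]
images = {tuple(sorted(phi(q, j, n).items()))
          for q in perms_n for j in range(1, n + 2)}
# phi is injective on perm(N_n) x N_{n+1}, so its n! * (n+1) images are
# (n+1)! distinct permutations of N_{n+1}, i.e. all of them.
assert len(images) == factorial(n + 1)
```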
14.7.29 Definition: The Levi-Civita (alternating) symbol or Levi-Civita tensor (with n indices) is the
function ε : (ℕ_n → ℕ_n) → {−1, 0, 1}, for some n ∈ ℤ⁺₀, defined by

    ε(f) = −1 if f is an odd permutation of ℕ_n,
            0 if f is not a permutation of ℕ_n,
           +1 if f is an even permutation of ℕ_n,

for all f : ℕ_n → ℕ_n.
14.7.30 Notation: ε_{i_1,...,i_n} for n ∈ ℤ⁺₀ denotes the value ε(i) of the Levi-Civita symbol for i = (i_k)_{k=1}^n.
ε^{i_1,...,i_n} is an alternative notation for ε_{i_1,...,i_n}.
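Definition 14.7.29 translates directly into a small function. This sketch (not from the book) determines the parity of the index sequence by counting inversions.

```python
def levi_civita(*indices):
    """Levi-Civita symbol per Definition 14.7.29: the sign of the index
    sequence as a permutation of (1, ..., n), and 0 otherwise."""
    n = len(indices)
    if sorted(indices) != list(range(1, n + 1)):
        return 0  # not a permutation of N_n
    # The parity of the number of inversions gives the sign.
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if indices[a] > indices[b])
    return -1 if inversions % 2 else 1
```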
14.7.31 Remark: The Levi-Civita alternating symbol is related to the parity function.
See Definitions 14.7.2 and 14.7.13 for permutations. The Levi-Civita alternating symbol is an extension of
the parity function in Definition 14.7.17 for the sets ℕ_n.
14.7.32 Remark: Multi-index-list and multi-index-degree-tuple definitions and notations.
Let n ∈ ℤ⁺₀. There are two main ways to abbreviate monomial expressions which are products of n variables
x_1, ..., x_n or compositions of n partial derivative operators ∂_1, ..., ∂_n. (There are two opposite conventions
for the order of application of partial derivatives, as mentioned in Remark 42.2.1.)
(1) Give a multi-index-list I ∈ List(ℕ_n) = ⋃_{ℓ=0}^∞ ℕ_n^ℓ.
    Algebra example: x_1 x_2 x_4 x_1 x_4 x_4 = x_I where I = (1, 2, 4, 1, 4, 4).
    Calculus example: ∂_1 ∂_2 ∂_4 ∂_1 ∂_4 ∂_4 = ∂_I where I = (1, 2, 4, 1, 4, 4) or I = (4, 4, 1, 4, 2, 1).
(2) Give a multi-index-degree-tuple α ∈ (ℤ⁺₀)^n.
    Algebra example: x_1 x_2 x_4 x_1 x_4 x_4 ≟ x_1^2 x_2^1 x_3^0 x_4^3 = x^α, where α = (2, 1, 0, 3) with n = 4.
    Calculus example: ∂_1 ∂_2 ∂_4 ∂_1 ∂_4 ∂_4 ≟ ∂_1^2 ∂_2^1 ∂_3^0 ∂_4^3 = ∂^α, where α = (2, 1, 0, 3) with n = 4.
The multi-index-list abbreviation style in (1) has the advantage of greater generality and detail because each
appearance of a variable or operator is listed individually in an explicit order. This style is also useful for
notating the multiple indices of general tensors. It has the disadvantage of being less brief.
The multi-index-degree-tuple abbreviation style in (2) has the advantage of brevity and fixed length when
the number of variables is fixed, but it has the serious disadvantage that it assumes commutativity and
associativity of the operation which is represented. This constraint is acceptable if ℓ partial derivatives are
acting on a C^ℓ function because the value of the expression is then order-independent by Theorem 42.3.6. But
this style of notation is then of no value for discussing scenarios where the order-independence is questionable
and must be proved, or possibly disproved.
The two styles are distinguished here by notating (1) as a subscript and (2) as a superscript. They will both
be referred to in this book by the name multi-index, although it is case (2) which is given this name in
most literature. The two styles of multi-indices are introduced in Definitions 14.7.33 and 14.7.34.
The (multi-index) factorial function is defined on (ℤ⁺₀)^k by α! = ∏_{i=1}^k α_i! for all α ∈ (ℤ⁺₀)^k.
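The relation between the two abbreviation styles, and the multi-index factorial, can be sketched in code. The helper names below are my own; degree_tuple forgets exactly the ordering information which the remark above warns about.

```python
from math import prod, factorial

def degree_tuple(I, n):
    """Convert a multi-index-list I in List(N_n) to its degree tuple
    alpha in (Z+_0)^n, discarding the order of appearance."""
    return tuple(I.count(i) for i in range(1, n + 1))

def multi_index_factorial(alpha):
    """Multi-index factorial: alpha! = product of alpha_i! over all i."""
    return prod(factorial(a) for a in alpha)
```

For the example in the remark, degree_tuple([1, 2, 4, 1, 4, 4], 4) gives (2, 1, 0, 3), and (2, 1, 0, 3)! = 2!·1!·0!·3! = 12.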
14.7.35 Remark: Cardinality conservation and flow-balance equations for finite permutations.
Theorem 14.7.36 (i) is a kind of cardinality conservation equation for bijections, although it is little more than
the definition of cardinality. Theorem 14.7.36 (ii) is a kind of flow-balance equation for permutations because
it states that the number of elements of S which flow into the complement X \ S is equal to the number of
elements of X \ S which flow into S. The flow function f may be thought of as a very minimal kind of
evolution transform for X. (This flow-balance equation is useful for the proof of Theorem 14.10.5 (xiii).)
14.8. Combinations
14.8.1 Remark: Alternative names for combinations.
A combination is called a k-combination by EDM2 [109], article 330, but it could also be referred to
as an unordered selection or unordered sample. (See Remark 14.9.1 for k-permutations and ordered
selections.) Although these concepts are important in probability theory, they are even more important in
analysis, particularly as coefficients of Taylor series.
In Definition 14.8.2, ℤ⁺₀ is identified with ω, the set of finite ordinals. Therefore n = {0, 1, ..., n−1} for n ∈ ℤ⁺₀.
14.8.2 Definition: The combination symbol is the function C : ℤ⁺₀ × ℤ → ℤ⁺₀ defined by
    ∀n ∈ ℤ⁺₀, ∀r ∈ ℤ, C(n, r) = #{x ∈ P(n); #(x) = r}.
14.8.3 Notation: C^n_r, for n ∈ ℤ⁺₀ and r ∈ ℤ, denotes the value of the combination symbol C(n, r).
14.8.4 Remark: Alternative notation for the combination function.
Definition 14.8.2 means that C^n_r is the number of distinct r-element subsets of an n-element set. The
binomial-coefficient notation (n above r in parentheses) is a popular alternative to C^n_r. However, when
combination symbols are mixed with other areas of mathematics, that notation tends to be ambiguous.
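Definition 14.8.2 can be evaluated literally by enumerating subsets. This sketch (not from the book) does so with the standard library.

```python
from itertools import combinations

def C(n, r):
    """Combination symbol per Definition 14.8.2: the number of r-element
    subsets of an n-element set, here taken as {0, ..., n-1}."""
    if r < 0:
        return 0  # no subset has negative cardinality
    return sum(1 for _ in combinations(range(n), r))
```

For example C(4, 2) = 6, and C(n, r) = 0 whenever r < 0 or r > n.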
Having given the proportions which are found between the cells and the rows of arithmetic triangles, I pass
on to various uses of those whose generator is unity; this is what will be seen in the following treatises.
But I leave out far more than I give; it is a strange thing how fertile it is in properties; everyone may
try his hand at it. I give notice only that, in all that follows, I intend to speak only of arithmetic
triangles whose generator is unity.
The comment about fertile properties is more often translated as: "It is extraordinary how fertile in properties
this triangle is. Everyone can try his hand." The conclusion one may draw from this is that it is pointless
to try to make a comprehensive list of properties of the combination symbol (i.e. Pascal's triangle).
Pascal's stipulation that the generator is unity is equivalent to part (i) of Theorem 14.8.8.
However, it seems these Chinese authors merely wrote out six or eight rows of the triangle without
systematically investigating its properties as Pascal did.
Boyer/Merzbach [217], pages 254 and 269, mentions that Petrus Apianus published a Pascal triangle in
Germany in 1527. They mention also a Pascal triangle published in Germany in 1544 by Michael Stifel.
However, like the Chinese authors, these German authors did not investigate the properties as Pascal did.
14.8.7 Remark: Finite difference rule for Pascal's triangle and the combination function.
To establish that the cells of Pascal's triangle are equal to the combination symbol in Definition 14.8.2, it
is necessary to show that the combination symbol satisfies the induction rule in Theorem 14.8.8 (ii).

    C^n_{r-1}   C^n_r
        C^{n+1}_r

From this, it is possible to write various formulas such as C^n_r = ∏_{i=0}^{r-1} (n−i)/(1+i) in Theorem 16.7.4 (iii).
Such a formula requires a definition of division for rational or real numbers, even though the result of this
product of ratios is an integer. This can be performed within the integers by noting that the denominator always
divides exactly in the partial products of the terms (n−i)/(1+i), but it is simpler to wait until the real numbers
have been fully characterised in Section 15.9.
14.8.8 Theorem:
(i) ∀r ∈ ℤ, C^0_r = δ_{r,0}.
(ii) ∀n ∈ ℤ⁺₀, ∀r ∈ ℤ, C^{n+1}_r = C^n_{r−1} + C^n_r.
Proof: For part (i), let r ∈ ℤ. Then C^0_r = #{x ∈ P(∅); #(x) = r}. But P(∅) = {∅} and #(∅) = 0. So
C^0_r = 1 if r = 0, and C^0_r = 0 if r ≠ 0. In other words, C^0_r = δ_{r,0}.
For part (ii), let n ∈ ℤ⁺₀ and r ∈ ℤ. Then
    C^{n+1}_r = #{x ∈ P(n+1); #(x) = r}
              = #{x ∈ P(n+1); #(x) = r and n ∈ x} + #{x ∈ P(n+1); #(x) = r and n ∉ x}
              = #{x′ ∪ {n}; x′ ∈ P(n) and #(x′) = r−1} + #{x′ ∈ P(n); #(x′) = r}
              = #{x′ ∈ P(n); #(x′) = r−1} + #{x′ ∈ P(n); #(x′) = r}
              = C^n_{r−1} + C^n_r.
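Both the induction rule of Theorem 14.8.8 (ii) and the product formula quoted in Remark 14.8.7 can be spot-checked numerically. This sketch (not from the book) uses the standard library's math.comb as the reference value; the observation that each partial product of the terms (n−i)/(1+i) is an integer is what makes the floor division exact.

```python
from math import comb

# Theorem 14.8.8 (ii): C^{n+1}_r = C^n_{r-1} + C^n_r.
for n in range(10):
    for r in range(1, n + 2):
        assert comb(n + 1, r) == comb(n, r - 1) + comb(n, r)

def C_product(n, r):
    """C^n_r via the product formula, using exact integer arithmetic:
    after step i the running value is C(n, i+1), an integer."""
    value = 1
    for i in range(r):
        value = value * (n - i) // (1 + i)
    return value

assert all(C_product(n, r) == comb(n, r)
           for n in range(10) for r in range(n + 1))
```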
14.9.2 Remark: Notations for sets of strongly and weakly ordered selections.
The notations I^n_r and J^n_r in Notation 14.9.3 are non-standard. (Federer [67], page 15, gives the notation
Λ(n, r) for I^n_r.) I^n_r could be thought of as the set of strongly ordered selections of r elements from n, and
then J^n_r would be the corresponding set of weakly ordered selections. J^n_r corresponds to the set of possible
independent partial derivatives of order r of a C^r function of n variables. (See Feller [68], page 39.)
14.9.5 Theorem:
(i) ∀n ∈ ℤ⁺₀, ∀r ∈ ℤ⁺₀, #(I^n_r) = C^n_r.
(ii) ∀n ∈ ℤ⁺, ∀r ∈ ℤ⁺₀, #(J^n_r) = C^{n+r−1}_r.
(iii) ∀r ∈ ℤ⁺, #(J^0_r) = C^{0+r−1}_r.
(iv) ∀n ∈ ℤ⁺₀, ∀r ∈ ℤ⁺, #(J^n_r) = C^{n+r−1}_r.
Proof: For part (i), let n, r ∈ ℤ⁺₀. Then for all x ∈ P(ℕ_n) such that #(x) = r, define β(x) to be the
standard enumeration of x from r to x, as described in Definition 12.3.5. Then β(x) : r → x is an
increasing bijection. Define β′(x) : ℕ_r → x by β′(x) : i ↦ β(x)(i−1). (This converts the start-at-zero domain r
to the start-at-one domain ℕ_r.) Then β′(x) ∈ I^n_r. Since β′ : {x ∈ P(ℕ_n); #(x) = r} → I^n_r is a bijection, it
follows from Definition 14.8.2 that #(I^n_r) = C^n_r.
For part (ii), let n = 1 and r ∈ ℤ⁺₀. Then J^n_r = {α}, where α : ℕ_r → ℕ_1 is defined by α : x ↦ 1. So
#(J^1_r) = 1 = C^{n+r−1}_r. Now assume the validity of the assertion for some n ∈ ℤ⁺, for all r ∈ ℤ⁺₀. Then
    ∀r ∈ ℤ⁺₀, #(J^{n+1}_r) = #({f ∈ J^{n+1}_r; f(r) = n+1}) + #({f ∈ J^{n+1}_r; f(r) ≠ n+1})
                           = #(J^{n+1}_{r−1}) + #(J^n_r)
                           = C^{n+r−1}_{r−1} + C^{n+r−1}_r
                           = C^{(n+1)+r−1}_r.
It is perhaps noteworthy that Theorem 14.9.5 is equivalent to the following assertions.
    ∀n, r ∈ ℤ⁺₀, #(I^n_r) = ∏_{i=0}^{r−1} (n−i)/(1+i).
    ∀n, r ∈ ℤ⁺₀, #(J^n_r) = ∏_{i=0}^{r−1} (n+i)/(1+i).
These are more similar to each other than is suggested by parts (i) and (ii) of Theorem 14.9.5.
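Parts (i) and (ii) of Theorem 14.9.5 can be checked by brute-force enumeration for small n and r. The sketch below is illustrative only; count_selections is my own name.

```python
from itertools import product
from math import comb

def count_selections(n, r):
    """Count the strictly increasing maps (I^n_r) and the non-decreasing
    maps (J^n_r) from N_r into N_n by exhaustive enumeration."""
    tuples = list(product(range(1, n + 1), repeat=r))
    strong = sum(1 for t in tuples
                 if all(t[i] < t[i + 1] for i in range(r - 1)))
    weak = sum(1 for t in tuples
               if all(t[i] <= t[i + 1] for i in range(r - 1)))
    return strong, weak

for n in range(1, 6):
    for r in range(5):
        strong, weak = count_selections(n, r)
        assert strong == comb(n, r)           # Theorem 14.9.5 (i)
        assert weak == comb(n + r - 1, r)     # Theorem 14.9.5 (ii)
```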
14.9.7 Notation: f_σ for a function f : ℕ_n → A and σ ∈ I^n_m for any set A means the sequence
f_σ = (f(σ(j)))_{j=1}^m = (f_{σ(j)})_{j=1}^m = (f_{σ_1}, ..., f_{σ_m}) : ℕ_m → A.
where J_i = {j ∈ ℕ_r; g(j) < g(i) or (g(j) = g(i) and j ≤ i)} for all i ∈ ℕ_r.
(i) ∀i ∈ ℕ_r, i ∈ J_i.
(ii) ∀i, i′ ∈ ℕ_r, g(i) < g(i′) ⟹ (J_i ⊆ J_{i′} and J_i ≠ J_{i′}).
(iii) ∀i, i′ ∈ ℕ_r, (g(i) ≤ g(i′) and i < i′) ⟹ (J_i ⊆ J_{i′} and J_i ≠ J_{i′}).
(iv) P ∈ perm(ℕ_r).
(v) g ∘ P^{-1} ∈ NonDec(ℕ_r, S).
(vi) If g ∈ Inj(ℕ_r, S), then g ∘ P^{-1} ∈ Inc(ℕ_r, S).
Proof: For part (i), let i ∈ ℕ_r. Let j = i. Then g(j) ≤ g(i) and j ≤ i. So i ∈ J_i.
For part (ii), let i, i′ ∈ ℕ_r with g(i) < g(i′). Let j ∈ J_i. Then j ∈ ℕ_r, and either g(j) < g(i) or else
g(j) = g(i) and j ≤ i. If g(j) < g(i) then g(j) < g(i′) and so j ∈ J_{i′}. Also, if g(j) = g(i) then g(j) < g(i′)
and so j ∈ J_{i′}. Therefore J_i ⊆ J_{i′}. By part (i), i′ ∈ J_{i′}. Suppose that i′ ∈ J_i. Then either g(i′) < g(i)
or else g(i′) = g(i) and i′ ≤ i. These alternatives both contradict the assumption g(i) < g(i′). So i′ ∉ J_i.
Hence J_i ≠ J_{i′}.
For part (iii), let i, i′ ∈ ℕ_r with g(i) ≤ g(i′) and i < i′. Let j ∈ J_i. Then j ∈ ℕ_r, and either g(j) < g(i)
or else g(j) = g(i) and j ≤ i. If g(j) < g(i) then g(j) < g(i′) and so j ∈ J_{i′}. If g(j) = g(i) and j ≤ i,
then g(j) ≤ g(i′) and j < i′, and so j ∈ J_{i′}. Therefore J_i ⊆ J_{i′}. By part (i), i′ ∈ J_{i′}. Suppose that
i′ ∈ J_i. Then either g(i′) < g(i) or else g(i′) = g(i) and i′ ≤ i. The alternative g(i′) < g(i) contradicts
the assumption g(i) ≤ g(i′). The alternative g(i′) = g(i) and i′ ≤ i contradicts the assumption i < i′. So
i′ ∉ J_i. Hence J_i ≠ J_{i′}.
For part (iv), if r = 0 then ℕ_r = ∅ and P = ∅, which implies P ∈ {∅} = perm(ℕ_r). So assume r ∈ ℤ⁺. Let
i ∈ ℕ_r. Then J_i ≠ ∅ by part (i). So P(i) ∈ ℕ_r. Thus P is a well-defined function from ℕ_r to ℕ_r.
To show that P is injective, let i, i′ ∈ ℕ_r with i < i′. If g(i) ≤ g(i′), then P(i) < P(i′) by part (iii). If g(i) >
g(i′), then P(i) > P(i′) by part (ii). Similarly P(i) ≠ P(i′) if i > i′. So ∀i, i′ ∈ ℕ_r, i ≠ i′ ⟹ P(i) ≠ P(i′).
Thus P is injective. So P : ℕ_r → ℕ_r is a bijection by Theorem 12.3.13 (i). Hence P ∈ perm(ℕ_r).
For part (v), let k, k′ ∈ ℕ_r with k < k′. Suppose that g(P^{-1}(k)) > g(P^{-1}(k′)). Let i = P^{-1}(k) and
i′ = P^{-1}(k′). Then g(i) > g(i′) and P(i) = k < k′ = P(i′). But by part (ii), g(i) > g(i′) implies that
P(i) = #(J_i) > #(J_{i′}) = P(i′), which is a contradiction. So g(P^{-1}(k)) ≤ g(P^{-1}(k′)). Thus g ∘ P^{-1} is
non-decreasing. In other words, g ∘ P^{-1} ∈ NonDec(ℕ_r, S) by Notation 11.1.19.
For part (vi), let g ∈ Inj(ℕ_r, S). By part (v), g ∘ P^{-1} ∈ NonDec(ℕ_r, S). Suppose that g ∘ P^{-1} ∉ Inc(ℕ_r, S).
Then g(P^{-1}(k)) = g(P^{-1}(k′)) for some k, k′ ∈ ℕ_r with k < k′. Let i = P^{-1}(k) and i′ = P^{-1}(k′). Then
g(i) = g(i′). So i = i′ because g ∈ Inj(ℕ_r, S). So k = P(i) = P(i′) = k′, which is a contradiction. Hence
g ∘ P^{-1} ∈ Inc(ℕ_r, S).
14.10.3 Definition: The standard sorting-permutation for a finite sequence g : ℕ_r → S, for r ∈ ℤ⁺₀ and
a totally ordered set S, is the permutation P_g ∈ perm(ℕ_r) defined by
    ∀i ∈ ℕ_r, P_g(i) = #({j ∈ ℕ_r; g(j) < g(i) or (g(j) = g(i) and j ≤ i)}).
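Definition 14.10.3 can be realised directly from the counting formula for the sets J_i in Theorem 14.10.2. In the sketch below (not from the book), sequences are 1-indexed dicts, and the final assertions check parts (iv) and (v) of Theorem 14.10.2.

```python
def sorting_permutation(g):
    """Standard sorting-permutation P_g of a finite sequence g, given as a
    dict {1: g(1), ..., r: g(r)}:
    P_g(i) = #{j : g(j) < g(i), or g(j) = g(i) and j <= i}."""
    r = len(g)
    return {i: sum(1 for j in range(1, r + 1)
                   if g[j] < g[i] or (g[j] == g[i] and j <= i))
            for i in range(1, r + 1)}

g = {1: 3, 2: 1, 3: 3, 4: 2}
P = sorting_permutation(g)
assert sorted(P.values()) == [1, 2, 3, 4]   # P is a permutation (part (iv))
# g o P^{-1} is non-decreasing (part (v)): it lists the values of g in order,
# with ties broken stably by position.
P_inv = {v: k for k, v in P.items()}
assert [g[P_inv[k]] for k in range(1, 5)] == [1, 2, 3, 3]
```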
14.10.5 Theorem: Partitions of ℕ^n_r according to rearrangement orbits.
Let n, r ∈ ℤ⁺₀. Define Φ(f) = {f ∘ P; P ∈ perm(ℕ_r)} for all f ∈ ℕ^n_r.
(i) ∀f ∈ ℕ^n_r, Φ(f) ⊆ ℕ^n_r.
(ii) ∀f ∈ Inj(ℕ_r, ℕ_n), Φ(f) ⊆ Inj(ℕ_r, ℕ_n).
(iii) ∀f ∈ ℕ^n_r, f ∈ Φ(f).
(iv) ∀g ∈ Inj(ℕ_r, ℕ_n), ∃f ∈ I^n_r, g ∈ Φ(f).
(v) Inj(ℕ_r, ℕ_n) = ⋃{Φ(f); f ∈ I^n_r}.
(vi) ∀f, g ∈ I^n_r, g ∈ Φ(f) ⟹ g = f.
(vii) ∀f ∈ I^n_r, Φ(f) ∩ I^n_r = {f}.
(viii) ∀f, f′ ∈ I^n_r, f ≠ f′ ⟹ Φ(f) ∩ Φ(f′) = ∅.
(ix) {Φ(f); f ∈ I^n_r} is a partition of Inj(ℕ_r, ℕ_n).
(x) ∀f ∈ I^n_r, #(Φ(f)) = r!.
(xi) ∀g ∈ ℕ^n_r, ∃f ∈ J^n_r, g ∈ Φ(f).
(xii) ℕ^n_r = ⋃{Φ(f); f ∈ J^n_r}.
(xiii) ∀f, g ∈ J^n_r, g ∈ Φ(f) ⟹ g = f.
(xiv) ∀f ∈ J^n_r, Φ(f) ∩ J^n_r = {f}.
(xv) ∀f, f′ ∈ J^n_r, f ≠ f′ ⟹ Φ(f) ∩ Φ(f′) = ∅.
(xvi) {Φ(f); f ∈ J^n_r} is a partition of ℕ^n_r.
Proof: For part (i), let f ∈ ℕ^n_r. Let P ∈ perm(ℕ_r). Then Dom(P) = ℕ_r and Range(P) = ℕ_r. So
Dom(f ∘ P) = Dom(P) = ℕ_r and Range(f ∘ P) = f(Range(P)) = Range(f) ⊆ ℕ_n. Therefore f ∘ P ∈ ℕ^n_r.
Hence Φ(f) ⊆ ℕ^n_r.
For part (ii), let f ∈ Inj(ℕ_r, ℕ_n). Let P ∈ perm(ℕ_r). Then f ∘ P ∈ ℕ^n_r by part (i). But f ∘ P is injective
by Theorem 10.5.4 (i). Hence Φ(f) ⊆ Inj(ℕ_r, ℕ_n).
For part (iii), let f ∈ ℕ^n_r. Then id_{ℕ_r} ∈ perm(ℕ_r) by Definition 14.7.2 and Theorem 10.5.3. Therefore
f = f ∘ id_{ℕ_r} ∈ Φ(f).
For part (iv), let g ∈ Inj(ℕ_r, ℕ_n). Let P_g ∈ perm(ℕ_r) be the standard sorting-permutation for g as in
Definition 14.10.3 with S = ℕ_n. Let f = g ∘ P_g^{-1}. Then f ∈ Inc(ℕ_r, ℕ_n) = I^n_r by Theorem 14.10.2 (vi), and
g = f ∘ P_g ∈ Φ(f). Hence ∀g ∈ Inj(ℕ_r, ℕ_n), ∃f ∈ I^n_r, g ∈ Φ(f).
For part (v), Inj(ℕ_r, ℕ_n) ⊆ ⋃{Φ(f); f ∈ I^n_r} by part (iv). For the converse, let g ∈ ⋃{Φ(f); f ∈ I^n_r}. Then
g ∈ Φ(f) for some f ∈ I^n_r. So g = f ∘ P for some P ∈ perm(ℕ_r) and f ∈ I^n_r. Therefore g ∈ Inj(ℕ_r, ℕ_n) by
Theorem 10.5.4 (i). Thus ⋃{Φ(f); f ∈ I^n_r} ⊆ Inj(ℕ_r, ℕ_n). Hence Inj(ℕ_r, ℕ_n) = ⋃{Φ(f); f ∈ I^n_r}.
For part (vi), let f, g ∈ I^n_r with g ∈ Φ(f). Then g = f ∘ P for some P ∈ perm(ℕ_r). Suppose that P ≠ id_{ℕ_r}.
Then by Theorem 12.2.12 (i), P is not an increasing function. So i < j and P(i) > P(j) for some i, j ∈ ℕ_r.
Then g(i) = f(P(i)) > f(P(j)) = g(j) because f ∈ I^n_r, which contradicts the assumption that g ∈ I^n_r.
Therefore P = id_{ℕ_r}. So g = f.
Part (vii) follows from parts (iii) and (vi).
For part (viii), let f, f′ ∈ I^n_r with Φ(f) ∩ Φ(f′) ≠ ∅. Then g ∈ Φ(f) ∩ Φ(f′) for some g. So g = f ∘ P
and g = f′ ∘ P′ for some P, P′ ∈ perm(ℕ_r). Then f = f′ ∘ Q, where Q = P′ ∘ P^{-1} ∈ perm(ℕ_r) by
Theorem 10.5.4 (iii). So f ∈ Φ(f′). Therefore f = f′ by part (vi). Hence f ≠ f′ ⟹ Φ(f) ∩ Φ(f′) = ∅ by
Theorem 4.5.6 (xxx).
Part (ix) follows from parts (v) and (viii), and Definition 8.6.11.
For part (x), let f ∈ I^n_r and define the map β : perm(ℕ_r) → Φ(f) by β : P ↦ f ∘ P. Then β is a well-defined
surjection onto Φ(f). Let P, P′ ∈ perm(ℕ_r) with P ≠ P′. Let Q = P′ ∘ P^{-1}. Suppose that Q = id_{ℕ_r}.
Then P = Q ∘ P = P′ ∘ P^{-1} ∘ P = P′, which contradicts P ≠ P′. Therefore Q ≠ id_{ℕ_r}. So Q(i) ≠ i for some
i ∈ ℕ_r. Therefore f(Q(i)) ≠ f(i) for such i ∈ ℕ_r because f is increasing and consequently injective. Thus
f ∘ Q ≠ f. So f ∘ P′ = f ∘ Q ∘ P ≠ f ∘ P by Theorem 10.5.12 (vi) because P : ℕ_r → ℕ_r is a surjection. Thus
β(P) ≠ β(P′). It follows that β is an injection. So β : perm(ℕ_r) → Φ(f) is a bijection. Hence
#(Φ(f)) = #(perm(ℕ_r)) = r! by Theorem 14.7.28, Definition 13.1.2 and Notation 13.1.5.
For part (xi), let g ∈ ℕ^n_r. Let P_g ∈ perm(ℕ_r) be the standard sorting-permutation for g with range S = ℕ_n
as in Definition 14.10.3. Let f = g ∘ P_g^{-1}. Then f ∈ NonDec(ℕ_r, ℕ_n) = J^n_r by Theorem 14.10.2 (v), and
g = f ∘ P_g ∈ Φ(f). Hence ∀g ∈ ℕ^n_r, ∃f ∈ J^n_r, g ∈ Φ(f).
For part (xii), ℕ^n_r ⊆ ⋃{Φ(f); f ∈ J^n_r} by part (xi). But ⋃{Φ(f); f ∈ J^n_r} ⊆ ℕ^n_r by part (i). Hence
ℕ^n_r = ⋃{Φ(f); f ∈ J^n_r}.
For part (xiii), let f, g ∈ J^n_r with g ∈ Φ(f). Then g = f ∘ P for some P ∈ perm(ℕ_r). Suppose that g ≠ f.
Then g(i) ≠ f(i) for some i ∈ ℕ_r. Suppose that g(i) > f(i). Then f(P(i)) > f(i). So P(i) > i because
f ∈ J^n_r. Let S = ℤ[1, i]. Let S′ = ℕ_r \ S = ℤ[i+1, r]. Then #(S ∩ P(S′)) = #(S′ ∩ P(S)) by
Theorem 14.7.36 (ii). But P(i) ∈ S′ ∩ P(S). So #(S′ ∩ P(S)) ≥ 1. Therefore #(S ∩ P(S′)) ≥ 1. In other
words, P(j) ≤ i for some j ∈ ℕ_r with j > i. So g(j) = f(P(j)) ≤ f(i) since f ∈ J^n_r. Therefore g(j) < g(i),
which contradicts the assumption that g ∈ J^n_r. So g(i) ≯ f(i).
Now suppose that g(i) < f(i). Then f(P(i)) < f(i). So P(i) < i because f ∈ J^n_r. Let S = ℤ[1, i−1]. Let
S′ = ℕ_r \ S = ℤ[i, r]. Then #(S ∩ P(S′)) = #(S′ ∩ P(S)) by Theorem 14.7.36 (ii). But P(i) ∈ S ∩ P(S′).
So #(S ∩ P(S′)) ≥ 1. Therefore #(S′ ∩ P(S)) ≥ 1. In other words, P(j) ≥ i for some j ∈ ℕ_r with j < i. So
g(j) = f(P(j)) ≥ f(i) since f ∈ J^n_r. Therefore g(j) > g(i), which contradicts the assumption that g ∈ J^n_r.
So g(i) ≮ f(i). Thus there is no i ∈ ℕ_r with g(i) ≠ f(i). Hence g = f.
Part (xiv) follows from parts (iii) and (xiii).
For part (xv), let f, f′ ∈ J^n_r with Φ(f) ∩ Φ(f′) ≠ ∅. Then g ∈ Φ(f) ∩ Φ(f′) for some g. So g = f ∘ P
and g = f′ ∘ P′ for some P, P′ ∈ perm(ℕ_r). Then f = f′ ∘ Q, where Q = P′ ∘ P^{-1} ∈ perm(ℕ_r) by
Theorem 10.5.4 (iii). So f ∈ Φ(f′). Therefore f = f′ by part (xiii). Hence f ≠ f′ ⟹ Φ(f) ∩ Φ(f′) = ∅ by
Theorem 4.5.6 (xxx).
Part (xvi) follows from parts (xii) and (xv), and Definition 8.6.11.
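The orbit partition of Theorem 14.10.5 can be verified exhaustively for small n and r. This sketch is my own (the name orbit stands for the map Φ of the theorem); it checks parts (x), (xii) and (xv) for n = r = 3.

```python
from itertools import permutations, product
from math import factorial

n, r = 3, 3

def orbit(f):
    """Rearrangement orbit of f : N_r -> N_n (as an r-tuple) under
    composition with all permutations of N_r."""
    return {tuple(f[p - 1] for p in P)
            for P in permutations(range(1, r + 1))}

all_maps = set(product(range(1, n + 1), repeat=r))
nondec = [f for f in all_maps
          if all(f[i] <= f[i + 1] for i in range(r - 1))]
orbits = [orbit(f) for f in nondec]
# Parts (xii) and (xv): the orbits of the non-decreasing maps cover the
# whole function space and are pairwise disjoint.
assert set().union(*orbits) == all_maps
assert sum(len(o) for o in orbits) == len(all_maps)
# Part (x): orbits of injective maps have exactly r! elements.
for f in nondec:
    if len(set(f)) == r:
        assert len(orbit(f)) == factorial(r)
```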
14.11.6 Definition: The following operations are defined on the list space List(X) for a general set X.
(i) The length function length : List(X) → ℤ⁺₀ defined by length : x ↦ #(Dom(x)).
(ii) The canonical injection i : X → List(X) defined by i : x ↦ (x), where (x) is the one-element list
containing the single element x.
(iii) The concatenation function concat : List(X) × List(X) → List(X), defined by
    ∀x, y ∈ List(X), concat(x, y) ∈ X^{length(x)+length(y)} and ∀i ∈ ℤ_{length(x)+length(y)},
        concat(x, y)_i = x_i              if i < length(x)
                       = y_{i−length(x)}  if i ≥ length(x).
(iv) The omit item j function omit_j : List(X) → List(X) defined by:
        omit_j(ℓ)(i) = ℓ_i      if i < j
                     = ℓ_{i+1}  if j ≤ i < length(ℓ) − 1.
That is,
    omit_j(ℓ) = (ℓ_0, ..., ℓ_{j−1}, ℓ_{j+1}, ..., ℓ_{length(ℓ)−1}).
Note that omit_j acts as the identity on lists which have length less than or equal to j. Otherwise, omit_j
maps X^k to X^{k−1}, where k = length(ℓ).
(v) The omit items j and k function omit_{j,k} : List(X) → List(X) is defined for j, k ∈ ℤ⁺₀ such that j ≠ k
by
        omit_{j,k}(ℓ)(i) = ℓ_i      if i < min(j, k)
                         = ℓ_{i+1}  if min(j, k) ≤ i < max(j, k) − 1
                         = ℓ_{i+2}  if max(j, k) − 1 ≤ i < length(ℓ) − 2.
Note that omit_{j,k} acts as the identity on lists which have length less than or equal to min(j, k).
(vi) The swap items j and k operator swap_{j,k} : List(X) → List(X) is defined for j, k ∈ ℤ⁺₀ as the map
swap_{j,k}(ℓ) : Dom(ℓ) → X for all ℓ ∈ List(X), where
        ∀i ∈ Dom(ℓ), swap_{j,k}(ℓ)(i) = ℓ_k  if i = j and k ∈ Dom(ℓ)
                                      = ℓ_j  if i = k and j ∈ Dom(ℓ)
                                      = ℓ_i  if i ∉ {j, k} or {j, k} ⊈ Dom(ℓ).
That is,
    swap_{j,k}(ℓ) = (ℓ \ {(j, ℓ_j), (k, ℓ_k)}) ∪ {(j, ℓ_k), (k, ℓ_j)}  if {j, k} ⊆ Dom(ℓ)
                  = ℓ                                                  if {j, k} ⊈ Dom(ℓ).
The operator swap_{j,k} equals the identity function on List(X) if j = k. (This operator is well defined for
any function ℓ, not just for lists.)
(vii) The substitute x at position j operator subs_{j,x} : List(X) → List(X) defined by:
        subs_{j,x}(ℓ)(i) = x    if i = j
                         = ℓ_i  if i ≠ j.
(viii) The insert x at position j operator insert_{j,x} : List(X) → List(X) defined for ℓ ∈ List(X) with
j ∈ Dom(ℓ) by:
        insert_{j,x}(ℓ)(i) = ℓ_i      if i < j
                           = x        if i = j
                           = ℓ_{i−1}  if j < i ≤ length(ℓ).
The operator insert_{j,x} acts as the identity on lists which have length less than or equal to j.
(ix) The subsequence of length n starting at k function subseq_{k,n} : List(X) → List(X) is defined for
k, n ∈ ℤ⁺₀ by
        subseq_{k,n}(ℓ)(i) = ℓ_{k+i}  for 0 ≤ i < min(n, length(ℓ) − k).
This function maps X^j to X^m, where m = min(n, j − k) if k ≤ j, and m = 0 otherwise.
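Most of the operations of Definition 14.11.6 have one-line realisations on Python tuples (0-indexed, matching the definition). This sketch is illustrative only; the final assertion is the splitting identity of Theorem 14.11.14 with m = 2.

```python
def concat(x, y):
    """Concatenation per Definition 14.11.6 (iii)."""
    return x + y

def omit(ell, j):
    """Omit item j per Definition 14.11.6 (iv); identity if len(ell) <= j."""
    return ell if len(ell) <= j else ell[:j] + ell[j + 1:]

def swap(ell, j, k):
    """Swap items j and k per Definition 14.11.6 (vi); identity unless
    both positions lie in the domain of the list."""
    if j >= len(ell) or k >= len(ell):
        return ell
    out = list(ell)
    out[j], out[k] = out[k], out[j]
    return tuple(out)

def subseq(ell, k, n):
    """Subsequence of length n starting at k, per Definition 14.11.6 (ix)."""
    return ell[k:k + n]

ell = ('a', 'b', 'c', 'd')
assert omit(ell, 1) == ('a', 'c', 'd')
assert omit(ell, 9) == ell
assert swap(ell, 0, 3) == ('d', 'b', 'c', 'a')
assert concat(subseq(ell, 0, 2), subseq(ell, 2, 2)) == ell  # Theorem 14.11.14
```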
14.11.7 Remark: Substitution operator applications.
Although Definition 14.11.6 (vii) may seem overly complicated for such a simple task, the substitution
operator is in fact often required. Substitution in tuples is very often explained using horizontal ellipsis
dots, which do not give full confidence in the precision of a definition. For example, when an author
wishes to substitute s for x_i and t for x_j in a tuple x ∈ ℝ^n, this is often denoted in some such way as
(x_1, ..., x_{i−1}, s, x_{i+1}, ..., x_{j−1}, t, x_{j+1}, ..., x_n), even though i might not be less than j. This can be expressed
much more precisely as subs_{i,s}(subs_{j,t}(x)).
14.11.8 Remark: Lists of elements of algebraic systems have extended operations.
See Definitions 18.10.2 and 18.10.3 for algebraic operations on list spaces.
14.11.9 Remark: Operations on lists of sets.
Theorem 14.11.10 and Definition 14.11.11 are concerned with operations on lists of sets.
14.11.10 Theorem: Let X be a set of sets. Let S ∈ List(X). Then ∏_{i=0}^{length(S)−1} S_i ⊆ List(⋃X).
Proof: Let X be a set of sets. Let S ∈ List(X). Let f ∈ ∏_{i=0}^{n−1} S_i, where n = length(S). Then Dom(f) = n
and Range(f) ⊆ ⋃_{i=0}^{n−1} S_i ⊆ ⋃X. So f ∈ List(⋃X). Hence ∏_{i=0}^{n−1} S_i ⊆ List(⋃X).
14.11.11 Definition: The following operations are defined on List(X), where X is a set of sets.
(i) The union operation ⋃ : List(X) → P(⋃X) is defined by
    ∀S ∈ List(X), ⋃(S) = {x ∈ ⋃X; ∃i ∈ Dom(S), x ∈ S_i}
                       = ⋃_{i=0}^{length(S)−1} S_i.
14.11.14 Theorem: Let X be a set. Let ℓ ∈ List(X) with n = length(ℓ) ∈ ℤ⁺₀. Then
    ∀m ∈ ℤ_{n+1}, ℓ = concat(subseq_{0,m}(ℓ), subseq_{m,n−m}(ℓ)).
Proof: Let X be a set of sets. Let S¹, S² ∈ List(X) and let S = concat(S¹, S²). Let n₁ = length(S¹),
n₂ = length(S²) and n = length(S). Then n = n₁ + n₂.
Let ℓ¹ ∈ ∏(S¹) and ℓ² ∈ ∏(S²). Then length(ℓ¹) = n₁ and length(ℓ²) = n₂. By Remark 14.11.13,
ℓ = concat(ℓ¹, ℓ²) is a well-defined element of List(⋃X). Then length(ℓ) = n = n₁ + n₂. For i ∈ ℤ_{n₁},
ℓ_i = ℓ¹_i ∈ S¹_i = S_i. For i ∈ ℤ_{n₂}, ℓ_{n₁+i} = ℓ²_i ∈ S²_i = S_{n₁+i}. Thus ℓ_i ∈ S_i for all i ∈ ℤ_n. So
ℓ ∈ ∏(S) = ∏(concat(S¹, S²)).
Now suppose that ℓ ∈ ∏(concat(S¹, S²)) = ∏(S). Let ℓ¹ = subseq_{0,n₁}(ℓ) and ℓ² = subseq_{n₁,n₂}(ℓ). (See
Definition 14.11.6 (ix) for the subsequence operator.) Then ℓ¹ ∈ ∏(S¹) and ℓ² ∈ ∏(S²), and ℓ = concat(ℓ¹, ℓ²)
by Theorem 14.11.14. So ℓ ∈ {concat(ℓ¹, ℓ²); ℓ¹ ∈ ∏(S¹), ℓ² ∈ ∏(S²)}.
It is sometimes useful to combine finite and infinite sequences in extended list spaces as in Definition 14.11.17.
14.11.17 Definition: The extended list space on a set X is the set List(X) ∪ X^ℕ.
14.11.22 Definition: The list space with maximum length k on a set X, for k ∈ ℤ⁺₀, is the set ⋃_{i=0}^k X^i.
14.11.23 Notation: List_k(X), for k ∈ ℤ⁺₀, denotes the list space with maximum length k on a set X.
14.11.24 Remark: Spaces of multinomial coefficients with specified maximum degree.
Symmetric multinomial coefficients are relevant to definitions of multinomials over commutative rings. For
such multinomials, multiple copies of coefficients appear in the coefficient maps in spaces Mult⁺_k(I, Y).
Thus 2 x_1 x_2 would appear as c(1, 2) x_1 x_2 + c(2, 1) x_2 x_1 with c(1, 2) = c(2, 1) = 1.
The spaces Mult⁺_k(ℕ_n, ℝ) are particularly applicable to higher-order partial derivatives of functions on
differentiable manifolds.
14.11.25 Definition: The multinomial coefficient space of degree k with index set I and value space Y,
for k ∈ ℤ⁺₀, is the set Y^{List_k(I)} of functions from List_k(I) to Y.
The symmetric multinomial coefficient space of degree k with index set I and value space Y, for k ∈ ℤ⁺₀, is
the set of functions {c : List_k(I) → Y; ∀ℓ ≤ k, ∀α ∈ I^ℓ, ∀i, j ∈ ℤ⁺₀, c(swap_{i,j}(α)) = c(α)}.
14.11.26 Notation: Mult_k(I, Y), for k ∈ ℤ⁺₀, denotes the multinomial coefficient space of degree k with
index set I and value space Y. In other words, Mult_k(I, Y) = Y^{List_k(I)}.
Mult⁺_k(I, Y), for k ∈ ℤ⁺₀, denotes the symmetric multinomial coefficient space of degree k with index set I and
value space Y. That is, Mult⁺_k(I, Y) = {c : List_k(I) → Y; ∀ℓ ≤ k, ∀α ∈ I^ℓ, ∀i, j ∈ ℤ⁺₀, c(swap_{i,j}(α)) = c(α)}.
Chapter 15
Rational and real numbers
15.0.1 Remark: The historical motivation for the formalisation of real numbers.
In the 21st century, the real numbers seem so natural, it is difficult to understand why there was ever a need
to formalise the real numbers and establish the validity of the concept. The following quote from Bell [213],
pages 519–520, helps to clarify the problem which needs to be solved.
    If two rational numbers are equal, it is no doubt obvious that their square roots are equal. Thus
    2 × 3 and 6 are equal; so also then are √(2 × 3) and √6. But it is not obvious that √2 × √3 = √(2 × 3),
    and hence √2 × √3 = √6. The un-obviousness of this simple assumed equality, √2 × √3 = √6,
    taken for granted in school arithmetic, is evident if we visualize what the equality implies: the
    lawless square roots of 2, 3, 6 are to be extracted, the first two of these are then to be multiplied
    together, and the result is to come out equal to the third. As no one of these three roots can be
    extracted exactly, no matter to how many decimal places the computation is carried, it is clear
    that the verification by multiplication as just described will never be complete. The whole human
    race toiling incessantly through all its existence could never prove in this way that √2 × √3 = √6.
    Closer and closer approximations to equality would be attained as time went on, but finality would
    continue to recede. To make these concepts of approximation and equality precise, or to replace
    our first crude conceptions of irrationals by sharper descriptions which will obviate the difficulties
    indicated, was the task Dedekind set himself in the early 1870s; his work on Continuity and
    Irrational Numbers was published in 1872.
The word lawless here doubtless refers to a number whose decimal expansion follows no apparent pattern,
whereas the decimal expansion of a rational number will always repeat itself indefinitely after some number
of digits. Thus the product of the lawless decimal expansions for √2 and √3 should somehow be proved
to be equal to the lawless decimal expansion for √6. If even such elementary relations of equality amongst
algebraic numbers cannot be satisfactorily resolved with decimal expansions, then surely the totality of
analysis for general transcendental numbers will be out of reach. At the very minimum, a formalisation of
the real numbers must be able to securely define the equality relation amongst the full range of formulas
which specify individual numbers. This requires a model containing numbers which can be associated with
symbolic formulas in such a way that equality within the model corresponds exactly to our intuitive sense of
equality. When this bare minimum requirement to be able to settle questions of equality and inequality is
met, one may then proceed to establish the other requirements for a real number system, such as addition,
multiplication and order.
15.0.2 Remark: The rational numbers provide a convenient base camp for climbing the real numbers.
One generally builds the real number system on the assumed consistency and general well-definition of the
rational numbers. The rational numbers have the convenience that they are countable, and their decimal
expansions consist of infinite repetitions of a finite string of digits after some point. Thus each rational
number is finitely specifiable as a ratio of integers or in terms of a repeating digit-string in a decimal
expansion (or an expansion with any base). One could in principle base real numbers on algebraic numbers,
which are at least finitely specifiable, and are therefore countable. It is also possible to build the real numbers
from a smaller space than the full rational numbers. For example, the real numbers may be built from the
set of all rational numbers of the form i 2⁻ⁿ for i ∈ ℤ and n ∈ ℤ⁺₀, which is dense in the real numbers.
(This corresponds to the set of all finite binary expansions, which are typically used in computer processors.)
However, the rational numbers have the weight of history in their favour, and they have various convenient
arithmetic closure properties which can be used for defining real numbers instead of having to prove these
closure properties as an additional task. The rational numbers thus provide a convenient base camp for
the attack on the real numbers. (Sections 15.4, 15.5, 15.6, 15.7 and 15.8 give some idea of how difficult it is
to build real numbers even from this relatively convenient starting point.)
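The density claim for the numbers i 2⁻ⁿ can be illustrated with a two-line sketch (not from the book): for any real x and any precision n there is an integer i with |x − i 2⁻ⁿ| at most 2⁻⁽ⁿ⁺¹⁾.

```python
def dyadic_approx(x, n):
    """Return the integer i such that i * 2**-n is the dyadic rational of
    denominator 2**n nearest to x. Dyadic rationals are dense in the reals."""
    return round(x * 2 ** n)

i = dyadic_approx(0.3, 10)
assert abs(i / 2 ** 10 - 0.3) <= 2 ** -11  # within half a grid step
```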
Consequently, one need not be too concerned as to whether real numbers are a suitable model for the
physical world itself. Real numbers are an extrapolation of the model which humans currently use for
measurements of the physical world.
A more abstract approach to the rational number system would be to characterise it as an ordered field ℚ
which contains the integers, and which contains no smaller ordered field which contains the integers. (See
MacLane/Birkhoff [106], pages 101–104, 265–267, for the construction of ℚ as the field of quotients which
is generated by ℤ regarded as an integral domain.) Then every map f : ℤ → K from the integers to
an ordered field K which preserves the ordered additive group and multiplicative semigroup structure of the
integers would be factorisable into a canonical map ι : ℤ → ℚ and a structure-preserving map g : ℚ → K.
(This would be along the same lines as the unique factorisation property for tensor products in Section 28.3,
and for multilinear maps in Section 27.6.) Such a high-level abstract characterisation of the rational number
system still does not answer the fundamental question of what rational numbers are.
Perhaps the closest approximation to a credible ontology for rational numbers would be to say that integer
pairs of the form (p, q) are parameters for a well-known Euclidean geometry construction to cut a given
line in a specified ratio by first drawing a line adjacent to the given line, marking off q equal segments on
the constructed line, and then cutting the given line with parallel lines from the constructed line. (This
is justified by Euclid's propositions V.9 and V.10. See for example Euclid/Heath [196], pages 211–214;
Euclid [198], pages 132–133; Kuchel [102], page 157.) This construction is illustrated in Figure 15.1.1.
[Figure: eleven equal segments marked on an auxiliary line AC through A; parallels to the cross-line cut AB at E so that AE = (7/11)AB.]
Figure 15.1.1 Euclidean construction to divide a given line segment in a given ratio
The ratio p/q may then be interpreted to mean the result of such a geometric construction, including its
metric and order properties. Thus, as in the case of most concepts in mathematics, the true meaning of the
concept is extra-mathematical. The pure mathematical construction, which is an integer-pair equivalence
class in this case, is merely a parameter-set which points to an extra-mathematical object. (For other
examples of extra-mathematical objects, see Remark 51.1.8 for points in manifolds, and Remark 51.1.15 for
tangent vectors.) One may think of an integer-ratio pair (p, q) as a coordinate tuple for a rational number
point in an extra-mathematical number space. The existence of infinitely many choices for this coordinate
tuple should then be no more (or less) troubling than the existence of infinitely many choices for charts
for manifolds. (In fact, the ratios p/q in Figure 15.1.2 in Remark 15.1.7 are literally coordinates, which
parametrise lines by specifying a single point on each line.)
15.1.2 Remark: Rational numbers can be constructed from equivalence classes of integer pairs.
The rational numbers are a natural extension of the signed integers Z
in Section 14.3. Definition 15.1.3
permits the representation Q of the set of rational numbers to be chosen freely, although a correspondence
function between integers and rational numbers must be provided. Definition 15.1.3 allows the set Q to
include the integers as a subset. But typical representations do not literally include the integers.
A typical representation is the set of equivalence classes {[(n, d)]; (n, d) ∈ ℤ × (ℤ \ {0})}, where
[(n, d)] = {(n′, d′) ∈ ℤ × (ℤ \ {0}); nd′ = n′d} for integer pairs (n, d) ∈ ℤ × (ℤ \ {0}), together with
the map μ : ℤ × (ℤ \ {0}) → Q defined by μ : (n, d) ↦ [(n, d)].
Sets of equivalence classes are often used as representations of mathematical classes when the first choice
for the representation suffers from non-uniqueness. One way to avoid the inconvenience of equivalence
classes is to choose a canonical element from each equivalence class to represent the whole class. In the
case of the rational numbers, the lowest common denominator can be used to construct a well-defined
unique representative for each rational number. (This would be a suitable representation in computer
software.) For each equivalence class [(n, d)] in the above representation, a suitable representative would be
the pair (n₀, d₀) where d₀ = |d|/gcd(|n|, |d|) and n₀ = nd₀/d when n ≠ 0, and (n₀, d₀) = (0, 1) when n = 0.
This gives a second representation (Q′, μ′) of the rational numbers with the obvious map μ′. A third
representation (Q″, μ″) may be constructed by substituting n ∈ ℤ for each pair (n, d) ∈ Q′ with d = 1.
Thus Q″ = {(n, d) ∈ Q′; d ≠ 1} ∪ ℤ. Such a representation has the advantage that ℤ ⊆ Q″, but then the
definition of arithmetic operations becomes more difficult because the representation is non-uniform. The
representation (Q′, μ′) which simplifies the ratio-pairs (n, d) using the greatest common divisor gcd(|n|, |d|)
of the absolute values of n and d is also difficult to work with because after each operation, the resulting
representative must be normalised by simplifying the fraction. Thus for practical purposes, it may be
most convenient to use the full equivalence classes, although other representations are more attractive for
explaining the meaning of the rational numbers. All representations are, of course, interchangeable.
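This normalisation is easy to sketch in code. The following Python function (the name `canonical` is illustrative, not the book's notation) computes the representative (n₀, d₀) just described:

```python
from math import gcd

def canonical(n: int, d: int) -> tuple[int, int]:
    """Canonical representative (n0, d0) of the equivalence class [(n, d)], d != 0."""
    if n == 0:
        return (0, 1)
    d0 = abs(d) // gcd(abs(n), abs(d))  # d0 = |d| / gcd(|n|, |d|)
    n0 = n * d0 // d                    # n0 = n * d0 / d (always an integer)
    return (n0, d0)

# Every member of an equivalence class maps to the same representative:
assert canonical(2, 4) == canonical(-3, -6) == (1, 2)
assert canonical(3, -6) == (-1, 2)
assert canonical(0, -7) == (0, 1)
```

Note that the denominator d₀ is always positive, which is what makes the representative unique within its class.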
15.1.5 Definition: The set of rational numbers is the set {[(n, d)]; n ∈ ℤ, d ∈ ℤ \ {0}}, where [(n, d)]
denotes the equivalence class {(n′, d′) ∈ ℤ²; d′ ≠ 0 and nd′ = n′d} for all pairs (n, d) ∈ ℤ × (ℤ \ {0}).
15.1.6 Notation: Q denotes the set of rational numbers.
15.1.7 Remark: A standard representation for rational numbers.
Definition 15.1.5 is the popular representation for the set of rational numbers in terms of equivalence classes
of pairs of integers with equal ratios. This is illustrated in Figure 15.1.2. (This diagram is reminiscent of
some constructions in projective geometry.)
[Figure 15.1.2: integer pairs (n, d) in the (d, n) plane; pairs with equal ratios n/d lie on common lines through the origin, labelled by the ratios −3/5, −1/2, −2/5, −1/3, −1/4, −1/5, −1/10, 0/1, 1/10, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5.]
15.1.10 Definition: An ordered rational number system is a tuple (Q, μ, σ, τ, R) where (Q, μ, σ, τ) is a
rational number system and the relation R ∈ P(Q × Q), denoted as <, satisfies the following condition.
(3) ∀(n₁, d₁), (n₂, d₂) ∈ ℤ × ℤ⁺, (μ(n₁, d₁) < μ(n₂, d₂) ⇔ n₁d₂ < n₂d₁).
15.1.11 Notation:
ℚ⁺ denotes the set of positive rational numbers {q ∈ ℚ; q > 0}.
ℚ₀⁺ denotes the set of non-negative rational numbers {q ∈ ℚ; q ≥ 0}.
ℚ⁻ denotes the set of negative rational numbers {q ∈ ℚ; q < 0}.
ℚ₀⁻ denotes the set of non-positive rational numbers {q ∈ ℚ; q ≤ 0}.
15.1.12 Remark: Notations for bounded and unbounded intervals of rational numbers.
General bounded sub-intervals of ℚ may be denoted as in Notation 15.1.13. General unbounded sub-intervals
of ℚ may be denoted as in Notation 15.1.14.
15.1.13 Notation:
[a, b] denotes the set {q ∈ ℚ; a ≤ q ≤ b} for a, b ∈ ℚ.
[a, b) denotes the set {q ∈ ℚ; a ≤ q < b} for a, b ∈ ℚ.
(a, b] denotes the set {q ∈ ℚ; a < q ≤ b} for a, b ∈ ℚ.
(a, b) denotes the set {q ∈ ℚ; a < q < b} for a, b ∈ ℚ.
15.1.14 Notation:
(−∞, b] denotes the set {q ∈ ℚ; q ≤ b} for b ∈ ℚ.
(−∞, b) denotes the set {q ∈ ℚ; q < b} for b ∈ ℚ.
[a, ∞) denotes the set {q ∈ ℚ; a ≤ q} for a ∈ ℚ.
(a, ∞) denotes the set {q ∈ ℚ; a < q} for a ∈ ℚ.
15.1.15 Remark: Order, addition and multiplication for the rational numbers.
The definitions and properties of order, addition and multiplication for the rational numbers, based on the
corresponding definitions for the integers, are well known. They do not need to be repeated here in full.
Theorem 15.1.16, which is related to the Archimedean property of the rational numbers, is used in the proof
of Theorem 15.6.13 (i).
15.1.16 Theorem: ∀q ∈ ℚ⁺, ∃n ∈ ℤ⁺, q ≤ 2ⁿ.
Proof: Let q ∈ ℚ⁺. Then q = [(n, d)] for some n, d ∈ ℤ⁺. By Theorem 13.4.15, n ≤ 2ⁿ. Therefore
n ≤ 2ⁿd because d ≥ 1. So q ≤ 2ⁿ by Definition 15.1.10.
[Figure: the fractions p/q with 0 ≤ p ≤ q ≤ 9 and gcd(p, q) = 1, marked on the interval [0, 1], showing their uneven spacing.]
Figure 15.1.3 Uneven distribution of rational numbers p/q with denominator q ≤ 9
The unevenness of distributions of rational numbers with bounded denominator affects the rate of conver-
gence of rational approximations to real numbers. In particular, convergence is much slower in the case of
transcendental real numbers. (See for example EDM2 [109], pages 1630–1631.)
[Figure: graph of the map for x from −5 to 5.]
Figure 15.2.1 The map x ↦ (1 + x/(1 + |x|))/2 from ℚ to ℚ ∩ (0, 1)
It is easily verified that this function maps rational numbers to rational numbers:
∀n ∈ ℤ, ∀d ∈ ℤ⁺,  f(n/d) = (d + 2n)/(2d + 2n) if n ≥ 0,  f(n/d) = d/(2d − 2n) if n ≤ 0.
To show that this map is a bijection, it is sufficient to show that f⁻¹ : ℚ ∩ (0, 1) → ℚ is a well-defined function.
The inverse map may be written as:
∀y ∈ ℚ ∩ (0, 1),  f⁻¹(y) = (2y − 1)/(2 − 2y) if y ≥ 1/2,  f⁻¹(y) = (2y − 1)/(2y) if y ≤ 1/2.
The fact that rationals are mapped to rationals follows from the following formula:
∀d ∈ ℤ⁺, ∀n ∈ ℤ ∩ (0, d),  f⁻¹(n/d) = (2n − d)/(2d − 2n) if 2n ≥ d,  f⁻¹(n/d) = (2n − d)/(2n) if 2n ≤ d,
where ℤ ∩ (0, d) denotes the integer interval {i ∈ ℤ; 0 < i < d}. It follows that the cardinality of ℚ is the same
as the cardinality of the unit rational interval ℚ ∩ (0, 1).
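Since f and f⁻¹ are given by explicit rational formulas, the bijection can be spot-checked with exact arithmetic. A Python sketch (assuming the formulas above; illustrative only):

```python
from fractions import Fraction

def f(x: Fraction) -> Fraction:
    """The map x -> (1 + x/(1 + |x|))/2 from Q into Q ∩ (0, 1)."""
    return (1 + x / (1 + abs(x))) / 2

def f_inv(y: Fraction) -> Fraction:
    """The inverse map from Q ∩ (0, 1) back to Q."""
    if y >= Fraction(1, 2):
        return (2 * y - 1) / (2 - 2 * y)
    return (2 * y - 1) / (2 * y)

for x in (Fraction(-7, 3), Fraction(0), Fraction(5, 2)):
    y = f(x)
    assert Fraction(0) < y < Fraction(1)  # the image lies in (0, 1)
    assert f_inv(y) == x                  # the round trip recovers x exactly
```

Because `Fraction` arithmetic is exact, the round-trip equality is a genuine identity on the tested points, not an approximation.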
15.2.3 Theorem: Define a map g : ℚ ∩ (0, 1) → ℚ by
∀d ∈ ℤ⁺, ∀n ∈ ℤ ∩ (0, d),  g([(n, d)]) = [(2n − d, min(2n, 2d − 2n))].  (15.2.1)
(i) g is a well-defined function from ℚ ∩ (0, 1) to ℚ.
(ii) g is an injection.
(iii) g is a bijection.
Proof: For part (i), let q ∈ ℚ ∩ (0, 1). By Definition 15.1.5, q = [(n, d)] for some n ∈ ℤ and d ∈ ℤ \ {0}.
Then q ∈ (0, 1) if and only if either d > 0 and 0 < n < d, or else d < 0 and d < n < 0. Suppose
that d > 0. Then g(q) = [(2n − d, min(2n, 2d − 2n))] by line (15.2.1). Clearly min(2n, 2d − 2n) ∈ ℤ⁺. So
g(q) is a well-defined element of ℚ. Suppose that [(n₁, d₁)] = [(n, d)] for some (n₁, d₁) ∈ ℤ × ℤ⁺. Then
g([(n₁, d₁)]) = [(2n₁ − d₁, min(2n₁, 2d₁ − 2n₁))]. This equals [(2n − d, min(2n, 2d − 2n))] because nd₁ = n₁d
implies (2n − d) min(2n₁, 2d₁ − 2n₁) = (2n₁ − d₁) min(2n, 2d − 2n). (See Definition 15.1.3.) Therefore the
value of g(q) is a rational number which is independent of the representation of q.
Q
For part (ii), let q1 , q2 (0, 1) with g(q1 ) = g(q2 ). Let q1 = [(n1 , d1 )] and q2 = [(n2 , d2 )] with denominators
d1 > 0 and d2 > 0. Then [(2n1 d1 , min(2n1 , 2d1 2n1 ))] = [(2n2 d2 , min(2n2 , 2d2 2n2 ))]. Therefore
(2n1 d1 ) min(2n2 , 2d2 2n2 ) = (2n2 d2 ) min(2n1 , 2d1 2n1 ).
Suppose that 2n1 d1 and 2n2 d2 . Then min(2n1 , 2d1 2n1 ) = 2n1 and min(2n2 , 2d2 2n2 ) = 2d2 2n2 .
So (2n1 d1 )(2d2 2n2 ) = (2n2 d2 )(2n1 ). So 2(2n1 d1 )(2n2 d2 ) = d2 (2n1 d1 ) d1 (2n2 d2 ).
But 2(2n1 d1 )(2n2 d2 ) 0 d2 (2n1 d1 ) d1 (2n2 d2 ) 0. So d2 (2n1 d1 ) d1 (2n2 d2 ) = 0.
Therefore 2n1 d1 = 0 and 2n2 d2 = 0 because d1 > 0, d2 > 0, 2n1 d1 0 and 2n2 d2 0. So
q1 = [(n1 , 2n1 )] = [(1, 2)] = [(n2 , 2n2 )] = q2 . (In other words, q1 = q2 = 1/2.) Similarly, if 2n1 d1 and
2n2 d2 , then q1 = q2 = [(1, 2)].
Now suppose that 2n1 d1 and 2n2 d2 . Then min(2n1 , 2d1 2n1 ) = 2n1 and min(2n2 , 2d2 2n2 ) = 2n2 .
So (2n1 d1 )(2n2 ) = (2n2 d2 )(2n1 ). So 2n1 d2 = 2n2 d1 . So q1 = q2 by Definition 15.1.5. Suppose that
2n1 d1 and 2n2 d2 . Then min(2n1 , 2d1 2n1 ) = (2d1 2n1 ) and min(2n2 , 2d2 2n2 ) = (2d2 2n2 ). So
(2n1 d1 )(2d2 2n2 ) = (2n2 d2 )(2d1 2n1 ). Thus 2n1 d2 = 2n2 d1 . So q1 = q2 . Hence g is an injection.
For part (iii), define f : ℚ → ℚ by
∀d ∈ ℤ⁺, ∀n ∈ ℤ,  f([(n, d)]) = [(d + 2n, 2d + 2n)] if n ≥ 0,  f([(n, d)]) = [(d, 2d − 2n)] if n ≤ 0.
Then f : ℚ → ℚ is a well-defined function because f([(n, d)]) is independent of the choice of the pair (n, d)
to represent [(n, d)], and Range(f) ⊆ ℚ ∩ (0, 1) because 0 < d + 2n < 2d + 2n when d > 0 and n ≥ 0, and
0 < d < 2d − 2n when d > 0 and n ≤ 0. Let q = [(n, d)] ∈ ℚ. Then g(f(q)) = q. So g : ℚ ∩ (0, 1) → ℚ is
surjective. Hence g is a bijection by part (ii).
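The identity g(f(q)) = q used for surjectivity can also be spot-checked. A Python sketch working with explicit integer pairs rather than equivalence classes (illustrative only):

```python
from fractions import Fraction

def g(n: int, d: int) -> Fraction:
    """g([(n, d)]) = [(2n - d, min(2n, 2d - 2n))] for 0 < n < d, as in line (15.2.1)."""
    return Fraction(2 * n - d, min(2 * n, 2 * d - 2 * n))

def f_pair(n: int, d: int) -> tuple[int, int]:
    """The map f from the proof of part (iii), for d > 0."""
    if n >= 0:
        return (d + 2 * n, 2 * d + 2 * n)
    return (d, 2 * d - 2 * n)

# g(f(q)) = q for each sample rational q = n/d:
for n, d in ((3, 7), (-5, 2), (0, 1)):
    fn, fd = f_pair(n, d)
    assert 0 < fn < fd                 # f(q) lies in (0, 1)
    assert g(fn, fd) == Fraction(n, d) # the composite recovers q
```

Here `Fraction` silently performs the reduction to lowest terms that the equivalence classes encode.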
15.2.4 Remark: The cardinality of the rational numbers.
It is straightforward to construct an explicit bijection from ℕ = ℤ₀⁺ to ℚ ∩ (0, 1) by countable induction.
Every element of ℚ ∩ (0, 1) may be written as p/q for some p, q ∈ ℤ⁺ with p < q. The pairs (p, q) may be
right-to-left lexicographically ordered so that the lowest few pairs are (1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4),
and so forth. Equivalent ratios may be eliminated inductively. Let Y = {(p, q) ∈ ℤ⁺ × ℤ⁺; p < q} with
the order (p₁, q₁) < (p₂, q₂) whenever q₁ < q₂ or else q₁ = q₂ and p₁ < p₂. Define φ : Y → ℚ ∩ (0, 1) by
φ : (p, q) ↦ [(p, q)] = p/q. (In other words, φ converts each pair (p, q) to the corresponding equivalence class
[(p, q)] in ℚ.) Define g : ℕ → Y inductively by
∀n ∈ ℕ,  g(n) = min {y ∈ Y; ∀i ∈ ℕ, (i < n ⇒ φ(y) ≠ φ(g(i)))}
= min {y ∈ Y; φ(y) ∉ {φ(g(i)); i < n}}
= min (Y \ φ⁻¹({φ(g(i)); i < n})).
This generates the sequence g(0) = (1, 2), g(1) = (1, 3), g(2) = (2, 3), g(3) = (1, 4), g(4) = (3, 4), and so
forth. Define h : ℕ → ℚ by h(n) = f⁻¹(φ(g(n))) for all n ∈ ℕ, where f : ℚ → ℚ ∩ (0, 1) and f⁻¹ : ℚ ∩ (0, 1) → ℚ
are as in Remark 15.2.2. Then h is an explicit bijection. Hence ℚ has the same cardinality as ℕ. In other
words, ℚ is a countably infinite set. The first few elements of the sequence are as follows.
n  g(n)  h(n)
0  1/2   0
1  1/3   −1/2
2  2/3   1/2
3  1/4   −1
4  3/4   1
5  1/5   −3/2
6  2/5   −1/4
7  3/5   1/4
8  4/5   3/2
This sequence simultaneously increases the density and breadth of its range at each stage, where "stage"
here means the denominator of g(n). This enumeration of all rational numbers is illustrated in Figure 15.2.2.
(Clearly a much more even enumeration could easily be designed.)
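This enumeration is concrete enough to generate by program. The following Python sketch (an illustrative reimplementation, with f⁻¹ as in Remark 15.2.2) reproduces the table above:

```python
from fractions import Fraction

def f_inv(y: Fraction) -> Fraction:
    """Inverse of x -> (1 + x/(1 + |x|))/2, as in Remark 15.2.2."""
    if y >= Fraction(1, 2):
        return (2 * y - 1) / (2 - 2 * y)
    return (2 * y - 1) / (2 * y)

def enumerate_rationals():
    """Yield h(0), h(1), h(2), ...: an explicit enumeration of all of Q."""
    seen = set()
    q = 2
    while True:
        for p in range(1, q):        # right-to-left lexicographic order on (p, q)
            value = Fraction(p, q)
            if value not in seen:    # skip ratios that already occurred
                seen.add(value)
                yield f_inv(value)
        q += 1

gen = enumerate_rationals()
first_nine = [next(gen) for _ in range(9)]
assert first_nine == [Fraction(s) for s in
                      ("0", "-1/2", "1/2", "-1", "1", "-3/2", "-1/4", "1/4", "3/2")]
```

The `seen` set plays the role of the inductive elimination of equivalent ratios in the definition of g.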
[Figure 15.2.2: the enumeration h(n), plotted stage by stage (stages 2 to 14) over the interval from −6 to 6.]
The explicitness of this sequence has advantages for applications in real-number topology and measure theory.
For example, given any open subset of the real numbers, one may construct a fairly explicit map from ℕ
to the set of its connected components by associating each such component with the first n ∈ ℕ such that
h(n) is an element of that component.
number system should not be taken too seriously. It is only a model.
It is only our measurements and observations of the physical world which use the real number system, not
the physical world itself. A falling object does not have any kind of internal state variable which is a real
number corresponding to its velocity or inertia or temperature or mass. Numbers exist in human minds after
observations have been made. (A similar comment is made in Remark 51.1.7 regarding Cartesian charts for
the physical world.) This is very roughly illustrated in Figure 15.3.1.
[Figure: an observer reacts to an object; the real number appears as part of the observer's model after the observation.]
Figure 15.3.1 Real numbers are found in models of observations, not in the real world
in retrospect, not be surprising. Humans (and computers) are finite, whereas real numbers are even more
infinite than the integers. The difficulties arise from the ultimate impossibility of representing an infinite
amount of information in a finite-state system. So the attempt is doomed to ultimate failure. The best that
can be realistically claimed is that real numbers are some kind of extrapolation of our finite approximations
into some kind of inaccessible metaphysical number-universe which we can never directly perceive. Axiomatising
such a metaphysical universe, and convincing ourselves that the axiomatisation is logically consistent,
does not make that universe real. The more we rely upon abstract axiomatisation, the more we are tacitly
admitting that we are dealing with something which is beyond our direct intellectual grasp. Ironically, the
word "real" in the term "real numbers" is arguably the opposite of the truth.
The unreality of the real numbers may provide at least some solace when one trudges through the tedious
mire of the Dedekind cut representation for the real numbers in Sections 15.4–15.8. All logical formalisations
of the real numbers seem to be exceedingly tedious and unnatural, bearing little relation to our relatively
simple intuitive notions.
(3) 2 × V × U where V = {y ∈ 2^ℕ; #{i ∈ ℕ; y(i) = 1} < ∞} and U = {x ∈ 2^ℕ; #{i ∈ ℕ; x(i) = 0} = ∞}.
This is the same as option (1) except that the integer part of each real number is replaced by a sign
bit followed by the absolute value of the integer part. (This has the same very minor non-uniqueness
as style (2).) A triple (s, y, x) represents (−1)^s (∑_{j=0}^∞ y(j) 2^j + ∑_{i=0}^∞ x(i) 2^{−i−1}).
(4) 2 × {z ∈ 2^ℤ; #{i ∈ ℤ₀⁺; z(i) = 1} < ∞ and #{i ∈ ℤ; z(i) = 0} = ∞}. This has a sign bit followed
by the integer and fractional parts of the absolute value of the real number combined into a single
map z : ℤ → {0, 1}. A pair (s, z) is mapped to (−1)^s ∑_{i=−∞}^∞ z(i) 2^i. As in styles (2) and (3), the only
redundancy here is the double representation of the number 0_ℝ.
(5) 2 × ℤ × U where U = {x ∈ 2^ℕ; #{i ∈ ℕ; x(i) = 0} = ∞}. (Floating point representation.) The
first component represents the sign of the real number. The second component represents the binary
left shift of the absolute value of the number (before the sign bit is applied). The third component
represents the full binary expansion of the number, including both the integer and fractional parts.
A triple (s, k, x) represents (−1)^s ∑_{i=0}^∞ x(i) 2^{k−i−1}. The redundancy here is quite substantial because
the sequence x may commence with one or more zeros. (There are some untidy ways to reduce this
redundancy which are used in practical floating-point computer representations, but they are outside
the scope of this book.)
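Style (5) can be illustrated on finite truncations of the sequence x. The following Python sketch (the helper `decode` is invented for illustration; a true representative needs the entire infinite sequence) also shows the redundancy caused by leading zeros:

```python
from fractions import Fraction

def decode(s: int, k: int, x: list) -> Fraction:
    """Value of a truncated triple (s, k, x): (-1)**s * sum of x[i] * 2**(k - i - 1)."""
    total = sum(bit * Fraction(2) ** (k - i - 1) for i, bit in enumerate(x))
    return (-1) ** s * total

# The bits 1101 shifted by k = 3 encode 6.5 = 13/2:
assert decode(0, 3, [1, 1, 0, 1]) == Fraction(13, 2)
# A leading zero with k = 4 encodes the same number (the stated redundancy):
assert decode(0, 4, [0, 1, 1, 0, 1]) == Fraction(13, 2)
# The sign bit:
assert decode(1, 3, [1, 1, 0, 1]) == Fraction(-13, 2)
```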
(6) Dedekind cuts. (This representation has a unique representative for each real number, but each representative
is a countably infinite set of rational numbers, which are themselves countably infinite equivalence
classes.) Each semi-infinite interval I of ℚ represents sup I.
(7) Cantor representation: equivalence classes of Cauchy sequences. (The representative of each real number
is an equivalence class of sequences of rational numbers. Each such equivalence class has the cardinality
of the power set of the set of integers.) Each equivalence class [x] of sequences x ∈ ℚ^ℕ represents
lim_{i→∞} x(i). (This representation is outlined by Spivak [136], page 505.)
15.3.6 Remark: The Dedekind and Cantor representations of the real numbers.
The most popular representations for the real numbers in pure mathematics are the Dedekind representation
and the Cantor representation. The former defines real numbers as semi-infinite intervals of rational numbers
called Dedekind cuts. The latter defines real numbers as equivalence classes of Cauchy sequences.
The Dedekind representation exploits the total order on the rationals to fill the gaps. The Cantor repre-
sentation exploits the distance function on the rational numbers to fill the gaps. The total order and the
distance function on the rational numbers are not unrelated. However, the distance function approach is
much more generally applicable. Any metric space may be completed (i.e. the gaps can be filled in) using
the Cauchy sequence approach. On the other hand, the total order approach requires much less convoluted
logic for its construction.
In applied mathematics and the sciences, the most popular representations of the real numbers are as finite
and infinite decimal or binary expansions. However, irrational numbers cannot be represented precisely by
such expansions, since they require an infinite amount of information.
15.3.7 Definition: A Cauchy sequence of rational numbers is a sequence x : ℕ → ℚ such that
∀ε ∈ ℚ⁺, ∃n ∈ ℕ, ∀i, j ≥ n, |x_i − x_j| < ε.
15.3.8 Definition: The Cantor real number system is the set of equivalence classes of all Cauchy sequences
of rational numbers, where two Cauchy sequences x = (x_i)_{i∈ℕ} and y = (y_i)_{i∈ℕ} are considered equivalent if
the combined sequence (z_i)_{i∈ℕ} defined by z_{2i} = x_i and z_{2i+1} = y_i for all i ∈ ℕ is a Cauchy sequence.
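The interleaving test can be spot-checked on finite prefixes. (A genuine Cauchy condition is an assertion about all tails, so this Python sketch, illustrative only, samples a single tolerance.)

```python
from fractions import Fraction

def interleave(x, y):
    """Combined sequence z with z[2i] = x[i] and z[2i+1] = y[i]."""
    z = []
    for a, b in zip(x, y):
        z += [a, b]
    return z

N = 20
x = [Fraction(1, 3) - Fraction(1, 2) ** i for i in range(N)]  # approaches 1/3 from below
y = [Fraction(1, 3) + Fraction(1, 2) ** i for i in range(N)]  # approaches 1/3 from above
z = interleave(x, y)

# Spot-check the Cauchy condition for eps = 1/64: from index 16 on,
# every pair of terms of the combined sequence differs by less than eps.
eps, n0 = Fraction(1, 64), 16
assert all(abs(z[i] - z[j]) < eps
           for i in range(n0, 2 * N) for j in range(n0, 2 * N))
```

Both input sequences converge to 1/3, so the interleaved sequence is Cauchy, which is exactly the equivalence test of Definition 15.3.8.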
The Dedekind-cut real-number representation is presented by Spivak [136], pages 494–505; Rudin [125],
pages 3–12; Schramm [129], pages 350–359; Friedman [72], pages 10–15; Penrose [263], page 59; Eves [303],
pages 196–199. (Dedekind cuts are presented very much more generally for partially ordered groups by
Fuchs [73], pages 92–95.)
15.4.3 Definition: A Dedekind cut is a subset S of ℚ which satisfies (15.4.1) and (15.4.2).
∀y ∈ S, ∀x ∈ ℚ, (x < y ⇒ x ∈ S)  (15.4.1)
∀y ∈ S, ∃z ∈ S, y < z.  (15.4.2)
15.4.4 Definition: A finite Dedekind cut is a set S ∈ P(ℚ) \ {∅, ℚ} which satisfies (15.4.3) and (15.4.4).
∀y ∈ S, ∀x ∈ ℚ, (x < y ⇒ x ∈ S)  (15.4.3)
∀y ∈ S, ∃z ∈ S, y < z.  (15.4.4)
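A Dedekind cut is an infinite set of rationals, so it cannot be stored explicitly, but a cut with a rational supremum can be modelled as a membership predicate, and conditions (15.4.3) and (15.4.4) can be spot-checked on finitely many sample points. A Python sketch (illustrative only; a genuine verification would quantify over all of ℚ):

```python
from fractions import Fraction
from itertools import product

def cut(r: Fraction):
    """Membership predicate of the cut {q in Q; q < r} for a rational r."""
    return lambda q: q < r

S = cut(Fraction(3, 4))
samples = [Fraction(n, d) for n, d in product(range(-4, 5), range(1, 5))]

# Condition (15.4.3): S is closed downwards on the samples.
assert all(S(x) for x in samples for y in samples if S(y) and x < y)
# Condition (15.4.4): no element of S is maximal (witness z = (y + 3/4)/2).
assert all(S((y + Fraction(3, 4)) / 2) and y < (y + Fraction(3, 4)) / 2
           for y in samples if S(y))
```

The midpoint witness works because any y < 3/4 satisfies y < (y + 3/4)/2 < 3/4, which is exactly the "no greatest element" condition.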
them, and also to choose which constructions to define for particular concepts. In the case of Dedekind cuts,
one may think of any Dedekind cut as a set of the form {q ∈ ℚ; q < r} for some real number r. The great
misfortune here is the circular definition of real numbers in terms of Dedekind cuts and Dedekind cuts in
terms of real numbers. Therefore Definitions 15.4.4 and 15.4.7 must use intrinsic constructions which refer
only to rational numbers. This is the principal insight of Dedekind's real-number construction method, that
the real numbers can be defined in a non-circular fashion in terms of rational numbers only. Nevertheless,
one may use the mental picture of semi-infinite open intervals (−∞, r) = {q ∈ ℚ; q < r} for r ∈ ℝ to guide
the theoretical development until the real numbers are solidly established and ready to use. (Of course, this
solidity seems somewhat precarious if one looks too closely at the lower supporting layers, as outlined in
Section 2.1 and Remarks 3.5.7, 3.13.3, 6.4.6 and elsewhere.)
15.4.11 Theorem: A set is a real number if and only if it is a finite Dedekind cut. In other words, S ∈ ℝ if
and only if S ∈ P(ℚ) \ {∅, ℚ} and S satisfies (15.4.5) and (15.4.6).
∀y ∈ S, ∀x ∈ ℚ, (x < y ⇒ x ∈ S)  (15.4.5)
∀y ∈ S, ∃z ∈ S, y < z.  (15.4.6)
Proof: Let S be a real number. Then S ∈ P(ℚ) \ {∅, ℚ} by Definition 15.4.7. Let y ∈ S. Then there
exists z ∈ S such that y < z and ∀x ∈ ℚ, (x < z ⇒ x ∈ S). Therefore ∀x ∈ ℚ, (x < y ⇒ x ∈ S). So
S satisfies line (15.4.3) of Definition 15.4.4, and line (15.4.4) is satisfied because y < z. Hence S is a finite
Dedekind cut by Definition 15.4.4.
Now suppose that S is a finite Dedekind cut. Then S ∈ P(ℚ) \ {∅, ℚ} by Definition 15.4.4. Let y ∈ S. Then
y < z for some z ∈ S by line (15.4.4), and this implies that z ∈ ℚ. So by replacing y with z in line (15.4.3),
one obtains ∀x ∈ ℚ, (x < z ⇒ x ∈ S). Therefore ∀y ∈ S, ∃z ∈ S, (y < z and ∀x ∈ ℚ, (x < z ⇒ x ∈ S)).
Hence S is a real number by Definition 15.4.7.
15.4.13 Theorem:
(i) ∀S ∈ ℝ, ∀y ∈ ℚ \ S, ∀z ∈ ℚ, (y ≤ z ⇒ z ∈ ℚ \ S).
(ii) A set S is a real number if and only if S ⊆ ℚ and T = ℚ \ S satisfies T ∉ {∅, ℚ} and (15.4.7) and
(15.4.8).
∀u ∈ T, ∀v ∈ ℚ, (u < v ⇒ v ∈ T)  (15.4.7)
∀u ∈ ℚ, ((∀v ∈ ℚ, (u < v ⇒ v ∈ T)) ⇒ u ∈ T).  (15.4.8)
Proof: For part (i), let S ∈ ℝ, y ∈ ℚ \ S, and z ∈ ℚ with y ≤ z. If y = z then z ∈ ℚ \ S. So let y < z.
Suppose that z ∈ S. Then y ∈ S by Theorem 15.4.11 line (15.4.5). This is a contradiction. So z ∈ ℚ \ S.
For part (ii), let S ∈ ℝ. Then S ⊆ ℚ and S ∉ {∅, ℚ} by Definition 15.4.7. So T ∉ {∅, ℚ}. Let u ∈ T. Then
u ∈ ℚ \ S. Let v ∈ ℚ with u < v. Then v ∈ ℚ \ S by part (i). Thus ∀u ∈ T, ∀v ∈ ℚ, (u < v ⇒ v ∈ T),
which verifies line (15.4.7). Now suppose that u ∈ ℚ satisfies u ∉ T. Then u ∈ ℚ \ T = S. So ∃v ∈ S, u < v,
by Theorem 15.4.11 line (15.4.6). By predicate logic, this is equivalent to ¬(∀v ∈ S, u ≥ v), which is
equivalent to ¬(∀v ∈ ℚ, (v ∉ T ⇒ u ≥ v)), which is equivalent to ¬(∀v ∈ ℚ, (u < v ⇒ v ∈ T)). Thus
u ∉ T ⇒ ¬(∀v ∈ ℚ, (u < v ⇒ v ∈ T)), which is logically equivalent to its contrapositive, which is
(∀v ∈ ℚ, (u < v ⇒ v ∈ T)) ⇒ u ∈ T, which verifies line (15.4.8).
For the converse of part (ii), suppose that S ⊆ ℚ and that T = ℚ \ S satisfies T ∉ {∅, ℚ} and lines (15.4.7)
and (15.4.8). Then S ∈ P(ℚ) \ {∅, ℚ}. Let u ∈ S and v ∈ ℚ with v < u. Suppose that v ∉ S. Then v ∈ T. So
u ∈ T by line (15.4.7), which is a contradiction. So v ∈ S. Therefore ∀u ∈ S, ∀v ∈ ℚ, (v < u ⇒ v ∈ S), which
verifies line (15.4.5) of Theorem 15.4.11. Now let y ∈ S, and suppose that the proposition ∃z ∈ S, y < z
is false. Then ∀z ∈ S, y ≥ z. This is equivalent to ∀z ∈ ℚ, (z ∈ S ⇒ y ≥ z), which is equivalent to
∀z ∈ ℚ, (y < z ⇒ z ∉ S), which is equivalent to ∀z ∈ ℚ, (y < z ⇒ z ∈ T). Therefore y ∈ T by line (15.4.8),
which contradicts the assumption that y ∈ S. Thus ∀y ∈ S, ∃z ∈ S, y < z, which verifies line (15.4.6) of
Theorem 15.4.11. Hence S is a real number.
15.4.14 Remark: Equivalence of single-set Dedekind-cut representation to a two-set partition.
Theorem 15.4.15 states that the form of Dedekind cut representation for the real numbers in Definition 15.4.7
is equivalent to the older-style two-set cut, where the rational numbers were partitioned into two semi-infinite
intervals, S and ℚ \ S. It is clearly sufficient to specify only one of the sets of a two-set partition of a fixed
set. Therefore the one-set specification is adopted here.
15.4.15 Theorem: A set S ∈ P(ℚ) \ {∅, ℚ} is a real number if and only if (15.4.9) and (15.4.10) are
satisfied.
∀a ∈ S, ∀b ∈ ℚ \ S, a < b  (15.4.9)
∀a ∈ S, ∃b ∈ S, a < b.  (15.4.10)
Proof: Let S ∈ P(ℚ) \ {∅, ℚ} be a real number. Then S is a finite Dedekind cut by Theorem 15.4.11.
Let a ∈ S and b ∈ ℚ \ S. Suppose that b ≤ a. Then b ∈ S by Definition 15.4.4, which is a contradiction.
Therefore a < b. So line (15.4.9) is satisfied. Line (15.4.10) follows immediately from line (15.4.4).
Now suppose that S ∈ P(ℚ) \ {∅, ℚ} and that lines (15.4.9) and (15.4.10) are satisfied. Let y ∈ S. Then
y < z for some z ∈ S by line (15.4.10). For such z, let x ∈ ℚ with x < z. Suppose that x ∉ S. Then z < x
by line (15.4.9), which is a contradiction. So x ∈ S. Hence S ∈ ℝ.
15.5.10 Theorem: The standard embedding of the rational numbers inside the real numbers is injective.
Proof: Let q₁, q₂ ∈ ℚ. If q₁ < q₂, then (q₁ + q₂)/2 ∈ ι(q₂) \ ι(q₁). If q₁ > q₂, then (q₁ + q₂)/2 ∈ ι(q₁) \ ι(q₂).
So q₁ ≠ q₂ implies ι(q₁) ≠ ι(q₂). Hence the standard embedding of the rational numbers inside the real
numbers is injective.
15.5.11 Theorem:
(i) ∀q₁, q₂ ∈ ℚ, (q₁ = q₂ ⇔ ι(q₁) = ι(q₂)).
(ii) ∀q₁, q₂ ∈ ℚ, (q₁ ≤ q₂ ⇔ ι(q₁) ⊆ ι(q₂)).
(iii) ∀q₁, q₂ ∈ ℚ, (q₁ < q₂ ⇔ (ι(q₁) ⊆ ι(q₂) and ι(q₁) ≠ ι(q₂))).
Proof: For part (i), it is obvious that ι(q₁) = ι(q₂) if q₁ = q₂. So suppose that ι(q₁) = ι(q₂) and q₁ ≠ q₂.
If q₁ < q₂, then q₁ ∈ ι(q₂) \ ι(q₁). So ι(q₁) ≠ ι(q₂), which is a contradiction, and similarly if q₁ > q₂.
For part (ii), let q₁, q₂ ∈ ℚ. Then ι(q₁) = {x ∈ ℚ; x < q₁} and ι(q₂) = {y ∈ ℚ; y < q₂} by Notation 15.5.4.
Let q₁ ≤ q₂. Let x ∈ ι(q₁). Then x ∈ ℚ and x < q₁. Therefore x ∈ ℚ and x < q₂. So x ∈ ι(q₂).
Hence ∀q₁, q₂ ∈ ℚ, (q₁ ≤ q₂ ⇒ ι(q₁) ⊆ ι(q₂)).
Now suppose that ι(q₁) ⊆ ι(q₂). Then {x ∈ ℚ; x < q₁} ⊆ {y ∈ ℚ; y < q₂}. Suppose that q₂ < q₁.
Then q₂ ∈ {x ∈ ℚ; x < q₁} \ {y ∈ ℚ; y < q₂}. This contradicts the set-inclusion. Therefore q₁ ≤ q₂.
Hence ∀q₁, q₂ ∈ ℚ, (q₁ ≤ q₂ ⇔ ι(q₁) ⊆ ι(q₂)).
Part (iii) follows from parts (i) and (ii).
15.5.12 Theorem:
(i) ∀S ∈ ℝ, ∀q ∈ S, ι(q) ≠ S.
(ii) ∀S ∈ ℝ, ∀q ∈ ℚ \ S, S ⊆ ι(q).
(iii) ∀S ∈ ℝ, S = ⋃{ι(q); q ∈ S}.
(iv) ∀S ∈ ℝ, ∀q ∈ ℚ, (q ∈ S ⇔ (ι(q) ⊆ S and ι(q) ≠ S)).
(v) ∀S ∈ ℝ, ∀q ∈ ℚ, ((q ∈ S or ι(q) = S) ⇔ ι(q) ⊆ S).
(vi) ∀S ∈ ℝ, S = ⋃{ι(q); q ∈ ℚ and ι(q) ⊆ S}.
(vii) ∀S ∈ ℝ, S = {z ∈ ℚ; ∃w ∈ ℚ, (z ∈ ι(w) and ι(w) ⊆ S)}.
(viii) ∀S ∈ ℝ, S = {z ∈ ℚ; ∃w ∈ ℚ, (z < w and ι(w) ⊆ S)}.
(ix) ∀X ∈ P(ℚ) \ {∅}, (⋃{ι(q); q ∈ X} ≠ ℚ ⇒ ⋃{ι(q); q ∈ X} ∈ ℝ).
(x) ∀S ∈ P(ℚ) \ {∅, ℚ}, (S ∈ ℝ ⇔ S = ⋃{ι(q); q ∈ S}).
Proof: For part (i), let S ∈ ℝ and q ∈ S. Then by Definition 15.4.7, there exists q′ ∈ S with q < q′ and
x ∈ S for all x ∈ ℚ with x < q′. So ι(q′) ⊆ S for some q′ ∈ S with q < q′. But ι(q) ≠ ι(q′) because q < q′,
by Theorem 15.5.11 (iii). Therefore ι(q) ≠ S.
For part (ii), let S ∈ ℝ and q ∈ ℚ \ S. Let q′ ∈ S. Then q′ < q by Theorem 15.4.15 line (15.4.9). So
q′ ∈ ι(q) by Definition 15.5.3. Hence S ⊆ ι(q).
For part (iii), let S ∈ ℝ. If q′ ∈ S, then q′ < q for some q ∈ S, and so q′ ∈ ι(q) by Definition 15.5.3.
So q′ ∈ ⋃{ι(q); q ∈ S}. Now let q′ ∈ ⋃{ι(q); q ∈ S}. Then q′ ∈ ι(q) for
some q ∈ S. But then q′ < q by Definition 15.5.3. So q′ ∈ S by Theorem 15.4.11 line (15.4.5).
For part (iv), let S ∈ ℝ and q ∈ ℚ. If q ∈ S, then ι(q) ⊆ S by line (15.4.5) of Theorem 15.4.11, and
ι(q) ≠ S by part (i). So suppose that ι(q) ⊆ S and ι(q) ≠ S. Then there
exists some x ∈ S \ ι(q). But then q ≤ x for this x because x ∈ ℚ and x ∉ ι(q). So q ∈ S because x ∈ S.
Hence q ∈ S ⇔ (ι(q) ⊆ S and ι(q) ≠ S).
For part (x), let S ∈ P(ℚ) \ {∅, ℚ}. If S ∈ ℝ, then S = ⋃{ι(q); q ∈ S} by part (iii). Now suppose that
S = ⋃{ι(q); q ∈ S}. Then S ∈ ℝ by part (ix).
15.6.2 Definition: The standard order on the real numbers is the set {(S₁, S₂) ∈ ℝ × ℝ; S₁ ⊆ S₂}.
15.6.3 Notation: ≤ denotes the standard order on the real numbers.
15.6.4 Theorem: The standard order ≤ on the real numbers is a total order on the set ℝ.
Proof: Let S₁, S₂ ∈ ℝ. Then S₁, S₂ ∈ P(ℚ). Suppose that x ∈ S₁ \ S₂ and y ∈ S₂ \ S₁ for some x, y ∈ ℚ.
If x < y, then x ∈ S₂ (because y ∈ S₂), which is a contradiction. Similarly, if x > y, then y ∈ S₁
(because x ∈ S₁), which is also a contradiction. So either S₁ ⊆ S₂ or S₂ ⊆ S₁. In other words, either
S₁ ≤ S₂ or S₂ ≤ S₁.
Suppose that S₁ ≤ S₂ and S₂ ≤ S₁. Then S₁ ⊆ S₂ and S₂ ⊆ S₁. So S₁ = S₂ by the ZF axiom of extension.
(See Definition 7.2.4 (1).)
Let S₁, S₂, S₃ ∈ ℝ with S₁ ≤ S₂ and S₂ ≤ S₃. Then S₁ ⊆ S₂ and S₂ ⊆ S₃. So S₁ ⊆ S₃. Therefore S₁ ≤ S₃.
Hence the standard order on the real numbers is a total order on ℝ by Definition 11.5.1.
15.6.5 Theorem:
(i) ∀q₁, q₂ ∈ ℚ, (q₁ ≤ q₂ ⇔ ι(q₁) ≤ ι(q₂)).
(ii) ∀q₁, q₂ ∈ ℚ, (q₁ < q₂ ⇔ ι(q₁) < ι(q₂)).
Proof: Part (i) follows from Theorem 15.5.11 (ii) and Definition 15.6.2 and Notation 15.6.3.
Part (ii) follows from Theorem 15.5.11 (iii) and Definition 15.6.2 and Notation 15.6.3.
15.6.6 Remark: Derived definitions for the total order on the real numbers.
The order relations <, > and ≥ are defined for real numbers in terms of the primary relation ≤ in the usual way, as in Notation 11.1.10.
15.6.7 Notation:
R⁺ denotes the set of positive real numbers {r ∈ R; r > 0}.
R⁺₀ denotes the set of non-negative real numbers {r ∈ R; r ≥ 0}.
R⁻ denotes the set of negative real numbers {r ∈ R; r < 0}.
R⁻₀ denotes the set of non-positive real numbers {r ∈ R; r ≤ 0}.
15.6.8 Remark: Alternative notations for positive and non-negative real numbers.
Some authors use R⁺ for the non-negative real numbers, but then there would be no obvious notation for the positive reals. It would be tedious to have to write R⁺ \ {0} for the positive reals.
Some authors use notations such as R₊ and Z₊ instead of the superscript versions. The advantage of this would be that one could write Rⁿ₊ for the Cartesian product of n copies of R₊. However, this would cause ambiguity for notations like R₀₊ for the non-negative real numbers.
15.6.9 Remark: Completeness of the real numbers.
Lower bounds, upper bounds, greatest lower bounds, least upper bounds, infima, suprema, minima and maxima are defined for real numbers in terms of the order relation ≤ in the usual way, as in Definition 11.2.4.
To show that R is a complete ordered field in Definition 18.9.3, it will be necessary to know that non-empty sets of real numbers which are bounded above have a least upper bound. This is the completeness property for ordered fields.
15.6.10 Theorem: Let A be a non-empty set of real numbers which is bounded above.
(i) ⋃A ∈ R.
(ii) ⋃A is the least upper bound of A in R.
Proof: For part (i), let A be a non-empty set of real numbers which is bounded above. Then ∅ ≠ A ⊆ R and ∃K ∈ R, ∀a ∈ A, a ≤ K. Let S = ⋃A. Then S ≠ ∅ because each element of A is non-empty by Definition 15.4.7, and the union of a non-empty set of non-empty sets is non-empty. Let x ∈ S. Then x ∈ a for some a ∈ A. But a ∈ A implies a ≤ K, which implies a ⊆ K. So x ∈ a for some a ⊆ K. Therefore x ∈ K, and so S ⊆ K. So S ≠ Q because K ≠ Q by Definition 15.4.7. Therefore S ∈ P(Q) \ {∅, Q}.
Let y ∈ S. Then y ∈ a for some a ∈ A. But a ∈ A implies that a ∈ R, and so there exists z ∈ a such that y < z and ∀x ∈ Q, (x < z ⇒ x ∈ a), by Definition 15.4.7. However, z ∈ a implies z ∈ S, and a ⊆ S. So ∀y ∈ S, ∃z ∈ S, (y < z and ∀x ∈ Q, (x < z ⇒ x ∈ S)). Hence S ∈ R by Definition 15.4.7.
S
For part (ii), note that by the construction of S = A, it is clear that S a for all a A, and so S a for
all a A. Therefore S is an upper
S bound for A.SNow suppose that S 0 is an upper bound 0
S for A with S < S.
0 0
Then a S for all a A. So A S 6= S A, which is a contradiction. Hence A is the least upper
bound of A.
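In this representation the least upper bound of a family of cuts is literally their union. If a Dedekind cut is modelled as a membership predicate on the rationals (an assumption of this sketch, not the book's formalism; the names `embed` and `sup_cut` are illustrative), the construction of Theorem 15.6.10 can be sketched as follows.

```python
from fractions import Fraction

def embed(q):
    """The embedded rational q: the cut of all rationals strictly below q."""
    return lambda x: x < q

def sup_cut(cuts):
    """Least upper bound of a non-empty family of cuts: their union (Theorem 15.6.10)."""
    return lambda x: any(S(x) for S in cuts)

# The union of the cuts for 1/3, 1/2 and 2/5 agrees with the cut for 1/2.
A = [embed(Fraction(1, 3)), embed(Fraction(1, 2)), embed(Fraction(2, 5))]
lub = sup_cut(A)
samples = [Fraction(n, 10) for n in range(-5, 15)]
assert all(lub(x) == embed(Fraction(1, 2))(x) for x in samples)
```

The disjunction `any(S(x) for S in cuts)` is exactly membership in ⋃A, so no separate supremum computation is needed.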
15.6.11 Theorem:
(i) ∀S ∈ R, ∀q ∈ S, (q) < S.
(ii) ∀S ∈ R, ∀q ∈ Q \ S, S ≤ (q).
(iii) ∀S ∈ R, ∀q ∈ Q, (q ∈ S ⇔ (q) < S).
15.6.13 Theorem:
(i) ∀S ∈ R, ∀ε ∈ Q⁺, ∃x ∈ S, ∃y ∈ Q \ S, y − x < ε.
(ii) ∀S1, S2 ∈ R, ∀q ∈ S2 \ S1, S1 ≤ (q) < S2.
(iii) ∀S1, S2 ∈ R, (S1 < S2 ⇒ ∃q ∈ Q, S1 < (q) < S2).
Proof: For part (i), let S ∈ R and ε ∈ Q⁺. Since S ≠ ∅ and S ≠ Q, there exist x0 ∈ S and y0 ∈ Q \ S. Let ε0 = y0 − x0. Then ε0 ∈ Q⁺ by Theorem 15.4.15 line (15.4.9). For i ∈ Z⁺₀, let P(i) be the proposition that there exist xi ∈ S and yi ∈ Q \ S with yi − xi = 2^(−i) ε0. Then P(0) is true.
Suppose that P(i) is true for some i ∈ Z⁺₀. Then there exist xi ∈ S and yi ∈ Q \ S with yi − xi = 2^(−i) ε0. Let z = (xi + yi)/2. If z ∈ S, let x_{i+1} = z and y_{i+1} = yi. Otherwise let x_{i+1} = xi and y_{i+1} = z. Then x_{i+1} ∈ S and y_{i+1} ∈ Q \ S, and y_{i+1} − x_{i+1} = 2^(−i−1) ε0. Thus P(i + 1) is true. So by induction, for all i ∈ Z⁺₀, there exist xi ∈ S and yi ∈ Q \ S with yi − xi = 2^(−i) ε0.
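The inductive bisection in this proof is effectively an algorithm. A minimal sketch, assuming a cut is given as a membership predicate (here the cut for √2; the function names are illustrative, not from the text):

```python
from fractions import Fraction

def in_sqrt2_cut(x):
    """Membership predicate for the cut of sqrt(2): negative, or square below 2."""
    return x < 0 or x * x < 2

def straddle(member, x0, y0, eps):
    """Bisect with x0 in S and y0 outside S until the gap is below eps,
    as in the inductive step of the proof of Theorem 15.6.13 (i)."""
    x, y = x0, y0
    while y - x >= eps:
        z = (x + y) / 2
        if member(z):
            x = z
        else:
            y = z
    return x, y

x, y = straddle(in_sqrt2_cut, Fraction(1), Fraction(2), Fraction(1, 1000))
assert in_sqrt2_cut(x) and not in_sqrt2_cut(y) and y - x < Fraction(1, 1000)
```

Each loop iteration halves the gap y − x, exactly as the propositions P(i) assert.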
Since ε0/ε ∈ Q⁺, for some i0 ∈ Z⁺₀, ε0/ε < 2^i0 by Theorem 15.1.16. Then 2^(−i0) ε0 < ε. Let x = x_{i0} and y = y_{i0}. Then x ∈ S and y ∈ Q \ S with y − x < ε.
15.7.2 Definition: The addition operation for the real numbers is the map σ : R × R → R defined by
∀S1, S2 ∈ R, σ(S1, S2) = {x ∈ Q; ∃y ∈ S1, ∃z ∈ S2, x = y + z} = {y + z; y ∈ S1, z ∈ S2}.
15.7.3 Theorem: The addition operation σ for the real numbers has the following properties.
(i) ∀S1, S2 ∈ R, σ(S1, S2) ∈ R. [closure of addition]
(ii) ∀S1, S2 ∈ R, σ(S1, S2) = σ(S2, S1). [commutativity of addition]
(iii) ∀S1, S2, S3 ∈ R, σ(σ(S1, S2), S3) = σ(S1, σ(S2, S3)). [associativity of addition]
(iv) ∀S ∈ R, σ((0), S) = S = σ(S, (0)). [additive identity]
Proof: For part (i), let S1, S2 ∈ R. Let S3 = σ(S1, S2). Then S3 = {y + z; y ∈ S1, z ∈ S2}. Since S1 ≠ ∅ and S2 ≠ ∅, it follows that S3 ≠ ∅. Since S1 ≠ Q and S2 ≠ Q, there exist x1 ∈ Q \ S1 and x2 ∈ Q \ S2. Suppose that S3 = Q. Then x1 + x2 ∈ S3, and so x1 + x2 = y + z for some y ∈ S1 and z ∈ S2. But (x1 − y) + (x2 − z) = 0, and x1 − y ≠ 0 ≠ x2 − z by the assumptions for x1, x2, y and z. So either x1 < y or x2 < z. Suppose that x1 < y. Then x1 ∈ S1 by Theorem 15.4.11 and Definition 15.4.4, line (15.4.3), which is a contradiction. Similarly the assumption x2 < z yields a contradiction. Therefore S3 ≠ Q.
Now let y3 ∈ S3. Then y3 = y1 + y2 for some y1 ∈ S1 and y2 ∈ S2. So there exist z1 ∈ S1 and z2 ∈ S2 such that y1 < z1 and y2 < z2 and ∀x1 ∈ Q, (x1 < z1 ⇒ x1 ∈ S1) and ∀x2 ∈ Q, (x2 < z2 ⇒ x2 ∈ S2). Let z3 = z1 + z2. Then y3 < z3. Let x3 ∈ Q with x3 < z3. Let x1 = z1 − (z3 − x3)/2 and x2 = z2 − (z3 − x3)/2. Then x1, x2 ∈ Q with x1 < z1 and x2 < z2. So x1 ∈ S1 and x2 ∈ S2 by Definition 15.4.3. Then x3 = x1 + x2. So x3 ∈ S3 by the definition of σ. Hence S3 ∈ R by Definition 15.4.7.
For part (ii), let S1, S2 ∈ R. Let S3 = σ(S1, S2) and S3′ = σ(S2, S1). Let x ∈ S3. Then x = y + z for some y ∈ S1 and z ∈ S2. But y + z = z + y by the commutativity of addition on the rational numbers. So x = z + y for some z ∈ S2 and y ∈ S1. So x ∈ S3′. Therefore S3 ⊆ S3′. Similarly, S3′ ⊆ S3. Hence ∀S1, S2 ∈ R, σ(S1, S2) = σ(S2, S1).
For part (iii), let S1, S2, S3 ∈ R. Let a ∈ σ(σ(S1, S2), S3). Then a = w + z for some w ∈ σ(S1, S2) and z ∈ S3, and w = x + y for some x ∈ S1 and y ∈ S2. So a = (x + y) + z = x + (y + z). But y + z ∈ σ(S2, S3), and so x + (y + z) ∈ σ(S1, σ(S2, S3)). So σ(σ(S1, S2), S3) ⊆ σ(S1, σ(S2, S3)). Similarly, σ(S1, σ(S2, S3)) ⊆ σ(σ(S1, S2), S3). Hence ∀S1, S2, S3 ∈ R, σ(σ(S1, S2), S3) = σ(S1, σ(S2, S3)).
For part (iv), let S ∈ R. Then σ((0), S) = σ({x ∈ Q; x < 0}, S) = {x + y; x ∈ Q and x < 0 and y ∈ S}. Let w ∈ σ((0), S). Then w = x + y for some x, y ∈ Q with x < 0 and y ∈ S. So w ∈ Q and w < y. Therefore w ∈ S by Theorem 15.4.11 line (15.4.5). Now suppose that w ∈ S. Then for some z ∈ S, w < z and ∀x ∈ Q, (x < z ⇒ x ∈ S), by Definition 15.4.7. Let y = w − z. Then y ∈ Q and y < 0. So y ∈ (0). Therefore w = y + z with y ∈ (0) and z ∈ S. So w ∈ σ((0), S). Therefore S ⊆ σ((0), S). Hence S = σ((0), S). The equality S = σ(S, (0)) then follows from part (ii).
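With cuts modelled as membership predicates, membership in the sum cut of Definition 15.7.2 is an existential statement over rational witnesses. The following sketch approximates that quantifier by a finite witness search (a simplification assumed here for illustration; the names `embed` and `add_cut` are not from the text):

```python
from fractions import Fraction

def embed(q):
    """The embedded rational q as a cut (membership predicate)."""
    return lambda x: x < q

def add_cut(S1, S2, witnesses):
    """Membership in the sum cut {y + z; y in S1, z in S2} (Definition 15.7.2),
    with the existential quantifier restricted to a finite witness list."""
    return lambda x: any(S1(y) and S2(x - y) for y in witnesses)

ws = [Fraction(n, 100) for n in range(-300, 300)]
S = add_cut(embed(Fraction(1, 3)), embed(Fraction(1, 4)), ws)
# 1/3 + 1/4 = 7/12: rationals below 7/12 are in the sum, 7/12 itself is not.
assert S(Fraction(1, 2)) and not S(Fraction(7, 12))
```

The search can only confirm membership, never refute it exactly, which mirrors the fact that a Dedekind sum is defined by an existential condition.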
15.7.4 Remark: Explicit constructions for additive and multiplicative inverse maps.
The axioms for an additive group require only that an additive inverse (i.e. a negative) of each element of the group should exist. The existence of a function which maps each element to its additive inverse is not required. However, since the additive inverse is always unique if it does exist, one may always define the map from elements to inverses as the set of ordered pairs {(x, y) ∈ G × G; x + y = 0G}, where G is the additive group and 0G is the additive identity of the group. In the case of the real numbers, it is straightforward to give an explicit construction for the additive inverse map. This is given in Definition 15.7.6. Similarly, an explicit construction for the multiplicative inverse (i.e. reciprocal) is given in Definition 15.8.11.
15.7.5 Remark: The negation operation for the real numbers.
A naive guess for the negative of a real number S would be S′ = {x ∈ Q; ∀y ∈ S, x + y < 0}. This doesn't work for embedded rational numbers. Let w ∈ Q and S = (w). Then S = {y ∈ Q; y < w} and
S′ = {x ∈ Q; ∀y ∈ Q, (y < w ⇒ x + y < 0)}
= {x ∈ Q; ∀y ∈ Q, (y < 0 ⇒ x < −w − y)}
= {x ∈ Q; ∀y ∈ Q, (y > 0 ⇒ x < −w + y)}
= ⋂{(−w + y); y ∈ Q and y > 0}
= {x ∈ Q; x ≤ −w},
which is not a real number. The problem with S′ is that it equals an intersection of real numbers (−w + y), which does not yield an open interval of rational numbers when w is rational. It makes sense, then, to try a union of open intervals, converging from the left instead of the right. This suggests using something like ⋃{(−y); y ∈ Q \ S}, which is at least guaranteed to yield a real number.
15.7.6 Definition: The negative of a real number S is the set {x ∈ Q; ∃y ∈ Q \ S, x + y < 0}.
15.7.7 Notation: −S, for a real number S, denotes the negative of S. In other words,
∀S ∈ R, −S = {x ∈ Q; ∃y ∈ Q \ S, x + y < 0}.
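Definition 15.7.6 is likewise directly executable if its existential quantifier is restricted to a finite search. A sketch under the same predicate model of cuts (illustrative names, not the book's notation):

```python
from fractions import Fraction

def embed(q):
    """The embedded rational q as a cut (membership predicate)."""
    return lambda x: x < q

def neg_cut(S, witnesses):
    """Membership in -S = {x; exists y outside S with x + y < 0} (Definition 15.7.6),
    with the existential quantifier restricted to a finite witness list."""
    return lambda x: any((not S(y)) and x + y < 0 for y in witnesses)

ws = [Fraction(n, 100) for n in range(-300, 300)]
N = neg_cut(embed(Fraction(1, 3)), ws)
# -(1/3) behaves as the cut of -1/3: the supremum -1/3 itself is excluded.
assert N(Fraction(-1, 2)) and not N(Fraction(-1, 3))
```

The exclusion of −1/3 itself is exactly the single-point excision discussed for rational real numbers in Remark 15.7.14 below.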
15.7.8 Remark: Use of the Archimedean property of rational number order for real number negation.
Theorem 15.7.9 gives some basic properties of the negative of a real number to justify the name. Part (i) resembles Theorem 15.5.12 (iii), which states that S = ⋃{(y); y ∈ S} for all real numbers S.
A notable aspect of the proof for Theorem 15.7.9 (iii), that the negative of a real number is its additive inverse, is the use of the Archimedean property of the total order on the rational numbers. This property has an analytic character since it uses a kind of limit operation. This is surprising because negation of a real number seems at first sight to be a purely algebraic operation.
15.7.9 Theorem:
(i) ∀S ∈ R, −S = ⋃{(−y); y ∈ Q \ S}.
(ii) ∀S ∈ R, −S ∈ R. [closure of negation]
(iii) ∀S ∈ R, σ(S, −S) = (0) = σ(−S, S). [additive inverse]
(iv) ∀S1, S2 ∈ R, (S1 = S2 ⇒ −S1 = −S2).
(v) ∀S ∈ R, Q \ (−S) = {w ∈ Q; (−w) ≤ S}.
(vi) ∀S ∈ R, ∀w ∈ Q, (w ∈ Q \ (−S) ⇔ (−w) ≤ S).
(vii) ∀S ∈ R, ∀w ∈ Q, (w ∈ Q \ S ⇔ ((−w) = −S or −w ∈ −S)).
(viii) ∀S ∈ R, −(−S) = {x ∈ Q; ∃y ∈ Q \ (−S), x + y < 0}.
(ix) ∀S ∈ R, −(−S) = ⋃{(−y); y ∈ Q \ (−S)}.
(x) ∀S ∈ R, −(−S) = S. [double negation]
Proof: For part (i), let S be a real number. Then by Definition 15.7.6,
∀z ∈ Q, z ∈ −S ⇔ z ∈ Q and ∃y ∈ Q \ S, z + y < 0
⇔ ∃y ∈ Q \ S, (z ∈ Q and z + y < 0)
⇔ ∃y ∈ Q \ S, z ∈ {x ∈ Q; x + y < 0}
⇔ ∃y ∈ Q \ S, z ∈ (−y)
⇔ z ∈ ⋃{(−y); y ∈ Q \ S}.
Hence −S = ⋃{(−y); y ∈ Q \ S}.
For part (ii), let S be a real number. Then −S = {x ∈ Q; ∃y ∈ Q \ S, x + y < 0}. Clearly −S ⊆ Q. Since S ≠ Q, it follows that y ∈ Q \ S for some y, and then x = −y − 1 satisfies x ∈ Q and x + y = −1 < 0. So −S ≠ ∅. Since S ≠ ∅, there is some z ∈ S. Suppose that −z ∈ −S. Then −z + y < 0 for some y ∈ Q \ S. But then y < z, and so y ∈ S, which is a contradiction. So −z ∉ −S although −z ∈ Q. Therefore −S ≠ Q. In fact, if z ∈ S and y ∈ Q \ S, then z < y by Theorem 15.4.15, line (15.4.9). So (−y) ≤ (−z) for all y ∈ Q \ S by Theorem 15.5.11 (ii). Therefore −S ⊆ (−z). So (−z) is an upper bound for A = {(−y); y ∈ Q \ S}. Since A ≠ ∅, it follows from Theorem 15.6.10 (i) that −S = ⋃A ∈ R.
For part (iii), let S be a real number. Let z ∈ σ(S, −S). Then z = x + y for some x ∈ S and y ∈ −S. Then y ∈ Q and ∃w ∈ Q \ S, y + w < 0. But then x < w by Theorem 15.4.15 line (15.4.9). So x + y < x − w < 0. Therefore z ∈ (0), and so σ(S, −S) ⊆ (0).
To show that σ(S, −S) ⊇ (0), let z ∈ (0). Then z ∈ Q and z < 0. Let ε = −z. By Theorem 15.6.13 (i), there exist x ∈ S and y ∈ Q \ S such that y − x < ε. Then (−y) ⊆ −S by part (i). By Theorem 15.4.11 line (15.4.5), there exists x′ ∈ S with x < x′. Let y′ = y + x′ − x. Then y′ ∈ Q \ S by Theorem 15.4.13 (i), and y′ − x′ = y − x < ε. Since y′ > y, it follows that −y′ ∈ (−y) ⊆ −S. Therefore z = −ε < x′ − y′ = x′ + (−y′)
⇔ ∃w ∈ Q \ (−S), z ∈ (−w)
⇔ z ∈ ⋃{(−w); w ∈ Q \ (−S)}.
Hence −(−S) = ⋃{(−w); w ∈ Q \ (−S)}.
Part (x) follows from part (viii) and Definition 15.7.6.
15.7.10 Remark: Commutative group properties of real number addition and negation.
It follows from Theorems 15.7.3 and 15.7.9 that the pair (R, σ) is a commutative group by Definitions 17.3.2 and 17.3.22. Since those definitions appear in a later chapter, the term "commutative group" is not used for the properties asserted in Theorem 15.7.11.
15.7.11 Theorem: The pair (R, σ), where σ is the addition operation in Definition 15.7.2, has the following properties.
(i) ∀S1, S2 ∈ R, σ(S1, S2) ∈ R. [closure]
(ii) ∀S1, S2, S3 ∈ R, σ(σ(S1, S2), S3) = σ(S1, σ(S2, S3)). [associativity]
(iii) ∀S ∈ R, σ((0), S) = S = σ(S, (0)). [existence of identity]
(iv) ∀S ∈ R, σ(S, −S) = (0) = σ(−S, S). [existence of inverses]
(v) ∀S1, S2 ∈ R, σ(S1, S2) = σ(S2, S1). [commutativity]
Proof: Part (i) is the same as Theorem 15.7.3 (i).
Part (ii) is the same as Theorem 15.7.3 (iii).
Part (iii) is the same as Theorem 15.7.3 (iv).
Part (iv) follows from Theorem 15.7.9 (ii, iii).
Part (v) is the same as Theorem 15.7.3 (ii).
15.7.12 Remark: Using group properties to prove properties of real number addition.
Many properties of real number addition can be more effortlessly derived from the abstract commutative group properties in Theorem 15.7.11. For example, the identity −(−S) = S in Theorem 15.7.9 (x) follows abstractly in Theorem 15.7.13 (v) from the associativity and inverse properties in Theorem 15.7.11 by the same proof steps as for completely abstract groups.
The real number system is in practice regarded as no more and no less than an abstract complete ordered field. Any other properties are regarded as mere artefacts of particular models for the abstract system of axioms for a complete ordered field. Consequently, there is little benefit in proving any properties of the Dedekind cut representation beyond what is required to verify the complete ordered field axioms, since all other properties will be discarded in applications.
15.7.13 Theorem:
(i) ∀S, T ∈ R, (σ(S, T) = (0) ⇒ T = −S). [unique right inverse]
(ii) ∀S, T ∈ R, (σ(T, S) = (0) ⇒ T = −S). [unique left inverse]
(iii) ∀S, T1, T2 ∈ R, (σ(S, T1) = σ(S, T2) ⇒ T1 = T2).
(iv) ∀S, T1, T2 ∈ R, (σ(T1, S) = σ(T2, S) ⇒ T1 = T2).
(v) ∀S ∈ R, −(−S) = S. [double negation]
(vi) ∀A, B ∈ R, −σ(A, B) = σ(−A, −B). [commutativity of negation and addition]
Proof: For part (i), let S, T ∈ R satisfy σ(S, T) = (0). Then σ(−S, σ(S, T)) = σ(−S, (0)) because σ is a well-defined function. So σ(σ(−S, S), T) = −S by Theorem 15.7.11 (ii, iii). Therefore σ((0), T) = −S by Theorem 15.7.11 (iv). Hence T = −S by Theorem 15.7.11 (iii).
For part (ii), let S, T ∈ R satisfy σ(T, S) = (0). Then σ(σ(T, S), −S) = σ((0), −S) because σ is a well-defined function. So σ(T, σ(S, −S)) = −S by Theorem 15.7.11 (ii, iii). Therefore σ(T, (0)) = −S by Theorem 15.7.11 (iv). Hence T = −S by Theorem 15.7.11 (iii).
For part (iii), let S, T1, T2 ∈ R satisfy σ(S, T1) = σ(S, T2). Then σ(−S, σ(S, T1)) = σ(−S, σ(S, T2)). But σ(−S, σ(S, T1)) = σ(σ(−S, S), T1) = σ((0), T1) = T1 follows from Theorem 15.7.11 (ii, iv, iii). Similarly σ(−S, σ(S, T2)) = σ(σ(−S, S), T2) = σ((0), T2) = T2. Hence T1 = T2.
Part (iv) may be proved as for part (iii).
For part (v), let S ∈ R. Then σ(−(−S), −S) = (0) by Theorem 15.7.11 (iv), and since σ is a well-defined function, σ(σ(−(−S), −S), S) = σ((0), S). So σ(−(−S), σ(−S, S)) = S by Theorem 15.7.11 (ii, iii). Therefore σ(−(−S), (0)) = S by Theorem 15.7.11 (iv). Hence −(−S) = S by Theorem 15.7.11 (iii).
Alternatively, for part (v), Theorem 15.7.11 (iv) implies both σ(−(−S), −S) = (0) and σ(S, −S) = (0). Hence −(−S) = S by part (iv).
For part (vi), let A, B ∈ R. Then
σ(σ(A, B), σ(−A, −B)) = σ(σ(B, A), σ(−A, −B))
= σ(B, σ(A, σ(−A, −B)))
= σ(B, σ(σ(A, −A), −B))
= σ(B, σ((0), −B))
= σ(B, −B)
= (0)
by Theorem 15.7.11 (i, ii, iii, iv, v). Hence −σ(A, B) = σ(−A, −B) by part (i).
15.7.14 Remark: Density of rational numbers implies lack of supremum for an irrational number.
It is noteworthy that the proof of Theorem 15.7.15 (iii) uses the analytic-looking property of the real numbers in Theorem 15.6.13 (iii), which concerns the density of the rational numbers within the real numbers. Theorem 15.7.15 (iii) may be interpreted to mean that the supremum of the set of rational numbers which are less than a given irrational number is not equal to an embedded rational number. In other words, the reflection in the origin of the complement of an irrational real number S equals the negative of S because it is unnecessary to remove the rational supremum from the reflected complement {−y; y ∈ Q \ S}, because it does not contain one.
Theorem 15.7.15 implies that the negative of a real number has one of two simple forms: one for embedded rational numbers, and the other for all other real numbers. Thus a rational real number (q), for some q ∈ Q, has a negative of the form {−y; y ∈ Q \ (q) and y ≠ q}, whereas an irrational real number S has a negative of the form {−y; y ∈ Q \ S}. The difference between these two forms is only a single point, which must be excised in the case of rational real numbers. This is more or less intuitively obvious, but of course it must be verified, like all intuition. The proof is onerous, but it must be done!
15.7.15 Theorem:
(i) ∀S ∈ R, −S ⊆ {−y; y ∈ Q \ S}.
(ii) ∀q ∈ Q, −(q) = {−y; y ∈ Q \ (q) and y ≠ q}.
(iii) ∀S ∈ R \ (Q), −S = {−y; y ∈ Q \ S}.
(iv) ∀S ∈ R, (S ∉ (Q) ⇔ −S = {−y; y ∈ Q \ S}).
Proof: For part (i), let S ∈ R. Let x ∈ −S. Then by Definition 15.7.6, x ∈ Q and x < −z for some z ∈ Q \ S. So −x ∈ Q and −x > z for some z ∈ Q \ S. Therefore −x ∈ Q \ S by Theorem 15.4.13 (i). Hence x ∈ {−y; y ∈ Q \ S}.
For part (ii), let q ∈ Q. Let x ∈ −(q). Then x ∈ {−y; y ∈ Q \ (q)} by part (i). By Definition 15.7.6, x ∈ Q and x < −z for some z ∈ Q \ (q). But Q \ (q) = {z ∈ Q; q ≤ z}. So −x > z for some z ∈ Q with z ≥ q. Therefore −x > q, and so −x ≠ q. So x ∈ {−y; y ∈ Q \ (q) and y ≠ q}. Hence −(q) ⊆ {−y; y ∈ Q \ (q) and y ≠ q}.
To show the reverse inclusion for part (ii), let x ∈ {−y; y ∈ Q \ (q) and y ≠ q}. Then x = −y for some y ∈ Q \ (q) with y ≠ q. So x = −y for some y ∈ Q with y ≥ q and y ≠ q. That is, x = −y for some y ∈ Q with y > q. Let y′ = (y + q)/2. Then y′ ∈ Q, y′ < y and y′ > q, and so −y ∈ (−y′) and y′ ∈ Q \ (q). Therefore x = −y ∈ ⋃{(−y′); y′ ∈ Q \ (q)}. So x ∈ −(q) by Theorem 15.7.9 (i). Therefore {−y; y ∈ Q \ (q) and y ≠ q} ⊆ −(q). Hence −(q) = {−y; y ∈ Q \ (q) and y ≠ q}.
For part (iii), let S ∈ R \ (Q). Then −S ⊆ {−y; y ∈ Q \ S} by part (i). Suppose that −S ≠ {−y; y ∈ Q \ S}. Then −y ∉ −S for some y ∈ Q \ S. So by Definition 15.7.6, ∀z ∈ Q \ S, −y + z ≥ 0 for this y. In other words, ∃y ∈ Q \ S, ∀z ∈ Q \ S, z ≥ y. Suppose that S < (y). Then by Theorem 15.6.13 (iii), there exists w ∈ Q with S < (w) < (y). So w ∈ Q \ S with w < y by Theorem 15.6.5 (ii). This is a contradiction. So S = (y). Therefore S ∈ (Q) = {(q); q ∈ Q}. This is a contradiction. Hence −S = {−y; y ∈ Q \ S}.
For part (iv), let S ∈ R. By part (iii), if S ∉ (Q) then −S = {−y; y ∈ Q \ S}. If S ∈ (Q) then S = (q) for some q ∈ Q, and so −S = {−y; y ∈ Q \ S and y ≠ q} by part (ii). But q ∈ Q \ (q) = Q \ S since q ∉ (q). So {−y; y ∈ Q \ S and y ≠ q} ≠ {−y; y ∈ Q \ S}. Hence −S ≠ {−y; y ∈ Q \ S}.
15.7.16 Remark: Some minor properties of the negation and order of real numbers.
The assertions in Theorem 15.7.17 are easy to prove and are fairly shallow. However, it is important to verify them because they are often required as small steps in later proofs. (For the meaning of 0R, see Notation 15.5.8.)
15.7.17 Theorem:
(i) ∀q ∈ Q, −(q) = (−q).
(ii) −0R = 0R.
(iii) ∀S1, S2 ∈ R, (S1 ≤ S2 ⇔ −S2 ≤ −S1).
(iv) ∀S ∈ R, (S ≤ 0R ⇔ −S ≥ 0R).
(v) ∀S ∈ R, (S > 0R ⇔ −S ⊆ Q⁻₀).
(vi) ∀S1, S2, S3 ∈ R, (S1 < S2 ⇒ σ(S1, S3) < σ(S2, S3)). [order additivity]
Proof: For part (i), let q ∈ Q. Then −(q) = {x ∈ Q; ∃y ∈ Q \ (q), x < −y} by Definition 15.7.6. But Q \ (q) = {z ∈ Q; z ≥ q}. Therefore −(q) = {x ∈ Q; ∃y ∈ Q, (q ≤ y and x < −y)} = {x ∈ Q; −x > q} = {x ∈ Q; x < −q} = (−q).
For part (ii), −0R = −(0) = (−0) = (0) = 0R by part (i).
cut in Definitions 15.4.4 and 15.4.7. (See Notations 15.1.13 and 15.1.14 for the rational number intervals
in Definition 15.8.2.)
It seems that any kind of representation of the real numbers is unsatisfyingly untidy when both addition
and multiplication operations must be supported. This suggests that there is something intrinsically difficult
about multiplication in particular, and certainly the combination of addition and multiplication. Since the
weight of tradition is in favour of the additive representation, that is the representation which continues to
be pursued here as the basis of Definition 15.8.4.
15.8.4 Definition: The multiplication operation for the real numbers is the map τ : R × R → R defined by
∀S1, S2 ∈ R, τ(S1, S2) =
−{−yz; y ∈ S1, z ∈ S2} if S1, S2 ∈ R⁻₀;
{yz; y ∈ S1, z ∈ Q \ S2} if S1 ∈ R⁻₀ and S2 ∈ R⁺;
{yz; y ∈ Q \ S1, z ∈ S2} if S1 ∈ R⁺ and S2 ∈ R⁻₀;
Q⁻ ∪ {yz; y ∈ Q⁺₀ ∩ S1, z ∈ Q⁺₀ ∩ S2} if S1, S2 ∈ R⁺.
15.8.6 Remark: Application of multiplicative Dedekind cuts to multiplication of positive real numbers.
The construction in the case S1, S2 ∈ R⁺ in Definition 15.8.4 is the same as first converting S1 and S2 to the multiplicative style of Dedekind cut as in Definition 15.8.2, then multiplying them according to the simple rule mentioned in Remark 15.8.3, and finally converting the result back to an additive Dedekind cut. For non-negative real numbers, the two forward conversions are achieved by truncation to Q⁺₀, and the single reverse conversion is achieved by extending the resulting bounded interval to negative infinity.
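For the positive case of Definition 15.8.4, the product cut consists of the non-positive rationals together with products of positive members of the two cuts. A sketch under the predicate model of cuts used earlier, with the existential quantifier replaced by a finite witness search (the names `embed` and `mul_pos` are illustrative):

```python
from fractions import Fraction

def embed(q):
    """The embedded rational q as a cut (membership predicate)."""
    return lambda x: x < q

def mul_pos(S1, S2, witnesses):
    """Membership in the product of two positive cuts: all non-positive
    rationals, plus products yz of positive members y of S1 and z of S2."""
    return lambda x: x <= 0 or any(
        y > 0 and S1(y) and S2(x / y) for y in witnesses)

ws = [Fraction(n, 100) for n in range(1, 400)]
P = mul_pos(embed(Fraction(2)), embed(Fraction(3)), ws)
# The product of the cuts for 2 and 3 behaves as the cut for 6.
assert P(Fraction(5)) and not P(Fraction(6))
```

The unconditional inclusion of `x <= 0` is exactly the "extension to negative infinity" described in the remark above.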
15.8.8 Theorem:
(i) {−yz; y ∈ S1, z ∈ S2} ∈ R⁻₀ for all S1, S2 ∈ R⁻₀.
(ii) −{−yz; y ∈ S1, z ∈ S2} ∈ R⁺₀ for all S1, S2 ∈ R⁻₀.
(iii) {yz; y ∈ S1, z ∈ Q \ S2} ∈ R⁻₀ for all S1 ∈ R⁻₀ and S2 ∈ R⁺.
(iv) {yz; y ∈ Q \ S1, z ∈ S2} ∈ R⁻₀ for all S1 ∈ R⁺ and S2 ∈ R⁻₀.
(v) Q⁻ ∪ {yz; y ∈ Q⁺₀ ∩ S1, z ∈ Q⁺₀ ∩ S2} ∈ R⁺ for all S1, S2 ∈ R⁺.
(vi) ∀S1, S2 ∈ R, τ(S1, S2) ∈ R. [closure of multiplication]
15.8.9 Theorem:
(i) ∀S1 ∈ R⁻₀, ∀S2 ∈ R⁺, τ(S1, S2) = {yz; y ∈ S1, −z ∈ −S2}.
(ii) ∀S1 ∈ R⁺, ∀S2 ∈ R⁻₀, τ(S1, S2) = {yz; −y ∈ −S1, z ∈ S2}.
(iii) ∀S ∈ R, τ(S, (0)) = (0) = τ((0), S).
(iv) ∀S1, S2 ∈ R, −τ(S1, S2) = τ(−S1, S2) = τ(S1, −S2). [commutativity of negation and multiplication]
Proof: For part (i), let S1 ∈ R⁻₀ and S2 ∈ R⁺. Then τ(S1, S2) = {yz; y ∈ S1, z ∈ Q \ S2} follows from Definition 15.8.4 and Theorem 15.7.13 (v). So τ(S1, S2) = {yz; y ∈ S1 and z ∈ Q and ((−z) = −S2 or −z ∈ −S2)} by Theorem 15.7.9 (vii). Suppose that y ∈ S1 and (−z) = −S2. Then y < y′ for some y′ ∈ S1 by Theorem 15.4.11 line (15.4.6), and then −z′ = −zy/y′ < −z because z ∈ Q⁺ and y/y′ > 1. So −z′ ∈ (−z) = −S2, and so y′z′ = yz ∈ {yz; y ∈ S1, −z ∈ −S2}.
Hence τ(S1, S2) = {yz; y ∈ S1, −z ∈ −S2}.
Part (ii) follows from part (i) by swapping S1 and S2 .
For part (iii), let S ∈ R. If S ∈ R⁻₀, then τ(S, (0)) = −{−yz; y ∈ S, z ∈ (0)} by Definition 15.8.4. But {yz; y ∈ S, z ∈ (0)} = {yz; y ∈ S, z ∈ Q⁻} = Q⁺ because S ⊆ (0) = Q⁻, and any element x ∈ Q⁺ may be written as yz for any y ∈ S by letting z = x/y. So {−yz; y ∈ S, z ∈ (0)} = Q⁻ = (0). Therefore τ(S, (0)) = −(0) = (0). Now suppose that S ∈ R⁺. Then τ(S, (0)) = {yz; y ∈ Q \ S, z ∈ (0)} = {yz; y ∈ Q \ S, z ∈ Q⁻}. But Q \ S ⊆ Q⁺. So τ(S, (0)) = Q⁻ because every x ∈ Q⁻ may be written as yz for any y ∈ Q \ S by setting z = x/y. Therefore τ(S, (0)) = (0). Similarly, (0) = τ((0), S).
For part (iv), let S1, S2 ∈ R. If S1 = (0) or S2 = (0), then −τ(S1, S2) = −(0) = (0) = τ(−S1, S2) = τ(S1, −S2) by part (iii) and Theorem 15.7.17 (ii). If S1, S2 ∈ R⁻, then −τ(S1, S2) = {−yz; y ∈ S1, z ∈ S2} by Definition 15.8.4 and Theorem 15.7.13 (v), and τ(−S1, S2) = {−yz; y ∈ S1, z ∈ S2} by part (ii) and Theorem 15.7.13 (v). So −τ(S1, S2) = τ(−S1, S2). Similarly, −τ(S1, S2) = τ(S1, −S2).
15.8.10 Theorem:
(i) ∀S1, S2 ∈ R, τ(S1, S2) = τ(S2, S1). [commutativity of multiplication]
(ii) ∀S1, S2, S3 ∈ R, τ(τ(S1, S2), S3) = τ(S1, τ(S2, S3)). [associativity of multiplication]
(iii) ∀S ∈ R, τ(S, (1)) = S = τ((1), S). [multiplicative identity]
Proof: For part (i), the result for the cases S1, S2 ∈ R⁻₀ and S1, S2 ∈ R⁺ follows from Definition 15.8.4 and the commutativity of multiplication of the rational numbers. If S1 ∈ R⁻₀ and S2 ∈ R⁺, then τ(S1, S2) = {yz; y ∈ S1, z ∈ Q \ S2} = {zy; z ∈ Q \ S2, y ∈ S1} = τ(S2, S1). The result for S1 ∈ R⁺ and S2 ∈ R⁻₀ follows similarly. Hence the result is valid for all S1, S2 ∈ R.
For part (ii), let S1, S2, S3 ∈ R. First consider the case S1, S2, S3 ∈ R⁻₀. Then
τ(τ(S1, S2), S3) = τ(−{−xy; x ∈ S1, y ∈ S2}, S3)
= −τ({−xy; x ∈ S1, y ∈ S2}, S3)
= −(−{xyz; x ∈ S1, y ∈ S2, z ∈ S3})
= {xyz; x ∈ S1, y ∈ S2, z ∈ S3}
by Definition 15.8.4 and Theorems 15.8.9 (iv) and 15.7.13 (v). Similarly, τ(S1, τ(S2, S3)) = {xyz; x ∈ S1, y ∈ S2, z ∈ S3}. So τ(τ(S1, S2), S3) = τ(S1, τ(S2, S3)). For the case S1 ∈ R⁺ and S2, S3 ∈ R⁻₀, one obtains τ(τ(S1, S2), S3) = τ(−τ(−S1, S2), S3) = −τ(τ(−S1, S2), S3) by Theorem 15.8.9 (iv). But then −S1, S2, S3 ∈ R⁻₀. So by the first case, τ(τ(−S1, S2), S3) = τ(−S1, τ(S2, S3)), which equals −τ(S1, τ(S2, S3)) by Theorem 15.8.9 (iv). Therefore τ(τ(S1, S2), S3) = τ(S1, τ(S2, S3)). The other six cases for the signs of S1, S2 and S3 follow similarly.
For part (iii), let S ∈ R. First consider the case S ∈ R⁻₀. Then τ(S, (1)) = {yz; y ∈ S, −z ∈ −(1)} by Theorem 15.8.9 (i). But −(1) = (−1) by Theorem 15.7.17 (i), and (−1) = {z ∈ Q; z < −1}. So {yz; y ∈ S, −z ∈ −(1)} ⊆ {y; y ∈ S} = S. Let y ∈ S. Then y < y′ for some y′ ∈ S. Let z′ = y/y′. Then z′ > 1. So −z′ ∈ −(1) and y′z′ = y ∈ {yz; y ∈ S, −z ∈ −(1)}. So τ(S, (1)) = S. Similarly, S = τ((1), S). The case S ∈ R⁺ may be converted to the case S ∈ R⁻ by applying Theorem 15.8.9 (iv).
15.8.11 Definition: The reciprocal of a non-zero real number S is the set {x ∈ Q; ∃y ∈ Q⁻ \ S, x < 1/y} for S ∈ R⁻, and is the set Q⁻₀ ∪ {x ∈ Q⁺; ∃y ∈ Q \ S, x < 1/y} for S ∈ R⁺.
15.8.12 Notation: S⁻¹, for a non-zero real number S, denotes the reciprocal of S. In other words,
∀S ∈ R \ {0R}, S⁻¹ = {x ∈ Q; ∃y ∈ Q⁻ \ S, x < 1/y} if S ∈ R⁻, and S⁻¹ = Q⁻₀ ∪ {x ∈ Q⁺; ∃y ∈ Q \ S, x < 1/y} if S ∈ R⁺.
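For a positive cut, the reciprocal can be probed in the same finite-search style as the earlier sketches: all non-positive rationals belong to it, together with the positive x satisfying x < 1/y for some rational y outside S. An illustrative sketch under the predicate model of cuts (the names `embed` and `recip_pos` are not from the text):

```python
from fractions import Fraction

def embed(q):
    """The embedded rational q as a cut (membership predicate)."""
    return lambda x: x < q

def recip_pos(S, witnesses):
    """Membership in the reciprocal of a positive cut S: non-positive
    rationals, plus positive x with x < 1/y for some rational y outside S."""
    return lambda x: x <= 0 or any((not S(y)) and x < 1 / y for y in witnesses)

ws = [Fraction(n, 100) for n in range(1, 500)]
R = recip_pos(embed(Fraction(4)), ws)
# The reciprocal of the cut for 4 behaves as the cut for 1/4.
assert R(Fraction(1, 5)) and not R(Fraction(1, 4))
```

Note that 1/4 itself is excluded: there is no y ≥ 4 with 1/4 < 1/y, matching the openness of the cut.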
15.8.13 Remark: Analysis-like proof that the reciprocal of a real number is its multiplicative inverse.
The second half of the proof of Theorem 15.8.14 (xiii) uses an analysis-like argument which is based on Theorem 15.6.13 (i), which is itself based on the Archimedean property of rational number order. A similar situation is encountered in the corresponding proof that the negative of a real number is its additive inverse in Theorem 15.7.9 (iii), as mentioned in Remark 15.7.8.
15.8.14 Theorem:
(i) ∀S ∈ R⁻, S⁻¹ = ⋃{(1/y); y ∈ Q⁻ \ S}.
(ii) ∀S ∈ R⁺, S⁻¹ = ⋃{(1/y); y ∈ Q \ S}.
(iii) ∀S ∈ R⁻, S⁻¹ ∈ R⁻.
(iv) ∀S ∈ R⁺, S⁻¹ ∈ R⁺.
(v) ∀S ∈ R \ {0R}, S⁻¹ ∈ R \ {0R}. [closure of reciprocal]
(vi) ∀S ∈ R⁻, Q⁻₀ ∪ {−1/z; z ∈ S} = Q \ {−1/z; z ∈ Q⁻ \ S} = ⋃{(−1/z); z ∈ S} ∈ R⁺.
(vii) ∀S ∈ R⁺, {−1/z; z ∈ S ∩ Q⁺} = Q⁻ \ {−1/z; z ∈ Q \ S} = ⋃{(−1/z); z ∈ S ∩ Q⁺} ∈ R⁻.
(viii) ∀S ∈ R⁻, −(S⁻¹) = Q⁻₀ ∪ {−1/z; z ∈ S}.
(ix) ∀S ∈ R⁻, (−S)⁻¹ = Q⁻₀ ∪ {−1/z; z ∈ S}.
(x) ∀S ∈ R⁺, −(S⁻¹) = {−1/z; z ∈ S ∩ Q⁺}.
(xi) ∀S ∈ R⁺, (−S)⁻¹ = {−1/z; z ∈ S ∩ Q⁺}.
(xii) ∀S ∈ R \ {0R}, (−S)⁻¹ = −(S⁻¹). [commutativity of negative and reciprocal]
(xiii) ∀S ∈ R \ {0R}, τ(S, S⁻¹) = (1) = τ(S⁻¹, S). [the reciprocal is the multiplicative inverse]
(xiv) ∀S, T1, T2 ∈ R, τ(S, σ(T1, T2)) = σ(τ(S, T1), τ(S, T2)). [distributivity]
Proof: For part (i), let S ∈ R⁻. Then x ∈ S⁻¹ if and only if x ∈ Q and x < 1/y for some y ∈ Q⁻ \ S. In other words, x ∈ S⁻¹ if and only if x ∈ {x ∈ Q; x < 1/y} = (1/y) for some y ∈ Q⁻ \ S, because 1/y ∈ Q. Therefore x ∈ S⁻¹ if and only if x ∈ ⋃{(1/y); y ∈ Q⁻ \ S}. Hence S⁻¹ = ⋃{(1/y); y ∈ Q⁻ \ S}.
For part (ii), let S ∈ R⁺. Then x ∈ S⁻¹ if and only if x ∈ Q⁻₀ or x ∈ {x ∈ Q⁺; x < 1/y} for some y ∈ Q \ S. But {x ∈ Q⁺; x < 1/y} = Q⁺ ∩ (1/y) because 1/y ∈ Q⁺. Therefore x ∈ S⁻¹ if and only if x ∈ (1/y) for some y ∈ Q \ S. Hence S⁻¹ = ⋃{(1/y); y ∈ Q \ S}.
For part (iii), let S ∈ R⁻. Then S⁻¹ ⊆ Q by Definition 15.8.11. Let X = {z ∈ Q⁻; 1/z ∈ Q⁻ \ S} = {1/y; y ∈ Q⁻ \ S}. Then S⁻¹ = ⋃{(y); y ∈ X} by part (i). Therefore S⁻¹ ∈ R by Theorem 15.5.12 (ix) because X ≠ ∅. Hence S⁻¹ ∈ R⁻ because S⁻¹ ⊆ Q⁻ and S⁻¹ ≠ (0).
For part (iv), let S ∈ R⁺. Let X = {z ∈ Q⁺; 1/z ∈ Q \ S} = {1/y; y ∈ Q \ S}. Then X ≠ ∅ and S⁻¹ = ⋃{(y); y ∈ X} ≠ Q by part (ii). So S⁻¹ ∈ R by Theorem 15.5.12 (ix). Hence S⁻¹ ∈ R⁺ because S⁻¹ ⊇ (0) and S⁻¹ ≠ (0).
Part (v) follows from parts (iii) and (iv).
For part (vi), let S ∈ R⁻. Then x ∈ Q⁻₀ ∪ {−1/z; z ∈ S} if and only if x ∈ Q and x ∉ {−1/z; z ∈ Q⁻ \ S}. But Q⁺ \ {−1/z; z ∈ S} = {−1/z; z ∈ Q⁻ \ S} by Theorem 10.6.6 (v′) because the map z ↦ −1/z from Q⁻ to Q⁺ is a bijection. Hence Q⁻₀ ∪ {−1/z; z ∈ S} = Q \ {−1/z; z ∈ Q⁻ \ S}. Now let A = Q⁻₀ ∪ {−1/z; z ∈ S} and B = ⋃{(−1/z); z ∈ S}. Let y ∈ A. Then y ∈ Q⁻₀ or y = −1/z for some z ∈ S. If y ∈ Q⁻₀, then y ∈ (−1/z) for some z ∈ S because S ≠ ∅. So y ∈ B. If y ∉ Q⁻₀, then y ∈ Q⁺ and y = −1/z for some z ∈ S. There exists z′ ∈ S with z < z′ by Theorem 15.4.11 line (15.4.6). Then −1/z ∈ (−1/z′) because −1/z′ ∈ Q and −1/z < −1/z′. So y ∈ B. Therefore A ⊆ B. Let y ∈ B. Then y < −1/z for some z ∈ S. If y ∈ Q⁻₀, then y ∈ A. If y ∉ Q⁻₀, then y ∈ Q⁺ and y < −1/z for some z ∈ S. Let z′ = −1/y. Then z′ < z. But then y = −1/z′ and z′ ∈ S. So y ∈ A. Therefore A = B. So Q⁻₀ ∪ {−1/z; z ∈ S} ∈ R because ⋃{(−1/z); z ∈ S} ∈ R by Theorem 15.5.12 (ix). Hence ⋃{(−1/z); z ∈ S} ∈ R⁺ because ⋃{(−1/z); z ∈ S} ⊄ Q⁻₀.
For part (vii), let S ∈ R⁺. Then x ∈ {−1/z; z ∈ S ∩ Q⁺} if and only if x ∈ Q⁻ and x ∉ {−1/z; z ∈ Q \ S}. But Q⁻ \ {−1/z; z ∈ Q \ S} = {−1/z; z ∈ S ∩ Q⁺} by Theorem 10.6.6 (v′) because the map z ↦ −1/z from Q⁺ to Q⁻ is a bijection. Hence {−1/z; z ∈ S ∩ Q⁺} = Q⁻ \ {−1/z; z ∈ Q \ S}. Now let A = {−1/z; z ∈ S ∩ Q⁺} and B = ⋃{(−1/z); z ∈ S ∩ Q⁺}. Let y ∈ A. Then y ∈ Q⁻ and y = −1/z for some z ∈ S ∩ Q⁺. There exists z′ ∈ S with z < z′ by Theorem 15.4.11 line (15.4.6). Then −1/z ∈ (−1/z′) because −1/z < −1/z′. So y ∈ B. Therefore A ⊆ B. Let y ∈ B. Then y ∈ Q and y < −1/z for some z ∈ S ∩ Q⁺. Let z′ = −1/y. Then z′ < z. So z′ ∈ S ∩ Q⁺. Therefore y = −1/z′ for some z′ ∈ S ∩ Q⁺. So y ∈ A. Therefore B ⊆ A. Hence {−1/z; z ∈ S ∩ Q⁺} = ⋃{(−1/z); z ∈ S ∩ Q⁺}. But it follows from Theorem 15.5.12 (ix) that ⋃{(−1/z); z ∈ S ∩ Q⁺} ∈ R. Hence ⋃{(−1/z); z ∈ S ∩ Q⁺} ∈ R⁻ because ⋃{(−1/z); z ∈ S ∩ Q⁺} ⊆ Q⁻.
For part (viii), let S ∈ R⁻. Let T = Q⁻₀ ∪ {−1/z; z ∈ S}. Then T ∈ R⁺ by part (vi). So −T = ⋃{(−x); x ∈ Q \ T} by Theorem 15.7.9 (i). But Q \ T = {−1/z; z ∈ Q⁻ \ S} because S ⊆ Q⁻. So −T = ⋃{(−x); ∃z ∈ Q⁻ \ S, x = −1/z} = ⋃{(1/z); z ∈ Q⁻ \ S}. Therefore −T = S⁻¹ by part (i). Hence −(S⁻¹) = T by Theorem 15.7.9 (x).
For part (ix), let S ∈ R⁻. Then −S = ⋃{(−y); y ∈ Q \ S} by Theorem 15.7.9 (i). By Definition 15.8.11, (−S)⁻¹ = Q⁻₀ ∪ {x ∈ Q⁺; ∃y ∈ Q \ (−S), x < 1/y}. But y ∈ Q \ (−S) if and only if (−y) ≤ S, by Theorem 15.7.9 (vi), and x < 1/y if and only if −1/x < −y because x, y ∈ Q⁺. So ∃y ∈ Q \ (−S), x < 1/y is true if and only if −1/x ∈ (−y) for some y ∈ Q⁺ with (−y) ≤ S, and this is true if and only if −1/x ∈ ⋃{(−y); y ∈ Q and (−y) ≤ S} because S ∈ R. But this is true if and only if −1/x ∈ S by Theorem 15.5.12 (vi). So (−S)⁻¹ = Q⁻₀ ∪ {x ∈ Q⁺; −1/x ∈ S}. Hence (−S)⁻¹ = Q⁻₀ ∪ {−1/z; z ∈ S}.
For part (x), let S ∈ ℝ⁺. Let T = ℚ⁻₀ ∪ {1/z; z ∈ ℚ⁺ \ S}. Then T ∈ ℝ by part (vii), and T = S⁻¹ by part (ii). So −T = {x ∈ ℚ; ∃y ∈ ℚ \ T, x < −y} by Definition 15.7.6 and Theorem 15.7.17 (i). But ℚ \ T = {1/z; z ∈ S ∩ ℚ⁺}. So −(S⁻¹) = −T = {x ∈ ℚ; ∃z ∈ S ∩ ℚ⁺, x < −(1/z)}. By Theorem 15.5.12 (viii), this equals (−S)⁻¹, because {x ∈ ℚ; 1/x ∈ S} = {x ∈ ℚ; 1/x ∈ S ∩ ℚ⁺} = {x ∈ ℚ⁺; 1/x ∈ S ∩ ℚ⁺}. Hence (−S)⁻¹ = −(S⁻¹).
Part (xii) follows from parts (viii), (ix), (x) and (xi).
For part (xiii), let S ∈ ℝ⁺. Then S⁻¹ ∈ ℝ⁺ by part (iii), and so τ(S, S⁻¹) ∈ ℝ⁺ by Theorem 15.8.8 (ii) and Definition 15.8.4. Let T = {yz; y ∈ S, z ∈ S⁻¹}. Let x ∈ T. Then x = yz for some y ∈ S and z ∈ S⁻¹. But z ∈ S⁻¹ ∩ ℚ⁺ if and only if z ∈ ℚ⁺ and zw < 1 for some w ∈ ℚ \ S, by Definition 15.8.11. From y ∈ S and w ∈ ℚ \ S, it follows that y < w, by Theorem 15.4.15 line (15.4.9). So zy < zw < 1. Therefore yz < 1. So x ∈ (1). Therefore T ⊆ (1).
Now suppose that x ∈ (1). Then x ∈ ℚ and x < 1. Since S ∈ ℝ, ℚ \ S ≠ ∅. Let w₀ ∈ ℚ \ S. Let ε = (1 − x)w₀. Then ε ∈ ℚ⁺ because 1 − x ∈ ℚ⁺ and w₀ ∈ ℚ⁺. So by Theorem 15.6.13 (i), there exist y ∈ S and w ∈ ℚ \ S such that w − y < ε. Let w₀′ = min(w, w₀). Then w₀′ ∈ ℚ \ S and w₀′ − y ≤ w − y < ε. So 1 − y/w₀′ = (w₀′ − y)/w₀′ < ε/w₀′ ≤ ε/w₀ = 1 − x. Therefore y/w₀′ > x. But y/w₀′ = 1 − (w₀′ − y)/w₀′ < 1 because y < w₀′ and w₀′ > 0. Let z = x/y. Then z ∈ ℚ and zw₀′ = xw₀′/y = x/(y/w₀′) < 1, where w₀′ ∈ ℚ \ S. So x ∈ T. Therefore (1) ⊆ T, and so T = (1). However, τ(S, S⁻¹) = T by Definition 15.8.4. Hence τ(S, S⁻¹) = (1) by Theorem 15.7.17 (i).
The equality (1) = τ(S⁻¹, S) then follows from Theorem 15.8.10 (i). In the case S ∈ ℝ⁻, part (xii) and Theorem 15.8.9 (iv) may be applied to obtain τ(S, S⁻¹) = τ(−S, −(S⁻¹)) = τ(−S, (−S)⁻¹) = (1), and hence (1) = τ(S⁻¹, S).
For part (xiv), let T₁, T₂ ∈ ℝ. Then σ(T₁, T₂) = {y + z; y ∈ T₁, z ∈ T₂}. Suppose that S, T₁, T₂ ∈ ℝ⁺₀. Then σ(T₁, T₂) ∈ ℝ⁺₀. So τ(S, σ(T₁, T₂)) = {xw; x ∈ S, w ∈ σ(T₁, T₂)} = {x(y + z); x ∈ S, y ∈ T₁, z ∈ T₂} by Definition 15.8.4. Similarly, τ(S, T₁) = {xy; x ∈ S, y ∈ T₁} and τ(S, T₂) = {xz; x ∈ S, z ∈ T₂}. So σ(τ(S, T₁), τ(S, T₂)) = σ({xy; x ∈ S, y ∈ T₁}, {xz; x ∈ S, z ∈ T₂}) by Theorem 15.7.13 (vi). Therefore σ(τ(S, T₁), τ(S, T₂)) = {x₁y + x₂z; x₁, x₂ ∈ S, y ∈ T₁, z ∈ T₂} = {x(y + z); x ∈ S, y ∈ T₁, z ∈ T₂}, which equals τ(S, σ(T₁, T₂)).
Now for part (xiv), let S ∈ ℝ⁺₀ and T₁, T₂ ∈ ℝ⁻₀. Then τ(S, σ(−T₁, −T₂)) = σ(τ(S, −T₁), τ(S, −T₂)) by the case just considered because −T₁, −T₂ ∈ ℝ⁺₀. But τ(S, σ(−T₁, −T₂)) = −τ(S, σ(T₁, T₂)) by Theorem 15.8.9 (iv). Similarly τ(S, −T₁) = −τ(S, T₁) and τ(S, −T₂) = −τ(S, T₂). So σ(−τ(S, T₁), −τ(S, T₂)) = −σ(τ(S, T₁), τ(S, T₂)) by Theorem 15.7.13 (vi). Therefore τ(S, σ(T₁, T₂)) = σ(τ(S, T₁), τ(S, T₂)) by Theorem 15.7.13 (v). The cases with S ∈ ℝ⁻₀ follow similarly. The remaining cases have mixed sign for the terms T₁ and T₂.
Suppose that T₁ ∈ ℝ⁻₀ and T₂ ∈ ℝ⁺₀. Then either σ(T₁, T₂) ∈ ℝ⁻₀ or σ(T₁, T₂) ∈ ℝ⁺₀. Suppose that σ(T₁, T₂) ∈ ℝ⁻₀. Then τ(S, σ(σ(T₁, T₂), −T₂)) = σ(τ(S, σ(T₁, T₂)), τ(S, −T₂)) because σ(T₁, T₂), −T₂ ∈ ℝ⁻₀. But σ(σ(T₁, T₂), −T₂) = T₁. So τ(S, T₁) = σ(τ(S, σ(T₁, T₂)), −τ(S, T₂)), and therefore σ(τ(S, T₁), τ(S, T₂)) = τ(S, σ(T₁, T₂)).
15.8.15 Theorem:
(i) ∀S₁, S₂ ∈ ℝ, ∀S₃ ∈ ℝ \ {0_ℝ}, (τ(S₁, S₃) = τ(S₂, S₃) ⇒ S₁ = S₂).
(ii) ∀S₁, S₂ ∈ ℝ, ∀S₃ ∈ ℝ \ {0_ℝ}, (S₁ ≠ S₂ ⇒ τ(S₁, S₃) ≠ τ(S₂, S₃)).
(iii) ∀S₁, S₂ ∈ ℝ, ∀S₃ ∈ ℝ⁺, (S₁ ≤ S₂ ⇒ τ(S₁, S₃) ≤ τ(S₂, S₃)).
(iv) ∀S₁, S₂ ∈ ℝ, ∀S₃ ∈ ℝ⁺, (S₁ < S₂ ⇒ τ(S₁, S₃) < τ(S₂, S₃)).
Proof: For part (i), let S₁, S₂ ∈ ℝ and S₃ ∈ ℝ \ {0_ℝ} with τ(S₁, S₃) = τ(S₂, S₃). Then S₁ = τ(S₁, (1)) = τ(S₁, τ(S₃, S₃⁻¹)) = τ(τ(S₁, S₃), S₃⁻¹) = τ(τ(S₂, S₃), S₃⁻¹) = τ(S₂, τ(S₃, S₃⁻¹)) = τ(S₂, (1)) = S₂.
Part (ii) follows logically from part (i).
For part (iii), let S₁, S₂ ∈ ℝ⁻₀ with S₁ ≤ S₂, and S₃ ∈ ℝ⁺. Let x ∈ τ(S₁, S₃). Then x = yz for some y ∈ S₁ and z ∈ ℚ \ S₃. So x = yz for some y ∈ S₂ and z ∈ ℚ \ S₃ because S₁ ⊆ S₂. So x ∈ τ(S₂, S₃).
15.9.2 Theorem: The real number system (ℝ, σ, τ, ≤) has the following properties.
(i) ∀S₁, S₂ ∈ ℝ, σ(S₁, S₂) ∈ ℝ. [closure of addition]
(ii) ∀S₁, S₂, S₃ ∈ ℝ, σ(σ(S₁, S₂), S₃) = σ(S₁, σ(S₂, S₃)). [associativity of addition]
(iii) ∀S ∈ ℝ, σ((0), S) = S = σ(S, (0)). [existence of additive identity]
(iv) ∀S ∈ ℝ, σ(S, −S) = (0) = σ(−S, S). [existence of additive inverse]
(v) ∀S₁, S₂ ∈ ℝ, σ(S₁, S₂) = σ(S₂, S₁). [commutativity of addition]
(vi) ∀S₁, S₂ ∈ ℝ, τ(S₁, S₂) ∈ ℝ. [closure of multiplication]
(vii) ∀S₁, S₂, S₃ ∈ ℝ, τ(τ(S₁, S₂), S₃) = τ(S₁, τ(S₂, S₃)). [associativity of multiplication]
(viii) ∀S₁, S₂ ∈ ℝ, τ(S₁, S₂) = τ(S₂, S₁). [commutativity of multiplication]
(ix) ∀S ∈ ℝ, τ(S, (1)) = S = τ((1), S). [multiplicative identity]
(x) ∀S ∈ ℝ \ {0_ℝ}, τ(S, S⁻¹) = (1) = τ(S⁻¹, S). [multiplicative inverse]
(xi) ∀S, T₁, T₂ ∈ ℝ, τ(S, σ(T₁, T₂)) = σ(τ(S, T₁), τ(S, T₂)). [distributivity]
(xii) ≤ is a total order on ℝ. [total order]
(xiii) ∀S₁, S₂, S₃ ∈ ℝ, (S₁ < S₂ ⇒ σ(S₁, S₃) < σ(S₂, S₃)). [order additivity]
(xiv) ∀S₁, S₂, S₃ ∈ ℝ, ((S₁ < S₂ and S₃ > (0)) ⇒ τ(S₁, S₃) < τ(S₂, S₃)). [order multiplicativity]
(xv) Every non-empty subset of ℝ which is bounded above has a least upper bound. [completeness]
Proof: Parts (i), (ii), (iii), (iv) and (v) follow from Theorem 15.7.11.
Part (vi) follows from Theorem 15.8.8 (vi).
Part (vii) follows from Theorem 15.8.10 (ii).
Part (viii) follows from Theorem 15.8.10 (i).
Part (ix) follows from Theorem 15.8.10 (iii).
Part (x) follows from Theorem 15.8.14 (xiii).
Part (xi) follows from Theorem 15.8.14 (xiv).
Part (xii) follows from Theorem 15.6.4.
Part (xiii) follows from Theorem 15.7.17 (vi).
Part (xiv) follows from Theorem 15.8.15 (iv).
Part (xv) follows from Theorem 15.6.10 (i, ii).
15.9.4 Theorem: The set ℝ of real numbers has the following properties.
(i) ∀S ∈ ℙ(ℝ) \ {∅}, ((∃x ∈ ℝ, ∀y ∈ S, x ≤ y) ⇒ (∃!a ∈ ℝ, ((∀y ∈ S, a ≤ y) and (∀a′ ∈ ℝ, (a′ > a ⇒ ∃x ∈ S, x < a′))))).
In other words, if a set of real numbers has a lower bound, then it has a unique greatest lower bound.
(ii) ∀S ∈ ℙ(ℝ) \ {∅}, ((∃x ∈ ℝ, ∀y ∈ S, y ≤ x) ⇒ (∃!b ∈ ℝ, ((∀y ∈ S, y ≤ b) and (∀b′ ∈ ℝ, (b′ < b ⇒ ∃x ∈ S, x > b′))))).
In other words, if a set of real numbers has an upper bound, then it has a unique least upper bound.
(iii) ∀x ∈ ℝ, ∃j ∈ ℤ, x ≤ j.
(iv) ∀x ∈ ℝ, ∃i ∈ ℤ, i ≤ x.
Chapter 16
Real-number constructions
16.1.2 Notation: Let a, b ∈ ℝ.
(i) [a, b] denotes the set {x ∈ ℝ; a ≤ x and x ≤ b}.
(ii) (a, b) denotes the set {x ∈ ℝ; a < x and x < b}.
(iii) [a, b) denotes the set {x ∈ ℝ; a ≤ x and x < b}.
(iv) (a, b] denotes the set {x ∈ ℝ; a < x and x ≤ b}.
(v) [a, ∞) denotes the set {x ∈ ℝ; a ≤ x}.
(vi) (a, ∞) denotes the set {x ∈ ℝ; a < x}.
(vii) (−∞, b] denotes the set {x ∈ ℝ; x ≤ b}.
(viii) (−∞, b) denotes the set {x ∈ ℝ; x < b}.
(ix) (−∞, ∞) denotes the set ℝ.
16.1.5 Definition: An open interval of the real numbers is any set of the form (a, b), (a, ∞), (−∞, b) or ℝ, for some a, b ∈ ℝ with a ≤ b.
A closed interval of the real numbers is any set of the form [a, b], [a, ∞), (−∞, b] or ℝ, for some a, b ∈ ℝ with a ≤ b.
A closed-open interval of the real numbers is any set of the form [a, b) for some a, b ∈ ℝ with a ≤ b.
An open-closed interval of the real numbers is any set of the form (a, b] for some a, b ∈ ℝ with a ≤ b.
A semi-closed interval or semi-open interval of the real numbers is any open-closed or closed-open interval of the real numbers for some a, b ∈ ℝ with a ≤ b.
16.1.6 Theorem: A set of real numbers is a real-number interval if and only if it is either an open, closed or semi-closed interval of the real numbers.
Proof: Let I be a real-number interval. If I = ∅, then I = (a, a) for any a ∈ ℝ. So I is an open interval.
Now assume that I ≠ ∅. Then x₀ ∈ I for some x₀ ∈ ℝ. If I is not bounded above, then x ∈ I for all x ∈ ℝ with x₀ ≤ x by Definition 16.1.4. In other words, [x₀, ∞) ⊆ I. Similarly, if I is not bounded below, then (−∞, x₀] ⊆ I. So in the case of a doubly unbounded interval, I = ℝ = (−∞, ∞). If I is bounded below and not bounded above, let a = inf{x ∈ I; x ≤ x₀}. Then a ∈ ℝ by Theorem 15.9.4 (i), and (a, x₀] ⊆ I by Definition 16.1.4. If a ∈ I, then I = [a, ∞). If a ∉ I, then I = (a, ∞). Similarly, if I is bounded above and not bounded below, then I = (−∞, b] or I = (−∞, b), where b = sup{x ∈ I; x₀ ≤ x} ∈ ℝ. If I is bounded both above and below, then I = [a, b], I = [a, b), I = (a, b] or I = (a, b).
16.1.7 Definition: The closed unit interval of the real numbers is the set [0, 1].
The open unit interval of the real numbers is the set (0, 1).
16.1.9 Theorem: For all I ∈ ℙ(ℝ), I is a real-number interval if and only if ∀x₁, x₂ ∈ I, [x₁, x₂] ⊆ I.
Proof: Let I be a real-number interval. Let x₁, x₂ ∈ I. Let x ∈ [x₁, x₂]. Then x₁ ≤ x and x ≤ x₂ by Notation 16.1.2 (i). If x = x₁ or x = x₂, then x ∈ I by assumption. Otherwise x ∈ I by Definition 16.1.4. Hence [x₁, x₂] ⊆ I. Conversely, suppose that I ∈ ℙ(ℝ) satisfies ∀x₁, x₂ ∈ I, [x₁, x₂] ⊆ I. Let x, y ∈ I and t ∈ ℝ satisfy x < t and t < y. Then t ∈ [x, y] by Notation 16.1.2 (i). So t ∈ I by the assumption on I. Therefore I satisfies line (16.1.1). Hence I is a real-number interval by Definition 16.1.4.
16.1.10 Definition:
The interval containing an element x of a subset S of ℝ is the set {y ∈ ℝ; [x, y] ∪ [y, x] ⊆ S}.
16.1.11 Theorem: Let S be a subset of ℝ. For x ∈ S, let Iₓ denote the interval of S containing x.
(i) For all x ∈ S, Iₓ is a real-number interval.
(ii) S = ⋃_{x∈S} Iₓ.
(iii) ∀x₁, x₂ ∈ S, (Ix₁ ∩ Ix₂ = ∅ or Ix₁ = Ix₂).
R
Proof: For part (i), let x S. Let y1 , y2 Ix . Let t satisfy y1 < t and t < y2 . Either t x or t > x.
Suppose that t x. Then y1 < x. But [y1 , x] Ix by Definition 16.1.10. So t Ix because t [y1 , x]. Now
suppose that t > x. Then x < y2 . But [x, y2 ] Ix by Definition 16.1.10. So t Ix because t [x, y2 ]. Thus
t Ix in either case. Hence Ix is a real-number interval by Definition 16.1.4.
S S
For part (ii), let x S. Then x Ix by Definition 16.1.10. So S xS Ix . Now let y xS Ix . Then
y Ix for someSx S. If y x, then yS [y, x] S, and so y S. If y > x, then y [x, y] S, and
so y S. Thus xS Ix S. Hence S = xS Ix .
For part (iii), let x1 , x2 S with x1 x2 . If x1 = x2 , obviously Ix1 = Ix2 . So assume that x1 < x2 . Suppose
that Ix1 Ix2 6= . Let z Ix1 Ix2 . If z x1 , then [z, x2 ] Ix2 because Ix2 is an interval by part (i). So
[x1 , x2 ] S. Similarly, [x1 , x2 ] S if x2 z. Suppose that x1 z and z x2 . Then [x1 , z] S because
Ix1 is an interval, and [z, x2 ] S because Ix2 is an interval. So [x1 , x2 ] S. Thus [x1 , x2 ] S in all cases.
Let y Ix1 . Suppose that y x1 . Then [y, x1 ] S by Definition 16.1.10 for Ix1 . So [y, x2 ] = [y, x1 ]
[x1 , x2 ] S. Therefore y Ix2 by Definition 16.1.10 for Ix2 because [x2 , y] = . Suppose that x1 y x2 .
Then [y, x2 ] [x1 , x2 ] S. So y Ix2 by Definition 16.1.10 for Ix2 . Suppose that x2 y. Then [x1 , y] S
by Definition 16.1.10 for Ix1 . So [x2 , y] [x1 , y] S. Therefore y Ix2 by Definition 16.1.10 for Ix2 . In
all cases, y Ix1 implies x Ix2 . Thus Ix1 Ix2 . Similarly, Ix2 Ix1 . Therfore Ix1 = Ix2 . Hence either
Ix1 Ix2 = or Ix1 = Ix2 .
16.1.12 Definition: A finite real-number interval is a real-number interval of the form (a, b), (a, b], [a, b) or [a, b] for some a, b ∈ ℝ with a ≤ b.
An infinite real-number interval is a real-number interval which is not a finite real-number interval.
A semi-infinite real-number interval is a real-number interval of the form (a, ∞), [a, ∞), (−∞, b) or (−∞, b] for some a, b ∈ ℝ.
16.1.13 Definition: The length of a real-number interval I is b − a if I is a finite interval (a, b), (a, b], [a, b) or [a, b] for some a, b ∈ ℝ with a ≤ b, and is ∞ if I is an infinite interval.
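Definition 16.1.13 can be sketched directly in code. The following is a minimal illustration (the function name interval_length is not from the book, and floating-point infinity stands in for the extended-real length ∞):

```python
# Sketch of Definition 16.1.13: the length of a finite interval (a, b),
# (a, b], [a, b) or [a, b] is b - a; infinite intervals have length inf.
import math

def interval_length(a, b):
    # a, b are the endpoints; use -math.inf / math.inf for unbounded ends.
    if math.isinf(a) or math.isinf(b):
        return math.inf
    return b - a
```

Note that the length does not depend on whether the endpoints are included, which matches the definition's treatment of all four finite interval forms.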
16.1.14 Remark: Symmetric notation for a finite closed interval of real numbers.
Notation 16.1.15 is occasionally useful for various operations on real-number intervals. (See for example Theorem 32.6.4.) This way of denoting a finite closed interval can be expressed in terms of Notation 22.11.12 for the convex span of a subset of a linear space as [[x, y]] = conv({x, y}), if ℝ is regarded as a linear space.
16.1.15 Notation: [[a, b]], for a, b ∈ ℝ, denotes the set [a, b] ∪ [b, a].
In other words, [[a, b]] = [min(a, b), max(a, b)].
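The symmetric interval of Notation 16.1.15 is a one-line computation; a small sketch (the name symmetric_interval is illustrative, returning the endpoint pair of [[a, b]]):

```python
# Notation 16.1.15: [[a, b]] = [min(a, b), max(a, b)], represented here
# by its ordered endpoint pair.
def symmetric_interval(a, b):
    return (min(a, b), max(a, b))
```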
initially, but the development of addition and multiplication properties would be untidy. Moreover, such a treatment would be implementation-dependent. So it is best to add the two pseudo-numbers −∞ and +∞ to an abstract real number system ℝ and adapt the axioms accordingly.
Another reason to develop the extended real numbers axiomatically rather than in terms of Dedekind cuts
is that the extended real numbers have very few of the basic properties that are generally associated with
well-behaved algebraic systems. Even the addition operation is not algebraically closed.
16.2.2 Remark: Interpretation of the infinity-symbol as a well-defined concept.
The symbol ∞ for infinity in Definition 16.2.3 is known as "Bernoulli's lemniscate". (It was introduced a few decades earlier by John Wallis in 1655. See Cajori [222], Volume 1, page 214; Volume 2, page 44.) This symbol can be drawn as the graph of a formula like (x² + y²)² = x² − y². (This is r² = cos 2θ in polar coordinates.) The meaning of the symbol ∞ is not so easy to specify. It is nearly always a pseudo-notation. That is, it does not represent a particular set or any precise logical concept. (Thus a pseudo-notation is even worse than a pseudo-number.) Propositions which include the symbol ∞ generally need to be rewritten to make them logically meaningful.
In the case of the infinity for extended non-negative integers, one may append the set ℕ to the set of non-negative integers to obtain ℕ̄ = ℕ ∪ {ℕ}. Thus ∞ could denote the set ℕ in the case of the non-negative integers.
R
In the case of the real numbers, it does not make much sense to define to mean the set since then
R R
would mean , which is either meaningless or else it would mean the same set as . The easy way out is to
R
say that it just doesnt matter. One plausible solution is to define + to mean + and to mean . R
R
R R
Another alternative is to define all elements of to be equivalence classes of ordered pairs in . Pairs
(x, y) with x 6= 0 could represent the number y/x, and pairs (x, y) with x = 0 could represent + if y > 0 or
if y < 0. (The pair (0, 0) may be held in reserve to mean undefined for another kind of extended real
number definition.) Ultimately, the representation of the set of extended real numbers is not as important
as the representation of the ordinary real numbers. In fact, it just doesnt matter!
16.2.6 Remark: The choice of notations for sets of extended real numbers.
The notation ℝ̄ for the extended real numbers is not used by many authors. (Federer [67], page 51, does use this over-bar notation.) The principal advantage of the over-bar notation is the ability to add superscripts, which is difficult for the more popular asterisk superscript notation ℝ*.
The asterisk superscript has the advantage of suggesting the usual notation for topological compactification of a set. (See for example S.J. Taylor [143], page 34.) However, the topological space asterisk usually denotes a one-point compactification, whereas ℝ̄ is a two-point compactification. So there is not much loss in abandoning the asterisk superscript.
16.2.7 Definition: The extended real number system is the number system (ℝ̄, σ, τ), where ℝ̄ is the set of extended real numbers, σ : (ℝ̄ × ℝ̄) \ {(−∞, +∞), (+∞, −∞)} → ℝ̄ is the addition operation on ℝ̄, and τ : (ℝ̄ × ℝ̄) \ {(0, +∞), (0, −∞), (+∞, 0), (−∞, 0)} → ℝ̄ is the multiplication operation on ℝ̄, defined by

σ(a, b) = a + b if a, b ∈ ℝ
          +∞    if a, b ∈ ℝ ∪ {+∞} and +∞ ∈ {a, b}
          −∞    if a, b ∈ ℝ ∪ {−∞} and −∞ ∈ {a, b}

and

τ(a, b) = ab if a, b ∈ ℝ
          +∞ if (a, b) ∈ (ℝ̄⁺ × ℝ̄⁺) ∪ (ℝ̄⁻ × ℝ̄⁻) and {−∞, +∞} ∩ {a, b} ≠ ∅
          −∞ if (a, b) ∈ (ℝ̄⁺ × ℝ̄⁻) ∪ (ℝ̄⁻ × ℝ̄⁺) and {−∞, +∞} ∩ {a, b} ≠ ∅.
16.2.8 Remark: Arithmetic operations on the extended real numbers.
The extended real number system in Definition 16.2.7 is clearly not a field. Neither is it a ring. Nor is it a group or even a semigroup with respect to the addition or multiplication operation. Nevertheless, it is a useful number system for measure and integration theory and some other applications.
Since both the addition and multiplication on ℝ̄ are defective, it is not possible to define subtraction and division in the usual way. However, it is possible to define a defective subtraction operation δ : ℝ̄ × ℝ̄ → ℝ̄ and a defective division operation κ : ℝ̄ × (ℝ \ {0}) → ℝ̄ which have some level of consistency with the defective addition and multiplication operations. At the least, one would hope for relations such as σ(δ(a, b), b) = a, δ(σ(a, b), b) = a, τ(κ(a, b), b) = a and κ(τ(a, b), b) = a. Similarly, one may hopefully construct unary additive and multiplicative inverse operations which are only slightly defective.

A negation operation ν : ℝ̄ → ℝ̄ and a multiplicative inverse ρ : ℝ̄ \ {0} → ℝ̄ may be defined as follows.

ν(a) = −a if a ∈ ℝ
       +∞ if a = −∞
       −∞ if a = +∞

and

ρ(a) = a⁻¹ if a ∈ ℝ \ {0}
       0   if a ∈ {−∞, +∞}.

A subtraction operation δ : (ℝ̄ × ℝ̄) \ {(−∞, −∞), (+∞, +∞)} → ℝ̄ may be defined by δ(a, b) = σ(a, ν(b)) for (a, b) in the domain of δ, and a quotient operation κ : ℝ̄ × (ℝ \ {0}) → ℝ̄ may be defined by:

κ(a, b) = a/b  if a ∈ ℝ and b ∈ ℝ \ {0}
          a    if a ∈ {−∞, +∞} and b ∈ ℝ⁺
          ν(a) if a ∈ {−∞, +∞} and b ∈ ℝ⁻.
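The excluded argument pairs of the defective operations in Definition 16.2.7 can be made explicit in code. The following sketch is illustrative only (the names ext_add and ext_mul are not from the book, and None stands for "undefined"):

```python
# Sketch of the defective extended-real addition and multiplication of
# Definition 16.2.7, using IEEE floating-point infinities.
import math

INF = math.inf

def ext_add(a, b):
    """Addition on the extended reals; (-inf, +inf) and (+inf, -inf) are excluded."""
    if {a, b} == {INF, -INF}:
        return None              # undefined: opposite infinities
    if INF in (a, b):
        return INF
    if -INF in (a, b):
        return -INF
    return a + b

def ext_mul(a, b):
    """Multiplication on the extended reals; 0 times +-inf is excluded."""
    if (a == 0 and math.isinf(b)) or (b == 0 and math.isinf(a)):
        return None              # undefined: zero times infinity
    if math.isinf(a) or math.isinf(b):
        return INF if (a > 0) == (b > 0) else -INF
    return a * b
```

As the remark above notes, these operations form neither a group nor a semigroup: the None cases are precisely the missing closure.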
16.2.9 Remark: The set of non-negative extended real numbers.
The non-negative extended real number system in Definition 16.2.10 is not exactly a simple restriction of the extended real number system in Definition 16.2.7. For example, it is possible to write 0⁻¹ = +∞ in a meaningful way, which is not possible when there are two infinities.
16.2.10 Definition: The non-negative extended real number system is the number system (ℝ̄⁺₀, σ, τ), where ℝ̄⁺₀ is the set of non-negative extended real numbers, σ : ℝ̄⁺₀ × ℝ̄⁺₀ → ℝ̄⁺₀ is the addition operation on ℝ̄⁺₀, and τ : (ℝ̄⁺₀ × ℝ̄⁺₀) \ {(0, +∞), (+∞, 0)} → ℝ̄⁺₀ is the multiplication operation on ℝ̄⁺₀, defined by

σ(a, b) = a + b if a, b ∈ ℝ⁺₀
          +∞    if +∞ ∈ {a, b}

and

τ(a, b) = ab if a, b ∈ ℝ⁺₀
          +∞ if (a, b) ∈ ℝ̄⁺ × ℝ̄⁺ and +∞ ∈ {a, b}.
16.2.11 Remark: Multiplicative inverse, subtraction and quotients for extended real numbers.
A multiplicative inverse ρ : ℝ̄⁺₀ → ℝ̄⁺₀ may be defined by

ρ(a) = a⁻¹ if a ∈ ℝ⁺
       0   if a = +∞
       +∞  if a = 0.

A subtraction operation δ : {(a, b) ∈ ℝ⁺₀ × ℝ⁺₀; a ≥ b} ∪ ({+∞} × ℝ⁺₀) → ℝ̄⁺₀ may be defined by

δ(a, b) = a − b if a, b ∈ ℝ⁺₀ and a ≥ b
          +∞    if a = +∞ and b ∈ ℝ⁺₀,

and a quotient operation κ : (ℝ̄⁺₀ × ℝ̄⁺₀) \ {(0, 0), (+∞, +∞)} → ℝ̄⁺₀ may be defined by:

κ(a, b) = a/b if a ∈ ℝ⁺₀ and b ∈ ℝ⁺
          0   if a ∈ ℝ⁺₀ and b = +∞
          +∞  if a = +∞ and b ∈ ℝ⁺₀.
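The exchange of 0 and +∞ under this multiplicative inverse can be imitated with floating-point infinities; a minimal sketch (the name nn_ext_inv is illustrative, not from the book):

```python
# Sketch of the multiplicative inverse on the non-negative extended reals
# (Remark 16.2.11): 0 and +inf are exchanged, finite positives are inverted.
import math

def nn_ext_inv(a):
    if a == 0:
        return math.inf
    if math.isinf(a):
        return 0.0
    return 1.0 / a
```

This is the sense in which 0⁻¹ = +∞ is meaningful here: with only one infinity there is no sign ambiguity in the limit 1/x as x → 0.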
x = x + 1. (16.2.1)
This is not much more absurd than defining the imaginary number i as the solution of x² = −1. The equation x = x + 1 may be solved by iteration. Start with x = 1 and substitute this into the right hand side. This gives x = 2. Now substitute again to obtain x = 3. By continuing this an infinite number of times, the solution x converges to infinity! Another way to solve equation (16.2.1) is to divide both sides by x, which gives 1 = (x + 1)/x, which means that the ratio of x + 1 to x equals 1, which is a more and more accurate approximation as x tends to infinity.
Let x = 10^10. Then equation (16.2.1) is accurate to 10 decimal places. This is as accurate as any measurement in physics. So from this point of view, the equality may be regarded as exact. If x = 10^100, then the overwhelming majority of floating-point hardware on personal computers is unable to distinguish the left and right sides of the equation. So in this sense, x = 10^100 is an exact solution of the equation. If x = 10^(10^100), then equation (16.2.1) is accurate to 10^100 decimal places. If 2500 decimal places are printed on each side of an A4 sheet of paper with standard 80 grams per square metre paper density, the total weight of paper required to print out these decimal places is 10^91 tons, which is approximately 3 × 10^41 times the mass of the observable universe. Therefore one may consider 10^(10^100) to be a pretty good approximation to infinity.
16.2.13 Definition: The ordered extended real number system is the ordered number system (ℝ̄, σ, τ, ≤), where (ℝ̄, σ, τ) is the extended real number system in Definition 16.2.7, and ≤ is the total order on ℝ in Definition 15.6.2, extended by the relations
∀r ∈ ℝ̄, −∞ ≤ r and r ≤ +∞.
16.2.14 Theorem:
(i) The order ≤ on ℝ̄ is a total order.
(ii) In the order ≤ on ℝ̄, inf(∅) = +∞ and sup(∅) = −∞.
(iii) If S ∈ ℙ(ℝ̄) is not bounded below in ℝ, then inf(S) = −∞ with respect to the order on ℝ̄.
(iv) If S ∈ ℙ(ℝ̄) is not bounded above in ℝ, then sup(S) = +∞ with respect to the order on ℝ̄.
(v) Every subset of ℝ̄ has a well-defined infimum and supremum in ℝ̄.
R
R
Proof: For part (i), let a, b . Let R be the standard order on . Then R is a total order on by R
R R
Theorem 15.6.4. Let R be the order on . If a, b , then a b or b a by Definition 11.5.1 (i). If
R
R
a = and b , then a b. If a = and b , then b a. If a
R
and b = , then b a. If
R
R R
a and b = , then a b. Thus a b or b a for all (a, b) . So R satisfies Definition 11.5.1 (i).
R
R
Let a, b satisfy a b and b a. If a, b , then a = b by Definition 11.5.1 (ii). If a = , then b a
implies that b = because the pair (b, ) is not in R for other values of b. Similarly, a = b if a = ,
b = or b = . Thus R satisfies Definition 11.5.1 (ii).
Let a, b, c
R R
satisfy a b and b c. If a, b, c , then a c by Definition 11.5.1 (iii). If a = ,
then a c. If a = , then b = , and so c = . Therefore a c. If b = , then a = . So a c. If
b = , then c = . So a c. If c = , then b = and so a = . So a c. If c = , then a c.
R
Thus for all a, b, c , a b and b c implies a c. So R satisfies Definition 11.5.1 (iii). Hence R is a
total order on .
R
R
For part (ii), x for all x . So is a lower bound for in the order on . But y for all y .
R
So y for all lower bounds for . Therefore inf() = by Definition 11.2.4. Similarly, sup() = .
For part (iii), let S be a subset of ℝ̄ which is not bounded below. Then S has no lower bound in ℝ by Definition 11.2.5. But −∞ is a lower bound for S in ℝ̄ by Definition 16.2.13. So −∞ is the greatest lower bound for S. Thus inf(S) = −∞.
Part (iv) may be proved as for part (iii).
For part (v), let S be a subset of ℝ̄. If S = ∅, then S has a well-defined infimum and supremum by part (ii). So suppose that S ≠ ∅. If S is bounded below in ℝ, then S has a greatest lower bound in ℝ. (This exists by Theorem 15.6.10 (ii) or by Theorem 15.9.2 (xv).) Thus inf(S) ∈ ℝ. If S is not bounded below in ℝ, then inf(S) = −∞ by part (iii). Hence S always has a well-defined infimum in ℝ̄. Similarly, S always has a well-defined supremum in ℝ̄.
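The conventions inf(∅) = +∞ and sup(∅) = −∞ in Theorem 16.2.14 (ii) can be illustrated for finite collections of numbers, where the infimum is simply the minimum. This is a sketch only (the names ext_inf and ext_sup are not from the book, and min/max of a finite set is only a stand-in for the general infimum/supremum):

```python
# Theorem 16.2.14 (ii)-(v) for finite sets: every finite collection of
# extended reals has an infimum and supremum, with inf of the empty set
# equal to +inf and sup of the empty set equal to -inf.
import math

def ext_inf(s):
    return min(s, default=math.inf)

def ext_sup(s):
    return max(s, default=-math.inf)
```

With these defaults, ext_inf and ext_sup are total functions on finite collections, mirroring part (v) of the theorem.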
16.3.5 Notation:
ℚ̄⁺ denotes the set of positive extended rational numbers {q ∈ ℚ̄; q > 0}.
ℚ̄⁺₀ denotes the set of non-negative extended rational numbers {q ∈ ℚ̄; q ≥ 0}.
16.4.3 Definition: The m,n-concatenation operation for real (number) tuples, for m, n ∈ ℤ⁺₀, is the function Qm,n : ℝᵐ × ℝⁿ → ℝᵐ⁺ⁿ defined by
16.5.2 Definition: The absolute value (function) for real numbers is the function |·| : ℝ → ℝ defined by

∀x ∈ ℝ, |x| = x  if x ≥ 0
             −x  if x ≤ 0.
16.5.3 Theorem:
(i) ∀x ∈ ℝ, |x| ≥ 0.
(ii) ∀x ∈ ℝ, −|x| ≤ x ≤ |x|.
Proof: For part (i), let x ∈ ℝ. Then either x < 0 or x ≥ 0. If x < 0, then |x| = −x ≥ 0. Otherwise x ≥ 0, and |x| = x ≥ 0. Hence |x| ≥ 0.
For part (ii), let x ∈ ℝ. Then either x < 0 or x ≥ 0. If x < 0, then |x| = −x. So −|x| = x < |x|. Therefore −|x| ≤ x ≤ |x|. If x ≥ 0, then |x| = x. So −|x| = −x ≤ x = |x|. Therefore −|x| ≤ x ≤ |x|. Hence ∀x ∈ ℝ, −|x| ≤ x ≤ |x|.
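The case-split definition and the two properties of Theorem 16.5.3 can be checked numerically; a minimal sketch (the name abs_val is illustrative, and the sample points are arbitrary):

```python
# Quick numerical check of Theorem 16.5.3: |x| >= 0 and -|x| <= x <= |x|.
def abs_val(x):
    # Case-split definition matching Definition 16.5.2.
    return x if x >= 0 else -x

for x in [-3.5, -1, 0, 0.25, 7]:
    assert abs_val(x) >= 0
    assert -abs_val(x) <= x <= abs_val(x)
```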
16.5.5 Remark: Alternative names for the signum and absolute value functions.
The sign function is also called the signum function (from the Latin word signum). Some authors use
the notation sgn(x) for sign(x). The absolute value function is sometimes known as the modulus. The
absolute value and sign functions are illustrated in Figure 16.5.1.
Figure 16.5.1 Absolute value and sign functions
[Figure: graphs of H(x), and of the products H(x)x⁰, H(x)x¹ and H(x)x².]
The value H(0) = 0 would have the advantage that H(x) = lim_{p→0⁺} x^p for all x ≥ 0. But the value H(0) = 1/2 has the advantage that H(−x) = 1 − H(x) for all x ∈ ℝ, and it is also the right choice for Fourier analysis.
All in all, the inconsistencies between the definitions of the Heaviside function for its various applications
make it difficult to provide a single best definition. So it is best to explicitly define in each context which
value is intended for x = 0.
16.5.10 Remark: Floor, ceiling, fraction and round functions.
It may not be immediately obvious that the floor, ceiling, fraction and round functions are constructed by
logical case-splitting. However, one may write the floor function in Definition 16.5.11 as
∀x ∈ ℝ, floor(x) = ...
                   −1 if −1 ≤ x < 0
                    0 if 0 ≤ x < 1
                    1 if 1 ≤ x < 2
                   ...
                 = i if i ∈ ℤ and i ≤ x < i + 1.
This is admittedly not a fixed finite case-split rule. In fact, it makes the construction of the floor function look more like the inductive definitions in Section 16.6. The constructions floor = {(x, i) ∈ ℝ × ℤ; i ≤ x < i + 1} and ceiling = {(x, i) ∈ ℝ × ℤ; i − 1 < x ≤ i} show that induction is not required for Definitions 16.5.11 and 16.5.12.
Any definition which requires an infimum or supremum has an analytical appearance, although these are
basic set-theoretic order concepts as in Chapter 11, which are apparently more fundamental than algebra or
analysis. It is generally possible to remove inf and sup operations from definitions by replacing them
with equivalent logical formulas. Therefore the boundary between logical and analytical constructions is
somewhat subjective.
Definitions 16.5.11 and 16.5.12 are illustrated in Figure 16.5.3.
[Figure 16.5.3: graphs of floor(x) and ceiling(x).]
In other words,
∀x ∈ ℝ, floor(x) = sup{i ∈ ℤ; i ≤ x}.
An alternative notation for floor(x) is ⌊x⌋.
16.5.12 Definition: The ceiling function ceiling : ℝ → ℤ is defined by
ceiling = {(x, i) ∈ ℝ × ℤ; x ≤ i < x + 1}
        = {(x, i) ∈ ℝ × ℤ; i − 1 < x ≤ i}.
In other words,
∀x ∈ ℝ, ceiling(x) = inf{i ∈ ℤ; i ≥ x}.
An alternative notation for ceiling(x) is ⌈x⌉.
16.5.13 Theorem: ∀x ∈ ℝ, ceiling(x) = −floor(−x).
Proof: From Definitions 16.5.11 and 16.5.12, it follows that:
∀x ∈ ℝ, ceiling(x) = inf{i ∈ ℤ; i ≥ x}
                   = −sup{−i; i ∈ ℤ and i ≥ x}
                   = −sup{i; i ∈ ℤ and −i ≥ x}
                   = −sup{i ∈ ℤ; i ≤ −x}
                   = −floor(−x).
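Theorem 16.5.13 can be checked directly against the standard library floor and ceiling functions; a small sketch (the sample points are arbitrary):

```python
# Numerical check of Theorem 16.5.13: ceiling(x) = -floor(-x).
import math

for x in [-2.5, -2, -0.1, 0, 0.1, 2, 2.5]:
    assert math.ceil(x) == -math.floor(-x)
```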
[Figure: graphs of the round and fraction functions round(x) and frac(x).]
16.5.19 Definition: The modulo function mod : ℝ × ℝ⁺ → ℝ⁺₀ is defined by
∀x ∈ ℝ, ∀m ∈ ℝ⁺, x mod m = inf{y ∈ ℝ⁺₀; ∃i ∈ ℤ, x = i·m + y}.
Figure 16.5.5 Modulo functions (x mod m, and (x + m) mod 2m − m)
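The infimum in Definition 16.5.19 works out to x − m·floor(x/m), which can be sketched as follows (the name mod is shadowing nothing here, but is illustrative rather than the book's notation):

```python
# Sketch of Definition 16.5.19: x mod m is the unique y in [0, m)
# with x = i*m + y for some integer i, computed via the floor function.
import math

def mod(x, m):
    # Requires m > 0.
    return x - m * math.floor(x / m)
```

Note that, as in the definition, the result is non-negative even for negative x, unlike the truncating remainder of some languages.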
16.5.21 Theorem: The absolute value, floor and modulo functions satisfy the following:
(i) ∀x ∈ ℝ, ∀m ∈ ℝ⁺, 0 ≤ x mod m < m.
Proof: For part (i), let x ∈ ℝ and m ∈ ℝ⁺. Then x mod m = inf{y ∈ ℝ⁺₀; ∃i ∈ ℤ, x = i·m + y} by Definition 16.5.19. Suppose that x ≥ 0. Then x ∈ {y ∈ ℝ⁺₀; ∃i ∈ ℤ, x = i·m + y} because x = i·m + y with i = 0 and y = x. So {y ∈ ℝ⁺₀; ∃i ∈ ℤ, x = i·m + y} ≠ ∅. Therefore x mod m ∈ ℝ⁺₀, and so x mod m ≥ 0.
Now suppose that x < 0. Then x/m ∈ ℝ, and so ∃i ∈ ℤ, i ≤ x/m by Theorem 15.9.4 (iv). For some such i, let y₀ = x − i·m. Then y₀ ∈ ℝ⁺₀ and x = i·m + y₀. So y₀ ∈ {y ∈ ℝ⁺₀; ∃i ∈ ℤ, x = i·m + y} ≠ ∅. Therefore x mod m ≥ 0. For the second inequality in part (i), assume x mod m ≥ m. Then inf{y ∈ ℝ⁺₀; ∃i ∈ ℤ, x = i·m + y} ≥ m.
[Figure: graphs of sawtooth functions with period 2a and amplitude a.]
A particular sawtooth function is the distance to the nearest integer, defined by inf{|x − k|; k ∈ ℤ}. The difference between x and the rounded value round(x) = ceiling(x − 1/2) is

x − sign(x) floor(|x| + 1/2) = sign(x) (|x| − floor(|x| + 1/2))
                            = sign(x) (frac(|x| + 1/2) − 1/2).

Therefore the distance from a real number x to the nearest integer equals the absolute value of this, which is

|frac(|x| + 1/2) − 1/2| = |(|x| + 1/2) mod 1 − 1/2|.
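The distance-to-nearest-integer identity can be checked numerically; the following is a sketch (the function names are illustrative, and math.isclose compensates for floating-point rounding):

```python
# Numerical check: the distance from x to the nearest integer equals
# |(|x| + 1/2) mod 1 - 1/2|.
import math

def dist_nearest_int(x):
    f = math.floor(x)
    return min(x - f, f + 1 - x)

def via_identity(x):
    return abs((abs(x) + 0.5) % 1 - 0.5)

for x in [-2.7, -1.5, -0.3, 0.0, 0.3, 2.7]:
    assert math.isclose(dist_nearest_int(x), via_identity(x))
```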
Figure 16.5.7 Square functions
[Figure: graphs of the power functions x⁰, x¹, x², x³, x⁴, x⁵ for −2 ≤ x ≤ 2.]
The only case requiring careful thought is $(x, p) = (0, 0)$. It is best to define $0^0 = 1$ in accordance with
Theorem 14.6.9, which states that $\#(\emptyset^\emptyset) = 1$ because $\emptyset^\emptyset = \{\emptyset\}$ by Theorem 10.2.21. Consequently the
function $x \mapsto x^0$ is continuous. Another way to look at $0^0$ is analytically as the limit expression
$\lim_{x \to 0^+} x^x = \lim_{x \to 0^+} \exp(x \ln x) = \exp(\lim_{x \to 0^+} x \ln x) = \exp(0) = 1$.
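The limit claim is easy to check numerically. A small Python sketch (my illustration, not the book's):

```python
import math

# x^x = exp(x ln x) -> exp(0) = 1 as x -> 0+, consistent with defining 0^0 = 1.
# The error |x^x - 1| is of order |x ln x|.
for x in [1e-2, 1e-4, 1e-8]:
    assert abs(x ** x - 1.0) < 2 * x * abs(math.log(x)) + 1e-15
```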
The notation $x^p$ for the non-negative integer power in Definition 16.6.3 is consistent with the power-of-two
function for integers in Notation 14.6.8. Both Definitions 16.6.3 and 16.6.4 are consistent with the integral
powers of group elements in Notation 17.4.9.
16.6.3 Definition: The (non-negative integer) power $x^p$ of a real number $x \in \mathbb{R}$ with non-negative integer
exponent $p \in \mathbb{Z}^+_0$ is defined inductively by
(1) $\forall x \in \mathbb{R},\ x^0 = 1$,
(2) $\forall x \in \mathbb{R},\ \forall p \in \mathbb{Z}^+,\ x^p = x^{p-1} x$.
16.6.4 Definition: The (negative integer) power $x^p$ of a non-zero real number $x \in \mathbb{R} \setminus \{0\}$ with negative
integer exponent $p \in \mathbb{Z}^-$ is defined inductively by
(1) $\forall x \in \mathbb{R} \setminus \{0\},\ x^{-1} = 1/x$,
(2) $\forall x \in \mathbb{R} \setminus \{0\},\ \forall p \in \mathbb{Z}^+,\ x^{-p-1} = x^{-p}/x$.
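The two inductive definitions transcribe directly into a recursive function. The following Python sketch is my illustration, not part of the book:

```python
def int_power(x: float, p: int) -> float:
    # x^p for integer p, following the inductive clauses:
    # x^0 = 1; x^p = x^(p-1) * x for p > 0; x^(-p-1) = x^(-p) / x for x != 0.
    if p == 0:
        return 1.0
    if p > 0:
        return int_power(x, p - 1) * x
    return int_power(x, p + 1) / x   # negative exponents need x != 0

assert int_power(2.0, 0) == 1.0
assert int_power(2.0, 5) == 32.0
assert int_power(2.0, -3) == 0.125
assert int_power(0.0, 0) == 1.0   # 0^0 = 1 as discussed above
```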
a topological concept. In fact, the bounds for $z^p - y^p$ in Theorem 16.6.6 (vii, viii) are closely related to the
Lipschitz continuity parameter for the function $y \mapsto y^q$.
It should not be surprising that some kind of analysis is required for the definition and properties of integer
roots of real numbers because when these are combined with the power functions in Definition 16.6.3, general
positive rational powers of non-negative real numbers may be defined. By filling the gaps between the
rational numbers using an infimum or supremum construction, general exponential functions such as $10^x$ or
$e^x$ may be defined as in Definition 16.6.15.
Since square roots and cube roots have been computed since very ancient times, it may seem that integer
roots are part of elementary arithmetic. It is therefore annoying, perhaps, to discover that they require
substantial work to demonstrate their well-definition. On the other hand, the basic properties of reciprocals
of real numbers require substantial work in Theorem 15.8.14 because multiplicative inverses are also defined
as solutions of equations, although reciprocals are even more ancient than square roots and cube roots.
16.6.6 Theorem: Well-definition of positive integral roots of non-negative real numbers.
(i) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}^+_0,\ (x_1 < x_2 \Rightarrow x_1^p < x_2^p)$.
(ii) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}^+_0,\ (x_1 \le x_2 \Rightarrow x_1^p \le x_2^p)$.
(iii) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}^+_0,\ (x_1 < x_2 \Leftrightarrow x_1^p < x_2^p)$.
(iv) $\forall p \in \mathbb{Z}^+,\ \forall x_1, x_2 \in \mathbb{R}^+_0,\ (x_1 = x_2 \Leftrightarrow x_1^p = x_2^p)$.
(v) $\forall p \in \mathbb{Z}^+,\ \forall x \in \mathbb{R}^+_0,\ (x \ge 1 \Rightarrow x^p \ge x)$.
(vi) $\forall p \in \mathbb{Z}^+,\ \forall x \in \mathbb{R}^+_0,\ (x \le 1 \Rightarrow x^p \le x)$.
(vii) $\forall p \in \mathbb{Z}^+,\ \forall y, z \in \mathbb{R}^+_0,\ ((p \ge 2 \text{ and } y < z) \Rightarrow z^p - y^p < p(z - y)z^{p-1})$.
(viii) $\forall p \in \mathbb{Z}^+,\ \forall y, z \in \mathbb{R}^+_0,\ (y \le z \Rightarrow z^p - y^p \le p(z - y)z^{p-1})$.
(ix) $\forall q \in \mathbb{Z}^+,\ \forall x \in \mathbb{R}^+_0,\ \exists! y \in \mathbb{R}^+_0,\ y^q = x$.
(x) $\forall q \in \mathbb{Z}^+,\ \forall x \in \mathbb{R}^+_0,\ \forall y \in \mathbb{R}^+_0,\ (y^q = x \Leftrightarrow y = \sup\{z \in \mathbb{R}^+_0;\ z^q \le x\})$.
Proof: For part (i), let $x_1, x_2 \in \mathbb{R}^+_0$ with $x_1 < x_2$. The assertion clearly holds for $p = 1$. So assume that
the assertion is true for some $p \in \mathbb{Z}^+$. Then $0 \le x_1 < x_2$ and $0 \le x_1^p < x_2^p$.
So $x_1^{p+1} = x_1^p x_1 \le x_2^p x_1 < x_2^p x_2 = x_2^{p+1}$ by Theorem 15.9.2 (xii, xiii, xiv). Hence the assertion follows for
all $p \in \mathbb{Z}^+$ by induction on $p$.
Part (ii) follows from part (i) and the observation that $x_1 = x_2$ implies $x_1^p = x_2^p$.
For part (iii), let $p \in \mathbb{Z}^+$ and $x_1, x_2 \in \mathbb{R}^+_0$ with $x_1^p < x_2^p$. Then $x_2^p > x_1^p$. So $x_2 > x_1$ by the contrapositive of
part (ii). Therefore $x_1 < x_2$. The converse follows from part (i).
Part (iv) follows from part (iii) because $\le$ is a total order on $\mathbb{R}$ by Theorem 15.9.2 (xii).
For part (v), let $p \in \mathbb{Z}^+$ and $x \in \mathbb{R}^+_0$ with $x \ge 1$. The case $p = 1$ is obvious. So assume that the assertion
is true for some $p \in \mathbb{Z}^+$. Then $x^{p+1} = x^p x \ge x^p \cdot 1 = x^p \ge x$ by Theorem 15.9.2 (xii, xiii, xiv, ix). Hence the
assertion follows for all $p \in \mathbb{Z}^+$ by induction on $p$.
For part (vi), let $p \in \mathbb{Z}^+$ and $x \in \mathbb{R}^+_0$ with $x \le 1$. The case $p = 1$ is obvious. So assume that the assertion
is true for some $p \in \mathbb{Z}^+$. Then $x^{p+1} = x^p x \le x^p \cdot 1 = x^p \le x$ by Theorem 15.9.2 (xii, xiii, xiv, ix). Hence the
assertion follows for all $p \in \mathbb{Z}^+$ by induction on $p$.
For part (vii), let $p \in \mathbb{Z}^+$ and $y, z \in \mathbb{R}^+_0$ with $y < z$. Let $p = 2$. Then $z^p - y^p = z^2 - y^2 = (z - y)(z + y) <
2(z - y)z = p(z - y)z^{p-1}$, as claimed. Now assume that the assertion is valid for some $p \ge 2$. Then
$z^{p+1} - y^{p+1} = z(z^p - y^p) + (z - y)y^p < zp(z - y)z^{p-1} + (z - y)z^p = (p + 1)(z - y)z^p$, which verifies the
assertion for the case $p + 1$. Hence the assertion is valid for all $p \ge 2$ by induction on $p$.
For part (viii), let $p \in \mathbb{Z}^+$ and $y, z \in \mathbb{R}^+_0$ with $y \le z$. Let $p = 1$. Then $z^p - y^p = z - y = p(z - y)z^{p-1}$
because $z^0 = 1$ by Definition 16.6.3 (1). This verifies the case $p = 1$. Now suppose that $p \ge 2$ and $y = z$.
Then $z^p - y^p = 0 = p(z - y)z^{p-1}$, which verifies the assertion for this case. The remaining case $p \ge 2$ with
$y < z$ follows from part (vii).
For part (ix), let $q \in \mathbb{Z}^+$ and $x \in \mathbb{R}^+_0$. If $q = 1$, then clearly $y = x$ is the unique solution of the equation
$y^q = x$ by Definition 16.6.3. So it may be assumed that $q \ge 2$. But if $x = 0$, then $y = 0$ is the unique
solution of the equation $y^q = x$ by part (iv). So it may also be assumed that $x > 0$.
Let $S = \{z \in \mathbb{R}^+_0;\ z^q \le x\}$. Then $S \ne \emptyset$ because $0 \in S$. Let $z_0 = \mathrm{ceiling}(x)$. (See Definition 16.5.12.) Then
$z_0 \ge 1$ because $x > 0$. So $z_0^q \ge z_0$ by part (v). Therefore $(z_0 + 1)^q > z_0^q \ge z_0 \ge x$ by part (i), which then
implies that $z_0 + 1 \notin S$. It follows that $z_1 \notin S$ for all $z_1 \in \mathbb{R}^+_0$ with $z_1 \ge z_0 + 1$ by part (i). Thus $z_0 + 1$ is
an upper bound for $S$. So $S$ has a well-defined supremum $\sup(S) \in \mathbb{R}^+_0$ by Theorem 15.9.2 (xv).
Let $y = \sup(S)$. Suppose that $y = 0$. Then $S = \{0\}$. To show that this implies $x = 0$, assume that $x > 0$.
Let $z = \min(1, x)$. Then $0 < z$. If $x \ge 1$, then $z^q = 1^q = 1 \le x$. So $z \in S$, which contradicts the assumption.
But if $x < 1$, then $z^q = x^q \le x$ by part (vi). So $z \in S$, which contradicts the assumption again. Therefore
$x = 0$, which implies that $y = 0$ is the unique solution of the equation $y^q = x$ by part (iv). Thus it may be
assumed that $y > 0$.
Now suppose that $y > 0$ and $y^q < x$. Let $x_0 = \min(x, 2y^q)$. Then $y^q < x_0$. Let $\varepsilon = (x_0 - y^q)/(2qy^{q-1})$. Then
$\varepsilon > 0$. Let $z = y + \varepsilon$. Then $y < z$. So $z^q - y^q < q(z - y)z^{q-1}$ by part (vii) and the assumption $q \ge 2$. Therefore
$z^q < y^q + q\varepsilon z^{q-1} = y^q + \tfrac{1}{2}(x_0 - y^q)z^{q-1}/y^{q-1}$. From $x_0 \le 2y^q$, it follows that $z^q < y^q + \tfrac{1}{2}(2y^q - y^q)z^{q-1}/y^{q-1} =
y^q + \tfrac{1}{2}yz^{q-1} < y^q + \tfrac{1}{2}z^q$. So $z^q < 2y^q$. Then $x_0 \le x$ implies $z^q < y^q + \tfrac{1}{2}(x - y^q)z^{q-1}/y^{q-1} < y^q + (x - y^q)y/z <
y^q + (x - y^q) = x$. So $z \in S$. But $y < z$. So this contradicts the assumption $y = \sup(S)$. Consequently $y^q \ge x$.
Now suppose that $y > 0$ and $y^q > x$. Let $\varepsilon = (y^q - x)/(qy^{q-1})$. Then $\varepsilon > 0$. Let $y' = y - \varepsilon$. Then
$y' < y$. So $y^q - y'^q < q(y - y')y^{q-1}$ by part (vii) and the assumption $q \ge 2$. Therefore $y^q - y'^q < q\varepsilon y^{q-1} =
qy^{q-1}(y^q - x)/(qy^{q-1}) = y^q - x$. So $y'^q > x$. Therefore $z < y'$ for all $z \in S$ by part (iii), and $y' \notin S$. So $y'$ is an
upper bound for $S$ which is less than $y$, which contradicts the assumption $y = \sup(S)$. Consequently $y^q \le x$.
Hence $y^q = x$. The uniqueness of $y$ follows from part (iv).
Part (x) follows directly from the proof of part (ix).
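The supremum construction in the proof of part (ix) suggests a practical procedure: locate $\sup\{z \ge 0;\ z^q \le x\}$ by bisection. This Python sketch is my illustration (not the book's algorithm):

```python
def qth_root(x: float, q: int, tol: float = 1e-12) -> float:
    # sup{z >= 0 ; z^q <= x}, located by bisection, mirroring the
    # supremum construction in the proof of Theorem 16.6.6 (ix).
    if x == 0.0:
        return 0.0
    lo, hi = 0.0, max(1.0, x)   # hi^q >= x since hi >= 1 and hi >= x
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** q <= x:
            lo = mid            # mid lies in S
        else:
            hi = mid            # mid is an upper bound for S
    return lo

assert abs(qth_root(2.0, 2) - 2.0 ** 0.5) < 1e-9
assert abs(qth_root(27.0, 3) - 3.0) < 1e-9
```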
16.6.9 Notation: $\sqrt[q]{x}$, for $x \in \mathbb{R}^+_0$ and $q \in \mathbb{Z}^+$, denotes the positive integral root of $x$ with index $q$. In
other words, $\sqrt[q]{x} = \sup\{z \in \mathbb{R}^+_0;\ z^q \le x\}$ is the unique solution $y$ of the equation $y^q = x$.
$x^{1/q}$ is an alternative notation for $\sqrt[q]{x}$.
16.6.11 Definition: The (non-negative rational) power of a non-negative real number $x \in \mathbb{R}^+_0$ with rational
exponent $p/q \in \mathbb{Q}$, where $p \in \mathbb{Z}^+_0$ and $q \in \mathbb{Z}^+$, is the real number $(x^p)^{1/q}$.
16.6.12 Definition: The (negative rational) power of a positive real number $x \in \mathbb{R}^+$ with rational expo-
nent $p/q \in \mathbb{Q}$, where $p \in \mathbb{Z}^-$ and $q \in \mathbb{Z}^+$, is the real number $(x^p)^{1/q}$. (Optional value $+\infty$ for $x = 0$.)
16.6.13 Notation: $x^{p/q}$, for $x \in \mathbb{R}^+_0$, $p \in \mathbb{Z}^+_0$ and $q \in \mathbb{Z}^+$, denotes the power of a non-negative real
number $x$ with rational exponent $p/q$.
[Figure: graphs of the rational power functions $x^r$ for various rational exponents $r$ (integer and reciprocal-integer, positive and negative), for $0 \le x \le 2$]
Strictly speaking, the extended real number value $+\infty$ for $x = 0$ and $r < 0$ in Definition 16.6.15 means
"undefined". It is included here for the convenience of defining real powers for a simple Cartesian set-
product domain $(x, r) \in \mathbb{R}^+_0 \times \mathbb{R}$ instead of splitting the cases according to what is and is not defined. On
the other hand, the extended real value $+\infty$ is occasionally useful.
16.6.15 Definition: The (real) power of a non-negative real number $x \in \mathbb{R}^+_0$ with exponent $r \in \mathbb{R}$ is the
extended real number
\[ x^r = \begin{cases}
+\infty & \text{if } x = 0 \text{ and } r < 0 \\
1 & \text{if } x = 0 \text{ and } r = 0 \\
0 & \text{if } x = 0 \text{ and } r > 0 \\
\inf\{x^s;\ s \in \mathbb{Q} \text{ and } s \le r\} & \text{if } 0 < x \le 1 \\
\inf\{x^s;\ s \in \mathbb{Q} \text{ and } s \ge r\} & \text{if } 1 \le x.
\end{cases} \]
16.6.16 Notation: $x^r$, for $x \in \mathbb{R}^+_0$ and $r \in \mathbb{R}$, denotes the power of $x$ with real exponent $r$.
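For $x \ge 1$ the infimum over rational exponents can be approximated by evaluating $x^s$ at a rational $s$ slightly above $r$. The following Python sketch is my illustration of the idea, not part of the book:

```python
import math

def power_by_rational_inf(x: float, r: float, n: int = 10**6) -> float:
    # For x >= 1, Definition 16.6.15 gives x^r = inf{x^s ; s rational, s >= r}.
    # Approximate the infimum by the rational s = ceil(r*n)/n just above r.
    assert x >= 1.0
    s = math.ceil(r * n) / n
    return x ** s

# The approximation converges to x^r from above as n increases.
assert abs(power_by_rational_inf(2.0, 1 / 3) - 2.0 ** (1 / 3)) < 1e-4
```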
16.6.17 Remark: Elementary inequalities for real numbers involving square roots.
The inequalities in Theorem 16.6.18 may be very broadly generalised. (The Cauchy-Schwarz inequality
in Theorem 24.7.14 is a relatively minor generalisation of Theorem 16.6.18 (ii).) But the assertions in
Theorem 16.6.18 may be proved in an elementary way using only some basic properties of the real numbers
and the square root function.
16.6.18 Theorem: Some basic inequalities for real numbers and square roots.
(i) $\forall x_1, y_1, x_2, y_2 \in \mathbb{R},\ (x_1 x_2 + y_1 y_2)^2 \le (x_1^2 + y_1^2)(x_2^2 + y_2^2)$.
(ii) $\forall x_1, y_1, x_2, y_2 \in \mathbb{R},\ |x_1 x_2 + y_1 y_2| \le (x_1^2 + y_1^2)^{1/2}(x_2^2 + y_2^2)^{1/2}$.
(iii) $\forall x_1, y_1, x_2, y_2 \in \mathbb{R},\ (x_1 + x_2)^2 + (y_1 + y_2)^2 \le ((x_1^2 + y_1^2)^{1/2} + (x_2^2 + y_2^2)^{1/2})^2$.
(iv) $\forall x_1, y_1, x_2, y_2 \in \mathbb{R},\ ((x_1 + x_2)^2 + (y_1 + y_2)^2)^{1/2} \le (x_1^2 + y_1^2)^{1/2} + (x_2^2 + y_2^2)^{1/2}$.
Proof: For part (i), $(x_1^2 + y_1^2)(x_2^2 + y_2^2) - (x_1 x_2 + y_1 y_2)^2 = x_1^2 y_2^2 + y_1^2 x_2^2 - 2x_1 x_2 y_1 y_2 = (x_1 y_2 - y_1 x_2)^2 \ge 0$.
The assertion follows directly from this.
Part (ii) follows from part (i) and Theorem 16.6.6 (iii).
For part (iii), $((x_1^2 + y_1^2)^{1/2} + (x_2^2 + y_2^2)^{1/2})^2 - (x_1 + x_2)^2 - (y_1 + y_2)^2 = 2(x_1^2 + y_1^2)^{1/2}(x_2^2 + y_2^2)^{1/2} - 2x_1 x_2 - 2y_1 y_2$.
This is non-negative by part (ii). The assertion follows directly from this.
Part (iv) follows from part (iii) and Theorem 16.6.6 (iii).
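The four inequalities are easy to spot-check numerically on random quadruples. A Python sketch (my illustration, not the book's; a numerical check is of course no substitute for the proof above):

```python
import random

random.seed(0)
# Spot-check Theorem 16.6.18 (i)-(iv) on random real quadruples.
for _ in range(1000):
    x1, y1, x2, y2 = (random.uniform(-10, 10) for _ in range(4))
    a, b = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2
    tol = 1e-6  # allowance for floating-point rounding near equality
    assert (x1 * x2 + y1 * y2) ** 2 <= a * b + tol                            # (i)
    assert abs(x1 * x2 + y1 * y2) <= a ** 0.5 * b ** 0.5 + tol                # (ii)
    assert (x1 + x2) ** 2 + (y1 + y2) ** 2 <= (a ** 0.5 + b ** 0.5) ** 2 + tol  # (iii)
    assert ((x1 + x2) ** 2 + (y1 + y2) ** 2) ** 0.5 <= a ** 0.5 + b ** 0.5 + tol  # (iv)
```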
16.7.2 Notation: $\sum_{i=m}^{n} a_i$, for $m, n \in \mathbb{Z}^+_0$ and any function $a : I \to \mathbb{R}$ with $I \supseteq \mathbb{Z}[m, n]$, denotes the sum
defined inductively with respect to $n - m$ by
(1) $\sum_{i=m}^{n} a_i = 0$ if $n - m < 0$,
(2) $\sum_{i=m}^{n} a_i = a_m$ if $n - m = 0$,
(3) $\sum_{i=m}^{n} a_i = \sum_{i=m}^{n-1} a_i + a_n$ if $n - m > 0$.
16.7.3 Notation: $\prod_{i=m}^{n} a_i$, for $m, n \in \mathbb{Z}^+_0$ and any function $a : I \to \mathbb{R}$ with $I \supseteq \mathbb{Z}[m, n]$, denotes the
product defined inductively with respect to $n - m$ by
(1) $\prod_{i=m}^{n} a_i = 1$ if $n - m < 0$,
(2) $\prod_{i=m}^{n} a_i = a_m$ if $n - m = 0$,
(3) $\prod_{i=m}^{n} a_i = \prod_{i=m}^{n-1} a_i \cdot a_n$ if $n - m > 0$.
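Both inductive notations transcribe directly into recursive functions, with the empty sum equal to 0 and the empty product equal to 1. A Python sketch (my illustration, not the book's):

```python
def ind_sum(a, m: int, n: int) -> float:
    # Sum_{i=m}^{n} a(i), defined inductively on n - m (empty sum = 0).
    if n - m < 0:
        return 0
    if n - m == 0:
        return a(m)
    return ind_sum(a, m, n - 1) + a(n)

def ind_prod(a, m: int, n: int) -> float:
    # Prod_{i=m}^{n} a(i), defined inductively on n - m (empty product = 1).
    if n - m < 0:
        return 1
    if n - m == 0:
        return a(m)
    return ind_prod(a, m, n - 1) * a(n)

assert ind_sum(lambda i: i, 1, 4) == 10      # 1 + 2 + 3 + 4
assert ind_prod(lambda i: i, 1, 5) == 120    # 5!
assert ind_sum(lambda i: i, 3, 2) == 0 and ind_prod(lambda i: i, 3, 2) == 1
```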
16.7.4 Theorem:
(i) $\forall x \in \mathbb{R},\ \forall p \in \mathbb{Z}^+_0,\ x^p = \prod_{i=1}^{p} x$.
(ii) $\forall x \in \mathbb{R} \setminus \{0\},\ \forall p \in \mathbb{Z}^+_0,\ x^{-p} = \prod_{i=1}^{p} 1/x$.
(iii) $\forall n, r \in \mathbb{Z}^+_0,\ C^n_r = \prod_{i=0}^{r-1} (n - i)/(1 + i)$.
Proof: Part (i) follows from Definition 16.6.3 and Notation 16.7.3.
Part (ii) follows from Definition 16.6.4 and Notation 16.7.3.
For part (iii), let $n = 0$. Then $C^n_r = \delta_{r,0}$ by Theorem 14.8.8 (i), and also $\prod_{i=0}^{r-1} (n - i)/(1 + i) = \delta_{r,0}$ by
Notation 16.7.3. Therefore $C^n_r = \prod_{i=0}^{r-1} (n - i)/(1 + i)$ for all $r \in \mathbb{Z}^+_0$ when $n = 0$.
Assume the formula for $n = k \in \mathbb{Z}^+_0$. Let $n = k + 1$. Then $C^{k+1}_0 = 1$ by Definition 14.8.2. Therefore
$C^{k+1}_r = \prod_{i=0}^{r-1} (n - i)/(1 + i)$ for $r = 0$. So let $r \in \mathbb{Z}^+$. Then $C^{k+1}_r = C^k_{r-1} + C^k_r$ by Theorem 14.8.8 (ii). So
the case $n = k$ implies
\[ \begin{aligned}
C^{k+1}_r &= \prod_{i=0}^{r-2} (k - i)/(1 + i) + \prod_{i=0}^{r-1} (k - i)/(1 + i) \\
&= \big(1 + (k - r + 1)/(1 + r - 1)\big) \prod_{i=0}^{r-2} (k - i)/(1 + i) \\
&= \prod_{i=0}^{r-1} (k + 1 - i)/(1 + i).
\end{aligned} \]
Hence the assertion follows for all $n \in \mathbb{Z}^+_0$ by induction on $n$.
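The product formula of part (iii) can be checked against a standard binomial-coefficient routine. A Python sketch (my illustration, not the book's):

```python
import math

def binom_product(n: int, r: int) -> float:
    # C^n_r as the product prod_{i=0}^{r-1} (n - i)/(1 + i) of Theorem 16.7.4 (iii).
    result = 1.0
    for i in range(r):
        result *= (n - i) / (1 + i)
    return result

# For r > n a factor (n - n) = 0 appears, so the product is 0 = C^n_r.
for n in range(8):
    for r in range(8):
        assert round(binom_product(n, r)) == math.comb(n, r)
```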
Proof: In the case $r = 1$, line (16.7.1) asserts that $\sum_{j=1}^{n} f(1, j) = \sum_{i \in \mathbb{N}_n^1} f(1, i_1)$, which is clearly true.
So assume that line (16.7.1) is true for some $r \in \mathbb{Z}^+$. Then
\[ \begin{aligned}
\prod_{k=1}^{r+1} \sum_{j=1}^{n} f(k, j)
&= \Big( \sum_{j=1}^{n} f(r + 1, j) \Big) \prod_{k=1}^{r} \sum_{j=1}^{n} f(k, j) \\
&= \Big( \sum_{i_{r+1} \in \mathbb{N}_n} f(r + 1, i_{r+1}) \Big) \Big( \sum_{i \in \mathbb{N}_n^r} \prod_{k=1}^{r} f(k, i_k) \Big) \\
&= \sum_{i_{r+1} \in \mathbb{N}_n} f(r + 1, i_{r+1}) \Big( \sum_{i \in \mathbb{N}_n^r} \prod_{k=1}^{r} f(k, i_k) \Big) \\
&= \sum_{i_{r+1} \in \mathbb{N}_n} \sum_{i \in \mathbb{N}_n^r} f(r + 1, i_{r+1}) \prod_{k=1}^{r} f(k, i_k) \\
&= \sum_{i \in \mathbb{N}_n^{r+1}} \prod_{k=1}^{r+1} f(k, i_k),
\end{aligned} \]
which is the assertion of line (16.7.1) for the case $r + 1$. Hence the assertion follows for all $r \in \mathbb{Z}^+$ by
induction on $r$.
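The product-of-sums expansion over multi-indices can be verified concretely by brute-force enumeration. A Python sketch (my illustration, with an arbitrary test function $f$):

```python
from itertools import product
from math import prod

def f(k: int, j: int) -> int:
    return k * 10 + j   # arbitrary test function

# Check: prod_{k=1}^{r} sum_{j=1}^{n} f(k, j)
#      = sum over multi-indices i in {1..n}^r of prod_{k=1}^{r} f(k, i_k).
n, r = 3, 4
lhs = prod(sum(f(k, j) for j in range(1, n + 1)) for k in range(1, r + 1))
rhs = sum(prod(f(k, i[k - 1]) for k in range(1, r + 1))
          for i in product(range(1, n + 1), repeat=r))
assert lhs == rhs
```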
16.8.1 Definition: The complex number system is the tuple $(\mathbb{R} \times \mathbb{R}, \sigma_C, \tau_C)$ where
(i) $\forall (x_1, y_1), (x_2, y_2) \in \mathbb{R} \times \mathbb{R},\ \sigma_C((x_1, y_1), (x_2, y_2)) = (x_1 + x_2, y_1 + y_2)$,
(ii) $\forall (x_1, y_1), (x_2, y_2) \in \mathbb{R} \times \mathbb{R},\ \tau_C((x_1, y_1), (x_2, y_2)) = (x_1 x_2 - y_1 y_2, x_1 y_2 + y_1 x_2)$.
16.8.2 Notation: $\mathbb{C}$ denotes both the complex number system $(\mathbb{R} \times \mathbb{R}, \sigma_C, \tau_C)$
and, in the context of the complex number system, the set $\mathbb{R} \times \mathbb{R}$.
In other words, $\mathbb{C} = (\mathbb{R} \times \mathbb{R}, \sigma_C, \tau_C)$.
$i$ denotes the complex number $(0, 1)$.
16.8.4 Remark: The real and imaginary part operators for complex numbers.
The real and imaginary parts of a complex number z = (x, y) are defined to be the left and right elements x
and y respectively of the ordered pair (x, y). The left and right element extraction operations are described
in Definition 9.2.13 and Notation 9.2.14.
16.8.5 Definition: The real part of a complex number $z \in \mathbb{C}$ is the left element of $z$.
The imaginary part of a complex number $z \in \mathbb{C}$ is the right element of $z$.
16.8.6 Notation: $\mathrm{Re}(z)$, for a complex number $z$, denotes the real part of $z$.
$\mathrm{Im}(z)$, for a complex number $z$, denotes the imaginary part of $z$.
In other words, $\mathrm{Re}(z) = \mathrm{Left}(z)$ and $\mathrm{Im}(z) = \mathrm{Right}(z)$ for all $z \in \mathbb{C}$.
[ www.geometry.org/dg.html ] [ draft: UTC 201749 Sunday 13:30 ]
500 16. Real-number constructions
16.8.7 Remark: The absolute value function for the complex numbers.
In compliance with Remark 1.6.3 paragraph (3), complex numbers are avoided wherever possible in this
book. However, some basic definitions are inescapable if some basic facts about connections on differentiable
fibre bundles with unitary structure groups are to be presented. In particular, the absolute value function in
Definition 16.8.8 is required in order to describe the topology on the complex numbers in Definition 39.3.4.
(See Notation 16.6.9 for the square root expression in Definition 16.8.8.)
16.8.8 Definition: The absolute value (function) for complex numbers is the function $|\cdot| : \mathbb{C} \to \mathbb{R}^+_0$
defined by $|z| = (\mathrm{Re}(z)^2 + \mathrm{Im}(z)^2)^{1/2}$.
In other words, $|(x, y)| = (x^2 + y^2)^{1/2}$ for all $(x, y) \in \mathbb{R} \times \mathbb{R}$.
16.8.9 Theorem: Some basic properties of the absolute value function for complex numbers.
(i) $\forall z_1, z_2 \in \mathbb{C},\ |z_1 z_2| = |z_1| |z_2|$.
(ii) $\forall z_1, z_2 \in \mathbb{C},\ |z_1 + z_2| \le |z_1| + |z_2|$.
Proof: For part (i), let $z_1 = (x_1, y_1)$ and $z_2 = (x_2, y_2)$. Then
\[ \begin{aligned}
|z_1 z_2| &= |(x_1 x_2 - y_1 y_2, x_1 y_2 + y_1 x_2)| \\
&= ((x_1 x_2 - y_1 y_2)^2 + (x_1 y_2 + y_1 x_2)^2)^{1/2} \\
&= ((x_1^2 x_2^2 + y_1^2 y_2^2 - 2x_1 x_2 y_1 y_2) + (x_1^2 y_2^2 + y_1^2 x_2^2 + 2x_1 y_2 y_1 x_2))^{1/2} \\
&= (x_1^2 x_2^2 + y_1^2 y_2^2 + x_1^2 y_2^2 + y_1^2 x_2^2)^{1/2} \\
&= ((x_1^2 + y_1^2)(x_2^2 + y_2^2))^{1/2} \\
&= (x_1^2 + y_1^2)^{1/2} (x_2^2 + y_2^2)^{1/2} \\
&= |z_1| |z_2|.
\end{aligned} \]
Part (ii) follows from Theorem 16.6.18 (iv) by letting z1 = (x1 , y1 ) and z2 = (x2 , y2 ).
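Representing complex numbers as ordered pairs per Definition 16.8.1 makes the multiplicativity of the absolute value easy to check by direct computation. A Python sketch (my illustration, not the book's):

```python
# Complex numbers as ordered pairs (x, y), following Definition 16.8.1.
def c_mul(z1, z2):
    (x1, y1), (x2, y2) = z1, z2
    return (x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)

def c_abs(z):
    x, y = z
    return (x * x + y * y) ** 0.5

# |z1 z2| = |z1| |z2|  (Theorem 16.8.9 (i)).
z1, z2 = (3.0, 4.0), (1.0, 2.0)
assert abs(c_abs(c_mul(z1, z2)) - c_abs(z1) * c_abs(z2)) < 1e-12
# Triangle inequality (Theorem 16.8.9 (ii)).
z_sum = (z1[0] + z2[0], z1[1] + z2[1])
assert c_abs(z_sum) <= c_abs(z1) + c_abs(z2) + 1e-12
```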
[Figure: convergence radii of the Taylor series of $(1 + x^2)^{-1}$ at sample points $x_1, x_2, x_3, x_4$ on the real line, for $-6 \le x \le 6$]
Figure 16.8.1 Convergence radii suggesting complex pole for real-analytic function
A similar observation may be made generally for algebraic real-analytic functions, which suggests that even
if the complex numbers are ignored, they still have disruptive effects on the real line. This is reminiscent
of the way in which geophysicists detect minerals a long distance under the ground using gravitometric,
electromagnetic and other detection methods. The existence and nature of minerals deep underground
may be inferred from evidence obtained entirely above ground. In the same way, the poles of real-analytic
functions deep in the complex plane may be detected by their influence on the radius of convergence of
Taylor series for real-valued functions evaluated purely within the real line.
This kind of argument only shows the reality of the complex numbers if one accepts the reality of power
series. There is no reason in physics why power series should have a special significance, apart from the
fact that addition and multiplication are programmed into calculators. It is very unusual for a physical
system to have state-functions which are truly real-analytic except in mathematical models, because an
analytic function is determined everywhere from any infinitesimal region. At the very microscopic level at
which quantum theory is effective, probably the real functions are indeed real analytic. So one would expect
complex numbers to be relevant in that context. The usefulness in quantum theory of complex numbers
is therefore not surprising. (There seems to be no argument for the reality of quaternions similar to the
argument for the reality of complex numbers.)
pages 25–28; Shilov [131], page 381.)
Part II
Algebra
acted on by some other algebraic structure such as a semigroup, group or ring, whereas an algebra has
two sets which both have addition and multiplication operations.
(2) Sections 19.1, 19.2, 19.3, 19.4, 19.5, 19.6 and 19.7 introduce modules over sets, semigroups, groups and
rings.
(3) The most useful kind of module is a unitary left module over a commutative division ring, which is
also known as a linear space (or vector space). These are so important that they are dealt with
separately in Chapters 22–24. (In fact, the multilinear and tensor spaces in Chapters 27–30 are also
linear spaces. So they are also unitary left modules over commutative division rings. A commutative
division ring is also known as a field.)
(4) Most of the definitions and properties of modules are familiar from higher-level algebraic classes such
as linear spaces. It is an interesting exercise to determine how little structure is required to define a
concept or prove a property. The minimalist norms and inner products in Sections 19.4, 19.5 and 19.6
are defined in this spirit for modules over rings, leading to some properties in Theorems 19.6.4 and
19.6.6 which resemble linear space properties. In Section 19.7, some slightly weaker-than-usual Cauchy-
Schwarz and triangle inequalities are proved for modules over ordered rings, and in Theorem 19.7.9, the
full Cauchy-Schwarz inequality is shown for Archimedean ordered rings.
(5) Section 19.8 defines associative algebras, which are not much used in this book.
(6) Section 19.9 defines Lie algebras, which are closely associated with Lie groups, and are thereby of great
significance for general connections on differentiable fibre bundles, and for affine connections on tangent
bundles in particular.
Chapter 20: Transformation groups
(1) Chapter 20 introduces transformation groups, which are more reality-oriented than the abstract groups
in Section 17.3. One could say that all groups are groups of transformations, although often the same
group acts on multiple passive spaces simultaneously, which justifies abstracting groups as an individual
algebraic structure. (For example, the linear group acting on the tangent space at a point simultaneously
transforms all of the associated tensor spaces at that point.) Transformation groups are the algebraic
foundation for the Lie transformation groups in Section 60.11. Parallelism on general fibre bundles
is defined in terms of structure groups, which are groups of transformations of a fibre space. Hence
transformation groups are fundamental to the definition of parallelism, from which curvature is defined.
(2) Transformation groups are introduced in Definition 20.1.2.
(3) In Sections 20.2, 20.3 and 20.4, transformation groups are classified as effective, free or transitive.
Effective means that group elements are uniquely determined by their action on the passive set,
which implies that the group really is a group of transformations. In other words, each group element
is its action on the passive set. Free means that only the action of the identity element of the group has fixed
points, which is clearly not true for the group of rotations of a two-sphere for example. Transitive
means that for every pair of points in the passive set, there is a group element which transforms one to
the other. Thus the group of rotations of a two-sphere is transitive but the group of rotations of 3-space
is not. A non-trivial group acting on itself is always effective, transitive and free.
(4) Section 20.5 introduces orbits and stabilisers of transformation groups. The orbits of a transformation
group are trivial if and only if it is transitive. The stabilisers are trivial if and only if it acts freely.
Therefore a non-trivial group acting on itself has both trivial orbits and trivial stabilisers.
(5) Section 20.7 gives the definitions for right transformation groups which correspond to the more usual
style of left transformation group. In principle, this is redundant. However, it is easy to get confused
when a group simultaneously acts both on the left and the right on different passive sets.
(6) Section 20.9 is about figures and invariants of transformation groups. This topic is often presented
informally as and where required, and the presentation here is also quite informal. The basic idea
here is that the action of a group on the set of points of a given passive set X is closely associated
with its action on sets of figures in the set X. For example, the set of rotations of the points in a
Euclidean space is closely associated with the set of rotations of the lines in that space, or the triangles
or circles, or the real-valued functions on that space, or the set of functionals of those real-valued
functions. Remark 20.9.2 describes six kinds of figure spaces, which may be applied recursively in any
combination to produce a huge network of associated transformation groups, all using the same group.
A particular application of these concepts is to the definition of associated connections on tensor bundles
and general fibre bundles for a given connection on the tangent bundle of a differentiable manifold.
(7) Section 20.10 introduces baseless figure/frame bundles. These are implicit in the fibre bundle lit-
erature, but are presented here explicitly in Definition 20.10.8. In the context of fibre bundles, every
ordinary fibre bundle may be paired with an associated principal fibre bundle. The OFB represents
observations of objects, whereas the associated PFB represents reference frames for the observations.
Whenever the reference frame is adjusted, the observations are adjusted also, but the underlying ob-
ject is unchanged. The intention in Section 20.10 is to clarify the relations between observations and
reference frames by removing the underlying base-point space from consideration. The result is the
baseless figure/frame bundle in Definition 20.10.8. Study of such a minimal associated OFB and PFB
is intended to clarify some of the mysteries of fibre bundles, which are the underlying substrate for all
parallelism, which is the underlying concept for all curvature.
(8) Whereas Section 20.10 presents associated baseless figure and frame bundles, Section 20.11 presents
associated figure bundles. The former are like an associated OFB and PFB. The latter are like associated
OFBs. Examples of associated OFBs are the tensor bundles on a differentiable manifold. When the
coordinate chart for a differentiable manifold is changed, all of the different types of tensor bundles are
changed in an associated way. Section 20.11 in effect considers associated OFBs in the absence of a
base-point manifold.
Chapter 21: Non-topological fibre bundles
(1) Chapter 21 presents non-topological fibre bundles, and Chapter 46 presents topological fibre bundles.
Chapter 52 presents tangent bundles. Chapter 61 presents differentiable fibre bundles. (Sections 20.10
and 20.11 present baseless figure/frame bundles.) Fibre bundles provide a unifying framework for
differential geometry, while also extending the concept of parallelism.
(2) Section 21.1 introduces non-topological fibrations, which are the same as topological fibrations except
that they have no topology. Presenting fibrations without topology first gives some clues as to why the
topology is needed. A fibration is the same as a fibre bundle except that it lacks a fibre atlas.
(3) Sections 21.2, 21.3 and 21.4 introduce fibre charts and fibre atlases for non-topological fibrations.
A fibre chart is intended to indicate how the fibration is structured relative to a fixed fibre space.
This is less meaningful without topology, but several low-level technical aspects of fibration structure
can be more easily understood in the absence of topology.
(4) Section 21.5 defines non-topological ordinary fibre bundles to be non-topological fibrations which have
a fibre atlas for a specified structure group. This does place a real constraint on the fibre bundle chart
transition maps.
(5) Section 21.6 defines non-topological principal fibre bundles, which are like non-topological ordinary fibre
bundles except that the fibre space is the same as the structure group. Non-topological PFBs may be
associated with non-topological OFBs in much the same way as in the topological and differentiable
fibre bundle cases.
(6) Section 21.7 defines identity cross-sections and identity charts on principal bundles. These are useful
for applications to gauge theory.
(7) Section 21.8 defines the right action map and right transformation group of a non-topological principal
bundle, including the construction of a principal bundle from a given freely-acting right transformation
group.
(8) Section 21.9 introduces the idea of parallelism on a non-topological fibration. In the absence of a
topology, continuous curves are not defined in the base space. So definitions of pathwise parallelism are
almost meaningless.
(9) Section 21.10 introduces parallelism for non-topological fibre bundles. This differs from parallelism on
fibrations in that the structure group now plays a role.
Chapter 22: Linear spaces
(1) The very substantial subject of linear algebra has been arbitrarily split into linear spaces in Chapter 22,
linear maps and dual spaces in Chapter 23, and linear space constructions in Chapter 24 so as to limit
the lengths of chapters.
(2) The definitions for linear spaces in Chapter 22 are essentially the same as in any textbook on linear
algebra. A linear space is defined in Definition 22.1.1 in the standard way. (A linear space is also known
rather inconveniently as a unitary left module over a commutative division ring.) As much as possible,
the fields for linear spaces are assumed to be general, not necessarily the real numbers for example, and
no assumptions are made about the dimensionality.
(3) Section 22.2 defines the very general classes of free linear spaces and unrestricted linear spaces. The
free linear space on a set S consists of all functions on S which are zero outside a finite subset of S.
The unrestricted linear space on S consists of all functions on S, without any restrictions. (The values
are assumed to lie in a specified field.)
(4) Section 22.3 introduces linear combinations in the standard way, but with some notational innovation.
(5) Section 22.4 introduces the linear span of any subset of a linear space. Theorem 22.4.7 gives some useful
properties of the linear span.
(6) Section 22.5 defines the dimension of a linear space as the least cardinality of its spanning sets. This is
slightly non-standard. It is more usual to define the dimension to be the cardinality of any basis. This
does not work so well if the axiom of choice is rejected, because then the existence of a basis is not
guaranteed. Since the usual definition of cardinality in terms of equinumerosity to ordinal numbers is
extremely clumsy and problematic and impractical (except for finite and countably infinite cardinality),
Definition 22.5.2 uses the much better defined beta-cardinality in Section 13.2. This avoids both
mediate cardinals and the mysticism of general ordinal numbers. Thus for example, $\beth_1$ is the dimension
of the linear space of real-valued functions on the integers, and $\beth_2$ is the dimension of the linear space
of real-valued functions on the real numbers.
(7) Section 22.6 introduces linear independence of sets of vectors. According to Definition 22.6.3, a set
of vectors is linearly independent when none of them lies in the span of the others. Theorem 22.6.8
converts this definition to equivalent conditions which are better suited to proofs.
(8) Section 22.7 introduces the concept of a basis of a linear space as a linearly independent subset which
spans the whole space. This is uncontroversial, but it is not assumed here that every linear space has a
basis. One of the most fundamental facts about finite-dimensional spaces is that every basis contains the
same number of vectors. This is shown in Theorem 22.7.13, which relies for its proof on Theorem 22.7.12,
which uses elementary row operations on bases to show that any linearly independent subset of the
span of a finite set S of linearly independent vectors is not more numerous than S.
(9) In Remark 22.7.19, it is proposed that the issue of the existence (or non-existence) of a basis for a
linear space should be managed by simply assuming such existence whenever it is required. In terms
of the anthropological facts on the ground, this is what mathematicians really do, although they
profess undying faith in the axiom of choice and all its works. Whenever an explicit basis is needed,
like the sixpence provided by the tooth fairy while the child is asleep, it must be provided by an adult.
No explicit basis was ever provided by any axiom of choice, despite what anyone might believe at a
metaphysical level. Furthermore, it is proposed that theorems which need a well-ordered basis should
assume that also. (For example, the free linear space of real-valued functions on $\mathbb{R}$ which equal zero
outside a finite set has a well-known explicit basis, but no well-ordered basis because $\mathbb{R}$ cannot be
well-ordered.)
(10) Section 22.8 is about the components of vectors with respect to a basis. The component map of
a given basis is a bijection between the vectors in a linear space and their components with respect
to the basis. A component map thus provides a numericisation of a linear space in terms of the
much-maligned coordinates.
(11) Section 22.9 is concerned with the formulas for the changes in components of vectors when a basis
is changed. Special difficulties arise if the linear space is infinite-dimensional. Changes of basis, and
the accompanying changes of vector components, are the fundamental idea underlying ordinary and
principal fibre bundles. The basis corresponds to a PFB, while the components correspond to an OFB.
The basic algebra of basis changes thus underlies most of the concepts of differential geometry. There
is in effect a kind of dual action or contragredient action of the group of basis changes on the
components of vectors.
(12) Section 22.10 defines Minkowski sums and scalar products of sets.
(13) Section 22.11 gives some basic definitions of convex combinations of vectors and convex subsets of a
linear space.
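The contragredient behaviour of components under a change of basis, mentioned in item (11), can be sketched numerically. The following Python fragment is an illustrative sketch only (the matrix A and vector v are invented for the example, not taken from the book): if the new basis vectors are the columns of a matrix A, then components transform by A⁻¹, so the represented vector is unchanged.

```python
# Illustrative sketch: components transform contragrediently to the basis.
# If the new basis vectors are e'_j = sum_i A[i][j] e_i, then the components
# of a fixed vector transform by the inverse matrix: c' = A^{-1} c.

def mat_vec(M, v):
    # 2x2 matrix times 2-vector.
    return [M[0][0]*v[0] + M[0][1]*v[1],
            M[1][0]*v[0] + M[1][1]*v[1]]

def inv2(M):
    # Inverse of a 2x2 matrix.
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [[ M[1][1]/det, -M[0][1]/det],
            [-M[1][0]/det,  M[0][0]/det]]

# Old basis: standard basis of R^2. New basis: the columns of A (example data).
A = [[2.0, 1.0],
     [1.0, 1.0]]
v = [5.0, 3.0]                  # components w.r.t. the old basis
c_new = mat_vec(inv2(A), v)     # components w.r.t. the new basis = [2.0, 1.0]

# Reconstructing the vector from the new components recovers the original.
v_back = mat_vec(A, c_new)
```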
arise naturally from linear maps, where the subspace in question is the kernel of the map. This leads
to the important isomorphism theorem, stated here as Theorem 24.2.16, which leads directly to
Theorem 24.2.18, the rank-nullity theorem.
(4) Section 24.3 defines natural isomorphisms for duals of subspaces of linear spaces.
(5) Section 24.4 is about affine transformation groups.
(6) Section 24.5 defines exact sequences of linear maps.
(7) Section 24.6 defines norms on linear spaces. A metric may be defined in terms of a norm. The most
important example of a norm is the Cartesian space root-mean-square norm, called the Euclidean
norm, which provides the metric space structure for Euclidean space.
(8) Section 24.7 defines inner products on linear spaces, which are closely related to Euclidean space.
(9) Section 24.8 defines hyperbolic and Minkowskian inner products on linear spaces, which are closely
related to Minkowski and pseudo-Riemannian spaces.
are meaningful in many application contexts, but matrix multiplication is fairly specific to the linear
map context.
(4) Section 25.4 introduces identity matrices and left and right inverses with respect to matrix multiplication,
and some of the basic properties of these concepts.
(5) Section 25.5 introduces row span, column span, row rank and column rank of matrices. These concepts
are important for analysis of the invertibility of matrices. They are related to the row and column null
spaces and nullity which are defined in Section 25.6.
(6) Section 25.7 defines the component matrices for linear maps with respect to given bases. This application
context justifies the form of the definition of matrix multiplication.
(7) Section 25.8 narrows the focus to square matrix algebra. This makes matrix inverses more meaningful.
(8) Section 25.9 is about the trace of a matrix. Section 25.10 is about the determinant of a matrix.
(9) Section 25.11 concerns the upper modulus and lower modulus of a square matrix, which are particularly
relevant to the bilinear form context for matrices, particularly in regard to second-order differential
operators for real-valued functions on manifolds.
(10) Section 25.12 concerns upper bounds and lower bounds for real square matrices. These have a
particular importance for the proof of Theorem 41.7.4, the inverse function theorem.
(11) Section 25.13 is about real symmetric matrices. Section 25.14 is about classical matrix groups.
(12) Section 25.15 is concerned with multidimensional matrices or higher-degree arrays.
bundles on Cartesian spaces in Section 26.12.
(4) Section 26.4 is about affine spaces over modules over sets. Section 26.5 is about tangent-line bundles on
affine spaces over modules over sets. These are minimalist, but not as minimalist as in Section 26.3.
(5) Section 26.6 is about affine spaces over modules over groups and rings. Section 26.7 is about affine
spaces over unitary modules over ordered rings. These are progressively less and less minimalist, but
still not very useful.
(6) Section 26.8 is about tangent velocity spaces and tangent velocity bundles on affine spaces over modules
over sets. These are a minimalist version of the tangent spaces and tangent bundles on Cartesian spaces
in Section 26.12, which are actually useful.
(7) Section 26.9 is about parametrised line segments and hyperplane segments, which don't seem to be
really useful for anything. (It may be removed.)
(8) Section 26.10 is about affine spaces over linear spaces. These are the standard affine spaces which are
useful for demonstrating how to remove the arbitrary origin from linear spaces. However, it is easier to
manage the arbitrariness of the origin of a linear space than to manage the definitions of affine spaces.
So they have more philosophical than practical interest.
(9) Section 26.11 is a discussion of the many different meanings for the term "Euclidean space" in the
mathematics literature. It is proposed that the word "Euclidean" should be reserved for a space which
has the standard Euclidean distance function, whereas the word "Cartesian" should be used for the
coordinate n-tuple spaces ℝⁿ for integers n.
(10) Section 26.12 is about tangent-line bundles on Cartesian spaces. This is used as the fundamental
representation for tangent bundles on differentiable manifolds in Chapter 52. Therefore these tangent-
line bundles are incorporated into all of the differential geometry in this book. It took the author about
two decades to finally conclude that the best representation for tangent vectors is a parametrised line.
These are defined in Section 26.12 for Cartesian spaces, and are then incorporated into tangent vectors
in Definition 52.1.5.
(11) Section 26.13 defines direct products of tangent spaces of Cartesian spaces.
(12) Section 26.14 is about tangent velocity bundles on Cartesian spaces. These are roughly equivalent to
the tangent-line bundles in Section 26.12, but a tangent line vector is a linear function L which looks
like L : t ↦ p + tv for a point p ∈ ℝⁿ and a velocity v ∈ ℝⁿ, whereas a tangent velocity vector is just the
ordered pair (p, v). This is useful for computations, and is incorporated into tangent velocity vectors on
differentiable manifolds in Definition 52.6.4.
(13) Section 26.15 defines tangent covector bundles on Cartesian spaces, and Section 26.16 defines tangent
field covector bundles on Cartesian spaces, but despite their initial plausibility as candidates for the
representation of tangent covectors on differentiable manifolds, they are rejected for this purpose in
Sections 53.1 and 53.2.
(14) Section 26.17 is a philosophical discussion of ancient Greek precedents for the representation of tangent
vectors as parametrised lines.
(15) Section 26.18 on line-to-line transformation groups has some philosophical interest for indicating ways
of generalising the tangent vector concept, but it may be removed.
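The distinction drawn in item (12), between a tangent-line vector L : t ↦ p + tv and a tangent velocity pair (p, v), can be illustrated with a small sketch. This Python fragment is hypothetical illustration only (the names and the sample point are not from the book); it shows that the two representations carry the same information.

```python
# Illustrative sketch: a parametrised tangent line versus a (point, velocity)
# pair. The pair can always be recovered from the line, and vice versa.

def tangent_line(p, v):
    # The parametrised line L : t |-> p + t v.
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

p, v = (1.0, 2.0), (3.0, -1.0)   # example point and velocity in R^2
L = tangent_line(p, v)           # tangent-line representation
pair = (p, v)                    # tangent-velocity representation

# Recover the pair from the line: p = L(0), v = L(1) - L(0).
p_rec = L(0.0)
v_rec = tuple(a - b for a, b in zip(L(1.0), L(0.0)))
```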
Chapter 27 makes much more sense if it is known in advance that tensor spaces are the duals of linear
spaces of multilinear functions. Thus most concepts are defined twice, once for multilinear functions,
and once for tensors in Chapter 28, because they are duals of each other.
(2) Section 27.1 introduces multilinear maps, which are a generalisation of linear maps in Section 23.1.
Scalar-valued multilinear maps are then called multilinear functions, which are analogous to the linear
functionals in Section 23.4.
(3) Section 27.2 defines simple multilinear functions which are products of linear functions. These are
exactly analogous to the simple tensors in Section 28.4. The sum of any two simple multilinear functions
is a multilinear function. The purpose of this definition becomes clearer when the corresponding simple
tensors are introduced.
(4) Section 27.3 expresses multilinear maps in terms of components with respect to a basis for each of the
input spaces. Then multilinear maps may be reconstructed from components in the same way that
linear maps are.
(5) Section 27.4 introduces linear spaces of multilinear maps, which are analogous to linear spaces of linear
maps. It is shown in Theorem 27.4.14 that the dimension of a linear space of multilinear functions is
equal to the product of the dimensions of the individual input spaces.
(6) Section 27.5 presents linear spaces of multilinear maps for dual spaces.
(7) Section 27.6 presents the universal factorisation property for multilinear functions in Theorem 27.6.7.
This property is often presented for tensors in textbooks, but not so often for multilinear functions, which
are the duals of tensors. A universal factorisation property says, in essence, that any multilinear map
from a direct product of linear spaces to a given linear space U can be factorised into a fixed canonical
map from the direct product to the space of multilinear functions and a linear function from that space
to U . In other words, the multilinear function space contains exactly the right amount of information
to describe the multilinear product quality of the input spaces. This makes more sense in Section 28.3
where it is shown that the tensor product of linear spaces contains the right amount of information
for a tensor product.
(6) Section 28.7 sketches the alternative approach to tensors where a tensor space is defined to be the
quotient space of the free linear space over simple tensors with respect to a suitable equivalence relation.
(6) Section 29.5 constrains the input spaces for mixed covariant and contravariant tensor spaces to all be
the same. These are the kinds of tensor spaces which are applicable to the vast majority of contexts in
differential geometry.
(7) Section 29.6 is concerned with components with respect to a linear space basis for mixed covariant and
contravariant tensor spaces with a single input space. The management of tensor components for such
tensors is much easier than for heterogeneous input spaces, but it is still onerous. These components
are the famous coordinates of tensor calculus.
Chapter 17
Semigroups and groups
[Figure: the basic algebraic structures as tuples: semigroup, monoid, group (G, σ_G), transformation group (G, X, σ_G, μ), module (M, σ_M) over a commutative group (G, σ_G), ring (R, σ_R, τ_R), field (K, σ_K, τ_K).]
Chapters 17 and 18 present some basic definitions for single-set algebraic systems. These include semigroups,
monoids and groups, which have only one operation, and rings and fields, which have two operations.
Two-set algebraic systems include modules, associative algebras and Lie algebras in Chapter 19, transformation
groups in Chapter 20, linear spaces in Chapters 22–24, matrix algebra in Chapter 25, and tensor algebra
in Chapters 27–30. Three-set algebraic systems include affine spaces, which are presented in Chapter 26,
[Figure 17.0.3 Basic algebraic structure classes: left A-module; left module over a group G; (unitary) left module over a ring R; linear space over a field K; associative algebra and Lie algebra over a field K. Each class is specified by the operations on the acting set, the operations on the acted-on set, and the action between them.]
17.1. Semigroups
17.1.1 Definition: A semigroup is a pair G −< (G, σ) where G ≠ ∅ and σ : G × G → G satisfies
(i) ∀g1, g2, g3 ∈ G, σ(g1, σ(g2, g3)) = σ(σ(g1, g2), g3). [associativity]
17.1.2 Example: A semigroup with non-injective left and right actions. Maximum information loss.
Let G be any set, and let 0 ∈ G. Define σ : G × G → G by σ(g1, g2) = 0 for all g1, g2 ∈ G. Then σ is clearly an
associative closed operation on G. Therefore (G, σ) is a semigroup. However, the non-injectivity of the maps
g1 ↦ σ(g1, g2) and g2 ↦ σ(g1, g2), when G ≠ {0}, implies that the usual kind of cancellation operation to
solve algebraic equations is not available. This is an extreme example of information loss.
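As a quick sanity check of Example 17.1.2, associativity and the non-injectivity of the actions can be verified by brute force on a small carrier set. This Python fragment is an illustrative sketch, not part of the book.

```python
# Brute-force check of the constant semigroup sigma(g1, g2) = 0 on a small set:
# the operation is associative, but the left action by any element is not
# injective when the set has more than one element.
from itertools import product

G = [0, 1, 2]
sigma = lambda g1, g2: 0

associative = all(sigma(a, sigma(b, c)) == sigma(sigma(a, b), c)
                  for a, b, c in product(G, repeat=3))

# Left action by 1: g2 |-> sigma(1, g2). It collapses everything to 0.
left_action_injective = len({sigma(1, g2) for g2 in G}) == len(G)
```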
17.1.3 Example: Semigroups of non-negative integers.
Let G_k = {i ∈ ℤ ; i ≥ k} for some k ∈ ℤ⁺₀. Define σ_k : G_k × G_k → G_k by σ_k(x, y) = x + y, where + is
the usual addition operation on the integers. Then Range(σ_k) ⊆ G_k and σ_k is associative. So (G_k, σ_k) is a
semigroup for all k ∈ ℤ⁺₀.
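The closure claim in Example 17.1.3 rests on the inequality x + y ≥ 2k ≥ k for k ≥ 0. A finite spot-check (illustrative only, not from the book) also shows why closure fails for negative k.

```python
# Spot-check closure of addition on {i in Z : i >= k} for sample values of k.
def closed_under_addition(k, sample_size=20):
    sample = range(k, k + sample_size)
    return all(x + y >= k for x in sample for y in sample)

# Closure holds for k >= 0, since x + y >= 2k >= k.
checks = [closed_under_addition(k) for k in (0, 1, 5)]

# For k < 0 closure fails: e.g. (-3) + (-3) = -6 is not >= -3.
fails_for_negative = not closed_under_addition(-3)
```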
17.1.4 Example: A non-associative closed operation does not qualify as a semigroup operation.
The well-known cross product (or vector product) for the vectors i, −i, j, −j, k, −k, 0 ∈ ℝ³, with i = (1, 0, 0),
j = (0, 1, 0) and k = (0, 0, 1), is a closed operation which is not associative. (See for example Szekeres [266],
page 220; Struik [38], pages 11, 206; Lanczos [250], page 355; Corben/Stehle [235], page 18; Schutz [35],
pages 125–126; Joos/Freeman [248], pages 12–13; Gregory [246], pages 13–14; Bishop/Goldberg [3], page 95;
Guggenheimer [16], page 163; Kay [18], page 10; Kreyszig [101], pages 377–381; Kreyszig [22], pages 13–14;
Frankel [12], page 92; Stillwell [139], pages 12–13; Kaplan/Lewis [95], pages 819–824.) The Cayley table
for this operation is as in Table 17.1.1. (See Cullen [62], page 8, for Cayley tables. It is called an "operation
table" by Pinter [118], page 27. It is called a "multiplication table" by MacLane/Birkhoff [106], page 44.)
 ×  |  0    i    j    k   −i   −j   −k
 0  |  0    0    0    0    0    0    0
 i  |  0    0    k   −j    0   −k    j
 j  |  0   −k    0    i    k    0   −i
 k  |  0    j   −i    0   −j    i    0
−i  |  0    0   −k    j    0    k   −j
−j  |  0    k    0   −i   −k    0    i
−k  |  0   −j    i    0    j   −i    0
Table 17.1.1 Multiplication table for the cross product.
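The non-associativity asserted in Example 17.1.4 can be exhibited with a single counterexample. This Python fragment is an illustrative check, not part of the book: it evaluates (i × j) × j and i × (j × j) and finds them different.

```python
# Counterexample to associativity of the cross product on R^3:
# (i x j) x j = k x j = -i, whereas i x (j x j) = i x 0 = 0.
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
lhs = cross(cross(i, j), j)   # = (-1, 0, 0)
rhs = cross(i, cross(j, j))   # = (0, 0, 0)
```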
17.1.7 Remark: Embedding the non-negative integers inside a general semigroup.
According to Theorem 17.1.9, the semigroup of positive integers may be mapped homomorphically to any
non-empty semigroup. The sequence x̄ cannot commence with 0 ∈ ℤ⁺₀ because this would require an
element x̄(0) which satisfies x̄(0) = x̄(0 + 0) = σ(x̄(0), x̄(0)). Such an element is not guaranteed
to exist in G.
Although Theorem 17.1.9 is expressed in terms of successive multiplication on the right, multiplication on
the left would clearly give the same result.
Proof: The pair (ℤ⁺, +) is a semigroup, as mentioned in Example 17.1.3. For all j ∈ ℤ⁺, define the
proposition P(j) = "∀i ∈ ℤ⁺, x̄(i + j) = σ(x̄(i), x̄(j))". Let j = 1. Then x̄(i + j) = σ(x̄(i), x) for all
i ∈ ℤ⁺ by (ii). So ∀i ∈ ℤ⁺, x̄(i + j) = σ(x̄(i), x̄(j)). Therefore P(1) is true.
Assume that P(j) is true for some j ∈ ℤ⁺. Let i ∈ ℤ⁺. Then x̄(i + (j + 1)) = x̄((i + 1) + j) =
σ(x̄(i + 1), x̄(j)) by P(j). But x̄(i + 1) = σ(x̄(i), x) by (ii). So x̄(i + (j + 1)) = σ(σ(x̄(i), x), x̄(j)) =
σ(x̄(i), σ(x, x̄(j))) by the associativity of σ. But σ(x, x̄(j)) = σ(x̄(1), x̄(j)) = x̄(1 + j) = x̄(j + 1)
by P(j). So x̄(i + (j + 1)) = σ(x̄(i), x̄(j + 1)). Therefore P(j + 1) is true. So ∀i, j ∈ ℤ⁺, x̄(i + j) =
σ(x̄(i), x̄(j)) by induction. Hence x̄ is a semigroup homomorphism by Definition 17.1.8.
Homomorphisms which embed the integers in algebraic systems according to the general pattern of a unital
morphism give some basic insight into the structure of these systems. (The family relations between the
systems discussed here are illustrated in Figure 17.1.2.)
[Figure 17.1.2 Algebraic systems with unital morphisms: semigroup, monoid, group, ordered group, ring, ordered unitary ring, integral system, ℤ.]
causes information loss. In other words, all left and right actions within a cancellative semigroup can
be reversed. This does not imply the existence of left and right inverses, but the existence of left and
right inverses would imply that the semigroup is cancellative. For example, the multiplicative semigroup
of positive integers is cancellative, but no element except for 1 has a left or right inverse. In other words,
left and right multiplication of positive integers by a fixed positive integer k can be reversed (because such
maps are injective), but there is no positive integer k⁻¹ which can achieve this reversal by left or right
multiplication respectively (unless k = 1).
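The distinction between cancellativity and invertibility in the multiplicative semigroup of positive integers can be spot-checked on a finite sample. This Python fragment is an illustrative sketch (the sample ranges are arbitrary), not part of the book.

```python
# On a finite sample of positive integers, left multiplication by k is
# injective (so cancellation works), yet no k > 1 has a multiplicative
# inverse among the positive integers.
sample = range(1, 50)

def left_mult_injective(k):
    # Injectivity of x |-> k * x on the sample.
    return len({k * x for x in sample}) == len(sample)

cancellative = all(left_mult_injective(k) for k in range(1, 10))
has_inverse = lambda k: any(k * x == 1 for x in sample)
```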
17.1.15 Definition:
A commutative semigroup is a semigroup (G, σ) which satisfies ∀g1, g2 ∈ G, σ(g1, g2) = σ(g2, g1).
17.1.17 Definition: The sum of a finite sequence f : ℕₙ → G, for a semigroup (G, σ) and n ∈ ℤ⁺, is the
element sₙ defined inductively by s₁ = f₁ and ∀j ∈ ℕₙ \ {n}, sⱼ₊₁ = σ(sⱼ, fⱼ₊₁).
17.1.18 Notation: ∑f, for a finite sequence f : ℕₙ → G, where n ∈ ℤ⁺ and G is a commutative
semigroup, denotes the sum of f.
The notations ∑ᵢ₌₁ⁿ fᵢ and ∑_{i∈ℕₙ} fᵢ are alternatives for ∑f.
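The inductive definition of the sum in Definition 17.1.17 is exactly a left fold of the semigroup operation over the sequence. The following Python sketch (illustrative only, not from the book) makes this explicit.

```python
# Left fold corresponding to Definition 17.1.17:
# s_1 = f_1 and s_{j+1} = sigma(s_j, f_{j+1}).
def semigroup_sum(f, sigma):
    s = f[0]
    for x in f[1:]:
        s = sigma(s, x)
    return s

# With the additive semigroup of integers this is the ordinary sum.
total = semigroup_sum([3, 1, 4, 1, 5], lambda a, b: a + b)
```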
17.2. Monoids
17.2.1 Remark: A monoid is a semigroup which has an identity element.
The monoid is an algebraic structure category between a semigroup and a group. (For more on monoids,
see MacLane/Birkhoff [106], page 39; Lang [104], page 3; Grove [86], page 1.) Since a monoid is a semigroup
which has an identity element, it is convenient to first give the definition and basic properties of identity
elements in semigroups.
17.2.4 Remark: Consequences of the existence of left and right identities in a semigroup.
One may define abstract left and right identities e_L, e_R for a semigroup (G, σ) to satisfy ∀g ∈ G, σ(e_L, g) =
g and ∀g ∈ G, σ(g, e_R) = g. Then σ(e_L, e_R) = e_R and σ(e_L, e_R) = e_L. So the left and right identities, if
they exist, must be the same unique identity element.
If a semigroup is only known to have a left identity, then this left identity is not necessarily a right identity.
Nor is it necessarily unique. (See Example 17.2.5.)
17.2.5 Example: Smallest semigroup which has a left identity but no right identity.
Let G = {e, g} such that e ≠ g. Define σ : G × G → G by σ(x, y) = y for all x, y ∈ G. Then σ(σ(x, y), z) = z
for all x, y, z ∈ G. Similarly, σ(x, σ(y, z)) = σ(y, z) = z for all x, y, z ∈ G. Therefore σ is an associative
operation on G.
Both e and g are left identities for (G, σ), but neither e nor g is a right identity. Since a semigroup with one
element necessarily has a two-sided identity, the smallest semigroup with a left identity but no right identity
has two elements. It is also the smallest semigroup with two different left identities.
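Example 17.2.5 is small enough to verify exhaustively. This Python fragment is an illustrative brute-force check, not part of the book: with σ(x, y) = y on a two-element set, both elements are left identities and neither is a right identity.

```python
# Brute-force verification of Example 17.2.5: sigma(x, y) = y on G = {e, g}.
G = ["e", "g"]
sigma = lambda x, y: y

# a is a left identity iff sigma(a, x) = x for all x; right identity likewise.
left_identities  = [a for a in G if all(sigma(a, x) == x for x in G)]
right_identities = [a for a in G if all(sigma(x, a) == x for x in G)]
```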
Another example of a monoid is the set ℤ⁺₀ of non-negative integers with the standard addition operation.
The set (ℤ⁺₀)ⁿ of non-negative integer n-tuples is also a monoid with componentwise addition for n ∈ ℤ⁺₀.
Similarly, the sets (ℚ⁺₀)ⁿ and (ℝ⁺₀)ⁿ of non-negative rational and real number n-tuples respectively are
monoids with componentwise addition.
17.3. Groups
17.3.1 Remark: Historical origins of groups.
According to Bell [214], page 234, the term "group" as an algebraic structure was invented by Galois. However,
Bell [213], page 278, says the following, apparently attributing the early development of group theory to
Cauchy (1789–1857).
[. . . ] the theory of substitutions, begun systematically by Cauchy, and elaborated by him in a long
series of papers in the middle 1840s, which developed into the theory of finite groups [. . . ]
According to Szekeres [266], page 27:
The concept of a group has its origins in the work of Évariste Galois (1811–1832) and Niels Henrik
Abel (1802–1829) on the solution of algebraic equations by radicals.
Bell [213], page 282, states that the transition from a concrete group (such as a group of permutations) to
an abstract group (defined only by labels for its elements and the multiplication table for the elements)
was made by Cayley (1821–1895).
This abstract point of view is that now current. It was not Cauchy's, but was introduced by Cayley
in 1854. Nor were completely satisfactory sets of postulates for groups stated till the first decade
of the twentieth century.
Bell [213], page 518, attributes the axiomatic approach to group theory to Dedekind (1831–1916).
Dedekind was one of the first to appreciate the fundamental importance of the concept of a group
in algebra and arithmetic. In this early work Dedekind already exhibited two of the leading charac-
teristics of his later thought, abstractness and generality. Instead of regarding a finite group from
the standpoint offered by its representation in terms of substitutions [. . . ], Dedekind defined groups
by means of their postulates [. . . ] and sought to derive their properties from this distillation of their
essence. This is in the modern manner: abstractness and therefore generality.
Hence it is not entirely clear who deserves credit for the invention and early development of group theory.
17.3.2 Definition: A group is a pair G −< (G, σ) such that σ : G × G → G satisfies the following.
(i) ∀g1, g2, g3 ∈ G, σ(g1, σ(g2, g3)) = σ(σ(g1, g2), g3). [associativity]
(ii) ∃e ∈ G, ∀g ∈ G, σ(e, g) = σ(g, e) = g. [existence of identity]
(iii) ∀g ∈ G, ∃g′ ∈ G, σ(g′, g) = σ(g, g′) = e. [existence of inverses]
usually used when the group is embedded in a two-operation algebraic structure such as a ring or field.
17.3.4 Definition: A trivial group is a group (G, σ) with G = {e} for some e.
17.3.6 Notation:
L^G_g, for g in a group (G, σ_G), denotes the function L^G_g : G → G defined by L^G_g : h ↦ σ_G(g, h).
R^G_g, for g in a group (G, σ_G), denotes the function R^G_g : G → G defined by R^G_g : h ↦ σ_G(h, g).
Lg and Rg are abbreviations for L^G_g and R^G_g respectively when the group G is implicit in the context. Then
Lg : G → G and Rg : G → G are defined by Lg : h ↦ gh and Rg : h ↦ hg respectively.
17.3.7 Definition: The function Lg in Notation 17.3.6 may be referred to as the left action by g (on G).
The function Rg may be referred to as the right action by g (on G).
17.3.9 Definition: An identity for a group G is any element e ∈ G such that eg = ge = g for all g ∈ G.
17.3.10 Remark: Possible origin of the notation e for group identity elements.
Probably the customary notation e for the identity in Definition 17.3.9 is a mnemonic for the German
word "Einheit", which means "unit" or "unity", because the identity in a multiplicatively notated group is
also called the "unit element" or "unity".
Proof: For part (i), suppose g ∈ G has two inverses h1, h2 ∈ G. Then h1g = gh1 = e and h2g = gh2 = e.
Therefore h2 = eh2 = (h1g)h2 = h1(gh2) = h1e = h1.
For part (ii), suppose g, h ∈ G satisfy gh = h. By Definition 17.3.2 (iii), there is an element h′ ∈ G with
hh′ = e. Then g = g(hh′) = (gh)h′ = hh′ = e by Definition 17.3.2 (i). So g = e.
For part (iii), suppose g, h ∈ G satisfy hg = h. By Definition 17.3.2 (iii), there is an element h′ ∈ G with
h′h = e. Then g = (h′h)g = h′(hg) = h′h = e by Definition 17.3.2 (i). So g = e.
For part (iv), let g, h ∈ G satisfy gh = e. By Definition 17.3.2 (iii), there is an element g′ ∈ G with
g′g = e = gg′. Then h = (g′g)h = g′(gh) = g′e = g′. So h is an inverse of g, and this inverse is unique by
part (i). Similarly, g is the inverse of h.
Part (v) follows from part (iv) and the observation that ee = e by Definition 17.3.2 (ii).
17.3.15 Remark: The existence and properties of inverses distinguish groups from semigroups.
Although left and right inverses always exist and are the same for a group, they may be different, non-unique
or even non-existent for semigroups. (See Remark 17.2.4.)
17.3.16 Theorem: Let G be a group. Then (g1g2)⁻¹ = g2⁻¹g1⁻¹ for all g1, g2 ∈ G.
Proof: Let g1, g2 ∈ G. Then (g2⁻¹g1⁻¹)(g1g2) = g2⁻¹(g1⁻¹g1)g2 = g2⁻¹eg2 = g2⁻¹g2 = e. Similarly,
(g1g2)(g2⁻¹g1⁻¹) = e. Hence (g1g2)⁻¹ = g2⁻¹g1⁻¹.
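Theorem 17.3.16 can be checked exhaustively in a small nonabelian group, where the order reversal actually matters. This Python fragment is an illustrative check (not from the book) using the group of permutations of {0, 1, 2} under composition.

```python
# Exhaustive check of (p o q)^{-1} = q^{-1} o p^{-1} in the symmetric group S_3.
from itertools import permutations

def compose(p, q):
    # (p o q)(i) = p[q[i]], with permutations stored as tuples.
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

ok = all(inverse(compose(p, q)) == compose(inverse(q), inverse(p))
         for p in permutations(range(3)) for q in permutations(range(3)))
```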
Proof: For part (i), closure follows from Theorem 10.5.4 (iii). Let f, g, h ∈ G. Then σ_L(σ_L(f, g), h) =
(f ∘ g) ∘ h = f ∘ (g ∘ h) = σ_L(f, σ_L(g, h)) by Theorem 10.4.18 (ii), which verifies associativity. Let e = id_S.
Then e satisfies Definition 17.3.2 (ii). Let f ∈ G. Then the function inverse f⁻¹ is in G by Theorem 10.5.9,
and σ_L(f⁻¹, f) = f⁻¹ ∘ f = id_S = e and σ_L(f, f⁻¹) = f ∘ f⁻¹ = id_S = e by the definition of the inverse
function. This verifies Definition 17.3.2 (iii). Hence (G, σ_L) is a group.
For part (ii), closure follows from Theorem 10.5.4 (iii). Let f, g, h ∈ G. Then σ_R(σ_R(f, g), h) = h ∘ (g ∘ f) =
(h ∘ g) ∘ f = σ_R(f, σ_R(g, h)) by Theorem 10.4.18 (ii), which verifies associativity. Definition 17.3.2 conditions
(ii) and (iii) follow as in part (i). Hence (G, σ_R) is a group.
Proof: For part (i), let g, h ∈ G. Then (L_{g⁻¹} ∘ Lg)(h) = g⁻¹gh = h. So L_{g⁻¹} ∘ Lg = id_G. Similarly,
Lg ∘ L_{g⁻¹} = id_G. So Lg : G → G is a bijection by Theorem 10.5.12 (iv).
Part (ii) may be proved similarly to part (i).
Part (iii) follows from Theorem 17.3.13 (ii).
Part (iv) follows from Theorem 17.3.13 (iii).
17.3.20 Remark: The set of all translations of a group by a group element is a group.
Let G be a group. For h ∈ G, define the map Lh : G → G by Lh : g ↦ hg. Then {Lh ; h ∈ G} is a group
under the operation of composition of maps. In other words, L_{h1} ∘ L_{h2} = L_{h1h2}.
Similarly, {Rh ; h ∈ G} is a group with Rh : G → G defined by Rh : g ↦ gh and the reverse map-composition
operation. In other words, the product of R_{h1} and R_{h2} in this group is R_{h2} ∘ R_{h1} = R_{h1h2}. Both left and right translation groups are presented
in Theorem 17.3.21.
Since these groups are sets of set-automorphisms, they are subgroups of the group of set-automorphisms Aut(G).
(See Section 17.6 for subgroups.)
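The composition rule L_{h1} ∘ L_{h2} = L_{h1h2} for left translations, described in Remark 17.3.20, can be verified exhaustively in a small concrete group. This Python fragment is an illustrative check (the choice of ℤ₅ under addition mod 5 is the example's, not the book's).

```python
# Check L_{h1} o L_{h2} = L_{h1+h2} for left translations of Z_5 (addition mod 5).
n = 5

def L(h):
    # The left translation g |-> h + g (mod n), stored as a dict.
    return {g: (h + g) % n for g in range(n)}

def compose(f1, f2):
    # (f1 o f2)(g) = f1(f2(g)).
    return {g: f1[f2[g]] for g in range(n)}

ok = all(compose(L(h1), L(h2)) == L((h1 + h2) % n)
         for h1 in range(n) for h2 in range(n))
```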
Proof: For part (i), closure follows from the equality σ_L(Lg, Lh)(x) = ghx = Lgh(x) for all x ∈ G.
Associativity follows from σ_L(Lf, σ_L(Lg, Lh)) = Lf ∘ (Lg ∘ Lh) = (Lf ∘ Lg) ∘ Lh = σ_L(σ_L(Lf, Lg), Lh) for
all f, g, h ∈ G by Theorem 10.4.18 (ii). Let e_L = id_G = Le, where e is the identity for G. Then e_L ∈ G_L and
σ_L(e_L, Lg) = σ_L(Lg, e_L) = Lg for all g ∈ G, which verifies Definition 17.3.2 (ii). For all g ∈ G, L_{g⁻¹} ∈ G_L
and σ_L(L_{g⁻¹}, Lg) = L_{g⁻¹} ∘ Lg = e_L = Lg ∘ L_{g⁻¹} = σ_L(Lg, L_{g⁻¹}), which verifies Definition 17.3.2 (iii). Hence
(G_L, σ_L) is a group.
For part (ii), closure follows from σ_R(Rg, Rh)(x) = xgh = Rgh(x) for all x ∈ G. Associativity follows
from σ_R(Rf, σ_R(Rg, Rh)) = (Rh ∘ Rg) ∘ Rf = Rh ∘ (Rg ∘ Rf) = σ_R(σ_R(Rf, Rg), Rh) for all f, g, h ∈ G by
Theorem 10.4.18 (ii). Let e_R = id_G = Re, where e is the identity for G. Then e_R ∈ G_R and σ_R(e_R, Rg) =
σ_R(Rg, e_R) = Rg for all g ∈ G, which verifies Definition 17.3.2 (ii). For all g ∈ G, R_{g⁻¹} ∈ G_R and
σ_R(R_{g⁻¹}, Rg) = Rg ∘ R_{g⁻¹} = e_R = R_{g⁻¹} ∘ Rg = σ_R(Rg, R_{g⁻¹}), which verifies Definition 17.3.2 (iii). Hence
(G_R, σ_R) is a group.
17.3.23 Remark: A commutative group is also known as a module without operator domain.
The operation of a commutative group is typically notated additively to distinguish it from a multiplication
operation on the same set. Commutative groups are also known as Abelian groups, named after Niels
Henrik Abel (1802–1829).
The conditions for a commutative group are logically equivalent to the conditions for a module without
operator domain in Definition 19.1.2. In the module role, a commutative group is thought of as a passive
set which can be acted on by active sets.
is not necessary to explicitly require that the identity be mapped to the identity, and inverses be mapped
to inverses. The extra conditions (ii) and (iii) in Definition 17.3.2 guarantee these extra properties of group
morphisms, as shown in Theorem 17.4.4.
Proof: For part (i), let g ∈ G1. Then φ(g) = φ(g e_{G1}) = φ(g)φ(e_{G1}) by Definition 17.4.1. So φ(e_{G1}) = e_{G2}
by Theorem 17.3.13 (iii).
For part (ii), let g ∈ G1. Then e_{G2} = φ(e_{G1}) = φ(g⁻¹g) = φ(g⁻¹)φ(g) by part (i). So φ(g⁻¹) = φ(g)⁻¹ by
Theorem 17.3.13 (iv).
17.4.7 Theorem: Let (G, σ) be a group with identity e ∈ G. For each g ∈ G, define the function
π_g : ℤ → G inductively as follows.
(i) π_g(0) = e.
(ii) ∀n ∈ ℤ⁺₀, π_g(n + 1) = σ(π_g(n), g).
(iii) ∀n ∈ ℤ⁻₀, π_g(n − 1) = σ(π_g(n), g⁻¹).
Then for all g ∈ G, π_g is the unique group homomorphism from the additive group ℤ to G with π_g(1) = g.
Proof: For g ∈ G and π_g : ℤ → G as defined, π_g(n1 + 0) = π_g(n1) = π_g(n1)π_g(0) for all n1 ∈ ℤ by
assumption (i). Suppose (as inductive hypothesis on n2 ∈ ℤ⁺₀) that π_g(n1 + n2) = π_g(n1)π_g(n2) for all
n1 ∈ ℤ⁺₀. Then π_g(n1 + (n2 + 1)) = π_g((n1 + n2) + 1) = π_g(n1 + n2)π_g(1) = (π_g(n1)π_g(n2))π_g(1) =
π_g(n1)(π_g(n2)π_g(1)) = π_g(n1)π_g(n2 + 1) for all n1 ∈ ℤ⁺₀ by the associativity of multiplication in G and
assumption (ii). Hence by induction on n2 ∈ ℤ⁺₀, π_g(n1 + n2) = π_g(n1)π_g(n2) for all n1, n2 ∈ ℤ⁺₀.
17.4.8 Definition: The unital group morphism generated by an element g of a group G is the map
π_g : ℤ → G which is defined in Theorem 17.4.7.
The nth power of g means the value π_g(n) of the unital group morphism generated by g.
17.4.9 Notation: gⁿ, for g in a group G and n ∈ ℤ, denotes the nth power of g.
(ii) Define ψ : G → G by ψ : g ↦ g⁻¹. Then ψ is a group isomorphism from (G, σ) to (G, σ̄).
Proof: For part (i), let f, g, h ∈ G. Then σ̄(f, σ̄(g, h)) = σ(σ(h, g), f) = σ(h, σ(g, f)) = σ̄(σ̄(f, g), h) by
Definition 17.3.2 (i). So σ̄ is associative. The identity e for (G, σ) is clearly also an identity for (G, σ̄). For
g ∈ G, let g⁻¹ be the inverse of g in (G, σ). Then σ̄(g⁻¹, g) = σ(g, g⁻¹) = e and σ̄(g, g⁻¹) = σ(g⁻¹, g) = e.
So g⁻¹ is also an inverse of g in (G, σ̄). Hence (G, σ̄) is a group.
For part (ii), let g, h ∈ G. Then ψ(σ(g, h)) = σ(g, h)⁻¹ = σ̄(h, g)⁻¹ = σ̄(g⁻¹, h⁻¹) by Theorem 17.3.16,
because the inverses of g and h are the same in (G, σ) as they are in (G, σ̄). So ψ(σ(g, h)) = σ̄(ψ(g), ψ(h)).
Therefore ψ is a group homomorphism by Definition 17.4.1. But ψ : G → G is a bijection (and it equals its
own inverse). Hence ψ is a group isomorphism.
17.4.12 Definition: The reverse-operation group of a group (G, σ) is the group (G, σ′), where σ′: G × G → G is defined by σ′: (g, h) ↦ σ(h, g).
17.4.13 Remark: Four basic kinds of isomorphisms from a group to its translation groups.
Any group G is isomorphic to the group of left translations of G, with essentially the same group operation. This is because L_{gh} = L_g ∘ L_h for all g, h ∈ G. If each g ∈ G is mapped to R_g, the isomorphism must reverse the operation order because R_{gh} = R_h ∘ R_g for g, h ∈ G. It follows from Theorem 17.4.11 that g may be mapped to L_{g⁻¹} if the operation order is reversed. This is due to the order reversal in Theorem 17.3.16. Consequently, if g is mapped to R_{g⁻¹}, there is a double reversal. Thus R_{(gh)⁻¹} = R_{g⁻¹} ∘ R_{h⁻¹}. These four cases are described in Theorem 17.4.14.
Proof: For part (i), let g, h ∈ G. Then φ_L(gh) = L_{gh} = L_g ∘ L_h = σ_L(L_g, L_h) = σ_L(φ_L(g), φ_L(h)). Hence φ_L is a group homomorphism from (G, σ) to (G_L, σ_L) by Definition 17.4.1. The surjectivity of φ_L follows from the definition of G_L. Let φ_L(g) = φ_L(h). Then L_g = L_h. So g = L_g(e) = L_h(e) = h, where e is the identity of G. So φ_L is injective. Hence φ_L is an isomorphism.
For part (ii), let g, h ∈ G. Then φ_R(gh) = R_{gh} = R_h ∘ R_g = σ_R(R_g, R_h) = σ_R(φ_R(g), φ_R(h)). Hence φ_R is a group homomorphism from (G, σ) to (G_R, σ_R) by Definition 17.4.1. The proof that φ_R is an isomorphism follows as for part (i).
For part (iii), define ψ: G_L → G_L by ψ: L_g ↦ L_{g⁻¹} for all g ∈ G. Then ψ is a group isomorphism from (G_L, σ_L) to the reverse-operation group of (G_L, σ_L) by Theorem 17.4.11 (ii), because L_g⁻¹ = L_{g⁻¹} for all g ∈ G. But the map g ↦ L_{g⁻¹} equals ψ ∘ φ_L. So it is a group isomorphism by part (i).
For part (iv), define ψ: G_R → G_R by ψ: R_g ↦ R_{g⁻¹} for all g ∈ G. Then ψ is a group isomorphism from (G_R, σ_R) to the reverse-operation group of (G_R, σ_R) by Theorem 17.4.11 (ii), because R_g⁻¹ = R_{g⁻¹} for all g ∈ G. But the map g ↦ R_{g⁻¹} equals ψ ∘ φ_R. So it is a group isomorphism by part (ii).
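The two composition rules used above, L_{gh} = L_g ∘ L_h and R_{gh} = R_h ∘ R_g, can be checked exhaustively on a small non-commutative group. The sketch below is an illustrative assumption, not from the text: it represents S₃ as permutation tuples and translations as dictionaries.

```python
from itertools import permutations

# The symmetric group S3: permutations p of (0,1,2), with product p*q = p∘q.
G = list(permutations(range(3)))
def op(p, q):
    return tuple(p[q[i]] for i in range(3))   # (p*q)(i) = p(q(i))

def L(g):   # left translation L_g: h -> g*h
    return {h: op(g, h) for h in G}
def R(g):   # right translation R_g: h -> h*g
    return {h: op(h, g) for h in G}

for g in G:
    for h in G:
        Lg, Lh, Lgh = L(g), L(h), L(op(g, h))
        Rg, Rh, Rgh = R(g), R(h), R(op(g, h))
        assert all(Lgh[x] == Lg[Lh[x]] for x in G)   # L_{gh} = L_g ∘ L_h
        assert all(Rgh[x] == Rh[Rg[x]] for x in G)   # R_{gh} = R_h ∘ R_g (order reversed)
print("translation composition rules verified on S3")
```

The second assertion makes the order reversal for right translations concrete: composing right translations swaps the factors, which is why the map g ↦ R_g reverses the operation order.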
A left-ordered group is a tuple (G, σ, <) such that < is a left-ordering on the group (G, σ).
A right-ordered group is a tuple (G, σ, <) such that < is a right-ordering on the group (G, σ).
An ordered group is a tuple (G, σ, <) such that < is a bi-ordering on the group (G, σ).
17.5.4 Definition: An Archimedean order on a group (G, σ) is a total order < on G which satisfies
∀g ∈ G, ∀h ∈ G⁺, ∃n ∈ ℤ⁺₀, g < hⁿ, where G⁺ = {x ∈ G; e_G < x} and e_G is the identity of G.
17.5.5 Definition: An Archimedean left-ordering of a group (G, σ) is a left-ordering < of (G, σ) such that < is an Archimedean order on (G, σ).
An Archimedean right-ordering of a group (G, σ) is a right-ordering < of (G, σ) such that < is an Archimedean order on (G, σ).
An Archimedean bi-ordering of a group (G, σ) is a bi-ordering < of (G, σ) such that < is an Archimedean order on (G, σ).
17.5.6 Remark: Application of ordered commutative groups to a minimalist metric function definition.
The minimalist definition of a point-to-point metric function in Section 37.1 requires the values of the metric
function to have a commutative addition operation and a total order. This is the motivation for Definitions
17.5.7 and 17.5.8. In the case of a standard non-negative metric, only the non-negative values of the ordered
commutative group are required.
17.5.7 Definition: An ordering of a commutative group (G, σ) is a total order < on G which satisfies ∀a, b, c ∈ G, (a < b ⇒ ac < bc).
17.5.8 Definition: An ordered commutative group is a tuple (G, σ, R) such that (G, σ) is a commutative group and R is an ordering of (G, σ).
17.5.9 Definition: An Archimedean ordering of a commutative group (G, σ) is an ordering < of (G, σ) such that < is an Archimedean order on (G, σ).
An Archimedean ordered commutative group is an ordered commutative group (G, σ, <) such that < is an Archimedean order on (G, σ).
17.5.10 Remark: Ordered group morphisms.
For brevity, only ordered group homomorphisms are defined in Definition 17.5.11. The other five kinds of
morphisms are almost always assumed to be automatically defined in terms of homomorphisms following the
pattern of Definition 17.4.1.
17.5.11 Definition: An ordered group homomorphism from an ordered group (G₁, σ₁, <₁) to an ordered group (G₂, σ₂, <₂) is a group homomorphism φ: G₁ → G₂ such that ∀g, h ∈ G₁, (g <₁ h ⇒ φ(g) <₂ φ(h)).
17.5.12 Theorem: Let G be an ordered group. For x ∈ G, let π_x: ℤ → G be the unital group morphism generated by x.
(i) π_x: ℤ → G is a group homomorphism for all x ∈ G.
(ii) If x > e_G, then ∀n ∈ ℤ⁺, π_x(n) > e_G, and ∀n ∈ ℤ⁻, π_x(n) < e_G.
(iii) If x < e_G, then ∀n ∈ ℤ⁺, π_x(n) < e_G, and ∀n ∈ ℤ⁻, π_x(n) > e_G.
(iv) π_x: ℤ → G is a group monomorphism for all x ∈ G \ {e_G}.
(v) π_x: ℤ → G is an ordered group monomorphism for all x ∈ G with x > e_G.
Proof: Part (i) follows from Theorem 17.4.7.
For part (ii), let x > e_G. Then π_x(1) = x > e_G = π_x(0). Suppose (as inductive hypothesis) that π_x(n) > e_G for some n ∈ ℤ⁺. Then π_x(n + 1) = π_x(n) x > π_x(n) e_G = π_x(n) because < is a right-ordering. So π_x(n + 1) > e_G. Therefore by induction, π_x(n) > e_G for all n ∈ ℤ⁺. Similarly, ∀n ∈ ℤ⁻, π_x(n) < e_G.
Part (iii) may be proved as for part (ii).
For part (iv), let x ∈ G \ {e_G}. Then either x < e_G or x > e_G. Suppose that x > e_G. Let n₁, n₂ ∈ ℤ with n₁ ≠ n₂. Then either n₁ < n₂ or n₁ > n₂. Suppose that n₁ < n₂. Then n₁ − n₂ < 0. So π_x(n₁ − n₂) < e_G by part (ii). Therefore π_x(n₁) = π_x(n₂ + (n₁ − n₂)) = π_x(n₂) π_x(n₁ − n₂) < π_x(n₂) e_G = π_x(n₂) because < is a right-ordering. So π_x(n₁) ≠ π_x(n₂), and similarly if x < e_G or n₁ > n₂. Therefore π_x is injective. Hence π_x: ℤ → G is a group monomorphism by Definition 17.4.1.
Part (v) follows from Definition 17.5.11 and the proof of part (iv).
17.6. Subgroups
17.6.1 Definition: A subgroup of a group (G, σ_G) is a group (H, σ_H) such that H ⊆ G and σ_H ⊆ σ_G.
17.6.2 Remark: The operation for a subgroup is a restriction of the operation of the group.
If (H, σ_H) is a subgroup of a group (G, σ_G) according to Definition 17.6.1, then σ_H = σ_G|_{H×H}. This follows from the fact that the domain of σ_H must be H × H.
In practice, it is usual to refer to the set H as a subgroup rather than the pair (H, σ_G|_{H×H}) because the only possible choice for the subgroup operation is so easily determined from the parent group's operation. Definition 17.6.3 is thus an informal alternative to Definition 17.6.1. Then Theorem 17.6.4 gives some fairly obvious (but useful) conditions for a subset of a group to be a subgroup in the sense of Definition 17.6.3.
17.6.3 Definition: A subgroup-set of a group (G, σ_G) is a set H such that (H, σ_H) is a subgroup of (G, σ_G), where σ_H = σ_G|_{H×H}.
A subgroup-set may be referred to simply as a subgroup.
17.6.4 Theorem: Let G = (G, σ_G) be a group. Let H be a set. Then H is a subgroup-set of G if and only if
(i) H ⊆ G,
(ii) ∀g₁, g₂ ∈ H, σ_G(g₁, g₂) ∈ H,
(iii) e_G ∈ H, and
(iv) ∀g ∈ H, g_G⁻¹ ∈ H, where g_G⁻¹ denotes the inverse of g in G.
Hence (H, σ_G|_{H×H}) is a subgroup of (G, σ_G) if and only if (i), (ii), (iii) and (iv) hold.
Proof: Assume that H is a subgroup-set of G according to Definition 17.6.3. Then H is a subset of G, which verifies (i), and (H, σ_H) is a subgroup of (G, σ_G), where σ_H = σ_G|_{H×H}. Since (H, σ_H) is a subgroup of (G, σ_G), it follows from Definition 17.6.1 that (H, σ_H) is a group.
Let g₁, g₂ ∈ H. Then σ_H(g₁, g₂) ∈ H because (H, σ_H) is a group. So σ_G(g₁, g₂) ∈ H because σ_H ⊆ σ_G. This verifies (ii).
H has an identity element e_H. Then e_H ∈ G and σ_H(e_H, e_H) = e_H. So σ_G(e_H, e_H) = e_H because σ_H ⊆ σ_G. Therefore e_H = e_G by Theorem 17.3.13 (ii). This verifies (iii).
Let g ∈ H. Let g_G⁻¹ ∈ G be the inverse of g in G, which is well defined because G is a group. Since (H, σ_H) is a group, g has an inverse g_H⁻¹ in H. Therefore
g_G⁻¹ = σ_G(g_G⁻¹, e_G)
  = σ_G(g_G⁻¹, e_H)
  = σ_G(g_G⁻¹, σ_H(g, g_H⁻¹))
  = σ_G(g_G⁻¹, σ_G(g, g_H⁻¹))
  = σ_G(σ_G(g_G⁻¹, g), g_H⁻¹)
  = σ_G(e_G, g_H⁻¹)
  = g_H⁻¹,
which implies that g_G⁻¹ ∈ H. This verifies (iv).
Now suppose that H is a set which satisfies (i), (ii), (iii) and (iv). Then H ⊆ G. Let σ_H = σ_G|_{H×H}. Then σ_H ⊆ σ_G. To satisfy Definitions 17.6.3 and 17.6.1, it remains to show that (H, σ_H) is a group. The algebraic closure of H follows from (ii). The associativity of σ_H follows from the associativity of σ_G. The identity e_G for G in (iii) is also an identity for σ_H. And the inverse g_G⁻¹ ∈ H in (iv) is an inverse for g within H. Therefore (H, σ_H) is a group by Definition 17.3.2. Hence H is a subgroup-set of G by Definitions 17.6.3 and 17.6.1.
It then follows directly from Definition 17.6.3 that (H, σ_G|_{H×H}) is a subgroup of (G, σ_G) if and only if (i), (ii), (iii) and (iv) hold.
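For a finite group, conditions (i) to (iv) of Theorem 17.6.4 translate directly into a mechanical check. The following is a minimal sketch, assuming a group given concretely by its element set, operation, identity and inverse function; all names are illustrative choices, not from the text.

```python
def is_subgroup_set(H, G, op, e, inv):
    """Check conditions (i)-(iv) of Theorem 17.6.4 for a subset H of a finite group G."""
    return (H <= G                                        # (i)   H is a subset of G
            and all(op(a, b) in H for a in H for b in H)  # (ii)  closure under the operation
            and e in H                                    # (iii) contains the identity
            and all(inv(a) in H for a in H))              # (iv)  closed under inverses

# Sample group: the additive group Z/6.
G = set(range(6))
op = lambda a, b: (a + b) % 6
inv = lambda a: (-a) % 6

assert is_subgroup_set({0, 2, 4}, G, op, 0, inv)      # the even residues form a subgroup
assert not is_subgroup_set({0, 1}, G, op, 0, inv)     # not closed: 1 + 1 = 2 is missing
```

As the proof of the theorem shows, for a non-empty finite subset condition (ii) alone would in fact force (iii) and (iv), but checking all four conditions keeps the code in step with the statement.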
17.6.5 Definition: The group generated by a subset S of a group G is the subgroup of G whose set of elements is ⋂{G′ ∈ ℙ(G); S ⊆ G′ and G′ is a subgroup of G}.
17.6.6 Theorem: The group generated by a subset S of a group G is a subgroup of G which includes S.
Proof: Let G be a group. Let S ∈ ℙ(G). Let G₀ = ⋂{G′ ∈ ℙ(G); S ⊆ G′ and G′ is a subgroup of G}.
To show closure, let g₁, g₂ ∈ G₀. Then g₁, g₂ ∈ G′ for all subgroups G′ ∈ ℙ(G) with S ⊆ G′. So σ_{G′}(g₁, g₂) = σ_G(g₁, g₂) ∈ G′ for all subgroups G′ of G with S ⊆ G′. Therefore σ_G(g₁, g₂) ∈ G₀. Associativity follows from the fact that σ_{G₀} = σ_G|_{G₀×G₀}. All subgroups of G contain the identity e_G ∈ G. Therefore e_G ∈ G₀. Then e_G is also the identity for G₀ because σ_{G₀} = σ_G|_{G₀×G₀}. So one may write e_{G₀} = e_G. To show the existence of inverses, let g ∈ G₀. Then g has an inverse g⁻¹ ∈ G. But g also has an inverse in all subgroups G′ of G such that S ⊆ G′, and these inverses are all the same element of G because σ_{G′} = σ_G|_{G′×G′} for all such G′. Therefore g⁻¹ ∈ G₀. Hence G₀ is a subgroup of G which includes S.
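The intersection in Definition 17.6.5 is not directly computable, but for a finite group the same subgroup can be obtained from below, by closing the generating set under the operation and inverses. The closure algorithm below is a standard equivalent construction, not the book's definition, and the names are illustrative.

```python
def generated_subgroup(S, op, e, inv):
    """Smallest subgroup containing S in a finite group, built by iterated closure.
    This equals the intersection of all subgroups containing S (Definition 17.6.5)."""
    elems = {e} | set(S) | {inv(s) for s in S}
    while True:
        new = {op(a, b) for a in elems for b in elems} - elems
        if not new:
            return elems
        elems |= new

# Sample group: the additive group Z/12.
op = lambda a, b: (a + b) % 12
inv = lambda a: (-a) % 12

print(sorted(generated_subgroup({4}, op, 0, inv)))     # [0, 4, 8]
print(sorted(generated_subgroup({4, 6}, op, 0, inv)))  # [0, 2, 4, 6, 8, 10], since gcd(4, 6) = 2
```

Termination is guaranteed in a finite group because the element set grows strictly on each pass; the result is closed, contains the identity and the generators' inverses, and so satisfies Theorem 17.6.4.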
17.6.7 Example: The set of all permutations of a set X is clearly a group with map composition as
the group operation. (See Definition 14.7.2 for general permutations.) The set of all finite permutations
of X is clearly a subgroup of the set of all permutations. (See Definition 14.7.6 for finite permutations.)
All finite permutations are expressible as the composition of a finite sequence of two-point transpositions.
(See Theorem 14.7.11.) Thus the subgroup of finite permutations is generated by the set of all two-point
transpositions of any set X.
The map would be well defined if the left and right cosets of H yielded the same partition of G. Such a subgroup H is called a normal subgroup of G. For such a subgroup, the maps μ: G × X → X and μ′: X × G → X are well defined, and the values μ(g′, gH) and μ′(gH, g′) depend only on the coset of H to which g′ belongs. Therefore one may construct a well-defined operation σ′: X × X → X with σ′: (g₁H, g₂H) ↦ (g₁g₂)H. It turns out that (X, σ′) is a group. This is called the quotient group of G modulo H, and is given the notation G/H. This is presented more formally in Definitions 17.7.7 and 17.7.9.
17.7.6 Remark: The analogy of cosets to parallel translation in Euclidean space.
Cosets of subgroups have much in common with the parallel translation of lines and planes in Euclidean
space. The sets gH, for a subgroup H of a group G, may be thought of as parallel translates of H within G.
In particular, the non-intersection of distinct cosets resembles the notion in Euclidean space that parallel
lines never meet.
17.7.7 Definition: A normal subgroup of a group G is a subgroup H of G such that:
∀g ∈ G, ∀h ∈ H, ∃h′ ∈ H, gh = h′g.   (17.7.1)
17.7.8 Remark: The condition for a normal subgroup is equivalent to its mirror image.
Definition 17.7.7 for a normal subgroup H of G is equivalent to requiring that gH = Hg for all g ∈ G. To show that the mirror image of condition (17.7.1) in Definition 17.7.7 holds, note that for all g ∈ G and h ∈ H, hg = (g⁻¹h⁻¹)⁻¹ = (h′g⁻¹)⁻¹ = g(h′)⁻¹ for some h′ ∈ H, and (h′)⁻¹ ∈ H. Hence
∀g ∈ G, ∀h ∈ H, ∃h′ ∈ H, hg = gh′.
17.7.9 Definition: The quotient group of G modulo H, for any normal subgroup H = (H, σ_H) of a group G = (G, σ_G), is the group (G/H, σ_{G/H}), where G/H = {gH; g ∈ G} and σ_{G/H}: (g₁H, g₂H) ↦ (g₁g₂)H for all g₁, g₂ ∈ G.
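A quotient group can be computed explicitly for a small example. The sketch below is an illustrative assumption, not from the text: it takes G = S₃ with the normal subgroup of even permutations, represents cosets as frozensets, and checks that the coset operation of Definition 17.7.9 is well defined.

```python
from itertools import permutations

# The symmetric group S3 as permutation tuples, with product p*q = p∘q.
G = list(permutations(range(3)))
op = lambda p, q: tuple(p[q[i]] for i in range(3))

# The even permutations (the alternating subgroup) form a normal subgroup H.
H = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

coset = lambda g: frozenset(op(g, h) for h in H)   # the left coset gH
quotient = {coset(g) for g in G}
print(len(quotient))   # 2 cosets, since |G|/|H| = 6/3 = 2

# Well-definedness: (g1 H, g2 H) -> (g1 g2) H does not depend on coset representatives.
for g1 in G:
    for g2 in G:
        for a in coset(g1):
            for b in coset(g2):
                assert coset(op(a, b)) == coset(op(g1, g2))
```

The final loop is exactly the point of the normality condition: replacing g₁ and g₂ by any other representatives of the same cosets leaves the product coset unchanged.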
The identity of the direct product group in Definition 17.7.14 is the element e = (e₁, e₂), where e₁ and e₂ are the identities of G₁ and G₂ respectively. The inverse of (g₁, g₂) ∈ G is (g₁⁻¹, g₂⁻¹). Each of the groups G₁ and G₂ is called a direct factor of G.
17.8.5 Example: Subgroup with equal left and right conjugates but different left and right cosets.
It is possible for a subgroup S of a group G to satisfy ∀g ∈ G, gSg⁻¹ = g⁻¹Sg, but ∃h ∈ G, hS ≠ Sh. In other words, it is possible for a subgroup to have all left and right conjugates equal, but not all left and right cosets equal. In particular, such a subgroup is not a normal subgroup. These two propositions are equivalent to ∀g ∈ G, g²Sg⁻² = S and ∃h ∈ G, hSh⁻¹ ≠ S respectively. So an element h of order 2 would seem to be required for this example.
For X = ℤ², define the bijections τ_{i,j}: X → X and σ: X → X for i, j ∈ ℤ by τ_{i,j}: (x, y) ↦ (x + i, y + j) and σ: (x, y) ↦ (y, x). These are translation and coordinate-swap actions respectively. Define the group G as the set of actions {τ_{i,j}; i, j ∈ ℤ} ∪ {στ_{i,j}; i, j ∈ ℤ} with the operation of function composition, where στ_{i,j} denotes the composition σ ∘ τ_{i,j}.
Let H = {τ_{n,n}; n ∈ ℤ} ∪ {στ_{n,n}; n ∈ ℤ}. This is a subgroup of G. Let g = τ_{a,b} for any a, b ∈ ℤ. Then gHg⁻¹ = g⁻¹Hg for all a, b ∈ ℤ, but gH = Hg if and only if a = b.
17.8.6 Remark: Interpretation of left conjugate as push-forth and right conjugate as pull-back.
If G is a left transformation group (defined in Section 20.1), the left conjugate ghg⁻¹ can be interpreted as a push-forth operation because points are first pulled back from near g to near e by the operation g⁻¹, then the operation h is applied to the points, and finally the transformed points are pushed forth again to near their original location. By contrast, the right conjugate g⁻¹hg may be interpreted as a pull-back operation because points are first pushed from near e to near g by the operation g, then the operation h is applied to the points, and finally the transformed points are pulled back again to near their original location. In other words, left conjugation pushes the operation h from e to the point g, whereas right conjugation pulls the operation h from g back to e.
Figure 17.8.1 shows the idea that a left and right conjugate of an element h by an element g of a group G may be interpreted respectively as a push-forth of h from e to g, or a pull-back of h from g to e.
[Figure 17.8.1. The left conjugate ghg⁻¹ transports h from e to g; the right conjugate g⁻¹hg transports h from g to e.]
This is related to the concept of parallel transport. The element h is transported between the points e and g in the group. In some sense, the conjugate ghg⁻¹ is the result of transporting the operation h from e to g. If the group is commutative, all conjugates of h are equal to h, in which case one could say that there is an absolute parallelism. In other words, the space is flat.
17.8.7 Example: The right conjugate as a pull-back and the left conjugate as a push-forth.
Illustrated in Figure 17.8.2 is the example G = GL(2), the group of invertible linear transformations of ℝ², with g, h ∈ G defined by the left action of the matrices R_θ and S₀ respectively on column vectors in ℝ², where for all φ ∈ ℝ,
R_φ = [ cos φ  −sin φ ; sin φ  cos φ ]
S_φ = [ −cos 2φ  −sin 2φ ; −sin 2φ  cos 2φ ].
That is, R_φ rotates points of ℝ² by angle φ in an anti-clockwise direction, and S_φ reflects points through a line normal to the vector (cos φ, sin φ).
[Figure 17.8.2. g = R_θ, h = S₀, g⁻¹ = R_{−θ}. Left conjugate (push-forth of h by g): ghg⁻¹ = R_θ S₀ R_{−θ} = S_θ. Right conjugate (pull-back of h by g): g⁻¹hg = R_{−θ} S₀ R_θ = S_{−θ}.]
Figure 17.8.2 Left conjugate is a push-forth. Right conjugate is a pull-back.
The effect of the left conjugate ghg⁻¹, which has matrix R_θ S₀ R_{−θ} = S_θ, is to push forth the direction of reflection of S₀ by the angle θ to S_θ. In other words, the left conjugate by g rotates h forward in the direction of g. The effect of the right conjugate g⁻¹hg, which has matrix R_{−θ} S₀ R_θ = S_{−θ}, is to pull back the direction of reflection of S₀ by the angle θ to S_{−θ}.
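The two matrix identities in this example can be verified numerically. The sketch below (helper names are illustrative; only the standard library is used) builds the rotation matrix and the reflection through the line normal to (cos φ, sin φ), and checks both conjugation identities for a sample angle.

```python
import math

def mul(A, B):   # product of 2x2 matrices as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(A, B):   # entrywise comparison up to floating-point error
    return all(abs(A[i][j] - B[i][j]) < 1e-12 for i in range(2) for j in range(2))

def R(t):   # rotation by angle t, anti-clockwise
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def S(p):   # reflection through the line normal to (cos p, sin p)
    return [[-math.cos(2 * p), -math.sin(2 * p)],
            [-math.sin(2 * p), math.cos(2 * p)]]

theta = 0.7
# Left conjugate (push-forth): R_theta S_0 R_{-theta} = S_theta.
assert close(mul(mul(R(theta), S(0)), R(-theta)), S(theta))
# Right conjugate (pull-back): R_{-theta} S_0 R_theta = S_{-theta}.
assert close(mul(mul(R(-theta), S(0)), R(theta)), S(-theta))
print("conjugation identities verified")
```

Geometrically, conjugating the reflection S₀ by the rotation R_θ rotates its mirror line, which is exactly the push-forth and pull-back behaviour described above.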
17.8.8 Notation: S^g, for a subset S of a group G and g ∈ G, denotes the conjugate of S by g.
17.8.9 Remark: Properties of left and right conjugates.
From Notation 17.3.6, S^g = gSg⁻¹ = L_g(R_{g⁻¹}(S)) = R_{g⁻¹}(L_g(S)) = (L_g ∘ R_{g⁻¹})(S) = (R_{g⁻¹} ∘ L_g)(S). Similarly for the right conjugate, g⁻¹Sg = L_{g⁻¹}(R_g(S)) = R_g(L_{g⁻¹}(S)) = (L_{g⁻¹} ∘ R_g)(S) = (R_g ∘ L_{g⁻¹})(S).
17.8.10 Theorem: Let H be a subgroup of a group G. Then for all g ∈ G, gHg⁻¹ is a subgroup of G.
Proof: Let g₁, g₂ ∈ gHg⁻¹. Then g₁ = gh₁g⁻¹ and g₂ = gh₂g⁻¹ for some h₁, h₂ ∈ H. But then g₁g₂ = gh₁g⁻¹gh₂g⁻¹ = gh₁h₂g⁻¹, where h₁h₂ ∈ H because H is a group by Definition 17.6.1. So gHg⁻¹ is closed under multiplication. Let g₁ ∈ gHg⁻¹. Then g₁ = gh₁g⁻¹ for some h₁ ∈ H. Let h₂ = h₁⁻¹. Then h₂ ∈ H because H is a group. Let g₂ = gh₂g⁻¹. Then g₂ ∈ gHg⁻¹. But g₁g₂ = gh₁g⁻¹gh₂g⁻¹ = gh₁h₂g⁻¹ = geg⁻¹ = e, and similarly g₂g₁ = e. So every element of gHg⁻¹ has an inverse in gHg⁻¹, and e = geg⁻¹ ∈ gHg⁻¹ because e_H = e_G. Hence gHg⁻¹ is a subgroup of G.
17.8.11 Remark: Conjugation maps on groups.
Conjugation maps are automorphisms which are related to adjoint maps for Lie group elements acting on the Lie algebra of the group in Definition 60.7.10. For an adjoint map, the group element h is replaced by an infinitesimal group element, which means an element of the Lie algebra of the group. In effect, ghg⁻¹ is differentiated with respect to h.
17.8.12 Definition:
The (left) conjugation map of a group G by an element g ∈ G is the map h ↦ ghg⁻¹ for h ∈ G.
An inner automorphism of a group G is the same thing as a left conjugation map.
The right conjugation map of a group G by an element g ∈ G is the map h ↦ g⁻¹hg for h ∈ G.
17.8.13 Remark: Properties of left and right conjugation maps.
In terms of Notation 17.3.6, the left conjugation map by g ∈ G is L_g ∘ R_{g⁻¹} = R_{g⁻¹} ∘ L_g. The right conjugation map by g ∈ G is L_{g⁻¹} ∘ R_g = R_g ∘ L_{g⁻¹}.
17.8.14 Theorem: All conjugation maps of a group are automorphisms.
Proof: Let φ be the left conjugation map of a group G by g ∈ G. Then φ: h ↦ ghg⁻¹ maps h₁h₂ to gh₁h₂g⁻¹ = gh₁g⁻¹gh₂g⁻¹ = φ(h₁)φ(h₂). So φ is a group endomorphism by Definition 17.4.1. To see that φ: G → G is a bijection, let h ∈ G, and let h′ = g⁻¹hg. Then φ(h′) = gh′g⁻¹ = gg⁻¹hgg⁻¹ = h. So φ is surjective. Suppose that φ(h₁) = φ(h₂). Then gh₁g⁻¹ = gh₂g⁻¹. So g⁻¹(gh₁g⁻¹)g = g⁻¹(gh₂g⁻¹)g. Therefore h₁ = h₂. So φ is injective and is therefore a bijection. Hence φ is a group automorphism by Definition 17.4.1, and similarly for right conjugation maps.
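Theorem 17.8.14 can be checked exhaustively on a small non-commutative group. The example below is an illustrative assumption, not from the text: it uses the group of invertible affine maps x ↦ ax + b on ℤ/5 and verifies that a left conjugation map is a bijective homomorphism.

```python
# The non-commutative group of affine maps x -> a*x + b on Z/5, with a invertible.
p = 5
G = [(a, b) for a in range(1, p) for b in range(p)]

def op(f, g):                      # composition: (f∘g)(x) = a1*(a2*x + b2) + b1
    (a1, b1), (a2, b2) = f, g
    return ((a1 * a2) % p, (a1 * b2 + b1) % p)

def inv(f):                        # inverse map: x -> a^-1 * (x - b)
    a, b = f
    ai = pow(a, p - 2, p)          # a^-1 mod p, since p is prime
    return (ai, (-ai * b) % p)

def conj(g):                       # left conjugation map: h -> g h g^-1
    return lambda h: op(op(g, h), inv(g))

g = (2, 3)
phi = conj(g)
# phi is a bijection of G onto itself:
assert sorted(phi(h) for h in G) == sorted(G)
# phi preserves the group operation:
assert all(phi(op(h1, h2)) == op(phi(h1), phi(h2)) for h1 in G for h2 in G)
print("conjugation by", g, "is an automorphism")
```

The bijectivity check mirrors the surjectivity and injectivity arguments in the proof, while the final assertion is the endomorphism property.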
Chapter 18
Rings and fields
18.1. Rings
18.1.1 Remark: Rings are algebraic structures with both addition and multiplication.
A ring is a two-operation algebraic system, in contrast to semigroups and groups, which have only one
operation. A ring is a group with respect to the first operation, and a semigroup with respect to the second
operation. The first operation is thought of (and notated) as an addition operation. The second operation
is thought of (and notated) as a multiplication or product operation. The multiplication operation satisfies
a distributive condition with respect to the addition operation.
Figure 18.1.1 illustrates some relations between various categories of rings, including some relations to fields
and the real numbers and integers. (Not shown are the relations for Archimedean ordered rings and various
other combinations of axioms which are not presented in this book.)
18.1.3 Theorem: Let (R, σ, τ) be a ring. Let 0_R be the identity of the additive group (R, σ). Then
(i) ∀x ∈ R, τ(0_R, x) = τ(x, 0_R) = 0_R. (I.e. ∀x ∈ R, 0_R x = x 0_R = 0_R.)
(ii) ∀x, y ∈ R, τ(−x, −y) = τ(x, y). (I.e. ∀x, y ∈ R, (−x)(−y) = x y.)
(iii) ∀x, y ∈ R, τ(−x, y) = τ(x, −y) = −τ(x, y). (I.e. ∀x, y ∈ R, (−x) y = x (−y) = −(x y).)
Proof: To prove part (i), let x ∈ R. Then 0_R x = (0_R + 0_R) x since 0_R is the identity of the group (R, σ). But (0_R + 0_R) x = 0_R x + 0_R x by Definition 18.1.2 (iii). So 0_R x = 0_R x + 0_R x. Hence 0_R = 0_R x (by adding the additive inverse −(0_R x) of 0_R x to both sides of the equation and using the associativity of addition). It follows similarly that 0_R = x 0_R.
To prove part (ii), let x, y ∈ R. Then 0_R = x 0_R = x (y + (−y)) = x y + x (−y) by Definition 18.1.2 (iii). Similarly, 0_R = 0_R (−y) = ((−x) + x) (−y) = (−x)(−y) + x (−y). Therefore x y = (−x)(−y).
For part (iii), let x, y ∈ R. Then 0_R = x 0_R = x (y + (−y)) = x y + x (−y) by Definition 18.1.2 (iii). Therefore x (−y) = −(x y). Similarly 0_R = 0_R y = (x + (−x)) y = x y + (−x) y by Definition 18.1.2 (iii). Therefore (−x) y = −(x y).
18.1.5 Definition: A zero ring or trivial ring is a ring (R, σ, τ) such that R has only one element. Thus R = {0_R}, σ = {((0_R, 0_R), 0_R)} and τ = {((0_R, 0_R), 0_R)}.
A non-zero ring or non-trivial ring is a ring which is not a zero ring.
18.1.8 Definition: The subring generated by a subset S of a ring R is the subring of R whose set of elements is ⋂{A ∈ ℙ(R); S ⊆ A and A is a subring of R}.
18.1.14 Definition: A zero divisor in a ring R is an element a ∈ R \ {0_R} such that ∃b ∈ R \ {0_R}, ab = 0_R or ba = 0_R.
18.1.15 Definition: A cancellative ring or cancellation ring is a ring R which has both the left and right cancellation properties. In other words,
∀h ∈ R \ {0_R}, ∀g₁, g₂ ∈ R, (h g₁ = h g₂ ⇒ g₁ = g₂),   (18.1.1)
∀h ∈ R \ {0_R}, ∀g₁, g₂ ∈ R, (g₁ h = g₂ h ⇒ g₁ = g₂).   (18.1.2)
18.1.16 Theorem: A ring is a cancellative ring if and only if it has no zero divisors.
Proof: Let R be a cancellative ring. Suppose that a ∈ R is a zero divisor. Then a ∈ R \ {0_R} and either ab = 0_R or ba = 0_R for some b ∈ R \ {0_R}. If ba = 0_R, then ba = b 0_R although a ≠ 0_R, which contradicts line (18.1.1). If ab = 0_R, then ab = 0_R b although a ≠ 0_R, which contradicts line (18.1.2). So no zero divisors can exist.
Conversely, suppose that R contains no zero divisors. Let h ∈ R \ {0_R} and g₁, g₂ ∈ R. Suppose that hg₁ = hg₂ with g₁ ≠ g₂. Then h(g₁ − g₂) = 0_R, but h ≠ 0_R and g₁ − g₂ ≠ 0_R, which implies that h is a zero divisor. Similarly, if g₁h = g₂h and g₁ ≠ g₂, then h is a zero divisor. Therefore both lines (18.1.1) and (18.1.2) are valid, and so R is a cancellative ring.
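The equivalence of Theorem 18.1.16 can be illustrated on finite rings. In the sketch below (an illustrative example, not from the text), ℤ/6 has zero divisors and therefore fails cancellation, while ℤ/7 has neither defect.

```python
def zero_divisors(n):
    """Non-zero residues a with a*b = 0 (mod n) for some non-zero b."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]

def cancellative(n):
    """Left cancellation in Z/n: h*g1 = h*g2 with h != 0 implies g1 = g2.
    (Z/n is commutative, so right cancellation is the same condition.)"""
    return all((h * g1) % n != (h * g2) % n
               for h in range(1, n)
               for g1 in range(n) for g2 in range(g1 + 1, n))

print(zero_divisors(6))   # [2, 3, 4], e.g. 2*3 = 6 = 0 (mod 6)
print(cancellative(6))    # False, e.g. 2*0 = 2*3 (mod 6) but 0 != 3
print(zero_divisors(7))   # [], since Z/7 is a field
print(cancellative(7))    # True
```

This matches the theorem: the rings with an empty zero-divisor list are exactly the ones reported as cancellative.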
18.1.21 Remark: Unital morphisms for non-unitary rings are not well defined.
The unital group morphisms in Definition 17.4.8, applied to the additive group of a ring R, are not ring homomorphisms between ℤ and R in general. Let π_x: ℤ → R be the unital group morphism generated by x ∈ R for the additive group of R. Let n₁ ∈ ℤ. Then π_x(n₁ 0_ℤ) = π_x(0_ℤ) = 0_R by Theorem 17.4.7 (i). Suppose that n₂ ∈ ℤ⁺₀ satisfies π_x(n₁n₂) = π_x(n₁) π_x(n₂). Then π_x(n₁(n₂ + 1)) = π_x(n₁n₂ + n₁) = π_x(n₁n₂) + π_x(n₁) = π_x(n₁) π_x(n₂) + π_x(n₁). However, π_x(n₁) π_x(n₂ + 1) = π_x(n₁)(π_x(n₂) + π_x(1)) = π_x(n₁) π_x(n₂) + π_x(n₁) π_x(1). Therefore π_x(n₁(n₂ + 1)) = π_x(n₁) π_x(n₂ + 1) only if π_x(n₁) π_x(1) = π_x(n₁). This would be true for all n₁ ∈ ℤ⁺₀ if π_x(n₁) = 0_R for all n₁ ∈ ℤ⁺₀, or if π_x(1) is a multiplicative identity for R, which is not available in a non-unitary ring. (See Theorem 18.2.10 for a more positive outcome.)
The element e is called the multiplicative identity (element), unit element, unity or unity element of R.
18.2.5 Remark: The inconvenient fact that a zero ring is a unitary ring.
Unfortunately, a zero ring is a unitary ring. (See Definition 18.1.5.) It is easily verified that 0_R is a multiplicative identity for a zero ring. Therefore the zero ring case must be explicitly excluded whenever the inequality 0_R ≠ 1_R is required, which is almost always.
Proof: For part (i), note that since R is a unitary ring, it contains a unique additive identity 0_R and a unique multiplicative identity 1_R. Suppose that 0_R = 1_R. Since R is assumed to not be a zero ring, there is an element x of R such that x ≠ 0_R. Then 1_R x = x 1_R = x by Definition 18.2.2 (i), and 0_R x = x 0_R = 0_R by Theorem 18.1.3 (i). Hence x = 0_R, which is a contradiction. Therefore 0_R ≠ 1_R.
18.2.8 Definition: A unitary ring homomorphism from a unitary ring R₁ = (R₁, σ₁, τ₁) to a unitary ring R₂ = (R₂, σ₂, τ₂) is a ring homomorphism φ: R₁ → R₂ which satisfies
(iii) φ(1_{R₁}) = 1_{R₂}.
18.2.10 Theorem: Let (R, σ, τ) be a unitary ring with additive and multiplicative identities 0_R, 1_R ∈ R. Define the function φ: ℤ → R inductively as follows.
(i) φ(0_ℤ) = 0_R.
(ii) ∀n ∈ ℤ⁺₀, φ(n + 1) = σ(φ(n), 1_R).
(iii) ∀n ∈ ℤ⁻₀, φ(n − 1) = σ(φ(n), −1_R).
Then φ is the unique unitary ring homomorphism from the unitary ring ℤ to R.
Proof: Let (R, σ, τ) be a unitary ring. Then (R, σ) is a group by Definition 18.1.2 (i). So it follows from Theorem 17.4.7 that there is a unique group homomorphism φ: ℤ → R from the additive group of ℤ to (R, σ) which satisfies φ(1) = 1_R. Then φ is the unique map from ℤ to R which satisfies Definition 18.1.19 (i) and Definition 18.2.8 (iii).
Let n₁ ∈ ℤ. Then φ(n₁ 0_ℤ) = φ(0_ℤ) = 0_R. Now suppose (as inductive hypothesis) that n₂ ∈ ℤ⁺₀ satisfies φ(n₁n₂) = φ(n₁)φ(n₂). Then φ(n₁(n₂ + 1)) = φ(n₁n₂ + n₁) = φ(n₁n₂) + φ(n₁) = φ(n₁)φ(n₂) + φ(n₁)1_R = φ(n₁)(φ(n₂) + 1_R) = φ(n₁)φ(n₂ + 1) by the ring properties of ℤ and R and the definition of φ. So by induction, φ(n₁n₂) = φ(n₁)φ(n₂) for all n₂ ∈ ℤ⁺₀, and similarly for n₂ ∈ ℤ⁻₀. Thus Definition 18.1.19 (ii) is satisfied. Hence φ: ℤ → R is a unitary ring homomorphism, and it is unique by Theorem 17.4.7.
18.2.11 Definition: The unital (ring) morphism of a unitary ring R is the unique map φ: ℤ → R which is defined inductively by
(i) φ(0_ℤ) = 0_R,
(ii) ∀n ∈ ℤ⁺₀, φ(n + 1) = φ(n) + 1_R,
(iii) ∀n ∈ ℤ⁻₀, φ(n − 1) = φ(n) − 1_R.
18.2.13 Definition: The characteristic of a non-trivial unitary ring R is the least positive integer n ∈ ℤ⁺ such that φ(n) = 0_R, if such an n exists, or 0 otherwise, where φ is the unital morphism for R.
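The unital morphism and the characteristic translate directly into a computation. The sketch below is illustrative, not from the text; in particular the search limit is an artificial assumption, since the characteristic-0 case cannot be decided by a finite search in general.

```python
def unital_morphism(n1, zero, one, add, sub):
    """The unital ring morphism of Definition 18.2.11, evaluated at the integer n1
    by stepping up or down from 0 by the ring's multiplicative identity."""
    x = zero
    for _ in range(abs(n1)):
        x = add(x, one) if n1 > 0 else sub(x, one)
    return x

def characteristic(zero, one, add, sub, limit=1000):
    """Least n > 0 with phi(n) = 0 (Definition 18.2.13); 0 if none found below limit."""
    for n in range(1, limit):
        if unital_morphism(n, zero, one, add, sub) == zero:
            return n
    return 0

# The ring Z/12: phi(n) = n mod 12, so the characteristic is 12.
add12 = lambda a, b: (a + b) % 12
sub12 = lambda a, b: (a - b) % 12
print(characteristic(0, 1, add12, sub12))                              # 12
# The integers: phi(n) = n never returns to 0, so the characteristic is 0.
print(characteristic(0, 1, lambda a, b: a + b, lambda a, b: a - b))    # 0
```

For ℤ/12 the morphism simply reduces an integer mod 12, which makes the characteristic visible as the first multiple of 1 that wraps around to 0.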
18.2.14 Definition: A commutative unitary ring is a unitary ring which is also a commutative ring.
18.2.15 Definition: An integral domain is a non-trivial cancellative commutative unitary ring.
18.2.16 Example: The integers and Gaussian integers are integral domains.
The ring of Gaussian integers is a commutative unitary ring with unit (1, 0). This is a subring of the ring of complex numbers. Since the Gaussian integers satisfy the cancellation properties in Definition 18.1.15, they also constitute an integral domain. (The Gaussian integers contain no zero divisors because the complex numbers contain no zero divisors. So the ring is a cancellative ring by Theorem 18.1.16.)
Another well-known integral domain is the ring of integers. (See Section 14.3.) This is also easily proved to have the cancellative property since it contains no zero divisors, because it is a subring of the real numbers, which also have no zero divisors (because they constitute a field). However, neither the integers nor the Gaussian integers are division rings.
18.2.17 Definition: The ring of Gaussian integers is the ring R which is defined by
(i) R = ℤ²,
(ii) ∀x, y ∈ R, x + y = (x₁ + y₁, x₂ + y₂),
(iii) ∀x, y ∈ R, xy = (x₁y₁ − x₂y₂, x₁y₂ + x₂y₁).
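The arithmetic of Definition 18.2.17 is easy to transcribe, and the absence of zero divisors claimed in Example 18.2.16 can be spot-checked on a finite range. The function names below are illustrative, and the range check is only a sample, not a proof.

```python
# Gaussian integers as pairs (x1, x2), representing x1 + x2*i (Definition 18.2.17).
def gadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

def gmul(x, y):
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])

print(gmul((0, 1), (0, 1)))    # i*i = -1, i.e. (-1, 0)
print(gmul((1, 2), (3, -1)))   # (1 + 2i)(3 - i) = 5 + 5i, i.e. (5, 5)
assert gmul((1, 2), (1, 0)) == (1, 2)   # (1, 0) is the unit

# Spot-check of the no-zero-divisors property among small Gaussian integers:
rng = [(a, b) for a in range(-3, 4) for b in range(-3, 4) if (a, b) != (0, 0)]
assert all(gmul(x, y) != (0, 0) for x in rng for y in rng)
```

The spot-check succeeds for a structural reason: the norm x₁² + x₂² is multiplicative, so a product of non-zero Gaussian integers has non-zero norm.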
18.2.18 Remark: The complex numbers constitute an integral domain.
The complex number system ℂ = (ℂ, σ_C, τ_C) in Definition 16.8.1 is a field according to Definition 18.7.3, but in Theorem 18.2.19 it is only shown that it is an integral domain.
18.2.19 Theorem: The complex number system ℂ = (ℂ, σ_C, τ_C) is an integral domain.
C
Proof: The pair ( , C ) is a group in which the identity is 0C = (0, 0) and the additive inverse map is
(x, y) 7 (x, y). The addition operation C is commutative because addition in is commutative. Thus R
Definition 18.1.2 (i) is satisfied.
C C C
The multiplication operation C : is associative because by Definition 16.8.1 (ii),
C
(x1 , y1 ), (x2 , y2 ), (x3 , y3 ) ,
C ((x1 , y1 ), C ((x2 , y2 ), (x3 , y3 ))) = C ((x1 , y1 ), (x2 x3 y2 y3 , x2 y3 + y2 x3 ))
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
= (x1 (x2 x3 − y2 y3) − y1 (x2 y3 + x3 y2), x1 (x2 y3 + x3 y2) + y1 (x2 x3 − y2 y3))
= ((x1 x2 − y1 y2) x3 − (x1 y2 + y1 x2) y3, (x1 y2 + y1 x2) x3 + (x1 x2 − y1 y2) y3)
= τC((x1 x2 − y1 y2, x1 y2 + y1 x2), (x3, y3))
= τC(τC((x1, y1), (x2, y2)), (x3, y3)).
Thus (C, τC) is a semigroup. So Definition 18.1.2 (ii) is satisfied. The distributivity for Definition 18.1.2 (iii) is similarly easy (and tedious) to show. So C is a ring by Definition 18.1.2. Then (1, 0) is a multiplicative identity for C. So C is a unitary ring by Definition 18.2.2. The commutativity of τC is evident from Definition 16.8.1 (iii). So C is a commutative unitary ring by Definition 18.2.14. And finally, C is non-trivial by Definition 18.1.5 because it contains more than one element, and C is a cancellative ring because every non-zero element (x, y) has a multiplicative inverse (x′, y′) with x′ = x/(x² + y²) and y′ = −y/(x² + y²).
Hence C is an integral domain by Definition 18.2.15.
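The pair arithmetic in this proof is easy to check mechanically. The following sketch (an illustration only, using Python tuples of dyadic floats so that all products are exact; the function names cmul and cinv are not the book's notation) verifies the associativity computation and the inverse formula on sample values.

```python
# Complex numbers as ordered pairs (x, y) of reals, as in Definition 16.8.1.

def cmul(a, b):
    # Product rule: (x1, y1)(x2, y2) = (x1 x2 - y1 y2, x1 y2 + y1 x2).
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def cinv(a):
    # Multiplicative inverse of a non-zero pair (x, y).
    d = a[0] ** 2 + a[1] ** 2
    return (a[0] / d, -a[1] / d)

u, v, w = (1.0, 2.0), (3.0, -1.0), (-2.0, 0.5)
assert cmul(u, cmul(v, w)) == cmul(cmul(u, v), w)   # associativity
assert cmul((1.0, 1.0), cinv((1.0, 1.0))) == (1.0, 0.0)   # inverse gives (1, 0)
```

The dyadic test values avoid floating-point rounding, so the equalities hold exactly.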
18.3.3 Definition: An ordered ring is a tuple (R, σ, τ, O) where (R, σ, τ) is a ring and O is an ordering of the ring (R, σ, τ).
18.3.4 Remark: The choice of direction for the notation for the order relation of a ring.
It is convenient to denote the order O in Definition 18.3.2 as < and to denote O⁻¹ as >, together with the usual notations ≤ and ≥ for the corresponding weak order and its inverse respectively. In principle, one could denote O as >, but if R is a non-zero unitary ring, then 0R O 1R. Since one is accustomed to thinking of 0R as being less than 1R in a ring, it is best to denote O as <.
18.3.5 Remark: Equivalent conditions for the order condition for an ordered ring.
Condition (ii) in Definition 18.3.2 is expressed in terms of multiplication on the right by a fixed element c > 0.
This is equivalent to the same condition with multiplication on the left instead. (This is obvious if the ring
is commutative, but not so obvious if it is noncommutative.) It is also equivalent to the more symmetric
condition: ∀x, y ∈ R, ((x > 0 ∧ y > 0) ⇒ xy > 0). These equivalences are shown in Theorem 18.3.6.
18.3.6 Theorem: Let R be a ring with a total order < such that ∀x, y, a ∈ R, (x < y ⇒ a + x < a + y).
Then the following conditions are equivalent.
(i) ∀x, y, c ∈ R, ((x < y ∧ c > 0) ⇒ xc < yc).
(ii) ∀x, y, c ∈ R, ((x < y ∧ c > 0) ⇒ cx < cy).
(iii) ∀x, y ∈ R, ((x > 0 ∧ y > 0) ⇒ xy > 0).
Proof: Let R be a ring with a total order < which satisfies the indicated additive translation invariance.
Suppose that < satisfies condition (i). Let y, c ∈ R satisfy y > 0 and c > 0. Then 0 = 0c < yc. So yc > 0.
Hence ∀y, c ∈ R, ((y > 0 ∧ c > 0) ⇒ yc > 0), which is the same as condition (iii) under an exchange of dummy variables. Similarly, condition (ii) yields cy > 0 for all y, c ∈ R which satisfy y > 0 and c > 0, from which condition (iii) follows.
Now suppose that < satisfies (iii). Let x, y, c ∈ R satisfy x < y and c > 0. Let z = y − x. Then 0 = x − x < y − x = z by the addition-invariance of <. So z > 0 and c > 0. Therefore zc > 0 by (iii).
So yc = zc + xc > xc by addition-invariance. Therefore ∀x, y, c ∈ R, ((x < y ∧ c > 0) ⇒ xc < yc), which is the same as condition (i). Similarly, to show that (iii) implies (ii), swapping the dummy variables
c and z in the application of condition (iii) gives cz > 0 and cy = cz + cx > cx, from which it follows that ∀x, y, c ∈ R, ((x < y ∧ c > 0) ⇒ cx < cy), which is the same as condition (ii).
Thus (iii) implies both (i) and (ii), while conditions (i) and (ii) both individually imply (iii). Hence the three
conditions are pairwise equivalent as claimed.
18.3.7 Remark: The term isotonic for compatibility of an order with ring operations.
The term isotonic is used to describe conditions (i) and (ii) in Definition 18.3.2 by MacLane/Birkhoff [106], page 263, and Fuchs [73], pages 2, 4. Thus the total order is isotonic for addition, and isotonic for multiplication by positive factors.
The isotonic conditions are a strong constraint on the rings themselves. For example, an ordered unitary
ring which is not a zero ring must be order-isomorphic to the ring of integers if the positive elements are
well ordered. (See Remark 18.6.12.)
part (iv). So −((−x) + (−x)) < 0R. Hence x + x = −((−x) + (−x)) ≠ 0R.
To show part (vi), let x ∈ R \ {0R}. Then x > 0R or x < 0R. Suppose x > 0R. Then 0R x < x x by Definition 18.3.2 (ii). Hence x x > 0R. Suppose x < 0R. Then −x > 0R by part (ii). So (−x)(−x) > 0R similarly. But (−x)(−x) = x x by Theorem 18.1.3 (ii). So x x > 0R.
To show part (vii), let x, y ∈ R satisfy x > 0R and x < y. Then x x < x y by Definition 18.3.2 (ii). Similarly, x y < y y by Definition 18.3.2 (ii) since y > 0R because < is a total order. Therefore x x < y y because < is a total order.
To show part (viii), let x, y ∈ R satisfy x > 0R, y > 0R and x x = y y. Suppose that x ≠ y. Then x < y or x > y. Suppose that x < y. Then x x < y y by part (vii). This contradicts the assumption. Similarly x > y yields a contradiction x x > y y. Therefore x = y.
To show part (ix), let x, y ∈ R satisfy x > 0R, y < 0R and x x = y y. Let z = −y. Then z > 0R by part (ii), and z z = y y by Theorem 18.2.6 (v). So x > 0R, z > 0R and x x = z z. Therefore x = z by part (viii). Hence x = −y.
To show part (x), let x, y ∈ R satisfy x x = y y. If x = 0R, then 0R = 0R 0R = y y, and so y = 0R by part (vi). Hence x = y (and incidentally x = −y). Similarly, x = y (and x = −y) if y = 0R. Now suppose that x > 0R and y > 0R. Then x = y by part (viii). If x > 0R and y < 0R, then x = −y by part (ix).
Similarly, if x < 0R and y > 0R, then −x = y by part (ix). Lastly, if x < 0R and y < 0R, then −x > 0R and −y > 0R by part (ii), and so −x = −y by part (viii) because x x = (−x)(−x) and y y = (−y)(−y) by Theorem 18.2.6 (v). Hence x = y.
To show part (xi), let x, y ∈ R \ {0R}. Suppose that x > 0R and y > 0R. Then xy > 0R by Definition 18.3.2 (ii). So xy ≠ 0R because < is a total order. Similarly, if x < 0R or y < 0R, then xy < 0R or xy > 0R. In both cases, xy ≠ 0R. So there are no zero divisors.
For part (xii), let x, y, c ∈ R with xc < yc and c > 0. Then either x < y, x = y or x > y because < is a total order on R. Suppose that x = y. Then xc = yc, which contradicts the assumption xc < yc.
Suppose that x > y. Then xc > yc by Definition 18.3.2 (ii), which also contradicts the assumption xc < yc. Hence x < y.
For part (xiii), let x, y, c ∈ R with xc = yc and c > 0. Suppose that x < y. Then Definition 18.3.2 (ii) implies xc < yc, which contradicts the assumption xc = yc. Similarly, if y < x, then yc < xc, which also contradicts the assumption xc = yc. Therefore x = y. Now assume that xc = yc and c < 0. Let d = −c.
Then xd = x(−c) = −(xc) = −(yc) = y(−c) = yd and d > 0. So x = y again. Hence x = y always follows from xc = yc and c ≠ 0.
Part (xiv) follows from a double application of part (xiii).
Part (xv) follows from Definition 18.3.2 (ii) and Theorem 18.3.6 (i, iii).
For part (xvi), let x > 0R and y < 0R. Then −y > 0R by part (iii). So x(−y) > 0R by part (xv). But x(−y) = −(xy) by Theorem 18.1.3 (iii). So −(xy) > 0R. Therefore xy < 0R by part (iii).
For part (xvii), let x < 0R and y < 0R. Then −x > 0R and −y > 0R by part (iii). So (−x)(−y) > 0R by part (xv). But (−x)(−y) = xy by Theorem 18.1.3 (ii). Hence xy > 0R.
18.3.13 Definition: An ordered ring homomorphism from an ordered ring R1 −< (R1, σ1, τ1, <1) to an ordered ring R2 −< (R2, σ2, τ2, <2) is a ring homomorphism φ : R1 → R2 which satisfies
(iv) ∀x, y ∈ R1, x <1 y ⇒ φ(x) <2 φ(y).
18.3.14 Remark: Embedding the integers in an ordered ring as a line of points through the origin.
Theorem 18.3.15 for ordered rings is a strengthened form of Theorem 17.4.7 for groups, which in turn is a
strengthened form of Theorem 17.1.9 for semigroups. A consequence of the order on the ring is that the
unital group homomorphism in Definition 17.4.8 is promoted here to a monomorphism.
18.3.15 Theorem: Let R −< (R, σ, τ, <) be an ordered ring. Then the map n ↦ n.x is a group monomorphism with respect to the additive group operation on R for all x ∈ R \ {0R}.
In other words, the unital group morphism generated by x ∈ R \ {0R} is injective.
Proof: The assertion follows from Definition 18.3.2 (i) and Theorem 17.5.12 (iv).
18.3.17 Definition: A positive cone or positive subset for a ring R is a subset P of R which satisfies the following.
(i) 0R ∉ P.
(ii) ∀x ∈ R \ {0R}, (x ∈ P ⇔ −x ∉ P).
(iii) ∀x, y ∈ P, x + y ∈ P.
(iv) ∀x, y ∈ P, xy ∈ P.
18.3.19 Theorem: Let (R, σ, τ, <) be an ordered ring. Let P = {x ∈ R; 0R < x}. Then P is a positive cone for the ring (R, σ, τ).
Proof: Let (R, σ, τ, <) be an ordered ring. Let P = {x ∈ R; 0R < x}. Then 0R ∉ P because < is a total order on R. Let a ∈ R \ {0R}. If a ∈ P, then 0R < a. So −a < −0R = 0R by Theorem 18.3.10 (ii).
Hence 0R ≮ −a, by the antisymmetry of the total order <. So −a ∉ P. Similarly, if x ≠ 0R and x ∉ P, then −x ∈ P.
Let x, y ∈ P. Then 0R < x. So y < x + y by Definition 18.3.2 (i). So 0R < x + y, by the transitivity of a total order, because 0R < y. By Definition 18.3.2 (ii), 0R y < xy. So xy > 0R. Hence P satisfies all of the conditions of Definition 18.3.17.
18.3.20 Theorem: Let (R, σ, τ) be a ring with a positive cone P. Let the relation < be defined on R by x < y ⇔ y − x ∈ P for all x, y ∈ R. Then < is an ordering of the ring (R, σ, τ).
Proof: Let (R, σ, τ) be a ring with a positive subset P. Let < be the relation on R defined by x < y ⇔ y − x ∈ P for all x, y ∈ R. To show that < is a total order on R, let x, y ∈ R satisfy x ≮ y. Then y − x ∉ P.
So by Definition 18.3.17 (ii), either y − x = 0R or −(y − x) ∈ P. That is, either x = y or x − y ∈ P. So either x = y or y < x. But if x = y, then y − x ∉ P and x − y ∉ P by Definition 18.3.17 (i). So x ≮ y and y ≮ x. Hence < satisfies the antisymmetry and reflexivity conditions for a total order. To show transitivity, let x, y, z ∈ R with x < y and y < z. Then y − x ∈ P and z − y ∈ P. So (y − x) + (z − y) ∈ P by Definition 18.3.17 (iii). Thus z − x ∈ P and therefore x < z. Hence < is a total order on R.
Let a, b, c ∈ R with a < b. Then b − a ∈ P. So (b + c) − (a + c) ∈ P. So a + c < b + c.
Let a, b, c ∈ R with a < b and c > 0. Then b − a ∈ P and c ∈ P. So (b − a)c ∈ P. So bc − ac ∈ P. So ac < bc.
Hence all of the conditions of Definition 18.3.2 are satisfied. So < is an ordering of the ring (R, σ, τ).
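The correspondence between positive cones and orderings in Theorems 18.3.19 and 18.3.20 can be sanity-checked on a finite window of the ordered ring Z. This sketch is illustrative only; a finite range cannot, of course, test the universally quantified closure conditions in full.

```python
# Positive cone P = {x in Z : x > 0}, tested on a finite window of Z.
window = range(-10, 11)
P = {x for x in window if x > 0}

assert 0 not in P                                   # Definition 18.3.17 (i)
assert all((x in P) != (-x in P)                    # (ii): exactly one of x, -x
           for x in window if x != 0)
assert all(x + y in P for x in P for y in P if x + y <= 10)   # (iii)
assert all(x * y in P for x in P for y in P if x * y <= 10)   # (iv)

# Theorem 18.3.20: recover the order from the cone via x < y iff y - x in P.
assert all((x < y) == (y - x in P)
           for x in window for y in window if -10 <= y - x <= 10)
```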
18.3.21 Remark: Notations for positive and negative cones in ordered rings.
One may generalise the notations in Remark 1.5.1 for positive and negative subsets of the integers, rational numbers and real numbers to general ordered rings. (See also Notation 14.3.5.) Thus for any ordered ring R, one may introduce the following notations, where P is the positive cone of R.
R⁺ = {x ∈ R; x > 0R} = P.
R₀⁺ = {x ∈ R; x ≥ 0R} = P ∪ {0R}.
R⁻ = {x ∈ R; x < 0R} = R \ P \ {0R}.
R₀⁻ = {x ∈ R; x ≤ 0R} = R \ P.
18.4.2 Definition: An Archimedean ordering of a ring (R, σ, τ) is an ordering < of (R, σ, τ) such that < is an Archimedean order on the additive group (R, σ).
An Archimedean ordered ring is an ordered ring (R, σ, τ, <) such that < is an Archimedean ordering of (R, σ, τ).
and ∃m ∈ Z₀⁺, (y_m > 0 ∧ ∀ℓ > m, y_ℓ = 0). For such k and m (which are uniquely determined by x and y), it follows that (xy)_{k+m} = Σ^{k+m}_{j=0} x_{k+m−j} y_j = x_k y_m > 0, and (xy)_i = Σ^{i}_{j=0} x_{i−j} y_j = 0 for all i > k + m.
Therefore xy > 0. Hence (R, σ, τ, <) is an ordered ring by Definition 18.3.2 and Theorem 18.3.6.
18.4.5 Example: A non-Archimedean ordered ring with infinitesimals.
Let R = {x ∈ Z^(Z₀⁻); ∃k ∈ Z₀⁻, ∀j < k, x_j = 0}. Define σ : R × R → R by σ(x, y)_i = x_i + y_i for all x, y ∈ R and i ∈ Z₀⁻. Define τ : R × R → R by τ(x, y)_i = Σ^{0}_{j=i} x_{i−j} y_j for all x, y ∈ R and i ∈ Z₀⁻. Define a total order < on R by x < y if and only if ∃k ∈ Z₀⁻, (x_k < y_k ∧ ∀j > k, x_j = y_j). Then (R, σ, τ, <) is an ordered commutative unitary ring.
For k ∈ Z₀⁻, let ε^k denote the element y of this ring which satisfies y_k = 1 and y_j = 0 for j ∈ Z₀⁻ \ {k}. Then one may write x = Σ^{0}_{i=−∞} x_i ε^i for all x ∈ R. This ordered ring is non-Archimedean because 0 < n.ε⁻¹ < 1R for all n ∈ Z⁺. (See Definition 17.5.4 for the Archimedean property.) Thus ε⁻¹ is an infinitesimal in this ring. Similarly, −ε⁻¹ is a negative infinitesimal for this ring because −1R < −n.ε⁻¹ < 0 for all n ∈ Z⁺.
To verify that (R, σ, τ) is a unitary commutative ring, note that σ(x, y) ∈ R and τ(x, y) ∈ R for all x, y ∈ R because the ring Z is closed under addition and multiplication, and the sum Σ^{0}_{j=i} x_{i−j} y_j has only a finite number of non-zero terms for all i ∈ Z₀⁻. The proofs of associativity, commutativity and distributivity are elementary. It is clear that the element 1R = ε⁰ is the unit element of R.
To verify that (R, σ, τ, <) is an ordered ring, suppose that x, y ∈ R satisfy x < y, and that a ∈ R. Then ∃k ∈ Z₀⁻, (x_k < y_k ∧ ∀j > k, x_j = y_j). But a_i + x_i < a_i + y_i if and only if x_i < y_i, and a_i + x_i = a_i + y_i if and only if x_i = y_i, for all i ∈ Z₀⁻. So ∃k ∈ Z₀⁻, (a_k + x_k < a_k + y_k ∧ ∀j > k, a_j + x_j = a_j + y_j).
Therefore a + x < a + y. Now let x, y ∈ R satisfy x > 0 and y > 0. Then ∃k ∈ Z₀⁻, (x_k > 0 ∧ ∀j > k, x_j = 0) and ∃m ∈ Z₀⁻, (y_m > 0 ∧ ∀ℓ > m, y_ℓ = 0). For such k and m (which are uniquely determined by x and y), it follows that (xy)_{k+m} = Σ^{0}_{j=k+m} x_{k+m−j} y_j = x_k y_m > 0, and (xy)_i = Σ^{0}_{j=i} x_{i−j} y_j = 0 for all i > k + m.
Therefore xy > 0. Hence (R, σ, τ, <) is an ordered ring by Definition 18.3.2 and Theorem 18.3.6.
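The ring of Example 18.4.5 can be modelled concretely as finitely supported integer sequences indexed by non-positive integers, with the convolution product and leading-coefficient order described above. The sketch below (an illustration, not the book's notation; elements are dicts from indices to non-zero coefficients) confirms that n.ε⁻¹ stays strictly between 0 and 1R for every tested n, which is precisely the failure of the Archimedean property.

```python
# Elements: dicts mapping non-positive indices to non-zero integer coefficients.

def add(x, y):
    s = {i: x.get(i, 0) + y.get(i, 0) for i in set(x) | set(y)}
    return {i: c for i, c in s.items() if c != 0}

def mul(x, y):
    # Convolution product: (xy)_i = sum of x_{i-j} y_j.
    s = {}
    for i, a in x.items():
        for j, b in y.items():
            s[i + j] = s.get(i + j, 0) + a * b
    return {i: c for i, c in s.items() if c != 0}

def less(x, y):
    # x < y iff at the highest index where they differ, x's coefficient is smaller.
    diff = [i for i in set(x) | set(y) if x.get(i, 0) != y.get(i, 0)]
    return bool(diff) and x.get(max(diff), 0) < y.get(max(diff), 0)

one = {0: 1}      # 1_R = eps^0
eps1 = {-1: 1}    # the infinitesimal eps^(-1)

for n in range(1, 1000):
    n_eps = {-1: n}                               # n.eps^(-1)
    assert less({}, n_eps) and less(n_eps, one)   # 0 < n.eps^(-1) < 1_R

# The product matches the exponent rule: eps^(-1).eps^(-1) = eps^(-2).
assert mul(eps1, eps1) == {-2: 1}
```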
18.4.6 Remark: A ring where some elements have no square upper bound.
Example 18.4.5 can be made non-unitary by excluding the elements n.1R for n ∈ Z \ {0}. The ring elements x ∈ R in Example 18.4.7 have the form x = Σ^{−1}_{i=−∞} x_i ε^i. An interesting property of this pathological ring is that one cannot validly assert that ∀x ∈ R, ∃y ∈ R, x < y². A counterexample to this assertion is the ring element x = ε⁻¹. The square y² for any y ∈ R has the form y² = Σ^{−2}_{i=−∞} (y²)_i ε^i. The fact that the coefficient (y²)_{−1} = 0 follows from the equality ε⁻¹.ε⁻¹ = ε⁻². Consequently ∀y ∈ R, y² < ε⁻¹.
The existence of square upper bounds for all ring elements is shown in Theorem 18.4.8 for non-zero ordered rings which are either unitary or Archimedean.
and i ∈ Z⁻. Define τ : R × R → R by τ(x, y)_i = Σ^{−1}_{j=i+1} x_{i−j} y_j for all x, y ∈ R and i ∈ Z⁻. Define a total order < on R by x < y if and only if ∃k ∈ Z⁻, (x_k < y_k ∧ ∀j > k, x_j = y_j). Then (R, σ, τ, <) is an ordered commutative ring.
For k ∈ Z⁻, let ε^k denote the element y of this ring which satisfies y_k = 1 and y_j = 0 for j ∈ Z⁻ \ {k}.
Then one may write x = Σ^{−1}_{i=−∞} x_i ε^i for all x ∈ R. This ordered ring is non-Archimedean because 0 < n.ε⁻² < ε⁻¹ for all n ∈ Z⁺. (See Definition 17.5.4 for the Archimedean property.) Similarly, −ε⁻¹ < −n.ε⁻² < 0 for all n ∈ Z⁺.
To verify that (R, σ, τ) is a commutative ring, note that σ(x, y) ∈ R and τ(x, y) ∈ R for all x, y ∈ R because the ring Z is closed under addition and multiplication, and the sum Σ^{−1}_{j=i+1} x_{i−j} y_j has only a finite number of non-zero terms for all i ∈ Z⁻. The proofs of associativity, commutativity and distributivity are elementary.
To verify that (R, σ, τ, <) is an ordered ring, suppose that x, y ∈ R satisfy x < y, and that a ∈ R. Then ∃k ∈ Z⁻, (x_k < y_k ∧ ∀j > k, x_j = y_j). But a_i + x_i < a_i + y_i if and only if x_i < y_i, and a_i + x_i = a_i + y_i if and only if x_i = y_i, for all i ∈ Z⁻. So ∃k ∈ Z⁻, (a_k + x_k < a_k + y_k ∧ ∀j > k, a_j + x_j = a_j + y_j).
Therefore a + x < a + y. Now let x, y ∈ R satisfy x > 0 and y > 0. Then ∃k ∈ Z⁻, (x_k > 0 ∧ ∀j > k, x_j = 0) and ∃m ∈ Z⁻, (y_m > 0 ∧ ∀ℓ > m, y_ℓ = 0). For such k and m (which are uniquely determined by x and y), it follows that (xy)_{k+m} = Σ^{−1}_{j=k+m+1} x_{k+m−j} y_j = x_k y_m > 0, and (xy)_i = Σ^{−1}_{j=i+1} x_{i−j} y_j = 0 for all i > k + m.
Therefore xy > 0. Hence (R, σ, τ, <) is an ordered ring by Definition 18.3.2 and Theorem 18.3.6.
18.4.8 Theorem: Let (R, σ, τ, <) be an ordered ring which is unitary or Archimedean, and which is not the zero ring. Then ∀x ∈ R, ∃y ∈ R, x < y².
Proof: Let (R, σ, τ, <) be a non-zero unitary ordered ring. Let x ∈ R. If x < 0, then x < y² with y = 0.
If 0 ≤ x < 1, then x < y² with y = 1. If x = 1, then x < 4 = (1 + 1)². If x > 1, then x < x² by Definition 18.3.2 (ii) because 1 < x and x > 0. So x = 1.x < x.x = x².
Let (R, σ, τ, <) be a non-zero Archimedean ordered ring. Let x ∈ R. Let z ∈ R \ {0}. Then z² > 0 by Theorem 18.3.10 (vi). It follows from Definition 17.5.4 that x < n.z² for some n ∈ Z⁺. But n.z² ≤ n².z² because n ≤ n². Hence x < (n.z)², which is of the required form.
18.5.2 Definition: A pseudo-absolute value (function) on a ring (R, σR, τR), valued in an ordered ring (S, σS, τS, <), is a function ψ : R → S which satisfies the following.
(i) ∀x, y ∈ R, ψ(τR(x, y)) = τS(ψ(x), ψ(y)). (That is, ∀x, y ∈ R, ψ(xy) = ψ(x)ψ(y).)
(ii) ∀x, y ∈ R, ψ(σR(x, y)) ≤ σS(ψ(x), ψ(y)). (That is, ∀x, y ∈ R, ψ(x + y) ≤ ψ(x) + ψ(y).)
18.5.3 Remark: Necessary conditions for the definition of an absolute value function.
Let R be any ring, and let S be the integers Z with the usual ordering. Then ψ : R → S defined by ψ(x) = 1Z for all x ∈ R is a pseudo-absolute value function on R, valued in S. This dashes any hopes of proving that ψ(0R) = 0S in general.
Let R = S be any ordered ring, and define ψ : R → S by ψ(x) = x for all x ∈ R. Then ψ is a pseudo-absolute value function on R, valued in S. This dashes any hopes of proving that ψ(x) ≥ 0S in general.
These examples show that non-negativity does not guarantee ψ(0R) = 0S, and that ψ(0R) = 0S does not guarantee non-negativity. Therefore these must both be assumed in Definition 18.5.6. Even if both non-negativity and ψ(0R) = 0S are assumed, the pseudo-absolute value function could be trivial because ψ(x) = 0S for all x ∈ R defines a valid pseudo-absolute value function. Therefore positivity for non-zero elements of R must also be specified if that is desired.
Pseudo-absolute value functions do have some useful properties, as shown in Theorem 18.5.4.
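The two counterexamples in this remark can be checked directly. Assuming Z with its usual order for both rings, the constant function 1 and the identity function both satisfy the multiplicative and subadditive conditions of Definition 18.5.2, yet the first misses ψ(0R) = 0S and the second misses non-negativity. The check below is a sketch on a finite window.

```python
# Two pseudo-absolute values on Z, valued in Z (Definition 18.5.2).
window = range(-8, 9)

const_one = lambda x: 1     # constant function: psi(x) = 1
identity = lambda x: x      # identity function: psi(x) = x

for psi in (const_one, identity):
    # Multiplicativity and the triangle inequality hold for both functions.
    assert all(psi(x * y) == psi(x) * psi(y) for x in window for y in window)
    assert all(psi(x + y) <= psi(x) + psi(y) for x in window for y in window)

assert const_one(0) != 0                      # fails psi(0_R) = 0_S
assert any(identity(x) < 0 for x in window)   # fails non-negativity
```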
Therefore ψ(−x) = ψ(x) or ψ(−x) = −ψ(x) by Theorem 18.3.10 (x).
To show part (iii), assume ∀x ∈ R, ψ(x) ≥ 0S. Let y = 0R. Then ψ(y) = ψ(−y) because −0R = 0R by Definition 18.1.2 (i) and Theorem 17.3.13 (v). Now let y > 0R. Then ψ(y)ψ(y) = ψ(−y)ψ(−y), and so ψ(y) = ψ(−y) by Theorem 18.3.10 (viii).
To show part (iv), let ψ(x) < 0S for some x ∈ R. Then ψ(0R) = ψ(x + (−x)) ≤ ψ(x) + ψ(−x) by part (i).
So ψ(−x)ψ(−x) = ψ(x)ψ(x) > 0S. Therefore ψ(−x) = −ψ(x) by Theorem 18.3.10 (ix). So ψ(0R) ≤ ψ(x) + ψ(−x) = 0S.
Hence ψ(0R) ≤ 0S. In other words, the assumption ∃x ∈ R, ψ(x) < 0S implies ψ(0R) ≤ 0S. Therefore the assumption ψ(0R) > 0S implies ∀x ∈ R, ψ(x) ≥ 0S by modus tollens tollendo. (See Remark 4.8.15.)
To show part (v), let x, y ∈ R satisfy ψ(xy) = 0S. Then ψ(x)ψ(y) = 0S. Hence ψ(x) = 0S or ψ(y) = 0S by Theorem 18.3.10 (xi).
To show part (vi), let x, y ∈ R satisfy ψ(−x) = −ψ(x) and ψ(−y) = ψ(y). Then ψ((−x) y) = ψ(−x)ψ(y) = −ψ(x)ψ(y), and ψ(x (−y)) = ψ(x)ψ(−y) = ψ(x)ψ(y). But (−x) y = x (−y) by Theorem 18.1.3 (ii). So −ψ(x)ψ(y) = ψ(x)ψ(y). Therefore ψ(x)ψ(y) = 0S by Theorem 18.3.10 (v). So ψ(xy) = 0S. Hence ψ(x) = 0S or ψ(y) = 0S by part (v).
To show part (vii), let y ∈ R satisfy ψ(y) < 0S. Then ψ(−y) = ψ(y) by part (ii). So −ψ(y) > 0S by Theorem 18.3.10 (ii). Now assume that x ∈ R satisfies ψ(−x) ≠ ψ(x). Then ψ(−x) = −ψ(x) by part (ii).
Therefore ψ(x) = 0S or ψ(y) = 0S by part (vi). So ψ(x) = 0S by the assumption ψ(y) < 0S. So ψ(−x) = 0S by part (ii). Therefore ψ(−x) = ψ(x), which contradicts the assumption. Hence ∀x ∈ R, ψ(−x) = ψ(x).
Part (viii) follows directly from parts (iii) and (vii).
18.5.5 Remark: A minimalist absolute value function has its value in an ordered ring.
Absolute value functions on rings may be defined very generally in terms of orderings on rings. The order
relation in Definition 18.5.6 is the non-strict order corresponding to the strict order < on S.
18.5.6 Definition: An absolute value (function) on a ring (R, σR, τR), valued in an ordered ring (S, σS, τS, <), is a function ψ : R → S which satisfies the following.
(i) ∀x, y ∈ R, ψ(τR(x, y)) = τS(ψ(x), ψ(y)). (That is, ∀x, y ∈ R, ψ(xy) = ψ(x)ψ(y).)
(ii) ∀x, y ∈ R, ψ(σR(x, y)) ≤ σS(ψ(x), ψ(y)). (That is, ∀x, y ∈ R, ψ(x + y) ≤ ψ(x) + ψ(y).)
(iii) ψ(0R) = 0S.
(iv) ∀x ∈ R \ {0R}, ψ(x) > 0S.
18.5.7 Remark: Absolute value function definitions for special domains and ranges.
Absolute values are defined by MacLane/Birkhoff [106], page 264, for general non-trivial unitary rings, but
they require the ring R and the ordered ring S to be the same ring, and they specifically define the absolute
value of x to be max(x, −x). Absolute value functions are defined as real-valued functions on fields satisfying
the rules in Definition 18.5.6 by Lang [104], page 465, and Ash [49], page 220.
The independent choice of rings R and S is motivated by some quite familiar kinds of scenarios, such as
Example 18.5.8, where R is not a subset of S, and S is not a subset of R.
18.5.10 Theorem: Let ψ : R → S be an absolute value function on a ring R, valued in an ordered ring S.
(i) ψ is a pseudo-absolute value function on R, valued in S.
(ii) ∀x ∈ R, ψ(x) ≥ 0S.
(iii) ∀x ∈ R, ψ(−x) = ψ(x).
Proof: Part (i) follows from Definitions 18.5.2 and 18.5.6.
Part (ii) follows from Definition 18.5.6 (iii, iv).
Part (iii) follows from parts (i) and (ii) and Theorem 18.5.4 (iii).
18.5.11 Remark: Application of general absolute value function to define a general norm.
The general absolute value function in Definition 18.5.6 is used for the general norm on modules over rings
in Definition 19.5.2.
18.5.14 Definition: The standard absolute value (function) on an ordered ring (R, σ, τ, <) is the function ψ : R → R which satisfies
∀x ∈ R, ψ(x) = x if x ≥ 0, and ψ(x) = −x if x ≤ 0.
18.5.15 Theorem: The standard absolute value function on an ordered ring R is an absolute value function on R, valued in R.
Proof: Let ψ : R → R be the standard absolute value function on an ordered ring (R, σ, τ, <). Then Definition 18.5.6 (iii) is satisfied because ψ(0) = 0.
Let x ∈ R \ {0}. If x > 0, then ψ(x) = x > 0. If x < 0, then ψ(x) = −x > 0 by Theorem 18.3.10 (iii).
Therefore Definition 18.5.6 (iv) is satisfied.
Let x, y ∈ R with x ≥ 0 and y ≥ 0. Then xy ≥ 0 by Theorem 18.3.10 (xv). So ψ(xy) = xy = ψ(x)ψ(y).
Now let x ≥ 0 and y < 0. Then xy ≤ 0 by Theorem 18.3.10 (xvi). So ψ(xy) = −(xy) = x(−y) = ψ(x)ψ(y).
Similarly if x < 0 and y ≥ 0 then ψ(xy) = ψ(x)ψ(y). Finally let x < 0 and y < 0. Then xy > 0 by Theorem 18.3.10 (xvii). So ψ(xy) = xy = (−x)(−y) = ψ(x)ψ(y). Thus Definition 18.5.6 (i) is satisfied.
Let x, y ∈ R with x ≥ 0 and y ≥ 0. Then x + y ≥ 0. So ψ(x + y) = x + y = ψ(x) + ψ(y). Therefore ψ(x + y) ≤ ψ(x) + ψ(y). Now suppose that x ≥ 0 and y < 0. If x + y ≥ 0, then ψ(x + y) = x + y ≤ x − y = ψ(x) + ψ(y).
If x + y ≤ 0, then ψ(x + y) = −(x + y) ≤ x − y = ψ(x) + ψ(y). Thus ψ(x + y) ≤ ψ(x) + ψ(y) in either case.
Lastly let x < 0 and y < 0. Then ψ(x + y) = −(x + y) = ψ(x) + ψ(y). Thus ψ(x + y) ≤ ψ(x) + ψ(y) in every case. So Definition 18.5.6 (ii) is satisfied. Hence ψ is an absolute value function on R, valued in R, by Definition 18.5.6.
18.5.16 Theorem: Let ψ : R → R be the standard absolute value function on an ordered ring R.
(i) ∀x ∈ R, −ψ(x) ≤ x ≤ ψ(x). (I.e. ∀x ∈ R, −|x| ≤ x ≤ |x|.)
(ii) ∀x ∈ R, −ψ(x) ≤ −x ≤ ψ(x). (I.e. ∀x ∈ R, −|x| ≤ −x ≤ |x|.)
(iii) ∀x, y ∈ R, ψ(ψ(x) − ψ(y)) ≤ ψ(x − y). (I.e. ∀x, y ∈ R, ||x| − |y|| ≤ |x − y|.)
(iv) ∀x, y ∈ R, ψ(x) + ψ(y) = max(ψ(x + y), ψ(x − y)). (I.e. ∀x, y ∈ R, |x| + |y| = max(|x + y|, |x − y|).)
Proof: For part (i), let x ∈ R with x ≥ 0. Then ψ(x) = x and so x ≤ ψ(x). Also, −ψ(x) = −x ≤ x by Theorem 18.3.10 (iii). So −ψ(x) ≤ x ≤ ψ(x). Now suppose that x ≤ 0. Then −x ≥ 0. So −ψ(−x) ≤ −x ≤ ψ(−x) by the case x ≥ 0. So −ψ(−x) ≤ x ≤ ψ(−x) by Theorem 18.3.10 (ii). Therefore −ψ(x) ≤ x ≤ ψ(x) by Theorem 18.5.10 (iii). Thus −ψ(x) ≤ x ≤ ψ(x) in both cases.
Part (ii) follows from part (i) and Theorem 18.5.10 (iii). (Alternatively apply Theorem 18.3.10 (ii).)
For part (iii), let x, y ∈ R with x ≥ 0 and y ≥ 0. If x ≥ y then ψ(ψ(x) − ψ(y)) = ψ(x − y) = x − y ≤ ψ(x − y) by part (i). If x ≤ y then ψ(ψ(x) − ψ(y)) = ψ(x − y) = −(x − y) ≤ ψ(x − y) by part (ii).
Now suppose that x ≥ 0 and y < 0. If x + y ≥ 0 then ψ(ψ(x) − ψ(y)) = ψ(x + y) = x + y ≤ x − y ≤ ψ(x − y) by part (i). If x + y ≤ 0 then ψ(ψ(x) − ψ(y)) = ψ(x + y) = −(x + y) ≤ x − y ≤ ψ(x − y) by part (i).
Suppose that x < 0 and y ≥ 0. If x + y ≥ 0 then ψ(ψ(x) − ψ(y)) = ψ(−x − y) = x + y ≤ −x + y ≤ ψ(x − y) by part (ii). If x + y ≤ 0 then ψ(ψ(x) − ψ(y)) = ψ(−x − y) = −x − y ≤ −x + y ≤ ψ(x − y) by part (ii).
Suppose that x < 0 and y < 0. If −x + y ≥ 0 then ψ(ψ(x) − ψ(y)) = ψ(−x + y) = −x + y ≤ ψ(x − y) by part (ii). If −x + y ≤ 0 then ψ(ψ(x) − ψ(y)) = ψ(−x + y) = x − y ≤ ψ(x − y) by part (i). Thus ψ(ψ(x) − ψ(y)) ≤ ψ(x − y) in all cases.
For part (iv), let x, y ≥ 0. Then ψ(x) + ψ(y) = x + y = max(x + y, ψ(x − y)) = max(ψ(x + y), ψ(x − y)) because x + y ≥ ψ(x − y) since x + y ≥ x − y and x + y ≥ −(x − y). Let x ≥ 0 and y ≤ 0. Then ψ(x) + ψ(y) = x − y = max(ψ(x + y), x − y) = max(ψ(x + y), ψ(x − y)) because ψ(x + y) ≤ x − y since x + y ≤ x − y and −(x + y) ≤ x − y. Similarly if x ≤ 0 and y ≥ 0. Let x, y ≤ 0. Then ψ(x) + ψ(y) = −x − y = max(−x − y, ψ(x − y)) = max(ψ(x + y), ψ(x − y)) because −x − y ≥ ψ(x − y) since −x − y ≥ x − y and −x − y ≥ −(x − y).
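These identities are easy to test exhaustively over a finite window of the ordered ring Z, taking ψ to be the standard absolute value of Definition 18.5.14. The check below is a sketch, not a substitute for the proof.

```python
# Standard absolute value on Z and the identities of Theorem 18.5.16.
def psi(x):
    return x if x >= 0 else -x

window = range(-12, 13)
for x in window:
    assert -psi(x) <= x <= psi(x)            # part (i)
    assert -psi(x) <= -x <= psi(x)           # part (ii)
    for y in window:
        assert psi(psi(x) - psi(y)) <= psi(x - y)               # part (iii)
        assert psi(x) + psi(y) == max(psi(x + y), psi(x - y))   # part (iv)
```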
18.5.17 Remark: The standard absolute value functions for the real and complex numbers.
Since the real number system R is an ordered ring as a consequence of the basic properties in Theorem 15.9.2, and the absolute value function for R in Definition 16.5.2 agrees exactly with Definition 18.5.14, it follows from Theorem 18.5.15 that the absolute value function for R is a valid absolute value function for R, and the assertions of Theorem 18.5.16 are therefore applicable.
Not so straightforward is the absolute value function for complex numbers in Definition 16.8.8. A particular difference is that the complex number system (C, σC, τC) in Definition 16.8.1 is not an ordered ring, although it is a unitary ring according to Definition 18.2.2. However, the value of an absolute value function is required to lie in an ordered ring, which in this case is the real number system. This shows the motivation for defining absolute value functions in Definition 18.5.6 in terms of two rings, one for the domain and one for the range.
18.5.18 Theorem: The standard absolute value function for the complex numbers is an absolute value function on the ring of complex numbers, valued in the ordered ring R.
Proof: The complex number system C = (C, σC, τC) in Definition 16.8.1 is a ring by Theorem 18.2.19 and Definition 18.2.11. The standard absolute value function | · | for C in Definition 16.8.8 is valued in R, which is an ordered ring by Theorem 15.9.2 and Definition 18.3.3. The product rule in Definition 18.5.6 (i) follows from Theorem 16.8.9 (i). The triangle inequality in Definition 18.5.6 (ii) follows from Theorem 16.8.9 (ii). The non-negativity and positivity conditions in Definition 18.5.6 (iii, iv) are evident from the form of Definition 16.8.8.
Hence | · | is an absolute value function on the ring C.
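Assuming the usual modulus |(x, y)| = sqrt(x² + y²) for Definition 16.8.8, the product rule and triangle inequality of Definition 18.5.6 can be spot-checked on pairs; a small tolerance absorbs floating-point rounding. The pair representation follows Definition 16.8.1; the function names are illustrative only.

```python
import math

# Modulus of a complex number represented as a pair (x, y).
def cabs(a):
    return math.hypot(a[0], a[1])

def cmul(a, b):
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

pairs = [(1.0, 2.0), (-3.0, 0.5), (0.0, 0.0), (2.5, -4.0)]
for a in pairs:
    for b in pairs:
        # Product rule: |ab| = |a||b| (Definition 18.5.6 (i)).
        assert math.isclose(cabs(cmul(a, b)), cabs(a) * cabs(b), abs_tol=1e-12)
        # Triangle inequality: |a + b| <= |a| + |b| (Definition 18.5.6 (ii)).
        assert cabs((a[0] + b[0], a[1] + b[1])) <= cabs(a) + cabs(b) + 1e-12
```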
18.6.3 Theorem: Let (R, σ, τ, <) be an ordered ring. Let x ∈ R \ {0R} satisfy x x = x. Then x is an identity for R. Hence R is an ordered non-zero unitary ring.
Proof: Let (R, σ, τ, <) be an ordered ring. Let x ∈ R \ {0R} satisfy x x = x. Suppose y ∈ R satisfies xy ≠ y.
Then xy < y or xy > y. Suppose xy < y. If x > 0 then xxy < xy by Definition 18.3.2 (ii). So xy < xy, which is a contradiction. In the case that x < 0, note that xy < y implies that −y < −(xy) by Theorem 18.3.10 (ii).
So (−x)(−y) < (−x)(−(xy)). So xy < xxy by Theorem 18.1.3 (ii). So xy < xy, which is a contradiction.
Similar contradictions follow if it is assumed that xy > y. Therefore xy = y. Similarly, yx = y for all y ∈ R.
Hence x is a multiplicative identity for R.
18.6.4 Theorem: Let R −< (R, σ, τ, <) be an ordered unitary ring which is not a zero ring.
(i) 0R < 1R.
(ii) ∀x ∈ R \ {0R}, ∀λ ∈ R, (λx = x ⇒ λ = 1R).
Proof: To show part (i), note that 0R ≠ 1R by Theorem 18.2.6. So either 0R < 1R or 1R < 0R. Suppose that 1R < 0R. Then 0R < −1R by Definition 18.3.2 (i) with c = −1R. So −1R < 0R by Definition 18.3.2 (ii) with a = 1R, b = 0R and c = −1R. This contradicts the definition of a total order. Therefore 0R < 1R.
For part (ii), let x ∈ R \ {0R} and λ ∈ R. Suppose that λx = x. Then λx = 1R x. Therefore λ = 1R by Theorem 18.3.10 (xiii).
18.6.6 Theorem: Let R be a non-zero ordered unitary ring. Then the unital morphism for R is an ordered unitary ring monomorphism.
Proof: Let φ : Z → R be the unital morphism for R. (See Definition 18.2.11.) Then φ : Z → R is a unitary ring homomorphism by Theorem 18.2.10. But 1R ≠ 0R because R is a non-zero unitary ring. So φ is injective by Theorem 18.3.15. Therefore φ : Z → R is a unitary ring monomorphism.
For the additive ordered group structure of R, note that φ : Z → R is an ordered group monomorphism by Theorem 17.5.12 (v) because 1R > 0R by Theorem 18.6.4 (i). Hence φ : Z → R is an ordered unitary ring monomorphism.
18.6.7 Remark: Consequences of unitary ring structure for absolute value functions.
Absolute value functions on rings have some additional properties for unitary rings.
18.6.8 Theorem: Let ψ : R → S be an absolute value function on a non-zero unitary ring R, valued in an ordered ring S. Then S is a non-zero unitary ring and ψ(1R) = 1S.
Proof: Let ψ : R → S be an absolute value function, where R is a non-zero unitary ring and S is an ordered ring. Suppose ψ(1R) y ≠ y for some y ∈ S. Then either ψ(1R) y < y or ψ(1R) y > y.
Suppose ψ(1R) y < y. Then ψ(1R) ψ(1R) y < ψ(1R) y by Definition 18.3.2 (ii) and Definition 18.5.6 (iv).
But ψ(1R) ψ(1R) = ψ(1R) by Definition 18.5.6 (i). So ψ(1R) y < ψ(1R) y, which contradicts the total order on S. Similarly, the assumption ψ(1R) y > y yields a contradiction. Therefore ψ(1R) y = y. Similarly, y ψ(1R) = y for all y ∈ S. Hence ψ(1R) is a multiplicative identity for S. So S is a non-zero unitary ring and ψ(1R) = 1S.
18.6.10 Definition: The trivial absolute value (function) on a ring (R, σR, τR) with no zero divisors, valued in an ordered unitary ring S −< (S, σS, τS, <), is the function ψ : R → S which satisfies
∀x ∈ R, ψ(x) = 0S if x = 0R, and ψ(x) = 1S otherwise.
18.6.11 Theorem: Let ψ : R → S be the trivial absolute value function on a ring (R, σR, τR) with no zero divisors, valued in an ordered unitary ring (S, σS, τS, <). Then ψ is an absolute value function from R to S.
Proof: Let ψ : R → S be the trivial absolute value function on a ring R −< (R, σR, τR), valued in an ordered unitary ring S −< (S, σS, τS, <). Then ψ(0R) = 0S, which is part (iii) of Definition 18.5.6. Part (iv) follows from Theorem 18.6.4 (i).
To show part (i) of Definition 18.5.6, let x, y ∈ R. If x = 0R or y = 0R, then ψ(xy) = 0S = ψ(x)ψ(y). If x ≠ 0R and y ≠ 0R, then xy ≠ 0R because R has no zero divisors. So ψ(xy) = 1S = ψ(x)ψ(y).
18.6.12 Remark: Ordered integral domains.
Integral domains are defined as non-trivial cancellative commutative unitary rings in Definition 18.2.15.
So the ordered integral domains in Definition 18.6.13 are ordered rings which are non-trivial cancellative
commutative unitary rings if the order structure is removed. A prime example of an ordered integral domain
is the integer number system Z. The Gaussian integers in Definition 18.2.17 cannot be given a compatible
ordering.
A non-trivial ordered integral domain whose positive cone is well ordered is called an integral system in
Definition 18.6.14. The name is well justified by Theorem 18.6.15, which states that all integral systems
are ordered-unitary-ring-isomorphic to the ordered unitary ring of integers. (See also Pinter [118], page 210;
MacLane/Birkhoff [106], page 264.) Commutativity is not required in order to prove this isomorphism.
18.6.13 Definition: An ordered integral domain is an ordered ring (R, σ, τ, <) such that the underlying
ring (R, σ, τ) is an integral domain.
18.6.14 Definition:
An integral system is a non-trivial ordered unitary ring whose positive cone is well ordered.
18.6.15 Theorem: For every integral system R, there is an ordered unitary ring isomorphism from the
ordered unitary ring Z to R.
Proof: Let R be an integral system. Let φ: Z → R be the unital morphism in Definition 18.2.11. Then
φ is an ordered unitary ring monomorphism by Theorem 18.6.6. Let X = Range(φ).
Let x ∈ R \ X. Then −x ∈ R \ X, because otherwise φ(n) = −x for some n ∈ Z, and so φ(−n) = −φ(n) =
x ∈ X. Let P = {y ∈ R; y > 0R}. Then either x ∈ P or −x ∈ P because <R is a total order. Therefore
P \ X ≠ ∅. But P is well-ordered. So P \ X contains a minimum element z. But z ≠ 1R because z ∉ X
and 1R ∈ X. So either z < 1R or z > 1R. Suppose that z > 1R. Then z − 1R > 0R. So z − 1R ∈ P. Also
z − 1R ∉ X, since otherwise z = (z − 1R) + 1R ∈ X. But z − 1R < z, which then contradicts the assumption
that z = min(P \ X). So suppose that z < 1R. Then zz > 0R by Theorem 18.3.10 (vi) because z ≠ 0R. But
zz < 1R because z > 0R and z < 1R implies zz < z by Definition 18.3.2 (ii). Suppose that zz ∈ X. Then
0R < φ(n) < 1R for some n ∈ Z. This is impossible because φ is an ordered ring homomorphism. (So n ≤ 0
implies φ(n) ≤ 0R and n ≥ 1 implies φ(n) ≥ 1R.) Therefore zz ∈ P \ X and zz < z, which contradicts the
assumption that z = min(P \ X). So R \ X = ∅. Therefore φ is surjective. Hence φ is an ordered unitary
ring isomorphism from Z to R.
18.7. Fields
18.7.1 Remark: Division rings are almost fields, but lack commutativity.
The division rings in Definition 18.7.2 probably have more in common with fields than with general rings.
So it seems reasonable to present them together. The classic example of a non-commutative division ring is
the system of quaternions.
18.7.2 Definition: A division ring is a non-trivial unitary ring in which every non-zero element has a
multiplicative inverse.
in Remark 18.2.5, the zero ring is a unitary ring.) If the multiplicative semigroup (K, ) in condition (ii) is
not required to be commutative, the resulting system is a division ring. Thus a field is the same thing as a
commutative division ring. A field is also the same thing as a non-trivial commutative unitary ring in which
every non-zero element has a multiplicative inverse.
Proof: Let K −< (K, σ, τ) be a field. Then K is a ring by Definition 18.7.3 (i, ii, iv) and Definition 18.1.2,
and K is a non-trivial ring by Definition 18.7.3 (iii) since the group K0 must be non-empty. So #(K) ≥ 2.
Then K is a unitary ring by Definition 18.7.3 (iii) and Definition 18.2.2. Every non-zero element of K has
a multiplicative inverse by Definition 18.7.3 (iii). So K is a division ring by Definition 18.7.2. Hence K is a
commutative division ring. The proof of the converse is equally unsurprising.
Proof: For part (i), let a ∈ K. Then by Definition 18.7.3 (iv), a(0K + 0K) = a0K + a0K. But 0K + 0K = 0K
by definition of the zero of an additive group. So a0K = a(0K + 0K) = a0K + a0K. So a0K + (−(a0K)) =
a0K + a0K + (−(a0K)). So 0K = a0K + 0K = a0K. Hence a0K = 0K. Similarly, 0K a = 0K.
For part (ii), note that K \ {0K} is closed under multiplication by Definition 18.7.3 (iii). So if a ≠ 0K and
b ≠ 0K, then ab ≠ 0K.
Part (iii) follows from the fact that K is a unitary ring by Theorem 18.7.5 and Definition 18.7.2, and so
#(K) ≥ 2 by Theorem 18.2.6 (ii).
18.7.10 Definition: The characteristic of a field K is the positive integer min{n ∈ Z+; Σ_{i=1}^n 1K = 0K}
if {n ∈ Z+; Σ_{i=1}^n 1K = 0K} ≠ ∅, or the integer 0 otherwise.
18.7.11 Notation: char(K), for a field K, denotes the characteristic of K. In other words,
char(K) = min{n ∈ Z+; Σ_{i=1}^n 1K = 0K} if ∃n ∈ Z+, Σ_{i=1}^n 1K = 0K,
and char(K) = 0 otherwise.
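The characteristic can be computed directly from this definition for concrete finite examples. The following Python sketch (illustrative only; the search cutoff `max_n` is a practical stand-in for the "otherwise 0" clause, not part of the definition) computes char(K) for a field given its 1K, 0K and addition:

```python
def field_characteristic(one, zero, add, max_n=10_000):
    # Return min{n >= 1; n * 1_K = 0_K} as in Definition 18.7.10,
    # or 0 if no such n <= max_n is found.
    s = one
    for n in range(1, max_n + 1):
        if s == zero:
            return n
        s = add(s, one)
    return 0

# The field Z/7Z: addition modulo 7 gives characteristic 7.
assert field_characteristic(1, 0, lambda a, b: (a + b) % 7) == 7
# The rationals: no positive multiple of 1 equals 0, so the characteristic is 0.
assert field_characteristic(1, 0, lambda a, b: a + b) == 0
```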
Proof: Let K −< (K, σ, τ) be a field, and let K′ −< (K′, σ′, τ′) be a subfield of K. By Definition 18.7.13,
K′ ⊆ K, σ′ ⊆ σ and τ′ ⊆ τ. By Definition 18.7.3 (i, ii) and Definitions 17.1.1 and 17.3.2, Dom(σ′) = K′ × K′
and Dom(τ′) = K′ × K′. Therefore σ′ = σ↾(K′ × K′) and τ′ = τ↾(K′ × K′). This verifies (i), while (ii) follows from
Theorem 18.7.8 (iii).
To verify (iii) and (iv), first note that the additive and multiplicative identities 0K and 1K within K are equal
to the respective additive and multiplicative identities 0K′ and 1K′ within K′ because σ′ ⊆ σ and τ′ ⊆ τ.
Similarly the additive and multiplicative inverses −b and b⁻¹ within K are the respective additive and
multiplicative inverses within K′. Let a, b ∈ K′. Then σ(a, −b) is a well-defined element of K. But
σ(a, −b) = σ′(a, −b) because σ′ = σ↾(K′ × K′). Then σ(a, −b) ∈ K′ because Range(σ′) ⊆ K′ by Definitions
18.7.13 and 17.3.2. This verifies (iii). Similarly, if a ∈ K′ and b ∈ K′ \ {0K}, then τ(a, b⁻¹) is a well-defined
element of K which is also an element of K′ because τ′ = τ↾(K′ × K′). Then τ(a, b⁻¹) = τ′(a, b⁻¹) ∈ K′ by
Definitions 18.7.13 and 17.1.1. This verifies (iv).
Conversely, assume that the tuple (K′, σ′, τ′) satisfies (i), (ii), (iii) and (iv). The inclusions K′ ⊆ K,
σ′ ⊆ σ and τ′ ⊆ τ follow from (i). It follows from (ii) that K′ contains at least one element a. Then
0K = σ(a, −a) ∈ K′ by (iii). By (ii), it then follows that there exists b ∈ K′ with b ≠ 0K, and by (iv), it
follows that 1K = τ(b, b⁻¹) ∈ K′. Then σ′(a, 0K) = σ(a, 0K) = a and σ′(0K, a) = σ(0K, a) = a for all
a ∈ K′ by (i). So 0K has the properties required for an additive identity for (K′, σ′).
Let a ∈ K′. Then −a ∈ K satisfies σ(a, −a) = σ(−a, a) = 0K = 0K′. But −a = σ(0K, −a) ∈ K′ by (iii).
Therefore σ′(a, −a) = σ′(−a, a) = 0K′ by (i). Thus every element of K′ has an additive inverse. The closure
condition Range(σ′) ⊆ K′ follows by noting that σ(a, b) = σ(a, −(−b)) ∈ K′ by (iii). Therefore (K′, σ′)
is a commutative group by Definitions 17.3.2 and 17.3.22. (Note that commutativity and associativity are
inherited automatically by σ′ from σ.) This verifies Definition 18.7.3 (i).
Let a, b ∈ K′. If b = 0K, then τ′(a, b) = τ′(a, 0K) = τ(a, 0K) = 0K by (i) and Theorem 18.1.3 (i).
So τ′(a, b) ∈ K′. If b ≠ 0K, then b has a unique multiplicative inverse b⁻¹ in K. Then b⁻¹ = τ(1K, b⁻¹) ∈ K′
by (iv). So τ′(a, b) = τ′(a, (b⁻¹)⁻¹) = τ(a, (b⁻¹)⁻¹) ∈ K′ by (iv). Thus Range(τ′) ⊆ K′. The commutativity
and associativity of τ′ are inherited automatically from τ. Therefore (K′, τ′) is a commutative semigroup
by Definitions 17.1.1 and 17.1.15. This verifies Definition 18.7.3 (ii).
Let K′0 = K′ \ {0K′} = K′ \ {0K} and τ′0 = τ′↾(K′0 × K′0). Then τ′0 = τ↾(K′0 × K′0). Let a, b ∈ K′0. Then
τ′0(a, b) = τ′(a, b) = τ(a, b) ≠ 0K′ by Theorem 18.7.8 (ii). So Range(τ′0) ⊆ K′0. Therefore (K′0, τ′0) is a
semigroup by Definition 17.1.1 because τ′0 inherits associativity from τ. Since 1K ≠ 0K and 1K ∈ K′, it
follows that 1K ∈ K′0. The multiplicative identity property of 1K within K′0 is inherited from K. Let a ∈ K′0.
Then a has a unique multiplicative inverse a⁻¹ in K, and a⁻¹ = τ(1K, a⁻¹) ∈ K′ by (iv). So τ′0(a, a⁻¹) =
τ′(a, a⁻¹) = τ(a, a⁻¹) = 1K. Therefore τ′0(a⁻¹, a) = 1K by the commutativity of τ′0, which is inherited from
the commutativity of τ. So every element of K′0 has an inverse within K′0. Therefore (K′0, τ′0) is a group by
Definition 17.3.2. This verifies Definition 18.7.3 (iii).
Since the distributivity of (K′, σ′, τ′) is inherited from the distributivity of (K, σ, τ), Definition 18.7.3 (iv) is
verified. Hence (K′, σ′, τ′) is a field by Definition 18.7.3.
18.8.2 Remark: The non-necessity of defining some ring concepts for fields.
Strictly speaking, it is not necessary to re-define orderings and positive cones of fields because every field
is a ring. So Definitions 18.3.2, 18.3.3 and 18.3.17 effectively also define an ordering on a field, an ordered
field, and a positive cone of a field respectively. Nevertheless, the corresponding Definitions 18.8.3, 18.8.4
and 18.8.9 are rewritten here for fields.
18.8.4 Definition: An ordered field is a tuple (K, σ, τ, R) where (K, σ, τ) is a field and R is an ordering
of the field (K, σ, τ).
Proof: The proof of part (i) is identical to the proof of Theorem 18.6.4 (i) because every field is a unitary
ring which is not a zero ring, by Remark 18.7.4.
The proof of part (ii) is identical to the proof of Theorem 18.3.10 (ii) because every field is a ring.
To prove part (iii), note first that if x > 1K, then x > 0K. So x ≠ 0K, and therefore the multiplicative
inverse x⁻¹ is well defined and x⁻¹ ≠ 0K. Suppose that x⁻¹ < 0K. Then x⁻¹ x < 0K x by Definition 18.8.3 (ii).
So 1K < 0K, which contradicts part (i). So x⁻¹ > 0K. Therefore 1K < x implies 1K x⁻¹ < x x⁻¹ by
Definition 18.8.3 (ii). Hence x⁻¹ < 1K.
To prove part (iv), define the sequence (an)_{n=0}^∞ by a0 = 0K and a_{n+1} = an + 1K for n ∈ Z+0. Then
a1 = 1K > 0K by part (i). Suppose {i ∈ Z+; ai ≤ 0K} ≠ ∅. Then n = min{i ∈ Z+; ai ≤ 0K} is well defined
and n ≥ 2. So a_{n−1} > 0K and an ≤ 0K. Hence a_{n−1} > an = a_{n−1} + 1K. So 0K > 1K, which contradicts
part (i). Therefore {i ∈ Z+; ai ≤ 0K} = ∅. Hence the characteristic of K is 0 by Definition 18.7.10.
18.8.9 Definition: A positive cone or positive subset for a field K is a subset P of K which satisfies the
following.
(i) 0K ∉ P.
(ii) ∀x ∈ K \ {0K}, (x ∈ P ⇔ −x ∉ P).
(iii) ∀x, y ∈ P, x + y ∈ P.
(iv) ∀x, y ∈ P, xy ∈ P.
ordering of a field is not necessarily unique. Therefore the presence of the order relation (or the positive
cone) in the specification of an ordered field is not superfluous. Since every field is a unitary ring, these
comments apply equally to ordered rings and ordered unitary rings.
18.8.11 Example: Let K be the field (Q², σ, τ) with the following operations.
(1) ∀x, y ∈ Q², σ(x, y) = (x1 + y1, x2 + y2).
(2) ∀x, y ∈ Q², τ(x, y) = (x1y1 + 2Q x2y2, x1y2 + x2y1).
The element 0K = (0Q, 0Q) of K satisfies 0K + x = x + 0K = x for all x ∈ K. So this is the additive
identity of K. The element 1K = (1Q, 0Q) of K satisfies 1K x = x 1K = x for all x ∈ K. So this is the
multiplicative identity of K. The element x⁻¹ = (x1/(x1² − 2Q x2²), −x2/(x1² − 2Q x2²)) is the multiplicative
inverse of x ∈ K \ {0K}. (The remaining conditions for a field in Definition 18.7.3 may be easily verified.)
So K is a field.
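This field encodes Q(√2), with (x1, x2) representing x1 + x2√2. The arithmetic, including the multiplicative inverse formula, can be verified with exact rational arithmetic. The following Python sketch is an illustrative aside, not part of the book's text:

```python
from fractions import Fraction as Fr

# Operations on K = Q^2 from Example 18.8.11, with (x1, x2) ~ x1 + x2*sqrt(2).
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    return (x[0]*y[0] + 2*x[1]*y[1], x[0]*y[1] + x[1]*y[0])

def inv(x):
    # For x != (0, 0) the denominator x1^2 - 2*x2^2 is non-zero,
    # since sqrt(2) is irrational.
    d = x[0]*x[0] - 2*x[1]*x[1]
    return (x[0] / d, -x[1] / d)

one = (Fr(1), Fr(0))
x = (Fr(3), Fr(-2))           # represents 3 - 2*sqrt(2)
assert mul(x, one) == x       # (1, 0) is the multiplicative identity
assert mul(x, inv(x)) == one  # the inverse formula checks out
```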
Define P1 = {x ∈ Q²; φ1(x) = 1}, where
∀x ∈ Q², φ1(x) =
 0 if x = 0K,
 1 if x1 ≥ 0Q, x2 ≥ 0Q and x ≠ 0K,
 1 if x1 ≥ 0Q, x2 < 0Q and x1² > 2Q x2²,
 1 if x1 < 0Q, x2 ≥ 0Q and x1² < 2Q x2²,
 −1 otherwise.
Then P1 is a positive cone for K. However, let P2 = {x ∈ Q²; φ2(x) = 1}, where φ2(x) = φ1((x1, −x2)) for
all x ∈ Q². Then P2 is also a positive cone for K. But P1 ≠ P2.
18.8.13 Theorem: Let (K, σ, τ, <) be an ordered field. Let P = {x ∈ K; 0K < x}. Then P is a positive
cone for the field (K, σ, τ).
Proof: The proof of Theorem 18.8.13 is identical to the proof of Theorem 18.3.19 with the word ring
replaced by the word field. Alternatively, merely note that a field is a ring, and the definitions of an
ordering and a positive cone are identical for fields and rings.
18.8.14 Theorem: Let (K, σ, τ) be a field with a positive cone P. Let the relation < be defined on K
by x < y ⇔ y − x ∈ P for all x, y ∈ K. Then < is an ordering of the field (K, σ, τ).
Proof: The proof of Theorem 18.8.14 is identical to the proof of Theorem 18.3.20 with the word ring
replaced by the word field. Alternatively, merely note that a field is a ring, and the definitions of an
ordering and a positive cone are identical for fields and rings.
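The two-way correspondence of Theorems 18.8.13 and 18.8.14 can be checked on sample elements of Q. The following Python sketch is an illustration only, taking the standard positive cone of Q:

```python
from fractions import Fraction

def in_P(x):
    # Positive cone of Q induced by its ordering: P = {x; 0 < x} (Theorem 18.8.13).
    return x > 0

def less(x, y):
    # Ordering recovered from the positive cone: x < y iff y - x in P (Theorem 18.8.14).
    return in_P(y - x)

samples = [Fraction(n, 3) for n in range(-6, 7)]
# The recovered relation agrees with the native ordering of Q.
assert all(less(x, y) == (x < y) for x in samples for y in samples)
# Cone axioms (Definition 18.8.9 (iii, iv)): closure under addition and multiplication.
assert all(in_P(x + y) and in_P(x * y)
           for x in samples if in_P(x) for y in samples if in_P(y))
```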
18.8.17 Definition: An ordered field monomorphism from an ordered field K1 to an ordered field K2 is
an injective ordered field homomorphism φ: K1 → K2.
An ordered field epimorphism from an ordered field K1 to an ordered field K2 is a surjective ordered field
homomorphism φ: K1 → K2.
An ordered field isomorphism from an ordered field K1 to an ordered field K2 is a bijective ordered field
homomorphism φ: K1 → K2.
An ordered field endomorphism of an ordered field K is an ordered field homomorphism φ: K → K.
An ordered field automorphism of an ordered field K is an ordered field isomorphism φ: K → K.
18.9.3 Definition: A complete ordered field is an ordered field in which every non-empty subset which is
bounded above has a least upper bound.
18.9.4 Theorem: The real number system (R, σ, τ, <) is a complete ordered field.
18.9.5 Remark: All complete ordered fields are isomorphic to the real numbers.
It is not too difficult to show that every complete ordered field is ordered-field-isomorphic to the field of
real numbers. (See Shilov [131], pages 36–38; Spivak [136], pages 509–511; Johnsonbaugh/Pfaffenberger [93],
pages 9–16, 24–25, 441, exercise 7.9. See Definition 18.8.17 for ordered field isomorphisms.) Therefore
one may say that there is only one complete ordered field. So there is no need to classify them! In other
words, the system of real numbers is fully axiomatised by the conditions for a complete ordered field. The
axiomatic approach to the real numbers, and also the Dedekind cut and Cauchy sequence approaches, are
presented, for example, by Eves [303], pages 180–181; Shilov [131], pages 3–21; Spivak [136], pages 487–512;
Thomson/Bruckner/Bruckner [145], pages 2–13; A.E. Taylor [141], pages 1–7; Rudin [125], pages 3–13. It is
the unnaturalness of the various representations of the real numbers, and their extraneous artefacts, which
make the axiomatic approach seem more attractive. (See Remark 15.3.4 for some alternative representations.)
However, the axiomatic approach lacks concreteness of meaning.
18.10.2 Definition: If X is a semigroup whose operation is written additively, the following operations
are defined for List(X) in addition to the operations in Definition 14.11.6:
(ii) The sum function Σ: List(X) → X defined for ℓ ∈ List(X) by
Σℓ = Σ_{i=0}^{length(ℓ)−1} ℓi.
This is well defined because addition in X is associative. The sum is understood to be left-to-right.
That is, the sum is ℓ0 + ℓ1 + . . . (See Definition 17.1.17.)
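The left-to-right convention matters in a non-commutative semigroup. The following Python sketch illustrates the sum function as a left fold; for simplicity it assumes an identity element so that the empty list sums to it, which goes slightly beyond the bare semigroup setting of the definition:

```python
from functools import reduce

def list_sum(xs, add, identity):
    # Left-to-right sum of a list in an additive semigroup
    # (Definition 18.10.2 (ii)): ((x0 + x1) + x2) + ...
    return reduce(add, xs, identity)

# String concatenation is associative but not commutative, so the
# left-to-right order of summation is visible in the result.
assert list_sum(["ab", "c", "d"], lambda a, b: a + b, "") == "abcd"
assert list_sum([1, 2, 3], lambda a, b: a + b, 0) == 6
```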
18.10.3 Definition: If X is a ring, then the following additional operations are defined for List(X) in
addition to the operations for a semigroup in Definition 18.10.2:
(i) The product function from X × List(X) to List(X) defined by
(xℓ)i = x ℓi for each index i.
18.11.2 Remark: Classes of sets closed under various set operations.
Definitions 18.11.7, 18.11.12, 18.11.15, 18.12.2 and 18.12.5 are useful in the development of measure theory.
Sets of sets which satisfy these definitions define their algebraic structures in terms of standard set operations
such as set union, set intersection and set difference.
It is difficult to find any two authors who agree on the definitions of the classes of sets which appear in
Section 18.11. They disagree both on the naming and on the precise details of their definitions. The purpose
of these definitions is to provide a foundation for the definition of measure and integration. The definitions
differ according to the following criteria for a class of subsets S of a set X.
(1) Whether the empty set and parent set are elements of S. Thus: ∅ ∈ S and/or X ∈ S and/or S ≠ ∅.
(2) Whether S is closed under simple set operations. Examples (for A, B ∈ S): A ∪ B ∈ S. A ∩ B ∈ S.
A \ B ∈ S. A △ B ∈ S. X \ A ∈ S.
(3) Whether S is closed under countable unions and intersections. Thus: ∩_{i=1}^∞ Ai ∈ S and/or ∪_{i=1}^∞ Ai ∈ S
for families (Ai)_{i=1}^∞ ∈ S^{Z+}.
(4) Whether S is closed under countable monotone unions and intersections. Thus: ∩_{i=1}^∞ Ai ∈ S for families
(Ai)_{i=1}^∞ ∈ S^{Z+} which satisfy Ai ⊇ Ai+1 for all i ∈ Z+, and/or ∪_{i=1}^∞ Ai ∈ S for families (Ai)_{i=1}^∞ ∈ S^{Z+}
which satisfy Ai ⊆ Ai+1 for all i ∈ Z+.
As in all algebra, one may pick any combination of axioms and investigate the systems which satisfy them.
Some of the systems are sometimes somewhat useful for something in some context somewhere for somebody.
18.11.3 Remark: Family tree of some classes of sets closed under some kinds of set operations.
A family tree for the finitely defined classes of sets in Definitions 18.11.4, 18.11.7, 18.11.12 and 18.11.15 is
illustrated in Figure 18.11.1.
[Figure 18.11.1 diagram: multiplicative class, semi-ring, ring]
18.11.4 Definition: A multiplicative class of subsets of a set X is a set S ∈ P(P(X)) such that
(i) S ≠ ∅.
(ii) ∀A, B ∈ S, A ∩ B ∈ S.
18.11.5 Remark: A possibly useless class of sets which is closed under intersection.
Definition 18.11.4 is given by EDM2 [109], 270.B, page 999, but it is not clear that it is useful for anything.
The requirement that S ≠ ∅ does not imply that ∅ ∈ S. The set {A} satisfies Definition 18.11.4 for any
A ∈ P(X). At the opposite extreme, it is clear that P(X) is a multiplicative class of subsets of X, for any
set X. It is equally clear that the set of all subsets of X which have fewer than n elements, for any n ∈ Z+0,
is also a multiplicative class of subsets of X.
One may obviously show by induction that a multiplicative class S of subsets of a set X includes all
intersections of non-empty finite families of elements of S. In other words, ∀n ∈ Z+, ∀(Ai)_{i=1}^n ∈ S^n, ∩_{i=1}^n Ai ∈ S.
A related kind of set-class is a class S of subsets of a set X which is closed under pairwise unions and
contains every singleton {x} for x X. This concept is used in the Kuratowski definition for finite sets in
Remark 13.9.7. This is a kind of pairwise-union span of the class of singletons. In general, one obtains a
useful set-class by at least insisting that some set of simple sets is included in it.
For any non-empty set X, let S = {A ∈ P(X); 1 ≤ #(X \ A) < ∞}. Then S is a multiplicative class of
subsets of X. If X is a singleton, then S = {∅}. If X is finite and non-empty, then S = P(X) \ {X}. If X
is infinite, then S contains only the complements in X of finite non-empty subsets of X. The set of finite
non-empty subsets of X is closed under pairwise unions. So S is closed under pairwise intersections.
18.11.9 Theorem: Let S be a semi-ring of sets. Then S is a multiplicative class of subsets of a set X if
and only if ∪S ⊆ X.
Proof: Let S be a semi-ring of sets. Let X be a set which satisfies ∪S ⊆ X. Then S ⊆ P(X) by
Theorem 8.4.14 (viii). In other words, S ∈ P(P(X)). Conditions (i) and (ii) of Definition 18.11.4 follow
directly from conditions (i) and (ii) of Definition 18.11.7. Hence S is a multiplicative class of subsets of X.
To show the converse, let S be a semi-ring of sets, and suppose that S is a multiplicative class of subsets
of X. Then S ∈ P(P(X)) by Definition 18.11.4. Hence ∪S ⊆ X by Theorem 8.4.14 (vii).
counter-productive and confusing.
From conditions (ii) and (iii) in Definition 18.11.12, it follows that A ∪ B ∈ S and A \ B ∈ S for all A, B ∈ S
because of the formulas A ∪ B = (A △ B) △ (A ∩ B) and A \ B = A △ (A ∩ B) for general sets A and B. It
follows that if S is a ring of sets, then S is a semi-ring of sets.
The set S = P(X) is a ring for any set X. In particular, {∅} is a ring.
Rudin [125], page 227, gives a slightly weaker definition of a ring of sets which does not require ∅ ∈ S.
However, the only ring which is excluded by this condition is the empty set S = ∅, which is not of very great
interest. Therefore the definitions are essentially identical.
MacLane/Birkhoff [106], page 485, requires a ring of sets to be closed under binary union and intersection.
They define a field of sets to be closed under binary union and intersection, and also the complement with
respect to the containing set. This is equivalent to Definition 18.11.15.
18.11.14 Theorem: A collection of sets S is a ring of sets if and only if the following conditions hold.
(i) ∅ ∈ S,
(ii) ∀A, B ∈ S, A ∪ B ∈ S,
(iii) ∀A, B ∈ S, A \ B ∈ S.
Proof: Let S be a ring of sets. Let A, B ∈ S. Then A ∪ B = (A △ B) △ (A ∩ B) by Theorem 8.3.4 (x), and
A \ B = A △ (A ∩ B) by Theorem 8.3.7 (v). So A ∪ B ∈ S and A \ B ∈ S by Definition 18.11.12 conditions
(ii) and (iii). Hence Theorem 18.11.14 conditions (i), (ii) and (iii) hold.
Suppose that S is a collection of sets which satisfies Theorem 18.11.14 conditions (i), (ii) and (iii). Let A, B ∈
S. Then A ∩ B = (A ∪ B) \ ((A \ B) ∪ (B \ A)) by Theorem 8.2.4 (i), and A △ B = (A \ B) ∪ (B \ A) by
Definition 8.3.2. So A ∩ B ∈ S and A △ B ∈ S. Hence S is a ring of sets.
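The set identities used in this proof can be verified exhaustively on a small universe. The following Python sketch (an illustrative aside; `^`, `&`, `|` and `-` denote △, ∩, ∪ and \ on Python sets) checks all pairs of subsets of a four-element set:

```python
from itertools import combinations

# Check the identities used in Theorem 18.11.14 on all pairs of subsets of {0,1,2,3}.
universe = range(4)
subsets = [frozenset(s) for r in range(5) for s in combinations(universe, r)]

for A in subsets:
    for B in subsets:
        assert A | B == (A ^ B) ^ (A & B)              # A ∪ B = (A △ B) △ (A ∩ B)
        assert A - B == A ^ (A & B)                    # A \ B = A △ (A ∩ B)
        assert A & B == (A | B) - ((A - B) | (B - A))  # A ∩ B from ∪ and \
        assert A ^ B == (A - B) | (B - A)              # A △ B from ∪ and \
```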
18.11.15 Definition: An algebra of subsets (or field of subsets) of a set X is a set S ⊆ P(X) such that
(i) S is a ring of sets,
(ii) X ∈ S.
18.11.16 Remark: Examples of algebras of sets, and some comments.
Given any ring of sets S, let X = ∪S. Then S is an algebra of subsets of X if and only if ∪S ∈ S. Hence a
ring of sets cannot always be upgraded to an algebra of subsets of a set simply by choosing the right superset.
However, any ring of sets which does contain its union is automatically upgradable.
The set S = P(X) is an algebra of subsets of X for any set X. In particular, the singleton {∅} is an algebra
of subsets of ∅.
Definition 18.11.15 is given by both S.J. Taylor [143], page 15, and EDM2 [109], 270.B, page 999. They both
give the names field and algebra, but EDM2 [109] gives additionally the name finitely additive class.
To be precise, EDM2 [109], 270.B, page 999, defines an algebra to be a non-empty subset S ⊆ P(X)
which is closed under finite unions and complements. Theorem 18.11.17 shows that this is equivalent to
Definition 18.11.15. The extension from binary unions to finite unions follows very easily by induction.
(Rudin [125], page 227, gives the name ring for a set-class which is closed under finite unions and comple-
ments, but does not explicitly require a non-empty class.)
18.11.17 Theorem: Let X be a set. Let S ⊆ P(X). Then S is an algebra of subsets of X if and only if
the following conditions hold.
(i) S ≠ ∅.
(ii) ∀A, B ∈ S, A ∪ B ∈ S.
(iii) ∀A ∈ S, X \ A ∈ S.
Proof: Let X be a set. Let S ⊆ P(X). Let S be an algebra of subsets of X. Then ∅ ∈ S. So (i) is
satisfied. Let A, B ∈ S. Then X \ A = X △ A ∈ S and X \ B = X △ B ∈ S by Definition 18.11.15 (ii) and
Definition 18.11.12 (iii). So A ∪ B = X △ ((X \ A) ∩ (X \ B)) ∈ S by Definition 18.11.12 (ii, iii). So part (ii)
is satisfied. To show part (iii), let A ∈ S and note that X \ A = X △ A ∈ S since X ∈ S.
To show the converse, let X be a set and let S ⊆ P(X) satisfy conditions (i), (ii) and (iii). To show that
S satisfies Definition 18.11.12 (i), choose some A ∈ S by part (i). Then X \ A ∈ S by part (iii), and
X = A ∪ (X \ A) ∈ S by part (ii). So ∅ = X \ X ∈ S by part (iii). This shows Definition 18.11.12 (i), and
also Definition 18.11.15 (ii) since X ∈ S. To show Definition 18.11.12 (ii), let A, B ∈ S. Then A ∩ B =
X \ ((X \ A) ∪ (X \ B)) is in S by parts (ii) and (iii). To show Definition 18.11.12 (iii), let A, B ∈ S. Then
A △ B = (A \ B) ∪ (B \ A) is in S by parts (ii) and (iii).
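For a small finite X the equivalence asserted by Theorem 18.11.17 can be confirmed by brute force over every class of subsets. The following Python sketch is an illustrative check, not part of the book's text:

```python
from itertools import combinations

X = frozenset(range(3))
subsets = [frozenset(s) for r in range(4) for s in combinations(sorted(X), r)]

def is_algebra(S):
    # Definition 18.11.15: a ring of sets (empty set in S, closed under
    # intersection and symmetric difference) which also contains X.
    return (frozenset() in S and X in S
            and all(A & B in S and A ^ B in S for A in S for B in S))

def satisfies_thm_18_11_17(S):
    # Theorem 18.11.17: S non-empty, closed under binary union and complement in X.
    return (len(S) > 0
            and all(A | B in S for A in S for B in S)
            and all(X - A in S for A in S))

# Exhaustive check over all 2^8 classes of subsets of X.
all_classes = [frozenset(c) for r in range(len(subsets) + 1)
               for c in combinations(subsets, r)]
assert all(is_algebra(S) == satisfies_thm_18_11_17(S) for S in all_classes)
```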
18.11.18 Remark: Survey of names for various classes of sets closed under various kinds of operations.
Table 18.11.2 shows some of the terminology in the literature for the set-class algebras in this book. The
σ-versions of these set-class algebras require closure under countably infinite operations, which are presented in
Section 18.12. It is not entirely clear how some authors are defining their conditions because they sometimes
don't state explicitly if their classes are required to be non-empty, and whether the empty set and the whole
set X are required to be in their classes.
18.11.19 Remark: Generation of closed classes of sets from given sets of sets.
As with most algebraic and other structure categories, it is possible to define rings and algebras of sets which
are generated by a given starting set. As usual, the method is to first show that a structure category is closed
under arbitrary intersections, and then define the structure generated by a given set as the intersection of
all structures which include that set.
18.11.20 Definition: The ring of sets generated by a set S0 is the intersection of all rings of sets S such
that S ⊆ P(∪S0) and S0 ⊆ S.
18.11.21 Remark: The ring of sets generated by a set of sets is well defined.
It must be checked that the ring of sets generated by a set S0 in Definition 18.11.20 is well defined. For any
set S0, the union ∪S0 and the power set P(∪S0) are well defined by the ZF axioms. Since P(∪S0) is a
ring of sets and S0 ⊆ P(∪S0), the set of rings of sets S such that S ⊆ P(∪S0) and S0 ⊆ S is non-empty.
Therefore the ring of sets generated by S0 is a well-defined set. The fact that this is a ring of sets follows
from the closure of rings of sets under arbitrary intersections.
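For a finite starting set, the generated ring can also be computed from below, by iteratively closing S0 under the ring operations. The following Python sketch is an illustrative construction (it uses ∩ and △ as the closure operations, matching the conditions of Definition 18.11.12 as used in this section):

```python
def generated_ring(S0):
    # Ring of sets generated by S0 (Definition 18.11.20), computed by closing
    # S0 together with the empty set under pairwise intersection and
    # symmetric difference.
    ring = {frozenset()} | {frozenset(A) for A in S0}
    while True:
        new = {op(A, B) for A in ring for B in ring
               for op in (frozenset.intersection, frozenset.symmetric_difference)}
        new -= ring
        if not new:
            return ring
        ring |= new

R = generated_ring([{1, 2}, {2, 3}])
assert frozenset({2}) in R          # {1,2} ∩ {2,3}
assert frozenset({1, 3}) in R       # {1,2} △ {2,3}
assert frozenset({1, 2, 3}) in R    # unions are generated too, via △ and ∩
```

The loop terminates because every generated set is a subset of ∪S0, so only finitely many new sets can appear.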
18.11.22 Definition: The algebra of sets generated by a subset S0 of a set X is the intersection of all
algebras S of subsets of X such that S0 ⊆ S.
18.12.5 Definition: A σ-algebra (or σ-field) of subsets of a set X is a set S ⊆ P(X) such that
(1) X ∈ S,
(2) ∀A, B ∈ S, A \ B ∈ S,
(3) ∀(Ai)_{i=0}^∞ ∈ S^{Z+0}, ∪_{i=0}^∞ Ai ∈ S.
[Diagram: multiplicative class, semi-ring, ring, σ-ring, algebra, σ-algebra]
Figure 18.12.1 Family tree of finitely and infinitely defined classes of sets
18.12.11 Definition: The σ-ring of sets generated by a set S0 is the intersection of all σ-rings of sets S
such that S ⊆ P(∪S0) and S0 ⊆ S.
18.12.12 Definition: The σ-algebra of sets generated by a subset S0 of a set X is the intersection of all
σ-algebras S of subsets of X such that S0 ⊆ S.
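For a finite X, every σ-algebra is an algebra, and the generated σ-algebra can be computed concretely as the set of unions of the atoms of the partition which S0 induces on X. The following Python sketch is an illustrative construction for the finite case only, not a general algorithm:

```python
from itertools import combinations

def generated_sigma_algebra(X, S0):
    # Sigma-algebra of subsets of a finite set X generated by S0
    # (Definition 18.12.12). For finite X it consists of all unions of the
    # atoms of the partition that S0 induces on X.
    X = frozenset(X)
    # A point's 'signature' records which generators contain it; points with
    # equal signatures cannot be separated, and together form one atom.
    atoms = {}
    for p in X:
        sig = tuple(p in A for A in S0)
        atoms.setdefault(sig, set()).add(p)
    atoms = [frozenset(a) for a in atoms.values()]
    return {frozenset().union(*c) if c else frozenset()
            for r in range(len(atoms) + 1) for c in combinations(atoms, r)}

S = generated_sigma_algebra({1, 2, 3, 4}, [{1, 2}, {2, 3}])
assert len(S) == 16              # atoms {1}, {2}, {3}, {4}: all subsets arise
assert frozenset({4}) in S       # complement of {1,2} ∪ {2,3}
```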
Chapter 19
Modules and algebras
[Diagram: group (G, σG); module (M, σM); left A-module (A, M, σM, µ)]
19.0.2 Remark: Modules and algebras are algebraic structures whose passive set is a commutative group.
Single-set algebraic structures such as semigroups, groups, rings and fields are presented in Chapters 17
and 18. Chapter 19 presents modules, which are two-set algebraic structures. Linear spaces, associative
algebras and Lie algebras are derived from basic modules by adding further algebraic operations. Linear
spaces are presented separately in Chapters 22–24 because of their extensive definitions and properties which
are required for differential geometry. Tensor spaces, which are linear spaces of maps which are constructed
from other linear spaces, are presented separately in Chapters 27–30.
Modules differ from transformation groups, which are also two-set algebraic structures, in that the passive
set of a module is required to be a commutative group and the action of a module's operator domain must be
linear in some sense. General transformation groups are treated separately in Chapter 20. (See Table 17.0.2
in Remark 17.0.3 for an overview of active and passive sets of algebraic systems.)
19.1.2 Definition: A module (without operator domain) is a commutative group, whose operation is
written additively.
19.1.4 Definition: The module of homomorphisms from a module M1 to a module M2 is the group
(Hom(M1, M2), σ), where Hom(M1, M2) is the set of group homomorphisms from M1 to M2, and σ is the
operation of pointwise addition on Hom(M1, M2).
19.1.5 Definition: The ring of endomorphisms of a module M is the ring (End(M), σ, ◦), where End(M)
is the set of group endomorphisms of M, σ is the operation of pointwise addition on End(M), and ◦ is the
composition operation of End(M).
19.1.6 Remark: Modules over sets are a generalisation of modules without operator domain.
Since Definition 19.1.7 requires no structure at all on the operator domain of a module, Definition 19.1.2 is the
special case of Definition 19.1.7 where the operator domain is the empty set. In this sense, Definition 19.1.2
is superfluous.
19.1.7 Definition: A (left) module over a set A, or (left) A-module, or (left) module with operator domain A,
is a tuple M −< (A, M, σM, µ) such that:
(i) M −< (M, σM) is a commutative group (written additively).
(ii) µ: A × M → M satisfies ∀a ∈ A, ∀x, y ∈ M, µ(a, σM(x, y)) = σM(µ(a, x), µ(a, y)).
In other words, ∀a ∈ A, ∀x, y ∈ M, a(x + y) = ax + ay.
A is said to be an operator domain of the module M.
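A minimal concrete instance of this definition can be checked mechanically. The following Python sketch (an illustrative example chosen here, not from the book) takes the operator domain A = Z acting on the commutative group M = Z × Z by componentwise scaling:

```python
# A minimal left A-module sketch (Definition 19.1.7): the operator domain
# A = Z acts on the commutative group M = Z x Z by componentwise scaling.
def sigma_M(x, y):
    # Group operation on M, written additively.
    return (x[0] + y[0], x[1] + y[1])

def mu(a, x):
    # Action of A on M.
    return (a * x[0], a * x[1])

# Condition (ii): mu(a, sigma_M(x, y)) = sigma_M(mu(a, x), mu(a, y)).
for a in range(-3, 4):
    for x in [(0, 1), (2, -5), (7, 7)]:
        for y in [(1, 1), (-4, 3)]:
            assert mu(a, sigma_M(x, y)) == sigma_M(mu(a, x), mu(a, y))
```

Note that condition (ii) is the only constraint on the action; no associativity or identity law for A is required at this level of generality.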
19.1.16 Remark: Properties of endomorphism rings.
The tuple (EndA(M), σ, ◦) in Definition 19.1.15 is a ring. As noted in Remark 19.1.3, the tuple (EndA(M), σ)
is a commutative group. The tuple (EndA(M), ◦) is a semigroup by the associativity of composition
in general. (See Theorem 10.4.18 (ii).) The distributivity follows pointwise from the distributivity of the
operation μ of A on M in Definition 19.1.7 (ii).
19.2.2 Remark: The relation between general groups and transformation groups.
It could be argued that groups and transformation groups are the same thing. Any group acts on itself as a
transformation group. Since such a group acting on itself is an effective transformation group (by Definition 20.2.1), the group itself may be eliminated from the specification. That is, the set of transformations
is the same as the group itself. These transformations are a subgroup of the group of all bijections on the
set of group elements. Thus all groups and transformation groups are essentially identical to subgroups of
the group of bijections of a set.
In the case of a module over a group, the transformation group is constrained to act distributively with
respect to the module's commutative group structure. This is essentially linearity, except that no scalar
multiplication is involved. In this case, the transformation group may be identified with a subgroup of
the set of all automorphisms of the module with respect to its additive (i.e. commutative group) structure.
Thus one may regard all modules over groups as semigroups of the group of module automorphisms. Hence
it is more natural to study modules over groups as in the department of additive-structure-preserving
automorphism groups.
19.2.3 Remark: Modules over semigroups.
A left module over a semigroup G is the same thing as a left G-module M (i.e. a module over the set G) such
that G is a semigroup and the additional condition (iv) of Definition 19.2.4 holds.
19.2.4 Definition: A (left) module over a semigroup G is a tuple M −< (G, M, σG, σM, μ) such that
(i) G −< (G, σG) is a semigroup (written multiplicatively);
(ii) M −< (M, σM) is a commutative group (written additively);
(iii) μ : G × M → M (written multiplicatively) satisfies ∀a ∈ G, ∀x, y ∈ M, a(x + y) = ax + ay;
(iv) ∀a, b ∈ G, ∀x ∈ M, (ab)x = a(bx).
19.2.5 Remark: Modules over monoids.
Presumably one could define a module over a monoid. It would be the same as Definition 19.2.6 with the
word "group" replaced by "monoid", and with the additional condition that the identity of the monoid acts
as the identity map on the module. It is difficult to see any application for such a definition which would
justify the extra ink.
19.2.6 Definition: A (left) module over a group G is a tuple M −< (G, M, σG, σM, μ) such that
(i) G −< (G, σG) is a group (written multiplicatively);
(ii) M −< (M, σM) is a commutative group (written additively);
(iii) μ : G × M → M (written multiplicatively) satisfies ∀a ∈ G, ∀x, y ∈ M, a(x + y) = ax + ay;
(iv) ∀a, b ∈ G, ∀x ∈ M, (ab)x = a(bx);
(v) ∀x ∈ M, 1G x = x.
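As a concrete illustration of Definition 19.2.6, the following Python sketch checks conditions (iii), (iv) and (v) for the two-element group G = {1, −1} under multiplication, acting on the commutative group (ℤ, +). The code and its names are illustrative only.

```python
# Sketch only: conditions (iii)-(v) of a module over a group, for
# G = {1, -1} under multiplication acting on M = (Z, +) by a * x.

G = (1, -1)
sample = range(-4, 5)

# (iii) distributivity over the addition of M: a(x + y) = ax + ay
assert all(a * (x + y) == a * x + a * y
           for a in G for x in sample for y in sample)

# (iv) associativity of the action: (ab)x = a(bx)
assert all((a * b) * x == a * (b * x)
           for a in G for b in G for x in sample)

# (v) the group identity 1_G acts as the identity map on M
assert all(1 * x == x for x in sample)

print("conditions (iii)-(v) hold on the sample")
```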
19.2.7 Remark: Modules over groups.
A left module over a group G is the same thing as a left G-module M such that G is a group and the
additional conditions (iv) and (v) of Definition 19.2.6 hold. A left module over a group G is also the same
thing as a module M over a semigroup G such that G is additionally a group and the additional condition
(v) of Definition 19.2.6 holds. These relations are illustrated in Figure 19.0.1.
19.3.2 Theorem: Let M be a (left) module over a ring R.
(i) ∀m ∈ M, 0R m = 0M.
(ii) ∀r ∈ R, ∀m ∈ M, (−r)m = −(rm).
Proof: For part (i), let m ∈ M. Then 0R m = (0R + 0R)m = 0R m + 0R m by Definition 19.3.1 (iv). Therefore
0M = 0R m + (−(0R m)) = (0R m + 0R m) + (−(0R m)) = 0R m + (0R m + (−(0R m))) = 0R m + 0M = 0R m.
For part (ii), let r ∈ R and m ∈ M. Then (−r)m = (−r)m + (rm + (−(rm))) = ((−r)m + rm) + (−(rm)) =
((−r) + r)m + (−(rm)) = 0R m + (−(rm)) = 0M + (−(rm)) = −(rm) by part (i).
19.3.3 Remark: The relation of modules over rings to other kinds of modules and linear spaces.
A left module over a ring R is the same thing as a left module M over the set R, together with the extra
conditions (i), (iv) and (v) of Definition 19.3.1. The sets and operations of the basic classes of modules are
illustrated in Figure 19.3.1.
[Figure 19.3.1. Sets and operations of modules: a module over a set A, a module over a group G, and a module over a ring R, in each case acting on a commutative group M.]
Definition 19.3.1 is not a specialisation or extension of Definition 19.2.6. Therefore a left module over a ring
cannot be a left module over a group. This is illustrated by the forking of the tree in Figure 19.0.1 between
modules over rings and modules over groups.
Condition (v) of Definition 19.3.1 refers to the multiplication operation of a ring R, which is not a group
operation, whereas condition (iv) of Definition 19.2.6 does refer to a group operation, even though it is written
multiplicatively. If the tuple (R, M, τR, σM, μ) in Definition 19.3.1 is substituted into Definition 19.2.6, it will
satisfy all conditions except (i) and (v). The tuple (R, M, τR, σM, μ) is in fact a module over a semigroup.
Consequently everything said about modules over rings is applicable to linear spaces, whereas modules over
groups are a kind of dead end. (See Chapter 22 for linear spaces.) From the perspective of a linear space
V over a field K, Definition 19.2.6 (iv) would require ∀α, β ∈ K, ∀v ∈ V, (α + β)v = α(βv) because addition
is the group operation of a field. Even worse is Definition 19.2.6 (v), which would require ∀v ∈ V, 0K v = v.
19.3.6 Definition: A unitary (left) module over a ring R or unitary (left) ring-module over R is a left
module M −< (R, M, σR, τR, σM, μ) over a ring R such that
(i) R −< (R, σR, τR) is a unitary ring;
(ii) ∀x ∈ M, 1R x = x, where 1R is the unity of R.
19.3.7 Remark: Choice of definitions for modules over rings and unitary modules over rings.
Hartley/Hawkes [87], definition 5.1, page 70, use Definition 19.3.6 for left modules over rings, but they
comment that some authors use Definition 19.3.1 instead. However, it seems safest in a general context to
always state the unitary requirement explicitly.
19.3.8 Remark: The relation between linear spaces and unitary modules over rings.
If the ring R of a unitary left module M over R is a field, then M is a linear space over R. (See Definition 22.1.1.) In other words, a linear space is the same thing as a unitary left module over a field. (See
Remark 22.1.2 for more details.)
Linear spaces are deferred until Chapters 22–24. Their properties are presented in much greater detail than
most other algebraic systems because of their fundamental importance for differential geometry.
A ring-module endomorphism of a module M over a ring R is a ring-module homomorphism φ : M → M.
A ring-module automorphism of a module M over a ring R is a ring-module isomorphism φ : M → M.
19.3.12 Remark: Sets of module homomorphisms are linear spaces with pointwise operations.
In the linear space language in Chapter 22, all of the sets of morphisms in Definition 19.3.10 are linear spaces.
Theorem 19.3.13 states that HomR (M1 , M2 ) is a linear space, but without using the term linear space.
(Because of the close connection between unitary modules over rings and linear spaces, homomorphisms
between modules over rings are sometimes referred to as linear maps.)
19.3.13 Theorem: HomR (M1 , M2 ) is closed under the pointwise addition and scalar multiplication for
any modules M1 and M2 over a ring R.
Proof: Let φ1, φ2 ∈ HomR(M1, M2). Then using the notation of Definition 19.3.10,
∀x, y ∈ M1, (φ1 + φ2)(x + y) = φ1(x + y) + φ2(x + y)
= (φ1(x) + φ1(y)) + (φ2(x) + φ2(y))
= (φ1(x) + φ2(x)) + (φ1(y) + φ2(y))
= (φ1 + φ2)(x) + (φ1 + φ2)(y),
which shows that the pointwise sum φ1 + φ2 is in HomR(M1, M2). Thus HomR(M1, M2) is closed under
pointwise addition. Similarly the pointwise scalar product λφ is in HomR(M1, M2) for any λ ∈ R and
φ ∈ HomR(M1, M2).
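For R = ℤ and M1 = M2 = ℤ, every module homomorphism has the form x ↦ cx, so the closure asserted by Theorem 19.3.13 can be checked directly in a few lines of Python. This is only an illustrative sketch; the function names are invented.

```python
# Sketch only: Theorem 19.3.13 for R = Z, M1 = M2 = Z, where every
# module homomorphism has the form x -> c * x.

def make_hom(c):
    return lambda x: c * x

def pointwise_sum(f, g):
    return lambda x: f(x) + g(x)

phi1, phi2 = make_hom(3), make_hom(-5)
psi = pointwise_sum(phi1, phi2)       # the pointwise sum phi1 + phi2

sample = range(-6, 7)
# additivity: psi(x + y) = psi(x) + psi(y)
assert all(psi(x + y) == psi(x) + psi(y) for x in sample for y in sample)
# scalarity: psi(a x) = a psi(x) for scalars a in R = Z
assert all(psi(a * x) == a * psi(x) for a in sample for x in sample)
print("phi1 + phi2 is again a homomorphism")
```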
19.3.14 Remark: The module of homomorphisms between two modules over the same ring.
The set HomR(M1, M2) of all homomorphisms between modules M1 and M2 over the same commutative
ring R has the natural structure of a module over the same ring, as in Definition 19.3.15. The fact that this
is a valid module is verified in Theorem 19.3.16. The commutativity of the ring is applied in line (19.3.8).
(The slightly surprising necessity for commutativity is pointed out by Ash [49], page 95. This shows that
checking all of the conditions is not entirely superfluous! See also MacLane/Birkhoff [106], page 166.)
19.3.16 Theorem: The ring-module of ring-module homomorphisms between two modules over the same
commutative ring R is a ring-module over R.
Proof: Let M1 −< (R, M1, σR, τR, σ1, μ1) and M2 −< (R, M2, σR, τR, σ2, μ2) be modules over a commutative
ring R −< (R, σR, τR). Let M −< (R, M, σR, τR, σ, μ) be the ring-module of ring-module homomorphisms from
M1 to M2. To show that M is a ring-module over R, note first that Definition 19.3.1 (i) is satisfied because
R is a ring by Definition 19.3.15.
For Definition 19.3.1 (ii), it must be shown that (M, σ) is a commutative group. The algebraic closure of
σ : M × M → M follows from Theorem 19.3.13. The associativity and commutativity of σ follow from
Definition 19.3.15 (ii) and the associativity and commutativity of σ2. Define φ0 : M1 → M2 by φ0 : x ↦ 0M2.
Then φ0 ∈ M and φ0 + φ = φ + φ0 = φ for all φ ∈ M. So φ0 is an identity for (M, σ). Similarly, for all
φ ∈ M the map −φ : x ↦ −φ(x) is the additive inverse of φ. So (M, σ) is a commutative group.
For Definition 19.3.1 (iii), it must first be shown that μ : R × M → M is well defined. For every a ∈ R and
φ ∈ M, it must be shown that μ(a, φ) is a ring-module homomorphism from M1 to M2 over R. Let x ∈ M1.
Then μ(a, φ)(x) = μ2(a, φ(x)) ∈ M2. Thus μ(a, φ) : M1 → M2. Then for x, y ∈ M1,
(aφ)(x + y) = a(φ(x + y)) (19.3.1)
= a(φ(x) + φ(y)) (19.3.2)
= aφ(x) + aφ(y) (19.3.3)
= (aφ)(x) + (aφ)(y), (19.3.4)
where line (19.3.1) is by Definition 19.3.15 (iii) (pointwise product), line (19.3.2) is by Definition 19.3.10 (i)
(homomorphism additivity), line (19.3.3) is by Definition 19.3.1 (iii) (module additive distributivity), and
line (19.3.4) is by Definition 19.3.15 (iii) (pointwise product). To put it more colloquially, (aφ)(x + y) =
aφ(x + y) = a(φ(x) + φ(y)) = aφ(x) + aφ(y) = (aφ)(x) + (aφ)(y). Thus μ(a, φ) satisfies Definition 19.3.10 (i)
for homomorphism additivity from M1 to M2. The corresponding homomorphism scalarity condition may
be checked as follows.
For λ ∈ R and x ∈ M1,
(aφ)(λx) = a(φ(λx)) (19.3.5)
= a(λφ(x)) (19.3.6)
= (aλ)φ(x) (19.3.7)
= (λa)φ(x) (19.3.8)
= λ(aφ(x)) (19.3.9)
= λ((aφ)(x)), (19.3.10)
where line (19.3.5) is by Definition 19.3.15 (iii) (pointwise product), line (19.3.6) is by Definition 19.3.10 (ii)
(homomorphism scalarity), line (19.3.7) is by Definition 19.3.1 (v) (module multiplicative distributivity),
line (19.3.8) is by Definition 19.3.15 (ring commutativity), line (19.3.9) is by Definition 19.3.1 (v) (module
multiplicative distributivity), and line (19.3.10) is by Definition 19.3.15 (iii) (pointwise product). To put it
more colloquially, (aφ)(λx) = aφ(λx) = a(λφ(x)) = (aλ)φ(x) = (λa)φ(x) = λ(aφ(x)) = λ((aφ)(x)). Thus
μ(a, φ) satisfies Definition 19.3.10 (ii) for homomorphism scalarity from M1 to M2. So μ : R × M → M is a
well-defined map.
Let λ ∈ R and φ1, φ2 ∈ M. Let x ∈ M1. Then (λ(φ1 + φ2))(x) = λ((φ1 + φ2)(x)) by Definition 19.3.15 (iii).
This equals λ(φ1(x) + φ2(x)) by Definition 19.3.15 (ii). This equals λφ1(x) + λφ2(x) by Definition 18.1.2 (iii).
This then equals (λφ1)(x) + (λφ2)(x) by Definition 19.3.15 (iii). And this then equals (λφ1 + λφ2)(x) by
Definition 19.3.15 (ii). Thus ∀λ ∈ R, ∀φ1, φ2 ∈ M, λ(φ1 + φ2) = λφ1 + λφ2. Therefore μ : R × M → M
satisfies Definition 19.3.1 (iii).
Let λ1, λ2 ∈ R and φ ∈ M. Let x ∈ M1. Then ((λ1 + λ2)φ)(x) = (λ1 + λ2)φ(x) = λ1φ(x) + λ2φ(x) =
(λ1φ)(x) + (λ2φ)(x) = (λ1φ + λ2φ)(x) by Definition 19.3.15 (iii), Definition 18.1.2 (iii), Definition 19.3.15 (iii)
and Definition 19.3.15 (ii) respectively. So σ and μ satisfy Definition 19.3.1 (iv).
Let λ1, λ2 ∈ R and φ ∈ M. Let x ∈ M1. Then ((λ1λ2)φ)(x) = (λ1λ2)φ(x) = λ1(λ2φ(x)) = λ1((λ2φ)(x)) =
(λ1(λ2φ))(x) by Definition 19.3.15 (iii), Definition 19.3.1 (v), Definition 19.3.15 (iii) and Definition 19.3.15 (iii)
respectively. So μ satisfies Definition 19.3.1 (v). Hence M is a ring-module over R by Definition 19.3.1.
19.3.17 Definition: The ring-module of ring-module endomorphisms of a module M over a commutative
ring R is the ring-module of ring-module homomorphisms from M to M over R.
19.3.18 Remark: Unitary ring-module morphisms.
There is no need to define unitary ring-module morphisms. Definition 19.3.10 and Notation 19.3.11 for general
ring-module homomorphisms are adequate. If R is a unitary ring with unit 1R, and M1 is a unitary module
over R, then ∀λ ∈ R, 1R λ = λ 1R = λ by Definition 18.2.2, and ∀x ∈ M1, 1R x = x by Definition 19.3.6.
Therefore ∀x ∈ M1, 1R φ(x) = φ(1R x) = φ(x) by Definition 19.3.10 for a ring-module homomorphism φ : M1 → M2 over R.
In other words, 1R acts at least as a unit element on the left for all elements of Range(φ). This is not strong
enough to imply that M2 is a unitary module over R. (Consider for example the zero map φ : x ↦ 0M2, which
is a ring-module homomorphism. Knowing that 1R 0M2 = 0M2 certainly does not show that M2 is a unitary
module.) However, the conditions for a module homomorphism are clearly always compatible with M1 and
M2 both being unitary modules over R. Thus it is redundant to say that a ring-module homomorphism
between unitary modules is a unitary ring-module homomorphism. However, it is not entirely obvious that the
ring-module of ring-module homomorphisms between modules M1 and M2 in Definition 19.3.15 will be a
unitary ring-module if M1 and M2 are unitary modules over a unitary ring R.
19.3.19 Theorem: The ring-module of ring-module homomorphisms between two unitary modules over a
commutative unitary ring R is a unitary ring-module over R.
Proof: Let M1 −< (R, M1, σR, τR, σ1, μ1) and M2 −< (R, M2, σR, τR, σ2, μ2) be unitary modules over a
commutative unitary ring R −< (R, σR, τR). Let M −< (R, M, σR, τR, σ, μ) be the ring-module of ring-module
homomorphisms from M1 to M2. Then M is a ring-module over R by Theorem 19.3.16.
Let φ ∈ M. Then ∀x ∈ M1, (1R φ)(x) = 1R φ(x) = φ(x) by Definition 19.3.15 (iii) and Definition 19.3.6 (ii).
Thus 1R φ = φ for all φ ∈ M. Hence M is a unitary ring-module over R by Definition 19.3.6.
19.3.20 Theorem: Let A be the ring-module of ring-module endomorphisms of a module M over a commutative ring R.
(i) A is a module over the ring R.
(ii) If M is a unitary module over R, then A is a unitary module over R.
(iii) EndR (M ) is closed under function composition.
Proof: Part (i) follows from Theorem 19.3.16.
Part (ii) follows from Theorem 19.3.19.
For part (iii), let φ1, φ2 ∈ EndR(M). Let x, y ∈ M. Then (φ1 ◦ φ2)(x + y) = φ1(φ2(x + y)) = φ1(φ2(x) +
φ2(y)) = φ1(φ2(x)) + φ1(φ2(y)) = (φ1 ◦ φ2)(x) + (φ1 ◦ φ2)(y). So φ1 ◦ φ2 satisfies Definition 19.3.10 (i).
Similarly, φ1 ◦ φ2 satisfies Definition 19.3.10 (ii). So φ1 ◦ φ2 ∈ EndR(M).
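Part (iii) can be illustrated concretely: the endomorphisms of M = ℤ² over R = ℤ are integer 2 × 2 matrices, and composition is matrix multiplication. The following Python sketch (names are illustrative only) verifies on a sample that a composite of two endomorphisms is again additive.

```python
# Sketch only: closure of End_R(M) under composition for M = Z^2,
# R = Z, where endomorphisms are integer 2x2 matrices.

def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def compose(m1, m2):
    # phi1 o phi2 corresponds to the matrix product m1 * m2
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

phi1 = [[1, 2], [0, 3]]
phi2 = [[4, -1], [2, 5]]
comp = compose(phi1, phi2)

vs = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
# additivity of the composite, as in the proof of part (iii)
assert all(apply(comp, add(u, v)) == add(apply(comp, u), apply(comp, v))
           for u in vs for v in vs)
# the composite matrix really is "apply phi2, then phi1"
assert all(apply(comp, v) == apply(phi1, apply(phi2, v)) for v in vs)
print("phi1 o phi2 is again an endomorphism")
```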
where α, β ∈ S satisfy β ≤ α. Let λ ∈ R and x ∈ M. Then λ = 0R. So θ(λx) = θ(0R x) = θ(0M) = 0S =
0S θ(x) = ψ(0R)θ(x) = ψ(λ)θ(x). So θ and ψ satisfy Definition 19.4.3 condition (i). Suppose that x ≥ 0R
and y ≥ 0R. Then θ(x + y) = α(x + y) = αx + αy = θ(x) + θ(y). Similarly θ(x + y) = θ(x) + θ(y) if x ≤ 0R
and y ≤ 0R. Suppose that x ≥ 0R and y ≤ 0R and x + y ≤ 0R. Then θ(x + y) = β(x + y) = βx + βy =
αx + βy + (β − α)x ≤ αx + βy = θ(x) + θ(y). Suppose that x ≥ 0R and y ≤ 0R and x + y ≥ 0R. Then
θ(x + y) = α(x + y) = αx + αy = αx + βy + (α − β)y ≤ αx + βy = θ(x) + θ(y). Thus θ(x + y) ≤ θ(x) + θ(y)
for all x, y ∈ M. So θ satisfies Definition 19.4.3 condition (ii).
This example shows that a seminorm on a module over a zero ring is not necessarily symmetric with respect
to negation of its argument, and is not even necessarily non-negative. In fact, in the case of a module over
a zero ring, a seminorm is only required to satisfy a limited kind of subadditivity. This is not the kind of
behaviour which is typically associated with the words "seminorm" or "norm". Therefore Definitions 19.4.3
and 19.5.2 require the ring R to be a non-zero ring.
19.4.3 Definition: A seminorm on a (left) module M over a non-zero ring R, compatible with an absolute
value function ψ : R → S, for an ordered ring S, is a map θ : M → S which satisfies
(i) ∀λ ∈ R, ∀x ∈ M, θ(λx) = ψ(λ)θ(x), [scalarity]
(ii) ∀x, y ∈ M, θ(x + y) ≤ θ(x) + θ(y). [subadditivity]
19.4.4 Theorem: Let θ : M → S be a seminorm on a left module M over a non-zero ring R, compatible
with an absolute value function ψ : R → S, for an ordered ring S.
(i) θ(0M) = 0S.
(ii) ∀x ∈ M, θ(−x) = θ(x).
(iii) ∀x ∈ M, θ(x) ≥ 0S.
(iv) ∀x, y ∈ M, θ(x) − θ(y) ≤ θ(x − y).
Proof: For part (i), the equality θ(0M) = θ(0R 0M) = ψ(0R)θ(0M) = 0S θ(0M) = 0S follows from
Theorem 19.3.2 (i), Definitions 19.4.3 (i) and 18.5.6 (iii), and Theorem 18.1.3 (i).
For part (ii), let x ∈ M and λ ∈ R \ {0R}. Then −λ ∈ R \ {0R} and θ((−λ)x) = ψ(−λ)θ(x) = ψ(λ)θ(x)
by Definition 19.4.3 (i) and Theorem 18.5.10 (iii). But θ((−λ)x) = θ(λ(−x)) = ψ(λ)θ(−x) by Definitions
19.3.1 (v) and 19.4.3 (i). So ψ(λ)θ(−x) = ψ(λ)θ(x). But ψ(λ) ≠ 0S by Definition 18.5.6 (iv), and S is a
cancellative ring by Theorem 18.3.10 (xiv) because it is an ordered ring. Hence θ(−x) = θ(x).
For part (iii), let x ∈ M. Then 0S = θ(0M) = θ(x + (−x)) ≤ θ(x) + θ(−x) = θ(x) + θ(x) by parts (i)
and (ii) and Definition 19.4.3 (ii). If θ(x) < 0S then θ(x) + θ(x) < 0S. Therefore θ(x) ≥ 0S.
For part (iv), let x, y ∈ M. Then θ(y) = θ(x + (y − x)) ≤ θ(x) + θ(y − x) by Definition 19.4.3 (ii). But
θ(y − x) = θ(x − y) by part (ii). So θ(y) − θ(x) ≤ θ(x − y), and interchanging x and y gives θ(x) − θ(y) ≤ θ(x − y).
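The consequences in Theorem 19.4.4 can be spot-checked for the simplest example, the seminorm θ(x) = |x| on the module ℤ over ℤ with ψ = |·| and S = ℤ. The following Python sketch is only a finite-sample check, not a proof.

```python
# Sketch only: spot-checking Theorem 19.4.4 for the seminorm
# theta(x) = |x| on M = Z over R = Z, with psi = |.| and S = Z.

theta = abs
psi = abs

sample = range(-10, 11)
assert theta(0) == 0                                # part (i)
assert all(theta(-x) == theta(x) for x in sample)   # part (ii)
assert all(theta(x) >= 0 for x in sample)           # part (iii)
assert all(theta(x) - theta(y) <= theta(x - y)      # part (iv)
           for x in sample for y in sample)
print("parts (i)-(iv) hold on the sample")
```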
19.5.2 Definition: A norm on a (left) module M −< (R, M, σR, τR, σM, μ) over a non-zero ring
R −< (R, σR, τR), compatible with absolute value function ψ : R → S, for an ordered ring S −< (S, σS, τS, <),
is a map θ : M → S which satisfies
(i) ∀λ ∈ R, ∀x ∈ M, θ(λx) = ψ(λ)θ(x), [scalarity]
(ii) ∀x, y ∈ M, θ(x + y) ≤ θ(x) + θ(y), [subadditivity]
(iii) ∀x ∈ M \ {0M}, θ(x) > 0S. [positivity]
19.5.3 Definition: A normed (left) module over a non-zero ring R −< (R, σR, τR) compatible with absolute
value function ψ : R → S, for an ordered ring S, is a tuple M −< (M, θ) −< (R, M, σR, τR, σM, μ, θ) where
M −< (R, M, σR, τR, σM, μ) is a left module over R and θ : M → S is a norm on M compatible with ψ : R → S.
19.5.4 Theorem: Let θ : M → S be a norm on a left module M over a non-zero ring R, compatible with
an absolute value function ψ : R → S, for an ordered ring S.
(i) θ is a seminorm on the left module M over the non-zero ring R, compatible with ψ : R → S.
(ii) θ(0M) = 0S.
(iii) ∀x ∈ M, θ(−x) = θ(x).
(iv) ∀x ∈ M, θ(x) ≥ 0S.
(v) ∀x, y ∈ M, θ(x) − θ(y) ≤ θ(x − y).
19.5.5 Example: Let R be the ring 2ℤ = {2n; n ∈ ℤ} with the usual addition and multiplication operations from ℤ. The set
M = (2ℤ)² with componentwise addition is a commutative group. Define the action of R on M componentwise.
Then M is a module over the ring R. Let S = ℝ. Define θ : M → S by θ : (x, y) ↦ (x² + y²)^{1/2}. Define
ψ : R → S by ψ : x ↦ max(x, −x), identifying integers with the corresponding real numbers. Let λ ∈ R and
x ∈ M. Then θ(λx) = ψ(λ)θ(x). Thus θ satisfies Definition 19.5.2 for the given M, R, S and ψ. (The other
conditions are easily verified.) Note that R is not a unitary ring.
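The scalarity condition of this example can be verified numerically. The following Python sketch samples a finite part of the ring 2ℤ acting componentwise on pairs of elements of 2ℤ, and checks θ(λx) = ψ(λ)θ(x) up to floating-point tolerance; the names theta and psi are illustrative only.

```python
# Sketch only: the scalarity condition theta(lam x) = psi(lam) theta(x)
# for the ring 2Z acting componentwise on pairs of elements of 2Z.
import math

def theta(v):
    # Euclidean length (x^2 + y^2)^(1/2), valued in S = R
    return math.hypot(v[0], v[1])

def psi(lam):
    # max(lam, -lam), i.e. the absolute value of the integer lam
    return max(lam, -lam)

R_sample = [2 * n for n in range(-4, 5)]
M_sample = [(2 * a, 2 * b) for a in range(-3, 4) for b in range(-3, 4)]

ok = all(math.isclose(theta((lam * x, lam * y)), psi(lam) * theta((x, y)))
         for lam in R_sample for (x, y) in M_sample)
print(ok)  # True
```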
19.5.6 Example: Let M = ℂ^k, the set of k-tuples of complex numbers for some k ∈ ℤ⁺. Let R = ℂ.
Then R is a ring with the usual addition and multiplication operations. Define the action of R on M
componentwise. Let S = ℝ. Define ψ : R → S by ψ(λ) = |λ|, the usual absolute value function for the
complex numbers. Define θ : M → S by θ : z ↦ (Σ_{i=1}^{k} |z_i|²)^{1/2}. Then θ is a norm on M over R with
absolute value function ψ for the ordered ring S. Note that R ≠ S and R is not an ordered ring.
[Figure 19.5.1. The spaces and maps for a norm on a module over a ring: the left action Lλ : M → M for λ ∈ R and the corresponding multiplication by ψ(λ) on S, related by the norm θ : M → S.]
19.5.7 Remark: The interactions between sets and functions for a norm on a module over a ring.
The spaces and maps in Definition 19.5.2 (i) are illustrated in Figure 19.5.1.
There are two rings in Definition 19.5.2. The role of the first ring R is to scale the elements of M. That
is, elements of R act on the elements of M without changing their direction, or by reversing their direction.
The role of the second ring S is to represent the size of elements of M. The value of a norm function
is an element of S, which is equipped with a total order by which relative size can be represented.
When the left action Lλ acts on M for some λ ∈ R, the effect on the norm of elements of M is the same as
the action of ψ(λ) on S. Thus the absolute value function ψ indicates the scaling effect of elements of R. In
other words, ψ(λ) is the scaling effect of λ ∈ R. Definition 19.5.2 (i) is similar to a linearity condition. One
could perhaps refer to it as a scalarity condition.
In the case of linear spaces M over the field of complex numbers R = ℂ, as in Example 19.5.6, norms on M
are generally valued in the ordered ring of real numbers S = ℝ. Then the absolute value function ψ : ℂ → ℝ
indicates the scaling effect of complex numbers. The ring R = ℂ (which is not an ordered ring) scales
elements of M, but the ordered ring S = ℝ represents lengths of elements of M.
In the case of the module M = ℤⁿ over the ring R = ℤ for n ∈ ℤ⁺ (with the usual componentwise module
addition and scalar multiplication), the norm θ : M → ℝ with θ : x ↦ (Σ_{i=1}^{n} x_i²)^{1/2} uses ℤ to scale the
module elements, but uses ℝ to represent lengths of module elements.
19.5.8 Remark: Typical notations for norms and absolute value functions.
19.5.8 Remark: Typical notations for norms and absolute value functions.
To distinguish norms from absolute value functions, the norm is often denoted by double vertical bars, as in
θ(x) = ‖x‖, while the absolute value function is denoted by single vertical bars, as in ψ(λ) = |λ|. Then
the scalarity condition, Definition 19.5.2 (i), becomes ∀λ ∈ R, ∀x ∈ M, ‖λx‖ = |λ| ‖x‖.
19.6.2 Definition: An inner product on a (left) module M −< (R, M, σR, τR, σM, μ) over an ordered ring
R −< (R, σR, τR, <) is a map φ : M × M → R which satisfies
(i) ∀u, v, w ∈ M, φ(u + v, w) = φ(u, w) + φ(v, w), [left additivity]
(ii) ∀u, v, w ∈ M, φ(u, v + w) = φ(u, v) + φ(u, w), [right additivity]
(iii) ∀u, v ∈ M, ∀λ ∈ R, φ(λu, v) = λφ(u, v) = φ(u, λv), [scalarity]
(iv) ∀u, v ∈ M, φ(u, v) = φ(v, u), [symmetry]
(v) ∀u ∈ M \ {0M}, 0R < φ(u, u). [positive definiteness]
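The usual dot product on ℤ² is the prototypical example of Definition 19.6.2. The following Python sketch (illustrative only) checks all five conditions exhaustively on a small sample.

```python
# Sketch only: the dot product on Z^2 satisfies (i)-(v) of
# Definition 19.6.2; checked exhaustively on a small sample.
from itertools import product

def phi(u, v):
    return u[0] * v[0] + u[1] * v[1]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(lam, u):
    return (lam * u[0], lam * u[1])

vecs = list(product(range(-2, 3), repeat=2))
scalars = range(-2, 3)

for u, v, w in product(vecs, repeat=3):
    assert phi(add(u, v), w) == phi(u, w) + phi(v, w)   # (i)
    assert phi(u, add(v, w)) == phi(u, v) + phi(u, w)   # (ii)
for u, v in product(vecs, repeat=2):
    for lam in scalars:                                 # (iii)
        assert phi(scale(lam, u), v) == lam * phi(u, v) == phi(u, scale(lam, v))
    assert phi(u, v) == phi(v, u)                       # (iv)
    if u != (0, 0):
        assert phi(u, u) > 0                            # (v)
print("Definition 19.6.2 (i)-(v) verified on the sample")
```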
product is bilinear. It is not necessary to assume that φ(u, u) equals zero for u = 0M because this is implied
by the other assumptions, as shown in Theorem 19.6.4 (ii).
Proof: To prove part (i), note that φ(0M, v) = φ(μ(0R, 0M), v) = τR(0R, φ(0M, v)) = 0R follows from
Definition 19.6.2 (iii) for any v ∈ M. The equality φ(v, 0M) = 0R follows similarly.
Part (ii) follows from part (i) and Definition 19.6.2 (v).
Part (iii) follows from part (ii) and Definition 19.6.2 (v).
For part (iv), let u, v ∈ M. Then φ(u − u, v) = φ(0M, v) = 0R by part (i). Therefore φ(u, v) + φ(−u, v) =
φ(u − u, v) = 0R by Definition 19.6.2 (i). Hence φ(−u, v) = −φ(u, v) by the additive group structure of R.
Similarly, φ(u, −v) = −φ(u, v).
For part (v), note that 0R ≤ φ(u − v, u − v) for any u, v ∈ M by Definition 19.6.2 (v) and part (iii). Then
0R ≤ φ(u, u) − φ(u, v) − φ(v, u) + φ(v, v) by the left and right additivity of φ and part (iv). It then follows
by Definition 18.3.2 (i) that φ(u, v) + φ(v, u) ≤ φ(u, u) + φ(v, v).
Part (vi) follows from part (v) and Definition 19.6.2 (iv).
For part (vii), let u, v ∈ M and α, β ∈ R. Then φ(αu, βv) + φ(βv, αu) ≤ φ(αu, αu) + φ(βv, βv) by part (v).
Hence (αβ + βα)φ(u, v) ≤ α²φ(u, u) + β²φ(v, v) by Definition 19.6.2 (iii, iv).
For part (viii), let u, v ∈ M and α, β ∈ R. Then φ(αu, βv) + φ(αu, βv) ≤ φ(αu, αu) + φ(βv, βv) by part (vi).
Hence (αβ + αβ)φ(u, v) ≤ α²φ(u, u) + β²φ(v, v) by Definition 19.6.2 (iii).
For part (ix), let u, v ∈ M and α, β ∈ R. Then φ(αu, βv) = αφ(u, βv) = αβφ(u, v) by Definition 19.6.2 (iii).
But by applying the rule in a different order, one obtains φ(αu, βv) = βφ(αu, v) = βαφ(u, v). Hence
(αβ − βα)φ(u, v) = 0R.
Proof: For part (i), let u, v ∈ M. Then φ(u − v, u − v) = φ(u, u) + φ(v, v) − 2φ(u, v) by Definition 19.6.2,
where 2 denotes the element 1R + 1R of R. So 2φ(u, v) = φ(u, u) + φ(v, v) − φ(u − v, u − v).
For part (ii), let u, v ∈ M. Then φ(u + v, u + v) = φ(u, u) + φ(v, v) + 2φ(u, v) by Definition 19.6.2. So
2φ(u, v) = φ(u + v, u + v) − φ(u, u) − φ(v, v).
Part (iii) follows by summing parts (i) and (ii).
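The identities in parts (i) and (ii) can be confirmed numerically for the dot product on ℤ², as in the following Python sketch (a finite-sample check, not a proof).

```python
# Sketch only: the identities of Theorem 19.6.6 (i) and (ii) for the
# dot product on Z^2, checked on a finite sample.

def phi(u, v):
    return u[0] * v[0] + u[1] * v[1]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

vecs = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
for u in vecs:
    for v in vecs:
        # (i): 2 phi(u, v) = phi(u, u) + phi(v, v) - phi(u - v, u - v)
        assert 2 * phi(u, v) == phi(u, u) + phi(v, v) - phi(sub(u, v), sub(u, v))
        # (ii): 2 phi(u, v) = phi(u + v, u + v) - phi(u, u) - phi(v, v)
        assert 2 * phi(u, v) == phi(add(u, v), add(u, v)) - phi(u, u) - phi(v, v)
print("polarisation identities verified on the sample")
```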
[Figure: the parallelogram with vertices 0, u, v and u + v, showing the mid-point ½(u + v) of the diagonal from 0 to u + v.]
Therefore the square-distance from 0 to ½(u + v) equals the square-distance from ½(u + v) to both u and v.
Hence this point ½(u + v) is apparently the mid-point of the diameter of a circle which has the origin 0
on its circumference. This matches very well with the Euclidean theorem that the angle subtended by the
diameter of a circle at a point on the circumference is a right angle. Perhaps a simple interpretation of
Theorem 19.6.6 (iii) when φ(u, v) = 0 is that the diagonals of a parallelogram are equal when one of the
angles is a right angle.
19.7. Cauchy-Schwarz and triangle inequalities for modules over rings
19.7.1 Remark: No norm for inner products on general modules over rings.
At this point, it is difficult to produce a Cauchy-Schwarz inequality because square roots are not well defined
for general elements of general rings, and nor is division well defined in general. Therefore one cannot even
construct a norm from an inner product of the form ‖v‖ = φ(v, v)^{1/2}.
One might attempt to prove something like φ(u, v)² ≤ φ(u, u)φ(v, v) instead. The first step in such a
proof would be to derive an inequality such as 2αβφ(u, v) ≤ α²φ(u, u) + β²φ(v, v) from the generally valid
inequality φ(αu − βv, αu − βv) ≥ 0. However, even this simple derivation requires the ring to be commutative
so that αβ = βα. Therefore this line of development is best abandoned. The scalar space R needs to
be upgraded from a ring to an ordered commutative ring which has at least multiplicative inverses, and
preferably also contains square roots for all non-negative elements. Thus one requires at least an ordered
field and preferably the existence of square roots.
As mentioned in Remark 19.6.5, one loses almost nothing by assuming that the ordered ring is commutative.
It also makes good sense to assume that the module is a unitary module over a ring as in Definition 19.3.6
so as to simplify expressions such as φ(u, v) + φ(u, v) and αβ + αβ in Theorem 19.6.4 (vi, vii, viii) with the
help of the ring element 2R = 1R + 1R for example. Then one may obtain a generalised Cauchy-Schwarz
inequality for modules over rings as in Theorem 19.7.2. (The notation R⁰⁺ = {x ∈ R; 0R ≤ x} for the
non-negative cone of an ordered ring is defined in Remark 18.3.21. Similarly, R⁺ = {x ∈ R; 0R < x}
denotes the positive cone of R.)
For each element u in a module M over an ordered ring R, the existence of α ∈ R⁰⁺ such that φ(u, u) ≤ α²
is guaranteed by Theorem 18.4.8 if R is unitary or Archimedean, and R is not the zero ring. (Ordered rings
where some elements have no square upper bound are discussed in Remark 18.4.6.)
Proof: Let φ be an inner product on a unitary left module M over a commutative ordered ring R. Let
u, v ∈ M and α, β ∈ R⁰⁺ satisfy φ(u, u) ≤ α² and φ(v, v) ≤ β². If α = 0R, then φ(u, u) = 0R, and
therefore u = 0M and φ(u, v) = 0R, and so φ(u, v) ≤ αβ. Similarly, if β = 0R, then φ(u, v) ≤ αβ. Suppose
then that α, β ∈ R⁺. It follows from Theorem 19.6.4 (viii) that 2αβφ(u, v) ≤ β²φ(u, u) + α²φ(v, v). So
2αβφ(u, v) ≤ β²α² + α²β² by Definition 18.3.2 and Theorem 18.3.10 (vi). Therefore 2αβφ(u, v) ≤ 2α²β².
Hence φ(u, v) ≤ αβ by Theorem 18.3.10 (xii) because 2αβ > 0R.
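The statement of Theorem 19.7.2 can be exercised over R = ℤ, where the least admissible α and β are obtained by rounding integer square roots upwards. The following Python sketch uses hypothetical example vectors; math.isqrt computes the integer square root.

```python
# Sketch only: Theorem 19.7.2 over R = Z. The least alpha with
# phi(u, u) <= alpha^2 is the integer square root, rounded upwards.
import math

def phi(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def least_square_bound(n):
    # the least alpha >= 0 in Z with n <= alpha^2 (assumes n >= 0)
    r = math.isqrt(n)
    return r if r * r == n else r + 1

u, v = (3, -4, 1), (2, 5, -2)          # hypothetical example vectors
alpha = least_square_bound(phi(u, u))  # phi(u, u) = 26, so alpha = 6
beta = least_square_bound(phi(v, v))   # phi(v, v) = 33, so beta = 6
assert phi(u, u) <= alpha ** 2 and phi(v, v) <= beta ** 2
assert phi(u, v) <= alpha * beta       # the generalised Cauchy-Schwarz bound
print(phi(u, v), "<=", alpha * beta)
```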
19.7.4 Remark: Work-arounds for the non-existence of square roots in Cauchy-Schwarz inequalities.
The form of the assertion in Theorem 19.7.2 carefully avoids assuming the existence of square roots. If one
lets α = φ(u, u)^{1/2} and β = φ(v, v)^{1/2}, one obtains φ(u, u) ≤ α² and φ(v, v) ≤ β², from which Theorem 19.7.2
gives φ(u, v) ≤ φ(u, u)^{1/2}φ(v, v)^{1/2}, which is precisely the form of the standard Cauchy-Schwarz inequality.
If one does not assume the existence of square roots, the inequality φ(u, v) ≤ φ(u, u)^{1/2}φ(v, v)^{1/2} may be
expressed in the formally equivalent form φ(u, v)² ≤ φ(u, u)φ(v, v), which does not pre-suppose the existence
of square roots. However, Theorem 19.7.2 does not imply this assertion, but it does imply the assertion that
φ(u, v)² ≤ α²β² whenever it is assumed that φ(u, u) ≤ α² and φ(v, v) ≤ β². From this, one might conjecture
that φ(u, v) ≤ inf{α ∈ R⁰⁺; φ(u, u) ≤ α²} inf{β ∈ R⁰⁺; φ(v, v) ≤ β²}, from which one might conclude that
φ(u, v) ≤ ‖u‖‖v‖, where ‖w‖ = inf{α ∈ R⁰⁺; φ(w, w) ≤ α²} for all w ∈ M. Of course, such an infimum may
not exist, but if one adds such infimums to the ring as some kind of formal completion extension, this kind
of generalised Cauchy-Schwarz inequality could be viable.
true. This impression is strengthened by Theorem 19.7.6, which implies that in the extremal case of two
parallel vectors, the inequality becomes an exact equation. This extremal case suggests that in the more
general case, φ(u, v)² will be no greater than φ(u, u)φ(v, v). However, this kind of geometric intuition proves
absolutely nothing. A logical deduction is required!
19.7.6 Theorem: Let φ be an inner product on a unitary left module M over a commutative ordered
ring R. Let u, v ∈ M and α, β ∈ R \ {0R} satisfy αu + βv = 0M. Then φ(u, v)² = φ(u, u)φ(v, v).
∀s1, s2, t1, t2 ∈ R, φ(s1 e1 + s2 e2, t1 e1 + t2 e2) = αs1t1 + βs2t2 + γ(s1t2 + s2t1).
By temporarily working outside the integers, one may exploit these inequalities more easily using some
elementary calculus. For any given s1 ∈ ℤ⁺, let s2 = ceiling(s1(α/β)^{1/2}). (Note that s1 and s2 are
both integers.) Then s2 ≤ s1(α/β)^{1/2} + 1, and so s2/s1 ≤ (α/β)^{1/2} + s1⁻¹. The case s2 = 0 may be
Therefore the exact, standard Cauchy-Schwarz inequality (u, v)2 (u, u)(v, v) holds, although the proof
here goes beyond the ring R into the real number extension of the integers to perform the analysis.
It is also perhaps of some interest that the stronger inequality γ² < αβ must hold if α and β are both equal
to squares of positive integers. Suppose that a, b ∈ Z⁺ satisfy a² = α and b² = β. Let u = be₁ − ae₂.
Then 0 < θ(u, u) = b²α + a²β − 2abγ = 2a²b² − 2abγ. So 2abγ < 2a²b². So γ < ab. By the same argument
with u = be₁ + ae₂, −γ < ab. So γ² < a²b² = αβ. (This argument is entirely within the integers.) One then
obtains θ(u, v)² < θ(u, u)θ(v, v) whenever s₁t₂ − s₂t₁ ≠ 0 for u and v as above, which holds whenever u
and v are not parallel. In the special case α = β = 1, the only possibility for γ is γ = 0.
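The two-dimensional integer example can be checked by brute force. The following sketch (the helper names theta and check are illustrative, not from the text) verifies the Cauchy-Schwarz inequality for the Gram matrix with entries α, β, γ, together with the standard two-dimensional identity θ(u, v)² − θ(u, u)θ(v, v) = (γ² − αβ)(s₁t₂ − s₂t₁)², which makes the strictness claim for non-parallel u and v explicit.

```python
# Brute-force check of the integer Cauchy-Schwarz example (illustrative helper names).
# theta is the inner product with Gram matrix [[alpha, gamma], [gamma, beta]].

def theta(u, v, alpha, beta, gamma):
    s1, s2 = u
    t1, t2 = v
    return alpha*s1*t1 + beta*s2*t2 + gamma*(s1*t2 + s2*t1)

def check(alpha, beta, gamma, rng=range(-3, 4)):
    assert gamma*gamma < alpha*beta  # positive definiteness of the form
    for s1 in rng:
        for s2 in rng:
            for t1 in rng:
                for t2 in rng:
                    u, v = (s1, s2), (t1, t2)
                    lhs = theta(u, v, alpha, beta, gamma)**2
                    rhs = theta(u, u, alpha, beta, gamma) * theta(v, v, alpha, beta, gamma)
                    # Two-dimensional identity: the defect is a multiple of the
                    # squared "parallelism test" s1*t2 - s2*t1.
                    assert lhs - rhs == (gamma*gamma - alpha*beta)*(s1*t2 - s2*t1)**2
                    assert lhs <= rhs  # Cauchy-Schwarz, with equality iff parallel

check(4, 9, 5)   # alpha = 2^2, beta = 3^2, gamma^2 < 36
check(2, 3, 2)   # non-square alpha, beta
```

The identity shows directly that the inequality is strict exactly when s₁t₂ − s₂t₁ ≠ 0, matching the example's conclusion.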
19.7.8 Remark: The standard Cauchy-Schwarz inequality for modules over Archimedean rings.
Example 19.7.7 gives a hint of how to prove the standard Cauchy-Schwarz inequality θ(u, v)² ≤ θ(u, u)θ(v, v)
for modules over rings in general. The proof of Theorem 19.7.9 uses the assumption that the ordered ring is
Archimedean. (See Definition 17.5.4 for Archimedean order on a group. See Section 18.4 for Archimedean
ordered rings.) This proof has essentially the same reductio ad absurdum form which Archimedes felt
was necessary to validate his formulas for areas and volumes to other mathematicians at the time. The
proof of Theorem 19.7.9 has an analytical flavour although the result is purely algebraic. No square roots
or multiplicative inverses were used in the making of this proof.
19.7.9 Theorem: Let θ be an inner product on a unitary module M over an Archimedean commutative ordered ring R. Then
∀u, v ∈ M, θ(u, v)² ≤ θ(u, u)θ(v, v). (19.7.1)
Proof: Let θ be an inner product on a unitary module M over an Archimedean commutative ordered
ring R. If u = 0_M or v = 0_M, then θ(u, v) = 0_R, and so line (19.7.1) holds.
Suppose that θ(u₀, v₀)² > θ(u₀, u₀)θ(v₀, v₀) for some u₀, v₀ ∈ M \ {0_M}. For all s, t ∈ R⁺, define Δ(s, t) =
θ(su₀, tv₀)² − θ(su₀, su₀)θ(tv₀, tv₀) ∈ R. Let Δ₀ = Δ(1_R, 1_R) = θ(u₀, v₀)² − θ(u₀, u₀)θ(v₀, v₀) ∈ R⁺. Then
Δ(s, t) = s²t²Δ₀ ∈ R⁺ for all s, t ∈ R⁺.
For any s, t ∈ R⁺, there exist α, β ∈ Z⁺ such that θ(su₀, su₀) ≤ α².1_R and θ(tv₀, tv₀) ≤ β².1_R. The existence
of such α and β is guaranteed by Theorem 18.4.8 because R is assumed to be Archimedean and unitary. Let
α(s) = min{α ∈ Z⁺; θ(su₀, su₀) ≤ α².1_R} and β(t) = min{β ∈ Z⁺; θ(tv₀, tv₀) ≤ β².1_R} for s, t ∈ Z⁺. Then
θ(su₀, su₀) > (α(s) − 1)².1_R and θ(tv₀, tv₀) > (β(t) − 1)².1_R for all s, t ∈ Z⁺. But θ(su₀, tv₀) ≤ α(s)β(t).1_R
for all s, t ∈ Z⁺ by Theorem 19.7.2. Thus for all s, t ∈ Z⁺,
s²θ(u₀, u₀) > (α(s) − 1)².1_R
t²θ(v₀, v₀) > (β(t) − 1)².1_R
stθ(u₀, v₀) ≤ α(s)β(t).1_R.
It is more or less clear that the expression on line (19.7.2) is of order st² + s²t, whereas Δ(s, t) is of order s²t²,
and that therefore by making both s and t large enough, it may be shown that Δ(s, t) < s²t²Δ₀, which
contradicts the assumption, but this must be demonstrated more carefully. Note that θ(u₀, u₀) ≤ α(1)², and
so (α(s) − 1)² < s²θ(u₀, u₀) ≤ s²α(1)². Therefore α(s) < 1 + sα(1) for all s ∈ Z⁺. Similarly, β(t) < 1 + tβ(1)
for all t ∈ Z⁺. But from α(1) ≥ 1 and β(1) ≥ 1, it follows that 1 + sα(1) ≤ 2sα(1) and 1 + tβ(1) ≤ 2tβ(1)
for all s, t ∈ Z⁺. Therefore
By the Archimedean property of R, there exists s₀ ∈ Z⁺ such that s₀Δ₀ > 32α(1)β(1)² because Δ₀ > 0. For
such s₀, one has s₀²t₀²Δ₀ > 32s₀t₀²α(1)β(1)². Similarly, there exists t₀ ∈ Z⁺ such that t₀Δ₀ > 32α(1)²β(1).
For such t₀, one has s₀²t₀²Δ₀ > 32s₀²t₀α(1)²β(1). Consequently
From this contradiction, one infers that θ(u₀, v₀)² ≤ θ(u₀, u₀)θ(v₀, v₀) for all u₀, v₀ ∈ M \ {0_M}.
19.7.10 Remark: A triangle inequality for inner products on modules over rings.
The generalised triangle inequality in Theorem 19.7.11 for the virtual norm corresponding to an inner
product on a module over a ring is obtained from Theorem 19.7.2. If square roots of values of norms are
well defined, this inequality may be written as ‖u + v‖ ≤ ‖u‖ + ‖v‖ by setting λ = ‖u‖ = θ(u, u)^{1/2} and
μ = ‖v‖ = θ(v, v)^{1/2}.
19.7.11 Theorem: Let θ be an inner product on a unitary left module M over a commutative ordered ring R. Then
∀u, v ∈ M, ∀λ, μ ∈ R₀⁺, ((θ(u, u) ≤ λ² ∧ θ(v, v) ≤ μ²) ⟹ θ(u + v, u + v) ≤ (λ + μ)²).
Proof: Let θ be an inner product on a unitary left module M over a commutative ordered ring R. Let
u, v ∈ M and λ, μ ∈ R₀⁺ satisfy θ(u, u) ≤ λ² and θ(v, v) ≤ μ². Then θ(u+v, u+v) = θ(u, u)+θ(v, v)+2θ(u, v)
by Definition 19.6.2 (i, ii). Hence θ(u + v, u + v) ≤ θ(u, u) + θ(v, v) + 2λμ ≤ λ² + μ² + 2λμ = (λ + μ)² by
Theorem 19.7.2.
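The triangle inequality can be checked concretely for the standard dot product on Z³, choosing λ and μ as integer upper bounds for the square roots of θ(u, u) and θ(v, v). The helper names theta and ceil_sqrt below are illustrative, not from the text.

```python
# Check of the generalised triangle inequality for the dot product on Z^3
# (illustrative helper names).
import math

def theta(u, v):
    # standard dot product on Z^3 as a concrete inner product
    return sum(a*b for a, b in zip(u, v))

def ceil_sqrt(n):
    # smallest non-negative integer r with n <= r^2
    r = math.isqrt(n)
    return r if r*r == n else r + 1

u, v = (1, -2, 3), (4, 0, -1)
lam = ceil_sqrt(theta(u, u))   # theta(u,u) <= lam^2
mu = ceil_sqrt(theta(v, v))    # theta(v,v) <= mu^2
w = tuple(a + b for a, b in zip(u, v))
assert theta(w, w) <= (lam + mu)**2   # theta(u+v, u+v) <= (lam + mu)^2
```

Here θ(u, u) = 14 ≤ 4² and θ(v, v) = 17 ≤ 5², so the theorem guarantees θ(u + v, u + v) ≤ 81.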
(i) A = End_R(M) is the set of R-endomorphisms of M.
(ii) σ_A : A × A → A is defined by ∀f₁, f₂ ∈ A, ∀x ∈ M, σ_A(f₁, f₂)(x) = σ_M(f₁(x), f₂(x)).
In other words, ∀f₁, f₂ ∈ A, ∀x ∈ M, (f₁ + f₂)(x) = f₁(x) + f₂(x).
(iii) τ_A : A × A → A is defined by ∀f₁, f₂ ∈ A, ∀x ∈ M, τ_A(f₁, f₂)(x) = f₁(f₂(x)).
In other words, ∀f₁, f₂ ∈ A, f₁f₂ = f₁ ∘ f₂.
(iv) μ_A : R × A → A is defined by ∀λ ∈ R, ∀f ∈ A, ∀x ∈ M, μ_A(λ, f)(x) = μ_M(λ, f(x)).
In other words, ∀λ ∈ R, ∀f ∈ A, ∀x ∈ M, (λf)(x) = λf(x).
19.8.6 Theorem: The algebra of endomorphisms of a unitary left module over a commutative unitary
ring R is an associative algebra over R.
Proof: Let R < (R, σ_R, τ_R) be a commutative unitary ring. Let M < (R, M, σ_R, τ_R, σ_M, μ_M) be a unitary
left module over R. Let A < (R, A, σ_R, τ_R, σ_A, τ_A, μ_A) be the algebra of endomorphisms of M over R.
The tuple (R, A, σ_R, τ_R, σ_A, μ_A) is a unitary module over R by Theorem 19.3.20 (ii). So Definition 19.8.2 (i) is
satisfied by A.
For Definition 19.8.2 (ii), note that (A, σ_A) is a commutative group by Definition 19.3.1 (ii) because A is
a module over R, and (A, τ_A) is a semigroup because End_R(M) is closed under function composition by
Theorem 19.3.20 (iii), and this composition is associative by Theorem 10.4.18 (ii). To show distributivity of
τ_A over σ_A, let φ, φ₁, φ₂ ∈ A and let x ∈ M. Then (φ(φ₁ + φ₂))(x) = φ(φ₁(x)) + φ(φ₂(x)) = (φφ₁ + φφ₂)(x).
So φ(φ₁ + φ₂) = φφ₁ + φφ₂. Similarly (φ₁ + φ₂)φ = φ₁φ + φ₂φ. Thus Definition 18.1.2 (iii) (ring distributivity)
is verified. Therefore (A, σ_A, τ_A) is a ring, which verifies Definition 19.8.2 (ii).
Let λ ∈ R and φ₁, φ₂ ∈ A. Let x ∈ M. Then (λ(φ₁φ₂))(x) = λ(φ₁φ₂)(x) = λφ₁(φ₂(x)) = (λφ₁)(φ₂(x)) =
((λφ₁)φ₂)(x). So λ(φ₁φ₂) = (λφ₁)φ₂. But (λφ₁)(φ₂(x)) = λφ₁(φ₂(x)) = φ₁((λφ₂)(x)) = (φ₁(λφ₂))(x). So
λ(φ₁φ₂) = φ₁(λφ₂). Thus Definition 19.8.2 (iii) is verified. Hence A is an associative algebra over R.
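The algebra operations on End_R(M) can be made concrete by representing endomorphisms of M = Z² as 2×2 integer matrices, so that σ_A is entrywise addition, τ_A is matrix multiplication (composition), and μ_A is entrywise scaling. This sketch (illustrative helper names) verifies the distributivity and scalar-compatibility identities used in the proof.

```python
# Endomorphisms of M = Z^2 represented as 2x2 integer matrices (tuples of tuples).
# Illustrative helper names: add models sigma_A, compose models tau_A, scale models mu_A.

def add(f, g):        # (f + g)(x) = f(x) + g(x)
    return tuple(tuple(a + b for a, b in zip(r, s)) for r, s in zip(f, g))

def compose(f, g):    # (f g)(x) = f(g(x)), i.e. matrix product
    n = len(f)
    return tuple(tuple(sum(f[i][k]*g[k][j] for k in range(n)) for j in range(n))
                 for i in range(n))

def scale(lam, f):    # (lam f)(x) = lam.f(x)
    return tuple(tuple(lam*a for a in r) for r in f)

f1 = ((1, 2), (0, 1))
f2 = ((3, -1), (4, 0))
f3 = ((2, 5), (-1, 1))
lam = 7

# Ring distributivity of tau_A over sigma_A, on both sides:
assert compose(f1, add(f2, f3)) == add(compose(f1, f2), compose(f1, f3))
assert compose(add(f1, f2), f3) == add(compose(f1, f3), compose(f2, f3))
# Scalar compatibility: lam(f1 f2) = (lam f1) f2 = f1 (lam f2)
assert scale(lam, compose(f1, f2)) == compose(scale(lam, f1), f2) == compose(f1, scale(lam, f2))
```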
page 27.)
(2) Tangent vectors. The elements of the Lie algebra are vectors in the tangent space Te (G) of a Lie group G
with identity e. (See for example Fulton/Harris [74], page 109; Lang [23], page 167; Szekeres [266],
page 561; Frankel [12], page 402; Sternberg [37], pages 151–152; Gallot/Hulin/Lafontaine [13], page 27;
Bleecker [231], page 18.)
(3) Vector fields. The elements of the Lie algebra are left-invariant vector fields on a Lie group. (See
for example Lang [23], page 166; Szekeres [266], pages 559–561; Frankel [12], page 403; Sternberg [37],
pages 151–152; Gallot/Hulin/Lafontaine [13], page 27; Daniel/Viallet [271], page 182; Penrose [263],
pages 312–313.)
Styles (2) and (3) are very closely related and both require the definition of a differentiable manifold. (See
Section 60.7.) Style (1) is an abstraction from styles (2) and (3) which can be defined algebraically without
any analysis or geometry.
19.9.2 Remark: Similarities and differences between associative algebras and Lie algebras.
The specification tuples for associative algebras and Lie algebras have the same number and types of sets and
operations. Both kinds of algebras have a passive set A and an active set R, and additive and multiplicative
operations on each of these sets, plus an action map by the active set on the passive set. However, the
passive set A for an associative algebra is a ring, in which the product operation τ_A is associative and distributive
over the addition operation σ_A, whereas the passive set A for a Lie algebra has anticommutativity and the
Jacobi identity instead of an associativity rule.
The distributivity of τ_A over σ_A is the same for both kinds of algebra, but in the Lie algebra case, it must
be stated as a separate condition because (A, σ_A, τ_A) is not assumed to be a ring as in Definition 19.8.2 (ii).
Direct comparison of Definitions 19.8.2 and 19.9.3 is made more difficult by the fact that Definition 19.8.2
part (ii) combines two conditions, namely the associativity of τ_A and the distributivity of τ_A over σ_A.
19.9.4 Remark: The bracket notation for the product operation of a Lie algebra.
The main benefit of the brackets in Notation 19.9.5 is to remind the user that the product is anticommutative
and non-associative, thus discouraging incorrect changes in the order of terms for example.
If a Lie algebra product is defined as the commutator of an associative algebra product as in Theorem 19.9.10
and Definition 19.9.11, the product in Notation 19.9.5 can be correctly called the commutator of X and Y .
When the product is defined abstractly, this is not strictly correct. For example, Jacobson [92], page 6, uses
the terms "Lie product" and "commutator" and the bracket notation only when the product is in fact a
commutator of elements of an associative algebra. The bracket notation is widely used to denote commutators
of linear operators. (See Lipkin [254], page 10.) But it seems reasonable to use it for abstract Lie algebra
products also, and "Lie product" seems like a reasonable name. (See Winter [160], page 18.)
19.9.5 Notation: [X, Y], for X and Y in a Lie algebra A, denotes the product τ_A(X, Y).
19.9.6 Remark: Linear spaces with antisymmetric bilinear product satisfying Jacobi identity.
Conditions (ii) and (v) of Definition 19.9.3 mean that the product τ_A : A × A → A is bilinear. (In other
words, it is linear with respect to each of its two parameters for fixed values of the other parameter. See
for example Definition 27.1.16.) So a Lie algebra is sometimes defined very succinctly as a linear space with
an antisymmetric (or skew) bilinear product which satisfies the Jacobi identity. Since a linear space is a
unitary left module over a ring, this is a special case of Definition 19.9.3.
Theorem 19.9.7 is a straightforward consequence of the bilinearity of the Lie algebra product. In the special
case that X and Y in Theorem 19.9.7 are equal to the same basis of A, it can be seen that the product
of general elements of A may be computed by multiplying the component sequences α and β by the basis-
dependent factors [Xᵢ, Yⱼ] as indicated. These factors are called the structure constants because they
completely determine the structure of the Lie algebra. (This is true for associative algebras also.)
19.9.7 Theorem: Let A be a Lie algebra over a commutative ring R. Then
∀m, n ∈ Z₀⁺, ∀α ∈ Rᵐ, ∀β ∈ Rⁿ, ∀X ∈ Aᵐ, ∀Y ∈ Aⁿ,
[∑_{i=1}^m αᵢXᵢ, ∑_{j=1}^n βⱼYⱼ] = ∑_{i,j=1}^{m,n} αᵢβⱼ[Xᵢ, Yⱼ].
Proof: The assertion follows by induction on m and n from Definition 19.9.3 (ii, v).
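The bilinear expansion in Theorem 19.9.7 can be checked concretely for the commutator bracket on 2×2 integer matrices, which (by Theorem 19.9.10 below) is a Lie algebra product. The helper names mul, add, scale and bracket are illustrative.

```python
# Check of the bilinear expansion of Theorem 19.9.7 for the commutator bracket
# on 2x2 integer matrices (illustrative helper names).

def mul(a, b):
    return [[sum(a[i][k]*b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c*a[i][j] for j in range(2)] for i in range(2)]

def bracket(x, y):  # [x, y] = xy - yx
    return add(mul(x, y), scale(-1, mul(y, x)))

X = [[[0, 1], [0, 0]], [[0, 0], [1, 0]]]    # X_1, X_2
Y = [[[1, 0], [0, -1]], [[2, 3], [5, 7]]]   # Y_1, Y_2
alpha, beta = [2, -3], [5, 4]

lhs = bracket(
    add(scale(alpha[0], X[0]), scale(alpha[1], X[1])),
    add(scale(beta[0], Y[0]), scale(beta[1], Y[1])))
rhs = [[0, 0], [0, 0]]
for i in range(2):
    for j in range(2):
        rhs = add(rhs, scale(alpha[i]*beta[j], bracket(X[i], Y[j])))
assert lhs == rhs   # [sum a_i X_i, sum b_j Y_j] = sum a_i b_j [X_i, Y_j]
```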
19.9.8 Remark: Alternative formulation for anticommutativity of the Lie algebra product.
Definition 19.9.3 (iii) may be replaced by ∀Z ∈ A, [Z, Z] = 0. This is clever but less meaningful. It implies
[X, Y] + [Y, X] = [X, X] + [X, Y] + [Y, X] + [Y, Y] = [X, X + Y] + [Y, X + Y] = [X + Y, X + Y] = 0 for any
X, Y ∈ A, assuming conditions (i) and (ii). Thus it implies condition (iii). (The converse is trivial.) This
implication requires both the left and right distributivity of τ_A over σ_A in condition (ii). There is no obvious
benefit in replacing a meaningful condition with a riddle.
19.9.10 Theorem: Let A < (R, A, σ_R, τ_R, σ_A, τ_A, μ) be an associative algebra over R. Define τ′_A : A × A → A
by ∀X, Y ∈ A, τ′_A(X, Y) = τ_A(X, Y) − τ_A(Y, X). Then (R, A, σ_R, τ_R, σ_A, τ′_A, μ) is a Lie algebra over R.
Proof: Definition 19.9.3 condition (i) follows directly from Definition 19.8.2 condition (i) because these
do not depend on τ_A or τ′_A. Definition 19.9.3 conditions (ii) and (v) for τ′_A follow almost immediately from
Definition 19.8.2 conditions (ii) and (iii) for τ_A. Definition 19.9.3 (iii) follows from the formula for τ′_A.
For Definition 19.9.3 (iv), let X, Y, Z ∈ A. Then, writing XY for τ_A(X, Y) and [X, Y] for τ′_A(X, Y),
[X, [Y, Z]] − [Y, [X, Z]] = (XYZ − XZY − YZX + ZYX) − (YXZ − YZX − XZY + ZXY)
= (XYZ + ZYX) − (YXZ + ZXY)
= [X, Y]Z − Z[X, Y]
= [[X, Y], Z].
(The terms XZY and YZX cancel to zero.) Hence A is a Lie algebra with τ′_A in place of τ_A.
19.9.11 Definition: The Lie algebra associated with an associative algebra A < (R, A, σ_R, τ_R, σ_A, τ_A, μ)
is the Lie algebra (R, A, σ_R, τ_R, σ_A, τ′_A, μ), where τ′_A : A × A → A is defined by
∀X, Y ∈ A, τ′_A(X, Y) = τ_A(X, Y) − τ_A(Y, X).
19.9.12 Remark: The Lie algebra associated with an associative algebra of ring-module endomorphisms.
Since the set EndR (M ) of endomorphisms of a unitary module M over a commutative ring R is an associative
algebra (with multiplication defined as composition) by Theorem 19.8.6, and every associative algebra can
be converted to a Lie algebra by Theorem 19.9.10, it follows immediately that a Lie algebra is associated
with the set of endomorphisms of any unitary module over a commutative ring. This is the Lie algebra in
Notation 19.9.13.
19.9.13 Notation: gl(M ) for a ring-module M over a commutative unitary ring R denotes the Lie algebra
associated with the associative algebra EndR (M ) of ring endomorphisms of M .
19.9.14 Definition: A real Lie algebra is a Lie algebra over R.
19.9.15 Remark: The algebra of vector fields on a smooth manifold is a real Lie algebra.
The algebra X(T(M)) of C^∞ vector fields on a C^∞ manifold M, with the Lie bracket as the product
operation, is a real Lie algebra. (See Definition 59.3.8 for the Lie bracket.)
19.9.16 Example: A class of Lie algebras consisting of matrices.
A useful example of a Lie algebra gl(V) is the case that V is the linear space Rⁿ for some n ∈ Z⁺. Then the
elements of End(Rⁿ) are the linear transformations of Rⁿ. With respect to a basis for Rⁿ, these correspond
to n × n matrices. Therefore gl(V) corresponds to the set of n × n matrices together with the operations of
matrix addition, scalar multiplication and the commutator product.
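For the matrix case, anticommutativity and the Jacobi identity of the commutator product can be verified by direct computation. The sketch below (illustrative helper names) checks both, using the same form of the Jacobi identity as in the proof of Theorem 19.9.10.

```python
# Anticommutativity and the Jacobi identity for gl(2): commutators of 2x2
# integer matrices (illustrative helper names).

def mul(a, b):
    return [[sum(a[i][k]*b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def bracket(x, y):  # [x, y] = xy - yx
    return sub(mul(x, y), mul(y, x))

X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
Z = [[1, 2], [3, 4]]
zero = [[0, 0], [0, 0]]

# Anticommutativity: [X, Y] = -[Y, X], i.e. [X, Y] + [Y, X] = 0.
assert sub(bracket(X, Y), sub(zero, bracket(Y, X))) == zero
# Jacobi identity in the form [X, [Y, Z]] - [Y, [X, Z]] = [[X, Y], Z].
assert sub(bracket(X, bracket(Y, Z)), bracket(Y, bracket(X, Z))) == bracket(bracket(X, Y), Z)
```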
19.9.17 Definition: A Lie subalgebra of a Lie algebra A < (R, A, σ_R, τ_R, σ_A, τ_A, μ) is a Lie algebra
A′ < (R, A′, σ_R, τ_R, σ_A′, τ_A′, μ′) such that A′ ⊆ A, σ_A′ ⊆ σ_A, τ_A′ ⊆ τ_A and μ′ ⊆ μ.
Chapter 20
Transformation groups
Figure 20.0.1 shows some relations between the algebraic classes which are presented in Chapter 20. (See
Figure 17.0.1 for further details of relations between group-related classes.)
Figure 20.0.1. Family tree of transformation group classes: group (G, σ_G); transformation group (G, X, σ_G, μ); effective transformation group (G, X, σ_G, μ); transitive transformation group (G, X, σ_G, μ).
Transformation groups are used throughout differential geometry, for example in the definition of connections
on fibre bundles. See Section 36.9 for topological transformation groups and Section 60.11 for differentiable
transformation groups.
A transformation group is an algebraic system consisting of two sets and two operations. The active set
G < (G, σ) is a group. The passive set X has no operation of its own. There is a binary operation
μ : G × X → X called the action of the group G on X.
By default, a transformation group is a left transformation group, which is presented in Definition 20.1.2.
Right transformation groups are presented in Section 20.7.
20.1.2 Definition: A (left) transformation group on a set X is a tuple (G, X, σ, μ) where (G, σ) is a group,
and the map μ : G × X → X satisfies the following conditions.
(i) ∀g₁, g₂ ∈ G, ∀x ∈ X, μ(σ(g₁, g₂), x) = μ(g₁, μ(g₂, x)).
(ii) ∀x ∈ X, μ(e, x) = x, where e is the identity of G.
The map μ may be referred to as the (left) action map or the (left) action of G on X.
20.1.5 Remark: Avoidance of ambiguity in the notation for the group action on the point set.
Notation 20.1.4 does not result in ambiguity because (g₁g₂)x = g₁(g₂x) for all g₁, g₂ ∈ G and x ∈ X by the
associativity condition (i) in Definition 20.1.2.
20.1.6 Notation: L^μ_g denotes the function {(x, μ(g, x)); (g, x) ∈ Dom(μ)}, where μ is the left action map
of a left transformation group (G, X, σ, μ), for each g ∈ G.
L_g denotes the function L^μ_g when the left action map μ is implicit in the context.
The function L^μ_g in Notation 20.1.6 is a well-defined function with domain and range equal to X for any
left transformation group (G, X, σ, μ), for all g ∈ G. The function L_g = L^μ_g may also be referred to as
the left action map of the left transformation group. The left action map L_g = L^μ_g : X → X satisfies
L_g(x) = μ(g, x) = gx for all g ∈ G and x ∈ X.
20.1.8 Remark: Each left action map is a bijection on the point space.
For any fixed g ∈ G in Definition 20.1.2, the map L_g : X → X must be a bijection, because L_g(L_{g⁻¹}(x)) =
L_{g⁻¹}(L_g(x)) = x for all x ∈ X. This is formalised as Theorem 20.1.9.
20.1.9 Theorem: Let (G, X, σ, μ) be a left transformation group. Then the left action map L_g : X → X
is a bijection for all g ∈ G, and (L_g)⁻¹ = L_{g⁻¹}.
Proof: Let (G, X, σ, μ) be a left transformation group. Let g ∈ G. Then the inverse g⁻¹ ∈ G of g is well
defined, and L_g(L_{g⁻¹}(x)) = L_{g⁻¹}(L_g(x)) = x for all x ∈ X. Thus L_g ∘ L_{g⁻¹} = L_{g⁻¹} ∘ L_g = id_X. So L_g is
a bijection from X to X by Theorem 10.5.12 (iv) because it has an inverse.
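Theorem 20.1.9 can be illustrated with the smallest non-trivial example: the cyclic group Z₃ acting on X = {0, 1, 2} by addition modulo 3. The sketch below (illustrative names) checks that each L_g is a bijection of X and that L_{g⁻¹} inverts it.

```python
# The cyclic group Z_3 acting on X = {0, 1, 2} by mu(g, x) = (g + x) mod 3
# (illustrative names).
G = [0, 1, 2]
X = [0, 1, 2]
mu = lambda g, x: (g + x) % 3     # left action map
inv = lambda g: (-g) % 3          # group inverse in Z_3

for g in G:
    L_g = {x: mu(g, x) for x in X}
    assert sorted(L_g.values()) == X                  # L_g is a bijection of X
    assert all(mu(inv(g), L_g[x]) == x for x in X)    # (L_g)^{-1} = L_{g^{-1}}
```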
Such a left transformation group G < (G, X, σ, μ) is said to act effectively on X.
20.2.2 Remark: No two group elements produce the same action in an effective transformation group.
If (G, X, σ, μ) is an effective left transformation group and g, g′ ∈ G are such that L_g = L_{g′}, then g = g′.
In other words, no two different group elements produce the same action. That is, the group element is
uniquely determined by its group action. Conversely, a left group action which is uniquely determined by
the group element must be an effective left action.
20.2.4 Example: The group of actions of a group on cosets is typically not an effective group.
A fairly general example of a non-effective left transformation group is given by the tuple (G, X_ℓ, σ, μ_ℓ) which
is the group of left translations of the set X_ℓ of left cosets of an arbitrary subgroup H of an arbitrary group
G < (G, σ) as described in Remark 17.7.13. If H is not the trivial group, then (G, X_ℓ, σ, μ_ℓ) is not effective.
20.2.5 Remark: Effective transformation groups are isomorphic to subgroups of the autobijection group.
If the point set X of an effective left transformation group (G, X, σ, μ) is the empty set or a singleton, the
only possible choice for G is the trivial group {e}. Since the set of all left transformations of a set X must
be a subgroup of the group of all bijections from X to X, it is clear that the group of an effective left
transformation group must be isomorphic to a subgroup of this group of all bijections from X to X.
20.2.6 Theorem: The group operation of an effective left transformation group is uniquely determined by
the action map.
Proof: Let (G, X, σ, μ) be an effective left transformation group. Let g₁, g₂ ∈ G. Then μ(g₁g₂, x) =
μ(g₁, μ(g₂, x)) for all x ∈ X. Suppose that there are two group elements g₃, g₄ ∈ G such that μ(g₃, x) =
μ(g₄, x) = μ(g₁g₂, x) for all x ∈ X. Then for all x ∈ X, μ(g₃g₄⁻¹, x) = μ(g₃, μ(g₄⁻¹, x)) = μ(g₄, μ(g₄⁻¹, x)) =
μ(g₄g₄⁻¹, x) = μ(e, x) = x. This implies that g₃g₄⁻¹ = e since the group action is effective. So the group
element σ(g₁, g₂) = g₁g₂ is uniquely determined by the action map μ.
20.2.7 Remark: Group elements may be identified with group actions if the action is effective.
Theorem 20.2.6 implies that the group operation of an effective left transformation group (G, X, σ, μ) does
not need to be specified because all of the information is in the action map. An effective transformation group
is no more than the set of left transformations L_g : X → X of the set X. If the group action is not effective,
then there are at least two group elements g, g′ ∈ G which specify the same action L_g = L_{g′} : X → X. Any
group which is explicitly constructed as a set of transformations of a set will automatically be effective.
If a left transformation group is effective, the group elements g and the corresponding left translations L_g
may be used interchangeably. There is no danger of real ambiguity in this.
In concrete applications of transformation groups, it is generally the transformations Lg : X X which are
defined, not the group elements g G. The group is then defined as a subgroup of the group of all bijections
from X to X. In other words, the transformations are concrete, and the group is an abstraction constructed
from the concrete group.
It follows that in concrete applications, transformation groups are almost always effective. This fails to be
true, however, when a group is first constructed from one set of transformations, and it is then applied to a
different passive set. For example, the group G of linear transformations of the Cartesian space X₁ = Rⁿ
may be applied to the set X₂ of lines {λx; λ ∈ R \ {0}} ⊆ Rⁿ with x ∈ Rⁿ \ {0}. In this case, the original
group G may be retained for convenience, even though it is not effective on the substituted passive set X₂.
20.2.8 Theorem: Let (G, X, σ, μ) be a left transformation group. Let (G′, σ′) be the group of all (left)
bijections from X to X. That is, G′ = {f : X → X; f is a bijection} and the (left) composition operation
σ′ : G′ × G′ → G′ is defined by ∀f₁, f₂ ∈ G′, σ′(f₁, f₂) = f₁ ∘ f₂. Define φ : G → G′ by φ(g) : X → X
with φ(g)(x) = μ(g, x) for all x ∈ X, for all g ∈ G. Then:
(i) φ is a group homomorphism.
(ii) φ is a group monomorphism if and only if (G, X, σ, μ) is effective.
Proof: Let (G, X, σ, μ) be a left transformation group, and let (G′, σ′) be the group of all (left) bijec-
tions from X to X. Define φ : G → G′ by φ(g) : X → X with φ(g)(x) = μ(g, x) for all x ∈ X, for
all g ∈ G. To verify that φ is a well-defined map, note that φ(g⁻¹)(φ(g)(x)) = μ(g⁻¹, μ(g, x)) = x by
Definition 20.1.2 (i, ii). So φ(g) is a bijection. So φ(g) ∈ G′.
Let g₁, g₂ ∈ G. Then σ′(φ(g₁), φ(g₂))(x) = (φ(g₁) ∘ φ(g₂))(x) = φ(g₁)(φ(g₂)(x)) = μ(g₁, μ(g₂, x)) =
μ(σ(g₁, g₂), x) = φ(σ(g₁, g₂))(x) for all x ∈ X, by Definition 20.1.2 (i). So σ′(φ(g₁), φ(g₂)) = φ(σ(g₁, g₂)).
Hence φ is a group homomorphism by Definition 17.4.1.
For part (ii), φ : G → G′ is a group monomorphism if and only if φ(g) ≠ φ(e_G) for all g ∈ G \ {e_G}, if and
only if ∃x ∈ X, φ(g)(x) ≠ φ(e_G)(x), for all g ∈ G \ {e_G}, if and only if ∀g ∈ G \ {e_G}, ∃x ∈ X, μ(g, x) ≠ x,
if and only if (G, X, σ, μ) is effective, by Definition 20.2.1.
20.2.9 Remark: The map from group elements to the corresponding action maps.
Theorem 20.2.8 maps effective left transformation groups on X to subgroups of bijections on X. So all effective
left transformation groups are essentially subgroups of the symmetric group on X. The elements of the group
G may be thought of as mere tags or labels for bijections on X.
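The map φ of Theorem 20.2.8 can be computed explicitly for finite examples: each group element is sent to the tuple of its images of the points of X, and injectivity of φ is exactly effectiveness. The sketch below (illustrative names) contrasts an effective action with a non-effective one.

```python
# phi maps each group element to its action map on X, recorded as a tuple;
# phi is injective if and only if the action is effective (illustrative names).
def phi(G, X, mu):
    return {g: tuple(mu(g, x) for x in X) for g in G}

# Effective: Z_3 acting on {0, 1, 2} by addition mod 3.
G3, X3 = [0, 1, 2], [0, 1, 2]
maps3 = phi(G3, X3, lambda g, x: (g + x) % 3)
assert len(set(maps3.values())) == len(G3)      # phi injective: action is effective

# Not effective: Z_4 acting on {0, 1} through the quotient Z_2.
G4, X2 = [0, 1, 2, 3], [0, 1]
maps4 = phi(G4, X2, lambda g, x: (g + x) % 2)
assert len(set(maps4.values())) < len(G4)       # phi not injective: not effective
```

In the second example the elements 0 and 2 (and likewise 1 and 3) are mere duplicate labels for the same bijection of X, as in Remark 20.2.7.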
20.3.4 Theorem: Let (G, X, σ, μ) be a left transformation group. If X ≠ ∅ and G acts freely on X, then
(G, X, σ, μ) is an effective left transformation group.
Proof: Let (G, X, σ, μ) be a left transformation group with X ≠ ∅. Assume that G acts freely on X.
Let g ∈ G \ {e_G}. Let x ∈ X. Then μ(g, x) ≠ x by Definition 20.3.2. Therefore ∀g ∈ G \ {e}, ∃x ∈ X, gx ≠ x.
So G acts effectively on X by Definition 20.2.1.
20.3.5 Theorem: Let (G, X, σ, μ) be a left transformation group. Then G acts freely on X if and only if
∀p ∈ X, ∀g₁, g₂ ∈ G, (μ(g₁, p) = μ(g₂, p) ⟹ g₁ = g₂). (In other words, G acts freely on X if and only if the
map μ_p : G → X defined by μ_p : g ↦ μ(g, p) is injective for all p ∈ X.)
Proof: Let (G, X, σ, μ) be a left transformation group which acts freely on X. Let p ∈ X and g₁, g₂ ∈
G satisfy μ(g₁, p) = μ(g₂, p). Then p = μ(e, p) = μ(g₁⁻¹, μ(g₁, p)) = μ(g₁⁻¹, μ(g₂, p)) = μ(g₁⁻¹g₂, p) by
Definition 20.1.2, where e is the identity of G. Therefore g₁⁻¹g₂ = e by Definition 20.3.2. So g₁ = g₂.
Now suppose that (G, X, σ, μ) satisfies ∀p ∈ X, ∀g₁, g₂ ∈ G, (μ(g₁, p) = μ(g₂, p) ⟹ g₁ = g₂). Let g ∈ G and
p ∈ X satisfy μ(g, p) = p. Then μ(g, p) = μ(e, p). So g = e. Hence G acts freely on X.
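The injectivity criterion of Theorem 20.3.5 can be checked for the canonical example mentioned below, a group acting on itself by left translation. The sketch (illustrative names) verifies both the injectivity of each map μ_p and the equivalent fixed-point-free condition.

```python
# Z_6 with addition mod 6, acting on itself by left translation (illustrative names).
# Such an action is free: g -> mu(g, p) is injective for each p.
G = list(range(6))
mu = lambda g, p: (g + p) % 6

for p in G:
    images = [mu(g, p) for g in G]
    assert len(set(images)) == len(G)   # mu_p injective, so the action is free

# Equivalently (Definition 20.3.2), only the identity has a fixed point:
assert all(mu(g, p) != p for g in G if g != 0 for p in G)
```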
20.3.6 Remark: Applicability of groups which act freely to principal fibre bundles.
A canonical example of a transformation group which acts freely is a group acting on itself by left or right
translation as in Theorem 20.3.7. This is the kind of transformation group which is needed for principal fibre
bundles in Sections 46.9 and 61.9, which play an important role in defining parallelism and connections.
20.3.7 Theorem: A group acting on itself is effective and free.
Let (G, σ) be a group. Define the action map μ : G × G → G by μ : (g₁, g₂) ↦ σ(g₁, g₂). Then the tuple
(G, G, σ, μ) = (G, G, σ, σ) is an effective, free left transformation group of G.
Proof: For a left transformation group, the action map μ : G × X → X must satisfy the associativity
rule μ(σ(g₁, g₂), x) = μ(g₁, μ(g₂, x)) for all g₁, g₂ ∈ G and x ∈ X. If the formula for μ in the theorem is
substituted into this rule with X = G, it follows easily from the associativity of σ. So (G, G, σ, σ) is a left
transformation group.
This left transformation group acts freely by Theorem 17.3.19 (iii) and Definition 20.3.2. Therefore it is
effective by Theorem 20.3.4 because G ≠ ∅.
20.3.8 Definition: The (left) transformation group of G acting on G by left translation, or the left
translation group of G, is the left transformation group (G, G, σ, σ).
converse which asserts that any effective, free and transitive left transformation group is isomorphic to the
group acting on itself. Any point in the passive set may be chosen to be its identity element. (Note that
by Theorem 20.3.4, the transformation group in Theorem 20.4.6 must be effective because it is free and the
passive set is assumed non-empty.)
20.4.5 Theorem: A group acting on itself is transitive.
Let (G, σ) be a group. Define the left action map μ : G × G → G by μ : (g₁, g₂) ↦ σ(g₁, g₂). Then the tuple
(G, G, σ, μ) = (G, G, σ, σ) is a transitive left transformation group.
Proof: Let x, y ∈ G. Let g = yx⁻¹. Then μ(g, x) = y. Hence (G, G, σ, σ) is transitive.
20.4.6 Theorem: Free transitive transformation group of non-empty set is isomorphic to self-acting group.
Let (G, X, σ_G, μ) be a left transformation group which is free and transitive. Let p ∈ X. Define the bijection
μ_p : G → X by μ_p : g ↦ gp. Define σ_X : X × X → X by
∀x, y ∈ X, σ_X(x, y) = μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(y))). (20.4.1)
Then (X, σ_X) is a group and μ_p : G → X is a group isomorphism.
Proof: Theorem 20.4.3 (iii) implies that μ_p : G → X is a bijection. So σ_X : X × X → X is a well-defined
function, and for all x, y, z ∈ X,
σ_X(σ_X(x, y), z) = μ_p(σ_G(μ_p⁻¹(μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(y)))), μ_p⁻¹(z)))
= μ_p(σ_G(σ_G(μ_p⁻¹(x), μ_p⁻¹(y)), μ_p⁻¹(z)))
= μ_p(σ_G(μ_p⁻¹(x), σ_G(μ_p⁻¹(y), μ_p⁻¹(z))))
= μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(μ_p(σ_G(μ_p⁻¹(y), μ_p⁻¹(z))))))
= σ_X(x, σ_X(y, z)),
so σ_X is associative. Let e_X = μ_p(e_G). Then for each x ∈ X, the element y = μ_p((μ_p⁻¹(x))⁻¹) satisfies
σ_X(x, y) = μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(y)))
= μ_p(σ_G(μ_p⁻¹(x), μ_p⁻¹(μ_p((μ_p⁻¹(x))⁻¹))))
= μ_p(σ_G(μ_p⁻¹(x), (μ_p⁻¹(x))⁻¹))
= μ_p(e_G) = e_X.
20.5.2 Definition: The orbit of a left transformation group (G, X) passing through the point x ∈ X is
the set Gx = {gx; g ∈ G}.
20.5.3 Example: The orbits of the group of rotations of the sphere S² about its North pole are the sets
of points of constant latitude. The group action causes translations within each orbit but not between them.
(This is illustrated in Figure 20.5.1.)
Figure 20.5.1. Orbits of the rotations of S² about the North pole.
By contrast, the orbits of the group of all rotations of S² are all equal to S² itself because from any given
point x ∈ S², it is always possible to find a rotation which will translate x to any other point y ∈ S².
20.5.4 Remark: The relation between orbits and transitivity of a transformation group.
By Definition 20.4.1, a left transformation group (G, X) is transitive if and only if all of its orbits equal the
whole point set X. By Theorem 20.4.2, one orbit equals X if and only if all orbits equal X. Hence the set
of all orbits in Definition 20.5.6 is the uninteresting singleton {X} if the group action is transitive.
Roughly speaking, the bigger the group, the bigger the orbits and the smaller the orbit space, whereas a
bigger passive space typically has a larger orbit space. As noted in Example 20.5.3, a subgroup of SO(3)
consisting of rotations around the North pole has infinitely many orbits within the passive set S², whereas
the larger group of all elements of SO(3) has only one orbit. However, if the passive space S² is replaced
with R³, the whole group of rotations once again has infinitely many orbits, namely the spheres of all radii
centred on the origin 0 ∈ R³.
[ www.geometry.org/dg.html ] [ draft: UTC 2017-4-9 Sunday 13:30 ]
20.5.5 Theorem: A left transformation group (G, X, σ, μ) acts transitively on a non-empty set X if and
only if X is the orbit of some element of X.
20.5.6 Definition: The orbit space of a left transformation group (G, X) is the set {Gx; x ∈ X} of orbits
of (G, X) passing through points x ∈ X.
20.5.7 Notation: X/G denotes the orbit space of a left transformation group (G, X).
20.5.8 Remark: The orbit space may be regarded as a point space for the group.
Theorem 20.5.9 effectively implies that the elements of the orbit space are acted on by elements of the group
in a well-defined manner which permits the orbit space to be regarded as a point space for the group.
The proof of Theorem 20.5.9 has the same form as the proof of Theorem 17.7.4. The orbit space can
also be shown to be a partition of the passive set X by noting that the relation (R, X, X) defined by
R = {(x₁, x₂) ∈ X × X; ∃g ∈ G, x₁ = gx₂} is an equivalence relation whose equivalence classes are of the
form Gx for x ∈ X.
20.5.9 Theorem: The orbit space of a left transformation group (G, X) is a partition of X.
Proof: Let (G, X, σ, μ) be a left transformation group. Let x₁, x₂ ∈ X be such that Gx₁ ∩ Gx₂ ≠ ∅. Then
g₁x₁ = g₂x₂ for some g₁, g₂ ∈ G. So for any g ∈ G, gx₁ = (gg₁⁻¹)g₁x₁ = gg₁⁻¹g₂x₂ ∈ Gx₂. Hence Gx₁ ⊆ Gx₂.
Similarly Gx₂ ⊆ Gx₁. So Gx₁ = Gx₂, and it follows that X/G is a partition of X.
20.5.10 Definition: The stabiliser of a point x ∈ X for a left transformation group (G, X, σ, μ) is the
group (Gₓ, σₓ) with Gₓ = {g ∈ G; gx = x} and σₓ = σ↾(Gₓ × Gₓ).
20.5.11 Remark: Relation between stabilisers and the free-action property of a transformation group.
A left transformation group (G, X, σ, µ) acts freely on X if and only if Gx = {e} for all x ∈ X. This follows
immediately from Definitions 20.3.2 and 20.5.10.
Thus the group acts freely if and only if every stabiliser is a singleton, whereas the group acts transitively if
and only if the orbit space is a singleton, as mentioned in Remark 20.5.4.
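As a concrete illustration of orbits, stabilisers, and the free and transitive action properties, the following sketch (not from the book; the group and its representation are chosen purely for illustration) computes the orbits and stabilisers of the cyclic rotation group C4 acting on the four vertices of a square.

```python
# Hypothetical illustration: the cyclic group C4 of quarter-turn rotations
# acting on the vertices {0, 1, 2, 3} of a square, with each group element
# represented as a permutation tuple g, so that the action is mu(g, x) = g[x].
e  = (0, 1, 2, 3)          # identity
r  = (1, 2, 3, 0)          # rotation by 90 degrees
r2 = (2, 3, 0, 1)          # rotation by 180 degrees
r3 = (3, 0, 1, 2)          # rotation by 270 degrees
G = [e, r, r2, r3]
X = [0, 1, 2, 3]           # the passive set of vertices

def act(g, x):
    """Left action mu(g, x): send vertex x to g[x]."""
    return g[x]

def orbit(x):
    """The orbit Gx = {gx; g in G}."""
    return frozenset(act(g, x) for g in G)

def stabiliser(x):
    """The stabiliser G_x = {g in G; gx = x}."""
    return [g for g in G if act(g, x) == x]

orbits = {orbit(x) for x in X}
print(orbits)                                  # a single orbit: the action is transitive
print(all(stabiliser(x) == [e] for x in X))    # every stabiliser is {e}: the action is free
```

The singleton orbit space confirms transitivity, and the trivial stabilisers confirm freeness, matching the equivalences noted in Remarks 20.5.4 and 20.5.11.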
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
20.6. Left transformation group homomorphisms
20.6.1 Remark: Applicability of transformation group homomorphisms to fibre bundles.
Definition 20.6.2 is related to the definition of fibre bundle homomorphisms, which are in turn related to
the export and import of parallelism between associated fibre bundles in Section 47.4. Definition 20.6.2 is
illustrated in Figure 20.6.1.
Figure 20.6.1 Left transformation group homomorphism [diagram: φ : G1 → G2 and ψ : X1 → X2, with actions µ1 on X1 and µ2 on X2]
20.6.2 Definition: A (left) transformation group homomorphism from a left transformation group
(G1, X1) −< (G1, X1, σ1, µ1) to a left transformation group (G2, X2) −< (G2, X2, σ2, µ2) is a pair of maps (φ, ψ)
with φ : G1 → G2 and ψ : X1 → X2 such that
(i) φ(σ1(g, h)) = σ2(φ(g), φ(h)) for all g, h ∈ G1. That is, φ(gh) = φ(g)φ(h) for g, h ∈ G1.
(ii) ψ(µ1(g, x)) = µ2(φ(g), ψ(x)) for all g ∈ G1, x ∈ X1. That is, ψ(gx) = φ(g)ψ(x) for g ∈ G1, x ∈ X1.
A (left) transformation group monomorphism from (G1, X1) to (G2, X2) is a left transformation group
homomorphism (φ, ψ) such that φ : G1 → G2 and ψ : X1 → X2 are injective.
A (left) transformation group epimorphism from (G1, X1) to (G2, X2) is a left transformation group
homomorphism (φ, ψ) such that φ : G1 → G2 and ψ : X1 → X2 are surjective.
A (left) transformation group isomorphism from (G1, X1) to (G2, X2) is a left transformation group homo-
morphism (φ, ψ) such that φ : G1 → G2 and ψ : X1 → X2 are bijections.
A (left) transformation group endomorphism of a left transformation group (G, X) is a left transformation
group homomorphism (φ, ψ) from (G, X) to (G, X).
A (left) transformation group automorphism of a left transformation group (G, X) is a left transformation
group isomorphism (φ, ψ) from (G, X) to (G, X).
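Conditions (i) and (ii) of Definition 20.6.2 can be checked exhaustively for small finite groups. The following sketch (a hypothetical example, not from the book) verifies that the doubling maps φ(g) = 2g and ψ(x) = 2x give a transformation group monomorphism from (Z2, Z2) into (Z4, Z4), where each group acts on itself by addition.

```python
# Hypothetical example: (G1, X1) = (Z_2, Z_2) and (G2, X2) = (Z_4, Z_4),
# each group acting on itself by addition; phi and psi are doubling maps.
G1, X1 = range(2), range(2)
G2, X2 = range(4), range(4)

sigma1 = lambda g, h: (g + h) % 2       # group operation of G1
sigma2 = lambda g, h: (g + h) % 4       # group operation of G2
mu1 = lambda g, x: (g + x) % 2          # action of G1 on X1
mu2 = lambda g, x: (g + x) % 4          # action of G2 on X2

phi = lambda g: 2 * g                   # group map
psi = lambda x: 2 * x                   # point-space map

# Condition (i): phi(sigma1(g, h)) = sigma2(phi(g), phi(h)).
cond_i = all(phi(sigma1(g, h)) == sigma2(phi(g), phi(h))
             for g in G1 for h in G1)
# Condition (ii): psi(mu1(g, x)) = mu2(phi(g), psi(x)).
cond_ii = all(psi(mu1(g, x)) == mu2(phi(g), psi(x))
              for g in G1 for x in X1)
print(cond_i and cond_ii)
```

Since both φ and ψ are injective, this pair of maps is a monomorphism in the sense of Definition 20.6.2.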
20.6.3 Example: Neither the group-map nor the point-space-map uniquely determines the other.
One naturally asks whether the pair of maps in Definition 20.6.2 may be replaced with a single map.
Unfortunately, neither φ nor ψ determines the other. Consider the example of embedding the transformation
group (G1, X1) = (SO(2), S¹) inside (G2, X2) = (GL(2), ℝ²). The pair of identity maps φ : G1 → G2 and
ψ : X1 → X2 constitutes a transformation group homomorphism. (See Figure 20.6.2.)
Figure 20.6.2 Transformation group homomorphism in Example 20.6.3 [diagram: φ : SO(2) → GL(2) and ψ : S¹ → ℝ², with x ∈ S¹ ↦ ψ(x) ∈ ℝ² and actions µ1, µ2]
In fact, it is a monomorphism. But for a fixed choice of φ, the pair (φ, ψ) is a transformation group
homomorphism for any conformal map ψ : ℝ² → ℝ². For a fixed choice of ψ, if the target group G2 is not
specified, then any supergroup of G2 with the same map φ is also a transformation group homomorphism. If
the target group G2 is given, the map φ may or may not be uniquely specified by the map ψ. In the current
example, φ is uniquely determined by ψ. But if the group G2 is replaced with the group of all bijections
of ℝ² (the permutation group of ℝ²), then, for example, φ may be replaced by the map which sends group
elements R ∈ SO(2) to bijections of ℝ² which rotate the points of S¹ in the same way but leave all other
points of ℝ² fixed. More generally, any map ψ determines only the action of group elements φ(g) ∈ G2 on
points in the image of ψ.
It follows from this example that transformation group homomorphisms must specify both maps in general.
20.6.4 Example: Freedom of choice of the point-space map for a given group map.
Let X1 = X2 = ℝ² with G1 = G2 = SO(2). Define φ : G1 → G2 by φ = id_G1 and ψ : x ↦ λ(|x|)R(|x|)x
with λ : ℝ₀⁺ → ℝ and R : ℝ₀⁺ → SO(2). Then φ clearly satisfies Definition 20.6.2 (i) for a transformation
group homomorphism. Condition (ii) stipulates ∀S ∈ SO(2), ∀x ∈ ℝ², ψ(Sx) = Sψ(x). This means
∀S ∈ SO(2), ∀x ∈ ℝ²,
λ(|Sx|)R(|Sx|)Sx = Sλ(|x|)R(|x|)x
= λ(|x|)R(|x|)Sx
because S commutes with both the scaling transformations x ↦ λ(|x|)x and the rotations x ↦ R(|x|)x. But
|Sx| = |x| for all x ∈ ℝ². So this requirement is satisfied for completely arbitrary functions λ : ℝ₀⁺ → ℝ
and R : ℝ₀⁺ → SO(2). Another way to think of this is that the value of ψ(x) can be completely freely
chosen for x ∈ ℝ₀⁺ × {0}, but then ψ(x) is determined for all other x ∈ ℝ² by rotating ψ((|x|, 0)) by the
angle arctan(x1, x2). (See Definition 43.14.8 for the two-parameter inverse tangent function.)
Thus for x within each orbit of G1, the value of ψ(x) is completely determined by the choice of ψ(x0) for
any single point x0 in that orbit. Between the orbits, there is complete freedom.
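The equivariance computation in Example 20.6.4 can be checked numerically. In the following sketch (illustrative code, not from the book), lam and theta are arbitrarily chosen radial functions, with theta(r) giving the rotation angle of R(r) in SO(2); the check confirms that ψ(Sx) = Sψ(x) for random rotations S and points x.

```python
import math
import random

def rot(a):
    """The SO(2) rotation matrix for angle a, as a 2x2 tuple of tuples."""
    return ((math.cos(a), -math.sin(a)), (math.sin(a), math.cos(a)))

def apply(M, x):
    """Apply a 2x2 matrix M to a point x in R^2."""
    return (M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1])

lam = lambda r: 1.0 + r * r            # arbitrary radial scaling function
theta = lambda r: math.sin(3.0 * r)    # arbitrary radial rotation angle

def psi(x):
    """psi(x) = lam(|x|) R(|x|) x, as in Example 20.6.4."""
    r = math.hypot(x[0], x[1])
    s = lam(r)
    y = apply(rot(theta(r)), x)
    return (s * y[0], s * y[1])

random.seed(0)
for _ in range(100):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    S = rot(random.uniform(0, 2 * math.pi))
    lhs, rhs = psi(apply(S, x)), apply(S, psi(x))
    assert abs(lhs[0] - rhs[0]) < 1e-9 and abs(lhs[1] - rhs[1]) < 1e-9
print("psi(Sx) = S psi(x) for all sampled S and x")
```

The check succeeds because |Sx| = |x| and all rotations in SO(2) commute, exactly as argued in the example.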
20.7.2 Definition: A right transformation group on a set X is a tuple (G, X, σ, µ) where (G, σ) is a group,
and the map µ : X × G → X satisfies
(i) ∀g1, g2 ∈ G, ∀x ∈ X, µ(x, σ(g1, g2)) = µ(µ(x, g1), g2).
(ii) ∀x ∈ X, µ(x, e) = x, where e is the identity of G.
The map µ may be referred to as the (right) action map or the (right) action of G on X.
20.7.3 Notation: For a right transformation group in Definition 20.7.2, µ(x, g) is generally denoted as xg
or x.g for g ∈ G and x ∈ X. The associative condition (i) ensures that this does not result in ambiguity,
since x(g1g2) = (xg1)g2 for all g1, g2 ∈ G and x ∈ X.
20.7.4 Remark: The associativity rules of left and right transformation groups are different.
The difference between left and right transformation groups lies in the associativity rule, condition (i) in
Definition 20.7.2, rather than in the order of G and X in the domain G × X or X × G of the action µ.
20.7.5 Notation: R_g^µ denotes the function {(x, µ(x, g)); (x, g) ∈ Dom(µ)}, where µ is the right action
map of a right transformation group (G, X, σ, µ), for each g ∈ G.
R_g denotes the function R_g^µ when the right action map µ is implicit in the context.
20.7.8 Definition: An effective right transformation group is a right transformation group (G, X, σ, µ)
such that ∀g ∈ G \ {e}, ∃x ∈ X, xg ≠ x. (In other words, R_g = R_e only if g = e.)
Such a right transformation group G −< (G, X, σ, µ) is said to act effectively on X.
20.7.9 Theorem: The group operation σ in Definition 20.7.2 is uniquely determined by the action map µ
if the group action is effective.
Proof: The argument is essentially the same as for left transformation groups in Theorem 20.2.6.
20.7.10 Definition: A right transformation group G −< (G, X, σ, µ) is said to act freely on the set X if
∀g ∈ G \ {e}, ∀x ∈ X, µ(x, g) ≠ x. That is, the only group element with a fixed point is the identity e.
A free right transformation group is a right transformation group (G, X, σ, µ) which acts freely on X.
20.7.11 Theorem: Let (G, X, σ, µ) be a right transformation group. If X ≠ ∅ and G acts freely on X,
then (G, X, σ, µ) is an effective right transformation group.
Proof: Let (G, X, σ, µ) be a right transformation group with X ≠ ∅. Assume that G acts freely on X.
Let g ∈ G \ {e_G}. Let x ∈ X. Then µ(x, g) ≠ x by Definition 20.7.10. Therefore ∀g ∈ G \ {e}, ∃x ∈ X, xg ≠ x.
So G acts effectively on X by Definition 20.7.8.
20.7.12 Theorem: Let (G, X, σ, µ) be a right transformation group. Then G acts freely on X if and only
if ∀y ∈ X, ∀g1, g2 ∈ G, (µ(y, g1) = µ(y, g2) ⇒ g1 = g2). (In other words, G acts freely on X if and only if
the map µ_y : G → X defined by µ_y : g ↦ µ(y, g) is injective for all y ∈ X.)
Proof: Let (G, X, σ, µ) be a right transformation group which acts freely on X. Let y ∈ X and g1, g2 ∈
G satisfy µ(y, g1) = µ(y, g2). Then y = µ(y, e) = µ(µ(y, g1), g1⁻¹) = µ(µ(y, g2), g1⁻¹) = µ(y, g2g1⁻¹) by
Definition 20.7.2, where e is the identity of G. Therefore g2g1⁻¹ = e by Definition 20.7.10. So g1 = g2.
Now suppose that (G, X, σ, µ) satisfies ∀y ∈ X, ∀g1, g2 ∈ G, (µ(y, g1) = µ(y, g2) ⇒ g1 = g2). Let g ∈ G and
y ∈ X satisfy µ(y, g) = y. Then µ(y, g) = µ(y, e). So g = e. Hence G acts freely on X.
20.7.13 Definition: A transitive right transformation group is a right transformation group (G, X, σ, µ)
such that ∀x, y ∈ X, ∃g ∈ G, µ(x, g) = y. (In other words, ∀x ∈ X, {µ(x, g); g ∈ G} = X.)
A right transformation group G −< (G, X, σ, µ) is said to act transitively on X if it is a transitive right
transformation group.
20.7.14 Theorem: A right transformation group (G, X, σ, µ) acts transitively on a non-empty set X if
and only if ∃x ∈ X, {µ(x, g); g ∈ G} = X. (That is, G acts transitively on a non-empty set X if and only if
X is the orbit of some element of X.)
Proof: Let (G, X, σ, µ) be a transitive right transformation group. Then ∀x ∈ X, {µ(x, g); g ∈ G} = X.
So clearly ∃x ∈ X, {µ(x, g); g ∈ G} = X if X is non-empty.
Let (G, X, σ, µ) be a right transformation group. Let x ∈ X satisfy {µ(x, g); g ∈ G} = X. Let y ∈ X. Then
µ(x, g_y) = y for some g_y ∈ G. So µ(y, g_y⁻¹) = x by Theorem 20.1.9. Therefore µ(x, g) = µ(µ(y, g_y⁻¹), g) =
µ(y, g_y⁻¹g) for all g ∈ G. In other words, X = {µ(x, g); g ∈ G} = {µ(y, g_y⁻¹g); g ∈ G} = {µ(y, g′); g′ ∈ G},
substituting g′ for g_y⁻¹g. In other words, {µ(y, g); g ∈ G} = X.
20.7.15 Theorem: Let (G, X, σ, µ) be a right transformation group. For all y ∈ X, define µ_y : G → X by
µ_y : g ↦ µ(y, g).
(i) G acts freely on X if and only if µ_y is injective for all y ∈ X.
(ii) G acts transitively on X if and only if µ_y is surjective for all y ∈ X.
(iii) G acts freely and transitively on X if and only if µ_y is a bijection for all y ∈ X.
Proof: Part (i) is a restatement of Theorem 20.7.12. Part (ii) follows directly from Definition 20.7.13.
Part (iii) follows from parts (i) and (ii).
20.7.16 Theorem: Let G −< (G, σ) be a group. Define a right action map µ : G × G → G by µ : (g1, g2) ↦
σ(g1, g2). Then the tuple (G, G, σ, µ) = (G, G, σ, σ) is an effective, free, transitive right transformation group.
Proof: For a right transformation group, the right action map µ : X × G → X must satisfy the associativity
rule µ(x, σ(g1, g2)) = µ(µ(x, g1), g2) for all g1, g2 ∈ G and x ∈ X. If the formula for µ in the theorem is
substituted into this rule with X = G, it follows easily from the associativity of σ.
This right transformation group acts freely by Theorem 17.3.19 (iv) and Definition 20.7.10. Therefore it
is effective by Theorem 20.7.11 because G ≠ ∅. To show that (G, G, σ, µ) is transitive, let x, y ∈ G. Let
g = x⁻¹y. Then µ(x, g) = xx⁻¹y = y. So (G, G, σ, µ) is a transitive right transformation group by
Definition 20.7.13.
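The conclusions of Theorem 20.7.16 can be confirmed exhaustively for a small nonabelian group. The following sketch (not from the book) represents the symmetric group S3 as permutation tuples and checks that the right translation action µ(x, g) = σ(x, g) is effective, free and transitive, via the maps µ_y of Theorem 20.7.15.

```python
from itertools import permutations

# S3 as permutation tuples; sigma(g1, g2) composes permutations.
G = list(permutations(range(3)))

def sigma(g1, g2):
    """Group operation: (g1 g2)(i) = g1(g2(i))."""
    return tuple(g1[g2[i]] for i in range(3))

mu = sigma                 # right translation action on X = G
e = (0, 1, 2)              # identity permutation

for y in G:
    images = [mu(y, g) for g in G]
    assert len(set(images)) == len(G)     # mu_y injective: the action is free
    assert set(images) == set(G)          # mu_y surjective: the action is transitive
# Effective: every g != e moves some point (take x = e, so mu(e, g) = g != e).
assert all(any(mu(x, g) != x for x in G) for g in G if g != e)
print("right translation on S3 is effective, free and transitive")
```

By Theorem 20.7.15 (iii), each µ_y being a bijection is exactly the free-and-transitive property.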
20.7.17 Definition: The right transformation group of G acting on G by right translation, or the right
translation group of G, is the right transformation group (G, G, σ, σ).
20.7.18 Remark: Insufficiency of specification tuples to distinguish left from right transformation groups.
Oddly enough, the tuple (G, G, σ, σ) is identical for the left and right transformation groups where G acts
on G. This is yet another example which demonstrates that definitions of object classes are more than just
the sets in the specification tuples. The same tuple is used to parametrise two different classes of objects:
both the left and right transformation group classes.
20.7.19 Definition: The orbit of a right transformation group (G, X) passing through the point x ∈ X
is the set xG = {xg; g ∈ G}.
20.7.20 Remark: The relation of transitivity to orbits.
In terms of the definition of an orbit, Definition 20.7.13 defines a right transformation group (G, X) to be
transitive if and only if all of its orbits are equal to the whole point set X, and Theorem 20.7.14 means that
if one of the orbits equals X, then all of the orbits equal X.
20.7.21 Theorem: A right transformation group (G, X, σ, µ) acts transitively on a non-empty set X if
and only if X is the orbit of some element of X.
Proof: The assertion follows directly from Theorem 20.7.14.
20.7.22 Definition: The orbit space of a right transformation group (G, X) is the set {xG; x ∈ X} of
orbits of (G, X) passing through points x ∈ X.
20.7.23 Notation: X/G denotes the orbit space of a right transformation group (G, X).
(i) φ(σ1(g, h)) = σ2(φ(g), φ(h)) for all g, h ∈ G1. That is, φ(gh) = φ(g)φ(h) for g, h ∈ G1.
(ii) ψ(µ1(x, g)) = µ2(ψ(x), φ(g)) for all g ∈ G1, x ∈ X1. That is, ψ(xg) = ψ(x)φ(g) for g ∈ G1, x ∈ X1.
A right transformation group monomorphism from (G1, X1) to (G2, X2) is a right transformation group
homomorphism (φ, ψ) such that φ : G1 → G2 and ψ : X1 → X2 are injective.
A right transformation group epimorphism from (G1, X1) to (G2, X2) is a right transformation group
homomorphism (φ, ψ) such that φ : G1 → G2 and ψ : X1 → X2 are surjective.
A right transformation group isomorphism from (G1, X1) to (G2, X2) is a right transformation group homo-
morphism (φ, ψ) such that φ : G1 → G2 and ψ : X1 → X2 are bijections.
A right transformation group endomorphism of a right transformation group (G, X) is a right transformation
group homomorphism (φ, ψ) from (G, X) to (G, X).
A right transformation group automorphism of a right transformation group (G, X) is a right transformation
group isomorphism (φ, ψ) from (G, X) to (G, X).
20.8.2 Remark: Relevance of groups of automorphisms of right transformation groups to affine spaces.
The set of all automorphisms of a right transformation group constitutes a group with respect to the compo-
sition operation. Groups of automorphisms of right transformation groups are relevant to affine spaces. In
the case of an affine space X over a group G, for example, a transformation of the point set X corresponds
to a transformation of the group G. (See Section 26.2 for affine spaces over groups.) The group action is
transformed in accordance with Definition 20.8.1.
20.8.3 Remark: Relevance of equivariant maps between right transformation groups to fibre bundles.
Definition 20.8.4 is relevant to the definition of principal fibre bundles. (See Remark 46.9.15.) The fibre
charts of a principal fibre bundle are equivariant maps with respect to the right action of a structure group
on the total space of a principal fibre bundle.
20.9.2 Remark: Some methods for constructing figures associated with a given transformation group.
Some methods for constructing transformation groups (G, F, σ, µ_F) associated with a given transformation
group (G, X, σ, µ_X) are outlined in the following list. For each set of figures F, the map µ_F : G × F → F
denotes the action of G on F. The constructed left transformation groups are associated with the given left
transformation group in the sense that µ_F(σ(g1, g2), f) = µ_F(g1, µ_F(g2, f)) for all f ∈ F, for all g1, g2 ∈ G.
In each construction method, g may be replaced by g⁻¹ to give an alternative associated construction. In
each case, the set F is indicated as a subset of some set. This subset must be chosen so that the operation
µ_F is closed.
(1) The figures are individual points of the base set X. This is the trivial case.
F ⊆ X
∀g ∈ G, ∀f ∈ F, µ_F(g, f) = µ_X(g, f).
(2) The figures are subsets of X.
F ⊆ P(X)
∀g ∈ G, ∀S ∈ F, µ_F(g, S) = {µ_X(g, f); f ∈ S}.
(3) The figures are maps φ : I → X for some set I.
F ⊆ X^I
∀g ∈ G, ∀φ ∈ F, ∀t ∈ I, µ_F(g, φ)(t) = µ_X(g, φ(t)).
(4) Each figure is a map from X to some set Y.
F ⊆ Y^X
∀g ∈ G, ∀φ ∈ F, ∀x ∈ X, µ_F(g, φ)(x) = φ(µ_X(g, x)).
(5) Each figure f : E → K is a map from some set of test functions E ⊆ Y^X to a set K.
F ⊆ K^E for some E ⊆ Y^X
∀g ∈ G, ∀f ∈ F, ∀φ ∈ E, ∀x ∈ X, µ_F(g, f)(φ)(x) = f(φ(µ_X(g, x))).
(6) The figures are maps from an index set I with values which are subsets of X.
F ⊆ P(X)^I
∀g ∈ G, ∀f ∈ F, ∀t ∈ I, µ_F(g, f)(t) = {µ_X(g, x); x ∈ f(t)}.
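Construction methods (2) and (3) in the list above can be sketched concretely. The following illustration (not from the book; the shift group on Z_6 is chosen purely for simplicity) lifts a base action µ_X on points to actions µ_F on subsets of X and on maps φ : I → X, and checks the compatibility law µ_F(σ(g1, g2), f) = µ_F(g1, µ_F(g2, f)).

```python
# Hypothetical base transformation group: Z_6 acting on itself by shifts.
n = 6
G = range(n)
mu_X = lambda g, x: (g + x) % n               # base action on points

# Case (2): figures are subsets of X.
def mu_F_subset(g, S):
    return frozenset(mu_X(g, x) for x in S)

# Case (3): figures are maps phi : I -> X, stored here as dicts.
def mu_F_map(g, phi):
    return {t: mu_X(g, x) for t, x in phi.items()}

S = frozenset({0, 1, 2})                      # a figure: an arc of three points
phi = {"a": 0, "b": 3}                        # a figure: an indexed pair of points

# Compatibility law mu_F(sigma(g1, g2), f) = mu_F(g1, mu_F(g2, f)):
g1, g2 = 2, 5
assert mu_F_subset((g1 + g2) % n, S) == mu_F_subset(g1, mu_F_subset(g2, S))
assert mu_F_map((g1 + g2) % n, phi) == mu_F_map(g1, mu_F_map(g2, phi))
print(mu_F_subset(1, S))                      # the shifted figure
```

Both lifted actions inherit associativity directly from the base action, which is why the constructed figure spaces are again transformation groups over the same group G.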
20.9.3 Remark: Example of figures which are subsets of a given point space.
An example of case (2) in Remark 20.9.2 would be the set of images of injective continuous maps f : C_k → X,
where C_k = {x ∈ ℝ^k; Σ_{i=1}^k x_i ≤ 1 and ∀i ∈ ℕ_k, x_i ≥ 0} is a closed k-dimensional cell. Such images might
be referred to as disoriented cells. Closed cells could be replaced with open cells or boundaries of cells.
By fixing k, the set of figures could be restricted to just line segments or triangles etc.
20.9.4 Remark: Figures which are sets of subsets of the given point space.
One obvious generalisation of case (2) is to sets of subsets.
F ⊆ P(P(X))
∀g ∈ G, ∀C ∈ F, µ_F(g, C) = {{µ_X(g, f); f ∈ S}; S ∈ C}.
This would be applicable, for example, to sets of neighbourhoods of points in a topological space X. (For
indexed sets of neighbourhoods, this would correspond to case (6), as mentioned in Remark 20.9.9.) Arbitrary
hierarchies of sets of sets of sets, and so forth, may be defined in this way by considering F ⊆ P(P(X)),
F ⊆ P(P(P(X))), and so forth.
Case (2) may also be generalised from unordered subsets of X to ordered subsets of X. For example, ordered
pairs, ordered triples, etc. Ordered sets could be thought of as equivalent to case (3), where the (possibly
variable) index set I has an order which is imported onto the point set.
20.9.5 Remark: Examples of maps with the given point set as the range.
Examples of case (3) in Remark 20.9.2 are sequences of elements of X such as a basis of a linear space (where
I is a finite or countable set), curves in X (where I is an interval of the real line), and families of curves
in X (where I is a subset of ℝ² such as [0, 1] × [0, 1]). The natural operation L_g : F → F is defined by
L_g(φ)(t) = µ(g, φ(t)) for φ ∈ F and t ∈ I. A special case would be the set of functions f : C_k → X, where
C_k is defined as in case (2). Such functions could be called oriented cells.
20.9.6 Remark: Covariant and contravariant transformation of functions on the point set.
In case (4) in Remark 20.9.2, there are two natural operations L_g : F → F. The action could be defined
either so that L_g(f)(x) = f(µ(g, x)) or L_g(f)(x) = f(µ(g⁻¹, x)). The second choice has the effect that the
function is moved in the direction of the action of g on X because the points x are moved in the reverse
direction. This has the advantage that if both f and x are transformed by g ∈ G, then the value f(x) is
unchanged.
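The invariance of the value f(x) under simultaneous transformation of the function and the point can be checked mechanically. The following sketch (not from the book; a shift action on Z_12 is assumed purely for illustration) uses the second choice, L_g(f)(x) = f(µ(g⁻¹, x)), and verifies that L_g(f)(gx) = f(x) for every g and x.

```python
# Hypothetical base action: Z_12 acting on itself by shifts.
n = 12
mu = lambda g, x: (g + x) % n        # action mu(g, x)
inv = lambda g: (-g) % n             # group inverse in Z_12

f = lambda x: x * x + 1              # an arbitrary function on the point set X

def L(g, f):
    """Contravariant-with-double-reversal action: L_g(f)(x) = f(mu(g^{-1}, x))."""
    return lambda x: f(mu(inv(g), x))

# Transforming both f and x by the same g leaves the value unchanged.
ok = all(L(g, f)(mu(g, x)) == f(x) for g in range(n) for x in range(n))
print(ok)
```

This is precisely why the second choice is preferred when functions are to be "carried along" by point transformations: the pairing of a transformed function with a transformed point reproduces the original value.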
20.9.7 Remark: Functions to and from the point set are not natural figures.
It might seem reasonable to combine cases (3) and (4) in Remark 20.9.2 by considering maps φ : X → X.
These could be endomorphisms, automorphisms, or arbitrary partial function maps (defined only on subsets
of X). The action of G on such figures could affect only the domain of each map as in case (4) as follows.
F ⊆ X^X
∀g ∈ G, ∀φ ∈ F, ∀x ∈ X, µ_F(g, φ)(x) = φ(µ_X(g, x)).
Alternatively only the range of each map could be affected as in case (3) as follows.
∀g ∈ G, ∀φ ∈ F, ∀x ∈ X, µ_F(g, φ)(x) = µ_X(g, φ(x)).
Or the action could be applied to both the domain and range as follows.
∀g ∈ G, ∀φ ∈ F, ∀x ∈ X, µ_F(g, φ)(x) = µ_X(g, φ(µ_X(g, x))).
A more important question is whether point transformation maps can be validly considered as figures in
the point space. A figure should be an object with a location in the point space, and the object should be
movable within the space, more or less in the same way as individual points may be moved.
A point transformation φ could be regarded as a figure if the pairs (x, φ(x)) were regarded as ordered pairs
of points in X. In this case, the map would be thought of as a set of ordered pairs, which is more or less
a set of sets as in Remark 20.9.4. This is still not very convincing. A transformation group acting on the
figures of a point set must act on more than merely some set which is connected in some way with the
set X. The figures must be structures which are based in some way on the points of X, and they must have a
natural motion when those points are acted on by the group G.
20.9.8 Remark: Distributions are functions whose range is a set of functions on the point set.
An example of case (5) in Remark 20.9.2 is the set F of continuous linear functionals f : C₀(ℝⁿ) → ℝ
with X = ℝⁿ, Y = K = ℝ, and E = C₀(ℝⁿ). In this case, the figures are functions of functions. A double
reversal of the sense of the L_g action in case (4) yields the forward action defined by L_g(f)(φ) = f(L_{g⁻¹}(φ))
for f ∈ F, φ : X → Y, where L_g(φ)(x) = φ(µ(g⁻¹, x)) for φ : X → Y and x ∈ X.
the same except that the relation does not need to be inverted.
Case (5) is a fly in the ointment here because F ⊆ K^E for some E ⊆ Y^X. In this case, it is seemingly impossible
to localise the figures to points in any way. The value of the figure depends on all of the points of the
point set X. Nevertheless, it is possible to make elements of G act on the set X while leaving the other
sets unchanged, and figures of this kind are genuine figures. For example, if E = C₀(ℝⁿ) and K = ℝ,
then the space of continuous linear functionals from E to K (with a suitable topology on E) is a set of
distributions on X. Examples include Dirac delta functions. Distributions have well-defined transformation
rules in accordance with transformations of the underlying point set. Distributions are similar to measures,
which can be transformed around the point set along with the points, as mentioned in Remark 20.9.9.
All in all, it seems that figures in point spaces are too diverse to unify in a single framework without including
the complete generality of transformation groups over a given group G.
It does seem reasonably clear whether the group element g ∈ G or its inverse should be applied to points in
the figure constructions to make the figure move in the correct direction relative to points, but this is also
difficult to formulate in a precise way. Thus intuition seems to determine what a figure is, and also how
precisely it should be transformed. Unless a reliable, systematic framework can be found, it seems reasonable
to simply codify the basic operations and relations.
Definition 20.9.13 does not add structures such as an order, a topology or an atlas to the generated figure
spaces. Nor does this definition generate the contravariant transformations which may be obtained by
replacing g with g⁻¹ in the generation rules for the action maps µ_F2.
20.9.13 Definition: A figure space for a transformation group (G, X, σ, µ_X) is a transformation group
(G, F, σ, µ_F) which is generated by some sequence of the following generation rules.
(1) For figure space (G, F1, σ, µ_F1) and predicate P1 on P(F1), define figure space (G, F2, σ, µ_F2) by
F2 = {S ∈ P(F1); P1(S)} and
∀g ∈ G, ∀S ∈ F2, µ_F2(g, S) = {µ_F1(g, f1); f1 ∈ S},
where P1 satisfies µ_F2(G × F2) ⊆ F2.
(2) For figure space (G, F1, σ, µ_F1), set I, and predicate P1 on F1^I, define figure space (G, F2, σ, µ_F2) by
F2 = {φ : I → F1; P1(φ)} and
∀g ∈ G, ∀φ ∈ F2, µ_F2(g, φ) = {(t, µ_F1(g, φ(t))); t ∈ I},
where P1 satisfies µ_F2(G × F2) ⊆ F2.
(3) For figure space (G, F1, σ, µ_F1), set Y, and predicate P1 on Y^F1, define figure space (G, F2, σ, µ_F2) by
F2 = {φ : F1 → Y; P1(φ)} and
∀g ∈ G, ∀φ ∈ F2, µ_F2(g, φ) = {(µ_F1(g, f1), φ(f1)); f1 ∈ F1},
where P1 satisfies µ_F2(G × F2) ⊆ F2.
20.9.14 Remark: A minor technicality in the definition of figure spaces.
In Definition 20.9.13, part (1), the closure pre-condition for P1 is
∀g ∈ G, ∀S ∈ P(F1), P1(S) ⇒ P1({µ_F1(g, f1); f1 ∈ S}).
This should, strictly speaking, be placed on P1 before F2 and µ_F2 are defined. It is tedious to write out such
pre-conditions in full. They are therefore expressed more simply as the post-condition µ_F2(G × F2) ⊆ F2 in
terms of F2 and µ_F2. (Better late than never!)
20.9.15 Example: Examples of figures using classical groups.
As an example of Definition 20.9.13, let G be the classical group SO(n) for some n ∈ ℤ⁺, with X = ℝⁿ. (See
Section 25.14 for classical groups.) Denote this transformation group by (G, X, σ, µ_X). Using method (2)
with I = [0, 1] ⊆ ℝ and defining P1(φ) to be true if the map φ : I → ℝⁿ is linear, one obtains the figure
space (G, F, σ, µ_F) of linearly parametrised line segments in ℝⁿ.
For each line segment φ ∈ F, the point φ(1/2) is an element of X = ℝⁿ. This defines a map h from F to X.
This map is covariant in the sense that h(µ_F(g, φ)) = µ_X(g, h(φ)) for all g ∈ G. Of course, this is valid also
for G = GL(n).
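The covariance of the midpoint map in Example 20.9.15 can be sketched numerically. In the following illustration (not from the book), a linearly parametrised segment φ(t) = (1−t)p + tq in ℝ² is represented by its endpoint pair (p, q), the lifted action µ_F rotates both endpoints, and the midpoint map h(φ) = φ(1/2) is checked to satisfy h(µ_F(g, φ)) = µ_X(g, h(φ)).

```python
import math

def mu_X(a, x):
    """Base SO(2) action: rotate the point x by angle a."""
    c, s = math.cos(a), math.sin(a)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def mu_F(a, seg):
    """Lifted action on segments: rotate both endpoints."""
    p, q = seg
    return (mu_X(a, p), mu_X(a, q))

def h(seg):
    """The midpoint map h(phi) = phi(1/2)."""
    p, q = seg
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

seg = ((1.0, 0.0), (0.0, 2.0))
a = 0.7
lhs, rhs = h(mu_F(a, seg)), mu_X(a, h(seg))
assert abs(lhs[0] - rhs[0]) < 1e-12 and abs(lhs[1] - rhs[1]) < 1e-12
print("h(mu_F(g, phi)) = mu_X(g, h(phi))")
```

The check works because the rotation is linear and the midpoint is an affine combination of the endpoints, so the same reasoning applies for any g in GL(2), as noted in the example.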
20.9.16 Remark: Associated transformation groups are generated from a common transformation group.
If two transformation groups can be obtained as figure spaces via Definition 20.9.13 from a common
transformation group, then those groups may be referred to as associated transformation groups. This is
formalised as Definition 20.9.17.
20.9.17 Definition: Two associated transformation groups are transformation groups which are figure
spaces of a common transformation group.
20.9.18 Remark: Group invariants and figures.
The definition of group invariants is based on the notion of figures. A group action on a space X may be
extended to a wide variety of spaces of figures derived from the basic space X.
20.9.19 Remark: Invariant, covariant and contravariant attributes of a transformation group.
An invariant attribute of an associated transformation group (G, F, σ, µ_F) of a given transformation group
(G, X, σ, µ_X) is a function h : F → A for a set A, such that h(µ_F(g, f)) = h(f) for all g ∈ G and f ∈ F.
A covariant attribute of an associated transformation group (G, F, σ, µ_F) of a given transformation group
(G, X, σ, µ_X) is a map h : F → A for an associated transformation group (G, A, σ, µ_A) which satisfies
h(µ_F(g, f)) = µ_A(g, h(f)) for all g ∈ G and f ∈ F.
20.9.20 Remark: The transformation rules for particular figure spaces are not uniquely defined.
There is clearly a very broad range of classes of objects which could be regarded as figures for a set X. In
each case, there is no well-defined rule for generating the map L_g from the figure space F, although it is
"obvious" which map to choose in each case, more or less. The point made here is that the concepts of
figures and invariants seem to be some sort of meta-concept which cannot be codified easily.
Suppose now that a figure space F1 has been chosen for a transformation group (G, X, σ, µ), and a transfor-
mation operator L_g : F1 → F1 for each g ∈ G. Then any function Φ : F1 → F2 for any figure space F2 for X
(with action operator also denoted as L_g) is said to be an invariant of the group G if Φ(L_g f) = L_g Φ(f) for
all f ∈ F1 and g ∈ G. (This is equivalent to requiring Φ to be an equivariant map from (G, F1) to (G, F2).
See Definition 20.6.6 for equivariance.)
Typical examples of invariant functions would be the length of a curve, the centroid of a finite set of vectors,
the maximum of a real-valued function, the angle between two vectors, and the mass of a Radon measure.
The functions may be boolean-valued attribute functions such as the collinearity of a set of points. The
attribute value set F2 may not be modified at all by the Lg operations. But in the case of linear combinations,
for instance, the linear combination of a sequence of vectors is not invariant under transformations of the
base space X, but the linear combination is transformed in the same way as the sequence of vectors. Thus
linear-combination operations are invariants of the group of linear transformations.
the same as the transformed midpoint of the original points. This kind of preservation of relations could
be described as a kind of covariance. In other words, the transformation group preserves relations rather
than points or functions of points. However, it is arguable that the same word invariant could be applied
both to fixed points, fixed function values, and fixed relations under transformation groups.
When parallelism is defined on fibre bundles, it is often required that some structure is preserved. Thus,
for instance, if two objects attached to one point of a set satisfy some relation, then the same two objects
transported in a parallel fashion to another point may be required to maintain that relation. A property
or relation which is maintained under a transformation is called an invariant of the group of transforma-
tions. So when defining parallelism for manifolds, it is convenient to define a structure group so that the
properties and relations which are supposed to be maintained are invariants of that group. This motivates
the presentation here of transformation group invariants.
20.9.22 Remark: Invariance of linear combinations under parallel transport of tangent spaces.
An important example of a group invariant is the linear combination operation on linear spaces such as the
tangent vector spaces at points of a differentiable manifold. Consider the linear space ℝⁿ. For any sequence
λ = (λ_i)_{i=1}^m ∈ ℝ^m, define the linear combination operator S_λ : (ℝⁿ)^m → ℝⁿ by S_λ : (v_i)_{i=1}^m ↦ Σ_{i=1}^m λ_i v_i.
This maps m-tuples of vectors in ℝⁿ to a linear combination of those vectors. Then S_λ(L_g v) = gS_λ(v) for any
g ∈ GL(n) and v = (v_i)_{i=1}^m ∈ (ℝⁿ)^m, where L_g : (ℝⁿ)^m → (ℝⁿ)^m is defined so that L_g : (v_i)_{i=1}^m ↦ (gv_i)_{i=1}^m.
So every linear combination operator S_λ is an invariant (or covariant?) function of the set of m-frames in ℝⁿ.
When defining a connection on a differentiable fibre bundle in Chapters 62, 63 and 65, the connection will
be required to preserve these invariants. This will play a role in the transition from connections on ordinary
fibre bundles to connections on principal fibre bundles. More specifically, the invariance of a connection on
a principal fibre bundle under group action on the right will arise from the invariance of linear combinations
of n-frames under linear transformations.
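The identity S_λ(L_g v) = gS_λ(v) of Remark 20.9.22 can be checked exactly over the integers. The following sketch (not from the book; the matrix g and the random coefficients are chosen purely for illustration) verifies it for a fixed g in GL(2) acting on random 3-frames in ℤ².

```python
import random

def matvec(g, x):
    """Apply a 2x2 integer matrix g to a vector x in Z^2."""
    return tuple(sum(g[i][j] * x[j] for j in range(2)) for i in range(2))

def S(lam, vs):
    """Linear combination operator S_lam(v) = sum_i lam_i v_i."""
    return tuple(sum(l * v[i] for l, v in zip(lam, vs)) for i in range(2))

g = ((2, 1), (1, 1))                        # determinant 1, so g is in GL(2)
random.seed(1)
for _ in range(50):
    lam = [random.randint(-5, 5) for _ in range(3)]
    vs = [tuple(random.randint(-5, 5) for _ in range(2)) for _ in range(3)]
    Lg_v = [matvec(g, v) for v in vs]       # L_g(v) = (g v_1, ..., g v_m)
    assert S(lam, Lg_v) == matvec(g, S(lam, vs))
print("S_lam(L_g v) = g S_lam(v)")
```

The identity holds simply because matrix multiplication is linear, which is exactly the invariance of linear combinations that connections on principal fibre bundles are required to preserve.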
is the map x 7 hx. This results in the observation g(hx) by an observer with viewpoint g G. This has
the same effect on the observation as replacing the observer viewpoint g with gh. The net observation ghx
is the same in each case. This is often experienced when sitting in a train at a station, waiting for the train
to move. When the neighbouring train moves, one may have the illusion that one's own train is moving.
It follows from these comments that there is a natural isomorphism between the left transformation group (G, X, σ, μ), regarded as a subset of the (bijection) automorphism group Aut(X), and the right transformation group (G, G, σ, μ′) of G acting on itself, regarded as the group-automorphism group Aut(G).
20.10.3 Theorem: Isomorphism between left actions on a point set and right actions on the group.
Let (G, X, σ, μ) be a left transformation group. Let (G, G, σ, μ′) be the corresponding right transformation group of G acting on itself. For g ∈ G, define L_g^X : X → X by L_g^X : x ↦ μ(g, x), and R_g^G : G → G by R_g^G : h ↦ μ′(h, g).
(i) The map φ : A_X → A_G defined by φ : L_g^X ↦ R_{g^{-1}}^G is a group isomorphism between the function-composition groups A_X = {L_g^X; g ∈ G} and A_G = {R_g^G; g ∈ G}.
Proof: For (G, X, σ, μ), (G, G, σ, μ′) and φ : A_X → A_G as in the statement of the theorem, let g_1, g_2 ∈ G. Then φ(L_{g_1}^X ∘ L_{g_2}^X) = φ(L_{g_1 g_2}^X) = R_{(g_1 g_2)^{-1}}^G = R_{g_2^{-1} g_1^{-1}}^G = R_{g_1^{-1}}^G ∘ R_{g_2^{-1}}^G = φ(L_{g_1}^X) ∘ φ(L_{g_2}^X). Part (i) follows.
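The homomorphism property of the map L_g ↦ R_{g^{-1}} can be checked exhaustively on a small group. The sketch below (Python; the helper names comp, inv and R are illustrative, not the book's notation) uses G = S_3, verifying that R_{(g_1 g_2)^{-1}} agrees with R_{g_1^{-1}} ∘ R_{g_2^{-1}} on every group element.

```python
from itertools import permutations

# Toy model of Theorem 20.10.3 with G = S_3.
# Permutations are tuples p with p[i] = image of i; composition (p*q)(i) = p(q(i)).
G = list(permutations(range(3)))

def comp(p, q):
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    q = [0, 0, 0]
    for i, j in enumerate(p):
        q[j] = i
    return tuple(q)

def R(g):
    # Right translation on the group: h -> h o g
    return lambda h: comp(h, g)

# phi : L_g -> R_{g^{-1}} is a homomorphism:
# R_{(g1 g2)^{-1}} agrees with R_{g1^{-1}} o R_{g2^{-1}} on every h in G.
for g1 in G:
    for g2 in G:
        for h in G:
            assert R(inv(comp(g1, g2)))(h) == R(inv(g1))(R(inv(g2))(h))
```

The inverse in φ : L_g ↦ R_{g^{-1}} is what converts the order reversal of right translations back into a genuine homomorphism.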
20.10.4 Remark: The relation between group actions on a point set and group actions on the group.
The maps and spaces in Theorem 20.10.3 are illustrated in Figure 20.10.1.
It may seem that Remark 20.10.2 and Theorem 20.10.3 are vacuous. The isomorphism in Theorem 20.10.3
is apparently none other than the identity map id_G : G → G. However, a group of transformations is usually
specified as a subgroup of the automorphism group of some set. Theorem 20.2.8 shows that each effective
left transformation group on X is essentially the same thing as a subgroup of the symmetric group on X,
Figure 20.10.1  Left actions on the point set and right actions on the group
which is the group of bijections from X to X. The group G in Theorem 20.10.3 has two roles. In (G, X, σ, μ), G is a subset of the symmetric group on X. In (G, G, σ, μ′), G is the right automorphism group on G.
20.10.5 Remark: The relation of principal fibre bundles to ordinary fibre bundles.
The purpose of Example 20.10.6 is to show how principal fibre bundles are related to ordinary fibre bundles
in the case of baseless figure charts, but where the maps are not all linear. Principal fibre bundles are
usually explained in the context of tangent bundles, where the fibre maps and transition maps are linear,
and the structure group is a general linear group.
of direct information about the real world (and the maps between the real world and the camera imaging regions), one cannot reconstruct the sets w ∈ ℙ(E), but one may construct equivalence classes which are a (poor) substitute for the real thing. If cameras i, j ∈ I produce images f_i and f_j respectively, one may say that they are looking at the same object w ∈ ℙ(E) if f_j = φ_{i,j}(f_i). Thus the pairs (φ_i, f_i) and (φ_j, f_j) are equivalent (i.e. produced by the same real-world object) if and only if f_j = φ_{i,j}(f_i). So one may construct an equivalence class [(φ_i, f)] = {(φ_j, φ_{i,j}(f)); j ∈ I}, for any i ∈ I and f ∈ ℙ(F). Then there is a natural bijection β : {[(φ_i, f)]; i ∈ I, f ∈ ℙ(F)} → ℙ(E). Unfortunately, this bijection is not knowable, but one can at least perform the thought experiment of inverting the unknown β to obtain β^{-1}([(φ_i, f)]) = w for some unique w ∈ ℙ(E), for all (i, f) ∈ I × ℙ(F).
Corresponding to the group G = Aut(F) is the group Ḡ = {ḡ : ℙ(F) → ℙ(F); g ∈ G}, where ḡ denotes the set-map corresponding to g ∈ G. Thus ḡ : f ↦ {g(x); x ∈ f} for all f ∈ ℙ(F). The tuple (Ḡ, ℙ(F), ∘, μ) is a left transformation group, where ∘ denotes the function composition operation and μ : (ḡ, f) ↦ ḡ(f) denotes the action of Ḡ on ℙ(F). Let P be the set of all possible camera imaging diffeomorphisms. That is, let P = {φ : E → F; φ is a diffeomorphism}. Then G has a natural action μ_G^P on P on the left. Thus μ_G^P : (g, φ) ↦ g ∘ φ for g ∈ G and φ ∈ P.
If g ∈ G is applied to φ_i in the equivalence class [(φ_i, f)], the result is [(g ∘ φ_i, f)], which equals [(φ_i, g^{-1} f)]. (To see this, let φ_j = g ∘ φ_i. Then (g ∘ φ_i, f) equals (φ_j, f), which is equivalent to (φ_i, φ_{j,i} f), which equals (φ_i, g^{-1} f) because φ_{j,i} = φ_i ∘ φ_j^{-1} = g^{-1}.) In terms of the physical interpretation, this means that applying g to the distortion of the camera lens is equivalent to applying g to the object which is being imaged. This seems intuitively clear. If the object is made twice as big, this has the same effect as increasing the magnification of the imaging.
In the modern digital camera, the apparent size of an object in a photo depends on the size of the imaging element, typically a charge-coupled device (CCD). One can imagine, for the sake of this mathematical example, that the CCD could be distorted in some way, for example by a diffeomorphism ψ : F → F. If the array of pixels in the imaging surface suffers such a distortion, then when the image is displayed as the rectangle which it is (falsely) assumed to be, the resulting image will suffer the distortion ψ^{-1}. Then one will have the equivalence ((ψ ∘ g)^{-1}, f) = (g^{-1} ∘ ψ^{-1}, f) ~ (ψ^{-1}, g f). So in terms of pairs (φ, f), one will have (g^{-1} ∘ φ, f) ~ (φ, g f). This resembles the state of affairs for the typical textbook example of principal fibre bundles.
In the typical textbook example of principal fibre bundles for the general linear group GL(n) acting on the set ℝ^n of real n-tuples, the real n-tuples represent the components of vectors (in a tangent space on a differentiable manifold) with respect to a given basis. The group has two roles. In the OFB, the group acts on the real n-tuples of vector coordinates. In the PFB, the group acts on the linear space basis at each point. In terms of the usual conventions for matrix multiplication, one says that the basis is acted upon by a matrix on the right, whereas the vector components are acted upon by a matrix on the left. Thus R_a : (e_i)_{i=1}^n ↦ (ē_j)_{j=1}^n = (e_i a^i_j)_{j=1}^n and L_b : (v^i)_{i=1}^n ↦ (v̄^j)_{j=1}^n = (b^j_i v^i)_{j=1}^n, with the implicit summation convention in Remark 22.3.10. The reconstructed vector w = ∑_{i=1}^n v^i e_i is the same as the reconstructed vector w̄ = ∑_{i=1}^n v̄^i ē_i if and only if ∑_{i=1}^n v^i e_i = ∑_{i=1}^n v̄^i ē_i = ∑_{i=1}^n (∑_{k=1}^n b^i_k v^k)(∑_{ℓ=1}^n e_ℓ a^ℓ_i) = ∑_{k=1}^n ∑_{ℓ=1}^n (∑_{i=1}^n a^ℓ_i b^i_k) v^k e_ℓ. This holds if and only if ∑_{i=1}^n a^ℓ_i b^i_k = δ^ℓ_k for all k, ℓ ∈ ℕ_n, which means that the matrices b and a are inverses of each other. Therefore the pairs (e, v) and (R_{g^{-1}} e, L_g v) are equivalent for all g ∈ G.
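This inverse relation between the basis transformation and the component transformation can be checked numerically. In the sketch below (Python with NumPy; variable names are illustrative), the columns of a matrix hold the basis vectors, so a right action on the basis is a right matrix multiplication.

```python
import numpy as np

# Columns of E hold the basis vectors e_1..e_n; v holds the components v^i.
# Change basis with a in GL(n) on the right; components must change with
# b = a^{-1} on the left for w = sum_i v^i e_i to remain unchanged.
rng = np.random.default_rng(1)
n = 3
E = rng.normal(size=(n, n))      # basis (generically invertible)
v = rng.normal(size=n)           # components
a = rng.normal(size=(n, n))      # generic element of GL(n)
b = np.linalg.inv(a)

w = E @ v                        # reconstructed vector
E_new = E @ a                    # basis acted upon on the right by a
v_new = b @ v                    # components acted upon on the left by b
assert np.allclose(E_new @ v_new, w)       # same reconstructed vector

# With any b other than a^{-1}, the reconstruction changes (generically):
assert not np.allclose(E_new @ (2 * b) @ v, w)
```

Doubling the measuring sticks halves the measurements: the contravariance is just matrix inversion.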
Thus the vector components are transformed contravariantly to the basis if the imaged vector is unchanged.
The basis vectors are measuring sticks for a fixed real vector. If a measuring stick is doubled in size,
the corresponding measurement (the coordinates) must be halved. This is the underlying reason for the
contravariance. (This is not at all surprising when one looks at Figure 22.8.1 in Remark 22.8.4, which shows
that the vector components calculation operation is the inverse of the linear combinations construction
operation. The former is contravariant whereas the latter is covariant relative to the choice of basis.)
In the camera example, if the CCD is made larger, the image produced from a fixed object will be smaller
relative to the size of the CCD. (This is why low to medium price digital SLR cameras produce larger
images than expected with lenses from old film cameras. The smaller imaging area makes objects appear
relatively larger.) The photographic and linear space situations are analogous in that the transformation
applied to the measuring sticks (i.e. imaging area or basis vectors) must be inverse to the transformation of
the measurements (i.e. relative image size or vector components) if the real object is fixed (namely the
photography subject or the vector).
The moral of this story is that one should think of a PFB total space as a set of frames of reference
or cameras or observers or linear space bases, while the fibre space is the set of measurements or
images or vector components. An OFB total space is the set of real objects or photograph subjects
or vectors. Each measurement is produced by combining a frame of reference with a real object. This
idea is fundamental to fibre bundles, tensor algebra, tensor calculus, parallelism and connections. A related
comment was made by Bleecker [231], page 24, as follows.
Since all measurements are made relative to a choice of frame, and the measurement process can
never be completely divorced from the aspect of the universe being measured, we are led to the
conclusion that the bundle of reference frames should play a part in the very structure of the
universe as we perceive it.
An analogue of the above covariant relation between object size and image size (where the diffeomorphism maps have the form φ : E → F) can be manufactured by providing an inner product for a linear space. Then the inner product tuple (v_i)_{i=1}^n = (v · e_i)_{i=1}^n is covariant with respect to the basis vector tuple (e_i)_{i=1}^n.
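The covariance of the inner product tuple can be seen numerically: the tuple (v · e_i) transforms by the same right matrix action as the basis itself. The sketch below (Python with NumPy, assuming the standard Euclidean inner product; variable names are illustrative) demonstrates this.

```python
import numpy as np

# If e'_j = sum_i e_i a^i_j, then v . e'_j = sum_i (v . e_i) a^i_j,
# i.e. the inner product tuple is covariant with respect to the basis.
rng = np.random.default_rng(2)
n = 3
E = rng.normal(size=(n, n))   # columns are basis vectors e_i
v = rng.normal(size=n)        # a fixed vector
a = rng.normal(size=(n, n))   # change of basis, acting on the right

cov = v @ E                   # tuple (v . e_i)
E_new = E @ a                 # transformed basis
cov_new = v @ E_new           # tuple (v . e'_j)
assert np.allclose(cov_new, cov @ a)   # same right action as on the basis
```

Unlike the component tuple, which requires the inverse matrix, the inner product tuple follows the basis directly, which is the reason for calling lower-index tuples covariant.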
20.10.7 Remark: Contravariant baseless figure/frame bundles.
The concept of a combined baseless figure/frame bundle may now be formalised as Definition 20.10.8. This
is the contravariant version where the coordinates vary inversely with respect to the observation frames.
The spaces and maps of baseless figure/frame bundles are illustrated in Figure 20.10.2. For comparison, the
corresponding fibre/frame bundle spaces and maps are shown on the right.
20.10.8 Definition: A (contravariant) baseless figure/frame bundle is a tuple (G, P, E, F, σ, μ_G^P, μ_G^F, ω) which satisfies the following conditions.
(i) (G, σ) is a group.
(ii) (G, P, σ, μ_G^P) is a right transformation group which acts freely and transitively on P.
Figure 20.10.2  Spaces and maps for baseless figure/frame and fibre/frame bundles
(iii) (G, F, σ, μ_G^F) is an effective left transformation group.
(iv) ω : P × E → F satisfies ∀φ ∈ P, ∀z ∈ E, ∀g ∈ G, ω(μ_G^P(φ, g^{-1}), z) = μ_G^F(g, ω(φ, z)).
(v) ∀φ ∈ P, ∀f ∈ F, ∃! z ∈ E, ω(φ, z) = f.
(I.e. the map z ↦ ω(φ, z) is a bijection from E to F for all φ ∈ P.)
G is the structure group.
P is the principal bundle or frame bundle or observer bundle or viewpoint bundle or perspective bundle.
E is the ordinary bundle or figure bundle or state bundle or entity bundle or object bundle.
F is the observation space or measurement space or component space or coordinate space.
μ_G^P is the right action (map) of G on P.
μ_G^F is the left action (map) of G on F.
ω is the observation map or measurement map.
φ is a frame or observer or viewpoint or perspective for φ ∈ P.
z is a figure or state or entity or object for z ∈ E.
f is an observation or a measurement or a view or the components or the coordinates for f ∈ F.
ω(φ, ·) is a chart or component map or coordinate map for φ ∈ P.
20.10.9 Remark: Principal fibre bundles and ordinary fibre bundles make more sense when combined.
In a combined fibre/frame bundle which does have a base space, the set P is the total space of the principal
fibre bundle, and the set E is the total space of the ordinary fibre bundle. The standard textbooks on this
subject generally present principal (i.e. frame) bundles and ordinary fibre bundles as separate classes which
may be associated, but they probably make more sense when combined.
20.10.11 Remark: The arbitrary terminology for covariant and contravariant transformations.
In Definition 20.10.8, one could substitute g^{-1} for g (and g for g^{-1}) without changing the meaning of the definition. Then one would act on frames φ with actions R_g, and on observations f = ω(φ, z) with actions L_{g^{-1}}. This alternative convention would have some advantages. The terminology in differential geometry refers to the frames as transforming covariantly, while the measurements are said to transform contravariantly, which is a great mystery to many people. If the actions R_g and L_{g^{-1}} were adopted, this could make the transformations align better with the terminology. However, the observables are nearly always in the foreground in differential geometry. One is generally concerned, particularly in tensor calculus, with the transformations of coordinates, not of bases (or reference frames). Therefore the convention followed here is to write R_{g^{-1}} and L_g. This arbitrary convention is a difficult trade-off between tradition, common sense and practicality. (See also Section 28.5.7 for discussion of this issue.)
20.10.12 Remark: Arguments for and against identifying charts with frames.
Abbreviating charts ω(φ, ·) to φ in Remark 20.10.10 is not necessarily an abuse of notation. The principal bundle P is introduced in Definition 20.10.8 (ii) as an abstract set. But P could easily be defined as a set of functions from E to F without modifying Definition 20.10.8. That is, one could choose P to be a subset of the set of functions F^E, and one could choose these functions φ ∈ P so that φ = ω(φ, ·). Then one would have ∀z ∈ E, φ(z) = ω(φ, z). One could in fact take this as the definition of ω. There is no logical difficulty with this in principle.
An argument against identifying ω(φ, ·) with φ is that one generally wishes to admit multiple associated ordinary bundles E_k for a single common principal bundle P. (See Remark 20.11.1.) In fact, this is the main benefit of defining principal bundles. To take care of this situation, one could tag each chart to show which ordinary bundle it is associated with. Thus φ_{E_k} or φ_k would mean ω_k(φ, ·) for example. This is somewhat clumsy, but not necessarily a bad idea. Another approach would be to arrange things so that the sets representing the ordinary bundles E_k are disjoint. Then φ could be applied to all of the relevant ordinary fibre bundles without tagging.
20.10.13 Remark: The structure group is the group of all transitions between reference frames.
Condition (ii) in Definition 20.10.8 implies that the map μ_G^P(φ, ·) : G → P is a bijection. This is intentional. The free and transitive requirements follow from the fact that the purpose of G is to model the group of all transitions between the reference frames in P.
replaced by the covariant condition (iv′).
(ii′) (G, P, σ, μ_G^P) is a left transformation group which acts freely and transitively on P.
(iv′) ω : P × E → F satisfies ∀φ ∈ P, ∀z ∈ E, ∀g ∈ G, ω(μ_G^P(g, φ), z) = μ_G^F(g, ω(φ, z)).
Proof: For part (i), let φ_1, φ_2 ∈ P. By Definition 20.10.8 (ii) and Theorem 20.7.15 (iii), there is a unique g ∈ G such that φ_2 = R_{g^{-1}} φ_1. Then φ_2 = L_g ∘ φ_1 by Definition 20.10.8 (iv). The uniqueness of g for a given map L_g : F → F follows from the requirement that G act effectively on F by Definition 20.10.8 (iii).
The existence of the function g in part (ii) follows directly from part (i) because the group element g exists and is unique for each φ_1, φ_2 ∈ P. The uniqueness of the function g follows from the effective action of G on F by Definition 20.10.8 (iii).
For part (iii), the existence of the inverse map φ_2^{-1} : F → E for all φ_2 ∈ P follows from Definition 20.10.8 (v). Then the formula L_{g(φ_1, φ_2)} = φ_2 ∘ φ_1^{-1} follows from part (ii).
It is noteworthy that the map L̄_{g,φ} : E → E defined by L̄_{g,φ} : z ↦ φ^{-1}(L_g(φ(z))) is not independent of the choice of φ ∈ P. To see this, let φ_1, φ_2 ∈ P. Then L̄_{g,φ_2} ∘ L̄_{g,φ_1}^{-1} = (φ_2^{-1} ∘ L_g ∘ φ_2) ∘ (φ_1^{-1} ∘ L_g ∘ φ_1)^{-1} = φ_2^{-1} ∘ L_g ∘ φ_2 ∘ φ_1^{-1} ∘ L_{g^{-1}} ∘ φ_1 = φ_2^{-1} ∘ L_g ∘ L_{g(φ_1, φ_2)} ∘ L_{g^{-1}} ∘ L_{g(φ_2, φ_1)} ∘ φ_2 = φ_2^{-1} ∘ L_{g g(φ_1, φ_2) g^{-1} g(φ_2, φ_1)} ∘ φ_2. This equals id_E for all φ_1, φ_2 ∈ P if and only if g g(φ_1, φ_2) g^{-1} g(φ_1, φ_2)^{-1} = e_G for all φ_1, φ_2 ∈ P. In other words, g must commute with g(φ_1, φ_2) for all φ_1, φ_2 ∈ P. This can only hold for all g ∈ G if G is a commutative group. The physical interpretation of this is that in general, changes to the state (or orientation or location) of an object may appear differently to different observers. This is what makes figure bundles and fibre bundles interesting.
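The frame-dependence of the induced map z ↦ φ^{-1}(L_g(φ(z))) is easy to exhibit with matrices. In the sketch below (Python with NumPy; a toy model, not the book's construction), E = F = ℝ², the frames are invertible matrices, and the non-commutativity of GL(2) makes the two observers disagree.

```python
import numpy as np

# The induced map on figures, z -> phi^{-1}(L_g(phi(z))), depends on the
# frame phi.  Here L_g is multiplication by the matrix g.
g    = np.array([[0., -1.], [1., 0.]])    # a rotation in GL(2)
phi1 = np.eye(2)                          # one observer's chart
phi2 = np.array([[2., 0.], [0., 1.]])     # another observer's chart

induced1 = np.linalg.inv(phi1) @ g @ phi1
induced2 = np.linalg.inv(phi2) @ g @ phi2
# The two observers see the same group element act differently on figures:
assert not np.allclose(induced1, induced2)
```

Only when g commutes with every transition matrix (for example in a commutative structure group) do the conjugates agree for all frames.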
20.10.18 Remark: Baseless figure/frame bundles are a special case of standard fibre bundles.
A baseless figure/frame bundle is essentially equivalent to the usual definition of a fibre bundle where the
base set B is a singleton. This corresponds to a 0-dimensional manifold as the base set. Thus one may say
that baseless figure/frame bundles are a special case of standard fibre bundles.
Figure 20.11.1  Common frame bundle for associated baseless figure spaces
The transformation group (G, P, σ, μ_G^P) is common to the associated baseless figure spaces. For each figure space E_k, a transformation group (G, F_k, σ, μ_G^{F_k}) and an observation map ω_k : P × E_k → F_k must be defined to satisfy Definition 20.10.8 (iv). That is, ω_k must satisfy ∀φ ∈ P, ∀z ∈ E_k, ∀g ∈ G, ω_k(μ_G^P(φ, g^{-1}), z) = μ_G^{F_k}(g, ω_k(φ, z)). Then the charts ω_k(φ, ·) : E_k → F_k will commute with the right action μ_G^P in the sense described in Remark 20.10.10. In other words, R_{g^{-1}}φ = L_g ∘ φ, where φ is an abbreviation for the chart ω_k(φ, ·).
Proof: For part (i), it must be shown that the sets Q(φ, f) are pairwise disjoint and that they cover P × F. Let (φ_1, f_1), (φ_2, f_2) ∈ P × F be such that Q(φ_1, f_1) ∩ Q(φ_2, f_2) ≠ ∅. Then there is a pair (φ′, f′) ∈ P × F such that (φ′, f′) = (R_{g_1^{-1}} φ_1, L_{g_1} f_1) = (R_{g_2^{-1}} φ_2, L_{g_2} f_2) for some g_1, g_2 ∈ G. Then R_{g_1^{-1}} φ_1 = R_{g_2^{-1}} φ_2 and L_{g_1} f_1 = L_{g_2} f_2. So φ_1 = R_{g^{-1}} φ_2 and f_1 = L_g f_2, where g = g_1^{-1} g_2. Therefore (φ_1, f_1) ∈ Q(φ_2, f_2). Similarly (φ_2, f_2) ∈ Q(φ_1, f_1). So Q(φ_1, f_1) = Q(φ_2, f_2). Range(Q) clearly covers P × F because (φ, f) ∈ Q(φ, f) for all (φ, f) ∈ P × F.
For part (ii), let z ∈ E. Let (φ_1, f_1), (φ_2, f_2) ∈ Q(z). Then ω(φ_1, z) = f_1 and ω(φ_2, z) = f_2. By Definition 20.10.8 (ii), there is a unique g ∈ G such that φ_2 = R_{g^{-1}} φ_1. Then ω(R_{g^{-1}} φ_1, z) = L_g ω(φ_1, z) by Definition 20.10.8 (iv). So f_2 = ω(R_{g^{-1}} φ_1, z) = L_g f_1. So (φ_2, f_2) = (R_{g^{-1}} φ_1, L_g f_1) for some g ∈ G. Therefore (φ_2, f_2) ∈ Q(φ_1, f_1). So Q(z) ⊆ Q(φ_1, f_1) for some (φ_1, f_1) ∈ P × F.
To show that Q(z) = Q(φ_1, f_1), let (φ, f) ∈ Q(φ_1, f_1). Then (φ, f) = (R_{g^{-1}} φ_1, L_g f_1) for some g ∈ G. So φ(z) = (R_{g^{-1}} φ_1)(z) = L_g(φ_1(z)) = L_g f_1 = f. So (φ, f) ∈ Q(z). Therefore Q(z) = Q(φ_1, f_1). So Q(z) ∈ (P × F)/G. Hence Range(Q) ⊆ (P × F)/G.
To show that Range(Q) = (P × F)/G, let y ∈ (P × F)/G. Then y = Q(φ, f) for some (φ, f) ∈ P × F. By Definition 20.10.8 (v), there is a (unique) z ∈ E with φ(z) = f. Then (φ, f) ∈ Q(z) by the definition of Q. So Q(z) = Q(φ, f) (as shown in the previous paragraph). Therefore y ∈ Range(Q). Hence Range(Q) = (P × F)/G.
For part (iii), Q : E → (P × F)/G is a surjection by part (ii). Let z_1, z_2 ∈ E satisfy Q(z_1) = Q(z_2). Then for all (φ_1, f_1) ∈ P × F, (ω(φ_1, z_1) = f_1) ⇔ (ω(φ_1, z_2) = f_1). So z_1 = z_2 by Definition 20.10.8 (v). So Q is injective. Hence it is a bijection.
It should be remembered that the group G in the quotient space (P × F)/G is really two group actions, one on P and one on F. The action of G on P × F is a combination of these two actions.
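The bijection between the figure space E and the quotient (P × F)/G can be demonstrated on a minimal model. In the sketch below (Python; an illustrative toy, not the book's construction), G = P = F = E = ℤ_3, the right action is R_{g^{-1}} : φ ↦ φ − g, the left action is L_g : f ↦ f + g, and the observation map is ω(φ, z) = z − φ, all mod 3; these choices satisfy the equivariance condition of Definition 20.10.8 (iv).

```python
# A minimal model of the quotient (P x F)/G with G = P = F = E = Z_3.
n = 3
P = F = E = range(n)

def omega(phi, z):
    # omega(phi - g, z) = omega(phi, z) + g  (mod n), as required
    return (z - phi) % n

def Q(phi, f):
    # Orbit of (phi, f) under the combined action of G on P x F
    return frozenset(((phi - g) % n, (f + g) % n) for g in range(n))

orbits = {Q(phi, f) for phi in P for f in F}
# The orbits partition P x F, and each orbit corresponds to exactly one
# figure z, namely the unique z with omega(phi, z) = f for any member.
assert len(orbits) == n
for z in E:
    orbit_of_z = frozenset((phi, omega(phi, z)) for phi in P)
    assert orbit_of_z in orbits
```

Each orbit is determined by the invariant (φ + f) mod 3, which plays the role of the figure z reconstructed from any observer's measurement.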
Chapter 21
Non-topological fibre bundles
21.0.1 Remark: The removal of topology and differentiable structure from fibre bundle definitions.
Chapter 21 introduces non-topological fibre bundles and non-topological parallelism as a minimalist prelude
to Chapters 46 and 47, which deal with topological fibre bundles and parallelism. Differentiable fibre bundles
are defined in Chapter 61 and parallelism is defined differentially as connections on differentiable fibre
bundles in Chapter 62.
The following table summarises the distribution of fibre bundle topics amongst the chapters in this book.
bundle class        fibre bundles   parallelism
non-topological     Chapter 21      Chapter 21
topological         Chapter 46      Chapter 47
differentiable      Chapter 61      Chapter 62
[tangent bundle]    Chapter 52      Chapter 65
The non-topological fibre bundle definitions in this chapter are not presented by most authors, but it is
useful to initially ignore topological questions in order to concentrate on the algebraic fundamentals. Most
texts assume that fibre bundles have topological or differentiable structure. A surprisingly large proportion
of fibre bundle concepts are meaningful and non-trivial in the absence of topology, for example cross-sections,
localisation of cross-sections, constant cross-sections, structure groups, identity cross-sections and identity
charts for principal bundles, the right action map and right transformation group of a principal bundle,
and the construction of a principal bundle from an arbitrary freely-acting right transformation group. Also
meaningful without topology are associated fibre bundles and the porting of parallelism between associated
bundles, although these are introduced in the context of topological fibre bundles in Chapter 46.
Even more minimalist than non-topological fibre bundles are the baseless figure bundles which are
described in Sections 20.10 and 20.11. Baseless figure bundles are fibre bundles with no base space. It is
possibly beneficial to first understand baseless figure bundles before reading this chapter.
21.0.2 Remark: The relation between fibre bundles and tangent bundles.
Fibre bundles seem to be natural generalisations of tangent bundles and tensor bundles on differentiable
manifolds. The wide variety of tensor bundle structures which can be defined in association with the basic
tangent bundle structure on a manifold can be organised and understood within the broader context of
general fibre bundles.
Although tangent bundles are quite likely the original inspiration for general fibre bundles, they are regarded
as a distinct class of structure here. Tangent bundles fit uncomfortably within the framework of abstract fibre
bundles. When the fibre bundle framework is applied to tangent bundles, various identifications of strictly
distinct objects are required, for example via the vertical drop functions in Section 57.2, the horizontal swap
functions in Section 57.3, and the informal short-cut versions of differential forms in Remark 55.5.5.
21.0.4 Remark: Summary of fibre bundle classes and their specification tuples.
Table 21.0.1 is a preview of various classes of fibre bundles and their specification tuples. The letters have the following significance: E is a total space, B is a base space, π is a projection map, F is a fibre space, A^F_E is a fibre atlas for fibre space F, G is a structure group, T_E is a topology on E, T_B is a topology on B, A_E is a manifold atlas on E, A_B is a manifold atlas on B, P is the total space of a principal fibre bundle, T_P is a topology on P, A^G_P is a fibre atlas for structure group G, and M is a base space which has differentiable manifold structure (i.e. a manifold atlas).
definition   specification tuple              fibre bundle class
non-topological
21.1.8       (E, π, B)                        non-topological fibration (with intrinsic fibre space)
21.1.19      (E, π, B)                        non-topological fibration with fibre space F
21.4.5       (E, π, B, A^F_E)                 non-topological fibre bundle for fibre space F
21.5.2       (E, π, B, A^F_E)                 non-topological (G, F) fibre bundle
21.6.3       (P, π, B, A^G_P)                 non-topological principal G-bundle
topological
46.2.2       (E, T_E, π, B, T_B)              topological fibration (with intrinsic fibre space)
46.3.5       (E, T_E, π, B, T_B)              topological fibration with fibre space F
46.6.5       (E, T_E, π, B, T_B, A^F_E)       topological fibre bundle for fibre space F
46.7.5       (E, T_E, π, B, T_B, A^F_E)       topological (G, F) fibre bundle
46.9.3       (P, T_P, π, B, T_B, A^G_P)       topological principal fibre bundle with structure group G
differentiable
61.2.10      (E, A_E, π, M, A_M, A^F_E)       C^k fibration with fibre atlas for fibre space F
61.6.3       (E, A_E, π, M, A_M, A^F_E)       C^k (G, F) fibre bundle
61.6.7       (E, A_E, π, M, A_M, A^F_E)       analytic (G, F) fibre bundle
61.9.2       (P, A_P, π, M, A_M, A^G_P)       C^k principal fibre bundle with structure group G
Table 21.0.1  Classes of fibre bundles and their specification tuples
21.0.5 Remark: Family tree of fibrations, ordinary fibre bundles and principal fibre bundles.
Figure 21.0.2 is a family tree of non-topological fibrations, ordinary fibre bundles and principal fibre bundles.
Figure 21.0.2  Family tree of fibrations, ordinary fibre bundles and principal fibre bundles
Figure 21.1.1  Non-uniform non-topological fibration
21.1.2 Definition: A non-uniform non-topological fibration is a tuple (E, π, B) such that E and B are sets and π is a function π : E → B.
The set E is called the total space of (E, π, B).
The set B is called the base space of (E, π, B).
The function π is called the projection map of (E, π, B).
The set π^{-1}({b}) is called the fibre (set) at b of (E, π, B), for each b ∈ B.
21.1.3 Notation: E_b, for a total space E of a fibration (E, π, B) and a point b ∈ B, denotes the fibre set of (E, π, B) at b. In other words, E_b = π^{-1}({b}).
21.1.5 Theorem: Let (E, π, B) be a non-uniform non-topological fibration. Then {π^{-1}({b}); b ∈ B} is a partition of E. In other words, E = ⋃_{b∈B} E_b and ∀b_1, b_2 ∈ B, (b_1 ≠ b_2 ⇒ E_{b_1} ∩ E_{b_2} = ∅).
Proof: The assertion follows from Theorem 10.16.3 and Definition 8.6.11.
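The partition property of the fibre sets holds for any map whatsoever, which the following sketch (Python; the particular sets and map are arbitrary illustrations) makes concrete.

```python
# The fibres of any projection map pi : E -> B partition the total space E.
E = {'a', 'b', 'c', 'd', 'e'}
B = {0, 1, 2}
pi = {'a': 0, 'b': 0, 'c': 1, 'd': 2, 'e': 2}   # an arbitrary map E -> B

fibres = {b: {e for e in E if pi[e] == b} for b in B}

# Union of the fibres is E, and distinct fibres are disjoint:
assert set().union(*fibres.values()) == E
for b1 in B:
    for b2 in B:
        if b1 != b2:
            assert fibres[b1] & fibres[b2] == set()
```

Note that a fibre may be empty when π is not surjective; the partition claim concerns only the covering and disjointness of the preimages.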
(i) ∀b_1, b_2 ∈ B, ∃β : π^{-1}({b_1}) → π^{-1}({b_2}), β is a bijection. [uniformity]
21.1.9 Remark: The equinumerosity of fibre sets.
The equinumerosity of fibre sets π^{-1}({b}) is necessary but not sufficient for a fibration to meet its intended
purpose. In the case of temperature, completely arbitrary one-to-one associations between temperatures in
Moscow and Cairo are of no value. More structure is required on the total space so as to make meaningful
comparisons between base points. For example, a total order on each set E_b would make possible the definition of monotonic bijections between these sets, which would help to make comparisons between temperature
measurements at different points more meaningful.
21.1.10 Remark: Some trivial kinds of non-topological fibrations.
If B = ∅ in Definition 21.1.8, then E = ∅ and π = ∅. The tuple (∅, ∅, ∅) satisfies Definition 21.1.8. One may refer to this as the empty fibration. If B = ∅, then E must be empty. However, if E = ∅ then B can be any set. Thus (∅, ∅, B) is a non-topological fibration for any set B.
If B = {b} for some b, then E may be any set, and there is only one choice for the projection map π : E → B, namely the map π : e ↦ b for all e ∈ E. The identity map β = id_E on E = π^{-1}({b}) satisfies condition (i) of Definition 21.1.8. So the tuple (E, π, {b}) is a non-topological fibration. One may refer to this as a single-base-point fibration.
For any set B, one may construct a trivial kind of fibration with E = B and π = id_B. Then for any b_1, b_2 ∈ B, one may define a bijection β : π^{-1}({b_1}) → π^{-1}({b_2}) by β : b_1 ↦ b_2. One may refer to this as a self-projection fibration.
For any sets B and C, one may define E = B × C, and then define π : E → B by the rule π : (b, c) ↦ b. (See Definition 10.13.2 for Cartesian product projection maps.) Then E_b = π^{-1}({b}) = {b} × C for all b ∈ B. One may define the bijection β : (b_1, c) ↦ (b_2, c) from E_{b_1} to E_{b_2} for any b_1, b_2 ∈ B. So (B × C, π, B) is a fibration, with β as indicated, for any sets B and C. This may be referred to as a trivial fibration.
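The trivial fibration is small enough to write out in full. The sketch below (Python; the sets B and C are arbitrary illustrations) constructs the projection, the fibres {b} × C, and the inter-fibre bijection β.

```python
# The trivial fibration (B x C, pi, B): pi projects onto the first
# coordinate, each fibre is {b} x C, and (b1, c) -> (b2, c) is a
# bijection between fibres.
B = {0, 1, 2}
C = {'x', 'y'}
E = {(b, c) for b in B for c in C}

def pi(e):
    return e[0]

def fibre(b):
    return {e for e in E if pi(e) == b}

assert all(fibre(b) == {(b, c) for c in C} for b in B)

# The inter-fibre bijection beta : (b1, c) -> (b2, c), as a dictionary
b1, b2 = 0, 1
beta = {(b1, c): (b2, c) for c in C}
assert set(beta) == fibre(b1) and set(beta.values()) == fibre(b2)
```

The bijections here are canonical because of the product structure; for a general uniform fibration only their existence is guaranteed.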
21.1.12 Remark: Cross-sections of fibrations are right inverses of the projection map.
In the case of the kind of fibration in Definition 21.1.8, a global cross-section of the fibration is defined to be simply a right inverse of the projection map. By Theorem 10.5.12 (iii), the projection map has a right inverse if and only if it is surjective. By Theorem 21.1.11, the projection map of a uniform non-topological fibration as in Definition 21.1.8 is surjective if the total space is non-empty. Therefore a uniform non-topological fibration always has at least one global cross-section if E ≠ ∅ or B = ∅. But no global cross-section can exist if E = ∅ and B ≠ ∅. The only possible local cross-section would then be the empty cross-section. (See also Section 46.4 for cross-sections of topological fibrations.)
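A cross-section can be exhibited concretely on the trivial fibration. The sketch below (Python; the particular choice of section is an arbitrary illustration) checks the right-inverse property π ∘ X = id_B and the resulting injectivity of X.

```python
# A global cross-section X : B -> E is a right inverse of the projection:
# pi(X(b)) = b for every b in B.  It picks one point from each fibre.
B = {0, 1, 2}
C = {'x', 'y'}
E = {(b, c) for b in B for c in C}

def pi(e):
    return e[0]

def X(b):
    return (b, 'x')   # choose the point (b, 'x') in the fibre over b

assert all(pi(X(b)) == b for b in B)
# Since X has the left inverse pi, it must be injective:
assert len({X(b) for b in B}) == len(B)
```

Any other choice function on the fibres would serve equally well, which is why a uniform non-topological fibration with non-empty total space always has many global cross-sections.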
21.1.14 Notation: X(E, π, B), for a non-topological fibration (E, π, B), denotes the set of all global cross-sections of (E, π, B).
X(E, π, B | U), for a non-topological fibration (E, π, B), denotes the set of all local cross-sections of (E, π, B) on a set U ⊆ B.
X̄(E, π, B), for a non-topological fibration (E, π, B), denotes the set of all local cross-sections of (E, π, B).
Proof: Let X be a (local or global) cross-section of a non-topological fibration (E, π, B). Then X : U → E satisfies π ∘ X = id_U for some U ∈ ℙ(B), which implies that X has a left inverse. Hence X is injective by Theorem 10.5.12 (ii).
21.1.17 Definition: A fibre space for a non-topological fibration (E, π, B) is a set F such that
(i) ∀b ∈ B, ∃β : π^{-1}({b}) → F, β is a bijection. [uniformity]
In the case of temperature, the fibre set for both Celsius and Fahrenheit may be taken as the set ℝ of real numbers, but the maps β : π^{-1}({b}) → F are different. Definition 21.1.19 requires only the existence of such maps, but this is only a necessary condition, not sufficient to capture the full meaning of a fibration.
21.1.19 Definition: A non-topological fibration with fibre space F is a fibration (E, π, B) such that F is a fibre space for (E, π, B).
Figure 21.2.1  Uniform non-topological fibration with fibre chart
21.2.2 Definition: A fibre chart with fibre (space) F for a non-topological fibration (E, π, B) is a partial
function φ : E ⇀ F such that
(i) ∃U ∈ P(B), Dom(φ) = π⁻¹(U), [fibre set one in, all in]
(ii) π × φ : Dom(φ) → π(Dom(φ)) × F is a bijection. [local trivialisation]
In other words, π × φ : π⁻¹(U) → U × F is a bijection for some U ⊆ B.
A fibre chart with fibre F for (E, π, B) may also be called an F-fibre chart for (E, π, B).
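A small hypothetical Python sketch (invented sets, not from the book) of Definition 21.2.2: it checks the "one in, all in" domain condition and the bijectivity of the combined map π × φ.

```python
# Hypothetical sketch (invented sets) of Definition 21.2.2: a fibre chart phi
# is a partial function on E whose domain is a full union of fibre sets
# ("one in, all in"), with pi x phi : z -> (pi(z), phi(z)) a bijection onto U x F.
B = ["a", "b", "c"]
F = [0, 1]
E = [(b, f) for b in B for f in F]
pi = {z: z[0] for z in E}

U = {"a", "b"}
phi = {("a", 0): 0, ("a", 1): 1, ("b", 0): 1, ("b", 1): 0}   # twisted over "b"

# Condition (i): Dom(phi) = pi^-1(U) for some U in P(B).
assert set(phi) == {z for z in E if pi[z] in U}

# Condition (ii): pi x phi is a bijection from Dom(phi) onto U x F.
combined = {z: (pi[z], phi[z]) for z in phi}
assert sorted(combined.values()) == sorted((b, f) for b in U for f in F)
print("phi is a fibre chart over U =", sorted(U))
```

The chart over "b" permutes the fibre space values, which is harmless here: without structure on the sets, any such twist is as good as any other.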
21.2.3 Remark: A fibre chart can be decomposed into per-point fibre space bijections.
The product π × φ in Definition 21.2.2 (ii) denotes the pointwise Cartesian product of the functions π and φ.
(See Definition 10.15.2 and Notation 10.15.3.) The projection map π and a fibre chart φ are similar in
that they project the total space to a component space, either B or F respectively. In fact, the functions
π ∘ (π × φ)⁻¹ : U × F → U and φ ∘ (π × φ)⁻¹ : U × F → F are the Cartesian product set projection maps
for U × F, as presented in Definition 10.13.2.
Definition 21.2.2 (ii) is equivalent to requiring the map φ↾π⁻¹({b}) : π⁻¹({b}) → F to be a bijection for
all b ∈ U. This is shown in Theorem 21.2.4.
such pointwise measurement maps. This suggests that an observer is able to observe multiple points as
part of a single measurement framework or process. In the case of topological or differentiable fibrations,
there is topological or differentiable structure which connects observations at different base points. In the
mathematically minimalist non-topological case, the only value added by a fibre chart is aggregation.
Questions about continuity and differentiability are left for later consideration. But in ontological terms, the
aggregation generally signifies an observer reference frame, or some such thing.
21.2.8 Remark: The fibre chart for a fibration with an empty base space.
If B = ∅ in Definition 21.2.2, then E = ∅ and π = ∅, as mentioned in Remark 21.1.10, and any set F is a
fibre space for (E, π, B) = (∅, ∅, ∅), as mentioned in Remark 21.1.20. Then the empty function φ = ∅ : ∅ → F
is a fibre chart with fibre F (since the function product π × φ : ∅ → ∅ × F is a bijection because ∅ × F = ∅).
1997 Lee [24], page 16: (2) ψ : π⁻¹(U) → U × F
1999 Lang [23], page 103: (2) ψ : π⁻¹(U) → U × F
2004 Szekeres [266], page 424: (2) ψ : π⁻¹(U) → U × F
2012 Sternberg [37], page 327: (2) ψ : π⁻¹(U) → U × F
Kennington: (1) φ : π⁻¹(U) → F
Poor [32], page 2, calls the combined map ψ = π × φ : π⁻¹(U) → U × F a bundle chart on E, and calls φ
the principal part of the bundle chart. In practice, both φ and ψ are extensively used. In this book, φ is
generally called a fibre chart, whereas for topological and differentiable fibrations and fibre bundles, ψ is
typically called a manifold chart for the total space, or something similar.
21.2.10 Remark: Illustration of relations between three styles of fibre chart specification.
Figure 21.2.3 illustrates the relations between the three styles of fibre chart specification which are mentioned
in Theorem 21.2.11.
21.2.11 Theorem: Let U, F and W be sets. Let π : W → U be a surjective map. Define the Cartesian
product projection maps Π₁ : U × F → U and Π₂ : U × F → F by Π₁ : (b, f) ↦ b and Π₂ : (b, f) ↦ f.
Define single-variable predicates X₁, X₂ and X₃ as follows.
(1) X₁(φ) means: φ : W → F and π × φ : W → U × F is a bijection.
(2) X₂(ψ) means: ψ : W → U × F is a bijection and Π₁ ∘ ψ = π.
(3) X₃(ξ) means: ξ : U × F → W is a bijection and π ∘ ξ = Π₁.
Then:
[Figure 21.2.3: Relations between three styles of fibre chart: φ, ψ and ξ. The diagram shows W = π⁻¹(U) ⊆ E, with π : W → U, φ : W → F, ψ = π × φ : W → U × F, ξ = ψ⁻¹ : U × F → W, and the Cartesian product projections Π₁ and Π₂ of U × F ⊆ B × F.]
(i) X₁(φ) ⇔ X₂(π × φ).
(ii) X₂(ψ) ⇒ X₁(Π₂ ∘ ψ).
(iii) X₂(ψ) ⇔ X₃(ψ⁻¹).
(iv) X₃(ξ) ⇒ X₂(ξ⁻¹).
(v) X₁(φ) ⇒ X₃((π × φ)⁻¹).
(vi) X₃(ξ) ⇒ X₁(Π₂ ∘ ξ⁻¹).
Proof: To show part (i), assume X₁(φ). Then φ : W → F and π × φ : W → U × F is a bijection.
Let ψ = π × φ. Then ψ : W → U × F is a bijection and Π₁ ∘ ψ = Π₁ ∘ (π × φ) = π by the definitions of Π₁
and the pointwise Cartesian function product. (See Definition 10.15.2.) Hence X₂(π × φ). For the converse
of (i), assume X₂(π × φ). Then π × φ : W → U × F is a bijection. Therefore φ : W → F. Hence X₁(φ).
For part (ii), assume X₂(ψ). Then ψ : W → U × F is a bijection and Π₁ ∘ ψ = π. Let φ = Π₂ ∘ ψ. Then
φ = Π₂ ∘ ψ : W → F and π × φ = (Π₁ ∘ ψ) × (Π₂ ∘ ψ) = ψ. Therefore π × φ : W → U × F is
a bijection. Hence X₁(Π₂ ∘ ψ).
For part (iii), assume X₂(ψ). Then ψ : W → U × F is a bijection and Π₁ ∘ ψ = π. So ψ⁻¹ : U × F → W is a
bijection and ∀w ∈ W, Π₁(ψ(w)) = π(w). Let ξ = ψ⁻¹. Let (b, f) ∈ U × F. Let w = ξ((b, f)). Then ψ(w) =
(b, f). So Π₁(ψ(w)) = Π₁((b, f)) = b. So π(ξ((b, f))) = π(w) = b. Therefore ∀(b, f) ∈ U × F, π(ξ((b, f))) =
b = Π₁((b, f)). In other words, π ∘ ξ = Π₁. Hence X₃(ψ⁻¹). For the converse of (iii), assume X₃(ψ⁻¹).
That is, ψ⁻¹ : U × F → W is a bijection and π ∘ ψ⁻¹ = Π₁. Then ψ : W → U × F is a bijection and
∀(b, f) ∈ U × F, Π₁((b, f)) = π(ψ⁻¹((b, f))). Therefore ∀(b, f) ∈ U × F, π(ψ⁻¹((b, f))) = b. Let w ∈ W.
Let (b, f) = ψ(w). Then w = ψ⁻¹((b, f)). So π(w) = π(ψ⁻¹((b, f))) = b = Π₁((b, f)) = Π₁(ψ(w)). Therefore
∀w ∈ W, π(w) = Π₁(ψ(w)). That is, π = Π₁ ∘ ψ. Hence X₂(ψ).
The proof of part (iv) follows exactly as for part (iii) by substituting ξ⁻¹ for ψ and ξ for ψ⁻¹, and noting
that (ξ⁻¹)⁻¹ = ξ and (ψ⁻¹)⁻¹ = ψ.
Part (v) follows from parts (i) and (iii).
Part (vi) follows from parts (ii) and (iv).
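The conversions in Theorem 21.2.11 can be traced concretely. The following hypothetical Python sketch (invented finite sets, not from the book) builds the three styles φ, ψ = π × φ and ξ = ψ⁻¹ and checks the defining conditions X₁, X₂ and X₃.

```python
# Hypothetical finite check of the three chart styles in Theorem 21.2.11:
# phi : W -> F, psi = pi x phi : W -> U x F, and xi = psi^-1 : U x F -> W.
U = ["a", "b"]
F = [0, 1]
W = [(b, f) for b in U for f in F]
pi = {w: w[0] for w in W}
phi = {w: w[1] for w in W}                    # style (1): X1(phi)

psi = {w: (pi[w], phi[w]) for w in W}         # style (2): psi = pi x phi
assert sorted(psi.values()) == sorted((b, f) for b in U for f in F)  # bijection
assert all(psi[w][0] == pi[w] for w in W)     # Pi1 o psi = pi, so X2(psi)

xi = {v: w for w, v in psi.items()}           # style (3): xi = psi^-1
assert all(pi[xi[(b, f)]] == b for b in U for f in F)  # pi o xi = Pi1, so X3(xi)

# Recovering phi from psi, as in part (ii): phi = Pi2 o psi.
assert {w: psi[w][1] for w in W} == phi
print("X1(phi), X2(psi), X3(xi) all verified")
```

The round trips are lossless, which is the content of the equivalences in the theorem.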
21.3.2 Definition: The localisation of a cross-section X ∈ X(E, π, B) via a fibre chart φ with fibre space
F for a non-topological fibration (E, π, B) is the map φ ∘ X : Dom(X) ∩ π(Dom(φ)) → F.
constant via some chart.
21.3.5 Definition: The constant cross-section of a non-topological fibration (E, π, B) with fibre space F,
for a fibre chart φ and a fibre space value f₀ ∈ F, is the function X_{φ,f₀} : U → π⁻¹(U) defined by
X_{φ,f₀} : b ↦ (π × φ)⁻¹((b, f₀)), where U = π(Dom(φ)).
21.4.2 Definition: A fibre atlas with fibre space F for a non-topological fibration (E, π, B) is a set A of
partial functions φ : E ⇀ F such that
(i) ∀φ ∈ A, Dom(φ) = π⁻¹(π(Dom(φ))),
(ii) ∀φ ∈ A, π × φ : Dom(φ) → π(Dom(φ)) × F is a bijection,
(iii) ∪{π(Dom(φ)); φ ∈ A} = B.
21.4.4 Remark: For non-topological fibre bundles, fibre charts contain no continuity information.
Since the base set B in Definition 21.4.2 is unstructured, there is no difference between requiring a single
combined chart φ : π⁻¹(U) → F and requiring an individual per-fibre chart φ_b : π⁻¹({b}) → F for every b ∈ U.
The combined chart makes a significant difference when topological or differentiable structure is defined on
E, B and F, and the combined chart is required to be continuous or differentiable.
A set with no topology is effectively equivalent to a topological space with the discrete topology. (See
Definition 31.3.19 for discrete topology.) Since the fibre space F is also non-topological, even pointwise
fibre charts contain no information, because all bijections between the fibre space and a given fibre set are
equivalent. The value of defining structures such as charts and atlases for unstructured fibrations and fibre
bundles is to highlight the information that is contained in them when structure is present. And besides, a
single chart is easier to work with than one chart per base-space point.
21.4.5 Definition: A non-topological fibre bundle for fibre space F is a tuple (E, π, B, A_E^F) such that
{φ_b ; b ∈ U} of (the graphs of) any given family φ = (φ_b)_{b∈U} of bijections φ_b : E_b → F, for b in any subset
U of B, is necessarily (the graph of) a fibre chart on π⁻¹(U), whose combined map is a bijection from π⁻¹(U)
to U × F. The existence of such a chart does not in any way constrain the tuple (E, π, B). Therefore it does
not add information about the fibration. For similar reasons, the addition of an atlas of fibre charts to a
fibration also adds no real information because all charts are automatically compatible in the sense that the
restriction of the composition φ₁⁻¹ ∘ φ₂ of any two charts φ₁ and φ₂ to a single fibre set E_b = π⁻¹({b}) is
necessarily a bijection from E_b to E_b.
Consequently Definition 21.4.5 may be referred to as a non-topological F -fibration, since it carries only
the same amount of information as a fibration with fibre space F as in Definition 21.1.19.
Since Definition 21.4.5 contributes, in the indicated senses, no additional information to the bare-bones
uniform fibration in Definition 21.1.8, one naturally asks whether one should delete the definition. The
main purpose of Definition 21.4.5 is political or rhetorical. It demonstrates that without any further
structures on the sets E, B and F , the charts and atlases are of no real value. This observation motivates
the introduction of further structures. (A similar situation occurs for locally Cartesian topological spaces
in Section 48.6, where the addition of an atlas is at first superfluous, but the choice of a particular atlas is
later seen to be important for specifying differentiable structure on such spaces.)
21.4.7 Remark: Minimal structure groups for non-topological fibre bundles with explicit fibre atlas.
Without adding algebraic, topological or differentiable structures on a non-topological fibre bundle, one may
build at least one construction with the aid of an atlas which one cannot build non-trivially without it. This
construction is a minimal structure group.
Let (E, π, B, A_E^F) be a non-topological fibre bundle with a non-empty fibre atlas for fibre space F. Let
S₀ = {φ₁ ∘ (φ₂↾E_b)⁻¹ ; φ₁, φ₂ ∈ A_E^F, b ∈ π(Dom(φ₁) ∩ Dom(φ₂))}. Then S₀ is a subset of the group G₁ of
bijections from F to F. The group G₀ generated by S₀ is the minimal structure group for the particular
choice of atlas. (See Definition 17.6.5 for subgroups generated by subsets.) Any subgroup of G₁ which is a
supergroup of G₀ may be asserted to be a structure group for the fibre bundle.
This shows how arbitrary the structure group is. This arbitrariness may be seen in exactly the same way in
the context of general transformation groups. Given any set S₀ of bijections of any set X, one may construct
the group G₀ which is generated by S₀, and then observe that S₀ is a subset of any group between G₀
and the group G₁ of all bijections of X.
The structure group is limited only by the choice of atlas. One may choose the atlas to consist of a single
fibre chart φ : E → F. Then any subgroup of G₁ may be chosen as the structure group. This situation
changes when the charts are required to be compatible with algebraic, topological or differentiable structure
on the total space E.
Minimal structure groups are related to holonomy groups. (See Section 64.6.)
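The minimal structure group construction can be imitated on a finite model. This Python sketch is hypothetical (the sets and the two charts are invented, not from the book): it computes the set S₀ of transition bijections of the fibre space and closes it under composition to obtain the generated group G₀.

```python
# Minimal sketch (hypothetical finite example, not from the book): total
# space E = B x F with B = {0, 1} and F = (0, 1, 2), projection pi = first
# coordinate, and two fibre charts phi1, phi2 differing by a cyclic
# permutation on the fibre set over b = 1.
B = [0, 1]
F = (0, 1, 2)
E = [(b, f) for b in B for f in F]

phi1 = {(b, f): f for (b, f) in E}                      # identity chart
cycle = {0: 1, 1: 2, 2: 0}
phi2 = {(b, f): (cycle[f] if b == 1 else f) for (b, f) in E}

def transition(pa, pb, b):
    """Return pa o (pb restricted to E_b)^(-1) as a tuple acting on F."""
    inv = {pb[(b, f)]: (b, f) for f in F}               # invert pb on the fibre set E_b
    return tuple(pa[inv[f]] for f in F)

# S0 = all pairwise transition bijections over all base points.
S0 = {transition(pa, pb, b) for pa in (phi1, phi2) for pb in (phi1, phi2) for b in B}

def generate(S):
    """Close a set of permutations of F under composition (group generation)."""
    G = set(S) | {tuple(F)}                             # include the identity
    while True:
        new = {tuple(g[h[f]] for f in F) for g in G for h in G} - G
        if not new:
            return G
        G |= new

G0 = generate(S0)
print(len(G0))  # 3: the group generated by a 3-cycle is cyclic of order 3
```

Any permutation group of F containing G₀ may then be asserted as a structure group, which is exactly the arbitrariness described above.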
21.4.8 Remark: Fibre atlases which are compatible with a given structure group.
Definition 21.4.9 differs from Definition 21.4.2 by the addition of condition (iv). If A_E^F is a (G, F) atlas for a
non-topological fibration (E, π, B) as in Definition 21.4.9, then the tuple (E, π, B, A_E^F) is a non-topological
(G, F) fibre bundle as in Definition 21.5.2.
21.4.9 Definition: A (non-topological) (G, F) (fibre) atlas for a non-topological fibration (E, π, B), where
(G, F) is an effective left transformation group, is a set A of partial functions φ : E ⇀ F such that
(i) ∀φ ∈ A, Dom(φ) = π⁻¹(π(Dom(φ))),
(ii) ∀φ ∈ A, π × φ : Dom(φ) → π(Dom(φ)) × F is a bijection,
(iii) ∪{π(Dom(φ)); φ ∈ A} = B,
(iv) ∀φ₁, φ₂ ∈ A, ∀b ∈ π(Dom(φ₁) ∩ Dom(φ₂)), ∃g ∈ G, ∀z ∈ π⁻¹({b}), φ₂(z) = μ(g, φ₁(z)).
required to act on the fibre space in a specified way. Without this specified group action, part (v) of
Definition 21.5.2 would be meaningless.
21.5.2 Definition: A non-topological (G, F) (ordinary) (fibre) bundle for an effective left transformation
group (G, F, σ, μ) is a tuple (E, π, B, A_E^F) satisfying the following.
21.5.4 Remark: Alternative expressions for the fibre chart overlap condition.
The function values φ₁(z) and φ₂(z) in Definition 21.5.2 (v) are guaranteed to be well defined for all z ∈
π⁻¹({b}) in the condition because b is required to be an element of π(Dom(φ₁) ∩ Dom(φ₂)), which implies that
π⁻¹({b}) ⊆ Dom(φ₁) ∩ Dom(φ₂) by condition (ii). (This is discussed in Remark 21.2.1.) Thus condition (v)
may be expressed equivalently as follows.
(v′) ∀φ₁, φ₂ ∈ A_E^F, ∀b ∈ B, ∃g ∈ G, ∀z ∈ π⁻¹({b}) ∩ Dom(φ₂) ∩ Dom(φ₁), φ₂(z) = gφ₁(z).
Condition (v′) has the advantage that the test for the total space element z very clearly guarantees that the
following expressions φ₁(z) and φ₂(z) are well defined. However, condition (v) is preferred here because in
later definitions, the group element g ∈ G will be formalised as a function of φ₁, φ₂ and b. (The uniqueness
of g will follow from Definition 20.2.1 for an effective left transformation group.)
In condition (v′), the set π⁻¹({b}) ∩ Dom(φ₁) ∩ Dom(φ₂) is either equal to π⁻¹({b}) or ∅, depending on
whether b ∈ π(Dom(φ₁) ∩ Dom(φ₂)) or not. This is perfectly valid, but it does not most clearly reveal the
meaning of the condition.
Condition (v) in Definition 21.5.2 may also be expressed as follows.
(v″) ∀φ₁, φ₂ ∈ A_E^F, ∀b ∈ π(Dom(φ₁)) ∩ π(Dom(φ₂)), ∃g ∈ G, φ_{2,b} ∘ φ_{1,b}⁻¹ = L_g, where φ_{i,b} denotes the function
φ_i↾π⁻¹({b}) for b ∈ π(Dom(φ_i)), for i = 1, 2.
21.5.5 Remark: Fibre chart transition maps for non-topological ordinary fibre bundles.
Since the group element g in Definition 21.5.2 (v) is unique for any given φ₁, φ₂ and b, it is a well-defined
function of those variables. (The uniqueness ensures that choice-axiom issues do not arise.) This function is
given a name in Definition 21.5.7.
Proof: Part (i) follows from the assumption that the left transformation group (G, F ) is effective.
Part (ii) follows from part (i).
21.5.7 Definition: The fibre chart transition map for a non-topological (G, F) fibre bundle (E, π, B, A_E^F),
for fibre charts φ₁, φ₂ ∈ A_E^F, is the unique map g_{φ₂,φ₁} : π(Dom(φ₁)) ∩ π(Dom(φ₂)) → G which satisfies
∀b ∈ π(Dom(φ₁)) ∩ π(Dom(φ₂)), ∀z ∈ π⁻¹({b}), φ₂(z) = μ(g_{φ₂,φ₁}(b), φ₁(z)).
21.5.8 Remark: A (G, F) fibre bundle is the same as an F-fibration with a (G, F) atlas.
Definition 21.5.2 for a non-topological (G, F) fibre bundle is equivalent to the combination of the F-fibration
in Definition 21.1.19 with a (G, F) fibre atlas as in Definition 21.4.9.
Proof: Part (i) follows from Definitions 21.1.19, 21.4.9 and 21.5.2.
Part (ii) follows from Definitions 21.5.2, 21.1.19 and 21.4.9.
(∅, ∅, ∅) satisfies Definition 21.1.8. Let (G, F, σ, μ) be any left transformation group. Let A_E^F = ∅. Then
Definition 21.5.2 parts (ii), (iii) and (v) are satisfied vacuously, and for part (iv), ∪{π(Dom(φ)); φ ∈ A_E^F} =
∪∅ = ∅ = B by Theorem 8.4.6 (i). Therefore (E, π, B, A_E^F) = (∅, ∅, ∅, ∅) satisfies Definition 21.5.2 for any
effective left transformation group (G, F).
21.5.11 Remark: Objects which should not be in specification tuples for fibre bundles.
Many texts include the structure group G and the fibre space F in specification tuples for fibre bundles.
This is not necessary, and is in fact undesirable.
In the context of differentiable manifolds, it is undesirable to include the integer n and the order of
differentiability k in the specification tuple for an n-dimensional C^k differentiable manifold. The dimension n is
not included because it is evident and unambiguous in the charts of the differentiable atlas for the manifold.
One does not include the differentiability parameter k in the specification tuple because one wishes to be
able to say that every C^{k+1} manifold is necessarily a C^k manifold for any integer k ∈ Z⁺, for example.
The set-construction for a C^{k+1} manifold should not need to be modified in order to assert that it is a C^k
manifold. Some authors do indeed make such an assertion impossible by insisting that a C^k manifold should
have a complete C^k atlas (which includes all possible charts which are C^k-consistent with the atlas). To
assert that a C^{k+1} manifold is C^k, one must then add all of the C^k-compatible charts to the atlas first. One
is accustomed to stating that a C^2 real-valued function of a real variable is necessarily a C^1 function. This
should be equally possible in the case of manifolds.
In the case of fibre bundles, it is similarly desirable to be able to say that every (G₁, F) fibre bundle is a
(G₂, F) fibre bundle if G₁ is a subgroup of G₂. For example, every (SO(n), Rⁿ) fibre bundle is a (GL(n), Rⁿ)
fibre bundle. One cannot easily say this if one incorporates the group G into the specification tuple for the
fibre bundle. The fibre space F is evident from the fibre bundle atlas since it is the range of each chart.
(Similarly, the dimension n of a manifold is evident from the range Rⁿ of every chart.) Thus F is necessarily
a fixed and immutable part of the structure of a fibre bundle, whereas G is a changeable attribute of the fibre
bundle. In the case of F, it is evident and does not need to be in the specification tuple. In the case of G, it
is changeable and should not be in the specification tuple. That is, F is useless and G is undesirable.
These comments about specifications need to be made because it is clear, for example from Figure 21.8.1,
that the structure group G and fibre space F are an integral part of a fibre bundle. However, according to
this kind of argument, a linear space Rⁿ should be part of every manifold definition, and the domain and
range should be part of every function definition, and full algebraic field structure should be part of every
differentiable real-valued function definition, and so forth. The temptation to include G and F in each fibre
bundle is understandable, but must be resisted.
21.5.13 Remark: Interpretation of structure groups in terms of observers and reference frames.
In the pure mathematical literature, the structure group of a fibre bundle is an abstract group. In practical
applications, the structure group is the group of transition maps between frames of reference for fibre
sets. The fibre space F is the set of components or coordinates for fibre sets E_b = π⁻¹({b}), and the
fibre charts φ : E ⇀ F map the real objects in sets E_b to the component space F. Each fibre chart
φ ∈ A_E^F defines a frame of reference at each point b ∈ B for making measurements on E_b. As discussed
21.6.3 Definition: A non-topological principal (fibre) bundle with structure group G is a non-topological
(G, G) fibre bundle (P, π, B, A_P^G).
21.6.5 Remark: The value of principal fibre bundles lies in associations with ordinary fibre bundles.
The left transformation group (G, G) < (G, G, σ, σ) of a group G < (G, σ) acting on itself is defined in
Definition 20.3.8. This is always an effective left transformation group by Theorem 20.3.7. Therefore the
left transformation group (G, G) may be used as the group/fibre pair (G, F) in Definition 21.5.2 for any
group G.
Definition 21.6.3 may appear at first sight to add very little value relative to Definition 21.5.2 for ordinary
fibre bundles. The usefulness of principal fibre bundles lies in the relations between principal fibre bundles
and ordinary fibre bundles, particularly in the porting of parallelism between fibre bundles which have a
common structure group and matching atlases. Fibre bundles with the same base space and structure group,
and matching atlases, are called associated fibre bundles.
Since a principal bundle P and an associated ordinary fibre bundle E are often discussed in the same context,
it is sometimes convenient to distinguish their projection maps as π : E → B and π_P : P → B for example,
as in Definition 46.12.4. Principal fibre bundles are also called principal G-bundles or just G-bundles.
21.6.6 Remark: Fibre chart transition maps for principal bundles are quotients of chart values.
The group element product φ₂(z)φ₁(z)⁻¹ in Theorem 21.6.7 may be thought of as a quotient of φ₂(z)
divided by φ₁(z) on the right. This expression for g_{φ₂,φ₁}(b) is only possible because the fibre space of a
principal bundle is the same as the structure group.
21.6.7 Theorem: For a non-topological principal G-bundle (P, π, B, A_P^G),
∀φ₁, φ₂ ∈ A_P^G, ∀b ∈ π(Dom(φ₁)) ∩ π(Dom(φ₂)), ∀z ∈ π⁻¹({b}),
g_{φ₂,φ₁}(b) = φ₂(z)φ₁(z)⁻¹.
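The quotient formula can be tested on a finite model. This hypothetical Python sketch (invented data, not from the book) uses the cyclic group Z₃ written additively, so the quotient φ₂(z)φ₁(z)⁻¹ becomes a difference mod 3, and left and right translations coincide since Z₃ is abelian. It checks that the quotient is independent of the choice of z in each fibre set.

```python
# Hypothetical sketch of the quotient formula with G = Z_3 (additive): for a
# principal bundle the fibre space equals the group, and the transition
# quotient phi2(z) * phi1(z)^-1 becomes (phi2(z) - phi1(z)) mod 3. The theorem
# asserts that this is independent of z within each fibre set.
G = [0, 1, 2]
B = ["a", "b"]
P = [(b, g) for b in B for g in G]
pi = {z: z[0] for z in P}

phi1 = {(b, g): g for (b, g) in P}
# phi2 differs from phi1 by a left translation on each fibre set (condition (v)):
shift = {"a": 1, "b": 2}
phi2 = {(b, g): (shift[b] + g) % 3 for (b, g) in P}

trans = {}
for b in B:
    fibre = [z for z in P if pi[z] == b]
    quotients = {(phi2[z] - phi1[z]) % 3 for z in fibre}
    assert len(quotients) == 1               # independent of z, as claimed
    trans[b] = quotients.pop()               # the transition map g_{phi2,phi1}(b)
print(trans)  # {'a': 1, 'b': 2}
```

The computed transition map recovers the per-fibre left translations used to build the second chart.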
[Figure 21.7.1: Identity cross-section for a non-topological principal bundle. Input: a fibre chart φ : π⁻¹(U) → G. Output: the identity cross-section X_φ : U → P, which maps each b ∈ U to the point of the fibre set π⁻¹({b}) whose chart value is the identity e ∈ G, i.e. X_φ(b) = (π × φ)⁻¹((b, e)).]
21.7.5 Theorem: Let (P, π, B, A_P^G) be a non-topological principal G-bundle. Let X_φ denote the identity
cross-section for fibre charts φ ∈ A_P^G as in Definition 21.7.3.
(i) ∀φ ∈ A_P^G, X_φ ∈ X(P, π, B | π(Dom(φ))) ⊆ X(P, π, B). (That is, X_φ is a local cross-section of (P, π, B).)
(ii) ∀φ ∈ A_P^G, ∀b ∈ π(Dom(φ)), φ(X_φ(b)) = e.
(iii) ∀φ, φ′ ∈ A_P^G, ∀b ∈ π(Dom(φ) ∩ Dom(φ′)), φ′(X_φ(b)) = g_{φ′,φ}(b), where g_{φ′,φ}(b) denotes the fibre chart
transition map for P from φ to φ′ in Definition 21.5.7.
(iv) ∀φ, φ′ ∈ A_P^G, ∀b ∈ π(Dom(φ) ∩ Dom(φ′)), ∀z ∈ π⁻¹({b}), φ′(X_φ(b)) = φ′(z)φ(z)⁻¹.
Proof: For part (i), let φ ∈ A_P^G. Let U = Dom(X_φ). Then U = π(Dom(φ)) by Definition 21.7.3. Let
b ∈ U. Then π(X_φ(b)) = π((π × φ)⁻¹((b, e))) = b. So X_φ ∈ X(P, π, B | π(Dom(φ))) by Notation 21.1.14. (The
inclusion X(P, π, B | π(Dom(φ))) ⊆ X(P, π, B) is automatic by Notation 21.1.14.)
For part (ii), let φ ∈ A_P^G and b ∈ π(Dom(φ)). Then φ(X_φ(b)) = φ((π × φ)⁻¹((b, e))) = φ((φ↾π⁻¹({b}))⁻¹(e)) = e.
For part (iii), let φ, φ′ ∈ A_P^G and b ∈ π(Dom(φ) ∩ Dom(φ′)). Let z′ = (φ↾P_b)⁻¹(e) with P_b = π⁻¹({b}). Then
φ′(X_φ(b)) = φ′((π × φ)⁻¹((b, e))) = φ′((φ↾P_b)⁻¹(e)) = φ′(z′) = g_{φ′,φ}(b)φ(z′) = g_{φ′,φ}(b) for any z′ ∈ P_b.
Part (iv) follows from part (iii) and Theorem 21.6.7.
[Figure 21.7.2: Identity chart for a non-topological principal bundle. Input: a local cross-section X : U → P. Output: the identity chart χ_X : π⁻¹(U) → G, defined by χ_X(z) = φ(X(π(z)))⁻¹φ(z), which maps each point z₀ = X(π(z)) of the cross-section to the identity e ∈ G, so that π × χ_X sends X(b) to (b, e) ∈ U × G.]
21.7.7 Definition: The identity chart for a (local) cross-section X of a non-topological principal bundle
(P, π, B, A_P^G) with structure group G is the function χ_X : π⁻¹(U) → G defined by
∀z ∈ π⁻¹(U), ∀φ ∈ A_{P,π(z)}^G, χ_X(z) = φ(X(π(z)))⁻¹φ(z), (21.7.1)
where U = Dom(X).
21.7.8 Theorem: Let χ_X be the identity chart for a local cross-section X of a non-topological principal
bundle (P, π, B, A_P^G).
(i) χ_X is well-defined. That is, χ_X is independent of the choice of fibre chart φ in line (21.7.1).
(ii) A_P^G ∪ {χ_X} is a (G, G) fibre atlas for the non-topological fibration (P, π, B).
(iii) (P, π, B, A_P^G ∪ {χ_X}) is a non-topological principal bundle with structure group G.
(iv) ∀p ∈ Dom(X), χ_X(X(p)) = e.
Proof: For part (i), let z ∈ π⁻¹(U). Let φ₁, φ₂ ∈ A_P^G with z ∈ Dom(φ₁) ∩ Dom(φ₂). Then there is a
g₀ ∈ G such that φ₂(z′) = g₀φ₁(z′) for all z′ ∈ π⁻¹({π(z)}) by Definition 21.5.2 (v). Then
φ₂(X(π(z)))⁻¹φ₂(z) = φ₁(X(π(z)))⁻¹ φ₁(X(π(z)))φ₂(X(π(z)))⁻¹ φ₂(z)φ₁(z)⁻¹ φ₁(z)
= φ₁(X(π(z)))⁻¹ g₀⁻¹ g₀ φ₁(z)
= φ₁(X(π(z)))⁻¹φ₁(z).
Hence line (21.7.1) gives the same value for χ_X(z), independent of the choice of φ.
For part (ii), let P < (P, π, B, A_P^G) be a non-topological principal bundle. Let X : U → P with U ⊆ B be
a local cross-section of P as in Definition 21.1.13. Let χ_X : π⁻¹(U) → G be the identity chart for X as
in Definition 21.7.7. Then π(Dom(χ_X)) = π(π⁻¹(U)) = U by Theorem 10.7.1 (i″) because π is surjective
(onto B). So Dom(χ_X) = π⁻¹(U) = π⁻¹(π(Dom(χ_X))). So χ_X satisfies Definition 21.4.9 (i). But A_P^G satisfies
Definition 21.4.9 (i) because it satisfies Definition 21.5.2 (ii). So A_P^G ∪ {χ_X} satisfies Definition 21.4.9 (i).
To show that π × χ_X : Dom(χ_X) → π(Dom(χ_X)) × G is a bijection, first note that it is a well-defined
function because Range(π↾Dom(χ_X)) = π(Dom(χ_X)) and Range(χ_X) ⊆ G. To show that it has an inverse, let
(b, g) ∈ π(Dom(χ_X)) × G. Let φ ∈ A_P^G with b ∈ π(Dom(φ)). Let z = (π × φ)⁻¹((b, φ(X(b))g)). Then π(z) = b
and χ_X(z) = φ(X(b))⁻¹φ(z) = φ(X(b))⁻¹φ(X(b))g = g. Thus (π × χ_X)(z) = (b, g), which verifies surjectivity.
But injectivity follows from the fact that φ↾π⁻¹({b}) is injective for all b ∈ π(Dom(φ)) by Definition 21.5.2 (iii).
Therefore π × χ_X : Dom(χ_X) → π(Dom(χ_X)) × G is a bijection. So since π × φ : Dom(φ) → π(Dom(φ)) × G
is a bijection for all charts φ ∈ A_P^G, it follows that A_P^G ∪ {χ_X} satisfies Definition 21.4.9 (ii) also.
It is clear that the base-space coverage condition, Definition 21.4.9 (iii), must hold for A_P^G ∪ {χ_X} because
the charts of A_P^G already cover the base space.
The fibre chart transition condition, Definition 21.4.9 (iv), holds for all φ₁, φ₂ ∈ A_P^G by Definition 21.5.2 (v).
So it remains to show that the transition rules are correct for A_P^G and χ_X.
Let φ ∈ A_P^G. Let b ∈ π(Dom(φ) ∩ Dom(χ_X)) = π(Dom(φ)) ∩ U. Let g = φ(X(b))⁻¹. Let z ∈ π⁻¹({b}). Then
χ_X(z) = φ(X(π(z)))⁻¹φ(z) = gφ(z). Thus ∃g ∈ G, ∀z ∈ π⁻¹({b}), χ_X(z) = gφ(z). So Definition 21.4.9 (iv)
is satisfied for A_P^G and χ_X. Hence Definition 21.4.9 for a (G, G) fibre atlas for a non-topological fibration
is satisfied.
Part (iii) follows from part (ii) and Theorem 21.5.9 (i) and Definition 21.6.3.
For part (iv), let p ∈ Dom(X). Then χ_X(X(p)) = φ(X(π(X(p))))⁻¹φ(X(p)) = φ(X(p))⁻¹φ(X(p)) = e for
all φ ∈ A_{P,p}^G because π ∘ X = id_Dom(X) by Definition 21.1.13.
⟨⟨ 2016-12-27. Give topological and differentiable versions of Theorem 21.7.8 and everything else concerning
identity cross-sections and identity charts. ⟩⟩
21.7.9 Remark: Transition map for identity charts on principal bundles.
Theorem 21.7.10 states that the quotient χ_X(z)χ_Y(z)⁻¹ is independent of z ∈ π⁻¹({b}) for any b in the
common domain of two identity charts. This implies that a transition map can be defined which depends
only on base points b, not on the choice of z. This is given a name in Definition 21.7.11.
21.7.10 Theorem: Let χ_X and χ_Y be the identity charts for local cross-sections X and Y respectively of
a non-topological principal bundle (P, π, B, A_P^G). Then
∀b ∈ Dom(X) ∩ Dom(Y), ∀z, z′ ∈ π⁻¹({b}),
χ_X(z)χ_Y(z)⁻¹ = χ_X(z′)χ_Y(z′)⁻¹.
Proof: Let b ∈ Dom(X) ∩ Dom(Y), z, z′ ∈ π⁻¹({b}) and φ ∈ A_{P,b}^G. Then
Proof: Let b ∈ U. Let z ∈ π⁻¹({b}). Let φ′ ∈ atlas_b(P, π, M). Then g_{XY}(p) = χ_X(z)χ_Y(z)⁻¹ =
φ′(X(p))⁻¹φ′(z)(φ′(Y(p))⁻¹φ′(z))⁻¹ = φ′(X(p))⁻¹φ′(X(p)g(p)) = φ′(X(p))⁻¹φ′(X(p))g(p) = g(p).
21.7.13 Remark: Equi-informationality of principal bundle fibre charts and identity cross-sections.
A cross-section of a principal bundle can be converted to a specific chart, and then back again into the
original cross-section. Moreover, a fibre chart can be converted to a specific cross-section, and then back
again to the original fibre chart. This implies that these structures are equi-informational because no
information is lost or gained in these conversions. (A minor technical subtlety here is that Definition 21.7.3
for an identity cross-section must be extended to all compatible fibre charts, not just charts in a given fibre
atlas. This extension is easy to do, but not interesting enough to merit the effort.)
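The round trip described above can be checked on a finite model. This hypothetical Python sketch (invented data, not from the book) uses G = Z₃ written additively: a cross-section X is converted to its identity chart, whose identity cross-section recovers X, so no information is lost.

```python
# Hypothetical round-trip check of the equi-informationality remark with
# G = Z_3 (additive): a cross-section X determines an identity chart
# chi_X(z) = phi(X(pi(z)))^-1 phi(z), and the identity cross-section of chi_X
# (the point of each fibre set mapped to the identity e = 0) recovers X.
G = [0, 1, 2]
B = ["a", "b"]
P = [(b, g) for b in B for g in G]
pi = {z: z[0] for z in P}
phi = {(b, g): g for (b, g) in P}            # a fibre chart with fibre space G

X = {"a": ("a", 2), "b": ("b", 1)}           # an arbitrary global cross-section

# Identity chart for X, written additively in Z_3.
chi = {z: (phi[z] - phi[X[pi[z]]]) % 3 for z in P}

# Identity cross-section of chi: the unique z in each fibre set with chi(z) = 0.
X_back = {b: next(z for z in P if pi[z] == b and chi[z] == 0) for b in B}
assert X_back == X                            # the round trip recovers X
print(X_back)
```

The same computation run in the other direction (chart to identity cross-section to identity chart) is equally lossless, which is the sense in which the two structures are equi-informational.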
(i) ∀φ ∈ A_P^G, χ_{X_φ} = φ.
21.8. The right action map of a non-topological principal bundle
21.8.1 Remark: The right action map of a principal bundle is implicit in its definition.
Principal fibre bundles permit the construction of a right action map by the structure group on the total
space which commutes with the chart transition maps. The action map does not need to be specified as part
of the definition of a principal fibre bundle because it is implicit in the standard specification tuple. The right
action map for a principal bundle is important for the definition of connections on differentiable fibre bundles
in particular. Theorem 21.8.2 establishes that the right action map is well defined and chart-independent.
21.8.2 Theorem: Let (P, π, B, A_P^G) be a non-topological principal fibre bundle with structure group (G, σ).
For all g ∈ G and φ ∈ A_P^G, define R_{g,φ} : Dom(φ) → Dom(φ) by R_{g,φ}(z) = (π × φ)⁻¹((π(z), σ(φ(z), g))) for
all z ∈ Dom(φ).
(i) R_{g,φ} : Dom(φ) → Dom(φ) is a well-defined bijection for all g ∈ G and φ ∈ A_P^G.
(ii) ∀φ ∈ A_P^G, ∀z ∈ Dom(φ), ∀g₁, g₂ ∈ G, R_{g₁,φ}(R_{g₂,φ}(z)) = R_{σ(g₂,g₁),φ}(z).
(iii) ∀φ₁, φ₂ ∈ A_P^G, ∀z ∈ Dom(φ₁) ∩ Dom(φ₂), ∀g ∈ G, R_{g,φ₁}(z) = R_{g,φ₂}(z).
Proof: Part (i) follows directly from the definition of a fibre chart and the fact that the map R_g : G → G
defined by R_g : f ↦ σ(f, g) is a bijection on G by the definition of a group.
For part (ii), let (P, π, B, A_P^G) be a non-topological principal fibre bundle with structure group (G, σ). Let
φ ∈ A_P^G, z ∈ Dom(φ) and g₁, g₂ ∈ G. Let b = π(z), and let z₂ = R_{g₂,φ}(z) and z₁ = R_{g₁,φ}(z₂). Then z₂ =
(π × φ)⁻¹((π(z), σ(φ(z), g₂))). So (π(z₂), φ(z₂)) = (π(z), σ(φ(z), g₂)). Therefore z₂ ∈ π⁻¹({b}) and φ(z₂) =
σ(φ(z), g₂). Similarly, z₁ ∈ π⁻¹({b}) and φ(z₁) = σ(φ(z₂), g₁). Therefore φ(z₁) = σ(σ(φ(z), g₂), g₁) =
σ(φ(z), σ(g₂, g₁)). Let z₃ = R_{σ(g₂,g₁),φ}(z). Then (π(z₃), φ(z₃)) = (π(z), σ(φ(z), σ(g₂, g₁))). So z₃ ∈ π⁻¹({b})
and φ(z₃) = σ(φ(z), σ(g₂, g₁)) = φ(z₁). Therefore z₃ = z₁. That is, R_{g₁,φ}(R_{g₂,φ}(z)) = R_{σ(g₂,g₁),φ}(z).
For part (iii), let φ₁, φ₂ ∈ A_P^G and z ∈ Dom(φ₁) ∩ Dom(φ₂). Let z₁ = R_{g,φ₁}(z) and z₂ = R_{g,φ₂}(z). Then
(π(z₁), φ₁(z₁)) = (π(z), σ(φ₁(z), g)) and (π(z₂), φ₂(z₂)) = (π(z), σ(φ₂(z), g)). So π(z₁) = π(z₂) = π(z) and
φ₁(z₁) = σ(φ₁(z), g) and φ₂(z₂) = σ(φ₂(z), g). By Definition 21.5.2 (v), there is a group element g′ ∈ G such
that ∀z′ ∈ π⁻¹({b}), φ₁(z′) = σ(g′, φ₂(z′)), where b = π(z). So from φ₁(z₁) = σ(φ₁(z), g), it follows that
σ(g′, φ₂(z₁)) = σ(σ(g′, φ₂(z)), g) = σ(g′, σ(φ₂(z), g)) by the associativity of σ. So φ₂(z₁) = σ(φ₂(z), g) (by
pre-multiplying both sides with (g′)⁻¹). So φ₂(z₂) = φ₂(z₁). Therefore z₂ = z₁ because φ₂↾π⁻¹({b}) is a bijection
by Definition 21.5.2 (iii). Hence R_{g,φ₁}(z) = R_{g,φ₂}(z), as claimed.
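Theorem 21.8.2 can be imitated on a finite model. This hypothetical Python sketch (invented data, not from the book) uses G = Z₃ written additively, an abelian group, so a chart family satisfying the overlap condition is trivial to arrange. It checks the chart-independence of the right action and the composition law of part (ii).

```python
# Hypothetical sketch of the right action with G = Z_3 (additive): the right
# action R_g moves each point within its fibre set by translating its chart
# value, R_g(z) = (pi x phi)^-1((pi(z), phi(z) + g)). Two charts differing by
# a left translation yield the SAME action map.
G = [0, 1, 2]
B = ["a", "b"]
P = [(b, g) for b in B for g in G]
pi = {z: z[0] for z in P}

phi1 = {(b, g): g for (b, g) in P}
phi2 = {(b, g): (1 + g) % 3 for (b, g) in P}  # left-translated chart

def right_action(phi, g):
    inv = {(pi[z], phi[z]): z for z in P}     # the inverse map (pi x phi)^-1
    return {z: inv[(pi[z], (phi[z] + g) % 3)] for z in P}

for g in G:
    assert right_action(phi1, g) == right_action(phi2, g)  # chart-independent

# Composition law (part (ii)): R_g1 o R_g2 = R_(g2 + g1).
R1, R2 = right_action(phi1, 1), right_action(phi1, 2)
R3 = right_action(phi1, 0)                    # (2 + 1) mod 3 = 0, the identity
assert {z: R1[R2[z]] for z in P} == R3
print("right action is chart-independent")
```

In the non-abelian case the same chart-independence holds, but only because left translations (chart transitions) commute with right translations (the action), which is the point of Remark 21.8.3 below the theorem in the original numbering.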
21.8.3 Remark: Action by structure groups on total spaces may be defined in a chart-independent way.
Theorem 21.8.2 (iii) is the core reason for defining principal fibre bundles. It says that an action of the
structure group G on the total space P may be defined in a chart-independent manner. The reason for this is
that right and left actions by G on G commute with each other. The fibre chart transition maps are equivalent
to left actions. Therefore the right action commutes with fibre chart transition maps. Consequently, the right action map μ^P_G : P × G → P in Definition 21.8.4 is well defined, although it may at first sight seem to depend on the choice of chart φ.
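As a sanity check on the chart-wise formula for R_{g,φ}, the following sketch models a trivial principal bundle P = B × G with G = S_3 and fibre charts which measure the fibre through left multiplication. This is our own toy model, not the book's; the identifiers (mult, chart, right_action) are invented here. The assertions confirm the chart-independence of Theorem 21.8.2 (iii) and the composition law of Theorem 21.8.2 (ii).

```python
# Toy model: trivial principal bundle P = B x G with G = S_3 (nonabelian).
# Each fibre chart phi_h measures the fibre by left multiplication by h.
from itertools import permutations

G = list(permutations(range(3)))           # S_3 as tuples
def mult(p, q):                            # sigma(p, q) = p o q
    return tuple(p[q[i]] for i in range(3))
def inv(p):
    q = [0] * 3
    for i, pi in enumerate(p):
        q[pi] = i
    return tuple(q)

B = ["b0", "b1"]
P = [(b, f) for b in B for f in G]         # total space
proj = lambda z: z[0]                      # projection pi

def chart(h):                              # phi_h : P -> G, (b, f) |-> h o f
    return lambda z: mult(h, z[1])

def right_action(z, g, h):                 # R_{g,phi_h}(z) = (pi x phi_h)^{-1}(pi(z), sigma(phi_h(z), g))
    target = mult(chart(h)(z), g)          # sigma(phi_h(z), g)
    return (proj(z), mult(inv(h), target)) # invert within the fibre over pi(z)

# Theorem 21.8.2 (iii): the action is independent of the chart parameter h.
h1, h2 = G[1], G[4]
assert all(right_action(z, g, h1) == right_action(z, g, h2) for z in P for g in G)

# Theorem 21.8.2 (ii): R_{g1} o R_{g2} = R_{sigma(g2, g1)}.
assert all(right_action(right_action(z, g2, h1), g1, h1)
           == right_action(z, mult(g2, g1), h1)
           for z in P for g1 in G for g2 in G)
print("chart-independence and composition law verified")
```

In this model the formula reduces to (b, f) · g = (b, f ∘ g), which is exactly why no chart appears in the result.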
21.8.4 Definition: The right action (map) (on the total space) of a non-topological principal fibre bundle (P, π, B, A^G_P) with structure group (G, σ) is the map μ^P_G : P × G → P defined by
∀g ∈ G, ∀z ∈ P, ∀φ ∈ A^G_{P,π(z)}, μ^P_G(z, g) = (π × φ)^{-1}(π(z), σ(φ(z), g)),
where A^G_{P,π(z)} = {φ ∈ A^G_P; z ∈ Dom(φ)} as in Notation 21.5.3.
21.8.5 Remark: Diagram for the right action map on a principal fibre bundle.
Figure 21.8.1 illustrates the sets and functions in Definition 21.8.4 for principal fibre bundles in comparison
with fibrations and ordinary fibre bundles, which do not have a right action map.
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
Figure 21.8.1 Non-topological fibrations and fibre bundles: side-by-side diagrams of a fibration (E, π, B), an F-fibration with fibre chart atlas A^F_E, a (G, F) fibre bundle, and a principal G-bundle (P, π, B, A^G_P) with its left and right G-actions.
The right action map is called the "principal map" by Steenrod [138], pages 37–38.
Definition 21.8.4 is independent of the choice of the fibre chart φ by Theorem 21.8.2. This fibre-chart-independence follows from Definition 21.5.2 (v), which implies that there is a unique group element which effects the group transitions for all elements of a fixed fibre π^{-1}({b}), for a fixed pair of charts φ_1 and φ_2.
21.8.6 Notation: R^P_g, for a non-topological principal G-bundle (P, π, B, A^G_P) and g ∈ G, denotes the map from P to P defined by z ↦ μ^P_G(z, g). In other words, R^P_g : P → P is defined by ∀z ∈ P, R^P_g(z) = μ^P_G(z, g).
zg, for z ∈ P and g ∈ G, denotes R^P_g(z).
21.8.7 Theorem: Let (P, π, B, A^G_P) be a non-topological principal G-bundle with structure group (G, σ). Let z ∈ P and g ∈ G.
(i) π(zg) = π(z).
(ii) ∀φ ∈ A^G_{P,π(z)}, φ(zg) = σ(φ(z), g).
Proof: For parts (i) and (ii), let z ∈ P, g ∈ G and φ ∈ A^G_{P,π(z)}. Then (π(zg), φ(zg)) = (π × φ)(zg) = (π × φ)((π × φ)^{-1}(π(z), φ(z)g)) = (π(z), φ(z)g) by Definition 21.8.4 and Notation 10.14.4. Hence π(zg) = π(z) and φ(zg) = φ(z)g.
21.8.8 Definition: The right transformation group of a non-topological principal fibre bundle (P, π, B, A^G_P) with structure group (G, σ_G) is the right transformation group (G, P, σ_G, μ^P_G), where μ^P_G is the right action map on (P, π, B, A^G_P).
21.8.9 Remark: Basic properties of the right transformation group of a principal bundle.
Theorem 21.8.10 asserts that right transformation groups of principal bundles act freely. In other words, the
only right action which has a fixed point is the identity action. Note that by Remark 21.6.4, the possibility
that the total space is empty cannot be excluded.
21.8.10 Theorem: The right transformation group of a non-topological principal bundle acts freely on
the total space. Hence the action is effective if the total space is non-empty.
that some authors use the notation q for this map. (See for example EDM2 [109], page 569.)
It is not possible in general to construct a fibre chart for an infinite subset of B by combining the pointwise
fibre charts constructed in Theorem 21.8.12 (i) because an element z0 P must be chosen for each base
point b. This may be impossible without an axiom of choice, although in typical concrete applications there
will usually be obvious choice-rules. The choice of z0 P for each b B is effectively the choice of a
cross-section of the fibration. (See Definition 21.1.13 for cross-sections of fibrations.) So the existence of a
cross-section for a given subset of B is equivalent to the existence of a fibre chart for that subset.
Theorem 21.8.12 (iv) implies that one may start with either the right transformation group or the principal
bundle, and construct one from the other. The main advantage of starting with the principal bundle is that
it has a concrete base-point set B rather than an abstract orbit space P/G.
21.8.12 Theorem: Let (G, P, σ, μ) be a right transformation group where G acts freely on P. Let B = P/G = {zG; z ∈ P}, where zG = {zg; g ∈ G} for z ∈ P. Define π : P → B by π(z) = zG for z ∈ P.
(i) (P, π, B) is a non-topological fibration with fibre space G.
(ii) (P, π, B, A^G_P) is a non-topological (G, G) fibre bundle with A^G_P = {φ_{z_0}; z_0 ∈ P}, where φ_{z_0} : P_{π(z_0)} → G is defined for z_0 ∈ P as the unique map from P_{π(z_0)} to G which satisfies
∀z ∈ P_b, z_0 φ_{z_0}(z) = z, (21.8.1)
where b = π(z_0) and P_b = π^{-1}({b}) for b ∈ B.
(iii) (P, π, B, A^G_P) is a non-topological principal G-bundle.
(iv) The right action map μ^P_G of (P, π, B, A^G_P), as in Definition 21.8.4, equals μ.
Proof: For part (i), let b ∈ B. Choose z_0 ∈ P such that b = z_0 G. Let z ∈ P_b. Then z ∈ P and b = zG. So z = z e_G ∈ zG = z_0 G. Therefore z = z_0 g for some g ∈ G, which is unique by Definition 20.7.10 because G acts freely on P. Thus ∀z ∈ P_b, ∃! g ∈ G, z = z_0 g. Therefore there is a unique function φ_{z_0} : P_b → G which satisfies line (21.8.1).
To show that φ_{z_0} : P_b → G is a bijection, define L_{z_0} : G → P_b by L_{z_0}(g) = z_0 g for all g ∈ G. Let z ∈ P_b. Then L_{z_0}(φ_{z_0}(z)) = z_0 φ_{z_0}(z) = z by line (21.8.1). Let g ∈ G. Then φ_{z_0}(L_{z_0}(g)) = φ_{z_0}(z_0 g). Therefore z_0 φ_{z_0}(L_{z_0}(g)) = z_0 φ_{z_0}(z_0 g) = z_0 g by line (21.8.1). So φ_{z_0}(L_{z_0}(g)) = g because G acts freely on P. Thus L_{z_0} ∘ φ_{z_0} = id_{P_b} and φ_{z_0} ∘ L_{z_0} = id_G. Therefore φ_{z_0} : P_b → G is a bijection by Theorem 10.5.12 (iv). Hence (P, π, B) is a non-topological fibration with fibre space G by Definition 21.1.19.
For part (ii), Theorem 20.7.16 implies that G acts effectively on G, as required by Definition 21.5.2 for a non-topological (G, G) fibre bundle. To show that (P, π, B, A^G_P) satisfies Definition 21.5.2 (ii), let φ ∈ A^G_P. Then φ = φ_{z_0} for some z_0 ∈ P. Let b = z_0 G. Then π(z_0) = b. Let U = {b}. Then U ∈ ℙ(B) and π^{-1}(U) = P_b, and so φ_{z_0} : π^{-1}(U) → G is a well-defined function. Thus Definition 21.5.2 (ii) is satisfied. In fact, φ_{z_0} : π^{-1}(U) → G is a bijection by part (i). So φ_{z_0} ↾ P_b : P_b → G is a bijection. Thus Definition 21.5.2 (iii) is satisfied.
For all φ_{z_0} ∈ A^G_P, π(Dom(φ_{z_0})) = π(P_b) = {b}, where b = π(z_0). So ⋃{π(Dom(φ)); φ ∈ A^G_P} = ⋃{{b}; b ∈ B} = B. Therefore Definition 21.5.2 (iv) is satisfied.
To verify Definition 21.5.2 (v), let φ_1, φ_2 ∈ A^G_P. Then φ_1 = φ_{z_1} and φ_2 = φ_{z_2} for some z_1, z_2 ∈ P, and then Dom(φ_1) = P_{b_1} and Dom(φ_2) = P_{b_2}, where b_1 = z_1 G and b_2 = z_2 G. Let b ∈ π(Dom(φ_1) ∩ Dom(φ_2)). Then b ∈ π(P_{b_1} ∩ P_{b_2}), which implies that b = b_1 = b_2. Let g = φ_{z_2}(z_1). Let z ∈ P_b. Then gφ_1(z) = φ_{z_2}(z_1)φ_{z_1}(z). So z_2 gφ_1(z) = z_2 φ_{z_2}(z_1)φ_{z_1}(z) = z_1 φ_{z_1}(z) = z = z_2 φ_{z_2}(z) = z_2 φ_2(z). Therefore gφ_1(z) = φ_2(z) because G acts freely on P. Thus Definition 21.5.2 (v) is satisfied. Hence (P, π, B, A^G_P) is a non-topological (G, G) fibre bundle by Definition 21.5.2.
Part (iii) follows from part (ii) and Definition 21.6.3.
For part (iv), let μ^P_G : P × G → P be the right action for (P, π, B, A^G_P) as in Definition 21.8.4. Then
∀g ∈ G, ∀z ∈ P, ∀φ ∈ A^G_{P,π(z)}, μ^P_G(z, g) = (π × φ)^{-1}(π(z), φ(z)g).
Let g ∈ G and z ∈ P. Let φ ∈ A^G_{P,π(z)}. Then φ = φ_{z_0} for some z_0 ∈ P_b with b = π(z) = π(z_0). It follows that
μ^P_G(z, g) = (π × φ_{z_0})^{-1}(π(z), φ_{z_0}(z)g). But z_0 φ_{z_0}(z)g = zg by line (21.8.1), and z_0 φ_{z_0}(zg) = zg by line (21.8.1). So φ_{z_0}(z)g = φ_{z_0}(zg) because G acts freely on P. Also, π(zg) = π(z) by the definition of π. Therefore μ^P_G(z, g) = (π × φ_{z_0})^{-1}(π(zg), φ_{z_0}(zg)) = zg = μ(z, g). Hence μ^P_G = μ.
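The construction of Theorem 21.8.12 can be exercised on a small concrete instance. In the following sketch (our own toy example, not from the book), G = ℤ_4 acts freely on P = ℤ_12 by z·g = (z + 3g) mod 12; the orbit space B = P/G and the pointwise fibre charts φ_{z_0} are built exactly as in the theorem, and the right action induced through a chart is checked against the original action.

```python
# Concrete instance of Theorem 21.8.12: G = Z_4 acting freely on P = Z_12
# by z.g = (z + 3g) mod 12 (free, since 3g = 0 mod 12 only for g = 0 mod 4).
P = list(range(12))
G = list(range(4))
act = lambda z, g: (z + 3 * g) % 12        # mu(z, g)

orbit = lambda z: frozenset(act(z, g) for g in G)
B = {orbit(z) for z in P}                  # B = P/G = {zG ; z in P}
proj = orbit                               # pi(z) = zG

def phi(z0):                               # phi_{z0} : P_b -> G with z0 . phi_{z0}(z) = z
    return lambda z: next(g for g in G if act(z0, g) == z)

# each phi_{z0} is a bijection from the fibre P_b onto G (part (i))
for z0 in P:
    fibre = [z for z in P if proj(z) == proj(z0)]
    assert sorted(phi(z0)(z) for z in fibre) == G

# the right action induced via any pointwise chart recovers the original action (part (iv))
for z in P:
    for g in G:
        z0 = min(proj(z))                  # choose a reference point in the fibre
        f = phi(z0)
        induced = act(z0, (f(z) + g) % 4)  # (pi x phi)^{-1}(pi(z), sigma(phi(z), g))
        assert induced == act(z, g)
print("fibre charts are bijections; induced right action equals original action")
```

Note that choosing the reference point z_0 in each fibre is exactly the choice of a cross-section discussed in the preceding remark; here a choice-rule (the minimum element) is available because the fibres are finite sets of integers.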
⟪ 2016-12-21. If G does not act freely on P in Theorem 21.8.12, one might obtain an ordinary fibre bundle by the same kind of construction. If so, present this. ⟫
21.8.13 Remark: Identity cross-sections expressed in terms of the right action map.
Theorem 21.8.14 applies the right action map of a principal bundle to express identity cross-sections in a
slightly simpler form.
21.8.14 Theorem: Let (P, π, B, A^G_P) be a non-topological principal G-bundle. Let X_φ denote the identity cross-section for fibre charts φ ∈ A^G_P as in Definition 21.7.3.
(i) ∀φ ∈ A^G_P, ∀b ∈ π(Dom(φ)), ∀z ∈ π^{-1}({b}), X_φ(b) = zφ(z)^{-1}.
Proof: Part (i) follows from Theorem 21.6.7 (iv) and Definition 21.8.4.
Proof: For part (i), let z ∈ π^{-1}(Dom(X)) and g ∈ G. Then π(zg) = π(z) and φ(zg) = φ(z)g by Theorem 21.8.7 (i, ii). It follows that φ_X(zg) = (X(π(zg)))^{-1}(zg) = (X(π(z)))^{-1}(z)g = φ_X(z)g by Definition 21.7.7.
Part (ii) is equivalent to part (i).
21.9.1 Remark: Absolute parallelism specifications determine bijections between fibre sets.
Parallelism associates elements of fibre sets E_b = π^{-1}({b}) for pairs of points b in the base space B of a fibration (E, π, B). A point-to-point parallelism may be specified as a map θ : (b_2, b_1) ↦ θ_{b_2,b_1} such that θ_{b_2,b_1} : π^{-1}({b_1}) → π^{-1}({b_2}) is a bijection for all b_1, b_2 ∈ B. It is reasonable to expect the following equivalence relation properties for parallelism. (See Definition 9.8.2 for equivalence relations.)
(i) ∀b ∈ B, θ_{b,b} = id_{E_b}. [reflexivity]
(ii) ∀b_1, b_2 ∈ B, θ_{b_1,b_2} = θ_{b_2,b_1}^{-1}. [symmetry]
(iii) ∀b_1, b_2, b_3 ∈ B, θ_{b_3,b_2} ∘ θ_{b_2,b_1} = θ_{b_3,b_1}. [transitivity]
For z_1, z_2 ∈ E, one may write z_1 ∥ z_2 to mean that θ_{π(z_2),π(z_1)}(z_1) = z_2. Then the relation ∥ satisfies:
(i) ∀z ∈ E, z ∥ z, [reflexivity]
(ii) ∀z_1, z_2 ∈ E, (z_1 ∥ z_2) ⇒ (z_2 ∥ z_1), [symmetry]
(iii) ∀z_1, z_2, z_3 ∈ E, (z_1 ∥ z_2 and z_2 ∥ z_3) ⇒ z_1 ∥ z_3. [transitivity]
This is clearly a simple equivalence relation on E, whose equivalence classes contain one and only one element
of each fibre set Eb . This may be referred to as an absolute parallelism or a flat parallelism. Absolute
parallelism is illustrated in Figure 21.9.1.
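The equivalence relation properties above can be illustrated on the trivial fibration E = B × F, where the obvious absolute parallelism simply copies the fibre coordinate. This is a toy model of our own, not a construction from the book.

```python
# Toy absolute parallelism on the trivial fibration E = B x F, pi = first
# coordinate: theta_{b2,b1} : (b1, v) |-> (b2, v).
B = ["b1", "b2", "b3"]
F = [0, 1, 2]
E = [(b, v) for b in B for v in F]

def theta(b2, b1):
    return lambda z: (b2, z[1])            # bijection E_{b1} -> E_{b2}

parallel = lambda z1, z2: theta(z2[0], z1[0])(z1) == z2    # z1 || z2

assert all(parallel(z, z) for z in E)                                     # reflexivity
assert all(parallel(z2, z1) for z1 in E for z2 in E if parallel(z1, z2))  # symmetry
assert all(parallel(z1, z3) for z1 in E for z2 in E for z3 in E
           if parallel(z1, z2) and parallel(z2, z3))                      # transitivity

# each equivalence class contains exactly one element of each fibre set E_b
cls = lambda z: {w for w in E if parallel(z, w)}
assert all(len([w for w in cls(z) if w[0] == b]) == 1 for z in E for b in B)
print("absolute parallelism is an equivalence relation, one element per fibre")
```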
Figure 21.9.1 Non-topological fibration with absolute parallelism: fibre space F with fibre charts φ_1, φ_2, φ_3; fibre sets E_{b_1}, E_{b_2}, E_{b_3} within the total space E, related by parallelism bijections θ_{b_2,b_1}, θ_{b_3,b_2}, θ_{b_3,b_1} and their inverses; projection map π onto base points b_1, b_2, b_3 in the base space B.
If F is a fibre space for the fibration (E, π, B), and φ_1, φ_2 are fibre charts for (E, π, B) with fibre F, such that b_k ∈ π(Dom(φ_k)) for k = 1, 2, then (φ_2 ↾ E_{b_2})^{-1} ∘ (φ_1 ↾ E_{b_1}) : E_{b_1} → E_{b_2} is a bijection. So this map could be regarded as a kind of parallelism relation between E_{b_1} and E_{b_2}. However, it is chart-dependent. So this is not a feasible way of constructing an absolute parallelism relation unless one uses only a single global chart.
21.9.2 Remark: Parallelism may be thought of as a transformation group action on the fibre space.
If θ is an absolute parallelism relation for a fibration (E, π, B), and φ_1, φ_2 are fibre charts for (E, π, B) with b_k ∈ π(Dom(φ_k)) for k = 1, 2, then the map θ^{φ_2,φ_1}_{b_2,b_1} : F → F defined by θ^{φ_2,φ_1}_{b_2,b_1} = (φ_2 ↾ E_{b_2}) ∘ θ_{b_2,b_1} ∘ (φ_1 ↾ E_{b_1})^{-1} is a bijection on F for all b_1, b_2 ∈ B. Thus θ^{φ_2,φ_1}_{b_2,b_1} is an element of the group of bijections on F. Therefore, viewed through the charts, parallelism may be thought of as a transformation group action on the fibre space F.
If one considers point-to-point absolute parallelism on a fibration in the context of an example such as
temperature distributions on the Earth, parallelism is not a conversion of units. In other words, it does not correspond to a change of observer coordinate frame. An absolute parallelism map θ_{b_2,b_1} corresponds to an equivalence relation between the real temperatures at points b_1 and b_2 on the Earth. Then the maps θ^{φ_2,φ_1}_{b_2,b_1} through the charts φ_1 and φ_2 correspond to equivalence relations between temperature measurements at points b_1 and b_2 which are made via coordinate frames φ_1 and φ_2.
For pathwise parallelism, one may establish multiple parallelism bijections for each base space point pair by making multiple hops between one point and another. Then for each pair of points, there is one parallelism bijection for each path between the pair of points. Such a parallelism may be specified as maps Θ : P_{b_1,b_2} → (E_{b_1} → E_{b_2}) for b_1, b_2 ∈ B. In other words, for all b_1, b_2 ∈ B, there is a set P_{b_1,b_2} of permitted paths between b_1 and b_2, and for each path Q ∈ P_{b_1,b_2}, there is an end-to-end parallelism bijection Θ(Q) : E_{b_1} → E_{b_2}. (See Figure 21.9.2.)
Figure 21.9.2 Non-topological fibration with pathwise parallelism: fibre space F with fibre charts φ_1, φ_2, φ_3; fibre sets E_{b_1}, E_{b_2}, E_{b_3} within the total space E; paths Q_1, Q_2, Q_3, Q_4 between the base points b_1, b_2, b_3, each carrying an end-to-end parallelism bijection Θ(Q_1), Θ(Q_2), Θ(Q_3), Θ(Q_4).
The reflexivity and symmetry conditions in Remark 21.9.1 are assumed to hold for point-to-point bijections.
(This means that a constant path maps to the identity map, and the reversal of a path gives the inverse bijection.) However, the transitivity condition is not required. The choice of path determines the end-to-end
parallelism relation.
One would expect a pathwise parallelism to obey transitivity for concatenations of paths. In other words, if a path Q_0 ∈ P_{b_1,b_3} is equal to the concatenation of paths Q_1 ∈ P_{b_1,b_2} and Q_2 ∈ P_{b_2,b_3}, then one would expect Θ(Q_0) = Θ(Q_2) ∘ Θ(Q_1).
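The concatenation rule Θ(Q_0) = Θ(Q_2) ∘ Θ(Q_1), and the path-dependence which distinguishes pathwise from absolute parallelism, can be seen in a toy model (ours, not the book's) where each edge of a finite base graph transports the fibre ℤ_6 by a fixed rotation.

```python
# Toy pathwise parallelism: fibre sets are Z_6, each directed edge (b, b')
# carries a rotation, and a path's end-to-end bijection composes its edges.
rot = {("b1", "b2"): 1, ("b2", "b3"): 2, ("b1", "b3"): 5}        # edge rotations
rot.update({(b2, b1): -r for (b1, b2), r in list(rot.items())})  # reversal = inverse

def Theta(path):                            # path = [b1, ..., bn]; Theta(path): E_{b1} -> E_{bn}
    total = sum(rot[(path[i], path[i + 1])] for i in range(len(path) - 1))
    return lambda v: (v + total) % 6

# transitivity under concatenation: Theta(Q1 then Q2) = Theta(Q2) o Theta(Q1)
Q1, Q2 = ["b1", "b2"], ["b2", "b3"]
Q0 = Q1 + Q2[1:]                            # concatenation of Q1 and Q2
assert all(Theta(Q0)(v) == Theta(Q2)(Theta(Q1)(v)) for v in range(6))

# path dependence: the direct path b1 -> b3 disagrees with the path via b2,
# so transport around the loop b1 -> b2 -> b3 -> b1 is not the identity
assert any(Theta(["b1", "b3"])(v) != Theta(Q0)(v) for v in range(6))
loop = ["b1", "b2", "b3", "b1"]
assert any(Theta(loop)(v) != v for v in range(6))
print("concatenation transitivity holds; transport is path-dependent")
```

The non-trivial loop transport here is a discrete analogue of the holonomy produced by parallel transport on a curved manifold, mentioned below.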
The canonical example of non-conservative parallelism is the parallel transport of tangent vectors on a curved
manifold. However, this example is best studied in the context of differentiable fibre bundles.
21.9.5 Remark: The special case of pathwise parallelism on a finite base space.
If B is finite, the end-to-end parallelism relations may be formed from any concatenation of point-to-point
paths. In the finite case, transitivity under concatenation implies that only the point-to-point bijections
need to be specified. In fact, one could define the point-to-point bijections for a subset of the full cross product B × B. If all points of B are reachable by some concatenation of links, then one or more end-to-end
parallelism relations will be implied.
Paths may be defined as equivalence classes of parametrised curves for some suitable equivalence relation.
One may also define paths as totally ordered subsets of the base space. If these subsets are finite, the
end-to-end parallelism will be well defined.
The primary practical purpose of defining fibre bundles for differential geometry is to define parallelism and
curvature on them. The role of the structure group in defining parallelism is to constrain the set of frames
of reference at each point of the fibre bundle.
Figure 21.10.1 Parallelism between fibre sets of a fibre bundle: fibre space F with fibre charts φ_1, φ_2; fibre sets E_{b_1}, E_{b_2} within the total space E, related by parallelism bijections θ_{b_2,b_1} and θ_{b_1,b_2}; projection map π onto base points b_1, b_2 in the base space B.
It is reasonable to expect that if the frame φ_1 ↾ E_{b_1} at b_1 ∈ B and the frame φ_2 ↾ E_{b_2} at b_2 ∈ B are parallel, and if the objects z_1 ∈ E_{b_1} and z_2 ∈ E_{b_2} are parallel, then the measurements φ_1(z_1) and φ_2(z_2) should be equal. This implies that φ_1(z_1) = φ_2(θ_{b_2,b_1}(z_1)). So if the frames of reference φ_1 ↾ E_{b_1} and φ_2 ↾ E_{b_2} are known
Since fibre charts will be identified with principal fibre bundle total spaces in Section 21.6, these calculations
show that there are two ways to specify parallelism on a fibre bundle. One may either specify the parallelism
directly on fibre sets of the ordinary fibre bundle, or one may specify parallelism on the associated principal
fibre bundle, which then induces parallelism on the ordinary fibre bundle.
21.10.3 Remark: The parallelism relations look like left actions on the structure group.
When the parallelism bijection θ_{b_2,b_1} is viewed through charts φ_1 and φ_2 which are not necessarily parallel, where b_k ∈ π(Dom(φ_k)) for k = 1, 2, this bijection must be equal to the action of an element of the structure group G on the fibre space F. This follows from Definition 21.5.2 because all chart transitions must be structure group elements. In other words,
∀φ_1 ∈ A^F_{b_1}, ∀φ_2 ∈ A^F_{b_2}, ∃g ∈ G, (φ_2 ↾ E_{b_2}) ∘ θ_{b_2,b_1} = L_g ∘ (φ_1 ↾ E_{b_1}), (21.10.1)
where L_g : F → F denotes the left action of g on F. Thus when viewed through the charts, the parallelism relation looks like the left action of an element of the structure group. This is formalised as Theorem 21.10.4.
Since structure groups preserve some sort of structure (such as linearity or orthogonality), this means that
any specification of parallelism must preserve structure. If one associates each fibre chart with an observer
viewpoint, then line (21.10.1) means that each observer sees the same reality because their observations are
related through a transformation Lg under which the laws of physics are invariant. (See Remark 20.1.10
regarding the role of transformation groups for describing observer-independence of the laws of physics.)
21.10.4 Theorem: Let (G, F, σ, μ) be an effective left transformation group. Let (E, π, B, A^F_E) be a non-topological (G, F) ordinary fibre bundle. Let θ : E_{b_1} → E_{b_2} be a bijection for some b_1, b_2 ∈ B, where E_b = π^{-1}({b}) for b ∈ B. Suppose that there are charts φ′_1, φ′_2 ∈ A^F_E with b_k ∈ π(Dom(φ′_k)) for k = 1, 2, such that (φ′_2 ↾ E_{b_2}) ∘ θ = φ′_1 ↾ E_{b_1}. Then
∀φ_1 ∈ A^F_{b_1}, ∀φ_2 ∈ A^F_{b_2}, ∃g ∈ G, ∀z ∈ E_{b_1},
φ_2(θ(z)) = μ(g, φ_1(z)), (21.10.2)
where A^F_b = {φ ∈ A^F_E; b ∈ π(Dom(φ))} for b ∈ B.
Proof: Let (G, F, σ, μ), (E, π, B, A^F_E), b_1, b_2 ∈ B, θ : E_{b_1} → E_{b_2} and φ′_1, φ′_2 ∈ A^F_E be as in the assumptions of the theorem, with (φ′_2 ↾ E_{b_2}) ∘ θ = φ′_1. Let φ_1 ∈ A^F_{b_1} and φ_2 ∈ A^F_{b_2}. Then by Definition 21.5.2 (v), there exist g_1, g_2 ∈ G such that ∀z_k ∈ E_{b_k}, φ_k(z_k) = μ(g_k, φ′_k(z_k)), for k = 1, 2. Let g = g_2 g_1^{-1}. Let z ∈ E_{b_1}. Then
μ(g, φ_1(z)) = μ(g, μ(g_1, φ′_1(z))) = μ(g, μ(g_1, φ′_2(θ(z)))) = μ(g, μ(g_1, μ(g_2^{-1}, φ_2(θ(z))))) = φ_2(θ(z)).
21.10.5 Remark: Parallelism bijections do not in general commute with structure group left actions.
The fact that parallelism is observer-dependent in line (21.10.1) implies that the desirable condition (21.10.3) is not valid.
∀φ_1 ∈ A^F_{b_1}, ∀φ_2 ∈ A^F_{b_2}, ∀h ∈ G, ∀z_1, z_2 ∈ E_{b_1},
φ_1(z_1) = h φ_1(z_2) ⇒ φ_2(θ_{b_2,b_1}(z_1)) = h φ_2(θ_{b_2,b_1}(z_2)). (21.10.3)
To see how this fails, suppose that line (21.10.1) holds. Fix φ_1 and φ_2. Then for some g ∈ G, for all z ∈ E_{b_1}, φ_2(θ_{b_2,b_1}(z)) = g φ_1(z). So φ_2(θ_{b_2,b_1}(z_1)) = g φ_1(z_1) and φ_2(θ_{b_2,b_1}(z_2)) = g φ_1(z_2). Then from the assumption φ_1(z_1) = h φ_1(z_2), it follows that φ_2(θ_{b_2,b_1}(z_1)) = ghg^{-1} φ_2(θ_{b_2,b_1}(z_2)). Thus even with a fixed pair of charts, the bijection θ_{b_2,b_1}, viewed through the charts, does not commute with the action of G on F, unless G happens to be a commutative group.
The physical reason that condition (21.10.3) fails is that the action Lh looks different from two viewpoints
or coordinate frames. The algebraic reason that condition (21.10.3) fails is that the multiplication by h is
on the left, which is the same side as the multiplication by g in line (21.10.1). The two left actions do
not commute in general. What is really wanted here is an action on the right in condition (21.10.3). (In
fact, we really only want the actions to be on opposite sides, but the convention is to make (G, F ) a left
transformation group. And this implies that the invariance condition (21.10.3) must use a right action.)
Unfortunately, there is no right action of G on F . So condition (21.10.3) cannot be valid in general.
A right action of G on F is in fact available if F is the group G, and G acts on G from the left in
Definition 21.5.2 (v), and G acts on G on the right in condition (21.10.3). This works. The left and right
actions commute. Then one obtains condition (21.10.4).
∀φ_1 ∈ A^F_{b_1}, ∀φ_2 ∈ A^F_{b_2}, ∀h ∈ G, ∀z_1, z_2 ∈ E_{b_1},
φ_1(z_1) = φ_1(z_2) h ⇒ φ_2(θ_{b_2,b_1}(z_1)) = φ_2(θ_{b_2,b_1}(z_2)) h. (21.10.4)
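Through a fixed pair of charts, the failure of condition (21.10.3) and the success of condition (21.10.4) reduce to the fact that left multiplications on a group do not commute with each other but do commute with right multiplications. The following minimal sketch takes F = G = S_3; the identifiers are ours, not the book's.

```python
# Through the charts, the parallelism bijection acts on F = G = S_3 as left
# multiplication by some fixed g (line (21.10.1)).
from itertools import permutations

G = list(permutations(range(3)))                    # S_3, nonabelian
mult = lambda p, q: tuple(p[q[i]] for i in range(3))

g = (1, 0, 2)                                       # the element from (21.10.1)
theta = lambda v: mult(g, v)                        # parallelism viewed through charts

# (21.10.3): v1 = h v2 should imply theta(v1) = h theta(v2). It fails:
# theta(h v2) = g h v2, whereas h theta(v2) = h g v2, and g h != h g in general.
assert any(theta(mult(h, v)) != mult(h, theta(v)) for h in G for v in G)

# (21.10.4): v1 = v2 h does imply theta(v1) = theta(v2) h, by associativity:
# g (v2 h) = (g v2) h. Left and right multiplications always commute.
assert all(theta(mult(v, h)) == mult(theta(v), h) for h in G for v in G)
print("left-multiplication invariance fails; right-multiplication invariance holds")
```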
Chapter 22
Linear spaces
Linear algebra is also attributed to Hermann Graßmann. Eves [66], page 10, says the following.
Algebras of ordered arrays of numbers can perhaps be said to have originated with the Irish mathematician and physicist Sir William Rowan Hamilton (1805–1865) when, in 1837 and 1843, he devised his treatment of complex numbers as ordered pairs of real numbers (. . . ) and his real quaternion algebra as an algebra of ordered quadruples of real numbers. More general algebras of ordered n-tuples of real numbers were considered by Hermann Günther Grassmann (. . . ) in 1844 and 1862.
technology have caused us to think of the universe as a rectangular grid which provides the arena in which
matter moves. After almost 400 years of ubiquitous rectangular grids and the arithmetic coordinatisation of
space, the idea of curved space which is not a simple grid stretching to infinity in all directions seems alien.
It should come as a surprise to us that the arithmetic system which was created for counting goats and cattle
and obols and denarii could usurp the role of the Euclidean point-and-line geometry system, which was created
for the triangulation of fields and the design of houses, roads, bridges and carts. For 2000 years, these two
systems were distinct and separate, and ironically, they were combined not very long after the flat Earth was
replaced by the spherical Earth. So the infinite rectangular grid to describe plane geometry was introduced
just after it was generally realised that this was not a valid model for the surface of the Earth.
Linear algebra is the algebra of the Cartesian grid. Galileo recognised that the motion of objects on Earth
follows a square law in terms of an arithmetic grid for space and time. Newton's second law F = ma may be thought of geometrically in terms of parallel vectors F and a, or in terms of coordinates F_i = ma_i
for i = 1, 2, 3. This formula arises from the linearisation of space and time. The path of the particle is
curved and the force field is curved, but when they are both differentiated, the differential formula is linear.
Thus curved global reality is reduced to a linearised local relation, which is then integrated to determine
the global behaviour. Once again, it took thousands of years to recognise that arithmetic could be applied
to the world with great success, whereas the old Euclidean point-and-line system was difficult to apply to a
curved world.
Line recognition is built into the brain's visual processing. Lines are recognised at a low level on the neural
path from the eyes to the visual cortex. Arithmetic is more closely related to verbal processing in the human
brain. So linear algebra and Cartesian coordinatisation bring together two distinct functionalities of the
human brain: the visual and the verbal.
Linear algebra has a central role in differential geometry as the limit of curved manifolds. Curved space and
time are locally linearised so that linear (and multilinear) algebra may be applied. Then global solutions are
obtained by integration.
One should not be perplexed that the spherical Earth requires more than one Cartesian patch to cover it so that we can arithmetise the Earth's surface. One should not be annoyed that the curved universe of
general relativity can only be expressed in terms of simple Cartesian coordinates locally instead of a globally
flat coordinate-free Euclidean point-and-line geometry. One should be astonished that the arithmetic which
humans developed for counting goats and cattle and money is applicable to a curved universe by locally
linearising it. The methodology of linearisation and coordinatisation has been one of the huge successes of
mathematics in applications to science, engineering, economics and other fields.
22.1.2 Remark: A linear space is a unitary left module over a commutative division ring.
A linear space V over a field K is the same thing as a unitary left module V over the ring structure of K.
(See Definition 19.3.6 and Remark 19.3.8.) In other words, a linear space is a unitary left module over a ring
with the additional requirement that all elements of the ring must have two-sided multiplicative inverses.
Figure 22.1.1 illustrates some relations between unitary left modules over rings and various kinds of linear
spaces. (The existence of bases for linear spaces is discussed in Section 22.7. See Definition 22.5.7 for
finite-dimensional linear spaces.)
A field is known more technically as a commutative division ring. (See Remark 18.7.4.) Therefore the full
technical name for a linear space would be a unitary left module over a commutative division ring. Clearly
the shorter term linear space has advantages.
22.1.4 Remark: Some partial redundancy in the axioms for a linear space.
Linear spaces are defined in Weyl [268], section 2, page 15, in a classical fashion for K = ℝ. Weyl makes the observation that the scalar multiplication rules (i), (ii) and (iii) in Definition 22.1.1 follow from the vector addition rules for scalars in the rational subfield ℚ of the field K = ℝ. Therefore the full scalar multiplication rules for K = ℝ follow from the rules on the dense subset ℚ of ℝ by requiring continuity.
22.1.5 Remark: Order independence of finite sums of vectors in a linear space.
Sums of finite sequences (v_i)^n_{i=1} ∈ V^n of vectors in a linear space V are defined inductively by Σ^m_{i=1} v_i = Σ^{m−1}_{i=1} v_i + v_m for m ≥ 1 and Σ^0_{i=1} v_i = 0_V. Such sums are clearly order-independent because vector addition is commutative. Therefore sums of the form Σ_{α∈A} v_α for (v_α)_{α∈A} ∈ V^A are likewise well defined for finite sets A, whether the index set A is totally ordered or not. Finite sets are easily ordered via a bijection to/from the corresponding set of integers ℕ_n = {1, . . . , n} or {0, 1, . . . , n − 1}.
22.1.7 Definition: The field K regarded as a linear space is the linear space which has the specification tuple (K, K, σ_K, τ_K, σ_K, τ_K), where (K, σ_K, τ_K) is the full specification tuple for K.
22.1.8 Definition: A (linear) subspace of a linear space V over a field K is a linear space U over K such that U ⊆ V and the operations for U equal the restrictions of the operations for V. In other words, V −< (K, V, σ_K, τ_K, σ_V, μ_V) and U −< (K, U, σ_K, τ_K, σ_U, μ_U) are linear spaces over K which satisfy:
(i) U ⊆ V.
22.1.12 Definition: A real linear space is a linear space over the field of real numbers. In other words, a real linear space is a linear space V −< (ℝ, V, σ_ℝ, τ_ℝ, σ_V, μ) over the field ℝ −< (ℝ, σ_ℝ, τ_ℝ).
22.1.13 Definition: A complex linear space is a linear space over the field of complex numbers. In other words, a complex linear space is a linear space V −< (ℂ, V, σ_ℂ, τ_ℂ, σ_V, μ) over the field of complex numbers ℂ −< (ℂ, σ_ℂ, τ_ℂ).
22.2.1 Remark: Most practical linear spaces are subspaces of free and unrestricted linear spaces.
In practice, most linear spaces are based on spaces of functions for which the domain is a fixed set and the
range is a field or a linear space, and the vector addition and scalar multiplication operations are defined
pointwise. As mentioned in Remark 22.1.6, a field may always be regarded as a linear space. So in this
sense, one may speak here simply of functions whose range is a linear space.
For any set S, pointwise addition and multiplication operations may be defined on the set V S of functions
from S to a linear space (or field) V over a field K. Such an unrestricted function space provides a valid
linear space. (The range V may, of course, be constructed itself as a space of functions with pointwise
addition and multiplication operations on a different domain.)
Unrestricted function spaces are sometimes useful, but in the case of an infinite domain, linear subspaces of such function spaces are usually more useful.
Free linear spaces are a particular class of useful subspaces of completely unrestricted function spaces. Other
kinds of useful function space subspaces are constructed by imposing linear constraints on unrestricted
function spaces or free linear spaces. (Other methods for constructing linear spaces include quotients, direct
sums, tensor products and duals.)
22.2.10 Definition: The free linear space on a set S over a field K −< (K, σ_K, τ_K) is the tuple V −< (K, V, σ_K, τ_K, σ_V, μ), where
V = {f : S → K; #{x ∈ S; f(x) ≠ 0_K} < ∞} = {f : S → K; #(supp(f)) < ∞},
σ_V : V × V → V is the pointwise addition operation in V, and μ : K × V → V is the pointwise scalar multiplication operation by elements of K on V.
22.2.11 Notation: FL(S, K) denotes the free linear space on a set S over a field K. In other words,
FL(S, K) = {f : S → K; #{x ∈ S; f(x) ≠ 0_K} < ∞} = {f : S → K; #(supp(f)) < ∞},
together with the associated pointwise addition and scalar multiplication operations.
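A computational reading of Definition 22.2.10 and Notation 22.2.11: a finitely supported function f : S → K may be stored as a dictionary holding only its support, with addition and scalar multiplication defined pointwise. The representation and function names below are our own illustrative choices, not the book's.

```python
# Minimal model of FL(S, K): finitely supported functions f : S -> K stored
# as dicts mapping support points to nonzero values.
def add(f1, f2):                     # sigma_V: pointwise sum, dropping zeros
    out = dict(f1)
    for x, c in f2.items():
        out[x] = out.get(x, 0) + c
        if out[x] == 0:
            del out[x]
    return out

def scale(lam, f):                   # mu: pointwise scalar multiple
    return {} if lam == 0 else {x: lam * c for x, c in f.items()}

def immerse(x):                      # standard immersion i(x) = chi_{{x}}
    return {x: 1}

# S may be infinite (here arbitrary strings); vectors still have finite support.
f = add(scale(3, immerse("goat")), scale(-2, immerse("cattle")))
assert f == {"goat": 3, "cattle": -2}
assert add(f, scale(-1, f)) == {}    # f + (-1)f = 0_V, the empty support
assert scale(0, f) == {}             # support of 0.f is empty
print("free linear space operations verified on finitely supported functions")
```

Dropping zero coefficients in add mirrors the closure argument in the proof below: the support of a sum is a subset of the union of the two supports, hence finite.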
22.2.12 Definition: The standard immersion (function) for the free linear space V on a set S over a field K is the map i : S → V defined as the indicator function i(x) = χ_{{x}} : S → K for x ∈ S.
22.2.13 Remark: Unsatisfactory multi-letter notations.
Notations 22.2.6 and 22.2.11 are non-standard and not entirely satisfactory. Such multi-letter abbreviations are not often used in mathematics because they may be confused with the product of the individual letters in the abbreviation. However, the fonts available to mathematicians, although large and numerous,
are insufficient for the full range of concepts. The use of numerous font styles, such as upper and lower
case roman, bold, italic, bold-italic, blackboard-bold, calligraphic, Greek, bold-Greek, Fraktur, and many
mathematical symbols, eventually results in confusion because of the inevitable similarities between many
of these. Therefore multiple-letter notations are sometimes the least-bad solution.
Proof: For part (i), let V = UL(S, K). Then (V, σ_V) is a commutative group whose zero element is 0_V : x ↦ 0_K because (K, σ_K) is a commutative group. Let λ ∈ K and v_1, v_2 ∈ UL(S, K) = K^S. Then λ(v_1 + v_2) is the algebraic expression for the value μ(λ, σ_V(v_1, v_2)) with μ : K × V → V and σ_V : V × V → V as in Definition 22.2.5. Thus this is a well-defined element of V. Similarly λv_1 + λv_2 = σ_V(μ(λ, v_1), μ(λ, v_2)) is a well-defined element of V. The equality of these two expressions is clear from the distributivity of multiplication over addition for the field K. This verifies Definition 22.1.1 (i). Definition 22.1.1 conditions (ii), (iii) and (iv) may be verified similarly from the distributivity, associativity and multiplicative identity properties of a field. Hence UL(S, K) is a linear space.
Part (ii) may be proved as for part (i), except that the closure of the vector addition operation V and scalar
multiplication operation in Definition 22.2.10 must be verified to be closed within FL(S, K). The closure
of vector addition follows from the fact that any subset of the union of two finite sets is a finite set. The
support of the product v for K and v FL(S, K) is either the same as the support of v, if 6= 0K ,
or is equal to the empty set (which is clearly finite) if = 0K . Hence FL(S, K) is a linear space.
For part (iii), clearly FL(S, K) UL(S, K) by Definitions 22.2.5 and 22.2.10. Hence FL(S, K) is a linear
subspace of UL(S, K) by Definition 57.2.7 and parts (i) and (ii).
For part (iv), let S be finite. Let f : S K. Then {x S; f (x) 6= 0} S. So #{x S; f (x) 6= 0}
#(S) < . Thus {f : S K; #{x S; f (x) 6= 0} < } = K S . Hence FL(S, K) = UL(S, K) = K S .
Differential geometry reconstructed. Copyright (C) 2017, Alan U. Kennington. All Rights Reserved. You may print this book draft for personal use. Public redistribution of this book draft in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.
Figure 22.2.1 Vector addition in the free linear space over the real numbers
22.2.16 Remark: The range of the standard immersion function for a linear space.
The standard immersion function i : S → FL(S, K) in Definition 22.2.12 is the same as that which is
discussed in Remark 22.2.8. The range of this immersion function is a basis for the free linear space. (See
Definition 22.7.2 for linear space bases.)
22.2.17 Remark: Free linear spaces may be thought of as external direct sums.
Free linear spaces may be thought of as the external direct sum of a copy of a field K for each element of an
arbitrary set. (See Section 24.1 for direct sums of linear spaces.)
22.2.19 Definition: A Cartesian linear space is a set ℝ^n for some n ∈ ℤ⁺₀, together with the operations
of componentwise addition and scalar multiplication over the field ℝ.
22.2.20 Remark: The difference between Cartesian n-tuple space and Euclidean point space.
The Cartesian linear space in Definition 22.2.19 is also known as Cartesian coordinate space, Euclidean
space, Euclidean linear space and by other names. It is probably best to reserve the adjective Euclidean
for the normed space which is defined in Section 24.6 because the term Euclidean is often used in contrast
with non-Euclidean, which implies that the metric is a defining feature.
The bare coordinate space in Definition 22.2.19 is referred to as Cartesian to emphasise the role of the real
n-tuples as a chart for some underlying point space, which distinguishes Cartesian geometry from the more
ancient geometry of point, line and circle constructions. The name Cartesian is also apt because the set
ℝ^n is the Cartesian set-product of n copies of the set ℝ.
22.2.21 Remark: Computational convenience of the sparse representation of a free linear space.
From the computational point of view, the representation in Definition 22.2.10 is easy to operate on. Each
element of the free linear space has only a finite set of non-zero components. So it is possible to represent
an element of the space by a finite set of ordered pairs. Any vector f in the free linear space V on the set S
over K may be represented by a finite set of tuples:
{(x, f(x)) ∈ f; f(x) ≠ 0}.
Such a representation is easy to store and manipulate if the elements of S and K are easy to store and
manipulate. This is close to the way symbolic algebra is implemented in computer software. Figure 22.2.2
illustrates an element χ_{{u}} + 2χ_{{v}} + (−3/2)χ_{{w}} with u = (3, 3), v = (−1, 4) and w = (4, −1) in the free
linear space on the Cartesian product set S = ℝ^2 over the field K = ℝ.
Figure 22.2.2 χ_{{(3,3)}} + 2χ_{{(−1,4)}} + (−3/2)χ_{{(4,−1)}} in the free linear space on ℝ^2 over ℝ
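The sparse representation of Remark 22.2.21 can be sketched in code. The following is an illustrative model only (not from the book): a Python dict maps each element of supp(f) to its non-zero value, with the field K modelled by exact rationals, so the support of a vector is exactly the set of stored keys.

```python
from fractions import Fraction

# A vector in FL(S, K) is modelled as a dict {x: f(x) for x in supp(f)},
# i.e. only the finitely many non-zero components are stored.

def vec_add(f, g):
    """Pointwise addition, dropping components that cancel to zero."""
    h = dict(f)
    for x, value in g.items():
        h[x] = h.get(x, 0) + value
        if h[x] == 0:
            del h[x]          # keep the support minimal
    return h

def vec_scale(lam, f):
    """Pointwise scalar multiplication; the zero scalar gives empty support."""
    if lam == 0:
        return {}
    return {x: lam * value for x, value in f.items()}

# The element chi_{u} + 2 chi_{v} + (-3/2) chi_{w} of Figure 22.2.2,
# with u = (3,3), v = (-1,4), w = (4,-1) in S = R^2:
u, v, w = (3, 3), (-1, 4), (4, -1)
elem = vec_add(vec_add({u: Fraction(1)}, vec_scale(2, {v: Fraction(1)})),
               vec_scale(Fraction(-3, 2), {w: Fraction(1)}))
print(elem)  # {(3, 3): Fraction(1, 1), (-1, 4): Fraction(2, 1), (4, -1): Fraction(-3, 2)}
```

The closure properties verified in the proof above correspond to the fact that `vec_add` and `vec_scale` always return dicts with finitely many keys.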
22.2.22 Remark: Free linear spaces can give concrete set-representations for symbolic algebras.
Free spaces of any kind may be thought of as a kind of symbolic algebra such as is offered by some
computer software. Each element of a free space of any class represents a grouping of symbols rather
than a mathematical object. The intended mathematical object is obtained from the symbolic algebra
by imposing an equivalence relation and adding algebraic operations. In the example of tensors, a symbolic
expression such as λ_1 (v_1 ⊗ w_1) + λ_2 (v_2 ⊗ w_2) may be represented as a sum of indicator functions such as
λ_1 χ_{{(v_1,w_1)}} + λ_2 χ_{{(v_2,w_2)}}.
Whenever mathematicians or physicists write down juxtapositions of symbols without working out what
they represent mathematically, the concept of a free space can often save the situation. For instance,
when tensor products are written, they are usually written as v ⊗ w. To formalise this mathematically, one
may first create the uncut space or free space of all formal juxtapositions of vectors v and w in some
linear space, and then reduce the size of this free space by introducing equivalence relations. This approach
is also used for free groups, free R-modules, and so forth.
An example of interest to differential geometry is a free R-module of singular q-chains over a commutative
unitary ring R such as R = ℤ. In this case, such a free R-module is referred to as the space of formal
linear combinations of elements of some space. The base space for the free space of q-chains is the space of
q-simplices. (See Greenberg/Harper [84], page 44.)
22.2.24 Remark: Convenient notation for sets of functions with finite support.
Sets of the form {f : S → K; #{x ∈ S; f(x) ≠ 0_K} < ∞}, where S is a set and K is a field or a linear
space, are tedious to write and tedious to read. Since such sets are frequently required, it is convenient to
define Notation 22.2.25 for them. This is the same as the set of vectors of the free linear space FL(S, K) in
Notation 22.2.11, but Fin(S, K) has no linear space operations, and the range K may be either a field or a
linear space. The notation Fin(S, K) is typically used when S is a subset of a linear space.
22.2.27 Definition: The unrestricted (vector-valued) linear space on a set S, with values in a linear
space U −< (K, U, σ_K, τ_K, σ_U, τ_U) over a field K −< (K, σ_K, τ_K), is the tuple V −< (K, V, σ_K, τ_K, σ_V, τ_V),
where V = U^S, σ_V : V × V → V is the pointwise addition operation in V , and τ_V : K × V → V is the
pointwise scalar multiplication operation by elements of K on V . In other words, (f + g)(x) = f(x) + g(x)
and (λf)(x) = λf(x) for all f, g ∈ V , λ ∈ K and x ∈ S.
22.2.28 Definition: The free (vector-valued) linear space on a set S to a linear space U over a field
K −< (K, σ_K, τ_K) is the tuple V −< (K, V, σ_K, τ_K, σ_V, τ_V), where
V = {f : S → U; #{x ∈ S; f(x) ≠ 0_U} < ∞}
  = {f ∈ U^S; #(supp(f)) < ∞},
together with the pointwise addition and scalar multiplication operations as in Definition 22.2.27.
22.2.29 Remark: No standard immersion for vector-valued free and unrestricted linear spaces.
For Definitions 22.2.27 and 22.2.28, there is no standard immersion map for S in V following the pattern
of Definitions 22.2.7 and 22.2.12 because a linear space U does not generally have a multiplicative identity
element 1_U. However, it is possible to define a standard immersion of U in V for each x ∈ S as the map
i_x : U → V with i_x : u ↦ (y ↦ δ_{xy} u). In other words, i_x(u)(y) = u if y = x and i_x(u)(y) = 0_U if y ≠ x. The
standard projection map π_x : V → U defined by π_x : (u_y)_{y∈S} ↦ u_x is a left inverse for this immersion map
for each x ∈ S.
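The immersion and projection maps described above can be illustrated with a small sketch (an illustration only, with hypothetical helper names, taking S finite so that an element of V = U^S can be stored as a dict):

```python
def immersion(x, u, S, zero):
    """i_x(u): the element of V = U^S equal to u at x and 0_U elsewhere."""
    return {y: (u if y == x else zero) for y in S}

def projection(x, v):
    """pi_x(v) = v_x, the x-component of v in V = U^S."""
    return v[x]

S = {'p', 'q', 'r'}
zero = (0.0, 0.0)        # 0_U for U = R^2
u = (1.5, -2.0)
v = immersion('q', u, S, zero)
print(projection('q', v) == u)  # True: pi_x is a left inverse of i_x
```

Note that `projection` is defined on all of V, while `immersion` targets a single factor; composing them in the other order does not recover v, which is why π_x is only a left inverse.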
22.3.2 Definition: A linear combination of (elements of) a subset S of a linear space V over a field K is
any element v ∈ V which satisfies
∃n ∈ ℤ⁺₀, ∃w ∈ S^n, ∃λ ∈ K^n, Σ_{i=1}^n λ_i w_i = v. (22.3.1)
22.3.3 Remark: Comments on the definition of a linear combination.
The sum Σ_{i=1}^n λ_i w_i in Definition 22.3.2 is well defined, as discussed in Remarks 17.1.16 and 22.1.5. The set
S may be infinite, but the sum is always over a finite subset of S. Note also that the vectors w_i for i ∈ ℕ_n
are not necessarily distinct. The index set does not need to be the set of integers ℕ_n. Any finite index set
is acceptable, as discussed in Remark 22.1.5. Thus condition (22.3.1) may be replaced by (22.3.2).
∃A, #(A) < ∞ and ∃(w_α)_{α∈A} ∈ S^A, ∃(λ_α)_{α∈A} ∈ K^A, Σ_{α∈A} λ_α w_α = v. (22.3.2)
Unfortunately, the set of all finite sets is not a ZF set. So line (22.3.2) cannot be written with a set-constrained
quantifier such as ∃A ∈ FinSets, . . . However, lines (22.3.1) and (22.3.2) may be written in
terms of Notation 22.2.25 as follows.
∃λ ∈ Fin(S, K), Σ_{w∈S} λ_w w = v.
Likewise, line (22.3.3) may be rewritten as ∀λ ∈ Fin(S, K), Σ_{w∈S} λ_w w ∈ S. Such expressions using a set
Fin(S, K) of functions with finite support exploit the understanding in Remark 22.3.1 that only the non-zero
elements of a family need to be summed.
In the special case n = 0 in Definition 22.3.2, both w and λ are empty lists, and Σ_{i=1}^n λ_i w_i becomes
Σ_{i=1}^0 λ_i w_i, which equals 0_V, as noted in Remark 22.1.5. (It is a generally accepted convention that the sum
of zero terms in an additive group is equal to the additive identity of the group.)
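The empty-sum convention and the finite-support formulation above can be illustrated as follows (an illustrative sketch only; the function name and the choice V = ℝ^3 are ad hoc):

```python
def linear_combination(coeffs, zero=(0.0, 0.0, 0.0)):
    """Compute the sum of lam * w over the finite-support family coeffs = {w: lam}.

    An empty family returns the zero vector, following the convention that a
    sum of zero terms equals the additive identity of the group."""
    result = zero
    for w, lam in coeffs.items():
        result = tuple(r + lam * wi for r, wi in zip(result, w))
    return result

v = linear_combination({(1.0, 0.0, 0.0): 2.0, (0.0, 1.0, 1.0): -1.0})
print(v)                       # (2.0, -1.0, -1.0)
print(linear_combination({}))  # (0.0, 0.0, 0.0) -- the empty linear combination
```

Passing the coefficients as a map from vectors to scalars mirrors the Fin(S, K) formulation: only the non-zero terms of the family appear, and the sum is always finite.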
22.3.4 Definition: A subset S of a linear space V over a field K is closed under linear combinations if
∀n ∈ ℤ⁺₀, ∀w ∈ S^n, ∀λ ∈ K^n, Σ_{i=1}^n λ_i w_i ∈ S. (22.3.3)
In other words, ∀λ ∈ Fin(S, K), Σ_{w∈S} λ_w w ∈ S.
22.3.5 Theorem: A subset S of a linear space V is closed under linear combinations if and only if it is
non-empty and closed under vector addition and scalar multiplication.
Proof: Let V be a linear space over a field K. Let S be a subset of V . Suppose first that S is closed under
linear combinations. Let v_1, v_2 ∈ S. Then v_1 = Σ_{i=1}^{n_1} λ^1_i w^1_i and v_2 = Σ_{j=1}^{n_2} λ^2_j w^2_j for some n_1, n_2 ∈ ℤ⁺₀,
(w^1_i)_{i=1}^{n_1} ∈ S^{n_1}, (w^2_j)_{j=1}^{n_2} ∈ S^{n_2}, (λ^1_i)_{i=1}^{n_1} ∈ K^{n_1}, (λ^2_j)_{j=1}^{n_2} ∈ K^{n_2}. Then v_1 + v_2 = Σ_{k=1}^m μ_k u_k, where
(u_k)_{k=1}^m ∈ S^m and (μ_k)_{k=1}^m ∈ K^m are defined by m = n_1 + n_2, u_k = w^1_k for k = 1, . . . n_1, u_{n_1+k} = w^2_k for
k = 1, . . . n_2, μ_k = λ^1_k for k = 1, . . . n_1 and μ_{n_1+k} = λ^2_k for k = 1, . . . n_2. So v_1 + v_2 ∈ S. Thus S is closed
under addition.
Suppose again that S is closed under linear combinations. Let v ∈ S and λ ∈ K. Then v = Σ_{i=1}^n λ_i w_i for
some n ∈ ℤ⁺₀, (w_i)_{i=1}^n ∈ S^n and (λ_i)_{i=1}^n ∈ K^n. Then λv = Σ_{i=1}^n μ_i w_i, where (μ_i)_{i=1}^n ∈ K^n is defined by
μ_i = λλ_i for i ∈ ℕ_n. So λv ∈ S. Thus S is closed under scalar multiplication by elements of K.
22.3.6 Theorem: Let S̄ be the set of all linear combinations of a subset S of a linear space V . Then S̄ is
closed under linear combinations.
Proof: The assertion follows from an explicit expression for a linear combination of a finite set of linear
combinations. Let u = Σ_{i=1}^n λ_i v_i be a linear combination of vectors v_i ∈ S̄ for i ∈ ℕ_n. Then v_i =
Σ_{j=1}^{m_i} μ_{i,j} w_{i,j} for some sequences (μ_{i,j})_{j=1}^{m_i} in K^{m_i} and (w_{i,j})_{j=1}^{m_i} in S^{m_i} for m_i ∈ ℤ⁺₀ for all i ∈ ℕ_n. By
substitution, it follows that u = Σ_{i=1}^n λ_i Σ_{j=1}^{m_i} μ_{i,j} w_{i,j} = Σ_{i=1}^n Σ_{j=1}^{m_i} λ_i μ_{i,j} w_{i,j} = Σ_{ℓ=1}^M λ_{i_ℓ} μ_{i_ℓ,j_ℓ} w_{i_ℓ,j_ℓ}, where
M = Σ_{i=1}^n m_i, and i_ℓ = min{i′ ∈ ℤ⁺₀; Σ_{i=1}^{i′} m_i ≥ ℓ} and j_ℓ = ℓ − Σ_{i=1}^{i_ℓ−1} m_i for all ℓ ∈ ℕ_M. Thus u is a
linear combination of the vectors (w_{i_ℓ,j_ℓ})_{ℓ=1}^M, with scalar multipliers (λ_{i_ℓ} μ_{i_ℓ,j_ℓ})_{ℓ=1}^M.
22.3.7 Remark: The set of linear combinations of a set of vectors is closed under linear combinations.
Theorem 22.3.6 effectively says that a linear combination of a linear combination is a linear combination. So
by Theorem 22.3.5, the set of linear combinations of any subset S of a linear space V is a non-empty subset
of V which is closed under vector addition and scalar multiplication. The set of linear combinations of a set
S contains every element of S and also every element of V which can be constructed from elements of S by
the recursive application of any number of vector addition and scalar multiplication operations.
22.3.8 Theorem: A subset S of a linear space V is closed under linear combinations if and only if S is a
subspace of V .
22.3.9 Remark: A subset of a linear space is assumed to have an induced linear space structure.
The second mention of the subset S in Theorem 22.3.8 is implicitly an abbreviation for the induced tuple
S −< (K, S, σ_K, τ_K, σ_S, τ_S), where K −< (K, σ_K, τ_K) is the field of V , σ_S = σ_V ↾ (S × S) and τ_S = τ_V ↾ (K × S). Then
if the subset is closed under linear combinations, it satisfies Definition 22.1.8.
In Notation 22.4.5, the set-theoretic function span has technically different meanings in its two instances
in the equality span(v) = span(Range(v)).
22.4.2 Definition: The (linear) span of a subset S of a linear space V is the set of all linear combinations
of elements of S in V .
22.4.3 Definition: The (linear) span of a family v = (v_i)_{i∈I} of vectors in a linear space V is the set of
all linear combinations of vectors v_i such that i ∈ I.
22.4.4 Notation: span(S) denotes the linear span of a subset S of a linear space V .
22.4.5 Notation: span(v) denotes the linear span of a family v = (v_i)_{i∈I} of vectors in a linear space V .
In other words, span(v) = span(Range(v)) = span({v_i; i ∈ I}).
22.4.7 Theorem: Basic properties of the linear span of a subset of a linear space.
Let V be a linear space over a field K.
(i) span(S) is a linear subspace of V for any subset S of V .
(ii) span(∅) = {0_V}.
(iii) S ⊆ span(S) for every subset S of V .
(iv) span(W) = W for every linear subspace W of V .
(v) For any subset S of V , span(S) = S if and only if S is a linear subspace of V .
(vi) Let S_1 and S_2 be subsets of V . Then S_1 ⊆ S_2 ⇒ span(S_1) ⊆ span(S_2).
(vii) 0_V ∈ span(S) for any subset S of V .
(viii) For any v ∈ V and any subset S of V , v ∈ span(S) ⇔ span(S ∪ {v}) = span(S).
(ix) For any subsets S_1 and S_2 of V , S_1 ⊆ span(S_2) ⇒ span(S_1) ⊆ span(S_2).
(x) S ⊆ W ⇒ span(S) ⊆ W for every linear subspace W of V and every subset S of V .
(xi) For any subset S of V , span(span(S)) = span(S).
(xii) For any subset S of V , for any λ ∈ K \ {0_K}, span({λv; v ∈ S}) = span(S).
Proof: Part (i) follows from Definitions 22.4.2 and 22.1.8, and Theorem 22.3.6.
For part (ii), a vector v ∈ V is a linear combination of elements in ∅ if and only if v = Σ_{i=1}^n λ_i u_i for some
n ∈ ℤ⁺₀ and u_i ∈ ∅ for all i ∈ ℕ_n. The only possibility for n is zero, and so v = Σ_{i=1}^0 λ_i u_i = 0_V. Hence
span(∅) = {0_V}.
For part (iii), let S be a subset of V . Let v ∈ S. Then Σ_{i=1}^n λ_i u_i is a linear combination of elements of S,
where n = 1, λ_1 = 1_K and u_1 = v. So v ∈ span(S). Hence S ⊆ span(S).
For part (iv), let W be a linear subspace of V . Then W ⊆ span(W) by part (iii). Let v ∈ span(W).
Then v = Σ_{i=1}^n λ_i w_i for some family (w_i)_{i=1}^n of elements of W . So v ∈ W by Theorem 22.3.8. Therefore
span(W) ⊆ W . Hence span(W) = W .
For part (v), let S be a subset of V . If S is a linear subspace of V , then span(S) = S by part (iv).
Now suppose that span(S) = S. Let v be a linear combination of elements of S. Then v ∈ span(S) by
Definition 22.4.2. So v ∈ S. Thus S is closed under linear combinations. Therefore S is a linear subspace
of V by Theorem 22.3.8. Hence span(S) = S if and only if S is a linear subspace of V .
For part (vi), let S_1 and S_2 be subsets of V such that S_1 ⊆ S_2. Let v ∈ span(S_1). Then by Definition 22.3.2,
v = Σ_{i=1}^n λ_i w_i for some n ∈ ℤ⁺₀, w ∈ S_1^n and λ ∈ K^n. But S_1 ⊆ S_2. So v = Σ_{i=1}^n λ_i w_i for some n ∈ ℤ⁺₀,
w ∈ S_2^n and λ ∈ K^n. Therefore v ∈ span(S_2) by Definition 22.3.2. Hence span(S_1) ⊆ span(S_2).
For part (vii), let S be a subset of V . Then 0_V = Σ_{i=1}^0 λ_i w_i, where (λ_i)_{i=1}^0 = () = ∅ is the empty (zero-length)
tuple of elements of K, and (w_i)_{i=1}^0 = () = ∅ is the empty (zero-length) tuple of elements of V .
Hence 0_V ∈ span(S). (See Remarks 22.1.5 and 22.3.3 for the convention that Σ_{i=1}^0 λ_i w_i = 0_V.)
For part (viii), let S be a subset of V . Let v ∈ span(S). Then span(S) ⊆ span(S ∪ {v}) by part (vi).
Let u ∈ span(S ∪ {v}). Then by Definition 22.3.2, u = Σ_{i=1}^n λ_i w_i for some n ∈ ℤ⁺₀, w ∈ (S ∪ {v})^n
and λ ∈ K^n. If w_i ≠ v for all i ∈ ℕ_n, then w ∈ S^n, and so u ∈ span(S). So suppose that w_j = v
for some j ∈ ℕ_n. Since v = Σ_{i=1}^{n′} λ′_i w′_i for some n′ ∈ ℤ⁺₀, w′ ∈ S^{n′} and λ′ ∈ K^{n′}, it follows that u =
Σ_{i=1}^{j−1} λ_i w_i + Σ_{i=j+1}^n λ_i w_i + Σ_{i=1}^{n′} (λ_j λ′_i) w′_i, which is a linear combination of elements of S. So u ∈ span(S).
Therefore span(S ∪ {v}) ⊆ span(S). Hence span(S ∪ {v}) = span(S).
For part (ix), let S_1 and S_2 be subsets of V , and suppose that S_1 ⊆ span(S_2). Let v ∈ span(S_1). Then v is
a linear combination of elements of S_1. So v is a linear combination of elements of span(S_2). But span(S_2)
is a linear subspace of V by part (i). So v ∈ span(S_2) by Theorem 22.3.8. Hence span(S_1) ⊆ span(S_2).
For part (x), let W be a linear subspace of V and let S be a subset of V . Suppose that S ⊆ W . Then
span(S) ⊆ span(W) by part (ix). But span(W) = W by part (iv). So span(S) ⊆ W .
For part (xi), let S be a subset of V . Then span(S) is a linear subspace of V by part (i). So span(span(S)) =
span(S) by part (iv).
For part (xii), let S be a subset of V , and let λ ∈ K \ {0_K}. Suppose that u ∈ span(S). Then by
Definition 22.3.2, u = Σ_{i=1}^n λ_i w_i for some n ∈ ℤ⁺₀, w ∈ S^n and (λ_i)_{i=1}^n ∈ K^n. So u = Σ_{i=1}^n λ′_i (λw_i)
with λ′_i = λ^{−1} λ_i for all i ∈ ℕ_n. So u ∈ span({λv; v ∈ S}). So span(S) ⊆ span({λv; v ∈ S}). Similarly
span({λv; v ∈ S}) ⊆ span(S). Hence span({λv; v ∈ S}) = span(S).
22.4.8 Theorem: The linear span of a subset S of a linear space V is a linear subspace of V which
includes S.
Proof: Let S be a subset of a linear space V over a field K. Let U be the set of all linear combinations
of elements of S. For every vector v ∈ S, the linear combination Σ_{i=1}^n k_i w_i equals v if n = 1, k_1 = 1_K
and w_1 = v. So S ⊆ U . The fact that U is a subspace of V follows directly from Definition 22.4.2,
Theorem 22.3.6 and Theorem 22.3.8.
22.4.9 Remark: The free linear space on a given set equals the span of that set.
The set of vectors in the free linear space FL(S, K) in Section 22.2 may be expressed in terms of linear spans
as follows.
FL(S, K) = {f ∈ K^S; #(supp(f)) < ∞}
  = span({χ_{{a}}; a ∈ S})
  = span(Range(i)),
where i : S → FL(S, K) is the standard immersion function for FL(S, K). (See Definition 22.2.12.) The
vectors χ_{{a}} for a ∈ S are unit vectors in a free linear space.
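The identity behind this span expression, f = Σ_{a∈supp(f)} f(a) χ_{{a}}, can be checked mechanically in the dict model of finite-support functions (an illustrative sketch only; the helper names are ad hoc):

```python
def indicator(a):
    """chi_{{a}}: the unit vector of the free linear space supported at a."""
    return {a: 1}

def scale_add(acc, lam, f):
    """acc + lam*f in the dict model of FL(S, K), discarding zero components."""
    out = dict(acc)
    for x, value in f.items():
        out[x] = out.get(x, 0) + lam * value
        if out[x] == 0:
            del out[x]
    return out

f = {'a': 3, 'b': -2, 'c': 7}   # an element of FL(S, Q) with supp(f) = {a, b, c}
rebuilt = {}
for a, lam in f.items():
    rebuilt = scale_add(rebuilt, lam, indicator(a))
print(rebuilt == f)  # True: f lies in span({chi_{{a}}; a in S})
```

Every finite-support function is thus recovered as a finite linear combination of the unit vectors, which is exactly why Range(i) spans FL(S, K).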
22.4.10 Remark: The span of a set equals the smallest linear subspace containing the set.
The above approach to defining linear spans is explicitly constructive, building up the span of a set from linear
combinations. Linear spans can also be defined more abstractly by the lim-sup method as the intersection
of all linear subspaces which contain a given set. This is often preferred for its apparent simplicity. The
equivalence of the two definitions is stated as Theorem 22.4.12.
22.4.11 Theorem: The intersection of a non-empty set of subspaces of a given linear space is a subspace.
Proof: Let V −< (K, V, σ_K, τ_K, σ_V, τ_V) be a linear space, and let C be a non-empty set of subspaces
of V . Define the tuple (K, W, σ_K, τ_K, σ_W, τ_W) by W = ∩C, σ_W = σ_V ↾ (W × W) and τ_W = τ_V ↾ (K × W). Then
Range(σ_W) ⊆ W and Range(τ_W) ⊆ W . So W is a linear space. Therefore it is a subspace of V .
22.4.12 Theorem: The linear span of a subset S of a linear space V is the intersection of all subspaces of
V which contain S. That is,
span(S) = ∩{U ⊆ V; U is a subspace of V and S ⊆ U}.
Proof: Let S be a subset of a linear space V . Let W be the intersection of the subspaces of V which
contain S. By Theorem 22.4.8, span(S) is a linear subspace of V , and S ⊆ span(S). Therefore W ⊆ span(S)
by Theorem 8.4.8 (xiv). But W is a subspace of V by Theorem 22.4.11, and S ⊆ W by Theorem 8.4.8 (xvi).
So all linear combinations of elements of S are also elements of W by Theorem 22.3.8. So span(S) ⊆ W by
Definition 7.3.2 and Notation 7.3.3. Hence span(S) = W .
spanning set which is equinumerous to a subset of P(ℝ), then the dimension of the space is 2^(2^#(ℝ)) (which is
essentially the same as #(P(P(ℝ)))).
Some indications of why beta-cardinality is preferred here to the more customary ordinal-number-based
cardinality are given in Section 13.2 and elsewhere. (In particular, note that there is no explicit ordinal
number which is equinumerous to ℝ because that would imply that ℝ can be explicitly well-ordered, which
is not possible without assuming the axiom of choice.)
22.5.2 Definition: The dimension of a linear space is the least beta-cardinality of its spanning subsets.
22.5.3 Notation: dim(V) denotes the dimension of a linear space V . In other words, dim(V) is the least
beta-cardinality of the spanning subsets of V .
22.5.7 Definition:
A finite-dimensional linear space is a linear space which is spanned by a finite set.
An infinite-dimensional linear space is a linear space which is not finite-dimensional.
It seems intuitively clear that any linear subspace of a finite-dimensional linear space should itself be finite-dimensional.
But this must be proved. It is not entirely obvious a priori that the dimension of a
linear subspace U must be equal to or lower than the dimension of a linear space V of which it is a subspace.
If V has dimension β, this means that the least beta-cardinality of spanning sets for V is β. Thus there exists
a spanning set S for V with β(S) = β because the beta-cardinals are well ordered. But then a spanning
set S_U must be constructed for U such that β(S_U) ≤ β. In general, it is not obvious how to construct
such a spanning set. In the case of finite-dimensional spaces, the induction principle may be applied as in
Theorem 22.5.10 (ii) to explicitly construct a basis for any given linear subspace of a finite-dimensional linear
space. (Questions regarding linear independence and bases are avoided in the proofs of Theorems 22.5.10
and 22.5.11 because these topics are first introduced in Sections 22.6 and 22.7 respectively, and linear maps
are not introduced until Section 23.1.)
∀S ∈ P(ℤ⁺), ∀W ∈ P(K^S),
W is a linear subspace of K^S ⇒ (∃S′ ∈ P(S), the map π_{S′} : W → K^{S′} defined by π_{S′} : v ↦ v↾S′ is a bijection).
Proof: For part (i), let λ_1, λ_2 ∈ K and w′_1, w′_2 ∈ W′. Then w′_1 = π_{S′}(w_1) and w′_2 = π_{S′}(w_2) for some
w_1, w_2 ∈ W . So λ_1 w_1 + λ_2 w_2 ∈ W because W is a linear subspace of K^S, and so π_{S′}(λ_1 w_1 + λ_2 w_2) ∈ W′.
But π_{S′}(λ_1 w_1 + λ_2 w_2) = (λ_1 w_{1,i} + λ_2 w_{2,i})_{i∈S′} = λ_1 (w_{1,i})_{i∈S′} + λ_2 (w_{2,i})_{i∈S′} = λ_1 π_{S′}(w_1) + λ_2 π_{S′}(w_2) =
λ_1 w′_1 + λ_2 w′_2. So λ_1 w′_1 + λ_2 w′_2 ∈ W′. Hence W′ is a linear subspace of K^{S′} by Theorem 22.1.10.
Part (ii) may be proved by induction on n = #(S). Let V denote the linear space K^S. The assertion
is clearly true for n = 0 because W = {0_V} is the only possibility for W , and then S′ = ∅ satisfies the
requirement. So let n ∈ ℤ⁺₀ and assume that the assertion is valid for all linear subspaces of linear spaces
K^S with S ⊆ ℤ⁺ and #(S) = n.
Let W be a linear subspace of V = K^S with S ⊆ ℤ⁺ and #(S) = n + 1. If W = V , then S′ = S meets the
requirement. So suppose that W ≠ V .
Let (δ^i)_{i∈S} be the family of vectors δ^i = (δ_{ij})_{j∈S} ∈ K^S. Then δ^{i_0} ∈ V \ W for some i_0 ∈ S. (Otherwise W
would include span((δ^i)_{i∈S}) = K^S, and so W would equal V .) Let S′ = S \ {i_0}. Let V′ = K^{S′}. Define
π_{S′} : W → V′ by π_{S′} : v ↦ v↾S′. To show that π_{S′} is injective, let v^1, v^2 ∈ W satisfy π_{S′}(v^1) = π_{S′}(v^2). Then
v^1_i = v^2_i for all i ∈ S′. Suppose that v^1_{i_0} ≠ v^2_{i_0}. Let λ = (v^1_{i_0} − v^2_{i_0})^{−1}. Then λ(v^1 − v^2) = λ(v^1_{i_0} − v^2_{i_0})δ^{i_0} = δ^{i_0} ∈ W
because W is a linear subspace of V . This is a contradiction. So v^1_{i_0} = v^2_{i_0}. Therefore π_{S′} is injective. But
W′ = Range(π_{S′}) is a linear subspace of V′ by part (i). So by the inductive assumption, there is a bijection
π_{S″} : W′ → K^{S″} for some subset S″ of S′ of the form π_{S″} : v ↦ v↾S″. Then π_{S″} ∘ π_{S′} : W → K^{S″} is a
bijection of the required form. Therefore by induction, the assertion is valid for all n ∈ ℤ⁺₀.
22.5.11 Theorem: Let W be a linear subspace of the linear space K^n over a field K for some n ∈ ℤ⁺₀.
(i) The map v ↦ (v_i)_{i∈I} from W to K^I is a bijection for some subset I of ℕ_n.
(ii) W = span({π_I^{−1}(δ^i); i ∈ I}) for any subset I of ℕ_n for which the map π_I : v ↦ (v_i)_{i∈I} from W to K^I
is a bijection, where δ^i = (δ_{ij})_{j∈I} ∈ K^I for i ∈ I.
(iii) dim(W) ≤ n.
(iv) K^n = span(S_I ∪ T_I), where S_I = {π_I^{−1}(δ^i); i ∈ I} and T_I = {δ^i; i ∈ ℕ_n \ I} with δ^i = (δ_{ij})_{j∈ℕ_n} ∈ K^n,
for any subset I of ℕ_n for which the map π_I : v ↦ (v_i)_{i∈I} from W to K^I is a bijection, where
δ^i = (δ_{ij})_{j∈I} ∈ K^I for i ∈ I.
Proof: Part (i) follows from Theorem 22.5.10 (ii) by letting S = ℕ_n and then letting I = S′.
For part (ii), let π_I : v ↦ (v_i)_{i∈I} be a bijection from W to K^I. Let S_I = {π_I^{−1}(δ^i); i ∈ I}. Then
S_I ⊆ W . So span(S_I) ⊆ W by Theorem 22.4.7 (x). Let w ∈ W . Let (λ_i)_{i∈I} = π_I(w) ∈ K^I. Then
w = π_I^{−1}(π_I(w)) = π_I^{−1}((λ_i)_{i∈I}) = Σ_{i∈I} λ_i π_I^{−1}(δ^i) ∈ span(S_I). Therefore W ⊆ span(S_I). So W = span(S_I).
Part (iii) follows from parts (i) and (ii), because then W = span(S_I) with #(S_I) ≤ n, and so dim(W) ≤ n
by Definition 22.5.2.
For part (iv), let v ∈ K^n, so that v = Σ_{i=1}^n λ_i δ^i for some (λ_i)_{i=1}^n ∈ K^n, where δ^i = (δ_{ij})_{j∈ℕ_n} ∈ K^n for
i ∈ ℕ_n. Let I′ = ℕ_n \ I, and let a_{ij} ∈ K for i ∈ I and j ∈ I′ be defined by δ^i = π_I^{−1}(δ^i) + Σ_{j∈I′} a_{ij} δ^j. Then
Σ_{i=1}^n λ_i δ^i = Σ_{i∈I} λ_i δ^i + Σ_{i∈I′} λ_i δ^i
  = Σ_{i∈I} λ_i (π_I^{−1}(δ^i) + Σ_{j∈I′} a_{ij} δ^j) + Σ_{i∈I′} λ_i δ^i
  = Σ_{i∈I} λ_i π_I^{−1}(δ^i) + Σ_{i∈I′} λ_i δ^i + Σ_{j∈I′} Σ_{i∈I} λ_i a_{ij} δ^j
  = Σ_{i∈I} λ_i π_I^{−1}(δ^i) + Σ_{i∈I′} λ_i δ^i + Σ_{i∈I′} Σ_{j∈I} λ_j a_{ji} δ^i
  = Σ_{i∈I} λ_i π_I^{−1}(δ^i) + Σ_{i∈I′} λ′_i δ^i,
where λ′_i = λ_i + Σ_{j∈I} λ_j a_{ji} for all i ∈ I′. Thus v ∈ span(S_I ∪ T_I). Hence K^n = span(S_I ∪ T_I).
22.5.12 Remark: Spanning K^n with n vectors so that a subset spans a given subspace.
The key point in the proof of Theorem 22.5.11 (iv) is the recognition that the structure of the projection map
π_I guarantees that π_I^{−1}(δ^i) has the same j-component as δ^i for j ∈ I, which implies that δ^i can be expressed
as a linear combination of π_I^{−1}(δ^i) and only the vectors δ^j for j ∉ I. Therefore the total number of spanning
vectors in S_I ∪ T_I is equal to n. In other words, the #(I) vectors δ^i for i ∈ I may be replaced with the #(I)
vectors π_I^{−1}(δ^i) ∈ W without needing to use any of the vectors δ^i for i ∈ I in the linear combination. In the
language of bases, this means that one basis has been replaced by another. The difficulties in the proof arise
mostly because bases, coordinate maps and linear maps have not been introduced at this point.
The purpose of Theorem 22.5.11 (iv) is to show that the linear space K^n can be spanned by n vectors S_I ∪ T_I
such that a subset S_I spans a given subspace. This is a kind of converse to the basis extension problem,
where a minimal spanning set for a subspace is given and this set must be extended to a minimal spanning
set for the whole space.
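For a concrete instance of Theorem 22.5.11 (i), a suitable index set I may be computed by Gaussian elimination: the pivot columns of any matrix whose rows span W give coordinates on which the projection restricted to W is bijective. The following sketch (an illustration only, over K = ℚ via exact rationals; this is not the book's construction) computes such a set of pivot columns.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Row-reduce a list of vectors spanning W in K^n (K = Q) and return the
    pivot column indices I; the map v -> (v_i)_{i in I} restricted to the row
    space W is then a bijection onto K^I."""
    mat = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    n = len(mat[0]) if mat else 0
    for c in range(n):
        # find a row at or below position r with a non-zero entry in column c
        pr = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if pr is None:
            continue                      # column c carries no new pivot
        mat[r], mat[pr] = mat[pr], mat[r]
        mat[r] = [x / mat[r][c] for x in mat[r]]
        for i in range(len(mat)):
            if i != r and mat[i][c] != 0:
                mat[i] = [a - mat[i][c] * b for a, b in zip(mat[i], mat[r])]
        pivots.append(c)
        r += 1
    return pivots

# W = span{(1,2,0), (0,0,1)} in Q^3: the projection onto coordinates {0, 2}
# is a bijection from W onto Q^2, while coordinate 1 is dependent.
print(pivot_columns([(1, 2, 0), (0, 0, 1)]))  # [0, 2]
```

The dependent (non-pivot) coordinates play the role of the coefficients a_{ij} in the proof of part (iv): each non-pivot component of a vector of W is a fixed linear combination of its pivot components.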
Figure 22.5.1 Coordinate space for a linear subspace of a finite-dimensional linear space
22.5.14 Theorem: Let U be a linear subspace of a finite-dimensional linear space V .
(i) U is finite-dimensional.
(ii) dim(U) ≤ dim(V).
Proof: Let U be a linear subspace of a finite-dimensional linear space V over a field K. Then dim(V) = n
for some n ∈ ℤ⁺₀. So there is a subset S of V such that #(S) = n and span(S) = V . Then S = {v_i; i ∈ ℕ_n}
for some (v_i)_{i=1}^n ∈ V^n, and V = {Σ_{i=1}^n λ_i v_i; (λ_i)_{i=1}^n ∈ K^n}.
Let W = {(λ_i)_{i=1}^n ∈ K^n; Σ_{i=1}^n λ_i v_i ∈ U}. Then W is a linear subspace of K^n because U is a linear subspace
of V . So dim(W) ≤ n by Theorem 22.5.11 (iii). Therefore there is a sequence w = (w_j)_{j=1}^m ∈ (K^n)^m for
some m ≤ n such that W = span(w) = span(Range(w)). In other words, W = {Σ_{j=1}^m μ_j w_j; (μ_j)_{j=1}^m ∈ K^m}.
Let u_j = Σ_{k=1}^n w_{j,k} v_k for j ∈ ℕ_m. Then u_j ∈ U because w_j ∈ W .
Let u ∈ U . Then u = Σ_{i=1}^n λ_i v_i for some (λ_i)_{i=1}^n ∈ W by the definition of W . But all elements of W can be
expressed as Σ_{j=1}^m μ_j w_j for some (μ_j)_{j=1}^m ∈ K^m because W = span(w). So (λ_i)_{i=1}^n = Σ_{j=1}^m μ_j w_j for some
(μ_j)_{j=1}^m ∈ K^m. Therefore u = Σ_{i=1}^n λ_i v_i = Σ_{i=1}^n Σ_{j=1}^m μ_j w_{j,i} v_i = Σ_{j=1}^m μ_j Σ_{i=1}^n w_{j,i} v_i = Σ_{j=1}^m μ_j u_j. So
U = {Σ_{j=1}^m μ_j u_j; (μ_j)_{j=1}^m ∈ K^m}. Thus U = span(u), where #(Range(u)) ≤ m. This verifies part (i). But
since U has a spanning set of cardinality less than or equal to the cardinality of any spanning set for V , it
follows that dim(U) ≤ dim(V), which verifies part (ii).
22.5.15 Remark: Finite-dimensional linear spaces have no proper subspaces with the same dimension.
The assertions of Theorem 22.5.16 are perhaps intuitively obvious, but as always, intuition is no substitute
for proof. In the case of infinite-dimensional linear spaces, the assertion is false. Therefore the special
properties of finite-dimensional linear spaces must be invoked in the proof.
Theorem 22.5.16 would be easier to prove with the assistance of the concept of a basis and the fact that the
cardinality of every basis for a finite-dimensional linear space equals the dimension of the space, but this is
not proved until Theorem 22.7.13.
Proof: For part (i), let V be a finite-dimensional linear space over a field K. Then dim(V) = n for some
n ∈ ℤ⁺₀, and there is a sequence (e_j)_{j=1}^n ∈ V^n such that ∀v ∈ V, ∃(λ_j)_{j=1}^n ∈ K^n, v = Σ_{j=1}^n λ_j e_j.
Let U be a linear subspace of V . Then U is finite-dimensional by Theorem 22.5.14 (i). So dim(U) = m for
some m ∈ ℤ⁺₀, and there is a sequence (u_i)_{i=1}^m ∈ U^m such that ∀v ∈ U, ∃(λ_i)_{i=1}^m ∈ K^m, v = Σ_{i=1}^m λ_i u_i.
Since U ⊆ V , for each i ∈ ℕ_m there is a sequence (κ_{i,j})_{j=1}^n ∈ K^n such that u_i = Σ_{j=1}^n κ_{i,j} e_j.
Suppose that U ≠ V . Then there is a vector w ∈ V \ U , and there is a sequence (μ_j)_{j=1}^n ∈ K^n such that
w = Σ_{j=1}^n μ_j e_j. If μ_j = 0_K for all j ∈ ℕ_n, then w = 0_V = 0_U ∈ U , which contradicts the assumption.
So μ_k ≠ 0_K for some k ∈ ℕ_n. For some choice of k with μ_k ≠ 0_K, define the sequence (u′_i)_{i=1}^m ∈ V^m by
u′_i = u_i − μ_k^{−1} κ_{i,k} w for all i ∈ ℕ_m. Let U′ = span({u′_i; i ∈ ℕ_m}). Then U′ is a linear subspace of V . For
i ∈ ℕ_m, u′_i = Σ_{j=1}^n κ_{i,j} e_j − μ_k^{−1} κ_{i,k} Σ_{j=1}^n μ_j e_j = Σ_{j=1}^n (κ_{i,j} − μ_k^{−1} κ_{i,k} μ_j) e_j, but κ_{i,j} − μ_k^{−1} κ_{i,k} μ_j = 0_K
for j = k. So u′_i ∈ span({e_j; j ∈ ℕ_n \ {k}}) for all i ∈ ℕ_m. Let U_k = span({e_j; j ∈ ℕ_n \ {k}}). So
U′ is a linear subspace of U_k. But dim(U_k) ≤ n − 1 by Definition 22.5.2. Therefore dim(U′) ≤ n − 1 by
Theorem 22.5.14 (ii).
To show that U ⊆ U′, let v ∈ U . Then v = Σ_{i=1}^m λ_i u_i for some (λ_i)_{i=1}^m ∈ K^m. But u_i = u′_i + μ_k^{−1} κ_{i,k} w
for all i ∈ ℕ_m. So v = Σ_{i=1}^m λ_i (u′_i + μ_k^{−1} κ_{i,k} w) = Σ_{i=1}^m λ_i u′_i + μ_k^{−1} Σ_{i=1}^m λ_i κ_{i,k} w. If Σ_{i=1}^m λ_i κ_{i,k} ≠ 0_K,
then w = μ_k (Σ_{i=1}^m λ_i κ_{i,k})^{−1} (v − Σ_{i=1}^m λ_i u′_i) ∈ U , which contradicts the assumption w ∈ V \ U . Therefore
v = Σ_{i=1}^m λ_i u′_i ∈ U′. So U ⊆ U′, and so dim(U) ≤ dim(U′) ≤ n − 1 < n = dim(V), which verifies
part (i).
Part (ii) follows immediately as the contrapositive of part (i).
Parts (iii) and (iv) follow from parts (i) and (ii) because U = V implies dim(U) = dim(V).
22.5.17 Definition: The codimension of a linear subspace W of a linear space V is the least beta-cardinality of sets S ⊆ V such that span(W ∪ S) = V.
22.5.18 Notation: codim(W, V ) denotes the codimension of a linear subspace W of a linear space V .
The subject of topological linear spaces is where this topic is addressed. Banach spaces and Hilbert spaces
are special kinds of topological linear spaces. Without a topology, it is difficult to give meaning to infinite
series of vectors. Therefore Chapter 22 is limited to linear combinations of finite sets of vectors.
22.6.1 Remark: Motivation for the concept of linear independence of sets of vectors.
As mentioned in Remark 22.5.4, the trivial linear space V = {0V } containing only the zero vector has
dimension 0. For any linear space V over a field K, for any v ∈ V \ {0_V}, the subspace W = {λv; λ ∈ K} has dim(W) = 1 because it can be spanned by a single vector v ∈ W, but it cannot be spanned by fewer than one vector.
For any linear space V over a field K, for any v_1, v_2 ∈ V \ {0_V}, the subspace W = {λ_1 v_1 + λ_2 v_2; λ_1, λ_2 ∈ K} has dim(W) = 1 or dim(W) = 2 because it can be spanned by the two vectors v_1, v_2 ∈ W, but it can be spanned by only one vector if either vector v_1 or v_2 is a linear combination of the other. That is, the set {v_1, v_2} is adequate to span all of W, but one of these two vectors may be eliminated from the spanning set if it is redundant, i.e. if it is spanned by the other vector.
For three non-zero vectors v_1, v_2, v_3 ∈ V \ {0_V}, the subspace W = {λ_1 v_1 + λ_2 v_2 + λ_3 v_3; λ_1, λ_2, λ_3 ∈ K} has dim(W) = 1, dim(W) = 2 or dim(W) = 3. This is because it may be possible to eliminate one vector which is spanned by the other two. And then one of the two remaining vectors may be spanned by the other.
These examples motivate the definition of linear independence of vectors. A set of vectors in a linear space is said to be a linearly independent set of vectors if there is no redundancy, i.e. if none of the vectors is spanned by the others. When a set of vectors is linearly independent, the dimension of the spanned set is equal to the number of vectors in the set. This is the pay-off for linearly independent vectors. Since none of the vectors in a linearly independent set S is a linear combination of the others, the spanned set span(S) cannot be spanned by fewer vectors. Therefore dim(span(S)) = #(S). Definition 22.6.3 may be interpreted by saying: "If you remove v, you don't get it back again." In other words, if you remove any vector v from the spanning set S, the vector v cannot be reconstructed from the remaining vectors in S \ {v}.
There is one little fly in the ointment here. Although there may be no more redundancy that can be removed
from a spanning set S for a linear subspace W of a linear space V , it is not immediately obvious that there
is no other set with fewer vectors which can span the same subspace W . For example, all integers greater
than 1 may be expressed as sums of non-negative integer multiples of the integers 2 and 3. So in this sense, the set {2, 3} spans the integers greater than 1. But the smaller set {1} also spans the integers greater than 1 in this way. So the spanning set {2, 3} is minimal but not minimum. Nevertheless, removing redundancy from a spanning
set is an adequate motivation for Definition 22.6.3, and it will be shown that a spanning set of minimal
beta-cardinality is also a spanning set of minimum beta-cardinality.
22.6.2 Remark: All vectors in a linearly independent set of vectors contribute to the span.
Figure 22.6.1 illustrates spans and independence of vectors.
[Figure omitted: two diagrams of the vectors v1, v2, v3, showing the pairwise spans span({v1, v2}), span({v1, v3}) and span({v2, v3}): an abstract view on the left and a projective space view on the right.]
Figure 22.6.1  Spans and independence of vectors
In the abstract view on the left of Figure 22.6.1, one knows merely that the span of a subset S of a linear space V is a subset span(S) of V such that S ⊆ span(S). Therefore for any two vectors v_1 and v_2, the subset span({v_1, v_2}) must satisfy v_1, v_2 ∈ span({v_1, v_2}). Consequently, the intersection span({v_1, v_2}) ∩ span({v_1, v_3}) must contain at least the vector v_1. Definition 22.6.3 says that the linear independence of the set S = {v_1, v_2, v_3} means that v ∉ span(S \ {v}) for v = v_1, v_2, v_3. In the diagram, it is clear that no two points in S = {v_1, v_2, v_3} span all three points in S. However, it is not a-priori obvious in this abstract view that there are no two points in the whole space V which could span all of S. This is a specific property of linear spaces.
The projective space view on the right in Figure 22.6.1 gives a useful way of mapping out the concepts of
spanning sets and linear independence.
22.6.3 Definition: A linearly independent set of vectors in a linear space V is a subset S of V such that

∀v ∈ S, v ∉ span(S \ {v}).    (22.6.1)
22.6.5 Definition: A linearly independent family of vectors in a linear space V is a family (v_i)_{i∈I} of distinct vectors in V such that {v_i; i ∈ I} is a linearly independent set of vectors in V.
22.6.6 Theorem: Let S be a linearly independent set of vectors in a linear space V. Then:
(i) 0_V ∉ S.
(ii) Every subset of S is a linearly independent set of vectors in V.

Proof: For part (i), let S be a linearly independent set of vectors in V. Then 0_V ∈ span(S \ {0_V}) by Theorem 22.4.7 (vii). If 0_V ∈ S, this contradicts Definition 22.6.3 line (22.6.1). Hence 0_V ∉ S.
For part (ii), let S′ be a subset of a linearly independent subset S of V. Then ∀v ∈ S, v ∉ span(S \ {v}) by Definition 22.6.3. So ∀v ∈ S′, v ∉ span(S \ {v}). But span(S′ \ {v}) ⊆ span(S \ {v}) by Theorem 22.4.7 (vi). So ∀v ∈ S′, v ∉ span(S′ \ {v}). Hence S′ is a linearly independent subset of V by Definition 22.6.3.
22.6.8 Theorem: Let S be a subset of a linear space V over a field K. Then each of the following conditions is equivalent to the linear independence of S.
(i) ∀v ∈ S, {v} ∩ span(S \ {v}) = ∅.
(ii) ∀λ ∈ Fin(S, K), (Σ_{v∈S} λ_v v = 0 ⇒ ∀v ∈ S, λ_v = 0).
(iii) ∀w ∈ span(S), ∃′λ ∈ Fin(S, K), w = Σ_{v∈S} λ_v v.
(iv) ∀w ∈ S, ∀λ ∈ Fin(S, K), (w = Σ_{v∈S} λ_v v ⇒ ∀v ∈ S, λ_v = δ_{v,w}).

Proof: Let S be a subset of a linear space V. Then the equivalence of condition (i) and Definition 22.6.3 line (22.6.1) follows from Notation 7.5.7 for {v}.

Assume that (i) is true. Then ∀v ∈ S, {v} ∩ span(S \ {v}) = ∅. To show (ii), let λ = (λ_v)_{v∈S} ∈ Fin(S, K) with Σ_{v∈S} λ_v v = 0. Suppose that λ_v ≠ 0_K for some v ∈ S. Then λ_v v = −Σ_{w∈S\{v}} λ_w w. So v = −Σ_{w∈S\{v}} (λ_w/λ_v)w. So {v} ∩ span(S \ {v}) = {v} ≠ ∅, which contradicts assumption (i). Therefore ∀v ∈ S, λ_v = 0. Thus (ii) follows from (i).

Assume that (i) is false. Then ∃v ∈ S, {v} ∩ span(S \ {v}) ≠ ∅. So v = Σ_{w∈S} λ_w w for some λ ∈ Fin(S, K) with λ_v = 0. Then Σ_{w∈S} μ_w w = 0, where (μ_w)_{w∈S} ∈ K^S is defined by μ_v = −1_K and μ_w = λ_w for w ∈ S \ {v}. But it is not true that ∀w ∈ S, μ_w = 0 (because μ_v = −1_K). So (ii) is false. Hence (i) is true if and only if (ii) is true.

Condition (iii) is true without pre-conditions if the unique existence quantifier ∃′ is replaced with the ordinary existence quantifier ∃. This is because span(S) is defined by

w ∈ span(S) ⇔ ∃λ ∈ Fin(S, K), w = Σ_{v∈S} λ_v v.
22.6.9 Remark: Alternative proof of the unique linear combination condition for linear independence.
Theorem 22.6.8 part (iii) may also be proved from part (iv) as follows.
Assume condition (iv) in Theorem 22.6.8. Let w ∈ span(S) and λ, μ ∈ Fin(S, K) be such that w = Σ_{v∈S} λ_v v and w = Σ_{v∈S} μ_v v and λ ≠ μ. Then λ_u ≠ μ_u for some u ∈ S. Then (λ_u − μ_u)u = −Σ_{v∈S\{u}} (λ_v − μ_v)v. So u = Σ_{v∈S} ν_v v, where ν_v = −(λ_v − μ_v)/(λ_u − μ_u) for all v ∈ S \ {u} and ν_u = 0_K. Then by (iv), ∀v ∈ S, ν_v = δ_{v,u}. So ν_u = 1_K. This is a contradiction. So λ = μ. That is, w is a unique linear combination of the elements of S. So condition (iii) is true as claimed.
22.6.10 Remark: Alternative lower-level notation for conditions for linear independence.
If the conditions in Theorem 22.6.8 are written out explicitly, without the benefit of the non-standard Notation 22.2.25 for Fin(S, K), they have the following appearance.

(i′) ∀v ∈ S, {v} ∩ span(S \ {v}) = ∅.
(ii′) ∀λ ∈ K^S, ((#(λ⁻¹(K \ {0_K})) < ∞ and Σ_{v∈S} λ_v v = 0) ⇒ ∀v ∈ S, λ_v = 0).
(iii′) ∀w ∈ span(S), ∃′λ ∈ K^S, (#(λ⁻¹(K \ {0_K})) < ∞ and w = Σ_{v∈S} λ_v v).
(iv′) ∀w ∈ S, ∀λ ∈ K^S, ((#(λ⁻¹(K \ {0_K})) < ∞ and w = Σ_{v∈S} λ_v v) ⇒ ∀v ∈ S, λ_v = δ_{v,w}).
22.6.11 Remark: A popular linear independence condition is less intuitive than the spanning-set definition.
Condition (ii) in Theorem 22.6.8 is often given as the definition of linear independence of vectors. It has an attractive symmetry and simplicity, but it is not close to one's intuition of the linear independence of vectors. Condition (22.6.1), on the other hand, clearly signifies that each vector is independent of the others.
Definition 22.6.3 has been chosen to correspond closely to the meaning of the word "independent". To see
this, note that the logical negative of line (22.6.1) in Definition 22.6.3 is line (22.6.3) as follows.

∃v ∈ S, v ∈ span(S \ {v}).    (22.6.3)

This expresses quite clearly the idea that v depends on the other vectors of S. The logical negative of Theorem 22.6.8 condition (ii) is line (22.6.4) as follows.

∃λ ∈ Fin(S, K), Σ_{v∈S} λ_v v = 0 and ∃v ∈ S, λ_v ≠ 0.    (22.6.4)
This means that some linear combination of the vectors in S equals the zero vector although not all of its coefficients are zero. It is not instantly intuitively clear that this encapsulates the idea of dependence.
The formulation of the idea of independence in Definition 22.6.3 corresponds well to the independence concept
in some other subjects. For example, in mathematical logic one seeks independent sets of axioms, which
are axiom sets for which no axiom may be proved from the others. One might say that no axiom is in the
span of the other axioms. Similarly, in set theory one seeks axioms which are independent in this same
sense. Thus there has been a considerable literature concerning the independence of the axiom of choice,
the continuum hypothesis, and various other axioms, from the Zermelo-Fraenkel axioms. One may think of
the set of all ZF theorems as the span of the ZF axiom system. Independence of the axiom of choice then
means that it is not in the span of the ZF axioms.
22.6.12 Remark: Linear charts for linear spaces using coordinates with respect to a basis.
Condition (iii) in Theorem 22.6.8 may be interpreted to mean that every vector w in span(S) has a unique representation as a linear combination of vectors of S. So there is a bijection between the vectors in span(S) and the elements of Fin(S, K). But Fin(S, K) is the set of vectors in the linear space FL(S, K). The map w ↦ λ, where λ ∈ Fin(S, K) satisfies w = Σ_{v∈S} λ_v v, is a linear map. (See Section 23.1 for linear maps.)
Therefore linearly independent sets of vectors can be applied to map vectors in linear spaces to scalar tuples
in free linear spaces. This provides a very general mechanism for coordinatising linear spaces. The map from
vectors to tuples may be thought of as a chart. A set of linearly independent vectors which is used in this
way is called a basis.
22.6.14 Theorem: Let V be a linear space over a field K. Let S be a linearly independent subset of V. Let u_0 ∈ S. Let S′ = S \ {u_0}. Let (γ_u)_{u∈S′} ∈ K^{S′} be a family of elements of K. Define f : S′ → V by

∀u ∈ S′, f(u) = u + γ_u u_0.    (22.6.5)

Then:
(i) f is injective.
(ii) u_0 ∉ span(f(S′)). Hence u_0 ∉ f(S′).
(iii) f(S′) ∪ {u_0} is a linearly independent subset of V.
(iv) span(f(S′) ∪ {u_0}) = span(S).

Proof: For part (i), let u_1, u_2 ∈ S′ with f(u_1) = f(u_2) = v. Then u_1 + γ_{u_1} u_0 = u_2 + γ_{u_2} u_0. Therefore u_1 = u_2 + (γ_{u_2} − γ_{u_1})u_0 ∈ span(S \ {u_1}) if u_1 ≠ u_2, because u_2 and u_0 are elements of S \ {u_1}. This would contradict the linear independence of S. So u_1 = u_2. Hence f is injective.
For part (ii), suppose that u_0 ∈ span(f(S′)). Then u_0 = Σ_{u∈S′} λ_u f(u) for some λ ∈ Fin(S′, K). Therefore u_0 = Σ_{u∈S′} λ_u (u + γ_u u_0) = Σ_{u∈S′} λ_u u + Σ_{u∈S′} λ_u γ_u u_0. So Σ_{u∈S′} λ_u u + (Σ_{u∈S′} λ_u γ_u − 1)u_0 = 0. So by Theorem 22.6.8 (ii) and the linear independence of S, λ_u = 0 for all u ∈ S′ and 1 = Σ_{u∈S′} λ_u γ_u = 0, which is impossible. Therefore u_0 ∉ span(f(S′)). Hence u_0 ∉ f(S′).

For part (iii), suppose that f(S′) ∪ {u_0} is not a linearly independent set of vectors. Let S̄ = f(S′) ∪ {u_0}. Then by Theorem 22.6.8 (ii), there exists λ ∈ Fin(S̄, K) with Σ_{v∈S̄} λ_v v = 0 and λ_v ≠ 0 for some v ∈ S̄. If λ_{u_0} ≠ 0, then u_0 ∈ span(f(S′)), which would contradict part (ii). Therefore Σ_{v∈f(S′)} λ_v v = 0 and λ_v ≠ 0 for some v ∈ f(S′). In other words, Σ_{u∈S′} λ_{f(u)} f(u) = 0 and λ_{f(u)} ≠ 0 for some u ∈ S′. Thus 0 = Σ_{u∈S′} λ_{f(u)} (u + γ_u u_0) = Σ_{u∈S′} λ_{f(u)} u + Σ_{u∈S′} λ_{f(u)} γ_u u_0. If Σ_{u∈S′} λ_{f(u)} γ_u ≠ 0, then u_0 ∈ span(S′), which would contradict the assumed linear independence of S. Therefore Σ_{u∈S′} λ_{f(u)} u = 0, which implies by Theorem 22.6.8 (ii) that S′ is not linearly independent, which would contradict the linear independence of S by Theorem 22.6.6 (ii). Hence f(S′) ∪ {u_0} is a linearly independent subset of V.
For part (iv), f(S′) ⊆ span(S) and u_0 ∈ S. So f(S′) ∪ {u_0} ⊆ span(S). So span(f(S′) ∪ {u_0}) ⊆ span(S). Now let u ∈ S. If u = u_0, then u ∈ f(S′) ∪ {u_0}. Otherwise u ∈ S′ and u = f(u) − γ_u u_0 ∈ span(f(S′) ∪ {u_0}). So span(S) ⊆ span(f(S′) ∪ {u_0}). Hence span(f(S′) ∪ {u_0}) = span(S).
22.7.2 Definition: A basis (set) for a linear space V is a linearly independent subset S of V such
that span(S) = V .
22.7.4 Theorem: Every linearly independent set of vectors is a basis for its span.
Proof: Let S be a linearly independent set of vectors in a linear space V . Then span(S) is a linear subspace
of V by Theorem 22.4.7 (i), and S span(S) by Theorem 22.4.7 (iii). Hence S is a basis for span(S) by
Definition 22.7.2.
Definition 22.7.6 requires a basis family to consist of distinct vectors. Definition 22.7.7 defines basis families for which the index set is ordered.
22.7.6 Definition: A basis (family) for a linear space V is an injective family v = (v_α)_{α∈A} ∈ V^A such that Range(v) is a basis set for V.

22.7.7 Definition: An ordered basis (family) for a linear space V is a family v = (v_α)_{α∈A} ∈ V^A such that A is an ordered set, Range(v) is a basis set for V and v : A → V is injective.
A totally ordered basis (family) for a linear space V is a family v = (v_α)_{α∈A} ∈ V^A such that A is a totally ordered set, Range(v) is a basis set for V and v : A → V is injective.
A well-ordered basis (family) for a linear space V is a family v = (v_α)_{α∈A} ∈ V^A such that A is a well-ordered set, Range(v) is a basis set for V and v : A → V is injective.
for some n ∈ ℤ₀⁺, where v_i ≠ v_j for i ≠ j, and Range(v) is a basis for V. The injectivity of the family v is essential. In the case of a finite family, the word "sequence" may be used instead of "family".
22.7.9 Definition: The standard basis of a Cartesian linear space ℝⁿ for n ∈ ℤ₀⁺ is the sequence (e_i)_{i=1}^n defined by e_i = (δ_{ij})_{j=1}^n ∈ ℝⁿ for i ∈ ℕ_n.
The i-th unit vector in a Cartesian space ℝⁿ, for n ∈ ℤ₀⁺ and i ∈ ℕ_n, is the n-tuple e_i ∈ ℝⁿ defined by e_i = (δ_{ij})_{j=1}^n.
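As a concrete illustration of the Kronecker-delta formula e_i = (δ_{ij})_{j=1}^n (the helper name below is hypothetical, not the book's notation):

```python
def standard_basis(n):
    """Return the standard basis (e_1, ..., e_n) of R^n: e_i has a 1 in
    position i (1-based in the text, 0-based here) and 0 elsewhere."""
    return [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]

print(standard_basis(3))  # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
```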
22.7.10 Theorem: If a linear space V is finite-dimensional, then V has a finite basis. In other words, if a
linear space has a finite spanning set, then it has a linearly independent spanning set.
Proof: Let V be a finite-dimensional linear space over a field K. Then V is spanned by a finite set S ⊆ V \ {0_V} by Definition 22.5.7. (The zero vector can be omitted from S because it makes no difference to the span of S.) Let n = #(S). Then n ∈ ℤ₀⁺. If n ≤ 1, then S is a linearly independent subset of V. Therefore S is a finite basis for V. So suppose that n ≥ 2. If S is not a linearly independent subset of V, then by Definition 22.6.3, there is a vector v ∈ S such that v ∈ span(S \ {v}). It follows from Theorem 22.4.7 (viii) that span(S) = span(S \ {v}). So S may be replaced with S \ {v}. But #(S \ {v}) = n − 1. Thus if the assertion is true for #(S) = n − 1, then it is true for #(S) = n. Consequently by a standard induction argument, the assertion is true for any finite spanning set S.
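The induction in this proof is effectively an algorithm: repeatedly discard any vector that lies in the span of the others. A sketch for vectors with rational components (the function names are illustrative; the span test is an exact rank computation, which is equivalent to the redundancy test in the proof but not the book's formal apparatus):

```python
from fractions import Fraction

def _rank(rows):
    """Rank of a rational matrix by exact Gauss-Jordan elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for c in range(len(M[0]) if M else 0):
        p = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[rk], M[p] = M[p], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][c] != 0:
                f = M[i][c] / M[rk][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def independent_spanning_subset(S):
    """Keep a vector only if it enlarges the span of the vectors kept so
    far; the result is a linearly independent set with the same span."""
    basis = []
    for v in S:
        if _rank(basis + [list(v)]) > _rank(basis):
            basis.append(list(v))
    return basis

S = [(1, 0), (2, 0), (0, 1), (1, 1)]   # spans R^2, with redundancy
print(independent_spanning_subset(S))  # [[1, 0], [0, 1]]
```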
22.7.11 Theorem: If a linear space V is finite-dimensional, then V has a basis whose cardinality is equal
to the dimension of V . In other words, #(S) = dim(V ) for some basis S for V .
Proof: Let V be a linear space with finite dimension n. Then n ∈ ℤ₀⁺ is the least cardinality of a spanning set for V. Let X be the set of all spanning sets for V which have cardinality not exceeding n. Then X
is non-empty because otherwise the least cardinality of a spanning set would exceed n. But there are no
spanning sets with cardinality less than n. So X is a non-empty set of spanning sets for V with cardinality
equal to n. (Note that 0_V cannot be an element of S for any S ∈ X because removing this element would give a spanning set with cardinality less than n.)
Let S ∈ X. If n = 0, then S is a linearly independent subset of V. So V has a basis S with #(S) = dim(V) = 0. If n = 1, then S = {v}, for v ≠ 0_V, is a linearly independent subset of V. Therefore S is a basis for V with #(S) = dim(V) = 1.
So suppose that n ≥ 2. If S is not a linearly independent subset of V, then by Definition 22.6.3, there is a vector v ∈ S such that v ∈ span(S \ {v}). It follows from Theorem 22.4.7 (viii) that span(S) = span(S \ {v}). So span(S \ {v}) = V. But #(S \ {v}) = n − 1, which contradicts the dimensionality of V. Therefore S is a linearly independent spanning set for V with cardinality n.
22.7.12 Theorem: Let V be a linear space. Let S_1 and S_2 be linearly independent subsets of V such that S_2 is finite and S_1 ⊆ span(S_2). Then #(S_1) ≤ #(S_2).
Proof: Let S_1 and S_2 be linearly independent subsets of a linear space V over a field K. Then 0_V ∉ S_1 and 0_V ∉ S_2 by Theorem 22.6.6 (i). Assume that S_1 ⊆ span(S_2). If #(S_2) = 0, then span(S_2) = {0_V}, and so S_1 = ∅. Therefore #(S_1) = 0 ≤ #(S_2) = 0.

If #(S_2) = 1, then S_2 = {v_1} for some v_1 ∈ V \ {0_V}. So span(S_2) = {λ_1 v_1; λ_1 ∈ K}. Suppose that u_1, u_2 ∈ S_1. Then u_1, u_2 ∈ span(S_2). So u_1 = λ_1 v_1 and u_2 = λ_2 v_1 for some λ_1, λ_2 ∈ K \ {0_K}. But if u_1 ≠ u_2, the linear independence of S_1 implies that u_1 ∉ span(S_1 \ {u_1}) ⊇ span({u_2}) = {μu_2; μ ∈ K} = {μλ_2 v_1; μ ∈ K} = {μv_1; μ ∈ K} since λ_2 ≠ 0_K. Clearly this is false because u_1 = λ_1 v_1 ∈ {μv_1; μ ∈ K}. This contradiction implies that u_1 = u_2. Therefore #(S_1) ≤ 1 = #(S_2).
To prove the assertion for a general linearly independent finite set S_2, let n ∈ ℤ⁺ and assume that the assertion is true for all linearly independent subsets S_2 of V such that #(S_2) ≤ n − 1. Let S_2 be a linearly independent subset of V such that #(S_2) = n. Then S_2 = {v_j; j ∈ ℕ_n} for some family (v_j)_{j=1}^n of distinct non-zero elements of V. Assume that S_1 is a linearly independent subset of V such that S_1 ⊆ span(S_2). Then S_1 = {u_i; i ∈ I} for some (not necessarily finite) family (u_i)_{i∈I}, where for all i ∈ I, u_i = Σ_{j=1}^n κ_{ij} v_j for some family (κ_{ij})_{j=1}^n of elements of K. In the special case that κ_{in} = 0 for all i ∈ I, all elements of S_1 would be contained in span(S_2 \ {v_n}). But #(S_2 \ {v_n}) = n − 1. So in this case, the assertion #(S_1) ≤ #(S_2) follows from the inductive hypothesis for S_2 satisfying #(S_2) ≤ n − 1. Therefore suppose that κ_{kn} ≠ 0_K for some k ∈ I. Let I′ = I \ {k}, and define the family (w_i)_{i∈I′} by

∀i ∈ I′,  w_i = u_i − κ_{in} κ_{kn}⁻¹ u_k
              = Σ_{j=1}^n κ_{ij} v_j − κ_{in} κ_{kn}⁻¹ Σ_{j=1}^n κ_{kj} v_j
              = Σ_{j=1}^{n−1} (κ_{ij} − κ_{in} κ_{kn}⁻¹ κ_{kj}) v_j

because κ_{ij} − κ_{in} κ_{kn}⁻¹ κ_{kj} = 0_K for j = n. So w_i ∈ span(S_2 \ {v_n}) for all i ∈ I′. Let S_1′ = {w_i; i ∈ I′}. Then S_1′ ⊆ span(S_2 \ {v_n}). But S_1′ is a linearly independent subset of V by Theorem 22.6.14 (iii) and Theorem 22.6.6 (ii). So #(S_1′) ≤ n − 1 by the inductive hypothesis because #(S_2 \ {v_n}) = n − 1. But #(S_1 \ {u_k}) = #(S_1′) by Theorem 22.6.14 (i), and u_k ∉ S_1′ by Theorem 22.6.14 (ii). Therefore #(S_1) = #(S_1′) + 1 ≤ n = #(S_2).
22.7.13 Theorem: Every basis of a finite-dimensional linear space has cardinality equal to the dimension
of the space. In other words, every basis S for V satisfies #(S) = dim(V ).
Proof: Let V be a linear space with finite dimension n ∈ ℤ₀⁺. By Theorem 22.7.11, there is at least one basis S for V which has #(S) = dim(V). Let S′ be any basis for V. Then S and S′ are linearly independent subsets of V such that S ⊆ span(S′) = V and S′ ⊆ span(S) = V. So #(S) ≤ #(S′) and #(S′) ≤ #(S) by Theorem 22.7.12. Therefore #(S) = #(S′) by Theorem 13.1.8. Hence every basis S for V satisfies #(S) = dim(V).
22.7.15 Theorem: Let S be a finite set. Let K be a field. Then dim(FL(S, K)) = #(S).
Proof: For any finite set S and field K, FL(S, K) = K^S by Theorem 22.2.14 (iv). Let n = #(S). Then n ∈ ℤ₀⁺, and the n elements e_i = (δ_{ij})_{j∈S} of K^S for i ∈ S constitute a spanning set for K^S because every element v ∈ K^S may be written as v = Σ_{i∈S} v_i e_i. But the vectors e_i are linearly independent by Theorem 22.6.8 (ii) because if Σ_{i∈S} λ_i e_i = 0_{K^S} for some (λ_i)_{i∈S}, then (λ_i)_{i∈S} = Σ_{i∈S} λ_i e_i = 0_{K^S}, which means that λ_i = 0_K for all i ∈ S. Therefore (e_i)_{i∈S} is a basis for K^S by Definition 22.7.2. Hence dim(K^S) = #(S) by Theorem 22.7.13.
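A finite-support function view of FL(S, K) can be sketched directly: vectors are dictionaries mapping finitely many elements of S to non-zero field values, and the delta functions e_s form the obvious basis. This is an illustrative model for K = ℚ (the helper names are hypothetical), not the book's formal construction:

```python
from fractions import Fraction

def delta(s):
    """Basis element e_s of FL(S, K): the map x |-> delta_{s,x}, stored
    sparsely as a finite-support dict (absent keys mean 0_K)."""
    return {s: Fraction(1)}

def add(f, g):
    """Pointwise sum, discarding zeros to keep the support finite."""
    h = {k: f.get(k, Fraction(0)) + g.get(k, Fraction(0))
         for k in set(f) | set(g)}
    return {k: c for k, c in h.items() if c != 0}

def scale(c, f):
    """Pointwise scalar multiple."""
    return {} if c == 0 else {k: Fraction(c) * v for k, v in f.items()}

# v = 2*e_a - e_b in FL({'a','b','c'}, Q); the stored coefficients are
# exactly the components of v with respect to the delta-function basis.
v = add(scale(2, delta('a')), scale(-1, delta('b')))
print(sorted(v.items()))  # [('a', Fraction(2, 1)), ('b', Fraction(-1, 1))]
```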
22.7.16 Remark: Every free linear space has an obvious basis, but it might not be well-ordered.
Free linear spaces on infinite sets provide examples of infinite-dimensional linear spaces. (See Section 22.2 for free linear spaces.) For any set S and field K, let V be the free linear space on S with field K. Then B = {f : S → {0_K, 1_K}; #({x ∈ S; f(x) ≠ 0_K}) = 1} is a basis for V. For example, the free linear space FL(ℝ, ℝ) has a basis consisting of all functions of the form f_a ∈ V defined by f_a(a) = 1 and ∀x ∈ ℝ \ {a}, f_a(x) = 0. However, this basis cannot be well-ordered without the assistance of the axiom of choice. Thus there is no explicit well-ordering of this basis.
22.7.18 Theorem [zf+ac]: Every linear space has a well-ordered basis.

Proof: Let V be a linear space over a field K. By the well-ordering theorem (Remark 7.11.12 part (2)), the elements of V \ {0_V} may be well-ordered. That is, there is a family (v_α)_{α∈A} such that A is a well-ordered set and v : A → V \ {0_V} is a bijection.
Define f : A → {0, 1} inductively for all β ∈ A by

f(β) = 1 if v_β ∉ span({v_α; α < β and f(α) = 1}),
f(β) = 0 otherwise.

Then by transfinite induction (Theorem 11.7.2), {v_β; f(β) = 1} is a well-ordered basis for V.
22.7.19 Remark: How to develop linear spaces without the axiom of choice.
The AC-tainted Theorem 22.7.18 is not used in this book except for occasional theorems which are presented
for comparison with the literature. All theorems derived from AC-tainted theorems such as Theorem 22.7.18
are, of course, also AC-tainted.
When a basis is required for an infinite-dimensional linear space, the existence of a basis will be specified as a pre-condition in any theorem or definition which requires it. Then the reader who accepts the axiom of choice will understand "for any linear space with a basis" as "for any linear space".
The following conditions may be placed on linear spaces in theorems and definitions.
(1) For all linear spaces . . .
(2) For all linear spaces which have a basis . . .
(3) For all linear spaces which have a well-ordered basis . . .
The reader who accepts the axiom of choice will think of all three of these pre-conditions as meaning the
same thing: For all linear spaces . . . .
It would be unreasonable to restrict linear algebra to the study of linear spaces which have a well-ordered basis. For example, the unrestricted linear space ℝ^ℝ with pointwise addition and multiplication apparently has no definable basis. This is by no means a pathological space. But no basis can be constructed for ℝ^ℝ. So all theorems and definitions for ℝ^ℝ must be valid in the absence of a basis. The axiom of choice is a faith axiom. It does not deliver a basis for ℝ^ℝ which human eyes can gaze upon. The AC pixie reveals to the joyous faithful that a basis exists in some astral plane, but non-believers are doomed to walk eternally in the valley of darkness, bitter and forever athirst for the metaphysical sets which they spurn. But seriously though, any theorem or definition which constructs something out of an unknowable set is merely creating a fiction from a fiction. Garbage in, garbage out!
22.7.20 Remark: The real numbers cannot be well ordered.
The real numbers cannot be explicitly well-ordered. This must mean that 2^ℝ also cannot be explicitly well-ordered. AC says that they can be well-ordered, but no such well-ordering can ever be presented. It follows that the linear space FL(ℝ, ℝ) with pointwise addition and multiplication does not have a definable well-ordered basis. According to Feferman [359], page 325:

    No set-theoretically definable well-ordering of the continuum can be proved to exist from the Zermelo-Fraenkel axioms together with the axiom of choice and the generalized continuum hypothesis.

(This quotation is also discussed in Remark 7.12.4.) For the linear space FL(ℝ, ℝ), there is a basis {e_x; x ∈ ℝ}, where e_x : ℝ → ℝ is defined by e_x(y) = δ_{x,y} for all x, y ∈ ℝ. But well-ordering this basis requires a well-ordering for ℝ, which is not available in plain ZF set theory.
22.7.21 Remark: Examples of linear spaces with and without a basis.
Some infinite-dimensional linear spaces, such as the free linear space FL(ℤ₀⁺, ℝ) on the non-negative integers over the real number field, clearly have a well-ordered basis. The free linear space FL(ℝ, ℝ) clearly has a basis, but without the axiom of choice, the basis cannot be well ordered. The linear space ℝ^ℝ of all real-valued functions on the real numbers apparently does not even have a basis in the absence of the axiom of choice. This is summarised in the following table.

    linear space     basis   well-ordered basis
    FL(ℤ₀⁺, ℝ)       yes     yes
    FL(ℝ, ℝ)         yes     no
    ℝ^ℝ              no      no

For the unrestricted linear space ℝ^ℝ = UL(ℝ, ℝ), the set {e_x; x ∈ ℝ} in Remark 22.7.20 is not a basis because it only spans the functions in ℝ^ℝ which have finite support. (In other words, it spans only the subspace FL(ℝ, ℝ) of UL(ℝ, ℝ).)
22.7.22 Remark: Application of the axiom of choice to the extension of a spanning set to a basis.
It often happens that one requires an extension of a linearly independent set S of vectors in a linear space
to a basis for the whole space. The existence of a basis for the whole space guarantees this extension in
the case that the set S is empty. But if the set S is infinite, one must assume in general the existence of
a well-ordered basis for the whole space as in Theorem 22.7.23. With the axiom of choice, the existence of
basis extensions is guaranteed for all linear spaces as in Theorem 22.7.24.
22.7.23 Theorem: Construction of a well-ordered basis including a given linearly independent set.
Let S be a linearly independent subset of a linear space V which has a well-ordered basis. Then there is a
well-ordered basis for V which includes S.
Proof: Let V be a linear space with basis (v_α)_{α∈A}, where A is a well-ordered set. Let S be a linearly independent subset of V. Define f : A → {0, 1} inductively for β ∈ A by

f(β) = 1 if v_β ∉ span(S ∪ {v_α; α < β and f(α) = 1}),
f(β) = 0 otherwise.

Then by transfinite induction (Theorem 11.7.2), S ∪ {v_β; f(β) = 1} is a well-ordered basis for V which includes S.
22.7.24 Theorem [zf+ac]: Let S be a linearly independent subset of a linear space V . Then there is a
well-ordered basis for V which includes S.
The motivation for defining bases for linear spaces is Theorem 22.6.8 (iii). This states that for a fixed basis,
every vector in the linear space can be equivalently represented in terms of coefficients with respect to the
basis. As mentioned in Remark 22.7.1, the original model for this idea is Cartesian coordinatisation, which
locates all points in two or three dimensions according to a rectangular grid based on an origin and two or
three basis vectors. In other words, space can be numericised.
22.8.2 Theorem: A subset B of a linear space V over a field K is a basis for V if and only if

∀v ∈ V, ∃′k : B → K, #({e ∈ B; k(e) ≠ 0_K}) < ∞ and Σ_{e∈B} k(e)e = v.
22.8.3 Theorem: A vector family (e_i)_{i=1}^n ∈ V^n in a linear space V over a field K is a basis for V if and only if

∀v ∈ V, ∃′k ∈ K^n, Σ_{i=1}^n k_i e_i = v.

In other words, for all v ∈ V, there is a unique n-tuple k ∈ K^n such that v = Σ_{i=1}^n k_i e_i.
Proof: This follows from Theorem 22.6.8 (iii). Alternatively, Theorem 22.8.3 follows as a special case of
Theorem 22.8.2.
22.8.4 Remark: The component map for a linear space with a given basis.
For any set of vectors B ⊆ V in a linear space V over a field K, the map Φ_B : Fin(B, K) → V defined by Φ_B(k) = Σ_{e∈B} k(e)e for k ∈ Fin(B, K) is well defined. Theorem 22.8.2 implies that Φ_B is a bijection if and only if B is a basis. So B is a basis if and only if the inverse map Γ_B = Φ_B⁻¹ : V → Fin(B, K) is well defined. This inverse map assigns unique components relative to the basis B for all vectors in the linear space. (See Figure 22.8.1.)
The map Γ_B : V → Fin(B, K) may be thought of as a "linear chart" for V. Each basis for a linear space yields a different linear chart. These linear charts are the component maps in Definition 22.8.6. They are more or less analogous to the charts for locally Cartesian spaces in Section 48.5.
In the fibre/frame bundle context (as in Section 20.10), the linear space V would be considered to be the "total space", while the coordinate space Fin(B, K) or K^n would be the "fibre space", and the coordinate map Γ_B : V → Fin(B, K) would be the "fibre map" or "measurement map". The basis B would be the "frame" which is used for making measurements of vectors expressed in terms of coordinates. Multiple bases are thought of as alternative "frames of reference" for making observations.
22.8.5 Remark: The component map for a finite-dimensional linear space with a given basis.
Remark 22.8.4 may be expressed more simply for finite-dimensional linear spaces. For any vector family B = (e_i)_{i=1}^n ∈ V^n in a linear space V over a field K, the map φ_B : K^n → V defined by φ_B(k) = ∑_{i=1}^n k_i e_i for k ∈ K^n is a well-defined map. Theorem 22.8.3 implies that φ_B is a bijection if and only if B is a basis. So B is a basis if and only if the inverse map κ_B = φ_B^{-1} : V → K^n is well defined. This inverse map assigns unique components relative to the basis B for all vectors in the linear space.
Differential geometry reconstructed. Copyright (C) 2017, Alan U. Kennington. All Rights Reserved. You may print this book draft for personal use. Public redistribution of this book draft in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.
The map κ_B : V → K^n may be thought of as a linear chart for V. Each basis for a linear space yields a different linear chart. These linear charts are the component maps in Definition 22.8.8. They are analogous to the charts for locally Cartesian spaces in Section 48.5.
22.8.6 Definition: The component map for a basis set B ⊆ V of a linear space V over a field K is the map κ_B : V → Fin(B, K) defined by ∀v ∈ V, v = ∑_{e∈B} κ_B(v)(e)e.

22.8.7 Definition: The component map for a basis family B = (e_i)_{i∈I} ∈ V^I of a linear space V over a field K is the map κ_B : V → Fin(I, K) defined by ∀v ∈ V, v = ∑_{i∈I} κ_B(v)(i)e_i.

22.8.8 Definition: The component map for a basis B = (e_i)_{i=1}^n ∈ V^n of an n-dimensional linear space V over a field K for n ∈ ℤ_0^+ is the map κ_B : V → K^n defined by ∀v ∈ V, v = ∑_{i=1}^n κ_B(v)_i e_i.
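As a concrete illustration of Definition 22.8.8, the component tuple κ_B(v) for a basis of ℝ^n may be computed by solving the linear system whose coefficient matrix has the basis vectors as its columns. The following sketch uses NumPy; the function name and the sample basis are illustrative rather than taken from the text.

```python
import numpy as np

def component_map(basis, v):
    """Return the component tuple kappa_B(v) for a basis of R^n.

    The columns of A are the basis vectors, so solving A k = v inverts
    the linear combination map phi_B : k |-> sum_i k_i e_i.
    """
    A = np.column_stack(basis)
    return np.linalg.solve(A, v)

e1, e2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
v = np.array([3.0, 1.0])
k = component_map([e1, e2], v)
assert np.allclose(k[0] * e1 + k[1] * e2, v)   # reconstruction check
```

The reconstruction check confirms that κ_B really does invert the linear combination map for this basis.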
22.8.10 Remark: The subtle difference between a free linear space and its set of vectors.
The set Fin(B, K) in Definition 22.8.6 is the underlying set of vectors for the free linear space FL(B, K) in Definition 22.2.10. This free linear space is the natural linear space structure for Fin(B, K). This set and linear space may be used interchangeably, as in Theorem 22.8.11. Thus the component map κ_B : V → Fin(B, K) is essentially the same as the map κ_B : V → FL(B, K).
22.8.11 Theorem: Let B be a basis set for a linear space V over a field K. Then the component map κ_B : V → FL(B, K) is a well-defined bijection.

Proof: Let B be a basis set for a linear space V over a field K. Define the linear combination map φ_B : Fin(B, K) → V by φ_B(k) = ∑_{e∈B} k(e)e for k ∈ Fin(B, K). By Theorem 22.8.2, this map is a bijection if and only if B is a basis. But Definition 22.8.6 says that κ_B : V → Fin(B, K) is defined by ∀v ∈ V, v = ∑_{e∈B} κ_B(v)(e)e, which is equivalent to ∀v ∈ V, v = φ_B(κ_B(v)). In other words, id_V = φ_B ∘ κ_B. (That is, κ_B is a right inverse of φ_B.) So κ_B is the unique inverse of φ_B because φ_B is a bijection, and κ_B : V → Fin(B, K) is also a bijection.
22.8.12 Theorem: Let B be a basis set for a linear space V over a field K. Then the component map κ_B : V → FL(B, K) satisfies

∀λ ∈ Fin(V, K), κ_B(∑_{v∈V} λ_v v) = ∑_{v∈V} λ_v κ_B(v), (22.8.1)

where linear combinations in FL(B, K) are defined in terms of the pointwise linear space operations specified in Definition 22.2.10.
Proof: Let λ ∈ Fin(V, K). Then ∑_{v∈V} λ_v v is a well-defined element of V. Therefore ∑_{v∈V} λ_v v = ∑_{e∈B} κ_B(∑_{v∈V} λ_v v)(e)e by Definition 22.8.6. But v = ∑_{e∈B} κ_B(v)(e)e for all v ∈ V by Definition 22.8.6. So

∑_{v∈V} λ_v v = ∑_{v∈V} λ_v ∑_{e∈B} κ_B(v)(e)e = ∑_{e∈B} ∑_{v∈V} λ_v κ_B(v)(e)e.
(Swapping summations is permissible because the sums are finite.) So κ_B(∑_{v∈V} λ_v v)(e) = ∑_{v∈V} λ_v κ_B(v)(e) for all e ∈ B, since the components of ∑_{v∈V} λ_v v with respect to B are unique by Theorem 22.6.8 (iii). Hence

∀λ ∈ Fin(V, K), ∀e ∈ B, κ_B(∑_{v∈V} λ_v v)(e) = ∑_{v∈V} λ_v κ_B(v)(e),

which is equivalent to line (22.8.1) because linear operations on FL(B, K) are defined pointwise.
22.8.13 Theorem: Let B be a basis set for a linear space V over a field K. Let U be a linear subspace of V. Then κ_B(U) is a linear subspace of FL(B, K).

Proof: Let κ_B : V → FL(B, K) be the component map for B. Let U be a linear subspace of V. Let u′_1, u′_2 ∈ κ_B(U) and λ_1, λ_2 ∈ K. Then u′_1 = κ_B(u_1) and u′_2 = κ_B(u_2) for some u_1, u_2 ∈ U. By Theorem 22.8.12, κ_B(λ_1 u_1 + λ_2 u_2) = λ_1 u′_1 + λ_2 u′_2. So λ_1 u′_1 + λ_2 u′_2 ∈ κ_B(U). Hence κ_B(U) is a linear subspace of FL(B, K) by Theorem 22.1.10.
22.8.14 Remark: Avoidance of the axiom of choice when defining component maps.
No axiom of choice is required for definitions of component maps. In Remark 22.8.4 and Theorem 22.8.11, for example, the existence and uniqueness of the map κ_B : V → Fin(B, K) as the inverse of the linear combination map φ_B : Fin(B, K) → V is guaranteed as follows. Every function has a unique well-defined inverse relation by Definition 9.6.12. This relation is then a function by Theorem 22.6.8 (iii), which follows from Definition 22.6.3 for linear independence of sets of vectors. Definition 22.7.2 defines a basis to be a linearly independent spanning set for a linear space, which yields both existence and uniqueness for values of the inverse relation κ_B = φ_B^{-1}.
An axiom of choice issue does arise if one wishes to claim that every linear space has a basis, as in Theorem 22.7.18, but when the existence of a basis is assumed, the well-definition of the component map for each basis follows.
22.8.15 Remark: The intentional inconvenience of the explicit component map notations.
Instead of clumsy component notations such as κ_B(v)(e), κ_B(v)(i) or κ_B(v)_i, simpler notations such as v_i or v^i for vectors v and basis index elements i are more commonly used. The simpler kinds of notation hide the complexity of the inversion of the linear combination map. Although there is no axiom of choice issue here, as explained in Remark 22.8.14, the inversion of the linear combination map can be computationally non-trivial. Roughly speaking, this inversion is equivalent to solving n simultaneous linear equations for n variables in the case of a linear space V with dim(V) = n ∈ ℤ_0^+. The solution of these equations can be achieved by standard matrix inversion algorithms, but the computation time can be quite large if n equals 10^6 or 10^9, for example. And of course the case of infinite-dimensional spaces is even more difficult than that. Thus clumsy notations such as κ_B(v)(i) are intended to communicate the idea that a non-trivial procedure is required here.
The difficulty of computing vector components should not be confused with the relative simplicity of computing covariant components, which are expressed in terms of an inner product. For example, if (e_i)_{i=1}^n is a basis for an n-dimensional linear space V, each covariant component (v, e_i) of v ∈ V for i ∈ ℕ_n is computed with reference to only two vectors, v and e_i. But the contravariant components κ_B(v)(i) depend on v and all of the basis vectors. To compute even a single component, it is necessary to solve for all of the components simultaneously.
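This contrast can be made concrete numerically: each covariant component is a single inner product of v with e_i, whereas the contravariant components must all be obtained at once by solving an n × n system. The following NumPy sketch uses an arbitrary non-orthonormal basis; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
E = rng.normal(size=(n, n))     # columns are the basis vectors e_1, ..., e_n
v = rng.normal(size=n)

# Covariant components: one inner product each, using only v and e_i.
covariant = np.array([v @ E[:, i] for i in range(n)])

# Contravariant components kappa_B(v): all n must be solved simultaneously.
contravariant = np.linalg.solve(E, v)

# Both are consistent with v, each in its own sense.
assert np.allclose(E @ contravariant, v)
```

The two component families coincide only when the basis is orthonormal; for a general basis they differ, which is exactly the point of the remark.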
22.8.16 Definition: The component function for a vector v with respect to a basis B ⊆ V for a linear space V over a field K is the function κ_B(v) ∈ Fin(B, K), where κ_B is the component map in Definition 22.8.6.

22.8.17 Definition: The component tuple for a vector v with respect to a basis B = (e_i)_{i=1}^n for an n-dimensional linear space V over a field K for n ∈ ℤ_0^+ is the tuple κ_B(v) ∈ K^n, where κ_B is the component map in Definition 22.8.8.
that the map π_I : K^n → K^I defined by π_I : v ↦ (v_i)_{i∈I} = v↾_I restricts to a bijection from W to K^I. Such a set I exists by Theorem 22.5.11 (i). Let S_I = {π_I^{-1}(δ^i); i ∈ I} and T_I = {δ^i; i ∈ I′}, where δ^i = (δ_{ij})_{j∈I} ∈ K^I for i ∈ I, and δ^i = (δ_{ij})_{j=1}^n ∈ K^n for i ∈ I′ = ℕ_n \ I. Then K^n = span(S_I ∪ T_I) by Theorem 22.5.11 (iv). Let B = S_I ∪ T_I. Define (e_i)_{i=1}^n by e_i = π_I^{-1}(δ^i) for i ∈ I and e_i = δ^i for i ∈ I′. Then B = {e_i; i ∈ ℕ_n}.
To show that B is a linearly independent subset of K^n, suppose that ∑_{i=1}^n λ_i e_i = 0 for some (λ_i)_{i=1}^n ∈ K^n. Then 0 = ∑_{i∈I} λ_i π_I^{-1}(δ^i) + ∑_{i∈I′} λ_i δ^i = ∑_{i∈I} λ_i (π_I^{-1}(δ^i) − δ^i) + ∑_{i=1}^n λ_i δ^i. But (π_I^{-1}(δ^i) − δ^i)_j = 0 for j ∈ I because π_I, and therefore π_I^{-1}, does not alter the jth component of δ^i. So π_I^{-1}(δ^i) − δ^i = ∑_{j∈I′} b_{ij} δ^j for some (b_{ij})_{j∈I′} ∈ K^{I′}, for all i ∈ I. Let λ′_i = λ_i for i ∈ I and λ′_i = λ_i + ∑_{j∈I} λ_j b_{ji} for i ∈ I′. Then 0 = ∑_{i=1}^n λ′_i δ^i. Therefore λ′_i = 0 for all i ∈ ℕ_n. So λ_i = 0 for all i ∈ I, and so λ_i = −∑_{j∈I} λ_j b_{ji} = 0 for all i ∈ I′. Thus λ_i = 0 for all i ∈ ℕ_n. So (e_i)_{i=1}^n is a linearly independent family, which is therefore a basis for K^n.
The sub-family (e_i)_{i∈I} consists of elements of W which span W by Theorem 22.5.11 (ii). Since they are linearly independent, (e_i)_{i∈I} is a basis for W by Definition 22.7.6, which verifies part (i).
For part (ii), let m = #(I), where I is as in part (i). Then m ∈ ℤ_0^+ and m ≤ n. Let σ : ℕ_n → ℕ_n be a permutation of ℕ_n such that σ(ℕ_m) = I. (Such a permutation may be constructed using the standard enumeration for I in Definition 12.3.5.) Then σ↾_{ℕ_m} : ℕ_m → I is a bijection. Let (ē_i)_{i=1}^n be the basis for K^n which is constructed for I in part (i). Define (e_i)_{i=1}^n by e_i = ē_{σ(i)} for all i ∈ ℕ_n. Then by part (i), (e_i)_{i=1}^n is a basis for K^n and (e_i)_{i=1}^m = (ē_{σ(i)})_{i=1}^m = (ē_{σ(i)})_{σ(i)∈I} is a basis for W.
22.8.19 Theorem: Let U be a linear subspace of a finite-dimensional linear space V. Then there is a basis (e_i)_{i=1}^n for V such that (e_i)_{i=1}^m is a basis for U, where m = dim(U) and n = dim(V).

Proof: Let B = (e_i)_{i=1}^n be a basis for V. (Such a basis exists by Theorem 22.7.11.) The map κ_B : V → K^n is a bijection by Theorem 22.8.11. Let W = κ_B(U). Then W is a linear subspace of K^n by Theorem 22.8.13, and κ_B↾_U : U → W is a bijection because κ_B is injective. By Theorem 22.8.18 (ii), K^n has a basis (e′_i)_{i=1}^n
22.9.2 Example: The most obvious basis for real-valued vectors on the non-negative integers.
Let V = FL(ℤ_0^+, ℝ). Then B = (e_i)_{i∈ℤ_0^+} is a basis for V, where e_i ∈ V is defined for i ∈ ℤ_0^+ by e_i(x) = δ_{i,x} for all x ∈ ℤ_0^+. The component map κ_B : V → Fin(ℤ_0^+, ℝ) satisfies κ_B(v)(i) = v(i) for all i ∈ ℤ_0^+, for all v ∈ V. Then

∑_{i∈ℤ_0^+} κ_B(v)(i) e_i(j) = ∑_{i∈ℤ_0^+} v(i) e_i(j) = ∑_{i∈ℤ_0^+} v(i) δ_{i,j} = v(j) for all j ∈ ℤ_0^+,

for all v ∈ V. So ∑_{i∈ℤ_0^+} κ_B(v)(i) e_i = v for all v ∈ V. This verifies that κ_B is the component map for B.

The expression κ_B(v)(i) may be written as ∑_{j∈ℤ_0^+} c_{i,j} v(j) where c_{i,j} = δ_{i,j} for all i, j ∈ ℤ_0^+. This means that the ith component κ_B(v)(i) of v depends only on the value of v at i.
22.9.3 Example: Non-finite component transition map for an infinite-dimensional linear space.
Let V = FL(ℤ_0^+, ℝ). Then B̄ = (ē_i)_{i∈ℤ_0^+} is a basis for V, where ē_i ∈ V is defined by ē_0(x) = δ_{0,x} for all x ∈ ℤ_0^+, and ē_i(x) = δ_{i,x} − δ_{i−1,x} for all x ∈ ℤ_0^+, for all i ∈ ℤ^+. A comparison of this basis B̄ with the basis B in Example 22.9.2 is illustrated in Figure 22.9.1.

[Figure 22.9.1: the graphs of the basis vectors e_0, ..., e_5 and ē_0, ..., ē_5 on ℤ_0^+, with values in {−1, 0, 1}.]
The component map κ_B̄ : V → Fin(ℤ_0^+, ℝ) satisfies κ_B̄(v)(i) = ∑_{k∈ℤ_0^+} v(i+k) for all i ∈ ℤ_0^+, and then ∑_{i∈ℤ_0^+} κ_B̄(v)(i) ē_i(j) = v(j) for all j ∈ ℤ_0^+. So ∑_{i∈ℤ_0^+} κ_B̄(v)(i) ē_i = v for all v ∈ V. This verifies that κ_B̄ is the component map for B̄ in V.

The expression κ_B̄(v)(i) may be written as ∑_{j∈ℤ_0^+} c̄_{i,j} v(j) where c̄_{i,j} = ∑_{k∈ℤ_0^+} δ_{i+k,j} for all i, j ∈ ℤ_0^+. This means that the ith component κ_B̄(v)(i) of v depends on all of the values of v at i+k for k ≥ 0. A comparison of this component map κ_B̄ with the component map κ_B in Example 22.9.2 is illustrated in Figure 22.9.2.
[Figure 22.9.2: the graphs of j ↦ c_{2,j} and j ↦ c̄_{2,j} on ℤ_0^+: c_{2,j} is non-zero only at j = 2, whereas c̄_{2,j} = 1 for all j ≥ 2.]
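The component formula κ_B̄(v)(i) = ∑_{k∈ℤ_0^+} v(i+k) and the reconstruction ∑_i κ_B̄(v)(i) ē_i = v may be checked on a finite truncation, which loses nothing here because v has finite support. A NumPy sketch, with illustrative names:

```python
import numpy as np

def ebar(i, n):
    """Basis vector ebar_i of Example 22.9.3, truncated to length n."""
    e = np.zeros(n)
    e[i] = 1.0
    if i > 0:
        e[i - 1] = -1.0   # ebar_i = delta_i - delta_{i-1} for i >= 1
    return e

n = 8
v = np.zeros(n)
v[:4] = [2.0, -1.0, 0.5, 3.0]    # a finitely supported sequence

# Components in the telescoping basis: kappa(v)(i) = sum_{k>=0} v(i+k).
comp = np.array([v[i:].sum() for i in range(n)])

# Reconstruction: sum_i comp[i] * ebar_i recovers v.
w = sum(comp[i] * ebar(i, n) for i in range(n))
assert np.allclose(w, v)
```

Each component depends on infinitely many values of v in the untruncated setting, even though every individual sum is finite.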
22.9.4 Definition: The (vector) component transition map for a linear space V over a field K, with indexed bases B_1 = (e^1_i)_{i∈I} and B_2 = (e^2_j)_{j∈J}, is the map τ_{B_1,B_2} = κ_{B_2} ∘ φ_{B_1} : Fin(I, K) → Fin(J, K), where the component map κ_{B_2} : V → Fin(J, K) is as in Definition 22.8.7 and the linear combination map φ_{B_1} : Fin(I, K) → V is defined by φ_{B_1} : k ↦ ∑_{i∈I} k_i e^1_i.
22.9.5 Remark: The vector component transition map is always well defined.
The component transition maps τ_{B_1,B_2} and τ_{B_2,B_1} in Definition 22.9.4 for indexed bases B_1 and B_2 for a linear space V are illustrated in Figure 22.9.3.
The map φ_{B_1} : Fin(I, K) → V always yields a well-defined element of V. Then the map κ_{B_2} : V → Fin(J, K) always yields a well-defined finite linear combination of elements of the indexed basis B_2, because a basis for V is defined to be such that each element of V is a unique (finite) linear combination of elements of the basis. One might suppose that therefore the component array of the transition map must have a finite number of non-zero elements in every row and column. It turns out that this is not true.
22.9.6 Definition: The basis-transition (component) array for a linear space V over a field K, from indexed basis B_1 = (e^1_i)_{i∈I} to indexed basis B_2 = (e^2_j)_{j∈J}, is the double family (a_{ji})_{j∈J,i∈I} where a_{ji} = κ_{B_2}(e^1_i)_j for all i ∈ I and j ∈ J.
[Figure 22.9.3 (Component transition maps for linear spaces): the diagram shows the linear space V together with the vector component maps κ_{B_1} : V → Fin(I, K) and κ_{B_2} : V → Fin(J, K), the linear combination maps φ_{B_1} = κ_{B_1}^{-1} and φ_{B_2} = κ_{B_2}^{-1}, and the component transition maps τ_{B_1,B_2} and τ_{B_2,B_1} between Fin(I, K) and Fin(J, K).]
22.9.7 Remark: Basis-transition component array rows may have infinitely many non-zero entries.
The double family A_{B_1,B_2} in Definition 22.9.6 is well defined, and for each i ∈ I, there are only a finite number of j ∈ J such that a_{ji} ≠ 0. However, there is no guarantee that the number of non-zero a_{ji} for a fixed j ∈ J is finite. Nevertheless, the two basis-transition component arrays are inverses of each other, as shown in Theorem 22.9.8.
The convention followed here for the words "row" and "column" is that row j of a double family (a_{ji})_{j∈J,i∈I} is the single-index family (a_{ji})_{i∈I}, whereas column i is the single-index family (a_{ji})_{j∈J}. Then a change of basis is effected by multiplying on the right by the array, whereas the transformation of components is effected by multiplying on the left by the array. (See Remark 22.9.10 and Theorem 22.9.11.)
It is not strictly correct to refer to the array operations in Theorem 22.9.8 as matrix "multiplication" or matrix "inversion" because multiplication and inversion of infinite matrices requires either topology or sparseness constraints to make infinite sums meaningful. This is why the word "array" is used here.
22.9.8 Theorem: Forward and reverse basis-transition component arrays are inverses of each other.
Let B_1 = (e^1_i)_{i∈I} and B_2 = (e^2_j)_{j∈J} be bases for a linear space V over a field K. Let A_{B_1,B_2} = (a_{ji})_{j∈J,i∈I} and A_{B_2,B_1} = (ā_{ij})_{i∈I,j∈J} be the basis-transition component arrays for B_1 and B_2. Then

∀i, j ∈ J, ∑_{ℓ∈I} a_{iℓ} ā_{ℓj} = δ_{ij} (22.9.1)
∀i, j ∈ I, ∑_{ℓ∈J} ā_{iℓ} a_{ℓj} = δ_{ij}. (22.9.2)
Proof: For indexed bases B_1 = (e^1_i)_{i∈I} and B_2 = (e^2_j)_{j∈J} for a linear space V over a field K, define the arrays (a_{ji})_{j∈J,i∈I} and (ā_{ij})_{i∈I,j∈J} by a_{ji} = κ_{B_2}(e^1_i)_j and ā_{ij} = κ_{B_1}(e^2_j)_i for all i ∈ I and j ∈ J. These formulas mean that e^1_i = ∑_{j∈J} a_{ji} e^2_j for all i ∈ I, and e^2_j = ∑_{i∈I} ā_{ij} e^1_i for all j ∈ J. One may insert the expression for e^1_i into the expression for e^2_j to obtain e^2_j = ∑_{i∈I} ā_{ij} ∑_{k∈J} a_{ki} e^2_k = ∑_{i∈I} ∑_{k∈J} ā_{ij} a_{ki} e^2_k = ∑_{k∈J} ∑_{i∈I} a_{ki} ā_{ij} e^2_k. (The manipulations of the summation symbols are justified by the fact that all of the sums have only a finite number of non-zero terms.) But e^2_j = ∑_{k∈J} δ_{jk} e^2_k, and the components of a vector with respect to B_2 are unique by Theorem 22.8.2. Therefore ∑_{i∈I} a_{ki} ā_{ij} = δ_{jk} for all j, k ∈ J. This is equivalent to line (22.9.1). Line (22.9.2) follows similarly by inserting the expression for e^2_j into the expression for e^1_i.
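In finite dimensions, the mutual-inverse property of lines (22.9.1) and (22.9.2) can be verified numerically by placing the basis vectors in matrix columns, so that each transition array is obtained by solving against the target basis. A NumPy sketch with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
E1 = rng.normal(size=(n, n))    # columns: basis B1 of R^4
E2 = rng.normal(size=(n, n))    # columns: basis B2 of R^4

# A_fwd[j, i] = kappa_{B2}(e1_i)_j : components of B1 vectors in basis B2.
A_fwd = np.linalg.solve(E2, E1)
# A_rev[i, j] = kappa_{B1}(e2_j)_i : components of B2 vectors in basis B1.
A_rev = np.linalg.solve(E1, E2)

# Lines (22.9.1) and (22.9.2): the two arrays are mutually inverse.
assert np.allclose(A_fwd @ A_rev, np.eye(n))
assert np.allclose(A_rev @ A_fwd, np.eye(n))
```

For infinite index sets, as the theorem emphasises, the same products make sense only because every column of each array has finite support.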
22.9.9 Example: A basis-transition component array with infinitely many non-zero entries in all rows.
In Example 22.9.3, the bases have index sets I = ℤ_0^+ and J = ℤ_0^+. The double family A_{B,B̄} = (a_{ji})_{j,i=0}^∞ satisfies a_{ji} = κ_B̄(e_i)_j = ∑_{k=0}^∞ δ_{i,j+k} for i, j ∈ ℤ_0^+. So a_{ji} = 1 for i ≥ j and a_{ji} = 0 for i < j. This is illustrated in Figure 22.9.4.
Clearly #{j ∈ ℤ_0^+; a_{ji} ≠ 0} = #{j ∈ ℤ_0^+; ∃k ∈ ℤ_0^+, i = j+k} = #{j ∈ ℤ_0^+; i ≥ j} = #(ℤ ∩ [0, i]) = i+1 < ∞, as expected. However, #{i ∈ ℤ_0^+; a_{ji} ≠ 0} = #(ℤ ∩ [j, ∞]) = ∞.
One may easily verify that the two basis-transition component arrays are inverses of each other in the
sense of Theorem 22.9.8. Questions of convergence of infinite series do not arise because the number of
non-zero elements is finite in every column. The index sets for the bases may be finite, countably infinite,
or arbitrarily uncountably infinite. There is no requirement for any order on the index sets.
22.9.10 Remark: Basis-transition component arrays on the right of bases, on the left of components.
The basis-transition array A_{B_1,B_2} = (a_{ji})_{j∈J,i∈I} seems to be "backwards". This is both intentional and unfortunate. In the theory of matrices, one generally prefers matrices to be on the left, which implies that the tuple-vectors which are multiplied by matrices must be column vectors which are on the right. However, this convention generally assumes that the entries in the tuples are components of vectors in some linear space. When one changes the basis of a linear space, the tuple of basis vectors must be multiplied by a transition matrix on the right if the components are multiplied by a transition matrix on the left. In other words, the transition array for the basis vectors must be "backwards" if the transition array for the components is "forwards".
The basis-transition component array in Definition 22.9.6 is "backwards" also in a second sense. The formula a_{ji} = κ_{B_2}(e^1_i)_j expresses the basis vectors of B_1 in terms of the basis vectors in B_2. Thus e^1_i = ∑_{j∈J} κ_{B_2}(e^1_i)_j e^2_j = ∑_{j∈J} e^2_j a_{ji}. One would expect this to be called the transition array "from B_2 to B_1". Therefore it would seem more logical to define the array as A_{B_1,B_2} = (a_{ij})_{i∈I,j∈J} with a_{ij} = κ_{B_1}(e^2_j)_i for all i ∈ I and j ∈ J. Then one would obtain e^2_j = ∑_{i∈I} κ_{B_1}(e^2_j)_i e^1_i = ∑_{i∈I} e^1_i a_{ij}. However, the name of the array is the basis-transition component array, not the basis-transition basis array. It turns out that it is an inevitable consequence of the definition of coordinatisation of vectors by bases that the transition array for basis vectors is the transposed inverse of the transition array for the components. This is demonstrated by lines (22.9.3) and (22.9.5) in Theorem 22.9.11.
22.9.11 Theorem: Let V be a linear space over a field K. Let B_1 = (e^1_i)_{i∈I} and B_2 = (e^2_j)_{j∈J} be indexed bases for V. Define basis-transition component arrays A_{B_1,B_2} = (a_{ji})_{j∈J,i∈I} and A_{B_2,B_1} = (ā_{ij})_{i∈I,j∈J} by

∀i ∈ I, ∀j ∈ J, a_{ji} = κ_{B_2}(e^1_i)_j
∀i ∈ I, ∀j ∈ J, ā_{ij} = κ_{B_1}(e^2_j)_i.

Then

∀j ∈ J, e^2_j = ∑_{i∈I} e^1_i ā_{ij} (22.9.3)
∀i ∈ I, e^1_i = ∑_{j∈J} e^2_j a_{ji}. (22.9.4)

Denote the components (v^1_i)_{i∈I} and (v^2_j)_{j∈J} of vectors v ∈ V with respect to B_1 and B_2 respectively by

∀i ∈ I, v^1_i = κ_{B_1}(v)_i
∀j ∈ J, v^2_j = κ_{B_2}(v)_j.

Then

∀v ∈ V, ∀j ∈ J, v^2_j = ∑_{i∈I} a_{ji} v^1_i (22.9.5)
∀v ∈ V, ∀i ∈ I, v^1_i = ∑_{j∈J} ā_{ij} v^2_j. (22.9.6)
Proof: Lines (22.9.3) and (22.9.4) follow directly from the definitions of the component maps κ_{B_1} and κ_{B_2}. Let (v^1_i)_{i∈I} and (v^2_j)_{j∈J} be the component families for a vector v ∈ V with respect to B_1 and B_2 as indicated. Then v = ∑_{i∈I} κ_{B_1}(v)_i e^1_i = ∑_{i∈I} v^1_i e^1_i and v = ∑_{j∈J} κ_{B_2}(v)_j e^2_j = ∑_{j∈J} v^2_j e^2_j by the definition of the component maps. So ∑_{i∈I} v^1_i e^1_i = ∑_{j∈J} v^2_j e^2_j. Substituting e^1_i from line (22.9.4) into this equation gives ∑_{i∈I} v^1_i (∑_{j∈J} e^2_j a_{ji}) = ∑_{j∈J} ∑_{i∈I} a_{ji} v^1_i e^2_j = ∑_{j∈J} v^2_j e^2_j. Hence ∑_{i∈I} a_{ji} v^1_i = v^2_j by the uniqueness of vector components with respect to B_2. This verifies line (22.9.5). Similarly, line (22.9.6) follows by substituting e^2_j from line (22.9.3) into ∑_{i∈I} v^1_i e^1_i = ∑_{j∈J} v^2_j e^2_j.
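The transformation rules of Theorem 22.9.11 can likewise be tested numerically in finite dimensions: the components are multiplied by the array on the left, while the tuple of basis vectors is multiplied by the reverse array on the right. A NumPy sketch, names illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
E1 = rng.normal(size=(n, n))     # columns: basis B1
E2 = rng.normal(size=(n, n))     # columns: basis B2
v = rng.normal(size=n)

v1 = np.linalg.solve(E1, v)      # components with respect to B1
v2 = np.linalg.solve(E2, v)      # components with respect to B2

A_fwd = np.linalg.solve(E2, E1)  # A_fwd[j, i] = kappa_{B2}(e1_i)_j

# Line (22.9.5): components transform with the array on the LEFT.
assert np.allclose(A_fwd @ v1, v2)
# Line (22.9.3): basis tuples transform with the reverse array on the RIGHT.
A_rev = np.linalg.solve(E1, E2)
assert np.allclose(E1 @ A_rev, E2)
```

This makes visible the "transposed inverse" relationship discussed in Remark 22.9.10: the same change of basis acts by A_fwd on components but by its inverse, applied on the other side, on the basis tuple.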
22.9.12 Remark: The figure/frame bundle interpretation for changes of linear space basis.
Theorem 22.9.11 may be interpreted in the context of figure/frame bundles (as described in Section 20.10) by regarding multiplication on the left by the array (a_{ji})_{j∈J,i∈I} as the left action L_g of a group element g acting on a fibre space element (v^1_i)_{i∈I}, which is an "observation" of an object v in the total space V. The bases B_1 and B_2 are regarded as frames of reference for "observers". Then line (22.9.3) may be interpreted as the right action R_{g^{-1}} of g^{-1} on the basis (e^1_i)_{i∈I} in the principal bundle (or frame bundle). Thus the combination of the actions L_g and R_{g^{-1}} keeps everything the same. In other words, the observations (v^1_i)_{i∈I} and (v^2_j)_{j∈J} correspond to the same observed object v ∈ V if the basis and coordinates are acted on by R_{g^{-1}} and L_g respectively.
∑_{i=1}^n S_i = {∑_{i=1}^n v_i; ∀i ∈ ℕ_n, v_i ∈ S_i}.

The Minkowski scalar multiple by λ ∈ K of a subset S of a linear space V over a field K is the set λS = {λv; v ∈ S}.
22.10.3 Remark: Minkowski set-sum notations for singletons are equivalent to coset notation.
Definition 22.10.4 defines the notations u + S and S + u for vectors u and subsets S to mean the same as
{u}+S. This kind of notation is equivalent to the left and right cosets for general groups in Definition 17.7.1,
except that the operation is notated additively rather than multiplicatively.
It is also convenient to introduce notations for negatives and differences of sets in Notation 22.10.5.
22.10.4 Definition: The Minkowski sum of a vector u and a subset S of a linear space is the set u + S = S + u = {u + v; v ∈ S} = {u} + S.

22.10.5 Notation: −S, for a subset S of a linear space V over a field K, denotes the Minkowski scalar multiple (−1_K)S of S, where 1_K is the unit element of K.
S_1 − S_2, for subsets S_1 and S_2 of a linear space V, denotes the Minkowski sum S_1 + (−S_2) of S_1 and −S_2. In other words, S_1 − S_2 = {v_1 − v_2; v_1 ∈ S_1 and v_2 ∈ S_2}.
22.10.6 Theorem: Let V be a linear space.
(i) ∀S_1, S_2, A ∈ ℘(V), (S_1 + A) ∩ S_2 = ∅ ⇔ S_1 ∩ (S_2 − A) = ∅.

Proof: For part (i), let S_1, S_2 and A be subsets of V. Suppose that (S_1 + A) ∩ S_2 ≠ ∅. Then there are x_1 ∈ S_1, x_2 ∈ S_2 and a ∈ A such that x_1 + a = x_2. Then x_1 = x_2 − a. So S_1 ∩ (S_2 − A) ≠ ∅. Similarly, S_1 ∩ (S_2 − A) ≠ ∅ implies (S_1 + A) ∩ S_2 ≠ ∅. Hence (S_1 + A) ∩ S_2 = ∅ ⇔ S_1 ∩ (S_2 − A) = ∅.
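Part (i) may be sanity-checked on small finite subsets of the integers, viewed as vectors in a one-dimensional linear space. A short Python sketch with illustrative names:

```python
def minkowski_sum(S, A):
    """Minkowski sum S + A of two subsets (here subsets of the integers)."""
    return {s + a for s in S for a in A}

def minkowski_diff(S, A):
    """Minkowski difference S - A = S + (-A), as in Notation 22.10.5."""
    return {s - a for s in S for a in A}

S1, S2, A = {0, 1}, {5, 6}, {2, 7}

# Theorem 22.10.6 (i): (S1 + A) meets S2 exactly when S1 meets (S2 - A).
lhs = bool(minkowski_sum(S1, A) & S2)
rhs = bool(S1 & minkowski_diff(S2, A))
assert lhs == rhs

# A second instance where both sides are non-empty.
A2 = {5}
assert bool(minkowski_sum(S1, A2) & S2) == bool(S1 & minkowski_diff(S2, A2)) == True
```

The check is purely set-theoretic, which reflects the fact that the theorem uses only the group structure of vector addition.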
is to consider the map k ↦ ∑_{i=1}^n k_i w_i from K^n to V. This may be thought of as a linear chart which induces order relations. Then the image under this map of the set {k ∈ K^n; ∑_{i=1}^n k_i = 1 and ∀i ∈ ℕ_n, k_i ≥ 0} is the set of convex combinations of a family (w_i)_{i=1}^n ∈ S^n.
22.11.4 Remark: Some basic properties of convex combinations.
Convex combinations have properties which make them very useful. For example, a convex combination of
convex combinations of a fixed set of vectors is also a convex combination of those vectors.
Convex combinations have properties which are related to the properties of linear combinations. For example,
the set of convex combinations of a subset of a linear space is closed under convex combinations.
22.11.5 Remark: Ad-hoc notation for the set of convex multipliers in a free linear space.
As in the case of linear combinations, the lack of suitable notation is a hindrance to the presentation of convex combinations for infinite-dimensional spaces. Notation 22.2.25 introduced Fin(S, K) to mean the set {f : S → K; #{x ∈ S; f(x) ≠ 0_K} < ∞} of K-valued functions which are non-zero only on a finite subset of the domain S. Notation 22.11.6 is a corresponding non-standard notation for discussing convexity. The superscript 1 is supposed to suggest that the sum of the function values equals 1.

22.11.6 Notation: Fin^1_0(S, K) denotes the set {f : S → K_0^+; #(supp(f)) < ∞ and ∑_{x∈S} f(x) = 1_K} for any set S and ordered field K. In other words,

Fin^1_0(S, K) = {f ∈ Fin(S, K_0^+); ∑_{x∈S} f(x) = 1_K}.
22.11.8 Definition: A convex set S in a linear space V is a subset S of V which is closed under convex combinations. In other words,

∀n ∈ ℤ_0^+, ∀w ∈ S^n, ∀k ∈ (K_0^+)^n, (∑_{i=1}^n k_i = 1_K ⇒ ∑_{i=1}^n k_i w_i ∈ S). (22.11.2)
Convex sets are frequently defined to be sets which are closed under convex combinations of pairs of points.
Then it is easily shown by induction that Definition 22.11.8 is satisfied. The well-known method for this is
exemplified in the proofs of Theorems 23.1.3 and 23.4.4.
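The induction mentioned above reduces an n-point convex combination to repeated pairwise combinations. The following Python sketch (illustrative names, not from the text) computes a convex combination by exactly this folding procedure.

```python
def convex_combination(points, weights):
    """n-point convex combination via repeated pairwise combinations.

    Folds the first two points into one (renormalising their weights)
    until one point remains, mirroring the induction on n. Points are
    tuples in R^d; weights are non-negative and sum to 1.
    """
    pts = [tuple(p) for p in points]
    wts = list(weights)
    while len(pts) > 1:
        w = wts[0] + wts[1]
        if w == 0:   # both weights zero: the pair contributes nothing
            pts[0:2], wts[0:2] = [pts[0]], [0.0]
            continue
        t = wts[0] / w
        mixed = tuple(t * a + (1 - t) * b for a, b in zip(pts[0], pts[1]))
        pts[0:2], wts[0:2] = [mixed], [w]
    return pts[0]

p = convex_combination([(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)], [0.5, 0.25, 0.25])
assert abs(p[0] - 1.0) < 1e-9 and abs(p[1] - 1.0) < 1e-9
```

Since each intermediate point is a pairwise convex combination of points already in the set, closure under pairwise combinations suffices, which is the content of the induction.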
22.11.10 Definition:
The convex span of a subset S of a linear space V is the set of all convex combinations of S in V .
22.11.11 Remark: The notation for the convex span of a set has an implied linear space.
The linear space V in Notation 22.11.12 must be provided by the context.
22.11.12 Notation: conv(S) denotes the convex span of a subset S of a linear space.
22.11.14 Theorem: For any subset S of a linear space V, the convex span of S is a convex set in V.

Proof: Let S be a subset of a linear space V. If S = ∅, then conv(S) = ∅, which is a convex set. In general, if v is a convex combination of elements of conv(S), then v = ∑_{i=1}^n k_i w_i for some (w_i)_{i=1}^n ∈ conv(S)^n
and (k_i)_{i=1}^n ∈ (K_0^+)^n with ∑_{i=1}^n k_i = 1_K. But for all i ∈ ℕ_n, w_i = ∑_{j=1}^{n_i} λ_{i,j} y_{i,j} for some (y_{i,j})_{j=1}^{n_i} ∈ S^{n_i} and (λ_{i,j})_{j=1}^{n_i} ∈ (K_0^+)^{n_i} with ∑_{j=1}^{n_i} λ_{i,j} = 1_K. Therefore v = ∑_{i=1}^n k_i (∑_{j=1}^{n_i} λ_{i,j} y_{i,j}) = ∑_{i=1}^n ∑_{j=1}^{n_i} k_i λ_{i,j} y_{i,j} = ∑_{ℓ=1}^N k_{i_ℓ} λ_{i_ℓ,j_ℓ} y_{i_ℓ,j_ℓ}, where N = ∑_{i=1}^n n_i, and i_ℓ = min{i′ ∈ ℤ_0^+; ∑_{i=1}^{i′} n_i ≥ ℓ} and j_ℓ = ℓ − ∑_{i=1}^{i_ℓ−1} n_i for all ℓ ∈ ℕ_N. So v ∈ conv(S). Hence conv(S) is a convex set by Definition 22.11.8.
22.11.15 Theorem: A subset S of a linear space is a convex set if and only if conv(S) = S.
Proof: Let S be a subset of a linear space V . If S is a convex set, then by Definition 22.11.8, every convex
combination of elements of S is in S. Therefore conv(S) = S by Definition 22.11.10. Now suppose that
conv(S) = S. Then S is a convex set by Theorem 22.11.14. Hence the assertion is proved.
22.11.17 Definition: The convex generator of a subset S of a linear space V is the minimum subset T
of V such that the convex span of T equals the convex span of S.
22.11.18 Remark: Alternative terminology for convex spans and convex generators.
The convex span of a set S in Definition 22.11.10 is generally referred to as the convex hull of S. However,
EDM2 [109], 89.A, page 331, uses the term convex hull for Definition 22.11.17 and gives the notation [S]
for this. Yosida [161], page 363, calls Definition 22.11.10 the convex hull and gives the notation Conv(S).
Rudin [126], page 72, also calls Definition 22.11.10 the convex hull, and gives the notation co(S). See
Table 22.11.1 for further options.
Guggenheimer [16], page 250, uses the term convex hull for the boundary of the smallest convex domain
whose interior contains the interior of S, and gives the notation S for this. This corresponds roughly to
Definition 22.11.17, but requires a topological context.
The term convex hull does suggest the boundary of a convex set, in other words a minimal (convex)
spanning set for a given convex set. The English word hull means a shell, pod, husk, outer covering or
envelope, or the body or frame of a ship. This strongly suggests Definition 22.11.17. The literature may be
evenly split between Definitions 22.11.10 and 22.11.17 for the convex hull. Therefore it is safest to avoid
the term.
22.11.20 Remark: Convex combinations in affine spaces.
The concept of an affine space is often argued to be closer to the common intuition of space than a linear
space, since affine spaces have no specified origin. (See Chapter 26 for affine spaces.) The concept of a
hyperplane segment in Definition 26.9.12 is applicable to a more general affine space over a unitary module
over an ordered ring.
22.11.22 Theorem: The Minkowski sum of convex subsets of a linear space over an ordered field is convex.
Proof: Let S_1, S_2 be convex subsets of a linear space V over an ordered field K. Let w be a convex combination of elements of the Minkowski sum S_1 + S_2. Then w = λ_1 w_1 + λ_2 w_2 for some w_1, w_2 ∈ S_1 + S_2 and λ_1, λ_2 ∈ K_0^+ with λ_1 + λ_2 = 1, by Definition 22.11.2. But w_j = v_{j,1} + v_{j,2} for some v_{j,i} ∈ S_i for i = 1, 2 and j = 1, 2. So w = λ_1(v_{1,1} + v_{1,2}) + λ_2(v_{2,1} + v_{2,2}) = (λ_1 v_{1,1} + λ_2 v_{2,1}) + (λ_1 v_{1,2} + λ_2 v_{2,2}) ∈ S_1 + S_2 by Definitions 22.11.8 and 22.10.2. Hence S_1 + S_2 is a convex set by Definition 22.11.8. The case of Minkowski sums of general finite families of convex sets follows similarly.
if x y. Therefore I is convex. Conversely, if I is convex then I satisfies Definition 16.1.4 for a real-number
interval.
Part (ii) follows from part (i) and Theorem 22.11.22.
Chapter 23
Linear maps and dual spaces
functionals to those which are bounded (i.e. continuous), the fathomless depths of infinitude of dual spaces are removed from consideration. (For example, the algebraic dual of ℝ^ω is rarely mentioned.)
In Chapter 23, some of the difficulties of unbounded algebraic duals are considered. Even a relatively benign space such as FL(ω, ℝ), which is the linear space of real sequences with finite support, has the dual UL(ω, ℝ) = ℝ^ω, which is the linear space of completely general real sequences, and the double dual of FL(ω, ℝ) is difficult to describe because of choice-function issues. These give some idea of why algebraic duals and double duals of infinite-dimensional spaces should be avoided.
∀λ_1, λ_2 ∈ K, ∀v_1, v_2 ∈ V, φ(λ_1 v_1 + λ_2 v_2) = λ_1 φ(v_1) + λ_2 φ(v_2).

Definition 23.1.1 is equivalent to linearity for arbitrary linear combinations, as shown in Theorem 23.1.3.
23.1.3 Theorem: Let V and W be linear spaces over a field K. Let φ : V → W. Then φ is a linear map from V to W if and only if

∀n ∈ ℤ_0^+, ∀λ ∈ K^n, ∀v ∈ V^n, φ(∑_{i=1}^n λ_i v_i) = ∑_{i=1}^n λ_i φ(v_i). (23.1.1)

Hence

∀λ ∈ Fin(V, K), φ(∑_{v∈V} λ_v v) = ∑_{v∈V} λ_v φ(v). (23.1.2)
Proof: Let V and W be linear spaces over a field K. Let φ : V → W. Suppose φ satisfies line (23.1.1). Let v_1, v_2 ∈ V. Then with n = 2, λ_1 = λ_2 = 1_K, line (23.1.1) implies that φ(v_1 + v_2) = φ(v_1) + φ(v_2), which verifies Definition 23.1.1 (i). Now suppose that λ ∈ K and w ∈ V. Then with n = 1, λ_1 = λ and v_1 = w, line (23.1.1) implies that φ(λw) = λφ(w), which verifies Definition 23.1.1 (ii).
To show the converse, suppose that φ is a linear map according to Definition 23.1.1. Let n = 0 in line (23.1.1). Then λ and v are empty sequences. So ∑_{i=1}^n λ_i v_i = 0_V. Therefore φ(∑_{i=1}^n λ_i v_i) = φ(0_V) = φ(0_K 0_V) = 0_K φ(0_V) = 0_W by Definition 23.1.1 (ii). Similarly, ∑_{i=1}^n λ_i φ(v_i) = 0_W. So line (23.1.1) is verified for n = 0. Suppose that line (23.1.1) is valid for some n ∈ ℤ_0^+. Let λ ∈ K^{n+1} and w ∈ V^{n+1}. Then φ(λ_{n+1} w_{n+1}) = λ_{n+1} φ(w_{n+1}) by Definition 23.1.1 (ii). So φ(∑_{i=1}^{n+1} λ_i w_i) = φ(λ_{n+1} w_{n+1} + ∑_{i=1}^n λ_i w_i) = φ(λ_{n+1} w_{n+1}) + φ(∑_{i=1}^n λ_i w_i) = λ_{n+1} φ(w_{n+1}) + ∑_{i=1}^n λ_i φ(w_i) = ∑_{i=1}^{n+1} λ_i φ(w_i) by Definition 23.1.1 (i) and the inductive assumption. Hence line (23.1.1) holds for all n ∈ ℤ_0^+ by induction.
23.1.4 Notation: Lin(V, W ) denotes the set of linear maps from the linear space V to the linear space W .
23.1.5 Remark: Alternative notations for the space of linear maps between two spaces.
The set of linear maps Lin(V, W ) is denoted as L(V, W ) by Rudin [125], page 184, and as Hom(V, W ) by
Federer [67], page 11. The Hom notation is unnecessarily unspecific. The concept of a homomorphism is
very broad indeed. It seems therefore preferable to save the Hom notation for when it is really needed.
The notation HomK (V, W ) is used by some authors. (For example EDM2 [109], section 256.D.) But the
field K is implied in the specification tuples for V and W . It is best to avoid tacking useful hints onto
notations. (A similar kind of useful hint is the annoying practice of writing a manifold M as M n to remind
the reader of the dimension of the manifold.)
23.1.6 Definition: The linear space of linear maps from a linear space V to a linear space W is the set
Lin(V, W ) of linear maps from V to W , together with the operations of pointwise vector addition and scalar
multiplication given by
$\forall f_1, f_2 \in \mathrm{Lin}(V, W),\ \forall v \in V,\quad (f_1 + f_2)(v) = f_1(v) + f_2(v)$
and
$\forall \lambda \in K,\ \forall f \in \mathrm{Lin}(V, W),\ \forall v \in V,\quad (\lambda f)(v) = \lambda f(v).$
23.1.7 Remark: The possible difficulty of constructing non-trivial linear maps.
For any two linear spaces V and W, the space $\mathrm{Lin}(V, W)$ contains at least the zero map $f : V \to W$ defined by $f(v) = 0_W$ for all $v \in V$. If V is equipped with a basis, linear maps are readily constructed from the component map, which is well-defined by Theorem 22.8.11. Otherwise, it is not guaranteed that there are linear maps other than the zero map in general. (This issue is also mentioned in Remark 23.5.1 and elsewhere in Section 23.5.)
23.1.8 Definition: Linear space morphisms.
Let $V_1$, $V_2$ and V be linear spaces over a field K.
A linear space homomorphism from $V_1$ to $V_2$ is a linear map $\phi : V_1 \to V_2$.
A linear space monomorphism from $V_1$ to $V_2$ is an injective linear space homomorphism $\phi : V_1 \to V_2$.
A linear space epimorphism from $V_1$ to $V_2$ is a surjective linear space homomorphism $\phi : V_1 \to V_2$.
A linear space isomorphism from $V_1$ to $V_2$ is a bijective linear space homomorphism $\phi : V_1 \to V_2$.
A linear space endomorphism on V is a linear space homomorphism $\phi : V \to V$.
A linear space automorphism on V is a linear space isomorphism $\phi : V \to V$.
23.1.12 Notation: GL(V ), for a linear space V , denotes the set of linear space automorphisms on V . In
other words, GL(V ) = Aut(V ).
23.1.13 Theorem: Let U, V and W be linear spaces over a field K. Let $\phi_1 : U \to V$ and $\phi_2 : V \to W$ be linear maps. Then $\phi_2 \circ \phi_1 : U \to W$ is a linear map.
23.1.14 Theorem: The inverse of a linear space isomorphism is a linear space isomorphism.
Let $\phi : V \to W$ be a linear space isomorphism between linear spaces V and W. Then $\phi^{-1} : W \to V$ is a linear space isomorphism.
Proof: Let $\phi : V \to W$ be a linear space isomorphism between linear spaces V and W. Then $\phi^{-1} : W \to V$ is a well-defined bijection because $\phi$ is a bijection.
Let $\lambda \in \mathrm{Fin}(W, K)$ and let $u = \phi^{-1}\bigl(\sum_{w \in W} \lambda_w w\bigr)$. Then $\phi(u) = \sum_{w \in W} \lambda_w w = \sum_{w \in W} \lambda_w \phi(\phi^{-1}(w)) = \phi\bigl(\sum_{w \in W} \lambda_w \phi^{-1}(w)\bigr)$ because $\phi$ is a linear map. So $u = \sum_{w \in W} \lambda_w \phi^{-1}(w)$ because $\phi$ is injective. Therefore $\phi^{-1}$ is linear.
23.1.15 Theorem: The component map of a linear space with a basis is an isomorphism.
Let B be a basis set for a linear space V over a field K. Then the component map $\kappa_B : V \to \mathrm{FL}(B, K)$ is an isomorphism.
Proof: Let B be a basis set for a linear space V over a field K. Then by Theorem 22.8.11, the component map $\kappa_B : V \to \mathrm{FL}(B, K)$ is a well-defined bijection. The inverse of $\kappa_B$ is the linear combination map $\sigma_B : \mathrm{FL}(B, K) \to V$ defined by $\sigma_B(\lambda) = \sum_{e \in B} \lambda_e e$ for all $\lambda \in \mathrm{Fin}(B, K)$. Clearly $\sigma_B$ is a linear map. So $\sigma_B$ is an isomorphism. Therefore $\kappa_B$ is an isomorphism by Theorem 23.1.14.
23.1.16 Remark: The general linear group of a linear space is a left transformation group.
The group operation $\sigma_G$ in Theorem 23.1.17 uses the order $g_1 \circ g_2$ so that the left transformation group action rule will be satisfied. If the order is reversed to $g_2 \circ g_1$, then the tuple becomes a right transformation group as in Definition 20.7.2, assuming that the action map $\mu_{VG}$ is transposed.
23.1.17 Theorem: Let V be a linear space over a field K. Let $G = \mathrm{GL}(V)$. Define $\sigma_G : G \times G \to G$ by $\sigma_G : (g_1, g_2) \mapsto g_1 \circ g_2$. Define $\mu_{VG} : G \times V \to V$ by $\mu_{VG} : (g, v) \mapsto g(v)$. Then the tuple $(\mathrm{GL}(V), V, \sigma_G, \mu_{VG})$ is an effective left transformation group on V.
Proof: The closure of G under the composition operation $\sigma_G$ follows from Theorem 23.1.13. Existence of inverses follows from Theorem 23.1.14. The identity map $\mathrm{id}_V : V \to V$ is an identity element for G. So $(G, \sigma_G)$ is a group by Definition 17.3.2.
Let $g_1, g_2 \in G$ and $v \in V$. Then $\mu_{VG}(\sigma_G(g_1, g_2), v) = \mu_{VG}(g_1 \circ g_2, v) = (g_1 \circ g_2)(v) = g_1(g_2(v)) = \mu_{VG}(g_1, \mu_{VG}(g_2, v))$. So $\mu_{VG}$ and $\sigma_G$ satisfy Definition 20.1.2 (i). Let $v \in V$. Then $\mu_{VG}(\mathrm{id}_V, v) = \mathrm{id}_V(v) = v$. So $\mu_{VG}$ and $\mathrm{id}_V$ satisfy Definition 20.1.2 (ii). Hence $(\mathrm{GL}(V), V, \sigma_G, \mu_{VG})$ is a left transformation group on V by Definition 20.1.2.
G acts effectively on V by Definition 20.2.1 because if $\mu_{VG}(g, v) = v$ for all $v \in V$, then $g : V \to V$ is the identity map, which is the identity element of G. Hence $(\mathrm{GL}(V), V, \sigma_G, \mu_{VG})$ is an effective left transformation group on V by Definition 20.2.1.
23.1.18 Remark: Finite-dimensional linear spaces are fully classified by their dimension.
The following comment, which is related to Theorem 23.1.20, is made by Gilmore [80], page 12.
Because of the profound result that all N -dimensional vector spaces over the same field are
equivalent (this is not true of groups with n elements or of algebras with the same dimension),
it is necessary to study only one vector space of any dimension N in any detail.
Probably much more profound than this is the fact that all bases for a finite-dimensional linear space
have the same cardinality by Theorem 22.7.13. So the dimension of a linear space is uniquely determined
by the cardinality of any basis. (All other bases have the same cardinality.) Some authors define the
dimension of a linear space to be the cardinality of a linearly independent spanning set for that space. (In
Definition 22.5.2, the dimension is defined as the minimum cardinality of spanning sets without mentioning
linear independence.) When the dimension is defined in terms of a basis, the profound observation is that
the cardinality of a single finite basis fully determines the space up to isomorphism because all other bases
have the same cardinality. Therefore the nature of the profound observation depends on exactly how one
defines dimension. (The motivation for the minimum spanning set cardinality style of definition is to ensure
that every linear space has a well-defined dimension even if no basis can be proved to exist.)
Theorems 23.1.19 and 23.1.20 extend the profound result for finite-dimensional spaces to general linear
spaces with bases which are equinumerous in the sense of Definition 13.1.2.
23.1.19 Theorem: Let $\beta : B_U \to B_V$ be a bijection for sets $B_U$ and $B_V$. Let K be a field. Define $\bar\phi : \mathrm{FL}(B_U, K) \to \mathrm{FL}(B_V, K)$ by
$\forall f \in \mathrm{FL}(B_U, K),\quad \bar\phi(f) = f \circ \beta^{-1}. \quad (23.1.3)$
Then $\bar\phi$ is a linear space isomorphism.
Proof: The map $\bar\phi$ on line (23.1.3) is well defined because $\mathrm{Dom}(f) = B_U = \mathrm{Dom}(\beta) = \mathrm{Range}(\beta^{-1})$ and $\mathrm{Dom}(f \circ \beta^{-1}) = \mathrm{Range}(\beta) = B_V$. To show the linearity of $\bar\phi$, let $f, g \in \mathrm{FL}(B_U, K)$ and $\lambda, \mu \in K$. Then for all $j \in B_V$, it follows from the definitions of vector addition and scalar multiplication on $\mathrm{FL}(B_V, K)$ that $\bar\phi(\lambda f + \mu g)(j) = (\lambda f + \mu g)(\beta^{-1}(j)) = \lambda f(\beta^{-1}(j)) + \mu g(\beta^{-1}(j)) = \lambda \bar\phi(f)(j) + \mu \bar\phi(g)(j)$. Hence $\bar\phi(\lambda f + \mu g) = \lambda \bar\phi(f) + \mu \bar\phi(g)$. So $\bar\phi$ is a linear map by Definition 23.1.1. The fact that $\bar\phi$ is an isomorphism follows from the observation that $\bar\psi \circ \bar\phi = \mathrm{id}_{\mathrm{FL}(B_U,K)}$ and $\bar\phi \circ \bar\psi = \mathrm{id}_{\mathrm{FL}(B_V,K)}$, where $\bar\psi : \mathrm{FL}(B_V, K) \to \mathrm{FL}(B_U, K)$ is the map defined according to line (23.1.3) for the inverse map $\beta^{-1} : B_V \to B_U$.
23.1.20 Theorem: Let U and V be linear spaces with bases BU and BV which are equinumerous. (In
other words, there exists a bijection between BU and BV .) Then U and V are isomorphic linear spaces.
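The isomorphism of Theorem 23.1.19 amounts to relabelling components along the bijection. A minimal sketch, representing finite-support functions as dicts (the names `phi_bar` and `beta` are illustrative, not the book's notation):

```python
def phi_bar(beta, f):
    """Given a bijection beta : B_U -> B_V (as a dict) and
    f in FL(B_U, K) (as a {basis element: coefficient} dict),
    return the relabelled function f o beta^{-1} in FL(B_V, K)."""
    return {beta[e]: c for e, c in f.items()}

beta = {'a': 'x', 'b': 'y', 'c': 'z'}   # bijection B_U -> B_V
f = {'a': 2.0, 'c': -1.0}               # the combination 2a - c
print(phi_bar(beta, f))                 # {'x': 2.0, 'z': -1.0}
```

Since relabelling commutes with pointwise addition and scalar multiplication, `phi_bar` is linear, and relabelling along the inverse bijection undoes it, which is the isomorphism property.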
23.1.22 Definition: The kernel of a linear map $\phi : V \to W$ for linear spaces V and W over the same field is the subspace $\{v \in V;\ \phi(v) = 0_W\}$ of V.
23.1.23 Notation: $\ker(\phi)$ denotes the kernel of a linear map $\phi : V \to W$ for linear spaces V and W over the same field. In other words, $\ker(\phi) = \phi^{-1}(\{0_W\})$.
23.1.24 Theorem: The kernel of a linear map is trivial if and only if the map is injective.
Let $\phi : V \to W$ be a linear map for linear spaces V and W over the same field. Then $\ker(\phi) = \{0_V\}$ if and only if $\phi$ is injective.
Proof: Suppose that $\phi$ is injective. Then $\phi(v) \ne \phi(0_V) = 0_W$ for all $v \in V \setminus \{0_V\}$. So $\ker(\phi) = \{0_V\}$.
To show the converse, assume that $\ker(\phi) = \{0_V\}$. Suppose that $\phi$ is not injective. Then there are vectors $v_1, v_2 \in V$ with $v_1 \ne v_2$ and $\phi(v_1) = \phi(v_2)$. So $\phi(v_2 - v_1) = \phi(v_2) - \phi(v_1) = 0_W$ by the linearity of $\phi$. But $v_2 - v_1 \in \ker(\phi)$ and $v_2 - v_1 \ne 0_V$, which contradicts the assumption. Therefore $\phi$ is injective.
23.1.25 Remark: The nullity of a linear map is the dimension of its kernel.
The dimension of the kernel of a linear map is well defined because the dimension of any linear space is well defined, as mentioned in Remark 22.5.5. This well-defined concept is given a name in Definition 23.1.26.
23.1.26 Definition: The nullity of a linear map $\phi : V \to W$ for linear spaces V and W over the same field is the dimension of the kernel of $\phi$.
23.1.27 Notation: $\mathrm{nullity}(\phi)$ denotes the nullity of a linear map $\phi : V \to W$ for linear spaces V and W over the same field. In other words, $\mathrm{nullity}(\phi) = \dim(\ker(\phi))$.
23.1.28 Theorem: Let $\phi : V \to W$ be a linear map for linear spaces V and W over the same field. Then $\phi$ is injective if and only if $\mathrm{nullity}(\phi) = 0$.
Proof: The assertion follows from Theorem 23.1.24 and Definition 23.1.26.
23.1.29 Remark: Relation between linear map injectivity and the rank-nullity theorem.
Theorem 23.1.28 is related to the rank-nullity formula, which follows directly from the first isomorphism theorem for linear spaces. (See Remark 24.2.15 for the three isomorphism theorems which are valid for a wide range of algebraic categories.) Theorem 24.2.18 states that $\dim(\mathrm{Range}(\phi)) + \mathrm{nullity}(\phi) = \dim(V)$ for any linear map $\phi : V \to W$ between linear spaces V and W over the same field such that V is finite-dimensional. The special case $\mathrm{nullity}(\phi) = 0$ implies that $\phi$ is injective, which implies that $\dim(\mathrm{Range}(\phi)) = \dim(V)$ because $\mathrm{Range}(\phi)$ and V are then isomorphic.
23.1.30 Remark: Expression for a linear map in terms of a basis for the domain space.
Section 23.2 is concerned with component maps for linear maps, which require a basis for both the domain
and target spaces. But with only a basis for the domain space, it is possible to express linear maps as linear
combinations of values for the domain basis.
23.1.31 Theorem: Let V and W be linear spaces over a field K. Let $B = (e_i)_{i \in I}$ be a basis for V. Let $\phi : V \to W$ be a linear map. Then
$\forall v \in V,\quad \phi(v) = \textstyle\sum_{i \in I} \kappa_B(v)_i\, \phi(e_i).$
Proof: Let $v \in V$. Then $v = \sum_{i \in I} \kappa_B(v)_i e_i$ by Definition 22.8.7. Therefore $\phi(v) = \sum_{i \in I} \kappa_B(v)_i \phi(e_i)$ by Theorem 23.1.3.
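For the standard basis of $K^n$, the component map is just coordinate extraction, so the theorem above says that a linear map is completely determined by its values on the basis vectors. A small illustrative sketch (the representation and names are chosen here, not taken from the book):

```python
# A linear map phi : R^3 -> R^2 determined only by its values on the
# standard basis.  kappa_B(v)_i is simply the i-th coordinate of v.

phi_of_e = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # phi(e_1), phi(e_2), phi(e_3)

def phi(v):
    """Compute phi(v) = sum_i kappa_B(v)_i * phi(e_i)."""
    out = [0.0, 0.0]
    for comp, w in zip(v, phi_of_e):
        out[0] += comp * w[0]
        out[1] += comp * w[1]
    return tuple(out)

print(phi((2.0, 3.0, 1.0)))   # (2*1 + 1*1, 3*1 + 1*1) = (3.0, 4.0)
```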
Figure 23.2.1 Component maps for a linear map [diagram omitted]
Let $\phi : V_1 \to V_2$ be a linear map between linear spaces $V_1$ and $V_2$ which have respective bases $B_\ell$, or indexed bases $B_\ell = (e^\ell_i)_{i \in I_\ell}$, for $\ell = 1, 2$. By the definition of a linear space basis, the component maps $\kappa_{B_\ell} : V_\ell \to \mathrm{Fin}(B_\ell, K)$ (or $\kappa_{B_\ell} : V_\ell \to \mathrm{Fin}(I_\ell, K)$) are well defined for $\ell = 1, 2$. Therefore the component map $\phi_{B_1,B_2} = \kappa_{B_2} \circ \phi \circ \sigma_{B_1} : \mathrm{Fin}(B_1, K) \to \mathrm{Fin}(B_2, K)$ is well defined, where the vector component map $\kappa_{B_2} : V_2 \to \mathrm{Fin}(B_2, K)$ is as in Definition 22.8.6, and the linear combination map $\sigma_{B_1} : \mathrm{Fin}(B_1, K) \to V_1$ is defined by $\sigma_{B_1} : k \mapsto \sum_{v \in B_1} k(v)v$.
It is more convenient to use an indexed basis. Then the component map is $\phi_{B_1,B_2} = \kappa_{B_2} \circ \phi \circ \sigma_{B_1} : \mathrm{Fin}(I_1, K) \to \mathrm{Fin}(I_2, K)$, where the vector component map $\kappa_{B_2} : V_2 \to \mathrm{Fin}(I_2, K)$ is as in Definition 22.8.7, and the linear combination map $\sigma_{B_1} : \mathrm{Fin}(I_1, K) \to V_1$ is defined by $\sigma_{B_1} : k \mapsto \sum_{i \in I_1} k_i e^1_i$. This is now formalised as Definition 23.2.2.
23.2.2 Definition: The component map for a linear map $\phi : V_1 \to V_2$, for linear spaces $V_\ell$ over a field K with indexed bases $B_\ell = (e^\ell_i)_{i \in I_\ell}$ for $\ell = 1, 2$, is the map $\phi_{B_1,B_2} = \kappa_{B_2} \circ \phi \circ \sigma_{B_1} : \mathrm{Fin}(I_1, K) \to \mathrm{Fin}(I_2, K)$, where the component map $\kappa_{B_2} : V_2 \to \mathrm{Fin}(I_2, K)$ is as in Definition 22.8.7, and the linear combination map $\sigma_{B_1} : \mathrm{Fin}(I_1, K) \to V_1$ is defined by $\sigma_{B_1} : k \mapsto \sum_{i \in I_1} k_i e^1_i$.
23.2.3 Remark: Linear-map component maps in the special case that the domain and range are equal.
If $V_1 = V_2 = V$ in Definition 23.2.2 and $\phi : V_1 \to V_2$ is the identity map on V, then the component map for $\phi$ is equivalent to the component transition map in Definition 22.9.4 (which is the component transition map for a change of basis).
When $V_1 = V_2 = V$ and $\phi \in \mathrm{Aut}(V)$, the component maps $\phi_{B_1,B_2}$ and $\phi_{B_2,B_1}$ are both well defined. If $\phi = \mathrm{id}_V$, then these component maps are inverses of each other.
23.2.5 Definition: The component array for a linear map $\phi : V_1 \to V_2$ between linear spaces $V_\ell$ over a field K with respect to indexed bases $B_\ell = (e^\ell_i)_{i \in I_\ell}$ for $\ell = 1, 2$ is the double family $(a_{ji})_{j \in I_2, i \in I_1}$, where $a_{ji} = \kappa_{B_2}(\phi(e^1_i))_j$ for all $j \in I_2$ and $i \in I_1$.
23.2.6 Theorem: Let $(a_{ji})_{j \in I_2, i \in I_1}$ be the component array for a linear map $\phi : V_1 \to V_2$ with respect to indexed bases $B_\ell = (e^\ell_i)_{i \in I_\ell}$ for $\ell = 1, 2$. Then
$\forall v \in V_1,\ \forall j \in I_2,\quad \kappa_{B_2}(\phi(v))_j = \textstyle\sum_{i \in I_1} a_{ji}\, \kappa_{B_1}(v)_i \quad (23.2.1)$
and
$\forall v \in V_1,\quad \phi(v) = \textstyle\sum_{j \in I_2} \sum_{i \in I_1} a_{ji}\, \kappa_{B_1}(v)_i\, e^2_j. \quad (23.2.2)$
Proof: Line (23.2.1) follows directly from Definitions 23.2.5 and 22.8.7.
Line (23.2.2) follows from line (23.2.1), Theorem 23.1.3 and Definition 22.8.7.
23.2.8 Definition: The linear map component map for basis sets $B_1$ and $B_2$ of linear spaces $V_1$ and $V_2$ respectively over a field K is the map $\kappa_{B_1,B_2} : \mathrm{Lin}(V_1, V_2) \to \mathrm{Fin}(B_2 \times B_1, K)$ with
$\forall \phi \in \mathrm{Lin}(V_1, V_2),\ \forall e_1 \in B_1,\ \forall e_2 \in B_2,\quad \kappa_{B_1,B_2}(\phi)(e_2, e_1) = \kappa_{B_2}(\phi(e_1))(e_2).$
23.2.9 Definition: The linear map component map for basis families $B_1 = (e^1_i)_{i \in I_1}$ and $B_2 = (e^2_j)_{j \in I_2}$ for linear spaces $V_1$ and $V_2$ respectively over a field K is the map $\kappa_{B_1,B_2} : \mathrm{Lin}(V_1, V_2) \to \mathrm{Fin}(I_1 \times I_2, K)$ with
$\forall \phi \in \mathrm{Lin}(V_1, V_2),\ \forall i \in I_1,\ \forall j \in I_2,\quad \kappa_{B_1,B_2}(\phi)(i, j) = \kappa_{B_2}(\phi(e^1_i))_j.$
23.2.10 Definition: The linear map component map for finite bases $B_1 = (e^1_i)_{i=1}^n$ and $B_2 = (e^2_j)_{j=1}^m$ for linear spaces $V_1$ and $V_2$ respectively over a field K is the map $\kappa_{B_1,B_2} : \mathrm{Lin}(V_1, V_2) \to K^{m \times n}$ with
$\forall \phi \in \mathrm{Lin}(V_1, V_2),\ \forall i \in \mathbb{N}_n,\ \forall j \in \mathbb{N}_m,\quad \kappa_{B_1,B_2}(\phi)(j, i) = \kappa_{B_2}(\phi(e^1_i))_j.$
23.2.11 Remark: Comparison of proofs of inverse linear-map and reverse basis-transition theorems.
The proof of Theorem 23.2.12 is, unsurprisingly, very similar to the proof of Theorem 22.9.8, which is effectively the special case of Theorem 23.2.12 for $V = V_1 = V_2$ and $\phi = \mathrm{id}_V$.
23.2.12 Theorem: The component arrays of a linear map and its inverse are inverse matrices.
Let $B_\ell = (e^\ell_i)_{i \in I_\ell}$ be bases for linear spaces $V_\ell$ over a field K for $\ell = 1, 2$. Let $\phi : V_1 \to V_2$ be a linear space isomorphism. Let $(a_{ji})_{j \in I_2, i \in I_1}$ and $(\bar a_{ij})_{i \in I_1, j \in I_2}$ be the linear-map component arrays with respect to $B_1$ and $B_2$ for $\phi$ and $\phi^{-1}$ respectively. Then
$\forall i, j \in I_2,\quad \textstyle\sum_{m \in I_1} a_{im} \bar a_{mj} = \delta_{ij} \quad (23.2.3)$
and
$\forall i, j \in I_1,\quad \textstyle\sum_{m \in I_2} \bar a_{im} a_{mj} = \delta_{ij}. \quad (23.2.4)$
Proof: For line (23.2.3), let $v_j = \phi^{-1}(e^2_j)$ for $j \in I_2$. Then $v_j \in V_1$ for all $j \in I_2$, and $e^2_j = \phi(v_j) = \sum_{i \in I_2} \sum_{m \in I_1} a_{im} \kappa_{B_1}(v_j)_m\, e^2_i$ by Theorem 23.2.6 line (23.2.2). But $e^2_j = \sum_{i \in I_2} \delta_{ij} e^2_i$ for all $j \in I_2$, and by Definition 22.7.2 and Theorem 22.6.8 (iii), $e^2_j$ can be expressed in at most one way as a linear combination of elements of $B_2$. So $\delta_{ij} = \sum_{m \in I_1} a_{im} \kappa_{B_1}(v_j)_m$ for all $i, j \in I_2$. However, $\kappa_{B_1}(v_j)_m = \kappa_{B_1}(\phi^{-1}(e^2_j))_m = \bar a_{mj}$ for all $j \in I_2$ and $m \in I_1$ by Definition 23.2.5. Hence $\delta_{ij} = \sum_{m \in I_1} a_{im} \bar a_{mj}$ for all $i, j \in I_2$.
Line (23.2.4) follows exactly as for line (23.2.3) by swapping $\phi$ and $\phi^{-1}$.
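The inverse-array relation can be checked numerically in a small case. The sketch below uses exact rational arithmetic and the explicit 2x2 inverse formula; the helper names are choices made here, not notation from the book:

```python
from fractions import Fraction as F

# Component array of an isomorphism phi of a 2-dimensional space,
# and the component array of phi^{-1}, from the 2x2 inverse formula.
A = [[F(2), F(1)],
     [F(1), F(1)]]
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
A_inv = [[ A[1][1]/det, -A[0][1]/det],
         [-A[1][0]/det,  A[0][0]/det]]

def matmul(X, Y):
    """2x2 matrix product, i.e. the sums over m in lines (23.2.3)-(23.2.4)."""
    return [[sum(X[i][m]*Y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

# Both products are the identity array, as the theorem asserts.
print(matmul(A, A_inv))   # [[1, 0], [0, 1]]
print(matmul(A_inv, A))   # [[1, 0], [0, 1]]
```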
23.3. Trace of a linear map
23.3.1 Remark: Invariance of the trace of a linear space endomorphism.
The most important fact about the trace of a linear space endomorphism in Definition 23.3.3 is that its value is independent of the choice of basis. (For the component map $\kappa_B : V \to K^n$, see Definition 22.8.8.) This invariance is probably also the most annoying fact about the trace. Although the diagonal elements which are added to compute the trace all individually change when the basis changes, somehow the sum of these elements is invariant.
The trace of linear space endomorphisms finds extensive applications in differential geometry, especially for
the contraction of indices of tensors. The requirement of a basis for the definition of the trace is a particular
embarrassment for anti-components campaigners, some of whom would probably ban coordinates, bases and
indices outright if they could. Since the trace is basis-independent, one can at least hide the role of a basis
in its definition by using an abstract notation such as $\operatorname{Tr}(\phi)$. (Nostalgia for the good old days of synthetic
geometry is somewhat misplaced. Anyone who has experienced the rigours of traditional Euclidean geometry
knows that it was infested with non-obvious constructions and plethoras of tedious labels for points, lines,
angles, triangles and other figures, and its scope was limited to straight lines and a few conic sections in flat
space. The arithmetisation of geometry by Fermat and Descartes achieved substantial simplification and a
huge broadening of scope which more than compensate for the lack of geometric immediacy. It is analogous
to the replacement of analogue computers with digital computers, which replace voltages with bits.)
23.3.2 Theorem: Let V be a finite-dimensional linear space over a field K with $\dim(V) = n$. Let $\phi \in \mathrm{Lin}(V, V)$. Then the sum $\sum_{i=1}^n \kappa_B(\phi(e_i))_i$ has the same value for every basis $B = (e_i)_{i=1}^n$ for V.
Proof: Let $\phi \in \mathrm{Lin}(V, V)$ for a finite-dimensional linear space V. Let $B_1 = (e^1_i)_{i=1}^n$ and $B_2 = (e^2_j)_{j=1}^n$ be bases for V. Let $a_{ji} = \kappa_{B_2}(e^1_i)_j$ and $\bar a_{ij} = \kappa_{B_1}(e^2_j)_i$ for all $i, j \in \mathbb{N}_n$. Then
$\sum_{j=1}^n \kappa_{B_2}(\phi(e^2_j))_j = \sum_{j=1}^n \kappa_{B_2}\bigl(\phi\bigl(\textstyle\sum_{i=1}^n \bar a_{ij} e^1_i\bigr)\bigr)_j \quad (23.3.1)$
$= \sum_{j=1}^n \kappa_{B_2}\bigl(\textstyle\sum_{i=1}^n \bar a_{ij}\, \phi(e^1_i)\bigr)_j \quad (23.3.2)$
$= \sum_{i,j=1}^n \bar a_{ij}\, \kappa_{B_2}(\phi(e^1_i))_j \quad (23.3.3)$
$= \sum_{i=1}^n \kappa_{B_1}(\phi(e^1_i))_i, \quad (23.3.4)$
where line (23.3.1) follows from Theorem 22.9.11 line (22.9.3), line (23.3.2) follows from the linearity of $\phi$, line (23.3.3) follows from Theorem 22.8.12, and line (23.3.4) follows from Theorem 22.9.11 line (22.9.6).
23.3.3 Definition: The trace of a linear space endomorphism $\phi \in \mathrm{Lin}(V, V)$, for a finite-dimensional linear space V over a field K, is the element $\sum_{i=1}^n \kappa_B(\phi(e_i))_i$ of K, where $B = (e_i)_{i=1}^n$ is any basis for V.
23.3.4 Notation: $\operatorname{Tr}(\phi)$, for a linear space endomorphism $\phi \in \mathrm{Lin}(V, V)$ on a finite-dimensional linear space V, denotes the trace of $\phi$.
23.3.5 Remark: Geometric interpretation of the trace of a linear map as its divergence.
Without resorting to analytical concepts, it is possible to give a geometric kind of interpretation for the trace of a linear map. Consider the family of transformations $\theta : \mathbb{R} \to \mathrm{Lin}(V, V)$ given by $\theta(t)(v) = v + tL(v)$ for a linear map $L \in \mathrm{Lin}(V, V)$ for some finite-dimensional linear space V. Given any figure $S \subseteq V$, such as a rectangular or ellipsoidal region with finite positive volume, the image $\theta(t)(S)$ of S under the map $\theta(t)$ will have a volume which is a polynomial function with respect to t. The constant term in this polynomial is the initial volume of $S = \theta(0)(S)$. The first-order term has the form $t \operatorname{Tr}(L) \operatorname{vol}(S)$. This follows from the well-known properties of the characteristic polynomial for a linear map. Thus the trace of a linear map may be identified with the divergence of the linear map $v \mapsto Lv$.
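Both claims of this section can be checked exactly in a 2x2 example: the trace survives a change of basis, and the coefficient of t in $\det(I + tL)$ is $\operatorname{Tr}(L)$, which is the divergence interpretation above. An illustrative sketch with exact rational arithmetic (the helper names are choices made here):

```python
from fractions import Fraction as F

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][m]*Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

L = [[F(3), F(1)],
     [F(4), F(2)]]

# Change of basis: the component array becomes P^{-1} L P, and although
# each diagonal entry changes, the trace does not.
P     = [[F(1), F(2)], [F(0), F(1)]]
P_inv = [[F(1), F(-2)], [F(0), F(1)]]
L2 = matmul(P_inv, matmul(L, P))
print(trace(L), trace(L2))        # 5 5

# Divergence interpretation: for 2x2, det(I + tL) = 1 + t*Tr(L) + t^2*det(L),
# so the first-order volume-growth coefficient is exactly Tr(L).
t = F(1, 10)
I_tL = [[1 + t*L[0][0], t*L[0][1]], [t*L[1][0], 1 + t*L[1][1]]]
det = I_tL[0][0]*I_tL[1][1] - I_tL[0][1]*I_tL[1][0]
det_L = L[0][0]*L[1][1] - L[0][1]*L[1][0]
print(det == 1 + t*trace(L) + t**2 * det_L)   # True
```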
23.4. Linear functionals
23.4.1 Remark: Specialisation of linear maps to linear functionals.
Linear functionals are linear maps from a linear space to the field of the linear space. As mentioned in
Remark 22.1.6 and Definition 22.1.7, a field may be regarded as a linear space whose field is itself. Hence
Definition 23.1.1 for a linear map applies immediately to the case where the range of the map is a field
(regarded as a linear space), but this field must be the same as the field of the domain space.
A linear map where the range is the field of the domain space is called a linear functional. The use of a
different name is justified by the large number of special properties of linear functionals which are not valid
for general linear maps. However, the form of the definition for a linear functional in Definition 23.4.2 is
essentially identical to that of linear maps in Definition 23.1.1.
23.4.2 Definition: A linear functional on a linear space V over a field K is a function f : V K which
satisfies the following conditions.
(i) $\forall v_1, v_2 \in V,\quad f(v_1 + v_2) = f(v_1) + f(v_2)$. [vector additivity]
(ii) $\forall \lambda \in K,\ \forall v \in V,\quad f(\lambda v) = \lambda f(v)$. [scalarity]
$\forall \lambda_1, \lambda_2 \in K,\ \forall v_1, v_2 \in V,\quad f(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 f(v_1) + \lambda_2 f(v_2).$
Theorem 23.4.4 asserts that Definition 23.4.2 is equivalent to linearity for arbitrary linear combinations.
Proof: For V, K, B, I and $h_i$ as in the statement of the theorem, let $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$. By Theorem 23.1.15, the component map $\kappa_B : V \to \mathrm{Fin}(I, K)$ is a linear space isomorphism. Therefore $h_i(\lambda_1 v_1 + \lambda_2 v_2) = \kappa_B(\lambda_1 v_1 + \lambda_2 v_2)(i) = (\lambda_1 \kappa_B(v_1) + \lambda_2 \kappa_B(v_2))(i) = \lambda_1 \kappa_B(v_1)(i) + \lambda_2 \kappa_B(v_2)(i)$ by the pointwise definition of operations on $\mathrm{Fin}(I, K)$.
23.4.9 Remark: Ad-hoc juxtaposition products of linear functionals by vectors.
In the expression in Theorem 23.1.31 for a linear map $\phi : V \to W$ in terms of a basis B for V, the maps $v \mapsto \kappa_B(v)_i$ for $v \in V$ are linear functionals in $V^*$ by Theorem 23.4.8. This suggests replacing the formula $\forall v \in V,\ \phi(v) = \sum_{i \in I} \kappa_B(v)_i \phi(e_i)$ with the simpler-looking $\phi = \sum_{i \in I} \phi(e_i)(\kappa_B)_i$, where $\phi(e_i) \in W$ and $(\kappa_B)_i \in V^*$ for each $i \in I$. This looks like a linear combination of the linear functionals $(\kappa_B)_i$ except that the multipliers in the linear combination are vectors in W, not scalars in K. This raises the question of what kind of product is intended here for elements of W and $V^*$. (Such juxtaposition products are seen in the literature for vector-valued differential forms, for example.)
The juxtaposition product in Definition 23.4.10 is clearly well defined. As long as V and W have the same field, the product $fw$ is a well-defined linear map from V to W. This is very general, requiring no bases nor any relation between the spaces apart from the common field.
The juxtaposition product is quite limited. Only linear maps $g \in \mathrm{Lin}(V, W)$ which have a one-dimensional range can be written as $g = fw$ for some $f \in V^*$ and $w \in W$. However, if V has a basis, Theorem 23.1.31 implies that g may be written as a sum of such products.
23.4.10 Definition: The juxtaposition product of a linear functional f by a vector w, where $f \in V^*$, $w \in W$, and V and W are linear spaces over the same field, is the map $h : V^* \times W \to \mathrm{Lin}(V, W)$ defined by
$\forall f \in V^*,\ \forall w \in W,\ \forall v \in V,\quad h(f, w)(v) = f(v)w.$
The expression $h(f, w)$ may be denoted as $fw$ for $f \in V^*$ and $w \in W$, or alternatively as $wf$.
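Definition 23.4.10 translates directly into code: the juxtaposition product of a functional and a vector is the map $v \mapsto f(v)w$, whose range lies in the span of w. A minimal sketch with $V = \mathbb{R}^3$ and $W = \mathbb{R}^2$ (the representations here are chosen for illustration, not prescribed by the book):

```python
# Juxtaposition product h(f, w) : V -> W for f in V*, w in W.
# Vectors are tuples; functionals are ordinary Python functions.

def h(f, w):
    """Return the rank-one (or zero) linear map v |-> f(v) * w."""
    return lambda v: tuple(f(v) * wi for wi in w)

f = lambda v: v[0] + 2*v[1]          # a linear functional on R^3
w = (1.0, -1.0)                      # a vector in R^2
g = h(f, w)

print(g((1.0, 1.0, 5.0)))            # f(v) = 3, so (3.0, -3.0)
```

Every output of `g` is a scalar multiple of w, which is the one-dimensional-range limitation noted above.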
for any linear spaces V and W over the same field K, where $B = (e_i)_{i \in I}$ is a basis for V. Since each term in the sum is a linear map from V to W, the summation uses the addition operation for the linear space $\mathrm{Lin}(V, W)$ in Definition 23.1.6.
The fact that the index set I could be infinite might seem to make the infinite sum perilous. However, each term $\phi(e_i)(\kappa_B)_i$ in the sum has finite support in the sense that $(\phi(e_i)(\kappa_B)_i)(e_j) \ne 0$ for only one $j \in I$, and any vector $v \in V$ has a non-zero component $\kappa_B(v)_i$ with respect to B for only a finite number of $i \in I$. So for any fixed $v \in V$, only a finite number of terms in the sum are non-zero. In this sense, the sum converges for each $v \in V$. Moreover, the sum converges to $\phi(v)$. This justifies line (23.4.1).
Although all of the terms may be non-zero linear maps, only a finite number of terms are non-zero for any fixed argument $v \in V$.
If the linear space W is replaced by the field K, line (23.4.1) seems to state that the family $B^* = ((\kappa_B)_i)_{i \in I}$ is a basis for $V^*$ because then any linear functional $\phi \in \mathrm{Lin}(V, K) = V^*$ may be written as a linear combination of members of $B^*$. This apparent dual basis is given the name canonical dual basis in Definition 23.7.3, and line (23.4.1) is then equivalent to Theorem 23.7.7. However, by the comments in Remark 23.7.4, $B^*$ does not span $V^*$ if V is infinite-dimensional. This follows from the observation that $\mathrm{Lin}(V, K)$ can only be spanned by $B^*$ if infinite subfamilies are permitted, which contradicts Definition 22.7.6 since the span of a family in Definition 22.4.3 permits only finite linear combinations.
It is the special linear-functional structure of the vectors $(\kappa_B)_i \in V^*$ which makes it possible to reduce infinite linear combinations to finite linear combinations by restricting these linear functionals to particular fixed vectors $v \in V$. Without knowledge of this special structure, infinite linear combinations in general are meaningless because the meaningfulness of linear combinations in Definition 22.3.2 relies upon mathematical induction to ensure that all finite sums lie within the linear space. It should also be noted that the convergence of the infinite linear combinations here depends on the presence of only one copy of the linear functional $(\kappa_B)_i$ in $B^*$ for each $i \in I$. If infinitely many copies were permitted, then even for a fixed vector $v \in V$, the number of terms in the sum in line (23.4.1) would be infinite and therefore undefined (because there is no topology to define convergence with). Thus it is both the special nature of the individual members of $B^*$ and the fact that there is only one copy of each $(\kappa_B)_i$ which makes the pseudo-convergence in line (23.4.1) possible.
on $W_2 = \operatorname{span}(v_1, v_2)$. This process can be continued for any particular finite linearly independent set of
vectors by hand if one has sufficient patience. By countable induction, this procedure can be extended to
an arbitrary finite or countably infinite linearly independent set of vectors. Using transfinite induction, the
same procedure yields a linear functional if a well-ordered basis is available for the linear space. In fact,
the well-ordering of the basis is not necessary. But without any basis at all, it is not possible within ZF set
theory to construct a non-zero linear functional.
23.5.2 Remark: Methods for guaranteeing the existence of non-zero linear functionals.
Theorem 23.5.3 guarantees the existence of a non-zero linear functional on a linear space if the space has a
basis. If no basis is available, Theorem 23.5.5 will usually guarantee the existence of constructible non-zero
linear functionals.
23.5.3 Theorem: Construction of non-zero linear functionals on a non-trivial linear space with a basis.
Let V be a linear space over a field K such that V has a basis. Then for any non-zero $v \in V$, there is a linear functional $f : V \to K$ such that $f(v) \ne 0$.
Proof: Let B be a basis for V. Let $v \in V \setminus \{0\}$. Denote the component map for the basis B by $\kappa : V \to K^B$. Then $v = \sum_{e \in B} \kappa(v)(e) e$. The set $B' = \{e \in B;\ \kappa(v)(e) \ne 0\}$ is finite. Let $e \in B'$ be a basis vector such that $\kappa(v)(e) \ne 0$. (There must be such a vector e because $v \ne 0$.) Define $f : V \to K$ by $f(w) = \kappa(v)(e)^{-1} \kappa(w)(e)$ for all $w \in V$. Then f is a well-defined linear functional on V with $f(v) = 1_K$. So f is clearly non-zero.
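The construction in this proof is explicit enough to transcribe: pick a basis element where v has a non-zero component and normalise. A sketch with component maps represented as dicts of non-zero components (the function name and representation are choices made here):

```python
from fractions import Fraction as F

def functional_for(v_components):
    """Given kappa(v) for a non-zero v, as a {basis element: component}
    dict, return f with f(w) = kappa(v)(e)^{-1} * kappa(w)(e) for some
    basis element e where the component of v is non-zero."""
    e, ce = next((e, c) for e, c in v_components.items() if c != 0)
    return lambda w_components: w_components.get(e, F(0)) / ce

v = {'e1': F(0), 'e2': F(3), 'e5': F(-1)}   # v = 3*e2 - e5
f = functional_for(v)
print(f(v))                                 # 1, i.e. f(v) = 1_K
```

By construction f(v) = 1, so f is a non-zero linear functional, exactly as in the proof.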
23.5.4 Remark: Existence of non-zero linear functionals on subspaces of unrestricted linear spaces.
Theorem 23.5.5 is shallow, but surprisingly effective. The majority of linear spaces are either subspaces of unrestricted linear spaces $K^S$ (according to Definition 22.2.5), or are easily identifiable with such subspaces. (For example, the linear space of all real or complex Cauchy sequences, or the linear space $C^\infty(\mathbb{R}^n, \mathbb{R})$ in Notation 42.1.10.) Therefore in practice, finding non-zero linear functionals is usually not difficult. (One needs only to find $v \in V$ and $x \in S$ with $v(x) \ne 0_K$, which should be easy if $V \ne \{0_V\}$.)
23.5.5 Theorem: Let $K^S$ be the unrestricted linear space on a set S with field K (with vector addition and scalar multiplication defined pointwise). Let V be a linear subspace of $K^S$.
(i) For any $x \in S$, the map $\phi : V \to K$ defined by $\phi : v \mapsto v(x)$ is a linear functional on V.
(ii) If $\exists v \in V,\ v(x) \ne 0_K$, then $\phi : v \mapsto v(x)$ is a non-zero linear functional on V.
23.5.6 Remark: Extendability of linear functionals from a subspace to the full space.
It is frequently required to know that there exists a linear extension of a linear functional from a subspace
to the whole linear space. This is guaranteed by Theorem 23.5.7 in the case of linear spaces which have a
well-ordered basis. (In ZF+AC set theory, every linear space has a well-ordered basis by Theorem 22.7.18,
which yields Theorems 23.5.9 and 23.5.10.)
23.5.7 Theorem: Let V be a linear space over a field K which has a well-ordered basis. Let f be a linear functional on a subspace W of V. Then f may be extended to a linear functional $\hat f : V \to K$.
Proof: A proof may be constructed following the pattern of the proof of Theorem 22.7.23. Let W be a subspace of a linear space V over a field K. Let V have a basis $B = (v_\alpha)_{\alpha \in A}$, where A is a well-ordered set. Let $f : W \to K$ be a linear functional on W.
Define $g : A \to \{0, 1\}$ inductively for $\alpha \in A$ by
$g(\alpha) = \begin{cases} 1 & \text{if } v_\alpha \notin \operatorname{span}(W \cup \{v_\beta;\ \beta < \alpha \text{ and } g(\beta) = 1\}) \\ 0 & \text{otherwise.} \end{cases}$
Then by transfinite induction (Theorem 11.7.2), $\mathrm{span}(W \cup S) = V$, where $S = \{v_\alpha;\ g(\alpha) = 1\}$, and $\forall \alpha \in A,\ (g(\alpha) = 1 \Rightarrow v_\alpha \notin \mathrm{span}(W \cup \{v_\beta;\ \beta < \alpha,\ g(\beta) = 1\}))$. So $W \cap \mathrm{span}(S) = \{0_V\}$.
Let $u \in V = \mathrm{span}(W \cup S)$. Then by the definition of the linear span, $u = \sum_{e \in W \cup S} \lambda_e e$ for some $\lambda \in \mathrm{Fin}(W \cup S, K)$. Since $W$ and $S$ are disjoint sets, $u = \sum_{e \in W} \lambda_e e + \sum_{e \in S} \lambda_e e = w + \sum_{e \in S} \lambda_e e$, where $w = \sum_{e \in W} \lambda_e e \in W$. Now $f : W \to K$ can be extended to $\tilde f : V \to K$ by defining $\tilde f(u) = f(w)$ for all $u \in V$. It must be shown that this extension is well defined and linear on $V$.
Let $u \in V$ and suppose that $u = w_1 + \sum_{e \in S} \lambda^1_e e = w_2 + \sum_{e \in S} \lambda^2_e e$ for some $w_1, w_2 \in W$ and $\lambda^1, \lambda^2 \in \mathrm{Fin}(S, K)$. Then $w_1 - w_2 = \sum_{e \in S} (\lambda^2_e - \lambda^1_e) e$. But $w_1 - w_2 \in W$ and $\sum_{e \in S} (\lambda^2_e - \lambda^1_e) e \in \mathrm{span}(S)$. So $w_1 = w_2$. Thus the choice of $w \in W$ is unique for a given $u \in V$.
To show the linearity of $\tilde f$, let $v = \sum_{i=1}^n \lambda_i u_i$ with $n \in \mathbb{Z}^+_0$, $\lambda \in K^n$ and $u \in V^n$. Then $u_i = w_i + \sum_{e \in S} \mu^i_e e$ for some $w \in W^n$ and $\mu^i \in \mathrm{Fin}(S, K)$ for each $i \in \mathbb{N}_n$. So $v = \sum_{i=1}^n \lambda_i w_i + \sum_{i=1}^n \lambda_i \sum_{e \in S} \mu^i_e e = \sum_{i=1}^n \lambda_i w_i + \sum_{e \in S} \big(\sum_{i=1}^n \lambda_i \mu^i_e\big) e$. Therefore
$$\tilde f(v) = \tilde f\Big(\sum_{i=1}^n \lambda_i w_i + \sum_{e \in S} \Big(\sum_{i=1}^n \lambda_i \mu^i_e\Big) e\Big) = f\Big(\sum_{i=1}^n \lambda_i w_i\Big) = \sum_{i=1}^n \lambda_i f(w_i) = \sum_{i=1}^n \lambda_i \tilde f(u_i).$$
Hence $\tilde f$ is linear.
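The proof's construction can be imitated in finite dimensions. The following sketch is an assumption-laden illustration, not the book's construction in full generality: it scans the standard basis vectors of $\mathbb{R}^n$ in order (playing the role of the well-ordering of $A$), selects a complement of $W$ exactly as the selection function $g$ does, and defines the extension by $\tilde f(w + s) = f(w)$. The function name `extend_functional` is hypothetical.

```python
# A finite-dimensional sketch (illustrative): extend a functional from a
# subspace W of R^n by greedily choosing complementary basis vectors,
# mirroring the transfinite selection function g in the proof above.
import numpy as np

def extend_functional(W_basis, f_on_W, n):
    """Extend f (given by its values on a basis of W) to all of R^n by
    setting the extension to 0 on a chosen complement of W."""
    chosen = list(W_basis)            # columns spanning W so far
    for i in range(n):                # scan e_0, e_1, ... (the well-ordering)
        e = np.zeros(n); e[i] = 1.0
        trial = np.column_stack(chosen + [e])
        if np.linalg.matrix_rank(trial) > len(chosen):   # e not in current span
            chosen.append(e)                             # i.e. g(i) = 1
    B = np.column_stack(chosen)       # basis of R^n: W-part first, then complement
    def f_ext(u):
        coeffs = np.linalg.solve(B, u)
        # f_ext(w + s) = f(w): only the W-coefficients contribute.
        return sum(c * fv for c, fv in zip(coeffs[:len(W_basis)], f_on_W))
    return f_ext

W_basis = [np.array([1.0, 1.0, 0.0])]      # W = span{(1,1,0)}
f_on_W = [2.0]                             # f((1,1,0)) = 2
f_ext = extend_functional(W_basis, f_on_W, 3)
assert abs(f_ext(np.array([1.0, 1.0, 0.0])) - 2.0) < 1e-9   # agrees with f on W
assert abs(f_ext(np.array([3.0, 3.0, 0.0])) - 6.0) < 1e-9   # linear on W
```

The well-definedness argument of the proof corresponds here to `np.linalg.solve` producing unique coordinates with respect to the completed basis.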
23.5.8 Remark: Existence of linear functionals if the axiom of choice is accepted.
A general linear space is guaranteed to have a well-ordered basis if the axiom of choice is assumed. (See
AC-tainted Theorem 22.7.18 for this.) The AC versions of Theorems 23.5.3 and 23.5.7 are Theorems 23.5.9
and 23.5.10 respectively. Acceptance of the axiom of choice does not reduce the workload. The proofs are
the same. Only the claimed generality differs.
23.5.9 Theorem [zf+ac]: Let $V$ be a linear space over a field $K$. Then for any non-zero $v \in V$, there is a linear functional $f : V \to K$ such that $f(v) \ne 0$.
Proof: This follows immediately from Theorems 22.7.18 and 23.5.3.
23.5.10 Theorem [zf+ac]: Let $f$ be a linear functional on a subspace $W$ of a linear space $V$ over a field $K$. Then there is a linear functional $\tilde f : V \to K$ which extends $f$ from $W$ to $V$.
Proof: This follows immediately from Theorems 22.7.18 and 23.5.7.
$$V^* = \Big\{ f : V \to K;\ \forall n \in \mathbb{Z}^+_0,\ \forall \lambda \in K^n,\ \forall w \in V^n,\ f\Big(\sum_{i=1}^n \lambda_i w_i\Big) = \sum_{i=1}^n \lambda_i f(w_i) \Big\}.$$
The set $V^*$ in Notation 23.6.2 is given a linear space structure in Definition 23.6.4.
23.6.4 Definition: The dual (linear) space of a linear space $V < (K, V, \sigma_K, \tau_K, \sigma_V, \mu_V)$ is the linear space $V^* < (K, V^*, \sigma_K, \tau_K, \sigma_{V^*}, \mu_{V^*})$, where the set $V^*$ is the set of all linear functionals on $V$, and $\sigma_{V^*}$, $\mu_{V^*}$ are the pointwise vector addition and scalar multiplication operations on $V^*$ respectively. In other words,
(i) $\forall w_1, w_2 \in V^*,\ \forall v \in V,\ \sigma_{V^*}(w_1, w_2)(v) = \sigma_K(w_1(v), w_2(v))$, [pointwise vector addition]
(ii) $\forall \lambda \in K,\ \forall w \in V^*,\ \forall v \in V,\ \mu_{V^*}(\lambda, w)(v) = \tau_K(\lambda, w(v))$. [pointwise scalar multiplication]
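The pointwise operations of Definition 23.6.4 translate directly into code. This Python sketch is illustrative only (functionals as closures is a representation choice, not the book's formalism); the names `dual_add` and `dual_scale` stand in for $\sigma_{V^*}$ and $\mu_{V^*}$.

```python
# Sketch of the pointwise dual-space operations (names are illustrative):
# linear functionals are Python callables, and the dual-space operations
# build new callables pointwise, as in Definition 23.6.4 (i) and (ii).

def dual_add(w1, w2):
    return lambda v: w1(v) + w2(v)      # sigma_{V*}(w1, w2)(v) = w1(v) + w2(v)

def dual_scale(lam, w):
    return lambda v: lam * w(v)         # mu_{V*}(lam, w)(v) = lam * w(v)

# V = R^2 as tuples; two sample functionals on V.
w1 = lambda v: v[0]                     # first coordinate functional
w2 = lambda v: v[0] + v[1]
v = (2.0, 5.0)
assert dual_add(w1, w2)(v) == w1(v) + w2(v) == 9.0
assert dual_scale(3.0, w1)(v) == 6.0
```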
23.6.6 Remark: The fine distinction between the dual linear space and the set of linear functionals.
The notation $V^*$ is used both for the linear space $V^*$ in Notation 23.6.5 and the set $V^*$ in Notation 23.6.2. This ambiguity is almost always harmless.
23.6.8 Theorem: If all linear functionals have zero value on a vector, then the vector is zero.
Let $V$ be a linear space which has a basis. Then $\forall v \in V,\ (\forall w \in V^*,\ w(v) = 0) \Rightarrow v = 0$. In other words, the only vector in $V$ for which all linear functionals on $V$ have the zero value is the zero vector.
23.6.9 Theorem: Let $V$ be a linear space which has a basis. Then
$$\forall v_1, v_2 \in V,\quad (\forall w \in V^*,\ w(v_1) = w(v_2)) \Rightarrow v_1 = v_2.$$
In other words, if all linear functionals on $V$ have the same value on two vectors in $V$, then the two vectors are the same.
Proof: Let $v_1, v_2 \in V$ have the property that $\forall w \in V^*,\ w(v_1) = w(v_2)$, where $V$ is a linear space which has a basis. Then $\forall w \in V^*,\ w(v_1 - v_2) = 0$ by the linearity of $w$. So $v_1 - v_2 = 0$ by Theorem 23.6.8. So $v_1 = v_2$. Hence $(\forall w \in V^*,\ w(v_1) = w(v_2)) \Rightarrow v_1 = v_2$.
23.6.10 Theorem: Let $V$ be a linear space over a field $K$, which has a basis. Let $t \in K$. Then
$$\forall v_1, v_2 \in V,\quad (\forall w \in V^*,\ w(v_1) = t\, w(v_2)) \Rightarrow v_1 = t v_2.$$
Proof: Let $V$ be a linear space over a field $K$, which has a basis. Let $t \in K$. Let $v_1, v_2 \in V$, and suppose that $\forall w \in V^*,\ w(v_1) = t\, w(v_2)$. Then $\forall w \in V^*,\ w(v_1) = w(t v_2)$. So by Theorem 23.6.9, $v_1 = t v_2$.
23.6.11 Remark: The dual of a proper subspace is not a subspace of the dual.
The dual of a proper linear subspace of a linear space is not a subspace of the dual of the primal space. To see why this is so, let $U$ be a linear subspace of a linear space $V$ such that $U \ne V$. The elements of $V^*$ are linear functionals $f : V \to K$, where $K$ is the scalar field of $V$. The elements of $U^*$ are linear functionals $g : U \to K$. But $\mathrm{Dom}(g) = U \ne V = \mathrm{Dom}(f)$. Therefore $g \notin V^*$. In fact, $U^* \cap V^* = \emptyset$.
This observation regarding duals of subspaces has particular relevance for the definition of antisymmetric
tensors in Section 30.4. In the tensor algebra context, one defines subspaces of tensor spaces which are
defined by a symmetry or antisymmetry constraint of some kind. Then the formation of the dual of such a
subspace has an untidy relationship with the dual of the unconstrained space.
It seems straightforward, perhaps, to extend linear functionals $g : U \to K$ to the whole space $V$, thereby identifying elements of $U^*$ with elements of $V^*$. The existence of such linear extensions is sometimes in doubt, as noted in Remarks 23.5.1 and 23.5.6, although in essentially all practical cases, the existence of (algebraic) extensions to (algebraic) linear functionals is easily proved (without invoking the axiom of choice). The uniqueness issue is more serious. In general, if at least one linear extension from a proper subspace to the whole space is known to exist, then there are infinitely many. With the aid of a basis, one may standardise such extensions to some extent, but the artificiality of such a construction is very unsatisfactory.
23.6.12 Remark: The dual of a subspace is easier to define than the dual of the full space.
The fact that the dual $U^*$ of a subspace $U$ is not a subspace of the dual $V^*$ of the full space $V$, but is instead a space of restricted linear functionals with domain $U$, makes the definition of $U^*$ easier. Instead of needing to define the value of $f(v)$ for all $v \in V$, it is only necessary to define $f(v)$ for $v \in U$. If $V$ is a huge space such as $\mathbb{R}^{\mathbb{R}}$, axiom-of-choice issues arise in the definition of $V^*$, as mentioned in Remark 23.7.16. But if $U$ is so constrained that it has an identifiable basis, then the construction of the dual is straightforward, as indicated in Theorem 23.7.14.
For a wide range of infinite-dimensional linear spaces which are of practical interest, axiom-of-choice issues
do not arise in the formation of dual spaces. This is partly because the primal spaces are so restricted that a
basis, possibly well-ordered, can be readily identified. It is also partly because the spaces of linear functionals
are themselves constrained to be bounded with respect to some topology. Thus even the double dual, under
such circumstances, may be well-defined in ZF set theory.
23.7.2 Definition: The canonical dual basis of a basis set $E$ for a linear space $V$ is the set $\{h_e;\ e \in E\}$, where $h_e : V \to K$ is defined for all $e \in E$ by $h_e(v) = \kappa_E(v)(e)$ for all $v \in V$, where $K$ is the field of $V$ and $\kappa_E : V \to \mathrm{Fin}(E, K)$ is the component map for $E$ in Definition 22.8.6.
23.7.3 Definition: The canonical dual basis of a basis family $B = (e_i)_{i \in I}$ for a linear space $V$ is the family $(h_i)_{i \in I}$, where $h_i : V \to K$ is defined for all $i \in I$ by $h_i(v) = \kappa_B(v)(i)$ for all $v \in V$, where $K$ is the field of $V$ and $\kappa_B : V \to \mathrm{Fin}(I, K)$ is the component map for $B$ in Definition 22.8.7.
23.7.4 Remark: Difficulties of constructing an algebraic dual basis for an infinite-dimensional space.
Definitions 23.7.2 and 23.7.3 only define bases if the primal space is finite-dimensional. These canonical dual basis constructions are intuitively appealing, but they do not span the dual linear space if the primal space is infinite-dimensional.
The family $(h_i)_{i \in I}$ in Definition 23.7.3 certainly does define a family of linear functionals $h_i$ on $V$, and the span of these linear functionals is a linear space with the same dimension as $V$. However, this span is a proper subspace of the dual space $V^*$ if $\dim(V) = \infty$.
It is an appealing thought that a useful restricted dual space may be constructed as the span of the canonical dual basis in Definition 23.7.3. This would be analogous to the way a topological dual consisting of only the bounded linear functionals is constructed for topological linear spaces. However, the span of the family of linear functionals $(h_i)_{i \in I}$ depends on the primal basis $B$.
To see how the primal basis may affect the span of the canonical dual basis, consider Examples 22.9.2 and 22.9.3. These examples present well-ordered bases $B$ and $B'$ respectively for the free linear space $V = \mathrm{FL}(\mathbb{Z}^+_0, \mathbb{R})$. Consider the linear functional $f : V \to \mathbb{R}$ defined by $f(v) = \sum_{i \in \mathbb{Z}^+_0} v(i)$ for all $v \in V$. This function is well defined because $v(i) = 0$ for all except a finite number of values of $i$. So $f \in V^*$, but $f$ is not in the span of the elements of $B^*$, where $B^* = (h_i)_{i \in \mathbb{Z}^+_0}$ is constructed from the basis family $B = (e_i)_{i \in \mathbb{Z}^+_0}$ as above. On the other hand, $f$ is in the span of the elements of the family $B'^*$, where $B'^* = (h'_i)_{i \in \mathbb{Z}^+_0}$ is constructed from the basis $B' = (e'_i)_{i \in \mathbb{Z}^+_0}$ as above. In fact, $f = h'_0$. So $\mathrm{span}(\mathrm{Range}(B^*)) \ne \mathrm{span}(\mathrm{Range}(B'^*))$.
This is a little surprising. The bases $B$ and $B'$ span the same space $V$, but the dual families do not. In this example, $\mathrm{span}(\mathrm{Range}(B^*)) \subseteq \mathrm{span}(\mathrm{Range}(B'^*))$, but $\mathrm{span}(\mathrm{Range}(B^*)) \cap \mathrm{Range}(B'^*) = \emptyset$.
The dual of any free linear space $\mathrm{FL}(S, K)$ may be identified with the unrestricted linear space $K^S$. (See Definition 22.2.5 for unrestricted linear spaces.) To see this, define $\Phi(\lambda)(f) = \sum_{x \in S} \lambda(x) f(x)$ for all $\lambda \in K^S$ and $f \in \mathrm{FL}(S, K)$. Then $\Phi(\lambda)(f)$ is well-defined because $f(x) = 0$ for all but a finite number of elements $x \in S$, and $\Phi(\lambda)(f)$ is clearly a linear function of $f$. So $\Phi : K^S \to V^*$ maps the unrestricted linear space to $V^*$.
To see that $\Phi : K^S \to V^*$ is a bijection, let $g \in V^*$ and define $\lambda_g \in K^S$ by $\lambda_g(x) = g(e_x)$ for all $x \in S$, where $e_x \in \mathrm{FL}(S, K)$ is defined by $e_x : y \mapsto \delta_{x,y}$ for $x \in S$. Then $\Phi(\lambda_g)(f) = \sum_{x \in S} \lambda_g(x) f(x) = \sum_{x \in S} g(e_x) f(x) = g\big(\sum_{x \in S} f(x)\, e_x\big) = g(f)$, so $\Phi(\lambda_g) = g$ and $\Phi$ is surjective. Injectivity follows by evaluating $\Phi(\lambda)$ at the functions $e_x$, which gives $\Phi(\lambda)(e_x) = \lambda(x)$.
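The identification $\mathrm{FL}(S, K)^* \cong K^S$ can be sketched computationally. The following fragment is illustrative only (representing finitely-supported $f$ as dicts and $\lambda \in K^S$ as a Python function are assumptions, as is the name `Phi`).

```python
# Sketch of the identification FL(S, K)* ~ K^S (representation is illustrative):
# finitely-supported f in FL(S, K) are dicts, and lambda in K^S (a Python
# function, possibly with infinite support) acts on f by a FINITE sum.

def Phi(lam):
    """Turn lam in K^S into the linear functional f |-> sum lam(x) f(x)."""
    return lambda f: sum(lam(x) * fx for x, fx in f.items())

lam = lambda x: x * x        # an element of K^S with infinite support, S = integers
f = {1: 2.0, 3: -1.0}        # finitely supported: f(1)=2, f(3)=-1, else 0
g = {2: 4.0}

phi = Phi(lam)
assert phi(f) == 1 * 1 * 2.0 + 3 * 3 * (-1.0)      # finite sum: -7.0
# Linearity in f: Phi(lam)(f + g) = Phi(lam)(f) + Phi(lam)(g).
fg = {x: f.get(x, 0) + g.get(x, 0) for x in set(f) | set(g)}
assert phi(fg) == phi(f) + phi(g)
```

The sum is always finite because only the support of `f` contributes, which is exactly the well-definedness observation in the text.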
23.7.5 Theorem: The canonical dual basis in Definition 23.7.3 is the unique family of linear functionals $(h_i)_{i \in I}$ on $V$ such that $h_i(e_j) = \delta_{ij}$ for all $i, j \in I$.
Proof: The family $B^* = (h_i)_{i \in I}$ in Definition 23.7.3 satisfies $h_i(e_j) = \kappa_B(e_j)(i)$ for all $i, j \in I$. But $e_j = \sum_{k \in I} \delta_{jk} e_k$. Since the components with respect to $B$ are unique by Theorem 22.8.11, $\kappa_B(e_j)(k) = \delta_{jk}$ for all $j, k \in I$. So $h_i(e_j) = \delta_{ij}$. The uniqueness of $B^*$ follows from the fact that a linear functional is uniquely determined by its action on a basis (by Theorem 23.4.5).
23.7.6 Theorem: Let $V$ be a linear space which has a basis. Then the canonical dual basis set for $V$ is a well-defined linearly independent set of linear functionals on $V$.
Proof: Let $V$ be a linear space over a field $K$, with a basis $E \subseteq V$. Let $H = \{h_e;\ e \in E\}$ be the canonical dual basis set of $E$. (See Definition 23.7.2.) It must be shown first that $h_e : V \to K$ is well defined and linear for all $e \in E$. Let $e \in E$. By Theorem 22.8.11, the component map $\kappa_E : V \to \mathrm{Fin}(E, K)$ is a well-defined bijection. So $h_e(v) = \kappa_E(v)(e)$ is well defined for all $v \in V$. By Theorem 23.1.15, the component map $\kappa_E : V \to \mathrm{FL}(E, K)$ is a linear space isomorphism. ($\mathrm{FL}(E, K)$ is the standard linear space structure on $\mathrm{Fin}(E, K)$. See Notation 22.2.11.) So $h_e$ is linear. Hence $h_e \in V^*$ for all $e \in E$.
To show the linear independence of the linear functionals $h_e \in H$ for $e \in E$, let $\lambda \in \mathrm{Fin}(E, K)$ be such that $\sum_{e \in E} \lambda_e h_e = 0_{V^*}$. Then for all $v \in V$, $0_K = \sum_{e \in E} \lambda_e h_e(v) = \sum_{e \in E} \lambda_e \kappa_E(v)(e)$. Let $v = e_0 \in E$. Then $0_K = \sum_{e \in E} \lambda_e \kappa_E(e_0)(e) = \sum_{e \in E} \lambda_e \delta_{e_0 e} = \lambda_{e_0}$. So $\lambda_{e_0} = 0$ for all $e_0 \in E$. So the functionals in $H$ are linearly independent by Theorem 22.6.8 (ii).
23.7.7 Theorem: Let $V$ be a linear space with a finite basis $E$. Let $H = \{h_e;\ e \in E\}$ be the canonical dual basis of $E$. Let $f \in V^*$. Then $f = \sum_{e \in E} f(e) h_e$.
23.7.8 Theorem: Let $V$ be a linear space which has a finite basis. Then the canonical dual basis for $V$ is a basis for $V^*$.
Proof: Let $V$ be a linear space over a field $K$, with a basis $E \subseteq V$. Let $H = \{h_e;\ e \in E\}$ be the canonical dual basis set of $E$. (See Definition 23.7.2.) Then $H$ is well-defined and the linear functionals $h_e$ are linearly independent by Theorem 23.7.6.
Let $f \in V^*$. Then $f = \sum_{e \in E} f(e) h_e$ by Theorem 23.7.7. So $f \in \mathrm{span}(H)$ because $E$ is a finite set. So $\mathrm{span}(H) = V^*$. Hence $H$ is a basis for $V^*$ by Definition 22.7.2.
23.7.9 Remark: The canonical dual basis is a valid basis for a finite-dimensional space.
In the proof of Theorem 23.7.8, the finite dimension of $V$ is essential to guarantee that $f$ can be expressed as a finite linear combination of the elements $h_e$ of $H$. The expression $f = \sum_{e \in E} f(e) h_e$ is not a linear combination if there are an infinite number of terms $f(e) h_e$ in the summation.
23.7.10 Theorem: Let $V$ be a linear space with a finite basis family $B = (e_i)_{i \in I}$, and let $(h_i)_{i \in I}$ be the canonical dual basis of $B$. Then $f = \sum_{i \in I} f(e_i) h_i$ for all $f \in V^*$.
Proof: The assertion is the family version of Theorem 23.7.7. So it may be proved in the same way.
23.7.11 Theorem: The dimension of the dual of a finite-dimensional linear space equals the dimension of the primal space.
Proof: Let $V$ be a finite-dimensional linear space with basis $E$. The canonical dual basis linear functionals $h_e \in H$ in Definition 23.7.2 for $e \in E$ are all different. To see this, let $e, e' \in E$ with $e \ne e'$. Then $h_e(e) = 1_K \ne 0_K = h_{e'}(e)$. So $h_e \ne h_{e'}$. Therefore the map $h : E \to H$ defined by $h : e \mapsto h_e$ is a bijection. Therefore $\#(E) = \#(H)$. So $\dim(V^*) = \dim(V)$.
23.7.12 Theorem: A linear space which has a basis is infinite-dimensional if and only if its algebraic dual
space is infinite-dimensional.
23.7.13 Remark: The algebraic dual is isomorphic to the unrestricted linear space on the primal space.
Theorem 23.1.15 states that if a linear space $V$ over a field $K$ has a basis set $B$, then the component map $\kappa_B : V \to \mathrm{FL}(B, K)$ is a linear space isomorphism. Thus the primal space $V$ is isomorphic to the free linear space on $B$. Theorem 23.7.14 asserts an analogous property for dual spaces. The dual space is isomorphic to the unrestricted linear space $\mathrm{UL}(B, K) = K^B$ when the primal space has basis $B$. Therefore if the primal basis is infinite, the dual space is hugely bigger than the primal space.
23.7.14 Theorem: Let $V$ be a linear space over a field $K$ with a basis set $B$. Then $V^*$ is isomorphic to the unrestricted linear space $K^B$, and the map $f \mapsto (f(v))_{v \in B}$ is a linear space isomorphism from $V^*$ to $K^B$.
Proof: Let $V$ be a linear space over a field $K$ with a basis set $B$. Let $f \in V^*$. Then $f : V \to K$ is a linear functional by Notation 23.6.2. Define $\phi : V^* \to K^B$ by $\phi : f \mapsto (f(v))_{v \in B} \in K^B$. Then $\phi$ is a linear map because $\phi(\lambda_1 f_1 + \lambda_2 f_2) = (\lambda_1 f_1(v) + \lambda_2 f_2(v))_{v \in B} = \lambda_1 (f_1(v))_{v \in B} + \lambda_2 (f_2(v))_{v \in B} = \lambda_1 \phi(f_1) + \lambda_2 \phi(f_2)$ for all $\lambda_1, \lambda_2 \in K$ and $f_1, f_2 \in V^*$ by Definition 22.2.5.
For $h \in K^B$, define $\psi(h) : V \to K$ by $\psi(h)(v) = \sum_{e \in B} h(e)\, \kappa_B(v)(e)$ for all $v \in V$. Then $\psi(h)$ is well defined because $\#\{e \in B;\ \kappa_B(v)(e) \ne 0\} < \infty$ by Definition 22.8.6, which is justified by Theorem 22.8.2. But $\psi(h)(\lambda_1 v_1 + \lambda_2 v_2) = \sum_{e \in B} h(e)\, \kappa_B(\lambda_1 v_1 + \lambda_2 v_2)(e) = \lambda_1 \sum_{e \in B} h(e)\, \kappa_B(v_1)(e) + \lambda_2 \sum_{e \in B} h(e)\, \kappa_B(v_2)(e) = \lambda_1 \psi(h)(v_1) + \lambda_2 \psi(h)(v_2)$ for all $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$ by Theorem 22.8.12. Therefore $\psi(h) \in V^*$ by Notation 23.6.2. Thus $\psi(h) \in V^*$ for all $h \in K^B$. So $\psi : K^B \to V^*$ is a well-defined map. However,
$$\forall h \in K^B,\quad \phi(\psi(h)) = (\psi(h)(v))_{v \in B} = \Big(\sum_{e \in B} h(e)\, \kappa_B(v)(e)\Big)_{v \in B} = \Big(\sum_{e \in B} h(e)\, \delta_{e,v}\Big)_{v \in B} = (h(v))_{v \in B} = h, \eqno(23.7.1)$$
where the third equality in line (23.7.1) follows from Theorem 22.8.9 (i). So $\phi \circ \psi = \mathrm{id}_{K^B}$. Therefore by Theorem 10.5.12 (iii), $\phi$ is surjective. Conversely,
$$\forall f \in V^*,\ \forall v \in V,\quad \psi(\phi(f))(v) = \sum_{e \in B} \phi(f)(e)\, \kappa_B(v)(e) = \sum_{e \in B} f(e)\, \kappa_B(v)(e) = f(v)$$
by the linearity of $f$, since $v = \sum_{e \in B} \kappa_B(v)(e)\, e$. So $\psi \circ \phi = \mathrm{id}_{V^*}$, and so $\phi$ is injective. Hence $\phi : V^* \to K^B$ is a linear space isomorphism.
23.7.15 Theorem: The dual linear space of FL(B, K) is canonically isomorphic to UL(B, K) for any set
B and field K.
23.7.16 Remark: The double dual of the free linear space on an infinite basis.
Regrettably, the double dual linear space of a free linear space raises choice function issues if the basis is infinite. For example, let $B = \mathbb{Z}^+_0$ and $K = \mathbb{R}$. Let $W = \mathrm{UL}(\mathbb{Z}^+_0, \mathbb{R}) = \mathbb{R}^{\mathbb{Z}^+_0}$. Then one might expect that there exists a linear map $\Lambda : W \to \mathbb{R}$ for which $\Lambda(1_W) = 1$, where $1_W : \mathbb{Z}^+_0 \to \mathbb{R}$ is the function defined by $\forall i \in \mathbb{Z}^+_0,\ 1_W(i) = 1$. Since this vector $1_W \in W$ is linearly independent of all of the atoms $\delta_j : \mathbb{Z}^+_0 \to \mathbb{R}$ which are defined by $\forall i \in \mathbb{Z}^+_0,\ \delta_j(i) = \delta_{ij}$, the value of $\Lambda$ for all such atomic functions may be chosen to equal zero. Thus $\Lambda(g) = 0$ for all $g \in \mathrm{UL}(\mathbb{Z}^+_0, \mathbb{R})$ with compact support. In other words, $\Lambda(g) = 0$ for all $g \in \mathrm{FL}(\mathbb{Z}^+_0, \mathbb{R})$.
The next task is to choose a value of $\Lambda(g)$ for all $g \in W$ which are not spanned by the atoms. One could perhaps hope that the definition $\Lambda(g) = \lim_{i \to \infty} g(i)$ would be suitable, because it does satisfy $\Lambda(1_W) = 1$ and $\Lambda(g) = 0$ for all $g \in \mathrm{FL}(\mathbb{Z}^+_0, \mathbb{R})$. However, this is not defined for all real-valued sequences. Nor are the inferior and superior limits suitable, since they are not even linear on their domains. It is not possible to set $\Lambda(g) = 0$ for all $g \in W$ which are not in the span of $1_W$ and not in $\mathrm{FL}(\mathbb{Z}^+_0, \mathbb{R})$ because this set is not a linear subspace.
Now define $g_{m,j} \in W$ by $\mathrm{Range}(g_{m,j}) \subseteq \{0_{\mathbb{R}}, 1_{\mathbb{R}}\}$ and $\forall i \in \mathbb{Z}^+_0,\ (g_{m,j}(i) = 1_{\mathbb{R}} \Leftrightarrow (i \bmod m) = j)$, for $m \in \mathbb{Z}^+$ and $j \in \mathbb{Z}^+_0$ with $j < m$. Then $g_{2,0} + g_{2,1} = 1_W$. Therefore $\Lambda(g_{2,0}) + \Lambda(g_{2,1}) = 1_{\mathbb{R}}$, but $\Lambda(g_{2,0})$ and $\Lambda(g_{2,1})$ may be freely chosen within this constraint. Similarly, $\sum_{j=0}^{m-1} \Lambda(g_{m,j}) = 1_{\mathbb{R}}$ for all $m \in \mathbb{Z}^+$, but whenever $m_1, m_2 \in \mathbb{Z}^+$ have a common factor, there will be additional constraints. It is not too difficult to choose values of $\Lambda$ for this well-ordered, countably infinite subspace of $W$. The difficulties lie in the not-well-ordered, uncountably infinite remainder of the space.
There is no obvious well-ordering of $W$ which could be used here because this would imply a well-ordering of $\mathbb{P}(\mathbb{Z}^+_0)$, which would imply a well-ordering for $\mathbb{R}$, which is not possible, as mentioned in Remarks 7.12.4 and 22.7.20. Consequently the possibility apparently cannot be excluded that $\mathrm{UL}(\mathbb{Z}^+_0, \mathbb{R})^* \subseteq \mathrm{FL}(\mathbb{Z}^+_0, \mathbb{R})$ (under the identification in Theorem 23.7.17) in some model of ZF set theory. Since it is not possible to demonstrate the existence of linear functionals on $\mathrm{UL}(\mathbb{Z}^+_0, \mathbb{R})$ which do not have finite support, it seems reasonable to suppose that they do not exist. But this apparently cannot be proved within ZF set theory. The same comments apply to any infinite basis set $B$, whether it is Dedekind-infinite or not. Thus Theorem 23.7.17 can only demonstrate the existence of an embedding of $\mathrm{FL}(B, K)$ inside $\mathrm{UL}(B, K)^*$ for a general set $B$.
23.7.17 Theorem: For any set $B$ and ordered field $K$, the map $\gamma : \mathrm{FL}(B, K) \to \mathrm{UL}(B, K)^*$ defined by
$$\forall f \in \mathrm{FL}(B, K),\ \forall g \in \mathrm{UL}(B, K),\quad \gamma(f)(g) = \sum_{e \in B} f(e)\, g(e) \eqno(23.7.2)$$
is a linear space monomorphism from $\mathrm{FL}(B, K)$ to the dual linear space of $\mathrm{UL}(B, K)$.
Proof: Let $W = \mathrm{UL}(B, K)$. The sum in line (23.7.2) is finite because $\#(\mathrm{supp}(f)) < \infty$. The linearity of $\gamma(f) : W \to K$ is clear for $f \in \mathrm{FL}(B, K)$. So $\gamma : \mathrm{FL}(B, K) \to W^*$ is a well-defined map. The linearity of $\gamma(f)$ with respect to $f$ is also clear. Therefore $\gamma : \mathrm{FL}(B, K) \to W^*$ is a linear map. To show injectivity, let $f_1, f_2 \in \mathrm{FL}(B, K)$ satisfy $f_1 \ne f_2$. Then $f_1(e) \ne f_2(e)$ for some $e \in B$. Define $g \in W$ by $g = \chi_{\{e\}}$. Then $\gamma(f_1)(g) = f_1(e) \ne f_2(e) = \gamma(f_2)(g)$. So $\gamma(f_1) \ne \gamma(f_2)$. Therefore $\gamma$ is injective.
$\forall i, j \in \mathbb{N}_n,\ h_i(e_j) = \delta_{ij}$.
23.8.2 Definition: The canonical dual basis of a basis $B = (e_i)_{i=1}^n$ for a finite-dimensional linear space $V$ is the basis $(h_i)_{i=1}^n$ for the dual space $V^*$ defined by $h_i(v) = v_i$ for all $i \in \mathbb{N}_n$, for all $v = \sum_{i=1}^n v_i e_i \in V$. In other words, $h_i(v) = \kappa_B(v)(i)$ for all $v \in V$.
23.8.3 Remark: Visualisation of the canonical dual basis as families of level sets.
Each canonical dual basis functional $h_i$ in Definition 23.8.2 depends on all of the vectors in the primal basis $(e_i)_{i=1}^n$. This is illustrated in Figure 23.8.1 in the case $n = 2$. For the same primal basis vector $e_1$, the dual basis vector $h_1$ depends on the choice of the other primal basis vector $e_2$.
[Figure 23.8.1 (two panels): level sets $H_t = \{v \in V;\ h_1(v) = t\}$ of $h_1$ for the basis $(e_1, e_2)$, and level sets $H'_t = \{v \in V;\ h'_1(v) = t\}$ of $h'_1$ for the basis $(e_1, e'_2)$, drawn for $t = -2, -1, 0, 1, 2, 3$, where $v = v_1 e_1 + v_2 e_2$ with $h_1(v) = v_1$, and $v = v'_1 e_1 + v'_2 e'_2$ with $h'_1(v) = v'_1$.]
Figure 23.8.1 Level sets of canonical dual basis vector depending on primal space basis
In the case of an orthonormal basis in a linear space with an inner product, each canonical dual basis functional $h_i$ depends only on the matching vector $e_i$ because orthonormality removes the freedom of the other vectors to influence the dual basis functionals. Since one's intuition for linear algebra is largely built through examples with orthonormal bases, it is tempting to apply this intuition to linear spaces in general. This intuition must be resisted!
The fact that the dual basis depends on all of the primal basis vectors helps to explain why the existence of a basis for the primal space is helpful for constructing the space of linear functionals.
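The dependence of $h_1$ on the other primal basis vector can be checked numerically. This numpy sketch is illustrative (the helper name `dual_basis` is an assumption): in coordinates, the dual basis functionals are the rows of the inverse of the matrix whose columns are the primal basis vectors.

```python
# Numerical sketch (illustrative): the canonical dual basis functional h_1
# depends on ALL primal basis vectors, not just e_1. In coordinates,
# h_i(v) = (E^{-1} v)_i, so the h_i are the rows of the inverse basis matrix.
import numpy as np

def dual_basis(E):
    """Rows of inv(E) represent the dual basis functionals h_i."""
    return np.linalg.inv(np.column_stack(E))

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
e2_alt = np.array([1.0, 1.0])            # change only the OTHER basis vector

H = dual_basis([e1, e2])
H_alt = dual_basis([e1, e2_alt])

# h_1 changes even though e_1 did not.
assert np.allclose(H[0], [1.0, 0.0])
assert np.allclose(H_alt[0], [1.0, -1.0])
# Defining property h_i(e_j) = delta_ij holds in both cases.
assert np.allclose(H_alt @ np.column_stack([e1, e2_alt]), np.eye(2))
```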
there is no metric tensor (or inner product). In linear algebra, upper and lower indices serve only to hint that a particular object is either covariant or contravariant. When a single letter, such as $e$, denotes a basis $(e_i)_{i=1}^n$ of a linear space, the raised-index version $(e^i)_{i=1}^n$ denotes the dual basis. This raising of the index is not achieved by contracting with a metric tensor in the linear algebra context.
To see why the dual basis of a linear space is not obtained from the primal space basis by contracting with a metric tensor, suppose to the contrary that $\sum_{i=1}^n g_{ij} e^i = e_j$ for some (invertible) matrix $(g_{ij})_{i,j=1}^n \in \mathbb{R}^{n \times n}$. (See Chapter 25 for matrix algebra.) This equation does not make sense because the left-hand side is a linear combination of linear functionals $e^i \in V^*$ and the right-hand side is a vector $e_j \in V$. For two objects to be equal, they should at least live in the same space! But suppose we temporarily ignore this show-stopper and optimistically try to force the equality. Maybe the problem can be fixed somehow!
Suppose $\sum_{i=1}^n g_{ij} e^i = e_j$, as above. The action of the left-hand side on a vector $v \in V$ is $\sum_{i=1}^n g_{ij} e^i(v) = \sum_{i=1}^n g_{ij} v^i$, where $(v^i)_{i=1}^n \in K^n$ is the tuple of vector components satisfying $v = \sum_{i=1}^n v^i e_i$. In other words, $\sum_{i=1}^n g_{ij} e^i \in V^*$ is the linear functional which maps $v \in V$ to $\sum_{i=1}^n g_{ij} v^i \in K$. This is well defined. If the right-hand side $e_j$ of the equation is applied to $v$, the result is $e_j(v) = e_j(\sum_{i=1}^n v^i e_i) = \sum_{i=1}^n v^i e_j(e_i)$, assuming that the ill-defined function $e_j : V \to K$ is at least linear! The left and right hand sides will now be equal if $g_{ij} = e_j(e_i)$ for all $i, j \in \mathbb{N}_n$. The product of basis vectors is not defined. But if an inner product is defined on $V$ as a bilinear function $h : V \times V \to K$ (satisfying certain criteria), then the expression $e_j(e_i)$ may be interpreted as such an inner product. (See Section 24.7 for inner products.) An inner product is, in fact, exactly what a metric tensor is at each point of a manifold. An inner product is an additional structure which is not available for a linear space according to Definition 22.1.1.
It may be concluded that the use of subscripts and superscripts to denote two different objects, for example the primal and dual space bases, is confusing in linear algebra because this convention has a different meaning in linear algebra to what it has in tensor calculus.
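Once an inner product is supplied, the index-raising of tensor calculus does recover the dual basis. This numpy sketch is illustrative only (using the standard inner product on $\mathbb{R}^2$ as the assumed metric): the Gram matrix $g_{ij} = \langle e_i, e_j \rangle$ lets the functional $h_j$ be represented as an ordinary vector via $g^{-1}$.

```python
# Numerical sketch (illustrative): WITH an inner product on R^2, the Gram
# matrix g_ij = <e_i, e_j> represents the dual basis functional h_j as the
# vector e^j = sum_i (g^{-1})_{ji} e_i, so that <e^j, v> = h_j(v). Without
# an inner product no such identification exists, which is the remark's point.
import numpy as np

E = np.column_stack([[1.0, 0.0], [1.0, 1.0]])   # non-orthonormal basis e_1, e_2
G = E.T @ E                                     # Gram matrix g_ij = <e_i, e_j>
H = np.linalg.inv(E)                            # rows represent h_j as functionals

E_raised = E @ np.linalg.inv(G)                 # column j is the raised vector e^j
v = np.array([3.0, -2.0])

# <e^j, v> reproduces h_j(v) for every v ...
assert np.allclose(E_raised.T @ v, H @ v)
# ... and h_i(e_j) = delta_ij as required.
assert np.allclose(H @ E, np.eye(2))
```

The identity behind the check is $(E G^{-1})^T = (E^T E)^{-1} E^T = E^{-1}$ for an invertible basis matrix $E$, which is exactly the index-raising computation in matrix form.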
23.9.2 Remark: Necessity of defining canonical dual component maps only for finite-dimensional spaces.
For a general linear space $V$ with a basis in Definition 23.9.1, it is not self-evident that $\mathrm{Range}(\kappa^*) \subseteq \mathrm{Fin}(I, K)$. It must be shown that only a finite number of components of $\kappa^*(f)$ are non-zero for any $f \in V^*$. As discussed in Remark 23.7.4, if the primal space is infinite-dimensional, there are always linear functionals $f$ in $V^*$ for which $\kappa^*(f)$ has an infinite number of non-zero values. So $\mathrm{Range}(\kappa^*) \not\subseteq \mathrm{Fin}(I, K)$. This is why the definition must be restricted to finite-dimensional linear spaces.
23.9.3 Theorem: Let $V$ be a finite-dimensional linear space with basis $B = (e_i)_{i \in I}$ and canonical dual basis $(h_i)_{i \in I}$, and let $\kappa^*$ be the dual component map in Definition 23.9.1. Then $f = \sum_{i \in I} \kappa^*(f)(i)\, h_i$ for all $f \in V^*$.
Proof: Let $(h_i)_{i \in I}$ be the canonical dual basis of $(e_i)_{i \in I}$. Let $f \in V^*$. Then
$$\forall v \in V,\quad \Big(\sum_{i \in I} \kappa^*(f)(i)\, h_i\Big)(v) = \sum_{i \in I} \kappa^*(f)(i)\, h_i(v) = \sum_{i \in I} f(e_i)\, \kappa_B(v)(i) = f(v)$$
by Definitions 23.9.1 and 23.8.2 and Theorem 23.4.6. Hence $f = \sum_{i \in I} \kappa^*(f)(i)\, h_i$.
23.9.4 Definition: Canonical component map (tuple version) for dual of finite-dimensional linear space.
The dual component map for the dual $V^*$ of an $n$-dimensional linear space $V$ over a field $K$ with $n \in \mathbb{Z}^+_0$, for a basis $(e_i)_{i=1}^n$ for $V$, is the map $\kappa^* : V^* \to K^n$ defined by $\kappa^* : f \mapsto (f(e_i))_{i=1}^n$ for $f \in V^*$.
$$\forall i \in I,\ \forall j \in J,\quad a_{ji} = \kappa_{B_2}(e^1_i)(j),$$
$$\forall i \in I,\ \forall j \in J,\quad a_{ij} = \kappa_{B_1}(e^2_j)(i).$$
Then
$$\forall j \in J,\quad h^2_j = \sum_{i \in I} a_{ji} h^1_i \eqno(23.9.1)$$
$$\forall i \in I,\quad h^1_i = \sum_{j \in J} a_{ij} h^2_j. \eqno(23.9.2)$$
Denote the components $(f^1_i)_{i \in I}$ and $(f^2_j)_{j \in J}$ of dual vectors $f \in V^*$ with respect to the bases $B_1$ and $B_2$ respectively by
$$\forall i \in I,\quad f^1_i = f(e^1_i),$$
$$\forall j \in J,\quad f^2_j = f(e^2_j).$$
Then
$$\forall f \in V^*,\ \forall j \in J,\quad f^2_j = \sum_{i \in I} f^1_i a_{ij} \eqno(23.9.3)$$
$$\forall f \in V^*,\ \forall i \in I,\quad f^1_i = \sum_{j \in J} f^2_j a_{ji}. \eqno(23.9.4)$$
Proof: By Definition 23.8.2, the dual bases $(h^1_i)_{i \in I}$ and $(h^2_j)_{j \in J}$ satisfy $h^1_i(v) = \kappa_{B_1}(v)(i)$ for $i \in I$ and $h^2_j(v) = \kappa_{B_2}(v)(j)$ for $j \in J$. So for all $v \in V$ and $j \in J$, $h^2_j(v) = \kappa_{B_2}(v)(j) = \kappa_{B_2}\big(\sum_{i \in I} \kappa_{B_1}(v)(i)\, e^1_i\big)(j) = \sum_{i \in I} \kappa_{B_1}(v)(i)\, \kappa_{B_2}(e^1_i)(j) = \sum_{i \in I} a_{ji} h^1_i(v)$, which verifies line (23.9.1). Line (23.9.2) follows similarly.
For all $f \in V^*$ and $j \in J$, $f^2_j = f(e^2_j) = f\big(\sum_{i \in I} e^1_i a_{ij}\big)$ by Theorem 22.9.11 line (22.9.3). So $f^2_j = \sum_{i \in I} f(e^1_i) a_{ij} = \sum_{i \in I} f^1_i a_{ij}$, which verifies line (23.9.3). Line (23.9.4) follows similarly.
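The transformation rules for dual components can be verified numerically. This numpy sketch is illustrative (the matrix `A` and the specific functional are arbitrary choices): if $e^2_j = \sum_i a_{ij} e^1_i$, then the dual components transform covariantly, $f^2_j = \sum_i f^1_i a_{ij}$.

```python
# Numerical sketch (illustrative) of the dual component transformation rules:
# if e2_j = sum_i a_ij e1_i, then f_j^2 = f(e2_j) satisfies
# f_j^2 = sum_i f_i^1 a_ij, i.e. dual components transform covariantly.
import numpy as np

E1 = np.eye(3)                                  # basis B1 = standard basis of R^3
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])                 # column j holds coefficients a_ij
E2 = E1 @ A                                     # e2_j = sum_i a_ij e1_i

f = np.array([1.0, -1.0, 2.0])                  # a functional, f(v) = f . v
f1 = np.array([f @ E1[:, i] for i in range(3)]) # f_i^1 = f(e1_i)
f2 = np.array([f @ E2[:, j] for j in range(3)]) # f_j^2 = f(e2_j)

assert np.allclose(f2, f1 @ A)                  # f_j^2 = sum_i f_i^1 a_ij
assert np.allclose(f1, f2 @ np.linalg.inv(A))   # the inverse transformation
```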
23.10.2 Definition: The second dual of a linear space $V$ is the dual of the dual of $V$.
23.10.3 Remark: The canonical immersion of a linear space inside its double dual.
The double-map-rule notation $\Phi : v \mapsto (f \mapsto f(v))$ in Definition 23.10.4 is described in Remark 10.19.3.
23.10.4 Definition: The canonical map from a linear space $V$ to its second dual is the map $\Phi : V \to V^{**}$ defined by $\Phi : v \mapsto (f \mapsto f(v))$. In other words, $\Phi(v)(f) = f(v)$ for all $v \in V$ and $f \in V^*$.
23.10.5 Theorem: Let $V$ be a linear space which has a basis. Then the canonical map from $V$ to its second dual is a well-defined linear space monomorphism.
Proof: The canonical map $\Phi$ in Definition 23.10.4 is well-defined because for all $v \in V$, $\Phi(v)$ is a linear map from $V^*$ to the field $K$ of the linear space $V$. (This follows from the calculation $\Phi(v)(\lambda_1 f_1 + \lambda_2 f_2) = (\lambda_1 f_1 + \lambda_2 f_2)(v) = \lambda_1 f_1(v) + \lambda_2 f_2(v) = \lambda_1 \Phi(v)(f_1) + \lambda_2 \Phi(v)(f_2)$ for all $v \in V$, $\lambda_1, \lambda_2 \in K$ and $f_1, f_2 \in V^*$.) Therefore $\Phi(v) \in V^{**}$.
For all $\lambda_1, \lambda_2 \in K$ and $v_1, v_2 \in V$, $\Phi(\lambda_1 v_1 + \lambda_2 v_2)(f) = f(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 f(v_1) + \lambda_2 f(v_2) = \lambda_1 \Phi(v_1)(f) + \lambda_2 \Phi(v_2)(f)$ for all $f \in V^*$. So $\Phi(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 \Phi(v_1) + \lambda_2 \Phi(v_2)$. Thus $\Phi$ is a linear space homomorphism.
To show that $\Phi$ is injective, suppose that $v_1, v_2 \in V$ are vectors such that $\Phi(v_1) = \Phi(v_2)$ and $v_1 \ne v_2$. By Theorem 23.5.3, there is a linear functional $f \in V^*$ such that $f(v_1 - v_2) \ne 0$. Then $0 = \Phi(v_1)(f) - \Phi(v_2)(f) = f(v_1) - f(v_2) = f(v_1 - v_2) \ne 0$. This is a contradiction. So $\Phi$ is injective.
23.10.6 Remark: The dual of the dual is isomorphic to the primal space if it is finite-dimensional.
The second dual $V^{**}$ of a finite-dimensional linear space $V$ is canonically isomorphic to the primal space $V$. Theorem 23.10.7 does not hold for infinite-dimensional linear spaces which have a basis. (The dual of an infinite-dimensional linear space with a basis is larger than the primal space.)
23.10.7 Theorem: Let $V$ be a finite-dimensional linear space. Then the canonical map from $V$ to its second dual is a linear space isomorphism.
Proof: Let $V$ be a finite-dimensional linear space. It follows from Theorem 23.10.5 that the canonical map $\Phi : V \to V^{**}$ is a monomorphism.
To show that $\Phi$ is surjective, let $g \in V^{**}$. It must be shown that there is a vector $v \in V$ such that $g(f) = f(v)$ for all $f \in V^*$. Let $\kappa : V \to \mathrm{Fin}(B, K)$ denote the component map relative to a basis $B$ for $V$. Then any vector $w \in V$ may be written as $w = \sum_{e \in B} \kappa(w)(e)\, e$. Define $\beta : B \to V^*$ by $\beta(e)(w) = \kappa(w)(e)$ for all $e \in B$ and $w \in V$. Then for any $f \in V^*$ and $w \in V$, $f(w) = f\big(\sum_{e \in B} \kappa(w)(e)\, e\big) = \sum_{e \in B} \kappa(w)(e)\, f(e) = \sum_{e \in B} \beta(e)(w)\, f(e)$. So $f = \sum_{e \in B} f(e)\, \beta(e)$ because $B$ is finite. Therefore for all $f \in V^*$, $g(f) = g\big(\sum_{e \in B} f(e)\, \beta(e)\big) = \sum_{e \in B} f(e)\, g(\beta(e)) = f\big(\sum_{e \in B} g(\beta(e))\, e\big) = f(v)$, where $v = \sum_{e \in B} g(\beta(e))\, e$. Then $v$ is a well-defined vector in $V$ because $(g(\beta(e)))_{e \in B} \in \mathrm{Fin}(B, K)$. So $\Phi$ is surjective. Hence $\Phi$ is a linear space isomorphism.
23.10.8 Remark: Space of linear maps in the special case that the domain or range is a field.
The space of linear maps $\mathrm{Lin}(V, W)$ for linear spaces $V$ and $W$ over a field $K$ has a special form when $V$ or $W$ is substituted with $K$. If $W$ is substituted with $K$, then $\mathrm{Lin}(V, K) = V^*$. If $V$ is substituted with $K$, then there is a natural isomorphism $\mathrm{Lin}(K, W) \cong W$, as shown in Theorem 23.10.9. It is usual to identify $\mathrm{Lin}(K, W)$ with $W$.
23.10.9 Theorem: Let $W$ be a linear space over a field $K$. Then the map $\Lambda : \mathrm{Lin}(K, W) \to W$ defined by $\Lambda(f) = f(1_K)$ for all $f \in \mathrm{Lin}(K, W)$ is a linear space isomorphism.
Proof: Clearly $\Lambda$ is a linear map. To show surjectivity, let $g \in W$ and note that the map $f : K \to W$ defined by $f(k) = kg$ is well defined and linear. So $f \in \mathrm{Lin}(K, W)$. But $\Lambda(f) = f(1_K) = g$. So $\Lambda$ is surjective. To show injectivity, let $f_1, f_2 \in \mathrm{Lin}(K, W)$ satisfy $\Lambda(f_1) = \Lambda(f_2)$. Then $0_W = \Lambda(f_1) - \Lambda(f_2) = \Lambda(f_1 - f_2) = (f_1 - f_2)(1_K)$. By the linearity of $f_1 - f_2$, $(f_1 - f_2)(k) = k\,(f_1 - f_2)(1_K) = 0_W$ for all $k \in K$. So $f_1 = f_2$. Hence $\Lambda$ is injective.
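Theorem 23.10.9 is easy to illustrate concretely. This Python sketch is illustrative only (the names `from_vector` and `Lambda` are assumptions): a linear map $f : K \to W$ is completely determined by $f(1_K)$, since $f(k) = k\,f(1_K)$.

```python
# Sketch of Theorem 23.10.9 (illustrative): a linear map f : K -> W is
# determined by f(1), so Lin(K, W) is naturally identified with W.
import numpy as np

def from_vector(g):
    """The linear map k |-> k*g in Lin(K, W), for W = R^2 and K = R."""
    return lambda k: k * g

def Lambda(f):
    """The isomorphism Lin(K, W) -> W, f |-> f(1_K)."""
    return f(1.0)

g = np.array([2.0, -3.0])
f = from_vector(g)
assert np.allclose(Lambda(f), g)        # Lambda(from_vector(g)) = g
assert np.allclose(f(2.5), 2.5 * g)     # linearity: f(k) = k f(1)
```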
23.10.10 Remark: The application of primal and dual linear spaces as tangent vectors and covectors.
In the differentiable manifolds context, if the primal linear space is the tangent space at a point, then the dual space is the set of all differentials of differentiable real-valued functions at that point. (See Section 53.2.) A tangent vector $v$ on a differentiable manifold $M$ may be thought of as an equivalence class of $C^1$ curves which have the same velocity at a point $p$. Such a vector may be identified with the common velocity of the curves. This is then pictured as an arrow as in Figure 23.8.1. Similarly, a tangent covector may be thought of as an equivalence class of $C^1$ real-valued functions on $M$ which have the same differential at $p$. The covector is identified with the common differential of the functions. This is then pictured as a set of contour lines as in Figure 23.8.1. Thus one may think of the arrows and contour lines of linear algebra as limiting pictures for differentiable manifold concepts. (Note that the word "gradient" cannot be substituted for "differential" here because, technically speaking, the gradient requires a metric tensor. See Definition 68.6.2.)
All in all, it seems best to use the terms transpose or transposed map for the concept in Definition 23.11.2
to minimise confusion.
23.11.4 Example: Transpose map for immersions and submersions of finite-dimensional spaces.
Define φ ∈ Lin(V, W) for V = ℝ^m, W = ℝ^n, for m, n ∈ ℤ⁺₀ with m ≤ n, by φ : v ↦ (v1, ..., vm, 0, ..., 0).
Then φᵀ : W* → V* is defined by φᵀ(f) : v ↦ f(φ(v)) = f(v1, ..., vm, 0, ..., 0) for all f ∈ W*. Thus
φᵀ(f) : V → ℝ is defined on ℝ^m by evaluating f on the subspace Range(φ) of W and attributing the value
to the input tuple (v1, ..., vm). Then φ is injective and φᵀ is surjective, but if m < n, then φ is not surjective
and φᵀ is not injective. (This is related to Theorems 23.11.9 and 23.11.10.)
Now let m ≥ n and define φ ∈ Lin(V, W) by φ : v ↦ (v1, ..., vn). Then φᵀ : W* → V* is defined by
φᵀ(f) : v ↦ f(φ(v)) = f(v1, ..., vn) for all f ∈ W*. So φ is surjective and φᵀ is injective, but if m > n, then
φ is not injective and φᵀ is not surjective.
In regard to Theorems 23.11.14, 23.11.15 and 23.11.17, it would be interesting to know whether the map
μ : Lin(V, W) → Lin(W*, V*) defined by μ : φ ↦ φᵀ is surjective in this example. All linear maps from V to
W have the form φ : v ↦ Σ_{i=1}^m v_i g_i for some (g_i)_{i=1}^m ∈ W^m. Then φᵀ has the form φᵀ(f)(v) = Σ_{i=1}^m v_i f(g_i)
for f ∈ W* and v ∈ ℝ^m. But the functions g_i : ℕ → ℝ have finite support for i ∈ ℕ_m. Therefore only
those linear functionals in Lin(W*, V*) which have finite support are represented in Range(μ). It cannot
be guaranteed in pure ZF set theory that there are no linear functionals in W* which do not have finite
support. In fact, in ZF models where the axiom of choice is valid, linear functionals with infinite support do
exist in W*. Therefore it cannot be claimed that μ : Lin(V, W) → Lin(W*, V*) is surjective.
23.11.6 Remark: The relation of the transpose of a linear map to the pull-back concept.
If V = W and φ = id_V in Definition 23.11.2, then φᵀ = id_{V*}. In the general case, the formula φᵀ(f) = f ∘ φ
may be written as φᵀ = R_φ, where R_φ is the right action on linear functionals f ∈ W* by the map φ. This
is actually a special case of the concept of the pull-back of the function f : W → K by the map φ to a
function f ∘ φ : V → K. The pull-back concept is used frequently for functions on manifolds.
The natural generalisation of this to the set K^W of all (not necessarily linear) K-valued functions on W
would have the same definition φᵀ : f ↦ f ∘ φ. Another useful generalisation would be to linear maps from
V and W to a third linear space U. Thus a transposed map φᵀ : Lin(W, U) → Lin(V, U) may be defined by
the same rule φᵀ(f) = f ∘ φ for all f ∈ Lin(W, U).
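The pull-back rule applies verbatim to non-linear functions, as this small sketch shows (illustrative functions only):

```python
# Sketch of the pull-back rule phi^T(f) = f o phi from the remark above,
# applied to a function f that is not linear.
phi = lambda v: (v[0], v[1], 0.0)    # phi : R^2 -> R^3 (a linear inclusion)
f = lambda w: w[0] ** 2 + w[2]       # f : R^3 -> R, not linear
pullback_f = lambda v: f(phi(v))     # the pull-back f o phi : R^2 -> R
assert pullback_f((3.0, 1.0)) == 9.0
```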
23.11.8 Remark: Properties of transposes of linear space maps.
Theorems 23.11.9, 23.11.10 and 23.11.12 give some basic properties of the transpose of a linear space map.
The terms monomorphism and epimorphism were introduced in Definition 23.1.8 to mean injective and
surjective homomorphisms respectively. It is noteworthy that Theorem 23.11.10 requires the target space
W to have a well-ordered basis. This is required to ensure the existence of a linear extension of a linear
functional on a subspace of W to the whole of W .
23.11.9 Theorem: Let φ : V → W be a linear space epimorphism between linear spaces V and W over a
field K. Then the transpose map φᵀ : W* → V* is a linear space monomorphism from W* to V*.
Proof: Let φ : V → W be a linear space epimorphism for linear spaces V and W over a field K. Suppose
g1, g2 ∈ W* satisfy φᵀ(g1) = φᵀ(g2). Then ∀v ∈ V, g1(φ(v)) = g2(φ(v)) by Definition 23.11.2. Since φ is
surjective, ∀w ∈ W, ∃v ∈ V, φ(v) = w. Therefore ∀w ∈ W, ∃v ∈ V, (φ(v) = w and g1(φ(v)) = g2(φ(v))).
This is equivalent to ∀w ∈ W, ∃v ∈ V, (φ(v) = w and g1(w) = g2(w)). So ∀w ∈ W, ∃v ∈ V, (g1(w) = g2(w)).
So ∀w ∈ W, (g1(w) = g2(w)). Therefore g1 = g2. Hence φᵀ is a monomorphism.
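A finite-dimensional numerical sketch of Theorem 23.11.9, identifying functionals with row vectors so that φᵀ(g) = gA (assuming NumPy; the matrix is illustrative):

```python
import numpy as np

# Sketch of Theorem 23.11.9: phi : R^3 -> R^2 with matrix A of full row
# rank is an epimorphism. Identifying functionals with row vectors,
# phi^T(g) = g @ A, and phi^T is injective because g @ A = 0 forces
# g = 0 when the rows of A are linearly independent.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])        # matrix of phi, shape (2, 3)
assert np.linalg.matrix_rank(A) == 2   # phi is surjective
g = np.array([4.0, -7.0])              # a nonzero functional on W = R^2
assert not np.allclose(g @ A, 0)       # phi^T(g) != 0, as injectivity requires
```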
23.11.10 Theorem: Let φ : V → W be a linear space monomorphism between linear spaces V and W
over a field K. If W has a well-ordered basis, then the transpose map φᵀ : W* → V* is a linear space
epimorphism from W* to V*.
Proof: Let φ : V → W be a linear space monomorphism for linear spaces V and W over a field K.
Suppose W has a well-ordered basis. Let φᵀ : W* → V* denote the transpose of φ. Since φ is injective, it
has a unique well-defined left inverse φ⁻¹ : φ(V) → V such that φ⁻¹ ∘ φ = id_V. (See Figure 23.11.1.)
For f ∈ V*, define the linear functional g_f : φ(V) → K by g_f(w) = f(φ⁻¹(w)) for w in the linear subspace
φ(V) of W. By Theorem 23.5.7, since W has a well-ordered basis, g_f can be extended to a linear
functional ĝ_f : W → K. Then for all v ∈ V, φᵀ(ĝ_f)(v) = ĝ_f(φ(v)) = f(φ⁻¹(φ(v))) = f(v). So φᵀ(ĝ_f) = f.
Hence ∀f ∈ V*, ∃g ∈ W*, φᵀ(g) = f. In other words, φᵀ is an epimorphism.
Figure 23.11.1 Extension of the functional g_f from φ(V) to W (diagram omitted)
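The extension step of this proof can be sketched numerically: for full-column-rank A, the Moore-Penrose pseudoinverse supplies one extension of f ∘ φ⁻¹ from φ(V) to all of W, since pinv(A) A = I (assuming NumPy; the matrix is illustrative):

```python
import numpy as np

# Sketch of Theorem 23.11.10: phi : R^2 -> R^3 with matrix A of full
# column rank is a monomorphism. Given f in V*, one functional g in W*
# with phi^T(g) = f is g = f @ pinv(A), because pinv(A) @ A = I here.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 3.0]])             # matrix of phi, shape (3, 2)
assert np.linalg.matrix_rank(A) == 2   # phi is injective
f = np.array([5.0, -1.0])              # a functional f in V* = (R^2)*
g = f @ np.linalg.pinv(A)              # extends f o phi^{-1} from phi(V) to W
assert np.allclose(g @ A, f)           # phi^T(g) = f, so phi^T is surjective
```

The pseudoinverse is only one choice of extension (zero on an orthogonal complement); the proof's ĝ_f may differ.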
23.11.12 Theorem: Let φ : V → W be an isomorphism between linear spaces V and W over a field K.
Then the transpose map φᵀ : W* → V* is an isomorphism from W* to V*. Hence if V and W are isomorphic
linear spaces over a field K, then V* and W* are isomorphic linear spaces over K.
Proof: Let V and W be linear spaces over a field K. Let φ : V → W be a linear space isomorphism.
Then the inverse map φ⁻¹ : W → V is a well-defined linear space isomorphism.
For any f ∈ V*, let h_f = f ∘ φ⁻¹. Then h_f ∈ W*. So φᵀ(h_f) = h_f ∘ φ = f ∘ φ⁻¹ ∘ φ = f. Hence
∀f ∈ V*, ∃h ∈ W*, φᵀ(h) = f. That is, φᵀ is surjective.
Let g1, g2 ∈ W* and suppose φᵀ(g1) = φᵀ(g2) ∈ V*. This means g1 ∘ φ = g2 ∘ φ ∈ V*. So g1 ∘ φ ∘ φ⁻¹ =
g2 ∘ φ ∘ φ⁻¹ ∈ W*. That is, g1 = g2. So φᵀ is injective. Hence φᵀ is a bijection. Hence φᵀ is a linear space
isomorphism.
23.11.13 Remark: Linear space morphism properties of linear space transpose maps.
It is not at all surprising that the linear space transpose map is a linear map between two spaces of linear
maps in Theorem 23.11.14. The transpose map cannot be expected to be surjective in general because
the algebraic dual of an infinite-dimensional linear space is typically larger than the primal space. (See
Theorem 23.7.14.)
23.11.14 Theorem: Let V and W be linear spaces over a field K. Then μ : Lin(V, W) → Lin(W*, V*)
defined by μ : φ ↦ φᵀ is a linear map.
Proof: Let V and W be linear spaces over K. Define μ : Lin(V, W) → Lin(W*, V*) by μ : φ ↦ φᵀ. Then
μ is a well-defined map by Definition 23.11.2. (The linear space structure on Lin(V, W) and Lin(W*, V*) is
given in Definition 23.1.6.) Let λ1, λ2 ∈ K and φ1, φ2 ∈ Lin(V, W). Then by Definition 23.11.2,
(λ1φ1 + λ2φ2)ᵀ(f)(v) = f((λ1φ1 + λ2φ2)(v)) = λ1 f(φ1(v)) + λ2 f(φ2(v)) = (λ1 φ1ᵀ(f) + λ2 φ2ᵀ(f))(v)
for all f ∈ W* and v ∈ V. So μ(λ1φ1 + λ2φ2) = λ1 μ(φ1) + λ2 μ(φ2). Hence μ is a linear map.
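In matrix terms the linearity of transposition is immediate, and can be checked numerically (assuming NumPy; matrices are illustrative):

```python
import numpy as np

# Sketch of Theorem 23.11.14: transposition is linear,
# (lam1*phi1 + lam2*phi2)^T = lam1*phi1^T + lam2*phi2^T.
rng = np.random.default_rng(0)
P1 = rng.normal(size=(3, 2))         # matrix of phi1 : R^2 -> R^3
P2 = rng.normal(size=(3, 2))         # matrix of phi2 : R^2 -> R^3
lam1, lam2 = 2.0, -3.0
assert np.allclose((lam1 * P1 + lam2 * P2).T, lam1 * P1.T + lam2 * P2.T)
```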
23.11.15 Theorem: Let V and W be linear spaces over a field K. Suppose that W has a basis. Then the
map μ : Lin(V, W) → Lin(W*, V*) defined by μ : φ ↦ φᵀ is a linear space monomorphism.
Proof: Let V and W be linear spaces over K such that W has a basis. Then μ : Lin(V, W) → Lin(W*, V*)
is a linear map by Theorem 23.11.14.
Suppose that φ1, φ2 ∈ Lin(V, W) satisfy μ(φ1) = μ(φ2). Then φ1ᵀ = φ2ᵀ. So f(φ1(v)) = f(φ2(v)) for all
f ∈ W* and v ∈ V. Therefore φ1(v) = φ2(v) for all v ∈ V by Theorem 23.6.9 since W has a basis. So
φ1 = φ2. Hence μ is a linear space monomorphism.
23.11.16 Remark: The isomorphism from linear maps to their transposes for finite-dimensional spaces.
Theorem 23.11.17 shows that for the special case of finite-dimensional linear spaces over the same field, the
map from linear maps to their transposes is an isomorphism. This is useful for establishing isomorphisms
between certain kinds of mixed tensor spaces. (See Remark 29.3.1.)
The generalisation of Theorem 23.11.17 to infinite-dimensional spaces is not valid, even if those spaces have
bases, because not all elements of the dual of the dual of a linear space have finite support. (This issue is
discussed in Remark 23.7.16 and Example 23.11.5.)
23.11.17 Theorem: Let V and W be finite-dimensional linear spaces over a field K. Then the map
μ : Lin(V, W) → Lin(W*, V*) defined by μ : φ ↦ φᵀ is a linear space isomorphism.
Proof: Let V and W be finite-dimensional linear spaces over K. Then μ is a linear space monomorphism
by Theorem 23.11.15. Let θ ∈ Lin(W*, V*). Let B_V = (e^V_i)_{i∈I} and B_W = (e^W_j)_{j∈J} be finite bases for V
and W respectively, and let B*_V = (h^V_i)_{i∈I} and B*_W = (h^W_j)_{j∈J} be the corresponding canonical dual bases
as in Definition 23.7.3. Then B*_V and B*_W are bases for V* and W* respectively by Theorem 23.7.8. Define
φ ∈ Lin(V, W) by φ : v ↦ Σ_{j∈J} Σ_{i∈I} θ(h^W_j)(e^V_i) h^V_i(v) e^W_j. Then by Theorem 23.7.10, for all f ∈ W* and v ∈ V,
φᵀ(f)(v) = f(φ(v)) = Σ_{j∈J} Σ_{i∈I} f(e^W_j) θ(h^W_j)(e^V_i) h^V_i(v)
= Σ_{j∈J} f(e^W_j) θ(h^W_j)(Σ_{i∈I} h^V_i(v) e^V_i)
= Σ_{j∈J} f(e^W_j) θ(h^W_j)(v)
= θ(Σ_{j∈J} f(e^W_j) h^W_j)(v)
= θ(f)(v).
So μ(φ) = φᵀ = θ. Hence μ is surjective, and so μ is a linear space isomorphism.
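In coordinates the construction of φ from θ amounts to reading off matrix entries, which can be sketched numerically (assuming NumPy; dual spaces are identified with row vectors, an identification not made in the text):

```python
import numpy as np

# Sketch of Theorem 23.11.17: identify W* and V* with row vectors, so
# theta in Lin(W*, V*) is a matrix T acting by f -> f @ T. The proof's
# double-sum formula reads off phi's matrix entries as
# phi[j, i] = theta(h^W_j)(e^V_i) = T[j, i], and then mu(phi) = theta.
rng = np.random.default_rng(1)
T = rng.normal(size=(3, 2))          # theta : W* -> V*, dim W = 3, dim V = 2
A = np.array([[T[j, i] for i in range(2)] for j in range(3)])  # matrix of phi
f = rng.normal(size=3)               # an arbitrary functional f in W*
assert np.allclose(f @ A, f @ T)     # phi^T(f) = theta(f) for all f
```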
23.11.19 Definition: The dual representation or contragredient representation of an invertible linear map
L on a finite-dimensional linear space V is the map L̃ : V* → V* defined by L̃(θ) = θ ∘ L⁻¹ for all θ ∈ V*.
In other words, each group element is represented by its dual (or contragredient) representation.
Chapter 24
Linear space constructions
can be multiplied by summing the powers. Thus one defines direct products of groups (Definition 17.7.14),
but in the case of linear spaces (and modules over rings), one generally defines direct sums.
24.1.2 Definition: The external direct sum of a family (V_α)_{α∈A} of linear spaces over a field K is the
set
⊕_{α∈A} V_α = { (v_α)_{α∈A} ∈ Π_{α∈A} V_α ; #{α ∈ A; v_α ≠ 0} < ∞ }
together with the operations of componentwise addition and scalar multiplication defined by
∀(v_α)_{α∈A}, (w_α)_{α∈A} ∈ ⊕_{α∈A} V_α, (v_α)_{α∈A} + (w_α)_{α∈A} = (v_α + w_α)_{α∈A}
and
∀λ ∈ K, ∀(v_α)_{α∈A} ∈ ⊕_{α∈A} V_α, λ(v_α)_{α∈A} = (λv_α)_{α∈A}.
The external direct sum of linear spaces may also be called a formal direct sum.
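A minimal computational sketch of this definition, with each V_α taken to be the scalar field for simplicity: finitely supported families can be stored as dicts that omit zero components (helper names are illustrative):

```python
# Sketch of the external direct sum operations: elements are finitely
# supported families, represented as dicts with zero components omitted.
def ds_add(v, w):
    total = {a: v.get(a, 0) + w.get(a, 0) for a in set(v) | set(w)}
    return {a: x for a, x in total.items() if x != 0}  # keep finite support

def ds_scale(lam, v):
    return {a: lam * x for a, x in v.items() if lam * x != 0}

v = {"alpha": 2, "beta": 5}
w = {"beta": -5, "gamma": 1}
assert ds_add(v, w) == {"alpha": 2, "gamma": 1}   # componentwise addition
assert ds_scale(0, v) == {}                       # the zero element
```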
24.1.3 Notation: External direct sums of linear spaces.
⊕_{α∈A} V_α denotes the external direct sum of a family (V_α)_{α∈A} of linear spaces.
⊕_{i=0}^{n−1} V_i denotes the external direct sum ⊕_{α∈A} V_α in the case A = n = {0, 1, ..., n − 1}.
⊕_{i=1}^{n} V_i denotes the external direct sum ⊕_{α∈A} V_α in the case A = ℕ_n for n ∈ ℤ⁺₀.
⊕^n V denotes the external direct sum ⊕_{i=1}^{n} V_i in the case n ∈ ℤ⁺₀ and V_i = V for all i ∈ ℕ_n.
V_1 ⊕ V_2 denotes the external direct sum ⊕_{i=1}^{2} V_i.
V_1 ⊕ V_2 ⊕ V_3 denotes the external direct sum ⊕_{i=1}^{3} V_i, and so forth.
24.1.4 Remark: The difference between internal and external direct sums of linear spaces.
The formal or abstract direct sum in Definition 24.1.2 is referred to as an external direct sum by Hartley/
Hawkes [87], definition 3.1, page 34, whereas they call Definition 24.1.6 (corresponding to their definition 3.2)
the internal direct sum.
Most authors use the simple term direct sum of linear spaces to mean the internal direct sum which
is given by Definition 24.1.6. Since the internal and external direct sums of linear spaces are different set
constructions, they should have different notations to distinguish them. But when the internal direct sum is
defined, it is equivalent to the external direct sum. This reduces the motivation to find a separate notation.
The external direct sum of linear spaces has much broader applicability than the internal direct sum. But
the internal direct sum is more often used in applications. It seems inevitable that both the internal and
external constructions should share a single notation.
Hartley/Hawkes [87], page 34, make the following useful observation in regard to external and internal direct
sums of rings.
The reason for the use of the same notation for external and internal direct sums will appear
shortly. The external direct sum should be thought of as a way of building up more complicated
rings from given ones, and the internal direct sum as a way of breaking a given ring down into
easier components.
This comment is equally valid for direct sums of linear spaces. The reason for the use of the same notation
is that the external and internal direct sums are isomorphic, as shown here in Theorem 24.1.8.
It seems fairly safe to use the same notation for external and internal direct sums of linear spaces, since the
interpretation will generally be clear from the context.
24.1.5 Definition: The standard injection i_β : V_β → ⊕_{α∈A} V_α is defined by
∀w ∈ V_β, (i_β(w))_α = w if α = β, and (i_β(w))_α = 0_{V_α} if α ≠ β.
24.1.6 Definition: Let V_1 and V_2 be subspaces of a linear space V over a field K such that V_1 ∩ V_2 = {0}.
Then the (internal) direct sum of V_1 and V_2 is the subspace of V defined by
{v_1 + v_2 ; v_1 ∈ V_1, v_2 ∈ V_2}.
24.1.7 Remark: The direct sum of subspaces with trivial intersection equals the span of their union.
Definition 24.1.6 means that the direct sum of two subspaces with trivial intersection is the span of the union
of the two subspaces. (See Definition 22.4.2 for the span of a subset of a linear space.) Thus
{v_1 + v_2 ; v_1 ∈ V_1, v_2 ∈ V_2} = ⋂ { U ∈ ℙ(V) ; U is a subspace of V and V_1 ∪ V_2 ⊆ U }.
24.1.8 Theorem: Let V_1, V_2, V_3 be subspaces of a linear space V over a field K such that V_3 is the internal
direct sum of V_1 and V_2. Define a map θ : V_1 ⊕ V_2 → V_3 by
∀v_1 ∈ V_1, ∀v_2 ∈ V_2, θ(v_1, v_2) = v_1 + v_2.
Then θ is a linear space isomorphism.
24.1.10 Notation: V1 V2 denotes the internal direct sum of linear subspaces V1 and V2 which is given
by Definition 24.1.6.
24.1.11 Remark: Extension of the sum-of-dimensions rule for direct sums to infinite-dimensional spaces.
Theorem 24.1.12 is restricted to finite-dimensional spaces because the situation for infinite-dimensional spaces
is much less clear. Whether linear space dimension is defined in terms of beta-cardinality of spanning sets
(as in Definition 22.5.2) or in terms of conventional cardinality (which encounters difficulties with the axiom
of choice and some other issues), the cardinal arithmetic must be carefully defined and carefully applied.
Theorem 24.1.12 can be extended to infinite-dimensional linear spaces, but the effort in the general case is
not necessarily justified by the benefit.
The assertion of Theorem 24.1.12 is fairly obviously true. The proof given here is somewhat convoluted
because of the need to manage doubly indexed families of vectors. The management of multiply indexed
families of vectors becomes even more tedious in tensor algebra. It is probably best to avoid even reading
such proofs, which sometimes resemble computer programming more than mathematics.
24.1.12 Theorem: The dimension of a direct sum of finitely many finite-dimensional linear spaces equals
the sum of the dimensions of the component spaces.
Proof: Let (V_i)_{i=1}^m be finite-dimensional linear spaces over a field K, with basis B_i = (e_{i,j})_{j=1}^{n_i} for V_i,
where n_i = dim(V_i), for i ∈ ℕ_m. Let w ∈ ⊕_{i=1}^m V_i. Then w = Σ_{i=1}^m Σ_{j=1}^{n_i} λ_{i,j} e′_{i,j}
for some sequences (λ_{i,j})_{j=1}^{n_i} ∈ K^{n_i} for i ∈ ℕ_m. So w ∈ span(B_0), where B_0 = {e′_{i,j} ; (i, j) ∈ I_0}, where
I_0 = {(i, j) ∈ ℕ_m × ℤ⁺ ; i ∈ ℕ_m and j ∈ ℕ_{n_i}}, and e′_{i,j} = (δ_{k,i} e_{i,j})_{k=1}^m ∈ ⊕_{i=1}^m V_i for all (i, j) ∈ I_0. Thus
⊕_{i=1}^m V_i is spanned by B_0, where #(B_0) = Σ_{i=1}^m n_i. But the elements of B_0 are linearly independent because
each set B_i is linearly independent for i ∈ ℕ_m and the components of ⊕_{i=1}^m V_i cannot be non-trivial linear
combinations of each other. Hence dim(⊕_{i=1}^m V_i) = Σ_{i=1}^m n_i.
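The sum-of-dimensions rule can be checked numerically by stacking bases of the component spaces block-diagonally, mirroring the vectors e′_{i,j} in the proof (assuming NumPy; the subspaces are illustrative):

```python
import numpy as np

# Sketch of Theorem 24.1.12: embed bases of V1 and V2 block-diagonally
# in the direct sum and count independent vectors.
B1 = np.eye(2)                       # basis of V1, dim V1 = 2
B2 = np.array([[1.0, 1.0],
               [0.0, 2.0],
               [0.0, 0.0]])          # basis of a 2-dimensional V2 inside R^3
block = np.zeros((5, 4))
block[:2, :2] = B1                   # images of V1's basis vectors
block[2:, 2:] = B2                   # images of V2's basis vectors
assert np.linalg.matrix_rank(block) == \
       np.linalg.matrix_rank(B1) + np.linalg.matrix_rank(B2)
```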
24.1.13 Remark: Sum-of-dimensions rule for direct sums. Product-of-dimensions for tensor products.
Theorem 24.1.12 may be contrasted with the situation for tensor products of linear spaces. The dimension
of a tensor product of finite-dimensional linear spaces equals the product of the dimensions of the component
spaces. (See Theorem 28.1.20 for the dimension of a tensor product of linear spaces.)
24.1.14 Remark: Free linear spaces on finite sets are special kinds of direct products.
The free linear space on a finite set S in Definition 22.2.10 is a direct product of #(S) copies of the field K
regarded as a linear space over itself. Thus dim(FL(S, K)) = #(S). (This is shown in Theorem 22.7.15, but
without the interpretation of free linear spaces as direct products.)
∀x ∈ S, ∀v ∈ V, (v ∈ S ⇔ v − x ∈ U).
24.2.4 Definition:
The (left) translate of a subset S of a linear space V by an element x ∈ V is the set {x + v; v ∈ S}.
The right translate of a subset S of a linear space V by an element x ∈ V is the set {v + x; v ∈ S}.
24.2.5 Notation:
x + S, for an element x and a subset S of a linear space V, denotes the left translate of S in V by x.
S + x, for an element x and a subset S of a linear space V, denotes the right translate of S in V by x.
24.2.6 Theorem:
Let V be a linear space. Let S be a subset of V. Let x ∈ V. Then ∀v ∈ V, (v ∈ x + S ⇔ v − x ∈ S).
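Theorem 24.2.6 can be illustrated concretely with finite subsets of ℤ² (illustrative data):

```python
# Sketch of Theorem 24.2.6 for finite subsets of Z^2:
# v in x + S if and only if v - x in S.
S = {(0, 0), (1, 2), (3, 1)}
x = (5, -1)
xS = {(x[0] + s[0], x[1] + s[1]) for s in S}   # the translate x + S
for v in [(5, -1), (6, 1), (8, 0), (0, 0)]:
    assert (v in xS) == ((v[0] - x[0], v[1] - x[1]) in S)
```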
be an element of S such that ∀v ∈ V, (v ∈ S ⇔ v − y_0 ∈ U). Let x ∈ S. Then x − y_0 ∈ U. So
v − y_0 ∈ U ⇔ (v − y_0) − (x − y_0) ∈ U ⇔ v − x ∈ U for all v ∈ V. Therefore ∀v ∈ V, (v ∈ S ⇔ v − x ∈ U).
But ∀v ∈ V, (v ∈ x + U ⇔ v − x ∈ U) by Theorem 24.2.6. So ∀v ∈ V, (v ∈ S ⇔ v ∈ x + U). Hence S = x + U.
For part (iv), let x ∈ V and y ∈ x + U. Suppose that z ∈ x + U. Then y − x ∈ U and z − x ∈ U by
Theorem 24.2.6. So (z − x) − (y − x) = z − y ∈ U because U is a subspace. So z ∈ y + U by Theorem 24.2.6.
Therefore x + U ⊆ y + U. Similarly, suppose that z ∈ y + U. Then (z − y) + (y − x) = z − x ∈ U. So z ∈ x + U.
Therefore y + U ⊆ x + U. Hence x + U = y + U.
For part (v), let x ∈ V and note that 0_V ∈ U by Theorem 22.1.9. So x = x + 0_V ∈ x + U by Definition 24.2.4.
Hence ∀x ∈ V, x ∈ x + U.
For part (vi), let x_1, x_2 ∈ V. Suppose that (x_1 + U) ∩ (x_2 + U) ≠ ∅. Then y ∈ x_1 + U and y ∈ x_2 + U for
some y ∈ V. So x_1 + U = y + U = x_2 + U by part (iv). Therefore for all x_1, x_2 ∈ V, either (x_1 + U) ∩ (x_2 + U) = ∅
or x_1 + U = x_2 + U. But by part (v), V ⊆ ⋃_{x∈V}(x + U). So V = ⋃_{x∈V}(x + U). Hence {x + U; x ∈ V} is a
partition of V by Definition 8.6.11.
For part (vii), let x_1, x_2 ∈ V. Suppose that x_1 + U = x_2 + U. Let y ∈ x_1 + U. Then y − x_1 ∈ U by
Theorem 24.2.6. Similarly, y − x_2 ∈ U. So x_1 − x_2 = (y − x_2) − (y − x_1) ∈ U because U is a subspace. Now
suppose that x_1 − x_2 ∈ U. Then x_1 ∈ x_2 + U by Theorem 24.2.6. So x_1 + U = x_2 + U by part (iv).
24.2.8 Definition: The quotient (linear) space of a linear space V with respect to a subspace U of V is
the set {v + U ; v ∈ V} together with addition and scalar multiplication operations defined by
∀v_1, v_2 ∈ V, (v_1 + U) + (v_2 + U) = (v_1 + v_2) + U (24.2.1)
∀λ ∈ K, ∀v ∈ V, λ(v + U) = (λv) + U. (24.2.2)
24.2.9 Notation: V/U, for a linear space V and subspace U, denotes the quotient space of V over U.
Proof: The sets v_1 + U, v_2 + U and (v_1 + v_2) + U in Definition 24.2.8, line (24.2.1), are well defined by
Definition 24.2.4. It must be shown that the defined addition operation for cosets v_1 + U and v_2 + U is
independent of the vectors v_1 and v_2 when the sets are the same. In other words, it must be shown that
(v_1 + U) + (v_2 + U) = (v_1′ + U) + (v_2′ + U) if v_1 + U = v_1′ + U and v_2 + U = v_2′ + U. So suppose that
v_1 + U = v_1′ + U and v_2 + U = v_2′ + U. Then v_1 − v_1′ ∈ U and v_2 − v_2′ ∈ U by Theorem 24.2.7 (vii). So
(v_1 + v_2) − (v_1′ + v_2′) = (v_1 − v_1′) + (v_2 − v_2′) ∈ U. Therefore (v_1 + v_2) + U = (v_1′ + v_2′) + U by Theorem 24.2.7 (vii).
This verifies that line (24.2.1) of Definition 24.2.8 is well defined.
To show the well-definition of line (24.2.2), let v_1, v_1′ ∈ V satisfy v_1 + U = v_1′ + U. Then v_1 − v_1′ ∈ U by
Theorem 24.2.7 (vii). Let λ ∈ K. Then λ(v_1 − v_1′) ∈ U because U is a subspace. So λv_1 − λv_1′ ∈ U. Hence
(λv_1) + U = (λv_1′) + U by Theorem 24.2.7 (vii).
24.2.11 Theorem: Let U be a linear subspace of a linear space V . Then U is the additive identity of the
quotient linear space V /U . In other words, 0V /U = U .
24.2.12 Definition: The natural homomorphism from a linear space V to the quotient linear space V/U,
for a subspace U of V, is the map π : V → V/U defined by ∀v ∈ V, π(v) = v + U.
24.2.13 Remark: Natural homomorphisms for quotient linear spaces are epimorphisms.
The natural homomorphism in Definition 24.2.12 is a linear space epimorphism. This follows easily from
Definitions 23.1.8 and 24.2.8. The maps and spaces in Definition 24.2.12 are illustrated in Figure 24.2.1.
Figure 24.2.1 Natural homomorphism between a linear space V and a quotient space V/U (diagram omitted)
24.2.14 Theorem:
Let U be a linear subspace of a finite-dimensional linear space V. Then dim(V/U) = dim(V) − dim(U).
Proof: Let (e_i)_{i=1}^m be a basis for U, extended to a basis (e_i)_{i=1}^n for V. Then
V/U = {v + U ; v ∈ V}
= { Σ_{i=1}^n λ_i e_i + U ; λ ∈ K^n }
= { Σ_{i=1}^n λ_i (e_i + U) ; λ ∈ K^n }
= { Σ_{i=m+1}^n λ_i′ (e_i + U) ; λ′ ∈ K^{n−m} },
since e_i + U = 0_{V/U} for i ≤ m. So V/U is spanned by the n − m linearly independent cosets
(e_i + U)_{i=m+1}^n. Hence dim(V/U) = n − m = dim(V) − dim(U).
φ̄(v_1 + U) = φ̄(v_2 + U), and noting that then 0_W = φ̄(v_1 + U) − φ̄(v_2 + U) = φ̄((v_1 − v_2) + U) = φ(v_1 − v_2), and
so v_1 − v_2 ∈ ker(φ) = U, and therefore v_1 + U = v_2 + U by Theorem 24.2.7 (vii). Hence φ̄ : V/ker(φ) → φ(V)
is a linear space isomorphism by Definition 23.1.8.
24.2.17 Remark: Rank-nullity theorem.
Theorem 24.2.18, which follows from Theorem 24.2.16, is sometimes known as the rank-nullity theorem
because it relates the rank, dim(Range(φ)), of φ to its nullity, dim(ker(φ)). (See Definition 23.1.26 for the
nullity of a linear map.) Theorems 24.2.16 and 24.2.18 are illustrated in Figure 24.2.2.
Figure 24.2.2 Isomorphism theorem and rank-nullity theorem: V/ker(φ) ≅ φ(V), so dim(V) − dim(ker(φ)) = dim(φ(V)) (diagram omitted)
The rank-nullity theorem may seem, at first sight, to be provable in terms of some kind of orthogonal
complement (or linearly independent complement) to the kernel of a linear map. However, the space which
has dimension dim(V) − dim(ker(φ)) is not a subspace of Dom(φ). It is in fact a set of cosets. If the space
has an inner product, one could possibly make some kind of orthogonal complement interpretation, but
Theorem 24.2.18 is metric-free.
24.2.18 Theorem: Let φ : V → W be a linear map for linear spaces V and W over the same field. If V
is finite-dimensional, then dim(Range(φ)) + dim(ker(φ)) = dim(V).
Proof: The assertion follows from Theorems 24.2.16 (iii) and 24.2.14.
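The rank-nullity identity can be checked numerically via a singular value decomposition, which yields a kernel basis directly (assuming NumPy; the map is illustrative):

```python
import numpy as np

# Sketch of Theorem 24.2.18 (rank-nullity) for a random phi : R^5 -> R^3.
rng = np.random.default_rng(2)
A = rng.normal(size=(3, 5))                # matrix of phi
_, s, vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))              # dim Range(phi)
kernel_basis = vh[rank:]                   # rows spanning ker(phi)
assert np.allclose(A @ kernel_basis.T, 0)  # they really lie in the kernel
assert rank + kernel_basis.shape[0] == 5   # rank + nullity = dim V
```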
24.3.2 Remark: Illustration for isomorphism for the dual of a subspace of a linear space.
The maps and spaces in Theorem 24.3.4 are illustrated in Figure 24.3.1. The map π in this diagram is the
natural homomorphism in Definition 24.2.12 for the quotient V*/U^⊥. The numbers at the bottom left of
each space are the dimensions of the spaces if dim(V) = n and dim(U) = m for some m, n ∈ ℤ⁺₀.
Figure 24.3.1 Natural isomorphism for the dual of a subspace of a primal space (diagram omitted)
24.3.4 Theorem: Natural isomorphism for the dual of a linear subspace of a primal linear space.
Let U be a subspace of a linear space V with a well-ordered basis. Let U^⊥ = {θ ∈ V*; ∀v ∈ U, θ(v) = 0}.
Define Φ : U* → ℙ(V*) by Φ(f) = {θ ∈ V*; ∀v ∈ U, θ(v) = f(v)} for all f ∈ U*. Then:
(i) U^⊥ is a linear subspace of V*.
(ii) ∀f ∈ U*, Φ(f) ∈ V*/U^⊥.
(iii) ∀f ∈ U*, ∀θ ∈ V*, (θ ∈ Φ(f) ⇒ Φ(f) = θ + U^⊥).
(iv) Φ is a linear map.
(v) Φ : U* → V*/U^⊥ is a linear space isomorphism.
Proof: Let V be a linear space with a well-ordered basis. Let U be a linear subspace of V. Define the
subset U^⊥ of the algebraic dual space V* by U^⊥ = {θ ∈ V*; ∀v ∈ U, θ(v) = 0}. Then clearly 0_{V*} ∈ U^⊥. Let
θ_1, θ_2 ∈ U^⊥. Then ∀v ∈ U, θ_1(v) = 0 and ∀v ∈ U, θ_2(v) = 0. So ∀v ∈ U, (θ_1 + θ_2)(v) = 0. So θ_1 + θ_2 ∈ U^⊥.
Similarly, λθ ∈ U^⊥ for all λ ∈ K and θ ∈ U^⊥, where K is the field of V. Hence U^⊥ is a linear subspace
of V*, which proves part (i).
For part (ii), it must be shown that for all f ∈ U*, the subset Φ(f) = {θ ∈ V*; ∀v ∈ U, θ(v) = f(v)} of V*
is a coset of the form θ_0 + U^⊥ ∈ V*/U^⊥ for some θ_0 ∈ V*, and that this coset is uniquely determined by f.
Let f ∈ U*. Then Φ(f) ≠ ∅ by Theorem 23.5.7 because V has a well-ordered basis. Let θ_0 ∈ Φ(f). Then
∀v ∈ U, θ_0(v) = f(v). Let θ ∈ V*. Then
θ ∈ Φ(f) ⇔ ∀v ∈ U, θ(v) = f(v)
⇔ ∀v ∈ U, θ(v) = θ_0(v)
⇔ ∀v ∈ U, (θ − θ_0)(v) = 0_K
⇔ θ − θ_0 ∈ U^⊥
⇔ θ ∈ θ_0 + U^⊥,
Then θ_1 ∈ Φ(f_1) and θ_2 ∈ Φ(f_2) for such θ_1, θ_2. So θ_1(v) = f_1(v) and θ_2(v) = f_2(v) for all v ∈ U. Therefore
∀v ∈ U, (θ_1 + θ_2)(v) = θ_1(v) + θ_2(v) = f_1(v) + f_2(v) = (f_1 + f_2)(v). So θ_1 + θ_2 ∈ {θ ∈ V*; ∀v ∈ U, θ(v) =
(f_1 + f_2)(v)} = Φ(f_1 + f_2). So by part (iii), Φ(f_1 + f_2) = (θ_1 + θ_2) + U^⊥ = (θ_1 + U^⊥) + (θ_2 + U^⊥) = Φ(f_1) + Φ(f_2).
Therefore Φ satisfies the condition of vector additivity. To prove scalarity of Φ, let λ ∈ K and f ∈ U*. By
part (ii), Φ(f) = θ_0 + U^⊥ for some θ_0 ∈ V*. Then θ_0(v) = f(v) for all v ∈ U. So (λθ_0)(v) = λθ_0(v) =
λf(v) = (λf)(v) for all v ∈ U. Therefore λθ_0 ∈ {θ ∈ V*; ∀v ∈ U, θ(v) = (λf)(v)} = Φ(λf). So by part (iii),
Φ(λf) = (λθ_0) + U^⊥ = λ(θ_0 + U^⊥) = λΦ(f), which proves the scalarity of Φ. Hence Φ is a linear map.
For part (v), to show surjectivity of Φ, let θ_0 + U^⊥ ∈ V*/U^⊥ for some θ_0 ∈ V*. Let f = θ_0|_U. Then
f ∈ U* because the restriction of a linear functional to a linear subspace is a linear functional on the
subspace. So θ_0 ∈ {θ ∈ V*; ∀v ∈ U, θ(v) = f(v)} = Φ(f). So Φ(f) = θ_0 + U^⊥ by part (iii). Therefore
θ_0 + U^⊥ ∈ Range(Φ). Hence Φ is surjective. To show injectivity, let f_1, f_2 ∈ U* satisfy Φ(f_1) = Φ(f_2). Then
Φ(f_1) = θ_1 + U^⊥ and Φ(f_2) = θ_2 + U^⊥ for some θ_1, θ_2 ∈ V* by part (ii). So θ_1 + U^⊥ = θ_2 + U^⊥. Therefore
θ_1 − θ_2 ∈ U^⊥ by Theorem 24.2.7 (vii). But θ_1 ∈ Φ(f_1) and θ_2 ∈ Φ(f_2) by part (iii). So for all v ∈ U,
0 = (θ_1 − θ_2)(v) = θ_1(v) − θ_2(v) = f_1(v) − f_2(v). So f_1 = f_2. Therefore Φ is injective. Hence Φ is a linear
space isomorphism.
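The dimension count underlying this isomorphism, dim U^⊥ = dim V − dim U, can be checked numerically (assuming NumPy; functionals are identified with row vectors, an identification not made in the text):

```python
import numpy as np

# Sketch for Theorem 24.3.4: dim U^perp = dim V - dim U, so that
# dim(V*/U^perp) = dim U = dim U*. A row functional theta annihilates U
# exactly when theta @ B = 0 for a basis matrix B of U.
rng = np.random.default_rng(3)
B = rng.normal(size=(5, 2))                # columns span U in V = R^5
m = np.linalg.matrix_rank(B)               # dim U (here 2)
_, s, vh = np.linalg.svd(B.T)
perp_basis = vh[m:]                        # rows spanning U^perp
assert np.allclose(perp_basis @ B, 0)      # these functionals vanish on U
assert m + perp_basis.shape[0] == 5        # dim U + dim U^perp = dim V
```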
24.3.5 Remark: Illustration for isomorphism for the dual of a subspace of a dual space.
The maps and spaces in Theorem 24.3.7 are illustrated in Figure 24.3.2. The map π in this diagram is the
natural homomorphism in Definition 24.2.12 for the quotient V/W^⊥. The numbers at the bottom left of
each space are the dimensions of the spaces if dim(V) = n and dim(W) = m for some m, n ∈ ℤ⁺₀.
24.3.6 Remark: The double dual of a finite-dimensional space is isomorphic to the primal space.
It may not be immediately clear why Φ(f) = {v ∈ V; ∀θ ∈ W, θ(v) = f(θ)} should be non-empty for
any f ∈ W* in part (ii) of the proof of Theorem 24.3.7. Since V is finite-dimensional, the double dual of
V is canonically isomorphic to V. (See Theorem 23.10.7.) The map f_v : θ ↦ θ(v) for θ ∈ V*, defined for
each v ∈ V, gives this canonical isomorphism v ↦ f_v.
Figure 24.3.2 Natural isomorphism for the dual of a subspace of a dual space (diagram omitted)
24.3.7 Theorem: Natural isomorphism for the dual of a linear subspace of a dual linear space.
Let V be a finite-dimensional linear space, and W a subspace of V*. Let W^⊥ = {v ∈ V; ∀θ ∈ W, θ(v) = 0}.
Define Φ : W* → ℙ(V) by Φ(f) = {v ∈ V; ∀θ ∈ W, θ(v) = f(θ)} for all f ∈ W*. Then:
(i) W^⊥ is a linear subspace of V.
(ii) ∀f ∈ W*, Φ(f) ∈ V/W^⊥.
(iii) ∀f ∈ W*, ∀v ∈ V, (v ∈ Φ(f) ⇒ Φ(f) = v + W^⊥).
(iv) Φ is a linear map.
(v) Φ : W* → V/W^⊥ is a linear space isomorphism.
Proof: Let V be a finite-dimensional linear space. Let W be a linear subspace of V*, the algebraic dual
of V. Define the subset W^⊥ of V by W^⊥ = {v ∈ V; ∀θ ∈ W, θ(v) = 0}. Then clearly 0_V ∈ W^⊥. Let
v_1, v_2 ∈ W^⊥. Then ∀θ ∈ W, θ(v_1) = 0 and ∀θ ∈ W, θ(v_2) = 0. So ∀θ ∈ W, θ(v_1 + v_2) = 0. So v_1 + v_2 ∈ W^⊥.
Similarly, λv ∈ W^⊥ for all λ ∈ K and v ∈ W^⊥, where K is the field of V. Hence W^⊥ is a linear subspace
of V, which proves part (i).
For part (ii), it must be shown that for all f ∈ W*, the subset Φ(f) = {v ∈ V; ∀θ ∈ W, θ(v) = f(θ)}
of V is a coset of the form v_0 + W^⊥ ∈ V/W^⊥ for some v_0 ∈ V, and that this coset is uniquely determined
by f. Let f ∈ W*. Then Φ(f) ≠ ∅ by Theorem 23.5.7 because V* has a well-ordered basis (because V is
finite-dimensional). Let v_0 ∈ Φ(f). Then ∀θ ∈ W, θ(v_0) = f(θ). Let v ∈ V. Then
v ∈ Φ(f) ⇔ ∀θ ∈ W, θ(v) = f(θ)
⇔ ∀θ ∈ W, θ(v) = θ(v_0)
⇔ ∀θ ∈ W, θ(v − v_0) = 0_K
⇔ v − v_0 ∈ W^⊥
⇔ v ∈ v_0 + W^⊥,
(iii) For some b ∈ V, the map φ_b : V → V with φ_b : v ↦ φ(v) − b is linear.
If K is an ordered field, then conditions (i), (ii) and (iii) are equivalent to condition (iv), where K[0, 1]
denotes the set {λ ∈ K; 0_K ≤ λ ≤ 1_K}.
(iv) ∀p, v ∈ V, ∀λ ∈ K[0, 1], φ(p + λv) − φ(p) = λ(φ(p + v) − φ(p)).
Proof: Let V be a linear space over a field K. Let φ : V → V be a bijection which satisfies condition (i).
Substituting 0_K for μ in (i) gives condition (ii).
Let V be a linear space over a field K whose characteristic does not equal 2. Let φ : V → V be a bijection
which satisfies condition (ii). Let p, u, v ∈ V and λ, μ ∈ K. Substitution of u for v and 2λ for λ in (ii) gives
φ(p + 2λu) − φ(p) = 2λ(φ(p + u) − φ(p)).
Substitution of 2μ for λ in (ii) gives
φ(p + 2μv) − φ(p) = 2μ(φ(p + v) − φ(p)).
Substitute p + 2λu for p, and 2μv − 2λu for v, and 2⁻¹ for λ in (ii). Then
φ(p + 2λu + 2⁻¹(2μv − 2λu)) − φ(p + 2λu) = 2⁻¹(φ(p + 2λu + (2μv − 2λu)) − φ(p + 2λu)).
So
φ(p + λu + μv) − φ(p + 2λu) = 2⁻¹(φ(p + 2μv) − φ(p + 2λu)).
Hence
φ(p + λu + μv) − φ(p) = 2⁻¹(φ(p + 2μv) + φ(p + 2λu)) − φ(p)
= 2⁻¹(φ(p + 2μv) − φ(p)) + 2⁻¹(φ(p + 2λu) − φ(p))
= 2⁻¹(2μ(φ(p + v) − φ(p))) + 2⁻¹(2λ(φ(p + u) − φ(p)))
= λ(φ(p + u) − φ(p)) + μ(φ(p + v) − φ(p)),
by a double application of condition (ii). This verifies condition (i). Hence (i) and (ii) are equivalent.
Now assume condition (iii). Then there is a b ∈ V such that the map φ_b : V → V with φ_b : v ↦ φ(v) − b is
linear. Then
φ(p + λu + μv) − b = (φ(p) − b) + (φ(λu + μv) − b)
= (φ(p) − b) + λ(φ(u) − b) + μ(φ(v) − b).
So
φ(p + λu + μv) − φ(p) = λ(φ(u) − b) + μ(φ(v) − b).
But by linearity of φ_b again, φ(p + u) − b = (φ(p) − b) + (φ(u) − b). So φ(u) − b = φ(p + u) − φ(p). Similarly,
φ(v) − b = φ(p + v) − φ(p). So
φ(p + λu + μv) − φ(p) = λ(φ(p + u) − φ(p)) + μ(φ(p + v) − φ(p)).
This verifies condition (i). So now assume condition (i) and let b = φ(0). Then by substituting 0 for p in (i),
φ_b(λu + μv) = φ(λu + μv) − φ(0)
= λ(φ(u) − φ(0)) + μ(φ(v) − φ(0))
= λφ_b(u) + μφ_b(v).
So φ_b is linear, which verifies (iii). So conditions (i), (ii) and (iii) are equivalent.
Now let K be an ordered field. Then K has characteristic 0 by Theorem 18.8.7 (iv). Condition (iv) is a
restriction of condition (ii). So (iv) follows immediately from (ii).
Now assume condition (iv). Let p, v ∈ V and λ ∈ K. If λ ∈ K[0, 1], then (ii) follows immediately from (iv).
Suppose λ > 1. Then λ⁻¹ ∈ K[0, 1] by Theorem 18.8.7 (iii). Substituting λ⁻¹ for λ and λv for v in part (iv)
gives φ(p + λ⁻¹(λv)) − φ(p) = λ⁻¹(φ(p + λv) − φ(p)). Hence φ(p + λv) − φ(p) = λ(φ(p + v) − φ(p)), which
verifies part (ii) for all λ ≥ 0.
So suppose λ < 0. Then 1 − λ > 1. Substituting p + v for p, and −v for v, and 1 − λ for λ in condition (ii),
which has just been verified for all λ ≥ 0, gives
φ(p + v + (1 − λ)(−v)) − φ(p + v) = (1 − λ)(φ(p + v − v) − φ(p + v)).
So φ(p + λv) − φ(p + v) = (1 − λ)(φ(p) − φ(p + v)). So φ(p + λv) − φ(p) = λ(φ(p + v) − φ(p)). Hence condition (ii)
is verified for all λ ∈ K. Therefore condition (iv) is equivalent to all of the other three conditions if K is an
ordered field.
24.4.3 Remark: Affine combinations of points.
The point p + λv in Theorem 24.4.2 (iv) is a convex combination of p and p + v because λ ∈ K[0, 1]. It seems
reasonable to call the point p + λv in Theorem 24.4.2 (ii) with unrestricted λ ∈ K an affine combination
of p and p + v.
24.4.4 Remark: Affine transformations have no fixed base point.
It is perhaps surprising that the equivalence of one-scalar and two-scalar conditions for affine transformations
in Theorem 24.4.2 does not work for linear transformations. However, in the affine case, the base point p is
portable. So a triangle of points can be linked and related to each other. In the linear case, the single-scalar
condition φ(λv) = λφ(v) has a fixed base point at the origin. So only pairs of points on radial lines from the
origin can be related.
24.4.5 Remark: Affine transformations on linear spaces.
Definition 24.4.6 is expressed in terms of Theorem 24.4.2 condition (iii) instead of the slightly more general
condition (i) because linear spaces over fields with characteristic 2 are (probably) of limited value for
differential geometry, and the existence of a single point of linearity is a more convenient assumption than the
assumption that all points are points of linearity.
24.4.6 Definition: An affine transformation of a linear space V over a field K whose characteristic is not
equal to 2 is a bijection φ : V → V which satisfies
∃p ∈ V, ∀u, v ∈ V, ∀λ, μ ∈ K,
φ(p + λu + μv) − φ(p) = λ(φ(p + u) − φ(p)) + μ(φ(p + v) − φ(p)).
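The condition of Definition 24.4.6 can be spot-checked numerically. The sketch below is an informal aside, not part of the book's development; the map φ and all sample values are hypothetical choices for illustration, with φ(x) = Ax + c for an invertible matrix A (so that φ is an affine bijection of ℝ²).

```python
def phi(p):
    """A hypothetical affine bijection of R^2: x -> Ax + c with A = [[2,1],[1,-1]]."""
    x, y = p
    return (2.0 * x + y + 5.0, x - y - 1.0)

def affine_condition(p, u, v, lam, mu):
    """Check phi(p + lam*u + mu*v) - phi(p) == lam*(phi(p+u)-phi(p)) + mu*(phi(p+v)-phi(p))."""
    q = (p[0] + lam * u[0] + mu * v[0], p[1] + lam * u[1] + mu * v[1])
    lhs = tuple(a - b for a, b in zip(phi(q), phi(p)))
    du = tuple(a - b for a, b in zip(phi((p[0] + u[0], p[1] + u[1])), phi(p)))
    dv = tuple(a - b for a, b in zip(phi((p[0] + v[0], p[1] + v[1])), phi(p)))
    rhs = tuple(lam * a + mu * b for a, b in zip(du, dv))
    return all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```

Since φ(q) − φ(p) = A(q − p), the condition reduces to linearity of A, so it holds at every base point p, which illustrates the "portability" of p discussed in Remark 24.4.4.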
24.5.2 Definition: An exact sequence of linear maps (over a field K) is a pair ((V_i)_{i=1}^{n+1}, (f_i)_{i=1}^n) of
sequences such that each V_i is a linear space (over K), each f_i is a linear map from V_i to V_{i+1}, and
Img f_i = ker f_{i+1} for 1 ≤ i ≤ n − 1; that is, the pair of sequences must satisfy
(i) ∀i ∈ ℕ_{n+1}, V_i is a linear space (over K),
(ii) ∀i ∈ ℕ_n, f_i ∈ Lin(V_i, V_{i+1}),
(iii) ∀i ∈ ℕ_{n−1}, Img f_i = ker f_{i+1}.
If K is not mentioned, it may be arbitrary or else equal to ℝ, depending on context.
24.5.3 Notation: The pair of sequences ((V_i)_{i=1}^{n+1}, (f_i)_{i=1}^n) may be denoted
V_1 →^{f_1} V_2 →^{f_2} ⋯ →^{f_{n−1}} V_n →^{f_n} V_{n+1}.
(i) For 1 ≤ k ≤ l ≤ n,
dim Img f_l = Σ_{i=k}^{l} (−1)^{l−i} dim V_i − (−1)^{l−k} dim ker f_k.
That is, the sum of the odd dimensions equals the sum of the even dimensions.
Proof: Let 1 ≤ k ≤ n. Then by Theorem 24.2.18, dim V_k = dim ker f_k + dim Img f_k. That is, dim Img f_k =
dim V_k − dim ker f_k. This establishes case l = k of (i).
If 1 ≤ k < n, then
dim Img f_{k+1} = dim V_{k+1} − dim ker f_{k+1}
= dim V_{k+1} − (dim V_k − dim ker f_k),
since dim ker f_{k+1} = dim Img f_k by the exactness of the sequence. This is case l = k + 1 of (i). The result
for general k and l follows by induction from Theorem 24.2.18 and the exactness of the sequence.
Parts (ii) and (iii) follow directly from (i).
24.5.5 Theorem:
(i) If V_1 →^f V_2 → 0 is an exact sequence of linear maps, then f is surjective, and hence dim V_1 ≥ dim V_2.
(ii) If 0 → V_1 →^g V_2 is an exact sequence of linear maps, then g is injective, and hence dim V_1 ≤ dim V_2.
(iii) If 0 → V_1 →^f V_2 → 0 is an exact sequence of linear maps, then f is a bijection. Hence dim V_1 = dim V_2.
(iv) If 0 → V_1 →^g V_2 →^f V_3 → 0 is an exact sequence of linear maps, then f is surjective, g is injective,
f ∘ g is the zero map, and dim V_2 = dim V_1 + dim V_3.
Proof: Assertions (i), (ii) and (iii) follow easily from Definition 24.5.2.
Part (iv) follows from Theorem 24.5.4 (iii).
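The dimension identities behind Theorems 24.5.4 and 24.5.5 are easy to illustrate numerically. The sketch below is an informal aside (the matrix A is a hypothetical example): it computes the rank of a linear map f : ℝ³ → ℝ² by Gaussian elimination, so that dim ker f = 3 − rank f, and for the short exact sequence 0 → ker f → ℝ³ → Img f → 0 this exhibits dim V_2 = dim V_1 + dim V_3.

```python
def rank(rows, tol=1e-9):
    """Rank of a matrix, given as a list of equal-length row lists, by Gaussian elimination."""
    m = [row[:] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if abs(m[i][c]) > tol), None)
        if piv is None:
            continue                        # no pivot in this column
        m[r], m[piv] = m[piv], m[r]         # move pivot row into place
        for i in range(len(m)):
            if i != r and abs(m[i][c]) > tol:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0]]       # hypothetical matrix of a surjective map f : R^3 -> R^2
rank_f = rank(A)            # dim Img f
nullity_f = 3 - rank_f      # dim ker f, by the rank-nullity identity (Theorem 24.2.18)
```

Here dim ℝ³ = nullity_f + rank_f, the l = k case of Theorem 24.5.4 (i).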
24.5.7 Theorem: Let V and W be linear spaces over the same field. Let f : V → W be a surjective linear
map. Let γ : W → V be a linear map such that f ∘ γ = id_W. Then V = (ker f) ⊕ γ(W).
Proof: Clearly γ(W) is a linear subspace of V. To show that (ker f) ∪ γ(W) spans V, let v ∈ V and put
v′ = γ(f(v)). Then
f(v − v′) = f(v) − f(γ(f(v)))
= f(v) − id_W(f(v))
= 0.
So v − v′ ∈ ker f. But v′ ∈ γ(W). Hence v = (v − v′) + v′ ∈ (ker f) + γ(W). To show that (ker f) ∩ γ(W) = {0},
suppose v ∈ (ker f) ∩ γ(W). Then v = γ(w) for some w ∈ W. But w = id_W(w) = f(γ(w)) = f(v) = 0,
because v ∈ ker f. So v = γ(0) = 0. So V = (ker f) ⊕ γ(W).
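Theorem 24.5.7 can be illustrated with a concrete choice of maps (an informal aside; f and γ are hypothetical examples): take f : ℝ³ → ℝ² to be projection onto the first two coordinates and γ : ℝ² → ℝ³ the insertion map, so that f ∘ γ = id and every v splits as v = (v − γ(f(v))) + γ(f(v)).

```python
def f(v):
    """Surjective linear map R^3 -> R^2: projection onto the first two coordinates."""
    return (v[0], v[1])

def gamma(w):
    """A right inverse of f, so that f(gamma(w)) == w for all w."""
    return (w[0], w[1], 0.0)

def split(v):
    """Decompose v = k + g with k in ker f and g in gamma(R^2), as in Theorem 24.5.7."""
    g = gamma(f(v))
    k = tuple(a - b for a, b in zip(v, g))
    return k, g
```

The decomposition is unique because ker f ∩ γ(ℝ²) = {0} here, matching the direct sum in the theorem.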
24.6. Seminorms and norms on linear spaces
24.6.1 Remark: Seminorms and norms are formally algebraic, but have topological significance.
Seminorms and norms are apparently algebraic structures on linear spaces. (See Definitions 19.4.3 and 19.5.2
respectively for the general definitions of seminorms and norms on modules over non-zero rings.) But a
norm may be regarded as a topological concept because it induces a metric, and a metric induces a topology.
Similarly, a set of seminorms may induce a topology on a linear space. So topological concepts such as limits,
convergence and continuity may be defined in terms of seminorms or norms on linear spaces. However, even
the absolute value of a real or complex number may be regarded as a metric, which therefore has a topological
character. Seminorms and norms only acquire an explicitly topological character when limits, convergence,
continuity and other topological concepts are applied to them. In Section 24.6, only the algebraic aspects of
seminorms and norms are presented.
24.6.2 Remark: Specialisation of general seminorms and norms from modules to linear spaces.
Definitions 19.4.3 and 19.5.2 define respectively a seminorm and norm on a general module over a non-zero
ring, compatible with a general absolute value function which has values in a general ordered ring. Definitions
24.6.3 and 24.6.8 specialise these to linear spaces with real numbers for the ordered ring. The spaces and
functions in Definitions 24.6.3 and 24.6.8 are illustrated in Figure 24.6.1.
A norm defines the lengths of vectors in a scalable fashion. In other words, if a vector is multiplied by
a scalar, then the norm is multiplied by the absolute value of the scalar. (It is not necessary to take the
absolute value if the scalar is non-negative and the field of scalars K is the same as the field of absolute
values, which is ℝ in Definition 24.6.8.)
The use of an abstract field K is not superfluous formalism. It could be that K equals ℂ, or is a subfield of
ℂ which is not a subfield of ℝ.
Figure 24.6.1 Spaces and functions for seminorms and norms on a linear space (diagram)
24.6.3 Definition: A seminorm on a linear space V over a field K, compatible with an absolute value
function | |_K : K → ℝ, is a function ν : V → ℝ which satisfies:
(i) ∀λ ∈ K, ∀v ∈ V, ν(λv) = |λ|_K ν(v), [scalarity]
(ii) ∀v, w ∈ V, ν(v + w) ≤ ν(v) + ν(w). [subadditivity]
24.6.4 Theorem: Let V be a linear space over a field K. Let ν : V → ℝ be a seminorm on V which is
compatible with an absolute value function | |_K : K → ℝ.
(i) ν is a seminorm on the left module V over the non-zero ring K, compatible with | |_K.
(ii) ν(0_V) = 0_ℝ.
(iii) ∀v ∈ V, ν(−v) = ν(v).
(iv) ∀v ∈ V, ν(v) ≥ 0_ℝ.
Proof: Part (i) follows from Definitions 19.4.3 and 24.6.3 because any field K is a non-zero ring by
Theorem 18.7.8 (iii), and ℝ is an ordered ring.
Parts (ii), (iii) and (iv) follow from part (i) and Theorem 19.4.4 parts (i), (ii) and (iii) respectively.
Theorem 24.6.6 (iv) may be thought of as a kind of continuity of a seminorm with respect to itself. (See
Theorem 39.2.6 for a direct application of this.) In other words, if x − y is small as measured by ν, then
ν(x) − ν(y) is also small as measured in ℝ by the standard absolute value function | |_ℝ on ℝ.
24.6.6 Theorem: Let ν : V → ℝ be a seminorm on a linear space V over a field K, with absolute value
function | |_K : K → ℝ.
(i) If ν(v) ≠ 0_ℝ for some v ∈ V, then |1_K|_K = 1_ℝ.
(ii) ∀x, y ∈ V, ν(x) − ν(y) ≤ ν(x − y).
(iii) ∀x, y ∈ V, ν(x) ≤ ν(y) + ν(x − y).
(iv) ∀x, y ∈ V, |ν(x) − ν(y)|_ℝ ≤ ν(x − y).
Proof: For part (i), let v ∈ V with ν(v) ≠ 0_ℝ. By Definition 24.6.3 (i), ν(v) = ν(1_K v) = |1_K|_K ν(v).
Therefore |1_K|_K = 1_ℝ by Theorem 18.6.4 (ii).
Part (ii) follows as for Theorem 19.4.4 (iv).
Part (iii) follows from part (ii) and Theorem 24.6.4 (iii) by swapping x and y.
For part (iv), let x, y ∈ V. By part (ii), ν(y) − ν(x) ≤ ν(x − y). By part (iii), ν(x) − ν(y) ≤ ν(x − y).
Therefore |ν(x) − ν(y)|_ℝ ≤ ν(x − y) by Definition 18.5.14.
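The "continuity" inequality of Theorem 24.6.6 (iv) can be spot-checked numerically. In this informal sketch (an aside), ν(x) = |x_1| is a sample seminorm on ℝ² which vanishes on the non-zero vector (0, 1), so it is genuinely a seminorm and not a norm.

```python
def nu(x):
    """A sample seminorm on R^2: absolute value of the first coordinate."""
    return abs(x[0])

def check_reverse_triangle(x, y):
    """Theorem 24.6.6 (iv): |nu(x) - nu(y)| <= nu(x - y)."""
    diff = (x[0] - y[0], x[1] - y[1])
    return abs(nu(x) - nu(y)) <= nu(diff)
```

The second coordinate is invisible to ν, so the inequality holds however the second coordinates differ.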
24.6.8 Definition: A norm on a linear space V over a field K, compatible with an absolute value function
| |_K : K → ℝ, is a function ν : V → ℝ which satisfies:
(i) ∀λ ∈ K, ∀v ∈ V, ν(λv) = |λ|_K ν(v), [scalarity]
(ii) ∀v, w ∈ V, ν(v + w) ≤ ν(v) + ν(w), [subadditivity]
(iii) ∀v ∈ V, (ν(v) = 0_ℝ ⟹ v = 0_V). [positivity]
24.6.9 Theorem: Let V be a linear space over a field K. Let ν : V → ℝ be a norm on V which is
compatible with an absolute value function | |_K : K → ℝ.
(i) ν is a norm on the left module V over the non-zero ring K, compatible with | |_K.
(ii) ν is a seminorm on the linear space V over the field K, compatible with | |_K.
(iii) ν(0_V) = 0_ℝ.
(iv) ∀v ∈ V, ν(−v) = ν(v).
Proof: Part (i) follows from Definitions 19.5.2 and 24.6.8 because any field K is a non-zero ring by
Theorem 18.7.8 (iii), and ℝ is an ordered ring.
Part (ii) follows from Definitions 24.6.3 and 24.6.8.
Parts (iii) and (iv) follow from part (ii) and Theorem 24.6.4 parts (ii) and (iii) respectively.
24.6.11 Definition: A normed linear space is a linear space with a norm. In other words, a normed linear
space is a tuple V̄ −< (V, ν) −< (K, V, σ_K, τ_K, σ_V, μ, | |_K, ν) where
(i) (K, V, σ_K, τ_K, σ_V, μ) is a linear space over K,
(ii) | |_K : K → ℝ is an absolute value function for K,
(iii) ν : V → ℝ is a norm on the linear space V −< (K, V, σ_K, τ_K, σ_V, μ).
24.6.12 Remark: Notation for standard absolute value function for the real numbers.
In the case of norms on real linear spaces as in Definition 24.6.13, the absolute value function is always the
standard absolute value function on ℝ as in Definition 18.5.14. Therefore it is unnecessary to add a subscript
ℝ as in | |_ℝ.
24.6.13 Definition: A normed real linear space is a linear space over the field ℝ, with a norm, with the
standard absolute value function | | : ℝ → ℝ defined by x ↦ |x| = max(x, −x) for x ∈ ℝ.
24.6.14 Notation: ‖ ‖, for a vector v in a normed linear space (V, ν), denotes the norm ν. In other
words, ∀v ∈ V, ‖v‖ = ν(v).
| | is an alternative notation for ‖ ‖.
Figure 24.6.2 Unit level sets {x ∈ ℝ²; |x|_p = 1} of the p-norms for several values of p ∈ [1, ∞] (figure)
sup{x_i; i ∈ ℕ_n}, where the supremum is defined with respect to the ordered set ℝ⁺_0. By Theorem 11.2.13 (v),
sup(∅) = min(ℝ⁺_0) = 0_ℝ with respect to this ordered set. Consequently max_{i=1}^n |x_i| = 0 for n = 0.
24.6.19 Notation: |x|_p for x ∈ ℝⁿ, n ∈ ℤ⁺_0 and p ∈ [1, ∞] denotes the p-norm of x ∈ ℝⁿ. Thus
|x|_p = (Σ_{i=1}^n |x_i|^p)^{1/p} if 1 ≤ p < ∞,
|x|_p = max_{i=1}^n |x_i| if p = ∞.
24.6.21 Definition: The Euclidean norm on ℝⁿ for n ∈ ℤ⁺_0 is the 2-norm x ↦ |x|_2 on ℝⁿ.
24.6.22 Definition: The Euclidean normed (Cartesian) (coordinate) space with dimension n ∈ ℤ⁺_0 is the
pair (ℝⁿ, | |_2), where ℝⁿ has the usual linear space structure and | |_2 is the Euclidean norm on ℝⁿ.
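The p-norms of Notation 24.6.19, including the n = 0 and p = ∞ conventions of Remark 24.6.18, can be sketched informally as follows (an aside, not part of the formal development):

```python
import math

def p_norm(x, p):
    """The p-norm of Notation 24.6.19 for a real tuple x, with p in [1, inf]."""
    if p == math.inf:
        # max over the empty set is interpreted as 0 (Remark 24.6.18)
        return max((abs(xi) for xi in x), default=0.0)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)
```

For example, p_norm((3.0, 4.0), 2) gives the Euclidean norm 5.0 of Definition 24.6.21.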
24.6.23 Remark: Equivalence of all norms on a finite-dimensional linear space.
From the point of view of the topology on a linear space which is induced by the metric which is obtained
from a norm as in Theorem 37.1.8, two norms on the space are equivalent if each norm is bounded with
respect to the other. It is useful to know that any two norms on a finite-dimensional linear space are
equivalent from this point of view. In other words, they induce the same topology. (The same is not true
for infinite-dimensional linear spaces.)
The equivalence of norms on a finite-dimensional linear space may be shown by means of topological concepts
such as Cauchy sequences. (See for example Bachman/Narici [50], pages 122–125; Lang [104], pages 470–471.)
However, a norm is an apparently algebraic concept, whose properties should presumably be demonstrable
by algebraic means alone. (Certainly not all algebraic theorems can be proved without analysis, but it is
generally preferable to at least seek an algebraic proof before resorting to analysis.)
Theorem 24.6.24 shows that the open and closed unit balls S_O and S_C for a seminorm are convex in parts
(ii) and (iii), balanced in parts (iv) and (v), and absorbing in parts (vi) and (vii). (See Definition 24.6.25
for balanced and absorbing sets.) The functions μ_O and μ_C are known as the Minkowski functionals of
S_O and S_C respectively. (See Yosida [161], page 25.)
24.6.24 Theorem: Let ν be a seminorm on a real linear space V for the usual absolute value function
on ℝ. Let S_O = {v ∈ V; ν(v) < 1} and S_C = {v ∈ V; ν(v) ≤ 1}. Define μ_O, μ_C : V → ℝ by
μ_O : v ↦ inf{t ∈ ℝ⁺; t⁻¹v ∈ S_O} and μ_C : v ↦ inf{t ∈ ℝ⁺; t⁻¹v ∈ S_C}.
(i) 0 ∈ S_O ⊆ S_C.
(ii) S_O is a convex subset of V. [S_O is convex]
(iii) S_C is a convex subset of V. [S_C is convex]
(iv) ∀v ∈ V, (v ∈ S_O ⟹ −v ∈ S_O). [S_O is balanced]
(v) ∀v ∈ V, (v ∈ S_C ⟹ −v ∈ S_C). [S_C is balanced]
(vi) ∀v ∈ V, {t ∈ ℝ⁺; t⁻¹v ∈ S_O} ≠ ∅. [S_O is absorbing]
(vii) ∀v ∈ V, {t ∈ ℝ⁺; t⁻¹v ∈ S_C} ≠ ∅. [S_C is absorbing]
(viii) ∀v ∈ V, μ_O(v) = ν(v).
(ix) ∀v ∈ V, μ_C(v) = ν(v).
For part (vi), let v ∈ V. Let t ∈ ℝ⁺ satisfy t > ν(v). Then ν(t⁻¹v) = t⁻¹ν(v) < 1. So t⁻¹v ∈ S_O. Therefore
{t ∈ ℝ⁺; t⁻¹v ∈ S_O} ⊇ (ν(v), ∞) ≠ ∅.
For part (vii), let v ∈ V. Let t ∈ ℝ⁺ satisfy t ≥ ν(v). Then ν(t⁻¹v) = t⁻¹ν(v) ≤ 1. So t⁻¹v ∈ S_C. Therefore
{t ∈ ℝ⁺; t⁻¹v ∈ S_C} ⊇ [ν(v), ∞) ≠ ∅ if ν(v) > 0, and {t ∈ ℝ⁺; t⁻¹v ∈ S_C} ⊇ ℝ⁺ ≠ ∅ if ν(v) = 0.
For part (viii), let v ∈ V. Let t ∈ ℝ⁺ with t > ν(v). Then t⁻¹ν(v) < 1. So t⁻¹v ∈ S_O. But if 0 < t ≤ ν(v)
then t⁻¹ν(v) ≥ 1 and so t⁻¹v ∉ S_O. Therefore μ_O(v) = inf{t ∈ ℝ⁺; t > ν(v)} = ν(v).
For part (ix), let v ∈ V. Let t ∈ ℝ⁺ with t ≥ ν(v). Then t⁻¹ν(v) ≤ 1. So t⁻¹v ∈ S_C. But if 0 < t < ν(v), then
t⁻¹ν(v) > 1 and so t⁻¹v ∉ S_C. Therefore μ_C(v) = inf{t ∈ ℝ⁺; t ≥ ν(v)} = ν(v).
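Parts (viii) and (ix) of Theorem 24.6.24 can be spot-checked numerically. The informal sketch below (an aside) approximates the Minkowski functional μ_C by bisection for the sample seminorm ν(x) = |x_1| on ℝ²; the initial search bound `hi` is an assumption and must exceed ν(v).

```python
def nu(x):
    """A sample seminorm on R^2: it ignores the second coordinate."""
    return abs(x[0])

def minkowski_closed(v, hi=1.0e6, steps=60):
    """Approximate inf{t > 0 : nu(v/t) <= 1} by bisection (assumes hi > nu(v))."""
    lo = 0.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0           # mid > 0 since hi > 0 throughout
        if nu((v[0] / mid, v[1] / mid)) <= 1.0:
            hi = mid                    # mid is large enough: shrink from above
        else:
            lo = mid                    # mid is too small: grow from below
    return hi
```

The result agrees with ν(v), including the seminorm case ν(v) = 0 where the infimum over ℝ⁺ is 0.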
24.6.25 Definition:
A balanced set in a linear space V is a subset S of V which satisfies ∀v ∈ V, (v ∈ S ⟹ −v ∈ S).
An absorbing set in a real linear space V is a subset S of V which satisfies ∀v ∈ V, ∃t ∈ ℝ⁺, t⁻¹v ∈ S.
24.6.26 Remark: Unit balls contain the same information as seminorms.
It follows from Theorem 24.6.24 (viii, ix) that the information in a seminorm may be recovered from either
the open unit ball SO or the closed unit ball SC . This is useful in particular for visualisation of seminorms
in terms of these unit balls.
Theorem 24.6.27 shows that unit balls can be, more or less, recovered from Minkowski functionals which
are constructed from given convex, balanced, absorbing sets. (The open and closed character of these
sets requires some topology. So these concepts are not formally defined here.) Consequently there is a kind
of information equivalence between seminorms and convex, balanced, absorbing sets. They are in some sense
inverses of each other.
This equivalence is useful for the graphical representation of seminorms and norms because one may visualise
them either in terms of their value for a variable vector or else in terms of their unit balls SO or SC . (See
for example the unit level sets for p-norms in Figure 24.6.2.) Some concepts are easier to understand, and
some kinds of theorems are easier to prove, in terms of the unit balls. In particular, when a seminorm or
norm is constrained in some way, for example by specifying its value on the elements of a basis, the knowledge
that the unit ball must be convex, balanced and absorbing helps to narrow down the possibilities.
24.6.27 Theorem: Let V be a real linear space. Let S be a subset of V which is convex, balanced and
absorbing. Define μ : V → ℝ by μ : v ↦ inf{t ∈ ℝ⁺; t⁻¹v ∈ S}.
(i) 0_V ∈ S.
(ii) ∀v ∈ V, ∀t ∈ ℝ⁺, (μ(v) < t ⟹ t⁻¹v ∈ S).
(iii) μ is a seminorm on V.
(iv) {v ∈ V; μ(v) < 1} ⊆ S ⊆ {v ∈ V; μ(v) ≤ 1}.
Proof: For part (i), the absorbing property implies t⁻¹0_V ∈ S for some t ∈ ℝ⁺. Then 0_V = t⁻¹0_V ∈ S.
For part (ii), let v ∈ V and t ∈ ℝ⁺ satisfy μ(v) < t. Then inf{t′ ∈ ℝ⁺; t′⁻¹v ∈ S} < t. So t′⁻¹v ∈ S for
some t′ ∈ ℝ⁺ satisfying t′ < t. But then t⁻¹v = (1 − α)0_V + αt′⁻¹v with α = t′/t ∈ [0, 1]. So t⁻¹v ∈ S by the
convexity of S.
For part (iii), let v, w ∈ V and ε ∈ ℝ⁺, and let s, t ∈ ℝ⁺ satisfy μ(v) < s and μ(w) < t with s + t =
μ(v) + μ(w) + ε. So s⁻¹v ∈ S and t⁻¹w ∈ S by part (ii). Let α = t/(s + t).
Then α ∈ [0, 1]. So (1 − α)s⁻¹v + αt⁻¹w ∈ S by the convexity of S. Thus (s + t)⁻¹(v + w) ∈ S. So
μ(v + w) ≤ s + t = μ(v) + μ(w) + ε. Since this holds for all ε ∈ ℝ⁺, it follows that μ(v + w) ≤ μ(v) + μ(w).
(The scalarity condition of Definition 24.6.3 (i) follows from the definition of μ together with the balanced
property of S.) Hence μ is a seminorm on V by Definition 24.6.3.
For part (iv), let S_O = {v ∈ V; μ(v) < 1}. Let v ∈ S_O. Then μ(v) < 1. So v ∈ S by part (ii). Thus S_O ⊆ S.
Now let S_C = {v ∈ V; μ(v) ≤ 1}. Let v ∈ S. Then 1 ∈ {t ∈ ℝ⁺; t⁻¹v ∈ S}. So inf{t ∈ ℝ⁺; t⁻¹v ∈ S} ≤ 1.
So μ(v) ≤ 1. So v ∈ S_C. Thus S ⊆ S_C.
24.6.29 Theorem: Let ν be a seminorm on a real linear space V, and let (e_i)_{i=1}^n be a finite sequence of
vectors in V for some n ∈ ℤ⁺_0. Then
∀λ ∈ ℝⁿ,
ν(Σ_{i=1}^n λ_i e_i) ≤ Σ_{i=1}^n |λ_i| ν(e_i) (24.6.1)
≤ (Σ_{i=1}^n |λ_i|) max_{i=1}^n ν(e_i) (24.6.2)
= |λ|_1 max_{i=1}^n ν(e_i)
and
∀λ ∈ ℝⁿ,
ν(Σ_{i=1}^n λ_i e_i) ≤ (max_{i=1}^n |λ_i|) Σ_{i=1}^n ν(e_i) (24.6.3)
= |λ|_∞ Σ_{i=1}^n ν(e_i).
Proof: The inequality on line (24.6.1) follows from Definition 24.6.3 (i, ii) by induction on n. Then lines
(24.6.2) and (24.6.3) follow from line (24.6.1) by Theorem 24.6.4 (iv). (As explained in Remark 24.6.18, the
expressions max_{i=1}^n ν(e_i) and max_{i=1}^n |λ_i| are interpreted to equal zero when n = 0.)
24.6.30 Remark: Lower bounds for norms in terms of values at finite sets of sample points.
Theorem 24.6.29 gives upper bounds for the value of a seminorm ν for a vector v in terms of the vector's
coordinate tuple λ with respect to some basis B and the value of ν at the sample points e_i in the basis. If
the seminorm is not a norm, no such lower bound is possible because of the existence of non-zero vectors
whose seminorm equals zero. Therefore a lower bound of this kind may only be obtained for norms.
General lower bounds for norms, analogous to the upper bounds for seminorms in Theorem 24.6.29, cannot
in general be obtained in terms of the values for the elements of a basis alone. This is demonstrated by
Example 24.6.31. However, one may hope to express such lower bounds in terms of a finite set of sample
points. The sample points should preferably be chosen by some simple algorithm in terms of ν and B, not
in terms of some kind of topological procedure based on convergence of Cauchy sequences for example.
24.6.31 Example: Let V = ℝ² with the usual linear space structure. Then B = (e_i)_{i=1}^2 is a basis for V
with e_1 = (1, 0) and e_2 = (0, 1). Define ν_k : V → ℝ for k ∈ ℝ⁺_0 by
∀x ∈ V, ν_k(x) = max(k|x_1 + x_2|, |x_1 − x_2|). (24.6.4)
Then ν_k is a norm on V for k > 0. The values of ν_k on B are ν_k(e_1) = ν_k(e_2) = max(k, 1). The unit level
sets of this norm for some values of k ∈ (0, 1] are illustrated in Figure 24.6.3.
Figure 24.6.3 Unit level sets {x ∈ ℝ²; ν_k(x) = 1} for several values of k ∈ (0, 1] (figure)
Then
∀λ ∈ ℝⁿ,
ν_k(Σ_{i=1}^n λ_i e_i) ≥ k|λ|_1 max_{i=1}^n ν_k(e_i), (24.6.5)
where n = 2. To verify line (24.6.5), note that k|λ|_1 max_{i=1}^n ν_k(e_i) = k(|λ_1| + |λ_2|), where |λ_1| + |λ_2| =
max(|λ_1 + λ_2|, |λ_1 − λ_2|) by Theorem 18.5.16 (iv). So k|λ|_1 max_{i=1}^n ν_k(e_i) = k max(|λ_1 + λ_2|, |λ_1 − λ_2|) ≤ ν_k(λ).
The sharpness of this bound is verified by λ = (1, 1).
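The norm of line (24.6.4) and the lower bound of line (24.6.5) are easy to test numerically (an informal aside, for k ∈ (0, 1] so that max_i ν_k(e_i) = 1):

```python
def nu_k(k, x):
    """The norm of Example 24.6.31 on R^2, a norm for k > 0."""
    return max(k * abs(x[0] + x[1]), abs(x[0] - x[1]))

def lower_bound_holds(k, x):
    """Line (24.6.5) for k in (0, 1], where k*|x|_1*max_i nu_k(e_i) = k*(|x_1| + |x_2|)."""
    return nu_k(k, x) >= k * (abs(x[0]) + abs(x[1])) - 1e-12
```

At x = (1, 1) the bound is attained with equality, which is the sharpness claim above.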
24.6.32 Remark: Strategies for proving lower bounds for norms relative to values on a basis.
Whereas the upper bound ν(Σ_{i=1}^n λ_i e_i) ≤ |λ|_1 max_{i=1}^n ν(e_i) in Theorem 24.6.29 is universal for all norms ν
on V, the lower bound ν_k(Σ_{i=1}^n λ_i e_i) ≥ k|λ|_1 max_{i=1}^n ν_k(e_i) in Example 24.6.31 line (24.6.5) depends on the
norm ν_k. The factor k can be arbitrarily small. So the bound can be arbitrarily weak. However, it can be
hoped that for any fixed norm ν, the lower bound in line (24.6.5) will hold for some positive factor k.
It follows from Theorems 24.6.24 and 24.6.27 that seeking a lower bound for a norm is equivalent to seeking
an upper bound for its unit ball. Thus the algebraic lower-bound task here is equivalent to the geometric
task of finding an outer bound for a convex, balanced, absorbing set S which has the elements of a basis on
its boundary. From Figure 24.6.3, it is abundantly clear that such a set S can be arbitrarily large.
The positivity property of a norm requires the intersection of S with any ray through the origin to be finite.
It is intuitively clear that if all such ray-intersections have finite length, then S must be bounded by some
finite cube centred at the origin because the intersection of S with a plane perpendicular to the ray must be
convex, and this convex intersection must be non-decreasing as the plane moves away from the origin. In
fact, this plane-intersection must be sublinear, which implies that it must contract to the empty set at some
finite distance from the origin. This intuitive picture must be converted into a logically rigorous proof. One
may either approach such a proof via norms directly or via convex, balanced, absorbing sets.
Proof: By Theorems 24.6.4 (i) and 19.4.4 (iv), and Definition 24.6.3 (i), ν(λ_1 v_1 + λ_2 v_2) ≥ ν(λ_1 v_1) −
ν(λ_2 v_2) = λ_1 ν(v_1) − λ_2 ν(v_2). Similarly ν(λ_1 v_1 + λ_2 v_2) ≥ ν(λ_2 v_2) − ν(λ_1 v_1) = λ_2 ν(v_2) − λ_1 ν(v_1). So
ν(λ_1 v_1 + λ_2 v_2) ≥ max(λ_1 ν(v_1) − λ_2 ν(v_2), λ_2 ν(v_2) − λ_1 ν(v_1)) = |λ_1 ν(v_1) − λ_2 ν(v_2)|.
24.6.34 Example: Let ν be a seminorm on a real linear space V. Let v_0, v_1 ∈ V. Define v_α for α ∈ [0, 1]
by v_α = (1 − α)v_0 + αv_1. Then by Theorem 24.6.33, ν(v_α) ≥ |(1 − α)ν(v_0) − αν(v_1)| for all α ∈ [0, 1]. This
is illustrated in Figure 24.6.4.
Figure 24.6.4 Lower bound for seminorm of convex combinations of two vectors
In the case of a norm, the values ν(v_0) and ν(v_1) are positive, but the lower bound |(1 − α)ν(v_0) − αν(v_1)|
vanishes for α_0 = ν(v_0)/(ν(v_0) + ν(v_1)). Therefore Theorem 24.6.33 may be applied to the vector pairs (v_0, v_{α_0}) and
(v_{α_0}, v_1) to obtain two additional lower bounds to fill the gap. This yields an explicit positive lower bound
which depends only on three sample points for the norm, where two of the points come from a basis, and
the third point is computed from the value of the norm at the first two sample points. This idea is applied
in the proof of Theorem 24.6.35 (iii).
24.6.35 Theorem: For any finite-dimensional real linear space V, and any norm ν on V, and any basis
B = (e_i)_{i=1}^n for V, let P(V, ν, B) denote the proposition:
∃k ∈ ℝ⁺, ∀λ ∈ ℝⁿ,
ν(Σ_{i=1}^n λ_i e_i) ≥ k|λ|_1 max_{i=1}^n ν(e_i). (24.6.6)
that α_0 = 2λ_2 ν(e_1)/(ν(e_1) + ν(e_2)). The two bounds in this maximum-expression overlap when λ_2(ν(e_1) +
ν(e_2) + 2⁻¹q_0/ν(e_1)) = ν(e_1), which gives a minimum value equal to q_0 ν(e_1)/(q_0 + 2ν(e_1)(ν(e_1) + ν(e_2))).
Dividing this by max(ν(e_1), ν(e_2)) gives the constant k_1 in line (24.6.6) for λ satisfying λ_1 ≥ 0 and
0 ≤ λ_2 ≤ λ_1 ν(e_1)/ν(e_2). A similar
constant k_2 may be computed for λ_1 ≥ 0 and λ_2 ≥ λ_1 ν(e_1)/ν(e_2) in terms of the same q_0 value. Then by
letting q_1 = ν(ν(e_2)e_1 − ν(e_1)e_2), a further two similar constants k_3 and k_4 may be computed for λ_1 ≤ 0 for
two similar ranges of λ_2. (The computation of k_2, k_3 and k_4 is left as an exercise for the curious reader!)
Then line (24.6.6) is satisfied with k = min(k_1, k_2, k_3, k_4).
24.6.36 Remark: Difficulties in computing explicit lower bounds for norms in higher-dimensional spaces.
In Theorem 24.6.35, the lower bound k may be freely chosen as any positive real number when n = 0, is
equal to at most 1 when n = 1, and has a complicated dependence on ν and B when n = 2. When n = 3,
it is not entirely clear how to compute an explicit expression for k. It is relatively straightforward to prove
the existence of k ∈ ℝ⁺ for general n ∈ ℤ⁺_0 using topological concepts such as continuity, compactness and
completeness. Therefore this task is delayed to Theorem 39.4.2.
The principal motivation for Theorem 24.6.33 is to show that all norms on finite-dimensional linear spaces
are equivalent in the sense that their ratio is bounded above and below by some positive constant. This then
implies that they have the same topological structure. Based on the limited bounds in Theorem 24.6.33,
obtained using algebraic arguments, it is possible to state the limited Theorem 24.6.38 regarding equivalence
of norms. The more general theorem for dimension greater than 2 requires some topological analysis.
24.6.37 Definition: Equivalent norms on a linear space V over a field K, for an absolute value function
| |_K : K → ℝ, are norms ν_1 and ν_2 on V which satisfy
∃c, C ∈ ℝ⁺, ∀v ∈ V, cν_1(v) ≤ ν_2(v) ≤ Cν_1(v).
24.6.38 Theorem: Let ν_1, ν_2 be norms on a real linear space V with dim(V) ≤ 2. Then ν_1 and ν_2 are
equivalent norms on V.
Proof: Let ν_1, ν_2 be norms on a real linear space V with n = dim(V) ≤ 2. Let B = (e_i)_{i=1}^n be a basis
for V. Let v ∈ V. Then v = Σ_{i=1}^n λ_i e_i for some unique λ = (λ_i)_{i=1}^n ∈ ℝⁿ. So ν_1(v) ≤ |λ|_1 max_{i=1}^n ν_1(e_i) by
Theorem 24.6.29. But by Theorem 24.6.35, ν_2(v) ≥ k|λ|_1 max_{i=1}^n ν_2(e_i) for some constant k ∈ ℝ⁺ which is
independent of v. Let c = k max_{i=1}^n ν_2(e_i)/max_{i=1}^n ν_1(e_i). Then c ∈ ℝ⁺ and
∀v ∈ V, cν_1(v) ≤ c|λ|_1 max_{i=1}^n ν_1(e_i)
= k|λ|_1 max_{i=1}^n ν_2(e_i)
≤ ν_2(v).
Let C = c′⁻¹, where c′ ∈ ℝ⁺ is the corresponding constant obtained by swapping ν_1 and ν_2. Then
∀v ∈ V, ν_2(v) ≤ Cν_1(v).
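For a concrete instance of Definition 24.6.37 (an informal aside), the norms |·|_1 and |·|_∞ on ℝ² are equivalent with c = 1 and C = 2, since |x|_∞ ≤ |x|_1 ≤ 2|x|_∞ for all x ∈ ℝ²:

```python
def norm_1(x):
    """The 1-norm on R^2."""
    return abs(x[0]) + abs(x[1])

def norm_inf(x):
    """The infinity-norm on R^2."""
    return max(abs(x[0]), abs(x[1]))

def equivalent_on(samples, c, C):
    """Check c*norm_inf(v) <= norm_1(v) <= C*norm_inf(v) on the given sample vectors."""
    return all(c * norm_inf(v) <= norm_1(v) <= C * norm_inf(v) for v in samples)
```

The vector (1, 1) shows that C = 2 cannot be improved, since |(1, 1)|_1 = 2|(1, 1)|_∞.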
and therefore a topology, on the linear space. (See Definition 39.2.2.) When the two vector parameters are
different, the inner product is equal to the product of the induced norms of the two vectors multiplied by
the cosine of the angle between them, which effectively defines the angle.
24.7.2 Remark: Definition of inner product for real linear spaces.
Definition 24.7.4 is essentially identical to Definition 19.6.2. The combination of conditions (i), (ii) and (iii)
means that the inner product must be bilinear. The combination of conditions (iii) and (v) implies that
μ(v, v) = 0 if and only if v = 0, as shown in Theorem 19.6.4 (ii).
Bilinear functions are a special case of the multilinear maps in Definition 27.1.3. However, they are introduced
early here as Definition 24.7.3 because of their importance for general linear algebra (as distinct from tensor
algebra). In differential geometry, and in analytic geometry in general, bilinear maps have a non-tensorial
meaning. Tensors are objects which are defined on manifolds, but inner products are part of the infrastructure
of Riemannian manifolds and linear spaces. The role of an inner product is to specify distances and angles
between vectors. This is part of the geometrical structure of a space. Tensors and tensor fields are merely
inhabitants in the space.
Bilinear functions are often called bilinear forms because when written in terms of vector components,
they are sums of products of terms which resemble quadratic forms, polynomials or multinomials.
24.7.3 Definition: A bilinear function or bilinear form on a linear space V over a field K is a function
μ : V × V → K which satisfies:
(i) ∀u, v, w ∈ V, μ(u + v, w) = μ(u, w) + μ(v, w), [left additivity]
(ii) ∀u, v, w ∈ V, μ(u, v + w) = μ(u, v) + μ(u, w), [right additivity]
(iii) ∀u, v ∈ V, ∀λ ∈ K, μ(λu, v) = μ(u, λv) = λμ(u, v). [scalarity]
In other words, the maps μ(v, ·) : V → K and μ(·, v) : V → K are both linear maps for all v ∈ V.
24.7.4 Definition: An inner product on a real linear space V is a positive definite symmetric bilinear
function on V. In other words, it is a function μ : V × V → ℝ which satisfies the following conditions.
(i) ∀u, v, w ∈ V, μ(u + v, w) = μ(u, w) + μ(v, w), [left additivity]
(ii) ∀u, v, w ∈ V, μ(u, v + w) = μ(u, v) + μ(u, w), [right additivity]
(iii) ∀u, v ∈ V, ∀λ ∈ ℝ, μ(λu, v) = μ(u, λv) = λμ(u, v), [scalarity]
(iv) ∀u, v ∈ V, μ(u, v) = μ(v, u), [symmetry]
(v) ∀u ∈ V \ {0_V}, μ(u, u) > 0. [positive definiteness]
24.7.5 Definition: An inner product (real linear) space is a real linear space with an inner product. In
other words, it is a pair (V, μ), where V is a real linear space and μ is an inner product on V.
24.7.6 Definition: The (standard) (Euclidean) inner product on ℝⁿ for n ∈ ℤ⁺_0 is the map μ : ℝⁿ × ℝⁿ →
ℝ defined by μ : (x, y) ↦ Σ_{i=1}^n x_i y_i.
24.7.7 Definition: The Euclidean inner product space or Cartesian inner product space with dimension
n ∈ ℤ⁺_0 is the inner product space (ℝⁿ, μ), where μ is the standard Euclidean inner product on ℝⁿ.
24.7.8 Notation: Various notations for an inner product.
(x, y) denotes the inner product of x, y ∈ ℝⁿ for n ∈ ℤ⁺_0.
x · y denotes the inner product of x, y ∈ ℝⁿ for n ∈ ℤ⁺_0.
⟨x, y⟩ denotes the inner product of x, y ∈ ℝⁿ for n ∈ ℤ⁺_0.
24.7.10 Remark: Inversion of inner products.
Since the metric tensor field on a Riemannian manifold in Section 67.2 is an inner product at each point of
the manifold, the properties of inner products have great relevance to differential geometry. In particular, the
ability to invert an inner product, in some sense, is often required for raising indices in tensor calculus.
One important application of this idea is the gradient concept in Definition 68.6.2.
In general, the practical inversion of linear maps is not trivial. One often demonstrates the existence of
an inverse without stating an explicit procedure to compute it. Matrix methods are typically required for
explicit inversion procedures. Therefore only existence is shown here, not explicit formulas for inverses.
Although matrix inversion is a procedure, which could require non-trivial numerical methods in practice, it
should also be considered that even the division of real numbers is a procedure if, for example, a decimal or
binary expansion of the result is required. Even multiplication and addition require some kind of numerical
algorithm in binary or decimal. When a ratio such as 2/7 is written, it is only a symbolic formula for
the solution of an equation such as 7x = 2 which must be solved for x by some procedure. Similarly $A^{-1}$
denotes the solution of a matrix equation AX = XA = I, which requires a procedure of some kind. Even
the addition of integers requires numerical procedures.
24.7.11 Theorem: Let $V$ be a finite-dimensional real linear space. Let $\varphi : V \times V \to \mathbb{R}$ be an inner product on $V$. Define $\hat\varphi : V \to V^*$ by $\forall u, v \in V,\ \hat\varphi(u)(v) = \varphi(u, v)$.
(i) $\hat\varphi \in \mathrm{Lin}(V, V^*)$.
(ii) $\hat\varphi$ is a linear space isomorphism.
Proof: For part (i), let $u \in V$. Then $\hat\varphi(u) \in V^*$ by Definition 24.7.4 (ii, iii). Let $u_1, u_2 \in V$. Let $v \in V$. Then $\hat\varphi(u_1 + u_2)(v) = \varphi(u_1 + u_2, v) = \varphi(u_1, v) + \varphi(u_2, v) = \hat\varphi(u_1)(v) + \hat\varphi(u_2)(v)$ by Definition 24.7.4 (i). So $\hat\varphi(u_1 + u_2) = \hat\varphi(u_1) + \hat\varphi(u_2)$ by Definition 23.6.4. Similarly, for all $\lambda \in \mathbb{R}$ and $u \in V$, $\hat\varphi(\lambda u)(v) = \varphi(\lambda u, v) = \lambda \varphi(u, v) = \lambda \hat\varphi(u)(v)$ by Definition 24.7.4 (iii). So $\hat\varphi(\lambda u) = \lambda \hat\varphi(u)$ by Definition 23.6.4. Hence $\hat\varphi$ is linear by Definition 23.1.1.
For part (ii), $\dim(V^*) = \dim(V)$ by Theorem 23.7.11. Suppose that $\hat\varphi(u) = 0_{V^*}$ for some $u \in V \setminus \{0_V\}$. Then $\hat\varphi(u)(v) = 0_{\mathbb{R}}$ for all $v \in V$. So $\varphi(u, u) = 0_{\mathbb{R}}$. But this contradicts Definition 24.7.4 (v). Therefore $\ker(\hat\varphi) = \{0_V\}$. Hence $\hat\varphi$ is a linear space isomorphism by Theorem 40.4.6 (i).
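In coordinates, the theorem can be illustrated numerically. The sketch below is illustrative only (the names and the specific Gram matrix are assumptions, not the book's): relative to a basis, an inner product is represented by a symmetric positive definite Gram matrix $G$, the covector $\hat\varphi(u)$ has component tuple $Gu$, and inverting $\hat\varphi$ amounts to a linear solve, which is the index-raising operation mentioned in Remark 24.7.10.

```python
import numpy as np

# Coordinate sketch of the dual-map isomorphism (illustrative names):
# phi(u, v) = u^T G v for a symmetric positive definite Gram matrix G,
# and phi_hat(u) is the covector with component tuple G @ u.
G = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # example Gram matrix, positive definite

def phi(u, v):
    return u @ G @ v                  # the inner product phi(u, v)

def phi_hat(u):
    return G @ u                      # components of the covector phi_hat(u)

u = np.array([1.0, -2.0])
v = np.array([0.5, 4.0])
assert np.isclose(phi_hat(u) @ v, phi(u, v))    # phi_hat(u)(v) = phi(u, v)

# Since G is nonsingular, phi_hat is invertible: recovering u from the
# covector G @ u is a linear solve ("raising an index").
assert np.allclose(np.linalg.solve(G, phi_hat(u)), u)
```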
24.7.12 Remark: Inner products on real linear spaces.
Inner products may be defined on general modules over rings as in Definition 19.6.2. In such generality they
are not easily related to norms. As mentioned in Remark 19.7.1, one requires the ring to be upgraded to an
ordered field where all non-negative elements have square roots. It is possible to prove a generalised Cauchy-
Schwarz inequality for a general unitary module over a commutative ordered ring as in Theorem 19.7.2.
In the case of a real linear space, one obtains the usual square-root norm as in Definition 24.7.13, and a
Cauchy-Schwarz inequality as in Theorem 24.7.14.
24.7.13 Definition: The norm corresponding to an inner product $\varphi : V \times V \to \mathbb{R}$ on a real linear space $V$ is the function $L : V \to \mathbb{R}_0^+$ defined by $L(v) = \varphi(v, v)^{1/2}$ for all $v \in V$.
24.7.14 Theorem: Cauchy-Schwarz inequality.
Let $\varphi : V \times V \to \mathbb{R}$ be an inner product on a real linear space $V$. Then
$\forall v_1, v_2 \in V,\ |\varphi(v_1, v_2)| \le \|v_1\|\,\|v_2\|$,
where $\|v\| = \varphi(v, v)^{1/2}$ denotes the norm corresponding to $\varphi$ as in Definition 24.7.13.
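The inequality can be spot-checked numerically. This is an illustration only (the Gram matrix and function names are assumptions): for an inner product represented by a positive definite Gram matrix, the inequality holds for every sampled pair.

```python
import numpy as np

# Numeric spot check of the Cauchy-Schwarz inequality (illustrative):
# |phi(v1, v2)| <= ||v1|| ||v2||, where ||v|| = phi(v, v)^(1/2).
G = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # a positive definite Gram matrix

def phi(u, v):
    return u @ G @ v

def norm(v):
    return phi(v, v) ** 0.5

rng = np.random.default_rng(0)
for _ in range(1000):
    v1, v2 = rng.normal(size=2), rng.normal(size=2)
    # small tolerance for floating-point round-off
    assert abs(phi(v1, v2)) <= norm(v1) * norm(v2) + 1e-12
```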
Riemannian manifolds. An inner product which is not necessarily positive definite is called a scalar product
by O'Neill [261], page 12; Sternberg [37], page 117. It is called an inner product by Szekeres [266], page 127.
24.8.2 Definition: A nondegenerate or non-singular symmetric bilinear function on a linear space $V$ over a field $K$ is a symmetric bilinear function $\varphi : V \times V \to K$ satisfying $\forall v \in V \setminus \{0\},\ \exists w \in V,\ \varphi(v, w) \ne 0$.
24.8.3 Definition: A (hyperbolic) inner product on a linear space V is a nondegenerate symmetric bilinear
function on V .
24.8.4 Definition: An orthogonal subset of a linear space $V$ with respect to a hyperbolic inner product $\varphi$ on $V$ is a subset $S$ of $V \setminus \{0\}$ which satisfies $\forall v, w \in S,\ v \ne w \Rightarrow \varphi(v, w) = 0$.
24.8.5 Definition: The index of a hyperbolic inner product $\varphi$ on a linear space $V$ is the extended integer $\sup\{\#(S);\ S \subseteq V$ and $\forall v \in S,\ \varphi(v, v) < 0$ and $\forall v, w \in S,\ v \ne w \Rightarrow \varphi(v, w) = 0\}$. In other words, the index of $\varphi$ is the largest number of mutually orthogonal vectors $v \in V$ with $\varphi(v, v) < 0$.
24.8.6 Definition: A Minkowskian inner product on a real linear space $V$ is a hyperbolic inner product $\varphi$ on $V$ with index 1. In other words, a Minkowskian inner product $\varphi$ satisfies
(i) $\exists v \in V,\ \varphi(v, v) < 0$, and [index $\ge 1$]
(ii) $\forall v, w \in V,\ ((\varphi(v, v) < 0$ and $\varphi(w, w) < 0) \Rightarrow \varphi(v, w) \ne 0)$. [index $< 2$]
24.8.7 Theorem: Let $V$ be a finite-dimensional real linear space with hyperbolic inner product $\varphi$. Let $n = \dim(V)$. Then $\varphi$ is Minkowskian if and only if $V$ has a basis $(e_i)_{i=1}^n$ which satisfies
(i) $\varphi(e_1, e_1) = -1$,
(ii) $\forall i \in \mathbb{N}_n \setminus \{1\},\ \varphi(e_i, e_i) = 1$,
(iii) $\forall i \in \mathbb{N}_n,\ \forall j \in \mathbb{N}_n \setminus \{i\},\ \varphi(e_i, e_j) = 0$.
Proof: Assume that $\varphi$ is Minkowskian. Then $n \ge 1$ because $\varphi(v, v) < 0$ for some $v \in V$. If $n = 1$, then choose $e_1 = (-\varphi(v, v))^{-1/2} v$. This gives $\varphi(e_1, e_1) = (-\varphi(v, v))^{-1} \varphi(v, v) = -1$. So $(e_i)_{i=1}^n$ satisfies (i), (ii) and (iii). So it may be assumed that $n \ge 2$.
Since $V$ is finite-dimensional, it has a basis $B = (e_i)_{i=1}^n$. This basis $B$ can be modified in stages to make it satisfy (i), (ii) and (iii). First suppose that $\varphi(e_i, e_i) \ge 0$ for all $i \in \mathbb{N}_n$. By Definition 24.8.6 (i), there is at least one $v \in V$ with $\varphi(v, v) < 0$. Since $B$ is a basis, $v = \sum_{i=1}^n v_i e_i$ for some unique $(v_i)_{i=1}^n \in \mathbb{R}^n$. Not all of the components $v_i$ equal zero because $\varphi(v, v) \ne 0$. Let $i_0 = \min\{i \in \mathbb{N}_n;\ v_i \ne 0\}$. Modify $B$ to $B' = (e'_i)_{i=1}^n$ where $e'_{i_0} = v$ and $e'_i = e_i$ for $i \ne i_0$. To see that $B'$ is a basis for $V$, suppose that $\sum_{i=1}^n \lambda_i e'_i = 0$. Then $\lambda_{i_0} v + \sum_{i \ne i_0} \lambda_i e_i = 0$. If $\lambda_{i_0} \ne 0$, then $v = -\sum_{i \ne i_0} (\lambda_i / \lambda_{i_0}) e_i$, which would imply that $v_{i_0} = 0$ by the uniqueness of the component tuple $(v_i)_{i=1}^n$, which contradicts the choice of $i_0$. Therefore $\lambda_{i_0} = 0$, which implies $\sum_{i \ne i_0} \lambda_i e_i = 0$, which implies that $\lambda_i = 0$ for all $i \in \mathbb{N}_n$ because $B$ is a basis. Thus it may be assumed that $\varphi(e_i, e_i) < 0$ for at least one $i \in \mathbb{N}_n$.
Since $\varphi(e_i, e_i) < 0$ for some $i \in \mathbb{N}_n$, the vectors in $B$ may be permuted so that $\varphi(e_1, e_1) < 0$, and this $e_1$ can be scaled so that $\varphi(e_1, e_1) = -1$. Thus it may be assumed that $\varphi(e_1, e_1) = -1$.
For the third stage, it is assumed that $B$ is a basis with $\varphi(e_1, e_1) = -1$. Define $B' = (e'_i)_{i=1}^n$ by $e'_1 = e_1$ and $e'_i = e_i + \varphi(e_1, e_i) e_1$ for $i \in \mathbb{N}_n \setminus \{1\}$. Then $B'$ is a basis for $V$ and $\varphi(e_1, e'_i) = \varphi(e_1, e_i) + \varphi(e_1, e_i) \varphi(e_1, e_1) = 0$ for all $i \ne 1$. Therefore $\varphi(e'_i, e'_i) \ge 0$ for all $i \ne 1$ by Definition 24.8.6 (ii). Note that at the end of the third stage, the matrix $[\varphi(e_i, e_j)]_{i,j=1}^n$, as defined in Section 25.2, has the following general appearance:
$$\begin{pmatrix} -1 & 0 & 0 & \cdots & 0 \\ 0 & {\ge}0 & ? & \cdots & ? \\ 0 & ? & {\ge}0 & \cdots & ? \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & ? & ? & \cdots & {\ge}0 \end{pmatrix}.$$
For the fourth stage, suppose that $\varphi(e_i, e_i) = 0$ for some $i \ne 1$. Then by permuting indices, it may be assumed that $i = 2$ and $\varphi(e_2, e_2) = 0$. If $n = 2$, let $v \in V$. Then $v = v_1 e_1 + v_2 e_2$ for some $v_1, v_2 \in \mathbb{R}$. So $\varphi(e_2, v) = v_1 \varphi(e_2, e_1) + v_2 \varphi(e_2, e_2) = 0$. Thus $\varphi(e_2, v) = 0$ for all $v \in V$, which contradicts Definition 24.8.2 for nondegeneracy because $e_2 \ne 0$ since it is a basis vector. Therefore $\varphi(e_2, e_2) > 0$, and $e_2$ may be scaled so that $\varphi(e_2, e_2) = 1$. Thus conditions (i), (ii) and (iii) are verified. So it may be assumed that $n \ge 3$.
So now assume that $n \ge 3$ with $\varphi(e_1, e_1) = -1$, and suppose that $\varphi(e_2, e_2) = 0$. Suppose that $\varphi(e_2, e_i) = 0$ for all $i \in \mathbb{N}_n \setminus \{1, 2\}$. Then $\varphi(e_2, v) = 0$ for all $v \in V$, which contradicts the nondegeneracy assumption. So $\varphi(e_2, e_i) \ne 0$ for some $i \in \mathbb{N}_n \setminus \{1, 2\}$. By permuting indices, it may be assumed that $\varphi(e_2, e_3) \ne 0$. Then suppose that $\varphi(e_3, e_3) = 0$. Let $e'_2 = e_2 + e_3$ and $e'_3 = e_2 - e_3$, and $e'_i = e_i$ for $i \in \mathbb{N}_n \setminus \{2, 3\}$. Then $B' = (e'_i)_{i=1}^n$ is a basis for $V$, and $\varphi(e'_2, e'_2) = 2\varphi(e_2, e_3)$, and $\varphi(e'_3, e'_3) = -2\varphi(e_2, e_3)$, and $\varphi(e'_2, e'_3) = \varphi(e'_3, e'_2) = 0$. If $\varphi(e_2, e_3) < 0$, then $\varphi(e'_2, e'_2) < 0$ and $\varphi(e_1, e'_2) = 0$, which contradicts Definition 24.8.6 (ii). If $\varphi(e_2, e_3) > 0$, then $\varphi(e'_3, e'_3) < 0$ and $\varphi(e_1, e'_3) = 0$, which contradicts Definition 24.8.6 (ii). Therefore the hypothesis $\varphi(e_3, e_3) = 0$ is excluded. So $\varphi(e_3, e_3) > 0$. Now let $e'_2 = e_2 + \lambda e_3$ with $\lambda = -\varphi(e_2, e_3)/\varphi(e_3, e_3)$. Then $\varphi(e'_2, e_3) = 0$ and $\varphi(e'_2, e'_2) = -\varphi(e_2, e_3)^2/\varphi(e_3, e_3) < 0$, which contradicts Definition 24.8.6 (ii). Therefore $\varphi(e_2, e_2) > 0$. Consequently $\varphi(e_i, e_i) > 0$ for all $i \ne 1$. Then $e_i$ may be scaled so that $\varphi(e_i, e_i) = 1$ for all $i \ne 1$.
For the final stage, $\varphi(e_i, e_j)$ may be set to zero for $i \ne j$ by induction on $i \ge 1$. Suppose that $\varphi(e_i, e_j) = 0$ for all $i, j \in \mathbb{N}_k$ with $i \ne j$, for some $k \ge 1$ with $k < n$. Define $e'_\ell = e_\ell$ for $\ell \in \mathbb{N}_k$ and $e'_\ell = e_\ell - \varphi(e_k, e_\ell) e_k$ for $\ell \in \mathbb{N}_n \setminus \mathbb{N}_k$. Then $\varphi(e_k, e'_\ell) = 0$ for all $\ell \in \mathbb{N}_n$ with $\ell \ne k$. Thus by induction on $k \in \mathbb{N}_n$, there is a basis $B$ for $V$ which satisfies conditions (i), (ii) and (iii).
For the converse, suppose that $\varphi$ satisfies conditions (i), (ii) and (iii). Then Definition 24.8.6 (i) is satisfied with $v = e_1$. Let $v, w \in V$ with $\varphi(v, v) < 0$ and $\varphi(w, w) < 0$. Let $v = \sum_{i=1}^n v_i e_i$ and $w = \sum_{i=1}^n w_i e_i$. Then $\varphi(v, v) = -v_1^2 + \sum_{i=2}^n v_i^2 < 0$ and $\varphi(w, w) = -w_1^2 + \sum_{i=2}^n w_i^2 < 0$. Therefore $v_1^2 > \sum_{i=2}^n v_i^2$ and $w_1^2 > \sum_{i=2}^n w_i^2$. So $v_1^2 w_1^2 > (\sum_{i=2}^n v_i^2)(\sum_{i=2}^n w_i^2) \ge (\sum_{i=2}^n v_i w_i)^2$ by the Cauchy-Schwarz inequality. (See Theorem 24.7.14.) Therefore $\varphi(v, w) = -v_1 w_1 + \sum_{i=2}^n v_i w_i \ne 0$. Thus Definition 24.8.6 (ii) is satisfied. Hence $\varphi$ is a Minkowskian inner product on $V$.
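The familiar special case of the theorem is $\mathbb{R}^4$ with the matrix $\mathrm{diag}(-1, 1, 1, 1)$. The sketch below is an illustration (not part of the book): it checks Definition 24.8.6 (ii) on random samples, confirming that two vectors $v, w$ with $\varphi(v, v) < 0$ and $\varphi(w, w) < 0$ are never orthogonal.

```python
import numpy as np

# Spot check (illustrative): with the basis of conditions (i)-(iii), the
# matrix [phi(e_i, e_j)] is diag(-1, 1, ..., 1). Randomly sampled pairs
# v, w with phi(v,v) < 0 and phi(w,w) < 0 are never orthogonal.
n = 4
eta = np.diag([-1.0] + [1.0] * (n - 1))

def phi(v, w):
    return v @ eta @ w

rng = np.random.default_rng(1)
hits = 0
for _ in range(20000):
    v, w = rng.normal(size=n), rng.normal(size=n)
    if phi(v, v) < 0 and phi(w, w) < 0:
        assert abs(phi(v, w)) > 0     # Definition 24.8.6 (ii)
        hits += 1
assert hits > 0                       # some such pairs were actually sampled
```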
Chapter 25
Matrix algebra
The algebra of matrices was first developed systematically by Cayley in a series of papers which
began to appear in 1857.
Eves [66], page 10, wrote the following.
The name "matrix" was first assigned to a rectangular array of numbers by James Joseph Sylvester
(1814-1897) in 1850. [. . . ] But it was Arthur Cayley (1821-1895) who, in his paper "A memoir on
the theory of matrices", of 1858, first considered general $m \times n$ matrices as single entities subject
to certain laws of combination.
According to Mirsky [113], page 75: "The term matrix is due to Sylvester (1850)." A useful, concise account
of the origins and wide range of applications of matrix algebra is given by Eves [66], pages 1-5, 10-12.
is because coordinate transformations are generally invertible, i.e. they must not lose information.)
A change of vector coordinates due to a linear change of basis may be thought of as a linear map from the
linear space of coordinate tuples to itself. In this sense, case (2) is a special case of case (1). Thus matrices
mostly correspond to some kind of linear transformation.
To avoid confusion between point transformations and coordinate transformations, it is perhaps helpful to
think of the points as referring to the state of some system, whereas the coordinates refer to the ob-
servation of the same system from the point of view of a particular observer. Thus the points refer to
the real thing, whereas the coordinates are more subjective, depending on the viewpoint of the observer.
(This system/observer dichotomy is also mentioned in Remarks 20.1.10 and 20.10.2, Example 20.10.6, Defi-
nition 20.10.8, and Remark 20.10.17.)
There is a third main way in which matrices arise naturally in differential geometry.
(3) Component matrix for a second-degree tensor. For example, the components of a metric tensor
on a Riemannian manifold.
The second-degree tensor components in case (3) resemble vector components more than transformation
components. A tensor is a kind of generalised vectorial object, not a kind of transformation between vectorial
objects. The component matrices of tensors also do not usually act on vectors in the same way that
transformation component matrices do. In other words, tensors are generally passive objects on which
transformation component matrices act, whereas matrices actively transform such passive objects. The
component matrix for a second-degree tensor is usually square.
Second-degree tensors are a special case of tensors of general degree. Component matrices are well defined
for tensors of any degree. It is preferable to discuss algebraic operations on tensor component matrices in
the context of tensor algebra because the operations of interest are different to the typical operators for the
component matrices of linear maps and coordinate transformations. (See Chapters 2730 for tensor algebra.)
In addition to differential geometry contexts, finite matrices are also applied to purely algebraic systems.
(4) Component matrix for simultaneous linear equations. The main task here is to solve the equa-
tions using row operations which preserve the set of solutions.
(5) Component matrix for a quadratic form. The main task here is to determine eigenvalues
and eigenvectors.
Case (4) is closely related to the components of linear maps and coordinate transformations in cases (1)
and (2). Case (5) is closely related to the symmetric second-degree tensor component matrices in case (3),
such as the components of a metric tensor. Whereas there are bijective relations between general matrices
and general linear maps, the coefficients of a quadratic form correspond to symmetric matrices only.
Thus the tasks to be performed with matrices fall into two main groups. In the first group, matrix multiplica-
tion corresponds to composition of maps, whereas on the second group, matrix symmetry and diagonalisation
are more important. In the second group, the matrices represent an object to be studied in itself by trans-
forming coordinates to reveal its structure, whereas in the first group, the matrices act to transform points,
and inversion and invertibility are of more interest.
The difference between transformation-style matrices and quadratic-form matrices can be seen fairly clearly by trying to give meaning to the product of the matrices of two metric tensors $(g_{ij})_{i,j=1}^n$ and $(g'_{ij})_{i,j=1}^n$. The product $(h_{ij})_{i,j=1}^n$ given by $h_{ij} = \sum_{k=1}^n g_{ik} g'_{kj}$ has no meaning. This is because such matrices are components
of tensorial objects, not transformations of tensorial objects. Thus it is important to not assume that all
square arrays of numbers refer to the same underlying class of objects. As always, the operations which
may be applied to a class of mathematical objects depend on the underlying meaning, not just the set of
parameters for the class. (As a more absurd example, consider the cube root of the number of dollars in a
bank account, or the number of prime factors of the number of people who live in a city.)
for all $j \in \mathbb{N}_n$. Then $\psi(v) = \sum_{i=1}^m \sum_{j=1}^n a_{ij} v_j e_i^W$ for general vectors $v = \sum_{j=1}^n v_j e_j^V$ in $V$. Thus $\psi(v) = w$, where $w = \sum_{i=1}^m w_i e_i^W$, where the component tuple $(w_i)_{i=1}^m$ is given by the formula $w = Av$. In this context, each matrix element $a_{ij}$ means the component of $\psi(e_j^V)$ with respect to the $i$th vector $e_i^W$ in the basis $(e_i^W)_{i=1}^m$ for $W$.
(2) Coordinate transformation. If a linear space $V$ has finite bases $B^1 = (e_i^1)_{i=1}^n$ and $B^2 = (e_i^2)_{i=1}^n$, then the component matrix for the change of basis from $B^1$ to $B^2$ is the matrix $A = (a_{ij})_{i,j=1}^n$ given by $\sum_{i=1}^n v_i^2 e_i^2 = \sum_{j=1}^n v_j^1 e_j^1$, where $v_i^2 = \sum_{j=1}^n a_{ij} v_j^1$. In other words, $v^2 = A v^1$. This implies the relation $\sum_{i=1}^n a_{ij} e_i^2 = e_j^1$ for all $j \in \mathbb{N}_n$. So in this context, each matrix element $a_{ij}$ means the component of $e_j^1$ with respect to the $i$th vector $e_i^2$ in the basis $B^2$ for $V$. (Note that this is not a point transformation because the linear map here is effectively the identity map $\mathrm{id}_V$ on $V$. Points are not transformed. Only the basis and vector components are changed.)
(3) Simultaneous linear equations. In the equations $\forall i \in \mathbb{N}_m,\ \sum_{j=1}^n a_{ij} v_j = w_i$, each element $a_{ij}$ of the $m \times n$ matrix $A = (a_{ij})_{i,j=1}^{m,n}$ is the coefficient of the contribution of the variable $v_j$ to the $i$th constraint. Each line of the sequence of simultaneous linear equations is a constraint which must be satisfied. In this context, it is possible to think of the map $v \mapsto Av$ more abstractly as a linear map from $K^n$ to $K^m$, where the object of the exercise is to determine the inverse image of the set $\{w\}$ by this map. But in concrete terms, each equation is an individual constraint which delineates the feasible space, which is the set $\{v \in K^n;\ Av = w\}$. The role of the equations is only to describe the feasible space.
(4) Inner product. Given a finite-dimensional linear space $V$ with a basis $(e_i)_{i=1}^n$, an inner product $\varphi : V \times V \to K$ is in essence a positive definite symmetric bilinear form on $V$. (See Sections 19.6 and 19.7 for general inner products.) Then $\varphi(v, w) = \sum_{i,j=1}^n a_{ij} v_i w_j$ for all $v = \sum_{i=1}^n v_i e_i$ and $w = \sum_{i=1}^n w_i e_i$ in $V$, for some positive definite symmetric $n \times n$ matrix $A = (a_{ij})_{i,j=1}^n$. In this context, each matrix element $a_{ij}$ means the inner product $\varphi(e_i, e_j)$ of $e_i$ and $e_j$. This is a property of the unordered pair of vectors $\{e_i, e_j\}$. Metric tensors for Riemannian spaces are particular examples of inner products.
(5) Area element. The area subtended by two tangent vectors at a point in a differentiable manifold is an antisymmetric bilinear real-valued function of the ordered vector pair. In terms of a basis $(e_i)_{i=1}^n$ at a given point, this function may be written as $\varphi(v, w) = \sum_{i,j=1}^n a_{ij} v_i w_j$ for $v = \sum_{i=1}^n v_i e_i$ and $w = \sum_{i=1}^n w_i e_i$, where the $n \times n$ matrix $A = (a_{ij})_{i,j=1}^n$ is antisymmetric. In this context, the matrix element $a_{ij}$ represents the area subtended by vectors $e_i$ and $e_j$.
(6) Markov process. Each matrix element aij represents the probability of transition from state i to
state j in a single discrete-time step. Thus the state vector is a row vector which is multiplied on the
right by the transition matrix, which is the opposite to the order in most applications. Markov process
transition matrices must be square, their elements must lie in the real number interval [0, 1], and the
sum of each row must equal 1. Matrix multiplication signifies the execution of multiple state transitions.
(7) Input-output process. Each square matrix element aij represents the flow of some commodity,
substance or items between sub-systems i and j of a system in some unit of time. The kth power of
such a matrix yields the flow in k units of time if the flow rate is constant.
(8) Network connectivity. Each square matrix element aij equals 1 if there is a link between node i
and node j, and equals 0 otherwise. Matrix multiplication for network connectivity matrices signifies
existence of paths of particular lengths.
(9) Basis frames. Each square matrix element aij represents the component of a variable basis vector i
with respect to a fixed basis vector j, or vice versa. Such a matrix is a coordinatisation of a variable
reference frame, for example in the context of a principal fibre bundle.
It is clear that matrices are typically associated with linear spaces in some way, but there are applications
other than as components of linear maps.
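Two of the interpretations above can be sketched numerically. This is an illustration with assumed example matrices, not from the book: interpretation (6), where powers of a row-stochastic matrix give multi-step transition probabilities, and interpretation (8), where powers of a 0/1 adjacency matrix count paths.

```python
import numpy as np

# Interpretation (6): a row-stochastic transition matrix P. The state is a
# row vector, multiplied on the right by P; P^k gives k-step transitions.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
assert np.allclose(P.sum(axis=1), 1.0)            # each row sums to 1
state = np.array([1.0, 0.0])                      # start surely in state 0
after_two = state @ np.linalg.matrix_power(P, 2)  # distribution after 2 steps
assert np.isclose(after_two.sum(), 1.0)           # still a distribution

# Interpretation (8): a 0/1 adjacency matrix A. Entry (i, j) of A^k counts
# the paths of length k from node i to node j.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])                         # directed 3-cycle
A3 = np.linalg.matrix_power(A, 3)
assert (A3 == np.eye(3, dtype=int)).all()         # one length-3 path i -> i
```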
25.1.3 Remark: Applicability of various matrix operations and properties to various contexts.
Arrays of numbers (or more general kinds of ring elements) are often encountered in mathematical contexts
where none, or almost none, of the typical matrix operations are meaningful. For example, an addition or
multiplication operation for a finite set may be presented as a Cayley table, which is a square array of values
$a_{ij} = x_i \ast x_j$ of elements $x_i$ of some finite algebraic system with an operation $\ast$. (See Example 17.1.4 for
example.) Sums, products, determinants and eigenvalues of such matrices have little meaning. As another
example, truth tables in propositional logic are often presented as rectangular arrays of truth values. The
typical matrix operations have only limited applicability for such arrays. By considering such examples of
non-matrix arrays, it becomes clear that the concept of a matrix is defined by the set of meaningful operations
which may be applied, not by the static structure of the arrays. Moreover, the set of typical matrix operations
which are applicable varies according to context.
If the addition operation is commutative, the order of summation is immaterial. The multiplication operation
does not need to be commutative. (Eves [66], page 20, gives the name Cayley product for the matrix
product rule on line (25.1.1).)
Matrices can have an infinite number of rows, or columns, or both. The numbers of rows and columns can
even be uncountable. This raises the question of how to define an infinite sum if Y is infinite. There are two
main ways to resolve this issue. One may require that the rows $\{(j, A(i, j));\ j \in Y\}$ of $A$ and the columns
$\{(j, B(j, k));\ j \in Y\}$ of $B$ contain only finitely many non-zero entries for each $i \in X$ and $k \in Z$. This ensures
that an infinite number of ring elements never need to be summed. Alternatively, one may define a topology
on R so that convergence is meaningful. Of course, this implies that every multiplication must be checked
for convergence, and meaningless matrix products must be rejected. In Chapter 25, neither of these methods
of extending matrices to infinite domains is presented. Nor is the range of matrices extended from fields
to general rings. (The properties of inverses of matrices are more straightforward when a field is used.) For
the purposes of this book, it is sufficient to consider only matrices with domains which are finite and ranges
which are fields. In fact, fields other than the real numbers are rarely required here.
The word matrix is often generalised to include infinite matrices, for which the numbers of rows and
columns may be infinite, typically countably infinite. Such a generalisation is useful in applications to Hilbert
space operators, for example, but in this book, the word matrix generally means finite matrix.
25.2.4 Notation: $M_{m,n}(K)$, for $m, n \in \mathbb{Z}_0^+$ and a field $K$, denotes the set of $m \times n$ matrices over $K$. In other words, $M_{m,n}(K) = K^{\mathbb{N}_m \times \mathbb{N}_n}$.
25.2.5 Notation: $M_{m,n}$, for $m, n \in \mathbb{Z}_0^+$, denotes the set $M_{m,n}(K)$ when the field $K$ is implicit in the context.
25.2.7 Remark: Matrices with no rows or no columns are indistinguishable from the empty set.
There is a peculiarity which arises when a matrix has either no rows, or no columns, or both. It follows
from Theorem 9.4.6 (i) that the Cartesian product $\mathbb{N}_m \times \mathbb{N}_n$ equals the empty set if either $m = 0$ or $n = 0$.
Therefore there is no way to distinguish $0 \times n$ matrices and $m \times 0$ matrices from $0 \times 0$ matrices. They are
all equal to the empty set. So it is generally preferable to avoid them!
Since all matrices in $M_{m,n}(K)$ with $m = 0$ or $n = 0$ are functions on the empty set, it follows that $M_{m,n}(K) = \{\emptyset\}$ whenever $m = 0$ or $n = 0$. One could refer to the matrix $\emptyset$ as the "empty matrix". This is, of course, very different to the zero matrix in Definition 25.2.8. But when $m = 0$ or $n = 0$, the zero matrix in $M_{m,n}(K)$ does in fact equal the empty matrix.
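This remark can be made concrete by modelling a matrix, as in Notation 25.2.4, as a function on $\mathbb{N}_m \times \mathbb{N}_n$. The sketch below is illustrative (a Python dict stands in for the function): all matrices with $m = 0$ or $n = 0$ collapse to the empty function.

```python
# An m x n matrix over K modelled as a function on N_m x N_n, here a dict
# keyed by index pairs (illustrative). When m = 0 or n = 0 the domain is
# empty, so the matrix is the empty function, the same object in all cases.
def zero_matrix(m, n):
    return {(i, j): 0 for i in range(1, m + 1) for j in range(1, n + 1)}

assert zero_matrix(0, 5) == zero_matrix(3, 0) == zero_matrix(0, 0) == {}
assert zero_matrix(2, 2) != {}        # a genuine 2 x 2 zero matrix
```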
25.2.8 Definition: The zero matrix in a set of matrices $M_{m,n}(K)$, for some field $K$ and $m, n \in \mathbb{Z}_0^+$, is the matrix $(a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$ which satisfies $\forall (i, j) \in \mathbb{N}_m \times \mathbb{N}_n,\ a_{ij} = 0_K$.
Barker [128], page 4; Franklin [69], pages xi-xii. However, it is often also convenient to ignore this convention
and use the same case for both the name and the components of a matrix.) The purpose of the upper-case
is to contrast matrices and vectors when used in the same expression. (Thus the expression Av is a hint
that A is a matrix and v is a vector or column matrix.) When indices are used, the upper-case hint is not
required because the reader can count the indices to distinguish matrices from vectors. Thus it is usual to
write $A = (a_{ij})_{i=1,j=1}^{m,n}$. If $m = n$, it is usual to write $A = (a_{ij})_{i,j=1}^n$.
Square brackets may also be used. Thus $A = [a_{ij}]_{i,j=1}^{m,n}$. (The rectangularity of square brackets is perhaps a useful mnemonic for the rectangularity of the matrix!) Likewise, when the elements are fully written out in rows and columns, the brackets may be round or rectangular. Thus
$$A = (a_{ij})_{i,j=1}^{m,n} = [a_{ij}]_{i,j=1}^{m,n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
Rectangular brackets are preferable because they delineate the matrix entries more precisely. As mentioned
in Remark 25.2.17, it is convenient to reserve round brackets (i.e. parentheses) for vectors, which have a
single index, so that matrices, which have a double index, can be highlighted with square brackets.
25.2.10 Remark: Elements of matrices, and their addresses and row and column indexes.
To refer to an element of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, one typically writes simply $a_{ij}$. This is a kind of pseudo-notation because if the matrix values are all zero, for example, then the value 0 could refer to any of the $mn$ zero elements. The notation $a_{ij}$ strictly speaking means the value $a(i, j) \in K$ of the function $a : \mathbb{N}_m \times \mathbb{N}_n \to K$, but in this case, the information about the index pair $(i, j)$ must not be
discarded. Therefore in Definition 25.2.11, a matrix element is defined as the ordered pair ((i, j), aij ) instead
of the more colloquial aij . But if one uses a phrase such as element (i, j) of a matrix A, there is no
ambiguity in defining this to mean the value of A for the pair (i, j).
If one refers to the address of an element aij of a matrix A, this suffers from ambiguity in the case of a
zero matrix, for example. Therefore Definition 25.2.11 gives a pedantically correct meaning for the address
of an element. The same comment applies to the row index and column index.
25.2.11 Definition: An element of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and $m, n \in \mathbb{Z}_0^+$, is a pair $((i, j), a_{ij})$ for some $(i, j) \in \mathbb{N}_m \times \mathbb{N}_n$.
Element $(i, j)$ of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and $m, n \in \mathbb{Z}_0^+$, for some $(i, j) \in \mathbb{N}_m \times \mathbb{N}_n$, is the value $a_{ij}$.
The address of an element $((i, j), a_{ij})$ of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and $m, n \in \mathbb{Z}_0^+$, for some $(i, j) \in \mathbb{N}_m \times \mathbb{N}_n$, is the index pair $(i, j)$.
The row index of an element $((i, j), a_{ij})$ of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and $m, n \in \mathbb{Z}_0^+$, for some $(i, j) \in \mathbb{N}_m \times \mathbb{N}_n$, is the integer $i$.
The column index of an element $((i, j), a_{ij})$ of a matrix $A = (a_{ij})_{i=1,j=1}^{m,n} \in M_{m,n}(K)$, for some field $K$ and $m, n \in \mathbb{Z}_0^+$, for some $(i, j) \in \mathbb{N}_m \times \mathbb{N}_n$, is the integer $j$.
An entry in a matrix is synonymous with an element of a matrix.
25.2.12 Remark: Diagrammatic rows and columns versus pure mathematical rows and columns.
The distinction between rows and columns corresponds to the order of the finite sets in the Cartesian product.
Thus the elements in row $i$ of a matrix $A \in M_{m,n}(K)$ are the values $A(i, j)$ for which the first index $i$ is
fixed while j varies, and vice versa for columns. It is customary to think of rows and columns of matrices as
row vectors and column vectors respectively. Strictly speaking, this should mean that row vector $i$ of $A \in M_{m,n}(K)$ is the set of ordered pairs $\{((i, j), A(i, j));\ j \in \mathbb{N}_n\}$, which is the restriction of the function $A$ to the set $\{(i, j);\ j \in \mathbb{N}_n\}$, with a corresponding expression for columns. If the double indexing is replaced
by single indexing, the information about whether such a vector is a row or a column is lost. In other words,
there is no way to distinguish a vertical vector from a horizontal vector.
There are (at least) three reasonable pure mathematical interpretations of the concepts of row and column
vectors. These are presented here as Definitions 25.2.13, 25.2.14 and 25.2.15. The rows and columns in
Definition 25.2.13 are simple restrictions of the matrix to specific subsets of the domain of the matrix. In
Definition 25.2.14, the domain index pair is shifted so as to yield $1 \times n$ and $m \times 1$ matrices respectively.
Definition 25.2.15 fully extracts the rows and columns as vectors which are neither horizontal nor vertical. In
this last case, the terms row vector and column vector are misleading because the information about the
orientation of the vector has been discarded. Vectors do not have an orientation, whereas the row and column
matrices are matrices, which do have a specific orientation. This question of orientation is important when
a row or column vector is to be multiplied with matrices according to the standard matrix multiplication
rules. In practice, these three interpretations are freely and informally intermingled. Usually the intended
meaning can be guessed from the context.
25.2.13 Definition: Rows and columns of a matrix.
Row $i$ of a matrix $A \in M_{m,n}(K)$ is the set $\{((i, j), A(i, j));\ j \in \mathbb{N}_n\} = A\big|_{\{i\} \times \mathbb{N}_n} \in K^{\{i\} \times \mathbb{N}_n}$.
Column $j$ of a matrix $A \in M_{m,n}(K)$ is the set $\{((i, j), A(i, j));\ i \in \mathbb{N}_m\} = A\big|_{\mathbb{N}_m \times \{j\}} \in K^{\mathbb{N}_m \times \{j\}}$.
spaces $K^m$ and $K^n$. These maps may be thought of as injections, embeddings or immersions. (The column
matrix injection map and row matrix injection map are summarised in Figure 25.2.1.)
[Figure 25.2.1: Column matrix injection map and row matrix injection map. The tuple $(v_1, v_2, \ldots, v_n)$, a vector in $K^n$, is sent by the column matrix injection map to the column matrix with entries $v_1, \ldots, v_n$ in $M_{n,1}(K)$, and by the row matrix injection map to the row matrix $[\,v_1\ v_2\ \ldots\ v_n\,]$ in $M_{1,n}(K)$.]
Figure 25.2.1 demonstrates how one may conveniently distinguish between row vectors and row matrices by
using round brackets (i.e. parentheses) for row vectors and square brackets for row matrices.
25.2.18 Definition: The column matrix injection map for a linear space $K^m$ over a field $K$ for $m \in \mathbb{Z}_0^+$ is the map $\iota_c : K^m \to M_{m,1}(K)$ defined by:
$\forall v \in K^m,\ \forall i \in \mathbb{N}_m,\ \iota_c(v)(i, 1) = v_i$.
The matrix $\iota_c(v)$ is called the column matrix for $v$.
25.2.19 Definition: The row matrix injection map for a linear space $K^n$ over a field $K$ for $n \in \mathbb{Z}_0^+$ is the map $\iota_r : K^n \to M_{1,n}(K)$ defined by:
$\forall v \in K^n,\ \forall j \in \mathbb{N}_n,\ \iota_r(v)(1, j) = v_j$.
The matrix $\iota_r(v)$ is called the row matrix for $v$.
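With the same function-on-index-pairs model of a matrix, the two injection maps can be sketched as follows (illustrative only; the function names are assumptions):

```python
# The column and row matrix injection maps, with matrices modelled as
# dicts on index pairs (illustrative sketch).
def column_matrix(v):
    # v in K^m  ->  m x 1 matrix with entry v_i at address (i, 1)
    return {(i, 1): vi for i, vi in enumerate(v, start=1)}

def row_matrix(v):
    # v in K^n  ->  1 x n matrix with entry v_j at address (1, j)
    return {(1, j): vj for j, vj in enumerate(v, start=1)}

v = (7, 8, 9)
assert column_matrix(v) == {(1, 1): 7, (2, 1): 8, (3, 1): 9}
assert row_matrix(v) == {(1, 1): 7, (1, 2): 8, (1, 3): 9}
```

The two images carry the same information as the tuple, but retain the vertical or horizontal orientation needed for the standard matrix multiplication rules.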
25.3.3 Remark: Matrix operation definitions without formal specification tuples are clearer.
The formal linear space Definition 22.1.1 requires a specification tuple which lists the field $K$, the set $M_{m,n}(K)$, the field operations for $K$, and the linear space operations. Definition 25.3.2 is clearer without such formality.
25.3.4 Remark: Linear spaces of matrices are free linear spaces on finite set products.
For each fixed pair $m, n \in \mathbb{Z}_0^+$, the linear space $M_{m,n}(K)$ in Definition 25.3.2 is no more and no less than the free linear space $\mathrm{FL}(\mathbb{N}_m \times \mathbb{N}_n, K)$. (See Definition 22.2.10 and Notation 22.2.11 for free linear spaces.)
So there is no need to present the basic algebraic properties of this system here. The real interest lies in the
multiplication operation. Although the linear spaces Mm,n (K) are distinct for different pairs (m, n) (except
for the cases noted in Remark 25.2.7), the multiplication operation is not confined to a single such space as
the addition operation is. Thus the relations between addition and multiplication are not as straightforward
as in most of the algebraic structures in Chapters 17, 18 and 19.
25.3.5 Theorem: Let $K$ be a field and $m, n \in \mathbb{Z}_0^+$. Then $\dim(M_{m,n}(K)) = mn$.
Proof: The linear space $M_{m,n}(K)$ is clearly spanned by $\{e_{k\ell};\ (k, \ell) \in \mathbb{N}_m \times \mathbb{N}_n\}$, where $e_{k\ell} \in M_{m,n}(K)$ is defined by $e_{k\ell}(i, j) = \delta_{ik} \delta_{j\ell}$ for $(i, j) \in \mathbb{N}_m \times \mathbb{N}_n$, for all $(k, \ell) \in \mathbb{N}_m \times \mathbb{N}_n$. These $mn$ vectors are also clearly linearly independent. So they are a basis for $M_{m,n}(K)$ by Definition 22.7.2. Hence $\dim(M_{m,n}(K)) = mn$ by Theorem 22.7.13. (Alternatively, the assertion follows more directly from Theorem 22.7.15.)
25.3.6 Remark: Matrix multiplication is defined only for conformable matrices.
The multiplication operation (A, B) ↦ AB for matrices is defined only for matrices A and B such that the number of columns of A equals the number of rows of B. Cullen [62], page 15, calls such ordered pairs of
matrices conformable for multiplication.
25.3.7 Definition: The (matrix) product of an ordered pair of matrices A ∈ M_{m,p}(K) and B ∈ M_{p,n}(K) for a field K and m, n, p ∈ ℤ⁺₀ is the matrix C ∈ M_{m,n}(K) defined by
∀(i, j) ∈ ℕ_m × ℕ_n,  C(i, j) = ∑_{k=1}^p A(i, k)B(k, j).
25.3.8 Notation: AB denotes the matrix product of matrices A ∈ M_{m,p}(K) and B ∈ M_{p,n}(K) for a field K and m, n, p ∈ ℤ⁺₀.
[ www.geometry.org/dg.html ] [ draft: UTC 201749 Sunday 13:30 ]
25.3. Rectangular matrix addition and multiplication 739
25.3.10 Remark: Entries in a matrix product equal products of left rows and right columns.
Theorem 25.3.11 demonstrates the usual way in which matrix products are taught and thought of. Each entry (AB)(i, j) in the product AB of A and B is computed as the sum of products of entries in Row_i(A) and Col_j(B). A minor pedantic subtlety here is that since this row-times-column product is a 1 × 1 matrix, its value must be extracted by making it act on the index pair (1, 1).
25.3.11 Theorem: Let A ∈ M_{m,p}(K) and B ∈ M_{p,n}(K) for a field K and m, n, p ∈ ℤ⁺₀. Then
∀i ∈ ℕ_m, ∀j ∈ ℕ_n,
(AB)(i, j) = ∑_{k=1}^p Row_i(A)(1, k) Col_j(B)(k, 1)    (25.3.1)
           = (Row_i(A) Col_j(B))(1, 1).    (25.3.2)
Proof: Line (25.3.1) follows from Definitions 25.3.7 and 25.2.14. Line (25.3.2) follows from line (25.3.1) and Definition 25.3.7 for the product of the 1 × p and p × 1 matrices Row_i(A) and Col_j(B).
25.3.12 Remark: Rows and columns of matrix products are spanned by right rows and left columns.
Theorem 25.3.13 combines the elements in each row or column of a matrix product to conclude that each row of a matrix product AB equals a linear combination of the rows of B, and each column of AB equals a
linear combination of the columns of A. This elementary observation has important consequences because
it allows some of the theory of linear spaces to be applied to matrices.
25.3.13 Theorem: Let A ∈ M_{m,p}(K) and B ∈ M_{p,n}(K) for a field K and m, n, p ∈ ℤ⁺₀. Then
∀i ∈ ℕ_m,  Row_i(AB) = ∑_{k=1}^p a_{ik} Row_k(B)    (25.3.3)
and
∀j ∈ ℕ_n,  Col_j(AB) = ∑_{k=1}^p b_{kj} Col_k(A),    (25.3.4)
where A = [a_{ik}]_{i=1,k=1}^{m,p} and B = [b_{kj}]_{k=1,j=1}^{p,n}.
Proof: For line (25.3.3), Row_i(AB) = {((1, j), (AB)(i, j)); j ∈ ℕ_n} ∈ M_{1,n}(K) by Definition 25.2.14 for all i ∈ ℕ_m. Thus Row_i(AB)(1, j) = (AB)(i, j) for all j ∈ ℕ_n. Then Row_i(AB)(1, j) = ∑_{k=1}^p a_{ik} B(k, j) by Definition 25.3.7. But Row_k(B)(1, j) = B(k, j) by Definition 25.2.14. Therefore Row_i(AB)(1, j) = ∑_{k=1}^p a_{ik} Row_k(B)(1, j) for all j ∈ ℕ_n. Hence Row_i(AB) = ∑_{k=1}^p a_{ik} Row_k(B) by the vector addition and scalar multiplication operations defined on M_{1,n}(K) in Definition 25.3.2. This verifies line (25.3.3).
For line (25.3.4), Col_j(AB) = {((i, 1), (AB)(i, j)); i ∈ ℕ_m} ∈ M_{m,1}(K) for all j ∈ ℕ_n by Definition 25.2.14. So Col_j(AB)(i, 1) = (AB)(i, j) for all i ∈ ℕ_m. Then Col_j(AB)(i, 1) = ∑_{k=1}^p A(i, k)b_{kj} = ∑_{k=1}^p b_{kj} A(i, k) by Definition 25.3.7. But Col_k(A)(i, 1) = A(i, k). So Col_j(AB)(i, 1) = ∑_{k=1}^p b_{kj} Col_k(A)(i, 1) for all i ∈ ℕ_m. Hence Col_j(AB) = ∑_{k=1}^p b_{kj} Col_k(A) by the vector addition and scalar multiplication operations for M_{m,1}(K) in Definition 25.3.2. This verifies line (25.3.4).
25.3.14 Remark: More expressions for the rows and columns of a matrix product.
Lines (25.3.3) and (25.3.4) in Theorem 25.3.13 may also be rewritten as follows.
∀i ∈ ℕ_m,  Row_i(AB) = ∑_{k=1}^p Row_i(A)(1, k) Row_k(B)
                     = Row_i(A) B
and
∀j ∈ ℕ_n,  Col_j(AB) = ∑_{k=1}^p Col_j(B)(k, 1) Col_k(A)
                     = A Col_j(B).
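The conclusion of Theorem 25.3.13, that each row of AB is a linear combination of the rows of B with coefficients taken from the corresponding row of A, can be checked numerically. The following Python sketch is illustrative only (0-based lists of rows rather than the book's 1-based index maps); all names are assumptions.

```python
# Illustrative sketch (not from the book): Row_i(AB) = sum_k a_ik Row_k(B).

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def scale(c, row):
    return [c * x for x in row]

def add(u, v):
    return [x + y for x, y in zip(u, v)]

A = [[1, 2], [3, 4], [5, 6]]      # 3x2
B = [[1, 0, 2], [0, 1, 3]]        # 2x3
AB = mat_mul(A, B)

for i in range(len(A)):
    combo = [0] * len(B[0])
    for k in range(len(B)):
        combo = add(combo, scale(A[i][k], B[k]))   # a_ik . Row_k(B)
    assert AB[i] == combo          # Row_i(AB) equals the linear combination
```

The analogous check for columns of AB as combinations of columns of A follows by transposing the roles of A and B.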
25.3.15 Remark: Layout conventions for matrix products and matrix equations.
Hand-written matrix algebra is most convenient when the vector components are depicted as column vectors.
In other words, the elements of the n-tuples are listed in increasing order down the page. This gives the best
layout when such a vector is multiplied by a matrix as in the following.
[ y_1 ]   [ a_11  a_12  ...  a_1n ] [ x_1 ]
[ y_2 ] = [ a_21  a_22  ...  a_2n ] [ x_2 ]
[  ⋮  ]   [  ⋮     ⋮     ⋱    ⋮   ] [  ⋮  ]
[ y_m ]   [ a_m1  a_m2  ...  a_mn ] [ x_n ]
The use of vertical (i.e. column) vectors keeps handwritten calculations compact in the horizontal direction.
A further advantage of column vectors is that the composition of multiple linear transformations gives the
same order for matrices. Thus if y = f (x) = Ax and z = g(y) = By for matrices A and B, the composition
g(f (x)) equals BAx. Thus in both notations, the order is the reverse of the temporal (or causal) order which
intuitively underlies the composition of functions. An m × n matrix over a field K is used for linear maps f : K^n → K^m, which has m and n in the reverse order. An advantage of multiplying a vector by a matrix from the left is that the indices are contiguous. For example, y_i = ∑_{j=1}^n a_{ij} x_j. The two instances of the index j are close to each other.
In Markov chain theory, the reverse convention is adopted. There the vectors are row vectors, which makes the
matrix multiplication order the reverse of the standard function composition notation order. The advantage
in this case is that the matrix multiplication order matches the temporal order since the initial state of the
system is a row vector which is multiplied on the right by state transition matrices. For example, in the equation w_k = ∑_{i=1}^n ∑_{j=1}^n v_i P_{ij} Q_{jk}, the initial state is v and one or more matrices on the right modify the state, first P then Q.
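The contrast between the two conventions may be sketched computationally. This Python fragment is illustrative only: with column vectors the composition g(f(x)) = B(Ax) corresponds to the product BA, in the reverse of temporal order, while with row vectors the transition matrices apply in temporal order, (vP)Q = v(PQ).

```python
# Illustrative sketch (not from the book): column-vector versus row-vector
# multiplication conventions.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(A, x):          # column convention: y = Ax
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def vec_mat(v, P):          # row convention: w = vP
    return [sum(vi * P[i][j] for i, vi in enumerate(v)) for j in range(len(P[0]))]

A = [[0, 1], [1, 1]]
B = [[2, 0], [0, 3]]
x = [1, 4]
# g(f(x)) = B(Ax) corresponds to the product BA, not AB.
assert mat_vec(B, mat_vec(A, x)) == mat_vec(mat_mul(B, A), x)

v = [1, 0]                  # initial state as a row vector
P = [[0.5, 0.5], [0.2, 0.8]]
Q = [[1.0, 0.0], [0.5, 0.5]]
# w = (vP)Q = v(PQ): transitions apply in temporal order, first P then Q.
assert vec_mat(vec_mat(v, P), Q) == vec_mat(v, mat_mul(P, Q))
```

The transition matrices P, Q here are arbitrary illustrative stochastic matrices, not taken from the text.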
Accordingly, general matrices A ∈ M_{m,p}(K) and B ∈ M_{q,n}(K) for a field K and m, n, p, q ∈ ℤ⁺₀ may be multiplied to give AB ∈ M_{m,n}(K) defined by
∀(i, j) ∈ ℕ_m × ℕ_n,  (AB)(i, j) = ∑_{k=1}^{min(p,q)} A(i, k)B(k, j).
This calculation gives the same result as (25.3.5). It is probably not very useful for differential geometry.
The infinite matrix embedding idea also makes possible the addition of arbitrary matrices A ∈ M_{m_1,n_1}(K) and B ∈ M_{m_2,n_2}(K) for a field K and m_1, n_1, m_2, n_2 ∈ ℤ⁺₀. A matrix sum A + B ∈ M_{m_3,n_3}(K) may be defined for m_3 = max(m_1, m_2) and n_3 = max(n_1, n_2) by
∀A ∈ M_{m_1,n_1}(K), ∀B ∈ M_{m_2,n_2}(K), ∀(i, j) ∈ ℕ_{m_3} × ℕ_{n_3},  (A + B)(i, j) = A(i, j) + B(i, j),
where 0_K is assumed for each right-hand term if it is undefined. (In other words, the infinite embedded matrices are zero filled.) As in the case of the generalised matrix product, this generalised matrix addition is also of dubious practical value.
25.4.2 Definition: The unit matrix or identity matrix in a matrix space M_{n,n}(K) for a field K and n ∈ ℤ⁺₀ is the matrix A ∈ M_{n,n}(K) defined by A(i, j) = δ_{ij} for all i, j ∈ ℕ_n.
25.4.3 Notation: I_n denotes the identity matrix in M_{n,n}(K) for n ∈ ℤ⁺₀ and a field K. In other words,
∀n ∈ ℤ⁺₀, ∀(i, j) ∈ ℕ_n × ℕ_n,  I_n(i, j) = δ_{ij}.
25.4.4 Theorem: Identity matrices act as left and right identities for rectangular matrices.
Let A ∈ M_{m,n}(K) be a matrix over a field K with m, n ∈ ℤ⁺₀. Then I_m A = A I_n = A.
25.4.5 Definition: A left inverse of a matrix A ∈ M_{m,n}(K) for a field K and m, n ∈ ℤ⁺₀ is a matrix B ∈ M_{n,m}(K) such that BA = I_n.
A right inverse of a matrix A ∈ M_{m,n}(K) for a field K and m, n ∈ ℤ⁺₀ is a matrix B ∈ M_{n,m}(K) such that AB = I_m.
25.4.7 Theorem: Let A ∈ M_{m,n}(K), A′ ∈ M_{n,m}(K), B ∈ M_{n,p}(K) and B′ ∈ M_{p,n}(K) for some field K and m, n, p ∈ ℤ⁺₀.
(i) If AA′ = I_m and BB′ = I_n, then ABB′A′ = I_m.
(ii) If A is a left inverse for A′ and B is a left inverse for B′, then AB is a left inverse for B′A′.
(iii) If A′ is a right inverse for A and B′ is a right inverse for B, then B′A′ is a right inverse for AB.
Proof: For part (i), let AA′ = I_m and BB′ = I_n. Then ABB′A′ = A(BB′)A′ = A I_n A′ = AA′ = I_m since matrix multiplication is associative by Theorem 25.3.9.
Parts (ii) and (iii) follow from part (i) and Definition 25.4.5.
25.4.8 Theorem: Let A, B′ ∈ M_{m,n}(K) and A′, B ∈ M_{n,m}(K) for some field K and m, n ∈ ℤ⁺₀.
(i) If AA′ = I_m and BB′ = I_n, then ABB′A′ = I_m and BAA′B′ = I_n.
(ii) If A is a left inverse for A′ and B is a left inverse for B′, then AB is a left inverse for B′A′ and BA is a left inverse for A′B′.
(iii) If A′ is a right inverse for A and B′ is a right inverse for B, then B′A′ is a right inverse for AB and A′B′ is a right inverse for BA.
Proof: For part (i), let AA′ = I_m and BB′ = I_n. Then ABB′A′ = A(BB′)A′ = A I_n A′ = AA′ = I_m and BAA′B′ = B(AA′)B′ = B I_m B′ = BB′ = I_n since matrix multiplication is associative by Theorem 25.3.9.
Parts (ii) and (iii) follow from part (i) and Definition 25.4.5.
25.4.9 Remark: Constraints on matrix sizes for left and right inverses.
The matrix size parameters m, n and p in Theorem 25.4.7 cannot be freely chosen. If AA′ = I_m and BB′ = I_n, then the constraints m ≤ n and n ≤ p follow from Theorem 25.5.14. (The proof of this exploits some properties of the row rank and column rank of matrices. Therefore this topic is delayed to Section 25.5.) Similarly, the equalities AA′ = I_m and BB′ = I_n in Theorem 25.4.8 imply that m ≤ n and n ≤ m, and so m = n. However, even if m = n, it does not follow by pure algebra from AA′ = I_m that A′A = I_m. Matrices are not commutative in general, and the case of infinite matrices in Example 25.4.10 shows that AA′ = I is consistent with A′A ≠ I for an infinite identity matrix I. The peculiar properties of finite-dimensional linear spaces are required in order to show that AA′ = I_m implies A′A = I_m for a square matrix A.
25.4.10 Example: Infinite square matrices for which left inverse does not imply right inverse.
In the space M_{∞,∞} = K^{ℤ⁺×ℤ⁺} of infinite matrices with positive integer indices and values in a field K, consider the matrices A = [a_{ij}]_{i,j=1}^∞ and A′ = [a′_{ij}]_{i,j=1}^∞ with a_{ij} = δ_{i+1,j} and a′_{ij} = δ_{i,j+1} for all i, j ∈ ℤ⁺. These matrices may be visualised as follows.

      [ 0 1 0 0 ... ]        [ 0 0 0 0 ... ]
      [ 0 0 1 0 ... ]        [ 1 0 0 0 ... ]
A  =  [ 0 0 0 1 ... ]   A′ = [ 0 1 0 0 ... ]
      [ 0 0 0 0 ... ]        [ 0 0 1 0 ... ]
      [ ⋮ ⋮ ⋮ ⋮ ⋱   ]        [ ⋮ ⋮ ⋮ ⋮ ⋱   ]

Then ∑_{k=1}^∞ a_{ik} a′_{kj} = ∑_{k=1}^∞ δ_{i+1,k} δ_{k,j+1} = δ_{i+1,j+1} = δ_{ij} for all i, j ∈ ℤ⁺. So AA′ = I. But ∑_{k=1}^∞ a′_{ik} a_{kj} = ∑_{k=1}^∞ δ_{i,k+1} δ_{k+1,j} = ∑_{k=2}^∞ δ_{ik} δ_{kj} = δ_{ij}(1 − δ_{i1}) for all i, j ∈ ℤ⁺. So A′A ≠ I because the entry ((1, 1), 1) in I is missing in A′A.
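As an informal check (not part of the book), the two shift matrices of Example 25.4.10 can be probed entrywise in Python, since each entry of a product is a finitely supported sum. The names a, a_prime and prod_entry are illustrative assumptions, with K taken to be the integers.

```python
# Illustrative sketch (not from the book): the infinite shift matrices of
# Example 25.4.10, represented entrywise.

def a(i, j):        # a_ij = delta_{i+1,j}: ones on the superdiagonal
    return 1 if j == i + 1 else 0

def a_prime(i, j):  # a'_ij = delta_{i,j+1}: ones on the subdiagonal
    return 1 if i == j + 1 else 0

def prod_entry(f, g, i, j, terms=50):
    # (FG)(i,j) = sum_k f(i,k) g(k,j); only finitely many terms are nonzero.
    return sum(f(i, k) * g(k, j) for k in range(1, terms + 1))

# AA' = I: every diagonal entry is 1.
assert all(prod_entry(a, a_prime, i, i) == 1 for i in range(1, 10))
# A'A != I: the entry at (1,1) is 0 instead of 1, as computed in the example.
assert prod_entry(a_prime, a, 1, 1) == 0
assert all(prod_entry(a_prime, a, i, i) == 1 for i in range(2, 10))
```

The truncation parameter terms is harmless here because each sum has at most one nonzero term for the small indices probed.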
25.4.11 Definition: The transpose of a matrix A ∈ M_{m,n}(K) for a field K and m, n ∈ ℤ⁺₀ is the matrix B ∈ M_{n,m}(K) defined by B(j, i) = A(i, j) for all i ∈ ℕ_m and j ∈ ℕ_n.
25.4.12 Notation: Aᵀ denotes the transpose of A ∈ M_{m,n}(K) for a field K and m, n ∈ ℤ⁺₀.
25.4.13 Remark: An alternative notation for the transpose of a matrix.
The notation ᵗA for the transpose of a matrix A is also popular. However, a mixture of prefix and postfix superscripts (or subscripts) can be confusing. (For example, it might not be clear which order of computation is implied by expressions such as ᵗA² and ᵗA⁻¹. The fact that they may give the same answer does not remove the ambiguity as to which order of operations is signified.) Therefore only postfix superscripts and subscripts
are used in this book.
25.4.15 Theorem: For any field K, m, n, p ∈ ℤ⁺₀, A ∈ M_{m,p}(K), B ∈ M_{p,n}(K), (AB)ᵀ = BᵀAᵀ.
Proof: Suppose m, n, p ∈ ℤ⁺₀, A ∈ M_{m,p}(K) and B ∈ M_{p,n}(K) for some field K. Then for all i ∈ ℕ_n and j ∈ ℕ_m, (AB)ᵀ_{ij} = (AB)_{ji} = ∑_{k=1}^p a_{jk} b_{ki} = ∑_{k=1}^p b_{ki} a_{jk} = ∑_{k=1}^p Bᵀ_{ik} Aᵀ_{kj} = (BᵀAᵀ)_{ij} as claimed.
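The identity (AB)ᵀ = BᵀAᵀ of Theorem 25.4.15 may be checked numerically. The following Python sketch is illustrative only; the helper names are assumptions.

```python
# Illustrative sketch (not from the book): a numeric check of (AB)^T = B^T A^T.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [4, 5, 6]]        # 2x3
B = [[7, 8], [9, 10], [11, 12]]   # 3x2
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))
```

Note the reversal of factors on the right-hand side: the transposes are conformable only in the order BᵀAᵀ.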
25.5.2 Definition: The row (matrix) span of a matrix A ∈ M_{m,n}(K), for some field K and m, n ∈ ℤ⁺₀, is the span of the set of m row matrices of A in the linear space M_{1,n}(K).
The column (matrix) span of a matrix A ∈ M_{m,n}(K), for some field K and m, n ∈ ℤ⁺₀, is the span of the set of n column matrices of A in the linear space M_{m,1}(K).
25.5.3 Notation: Rowspan(A), for a matrix A ∈ M_{m,n}(K) for some field K and m, n ∈ ℤ⁺₀, denotes the row matrix span of A. In other words, Rowspan(A) = {∑_{i=1}^m λ_i Row_i(A); (λ_i)_{i=1}^m ∈ K^m}.
Colspan(A), for a matrix A ∈ M_{m,n}(K) for some field K and m, n ∈ ℤ⁺₀, denotes the column matrix span of A. In other words, Colspan(A) = {∑_{j=1}^n λ_j Col_j(A); (λ_j)_{j=1}^n ∈ K^n}.
25.5.4 Remark: Row and column spans are subspaces of row and column matrix spaces.
It is fairly obvious that Rowspan(A) is a subspace of the linear space of row matrices M1,n (K), and
Colspan(A) is a subspace of the linear space of column matrices M_{m,1}(K), where A ∈ M_{m,n}(K) for some field K and m, n ∈ ℤ⁺₀. This is shown in Theorem 25.5.5 because it's better to be safe than sorry!
25.5.6 Theorem: Let A ∈ M_{m,n}(K) and B ∈ M_{n,m}(K) for some field K and m, n ∈ ℤ⁺.
(i) ∀i ∈ ℕ_m, Row_i(AB) ∈ Rowspan(B).
(ii) ∀j ∈ ℕ_m, Col_j(AB) ∈ Colspan(A).
(iii) Rowspan(AB) is a linear subspace of Rowspan(B).
(iv) Colspan(AB) is a linear subspace of Colspan(A).
Proof: Part (i) follows from Theorem 25.3.13 line (25.3.3) and Definition 25.5.2.
Part (ii) follows from Theorem 25.3.13 line (25.3.4) and Definition 25.5.2.
For part (iii), note that Rowspan(AB) = {∑_{i=1}^m λ_i Row_i(AB); (λ_i)_{i=1}^m ∈ K^m} as in Notation 25.5.3. So Rowspan(AB) ⊆ Rowspan(B) by part (i) because Rowspan(B) is a linear space. But Rowspan(AB) is closed under linear combinations. Hence Rowspan(AB) is a linear subspace of Rowspan(B).
For part (iv), note that Colspan(AB) = {∑_{j=1}^m λ_j Col_j(AB); (λ_j)_{j=1}^m ∈ K^m} as in Notation 25.5.3. So Colspan(AB) ⊆ Colspan(A) by part (ii) because Colspan(A) is a linear space. But Colspan(AB) is closed under linear combinations. Hence Colspan(AB) is a linear subspace of Colspan(A).
25.5.8 Definition: The row rank of a matrix A ∈ M_{m,n}(K), for a field K and m, n ∈ ℤ⁺₀, is the dimension of the row span of A.
The column rank of a matrix A ∈ M_{m,n}(K), for a field K and m, n ∈ ℤ⁺₀, is the dimension of the column span of A.
25.5.9 Notation:
Rowrank(A), for a matrix A, denotes the row rank of A. In other words, Rowrank(A) = dim(Rowspan(A)).
Colrank(A), for a matrix A, denotes the column rank of A. In other words, Colrank(A) = dim(Colspan(A)).
25.5.10 Remark: Some constraints for row rank and column rank of products of matrices.
It follows from Theorem 25.3.5 that dim(Rowspan(A)) ≤ n and dim(Colspan(A)) ≤ m. This yields the
inequalities for the row rank and column rank of a product of two matrices in Theorem 25.5.11.
25.5.11 Theorem: Let A ∈ M_{m,n}(K) and B ∈ M_{n,m}(K) for some field K and m, n ∈ ℤ⁺.
(i) Rowrank(AB) ≤ Rowrank(B).
(ii) Colrank(AB) ≤ Colrank(A).
Proof: Part (i) follows from Theorems 25.5.6 (iii) and 22.5.14 (ii).
Part (ii) follows from Theorems 25.5.6 (iv) and 22.5.14 (ii).
25.5.12 Theorem: Let K be a field and n ∈ ℤ⁺₀. Let I_n be the identity matrix in M_{n,n}(K).
(i) Rowspan(I_n) = M_{1,n}(K).
(ii) Colspan(I_n) = M_{n,1}(K).
(iii) Rowrank(I_n) = Colrank(I_n) = n.
25.5.13 Theorem: Let A ∈ M_{m,n}(K) for a field K and m, n ∈ ℤ⁺₀.
(i) Rowrank(A) ≤ min(m, n).
(ii) Colrank(A) ≤ min(m, n).
Proof: For part (i), Rowrank(A) = Rowrank(A I_n) ≤ Rowrank(I_n) = n by Theorem 25.5.11 (i) and Theorem 25.5.12 (iii). But Rowspan(A) is spanned by the m row vectors of A by Definition 25.5.2. So Rowrank(A) = dim(Rowspan(A)) ≤ m by Definition 22.5.2. Hence Rowrank(A) ≤ min(m, n).
For part (ii), Colrank(A) = Colrank(I_m A) ≤ Colrank(I_m) = m by Theorems 25.5.11 (ii) and 25.5.12 (iii). But Colspan(A) is spanned by the n column vectors of A by Definition 25.5.2. Therefore Colrank(A) = dim(Colspan(A)) ≤ n by Definition 22.5.2. Hence Colrank(A) ≤ min(m, n).
25.5.14 Theorem: For a field K and m, n ∈ ℤ⁺, let A ∈ M_{m,n}(K) and B ∈ M_{n,m}(K).
(i) If A is a left inverse of B, then m ≤ n.
(ii) If A is a right inverse of B, then m ≥ n.
(iii) If A is both a left inverse and right inverse of B, then m = n.
Proof: For part (i), let A be a left inverse of B. Then AB = I_m. So m = Rowrank(I_m) = Rowrank(AB) ≤ Rowrank(B) ≤ n by Theorems 25.5.12 (iii), 25.5.11 (i) and 25.5.13 (i).
For part (ii), let A be a right inverse of B. Then BA = I_n. So n = Colrank(I_n) = Colrank(BA) ≤ Colrank(B) ≤ m by Theorems 25.5.12 (iii), 25.5.11 (ii) and 25.5.13 (ii).
Part (iii) follows from parts (i) and (ii).
25.5.15 Remark: Two-sided inverse matrices.
As mentioned in Remark 25.5.1, and proved in Theorem 25.5.14 (iii), a two-sided inverse matrix must be
square. The idea that non-square matrices have even single-sided inverses is somewhat illusory. There are infinitely many identity matrices, which is impossible in any group. Of course, the non-square matrices don't even constitute a multiplicative semigroup. So terms like "left inverse" and "right inverse" do not have the usual kind of meaning.
Since only square matrices have two-sided inverses, and such inverses are themselves necessarily square
matrices, two-sided inverse matrices are presented in Section 25.8.
25.5.16 Theorem: Let A ∈ M_{m,n}(K) for some field K and m, n ∈ ℤ⁺.
(i) Rowrank(A) = n if and only if Rowspan(A) = M1,n (K).
(ii) Colrank(A) = m if and only if Colspan(A) = Mm,1 (K).
Proof: Let A ∈ M_{m,n}(K). By Theorem 25.5.6 (iii), Rowspan(A I_n) is a linear subspace of Rowspan(I_n), which equals M_{1,n}(K) by Theorem 25.5.12 (i). Therefore Rowspan(A) = Rowspan(A I_n) is a linear subspace of M_{1,n}(K), which has dimension n by Theorem 25.3.5. Hence if Rowrank(A) = n, then Rowspan(A) = M_{1,n}(K) by Theorem 22.5.16 (ii). The converse follows from Theorem 25.3.5. This verifies part (i).
For part (ii), Theorem 25.5.6 (iv) implies that Colspan(I_m A) is a linear subspace of Colspan(I_m), which equals M_{m,1}(K) by Theorem 25.5.12 (ii). Therefore Colspan(A) = Colspan(I_m A) is a linear subspace of M_{m,1}(K), which has dimension m by Theorem 25.3.5. Hence if Colrank(A) = m, then Colspan(A) = M_{m,1}(K) by Theorem 22.5.16 (ii). The converse follows from Theorem 25.3.5.
25.5.17 Remark: Row and column spans of left and right inverse matrices.
Theorem 25.5.18 shows that a necessary condition for A to be a left inverse of B, or equivalently that B is a
right inverse of A, is that the rows of B must span the whole row space M1,m (K) and the columns of A must
span the whole column space M_{m,1}(K). An additional necessary condition is given by Theorem 25.5.14 (i), namely that m ≤ n. Example 25.5.19 shows that these conditions are not sufficient to guarantee that A is
a left inverse of B.
25.5.18 Theorem: For a field K and m, n ∈ ℤ⁺, let A ∈ M_{m,n}(K) and B ∈ M_{n,m}(K) with AB = I_m.
(i) Rowrank(B) = m and Rowspan(B) = M_{1,m}(K).
(ii) Colrank(A) = m and Colspan(A) = M_{m,1}(K).
Proof: For part (i), m = Rowrank(I_m) = Rowrank(AB) ≤ Rowrank(B) by Theorems 25.5.12 (iii) and 25.5.11 (i). But Rowrank(B) ≤ m by Theorem 25.5.13 (i). So Rowrank(B) = m. Hence Rowspan(B) =
M1,m (K) by Theorem 25.5.16 (i).
For part (ii), m = Colrank(I_m) = Colrank(AB) ≤ Colrank(A) by Theorems 25.5.12 (iii) and 25.5.11 (ii). But Colrank(A) ≤ m by Theorem 25.5.13 (ii). So Colrank(A) = m. Hence Colspan(A) = M_{m,1}(K) by
Theorem 25.5.16 (ii).
25.5.21 Example: For a field K and m = 1 and n = 2, define A ∈ M_{m,n}(K) and B ∈ M_{n,m}(K) by A = [ 1_K  0_K ] and B = [ 1_K  0_K ]ᵀ. Then AB = [ 1_K ] = I_m. Note also that A and B satisfy m ≤ n and Rowrank(B) = 1 = m and Colrank(A) = 1 = m. However,

BA = [ 1_K  0_K ]  ≠ I_n.
     [ 0_K  0_K ]
25.5.22 Remark: Conditions for row rank and column rank of a matrix.
Theorem 25.5.23 shows the possibly surprising equality of the row rank and column rank of any matrix.
The row rank of a matrix A ∈ M_{m,n}(K) equals the dimension of the range of the linear map w ↦ wA for w ∈ M_{1,m}(K). The column rank of A equals the dimension of the range of the linear map v ↦ Av for v ∈ M_{n,1}(K). (Equivalently, the row rank of A equals the column rank of Aᵀ. See Definition 25.4.11 for the transpose of a matrix.) It is not immediately, intuitively
It is perhaps of some technical interest that Theorem 25.5.23 makes non-trivial use of matrices with zero
rows or zero columns, thereby justifying their inclusion in Definition 25.2.2.
Proof: For part (i), let A ∈ M_{m,n}(K). Suppose that Rowrank(A) ≤ p for some p ∈ ℤ⁺₀. Rowspan(A) is a linear subspace of M_{1,n}(K) by Theorem 25.5.5 (i). So there is a sequence of vectors (c_k)_{k=1}^p ∈ (M_{1,n}(K))^p such that Rowspan(A) ⊆ span({c_k; k ∈ ℕ_p}) by Definition 22.5.2. Thus for each i ∈ ℕ_m, there is a sequence (λ_{i,k})_{k=1}^p ∈ K^p such that Row_i(A) = ∑_{k=1}^p λ_{i,k} c_k. Define the matrix B ∈ M_{m,p}(K) by B(i, k) = λ_{i,k} for all (i, k) ∈ ℕ_m × ℕ_p. Define the matrix C ∈ M_{p,n}(K) by C(k, j) = c_k(1, j) for all (k, j) ∈ ℕ_p × ℕ_n. Then A(i, j) = ∑_{k=1}^p B(i, k)C(k, j) for all (i, j) ∈ ℕ_m × ℕ_n. In other words, A = BC.
Conversely, suppose that A = BC for some B ∈ M_{m,p}(K) and C ∈ M_{p,n}(K), for some p ∈ ℤ⁺₀. Then Rowrank(A) = Rowrank(BC) ≤ Rowrank(C) ≤ p by Theorems 25.5.11 (i) and 25.5.13 (i).
For part (ii), let A ∈ M_{m,n}(K). Suppose that Colrank(A) ≤ p for some p ∈ ℤ⁺₀. Colspan(A) is a linear subspace of M_{m,1}(K) by Theorem 25.5.5 (ii). So there is a sequence of vectors (b_k)_{k=1}^p ∈ (M_{m,1}(K))^p such that Colspan(A) ⊆ span({b_k; k ∈ ℕ_p}) by Definition 22.5.2. Thus for each j ∈ ℕ_n, there is a sequence (λ_{j,k})_{k=1}^p ∈ K^p such that Col_j(A) = ∑_{k=1}^p λ_{j,k} b_k. Define the matrix C ∈ M_{p,n}(K) by C(k, j) = λ_{j,k} for all (k, j) ∈ ℕ_p × ℕ_n. Define the matrix B ∈ M_{m,p}(K) by B(i, k) = b_k(i, 1) for all (i, k) ∈ ℕ_m × ℕ_p. Then A(i, j) = ∑_{k=1}^p B(i, k)C(k, j) for all (i, j) ∈ ℕ_m × ℕ_n. In other words, A = BC.
Conversely, suppose that A = BC for some B ∈ M_{m,p}(K) and C ∈ M_{p,n}(K), for some p ∈ ℤ⁺₀. Then Colrank(A) = Colrank(BC) ≤ Colrank(B) ≤ p by Theorems 25.5.11 (ii) and 25.5.13 (ii).
Part (iii) follows from parts (i) and (ii).
Part (iv) follows from parts (i), (ii) and (iii).
Part (v) follows from parts (i), (ii) and (iii).
Part (vi) follows from parts (i), (ii) and (iii), and Theorems 25.5.11 (i, ii) and 25.5.13 (i, ii).
25.5.24 Remark: Conditions for existence of left and right inverses of matrices.
Theorem 25.5.25 shows the importance of the row and column rank concepts. Existence of inverses is an
important property. It turns out that this existence is directly related to row and column rank.
25.5.25 Theorem: Let A ∈ M_{m,n}(K) for a field K and m, n ∈ ℤ⁺.
(i) A has a left inverse if and only if Rowrank(A) = n.
(ii) A has a right inverse if and only if Colrank(A) = m.
(iii) A has both a left and right inverse if and only if Rowrank(A) = m = n.
(iv) A has both a left and right inverse if and only if Colrank(A) = m = n.
Proof: For part (i), suppose that A has a left inverse. Then Rowrank(A) = n by Theorem 25.5.18 (i).
Now suppose that Rowrank(A) = n. By Definitions 25.5.2 and 25.5.8, dim(span({Row_k(A); k ∈ ℕ_m})) = n. So span({Row_k(A); k ∈ ℕ_m}) = M_{1,n}(K) by Theorem 25.5.16 (i). Thus for each i ∈ ℕ_n, there exists a sequence (λ_{i,k})_{k=1}^m ∈ K^m such that ∑_{k=1}^m λ_{i,k} Row_k(A) = e_i, where e_i ∈ M_{1,n}(K) is defined by e_i(1, j) = δ_{ij} for all j ∈ ℕ_n. Define B ∈ M_{n,m}(K) by B(i, k) = λ_{i,k} for all (i, k) ∈ ℕ_n × ℕ_m. Then
∀(i, j) ∈ ℕ_n × ℕ_n,
(BA)(i, j) = ∑_{k=1}^m B(i, k)A(k, j)
           = ∑_{k=1}^m λ_{i,k} A(k, j)
           = ∑_{k=1}^m λ_{i,k} (Row_k(A)(1, j))
           = ∑_{k=1}^m (λ_{i,k} Row_k(A))(1, j)
           = e_i(1, j) = δ_{ij}.
the column null space.
By considering matrices A ∈ M_{m,n}(K) with m ≠ n, it becomes clear that the row and column null spaces
are not spaces of row matrices or column matrices of the matrix A. This is in contrast to the row span and
column span of a matrix, which can be naturally identified as spaces of row and column matrices respectively.
The vector 0 for the row null space and row nullity in Definitions 25.6.2 and 25.6.3 signifies the null matrix in
M1,n (K). The vector 0 for the corresponding column null space and column nullity signifies the null matrix
in Mm,1 (K).
Theorem 25.6.5 asserts that the row nullity and column nullity have a simple relation to the corresponding
row rank and column rank.
25.6.2 Definition: The row null space of a matrix A ∈ M_{m,n}(K), for a field K and m, n ∈ ℤ⁺₀, is the linear subspace {w ∈ M_{1,m}(K); wA = 0} of M_{1,m}(K).
The column null space of a matrix A ∈ M_{m,n}(K), for a field K and m, n ∈ ℤ⁺₀, is the linear subspace {v ∈ M_{n,1}(K); Av = 0} of M_{n,1}(K).
25.6.3 Definition:
The row nullity of a matrix A ∈ M_{m,n}(K), for a field K and m, n ∈ ℤ⁺₀, is the dimension of the row null space of A. In other words, the row nullity equals dim({w ∈ M_{1,m}(K); wA = 0}).
The column nullity of a matrix A ∈ M_{m,n}(K), for a field K and m, n ∈ ℤ⁺₀, is the dimension of the column null space of A. In other words, the column nullity equals dim({v ∈ M_{n,1}(K); Av = 0}).
Proof: Parts (i) and (ii) follow from Theorem 23.1.28 and Definition 25.6.3.
25.6.5 Theorem: Let A ∈ M_{m,n}(K) for a field K and m, n ∈ ℤ⁺₀.
(i) The row nullity of A equals m − Rowrank(A).
(ii) The column nullity of A equals n − Colrank(A).
Proof: Parts (i) and (ii) follow from Theorem 24.2.18, the rank-nullity theorem for linear maps.
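As an informal numerical aside (not from the book), the row rank may be computed by Gaussian elimination over the rationals, and the rank and nullity relations of Theorem 25.6.5 checked on a small example. The function names and the sample matrix are illustrative assumptions.

```python
# Illustrative sketch (not from the book): rank by Gaussian elimination over
# the rationals, checking row nullity = m - Rowrank(A) and column nullity
# = n - Colrank(A) for a sample 2x3 matrix.
from fractions import Fraction

def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:   # eliminate column c elsewhere
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2, 0], [2, 4, 0]]                # 2x3, rank 1
m, n = 2, 3
rowrank = rank(A)
colrank = rank(transpose(A))              # column rank = row rank of A^T
assert rowrank == colrank == 1            # row rank equals column rank
assert m - rowrank == 1                   # row nullity: e.g. w = [2, -1], wA = 0
assert n - colrank == 2                   # column nullity
```

Exact rational arithmetic (fractions.Fraction) is used so that pivot tests are not spoiled by floating-point rounding.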
Linear maps are often specified in terms of component matrices. The principal drawback of component
matrices is the fact that they depend on the choice of basis for both the source and target space. Conversion
between matrices with respect to different bases is tedious and error-prone.
The decision to work with bases and matrices or directly with linear spaces and linear maps is a trade-off between their advantages and disadvantages in each application context. Differential geometers who
proclaim the virtues of coordinate-free methods are advocating the avoidance of bases and matrices.
The coordinates they refer to are the components of vectors and matrices (and tensors) with respect to
particular choices of bases. Abstract theorems can mostly be written in a coordinate-free style, but practical
calculations mostly require the use of vector component tuples and matrices with respect to bases.
Matrices are also useful for representing bilinear maps with respect to a linear space basis. This is discussed
in Example 27.3.9.
25.7.3 Definition: The (component) matrix of a linear map φ : V → W with respect to bases (e_j)_{j=1}^n ∈ V^n and (f_i)_{i=1}^m ∈ W^m for linear spaces V and W over a field K is the matrix Φ ∈ M_{m,n}(K) such that φ(e_j) = ∑_{i=1}^m f_i Φ_{ij} for all j ∈ ℕ_n.
25.7.4 Remark: Well-definition of the component matrix of a given linear map.
The matrix Φ in Definition 25.7.3 is well defined because, by the definition of a basis, each vector φ(e_j) has a unique m-tuple (Φ_{ij})_{i=1}^m ∈ K^m of components with respect to the basis (f_i)_{i=1}^m. In terms of the component map in Definition 22.8.8, Φ_{ij} = κ_W(φ(e_j))_i, where κ_W : W → K^m is the component map with respect to the basis (f_i)_{i=1}^m.
25.7.5 Definition: The linear map for a (component) matrix Φ ∈ M_{m,n}(K) with respect to bases (e_j)_{j=1}^n ∈ V^n and (f_i)_{i=1}^m ∈ W^m for linear spaces V and W over a field K is the linear map φ given by φ : v ↦ ∑_{i=1}^m ∑_{j=1}^n f_i Φ_{ij} κ_V(v)_j for v ∈ V, where κ_V : V → K^n is the component map for the basis (e_j)_{j=1}^n.
25.7.6 Remark: Well-definition of the linear map for a given component matrix.
Definition 25.7.5 specifies a unique linear map φ ∈ Lin(V, W) for each matrix Φ ∈ M_{m,n}(K) for given bases for V and W. Conversely, Definition 25.7.3 specifies a unique matrix Φ ∈ M_{m,n}(K) for each linear map φ ∈ Lin(V, W) for the same bases. This establishes a bijection between the linear maps and component
matrices. Thus one may freely choose whether to work coordinate-free, in terms of linear maps, or using
coordinates, in terms of matrices.
25.7.7 Remark: Matrices multiply basis vectors on the right, vector components on the left.
Whereas the basis vectors are transformed by multiplying the sequence of basis vectors on the right by the matrix Φ in Definition 25.7.3, component tuples are multiplied on the left by Φ.
Let v = ∑_{j=1}^n v_j e_j be a vector in V with components (v_j)_{j=1}^n ∈ K^n. (Here v denotes both the vector and its component tuple.) Let φ : V → W be the linear map for the component matrix Φ ∈ M_{m,n}(K). Then by Definition 25.7.5, φ(v) = ∑_{i=1}^m ∑_{j=1}^n f_i Φ_{ij} v_j. So the component tuple for φ(v) with respect to the basis (f_i)_{i=1}^m ∈ W^m is (w_i)_{i=1}^m, where w_i = ∑_{j=1}^n Φ_{ij} v_j for i ∈ ℕ_m. This is multiplication of the component tuple of v on the left by the matrix Φ. This is illustrated in Figure 25.7.1.
                        matrix multiplication
K^n ∋ (v_j)_{j=1}^n  ⟶  (w_i)_{i=1}^m ∈ K^m,    w_i = ∑_{j=1}^n Φ_{ij} v_j
  ↑ component map κ_V : V → K^n          ↑ component map κ_W : W → K^m
V with basis (e_j)_{j=1}^n  ⟶  W with basis (f_i)_{i=1}^m
                        linear map φ : V → W
Figure 25.7.1 Matrix multiplication corresponding to a linear map
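The commuting square of Figure 25.7.1 may be sketched computationally. The following Python fragment uses hypothetical data (V = W = R², the standard basis of V, a chosen basis of W and a chosen component matrix, here called Phi); it merely illustrates that taking components of the image of v agrees with multiplying the component tuple of v on the left by the matrix.

```python
# Illustrative sketch with hypothetical data (not from the book): the matrix
# Phi acts on component tuples exactly as the linear map phi acts on vectors.

Phi = [[1, 2], [3, 4]]                 # assumed component matrix, 0-based
f = [(1, 1), (0, 1)]                   # assumed basis f_1, f_2 of W = R^2

def phi(v):
    # phi(v) = sum_i sum_j f_i Phi_ij v_j, with (e_j) the standard basis of V.
    x = y = 0
    for i in range(2):
        for j in range(2):
            x += f[i][0] * Phi[i][j] * v[j]
            y += f[i][1] * Phi[i][j] * v[j]
    return (x, y)

def components_wrt_f(w):
    # Solve a*f_1 + b*f_2 = w for this particular f: a = w_1, b = w_2 - w_1.
    a = w[0]
    return (a, w[1] - a)

v = (5, 7)                             # components w.r.t. the standard basis
w = components_wrt_f(phi(v))
expected = tuple(sum(Phi[i][j] * v[j] for j in range(2)) for i in range(2))
assert w == expected                   # w_i = sum_j Phi_ij v_j
```

The helper components_wrt_f is specific to the chosen basis f and is an assumption of this sketch, not a general component map.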
25.7.8 Remark: Component matrices for linear maps between infinite-dimensional linear spaces.
Component matrices are defined for general linear spaces, which may be infinite-dimensional, in Section 23.2.
If the spaces have a topology, questions of convergence place constraints on the definitions and properties of
such matrices. If there is no topology, infinite sums of vectors must be avoided, which also places constraints
on definitions and properties. In the non-topological case, even changing basis is quite problematic, as
discussed in Section 22.9. Apart from some very basic definitions and properties, both topological linear
spaces and non-topological infinite-dimensional linear spaces are not presented here. They are unnecessary
for the differentiable manifold definitions which are presented in Part IV of this book.
25.8.2 Theorem: Left and right inverses of square matrices are equal and unique.
If B_1 A = B_2 A = AC_1 = AC_2 = I_n for A, B_1, B_2, C_1, C_2 ∈ M_{n,n}(K), for a field K and n ∈ ℤ⁺, then
B_1 = B_2 = C_1 = C_2. In other words, if A has at least one left inverse and at least one right inverse, then
these two inverses are both unique and they are equal to each other.
25.8.4 Theorem: Every left inverse of a square matrix is a right inverse, and vice versa.
Let AB = I_n for some A, B ∈ M_{n,n}(K), for some field K and n ∈ ℤ⁺₀. Then BA = I_n.
which one wishes to undo. But this works perfectly well for matrix left and right inverses.
A very beneficial consequence of the commutativity of matrices with their inverses is the fact that one may
define a unital morphism from the integers to the ring of square matrices so that a single-parameter group
of matrices is associated with each invertible matrix.
25.8.7 Definition: An invertible matrix or nonsingular matrix is a matrix $A \in M_{n,n}(K)$, for some field $K$ and $n \in \mathbb{Z}_0^+$, such that $AB = BA = I_n$ for some $B \in M_{n,n}(K)$.
A non-invertible matrix or singular matrix is a square matrix which is not invertible.
integer powers of matrices as in Definition 25.8.15.
25.8.15 Definition: The powers of a matrix $A \in M_{n,n}(K)$, for a field $K$ and $n \in \mathbb{Z}_0^+$, are defined inductively by
(i) $A^0 = I_n$.
(ii) $\forall p \in \mathbb{Z}_0^+,\ A^{p+1} = A^p A$.
(iii) If $A \in M_{n,n}^{\mathrm{inv}}(K)$, then $\forall p \in \mathbb{Z}_0^-,\ A^{p-1} = A^p A^{-1}$.
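As a quick computational check, the inductive clauses above can be run directly. The sketch below is plain Python (the helper names `mat_mul` and `mat_pow` are mine, not notation from this book); it computes $A^p$ for any integer $p$, stepping downward with a supplied inverse as in clause (iii).

```python
def mat_mul(A, B):
    """Multiply two n-by-n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, p, A_inv=None):
    """A^p by the inductive definition: A^0 = I, A^(p+1) = A^p A,
    and for negative p, A^(p-1) = A^p A^(-1) (requires the inverse)."""
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # A^0 = I_n
    step = A if p >= 0 else A_inv
    for _ in range(abs(p)):
        R = mat_mul(R, step)
    return R

A = [[2, 1], [1, 1]]             # det A = 1, so the inverse is integral
A_inv = [[1, -1], [-1, 2]]
print(mat_pow(A, 2))             # [[5, 3], [3, 2]]
print(mat_mul(mat_pow(A, 2), mat_pow(A, -2, A_inv)))  # identity I_2
```

The product of $A^2$ and $A^{-2}$ returning the identity illustrates the commutativity of a matrix with its inverse powers mentioned above.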
25.8.16 Theorem: Let $\phi : V \to W$ be a linear map between finite-dimensional linear spaces $V$ and $W$ with $\dim(V) = \dim(W)$.
(i) $\phi$ is a linear space isomorphism if and only if $\ker(\phi) = \{0_V\}$.
(ii) $\phi$ is a linear space isomorphism if and only if $\mathrm{Range}(\phi) = W$.
Proof: For part (i), let $n = \dim(V) = \dim(W)$. Then $V$ and $W$ have bases $(e_i)_{i=1}^n$ and $(e'_i)_{i=1}^n$ respectively by Theorem 22.7.11. Let $A \in M_{n,n}(K)$ be the matrix for $\phi$ according to Definition 25.7.3. By Theorem 25.8.12 (iii), $A$ is an invertible matrix if and only if $\forall b \in M_{n,1}(K),\ (Ab = 0 \Rightarrow b = 0)$. Therefore $A$ is an invertible matrix if and only if $\ker(\phi) = \{0_V\}$ because $\phi(v) = \phi(\sum_{j=1}^n b_j e_j) = \sum_{j=1}^n b_j \phi(e_j) = \sum_{i=1}^n \big(\sum_{j=1}^n A_{ij} b_j\big) e'_i$, where $b = (b_j)_{j=1}^n \in M_{n,1}(K)$ is the component column of $v \in V$, and so the condition that $\phi(v) = 0$ implies $v = 0$ for all $v \in V$ is equivalent to the condition that $Ab = 0$ implies $b = 0$ for all $b \in M_{n,1}(K)$. But $\phi$ is an invertible linear transformation if and only if $A$ is an invertible matrix. Hence $\phi$ is a linear space isomorphism if and only if $\ker(\phi) = \{0_V\}$. (Alternatively, the assertion follows from Theorem 24.2.16 (iii).)
Part (ii) follows from Theorem 24.2.16 (iii).
25.9.2 Definition: The trace of a matrix $A = (a_{ij})_{i,j=1}^n \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ is the sum $\sum_{i=1}^n a_{ii}$.
25.9.3 Notation: $\mathrm{Tr}(A)$ denotes the trace of a matrix $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$.
Another kind of interpretation for the trace of a matrix pertains to the application of the matrix in the
context of bilinear maps. In this case, the matrix is a set of components for an object which transforms like
a second-degree covariant tensor. The trace of such a matrix is a special case of the concept of a contraction
of a tensor. This kind of interpretation does require an inner product, and the trace is in this case invariant
under the orthogonal group of transformations.
More generally, a function $f$ of an endomorphism component matrix $A$ is invariant if $f(B^{-1} A B) = f(A)$ for all $B$ in some specified matrix group, whereas if $A$ is the coefficient matrix for a bilinear form, it is invariant if $f(B^T A B) = f(A)$ for all $B$ in some specified matrix group. Thus in the linear endomorphism case, invariance of the trace is obtained if $\mathrm{Tr}(B^{-1} A B) = \mathrm{Tr}(A)$ for all $B$ in some group. This means that
$$\sum_{i=1}^n a_{ii} = \sum_{i,j,k=1}^n b'_{ij} a_{jk} b_{ki} = \sum_{j,k=1}^n a_{jk} \sum_{i=1}^n b_{ki} b'_{ij} = \sum_{j,k=1}^n a_{jk} \delta_{kj} = \sum_{j=1}^n a_{jj},$$
which holds for all $B \in GL(n)$, where $B^{-1} = [b'_{ij}]_{i,j=1}^n$. So $GL(n)$ is the invariance group for the endomorphism application. If $A$ corresponds to a bilinear form, one wants $\sum_{i=1}^n a_{ii} = \sum_{i,j,k=1}^n b_{ji} a_{jk} b_{ki} = \sum_{j,k=1}^n a_{jk} \sum_{i=1}^n b_{ki} b_{ji}$, which holds if $B$ is orthogonal because then $\sum_{i=1}^n b_{ki} b_{ji} = \delta_{kj}$. Thus $O(n)$ is the invariance group for the bilinear form application.
The discrepancy between the invariance groups for the trace of a linear map and the trace of a bilinear
form is a clue that these are different kinds of traces, although arithmetically they seem identical when the
different kinds of objects are parametrised by matrices.
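The two invariance calculations above can be spot-checked numerically. In this plain-Python sketch (the matrices are my own illustrative choices, not from the book), $\mathrm{Tr}(B^{-1}AB)$ agrees with $\mathrm{Tr}(A)$ for an arbitrary invertible $B$, while $\mathrm{Tr}(B^T A B)$ agrees with $\mathrm{Tr}(A)$ only when $B$ is orthogonal.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

A = [[3, 4], [5, 6]]                 # Tr(A) = 9
B = [[2, 1], [1, 1]]                 # invertible, det = 1
B_inv = [[1, -1], [-1, 2]]
R = [[0, -1], [1, 0]]                # orthogonal (rotation by 90 degrees)

print(trace(mat_mul(B_inv, mat_mul(A, B))))        # similarity: 9, same as Tr(A)
print(trace(mat_mul(transpose(B), mat_mul(A, B)))) # congruence by B: 54, not 9
print(trace(mat_mul(transpose(R), mat_mul(A, R)))) # congruence by orthogonal R: 9
```

This matches the discussion: the trace of a linear-map matrix is a $GL(n)$ invariant, while the trace of a bilinear-form matrix is invariant only under $O(n)$.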
25.10.3 Definition: The determinant of a matrix $A = (a_{ij})_{i,j=1}^n \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and field $K$ is the element of $K$ given by $\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P(i)}$.
25.10.4 Notation: $\det(A)$ denotes the determinant of a matrix $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. That is,
$$\det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P(i)}.$$
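Definition 25.10.3 can be executed literally for small $n$. The sketch below (plain Python; the helper names are mine) enumerates $\mathrm{perm}(\mathbb{N}_n)$, computes the parity of each permutation by counting inversions, and sums the signed products.

```python
from itertools import permutations

def parity(perm):
    """Parity (+1 or -1) of a permutation given as a tuple of 0-based
    images, computed by counting inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant by the permutation expansion of Definition 25.10.3."""
    n = len(A)
    total = 0
    for P in permutations(range(n)):
        prod = parity(P)
        for i in range(n):
            prod *= A[i][P[i]]
        total += prod
    return total

print(det([[1, 2], [3, 4]]))     # 1*4 - 2*3 = -2
```

The $n!$ terms make this expansion impractical for large $n$, but it is a faithful rendering of the definition and is exact over any exact arithmetic.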
25.10.5 Remark: The special case of determinants for a field with characteristic 2.
If the field $K$ in Definition 25.10.3 is $\mathbb{Z}_2 = \{0, 1\}$, the parity values $1$ and $-1$ are the same, and the matrix $A$ has only zeros and ones as elements. (See Definition 18.7.10 for the characteristic of a field.) Since the determinant can only be zero or one, it is interesting to ask what kinds of matrices have the non-zero determinant value. The diagonal matrix with $a_{ij} = \delta_{ij}$ for $i, j \in \mathbb{N}_n$ certainly has $\det(A) = 1$. It is also clear that $\det(A) = 1$ for any lower triangular matrix with $a_{ij} = 1$ for $i = j$ and $a_{ij} = 0$ for $i < j$. (This is true for any field.)
25.10.6 Theorem: Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Then $\det(A^T) = \det(A)$.
Proof: For any $n \in \mathbb{Z}_0^+$, the set $\{P^{-1};\ P \in \mathrm{perm}(\mathbb{N}_n)\}$ is equal to $\mathrm{perm}(\mathbb{N}_n)$ because $P \in \mathrm{perm}(\mathbb{N}_n)$ if and only if $P^{-1} \in \mathrm{perm}(\mathbb{N}_n)$. So, since $\mathrm{parity}(P^{-1}) = \mathrm{parity}(P)$,
$$\det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P^{-1}(i)}.$$
But $\prod_{i=1}^n a_{i,P^{-1}(i)} = \prod_{i=1}^n a_{P(i),P(P^{-1}(i))} = \prod_{i=1}^n a_{P(i),i}$ for any $P \in \mathrm{perm}(\mathbb{N}_n)$ because a permutation of the factors in a product in a field does not affect the value of the product. (According to Definition 18.7.3, the product operation of a field is commutative.) Therefore
$$\det(A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{P(i),i},$$
which equals $\det(A^T)$ by Definition 25.10.3.
25.10.7 Theorem: Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Then $\det(\lambda A) = \lambda^n \det(A)$ for $\lambda \in K$.
Proof: Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Then
$$\det(\lambda A) = \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n \lambda a_{i,P(i)} = \lambda^n \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{i,P(i)} = \lambda^n \det(A).$$
25.10.9 Theorem: Let $A \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Let $Q : \mathbb{N}_n \to \mathbb{N}_n$. Then
$$\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)} = \mathrm{parity}(Q) \det(A). \qquad (25.10.1)$$
Proof: Suppose first that $Q$ is not a permutation of $\mathbb{N}_n$. (This implies, incidentally, that $n \ge 2$.) Then $Q(k) = Q(\ell)$ for some $k, \ell \in \mathbb{N}_n$ with $k \ne \ell$. Let $S \in \mathrm{perm}(\mathbb{N}_n)$ be the permutation which swaps $k$ and $\ell$. (That is, $S(k) = \ell$ and $S(\ell) = k$; otherwise $S(i) = i$.) Then $Q \circ S = Q \circ S^{-1} = Q$ and $\mathrm{parity}(S) = -1$. Substitute $P = T \circ S$ in the left-hand side of (25.10.1) for permutations $T \in \mathrm{perm}(\mathbb{N}_n)$. Then
$$\begin{aligned}
\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)}
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T \circ S) \prod_{i=1}^n a_{Q(i),T(S(i))} \\
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T)\,\mathrm{parity}(S) \prod_{i=1}^n a_{Q(S^{-1}(i)),T(i)} \\
&= -\sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T) \prod_{i=1}^n a_{Q(i),T(i)} \\
&= 0,
\end{aligned}$$
because $x = -x$ for $x \in K$ implies that $x = 0$. Since $\mathrm{parity}(Q) = 0$ for a non-permutation, the theorem is verified in this case.
If $Q$ is a permutation of $\mathbb{N}_n$, substitution of $P = T \circ Q$ in (25.10.1) yields
$$\begin{aligned}
\sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(P) \prod_{i=1}^n a_{Q(i),P(i)}
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T \circ Q) \prod_{i=1}^n a_{Q(i),T(Q(i))} \\
&= \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T)\,\mathrm{parity}(Q) \prod_{i=1}^n a_{Q(Q^{-1}(i)),T(i)} \\
&= \mathrm{parity}(Q) \sum_{T \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{parity}(T) \prod_{i=1}^n a_{i,T(i)} \\
&= \mathrm{parity}(Q) \det(A),
\end{aligned}$$
which is as claimed.
25.10.11 Theorem: Let $A \in M_{m,n}(K)$ be a matrix over a field $K$ with $m, n \in \mathbb{Z}_0^+$. Then
$$\prod_{i=1}^m \sum_{j=1}^n a_{i,j} = \sum_{J \in \mathbb{N}_n^{\mathbb{N}_m}} \prod_{i=1}^m a_{i,J(i)}. \qquad (25.10.2)$$
$$\begin{aligned}
&= \sum_{J \in \mathbb{N}_n^{\mathbb{N}_m}} \sum_{j=1}^n \prod_{i=1}^{m+1} a_{i,(J \cup \{(m+1,j)\})(i)} \\
&= \sum_{J \in \mathbb{N}_n^{\mathbb{N}_{m+1}}} \prod_{i=1}^{m+1} a_{i,J(i)},
\end{aligned}$$
which verifies line (25.10.2) for case $m + 1$ for all $n \in \mathbb{Z}_0^+$. Hence by induction, line (25.10.2) is valid for all $m, n \in \mathbb{Z}_0^+$.
25.10.12 Theorem: The determinant of the product equals the product of the determinants.
Let $A, B \in M_{n,n}(K)$ for $n \in \mathbb{Z}_0^+$ and a field $K$. Then $\det(AB) = \det(A)\det(B)$.
$$\begin{aligned}
&= \sum_{Q \in \mathbb{N}_n^{\mathbb{N}_n}} \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{sign}(P) \prod_{i=1}^n a_{i,Q(i)} b_{Q(i),P(i)} \\
&= \sum_{Q \in \mathbb{N}_n^{\mathbb{N}_n}} \prod_{i=1}^n a_{i,Q(i)} \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{sign}(P) \prod_{j=1}^n b_{Q(j),P(j)} \\
&= \sum_{Q \in \mathbb{N}_n^{\mathbb{N}_n}} \prod_{i=1}^n a_{i,Q(i)}\,\mathrm{sign}(Q) \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{sign}(P) \prod_{j=1}^n b_{j,P(j)} \qquad (25.10.4) \\
&= \sum_{Q \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{sign}(Q) \prod_{i=1}^n a_{i,Q(i)} \sum_{P \in \mathrm{perm}(\mathbb{N}_n)} \mathrm{sign}(P) \prod_{j=1}^n b_{j,P(j)} \\
&= \det(A)\det(B).
\end{aligned}$$
Line (25.10.3) follows from Theorem 25.10.11. Line (25.10.4) follows from Theorem 25.10.9.
25.10.13 Theorem: Let $A \in M_{n,n}^{\mathrm{inv}}(K)$, for some field $K$ and $n \in \mathbb{Z}^+$. Then $\det(A^{-1}) = \det(A)^{-1}$.
Proof: The assertion follows from Theorem 25.10.12 and the definition of an inverse matrix.
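Theorems 25.10.12 and 25.10.13 are easy to confirm on small examples with exact arithmetic. The sketch below is plain Python using `fractions.Fraction` (the helper names and example matrices are mine); the determinant is computed by the permutation expansion of Definition 25.10.3.

```python
from fractions import Fraction
from itertools import permutations

def det(A):
    """Permutation-expansion determinant (Definition 25.10.3)."""
    n, total = len(A), 0
    for P in permutations(range(n)):
        # sign = (-1)^(number of inversions of P)
        sign = (-1) ** sum(P[i] > P[j] for i in range(n) for j in range(i + 1, n))
        prod = sign
        for i in range(n):
            prod *= A[i][P[i]]
        total += prod
    return total

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
B = [[Fraction(0), Fraction(3)], [Fraction(2), Fraction(5)]]
A_inv = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(2)]]  # inverse of A

print(det(mat_mul(A, B)) == det(A) * det(B))   # det(AB) = det(A) det(B): True
print(det(A_inv) == 1 / det(A))                # det(A^-1) = det(A)^-1: True
```

Exact rational arithmetic avoids the floating-point rounding that would otherwise make these equalities only approximate.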
sequence of $n$ vectors $(e_i)_{i=1}^n$ in a linear space $V$ with dimension $n$, the image of that region under the linear map $\phi : x \mapsto Ax$ has volume $\mathrm{vol}(\phi(S)) = \det(A)\,\mathrm{vol}(S)$.
The interpretation of the determinant in the application to bilinear forms is not so simple. Like the trace, the
determinant of a bilinear form is not invariant under all linear transformations. In fact, the special linear
transformations are defined to be those linear transformations which preserve the determinant of bilinear
forms. Since the determinant of a symmetric real matrix is equal to the product of its eigenvalues, the
determinant of a positive definite real symmetric matrix may be interpreted as the inverse volume multiplier
for level sets of the bilinear form. Thus if $\Omega = \{x \in V;\ x^T x < 1\}$ and $\Omega' = \{x \in V;\ x^T A x < 1\}$, then $\mathrm{vol}(\Omega') = \det(A)^{-1}\,\mathrm{vol}(\Omega)$ if $\det(A) \ne 0$ and $A$ is a positive definite real symmetric matrix.
25.11.2 Definition: The upper modulus for real square matrices is the function $\mu^+ : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ defined for $n \in \mathbb{Z}_0^+$ by
$$\forall A \in M_{n,n}(\mathbb{R}),\quad \mu^+(A) = \sup\Big\{ \sum_{i,j=1}^n a_{ij} v_i v_j;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \Big\}.$$
The lower modulus for real square matrices is the function $\mu^- : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ defined for $n \in \mathbb{Z}_0^+$ by
$$\forall A \in M_{n,n}(\mathbb{R}),\quad \mu^-(A) = \inf\Big\{ \sum_{i,j=1}^n a_{ij} v_i v_j;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \Big\}.$$
functions are useful for putting upper and lower limits on the coefficients of elliptic second-order partial
derivatives in partial differential equations.
25.11.4 Remark: The apparently useless upper and lower modulus for zero-dimensional matrices.
In the case $n = 0$ in Definition 25.11.2, $\mu^+(A) = -\infty$ and $\mu^-(A) = +\infty$ for all $A \in M_{n,n}(\mathbb{R})$. Therefore these functions are of dubious value for $n = 0$.
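For $n = 2$ the supremum and infimum in Definition 25.11.2 can be approximated by scanning the unit circle. The sketch below (plain Python; a grid scan giving estimates, not an exact computation, and the helper names are mine) uses a diagonal matrix, whose exact upper and lower modulus are its largest and smallest diagonal entries.

```python
import math

def quad_form(A, v):
    """Sum of a_ij * v_i * v_j over all i, j."""
    n = len(A)
    return sum(A[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

def modulus_estimates(A, samples=4000):
    """Approximate the upper and lower modulus of a 2x2 real matrix by
    sampling unit vectors v = (cos t, sin t) on a uniform grid."""
    values = []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        values.append(quad_form(A, (math.cos(t), math.sin(t))))
    return max(values), min(values)

A = [[2, 0], [0, 5]]                 # quadratic form 2 v1^2 + 5 v2^2
hi, lo = modulus_estimates(A)
print(round(hi, 3), round(lo, 3))    # close to 5 and 2
```

Since the grid contains the maximising and minimising directions $(0, 1)$ and $(1, 0)$, the estimates here are essentially exact; for a general matrix the grid scan only brackets the true values from inside.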
25.11.5 Theorem:
(i) $\forall n \in \mathbb{Z}_0^+,\ \forall A \in M_{n,n}(\mathbb{R}),\ \mu^+(\frac{1}{2}(A + A^T)) = \mu^+(A)$.
(ii) $\forall n \in \mathbb{Z}_0^+,\ \forall A \in M_{n,n}(\mathbb{R}),\ \mu^-(\frac{1}{2}(A + A^T)) = \mu^-(A)$.
Proof: Parts (i) and (ii) follow from $\sum_{i,j=1}^n \big(\frac{1}{2}(a_{ij} + a_{ji})\big) v_i v_j = \sum_{i,j=1}^n a_{ij} v_i v_j$, which follows from commutativity of real number multiplication. Part (iii) follows from $\sum_{i,j=1}^n \big(\frac{1}{2}(a_{ij} - a_{ji})\big) v_i v_j = 0$.
25.11.6 Remark: Applications of matrix upper and lower modulus to elliptic second-order PDEs.
In the study of second-order partial differential equations, the order properties of the coefficient matrix for the
second-order derivatives are important for the classification of operators and equations. An elliptic operator,
for example, requires a second-order coefficient matrix which is positive definite. A weakly elliptic operator
requires a positive semi-definite second-order coefficient matrix. In curved space, similar classifications are
applicable, although the second-order derivatives in this case should be covariant and the coefficient matrix
is replaced by a second-order contravariant tensor. Then the (semi-)definiteness properties are applied to the
components of this contravariant tensor with respect to a basis. Of course, the (semi-)definiteness properties
must be chart-independent in order to ensure that the properties are well defined. (See Definition 30.5.3 for
the bilinear function version of Definition 25.11.7.)
25.11.7 Definition: A real square matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ for $n \in \mathbb{Z}_0^+$ is said to be
(i) positive semi-definite if $\forall v \in \mathbb{R}^n,\ \sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$,
(ii) negative semi-definite if $\forall v \in \mathbb{R}^n,\ \sum_{i,j=1}^n a_{ij} v_i v_j \le 0$,
(iii) positive definite if $\forall v \in \mathbb{R}^n \setminus \{0\},\ \sum_{i,j=1}^n a_{ij} v_i v_j > 0$,
(iv) negative definite if $\forall v \in \mathbb{R}^n \setminus \{0\},\ \sum_{i,j=1}^n a_{ij} v_i v_j < 0$.
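Definiteness is a universally quantified condition, so sampling can refute it but never prove it; a spot check is nevertheless often useful. This plain-Python sketch (the example matrices and test vectors are mine) evaluates the quadratic form of Definition 25.11.7 on a few non-zero vectors.

```python
def quad_form(A, v):
    """Sum of a_ij * v_i * v_j over all i, j."""
    n = len(A)
    return sum(A[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

pos_def = [[2, 1], [1, 2]]       # 2 v1^2 + 2 v1 v2 + 2 v2^2 > 0 for v != 0
indefinite = [[1, 0], [0, -1]]   # takes both signs, so neither semi-definite

test_vectors = [(1, 0), (0, 1), (1, -1), (3, 2), (-1, -2)]

print(all(quad_form(pos_def, v) > 0 for v in test_vectors))     # consistent with (iii)
print(any(quad_form(indefinite, v) < 0 for v in test_vectors))  # refutes (i) for indefinite
```

A single vector with a negative quadratic form, such as $(0,1)$ for the second matrix, is enough to disprove positive semi-definiteness; no finite sample can establish it.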
25.11.8 Theorem: The relation of the upper and lower matrix modulus to eigenvalues.
Let $n \in \mathbb{Z}^+$ and $A \in M_{n,n}(\mathbb{R})$. Then
(i) $A$ is positive semi-definite if and only if $\mu^-(A) \ge 0$;
(ii) $A$ is negative semi-definite if and only if $\mu^+(A) \le 0$.
Proof: To show part (i), let $n \in \mathbb{Z}^+$ and let $A = [a_{ij}]_{i,j=1}^n$ be positive semi-definite. Then $A \in M_{n,n}(\mathbb{R})$ and $\forall v \in \mathbb{R}^n,\ \sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$. By Definition 25.11.2, $\mu^-(A) = \inf\big\{ \sum_{i,j=1}^n a_{ij} v_i v_j;\ v \in \mathbb{R}^n \text{ and } \sum_{i=1}^n v_i^2 = 1 \big\}$. But $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$ for any $v \in \mathbb{R}^n$. Therefore $\mu^-(A) \ge 0$ as claimed.
Now suppose that $A = [a_{ij}]_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ and $\mu^-(A) \ge 0$. If $v = 0 \in \mathbb{R}^n$, then $\sum_{i,j=1}^n a_{ij} v_i v_j = 0$. So $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$. If $v \ne 0$, define $w \in \mathbb{R}^n$ by $w = k^{-1/2} v$, where $k = \sum_{i=1}^n v_i^2$. Then $\sum_{i=1}^n w_i^2 = 1$. So $\sum_{i,j=1}^n a_{ij} w_i w_j \ge 0$ by the definition of $\mu^-(A)$. Therefore $\sum_{i,j=1}^n a_{ij} v_i v_j = k \sum_{i,j=1}^n a_{ij} w_i w_j \ge 0$ since $k \ge 0$. It follows that $\sum_{i,j=1}^n a_{ij} v_i v_j \ge 0$ for all $v \in \mathbb{R}^n$. Hence $A$ is positive semi-definite.
The proof of (ii) follows by suitable changes of sign.
square matrices. Notation 25.13.9 specialises the concept to real symmetric matrices.
The lower bound function for real square matrices is the function $\beta^- : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ defined for $n \in \mathbb{Z}^+$ by
$$\forall A \in M_{n,n}(\mathbb{R}),\quad \beta^-(A) = \inf\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \}. \qquad (25.12.2)$$
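For a $2 \times 2$ matrix the upper and lower bound functions can be approximated by scanning unit vectors, and for an invertible matrix the reciprocal relation between the lower bound of $A$ and the upper bound of $A^{-1}$ (proved in Theorem 25.12.3) can be checked numerically. A plain-Python sketch (grid-scan estimates; the names `upper`/`lower` are mine):

```python
import math

def bounds(A, samples=4000):
    """Estimate sup and inf of |Av| over unit vectors v in R^2
    (the upper and lower bound functions of this section)."""
    norms = []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        v = (math.cos(t), math.sin(t))
        Av = (A[0][0] * v[0] + A[0][1] * v[1],
              A[1][0] * v[0] + A[1][1] * v[1])
        norms.append(math.hypot(*Av))
    return max(norms), min(norms)

A = [[3, 0], [0, 1]]             # |Av| ranges over [1, 3] on the unit circle
A_inv = [[1 / 3, 0], [0, 1]]

upper_A, lower_A = bounds(A)
upper_inv, lower_inv = bounds(A_inv)
print(round(upper_A, 3), round(lower_A, 3))   # about 3 and 1
print(round(1 / upper_inv, 3))                # about 1, the lower bound of A
```

The last line illustrates the reciprocal relation: the lower bound of $A$ equals the reciprocal of the upper bound of $A^{-1}$.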
25.12.3 Theorem: Some basic properties of upper and lower bound functions for square matrices.
Let $A \in M_{n,n}(\mathbb{R})$ for some $n \in \mathbb{Z}^+$.
Proof: Part (i) follows directly from Definition 25.12.2 because $\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \ne \emptyset$.
For part (ii), let $n \in \mathbb{Z}^+$ and let $A \in M_{n,n}(\mathbb{R})$. Then $\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \}$ is a non-empty set of non-negative real numbers. Suppose that $v \in \mathbb{R}^n$ and $|v| = 1$. Then $|Av|^2 = \sum_{i=1}^n \big( \sum_{k=1}^n a_{ik} v_k \big)^2$. So by Theorem 19.7.9 (the Cauchy-Schwarz inequality), $|Av|^2 \le \sum_{i=1}^n \big( \sum_{j=1}^n a_{ij}^2 \big) \big( \sum_{j=1}^n v_j^2 \big) = \sum_{i,j=1}^n a_{ij}^2$. Therefore $0 \le \beta^+(A) < \infty$.
For part (iii), $\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \ne \emptyset$ and $|Av| \ge 0$ for all $v \in \mathbb{R}^n$. So $0 \le \beta^-(A) < \infty$.
Part (iv) follows immediately from Definition 25.12.2.
For part (v), let $A = [a_{ij}]_{i,j=1}^n \ne 0$. Then $a_{k\ell} \ne 0$ for some $k, \ell \in \mathbb{N}_n$. Let $v = (\delta_{j\ell})_{j=1}^n \in \mathbb{R}^n$. Then $Av = (\sum_{j=1}^n a_{ij} \delta_{j\ell})_{i=1}^n = (a_{i\ell})_{i=1}^n \ne 0$. Therefore $|Av| \ge |a_{k\ell}| > 0$. Hence $\beta^+(A) > 0$.
For part (vi), let $v \in \mathbb{R}^n$. If $v = 0$, then the inequality is an equality, $0 = 0$. Suppose that $v \ne 0$. Let $w = v/|v|$. Then $|w| = 1$. So by line (25.12.1), $|Aw| \le \beta^+(A)$. Hence $|Av| = |Aw|\,|v| \le \beta^+(A)|v|$.
For part (vii), let $v \in \mathbb{R}^n$. If $v = 0$, then the inequality is an equality, $0 = 0$. Suppose that $v \ne 0$. Let $w = v/|v|$. Then $|w| = 1$. So by line (25.12.2), $|Aw| \ge \beta^-(A)$. Hence $|Av| = |Aw|\,|v| \ge \beta^-(A)|v|$.
For part (viii), let $S^+(A) = \{ \lambda \in \mathbb{R};\ \forall v \in \mathbb{R}^n,\ |Av| \le \lambda |v| \}$. Then $\beta^+(A) \in S^+(A)$ by part (vi). Therefore $S^+(A) \ne \emptyset$. But $S^+(A) \subseteq \mathbb{R}_0^+$ because $\mathbb{R}^n$ contains at least one non-zero vector $v$, for which $|Av| \ge 0$ and $|v| > 0$. So $\lambda \ge |Av|/|v| \ge 0$ for all $\lambda \in S^+(A)$. So $0 \le \inf(S^+(A)) \le \beta^+(A)$ since $\beta^+(A) \in S^+(A)$. Now let $\lambda \in S^+(A)$. Then $|Av| \le \lambda$ for all $v \in \mathbb{R}^n$ with $|v| = 1$. So $\beta^+(A) \le \lambda$. Therefore $\beta^+(A) \le \inf(S^+(A))$. Hence $\beta^+(A) = \inf(S^+(A))$ as claimed.
For part (ix), let $S^-(A) = \{ \lambda \in \mathbb{R};\ \forall v \in \mathbb{R}^n,\ |Av| \ge \lambda |v| \}$. Then $0 \in S^-(A)$. So $S^-(A) \ne \emptyset$ and $\sup(S^-(A)) \ge 0$. Let $\lambda \in S^-(A)$. Then $|Av| \ge \lambda$ for all $v \in \mathbb{R}^n$ with $|v| = 1$. So $\beta^-(A) \ge \lambda$ by Definition 25.12.2. Therefore $\beta^-(A) \ge \sup(S^-(A))$. So $0 \le \sup(S^-(A)) \le \beta^-(A) < \infty$. But $\beta^-(A) \in S^-(A)$ by part (vii). So $\sup(S^-(A)) \ge \beta^-(A)$. Hence $\beta^-(A) = \sup(S^-(A))$ as claimed.
Parts (x) and (xi) follow directly from parts (viii) and (ix).
For part (xii), let $A$ be invertible. Let $v \in \mathbb{R}^n$. Let $w = Av$. Then $v = A^{-1} w$. So $|v| \le \beta^+(A^{-1}) |w|$ by part (vi), and $\beta^+(A^{-1}) < \infty$ by part (ii). So $|Av| = |w| \ge \beta^+(A^{-1})^{-1} |v|$. Therefore $\beta^-(A) \ge \beta^+(A^{-1})^{-1}$ by part (xi). But $\beta^+(A^{-1})^{-1} > 0$. So $\beta^-(A) > 0$. To show the converse, suppose that $\beta^-(A) > 0$. Then $Av \ne 0$ for all $v \in \mathbb{R}^n \setminus \{0\}$. Hence $A$ is invertible by Theorem 25.8.12 (iii).
For part (xiii), let $A \in M_{n,n}(\mathbb{R})$ be invertible. Let $v \in \mathbb{R}^n$. Let $w = Av$. Then $v = A^{-1} w$. So $|v| = |A^{-1} w| \le \beta^+(A^{-1}) |w|$ by part (vi). But $\beta^+(A^{-1}) > 0$ by part (v). So $|Av| = |w| \ge \beta^+(A^{-1})^{-1} |v|$. Consequently $\forall v \in \mathbb{R}^n,\ |Av| \ge \beta^+(A^{-1})^{-1} |v|$. Hence $\beta^-(A) \ge \beta^+(A^{-1})^{-1}$ by part (xi).
Now let $v \in \mathbb{R}^n$. Let $w = A^{-1} v$. Then $v = Aw$. So $|v| = |Aw| \ge \beta^-(A) |w|$ by part (vii). But $\beta^-(A) > 0$ by part (xii). So $|A^{-1} v| = |w| \le \beta^-(A)^{-1} |v|$. Consequently $\forall v \in \mathbb{R}^n,\ |A^{-1} v| \le \beta^-(A)^{-1} |v|$. Therefore $\beta^+(A^{-1}) \le \beta^-(A)^{-1}$ by part (x). So $\beta^-(A) \le \beta^+(A^{-1})^{-1}$, and hence $\beta^-(A) = \beta^+(A^{-1})^{-1}$.
By substituting $A^{-1}$ for $A$, it follows that $\beta^-(A^{-1}) = \beta^+(A)^{-1}$. Hence $\beta^+(A) = \beta^-(A^{-1})^{-1}$. The inequality $\beta^-(A) \le \beta^+(A)$ follows from part (i).
Part (xiv) follows from part (xiii) by substituting A1 for A.
25.12.4 Theorem: Let $n \in \mathbb{Z}^+$.
(i) $\forall \lambda \in \mathbb{R},\ \forall A \in M_{n,n}(\mathbb{R}),\ \beta^+(\lambda A) = |\lambda| \beta^+(A)$.
(ii) $\forall A, B \in M_{n,n}(\mathbb{R}),\ \beta^+(A + B) \le \beta^+(A) + \beta^+(B)$.
(iii) $\beta^+$ is a norm on the linear space $M_{n,n}(\mathbb{R})$.
(iv) $\forall A, B \in M_{n,n}(\mathbb{R}),\ \beta^+(AB) \le \beta^+(A) \beta^+(B)$.
(v) $\forall A, B \in M_{n,n}(\mathbb{R}),\ \beta^-(AB) \ge \beta^-(A) \beta^-(B)$.
Proof: For part (i), let $\lambda \in \mathbb{R}$ and $A \in M_{n,n}(\mathbb{R})$. Then it follows from Definition 25.12.2 that $\beta^+(\lambda A) = \sup\{ |(\lambda A)v|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} = |\lambda| \sup\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} = |\lambda| \beta^+(A)$.
For part (ii), let $A, B \in M_{n,n}(\mathbb{R})$. Then by Definition 25.12.2,
$$\begin{aligned}
\beta^+(A + B) &= \sup\{ |(A + B)v|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \\
&\le \sup\{ |Av| + |Bv|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \\
&\le \sup\{ |Av|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} + \sup\{ |Bv|;\ v \in \mathbb{R}^n \text{ and } |v| = 1 \} \\
&= \beta^+(A) + \beta^+(B).
\end{aligned}$$
Part (iii) follows from Definitions 19.5.2 and 25.3.2, Theorem 25.12.3 (iv, v), and parts (i) and (ii).
For part (iv), let $A, B \in M_{n,n}(\mathbb{R})$. Let $v \in \mathbb{R}^n$ with $|v| = 1$. Then $|(AB)v| = |A(Bv)| \le \beta^+(A)|Bv| \le \beta^+(A)\beta^+(B)|v|$ by Definition 25.12.2. Hence $\beta^+(AB) \le \beta^+(A)\beta^+(B)$ by Definition 25.12.2.
For part (v), let $A, B \in M_{n,n}(\mathbb{R})$. Let $v \in \mathbb{R}^n$ with $|v| = 1$. Then $|(AB)v| = |A(Bv)| \ge \beta^-(A)|Bv| \ge \beta^-(A)\beta^-(B)|v|$ by Definition 25.12.2. Hence $\beta^-(AB) \ge \beta^-(A)\beta^-(B)$ by Definition 25.12.2.
25.12.5 Remark: The benefits of the lower bound function for matrices.
Clearly the lower bound function is not a norm on the real square matrices, but it does have benefits. The
lower bound is closely related to the invertibility of a matrix, and invertibility is one of the key properties of
matrices which are significant for applications. By Theorem 25.12.3 (xii), a matrix $A$ is invertible if and only if its lower bound $\beta^-(A)$ is positive. Consequently lower bounds for the matrix lower bound are of some interest. This is the motivation for Theorem 25.12.6.
25.12.6 Theorem: Let $A, B \in M_{n,n}(\mathbb{R})$ for some $n \in \mathbb{Z}^+$.
(i) $\beta^-(B) \ge \beta^-(A) - \beta^+(B - A)$.
(ii) If $A$ is invertible and $\beta^+(B - A) < \beta^-(A)$, then $B$ is invertible.
(iii) $|\beta^-(B) - \beta^-(A)| \le \beta^+(B - A)$.
(iv) $\beta^-(A) - \beta^+(B - A) \le \beta^-(B) \le \beta^-(A) + \beta^+(B - A)$.
(v) If $A$ and $B$ are invertible, $\beta^+(B^{-1} - A^{-1}) \le \beta^+(A^{-1})\beta^+(B^{-1})\beta^+(B - A) = \beta^+(B - A)\beta^-(A)^{-1}\beta^-(B)^{-1}$.
Proof: For part (i), let $v \in \mathbb{R}^n$ with $|v| = 1$. Then $|Bv| = |(B - A)v + Av| \ge |Av| - |(B - A)v|$ by the triangle inequality for the norm on $\mathbb{R}^n$. (See Theorem 19.5.4 (v).) So $|Bv| \ge \beta^-(A) - \beta^+(B - A)$ by Definition 25.12.2. Hence $\beta^-(B) \ge \beta^-(A) - \beta^+(B - A)$ by Definition 25.12.2.
Part (ii) follows from part (i) and Theorem 25.12.3 (xii).
Part (iii) follows from a double application of part (i).
Part (iv) follows from part (iii).
For part (v), for $A, B$ invertible, $\beta^+(B^{-1} - A^{-1}) = \beta^+(B^{-1}(A - B)A^{-1}) \le \beta^+(A^{-1})\beta^+(B^{-1})\beta^+(B - A)$ by Theorem 25.12.4 (i, iv), and $\beta^+(A^{-1})\beta^+(B^{-1}) = \beta^-(A)^{-1}\beta^-(B)^{-1}$ by Theorem 25.12.3 (xiv).
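Part (ii) above gives a concrete invertibility test: perturbing an invertible matrix by less than its lower bound cannot destroy invertibility. A plain-Python sketch of the $2 \times 2$ diagonal case, where the upper and lower bound functions are exactly the largest and smallest absolute diagonal entries (the helper names and matrices are mine):

```python
def diag_bounds(d):
    """Upper and lower bound functions of diag(d): the sup and inf of
    |Av| over unit vectors v are max and min of |d_i|."""
    return max(abs(x) for x in d), min(abs(x) for x in d)

A = (4, 2)             # diag(4, 2): invertible, lower bound 2
E = (0.5, -0.5)        # perturbation diag(0.5, -0.5), upper bound 0.5
B = tuple(a + e for a, e in zip(A, E))   # B = diag(4.5, 1.5)

upper_E = diag_bounds(E)[0]
lower_A = diag_bounds(A)[1]
if upper_E < lower_A:
    print("B is invertible by Theorem 25.12.6 (ii)")
print(diag_bounds(B))  # (4.5, 1.5): lower bound stays at least 2 - 0.5
```

The printed bounds also illustrate part (i): the lower bound of $B$ is at least the lower bound of $A$ minus the upper bound of the perturbation.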
25.12.7 Remark: Continuity of the map from a matrix to its inverse.
From Theorem 25.12.6 (v), it can be shown that the inversion map $A \mapsto A^{-1}$ for invertible matrices is continuous. However, the formal statement of this assertion must wait until continuity on metric spaces has been defined. (See Section 24.6 for norms on linear spaces.)
25.13.2 Definition: A symmetric $n \times n$ matrix over a field $K$ for $n \in \mathbb{Z}_0^+$ is a matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(K)$ such that $\forall i, j \in \mathbb{N}_n,\ a_{ij} = a_{ji}$.
25.13.3 Definition: A real symmetric $n \times n$ matrix for $n \in \mathbb{Z}_0^+$ is a matrix $(a_{ij})_{i,j=1}^n \in M_{n,n}(\mathbb{R})$ such that $\forall i, j \in \mathbb{N}_n,\ a_{ij} = a_{ji}$.
25.13.4 Notation: $\mathrm{Sym}(n, \mathbb{R})$ denotes the set of real symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.
25.13.5 Remark: Symmetric matrices are not closed under simple matrix multiplication.
It is not generally true that $AB \in \mathrm{Sym}(n, \mathbb{R})$ if $A, B \in \mathrm{Sym}(n, \mathbb{R})$. By Theorem 25.13.6, $(AB)^T = BA$ if $A, B \in \mathrm{Sym}(n, \mathbb{R})$, which is not quite the same thing. Theorem 25.4.15 implies that $(AB)^T = B^T A^T$ for any matrices $A, B \in M_{n,n}(\mathbb{R})$. So $(AB)^T = BA = B^T A^T = (A^T B^T)^T$ for all $A, B \in \mathrm{Sym}(n, \mathbb{R})$. It is not possible to conclude the general equality of $(AB)^T$ and $AB$ from this. However, $AB + BA$ is symmetric if $A$ and $B$ are symmetric, as shown in Theorem 25.13.7.
25.13.6 Theorem: Let $n \in \mathbb{Z}_0^+$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then $(AB)^T = BA$.
Proof: Let $n \in \mathbb{Z}_0^+$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then
$$\forall i, j \in \mathbb{N}_n,\quad (AB)^T_{ij} = (AB)_{ji} = \sum_{k=1}^n a_{jk} b_{ki} = \sum_{k=1}^n a_{kj} b_{ik} = \sum_{k=1}^n b_{ik} a_{kj} = (BA)_{ij}.$$
25.13.7 Theorem: Let $n \in \mathbb{Z}_0^+$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then $AB + BA \in \mathrm{Sym}(n, \mathbb{R})$.
Proof: Let $n \in \mathbb{Z}_0^+$ and $A, B \in \mathrm{Sym}(n, \mathbb{R})$. Then by Theorem 25.13.6 and Remark 25.4.14, $(AB + BA)^T = (AB)^T + (BA)^T = BA + AB = AB + BA$.
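Theorems 25.13.6 and 25.13.7 are easy to confirm numerically. A plain-Python sketch (the helper names and example matrices are mine):

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(row) for row in zip(*A)]

A = [[1, 2], [2, 5]]             # symmetric
B = [[0, 3], [3, -1]]            # symmetric

AB = mat_mul(A, B)
print(transpose(AB) == AB)                # False: AB need not be symmetric
print(transpose(AB) == mat_mul(B, A))     # True: (AB)^T = BA
S = mat_add(AB, mat_mul(B, A))
print(transpose(S) == S)                  # True: AB + BA is symmetric
```

The first check shows that $\mathrm{Sym}(n, \mathbb{R})$ is not closed under multiplication, while the symmetrised product $AB + BA$ (the Jordan-type combination) stays symmetric.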
25.13.9 Notation:
$\mathrm{Sym}_0^+(n, \mathbb{R})$ denotes the set of real positive semi-definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.
$\mathrm{Sym}_0^-(n, \mathbb{R})$ denotes the set of real negative semi-definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.
$\mathrm{Sym}^+(n, \mathbb{R})$ denotes the set of real positive definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.
$\mathrm{Sym}^-(n, \mathbb{R})$ denotes the set of real negative definite symmetric $n \times n$ matrices for $n \in \mathbb{Z}_0^+$.
25.14. Matrix groups
25.14.2 Definition: An orthogonal matrix in a set of matrices $M_{m,n}(K)$ for $m, n \in \mathbb{Z}_0^+$ and field $K$ is a matrix $A \in M_{m,n}(K)$ which satisfies $A A^T = I_m$.
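The defining condition $A A^T = I_m$ can be tested directly, including for rectangular matrices. A plain-Python sketch (the rotation and projection examples are mine):

```python
def mat_mul(A, B):
    """Multiply an r-by-k matrix A by a k-by-c matrix B."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def is_orthogonal(A):
    """Check A A^T = I_m for an m-by-n matrix A (Definition 25.14.2)."""
    m = len(A)
    I = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
    return mat_mul(A, transpose(A)) == I

R = [[0, -1], [1, 0]]              # rotation by 90 degrees
P = [[1, 0, 0], [0, 0, 1]]         # 2x3 matrix with orthonormal rows
print(is_orthogonal(R), is_orthogonal(P))   # both satisfy A A^T = I_m
print(is_orthogonal([[1, 1], [0, 1]]))      # shear: not orthogonal
```

Note that for a rectangular $A$ the condition only forces the rows to be orthonormal; $A^T A = I_n$ would be a separate condition.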
structure
19.1.12 set GL(M ) of automorphisms of a module M over a set
23.1.12 set GL(V ) of linear space automorphisms of a linear space V
23.1.17 left transformation group GL(V ) acting on V
23.2.10 component maps for linear maps between finite-dimensional linear spaces
23.11.20 dual (contragredient) representation of a general linear group
25.14.1 table of notations and names for the classical matrix groups
32.5.15 the standard topology for the general linear group GL(F )
39.5.4 topological transformation group structure of a general linear group
48.3.16 locally Cartesian space structure for general linear group GL(F )
48.6.15 the standard atlas for the general linear group GL(F )
50.4.20 standard differentiable manifold structure for general linear group GL(F )
60.11.13 general linear Lie transformation group GL(V )
25.15.2 Definition: A multidimensional matrix or higher-degree array over a field $K$ is a function $A : \prod_{k=1}^r \mathbb{N}_{m_k} \to K$ for some $m = (m_k)_{k=1}^r \in (\mathbb{Z}_0^+)^r$.
The degree or dimension of $A$ is the number $r$.
The width sequence of $A$ is the sequence $m$.
25.15.5 Definition: A symmetric multidimensional matrix or symmetric higher-degree array over a field $K$ is a function $A : \mathbb{N}_n^r \to K$, for some $n, r \in \mathbb{Z}_0^+$, which satisfies
Chapter 26
Affine spaces
26.18 Line-to-line transformation groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
affine space over module over group affine space over module over ring
(G,M,X,G ,M ,G ,M ,X ) (R,M,X,R ,R ,M ,R ,M ,X )
affine space over module over field affine space over module over ordered ring
(K,M,X,K ,K ,M ,K ,M ,X ) (R,M,X,R ,R ,<,M ,R ,M ,X )
affine space over linear space affine space over module over ordered field
(K,V,X,K ,K ,V ,K ,V ,X ) (K,M,X,K ,K ,<,M ,K ,M ,X )
One may also think of an affine space as a point space in which every point is an origin of a linear space,
and all compatible sets of coordinates at all such origins are equally acceptable.
26.1.2 Remark: Affine spaces are a flat-space reference model for affinely connected manifolds.
Affine spaces provide a reference model for understanding parallel transport and affine connections on general
manifolds. An affine space may be thought of as a manifold with absolute (i.e. path-independent) parallelism.
In other words, affine spaces are examples of flat spaces.
26.1.3 Remark: The many meanings and contexts of the word affine.
The word affine is used in several ways. Transformation groups may be described as affine. Conversely,
spaces which are invariant under affine transformation groups may be described as affine. Then path-
dependent parallelism concepts which generalise the absolute parallelism on affine spaces may be described
as affine. Spaces which have a path-dependent parallelism may then also be described as affine.
(1) An affine transformation of a linear space is a combination of a translation and an invertible linear
transformation. This use of the word affine is closest to the original meaning which was introduced
by Euler in 1748. Euler intended the word to mean scalings along a single axis. (These are a kind of
uni-axial similarity transformation. Hence the word affine, which means related or similar.) The
modern meaning of affine transformation is an element of the group of transformations generated by
uni-axial scalings. (See Remark 24.4.1 and Section 71.2.)
(2) An affine space is a space which has both points and vectors, and a difference function which maps
pairs of points to vectors. The difference function satisfies a linear space addition rule. The point-space
transformations which conserve this addition rule are the affine transformations. Thus an affine space
is invariant under the group of affine transformations.
(3) An affine connection is a generalisation of affine space parallelism. Whereas an affine space determines
a path-independent global parallelism on its point set, an affine connection defines a path-dependent
parallelism. An affine connection is thus a path-dependent analogue of affine space parallelism. Using
an affine connection, one may define constant-velocity or self-parallel lines which are analogous to the
(affine-parametrised) straight lines in affine spaces.
(4) An affine manifold may refer to a differentiable manifold on which an affine connection is defined.
This term is apparently not used by the majority of authors. Affine manifolds are a generalisation of
affine spaces.
(5) An affine map is generally a map $f$ which is locally linear in the sense that the difference $f(x_1) - f(x_2)$ depends linearly on the difference $x_1 - x_2$. In this sense, an affine transformation in case (1) is an invertible affine map for which the domain and range are the same set. (The differences may be linear space differences or some more general kinds of difference functions. The domain and range of $f$ might not be linear spaces, but the difference-values should be in linear spaces.)
(6) An affine-parametrised curve (or more general map) is an affine map whose domain is a subset of $\mathbb{R}$ (or $\mathbb{R}^n$ for some $n \in \mathbb{Z}^+$).
Sometimes the word affine is also used as a synonym for linear. There is therefore ample opportunity
for confusion in the use of this word.
26.1.4 Remark: The invariance constraints which characterise affine spaces and other spaces.
Table 26.1.1 gives some examples of invariant relations on a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+$. Each relation corresponds to a maximal group of transformations which preserve the relation.
Each of the relations in Table 26.1.1 requires an expression calculated after applying $\phi$ to be equal to the same expression calculated before applying $\phi$. In the case of linear transformations, let $w = \lambda u + \mu v$. Then $w - \lambda u - \mu v = 0$ before the transformation, whereas $\phi(w) - \lambda \phi(u) - \mu \phi(v) = 0$ after the transformation. So the linear combination relation is invariant under the transformation.
In the affine transformation case, let $w = u + \lambda(v - u)$. Then $w - u - \lambda(v - u) = 0$ before the transformation, whereas $\phi(w) - \phi(u) - \lambda(\phi(v) - \phi(u)) = 0$ after the transformation. So the convex combination relation is invariant under the transformation. This is a weaker constraint than in the linear case.
The affine transformations may thus be defined as those which preserve convex combination relations between
points. However, history seems to have developed in the opposite direction. First the affine transformations
were defined by Euler. Much later, the invariant properties and relations of affine transformations and affine spaces were studied by Möbius.
26.1.6 Remark: The affine transformation group is the automorphism group for affine spaces.
The invariance relation for affine spaces is essentially the same as the requirement for automorphisms on the
space of right transformation groups on a set. (See Section 20.7 for right transformation groups.) The most
basic kind of affine space is the affine space over a group in Definition 26.2.2, which is essentially the same
as a right transformation group. By Definition 20.8.1, right transformation group automorphisms must map
both the group (i.e. the vector set) and the point set so that the action of the group after the automorphism
corresponds to the action before it. This implies that φ(p + gⁿ) = φ(p) + (φ(p + g) − φ(p))ⁿ for all points p,
group elements (i.e. vectors) g, and n ∈ ℤ.
In the case of an affine space over a linear space, the automorphism condition becomes φ(p + λv) = φ(p) +
λ(φ(p + v) − φ(p)) for vectors v and scalars λ. In terms of the difference function σX on the point space,
this may be written as σX(φ(p + λv), φ(p)) = λσX(φ(p + v), φ(p)), which is equivalent to the invariance of the
convex combination relation. This is equivalent to the affine space invariance relation in Table 26.1.1 in
Remark 26.1.4. The affine transformation group is thus the automorphism group for affine spaces.
According to Struik [228], page 175, the word vector was introduced into mathematics by William Rowan
Hamilton (1805–1865) in the context of quaternions. The OED [424], page 2456, gives the date 1865 as the
first recorded occurrence of the word vector in the mathematical sense of a quantity having both magnitude
and direction although it was used as early as 1796 in the sense of the straight line joining a planet to the
focus of its orbit. According to Kreyszig [22], page 9: "The concept of a vector was first used by W. Snellius
(1581–1626) and L. Euler (1707–83)."
In 1926, Levi-Civita [26], page 102, used the word "versor" to mean a vector with unit length. This word
is derived from a Latin verb meaning "to turn" in the same way that "vector" is derived from a Latin verb
meaning "to carry". However, the word "versor" does not appear in classical Latin dictionaries.
26.2.3 Remark: Affine spaces over groups are very similar to right transformation groups.
An affine space X over a group G is essentially identical to a right transformation group which acts freely
and transitively on X because the difference operation adds no new information. (See Definitions 20.7.10
and 20.7.13 for right transformation groups which act freely and transitively respectively.) The operation
σX : X × X → G in Definition 26.2.2 is uniquely determined by the action operation μG of the group G on the
set X. This is shown in Theorem 26.2.4.
26.2.4 Theorem: Let (G, X, σG, μG) be a right transformation group which acts freely and transitively on X.
(i) There exists a function σX : X × X → G such that ∀x ∈ X, ∀g ∈ G, σX(μG(x, g), x) = g.
(ii) There is at most one function σX : X × X → G such that ∀x ∈ X, ∀g ∈ G, σX(μG(x, g), x) = g.
Proof: For part (i), if X = ∅, the empty function σX = ∅ satisfies ∀x ∈ X, ∀g ∈ G, σX(μG(x, g), x) = g.
Suppose that X ≠ ∅. For y ∈ X, define μy : G → X by μy : g ↦ μG(y, g). Then μy is a bijection by
Theorem 20.7.15 (iii) because the group action is both free and transitive. So μy has an inverse function
μy⁻¹ : X → G. Define σX : X × X → G by σX : (x, y) ↦ μy⁻¹(x) for all (x, y) ∈ X × X. Then σX(μG(x, g), x) =
μx⁻¹(μG(x, g)) = μx⁻¹(μx(g)) = g for all x ∈ X and g ∈ G.
For part (ii), let σk : X × X → G satisfy ∀x ∈ X, ∀g ∈ G, σk(μG(x, g), x) = g for k = 1, 2. Let x, y ∈ X.
Then by transitivity, μG(y, g′) = x for some g′ ∈ G. So σk(x, y) = σk(μG(y, g′), y) = g′ for k = 1, 2 by
Definition 26.2.2 (i). Hence σ1(x, y) = σ2(x, y) for all x, y ∈ X. Thus σ1 = σ2.
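As an informal aside, this construction of the difference function from a free and transitive action can be illustrated by a small finite example. The names mu and sigma, and the choice G = X = ℤ/5, are assumptions of this sketch, not part of the text.

```python
# A finite sketch of the theorem above: for the right transformation group
# G = Z/5 acting freely and transitively on X = Z/5 by mu(x, g) = (x + g) mod 5,
# the difference map sigma(x, y) = (x - y) mod 5 is the unique sigma with
# sigma(mu(x, g), x) = g for all x and g.

n = 5
G = X = range(n)
mu = lambda x, g: (x + g) % n          # right action of G on X
sigma = lambda x, y: (x - y) % n       # sigma(x, y), informally "x - y"

# Existence: sigma undoes the action.
assert all(sigma(mu(x, g), x) == g for x in X for g in G)
# Uniqueness: sigma(x, y) is forced to be the unique g with mu(y, g) = x.
assert all(mu(y, sigma(x, y)) == x for x in X for y in X)
print("sigma is the difference map determined by the action")
```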
(ii) ∀x ∈ X, σX(x, x) = eG.
(I.e. ∀x ∈ X, x − x = 0G.)
(iii) ∀x1, x2, x3 ∈ X, σX(x3, x1) = σG(σX(x2, x1), σX(x3, x2)).
(I.e. ∀x1, x2, x3 ∈ X, x3 − x1 = (x2 − x1) + (x3 − x2).)
(iv) ∀x1, x2 ∈ X, σX(x2, x1) = σX(x1, x2)⁻¹.
(I.e. ∀x1, x2 ∈ X, x2 − x1 = −(x1 − x2).)
(v) ∀x ∈ X, ∀g1, g2 ∈ G, σX(μG(x, g2), μG(x, g1)) = σG(g1⁻¹, g2).
(I.e. ∀x ∈ X, ∀g1, g2 ∈ G, (x + g2) − (x + g1) = −g1 + g2.)
For g ∈ G, define μg : X → X by μg : x ↦ μG(x, g). Then:
(vi) ∀g ∈ G, μg : X → X is a bijection.
(vii) ∀g ∈ G, μg ∘ μg⁻¹ = μg⁻¹ ∘ μg = idX.
For p ∈ X, define σp : X → G by σp : x ↦ σX(x, p). Then:
(viii) ∀p ∈ X, σp : X → G is a bijection.
For part (iv), let x1, x2 ∈ X. Then σX(x1, x1) = σG(σX(x2, x1), σX(x1, x2)) by part (iii) with x3 = x1. So
σG(σX(x2, x1), σX(x1, x2)) = eG by part (ii). Similarly, σG(σX(x1, x2), σX(x2, x1)) = eG. So σX(x2, x1) = σX(x1, x2)⁻¹.
To show part (v), let x ∈ X and g1, g2 ∈ G. Let x1 = μG(x, g1), x2 = x and x3 = μG(x, g2). Then by
part (iii), σX(μG(x, g2), μG(x, g1)) = σG(σX(x, μG(x, g1)), σX(μG(x, g2), x)). But σX(x, μG(x, g1)) = σX(μG(x, g1), x)⁻¹ = g1⁻¹
by part (iv) and Definition 26.2.2 (i). Hence σX(μG(x, g2), μG(x, g1)) = σG(g1⁻¹, σX(μG(x, g2), x)) = σG(g1⁻¹, g2).
For part (vi), the surjectivity of μg : X → X for all g ∈ G follows from the assumption that G acts transitively
on X. To show injectivity, let x1, x2 ∈ X and let y1 = μG(x1, g) and y2 = μG(x2, g). Then μG(y1, g⁻¹) =
μG(μG(x1, g), g⁻¹) = μG(x1, σG(g, g⁻¹)) = μG(x1, eG) = x1 by Definition 20.7.2. Similarly, μG(y2, g⁻¹) = x2. So if
y1 = y2, then x1 = x2. Hence μg is injective.
Part (vii) is the same as part (i).
For part (viii), let p ∈ X. Define μp : G → X by μp : g ↦ μG(p, g). Let x ∈ X. By the transitivity of G on X,
there is a g ∈ G with μG(p, g) = x. Then σp(x) = σp(μG(p, g)) = σX(μG(p, g), p) = g. So μp(σp(x)) = μp(g) =
μG(p, g) = x. Therefore μp ∘ σp = idX. Let g ∈ G. Then σp(μp(g)) = σp(μG(p, g)) = σX(μG(p, g), p) = g. Therefore
σp ∘ μp = idG. Hence σp = μp⁻¹. So σp : X → G is a bijection.
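As an aside, the identities in this theorem can be checked exhaustively for a small nonabelian example: the group S3 acting on itself by right multiplication, with difference σX(x, y) = y⁻¹x. The names op, inv, mu and sigma below are assumptions of this sketch.

```python
# A finite check of identities (ii)-(v) above for the basic affine space in
# which the nonabelian group G = S3 acts on X = G by right multiplication,
# with difference sigma(x, y) = inverse(y) * x.
from itertools import permutations

G = X = list(permutations(range(3)))                  # S3 as tuples
op = lambda g, h: tuple(g[h[i]] for i in range(3))    # composition g after h
inv = lambda g: tuple(sorted(range(3), key=lambda i: g[i]))
e = (0, 1, 2)
mu = lambda x, g: op(x, g)                            # right action: "x + g"
sigma = lambda x, y: op(inv(y), x)                    # "x - y"

assert all(sigma(x, x) == e for x in X)                                   # (ii)
assert all(sigma(x3, x1) == op(sigma(x2, x1), sigma(x3, x2))              # (iii)
           for x1 in X for x2 in X for x3 in X)
assert all(sigma(x2, x1) == inv(sigma(x1, x2)) for x1 in X for x2 in X)   # (iv)
assert all(sigma(mu(x, g2), mu(x, g1)) == op(inv(g1), g2)                 # (v)
           for x in X for g1 in G for g2 in G)
print("identities (ii)-(v) hold")
```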
26.2.7 Remark: Local difference maps in an affine space resemble manifold charts.
The bijections σp in Theorem 26.2.6 may be thought of as manifold charts which satisfy a linearity constraint.
26.8 tangent velocity spaces on affine spaces over modules
26.12 tangent-line bundles on Cartesian spaces
26.14 tangent velocity bundles on Cartesian spaces
26.15 tangent covector bundles on Cartesian spaces
52.3.32 tangent vector bundles on differentiable manifolds
52.9 unidirectional tangent bundles on differentiable manifolds
Note that such a tangent bundle is not a species of fibre bundle, although it may be considered to be a
non-topological tangent fibration in the sense of Section 21.1. If suitable structure group and fibre atlas
structures are added, it could be considered to be a non-topological fibre bundle in the sense of Section 21.5.
The range of a line Lp,g ∈ Tp(X) for p ∈ X and g ∈ G is equal to the set {μG(p, gⁿ); n ∈ ℤ}. This is the image
of H = {gⁿ; n ∈ ℤ} under the bijection μp : G → X defined by μp : g ↦ μG(p, g). So the range of Lp,g has
cardinality #(H), which is the order of the cyclic subgroup H of G which is generated by g. It follows that
the structure and relations of these primitive tangent spaces are essentially the same as the structure and
relations of cyclic subgroups and their cosets in the group G.
26.3.3 Definition: A line on an affine space over a group (G, X, σG, μG, σX) is a map f : ℤ → X which is
defined inductively for some p ∈ X and g ∈ G by
(i) f(0) = p,
(ii) ∀n ∈ ℤ₀⁺, f(n + 1) = μG(f(n), g),
(iii) ∀n ∈ ℤ₀⁻, f(n − 1) = μG(f(n), g⁻¹).
The point p is called the base point of the line.
The group element g is called the velocity of the line.
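As a computational aside, the inductive definition, and the observation above that the range of a line is a coset of a cyclic subgroup, can be illustrated for the affine space in which G = ℤ/12 acts on X = ℤ/12 by addition. All names and numerical choices below are assumptions of this sketch.

```python
# Illustrative sketch of the inductive line Lp,g on X = Z/12 with
# mu(x, g) = (x + g) mod 12: its range is the coset p + <g> of the cyclic
# subgroup generated by g.

n = 12
mu = lambda x, g: (x + g) % n

def line(p, g):
    """Return the map f : Z -> X with f(0) = p, f(k+1) = mu(f(k), g)."""
    def f(k):
        x = p
        step = g if k >= 0 else (-g) % n        # use g inverse for negative k
        for _ in range(abs(k)):
            x = mu(x, step)
        return x
    return f

f = line(p=5, g=8)
rng = {f(k) for k in range(-n, n)}
coset = {(5 + 8 * m) % n for m in range(n)}     # p + <g>
print(sorted(rng) == sorted(coset), len(rng))   # -> True 3
```

The subgroup generated by 8 in ℤ/12 has order 3, so the line visits only 3 of the 12 points, in agreement with the coset description.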
26.3.4 Notation: Lp,g, for p ∈ X and g ∈ G, for an affine space over a group (G, X) < (G, X, σG, μG, σX),
denotes the line on (G, X) with base point p and velocity g.
26.3.5 Definition: The tangent space on an affine space over a group (G, X, σG, μG, σX) at base point p ∈ X
is the set of lines {Lp,g; g ∈ G}.
26.3.6 Notation: Tp(X), for p ∈ X, for an affine space over a group (G, X) < (G, X, σG, μG, σX), denotes the
tangent space on (G, X) at base point p. In other words, Tp(X) = {Lp,g; g ∈ G}.
26.3.7 Definition: The tangent bundle on an affine space over a group (G, X, σG, μG, σX) is the set of lines
{Lp,g; p ∈ X, g ∈ G}.
26.3.8 Notation: T(X), for an affine space over a group (G, X) < (G, X, σG, μG, σX), denotes the tangent
bundle on (G, X). In other words, T(X) = {Lp,g; p ∈ X, g ∈ G} = ⋃p∈X Tp(X).
26.3.9 Remark: Attempting to make lines on affine spaces over groups coordinate-free.
Definition 26.3.3 may be regarded as coordinate-bound in some sense. Those who yearn for coordinate-free
definitions might prefer that a particular point p ∈ X is not specified. One way to remove special
points from the definition is to specify only that f : ℤ → X should satisfy part (ii). The rule
∀n ∈ ℤ, f(n + 1) = μG(f(n), g) guarantees that the function will be a line. Then one may recover the base point
p by examining f(0).
One may also avoid specifying the velocity g by changing the rule to ∀n ∈ ℤ, f(n + 1) = μG(f(n), σX(f(1), f(0))).
In other words, ∀n ∈ ℤ, σX(f(n + 1), f(n)) = σX(f(1), f(0)). One can be even more coordinate-free by
specifying the rule ∀m, n ∈ ℤ, σX(f(m + 1), f(m)) = σX(f(n + 1), f(n)). This signifies that the velocity is
constant. This is reminiscent of the analytical definition of straight lines in Euclidean spaces and affine
manifolds (i.e. manifolds which have an affine connection). One may remove one of the dummy variables by
specifying ∀n ∈ ℤ, σX(f(n + 1), f(n)) = σX(f(n), f(n − 1)), which is reminiscent of the second-order definition
of lines and geodesics as those curves whose curvature equals zero.
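As a small numerical aside, the constant-velocity and second-order (zero-curvature) forms of the rule can be verified for a concrete line on ℤ/12. The names sigma and f, and the numerical choices, are assumptions of this sketch.

```python
# Sketch of the "constant velocity" rules above: for the line f(k) = p + kg
# on X = Z/12 with difference sigma(x, y) = (x - y) mod 12, the first
# difference sigma(f(n+1), f(n)) is independent of n, and equals
# sigma(f(n), f(n-1)) (the second-order, zero-curvature form).

n = 12
sigma = lambda x, y: (x - y) % n
f = lambda k: (5 + 8 * k) % n                  # the line with p = 5, g = 8

assert all(sigma(f(k + 1), f(k)) == sigma(f(1), f(0)) for k in range(-6, 6))
assert all(sigma(f(k + 1), f(k)) == sigma(f(k), f(k - 1)) for k in range(-6, 6))
print("velocity is constant:", sigma(f(1), f(0)))   # -> velocity is constant: 8
```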
the requirements of Definition 21.4.5 for a non-topological fibre bundle for fibre space G. Consequently,
the tangent bundle meets the requirements of Definition 21.5.2 for a non-topological (G, G) fibre bundle if
one adds a fibre atlas AGE = {φ} which consists of the single map φ : T(X) → G, where φ : f ↦ σX(f(1), f(0))
for f ∈ T(X). (The action of G on X must be a left action, not the right action in Definition 26.2.2, but one
may easily define the left action to equal the right action. Then condition (i) of Definition 26.2.2 yields the
corresponding associativity for the left action.) One could even go further by observing that this tangent
bundle is a principal fibre bundle with this atlas. (See Definition 21.6.3 for non-topological principal fibre
bundles.)
26.3.12 Definition: A half-line on an affine space over a group (G, X, σG, μG, σX) is a map f : ℤ₀⁺ → X which
is defined inductively for some p ∈ X and g ∈ G by
(i) f(0) = p,
(ii) ∀n ∈ ℤ₀⁺, f(n + 1) = μG(f(n), g).
The point p is called the base point of the half-line.
The group element g is called the velocity of the half-line.
26.3.13 Notation: L+p,g, for p ∈ X and g ∈ G, for an affine space over a group (G, X) < (G, X, σG, μG, σX),
denotes the half-line on (G, X) with base point p and velocity g.
26.3.14 Definition: The unidirectional tangent space on an affine space over a group (G, X, σG, μG, σX) at a
base point p ∈ X is the set of half-lines {L+p,g; g ∈ G}.
26.3.15 Notation: Tp+(X), for p ∈ X, for an affine space over a group (G, X) < (G, X, σG, μG, σX), denotes
the unidirectional tangent space on (G, X) at base point p. In other words, Tp+(X) = {L+p,g; g ∈ G}.
26.3.16 Definition: The unidirectional tangent bundle on an affine space over a group (G, X, σG, μG, σX) is
the set of half-lines {L+p,g; p ∈ X, g ∈ G}.
26.3.17 Notation: T+(X), for an affine space over a group (G, X) < (G, X, σG, μG, σX), denotes the unidirectional
tangent bundle on (G, X). In other words, T+(X) = {L+p,g; p ∈ X, g ∈ G} = ⋃p∈X Tp+(X).
26.4.2 Definition: An affine space over a module (without operator domain) is an affine space over a
group (M, X) < (M, X, σM, μM, σX) such that M < (M, σM) is a commutative group written additively (i.e. a
module without operator domain).
[Figure omitted: the sets A, G, R, M, X and the operations linking them for an affine space over a group (or module), over a module over a set, over a module over a group, and over a module over a ring.]
Figure 26.4.1 Sets and operations for affine spaces over modules
26.4.3 Remark: Affine spaces over commutative groups have zero curvature.
The commutativity condition which is added to affine spaces over groups by Definition 26.4.2 causes the
tangent spaces at points of such affine spaces to have a simpler algebraic structure. Tangent vectors at a
point may be added without regard to order. This may be thought of as zero curvature in some sense.
Definition 26.3.3 and Notation 26.3.4 for lines on an affine space over a group may be adopted immediately
for affine spaces over modules. Since modules are commutative groups, the additive notation may be used.
26.4.4 Remark: The special case of a module with empty operator domain.
As mentioned in Remark 19.1.6, a module without operator domain is the special case of a module over
an unstructured set which happens to be empty. Therefore Definition 26.4.5 is superfluous because it is a
special case of Definition 26.4.10.
(ii) ∀k ∈ ℤ₀⁺, f(k + 1) = f(k) + m,
(iii) ∀k ∈ ℤ₀⁻, f(k − 1) = f(k) − m.
26.4.6 Notation: Lp,m, for p ∈ X and m ∈ M, for an affine space (M, X) < (M, X, σM, μM, σX) over a
module (M, σM), denotes the line on (M, X) with base point p and velocity m.
26.4.7 Remark: Unital morphisms mapping integers to points in an affine space over a module.
The unital morphism in Definition 18.2.11 may be applied to Definition 26.4.5 to write Lp,m(k) = p + km
for k ∈ ℤ, where km is defined inductively so that 0ℤm = 0M and (k + 1)m = km + m for all k ∈ ℤ.
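As an illustrative aside, the inductive integer multiple km, and the resulting line Lp,m(k) = p + km, can be sketched for the module M = ℤ² (a hypothetical choice; the names int_mul and L are assumptions of this sketch).

```python
# Sketch of the remark above: integer multiples km defined inductively by
# 0m = 0_M and (k+1)m = km + m, here for the module M = Z^2, giving the
# line Lp,m(k) = p + km.

def int_mul(k, m):
    """km via repeated addition (and negation of m for k < 0)."""
    total = (0, 0)                               # 0_M
    mm = m if k >= 0 else (-m[0], -m[1])
    for _ in range(abs(k)):
        total = (total[0] + mm[0], total[1] + mm[1])
    return total

p, m = (1, 1), (2, -3)
L = lambda k: (p[0] + int_mul(k, m)[0], p[1] + int_mul(k, m)[1])
print(L(3))    # -> (7, -8)
```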
26.4.8 Remark: Lines are expected to use scalar multiples when the module is over a field.
In principle, Definition 26.4.5 is applicable to affine spaces over all kinds of modules, including linear spaces,
which are the same thing as unitary left modules over fields (and fields are the same thing as commutative
division rings). However, simply taking integer multiples of the displacement m from the point p is not what
we generally mean by a line in an affine space over a linear space. We expect the line in a space with a
scalar multiplication operation to contain every scalar multiple of the velocity vector.
26.4.9 Remark: Modification of affine spaces over modules to modules with non-empty operator domain.
A left module M < (A, M, σM, μA) over a set A is a commutative group (M, σM) with a left action map μA :
A × M → M (notated multiplicatively) which is distributive over the group operation. (See Definition 19.1.7
for left modules over sets.) Thus ∀a ∈ A, ∀x, y ∈ M, μA(a, σM(x, y)) = σM(μA(a, x), μA(a, y)). In other
words, ∀a ∈ A, ∀x, y ∈ M, a(x + y) = ax + ay. This requires a modest extension to Definition 26.2.2 for an
affine space over a group.
Affine spaces over modules are defined in terms of left modules. There is no particular significance in this.
It does not affect the algebra. The module for an affine space is in practice typically a linear space, which is
almost always defined as a left module. Therefore it makes sense to standardise the module as a left module.
26.4.11 Remark: An affine space over a module over a set has 3 spaces and 2 action maps.
The slightly confusing operations in Definition 26.4.10 may be summarised as follows.
(i) μA : A × M → M. Top-level set acting on the middle-level set.
(ii) σM : M × M → M. Addition operation within the middle-level set.
(iii) μM : X × M → X. Middle-level set acting on the bottom-level set.
(iv) σX : X × X → M. Bottom-level operation with middle-level result.
A novel aspect of affine spaces over modules over sets is the fact that there are three sets and two action
operations linking them. The unstructured set A is at the top level, acting on M . The commutative group M
(in the middle) acts on X. The operation σM keeps elements of M at the same level. The operation σX,
unusually, raises elements of X from the bottom level to the middle level.
26.4.12 Remark: Possibly confusing combination of a left module with a right transformation group.
It is noteworthy that Definition 26.4.10 combines a left module with a right transformation group. This is
the standard way of doing things, although it is perhaps not the tidiest way of doing things. This convention
is seen in affine spaces over linear spaces in expressions such as p + λv for the displacement of a point p by
a vector λv, where λ is a scalar multiplier for the displacement v. There is no confusion because the left
module action is written multiplicatively in such expressions, and the displacement is written additively. So
the operator binding rules ensure that the actions take place in the correct order. Thus v is acted on by
the left module action of λ, and then λv acts on p by a right transformation action.
26.5.2 Remark: Scalar operations can join the dots of lines on affine spaces.
The lines in Definition 26.3.3 for affine spaces over groups are defined in the absence of a scalar multiplication
operation on the group. Therefore only the most primitive form of line can be defined, by simply iterating
the group operation on points of the affine space point set forwards and backwards.
Lines can also be defined using iterated group operations when a scalar operation is available, but a different
kind of line is possible using scalar multiplication. When the vector space of an affine space is a module
over a set, the set is thought of as an operator domain. In fact, the unstructured operator domain A in
Definition 26.4.10 cannot be considered to be an authentic scalar space because it does not have even an
addition operation. Therefore Definition 26.5.3 should not be taken very seriously. It is probably of technical
interest only. The gradual build-up of structure helps one to understand which aspects of algebraic structure
perform which functions.
26.5.3 Definition: A line on an affine space over a module over a set (A, M, X, σM, μA, μM, σX) is a map
f : A → X which is defined for some p ∈ X and m ∈ M by f : a ↦ p + am.
The point p is called the base point of the line.
The module element m is called the velocity of the line.
26.5.4 Notation: Lp,m, for p ∈ X and m ∈ M, for an affine space (M, X) < (A, M, X, σM, μA, μM, σX)
over a module M < (A, M, σM, μA) over a set A, denotes the line on (M, X) with base point p and velocity m.
26.5.5 Remark: Scalars in the operator domain can act on velocities before applying them to points.
Definition 26.5.3 and Notation 26.5.4 incompatibly redefine Definition 26.3.3 and Notation 26.3.4. (This
does not matter much because it is unlikely that anyone will use these definitions and notations!) It should
be noted that the line parameter space in Definition 26.3.3 is the ring of integers whereas the line parameter
space in Definition 26.5.3 is the scalar set A.
The line map in Definition 26.5.3 is defined by making the scalar a ∈ A act on the velocity m ∈ M first
before applying the product am ∈ M to the point p ∈ X. The more primitive line map in Definition 26.3.3
effectively applies the unital morphism in Definition 18.2.11 to the velocity g ∈ G before applying the iterated
result gᵏ to the point p ∈ X for k ∈ ℤ. This is effectively using the ring of integers ℤ as an ad-hoc scalar
space for the group which is in the role of a vector space. Therefore in this sense, Definition 26.3.3 may
be considered to be a special case of Definition 26.5.3. Thus Definition 26.5.3 is the true definition. The
domains, ranges and parameters of lines on affine spaces over modules are illustrated in Figure 26.5.1.
[Figure omitted: the lines Lp,g and Lp,m with parameter domains ℤ, A, G and R, mapping into X, for an affine space over a group (or module), over a module over a set, over a module over a group, and over a module over a ring.]
Figure 26.5.1 Sets and maps for lines on affine spaces over modules
26.5.6 Remark: Applicability of tangent concepts from unstructured to structured operator domains.
Definition 26.5.3 and Notation 26.5.4 are directly applicable to affine spaces over modules over groups and
rings in Definitions 26.6.2 and 26.6.6, which do have authentic scalar spaces.
Definitions 26.5.7 and 26.5.9, and Notations 26.5.8 and 26.5.10, are the scalar-multiple versions of the cor-
responding unital morphism definitions and notations in Section 26.2. They are defined in terms of Defini-
tion 26.5.3 and Notation 26.5.4, but they are directly applicable to affine spaces over modules over groups
and rings.
26.5.7 Definition: The tangent space on an affine space over a module over a set
(M, X) < (A, M, X, σM, μA, μM, σX) at base point p ∈ X is the set of lines {Lp,m; m ∈ M}.
26.5.8 Notation: Tp(X), for p ∈ X, for an affine space over a module over a set (M, X), denotes the
tangent space on (M, X) at base point p. In other words, Tp(X) = {Lp,m; m ∈ M}.
26.5.9 Definition: The tangent bundle on an affine space over a module over a set
(M, X) < (A, M, X, σM, μA, μM, σX) is the set of lines {Lp,m; p ∈ X, m ∈ M}.
26.5.11 Remark: Comparison of tangent bundles on affine spaces over modules with fibre bundles.
The tangent bundle in Definition 26.5.9 may be regarded as a non-topological fibre bundle in the same way
as for the affine spaces over groups which were discussed in Remark 26.3.10. In the case of an affine space
(A, M, X, σM, μA, μM, σX) over a module M over an unstructured set A, there is no explicit role for the
set A within the fibre bundle. The set A cannot be the structure group because it is not a group!
26.5.12 Remark: Uni-directional tangent lines require an order definition on the operator domain.
The definitions of half-lines cannot be given if the scalar space lacks a compatible ordering. It would be
possible to define half-lines here for ordered operator domains A. However, ordered rings are of more interest
than general ordered operator domains. Therefore the definition of half-lines is deferred until affine spaces
over modules over rings have been defined. (See Definition 26.7.5.)
The scalar space of the affine space (G, M, X, σG, σM, μG, μM, σX) is the group G < (G, σG).
26.6.3 Remark: Summary of spaces and operations for affine spaces over modules over groups.
The operations in Definition 26.6.2 may be summarised as follows.
(i) σG : G × G → G. Operation within the top-level set.
(ii) μG : G × M → M. Top-level set acting on the middle-level set.
(iii) σM : M × M → M. Addition operation within the middle-level set.
(iv) μM : X × M → X. Middle-level set acting on the bottom-level set.
(v) σX : X × X → M. Bottom-level operation with middle-level result.
The space G of actions on the module M is now a group. So there is an extra operation σG : G × G → G
within the top-level structure G.
26.6.4 Remark: Interpretation of the operator domain of an affine space over a group as a structure group.
As mentioned in Remark 26.5.6, the definitions and notations for lines, tangent spaces and tangent bun-
dles for affine spaces over modules over sets may be directly exported to affine spaces over modules over
groups and rings without change. However, when exporting the fibre bundle ideas which are mentioned in
Remark 26.3.10, the presence of a group acting on the module opens up the possibility of using this group
as the structure group for the fibre bundle.
Let (G, M, X, σG, σM, μG, μM, σX) be an affine space over a module over a group. Definition 21.5.2 for a non-topological
(G, F) fibre bundle requires a specification tuple (E, π, B, AFE) and an effective left transformation
group (G, F) < (G, F, σG, μ).
Let E = T(X), B = X and F = M, using the same group G for the affine space and the fibre bundle. Let
AFE = {φ}, where φ : T(X) → M is defined by φ : Lp,m ↦ m. This map is well defined, as mentioned in
Remark 26.3.10, because for any line L ∈ T(X), the value of m = φ(L) is uniquely determined as φ(L) =
σX(L(1), L(0)). Similarly the map π : T(X) → X with π : Lp,m ↦ p is well defined. The action of G on
F = M is μ = μG : G × M → M. With these assignments, (E, π, B, AFE) is a non-topological (G, F) fibre
bundle. In other words, the tuple (T(X), π, X, {φ}) is a non-topological (G, M) fibre bundle. It happens
that this fibre bundle is trivial, but that is because of the particular choice {φ} of fibre atlas for T(X).
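As an informal coordinate illustration, the chart map φ and the projection π above can be sketched for lines on X = ℝ² over M = ℝ². The function names and numerical values are assumptions of this sketch, not the book's constructions.

```python
# Sketch of the fibre-atlas map phi above: for lines L(t) = p + t*m on the
# affine space X = R^2 over M = R^2, the velocity is recovered as
# phi(L) = sigma_X(L(1), L(0)) = L(1) - L(0), and pi(L) = L(0) recovers the
# base point.

def line(p, m):
    return lambda t: (p[0] + t * m[0], p[1] + t * m[1])

phi = lambda L: (L(1)[0] - L(0)[0], L(1)[1] - L(0)[1])   # velocity chart
pi = lambda L: L(0)                                      # bundle projection

L = line(p=(2.0, -1.0), m=(0.5, 3.0))
print(phi(L), pi(L))   # -> (0.5, 3.0) (2.0, -1.0)
```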
It is clear from this example and Remark 26.3.10 that affine spaces in general have tangent-line spaces which
satisfy the requirements for a non-topological fibre bundle. In the case of an affine space over a linear space,
the structure group is the field of the linear space. So in the case of the well-known affine space on ℝⁿ, the
structure group is ℝ.
26.6.5 Remark: A module over a ring is not a module over a group.
Definition 26.6.6 for an affine space over a module over a ring is not derived from Definition 26.6.2 because
a module over a ring is not a module over a group. (See Remark 19.3.3 for details.) Therefore it must be
defined separately.
An affine space over a module over a ring is almost the same thing as an affine space over a linear space,
which is the most common kind of affine space seen in applications.
26.6.7 Remark: Summary of spaces and operations for affine spaces over modules over rings.
The operations in Definition 26.6.6 may be summarised as follows.
(i) σR : R × R → R. Additive group operation within the top-level set.
(ii) τR : R × R → R. Multiplicative semigroup operation within the top-level set.
(iii) μR : R × M → M. Top-level set acting on the middle-level set.
(iv) σM : M × M → M. Addition operation within the middle-level set.
(v) μM : X × M → X. Middle-level set acting on the bottom-level set.
(vi) σX : X × X → M. Bottom-level operation with middle-level result.
26.7.2 Definition: An affine space over a unitary module over a ring is an affine space over a module
over a ring (M, X) < (R, M, X, σR, τR, σM, μR, μM, σX) such that
(i) M < (R, M, σR, τR, σM, μR) is a unitary module over a ring R < (R, σR, τR).
26.7.3 Remark: Import of tangent definitions to affine spaces over a ring.
Lines, tangent spaces and tangent bundles are defined and notated on affine spaces over modules over rings
exactly as for affine spaces over modules over unstructured operator domains. (See Definitions 26.5.3, 26.5.7
and 26.5.9, and Notations 26.5.4, 26.5.8 and 26.5.10.) One simply ignores the structure on the ring.
26.7.4 Remark: Advantages of upgrading an affine space to a module over an ordered ring.
Definitions 26.7.5, 26.7.7 and 26.7.9, and Notations 26.7.6, 26.7.8 and 26.7.10 are the scalar-multiple versions
of the corresponding unital morphism definitions and notations in Section 26.2. Half-lines make no sense
if the scalar space has no order. Therefore these definitions require an ordered ring. (See Section 18.3 for
ordered rings.) It is possible to define half-lines for affine spaces over modules over unstructured sets or
groups also, but an ordering must then be provided. The form of the definitions and notations would then
be as given here.
An ordered ring has a tuple R < (R, σR, τR, <), where < is the ordering on R. So the definition of an affine
space over a module over an ordered ring requires an expanded tuple (R, M, X, σR, τR, <, σM, μR, μM, σX).
A formal definition of this would be tedious and obvious. So it is omitted.
26.7.5 Definition: A half-line on an affine space over a module over an ordered ring
(M, X) < (R, M, X, σR, τR, <, σM, μR, μM, σX) is a map f : R₀⁺ → X which is defined for some p ∈ X and
m ∈ M by f : r ↦ p + rm, where R₀⁺ = {r ∈ R; r ≥ 0R}.
The point p is called the base point of the half-line.
The module element m is called the velocity of the half-line.
26.7.6 Notation: L+p,m, for p ∈ X and m ∈ M, for an affine space over a module over an ordered ring
(M, X), denotes the half-line on (M, X) with base point p and velocity m.
26.7.7 Definition:
The unidirectional tangent space on an affine space over a module over an ordered ring (M, X) at a base
point p ∈ X is the set of half-lines {L+p,m; m ∈ M}.
26.7.8 Notation: Tp+(X), for p ∈ X, for an affine space over a module over an ordered ring (M, X),
denotes the unidirectional tangent space on (M, X) at base point p. That is, Tp+(X) = {L+p,m; m ∈ M}.
26.7.9 Definition:
The unidirectional tangent bundle on an affine space over a module over an ordered ring (M, X) is the set
of half-lines {L+p,m; p ∈ X, m ∈ M}.
26.7.10 Notation: T+(X), for an affine space over a module over an ordered ring (M, X), denotes the
unidirectional tangent bundle on (M, X). In other words, T+(X) = {L+p,m; p ∈ X, m ∈ M} = ⋃p∈X Tp+(X).
26.8.2 Definition: The tangent velocity space on an affine space over a module over a set
(M, X) < (A, M, X, σM, μA, μM, σX) at base point p ∈ X is the set of pairs {(p, m); m ∈ M}.
26.8.3 Notation: Tp(X), for p ∈ X, for an affine space over a module over a set (M, X), denotes the
tangent velocity space on (M, X) at base point p. In other words, Tp(X) = {(p, m); m ∈ M}.
26.8.4 Definition: The tangent velocity bundle on an affine space over a module over a set
(M, X) < (A, M, X, σM, μA, μM, σX) is the set of pairs {(p, m); p ∈ X, m ∈ M}.
26.8.5 Notation: T(X), for an affine space over a module over a set (M, X), denotes the tangent velocity
bundle on (M, X). In other words, T(X) = {(p, m); p ∈ X, m ∈ M} = ⋃p∈X Tp(X).
mating the function by a line. When one differentiates a curve in ℝⁿ, one is approximating the
curve by an affine-parametrised line. The real-valued case is also an affine-parametrised line because the
Y-value is a linear function of the X-value. In other words, x ↦ y = p + mx for some point p and velocity m.
So differentiation is the art of approximating a curve by an affine-parametrised line.
In the case of differentiable manifolds, one approximates differentiable curves by affine-parametrised lines in
the chart space. That is as good as it gets. There is no such thing as an affine-parametrised line on a general
differentiable manifold. Therefore the most natural definition of a tangent vector on a general differentiable
manifold is an affine-parametrised line with respect to each chart in the atlas. That is as good as it gets!
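The idea that differentiation approximates a curve by an affine-parametrised line can be sketched numerically. This is a hedged illustration (the helper `tangent_line` is this sketch's own name), with the velocity estimated by a central difference.

```python
# Differentiation as approximation of a curve by an affine-parametrised line.
# The tangent line at t0 to a curve c : R -> R^n is the affine map
# t |-> c(t0) + (t - t0) * c'(t0), with the velocity c'(t0) estimated
# here by a central difference. Helper names are hypothetical.

import math

def tangent_line(c, t0, h=1e-6):
    p = c(t0)
    m = tuple((a - b) / (2 * h) for a, b in zip(c(t0 + h), c(t0 - h)))
    def L(t):
        return tuple(pi + (t - t0) * mi for pi, mi in zip(p, m))
    return L

# Example: the circle curve c(t) = (cos t, sin t). At t0 = 0 the tangent
# line passes through (1, 0) with velocity approximately (0, 1).
c = lambda t: (math.cos(t), math.sin(t))
L = tangent_line(c, 0.0)
print(L(0.0))  # the base point (1.0, 0.0)
```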
26.9.2 Definition: The line through points p, q in an affine space X over a unitary module M over a
ring R is the map L : R → X defined by L : t ↦ p + t(q - p).
26.9.3 Remark: The line through two points might not include the second point in a non-unitary module.
The expression p + t(q - p) in Definition 26.9.2 means θ_M(p, μ_R(t, θ_X(q, p))) in terms of the affine space tuple
(R, M, X, σ_R, τ_R, <, σ_M, μ_R, θ_M, θ_X) in Definition 26.6.6. So Definition 26.9.2 is well defined in a general
affine space over a module over a ring (without requiring the module to be unitary). But one requires the
module to be unitary so that one can write L(1_R) = p + 1_R(q - p) = q.
In terms of Notation 26.5.4, the line through points p, q is L_{p,q-p}. Thus Definition 26.9.2 does not require
its own specific notation. It is only a different way of specifying or parametrising lines.
26.9.5 Remark: Non-availability of a customary linear space formula for a line through two points.
The expression (1 - t)p + tq is not well defined because there is no product function from R × X to X, and
there is no addition function from X × X to X. The pseudo-expression (1 - t)p + tq would be well defined if
X were given a linear space structure (or the structure of a unitary module over a ring) which is consistent
with the affine space structure, but that would defeat the main purpose of defining affine spaces, which is to
remove the origin and vector addition operation from the point space.
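The type discipline of Remark 26.9.5 can be made concrete in code: points admit no addition or scalar multiplication, so (1 - t)p + tq is rejected, while p + t(q - p) is well formed. The classes below are a hypothetical sketch, not the book's construction; the comments name the structure maps as reconstructed in this chapter.

```python
# A type-level sketch of Remark 26.9.5 (class names are hypothetical):
# q - p is a vector, t*(q - p) is a vector, and point + vector is a point,
# but point + point is a type error, matching the absence of an addition
# function X x X -> X in an affine space.

class Vector:
    def __init__(self, *c): self.c = c
    def __mul__(self, t): return Vector(*(t * x for x in self.c))
    __rmul__ = __mul__

class Point:
    def __init__(self, *c): self.c = c
    def __sub__(self, other):          # the difference map theta_X : X x X -> M
        return Vector(*(a - b for a, b in zip(self.c, other.c)))
    def __add__(self, v):              # the displacement map theta_M : X x M -> X
        if not isinstance(v, Vector):
            raise TypeError("can only add a Vector to a Point")
        return Point(*(a + b for a, b in zip(self.c, v.c)))

p, q = Point(0.0, 0.0), Point(2.0, 4.0)
mid = p + 0.5 * (q - p)      # well defined: the midpoint (1.0, 2.0)
# p + q would raise TypeError: there is no addition function X x X -> X.
```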
26.9.7 Definition: The line segment through points p, q in an affine space X over a unitary module M
over an ordered ring R is the map L : R[0_R, 1_R] → X defined by L : t ↦ p + t(q - p).
26.9.8 Remark: Using the real-number interval notation for line segments.
Notations 26.9.9 and 26.9.15 are suggested by the standard closed interval notation in R.
26.9.9 Notation: X[p, q], for points p, q ∈ X for an affine space X over a unitary module over an ordered
field, denotes the parametrised line segment through points p and q.
all point sequences p ∈ X^{k+1} and number sequences t ∈ R^k. The expression Σ_{i=1}^k t_i(p_i - p_0) is defined
inductively by the rule Σ_{i=1}^j t_i(p_i - p_0) = t_j(p_j - p_0) + Σ_{i=1}^{j-1} t_i(p_i - p_0). Since vector addition in the
module M is commutative, the sum expression is independent of the order of addition of the terms.
26.9.15 Notation: X[p_0, p_1, . . . , p_k] denotes the hyperplane segment through points p_0, p_1, . . . , p_k ∈ X for
k ∈ ℤ^+_0 in an affine space X over a unitary module over an ordered field.
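For the Cartesian case, the point p_0 + Σ_{i=1}^k t_i(p_i - p_0) can be computed directly by the inductive rule above. A hedged sketch with hypothetical names; the simplex constraint on the parameters is this sketch's assumption, by analogy with the line segment case.

```python
# Sketch of the sum behind Notation 26.9.15: the point
# p0 + sum_{i=1}^k t_i * (p_i - p_0), computed term by term following the
# inductive rule stated above. The function name is hypothetical.

def hyperplane_point(points, t):
    """points = (p_0, ..., p_k) in R^n, t = (t_1, ..., t_k)."""
    p0 = points[0]
    s = [0.0] * len(p0)                 # the empty sum for j = 0
    for tj, pj in zip(t, points[1:]):   # add the terms t_j * (p_j - p_0)
        s = [si + tj * (a - b) for si, a, b in zip(s, pj, p0)]
    return tuple(a + b for a, b in zip(p0, s))

# Presumably the triangle segment X[p0, p1, p2] consists of such points for
# t_1, t_2 >= 0 with t_1 + t_2 <= 1. For example, the centroid of a triangle:
print(hyperplane_point([(0, 0), (3, 0), (0, 3)], (1/3, 1/3)))  # (1.0, 1.0)
```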
Pyramid to that cosmic abstraction that we now call north.
Thus the cosmic abstraction of portable direction was well established at the dawn of history. So it is
difficult now to see it as an abstraction. We think of directions as having a meaning or existence of their
own, independent of location. Therefore one may say that the primacy of points is probably the most ancient
view, and the primacy of directions is the most modern, but modern includes all of recorded history.
In Definition 26.6.6 for an affine space over a module over a ring, the local displacement is defined in terms
of a universal, portable vector space. That is, the displacement map θ_M : X × M → X is primary, and
the subtraction map θ_X : X × X → M is secondary, being defined in terms of θ_M. Some texts make the
opposite choice, making the difference function θ_X primary. The choices made by some authors are shown
in Table 26.10.1.
26.10.2 Remark: Upgrading an affine space over a module to an affine space over a linear space.
Affine spaces over linear spaces are derived from affine spaces over modules over rings (Definition 26.6.6) or
affine spaces over unitary modules over rings (Definition 26.7.2) by requiring the module to be a unitary left
module over a field, which is the same thing as a linear space. (See Definition 19.3.6 and Remark 19.3.8 for
unitary left modules over fields.)
26.10.3 Definition: An affine space over a linear space is an affine space over a unitary module over a
ring X −< (K, V, X, σ_K, τ_K, σ_V, μ_K, θ_V, θ_X) such that
(i) V −< (K, V, σ_K, τ_K, σ_V, μ_K) is a unitary left module over a field K −< (K, σ_K, τ_K) (i.e. a linear space).
26.10.4 Remark: An affine space over a linear space can be specified in two equivalent ways.
Theorem 26.10.5 shows that it is possible to define an affine space X over a linear space V in terms of the
difference function θ_X : X × X → V instead of the affine structure function θ_V : X × V → X. The resulting
algebraic structure is identical. The equivalence of the two specification methods applies also to the affine
spaces in Definitions 26.2.2, 26.4.2, 26.4.10, 26.6.2, 26.6.6 and 26.7.2.
26.10.5 Theorem: Construction of an affine space from a linear space and a difference function.
Let V −< (K, V, σ_K, τ_K, σ_V, μ_K) be a linear space over a field K −< (K, σ_K, τ_K). Let X be a set. Let
θ_X : X × X → V satisfy the following conditions.
(i) For all q ∈ X, the map p ↦ θ_X(p, q) is a bijection from X to V.
(ii) ∀p, q, r ∈ X, θ_X(p, r) = σ_V(θ_X(q, r), θ_X(p, q)).
Then X −< (K, V, X, σ_K, τ_K, σ_V, μ_K, θ_V, θ_X) is an affine space over the linear space V, where θ_V : X × V → X
is defined by θ_V : (y, v) ↦ θ_y^{-1}(v), where θ_y : X → V is defined by θ_y : x ↦ θ_X(x, y) for y ∈ X.
Proof: The inverse function θ_y^{-1} : V → X is well defined by condition (i). So the map θ_V : X × V → X
defined by θ_V : (y, v) ↦ θ_y^{-1}(v) is well defined. Then θ_X(θ_V(y, v), y) = θ_X(θ_y^{-1}(v), y) = θ_y(θ_y^{-1}(v)) = v for
all y ∈ X and v ∈ V. So θ_V satisfies Definition 26.6.6 (iii).
It must also be shown that (V, X, σ_V, θ_V) is a right transformation group which acts freely and transitively
on X. The pair (V, σ_V) is a group because V is a linear space.
Let x ∈ X and v_1, v_2 ∈ V. Let y = θ_V(x, v_1) and z = θ_V(y, v_2). Then θ_X(y, x) = θ_X(θ_V(x, v_1), x) = v_1 and
θ_X(z, y) = θ_X(θ_V(y, v_2), y) = v_2. So by condition (ii), θ_X(z, x) = θ_X(y, x) + θ_X(z, y) = v_1 + v_2. Therefore
θ_V(x, v_1 + v_2) = θ_V(x, θ_X(z, x)) = θ_x^{-1}(θ_x(z)) = z = θ_V(y, v_2) = θ_V(θ_V(x, v_1), v_2). Hence θ_V satisfies the
associativity requirement for a group action. (See Definition 20.7.2 (i).)
Let x ∈ X. Then by condition (ii), θ_X(x, y) = θ_X(x, x) + θ_X(x, y) for all y ∈ X. So v = θ_X(x, x) + v
for all v ∈ V because ∀v ∈ V, ∃y ∈ X, θ_X(x, y) = v by condition (i). Since (V, σ_V) is a group,
each v has an inverse -v. So 0_V = v - v = θ_X(x, x) + v - v = θ_X(x, x). Hence ∀x ∈ X, θ_X(x, x) = 0_V.
Therefore ∀x ∈ X, θ_V(x, 0_V) = θ_V(x, θ_X(x, x)) = θ_x^{-1}(θ_x(x)) = x by the definition of θ_V. This verifies
Definition 20.7.2 (ii). So (V, X, σ_V, θ_V) is a right transformation group.
It follows from condition (i) and Theorem 20.7.15 (iii) that V acts freely and transitively on X. Hence
X −< (K, V, X, σ_K, τ_K, σ_V, μ_K, θ_V, θ_X) is an affine space over the linear space V by Definitions 26.10.3,
26.7.2, 26.6.6 and 20.7.2.
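The construction in Theorem 26.10.5 can be illustrated numerically (a sketch, not a proof): take X = ℝ² as a bare point set, V = ℝ² as the linear space, and define the difference function θ_X(p, q) = p - q. Then the reconstructed action θ_V(y, v) = θ_y^{-1}(v) is simply y + v. The function names follow this chapter's reconstructed symbols and are otherwise hypothetical.

```python
# Numerical illustration of Theorem 26.10.5 with X = V = R^2 and
# theta_X(p, q) = p - q. All names are this sketch's own.

def theta_X(p, q):
    return tuple(a - b for a, b in zip(p, q))

def theta_V(y, v):
    # theta_y : x |-> theta_X(x, y) is x |-> x - y, whose inverse is v |-> y + v.
    return tuple(a + b for a, b in zip(y, v))

p, q, r = (1.0, 2.0), (4.0, 0.0), (-2.0, 5.0)
# Condition (ii): theta_X(p, r) = theta_X(q, r) + theta_X(p, q).
lhs = theta_X(p, r)
rhs = tuple(a + b for a, b in zip(theta_X(q, r), theta_X(p, q)))
print(lhs == rhs)  # True
# Definition 26.6.6 (iii): theta_X(theta_V(y, v), y) = v.
print(theta_X(theta_V(p, (3.0, -1.0)), p))  # (3.0, -1.0)
```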
26.10.6 Remark: The difference function can be easily reconstructed from the module-action map.
The basic property Theorem 26.10.5 (ii) for the difference function θ_X : X × X → V follows very easily as
Theorem 26.10.7 (i) from the basic property Definition 26.6.6 (iii) of the affine structure map θ_V : X × V → X.
This seems to indicate that it is preferable for mathematical purposes to define affine spaces in terms of
transformation groups rather than in terms of difference functions.
26.10.7 Theorem: Let X be an affine space over a linear space V. Then the difference function θ_X : X × X → V satisfies:
(i) ∀p, q, r ∈ X, θ_X(r, p) = θ_X(q, p) + θ_X(r, q).
(ii) ∀p ∈ X, θ_X(p, p) = 0_V.
(iii) ∀p, q ∈ X, θ_X(q, p) = -θ_X(p, q).
Proof: For part (i), let p, q, r ∈ X. Let v_1 = θ_X(q, p) and v_2 = θ_X(r, q). Then θ_V(p, v_1 + v_2) =
θ_V(θ_V(p, v_1), v_2) = θ_V(q, v_2) = r by Definition 20.7.2 (i). So θ_X(r, p) = θ_X(θ_V(p, v_1 + v_2), p) = v_1 + v_2 =
θ_X(q, p) + θ_X(r, q) by Definition 26.6.6 (iii).
For part (ii), let p ∈ X. Then θ_X(p, p) = θ_X(θ_V(p, 0_V), p) = 0_V by Definition 26.6.6 (iii).
Part (iii) follows from part (i) by substituting p for r.
(6) Euclidean inner product space. The real linear space ℝ^n with the standard inner product
⟨·, ·⟩ : ℝ^n × ℝ^n → ℝ defined by ⟨·, ·⟩ : (x, y) ↦ Σ_{i=1}^n x_i y_i. (See Definition 24.7.7.) This immediately implies the
corresponding standard norm as in case (5).
(7) A linear space of n dimensions together with the usual tangent bundle, Riemannian metric and Levi-
Civita connection.
The spaces in the following list have no Euclidean metric. So they should be referred to as Cartesian.
(Of course, Descartes did assume that space always had the Euclidean metric. So this is a misnomer. It
would be more accurate to call spaces with coordinates Oresmian spaces because Oresme introduced the
idea of plotting functions using coordinates in the 14th century. See Boyer/Merzbach [217], pages 239–241;
Boyer [215], pages 80–85; Boyer [216], pages 46–51.)
(8) Cartesian tuple space. The set ℝ^n with index set ℕ_n. (See Definition 16.4.1.)
(9) Cartesian linear space. The linear space ℝ^n. (See Definition 22.2.19.) This is the set of tuples ℝ^n,
for some n ∈ ℤ^+_0, together with the operations of componentwise addition and scalar multiplication over
the field ℝ.
(10) Cartesian topological space. The set Rn together with the standard topology as in Definition 32.5.11,
which is induced by the standard metric in case (3). (But the metric is not part of this structure.) Any
topological space which is homeomorphic to this may also be called a Cartesian topological space, as
in Definition 48.3.3.
(11) Cartesian topological linear space. The linear space ℝ^n with the standard topology, which is
induced by the standard metric in case (4). (But the metric is not part of this structure.)
(12) Cartesian abstract linear space. An n-dimensional real linear space. This space has implied linear
charts. For each basis B = (e_i)_{i=1}^n for an n-dimensional linear space V, there is a component map
κ_B : V → ℝ^n defined by ∀v ∈ V, v = Σ_{i=1}^n κ_B(v)_i e_i, where ℝ^n denotes the linear space of real n-tuples.
(See Definition 22.8.8 for component maps.) Each such component map is a linear space isomorphism.
The elements of V are considered to be the points of the space while the maps κ_B are linear charts
for V. The point space has no fixed axes as are present in case (9). Therefore this abstract space is
closer to Euclidean plane geometry than case (9), but the origin is still fixed.
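The component map of case (12) can be sketched for n = 2 by solving v = c_1 e_1 + c_2 e_2 with Cramer's rule. The name `kappa_B` is this illustration's choice for the component map, following the symbol reconstructed above.

```python
# Sketch of a component map for an abstract 2-dimensional real linear
# space: given a basis B = (e_1, e_2), solve v = c_1*e_1 + c_2*e_2 by
# Cramer's rule. The function name kappa_B is hypothetical.

def kappa_B(e1, e2, v):
    det = e1[0] * e2[1] - e2[0] * e1[1]     # nonzero since (e_1, e_2) is a basis
    c1 = (v[0] * e2[1] - e2[0] * v[1]) / det
    c2 = (e1[0] * v[1] - v[0] * e1[1]) / det
    return (c1, c2)

# The component map is a linear space isomorphism: it sends the basis
# vectors themselves to the standard coordinate tuples.
e1, e2 = (1.0, 1.0), (0.0, 2.0)
print(kappa_B(e1, e2, (1.0, 1.0)))  # e_1 has components (1.0, 0.0)
print(kappa_B(e1, e2, (2.0, 6.0)))  # 2*e_1 + 2*e_2 has components (2.0, 2.0)
```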
(13) Affine space of n dimensions. This space has a linear space of displacement vectors at each point.
(14) A linear space of n dimensions together with a tangent bundle. (See Section 26.12.) This has a space
of velocity vectors at each point, which is often confused with the affine space in case (13), which has
displacement vectors at each point.
(15) The set ℝ^n together with the usual flat affine connection.
(16) The set ℝ^n together with its usual differentiable structure.
(17) The set ℝ^n together with its usual Lebesgue measure.
As a result of the wide variety of meanings in the literature, it is difficult to say exactly how much algebraic,
geometric or analytic structure is referred to when someone says Euclidean space.
26.11.2 Remark: The disadvantages of pseudo-notation for Euclidean spaces.
The pseudo-notation E^n is frequently used for one or more Euclidean space definitions. When the notation
E^n is used in elementary texts, it is probably intended to remind the reader to think of ℝ^n with some
range of usual Euclidean structures attached to it. However, such an E^n notation is not the nth power
of anything, certainly not the nth power of E, whatever E is. This kind of pseudo-notation should be
replaced with meaningful notation wherever it is found. (An even more absurd notation is M^n for an
n-dimensional manifold. Teachers can do more harm than good by using such meaningless notation because
it forces students to abandon logical thinking.)
26.11.3 Remark: The coordinatisation of Euclidean space.
The modern mind is so accustomed to the use of Cartesian coordinates that the majority of authors refer to
Cartesian coordinates as Euclidean space. It is true that the construction of a Cartesian coordinate chart
for a Euclidean space enables all Euclidean definitions and theorems to be expressed correctly in terms of
n-tuples of real numbers. But such charts are arbitrary and not intrinsic. One may counter-argue that the
lines and circles of Euclidean geometry are equally arbitrary and non-intrinsic, being imposed on physical
geometry in the same way as Cartesian grid lines.
For any set of Euclidean points, lines and circles, one may easily construct a Cartesian grid using ruler and
compass so as to express the locations, dimensions and orientations of points, lines and circles in terms of
numerical coordinates. For example, given any straight line segment OA, one may construct another line
segment OB at a right angle to one of its end-points O. (See Figure 26.11.1.)
[Figure 26.11.1: Euclidean construction of Cartesian coordinates. Perpendiculars dropped from a point P to the axes OA and OB meet them at the coordinate points Px and Py.]
According to Boyer [216], page 150, it was Nicolas Guisnee in 1705 (see Guisnee [202], page 15) who first
published an explicit statement that the coordinates of points in Cartesian geometry may be constructed by
dropping perpendiculars to two orthogonal axes.
[. . . ] this book seems to be the first one in which both the x and y coordinates in a rectangular
Cartesian system are interpreted as the segments cut off on the two axes by perpendiculars from a
given point.
Boyer [216], pages 156–157, also mentions that dropping perpendiculars to obtain Cartesian coordinates for
three dimensions was published by Antoine Parent in "Essais et recherches de mathématique et physique"
in 1705. (The classical method of dropping perpendiculars is given by Euclid's proposition I-12. See
Euclid/Heath [195], pages 270–275; Euclid [198], pages 10–11.)
The common point O is the origin. Each of these two lines may be extended indefinitely in both directions,
thereby constructing axes. For any point P in the plane, there is a unique point on each line which is the
intersection of the perpendicular from the point to the line. (In Euclidean geometry, there is one and only
one perpendicular from each point to each axis line. This is not totally obvious. For example, why should the
perpendicular intersect the axis in the same point for any choices of radii for drawing the circles? Numerous
other aspects of the construction are non-obvious a-priori. But they are part and parcel of the Euclidean
geometry tradition.) These two points Px and Py are the coordinates. The only serious issue is whether the
distance from the origin to the coordinates is commensurable with the length of the initial line segment OA
used in the construction. The distances of the coordinate points from the origin effectively convert points in
the plane into numbers, but only if the lengths of lines OPx and OPy may be expressed as numbers.
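The perpendicular-dropping construction can be sketched arithmetically: the coordinate of P along an axis through O in the direction of A is the projection coefficient. Names are hypothetical, and this sketch uses the modern inner product, which the classical ruler-and-compass construction of course did not.

```python
# Arithmetic sketch of constructing Cartesian coordinates by dropping
# perpendiculars, as in Figure 26.11.1. Function names are hypothetical.

def drop_perpendicular(O, A, P):
    """Return (foot of perpendicular from P to line OA, coordinate of P in units of OA)."""
    ax, ay = A[0] - O[0], A[1] - O[1]
    px, py = P[0] - O[0], P[1] - O[1]
    t = (px * ax + py * ay) / (ax * ax + ay * ay)   # projection coefficient
    foot = (O[0] + t * ax, O[1] + t * ay)
    return foot, t

O, A, B = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
P = (3.0, 2.0)
Px, x = drop_perpendicular(O, A, P)   # foot of the perpendicular on the OA axis
Py, y = drop_perpendicular(O, B, P)   # foot of the perpendicular on the OB axis
print((x, y))  # (3.0, 2.0): the Cartesian coordinates of P
```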
Pairs or triples of numbers are not the same as geometrical points. The numbers which were known to
classical and Hellenic geometry were only those which could be constructed geometrically. Thus it was
impossible, for example, to construct lines in the proportion ∛2 : 1 or π : 1. So the use of the Euclidean
definition of numbers would not permit numbers to be located on all of the points of the plane as we
understand the Euclidean plane in our time. In the time of Descartes, there were fewer real numbers than
were available by the end of the 19th century. Thus the Euclidean plane and Euclidean space have been
stealthily redefined by virtue of our modern understanding of numbers.
Euclidean space is commutative in the sense that if you move a kilometre forwards and then a kilometre to
the left, you arrive at the same place as if you move a kilometre left and then a kilometre forward. So the
order in which one makes translations does not affect the outcome. Therefore the relative location of a place
may be specified arithmetically as the sum of the X and Y translations, where the order does not matter,
like on graph paper. The geometry of a sphere is not so simple. The graph-paper model for the geometry of
space is difficult to unlearn. One should be surprised that space has a simple arithmetic character. In the
time of Kant, philosophers even tried to show that space must a-priori obey Euclid's axioms.
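The commutativity of flat-space translations, and its failure on a sphere, can be checked numerically. A hedged sketch: movement on the sphere is modelled here by rotations about perpendicular axes, a modelling choice of this sketch rather than anything in the text.

```python
# Flat-space translations commute; moves on a sphere (modelled here as
# quarter-turn rotations about perpendicular axes) do not. Illustration only.

import math

def translate(p, d):
    return tuple(a + b for a, b in zip(p, d))

p = (0.0, 0.0)
forward, left = (0.0, 1.0), (-1.0, 0.0)
print(translate(translate(p, forward), left) ==
      translate(translate(p, left), forward))  # True: the order is irrelevant

def rot_x(p, a):  # rotation about the x axis
    x, y, z = p
    return (x, y*math.cos(a) - z*math.sin(a), y*math.sin(a) + z*math.cos(a))

def rot_z(p, a):  # rotation about the z axis
    x, y, z = p
    return (x*math.cos(a) - y*math.sin(a), x*math.sin(a) + y*math.cos(a), z)

q = (1.0, 0.0, 0.0)            # a point on the unit sphere
a = math.pi / 2
print(rot_z(rot_x(q, a), a))   # approximately (0, 1, 0)
print(rot_x(rot_z(q, a), a))   # approximately (0, 0, 1): a different place
```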
It is best to regard Cartesian coordinates as a numerical model for Euclidean space. It happens that when
we specify a linear chart for space or time, the model gives practical benefits. But we cannot be sure that
the model is free of errors both at the small and large extremes of scale.
These observations about Cartesian coordinates need to be said because when differentiable manifolds are
defined in terms of charts, it is generally assumed that the real-number tuple spaces are safe, secure constructs
which are familiar and can be taken for granted. But if the concept of a tangent vector on a manifold is
defined in terms of tangent vectors on a so-called Euclidean space, it is important to know that they are well
defined. In fact, they are not, because there is no such thing as a vector in classical Euclidean space. There
are only points, lines and circles in two-dimensional space and various point-sets such as spheres, cones,
ellipses, etc., in three-dimensional space. (See Sections 40.1 and 51.1 for further comments on this issue.) An
ontologically unambiguous definition for tangent vectors in Cartesian spaces is the subject of Section 26.12.
26.11.4 Remark: Cartesian coordinate frames for Euclidean spaces are charts.
Cartesian coordinates, and the inhomogeneous Euclidean group of rotations and translations, are surely the
first example in history of mathematical atlases of charts. (One could argue that any collections of two or
more overlapping geographical charts of the Earth's surface were the first differentiable manifolds defined in
terms of atlases of charts.) In this sense, it is wrong to think of the modern approach to differential geometry
as commencing in the 19th century with Gauss and others. But in another sense, this is correct. The Cartesian
atlases for Euclidean geometry have algebraic transition functions, not topologically or differentially defined
transition functions. The collections of geographical charts from the time of Ptolemy to the 17th century
were not mathematically described as differentiable atlases until the necessary mathematical concepts had
been defined.
The fact that collections of compatible Cartesian coordinate charts provide atlases for physical (or intuitive)
Euclidean space implies that Euclidean geometry is extra-mathematical. At least, this is so if one assumes
that modern mathematics consists of all that can be constructed and proved within ZF set theory. In
ancient Greek times, geometry was regarded as more trustworthy than arithmetic because arithmetic could
not deal with the incommensurables which were so immediately evident in their geometry. In relative
trustworthiness, arithmetic and geometry have swapped places, probably some time in the 19th century. Now
we import geometry into set theory and arithmetic and perform calculations within analysis and arithmetic,
and then we export the answers back to geometry. In ancient Greek times, it was the opposite way around.
Most of the other concepts of differential geometry are also extra-mathematical. For example, differentiability
and continuity of curves and functions are evident in the intuitive geometry of curved manifolds, and are
merely modelled within differential geometry. Likewise, straight lines and flat planes are imported into
Cartesian atlases from the intuitive Euclidean geometry.
26.11.5 Remark: The ancient Greeks used Cartesian coordinates.
One might ask why the ancient Greeks did not discover Cartesian coordinates. In fact, such orthogonal
coordinate systems were used frequently by Archimedes and Apollonius in the study of conic sections. The
value added by Descartes seems to lie in the systematic conversion of geometric operations into algebraic
operations. (In fact, it wasn't really Descartes who introduced Cartesian coordinates. This was done by later
mathematicians. But Descartes did introduce the arithmetisation of geometry. See Boyer/Merzbach [217],
pages 319–320; Descartes [194].)
26.11.6 Remark: Tangent-line bundles on Cartesian spaces use implicit charts.
The tangent-line bundles in Section 26.12 are not quite correct. For example, a line in n-dimensional
Euclidean space at a point p is not a function L : ℝ → ℝ^n which is affine and satisfies L(0) = p as claimed
in Definition 26.12.9 and elsewhere. Some information is missing here. The missing information is the chart
φ : ℝ^n → X for some Euclidean space X. The coordinatisation L for a geometric line depends on the choice
of chart φ. Likewise, the coordinatisation p for each point is chart-dependent. Therefore strictly speaking,
one should always think of lines and points as equivalence classes of pairs (φ, L) and (φ, p) respectively. If
this were always done for flat-space geometry, there would be almost no culture shock when making the
transition to the atlases of charts in differential geometry. There is in fact less difference than there at first
seems between Cartesian analytic geometry and differential geometry. The former supports any concept
which is invariant under the affine group of chart transitions. The latter supports any concept which is
invariant under some transformation semigroup of chart transition maps.
If one could follow a Euclidean space or manifold chart from the coordinate space back to the point space,
one could presumably see real geometry. However, this real geometry is highly variable. Thousands or
millions of real-world systems can be modelled, with varying fidelity, by coordinate charts. The principal value
of abstract Cartesian spaces and differentiable manifold atlases lies in the enormous range of applications.
It is counter-productive to ask to see the real geometry. Fitting coordinate charts to the real world is the
task of the applied mathematician, the scientist, the engineer, the geographer or anyone else. It must never
be forgotten that mathematics merely provides toolboxes of intellectual tools. The value of the tools arises
both from the broad generality of applications and from the austere abstraction which isolates mathematics
from applications, making mathematics application-independent.
There was a time in history when mathematics and the real world were not kept distinct from each other.
When mathematics became more abstracted from the real world towards the end of the 19th century, the
desire for meaning caused some people to advocate the Platonic ideal-universe ontology for mathematics,
according to which mathematics was an exact model for a metaphysical universe beyond the senses. That
is a path which some follow in their desperate thirst for meaning, but the meaning of mathematics is not
difficult to find. The true meaning can be found by following the implied charts from all mathematical
models back to the real world. In each application context, the chart leads to a different reality which is being
modelled by someone. (That person might not be human. For example, many interplanetary probes use
stars to guide them, building up an internal model of their location in the Solar System from observations.
This is mathematical modelling of real-world geometry. Autonomous and semi-autonomous robots create
models of the world much as humans do.)
One may likewise ask for the meaning of the integers. As in the case of Cartesian coordinates, the integers
are merely a model for real-world systems. Following the implicit application charts from the integers back
to the real world may lead to cows in a field, which are being counted by a farmer. Or they may lead to
bundles of bank-notes, or beta particles and gamma rays being counted in a Geiger counter. Modelling the
world is just something which humans, other animals and robots do. Mathematics is merely one class of
human-made models among many.
If one does not forget that every applicable mathematical structure has an implicit chart which may attach it
to the real world, then it is harmless to model Euclidean space with Cartesian coordinates in ℝ^n, ignoring
the chart connecting coordinates to a real geometry. The harm arises when one begins to think that
mathematical structures have some reality of their own.
The pertinent difference in the case of manifolds is that manifold specifications require explicit charts to
connect them to the modelled geometry because more than one chart is typically required. The linkage
between charts via the extra-mathematical real manifold gives meaning to the inter-chart transition maps.
For other classes of mathematical systems, we tend to focus on the mathematical set-construction as if it were
the real thing, not just a model. The pertinent difference in the case of manifolds is that we are forced to
be aware of the linkage to the real geometry.
It may be counter-argued that manifolds could be exempted from the need to link coordinate charts to an
extra-mathematical geometry in two different ways. The first way is to link manifold charts to, for example,
subsets of some higher-dimensional flat Cartesian space. In other words, manifolds could be regarded as
merely models for embedded sets, for example of the form {x ∈ ℝ^n; F(x) = 0} for some suitably regular
function F : ℝ^n → ℝ. Then the extra-mathematical linkage would be from the purely mathematical
ambient space to the real geometry. The second kind of evasive action would be simply to link the charts of a
manifold to each other. This is the local transformation semigroup approach, where one defines an abstract
semigroup of local injective maps with some specific property such as continuity or differentiability. With this
approach, the transition maps between charts lack motivation and meaning. If one believes that mathematics
can flourish as a purely formal, axiomatic symbolic algebra devoid of extra-mathematical meaning, then this
kind of intra-mathematical linkage between charts would be satisfactory. For many mathematicians, however,
intuition is the mainspring of ideas and conjectures.
One may regard it as misfortune or fortune that manifolds are defined in terms of charts linking them to an
abstract point space. It is certainly true that differential geometry is made confusing by them, and much
more difficult to learn and teach. But one is led by the multi-chart style of manifold specification to think
very carefully about geometry without the traditional assumptions about Euclidean flat spaces. Multi-chart
manifolds are a bit like walking on a very slippery iceberg. You never know quite where you stand, and the
ground can move under you very quickly. One is compelled to recognise that everything is relative.
A "vector" means either a start-point/end-point pair or an element of the vector space of a flat affine space.
A "velocity" means a particular parameter of lines and more general curves. The displacement from L(0_K)
to L(1_K) is a fairly arbitrary parameter for specifying constant-velocity lines. There are other ways to
parametrise lines. If the curve is not of constant velocity, then the displacement from L(0_K) to L(1_K) is
clearly not generally in a one-to-one association with the velocity. Conversely, the point L(0_K) + v is not
generally equal to L(1_K).
The underlying ontology of mathematical concepts is not vapid philosophy. One of the principal thrusts of
mathematics is generalisation. Illumination through generalisation has been a major thrust for the last two
centuries. Two concepts which may be in a close one-to-one association in a specific context will require quite
different treatment in a generalised context if they are ontologically different. One of the incidental benefits
of the generalisation from Euclidean geometry to differential geometry is that it forces mathematicians
to clarify the ontology of traditional geometric concepts so that they may be unambiguously defined and
represented. Concepts which were formerly regarded as interchangeable often have only a tenuous association
in the differential geometry context. It should not be forgotten also that there are (at least) three distinct
differential geometry contexts, namely differentiable manifolds, affinely connected manifolds, and Riemannian
manifolds.
∀v, v_0, v_1 ∈ V, ∀λ ∈ K,
    v - v_0 = λ(v_1 - v_0) ⇒ φ(v) - φ(v_0) = λ(φ(v_1) - φ(v_0)).
a more ontologically correct representation of a velocity vector.
26.12.5 Remark: Affine spaces are ontologically more correct, but linear spaces are more convenient.
The following definitions of ontologically correct tangent-line vectors are closely related to the tangent-line
spaces in Section 26.5 for general affine spaces over modules. The following definitions are, however, for
Cartesian spaces of n-tuples, not for affine spaces over a linear space. These definitions are applicable also
to general linear spaces, and to general modules without any affine space structure.
The purpose of affine structure is to remove the special role of the origin and axes of a coordinate system.
In fact, Definitions 26.12.6, 26.12.7, 26.12.8 and 26.12.9 have a fundamental ontological error because they
use Cartesian n-tuple spaces as a convenient substitute for the corresponding affine space structures. In
the intended applications, these definitions will be typically used as charts, for which the origin and axes
have no special value (because differentiable manifold charts have the same meaning no matter how they are
translated or rotated).
Therefore although Cartesian n-tuple spaces are used as charts throughout this book (and in most other
differential geometry texts), the ontologically more correct chart spaces are affine spaces. The convenience
of Cartesian spaces, and the weight of tradition, cannot be ignored in this case. However, the representation
of tangent vectors on linear spaces such as ℝ^n is as convenient as the velocity-components representation
or any other representation. Therefore the weight of tradition can be ignored when defining ontologically
correct tangent vectors on the (slightly ontologically incorrect) Cartesian n-tuple spaces.
26.12.6 Definition: A tangent (line) vector in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is an affine map from $\mathbb{R}$ to $\mathbb{R}^n$.
26.12.7 Definition: A tangent (line) vector at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is a tangent-line vector L in $\mathbb{R}^n$ such that $L(0) = p$.
26.12.8 Definition: The tangent (line) vector at a point p with velocity v in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the tangent-line vector L in $\mathbb{R}^n$ such that $L(0) = p$ and $L(1) - L(0) = v$.
26.12.9 Definition: The tangent (line) vector set at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the set $\{L : \mathbb{R} \to \mathbb{R}^n;\ L(0) = p \text{ and } L \text{ is affine}\}$.
26.12.10 Notation: $T_p(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$ and $p \in \mathbb{R}^n$, denotes the tangent-line vector set at p in $\mathbb{R}^n$.
26.12.11 Remark: Extension of affine space tangent-line vector notation to Cartesian spaces.
Notation 26.4.6 for lines in affine spaces over modules is extended to Cartesian spaces in Notation 26.12.12.
26.12.12 Notation: $L_{p,v}$ denotes the tangent-line vector at a point p with velocity v in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$.
26.12.13 Remark: Illustration of tangent-line vectors on a Cartesian space.
Figure 26.12.1 illustrates two tangent-line vectors in $T_p(\mathbb{R}^n)$. Lines $L_1, L_2 \in T_p(\mathbb{R}^n)$ are functions with domain $\mathbb{R}$ and range $\mathbb{R}^n$, but for simplicity, the domain values $t = \ldots, -2, -1, 0, 1, 2, 3, \ldots$ are indicated as tags on the points in the range set.
Figure 26.12.1   Tangent-line vectors in $T_p(\mathbb{R}^n)$
26.12.14 Remark: Space-time and time-space illustration of tangent-line vectors on a Cartesian space.
Figure 26.12.2 illustrates two tangent-line vectors in $T_p(\mathbb{R}^n)$ which are similar to the vectors in Figure 26.12.1, but the time parameter is shown explicitly for the lines $L_1, L_2 \in T_p(\mathbb{R}^n)$ in this space-time diagram. The space-like lines $L_1', L_2'$ are obtained by projection into the space plane $\mathbb{R}^n$. In the special case of the zero tangent vector $L : t \mapsto p$, the line L would be vertical above and below the point p. The projection would then have range equal to the single point $p \in \mathbb{R}^n$.
Figure 26.12.2   Tangent-line vectors in $T_p(\mathbb{R}^n)$, space-time view
Figure 26.12.3   Tangent-line vectors in $T_p(\mathbb{R}^n)$, time-space view
Tangent-line vectors may also be illustrated as in Figure 26.12.3, which shows space vectors as a function of
the time parameter. In this case, the zero tangent vector at p is shown as a vertical line $L_0$.
26.12.15 Definition: The tangent (line) space at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the tangent-line vector set $T_p(\mathbb{R}^n) = \{L : \mathbb{R} \to \mathbb{R}^n;\ L(0) = p \text{ and } L \text{ is affine}\}$ together with
(i) the vector addition operation $\sigma : T_p(\mathbb{R}^n) \times T_p(\mathbb{R}^n) \to T_p(\mathbb{R}^n)$ with $\sigma(L_1, L_2) : t \mapsto L_1(t) + L_2(t) - p$ for all $L_1, L_2 \in T_p(\mathbb{R}^n)$, and
(ii) the scalar multiplication operation $\mu : \mathbb{R} \times T_p(\mathbb{R}^n) \to T_p(\mathbb{R}^n)$ with $\mu(\lambda, L) : t \mapsto L(\lambda t)$ for all $\lambda \in \mathbb{R}$ and $L \in T_p(\mathbb{R}^n)$.
Figure 26.12.4   Vector addition for tangent-line vectors in $T_p(\mathbb{R}^n)$
26.12.17 Theorem: The tangent-line space in Definition 26.12.15 is a linear space over the field $\mathbb{R}$.
Proof: Let $n \in \mathbb{Z}^+_0$ and $p \in \mathbb{R}^n$. Then clearly $L_1 + L_2$ and $\lambda L$ are affine maps from $\mathbb{R}$ to $\mathbb{R}^n$ for any $\lambda \in \mathbb{R}$ and $L, L_1, L_2 \in T_p(\mathbb{R}^n)$. Since $(L_1 + L_2)(0) = L_1(0) + L_2(0) - p = p$ and $(\lambda L)(0) = L(0) = p$, it follows that $T_p(\mathbb{R}^n)$ is closed under vector addition and scalar multiplication. Hence $T_p(\mathbb{R}^n)$ (with these operations) is a linear space.
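To make the closure argument concrete, tangent-line vectors and the two operations of Definition 26.12.15 can be modelled directly as affine maps in code. This is an illustrative sketch, not part of the book; the names `line`, `add` and `scale` are invented for the example.

```python
# Sketch (illustrative, not from the book): tangent-line vectors at p in R^n
# modelled as affine maps L : R -> R^n with L(0) = p, following
# Definitions 26.12.6-26.12.15.

def line(p, v):
    """The tangent-line vector L_{p,v} : t |-> p + t v."""
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def add(L1, L2):
    """Vector addition: (L1 + L2)(t) = L1(t) + L2(t) - p, where p = L1(0)."""
    p = L1(0)
    return lambda t: tuple(a + b - c for a, b, c in zip(L1(t), L2(t), p))

def scale(lam, L):
    """Scalar multiplication by reparametrisation: (lam L)(t) = L(lam t)."""
    return lambda t: L(lam * t)

p = (1.0, 2.0)
L1, L2 = line(p, (3.0, 0.0)), line(p, (0.0, 5.0))

# Closure: both results are again tangent-line vectors based at p.
assert add(L1, L2)(0) == p and scale(7.0, L1)(0) == p

# The velocity of a line L is L(1) - L(0); addition adds velocities,
# scaling multiplies them, as expected.
vel = lambda L: tuple(a - b for a, b in zip(L(1), L(0)))
assert vel(add(L1, L2)) == (3.0, 5.0)
assert vel(scale(7.0, L1)) == (21.0, 0.0)
```

The reparametrisation form of scalar multiplication keeps the base point fixed automatically, since $L(\lambda \cdot 0) = L(0) = p$, which is exactly the closure step used in the proof above.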
26.12.21 Theorem: Let p be a point in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$. Let $v = (v_i)_{i=1}^n \in \mathbb{R}^n$. Then $L_{p,v} = \sum_{i=1}^n v_i L_{p,e_i}$, where $(e_i)_j = \delta_{ij}$ for all $i, j \in \mathbb{N}_n$.
Proof: Let p be a point in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$. Let $v = (v_i)_{i=1}^n \in \mathbb{R}^n$. Let $i \in \mathbb{N}_n$. Then $\forall t \in \mathbb{R}$, $(v_i L_{p,e_i})(t) = L_{p,e_i}(v_i t) = p + v_i t e_i = L_{p,v_i e_i}(t)$ by Definition 26.12.15 (ii). Therefore $v_i L_{p,e_i} = L_{p,v_i e_i}$. If $n = 2$, then $\forall t \in \mathbb{R}$, $(L_{p,v_1 e_1} + L_{p,v_2 e_2})(t) = L_{p,v_1 e_1}(t) + L_{p,v_2 e_2}(t) - p = p + v_1 e_1 t + p + v_2 e_2 t - p = p + v_1 e_1 t + v_2 e_2 t = p + (v_1 e_1 + v_2 e_2) t = L_{p,v_1 e_1 + v_2 e_2}(t)$ by Definition 26.12.15 (i). Therefore $L_{p,v_1 e_1} + L_{p,v_2 e_2} = L_{p,v_1 e_1 + v_2 e_2}$. From the usual inductive argument on $n \in \mathbb{Z}^+_0$, it follows that $L_{p,v} = \sum_{i=1}^n v_i L_{p,e_i}$ for general $n \in \mathbb{Z}^+_0$ since $v = \sum_{i=1}^n v_i e_i$.
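The basis decomposition $L_{p,v} = \sum_{i=1}^n v_i L_{p,e_i}$ can be checked numerically under the same illustrative affine-map model of tangent-line vectors; all helper names here are invented for the example.

```python
# Sketch (illustrative, not from the book): numerical check of the decomposition
# L_{p,v} = sum_i v_i L_{p,e_i}, using the operations of Definition 26.12.15.
from functools import reduce

def line(p, v):
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def add(L1, L2):
    p = L1(0)
    return lambda t: tuple(a + b - c for a, b, c in zip(L1(t), L2(t), p))

def scale(lam, L):
    return lambda t: L(lam * t)

n, p, v = 3, (1.0, 2.0, 3.0), (4.0, -5.0, 6.0)
basis = [tuple(1.0 if j == i else 0.0 for j in range(n)) for i in range(n)]

# sum_i v_i L_{p,e_i}, combined with the tangent-space addition at p.
combo = reduce(add, (scale(v[i], line(p, basis[i])) for i in range(n)))

# The combination agrees with L_{p,v} at several parameter values.
for t in (-2.0, 0.0, 1.0, 3.5):
    assert combo(t) == line(p, v)(t)
```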
26.12.22 Remark: Tangent-line vectors in tangent-line bundles have unambiguous base points.
In contrast to the situation with numerical tangent vectors $v \in \mathbb{R}^n$ at a point $p \in \mathbb{R}^n$, it is not necessary to combine tangent-line vectors with point-tags to indicate which point each vector belongs to. For any $L \in T_p(\mathbb{R}^n)$, the point p may be obtained from L as $L(0)$.
A numerical vector v must be tagged as, for example, the ordered pair (p, v) to indicate the base point of the vector. So one typically uses the untagged form v when only one base point is being discussed, and the tagged form (p, v) when the whole tangent bundle on $\mathbb{R}^n$ is discussed. This can be confusing. The tangent-line bundle representation does not suffer from this kind of confusion.
26.12.23 Definition: The tangent (line) bundle total space on a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the set $\bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)$.
26.12.24 Notation: $T(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$, denotes the tangent-line bundle total space on $\mathbb{R}^n$.
26.12.25 Remark: Projection maps and velocity charts for tangent-line bundle line spaces.
According to Definition 21.1.19, $T(\mathbb{R}^n)$ is a non-topological fibration with fibre space $\mathbb{R}^n$, and the sets $T_p(\mathbb{R}^n)$ for $p \in \mathbb{R}^n$ are its fibre sets. Definition 26.12.26 defines the obvious projection map for this fibration, and also the obvious global chart which associates the velocity component tuples in Definition 26.12.18 with tangent vectors.
26.12.26 Definition: The projection map for the tangent-line bundle total space $T(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$, is the map $\pi : T(\mathbb{R}^n) \to \mathbb{R}^n$ defined by $\forall p \in \mathbb{R}^n,\ \forall L \in T_p(\mathbb{R}^n),\ \pi(L) = p$. That is, $\forall L \in T(\mathbb{R}^n),\ \pi(L) = L(0)$.
The velocity chart for the tangent-line bundle total space $T(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$, is the map from $T(\mathbb{R}^n)$ to $\mathbb{R}^n$ defined by $L \mapsto L(1) - L(0)$. In other words, the velocity chart maps each $L \in T(\mathbb{R}^n)$ to its velocity.
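The projection map and the velocity chart admit a one-line implementation each under the affine-map model of tangent-line vectors. This is an illustrative sketch; the function names `pi` and `beta` are invented here, not the book's notation.

```python
# Sketch (illustrative): the projection map and the velocity chart for the
# tangent-line bundle T(R^n) of Definition 26.12.26, with tangent-line vectors
# modelled as affine maps t |-> p + t v.

def line(p, v):
    return lambda t: tuple(p_i + t * v_i for p_i, v_i in zip(p, v))

def pi(L):
    """Projection: the base point of L, recovered as L(0)."""
    return L(0)

def beta(L):
    """Velocity chart: the velocity of L, recovered as L(1) - L(0)."""
    return tuple(a - b for a, b in zip(L(1), L(0)))

L = line((2.0, -1.0), (0.5, 4.0))
assert pi(L) == (2.0, -1.0)   # base point needs no separate tag
assert beta(L) == (0.5, 4.0)  # velocity component tuple
```

No point-tag is stored anywhere: both the base point and the velocity are computed from the line itself, which is the point of Remark 26.12.22.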
26.12.27 Remark: Tangent-line bundles satisfy the requirements for a tangent bundle.
Since the sets $T_p(\mathbb{R}^n)$ are pairwise disjoint for points $p \in \mathbb{R}^n$, the tangent bundle set $T(\mathbb{R}^n) = \bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)$ is a disjoint union of the tangent sets at the points p of the base space $\mathbb{R}^n$. By analogy with the fibre bundles defined in Chapter 21, the name tangent bundle may be given to the set $T(\mathbb{R}^n)$ if it is given some suitable additional structures.
26.12.28 Remark: Tangent-line bundles are not related to the standard term line bundle.
The term "tangent-line bundle" in this book is different to the term "line bundle" in the fibre bundle literature. In this book, a tangent-line bundle is a bundle of lines at each point of a base set. The tangent objects are one-dimensional in character, but the linear space of such lines at each point in the Cartesian space is n-dimensional. In the fibre bundle literature, a "line bundle" is a vector bundle where the linear space of objects at each point is one-dimensional. Line bundles, in the sense of 1-dimensional vector bundles, are defined by EDM2 [109], 147.F, page 570, for example.
26.13.2 Definition: The direct product identification map for Cartesian space tangent spaces $T_{p_1}(\mathbb{R}^{n_1})$ and $T_{p_2}(\mathbb{R}^{n_2})$, for $n_1, n_2 \in \mathbb{Z}^+_0$ and $p_k \in \mathbb{R}^{n_k}$ for $k = 1, 2$, is the map $i : T_{p_1}(\mathbb{R}^{n_1}) \times T_{p_2}(\mathbb{R}^{n_2}) \to T_p(\mathbb{R}^n)$, where $n = n_1 + n_2$ and $p = \mathrm{concat}(p_1, p_2)$, which is defined by
$$\forall v_1 \in \mathbb{R}^{n_1},\ \forall v_2 \in \mathbb{R}^{n_2},\qquad i(L_{p_1,v_1}, L_{p_2,v_2}) = L_{p,\mathrm{concat}(v_1,v_2)}.$$
26.13.3 Remark: The direct product of two tangent bundles of Cartesian spaces.
Definition 26.13.2 is easily extended to tangent bundles. The concatenation operation is applied in the same
way to the component point-pairs and velocity-pairs. Although this is admittedly almost too simple to
present formally, these definitions are applied to direct products of differentiable manifolds. It is best to first
deal with the low-level formalities here before proceeding to higher-level applications.
The values of $p_1$ and $v_1$ in Definition 26.13.4 are easily obtained from $L_1$ as $p_1 = L_1(0)$ and $v_1 = L_1(1) - L_1(0)$. (See for example Definition 26.12.8.) Similarly, $p_2 = L_2(0)$ and $v_2 = L_2(1) - L_2(0)$. But it would be perhaps too pedantic to write the equation in Definition 26.13.4 more formally as
$$\forall L_1 \in T(\mathbb{R}^{n_1}),\ \forall L_2 \in T(\mathbb{R}^{n_2}),\qquad i(L_1, L_2) = L_{\mathrm{concat}(L_1(0), L_2(0)),\,\mathrm{concat}(L_1(1) - L_1(0),\, L_2(1) - L_2(0))}.$$
26.13.4 Definition: The direct product identification map for Cartesian space tangent bundles $T(\mathbb{R}^{n_1})$ and $T(\mathbb{R}^{n_2})$, for $n_1, n_2 \in \mathbb{Z}^+_0$, is the map $i : T(\mathbb{R}^{n_1}) \times T(\mathbb{R}^{n_2}) \to T(\mathbb{R}^{n_1+n_2})$ which is defined by
$$\forall p_1, v_1 \in \mathbb{R}^{n_1},\ \forall p_2, v_2 \in \mathbb{R}^{n_2},\qquad i(L_{p_1,v_1}, L_{p_2,v_2}) = L_{\mathrm{concat}(p_1,p_2),\,\mathrm{concat}(v_1,v_2)}.$$
26.13.6 Definition: The concatenation of (Cartesian space) tangent vectors $L_{p_k,v_k} \in T(\mathbb{R}^{n_k})$ for $k = 1, 2$ is the tangent vector $i(L_{p_1,v_1}, L_{p_2,v_2}) \in T(\mathbb{R}^{n_1+n_2})$, where i is the identification map in Definition 26.13.4.
Alternative name: (Cartesian space) tangent vector concatenation.
26.13.7 Notation: $(L_1, L_2)$, for tangent vectors $L_1 \in T(\mathbb{R}^{n_1})$ and $L_2 \in T(\mathbb{R}^{n_2})$, denotes the tangent vector concatenation $i(L_1, L_2) \in T(\mathbb{R}^{n_1+n_2})$, where i is the direct product identification map for $T(\mathbb{R}^{n_1})$ and $T(\mathbb{R}^{n_2})$ as in Definition 26.13.4. In other words,
$$\forall p_1, v_1 \in \mathbb{R}^{n_1},\ \forall p_2, v_2 \in \mathbb{R}^{n_2},\qquad (L_{p_1,v_1}, L_{p_2,v_2}) = i(L_{p_1,v_1}, L_{p_2,v_2}) = L_{\mathrm{concat}(p_1,p_2),\,\mathrm{concat}(v_1,v_2)}.$$
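The identification map of Definition 26.13.4 amounts to concatenating base points and concatenating velocities, which can be sketched as follows (illustrative helper names, not the book's notation):

```python
# Sketch (illustrative): the direct product identification map of
# Definition 26.13.4, concatenating base points and velocities of tangent
# vectors on R^{n1} and R^{n2} to get a tangent vector on R^{n1+n2}.

def line(p, v):
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def point(L):
    return L(0)

def velocity(L):
    return tuple(a - b for a, b in zip(L(1), L(0)))

def identify(L1, L2):
    """i(L_{p1,v1}, L_{p2,v2}) = L_{concat(p1,p2), concat(v1,v2)}."""
    return line(point(L1) + point(L2), velocity(L1) + velocity(L2))

L1 = line((1.0,), (2.0,))          # tangent vector on R^1
L2 = line((3.0, 4.0), (5.0, 6.0))  # tangent vector on R^2
L = identify(L1, L2)               # tangent vector on R^3
assert point(L) == (1.0, 3.0, 4.0)
assert velocity(L) == (2.0, 5.0, 6.0)
```

Tuple concatenation in Python (`+` on tuples) plays the role of the book's concat operation.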
26.14.2 Definition: A tangent velocity vector in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is a pair $(p, v) \in \mathbb{R}^n \times \mathbb{R}^n$.
26.14.3 Definition: A tangent velocity vector at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is a tangent velocity vector $(p, v) \in \mathbb{R}^n \times \mathbb{R}^n$.
26.14.4 Definition: The tangent velocity vector at a point p with velocity v in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the tangent velocity vector $(p, v) \in \mathbb{R}^n \times \mathbb{R}^n$.
26.14.5 Definition: The tangent velocity set at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the set $\{p\} \times \mathbb{R}^n = \{(p, v);\ v \in \mathbb{R}^n\}$.
26.14.6 Notation: $T_p(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$ and $p \in \mathbb{R}^n$, denotes the tangent velocity set at p in $\mathbb{R}^n$.
26.14.7 Definition: The tangent velocity space at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the tangent velocity set $T_p(\mathbb{R}^n) = \{p\} \times \mathbb{R}^n$ together with
(1) the vector addition operation $\sigma : T_p(\mathbb{R}^n) \times T_p(\mathbb{R}^n) \to T_p(\mathbb{R}^n)$ with $\sigma : ((p, v_1), (p, v_2)) \mapsto (p, v_1 + v_2)$,
(2) the scalar multiplication operation $\mu : \mathbb{R} \times T_p(\mathbb{R}^n) \to T_p(\mathbb{R}^n)$ with $\mu : (\lambda, (p, v)) \mapsto (p, \lambda v)$.
26.14.8 Definition: The velocity of a tangent vector $(p, v) \in T_p(\mathbb{R}^n)$, at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$, is the n-tuple $v \in \mathbb{R}^n$.
26.14.9 Definition: The tangent velocity bundle (set) on a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the set $\bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)$.
26.14.10 Notation: $T(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$, denotes the tangent velocity bundle set on $\mathbb{R}^n$.
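The tangent velocity representation can be sketched as tagged pairs (p, v), together with an illustrative conversion to and from the tangent-line representation of Section 26.12; all names here are invented for the example.

```python
# Sketch (illustrative): the tangent velocity representation of Section 26.14,
# where a tangent vector is a tagged pair (p, v), with the operations of
# Definition 26.14.7.

def add(t1, t2):
    """sigma((p, v1), (p, v2)) = (p, v1 + v2); both must share the base point p."""
    (p1, v1), (p2, v2) = t1, t2
    assert p1 == p2, "tangent velocity vectors can only be added at the same point"
    return (p1, tuple(a + b for a, b in zip(v1, v2)))

def scale(lam, t):
    """mu(lam, (p, v)) = (p, lam v)."""
    p, v = t
    return (p, tuple(lam * vi for vi in v))

# Conversions between this representation and the tangent-line representation.
def to_line(t):
    p, v = t
    return lambda s: tuple(pi + s * vi for pi, vi in zip(p, v))

def from_line(L):
    return (L(0), tuple(a - b for a, b in zip(L(1), L(0))))

p = (1.0, 1.0)
t1, t2 = (p, (2.0, 0.0)), (p, (0.0, 3.0))
assert add(t1, t2) == (p, (2.0, 3.0))
assert scale(4.0, t1) == (p, (8.0, 0.0))
assert from_line(to_line(t1)) == t1  # both representations carry the same data
```

The explicit point-tag p is exactly the bookkeeping which the tangent-line representation avoids, as noted in Remark 26.12.22.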
One may regard a parametrised line $L : t \mapsto p + tv$ as a test velocity, which is related to the well-known test particle concept. One may move a particle at various velocities v in a small neighbourhood of a point p, and one may observe the work required to move the particle.
As discussed in Remark 28.5.8, a useful inspiration for covectors is the (old-fashioned) work formula dW =
Xdx + Y dy + Zdz, where one tests a force field by moving a point by infinitesimal displacements dx, dy
and dz, and one observes infinitesimal work dW , and the corresponding ratios are denoted X, Y and Z. If
the work rate is given by the linear combination rule for infinitesimal displacements in all directions, then
the work rate is given by an element of the algebraic dual of the tangent space at the point. However, forces are not
always consistent with such a linear force field rule. Even if the force is linear, it may not be integrable as
a conservative force field. Therefore an affine function is not always a suitable model for the force field. In
general, one may expect some rule for the force as a function of displacement, but it may not be linear.
One must also consider that the v in tangent-line vectors represents velocity, not displacement. In physics,
forces are quite often velocity-dependent. Thus the simple rule of the style dW = Xdx + Y dy + Zdz does
not reflect real physics.
It may be concluded that the affine field notion in Section 26.16 has limited applicability. It is better to adopt
the algebraic linear dual as the definition of tangent covectors, while observing that this is not necessarily
universally applicable. The linear dual may be replaced, if necessary, by something else.
26.15.3 Definition: The tangent covector space at a point p in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}^+_0$ is the algebraic dual $T_p(\mathbb{R}^n)^*$ of the tangent-line space $T_p(\mathbb{R}^n)$.
26.15.4 Notation: $T_p(\mathbb{R}^n)^*$, for $n \in \mathbb{Z}^+_0$ and $p \in \mathbb{R}^n$, denotes the tangent covector space at p.
26.15.5 Definition: A tangent covector at a point p in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}^+_0$ is an element of the tangent covector space at p.
26.15.6 Definition: A tangent covector in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is an element of the tangent covector space at some point in $\mathbb{R}^n$.
26.15.7 Definition: The tangent covector at a point p with co-velocity w, for a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ and $w \in \mathbb{R}^n$, is the tangent covector at p which maps L to $\sum_{i=1}^n w_i (L(1) - L(0))_i$.
26.15.8 Notation: $L^*_{p,w}$, for $p, w \in \mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$, denotes the tangent covector at p with co-velocity w.
26.15.9 Remark: The action of tangent covectors on tangent-line vectors.
Combining Notations 26.12.12 and 26.15.8 yields $L^*_{p,w}(L_{p,v}) = \sum_{i=1}^n w_i v_i$ for all $p, v, w \in \mathbb{R}^n$, for $n \in \mathbb{Z}^+_0$.
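The action just described reduces to a dot product of the co-velocity tuple with the velocity tuple of the line, which the following illustrative sketch checks (names invented for the example):

```python
# Sketch (illustrative): the tangent covector of Definition 26.15.7 acting on a
# tangent-line vector: L*_{p,w}(L) = sum_i w_i (L(1) - L(0))_i, which for
# L = L_{p,v} is the dot product sum_i w_i v_i of Remark 26.15.9.

def line(p, v):
    return lambda t: tuple(pi + t * vi for pi, vi in zip(p, v))

def covector(p, w):
    """The tangent covector at p with co-velocity w, as a functional on lines."""
    def act(L):
        velocity = tuple(a - b for a, b in zip(L(1), L(0)))
        return sum(wi * vi for wi, vi in zip(w, velocity))
    return act

p = (0.0, 0.0, 0.0)
w = (1.0, 2.0, 3.0)
v = (4.0, 5.0, 6.0)
assert covector(p, w)(line(p, v)) == 1 * 4 + 2 * 5 + 3 * 6  # = 32.0
```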
26.16. Tangent field covector bundles on Cartesian spaces
26.16.1 Remark: An apparent natural dual for affinely parametrised lines.
An appealing idea for a natural dual of the set of affinely parametrised maps from $\mathbb{R}$ to $\mathbb{R}^n$ is the set of affine maps from $\mathbb{R}^n$ to $\mathbb{R}$. An affine map $L^* : \mathbb{R}^n \to \mathbb{R}$ satisfies $L^*(v_2) - L^*(v_1) = L^*(v_2') - L^*(v_1')$ for all $v_1, v_2, v_1', v_2' \in \mathbb{R}^n$ such that $v_2 - v_1 = v_2' - v_1'$. In other words, $L^*(p_1 + v) - L^*(p_1) = L^*(p_2 + v) - L^*(p_2)$ for all $p_1, p_2, v \in \mathbb{R}^n$. It is not too difficult to show that all such affine maps have the form $L^*_{p,k} : x \mapsto \sum_{i=1}^n k_i (x_i - p_i)$ for some $p, k \in \mathbb{R}^n$.
26.16.3 Definition: A tangent field covector in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is an affine map from $\mathbb{R}^n$ to $\mathbb{R}$.
26.16.4 Definition: A tangent field covector at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is a tangent field covector $L^*$ in $\mathbb{R}^n$ such that $L^*(p) = 0$.
26.16.6 Definition: The tangent field covector set at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the set $\{L^* : \mathbb{R}^n \to \mathbb{R};\ L^*(p) = 0 \text{ and } L^* \text{ is affine}\}$.
26.16.7 Notation: $T^*_p(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$ and $p \in \mathbb{R}^n$, denotes the tangent field covector set at p in $\mathbb{R}^n$.
26.16.8 Definition: The tangent field covector space at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the tangent field covector set $T^*_p(\mathbb{R}^n) = \{L^* : \mathbb{R}^n \to \mathbb{R};\ L^*(p) = 0 \text{ and } L^* \text{ is affine}\}$ together with
(1) the vector addition operation $\sigma : T^*_p(\mathbb{R}^n) \times T^*_p(\mathbb{R}^n) \to T^*_p(\mathbb{R}^n)$ with $\sigma(L^*_1, L^*_2) : x \mapsto L^*_1(x) + L^*_2(x)$ for all $L^*_1, L^*_2 \in T^*_p(\mathbb{R}^n)$, and
(2) the scalar multiplication operation $\mu : \mathbb{R} \times T^*_p(\mathbb{R}^n) \to T^*_p(\mathbb{R}^n)$ with $\mu(\lambda, L^*) : x \mapsto \lambda L^*(x)$ for all $\lambda \in \mathbb{R}$ and $L^* \in T^*_p(\mathbb{R}^n)$.
26.16.9 Definition: The co-velocity of a tangent covector $L^* \in T^*_p(\mathbb{R}^n)$, at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$, is the n-tuple $k \in \mathbb{R}^n$ such that $L^*(x) = \sum_{i=1}^n k_i (x_i - p_i)$ for all $x \in \mathbb{R}^n$.
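As an illustrative sketch (names invented for the example), a tangent field covector and its co-velocity can be modelled directly from Definitions 26.16.4 and 26.16.9:

```python
# Sketch (illustrative): a tangent field covector at p is an affine map
# L* : R^n -> R with L*(p) = 0, i.e. x |-> sum_i k_i (x_i - p_i).  The tuple k
# is its co-velocity, recoverable by evaluating along the coordinate axes.

def field_covector(p, k):
    return lambda x: sum(ki * (xi - pi) for ki, xi, pi in zip(k, x, p))

def co_velocity(Lstar, p):
    """Recover k from L* via k_i = L*(p + e_i), since L* is affine and L*(p) = 0."""
    n = len(p)
    return tuple(
        Lstar(tuple(p[j] + (1.0 if j == i else 0.0) for j in range(n)))
        for i in range(n)
    )

p, k = (1.0, 2.0), (3.0, -4.0)
Lstar = field_covector(p, k)
assert Lstar(p) == 0.0             # the covector vanishes at its base point
assert co_velocity(Lstar, p) == k  # the co-velocity is recoverable from L*
```

Note that k is recoverable from $L^*$ only because p is given: as the following remark explains, the base point itself cannot be recovered from the affine map alone when $n \geq 2$.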
26.16.10 Remark: Affine functionals on Cartesian space lack a well-defined start-point.
Although the set of linear functionals on a finite-dimensional linear space is in a dual relation to the linear space, this convenient duality is not available between affine lines and affine functionals. The problem is that a single affine function $L^* : \mathbb{R}^n \to \mathbb{R}$ is zero for more than one point $p \in \mathbb{R}^n$ when $n \geq 2$. So one cannot define the tangent field covector bundle total space as the set $\bigcup_{p \in \mathbb{R}^n} T^*_p(\mathbb{R}^n)$ as in the case of the tangent-line bundle total space $\bigcup_{p \in \mathbb{R}^n} T_p(\mathbb{R}^n)$ in Definition 26.12.23. Given a line $L : \mathbb{R} \to \mathbb{R}^n$, the base point p is unambiguously $p = L(0)$. There is no such formula for a unique base point for affine maps $L^* : \mathbb{R}^n \to \mathbb{R}$ when $n \geq 2$. Consequently one must add tags to the affine maps. Definitions 26.16.11 and 26.16.13, and Notations 26.16.12 and 26.16.14, are too clumsy to use in practice.
26.16.11 Definition: The tagged tangent field covector set at a point p in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the set $\{(p, L^*);\ L^* : \mathbb{R}^n \to \mathbb{R},\ L^*(p) = 0 \text{ and } L^* \text{ is affine}\}$.
26.16.12 Notation: $T^*_p(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$ and $p \in \mathbb{R}^n$, denotes the tagged tangent field covector set at p in $\mathbb{R}^n$.
26.16.13 Definition: The tagged tangent field covector bundle (set) on a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}^+_0$ is the set $\bigcup_{p \in \mathbb{R}^n} T^*_p(\mathbb{R}^n)$.
26.16.14 Notation: $T^*(\mathbb{R}^n)$, for $n \in \mathbb{Z}^+_0$, denotes the tangent field covector bundle set on $\mathbb{R}^n$.
26.17. Parametrised lines in ancient Greek mathematics
26.17.1 Remark: Euclid's geometry was static. Archimedes' geometry was dynamic.
It is perhaps noteworthy that none of the definitions of lines or any other geometrical figures in Euclid's Elements have a time parameter or any other parameter. Euclid's geometry is static. (See Euclid/Heath [195], pages 153, 370; [196], pages 1, 78, 113, 188, 277; [197], pages 10, 101, 177, 260.) By contrast, Archimedes freely uses a time parameter in the definition of a spiral for example. (See Archimedes/Heath [183], page 154.)
If a straight line of which one extremity remains fixed be made to revolve at a uniform rate in a plane
until it returns to the position from which it started, and if, at the same time as the straight line
revolves, a point move at a uniform rate along the straight line, starting from the fixed extremity,
the point will describe a spiral in the plane.
Archimedes also freely uses time parametrisation for straight lines. Following are the first two theorems and
their proofs in the Archimedes treatise On Spirals. (See Archimedes/Heath [183], page 155.)
Proposition 1.
If a point move at a uniform rate along any line, and two lengths be taken on it, they will be
proportional to the times of describing them.
Two unequal lengths are taken on a straight line, and two lengths on another straight line representing the times; and they are proved to be proportional by taking equimultiples of each length
and the corresponding time after the manner of Eucl. V. Def. 5.
Proposition 2.
If each of two points on different lines respectively move along them each at a uniform rate, and if
lengths be taken, one on each line, forming pairs, such that each pair are described in equal times,
the lengths will be proportionals.
This is proved at once by equating the ratio of the lengths taken on one line to that of the times of
description, which must also be equal to the ratio of the lengths taken on the other line.
It seems that the modern definition of a tangent vector in differential geometry is more closely related to Euclid's static definition than to Archimedes' dynamic definition. Since differential geometry is based entirely on tangent vectors as the most fundamental concept upon which all other concepts are built, it seems desirable to have a satisfying definition rather than a discordant set of definitions, neither of which is satisfying. (See Remark 51.3.2 for a survey of definitions of tangent vectors on differentiable manifolds.) Definition 26.12.15 is in effect an upgrade of tangent vector definitions from Euclid (about 300 BC) to Archimedes (about 250 BC).
Here are the first three formal definitions from On Spirals by Archimedes. (See Archimedes/Heath [183], pages 165–166.)
Definitions.
1. If a straight line drawn in a plane revolve at a uniform rate about one extremity which remains
fixed and return to the position from which it started, and if, at the same time as the line revolves,
a point move at a uniform rate along the straight line beginning from the extremity which remains
fixed, the point will describe a spiral in the plane.
2. Let the extremity of the straight line which remains fixed while the straight line revolves be
called the origin of the spiral.
3. And let the position of the line from which the straight line began to revolve be called the initial
line in the revolution.
The time parameter is clearly in the foreground of the thinking in Archimedes' definitions of both straight
lines and curves. This shows clearly the difference between Euclid and Archimedes. Euclid was more of a
pure geometer while Archimedes was more of a mathematical physicist. The former was more concerned
with static displacements and angles while the latter was more concerned with velocities. Many difficulties
in differential geometry disappear when tangent vectors are viewed as velocities rather than displacements.
Thus Definition 26.12.15 could be thought of as the Archimedean tangent space whereas the velocity in
Definition 26.12.18 could be referred to as the corresponding Cartesian tangent vector.
26.17.2 Remark: Archimedes' dynamic definition of a spiral.
Figure 26.17.1 illustrates Proposition 20 of the Archimedes treatise On Spirals. (See Archimedes/Heath [183], pages 174–176.)
[Figure 26.17.1: the spiral curve through P with origin O, the tangent line to the spiral at P, and the point T on the line through O at right angles to OP.]
Proposition 20 in Archimedes' On Spirals shows that the length of the line OT in the diagram is equal to the length of the arc KRP. This effectively converts the geometric construction for the tangent line TP to the spiral curve OP at P into a number, namely the ratio of the length of OT to the length of the radius OP. This ratio is the cotangent of the angle shown in the diagram. Thus the tangent line TP may be constructed either as the unique line through P which cuts the curve only once, or by measuring the specified distance along the line OT at right angles to OP. This is an example of how, even in about 250 BC, tangents to curves were being thought of in the two alternative ways: the geometrical and the numerical.
26.17.4 Remark: Generalisations of ancient Greek definitions of tangents.
The word "tangent" does mean "touching", but Euclid's definition of a tangent line to a curve is that it intersects the curve only once (even if extended infinitely in both directions). This definition corresponds to the modern tangent concept in the case of the boundary of a bounded convex open set.
The simple ideas of touching and cutting only once yield some interesting generalisations. For example,
a triangle may have an infinite number of tangents at one of its vertices. Any infinite line which intersects
a vertex but does not intersect any other point of the triangle may be considered to be a tangent because it
does indeed touch the triangle. This kind of tangent line is closely related to the tangent cone concept.
The vertex of a triangle does not seem to have a genuine tangent line because we generally think of tangents
as being unique, being aligned with the unique infinitesimal direction of the curve. It could be argued,
however, that a vertex does pass through an infinite number of directions within a zero distance, as if it were
an arc of a circle with zero radius.
The idea of circles being tangent to each other is defined by Euclid. (See Remark 51.3.12.) This suggests
a generalisation from first-order to second-order tangency. That is, whereas a tangent line merely has the
same direction as the curve where it touches, a tangent circle would have both the same direction and radius
of curvature where it touches. Of course, this implies that the curve must have a well-defined direction and
curvature at the tangent point. Second-order tangency may be generalised to higher-dimensional surfaces,
using ellipsoids as tangential objects. This kind of tangency is well known in the study of the curvature of
embedded manifolds.
Another way to generalise tangents is to replace tangent lines with other kinds of geometrical objects.
For example, the infinite line may be replaced by a semi-infinite line. Generally a closed convex region
will have an infinite number of such tangent semi-infinite lines, and the union of these lines will form a
cone. This suggests the definition of exterior tangent cones. These are used in the analysis of boundary
value problems for partial differential equations, where an exterior (or interior) cone condition may help to
guarantee existence of solutions to some classes of problems. Similarly, exterior (or interior) sphere conditions
can be useful.
Chapter 27
Multilinear algebra
27.0.1 Remark: Tensor algebra is perhaps more accurately called multilinear algebra.
The subject of Chapters 27–30 is multilinear algebra, but for historical reasons, it is generally referred to as
tensor algebra.
27.0.2 Remark: It is helpful to keep tensor algebra and tensor analysis separate.
Chapters 27–30 present only the algebra of tensor spaces. The analysis of tensor spaces is delayed until
Chapter 45 because analytical tensor space concepts such as differential forms (Section 45.2) and the exterior
derivative (Section 45.6) require differential and integral calculus.
An example is a small antenna in an electric field, or a small loop of wire in a magnetic field, or a small
lump of matter in a gravitational field, or a particle of an elastic solid subjected to stress. The response of
a particle to a multilinear field can be quantified as a tensorial state of the particle. The effects of fields on
larger amounts of matter may be calculated by integrating the response of notional individual particles over
a volume of matter. (Multilinear functions are defined in Section 27.1.)
27.0.4 Remark: Tensor products of vectors are not really products of vectors.
Probably the usual method of introducing tensors as products of vectors is the root cause of the complexity
and incomprehensibility of tensor algebra. Products of vectors are really only incidental to tensor algebra.
Multilinear functions are the core concept of tensor algebra. Covariant tensors are multilinear functions, no
more, no less. Contravariant tensors are the duals of multilinear functions. In other words, a contravariant
tensor is a linear functional on a space of multilinear functions.
It just happens that a product of vectors yields a particularly simple kind of linear functional on a space of
multilinear functions. Products of vectors only have significance to tensor algebra via their association with
the special kinds of contravariant tensors called simple tensors or monomial tensors.
Tensor algebra may be thought of as having two kinds of entities: fields and tensors. The fields are multilinear
functions and the tensors are the responses to those fields. Trying to think of these responses (i.e. linear
functionals on spaces of multilinear functions) as sums of products of vectors is a bit like subdividing a
sphere into cubes, which is in fact how practical integration is generally done in Cartesian coordinates.
The sphere does not easily subdivide into cubes. Nor do tensors naturally subdivide into products of vectors.
In tensor algebra, it is the covariant tensors which are fundamental. Contravariant tensors are a secondary
construction, defined as the dual of covariant tensors. This could possibly provide some slender justification
for the mystifying use of the terms covariant and contravariant in differential geometry. It is more likely,
however, that the term contravariant refers to the components of vectors with respect to a given basis. The
components of a vector do vary contravariantly with respect to the sequence of basis vectors. In the tensor
calculus which was developed in the last decade of the 19th century by Ricci-Curbastro and Levi-Civita,
which was heavily oriented towards tensor component arrays, there would have been a natural inclination
to regard the component arrays as being the real tensors, just as in the theory of linear spaces as taught
at an elementary level, vectors are usually identified with n-tuples of numbers. In terms of coordinates,
contravariant vectors and tensors do vary in a contrary fashion to the linear space basis. Unfortunately,
this confusion of vectors and tensors with their component arrays has led to somewhat careless usage of
the terms covariant and contravariant for the vectors and tensors themselves. (See also Remark 28.5.7
regarding this terminology issue.)
27.0.5 Remark: Tensors are inevitably unpleasant. There is no Royal Road to tensors!
Gallot/Hulin/Lafontaine [13], page 36, say the following about tensors.
It is one of the unpleasant tasks for the differential geometers to define them!
They're not exaggerating! Defining tensors is unpleasant for both the writer and the reader, and Chapters 27–30 are no exception. The presentation here seeks the deeper truth underlying tensor algebra and attempts
to fill some of the gaps in typical introductions. This requires a substantial investment of effort when
developing the basic definitions and properties. But hopefully the result will be greater confidence and
certainty in the applications of tensor algebra to differential geometry. The calculations will be the same,
but better understanding should yield more accurate decisions about which calculations to make.
All things considered, the tensor algebra which is presented in Chapters 27–30 is far too tedious for the vast
majority of readers. Although the gaps and short-cuts in the majority of texts are discomforting to any
reader who wants to know the whole truth about tensors, the whole truth is an administrative nightmare of
indices and multi-indices. So all in all, it may not be such a bad thing to think of tensors as arrays which
obey certain transformation rules (as the elementary tensor calculus texts say), or as multilinear functions
of tuples of vectors and linear functionals (as many other texts say).
A logically correct presentation of multilinear and tensor algebra and analysis, including exterior algebra
and exterior calculus, seems to be unavoidably painful. For practical applications, it is probably better to
ignore the precise underlying logic and use intuition and guesswork instead.
27.0.7 Remark: Tensor algebra requires coordinates, but the tensor concept is coordinate-independent.
The use of coordinates in differential geometry is much maligned. But many of the most basic steps in
the development of tensor algebra depend heavily on basis-and-coordinates expressions because multilinear
functions and tensors have natural decompositions into linear combinations of simple multilinear functions
(in Section 27.2) and simple tensors (in Section 28.4) respectively. Only those constructions and properties
which are independent of the choices of bases are regarded as authentic. So when bases and coordinates are
used in a construction or proof, it must be verified that the result is basis-independent.
One of the guiding objectives of an introduction to tensor algebra is the need to build up mathematical
machinery which hides the coordinates as soon as possible. Thus coordinates are used where necessary in
the early development in the earnest hope that they will be politely hidden in the later development and
applications. Of course, coordinates inevitably reappear in practical calculations.
27.1. Multilinear maps
27.1.1 Remark: Scalar-valued multilinear maps are called multilinear functions.
A multilinear map is a scalar-valued or vector-valued function of vector variables which depends linearly on each variable individually. "Scalar-valued" may be thought of as a special case of "vector-valued" because a field may be regarded as a one-dimensional linear space over itself. (See Definition 22.1.7.)
A scalar-valued multilinear map will be referred to as a multilinear function. A multilinear function is not,
strictly speaking, a particular kind of multilinear map because a field is not, strictly speaking, a particular
kind of linear space. But this choice of terminology is convenient and unambiguous in most contexts. (Of course, "function" and "map" are synonyms. These words are used here only as hints that the target space is a field or a linear space respectively. This is similar to the way in which the synonyms "set" and "collection" are often used as hints.)
The main reason for defining multilinear functions and multilinear maps together is to avoid repetition of
very similar-looking definitions and theorems. However, multilinear functions have some special properties
not enjoyed by general multilinear maps. For example, the duals of multilinear function spaces are tensor
spaces, which have very broad applications, whereas general multilinear maps do not have such useful dual
spaces. However, vector-valued multilinear maps are very useful in their role as vector-valued forms in
physics, including Lie-algebra-valued forms in particular.
27.1.2 Remark: A function of r variables is the same as a function of a single r-tuple variable.
A function of many variables is (almost always) formalised as a function of a single variable which is a tuple.
For example, a function with variables $v_1, v_2, \ldots, v_r$ is formalised as a function of the tuple $(v_1, v_2, \ldots, v_r)$. (See Remark 10.2.10.) Thus a function with domain $\prod_{\alpha \in A} V_\alpha$ is, strictly speaking, a function of a single variable, but it is thought of as a function of $\#(A)$ variables.
27.1.3 Definition: A multilinear map from the Cartesian product $\prod_{\alpha \in A} V_\alpha$ of a finite family of linear spaces $(V_\alpha)_{\alpha \in A}$ over a field $K$ to a linear space $U$ over $K$ is a map $f : \prod_{\alpha \in A} V_\alpha \to U$ such that
$$\forall \beta \in A,\ \forall \lambda, \mu \in K,\ \forall u, v, w \in \prod_{\alpha \in A} V_\alpha,$$
$$\bigl(u_\beta = \lambda v_\beta + \mu w_\beta \text{ and } \forall \alpha \in A \setminus \{\beta\},\ u_\alpha = v_\alpha = w_\alpha\bigr) \implies f(u) = \lambda f(v) + \mu f(w). \tag{27.1.1}$$
A multilinear function is a multilinear map whose target space $U$ is the field $K$ regarded as a linear space.
27.1.4 Remark: Interpretation of the multilinear map concept.
Definition 27.1.3 says that a function $f : \prod_{\alpha \in A} V_\alpha \to U$ is multilinear if $f$ is linear with respect to each space $V_\beta$ individually for fixed values of the other parameters. Theorem 27.1.28 expresses line (27.1.1) in a perhaps intuitively clearer way.
The linearity of a multilinear map with respect to each component of its domain is illustrated in Figure 27.1.1. This diagram shows three linear relations. The function value $f(v_1', v_2, v_3) \in U$ is related linearly to $v_1' \in V_1$ for each fixed $v_2 \in V_2$ and $v_3 \in V_3$. Similarly, $f(v_1, v_2', v_3)$ is a linear function of $v_2'$ for fixed $v_1$ and $v_3$, and $f(v_1, v_2, v_3')$ is a linear function of $v_3'$ for fixed $v_1$ and $v_2$.
[Figure 27.1.1: Linearity of a multilinear map $f : V_1 \times V_2 \times V_3 \to U$ with respect to domain components. Diagram omitted.]
27.1.5 Example: Application of the multilinear map condition for two finite-dimensional spaces.
If one does not know the answer in advance, it is not entirely obvious how much flexibility remains for a
map after the multilinearity condition in line (27.1.1) has been applied. In particular, it is not immediately
obvious that the set of multilinear maps for a finite product of finite-dimensional linear spaces has the
structure of a finite-dimensional linear space. (See Theorem 27.4.14 for a proof of this assertion.)
Let $A = \mathbb{N}_2$ and $K = \mathbb{R}$. Let $V_1 = \mathbb{R}^2$ and $V_2 = \mathbb{R}^3$. Then $\prod_{\alpha \in A} V_\alpha = \mathbb{R}^2 \times \mathbb{R}^3$. (Note the pedantic subtlety here that strictly speaking, $\prod_{\alpha \in A} V_\alpha$ is the set of maps $g : A \to V_1 \cup V_2$ such that $g_1 \in V_1$ and $g_2 \in V_2$ by Definition 10.11.2, whereas here the simple set of ordered pairs in Definition 9.7.2 is used. This observation is in the weeds of mathematical formalism. So it does not need to be considered!)
Let $U$ be the linear space $\mathbb{R}$. By Definition 27.1.3, a multilinear map from $\mathbb{R}^2 \times \mathbb{R}^3$ to $U = \mathbb{R}$ is then a map $f : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$ which satisfies line (27.1.1). Let $\beta = 1$. Let $\lambda, \mu \in \mathbb{R}$. Let $u, v, w \in \mathbb{R}^2 \times \mathbb{R}^3$. Let $u_1 = \lambda v_1 + \mu w_1 \in \mathbb{R}^2$ and $u_2 = v_2 = w_2 \in \mathbb{R}^3$. Then line (27.1.1) requires that $f(u) = \lambda f(v) + \mu f(w)$.
Thus for each fixed $u_2 \in \mathbb{R}^3$, the map $f_{1,u_2} : u_1 \mapsto f(u_1, u_2)$ is linear from $\mathbb{R}^2$ to $\mathbb{R}$. So by Theorem 23.7.10, $\forall u_2 \in \mathbb{R}^3,\ \forall u_1 \in \mathbb{R}^2,\ f_{1,u_2}(u_1) = f_{1,u_2}(e_{1,1})h_1^1(u_1) + f_{1,u_2}(e_{1,2})h_1^2(u_1)$, where $e_{1,1} = (1, 0)$ and $e_{1,2} = (0, 1)$ are the standard basis vectors for $\mathbb{R}^2$, and $h_1^1(u_1)$ and $h_1^2(u_1)$ are the components $u_1^1$ and $u_1^2$ of $u_1 = (u_1^i)_{i=1}^2$ respectively. Consequently, if only this $\beta = 1$ condition is applied, $f$ would only be required to satisfy $\forall u_1 \in \mathbb{R}^2,\ \forall u_2 \in \mathbb{R}^3,\ f(u_1, u_2) = f(e_{1,1}, u_2)h_1^1(u_1) + f(e_{1,2}, u_2)h_1^2(u_1)$. This would allow enormous freedom for $f$ because the numbers $f(e_{1,1}, u_2)$ and $f(e_{1,2}, u_2)$ would be completely arbitrary functions of $u_2 \in \mathbb{R}^3$. In fact, for an arbitrary linear-functional-valued function $g : \mathbb{R}^3 \to (\mathbb{R}^2)^*$, the function $f : (u_1, u_2) \mapsto g(u_2)(u_1)$ satisfies Definition 27.1.3 for $\beta = 1$. Moreover, Theorem 23.7.10 implies that any function $f : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$ which satisfies Definition 27.1.3 for $\beta = 1$ must have this form.
The condition for $\beta = 2$ can now be applied for $u_1 \in \mathbb{R}^2 \setminus \{0\}$ to the function $f : (u_1, u_2) \mapsto g(u_2)(u_1)$ for any linear-functional-valued function $g : \mathbb{R}^3 \to (\mathbb{R}^2)^*$ to discover that $g$ is a linear linear-functional-valued function with respect to the linear space structure on the dual space $(\mathbb{R}^2)^*$ given in Definition 23.6.4. Since $g$ must have the form $g(u_2) : u_1 \mapsto g(u_2)(e_{1,1})h_1^1(u_1) + g(u_2)(e_{1,2})h_1^2(u_1)$ for all $u_1 \in \mathbb{R}^2$, and this expression must be linear with respect to $u_2$, it follows that both $u_2 \mapsto g(u_2)(e_{1,1})$ and $u_2 \mapsto g(u_2)(e_{1,2})$ must be linear functions from $\mathbb{R}^3$ to $\mathbb{R}$, which implies by Theorem 23.7.10 that $g(u_2)(e_{1,i}) = \sum_{j=1}^3 g(e_{2,j})(e_{1,i})u_2^j$ for $i \in \mathbb{N}_2$. Therefore $f(u_1, u_2)$ is determined for all $(u_1, u_2) \in \mathbb{R}^2 \times \mathbb{R}^3$ by the six numbers $g(e_{2,j})(e_{1,i}) = f(e_{1,i}, e_{2,j})$. This suggests that the set of multilinear functions $f : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$ constitutes a six-dimensional linear space. In fact, $f(u_1, u_2) = \sum_{i=1}^2 \sum_{j=1}^3 f(e_{1,i}, e_{2,j})u_1^i u_2^j$ for all $(u_1, u_2) \in \mathbb{R}^2 \times \mathbb{R}^3$.
In terms of the standard bases and coordinates, every multilinear function $f : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$ has the form $f(u_1, u_2) = \sum_{i,j=1}^{2,3} a_{ij} u_1^i u_2^j$ for all $(u_1, u_2) \in \mathbb{R}^2 \times \mathbb{R}^3$ for some array $[a_{ij}]_{i=1,j=1}^{2,3} \in \mathbb{R}^{2 \times 3}$. This seems much simpler to understand than Definition 27.1.3. So it is not at all surprising that so many authors prefer to think of multilinear functions (and also tensors) as arrays of numbers with suitable manipulation rules. Nevertheless, an abstract approach is adopted here, even though this is often painful.
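As a numerical illustration of Example 27.1.5 (a Python/NumPy sketch, not part of the text), the following checks that a bilinear function on $\mathbb{R}^2 \times \mathbb{R}^3$ is determined by the six numbers $a_{ij} = f(e_{1,i}, e_{2,j})$, is linear in each argument separately, and is not linear on the product space:

```python
import numpy as np

rng = np.random.default_rng(0)

# A bilinear function f : R^2 x R^3 -> R is determined by the 2x3 array
# a[i, j] = f(e_{1,i}, e_{2,j}) of its values on pairs of standard basis vectors.
a = rng.standard_normal((2, 3))

def f(u1, u2):
    # f(u1, u2) = sum_{i,j} a[i, j] * u1[i] * u2[j] = u1^T a u2
    return u1 @ a @ u2

# Linearity in the first argument (the beta = 1 condition), second held fixed.
lam, mu = 2.0, -3.0
v1, w1 = rng.standard_normal(2), rng.standard_normal(2)
u2 = rng.standard_normal(3)
assert np.isclose(f(lam * v1 + mu * w1, u2), lam * f(v1, u2) + mu * f(w1, u2))

# Linearity in the second argument (the beta = 2 condition), first held fixed.
v2, w2 = rng.standard_normal(3), rng.standard_normal(3)
u1 = rng.standard_normal(2)
assert np.isclose(f(u1, lam * v2 + mu * w2), lam * f(u1, v2) + mu * f(u1, w2))

# f is NOT linear on the product space: scaling both arguments by 2
# multiplies the value by 4, not by 2.
assert np.isclose(f(2 * u1, 2 * u2), 4 * f(u1, u2))
```

The array representation $f(u_1, u_2) = u_1^{\mathrm{T}} A u_2$ is exactly the "arrays of numbers with suitable manipulation rules" viewpoint mentioned above.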
27.1.6 Remark: Comparison of multilinear maps with partial derivatives of functions.
Initially it may be difficult to acquire an intuition for multilinear maps. It may be helpful to consider
the analogy to partial differentiation of real functions of several variables. The partial derivative $\partial_i f(a) = \partial f(x)/\partial x^i\big|_{x=a}$ of a continuously differentiable real-valued function $f : \mathbb{R}^n \to \mathbb{R}$, with respect to each $i \in \mathbb{N}_n$ for $n \in \mathbb{Z}^+$, is calculated as the limit of the differential quotient of $f$ with respect to the variable $x^i$ while all other variables are kept constant. Thus $\partial_i f(a) = \lim_{h \to 0} h^{-1}\bigl(f(a + he_i) - f(a)\bigr)$ for the standard unit vectors $e_i$. The existence of the partial derivatives requires only that the differentials $f(a + he_i) - f(a)$ be approximately linear in the limit as $h \to 0$.
27.1.7 Remark: Multilinear maps on infinite families of linear spaces are useless.
Definition 27.1.3 explicitly excludes an infinite set A. This is a necessary assumption. To see why, consider
the consequences of Theorem 27.1.31 (ii) in the case that $A$ is an infinite set. If any element $v_\beta$ of the family of vectors $v = (v_\alpha)_{\alpha \in A}$ is zero, then every multilinear function $f \in L((V_\alpha)_{\alpha \in A}; U)$ has value $f(v) = 0$. Therefore if $v \in \prod_{\alpha \in A} V_\alpha$ satisfies $f(v) \ne 0$, then $v_\alpha \ne 0$ for all $\alpha \in A$. Define $w \in \prod_{\alpha \in A} V_\alpha$ by $w_\alpha = 2v_\alpha$. Then the application of linearity to each index $\alpha$ gives a factor of 2 between $f(w)$ and $f(v)$. Therefore $f(w) = 2^\infty f(v) \notin U$ if the characteristic of $K$ equals 0. (Fields with non-zero characteristic cause other problems for multilinear functions.) It follows that $L((V_\alpha)_{\alpha \in A}; U) = \{0\}$. In other words, there is only one possible multilinear function, namely the function $f$ with value $f(v) = 0 \in U$ for all $v \in \prod_{\alpha \in A} V_\alpha$. This is clearly useless, although it does not appear to be logically self-contradictory.
27.1.8 Remark: Combined multinomial-style conditions for multilinearity.
Theorems 27.1.9 and 27.1.10 give equivalent criteria for the multilinearity in Definition 27.1.3. These criteria have the advantage of resembling the usual rules for linear maps acting on linear combinations of vectors. (See Remark 23.1.2.) Expressions such as $\bigl(\sum_{v \in V_\alpha} \lambda_\alpha(v)v\bigr)_{\alpha \in A}$ and $(\lambda_1 v^1_\alpha + \lambda_2 v^2_\alpha)_{\alpha \in A}$ may be thought of as "multilinear combinations". (The expression $\mathrm{Fin}(S, K)$ denotes the set of functions from a set $S$ to a field $K$ which are zero outside a finite set. See Notation 22.2.25.) The formulas in Theorems 27.1.9 and 27.1.10 are probably too convoluted for practical use.
27.1.9 Theorem: Let $(V_\alpha)_{\alpha \in A}$ be a finite family of linear spaces over a field $K$ and let $U$ be a linear space over $K$. Then a map $f : \prod_{\alpha \in A} V_\alpha \to U$ is multilinear if and only if
$$\forall \lambda \in \prod_{\alpha \in A} \mathrm{Fin}(V_\alpha, K),\quad f\Bigl(\Bigl(\sum_{v \in V_\alpha} \lambda_\alpha(v)v\Bigr)_{\alpha \in A}\Bigr) = \sum_{w \in \prod_{\alpha \in A} V_\alpha} \Bigl(\prod_{\alpha \in A} \lambda_\alpha(w_\alpha)\Bigr) f\bigl((w_\alpha)_{\alpha \in A}\bigr).$$
Proof: The result follows by substituting suitable values for $\lambda$ in Theorem 27.1.9.
27.1.11 Remark: Multilinearity is not a superficial extension of the linearity concept.
If one looks long enough and closely enough at the conditions for multilinear maps in Definition 27.1.3 and
Theorems 27.1.9, 27.1.10 and 27.1.28, one may conclude that the concept of a multilinear map is not a
simple generalisation of the linear map concept. As mentioned in Remark 27.1.6, there are some similarities
between multilinear functions and partial derivatives of functions of several variables. In the same way that
partial differential equations are not a superficial extension of ordinary differential equations, multilinear
maps are not a superficial extension of the linear map concept.
27.1.12 Remark: Well-definition of multilinearity in extreme boundary cases.
The Cartesian product $\prod_{\alpha \in A} V_\alpha$ of linear spaces $(V_\alpha)_{\alpha \in A}$ in Definition 27.1.3 is always non-empty. If $A = \emptyset$, then $\prod_{\alpha \in A} V_\alpha = \{\emptyset\} \ne \emptyset$. (See Theorem 10.11.6 (i).) The empty function $\emptyset \in \prod_{\alpha \in A} V_\alpha$ is the "all-zeros tuple". (Since the tuple is empty, all of its elements are zero!)
In the case $A = \emptyset$, all functions $f : \prod_{\alpha \in A} V_\alpha \to U$ are multilinear. Therefore $L((V_\alpha)_{\alpha \in A}; U) = U^{\{\emptyset\}}$, which is the set of all functions which map $\emptyset$ to an element of $U$. Thus $L((V_\alpha)_{\alpha \in A}; U)$ may be identified with the set $U$ when $A = \emptyset$. When $U$ is a field $K$ regarded as a linear space, the set $L((V_\alpha)_{\alpha \in A}; K)$ may be identified with $K$ when $A = \emptyset$.
27.1.13 Definition: The degree of a multilinear map $f : \prod_{\alpha \in A} V_\alpha \to U$ for a finite family of linear spaces $(V_\alpha)_{\alpha \in A}$ over a field $K$ to a linear space $U$ over $K$ is the integer $\#(A)$.
27.1.15 Remark: Alternative terminology for the degree of a multilinear map.
The term "degree" is well-chosen because, as mentioned in Remark 27.2.8, multilinear functions (with range $K$) with finite-dimensional domain component spaces are polynomials with homogeneous degree. As
mentioned in Remark 28.1.14, some authors use other terms for the degree. (See Table 29.5.1 in Remark 29.5.8
for some variant terminology.)
27.1.16 Definition: Let $f : \prod_{\alpha \in A} V_\alpha \to U$ be a multilinear map as in Definition 27.1.3.
When $\#(A) = 2$, the map $f$ is said to be bilinear.
When $\#(A) = 3$, the map $f$ is said to be trilinear, and so forth.
When $\#(A) = r$ for $r \in \mathbb{Z}_0^+$, the map $f$ is said to be $r$-linear.
27.1.17 Notation: $L((V_\alpha)_{\alpha \in A}; U)$ for a finite family of linear spaces $(V_\alpha)_{\alpha \in A}$ over a field $K$, and a linear space $U$ over $K$, denotes the set of multilinear maps from $\prod_{\alpha \in A} V_\alpha$ to $U$.
27.1.18 Notation: $L(V_1, \ldots, V_m; U)$ for a sequence of $m$ linear spaces $(V_i)_{i=1}^m$ over a field $K$ with $m \in \mathbb{Z}_0^+$, and a linear space $U$ over $K$, denotes the set $L((V_\alpha)_{\alpha \in A}; U)$ with $A = \mathbb{N}_m = \{1, \ldots, m\}$.
27.1.19 Notation: $L_m(V; U)$ for $m \in \mathbb{Z}_0^+$ and linear spaces $V$ and $U$ over a field $K$ denotes the set of multilinear maps $L((V_\alpha)_{\alpha \in A}; U)$ with $A = \mathbb{N}_m$ and $V_\alpha = V$ for all $\alpha \in A$.
27.1.20 Remark: Schematic diagrams for spaces of multilinear maps and multilinear functions.
Notations 27.1.17, 27.1.18 and 27.1.19 are illustrated in Figure 27.1.2. (These diagrams may be compared
with the diagrams for tensor spaces in Figure 28.1.1 in Remark 28.1.3.)
The notations $L((V_\alpha)_{\alpha \in A}; K)$, $L(V_1, \ldots, V_m; K)$ and $L_m(V; K)$ are used in case the linear space $U$ in Notations 27.1.17, 27.1.18 and 27.1.19 respectively is the field $K$ regarded as a linear space, as defined in Definition 22.1.7. A case of particular interest for differential geometry is, of course, $K = \mathbb{R}$.
[Figure 27.1.2: Notations for spaces of multilinear maps: $L((V_\alpha)_{\alpha \in A}; U)$, $L(V_1, \ldots, V_m; U)$ and $L_m(V; U)$. Diagram omitted.]
Notations 27.1.17, 27.1.18 and 27.1.19 may be abbreviated to Notations 27.1.21, 27.1.22 and 27.1.23 in case
U is the field K regarded as a linear space. These abbreviations are illustrated in Figure 27.1.3.
[Figure 27.1.3: Notations for spaces of multilinear functions: $L((V_\alpha)_{\alpha \in A})$, $L(V_1, \ldots, V_m)$ and $L_m(V)$. Diagram omitted.]
27.1.21 Notation: $L((V_\alpha)_{\alpha \in A})$ for a finite family of linear spaces $(V_\alpha)_{\alpha \in A}$ over a field $K$ denotes the set of multilinear maps $L((V_\alpha)_{\alpha \in A}; K)$.
27.1.22 Notation: $L(V_1, \ldots, V_m)$ for a sequence of $m$ linear spaces $(V_i)_{i=1}^m$ over a field $K$ with $m \in \mathbb{Z}_0^+$ denotes the set of multilinear maps $L(V_1, \ldots, V_m; K)$.
27.1.23 Notation: $L_m(V)$ for $m \in \mathbb{Z}_0^+$ and a linear space $V$ over a field $K$ denotes the set of multilinear maps $L_m(V; K)$.
A set $L((V_\alpha)_{\alpha \in A}; U)$ of degree 1 may be identified with $\mathrm{Lin}(V_\beta, U)$, the set of linear maps from $V_\beta$ to $U$, where $A = \{\beta\}$. (See Section 23.1 for linear maps.) Then for $f \in L((V_\alpha)_{\alpha \in A}; U)$, the domain of $f$ is the set $\prod_{\alpha \in A} V_\alpha = \{(\beta, v);\ v \in V_\beta\}$, which is not exactly the same as the set $V_\beta$. However, it is customary to identify the set $\{\beta\} \times V_\beta$ with $V_\beta$.
A set $L_m(V; U)$ for $m \in \mathbb{Z}_0^+$ has degree $m$. If $U$ is the field $K$ regarded as a linear space, then $L_1(V; K)$ may be identified with the linear space $\mathrm{Lin}(V, K) = V^*$, the dual of $V$, defined in Section 23.6.
27.1.25 Remark: Issues that arise for multilinear maps with an infinite-dimensional component space.
As discussed in Remark 23.5.1, it is not absolutely guaranteed that the dual space $V^*$ has more than a single element (the zero functional) if $V$ is infinite-dimensional (although in practical situations, the existence of non-zero functionals is usually very easily demonstrated). Consequently, if any space $V_\beta$ in the family $(V_\alpha)_{\alpha \in A}$ is infinite-dimensional, it must be shown that the set $L((V_\alpha)_{\alpha \in A}; U)$ contains more than the zero multilinear function. One way to guarantee this is by providing a basis for each $V_\alpha$.
In the case of an infinite-dimensional linear space $V$ which has a basis, the dual space $V^*$ is very large. (See Theorem 23.7.14.) In general, there is no constructible basis for such a dual space. Consequently tensor products with one or more infinite-dimensional component spaces are of limited value.
The algebraic duals of infinite-dimensional linear spaces can be made useful by imposing a topological
constraint on the dual, for example by requiring linear functionals in the dual space to be bounded or
continuous. The same can be done for multilinear functions. The set of multilinear maps may be regarded
as the multilinear dual, which can be pruned back from the full algebraic multilinear dual to a topological
multilinear dual of some kind. So infinite-dimensional multilinear functions and tensors are not a totally
absurd idea.
27.1.26 Example: Directional derivatives of order $m$ as multilinear functions of degree $m$.
Let $f \in C^\infty(\mathbb{R}^n)$ be a real-valued function of $n$ variables with continuous partial derivatives of all orders, where $n \in \mathbb{Z}_0^+$. Let $V = \mathbb{R}^n$. By induction on $m \in \mathbb{Z}_0^+$, define $f_m : V^m \to C^\infty(\mathbb{R}^n)$ by $f_m : (v^1, \ldots, v^m) \mapsto \bigl(x \mapsto \sum_{i=1}^n v^m_i (\partial/\partial x^i) f_{m-1}(v^1, \ldots, v^{m-1})(x)\bigr)$. For $m \in \mathbb{Z}_0^+$ and $x \in \mathbb{R}^n$, define $g_{m,x} : V^m \to \mathbb{R}$ by $g_{m,x} : (v^1, \ldots, v^m) \mapsto f_m(v^1, \ldots, v^m)(x)$. Then $g_{m,x} \in L_m(V; \mathbb{R})$ for all $m \in \mathbb{Z}_0^+$ and $x \in \mathbb{R}^n$. In other words, directional derivatives of order $m$ are multilinear functions of degree $m$ at each point.
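For a concrete numerical instance (a Python/NumPy sketch, not part of the text, using the hypothetical function $f(x) = x_1^2 x_2$ on $\mathbb{R}^2$), the second-order directional derivative is $g_{2,x}(u, v) = u^{\mathrm{T}} H(x) v$, where $H(x)$ is the Hessian of $f$ at $x$, and this is bilinear at each point:

```python
import numpy as np

# Take f(x) = x1^2 * x2 on R^2 (a hypothetical example). Its second-order
# directional derivative at x is g_{2,x}(u, v) = u^T H(x) v, where H(x) is
# the Hessian matrix of f at x.
def hessian(x):
    x1, x2 = x
    return np.array([[2.0 * x2, 2.0 * x1],
                     [2.0 * x1, 0.0]])

def g2(x, u, v):
    # The degree-2 multilinear function g_{2,x} in L_2(R^2; R).
    return u @ hessian(x) @ v

x = np.array([1.5, -0.5])
u, v, w = np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([2.0, 3.0])
lam, mu = 2.0, -3.0

# g_{2,x} is linear in each argument separately for fixed x.
assert np.isclose(g2(x, lam * u + mu * w, v),
                  lam * g2(x, u, v) + mu * g2(x, w, v))
assert np.isclose(g2(x, u, lam * v + mu * w),
                  lam * g2(x, u, v) + mu * g2(x, u, w))
```

Higher orders $m$ work the same way, with the order-$m$ partial derivative array in place of the Hessian.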
27.1.27 Remark: Clearer formulation of multilinearity in terms of substitution operators.
Theorem 27.1.28 re-expresses Definition 27.1.3 in terms of the substitution operator notation which defines $E(x)\big|_{x=V}$ to mean the result of substituting expression $V$ into expression $E(x)$ wherever the variable $x$ occurs. Condition (27.1.2) is probably easier to interpret than (27.1.1). Theorem 27.1.28 says that a function $f : \prod_{\alpha \in A} V_\alpha \to U$ is multilinear if and only if the map $w \mapsto f\bigl(v\big|_{v_\beta = w}\bigr)$ from $V_\beta$ to $U$ is linear for all $\beta \in A$ and $v \in \prod_{\alpha \in A} V_\alpha$.
27.1.28 Theorem: Equivalent substitution-operator-based expression for multilinearity.
A map $f$ from the Cartesian product $\prod_{\alpha \in A} V_\alpha$ of a finite family of linear spaces $(V_\alpha)_{\alpha \in A}$ over a field $K$ to a linear space $U$ over $K$ is multilinear if and only if
$$\forall v \in \prod_{\alpha \in A} V_\alpha,\ \forall \beta \in A,\ \forall w_1, w_2 \in V_\beta,\ \forall \lambda_1, \lambda_2 \in K,$$
$$f\bigl(v\big|_{v_\beta = \lambda_1 w_1 + \lambda_2 w_2}\bigr) = \lambda_1 f\bigl(v\big|_{v_\beta = w_1}\bigr) + \lambda_2 f\bigl(v\big|_{v_\beta = w_2}\bigr). \tag{27.1.2}$$
[Figure 27.1.4: Definition of multilinear map: for $f \in L(V_1, V_2, V_3; U)$, the value $f(v_1, v_2, v_3') \in U$ depends linearly on $v_3' \in V_3$. Diagram omitted.]
27.1.30 Remark: Multilinear maps equal zero if at least one component equals zero.
It follows from Definition 27.1.3 that the value of a multilinear map is zero if any of its arguments is zero.
This is expressed in two equivalent ways in parts (i) and (ii) of Theorem 27.1.31.
27.1.31 Theorem: Let $(V_\alpha)_{\alpha \in A}$ be a finite family of linear spaces over a field $K$, and let $U$ be a linear space over $K$. Then:
(i) $\forall f \in L((V_\alpha)_{\alpha \in A}; U),\ \forall \beta \in A,\ \forall v \in \prod_{\alpha \in A} V_\alpha,\ (v_\beta = 0 \implies f(v) = 0)$.
(ii) $\forall v \in \prod_{\alpha \in A} V_\alpha,\ \bigl((\exists \beta \in A,\ v_\beta = 0) \implies (\forall f \in L((V_\alpha)_{\alpha \in A}; U),\ f(v) = 0)\bigr)$.
Proof: For part (i), let $\beta \in A$ and let $u = (v_\alpha)_{\alpha \in A} \in \prod_{\alpha \in A} V_\alpha$ satisfy $u_\beta = 0$. In Definition 27.1.3, let $v = w = u$ and $\lambda = \mu = 0$. Then line (27.1.1) gives $f(u) = 0 \cdot f(u) + 0 \cdot f(u) = 0$.
Part (ii) follows from part (i) because it is logically equivalent.
27.1.32 Remark: Ad-hoc juxtaposition products of multilinear functions by vectors.
In practical applications, one sometimes sees products of multilinear functions by vectors. In other words, a multilinear function $f \in L((V_\alpha)_{\alpha \in A}; K)$ is multiplied by a vector $u \in U$ (using scalar multiplication in $U$) to obtain a map $uf : \prod_{\alpha \in A} V_\alpha \to U$ which is hopefully a valid multilinear map in $L((V_\alpha)_{\alpha \in A}; U)$. As always, anything which seems intuitively obvious must be carefully checked, as in Theorem 27.1.33. (In practice, the multiplication operation is not indicated explicitly. The product is indicated only by juxtaposition.)
The output of the juxtaposition product $uf$ is limited to the subspace $\{\lambda u;\ \lambda \in K\}$ of $U$. So clearly not all multilinear maps can be represented in this way. However, if $L((V_\alpha)_{\alpha \in A}; K)$ and $L((V_\alpha)_{\alpha \in A}; U)$ are finite-dimensional linear spaces, a general multilinear map in $L((V_\alpha)_{\alpha \in A}; U)$ can be expressed as a sum of juxtaposition products $\sum_{v \in B} u_v v$ for a basis $B$ of vector tuples in $L((V_\alpha)_{\alpha \in A}; K)$. (See Section 27.4 for linear spaces of multilinear maps.)
The juxtaposition product for multilinear functions by vectors is an extension of Definition 23.4.10 for the juxtaposition product for linear functionals by vectors.
27.1.33 Theorem: Let $(V_\alpha)_{\alpha \in A}$ be a finite family of linear spaces over a field $K$. Let $U$ be a linear space over $K$. Let $uf$ denote the function $uf : \prod_{\alpha \in A} V_\alpha \to U$ defined by $uf : (v_\alpha)_{\alpha \in A} \mapsto f\bigl((v_\alpha)_{\alpha \in A}\bigr)u$ for all $(v_\alpha)_{\alpha \in A} \in \prod_{\alpha \in A} V_\alpha$ and $u \in U$. Then
$$\forall f \in L((V_\alpha)_{\alpha \in A}; K),\ \forall u \in U,\quad uf \in L((V_\alpha)_{\alpha \in A}; U).$$
commutative ring have a well-defined concept of linearity. (See for example Theorem 19.3.13.) Multilinearity
of maps is well-defined for these more general classes. Therefore tensor spaces are well-defined for these classes
also. (See for example Bump [56], chapter 9.)
for any $w \in \prod_{\alpha \in A} V_\alpha^*$. Definition 27.2.4 formalises this idea as the canonical multilinear function map $\mu : \prod_{\alpha \in A} V_\alpha^* \to L((V_\alpha)_{\alpha \in A}; K)$. This is a kind of "upgrade map" from a linear functional tuple to a multilinear function. It is not a bijection (except in special cases).
27.2.4 Definition: The canonical multilinear function map for the set $L((V_\alpha)_{\alpha \in A}; K)$ of multilinear functions is the map $\mu : \prod_{\alpha \in A} V_\alpha^* \to L((V_\alpha)_{\alpha \in A}; K)$ defined by
$$\forall w \in \prod_{\alpha \in A} V_\alpha^*,\ \forall v \in \prod_{\alpha \in A} V_\alpha,\quad \mu(w)(v) = \prod_{\alpha \in A} w_\alpha(v_\alpha). \tag{27.2.1}$$
In other words,
$$\forall (w_\alpha)_{\alpha \in A} \in \prod_{\alpha \in A} V_\alpha^*,\ \forall (v_\alpha)_{\alpha \in A} \in \prod_{\alpha \in A} V_\alpha,\quad \mu\bigl((w_\alpha)_{\alpha \in A}\bigr)\bigl((v_\alpha)_{\alpha \in A}\bigr) = \prod_{\alpha \in A} w_\alpha(v_\alpha).$$
It is easy to show that $\mu(w)(v)$ is multilinear with respect to $v \in \prod_{\alpha \in A} V_\alpha$ for all $w \in \prod_{\alpha \in A} V_\alpha^*$. (See Theorem 27.4.6 for proof.)
27.2.6 Definition: A simple multilinear function for a finite family of linear spaces $(V_\alpha)_{\alpha \in A}$ over a field $K$ is a multilinear function $\mu(w) \in L((V_\alpha)_{\alpha \in A}; K)$ for some $w \in \prod_{\alpha \in A} V_\alpha^*$, where $\mu$ is the canonical multilinear function map in Definition 27.2.4.
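A numerical sketch of a simple multilinear function (Python/NumPy, not part of the text; linear functionals are represented by their coefficient vectors): the product of two linear functionals is bilinear, and its coefficient array is the rank-1 outer product of the two coefficient vectors, so not every bilinear function is simple.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tuple w = (w_1, w_2) of linear functionals on R^2 and R^3, represented by
# coefficient vectors, with w_alpha(v) = w_alpha . v.
w1, w2 = rng.standard_normal(2), rng.standard_normal(3)

def simple(v1, v2):
    # The simple multilinear function v |-> w_1(v_1) * w_2(v_2).
    return (w1 @ v1) * (w2 @ v2)

# Multilinearity check in the first slot (the second slot is analogous).
lam, mu_ = 2.0, 0.5
a, b = rng.standard_normal(2), rng.standard_normal(2)
c = rng.standard_normal(3)
assert np.isclose(simple(lam * a + mu_ * b, c),
                  lam * simple(a, c) + mu_ * simple(b, c))

# The coefficient array of this simple function is the outer product of w1
# and w2, which has rank 1. A generic 2x3 array has rank 2, so generic
# bilinear functions are sums of simple ones but not themselves simple.
assert np.linalg.matrix_rank(np.outer(w1, w2)) == 1
```

This is the coordinate counterpart of the monomial viewpoint discussed in Remark 27.2.8.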
27.2.7 Remark: Analogy between simple multilinear functions and simple tensors.
The simple multilinear functions in Definition 27.2.6 are analogous to the simple tensors in Section 28.4.
The canonical multilinear function map in Definition 27.2.4 is likewise analogous to the canonical tensor
map in Definition 28.2.2.
27.2.8 Remark: Multilinear functions are sums of products of monomials of homogeneous degree.
The canonical multilinear function map in Definition 27.2.4 and the simple multilinear functions in Definition 27.2.6 may seem to be almost trivial and superficial by comparison with the supposed complexity of tensor algebra. The products $\prod_{\alpha \in A} w_\alpha(v_\alpha)$ look like algebraic monomials, and that is exactly what they are. General multilinear functions for finite-dimensional component spaces $V_\alpha$ are no more and no less than sums of products of linear functionals which have homogeneous polynomial degree.
27.2.9 Example: Contours of a bilinear function on two copies of the real numbers.
Figure 27.2.1 illustrates a bilinear function $f \in L(V_1, V_2; U)$ for the linear spaces $V_1 = V_2 = \mathbb{R}^1$ and $U = \mathbb{R}$. This is an example of a simple multilinear function. (See Definition 27.2.6.)
[Figure 27.2.1: Contours of the bilinear function $f(x_1, x_2) = x_1 x_2$. Diagram omitted.]
All multilinear functions from $\mathbb{R}^1 \times \mathbb{R}^1$ to $\mathbb{R}$ are of the form $f(x_1, x_2) = kx_1x_2$ for some $k \in \mathbb{R}$. The diagram uses $k = 1$. It is clear from the contour curves that $f$ is not a linear function on $\mathbb{R}^2$. A linear function of two variables would have straight lines for contours. (More generally, the contours of linear functions are always parallel hyperplanes, i.e. translations of a subspace.) However, the value of $f(x_1, x_2)$ for a constant value of $x_1$ does vary linearly with respect to $x_2$.
27.2.10 Remark: Contrast between multilinear functions and linear functions of a vector variable.
Example 27.2.9 hints at a general property of multilinear maps which makes them significantly different to linear functionals. In the case of linear functionals on a linear space $V$ which has a basis, the only vector in $V$ for which all linear functionals have the zero value is the zero vector. In other words, $(\forall w \in V^*,\ w(v) = 0) \implies v = 0$. (This follows from Theorem 23.6.8.) In the case of multilinear functionals of degree 2 or more, Theorem 27.1.31 implies that all multilinear functionals on a vector tuple $v = (v_\alpha)_{\alpha \in A} \in \prod_{\alpha \in A} V_\alpha$ have the value zero if one or more of the components $v_\alpha$ equals zero. This is the set $\{v \in \prod_{\alpha \in A} V_\alpha;\ \exists \alpha \in A,\ v_\alpha = 0\}$.
By Theorem 23.6.9, if all linear functionals on a linear space $V$ which has a basis have the same value for two vectors in $V$, then the two vectors are the same. In the case of multilinear maps, equality of multilinear map values defines an equivalence relation on the tuple set $\prod_{\alpha \in A} V_\alpha$. One of the equivalence classes of this relation is $\{v \in \prod_{\alpha \in A} V_\alpha;\ \exists \alpha \in A,\ v_\alpha = 0\}$. All multilinear maps have value 0 on this set. The equivalence relation for the rest of $\prod_{\alpha \in A} V_\alpha$ is characterised in Theorem 27.2.11. The equivalence classes are clearly $(r-1)$-dimensional hyperbolic manifolds embedded in the product space $\prod_{\alpha \in A} V_\alpha$, where $r = \#(A)$. These are generalisations of the contour lines in Figure 27.2.1.
27.2.11 Theorem: Equivalence of multilinear functions for pairs of linear functional families.
Let $(V_\alpha)_{\alpha \in A}$ be a non-empty finite family of linear spaces over a field $K$, which have bases. Define a relation $\sim$ on $\prod_{\alpha \in A} V_\alpha$ by $u \sim v$ whenever $\forall w \in \prod_{\alpha \in A} V_\alpha^*,\ \mu(w)(u) = \mu(w)(v)$. Then
$$\forall u, v \in \prod_{\alpha \in A} (V_\alpha \setminus \{0\}),\quad u \sim v \iff \exists k \in (K \setminus \{0_K\})^A,\ \prod_{\alpha \in A} k_\alpha = 1_K \text{ and } \forall \alpha \in A,\ u_\alpha = k_\alpha v_\alpha. \tag{27.2.2}$$
Proof: Suppose $u, v \in \prod_{\alpha \in A}(V_\alpha \setminus \{0\})$ and $u \sim v$. Then $\prod_{\alpha \in A} w_\alpha(u_\alpha) = \prod_{\alpha \in A} w_\alpha(v_\alpha)$ for all $w \in \prod_{\alpha \in A} V_\alpha^*$ by Definition 27.2.4. For $\#(A) = 1$, let $A = \{\beta\}$ and $V_\beta = \{0\}$. Then line (27.2.2) is true because $\prod_{\alpha \in A}(V_\alpha \setminus \{0\}) = \emptyset$ by Theorem 10.11.6 (ii).
Suppose $A = \{\beta\}$ and $V_\beta \ne \{0\}$. Then $w(u_\beta) = w(v_\beta)$ for all $w \in V_\beta^*$. So by Theorem 23.6.9, $u_\beta = v_\beta$ because $V_\beta$ has a basis. So line (27.2.2) holds with $k_\beta = 1$.
Suppose $\#(A) \ge 2$. If $V_\alpha = \{0\}$ for some $\alpha \in A$, then $\prod_{\alpha \in A}(V_\alpha \setminus \{0\}) = \emptyset$ by Theorem 10.11.6 (ii). So line (27.2.2) is true because it is void.
Suppose $\#(A) \ge 2$ and $V_\alpha \ne \{0\}$ for all $\alpha \in A$. Let $\beta \in A$. Let $A' = A \setminus \{\beta\}$. Then $w_\beta(u_\beta) \prod_{\alpha \in A'} w'_\alpha(u_\alpha) = w_\beta(v_\beta) \prod_{\alpha \in A'} w'_\alpha(v_\alpha)$ for all $w_\beta \in V_\beta^*$ and all $w' \in \prod_{\alpha \in A'} V_\alpha^*$. Theorem 23.5.3 implies, for all $\alpha \in A'$, that there is some $w'_\alpha \in V_\alpha^*$ with $w'_\alpha(u_\alpha) \ne 0$, so that there is some $w' \in \prod_{\alpha \in A'} V_\alpha^*$ for which $\prod_{\alpha \in A'} w'_\alpha(u_\alpha) \ne 0$. Then $k_\beta = \prod_{\alpha \in A'} w'_\alpha(v_\alpha) / \prod_{\alpha \in A'} w'_\alpha(u_\alpha)$ is well defined. For such $w'$, $w_\beta(u_\beta) = k_\beta w_\beta(v_\beta)$ for all $w_\beta \in V_\beta^*$. Therefore $u_\beta = k_\beta v_\beta$ by Theorem 23.6.10. Hence $\forall \beta \in A,\ \exists k_\beta \in K,\ u_\beta = k_\beta v_\beta$. So $\prod_{\alpha \in A} w_\alpha(u_\alpha) = \bigl(\prod_{\alpha \in A} k_\alpha\bigr) \prod_{\alpha \in A} w_\alpha(v_\alpha)$ for all $w \in \prod_{\alpha \in A} V_\alpha^*$. So $\prod_{\alpha \in A} k_\alpha = 1$. Then clearly $k \in (K \setminus \{0_K\})^A$. So line (27.2.2) is verified.
To show the converse, let $u, v \in \prod_{\alpha \in A}(V_\alpha \setminus \{0\})$ and $k \in (K \setminus \{0_K\})^A$ be such that $\prod_{\alpha \in A} k_\alpha = 1_K$ and $\forall \alpha \in A,\ u_\alpha = k_\alpha v_\alpha$. Then $\prod_{\alpha \in A} w_\alpha(u_\alpha) = \prod_{\alpha \in A} w_\alpha(k_\alpha v_\alpha) = \bigl(\prod_{\alpha \in A} k_\alpha\bigr) \prod_{\alpha \in A} w_\alpha(v_\alpha) = \prod_{\alpha \in A} w_\alpha(v_\alpha)$ for all $w \in \prod_{\alpha \in A} V_\alpha^*$. So $\mu(w)(u) = \mu(w)(v)$ for all $w \in \prod_{\alpha \in A} V_\alpha^*$. So $u \sim v$.
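The equivalence in Theorem 27.2.11 can be illustrated numerically: scaling the components of a tuple by factors whose product is 1 leaves every simple multilinear function value unchanged. A small Python/NumPy sketch (an illustration, not part of the text; functionals are represented by coefficient vectors):

```python
import numpy as np

rng = np.random.default_rng(2)

def simple(w1, w2, v1, v2):
    # The simple multilinear function value w_1(v_1) * w_2(v_2) for
    # functionals represented as coefficient vectors.
    return (w1 @ v1) * (w2 @ v2)

v1, v2 = rng.standard_normal(2), rng.standard_normal(3)
k1 = 2.5
k2 = 1.0 / k1          # the product of the scale factors is 1
u1, u2 = k1 * v1, k2 * v2

# (u1, u2) and (v1, v2) are indistinguishable by all simple multilinear
# functions, i.e. the two tuples are equivalent in the sense of the theorem.
for _ in range(100):
    w1, w2 = rng.standard_normal(2), rng.standard_normal(3)
    assert np.isclose(simple(w1, w2, u1, u2), simple(w1, w2, v1, v2))
```

With a scale factor pair whose product is not 1, the same loop would fail on the first iteration (with probability 1 for random functionals).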
27.2.12 Remark: Function-transposition of the canonical multilinear function map.
If the canonical multilinear function map in Definition 27.2.4 is transposed, the result is the map $\mu^T$ in Definition 27.2.13. (The map $\mu^T$ is shown to be multilinear in Theorem 27.4.7.)
27.2.13 Definition: The canonical multilinear function map transpose for a set of multilinear functions $L((V_\alpha)_{\alpha \in A}; K)$ is the map $\mu^T : \prod_{\alpha \in A} V_\alpha \to L((V_\alpha^*)_{\alpha \in A}; K)$ defined by
$$\forall v \in \prod_{\alpha \in A} V_\alpha,\ \forall w \in \prod_{\alpha \in A} V_\alpha^*,\quad \mu^T(v)(w) = \prod_{\alpha \in A} w_\alpha(v_\alpha).$$
27.2.14 Definition: A simple dual multilinear function for a finite family of linear spaces $(V_\alpha)_{\alpha \in A}$ over a field $K$ is a multilinear function $\mu^T(v) \in L((V_\alpha^*)_{\alpha \in A}; K)$ for some $v \in \prod_{\alpha \in A} V_\alpha$, where $\mu^T$ is the canonical multilinear function map transpose in Definition 27.2.13.
27.2.15 Theorem: Inverse image of the zero vector of a canonical multilinear function map transpose.
Let $(V_\alpha)_{\alpha \in A}$ be a non-empty finite family of linear spaces, which have bases, over a field $K$. Then
$$(\mu^T)^{-1}(\{0\}) = \Bigl\{v \in \prod_{\alpha \in A} V_\alpha;\ \exists \alpha \in A,\ v_\alpha = 0\Bigr\},$$
where $\mu^T : \prod_{\alpha \in A} V_\alpha \to L((V_\alpha^*)_{\alpha \in A}; K)$ is the canonical multilinear function map transpose which is given in Definition 27.2.13.
$$\forall u \in \prod_{\alpha \in A} V_\alpha,\quad u \in (\mu^T)^{-1}(\{0\}) \iff \mu^T(u) = 0,$$
and
$$\begin{aligned}
\mu^T(u) = \mu^T(v) &\iff \forall w \in \prod_{\alpha \in A} V_\alpha^*,\ \mu^T(u)(w) = \mu^T(v)(w)\\
&\iff \forall w \in \prod_{\alpha \in A} V_\alpha^*,\ \mu(w)(u) = \mu(w)(v)\\
&\iff \exists k \in K^A,\ \textstyle\prod_{\alpha \in A} k_\alpha = 1_K \text{ and } \forall \alpha \in A,\ u_\alpha = k_\alpha v_\alpha \qquad (27.2.3)\\
&\iff u \in \Bigl\{(k_\alpha v_\alpha)_{\alpha \in A};\ k \in K^A \text{ and } \textstyle\prod_{\alpha \in A} k_\alpha = 1\Bigr\}.
\end{aligned}$$
In the same way, because of the linearity properties of multilinear maps, the redundancy of information in
the maps may be exploited by using components. Demonstrations of many basic properties of tensors are
most easily achieved by expressing tensors and multilinear maps in terms of bases consisting of simple
tensors and multilinear maps. Matrix representation is the most usual way of specifying multilinear maps
for practical calculations. Hence coordinates cannot be easily avoided.
It could be argued that the use of coordinates is onerous. The following steps are typically followed to
calculate $f(v_1, \ldots, v_r)$ for an $r$-linear map $f$ and a vector tuple $(v_1, \ldots, v_r) \in V^r$ using coordinates.
(i) Choose a basis $(e_i)_{i=1}^n \in V^n$ for $V$.
(ii) Determine the value $a_{i_1,\ldots,i_r} = f(e_{i_1}, \ldots, e_{i_r})$ for each of the $n^r$ tuples $(e_{i_1}, \ldots, e_{i_r})$.
(iii) For each $j \in \mathbb{N}_r$, determine coordinates $(v_{j,i})_{i=1}^n$ such that $v_j = \sum_{i=1}^n v_{j,i} e_i$.
(iv) Calculate $f(v_1, \ldots, v_r) = \sum_{i_1=1}^n \cdots \sum_{i_r=1}^n a_{i_1,\ldots,i_r}\, v_{1,i_1} \cdots v_{r,i_r}$.
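The steps above can be sketched numerically. The following is a minimal illustration, not from the book; it assumes $V = K^n$ with the standard basis, and the helper name multilinear_apply is hypothetical. The coefficient array $a_{i_1,\ldots,i_r}$ is stored as a dict keyed by index tuples.

```python
import itertools

def multilinear_apply(a, vectors):
    """Evaluate an r-linear map from its coefficient array.

    a: dict mapping index tuples (i1,...,ir) to f(e_i1,...,e_ir).
    vectors: list of r coordinate tuples, one per argument slot.
    """
    n = len(vectors[0])
    r = len(vectors)
    total = 0
    for idx in itertools.product(range(n), repeat=r):
        term = a[idx]                   # step (ii): a_{i1,...,ir}
        for j in range(r):
            term *= vectors[j][idx[j]]  # step (iii): coordinates v_{j,i}
        total += term                   # step (iv): accumulate the n^r terms
    return total

# Example: the 2x2 determinant as a bilinear map of the matrix columns,
# with a[(i,j)] = det(e_i, e_j) (antisymmetric coefficients).
a = {(0, 0): 0, (0, 1): 1, (1, 0): -1, (1, 1): 0}
print(multilinear_apply(a, [(3, 1), (2, 4)]))  # det of [[3, 2], [1, 4]] = 10
```

The evaluation requires $n^r$ coefficient lookups and products, which is exactly the component calculation described in steps (i)-(iv).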
an equivalent expression for multilinear maps in terms of primal bases and canonical dual bases for the
component spaces. (See Definition 23.7.3 for the canonical dual basis of a basis family.) Theorem 27.3.3
gives technical expressions for simple multilinear functions in preparation for Theorem 27.3.4.
$$\forall i \in I,\ \forall v \in \prod_{\alpha\in A} V_\alpha,\quad \tau(h_i)(v) = \prod_{\alpha\in A} h_{\alpha,i_\alpha}(v_\alpha) = \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha),$$
where $\kappa_{B_\alpha} : V_\alpha \to \mathrm{Fin}(I_\alpha, K)$ is the coordinate map defined for $B_\alpha$ in Definition 22.8.7, and the linear
functional families $h_i = (h_{\alpha,i_\alpha})_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha^*$ for $i \in I$ are as in Definition 23.7.3, and $\tau : \prod_{\alpha\in A} V_\alpha^* \to
L((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map in Definition 27.2.4. Hence
$$f = \sum_{i\in I} f(e_i)\,\tau(h_i). \tag{27.3.4}$$
Proof: Let $f \in L((V_\alpha)_{\alpha\in A}; U)$ be a multilinear map, where $(V_\alpha)_{\alpha\in A}$ is a finite family of linear spaces
over a field $K$, and $U$ is a linear space over $K$. Let $B_\alpha = (e_{\alpha,i})_{i\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis for $V_\alpha$ for each $\alpha \in A$.
Let $e_i = (e_{\alpha,i_\alpha})_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha$ for each index family $i = (i_\alpha)_{\alpha\in A} \in I = \prod_{\alpha\in A} I_\alpha$. Then by Definition 22.8.7,
$v_\alpha = \sum_{i_\alpha\in I_\alpha} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)\, e_{\alpha,i_\alpha}$ for all $\alpha \in A$, for all $v = (v_\alpha)_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha$. Therefore
$$\forall v \in \prod_{\alpha\in A} V_\alpha,\quad f(v) = f\bigg(\Big(\sum_{i_\alpha\in I_\alpha} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)\, e_{\alpha,i_\alpha}\Big)_{\alpha\in A}\bigg).$$
This expression for $f(v)$ may be expanded for a particular $\beta \in A$ using the definition of linearity of $f$ with
respect to $v_\beta$:
$$\forall v \in \prod_{\alpha\in A} V_\alpha,\quad f(v) = \sum_{i_\beta\in I_\beta} \kappa_{B_\beta}(v_\beta)(i_\beta)\, f\big(v\big|_{v_\beta = e_{\beta,i_\beta}}\big),$$
where the expression $v|_{v_\beta = e_{\beta,i_\beta}}$ means the result of substituting $e_{\beta,i_\beta}$ for $v_\beta$ in $v = (v_\alpha)_{\alpha\in A}$. This expansion of
$f$ may be continued inductively on $\beta \in A$ to obtain line (27.3.1). Lines (27.3.2), (27.3.3) and (27.3.4) follow
from Theorem 27.3.3.
27.3.6 Theorem: Basis-and-coordinates expression for canonical multilinear function map transpose $\tau^T$.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $B_\alpha = (e_{\alpha,i})_{i\in I_\alpha} \in V_\alpha^{I_\alpha}$
be a basis for $V_\alpha$ for $\alpha \in A$. Let $B_\alpha^* = (h_{\alpha,i})_{i\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ be the canonical dual basis to each $B_\alpha$. Then
$$\forall i \in I,\ \forall w \in \prod_{\alpha\in A} V_\alpha^*,\quad \tau^T(e_i)(w) = \prod_{\alpha\in A} w_\alpha(e_{\alpha,i_\alpha}) = \prod_{\alpha\in A} \kappa_{B_\alpha^*}(w_\alpha)(i_\alpha).$$
$$\forall w \in \prod_{\alpha\in A} V_\alpha^*,\quad f(w) = \sum_{i_\beta\in I_\beta} w_\beta(e_{\beta,i_\beta})\, f\big(w\big|_{w_\beta = h_{\beta,i_\beta}}\big),$$
where $w|_{w_\beta = h_{\beta,i_\beta}}$ means the result of substituting $h_{\beta,i_\beta}$ for $w_\beta$ in $w = (w_\alpha)_{\alpha\in A}$. This expansion of $f$ may be
continued inductively on $\beta \in A$ to obtain line (27.3.6). Then lines (27.3.5), (27.3.7) and (27.3.8) follow from
Theorem 27.3.6 and the definitions of $\tau$ and its function-transpose, $\tau^T$.
27.3.8 Remark: Decomposition of multilinear functions into simple multilinear functions.
In Section 27.4, it will follow from lines (27.3.4) and (27.3.8) in Theorems 27.3.4 and 27.3.7 that general
multilinear functions may be expressed as linear combinations of simple multilinear functions. This is
formalised for $L((V_\alpha)_{\alpha\in A}; K)$ in Theorem 27.4.11.
27.3.9 Example: Decomposition of bilinear functions into simple bilinear functions.
Let $V$ be a linear space over a field $K$ with basis $(e_i)_{i=1}^n \in V^n$ for some $n \in \mathbb{Z}_0^+$. Let $f : V \times V \to K$ be a
bilinear function. Let $a_{ij} = f(e_i, e_j)$ for $i, j \in \mathbb{N}_n$. Then the matrix $a = (a_{ij})_{i,j=1}^n \in M_{n,n}(K)$ satisfies
$$\forall v, w \in V,\quad f(v, w) = f\Big(\sum_{i=1}^n v_i e_i,\ \sum_{j=1}^n w_j e_j\Big)
= \sum_{i=1}^n \sum_{j=1}^n v_i w_j f(e_i, e_j)
= \sum_{i,j=1}^n a_{ij} v_i w_j,$$
where $v = \sum_{i=1}^n v_i e_i$ and $w = \sum_{j=1}^n w_j e_j$. Thus the action of the bilinear function $f$ on $v$ and $w$ is equivalent
to the action of the matrix $a$ on the products of coordinates of $v$ and $w$.
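As a concrete sketch of this matrix correspondence (standard basis assumed, plain Python lists standing in for the matrix $a$; none of the names below come from the book), $f(v, w) = v^{\mathrm T} a\, w$, and bilinearity follows directly from the formula:

```python
# Coefficient matrix a[i][j] = f(e_i, e_j) for a bilinear f on K^3.
a = [[2, 0, -1],
     [1, 3, 0],
     [0, -2, 4]]

def f(v, w):
    # f(v, w) = sum_{i,j} a_ij v_i w_j, i.e. the matrix identity v^T a w
    return sum(a[i][j] * v[i] * w[j]
               for i in range(3) for j in range(3))

v, w, v2 = [1, 2, 3], [4, 0, -1], [0, 1, 1]

# Bilinearity in the first argument: f(2v + v2, w) = 2 f(v, w) + f(v2, w).
lhs = f([2 * x + y for x, y in zip(v, v2)], w)
assert lhs == 2 * f(v, w) + f(v2, w)
```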
[Figure 27.3.1 diagram: the bilinear map $f : V \times V \to K$ on the linear space $V \times V$ corresponds, via the component maps $V \to K^n$, $v \mapsto (v_i)_{i=1}^n$ and $w \mapsto (w_j)_{j=1}^n$ into the tuple space $K^n \times K^n$, to the bilinear form $((v_i)_{i=1}^n, (w_j)_{j=1}^n) \mapsto \sum_{i,j=1}^n b_{ij} v_i w_j$ with values in the field $K$.]
Figure 27.3.1 Components of a bilinear map with respect to a basis
27.3.11 Remark: Converting between colloquial (covariant) tensors and well-defined multilinear maps.
In the literature, one frequently sees expressions such as "the tensor $g_{ij}$". The real meaning of such an
expression is something like "the bilinear map $g : V \times V \to \mathbb{R}$ with $g(e_i, e_j) = g_{ij}$ for all $i, j \in \mathbb{N}_n$, for some
basis $(e_i)_{i=1}^n$ for $V$, where $n \in \mathbb{Z}^+$". Clearly the shorter expression does have advantages, as long as one
does not forget that the underlying bilinear map is the real thing. Examples 27.3.9 and 27.3.10 show how
to convert from bilinear maps to matrices and back again.
$$\forall \lambda_1, \lambda_2 \in K,\ \forall f_1, f_2 \in L((V_\alpha)_{\alpha\in A}; U),\quad \lambda_1 f_1 + \lambda_2 f_2 \in L((V_\alpha)_{\alpha\in A}; U).$$
Proof: Let $g, h \in L((V_\alpha)_{\alpha\in A}; U)$ and let $f = g + h$ be the pointwise sum of $g$ and $h$. The expression $f(u)$
in Definition 27.1.3 evaluates to $g(u) + h(u) = \lambda_1 g(v) + \lambda_2 g(w) + \lambda_1 h(v) + \lambda_2 h(w) = \lambda_1 f(v) + \lambda_2 f(w)$, as
required. Similarly, for $\lambda \in K$ and $f = \lambda g$, $f(u)$ evaluates to $\lambda g(u) = \lambda(\lambda_1 g(v) + \lambda_2 g(w)) = \lambda_1 f(v) + \lambda_2 f(w)$,
as required.
27.4.3 Definition: The linear space of multilinear maps from a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces over
a field $K$ to a linear space $U$ over $K$ is the set $L((V_\alpha)_{\alpha\in A}; U)$ together with the operations of pointwise
vector addition and scalar multiplication.
The linear space of multilinear functions on a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces over a field $K$ is the set
$L((V_\alpha)_{\alpha\in A}; K)$ together with the operations of pointwise vector addition and scalar multiplication.
27.4.4 Remark: Analogy of linear spaces of multilinear functions to multilinear dual spaces.
The multilinear function space $L((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.4.3 may be thought of as a kind of "multilinear
dual" of the family $(V_\alpha)_{\alpha\in A}$, which is analogous to the dual of a single linear space. It is not truly a dual
because the quintessential property of a dual is that the dual of a dual is isomorphic to the original space.
The multilinear dual of the multilinear dual is not isomorphic to the original linear-space family (except
in trivial cases).
The dual of a linear space is the space of linear functionals on that space, whereas $L((V_\alpha)_{\alpha\in A}; K)$ is the
linear space of multilinear functionals. So the linear space of multilinear functions is only a generalisation of
the concept of a space of linear functionals, not truly a dual in any genuine sense. Nevertheless, the phrase
"multilinear dual" is a convenient shorthand for "linear space of multilinear functions".
27.4.5 Remark: The specification tuple for the linear space of multilinear maps.
More strictly speaking, the linear space of multilinear maps in Definition 27.4.3 is specified as the tuple
$(K, W, \sigma_K, \tau_K, \sigma_W, \mu_{K,W})$, where $K \mathrel{-\!\!<} (K, \sigma_K, \tau_K)$ is the field, $W = L((V_\alpha)_{\alpha\in A}; U)$ is the set of maps,
$\sigma_W : W \times W \to W$ denotes pointwise addition on $W$, and $\mu_{K,W} : K \times W \to W$ denotes pointwise
multiplication of elements of $W$ by elements of $K$. (See Definition 22.1.1 for linear spaces.)
27.4.6 Theorem: The canonical multilinear function map is a multilinear map.
The canonical multilinear function map $\tau : \prod_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A}; K)$ is a multilinear map for any finite
family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$. Let $\beta \in A$. Let $\lambda_1, \lambda_2 \in K$. Let
$w^0, w^1, w^2 \in \prod_{\alpha\in A} V_\alpha^*$ satisfy $w^0_\beta = \lambda_1 w^1_\beta + \lambda_2 w^2_\beta$ and $\forall\alpha \in A \setminus \{\beta\},\ w^0_\alpha = w^1_\alpha = w^2_\alpha$.
Let $v \in \prod_{\alpha\in A} V_\alpha$. By Definition 27.2.4, $\tau(w^0)(v) = \prod_{\alpha\in A} w^0_\alpha(v_\alpha) = (\lambda_1 w^1_\beta + \lambda_2 w^2_\beta)(v_\beta) \prod_{\alpha\in A\setminus\{\beta\}} w^0_\alpha(v_\alpha) =
\lambda_1 \prod_{\alpha\in A} w^1_\alpha(v_\alpha) + \lambda_2 \prod_{\alpha\in A} w^2_\alpha(v_\alpha) = \lambda_1 \tau(w^1)(v) + \lambda_2 \tau(w^2)(v)$. So $\tau(w^0) = \lambda_1 \tau(w^1) + \lambda_2 \tau(w^2)$. Hence $\tau$ is
a multilinear map by Definition 27.1.3.
27.4.7 Theorem: The canonical multilinear function map transpose is a multilinear map.
The canonical multilinear function map transpose $\tau^T : \prod_{\alpha\in A} V_\alpha \to L((V_\alpha^*)_{\alpha\in A}; K)$ is a multilinear map for
any finite family of linear spaces $(V_\alpha)_{\alpha\in A}$ over a field $K$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of linear spaces over a field $K$. Let $\beta \in A$. Let $\lambda_1, \lambda_2 \in K$. Let
$v^0, v^1, v^2 \in \prod_{\alpha\in A} V_\alpha$ satisfy $v^0_\beta = \lambda_1 v^1_\beta + \lambda_2 v^2_\beta$ and $\forall\alpha \in A \setminus \{\beta\},\ v^0_\alpha = v^1_\alpha = v^2_\alpha$.
Let $w \in \prod_{\alpha\in A} V_\alpha^*$. By Definition 27.2.13, $\tau^T(v^0)(w) = \prod_{\alpha\in A} w_\alpha(v^0_\alpha) = w_\beta(\lambda_1 v^1_\beta + \lambda_2 v^2_\beta) \prod_{\alpha\in A\setminus\{\beta\}} w_\alpha(v^0_\alpha) =
\lambda_1 \prod_{\alpha\in A} w_\alpha(v^1_\alpha) + \lambda_2 \prod_{\alpha\in A} w_\alpha(v^2_\alpha) = \lambda_1 \tau^T(v^1)(w) + \lambda_2 \tau^T(v^2)(w)$. So $\tau^T(v^0) = \lambda_1 \tau^T(v^1) + \lambda_2 \tau^T(v^2)$. Hence
$\tau^T$ is a multilinear map by Definition 27.1.3.
27.4.8 Remark: Properties of the canonical basis for the multilinear function space.
For the canonical basis family $(\phi_i)_{i\in I}$ defined in Definition 27.4.9 for the multilinear function space
$L((V_\alpha)_{\alpha\in A}; K)$, it is shown in Theorem 27.4.10 that $\phi_i \in L((V_\alpha)_{\alpha\in A}; K)$ for all $i \in I$, and the family
is shown to be a basis for $L((V_\alpha)_{\alpha\in A}; K)$ in Theorem 27.4.11.
Regrettably, the family $(\phi_i)_{i\in I}$ does not span $L((V_\alpha)_{\alpha\in A}; K)$ if one or more of the component spaces $V_\alpha$ is
infinite-dimensional. (See Remark 23.7.4.) So the "basis" is only a basis for finite-dimensional spaces.
27.4.9 Definition: The canonical basis for the multilinear function space $L((V_\alpha)_{\alpha\in A}; K)$ corresponding
to the bases $B_\alpha = (e_{\alpha,i})_{i\in I_\alpha} \in V_\alpha^{I_\alpha}$ for $V_\alpha$ for $\alpha \in A$ is the family $(\phi_i)_{i\in I}$ where $I = \prod_{\alpha\in A} I_\alpha$ and
$\phi_i : \prod_{\alpha\in A} V_\alpha \to K$ is defined for $i = (i_\alpha)_{\alpha\in A} \in I$ by
$$\forall v = (v_\alpha)_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha,\quad \phi_i(v) = \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha).$$
27.4.10 Theorem: Canonical basis elements are images of linear functional families.
Let $(\phi_i)_{i\in I}$ be the canonical basis for $L((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.4.9, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of linear spaces over a field $K$. Then
$$\forall i \in I,\quad \phi_i = \tau(h_i) \in L((V_\alpha)_{\alpha\in A}; K),$$
where $\tau : \prod_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map in Definition 27.2.4, and $h_i$
denotes the family $(h_{\alpha,i_\alpha})_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha^*$ for all $i \in I$, where $(h_{\alpha,i})_{i\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ is the canonical dual basis
to each primal basis $B_\alpha$.
Proof: The equation $\phi_i = \tau(h_i)$ follows from Theorem 27.3.3. Therefore $\phi_i \in L((V_\alpha)_{\alpha\in A}; K)$.
27.4.11 Theorem: The canonical basis is a basis if the component spaces are finite-dimensional.
Let $(\phi_i)_{i\in I}$ be the canonical basis for $L((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.4.9, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of finite-dimensional linear spaces over a field $K$. Then $(\phi_i)_{i\in I}$ is a basis for the linear space of multilinear
functions $L((V_\alpha)_{\alpha\in A}; K)$.
Proof: To show that $\{\phi_i;\ i \in I\}$ spans $L((V_\alpha)_{\alpha\in A}; K)$, let $f \in L((V_\alpha)_{\alpha\in A}; K)$. Then by Theorem 27.3.4
line (27.3.1), $f(v) = \sum_{i\in I} \big(f(e_i) \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)\big)$ for all $v \in \prod_{\alpha\in A} V_\alpha$. Therefore by Definition 27.4.9,
$f(v) = \sum_{i\in I} f(e_i)\,\phi_i(v)$ for all $v \in \prod_{\alpha\in A} V_\alpha$. So $f = \sum_{i\in I} f(e_i)\,\phi_i$. Hence $\{\phi_i;\ i \in I\}$ spans $L((V_\alpha)_{\alpha\in A}; K)$.
To show that the vectors $\phi_i$ for $i \in I$ are linearly independent, suppose that $\sum_{i\in I} k_i \phi_i = 0$ for some $k \in K^I$.
Let $j \in I$. Then
$$0 = \sum_{i\in I} k_i \phi_i(e_j)
= \sum_{i\in I} k_i \prod_{\alpha\in A} \kappa_{B_\alpha}(e_{\alpha,j_\alpha})(i_\alpha)
= \sum_{i\in I} k_i \prod_{\alpha\in A} \delta_{i_\alpha,j_\alpha}
= k_j.$$
So $k_j = 0$ for all $j \in I$. So the vectors $\phi_i$ for $i \in I$ are linearly independent. Hence $(\phi_i)_{i\in I}$ is a basis for
$L((V_\alpha)_{\alpha\in A}; K)$.
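A minimal numerical sketch of this spanning identity (not from the book; the component spaces are taken to be $V_\alpha = K^2$ with standard bases, and all helper names are hypothetical): each simple function $\phi_i$ is a product of coordinates, and $f = \sum_{i\in I} f(e_i)\,\phi_i$ reproduces a bilinear $f$ exactly.

```python
import itertools

n = 2  # dimension of each component space; index set A = {0, 1}

def phi(i, v):
    # phi_i(v) = product over components of the i_alpha-th coordinate of v_alpha
    return v[0][i[0]] * v[1][i[1]]

def f(v):
    # an arbitrary bilinear function on K^2 x K^2
    x, y = v
    return 3*x[0]*y[0] - x[0]*y[1] + 2*x[1]*y[0] + 5*x[1]*y[1]

def e(k):
    # k-th standard basis vector of K^2
    return tuple(1 if j == k else 0 for j in range(n))

I = list(itertools.product(range(n), repeat=2))  # index families i = (i_0, i_1)

def f_reconstructed(v):
    # f = sum_{i in I} f(e_i) phi_i, with e_i = (e_{i_0}, e_{i_1})
    return sum(f((e(i[0]), e(i[1]))) * phi(i, v) for i in I)

v = ((2, -1), (4, 3))
assert f(v) == f_reconstructed(v)
```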
27.4.12 Remark: The canonical basis elucidates the structure of a multilinear function space.
In view of Theorem 27.4.11, the quotation marks around the term "canonical basis" for families $(\phi_i)_{i\in I}$ for
$L((V_\alpha)_{\alpha\in A}; K)$ may be removed when the spaces $V_\alpha$ are finite-dimensional.
The functions $\phi_i$ in Definition 27.4.9 are simple multilinear functions according to Definition 27.2.6. Thus
every linear space of multilinear functions with finite-dimensional component spaces is spanned by a finite
set of simple multilinear functions. This elucidates the structure of the space of multilinear functions to a
great extent. Theorem 27.4.13 gives a simple explicit formula on line (27.4.1) for multilinear functions.
The method of proof for Theorem 27.4.13 also yields the explicit expression on line (27.4.2) for multilinear
maps $f \in L((V_\alpha)_{\alpha\in A}; U)$ in terms of the basis vector family $(\phi_i)_{i\in I}$ for general linear spaces $U$. The
pointwise products $f(e_i)\phi_i$ are well-defined maps in $L((V_\alpha)_{\alpha\in A}; U)$. Therefore the sum of these products is
a well-defined map in the same linear space.
There is a similar expression for tensor spaces. (See Theorem 28.1.18.)
27.4.13 Theorem: Basis-and-coordinates expression for $L((V_\alpha)_{\alpha\in A}; K)$ and $L((V_\alpha)_{\alpha\in A}; U)$.
Let $(\phi_i)_{i\in I}$ be the canonical basis for $L((V_\alpha)_{\alpha\in A}; K)$ in Definition 27.4.9, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of finite-dimensional linear spaces over a field $K$. Then
$$\forall f \in L((V_\alpha)_{\alpha\in A}; K),\quad f = \sum_{i\in I} f(e_i)\,\phi_i, \tag{27.4.1}$$
where $e_i = (e_{\alpha,i_\alpha})_{\alpha\in A}$ for all $i \in I$. Let $U$ be a linear space over $K$. Then
$$\forall f \in L((V_\alpha)_{\alpha\in A}; U),\quad f = \sum_{i\in I} f(e_i)\,\phi_i. \tag{27.4.2}$$
Proof: Line (27.4.2) follows from Theorem 27.3.4 line (27.3.4) and Theorem 27.4.10. To prove the formula
on line (27.4.1) for $K$-valued maps, substitute $K$ for $U$.
27.4.14 Theorem: The multilinear function space dimension equals the product of component dimensions.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then
$$\dim(L((V_\alpha)_{\alpha\in A}; K)) = \prod_{\alpha\in A} \dim(V_\alpha).$$
Proof: The assertion follows from Definition 27.4.9 and Theorem 27.4.11.
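For instance (a sketch, not from the book, identifying a multilinear function on $K^2 \times K^3$ with its array of values on basis pairs), the dimension count $2 \cdot 3 = 6$ can be checked by enumerating the index families in $I = I_0 \times I_1$:

```python
import itertools

dims = (2, 3)  # dim(V_0) = 2, dim(V_1) = 3

# A multilinear function f on K^2 x K^3 is determined by the values
# f(e_{0,i}, e_{1,j}), i.e. by one coefficient per index family (i, j),
# so the dimension of the multilinear function space is the number of
# index families.
index_families = list(itertools.product(*(range(d) for d in dims)))
dim_L = len(index_families)

product_of_dims = 1
for d in dims:
    product_of_dims *= d

assert dim_L == product_of_dims == 6
```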
27.4.15 Remark: Canonical basis for a linear space of multilinear maps.
Definition 27.4.16 is the vector-valued version of Definition 27.4.9. The right hand side of line (27.4.3) is
the scalar product of $e^U_j$ by the element $\prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)$ of the field $K$. This is a kind of juxtaposition
product as described in Remark 27.1.32.
27.4.16 Definition: The canonical basis for the multilinear map space $L((V_\alpha)_{\alpha\in A}; U)$ corresponding to
the bases $B_\alpha = (e_{\alpha,i})_{i\in I_\alpha} \in V_\alpha^{I_\alpha}$ for $V_\alpha$ for $\alpha \in A$, and a basis $B_U = (e^U_j)_{j=1}^m$ for the finite-dimensional
linear space $U$ over $K$, is the family $(\phi_{i,j})_{i\in I, j\in\mathbb{N}_m}$ where $I = \prod_{\alpha\in A} I_\alpha$ and $\phi_{i,j} : \prod_{\alpha\in A} V_\alpha \to U$ is defined
for $i = (i_\alpha)_{\alpha\in A} \in I$ and $j \in \mathbb{N}_m$ by
$$\forall v = (v_\alpha)_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha,\quad \phi_{i,j}(v) = e^U_j \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha). \tag{27.4.3}$$
by Theorem 27.3.4 line (27.3.1). Since $f(e_i) \in U$ for all $i \in I$, it follows from Definition 22.8.8 that
$f(e_i) = \sum_{j=1}^m \kappa_{B_U}(f(e_i))_j\, e^U_j$ for all $i \in I$. Therefore
$$\forall v \in \prod_{\alpha\in A} V_\alpha,\quad f(v) = \sum_{i\in I} \sum_{j\in\mathbb{N}_m} \Big(\kappa_{B_U}(f(e_i))_j\, e^U_j \prod_{\alpha\in A} \kappa_{B_\alpha}(v_\alpha)(i_\alpha)\Big)
= \sum_{i\in I} \sum_{j\in\mathbb{N}_m} \kappa_{B_U}(f(e_i))_j\, \phi_{i,j}(v)$$
by Definition 27.4.16. So $f = \sum_{i\in I} \sum_{j\in\mathbb{N}_m} \kappa_{B_U}(f(e_i))_j\, \phi_{i,j}$.
Part (ii) follows from part (i).
For part (iii), suppose that $\sum_{i\in I, j\in\mathbb{N}_m} k_{i,j}\, \phi_{i,j} = 0$ for some $k \in K^{I\times\mathbb{N}_m}$. Let $i^0 \in I$. Then
$$\begin{aligned}
0 &= \sum_{i\in I} \sum_{j=1}^m k_{i,j}\, \phi_{i,j}(e_{i^0}) \\
&= \sum_{i\in I} \sum_{j=1}^m k_{i,j}\, e^U_j \prod_{\alpha\in A} \kappa_{B_\alpha}(e_{\alpha,i^0_\alpha})(i_\alpha) \\
&= \sum_{j=1}^m e^U_j \sum_{i\in I} k_{i,j} \prod_{\alpha\in A} \kappa_{B_\alpha}(e_{\alpha,i^0_\alpha})(i_\alpha) \\
&= \sum_{j=1}^m e^U_j \sum_{i\in I} k_{i,j} \prod_{\alpha\in A} \delta_{i_\alpha, i^0_\alpha} \qquad (27.4.4) \\
&= \sum_{j=1}^m e^U_j \sum_{i\in I} k_{i,j}\, \delta_{i,i^0} \\
&= \sum_{j=1}^m e^U_j\, k_{i^0,j},
\end{aligned}$$
where line (27.4.4) follows from Theorem 22.8.9 (ii). So $k_{i^0,j} = 0$ for all $(i^0, j) \in I \times \mathbb{N}_m$ by Theorem 22.6.8 (ii)
because $(e^U_j)_{j=1}^m$ is a linearly independent sequence of vectors in $U$. Thus the vectors $\phi_{i,j}$ for $(i, j) \in I \times \mathbb{N}_m$
are linearly independent.
Part (iv) follows from parts (ii) and (iii) and Definition 22.7.2.
Proof: The assertion follows from Definition 27.4.16 and Theorem 27.4.17.
27.5.2 Definition: The canonical basis for the multilinear dual function space $L((V_\alpha^*)_{\alpha\in A}; K)$,
corresponding to bases $B_\alpha = (e_{\alpha,i})_{i\in I_\alpha} \in V_\alpha^{I_\alpha}$ for $V_\alpha$ for $\alpha \in A$, is the family $(\psi_i)_{i\in I}$ where $I = \prod_{\alpha\in A} I_\alpha$
and $\psi_i : \prod_{\alpha\in A} V_\alpha^* \to K$ is defined for $i = (i_\alpha)_{\alpha\in A} \in I$ by
$$\forall w = (w_\alpha)_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha^*,\quad \psi_i(w) = \prod_{\alpha\in A} w_\alpha(e_{\alpha,i_\alpha}).$$
27.5.3 Theorem: Dual canonical basis elements are images of basis-vector families.
Let $(\psi_i)_{i\in I}$ be the canonical basis for $L((V_\alpha^*)_{\alpha\in A}; K)$ in Definition 27.5.2, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of linear spaces over a field $K$. Then
$$\forall i \in I,\quad \psi_i = \tau^T(e_i) \in L((V_\alpha^*)_{\alpha\in A}; K).$$
Proof: The equation $\psi_i = \tau^T(e_i)$ follows from Theorem 27.3.6. Therefore $\psi_i \in L((V_\alpha^*)_{\alpha\in A}; K)$.
27.5.4 Theorem: Dual canonical basis elements are multilinear functions over the dual linear spaces.
Let $(\psi_i)_{i\in I}$ be the canonical basis for $L((V_\alpha^*)_{\alpha\in A}; K)$ in Definition 27.5.2, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of linear spaces over a field $K$. Then $\psi_i \in L((V_\alpha^*)_{\alpha\in A}; K)$ for all $i \in I$.
27.5.5 Theorem: The dual canonical basis is a basis if the component spaces are finite-dimensional.
Let $(\psi_i)_{i\in I}$ be the canonical basis for $L((V_\alpha^*)_{\alpha\in A}; K)$ in Definition 27.5.2, where $(V_\alpha)_{\alpha\in A}$ is a finite family
of finite-dimensional linear spaces over a field $K$. Then $(\psi_i)_{i\in I}$ is a basis for the linear space of multilinear
functions $L((V_\alpha^*)_{\alpha\in A}; K)$.
$= k_j$.
So $k_j = 0$ for all $j \in I$. So the vectors $\psi_i$ for $i \in I$ are linearly independent. Hence $(\psi_i)_{i\in I}$ is a basis for
$L((V_\alpha^*)_{\alpha\in A}; K)$.
where $h_i = (h_{\alpha,i_\alpha})_{\alpha\in A}$ is the dual to $e_i$ for $i \in I$. Let $U$ be a linear space over $K$. Then
$$\forall f \in L((V_\alpha^*)_{\alpha\in A}; U),\quad f = \sum_{i\in I} f(h_i)\,\psi_i. \tag{27.5.1}$$
Proof: Line (27.5.1) follows from Theorem 27.3.7 line (27.3.8) and Theorem 27.5.3. To prove the formula
for $K$-valued maps, substitute $K$ for $U$.
27.6. Universal factorisation property for multilinear functions
27.6.1 Remark: Kernel-like structure for multilinear maps on dual spaces.
Theorem 27.6.2 says essentially that the multilinear kernel of the canonical multilinear function map $\tau$ is
included in the kernel of any multilinear map with the same domain. (See Definition 10.16.7 for the related
equivalence kernel concept for general functions.) In other words, linear functionals $w^1$ and $w^2$ which are
in the same equivalence class with respect to the action of $\tau$ are also equivalent with respect to the action
of $f$. This shows that there is no finer partition of $\prod_{\alpha\in A} V_\alpha^*$ which is generated by a multilinear function on
this set. (See Definition 8.6.14 for finer and coarser partitions.) The fact that this partition is the coarsest
possible partition with this property follows from the fact that $\tau$ is a multilinear function. Therefore the
partitioning of $\prod_{\alpha\in A} V_\alpha^*$ by $\tau$ is the coarsest partition for which all multilinear functions have equal values on
the elements of the partition. This partition may be thought of as a kind of multilinear kernel of $\prod_{\alpha\in A} V_\alpha^*$.
(This partition is precisely characterised in Theorems 27.2.11, 27.2.15 and 27.2.16.)
The canonical multilinear function map $\tau$ helps to clarify the nature of multilinear maps in general. The
kernel of $\tau$ is also in the kernel of every other multilinear map. Hence there is no such thing (in general)
as an injective multilinear map when $\#(A) \ge 2$. Similar observations apply also to the canonical tensor map
for the Cartesian product $\prod_{\alpha\in A} V_\alpha$ in Section 28.2.
27.6.2 Theorem: Multilinear maps are constant on contours of the canonical multilinear function map.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear map $f : \prod_{\alpha\in A} V_\alpha^* \to U$,
$$\forall w^1, w^2 \in \prod_{\alpha\in A} V_\alpha^*,\quad \tau(w^1) = \tau(w^2) \Rightarrow f(w^1) = f(w^2),$$
where $\tau : \prod_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A}; K)$ is the canonical multilinear function map for $L((V_\alpha)_{\alpha\in A}; K)$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear
space over $K$. Let $f : \prod_{\alpha\in A} V_\alpha^* \to U$ be a multilinear map. Let $(e_{\alpha,i})_{i\in I_\alpha}$ be a basis of $V_\alpha$ for each $\alpha \in A$, and
denote by $(h_{\alpha,i})_{i\in I_\alpha}$ the corresponding dual basis in Definition 23.7.3. Let $w^1, w^2 \in \prod_{\alpha\in A} V_\alpha^*$ with $\tau(w^1) =
\tau(w^2)$. Let $I = \prod_{\alpha\in A} I_\alpha$. Then by Theorem 27.3.7, $f(w^1) = \sum_{i\in I} f(h_i)\,\tau(w^1)(e_i) = \sum_{i\in I} f(h_i)\,\tau(w^2)(e_i) =
f(w^2)$, where $e_i = (e_{\alpha,i_\alpha})_{\alpha\in A}$ and $h_i = (h_{\alpha,i_\alpha})_{\alpha\in A}$ for $i \in I$.
27.6.3 Remark: Divisibility of multilinear functions by the canonical multilinear function map.
Theorem 27.6.2 has the interesting consequence that $f \circ \tau^{-1} : \mathrm{Range}(\tau) \to U$ is a well-defined function. (In
fact, this is the purpose of the theorem!) The map $\tau^{-1} : \mathrm{Range}(\tau) \to \prod_{\alpha\in A} V_\alpha^*$ is not well defined because
$\tau^{-1}(\{T\})$ is not in general a singleton for $T \in \mathrm{Range}(\tau)$. However, $f(\tau^{-1}(\{T\})) = \{f(w);\ \tau(w) = T\}$
is guaranteed to be a singleton for $T \in \mathrm{Range}(\tau)$ by Theorem 27.6.2. (This is an alternative way of
expressing Theorem 27.6.2.) Therefore there is a well-defined function $g : \mathrm{Range}(\tau) \to U$ which satisfies
$g(T) = f(w)$ whenever $\tau(w) = T$. In other words, $g \circ \tau = f$. Theorem 27.6.4 asserts the existence and uniqueness
of $g$.
27.6.4 Theorem: Quotient of multilinear functions by the canonical multilinear function map.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear map $f : \prod_{\alpha\in A} V_\alpha^* \to U$, there is a unique function $g : \mathrm{Range}(\tau) \to U$
which satisfies $g(\tau(w)) = f(w)$ for all $w \in \prod_{\alpha\in A} V_\alpha^*$, where $\tau : \prod_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A}; K)$ is the canonical
multilinear function map for $L((V_\alpha)_{\alpha\in A}; K)$.
27.6.5 Remark: The lift of a multilinear function can be extended to all multilinear functions.
The quotient function $g$ in Theorem 27.6.4 may be thought of as a kind of lift of the function $f$ from the
domain $\prod_{\alpha\in A} V_\alpha^*$ to the domain $\mathrm{Range}(\tau)$, which is the set of all simple multilinear functions in $L((V_\alpha)_{\alpha\in A}; K)$.
This lift does not lose any information from the function $f : \prod_{\alpha\in A} V_\alpha^* \to U$ because $f$ may be recovered
from $g$ as $f = g \circ \tau$.
The function $g$ in Theorem 27.6.7 is an extension of the function $g$ in Theorem 27.6.4 from $\mathrm{Range}(\tau)$
to $L((V_\alpha)_{\alpha\in A}; K)$. In other words, it is an extension of the lift of $f$ from the simple multilinear functions
in $\mathrm{Range}(\tau)$ to the set of all multilinear functions in $L((V_\alpha)_{\alpha\in A}; K)$. Moreover, Theorem 27.6.7 asserts that
there is a unique linear extension of this kind.
27.6.6 Remark: Diagram of quotient of multilinear functions by canonical multilinear function map.
Theorem 27.6.7 is a dual version of Theorem 28.3.7. The functions and spaces in Theorem 27.6.7 are
illustrated in Figure 27.6.1.
[Figure 27.6.1 diagram: $\tau : \prod_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A}; K)$ and $f : \prod_{\alpha\in A} V_\alpha^* \to U$, with $g : L((V_\alpha)_{\alpha\in A}; K) \to U$ completing the commuting triangle $f = g \circ \tau$.]
where $e_i = (e_{\alpha,i_\alpha})_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha$ and $h_i = (h_{\alpha,i_\alpha})_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha^*$ for all $i = (i_\alpha)_{\alpha\in A} \in I = \prod_{\alpha\in A} I_\alpha$. Then $g$ is
a linear function of $T$ because $T(e_i)$ is a linear function of $T$ for each $i \in I$. (This follows from the pointwise
definition of vector addition and scalar multiplication for the linear space $L((V_\alpha)_{\alpha\in A}; K)$.) Substituting
$\tau(w)$ for $T$ gives:
$$\begin{aligned}
\forall w \in \prod_{\alpha\in A} V_\alpha^*,\quad g(\tau(w)) &= \sum_{i\in I} \tau(w)(e_i)\, f\big((h_{\alpha,i_\alpha})_{\alpha\in A}\big) \\
&= \sum_{i\in I} \Big(\prod_{\alpha\in A} w_\alpha(e_{\alpha,i_\alpha})\Big) f\big((h_{\alpha,i_\alpha})_{\alpha\in A}\big) \\
&= \sum_{i\in I} f\big((w_\alpha(e_{\alpha,i_\alpha})\, h_{\alpha,i_\alpha})_{\alpha\in A}\big) \\
&= f\bigg(\Big(\sum_{i_\alpha\in I_\alpha} w_\alpha(e_{\alpha,i_\alpha})\, h_{\alpha,i_\alpha}\Big)_{\alpha\in A}\bigg) \\
&= f\big((w_\alpha)_{\alpha\in A}\big) \\
&= f(w).
\end{aligned}$$
Therefore $f = g \circ \tau$.
To show uniqueness of $g$, let $g' : L((V_\alpha)_{\alpha\in A}; K) \to U$ be a linear function such that $f = g' \circ \tau$. Then
$g'(T) = g(T)$ for all $T \in \mathrm{Range}(\tau)$. By Theorem 27.4.11, every multilinear function $T$ in the linear space
$L((V_\alpha)_{\alpha\in A}; K)$ may be expressed as a linear combination of the simple multilinear functions $\phi_i = \tau(h_i)$
for $i \in I$. By Theorem 27.4.13, $T = \sum_{i\in I} T(e_i)\,\phi_i$, where $\phi_i \in L((V_\alpha)_{\alpha\in A}; K)$ is defined by $\phi_i = \tau\big((h_{\alpha,i_\alpha})_{\alpha\in A}\big)$
for $i \in I$. Then $g'(T) = g'\big(\sum_{i\in I} T(e_i)\,\phi_i\big) = \sum_{i\in I} T(e_i)\, g'(\phi_i) = \sum_{i\in I} T(e_i)\, g(\phi_i) = g(T)$. So $g' = g$.
27.6.8 Remark: Difficulty of proving universal factorisation property without basis and coordinates.
Although the proof of Theorem 27.6.7 uses bases and coordinates, the statement of the theorem does not
seem to suggest a requirement for the use of bases and coordinates. The uniqueness assertion in the statement
of the theorem implies, in fact, that the map $g$ is independent of the choice of basis which is used in its
construction. It depends only on $f$. Therefore it is reasonable to ask whether a proof may be constructed
without bases and coordinates.
Theorem 27.6.7 asserts essentially that the map $g$ in Theorem 27.6.4 can be extended from simple multilinear
functions to a linear map on the whole domain $L((V_\alpha)_{\alpha\in A}; K)$. In principle, this kind of extension can be
constructed without choosing a basis. To construct $g$, we need to know that each function $T \in L((V_\alpha)_{\alpha\in A}; K)$
can be expressed as a linear combination of simple multilinear functions $\tau(w)$ for $w \in \prod_{\alpha\in A} V_\alpha^*$. (This is
a spanning property for simple multilinear functions.) Then we need to show that such a construction
for $g$ is independent of the choice of linear combinations. (This is a uniqueness property.) The fly in the
ointment here is the difficulty of proving this spanning property and uniqueness property (Theorem 27.6.9)
without using a basis and coordinates. This intuitively appealing approach would be coordinate-free if
Theorem 27.6.9 could be proved without a basis and coordinates.
27.6.9 Theorem: Basis-free theorem about multilinear function quotients, requiring bases for proof.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then
(1) $\forall f \in L((V_\alpha)_{\alpha\in A}; K),\ \exists m \in \mathbb{Z}_0^+,\ \exists\lambda \in K^m,\ \exists w \in \big(\prod_{\alpha\in A} V_\alpha^*\big)^m,\ f = \sum_{j=1}^m \lambda_j \tau(w_j)$.
(2) $\forall f \in L((V_\alpha^*)_{\alpha\in A}; U),\ \forall m \in \mathbb{Z}_0^+,\ \forall\lambda \in K^m,\ \forall w \in \big(\prod_{\alpha\in A} V_\alpha^*\big)^m,$
$\quad\sum_{j=1}^m \lambda_j \tau(w_j) = 0 \Rightarrow \sum_{j=1}^m \lambda_j f(w_j) = 0$.
(3) $\forall f \in L((V_\alpha^*)_{\alpha\in A}; U),\ \forall m_1, m_2 \in \mathbb{Z}_0^+,\ \forall\lambda^1 \in K^{m_1},\ \forall\lambda^2 \in K^{m_2},\ \forall w^1 \in \big(\prod_{\alpha\in A} V_\alpha^*\big)^{m_1},\ \forall w^2 \in \big(\prod_{\alpha\in A} V_\alpha^*\big)^{m_2},$
$\quad\sum_{j=1}^{m_1} \lambda^1_j \tau(w^1_j) = \sum_{k=1}^{m_2} \lambda^2_k \tau(w^2_k) \Rightarrow \sum_{j=1}^{m_1} \lambda^1_j f(w^1_j) = \sum_{k=1}^{m_2} \lambda^2_k f(w^2_k)$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $B_\alpha =
(e_{\alpha,i})_{i\in I_\alpha} \in V_\alpha^{I_\alpha}$ be a basis for $V_\alpha$ for each $\alpha \in A$. Let $(h_{\alpha,i})_{i\in I_\alpha} \in (V_\alpha^*)^{I_\alpha}$ be the dual basis of $B_\alpha$ for $V_\alpha^*$ for
each $\alpha \in A$. Let $e_i = (e_{\alpha,i_\alpha})_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha$ and $h_i = (h_{\alpha,i_\alpha})_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha^*$ for all $i = (i_\alpha)_{\alpha\in A} \in I = \prod_{\alpha\in A} I_\alpha$.
To show part (1), let $f \in L((V_\alpha)_{\alpha\in A}; K)$. By Theorem 27.3.4, $f = \sum_{j=1}^m \lambda_j \tau(w_j)$ with $m = \prod_{\alpha\in A} \dim(V_\alpha)$,
where $\lambda \in K^m$ and $w \in \big(\prod_{\alpha\in A} V_\alpha^*\big)^m$ are defined by $\lambda_j = f(e_{\iota(j)})$ and $w_j = h_{\iota(j)}$ for all $j \in \mathbb{N}_m$, for any
bijection $\iota : \mathbb{N}_m \to I$.
To show part (2), let $f \in L((V_\alpha^*)_{\alpha\in A}; U)$, $m \in \mathbb{Z}_0^+$, $\lambda \in K^m$, and $w \in \big(\prod_{\alpha\in A} V_\alpha^*\big)^m$. By Theorem 27.3.7,
$\tau(w_j) = \sum_{i\in I} \tau(w_j)(e_i)\,\tau(h_i)$ and $f(w_j) = \sum_{i\in I} \tau(w_j)(e_i)\, f(h_i)$ for all $j \in \mathbb{N}_m$. Suppose $\sum_{j=1}^m \lambda_j \tau(w_j) = 0$.
Then
$$0 = \sum_{j=1}^m \lambda_j \tau(w_j)
= \sum_{j=1}^m \sum_{i\in I} \lambda_j \tau(w_j)(e_i)\,\tau(h_i)
= \sum_{i\in I} \tau(h_i) \sum_{j=1}^m \lambda_j \tau(w_j)(e_i).$$
But the vectors $\tau(h_i) \in L((V_\alpha)_{\alpha\in A}; K)$ for $i \in I$ are linearly independent by Theorem 27.4.11. So
$\sum_{j=1}^m \lambda_j \tau(w_j)(e_i) = 0$ for all $i \in I$. Therefore
$$\sum_{j=1}^m \lambda_j f(w_j)
= \sum_{j=1}^m \sum_{i\in I} \lambda_j \tau(w_j)(e_i)\, f(h_i)
= \sum_{i\in I} f(h_i) \sum_{j=1}^m \lambda_j \tau(w_j)(e_i)
= 0.$$
Part (3) follows immediately from part (2).
27.6.10 Remark: Difficulty of removing coordinates from proof of multilinear function properties.
Theorem 27.6.9 part (3) is a generalisation of Theorem 27.6.2. Parts (1) and (3) of Theorem 27.6.9 imply
respectively that every multilinear function over a finite family of finite-dimensional linear spaces can be
expressed as a linear combination of simple multilinear functions and that any multilinear map evaluated
term-by-term on such a linear combination will give the same value, no matter how the linear combination
is chosen.
Theorem 27.6.9 could be used to prove Theorem 27.6.7 without using a basis and coordinates. But clearly
Theorem 27.6.9 is heavily dependent on a basis and coordinates for its own proof. The statement of this
theorem avoids the basis and coordinates, but it is very difficult to see how a coordinate-free proof of it can
be constructed. Nevertheless, parts (1) and (3) do give an intuitively appealing notion of how multilinear
functions are related to simple multilinear functions.
27.6.11 Remark: Kernel-like structure for multilinear maps on primal spaces.
Theorem 27.6.12 is a dual version of Theorem 27.6.2.
27.6.12 Theorem: Multilinear maps constant on contours of canonical multilinear function map transpose.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a linear space
over $K$. Then for any multilinear map $f : \prod_{\alpha\in A} V_\alpha \to U$,
$$\forall v^1, v^2 \in \prod_{\alpha\in A} V_\alpha,\quad \tau^T(v^1) = \tau^T(v^2) \Rightarrow f(v^1) = f(v^2),$$
where $\tau^T$ is the canonical multilinear function map transpose for $L((V_\alpha^*)_{\alpha\in A}; K)$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U$ be a
linear space over $K$. Let $f : \prod_{\alpha\in A} V_\alpha \to U$ be a multilinear map. Let $(e_{\alpha,i})_{i\in I_\alpha}$ be a basis of $V_\alpha$ for
each $\alpha \in A$, and denote by $(h_{\alpha,i})_{i\in I_\alpha}$ the corresponding dual basis in Definition 23.7.3. Let $v^1, v^2 \in \prod_{\alpha\in A} V_\alpha$
with $\tau^T(v^1) = \tau^T(v^2)$. Let $I = \prod_{\alpha\in A} I_\alpha$. Then by Theorem 27.3.4, $f(v^1) = \sum_{i\in I} f(e_i)\,\tau^T(v^1)(h_i) =
\sum_{i\in I} f(e_i)\,\tau^T(v^2)(h_i) = f(v^2)$, where $e_i = (e_{\alpha,i_\alpha})_{\alpha\in A}$ and $h_i = (h_{\alpha,i_\alpha})_{\alpha\in A}$ for $i \in I$.
Chapter 28
Tensor spaces
of vectors. But this is a trap. A development path which starts with such simple products of vectors leads
very rapidly to complicated, convoluted representations which are difficult to work with. The approach here
is to commence with multilinear maps (Section 27.1), then define tensors as the dual of multilinear maps
(Section 28.1), and then define simple tensors as a special kind of tensor (Section 28.4).
28.1.2 Definition: The tensor product (space) of a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces over a field $K$
is the dual $L((V_\alpha)_{\alpha\in A}; K)^*$ of the linear space of multilinear functions $L((V_\alpha)_{\alpha\in A}; K)$.
A tensor space is the tensor product of a finite family of linear spaces over the same field.
A tensor is any element of a tensor space.
28.1.4 Notation: $\bigotimes_{\alpha\in A} V_\alpha$ denotes the tensor product space of a finite family $(V_\alpha)_{\alpha\in A}$ of linear spaces
over a field $K$. Thus
$$\bigotimes_{\alpha\in A} V_\alpha = L((V_\alpha)_{\alpha\in A}; K)^* = \mathrm{Lin}(L((V_\alpha)_{\alpha\in A}; K), K).$$
28.1.5 Notation: $\bigotimes_{i=1}^m V_i$ for a sequence of linear spaces $(V_i)_{i=1}^m$ over a field $K$, where $m \in \mathbb{Z}_0^+$, denotes
the tensor product space $\bigotimes_{\alpha\in A} V_\alpha$ with index set $A = \mathbb{N}_m$.
The notations for the three cases (general finite family, finite sequence, and $m$ copies of a single space) correspond as follows:
field: $K$ in each case;
component spaces: $\prod_{\alpha\in A} V_\alpha$; $V_1 \times \ldots \times V_m$; $\underbrace{V \times \ldots \times V}_{m}$;
multilinear function spaces: $L((V_\alpha)_{\alpha\in A})$; $L(V_1, \ldots, V_m)$; $L^m(V)$;
tensor spaces: $\bigotimes_{\alpha\in A} V_\alpha = L((V_\alpha)_{\alpha\in A})^*$; $\bigotimes_{i=1}^m V_i = L((V_i)_{i=1}^m)^*$; $\bigotimes^m V = L^m(V)^*$.
28.1.7 Notation: $\bigotimes^m V$ for a linear space $V$ and $m \in \mathbb{Z}_0^+$ denotes the tensor product $\bigotimes_{i=1}^m V_i$ where
$V_i = V$ for $i \in \mathbb{N}_m$.
28.1.8 Remark: Alternative tensor space notations.
See Remark 30.4.13 (Table 30.4.2) for a list of tensor space notations of other authors.
[Table: the tensor space notations $\bigotimes_{\alpha\in A} V_\alpha$, $\bigotimes_{i=1}^m V_i$ and $\bigotimes^m V$ correspond to the multilinear function spaces $L((V_\alpha)_{\alpha\in A})$, $L(V_1, \ldots, V_m)$ and $L^m(V)$, with component spaces $(V_\alpha)_{\alpha\in A}$; $V_1, \ldots, V_m$; and $\underbrace{V, \ldots, V}_{m}$ respectively.]
28.1.10 Remark: Equivalences between notations for tensor spaces for a family of linear spaces.
In terms of Notations 28.1.5 and 28.1.6, one may write
    V_1 ⊗ ... ⊗ V_m = ⊗_{i=1}^m V_i = L((V_i)_{i=1}^m; K)^* = L(V_1, ..., V_m; K)^*.
28.1.11 Definition: The degree of a tensor space ⊗_{α∈A} V_α is the number #(A), where (V_α)_{α∈A} is a finite
family of linear spaces over the same field.
28.1.12 Remark: The degree of a tensor space with homogeneous linear spaces.
The degree of a tensor space ⊗^m V for m ∈ ℤ_0^+ and a linear space V is the number m.
the linear space L((V_α)_{α∈A}; K). (The domain of a function may be determined unambiguously from any
function, and all functions in a dual space have the same domain.) The domain for this linear space is always
a non-empty set of m-tuples. (See Remark 27.1.14.) The number of elements in a tuple is unambiguous.
Therefore Definition 28.1.11 is readily extended to define the degree of each tensor which is an element of a
tensor space.
28.1.17 Remark: Tensors of degree 1 are not the same as vectors.
On the subject of identifications generally, it should be said that they generally trade meaningfulness
for convenience. Often some important meaning is lost when similar things are agreed to be identical or
identified. In the case of tensor spaces of degree 1, the loss of meaningfulness might be quite significant.
In fact, it could be argued that much of the confusion surrounding tensors may be traced to the kind of
identification of tensors with vectors which is mentioned in Remark 28.1.16.
A vector is a class of object which may be interpreted physically as a velocity. (See Sections 26.17, 51.1
and 51.3 for some discussion of the interpretation of vectors as velocities.) By contrast, an element of ⊗V,
for a finite-dimensional linear space V, is a map of the form w ↦ w(v) for some v ∈ V. In other words, an
element of ⊗V maps linear functionals w ∈ V^* to values w(v). That is, it is a map from V^* to K,
where K is the field of V. This map just happens to be expressible as the action of linear functionals w ∈ V^*
on a fixed vector v ∈ V (or alternatively as the response of the vector v to linear functionals w ∈ V^*). If w
represents a force field of some kind, then w(v) represents the force (or work) experienced by a particle with
velocity v in that field. By replacing the physically meaningful formula w ↦ w(v) with the vector v, the
interaction with the force field is eliminated from the mathematical representation.
If one acquiesces in the identification of tensors of degree 1 with vectors, one naturally seeks to identify
tensors of degree 2 with vectors in some way. Then it may seem reasonable to identify tensors of degree 2
with sums of pairs of vectors, and the mathematical machinery to handle this becomes very clumsy. But
if one forgoes the initial advantages of identifying tensors of degree 1 with vectors, one may deal with
all (contravariant) tensors in a consistent way as the duals of multilinear spaces. Then one may physically
interpret all (contravariant) tensors as interactions between particles and force fields, including tensors of
degree 1. In short, vectors are not a sub-class of the class of tensors, and tensors are not a generalisation of
the class of vectors.
28.1.18 Theorem: Basis-and-coordinates formula for the action of a tensor on a multilinear function.
Let (V_α)_{α∈A} be a finite family of finite-dimensional linear spaces over a field K. Then

    ∀T ∈ ⊗_{α∈A} V_α, ∀f ∈ L((V_α)_{α∈A}; K),   T(f) = Σ_{i∈I} T(λ^i) f(e_i),

where (λ^i)_{i∈I} is the canonical basis for L((V_α)_{α∈A}; K) in Definition 27.4.9 corresponding to the bases
(e_{α,i})_{i∈I_α} for V_α, and e_i = (e_{α,i_α})_{α∈A} for all i = (i_α)_{α∈A} ∈ I = ×_{α∈A} I_α.

Proof: Let T ∈ ⊗_{α∈A} V_α and f ∈ L((V_α)_{α∈A}; K). Then T(f) = T(Σ_{i∈I} f(e_i) λ^i) by Theorem 27.4.13.
Hence T(f) = Σ_{i∈I} T(λ^i) f(e_i).
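The formula of Theorem 28.1.18 can be checked numerically in a minimal coordinate sketch (not from the text): a bilinear f on ℝ² × ℝ³ is represented by its coefficient matrix F, a tensor T by an array K of coefficients T(λ^i), and the action T(f) is then the double sum over coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bilinear f : R^2 x R^3 -> R represented by its coefficient matrix F,
# so that f(x, y) = x @ F @ y and f(e_i, e_j) = F[i, j].
F = rng.standard_normal((2, 3))

def f(x, y):
    return x @ F @ y

# A tensor T (an element of the dual of the bilinear-function space) is
# determined by its coefficients K[i, j] = T(lambda^(i,j)).
K = rng.standard_normal((2, 3))

def T(coeffs):
    # T acting on a bilinear function given by its coefficient matrix.
    return float(np.sum(K * coeffs))

# Theorem 28.1.18: T(f) = sum_i T(lambda^i) f(e_i).
lhs = T(F)
rhs = sum(K[i, j] * f(np.eye(2)[i], np.eye(3)[j])
          for i in range(2) for j in range(3))
assert np.isclose(lhs, rhs)
```

The two sides agree because λ^i picks out exactly the (i, j) coefficient of f in the canonical basis.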
28.1.20 Theorem: The tensor product space dimension equals the product of the component dimensions.
Let (V_α)_{α∈A} be a finite family of finite-dimensional linear spaces. Then

    dim(⊗_{α∈A} V_α) = ∏_{α∈A} dim(V_α).

Proof: This follows immediately from Theorem 27.4.14 because ⊗_{α∈A} V_α is the linear space dual of
L((V_α)_{α∈A}; K).
28.1.21 Remark: Comparison of tensor product space dimension with direct product space dimension.
The dimension ∏_{α∈A} dim(V_α) of the tensor product ⊗_{α∈A} V_α of finite-dimensional linear spaces may be
compared with the dimension Σ_{α∈A} dim(V_α) of the direct product ×_{α∈A} V_α, which is also known as the
direct sum of the family of linear spaces.

    space            dimension
    ×_{α∈A} V_α      Σ_{α∈A} dim(V_α)
    ⊗_{α∈A} V_α      ∏_{α∈A} dim(V_α)
This shows clearly how different the two concepts are. It also explains why the word sum is used for the
direct sum of linear spaces.
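The product-versus-sum contrast is easy to see in a small worked instance (the dimensions 2, 3, 4 are illustrative only):

```python
import math

# dim(V_alpha) for a family of three finite-dimensional spaces
dims = [2, 3, 4]

tensor_product_dim = math.prod(dims)  # dim of the tensor product
direct_sum_dim = sum(dims)            # dim of the direct sum

assert tensor_product_dim == 24
assert direct_sum_dim == 9
```

The gap between 24 and 9 grows rapidly with the number of factors, which is the point of the remark.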
28.2.2 Definition: The canonical tensor map for a tensor space ⊗_{α∈A} V_α, where (V_α)_{α∈A} is a finite
family of linear spaces, is the map θ: ×_{α∈A} V_α → ⊗_{α∈A} V_α which is defined by θ(v)(f) = f(v) for all
v ∈ ×_{α∈A} V_α and f ∈ L((V_α)_{α∈A}; K).
28.2.5 Remark: Diagrams of domains and ranges of the four canonical multilinear maps.
The domains and ranges of the canonical multilinear function map μ (and its function-transpose μ^T) and
the canonical tensor map θ (and its dual version θ^*) are summarised in Figure 28.2.1.
28.2.6 Remark: The relation between the canonical tensor map and canonical multilinear map.
The canonical tensor map θ: ×_{α∈A} V_α → ⊗_{α∈A} V_α in Definition 28.2.2 is related to the canonical multilinear
function map μ: ×_{α∈A} V_α^* → L((V_α)_{α∈A}; K) in Definition 27.2.4. Since ⊗_{α∈A} V_α is the linear space dual
of L((V_α)_{α∈A}; K), the composition θ(v)(μ(w)) is well defined and satisfies

    ∀v ∈ ×_{α∈A} V_α, ∀w ∈ ×_{α∈A} V_α^*,   θ(v)(μ(w)) = ∏_{α∈A} w_α(v_α).
28.2.7 Remark: The canonical tensor map for a family of dual spaces.
If Definition 28.2.2 is applied to the dual spaces (V_α^*)_{α∈A}, a canonical tensor map θ^*: ×_{α∈A} V_α^* → ⊗_{α∈A} V_α^*
is then defined by θ^*(w)(g) = g(w) for all w ∈ ×_{α∈A} V_α^* and g ∈ L((V_α^*)_{α∈A}; K).
This is well defined when the primal component spaces V_α are infinite-dimensional, or even when these spaces
have no basis. However, the dual-space canonical tensor map θ^* is (probably) of limited applicability for
such primal spaces.
The dual-space canonical tensor map θ^* does not require a separate definition, but it is formalised anyway in
Definition 28.2.8, to at least give it a definite name and notation.
28.2.8 Definition: The dual-space canonical tensor map for a tensor space ⊗_{α∈A} V_α^*, where (V_α)_{α∈A} is
a finite family of linear spaces, is the map θ^*: ×_{α∈A} V_α^* → ⊗_{α∈A} V_α^* which is defined by θ^*(w)(g) = g(w)
for all w ∈ ×_{α∈A} V_α^* and g ∈ L((V_α^*)_{α∈A}; K).
28.2.9 Remark: Relation between the dual-space canonical tensor map and canonical multilinear map.
The dual-space canonical tensor map θ^* in Definition 28.2.8 is related to the canonical multilinear function
map μ: ×_{α∈A} V_α^* → L((V_α)_{α∈A}; K) in Definition 27.2.4. Since ⊗_{α∈A} V_α^* is the linear space dual of
L((V_α^*)_{α∈A}; K), the composition θ^*(w)(μ^T(v)) is well defined and satisfies

    ∀w ∈ ×_{α∈A} V_α^*, ∀v ∈ ×_{α∈A} V_α,   θ^*(w)(μ^T(v)) = ∏_{α∈A} w_α(v_α).
28.2.10 Remark: Conditions for injectivity and surjectivity of the canonical tensor map.
The canonical tensor map θ in Definition 28.2.2 is almost never injective, and it is only surjective in special
cases. Injectivity holds for #(A) = 0 and #(A) = 1. In the case #(A) = 0, there is only one element
in ×_{α∈A} V_α. So θ is necessarily injective. In the case #(A) = 1, the space ⊗_{α∈A} V_α is the double dual of
the single linear space V_α. So θ is actually the canonical map from a linear space to its second dual which is
presented in Definition 23.10.4. Therefore the map θ is injective in this case. When #(A) > 1, the map θ is
in general non-injective.
The canonical tensor map θ is surjective when dim(V_α) = 0 for some α ∈ A because then dim(⊗_{α∈A} V_α) = 0
also. The map θ is also surjective if #(A) = 1 with a single finite-dimensional linear space V_α because the
canonical map from a linear space to its second dual is a linear space isomorphism if the primal linear space
is finite-dimensional. The map θ is also surjective if dim(V_α) > 1 for at most one index α ∈ A if this single
tensor component is finite-dimensional. But if dim(V_α) > 1 for two or more indices α, and dim(V_α) > 0 for
all indices α, then θ is necessarily non-surjective.
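Both failures can be seen concretely in coordinates for #(A) = 2. This is an illustrative sketch (the vectors are arbitrary): with respect to fixed bases, θ(v_1, v_2) has the outer product v_1 v_2^T as its coefficient array, so rescaled pairs collapse to the same tensor, and coefficient arrays of rank greater than 1 are never attained.

```python
import numpy as np

# With respect to fixed bases, theta(v1, v2) has coefficient array
# equal to the outer product v1 v2^T.
v1 = np.array([1.0, 2.0])
v2 = np.array([3.0, 1.0, 4.0])
t = np.outer(v1, v2)

# Non-injectivity for #(A) = 2: rescaling the pair leaves the tensor unchanged.
assert np.allclose(np.outer(2.0 * v1, 0.5 * v2), t)

# Non-surjectivity: every theta(v1, v2) has coefficient rank <= 1,
# so a rank-2 coefficient array is not in the range of theta.
assert np.linalg.matrix_rank(t) <= 1
assert np.linalg.matrix_rank(np.eye(2, 3)) == 2
```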
28.2.12 Theorem: Basis-and-coordinates expression for ⊗_{α∈A} V_α = L((V_α)_{α∈A}; K)^*.
Let T ∈ ⊗_{α∈A} V_α, where (V_α)_{α∈A} is a finite family of finite-dimensional linear spaces over a field K. Let
B_α = (e_{α,i})_{i∈I_α} ∈ V_α^{I_α} be a basis for V_α for each α ∈ A. Let e_i = (e_{α,i_α})_{α∈A} ∈ ×_{α∈A} V_α for each index family
i = (i_α)_{α∈A} ∈ I = ×_{α∈A} I_α. Then

    ∀f ∈ L((V_α)_{α∈A}; K),   T(f) = Σ_{i∈I} T(λ^i) f(e_i)                (28.2.1)
                                   = Σ_{i∈I} T(λ^i) θ(e_i)(f),            (28.2.2)

where λ^i ∈ L((V_α)_{α∈A}; K) is defined by λ^i = ∏_{α∈A} h_α^{i_α} for i ∈ I, the linear functional families
h^i = (h_α^{i_α})_{α∈A} ∈ ×_{α∈A} V_α^* for i ∈ I are as in Definition 23.7.3, and θ: ×_{α∈A} V_α → ⊗_{α∈A} V_α is the
canonical tensor map in Definition 28.2.2. Hence

    T = Σ_{i∈I} T(λ^i) θ(e_i).                                            (28.2.3)
Proof: Line (28.2.1) is the same as Theorem 28.1.18. Then line (28.2.2) follows from Definition 28.2.2.
Line (28.2.3) follows from the pointwise definition of equality of tensors.
28.2.13 Remark: Extension of the canonical tensor map to symbolic sums of vector-tuples.
The canonical tensor map θ: ×_{α∈A} V_α → ⊗_{α∈A} V_α for the finite family of linear spaces (V_α)_{α∈A} may be
extended in a natural way to the list space List(×_{α∈A} V_α). There is no natural vector addition operation on
the Cartesian product ×_{α∈A} V_α, but if this set is extended to lists, a primitive kind of addition can be defined
by list concatenation. The range of θ has a well-defined vector addition already. This suggests an extended
map θ̄ with θ̄(((v_{α,i})_{α∈A})_{i∈I})(f) = Σ_{i∈I} f((v_{α,i})_{α∈A})
for all ((v_{α,i})_{α∈A})_{i∈I} ∈ List(×_{α∈A} V_α) and f ∈ L((V_α)_{α∈A}; K). This map has the property that θ̄ applied
to the concatenation of two vector-tuple lists yields the sum of θ̄ applied to each vector-tuple list. In other
words, θ̄ maps the concatenation operation to the addition operation.
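A minimal coordinate sketch of this extension (written theta_bar here; the vector pairs are arbitrary illustrations): each pair maps to an outer product, a list maps to the sum of its outer products, and concatenation of lists then corresponds to addition of tensors.

```python
import numpy as np

# List extension of the canonical tensor map: in coordinates a vector
# pair maps to an outer product, and a list of pairs maps to the sum
# of the outer products.
def theta_bar(pairs):
    return sum(np.outer(u, v) for u, v in pairs)

L1 = [(np.array([1.0, 0.0]), np.array([2.0, 1.0]))]
L2 = [(np.array([0.0, 1.0]), np.array([1.0, 3.0])),
      (np.array([1.0, 1.0]), np.array([1.0, 1.0]))]

# Concatenation of vector-tuple lists maps to addition of tensors.
lhs = theta_bar(L1 + L2)
rhs = theta_bar(L1) + theta_bar(L2)
assert np.allclose(lhs, rhs)
```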
restructured to be more like Theorem 27.6.7, which uses Theorem 27.6.4 to prove uniqueness.
28.3.4 Theorem: Multilinear maps are constant on contours of the canonical tensor map.
Let (V_α)_{α∈A} be a finite family of finite-dimensional linear spaces over a field K. Let U be a linear space
over K. Then for any multilinear map f: ×_{α∈A} V_α → U,

    ∀v^1, v^2 ∈ ×_{α∈A} V_α,   θ(v^1) = θ(v^2) ⇒ f(v^1) = f(v^2),

where θ: ×_{α∈A} V_α → ⊗_{α∈A} V_α is the canonical tensor map in Definition 28.2.2.
28.3.6 Remark: Arrow diagram for quotient of multilinear maps by the canonical tensor map.
The functions and spaces in Theorem 28.3.7 are illustrated in Figure 28.3.1.
    [Figure 28.3.1: the multilinear map f: ×_{α∈A} V_α → U factorises as f = g ∘ θ through the canonical
    tensor map θ: ×_{α∈A} V_α → ⊗_{α∈A} V_α, with linear g: ⊗_{α∈A} V_α → U.]
      = Σ_{i∈I} λ^i(v) f((e_{α,i_α})_{α∈A})
      = Σ_{i∈I} (∏_{α∈A} v_{α,i_α}) f((e_{α,i_α})_{α∈A})
      = Σ_{i∈I} f((v_{α,i_α} e_{α,i_α})_{α∈A})
      = f((Σ_{i_α∈I_α} v_{α,i_α} e_{α,i_α})_{α∈A})
      = f(v).
28.3.9 Remark: Universal factorisation property theorem statement does not need a basis.
It seems that the proof of Theorem 28.3.7 requires the use of bases for the component spaces of the tensor
product, although the statement of the theorem does not seem to require bases. The uniqueness part of the
theorem shows that the constructed map is independent of the basis which is used to construct it.
28.4.2 Definition: A simple tensor in a tensor product space ⊗_{α∈A} V_α is any tensor θ(v) for v ∈ ×_{α∈A} V_α,
where θ is the canonical tensor map for ⊗_{α∈A} V_α.
28.4.4 Notation: ⊗_{i=1}^m v_i, for a sequence of vectors (v_i)_{i=1}^m in linear spaces (V_i)_{i=1}^m over a field K, where
m ∈ ℤ_0^+, denotes the simple tensor θ((v_i)_{i=1}^m).
28.4.5 Notation: v_1 ⊗ ... ⊗ v_m, for a sequence of vectors (v_i)_{i=1}^m in linear spaces (V_i)_{i=1}^m over a field K,
where m ∈ ℤ_0^+, denotes the simple tensor ⊗_{i=1}^m v_i.
28.4.6 Remark: Equivalences between notations for tensor monomials for a family of linear spaces.
In terms of Notations 28.4.4 and 28.4.5, one may write
    v_1 ⊗ ... ⊗ v_m = ⊗_{i=1}^m v_i : f ↦ f((v_i)_{i=1}^m) = f(v_1, ..., v_m).
28.4.7 Remark: The relation between simple tensors and simple multilinear maps.
Simple tensors θ(v) = ⊗_{α∈A} v_α for v ∈ ×_{α∈A} V_α are related to the simple multilinear maps μ(w) in
Definition 27.2.6 as follows.

    ∀w ∈ ×_{α∈A} V_α^*, ∀v ∈ ×_{α∈A} V_α,   θ(v)(μ(w)) = μ(w)((v_α)_{α∈A})
                                                       = ∏_{α∈A} w_α(v_α).
    [Figure 28.4.1 Equivalent bilinear effect of vector pairs: the pairs (2v_1, 0.5v_2), (v_1, v_2) and
    (0.5v_1, 2v_2) all have the same bilinear effect, since f(2v_1, 0.5v_2) = f(v_1, v_2) = f(0.5v_1, 2v_2),
    and all lie on the same contour H_t = {λ_1 v_1 + λ_2 v_2 ; f(λ_1 v_1, λ_2 v_2) = t}.]
The constraint f(λ_1 v_1, λ_2 v_2) = t for a bilinear function f implies λ_1 λ_2 f(v_1, v_2) = t, which implies a reciprocal
relation between λ_1 and λ_2.
The same monomial degree-2 tensor is determined by each of the vector pairs (λ_1 v_1, λ_2 v_2) in Figure 28.4.1
because all bilinear functions f must give the same value f(λ_1 v_1, λ_2 v_2) for each of these vector pairs. Thus
even though we often think of tensors as products of vectors, the tensors are only determined by the vectors.
Many vector pairs determine the same tensor. It is better to think of a tensor as a property of a vector tuple
rather than as the tuple itself. This is analogous to the common acceptance that 1/2 and 2/4 determine the
same number.
Let (a_{ij})_{i,j=1}^n ∈ M_{n,n}(ℝ) be a real n×n matrix and define f: V_1 × V_2 → ℝ by f(v_1, v_2) = Σ_{i,j=1}^n a_{ij} v_{1,i} v_{2,j}
for (v_1, v_2) ∈ V_1 × V_2, where (v_{k,i})_{i=1}^n are the components of v_k for k = 1, 2. Then f is bilinear. Therefore
(v_1 ⊗ v_2)(f) = (θ((v_1, v_2)))(f) = Σ_{i,j=1}^n a_{ij} v_{1,i} v_{2,j} for all v_1 ∈ V_1 and v_2 ∈ V_2. In this example, f is a
general bilinear map and v_1 ⊗ v_2 is a simple bilinear tensor.
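This example can be run directly in coordinates as a minimal sketch (the matrix and vectors are arbitrary illustrations, not from the text): the double sum Σ a_{ij} v_{1,i} v_{2,j} is both the value f(v_1, v_2) and the action of the simple tensor v_1 ⊗ v_2 on f.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
a = rng.standard_normal((n, n))   # coefficients a_ij of the bilinear map f
v1 = rng.standard_normal(n)
v2 = rng.standard_normal(n)

# f(v1, v2) = sum_ij a_ij v1_i v2_j = v1^T a v2.
f_value = v1 @ a @ v2

# The simple tensor v1 (x) v2 acts on f through the same double sum,
# with the outer product v1 v2^T as its coefficient array.
tensor_action = np.sum(a * np.outer(v1, v2))

assert np.isclose(f_value, tensor_action)
```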
28.4.11 Remark: Another way of seeing that not all tensors are simple tensors.
Not all tensors are simple tensors. This is abundantly clear from the different dimensions of the spaces of
simple tensors and general tensors. In the case of finite-dimensional spaces (V_α)_{α∈A} with n_α = dim(V_α)
for each α ∈ A, the dimension of the set of simple tensors is Σ_{α∈A} n_α, but dim(⊗_{α∈A} V_α) = ∏_{α∈A} n_α.
(See Remark 28.1.21.)
There is a more concrete way to see that not all tensors are simple tensors. If one has a finite basis for each
component space as in Remark 28.4.8, one may write any T ∈ ⊗_{α∈A} V_α as T = Σ_{i∈I} k_i ⊗_{α∈A} e_{α,i_α}, where
the coefficients k_i are defined by k_i = T(λ^i) for all i ∈ I. In the case #(A) = 2 with V_1 = V_2 = V with
dim(V) = n ∈ ℤ_0^+ and K = ℝ, the coefficients k_i become an n×n square matrix a ∈ M_{n,n}(ℝ). By adjusting
the basis of V, the matrix can be diagonalised if it is symmetric. Then the tensor T can be written as the
sum of n simple tensors. Only in very special cases is it possible to choose the basis so that all except one
of the elements of the matrix a are equal to zero. Clearly simple tensors are quite unusual in general.
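The diagonalisation argument can be sketched numerically (the symmetric matrix here is an arbitrary illustration): an orthonormal eigenbasis writes the coefficient matrix as a sum of n rank-1 (simple) pieces, while the matrix itself has rank 2 and so is not a single simple tensor.

```python
import numpy as np

# A symmetric 2x2 coefficient matrix, diagonalised by an orthonormal
# eigenbasis: the tensor is then a sum of n = 2 simple (rank-1) tensors.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
w, Q = np.linalg.eigh(A)
reconstruction = sum(w[k] * np.outer(Q[:, k], Q[:, k]) for k in range(2))
assert np.allclose(reconstruction, A)

# But A itself is not simple: coefficient arrays of simple tensors have
# rank <= 1, and A has rank 2.
assert np.linalg.matrix_rank(A) == 2
```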
Similarly, in the case of antisymmetric tensors in Section 30.4, one might wish to represent such tensors as
equivalence classes θ(v) = {v′ ∈ ×_{i=1}^m V_i ; f(v′) = f(v) for all antisymmetric f ∈ L^m(V)} for v ∈ ×_{i=1}^m V_i.
However, only simple antisymmetric tensors may be represented in this way.
In the case of simple tensors, Example 28.4.10 shows that equivalent bilinear effect classes do have a
strong intuitive clarity. Similarly, in Remark 30.4.25, the equivalent antisymmetric bilinear effect of vector
pairs gives strong intuitive clarity for simple antisymmetric tensors. Thus the tensor product v_1 ⊗ v_2 may be
interpreted as the multilinear effect of the pair (v_1, v_2), and the symbolic expression v_1 ⊗ v_2 may be represented
mathematically as an equivalence class of pairs (v_1, v_2). Similarly, the antisymmetric wedge product v_1 ∧ v_2 may
be interpreted as the antisymmetric multilinear effect of the pair (v_1, v_2), and the symbolic expression v_1 ∧ v_2
may be represented mathematically as an equivalence class of pairs (v_1, v_2). Despite the intuitive advantages,
it is not easy to generalise this kind of representation to general tensors and general antisymmetric tensors.
The four canonical maps and their basis expansions may be summarised as follows.

    μ: ×_{α∈A} V_α^* → L((V_α)_{α∈A}; K)
    μ^T: ×_{α∈A} V_α → L((V_α^*)_{α∈A}; K)
    θ: ×_{α∈A} V_α → L((V_α)_{α∈A}; K)^* = ⊗_{α∈A} V_α
    θ^*: ×_{α∈A} V_α^* → L((V_α^*)_{α∈A}; K)^* = ⊗_{α∈A} V_α^*

    μ: w ↦ (v ↦ ∏_{α∈A} w_α(v_α))                                        (28.4.1)
    μ^T: v ↦ (w ↦ ∏_{α∈A} w_α(v_α))                                      (28.4.2)
    θ: v ↦ (ψ ↦ ψ(v))                                                    (28.4.3)
    θ^*: w ↦ (φ ↦ φ(w)).                                                 (28.4.4)

Substituting μ(w) for ψ in θ(v)(ψ) and substituting μ^T(v) for φ in θ^*(w)(φ) gives

    ∀v ∈ ×_{α∈A} V_α, ∀w ∈ ×_{α∈A} V_α^*,   θ(v)(μ(w)) = μ(w)(v)
                                                        = μ^T(v)(w)

    ∀v ∈ ×_{α∈A} V_α,   θ(v) ∘ μ = μ^T(v)                                (28.4.5)

    ∀i ∈ I,   λ^i = μ(h^i)
    ∀i ∈ I,   λ_i = μ^T(e_i)

    ∀f ∈ L((V_α)_{α∈A}; U),    f = Σ_{i∈I} f(e_i) μ(h^i) = Σ_{i∈I} f(e_i) λ^i
    ∀f ∈ L((V_α^*)_{α∈A}; U),  f = Σ_{i∈I} f(h^i) μ^T(e_i) = Σ_{i∈I} f(h^i) λ_i

    ∀T ∈ ⊗_{α∈A} V_α, ∀f ∈ L((V_α)_{α∈A}; K),   T(f) = Σ_{i∈I} T(λ^i) f(e_i)
    ∀T ∈ ⊗_{α∈A} V_α,   T = Σ_{i∈I} T(λ^i) θ(e_i),

where U is a linear space over the field K. The universal factorisation properties are as follows.
28.5.2 Remark: Tensorial objects which transform similarly may be very different classes of objects.
Bishop/Goldberg [3], page 79, make the following comment on the many alternative representations of
tensor spaces as set-constructions. The term interpretation means representation in their context. The
multilinear function interpretation of a tensor which they refer to is the style of tensor representation
presented in Definition 29.3.8.
Tensors generally admit several interpretations in addition to the definitive one of being a multilinear
function with values in ℝ. For tensors arising in applications or from mathematical structures it
is rarely the case that the multilinear function interpretation of a tensor is the most meaningful in
a physical or geometric sense. Thus it is important to be able to pass from one interpretation to
another. The number of interpretations increases rapidly as a function of the degrees.
Flanders [11], page 3, makes a related comment.
In tensor calculations the maze of indices often makes one lose sight of the very great differences
between various types of quantities which can be represented by tensors, for example, vectors
tangent to a space, mappings between such vectors, geometric structures on the tangent spaces.
In other words, the fact that two tensor spaces have the same transformation rules does not necessarily imply
that they are the same type of quantity.
    1988  Kay [18]              L((V_α)_{α∈A}; K)           L((V_α)_{α∈A}; K)
    1991  Fulton/Harris [74]    characterisation            ⊗_{α∈A} V_α, (⊗_{α∈A} V_α)
    1994  Darling [8]           quotient for (V_α)_{α∈A}
    1995  O'Neill [261]         L((V_α)_{α∈A}; K)           L((V_α)_{α∈A}; K)
    1996  Goenner [244]         L((V_α)_{α∈A}; K)           L((V_α)_{α∈A}; K)
    1997  Frankel [12]          L((V_α)_{α∈A}; K)           L((V_α)_{α∈A}; K)
    1997  Lee [24]              L((V_α)_{α∈A}; K)           L((V_α)_{α∈A}; K)
    1999  Lang [23]             L((V_α)_{α∈A}; K)
    1999  Rebhan [264]          coordinates                 coordinates
    2004  Bump [56]             characterisation
    2004  Szekeres [266]        quotient for (V_α)_{α∈A}    quotient for (V_α)_{α∈A}
    2005  Penrose [263]         coordinates                 coordinates
    2006  Gregory [246]         coordinates
    2012  Sternberg [37]        L((V_α)_{α∈A}; K)           L((V_α)_{α∈A}; K)
          Kennington            L((V_α)_{α∈A}; K)           L((V_α)_{α∈A}; K)
force fields, and tensors represent the response to fields, and that response may have a vectorial character,
or a product-of-two-vectors character, or something more complex than that. By starting a presentation of
tensor algebra with the simplest case, the tensors of contravariant degree 1, the subject becomes unnecessarily
complex. Starting with multilinear force fields and defining contravariant tensors as responses to such fields
seems to give the subject a more natural structure and development sequence.
The difference is clearest in the case of antisymmetric tensors. In the dual-of-multilinear-functions case (A),
the action of the field ω on a vector pair is the same as in the ordinary tensor case, but ω just happens to
be antisymmetric. That is, ω(u, v) = −ω(v, u). One can easily implement any kind of symmetry for the field
by imposing symmetry conditions on the field ω, which is where the symmetry constraint belongs.
In the popular multilinear-functions-of-duals representation (B), a simple wedge product u ∧ v must map
each pair (λ_1, λ_2) to the expression (u ∧ v)(λ_1, λ_2) = ½(u ⊗ v − v ⊗ u)(λ_1, λ_2) = ½(λ_1(u)λ_2(v) − λ_2(u)λ_1(v)).
In case (B), the particle must perform a more complex calculation on the two field strengths λ_1 and λ_2 to
decide how to move antisymmetrically. This seems very clumsy and arbitrary. This representation of simple
wedge products does not put any constraints on the multilinear function pairs (λ_1, λ_2).
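The two ways of evaluating (u ∧ v)(λ_1, λ_2) in representation (B) can be checked in coordinates; this is a minimal sketch with arbitrary illustrative vectors and functionals.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 5.0])
l1 = np.array([1.0, -1.0])   # linear functionals on R^2, as coefficient vectors
l2 = np.array([2.0, 0.0])

# Coefficient matrix of u ^ v = (1/2)(u (x) v - v (x) u).
wedge = 0.5 * (np.outer(u, v) - np.outer(v, u))

# (u ^ v)(l1, l2) evaluated two ways.
via_matrix = l1 @ wedge @ l2
direct = 0.5 * ((l1 @ u) * (l2 @ v) - (l2 @ u) * (l1 @ v))
assert np.isclose(via_matrix, direct)
```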
The fact that the popular model (B) defines wedge product-tensors to be a subspace of the general tensor
products might sound like a good idea (because an already defined space can be recycled). But in fact,
wedge products are a quite different kind of object to general tensors or symmetric tensors. They have a
different algebra. You shouldn't be able to add an antisymmetric tensor to a symmetric tensor. It is difficult
to think of a circumstance where you would want to add them. The result would be subject to ambiguous
multiplication operations. Multiplication of antisymmetric tensors follows different rules to general tensors.
It seems clear, therefore, that it is best to define antisymmetric contravariant tensors as the dual of the
antisymmetric multilinear functions.
The popular representation of tensors as multilinear functions on spaces of linear functionals has the ad-
vantage of simplicity, especially when dealing with tensors of mixed type, but the advantage of convenience
must be weighed against the disadvantage of ontological falsity. In other words, the popular representation
has an obscure meaning, although it is convenient for computations. This is a frequent theme in modern
differential geometry: trading meaningfulness for computational convenience. One might perhaps speculate
that part of the reason for the famous incomprehensibility of modern differential geometry is this kind of
trade-off of meaningfulness for convenience.
28.5.6 Remark: Formulas for the wedge product in terms of the tensor product.
The formula u ∧ v = ½(u ⊗ v − v ⊗ u) is given by MacLane/Birkhoff [106], pages 558-560, and Szekeres [266],
page 209. This must not be confused with the formula α ∧ β = α ⊗ β − β ⊗ α for differential forms, which is
given by Frankel [12], page 67, and Darling [8], page 33.
28.5.7 Remark: The perplexing terminology for contravariant and covariant tensors.
It often seems that the words covariant and contravariant are defined with the reverse meanings to what
is expected. The terminology may be justified by noting that the coefficients of contravariant vectors vary
as inverses of the transformations of basis vectors. However, contravariant vectors themselves are the primal
vectors whereas covariant vectors are the dual vectors.
It seems that the expression of all vectors in terms of components in tensor calculus may be responsible
for the confusion. If one thinks of the components as being the real vectors and tensors, then certainly the
term contravariant components makes very good sense. But if this adjective is then transferred from the
components to the vector, the term contravariant vector comes into existence, although the vector itself
does not vary! A more precise term would be a vector whose components are contravariant. This would
remove the confusion but would be tedious to read and write.
In a comment about the transformation rule for contravariant vector coefficients with respect to a transformed
basis, Szekeres [266], page 84, says the following.
This law of transformation of components of a vector v is sometimes called the contravariant
transformation law of components, a curious and somewhat old-fashioned terminology that
possibly defies common sense.
However, as discussed in Remark 27.0.4, the traditional choice of terminology might be completely correct.
(See also a related discussion of the confusing terminology for the covariant derivative in Remark 65.6.2.)
28.5.8 Remark: Historical origins of terminology for contravariant and covariant tensors.
The following passage appears in the 1926 English translation (approved by the author) of a 1923 book by
Levi-Civita [26], page 20. It is expressed in an old-fashioned way. So it needs deciphering.
[. . . ] We may therefore interchange i and j in the second part of the preceding formula, so that we
can now write the equation in the form
    δ dφ − d δφ = Σ_{i,j=1}^n (∂X_i/∂x_j − ∂X_j/∂x_i) dx_i δx_j          (11)

The expression δ dφ − d δφ is called the bilinear covariant relative to the given Pfaffian. The use
of the term bilinear is sufficiently justified by the expression just found, which is linear in the
arguments dx and also in the arguments δx. The name covariant is due to the circumstance that
the numerical value and formal structure of the two sides of equation (11) always remain the same
when the independent variables x vary in any way whatever.
In more modern notation, the quoted equation (11) would look something like:

    Y(ω(X)) − X(ω(Y)) = Σ_{i,j=1}^n (∂_j ω_i − ∂_i ω_j) X^i Y^j,
where dx and δx would be written nowadays as vector fields X and Y on a manifold, and Levi-Civita's Pfaffian
Σ X_i dx_i would be notated as a 1-form ω. The detailed context of this quotation is not important. The main point is
that Levi-Civita observes that the exterior derivative (as we now call it) is a second-order covariant tensor, or
bilinear covariant as he calls it, which is chart-independent. The quoted passage suggests that Levi-Civita
considered chart-independence to mean the same thing as covariance. The key point that he makes, though,
is that the expression is covariant with respect to the Pfaffian. The expression on line (11) is bilinear
covariant relative to the given Pfaffian. Thus 1-forms are taken to be the fundamental class of objects on
a manifold, not vector fields. This is possibly the key difference of perspective which explains the quirky
terminology. Pfaffians are generalisations of differentials of real-valued functions, whereas vector fields are
generalisations of the derivatives of curves. The early 20th century differential geometry literature places
more emphasis on solving systems of differential equations than the modern literature. Arrays of differentials
like ∂u_j/∂x_i were in the foreground in the earlier literature. (There's an important clue in the name given
to the subject: absolute differential calculus or tensor calculus, not differential geometry.) Hence
any tensor which transformed like a differential was called covariant.
In a later chapter, Levi-Civita [26], pages 6171, develops the general terminology and definitions for co-
variant, contravariant and mixed tensors. When assigning the name covariant, he invokes the examples
of linear and bilinear forms, force fields and differentials of functions. Then tangent vectors are defined as
contravariant as a consequence. Most particularly, Levi-Civita assumes that physical force is the model
upon which transformation laws for vectors should be based, whereas the late 20th century view is that
infinitesimal displacement is the model for vectors. He then notes that the differential of work in a force field
is given by a formula dW = Xdx+Y dy +Zdz, and work is an invariant under transformations. Consequently
the force vector (X, Y, Z) must transform inversely to the coordinates. The text shows some awareness of
the arbitrariness of the choice of names.
[. . . ] If in fact, as we may suppose, the vector has a magnitude and a direction which are independent
of the system of co-ordinates chosen (we shall think of it as being defined physically as a force),
its components, on the contrary, even when the point considered remains unchanged, change their
values when the frame of reference is changed. This is obvious in the case of a rotation of Cartesian
axes. If, however, the transformation considered is not of this particular kind, we do not know
a priori what to substitute for the projections X, Y , Z of the vector on the axes of the old system
in order to specify the vector in the new system; i.e. we have to determine the law of transformation
which will meet the needs of the case in question. The most suitable criterion to take as a guide
in making our choice is found by introducing, alongside the given vector, a scalar quantity with a
physical significance which is transformed by invariance. In this case we take two infinitely near
points whose co-ordinates differ by dx, dy, dz; then the work of the force whose components are
X, Y , Z, in passing from one of these points to the other, will be
dW = Xdx + Y dy + Zdz; (2)
this scalar quantity has a physical significance which is invariant, and it can therefore be concretely
determined.
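Levi-Civita's criterion can be written out as a short derivation. The following is a sketch, assuming a smooth coordinate change x′ = x′(x): the invariance of the work dW forces the force components to transform inversely to the coordinate differentials.

```latex
% The differentials transform directly:
%   dx'_i = \sum_j (\partial x'_i / \partial x_j)\, dx_j .
% Invariance of the work dW under the change of coordinates then gives
\[
  dW \;=\; \sum_j X_j \, dx_j \;=\; \sum_i X'_i \, dx'_i
       \;=\; \sum_{i,j} X'_i \,\frac{\partial x'_i}{\partial x_j}\, dx_j
  \quad\Longrightarrow\quad
  X_j \;=\; \sum_i \frac{\partial x'_i}{\partial x_j}\, X'_i ,
\]
% so the force components transform with the transposed Jacobian of the
% inverse direction, which is the covariant transformation law.
```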
Hence it does seem that the choice of terminology is due to a perspective which is more oriented towards
algebra, force fields and systems of differential equations, and not oriented so much towards geometry. In
fact, it may even be erroneous to assume that the developers of tensor calculus identified their research as
differential geometry. The first paragraph of the first chapter of Levi-Civita [26], page 1, seems almost to
be an apology for introducing any geometrical thinking at all into tensor calculus.
1. Geometrical terminology
In analytical geometry it frequently happens that complicated algebraic relationships represent
simple geometrical properties. In some of these cases, while the algebraic relationships are not easily
expressed in words, the use of geometrical language, on the contrary, makes it possible to express
the equivalent geometrical relationships clearly, concisely, and intuitively. Further, geometrical
relationships are often easier to discover than are the corresponding analytical properties, so that
geometrical terminology offers not only an illuminating means of exposition, but also a powerful
instrument of research. We can therefore anticipate that in various questions of analysis it will be
advantageous to adopt terms taken over from geometry.
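The invariance of the work scalar under a change of Cartesian axes can be checked numerically. The following Python sketch is a hypothetical illustration (the rotation angle and the sample force and displacement values are arbitrary choices, not from the text): it rotates the force components and the displacement together and confirms that $dW$ of equation (2) is unchanged.

```python
import math

def rotate_z(p, angle):
    # Rotate a 3-vector about the z-axis: a sample change of Cartesian axes.
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def work(force, displacement):
    # dW = X dx + Y dy + Z dz, the invariant scalar of equation (2).
    return sum(f * d for f, d in zip(force, displacement))

F = (1.0, 2.0, 3.0)        # force components X, Y, Z in the old axes
dp = (0.1, -0.2, 0.05)     # infinitesimal displacement dx, dy, dz

theta = 0.7
F2 = rotate_z(F, theta)
dp2 = rotate_z(dp, theta)

# The work is unchanged when force and displacement transform together.
assert abs(work(F, dp) - work(F2, dp2)) < 1e-12
```

The same check passes for any rotation, which is exactly Levi-Civita's criterion: the scalar $dW$ is "transformed by invariance".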
[Figure: two representations $W_1$ and $W_2$ of a tensor product, related by the maps $g_{12}$ and $g_{21}$ — diagram not reproduced]
Very importantly, the isomorphism between any two representations of a tensor product is unique. This
ensures that any individual tensor in one representation may be identified with one and only one particular
tensor in the other representation.
28.6.6 Remark: Alternative representations of tensor product spaces.
Four ways of defining tensor product spaces are presented in this chapter.
(i) Characterisation in terms of a multilinear map (Definition 28.6.2).
(ii) The dual of the linear space of multilinear maps (Definition 28.1.2).
(iii) The multilinear function space on a mixture of primal and dual spaces (Definition 29.3.8).
(iv) The quotient of a free linear space by a set of multilinear identities (Definition 28.7.2).
Definition 28.6.2 defines tensor space representations in general. Definitions 28.1.2 (dual of multilinear
maps) and 28.7.2 (quotient of free linear space) are particular tensor space representations. All tensor space
representations (for a particular sequence of linear spaces) are related by a unique isomorphism. Therefore
calculations in all representations give the same answers.
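The unique isomorphism between representations can be made concrete in coordinates. The following Python sketch is a hypothetical illustration for a simple tensor $u \otimes v$ with small real component spaces: it compares the coefficient-matrix representation with the bilinear-functional-of-covectors representation, and checks that the natural identification makes them agree.

```python
from itertools import product

# Representation 1: a simple tensor u (x) v as its coefficient matrix (outer product).
def rep_matrix(u, v):
    return [[ui * vj for vj in v] for ui in u]

# Representation 2: u (x) v as a bilinear function of two covectors a, b
# (covectors act here by the dot product on coefficient lists).
def rep_functional(u, v):
    return lambda a, b: sum(ai * ui for ai, ui in zip(a, u)) \
        * sum(bj * vj for bj, vj in zip(b, v))

# The unique isomorphism identifies a coefficient matrix T with the functional
# (a, b) |-> sum_ij T[i][j] a[i] b[j]. Check agreement on a simple tensor.
def matrix_as_functional(T):
    return lambda a, b: sum(T[i][j] * a[i] * b[j]
                            for i, j in product(range(len(T)), range(len(T[0]))))

u, v = [1.0, 2.0], [3.0, -1.0, 4.0]
a, b = [0.5, -2.0], [1.0, 1.0, 0.25]
T = rep_matrix(u, v)
assert abs(matrix_as_functional(T)(a, b) - rep_functional(u, v)(a, b)) < 1e-12
```

Since the identification is the identity on coefficients, calculations carried out in either representation give the same answers, as stated above.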
where $e_u \in F$ denotes the function $e_u = \chi_{\{u\}} : \prod_{\alpha\in A} V_\alpha \to K$.
The standard immersion of a product $\prod_{\alpha\in A} V_\alpha$ of linear spaces over a field $K$ into the tensor product $\bigotimes_{\alpha\in A} V_\alpha$ is the function $\tau : \prod_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ defined by
$$\forall v \in \prod_{\alpha\in A} V_\alpha,\qquad \tau(v) = e_v + G.$$
That is, each element $v$ of $\prod_{\alpha\in A} V_\alpha$ is mapped onto the coset in $F/G$ of the indicator function of $\{v\}$.
28.7.3 Remark: Interpretation of vectors in the free-linear-space definition of tensor spaces.
Definition 28.7.2 may be interpreted as representing each symbol of the form $\bigotimes_{\alpha\in A} v_\alpha$ as $e_v$. (See Figure 28.7.1.)
28.7.4 Remark: Verification that the free-linear-space representation satisfies the metadefinition.
A metadefinition for the tensor product of linear spaces is given in Section 28.6. Definitions of tensor product
spaces are characterised by a set of properties which any definition of tensor products must satisfy. It is
straightforward to show that the free linear space tensor product immersion $\tau : \prod_{\alpha\in A} V_\alpha \to \bigotimes_{\alpha\in A} V_\alpha$ in Definition 28.7.2 satisfies Definition 28.6.2 with $W = \bigotimes_{\alpha\in A} V_\alpha$.
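The quotient construction can be sketched numerically. In the following hypothetical Python illustration, the free linear space $F$ on $V_1 \times V_2$ is a dictionary of finitely supported coefficients, and membership of the subspace $G$ of multilinear identities is tested by a helper `reduce_mod_G` which uses coordinates. It assumes real two-dimensional component spaces and is not the book's construction, just an aid to intuition.

```python
from collections import defaultdict

# Free linear space F on the set V1 x V2: finitely supported K-valued
# functions, stored as {point: coefficient}. e(p) is the indicator of {p}.
def e(p):
    return {p: 1.0}

def add(f, g, c=1.0):
    # Linear combination f + c*g in F.
    h = defaultdict(float, f)
    for p, x in g.items():
        h[p] += c * x
    return dict(h)

# Reduce modulo the subspace G of multilinear identities by pushing each
# generator e_(u,v) to the coordinate (outer-product) representation; over a
# field this linear map has kernel exactly G, so "reduces to zero" tests
# membership of G.
def reduce_mod_G(f):
    T = [[0.0] * 2 for _ in range(2)]
    for (u, v), c in f.items():
        for i in range(2):
            for j in range(2):
                T[i][j] += c * u[i] * v[j]
    return T

u1, u2, v = (1.0, 2.0), (3.0, -1.0), (0.5, 4.0)
usum = (u1[0] + u2[0], u1[1] + u2[1])
# The bilinearity generator e_(u1+u2, v) - e_(u1, v) - e_(u2, v) lies in G,
# so the immersion tau(v) = e_v + G is additive in the first slot.
gen = add(add(e((usum, v)), e((u1, v)), -1.0), e((u2, v)), -1.0)
assert all(abs(x) < 1e-12 for row in reduce_mod_G(gen) for x in row)
```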
Chapter 29
Tensor space constructions
Figure 29.1.1 Tensor space duals [entwined diagram not reproduced]
The space $L((V_\alpha^*)_{\alpha\in A};K)$ is constructed from $L((V_\alpha)_{\alpha\in A};K)$ by substituting the dual space $V_\alpha^*$ for each component space $V_\alpha$. The space $\bigotimes_{\alpha\in A} V_\alpha = L((V_\alpha)_{\alpha\in A};K)^*$ is constructed as the dual of the whole space $L((V_\alpha)_{\alpha\in A};K)$. Similarly, $\bigotimes_{\alpha\in A} V_\alpha^* = L((V_\alpha^*)_{\alpha\in A};K)^*$ is constructed from $L((V_\alpha^*)_{\alpha\in A};K)$. Then the double duals $(\bigotimes_{\alpha\in A} V_\alpha)^*$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$ may be constructed similarly.
It follows from Theorem 23.10.7 that the double duals $(\bigotimes_{\alpha\in A} V_\alpha)^*$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$ have natural isomorphisms to their respective primal spaces. Thus $(\bigotimes_{\alpha\in A} V_\alpha)^* \cong L((V_\alpha)_{\alpha\in A};K)$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^* \cong L((V_\alpha^*)_{\alpha\in A};K)$. Some other natural isomorphisms between spaces in Figure 29.1.1 are not so easy to construct. These isomorphisms are the subject of Section 29.2.
The canonical multilinear maps $\tau$, $\tau^T$, $\sigma$ and $\sigma^T$ are indicated in Figure 29.1.1. These maps associate simple tensors with vector tuples as in Figure 28.2.1 in Remark 28.2.5.
Alan U. Kennington, Differential geometry reconstructed: A unified systematic framework. www.geometry.org/dg.html
Copyright 2017, Alan U. Kennington. All rights reserved. You may print this book draft for personal use.
Public redistribution in electronic or printed form is forbidden. You may not charge any fee for copies of this book draft.
29.1.2 Remark: Alternative unentwined diagram for various duals of multilinear function spaces.
Figure 29.1.2 contains the same information as Figure 29.1.1, shown differently. Whereas Figure 29.1.1 shows
duality relations between spaces as diagonal arrows in an entwined manner, Figure 29.1.2 unentwines
the spaces to show the duality relations in linear sequences.
Figure 29.1.2 [unentwined diagram not reproduced]
$$\tau : (v_\alpha)_{\alpha\in A} \mapsto \big(f \mapsto f((v_\alpha)_{\alpha\in A})\big) \tag{29.1.1}$$
$$\tau^T : (v_\alpha)_{\alpha\in A} \mapsto \Big((w_\alpha)_{\alpha\in A} \mapsto \prod_{\alpha\in A} w_\alpha(v_\alpha)\Big) \tag{29.1.2}$$
$$\sigma : (w_\alpha)_{\alpha\in A} \mapsto \big(g \mapsto g((w_\alpha)_{\alpha\in A})\big) \tag{29.1.3}$$
$$\sigma^T : (w_\alpha)_{\alpha\in A} \mapsto \Big((v_\alpha)_{\alpha\in A} \mapsto \prod_{\alpha\in A} w_\alpha(v_\alpha)\Big) \tag{29.1.4}$$
The maps $\tau$, $\tau^T$, $\sigma$ and $\sigma^T$ have the following significance.
(1) $\tau(v)$ means the representation of the simple contravariant tensor for $v = (v_\alpha)_{\alpha\in A}$ as a linear functional acting on $L((V_\alpha)_{\alpha\in A};K)$.
(2) $\tau^T(v)$ means the representation of the simple contravariant tensor for $v = (v_\alpha)_{\alpha\in A}$ as a multilinear functional acting on $\prod_{\alpha\in A} V_\alpha^*$.
(3) $\sigma(w)$ means the representation of the simple covariant tensor for $w = (w_\alpha)_{\alpha\in A}$ as a linear functional acting on $L((V_\alpha^*)_{\alpha\in A};K)$.
(4) $\sigma^T(w)$ means the representation of the simple covariant tensor for $w = (w_\alpha)_{\alpha\in A}$ as a multilinear functional acting on $\prod_{\alpha\in A} V_\alpha$.
As an example of how the representations $\tau(v)$ and $\tau^T(v)$ work, consider the operation of addition. The sum of $\tau(v^1)$ and $\tau(v^2)$ is the map $f \mapsto f((v^1_\alpha)_{\alpha\in A}) + f((v^2_\alpha)_{\alpha\in A})$ for $f \in L((V_\alpha)_{\alpha\in A};K)$. The sum of $\tau^T(v^1)$ and $\tau^T(v^2)$ is the map $(w_\alpha)_{\alpha\in A} \mapsto \prod_{\alpha\in A} w_\alpha(v^1_\alpha) + \prod_{\alpha\in A} w_\alpha(v^2_\alpha)$.
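The addition example can be checked numerically. The following Python sketch uses `tau_T` as an ad hoc name for the representation of a simple tensor as a multilinear functional of covector pairs (covectors acting by the dot product); the sample vectors are arbitrary.

```python
def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def tau_T(v):
    # The simple contravariant tensor for v = (v1, v2), represented as a
    # multilinear functional acting on pairs of covectors w = (w1, w2).
    return lambda w: dot(w[0], v[0]) * dot(w[1], v[1])

def tensor_sum(S, T):
    # Pointwise addition of two tensors in this representation.
    return lambda w: S(w) + T(w)

v1 = ([1.0, 0.0], [0.0, 1.0])
v2 = ([0.0, 1.0], [1.0, 0.0])
w = ([2.0, 3.0], [-1.0, 5.0])

total = tensor_sum(tau_T(v1), tau_T(v2))
# Matches the sum formula: prod w(v1) + prod w(v2) = 2*5 + 3*(-1) = 7.
assert abs(total(w) - 7.0) < 1e-12
```

Note that the sum of two simple tensors is in general no longer simple, which is why the pointwise sum of functionals is the natural way to add them.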
It could be claimed that most of these tensor-related constructions are irrelevant and devoid of real interest.
However, since different authors use different constructions to represent the same concept, it is important
to define sufficient constructions here to be able to recognise them in the literature. (Representations for
tensor spaces which are presented by some authors are summarised in Table 28.5.1 in Remark 28.5.3.) It
is also important to prove sufficient natural isomorphisms to be able to recognise which constructions are
essentially identical to other constructions. Another objective here is to investigate various tensor space
constructions in this section to determine which are most suited to particular purposes.
29.2.2 Remark: Overview of four basic tensor space isomorphisms.
Figure 29.2.1 illustrates some relations between various spaces of multilinear maps and tensor products. The
abbreviation "iso" means natural isomorphism, "dual" means linear space dual, and "m-dual" means the multilinear dual or space of multilinear maps.
Figure 29.2.1 Multilinear spaces and tensor products of linear space families [entwined diagram not reproduced]
The isomorphisms in Figure 29.2.1 help to reconcile the many definitions for covariant and contravariant
tensor spaces which are adopted by various authors. The spaces in the top row are covariant. The spaces in
the bottom row are contravariant.
The syncretistic point of view would be that all three spaces in the top row are valid covariant tensor
spaces, and the three spaces in the bottom row are valid contravariant tensor spaces. There is no real
need to identify which is the correct representation. They each have their own peculiar virtues which
are applicable in different circumstances. Therefore they may co-exist as multiple tensor spaces for a single
family of linear spaces (V )A . As suggested by Figure 29.2.7 in Remark 29.2.19, each of these tensor space
representations has its own canonical multilinear map. Therefore they all qualify as tensor spaces according
to the characterisation test in Definition 28.6.2.
The following isomorphisms are indicated in Figure 29.2.1.
$$L((V_\alpha^*)_{\alpha\in A};K) \;\cong\; \bigotimes_{\alpha\in A} V_\alpha = L((V_\alpha)_{\alpha\in A};K)^* \tag{29.2.1}$$
$$L((V_\alpha)_{\alpha\in A};K) \;\cong\; \bigotimes_{\alpha\in A} V_\alpha^* = L((V_\alpha^*)_{\alpha\in A};K)^* \tag{29.2.2}$$
$$\bigotimes_{\alpha\in A} V_\alpha \;\cong\; \Big(\bigotimes_{\alpha\in A} V_\alpha^*\Big)^* = L((V_\alpha^*)_{\alpha\in A};K)^{**} \tag{29.2.3}$$
$$\bigotimes_{\alpha\in A} V_\alpha^* \;\cong\; \Big(\bigotimes_{\alpha\in A} V_\alpha\Big)^* = L((V_\alpha)_{\alpha\in A};K)^{**} \tag{29.2.4}$$
29.2.3 Remark: Tensors and multilinear functions of dual vectors are isomorphic.
The isomorphism between $L((V_\alpha^*)_{\alpha\in A};K)$ and $L((V_\alpha)_{\alpha\in A};K)^*$ in Remark 29.2.2 line (29.2.1) demonstrates an equivalence between this book's representation for tensor products $\bigotimes_{\alpha\in A} V_\alpha$ and the more popular representation. (These two representations are discussed in Remark 28.5.5.)
The canonical isomorphism which is constructed in Theorem 29.2.4 between tensors and the corresponding
multilinear functions of dual spaces is analogous to the isomorphism construction which appears in the proof
of Theorem 29.2.6. The maps and spaces in Theorem 29.2.4 are illustrated in Figure 29.2.2.
29.2.4 Theorem: Isomorphism between tensors and multilinear functions of dual vectors.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a unique linear space isomorphism $\phi : \bigotimes_{\alpha\in A} V_\alpha \to L((V_\alpha^*)_{\alpha\in A};K)$ which satisfies
$$\forall v \in \prod_{\alpha\in A} V_\alpha,\ \forall w \in \prod_{\alpha\in A} V_\alpha^*,\qquad (\phi(\tau(v)))(w) = \prod_{\alpha\in A} w_\alpha(v_\alpha). \tag{29.2.5}$$
Figure 29.2.2 [diagram not reproduced]
29.2.5 Remark: Dual-vector tensors and multilinear functions of primal vectors are isomorphic.
Theorem 29.2.6 presents a canonical isomorphism between $\bigotimes_{\alpha\in A} V_\alpha^*$ and $L((V_\alpha)_{\alpha\in A};K)$ which validates line (29.2.2) in Remark 29.2.2. This isomorphism is natural in the sense that it is the unique linear extension of the map from $\sigma(w) = \bigotimes_{\alpha\in A} w_\alpha$ to the multilinear function $\sigma^T(w) : v \mapsto \prod_{\alpha\in A} w_\alpha(v_\alpha)$ for all $w = (w_\alpha)_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha^*$, where $\sigma : \prod_{\alpha\in A} V_\alpha^* \to \bigotimes_{\alpha\in A} V_\alpha^*$ is the canonical tensor map for $\bigotimes_{\alpha\in A} V_\alpha^*$, and $\sigma^T : \prod_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A};K)$ is the canonical multilinear function map defined in Definition 27.2.6.
The isomorphism is explicitly specified only for simple tensors, not for general tensors in A V . The
existence and uniqueness of the extension from simple tensors to all tensors is guaranteed by the universal
factorisation property for tensor products, which uses bases and coordinates for the proof. An explicit
construction of the isomorphism would therefore require the use of coordinates with respect to bases for
the spaces $(V_\alpha)_{\alpha\in A}$. The isomorphism is not basis-dependent, but the proof does employ a basis for each component space.
The functions and spaces in Theorem 29.2.6 are illustrated in Figure 29.2.3.
Figure 29.2.3 [diagram not reproduced]
29.2.6 Theorem: Isomorphism between dual-vector tensors and multilinear functions of primal vectors.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a unique linear space isomorphism $\psi : \bigotimes_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A};K)$ which satisfies
$$\forall w \in \prod_{\alpha\in A} V_\alpha^*,\ \forall v \in \prod_{\alpha\in A} V_\alpha,\qquad (\psi(\sigma(w)))(v) = \prod_{\alpha\in A} w_\alpha(v_\alpha), \tag{29.2.6}$$
where $\sigma : \prod_{\alpha\in A} V_\alpha^* \to \bigotimes_{\alpha\in A} V_\alpha^*$ denotes the canonical tensor map for $\bigotimes_{\alpha\in A} V_\alpha^*$.
Proof: The canonical multilinear function map $\sigma^T : \prod_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A};K)$ is a multilinear map by Theorem 27.4.6. So by Theorem 28.3.7 with target space $U = L((V_\alpha)_{\alpha\in A};K)$, there exists a unique linear function $\psi : \bigotimes_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A};K)$ such that $\psi \circ \sigma = \sigma^T$. This is the same as line (29.2.6).
It follows from line (27.3.4) in Theorem 27.3.4 that the linear space $L((V_\alpha)_{\alpha\in A};K)$ is spanned by $\operatorname{Range}(\sigma^T)$. But $\operatorname{Range}(\sigma^T) \subseteq \operatorname{Range}(\psi)$ because $\sigma^T = \psi \circ \sigma$, and $\operatorname{span}(\operatorname{Range}(\psi)) = \operatorname{Range}(\psi)$ because $\psi$ is linear. Therefore $\operatorname{Range}(\psi) = \operatorname{span}(\operatorname{Range}(\psi)) \supseteq \operatorname{span}(\operatorname{Range}(\sigma^T)) = L((V_\alpha)_{\alpha\in A};K)$. So $\psi$ is surjective.
To show that $\psi$ is injective, let $T \in \bigotimes_{\alpha\in A} V_\alpha^*$ satisfy $\psi(T) = 0$. If $T \in \operatorname{Range}(\sigma)$, then $T = \sigma(w)$ for some $w \in \prod_{\alpha\in A} V_\alpha^*$. So $\psi(\sigma(w)) = 0$. So $\sigma^T(w) = 0$. So $\sigma(w) = 0$ by Theorem 27.6.2. Therefore $\operatorname{Range}(\sigma) \cap \ker(\psi) = \{0\}$. So $\operatorname{span}(\operatorname{Range}(\sigma)) \cap \ker(\psi) = \{0\}$. But $\operatorname{span}(\operatorname{Range}(\sigma)) = \bigotimes_{\alpha\in A} V_\alpha^*$. So $\ker(\psi) = \{0\}$. So $\psi$ is an injection. Hence $\psi$ is a linear space isomorphism.
29.2.7 Remark: The universal factorisation property uses bases-and-coordinates for its proof.
The construction of an isomorphism $L((V_\alpha)_{\alpha\in A};K) \cong \bigotimes_{\alpha\in A} V_\alpha^*$ in Theorem 29.2.6 cleverly hides the use of coordinates. Theorem 28.3.7 (the universal factorisation property for tensor products) requires bases and coordinates for its proof. Presentations of tensor algebra which characterise (i.e. meta-define) tensor spaces in terms of a universal factorisation property seem to have no dependence on coordinates, but any representation of such a metadefinition requires bases and coordinates to boot-strap such an apparently coordinate-free approach. This seems to be true more generally. Essentially all claimed "coordinate-free" approaches to differential geometry topics are really "coordinates-hidden".
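The role of bases in the universal factorisation property can be seen in a small coordinate computation. The following Python sketch is a hypothetical example with a generic bilinear map $f(u,v) = \sum_{i,j} A_{ij} u_i v_j$: the unique linear factor $L$ is read off a basis, and $f = L \circ \tau$ is confirmed on a sample pair, with the basis doing the boot-strapping.

```python
from itertools import product

def f(u, v, A):
    # A generic bilinear map f(u, v) = u^T A v, with A an arbitrary matrix.
    return sum(A[i][j] * u[i] * v[j]
               for i, j in product(range(len(u)), range(len(v))))

def tau(u, v):
    # The canonical tensor map in coordinates: the coefficient matrix of u (x) v.
    return [[ui * vj for vj in v] for ui in u]

def L(T, A):
    # The unique linear factor, read off from the basis: L(e_i (x) e_j) = A[i][j].
    return sum(A[i][j] * T[i][j]
               for i, j in product(range(len(T)), range(len(T[0]))))

A = [[1.0, -2.0, 0.5], [3.0, 0.0, 1.0]]
u, v = [2.0, -1.0], [1.0, 0.5, 4.0]
assert abs(L(tau(u, v), A) - f(u, v, A)) < 1e-12
```

The factorisation looks coordinate-free once stated, but defining $L$ required choosing the bases: "coordinates-hidden" rather than coordinate-free.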
29.2.8 Remark: The last two tensor space isomorphisms are direct consequences of the first two.
A natural isomorphism in line (29.2.3) may be constructed from the natural isomorphism in line (29.2.2) by observing that $\bigotimes_{\alpha\in A} V_\alpha$ is the dual of $L((V_\alpha)_{\alpha\in A};K)$, and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$ is the dual of $\bigotimes_{\alpha\in A} V_\alpha^*$.
Line (29.2.4) is proved in exactly the same way in Theorem 29.2.10. This completes the demonstration
of the four isomorphisms required to validate Figure 29.2.1.
29.2.9 Theorem: Isomorphism between tensors and the dual of the dual-space tensors.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a natural isomorphism between $\bigotimes_{\alpha\in A} V_\alpha$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U = \bigotimes_{\alpha\in A} V_\alpha^*$ and $W = L((V_\alpha)_{\alpha\in A};K)$. Then by Theorem 29.2.6, there is a natural linear space isomorphism $\psi : U \to W$. By Theorem 23.11.12, the transpose $\psi^T : W^* \to U^*$ of $\psi$ is a linear space isomorphism. Thus $\psi^T : \bigotimes_{\alpha\in A} V_\alpha \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$ is a natural linear space isomorphism.
29.2.10 Theorem: Isomorphism between dual-space tensors and the dual of the primal-space tensors.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a natural isomorphism between $\bigotimes_{\alpha\in A} V_\alpha^*$ and $(\bigotimes_{\alpha\in A} V_\alpha)^*$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Let $U = \bigotimes_{\alpha\in A} V_\alpha$ and $W = L((V_\alpha^*)_{\alpha\in A};K)$. Then by Theorem 29.2.4, there is a natural linear space isomorphism $\phi : U \to W$. By Theorem 23.11.12, the transpose $\phi^T : W^* \to U^*$ of $\phi$ is a linear space isomorphism. Thus $\phi^T : \bigotimes_{\alpha\in A} V_\alpha^* \to (\bigotimes_{\alpha\in A} V_\alpha)^*$ is a natural linear space isomorphism.
29.2.11 Remark: Tensor space isomorphisms in the context of the canonical multilinear maps.
Figure 29.2.4 illustrates the four isomorphisms in Theorems 29.2.4, 29.2.6, 29.2.9 and 29.2.10 combined with the context of the diagram in Figure 29.1.1, which shows the canonical tensor map $\tau$ and the canonical multilinear function map $\sigma^T$, and the maps $\sigma$ and $\tau^T$ for the corresponding dual spaces.
Figure 29.2.4 Canonical tensor map relations between tensor spaces [entwined diagram not reproduced]
29.2.12 Remark: Unentwined vertical diagram of tensor space isomorphisms and duals.
Figure 29.2.5 contains the same information as Figure 29.2.4, but shown in the unentwined vertical manner
which is explained in Remark 29.1.2.
Figure 29.2.5 Tensor spaces with isomorphisms and canonical tensor maps [unentwined diagram not reproduced]
29.2.14 Theorem: Isomorphism between multilinear functions and the dual of the primal-space tensors.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a natural isomorphism between $L((V_\alpha)_{\alpha\in A};K)$ and $(\bigotimes_{\alpha\in A} V_\alpha)^*$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then by Theorem 23.10.7, there is a natural linear space isomorphism $h$ from the linear space $L((V_\alpha)_{\alpha\in A};K)$ to its double dual. Thus $h : L((V_\alpha)_{\alpha\in A};K) \to L((V_\alpha)_{\alpha\in A};K)^{**} = (\bigotimes_{\alpha\in A} V_\alpha)^*$.
29.2.15 Remark: Yet another canonical tensor space isomorphism.
The proof of Theorem 29.2.16 is the same as for Theorem 29.2.14 because it is really the same theorem! By combining the isomorphism $h' : L((V_\alpha^*)_{\alpha\in A};K) \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$ in Theorem 29.2.16 with the isomorphism $\psi^T : \bigotimes_{\alpha\in A} V_\alpha \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$ in the proof of Theorem 29.2.9, one obtains a natural isomorphism $h'^{-1} \circ \psi^T : \bigotimes_{\alpha\in A} V_\alpha \to L((V_\alpha^*)_{\alpha\in A};K)$ which is as required for line (29.2.1) in Remark 29.2.2.
29.2.16 Theorem: Isomorphism between multilinear functions of duals and dual of dual-space tensors.
Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then there is a natural isomorphism between $L((V_\alpha^*)_{\alpha\in A};K)$ and $(\bigotimes_{\alpha\in A} V_\alpha^*)^*$.
Proof: Let $(V_\alpha)_{\alpha\in A}$ be a finite family of finite-dimensional linear spaces over a field $K$. Then by Theorem 23.10.7, there is a natural linear space isomorphism $h'$ from the linear space $L((V_\alpha^*)_{\alpha\in A};K)$ to its double dual. Thus $h' : L((V_\alpha^*)_{\alpha\in A};K) \to L((V_\alpha^*)_{\alpha\in A};K)^{**} = (\bigotimes_{\alpha\in A} V_\alpha^*)^*$.
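The double-dual isomorphism cited here (Theorem 23.10.7) is easy to sketch in coordinates. The following hypothetical Python illustration represents covectors as coefficient lists and shows that evaluating a double-dual element on the dual basis covectors recovers the original vector, so the canonical map loses nothing in finite dimensions.

```python
def h(w):
    # The canonical double-dual map: h(w) is evaluation at w, h(w)(x) = x(w).
    return lambda x: sum(xi * wi for xi, wi in zip(x, w))

def recover(ww, dim):
    # Evaluate the double-dual element on the dual basis covectors; this
    # inverts h, exhibiting injectivity and surjectivity in coordinates.
    basis_covectors = [[1.0 if j == i else 0.0 for j in range(dim)]
                       for i in range(dim)]
    return [ww(x) for x in basis_covectors]

w = [3.0, -1.0, 2.5]
assert recover(h(w), 3) == w
```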
29.2.17 Remark: Circuitous method of constructing a tensor space isomorphism from four isomorphisms.
The pathway to construct an isomorphism between $\bigotimes_{\alpha\in A} V_\alpha$ and $L((V_\alpha^*)_{\alpha\in A};K)$ on line (29.2.1) in Remark 29.2.2 could be made more circuitous. First, the universal factorisation property for the canonical tensor map $\sigma$ can be applied in Theorem 29.2.6 to implicitly construct an isomorphism $\psi : \bigotimes_{\alpha\in A} V_\alpha^* \to L((V_\alpha)_{\alpha\in A};K)$ such that $\psi \circ \sigma = \sigma^T$, where $\sigma^T$ is the canonical multilinear function map. This construction employs a basis and vector components. Second, Theorem 29.2.9 transposes $\psi$ to obtain the isomorphism $\psi^T : \bigotimes_{\alpha\in A} V_\alpha \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$. Third, Theorem 29.2.16 employs the canonical double dual map $h' : L((V_\alpha^*)_{\alpha\in A};K) \to (\bigotimes_{\alpha\in A} V_\alpha^*)^*$. Fourth, $\psi^T$ may be composed with the inverse of $h'$ to obtain the isomorphism $(h')^{-1} \circ \psi^T : \bigotimes_{\alpha\in A} V_\alpha \to L((V_\alpha^*)_{\alpha\in A};K)$.
Figure 29.2.6 [diagram not reproduced]
$$\phi = (h')^{-1} \circ \psi^T \quad\text{and}\quad \tau^T = \phi \circ \tau, \tag{29.2.7}$$
$$\psi = h^{-1} \circ \phi^T \quad\text{and}\quad \sigma^T = \psi \circ \sigma. \tag{29.2.8}$$
These relations are illustrated in Figure 29.2.7.
Figure 29.2.7 [diagram not reproduced]
The existence of the linear space isomorphism $\phi$ implies that the contravariant tensor representations $\tau(v)$ and $\tau^T(v)$ contain the same information. However, the $\tau(v)$ style of representation is preferred in this book because it has clearer motivation and significance in the physics context, it is easier to construct spaces of alternating tensors from it, and it gives clearer meaning to various kinds of mixed tensor constructions (including the Riemann curvature tensor). The most popular representation is $\tau^T(v)$.
Similarly, the existence of the linear space isomorphism $\psi$ implies that the covariant tensor representations $\sigma(w)$ and $\sigma^T(w)$ contain the same information. The $\sigma^T(w)$ style of representation is preferred in this book. This is the same as the popular definition.
29.2.20 Remark: The analogy of multilinear function spaces to distribution test function spaces.
In the contravariant case in Remark 29.2.19, $\tau(v)$ and $\tau^T(v)$ are alternative representations for the tensor product of the contravariant vector family $(v_\alpha)_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha$. The linear space $L((V_\alpha)_{\alpha\in A};K)$ may be thought of as the "test space" for $\tau(v)$ in the same way that generalised functions act on $C^\infty$ test spaces. Such test spaces serve only to "test" the objects which are defined on them. The Cartesian product $\prod_{\alpha\in A} V_\alpha^*$ may be thought of as the test space for the representation $\tau^T(v)$.
The equation $\tau(v) \circ \sigma^T = \tau^T(v)$ in line (28.4.5) in Remark 28.4.13 gives the conversion rule between the contravariant tensor representations $\tau(v)$ and $\tau^T(v)$ for $v$. The right-composition of $\tau(v)$ by $\sigma^T$ serves to shift the test space of the representation from $L((V_\alpha)_{\alpha\in A};K)$ to $\prod_{\alpha\in A} V_\alpha^*$. Thus the tensor representation $\tau(v)$, which maps the test space $L((V_\alpha)_{\alpha\in A};K)$ to $K$, is converted into the equivalent tensor representation $\tau^T(v)$, which maps the test space $\prod_{\alpha\in A} V_\alpha^*$ to $K$.
Thus $\sigma^T$ acts on the representation $\tau(v)$ on the right to shift the domain of $\tau(v)$ to convert it to the representation $\tau^T(v)$. Similarly, the equation $\tau^T = \phi \circ \tau$ in line (29.2.7) means that the map $\phi$ acts on $\tau$ on the left to shift the range of $\tau$ to convert it to the representation $\tau^T$. The big advantage of this left-composition with $\phi$ is that it maps all tensors, not just the simple tensors as in the case of the right-composition with $\sigma^T$.
29.2.21 Remark: Another analogy between multilinear function spaces and test function spaces.
In the covariant case in Remark 29.2.19, $\sigma(w)$ and $\sigma^T(w)$ are alternative representations for the tensor product of the covariant vector family $(w_\alpha)_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha^*$. The linear space $L((V_\alpha^*)_{\alpha\in A};K)$ may be thought of as the test space for $\sigma(w)$. The Cartesian product $\prod_{\alpha\in A} V_\alpha$ may be thought of as the test space for the representation $\sigma^T(w)$.
The equation $\sigma(w) \circ \tau^T = \sigma^T(w)$ in line (28.4.6) of Remark 28.4.13 gives the conversion rule between the covariant tensor representations $\sigma(w)$ and $\sigma^T(w)$ for $w$. The right-composition of $\sigma(w)$ by $\tau^T$ serves to shift the test space of the representation from $L((V_\alpha^*)_{\alpha\in A};K)$ to $\prod_{\alpha\in A} V_\alpha$. Thus the tensor representation $\sigma(w)$, which maps the test space $L((V_\alpha^*)_{\alpha\in A};K)$ to $K$, is converted into the equivalent tensor representation $\sigma^T(w)$, which maps the test space $\prod_{\alpha\in A} V_\alpha$ to $K$.
Thus $\tau^T$ acts on the representation $\sigma(w)$ on the right to shift the domain of $\sigma(w)$ to convert it to the representation $\sigma^T(w)$. Similarly, the equation $\sigma^T = \psi \circ \sigma$ on line (29.2.8) means that the map $\psi$ acts on $\sigma$ on the left to shift the range of $\sigma$ to convert it to the representation $\sigma^T$. The big advantage of this left-composition with $\psi$ is that it maps all tensors, not just the simple tensors as in the case of the right-composition with $\tau^T$.
29.2.22 Remark: Interpreting the structure of the canonical isomorphisms.
From Remarks 29.2.20 and 29.2.21, it can be seen what the tensor space isomorphisms $\phi$ and $\psi$ look like when expressed in terms of bases and coordinates. The relation $\tau^T = \phi \circ \tau$ implies that $\phi$ maps the
simple tensor representations $\tau(v)$ and $\tau^T(v)$ to each other for all $v \in \prod_{\alpha\in A} V_\alpha$. In particular, for any bases $B_\alpha = (e_{\alpha,i})_{i\in I_\alpha}$ for the component spaces $V_\alpha$, the simple tensor representations $\tau(e_i)$ and $\tau^T(e_i)$ are mapped to each other by $\phi$, where $e_i$ denotes $(e_{\alpha,i_\alpha})_{\alpha\in A}$ for $i = (i_\alpha)_{\alpha\in A} \in I = \prod_{\alpha\in A} I_\alpha$.
Theorem 28.2.12 line (28.2.3) gives the expression $T = \sum_{i\in I} T(\lambda_i)\,\tau(e_i)$ for $T \in \bigotimes_{\alpha\in A} V_\alpha$, where $h_i$ denotes the dual vector family $(h_{\alpha,i_\alpha})_{\alpha\in A} \in \prod_{\alpha\in A} V_\alpha^*$ for $i \in I$, and $(\lambda_i)_{i\in I} = (\sigma^T(h_i))_{i\in I}$ is the canonical basis for $L((V_\alpha)_{\alpha\in A};K)$ in Definition 27.4.9 for the bases $B_\alpha$. So each such tensor $T$ has coordinates $t_i = T(\lambda_i)$ for $i \in I$ with respect to the basis $(\tau(e_i))_{i\in I}$ for $\bigotimes_{\alpha\in A} V_\alpha$.
Theorem 27.3.7 line (27.3.8) gives the expression $T' = \sum_{i\in I} T'(h_i)\,\tau^T(e_i)$ for $T' \in L((V_\alpha^*)_{\alpha\in A};K)$. So the coordinates of $T'$ are $t'_i = T'(h_i)$ for $i \in I$ with respect to the basis $(\tau^T(e_i))_{i\in I}$ for $L((V_\alpha^*)_{\alpha\in A};K)$.
If $T' = \phi(T)$, then the coordinates satisfy $t_i = t'_i$ for all $i \in I$ because the bases $(\tau(e_i))_{i\in I}$ and $(\tau^T(e_i))_{i\in I}$ for $\bigotimes_{\alpha\in A} V_\alpha$ and $L((V_\alpha^*)_{\alpha\in A};K)$ respectively are mapped to each other by $\phi$. Hence the map $\phi$ is the map which preserves coordinates between the two spaces. Therefore $\phi$ is invariant under all changes of bases for the component spaces $(V_\alpha)_{\alpha\in A}$.
The isomorphism $\phi$ has the map-rule $\phi : \sum_{i\in I} t_i\,\tau(e_i) \mapsto \sum_{i\in I} t_i\,\tau^T(e_i)$ for all $t = (t_i)_{i\in I} \in K^I$. Hence in terms of coordinates, the map-rule is $\phi : (t_i)_{i\in I} \mapsto (t_i)_{i\in I}$. If $A = \mathbb{N}_r$ for some $r \in \mathbb{Z}_0^+$, the map-rule may be written as $\phi : \sum_{i\in I} t^{i_1 i_2 \ldots i_r}\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_r} \mapsto \sum_{i\in I} t^{i_1 i_2 \ldots i_r}\, h_{i_1} \otimes h_{i_2} \otimes \cdots \otimes h_{i_r}$.
These comments apply equally to the map , referring to the comments in Remark 29.2.21.
29.2.23 Remark: Diagram of tensor space duals and isomorphisms for homogeneous tensors.
Figure 29.2.8 illustrates the relations between spaces of multilinear maps and tensor product spaces of a single linear space. As in Figure 29.2.1, the abbreviation "iso" means isomorphism, and "m-dual" means the multilinear dual or space of multilinear maps.
In the case that all tensor component linear spaces are copies of a single finite-dimensional linear space, the isomorphisms listed in Remark 29.2.2 may be written as follows. These are the isomorphisms which are indicated in Figure 29.2.8.
Figure 29.2.8 Linear and multilinear duals of a linear space [entwined diagram not reproduced]
$$L^m(V^*;K) \;\cong\; \bigotimes^m V = L^m(V;K)^* \tag{29.2.9}$$
$$L^m(V;K) \;\cong\; \bigotimes^m V^* = L^m(V^*;K)^* \tag{29.2.10}$$
$$\bigotimes^m V \;\cong\; \Big(\bigotimes^m V^*\Big)^* = L^m(V^*;K)^{**} \tag{29.2.11}$$
$$\bigotimes^m V^* \;\cong\; \Big(\bigotimes^m V\Big)^* = L^m(V;K)^{**} \tag{29.2.12}$$
These natural isomorphisms follow immediately from the corresponding isomorphisms in Remark 29.2.2,
which are proved in Theorems 29.2.6, 29.2.14, 29.2.9 and 29.2.16.
29.2.24 Remark: Unentwined diagram of tensor space duals and isomorphisms for homogeneous tensors.
Figure 29.2.9 contains the same information as Figure 29.2.8, shown unentwined. (See Remark 29.1.2 for
an explanation of such unentwined diagrams.)
Figure 29.2.9 Tensor spaces for a single linear space with canonical tensor maps [unentwined diagram not reproduced]
29.2.26 Theorem: Let $V$ and $W$ be finite-dimensional linear spaces over a field $K$. Define the map $\mu : \operatorname{Lin}(V,W) \to L(V, W^*; K)$ by $\mu(\phi)(v,x) = x(\phi(v))$ for all $\phi \in \operatorname{Lin}(V,W)$ and $(v,x) \in V \times W^*$. Then $\mu$ is a linear space isomorphism.
Proof: Let $V$ and $W$ be finite-dimensional linear spaces over $K$. Let $\mu$ be as defined. Let $\lambda_1, \lambda_2 \in K$ and $\phi_1, \phi_2 \in \operatorname{Lin}(V,W)$. Then $\mu(\lambda_1\phi_1 + \lambda_2\phi_2)(v,x) = x((\lambda_1\phi_1 + \lambda_2\phi_2)(v)) = x(\lambda_1\phi_1(v) + \lambda_2\phi_2(v)) = \lambda_1 x(\phi_1(v)) + \lambda_2 x(\phi_2(v)) = \lambda_1\mu(\phi_1)(v,x) + \lambda_2\mu(\phi_2)(v,x) = (\lambda_1\mu(\phi_1) + \lambda_2\mu(\phi_2))(v,x)$ for all $(v,x) \in V \times W^*$. So $\mu(\lambda_1\phi_1 + \lambda_2\phi_2) = \lambda_1\mu(\phi_1) + \lambda_2\mu(\phi_2)$ for all $\lambda_1, \lambda_2 \in K$ and $\phi_1, \phi_2 \in \operatorname{Lin}(V,W)$. So $\mu$ is a linear map.
For $f \in L(V,W^*;K)$, define a map $\nu(f) : V \to W^{**}$ by $\nu(f)(v)(x) = f(v,x)$ for all $(v,x) \in V \times W^*$. It is clear that $\nu(f)(v) : W^* \to K$ is linear for $v \in V$. So $\nu(f)(v) \in W^{**}$. It is also clear that $\nu(f) : V \to W^{**}$ is linear. So $\nu(f) \in \operatorname{Lin}(V, W^{**})$ for all $f \in L(V,W^*;K)$.
Let $\lambda_1, \lambda_2 \in K$ and $f_1, f_2 \in L(V,W^*;K)$. Then $\nu(\lambda_1 f_1 + \lambda_2 f_2)(v)(x) = (\lambda_1 f_1 + \lambda_2 f_2)(v,x) = \lambda_1 f_1(v,x) + \lambda_2 f_2(v,x) = \lambda_1\nu(f_1)(v)(x) + \lambda_2\nu(f_2)(v)(x) = (\lambda_1\nu(f_1) + \lambda_2\nu(f_2))(v)(x)$ for all $(v,x) \in V \times W^*$. So $\nu(\lambda_1 f_1 + \lambda_2 f_2) = \lambda_1\nu(f_1) + \lambda_2\nu(f_2)$ for all $\lambda_1, \lambda_2 \in K$ and $f_1, f_2 \in L(V,W^*;K)$. Therefore $\nu : L(V,W^*;K) \to \operatorname{Lin}(V,W^{**})$ is a linear map.
By Theorem 23.10.7, the canonical linear map $\iota : W \to W^{**}$ defined by $\iota : w \mapsto (x \mapsto x(w))$ is a linear space isomorphism because $W$ is finite-dimensional. Therefore $\iota^{-1} : W^{**} \to W$ is a linear space isomorphism. Define $\rho : \operatorname{Lin}(V,W) \to \operatorname{Lin}(V,W^{**})$ by $\rho(\phi)(v) = \iota(\phi(v))$ for all $\phi \in \operatorname{Lin}(V,W)$ and $v \in V$. Then it is easily shown that $\rho$ is a linear map, and that $\rho' : \operatorname{Lin}(V,W^{**}) \to \operatorname{Lin}(V,W)$ defined by $\rho'(\theta)(v) = \iota^{-1}(\theta(v))$ for all $\theta \in \operatorname{Lin}(V,W^{**})$ and $v \in V$ is the inverse of $\rho$. So $\rho$ is a linear space isomorphism. Therefore $\rho' \circ \nu : L(V,W^*;K) \to \operatorname{Lin}(V,W)$ is a linear space isomorphism. It is easily shown that $\rho' \circ \nu$ is the inverse of $\mu$. Hence $\mu$ is a linear space isomorphism.
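Theorem 29.2.26 can be illustrated in coordinates. In the following Python sketch (a hypothetical example where the linear map is given by an arbitrary matrix $M$), the bilinear function $(v,x) \mapsto x(Mv)$ is evaluated on basis vectors and basis covectors, recovering $M$ entry by entry, which suggests why no information is lost in either direction.

```python
def apply(M, v):
    # Apply the matrix M of a linear map V -> W to a vector v.
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def mu(M):
    # The bilinear function mu(phi)(v, x) = x(phi(v)) on V x W*.
    return lambda v, x: sum(xi * yi for xi, yi in zip(x, apply(M, v)))

M = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]   # a map from dim-2 V to dim-3 W
bil = mu(M)

def basis(n, k):
    return [1.0 if i == k else 0.0 for i in range(n)]

# Recover M[i][j] as mu(phi)(e_j, h_i) from basis vectors and basis covectors.
recovered = [[bil(basis(2, j), basis(3, i)) for j in range(2)] for i in range(3)]
assert recovered == M
```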
multilinear function dual space of the form $L((V_\alpha)_{\alpha\in A}, (W_\beta^*)_{\beta\in B}; K)^*$, which has a natural isomorphism to $L((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$. Many textbooks present only one of these kinds of spaces, but there are numerous
other kinds of constructions which are naturally isomorphic to these two families, and these constructions
do arise naturally in differential geometry. The purpose of Section 29.3 is to present some of the naturally
occurring mixed tensor space constructions and demonstrate natural equivalences between them.
A third general family of mixed tensor spaces has the form $\operatorname{Lin}(L((V_\alpha)_{\alpha\in A};K), L((W_\beta)_{\beta\in B};K))$. (See Definition 29.3.3.) These are spaces of linear maps between multilinear function spaces. A natural isomorphism between this family and the multilinear function family $L((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B};K)$ is demonstrated in Theorem 29.3.11. These three families of mixed tensors may be summarised as follows. (For reasons of aesthetic balance, a fourth isomorphic family is included in this list.)
family construction
(1) multilinear function space $L((V_\alpha^*)_{\alpha\in A}, (W_\beta)_{\beta\in B}; K)$
(2) multilinear function space dual $L((V_\alpha)_{\alpha\in A}, (W_\beta^*)_{\beta\in B}; K)^*$
(3) linear map space $\operatorname{Lin}(L((V_\alpha)_{\alpha\in A};K), L((W_\beta)_{\beta\in B};K))$
(4) linear map space dual $\operatorname{Lin}(L((W_\beta^*)_{\beta\in B};K), L((V_\alpha^*)_{\alpha\in A};K))$
Whenever these and any other families of mixed tensorial spaces are encountered, it is clearly desirable to be able to identify the relations between them. The mixed tensor space families require definitions, notations and isomorphism theorems. Many more spaces could be added to the list in the table above, such as $\operatorname{Lin}(L((W_\beta)_{\beta\in B};K)^*, L((V_\alpha)_{\alpha\in A};K)^*)$. When the linear space components of a multilinear space are the same, various symmetries and anti-symmetries may be applied to obtain yet more mixed tensor spaces. This is not idle definition building. Such combinations of linear spaces, dual spaces, multilinear spaces and symmetries are often encountered. (See Remark 30.4.14, for example.) However, the techniques required to construct such natural isomorphisms are reusable. Theorem 29.3.11 gives an example of how these isomorphisms may be constructed.
It follows from Theorem 23.11.17 that $\operatorname{Lin}(L((W_\beta)_{\beta\in B};K)^*, L((V_\alpha)_{\alpha\in A};K)^*)$ is equivalent to line (3) in the above table.
29.3.2 Remark: The order of listing of component spaces for mixed tensor products.
The order of covariant and contravariant parts of tensor spaces is most often chosen with the contravariant
part first. In the table in Remark 29.3.1, the families of mixed tensors are effectively contravariant with
respect to the linear spaces V and covariant with respect to the linear spaces W . The contravariant-first
convention is followed in Notation 29.3.4.
29.3.3 Definition: The mixed tensor product (space) of finite families $(V_\alpha)_{\alpha\in A}$ and $(W_\beta)_{\beta\in B}$ of linear spaces over a field $K$ is the linear space of linear maps $\operatorname{Lin}(L((V_\alpha)_{\alpha\in A};K), L((W_\beta)_{\beta\in B};K))$.
Figure 29.3.1 Mixed tensor product space [diagram not reproduced]
The arrows with chicken-feet tails symbolise spaces of multilinear maps from the indicated cross-product
sets to K. The arrow with the single-stroke tail symbolises the set of linear maps between the spaces in the
two inner boxes.
⊗((V_α)_{α∈A}) = ⊗_{α∈A} V_α = L((V_α)_{α∈A}; K)*
⊗((W_β*)_{β∈B}) = L((W_β)_{β∈B}; K)
⊗((V_α)_{α∈A}; (W_β)_{β∈B}) = Lin(L((V_α)_{α∈A}; K), L((W_β)_{β∈B}; K)).
The star-notation ⊗((W_β*)_{β∈B}) for covariant tensors L((W_β)_{β∈B}; K) is uncomfortable and unnecessary.
29.3.7 Remark: The special case of mixed tensor spaces which are not mixed.
If A = ∅ in Definition 29.3.3, then L((V_α)_{α∈A}; K) may be identified with K regarded as a linear space. (See Remark 28.1.15.) Then ⊗((V_α)_{α∈A}; (W_β)_{β∈B}) = Lin(K, L((W_β)_{β∈B}; K)), which may be identified with L((W_β)_{β∈B}; K). In other words, the mixed tensor product space in this case may be identified with the linear space of multilinear functions for the family (W_β)_{β∈B}, which may be thought of as a space of covariant tensors.
Similarly, if B = ∅ in Definition 29.3.3, then ⊗((V_α)_{α∈A}; (W_β)_{β∈B}) = Lin(L((V_α)_{α∈A}; K), K) = L((V_α)_{α∈A}; K)* = ⊗_{α∈A} V_α. In other words, the mixed tensor product space in this case may be identified with the tensor product space ⊗_{α∈A} V_α in Definition 28.1.2.
The mixed tensor product space ⊗((V_α)_{α∈A}; (W_β)_{β∈B}) = Lin(L((V_α)_{α∈A}; K), L((W_β)_{β∈B}; K)) is thus some kind of mixture of covariant and contravariant tensors. It is not identical to the sets which typically define mixed tensor product spaces. However, this formalism has various advantages relative to the more popular definitions.
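The identification of Lin(K, X) with X used in the remark above can be made concrete: a linear map θ : K → X is determined by its value θ(1), since θ(t) = t·θ(1). A minimal Python sketch (hypothetical names, with X = ℝ³ as a stand-in for L((W_β)_{β∈B}; K)):

```python
# Sketch of the identification Lin(K, X) <-> X: theta(t) = t * theta(1).

def from_vector(x):
    """The linear map theta : K -> X with theta(1) = x."""
    return lambda t: [t * xi for xi in x]

def to_vector(theta):
    """Recover theta(1) from the linear map theta."""
    return theta(1.0)

x = [2.0, -1.0, 0.5]
theta = from_vector(x)
print(to_vector(theta))   # [2.0, -1.0, 0.5]: the two directions invert each other
```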
29.3.8 Definition: The mixed multilinear function space of finite families (V_α)_{α∈A} and (W_β)_{β∈B} of linear spaces over a field K is the linear space of multilinear maps L((V_α*)_{α∈A}, (W_β)_{β∈B}; K).
29.3.9 Remark: Fine points of the notation for mixed multilinear function spaces.
The comma-separated juxtaposition (V_α*)_{α∈A}, (W_β)_{β∈B} in Definition 29.3.8 signifies the union of the two finite families of linear spaces (V_α*)_{α∈A} and (W_β)_{β∈B}. If A ∩ B ≠ ∅, then one or both of the index sets A and B must be modified so that they do not intersect. If the index sets are both ordered, then the combined index set should have an order which makes all elements of A less than (or greater than) the elements of B. If A = ℕ_m and B = ℕ_n, it is customary to add m to each element of B so that B = ℕ_{m+n} \ ℕ_m.
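The index-shifting convention can be sketched as follows, with families modelled as Python dicts from indices to labels (all names hypothetical):

```python
# Combine two finite families with index sets N_m and N_n by shifting the
# second family's indices by m, so the combined index set is N_{m+n}.

def combine_families(fam_a, fam_b):
    m = len(fam_a)
    shifted_b = {j + m: space for j, space in fam_b.items()}
    combined = {**fam_a, **shifted_b}
    assert len(combined) == len(fam_a) + len(fam_b)  # no index collisions
    return combined

V = {1: "V1", 2: "V2"}                # index set A = N_2
W = {1: "W1", 2: "W2", 3: "W3"}       # index set B = N_3
combined = combine_families(V, W)
print(sorted(combined))               # [1, 2, 3, 4, 5], with B re-indexed as N_5 \ N_2
```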
29.3.10 Remark: Schematic diagram of isomorphism between mixed tensor spaces.
The isomorphism map in Theorem 29.3.11 is illustrated in Figure 29.3.2.
[Figure 29.3.2: Isomorphism between mixed tensor spaces. The diagram shows the two representations of the mixed tensor space for the families (V_α)_{α∈A} and (W_β)_{β∈B}: the linear-map space ⊗((V_α)_{α∈A}; (W_β)_{β∈B}) = Lin(L((V_α)_{α∈A}; K), L((W_β)_{β∈B}; K)) and the mixed multilinear function space, related by the isomorphism of Theorem 29.3.11.]
29.3.11 Theorem: Isomorphism between linear-style and multilinear-style mixed tensor spaces.
Let (V_α)_{α∈A} and (W_β)_{β∈B} be finite families of finite-dimensional linear spaces over a field K. Then the map Φ from Lin(L((V_α)_{α∈A}; K), L((W_β)_{β∈B}; K)) (a mixed tensor product space) to L((V_α*)_{α∈A}, (W_β)_{β∈B}; K) (the corresponding mixed multilinear function space) which is defined by
∀θ ∈ Lin(L((V_α)_{α∈A}; K), L((W_β)_{β∈B}; K)), ∀v* ∈ ∏_{α∈A} V_α*, ∀w ∈ ∏_{β∈B} W_β,
Φ(θ)(v*, w) = θ(⊗(v*))(w),
is a linear space isomorphism.
Proof: Let (V_α)_{α∈A} and (W_β)_{β∈B} be finite families of finite-dimensional linear spaces over a field K. Let X = Lin(L((V_α)_{α∈A}; K), L((W_β)_{β∈B}; K)) and Y = L((V_α*)_{α∈A}, (W_β)_{β∈B}; K). Let θ ∈ X. Then ⊗(v*) ∈ L((V_α)_{α∈A}; K) for all v* ∈ ∏_{α∈A} V_α* by Definition 28.2.2. The multilinearity of Φ(θ)(v*, w) with respect to w for fixed v* follows from the definition of θ. The multilinearity with respect to v* for fixed w follows from the multilinearity of ⊗(v*) with respect to v* and the linearity of θ(λ)(w) with respect to λ ∈ L((V_α)_{α∈A}; K) for fixed w. (To verify this linearity, define δ_w : L((V_α)_{α∈A}; K) → K by δ_w : λ ↦ θ(λ)(w) for w ∈ ∏_{β∈B} W_β.)
For each g ∈ Y, a linear map θ_g ∈ X may be constructed (using bases for the component spaces) so that
Φ(θ_g)(v*, w) = θ_g(⊗(v*))(w)
= g(v*, w).
Thus Φ(θ_g) = g for all g ∈ Y. This shows that Φ is surjective onto Y.
To show injectivity of the map Φ, suppose Φ(θ₁) = Φ(θ₂) for θ₁, θ₂ ∈ X. Then θ₁(⊗(v*))(w) = θ₂(⊗(v*))(w) for all v* ∈ ∏_{α∈A} V_α* and w ∈ ∏_{β∈B} W_β. So θ₁(⊗(v*)) = θ₂(⊗(v*)) for all v* ∈ ∏_{α∈A} V_α*. This implies that θ₁ and θ₂ are equal on all simple multilinear functions ⊗(v*) ∈ L((V_α)_{α∈A}; K). Therefore θ₁(λ) and θ₂(λ) are equal for all multilinear functions λ ∈ L((V_α)_{α∈A}; K). So θ₁ = θ₂.
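The correspondence θ ↦ ((v*, w) ↦ θ(⊗(v*))(w)) in Theorem 29.3.11 can be sanity-checked numerically in the smallest mixed case #(A) = #(B) = 1 with V = W = K², where L((V_α)_{α∈A}; K) reduces to V* and θ is a matrix acting on functional coefficients. The sketch below (hypothetical names) checks that the resulting function is linear in each argument slot, as the proof requires:

```python
# Check that phi(lam, w) = theta(lam)(w) is bilinear for a sample theta.

M = [[1.0, 2.0], [3.0, 4.0]]     # theta as a matrix on functional coefficients

def theta(lam):
    return [sum(M[i][j] * lam[j] for j in range(2)) for i in range(2)]

def phi(lam, w):
    mu = theta(lam)              # theta(lam) is a functional on W ...
    return sum(mu[i] * w[i] for i in range(2))   # ... evaluated at w

lam1, lam2, w = [1.0, 0.0], [0.0, 1.0], [2.0, 5.0]
lam_sum = [a + b for a, b in zip(lam1, lam2)]
print(phi(lam_sum, w) == phi(lam1, w) + phi(lam2, w))     # True: linear in lam
print(phi(lam1, [2 * x for x in w]) == 2 * phi(lam1, w))  # True: linear in w
```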
29.3.12 Remark: Strategy for proof discovery by first testing simple tensors and multilinear functions.
Much of the proof of Theorem 29.3.11 was constructed by first restricting attention to simple tensors or
simple multilinear functions, and then extending the procedure to general tensors or multilinear functions by
expressing them as linear combinations of the simple case. This approach is quite useful in general. A useful
tactic for testing conjectures is to substitute simple tensors and simple multilinear functions where possible.
If an idea does not work for simple tensors and multilinear functions, then it will certainly not work for the
general case. But if the simple case is successful, the general case will very often follow by using a basis for
the relevant space or spaces.
29.3.13 Remark: Literature for the mixed tensor space isomorphism theorem.
A statement similar to Theorem 29.3.11 for the special case #(A) = #(B) = 1 is given by Willmore [41],
page 175.
29.3.14 Remark: An advantage of the linear-map style of mixed tensor space representation.
With the linear-map representation Lin(L((V_α)_{α∈A}; K), L((W_β)_{β∈B}; K)) for mixed tensors, the traditional contraction operation on tensors may often be achieved by composition of the corresponding linear maps.
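One standard instance: for type (1, 1) tensors regarded as linear maps, composing the maps multiplies the component matrices, which realises the contracted product Σ_k a^i_k b^k_j. A minimal sketch under these assumptions (names hypothetical):

```python
# Composition of two (1,1) tensors as linear maps = matrix product,
# which realises the contraction c^i_j = sum_k a^i_k b^k_j.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1.0, 2.0], [0.0, 1.0]]
B = [[3.0, 0.0], [1.0, 1.0]]
print(matmul(A, B))   # [[5.0, 2.0], [1.0, 1.0]]
```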
where I = ∏_{k=1}^r ℕ_{m_k} and J = ∏_{ℓ=1}^s ℕ_{n_ℓ}, and w_ℓ^{j_ℓ} is the j_ℓ-th component of w_ℓ with respect to the basis B_ℓ for ℓ ∈ ℕ_s, and e^{k,i_k} is the i_k-th element of the canonical dual basis to B_k for k ∈ ℕ_r.
Proof: The assertion follows from Theorem 27.3.4 line (27.3.1).
where I = ∏_{k=1}^r ℕ_{m_k} and J = ∏_{ℓ=1}^s ℕ_{n_ℓ}, and φ_i = φ((e_{k,i_k})_{k=1}^r) = φ(e_{1,i_1}, ..., e_{r,i_r}) for i ∈ I, and w_ℓ^{j_ℓ} is the j_ℓ-th component of w_ℓ with respect to the basis B_ℓ for ℓ ∈ ℕ_s, and
∀i ∈ I, ∀j ∈ J,  a^i_j = A(e^i)((e_{ℓ,j_ℓ})_{ℓ=1}^s),
where e^i = ⊗_{k=1}^r e^{k,i_k}, and e^{k,i_k} is the i_k-th element of the canonical dual basis to B_k for k ∈ ℕ_r.
[ www.geometry.org/dg.html ] [ draft: UTC 201749 Sunday 13:30 ]
29.4. Components for mixed tensors with mixed primal spaces 857
Proof: The assertion follows from the multilinearity of the maps in L((V_k)_{k=1}^r; K) and L((W_ℓ)_{ℓ=1}^s; K).
29.4.7 Theorem: Change-of-basis formulas for two mixed tensor space representations.
Let (V_k)_{k=1}^r and (W_ℓ)_{ℓ=1}^s be finite families of finite-dimensional linear spaces over a field K, where r, s ∈ ℤ₀⁺. Let B_k = (e_{k,i_k})_{i_k=1}^{m_k} and B̄_k = (ē_{k,i_k})_{i_k=1}^{m_k} be bases for V_k for all k ∈ ℕ_r, and let B_ℓ = (e_{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ} and B̄_ℓ = (ē_{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ} be bases for W_ℓ for all ℓ ∈ ℕ_s. Define the basis transition matrices (Y^k_{μ,ν})_{μ,ν=1}^{m_k} and (Ȳ^k_{μ,ν})_{μ,ν=1}^{m_k} for k ∈ ℕ_r, where m_k = dim(V_k), by
∀k ∈ ℕ_r, ∀ν ∈ ℕ_{m_k},  ē_{k,ν} = Σ_{μ=1}^{m_k} e_{k,μ} Y^k_{μ,ν},
∀k ∈ ℕ_r, ∀ν ∈ ℕ_{m_k},  e_{k,ν} = Σ_{μ=1}^{m_k} ē_{k,μ} Ȳ^k_{μ,ν}.
Define the basis transition matrices (Z^ℓ_{μ,ν})_{μ,ν=1}^{n_ℓ} and (Z̄^ℓ_{μ,ν})_{μ,ν=1}^{n_ℓ} for ℓ ∈ ℕ_s, where n_ℓ = dim(W_ℓ), by
∀ℓ ∈ ℕ_s, ∀ν ∈ ℕ_{n_ℓ},  ē_{ℓ,ν} = Σ_{μ=1}^{n_ℓ} e_{ℓ,μ} Z^ℓ_{μ,ν},
∀ℓ ∈ ℕ_s, ∀ν ∈ ℕ_{n_ℓ},  e_{ℓ,ν} = Σ_{μ=1}^{n_ℓ} ē_{ℓ,μ} Z̄^ℓ_{μ,ν}.
Let A ∈ L((V_k*)_{k=1}^r, (W_ℓ)_{ℓ=1}^s; K). Then
∀i ∈ I, ∀j ∈ J,  a^i_j = Σ_{i′∈I} Σ_{j′∈J} (∏_{k=1}^r Y^k_{i_k,i′_k}) (∏_{ℓ=1}^s Z̄^ℓ_{j′_ℓ,j_ℓ}) ā^{i′}_{j′},   (29.4.1)
∀i ∈ I, ∀j ∈ J,  ā^i_j = Σ_{i′∈I} Σ_{j′∈J} (∏_{k=1}^r Ȳ^k_{i_k,i′_k}) (∏_{ℓ=1}^s Z^ℓ_{j′_ℓ,j_ℓ}) a^{i′}_{j′},   (29.4.2)
where I = ∏_{k=1}^r ℕ_{m_k} and J = ∏_{ℓ=1}^s ℕ_{n_ℓ}, where (a^i_j)_{i∈I,j∈J} and (ā^i_j)_{i∈I,j∈J} are defined by
a^i_j = A(concat((e^{k,i_k})_{k=1}^r, (e_{ℓ,j_ℓ})_{ℓ=1}^s)) and ā^i_j = A(concat((ē^{k,i_k})_{k=1}^r, (ē_{ℓ,j_ℓ})_{ℓ=1}^s)),
and where (e^{k,i_k})_{i_k=1}^{m_k} and (ē^{k,i_k})_{i_k=1}^{m_k} are the canonical dual bases to B_k and B̄_k respectively for all k ∈ ℕ_r, and (e^{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ} and (ē^{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ} are the canonical dual bases to B_ℓ and B̄_ℓ respectively for all ℓ ∈ ℕ_s.
Similarly, let A ∈ Lin(L((V_k)_{k=1}^r; K), L((W_ℓ)_{ℓ=1}^s; K)). Then lines (29.4.1) and (29.4.2) hold when (a^i_j)_{i∈I,j∈J} and (ā^i_j)_{i∈I,j∈J} are defined instead by (29.4.5) and (29.4.6):
∀i ∈ I, ∀j ∈ J,  a^i_j = A(e^i)((e_{ℓ,j_ℓ})_{ℓ=1}^s),   (29.4.5)
∀i ∈ I, ∀j ∈ J,  ā^i_j = A(ē^i)((ē_{ℓ,j_ℓ})_{ℓ=1}^s),   (29.4.6)
where e^i = ⊗_{k=1}^r e^{k,i_k} and ē^i = ⊗_{k=1}^r ē^{k,i_k}.
Proof: Let A ∈ L((V_k*)_{k=1}^r, (W_ℓ)_{ℓ=1}^s; K) for linear spaces (V_k)_{k=1}^r and (W_ℓ)_{ℓ=1}^s for r, s ∈ ℤ₀⁺. For the bases and other assumptions as in the statement of the theorem, define (a^i_j)_{i∈I,j∈J} and (ā^i_j)_{i∈I,j∈J} as in the statement of the theorem. A direct computation then verifies line (29.4.1) for A ∈ L((V_k*)_{k=1}^r, (W_ℓ)_{ℓ=1}^s; K). (The coordinate transformation rules applied in line (29.4.7) are proved in Theorems 22.9.11 and 23.9.7.) Line (29.4.2) follows similarly.
Let A ∈ Lin(L((V_k)_{k=1}^r; K), L((W_ℓ)_{ℓ=1}^s; K)) for linear spaces (V_k)_{k=1}^r and (W_ℓ)_{ℓ=1}^s for r, s ∈ ℤ₀⁺. For the bases and other assumptions as in the statement of the theorem, define (a^i_j)_{i∈I,j∈J} and (ā^i_j)_{i∈I,j∈J} by lines (29.4.5) and (29.4.6). A direct computation then verifies line (29.4.1) for A ∈ Lin(L((V_k)_{k=1}^r; K), L((W_ℓ)_{ℓ=1}^s; K)). (The coordinate transformation rules applied in line (29.4.8) are proved in Theorems 22.9.11 and 23.9.7.) Line (29.4.2) follows similarly.
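A numerical spot-check of line (29.4.1) in the simplest case r = s = 1 with both component spaces equal to K² is given below. Here the transition matrices reduce to a single pair Y and Ȳ = Y⁻¹, the rule reads a = Y ā Ȳ in matrix form, and the full contraction of A with one functional and one vector must be basis-independent. All names are hypothetical:

```python
# Check of (29.4.1) for r = s = 1: components a = Y abar Ybar, and the full
# contraction of A with one functional and one vector is basis-independent.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

Y = [[2.0, 1.0], [1.0, 1.0]]         # ebar_nu = sum_mu e_mu Y[mu][nu]
Ybar = inv2(Y)
abar = [[1.0, 2.0], [3.0, 4.0]]      # components in the barred bases
a = matmul(matmul(Y, abar), Ybar)    # line (29.4.1) in matrix form

vbar, lambar = [1.0, -1.0], [2.0, 0.5]
v = matvec(Y, vbar)                                   # contravariant components
lam = [sum(Ybar[mu][j] * lambar[mu] for mu in range(2)) for j in range(2)]  # covariant
s_unbarred = sum(lam[i] * a[i][j] * v[j] for i in range(2) for j in range(2))
s_barred = sum(lambar[i] * abar[i][j] * vbar[j] for i in range(2) for j in range(2))
print(abs(s_unbarred - s_barred) < 1e-12)   # True
```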
29.4.8 Remark: Different classes of tensorial object may correspond to the same coefficient matrix.
The tensor calculus approach to differential geometry utilises the coefficient matrix (a^i_j)_{i∈I,j∈J} in Theorems 29.4.4 and 29.4.5, and often largely ignores the objects which the coefficients coordinatise. In this case, the
coefficient arrays are the same, and they transform in the same way under changes of basis, but they refer
to different object classes.
29.4.9 Remark: Tensor monomial notation meanings are different for vectors and linear functionals.
Notations 29.4.10 and 29.4.11 give different meanings to the binary operator ⊗ depending on whether the operands are vectors or linear functionals. The expression ⊗_{k=1}^r v_k is a simple tensor. The expression ⊗_{ℓ=1}^s λ_ℓ in Notation 29.4.11 is a simple multilinear function. (Notation 29.4.10 duplicates Notations 28.4.4 and 28.4.5, but is presented again here, slightly differently, for direct comparison with Notations 29.4.11, 29.4.14 and 29.4.15.)
29.4.12 Remark: Contravariant and covariant simple tensors are totally different kinds of objects.
The definition of mixed tensors is presented in most texts as if the contravariant and covariant simple tensors ⊗_{k=1}^r v_k and ⊗_{ℓ=1}^s λ_ℓ were equivalent in some sense. In fact, the contravariant simple tensor ⊗_{k=1}^r v_k is formed by applying multilinear functions in L((V_k)_{k=1}^r; K) to the sequence of vectors (v_k)_{k=1}^r ∈ ∏_{k=1}^r V_k. The covariant simple tensor ⊗_{ℓ=1}^s λ_ℓ, by contrast, is defined as the application of each individual linear functional in the sequence (λ_ℓ)_{ℓ=1}^s ∈ ∏_{ℓ=1}^s W_ℓ* to the corresponding element of a sequence of vectors (w_ℓ)_{ℓ=1}^s ∈ ∏_{ℓ=1}^s W_ℓ. Consequently, it is not possible to simply string these simple tensors together in a well-defined mixed simple tensor (⊗_{k=1}^r v_k) ⊗ (⊗_{ℓ=1}^s λ_ℓ).
An alternative interpretation for the notation for simple contravariant tensors in Notation 29.4.10 would define ⊗_{k=1}^r v_k to mean the multilinear function in L((V_k*)_{k=1}^r; K) defined by (λ_k)_{k=1}^r ↦ ∏_{k=1}^r λ_k(v_k) for all (λ_k)_{k=1}^r ∈ ∏_{k=1}^r V_k*. The equivalence (i.e. natural isomorphism) of this interpretation to Notation 29.4.10 is shown in Theorem 29.2.4. This multilinear function style of tensor product is closer to the simple mixed tensors in Notation 29.4.14. As discussed in Remark 28.5.5, the representation L((V_k*)_{k=1}^r; K) for contravariant tensors is much closer to physical applications than L((V_k)_{k=1}^r; K)*, although the latter does have some advantages in the construction of mixed tensors.
Notations 29.4.14 and 29.4.15 define two ways of combining contravariant simple tensors with covariant
simple tensors to form a single mixed simple tensor. Notation 29.4.14 constructs a mixed simple tensor in
the popular multilinear-map style of mixed tensor space. Notation 29.4.15 constructs a mixed simple tensor
in the linear-map style of mixed tensor space. For both styles of mixed tensor products, the contravariant
and covariant components must be treated differently.
29.4.13 Remark: The application of the list-concatenation operator to tuples of vectors.
The list concatenation function template concat : List(S) × List(S) → List(S) for general sets S is defined in Definition 14.11.6 (iii). For the application in Notation 29.4.14, S may be chosen as (⋃_{k=1}^r V_k*) ∪ (⋃_{ℓ=1}^s W_ℓ).
29.4.14 Notation: Simple mixed tensors in the multilinear-function style.
(⊗_{k=1}^r v_k) ⊗ (⊗_{ℓ=1}^s λ_ℓ), for a sequence (v_k)_{k=1}^r ∈ ∏_{k=1}^r V_k, for r ∈ ℤ₀⁺, where V_k is a linear space over K for all k ∈ ℕ_r, and a sequence (λ_ℓ)_{ℓ=1}^s ∈ ∏_{ℓ=1}^s (W_ℓ)*, for s ∈ ℤ₀⁺, where W_ℓ is a linear space over K for all ℓ ∈ ℕ_s, denotes the mixed tensor in L((V_k*)_{k=1}^r, (W_ℓ)_{ℓ=1}^s; K) which is defined by
concat((φ_k)_{k=1}^r, (w_ℓ)_{ℓ=1}^s) ↦ ∏_{k=1}^r φ_k(v_k) ∏_{ℓ=1}^s λ_ℓ(w_ℓ)
for all (φ_k)_{k=1}^r ∈ ∏_{k=1}^r V_k* and (w_ℓ)_{ℓ=1}^s ∈ ∏_{ℓ=1}^s W_ℓ.
(v_1 ⊗ ... ⊗ v_r) ⊗ (λ_1 ⊗ ... ⊗ λ_s) means the same as (⊗_{k=1}^r v_k) ⊗ (⊗_{ℓ=1}^s λ_ℓ).
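A direct Python transcription of the multilinear-function style for r = s = 1, with functionals modelled as plain functions (all names hypothetical):

```python
# Multilinear-function-style simple mixed tensor: acts on the concatenated
# argument list concat((phi_k), (w_l)) and returns prod phi_k(v_k) * prod lam_l(w_l).

def simple_mixed_tensor(vs, lams):
    def apply(args):
        phis, ws = args[:len(vs)], args[len(vs):]
        out = 1.0
        for phi_k, v_k in zip(phis, vs):
            out *= phi_k(v_k)          # contravariant factors
        for lam_l, w_l in zip(lams, ws):
            out *= lam_l(w_l)          # covariant factors
        return out
    return apply

v = [1.0, 2.0]
lam = lambda w: w[0] - w[1]
T = simple_mixed_tensor([v], [lam])
phi = lambda u: 3.0 * u[0] + u[1]      # test functional; phi(v) = 5.0
print(T([phi, [4.0, 1.0]]))            # 5.0 * (4.0 - 1.0) = 15.0
```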
29.4.15 Notation: Simple mixed tensors in the linear-map style.
⊗_{k=1}^r v_k ⊠ ⊗_{ℓ=1}^s λ_ℓ, for a sequence (v_k)_{k=1}^r ∈ ∏_{k=1}^r V_k, for r ∈ ℤ₀⁺, where V_k is a linear space over K for all k ∈ ℕ_r, and a sequence (λ_ℓ)_{ℓ=1}^s ∈ ∏_{ℓ=1}^s (W_ℓ)*, for s ∈ ℤ₀⁺, where W_ℓ is a linear space over K for all ℓ ∈ ℕ_s, denotes the mixed tensor in Lin(L((V_k)_{k=1}^r; K), L((W_ℓ)_{ℓ=1}^s; K)) which is defined by
φ ↦ ((w_ℓ)_{ℓ=1}^s ↦ φ(v_1, ..., v_r) ∏_{ℓ=1}^s λ_ℓ(w_ℓ))
for all φ ∈ L((V_k)_{k=1}^r; K) and (w_ℓ)_{ℓ=1}^s ∈ ∏_{ℓ=1}^s W_ℓ.
v_1 ⊗ ... ⊗ v_r ⊠ λ_1 ⊗ ... ⊗ λ_s means the same as ⊗_{k=1}^r v_k ⊠ ⊗_{ℓ=1}^s λ_ℓ.
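The linear-map style can be transcribed analogously: the simple mixed tensor sends a multilinear test function φ to a multilinear function of the W-vectors. A sketch for r = s = 1 (all names hypothetical):

```python
# Linear-map-style simple mixed tensor:
# phi |-> ((w_l) |-> phi(v_1, ..., v_r) * prod lam_l(w_l)).

def simple_mixed_tensor_linmap(vs, lams):
    def apply(phi):
        def mu(ws):
            out = phi(*vs)
            for lam_l, w_l in zip(lams, ws):
                out *= lam_l(w_l)
            return out
        return mu
    return apply

v = [1.0, 2.0]
lam = lambda w: w[0] - w[1]
T = simple_mixed_tensor_linmap([v], [lam])
phi = lambda u: 3.0 * u[0] + u[1]      # phi(v) = 5.0
mu = T(phi)
print(mu([[4.0, 1.0]]))                # 5.0 * 3.0 = 15.0
```

Note that the same input data (v, λ) produces the same scalar outputs in both styles once the test arguments are matched, which is the content of the isomorphism in Theorem 29.3.11.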
29.4.16 Remark: Simple mixed tensors composed of basis vectors.
Notations 29.4.14 and 29.4.15 are abbreviated in the case of mixed tensors composed of basis vectors to Notations 29.4.17 and 29.4.18 respectively. There is some inconvenience in the use of the over-bar to distinguish the two sets of bases, but this can be easily removed by re-indexing the family (W_ℓ)_{ℓ=1}^s as (W_ℓ)_{ℓ=r+1}^{r+s}, and rewriting (ē_{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ} as (e_{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ}.
29.4.17 Notation: Simple multilinear-function-style mixed tensors composed of basis vectors.
e_i ⊗ ē^j, for i ∈ ∏_{k=1}^r ℕ_{m_k} and j ∈ ∏_{ℓ=1}^s ℕ_{n_ℓ}, for r, s ∈ ℤ₀⁺, for finite families of finite-dimensional linear spaces (V_k)_{k=1}^r and (W_ℓ)_{ℓ=1}^s over a field K, with bases B_k = (e_{k,i_k})_{i_k=1}^{m_k} for V_k for k ∈ ℕ_r, and B̄_ℓ = (ē_{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ} for W_ℓ for ℓ ∈ ℕ_s, denotes the mixed tensor in L((V_k*)_{k=1}^r, (W_ℓ)_{ℓ=1}^s; K) which is defined by
e_i ⊗ ē^j = (⊗_{k=1}^r e_{k,i_k}) ⊗ (⊗_{ℓ=1}^s ē^{ℓ,j_ℓ}).
29.4.18 Notation: Simple linear-map-style mixed tensors composed of basis vectors.
e_i ⊠ ē^j, for i, j, (V_k)_{k=1}^r, (W_ℓ)_{ℓ=1}^s and bases as in Notation 29.4.17, denotes the mixed tensor in Lin(L((V_k)_{k=1}^r; K), L((W_ℓ)_{ℓ=1}^s; K)) which is defined by
e_i ⊠ ē^j = ⊗_{k=1}^r e_{k,i_k} ⊠ ⊗_{ℓ=1}^s ē^{ℓ,j_ℓ}.
29.4.20 Theorem: Expression for multilinear-function-style mixed tensors in terms of bases.
∀A ∈ L((V_k*)_{k=1}^r, (W_ℓ)_{ℓ=1}^s; K),
A = Σ_{i∈I} Σ_{j∈J} a^i_j (⊗_{k=1}^r e_{k,i_k}) ⊗ (⊗_{ℓ=1}^s ē^{ℓ,j_ℓ})
  = Σ_{i∈I} Σ_{j∈J} a^i_j e_i ⊗ ē^j,
where I = ∏_{k=1}^r ℕ_{m_k} and J = ∏_{ℓ=1}^s ℕ_{n_ℓ}, and where (e^{k,i_k})_{i_k=1}^{m_k} is the canonical dual basis to B_k for k ∈ ℕ_r, and (ē^{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ} is the canonical dual basis to B̄_ℓ for ℓ ∈ ℕ_s.
Proof: The assertion follows straightforwardly from the definition of L((V_k*)_{k=1}^r, (W_ℓ)_{ℓ=1}^s; K).
29.4.21 Theorem: Expression for linear-map-style mixed tensors in terms of bases.
∀A ∈ Lin(L((V_k)_{k=1}^r; K), L((W_ℓ)_{ℓ=1}^s; K)),
A = Σ_{i∈I} Σ_{j∈J} a^i_j (⊗_{k=1}^r e_{k,i_k} ⊠ ⊗_{ℓ=1}^s ē^{ℓ,j_ℓ})
  = Σ_{i∈I} Σ_{j∈J} a^i_j e_i ⊠ ē^j,
where I = ∏_{k=1}^r ℕ_{m_k} and J = ∏_{ℓ=1}^s ℕ_{n_ℓ}, and where (e^{k,i_k})_{i_k=1}^{m_k} is the canonical dual basis to B_k for k ∈ ℕ_r, and (ē^{ℓ,j_ℓ})_{j_ℓ=1}^{n_ℓ} is the canonical dual basis to B̄_ℓ for ℓ ∈ ℕ_s.
Proof: The assertion follows straightforwardly from the definition of Lin(L((V_k)_{k=1}^r; K), L((W_ℓ)_{ℓ=1}^s; K)).
29.5.2 Definition: The mixed multilinear function space of type (r, s) on V, where r, s ∈ ℤ₀⁺ and V is a finite-dimensional linear space, is the mixed multilinear function space L((V_α*)_{α∈A}, (W_β)_{β∈B}; K) (Definition 29.3.8) with V_α = V for all α ∈ A = ℕ_r and W_β = V for all β ∈ B = ℕ_s.
A mixed multilinear function of type (r, s) on V is an element of the mixed multilinear function space of type (r, s) on V.
29.5.4 Definition: The mixed tensor (product) space of type (r, s) on V, where r, s ∈ ℤ₀⁺ and V is a finite-dimensional linear space, is the mixed tensor product ⊗((V_α)_{α∈A}; (W_β)_{β∈B}) with V_α = V for all α ∈ A = ℕ_r and W_β = V for all β ∈ B = ℕ_s.
A mixed tensor of type (r, s) on V is an element of the mixed tensor space of type (r, s) on V.
29.5.6 Remark: Simplified notations for mixed tensor spaces on a single primal space.
The mixed tensor product ⊗((V_α)_{α∈A}; (W_β)_{β∈B}) in Definition 29.5.4 may be written as ⊗(V^r; V^s). Thus ⊠^r_s V = ⊗(V^r; V^s), and ⊗^r V = ⊠^r_0 V = ⊗(V^r; V^0). One may also write L^s(V) = ⊠^0_s V = ⊗(V^0; V^s).
29.5.7 Remark: Identifications of mixed tensor spaces in boundary cases.
The special cases r = 0 or s = 0 in Definition 29.5.4 are implicitly discussed in Remark 29.3.7 as the special cases A = ∅ or B = ∅ respectively. In the case (r, s) = (0, 0), the mixed tensor product space ⊠^0_0 V is the space ⊗(∅; ∅), which has two empty sequences of linear spaces over an implied field K. Then ⊠^0_0 V = Lin(L(∅; K), L(∅; K)). As mentioned in Remarks 27.1.12 and 28.1.15, L(∅; K) is equal to K^{{∅}} = {f : {∅} → K} with the operations of pointwise addition and scalar multiplication by elements of K. This may be identified with K (the linear space K over the field K), and so ⊠^0_0 V may be identified with Lin(K, K), which can also be identified with the linear space K over the field K. Hence dim(⊠^0_0 V) = 1.
29.5.9 Remark: Tensor spaces with the same transformation rules are not necessarily the same.
It could be argued that the mixed tensor products in Definition 29.5.4 are highly inconvenient because this
approach will spawn numerous spaces of linear maps between numerous tensor spaces. Every application may
require a new space of linear maps which is ultimately isomorphic to one of the spaces in Definition 29.5.4,
which is itself isomorphic to the popular definition of mixed tensors as spaces of multilinear maps on mixed
primal and dual spaces. In this view, all of the naturally isomorphic tensor bundles on a differentiable
manifold are merely alternative representations of the one-and-only single tensor bundle with each type (r, s).
There are two main counter-arguments to this view. First, the extra work required to build the right tensor
space for each application is valuable because it forces one to understand the specific structure of each tensor bundle. The following table surveys terminology used in the literature for the type of a mixed tensor.

1998  Petersen [31], page 51:        type (r, s)
1999  Rebhan [264], page 973 [de]:   (r, s)
2004  Szekeres [266], page 186:      degree (r, s); type (r, s)
2005  Penrose [263], page 240:       valence (r, s)
2006  Gregory [246], page 501:       order r
2012  Sternberg [37], pages 94, 98:  degree (r, s); type (r, s)
      Kennington:                    degree (r, s); type (r, s)

Second, the view that all naturally isomorphic tensor bundles on a manifold are really the same
tensor bundle is essentially asserting that the only distinguishing feature of tensor bundles is their coordinate
transformation rules. There is a very popular view that coordinates-free differential geometry is somehow
superior, but most people who believe this also believe that coordinate transformation rules are the only
distinguishing feature of tensor bundles. These beliefs seem to be incompatible. The view adopted in this
book is that multiple transformation-identical tensor bundles may co-exist on a single manifold, but that
coordinates are the principal structural feature of tensor algebra and differential geometry generally.
29.5.11 Remark: Permutations of the order of primal and dual spaces in a mixed tensor space.
The style of mixed tensor representation mentioned in Remark 29.5.10 uses a sequence of primal spaces followed by a sequence of dual spaces, but this could be generalised so that the primal and dual spaces are mixed up in any order.
As an example, a possible notation for the space of tensors whose components are written as K_i^{jk}_ℓ^m would be ⊠^{0,1,2,1,1} ℝ^n. (Something like this is discussed by Petersen [31], page 55.) More generally, the notation would be ⊠^{r₁,s₁,r₂,s₂,...} V with contravariant and covariant degrees respectively equal to r₁, r₂, ... and s₁, s₂, .... It would then be necessary to develop a set of notations for arbitrary contractions and products of such tensors and spaces. This seems to be an area where the physicists' component notations are better developed than the pure mathematical notations.
29.5.12 Notation: L^{r,s}(V; U) denotes the mixed multilinear map space L((V*)^r, V^s; U), where V is a finite-dimensional linear space, U is a linear space, and r, s ∈ ℤ₀⁺.
L^{r,s}(V) denotes the mixed multilinear map space L((V*)^r, V^s; K), where V is a finite-dimensional linear space over the field K, and r, s ∈ ℤ₀⁺.
29.5.13 Remark: A natural isomorphism for multilinear-function-style and linear-map-style tensors.
The mixed multilinear map space L^{r,s}(V) = L((V*)^r, V^s; K) in Notation 29.5.12 is the same as the mixed multilinear function space of type (r, s) on V in Definition 29.5.2. If Theorem 29.3.11 is applied to the case that all of the tensor component spaces are the same, the result is the following natural isomorphism:
⊠^r_s V = Lin(L(V^r; K), L(V^s; K)) ≅ L((V*)^r, V^s; K) = L^{r,s}(V).
This gives the equivalence of the definition in this book to the more popular style of definition.
29.5.14 Remark: An unimportant generalisation of a tensor space notation.
Notation 29.5.15 slightly generalises Notation 29.5.12. As a consequence, L^{r,s}(V; U) = L^{r,s}(V, V; U).
29.5.15 Notation: L^{r,s}(V₁, V₂; U) denotes the mixed multilinear map space L((V₁*)^r, V₂^s; U), where V₁ and V₂ are finite-dimensional linear spaces, U is a linear space, and r, s ∈ ℤ₀⁺.
29.6. Components for mixed tensors with a single primal space
29.6.1 Remark: Specialising components for mixed tensors to the case of homogeneous primal spaces.
The purpose of Section 29.6 is to apply bases and components to the mixed tensor product spaces which are defined in Section 29.5 for a single primal space. Theorems 29.6.2 and 29.6.3 are specialisations of Theorems 29.4.4 and 29.4.5 respectively to the case of uniform linear spaces.
29.6.2 Theorem: Components for multilinear-function-style homogeneous mixed tensor spaces.
Let V be a linear space over a field K, with n = dim(V). Let B = (e_i)_{i=1}^n be a basis for V. Then
∀r, s ∈ ℤ₀⁺, ∀A ∈ L((V*)^r, V^s; K), ∀(λ_k)_{k=1}^r ∈ (V*)^r, ∀(v_ℓ)_{ℓ=1}^s ∈ V^s,
A((λ_k)_{k=1}^r, (v_ℓ)_{ℓ=1}^s) = Σ_{i∈ℕ_n^r} Σ_{j∈ℕ_n^s} A((e^{i_k})_{k=1}^r, (e_{j_ℓ})_{ℓ=1}^s) ∏_{k=1}^r λ_{k,i_k} ∏_{ℓ=1}^s v_ℓ^{j_ℓ},
where v_ℓ^{j_ℓ} denotes the j_ℓ-th component of v_ℓ with respect to B, λ_{k,i_k} = λ_k(e_{i_k}) denotes the i_k-th component of λ_k with respect to B, and e^{i_k} denotes the i_k-th element of the canonical dual basis to B.
Proof: This result is a direct consequence of Theorem 29.4.4 by replacing V_k with V for k ∈ ℕ_r and W_ℓ with V for ℓ ∈ ℕ_s.
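For r = s = 1 the component expansion in Theorem 29.6.2 can be checked numerically: A(λ, v) should equal Σ_{i,j} A(e^i, e_j) λ_i v^j. A sketch with the standard basis of ℝ² (all names hypothetical):

```python
# Check A(lam, v) = sum_{i,j} A(e^i, e_j) * lam(e_i) * v[j] for a sample A.

n = 2
basis = [[1.0, 0.0], [0.0, 1.0]]
dual = [lambda v: v[0], lambda v: v[1]]    # canonical dual basis

def A(lam, v):                             # sample element of L(V*, V; K)
    M = [[2.0, -1.0], [0.0, 3.0]]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return lam(Mv)

lam = lambda v: 4.0 * v[0] + v[1]
v = [1.0, 2.0]
direct = A(lam, v)
expanded = sum(A(dual[i], basis[j]) * lam(basis[i]) * v[j]
               for i in range(n) for j in range(n))
print(direct == expanded)   # True
```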
29.6.3 Theorem: Components for linear-map-style homogeneous mixed tensor spaces.
Let V be a linear space over a field K, with n = dim(V). Let B = (e_i)_{i=1}^n be a basis for V. Then
∀r, s ∈ ℤ₀⁺, ∀A ∈ Lin(L^r(V), L^s(V)), ∀φ ∈ L^r(V), ∀(v_ℓ)_{ℓ=1}^s ∈ V^s,
A(φ)((v_ℓ)_{ℓ=1}^s) = Σ_{i∈ℕ_n^r} Σ_{j∈ℕ_n^s} A(e^i)((e_{j_ℓ})_{ℓ=1}^s) φ_i ∏_{ℓ=1}^s v_ℓ^{j_ℓ},
where φ_i = φ((e_{i_k})_{k=1}^r) for i ∈ ℕ_n^r, and v_ℓ^{j_ℓ} is the j_ℓ-th component of v_ℓ with respect to the basis B, and e^i = ⊗_{k=1}^r e^{i_k}, where e^{i_k} is the i_k-th element of the canonical dual basis to B.
Proof: This result is a direct consequence of Theorem 29.4.5 by replacing V_k with V for k ∈ ℕ_r and W_ℓ with V for ℓ ∈ ℕ_s.
29.6.4 Theorem: Change-of-basis formulas for homogeneous mixed tensor space representations.
Let V be a linear space over a field K, with n = dim(V). Let B = (e_i)_{i=1}^n and B̄ = (ē_i)_{i=1}^n be bases for V. Define the basis transition matrices (Y_{μ,ν})_{μ,ν=1}^n and (Ȳ_{μ,ν})_{μ,ν=1}^n by
∀ν ∈ ℕ_n,  ē_ν = Σ_{μ=1}^n e_μ Y_{μ,ν},
∀ν ∈ ℕ_n,  e_ν = Σ_{μ=1}^n ē_μ Ȳ_{μ,ν}.
Let r, s ∈ ℤ₀⁺, I = ℕ_n^r and J = ℕ_n^s. Let A ∈ L((V*)^r, V^s; K). Then
∀i ∈ I, ∀j ∈ J,  a^i_j = Σ_{i′∈I} Σ_{j′∈J} (∏_{k=1}^r Y_{i_k,i′_k}) (∏_{ℓ=1}^s Ȳ_{j′_ℓ,j_ℓ}) ā^{i′}_{j′},   (29.6.1)
∀i ∈ I, ∀j ∈ J,  ā^i_j = Σ_{i′∈I} Σ_{j′∈J} (∏_{k=1}^r Ȳ_{i_k,i′_k}) (∏_{ℓ=1}^s Y_{j′_ℓ,j_ℓ}) a^{i′}_{j′},   (29.6.2)
where (a^i_j)_{i∈I,j∈J} and (ā^i_j)_{i∈I,j∈J} are the components of A with respect to B and B̄ respectively, and where (e^k)_{k=1}^n and (ē^k)_{k=1}^n are the canonical dual bases to B and B̄ respectively.
Similarly, let A ∈ Lin(L(V^r; K), L(V^s; K)). Then lines (29.6.1) and (29.6.2) hold when (a^i_j)_{i∈I,j∈J} and (ā^i_j)_{i∈I,j∈J} are defined instead by (29.6.5) and (29.6.6):
∀i ∈ I, ∀j ∈ J,  a^i_j = A(e^i)((e_{j_ℓ})_{ℓ=1}^s),   (29.6.5)
∀i ∈ I, ∀j ∈ J,  ā^i_j = A(ē^i)((ē_{j_ℓ})_{ℓ=1}^s).   (29.6.6)
Proof: This result is a direct consequence of Theorem 29.4.7 by replacing V_k with V for k ∈ ℕ_r and W_ℓ with V for ℓ ∈ ℕ_s.
29.6.5 Remark: Tensor monomial notation meanings are different for vectors and linear functionals.
Notations 29.6.6 and 29.6.7 are specialisations of Notations 29.4.10 and 29.4.11 to homogeneous linear spaces.
Note that the binary operator ⊗ has a different meaning for the contravariant vectors in Notation 29.6.6 and the covariant vectors in Notation 29.6.7. In the contravariant case, the r vectors are applied as arguments of an r-linear test function φ ∈ L^r(V; K). In the covariant case, the s linear functionals are applied to an
s-tuple of test vectors.
basis B = (e_m)_{m=1}^n, denotes the mixed tensor in L((V*)^r, V^s; K) which is defined by
e_i ⊗ e^j = (⊗_{k=1}^r e_{i_k}) ⊗ (⊗_{ℓ=1}^s e^{j_ℓ}),
where (e^m)_{m=1}^n is the canonical dual basis to B.
29.6.13 Notation: Simple linear-map-style mixed tensors composed of basis vectors.
e_i ⊠ e^j, for i ∈ ℕ_n^r and j ∈ ℕ_n^s, for r, s ∈ ℤ₀⁺, for a finite-dimensional linear space V over a field K, with basis B = (e_m)_{m=1}^n, denotes the mixed tensor in Lin(L^r(V), L^s(V)) which is defined by
e_i ⊠ e^j = ⊗_{k=1}^r e_{i_k} ⊠ ⊗_{ℓ=1}^s e^{j_ℓ},
where (e^m)_{m=1}^n is the canonical dual basis to B.
29.6.14 Remark: Expressions for mixed tensors in terms of bases.
Theorems 29.6.15 and 29.6.16 are specialisations of Theorems 29.4.20 and 29.4.21 to a single primal space.
29.6.15 Theorem: Expression for multilinear-function-style mixed tensors in terms of bases.
Let V be a finite-dimensional linear space over a field K. Let B = (e_m)_{m=1}^n be a basis for V. Let r, s ∈ ℤ₀⁺. Then
∀A ∈ L((V*)^r, V^s; K),
A = Σ_{i∈I} Σ_{j∈J} a^i_j (⊗_{k=1}^r e_{i_k}) ⊗ (⊗_{ℓ=1}^s e^{j_ℓ})
  = Σ_{i∈I} Σ_{j∈J} a^i_j e_i ⊗ e^j,
where I = ℕ_n^r and J = ℕ_n^s.
Proof: This result is a direct consequence of Theorem 29.4.20 by replacing V_k with V for k ∈ ℕ_r and W_ℓ with V for ℓ ∈ ℕ_s.
29.6.16 Theorem: Expression for linear-map-style mixed tensors in terms of bases.
Let V be a finite-dimensional linear space over a field K. Let B = (e_m)_{m=1}^n be a basis for V. Let r, s ∈ ℤ₀⁺. Then
∀A ∈ Lin(L^r(V), L^s(V)),
A = Σ_{i∈I} Σ_{j∈J} a^i_j (⊗_{k=1}^r e_{i_k} ⊠ ⊗_{ℓ=1}^s e^{j_ℓ})
  = Σ_{i∈I} Σ_{j∈J} a^i_j e_i ⊠ e^j,
where I = ℕ_n^r and J = ℕ_n^s, where e^i = ⊗_{k=1}^r e^{i_k}, and (e^m)_{m=1}^n is the canonical dual basis to B.
Proof: This result is a direct consequence of Theorem 29.4.21 by replacing V_k with V for k ∈ ℕ_r and W_ℓ with V for ℓ ∈ ℕ_s.
Chapter 30
Multilinear map spaces and symmetries
have extensive applications when a vector multiplication operation is available from a Lie algebra structure
on the linear space.
It is noteworthy that in applications, tensors almost always have symmetry or antisymmetry constraints.
Hence tensors of degree 2, either doubly covariant or doubly contravariant, are predominantly either fully
symmetric or fully antisymmetric.
30.1.2 Remark: Linear space component permutation symmetries force homogeneous linear spaces.
For Definitions 30.1.3 and 30.1.4 for symmetric and antisymmetric multilinear maps, a permutation of a set A means a bijection P : A → A. (See Definition 14.7.2.) When requiring symmetry or antisymmetry under linear-space-component permutations, all linear space components being permuted must be the same. (See Definition 14.11.6 (vi) for the list swap-item operator.) Hence spaces of the form L((V_α)_{α∈A}; U) are abandoned here in favour of the spaces L^m(V; U) with m ∈ ℤ₀⁺. (See Notation 27.1.19.)
30.1.3 Definition: A symmetric multilinear map from a Cartesian product V^m of a linear space V over a field K, for m ∈ ℤ₀⁺, to a linear space U over K is a multilinear map f ∈ L^m(V; U) such that f(swap_{j,k}(v)) = f(v) for all v = (v_i)_{i=1}^m ∈ V^m and j, k ∈ ℕ_m.
30.1.4 Definition: An antisymmetric multilinear map from a Cartesian product V^m of a linear space V over a field K, for m ∈ ℤ₀⁺, to a linear space U over K is a multilinear map f ∈ L^m(V; U) such that f(swap_{j,k}(v)) = −f(v) for all v = (v_i)_{i=1}^m ∈ V^m and j, k ∈ ℕ_m with j ≠ k.
An alternating multilinear map means the same thing as an antisymmetric multilinear map.
An alternating m-form (map) for m ∈ ℤ₀⁺ means an antisymmetric multilinear map on an m-times Cartesian product of a linear space.
30.1.5 Notation: L^m_+(V; U) for m ∈ ℤ₀⁺ and linear spaces U and V denotes the set of symmetric multilinear maps from V^m to U. Thus
L^m_+(V; U) = {f ∈ L^m(V; U); ∀j, k ∈ ℕ_m, f ∘ swap_{j,k} = f}.
30.1.6 Notation: L^m_−(V; U) for m ∈ ℤ₀⁺ and linear spaces U and V denotes the set of antisymmetric multilinear maps from V^m to U. Thus
L^m_−(V; U) = {f ∈ L^m(V; U); ∀j, k ∈ ℕ_m, j ≠ k ⇒ f ∘ swap_{j,k} = −f}.
30.1.7 Remark: Symmetric and antisymmetric multilinear functions are special cases of maps.
The linear space U in Definitions 30.1.3 and 30.1.4 may be substituted with the field K regarded as a
linear space. (See Definition 22.1.7 for fields regarded as a linear space.) This special case deserves its own
definitions and simplified notations.
30.1.10 Notation: L^m_+(V) for m ∈ ℤ₀⁺ and a linear space V denotes the set of symmetric multilinear functions on V^m. Thus
L^m_+(V) = {f ∈ L^m(V; K); ∀j, k ∈ ℕ_m, f ∘ swap_{j,k} = f}.
30.1.11 Notation: L^m_−(V) for m ∈ ℤ₀⁺ and a linear space V denotes the set of antisymmetric multilinear functions on V^m. Thus
L^m_−(V) = {f ∈ L^m(V; K); ∀j, k ∈ ℕ_m, j ≠ k ⇒ f ∘ swap_{j,k} = −f}.
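Membership in these sets can be tested numerically by checking every binary swap on sample argument lists. The sketch below (all names hypothetical) confirms that the dot product of two vectors is symmetric and that the 2×2 determinant of two row vectors is antisymmetric:

```python
from itertools import combinations

def swap(v, j, k):
    u = list(v)
    u[j], u[k] = u[k], u[j]
    return u

def is_symmetric(f, samples, m):
    return all(f(swap(v, j, k)) == f(v)
               for v in samples for j, k in combinations(range(m), 2))

def is_antisymmetric(f, samples, m):
    return all(f(swap(v, j, k)) == -f(v)
               for v in samples for j, k in combinations(range(m), 2))

dot = lambda v: v[0][0] * v[1][0] + v[0][1] * v[1][1]   # symmetric bilinear
det = lambda v: v[0][0] * v[1][1] - v[0][1] * v[1][0]   # antisymmetric bilinear

samples = [[[1.0, 2.0], [3.0, 5.0]], [[0.0, 1.0], [1.0, 0.0]]]
print(is_symmetric(dot, samples, 2))       # True
print(is_antisymmetric(det, samples, 2))   # True
```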
30.1.12 Remark: Duals of constrained linear spaces are not subspaces of the unconstrained duals.
The symmetric and antisymmetric multilinear map spaces and multilinear function spaces are linear subspaces of the corresponding unconstrained linear spaces. Thus for linear spaces V and U, and m ∈ ℤ₀⁺, L^m_+(V; U) ⊆ L^m(V; U), L^m_−(V; U) ⊆ L^m(V; U), L^m_+(V) ⊆ L^m(V) and L^m_−(V) ⊆ L^m(V). This is quite obvious, but the corresponding statements for the duals of these spaces are not valid. (See Remark 23.6.11 for discussion of this issue.)
30.1.13 Theorem: The set $L_m^+(V;U)$ is closed under pointwise addition and scalar multiplication.
Proof: Let $g, h \in L_m^+(V;U)$ and let $f = g + h$ be the pointwise sum of $g$ and $h$. Then $f \in L_m(V;U)$ by Theorem 27.4.2. For permutations $P: \mathbb{N}_m \to \mathbb{N}_m$ and vector sequences $(v_i)_{i=1}^m \in V^m$, $f((v_{P(i)})_{i=1}^m) = g((v_{P(i)})_{i=1}^m) + h((v_{P(i)})_{i=1}^m) = g((v_i)_{i=1}^m) + h((v_i)_{i=1}^m) = f((v_i)_{i=1}^m)$. Therefore $f$ is symmetric. The closure under scalar multiplication follows similarly.
30.1.14 Theorem: The set $L_m^-(V;U)$ is closed under pointwise addition and scalar multiplication.
Proof: Let $g, h \in L_m^-(V;U)$ and let $f = g + h$ be the pointwise sum of $g$ and $h$. Then $f \in L_m(V;U)$ by Theorem 27.4.2. For permutations $P: \mathbb{N}_m \to \mathbb{N}_m$ and vector sequences $(v_i)_{i=1}^m \in V^m$,
$$\begin{aligned}
f((v_{P(i)})_{i=1}^m) &= g((v_{P(i)})_{i=1}^m) + h((v_{P(i)})_{i=1}^m) \\
&= \operatorname{parity}(P)\,g((v_i)_{i=1}^m) + \operatorname{parity}(P)\,h((v_i)_{i=1}^m) \\
&= \operatorname{parity}(P)\,f((v_i)_{i=1}^m).
\end{aligned}$$
Therefore $f$ is antisymmetric. So $f \in L_m^-(V;U)$. The closure under scalar multiplication follows similarly.
30.1.16 Theorem: Symmetry under all binary swaps implies symmetry under all permutations.
Let $f \in L_m^+(V;U)$ with $m \in \mathbb{Z}_0^+$ for linear spaces $V$ and $U$ over a field $K$. Then for all permutations $P: \mathbb{N}_m \to \mathbb{N}_m$ and $v = (v_i)_{i=1}^m \in V^m$, $f((v_{P(i)})_{i=1}^m) = f((v_i)_{i=1}^m)$. In other words,
$$\forall f \in L_m^+(V;U),\ \forall P \in \operatorname{perm}(\mathbb{N}_m),\ \forall v \in V^m,\quad f(v \circ P) = f(v).$$
Proof: The assertion follows from Definition 30.1.3 and Theorem 14.7.11.
30.1.17 Theorem: Antisymmetry under all binary swaps implies antisymmetry under odd permutations.
Let $f \in L_m^-(V;U)$ for $m \in \mathbb{Z}_0^+$ and linear spaces $V$ and $U$ over a field $K$. Then for all permutations $P: \mathbb{N}_m \to \mathbb{N}_m$ and $v = (v_i)_{i=1}^m \in V^m$, $f((v_{P(i)})_{i=1}^m) = \operatorname{parity}(P)\,f((v_i)_{i=1}^m)$. In other words,
$$\forall f \in L_m^-(V;U),\ \forall P \in \operatorname{perm}(\mathbb{N}_m),\ \forall v \in V^m,\quad f(v \circ P) = \operatorname{parity}(P)\,f(v).$$
Proof: Let $f \in L_m^-(V;U)$, $P \in \operatorname{perm}(\mathbb{N}_m)$ and $v \in V^m$. All permutations $P: \mathbb{N}_m \to \mathbb{N}_m$ are either odd or even (and not both) by Theorem 14.7.14 (vi). Suppose that $P$ is even. Then by Definition 14.7.13, $P = T_\ell \circ T_{\ell-1} \circ \cdots \circ T_1$ for some sequence $(T_i)_{i=1}^\ell$ of transpositions of $\mathbb{N}_m$, for some even integer $\ell \in \mathbb{Z}_0^+$. Thus $v \circ P = v \circ \operatorname{swap}_{j_\ell,k_\ell} \circ \operatorname{swap}_{j_{\ell-1},k_{\ell-1}} \circ \cdots \circ \operatorname{swap}_{j_1,k_1}$ for some integer-pairs $(j_i,k_i)_{i=1}^\ell \in (\mathbb{N}_m \times \mathbb{N}_m)^\ell$ with $j_i \neq k_i$ for all $i \in \mathbb{N}_\ell$. So $f(v \circ P) = f(v) = \operatorname{parity}(P)\,f(v)$ by Definitions 30.1.4 and 14.7.17. Similarly, if $P$ is odd, then $\ell$ is odd. So $f(v \circ P) = -f(v) = \operatorname{parity}(P)\,f(v)$. Thus the assertion is verified.
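The permutation rule $f(v \circ P) = \operatorname{parity}(P)\,f(v)$ can be observed concretely for the $3 \times 3$ determinant, regarded as a trilinear map of its rows. The following Python sketch is an illustration only (not part of the book's text); the sample rows are arbitrary.

```python
from itertools import permutations

# Hedged numeric check of Theorem 30.1.17: the 3x3 determinant, viewed as an
# antisymmetric trilinear map of its rows, satisfies f(v o P) = parity(P) f(v).

def det3(r0, r1, r2):
    (a, b, c), (d, e, f), (g, h, i) = r0, r1, r2
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def parity(P):
    # parity via the inversion count of the permutation tuple
    inv = sum(1 for a in range(len(P)) for b in range(a + 1, len(P)) if P[a] > P[b])
    return -1 if inv % 2 else 1

v = [(2, 0, 1), (1, 3, -1), (0, 5, 4)]
base = det3(*v)
for P in permutations(range(3)):
    permuted = [v[P[k]] for k in range(3)]        # v o P = (v_{P(k)})_k
    assert det3(*permuted) == parity(P) * base
print("f(v o P) = parity(P) f(v) holds for all six permutations")
```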
where $v_k^{i_k} = \beta_B(v_k)_{i_k}$ denotes the $i_k$-th component of $v_k$ with respect to $B$.
$$\forall A \in L_r(V;U),\ \forall v \in V^r,\quad A(v) = \sum_{i \in \mathbb{N}_n^r} \sum_{j=1}^{q} a_i^j\, e_j^U \prod_{k=1}^r \beta_{B_V}(v_k)_{i_k},$$
where $\beta_{B_V}(v_k)_{i_k}$ means the $i_k$-th component of $v_k$ with respect to $B_V$ as in Definition 22.8.8.
30.2.5 Remark: Multilinear map space canonical basis for unmixed component spaces.
Theorem 30.2.4 is based on the mixed-space multilinear map space canonical basis in Definition 27.4.16.
This is specialised to unmixed component spaces in Definition 30.2.6.
30.2.6 Definition: The canonical basis for the multilinear map space $L_r(V;U)$ for $r \in \mathbb{Z}_0^+$, corresponding to bases $B_V = (e_i^V)_{i=1}^n$ and $B_U = (e_j^U)_{j=1}^m$ for the finite-dimensional linear spaces $V$ and $U$ respectively over a field $K$, is the family $(\theta_i^j)_{i \in \mathbb{N}_n^r,\, j \in \mathbb{N}_m}$, where $\theta_i^j \in L_r(V;U)$ is defined for $(i,j) \in \mathbb{N}_n^r \times \mathbb{N}_m$ by
$$\forall v = (v_\alpha)_{\alpha=1}^r \in V^r,\quad \theta_i^j(v) = e_j^U \prod_{\alpha=1}^r \beta_{B_V}(v_\alpha)_{i_\alpha}, \qquad (30.2.1)$$
where $\beta_{B_V}$ and $\beta_{B_U}$ are the component maps for $B_V$ and $B_U$ respectively as in Definition 22.8.7.
30.2.7 Theorem: The multilinear map space canonical basis for unmixed spaces is a basis.
Let $(\theta_i^j)_{i \in \mathbb{N}_n^r,\, j \in \mathbb{N}_m}$ be the multilinear map space canonical basis in Definition 30.2.6 for $L_r(V;U)$ with $r \in \mathbb{Z}_0^+$, corresponding to bases $B_V = (e_i^V)_{i=1}^n \in V^n$ and $B_U = (e_j^U)_{j=1}^m \in U^m$ for finite-dimensional spaces $V$ and $U$ respectively.
(i) $(\theta_i^j)_{i \in \mathbb{N}_n^r,\, j \in \mathbb{N}_m}$ is a basis for $L_r(V;U)$.
(ii) $\forall A \in L_r(V;U),\ A = \sum_{i \in \mathbb{N}_n^r} \sum_{j \in \mathbb{N}_m} \beta_{B_U}(A(e_i^V))_j\, \theta_i^j$. In other words,
$\forall A \in L_r(V;U),\ A = \sum_{i \in \mathbb{N}_n^r} \sum_{j \in \mathbb{N}_m} a_i^j\, \theta_i^j$, where $\forall i \in \mathbb{N}_n^r,\ \forall j \in \mathbb{N}_m,\ a_i^j = \beta_{B_U}(A(e_i^V))_j$.
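The expansion in part (ii) can be verified numerically in a small case. The following Python sketch is an illustration only (not part of the book's text); the bilinear map $A$ on $\mathbb{R}^2$ with values in $\mathbb{R}^2$ is an arbitrary choice, and the standard bases are used for both spaces.

```python
from itertools import product

# Hedged sketch of Theorem 30.2.7 (ii): expand a bilinear map
# A: (R^2)^2 -> R^2 in the canonical basis theta_i^j of Definition 30.2.6
# and verify that the expansion reproduces A.
n, m, r = 2, 2, 2
e = [(1, 0), (0, 1)]                        # standard basis of R^2

def A(v1, v2):                              # an arbitrary bilinear map
    x = 3 * v1[0] * v2[1] - v1[1] * v2[0]
    y = v1[0] * v2[0] + 2 * v1[1] * v2[1]
    return (x, y)

def theta(i, j, v):                         # theta_i^j(v) = e_j * prod_k (v_k)_{i_k}
    coeff = 1
    for k in range(r):
        coeff *= v[k][i[k]]
    return tuple(coeff * e[j][t] for t in range(m))

# coefficients a_i^j = beta_{B_U}(A(e_i))_j, i.e. the j-th component of A(e_i)
a = {(i, j): A(*[e[ik] for ik in i])[j]
     for i in product(range(n), repeat=r) for j in range(m)}

v = ((2, -1), (3, 4))
expansion = [0, 0]
for (i, j), c in a.items():
    t = theta(i, j, v)
    expansion = [expansion[s] + c * t[s] for s in range(m)]
assert tuple(expansion) == A(*v)
print("expansion matches A(v):", A(*v))
```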
where $v_k^{i_k} = \beta_{B_V}(v_k)_{i_k}$ is the $i_k$-th component of $v_k$ with respect to $B_V$, and $\sigma(\ell) = \{\ell \circ P;\ P \in \operatorname{perm}(\mathbb{N}_r)\}$ for all $\ell \in I_r^n$. (See Notation 14.9.3 for $I_r^n = \operatorname{Inc}(\mathbb{N}_r, \mathbb{N}_n)$. See Notation 14.7.18 for $\operatorname{parity}(P)$.)
Proof: Let $i \in \mathbb{N}_n^r \setminus \operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. Then $i_{k_1} = i_{k_2}$ for some $k_1, k_2 \in \mathbb{N}_r$ with $k_1 \neq k_2$. So $\operatorname{swap}_{k_1,k_2}(e_i) = e_i$, and then $A(e_i) = A(\operatorname{swap}_{k_1,k_2}(e_i)) = -A(e_i)$ by Definition 30.1.4. Therefore $A(e_i) = 0$. Thus $A(e_i) = 0$ for all $i \in \mathbb{N}_n^r \setminus \operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n)$. So
$$\sum_{i \in \mathbb{N}_n^r} A\big((e_{i_k})_{k=1}^r\big) \prod_{k=1}^r v_k^{i_k} = \sum_{i \in \operatorname{Inj}(\mathbb{N}_r,\mathbb{N}_n)} A\big((e_{i_k})_{k=1}^r\big) \prod_{k=1}^r v_k^{i_k} = \sum_{\ell \in I_r^n} \sum_{i \in \sigma(\ell)} A\big((e_{i_k})_{k=1}^r\big) \prod_{k=1}^r v_k^{i_k}$$
because $\{\sigma(\ell);\ \ell \in I_r^n\}$ is a partition of $\operatorname{Inj}(\mathbb{N}_r, \mathbb{N}_n)$ by Theorem 14.10.5 (ix). Then line (30.3.1) follows from Theorem 30.2.2.
For line (30.3.2), let $\ell \in I_r^n$. Then $i \in \sigma(\ell)$ if and only if $i = \ell \circ P$ for one and only one $P \in \operatorname{perm}(\mathbb{N}_r)$. (The uniqueness follows from the injectivity of $\ell$.) By Theorem 14.10.5 (vii), there is one and only one element of $\sigma(\ell)$ which is an element of $I_r^n$, and this element is $\ell$. (In other words, $\sigma(\ell) \cap I_r^n = \{\ell\}$.) Therefore
$$\sum_{i \in \sigma(\ell)} A\big((e_{i_k})_{k=1}^r\big) \prod_{k=1}^r v_k^{i_k} = \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} A\big((e_{\ell_{P(k)}})_{k=1}^r\big) \prod_{k=1}^r v_k^{\ell_{P(k)}} = \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P)\, A\big((e_{\ell_k})_{k=1}^r\big) \prod_{k=1}^r v_k^{\ell_{P(k)}}$$
where $\beta_{B_V}(v_k)_{i_{P(k)}}$ means the $i_{P(k)}$-th component of $v_k$ with respect to $B_V$ as in Definition 22.8.8.
Proof: The assertion follows from Theorem 30.3.2 and Definition 22.8.8.
30.3.5 Definition: The canonical basis for the antisymmetric multilinear map space $L_r^-(V;U)$ with $r \in \mathbb{Z}_0^+$, corresponding to finite bases $B_V = (e_i^V)_{i=1}^n$ and $B_U = (e_j^U)_{j=1}^m$ for linear spaces $V$ and $U$ respectively over a field $K$, is the family $(\omega_i^j)_{i \in I_r^n,\, j \in \mathbb{N}_m}$, where $\omega_i^j \in L_r^-(V;U)$ is defined for $(i,j) \in I_r^n \times \mathbb{N}_m$ by
$$\forall v = (v_\alpha)_{\alpha=1}^r \in V^r,\quad \omega_i^j(v) = e_j^U \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \prod_{\alpha=1}^r \beta_{B_V}(v_\alpha)_{i_{P(\alpha)}}, \qquad (30.3.3)$$
where $\beta_{B_V}$ and $\beta_{B_U}$ are the component maps for $B_V$ and $B_U$ respectively as in Definition 22.8.7.
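Formula (30.3.3) can be evaluated directly in a small case. The following Python sketch is an illustration only (not part of the book's text); it builds the scalar-valued basis element for $L_2^-(\mathbb{R}^3;\mathbb{R})$ with an arbitrary increasing index pair and checks its antisymmetry in the arguments.

```python
from itertools import permutations

# Hedged illustration of formula (30.3.3): build the antisymmetric canonical
# basis element omega_i for L_2^-(R^3; R) and check antisymmetry.

def parity(P):
    inv = sum(1 for a in range(len(P)) for b in range(a + 1, len(P)) if P[a] > P[b])
    return -1 if inv % 2 else 1

def omega(i, v):
    # omega_i(v) = sum_P parity(P) * prod_k (v_k)_{i_{P(k)}}
    total = 0
    for P in permutations(range(len(i))):
        term = parity(P)
        for k in range(len(i)):
            term *= v[k][i[P[k]]]
        total += term
    return total

i = (0, 2)                                   # an increasing index pair in I_2^3
v1, v2 = (1, 4, -2), (3, 0, 5)
assert omega(i, (v1, v2)) == -omega(i, (v2, v1))   # antisymmetry in the arguments
assert omega(i, (v1, v1)) == 0                     # a repeated argument gives zero
print(omega(i, (v1, v2)))  # prints 11
```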
30.3.7 Theorem: The antisymmetric multilinear map space canonical basis is a basis.
Let $(\omega_i^j)_{i \in I_r^n,\, j \in \mathbb{N}_m}$ be the antisymmetric multilinear map space canonical basis in Definition 30.3.5 for $L_r^-(V;U)$ with $r \in \mathbb{Z}_0^+$, corresponding to bases $B_V = (e_i^V)_{i=1}^n \in V^n$ and $B_U = (e_j^U)_{j=1}^m \in U^m$ for finite-dimensional spaces $V$ and $U$ respectively over the same field $K$.
(i) $\forall i \in I_r^n,\ \forall j \in \mathbb{N}_m,\ \omega_i^j \in L_r^-(V;U)$.
(ii) $\forall A \in L_r^-(V;U),\ A = \sum_{i \in I_r^n} \sum_{j \in \mathbb{N}_m} \beta_{B_U}(A(e_i^V))_j\, \omega_i^j$. In other words,
$\forall A \in L_r^-(V;U),\ A = \sum_{i \in I_r^n} \sum_{j \in \mathbb{N}_m} a_i^j\, \omega_i^j$, where $\forall i \in I_r^n,\ \forall j \in \mathbb{N}_m,\ a_i^j = \beta_{B_U}(A(e_i^V))_j$.
(iii) $\{\omega_i^j;\ i \in I_r^n,\ j \in \mathbb{N}_m\}$ spans $L_r^-(V;U)$.
(iv) The vectors $\omega_i^j$ for $(i,j) \in I_r^n \times \mathbb{N}_m$ are linearly independent.
(v) $(\omega_i^j)_{i \in I_r^n,\, j \in \mathbb{N}_m}$ is a basis for $L_r^-(V;U)$.
(vi) $\dim(L_r^-(V;U)) = \#(I_r^n \times \mathbb{N}_m) = m\,C_r^n$.
Proof: For part (i), let $i \in I_r^n$ and $j \in \mathbb{N}_m$. Let $P \in \operatorname{perm}(\mathbb{N}_r)$. Let $\alpha \in \mathbb{N}_r$. Then the map $g_{\alpha,P}: V \to K$ defined by $g_{\alpha,P}: v \mapsto \beta_{B_V}(v)_{i_{P(\alpha)}}$ is a linear functional in $V^* = \operatorname{Lin}(V,K) = L_1(V,K)$ by Theorem 23.4.8. Define $f_{i,P}: V^r \to K$ by $f_{i,P}(v) = \prod_{\alpha=1}^r \beta_{B_V}(v_\alpha)_{i_{P(\alpha)}} = \prod_{\alpha=1}^r g_{\alpha,P}(v_\alpha)$ for $v \in V^r$. Then $f_{i,P} \in L_r(V;K)$. (See Remark 27.2.3 and Definition 27.2.4.) So the map $h_{i,j,P}: v \mapsto e_j^U\, \operatorname{parity}(P) \prod_{\alpha=1}^r \beta_{B_V}(v_\alpha)_{i_{P(\alpha)}}$ satisfies $h_{i,j,P} \in L_r(V;U)$ for all $(i,j) \in I_r^n \times \mathbb{N}_m$ and $P \in \operatorname{perm}(\mathbb{N}_r)$ by Theorem 27.1.33 and the closure of $L_r(V;U)$ under scalar multiplication. Then $\omega_i^j = \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} h_{i,j,P}$ satisfies $\omega_i^j \in L_r(V;U)$ by the closure of $L_r(V;U)$ under vector addition. To show the antisymmetry of $\omega_i^j$, let $Q \in \operatorname{perm}(\mathbb{N}_r)$. Then
$$\begin{aligned}
\forall v \in V^r,\quad \omega_i^j(v \circ Q) &= e_j^U \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \prod_{\alpha=1}^r \beta_{B_V}(v_{Q(\alpha)})_{i_{P(\alpha)}} \\
&= e_j^U \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \prod_{\alpha=1}^r \beta_{B_V}(v_\alpha)_{i_{P(Q^{-1}(\alpha))}} \qquad (30.3.4) \\
&= e_j^U \sum_{P' \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P' \circ Q) \prod_{\alpha=1}^r \beta_{B_V}(v_\alpha)_{i_{P'(\alpha)}} \qquad (30.3.5) \\
&= e_j^U \sum_{P' \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P')\operatorname{parity}(Q) \prod_{\alpha=1}^r \beta_{B_V}(v_\alpha)_{i_{P'(\alpha)}} \qquad (30.3.6) \\
&= \operatorname{parity}(Q)\, \omega_i^j(v),
\end{aligned}$$
$\forall i' \in \mathbb{N}_n^r$:
$$\begin{aligned}
0 &= \sum_{i \in I_r^n} \sum_{j=1}^m k_i^j\, \omega_i^j(e_{i'}) \\
&= \sum_{i \in I_r^n} \sum_{j=1}^m k_i^j\, e_j^U \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \prod_{\alpha=1}^r \beta_{B_V}(e_{i'_\alpha})_{i_{P(\alpha)}} \\
&= \sum_{j=1}^m e_j^U \sum_{i \in I_r^n} k_i^j \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \prod_{\alpha=1}^r \beta_{B_V}(e_{i'_\alpha})_{i_{P(\alpha)}} \\
&= \sum_{j=1}^m e_j^U \sum_{i \in I_r^n} k_i^j \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \prod_{\alpha=1}^r \delta^{i_{P(\alpha)}}_{i'_\alpha} \qquad (30.3.7) \\
&= \sum_{j=1}^m e_j^U \sum_{i \in I_r^n} k_i^j \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P)\, \delta^{i \circ P}_{i'} \\
&= \sum_{j=1}^m e_j^U \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \sum_{i \in I_r^n} k_i^j\, \delta^{i \circ P}_{i'} \\
&= \sum_{j=1}^m e_j^U \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P) \sum_{i \in I_r^n} k_i^j\, \delta^{i}_{i' \circ P} \qquad (30.3.8) \\
&= \sum_{j=1}^m e_j^U \sum_{P \in \operatorname{perm}(\mathbb{N}_r)} \operatorname{parity}(P)\, k^j_{i' \circ P}\, \big(\, i' \circ P \in I_r^n \,\big) \qquad (30.3.9) \\
&= \sum_{j=1}^m e_j^U\, \operatorname{parity}(P_{i'})\, k^j_{i' \circ P_{i'}} \qquad (30.3.10) \\
&= \operatorname{parity}(P_{i'}) \sum_{j=1}^m e_j^U\, k^j_{i' \circ P_{i'}}, \qquad (30.3.11)
\end{aligned}$$
where line (30.3.7) follows from Theorem 22.8.9 (ii), and line (30.3.8) substitutes $P = (P)^{-1}$, where the equality $\operatorname{parity}(P^{-1}) = \operatorname{parity}(P)$ follows from Theorem 14.7.14 (i). (See Notation 14.6.21 for the pseudo-notation $(\, i' \circ P \in I_r^n \,)$ on line (30.3.9). See Definition 14.10.3 for the standard sorting-permutation $P_{i'}$ for $i'$ on line (30.3.10), where $i' \circ P_{i'} \in I_r^n$ is the sorted rearrangement of $i'$.)
Now let $i' \in I_r^n$. Then $P_{i'} = \operatorname{id}_{\mathbb{N}_r}$ by Theorem 14.10.5 (vii). So $i' \circ P_{i'} = i'$ and $\operatorname{parity}(P_{i'}) = 1$. So $\sum_{j=1}^m e_j^U\, k_{i'}^j = 0$. Therefore $k_{i'}^j = 0$ for all $(i', j) \in I_r^n \times \mathbb{N}_m$ by Theorem 22.6.8 (ii) because $(e_j^U)_{j=1}^m$ is a linearly independent vector-family in $U$. Thus the vectors $\omega_i^j$ for $(i,j) \in I_r^n \times \mathbb{N}_m$ are linearly independent.
Part (v) follows from parts (iii) and (iv).
Part (vi) follows from part (v) and Theorem 14.9.5 (i).
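The expansion in Theorem 30.3.7 (ii) can be checked in the smallest nontrivial case. The following Python sketch is an illustration only (not part of the book's text): for $r = n = 2$ and $U = \mathbb{R}$, the index set $I_2^2$ has the single element $(0,1)$ (with zero-based indices), and the $2 \times 2$ determinant is expanded in the antisymmetric canonical basis.

```python
# Hedged check of Theorem 30.3.7 (ii) for r = n = 2, U = R: the 2x2
# determinant lies in L_2^-(R^2; R), and its expansion in the antisymmetric
# canonical basis has the single coefficient a = A(e_0, e_1).

def A(v1, v2):                 # the determinant, an antisymmetric bilinear map
    return v1[0] * v2[1] - v1[1] * v2[0]

def omega_01(v1, v2):          # the basis element omega_{(0,1)} per formula (30.3.3)
    return v1[0] * v2[1] - v1[1] * v2[0]

e = [(1, 0), (0, 1)]
a = A(e[0], e[1])              # coefficient a = beta(A(e_{(0,1)})) = 1
for v1 in [(2, 3), (-1, 4)]:
    for v2 in [(0, 5), (7, 1)]:
        assert A(v1, v2) == a * omega_01(v1, v2)
print("determinant = a * omega_{(0,1)} with a =", a)
```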
only if $\exists P \in \operatorname{perm}(\mathbb{N}_m),\ I = J \circ P$. A unique representative may be chosen from each equivalence class by sorting into increasing order. There is one and only one increasing index sequence in each equivalence class, and there is one and only one equivalence class for each increasing index sequence. It follows that the number of equivalence classes equals the number of increasing index sequences in $I_m^n$, which is equal to $C_m^n$ by Theorem 14.9.5 (i).
For part (iii), the symmetry rule implies that $a_I = a_J$ whenever $I = J \circ P$ for some $P \in \operatorname{perm}(\mathbb{N}_m)$. Equivalent index sequences may be partitioned into equivalence classes as in the antisymmetric case, but coefficients with repeated indices are not set to zero. A unique representative for each equivalence class may be obtained by sorting into non-increasing order. Since $a_I = a_J$ if and only if the sorted index sequences $I$ and $J$ are equal, it follows that the number of independent coefficients is equal to the cardinality of the set $J_m^n$ of non-increasing maps from $\mathbb{N}_m$ to $\mathbb{N}_n$. By Theorem 14.9.5 (ii), this equals $C_m^{n+m-1}$.
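Both counts can be confirmed by brute-force enumeration. The following Python sketch is an illustration only (not part of the book's text); the values of $n$ and $m$ are arbitrary.

```python
from math import comb
from itertools import product

# Hedged numeric check of the two counts above: increasing index sequences
# number C(n, m), while non-increasing maps N_m -> N_n number C(n+m-1, m).
n, m = 5, 3
seqs = list(product(range(1, n + 1), repeat=m))
increasing = [s for s in seqs if all(s[k] < s[k + 1] for k in range(m - 1))]
non_increasing = [s for s in seqs if all(s[k] >= s[k + 1] for k in range(m - 1))]
assert len(increasing) == comb(n, m)                 # 10
assert len(non_increasing) == comb(n + m - 1, m)     # 35
print(len(increasing), len(non_increasing))
```

The second count is the familiar "multisets of size $m$ from $n$ symbols" formula, since a non-increasing sequence is exactly a sorted multiset.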
30.4.2 Notation: $\Lambda_m(V;W)$, for linear spaces $V$ and $W$ over the same field, where $m \in \mathbb{Z}_0^+$, denotes the set $L_m^-(V;W)$ together with its pointwise vector addition and scalar multiplication operations.
30.4.6 Remark: The inconvenient structure of the duals of antisymmetric multilinear function spaces.
The significance of the falsity of the inclusion $\Lambda_m(V)^* \subseteq L_m(V)^*$ in general becomes clearer if one considers that a knowledge of the effect on a particle of all possible antisymmetric multilinear functions is not sufficient to determine the effect of all general multilinear functions. The antisymmetric multilinear fields constitute a subspace of the general multilinear fields. Extending the effect from a subspace to the full space is non-unique. In other words, although $\Lambda_m(V) \subseteq L_m(V)$, the linear functionals in the subspace dual $\Lambda_m(V)^*$ need to be extended in some way in order to compare them with linear functionals in the full-space dual $L_m(V)^*$.
Let $K$ be the field of $V$. Then the elements of $L_m(V)^*$ are linear functionals of the form $\phi: L_m(V) \to K$. In general, $\Lambda_m(V)$ is a linear subspace of $L_m(V)$, but $\Lambda_m(V) \neq L_m(V)$ when $m \geq 2$ and $\dim(V) \geq 1$. Then $\Lambda_m(V)$ is a proper linear subspace of $L_m(V)$. The elements of $\Lambda_m(V)^*$ are linear functionals of the form $\phi: \Lambda_m(V) \to K$. So the dual space $\Lambda_m(V)^*$ cannot be considered to be the space of antisymmetric contravariant tensors of degree $m$. In other words, $\Lambda_m(V)^*$ is not the subspace of tensors in $L_m(V)^*$ which happen to be antisymmetric. However, the space $\Lambda_m(V)$ can be considered to be the space of antisymmetric covariant tensors of degree $m$ because $\Lambda_m(V) \subseteq L_m(V)$.
The issue here is a special case of the fact that the dual of a proper linear subspace is not a subspace of the
dual of the full linear space. In short, the dual of a subspace is not a subspace of the dual. (This more general
issue is discussed in Remark 23.6.11. The issue is partially resolved in Section 24.3 by using equivalence
classes of extensions of linear functionals instead of trying to construct unique extensions.)
30.4.9 Notation: $\bigwedge^m V$, for $m \in \mathbb{Z}_0^+$ and a linear space $V$, denotes the alternating tensor product space of degree $m$ on $V$. In other words, $\bigwedge^m V = \Lambda_m(V)^*$.
30.4.10 Remark: Notational conventions for covariant and contravariant alternating tensors.
The alert reader will have noticed that the wedge symbol $\wedge$ for alternating tensors in Notation 30.4.9 is not quite the same as the lambda symbol $\Lambda$ for multilinear functions. This is intended as a reminder of the imperfect duality which is mentioned in Remarks 30.4.5 and 30.4.6. Definition 30.4.8 and Notation 30.4.9 imply that for $m \geq 2$ and $\dim(V) \geq 1$,
$$\textstyle\bigwedge^m V = \Lambda_m(V)^*,\qquad \bigotimes^m V = L_m(V)^*,\qquad \bigwedge^m V \not\subseteq \bigotimes^m V.$$
Individual tensors in $\bigwedge^m V$ are restrictions of individual tensors in $\bigotimes^m V$, but the alternating tensor space $\bigwedge^m V$ is not a subset of the tensor space $\bigotimes^m V$. The relation between $\bigwedge^m V$ and $\bigotimes^m V$ is clarified in Remark 30.4.19.
Figure 30.4.1 Linear and antisymmetric multilinear duals of a linear space
30.4.13 Remark: Survey of notations for general and antisymmetric tensor spaces.
Table 30.4.2 summarises the general and antisymmetric tensor space notations of a selection of authors.
There is clearly much agreement, but there is also significant diversity. However, there is fairly broad
agreement that the contravariant degrees should be superscripts and covariant degrees should be subscripts.
year  author              general                                   antisymmetric
1997  Lee [24]            $T_m(V)$, $T^m(V)$                        $\Lambda_m(V)$, $\Lambda^m(V)$
1999  Lang [23]           $L^m(V,\mathbb{R})$, $\bigotimes^m V$     $L^m_a(V,\mathbb{R})$, $\bigwedge^m V$
2004  Bump [56]           $\bigotimes^m V$                          $\bigwedge^m V$
2004  Szekeres [266]      $\bigotimes^m(V)$                         $\Lambda^m(V)$, $\Lambda_m(V)$
2012  Sternberg [37]      $T_0^m(V)$, $T_m^0(V)$                    $\bigwedge^m V$
      Kennington          $\bigotimes^m V$, $L_m(V;\mathbb{R})$     $\bigwedge^m V$, $\Lambda_m(V;\mathbb{R})$, $\Lambda_m V$
Table 30.4.2 Survey of general and antisymmetric tensor space notations
The antisymmetry arises from the area elements which are approximations for closed curves. Since the
curvature is defined in terms of parallel transport around curves, and parallel transport is defined as the
integral of a connection form around the curve, it is inevitable (by the Stokes theorem) that the resulting
automorphism will converge to the application of the exterior derivative of the connection form to the area
form in the limit of small curves. Thus the curvature form is the exterior derivative of a connection form.
(The connection form is an element of Lin(Tp (M ), Aut(Tp (M ))).)
The space $\operatorname{Lin}(\bigwedge^2 T_p(M), \operatorname{Aut}(T_p(M)))$ is almost identical to $\Lambda_2(T_p(M); \operatorname{Aut}(T_p(M)))$, namely the space of antisymmetric bilinear maps from $T_p(M)^2$ to $\operatorname{Aut}(T_p(M))$. This is slightly less intuitive because it is a space of maps from vector-pairs to the automorphism group. Each vector-pair is associated with a parallelogram bounded by the vectors on two sides, and this parallelogram is associated with a unique, well-defined area form in $\bigwedge^2 T_p(M)$. However, surfaces may be partitioned in many ways, not necessarily approximating the surface by parallelograms. So although an antisymmetric bilinear function of vector-pairs does correspond mathematically to the space of area forms, it is not so immediately intuitive.
Since the set $\operatorname{Aut}(T_p(M))$ of tangent space automorphisms is a subset of $\operatorname{Lin}(T_p(M), T_p(M))$, the set of all tangent space endomorphisms, the spaces $\operatorname{Lin}(\bigwedge^2 T_p(M), \operatorname{Aut}(T_p(M)))$ and $\Lambda_2(T_p(M); \operatorname{Aut}(T_p(M)))$ are
Figure 30.4.3 Maps and spaces for subspace-quotient isomorphism for multilinear map space
30.4.16 Theorem: Let $V$ and $W$ be finite-dimensional linear spaces over the same field, and $m \in \mathbb{Z}_0^+$. Let $L_m^-(V;W)^\perp$ denote the linear subspace $\{\phi \in L_m(V;W)^*;\ \forall f \in L_m^-(V;W),\ \phi(f) = 0\}$ of $L_m(V;W)^*$. Define the map $\mu: L_m^-(V;W)^* \to L_m(V;W)^*/L_m^-(V;W)^\perp$ by
$$\forall \phi_0 \in L_m^-(V;W)^*,\quad \mu(\phi_0) = \{\phi \in L_m(V;W)^*;\ \forall f \in L_m^-(V;W),\ \phi(f) = \phi_0(f)\}.$$
Then $\mu$ is a linear space isomorphism.
Proof: The result follows by substituting $L_m(V;W)$ for $V$ and $L_m^-(V;W)$ for $U$ in Theorem 24.3.4.
30.4.17 Remark: Natural immersion for antisymmetric multilinear functions.
Theorem 30.4.18 specialises Theorem 30.4.16 to the case that the linear space W is the field K regarded as
a linear space over itself.
30.4.18 Theorem: Let $V$ be a finite-dimensional linear space over a field $K$, and $m \in \mathbb{Z}_0^+$. Let $L_m^-(V;K)^\perp$ denote the linear subspace $\{\phi \in L_m(V;K)^*;\ \forall f \in L_m^-(V;K),\ \phi(f) = 0\}$ of $L_m(V;K)^*$. Define the map $\mu: L_m^-(V;K)^* \to L_m(V;K)^*/L_m^-(V;K)^\perp$ by
$$\forall \phi_0 \in L_m^-(V;K)^*,\quad \mu(\phi_0) = \{\phi \in L_m(V;K)^*;\ \forall f \in L_m^-(V;K),\ \phi(f) = \phi_0(f)\}.$$
Then $\mu$ is a linear space isomorphism.
Proof: The result follows by substitution of the field $K$ for the linear space $W$ in Theorem 30.4.16.
Figure 30.4.4 Maps and spaces for subspace-quotient isomorphism for multilinear functions
This has the clear geometric interpretation that $\mu(\phi_0)$ is the set of all tensors in $\bigotimes^m V$ which have the same antisymmetric $m$-linear effect as $\phi_0$. Hence contravariant antisymmetric tensors may be thought of as equivalence classes of contravariant tensors which have the same antisymmetric $m$-linear effect as each other. Thus $\bigwedge^m V$ is not a subset of $\bigotimes^m V$, but it is isomorphic to a quotient of $\bigotimes^m V$. This and some other natural isomorphisms between antisymmetric tensor spaces are illustrated in Figure 30.4.5.
Figure 30.4.5 Duals and isomorphisms for antisymmetric tensors
30.4.20 Definition: An m-covector for a linear space $V$ and $m \in \mathbb{Z}_0^+$ is any element of $\Lambda_m V$.
30.4.21 Definition: An m-vector for a linear space $V$ and $m \in \mathbb{Z}_0^+$ is any element of $\bigwedge^m V$.
30.4.22 Definition: A simple m-vector in an alternating tensor product $\bigwedge^m V$ for a linear space $V$ and $m \in \mathbb{Z}_0^+$ is any $f \in \bigwedge^m V$ of the form $f: \phi \mapsto \phi(v)$ for some $v \in V^m$.
30.4.23 Notation: $\bigwedge_{i=1}^m v_i$, for a sequence $v \in V^m$, where $V$ is a linear space and $m \in \mathbb{Z}_0^+$, denotes the simple m-vector $f \in \bigwedge^m V$ defined by $f(\phi) = \phi(v)$.
$v_1 \wedge v_2 \wedge \ldots \wedge v_m$ is an alternative notation for $\bigwedge_{i=1}^m v_i$.
30.4.24 Theorem: Let $V$ be a linear space and $m \in \mathbb{Z}_0^+$. Then $\dim(\bigwedge^m V) = \dim(\Lambda_m V) = C_m^{\dim(V)}$.
Proof: It follows from Theorem 30.3.7 (vi) and Notation 30.4.3 that $\dim(\Lambda_m V) = C_m^n$ for $n = \dim(V)$. The dimensionality of $\bigwedge^m V$ then follows from Theorem 23.7.11.
30.4.25 Remark: The area spanned by a vector-pair is an antisymmetric bilinear effect.
In the same way that a simple general tensor may be thought of as the multilinear effect on a sequence of vectors at a point in a multilinear field, a simple antisymmetric tensor may be thought of as the antisymmetric multilinear effect on the sequence of vectors.
The area spanned by a pair of vectors $v_1$ and $v_2$ is a kind of antisymmetric bilinear effect of the pair of vectors. The word "area" here means directed area, not just the amount of area. This directed area is denoted $v_1 \wedge v_2$. (The symbol $\wedge$ is pronounced "wedge".) Since this area has a direction, reversing one of the vectors changes the direction to the opposite: $v_1 \wedge (-v_2) = -(v_1 \wedge v_2) = (-v_1) \wedge v_2$. When the order of multiplication is swapped, the resulting area has the opposite direction. So $v_2 \wedge v_1 = -(v_1 \wedge v_2)$.
As illustrated in Figure 30.4.6, the antisymmetric bilinear effect of the pair of vectors $v_1$ and $v_1 + v_2$ is the same as for the pair $v_1$ and $v_2$. This is because $v_1 \wedge (v_1 + v_2) = v_1 \wedge v_1 + v_1 \wedge v_2$ by linearity with respect to the second factor, and $v_1 \wedge v_1 = -(v_1 \wedge v_1)$ by antisymmetry. So $v_1 \wedge v_1 = 0$. Hence $v_1 \wedge (v_1 + v_2) = v_1 \wedge v_2$. Similarly,
$$0.5(v_1 - v_2) \wedge (v_1 + v_2) = \big(v_1 - 0.5(v_1 + v_2)\big) \wedge (v_1 + v_2) = v_1 \wedge (v_1 + v_2) - 0.5(v_1 + v_2) \wedge (v_1 + v_2) = v_1 \wedge (v_1 + v_2).$$
$$v_1 \wedge v_2 = v_1 \wedge (v_1 + v_2) = 0.5(v_1 - v_2) \wedge (v_1 + v_2) = (2v_1) \wedge (0.5v_2)$$
Figure 30.4.6 Equivalent antisymmetric multilinear effect of vector pairs
Antisymmetric tensors are used for integration of functions on curves, surfaces and volumes. The line, area
and volume elements for (standard) integration are antisymmetric multilinear effects of vectors.
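The directed-area identities above can be verified numerically in the plane, where $v \wedge w$ is represented by the scalar cross product. The following Python sketch is an illustration only (not part of the book's text); the vectors are arbitrary.

```python
# Hedged illustration of the directed-area identities of Remark 30.4.25,
# using the scalar cross product as the representation of v ^ w in R^2.

def wedge(v, w):                     # v ^ w for v, w in R^2 (signed area)
    return v[0] * w[1] - v[1] * w[0]

def add(v, w):   return (v[0] + w[0], v[1] + w[1])
def sub(v, w):   return (v[0] - w[0], v[1] - w[1])
def scale(c, v): return (c * v[0], c * v[1])

v1, v2 = (3.0, 1.0), (1.0, 2.0)
area = wedge(v1, v2)
assert wedge(v2, v1) == -area                              # swap reverses direction
assert wedge(v1, v1) == 0                                  # degenerate parallelogram
assert wedge(v1, add(v1, v2)) == area                      # shear invariance
assert wedge(scale(0.5, sub(v1, v2)), add(v1, v2)) == area
assert wedge(scale(2, v1), scale(0.5, v2)) == area         # as in Figure 30.4.6
print(area)  # prints 5.0
```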
30.4.26 Remark: Interpretation of antisymmetric tensors in the context of integration on manifolds.
The principal motivation for antisymmetric tensors is the theory of integration on manifolds embedded within
flat spaces or manifolds, especially when the codimension is greater than zero.
In applications to the pure mathematical study of the global topological implications of the curvature of manifolds, it is customary to assume that all manifolds have $C^\infty$ regularity. Such topological differential
geometry does not require the sophisticated analytical investigation of integration on regions with extremely
discontinuous boundaries which is the subject of geometric measure theory. Even the sophistication of
Lebesgue measure and integration is surplus to requirements for such topological geometry.
In applications of antisymmetric tensor spaces to integration on manifolds, regions of space, or patches
of surfaces, are approximated (in one way or another) by very simple regions such as parallelograms or
parallelepipeds. The edges of such simple regions are considered to be vectors, and the area or volume may
be modelled as an antisymmetric tensor product of the edge vectors. Such local approximation of regions
and boundaries is very straightforward for smooth manifolds, but requires progressively more sophistication
as the smoothness parameter is lowered.
The use of antisymmetric tensors for integration on manifolds requires a metric tensor field. This requirement
is often hidden by bundling the metric tensor into the integrand, or by implicitly assuming that the metric is
Euclidean. The interpretation of antisymmetric tensors as volume, area or line elements requires, in principle,
multiplication of such tensors by the square root of the determinant of the projection of the metric tensor
onto the hyperplane of the elements.
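The final point can be made concrete for a surface patch in $\mathbb{R}^3$ with the Euclidean metric: the area element for a parallelogram spanned by tangent vectors is the square root of the determinant of the $2 \times 2$ Gram matrix, which is the metric tensor projected onto the plane of the patch. The following Python sketch is an illustration only (not part of the book's text).

```python
import math

# Hedged sketch: the area of the parallelogram spanned by tangent vectors
# u, v in R^3 (Euclidean metric) is sqrt(det(G)), where G is the 2x2 Gram
# matrix [[u.u, u.v], [u.v, v.v]] of the metric projected onto the patch.

def gram_area(u, v):
    guu = sum(a * b for a, b in zip(u, u))
    gvv = sum(a * b for a, b in zip(v, v))
    guv = sum(a * b for a, b in zip(u, v))
    return math.sqrt(guu * gvv - guv * guv)   # sqrt of det of the Gram matrix

# For vectors lying in the xy-plane this reduces to the usual |u x v| area.
u, v = (3.0, 0.0, 0.0), (1.0, 2.0, 0.0)
print(gram_area(u, v))  # prints 6.0
```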
30.5.2 Remark: Classification of bilinear functions as positive and negative definite and semi-definite.
For the real number field, bilinear functions may be usefully classified according to how positive or negative
they are, and how definite or non-degenerate they are. In the case of matrices, these properties are discussed
in Sections 25.11 and 25.13. Since these properties are invariant under changes of basis, the matrix concepts
may be meaningfully applied to bilinear functions.
Although Definition 30.5.3 is written in terms of general spaces of bilinear functions $L_2(V;\mathbb{R})$, the definition is mostly applied to the symmetric bilinear function spaces $L_2^+(V;\mathbb{R})$. A particular application of this definition is to the metric tensor for a Riemannian manifold, which is defined to be a positive definite
definition is to the metric tensor for a Riemannian manifold, which is defined to be a positive definite
bilinear function on the tangent space at each point of the manifold. (See Section 67.2.)
(iv) negative definite if $\forall v \in V \setminus \{0_V\},\ f(v,v) < 0$.
30.5.4 Remark: Alternative terminology for positive and negative semidefinite bilinear functions.
Lang [104], pages 583 and 597, uses the term "semipositive" to mean positive semi-definite.
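For symmetric bilinear functions $f(v,w) = v^{\mathrm{T}} M w$ on $\mathbb{R}^2$, the classification can be read off from the trace and determinant of $M$, since the eigenvalues of a symmetric $2 \times 2$ matrix are real with product $\det M$ and sum $\operatorname{tr} M$. The following Python sketch is an illustration only (not part of the book's text); the label "indefinite" for the remaining case is standard terminology rather than the book's.

```python
# Hedged sketch: classify a symmetric 2x2 matrix M (i.e. the bilinear
# function f(v, w) = v^T M w) via trace and determinant.

def classify(M):
    (a, b), (c, d) = M
    assert b == c, "symmetric bilinear function expected"
    det, tr = a * d - b * c, a + d
    if det > 0:
        # both eigenvalues share the sign of the trace
        return "positive definite" if tr > 0 else "negative definite"
    if det == 0:
        # one eigenvalue is zero; the other equals the trace
        return "positive semi-definite" if tr >= 0 else "negative semi-definite"
    return "indefinite"                     # eigenvalues of opposite signs

print(classify([[2, 0], [0, 3]]))    # prints positive definite
print(classify([[-1, 0], [0, -4]]))  # prints negative definite
print(classify([[1, 2], [2, 1]]))    # prints indefinite
```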
30.6.2 Remark: Tensor bundles are based on tangent vector bundles and tangent covector bundles.
Tangent vector bundles on Cartesian spaces are presented in Section 26.12. Tangent covector bundles on
Cartesian spaces are presented in Section 26.15. Section 30.6 applies the definitions of tensors to construct
the corresponding tensor bundles on Cartesian spaces.
Tensor bundles are not fibre bundles because they do not have an explicit structure group. (See Defini-
tion 21.5.2 for non-topological ordinary fibre bundles.)
30.6.3 Remark: Notations and identifications for tensor spaces on Cartesian spaces.
When the linear space $V$ in Definition 29.5.4 and Notation 29.5.5 for mixed tensor spaces is the tangent line space $T_p(\mathbb{R}^n)$ at a point $p$ in a Cartesian space $\mathbb{R}^n$ for $n \in \mathbb{Z}_0^+$, as in Section 26.12, the dual space is then denoted $T_p^*(\mathbb{R}^n)$, as in Section 26.15. Using the mixed tensor space notation $V^{rKs}$ in Notation 29.5.5, one may identify $T_p(\mathbb{R}^n)^{1K0}$ with $T_p(\mathbb{R}^n)$, and $T_p(\mathbb{R}^n)^{0K1}$ with $T_p^*(\mathbb{R}^n)$. Conversely, one may write $T_p(\mathbb{R}^n)^{rKs}$ as $T_p^{rKs}(\mathbb{R}^n)$, and $T_p(\mathbb{R}^n)^{r,s}$ as $T_p^{r,s}(\mathbb{R}^n)$. This is done in Notations 30.6.6 and 30.6.7.
30.6.4 Definition: The (multilinear-style) tensor space of type $(r,s)$ at a point $p$ in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$, for $p \in \mathbb{R}^n$ and $r,s \in \mathbb{Z}_0^+$, is the tensor space $T_p(\mathbb{R}^n)^{r,s}$ of the tangent line space $T_p(\mathbb{R}^n)$.
30.6.5 Definition: The (linear-style) tensor space of type $(r,s)$ at a point $p$ in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$, for $p \in \mathbb{R}^n$ and $r,s \in \mathbb{Z}_0^+$, is the tensor space $T_p(\mathbb{R}^n)^{rKs}$ of the tangent line space $T_p(\mathbb{R}^n)$.
30.6.6 Notation: $T_p^{r,s}(\mathbb{R}^n)$, for $p \in \mathbb{R}^n$, $r,s \in \mathbb{Z}_0^+$ and $n \in \mathbb{Z}_0^+$, denotes the tensor space $T_p(\mathbb{R}^n)^{r,s}$.
30.6.9 Definition: A (multilinear-style) tensor of type $(r,s)$ at a point $p$ in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$, for $p \in \mathbb{R}^n$ and $r,s \in \mathbb{Z}_0^+$, is an element of the multilinear-style tensor space of type $(r,s)$ at $p$.
30.6.10 Definition: A (linear-style) tensor of type $(r,s)$ at a point $p$ in a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$, for $p \in \mathbb{R}^n$ and $r,s \in \mathbb{Z}_0^+$, is an element of the linear-style tensor space of type $(r,s)$ at $p$.
$\Lambda_s(T(\mathbb{R}^n))$, for $s \in \mathbb{Z}_0^+$ and $n \in \mathbb{Z}_0^+$, denotes the set $\bigcup_{p \in \mathbb{R}^n} \Lambda_s(T_p(\mathbb{R}^n))$.
$\Lambda_s(T(\mathbb{R}^n), W)$, for $s \in \mathbb{Z}_0^+$, $n \in \mathbb{Z}_0^+$, and a real linear space $W$, denotes the set $\bigcup_{p \in \mathbb{R}^n} \Lambda_s(T_p(\mathbb{R}^n), W)$.
$L_s(T(\mathbb{R}^n), W)$, for $s \in \mathbb{Z}_0^+$, $n \in \mathbb{Z}_0^+$, and a real linear space $W$, denotes the set $\bigcup_{p \in \mathbb{R}^n} L_s(T_p(\mathbb{R}^n), W)$.
30.6.13 Remark: Sets of cross-sections of tensor bundle total spaces for Cartesian spaces.
For general non-topological fibrations $(E, \pi, B)$, Notation 21.1.14 lets $X(E, \pi, B)$ denote the set of cross-sections $X: B \to E$ such that $\forall b \in B,\ X(b) \in \pi^{-1}(\{b\})$. In the case of tensor bundles of various kinds, the base space and projection map are clear from the context and do not need to be included in the notation. Therefore one generally writes $X(T^{r,s}(T(\mathbb{R}^n)))$ instead of $X(T^{r,s}(T(\mathbb{R}^n)), \pi, \mathbb{R}^n)$, and so forth. This is stated somewhat informally in Notation 30.6.14.
30.6.14 Notation: $X(E)$, for any tensor bundle total space for a Cartesian space $\mathbb{R}^n$ with $n \in \mathbb{Z}_0^+$, denotes the set of cross-sections of $E$. In other words, $X(E) = \{X: \mathbb{R}^n \to E;\ \forall p \in \mathbb{R}^n,\ X(p) \in \pi^{-1}(\{p\})\}$, where $\pi: E \to \mathbb{R}^n$ is the standard projection map for $E$.
Then $X(E\,|\,U)$, for a subset $U$ of $\mathbb{R}^n$, denotes the set $\{X: U \to E;\ \forall p \in U,\ X(p) \in \pi^{-1}(\{p\})\}$.
(These notations apply to the spaces in Notations 26.12.24 and 30.6.12 in particular.)
30.6.15 Remark: Tensors parametrised by their components with respect to the standard basis.
The standard basis $(e_m)_{m=1}^n \in (\mathbb{R}^n)^n$ in Notations 30.6.16 and 30.6.17 is the standard basis of the Cartesian space $\mathbb{R}^n$ which is defined in Definition 22.7.9. Tangent vectors $L_{p,v}$ and tangent covectors $L_{p,w}$ for $p, v, w \in \mathbb{R}^n$ are defined in Notation 26.12.12 and Notation 26.15.8 respectively.
Simple tensors of the form $(\bigotimes_{k=1}^r v_k) \otimes (\bigotimes_{\ell=1}^s \lambda_\ell) \in L((V^*)^r, V^s)$ for $(v_k)_{k=1}^r \in V^r$ and $(\lambda_\ell)_{\ell=1}^s \in (V^*)^s$, where $V$ is a finite-dimensional linear space, are defined in Notation 29.6.9. Simple tensors of the form $\bigotimes_{k=1}^r v_k \otimes \bigotimes_{\ell=1}^s \lambda_\ell \in \operatorname{Lin}(L_r(V), L_s(V))$ for $(v_k)_{k=1}^r \in V^r$ and $(\lambda_\ell)_{\ell=1}^s \in (V^*)^s$ are defined in Notation 29.6.10.
$$L_{p,a}^{r,s} = \sum_{i \in I} \sum_{j \in J} a_i^j\, \Big(\bigotimes_{k=1}^r L_{p,e_{i_k}}\Big) \otimes \Big(\bigotimes_{\ell=1}^s L_{p,e_{j_\ell}}\Big),$$
where $I = \mathbb{N}_n^r$, $J = \mathbb{N}_n^s$, and $(e_m)_{m=1}^n \in (\mathbb{R}^n)^n$ is the standard basis of $\mathbb{R}^n$.
30.6.17 Notation: $L_{p,a}^{rKs}$, for $r,s \in \mathbb{Z}_0^+$, for $p \in \mathbb{R}^n$ and $a: \mathbb{N}_n^r \times \mathbb{N}_n^s \to \mathbb{R}$ for some $n \in \mathbb{Z}_0^+$, denotes
$$L_{p,a}^{rKs} = \sum_{i \in I} \sum_{j \in J} a_i^j\, \Big(\bigotimes_{k=1}^r L_{p,e_{i_k}} \otimes \bigotimes_{\ell=1}^s L_{p,e_{j_\ell}}\Big),$$
where $I = \mathbb{N}_n^r$, $J = \mathbb{N}_n^s$, and $(e_m)_{m=1}^n \in (\mathbb{R}^n)^n$ is the standard basis of $\mathbb{R}^n$.
Part III
topology in terms of the interior or closure instead of open sets.) Table 31.8.1 in Remark 31.8.10 is
a survey of the wide range of notations for these four concepts in the mathematics literature. Numerous
useful properties are given for these concepts, which are frequently used in this book.
(10) Section 31.10 defines limit points of sets (also known as accumulation points or cluster points), and
isolated points. A substantial proportion of analysis is concerned with demonstrating the existence
of limit points of one kind or another. In application contexts, it is easy to make errors by following
intuition. So precise definitions for these concepts are required. Also defined here are -limit points,
which are sometimes closer to intuitive notions of limit points than the standard definition. Limit points
must not be confused with sequential limits (although they often are).
(11) Section 31.11 presents some simple topologies on infinite sets. These are very broadly useful as examples
and counterexamples.
(12) Sections 31.12, 31.13 and 31.14 give basic initial definitions for continuous functions and homeomorph-
isms (which are bidirectional continuous functions). The serious consideration of continuous functions
commences in Chapter 35, but some definitions and properties are required much earlier, for example
for defining some function-induced topologies in Section 32.7, for product topologies in Sections 32.8
and 32.9, and for some separation classes in Section 33.3.
spaces. In fact, spaces which are not in these classes typically have a pathological character. The T2
class is also known as the Hausdorff class. This is important for differential geometry because locally
Cartesian spaces do not necessarily have even this very weak Hausdorff property. (Many examples are
given in Section 48.4.)
(3) Before progressing to the strong separation classes, Section 33.2 defines two notions of the separation of
pairs of sets, referred to here as weak separation and strong separation. Confusion between these
two notions can easily lead to erroneous assumptions. Therefore their properties and differences are
emphasised here. (There is no difference between them in a T5 space or metric space.)
(4) Section 33.3 introduces stronger topological separation classes. Spaces in these classes are known as
T3, T3½, T4, T5 and T6 spaces. The relations between the stronger separation classes are not simple
inclusions according to numerical order as in the case of the weaker separation classes. The untidy
details of these relations are summarised in Figure 33.3.7 in Remark 33.3.36.
(5) Section 33.4 presents separable spaces, and first and second countable spaces. (Confusingly, separable
spaces are not directly related to the separation classes, despite the very similar name.) These classes are
concerned with the countability of dense subsets, pointwise open bases and global open bases respectively.
These properties are often indirectly useful because they imply other properties which are directly useful.
Since these classes require only the existence of certain kinds of subsets or open bases, the ability to
choose these subsets or open bases often leads to invocations of the axiom of choice to combine infinitely
many of these choices. To combat this issue, explicit first and second countable spaces are defined.
(6) Section 33.5 introduces open covers and compactness. There are many kinds of topological compactness,
which are often confused in the mathematics literature. Literature which uses this concept must always
be examined carefully to determine which definitions and terminology are being used. The style of
compactness defined in Section 33.5 is expressed in terms of subcovers of open covers.
(7) Section 33.6 introduces local compactness, which is implied by the global compactness in Section 33.5.
(8) Section 33.7 presents some variations of the compactness class in Section 33.5. These include the classes
of countably compact, Lindelöf and paracompact spaces. These classes often require the axiom of choice
in applications. So they are defined here more on a "know thy enemy" basis than because they are
really applicable. They are often encountered in differential geometry books, although they are rarely
genuinely useful.
(9) Section 33.8 defines topological dimension. This is rarely useful in differential geometry, although it
has a very limited relevance for Hilbert's 5th problem for Lie groups in Section 60.1. The Lebesgue
dimension of a topological space is difficult to define and difficult to use.
(8) Section 34.9 gives some topological properties of the real numbers, including the Heine-Borel theorem,
which states that all bounded closed sets of real numbers are compact.
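The subcover characterisation of compactness can be made concrete for an explicit cover. The following sketch (illustrative Python, not from the book; the cover and greedy selection rule are invented for this example) extracts a finite subcover from an interval cover of [0, 1], whose existence the Heine-Borel theorem guarantees.

```python
# Illustrative sketch (not from the book): the Heine-Borel theorem says
# every open cover of a closed bounded set such as [0, 1] has a finite
# subcover.  For a cover by open intervals, a greedy sweep finds one.

def finite_subcover(cover, a=0.0, b=1.0):
    """Select finitely many open intervals (lo, hi) from `cover` whose
    union still contains the closed interval [a, b]."""
    chosen, point = [], a
    while point <= b:
        # Among intervals containing the current point, take the one
        # reaching furthest to the right.
        candidates = [iv for iv in cover if iv[0] < point < iv[1]]
        if not candidates:
            raise ValueError("not a cover of [a, b]")
        best = max(candidates, key=lambda iv: iv[1])
        chosen.append(best)
        point = best[1]  # everything up to this point is now covered
    return chosen

# A cover of [0, 1] by overlapping open intervals.
cover = [(x / 10 - 0.15, x / 10 + 0.15) for x in range(11)]
sub = finite_subcover(cover)
print(len(sub), "intervals suffice")
```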
the Bolzano-Weierstraß property for real numbers and for Cartesian spaces.
(8) Section 35.8 presents inferior and superior limits of real-valued functions on topological spaces.
given here. The most important property is Theorem 36.7.9, which states that pathwise connected sets
are connected.
(8) Section 36.8 defines paths as equivalence classes of curves. It turns out that these are not as useful as
one would expect.
(9) Section 36.9 defines topological groups. Sections 36.10 and 36.11 define topological left and right trans-
formation groups respectively, which are useful for defining topological fibre bundles.
33.5 and 35.4 to metric spaces.
(9) Section 37.8 defines Cauchy sequences and completeness.
(5) Section 40.4 presents the constant multiplication rule, sum rule, product rule, reciprocal rule, quotient
rule and chain rule for differentiation.
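These rules can be checked numerically with difference quotients. The sketch below (illustrative Python, not from the book; the functions f, g and the test point are invented for this example) verifies the product and chain rules at a sample point.

```python
# A numerical sanity check (illustrative, not from the book) of the
# product and chain rules, using symmetric difference quotients.
import math

def deriv(f, x, h=1e-6):
    """Symmetric difference quotient approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = math.sin
g = lambda x: x ** 3
x = 0.7

# Product rule: (f g)' = f' g + f g'.
lhs = deriv(lambda t: f(t) * g(t), x)
rhs = deriv(f, x) * g(x) + f(x) * deriv(g, x)
assert abs(lhs - rhs) < 1e-6

# Chain rule: (f o g)' = (f' o g) * g'.
lhs = deriv(lambda t: f(g(t)), x)
rhs = deriv(f, g(x)) * deriv(g, x)
assert abs(lhs - rhs) < 1e-6
print("product and chain rules verified numerically")
```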
(6) Section 40.5 presents the mean value theorem and Rolle's theorem, and some useful consequences.
(7) Section 40.6 presents the so-called mean value theorem for several dependent variables. It states that
the average speed of a curve between two points does not exceed the maximum speed at points along
the curve. This is very easy to show with integral calculus, but Theorems 40.6.4 and 40.6.4 show that
it can be proved using only differential calculus.
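The inequality is easy to check numerically in a particular case. The following sketch (an illustration, not the book's proof; the circular arc is invented for this example) compares the average speed of a unit-speed curve with its maximum speed.

```python
# Numerical illustration (not the book's proof): for the circular arc
# c(t) = (cos t, sin t), the average speed |c(b) - c(a)| / (b - a)
# does not exceed the maximum speed, which is 1 here since |c'(t)| = 1.
import math

def c(t):
    return (math.cos(t), math.sin(t))

a, b = 0.0, 1.0
chord = math.dist(c(a), c(b))      # straight-line distance between endpoints
average_speed = chord / (b - a)
max_speed = 1.0                    # |c'(t)| = |(-sin t, cos t)| = 1 for all t

assert average_speed <= max_speed
print(f"average speed {average_speed:.4f} <= max speed {max_speed}")
```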
(8) Section 40.7 presents unidirectional differentiability of real functions of a real variable.
(9) Section 40.8 presents Dini derivatives, which are used in the proof of the Lebesgue differentiation theorem
in Section 44.7.
(5) Section 41.4 defines unidirectional derivatives of real-valued functions on Cartesian spaces. These can
be useful at boundaries of regions.
(6) Section 41.5 defines total differentiability of real-valued functions on Cartesian spaces. This is a very
strong form of definition for multiple independent variables. This requires the directional derivatives at a
point to be a linear function of the direction vector, and the convergence must be uniform with respect
to direction. This is stronger than both partial and directional differentiability. It is therefore
the most difficult to verify directly. But without total differentiability, it would not be possible to
express all tangent vectors on differentiable manifolds in terms of a simple n-tuple of real numbers.
Luckily Theorem 7.2.9, here called the total differential theorem, shows that a continuously partially
differentiable function is also continuously totally differentiable. This explains why differentiability in
differential geometry is mostly required to be continuous differentiability.
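The linearity of directional derivatives for a continuously differentiable function can be observed numerically. The sketch below (illustrative Python, not from the book; the function f and sample point are invented for this example) checks that the derivative of f along v equals the gradient paired with v.

```python
# Illustrative check (not from the book): for a continuously
# differentiable f, the directional derivative at p is a linear function
# of the direction, namely grad f . v.
import math

def f(x, y):
    return math.exp(x) * math.sin(y)

def directional(f, p, v, h=1e-6):
    """Symmetric difference quotient for the derivative of f at p along v."""
    (x, y), (vx, vy) = p, v
    return (f(x + h * vx, y + h * vy) - f(x - h * vx, y - h * vy)) / (2 * h)

p = (0.3, 0.8)
# Exact gradient of f at p, computed by hand.
grad = (math.exp(p[0]) * math.sin(p[1]), math.exp(p[0]) * math.cos(p[1]))

for v in [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8), (-0.5, 0.5)]:
    expected = grad[0] * v[0] + grad[1] * v[1]
    assert abs(directional(f, p, v) - expected) < 1e-6
print("directional derivatives are linear in the direction")
```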
(7) Section 41.6 presents differentiation of maps between abstract linear spaces.
(8) Section 41.7 presents the inverse function theorem and implicit function theorem. These are important
for the existence and regularity of immersions and embeddings of differentiable manifolds.
This is perplexing because Archimedes was performing impressive feats of integral calculus more than
2200 years ago, whereas differential calculus is an invention of the 17th century, almost two millennia
later. The 20th century Lebesgue measure and integration technology is not presented because it has
numerous technical complexities which are irrelevant and unnecessary for most integration purposes in
differential geometry. The 19th century integration technology is sufficient.
(2) Section 43.1 is an overview and brief history of numerous methods of integration.
(3) Section 43.2 makes some comments on the Leibniz-Newton antiderivative style of integration.
(4) Sections 43.3, 43.4 and 43.5 introduce the Cauchy, Cauchy-Riemann and Cauchy-Riemann-Darboux
integrals respectively for real functions of a real variable. These are 19th century integrals, as the
Stieltjes versions of these integrals are also, but they are adequate for most purposes in differential
geometry.
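The Darboux construction in particular can be illustrated numerically. The following sketch (illustrative Python, not from the book; the integrand and partition are chosen for this example) shows the lower and upper sums squeezing the integral of x² on [0, 1].

```python
# Illustrative sketch (not from the book): lower and upper Darboux sums
# for f(x) = x^2 on [0, 1].  For this increasing f, the infimum and
# supremum on each subinterval are attained at its endpoints, and both
# sums converge to the integral 1/3.

def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums of a non-decreasing f over n equal parts."""
    h = (b - a) / n
    lower = sum(f(a + k * h) * h for k in range(n))        # left endpoints
    upper = sum(f(a + (k + 1) * h) * h for k in range(n))  # right endpoints
    return lower, upper

lo, up = darboux_sums(lambda x: x * x, 0.0, 1.0, 1000)
assert lo <= 1 / 3 <= up
print(f"lower {lo:.5f} <= 1/3 <= upper {up:.5f}")
```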
(5) Section 43.6 presents the fundamental theorem of calculus for the Darboux integral.
(6) Section 43.7 applies the Riemann integral to vector-valued integrands.
(7) Section 43.8 introduces the Cauchy-Riemann-Darboux-Stieltjes integral (also known as the Riemann-
Stieltjes integral) for real functions of a real variable. Although this is a 19th century integral, it is
useful and adequate for integration of the velocity of a curve.
(8) Sections 43.9 and 43.10 introduce the Stieltjes integral for vector and linear operator functions of a
real variable. These are the right integrals for computing parallel transport for linear connections on
vector bundles along rectifiable curves. (They are used in conjunction with the Picard iteration method
in Definition 43.12.15.)
(9) Section 43.11 presents the old-fashioned Peano method for demonstrating existence of solutions of sys-
tems of ordinary differential equations. This has some small advantages relative to the Picard method
which has largely displaced it.
(10) Section 43.12 presents the Picard integration-iteration method for demonstrating existence for systems
of ordinary differential equations. Despite the name of this topic, it is actually a generalisation of integral
calculus, which is in essence the solution of equations of the form f'(x) = g(x), a very simple kind of ODE.
In the ODE subject, the objective is always to solve the equations, not just to compute derivatives!
ODEs are central to differential geometry because parallel transport along a curve is defined as the
solution of an ODE. The Picard iteration method in Definition 43.12.15 (which is also 19th century
technology) is powerful enough to prove existence of solutions for the linear connections on general
vector bundles which appear in differential geometry.
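The iteration itself is simple enough to demonstrate in a few lines. The sketch below (illustrative Python, not the book's construction; the initial value problem y' = y, y(0) = 1 is chosen for this example) applies the Picard map y ↦ 1 + ∫₀ˣ y(t) dt repeatedly, producing the Taylor polynomials of eˣ.

```python
# Illustrative sketch (not the book's construction): Picard iteration for
# y' = y, y(0) = 1 applies y_{k+1}(x) = 1 + \int_0^x y_k(t) dt.  Starting
# from y_0 = 1, the k-th iterate is the degree-k Taylor polynomial of e^x.
import math

def picard_step(poly):
    """Given y_k as a coefficient list [c0, c1, ...], return 1 + integral of y_k."""
    return [1.0] + [c / (i + 1) for i, c in enumerate(poly)]

def eval_poly(poly, x):
    return sum(c * x ** i for i, c in enumerate(poly))

y = [1.0]                 # y_0(x) = 1
for _ in range(15):
    y = picard_step(y)    # y_{k+1}(x) = 1 + integral from 0 to x of y_k

assert abs(eval_poly(y, 1.0) - math.e) < 1e-10
print("Picard iterates converge to e at x = 1")
```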
(11) Section 43.13 defines the familiar logarithmic and exponential functions. Section 43.14 defines the
familiar trigonometric functions.
(1) Chapter 44 presents, in a cursory manner, Lebesgue measure and integration. Although these are
often used because they are in some sense maximal, they require extensive applications of the axiom of
choice for even quite elementary constructions. The main difference between the Lebesgue approach to
integration and the earlier Riemann/Stieltjes approach is the summation over horizontal slices in the
Lebesgue case, by contrast with the vertical slices in the Riemann/Stieltjes case. Hence set-measure (the
measure of horizontal slices of functions) is a specific characteristic of the Lebesgue theory.
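The two slicing styles can be compared directly on a simple integrand. The following sketch (illustrative Python, not from the book; the function x² and the discretisation are chosen for this example) approximates the same integral by vertical slices and by horizontal slices.

```python
# Illustrative comparison (not from the book): integrating f(x) = x^2 on
# [0, 1] by vertical slices (Riemann sum) and by horizontal slices
# (summing the measure of the sets {x : f(x) > t}, the "layer cake" view
# behind Lebesgue integration).  Both approximate the exact value 1/3.

def f(x):
    return x * x

n = 2000
# Vertical slices: sum of f(x_k) * dx at midpoints.
riemann = sum(f((k + 0.5) / n) * (1 / n) for k in range(n))
# Horizontal slices: sum of measure{x in [0,1] : f(x) > t} * dt.
# For f(x) = x^2 that measure is 1 - sqrt(t).
lebesgue = sum((1 - ((k + 0.5) / n) ** 0.5) * (1 / n) for k in range(n))

assert abs(riemann - 1 / 3) < 1e-3
assert abs(lebesgue - 1 / 3) < 1e-3
print(f"vertical {riemann:.5f}  horizontal {lebesgue:.5f}")
```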
(2) Section 44.1 presents sets of real numbers of measure zero. Although some of the very basic theorems
for sets of measure zero require the axiom of countable choice, it is shown in Sections 44.2 and 44.3 that
these theorems can be replaced with explicit versions which do not need any axiom of choice.
(3) Section 44.4 demonstrates in Theorem 44.4.5 that there exist bounded non-decreasing functions which
are non-differentiable at all points of any given set of explicit measure zero. The purpose of this is
to show that the Lebesgue differentiation theorem in Section 44.7 is the best possible constraint on
the subsets of non-decreasing functions where functions are non-differentiable. (In other words, the
Lebesgue differentiation theorem is sharp.)
(4) Sections 44.5 and 44.6 are concerned with shadow sets and double shadow sets respectively. These
are technical constructions which are useful for proving the Lebesgue differentiation theorem.
(5) Section 44.7 presents the Lebesgue differentiation theorem, which states that a non-decreasing real
function (and therefore also a function of bounded variation) is differentiable everywhere except on a set
of measure zero. This is achieved by careful analysis of Dini derivatives with the assistance of shadow
sets and double shadow sets.
(1) Chapter 45 is concerned with differential forms, the exterior derivative and the so-called Stokes theorem
in Cartesian spaces. (Stokes did not discover it.)
(2) Section 45.1 gives some basic definitions for vector fields on Cartesian spaces.
(3) Section 45.2 defines differential forms on Cartesian spaces. These are antisymmetric covariant tensor
fields valued in arbitrary finite-dimensional linear spaces.
(4) Section 45.3 defines naive vector fields for Cartesian spaces. These are used for the Lie bracket in
Section 45.4.
(5) Section 45.5 defines naive differential forms for Cartesian spaces. These are optimised for computations
related to the exterior derivative.
(6) Section 45.6 introduces the exterior derivative for Cartesian spaces. Since the exterior derivative does
not depend on connections or metrics, it is essentially the same locally for Cartesian spaces as for
differentiable manifolds.
(7) Section 45.7 applies the exterior derivative to general nonholonomic vector-field tuples.
(8) Sections 45.8 and 45.9 introduce the very simple cases of the Stokes theorem for rectangular regions of
$\mathbb{R}^2$ and $\mathbb{R}^3$. These are easy applications of the fundamental theorem of calculus.
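The rectangular case can be verified numerically in the Green theorem form. The sketch below (illustrative Python, not the book's proof; the 1-form P dx + Q dy is invented for this example) checks that the circuit integral around the boundary of the unit square equals the integral of ∂Q/∂x − ∂P/∂y over the interior.

```python
# Numerical sketch (not the book's proof) of the Stokes/Green theorem on
# the rectangle [0,1] x [0,1]: the circuit integral of P dx + Q dy around
# the boundary equals the integral of dQ/dx - dP/dy over the interior.
# Here P(x, y) = -y**2 and Q(x, y) = x*y, so dQ/dx - dP/dy = y + 2*y = 3*y.

n = 1000
h = 1.0 / n
P = lambda x, y: -y * y
Q = lambda x, y: x * y

# Boundary integral, traversing the four edges anticlockwise (midpoint rule).
xs = [(k + 0.5) * h for k in range(n)]
boundary = (sum(P(x, 0.0) for x in xs) * h        # bottom edge: dx > 0
            + sum(Q(1.0, y) for y in xs) * h      # right edge:  dy > 0
            - sum(P(x, 1.0) for x in xs) * h      # top edge:    dx < 0
            - sum(Q(0.0, y) for y in xs) * h)     # left edge:   dy < 0

# Interior integral of dQ/dx - dP/dy = 3y (midpoint rule).
interior = sum(3 * y for x in xs for y in xs) * h * h

assert abs(boundary - interior) < 1e-6
print(f"boundary {boundary:.6f} = interior {interior:.6f}")
```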
manifolds are constrained.) More importantly, the fibre atlas induces structure from the structure
(transformation) group onto the underlying topological fibration.
(9) Section 46.8 introduces topological fibre bundle homomorphisms, isomorphisms and products. These
are similar to the corresponding definitions for other categories.
(10) Section 46.9 defines topological principal fibre bundles. In structural terms, a PFB is like an OFB
except that the fibre space is the same as the structure group. However, the real purpose of a PFB is
to act as the associated frame bundle for an OFB. Then parallelism can be defined on many different
associated OFBs by first defining parallelism on the PFB.
(11) Section 46.10 defines associated topological fibre bundles. A PFB may be associated with one or more
individual OFBs, and the OFBs may be associated with each other.
(12) Section 46.11 presents a collage construction method for associated topological fibre bundles.
(13) Section 46.12 presents a method of constructing associated topological fibre bundles. This is called here
the associated topological fibre bundle orbit-space construction method. Some textbooks give this as
the only way to define associated fibre bundles.
(14) Section 46.13 defines combined topological fibre/frame bundles, which differ from the baseless fig-
ure/frame bundles in Section 20.10 by the addition of a base space and topological structure on all
constituent spaces. Both OFBs and PFBs are given greater meaning by being combined.
Chapter 47: Parallelism on topological fibre bundles
(1) Section 47.1 introduces structure-preserving fibre set maps. These are maps between fibre sets at
different base points of a topological fibre bundle which are consistent with the structure group. This
is relevant to the definition of parallelism.
(2) Chapter 47 makes a not very successful attempt to define parallelism on topological fibre bundles. The
difficulty lies in the excess of freedom.
(3) Section 47.2 notes that the objects on which parallelism is defined are equivalence classes of curves.
These are called parallelism path classes here. Parallel transport should be the same along two
curves which cover the same path with different parametrisation. Thus a parallelism path class is an
equivalence class of curves which are expected to always determine the same parallel transport of fibre
sets along the curves.
(4) Section 47.3 defines a very general notion of parallelism on topological fibre bundles. The concept is
meaningful, but the freedom is excessive. This helps to motivate the notion of a connection, which
defines the differential of parallelism everywhere, so that parallelism can be defined as its integral.
(5) Section 47.4 defines associated parallelism on associated topological fibre bundles. For example, this
can be used to define parallelism on an OFB in terms of parallelism on an associated PFB. Alternatively,
parallelism can be defined on OFBs from other associated OFBs. The prime example for parallelism on
affinely connected differentiable manifolds is the method by which parallel transport is defined for all
tensor bundles which are associated with a given tangent bundle. This is why connections and parallel
transport only need to be defined on the tangent bundle, because the tensor bundles are associated with
that tangent bundle. (For example, the Christoffel symbol is defined for contravariant vectors. Then
the covariant derivative for covariant vectors essentially uses the negative of the Christoffel symbol, and
all other tensors use mixtures of positive and negative Christoffel symbols.)
Chapter 31
Topology
31.0.1 Remark: The most relevant topology topics for differential geometry.
Chapters 31 to 39 are not a full introduction to general topology. Only those aspects of topology which are
needed for later chapters are presented here. Apart from the basic definitions, the most important aspects
of topology for differential geometry are topological space construction techniques (such as product and
quotient topologies), classes of topological spaces, continuous curves and paths, topological transformation
groups, and metric spaces.
Leibniz, Grassmann, Möbius, Carnot, Monge, Euler and Vandermonde as significant names in the historical
development of topology.
According to Struik [228], page 196,
An important contribution to the further penetration of point-set topology into the main body of
mathematics was the Grundzüge der Mengenlehre, published in 1914 by Felix Hausdorff, a teacher
in Bonn. It offered an axiomatic definition of what became known as topological space.
Struik [228], page 199, makes the following comments about the early development of topology.
R. L. Moore and Veblen were American representatives of that field of mathematics first known as
analysis situs and after 1900 more and more as that part of topology called combinatory (in contrast
with point-set topology) as it was explained, for instance, by Hausdorff. Its emergence from a set of
puzzle-like problems, such as that of Euler on the seven bridges of Königsberg, or the Möbius strip,
came with the Riemann surfaces in complex function theory, with Jordan's closed curve theorem,
and in particular with Poincaré's publications, between 1895 and 1904, on simplexes, and Betti
numbers of manifolds.
This theory of homology with its group considerations, chains, and cycles was taken up by many
investigators, among them Veblen and his colleague at Princeton, James W. Alexander. Of special
importance here were the discoveries of the young Dutchman L. E. J. Brouwer [. . . ].
It is not immediately obvious why the word topology is used for the study of continuity. The connection
can be seen by thinking about what a set would be without a topology. Then every point is equivalent.
For example, there would be no concept of a continuous curve. But with a topology, continuous curves are
constrained to move smoothly from one point to another. If progressively shorter segments of a curve are
considered, they are progressively more local to their starting point. In other words, a topology gives a set
a sense of place or locality. This becomes even clearer when considering how neighbourhoods are used in
defining interior, exterior and boundary points of sets. In fact, the aptness of the word topology is most
clear when a topology is defined in terms of open bases at all points rather than open sets. Thus it may be
best to think of topology as meaning not the study of place, but rather the study of locality.
Simmons [133], page viii, said the following about the subject matter of topology in 1963.
Historically speaking, topology has followed two principal lines of development. In homology theory,
dimension theory, and the study of manifolds, the basic motivation appears to have come from
geometry. In these fields, topological spaces are looked upon as generalized geometric configurations,
and the emphasis is placed on the structure of the spaces themselves. In the other direction, the
main stimulus has been analysis. Continuous functions are the chief objects of interest here, and
topological spaces are regarded primarily as carriers of such functions and as domains over which
they can be integrated. These ideas lead naturally into the theory of Banach and Hilbert spaces
and Banach algebras, the modern theory of integration, and abstract harmonic analysis on locally
compact groups.
Topology thus splits into two branches of investigation, one focused on the connectedness of sets, the other
focused on the continuity of functions. The first branch focuses on sets, typically smooth manifolds, using
homeomorphism invariants such as homotopy as tools to classify spaces into homeomorphism equivalence
classes. The second branch focuses on functions, using continuity to demonstrate the existence of limits.
branch names               main focus                 agenda
combinatorial,             connectedness of sets      classification of homeomorphism equivalence
algebraic                                             classes of low-dimensional manifolds; algebraic
                                                      topology tools for classifying topologies
point-set,                 continuity of functions    analysis on infinite-dimensional linear spaces;
set-theoretic                                         conditions for convergence; existence for
                                                      partial differential equation problems
However, there is another view of the subject matter of topology, namely that it consists only of the
connectedness questions. EDM2 [109], section 426, gives the following summary of topology's subject matter.
The term topology means a branch of mathematics that deals with topological properties of
geometric figures or point sets.
That matches the view which is often seen in popular science magazine articles for example. Possibly the
meaning of topology as a mathematical subject has drifted over the decades due to the many undergraduate
courses which have this name.
Bynum et alii [218], page 423, has the following.
Topology studies properties of geometrical figures invariant under continuous one-to-one transfor-
mations. There are two branches, combinatorial and point-set topology. Certain problems of a
type treated by L. Euler (1707–83) gave rise to combinatorial topology, often called "rubber sheet
geometry" [. . . ]. The first systematic development was given in 1895 by H. Poincaré (1854–1912) in
his Analysis situs (an expression used by G. W. Leibniz (1646–1716) for geometry of situation). [. . . ]
Point-set topology, which regards the geometrical figures as subsets of a structured space, originated
with G. Cantor (1845–1918) and was perfected by F. Hausdorff (1868–1942), who showed that the
essential concept of a neighbourhood was possible in a space without a metric. L. E. J. Brouwer
(1881–1966) combined the two branches and topology now unifies almost all mathematics.
This definition of topology seems to focus very much on the homeomorphism-invariant-figures side of topology,
while also permitting a second branch dealing with the spaces rather than the figures.
Reinhardt/Soeder [120], page 207, says the following.
Die Topologie ist als selbstständige math. Disziplin noch verhältnismäßig jung. Man unterscheidet
heute die mengentheoretische Topologie und die algebraische Topologie. In der algebraischen
Topologie werden top. Probleme mit algebraischen Hilfsmitteln gelöst.
Die mengentheoretische Topologie (kurz: Topologie) hat ihre Wurzel in der reellen Analysis.
Es zeigte sich nämlich, daß man z. B. die Konvergenztheorie allein durch Eigenschaften von Punkt-
mengen entwickeln kann, ohne sich der algebr. Struktur oder der Ordnungsstruktur zu bedienen. Es
kristallisierte sich eine dritte Struktur heraus, die topologische Struktur. In ihr werden Begriffe wie
Umgebung, offene Menge, abgeschlossene Menge, Berührungspunkt, Häufungspunkt, Konvergenz,
Zusammenhang und Kompaktheit formuliert und Punktmengen mit Hilfe dieser Begriffe untersucht
und klassifiziert.
This may be translated into English as follows.
Topology as an independent mathematical discipline is still relatively young. One distinguishes
nowadays set-theoretic topology and algebraic topology. In algebraic topology, topological problems
are solved with algebraic tools.
Set-theoretic topology (in short: topology) has its root in real analysis. Thus it is shown, for
example, that the theory of convergence may be developed by way of point-set properties alone,
without using algebraic structure or order structure. A third structure crystallised out of this:
topological structure. In this, concepts such as neighbourhood, open set, closed set, contact point,
accumulation point, convergence, connectedness and compactness are formulated, and point-sets
are investigated and classified with the help of these concepts.
This explanation also seems to indicate that topology may be divided into the analytic (local) and geometric
(global) branches.
Boyer/Merzbach [217], pages 552553, says the following on this subject.
Topology is now a broad and fundamental branch of mathematics, with many aspects, but it can
be subdivided into two fairly distinct sub-branches: combinatorial topology and point-set topology.
Poincaré had little enthusiasm for the latter, and when, in 1908, he addressed the International
Mathematical Congress at Rome, he referred to Cantor's Mengenlehre as a disease from which later
generations would regard themselves as having recovered. Combinatorial topology, or analysis situs,
as it was then generally called, is the study of intrinsic qualitative aspects of spatial configurations
that remain invariant under continuous one-to-one transformations. It is often referred to popularly
as rubber-sheet geometry, for deformations of, say, a balloon, without puncturing or tearing it,
are instances of topological transformations.
Roughly speaking, continuous functions are those which preserve the connectedness of sets. (See Section 35.2
for details.) So continuity may be defined in terms of connectedness. But connectedness and continuity may
both be defined in terms of the concepts of interior, exterior and boundary of sets. Thus:
The interior Int(S) and exterior Ext(S) of a set S in a topological space X may be defined in terms of the
boundary Bdy(S) as Int(S) = S \ Bdy(S) and Ext(S) = (X \ S) \ Bdy(S). So everything in topology may
be defined in terms of boundaries of sets. Hence:
The modern technical specification of a topology is expressed in terms of open sets. (An open set is a set
which contains none of its boundary points.) The interior, exterior and boundary of a set are then defined
in terms of these open sets. However, for the meaning of a topology, boundaries are more fundamental than
open sets. Both concepts contain the same information in a technical sense, but boundaries have a much
stronger intuitive appeal than open sets.
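For example (an illustration, not part of the text), take $S = [0,1)$ in $X = \mathbb{R}$, so that $\mathrm{Bdy}(S) = \{0, 1\}$. Then the definitions above give

```latex
\mathrm{Int}(S) = S \setminus \mathrm{Bdy}(S) = (0,1),
\qquad
\mathrm{Ext}(S) = (\mathbb{R} \setminus S) \setminus \mathrm{Bdy}(S)
               = (-\infty,0) \cup (1,\infty),
```

and these two open sets together with $\mathrm{Bdy}(S)$ partition the real line.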
(3) Oceans, seas and lakes, water droplets, icebergs and glaciers.
(4) Nations, territories and continents.
(5) Planets, stars, asteroids.
It is not totally implausible that the earliest vertebrate animals on Earth implemented the concepts of
interior, exterior and boundary in the world-models by which they navigated their environments. These
topological concepts are arguably more fundamental than numbers or propositional logic.
The logical concept of a class of objects implies both an interior and an exterior of the class. So there is
apparently a close fundamental relation between set theory and topology.
Mathematical models of the physical world are very often expressed in terms of the interior, exterior and
boundary of sets. For example, the Stokes theorem expresses integrals over a region in terms of integrals over
its boundary. Conservation equations for physical flows are expressed in terms of boundary integrals and
interior integrals. Solutions of boundary value problems are typically expressed as integrals over boundaries
and interior regions. A large proportion of complex analysis is concerned with integrals over boundary curves
and their relations to integrals over interior regions bounded by curves.
Perhaps most importantly, all of mathematical analysis is expressed in terms of limits, which are effectively
equivalent to boundaries of sets. Thus analysis is based upon the boundary concept, which is the core
concept of topology.
31.2.3 Remark: The topology of a set partitions the whole space into interior, exterior and boundary.
A topology for a set X defines the interior Int(S), exterior Ext(S) and boundary Bdy(S) of every subset S
of X. These three sets form a partition of X for each set S. The technical specification of a topology uses
open neighbourhoods to define these three sets.
The interior of a set S consists of the points x1 which have at least one neighbourhood which is entirely inside the set S. The exterior consists of the points x3 which have at least one neighbourhood which is entirely outside the set S. The boundary consists of the points x2 which are neither interior nor exterior. In other words, every neighbourhood of a boundary point contains at least one point inside and one point outside S. (See Figure 31.2.1.)
[Figure 31.2.1: a subset S of X with an interior point x1 ∈ Int(S), a boundary point x2 ∈ Bdy(S) and an exterior point x3 ∈ Ext(S), together with sample neighbourhoods of each point.]
One way to think about set interiors and boundaries is to recall Zeno's paradox of Achilles and the tortoise.
Imagine the tortoise walking towards the boundary. Whenever Achilles gets to where the tortoise was, he
still has the tortoise between him and the boundary. So Achilles always has a neighbour inside the set.
31.2.4 Remark: The point-set and topology layers in a 5-layer differential geometry structure model.
The entities in layer 0 of the differential geometry layer-model proposed in Section 1.1 are points, which may
be locations in space or events in space-time, or any other kind of concrete or abstract point. Points are
represented mathematically as elements of sets.
There is not necessarily any association at all between points in layer 0. The only necessary relation between
points is the equality relation. Given any two points P and Q, it is possible to say if the points are equal (P = Q) or not equal (P ≠ Q). There is no distance relation. So there is no distinction between points
which are close to P and those which are distant from P .
A set of points can be counted because the equality relation is well defined on any set. Counting a set or
subset can be achieved by labelling points with numbers as illustrated in Figure 31.2.2.
[Figure 31.2.2: ten points labelled with the numbers 0 to 9 for counting.]
Although the points in this figure are drawn on 2-dimensional paper, pure points do not have coordinates of
any kind. All we can do with pure points is to label and group and count them. (The basic theory underlying
pure point-sets is presented in Chapters 7 to 16.)
Differential geometry layer 1 adds topological structure to simple point sets. Topology may be thought of as
a kind of glue which holds neighbouring points together. Otherwise points would have no association with
their neighbours at all.
Each point is drawn with only one neighbourhood Nx or Ny in this diagram, but points may have any number of neighbourhoods of any size and shape.
The fact that the two sets of points S1 and S2 in Figure 31.2.3 are disconnected from each other is clear from
the fact that each point-set S1 and S2 can be covered by neighbourhoods which exclude the neighbourhoods
of the other set of points. That is, the intersection Nx ∩ Ny of neighbourhoods Nx and Ny is empty for all x ∈ S1 and y ∈ S2.
[Figure 31.2.3: the disconnected point-sets S1 and S2, with neighbourhoods Nx ∋ x and Ny ∋ y satisfying Nx ∩ Ny = ∅.]
The neighbourhoods of the sets S1 and S2 may be joined into combined neighbourhoods Ω1 and Ω2 which contain all of the respective points in the interiors. This is illustrated in Figure 31.2.4. The combined neighbourhoods are Ω1 = ∪x∈S1 Nx and Ω2 = ∪y∈S2 Ny respectively. Clearly S1 is in the interior of Ω1 and S2 is in the interior of Ω2, and the intersection Ω1 ∩ Ω2 is empty because all of the neighbourhood pairs Nx and Ny have an empty intersection.
[Figure 31.2.4: the combined neighbourhoods Ω1 = ∪x∈S1 Nx and Ω2 = ∪y∈S2 Ny, with Ω1 ∩ Ω2 = ∅.]
Note that the sets Ω1 and Ω2 are not associated with any particular individual point. The subject of topology
is much simpler as a logical discipline if the open sets are not associated with particular points. Instead of
dealing with an infinite number of little neighbourhoods around an infinite number of points, we can deal
instead with a much smaller number of combined neighbourhoods around combined sets of points. This leads
to a way of thinking that looks more like Figure 31.2.5.
[Figure 31.2.5: the two combined open sets Ω1 and Ω2, no longer associated with individual points.]
This is the definition of connectedness of a set. A set X is disconnected if there are two disjoint neighbourhoods which cover X and there is at least one point of X in each of the two neighbourhoods. These neighbourhoods effectively disconnect the set into two non-empty components. (See also Figure 34.1.1.)
The ability to disconnect points and sets of points from each other is the fundamental task of open sets. If
the set of open neighbourhoods is reduced by removing some open neighbourhoods, this tends to reduce the
ability to disconnect sets into two components. It follows that more sets are then defined to be connected as
the set of neighbourhoods is reduced. Similarly, if the set of neighbourhoods is augmented by adding new
neighbourhood sets, this makes it easier to disconnect sets into two or more subsets. Roughly speaking, the bigger the topology is, the smaller the class of connected sets is.
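This disconnection criterion can be tested directly on a finite example. A sketch (the topology is assumed for illustration; "disconnected" means two open sets cover S, each meets S, and they share no point of S):

```python
from itertools import combinations

# Testing disconnectedness of a subset S of a finite topological space.
# The example topology T is assumed, not taken from the book.
X = frozenset({1, 2, 3})
T = [frozenset(), frozenset({1}), frozenset({2, 3}), X]

def is_disconnected(S, T):
    # S is disconnected if two open sets cover S, each contains a point
    # of S, and they have no point of S in common.
    return any(S <= (U | V) and U & S and V & S and not (U & V & S)
               for U, V in combinations(T, 2))

print(is_disconnected(frozenset({1, 2}), T))                  # True: {1} and {2,3} split it
print(is_disconnected(frozenset({1, 2}), [frozenset(), X]))   # False under the trivial topology
```

The second call illustrates the remark above: passing the coarser (trivial) topology removes the ability to disconnect the same set.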
which functions are continuous. If one knows which functions are continuous, one may determine which sets
are connected. This could be called the interdefinability of these concepts.
31.2.9 Remark: Topology is applicable to mathematical models, not to the real physical world.
Real physical geometry does not correspond very well to the properties of topological spaces. For example,
if a zero-diameter single point is removed from a physical plane figure, it is impossible to detect this. No measurement apparatus has a zero resolution. If a point is removed from the set ℝ², the topology is
significantly altered. There is no way to determine whether a physical set is open or closed because a zero-
thickness boundary is undetectable. Physical point-sets have fuzzy boundaries. However, this is not a serious
problem. Topology is applicable to mathematical models, not to real physical systems. Real points have
positive width. Real boundaries have positive thickness.
From the point of view of doing practical analysis, undoubtedly the open-set formalisation is the best trade-off
between objectives. However, open-set classes do lack direct intuitive appeal, as mentioned in Remark 31.2.1.
31.3.3 Definition:
A topological space is a pair X < (X, T ) such that T is a topology for the set X.
The point set of a topological space (X, TX ) is the set X.
A point in a topological space (X, TX ) is any element of X.
A set in a topological space (X, TX ) is any subset of X.
An open set in a topological space (X, TX ) is any element of TX .
31.3.4 Notation: Top(X), for a topological space X < (X, T ), denotes the topology T when the choice
of topology T on X is implicit in a particular context. In other words, Top(X) = T .
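Definition 31.3.2 (stated earlier in the book) requires that a topology contain the empty set and X, and be closed under finite intersections and arbitrary unions. For small finite examples these conditions can be checked mechanically; a sketch (for a finite T, arbitrary unions reduce to unions of subcollections of T):

```python
from itertools import chain, combinations

def is_topology(X, T):
    # Check the topology conditions for a finite candidate T on X:
    # empty set and X belong to T, pairwise intersections stay in T,
    # and the union of every subcollection of T stays in T.
    T = set(map(frozenset, T))
    if frozenset() not in T or frozenset(X) not in T:
        return False
    if any(A & B not in T for A, B in combinations(T, 2)):
        return False
    Tlist = list(T)
    collections = chain.from_iterable(combinations(Tlist, r)
                                      for r in range(len(Tlist) + 1))
    return all(frozenset(chain.from_iterable(Csub)) in T for Csub in collections)

print(is_topology({1, 2}, [set(), {1}, {1, 2}]))   # True
print(is_topology({1, 2}, [{1}, {1, 2}]))          # False: the empty set is missing
```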
31.3.6 Remark: The equivalence of the two-set and finite-collection-of-sets intersection rules.
The two-set intersection rule in Definition 31.3.2 (ii) is chosen to facilitate the proof of validity of topologies.
By Theorem 31.3.7, the two-set rule implies the general finite intersection rule. Since Theorem 31.3.7 (ii′) implies Definition 31.3.2 (ii), the finite intersection condition (ii′) could be substituted for the two-set intersection condition (ii) without changing the definition.
The non-standard set notation ℙ1(T ) is defined as {C ∈ ℙ(T ); 1 ≤ #(C) < ∞}, the set of all non-empty finite subsets of T . There is no standard notation for the set of non-empty finite subsets of T . Therefore the
non-standard Notation 13.10.5 is used here to make such set-expressions more readable. (The unfortunate
cost is that the reader must memorise yet another notation!)
31.3.7 Theorem: The two-set intersection rule implies the finite-collection-of-sets intersection rule.
Let (X, T ) be a topological space. Then
(ii′) ∀C ∈ ℙ1(T ), ∩C ∈ T .
Proof: Let (X, T ) be a topological space. To show (ii′), let C ∈ ℙ1(T ). If #(C) = 1, then C = {Ω} for some set Ω ∈ T . So ∩C = Ω ∈ T as claimed. If #(C) = 2, then C = {Ω1, Ω2} for some sets Ω1, Ω2 ∈ T . So ∩C = Ω1 ∩ Ω2 ∈ T by Definition 31.3.2 (ii).
Let n = #(C) ∈ ℤ+. Then there is a bijection f : ℕn → C. So ∩C = ∩_{i=1}^{n} f (i) = (∩_{i=1}^{n−1} f (i)) ∩ f (n). This is an element of T by Definition 31.3.2 (ii) if part (ii′) is valid for #(C) = n − 1. Therefore by induction on n, (ii′) is valid for all n = #(C) ∈ ℤ+.
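The induction in this proof is the set-theoretic counterpart of a left fold: a finite intersection is obtained by repeated binary intersection. A minimal sketch with an assumed example collection:

```python
from functools import reduce

# A finite intersection computed by repeated binary intersection,
# mirroring the induction on n = #(C) in the proof above.
C = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({2, 3, 5})]
intersection = reduce(lambda A, B: A & B, C)
print(sorted(intersection))  # [2, 3]
```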
31.3.8 Remark: Why the intersection rule is finite whereas the union rule is infinite.
A topology for a set X is a set of subsets of X which contains the empty set, the set X, and any finite
intersection or arbitrary union of sets in X. One might reasonably ask why there is an asymmetry between
set intersections and set unions in this definition. One simple answer is that topology would be very boring
if closure under arbitrary intersections were required. In that case, every topology on a set X would be the set of all unions of a partition of X. For such a topology to have the ability to separate pairs of points at all (in the sense of the very weak T1 separation class in Definition 33.1.8), the topology would have to be the power set ℙ(X). Then the only connected sets would be singletons.
A better way to answer the question is to consider the intuitive idea of the interior of an open set. A set is
intuitively defined to be open if every point in the set is in the interior of the set.
If a point x is in the interior of an open set Ω1 and also in the interior of Ω2, then we would expect x to be in the interior of Ω1 ∩ Ω2, although the "walls" of the set would be a little closer. In the case of a union Ω1 ∪ Ω2, the "walls" of the union will be either the same distance away or further away. Since the union operation makes sets bigger, we are guaranteed to always be in the interior of a set no matter how many open sets are in a union, even an infinite or uncountably infinite number of open sets. The shrinkage of a set, on the other hand, has the danger that eventually we might not have enough room to move. Being in the interior of a set intuitively means that we have at least some space between each point and the "walls". The amount of space for the intersection Ω1 ∩ Ω2 should be the minimum of the space for the individual sets Ω1 and Ω2. But the minimum of a small space and a small space is still a small space.
So by naive induction, we expect that the intersection of any finite number of sets will leave us at least some
room around each point which is still in the intersection.
To put it simply, after any number of enlargements, a neighbourhood is still a neighbourhood. But after
an infinite number of shrinkages, a neighbourhood might not be a neighbourhood any more. Hence the
asymmetry. Therefore a topological space specifies closure under finite intersections and arbitrary unions.
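The standard real-line example behind this intuition: every finite intersection of the open intervals (−1/n, 1/n) is still an open interval, but the intersection over all n shrinks to the single point {0}, which is not open. A small sketch, truncating the infinite intersection at an assumed cut-off n_max as a finite proxy:

```python
from fractions import Fraction

# Membership of x in the intersection of the open intervals (-1/n, 1/n)
# for n = 1..n_max. Any x != 0 is excluded once n exceeds 1/|x|,
# so only 0 survives the full infinite intersection.
def in_every_interval(x, n_max=1000):
    return all(-Fraction(1, n) < x < Fraction(1, n) for n in range(1, n_max + 1))

print(in_every_interval(Fraction(0)))       # True
print(in_every_interval(Fraction(1, 100)))  # False (excluded at n = 100)
```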
31.3.9 Remark: The empty set is not excluded as a topological space.
Some texts (e.g. Simmons [133], page 92; B. Mendelson [111], page 71) require a topology to be defined on
a non-empty set. Others (e.g. Kelley [97], page 37; Wallace [150], page 264; Steen/Seebach [137], page 3;
Ahlfors [44], page 67; Gilmore [80], page 59; Robertson/Robertson [122], page 6; Yosida [161], page 3) do
not make this requirement, at least not explicitly. The exclusion of the empty set is inconvenient for the
statement of some kinds of theorems. It is tedious to have to treat separately the cases where a set is empty.
Therefore the pair (∅, {∅}) is accepted as a valid topological space in Definition 31.3.3.
31.3.10 Remark: Defining neighbourhoods in terms of the class of all open sets.
Definition 31.3.11 is somewhat backwards. A topology on a set is the collection of all neighbourhoods of all
points in the set. A more intuitive approach to topology would first define the collection of neighbourhoods
at each point, and then define a topology as the union of these per-point collections. However, the class-of-
all-open-sets definition of a topology does have many practical advantages.
A neighbourhood always means an open neighbourhood in this book. Some authors also define a closed
neighbourhood and a compact neighbourhood. (See for example Kelley [97], pages 38, 141, 146.) Closed
and compact neighbourhoods are given here as Definitions 31.8.25 and 33.6.2.
31.3.11 Definition: An (open) neighbourhood of a point x in a topological space X < (X, T ) is any set Ω ∈ T which satisfies x ∈ Ω.
31.3.12 Notation: The set of neighbourhoods of a point.
Topx(X), for a topological space X < (X, T ) and x ∈ X, denotes the set of neighbourhoods of x. In other words, Topx(X) = {Ω ∈ Top(X); x ∈ Ω} = {Ω ∈ T ; x ∈ Ω}.
31.3.13 Remark: Alternative notations for the set of neighbourhoods at a point.
B. Mendelson [111], page 76, gives the notation Nx (using the Fraktur-N character) for the set of neighbour-
hoods Topx (X) at x X. Kelley [97], page 56, gives a notation similar to Ux . Robertson/Robertson [122],
page 6, gives the notation Ux .
The non-standard Notation 31.3.14 denotes the set of all point/neighbourhood pairs for a given topological
space. This is occasionally useful as a shorthand.
31.3.14 Notation: The set of all point/neighbourhood pairs.
Nbhd(X), for a topological space X < (X, T ), denotes the set of point/neighbourhood pairs (x, Ω) ∈ X × T such that x ∈ Ω. In other words,
Nbhd(X) = {(x, Ω) ∈ X × Top(X); x ∈ Ω}
= {(x, Ω) ∈ X × Top(X); Ω ∈ Topx(X)}.
31.3.15 Remark: A set is open if every point has a neighbourhood included in the set.
Theorem 31.3.16 gives a clue as to why open sets are called open. The open sets are those for which every
point has a neighbourhood which is included in the set. In other words, no point of an open set has outside
points in every neighbourhood. Such a point would be incapable of isolating itself from the outside. So it
would be on the boundary of the set, not within the inside.
A discrete topological space is a topological space (X, T ) such that T is the discrete topology on X.
Proof: To show that the trivial topology is a topology, let X be a set and TX = {∅, X}. Then {∅, X} ⊆ TX. Suppose that Ω1, Ω2 ∈ TX. Then Ω1 ∩ Ω2 equals either ∅ or X. So Ω1 ∩ Ω2 ∈ TX. Suppose that C ∈ ℙ(TX). Then ∪C = ∅ if X ∉ C, and ∪C = X if X ∈ C. So ∪C ∈ TX. Hence TX is a topology on X.
To show that the discrete topology is a topology, let X be a set and TX = ℙ(X). Then {∅, X} ⊆ TX. Suppose that Ω1, Ω2 ∈ TX. Then Ω1 ∩ Ω2 ⊆ Ω1 ⊆ X. So Ω1 ∩ Ω2 ∈ ℙ(X) = TX. Suppose that C ∈ ℙ(TX). Then C ⊆ ℙ(X). So Ω ⊆ X for all Ω ∈ C. So ∪C ⊆ X. Therefore ∪C ∈ ℙ(X) = TX. Hence TX is a topology on X.
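The two constructions in this proof can be sketched for a small finite set (illustrative only; the power set is enumerated with itertools):

```python
from itertools import chain, combinations

# The two extreme topologies on a small set X: the trivial topology
# {emptyset, X} and the discrete topology P(X).
X = frozenset({1, 2, 3})
trivial = {frozenset(), X}
discrete = {frozenset(c) for c in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))}

print(len(trivial), len(discrete))   # 2 8
# Closure checks (binary operations suffice for these finite examples):
print(all(A & B in discrete and A | B in discrete for A in discrete for B in discrete))  # True
print(all(A & B in trivial and A | B in trivial for A in trivial for B in trivial))      # True
```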
31.3.22 Remark: Standard terminology for smaller and larger topologies on a given set.
When more than one topology is being discussed on a single set, one often compares the relative strength
or weakness of the topologies. It is often true that if a given topology has a property, then all stronger, or
weaker, topologies also have that property. Therefore knowing the relative strength or weakness of topologies
is often useful for proving properties more easily.
In contradiction to standard English, topologies T1 and T2 such that T1 = T2 are said to be both weaker and stronger than each other. It is simpler to adopt this convention than to say, for instance, that T1 is "weaker than or equal to" T2. For any set X, the trivial topology {∅, X} is clearly weaker than all topologies on X, and the discrete topology ℙ(X) is stronger than all topologies on X.
31.3.23 Definition: A topology T1 on a set X is said to be weaker than a topology T2 on X if T1 ⊆ T2. T1 is said to be stronger than T2 if T1 ⊇ T2.
31.3.24 Remark: Alternative terminology for weak and strong topologies.
The meanings of the adjectives "weak" and "strong" are not instantly clear in the context of topologies on sets. The respective adjectives "coarse" and "fine" would be much clearer.
O (Fraktur O) for the set of open sets and F (Fraktur F) for the set of closed sets. But the Fraktur font is
difficult to write and read. The non-standard Notation 31.4.4 uses an over-bar to indicate the set of closed
sets of a topological space (X, T ) by analogy with Notations 31.3.4 and 31.8.8.
31.4.4 Notation: The set of closed sets for a given topological space.
Top̄(X), for a topological space X < (X, T ), denotes the set of closed sets in X. In other words,
Top̄(X) = {F ∈ ℙ(X); X \ F ∈ Top(X)}
= {X \ Ω; Ω ∈ Top(X)}
= {X \ Ω; Ω ∈ T }.
Top̄x(X) denotes the set of closed sets in a topological space X < (X, T ) which contain a point x ∈ X. That is, Top̄x(X) = {F ∈ Top̄(X); x ∈ F }.
31.4.5 Theorem: Closure of the set of all closed sets under finite unions and arbitrary intersections.
Let X be a topological space.
(i) ∀C ∈ ℙ0(Top̄(X)), ∪C ∈ Top̄(X). (The union of any finite set of closed sets is closed.)
(ii) ∀C ∈ ℙ1(Top̄(X)), ∩C ∈ Top̄(X). (The intersection of any non-empty set of closed sets is closed.)
Proof: To prove part (i), let C be a non-empty finite set of closed sets in a topological space X. Let C′ = {X \ K; K ∈ C}. By Definition 31.4.1, ∀Ω ∈ C′, Ω ∈ Top(X). By Theorem 31.3.7, ∩C′ ∈ Top(X). But ∪C = X \ (∩C′). So the union of C is closed by Definition 31.4.1. If C is empty, ∪C = ∅, which is a closed set.
For part (ii), let C be a non-empty set of closed sets in a topological space X. Let C′ = {X \ K; K ∈ C}. By Definition 31.4.1, ∀Ω ∈ C′, Ω ∈ Top(X). By Definition 31.3.2 (iii), ∪C′ ∈ Top(X). But ∩C = X \ (∪C′).
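The open/closed duality used in this proof can be checked on a small assumed example: the closed sets of a finite topological space are the complements of the open sets, and they are closed under finite unions and intersections, as Theorem 31.4.5 states.

```python
# Closed sets as complements of open sets in a finite topological space.
# The topology T is an assumed example, not from the book.
X = frozenset({1, 2, 3, 4})
T = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
closed_sets = {X - U for U in T}

print(all(F | G in closed_sets for F in closed_sets for G in closed_sets))  # True
print(all(F & G in closed_sets for F in closed_sets for G in closed_sets))  # True
print(X in closed_sets and frozenset() in closed_sets)                      # True
```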
31.4.6 Remark: Closed sets contain all of their limit points and isolated points.
Theorem 31.4.7 gives some idea of why closed sets are called closed. In the terminology of limit points
in Definition 31.10.2 and isolated points in Definition 31.10.8, Theorem 31.4.7 (i) states in essence that a
closed set contains all of its limit points and isolated points. Theorem 31.4.7 (ii) states that the closed sets
are precisely those which have the property in part (i). Theorem 31.4.7 is the dual of Theorem 31.3.16.
has a finite range, the topology on the range-space puts constraints on the topology of the domain-space.
The smallest topology Top(X) = {∅, X} on any set X is the trivial topology in Definition 31.3.18. The largest topology Top(X) = ℙ(X) = {S; S ⊆ X} on any set X is the discrete topology in Definition 31.3.19.
The interesting thing is to determine what the other possibilities are for the topology on any given set X.
31.5.3 Remark: Without loss of generality, let the finite sets be sets of integers.
When considering the possible topologies on countable sets of points, it is convenient to consider only sets
whose points are integers. Only the cardinality of the point set matters for this task.
[Figure 31.5.1: the four topologies 2a, 2b, 2c, 2d on the two-point set {1, 2}.]
Topology 2a is the trivial topology. Topology 2d is the discrete topology. Topologies 2b and 2c are equivalent under a permutation of the point set. So there is only one interesting topology on a two-point set. The four possible topologies (amongst the 2^(2^2) = 16 subsets of ℙ({1, 2})) are illustrated in Figure 31.5.1.
Since ∅ and X are always elements of a topology on a set X, it makes sense to ignore them. Similarly, the set brackets are a distraction. So the topologies may be abbreviated as in the right column of the table.
31.5.6 Example: Nine unique topologies are possible on a three-element set, modulo permutations.
On a three-element set X = {1, 2, 3}, there are 9 unique topologies. The other topologies are obtained by permuting the point set.

       topology               multiplicity
  3a   ∅                      1
  3b   1                      3
  3c   12                     3
  3d   12, 3                  3
  3e   1, 12                  6
  3f   1, 2, 12               3
  3g   2, 12, 23              3
  3h   1, 2, 12, 23           6
  3i   1, 2, 3, 12, 13, 23    1
       total                  29
Topology 3a is the trivial topology. Topology 3i is the discrete topology. Including all of the permutations of the point set, there are 29 possible topologies amongst the 2^(2^3) = 256 subsets of ℙ({1, 2, 3}). Topologies 3a to 3h are illustrated in Figure 31.5.2.
[Figure 31.5.2: diagrams of topologies 3a to 3h on the point set {1, 2, 3}.]
T1 () T2 (1) T3 (2) T4 (12) T5 (1,12) T6 (2,12) T7 (1,2,12) T8 (3) T9 (13) T10 (1,13) T11 (3,13) T12 (1,3,13) T13 (23) T14 (2,23)
T15 (3,23) T16 (2,3,23) T17 (123) T18 (1,123) T19 (2,123) T20 (12,123) T21 (1,12,123) T22 (2,12,123) T23 (1,2,12,123) T24 (3,123)
T25 (12,3,123) T26 (13,123) T27 (1,13,123) T28 (2,13,123) T29 (1,12,13,123) T30 (1,2,12,13,123) T31 (3,13,123) T32 (1,3,13,123)
T33 (1,12,3,13,123) T34 (23,123) T35 (1,23,123) T36 (2,23,123) T37 (2,12,23,123) T38 (1,2,12,23,123) T39 (3,23,123)
T40 (2,3,23,123) T41 (2,12,3,23,123) T42 (3,13,23,123) T43 (1,3,13,23,123) T44 (2,3,13,23,123) T45 (1,2,12,3,13,23,123) T46 (4)
T47 (123,4) T48 (14) T49 (1,14) T50 (23,14) T51 (1,123,14) T52 (1,23,123,14) T53 (4,14) T54 (1,4,14) T55 (1,123,4,14) T56 (24)
T57 (2,24) T58 (13,24) T59 (2,123,24) T60 (2,13,123,24) T61 (4,24) T62 (2,4,24) T63 (2,123,4,24) T64 (124) T65 (1,124) T66 (2,124)
T67 (12,124) T68 (1,12,124) T69 (2,12,124) T70 (1,2,12,124) T71 (3,124) T72 (1,13,124) T73 (1,3,13,124) T74 (2,23,124)
T75 (2,3,23,124) T76 (12,123,124) T77 (1,12,123,124) T78 (2,12,123,124) T79 (1,2,12,123,124) T80 (12,3,123,124)
T81 (1,12,13,123,124) T82 (1,2,12,13,123,124) T83 (1,12,3,13,123,124) T84 (2,12,23,123,124) T85 (1,2,12,23,123,124)
T86 (2,12,3,23,123,124) T87 (1,2,12,3,13,23,123,124) T88 (4,124) T89 (12,4,124) T90 (12,123,4,124) T91 (14,124) T92 (1,14,124)
T93 (2,14,124) T94 (1,12,14,124) T95 (1,2,12,14,124) T96 (2,23,14,124) T97 (1,12,123,14,124) T98 (1,2,12,123,14,124)
T99 (1,2,12,23,123,14,124) T100 (4,14,124) T101 (1,4,14,124) T102 (1,12,4,14,124) T103 (1,12,123,4,14,124) T104 (24,124)
T105 (1,24,124) T106 (2,24,124) T107 (2,12,24,124) T108 (1,2,12,24,124) T109 (1,13,24,124) T110 (2,12,123,24,124)
T111 (1,2,12,123,24,124) T112 (1,2,12,13,123,24,124) T113 (4,24,124) T114 (2,4,24,124) T115 (2,12,4,24,124)
T116 (2,12,123,4,24,124) T117 (4,14,24,124) T118 (1,4,14,24,124) T119 (2,4,14,24,124) T120 (1,2,12,4,14,24,124)
T121 (1,2,12,123,4,14,24,124) T122 (34) T123 (12,34) T124 (3,34) T125 (3,123,34) T126 (12,3,123,34) T127 (4,34) T128 (3,4,34)
T129 (3,123,4,34) T130 (4,124,34) T131 (12,4,124,34) T132 (3,4,124,34) T133 (12,3,123,4,124,34) T134 (134) T135 (1,134)
T136 (2,134) T137 (1,12,134) T138 (1,2,12,134) T139 (3,134) T140 (13,134) T141 (1,13,134) T142 (3,13,134) T143 (1,3,13,134)
T144 (3,23,134) T145 (2,3,23,134) T146 (13,123,134) T147 (1,13,123,134) T148 (2,13,123,134) T149 (1,12,13,123,134)
T150 (1,2,12,13,123,134) T151 (3,13,123,134) T152 (1,3,13,123,134) T153 (1,12,3,13,123,134) T154 (3,13,23,123,134)
T155 (1,3,13,23,123,134) T156 (2,3,13,23,123,134) T157 (1,2,12,3,13,23,123,134) T158 (4,134) T159 (13,4,134) T160 (13,123,4,134)
T161 (14,134) T162 (1,14,134) T163 (3,14,134) T164 (1,13,14,134) T165 (1,3,13,14,134) T166 (3,23,14,134) T167 (1,13,123,14,134)
T168 (1,3,13,123,14,134) T169 (1,3,13,23,123,14,134) T170 (4,14,134) T171 (1,4,14,134) T172 (1,13,4,14,134)
T173 (1,13,123,4,14,134) T174 (4,24,134) T175 (2,4,24,134) T176 (13,4,24,134) T177 (2,13,123,4,24,134) T178 (14,124,134)
T179 (1,14,124,134) T180 (2,14,124,134) T181 (1,12,14,124,134) T182 (1,2,12,14,124,134) T183 (3,14,124,134)
T184 (1,13,14,124,134) T185 (1,3,13,14,124,134) T186 (2,3,23,14,124,134) T187 (1,12,13,123,14,124,134)
T188 (1,2,12,13,123,14,124,134) T189 (1,12,3,13,123,14,124,134) T190 (1,2,12,3,13,23,123,14,124,134) T191 (4,14,124,134)
T192 (1,4,14,124,134) T193 (1,12,4,14,124,134) T194 (1,13,4,14,124,134) T195 (1,12,13,123,4,14,124,134) T196 (4,14,24,124,134)
T197 (1,4,14,24,124,134) T198 (2,4,14,24,124,134) T199 (1,2,12,4,14,24,124,134) T200 (1,13,4,14,24,124,134)
T201 (1,2,12,13,123,4,14,24,124,134) T202 (34,134) T203 (1,34,134) T204 (1,12,34,134) T205 (3,34,134) T206 (3,13,34,134)
T207 (1,3,13,34,134) T208 (3,13,123,34,134) T209 (1,3,13,123,34,134) T210 (1,12,3,13,123,34,134) T211 (4,34,134) T212 (3,4,34,134)
T213 (3,13,4,34,134) T214 (3,13,123,4,34,134) T215 (4,14,34,134) T216 (1,4,14,34,134) T217 (3,4,14,34,134)
T218 (1,3,13,4,14,34,134) T219 (1,3,13,123,4,14,34,134) T220 (4,14,124,34,134) T221 (1,4,14,124,34,134)
T222 (1,12,4,14,124,34,134) T223 (3,4,14,124,34,134) T224 (1,3,13,4,14,124,34,134) T225 (1,12,3,13,123,4,14,124,34,134)
T226 (234) T227 (1,234) T228 (2,234) T229 (2,12,234) T230 (1,2,12,234) T231 (3,234) T232 (3,13,234) T233 (1,3,13,234) T234 (23,234)
T235 (2,23,234) T236 (3,23,234) T237 (2,3,23,234) T238 (23,123,234) T239 (1,23,123,234) T240 (2,23,123,234)
T241 (2,12,23,123,234) T242 (1,2,12,23,123,234) T243 (3,23,123,234) T244 (2,3,23,123,234) T245 (2,12,3,23,123,234)
T246 (3,13,23,123,234) T247 (1,3,13,23,123,234) T248 (2,3,13,23,123,234) T249 (1,2,12,3,13,23,123,234) T250 (4,234)
T251 (23,4,234) T252 (23,123,4,234) T253 (4,14,234) T254 (1,4,14,234) T255 (23,4,14,234) T256 (1,23,123,4,14,234) T257 (24,234)
T258 (2,24,234) T259 (3,24,234) T260 (3,13,24,234) T261 (2,23,24,234) T262 (2,3,23,24,234) T263 (2,23,123,24,234)
T264 (2,3,23,123,24,234) T265 (2,3,13,23,123,24,234) T266 (4,24,234) T267 (2,4,24,234) T268 (2,23,4,24,234)
T269 (2,23,123,4,24,234) T270 (24,124,234) T271 (1,24,124,234) T272 (2,24,124,234) T273 (2,12,24,124,234)
T274 (1,2,12,24,124,234) T275 (3,24,124,234) T276 (1,3,13,24,124,234) T277 (2,23,24,124,234) T278 (2,3,23,24,124,234)
T279 (2,12,23,123,24,124,234) T280 (1,2,12,23,123,24,124,234) T281 (2,12,3,23,123,24,124,234)
T282 (1,2,12,3,13,23,123,24,124,234) T283 (4,24,124,234) T284 (2,4,24,124,234) T285 (2,12,4,24,124,234) T286 (2,23,4,24,124,234)
T287 (2,12,23,123,4,24,124,234) T288 (4,14,24,124,234) T289 (1,4,14,24,124,234) T290 (2,4,14,24,124,234)
T291 (1,2,12,4,14,24,124,234) T292 (2,23,4,14,24,124,234) T293 (1,2,12,23,123,4,14,24,124,234) T294 (34,234) T295 (2,34,234)
T296 (2,12,34,234) T297 (3,34,234) T298 (3,23,34,234) T299 (2,3,23,34,234) T300 (3,23,123,34,234) T301 (2,3,23,123,34,234)
T302 (2,12,3,23,123,34,234) T303 (4,34,234) T304 (3,4,34,234) T305 (3,23,4,34,234) T306 (3,23,123,4,34,234) T307 (4,24,34,234)
T308 (2,4,24,34,234) T309 (3,4,24,34,234) T310 (2,3,23,4,24,34,234) T311 (2,3,23,123,4,24,34,234) T312 (4,24,124,34,234)
T313 (2,4,24,124,34,234) T314 (2,12,4,24,124,34,234) T315 (3,4,24,124,34,234) T316 (2,3,23,4,24,124,34,234)
T317 (2,12,3,23,123,4,24,124,34,234) T318 (34,134,234) T319 (1,34,134,234) T320 (2,34,134,234) T321 (1,2,12,34,134,234)
T322 (3,34,134,234) T323 (3,13,34,134,234) T324 (1,3,13,34,134,234) T325 (3,23,34,134,234) T326 (2,3,23,34,134,234)
T327 (3,13,23,123,34,134,234) T328 (1,3,13,23,123,34,134,234) T329 (2,3,13,23,123,34,134,234)
T330 (1,2,12,3,13,23,123,34,134,234) T331 (4,34,134,234) T332 (3,4,34,134,234) T333 (3,13,4,34,134,234) T334 (3,23,4,34,134,234)
T335 (3,13,23,123,4,34,134,234) T336 (4,14,34,134,234) T337 (1,4,14,34,134,234) T338 (3,4,14,34,134,234)
T339 (1,3,13,4,14,34,134,234) T340 (3,23,4,14,34,134,234) T341 (1,3,13,23,123,4,14,34,134,234) T342 (4,24,34,134,234)
T343 (2,4,24,34,134,234) T344 (3,4,24,34,134,234) T345 (3,13,4,24,34,134,234) T346 (2,3,23,4,24,34,134,234)
T347 (2,3,13,23,123,4,24,34,134,234) T348 (4,14,24,124,34,134,234) T349 (1,4,14,24,124,34,134,234)
T350 (2,4,14,24,124,34,134,234) T351 (1,2,12,4,14,24,124,34,134,234) T352 (3,4,14,24,124,34,134,234)
T353 (1,3,13,4,14,24,124,34,134,234) T354 (2,3,23,4,14,24,124,34,134,234) T355 (1,2,12,3,13,23,123,4,14,24,124,34,134,234)
Figure 31.5.3 Example 31.5.8: List of all topologies on the point set {1, 2, 3, 4}
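The counts in Example 31.5.6 and Figure 31.5.3 can be checked by brute force over all subsets of ℙ(X). A sketch (for finite collections, closure under binary intersections and unions suffices, by the induction of Theorem 31.3.7):

```python
from itertools import chain, combinations

# Brute-force count of all topologies on a labelled n-point set.
def count_topologies(n):
    X = frozenset(range(1, n + 1))
    subsets = [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(X), r) for r in range(n + 1))]
    def is_topology(T):
        return (frozenset() in T and X in T
                and all(A & B in T for A in T for B in T)
                and all(A | B in T for A in T for B in T))
    return sum(is_topology(set(C))
               for r in range(len(subsets) + 1)
               for C in combinations(subsets, r))

print(count_topologies(2))  # 4, as in Figure 31.5.1
print(count_topologies(3))  # 29, as in Example 31.5.6
# The same brute force gives count_topologies(4) == 355, matching the list
# T1..T355 in Figure 31.5.3, but it takes noticeably longer to run.
```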
31.5.9 Remark: The dual of a topology is a valid topology on the same set.
In the case of finite point sets X, there is a simple duality between the open sets and closed sets because an
arbitrary union of sets in a topology on X means a finite union of sets when X is finite.
31.6. Relative topology
31.6.1 Remark: The topology induced on a subset by the topology on the whole set.
The standard terminology "relative topology" in Definition 31.6.2 is not as clear as it could be. It would have been better to call it the "topology induced by T on S", or the "topology induced on S by the topology T", or something like that. It is important to know where the topology came from. Therefore the slightly non-standard word "induced" is applied in Definition 31.6.2.
31.6.2 Definition: The relative topology on a subset S of a set X, induced by a topology T on X, is the set {Ω ∩ S; Ω ∈ T }.
Proof: Let T be a topology on X. Let S ⊆ X. Let T ′ = {Ω ∩ S; Ω ∈ T } ⊆ ℙ(S) denote the relative topology induced by T on S. Then ∅ = ∅ ∩ S ∈ T ′ because ∅ ∈ T , and S = X ∩ S ∈ T ′ because X ∈ T .
Let Ω′1, Ω′2 ∈ T ′. Then Ω′1 = Ω1 ∩ S and Ω′2 = Ω2 ∩ S for some Ω1, Ω2 ∈ T . So Ω1 ∩ Ω2 ∈ T since T is a topology. So Ω′1 ∩ Ω′2 = (Ω1 ∩ S) ∩ (Ω2 ∩ S) = (Ω1 ∩ Ω2) ∩ S ∈ T ′. So T ′ is closed under binary intersections.
Let C′ ∈ ℙ(T ′). Then ∀Ω′ ∈ C′, ∃Ω ∈ T, Ω′ = Ω ∩ S. (Choosing a single set Ω for each Ω′ ∈ C′ at this point in the proof would risk invoking the axiom of choice. This is avoided by using all possible sets Ω.) Let C = {Ω ∈ T ; ∃Ω′ ∈ C′, Ω′ = Ω ∩ S}. Then ∀Ω′ ∈ ℙ(S), (Ω′ ∈ C′ ⇔ (∃Ω ∈ C, Ω′ = Ω ∩ S)). But ∪C ∈ T because T is a topology. So
∪C′ = ∪{Ω′; Ω′ ∈ C′}
= ∪{Ω′; Ω′ ∈ C′ and ∃Ω ∈ T, Ω′ = Ω ∩ S}
= ∪{Ω′ ∈ ℙ(S); Ω′ ∈ C′ and ∃Ω ∈ T, Ω′ = Ω ∩ S}
= ∪{Ω′ ∈ ℙ(S); ∃Ω ∈ C, Ω′ = Ω ∩ S}
= ∪{Ω ∩ S; Ω ∈ C}
= (∪{Ω; Ω ∈ C}) ∩ S
= (∪C) ∩ S.
Therefore ∪C′ ∈ T ′. Hence T ′ is a topology on S by Definition 31.3.2.
31.6.5 Theorem: Let T be a topology on a set X. Let S ∈ T. Let T′ be the relative topology induced by T on S. Then T′ = T ∩ P(S).
Proof: Let T be a topology on a set X. Let S ∈ T. (In other words, let S be an open set in (X, T).) Let T′ be the relative topology induced by T on S. Then T′ = {Ω ∩ S; Ω ∈ T} by Definition 31.6.2. Suppose that G ∈ T′. Then G = Ω ∩ S for some Ω ∈ T. So G ∈ T because T is closed under binary intersections. But G ∈ P(S) because G ⊆ S. So G ∈ T ∩ P(S). Therefore T′ ⊆ T ∩ P(S). Now suppose that G ∈ T ∩ P(S). Let Ω = G. Then G = Ω ∩ S and Ω ∈ T. So G ∈ T′. Therefore T ∩ P(S) ⊆ T′. Hence T′ = T ∩ P(S).
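Theorem 31.6.5 can likewise be checked directly on a small invented example (not from the book), with S chosen to be open in X:

```python
# Invented example: when S is open in X, the relative topology on S
# coincides with the open sets of X that are subsets of S (Theorem 31.6.5).
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
S = frozenset({1, 2})
assert S in T                                   # S is open in (X, T)

T_rel = {O & S for O in T}                      # Definition 31.6.2
T_cap_powerset = {O for O in T if O <= S}       # T intersected with P(S)
print(T_rel == T_cap_powerset)  # True
```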
31.6.6 Theorem: Let T be a topology on a set X. Let S ∈ T. Let T′ be the relative topology induced by T on S. Then T′ ⊆ T. In other words, all sets which are open in the relative topological space (S, T′) are also open in the topological space (X, T).
Proof: The relation T′ ⊆ T follows from the equality T′ = T ∩ P(S) in Theorem 31.6.5.
(i) If S ∈ Top(X) has the relative topology from X on S, then Top(S) ⊆ Top(X).
(ii) If S ∈ P(X) has the relative topology from X on S, then T̅o̅p̅(S) = {F ∩ S; F ∈ T̅o̅p̅(X)}.
(iii) If S ∈ T̅o̅p̅(X) has the relative topology from X on S, then T̅o̅p̅(S) ⊆ T̅o̅p̅(X).
31.7.2 Definition: An open cover of a subset S of a topological space (X, T) is a set C ∈ P(T) such that S ⊆ ∪C.
A finite open cover of a set S is an open cover C of S such that #(C) < ∞.
31.7.6 Theorem: Let (X, T) be a topological space and let C be an open cover of X. Then

    ∀S ∈ P(X), (S ∈ T ⇔ ∀Ω ∈ C, S ∩ Ω ∈ T).

Hence T = {∪{G(Ω); Ω ∈ C}; G : C → P(X) and ∀Ω ∈ C, G(Ω) ∈ T_Ω}, where T_Ω denotes the relative topology on Ω.
Proof: Suppose S ∈ T. Then S ∩ Ω ∈ T for all Ω ∈ C by the closure of T under finite intersections.
Let S ∈ P(X) and suppose that S ∩ Ω ∈ T for all Ω ∈ C. Then S = X ∩ S = (∪C) ∩ S = ∪{Ω ∩ S; Ω ∈ C} by Theorem 8.4.8 (iii). Hence S ∈ T by the closure of T under arbitrary unions.
To verify the claimed set-expression for T, let S = ∪{G(Ω); Ω ∈ C} for some function G : C → P(X) which satisfies ∀Ω ∈ C, G(Ω) ∈ T_Ω. Then G(Ω) = G′(Ω) ∩ Ω for some G′(Ω) ∈ T, for all Ω ∈ C, by Definition 31.6.2 for the relative topology T_Ω. Therefore G(Ω) ∈ T and G(Ω) ⊆ Ω for all Ω ∈ C. Hence S ∈ T.
Conversely, suppose that S ∈ T. Then ∀Ω ∈ C, S ∩ Ω ∈ T_Ω. Define G(Ω) = Ω ∩ S for all Ω ∈ C. Then S = X ∩ S = (∪C) ∩ S = ∪{Ω ∩ S; Ω ∈ C} by Theorem 8.4.8 (iii). So S = ∪{G(Ω); Ω ∈ C}. But ∀Ω ∈ C, G(Ω) ∈ T_Ω. Hence S ∈ {∪{G(Ω); Ω ∈ C}; G : C → P(X) and ∀Ω ∈ C, G(Ω) ∈ T_Ω}.
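The equivalence in Theorem 31.7.6 is easy to test exhaustively on a finite space. The cover and topology below are invented for illustration and are not from the book:

```python
from itertools import chain, combinations

# Invented example: a topology on X = {1, 2, 3, 4} and an open cover C.
X = frozenset({1, 2, 3, 4})
T = {frozenset(s) for s in [(), (3,), (1, 2), (3, 4), (1, 2, 3), (1, 2, 3, 4)]}
C = [frozenset({1, 2}), frozenset({3, 4})]
assert frozenset(chain.from_iterable(C)) == X   # C really covers X

def powerset(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

# S is open in X iff every patch S & O (for O in C) is open in X.
ok = all((S in T) == all((S & O) in T for O in C) for S in powerset(X))
print(ok)  # True
```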
31.8.2 Definition: The (topological) interior of a set S in a topological space (X, T) is the union of all open sets in (X, T) which are included in the set S. In other words, the interior of S is the set

    ∪{Ω ∈ T; Ω ⊆ S} = ∪(T ∩ P(S)).

31.8.3 Notation: Int(S) denotes the interior of a set S in a topological space (X, T).
31.8.4 Definition: The interior operator of a topological space (X, T) is the map Int : P(X) → T, where Int(S) denotes the interior of subsets S of X as in Definition 31.8.2 and Notation 31.8.3.
31.8.7 Definition: The (topological) closure of a set S in a topological space (X, T) is the intersection of all closed sets in (X, T) which include the set S. In other words, the closure of S is the set

    ∩{X \ Ω; Ω ∈ T and S ⊆ X \ Ω} = ∩{K ∈ P(X); X \ K ∈ T and S ⊆ K}
                                   = ∩{K ∈ T̅o̅p̅(X); S ⊆ K}.

31.8.8 Notation: S̄ denotes the closure of a set S with respect to an implicit topology.
31.8.9 Definition: The closure operator of a topological space (X, T) is the map from P(X) to T̅o̅p̅(X) which is defined by S ↦ S̄ for S ∈ P(X).
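For a finite topological space, the union and intersection in Definitions 31.8.2 and 31.8.7 can be computed literally. The three-point topology below is an invented illustration, not an example from the book:

```python
from itertools import chain

# Invented example: a topology on X = {1, 2, 3}.
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def interior(S):
    """Definition 31.8.2: union of all open subsets of S."""
    return frozenset(chain.from_iterable(O for O in T if O <= S))

def closure(S):
    """Definition 31.8.7: intersection of all closed supersets of S."""
    result = X
    for O in T:
        K = X - O              # closed sets are complements of open sets
        if S <= K:
            result &= K
    return result

S = frozenset({1, 3})
print(sorted(interior(S)))  # [1]
print(sorted(closure(S)))   # [1, 2, 3]
```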
31.8.10 Remark: Survey of notations for interior, closure and boundary of sets in a topological space.
The notation S̄ for the closure of a set S is the most popular, although it is inconvenient for complicated set-expressions. The notation Sᵃ is used by some authors. (The letter a may be mnemonic for adherent set, but Yosida [161] states that it comes from the German phrase for closure: abgeschlossene Hülle.)
Table 31.8.1 lists some notations which are found in the literature for the topological interior and closure of sets. (Also included are some notations for the topological boundary of a set, which is defined in Section 31.9.)
31.8.11 Remark: Box diagram for relations between the interior and closure of a set.
Some of the relations between the interior and closure of a set are illustrated in Figure 31.8.2.
[Figure: X partitioned into Int(S), S̄ \ Int(S) and Int(X \ S), with S on the left and X \ S on the right.]
Figure 31.8.2 Relations between the interior and closure of a set S
31.8.12 Theorem: Some elementary properties of the interior and closure operators.
Let S be a subset of a topological space X. Then:
(i) Int(S) ∈ Top(X).
(ii) Int(S) ⊆ S.
(iii) ∀x ∈ X, (x ∈ Int(S) ⇔ ∃Ω ∈ Top_x(X), Ω ⊆ S).
(iv) Int(S) = {x ∈ X; ∃Ω ∈ Top_x(X), Ω ⊆ S}.
(v) Int(S) = {x ∈ S; ∃Ω ∈ Top_x(X), Ω ⊆ S}.
(vi) S̄ ∈ T̅o̅p̅(X).
(vii) S ⊆ S̄.
[Table 31.8.1: notations in the literature for the interior, closure and boundary of a set S, with sources Gilmore [80], page 60; Reinhardt/Soeder [120], page 214; Adams [43], page 2; Lovelock/Rund [27], page 332; Treves [146], pages xv, 189; Bleecker [231], page 11; Johnsonbaugh/Pfaffenberger [93], page 136; Gilbarg/Trudinger [79], page 9; Nash/Sen [30], pages 14, 15; EDM2 [109], pages 1607, 1611. Typical notations include S°, Int S and Sⁱ for the interior; S̄, Sᵃ and Cl S for the closure; and ∂S, Bd S, Fr S, b(S) and bdry S for the boundary.]
(viii) Int(X \ S) = X \ S̄.
(ix) S̄ = X \ Int(X \ S).
(x) Int(S) ⊆ S̄.
(xi) Int(S) = X \ Clos(X \ S).
(xii) Clos(X \ S) = X \ Int(S).
Let S₁ and S₂ be subsets of a topological space X. Then:
(xiii) S₁ ⊆ S₂ ⇒ Int(S₁) ⊆ Int(S₂).
(xiv) S₁ ⊆ S₂ ⇒ S̄₁ ⊆ S̄₂.
Proof: Part (i) follows from Definition 31.3.2 because the interior of a set is the union of a set of open sets by Definition 31.8.2.
For part (ii), let C = {Ω ∈ Top(X); Ω ⊆ S}. Then ∀z ∈ C, z ⊆ S. So ∪C ⊆ S by Theorem 8.4.8 (xv). That is, Int(S) ⊆ S.
For part (iii), let x ∈ X. Suppose that x ∈ Int(S). Then x ∈ ∪{Ω ∈ Top(X); Ω ⊆ S} by Definition 31.8.2. So x ∈ Ω for some Ω ∈ Top(X) such that Ω ⊆ S. Therefore ∃Ω ∈ Top_x(X), Ω ⊆ S. For the converse, suppose that x ∈ X and ∃Ω ∈ Top_x(X), Ω ⊆ S. Then x ∈ Ω for some Ω ∈ Top(X) such that Ω ⊆ S. So x ∈ ∪{Ω ∈ Top(X); Ω ⊆ S}. Hence x ∈ Int(S) by Definition 31.8.2.
Part (iv) is a paraphrase of part (iii).
Part (v) follows from part (iv) by part (ii).
Part (vi) follows from Theorem 31.4.5 because the closure of a set is the intersection of a non-empty set of closed sets by Definition 31.8.7. (The set C of closed supersets of a set S ⊆ X is non-empty because X is a closed set in any topological space X. Therefore X ∈ C. See also Remark 31.8.6.)
For part (vii), let C = {K ∈ T̅o̅p̅(X); S ⊆ K}. Then ∀z ∈ C, z ⊇ S. So ∩C ⊇ S by Theorem 8.4.8 (xvi). That is, S ⊆ S̄.
Part (viii) follows from Int(X \ S) = ∪{Ω ∈ Top(X); Ω ⊆ X \ S} = ∪{Ω; Ω ∈ Top(X) and S ⊆ X \ Ω} = X \ ∩{X \ Ω; Ω ∈ Top(X) and S ⊆ X \ Ω}, which equals X \ S̄ by Definition 31.8.7.
Part (ix) follows from part (viii) and Theorem 8.2.5 (xv).
Part (x) follows from parts (ii) and (vii).
Part (xi) follows from part (viii) by substituting X \ S for S.
Part (xii) follows from part (xi).
Part (xiii) follows from Theorem 8.4.8 (i) because, by the transitivity of the set inclusion relation, S₁ ⊆ S₂ implies that {Ω ∈ Top(X); Ω ⊆ S₁} ⊆ {Ω ∈ Top(X); Ω ⊆ S₂}.
Part (xiv) follows from Theorem 8.4.8 (ii) because, by the transitivity of the set inclusion relation, S₁ ⊆ S₂ implies that {K ∈ T̅o̅p̅(X); K ⊇ S₁} ⊇ {K ∈ T̅o̅p̅(X); K ⊇ S₂}.
31.8.13 Theorem: Further elementary properties of the interior and closure operators.
Let S be a subset of a topological space X. Then:
(i) S ∈ Top(X) ⇔ Int(S) = S.
(ii) S ∈ Top(X) ⇔ S ⊆ Int(S).
(iii) S = ∅ ⇒ Int(S) = ∅.
(iv) S = X ⇔ Int(S) = X.
(v) S ∈ T̅o̅p̅(X) ⇔ S̄ = S.
(vi) S ∈ T̅o̅p̅(X) ⇔ S̄ ⊆ S.
(vii) S = ∅ ⇔ S̄ = ∅.
(viii) S = X ⇒ S̄ = X.
(ix) Int(Int(S)) = Int(S).
(x) Clos(S̄) = S̄.
(xi) Clos(Int(S)) ⊆ S̄.
(xii) Int(S) ⊆ Int(S̄).
Let S₁ and S₂ be subsets of a topological space X. Then:
(xiii) Int(S₁) ∪ Int(S₂) ⊆ Int(S₁ ∪ S₂).
(xiv) Int(S₁) ∩ Int(S₂) = Int(S₁ ∩ S₂).
(xv) Clos(S₁ ∪ S₂) = S̄₁ ∪ S̄₂.
(xvi) Clos(S₁ ∩ S₂) ⊆ S̄₁ ∩ S̄₂.
Proof: For part (i), let S be an open set in X. That is, S ∈ Top(X). Then S ∈ {Ω ∈ Top(X); Ω ⊆ S}. So S ⊆ ∪{Ω ∈ Top(X); Ω ⊆ S} = Int(S) by Definition 31.8.2. But Int(S) ⊆ S by Theorem 31.8.12 (ii). So Int(S) = S.
Now suppose that S ⊆ X and Int(S) = S. Then S is open by Theorem 31.8.12 (i). So part (i) is verified.
Part (ii) follows from part (i) and Theorem 31.8.12 (ii).
Part (iii) follows from part (i) and Definition 31.3.2 (i).
For part (iv), let S = X. Then Int(S) = X by part (i) and Definition 31.3.2 (i). Now suppose that Int(S) = X. Then S = X by Theorem 31.8.12 (ii).
For part (v), let S be a closed set in X. That is, S ∈ T̅o̅p̅(X). Then S ∈ {K ∈ T̅o̅p̅(X); K ⊇ S}. So S ⊇ ∩{K ∈ T̅o̅p̅(X); K ⊇ S} = S̄ by Definition 31.8.7. But S ⊆ S̄ by Theorem 31.8.12 (vii). So S̄ = S.
Now suppose that S ⊆ X and S̄ = S. Then S is closed by Theorem 31.8.12 (vi). So part (v) is verified.
Part (vi) follows from part (v) and Theorem 31.8.12 (vii).
For part (vii), let S = ∅. Then S̄ = ∅ by part (v) and Definitions 31.3.2 (i) and 31.4.1. Now suppose that S̄ = ∅. Then S = ∅ by Theorem 31.8.12 (vii).
Part (viii) follows from part (v) and Definitions 31.3.2 (i) and 31.4.1.
Part (ix) follows from part (i) and Theorem 31.8.12 (i).
Part (x) follows from part (v) and Theorem 31.8.12 (vi).
Part (xi) follows from Theorem 31.8.12 (ii, xiv).
Part (xii) follows from Theorem 31.8.12 (vii, xiii).
For part (xiii), Int(S₁) ⊆ Int(S₁ ∪ S₂) by Theorem 31.8.12 (xiii). Similarly, Int(S₂) ⊆ Int(S₁ ∪ S₂). Hence Int(S₁) ∪ Int(S₂) ⊆ Int(S₁ ∪ S₂) by Theorem 8.1.7 (viii).
For part (xiv), let S₁, S₂ ∈ P(X). Then Int(S₁ ∩ S₂) ⊆ Int(S₁) by Theorem 31.8.12 (xiii). Similarly Int(S₁ ∩ S₂) ⊆ Int(S₂). So Int(S₁ ∩ S₂) ⊆ Int(S₁) ∩ Int(S₂) by Theorem 8.1.7 (xi). Now assume x ∈ Int(S₁) ∩ Int(S₂). Then ∃Ω₁, Ω₂ ∈ Top_x(X), (Ω₁ ⊆ S₁ and Ω₂ ⊆ S₂) by Theorem 31.8.12 (iii). So Ω₁ ∩ Ω₂ ∈ Top_x(X) and Ω₁ ∩ Ω₂ ⊆ S₁ ∩ S₂ for such Ω₁ and Ω₂ by Theorem 8.1.9 (iii). So x ∈ Int(S₁ ∩ S₂) by Theorem 31.8.12 (iii). Therefore Int(S₁) ∩ Int(S₂) ⊆ Int(S₁ ∩ S₂). Hence Int(S₁) ∩ Int(S₂) = Int(S₁ ∩ S₂).
For part (xv), let S₁, S₂ ∈ P(X). Then S̄₁ ⊆ Clos(S₁ ∪ S₂) by Theorem 31.8.12 (xiv). Similarly S̄₂ ⊆ Clos(S₁ ∪ S₂). So S̄₁ ∪ S̄₂ ⊆ Clos(S₁ ∪ S₂) by Theorem 8.1.7 (viii). Now assume that x ∈ Clos(S₁ ∪ S₂). Then x ∈ X \ Int(X \ (S₁ ∪ S₂)) by Theorem 31.8.12 (ix). But Int(X \ (S₁ ∪ S₂)) = Int((X \ S₁) ∩ (X \ S₂)) = Int(X \ S₁) ∩ Int(X \ S₂) by part (xiv). Therefore x ∈ X \ (Int(X \ S₁) ∩ Int(X \ S₂)) = (X \ Int(X \ S₁)) ∪ (X \ Int(X \ S₂)) = S̄₁ ∪ S̄₂ by Theorem 31.8.12 (ix). So Clos(S₁ ∪ S₂) ⊆ S̄₁ ∪ S̄₂. Hence Clos(S₁ ∪ S₂) = S̄₁ ∪ S̄₂.
For part (xvi), Clos(S₁ ∩ S₂) ⊆ S̄₁ by Theorem 31.8.12 (xiv). Similarly, Clos(S₁ ∩ S₂) ⊆ S̄₂. Hence Clos(S₁ ∩ S₂) ⊆ S̄₁ ∩ S̄₂ by Theorem 8.1.7 (xi).
31.8.14 Remark: Non-existent general rules for interior of closure and closure of interior.
The rules Int(S̄) ⊆ S and S ⊆ Int(S̄) are not valid in general. For example, let S = Q in X = R with the usual topology on R. Then Int(S̄) = R ⊈ S. Let S = [0, 1] ⊆ R. Then Int(S̄) = (0, 1) ⊉ S.
Similarly, Clos(Int(S)) ⊆ S and S ⊆ Clos(Int(S)) are not general rules. Let S = (0, 1). Then Clos(Int(S)) = [0, 1] ⊈ S. Let S = Q. Then Clos(Int(S)) = ∅ ⊉ S.
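The failure of these rules can also be seen in a two-point space. The sketch below is an invented finite analogue, not the book's real-line examples; it uses the Sierpiński topology, in which Int(Clos(S)) escapes S:

```python
from itertools import chain

# Invented example: the Sierpinski topology on X = {0, 1}.
X = frozenset({0, 1})
T = {frozenset(), frozenset({1}), X}

def interior(S):
    # Union of all open subsets of S (Definition 31.8.2).
    return frozenset(chain.from_iterable(O for O in T if O <= S))

def closure(S):
    # Intersection of all closed supersets of S (Definition 31.8.7).
    result = X
    for O in T:
        K = X - O
        if S <= K:
            result &= K
    return result

S = frozenset({1})
print(sorted(closure(S)))         # [0, 1]
print(interior(closure(S)) <= S)  # False: Int(Clos(S)) is not a subset of S
```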
31.8.15 Theorem: Yet more elementary properties of the interior and closure operators.
Let S be a subset of a topological space X. Then:
(i) S̄ = X \ Int(X \ S).
(ii) Int(S) = X \ Clos(X \ S).
Proof: Part (i) is a restatement of Theorem 31.8.12 (ix). Part (ii) follows from part (i) by substituting X \ S for S and complementing both sides of the equation.
31.8.16 Theorem: Even more elementary properties of the interior and closure operators.
Let S be a subset of a topological space X. Then:
(i) An element x of X is an element of the interior of S in X if and only if ∃Ω ∈ Top_x(X), Ω ⊆ S.
In other words, Int(S) = {x ∈ X; ∃Ω ∈ Top_x(X), Ω ⊆ S}.
(ii) An element x of X is an element of the closure of S in X if and only if ∀Ω ∈ Top_x(X), Ω ∩ S ≠ ∅.
In other words, S̄ = {x ∈ X; ∀Ω ∈ Top_x(X), Ω ∩ S ≠ ∅}.
(iii) S ∈ Top(X) if and only if ∀x ∈ S, ∃Ω ∈ Top_x(X), Ω ⊆ S.
Proof: To show part (i), let x ∈ X and assume that ∃Ω ∈ Top_x(X), Ω ⊆ S. Then x ∈ Ω for some Ω ∈ Top(X) such that Ω ⊆ S. So x ∈ ∪{Ω ∈ Top(X); Ω ⊆ S}. Therefore x ∈ Int(S) by Definition 31.8.2.
Conversely, let x ∈ Int(S). Then x ∈ ∪{Ω ∈ Top(X); Ω ⊆ S} by Definition 31.8.2. So x ∈ Ω and Ω ⊆ S for some Ω ∈ Top(X). Therefore Ω ⊆ S for some Ω ∈ Top_x(X). In other words, ∃Ω ∈ Top_x(X), Ω ⊆ S.
For part (ii), let x ∈ X. Then by Definition 31.8.7,

    x ∈ S̄ ⇔ x ∈ ∩{X \ Ω; Ω ∈ T and S ⊆ X \ Ω}
          ⇔ x ∉ ∪{Ω; Ω ∈ T and S ⊆ X \ Ω}
          ⇔ ∀Ω ∈ Top(X), ¬(x ∈ Ω and S ⊆ X \ Ω)
          ⇔ ∀Ω ∈ Top(X), ¬(x ∈ Ω and Ω ∩ S = ∅)
          ⇔ ∀Ω ∈ Top(X), (x ∉ Ω or Ω ∩ S ≠ ∅)
          ⇔ ∀Ω ∈ Top(X), (x ∈ Ω ⇒ Ω ∩ S ≠ ∅)
          ⇔ ∀Ω ∈ Top_x(X), Ω ∩ S ≠ ∅.
31.8.17 Remark: Alternative expressions for the interior and closure of sets.
Definitions 31.8.2 and 31.8.7, and Theorem 31.8.16 (i, ii), may be summarised as follows.

    Int(S) = ∪{Ω ∈ Top(X); Ω ⊆ S}
           = {x ∈ X; ∃Ω ∈ Top_x(X), Ω ⊆ S},
    S̄ = ∩{X \ Ω; Ω ∈ Top(X) and S ⊆ X \ Ω}
      = ∩{K ∈ T̅o̅p̅(X); S ⊆ K}
      = {x ∈ X; ∀Ω ∈ Top_x(X), Ω ∩ S ≠ ∅}.
31.8.18 Remark: Interior-operator and closure-operator notations which indicate choice of topology.
Notations 31.8.3 and 31.8.8 require the topology and base set to be implied in the context. So there can be confusion if more than one topology is under consideration. As mentioned in Remark 31.3.5, a topological space (X, T) is fully determined by the set T. Thus Notations 31.8.19 and 31.8.20 fully determine the implicit point space X = ∪T by stating that T is a topology on X.
Notations 31.8.19 and 31.8.20 are applied in Theorem 31.8.21. The notation S̄ for the closure of a set does not easily permit the addition of a subscript to indicate the implied topology explicitly. Therefore the notation Clos_T(S) is used here for the closure of a set S with respect to a topology T.
31.8.19 Notation: Int_T(S) denotes the interior of a set S ∈ P(X) with respect to a topology T on X.
31.8.20 Notation: Clos_T(S) denotes the closure of a set S ∈ P(X) with respect to a topology T on X.
Clos(S) denotes the closure of a set S with respect to an implicit topology.
31.8.21 Theorem: The effect of topology strength on the interior and closure operators.
Let T₁ and T₂ be topologies on a set X such that T₁ ⊆ T₂. Then:
(i) Int_T₁(S) ⊆ Int_T₂(S) for all S ∈ P(X).
(ii) Clos_T₁(S) ⊇ Clos_T₂(S) for all S ∈ P(X).
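Theorem 31.8.21 can be verified exhaustively for a pair of nested finite topologies. Both topologies below are invented for illustration and are not examples from the book:

```python
from itertools import chain, combinations

X = frozenset({1, 2, 3})
T1 = {frozenset(), frozenset({1}), X}                                      # weaker
T2 = {frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2}), X}   # stronger
assert T1 <= T2

def interior(T, S):
    return frozenset(chain.from_iterable(O for O in T if O <= S))

def closure(T, S):
    result = X
    for O in T:
        K = X - O
        if S <= K:
            result &= K
    return result

def powerset(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

print(all(interior(T1, S) <= interior(T2, S) for S in powerset(X)))  # True
print(all(closure(T2, S) <= closure(T1, S) for S in powerset(X)))    # True
```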
31.8.23 Remark: The extremes of interior and closure operators for trivial and discrete topologies.
Theorem 31.8.21 says that strengthening the topology on a fixed set X makes interiors of sets larger, and makes closures of sets smaller. (The inequalities in Theorem 31.8.21 are illustrated in Figure 31.9.3.) In the extreme case of the strongest topology on X, namely the discrete topology P(X), both the interior and closure of S equal S itself. (This is discussed in more detail in Remark 31.11.11.) In the opposite extreme of the trivial topology T = {∅, X} on X, the interior of any set S in P(X) \ {∅, X} equals ∅, and the closure of such a set S equals X. (See Remark 31.9.16 for similar comments on the exterior and boundary of sets.)
31.8.25 Definition: A closed neighbourhood of a point x in a topological space X is any set F ∈ T̅o̅p̅(X) which satisfies ∃Ω ∈ Top_x(X), Ω ⊆ F.
31.9.2 Definition: The (topological) exterior of a set S in a topological space X is the union of all open sets in X which are included in the set X \ S. In other words, the exterior of S is the set

    ∪{Ω ∈ Top(X); Ω ⊆ X \ S} = ∪{Ω ∈ Top(X); Ω ∩ S = ∅}.
31.9.3 Notation: Ext(S) denotes the exterior of a set S in an implicit topological space.
31.9.4 Definition: The exterior operator of a topological space (X, T) is the map Ext : P(X) → T, where Ext(S) denotes the exterior of subsets S of X as in Definition 31.9.2 and Notation 31.9.3.
31.9.5 Definition: The (topological) boundary of a set S in a topological space X is the complement of the interior of S within the closure of S; in other words, S̄ \ Int(S).
31.9.6 Notation: Bdy(S) denotes the boundary of a set S with respect to an implicit topology.
∂S is an alternative notation for Bdy(S).
31.9.7 Definition: The boundary operator of a topological space (X, T) is the map Bdy : P(X) → T̅o̅p̅(X), where Bdy(S) denotes the boundary of subsets S of X as in Definition 31.9.5 and Notation 31.9.6.
31.9.9 Remark: Diagram for relations between interior, exterior, boundary and closure of a set.
The relations between the interior, exterior, boundary and closure of a set in a topological space are illustrated in Figure 31.9.1. Note particularly that, in general, the boundary Bdy(S) straddles the set S and its complement. The set S is open if and only if the intersection Bdy(S) ∩ S is empty. The set S is closed if and only if the intersection Bdy(S) ∩ (X \ S) is empty.
The entire boundary Bdy(S) is empty if S = ∅ or S = X. If Bdy(S) = ∅ and S ≠ ∅ and S ≠ X, then X is a disconnected topological space and the pair (S, X \ S) is a disconnection of X. (See Section 34.1 for connectedness definitions.)
[Figure: X partitioned into Int(S), Bdy(S) and Ext(S), with S̄ covering Int(S) and Bdy(S), S on the left and X \ S on the right.]
Figure 31.9.1 Relations between interior, exterior, boundary and closure of a set S
31.9.10 Theorem: Some elementary properties of the exterior, boundary, interior and closure operators.
Let S be a subset of a topological space X. Then:
(i) Bdy(S) ∩ Int(S) = ∅.
(ii) Bdy(S) ⊆ S̄.
(iii) S ∈ Top(X) ⇔ Bdy(S) ∩ S = ∅.
(iv) Ext(S) = Int(X \ S).
(xxix) Bdy(Int(S)) ⊆ Bdy(S).
(xxx) Bdy(S̄) ⊆ Bdy(S).
(xxxi) Bdy(Ext(S)) ⊆ Bdy(S).
(xxxii) Clos(Bdy(S)) = Bdy(S).
(xxxiii) Bdy(Bdy(S)) ⊆ Bdy(S).
(xxxiv) X \ S ∈ Top(X) ⇔ Ext(S) = X \ S.
(xxxv) X \ S ∈ Top(X) ⇔ (X \ S) ∩ Bdy(S) = ∅.
(xxxvi) Bdy(S) = ∅ ⇒ S ∈ Top(X).
(xxxvii) Bdy(S) = ∅ ⇒ X \ S ∈ Top(X).
(xxxviii) Bdy(S) = ∅ ⇔ (S ∈ Top(X) and X \ S ∈ Top(X)).
Let S₁ and S₂ be subsets of a topological space X. Then:
(xxxix) S₁ ⊆ S₂ ⇒ Ext(S₁) ⊇ Ext(S₂).
Part (vi) follows from part (iv) and Theorem 31.8.12 (viii).
Part (vii) follows from part (vi) and Theorem 8.2.5 (v).
Part (viii) follows from part (vii) and Theorem 31.8.12 (x).
Part (ix) follows from part (vii) and Definition 31.9.5.
Part (x) follows from part (iv) and Theorem 31.8.12 (ii).
Part (xi) follows from Definition 31.9.5 and Theorem 31.8.12 (xi).
Part (xii) follows from part (xi), Theorem 31.8.12 (vi) and Theorem 31.4.5 (ii).
For part (xiii), note that Bdy(S) = S̄ \ Int(S) = (X \ Int(X \ S)) \ Int(S) = X \ Int(S) \ Ext(S) by part (iv).
To show part (xiv), note that by part (xiii), Bdy(X \ S) = X \ Int(X \ S) \ Ext(X \ S) = X \ Ext(S) \ Int(S) by part (iv) and Theorem 8.2.5 (xv). This equals Bdy(S) by part (xiii).
Part (xv) follows from part (xiii) because Bdy(S) = X \ Int(S) \ Ext(S) = X \ (Int(S) ∪ Ext(S)).
Part (xvi) follows from parts (i), (viii) and (ix).
Part (xvii) follows from parts (xv) and (xvi).
For part (xviii), note that Bdy(S) ∪ Ext(S) = (X \ Int(S) \ Ext(S)) ∪ Ext(S) = X \ Int(S) by part (xvi). But by Theorem 31.8.12 (ix), Clos(X \ S) = X \ Int(S).
Part (xix) follows from parts (xviii) and (vi), and Theorems 31.8.12 (xii) and 31.8.13 (i).
Part (xx) follows from Definition 31.9.5 and Theorem 31.8.12 (x).
Part (xxi) follows from part (xx) and Theorem 31.8.12 (ii).
For part (xxii), Int(S) = S \ Bdy(S) by part (xxi). So S \ Int(S) = S \ (S \ Bdy(S)). So S \ Int(S) = S ∩ Bdy(S) by Theorem 8.2.5 (xiv).
Part (xxiii) follows from parts (iv) and (xiv).
Part (xxiv) follows from Definition 31.9.5 and Theorem 31.8.12 (x).
Part (xxv) follows from part (xxiv) and Theorem 31.8.12 (ii, vii).
For part (xxvi), note that Bdy(S) ⊆ S if and only if S̄ = S by part (xxv) and Theorem 8.1.7 (vii). But S̄ = S if and only if S ∈ T̅o̅p̅(X) by Theorem 31.8.13 (v). Hence S ∈ T̅o̅p̅(X) if and only if Bdy(S) ⊆ S.
Part (xxvii) follows from Ext(Ext(S)) = Int(X \ Ext(S)), by part (iv), which equals Int(S̄) by part (vi).
Part (xxviii) follows from part (v) and Theorem 31.8.13 (i).
Part (xxix) follows from Bdy(Int(S)) = Clos(Int(S)) \ Int(Int(S)) by Definition 31.9.5, which equals Clos(Int(S)) \ Int(S) by Theorem 31.8.13 (ix), which is a subset of S̄ \ Int(S) by Theorem 31.8.13 (xi), which equals Bdy(S).
For part (xxx), note that Bdy(S̄) = Clos(S̄) \ Int(S̄) by Definition 31.9.5, which equals S̄ \ Int(S̄) by Theorem 31.8.13 (x), which is a subset of S̄ \ Int(S) by Theorem 31.8.13 (xii), which equals Bdy(S).
For part (xxxi), note that Bdy(Ext(S)) = Bdy(X \ S̄), by part (vi), which equals Bdy(S̄) by part (xiv), and this is a subset of Bdy(S) by part (xxx).
Part (xxxii) follows from part (xii) and Theorem 31.8.13 (v).
For part (xxxiii), note that Bdy(Bdy(S)) ⊆ Clos(Bdy(S)) by Definition 31.9.5, but Clos(Bdy(S)) = Bdy(S) by part (xxxii).
For part (xxxiv), note that Ext(S) = Int(X \ S) by part (iv). Hence Ext(S) = X \ S ⇔ Int(X \ S) = X \ S ⇔ X \ S ∈ Top(X) by Theorem 31.8.13 (i).
For part (xxxv), note that (X \ S) \ Int(X \ S) = (X \ S) ∩ Bdy(X \ S) = (X \ S) ∩ Bdy(S) by parts (xxii) and (xiv). So X \ S ∈ Top(X) ⇔ (X \ S) ∩ Bdy(S) = ∅ by Theorem 31.8.13 (i).
Part (xxxvi) follows from part (xxi) and Theorem 31.8.13 (i).
Part (xxxvii) follows from parts (xxiii) and (xxxiv).
For part (xxxviii), Bdy(S) = ∅ ⇒ (S ∈ Top(X) and X \ S ∈ Top(X)) follows from parts (xxxvi) and (xxxvii). The converse follows from parts (iii) and (xxxv) because Bdy(S) = (Bdy(S) ∩ S) ∪ (Bdy(S) ∩ (X \ S)).
Part (xxxix) follows from part (iv), Theorem 31.8.12 (xiii) and Theorem 8.2.6 (xvi).
31.9.11 Remark: Every set is partitioned into its interior, boundary and exterior.
Theorem 31.9.10 (xv, xvi) implies that the set {Int(S), Bdy(S), Ext(S)} is a partition of the topological
space X for any subset S of X.
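This three-way partition can be checked exhaustively on a finite space. The topology below is an invented example, not from the book; the partition is taken in the usual loose sense that empty members are ignored:

```python
from itertools import chain, combinations

# Invented example: a topology on X = {1, 2, 3, 4}.
X = frozenset({1, 2, 3, 4})
T = {frozenset(s) for s in [(), (1,), (1, 2), (1, 2, 3, 4)]}

def interior(S):
    return frozenset(chain.from_iterable(O for O in T if O <= S))

def closure(S):
    result = X
    for O in T:
        K = X - O
        if S <= K:
            result &= K
    return result

def exterior(S):
    return interior(X - S)            # Theorem 31.9.10 (iv)

def boundary(S):
    return closure(S) - interior(S)   # Definition 31.9.5

def powerset(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

ok = all(
    interior(S) | boundary(S) | exterior(S) == X
    and not interior(S) & boundary(S)
    and not interior(S) & exterior(S)
    and not boundary(S) & exterior(S)
    for S in powerset(X)
)
print(ok)  # True
```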
31.9.12 Theorem: The 3-way partition of the empty and full sets.
Let X be a topological space. Then:
(i) Int(∅) = ∅, Bdy(∅) = ∅ and Ext(∅) = X.
(ii) Int(X) = X, Bdy(X) = ∅ and Ext(X) = ∅.
Proof: For part (i), Int(∅) = ∅ by Theorem 31.8.13 (iii), and Clos(∅) = ∅ by Theorem 31.8.13 (vii). Then Bdy(∅) = ∅ by Definition 31.9.5 and Ext(∅) = X by Theorem 31.9.10 (vi).
For part (ii), Int(X) = X by Theorem 31.8.13 (iv), Bdy(X) = ∅ by Theorem 31.9.10 (xxxviii), and Ext(X) = ∅ by Theorem 31.9.10 (x).
31.9.13 Remark: The open and closed portions of the boundary of a set.
For any set S in a topological space X, the points in Int(S) are always elements of S (by Theorem 31.8.12 (ii)), and the points in Ext(S) are always elements of X \ S (by Theorem 31.9.10 (x)). But the points of Bdy(S) may belong to either S or X \ S. (See Figure 31.9.2.)
[Figure: the boundary Bdy(S) split into its closed portion Bdy(S) ∩ S and its open portion Bdy(S) ∩ (X \ S), lying between S and X \ S.]
Figure 31.9.2 Open and closed portions of boundary of a set S
If S is open, all points of Bdy(S) belong to X \ S, whereas if S is closed, all points of Bdy(S) belong to S.
Therefore one may refer to the points of Bdy(S) which are elements of X \ S as the open portions of the
boundary, and the points of Bdy(S) which are elements of S as the closed portions of the boundary.
31.9.14 Remark: Ad-hoc notations for exterior and boundary operators to indicate choice of topology.
Notation 31.9.15 gives topology-dependent notations for the exterior and boundary of sets analogous to
Notation 31.8.19 for the interior of sets.
31.9.15 Notation: Ext_T(S) denotes the exterior of a set S with respect to a topological space (X, T).
Bdy_T(S) denotes the boundary of a set S with respect to a topological space (X, T).
31.9.16 Remark: The effects of topology strength on set exteriors and boundaries.
Theorem 31.9.17 says that strengthening the topology on a fixed set X makes set exteriors larger and set boundaries smaller. In the extreme case of the strongest topology on X, namely the discrete topology P(X), the exterior of S is equal to X \ S and the boundary of S is empty. In the opposite extreme of the trivial topology T = {∅, X} on X, the exterior of any set S in P(X) \ {∅, X} equals ∅, and the boundary of such a set S equals X. (See Remark 31.8.23 for similar comments on the interior and closure of sets.)
31.9.17 Theorem: Let T₁ and T₂ be topologies on a set X such that T₁ ⊆ T₂. Then:
(i) Ext_T₁(S) ⊆ Ext_T₂(S) for all S ∈ P(X).
(ii) Bdy_T₁(S) ⊇ Bdy_T₂(S) for all S ∈ P(X).
Proof: Part (i) follows from Theorem 31.8.21 (i) because Ext_T(S) = Int_T(X \ S) for T = T₁ and T₂ by Theorem 31.9.10 (iv).
Part (ii) follows from Theorem 31.8.21 and Theorem 31.9.10 (xiii).
[Figure: the partition of X into Int(S), Bdy(S) and Ext(S) for a weaker topology T₁ and for a stronger topology T₂; under T₂ the interior and exterior are larger and the boundary is smaller.]
Figure 31.9.3 Relation between topology strength and interior/boundary/exterior of sets
The decreasing (actually non-increasing) boundary set Bdy(S) for a fixed set S with respect to the strength of the topology may be thought of as a process of nibbling away of the boundary by the interior Int(S) and exterior Ext(S) as more open sets are added to the topology. When the topology is weak, many points have undecided status. That is, they are neither in the interior nor in the exterior. (Figure 31.9.2 in Remark 31.9.13 illustrates the undecided status of points in Bdy(S) which are in S or X \ S, but which are not allocated to either Int(S) or Ext(S).)
As the topology is strengthened, more and more points are decided as either interior or exterior points. The extreme cases, the trivial and discrete topologies, are illustrated in Figure 31.9.4, where almost all sets S in the trivial topology have Bdy(S) = X because no points in X are decided, whereas all sets S in the discrete topology have Bdy(S) = ∅ because all points in X are decided.
Proof: By Definition 31.3.19 for a discrete topological space, Top(X) = P(X). This implies part (i).
Part (ii) follows from part (i) and Definition 31.4.1.
Part (iii) follows from part (i) and Theorem 31.8.13 (i).
Part (iv) follows from part (iii) and Theorem 31.9.10 (iv).
Part (v) follows from parts (iii) and (iv) and Theorem 31.9.10 (xiii).
Part (vi) follows from part (ii) and Theorem 31.8.13 (v). (Or from part (iv) and Theorem 31.9.10 (vi).)
Proof: Part (i) is equivalent to Definition 31.3.18 for the trivial topology.
Part (ii) follows from part (i) and the definition of closed sets.
For part (iii), note that if S ∈ P(X) \ {X} and Ω ∈ Top(X) and Ω ⊆ S, then Ω = ∅.
Part (iv) follows from part (iii) and Theorem 31.9.10 (iv).
Part (v) follows from parts (iii) and (iv).
Part (vi) follows from part (iv) and Theorem 31.9.10 (vi).
[Figure: for the trivial (weakest) topology {∅, X}, Int(S) = ∅, Bdy(S) = X, Ext(S) = ∅ and S̄ = X; for the discrete (strongest) topology P(X), Int(S) = S, Bdy(S) = ∅, Ext(S) = X \ S and S̄ = S.]
Figure 31.9.4 Interior/boundary/exterior of sets including trivial and discrete extremes
Note that for the trivial topology on a non-empty set X, the label Int(S) = ∅ applies only if S ≠ X; the label Ext(S) = ∅ applies only if S ≠ ∅; the label Bdy(S) = X applies only if S ∉ {∅, X}; and the label S̄ = X applies only if S ≠ ∅. (See Remark 31.9.18 for related comments.)
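The two extremes can be computed side by side on a small invented example (not from the book), with S neither empty nor all of X:

```python
from itertools import chain, combinations

# Invented example: the trivial and discrete topologies on X = {1, 2, 3}.
X = frozenset({1, 2, 3})

def powerset(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

T_trivial = {frozenset(), X}
T_discrete = set(powerset(X))

def interior(T, S):
    return frozenset(chain.from_iterable(O for O in T if O <= S))

def closure(T, S):
    result = X
    for O in T:
        K = X - O
        if S <= K:
            result &= K
    return result

S = frozenset({1, 2})   # S is neither empty nor all of X
print(sorted(interior(T_trivial, S)), sorted(closure(T_trivial, S)))    # [] [1, 2, 3]
print(sorted(interior(T_discrete, S)), sorted(closure(T_discrete, S)))  # [1, 2] [1, 2]
```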
31.9.22 Remark: Diagram illustrating the effect of topology strength on boundary thickness.
Figure 31.9.5 illustrates the way in which boundary thickness of a fixed set S in a topological space X
decreases as the topology is strengthened. It is notable that both the weakest (i.e. trivial) topology and
the strongest (i.e. discrete) topology contain no information. The extremes of topology strength add no
information to the set S.
[Figure: a fixed set S in X under topologies of increasing strength: the weakest topology {∅, X} gives Bdy(S) = X; a weak topology gives a thick boundary; a strong topology gives a thin boundary; the strongest topology P(X) gives Bdy(S) = ∅.]
Figure 31.9.5 The influence of topology strength on boundary thickness
31.10.2 Definition: A limit point of a set S in a topological space X is a point x ∈ X which satisfies ∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅.
A limit point is also known as an accumulation point or a cluster point.
The limit set of a set S in a topological space X is the set of limit points of S.
31.10.3 Theorem: Let X be a topological space. Let S₁, S₂ ∈ P(X) satisfy S₁ ⊆ S₂.
(i) Let z be a limit point of S₁. Then z is a limit point of S₂.
(ii) Let A₁, A₂ be the limit sets of S₁, S₂ respectively. Then A₁ ⊆ A₂.
Proof: For part (i), let S₁, S₂ be subsets of a topological space X such that S₁ ⊆ S₂. Let z be a limit point of S₁. Then ∀Ω ∈ Top_z(X), Ω ∩ (S₁ \ {z}) ≠ ∅ by Definition 31.10.2. So ∀Ω ∈ Top_z(X), Ω ∩ (S₂ \ {z}) ≠ ∅.
Hence z is a limit point of S₂ by Definition 31.10.2.
Part (ii) follows from part (i) and Definition 31.10.2 for the limit set.
31.10.4 Theorem: A point a in a topological space X is a limit point of X if and only if {a} ∉ Top_a(X).
Proof: Let X be a topological space. Let a ∈ X. Then for all Ω ∈ Top_a(X), Ω ∩ (X \ {a}) ≠ ∅ if and only if Ω \ {a} ≠ ∅, which holds if and only if Ω ⊈ {a} by Theorem 8.2.5 (iv), which holds if and only if Ω ≠ ∅ and Ω ≠ {a}. But the case Ω = ∅ is excluded by the definition of Top_a(X). So a is a limit point of X if and only if ∀Ω ∈ Top_a(X), Ω ∩ (X \ {a}) ≠ ∅, which holds if and only if ∀Ω ∈ Top_a(X), Ω ≠ {a}, which means that {a} ∉ Top_a(X).
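Theorem 31.10.4 is easy to check point by point on a finite space. The topology below is an invented example, not from the book:

```python
# Invented example: a topology on X = {1, 2, 3} in which {1} is open
# but {2} and {3} are not.
X = frozenset({1, 2, 3})
T = {frozenset(s) for s in [(), (1,), (1, 2), (1, 2, 3)]}

def nbhds(x):
    """The open neighbourhoods Top_x(X) of the point x."""
    return [O for O in T if x in O]

def is_limit_point(x, S):
    # Definition 31.10.2: every open neighbourhood of x meets S \ {x}.
    return all(O & (S - {x}) for O in nbhds(x))

for a in sorted(X):
    print(a, is_limit_point(a, X), frozenset({a}) not in T)
# 1 False False
# 2 True True
# 3 True True
```

The two boolean columns agree at every point: a is a limit point of X exactly when {a} is not open.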
31.10.5 Remark: Limit points are uninteresting for finite neighbourhood-systems.
If a point x has only a finite number of neighbourhoods Ω ∈ Top_x(X), the point x can be a limit point only if there is at least one point y distinct from x which is in the intersection ∩Top_x(X) of all neighbourhoods of x. (See Figure 31.10.1.)
But then G = ∩Top_x(X) must be an element of Top_x(X) (i.e. an open neighbourhood of x) if the number of neighbourhoods is finite. It would therefore follow that {x, y} ⊆ G ⊆ Ω for all non-empty Ω ∈ Top_x(X). This would imply that the topology is extremely weak. Such a topology would not even have the extremely weak T₁ separation property in Definition 33.1.8. Therefore limit points are of real interest only when there are infinitely many neighbourhoods at each point. (A topological space where each point has only a finite number of neighbourhoods could be called a first finite topological space by analogy with the first countable spaces in Definition 33.4.12. But such a worthless concept does not deserve to have a name.)
31.10.6 Remark: Expression for the limit set of a set.
The limit set of a set S in a topological space X may be written as

    {x ∈ X; ∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅}.
31.10.7 Theorem: Limit points are points which are not in the interior of their complement.
Let S be a subset of a topological space X. Then a point x is a limit point of S if and only if x ∈ Clos(S \ {x}).
Proof: The proof follows from Definitions 31.10.2 and 31.8.2. Let x ∈ X. Then

    x is a limit point of S ⇔ ∀Ω ∈ Top_x(X), Ω ∩ (S \ {x}) ≠ ∅
                            ⇔ ¬∃Ω ∈ Top_x(X), Ω ∩ (S \ {x}) = ∅
                            ⇔ ¬∃Ω ∈ Top_x(X), Ω ⊆ X \ (S \ {x})
                            ⇔ x ∉ Int(X \ (S \ {x}))
                            ⇔ x ∈ Clos(S \ {x}).

The last line follows from Theorem 31.8.15 (i).
31.10.8 Definition: An isolated point of a set S in a topological space X is a point x S which is not a
limit point of S.
31.10.9 Remark: Isolated points are points which are excluded by some neighbourhood of their complement.
From Definition 31.10.2, it is clear that a point x in a topological space X is an isolated point of a subset S
Differential geometry reconstructed.Copyright (C) 2017, Alan U. Kennington. All Rights Reserved.You may print this book draft for personal use.Public redistribution of this book draft in electronic or printed form isforbidden.You may not charge any fee for copies of this book draft.
of X if and only if ∃Ω ∈ Topx(X), Ω ∩ (S \ {x}) = ∅.
31.10.10 Theorem:
A point a in a topological space X is an isolated point of X if and only if {a} ∈ Topa(X).
Proof: The assertion follows immediately from Theorem 31.10.4 and Definition 31.10.8.
31.10.11 Remark: Theorem and diagram for properties of limit points and isolated points.
Some relations between the closure of a set and its limit points and isolated points are summarised in
Theorem 31.10.12 for convenient reference. These relations are illustrated in Figure 31.10.2 for the case that
X has no isolated points. Any isolated points in the interior of a set S are necessarily also isolated points
of X. So S cannot have any isolated interior points if X has no isolated points.
Figure 31.10.2 Limit points, isolated points and set interior and boundary if iso(X) = ∅
cl(S) \ S = ∅, if and only if lim(S) \ S = ∅ by part (iii), if and only if lim(S) ⊆ S.
Part (vii) follows from parts (iv) and (i).
For part (viii), suppose that iso(X) = ∅. Let S ∈ P(X) and x ∈ Int(S). Then Ω1 ⊆ S for some Ω1 ∈ Topx(X). Let Ω ∈ Topx(X). Then Ω ∩ Ω1 ∈ Topx(X). So (Ω ∩ Ω1) \ {x} ≠ ∅ because iso(X) = ∅. But Ω ∩ Ω1 ⊆ S because Ω1 ⊆ S. So (Ω ∩ Ω1) \ {x} ⊆ Ω ∩ (S \ {x}). So Ω ∩ (S \ {x}) ≠ ∅ because (Ω ∩ Ω1) \ {x} ≠ ∅. Therefore ∀Ω ∈ Topx(X), Ω ∩ (S \ {x}) ≠ ∅. Thus x ∈ lim(S).
For part (ix), suppose that iso(X) = ∅. Then Int(S) ⊆ lim(S) by part (viii). But iso(S) ∩ lim(S) = ∅ by part (ii). So iso(S) ∩ Int(S) = ∅.
For part (x), suppose that iso(X) = ∅. Then iso(S) ∩ Int(S) = ∅ by part (ix). So iso(S) ⊆ S \ Int(S). But S \ Int(S) = S ∩ Bdy(S) by Theorem 31.9.10 (xxii). Hence iso(S) ⊆ S ∩ Bdy(S).
31.10.13 Remark: Diagram for properties of limit points and isolated points in a general topological space.
Figure 31.10.3 illustrates the relations in Theorem 31.10.12 between limit points, isolated points and set
interior and boundary in the general case where the set of isolated points iso(X) is not necessarily empty.
31.10.14 Remark: Alternative definition for a closed set.
Theorem 31.10.12 (vi) is very often given as the definition of a closed set, and Definition 31.4.1 is then given as a theorem regarding closed sets, while Theorem 31.10.12 (vii) becomes the definition of the closure of a set.
31.10.15 Remark: A limit point is not necessarily an ∞-limit point.
It seems intuitively clear that if every neighbourhood of a point z contains at least one element of a set S, then shrinking the neighbourhoods in an infinite sequence will guarantee the existence of infinitely many distinct elements of S in every neighbourhood of z, since if the number were only finite, one could exclude them all by sufficiently shrinking a neighbourhood. This intuition is almost correct, but not quite. The proof that every ∞-limit point is a limit point in Theorem 31.10.18 is trivial, but the converse implication in Theorem 33.1.22 requires the space to have the T1 separation property.
Figure 31.10.3 Relations between limit points, isolated points and set interior and boundary
31.10.17 Definition: An ∞-limit point of a set S in a topological space X is a point z ∈ X which satisfies ∀Ω ∈ Topz(X), #(Ω ∩ (S \ {z})) = ∞.
An ∞-limit point is also known as an ∞-accumulation point or an ∞-cluster point.
The ∞-limit set of a set S in a topological space X is the set of ∞-limit points of S.
Proof: Let X be a topological space with S ⊆ X. Let z ∈ X be an ∞-limit point of S. Let Ω ∈ Topz(X).
Then #(Ω ∩ (S \ {z})) = ∞ by Definition 31.10.17. So Ω ∩ (S \ {z}) ≠ ∅. Thus ∀Ω ∈ Topz(X), Ω ∩ (S \ {z}) ≠ ∅.
Hence z is a limit point of S by Definition 31.10.2.
Proof: For part (i), let S1, S2 be subsets of a topological space X such that S1 ⊆ S2. Let z be an ∞-limit point of S1. Then ∀Ω ∈ Topz(X), #(Ω ∩ (S1 \ {z})) = ∞ by Definition 31.10.17. Therefore ∀Ω ∈ Topz(X), #(Ω ∩ (S2 \ {z})) = ∞. Hence z is an ∞-limit point of S2 by Definition 31.10.17.
Part (ii) follows from part (i) and Definition 31.10.17 for the ∞-limit set.
31.11.2 Example: A topology with pairwise point separation, but some points not isolated.
Figure 31.11.1 illustrates a topology with imperfect separation on a countably infinite set X = ℤ⁺₀. In other words, the topology is not the discrete topology. Even though the point x = 0 is separated from any other point y ∈ X by a set Ωy ∈ Top(X), it is not possible to construct {x} as the intersection of a finite set of open sets Ωy. So x is not an isolated point.
Figure 31.11.1 illustrates the set T = {∅} ∪ {Ωy; y ∈ ℤ⁺₀}, where Ωy = {i ∈ ℤ⁺₀; i = 0 or i > y} for y ∈ ℤ⁺₀.
It is easily verified that the range of any non-increasing sequence of subsets of P(X) (with respect to the set-inclusion partial order) is closed under finite intersection and arbitrary union. So if ∅ and X are added, the result is a valid topology. It follows that T is a valid topology on X.
Although x = 0 is separated from y by the open set Ωy for all y ∈ ℤ⁺ (because 0 ∈ Ωy and y ∉ Ωy), the set {0} is clearly not equal to the intersection of any finite number of sets Ωy. So {0} ∉ T although 0 is separated from all individual elements of X.
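The decreasing chain of open sets Ωy can be probed on a finite truncation. The following Python sketch is illustrative only; the cutoff N is a hypothetical device for computation, not part of the example.

```python
# Finite truncation of Example 31.11.2: omega(y) = {0} ∪ {i : i > y},
# restricted to {0, ..., N-1}. N is an ad hoc cutoff for illustration.
N = 50

def omega(y):
    return frozenset({0} | {i for i in range(y + 1, N)})

# The family is a decreasing chain: intersections are again of the same
# form, omega(a) ∩ omega(b) = omega(max(a, b)).
for a in range(N - 1):
    for b in range(N - 1):
        assert omega(a) & omega(b) == omega(max(a, b))

# No finite intersection of the omega(y) equals {0}: every omega(y) with
# y < N - 1 still contains some point other than 0.
assert all(omega(y) != {0} for y in range(N - 1))
```

Since any finite intersection of the Ωy equals Ω_max, which always contains points other than 0, the set {0} is never reached, matching the text.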
The topology T may be extended to include all sets {y} for y ∈ ℤ⁺. Let T′ = {G1 ∪ G2; G1 ∈ T and G2 ∈ P(ℤ⁺)}. (This is the same as T′ = P(ℤ⁺) ∪ {Ω ∈ P(ℤ⁺₀); #(ℤ⁺ \ Ω) < ∞}.) Then it is (fairly) clear that T′ is also a valid topology on X. This larger topology completely separates all points x ∈ ℤ⁺₀ from other elements y ∈ X \ {x}. The set {x} is in T′ for all x ∈ ℤ⁺, but {0} ∉ T′.
31.11.3 Remark: Translation-invariant topologies on the integers containing non-empty finite sets.
There is now the question of whether infinite sets such as ℤ have interesting topologies which are invariant under various groups of permutations of the point set.
Suppose a topology T on ℤ is invariant under all translations of ℤ and contains at least one non-empty finite set. (See Definition 24.2.4 and Notation 24.2.5 for translates of sets.) Then T = P(ℤ). To show this, let Ω ∈ T be a non-empty finite set. Let d = max(Ω) − min(Ω). Then Ω ∩ (Ω + d) = {max(Ω)} contains exactly one element of ℤ, where Ω + d denotes the translate of Ω by a distance d. It follows that {x} ∈ T for all x ∈ ℤ. From this, every subset of ℤ can be constructed as a union of singleton sets. (See Definition 32.5.2 for the standard topology for the set of integers.)
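The translation argument above can be sketched numerically: for any non-empty finite set of integers, intersecting the set with its translate by d = max − min isolates the maximum element. A minimal illustration in Python (the random sampling is ad hoc):

```python
# Sketch of the translation argument in Remark 31.11.3: for a non-empty
# finite set omega of integers and d = max(omega) - min(omega), the
# intersection omega ∩ (omega + d) is exactly {max(omega)}.
import random

def translate(s, d):
    return {x + d for x in s}

random.seed(0)
for _ in range(100):
    omega = set(random.sample(range(-50, 50), random.randint(1, 10)))
    d = max(omega) - min(omega)
    # Any common element x satisfies x <= max and x >= min + d = max.
    assert omega & translate(omega, d) == {max(omega)}
```

This is why a translation-invariant topology containing one non-empty finite set must contain a singleton, and hence all singletons.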
31.11.4 Remark: Translation-invariant topologies on the integers containing no non-empty finite sets.
Next one naturally asks whether there are non-trivial, non-discrete topologies on the integers which contain no non-empty finite sets.
The set Tk of all subsets of ℤ with period k ∈ ℤ⁺ is a translation-invariant topology on ℤ. These topologies are simply infinite copies of the discrete topology on ℕk. The case k = 1 is the trivial topology. A more interesting class of topologies is given in Theorem 31.11.6. These complement-of-finite-set topologies are well defined on arbitrary sets. (There is a nuance here in regard to the condition #(X \ Ω) < ∞. This means that there exists a bijection from a finite ordinal number to X \ Ω. But the word "exists" raises questions of constructibility in some situations.)
31.11.5 Remark: Topology consisting of complements of finite subsets of the integers.
The topologies in Remarks 31.11.3 and 31.11.4 do not exhaust all of the possibilities for translation-invariant topologies on ℤ. By Theorem 31.11.6, a topology can be defined on X = ℤ by
Top(X) = {∅} ∪ {Ω ∈ P(X); #(X \ Ω) < ∞}.     (31.11.1)
This topology is invariant under all permutations of the point set ℤ. The set construction defined by (31.11.1) is a valid topology for any set X, and it is always permutation-invariant. It might not be very useful in direct applications, but it is certainly useful for counterexamples, as in Remark 33.1.27 and Example 35.3.13.
Proof: Let T = {∅} ∪ {Ω ∈ P(X); #(X \ Ω) < ∞} for a set X. Then clearly ∅ ∈ T and X ∈ T.
Let Ω1, Ω2 ∈ T. If Ω1 = ∅ or Ω2 = ∅, then Ω1 ∩ Ω2 = ∅ ∈ T. So suppose that Ω1 ≠ ∅ and Ω2 ≠ ∅. Then #(X \ Ω1) < ∞ and #(X \ Ω2) < ∞. But X \ (Ω1 ∩ Ω2) = (X \ Ω1) ∪ (X \ Ω2) by De Morgan's law (Theorem 8.2.3). So #(X \ (Ω1 ∩ Ω2)) ≤ #(X \ Ω1) + #(X \ Ω2) < ∞. So Ω1 ∩ Ω2 ∈ T.
Now suppose that C ∈ P(T). Let ΩC = ∪C. If ΩC = ∅, then ΩC ∈ T. So let ΩC ≠ ∅. Then X \ ΩC = ∩{X \ Ω; Ω ∈ C} by the generalised De Morgan's law (Theorem 8.4.10). So #(X \ ΩC) ≤ min{#(X \ Ω); Ω ∈ C} < ∞. So ΩC ∈ T. Hence T is a topology on X by Definition 31.3.2.
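The proof above manipulates only the finite excluded sets X \ Ω, so it can be sketched symbolically even when X is infinite. The encoding below (tags 'empty' and 'cofinite') is an illustrative device, not the author's notation:

```python
# Symbolic sketch of the finite-complement topology of (31.11.1): an open
# set is either ('empty', None) or ('cofinite', F), meaning X \ F for a
# finite excluded set F. The tags are ad hoc names for illustration.

def intersect(a, b):
    if a[0] == 'empty' or b[0] == 'empty':
        return ('empty', None)
    # (X \ F1) ∩ (X \ F2) = X \ (F1 ∪ F2) by De Morgan's law;
    # the excluded set stays finite.
    return ('cofinite', a[1] | b[1])

def union(sets):
    cof = [s for s in sets if s[0] == 'cofinite']
    if not cof:
        return ('empty', None)
    # ∪(X \ Fi) = X \ ∩Fi; the excluded set shrinks, so it stays finite.
    out = cof[0][1]
    for s in cof[1:]:
        out = out & s[1]
    return ('cofinite', out)

O1 = ('cofinite', frozenset({1, 2, 3}))
O2 = ('cofinite', frozenset({3, 4}))
assert intersect(O1, O2) == ('cofinite', frozenset({1, 2, 3, 4}))
assert union([('empty', None), O1, O2]) == ('cofinite', frozenset({3}))
```

Both operations produce a finite excluded set again, which is exactly the closure argument in the proof.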
A finite-complement topological space is a topological space (X, T ) such that T is the finite-complement
topology on X.
Proof: Let X be a set. Define T as in equation (31.11.2). Then clearly T ⊆ P(X) and {∅, X} ⊆ T. So Definition 31.3.2 (i) is satisfied.
Let Ω1, Ω2 ∈ T. If Ω1 = ∅ or Ω2 = ∅, then Ω1 ∩ Ω2 = ∅ ∈ T. So suppose that Ω1 ≠ ∅ and Ω2 ≠ ∅. Then Ω1, Ω2 ∈ P(X), and #(X \ Ω1) < ∞ and #(X \ Ω2) < ∞. But X \ (Ω1 ∩ Ω2) = (X \ Ω1) ∪ (X \ Ω2) by Theorem 8.2.3. So #(X \ (Ω1 ∩ Ω2)) < ∞ and therefore Ω1 ∩ Ω2 ∈ T, which satisfies Definition 31.3.2 (ii).
Let C ⊆ T and let Ω = ∪C. If Ω = ∅ then Ω ∈ T. So suppose that Ω ≠ ∅. Then ∃G ∈ C, G ≠ ∅. Therefore ∃G ∈ C, (G ⊆ X and #(X \ G) < ∞). But by Theorem 8.4.8 (xiii), G ∈ C implies that G ⊆ ∪C. Therefore ∃G ∈ C, (G ⊆ Ω and #(X \ G) < ∞). Hence #(X \ Ω) ≤ #(X \ G) < ∞. So ∪C ∈ T, which satisfies Definition 31.3.2 (iii).
31.11.10 Theorem: The finite-complement topology on a set X is the smallest topology on X for which {x} is a closed set for all x ∈ X.
Proof: To show that {x} is closed for all x ∈ X for the finite-complement topology on the set X, let Ω = X \ {x} and note that #(X \ Ω) = 1 < ∞.
Let T be the finite-complement topology on a set X. To show that T ⊆ T′ for all topologies T′ on X such that {x} is a closed subset of X with respect to T′ for all x ∈ X, let T′ be such a topology. Let Ω ∈ T. If Ω = ∅, then Ω ∈ T′ because T′ is a topology. So let Ω be a subset of X such that #(X \ Ω) < ∞. Let C = {X \ {x}; x ∈ X \ Ω}. Then #(C) < ∞ and C ⊆ T′ because every singleton {x} is closed with respect to T′. If #(C) = 0 then Ω = X, and so Ω ∈ T′ because T′ is a topology on X. So assume that 1 ≤ #(C) < ∞. Then ∩C ∈ T′ because T′ is a topology. Thus Ω = ∩C ∈ T′. Hence T ⊆ T′. That is, T is smaller than (or equal to) every topology on X for which {x} is a closed set for all x ∈ X.
If the set X is infinite, the interior, boundary, exterior and closure are as follows.

    cardinality of S              Int(S)   Bdy(S)   Ext(S)   cl(S)
    #(S) < ∞                      ∅        S        X \ S    S
    #(S) = ∞, #(X \ S) = ∞        ∅        X        ∅        X
    #(X \ S) < ∞                  S        X \ S    ∅        X
These tables suggest that the finite-complement topologies are rather uninteresting. When X is finite, there are no boundary points for any set, and the interior and exterior are simply S and X \ S for any set S ∈ P(X). So the topology gives no information about a set S other than its set of elements.
When X is infinite, any set S for which #(S) = ∞ and #(X \ S) = ∞ has Bdy(S) = X. In other words, such sets have no interior and no exterior. So the topology says very little of interest about sets S, other than whether they (and their complements) are finite or infinite.
It can be hoped that this simple class of topologies may be of some use in providing pathological examples
to disprove false conjectures. (See for example Remark 33.4.18 and Theorem 33.4.19.) In particular, various
trivial classes of topologies show that Definition 31.3.2 for a general topology is perhaps overly broad. This
motivates the introduction of the classes of topologies in Chapter 33, which add various sets of extra axioms
to topologies to make them more likely to be useful.
Sections 32.1, 32.3 and 32.4 introduce methods of generating topologies which are more interesting than
the finite-complement topologies because the interior, boundary and exterior of sets can be made to contain
much more information when topologies are built up in more sophisticated ways.
31.11.13 Definition: The particular-point topology on a set X, for a particular point p ∈ X, is the set {∅} ∪ {Ω ∈ P(X); p ∈ Ω}.
31.11.14 Theorem: The particular-point topology on a set, for a given point in that set, is a valid topology
on the set.
Proof: Let T be the particular-point topology on a set X for a point p ∈ X. Then {∅, X} ⊆ T, and T is clearly closed under unions and intersections. So T is a topology on X by Definition 31.3.2.
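Theorem 31.11.14 can be verified by brute force on a small set. A minimal Python sketch, assuming Definition 31.3.2 lists the usual axioms (∅ and X open, closure under binary intersections and arbitrary unions); the point set and labels are ad hoc:

```python
# Check the particular-point topology {∅} ∪ {A ⊆ X : p ∈ A} on a 4-point set.
from itertools import combinations

X = frozenset({'p', 'a', 'b', 'c'})
p = 'p'

subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
T = {s for s in subsets if not s or p in s}

assert frozenset() in T and X in T
# Binary intersections: two sets containing p intersect in a set containing p.
assert all(a & b in T for a in T for b in T)
# Arbitrary unions: any union of members either is empty or contains p.
Tl = list(T)
assert all(frozenset(set().union(*c)) in T
           for r in range(1, len(Tl) + 1) for c in combinations(Tl, r))
```

Every non-empty open set contains p, which is what makes both closure properties immediate.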
31.12.5 Theorem: Let X and Y be topological spaces. Then f : X → Y is continuous if and only if f⁻¹(K) is a closed set in X for every closed set K in Y.
is continuous.
Proof: Let f : X → Y and g : Y → Z be continuous for topological spaces X, Y and Z. Let Ω ∈ Top(Z). Then g⁻¹(Ω) ∈ Top(Y) by the continuity of g. So (g ∘ f)⁻¹(Ω) = f⁻¹(g⁻¹(Ω)) ∈ Top(X) by the continuity of f, for all Ω ∈ Top(Z). Hence g ∘ f : X → Z is continuous.
Proof: For part (i), let X and Y be topological spaces such that Top(X) = P(X). Let f : X → Y. Let Ω ∈ Top(Y). Then f⁻¹(Ω) ∈ P(X) = Top(X). So f is continuous.
For part (ii), let X and Y be topological spaces with Top(Y) = {∅, Y}. Let f : X → Y. Let Ω ∈ Top(Y). Then Ω = ∅ or Ω = Y. If Ω = ∅, then f⁻¹(Ω) = ∅ ∈ Top(X). If Ω = Y, then f⁻¹(Ω) = X ∈ Top(X). So f is continuous.
31.12.10 Notation:
C(X, Y ), for topological spaces X and Y , denotes the set of continuous functions from X to Y ,
C⁰(X, Y) is an alternative notation for C(X, Y) for topological spaces X and Y.
31.12.11 Remark: The convenience, and ambiguity, of the C⁰ notation for continuous functions.
The alternative notation C⁰(X, Y) in Notation 31.12.10 is often used in contexts where the sets Cᵏ(X, Y) of k-times continuously differentiable functions from X to Y for k ∈ ℤ⁺₀ are also defined.
Even if differentiability of functions from X to Y is not defined, the C⁰ notation is a convenient abbreviation for the word "continuous" because the notation C on its own is ambiguous, whereas C⁰ strongly suggests the idea of continuity. It is often said that a function "is C⁰", but rarely that a function "is C". (See for example Notation 42.1.10 for Cᵏ function spaces.)
Notation 31.12.12 defines the default range Y of C(X, Y) as the real numbers. In other words, C⁰(X) is defined to equal C⁰(X, ℝ).
31.12.12 Notation: C⁰(X), for a topological space X, denotes the set of continuous functions from X to ℝ with the usual topology on ℝ.
31.12.13 Remark: Continuity of restrictions of continuous functions.
It seems intuitively obvious that any restriction of a continuous function should be continuous. This is
certainly clear in the case of metric spaces because the bounds which apply to the full function will apply at
least as strongly for the restricted function. In the case of general continuous functions, one must demonstrate that the inverse images of open sets are open sets, but this is not true in general. (Consider for example the map f : ℝ² → ℝ defined by f : (x1, x2) ↦ x1, restricted to {x ∈ ℝ²; x2 = 0}.) However, it is true that the inverse images of open sets under a continuous function restricted to a set S are restrictions of open sets to S. These restricted sets happen to be the sets in the relative topology in Definition 31.6.2. This observation is stated
as Theorem 31.12.14. This is useful for restrictions of fibre charts and projection maps for fibre bundles.
(See Theorem 32.8.13 for an application to restrictions of homeomorphisms whose target space is a direct
product of topological spaces.)
31.12.14 Theorem: Restrictions of continuous functions are continuous.
Let (X1, T1) and (X2, T2) be topological spaces. Let f : X1 → X2 be continuous. Let S be a subset of X1 with the relative topology T1′ on S from T1. Then f|S : S → X2 is continuous from (S, T1′) to (X2, T2).
Proof: Let G2 ∈ T2. Then (f|S)⁻¹(G2) = f⁻¹(G2) ∩ S ∈ T1′ by Definition 31.6.2 for the relative topology on S, because f⁻¹(G2) ∈ T1 by Definition 31.12.3. Hence f|S : S → X2 is continuous.
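Theorem 31.12.14 can be illustrated on finite spaces, where the relative topology of Definition 31.6.2 is computed directly. The spaces and the map below are ad hoc examples, not from the text:

```python
# Restriction of a continuous function is continuous for the relative topology.
X1 = frozenset({0, 1, 2, 3})
T1 = {frozenset(), frozenset({0, 1}), frozenset({2, 3}), X1}
X2 = frozenset({'u', 'v'})
T2 = {frozenset(), frozenset({'u'}), X2}

f = {0: 'u', 1: 'u', 2: 'v', 3: 'v'}

def preimage(g, B):
    return frozenset(x for x in g if g[x] in B)

# f is continuous: preimages of open sets are open.
assert all(preimage(f, G) in T1 for G in T2)

# Relative topology on S (Definition 31.6.2): {G ∩ S; G ∈ T1}.
S = frozenset({1, 2})
T1_rel = {G & S for G in T1}

# The restriction f|S is continuous for the relative topology on S.
f_S = {x: f[x] for x in S}
assert all(preimage(f_S, G) in T1_rel for G in T2)
```

Note that the preimage {1} of {'u'} under f|S is not open in X1, but it is open in the relative topology on S, which is exactly the point of the theorem.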
31.12.15 Remark: Keyhole testing for continuity.
In the case of topological manifolds, it is sometimes inconvenient to test the continuity of a function globally.
Theorem 31.12.16 makes it possible to conclude that a function is continuous globally if each point has
a neighbourhood for which the restriction of the function is continuous. The proof glues together the restrictions U ∩ f⁻¹(Ω) of an inverse image f⁻¹(Ω) to elements U of an open cover Q of the domain X to show that f⁻¹(Ω) is open if Ω is open.
31.12.16 Theorem: Local continuity implies global continuity.
Let f : X → Y for topological spaces X and Y. Then f is continuous if and only if for all p ∈ X there is a set U ∈ Topp(X) such that f|U is continuous. In other words,
f ∈ C(X, Y) ⇔ ∀p ∈ X, ∃U ∈ Topp(X), f|U ∈ C(U, Y).
Proof: Let f : X → Y satisfy ∀p ∈ X, ∃U ∈ Topp(X), f|U ∈ C(U, Y). Define a collection of open sets Q = {U ∈ Top(X) \ {∅}; f|U ∈ C(U, Y)}. Then ∪Q = X.
Then f ∈ C(X, Y).
Proof: Let G ∈ Top(Y). Suppose that y0 ∉ G. Then f⁻¹(G) ⊆ Ω. So f⁻¹(G) = f⁻¹(G) ∩ Ω = (f|Ω)⁻¹(G) by Theorem 10.6.11 (ii). So f⁻¹(G) = (h|Ω)⁻¹(G) ∈ Top(Ω) because h|Ω ∈ C(Ω, Y) by Theorem 31.12.14.
Now suppose that y0 ∈ G. Then f⁻¹(G) = h⁻¹(G) ∪ (X \ Ω), and h⁻¹(G) ∈ Top(cl(Ω)) by Definition 31.12.3. So h⁻¹(G) = U ∩ cl(Ω) for some U ∈ Top(X) by Definition 31.6.2. This gives f⁻¹(G) = (U ∩ cl(Ω)) ∪ (X \ Ω). But X \ Ω = (X \ cl(Ω)) ∪ Bdy(Ω) by Theorem 31.9.10 (xix). So f⁻¹(G) = (U ∩ cl(Ω)) ∪ (X \ cl(Ω)) ∪ Bdy(Ω). But Bdy(Ω) ⊆ h⁻¹(G) because y0 ∈ G and h(x) = y0 for x ∈ Bdy(Ω). So Bdy(Ω) ⊆ U ∩ cl(Ω), and so f⁻¹(G) = (U ∩ cl(Ω)) ∪ (X \ cl(Ω)) by Theorem 8.1.7 (iii). So f⁻¹(G) = (U ∩ cl(Ω)) ∪ (U ∩ (X \ cl(Ω))) ∪ (X \ cl(Ω)) by Theorem 8.1.6 (xiii). Therefore f⁻¹(G) = U ∪ (X \ cl(Ω)) by Theorem 8.2.5 (viii). So f⁻¹(G) ∈ Top(X) since
U ∈ Top(X) and X \ cl(Ω) ∈ Top(X). Hence f ∈ C(X, Y) by Definition 31.12.3.
31.13.2 Definition: A continuous partial function from a topological space X to a topological space Y is a partial function f : X → Y such that ∀Ω ∈ Top(Y), f⁻¹(Ω) ∈ Top(X).
Figure 31.13.1 Inverse image of open set under composite of partial functions
31.13.6 Theorem: The composite of two continuous partial functions is a continuous function.
Let X, Y and Z be topological spaces. Let f : X → Y and g : Y → Z be continuous partial functions. Then g ∘ f is a surjective continuous function from Dom(g ∘ f) = f⁻¹(Dom(g)) ⊆ X to Range(g ∘ f) = g(Range(f)) ⊆ Z with respect to the relative topologies on Dom(g ∘ f) and Range(g ∘ f) in X and Z respectively.
Proof: Let f : X → Y and g : Y → Z be continuous partial functions for topological spaces X, Y and Z. Let X′ = Dom(g ∘ f) and Z′ = Range(g ∘ f). Then X′ = f⁻¹(Dom(g)) and Z′ = g(Range(f)) by Theorem 10.10.13 (i, ii). Therefore g ∘ f : X′ → Z′ is a well-defined surjective function. The inclusions X′ ⊆ X and Z′ ⊆ Z follow from Theorem 9.5.15 (ii, i) and Definition 10.2.2 (iii).
Let Ω′ ∈ Top(Z′). Then Ω′ = Ω ∩ Z′ for some Ω ∈ Top(Z) by Definition 31.6.2 for the relative topology on Z′ in Z. By Definition 31.13.2, g⁻¹(Ω) ∈ Top(Y), and so f⁻¹(g⁻¹(Ω)) ∈ Top(X), because f : X → Y and g : Y → Z are continuous. But (g ∘ f)⁻¹(Ω′) = (g ∘ f)⁻¹(Ω ∩ Z′) = (g ∘ f)⁻¹(Ω) by Theorem 10.7.1 (iii). Therefore (g ∘ f)⁻¹(Ω′) = f⁻¹(g⁻¹(Ω)) ∈ Top(X). But (g ∘ f)⁻¹(Ω′) ⊆ X′. It follows that (g ∘ f)⁻¹(Ω′) = (g ∘ f)⁻¹(Ω′) ∩ X′ ∈ Top(X′) by Definition 31.6.2 for the relative topology on X′ in X. Hence g ∘ f : X′ → Z′ is a continuous function by Definition 31.12.3.
31.14. Homeomorphisms
31.14.1 Remark: History and etymology of the word homeomorphism.
The word "homeomorphism" was introduced by Henri Poincaré in 1895. (See EDM2 [109], section 425.G.) It comes from the Greek word ὅμοιος (homoios), meaning "like, similar, resembling; the same, of the same rank; equal citizen; equal; common, mutual; a match for; agreeing, convenient", and μορφή (morphē), meaning "form, shape, figure, appearance, fashion, image; beauty, grace".
31.14.2 Definition: A homeomorphism between two topological spaces (X1, T1) and (X2, T2) is a bijection f : X1 → X2 such that both f and f⁻¹ are continuous.
Two topological spaces (X1, T1) and (X2, T2) are said to be homeomorphic if there exists a homeomorphism f : X1 → X2.
A topological automorphism on a topological space X −< (X, TX) is a homeomorphism from X to X.
31.14.3 Notation: (X1, T1) ≅ (X2, T2), for topological spaces (X1, T1) and (X2, T2), means that (X1, T1) and (X2, T2) are homeomorphic.
X1 ≅ X2, for topological spaces X1 and X2, means that X1 and X2 are homeomorphic with respect to topologies which are implicit in the context.
f : (X1, T1) ≅ (X2, T2), for topological spaces (X1, T1) and (X2, T2), means that f is a homeomorphism from X1 to X2 with respect to topologies T1 and T2 on X1 and X2 respectively.
f : X1 ≅ X2, for topological spaces X1 and X2, means that f is a homeomorphism from X1 to X2 with respect to topologies which are implicit in the context.
within a single context. Therefore a subscript may sometimes be used to distinguish structure classes.
31.14.6 Notation: Iso(X, Y), for topological spaces X −< (X, TX) and Y −< (Y, TY), denotes the set of all topological isomorphisms from X to Y.
Aut(X), for a topological space X −< (X, TX), denotes the set of all topological automorphisms on X.
31.14.7 Remark: The homeomorphism relation is an equivalence relation on the class of topological spaces.
It follows from Definition 31.14.2 that the set map f : P(X1) → P(X2) of f is a bijection between the open sets in T1 and T2. In other words, there is a one-to-one correspondence between the topologies of the two sets. (This is stated as Theorem 31.14.8.) This implies that absolutely all topological properties of the two sets are identical. The homeomorphism relation is clearly an equivalence relation, but the class of all topological spaces is not a set. So this is not an equivalence relation in the sense of a set of ordered pairs as in Definition 9.5.2.
31.14.8 Theorem: Let f : X1 → X2 be a homeomorphism for topological spaces (X1, T1) and (X2, T2). Then T2 = {f(Ω); Ω ∈ T1} and T1 = {f⁻¹(Ω); Ω ∈ T2}.
not in fact valid unless one considers that the inverse f⁻¹ of a homeomorphism f : X1 → X2 is considered to be continuous only with respect to the relative topology on f(X1). This use of the relative topology is not required for the continuity of the forward map f. Thus when one says that a topological space is homeomorphic to a subset of a given topological space, this must be interpreted with respect to the relative topology on the image of the homeomorphism.
Theorem 31.14.12 is a two-sided version of Theorem 31.14.10 which similarly has the purpose of making
clear that a homeomorphism between subsets of topological spaces is to be interpreted with respect to the
relative topologies on the forward and reverse images of the homeomorphism. Definition 31.14.11 makes this
interpretation explicit.
31.14.10 Theorem: Let (X1, T1) and (X2, T2) be topological spaces. Suppose that f : X1 → X2 is a homeomorphism from (X1, T1) to (f(X1), Tf(X1)), where Tf(X1) is the relative topology of f(X1) in X2. Then Tf(X1) = {f(Ω); Ω ∈ Top(X1)}.
Proof: The equalities Top(S1) = {f(Ω); Ω ∈ Top(S2)} and Top(S2) = {f⁻¹(Ω); Ω ∈ Top(S1)} follow from Definition 31.14.11. Let Ω ∈ Top(X2). Then f(Ω) = f(Ω ∩ Dom(f)) = f(Ω ∩ S2), where Ω ∩ S2 ∈ Top(S2) by Definition 31.6.2. So {f(Ω); Ω ∈ Top(X2)} ⊆ {f(Ω); Ω ∈ Top(S2)}. Let Ω′ ∈ Top(S2). Then Ω′ = Ω ∩ S2 for some Ω ∈ Top(X2), and f(Ω′) = f(Ω ∩ S2) = f(Ω). So {f(Ω); Ω ∈ Top(X2)} ⊇ {f(Ω); Ω ∈ Top(S2)}. Hence {f(Ω); Ω ∈ Top(X2)} = {f(Ω); Ω ∈ Top(S2)}. This verifies the assertion for Top(S1). The assertion
for Top(S2 ) follows similarly.
Proof: The restriction of a bijection is always a bijection. Denote by T1′ and T2′ the relative topologies for S1 and S2 from (X1, T1) and (X2, T2) respectively. To show the continuity of f|S1, note that if G2′ ∈ T2′, then G2′ = G2 ∩ S2 for some G2 ∈ T2. Since f is continuous, f⁻¹(G2) ∈ T1. It follows that (f|S1)⁻¹(G2′) = f⁻¹(G2) ∩ S1 ∈ T1′. So f|S1 is continuous with respect to the relative topologies. The continuity of the inverse follows symmetrically. Therefore f|S1 is a homeomorphism with respect to the relative topologies on S1 and f(S1). That is, f|S1 : (S1, T1′) ≅ (S2, T2′).
Proof: For part (i), let f : X → Y and g : Y → Z be local homeomorphisms. Then f : X → Y and g : Y → Z are continuous partial functions by Definition 31.14.15. So g ∘ f : X → Z is a continuous partial function by Theorem 31.13.4. Similarly, f⁻¹ : Y → X and g⁻¹ : Z → Y are continuous partial functions by Definition 31.14.15, and so (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ : Z → X is a continuous partial function by Theorem 31.13.4. Hence g ∘ f : X → Z is a local homeomorphism by Definition 31.14.15.
For part (ii), Dom(g ∘ f) = f⁻¹(Dom(g)) and Range(g ∘ f) = g(Range(f)) by Theorem 10.10.13 (i, ii). The inclusions Dom(g ∘ f) ⊆ X and Range(g ∘ f) ⊆ Z follow from Theorem 10.10.9.
For part (iii), it follows from Definition 31.14.15 and Theorem 31.13.6 that g ∘ f is a surjective continuous function from Dom(g ∘ f) to Range(g ∘ f) with respect to the relative topologies on Dom(g ∘ f) and Range(g ∘ f) in X and Z respectively. Similarly, (g ∘ f)⁻¹ is a surjective continuous function from Range(g ∘ f) to Dom(g ∘ f) with respect to the same relative topologies. Hence g ∘ f is a homeomorphism from Dom(g ∘ f) to Range(g ∘ f) for these relative topologies by Definition 31.14.2.
Chapter 32
Topological space constructions
It is also possible to generate a topology on a set X from a family of maps from X to other topological
spaces. (See Definition 32.4.8.) A topology can also be defined as a pull-back of a topology via a function.
(See Definition 32.7.4.)
It is also possible to define topologies on sets which are constructed from sets which have pre-defined topologies on them. Direct products of topological spaces have a useful minimal topology for which the projection
functions are continuous. (See Sections 32.8 and 32.9.) Disjoint and non-disjoint unions of topological spaces
have a minimal topology. (See Section 32.11.) Partitions of topological spaces into subsets have a natural
quotient topology on the set of subsets in the partition. (See Section 32.10.) Families of topological spaces
with collage functions identifying overlaps have a collage topology. (See Section 32.12.)
in S. It is convenient to first state Theorem 32.1.2, which is used in Theorem 32.1.5 to prove that topologies
generated by set-collections are valid topologies.
32.1.2 Theorem: Let 𝒯 be a non-empty set of topologies on a set X. Then ∩𝒯 is a topology on X.
Proof: Let 𝒯 be a non-empty set of topologies on a set X. Then T ⊆ P(X) for all T ∈ 𝒯. So ∩𝒯 is a well-defined subset of P(X). Since ∅ ∈ T and X ∈ T for all T ∈ 𝒯, it follows that ∅ ∈ ∩𝒯 and X ∈ ∩𝒯. So Definition 31.3.2 condition (i) is satisfied.
To prove condition (ii) of Definition 31.3.2, let Ω1, Ω2 ∈ ∩𝒯. Then Ω1, Ω2 ∈ T for all T ∈ 𝒯. So Ω1 ∩ Ω2 ∈ T for all T ∈ 𝒯. Therefore Ω1 ∩ Ω2 ∈ ∩𝒯. Condition (iii) of Definition 31.3.2 follows similarly. Let C ∈ P(∩𝒯). Then C ⊆ ∩𝒯. So C ⊆ T for all T ∈ 𝒯. So ∪C ∈ T for all T ∈ 𝒯. Therefore ∪C ∈ ∩𝒯. Hence ∩𝒯 is a topology on X.
32.1.3 Theorem: Let (Ti)i∈I be a non-empty family of topologies on a set X. Then T = ∩i∈I Ti is a topology on X.
Proof: This follows immediately from Theorem 32.1.2 by setting 𝒯 = {Ti; i ∈ I} and T = ∩𝒯.
32.1.4 Definition: The topology generated by S on X, for any sets X and S such that S ⊆ P(X), is the intersection of all topologies T on X such that S ⊆ T.
32.1.5 Theorem: Let X and S be sets which satisfy S ⊆ P(X). Then the topology generated by S on X is a valid topology on X.
Proof: Let 𝒯 = {T ∈ P(P(X)); S ⊆ T and T is a topology on X}, where X and S are sets which satisfy S ⊆ P(X). Then the topology generated by S on X is equal to ∩𝒯. The set 𝒯 is non-empty because P(X) ∈ 𝒯 for any set X. (See Definition 31.3.19 for the discrete topology P(X) on X.) So ∩𝒯 is a valid topology on X by Theorem 32.1.2.
32.1.6 Remark: The empty collection of sets generates the trivial topology.
In the special case S = ∅, the topology generated by S on any set X is the trivial topology {∅, X} on X.
32.1.7 Remark: Application of cardinality-constrained power-sets to topology.
By Notation 13.10.5, P₁(S) means {C ∈ P(S); 1 ≤ #(C) < ∞}, the set of non-empty finite subsets of any set S. Some elementary properties of P₁(S) are given in Theorem 13.10.8. This notation is useful for the precise statement of numerous definitions and theorems in topology.
32.1.8 Theorem: Let S be a set. Define Y(S) = {⋂D; D ∈ 𝒫₁(S)} and T(S) = {⋃C; C ∈ 𝒫(Y(S))}. Then:
(i) S ⊆ T(S).
(ii) ⋃T(S) = ⋃S.
(iii) T(S) is a topology on ⋃S.
Proof: It follows from {E} ∈ 𝒫₁(S) and E = ⋂{E} for each E ∈ S that S ⊆ Y(S). Similarly {E} ∈ 𝒫(Y(S))
and E = ⋃{E} for each E ∈ Y(S). So S ⊆ Y(S) ⊆ T(S). This verifies part (i).
Every element of T(S) is a union of elements of Y(S), and every element of Y(S) is a subset of some element
of S. So ⋃T(S) = ⋃Y(S) ⊆ ⋃S. But S ⊆ T(S) by part (i). So ⋃S ⊆ ⋃T(S). This verifies part (ii).
By Theorem 8.5.2, T(S) is closed under arbitrary unions.
Let Ω₁, Ω₂ ∈ T(S). Then Ω₁ = ⋃C₁ and Ω₂ = ⋃C₂ for some C₁, C₂ ∈ 𝒫(Y(S)). So Ω₁ ∩ Ω₂ =
(⋃C₁) ∩ (⋃C₂) = ⋃{E₁ ∩ E₂; E₁ ∈ C₁, E₂ ∈ C₂} by Theorem 8.4.8 (vii). Then E₁, E₂ ∈ Y(S) for all E₁ ∈ C₁
and E₂ ∈ C₂. So E₁ = ⋂D₁ and E₂ = ⋂D₂ for some D₁, D₂ ∈ 𝒫₁(S). So E₁ ∩ E₂ = (⋂D₁) ∩ (⋂D₂) =
⋂(D₁ ∪ D₂) by Theorem 8.4.8 (xii). Then E₁ ∩ E₂ ∈ Y(S) because 1 ≤ #(D₁ ∪ D₂) < ∞. Therefore
⋃{E₁ ∩ E₂; E₁ ∈ C₁, E₂ ∈ C₂} ∈ T(S) because T(S) is closed under arbitrary unions. Thus Ω₁ ∩ Ω₂ ∈ T(S).
Note also that ∅ = ⋃∅ ∈ T(S) and ⋃S = ⋃Y(S) ∈ T(S).
From ⋃T(S) = ⋃S, it follows that T(S) ⊆ 𝒫(⋃T(S)) = 𝒫(⋃S) by Theorem 8.4.14 (vi, ii). It has thus
been shown that {∅, ⋃S} ⊆ T(S) ⊆ 𝒫(⋃S) and that T(S) is closed under binary intersections and arbitrary
unions. Therefore T(S) is a topology on ⋃S by Definition 31.3.2. This verifies part (iii).
32.1.9 Remark: The topology generated by a collection of sets equals an explicit expression.
Theorem 32.1.10 shows that the topology generated on a set X by a collection of sets S ⊆ 𝒫(X) is the same
as the fairly explicit topology construction in Theorem 32.1.8.
32.1.10 Theorem: Explicit expression for the topology generated by an arbitrary collection of sets.
Let X = ⋃S for sets X and S. Then T(S) = {⋃C; C ∈ 𝒫({⋂D; D ∈ 𝒫₁(S)})} is the topology generated
by S on X.
Proof: Let X and S be sets with X = ⋃S. Define T(S) as indicated. Then T(S) is a topology on X by
Theorem 32.1.8 (iii).
Let T′ be a topology on X which satisfies S ⊆ T′. Let Y(S) = {⋂D; D ∈ 𝒫₁(S)}. Then Y(S) ⊆ T′
because Y(S) is the set of all finite intersections of elements of S, and T′ is closed under finite intersections
(because T′ is assumed to be a topology). Since T′ is closed under arbitrary unions, it follows similarly that
{⋃C; C ∈ 𝒫(Y(S))} ⊆ T′. Thus T(S) ⊆ T′. Since T(S) is a topology on X which satisfies S ⊆ T(S)
by Theorem 32.1.8 (i), it follows from Theorem 32.1.2 that T(S) is equal to the intersection of all such
topologies. Therefore T(S) is the topology generated by S on X by Definition 32.1.4.
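For finite S the explicit construction of Theorem 32.1.10 is directly computable: form Y(S), the intersections of non-empty finite subcollections, then take all unions of subcollections of Y(S). A Python sketch (illustrative only; the names Y and T follow the theorem):

```python
from itertools import combinations
from functools import reduce

def topology_generated_by(S):
    """T(S) = {union of C; C a subcollection of Y(S)}, where Y(S) is the set of
       intersections of non-empty finite subcollections of S (Theorem 32.1.10).
       Assumes the base set is X = union of S."""
    S = [frozenset(s) for s in S]
    # Y(S) = {intersection of D; D in P_1(S)}
    Y = {reduce(lambda a, b: a & b, D) for r in range(1, len(S) + 1)
                                       for D in combinations(S, r)}
    Y = sorted(Y, key=sorted)
    # T(S) = all unions of subcollections of Y(S); C = empty gives the empty set
    return {frozenset().union(*C) for r in range(len(Y) + 1)
                                  for C in combinations(Y, r)}
```

For S = {{0, 1}, {1, 2}} (so X = {0, 1, 2}), the result is {∅, {1}, {0, 1}, {1, 2}, X}: the intersection {1} is forced first, and the unions follow.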
32.1.11 Remark: The choice axiom is not required for an explicit expression for a generated topology.
The proof of Theorem 32.1.10 does not use the axiom of choice because it was avoided in the proof of the
closure of T under arbitrary unions in Theorem 8.5.2. Avoiding the axiom of choice in topology is difficult
because topology deals with such general sets and collections of sets. Measure theory is another subject
which tempts one to use the axiom of choice because of the enormous generality of the sets.
32.1.13 Remark: The topology generated by a set is the weakest topology which includes the set.
The topology generated on a set X by a set S of subsets of X is the unique topology on X which is weaker
than all other possible topologies on X which include S.
32.1.14 Remark: Construction for generated topologies from collections closed under intersection.
Theorem 32.1.15 is a version of Theorem 32.1.10 which assumes that the set S is closed under finite
intersections. Then only half of the construction work is required to build the topology generated by S on X.
To facilitate comparisons, the set S is denoted as T′. In fact, the proof of Theorem 32.1.10 could have been
shortened by first proving Theorem 32.1.15 and then applying it to the set T′ in Theorem 32.1.10.
32.1.15 Theorem: Explicit expression for the topology generated by an intersection-closed collection of sets.
Let X and T′ be sets such that {∅, X} ⊆ T′ ⊆ 𝒫(X) and T′ is closed under finite intersections. Define
T = {⋃Q; Q ∈ 𝒫(T′)}.
Then T is the topology generated by T′ on X.
Proof: To show that ∅ ∈ T, let Q = ∅ ∈ 𝒫(T′). Then ∅ = ⋃Q ∈ T. To show that X ∈ T, let
Q = {X} ∈ 𝒫(T′). Then X = ⋃Q ∈ T. So {∅, X} ⊆ T.
To show that T is closed under finite intersections, let A₁, A₂ ∈ T. Then Aᵢ = ⋃Qᵢ for some Qᵢ ⊆ T′
for i = 1, 2. Hence A₁ ∩ A₂ = (⋃Q₁) ∩ (⋃Q₂) = ⋃Q with Q = {U₁ ∩ U₂; U₁ ∈ Q₁, U₂ ∈ Q₂} (by
Theorem 8.4.8 (vii)). For i = 1, 2, Uᵢ ∈ Qᵢ implies that Uᵢ ∈ T′, and so U₁ ∩ U₂ ∈ T′ by the closure of T′
under finite intersections. So Q ∈ 𝒫(T′). Therefore ⋃Q ∈ T. That is, A₁ ∩ A₂ ∈ T. Hence T is closed
under finite intersections.
The closure of T under arbitrary unions is guaranteed by Theorem 8.5.2. So T satisfies all of the conditions
of Definition 31.3.2 for a topology on X.
To show that T is the topology generated by T′ on X, first show that T′ ⊆ T. Let U ∈ T′. Then {U} ∈ 𝒫(T′).
So U = ⋃{U} ∈ T. Hence T′ ⊆ T.
Let T₁ be a topology on X which satisfies T′ ⊆ T₁. Then T ⊆ T₁ because T₁ is closed under arbitrary unions
and T is the closure of T′ under arbitrary unions. Therefore T is included in the intersection of all topologies
T₁ on X which include T′. Since T is itself such a topology, it follows that T is equal to the intersection of
all such topologies. Therefore T satisfies Definition 32.1.4 for the topology generated by T′ on X.
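When T′ is already closed under finite intersections and contains ∅ and X, Theorem 32.1.15 says that the unions alone suffice. A hedged Python sketch of this half of the construction (the precondition on T′ is assumed, not checked):

```python
from itertools import combinations

def topology_from_intersection_closed(T0):
    """All unions of subcollections Q of T0 (Theorem 32.1.15). Assumes T0
       contains the empty set and the base set X, and is closed under finite
       intersections; only the union half of the work remains."""
    T0 = sorted({frozenset(t) for t in T0}, key=sorted)
    # Q = empty gives the empty set; Q = T0 gives X
    return {frozenset().union(*Q) for r in range(len(T0) + 1)
                                  for Q in combinations(T0, r)}
```

For the intersection-closed collection T′ = {∅, {1}, {0, 1}, {1, 2}, {0, 1, 2}}, taking unions adds nothing new, confirming that such a T′ is already a topology.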
32.2.1 Remark: Generating topologies from open bases and open subbases.
In practice, it is almost always too inconvenient to directly specify all of the open sets in a topology. A
topology is most conveniently generated from an open base or open subbase. The open subbase concept
corresponds roughly to Theorem 32.1.10, which states that the topology generated by an arbitrary collection
of sets equals the set of unions of finite intersections of sets in the collection. The open base concept
corresponds roughly to Theorem 32.1.15, which states that the topology generated by a collection of sets
which is closed under finite intersections equals the set of unions of sets in the collection.
The open base and subbase concepts are similar to the concept of a basis for a linear space. Many operations
on linear spaces can be specified for a basis, from which the operations on the whole space follow. In the
same way, many definitions and calculations for topological spaces may be specified for an open base or
subbase, from which the corresponding operations follow for the full topology.
32.2.2 Definition: An open base at a point x in a topological space X is a subset B of Topₓ(X) such that
every neighbourhood of x includes at least one element of B. In other words, B ∈ 𝒫(Topₓ(X)) and
∀Ω ∈ Topₓ(X), ∃B ∈ B, B ⊆ Ω.
32.2.3 Definition: An open base for a topological space X is a subset B of Top(X) such that
∀Ω ∈ Top(X), ∀x ∈ Ω, ∃B ∈ B, x ∈ B ⊆ Ω.
32.2.4 Theorem: Let X be a topological space. Then Top(X) is an open base for X.
a general indexed open base. This is asserted in Theorem 32.2.11.
In most cases, finite open bases are of no practical interest. If they are required, they can either be explicitly
defined to be finite, with or without an index, or a single open set can appear an infinite number of times
in the sequence. Therefore only countably infinite indexed open bases at a single point are defined here.
32.2.9 Definition: A countable open base at a point x in a topological space X is a countable subset B
of Topₓ(X) which is an open base at x.
An indexed countable open base at a point x in a topological space X is a sequence (Bᵢ)_{i∈ℕ} in Topₓ(X)
whose range {Bᵢ; i ∈ ℕ} is an open base at x.
A non-increasing indexed countable open base at a point x in a topological space X is an indexed countable
open base (Bᵢ)_{i∈ℕ} in Topₓ(X) at x such that Bᵢ₊₁ ⊆ Bᵢ for all i ∈ ℕ.
32.2.10 Definition: A countable open base for a topological space X is a countable subset B of Top(X)
which is an open base for X.
An indexed countable open base for a topological space X is a sequence (Bᵢ)_{i∈ℕ} in Top(X) whose range
{Bᵢ; i ∈ ℕ} is an open base for X.
32.2.11 Theorem: Let (Bᵢ)_{i∈ℕ} be an indexed countable open base at a point x in a topological space X.
Then the sequence (B′ᵢ)_{i∈ℕ} defined by B′ᵢ = ⋂_{j=0}^{i} Bⱼ for all i ∈ ℕ is a non-increasing indexed
countable open base at x.
Proof: Let X be a topological space with x ∈ X. Let B = (Bᵢ)_{i∈ℕ} be an indexed countable open base at x.
Define B′ = (B′ᵢ)_{i∈ℕ} by B′ᵢ = ⋂_{j=0}^{i} Bⱼ for all i ∈ ℕ. Then B′ᵢ ∈ Topₓ(X) for all i ∈ ℕ by
Theorem 31.3.7. Let Ω ∈ Topₓ(X). Then Bᵢ ⊆ Ω for some i ∈ ℕ by Definition 32.2.2. But B′ᵢ ⊆ Bᵢ.
Consequently ∀Ω ∈ Topₓ(X), ∃i ∈ ℕ, B′ᵢ ⊆ Ω. So ∀Ω ∈ Topₓ(X), ∃G ∈ Range(B′), G ⊆ Ω. Therefore
Range(B′) is an open base at x by Definition 32.2.2. Hence B′ is a non-increasing indexed countable open
base at x.
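The replacement sequence B′ᵢ = B₀ ∩ … ∩ Bᵢ in Theorem 32.2.11 is a running intersection, which can be written in one line. An illustrative Python fragment (with finite sets standing in for the open neighbourhoods Bᵢ; the function name is ad hoc):

```python
from itertools import accumulate

def non_increasing_base(B):
    """B'_i = B_0 ∩ B_1 ∩ ... ∩ B_i for an indexed open base (B_i),
       computed as a running intersection over the sequence."""
    return list(accumulate(B, lambda a, b: a & b))
```

Each B′ᵢ₊₁ ⊆ B′ᵢ by construction, and B′ᵢ ⊆ Bᵢ, so the range of the new sequence is again an open base at x, exactly as in the proof.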
32.3.2 Definition: An open subbase at a point x in a topological space X is a subset S of Topₓ(X) such
that {⋂D; D ∈ 𝒫₁(S)}, the set of finite intersections of elements of S, is an open base at x. In other
words, S ∈ 𝒫(Topₓ(X)) and
∀Ω ∈ Topₓ(X), ∃D ∈ 𝒫₁(S), ⋂D ⊆ Ω.
32.3.3 Remark: Notation for the set of non-empty finite subsets of a given set.
As mentioned in Remark 32.1.7, the expression 𝒫₁(S) (which appears in Definition 32.3.2) denotes the set
{D ∈ 𝒫(S); 1 ≤ #(D) < ∞} of non-empty finite subsets of S. (See Notation 13.10.5 for 𝒫₁(S). For basic
properties, see Theorem 13.10.8.)
32.3.4 Remark: Non-empty finite intersections of open subbase elements span an open base.
In Definition 32.3.5 line (32.3.2), the constraint D ∈ 𝒫₁(S ∩ Topₓ(X)) forces the non-empty finite set
collection D to contain only open neighbourhoods of x. So ⋂D is always a well-defined element of Topₓ(X).
32.3.5 Definition: An open subbase for a topological space X is a subset S of Top(X) such that the set
{⋂D; D ∈ 𝒫₁(S)} is an open base for X. In other words, S ∈ 𝒫(Top(X)) and
∀x ∈ X, ∀Ω ∈ Topₓ(X), ∃D ∈ 𝒫₁(S ∩ Topₓ(X)), ⋂D ⊆ Ω.   (32.3.2)
32.3.6 Theorem: Let X be a topological space.
(i) Top(X) is an open subbase for X.
(ii) If B is an open base for X, then B is an open subbase for X.
32.3.7 Remark: A set-collection is a subbase for a topology if and only if it generates the topology.
By combining Theorem 32.2.6 and Definition 32.3.5, one sees that a set S ∈ 𝒫(𝒫(X)) is an open subbase
for a topological space (X, T) if and only if
T = {⋃C; C ∈ 𝒫(𝒫(X)) and ∀G ∈ C, ∃D ∈ 𝒫₁(S), G = ⋂D}
  = {⋃C; ∀G ∈ C, ∃D ∈ 𝒫₁(S), G = ⋂D}
  = {⋃C; C ⊆ {⋂D; D ∈ 𝒫₁(S)}}
  = {⋃C; C ∈ 𝒫({⋂D; D ∈ 𝒫₁(S)})}.   (32.3.3)
32.3.8 Theorem: Let X be a topological space. Let S ∈ 𝒫(Top(X)). Then S is an open subbase for X
if and only if
∀Ω ∈ Top(X), ∃C ⊆ ⋃_{m=1}^{∞} Sᵐ, Ω = ⋃_{D∈C} ⋂_{i=1}^{#(D)} Dᵢ.   (32.3.4)
32.3.9 Remark: Diagram for the two-level hierarchy which generates a topology from an open subbase.
The two-level hierarchy in Definition 32.3.5 is illustrated in Figure 32.3.1. The subbase elements Eᵢ,ⱼ,ₖ ∈ S
are intersected to construct sets ⋂Dᵢ,ⱼ = Eᵢ,ⱼ,₁ ∩ Eᵢ,ⱼ,₂ ∩ …, where Dᵢ,ⱼ = {Eᵢ,ⱼ,₁, Eᵢ,ⱼ,₂, …} ⊆ S. The set
intersections ⋂Dᵢ,ⱼ are joined to construct sets ⋃Cᵢ = ⋂Dᵢ,₁ ∪ ⋂Dᵢ,₂ ∪ …, where Cᵢ = {⋂Dᵢ,₁, ⋂Dᵢ,₂, …}.
Then the topology T = {⋃C₁, ⋃C₂, …} is the set of all such unions ⋃Cᵢ.
[Figure 32.3.1. The two-level hierarchy which generates a topology from an open subbase: subbase elements
Eᵢ,ⱼ,ₖ are intersected to form the sets ⋂Dᵢ,ⱼ, which are then joined to form the unions ⋃Cᵢ constituting
the topology T.]
32.3.10 Theorem: Let X and S be sets satisfying {∅, X} ⊆ S ⊆ 𝒫(X). Let T be a topology on X. Then
S is an open subbase for the topological space (X, T) if and only if T is the topology generated by S on X.
Proof: Let X and S be sets satisfying {∅, X} ⊆ S ⊆ 𝒫(X). Let T be a topology on X. Suppose that S is
an open subbase for the topological space (X, T). Then by Theorem 32.1.10, the topology generated by S
on X is the set T(S) = {⋃C; C ∈ 𝒫({⋂D; D ∈ 𝒫₁(S)})}. But as mentioned in Remark 32.3.7, this is
equivalent to the condition in Definition 32.3.5 for S to be an open subbase for T(S).
32.4.4 Remark: Generation of a topology from inverse images from multiple topological spaces.
Theorem 32.4.2 is generalised in Theorem 32.4.6 to an arbitrary family of functions and target topologies.
This is useful for proving the validity of topologies on direct products of topological spaces.
In Theorem 32.9.3, the family of projections of a Cartesian set-product induces a topology on the
product by forming the unions of intersections of inverse images of open sets via the projections.
(3) Topologically consistent set-collection. Generated topology = set of all unions.
In Theorem 32.1.15, the topology generated by a set-collection is the set of all unions of sets in the
collection if it is closed under finite intersections.
In Theorem 32.11.7, a topology is constructed on the base set X as the set of all unions of open subsets
of overlapping sets which cover X and have consistent topologies on overlaps.
(4) Topologically consistent function-family. Generated topology = set of all unions.
In Theorem 48.7.8, a topology is constructed on the base set M as the set of unions of inverse images
of partial functions on M satisfying the consistency condition in Definition 48.7.2 (iv).
Cases (3) and (4) both require prior consistency between contributing topologies, which explains why their
constructions are expressed in terms of simple unions of sets, not unions of intersections. Thus the generated
topologies in cases (1) and (2) are constructed from open subbases, whereas the generated topologies in cases
(3) and (4) are constructed from open bases.
T_X = {⋃C; C ∈ 𝒫(T′_X)},
where
T′_X = {⋂_{j∈J} fⱼ⁻¹(Ωⱼ); J ∈ 𝒫₁(I) and ∀j ∈ J, Ωⱼ ∈ Tⱼ}.
Then T_X is a topology on X.
32.4.7 Remark: Properties of the weak topology generated by a family of maps and topological spaces.
Theorem 32.4.6 shows that the set T_X in Definition 32.4.8 is a topology on X. Any topology on X which
contains all of the sets fᵢ⁻¹(Ωᵢ) must include the weak topology on X. It is shown in Theorem 32.7.12 that
T_X is the weakest topology on X for which all of the functions fᵢ : X → Xᵢ are continuous. (Definition 32.4.8
is used in Definition 32.9.2 for product topologies. A differentiable manifold version of Theorem 32.4.6 is
given as Theorem 50.5.7.)
T_X = {⋃C; C ∈ 𝒫(T′_X)},
where
T′_X = {⋂_{j∈J} fⱼ⁻¹(Ωⱼ); J ∈ 𝒫₁(I) and ∀j ∈ J, Ωⱼ ∈ Tⱼ}.
In other words,
T_X = {⋃C; C ∈ 𝒫({⋂_{j∈J} fⱼ⁻¹(Ωⱼ); J ∈ 𝒫₁(I) and (Ωⱼ)_{j∈J} ∈ ∏_{j∈J} Tⱼ})}.
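This weak-topology construction is computable for small finite examples: collect the inverse images fⱼ⁻¹(Ωⱼ), close under non-empty finite intersections, then take all unions. A Python sketch (hypothetical helper; each map is given as a pair (fⱼ, Tⱼ) of a function and the topology on its target):

```python
from itertools import combinations
from functools import reduce

def weak_topology(X, maps):
    """Weak topology on X induced by maps f_j : X -> X_j with topologies T_j:
       all unions of finite intersections of the inverse images f_j^{-1}(Omega)
       for Omega in T_j (the construction of Definition 32.4.8)."""
    X = frozenset(X)
    # inverse images of the open sets of each target space
    inv = [frozenset(x for x in X if f(x) in O) for f, T in maps for O in T]
    # T'_X: intersections over non-empty finite subfamilies; X is added
    # explicitly (redundant whenever some T_j contains its whole space X_j)
    prebase = {X}
    for r in range(1, len(inv) + 1):
        for D in combinations(inv, r):
            prebase.add(reduce(lambda a, b: a & b, D))
    pre = sorted(prebase, key=sorted)
    # T_X: all unions of subcollections of the prebase
    return {frozenset().union(*C) for r in range(len(pre) + 1)
                                  for C in combinations(pre, r)}
```

With X = {0, 1, 2, 3} and the single parity map x ↦ x mod 2 into the space {0, 1} with topology {∅, {0}, {0, 1}}, the weak topology on X is {∅, {0, 2}, X}: the coarsest topology making the map continuous.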
The usual topology for the integers in Definition 32.5.2 is the only topology on the integers which is
translation-invariant and contains at least one non-empty finite set. (See Remark 31.11.3 for discussion
of this.) This usual topology is the same as the relative topology induced by the set of real numbers in
Definition 32.5.6. It is also the same as the discrete topology in Definition 31.3.19. The same comments
apply to the non-negative integers.
The extended integers are more interesting. The topologies in Definition 32.5.3 are the standard two-point
and one-point compactifications of the topologies in Definition 32.5.2.
32.5.2 Definition: The usual topology on the integers is the set of all subsets of the integers. In other
words, Top(ℤ) = 𝒫(ℤ).
The usual topology on the non-negative integers is the set of all subsets of the non-negative integers. In other
words, Top(ℤ⁺₀) = 𝒫(ℤ⁺₀).
32.5.3 Definition: The usual topology on the extended integers is the set of all subsets of the integers
together with all complements of finite subsets of the extended integers. In other words,
Top(ℤ̄) = 𝒫(ℤ) ∪ {S ∈ 𝒫(ℤ̄); #(ℤ̄ \ S) < ∞}.
The usual topology on the non-negative extended integers is the set of all subsets of the non-negative integers
together with all complements of finite subsets of the non-negative extended integers. In other words,
Top(ℤ̄⁺₀) = 𝒫(ℤ⁺₀) ∪ {S ∈ 𝒫(ℤ̄⁺₀); #(ℤ̄⁺₀ \ S) < ∞}.
32.5.4 Remark: The usual topology on the real numbers requires only the total order structure.
The usual topology on the real numbers can be defined in a relatively elementary way without reference
to metric spaces or open bases. The usual total order on the real numbers is a sufficient foundation for
Definition 32.5.6. (Open intervals for general totally ordered sets are defined in Definition 11.5.9. Real
numbers are introduced in Section 15.3.)
For good measure, the usual topology on the set of rational numbers is also given here. It has the same
formal structure as the usual topology on the real numbers, and it is equal to the relative topology induced
by the usual topology on the set of real numbers.
32.5.5 Definition: The usual topology on the rational numbers is the set of all unions of open intervals
of the rational numbers.
32.5.6 Definition: The usual topology on the real numbers is the set of all unions of open intervals of the
real numbers.
32.5.7 Remark: The topology of the real numbers is generated from open intervals.
The usual topology for the real numbers is the topology generated by the set of all real open intervals. (By
Theorem 32.6.8, each open set of real numbers is the union of a countable sequence of disjoint open intervals.)
Definition 32.5.6 is not as circular as it looks. In Definition 16.1.5, an open interval is defined as a set
of the form (a, b) = {x ∈ ℝ; a < x < b}, where a, b ∈ ℝ with a ≤ b. This definition uses only the order
structure on ℝ. It turns out that open intervals are indeed open sets in the usual topology of ℝ as one would
expect. Some topological properties of real number intervals are presented in Section 34.9.
The topological and order properties of the real numbers are closely interlinked. This is shown especially by
the two major methods for constructing the real numbers, namely Dedekind cuts (which use a total order)
and Cauchy sequences (which use a distance function). These are discussed in Section 15.3.
32.5.8 Theorem: ∀Ω ∈ Top(ℝ), ∀a ∈ Ω, ∃ε ∈ ℝ⁺, (a − ε, a + ε) ⊆ Ω.
Proof: Let Ω ∈ Top(ℝ) and a ∈ Ω. By Definition 32.5.6, Ω is a union of open intervals. So a ∈ (α, β) ⊆ Ω
for some open interval (α, β). Let ε₁ = a − α and ε₂ = β − a, and let ε = min(1, ε₁, ε₂). Then ε ∈ ℝ⁺
and (a − ε, a + ε) ⊆ Ω.
32.5.9 Definition: The usual topology for ℝⁿ for n ∈ ℤ⁺ is the topology generated by the set of all
Cartesian products of real open intervals.
32.5.10 Remark: The topology on Cartesian products of real numbers is generated from open intervals.
The Cartesian products of real open intervals which are referred to in Definition 32.5.9 are sets of the form
∏_{i=1}^{n} (aᵢ, bᵢ), where (aᵢ)_{i=1}^{n}, (bᵢ)_{i=1}^{n} ∈ ℝⁿ are sequences of real numbers such that
aᵢ ≤ bᵢ for all i ∈ ℕₙ. The usual topology on ℝⁿ is the same as the product topology for the set product
ℝⁿ = ∏_{i=1}^{n} ℝ. (See Definitions 32.8.4 and 32.9.2 for general product topologies.)
The Cartesian topological spaces in Definition 32.5.11 are Cartesian tuple spaces as in Definition 16.4.1,
together with the usual topology in Definition 32.5.9.
32.5.11 Definition: The Cartesian topological space with dimension n ∈ ℤ⁺₀ is the topological space
(ℝⁿ, T_{ℝⁿ}), where T_{ℝⁿ} is the usual topology for ℝⁿ.
32.5.12 Remark: The standard topology on a finite-dimensional linear space.
The topology in Definition 32.5.9 may be imported onto any finite-dimensional linear space via a coordinate
chart as in Definition 32.5.13. The construction for this importation is given in Theorem 32.4.2. (See
Definitions 22.8.6, 22.8.7 and 22.8.8 for coordinate maps.) The imported topology is independent of the
choice of basis for the chart. (This follows easily from the basic properties of the vector component transition
maps in Definition 22.9.4.)
32.5.13 Definition: The standard topology for a finite-dimensional linear space F is the topology on F
given by the inverse images of the sets Ω ∈ Top(ℝⁿ) under the component map for B on F, for any basis B
for F, where n = dim(F).
32.5.14 Remark: The topology for general linear groups is imported from Cartesian spaces.
In Definition 32.5.15, the standard topology for the general linear group GL(F) of a finite-dimensional linear
space is imported from the parameter space ℝⁿˣⁿ for n × n matrices via the linear-map component maps
in Definition 23.2.8. Since the image of the component map is a proper subset of ℝⁿˣⁿ when n ≥ 2, the
relative topology on the set of invertible matrices in ℝⁿˣⁿ is used. The set of invertible matrices is an open
subset of the set of all matrices with respect to the topology on Mₙ,ₙ(ℝ) which is imported from ℝⁿˣⁿ via
the component maps.
32.5.15 Definition: The standard topology for the general linear group GL(F) of a finite-dimensional
linear space F is the topology on GL(F) given by the inverse images of the sets Ω ∈ Top(ℝⁿˣⁿ) under the
linear-map component map for B on F as in Definition 23.2.8, for any basis B for F, where n = dim(F).
32.5.16 Theorem: ℚ̄ = ℝ.
Proof: Let x ∈ ℝ. Let Ω ∈ Topₓ(ℝ). Then (x − ε, x + ε) ⊆ Ω for some ε ∈ ℝ⁺. Let N = ceiling(ε⁻¹).
Then ε⁻¹ ≤ N. So N⁻¹ ≤ ε. Let q = ceiling(N x)/N. Then N q = ceiling(N x). So N x ≤ N q < N x + 1.
Therefore x ≤ q < x + N⁻¹ ≤ x + ε. So q ∈ (x − ε, x + ε) ⊆ Ω. But q ∈ ℚ. So Ω ∩ ℚ ≠ ∅. Therefore x ∈ ℚ̄ by
Theorem 31.8.16 (ii). Hence ℚ̄ = ℝ.
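The proof of Theorem 32.5.16 is an explicit algorithm: given x and ε, take N = ceiling(ε⁻¹) and q = ceiling(Nx)/N. A Python sketch using exact rational arithmetic (a hypothetical helper, not from the book):

```python
from fractions import Fraction
from math import ceil

def rational_near(x, eps):
    """Return q in Q with x <= q < x + eps, following the proof of
       Theorem 32.5.16: N = ceiling(1/eps), q = ceiling(N*x)/N."""
    x, eps = Fraction(x), Fraction(eps)
    N = ceil(1 / eps)             # N >= 1/eps, so 1/N <= eps
    return Fraction(ceil(N * x), N)
```

Since Nx ≤ ceiling(Nx) < Nx + 1, the returned q satisfies x ≤ q < x + 1/N ≤ x + ε, so q lies in the interval (x − ε, x + ε), with no appeal to any choice principle.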
Proof: For the case n = 0, ℚ⁰ = {0} = ℝ⁰. The case n = 1 follows from Theorem 32.5.16. For n ≥ 2,
let x ∈ ℝⁿ and Ω ∈ Topₓ(ℝⁿ). Then ∏_{i=1}^{n} (xᵢ − ε, xᵢ + ε) ⊆ Ω for some ε ∈ ℝ⁺. The method of
Theorem 32.5.16 may be applied to each component xᵢ of x for i ∈ ℕₙ. This yields a tuple q = (qᵢ)_{i=1}^{n}
such that xᵢ ≤ qᵢ < xᵢ + ε for all i ∈ ℕₙ. Then q ∈ ∏_{i=1}^{n} (xᵢ − ε, xᵢ + ε) ⊆ Ω. Thus ℚⁿ ∩ Ω ≠ ∅
for any given Ω ∈ Topₓ(ℝⁿ). Hence the closure of ℚⁿ equals ℝⁿ for all n ∈ ℤ⁺₀.
32.5.20 Remark: Definition of circle topology on semi-open intervals in terms of set translates.
The notation Ω + L in Definition 32.5.21 means the set {x + L; x ∈ Ω}. The torus topological spaces in
Definition 32.5.22 are defined as topological space products of circle topological spaces.
32.5.21 Definition: The circle topology for a finite semi-open real-number interval is the topology
{(Ω ∪ (Ω + L)) ∩ I; Ω ∈ T},
where the semi-open interval is I = [a, a + L) or I = (a, a + L] for some a ∈ ℝ and L ∈ ℝ⁺, and T is the
usual topology on ℝ.
32.5.22 Definition: The torus topology for a finite product of finite semi-open real-number intervals is the
product topology for a set ∏_{k=1}^{n} Iₖ, where for each k ∈ ℕₙ, the set Iₖ is a semi-open interval
Iₖ = [aₖ, aₖ + Lₖ) or Iₖ = (aₖ, aₖ + Lₖ] for some aₖ ∈ ℝ and Lₖ ∈ ℝ⁺, and the topology Tₖ on each
interval Iₖ is the circle topology on Iₖ.
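The translate construction of Definition 32.5.21 can be pictured concretely: an open interval Ω that straddles the seam point a produces a "wrapped" open arc in I = [a, a + L). A small Python predicate (an illustrative sketch, with Ω restricted to a single open interval (p, q)):

```python
def circle_open(omega, a, L):
    """Membership test for the set (Omega ∪ (Omega + L)) ∩ I of Definition
       32.5.21, where I = [a, a + L) and Omega = (p, q) is a real open interval."""
    p, q = omega

    def member(x):
        in_I = a <= x < a + L
        in_omega = p < x < q           # x in Omega
        in_shifted = p < x - L < q     # x in Omega + L
        return in_I and (in_omega or in_shifted)

    return member
```

With a = 0, L = 1 and Ω = (−1/4, 1/4), the resulting open set is [0, 1/4) ∪ (3/4, 1): a single arc of the circle around the seam point 0, open in the circle topology though not in the relative topology of [0, 1).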
R
page 55. However, Theorem 34.8.2 confirms that no axiom of choice is required for n .) Definition 32.6.2 is
generalised from the real numbers to any topological space by Definition 34.6.16. Theorem 32.6.8 is gener-
alised from the real numbers to general locally connected separable topological spaces by Theorem 34.8.2.
To enumerate the components of an open set of real numbers, one's first instinct might be to list them in
left-to-right order, but this clearly will not work for open sets like ℝ \ ℤ or {x ∈ ℝ \ ℤ; sin(1/frac(x)) < 0}.
If one restricts the focus to open subsets of a bounded interval I of ℝ, one may enumerate the intervals of an
open set by successively listing the intervals whose length lies in the interval (L2^{−k−1}, L2^{−k}] for k ∈ ℕ,
where L is the length of the interval I. This list is finite for each k ∈ ℕ. Thus one may generate an enumeration
of all of the component intervals of the set. This approach is not so feasible for the Cartesian products ℝⁿ
of ℝ, however, because in place of the length of open-set components as a criterion for sorting, one would
need to use some other attribute, such as the Lebesgue measure. But then it would not be possible to exploit
the countable-components property of open sets when building Lebesgue measure theory. Therefore a more
general and systematic strategy is usually adopted, which exploits the separability and local connectedness
of the real numbers.
To avoid using the excessively mystical axiom of choice, it is important to check that each stage in the
construction of an enumeration of the components of an open set is expressible in terms of set-theoretic
functions. The usual strategy is as follows.
(1) Construct an explicit enumeration of a countable dense subset of a given topological space, which is in
this case the real number system. (The usual choice of countable dense subset is the rational numbers,
although the set of binary or decimal fractions would be suitable also. See Remark 15.0.2 for some choices
for countable dense subsets. An explicit enumeration of the rational numbers is given in Remark 15.2.4.)
(2) Using an explicit enumeration (xᵢ)_{i∈ℕ} of such a dense subset, construct the connected component of a
given open set which contains any given point xᵢ of the enumeration for i ∈ ℕ. (This construction is
presented in Theorem 32.6.4 and Definition 32.6.5.)
(3) By countable induction, construct an explicit enumeration of the component intervals of an open set
which associates each integer i ∈ ℕ with the first new component interval which was not landed on
by earlier elements in the dense subset enumeration.
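Steps (1)-(3) can be imitated in code for an open set given as a finite union of open intervals: scan an enumeration of the dense subset, compute the component containing each point, and keep each component the first time it appears. A Python sketch (the finite-interval representation is a simplifying assumption; the book's construction handles arbitrary open sets):

```python
def component(intervals, x):
    """Connected component of x in the union of the given open intervals,
       or None if x is not in the union. Repeatedly merges every interval
       overlapping the current component; finitely many intervals, so the
       merging loop terminates."""
    hit = [(a, b) for (a, b) in intervals if a < x < b]
    if not hit:
        return None
    lo = min(a for a, b in hit)
    hi = max(b for a, b in hit)
    changed = True
    while changed:
        changed = False
        for a, b in intervals:
            # (a, b) meets (lo, hi) and extends it on at least one side
            if a < hi and b > lo and (a < lo or b > hi):
                lo, hi = min(lo, a), max(hi, b)
                changed = True
    return (lo, hi)

def enumerate_components(intervals, dense_points):
    """Step (3): scan an enumeration of the dense subset, keeping each
       first-seen component of the open set."""
    seen = []
    for q in dense_points:
        c = component(intervals, q)
        if c is not None and c not in seen:
            seen.append(c)
    return seen
```

For the open set (0, 1) ∪ (1/2, 2) ∪ (3, 4), scanning the points 3/4, 7/2, 3/2 produces the two components (0, 2) and (3, 4), with the repeat landing in (0, 2) discarded; no choice function is needed because the dense enumeration drives the whole construction.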
The most mystical aspect of this procedure is the use of countable induction. (In other words, it is not very