
Calculus 1 to 4 (2004–2006)

Axel Schuler
January 3, 2007

Contents

1 Real and Complex Numbers . . . 11
  Basics . . . 11
    Notations . . . 11
    Sums and Products . . . 12
    Mathematical Induction . . . 12
    Binomial Coefficients . . . 13
  1.1 Real Numbers . . . 15
    1.1.1 Ordered Sets . . . 15
    1.1.2 Fields . . . 17
    1.1.3 Ordered Fields . . . 19
    1.1.4 Embedding of natural numbers into the real numbers . . . 20
    1.1.5 The completeness of R . . . 21
    1.1.6 The Absolute Value . . . 22
    1.1.7 Supremum and Infimum revisited . . . 23
    1.1.8 Powers of real numbers . . . 24
    1.1.9 Logarithms . . . 26
  1.2 Complex numbers . . . 29
    1.2.1 The Complex Plane and the Polar form . . . 31
    1.2.2 Roots of Complex Numbers . . . 33
  1.3 Inequalities . . . 34
    1.3.1 Monotony of the Power and Exponential Functions . . . 34
    1.3.2 The Arithmetic-Geometric mean inequality . . . 34
    1.3.3 The Cauchy–Schwarz Inequality . . . 35
  1.4 Appendix A . . . 36

2 Sequences and Series . . . 43
  2.1 Convergent Sequences . . . 43
    2.1.1 Algebraic operations with sequences . . . 46
    2.1.2 Some special sequences . . . 49
    2.1.3 Monotonic Sequences . . . 50
    2.1.4 Subsequences . . . 51
  2.2 Cauchy Sequences . . . 55
  2.3 Series . . . 57
    2.3.1 Properties of Convergent Series . . . 57
    2.3.2 Operations with Convergent Series . . . 59
    2.3.3 Series of Nonnegative Numbers . . . 59
    2.3.4 The Number e . . . 61
    2.3.5 The Root and the Ratio Tests . . . 63
    2.3.6 Absolute Convergence . . . 65
    2.3.7 Decimal Expansion of Real Numbers . . . 66
    2.3.8 Complex Sequences and Series . . . 67
    2.3.9 Power Series . . . 68
    2.3.10 Rearrangements . . . 69
    2.3.11 Products of Series . . . 72

3 Functions and Continuity . . . 75
  3.1 Limits of a Function . . . 76
    3.1.1 One-sided Limits, Infinite Limits, and Limits at Infinity . . . 77
  3.2 Continuous Functions . . . 80
    3.2.1 The Intermediate Value Theorem . . . 81
    3.2.2 Continuous Functions on Bounded and Closed Intervals – The Theorem about Maximum and Minimum . . . 82
  3.3 Uniform Continuity . . . 83
  3.4 Monotonic Functions . . . 85
  3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses . . . 86
    3.5.1 Exponential and Logarithm Functions . . . 86
    3.5.2 Trigonometric Functions and their Inverses . . . 89
    3.5.3 Hyperbolic Functions and their Inverses . . . 94
  3.6 Appendix B . . . 95
    3.6.1 Monotonic Functions have One-Sided Limits . . . 95
    3.6.2 Proofs for sin x and cos x inequalities . . . 96
    3.6.3 Estimates for . . . 97

4 Differentiation . . . 101
  4.1 The Derivative of a Function . . . 101
  4.2 The Derivatives of Elementary Functions . . . 107
    4.2.1 Derivatives of Higher Order . . . 108
  4.3 Local Extrema and the Mean Value Theorem . . . 108
    4.3.1 Local Extrema and Convexity . . . 111
  4.4 L'Hospital's Rule . . . 112
  4.5 Taylor's Theorem . . . 113
    4.5.1 Examples of Taylor Series . . . 115
  4.6 Appendix C . . . 117

5 Integration . . . 119
  5.1 The Riemann–Stieltjes Integral . . . 119
    5.1.1 Properties of the Integral . . . 126
  5.2 Integration and Differentiation . . . 132
    5.2.1 Table of Antiderivatives . . . 134
    5.2.2 Integration Rules . . . 135
    5.2.3 Integration of Rational Functions . . . 138
    5.2.4 Partial Fraction Decomposition . . . 140
    5.2.5 Other Classes of Elementary Integrable Functions . . . 141
  5.3 Improper Integrals . . . 143
    5.3.1 Integrals on unbounded intervals . . . 143
    5.3.2 Integrals of Unbounded Functions . . . 146
    5.3.3 The Gamma function . . . 147
  5.4 Integration of Vector-Valued Functions . . . 148
  5.5 Inequalities . . . 150
  5.6 Appendix D . . . 151
    5.6.1 More on the Gamma Function . . . 152

6 Sequences of Functions and Basic Topology . . . 157
  6.1 Discussion of the Main Problem . . . 157
  6.2 Uniform Convergence . . . 158
    6.2.1 Definitions and Example . . . 158
    6.2.2 Uniform Convergence and Continuity . . . 162
    6.2.3 Uniform Convergence and Integration . . . 163
    6.2.4 Uniform Convergence and Differentiation . . . 166
  6.3 Fourier Series . . . 168
    6.3.1 An Inner Product on the Periodic Functions . . . 171
  6.4 Basic Topology . . . 177
    6.4.1 Finite, Countable, and Uncountable Sets . . . 177
    6.4.2 Metric Spaces and Normed Spaces . . . 178
    6.4.3 Open and Closed Sets . . . 180
    6.4.4 Limits and Continuity . . . 182
    6.4.5 Completeness and Compactness . . . 185
    6.4.6 Continuous Functions in Rk . . . 187
  6.5 Appendix E . . . 188

7 Calculus of Functions of Several Variables . . . 193
  7.1 Partial Derivatives . . . 194
    7.1.1 Higher Partial Derivatives . . . 197
    7.1.2 The Laplacian . . . 199
  7.2 Total Differentiation . . . 199
    7.2.1 Basic Theorems . . . 202
  7.3 Taylor's Formula . . . 206
    7.3.1 Directional Derivatives . . . 206
    7.3.2 Taylor's Formula . . . 208
  7.4 Extrema of Functions of Several Variables . . . 211
  7.5 The Inverse Mapping Theorem . . . 216
  7.6 The Implicit Function Theorem . . . 219
  7.7 Lagrange Multiplier Rule . . . 223
  7.8 Integrals depending on Parameters . . . 225
    7.8.1 Continuity of I(y) . . . 225
    7.8.2 Differentiation of Integrals . . . 225
    7.8.3 Improper Integrals with Parameters . . . 227
  7.9 Appendix . . . 230

8 Curves and Line Integrals . . . 231
  8.1 Rectifiable Curves . . . 231
    8.1.1 Curves in Rk . . . 231
    8.1.2 Rectifiable Curves . . . 233
  8.2 Line Integrals . . . 236
    8.2.1 Path Independence . . . 239

9 Integration of Functions of Several Variables . . . 245
  9.1 Basic Definition . . . 245
    9.1.1 Properties of the Riemann Integral . . . 247
  9.2 Integrable Functions . . . 248
    9.2.1 Integration over More General Sets . . . 249
    9.2.2 Fubini's Theorem and Iterated Integrals . . . 250
  9.3 Change of Variable . . . 253
  9.4 Appendix . . . 257

10 Surface Integrals . . . 259
  10.1 Surfaces in R3 . . . 259
    10.1.1 The Area of a Surface . . . 261
  10.2 Scalar Surface Integrals . . . 262
    10.2.1 Other Forms for dS . . . 262
    10.2.2 Physical Application . . . 264
  10.3 Surface Integrals . . . 264
    10.3.1 Orientation . . . 264
  10.4 Gauß Divergence Theorem . . . 268
  10.5 Stokes' Theorem . . . 272
    10.5.1 Green's Theorem . . . 272
    10.5.2 Stokes' Theorem . . . 274
    10.5.3 Vector Potential and the Inverse Problem of Vector Analysis . . . 276

11 Differential Forms on Rn . . . 279
  11.1 The Exterior Algebra Λ(Rn) . . . 279
    11.1.1 The Dual Vector Space V* . . . 279
    11.1.2 The Pull-Back of k-forms . . . 284
    11.1.3 Orientation of Rn . . . 285
  11.2 Differential Forms . . . 285
    11.2.1 Definition . . . 285
    11.2.2 Differentiation . . . 286
    11.2.3 Pull-Back . . . 288
    11.2.4 Closed and Exact Forms . . . 291
  11.3 Stokes' Theorem . . . 293
    11.3.1 Singular Cubes, Singular Chains, and the Boundary Operator . . . 293
    11.3.2 Integration . . . 295
    11.3.3 Stokes' Theorem . . . 296
    11.3.4 Special Cases . . . 298
    11.3.5 Applications . . . 299

12 Measure Theory and Integration . . . 305
  12.1 Measure Theory . . . 305
    12.1.1 Algebras, σ-Algebras, and Borel Sets . . . 306
    12.1.2 Additive Functions and Measures . . . 308
    12.1.3 Extension of Countably Additive Functions . . . 313
    12.1.4 The Lebesgue Measure on Rn . . . 314
  12.2 Measurable Functions . . . 316
  12.3 The Lebesgue Integral . . . 318
    12.3.1 Simple Functions . . . 318
    12.3.2 Positive Measurable Functions . . . 319
  12.4 Some Theorems on Lebesgue Integrals . . . 322
    12.4.1 The Role Played by Measure Zero Sets . . . 322
    12.4.2 The space Lp(X, μ) . . . 324
    12.4.3 The Monotone Convergence Theorem . . . 325
    12.4.4 The Dominated Convergence Theorem . . . 326
    12.4.5 Application of Lebesgue's Theorem to Parametric Integrals . . . 327
    12.4.6 The Riemann and the Lebesgue Integrals . . . 329
    12.4.7 Appendix: Fubini's Theorem . . . 329

13 Hilbert Space . . . 331
  13.1 The Geometry of the Hilbert Space . . . 331
    13.1.1 Unitary Spaces . . . 331
    13.1.2 Norm and Inner product . . . 334
    13.1.3 Two Theorems of F. Riesz . . . 335
    13.1.4 Orthogonal Sets and Fourier Expansion . . . 339
    13.1.5 Appendix . . . 343
  13.2 Bounded Linear Operators in Hilbert Spaces . . . 344
    13.2.1 Bounded Linear Operators . . . 344
    13.2.2 The Adjoint Operator . . . 347
    13.2.3 Classes of Bounded Linear Operators . . . 349
    13.2.4 Orthogonal Projections . . . 351
    13.2.5 Spectrum and Resolvent . . . 353
    13.2.6 The Spectrum of Self-Adjoint Operators . . . 357

14 Complex Analysis . . . 363
  14.1 Holomorphic Functions . . . 363
    14.1.1 Complex Differentiation . . . 363
    14.1.2 Power Series . . . 365
    14.1.3 Cauchy–Riemann Equations . . . 366
  14.2 Cauchy's Integral Formula . . . 369
    14.2.1 Integration . . . 369
    14.2.2 Cauchy's Theorem . . . 371
    14.2.3 Cauchy's Integral Formula . . . 373
    14.2.4 Applications of the Coefficient Formula . . . 377
    14.2.5 Power Series . . . 380
  14.3 Local Properties of Holomorphic Functions . . . 383
  14.4 Singularities . . . 385
    14.4.1 Classification of Singularities . . . 386
    14.4.2 Laurent Series . . . 387
  14.5 Residues . . . 392
    14.5.1 Calculating Residues . . . 394
  14.6 Real Integrals . . . 395
    14.6.1 Rational Functions in Sine and Cosine . . . 395
    14.6.2 Integrals of the form ∫_R f(x) dx . . . 396

15 Partial Differential Equations I – An Introduction . . . 401
  15.1 Classification of PDE . . . 401
    15.1.1 Introduction . . . 401
    15.1.2 Examples . . . 402
  15.2 First Order PDE – The Method of Characteristics . . . 405
  15.3 Classification of Semi-Linear Second-Order PDEs . . . 408
    15.3.1 Quadratic Forms . . . 408
    15.3.2 Elliptic, Parabolic and Hyperbolic . . . 408
    15.3.3 Change of Coordinates . . . 409
    15.3.4 Characteristics . . . 411
    15.3.5 The Vibrating String . . . 414

16 Distributions . . . 417
  16.1 Introduction – Test Functions and Distributions . . . 417
    16.1.1 Motivation . . . 417
    16.1.2 Test Functions D(Rn) and D(Ω) . . . 418
  16.2 The Distributions D′(Rn) . . . 422
    16.2.1 Regular Distributions . . . 422
    16.2.2 Other Examples of Distributions . . . 424
    16.2.3 Convergence and Limits of Distributions . . . 425
    16.2.4 The distribution P(1/x) . . . 426
    16.2.5 Operation with Distributions . . . 427
  16.3 Tensor Product and Convolution Product . . . 433
    16.3.1 The Support of a Distribution . . . 433
    16.3.2 Tensor Products . . . 433
    16.3.3 Convolution Product . . . 434
    16.3.4 Linear Change of Variables . . . 437
    16.3.5 Fundamental Solutions . . . 438
  16.4 Fourier Transformation in S(Rn) and S′(Rn) . . . 439
    16.4.1 The Space S(Rn) . . . 440
    16.4.2 The Space S′(Rn) . . . 446
    16.4.3 Fourier Transformation in S′(Rn) . . . 447
  16.5 Appendix – More about Convolutions . . . 450

17 PDE II – The Equations of Mathematical Physics . . . 453
  17.1 Fundamental Solutions . . . 453
    17.1.1 The Laplace Equation . . . 453
    17.1.2 The Heat Equation . . . 455
    17.1.3 The Wave Equation . . . 456
  17.2 The Cauchy Problem . . . 459
    17.2.1 Motivation of the Method . . . 459
    17.2.2 The Wave Equation . . . 460
    17.2.3 The Heat Equation . . . 464
    17.2.4 Physical Interpretation of the Results . . . 466
  17.3 Fourier Method for Boundary Value Problems . . . 468
    17.3.1 Initial Boundary Value Problems . . . 469
    17.3.2 Eigenvalue Problems for the Laplace Equation . . . 473
  17.4 Boundary Value Problems for the Laplace and the Poisson Equations . . . 477
    17.4.1 Formulation of Boundary Value Problems . . . 477
    17.4.2 Basic Properties of Harmonic Functions . . . 478
  17.5 Appendix . . . 485
    17.5.1 Existence of Solutions to the Boundary Value Problems . . . 485
    17.5.2 Extremal Properties of Harmonic Functions and the Dirichlet Principle . . . 490
    17.5.3 Numerical Methods . . . 494

Chapter 1
Real and Complex Numbers
Basics
Notations

R                   Real numbers
C                   Complex numbers
Q                   Rational numbers
N = {1, 2, . . . }  positive integers (natural numbers)
Z                   Integers

We know that N ⊂ Z ⊂ Q ⊂ R ⊂ C. We write R+, Q+, and Z+ for the non-negative
real, rational, and integer numbers x ≥ 0, respectively. The notions A ⊆ B and A ⊂ B are
equivalent. If we want to point out that B is strictly bigger than A, we write A ⊊ B.
We use the following symbols:

:=     defining equation
=⇒     implication, if . . . , then . . .
⇐⇒     if and only if, equivalence
∀      for all
∃      there exists

Let a < b be fixed real numbers. We denote the intervals as follows:


[a, b]  := {x ∈ R | a ≤ x ≤ b}    closed interval
(a, b)  := {x ∈ R | a < x < b}    open interval
[a, b)  := {x ∈ R | a ≤ x < b}    half-open interval
(a, b]  := {x ∈ R | a < x ≤ b}    half-open interval
[a, ∞)  := {x ∈ R | a ≤ x}        closed half-line
(a, ∞)  := {x ∈ R | a < x}        open half-line
(−∞, b] := {x ∈ R | x ≤ b}        closed half-line
(−∞, b) := {x ∈ R | x < b}        open half-line


(a) Sums and Products

Let us recall the meaning of the sum sign ∑ and the product sign ∏. Suppose m ≤ n are
integers, and a_k, k = m, . . . , n, are real numbers. Then we set
$$\sum_{k=m}^{n} a_k := a_m + a_{m+1} + \cdots + a_n, \qquad \prod_{k=m}^{n} a_k := a_m \cdot a_{m+1} \cdots a_n.$$
In case m = n the sum and the product consist of one summand and one factor only, respectively. In case n < m it is customary to set
$$\sum_{k=m}^{n} a_k := 0 \quad \text{(empty sum)}, \qquad \prod_{k=m}^{n} a_k := 1 \quad \text{(empty product)}.$$

The following rules are obvious: if m ≤ n ≤ p and d ∈ Z are integers, then
$$\sum_{k=m}^{n} a_k + \sum_{k=n+1}^{p} a_k = \sum_{k=m}^{p} a_k, \qquad \sum_{k=m}^{n} a_k = \sum_{k=m+d}^{n+d} a_{k-d} \quad \text{(index shift)}.$$
We have for a ∈ R,
$$\sum_{k=m}^{n} a = (n - m + 1)a.$$
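These conventions can be spot-checked in a few lines of Python. The helper names `seq_sum` and `seq_prod` below are not from the notes; the sketch merely illustrates that the empty sum and empty product yield the neutral elements 0 and 1, and that the index shift leaves the value unchanged.

```python
import math

def seq_sum(a, m, n):
    # Sum a(m) + a(m+1) + ... + a(n); empty when n < m, giving 0.
    return sum(a(k) for k in range(m, n + 1))

def seq_prod(a, m, n):
    # Product a(m) * a(m+1) * ... * a(n); empty when n < m, giving 1.
    return math.prod(a(k) for k in range(m, n + 1))

a = lambda k: k * k

print(seq_sum(a, 3, 5))    # 9 + 16 + 25 = 50
print(seq_sum(a, 5, 2))    # empty sum: 0
print(seq_prod(a, 5, 2))   # empty product: 1
# Index shift: sum_{k=m}^{n} a_k equals sum_{k=m+d}^{n+d} a_{k-d}.
d = 7
print(seq_sum(a, 3, 5) == seq_sum(lambda k: a(k - d), 3 + d, 5 + d))  # True
```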

(b) Mathematical Induction

Mathematical induction is a powerful method to prove theorems about natural numbers.

Theorem 1.1 (Principle of Mathematical Induction) Let n0 ∈ Z be an integer. To prove a
statement A(n) for all integers n ≥ n0 it is sufficient to show:
(I) A(n0) is true.
(II) For any n ≥ n0: if A(n) is true, so is A(n + 1) (induction step).
It is easy to see how the principle works: First, A(n0) is true. Applying (II) to n = n0 we obtain
that A(n0 + 1) is true. Successive application of (II) yields that A(n0 + 2), A(n0 + 3), and so
on, are true.
Example 1.1 (a) For all nonnegative integers n we have
$$\sum_{k=1}^{n} (2k - 1) = n^2.$$

Proof. We use induction over n. In case n = 0 we have an empty sum on the left hand side (lhs)
and 0² = 0 on the right hand side (rhs). Hence, the statement is true for n = 0.
Suppose it is true for some fixed n. We shall prove it for n + 1. By the definition of the sum
and by the induction hypothesis, $\sum_{k=1}^{n} (2k-1) = n^2$, we have
$$\sum_{k=1}^{n+1} (2k - 1) = \sum_{k=1}^{n} (2k - 1) + 2(n + 1) - 1 \overset{\text{ind. hyp.}}{=} n^2 + 2n + 1 = (n + 1)^2.$$
This proves the claim for n + 1.

(b) For all positive integers n ≥ 8 we have 2ⁿ > 3n².
Proof. In case n = 8 we have
$$2^n = 2^8 = 256 > 192 = 3 \cdot 64 = 3 \cdot 8^2 = 3n^2,$$
and the statement is true in this case.
Suppose it is true for some fixed n ≥ 8, i.e. 2ⁿ > 3n² (induction hypothesis). We will show
that the statement is true for n + 1, i.e. $2^{n+1} > 3(n+1)^2$ (induction assertion). Note that n ≥ 8
implies
$$n - 1 \ge 7 > 2 \implies (n-1)^2 > 4 > 2 \implies n^2 - 2n - 1 > 0 \implies 3(n^2 - 2n - 1) > 0$$
$$\implies 3n^2 - 6n - 3 > 0 \quad \bigm| \; + 3n^2 + 6n + 3 \implies 6n^2 > 3n^2 + 6n + 3 \implies 2 \cdot 3n^2 > 3(n^2 + 2n + 1) = 3(n+1)^2. \tag{1.1}$$
By the induction hypothesis, $2^{n+1} = 2 \cdot 2^n > 2 \cdot 3n^2$. This together with (1.1) yields
$2^{n+1} > 3(n+1)^2$. Thus, we have shown the induction assertion. Hence the statement is true
for all positive integers n ≥ 8.
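A quick numerical check, added here as an illustration, confirms both the base case and the fact that the hypothesis n ≥ 8 is really needed: the inequality fails one step earlier, at n = 7.

```python
def holds(n):
    # The statement of Example 1.1 (b): 2^n > 3 n^2.
    return 2 ** n > 3 * n * n

base_case = holds(8)        # 256 > 192
fails_below = not holds(7)  # 128 < 147, so the bound n >= 8 is sharp here
all_checked = all(holds(n) for n in range(8, 200))
print(base_case, fails_below, all_checked)  # True True True
```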
For a positive integer n ∈ N we set
$$n! := \prod_{k=1}^{n} k \quad (\text{read: } n \text{ factorial}), \qquad 0! = 1! = 1.$$

(c) Binomial Coefficients

For non-negative integers n, k ∈ Z+ we define
$$\binom{n}{k} := \prod_{i=1}^{k} \frac{n - i + 1}{i} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 2 \cdot 1}.$$
The numbers $\binom{n}{k}$ (read: "n choose k") are called binomial coefficients since they appear in the
binomial theorem, see Proposition 1.4 below. It follows directly from the definition that
$$\binom{n}{k} = 0 \quad \text{for } k > n, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \binom{n}{n-k} \quad \text{for } 0 \le k \le n.$$
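The defining product can be evaluated exactly with rational arithmetic. In the added sketch below (the helper name `binom` is mine), `Fraction` keeps the intermediate quotients exact; for k > n one numerator factor n − i + 1 becomes 0, so the product vanishes, matching the first property above. The result is compared against Python's `math.comb`.

```python
from fractions import Fraction
import math

def binom(n, k):
    # prod_{i=1}^{k} (n - i + 1) / i, exactly as in the definition above.
    result = Fraction(1)
    for i in range(1, k + 1):
        result *= Fraction(n - i + 1, i)
    return int(result)

print(binom(5, 2))   # 10
print(binom(4, 7))   # 0, since the factor for i = 5 is (4 - 5 + 1)/5 = 0
agrees_with_comb = all(
    binom(n, k) == math.comb(n, k) for n in range(10) for k in range(n + 1)
)
print(agrees_with_comb)  # True
```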
Lemma 1.2 For 0 ≤ k ≤ n we have
$$\binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1}.$$


Proof. For k = n the formula is obvious. For 0 ≤ k ≤ n − 1 we have
$$\binom{n}{k} + \binom{n}{k+1} = \frac{n!}{k!\,(n-k)!} + \frac{n!}{(k+1)!\,(n-k-1)!} = \frac{(k+1)\,n! + (n-k)\,n!}{(k+1)!\,(n-k)!} = \frac{(n+1)!}{(k+1)!\,(n-k)!} = \binom{n+1}{k+1}.$$
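Pascal's rule from Lemma 1.2 can be verified exhaustively for small n (an added sketch, using the standard library's `math.comb` for the binomial coefficients):

```python
import math

# Lemma 1.2: C(n, k) + C(n, k+1) == C(n+1, k+1) for 0 <= k <= n.
pascal_ok = all(
    math.comb(n, k) + math.comb(n, k + 1) == math.comb(n + 1, k + 1)
    for n in range(30)
    for k in range(n + 1)
)
print(pascal_ok)  # True
```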

We say that X is an n-set if X has exactly n elements. We write Card X = n (from cardinality) to denote the number of elements of X.

Lemma 1.3 The number of k-subsets of an n-set is $\binom{n}{k}$.

The lemma in particular shows that $\binom{n}{k}$ is always an integer (which is not obvious from its definition).
Proof. We denote the number of k-subsets of an n-set $X_n$ by $C_k^n$. It is clear that $C_0^n = C_n^n = 1$
since ∅ is the only 0-subset of $X_n$ and $X_n$ itself is the only n-subset of $X_n$. We use induction
over n. The case n = 1 is obvious since $C_0^1 = C_1^1 = \binom{1}{0} = \binom{1}{1} = 1$. Suppose that the claim is
true for some fixed n. We will show the statement for the (n + 1)-set X = {1, . . . , n + 1} and
all k with 1 ≤ k ≤ n. The family of (k + 1)-subsets of X splits into two disjoint classes. In the
first class $A_1$ every subset contains n + 1; in the second class $A_2$, not. To form a subset in $A_1$
one has to choose another k elements out of {1, . . . , n}. By the induction assumption the number
is $\mathrm{Card}\,A_1 = C_k^n = \binom{n}{k}$. To form a subset in $A_2$ one has to choose k + 1 elements out of
{1, . . . , n}. By the induction assumption this number is $\mathrm{Card}\,A_2 = C_{k+1}^n = \binom{n}{k+1}$. By Lemma 1.2
we obtain
$$C_{k+1}^{n+1} = \mathrm{Card}\,A_1 + \mathrm{Card}\,A_2 = \binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1},$$
which proves the induction assertion.
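Lemma 1.3 can also be checked by brute force: enumerate all k-subsets of {1, . . . , n} and count them (an added sketch; `itertools.combinations` generates exactly the k-subsets).

```python
import math
from itertools import combinations

def count_k_subsets(n, k):
    # Enumerate all k-subsets of {1, ..., n} and count them directly.
    return sum(1 for _ in combinations(range(1, n + 1), k))

agrees = all(
    count_k_subsets(n, k) == math.comb(n, k)
    for n in range(8)
    for k in range(n + 1)
)
print(agrees)  # True
```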

Proposition 1.4 (Binomial Theorem) Let x, y ∈ R and n ∈ N. Then we have
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k.$$

Proof. We give a direct proof. Using the distributive law we find that each of the $2^n$ summands
of the product $(x + y)^n$ has the form $x^{n-k} y^k$ for some k = 0, . . . , n. We number the n factors
as $(x + y)^n = f_1 f_2 \cdots f_n$, $f_1 = f_2 = \cdots = f_n = x + y$. Let us count how often the
summand $x^{n-k} y^k$ appears. We have to choose k factors y out of the n factors $f_1, \dots, f_n$. The
remaining n − k factors must be x. This gives a 1-1-correspondence between the k-subsets
of $\{f_1, \dots, f_n\}$ and the different summands of the form $x^{n-k} y^k$. Hence, by Lemma 1.3 their
number is $C_k^n = \binom{n}{k}$. This proves the proposition.
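The binomial theorem can be sampled at integer points, where the arithmetic is exact (an added sketch; the helper name `binomial_expansion` is mine):

```python
import math

def binomial_expansion(x, y, n):
    # Right-hand side of the binomial theorem: sum_k C(n, k) x^(n-k) y^k.
    return sum(math.comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))

for n in range(10):
    # Integer samples keep everything exact; compare with (x + y)^n directly.
    assert binomial_expansion(3, -2, n) == (3 + (-2)) ** n == 1
    assert binomial_expansion(2, 5, n) == 7 ** n
print("binomial theorem verified at integer samples for n = 0..9")
```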


1.1 Real Numbers


In this lecture course we assume the system of real numbers to be given. Recall that the set of
integers is Z = {0, ±1, ±2, . . . }, while the fractions of integers, Q = {m/n | m, n ∈ Z, n ≠ 0},
form the set of rational numbers.
A satisfactory discussion of the main concepts of analysis such as convergence, continuity,
differentiation and integration must be based on an accurately defined number concept.
An existence proof for the real numbers is given in [Rud76, Appendix to Chapter 1]. The author
explicitly constructs the real numbers R starting from the rational numbers Q.
The aim of the following two sections is to formulate the axioms which are sufficient to derive
all properties and theorems of the real number system.
The rational numbers are inadequate for many purposes, both as a field and an ordered set. For
instance, there is no rational x with x2 = 2. This leads to the introduction of irrational numbers
which are often written as infinite decimal expansions and are considered to be approximated
by the corresponding finite decimals. Thus the sequence
1, 1.4, 1.41, 1.414, 1.4142, . . .

tends to √2. But unless the irrational number √2 has been clearly defined, the question must
arise: What is it that this sequence tends to?
This sort of question can be answered as soon as the so-called real number system is constructed.
Example 1.2 As shown in the exercise class, there is no rational number x with x2 = 2. Set
A = {x ∈ Q+ | x² < 2}  and  B = {x ∈ Q+ | x² > 2}.

Then A ∪ B = Q+ and A ∩ B = ∅. One can show that in the rational number system, A
has no largest element and B has no smallest element, for details see Appendix A or Rudins
book [Rud76, Example 1.1, page 2]. This example shows that the system of rational numbers
has certain gaps in spite of the fact that between any two rationals there is another: If r < s
then r < (r + s)/2 < s. The real number system fills these gaps. This is the principal reason
for the fundamental role which it plays in analysis.
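The approximating sequence 1, 1.4, 1.41, . . . quoted above can be generated entirely inside the rational numbers, which makes the gap concrete: every term is rational, yet what the sequence "tends to" is not. The added sketch below uses only integer arithmetic, taking for each number of decimal digits the largest finite decimal whose square is still below 2.

```python
# For each number of decimal digits, find the largest finite decimal m/scale
# with (m/scale)^2 < 2, using only integer comparisons (i.e. staying in Q).
approximations = []
for digits in range(5):
    scale = 10 ** digits
    m = max(m for m in range(2 * scale + 1) if m * m < 2 * scale * scale)
    approximations.append(f"{m / scale:.{digits}f}" if digits else str(m))
print(approximations)  # ['1', '1.4', '1.41', '1.414', '1.4142']
```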
We start with a brief discussion of the general concepts of ordered set and field.

1.1.1 Ordered Sets


Definition 1.1 (a) Let S be a set. An order (or total order) on S is a relation, denoted by <,
with the following properties. Let x, y, z ∈ S.
(i) One and only one of the following statements is true:
    x < y,   x = y,   y < x    (trichotomy).
(ii) x < y and y < z implies x < z    (transitivity).


In this case S is called an ordered set.


(b) Suppose (S, <) is an ordered set, and E ⊆ S. If there exists a β ∈ S such that x ≤ β for
all x ∈ E, we say that E is bounded above, and call β an upper bound of E. Lower bounds are
defined in the same way with ≥ in place of ≤.
If E is both bounded above and below, we say that E is bounded.
The statement x < y may be read as "x is less than y" or "x precedes y". It is convenient to
write y > x instead of x < y. The notation x ≤ y indicates x < y or x = y. In other words,
x ≤ y is the negation of x > y. For example, R is an ordered set if r < s is defined to mean
that s − r > 0 is a positive real number.
Example 1.3 (a) The intervals [a, b], (a, b], [a, b), (a, b), (−∞, b), and (−∞, b] are bounded
above by b and by all numbers greater than b.
(b) E := {1/n | n ∈ N} = {1, 1/2, 1/3, . . . } is bounded above by any α ≥ 1. It is bounded below
by 0.
Definition 1.2 Suppose S is an ordered set, E ⊂ S, and E is bounded above. Suppose there
exists an α ∈ S such that
(i) α is an upper bound of E.
(ii) If β is an upper bound of E then α ≤ β.
Then α is called the supremum (or least upper bound) of E. We write
    α = sup E.
An equivalent formulation of (ii) is the following:
(ii′) If β < α then β is not an upper bound of E.
The infimum (or greatest lower bound) of a set E which is bounded below is defined in the same
manner: The statement
    α = inf E
means that α is a lower bound of E and for all lower bounds β of E we have β ≤ α.
Example 1.4 (a) If α = sup E exists, then α may or may not belong to E. For instance consider
[0, 1) and [0, 1]. Then
    1 = sup[0, 1) = sup[0, 1],
however 1 ∉ [0, 1) but 1 ∈ [0, 1]. We will show that sup[0, 1] = 1. Obviously, 1 is an upper
bound of [0, 1]. Suppose that β < 1; then β is not an upper bound of [0, 1] since 1 ∈ [0, 1]
and β < 1. Hence 1 = sup[0, 1].
We will show that sup[0, 1) = 1. Obviously, 1 is an upper bound of this interval. Suppose
that β < 1. Then β < (β + 1)/2 < 1. Since (β + 1)/2 ∈ [0, 1), β is not an upper bound.
Consequently, 1 = sup[0, 1).
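The midpoint argument is easy to check numerically. The sketch below (plain Python; the function name is ours) exhibits, for any candidate upper bound 0 ≤ β < 1, an element of [0, 1) that exceeds it.

```python
def witness(beta):
    """For 0 <= beta < 1, return a point of [0, 1) strictly greater than beta.

    This is the midpoint (beta + 1) / 2 from the proof that sup [0, 1) = 1.
    """
    return (beta + 1) / 2.0

# any candidate upper bound below 1 is beaten by its witness
for beta in [0.0, 0.5, 0.9, 0.999999]:
    m = witness(beta)
    assert beta < m < 1
```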
(b) Consider the sets A and B of Example 1.2 as subsets of the ordered set Q. Since A ∪ B = Q₊
(there is no rational number with x² = 2), the upper bounds of A are exactly the elements of B.
Indeed, if a ∈ A and b ∈ B then a² < 2 < b². Taking the square root we have a < b. Since B
contains no smallest member, A has no supremum in Q₊.
Similarly, B is bounded below by any element of A. Since A has no largest member, B has no
infimum in Q.
Remarks 1.1 (a) It is clear from (ii) and the trichotomy of < that there is at most one such α.
Indeed, suppose α′ also satisfies (i) and (ii); by (ii) we have α ≤ α′ and α′ ≤ α; hence α = α′.
(b) If sup E exists and belongs to E, we call it the maximum of E and denote it by max E.
Hence, max E = sup E and max E ∈ E. Similarly, if the infimum of E exists and belongs to
E we call it the minimum and denote it by min E; min E = inf E, min E ∈ E.
The following table illustrates these notions:

    bounded subset of Q | an upper bound | sup            | max
    [0, 1]              | 2              | 1              | 1
    [0, 1)              | 2              | 1              | none
    A                   | 2              | none (√2 ∉ Q)  | none
(c) Suppose that α is an upper bound of E and α ∈ E; then α = max E, that is, property (ii) in
Definition 1.2 is automatically satisfied. Similarly, if β ∈ E is a lower bound, then β = min E.
(d) If E is a finite set, it always has a maximum and a minimum.

1.1.2 Fields
Definition 1.3 A field is a set F with two operations, called addition and multiplication, which
satisfy the following so-called field axioms (A), (M), and (D):
(A) Axioms for addition
(A1) If x ∈ F and y ∈ F then their sum x + y is in F.
(A2) Addition is commutative: x + y = y + x for all x, y ∈ F.
(A3) Addition is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ F.
(A4) F contains an element 0 such that 0 + x = x for all x ∈ F.
(A5) To every x ∈ F there exists an element −x ∈ F such that x + (−x) = 0.
(M) Axioms for multiplication
(M1) If x ∈ F and y ∈ F then their product xy is in F.
(M2) Multiplication is commutative: xy = yx for all x, y ∈ F.
(M3) Multiplication is associative: (xy)z = x(yz) for all x, y, z ∈ F.
(M4) F contains an element 1 such that 1x = x for all x ∈ F.
(M5) If x ∈ F and x ≠ 0 then there exists an element 1/x ∈ F such that x · (1/x) = 1.
(D) The distributive law
    x(y + z) = xy + xz
holds for all x, y, z ∈ F.

Remarks 1.2 (a) One usually writes
    x − y,  x/y,  x + y + z,  xyz,  x²,  x³,  2x, …
in place of
    x + (−y),  x · (1/y),  (x + y) + z,  (xy)z,  x·x,  x·x·x,  x + x, …
(b) The field axioms clearly hold in Q if addition and multiplication have their customary meaning. Thus Q is a field. The integers Z do not form a field since 2 ∈ Z has no multiplicative inverse
(axiom (M5) is not fulfilled).
(c) The smallest field is F₂ = {0, 1}, consisting of the neutral element 0 for addition and the
neutral element 1 for multiplication. Addition and multiplication are defined as follows:

    + | 0 1        · | 0 1
    0 | 0 1        0 | 0 0
    1 | 1 0        1 | 0 1

It is easy to check the field axioms (A), (M), and (D) directly.
(d) (A1) to (A5) and (M1) to (M5) mean that (F, +) and (F \ {0}, ·) are both commutative (or
abelian) groups.
Proposition 1.5 The axioms of addition imply the following statements.
(a) If x + y = x + z then y = z (cancellation law).
(b) If x + y = x then y = 0 (the element 0 is unique).
(c) If x + y = 0 then y = −x (the inverse −x is unique).
(d) −(−x) = x.
Proof. If x + y = x + z, the axioms (A) give
    y = 0 + y = (−x + x) + y = −x + (x + y) = −x + (x + z)
      = (−x + x) + z = 0 + z = z,
using (A4), (A5), (A3), the assumption, (A3), (A5), and (A4) in turn.
This proves (a). Take z = 0 in (a) to obtain (b). Take z = −x in (a) to obtain (c). Since
−x + x = 0, (c) with −x in place of x and x in place of y gives (d).
Proposition 1.6 The axioms for multiplication imply the following statements.
(a) If x ≠ 0 and xy = xz then y = z (cancellation law).
(b) If x ≠ 0 and xy = x then y = 1 (the element 1 is unique).
(c) If x ≠ 0 and xy = 1 then y = 1/x (the inverse 1/x is unique).
(d) If x ≠ 0 then 1/(1/x) = x.
The proof is so similar to that of Proposition 1.5 that we omit it.


Proposition 1.7 The field axioms imply the following statements, for any x, y, z ∈ F.
(a) 0x = 0.
(b) If xy = 0 then x = 0 or y = 0.
(c) (−x)y = −(xy) = x(−y).
(d) (−x)(−y) = xy.
Proof. 0x + 0x = (0 + 0)x = 0x. Hence 1.5 (b) implies that 0x = 0, and (a) holds.
Suppose, to the contrary, that both x ≠ 0 and y ≠ 0; then (a) gives
    1 = (1/y)(1/x) · xy = (1/y)(1/x) · 0 = 0,
a contradiction. Thus (b) holds.
The first equality in (c) comes from
    (−x)y + xy = (−x + x)y = 0y = 0,
combined with 1.5 (b); the other half of (c) is proved in the same way. Finally,
    (−x)(−y) = −[x(−y)] = −[−(xy)] = xy
by (c) and 1.5 (d).

1.1.3 Ordered Fields

In analysis, dealing with equations is as important as dealing with inequalities. Calculations
with inequalities are based on the ordering axioms. It turns out that all of them can be reduced
to the notion of positivity.
In F there are distinguished positive elements (x > 0) such that the following axioms are valid.
Definition 1.4 An ordered field is a field F which is also an ordered set, such that for all
x, y, z ∈ F
(O) Axioms for ordered fields
(O1) x > 0 and y > 0 implies x + y > 0,
(O2) x > 0 and y > 0 implies xy > 0.
If x > 0 we call x positive; if x < 0, x is negative.
For example, Q and R are ordered fields if x > y is defined to mean that x − y is positive.
Proposition 1.8 The following statements are true in every ordered field F.
(a) If x < y and a ∈ F then a + x < a + y.
(b) If x < y and x′ < y′ then x + x′ < y + y′.


Proof. (a) By assumption, (a + y) − (a + x) = y − x > 0. Hence a + x < a + y.
(b) By assumption and by (a) we have x + x′ < y + x′ and y + x′ < y + y′. Using transitivity,
see Definition 1.1 (ii), we have x + x′ < y + y′.

Proposition 1.9 The following statements are true in every ordered field.
(a) If x > 0 then −x < 0, and if x < 0 then −x > 0.
(b) If x > 0 and y < z then xy < xz.
(c) If x < 0 and y < z then xy > xz.
(d) If x ≠ 0 then x² > 0. In particular, 1 > 0.
(e) If 0 < x < y then 0 < 1/y < 1/x.
Proof. (a) If x > 0 then 0 = −x + x > −x + 0 = −x, so that −x < 0. If x < 0 then
0 = −x + x < −x + 0 = −x, so that −x > 0. This proves (a).
(b) Since z > y, we have z − y > 0, hence x(z − y) > 0 by axiom (O2), and therefore, by
Proposition 1.8,
    xz = x(z − y) + xy > 0 + xy = xy.
(c) By (a), (b), and Proposition 1.7 (c),
    −[x(z − y)] = (−x)(z − y) > 0,
so that x(z − y) < 0, hence xz < xy.
(d) If x > 0, axiom (O2) gives x² > 0. If x < 0 then −x > 0, hence (−x)² > 0. But
x² = (−x)² by Proposition 1.7 (d). Since 1² = 1, 1 > 0.
(e) If y > 0 and v ≤ 0 then yv ≤ 0. But y · (1/y) = 1 > 0. Hence 1/y > 0, and likewise
1/x > 0. If we multiply x < y by the positive quantity (1/x)(1/y), we obtain 1/y < 1/x.

Remarks 1.3 (a) The finite field F₂ = {0, 1}, see Remarks 1.2, is not an ordered field since
1 + 1 = 0, which contradicts 1 > 0.
(b) The field of complex numbers C (see below) is not an ordered field since i² = −1 contradicts
Proposition 1.9 (a), (d).

1.1.4 Embedding of natural numbers into the real numbers

Let F be an ordered field. We want to recover the integers inside F. In order to distinguish 0
and 1 in F from the integers 0 and 1, we temporarily write 0_F and 1_F. For a positive integer
n ∈ N, n ≥ 2, we define
    n_F := 1_F + 1_F + ⋯ + 1_F   (n times).
Lemma 1.10 We have n_F > 0_F for all n ∈ N.
Proof. We use induction over n. By Proposition 1.9 (d) the statement is true for n = 1. Suppose
it is true for a fixed n, i.e. n_F > 0_F. Moreover 1_F > 0_F. Using axiom (O1) we obtain
    (n + 1)_F = n_F + 1_F > 0_F.
From Lemma 1.10 it follows that m ≠ n implies n_F ≠ m_F. Indeed, let n be greater than m,
say n = m + k for some k ∈ N; then n_F = m_F + k_F. Since k_F > 0 it follows from 1.8 (a)
that n_F > m_F. In particular, n_F ≠ m_F. Hence, the mapping
    N → F,   n ↦ n_F
is a one-to-one correspondence (injective). In this way the positive integers are embedded into
the real numbers. Addition and multiplication of natural numbers and of their embeddings are
the same:
    n_F + m_F = (n + m)_F,   n_F · m_F = (nm)_F.
From now on we identify a natural number with the associated real number. We write n for n_F.
Definition 1.5 (The Archimedean Axiom) An ordered field F is called Archimedean if for all
x, y ∈ F with x > 0 and y > 0 there exists n ∈ N such that nx > y.

An equivalent formulation is: the subset N ⊂ F of positive integers is not bounded above.
Indeed, choose x = 1 in the above definition; then for any y ∈ F there is an n ∈ N such that
n > y; hence N is not bounded above.
Conversely, suppose N is not bounded above and x > 0, y > 0 are given. Then y/x is not an
upper bound for N, that is, there is some n ∈ N with n > y/x, i.e. nx > y.
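The equivalence above is effective: given x, y > 0 one can write down an explicit n with nx > y. A minimal floating-point sketch (the function name is ours):

```python
import math

def archimedean_n(x, y):
    """Return some n in N with n * x > y, for x, y > 0 (cf. Definition 1.5)."""
    n = math.floor(y / x) + 1   # n > y / x, hence n * x > y
    return n

assert archimedean_n(1.0, 10.5) * 1.0 > 10.5
assert archimedean_n(0.25, 3.0) * 0.25 > 3.0
```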

1.1.5 The completeness of R

Using the axioms so far we are not yet able to prove the existence of irrational numbers. We
need the completeness axiom.
Definition 1.6 (Order Completeness) An ordered set S is said to be order complete if every
non-empty subset E ⊂ S which is bounded above has a supremum sup E in S.
(C) Completeness Axiom
The real numbers are order complete, i.e. every non-empty subset E ⊂ R which is bounded
above has a supremum.
The set Q of rational numbers is not order complete since, for example, the bounded set
A = {x ∈ Q₊ | x² < 2} has no supremum in Q. Later we will define √2 := sup A.
The existence of √2 in R is furnished by the completeness axiom (C).
Axiom (C) implies that every non-empty subset E ⊂ R which is bounded below has an infimum.
This is an easy consequence of Homework 1.4 (a).
We will see that an order complete field is always Archimedean.
Proposition 1.11 (a) R is Archimedean.
(b) If x, y ∈ R and x < y then there is a p ∈ Q with x < p < y.


Part (b) may be stated by saying that Q is dense in R.

Proof. (a) Let x, y > 0 be real numbers which do not fulfill the Archimedean property. That is,
if A := {nx | n ∈ N}, then y would be an upper bound of A. Then (C) furnishes that A has a
supremum α = sup A. Since x > 0, α − x < α and α − x is not an upper bound of A. Hence
α − x < mx for some m ∈ N. But then α < (m + 1)x, which is impossible, since α is an upper
bound of A.
(b) See [Rud76, Theorem 29].

Remarks 1.4 (a) If x, y ∈ Q with x < y, then there exists z ∈ R \ Q with x < z < y; choose
z = x + (y − x)/√2.


Ex class: (b) We shall show that inf{1/n | n ∈ N} = 0. Since n > 0 for all n ∈ N, 1/n > 0 by
Proposition 1.9 (e), and 0 is a lower bound. Suppose ε > 0. Since R is Archimedean, we find
m ∈ N such that 1/ε < m or, equivalently, 1/m < ε. Hence, ε is not a lower bound for E,
which proves the claim.
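The same Archimedean argument produces, for every ε > 0, an explicit m with 1/m < ε; a quick numeric sketch (the function name is ours):

```python
import math

def below_epsilon(eps):
    """Return m in N with 1/m < eps, for eps > 0 (Archimedean property of R)."""
    return math.floor(1.0 / eps) + 1

for eps in [0.5, 0.1, 1e-6]:
    m = below_epsilon(eps)
    assert 1.0 / m < eps
```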
(c) Axiom (C) is equivalent to the Archimedean property together with the topological completeness (every Cauchy sequence in R is convergent, see Proposition 2.18).
(d) Axiom (C) is equivalent to the axiom of nested intervals, see Proposition 2.11 below:
Let Iₙ := [aₙ, bₙ] be a sequence of closed nested intervals, that is, I₁ ⊃ I₂ ⊃ I₃ ⊃ ⋯,
such that for all ε > 0 there exists n₀ such that 0 ≤ bₙ − aₙ < ε for all n ≥ n₀.
Then there exists a unique real number a ∈ R which is a member of all intervals,
i.e. {a} = ⋂_{n∈N} Iₙ.

1.1.6 The Absolute Value

For x ∈ R one defines
    |x| := x if x ≥ 0,   and   |x| := −x if x < 0.
Lemma 1.12 For a, x, y ∈ R we have
(a) |x| ≥ 0, and |x| = 0 if and only if x = 0. Further, |−x| = |x|.
(b) x ≤ |x|, |x| = max{x, −x}, and |x| ≤ a ⟺ (x ≤ a and −x ≤ a).
(c) |xy| = |x| |y|, and |x/y| = |x|/|y| if y ≠ 0.
(d) |x + y| ≤ |x| + |y| (triangle inequality).
(e) | |x| − |y| | ≤ |x + y|.

Proof. (a) By Proposition 1.9 (a), x < 0 implies |x| = −x > 0. Also, x > 0 implies |x| > 0.
Putting both together we obtain: x ≠ 0 implies |x| > 0, and thus |x| = 0 implies x = 0.
Moreover, |0| = 0. This shows the first part.
The statement |−x| = |x| follows from (b) and −(−x) = x.
(b) Suppose first that x ≥ 0. Then x ≥ 0 ≥ −x and we have max{x, −x} = x = |x|. If x < 0
then −x > 0 > x and max{x, −x} = −x = |x|. This proves max{x, −x} = |x|. Since the
maximum is an upper bound, |x| ≥ x and |x| ≥ −x. Suppose now a is an upper bound of
{x, −x}. Then |x| = max{x, −x} ≤ a. On the other hand, max{x, −x} ≤ a implies that a is
an upper bound of {x, −x}, since max is.
One proves the first part of (c) by verifying the four cases (i) x, y ≥ 0, (ii) x ≥ 0, y < 0, (iii)
x < 0, y ≥ 0, and (iv) x, y < 0 separately. (i) is clear. In case (ii) we have by Proposition 1.9 (a)
and (b) that xy ≤ 0, and by Proposition 1.7 (c),
    |x| |y| = x(−y) = −(xy) = |xy|.
The cases (iii) and (iv) are similar. Now to the second part: since x = (x/y)·y, we have by the
first part of (c), |x| = |x/y| |y|. The claim follows by multiplication with 1/|y|.
(d) By (b) we have x ≤ |x| and y ≤ |y|. It follows from Proposition 1.8 (b) that
    ±(x + y) ≤ |x| + |y|.
By the second part of (b) with a = |x| + |y|, we obtain |x + y| ≤ |x| + |y|.
(e) Inserting u := x + y and v := −y into |u + v| ≤ |u| + |v| one obtains
    |x| ≤ |x + y| + |−y| = |x + y| + |y|.
Subtracting |y| on both sides one obtains |x| − |y| ≤ |x + y|. Changing the roles of x and y
in the last inequality yields −(|x| − |y|) ≤ |x + y|. The claim follows again by (b) with
a = |x + y|.
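The statements of Lemma 1.12 are easy to spot-check with Python's built-in abs; the sample pairs below are arbitrary:

```python
samples = [(-3.5, 2.0), (1.0, 4.0), (-2.0, -7.0), (0.0, 5.5), (3.0, -3.0)]
for x, y in samples:
    assert abs(x) == max(x, -x)                    # Lemma 1.12 (b)
    assert abs(x * y) == abs(x) * abs(y)           # (c), first part
    assert abs(x + y) <= abs(x) + abs(y)           # (d), triangle inequality
    assert abs(abs(x) - abs(y)) <= abs(x + y)      # (e)
```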

1.1.7 Supremum and Infimum revisited

The following equivalent description of the supremum of a set of real numbers is often used in
the sequel. Note that
    x ≤ γ for all x ∈ M implies sup M ≤ γ.
Similarly, γ ≤ x for all x ∈ M implies γ ≤ inf M.

Remarks 1.5 (a) Suppose that E ⊂ R. Then α is the supremum of E if and only if
(1) α is an upper bound for E,
(2) for all ε > 0 there exists x ∈ E with α − ε < x.
Using the Archimedean axiom, (2) can be replaced by
(2′) for all n ∈ N there exists x ∈ E such that α − 1/n < x.


(b) Let M ⊂ R and N ⊂ R be nonempty subsets which are bounded above.
Then M + N := {m + n | m ∈ M, n ∈ N} is bounded above and
    sup(M + N) = sup M + sup N.
(c) Let M ⊂ R₊ and N ⊂ R₊ be nonempty subsets which are bounded above.
Then MN := {mn | m ∈ M, n ∈ N} is bounded above and
    sup(MN) = sup M · sup N.

1.1.8 Powers of real numbers

We shall prove the existence of nth roots of positive reals. We already know x^n, n ∈ Z. It is
recursively defined by x^n := x^{n−1}·x, x¹ := x for n ∈ N, and x^n := 1/x^{−n} for n < 0.
Proposition 1.13 (Bernoulli's inequality) Let x ≥ −1 and n ∈ N. Then we have
    (1 + x)^n ≥ 1 + nx.
Equality holds if and only if x = 0 or n = 1.
Proof. We use induction over n. In the cases n = 1 and x = 0 we have equality. The strict
inequality (with a > sign in place of the ≥ sign) holds for n₀ = 2 and x ≠ 0, since (1 + x)² =
1 + 2x + x² > 1 + 2x. Suppose the strict inequality is true for some fixed n ≥ 2 and x ≠ 0.
Since 1 + x ≥ 0, multiplication of the induction assumption by this factor yields, by
Proposition 1.9 (b),
    (1 + x)^{n+1} ≥ (1 + nx)(1 + x) = 1 + (n + 1)x + nx² > 1 + (n + 1)x.
This proves the strict assertion for n + 1. We have equality only if n = 1 or x = 0.
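Bernoulli's inequality is cheap to verify numerically for small exponents; a spot-check over a few sample values:

```python
# spot-check (1 + x)^n >= 1 + n*x for x >= -1 (Proposition 1.13)
for x in [-1.0, -0.5, 0.0, 0.3, 2.0]:
    for n in range(1, 8):
        assert (1 + x) ** n >= 1 + n * x
```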

Lemma 1.14 (a) For x, y ∈ R with x, y > 0 and n ∈ N we have
    x < y ⟺ x^n < y^n.
(b) For x, y ∈ R₊ and n ∈ N we have
    n x^{n−1}(y − x) ≤ y^n − x^n ≤ n y^{n−1}(y − x).
We have equality if and only if n = 1 or x = y.
Proof. (a) Observe that
    y^n − x^n = (y − x) ∑_{k=1}^{n} y^{n−k} x^{k−1} = c(y − x)    (1.2)
with c := ∑_{k=1}^{n} y^{n−k} x^{k−1} > 0 since x, y > 0. The claim follows.
(b) We have
    y^n − x^n − n x^{n−1}(y − x) = (y − x) ∑_{k=1}^{n} (y^{n−k} x^{k−1} − x^{n−1})
        = (y − x) ∑_{k=1}^{n} x^{k−1} (y^{n−k} − x^{n−k}) ≥ 0
since by (a), y − x and y^{n−k} − x^{n−k} have the same sign. The proof of the second
inequality is quite analogous.

Proposition 1.15 For every real x > 0 and every positive integer n ∈ N there is one and only
one y > 0 such that y^n = x.
This number y is written ⁿ√x or x^{1/n}, and it is called the nth root of x.
Proof. The uniqueness is clear since by Lemma 1.14 (a), 0 < y₁ < y₂ implies 0 < y₁^n < y₂^n.
Set
    E := {t ∈ R₊ | t^n < x}.
Observe that E ≠ ∅ since 0 ∈ E. We show that E is bounded above. By Bernoulli's inequality
and since x < 1 + nx we have for t ∈ E
    t^n < x < 1 + nx ≤ (1 + x)^n,
hence t < 1 + x by Lemma 1.14 (a). Hence, 1 + x is an upper bound for E. By the order
completeness of R there exists y ∈ R such that y = sup E. We have to show that y^n = x.
For this, we will show that each of the inequalities y^n > x and y^n < x leads to a contradiction.
Assume y^n < x and consider (y + h)^n with small h (0 < h < 1). Lemma 1.14 (b) implies
    0 ≤ (y + h)^n − y^n ≤ n (y + h)^{n−1} (y + h − y) < h · n(y + 1)^{n−1}.
Choosing h small enough that h · n(y + 1)^{n−1} < x − y^n we may continue
    (y + h)^n − y^n < x − y^n.
Consequently, (y + h)^n < x and therefore y + h ∈ E. Since y + h > y, this contradicts the fact
that y is an upper bound of E.
Assume y^n > x and consider (y − h)^n with small h (0 < h < 1). Again by Lemma 1.14 (b)
we have
    0 ≤ y^n − (y − h)^n ≤ n y^{n−1} (y − (y − h)) = h · n y^{n−1}.
Choosing h small enough that h · n y^{n−1} < y^n − x we may continue
    y^n − (y − h)^n < y^n − x.
Consequently, x < (y − h)^n and therefore t^n < x < (y − h)^n for all t ∈ E. Hence y − h is
an upper bound for E smaller than y. This contradicts the fact that y is the least upper bound.
Hence y^n = x, and the proof is complete.
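The proof is non-constructive, but the monotonicity in Lemma 1.14 (a) suggests a bisection scheme that approximates the root; a sketch (the function name is ours, and it uses the bound 1 + x on the set E from the proof):

```python
def nth_root(x, n, tol=1e-12):
    """Approximate the unique y > 0 with y**n == x, for x > 0 and n in N.

    Bisection works because t -> t**n is strictly increasing on the
    positive reals (Lemma 1.14 (a)); 1 + x bounds the set E of the proof.
    """
    lo, hi = 0.0, 1.0 + x
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid ** n < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

assert abs(nth_root(2.0, 2) - 2.0 ** 0.5) < 1e-9
assert abs(nth_root(27.0, 3) - 3.0) < 1e-9
```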

Remarks 1.6 (a) If a and b are positive real numbers and n ∈ N then (ab)^{1/n} = a^{1/n} b^{1/n}.
Proof. Put α = a^{1/n} and β = b^{1/n}. Then ab = α^n β^n = (αβ)^n, since multiplication is
commutative. The uniqueness assertion of Proposition 1.15 shows therefore that
    (ab)^{1/n} = αβ = a^{1/n} b^{1/n}.
(b) Fix b > 0. If m, n, p, q ∈ Z with n > 0, q > 0, and r = m/n = p/q, then we have
    (b^m)^{1/n} = (b^p)^{1/q}.    (1.3)
Hence it makes sense to define b^r := (b^m)^{1/n}.
(c) Fix b > 1. If x ∈ R define
    b^x := sup{b^p | p ∈ Q, p < x}.    (1.4)
For 0 < b < 1 set
    b^x := 1/(1/b)^x.
Without proof we give the familiar laws for powers and exponentials. Later we will redefine the
power b^x with real exponent. Then we will be able to give easier proofs.
(d) If a, b > 0 and x, y ∈ R, then
(i) b^{x+y} = b^x b^y and b^{x−y} = b^x/b^y;  (ii) b^{xy} = (b^x)^y;  (iii) (ab)^x = a^x b^x.

1.1.9 Logarithms
Fix b > 1, y > 0. Similarly as in the preceding subsection, one can prove the existence of a
unique real x such that b^x = y. This number x is called the logarithm of y to the base b, and we
write x = log_b y. Knowing existence and uniqueness of the logarithm, it is not difficult to prove
the following properties.
Lemma 1.16 For any a > 0, a ≠ 1, we have
(a) log_a(bc) = log_a b + log_a c if b, c > 0;
(b) log_a(b^c) = c log_a b if b > 0;
(c) log_a b = log_d b / log_d a if b, d > 0 and d ≠ 1.
Later we will give an alternative definition of the logarithm function.


Review of Trigonometric Functions

(a) Degrees and Radians
The following table gives some important angles in degrees and radians. The precise definition
of π is given below; for the moment it is just an abbreviation to measure angles. Transformation
of an angle α_deg measured in degrees into an angle measured in radians goes by
α_rad = α_deg · 2π/360.

    Degrees | 0 | 30  | 45  | 60  | 90  | 120  | 135  | 150  | 180 | 270  | 360
    Radians | 0 | π/6 | π/4 | π/3 | π/2 | 2π/3 | 3π/4 | 5π/6 | π   | 3π/2 | 2π

(b) Sine and Cosine

The sine, cosine, and tangent functions are defined in terms of ratios of sides of a right triangle:

    cos α = (length of the side adjacent to α) / (length of the hypotenuse),
    sin α = (length of the side opposite to α) / (length of the hypotenuse),
    tan α = (length of the side opposite to α) / (length of the side adjacent to α).

Let α be any angle between 0° and 360°. Further, let P be the point on the unit circle (with
center (0, 0) and radius 1) such that the ray from the origin (0, 0) through P and the positive
x-axis make an angle α. Then cos α and sin α are defined to be the x-coordinate and the
y-coordinate of the point P, respectively.
If the angle α is between 0° and 90°, this new definition coincides with the definition using the
right triangle, since the hypotenuse, which is a radius of the unit circle, now has length 1.

If 90° < α < 180° we find
    cos α = −cos(180° − α) < 0,   sin α = sin(180° − α) > 0.
If 180° < α < 270° we find
    cos α = −cos(α − 180°) < 0,   sin α = −sin(α − 180°) < 0.
If 270° < α < 360° we find
    cos α = cos(360° − α) > 0,   sin α = −sin(360° − α) < 0.
For angles greater than 360° or less than 0° define
    cos α = cos(α + k·360°),   sin α = sin(α + k·360°),
where k ∈ Z is chosen such that 0° ≤ α + k·360° < 360°. Thinking of α to be given in radians,
cosine and sine are functions defined for all real α, taking values in the closed interval [−1, 1].
If α ≠ π/2 + kπ, k ∈ Z, then cos α ≠ 0 and we define
    tan α := sin α / cos α.
If α ≠ kπ, k ∈ Z, then sin α ≠ 0 and we define
    cot α := cos α / sin α.
In this way we have defined cosine, sine, tangent, and cotangent for arbitrary angles.
(c) Special Values

    x in degrees | 30   | 45   | 60   | 90  | 120   | 135   | 150   | 180 | 270  | 360
    x in radians | π/6  | π/4  | π/3  | π/2 | 2π/3  | 3π/4  | 5π/6  | π   | 3π/2 | 2π
    sin x        | 1/2  | √2/2 | √3/2 | 1   | √3/2  | √2/2  | 1/2   | 0   | −1   | 0
    cos x        | √3/2 | √2/2 | 1/2  | 0   | −1/2  | −√2/2 | −√3/2 | −1  | 0    | 1
    tan x        | √3/3 | 1    | √3   | /   | −√3   | −1    | −√3/3 | 0   | /    | 0

(The symbol / marks the angles where the tangent is undefined.)
Recall the addition formulas for cosine and sine and the trigonometric Pythagoras:
    cos(x + y) = cos x cos y − sin x sin y,
    sin(x + y) = sin x cos y + cos x sin y,    (1.5)
    sin²x + cos²x = 1.    (1.6)
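These identities are easy to spot-check with Python's math module at a few sample arguments:

```python
import math

# spot-check the addition formulas (1.5) and the Pythagorean identity (1.6)
for x in [0.0, 0.7, 1.3, -2.1]:
    assert abs(math.sin(x) ** 2 + math.cos(x) ** 2 - 1.0) < 1e-12
    for y in [0.2, -0.9, 3.0]:
        assert abs(math.cos(x + y)
                   - (math.cos(x) * math.cos(y) - math.sin(x) * math.sin(y))) < 1e-12
        assert abs(math.sin(x + y)
                   - (math.sin(x) * math.cos(y) + math.cos(x) * math.sin(y))) < 1e-12
```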


1.2 Complex numbers


Some algebraic equations do not have solutions in the real number system. For instance, the
quadratic equation x² − 4x + 8 = 0 formally gives
    x₁ = 2 + √−4   and   x₂ = 2 − √−4.
We will see that one can work with this notation.
Definition 1.7 A complex number is an ordered pair (a, b) of real numbers. Ordered means
that (a, b) ≠ (b, a) if a ≠ b. Two complex numbers x = (a, b) and y = (c, d) are said to be
equal if and only if a = c and b = d. We define
    x + y := (a + c, b + d),
    xy := (ac − bd, ad + bc).
Theorem 1.17 These definitions turn the set of all complex numbers into a field, with (0, 0) and
(1, 0) in the role of 0 and 1.
Proof. We simply verify the field axioms as listed in Definition 1.3. Of course, we use the field
structure of R.
Let x = (a, b), y = (c, d), and z = (e, f). (A1) is clear.
(A2) x + y = (a + c, b + d) = (c + a, d + b) = y + x.
(A3) (x + y) + z = (a + c, b + d) + (e, f) = (a + c + e, b + d + f) = (a, b) + (c + e, d + f)
= x + (y + z).
(A4) x + 0 = (a, b) + (0, 0) = (a, b) = x.
(A5) Put −x := (−a, −b). Then x + (−x) = (a, b) + (−a, −b) = (0, 0) = 0.
(M1) is clear.
(M2) xy = (ac − bd, ad + bc) = (ca − db, da + cb) = yx.
(M3) (xy)z = (ac − bd, ad + bc)(e, f) = (ace − bde − adf − bcf, acf − bdf + ade + bce) =
(a, b)(ce − df, cf + de) = x(yz).
(M4) x · 1 = (a, b)(1, 0) = (a, b) = x.
(M5) If x ≠ 0 then (a, b) ≠ (0, 0), which means that at least one of the real numbers a, b is
different from 0. Hence a² + b² > 0 and we can define
    1/x := (a/(a² + b²), −b/(a² + b²)).
Then
    x · (1/x) = (a, b) (a/(a² + b²), −b/(a² + b²)) = (1, 0) = 1.
(D) x(y + z) = (a, b)(c + e, d + f) = (ac + ae − bd − bf, ad + af + bc + be)
    = (ac − bd, ad + bc) + (ae − bf, af + be) = xy + xz.


Remark 1.7 For any two real numbers a and b we have (a, 0) + (b, 0) = (a + b, 0) and
(a, 0)(b, 0) = (ab, 0). This shows that the complex numbers (a, 0) have the same arithmetic
properties as the corresponding real numbers a. We can therefore identify (a, 0) with a. This
gives us the real field as a subfield of the complex field.
Note that we have defined the complex numbers without any reference to the mysterious square
root of −1. We now show that the notation (a, b) is equivalent to the more customary a + bi.
Definition 1.8 i := (0, 1).
Lemma 1.18 (a) i² = −1. (b) If a, b ∈ R then (a, b) = a + bi.
Proof. (a) i² = (0, 1)(0, 1) = (−1, 0) = −1.
(b) a + bi = (a, 0) + (b, 0)(0, 1) = (a, 0) + (0, b) = (a, b).
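Definition 1.7's pair arithmetic is short enough to transcribe directly; a sketch (the helper names are ours) checking Lemma 1.18:

```python
def c_add(x, y):
    """Addition of complex numbers as ordered pairs: (a, b) + (c, d)."""
    a, b = x
    c, d = y
    return (a + c, b + d)

def c_mul(x, y):
    """Multiplication (a, b)(c, d) = (ac - bd, ad + bc) from Definition 1.7."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)

i = (0.0, 1.0)
assert c_mul(i, i) == (-1.0, 0.0)                               # i^2 = -1
assert c_add((3.0, 0.0), c_mul((4.0, 0.0), i)) == (3.0, 4.0)    # a + bi = (a, b)
```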

Definition 1.9 If a, b are real and z = a + bi, then the complex number z̄ := a − bi is called the
conjugate of z. The numbers a and b are the real part and the imaginary part of z, respectively.
We shall write a = Re z and b = Im z.
Proposition 1.19 If z and w are complex, then
(a) the conjugate of z + w is z̄ + w̄,
(b) the conjugate of zw is z̄ w̄,
(c) z + z̄ = 2 Re z,  z − z̄ = 2i Im z,
(d) z z̄ is positive real except when z = 0.
Proof. (a), (b), and (c) are quite trivial. To prove (d), write z = a + bi and note that
z z̄ = a² + b².
Definition 1.10 If z is a complex number, its absolute value |z| is the (nonnegative) root of z z̄;
that is, |z| := √(z z̄).
The existence (and uniqueness) of |z| follows from Proposition 1.19 (d). Note that when x is
real, then x̄ = x, hence |x| = √(x²). Thus |x| = x if x > 0 and |x| = −x if x < 0. We have
recovered the definition of the absolute value for real numbers, see Subsection 1.1.6.
Proposition 1.20 Let z and w be complex numbers. Then
(a) |z| > 0 unless z = 0,
(b) |z̄| = |z|,
(c) |zw| = |z| |w|,
(d) |Re z| ≤ |z|,
(e) |z + w| ≤ |z| + |w|.
Proof. (a) and (b) are trivial. Put z = a + bi and w = c + di, with a, b, c, d real. Then
    |zw|² = (ac − bd)² + (ad + bc)² = (a² + b²)(c² + d²) = |z|² |w|²,
or |zw|² = (|z| |w|)². Now (c) follows from the uniqueness assertion for roots.
To prove (d), note that a² ≤ a² + b², hence
    |a| = √(a²) ≤ √(a² + b²) = |z|.
To prove (e), note that z̄ w is the conjugate of z w̄, so that z w̄ + z̄ w = 2 Re(z w̄). Hence
    |z + w|² = (z + w)(z̄ + w̄) = z z̄ + z w̄ + z̄ w + w w̄
             = |z|² + 2 Re(z w̄) + |w|²
             ≤ |z|² + 2 |z| |w| + |w|² = (|z| + |w|)².
Now (e) follows by taking square roots.

1.2.1 The Complex Plane and the Polar form

There is a bijective correspondence between complex numbers z = a + bi and the points (a, b)
of a plane. By the Pythagorean theorem it is clear that |z| = √(a² + b²) is exactly the distance
of z from the origin 0. The angle φ between the positive real axis and the half-line 0z is called
the argument of z and is denoted by φ = arg z. If z ≠ 0, the argument is uniquely determined
up to integer multiples of 2π.
Elementary trigonometry gives
    sin φ = b/|z|,   cos φ = a/|z|.
With r = |z| this gives a = r cos φ and b = r sin φ. Inserting these into the rectangular form
of z yields
    z = r(cos φ + i sin φ),    (1.7)
which is called the polar form of the complex number z.

Example 1.5 (a) z = 1 + i. Then |z| = √2 and sin φ = 1/√2 = cos φ. This implies φ = π/4.
Hence, the polar form of z is 1 + i = √2 (cos π/4 + i sin π/4).
(b) z = −i. We have |−i| = 1 and sin φ = −1, cos φ = 0. Hence φ = 3π/2 and
−i = 1·(cos 3π/2 + i sin 3π/2).
(c) Computing the rectangular form of z from the polar form is easier:
    z = 32 (cos 7π/6 + i sin 7π/6) = 32 (−√3/2 − i/2) = −16√3 − 16i.


The addition of complex numbers corresponds to the addition of vectors in the plane. The
geometric meaning of the inequality |z + w| ≤ |z| + |w| is: the sum of two edges of a triangle
is bigger than the third edge.
Multiplying complex numbers z = r(cos φ + i sin φ) and w = s(cos ψ + i sin ψ) in the polar
form gives
    zw = rs(cos φ + i sin φ)(cos ψ + i sin ψ)
       = rs[(cos φ cos ψ − sin φ sin ψ) + i(cos φ sin ψ + sin φ cos ψ)],
    zw = rs(cos(φ + ψ) + i sin(φ + ψ)),    (1.8)
where we made use of the addition laws for sin and cos in the last equation.
Hence, the product of complex numbers is formed by taking the product of their absolute values
and the sum of their arguments.
The geometric meaning of multiplication by w is a similarity transformation of C. More
precisely, we have a rotation around 0 by the angle ψ and then a dilatation with factor s and
center 0.
Similarly, if w ≠ 0 we have
    z/w = (r/s)(cos(φ − ψ) + i sin(φ − ψ)).    (1.9)
Proposition 1.21 (De Moivre's formula) Let z = r(cos φ + i sin φ) be a complex number with
absolute value r and argument φ. Then for all n ∈ Z one has
    z^n = r^n (cos(nφ) + i sin(nφ)).    (1.10)
Proof. (a) First let n > 0. We use induction over n to prove De Moivre's formula. For n = 1
there is nothing to prove. Suppose (1.10) is true for some fixed n. We will show that the
assertion is true for n + 1. Using the induction hypothesis and (1.8) we find
    z^{n+1} = z^n · z = r^n (cos(nφ) + i sin(nφ)) · r(cos φ + i sin φ)
            = r^{n+1} (cos((n+1)φ) + i sin((n+1)φ)).
This proves the induction assertion.
(b) If n < 0, then z^n = 1/z^{−n}. Since 1 = 1·(cos 0 + i sin 0), (1.9) and the result of (a) give
    z^n = 1/z^{−n} = (1/r^{−n}) (cos(0 − (−n)φ) + i sin(0 − (−n)φ)) = r^n (cos(nφ) + i sin(nφ)).
This completes the proof.
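De Moivre's formula can be spot-checked against Python's built-in complex arithmetic; the sample r, φ, n below are arbitrary:

```python
import math

# check z^n = r^n (cos(n*phi) + i sin(n*phi)) for a sample r, phi, n
r, phi, n = 2.0, math.pi / 7, 5
z = r * (math.cos(phi) + 1j * math.sin(phi))
lhs = z ** n
rhs = r ** n * (math.cos(n * phi) + 1j * math.sin(n * phi))
assert abs(lhs - rhs) < 1e-9
```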


Example 1.6 Compute the polar form of z = √3 − 3i and compute z^15.
We have |z| = √(3 + 9) = 2√3, cos φ = 1/2, and sin φ = −√3/2. Therefore, φ = −π/3 and
z = 2√3 (cos(−π/3) + i sin(−π/3)). By De Moivre's formula we have
    z^15 = (2√3)^15 (cos(−15π/3) + i sin(−15π/3)) = 2^15 · 3^7 · √3 (cos(−5π) + i sin(−5π)),
    z^15 = −2^15 · 3^7 · √3.

1.2.2 Roots of Complex Numbers

Let z ∈ C and n ∈ N. A complex number w is called an nth root of z if w^n = z. In contrast
to the real case, roots of complex numbers are not unique. We will see that there are exactly n
different nth roots of z for every z ≠ 0.
Let z = r(cos φ + i sin φ) and let w = s(cos ψ + i sin ψ) be an nth root of z. De Moivre's
formula gives w^n = s^n (cos(nψ) + i sin(nψ)). Comparing w^n and z we get s^n = r, i.e.
s = ⁿ√r ≥ 0. Moreover nψ = φ + 2kπ, k ∈ Z, i.e. ψ = φ/n + 2kπ/n, k ∈ Z. For
k = 0, 1, …, n − 1 we obtain n different values ψ₀, ψ₁, …, ψ_{n−1} modulo 2π. We summarize:
Lemma 1.22 Let n ∈ N and let z = r(cos φ + i sin φ) ≠ 0 be a complex number. Then the
complex numbers
    w_k = ⁿ√r (cos((φ + 2kπ)/n) + i sin((φ + 2kπ)/n)),   k = 0, 1, …, n − 1,
are the n different nth roots of z.
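Lemma 1.22 translates directly into code; a sketch (the function name is ours) that produces the n roots and verifies them:

```python
import math

def nth_roots(z, n):
    """The n distinct nth roots of a nonzero complex number z (Lemma 1.22)."""
    r, phi = abs(z), math.atan2(z.imag, z.real)
    s = r ** (1.0 / n)    # the real nth root of |z|
    return [complex(s * math.cos((phi + 2 * math.pi * k) / n),
                    s * math.sin((phi + 2 * math.pi * k) / n))
            for k in range(n)]

roots = nth_roots(-1 + 0j, 4)
assert len(roots) == 4
for w in roots:
    assert abs(w ** 4 - (-1)) < 1e-12   # each root really is a 4th root of -1
```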
Example 1.7 Compute the 4th roots of z = −1.
We have |z| = 1, hence |w| = ⁴√1 = 1, and arg z = φ = 180°. Hence
    ψ₀ = φ/4 = 45°,
    ψ₁ = φ/4 + 1·360°/4 = 135°,
    ψ₂ = φ/4 + 2·360°/4 = 225°,
    ψ₃ = φ/4 + 3·360°/4 = 315°.
We obtain
    w₀ = cos 45° + i sin 45° = √2/2 + i √2/2,
    w₁ = cos 135° + i sin 135° = −√2/2 + i √2/2,
    w₂ = cos 225° + i sin 225° = −√2/2 − i √2/2,
    w₃ = cos 315° + i sin 315° = √2/2 − i √2/2.
Geometric interpretation of the nth roots: the nth roots of z ≠ 0 form a regular n-gon in the
complex plane with center 0. The vertices lie on a circle with center 0 and radius ⁿ√|z|.

1.3 Inequalities
1.3.1 Monotony of the Power and Exponential Functions
Lemma 1.23 (a) For a, b > 0 and r ∈ Q we have
    a < b ⟺ a^r < b^r   if r > 0,
    a < b ⟺ a^r > b^r   if r < 0.
(b) For a > 0 and r, s ∈ Q we have
    r < s ⟺ a^r < a^s   if a > 1,
    r < s ⟺ a^r > a^s   if a < 1.
Proof. (a) Suppose that r > 0, r = m/n with integers m, n ∈ Z, n > 0. Using Lemma 1.14 (a)
twice we get
    a < b ⟺ a^m < b^m ⟺ (a^m)^{1/n} < (b^m)^{1/n},
which proves the first claim. The second part, r < 0, can be obtained by setting −r in place of r
in the first part and using Proposition 1.9 (e).
(b) Suppose that s > r. Put x = s − r; then x ∈ Q and x > 0. By (a), 1 < a implies
1 = 1^x < a^x. Hence 1 < a^{s−r} = a^s/a^r (here we used Remark 1.6 (d)), and therefore
a^r < a^s. Changing the roles of r and s shows that s < r implies a^s < a^r, such that the
converse direction is also true.
The proof for a < 1 is similar.

1.3.2 The Arithmetic-Geometric mean inequality

Proposition 1.24 Let n ∈ N and x1, …, xn be in R₊. Then
    (x1 + ⋯ + xn)/n ≥ ⁿ√(x1 ⋯ xn).    (1.11)
We have equality if and only if x1 = x2 = ⋯ = xn.
Proof. We use forward-backward induction over n. First we show that (1.11) is true for all n
which are powers of 2. Then we prove that if (1.11) is true for some n + 1, then it is true for n.
Hence, it is true for all positive integers.
The inequality is true for n = 1. Let a, b ≥ 0; then (√a − √b)² ≥ 0 implies a + b ≥ 2√(ab),
and the inequality is true in case n = 2. Equality holds if and only if a = b. Suppose it is true
for some fixed k ∈ N; we will show that it is true for 2k. Let x1, …, xk, y1, …, yk ∈ R₊. Using
the induction assumption and the inequality in case n = 2, we have
    (1/(2k)) (∑_{i=1}^k x_i + ∑_{i=1}^k y_i) = (1/2) ((1/k) ∑_{i=1}^k x_i + (1/k) ∑_{i=1}^k y_i)
        ≥ (1/2) ((∏_{i=1}^k x_i)^{1/k} + (∏_{i=1}^k y_i)^{1/k})
        ≥ ((∏_{i=1}^k x_i)^{1/k} (∏_{i=1}^k y_i)^{1/k})^{1/2}
        = (∏_{i=1}^k x_i · ∏_{i=1}^k y_i)^{1/(2k)}.


This completes the forward part. Assume now that (1.11) is true for $n + 1$. We will show it for $n$. Let $x_1, \dots, x_n \in \mathbb{R}_+$ and set $A := \frac1n\sum_{i=1}^n x_i$. By the induction assumption we have
\[
A = \frac{1}{n+1}(x_1 + \cdots + x_n + A) = \frac{1}{n+1}(nA + A)
\ge \left(\prod_{i=1}^n x_i \cdot A\right)^{\frac{1}{n+1}},
\]
hence
\[
A \ge \left(\prod_{i=1}^n x_i\right)^{\frac{1}{n+1}} A^{\frac{1}{n+1}}
\ \Longrightarrow\
A^{\frac{n}{n+1}} \ge \left(\prod_{i=1}^n x_i\right)^{\frac{1}{n+1}}
\ \Longrightarrow\
A \ge \left(\prod_{i=1}^n x_i\right)^{1/n}.
\]

It is trivial that in case $x_1 = x_2 = \cdots = x_n$ we have equality. Suppose that equality holds in a case where at least two of the $x_i$ are different, say $x_1 < x_2$. Consider the inequality with the new set of values $x_1' := x_2' := (x_1 + x_2)/2$ and $x_i' := x_i$ for $i \ge 3$. Then
\[
\frac1n\sum_{k=1}^n x_k' = \frac1n\sum_{k=1}^n x_k = \left(\prod_{k=1}^n x_k\right)^{1/n}
< \left(\prod_{k=1}^n x_k'\right)^{1/n},
\]
since
\[
x_1' x_2' = \left(\frac{x_1 + x_2}{2}\right)^2 > x_1 x_2,
\quad\text{because}\quad
4x_1 x_2 \le x_1^2 + 2x_1 x_2 + x_2^2 \iff 0 \le (x_1 - x_2)^2,
\]
with strict inequality for $x_1 \ne x_2$. This contradicts (1.11) applied to $x_1', \dots, x_n'$. Hence, $x_1 = x_2 = \cdots = x_n$ is the only case where equality holds. This completes the proof.
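As a quick numerical sanity check (not part of the original notes; the helper names are ad hoc), one can test (1.11) on random positive tuples in Python, computing the geometric mean via logarithms to avoid overflow:

```python
import math
import random

def arith_mean(xs):
    # (x_1 + ... + x_n) / n
    return sum(xs) / len(xs)

def geom_mean(xs):
    # (x_1 ... x_n)^(1/n), computed via logarithms for numerical stability
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.1, 10.0) for _ in range(random.randint(1, 8))]
    # (1.11): the arithmetic mean dominates the geometric mean
    assert geom_mean(xs) <= arith_mean(xs) + 1e-12

# equality exactly when all entries coincide
xs = [3.0] * 5
assert abs(geom_mean(xs) - arith_mean(xs)) < 1e-12
```

The tolerance `1e-12` only absorbs floating-point rounding; the inequality itself is exact.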

1.3.3 The Cauchy–Schwarz Inequality

Proposition 1.25 (Cauchy–Schwarz inequality) Suppose that $x_1, \dots, x_n, y_1, \dots, y_n$ are real numbers. Then we have
\[
\left(\sum_{k=1}^n x_k y_k\right)^2 \le \sum_{k=1}^n x_k^2 \,\sum_{k=1}^n y_k^2. \tag{1.12}
\]
Equality holds if and only if there exists $t \in \mathbb{R}$ such that $y_k = t\,x_k$ for $k = 1, \dots, n$; that is, the vector $y = (y_1, \dots, y_n)$ is a scalar multiple of the vector $x = (x_1, \dots, x_n)$.
Proof. Consider the quadratic function $f(t) = at^2 - 2bt + c$ where
\[
a = \sum_{k=1}^n x_k^2, \qquad b = \sum_{k=1}^n x_k y_k, \qquad c = \sum_{k=1}^n y_k^2.
\]
Then
\[
f(t) = \sum_{k=1}^n x_k^2\, t^2 - \sum_{k=1}^n 2 x_k y_k\, t + \sum_{k=1}^n y_k^2
= \sum_{k=1}^n \left(x_k^2 t^2 - 2 x_k y_k t + y_k^2\right)
= \sum_{k=1}^n (x_k t - y_k)^2 \ge 0.
\]


Equality $f(t) = 0$ holds if and only if there is a $t \in \mathbb{R}$ with $y_k = t x_k$ for all $k$. Suppose now that there is no such $t \in \mathbb{R}$; that is,
\[
f(t) > 0 \quad\text{for all } t \in \mathbb{R}.
\]
In other words, the polynomial $f(t) = at^2 - 2bt + c$ has no real zeros $t_{1,2} = \frac1a\left(b \pm \sqrt{b^2 - ac}\right)$. That is, the discriminant $D = b^2 - ac$ is negative (only complex roots); hence $b^2 < ac$:
\[
\left(\sum_{k=1}^n x_k y_k\right)^2 < \sum_{k=1}^n x_k^2 \,\sum_{k=1}^n y_k^2.
\]
This proves the claim.

Corollary 1.26 (The Complex Cauchy–Schwarz inequality) If $x_1, \dots, x_n$ and $y_1, \dots, y_n$ are complex numbers, then
\[
\left|\sum_{k=1}^n x_k y_k\right|^2 \le \sum_{k=1}^n |x_k|^2 \,\sum_{k=1}^n |y_k|^2. \tag{1.13}
\]
Equality holds if and only if there exists $\lambda \in \mathbb{C}$ such that $y = \lambda x$, where $y = (y_1, \dots, y_n) \in \mathbb{C}^n$, $x = (x_1, \dots, x_n) \in \mathbb{C}^n$.

Proof. Using the generalized triangle inequality $|z_1 + \cdots + z_n| \le |z_1| + \cdots + |z_n|$ and the real Cauchy–Schwarz inequality we obtain
\[
\left|\sum_{k=1}^n x_k y_k\right|^2 \le \left(\sum_{k=1}^n |x_k y_k|\right)^2
= \left(\sum_{k=1}^n |x_k|\,|y_k|\right)^2
\le \sum_{k=1}^n |x_k|^2 \,\sum_{k=1}^n |y_k|^2.
\]
This proves the inequality.
The right inequality sign is an equality if there is a $t \in \mathbb{R}$ such that $|y_k| = t\,|x_k|$ for all $k$. In the first inequality sign we have equality if and only if all $z_k = x_k y_k$ have the same argument; that is, $\arg y_k = \arg x_k + \varphi$ for some fixed $\varphi$. Putting both together yields $y = \lambda x$ with $\lambda = t(\cos\varphi + i\sin\varphi)$.
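A numerical check of (1.13) is straightforward; the sketch below (not from the notes; names are ad hoc) tests random complex vectors, and verifies equality for a real proportional pair, where both sides coincide:

```python
import random

random.seed(1)
for _ in range(500):
    n = random.randint(1, 6)
    x = [complex(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(n)]
    y = [complex(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(n)]
    lhs = abs(sum(a * b for a, b in zip(x, y))) ** 2
    rhs = sum(abs(a) ** 2 for a in x) * sum(abs(b) ** 2 for b in y)
    # (1.13): |sum x_k y_k|^2 <= sum |x_k|^2 * sum |y_k|^2
    assert lhs <= rhs + 1e-9

# equality for real proportional vectors y = t*x
x = [1.0, -2.0, 3.0]
t = 2.5
y = [t * a for a in x]
lhs = abs(sum(a * b for a, b in zip(x, y))) ** 2
rhs = sum(abs(a) ** 2 for a in x) * sum(abs(b) ** 2 for b in y)
assert abs(lhs - rhs) < 1e-9
```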

1.4 Appendix A
In this appendix we collect some additional facts which were not covered in the lecture.
We now show that the equation
\[
x^2 = 2 \tag{1.14}
\]
is not satisfied by any rational number $x$.
Suppose to the contrary that there were such an $x$; then we could write $x = m/n$ with integers $m$ and $n$, $n \ne 0$, that are not both even. Then (1.14) implies
\[
m^2 = 2n^2. \tag{1.15}
\]


This shows that $m^2$ is even and hence $m$ is even. Therefore $m^2$ is divisible by $4$. It follows that the right-hand side of (1.15) is divisible by $4$, so that $n^2$ is even, which implies that $n$ is even. But this contradicts our choice of $m$ and $n$. Hence (1.14) is impossible for rational $x$.
We shall show that $A$ contains no largest element and $B$ contains no smallest. That is, for every $p \in A$ we can find a rational $q \in A$ with $p < q$, and for every $p \in B$ we can find a rational $q \in B$ such that $q < p$.
Suppose that $p$ is in $A$. We associate with $p > 0$ the rational number
\[
q = p + \frac{2 - p^2}{p + 2} = \frac{2p + 2}{p + 2}. \tag{1.16}
\]
Then
\[
q^2 - 2 = \frac{4p^2 + 8p + 4 - 2p^2 - 8p - 8}{(p+2)^2} = \frac{2(p^2 - 2)}{(p+2)^2}. \tag{1.17}
\]
If $p$ is in $A$ then $2 - p^2 > 0$, (1.16) shows that $q > p$, and (1.17) shows that $q^2 < 2$. If $p$ is in $B$ then $2 < p^2$, (1.16) shows that $q < p$, and (1.17) shows that $q^2 > 2$.
A Non-Archimedean Ordered Field
The fields $\mathbb{Q}$ and $\mathbb{R}$ are Archimedean, see below. But there exist ordered fields without this property. Let $F := \mathbb{R}(t)$ be the field of rational functions $f(t) = p(t)/q(t)$, where $p$ and $q$ are polynomials with real coefficients. Since $p$ and $q$ have only finitely many zeros, $f(t)$ is either positive or negative for all sufficiently large $t$. In the first case we set $f > 0$. In this way $\mathbb{R}(t)$ becomes an ordered field. But $t > n$ for all $n \in \mathbb{N}$, since the polynomial $f(t) = t - n$ becomes positive for large $t$ (and fixed $n$).
Our aim is to define $b^x$ for arbitrary real $x$.
Lemma 1.27 Let $b, p$ be real numbers with $b > 1$ and $p > 0$. Set
\[
M = \{b^r \mid r \in \mathbb{Q},\ r < p\}, \qquad M^* = \{b^s \mid s \in \mathbb{Q},\ p < s\}.
\]
Then
\[
\sup M = \inf M^*.
\]
Proof. (a) $M$ is bounded above by an arbitrary $b^s$, $s \in \mathbb{Q}$, with $s > p$, and $M^*$ is bounded below by any $b^r$, $r \in \mathbb{Q}$, with $r < p$. Hence $\sup M$ and $\inf M^*$ both exist.
(b) Since $r < p < s$ implies $b^r < b^s$ by Lemma 1.23, $\sup M \le b^s$ for all $b^s \in M^*$. Taking the infimum over all such $b^s$, $\sup M \le \inf M^*$.
(c) Let $t = \sup M$ and let $\varepsilon > 0$ be given. We want to show that $\inf M^* < t + \varepsilon$. Choose $n \in \mathbb{N}$ such that
\[
\frac1n < \frac{\varepsilon}{t(b-1)}. \tag{1.18}
\]
By Proposition 1.11 there exist $r, s \in \mathbb{Q}$ with
\[
r < p < s \quad\text{and}\quad s - r < \frac1n. \tag{1.19}
\]
Using $s - r < 1/n$, Bernoulli's inequality (part 2), and (1.18), we compute
\[
b^s - b^r = b^r\left(b^{s-r} - 1\right) \le t\left(b^{\frac1n} - 1\right) \le t\,\frac1n\,(b - 1) < \varepsilon.
\]
Hence
\[
\inf M^* \le b^s < b^r + \varepsilon \le \sup M + \varepsilon.
\]
Since $\varepsilon$ was arbitrary, $\inf M^* \le \sup M$, and finally, with the result of (b), $\inf M^* = \sup M$.

Corollary 1.28 Suppose $p \in \mathbb{Q}$ and $b > 1$ is real. Then
\[
b^p = \sup\{b^r \mid r \in \mathbb{Q},\ r < p\}.
\]
Proof. For all rational numbers $r, p, s \in \mathbb{Q}$, $r < p < s$ implies $b^r < b^p < b^s$. Hence $\sup M \le b^p \le \inf M^*$. By the lemma, these three numbers coincide.
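The squeeze of Lemma 1.27 can be illustrated numerically (a sketch, not part of the notes): approximating the exponent from below and above by decimal truncations produces elements of $M$ and $M^*$ whose values pinch a single number, namely $b^p$:

```python
b = 3.0
p = 2 ** 0.5          # an irrational exponent; any p > 0 works

lowers, uppers = [], []
for k in range(1, 12):
    n = 10 ** k
    r = int(p * n) / n        # rational r < p (decimal truncation)
    s = (int(p * n) + 1) / n  # rational s > p
    lowers.append(b ** r)     # an element of M
    uppers.append(b ** s)     # an element of M*

# the two sets squeeze the same value, which is b**p
assert max(lowers) <= min(uppers)
assert min(uppers) - max(lowers) < 1e-8
assert max(lowers) <= b ** p <= min(uppers)
```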

Inequalities
Now we extend Bernoulli's inequality to rational exponents.
Proposition 1.29 (Bernoulli's inequality) Let $a \ge -1$ be real and $r \in \mathbb{Q}$. Then
(a) $(1 + a)^r \ge 1 + ra$ if $r \ge 1$,
(b) $(1 + a)^r \le 1 + ra$ if $0 \le r \le 1$.
Equality holds if and only if $a = 0$ or $r = 1$.
Proof. (b) Let $r = m/n$ with $m \le n$, $m, n \in \mathbb{N}$. Apply (1.11) to $x_i := 1 + a$, $i = 1, \dots, m$, and $x_i := 1$ for $i = m + 1, \dots, n$. We obtain
\[
\frac1n\bigl(m(1 + a) + (n - m)\cdot 1\bigr) \ge \left((1 + a)^m \cdot 1^{\,n-m}\right)^{\frac1n}
\ \Longrightarrow\
\frac{m}{n}\,a + 1 \ge (1 + a)^{\frac{m}{n}},
\]
which proves (b). Equality holds if $n = 1$ or if $x_1 = \cdots = x_n$, i.e. $a = 0$.
(a) Now let $s \ge 1$ and $z \ge -1$. Setting $r = 1/s$ and $a := (1 + z)^{1/r} - 1$ we obtain $r \le 1$ and $a \ge -1$. Inserting this into (b) yields
\[
(1 + a)^r = \left((1 + z)^s\right)^{\frac1s} = 1 + z \le 1 + r\left((1 + z)^s - 1\right),
\]
hence $z \le \frac1s\left((1 + z)^s - 1\right)$, that is, $1 + sz \le (1 + z)^s$. This completes the proof of (a).


Corollary 1.30 (Bernoulli's inequality) Let $a \ge -1$ be real and $x \in \mathbb{R}$. Then
(a) $(1 + a)^x \ge 1 + xa$ if $x \ge 1$,
(b) $(1 + a)^x \le 1 + xa$ if $0 \le x \le 1$.
Equality holds if and only if $a = 0$ or $x = 1$.
Proof. (a) First let $a > 0$. By Proposition 1.29 (a), $(1 + a)^r \ge 1 + ra$ if $r \in \mathbb{Q}$. Hence,
\[
(1 + a)^x = \sup\{(1 + a)^r \mid r \in \mathbb{Q},\ r < x\} \ge \sup\{1 + ra \mid r \in \mathbb{Q},\ r < x\} = 1 + xa.
\]
Now let $-1 \le a < 0$. Then $r < x$ implies $ra > xa$, and Proposition 1.29 (a) implies
\[
(1 + a)^r \ge 1 + ra > 1 + xa. \tag{1.20}
\]
By the definition of the power with a real exponent, see (1.4),
\[
(1 + a)^x = \frac{1}{\sup\{(1/(1 + a))^r \mid r \in \mathbb{Q},\ r < x\}}
\overset{\text{HW 2.1}}{=} \inf\{(1 + a)^r \mid r \in \mathbb{Q},\ r < x\}.
\]
Taking in (1.20) the infimum over all $r \in \mathbb{Q}$ with $r < x$ we obtain
\[
(1 + a)^x = \inf\{(1 + a)^r \mid r \in \mathbb{Q},\ r < x\} \ge 1 + xa.
\]
(b) The proof is analogous, so we omit it.

Proposition 1.31 (Young's inequality) If $a, b \in \mathbb{R}_+$ and $p > 1$, then
\[
ab \le \frac1p a^p + \frac1q b^q, \tag{1.21}
\]
where $1/p + 1/q = 1$. Equality holds if and only if $a^p = b^q$.
Proof. First note that $1/q = 1 - 1/p$. We reformulate Bernoulli's inequality for $y \in \mathbb{R}_+$ and $p > 1$:
\[
y^p - 1 \ge p(y - 1)
\ \Longrightarrow\ \frac1p(y^p - 1) + 1 \ge y
\ \Longrightarrow\ \frac1p y^p + \frac1q \ge y.
\]
If $b = 0$ the statement is always true. If $b \ne 0$, insert $y := ab/b^q$ into the above inequality:
\[
\frac1p\left(\frac{ab}{b^q}\right)^p + \frac1q \ge \frac{ab}{b^q}
\ \Longrightarrow\
\frac1p \frac{a^p b^p}{b^{pq}} + \frac1q \ge \frac{ab}{b^q}
\ \overset{\cdot\, b^q}{\Longrightarrow}\
\frac1p a^p + \frac1q b^q \ge ab,
\]
since $b^{p+q} = b^{pq}$. We have equality if $y = 1$ or $p = 1$. The latter is impossible by assumption. $y = 1$ is equivalent to $b^q = ab$, i.e. $b^{q-1} = a$, i.e. $b^{(q-1)p} = a^p$ ($b \ne 0$); that is, $b^q = a^p$. If $b = 0$ equality holds if and only if $a = 0$.
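Young's inequality and its equality case are easy to probe numerically. The following Python sketch (not part of the notes) draws random exponents and arguments, and then constructs a pair with $a^p = b^q$ where both sides agree:

```python
import random

random.seed(2)
for _ in range(1000):
    p = random.uniform(1.01, 5.0)
    q = p / (p - 1)                 # conjugate exponent: 1/p + 1/q = 1
    a = random.uniform(0.0, 10.0)
    b = random.uniform(0.0, 10.0)
    # (1.21): ab <= a^p/p + b^q/q
    assert a * b <= a ** p / p + b ** q / q + 1e-9

# equality exactly when a^p == b^q
p, q = 3.0, 1.5
a = 2.0
b = a ** (p / q)                    # then b^q = a^p
assert abs(a * b - (a ** p / p + b ** q / q)) < 1e-9
```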


Proposition 1.32 (Hölder's inequality) Let $p > 1$, $1/p + 1/q = 1$, and let $x_1, \dots, x_n \in \mathbb{R}_+$ and $y_1, \dots, y_n \in \mathbb{R}_+$ be non-negative real numbers. Then
\[
\sum_{k=1}^n x_k y_k \le \left(\sum_{k=1}^n x_k^p\right)^{\frac1p} \left(\sum_{k=1}^n y_k^q\right)^{\frac1q}. \tag{1.22}
\]
We have equality if and only if there exists $c \in \mathbb{R}$ such that $x_k^p / y_k^q = c$ for all $k = 1, \dots, n$ (they are proportional).
Proof. Set $A := \left(\sum_{k=1}^n x_k^p\right)^{\frac1p}$ and $B := \left(\sum_{k=1}^n y_k^q\right)^{\frac1q}$. The cases $A = 0$ and $B = 0$ are trivial, so we assume $A, B > 0$. By Young's inequality we have
\[
\frac{x_k}{A}\,\frac{y_k}{B} \le \frac1p \frac{x_k^p}{A^p} + \frac1q \frac{y_k^q}{B^q}.
\]
Summing over $k$,
\[
\frac{1}{AB}\sum_{k=1}^n x_k y_k
\le \frac{1}{pA^p}\sum_{k=1}^n x_k^p + \frac{1}{qB^q}\sum_{k=1}^n y_k^q
= \frac1p + \frac1q = 1,
\]
hence
\[
\sum_{k=1}^n x_k y_k \le AB = \left(\sum_{k=1}^n x_k^p\right)^{\frac1p} \left(\sum_{k=1}^n y_k^q\right)^{\frac1q}.
\]
Equality holds if and only if $x_k^p / A^p = y_k^q / B^q$ for all $k = 1, \dots, n$. Therefore, $x_k^p / y_k^q = \text{const}$.
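A brief numerical check of (1.22) (a sketch, not from the notes; `lp_sum` is an ad hoc helper for the right-hand side factors):

```python
import random

def lp_sum(xs, p):
    # (sum |x_k|^p)^(1/p)
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

random.seed(3)
for _ in range(500):
    p = random.uniform(1.01, 4.0)
    q = p / (p - 1)
    n = random.randint(1, 8)
    x = [random.uniform(0.0, 5.0) for _ in range(n)]
    y = [random.uniform(0.0, 5.0) for _ in range(n)]
    lhs = sum(a * b for a, b in zip(x, y))
    # (1.22): sum x_k y_k <= ||x||_p * ||y||_q
    assert lhs <= lp_sum(x, p) * lp_sum(y, q) + 1e-9
```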

Corollary 1.33 (Complex Hölder's inequality) Let $p > 1$, $1/p + 1/q = 1$, and $x_k, y_k \in \mathbb{C}$, $k = 1, \dots, n$. Then
\[
\sum_{k=1}^n |x_k y_k| \le \left(\sum_{k=1}^n |x_k|^p\right)^{\frac1p} \left(\sum_{k=1}^n |y_k|^q\right)^{\frac1q}.
\]
Equality holds if and only if $|x_k|^p / |y_k|^q = \text{const}$ for $k = 1, \dots, n$.
Proof. Set $x_k := |x_k|$ and $y_k := |y_k|$ in (1.22). This proves the statement.
Proposition 1.34 (Minkowski's inequality) If $x_1, \dots, x_n \in \mathbb{R}_+$ and $y_1, \dots, y_n \in \mathbb{R}_+$ and $p \ge 1$, then
\[
\left(\sum_{k=1}^n (x_k + y_k)^p\right)^{\frac1p}
\le \left(\sum_{k=1}^n x_k^p\right)^{\frac1p} + \left(\sum_{k=1}^n y_k^p\right)^{\frac1p}. \tag{1.23}
\]
Equality holds if $p = 1$ or if $p > 1$ and $x_k / y_k = \text{const}$.


Proof. The case $p = 1$ is obvious. Let $p > 1$. As before, let $q > 0$ be the unique positive number with $1/p + 1/q = 1$. We compute
\[
\sum_{k=1}^n (x_k + y_k)^p = \sum_{k=1}^n (x_k + y_k)(x_k + y_k)^{p-1}
= \sum_{k=1}^n x_k (x_k + y_k)^{p-1} + \sum_{k=1}^n y_k (x_k + y_k)^{p-1}
\]
\[
\overset{(1.22)}{\le}
\left(\sum_k x_k^p\right)^{\frac1p}\left(\sum_k (x_k + y_k)^{(p-1)q}\right)^{\frac1q}
+ \left(\sum_k y_k^p\right)^{\frac1p}\left(\sum_k (x_k + y_k)^{(p-1)q}\right)^{\frac1q}.
\]
Note that $(p-1)q = p$. We can assume that $\sum_k (x_k + y_k)^p > 0$. Dividing the last inequality by $\left(\sum_k (x_k + y_k)^p\right)^{\frac1q}$ and using $1 - \frac1q = \frac1p$, we obtain the claim.
Equality holds if $x_k^p/(x_k + y_k)^{(p-1)q} = \text{const}$ and $y_k^p/(x_k + y_k)^{(p-1)q} = \text{const}$; that is, $x_k / y_k = \text{const}$.

Corollary 1.35 (Complex Minkowski's inequality) If $x_1, \dots, x_n, y_1, \dots, y_n \in \mathbb{C}$ and $p \ge 1$, then
\[
\left(\sum_{k=1}^n |x_k + y_k|^p\right)^{\frac1p}
\le \left(\sum_{k=1}^n |x_k|^p\right)^{\frac1p} + \left(\sum_{k=1}^n |y_k|^p\right)^{\frac1p}. \tag{1.24}
\]
Equality holds if $p = 1$ or if $p > 1$ and $x_k / y_k = \lambda > 0$.
Proof. The triangle inequality gives $|x_k + y_k| \le |x_k| + |y_k|$; hence
\[
\sum_{k=1}^n |x_k + y_k|^p \le \sum_{k=1}^n (|x_k| + |y_k|)^p.
\]
The real version of Minkowski's inequality now proves the assertion.
If $x = (x_1, \dots, x_n)$ is a vector in $\mathbb{R}^n$ or $\mathbb{C}^n$, the (non-negative) number
\[
\|x\|_p := \left(\sum_{k=1}^n |x_k|^p\right)^{\frac1p}
\]
is called the $p$-norm of the vector $x$. Minkowski's inequality then reads as
\[
\|x + y\|_p \le \|x\|_p + \|y\|_p,
\]
which is the triangle inequality for the $p$-norm.
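The triangle inequality for the $p$-norm can be verified directly for a few exponents; a minimal sketch (not from the notes; `p_norm` is an ad hoc helper):

```python
def p_norm(x, p):
    # ||x||_p for a vector with real or complex entries
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

x = [1.0, -2.0, 3.0]
y = [0.5 + 1.0j, 2.0, -1.0]
s = [a + b for a, b in zip(x, y)]
for p in (1, 1.5, 2, 3, 10):
    # Minkowski (1.24): the triangle inequality for the p-norm
    assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
```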


Chapter 2
Sequences and Series
This chapter will deal with one of the main notions of calculus, the limit of a sequence. Although we are concerned with real sequences, almost all notions make sense in arbitrary metric spaces like $\mathbb{R}^n$ or $\mathbb{C}^n$.
Given $a \in \mathbb{R}$ and $\varepsilon > 0$ we define the $\varepsilon$-neighborhood of $a$ as
\[
U_\varepsilon(a) := (a - \varepsilon,\, a + \varepsilon) = \{x \in \mathbb{R} \mid a - \varepsilon < x < a + \varepsilon\} = \{x \in \mathbb{R} \mid |x - a| < \varepsilon\}.
\]

2.1 Convergent Sequences
A sequence is a mapping $x\colon \mathbb{N} \to \mathbb{R}$. To every $n \in \mathbb{N}$ we associate a real number $x_n$. We write this as $(x_n)_{n\in\mathbb{N}}$ or $(x_1, x_2, \dots)$. For different sequences we use different letters, as $(a_n)$, $(b_n)$, $(y_n)$.
Example 2.1 (a) $x_n = \frac1n$, $(x_n) = \left(1, \frac12, \frac13, \dots\right)$;
(b) $x_n = (-1)^n + 1$, $(x_n) = (0, 2, 0, 2, \dots)$;
(c) $x_n = a$ ($a \in \mathbb{R}$ fixed), $(x_n) = (a, a, \dots)$ (constant sequence);
(d) $x_n = 2n - 1$, $(x_n) = (1, 3, 5, 7, \dots)$, the sequence of odd positive integers;
(e) $x_n = a^n$ ($a \in \mathbb{R}$ fixed), $(x_n) = (a, a^2, a^3, \dots)$ (geometric sequence).
Definition 2.1 A sequence $(x_n)$ is said to be convergent to $x \in \mathbb{R}$ if:
For every $\varepsilon > 0$ there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $|x_n - x| < \varepsilon$.
$x$ is called the limit of $(x_n)$ and we write
\[
x = \lim_{n\to\infty} x_n \quad\text{or simply}\quad x = \lim x_n \quad\text{or}\quad x_n \to x.
\]
If there is no such $x$ with the above property, the sequence $(x_n)$ is said to be divergent.
In other words: $(x_n)$ converges to $x$ if any neighborhood $U_\varepsilon(x)$, $\varepsilon > 0$, contains almost all elements of the sequence $(x_n)$. "Almost all" means all but finitely many. Sometimes we say "for sufficiently large $n$," which means the same.

This is an equivalent formulation since $x_n \in U_\varepsilon(x)$ means $x - \varepsilon < x_n < x + \varepsilon$, hence $|x - x_n| < \varepsilon$. The $n_0$ in question need not be the smallest possible.
We write
\[
\lim x_n = +\infty \tag{2.1}
\]
if for all $E > 0$ there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $x_n \ge E$. Similarly, we write
\[
\lim x_n = -\infty \tag{2.2}
\]
if for all $E > 0$ there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $x_n \le -E$. In these cases we say that $+\infty$ and $-\infty$ are improper limits of $(x_n)$. Note that in both cases $(x_n)$ is divergent.
Example 2.2 This is Example 2.1 continued.
(a) $\lim_{n\to\infty} \frac1n = 0$. Indeed, let $\varepsilon > 0$ be fixed. We are looking for some $n_0$ with $\left|\frac1n - 0\right| < \varepsilon$ for all $n \ge n_0$. This is equivalent to $1/\varepsilon < n$. Choose $n_0 > 1/\varepsilon$ (which is possible by the Archimedean property). Then for all $n \ge n_0$ we have
\[
n \ge n_0 > \frac1\varepsilon \ \Longrightarrow\ \frac1n < \varepsilon \ \Longrightarrow\ |x_n - 0| < \varepsilon.
\]
Therefore, $(x_n)$ tends to $0$ as $n \to \infty$.
(b) $x_n = (-1)^n + 1$ is divergent. Suppose to the contrary that $x$ is the limit. To $\varepsilon = 1$ there is $n_0$ such that for $n \ge n_0$ we have $|x_n - x| < 1$. For even $n \ge n_0$ this implies $|2 - x| < 1$; for odd $n \ge n_0$, $|0 - x| = |x| < 1$. The triangle inequality gives
\[
2 = |(2 - x) + x| \le |2 - x| + |x| < 1 + 1 = 2.
\]
This is a contradiction. Hence, $(x_n)$ is divergent.
(c) $x_n = a$. $\lim x_n = a$ since $|x_n - a| = |a - a| = 0 < \varepsilon$ for all $\varepsilon > 0$ and all $n \in \mathbb{N}$.
(d) $\lim(2n - 1) = +\infty$. Indeed, suppose that $E > 0$ is given. Choose $n_0 > \frac{E}{2} + 1$. Then
\[
n \ge n_0 \ \Longrightarrow\ n > \frac{E}{2} + 1 \ \Longrightarrow\ 2n - 2 > E \ \Longrightarrow\ x_n = 2n - 1 > 2n - 2 > E.
\]
This proves the claim. Similarly, one can show that $\lim(-n^3) = -\infty$. But both $((-1)^n n)$ and $(1, 2, 1, 3, 1, 4, 1, 5, \dots)$ have no improper limit. Indeed, the first one becomes arbitrarily large for even $n$ and arbitrarily small for odd $n$. The second one becomes large for even $n$ but is constant for odd $n$.
(e) $x_n = a^n$ ($a \ge 0$).
\[
\lim_{n\to\infty} a^n = \begin{cases} 1, & \text{if } a = 1,\\ 0, & \text{if } 0 \le a < 1. \end{cases}
\]
$(a^n)$ is divergent for $a > 1$; moreover, $\lim a^n = +\infty$. To prove this let $E > 0$ be given. By the Archimedean property of $\mathbb{R}$, and since $a - 1 > 0$, we find $m \in \mathbb{N}$ such that $m(a - 1) > E$. Bernoulli's inequality gives
\[
a^m \ge m(a - 1) + 1 > m(a - 1) > E.
\]


By Lemma 1.23 (b), $n \ge m$ implies
\[
a^n \ge a^m > E.
\]
This proves the claim.
Clearly $(a^n)$ is convergent in the cases $a = 0$ and $a = 1$, since it is constant then. Let $0 < a < 1$ and put $b = \frac1a - 1$; then $b > 0$. Bernoulli's inequality gives
\[
\frac{1}{a^n} = \left(\frac1a\right)^n = (b + 1)^n \ge 1 + nb > nb
\ \Longrightarrow\ 0 < a^n < \frac{1}{nb}. \tag{2.3}
\]
Let $\varepsilon > 0$. Choose $n_0 > \frac{1}{\varepsilon b}$. Then $\varepsilon > \frac{1}{n_0 b}$, and $n \ge n_0$ implies
\[
|a^n - 0| = a^n \overset{(2.3)}{<} \frac{1}{nb} \le \frac{1}{n_0 b} < \varepsilon.
\]
Hence, $a^n \to 0$.
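The explicit $n_0$ from this argument can be tried out directly. A small Python sketch (not part of the notes; the sample values of $a$ and $\varepsilon$ are arbitrary):

```python
import math

a = 0.8
eps = 0.01
b = 1 / a - 1                        # b > 0 since 0 < a < 1
n0 = math.floor(1 / (eps * b)) + 1   # any n0 > 1/(eps*b), as in the proof
for n in range(n0, n0 + 50):
    # the proof's chain: a^n < 1/(n*b) <= 1/(n0*b) < eps
    assert a ** n < 1 / (n * b)
    assert a ** n < eps
```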
Proposition 2.1 The limit of a convergent sequence is uniquely determined.
Proof. Suppose that $x = \lim x_n$ and $y = \lim x_n$ with $x \ne y$. Put $\varepsilon := |x - y|/2 > 0$. Then
\[
\exists\, n_1 \in \mathbb{N}\ \forall n \ge n_1:\ |x - x_n| < \varepsilon, \qquad
\exists\, n_2 \in \mathbb{N}\ \forall n \ge n_2:\ |y - x_n| < \varepsilon.
\]
Choose $m \ge \max\{n_1, n_2\}$. Then $|x - x_m| < \varepsilon$ and $|y - x_m| < \varepsilon$. Hence,
\[
|x - y| \le |x - x_m| + |y - x_m| < 2\varepsilon = |x - y|.
\]
This contradiction establishes the statement.
Proposition 2.1 holds in arbitrary metric spaces.
Definition 2.2 A sequence $(x_n)$ is said to be bounded if the set of its elements is a bounded set; i.e. there is a $C \ge 0$ such that
\[
|x_n| \le C \quad\text{for all } n \in \mathbb{N}.
\]
Similarly, $(x_n)$ is said to be bounded above or bounded below if there exists $C \in \mathbb{R}$ such that $x_n \le C$ or $x_n \ge C$, respectively, for all $n \in \mathbb{N}$.
Proposition 2.2 If (xn ) is convergent, then (xn ) is bounded.


Proof. Let $x = \lim x_n$. To $\varepsilon = 1$ there exists $n_0 \in \mathbb{N}$ such that $|x - x_n| < 1$ for all $n \ge n_0$. Then $|x_n| = |x_n - x + x| \le |x_n - x| + |x| < |x| + 1$ for all $n \ge n_0$. Put
\[
C := \max\{|x_1|, \dots, |x_{n_0 - 1}|, |x| + 1\}.
\]
Then $|x_n| \le C$ for all $n \in \mathbb{N}$.
The converse statement is not true; there are bounded sequences which are not convergent, see Example 2.1 (b).
Exercise class: If $(x_n)$ has an improper limit, then $(x_n)$ is divergent.
Proof. Suppose to the contrary that $(x_n)$ is convergent; then it is bounded, say $|x_n| \le C$ for all $n$. This contradicts $x_n > E$ as well as $x_n < -E$ for $E = C$ and sufficiently large $n$, and hence contradicts the assumption that $(x_n)$ has an improper limit.

2.1.1 Algebraic operations with sequences
The sum, difference, product, quotient, and absolute value of sequences $(x_n)$ and $(y_n)$ are defined as follows:
\[
(x_n) \pm (y_n) := (x_n \pm y_n), \qquad (x_n)\cdot(y_n) := (x_n y_n), \qquad
\frac{(x_n)}{(y_n)} := \left(\frac{x_n}{y_n}\right)\ (y_n \ne 0), \qquad |(x_n)| := (|x_n|).
\]

Proposition 2.3 If $(x_n)$ and $(y_n)$ are convergent sequences and $c \in \mathbb{R}$, then their sum, difference, product, quotient (provided $y_n \ne 0$ and $\lim y_n \ne 0$), and their absolute values are also convergent:
(a) $\lim(x_n \pm y_n) = \lim x_n \pm \lim y_n$;
(b) $\lim(c x_n) = c \lim x_n$, $\lim(x_n + c) = \lim x_n + c$;
(c) $\lim(x_n y_n) = \lim x_n \cdot \lim y_n$;
(d) $\lim \frac{x_n}{y_n} = \frac{\lim x_n}{\lim y_n}$ if $y_n \ne 0$ for all $n$ and $\lim y_n \ne 0$;
(e) $\lim |x_n| = |\lim x_n|$.
Proof. Let $x_n \to x$ and $y_n \to y$.
(a) Given $\varepsilon > 0$, there exist integers $n_1$ and $n_2$ such that $n \ge n_1$ implies $|x_n - x| < \varepsilon/2$ and $n \ge n_2$ implies $|y_n - y| < \varepsilon/2$. If $n_0 := \max\{n_1, n_2\}$, then $n \ge n_0$ implies
\[
|(x_n + y_n) - (x + y)| \le |x_n - x| + |y_n - y| < \varepsilon.
\]
The proof for the difference is quite similar.
(b) follows from $|c x_n - c x| = |c|\,|x_n - x|$ and $|(x_n + c) - (x + c)| = |x_n - x|$.
(c) We use the identity
\[
x_n y_n - xy = (x_n - x)(y_n - y) + x(y_n - y) + y(x_n - x). \tag{2.4}
\]
Given $\varepsilon > 0$ there are integers $n_1$ and $n_2$ such that $n \ge n_1$ implies $|x_n - x| < \sqrt{\varepsilon}$ and $n \ge n_2$ implies $|y_n - y| < \sqrt{\varepsilon}$. If we take $n_0 = \max\{n_1, n_2\}$, $n \ge n_0$ implies
\[
|(x_n - x)(y_n - y)| < \varepsilon,
\]
so that
\[
\lim_{n\to\infty} (x_n - x)(y_n - y) = 0.
\]
Now we apply (a) and (b) to (2.4) and conclude that
\[
\lim_{n\to\infty} (x_n y_n - xy) = 0.
\]
(d) Choosing $n_1$ such that $|y_n - y| < |y|/2$ if $n \ge n_1$, we see that
\[
|y| \le |y - y_n| + |y_n| < |y|/2 + |y_n| \ \Longrightarrow\ |y_n| > |y|/2.
\]
Given $\varepsilon > 0$, there is an integer $n_2 > n_1$ such that $n \ge n_2$ implies
\[
|y_n - y| < |y|^2 \varepsilon / 2.
\]
Hence, for $n \ge n_2$,
\[
\left|\frac{1}{y_n} - \frac{1}{y}\right| = \left|\frac{y_n - y}{y_n y}\right| < \frac{2}{|y|^2}\,|y_n - y| < \varepsilon,
\]
and we get $\lim \frac{1}{y_n} = \frac{1}{\lim y_n}$. The general case can be reduced to the above case using (c) and $(x_n/y_n) = (x_n \cdot 1/y_n)$.
(e) By Lemma 1.12 (e) we have $\bigl||x_n| - |x|\bigr| \le |x_n - x|$. Given $\varepsilon > 0$, there is $n_0$ such that $n \ge n_0$ implies $|x_n - x| < \varepsilon$. By the above inequality, also $\bigl||x_n| - |x|\bigr| < \varepsilon$, and we are done.

Example 2.3 (a) $z_n := \frac{n+1}{n}$. Set $x_n = 1$ and $y_n = 1/n$. Then $z_n = x_n + y_n$, and we already know that $\lim x_n = 1$ and $\lim y_n = 0$. Hence, $\lim \frac{n+1}{n} = \lim 1 + \lim \frac1n = 1 + 0 = 1$.
(b) $a_n = \frac{3n^2 + 13n}{n^2 - 2}$. We can write this as
\[
a_n = \frac{3 + \frac{13}{n}}{1 - \frac{2}{n^2}}.
\]
Since $\lim 1/n = 0$, by Proposition 2.3 we obtain $\lim 1/n^2 = 0$ and $\lim 13/n = 0$. Hence $\lim 2/n^2 = 0$ and $\lim(3 + 13/n) = 3$. Finally,
\[
\lim_{n\to\infty} \frac{3n^2 + 13n}{n^2 - 2}
= \frac{\lim_{n\to\infty}\left(3 + \frac{13}{n}\right)}{\lim_{n\to\infty}\left(1 - \frac{2}{n^2}\right)}
= \frac{3}{1} = 3.
\]

(c) We introduce the notions of a polynomial and a rational function.
Given $a_0, a_1, \dots, a_n \in \mathbb{R}$, $a_n \ne 0$, the function $p\colon \mathbb{R} \to \mathbb{R}$ given by $p(t) := a_n t^n + a_{n-1} t^{n-1} + \cdots + a_1 t + a_0$ is called a polynomial. The non-negative integer $n$ is the degree of the polynomial $p(t)$, and $a_0, \dots, a_n$ are called the coefficients of $p(t)$. The set of all real polynomials forms a real vector space denoted by $\mathbb{R}[x]$.
Given two polynomials $p$ and $q$, put $D := \{t \in \mathbb{R} \mid q(t) \ne 0\}$. Then $r = \frac{p}{q}$ is called a rational function, where $r\colon D \to \mathbb{R}$ is defined by
\[
r(t) := \frac{p(t)}{q(t)}.
\]
Polynomials are special rational functions with $q(t) \equiv 1$. The set of rational functions with real coefficients forms both a real vector space and a field. It is denoted by $\mathbb{R}(x)$.
Lemma 2.4 (a) Let $(a_n)$ be a sequence tending to zero with $a_n \ne 0$ for every $n$. Then
\[
\lim_{n\to\infty} \frac{1}{a_n} = \begin{cases} +\infty, & \text{if } a_n > 0 \text{ for almost all } n;\\ -\infty, & \text{if } a_n < 0 \text{ for almost all } n. \end{cases}
\]
(b) Let $(y_n)$ be a sequence converging to $a$ with $a > 0$. Then $y_n > 0$ for almost all $n \in \mathbb{N}$.
Proof. (a) We will prove the case with $-\infty$. Let $\varepsilon > 0$. By assumption there is a positive integer $n_0$ such that $n \ge n_0$ implies $-\varepsilon < a_n < 0$. This implies $0 < -a_n < \varepsilon$ and further $\frac{1}{a_n} < -\frac{1}{\varepsilon} < 0$. Suppose $E > 0$ is given; choose $\varepsilon = 1/E$ and $n_0$ as above. Then by the previous argument, $n \ge n_0$ implies
\[
\frac{1}{a_n} < -\frac{1}{\varepsilon} = -E.
\]
This shows $\lim_{n\to\infty} \frac{1}{a_n} = -\infty$.
(b) To $\varepsilon = a$ there exists $n_0$ such that $n \ge n_0$ implies $|y_n - a| < a$. That is, $-a < y_n - a < a$, or $0 < y_n < 2a$, which proves the claim.

Lemma 2.5 Suppose that $p(t) = \sum_{k=0}^r a_k t^k$ and $q(t) = \sum_{k=0}^s b_k t^k$ are real polynomials with $a_r \ne 0$ and $b_s \ne 0$. Then
\[
\lim_{n\to\infty} \frac{p(n)}{q(n)} = \begin{cases}
0, & r < s,\\[2pt]
\dfrac{a_r}{b_s}, & r = s,\\[2pt]
+\infty, & r > s \text{ and } \dfrac{a_r}{b_s} > 0,\\[2pt]
-\infty, & r > s \text{ and } \dfrac{a_r}{b_s} < 0.
\end{cases}
\]
Proof. Note first that


\[
\frac{p(n)}{q(n)}
= \frac{n^r\left(a_r + a_{r-1}\frac1n + \cdots + a_0\frac{1}{n^r}\right)}{n^s\left(b_s + b_{s-1}\frac1n + \cdots + b_0\frac{1}{n^s}\right)}
= \frac{1}{n^{s-r}}\cdot\frac{a_r + a_{r-1}\frac1n + \cdots + a_0\frac{1}{n^r}}{b_s + b_{s-1}\frac1n + \cdots + b_0\frac{1}{n^s}}
=: \frac{1}{n^{s-r}}\, c_n.
\]
Suppose that $r = s$. By Proposition 2.3, $\frac{1}{n^k} \to 0$ for all $k \in \mathbb{N}$. By the same proposition, the limit of each summand in the numerator and the denominator is $0$, except for the first in each sum. Hence,
\[
\lim_{n\to\infty} \frac{p(n)}{q(n)}
= \frac{\lim_{n\to\infty}\left(a_r + a_{r-1}\frac1n + \cdots + a_0\frac{1}{n^r}\right)}{\lim_{n\to\infty}\left(b_s + b_{s-1}\frac1n + \cdots + b_0\frac{1}{n^s}\right)}
= \frac{a_r}{b_s}.
\]
Suppose now that $r < s$. As in the previous case, the sequence $(c_n)$ tends to $a_r/b_s$, but the first factor $\frac{1}{n^{s-r}}$ tends to $0$. Hence, the product sequence tends to $0$.
Suppose that $r > s$ and $\frac{a_r}{b_s} > 0$. The sequence $(c_n)$ has a positive limit. By Lemma 2.4 (b), almost all $c_n > 0$. Hence,
\[
d_n := \frac{q(n)}{p(n)} = \frac{1}{n^{r-s}}\cdot\frac{1}{c_n}
\]
tends to $0$ by the above part, and $d_n > 0$ for almost all $n$. By Lemma 2.4 (a), the sequence $\left(\frac{1}{d_n}\right) = \left(\frac{p(n)}{q(n)}\right)$ tends to $+\infty$ as $n \to \infty$, which proves the claim in the first case. The case $a_r/b_s < 0$ can be obtained by multiplying by $-1$ and noting that $\lim_{n\to\infty} x_n = +\infty$ implies $\lim_{n\to\infty} (-x_n) = -\infty$.
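All three degree cases of Lemma 2.5 can be observed numerically by evaluating $p(n)/q(n)$ at a large $n$. A sketch (not part of the notes; `rat` is an ad hoc helper taking ascending coefficient lists):

```python
def rat(p_coeffs, q_coeffs, n):
    # evaluate p(n)/q(n); coefficient lists are ascending: [a_0, ..., a_r]
    p = sum(a * n ** k for k, a in enumerate(p_coeffs))
    q = sum(b * n ** k for k, b in enumerate(q_coeffs))
    return p / q

n = 10 ** 6
# r < s: (2n + 1)/n^2 tends to 0
assert abs(rat([1, 2], [0, 0, 1], n)) < 1e-4
# r = s: (3n^2 + 13n)/(n^2 - 2) tends to a_r/b_s = 3 (Example 2.3 (b))
assert abs(rat([0, 13, 3], [-2, 0, 1], n) - 3.0) < 1e-4
# r > s with a_r/b_s > 0: n^2/(n + 1) grows without bound
assert rat([0, 0, 1], [1, 1], n) > 1e5
```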
In the German literature the next proposition is known as the "theorem of the two policemen."
Proposition 2.6 (Sandwich Theorem) Let $(a_n)$, $(b_n)$, and $(x_n)$ be real sequences with $a_n \le x_n \le b_n$ for all but finitely many $n \in \mathbb{N}$. Further let $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = x$. Then $(x_n)$ is also convergent, to $x$.
Proof. Let $\varepsilon > 0$. There exist $n_1, n_2, n_3 \in \mathbb{N}$ such that $n \ge n_1$ implies $a_n \in U_\varepsilon(x)$, $n \ge n_2$ implies $b_n \in U_\varepsilon(x)$, and $n \ge n_3$ implies $a_n \le x_n \le b_n$. Choosing $n_0 = \max\{n_1, n_2, n_3\}$, $n \ge n_0$ implies $x_n \in U_\varepsilon(x)$. Hence, $x_n \to x$.
Remark. (a) If two sequences $(a_n)$ and $(b_n)$ differ only in finitely many elements, then both sequences converge to the same limit or both diverge.
(b) Define the shifted sequence $b_n := a_{n+k}$, $n \in \mathbb{N}$, where $k$ is a fixed positive integer. Then both sequences converge to the same limit or both diverge.

2.1.2 Some special sequences
Proposition 2.7
(a) If $p > 0$, then $\lim_{n\to\infty} \frac{1}{n^p} = 0$.
(b) If $p > 0$, then $\lim_{n\to\infty} \sqrt[n]{p} = 1$.
(c) $\lim_{n\to\infty} \sqrt[n]{n} = 1$.
(d) If $a > 1$ and $\alpha \in \mathbb{R}$, then $\lim_{n\to\infty} \frac{n^\alpha}{a^n} = 0$.

Proof. (a) Let $\varepsilon > 0$. Take $n_0 > (1/\varepsilon)^{1/p}$ (note that the Archimedean property of the real numbers is used here). Then $n \ge n_0$ implies $1/n^p < \varepsilon$.
(b) If $p > 1$, put $x_n = \sqrt[n]{p} - 1$. Then $x_n > 0$, and by Bernoulli's inequality (that is, by homework 4.1) we have $\sqrt[n]{p} = (1 + (p - 1))^{\frac1n} \le 1 + \frac{p-1}{n}$; that is,
\[
0 < x_n \le \frac1n (p - 1).
\]
By Proposition 2.6, $x_n \to 0$. If $p = 1$, (b) is trivial, and if $0 < p < 1$ the result is obtained by taking reciprocals.

(c) Put $x_n = \sqrt[n]{n} - 1$. Then $x_n \ge 0$, and, by the binomial theorem,
\[
n = (1 + x_n)^n \ge \binom{n}{2} x_n^2 = \frac{n(n-1)}{2}\, x_n^2.
\]
Hence
\[
0 \le x_n \le \sqrt{\frac{2}{n-1}} \quad (n \ge 2).
\]
By (a), $\frac{1}{n-1} \to 0$. Applying the sandwich theorem again, $x_n \to 0$ and so $\sqrt[n]{n} \to 1$.
(d) Put $p = a - 1$; then $p > 0$. Let $k$ be an integer such that $k > \alpha$, $k > 0$. For $n > 2k$,
\[
(1 + p)^n > \binom{n}{k} p^k = \frac{n(n-1)\cdots(n-k+1)}{k!}\, p^k > \frac{n^k p^k}{2^k k!}.
\]
Hence,
\[
0 < \frac{n^\alpha}{a^n} = \frac{n^\alpha}{(1+p)^n} < \frac{2^k k!}{p^k}\, n^{\alpha - k} \quad (n > 2k).
\]
Since $\alpha - k < 0$, $n^{\alpha - k} \to 0$ by (a).


Q 1. Let $(x_n)$ be a convergent sequence, $x_n \to x$. Then the sequence of arithmetic means $s_n := \frac1n \sum_{k=1}^n x_k$ also converges to $x$.
Q 2. Let $(x_n)$ be a convergent sequence of positive numbers with $\lim x_n = x > 0$. Then $\sqrt[n]{x_1 x_2 \cdots x_n} \to x$. Hint: Consider $y_n = \log x_n$.

2.1.3 Monotonic Sequences
Definition 2.3 A real sequence $(x_n)$ is said to be
(a) monotonically increasing if $x_n \le x_{n+1}$ for all $n$;
(b) monotonically decreasing if $x_n \ge x_{n+1}$ for all $n$.
The class of monotonic sequences consists of the increasing and the decreasing sequences.
A sequence is said to be strictly monotonically increasing or decreasing if $x_n < x_{n+1}$ or $x_n > x_{n+1}$ for all $n$, respectively. We write $x_n \nearrow$ and $x_n \searrow$.
Proposition 2.8 A monotonic and bounded sequence is convergent. More precisely, if $(x_n)$ is increasing and bounded above, then $\lim x_n = \sup\{x_n\}$. If $(x_n)$ is decreasing and bounded below, then $\lim x_n = \inf\{x_n\}$.
Proof. Suppose $x_n \le x_{n+1}$ for all $n$ (the proof is analogous in the other case). Let $E := \{x_n \mid n \in \mathbb{N}\}$ and $x = \sup E$. Then $x_n \le x$ for all $n \in \mathbb{N}$. For every $\varepsilon > 0$ there is an integer $n_0 \in \mathbb{N}$ such that
\[
x - \varepsilon < x_{n_0} \le x,
\]
for otherwise $x - \varepsilon$ would be an upper bound of $E$. Since $(x_n)$ increases, $n \ge n_0$ implies
\[
x - \varepsilon < x_n \le x,
\]


which shows that (xn ) converges to x.
Example 2.4 Let $x_n = \frac{c^n}{n!}$ with some fixed $c > 0$. We will show that $x_n \to 0$ as $n \to \infty$. Writing $(x_n)$ recursively,
\[
x_{n+1} = \frac{c}{n+1}\, x_n, \tag{2.5}
\]
we observe that $(x_n)$ is strictly decreasing for $n \ge c$. Indeed, $n \ge c$ implies $x_{n+1} = \frac{c}{n+1}\, x_n < x_n$. On the other hand, $x_n > 0$ for all $n$, so that $(x_n)$ is bounded below by $0$. By Proposition 2.8, $(x_n)$ converges to some $x \in \mathbb{R}$. Taking the limit $n \to \infty$ in (2.5), we have
\[
x = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} \frac{c}{n+1}\, x_n = 0 \cdot x = 0.
\]
Hence, the sequence tends to $0$.
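The recursion (2.5) also gives a cheap way to watch this convergence numerically. A sketch (not part of the notes; $c = 10$ is an arbitrary sample value, chosen so the initial growth phase is visible):

```python
c = 10.0
x = 1.0                      # x_0 = c^0/0! = 1
vals = []
for n in range(1, 200):
    x *= c / n               # recursion (2.5): x_n = (c/n) * x_{n-1}
    vals.append(x)           # vals[n-1] == c^n / n!

# strictly decreasing once n >= c, and tending to 0
assert all(u > v for u, v in zip(vals[10:], vals[11:]))
assert vals[-1] < 1e-100
```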

2.1.4 Subsequences
Definition 2.4 Let $(x_n)$ be a sequence and $(n_k)_{k\in\mathbb{N}}$ a strictly increasing sequence of positive integers $n_k \in \mathbb{N}$. We call $(x_{n_k})_{k\in\mathbb{N}}$ a subsequence of $(x_n)_{n\in\mathbb{N}}$. If $(x_{n_k})$ converges, its limit is called a subsequential limit of $(x_n)$.
Example 2.5 (a) $x_n = 1/n$, $n_k = 2^k$. Then $(x_{n_k}) = (1/2, 1/4, 1/8, \dots)$.
(b) $(x_n) = (-1, 1, -1, 1, \dots)$. $(x_{2k}) = (1, 1, \dots)$ has the subsequential limit $1$; $(x_{2k+1}) = (-1, -1, -1, \dots)$ has the subsequential limit $-1$.
Proposition 2.9 Subsequences of convergent sequences are convergent, with the same limit.
Proof. Let $\lim x_n = x$ and let $(x_{n_k})$ be a subsequence. To $\varepsilon > 0$ there exists $m_0 \in \mathbb{N}$ such that $n \ge m_0$ implies $|x_n - x| < \varepsilon$. Since $n_m \ge m$ for all $m$, $m \ge m_0$ implies $|x_{n_m} - x| < \varepsilon$; hence $\lim x_{n_m} = x$.

Definition 2.5 Let $(x_n)$ be a sequence. We call $x \in \mathbb{R}$ a limit point of $(x_n)$ if every neighborhood of $x$ contains infinitely many elements of $(x_n)$.
Proposition 2.10 The point $x$ is a limit point of the sequence $(x_n)$ if and only if $x$ is a subsequential limit.
Proof. If $\lim_{k\to\infty} x_{n_k} = x$, then every neighborhood $U_\varepsilon(x)$ contains all but finitely many $x_{n_k}$; in particular, it contains infinitely many elements $x_n$. That is, $x$ is a limit point of $(x_n)$.
Suppose $x$ is a limit point of $(x_n)$. To $\varepsilon = 1$ there exists $x_{n_1} \in U_1(x)$. To $\varepsilon = 1/k$ there exists $n_k$ with $x_{n_k} \in U_{1/k}(x)$ and $n_k > n_{k-1}$. We have constructed a subsequence $(x_{n_k})$ of $(x_n)$ with
\[
|x - x_{n_k}| < \frac1k;
\]
hence, $(x_{n_k})$ converges to $x$.

Question: Which sequences have limit points? The answer is: every bounded sequence has limit points.
Proposition 2.11 (Principle of nested intervals) Let $I_n := [a_n, b_n]$ be a sequence of closed nested intervals, $I_{n+1} \subseteq I_n$, such that their lengths $b_n - a_n$ tend to $0$:
given $\varepsilon > 0$ there exists $n_0$ such that $0 \le b_n - a_n < \varepsilon$ for all $n \ge n_0$.
For any such interval sequence $\{I_n\}$ there exists a unique real number $x \in \mathbb{R}$ which is a member of all intervals, i.e. $\{x\} = \bigcap_{n\in\mathbb{N}} I_n$.
Proof. Since the intervals are nested, $(a_n)$ is an increasing sequence bounded above by each of the $b_k$, and $(b_n)$ is a decreasing sequence bounded below by each of the $a_k$. Consequently, by Proposition 2.8 we have
\[
x = \lim_{n\to\infty} a_n = \sup\{a_n\} \le b_m \quad\text{for all } m,
\qquad\text{and}\qquad
y = \lim_{m\to\infty} b_m = \inf\{b_m\} \ge x.
\]
Since $a_n \le x \le y \le b_n$ for all $n \in \mathbb{N}$,
\[
\emptyset \ne [x, y] \subseteq \bigcap_{n\in\mathbb{N}} I_n.
\]
We show the converse inclusion, namely that $\bigcap_{n\in\mathbb{N}} [a_n, b_n] \subseteq [x, y]$. Let $p \in I_n$ for all $n$, that is, $a_n \le p \le b_n$ for all $n \in \mathbb{N}$. Hence $\sup_n a_n \le p \le \inf_n b_n$; that is, $p \in [x, y]$. Thus, $[x, y] = \bigcap_{n\in\mathbb{N}} I_n$. We show uniqueness, that is, $x = y$. Given $\varepsilon > 0$ we find $n$ such that $y - x \le b_n - a_n < \varepsilon$. Hence $y - x \le 0$; therefore $x = y$, and the intersection contains a unique point $x$.

Proposition 2.12 (Bolzano–Weierstraß) A bounded real sequence has a limit point.
Proof. We use the principle of nested intervals. Let $(x_n)$ be bounded, say $|x_n| \le C$. Hence, the interval $[-C, C]$ contains infinitely many $x_k$. Consider the intervals $[-C, 0]$ and $[0, C]$. At least one of them contains infinitely many $x_k$, say $I_1 := [a_1, b_1]$. Suppose we have already constructed $I_n = [a_n, b_n]$ of length $b_n - a_n = C/2^{n-1}$ which contains infinitely many $x_k$. Consider the two intervals $[a_n, (a_n + b_n)/2]$ and $[(a_n + b_n)/2, b_n]$ of length $C/2^n$. At least one of them still contains infinitely many $x_k$, say $I_{n+1} := [a_{n+1}, b_{n+1}]$. In this way we have constructed a nested sequence of intervals whose lengths go to $0$. By Proposition 2.11, there exists a unique $x \in \bigcap_{n\in\mathbb{N}} I_n$. We will show that $x$ is a subsequential limit of $(x_n)$ (and hence a limit point). For this, choose $x_{n_k} \in I_k$; this is possible since $I_k$ contains infinitely many $x_m$. Then $a_k \le x_{n_k} \le b_k$ for all $k \in \mathbb{N}$. Proposition 2.6 gives $x = \lim a_k \le \lim x_{n_k} \le \lim b_k = x$; hence $\lim x_{n_k} = x$.
Remark. The principle of nested intervals is equivalent to the order completeness of $\mathbb{R}$.
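The bisection idea of this proof can be imitated on a finite sample of a bounded sequence. The sketch below (not part of the notes; all names are ad hoc, and "infinitely many" is replaced by a simple majority count on the sample) homes in on a limit point of $x_n = (-1)^n + 1/n$:

```python
def bisect_limit_point(xs, C, steps=40):
    # follow the proof on a finite sample: repeatedly keep a half-interval
    # that contains at least as many sample points as the other half
    lo, hi = -C, C
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = sum(1 for v in xs if lo <= v <= mid)
        right = sum(1 for v in xs if mid <= v <= hi)
        if left >= right:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# bounded sequence (-1)^n + 1/n: its limit points are -1 and 1
xs = [(-1) ** n + 1 / n for n in range(1, 10_000)]
p = bisect_limit_point(xs, C=2.0)
assert min(abs(p - 1.0), abs(p + 1.0)) < 1e-3
```

With a finite sample the returned point is only accurate up to the spacing of the sample near the cluster, hence the loose tolerance.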


Example 2.6 (a) $x_n = (-1)^{n-1} + \frac1n$; the set of limit points is $\{-1, 1\}$. First note that $-1 = \lim_{n\to\infty} x_{2n}$ and $1 = \lim_{n\to\infty} x_{2n+1}$ are subsequential limits of $(x_n)$. We show, for example, that $\frac13$ is not a limit point. Indeed, for $n \ge 4$, there exists a small neighborhood of $\frac13$ which has no intersection with $U_{\frac14}(1)$ and $U_{\frac14}(-1)$. Hence, $\frac13$ is not a limit point.
(b) $x_n = n - 5\left[\frac{n}{5}\right]$, where $[x]$ denotes the greatest integer less than or equal to $x$ ($[\pi] = [3] = 3$, $[2.8] = 2$, $[1/2] = 0$). $(x_n) = (1, 2, 3, 4, 0, 1, 2, 3, 4, 0, \dots)$; the set of limit points is $\{0, 1, 2, 3, 4\}$.
(c) One can enumerate the rational numbers in $(0, 1)$ in the following way:
\[
\underbrace{\tfrac12}_{x_1},\quad
\underbrace{\tfrac13,\ \tfrac23}_{x_2,\ x_3},\quad
\underbrace{\tfrac14,\ \tfrac24,\ \tfrac34}_{x_4,\ x_5,\ x_6},\quad \dots
\]
The set of limit points is the whole interval $[0, 1]$, since in any neighborhood of any real number there is a rational number, see Proposition 1.11 (b). Any rational number of $(0, 1)$ appears infinitely often in this sequence, namely as $\frac{p}{q} = \frac{2p}{2q} = \frac{3p}{3q} = \cdots$.
(d) $x_n = n$ has no limit point. Since $(x_n)$ is not bounded, Bolzano–Weierstraß fails to apply.
Definition 2.6 (a) Let $(x_n)$ be a bounded sequence and $A$ its set of limit points. Then $\sup A$ is called the upper limit of $(x_n)$ and $\inf A$ is called the lower limit of $(x_n)$. We write
\[
\overline{\lim_{n\to\infty}}\, x_n = \sup A \quad\text{and}\quad \underline{\lim_{n\to\infty}}\, x_n = \inf A
\]
for the upper and lower limits of $(x_n)$, respectively.
(b) If $(x_n)$ is not bounded above, we write $\overline{\lim}\, x_n = +\infty$. If moreover $+\infty$ is the only limit point, $\underline{\lim}\, x_n = +\infty$, and we can also write $\lim x_n = +\infty$. If $(x_n)$ is not bounded below, $\underline{\lim}\, x_n = -\infty$.
Proposition 2.13 Let $(x_n)$ be a bounded sequence and $A$ the set of limit points of $(x_n)$. Then $\overline{\lim}\, x_n$ and $\underline{\lim}\, x_n$ are also limit points of $(x_n)$.
Proof. Let $x = \overline{\lim}\, x_n$. Let $\varepsilon > 0$. By the definition of the supremum of $A$ there exists $x' \in A$ with
\[
x - \frac\varepsilon2 < x' \le x.
\]
Since $x'$ is a limit point, $U_{\frac\varepsilon2}(x')$ contains infinitely many elements $x_k$. By construction, $U_{\frac\varepsilon2}(x') \subseteq U_\varepsilon(x)$. Indeed, $x'' \in U_{\frac\varepsilon2}(x')$ implies $|x'' - x'| < \frac\varepsilon2$ and therefore
\[
|x'' - x| = |x'' - x' + x' - x| \le |x'' - x'| + |x' - x| < \frac\varepsilon2 + \frac\varepsilon2 = \varepsilon.
\]
Hence, $x$ is a limit point, too. The proof for $\underline{\lim}\, x_n$ is similar.


Proposition 2.14 Let $b \in \mathbb{R}$ be fixed. Suppose $(x_n)$ is a sequence which is bounded above; then
\[
x_n \le b \ \text{for all but finitely many } n \quad\text{implies}\quad \overline{\lim}\, x_n \le b. \tag{2.6}
\]
Similarly, if $(x_n)$ is bounded below, then
\[
x_n \ge b \ \text{for all but finitely many } n \quad\text{implies}\quad \underline{\lim}\, x_n \ge b. \tag{2.7}
\]
Proof. We prove only the first part, for $\overline{\lim}\, x_n$; proving the statement for $\underline{\lim}\, x_n$ is similar.
Let $t := \overline{\lim}\, x_n$. Suppose to the contrary that $t > b$. Set $\varepsilon = (t - b)/2$; then $U_\varepsilon(t)$ contains infinitely many $x_n$ ($t$ is a limit point) which are all greater than $b$. This contradicts $x_n \le b$ for all but finitely many $n$. Hence $\overline{\lim}\, x_n \le b$.
Applying the first part to $b = \sup_n\{x_n\}$ and noting that $\inf A \le \sup A$, we have
\[
\inf_n\{x_n\} \le \underline{\lim}\, x_n \le \overline{\lim}\, x_n \le \sup_n\{x_n\}.
\]

The next proposition is a converse statement to Proposition 2.9.
Proposition 2.15 Let $(x_n)$ be a bounded sequence with a unique limit point $x$. Then $(x_n)$ converges to $x$.
Proof. Suppose to the contrary that $(x_n)$ diverges; that is, there exists some $\varepsilon > 0$ such that infinitely many $x_n$ are outside $U_\varepsilon(x)$. We view these elements as a subsequence $(y_k) := (x_{n_k})$ of $(x_n)$. Since $(x_n)$ is bounded, so is $(y_k)$. By Proposition 2.12 there exists a limit point $y$ of $(y_k)$, which is in turn also a limit point of $(x_n)$. Since $y \notin U_\varepsilon(x)$, $y \ne x$ is a second limit point; a contradiction! We conclude that $(x_n)$ converges to $x$.
Note that $t := \overline{\lim}\, x_n$ is uniquely characterized by the following two properties: for every $\varepsilon > 0$,
\[
t - \varepsilon < x_n \quad\text{for infinitely many } n, \qquad
x_n < t + \varepsilon \quad\text{for almost all } n.
\]
(See also homework 6.2.) Let us consider the above examples.


Example 2.7
(a) $x_n = (-1)^n \frac{n-1}{n} + \frac{1}{n}$; $\ \liminf x_n = -1$, $\limsup x_n = 1$.
(b) $x_n = n - 5\left[\frac{n}{5}\right]$; $\ \liminf x_n = 0$, $\limsup x_n = 4$.
(c) $(x_n)$ is the sequence of rational numbers of $(0,1)$; $\ \liminf x_n = 0$, $\limsup x_n = 1$.
(d) $x_n = n$; $\ \liminf x_n = \limsup x_n = +\infty$.
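The finite upper and lower limits in (a) and (b) can be checked numerically. The following sketch is ours (not part of the original notes): the minimum and maximum of a far-out tail of the sequence approximate $\liminf$ and $\limsup$.

```python
# Numerical check of lim inf / lim sup for Examples 2.7 (a) and (b).

def x_a(n):
    # (a): x_n = (-1)^n (n-1)/n + 1/n
    return (-1) ** n * (n - 1) / n + 1 / n

def x_b(n):
    # (b): x_n = n - 5 * [n/5], the remainder of n modulo 5
    return n - 5 * (n // 5)

def tail_bounds(x, start=1001, length=1000):
    # min/max over x_start, ..., x_(start+length-1): for large `start`
    # these approximate lim inf and lim sup of the sequence.
    tail = [x(n) for n in range(start, start + length)]
    return min(tail), max(tail)

lo_a, hi_a = tail_bounds(x_a)   # close to (-1, 1)
lo_b, hi_b = tail_bounds(x_b)   # exactly (0, 4)
assert (lo_b, hi_b) == (0, 4)
assert abs(lo_a + 1) < 0.01 and abs(hi_a - 1) < 1e-9
```

For (b) the tail already attains the extreme values 0 and 4 exactly, since the sequence is periodic with period 5.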

Proposition 2.16 If $s_n \le t_n$ for all but finitely many $n$, then
$$\limsup_{n\to\infty} s_n \le \limsup_{n\to\infty} t_n, \qquad \liminf_{n\to\infty} s_n \le \liminf_{n\to\infty} t_n.$$

Proof. (a) We keep the notation $\overline{s}$ and $\overline{t}$ for the upper limits of $(s_n)$ and $(t_n)$, respectively. Let $\varepsilon > 0$. By Homework 6.3 (a),
$$\overline{s} - \varepsilon \le s_n \quad\text{for all but finitely many } n,$$
and hence, by assumption,
$$\overline{s} - \varepsilon \le s_n \le t_n \quad\text{for all but finitely many } n.$$
By Proposition 2.14 this implies $\overline{s} - \varepsilon \le \limsup t_n = \overline{t}$. By the first Remark in Subsection 1.1.7,
$$\sup\{\overline{s} - \varepsilon \mid \varepsilon > 0\} \le \overline{t}, \quad\text{that is,}\quad \overline{s} \le \overline{t}.$$
(b) The proof for the lower limit follows from (a) and $\sup E = -\inf(-E)$.

2.2 Cauchy Sequences


The aim of this section is to characterize convergent sequences without knowing their limits.
Definition 2.7 A sequence $(x_n)$ is said to be a Cauchy sequence if: for every $\varepsilon > 0$ there exists a positive integer $n_0$ such that $|x_n - x_m| < \varepsilon$ for all $m, n \ge n_0$.
The definition makes sense in arbitrary metric spaces. It is equivalent to
$$\forall\, \varepsilon > 0\ \ \exists\, n_0 \in \mathbb{N}\ \ \forall\, n \ge n_0\ \ \forall\, k \in \mathbb{N}: \quad |x_{n+k} - x_n| < \varepsilon.$$
Lemma 2.17 Every convergent sequence is a Cauchy sequence.
Proof. Let $x_n \to x$. To $\varepsilon > 0$ there is $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $x_n \in U_{\varepsilon/2}(x)$. By the triangle inequality, $m, n \ge n_0$ implies
$$|x_n - x_m| \le |x_n - x| + |x_m - x| \le \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
Hence, $(x_n)$ is a Cauchy sequence.

Proposition 2.18 (Cauchy convergence criterion) A real sequence is convergent if and only if it is a Cauchy sequence.

Proof. One direction is Lemma 2.17. We prove the other direction. Let $(x_n)$ be a Cauchy sequence. First we show that $(x_n)$ is bounded. To $\varepsilon = 1$ there is a positive integer $n_0$ such that $m, n \ge n_0$ implies $|x_m - x_n| < 1$. In particular, $|x_n - x_{n_0}| < 1$ for all $n \ge n_0$; hence $|x_n| < 1 + |x_{n_0}|$. Setting
$$C = \max\{|x_1|, |x_2|, \dots, |x_{n_0-1}|, |x_{n_0}| + 1\},$$
we have $|x_n| \le C$ for all $n$.

By Proposition 2.12 there exists a limit point $x$ of $(x_n)$, and by Proposition 2.10 a subsequence $(x_{n_k})$ converging to $x$. We will show that $\lim_{n\to\infty} x_n = x$. Let $\varepsilon > 0$. Since $x_{n_k} \to x$ we find $k_0 \in \mathbb{N}$ such that $k \ge k_0$ implies $|x_{n_k} - x| < \varepsilon/2$. Since $(x_n)$ is a Cauchy sequence, there exists $n_0 \in \mathbb{N}$ such that $m, n \ge n_0$ implies $|x_n - x_m| < \varepsilon/2$. Put $n_1 := \max\{n_0, n_{k_0}\}$ and choose $k_1$ with $n_{k_1} \ge n_1 \ge n_{k_0}$. Then $n \ge n_1$ implies
$$|x - x_n| \le |x - x_{n_{k_1}}| + |x_{n_{k_1}} - x_n| < 2 \cdot \varepsilon/2 = \varepsilon.$$

Example 2.8 (a) $x_n = \sum_{k=1}^n \frac{1}{k} = 1 + \frac12 + \frac13 + \cdots + \frac1n$. We show that $(x_n)$ is not a Cauchy sequence. For, consider
$$x_{2m} - x_m = \sum_{k=m+1}^{2m} \frac{1}{k} \ \ge\ \sum_{k=m+1}^{2m} \frac{1}{2m} = m \cdot \frac{1}{2m} = \frac12.$$
Hence, there is no $n_0$ such that $p, n \ge n_0$ implies $|x_p - x_n| < \frac12$.


(b) $x_n = \sum_{k=1}^n \frac{(-1)^{k+1}}{k} = 1 - \frac12 + \frac13 - \cdots + (-1)^{n+1}\frac1n$. Consider
$$x_{n+k} - x_n = (-1)^n\left(\frac{1}{n+1} - \frac{1}{n+2} + \frac{1}{n+3} - \cdots + (-1)^{k+1}\frac{1}{n+k}\right).$$
Grouping the summands in pairs gives
$$|x_{n+k} - x_n| = \left(\frac{1}{n+1} - \frac{1}{n+2}\right) + \left(\frac{1}{n+3} - \frac{1}{n+4}\right) + \cdots +
\begin{cases} \dfrac{1}{n+k-1} - \dfrac{1}{n+k}, & k \text{ even},\\[2mm] \dfrac{1}{n+k}, & k \text{ odd},\end{cases}$$
where all summands in parentheses are positive. Grouping instead as
$$|x_{n+k} - x_n| = \frac{1}{n+1} - \left(\frac{1}{n+2} - \frac{1}{n+3}\right) - \left(\frac{1}{n+4} - \frac{1}{n+5}\right) - \cdots,$$
again with positive parentheses, we conclude
$$|x_{n+k} - x_n| < \frac{1}{n+1}.$$
Hence, $(x_n)$ is a Cauchy sequence and converges.
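Both computations can be replayed numerically. The following sketch is ours (not from the notes): it checks the gap $x_{2m} - x_m \ge \tfrac12$ for the harmonic partial sums and the tail bound for the alternating partial sums.

```python
# Example 2.8 numerically: the harmonic partial sums violate the Cauchy
# condition (x_2m - x_m >= 1/2 for every m), while the alternating harmonic
# partial sums satisfy |x_(n+k) - x_n| <= 1/(n+1).

def harmonic(n):
    return sum(1 / k for k in range(1, n + 1))

def alt_harmonic(n):
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

m = 10_000
gap = harmonic(2 * m) - harmonic(m)          # is always >= 1/2
assert gap >= 0.5

n = 100
tails = [abs(alt_harmonic(n + k) - alt_harmonic(n)) for k in range(1, 200)]
# small float tolerance: for k = 1 the bound 1/(n+1) is attained exactly
assert max(tails) <= 1 / (n + 1) + 1e-12
```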


2.3 Series

Definition 2.8 Given a sequence $(a_n)$, we associate with $(a_n)$ the sequence $(s_n)$, where
$$s_n = \sum_{k=1}^n a_k = a_1 + a_2 + \cdots + a_n.$$
For $(s_n)$ we also use the symbol
$$\sum_{k=1}^\infty a_k, \tag{2.8}$$
and we call it an infinite series, or just a series. The numbers $s_n$ are called the partial sums of the series. If $(s_n)$ converges to $s$, we say that the series converges, and write
$$\sum_{k=1}^\infty a_k = s.$$
The number $s$ is called the sum of the series.


Remarks 2.1 (a) The sum of a series should be clearly understood as the limit of the sequence of partial sums; it is not simply obtained by addition.
(b) If $(s_n)$ diverges, the series is said to be divergent.
(c) The symbol $\sum_{k=1}^\infty a_k$ means both the sequence of partial sums and the limit of this sequence (if it exists). Sometimes we use series of the form $\sum_{k=k_0}^\infty a_k$, $k_0 \in \mathbb{N}$. We simply write $\sum a_k$ if there is no ambiguity about the bounds of the index $k$.
Example 2.9 (Example 2.8 continued)
(1) $\displaystyle\sum_{n=1}^\infty \frac{1}{n}$ is divergent. This is the harmonic series.
(2) $\displaystyle\sum_{n=1}^\infty (-1)^{n+1}\frac{1}{n}$ is convergent. It is an example of an alternating series (the summands change their signs, and the absolute values of the summands form a sequence decreasing to 0).
(3) $\displaystyle\sum_{n=0}^\infty q^n$ is called the geometric series. It is convergent for $|q| < 1$ with $\sum_{n=0}^\infty q^n = \frac{1}{1-q}$. This is seen from
$$\sum_{k=0}^n q^k = \frac{1 - q^{n+1}}{1-q},$$
see the proof of Lemma 1.14, first formula with $y = 1$, $x = q$. The series diverges for $|q| \ge 1$. The general formula in case $|q| < 1$ is
$$\sum_{n=n_0}^\infty c\,q^n = \frac{c\,q^{n_0}}{1-q}. \tag{2.9}$$
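The finite and infinite geometric-series formulas can be verified directly. This is a small numerical sketch added here (the variable names are ours):

```python
# Check the geometric-series formulas from Example 2.9 (3):
#   sum_{k=0}^{n} q^k = (1 - q^(n+1)) / (1 - q)         for q != 1
#   sum_{n=n0}^{inf} c q^n = c q^n0 / (1 - q)           for |q| < 1

q, c, n0 = 0.5, 3.0, 2

partial = sum(q ** k for k in range(0, 11))
closed = (1 - q ** 11) / (1 - q)
assert abs(partial - closed) < 1e-12

# approximate the infinite tail by a long partial sum
tail = sum(c * q ** n for n in range(n0, 200))
assert abs(tail - c * q ** n0 / (1 - q)) < 1e-12   # = 3 * 0.25 / 0.5 = 1.5
```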

2.3.1 Properties of Convergent Series

Lemma 2.19 (1) If $\sum_{n=1}^\infty a_n$ is convergent, then $\sum_{k=m}^\infty a_k$ is convergent for any $m \in \mathbb{N}$.
(2) If $\sum a_n$ is convergent, then the sequence $r_n := \sum_{k=n+1}^\infty a_k$ tends to 0 as $n \to \infty$.
(3) If $(a_n)$ is a sequence of nonnegative real numbers, then $\sum a_n$ converges if and only if the partial sums are bounded.

Proof. (1) Suppose that $\sum_{n=1}^\infty a_n = s$; we show that $\sum_{n=m}^\infty a_n = s - (a_1 + a_2 + \cdots + a_{m-1})$. Indeed, let $(s_n)$ and $(t_n)$ denote the $n$th partial sums of $\sum_{k=1}^\infty a_k$ and $\sum_{k=m}^\infty a_k$, respectively. Then for $n > m$ one has $t_n = s_n - \sum_{k=1}^{m-1} a_k$. Taking the limit $n \to \infty$ proves the claim.

(2) Suppose that $\sum_{n=1}^\infty a_n$ converges to $s$. By (1), $r_n = \sum_{k=n+1}^\infty a_k$ is also a convergent series for every $n$. We have
$$\sum_{k=1}^\infty a_k = \sum_{k=1}^n a_k + \sum_{k=n+1}^\infty a_k \ \Longrightarrow\ s = s_n + r_n \ \Longrightarrow\ r_n = s - s_n \ \Longrightarrow\ \lim_{n\to\infty} r_n = s - s = 0.$$

(3) Suppose $a_n \ge 0$; then $s_{n+1} = s_n + a_{n+1} \ge s_n$. Hence, $(s_n)$ is an increasing sequence, and if it is bounded it converges by Proposition 2.8.
The other direction is trivial since every convergent sequence is bounded.

Proposition 2.20 (Cauchy criterion) $\sum a_n$ converges if and only if for every $\varepsilon > 0$ there is an integer $n_0 \in \mathbb{N}$ such that
$$\left|\sum_{k=m}^n a_k\right| < \varepsilon \tag{2.10}$$
whenever $n \ge m \ge n_0$.

Proof. Clear from Proposition 2.18. Consider the sequence of partial sums $s_n = \sum_{k=1}^n a_k$ and note that for $n \ge m$ one has $|s_n - s_{m-1}| = \left|\sum_{k=m}^n a_k\right|$.

Corollary 2.21 If $\sum a_n$ converges, then $(a_n)$ converges to 0.

Proof. Take $m = n$ in (2.10); this yields $|a_n| < \varepsilon$. Hence $(a_n)$ tends to 0.

Proposition 2.22 (Comparison test) (a) If $|a_n| \le C b_n$ for some $C > 0$ and for almost all $n \in \mathbb{N}$, and if $\sum b_n$ converges, then $\sum a_n$ converges.
(b) If $a_n \ge C d_n \ge 0$ for some $C > 0$ and for almost all $n$, and if $\sum d_n$ diverges, then $\sum a_n$ diverges.

Proof. (a) Suppose $n \ge n_1$ implies $|a_n| \le C b_n$. Given $\varepsilon > 0$, there exists $n_0 \ge n_1$ such that $n \ge m \ge n_0$ implies
$$\sum_{k=m}^n b_k < \frac{\varepsilon}{C}$$
by the Cauchy criterion. Hence
$$\left|\sum_{k=m}^n a_k\right| \le \sum_{k=m}^n |a_k| \le \sum_{k=m}^n C b_k < \varepsilon,$$
and (a) follows by the Cauchy criterion.

(b) follows from (a), for if $\sum a_n$ converged, so would $\sum d_n$.

2.3.2 Operations with Convergent Series

Definition 2.9 If $\sum a_n$ and $\sum b_n$ are series, we define sums and differences as follows:
$$\sum a_n \pm \sum b_n := \sum (a_n \pm b_n), \qquad c \sum a_n := \sum c\,a_n, \quad c \in \mathbb{R}.$$
Let $c_n := \sum_{k=1}^n a_k b_{n-k+1}$; then $\sum c_n$ is called the Cauchy product of $\sum a_n$ and $\sum b_n$.

If $\sum a_n$ and $\sum b_n$ are convergent, it is easy to see that $\sum_{n=1}^\infty (a_n + b_n) = \sum_{n=1}^\infty a_n + \sum_{n=1}^\infty b_n$ and $\sum_{n=1}^\infty c\,a_n = c \sum_{n=1}^\infty a_n$.

Caution: the product series $\sum c_n$ need not be convergent. Indeed, let $a_n := b_n := (-1)^n/\sqrt{n}$. One can show that $\sum a_n$ and $\sum b_n$ are convergent (see Proposition 2.29 below); however, $\sum c_n$ with $c_n = \sum_{k=1}^n a_k b_{n-k+1}$ is not convergent. Proof: By the arithmetic-geometric mean inequality,
$$|a_k b_{n-k+1}| = \frac{1}{\sqrt{k(n+1-k)}} \ge \frac{2}{n+1}.$$
Hence, $|c_n| \ge \sum_{k=1}^n \frac{2}{n+1} = \frac{2n}{n+1}$. Since $c_n$ does not converge to 0 as $n \to \infty$, $\sum_{n=0}^\infty c_n$ diverges by Corollary 2.21.

2.3.3 Series of Nonnegative Numbers

Proposition 2.23 (Compression Theorem) Suppose $a_1 \ge a_2 \ge \cdots \ge 0$. Then the series $\sum_{n=1}^\infty a_n$ converges if and only if the series
$$\sum_{k=0}^\infty 2^k a_{2^k} = a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots \tag{2.11}$$
converges.

Proof. By Lemma 2.19 (3) it suffices to consider boundedness of the partial sums. Let
$$s_n = a_1 + \cdots + a_n, \qquad t_k = a_1 + 2a_2 + \cdots + 2^k a_{2^k}.$$
For $n < 2^k$,
$$s_n \le a_1 + (a_2 + a_3) + \cdots + (a_{2^k} + \cdots + a_{2^{k+1}-1}) \le a_1 + 2a_2 + \cdots + 2^k a_{2^k} = t_k. \tag{2.12}$$
On the other hand, if $n > 2^k$,
$$s_n \ge a_1 + a_2 + (a_3 + a_4) + \cdots + (a_{2^{k-1}+1} + \cdots + a_{2^k}) \ge \tfrac12 a_1 + a_2 + 2a_4 + \cdots + 2^{k-1} a_{2^k} = \tfrac12 t_k. \tag{2.13}$$
By (2.12) and (2.13), the sequences $(s_n)$ and $(t_k)$ are either both bounded or both unbounded. This completes the proof.

Example 2.10 (a) $\displaystyle\sum_{n=1}^\infty \frac{1}{n^p}$ converges if $p > 1$ and diverges if $p \le 1$.
If $p \le 0$, divergence follows from Corollary 2.21. If $p > 0$, Proposition 2.23 is applicable, and we are led to the series
$$\sum_{k=0}^\infty 2^k \frac{1}{2^{kp}} = \sum_{k=0}^\infty \left(\frac{1}{2^{p-1}}\right)^k.$$
This is a geometric series with $q = \frac{1}{2^{p-1}}$. It converges if and only if $2^{p-1} > 1$, that is, if and only if $p > 1$.

(b) If $p > 1$,
$$\sum_{n=2}^\infty \frac{1}{n(\log n)^p} \tag{2.14}$$
converges; if $p \le 1$, the series diverges. Here $\log n$ denotes the logarithm to the base $e$.
If $p \le 0$, then $\frac{1}{n(\log n)^p} \ge \frac{1}{n}$ for $n \ge 3$, and divergence follows by comparison with the harmonic series. Now let $p > 0$. By Lemma 1.23 (b), $\log n < \log(n+1)$. Hence $(n(\log n)^p)$ increases and $(1/(n(\log n)^p))$ decreases; we can apply Proposition 2.23 to (2.14). This leads us to the series
$$\sum_{k=1}^\infty 2^k \frac{1}{2^k (\log 2^k)^p} = \sum_{k=1}^\infty \frac{1}{(k \log 2)^p} = \frac{1}{(\log 2)^p} \sum_{k=1}^\infty \frac{1}{k^p},$$
and the assertion follows from example (a).

This procedure can evidently be continued. For instance,
$$\sum_{n=3}^\infty \frac{1}{n \log n \,\log\log n} \ \text{ diverges, whereas }\ \sum_{n=3}^\infty \frac{1}{n \log n \,(\log\log n)^2} \ \text{ converges.}$$
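The compression step in (a) can be checked numerically for $p = 2$. This sketch is ours, not part of the notes: the compressed series is the geometric series with ratio $2^{1-p} = \tfrac12$, which sums to 2 and dominates the partial sums of $\sum 1/n^2$.

```python
# Example 2.10 (a) for p = 2: the compressed series sum_k 2^k * a_(2^k)
# with a_n = 1/n^p is geometric with ratio 2^(1-p) = 1/2, hence sums to 2;
# by estimate (2.12) it bounds the partial sums of 1/n^2.

p = 2

compressed = sum(2 ** k * (1 / (2 ** k) ** p) for k in range(0, 60))
assert abs(compressed - 2.0) < 1e-12

partial = sum(1 / n ** p for n in range(1, 100_000))
assert partial < compressed   # s_n <= t_k, as in (2.12)
```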


2.3.4 The Number e

Leonhard Euler (born 1707 in Basel, died 1783 in St. Petersburg) was one of the greatest mathematicians. He made contributions to number theory, ordinary differential equations, the calculus of variations, astronomy, and mechanics. Fermat (1635) claimed that all numbers of the form $f_n = 2^{2^n} + 1$, $n \in \mathbb{N}$, are prime numbers. This is true for the first five numbers (3, 5, 17, 257, 65537). Euler showed that $641 \mid 2^{32} + 1$. In fact, it is open whether any other $f_n$ is a prime number. Euler showed that the equation $x^3 + y^3 = z^3$ has no solution in positive integers $x, y, z$; this is a special case of Fermat's last theorem. It is known that the limit
$$\gamma = \lim_{n\to\infty}\left(1 + \frac12 + \frac13 + \cdots + \frac1n - \log n\right)$$
exists and is a finite number, the so-called Euler constant. It is not known whether $\gamma$ is rational or not. Further, the Euler numbers $E_r$ play a role in calculating the series $\sum_n (-1)^n \frac{1}{(2n+1)^r}$. Soon we will speak about the Euler formula $e^{ix} = \cos x + i \sin x$. More about the life and work of famous mathematicians can be found at www-history.mcs.st-andrews.ac.uk/
We define
$$e := \sum_{n=0}^\infty \frac{1}{n!}, \tag{2.15}$$
where $0! = 1! = 1$ by definition. Since
$$s_n = 1 + 1 + \frac{1}{1\cdot 2} + \frac{1}{1\cdot 2\cdot 3} + \cdots + \frac{1}{1\cdot 2\cdots n} < 1 + 1 + \frac12 + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} < 3,$$
the series converges (by comparing it with the geometric series with $q = \frac12$), and the definition makes sense. In fact, the series converges very rapidly and allows us to compute $e$ with great accuracy. It is of interest to note that $e$ can also be defined by means of another limit process. $e$ is called the Euler number.

Proposition 2.24
$$e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n. \tag{2.16}$$

Proof. Let
$$s_n = \sum_{k=0}^n \frac{1}{k!}, \qquad t_n = \left(1 + \frac{1}{n}\right)^n.$$
By the binomial theorem,
$$t_n = 1 + n\cdot\frac{1}{n} + \frac{n(n-1)}{2!}\,\frac{1}{n^2} + \frac{n(n-1)(n-2)}{3!}\,\frac{1}{n^3} + \cdots + \frac{n(n-1)\cdots 1}{n!}\,\frac{1}{n^n}$$
$$= 1 + 1 + \frac{1}{2!}\left(1 - \frac1n\right) + \frac{1}{3!}\left(1 - \frac1n\right)\left(1 - \frac2n\right) + \cdots + \frac{1}{n!}\left(1 - \frac1n\right)\left(1 - \frac2n\right)\cdots\left(1 - \frac{n-1}{n}\right)$$
$$\le 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}.$$
Hence, $t_n \le s_n$, so that by Proposition 2.16
$$\limsup_{n\to\infty} t_n \le \limsup_{n\to\infty} s_n = \lim_{n\to\infty} s_n = e. \tag{2.17}$$
Next, if $n \ge m$,
$$t_n \ge 1 + 1 + \frac{1}{2!}\left(1 - \frac1n\right) + \cdots + \frac{1}{m!}\left(1 - \frac1n\right)\cdots\left(1 - \frac{m-1}{n}\right).$$
Let $n \to \infty$, keeping $m$ fixed; again by Proposition 2.16 we get
$$\liminf_{n\to\infty} t_n \ge 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{m!} = s_m.$$
Letting $m \to \infty$, we finally get
$$e \le \liminf_{n\to\infty} t_n. \tag{2.18}$$
The proposition follows from (2.17) and (2.18).


The rapidity with which the series $\sum 1/n!$ converges can be estimated as follows:
$$e - s_n = \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \cdots < \frac{1}{(n+1)!}\left(1 + \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots\right) = \frac{1}{(n+1)!}\cdot\frac{1}{1 - \frac{1}{n+1}} = \frac{1}{n!\,n},$$
so that
$$0 < e - s_n < \frac{1}{n!\,n}. \tag{2.19}$$
We use the preceding inequality to compute $e$. For $n = 9$ we find
$$s_9 = 1 + 1 + \frac12 + \frac16 + \frac{1}{24} + \frac{1}{120} + \frac{1}{720} + \frac{1}{5040} + \frac{1}{40320} + \frac{1}{362880} = 2.718281526\ldots \tag{2.20}$$
By (2.19),
$$e - s_9 < \frac{3.1}{10^7},$$
so that the first six digits of $e$ in (2.20) are correct.
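The computation of $s_9$ and the error bound (2.19) are easy to reproduce; the following sketch is ours (not part of the notes):

```python
# Computing e from (2.15) with the error bound (2.19): 0 < e - s_n < 1/(n! n).
import math

def s(n):
    # partial sum s_n = sum_{k=0}^{n} 1/k!
    return sum(1 / math.factorial(k) for k in range(0, n + 1))

s9 = s(9)
bound = 1 / (math.factorial(9) * 9)      # = 1/3265920 < 3.1e-7, as in the text

assert 0 < math.e - s9 < bound
assert abs(s9 - 2.718281526) < 1e-9      # the value given in (2.20)
print(f"{s9:.9f}")                        # -> 2.718281526
```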

Example 2.11
(a)
$$\lim_{n\to\infty}\left(1 - \frac1n\right)^n = \lim_{n\to\infty}\left(\frac{n-1}{n}\right)^n = \lim_{n\to\infty}\frac{1}{\left(1 + \frac{1}{n-1}\right)^n} = \lim_{n\to\infty}\frac{1}{\left(1 + \frac{1}{n-1}\right)^{n-1}\left(1 + \frac{1}{n-1}\right)} = \frac{1}{e}.$$
(b)
$$\lim_{n\to\infty}\left(\frac{3n+1}{3n-1}\right)^{4n} = \lim_{n\to\infty}\left(\frac{3n+1}{3n}\right)^{4n}\cdot\lim_{n\to\infty}\left(\frac{3n}{3n-1}\right)^{4n} = \lim_{n\to\infty}\left(\left(1 + \frac{1}{3n}\right)^{3n}\right)^{4/3}\cdot\lim_{n\to\infty}\left(\left(1 + \frac{1}{3n-1}\right)^{3n}\right)^{4/3} = e^{4/3}\cdot e^{4/3} = e^{8/3}.$$


Proposition 2.25 $e$ is irrational.

Proof. Suppose $e$ is rational, say $e = p/q$ with positive integers $p$ and $q$. By (2.19),
$$0 < q!\,(e - s_q) < \frac{1}{q}. \tag{2.21}$$
By our assumption, $q!\,e$ is an integer. Since
$$q!\,s_q = q!\left(1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{q!}\right)$$
is also an integer, we see that $q!\,(e - s_q)$ is an integer. Since $q \ge 1$, (2.21) implies the existence of an integer between 0 and 1, which is absurd.

2.3.5 The Root and the Ratio Tests

Theorem 2.26 (Root Test) Given $\sum a_n$, put $\alpha = \limsup\limits_{n\to\infty} \sqrt[n]{|a_n|}$. Then
(a) if $\alpha < 1$, $\sum a_n$ converges;
(b) if $\alpha > 1$, $\sum a_n$ diverges;
(c) if $\alpha = 1$, the test gives no information.

Proof. (a) If $\alpha < 1$, choose $\beta$ such that $\alpha < \beta < 1$ and an integer $n_0$ such that
$$\sqrt[n]{|a_n|} < \beta$$
for $n \ge n_0$ (such an $n_0$ exists since $\alpha$ is the supremum of the limit set of $(\sqrt[n]{|a_n|})$). That is, $n \ge n_0$ implies $|a_n| < \beta^n$. Since $0 < \beta < 1$, $\sum \beta^n$ converges. Convergence of $\sum a_n$ now follows from the comparison test.
(b) If $\alpha > 1$ there is a subsequence $(a_{n_k})$ such that $\sqrt[n_k]{|a_{n_k}|} \to \alpha$. Hence $|a_n| > 1$ for infinitely many $n$, so that the necessary condition for convergence, $a_n \to 0$, fails.
(c) To prove (c), consider the series $\sum \frac1n$ and $\sum \frac{1}{n^2}$. For each of these series $\alpha = 1$, but the first diverges while the second converges.

Remark. (a) $\sum a_n$ converges if there exists $q < 1$ such that $\sqrt[n]{|a_n|} \le q$ for almost all $n$.
(b) $\sum a_n$ diverges if $\sqrt[n]{|a_n|} \ge 1$ for infinitely many $n$.
Theorem 2.27 (Ratio Test) The series $\sum a_n$
(a) converges if $\limsup\limits_{n\to\infty} \left|\dfrac{a_{n+1}}{a_n}\right| < 1$,
(b) diverges if $\left|\dfrac{a_{n+1}}{a_n}\right| \ge 1$ for all but finitely many $n$.

In place of (b) one can also use the (weaker) statement:
(b') $\sum a_n$ diverges if $\liminf\limits_{n\to\infty} \left|\dfrac{a_{n+1}}{a_n}\right| > 1$.
Indeed, if (b') is satisfied, then almost all elements of the sequence $\left(\left|\frac{a_{n+1}}{a_n}\right|\right)$ are $\ge 1$.

Corollary 2.28 The series $\sum a_n$
(a) converges if $\lim\limits_{n\to\infty} \left|\dfrac{a_{n+1}}{a_n}\right| < 1$,
(b) diverges if $\lim\limits_{n\to\infty} \left|\dfrac{a_{n+1}}{a_n}\right| > 1$.

Proof of Theorem 2.27. If condition (a) holds, we can find $\beta < 1$ and an integer $m$ such that $n \ge m$ implies
$$\left|\frac{a_{n+1}}{a_n}\right| < \beta.$$
In particular,
$$|a_{m+1}| < \beta |a_m|, \qquad |a_{m+2}| < \beta |a_{m+1}| < \beta^2 |a_m|, \qquad \dots, \qquad |a_{m+p}| < \beta^p |a_m|.$$
That is,
$$|a_n| < \frac{|a_m|}{\beta^m}\,\beta^n$$
for $n \ge m$, and (a) follows from the comparison test, since $\sum \beta^n$ converges. If $|a_{n+1}| \ge |a_n|$ for $n \ge n_0$, it is seen that the condition $a_n \to 0$ does not hold, and (b) follows.

Remark 2.2 Homework 7.5 shows that in (b) "all but finitely many" cannot be replaced by the weaker assumption "infinitely many".
Example 2.12 (a) The series $\sum_{n=0}^\infty n^2/2^n$ converges since, for $n \ge 3$,
$$\left|\frac{a_{n+1}}{a_n}\right| = \frac{(n+1)^2}{2^{n+1}}\cdot\frac{2^n}{n^2} = \frac12\left(1 + \frac1n\right)^2 \le \frac12\left(1 + \frac13\right)^2 = \frac{8}{9} < 1.$$

(b) Consider the series
$$\frac12 + 1 + \frac18 + \frac14 + \frac{1}{32} + \frac{1}{16} + \frac{1}{128} + \frac{1}{64} + \cdots = \frac{1}{2^1} + \frac{1}{2^0} + \frac{1}{2^3} + \frac{1}{2^2} + \frac{1}{2^5} + \frac{1}{2^4} + \frac{1}{2^7} + \frac{1}{2^6} + \cdots,$$
where $\liminf\limits_{n\to\infty} \dfrac{a_{n+1}}{a_n} = \dfrac18$, $\limsup\limits_{n\to\infty} \dfrac{a_{n+1}}{a_n} = 2$, but $\lim\limits_{n\to\infty} \sqrt[n]{a_n} = \dfrac12$. Indeed, $a_{2n} = 1/2^{2n-2}$ and $a_{2n+1} = 1/2^{2n+1}$ yield
$$\frac{a_{2n+1}}{a_{2n}} = \frac18, \qquad \frac{a_{2n}}{a_{2n-1}} = 2.$$
The root test indicates convergence; the ratio test does not apply.
(c) For $\sum \frac1n$ and $\sum \frac{1}{n^2}$ both the ratio and the root test do not apply, since both $(a_{n+1}/a_n)$ and $(\sqrt[n]{a_n})$ converge to 1.
The ratio test is frequently easier to apply than the root test. However, the root test has wider scope.
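Example 2.12 (b) can be replayed numerically; the following sketch is ours (not part of the notes) and shows the ratio oscillating between $\tfrac18$ and $2$ while the $n$th root settles near $\tfrac12$.

```python
# Example 2.12 (b): interleaved series 1/2, 1, 1/8, 1/4, 1/32, 1/16, ...
# The consecutive ratios take only the values 1/8 and 2 (ratio test fails),
# while a_n^(1/n) tends to 1/2 (root test gives convergence).

def a(n):  # n = 1, 2, 3, ...
    if n % 2 == 0:
        return 1 / 2 ** (n - 2)   # a_(2m)   = 1/2^(2m-2)
    return 1 / 2 ** n             # a_(2m+1) = 1/2^(2m+1)

ratios = {round(a(n + 1) / a(n), 6) for n in range(2, 50)}
assert ratios == {0.125, 2.0}

root = a(1000) ** (1 / 1000)      # close to 1/2
assert abs(root - 0.5) < 0.01
```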
Remark 2.3 For any sequence $(c_n)$ of positive real numbers,
$$\liminf_{n\to\infty} \frac{c_{n+1}}{c_n} \le \liminf_{n\to\infty} \sqrt[n]{c_n} \le \limsup_{n\to\infty} \sqrt[n]{c_n} \le \limsup_{n\to\infty} \frac{c_{n+1}}{c_n}.$$
For the proof, see [Rud76, 3.37 Theorem]. In particular, if $\lim\limits_{n\to\infty} \frac{c_{n+1}}{c_n}$ exists, then $\lim\limits_{n\to\infty} \sqrt[n]{c_n}$ also exists and both limits coincide.

Proposition 2.29 (Leibniz criterion) Let $\sum b_n$ be an alternating series, that is, $\sum b_n = \sum (-1)^{n+1} a_n$ with a decreasing sequence of positive numbers $a_1 \ge a_2 \ge \cdots \ge 0$. If $\lim a_n = 0$, then $\sum b_n$ converges.

Proof. The proof is quite the same as in Example 2.8 (b). For the partial sums $s_n$ of $\sum b_n$ we find
$$|s_n - s_m| \le a_{m+1}$$
if $n \ge m$. Since $(a_n)$ tends to 0, the Cauchy criterion applies to $(s_n)$. Hence, $\sum b_n$ is convergent.
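The tail bound $|s - s_m| \le a_{m+1}$ from the proof gives a practical error estimate; this numerical sketch (ours, not from the notes) checks it for $a_n = 1/n$, whose alternating sum is $\log 2$.

```python
# Leibniz criterion in action: for sum (-1)^(n+1) a_n with a_n = 1/n
# the proof yields |s - s_m| <= a_(m+1), where s = log 2.
import math

def s(m):
    return sum((-1) ** (n + 1) / n for n in range(1, m + 1))

for m in (10, 100, 1000):
    assert abs(math.log(2) - s(m)) <= 1 / (m + 1)
```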

2.3.6 Absolute Convergence

The series $\sum a_n$ is said to converge absolutely if the series $\sum |a_n|$ converges.

Proposition 2.30 If $\sum a_n$ converges absolutely, then $\sum a_n$ converges.

Proof. The assertion follows from the inequality
$$\left|\sum_{k=m}^n a_k\right| \le \sum_{k=m}^n |a_k|$$
plus the Cauchy criterion.

Remarks 2.4 For series with positive terms, absolute convergence is the same as convergence. If $\sum a_n$ converges but $\sum |a_n|$ diverges, we say that $\sum a_n$ converges nonabsolutely. For instance, $\sum (-1)^{n+1}/n$ converges nonabsolutely. The comparison test, as well as the root and ratio tests, is really a test for absolute convergence and therefore cannot give any information about nonabsolutely convergent series.
We shall see that we may operate with absolutely convergent series very much as with finite sums: we may multiply them, and we may change the order in which the additions are carried out without affecting the sum of the series. For nonabsolutely convergent series this is no longer true, and more care has to be taken when dealing with them.
Without proof we mention the fact that one can multiply absolutely convergent series; for the proof, see [Rud76, Theorem 3.50].
Proposition 2.31 If $\sum a_n$ converges absolutely, $\sum b_n$ converges, and
$$\sum_{n=0}^\infty a_n = A, \qquad \sum_{n=0}^\infty b_n = B, \qquad c_n = \sum_{k=0}^n a_k b_{n-k}, \quad n \in \mathbb{Z}_+,$$
then
$$\sum_{n=0}^\infty c_n = AB.$$

2.3.7 Decimal Expansion of Real Numbers

Proposition 2.32 (a) Let $\alpha$ be a real number with $0 \le \alpha < 1$. Then there exists a sequence $(a_n)$, $a_n \in \{0, 1, 2, \dots, 9\}$, such that
$$\alpha = \sum_{n=1}^\infty a_n 10^{-n}. \tag{2.22}$$
The sequence $(a_n)$ is called a decimal expansion of $\alpha$.
(b) Given a sequence $(a_k)$, $a_k \in \{0, 1, \dots, 9\}$, there exists a real number $\alpha \in [0,1]$ such that
$$\alpha = \sum_{n=1}^\infty a_n 10^{-n}.$$

Proof. (b) Comparison with the geometric series yields
$$\sum_{n=1}^\infty a_n 10^{-n} \le 9 \sum_{n=1}^\infty 10^{-n} = \frac{9}{10}\cdot\frac{1}{1 - 1/10} = 1.$$
Hence the series $\sum_{n=1}^\infty a_n 10^{-n}$ converges to some $\alpha \in [0,1]$.

(a) Given $\alpha \in [0,1)$ we use induction to construct a sequence $(a_n)$ with (2.22) and
$$s_n \le \alpha < s_n + 10^{-n}, \quad\text{where}\quad s_n = \sum_{k=1}^n a_k 10^{-k}.$$
First, cut $[0,1)$ into 10 pieces $I_j := [j/10, (j+1)/10)$, $j = 0, \dots, 9$, of equal length. If $\alpha \in I_j$, put $a_1 := j$. Then
$$s_1 = \frac{a_1}{10} \le \alpha < s_1 + \frac{1}{10}.$$
Suppose $a_1, \dots, a_n$ are already constructed and
$$s_n \le \alpha < s_n + 10^{-n}.$$
Consider the intervals $I_j := [s_n + j/10^{n+1},\, s_n + (j+1)/10^{n+1})$, $j = 0, \dots, 9$. There is exactly one $j$ such that $\alpha \in I_j$. Put $a_{n+1} := j$; then
$$s_{n+1} = s_n + \frac{a_{n+1}}{10^{n+1}} \le \alpha < s_n + \frac{a_{n+1}+1}{10^{n+1}} = s_{n+1} + 10^{-(n+1)}.$$
The induction step is complete. By construction, $|\alpha - s_n| < 10^{-n}$, that is, $\lim s_n = \alpha$.

Remarks 2.5 (a) The proof shows that any real number $\alpha \in [0,1)$ can be approximated by rational numbers.
(b) The construction avoids decimal expansions of the form $\alpha = \dots a\,9\,9\,9\,9\dots$, $a < 9$, and gives instead $\alpha = \dots (a+1)\,0\,0\,0\dots$. It gives a bijective correspondence between the real numbers of the interval $[0,1)$ and the sequences $(a_n)$, $a_n \in \{0, 1, \dots, 9\}$, not ending with nines. However, the sequence $(a_n) = (0, 1, 9, 9, \dots)$ corresponds to the real number $0.02$.
(c) It is not difficult to see that $\alpha \in [0,1)$ is rational if and only if there exist positive integers $n_0$ and $p$ such that $n \ge n_0$ implies $a_n = a_{n+p}$; that is, the decimal expansion is periodic from $n_0$ on.
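The digit construction in the proof of Proposition 2.32 (a) translates directly into an algorithm. This is a sketch of ours (not from the notes), using exact rational arithmetic so the invariant $s_n \le \alpha < s_n + 10^{-n}$ holds without rounding:

```python
# The inductive construction from Proposition 2.32 (a): at step n pick the
# unique digit j in {0,...,9} with s_n + j/10^(n+1) <= alpha.
from fractions import Fraction

def decimal_digits(alpha, count):
    """Digits a_1, ..., a_count with s_n <= alpha < s_n + 10^(-n)."""
    alpha = Fraction(alpha)           # expect 0 <= alpha < 1
    digits, s = [], Fraction(0)
    for n in range(1, count + 1):
        j = int((alpha - s) * 10 ** n)   # floor; lies in {0, ..., 9}
        digits.append(j)
        s += Fraction(j, 10 ** n)
    return digits

assert decimal_digits(Fraction(1, 8), 5) == [1, 2, 5, 0, 0]
# 1/7 is periodic with period 6, matching Remark 2.5 (c):
assert decimal_digits(Fraction(1, 7), 12) == [1, 4, 2, 8, 5, 7] * 2
```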

2.3.8 Complex Sequences and Series

Almost all notions and theorems carry over from real sequences to complex sequences. For example:
A sequence $(z_n)$ of complex numbers converges to $z$ if for every (real) $\varepsilon > 0$ there exists a positive integer $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $|z - z_n| < \varepsilon$.
The following proposition shows that convergence of a complex sequence can be reduced to the convergence of two real sequences.

Proposition 2.33 The complex sequence $(z_n)$ converges to some complex number $z$ if and only if the real sequence $(\operatorname{Re} z_n)$ converges to $\operatorname{Re} z$ and the real sequence $(\operatorname{Im} z_n)$ converges to $\operatorname{Im} z$.

Proof. Using the (complex) limit law $\lim(z_n + c) = c + \lim z_n$, it is easy to see that we can restrict ourselves to the case $z = 0$. Suppose first $z_n \to 0$. Proposition 1.20 (d) gives $|\operatorname{Re} z_n| \le |z_n|$. Hence $\operatorname{Re} z_n$ tends to 0 as $n \to \infty$. Similarly, $|\operatorname{Im} z_n| \le |z_n|$ and therefore $\operatorname{Im} z_n \to 0$.
Suppose now $x_n := \operatorname{Re} z_n \to 0$ and $y_n := \operatorname{Im} z_n \to 0$ as $n$ goes to infinity. Since $|z_n|^2 = x_n^2 + y_n^2$, $|z_n|^2 \to 0$ as $n \to \infty$; this implies $z_n \to 0$.

Since the complex field $\mathbb{C}$ is not an ordered field, all notions and propositions where the order is involved do not make sense for complex series, or they need modification. The sandwich theorem does not hold; there is no notion of monotonic sequences or of upper and lower limits. But there still are bounded sequences ($|z_n| \le C$), limit points, subsequences, Cauchy sequences, series, and absolute convergence. The following theorems are true for complex sequences, too: Propositions/Lemmas/Theorems 1, 2, 3, 9, 10, 12, 15, 17, 18.
The Bolzano-Weierstraß theorem for bounded complex sequences $(z_n)$ can be proved by considering the real and the imaginary sequences $(\operatorname{Re} z_n)$ and $(\operatorname{Im} z_n)$ separately.
The comparison test for series now reads:
(a) If $|a_n| \le C|b_n|$ for some $C > 0$ and for almost all $n \in \mathbb{N}$, and if $\sum |b_n|$ converges, then $\sum a_n$ converges.
(b) If $|a_n| \ge C|d_n|$ for some $C > 0$ and for almost all $n$, and if $\sum |d_n|$ diverges, then $\sum |a_n|$ diverges; that is, $\sum a_n$ does not converge absolutely.

The Cauchy criterion and the root and ratio tests are true for complex series as well. Propositions 19, 20, 26, 27, 28, 30, and 31 are true for complex series.

2.3.9 Power Series

Definition 2.10 Given a sequence $(c_n)$ of complex numbers, the series
$$\sum_{n=0}^\infty c_n z^n \tag{2.23}$$
is called a power series. The numbers $c_n$ are called the coefficients of the series; $z$ is a complex number.

In general, the series will converge or diverge, depending on the choice of $z$. More precisely, with every power series there is associated a circle with center 0, the circle of convergence, such that (2.23) converges if $z$ is in the interior of the circle and diverges if $z$ is in the exterior. The radius $R$ of this disc of convergence is called the radius of convergence.
On the disc of convergence, a power series defines a function, since it associates to each $z$ with $|z| < R$ a complex number, namely the sum of the numerical series $\sum_n c_n z^n$. For example, $\sum_{n=0}^\infty z^n$ defines the function $f(z) = \frac{1}{1-z}$ for $|z| < 1$. If almost all coefficients $c_n$ are 0, say $c_n = 0$ for all $n \ge m+1$, the power series is a finite sum and the corresponding function is a polynomial: $\sum_{n=0}^\infty c_n z^n = \sum_{n=0}^m c_n z^n = c_0 + c_1 z + c_2 z^2 + \cdots + c_m z^m$.

Theorem 2.34 Given a power series $\sum c_n z^n$, put
$$\alpha = \limsup_{n\to\infty} \sqrt[n]{|c_n|}, \qquad R = \frac{1}{\alpha}. \tag{2.24}$$
If $\alpha = 0$, $R = +\infty$; if $\alpha = +\infty$, $R = 0$. Then $\sum c_n z^n$ converges if $|z| < R$, and diverges if $|z| > R$.


The behavior on the circle of convergence cannot be described so simply.

Proof. Put $a_n = c_n z^n$ and apply the root test:
$$\limsup_{n\to\infty} \sqrt[n]{|a_n|} = |z| \limsup_{n\to\infty} \sqrt[n]{|c_n|} = \frac{|z|}{R}.$$
This gives convergence if $|z| < R$ and divergence if $|z| > R$.

The nonnegative number $R$ is called the radius of convergence.


Example 2.13 (a) The series $\sum_{n=0}^m c_n z^n$ has $c_n = 0$ for almost all $n$. Hence $\alpha = \limsup_{n\to\infty} \sqrt[n]{|c_n|} = \lim_{n\to\infty} 0 = 0$ and $R = +\infty$.
(b) The series $\sum n^n z^n$ has $R = 0$.
(c) The series $\sum \dfrac{z^n}{n!}$ has $R = +\infty$. (In this case the ratio test is easier to apply than the root test. Indeed,
$$\lim_{n\to\infty} \left|\frac{c_{n+1}}{c_n}\right| = \lim_{n\to\infty} \frac{n!}{(n+1)!} = \lim_{n\to\infty} \frac{1}{n+1} = 0,$$
and therefore $R = +\infty$.)
(d) The series $\sum z^n$ has $R = 1$. If $|z| = 1$, it diverges since $(z^n)$ does not tend to 0. This generalizes the geometric series; formula (2.9) still holds if $|q| < 1$:
$$\sum_{n=2}^\infty 2\left(\frac{i}{3}\right)^n = \frac{2(i/3)^2}{1 - i/3} = -\frac{3+i}{15}.$$
(e) The series $\sum z^n/n$ has $R = 1$. It diverges if $z = 1$. It converges for all other $z$ with $|z| = 1$ (without proof).
(f) The series $\sum z^n/n^2$ has $R = 1$. It converges for all $z$ with $|z| = 1$ by the comparison test, since $|z^n/n^2| = 1/n^2$.
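Formula (2.24) can be probed numerically when the limit exists (Remark 2.3). The helper below is a rough sketch of ours, not from the notes: it evaluates $1/|c_n|^{1/n}$ for one large $n$.

```python
# Crude numerical estimate of the radius R = 1/limsup |c_n|^(1/n)
# from Theorem 2.34, valid when lim |c_n|^(1/n) exists.

def radius_estimate(c, n=500):
    cn = abs(c(n))
    if cn == 0:
        return float("inf")
    return 1 / cn ** (1 / n)

assert radius_estimate(lambda n: 1) == 1.0                 # sum z^n:    R = 1
assert abs(radius_estimate(lambda n: 1 / n) - 1) < 0.02    # sum z^n/n:  R = 1
assert radius_estimate(lambda n: n ** n, n=50) < 0.03      # sum n^n z^n: R = 0
```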

2.3.10 Rearrangements

The generalized associative law for finite sums says that we can insert brackets without affecting the sum; for example, $((a_1 + a_2) + (a_3 + a_4)) = (a_1 + (a_2 + (a_3 + a_4)))$. We will see that a similar statement holds for series:
Suppose that $\sum_k a_k$ is a converging series and $\sum_l b_l$ is a series obtained from $\sum_k a_k$ by inserting brackets, for example
$$b_1 + b_2 + b_3 + \cdots = \underbrace{(a_1 + a_2)}_{b_1} + \underbrace{(a_3 + \cdots + a_{10})}_{b_2} + \underbrace{(a_{11} + a_{12})}_{b_3} + \cdots$$
Then $\sum_l b_l$ converges and the sum is the same. If $\sum_k a_k$ diverges to $\pm\infty$, the same is true for $\sum_l b_l$. However, divergence of $\sum a_k$ does not imply divergence of $\sum b_l$ in general, since $1 - 1 + 1 - 1 + 1 - 1 + \cdots$ diverges but $(1-1) + (1-1) + \cdots$ converges. For the proof, let $s_n = \sum_{k=1}^n a_k$ and $t_m = \sum_{l=1}^m b_l$. By construction, $t_m = s_{n_m}$ for a suitable subsequence $(s_{n_m})$ of the partial sums of $\sum_k a_k$. Convergence (proper or improper) of $(s_n)$ implies convergence (proper or improper) of any subsequence. Hence, $\sum_l b_l$ converges.
For finite sums, the generalized commutative law holds:
$$a_1 + a_2 + a_3 + a_4 = a_2 + a_4 + a_1 + a_3;$$
that is, any rearrangement of the summands does not affect the sum. We will see in Example 2.14 below that this is not true for arbitrary series, but it is true for absolutely convergent ones (see Proposition 2.36 below).

Definition 2.11 Let $\sigma\colon \mathbb{N} \to \mathbb{N}$ be a bijective mapping; that is, in the sequence $(\sigma(1), \sigma(2), \dots)$ every positive integer appears once and only once. Putting
$$a'_n = a_{\sigma(n)}, \qquad n = 1, 2, \dots,$$
we say that $\sum a'_n$ is a rearrangement of $\sum a_n$.

If $(s_n)$ and $(s'_n)$ are the partial sums of $\sum a_n$ and of a rearrangement $\sum a'_n$ of $\sum a_n$, it is easily seen that, in general, these two sequences consist of entirely different numbers. We are led to the problem of determining under what conditions all rearrangements of a convergent series will converge and whether the sums are necessarily the same.
Example 2.14 (a) Consider the convergent series
$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = 1 - \frac12 + \frac13 - \cdots \tag{2.25}$$
and one of its rearrangements
$$1 - \frac12 - \frac14 + \frac13 - \frac16 - \frac18 + \frac15 - \frac{1}{10} - \frac{1}{12} + \cdots \tag{2.26}$$
If $s$ is the sum of (2.25), then $s > 0$ since
$$\left(1 - \frac12\right) + \left(\frac13 - \frac14\right) + \cdots > 0.$$
We will show that (2.26) converges to $s' = s/2$. Namely,
$$s' = \sum a'_n = \left(1 - \frac12\right) - \frac14 + \left(\frac13 - \frac16\right) - \frac18 + \left(\frac15 - \frac{1}{10}\right) - \frac{1}{12} + \cdots$$
$$= \frac12 - \frac14 + \frac16 - \frac18 + \frac{1}{10} - \frac{1}{12} + \cdots = \frac12\left(1 - \frac12 + \frac13 - \frac14 + \cdots\right) = \frac{s}{2}.$$
Since $s \neq 0$, $s' \neq s$. Hence, there exist rearrangements which converge, however to a different limit.
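The limit $s' = s/2$ can be observed numerically. This sketch is ours (not part of the notes); it sums the rearrangement (2.26) in its natural blocks of three terms and compares with $\tfrac12\log 2$.

```python
# Example 2.14 (a): the +, -, - rearrangement of the alternating harmonic
# series converges to half of the original sum s = log 2.
import math

def rearranged_sum(blocks):
    # block m = 1, 2, ...:  +1/(2m-1) - 1/(4m-2) - 1/(4m)
    total = 0.0
    for m in range(1, blocks + 1):
        total += 1 / (2 * m - 1) - 1 / (4 * m - 2) - 1 / (4 * m)
    return total

s = math.log(2)
assert abs(rearranged_sum(100_000) - s / 2) < 1e-4
```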


(b) Consider the following rearrangement of the series (2.25):
$$\sum a'_n = 1 + \frac13 - \frac12 + \frac15 + \frac17 - \frac14 + \left(\frac19 + \frac{1}{11} + \frac{1}{13} + \frac{1}{15}\right) - \frac16 + \cdots + \left(\frac{1}{2^n+1} + \frac{1}{2^n+3} + \cdots + \frac{1}{2^{n+1}-1}\right) - \frac{1}{2n+2} + \cdots$$
Since for every positive integer $n \ge 10$
$$\frac{1}{2^n+1} + \frac{1}{2^n+3} + \cdots + \frac{1}{2^{n+1}-1} - \frac{1}{2n+2} > 2^{n-1}\cdot\frac{1}{2^{n+1}} - \frac{1}{2n+2} = \frac14 - \frac{1}{2n+2} > \frac15,$$
the rearranged series diverges to $+\infty$.
Without proof (see [Rud76, 3.54 Theorem]) we remark the following surprising theorem. It shows (together with Proposition 2.36) that the absolute convergence of a series is necessary and sufficient for every rearrangement to be convergent (to the same limit).

Proposition 2.35 Let $\sum a_n$ be a series of real numbers which converges, but not absolutely. Suppose $-\infty \le \alpha \le \beta \le +\infty$. Then there exists a rearrangement $\sum a'_n$ with partial sums $s'_n$ such that
$$\liminf_{n\to\infty} s'_n = \alpha, \qquad \limsup_{n\to\infty} s'_n = \beta.$$

Proposition 2.36 If $\sum a_n$ is a series of complex numbers which converges absolutely, then every rearrangement of $\sum a_n$ converges, and they all converge to the same sum.
Proof. Let $\sum a'_n$ be a rearrangement with partial sums $s'_n$. Given $\varepsilon > 0$, by the Cauchy criterion for the series $\sum |a_n|$ there exists $n_0 \in \mathbb{N}$ such that $n \ge m \ge n_0$ implies
$$\sum_{k=m}^n |a_k| < \varepsilon. \tag{2.27}$$
Now choose $p$ such that the integers $1, 2, \dots, n_0$ are all contained in the set $\{\sigma(1), \sigma(2), \dots, \sigma(p)\}$:
$$\{1, 2, \dots, n_0\} \subseteq \{\sigma(1), \sigma(2), \dots, \sigma(p)\}.$$
Then, if $n > p$, the numbers $a_1, a_2, \dots, a_{n_0}$ cancel in the difference $s_n - s'_n$, so that
$$|s_n - s'_n| = \left|\sum_{k=1}^n a_k - \sum_{k=1}^n a_{\sigma(k)}\right| \le \sum_{k=n_0+1}^{\max\{n,\,\sigma(1),\dots,\sigma(n)\}} |a_k| < \varepsilon,$$
by (2.27). Hence $(s'_n)$ converges to the same sum as $(s_n)$.
The same argument shows that $\sum a'_n$ also converges absolutely.


2.3.11 Products of Series

If we multiply two finite sums $a_1 + a_2 + \cdots + a_n$ and $b_1 + b_2 + \cdots + b_m$ by the distributive law, we form all products $a_i b_j$, put them into a sequence $p_0, p_1, \dots, p_s$, $s = mn$, and add up $p_0 + p_1 + p_2 + \cdots + p_s$. This method can be generalized to series $a_0 + a_1 + \cdots$ and $b_0 + b_1 + \cdots$. Surely, we can form all products $a_i b_j$, arrange them in a sequence $p_0, p_1, p_2, \dots$, and form the product series $p_0 + p_1 + \cdots$. For example, consider the table

    a0 b0   a0 b1   a0 b2   ...          p0   p1   p3   ...
    a1 b0   a1 b1   a1 b2   ...          p2   p4   p7   ...
    a2 b0   a2 b1   a2 b2   ...          p5   p8   p12  ...
      :       :       :                   :    :    :

and the diagonal enumeration of the products. The question is: under which conditions on $\sum a_n$ and $\sum b_n$ does the product series converge, and when is its sum independent of the arrangement of the products $a_i b_k$?

Proposition 2.37 If both series $\sum_{k=0}^\infty a_k$ and $\sum_{k=0}^\infty b_k$ converge absolutely with $A = \sum_{k=0}^\infty a_k$ and $B = \sum_{k=0}^\infty b_k$, then any of their product series $\sum p_k$ converges absolutely and $\sum_{k=0}^\infty p_k = AB$.

Proof. For the $n$th partial sum of any product series $\sum_{k=0}^n |p_k|$ we have
$$|p_0| + |p_1| + \cdots + |p_n| \le (|a_0| + \cdots + |a_m|)\,(|b_0| + \cdots + |b_m|)$$
if $m$ is sufficiently large. A fortiori,
$$|p_0| + |p_1| + \cdots + |p_n| \le \sum_{k=0}^\infty |a_k| \sum_{k=0}^\infty |b_k|.$$
That is, any series $\sum_{k=0}^\infty |p_k|$ is bounded and hence convergent by Lemma 2.19 (3). By Proposition 2.36 all product series converge to the same sum $s = \sum_{k=0}^\infty p_k$. Consider now the very special product series $\sum_{k=1}^\infty q_k$ with partial sums consisting of the sums of the elements in the upper-left squares. Then
$$q_1 + q_2 + \cdots + q_{(n+1)^2} = (a_0 + a_1 + \cdots + a_n)(b_0 + \cdots + b_n)$$
converges to $s = AB$.

Arranging the elements $a_i b_j$ as above in a diagonal array and summing up the elements on the $n$th diagonal, $c_n = a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0$, we obtain the Cauchy product
$$\sum_{n=0}^\infty c_n = \sum_{n=0}^\infty (a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0).$$
Corollary 2.38 If both series $\sum_{k=0}^\infty a_k$ and $\sum_{k=0}^\infty b_k$ converge absolutely with $A = \sum_{k=0}^\infty a_k$ and $B = \sum_{k=0}^\infty b_k$, their Cauchy product $\sum_{k=0}^\infty c_k$ converges absolutely and $\sum_{k=0}^\infty c_k = AB$.


Example 2.15 We compute the Cauchy product of two geometric series:
$$(1 + p + p^2 + \cdots)(1 + q + q^2 + \cdots) = 1 + (p + q) + (p^2 + pq + q^2) + (p^3 + p^2 q + p q^2 + q^3) + \cdots$$
$$= \frac{p - q}{p - q} + \frac{p^2 - q^2}{p - q} + \frac{p^3 - q^3}{p - q} + \cdots = \frac{1}{p - q}\sum_{n=1}^\infty (p^n - q^n)$$
$$\underset{|p|<1,\,|q|<1}{=} \frac{1}{p-q}\left(\frac{p}{1-p} - \frac{q}{1-q}\right) = \frac{1}{p-q}\cdot\frac{p(1-q) - q(1-p)}{(1-p)(1-q)} = \frac{1}{1-p}\cdot\frac{1}{1-q}.$$
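The identity can be confirmed numerically. This sketch is ours (not part of the notes): it sums the diagonal terms $c_n = \sum_{k=0}^n p^k q^{n-k}$ and compares with $\frac{1}{(1-p)(1-q)}$.

```python
# Checking Example 2.15: the Cauchy product of the geometric series in p and q
# has diagonal terms c_n = sum_{k=0}^{n} p^k q^(n-k); its sum should equal
# 1/((1-p)(1-q)) for |p| < 1, |q| < 1.

def cauchy_product_sum(p, q, terms=200):
    total = 0.0
    for n in range(terms):
        total += sum(p ** k * q ** (n - k) for k in range(n + 1))
    return total

p, q = 0.3, 0.5
expected = 1 / ((1 - p) * (1 - q))
assert abs(cauchy_product_sum(p, q) - expected) < 1e-10
```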

Cauchy Product of Power Series

In the case of power series the Cauchy product is the appropriate one, since it is again a power series (which is not the case for other types of product series). Indeed, the Cauchy product of $\sum_{k=0}^\infty a_k z^k$ and $\sum_{k=0}^\infty b_k z^k$ is given by the general element
$$\sum_{k=0}^n a_k z^k\, b_{n-k} z^{n-k} = z^n \sum_{k=0}^n a_k b_{n-k},$$
such that
$$\sum_{k=0}^\infty a_k z^k \,\sum_{k=0}^\infty b_k z^k = \sum_{n=0}^\infty (a_0 b_n + \cdots + a_n b_0)\, z^n.$$

Corollary 2.39 Suppose that $\sum_n a_n z^n$ and $\sum_n b_n z^n$ are power series with positive radii of convergence $R_1$ and $R_2$, respectively. Let $R = \min\{R_1, R_2\}$. Then the Cauchy product $\sum_{n=0}^\infty c_n z^n$, $c_n = a_0 b_n + \cdots + a_n b_0$, converges absolutely for $|z| < R$ and
$$\sum_{n=0}^\infty a_n z^n \sum_{n=0}^\infty b_n z^n = \sum_{n=0}^\infty c_n z^n, \qquad |z| < R.$$
This follows from the previous corollary and the fact that both series converge absolutely for $|z| < R$.
Example 2.16 (a)
$$\sum_{n=0}^\infty (n+1)\, z^n = \frac{1}{(1-z)^2}, \qquad |z| < 1.$$
Indeed, consider the Cauchy product of $\sum_{n=0}^\infty z^n = \frac{1}{1-z}$, $|z| < 1$, with itself. Since $a_n = b_n = 1$ and $c_n = \sum_{k=0}^n a_k b_{n-k} = \sum_{k=0}^n 1\cdot 1 = n+1$, the claim follows.
(b)
$$(z + z^2)\sum_{n=0}^\infty z^n = \sum_{n=0}^\infty (z^{n+1} + z^{n+2}) = \sum_{n=1}^\infty z^n + \sum_{n=2}^\infty z^n = z + 2\sum_{n=2}^\infty z^n = z + 2z^2 + 2z^3 + 2z^4 + \cdots = z + \frac{2z^2}{1-z} = \frac{z + z^2}{1-z}.$$


Chapter 3
Functions and Continuity

This chapter is devoted to another central notion in analysis: the notion of a continuous function. We will see that sums, products, quotients, and compositions of continuous functions are continuous. If nothing else is specified, $D$ will denote a finite union of intervals.

Definition 3.1 Let $D \subseteq \mathbb{R}$ be a subset of $\mathbb{R}$. A function is a map $f\colon D \to \mathbb{R}$.
(a) The set $D$ is called the domain of $f$; we write $D = D(f)$.
(b) If $A \subseteq D$, $f(A) := \{f(x) \mid x \in A\}$ is called the image of $A$ under $f$. The function $f|_A\colon A \to \mathbb{R}$ given by $f|_A(a) = f(a)$, $a \in A$, is called the restriction of $f$ to $A$.
(c) If $B \subseteq \mathbb{R}$, we call $f^{-1}(B) := \{x \in D \mid f(x) \in B\}$ the preimage of $B$ under $f$.
(d) The graph of $f$ is the set $\operatorname{graph}(f) := \{(x, f(x)) \mid x \in D\}$.
Later we will consider functions in a wider sense: from the complex numbers into the complex numbers, and from $\mathbb{F}^n$ into $\mathbb{F}^m$ where $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$.
We say that a function $f\colon D \to \mathbb{R}$ is bounded if $f(D) \subseteq \mathbb{R}$ is a bounded set of real numbers, i.e. there is a $C > 0$ such that $|f(x)| \le C$ for all $x \in D$. We say that $f$ is bounded above (resp. bounded below) if there exists $C \in \mathbb{R}$ such that $f(x) < C$ (resp. $f(x) > C$) for all $x$ in the domain of $f$.

Example 3.1 (a) Power series (with radius of convergence R > 0), polynomials, and rational
functions are the most important examples of functions.
Let c ∈ R. Then f(x) = c, f : R → R, is called the constant function.

(b) Properties of a function change drastically if we change the domain or the image set.
Let f : R → R, g : R → R₊, k : R₊ → R, h : R₊ → R₊ be the functions given by x ↦ x².
Then g is surjective, k is injective, h is bijective, and f is neither injective nor surjective.
Obviously, f↾R₊ = k and g↾R₊ = h.
(c) Let f(x) = ∑_{n=0}^∞ xⁿ, f : (−1, 1) → R, and h(x) = 1/(1 − x), h : R \ {1} → R. Then
h↾(−1, 1) = f.
[Figure: the graphs of the constant, the identity, and the absolute value functions.]

3.1 Limits of a Function


Definition 3.2 (ε-δ-definition) Let (a, b) be a finite or infinite interval and x₀ ∈ (a, b). Let
f : (a, b) \ {x₀} → R be a real-valued function. We call A ∈ R the limit of f at x₀ (the
limit of f(x) is A as x approaches x₀; f approaches A near x₀) if the following is satisfied:
for any ε > 0 there exists δ > 0 such that x ∈ (a, b) and 0 < |x − x₀| < δ imply
|f(x) − A| < ε.
We write

    lim_{x→x₀} f(x) = A.

Roughly speaking, if x is close to x₀, then f(x) must be close to A.


Using quantifiers, lim_{x→x₀} f(x) = A reads as

    ∀ε > 0 ∃δ > 0 ∀x ∈ (a, b) :  0 < |x − x₀| < δ  ⟹  |f(x) − A| < ε.

Note that the formal negation of lim_{x→x₀} f(x) = A is

    ∃ε > 0 ∀δ > 0 ∃x ∈ (a, b) :  0 < |x − x₀| < δ  and  |f(x) − A| ≥ ε.


Proposition 3.1 (sequence definition) Let f and x₀ be as above. Then lim_{x→x₀} f(x) = A if and
only if for every sequence (xₙ) with xₙ ∈ (a, b), xₙ ≠ x₀ for all n, and lim_{n→∞} xₙ = x₀, we have
lim_{n→∞} f(xₙ) = A.

Proof. Suppose lim_{x→x₀} f(x) = A, and xₙ → x₀ where xₙ ≠ x₀ for all n. Given ε > 0 we find
δ > 0 such that |f(x) − A| < ε if 0 < |x − x₀| < δ. Since xₙ → x₀, there is a positive integer
n₀ such that n ≥ n₀ implies |xₙ − x₀| < δ. Therefore n ≥ n₀ implies |f(xₙ) − A| < ε. That
is, lim_{n→∞} f(xₙ) = A.
Suppose to the contrary that the condition of the proposition is fulfilled but lim_{x→x₀} f(x) ≠ A.
Then there is some ε > 0 such that for all δ = 1/n, n ∈ N, there is an xₙ ∈ (a, b) such that
0 < |xₙ − x₀| < 1/n, but |f(xₙ) − A| ≥ ε. We have constructed a sequence (xₙ), xₙ ≠ x₀,
with xₙ → x₀ as n → ∞, such that lim_{n→∞} f(xₙ) ≠ A, which contradicts our assumption.
Hence lim_{x→x₀} f(x) = A.

Example. lim_{x→1} (x + 3) = 4. Indeed, given ε > 0, choose δ = ε. Then |x − 1| < δ implies

    |(x + 3) − 4| < δ = ε.
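The ε-δ bookkeeping can be sampled numerically; a minimal sketch (not from the text, and only a spot check, not a proof) that the choice δ = ε works for this example:

```python
# Spot-check of the eps-delta argument for lim_{x -> 1} (x + 3) = 4
# with the choice delta = eps made in the text.

def check(eps, samples=1000):
    delta = eps
    for i in range(-samples, samples + 1):
        if i == 0:
            continue                             # the limit ignores x = x0 itself
        x = 1.0 + delta * i / (samples + 1)      # then 0 < |x - 1| < delta
        if abs((x + 3) - 4) >= eps:
            return False
    return True

results = [check(e) for e in (1.0, 0.1, 1e-3, 1e-6)]
assert all(results)
```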

3.1.1 One-sided Limits, Infinite Limits, and Limits at Infinity

Definition 3.3 (a) We write

    lim_{x→x₀+0} f(x) = A

if for all sequences (xₙ) with xₙ > x₀ and lim_{n→∞} xₙ = x₀ we have lim_{n→∞} f(xₙ) = A. Sometimes
we use the notation f(x₀ + 0) in place of lim_{x→x₀+0} f(x). We call f(x₀ + 0) the right-hand limit
of f at x₀, or we say A is the limit of f as x approaches x₀ from above (from the right).
Similarly one defines the left-hand limit of f at x₀, lim_{x→x₀−0} f(x) = A, with xₙ < x₀ in place
of xₙ > x₀. Sometimes we use the notation f(x₀ − 0).

(b) We write

    lim_{x→+∞} f(x) = A

if for all sequences (xₙ) with lim_{n→∞} xₙ = +∞ we have lim_{n→∞} f(xₙ) = A. Sometimes we use
the notation f(+∞). In a similar way we define lim_{x→−∞} f(x) = A.

(c) Finally, the notions of (a), (b), and Definition 3.2 still make sense in case A = +∞ or
A = −∞. For example,

    lim_{x→x₀−0} f(x) = −∞

if for all sequences (xₙ) with xₙ < x₀ and lim_{n→∞} xₙ = x₀ we have lim_{n→∞} f(xₙ) = −∞.

Remark 3.1 All notions in the above definition can be given in ε-δ or ε-D or E-δ or E-D
language using inequalities. For example, lim_{x→x₀−0} f(x) = −∞ if and only if

    ∀E > 0 ∃δ > 0 ∀x ∈ D(f) :  0 < x₀ − x < δ  ⟹  f(x) < −E.

For example, we show that lim_{x→0−0} 1/x = −∞. To E > 0 choose δ = 1/E. Then 0 < −x < δ = 1/E
implies E < −1/x and hence f(x) = 1/x < −E. This proves the claim.
Similarly, lim_{x→+∞} f(x) = +∞ if and only if

    ∀E > 0 ∃D > 0 ∀x ∈ D(f) :  x > D  ⟹  f(x) > E.

The proofs of the equivalence of the ε-δ definitions and the sequence definitions are along the
lines of Proposition 3.1.
For example, lim_{x→+∞} x² = +∞. To E > 0 choose D = √E. Then x > D implies x > √E;
thus x² > E.

Example 3.2 (a) lim_{x→+∞} 1/x = 0. For, let ε > 0; choose D = 1/ε. Then x > D = 1/ε implies
0 < 1/x < ε. This proves the claim.
(b) Consider the entier function f(x) = [x], defined in Example 2.6 (b). If n ∈ Z, then
lim_{x→n−0} f(x) = n − 1 whereas lim_{x→n+0} f(x) = n.

[Figure: the graph of the entier function f(x) = [x].]

Proof. We use the ε-δ definition of the one-sided limits to prove the first claim. Let ε > 0.
Choose δ = 1/2; then 0 < n − x < 1/2 implies n − 1/2 < x < n and therefore
f(x) = n − 1. In particular, |f(x) − (n − 1)| = 0 < ε.
Similarly one proves lim_{x→n+0} f(x) = n.
Since the one-sided limits are different, lim_{x→n} f(x) does not exist.
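The two different one-sided limits of the entier function can be observed numerically; a small sketch (not from the text) sampling [x] just below and just above an integer:

```python
# Example 3.2 (b) numerically: for f(x) = [x] and integer n,
# the left-hand limit is n - 1, the right-hand limit is n.
import math

n = 3
left = [math.floor(n - 1.0 / k) for k in range(2, 100)]    # x -> n from below
right = [math.floor(n + 1.0 / k) for k in range(2, 100)]   # x -> n from above

assert all(v == n - 1 for v in left)    # f(n - 0) = n - 1
assert all(v == n for v in right)       # f(n + 0) = n
```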

Definition 3.4 Suppose we are given two functions f and g, both defined on (a, b) \ {x₀}. By
f + g we mean the function which assigns to each point x ≠ x₀ of (a, b) the number f(x) +
g(x). Similarly, we define the difference f − g, the product f·g, and the quotient f/g, with the
understanding that the quotient is defined only at those points x at which g(x) ≠ 0.
Proposition 3.2 Suppose that f and g are functions defined on (a, b) \ {x₀}, a < x₀ < b, and
lim_{x→x₀} f(x) = A, lim_{x→x₀} g(x) = B, λ, μ ∈ R. Then
(a) lim_{x→x₀} f(x) = A′ implies A′ = A (the limit is unique);
(b) lim_{x→x₀} (λf + μg)(x) = λA + μB;
(c) lim_{x→x₀} (f·g)(x) = AB;
(d) lim_{x→x₀} (f/g)(x) = A/B, if B ≠ 0;
(e) lim_{x→x₀} |f(x)| = |A|.

Proof. In view of Proposition 3.1, all these assertions follow immediately from the analogous
properties of sequences, see Proposition 2.3. As an example, we show (c). Let (xₙ), xₙ ≠ x₀,
be a sequence tending to x₀. By assumption, lim_{n→∞} f(xₙ) = A and lim_{n→∞} g(xₙ) = B.
By Proposition 2.3, lim_{n→∞} f(xₙ)g(xₙ) = AB, that is, lim_{n→∞} (f·g)(xₙ) = AB. By
Proposition 3.1, lim_{x→x₀} (f·g)(x) = AB.

Remark 3.2 The proposition remains true if we replace (at the same time in all places) x → x₀
by x → x₀ + 0, x → x₀ − 0, x → +∞, or x → −∞. Moreover we can replace A or B by +∞
or by −∞ provided the right members of (b), (c), (d), and (e) are defined.
Note that +∞ + (−∞), 0·∞, ∞/∞, and A/0 are not defined.
The extended real number system consists of the real field R and two symbols, +∞ and −∞.
We preserve the original order in R and define

    −∞ < x < +∞    for every x ∈ R.

It is then clear that +∞ is an upper bound of every subset of the extended real number system,
and that every nonempty subset has a least upper bound. If, for example, E is a set of real
numbers which is not bounded above in R, then sup E = +∞ in the extended real system.
Exactly the same remarks apply to lower bounds.
The extended real system does not form a field, but it is customary to make the following
conventions:
(a) If x is real then

    x + ∞ = +∞,    x − ∞ = −∞,    x/(+∞) = x/(−∞) = 0.

(b) If x > 0 then x·(+∞) = +∞ and x·(−∞) = −∞.
(c) If x < 0 then x·(+∞) = −∞ and x·(−∞) = +∞.
When it is desired to make the distinction between the real numbers on the one hand and the
symbols +∞ and −∞ on the other hand quite explicit, the real numbers are called finite.
In Homework 9.2 (a) and (b) you are invited to give explicit proofs in two special cases.
Example 3.3 (a) Let p(x) and q(x) be polynomials and a ∈ R. Then

    lim_{x→a} p(x) = p(a).

This immediately follows from lim_{x→a} x = a, lim_{x→a} c = c, and Proposition 3.2. Indeed, by (b)
and (c), for p(x) = 3x³ − 4x + 7 we have lim_{x→a} (3x³ − 4x + 7) = 3 (lim_{x→a} x)³ − 4 lim_{x→a} x +
7 = 3a³ − 4a + 7 = p(a). This works for arbitrary polynomials. Suppose moreover that
q(a) ≠ 0. Then by (d),

    lim_{x→a} p(x)/q(x) = p(a)/q(a).

Hence, the limit of a rational function f(x), as x approaches a point a of the domain of f, is
f(a).


(b) Let f(x) = p(x)/q(x) be a rational function with polynomials p(x) = ∑_{k=0}^r a_k x^k and
q(x) = ∑_{k=0}^s b_k x^k with real coefficients a_k and b_k and of degrees r and s, respectively. Then

    lim_{x→+∞} f(x) = 0,         if r < s,
                    = a_r/b_s,   if r = s,
                    = +∞,        if r > s and a_r/b_s > 0,
                    = −∞,        if r > s and a_r/b_s < 0.

The first two statements (r ≤ s) follow from Example 3.2 (a) together with Proposition 3.2.
Namely, a_k x^{k−r} → 0 as x → +∞ provided 0 ≤ k < r. The statements for r > s follow from
x^{r−s} → +∞ as x → +∞ and the above remark.
Note that

    lim_{x→−∞} f(x) = (−1)^{r+s} lim_{x→+∞} f(x)

since

    p(−x)/q(−x) = ((−1)^r a_r x^r + ⋯)/((−1)^s b_s x^s + ⋯) = (−1)^{r+s} (a_r x^r + ⋯)/(b_s x^s + ⋯).
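The three regimes (degree of the numerator smaller, equal, or larger) can be seen by evaluating sample rational functions at large arguments; a minimal sketch, not from the text:

```python
# Behaviour of p(x)/q(x) for large x depends only on the leading terms.

def f(x):                        # r = s = 2: limit should be a_r/b_s = 3/2
    return (3 * x**2 + 5 * x - 1) / (2 * x**2 + 7)

def g(x):                        # r = 1 < s = 2: limit should be 0
    return (x + 1) / (x**2 + 1)

big = 1e8
assert abs(f(big) - 1.5) < 1e-6
assert abs(g(big)) < 1e-6
```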

3.2 Continuous Functions

Definition 3.5 Let f be a function and x₀ ∈ D(f). We say that f is continuous at x₀ if

    ∀ε > 0 ∃δ > 0 ∀x ∈ D(f) :  |x − x₀| < δ  ⟹  |f(x) − f(x₀)| < ε.        (3.1)

We say that f is continuous on A ⊆ D(f) if f is continuous at all points x₀ ∈ A.

Proposition 3.1 shows that the above definition of continuity at x₀ is equivalent to: for all
sequences (xₙ), xₙ ∈ D(f), with lim_{n→∞} xₙ = x₀, we have lim_{n→∞} f(xₙ) = f(x₀). In other
words, f is continuous at x₀ if lim_{x→x₀} f(x) = f(x₀).

Example 3.4 (a) In Example 3.3 we have seen that every polynomial is continuous on R and
every rational function f is continuous on its domain D(f).
f(x) = |x| is continuous on R.
(b) Continuity is a local property: if two functions f, g : D → R coincide in a neighborhood
U(x₀) ⊆ D of some point x₀, then f is continuous at x₀ if and only if g is continuous at x₀.
(c) f(x) = [x] is continuous on R \ Z. If x₀ is not an integer, then n < x₀ < n + 1 for some
n ∈ Z, and f(x) = n coincides with a constant function in a neighborhood U(x₀). By (b),
f is continuous at x₀. If x₀ = n ∈ Z, lim_{x→n} [x] does not exist; hence f is not continuous at n.
(d) f(x) = (x² − 1)/(x − 1) if x ≠ 1, and f(1) = 1. Then f is not continuous at x₀ = 1 since

    lim_{x→1} (x² − 1)/(x − 1) = lim_{x→1} (x + 1) = 2 ≠ 1 = f(1).


There are two reasons for a function not being continuous at x₀. First, lim_{x→x₀} f(x) may not
exist. Second, f may have a limit at x₀ but lim_{x→x₀} f(x) ≠ f(x₀).
Proposition 3.3 Suppose f, g : D → R are continuous at x₀ ∈ D. Then f + g and f·g are also
continuous at x₀. If g(x₀) ≠ 0, then f/g is continuous at x₀.
The proof is obvious from Proposition 3.2.
The set C(D) of continuous functions on D ⊆ R forms a commutative algebra with 1.
Proposition 3.4 Let f : D → R and g : E → R be functions with f(D) ⊆ E. Suppose f is
continuous at a ∈ D, and g is continuous at b = f(a) ∈ E. Then the composite function
g∘f : D → R is continuous at a.
Proof. Let (xₙ) be a sequence with xₙ ∈ D and lim_{n→∞} xₙ = a. Since f is continuous
at a, lim_{n→∞} f(xₙ) = b. Since g is continuous at b, lim_{n→∞} g(f(xₙ)) = g(b); hence
(g∘f)(xₙ) → (g∘f)(a). This completes the proof.
Example 3.5 f(x) = 1/x is continuous for x ≠ 0, and g(x) = sin x is continuous (see below);
hence, (g∘f)(x) = sin(1/x) is continuous on R \ {0}.

3.2.1 The Intermediate Value Theorem

In this paragraph, [a, b] ⊆ R is a closed, bounded interval, a, b ∈ R.
The intermediate value theorem is the basis for several existence theorems in analysis. It is
again equivalent to the order completeness of R.
Theorem 3.5 (Intermediate Value Theorem) Let f : [a, b] → R be a continuous function and
γ a real number between f(a) and f(b). Then there exists c ∈ [a, b] such that f(c) = γ.

The statement is clear from the graphical presentation. Nevertheless, it needs a proof, since
pictures do not prove anything.
The statement is wrong for rational numbers. For example, let D = {x ∈ Q | 1 ≤ x ≤ 2} and
f(x) = x² − 2. Then f(1) = −1 and f(2) = 2, but there is no p ∈ D with f(p) = 0 since 2 has
no rational square root.
Proof. Without loss of generality suppose f(a) ≤ γ ≤ f(b). Starting with [a₁, b₁] = [a, b], we
successively construct a nested sequence of intervals [aₙ, bₙ] such that f(aₙ) ≤ γ ≤ f(bₙ). As in
the proof of Proposition 2.12, [aₙ, bₙ] is one of the two half-intervals [aₙ₋₁, m] and [m, bₙ₋₁],
where m = (aₙ₋₁ + bₙ₋₁)/2 is the midpoint of the (n−1)st interval. By Proposition 2.11 the
monotonic sequences (aₙ) and (bₙ) both converge to a common point c. Since f is continuous,

    lim_{n→∞} f(aₙ) = f(lim_{n→∞} aₙ) = f(c) = f(lim_{n→∞} bₙ) = lim_{n→∞} f(bₙ).

By Proposition 2.14, f(aₙ) ≤ γ ≤ f(bₙ) implies

    lim_{n→∞} f(aₙ) ≤ γ ≤ lim_{n→∞} f(bₙ);

hence, γ = f(c).
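The halving construction in this proof is exactly the bisection method; a minimal sketch (not from the text) under the assumption f(a) ≤ γ ≤ f(b):

```python
# Bisection, following the proof: keep halving [a, b] while
# preserving the invariant f(a) <= gamma <= f(b).

def intermediate_value(f, a, b, gamma, tol=1e-12):
    assert f(a) <= gamma <= f(b)
    while b - a > tol:
        m = (a + b) / 2
        if f(m) <= gamma:
            a = m            # gamma still lies between f(a) and f(b)
        else:
            b = m
    return (a + b) / 2

# Example: a root of x^3 + x - 1, which is continuous with f(0) = -1 < 0 < 1 = f(1).
c = intermediate_value(lambda x: x**3 + x - 1, 0.0, 1.0, 0.0)
assert abs(c**3 + c - 1) < 1e-9
```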

Example 3.6 (a) We again show the existence of the nth root of a positive real number a > 0,
n ∈ N. By Example 3.3, the polynomial p(x) = xⁿ − a is continuous on R. We find p(0) =
−a < 0, and by Bernoulli's inequality

    p(1 + a) = (1 + a)ⁿ − a ≥ 1 + na − a = 1 + (n − 1)a ≥ 1 > 0.

Theorem 3.5 shows that p has a root in the interval (0, 1 + a).
(b) A polynomial p of odd degree with real coefficients has a real zero. Namely, by Example 3.3,
if the leading coefficient a_r of p is positive, lim_{x→−∞} p(x) = −∞ and lim_{x→+∞} p(x) = +∞.
Hence there are a and b with a < b and p(a) < 0 < p(b). Therefore, there is a c ∈ (a, b) such
that p(c) = 0.
There are polynomials of even degree having no real zeros, for example f(x) = x²ᵏ + 1.

Remark 3.3 Theorem 3.5 is not true for continuous functions f : Q → R. For example,
f(x) = x² − 2 is continuous, f(0) = −2 < 0 < 2 = f(2). However, there is no r ∈ Q
with f(r) = 0.

3.2.2 Continuous Functions on Bounded and Closed Intervals: The Theorem about Maximum and Minimum

We say that f : [a, b] → R is continuous if f is continuous on (a, b), f(a + 0) = f(a), and
f(b − 0) = f(b).
Theorem 3.6 (Theorem about Maximum and Minimum) Let f : [a, b] → R be continuous.
Then f is bounded and attains its maximum and its minimum; that is, there exists C > 0 with
|f(x)| ≤ C for all x ∈ [a, b], and there exist p, q ∈ [a, b] with

    sup_{a≤x≤b} f(x) = max_{a≤x≤b} f(x) = f(p)   and   inf_{a≤x≤b} f(x) = min_{a≤x≤b} f(x) = f(q).

Remarks 3.4 (a) The theorem is not true in case of open, half-open, or infinite intervals. For
example, f : (0, 1] → R, f(x) = 1/x, is continuous but not bounded. The function f : (0, 1) → R,
f(x) = x, is continuous and bounded; however, it does not attain a maximum or a minimum.
Finally, f(x) = x² on R₊ is continuous but not bounded.
(b) Put M := max_{a≤x≤b} f(x) and m := min_{a≤x≤b} f(x). By the Theorem about maximum and
minimum and the intermediate value theorem, for all γ ∈ R with m ≤ γ ≤ M there exists
c ∈ [a, b] such that f(c) = γ; that is, f attains all values between m and M.
Proof. We give the proof in case of the maximum. Replacing f by −f yields the proof for the
minimum. Let

    A = sup_{a≤x≤b} f(x) ∈ R ∪ {+∞}.


(Note that A = +∞ is equivalent to: f is not bounded above.) Then there exists a sequence
(xₙ) ⊆ [a, b] such that lim_{n→∞} f(xₙ) = A. Since (xₙ) is bounded, by the Theorem of
Weierstraß there exists a convergent subsequence (x_{n_k}) with p = lim_{k→∞} x_{n_k} and
a ≤ p ≤ b. Since f is continuous,

    A = lim_{k→∞} f(x_{n_k}) = f(p).

In particular, A is a finite real number; that is, f is bounded above by A, and f attains its
maximum A at the point p ∈ [a, b].

3.3 Uniform Continuity

Let D be a finite or infinite interval.
Definition 3.6 A function f : D → R is called uniformly continuous if for every ε > 0 there
exists a δ > 0 such that for all x, x′ ∈ D, |x − x′| < δ implies |f(x) − f(x′)| < ε.
f is uniformly continuous on [a, b] if and only if

    ∀ε > 0 ∃δ > 0 ∀x, y ∈ [a, b] :  |x − y| < δ  ⟹  |f(x) − f(y)| < ε.        (3.2)

Remark 3.5 If f is uniformly continuous on D, then f is continuous on D. However, the
converse is not true.
Consider, for example, f : (0, 1) → R, f(x) = 1/x, which is continuous. Suppose to the contrary
that f is uniformly continuous. Then to ε = 1 there exists δ > 0 with (3.2). By the Archimedean
property there exists n ∈ N such that 1/(2n) < δ. Consider xₙ = 1/n and yₙ = 1/(2n). Then
|xₙ − yₙ| = 1/(2n) < δ. However,

    |f(xₙ) − f(yₙ)| = 2n − n = n ≥ 1.

A contradiction! Hence, f is not uniformly continuous on (0, 1).
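The pairs xₙ = 1/n, yₙ = 1/(2n) used in this argument can be tabulated; a small numeric illustration (not from the text) that the points get close while the function values drift apart:

```python
# For f(x) = 1/x on (0, 1): |x_n - y_n| -> 0 while |f(x_n) - f(y_n)| = n -> infinity,
# so no single delta can serve eps = 1 on the whole interval.

f = lambda x: 1.0 / x
ns = (10, 100, 1000)
gaps = [(1.0 / n - 1.0 / (2 * n), f(1.0 / (2 * n)) - f(1.0 / n)) for n in ns]

for (dist, jump), n in zip(gaps, ns):
    assert abs(dist - 1.0 / (2 * n)) < 1e-12   # the points get arbitrarily close
    assert abs(jump - n) < 1e-9                # but the values differ by n
```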
Let us consider the differences between the concepts of continuity and uniform continuity. First,
uniform continuity is a property of a function on a set, whereas continuity can be defined at a
single point. To ask whether or not a given function is uniformly continuous at a certain point is
meaningless. Secondly, if f is continuous on D, then it is possible to find, for each ε > 0 and for
each point x₀ ∈ D, a number δ = δ(x₀, ε) > 0 having the property specified in Definition 3.5.
This δ depends on ε and on x₀. If f is, however, uniformly continuous on D, then it is possible,
for each ε > 0, to find one δ = δ(ε) > 0 which will do for all possible points x₀ of D.
That the two concepts are equivalent on bounded and closed intervals follows from the next
proposition.
Proposition 3.7 Let f : [a, b] → R be a continuous function on a bounded and closed interval.
Then f is uniformly continuous on [a, b].


Proof. Suppose to the contrary that f is not uniformly continuous. Then there exists ε₀ > 0
without matching δ > 0; for every positive integer n ∈ N there exists a pair of points xₙ, x′ₙ
with |xₙ − x′ₙ| < 1/n but |f(xₙ) − f(x′ₙ)| ≥ ε₀. Since [a, b] is bounded and closed, (xₙ) has
a subsequence (x_{n_k}) converging to some point p ∈ [a, b]. Since |xₙ − x′ₙ| < 1/n, the sequence
(x′_{n_k}) also converges to p. Hence

    lim_{k→∞} (f(x_{n_k}) − f(x′_{n_k})) = f(p) − f(p) = 0,

which contradicts |f(x_{n_k}) − f(x′_{n_k})| ≥ ε₀ for all k.

There exists an example of a bounded continuous function f : [0, 1) → R which is not uniformly
continuous, see [Kon90, p. 91].
Discontinuities
If x is a point in the domain of a function f at which f is not continuous, we say f is
discontinuous at x, or f has a discontinuity at x. It is customary to divide discontinuities into
two types.
Definition 3.7 Let f : (a, b) → R be a function which is discontinuous at a point x₀. If the
one-sided limits lim_{x→x₀+0} f(x) and lim_{x→x₀−0} f(x) exist, then f is said to have a simple
discontinuity, or a discontinuity of the first kind. Otherwise the discontinuity is said to be of
the second kind.
Example 3.7 (a) f(x) = sign(x) is continuous on R \ {0} since it is locally constant. Moreover,
f(0 + 0) = 1 and f(0 − 0) = −1. Hence, sign(x) has a simple discontinuity at x₀ = 0.
(b) Define f(x) = 0 if x is rational, and f(x) = 1 if x is irrational. Then f has a discontinuity
of the second kind at every point x since neither f(x + 0) nor f(x − 0) exists.
(c) Define

    f(x) = sin(1/x)  if x ≠ 0;
    f(x) = 0         if x = 0.

Consider the two sequences

    xₙ = 1/(π/2 + 2πn)   and   yₙ = 1/(πn).

Then both sequences (xₙ) and (yₙ) approach 0 from above, but lim_{n→∞} f(xₙ) = 1 and
lim_{n→∞} f(yₙ) = 0; hence f(0 + 0) does not exist. Therefore f has a discontinuity of the
second kind at x = 0. We have not yet shown that sin x is a continuous function. This will be
done in Section 3.5.2.
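The oscillation of sin(1/x) near 0 can be made concrete with two explicit sequences; a minimal sketch (not from the text):

```python
# Two sequences tending to 0 from above along which sin(1/x) has
# different constant values, so the right-hand limit at 0 cannot exist.
import math

xs = [1.0 / (math.pi / 2 + 2 * math.pi * n) for n in range(1, 50)]
ys = [1.0 / (math.pi * n) for n in range(1, 50)]

f_xs = [math.sin(1.0 / x) for x in xs]
f_ys = [math.sin(1.0 / y) for y in ys]

assert all(abs(v - 1.0) < 1e-9 for v in f_xs)   # limit along (x_n) is 1
assert all(abs(v) < 1e-9 for v in f_ys)         # limit along (y_n) is 0
assert xs[-1] < 1e-2 and ys[-1] < 1e-2          # both sequences approach 0
```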


3.4 Monotonic Functions

Definition 3.8 Let f be a real function on the interval (a, b). Then f is said to be monotonically
increasing on (a, b) if a < x < y < b implies f(x) ≤ f(y). If the last inequality is reversed, we
obtain the definition of a monotonically decreasing function. The class of monotonic functions
consists of both the increasing and the decreasing functions.
If a < x < y < b implies f(x) < f(y), the function is said to be strictly increasing. Similarly,
strictly decreasing functions are defined.
Theorem 3.8 Let f be a monotonically increasing function on (a, b). Then the one-sided limits
f(x + 0) and f(x − 0) exist at every point x of (a, b). More precisely,

    sup_{t∈(a,x)} f(t) = f(x − 0) ≤ f(x) ≤ f(x + 0) = inf_{t∈(x,b)} f(t).        (3.3)

Furthermore, if a < x < y < b, then

    f(x + 0) ≤ f(y − 0).        (3.4)

Analogous results evidently hold for monotonically decreasing functions.

Proof. See Appendix B to this chapter.

Proposition 3.9 Let f : [a, b] → R be a strictly monotonically increasing continuous function,
and let A = f(a) and B = f(b). Then f maps [a, b] bijectively onto [A, B], and the inverse
function

    f⁻¹ : [A, B] → R

is again strictly monotonically increasing and continuous.
Note that the inverse function f⁻¹ : [A, B] → [a, b] is defined by f⁻¹(y₀) = x₀, y₀ ∈ [A, B],
where x₀ is the unique element of [a, b] with f(x₀) = y₀. However, we can think of f⁻¹ as a
function into R. A similar statement is true for strictly decreasing functions.
Proof. By Remark 3.4, f maps [a, b] onto the whole closed interval [A, B] (intermediate value
theorem). Since x < y implies f(x) < f(y), f is injective and hence bijective. Hence, the
inverse mapping f⁻¹ : [A, B] → [a, b] exists and is again strictly increasing (u < v implies
f⁻¹(u) = x < y = f⁻¹(v); otherwise, x ≥ y would imply u ≥ v).
We show that g = f⁻¹ is continuous. Suppose (uₙ) is a sequence in [A, B] with uₙ → u,
uₙ = f(xₙ), and u = f(x). We have to show that (xₙ) converges to x. Suppose to the contrary
that there exists ε₀ > 0 such that |xₙ − x| ≥ ε₀ for infinitely many n. Since (xₙ) ⊆ [a, b] is
bounded, there exists a converging subsequence (x_{n_k}), say x_{n_k} → c as k → ∞. The above
inequality is true for the limit c, too, that is, |c − x| ≥ ε₀. By continuity of f, x_{n_k} → c implies
f(x_{n_k}) → f(c). That is, u_{n_k} → f(c). Since uₙ → u = f(x) and the limit of a converging
sequence is unique, f(c) = f(x). Since f is bijective, x = c; this contradicts |c − x| ≥ ε₀.
Hence, g is continuous at u.


Example 3.8 The function f : R₊ → R₊, f(x) = xⁿ, is continuous and strictly increasing.
Hence x = g(y) = ⁿ√y is continuous, too. This gives an alternative proof of Homework 5.5.
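Because f(x) = xⁿ is strictly increasing on R₊, its inverse can be evaluated by bisection exactly as in the intermediate value argument; a minimal sketch, not from the text:

```python
# Computing the inverse of the strictly increasing function f(x) = x^n
# on R+ by bisection, i.e. the nth root.

def nth_root(y, n, tol=1e-13):
    assert y >= 0
    a, b = 0.0, max(1.0, y)      # then a^n <= y <= b^n, since b^n >= b >= y for b >= 1
    while b - a > tol:
        m = (a + b) / 2
        if m**n <= y:
            a = m                # monotonicity: the root lies in [m, b]
        else:
            b = m
    return (a + b) / 2

r = nth_root(2.0, 2)
assert abs(r - 2**0.5) < 1e-10
assert abs(nth_root(27.0, 3) - 3.0) < 1e-10
```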

3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses

3.5.1 Exponential and Logarithm Functions
In this section we deal with the exponential function, which is one of the most important
functions in analysis. We use the exponential series to define the function. We will see that this
definition is consistent with the definition of eˣ for rational x ∈ Q as given in Chapter 1.
Definition 3.9 For z ∈ C put

    E(z) = ∑_{n=0}^∞ zⁿ/n! = 1 + z + z²/2 + z³/6 + ⋯.        (3.5)

Note that E(0) = 1 and E(1) = e by the definition on page 61. The radius of convergence of
the exponential series (3.5) is R = +∞, i.e. the series converges absolutely for all z ∈ C, see
Example 2.13 (c).
Applying Proposition 2.31 (Cauchy product) on multiplication of absolutely convergent series,
we obtain

    E(z)E(w) = (∑_{n=0}^∞ zⁿ/n!)(∑_{m=0}^∞ wᵐ/m!) = ∑_{n=0}^∞ ∑_{k=0}^n (z^k/k!)(w^{n−k}/(n−k)!)
             = ∑_{n=0}^∞ (1/n!) ∑_{k=0}^n (n choose k) z^k w^{n−k} = ∑_{n=0}^∞ (z + w)ⁿ/n!,

which gives us the important addition formula

    E(z + w) = E(z)E(w),    z, w ∈ C.        (3.6)

One consequence is that

    E(z)E(−z) = E(0) = 1,    z ∈ C.        (3.7)

This shows that E(z) ≠ 0 for all z. By (3.5), E(x) > 0 if x > 0; hence (3.7) shows E(x) > 0
for all real x.
Iteration of (3.6) gives

    E(z₁ + ⋯ + zₙ) = E(z₁) ⋯ E(zₙ).        (3.8)

Let us take z₁ = ⋯ = zₙ = 1. Since E(1) = e by (2.15), we obtain

    E(n) = eⁿ,    n ∈ N.        (3.9)


If p = m/n, where m, n are positive integers, then

    E(p)ⁿ = E(pn) = E(m) = eᵐ,        (3.10)

so that

    E(p) = eᵖ,    p ∈ Q₊.        (3.11)

It follows from (3.7) that E(−p) = e⁻ᵖ if p is positive and rational. Thus (3.11) holds for all
rational p. This justifies the redefinition

    eˣ := E(x),    x ∈ C.

The notation exp(x) is often used in place of eˣ.

Proposition 3.10 We can estimate the remainder term rₙ(z) := ∑_{k=n}^∞ z^k/k! as follows:

    |rₙ(z)| ≤ 2|z|ⁿ/n!    if    |z| ≤ (n + 1)/2.        (3.12)

Proof. We have

    |rₙ(z)| ≤ ∑_{k=n}^∞ |z|^k/k!
        = (|z|ⁿ/n!) (1 + |z|/(n+1) + |z|²/((n+1)(n+2)) + ⋯ + |z|^k/((n+1)⋯(n+k)) + ⋯)
        ≤ (|z|ⁿ/n!) (1 + |z|/(n+1) + |z|²/(n+1)² + ⋯ + |z|^k/(n+1)^k + ⋯).

Now |z| ≤ (n + 1)/2 implies

    |rₙ(z)| ≤ (|z|ⁿ/n!) (1 + 1/2 + 1/4 + ⋯ + 1/2^k + ⋯) = 2|z|ⁿ/n!.

Example 3.9 (a) Inserting n = 1 gives

    |E(z) − 1| = |r₁(z)| ≤ 2|z|,    |z| ≤ 1.

In particular, E(z) is continuous at z₀ = 0. Indeed, to ε > 0 choose δ = ε/2; then |z| < δ
implies |E(z) − 1| ≤ 2|z| < ε; hence lim_{z→0} E(z) = E(0) = 1 and E is continuous at 0.
(b) Inserting n = 2 gives

    |e^z − 1 − z| = |r₂(z)| ≤ |z|²,    |z| ≤ 3/2.

This implies

    |(e^z − 1)/z − 1| ≤ |z|,    0 < |z| ≤ 3/2.

The sandwich theorem gives lim_{z→0} (e^z − 1)/z = 1.
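The remainder estimate (3.12) can be checked numerically against partial sums of the exponential series; a minimal sketch, not from the text:

```python
# Check of (3.12): |E(z) - sum_{k<n} z^k/k!| <= 2|z|^n / n!
# whenever |z| <= (n+1)/2; here at z = 1, where E(1) = e.
import math

def partial_E(z, n):
    """sum_{k=0}^{n-1} z^k / k!"""
    return sum(z**k / math.factorial(k) for k in range(n))

z = 1.0
for n in range(2, 15):                    # |z| = 1 <= (n+1)/2 holds here
    remainder = abs(math.e - partial_E(z, n))
    bound = 2 * abs(z)**n / math.factorial(n)
    assert remainder <= bound             # the estimate (3.12) holds
```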


By (3.5), lim_{x→+∞} E(x) = +∞; hence (3.7) shows that lim_{x→−∞} E(x) = 0. By (3.5),
0 < x < y implies that E(x) < E(y); by (3.7), it follows that E(−y) < E(−x); hence E
is strictly increasing on the whole real axis.
The addition formula also shows that

    lim_{h→0} (E(z + h) − E(z)) = E(z) lim_{h→0} (E(h) − 1) = E(z)·0 = 0,        (3.13)

where lim_{h→0} E(h) = 1 directly follows from Example 3.9. Hence, E(z) is continuous for all
z.
Proposition 3.11 Let eˣ be defined on R by the power series (3.5). Then
(a) eˣ is continuous for all x.
(b) eˣ is a strictly increasing function and eˣ > 0.
(c) e^{x+y} = eˣ eʸ.
(d) lim_{x→+∞} eˣ = +∞,  lim_{x→−∞} eˣ = 0.
(e) lim_{x→+∞} xⁿ/eˣ = 0 for every n ∈ N.
Proof. We have already proved (a) to (d); (3.5) shows that

    eˣ > x^{n+1}/(n + 1)!

for x > 0, so that

    xⁿ/eˣ < (n + 1)!/x,

and (e) follows. Part (e) shows that eˣ tends faster to +∞ than any power of x, as x → +∞.

Since eˣ, x ∈ R, is a strictly increasing continuous function, by Proposition 3.9 eˣ has a strictly
increasing continuous inverse function log y, log : (0, +∞) → R. The function log is defined by

    e^{log y} = y,    y > 0,        (3.14)

or, equivalently, by

    log(eˣ) = x,    x ∈ R.        (3.15)

Writing u = eˣ and v = eʸ, (3.6) gives

    log(uv) = log(eˣ eʸ) = log(e^{x+y}) = x + y,

such that

    log(uv) = log u + log v,    u > 0, v > 0.        (3.16)


This shows that log has the familiar property which makes the logarithm useful for
computations. Another customary notation for log x is ln x. Proposition 3.11 shows that

    lim_{x→+∞} log x = +∞,    lim_{x→0+0} log x = −∞.

We summarize what we have proved so far.

Proposition 3.12 Let the logarithm log : (0, +∞) → R be the inverse function to the
exponential function eˣ. Then
(a) log is continuous on (0, +∞).
(b) log is strictly increasing.
(c) log(uv) = log u + log v for u, v > 0.
(d) lim_{x→+∞} log x = +∞,  lim_{x→0+0} log x = −∞.

It is seen from (3.14) that

    x = e^{log x},  and hence  xⁿ = e^{n log x}        (3.17)

if x > 0 and n is an integer. Similarly, if m is a positive integer, we have

    x^{1/m} = e^{(log x)/m}.        (3.18)

Combining (3.17) and (3.18), we obtain

    x^α = e^{α log x}        (3.19)

for any rational α. We now define x^α, for any real α and x > 0, by (3.19). In the same way, we
redefine the exponential function

    aˣ = e^{x log a},    a > 0,  x ∈ R.

It turns out that in case a ≠ 1, f(x) = aˣ is strictly monotonic and continuous since eˣ is so.
Hence, f has a strictly monotonic continuous inverse function log_a : (0, +∞) → R defined by

    log_a(aˣ) = x,  x ∈ R,    a^{log_a x} = x,  x > 0.
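The redefinition aˣ = e^{x log a} can be implemented literally; a minimal sketch, not from the text. (The change-of-base formula log_a y = log y / log a used for the inverse is the standard identity, not derived above.)

```python
# a^x := e^{x log a}, a > 0, and its inverse log_a, directly from the definitions.
import math

def power(a, x):
    return math.exp(x * math.log(a))      # a^x = e^{x log a}

def log_base(a, y):
    return math.log(y) / math.log(a)      # log_a(y) = log y / log a, a != 1

a, x = 3.0, 2.5
y = power(a, x)
assert abs(y - 3.0**2.5) < 1e-9           # agrees with the built-in power
assert abs(log_base(a, y) - x) < 1e-12    # log_a(a^x) = x
```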

3.5.2 Trigonometric Functions and their Inverses

[Figure: the sine and cosine functions.]

In this section we redefine the trigonometric functions using the exponential function e^z. We
will see that the new definitions coincide with the old ones.

Definition 3.10 For z ∈ C define

    cos z = (e^{iz} + e^{−iz})/2,    sin z = (e^{iz} − e^{−iz})/(2i),        (3.20)

such that

    e^{iz} = cos z + i sin z    (Euler formula).        (3.21)

3 Functions and Continuity

90

Proposition 3.13 (a) The functions sin z and cos z can be written as power series which
converge absolutely for all z ∈ C:

    cos z = ∑_{n=0}^∞ (−1)ⁿ z^{2n}/(2n)! = 1 − z²/2 + z⁴/4! − z⁶/6! + ⋯,
    sin z = ∑_{n=0}^∞ (−1)ⁿ z^{2n+1}/(2n + 1)! = z − z³/3! + z⁵/5! − + ⋯.        (3.22)

(b) sin x and cos x are real valued and continuous on R, where cos x is an even and sin x is an
odd function, i.e. cos(−x) = cos x, sin(−x) = −sin x. We have

    sin²x + cos²x = 1;        (3.23)
    cos(x + y) = cos x cos y − sin x sin y;
    sin(x + y) = sin x cos y + cos x sin y.        (3.24)

Proof. (a) Inserting iz into (3.5) in place of z and using (iⁿ)_{n≥0} = (1, i, −1, −i, 1, i, …), we
have

    e^{iz} = ∑_{n=0}^∞ (iz)ⁿ/n! = ∑_{k=0}^∞ (−1)^k z^{2k}/(2k)! + i ∑_{k=0}^∞ (−1)^k z^{2k+1}/(2k + 1)!.

Inserting −iz into (3.5) in place of z, we have

    e^{−iz} = ∑_{n=0}^∞ (−i)ⁿ zⁿ/n! = ∑_{k=0}^∞ (−1)^k z^{2k}/(2k)! − i ∑_{k=0}^∞ (−1)^k z^{2k+1}/(2k + 1)!.

Inserting this into (3.20) proves (a).

(b) Since the exponential function is continuous on C, sin z and cos z are also continuous on
C. In particular, their restrictions to R are continuous. Now let x ∈ R; then the complex
conjugate of ix is −ix, so by Homework 11.3, e^{−ix} is the complex conjugate of e^{ix}. By (3.20)
we obtain

    cos x = (e^{ix} + e^{−ix})/2 = Re e^{ix},

and similarly

    sin x = Im e^{ix}.

Hence, sin x and cos x are real for real x.
For x ∈ R we have |e^{ix}| = 1. Namely, by (3.7) and Homework 11.3,

    |e^{ix}|² = e^{ix}·e^{−ix} = e⁰ = 1,

so that for x ∈ R

    |e^{ix}| = 1.        (3.25)

[Figure: the point e^{ix} = cos x + i sin x on the unit circle.]

On the other hand, the Euler formula and the fact that cos x and sin x are real give

    1 = |e^{ix}| = |cos x + i sin x| = √(cos²x + sin²x).


Hence, e^{ix} = cos x + i sin x is a point on the unit circle in the complex plane, and cos x and
sin x are its coordinates. This establishes the equivalence between the old definition of cos x as
the length of the adjacent side in a right triangle with hypotenuse 1 and angle x·180°/π, and
the power series definition of cos x. The only missing link is: the length of the arc from 1 to
e^{ix} is x.
It follows directly from the definition that cos(−z) = cos z and sin(−z) = −sin z for all
z ∈ C. The addition laws for sin x and cos x follow from (3.6) applied to e^{i(x+y)}. This
completes the proof of (b).

Lemma 3.14 There exists a unique number ξ ∈ (0, 2) such that cos ξ = 0. We define the
number π by

    π = 2ξ.        (3.26)

The proof is based on the following lemma.


Lemma 3.15 (a) 0 < x < √6 implies

    x − x³/6 < sin x < x.        (3.27)

(b) 0 < x < 2 implies

    0 < cos x,        (3.28)
    0 < sin x < x < sin x / cos x,        (3.29)
    cos²x < 1/(1 + x²).        (3.30)

(c) cos x is strictly decreasing on [0, π], whereas sin x is strictly increasing on [−π/2, π/2].

In particular, the sandwich theorem applied to statement (a), 1 − x²/6 < (sin x)/x < 1 as
x → 0 + 0, gives lim_{x→0+0} (sin x)/x = 1. Since (sin x)/x is an even function, this implies
lim_{x→0} (sin x)/x = 1.
The proof of the lemma is in Appendix B to this chapter.
Proof of Lemma 3.14. cos 0 = 1. By Lemma 3.15, cos²1 < 1/2. By the double angle
formula for cosine, cos 2 = 2cos²1 − 1 < 0. By continuity of cos x and Theorem 3.5, cos has
a zero in the interval (0, 2).
By the addition laws,

    cos x − cos y = −2 sin((x + y)/2) sin((x − y)/2).

So by Lemma 3.15, 0 < x < y < 2 implies 0 < sin((x + y)/2) and sin((x − y)/2) < 0;
therefore cos x > cos y. Hence, cos x is strictly decreasing on (0, 2). The zero is therefore
unique.
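Since cos is strictly decreasing on (0, 2) with a sign change, the zero ξ (and hence π = 2ξ) can be located by bisection using only the power series (3.22); a minimal sketch, not from the text:

```python
# pi = 2*xi, where xi is the unique zero of cos in (0, 2); bisection applies
# because cos is strictly decreasing there, with cos(0) = 1 > 0 > cos(2).
import math

def cos_series(x, terms=30):
    """cos via its power series (3.22); 30 terms are ample for |x| <= 2."""
    return sum((-1)**n * x**(2 * n) / math.factorial(2 * n) for n in range(terms))

a, b = 0.0, 2.0
while b - a > 1e-12:
    m = (a + b) / 2
    if cos_series(m) > 0:
        a = m                      # the zero lies to the right of m
    else:
        b = m

pi_approx = 2 * (a + b) / 2
assert abs(pi_approx - math.pi) < 1e-10
```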



By definition, cos 2 = 0; and (3.23) shows sin(/2) = 1. By (3.27), sin /2 = 1. Thus
ei/2 = i, and the addition formula for ez gives
e i = 1,

e2 i = 1;

(3.31)

z C.

(3.32)

hence,
ez+2 i = ez ,

Proposition 3.16 (a) The function ez is periodic with period 2i.


We have eix = 1, x R, if and only if x = 2k, k Z.
(b) The functions sin z and cos z are periodic with period 2.
The real zeros of the sine and cosine functions are {k | k Z} and {/2 + k | k Z},
respectively.
Proof. We have already proved (a). (b) follows from (a) and (3.20).

Tangent and Cotangent Functions

[Figure: the tangent and cotangent functions.]

For x ≠ π/2 + kπ, k ∈ Z, define

    tan x = sin x / cos x.        (3.33)

For x ≠ kπ, k ∈ Z, define

    cot x = cos x / sin x.        (3.34)

Lemma 3.17 (a) tan x is continuous at x ∈ R \ {π/2 + kπ | k ∈ Z}, and tan(x + π) = tan x;
(b) lim_{x→π/2−0} tan x = +∞,  lim_{x→−π/2+0} tan x = −∞;
(c) tan x is strictly increasing on (−π/2, π/2).

Proof. (a) is clear by Proposition 3.3 since sin x and cos x are continuous. We show only (c) and
leave (b) as an exercise. Let 0 < x < y < π/2. Then 0 < sin x < sin y and cos x > cos y > 0.
Therefore

    tan x = sin x / cos x < sin y / cos y = tan y.

Hence, tan is strictly increasing on (0, π/2). Since tan(−x) = −tan(x), tan is strictly
increasing on the whole interval (−π/2, π/2).

Similarly to Lemma 3.17 one proves the next lemma.

Lemma 3.18 (a) cot x is continuous at x ∈ R \ {kπ | k ∈ Z}, and cot(x + π) = cot x;
(b) lim_{x→0−0} cot x = −∞,  lim_{x→0+0} cot x = +∞;
(c) cot x is strictly decreasing on (0, π).

Inverse Trigonometric Functions

We have seen in Lemma 3.15 that cos x is strictly decreasing on [0, π] and sin x is strictly
increasing on [-π/2, π/2]. Obviously, the images are cos[0, π] = sin[-π/2, π/2] = [-1, 1].
Using Proposition 3.9 we obtain that the inverse functions exist and are monotonic and
continuous.

Proposition 3.19 (and Definition) There exists the inverse function to cos,

    arccos : [-1, 1] → [0, π],                                         (3.35)

given by arccos(cos x) = x, x ∈ [0, π], or cos(arccos y) = y, y ∈ [-1, 1]. The function
arccos x is strictly decreasing and continuous.
There exists the inverse function to sin,

    arcsin : [-1, 1] → [-π/2, π/2],                                    (3.36)

given by arcsin(sin x) = x, x ∈ [-π/2, π/2], or sin(arcsin y) = y, y ∈ [-1, 1]. The function
arcsin x is strictly increasing and continuous.

[Figure: graphs of arccos and arcsin on [-1, 1]]

Note that arcsin x + arccos x = π/2 if x ∈ [-1, 1]. Indeed, let y = arcsin x; then x = sin y =
cos(π/2 - y). Since y ∈ [-π/2, π/2], we have π/2 - y ∈ [0, π], and hence arccos x = π/2 - y.
Therefore y + arccos x = π/2.
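The identity arcsin x + arccos x = π/2 and the defining relations of the inverse functions are
easy to check numerically. A minimal sketch using Python's standard math module (added
here for illustration; not part of the original notes):

```python
import math

# Check arcsin x + arccos x = pi/2 on a grid of points in [-1, 1],
# together with the defining relations cos(arccos y) = y and sin(arcsin y) = y.
for i in range(-10, 11):
    x = i / 10.0
    assert abs(math.asin(x) + math.acos(x) - math.pi / 2) < 1e-12
    assert abs(math.cos(math.acos(x)) - x) < 1e-12
    assert abs(math.sin(math.asin(x)) - x) < 1e-12
print("arcsin x + arccos x = pi/2 verified on [-1, 1]")
```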

3 Functions and Continuity

94

By Lemma 3.17, tan x is strictly increasing on (-π/2, π/2). Therefore, there exists the
inverse function on the image tan(-π/2, π/2) = R.

[Figures: graphs of arctan and arccot on R]

Proposition 3.20 (and Definition) There exists the inverse function to tan,

    arctan : R → (-π/2, π/2),                                          (3.37)

given by arctan(tan x) = x, x ∈ (-π/2, π/2), or tan(arctan y) = y, y ∈ R. The function
arctan x is strictly increasing and continuous.
There exists the inverse function to cot,

    arccot : R → (0, π),                                               (3.38)

given by arccot(cot x) = x, x ∈ (0, π), or cot(arccot y) = y, y ∈ R. The function
arccot x is strictly decreasing and continuous.

3.5.3 Hyperbolic Functions and their Inverses

Hyperbolic Cosine and Sine

[Figure: graphs of cosh and sinh]

The functions

    sinh x = (e^x - e^{-x})/2,                                         (3.39)
    cosh x = (e^x + e^{-x})/2,                                         (3.40)
    tanh x = (e^x - e^{-x})/(e^x + e^{-x}) = sinh x / cosh x,          (3.41)
    coth x = (e^x + e^{-x})/(e^x - e^{-x}) = cosh x / sinh x           (3.42)

are called hyperbolic sine, hyperbolic cosine, hyperbolic tangent, and hyperbolic cotangent,
respectively.
There are many analogies between these functions and their ordinary trigonometric
counterparts.

[Figures: graphs of the hyperbolic tangent and hyperbolic cotangent]
The functions sinh x and tanh x are strictly increasing with sinh(R) = R and tanh(R) =
(-1, 1). Hence, their inverse functions are defined on R and on (-1, 1), respectively, and are
also strictly increasing and continuous. The function

    arsinh : R → R                                                     (3.43)

is given by arsinh(sinh(x)) = x, x ∈ R, or sinh(arsinh(y)) = y, y ∈ R.
The function

    artanh : (-1, 1) → R                                               (3.44)

is defined by artanh(tanh(x)) = x, x ∈ R, or tanh(artanh(y)) = y, y ∈ (-1, 1).

The function cosh is strictly increasing on the half-line R₊ with cosh(R₊) = [1, ∞). Hence,
the inverse function is defined on [1, ∞), taking values in R₊. It is also strictly increasing and
continuous. The function

    arcosh : [1, ∞) → R₊                                               (3.45)

is defined via arcosh(cosh(x)) = x, x ≥ 0, or by cosh(arcosh(y)) = y, y ≥ 1.

The function coth is strictly decreasing on x < 0 and on x > 0, with coth(R \ {0}) =
R \ [-1, 1]. Hence, the inverse function is defined on R \ [-1, 1], taking values in R \ {0}. It is
also strictly decreasing and continuous. The function

    arcoth : R \ [-1, 1] → R \ {0}                                     (3.46)

is defined via arcoth(coth(x)) = x, x ≠ 0, or by coth(arcoth(y)) = y, y < -1 or y > 1.
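The defining relations of the inverse hyperbolic functions can be spot-checked numerically;
Python's math module provides asinh, atanh, and acosh. A small sketch (not part of the
original text; the tolerances are pragmatic choices, since tanh saturates for large x and cosh
is flat near 0):

```python
import math

# Verify the defining relations of the inverse hyperbolic functions:
# arsinh(sinh x) = x on R, artanh(tanh x) = x on R, arcosh(cosh x) = x for x >= 0.
for x in [0.0, 0.5, 1.0, 2.0, 5.0]:
    assert abs(math.asinh(math.sinh(x)) - x) < 1e-12
    assert abs(math.atanh(math.tanh(x)) - x) < 1e-9   # tanh saturates near 1
    assert abs(math.acosh(math.cosh(x)) - x) < 1e-6   # cosh is flat near 0

# arsinh is odd, like sinh itself:
assert abs(math.asinh(-2.0) + math.asinh(2.0)) < 1e-12
print("inverse hyperbolic identities verified")
```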

3.6 Appendix B

3.6.1 Monotonic Functions have One-Sided Limits

Proof of Theorem 3.8. By hypothesis, the set {f(t) | a < t < x} is bounded above by f(x),
and therefore has a least upper bound which we shall denote by A. Evidently A ≤ f(x). We
have to show that A = f(x - 0).

Let ε > 0 be given. It follows from the definition of A as a least upper bound that there exists
δ > 0 such that a < x - δ < x and

    A - ε < f(x - δ) ≤ A.                                              (3.47)

Since f is monotonic, we have

    f(x - δ) ≤ f(t) ≤ A,   if x - δ < t < x.                           (3.48)

Combining (3.47) and (3.48), we see that

    | f(t) - A | < ε   if   x - δ < t < x.

Hence f(x - 0) = A.
The second half of (3.3) is proved in precisely the same way. Next, if a < x < y < b, we see
from (3.3) that

    f(x + 0) = inf_{x<t<b} f(t) = inf_{x<t<y} f(t).                    (3.49)

The last equality is obtained by applying (3.3) to (a, y) instead of (a, b). Similarly,

    f(y - 0) = sup_{a<t<y} f(t) = sup_{x<t<y} f(t).                    (3.50)

Comparison of (3.49) and (3.50) gives (3.4).

3.6.2 Proofs for sin x and cos x inequalities

Proof of Lemma 3.15. (a) By (3.22),

    cos x = (1 - x²/2!) + x⁴(1/4! - x²/6!) + ⋯ .

0 < x < √2 implies 1 - x²/2 > 0 and, moreover, 1/(2n)! - x²/(2n+2)! > 0 for all n ∈ N;
hence C(x) > 0.
By (3.22),

    sin x = x(1 - x²/3!) + x⁵(1/5! - x²/7!) + ⋯ .

Now,

    1 - (1/3!)x² > 0 ⟺ x² < 6,   1/5! - (1/7!)x² > 0 ⟺ x² < 42,   … .

Hence, S(x) > 0 if 0 < x < √6. This gives (3.27). Similarly,

    x - sin x = x³(1/3! - x²/5!) + x⁷(1/7! - x²/9!) + ⋯ ,

and we obtain sin x < x if 0 < x < √20. Finally we have to check whether sin x - x cos x > 0;
equivalently,

    0 <? x³(1/2! - 1/3!) - x⁵(1/4! - 1/5!) + x⁷(1/6! - 1/7!) - ⋯ ,
    0 <? x³(2/3! - x²·4/5!) + x⁷(6/7! - x²·8/9!) + ⋯ .

Now 0 < x < √10 implies

    2n/(2n+1)! - x²(2n+2)/(2n+3)! > 0

for all n ∈ N. This completes the proof of (a).
(b) Using (3.23), we get

    0 < x cos x < sin x  ⟹  0 < x² cos² x < sin² x
                          ⟹  x² cos² x + cos² x < 1  ⟹  cos² x < 1/(1 + x²).

(c) In the proof of Lemma 3.14 we have seen that cos x is strictly decreasing in (0, π/2). By
(3.23), sin x = √(1 - cos² x) is strictly increasing there. Since sin x is an odd function, sin x is
strictly increasing on the whole interval [-π/2, π/2]. Since cos x = -sin(x - π/2), the statement
for cos x follows.

3.6.3 Estimates for π

Proposition 3.21 For real x we have

    cos x = Σ_{k=0}^{n} (-1)^k x^{2k}/(2k)! + r_{2n+2}(x),             (3.51)

    sin x = Σ_{k=0}^{n} (-1)^k x^{2k+1}/(2k+1)! + r_{2n+3}(x),         (3.52)

where

    | r_{2n+2}(x) | ≤ | x |^{2n+2}/(2n+2)!   if   | x | ≤ 2n + 3,      (3.53)
    | r_{2n+3}(x) | ≤ | x |^{2n+3}/(2n+3)!   if   | x | ≤ 2n + 4.      (3.54)

Proof. Let

    r_{2n+2}(x) = x^{2n+2}/(2n+2)! · (1 - x²/((2n+3)(2n+4)) + ⋯).

Put

    a_k := x^{2k}/((2n+3)(2n+4) ⋯ (2n+2(k+1))).

Then we have, by definition,

    r_{2n+2}(x) = x^{2n+2}/(2n+2)! · (1 - a₁ + a₂ - ⋯).

Since

    a_k = a_{k-1} · x²/((2n+2k+1)(2n+2k+2)),

| x | ≤ 2n + 3 implies

    1 > a₁ > a₂ > ⋯ > 0,

and finally, as in the proof of the Leibniz criterion,

    0 ≤ 1 - a₁ + a₂ - a₃ + ⋯ ≤ 1.

Hence, | r_{2n+2}(x) | ≤ | x |^{2n+2}/(2n+2)!. The estimate for the remainder of the sine series is
similar.

This is an application of Proposition 3.21. For numerical calculations it is convenient to use the
following (Horner-like) order of operations:

    cos x = (⋯((-x²/((2n)(2n-1)) + 1)(-x²/((2n-2)(2n-3))) + 1)⋯)(-x²/(1·2)) + 1 + r_{2n+2}(x).

First we compute cos 1.5 and cos 1.6. Choosing n = 7 we obtain

    cos x = ((((((-x²/182 + 1)(-x²/132) + 1)(-x²/90) + 1)(-x²/56) + 1)(-x²/30) + 1)(-x²/12) + 1)(-x²/2) + 1 + r₁₆(x).

By Proposition 3.21,

    | r₁₆(x) | ≤ | x |¹⁶/16! ≤ 0.9·10⁻¹⁰   if   | x | ≤ 1.6.

The calculations give

    cos 1.5 = 0.07073720163 ± 20·10⁻¹¹ > 0,   cos 1.6 = -0.02919952239 ± 20·10⁻¹¹ < 0.

By the intermediate value theorem, 1.5 < π/2 < 1.6.
Now we compute cos x for two values of x which are close to π/2, obtained by the linear
interpolation

    a = 1.5 + 0.1·cos 1.5/(cos 1.5 - cos 1.6) = 1.57078… :

    cos 1.5707 = 0.000096326273 ± 20·10⁻¹¹ > 0,
    cos 1.5708 = -0.00000367326 ± 20·10⁻¹¹ < 0.

Hence, 1.5707 < π/2 < 1.5708.
The next linear interpolation,

    b = 1.5707 + 0.0001·cos 1.5707/(cos 1.5707 - cos 1.5708) = 1.570796326… ,

gives

    cos 1.570796326 = 0.00000000073 ± 20·10⁻¹¹ > 0,
    cos 1.570796327 = -0.00000000027 ± 20·10⁻¹¹ < 0.

Therefore 1.570796326 < π/2 < 1.570796327, so that

    π = 3.141592653 ± 10⁻⁹.
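The whole computation can be reproduced in a few lines. The sketch below (added here, not
in the original notes) evaluates the degree-14 cosine polynomial in exactly the nested order
shown above and repeats the first interpolation step:

```python
import math

def cos_horner(x, n=7):
    """Evaluate the cosine Taylor polynomial of degree 2n in the nested order
    above: start with the innermost factor -x^2/(2n(2n-1)), add 1, multiply
    by the next factor, and so on down to -x^2/(1*2)."""
    x2 = x * x
    s = 1.0
    for k in range(n, 0, -1):  # k = n, n-1, ..., 1
        s = 1.0 - s * x2 / ((2 * k) * (2 * k - 1))
    return s

# Sign change brackets pi/2, as in the text:
assert cos_horner(1.5) > 0 and cos_horner(1.6) < 0

# One linear interpolation step: a = 1.5 + 0.1*cos(1.5)/(cos(1.5) - cos(1.6)).
c15, c16 = cos_horner(1.5), cos_horner(1.6)
a = 1.5 + 0.1 * c15 / (c15 - c16)
assert abs(a - 1.57078) < 1e-4

# The degree-14 polynomial matches the library cosine within the stated bound:
assert abs(cos_horner(1.5) - math.cos(1.5)) < 1e-10
```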


Chapter 4
Differentiation
4.1 The Derivative of a Function
We define the derivative of a function and prove the main properties like the product, quotient,
and chain rules. We relate the derivative of a function to the derivative of its inverse function.
We prove the mean value theorem and consider local extrema. Taylor's theorem will be formulated.
Definition 4.1 Let f : (a, b) → R be a function and x₀ ∈ (a, b). If the limit

    lim_{x→x₀} (f(x) - f(x₀))/(x - x₀)                                 (4.1)

exists, we call f differentiable at x₀. The limit is denoted by f′(x₀). We say f is differentiable
if f is differentiable at every point x ∈ (a, b). We thus have associated to every function f a
function f′ whose domain is the set of points x₀ where the limit (4.1) exists; f′ is called the
derivative of f.
Sometimes the Leibniz notation is used to denote the derivative of f:

    f′(x₀) = df(x₀)/dx = (d/dx) f(x₀).

Remarks 4.1 (a) Replacing x - x₀ by h, we see that f′(x₀) = lim_{h→0} (f(x₀ + h) - f(x₀))/h.
(b) The limits

    lim_{h→0-0} (f(x₀ + h) - f(x₀))/h,   lim_{h→0+0} (f(x₀ + h) - f(x₀))/h

are called left-hand and right-hand derivatives of f at x₀, respectively. In particular, for
f : [a, b] → R, we can consider the right-hand derivative at a and the left-hand derivative at b.

Example 4.1 (a) For f(x) = c, the constant function,

    f′(x₀) = lim_{x→x₀} (f(x) - f(x₀))/(x - x₀) = lim_{x→x₀} (c - c)/(x - x₀) = 0.

(b) For f(x) = x,

    f′(x₀) = lim_{x→x₀} (x - x₀)/(x - x₀) = 1.


(c) The slope of the tangent line. Given a function f : (a, b) → R which is differentiable at x₀,
the number f′(x₀) is the slope of the tangent line to the graph of f through the point (x₀, f(x₀)).

[Figure: secant line through (x₀, f(x₀)) and (x₁, f(x₁)) with angle α₁]

The slope of the secant line through (x₀, f(x₀)) and (x₁, f(x₁)) is

    m = tan α₁ = (f(x₁) - f(x₀))/(x₁ - x₀).

One can see: if x₁ approaches x₀, the secant line through (x₀, f(x₀)) and (x₁, f(x₁)) approaches
the tangent line through (x₀, f(x₀)). Hence, the slope of the tangent line is the limit of the
slopes of the secant lines as x approaches x₀:

    f′(x₀) = lim_{x→x₀} (f(x) - f(x₀))/(x - x₀).
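The convergence of secant slopes to the tangent slope is easy to observe numerically. A small
sketch (added here, not in the original) for f(x) = x² at x₀ = 1, where f′(x₀) = 2:

```python
def f(x):
    return x * x

x0, exact = 1.0, 2.0  # f'(x) = 2x, so f'(1) = 2

# Secant slopes (f(x0+h) - f(x0))/h approach the tangent slope as h -> 0;
# here the slope is exactly 2 + h, so the error equals h.
prev_err = float("inf")
for h in [0.1, 0.01, 0.001, 0.0001]:
    slope = (f(x0 + h) - f(x0)) / h
    err = abs(slope - exact)
    assert err < prev_err  # the errors decrease monotonically
    prev_err = err
assert prev_err < 1e-3
```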

Proposition 4.1 If f is differentiable at x₀ ∈ (a, b), then f is continuous at x₀.

Proof. By Proposition 3.2 we have

    lim_{x→x₀} (f(x) - f(x₀)) = lim_{x→x₀} (f(x) - f(x₀))/(x - x₀) · (x - x₀)
                              = f′(x₀) · lim_{x→x₀} (x - x₀) = f′(x₀) · 0 = 0.

The converse of this proposition is not true. For example, f(x) = | x | is continuous on R but
differentiable only on R \ {0}, since lim_{h→0+0} | h |/h = 1 whereas lim_{h→0-0} | h |/h = -1.
Later we will become acquainted with a function which is continuous on the whole line
without being differentiable at any point!
Proposition 4.2 Let f : (r, s) → R be a function and a ∈ (r, s). Then f is differentiable at a if
and only if there exist a number c ∈ R and a function φ defined in a neighborhood of a such
that

    f(x) = f(a) + (x - a)c + φ(x),                                     (4.2)

where

    lim_{x→a} φ(x)/(x - a) = 0.                                        (4.3)

In this case f′(a) = c.

The proposition says that a function f differentiable at a can be approximated by a linear
function, in our case by

    y = f(a) + (x - a)f′(a).

The graph of this linear function is the tangent line to the graph of f at the point (a, f(a)). Later
we will use this point of view to define differentiability of functions f : Rⁿ → Rᵐ.


Proof. Suppose first f satisfies (4.2) and (4.3). Then

    lim_{x→a} (f(x) - f(a))/(x - a) = lim_{x→a} (c + φ(x)/(x - a)) = c.

Hence, f is differentiable at a with f′(a) = c.
Now, let f be differentiable at a with f′(a) = c. Put φ(x) = f(x) - f(a) - (x - a)f′(a). Then

    lim_{x→a} φ(x)/(x - a) = lim_{x→a} ((f(x) - f(a))/(x - a) - f′(a)) = 0.

Let us compute the linear function whose graph is the tangent line through (x₀, f(x₀)).

[Figure: right triangle P P₀ Q₀ formed by the tangent line through P₀ = (x₀, f(x₀)) and P = (x, y)]

Consider the rectangular triangle P P₀ Q₀. By Example 4.1 (c) we have

    f′(x₀) = tan α = (y - y₀)/(x - x₀),

such that the tangent line has the equation

    y = g(x) = f(x₀) + f′(x₀)(x - x₀).

This function is called the linearization of f at x₀. It is also the Taylor polynomial of degree 1
of f at x₀; see Section 4.5 below.
Proposition 4.3 Suppose f and g are defined on (a, b) and are differentiable at a point x ∈
(a, b). Then f + g, f g, and f/g are differentiable at x and

(a) (f + g)′(x) = f′(x) + g′(x);
(b) (f g)′(x) = f′(x)g(x) + f(x)g′(x);
(c) (f/g)′(x) = (f′(x)g(x) - f(x)g′(x))/g(x)².

In (c), we assume that g(x) ≠ 0.


Proof. (a) Since

    ((f + g)(x + h) - (f + g)(x))/h = (f(x + h) - f(x))/h + (g(x + h) - g(x))/h,

the claim follows from Proposition 3.2.
(b) Let h = f g and let t be variable. Then

    h(t) - h(x) = f(t)(g(t) - g(x)) + g(x)(f(t) - f(x)),
    (h(t) - h(x))/(t - x) = f(t)·(g(t) - g(x))/(t - x) + g(x)·(f(t) - f(x))/(t - x).

Noting that f(t) → f(x) as t → x, (b) follows.
(c) Next let h = f/g. Then

    (h(t) - h(x))/(t - x) = (f(t)/g(t) - f(x)/g(x))/(t - x)
        = (f(t)g(x) - f(x)g(t))/(g(x)g(t)(t - x))
        = (1/(g(t)g(x))) · (f(t)g(x) - f(x)g(x) + f(x)g(x) - f(x)g(t))/(t - x)
        = (1/(g(t)g(x))) · (g(x)·(f(t) - f(x))/(t - x) - f(x)·(g(t) - g(x))/(t - x)).

Letting t → x, and applying Propositions 3.2 and 4.1, we obtain (c).


Example 4.2 (a) f(x) = xⁿ, n ∈ Z. We will prove f′(x) = nx^{n-1} by induction on n ∈ N.
The cases n = 0, 1 are OK by Example 4.1. Suppose the statement is true for some fixed n. We
will show that (x^{n+1})′ = (n + 1)xⁿ.
By the product rule and the induction hypothesis,

    (x^{n+1})′ = (xⁿ·x)′ = (xⁿ)′·x + xⁿ·(x)′ = nx^{n-1}·x + xⁿ = (n + 1)xⁿ.

This proves the claim for positive integers n. For negative n consider f(x) = 1/x^{-n} and use
the quotient rule.
(b) (eˣ)′ = eˣ. Indeed,

    (eˣ)′ = lim_{h→0} (e^{x+h} - eˣ)/h = lim_{h→0} (eˣe^h - eˣ)/h = eˣ lim_{h→0} (e^h - 1)/h = eˣ;   (4.4)

the last equation simply follows from Example 3.9 (b).


(c) (sin x)′ = cos x, (cos x)′ = -sin x. Using sin(x + y) - sin(x - y) = 2 cos(x) sin(y) we
have

    (sin x)′ = lim_{h→0} (sin(x + h) - sin x)/h = lim_{h→0} 2 cos(x + h/2) sin(h/2)/h
             = lim_{h→0} cos(x + h/2) · lim_{h→0} sin(h/2)/(h/2).

Since cos x is continuous and lim_{h→0} sin h/h = 1 (by the argument after Lemma 3.15), we obtain
(sin x)′ = cos x. The proof for cos x is analogous.
(d) (tan x)′ = 1/cos² x. Using the quotient rule for the function tan x = sin x/cos x we have

    (tan x)′ = ((sin x)′ cos x - sin x (cos x)′)/cos² x = (cos² x + sin² x)/cos² x = 1/cos² x.

The next proposition deals with composite functions and is probably the most important
statement about derivatives.

Proposition 4.4 (Chain rule) Let g : (α, β) → R be differentiable at x₀ ∈ (α, β) and let
f : (a, b) → R be differentiable at y₀ = g(x₀) ∈ (a, b). Then h = f ∘ g is differentiable at
x₀, and

    h′(x₀) = f′(y₀)g′(x₀).                                             (4.5)

Proof. We have

    (f(g(x)) - f(g(x₀)))/(x - x₀) = (f(g(x)) - f(g(x₀)))/(g(x) - g(x₀)) · (g(x) - g(x₀))/(x - x₀)
        → lim_{y→y₀} (f(y) - f(y₀))/(y - y₀) · g′(x₀) = f′(y₀)g′(x₀).

Here we used that y = g(x) tends to y₀ = g(x₀) as x → x₀, since g is continuous at x₀.
Proposition 4.5 Let f : (a, b) → R be strictly monotonic and continuous. Suppose f is
differentiable at x with f′(x) ≠ 0. Then the inverse function g = f⁻¹ : f((a, b)) → R is
differentiable at y = f(x) with

    g′(y) = 1/f′(x) = 1/f′(g(y)).                                      (4.6)

Proof. Let (yₙ) ⊆ f((a, b)) be a sequence with yₙ → y and yₙ ≠ y for all n. Put xₙ = g(yₙ).
Since g is continuous (by Corollary 3.9), lim_{n→∞} xₙ = x. Since g is injective, xₙ ≠ x for all n.
We have

    lim_{n→∞} (g(yₙ) - g(y))/(yₙ - y) = lim_{n→∞} (xₙ - x)/(f(xₙ) - f(x))
        = lim_{n→∞} 1/((f(xₙ) - f(x))/(xₙ - x)) = 1/f′(x).

Hence g′(y) = 1/f′(x) = 1/f′(g(y)).


We give some applications of the last two propositions.

Example 4.3 (a) Let f : R → R be differentiable; define F : R → R by F(x) := f(ax + b)
with some a, b ∈ R. Then

    F′(x) = af′(ax + b).

(b) In what follows f is the original function (with known derivative) and g is the inverse
function to f. We fix the notation y = f(x) and x = g(y).
log : R₊ \ {0} → R is the inverse function to f(x) = eˣ. By the above proposition,

    (log y)′ = 1/(eˣ)′ = 1/eˣ = 1/y.

(c) x^α = e^{α log x}, x > 0. Hence, (x^α)′ = (e^{α log x})′ = e^{α log x}·α·(1/x) = αx^{α-1}.
(d) Suppose f > 0 and g = log f. Then g′ = f′/f; hence f′ = f·g′.
(e) arcsin : [-1, 1] → R is the inverse function to y = f(x) = sin x. If y ∈ (-1, 1), then

    (arcsin(y))′ = 1/(sin x)′ = 1/cos x.

Since y ∈ [-1, 1] implies x = arcsin y ∈ [-π/2, π/2], we have cos x ≥ 0. Therefore, cos x =
√(1 - sin² x) = √(1 - y²). Hence

    (arcsin y)′ = 1/√(1 - y²),   -1 < y < 1.

Note that the derivative is not defined at the endpoints y = -1 and y = 1.
(f)

    (arctan y)′ = 1/(tan x)′ = 1/(1/cos² x) = cos² x.

Since y = tan x we have

    y² = tan² x = sin² x/cos² x = (1 - cos² x)/cos² x = 1/cos² x - 1,

hence

    cos² x = 1/(1 + y²)   and   (arctan y)′ = 1/(1 + y²).
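The inverse-derivative formulas worked out in Example 4.3 can be checked against a symmetric
difference quotient. A sketch (added here, not in the original; num_deriv is a hypothetical
helper name):

```python
import math

def num_deriv(g, y, h=1e-6):
    # symmetric difference quotient, error O(h^2)
    return (g(y + h) - g(y - h)) / (2 * h)

# (log y)' = 1/y  and  (arctan y)' = 1/(1 + y^2)
for y in [0.5, 1.0, 2.0, 3.0]:
    assert abs(num_deriv(math.log, y) - 1.0 / y) < 1e-8
    assert abs(num_deriv(math.atan, y) - 1.0 / (1.0 + y * y)) < 1e-8

# (arcsin y)' = 1/sqrt(1 - y^2) on (-1, 1)
for y in [0.0, 0.3, 0.6]:
    assert abs(num_deriv(math.asin, y) - 1.0 / math.sqrt(1.0 - y * y)) < 1e-8
```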

4.2 The Derivatives of Elementary Functions

    function               derivative
    ---------------------------------------------
    const.                 0
    xⁿ  (n ∈ N)            n x^{n-1}
    x^α (α ∈ R, x > 0)     α x^{α-1}
    eˣ                     eˣ
    aˣ  (a > 0)            aˣ log a
    log x                  1/x
    log_a x                1/(x log a)
    sin x                  cos x
    cos x                  -sin x
    tan x                  1/cos² x
    cot x                  -1/sin² x
    sinh x                 cosh x
    cosh x                 sinh x
    tanh x                 1/cosh² x
    coth x                 -1/sinh² x
    arcsin x               1/√(1 - x²)
    arccos x               -1/√(1 - x²)
    arctan x               1/(1 + x²)
    arccot x               -1/(1 + x²)
    arsinh x               1/√(x² + 1)
    arcosh x               1/√(x² - 1)
    artanh x               1/(1 - x²)
    arcoth x               1/(1 - x²)

4.2.1 Derivatives of Higher Order

Let f : D → R be differentiable. If the derivative f′ : D → R is differentiable at x ∈ D, then

    f″(x) = (f′)′(x) = d²f(x)/dx²

is called the second derivative of f at x. Similarly, one defines higher order derivatives
inductively. Continuing in this manner, we obtain functions

    f, f′, f″, f⁽³⁾, …, f⁽ᵏ⁾,

each of which is the derivative of the preceding one. f⁽ⁿ⁾ is called the nth derivative of f, or the
derivative of order n of f. We also use the Leibniz notation

    f⁽ᵏ⁾(x) = dᵏf(x)/dxᵏ = (d/dx)ᵏ f(x).

Definition 4.2 Let D ⊆ R and let k ∈ N be a positive integer. We denote by Cᵏ(D) the set of all
functions f : D → R such that f⁽ᵏ⁾(x) exists for all x ∈ D and f⁽ᵏ⁾(x) is continuous. Obviously
C(D) ⊇ C¹(D) ⊇ C²(D) ⊇ ⋯. Further, we set

    C^∞(D) = ∩_{k∈N} Cᵏ(D) = {f : D → R | f⁽ᵏ⁾(x) exists for all k ∈ N, x ∈ D}.   (4.7)

f ∈ Cᵏ(D) is called k times continuously differentiable. C(D) = C⁰(D) is the vector space of
continuous functions on D.
Using induction over n, one proves the following proposition.

Proposition 4.6 (Leibniz formula) Let f and g be n times differentiable. Then f g is n times
differentiable with

    (f(x)g(x))⁽ⁿ⁾ = Σ_{k=0}^{n} (n choose k) f⁽ᵏ⁾(x) g⁽ⁿ⁻ᵏ⁾(x).        (4.8)
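The Leibniz formula (4.8) can be verified exactly for polynomials, representing a polynomial
by its list of coefficients. A sketch with integer arithmetic (added here; not part of the original
notes, and the helper names are illustrative):

```python
from math import comb

def deriv(p, k=1):
    """k-th derivative of a polynomial given as coefficient list [a0, a1, ...]."""
    for _ in range(k):
        p = [i * c for i, c in enumerate(p)][1:] or [0]
    return p

def mul(p, q):
    """Product of two coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def add_scaled(p, q, s):
    """p + s*q with zero padding."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + s * b for a, b in zip(p, q)]

f = [1, 2, 0, 3]   # 1 + 2x + 3x^3
g = [0, 1, 4]      # x + 4x^2
n = 3

lhs = deriv(mul(f, g), n)                 # (f g)^(n)
rhs = [0]
for k in range(n + 1):                    # sum of comb(n, k) f^(k) g^(n-k)
    rhs = add_scaled(rhs, mul(deriv(f, k), deriv(g, n - k)), comb(n, k))
m = max(len(lhs), len(rhs))
assert lhs + [0] * (m - len(lhs)) == rhs + [0] * (m - len(rhs))
```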

4.3 Local Extrema and the Mean Value Theorem

Many properties of a function f, like monotony, convexity, and the existence of local extrema,
can be studied using the derivative f′. From estimates for f′ we obtain estimates for the growth
of f.

Definition 4.3 Let f : [a, b] → R be a function. We say that f has a local maximum at the point
ξ ∈ (a, b) if there exists δ > 0 such that f(x) ≤ f(ξ) for all x ∈ [a, b] with | x - ξ | < δ.
Local minima are defined likewise.
We say that ξ is a local extremum if it is either a local maximum or a local minimum.


Proposition 4.7 Let f be defined on [a, b]. If f has a local extremum at a point ξ ∈ (a, b), and
if f′(ξ) exists, then f′(ξ) = 0.

Proof. Suppose f has a local maximum at ξ. In accordance with the definition, choose δ > 0
such that

    a < ξ - δ < ξ < ξ + δ < b.

If ξ - δ < x < ξ, then

    (f(x) - f(ξ))/(x - ξ) ≥ 0.

Letting x → ξ, we see that f′(ξ) ≥ 0.
If ξ < x < ξ + δ, then

    (f(x) - f(ξ))/(x - ξ) ≤ 0.

Letting x → ξ, we see that f′(ξ) ≤ 0. Hence, f′(ξ) = 0.

Remarks 4.2 (a) f′(x) = 0 is a necessary but not a sufficient condition for a local extremum
at x. For example, f(x) = x³ has f′(0) = 0, but x³ has no local extremum.
(b) If f attains its local extrema at the boundary, like f(x) = x on [0, 1], we need not have
f′(ξ) = 0.

Theorem 4.8 (Rolle's Theorem) Let f : [a, b] → R be continuous with f(a) = f(b) and let f
be differentiable in (a, b). Then there exists a point ξ ∈ (a, b) with f′(ξ) = 0.
In particular, between two zeros of a differentiable function there is a zero of its derivative.

Proof. If f is the constant function, the theorem is trivial since f′(x) ≡ 0 on (a, b). Otherwise,
there exists x₀ ∈ (a, b) such that f(x₀) > f(a) or f(x₀) < f(a). Then f attains its maximum
or minimum, respectively, at a point ξ ∈ (a, b). By Proposition 4.7, f′(ξ) = 0.

Theorem 4.9 (Mean Value Theorem) Let f : [a, b] → R be continuous and differentiable in
(a, b). Then there exists a point ξ ∈ (a, b) such that

    f′(ξ) = (f(b) - f(a))/(b - a).                                     (4.9)

Geometrically, the mean value theorem states that there exists a tangent line through some
point (ξ, f(ξ)) which is parallel to the secant line AB, A = (a, f(a)), B = (b, f(b)).

[Figure: tangent at (ξ, f(ξ)) parallel to the secant AB over [a, b]]

Theorem 4.10 (Generalized Mean Value Theorem) Let f and g be continuous functions on
[a, b] which are differentiable on (a, b). Then there exists a point ξ ∈ (a, b) such that

    (f(b) - f(a))g′(ξ) = (g(b) - g(a))f′(ξ).

Proof. Put

    h(t) = (f(b) - f(a))g(t) - (g(b) - g(a))f(t).

Then h is continuous on [a, b] and differentiable in (a, b) and

    h(a) = f(b)g(a) - f(a)g(b) = h(b).

Rolle's theorem shows that there exists ξ ∈ (a, b) such that

    h′(ξ) = (f(b) - f(a))g′(ξ) - (g(b) - g(a))f′(ξ) = 0.

The theorem follows.
In the case that g′ is nonzero on (a, b) and g(b) - g(a) ≠ 0, the generalized MVT states the
existence of some ξ ∈ (a, b) such that

    (f(b) - f(a))/(g(b) - g(a)) = f′(ξ)/g′(ξ).

This is in particular true for g(x) = x, g′ = 1, which gives the assertion of the Mean Value
Theorem.

Remark 4.3 Note that the MVT fails if f is complex-valued, continuous on [a, b], and
differentiable on (a, b). Indeed, f(x) = e^{ix} on [0, 2π] is a counterexample: f is continuous
on [0, 2π], differentiable on (0, 2π), and f(0) = f(2π) = 1. However, there is no ξ ∈ (0, 2π)
such that 0 = (f(2π) - f(0))/(2π) = f′(ξ) = i e^{iξ}, since the exponential function has no
zero, see (3.7) (e^z e^{-z} = 1) in Subsection 3.5.1.

Corollary 4.11 Suppose f is differentiable on (a, b).
If f′(x) ≥ 0 for all x ∈ (a, b), then f is monotonically increasing.
If f′(x) = 0 for all x ∈ (a, b), then f is constant.
If f′(x) ≤ 0 for all x ∈ (a, b), then f is monotonically decreasing.

Proof. All conclusions can be read off from the equality

    f(x) - f(t) = (x - t)f′(ξ),

which is valid for each pair x, t with a < t < x < b and for some ξ ∈ (t, x).


4.3.1 Local Extrema and Convexity

Proposition 4.12 Let f : (a, b) → R be differentiable and suppose f″(ξ) exists at a point
ξ ∈ (a, b). If

    f′(ξ) = 0 and f″(ξ) > 0,

then f has a local minimum at ξ. Similarly, if

    f′(ξ) = 0 and f″(ξ) < 0,

f has a local maximum at ξ.

Remark 4.4 The condition of Proposition 4.12 is sufficient but not necessary for the existence
of a local extremum. For example, f(x) = x⁴ has a local minimum at x = 0, but f″(0) = 0.

Proof. We consider the case f″(ξ) > 0; the proof of the other case is analogous. Since

    f″(ξ) = lim_{x→ξ} (f′(x) - f′(ξ))/(x - ξ) > 0,

by Homework 10.4 there exists δ > 0 such that

    (f′(x) - f′(ξ))/(x - ξ) > | f″(ξ) |/2 > 0   for all x with 0 < | x - ξ | < δ.

Since f′(ξ) = 0 it follows that

    f′(x) < 0 if ξ - δ < x < ξ,
    f′(x) > 0 if ξ < x < ξ + δ.

Hence, by Corollary 4.11, f is decreasing in (ξ - δ, ξ) and increasing in (ξ, ξ + δ). Therefore,
f has a local minimum at ξ.
Definition 4.4 A function f : (a, b) → R is said to be convex if for all x, y ∈ (a, b) and all
λ ∈ [0, 1]

    f(λx + (1 - λ)y) ≤ λf(x) + (1 - λ)f(y).                            (4.10)

[Figure: the chord value λf(x) + (1 - λ)f(y) lies above the graph value f(λx + (1 - λ)y)]

A function f is said to be concave if -f is convex.

Proposition 4.13 (a) Convex functions are continuous.
(b) Suppose f : (a, b) → R is twice differentiable. Then f is convex if and only if f″(x) ≥ 0 for
all x ∈ (a, b).

Proof. The proof is in Appendix C to this chapter.


4.4 L'Hospital's Rule

Theorem 4.14 (L'Hospital's Rule) Suppose f and g are differentiable in (a, b) and g′(x) ≠ 0
for all x ∈ (a, b), where -∞ ≤ a < b ≤ +∞. Suppose

    lim_{x→a+0} f′(x)/g′(x) = A.                                       (4.11)

If

    (a)  lim_{x→a+0} f(x) = lim_{x→a+0} g(x) = 0, or                   (4.12)
    (b)  lim_{x→a+0} f(x) = lim_{x→a+0} g(x) = +∞,                     (4.13)

then

    lim_{x→a+0} f(x)/g(x) = A.                                         (4.14)

The analogous statements are of course also true if x → b - 0, or if g(x) → -∞.

Proof. First we consider the case of finite a ∈ R. (a) One can extend the definitions of f and
g via f(a) = g(a) = 0. Then f and g are continuous at a. By the generalized mean value
theorem, for every x ∈ (a, b) there exists a ξ ∈ (a, x) such that

    f(x)/g(x) = (f(x) - f(a))/(g(x) - g(a)) = f′(ξ)/g′(ξ).

If x approaches a then ξ also approaches a, and (a) follows.
(b) Now let f(a + 0) = g(a + 0) = +∞. Given ε > 0, choose δ > 0 such that

    | f′(t)/g′(t) - A | < ε

if t ∈ (a, a + δ). By the generalized mean value theorem, for any x, y ∈ (a, a + δ) with x ≠ y,

    | (f(x) - f(y))/(g(x) - g(y)) - A | < ε.

We have

    f(x)/g(x) = (f(x) - f(y))/(g(x) - g(y)) · (1 - g(y)/g(x))/(1 - f(y)/f(x)).

For fixed y, the right factor tends to 1 as x approaches a; in particular there exists δ₁ > 0 with
δ₁ < δ such that x ∈ (a, a + δ₁) implies

    | f(x)/g(x) - (f(x) - f(y))/(g(x) - g(y)) | < ε.

Further, the triangle inequality gives

    | f(x)/g(x) - A | < 2ε.

This proves (b).
The case x → +∞ can be reduced to the limit process y → 0 + 0 using the substitution
y = 1/x.
L'Hospital's rule also applies in the cases A = +∞ and A = -∞.
Example 4.4 (a) lim_{x→0} sin x/x = lim_{x→0} cos x/1 = 1.

(b) lim_{x→0+0} √x/(1 - cos x) = lim_{x→0+0} (1/(2√x))/sin x = lim_{x→0+0} 1/(2√x sin x) = +∞.

(c) lim_{x→0+0} x log x = lim_{x→0+0} log x/(1/x) = lim_{x→0+0} (1/x)/(-1/x²) = lim_{x→0+0} (-x) = 0.
0
Remark 4.5 It is easy to transform other indefinite expressions to 0/0 or ∞/∞ and make use
of l'Hospital's rule:

    0·∞:     f·g = f/(1/g) = g/(1/f);
    ∞ - ∞:   f - g = (1/g - 1/f)/(1/(f g));
    0⁰:      f^g = e^{g log f}.

Similarly, expressions of the form 1^∞ and ∞⁰ can be transformed.

4.5 Taylor's Theorem

The aim of this section is to show how n times differentiable functions can be approximated by
polynomials of degree n.
First consider a polynomial p(x) = aₙxⁿ + ⋯ + a₁x + a₀. We compute

    p′(x) = naₙx^{n-1} + (n-1)a_{n-1}x^{n-2} + ⋯ + a₁,
    p″(x) = n(n-1)aₙx^{n-2} + (n-1)(n-2)a_{n-1}x^{n-3} + ⋯ + 2a₂,
    ⋮
    p⁽ⁿ⁾(x) = n! aₙ.

Inserting x = 0 gives p(0) = a₀, p′(0) = a₁, p″(0) = 2a₂, …, p⁽ⁿ⁾(0) = n!aₙ. Hence,

    p(x) = p(0) + (p′(0)/1!)x + (p″(0)/2!)x² + ⋯ + (p⁽ⁿ⁾(0)/n!)xⁿ.     (4.15)

Now, fix a ∈ R and let q(x) = p(x + a). Since q⁽ᵏ⁾(0) = p⁽ᵏ⁾(a), (4.15) gives

    p(x + a) = q(x) = Σ_{k=0}^{n} (q⁽ᵏ⁾(0)/k!)xᵏ = Σ_{k=0}^{n} (p⁽ᵏ⁾(a)/k!)xᵏ.

Replacing x + a by x in the above equation yields

    p(x) = Σ_{k=0}^{n} (p⁽ᵏ⁾(a)/k!)(x - a)ᵏ.                           (4.16)

Theorem 4.15 (Taylor's Theorem) Suppose f is a real function on [r, s], n ∈ N, f⁽ⁿ⁾ is
continuous on [r, s], and f⁽ⁿ⁺¹⁾(t) exists for all t ∈ (r, s). Let a and x be distinct points of
[r, s] and define

    Pₙ(x) = Σ_{k=0}^{n} (f⁽ᵏ⁾(a)/k!)(x - a)ᵏ.                          (4.17)

Then there exists a point ξ between x and a such that

    f(x) = Pₙ(x) + (f⁽ⁿ⁺¹⁾(ξ)/(n+1)!)(x - a)^{n+1}.                    (4.18)

For n = 0, this is just the mean value theorem. Pₙ(x) is called the nth Taylor polynomial of f
at x = a, and the second summand of (4.18),

    R_{n+1}(x, a) = (f⁽ⁿ⁺¹⁾(ξ)/(n+1)!)(x - a)^{n+1},

is called the Lagrangian remainder term.
In general, the theorem shows that f can be approximated by a polynomial of degree n, and
(4.18) allows us to estimate the error if we know bounds for | f⁽ⁿ⁺¹⁾(x) |.
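For f(x) = eˣ at a = 0 all derivatives equal eˣ, so the bound from (4.18) is concrete:
| eˣ - Pₙ(x) | ≤ e^ξ x^{n+1}/(n+1)! ≤ eˣ x^{n+1}/(n+1)! for 0 < ξ < x. A quick numerical
sketch (added here; not part of the original notes):

```python
import math

def taylor_exp(x, n):
    """n-th Taylor polynomial of e^x at a = 0: sum of x^k/k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0
for n in range(1, 10):
    err = abs(math.exp(x) - taylor_exp(x, n))
    # Lagrange bound: |R| = e^xi * x^(n+1)/(n+1)! <= e^x * x^(n+1)/(n+1)!
    bound = math.exp(x) * x**(n + 1) / math.factorial(n + 1)
    assert err <= bound
assert abs(math.exp(1.0) - taylor_exp(1.0, 12)) < 1e-9
```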
Proof. Consider a and x to be fixed; let M be the number defined by

    f(x) = Pₙ(x) + M(x - a)^{n+1},                                     (4.19)

and put

    g(t) = f(t) - Pₙ(t) - M(t - a)^{n+1},   for r ≤ t ≤ s.

We have to show that (n + 1)!M = f⁽ⁿ⁺¹⁾(ξ) for some ξ between a and x. By (4.17) and (4.19),

    g⁽ⁿ⁺¹⁾(t) = f⁽ⁿ⁺¹⁾(t) - (n + 1)!M,   for r < t < s.                (4.20)

Hence the proof will be complete if we can show that g⁽ⁿ⁺¹⁾(ξ) = 0 for some ξ between a and x.
Since Pₙ⁽ᵏ⁾(a) = f⁽ᵏ⁾(a) for k = 0, 1, …, n, we have

    g(a) = g′(a) = ⋯ = g⁽ⁿ⁾(a) = 0.

Our choice of M shows that g(x) = 0, so that g′(ξ₁) = 0 for some ξ₁ between a and x, by
Rolle's theorem. Since g′(a) = 0 we conclude similarly that g″(ξ₂) = 0 for some ξ₂ between
a and ξ₁. After n + 1 steps we arrive at the conclusion that g⁽ⁿ⁺¹⁾(ξ_{n+1}) = 0 for some ξ_{n+1}
between a and ξₙ, that is, between a and x.


Definition 4.5 Suppose that f is a real function defined on [r, s] such that f⁽ⁿ⁾(t) exists for all
t ∈ (r, s) and all n ∈ N. Let x and a be points of [r, s]. Then

    T_f(x) = Σ_{k=0}^{∞} (f⁽ᵏ⁾(a)/k!)(x - a)ᵏ                          (4.21)

is called the Taylor series of f at a.

Remarks 4.6 (a) The radius r of convergence of a Taylor series can be 0.
(b) If T_f converges, it may happen that T_f(x) ≠ f(x). If the Taylor series of f at a converges
to f(x) in a certain neighborhood U_r(a), r > 0, then f is said to be analytic at a.
Example 4.5 We give an example for (b). Define f : R → R via

    f(x) = e^{-1/x²}  if x ≠ 0,
    f(x) = 0          if x = 0.

We will show that f ∈ C^∞(R) with f⁽ᵏ⁾(0) = 0 for all k. For this we will prove by induction
on n that there exists a polynomial pₙ such that

    f⁽ⁿ⁾(x) = pₙ(1/x) e^{-1/x²},   x ≠ 0,

and f⁽ⁿ⁾(0) = 0. For n = 0 the statement is clear, taking p₀(x) ≡ 1. Suppose the statement is
true for n. First, let x ≠ 0; then

    f⁽ⁿ⁺¹⁾(x) = (pₙ(1/x) e^{-1/x²})′ = -(1/x²) pₙ′(1/x) e^{-1/x²} + (2/x³) pₙ(1/x) e^{-1/x²}.

Choose p_{n+1}(t) = -pₙ′(t)t² + 2pₙ(t)t³.
Secondly,

    f⁽ⁿ⁺¹⁾(0) = lim_{h→0} (f⁽ⁿ⁾(h) - f⁽ⁿ⁾(0))/h = lim_{h→0} pₙ(1/h) e^{-1/h²}/h
              = lim_{x→±∞} x pₙ(x) e^{-x²} = 0,

where we used Proposition 2.7 in the last equality.
Hence T_f ≡ 0 at 0 (the Taylor series is identically zero), and T_f(x) does not converge to f(x)
in a neighborhood of 0.
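The point of the example is visible numerically: e^{-1/x²} vanishes faster than every power of
x at 0, so every Taylor coefficient at 0 is zero, while the function itself is not identically zero.
A small sketch (added here; plain floating point is assumed to suffice at these scales):

```python
import math

def f(x):
    # the flat function of Example 4.5
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

# f(x)/x^k stays tiny for every k: all Taylor coefficients at 0 vanish.
for k in range(0, 8):
    assert abs(f(0.1) / 0.1**k) < 1e-35

# Yet f is not the zero function, so T_f = 0 does not converge to f near 0:
assert f(0.5) > 0.01
```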

4.5.1 Examples of Taylor Series

(a) Power series coincide with their Taylor series:

    eˣ = Σ_{n=0}^{∞} xⁿ/n!,   x ∈ R;      Σ_{n=0}^{∞} xⁿ = 1/(1 - x),   x ∈ (-1, 1).

(b) f(x) = log(1 + x), see Homework 13.4.

(c) f(x) = (1 + x)^α, α ∈ R, a = 0. We have

    f⁽ᵏ⁾(x) = α(α-1)⋯(α-k+1)(1+x)^{α-k},   in particular f⁽ᵏ⁾(0) = α(α-1)⋯(α-k+1).

Therefore,

    (1 + x)^α = Σ_{k=0}^{n} (α(α-1)⋯(α-k+1)/k!) xᵏ + R_{n+1}(x).       (4.22)

The quotient test shows that the corresponding power series converges for | x | < 1. Consider
the Lagrangian remainder term with 0 < ξ < x < 1 and n + 1 > α. Then

    | R_{n+1}(x) | = | (α(α-1)⋯(α-n)/(n+1)!) (1 + ξ)^{α-n-1} x^{n+1} |
                   ≤ | α(α-1)⋯(α-n)/(n+1)! | x^{n+1} → 0

as n → ∞. Hence,

    (1 + x)^α = Σ_{n=0}^{∞} (α choose n) xⁿ,   0 < x < 1,              (4.23)

where (α choose n) = α(α-1)⋯(α-n+1)/n!. (4.23) is called the binomial series. Its radius of
convergence is R = 1. Looking at other forms of the remainder term shows that (4.23) holds
for -1 < x < 1.
(d) y = f(x) = arctan x. Since y′ = 1/(1 + x²) and y″ = -2x/(1 + x²)², we see that

    y′(1 + x²) = 1.

Differentiating this n times and using Leibniz's formula, Proposition 4.6, we have

    Σ_{k=0}^{n} (n choose k) (y′)⁽ᵏ⁾ (1 + x²)⁽ⁿ⁻ᵏ⁾ = 0,

that is, since (1 + x²)⁽ᵐ⁾ = 0 for m ≥ 3,

    y⁽ⁿ⁺¹⁾(1 + x²) + (n choose 1) y⁽ⁿ⁾·2x + (n choose 2) y⁽ⁿ⁻¹⁾·2 = 0.

At x = 0 this yields

    y⁽ⁿ⁺¹⁾(0) + n(n - 1)y⁽ⁿ⁻¹⁾(0) = 0,

hence

    y⁽ⁿ⁾(0) = 0             if n = 2k,
    y⁽ⁿ⁾(0) = (-1)ᵏ(2k)!    if n = 2k + 1.

Therefore,

    arctan x = Σ_{k=0}^{n} (-1)ᵏ x^{2k+1}/(2k + 1) + R_{2n+2}(x).      (4.24)

One can prove that -1 < x ≤ 1 implies R_{2n+2}(x) → 0 as n → ∞. In particular, x = 1 gives

    π/4 = 1 - 1/3 + 1/5 - ⋯ .
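The Leibniz series for π/4 converges very slowly; the partial sums of (4.24) at x = 1 can be
compared against π/4 directly, using the alternating-series fact that the error is below the
first omitted term. A sketch (added here, not part of the original notes):

```python
import math

def leibniz_partial(n):
    """Partial sum 1 - 1/3 + 1/5 - ... with n+1 terms (k = 0..n)."""
    return sum((-1)**k / (2 * k + 1) for k in range(n + 1))

# The error of the alternating series is below the first omitted term 1/(2n+3):
for n in [10, 100, 1000]:
    err = abs(leibniz_partial(n) - math.pi / 4)
    assert err < 1.0 / (2 * n + 3)
```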

4.6 Appendix C

Corollary 4.16 (to the mean value theorem) Let f : R → R be a differentiable function with

    f′(x) = cf(x) for all x ∈ R,                                       (4.25)

where c ∈ R is a fixed number. Let A = f(0). Then

    f(x) = Ae^{cx}   for all x ∈ R.                                    (4.26)

Proof. Consider F(x) = f(x)e^{-cx}. Using the product rule for derivatives and (4.25) we obtain

    F′(x) = f′(x)e^{-cx} + f(x)(-c)e^{-cx} = (f′(x) - cf(x)) e^{-cx} = 0.

By Corollary 4.11, F(x) is constant. Since F(0) = f(0) = A, F(x) = A for all x ∈ R; the
statement follows.

The Continuity of Derivatives

We have seen that there exist derivatives f′ which are not continuous at some point. However,
not every function is a derivative. In particular, derivatives which exist at every point of an
interval have one important property: the intermediate value theorem holds. The precise
statement follows.

Proposition 4.17 Suppose f is differentiable on [a, b] and suppose f′(a) < λ < f′(b). Then
there is a point x ∈ (a, b) such that f′(x) = λ.

Proof. Put g(t) = f(t) - λt. Then g is differentiable and g′(a) < 0. Therefore, g(t₁) < g(a)
for some t₁ ∈ (a, b). Similarly, g′(b) > 0, so that g(t₂) < g(b) for some t₂ ∈ (a, b). Hence,
g attains its minimum on [a, b] in the open interval (a, b), at some point x ∈ (a, b). By
Proposition 4.7, g′(x) = 0. Hence, f′(x) = λ.

Corollary 4.18 If f is differentiable on [a, b], then f′ cannot have discontinuities of the first
kind.
Proof of Proposition 4.13. (b) Suppose first that f″(x) ≥ 0 for all x. By Corollary 4.11, f′ is
increasing. Let a < x < y < b and λ ∈ [0, 1]. Put t = λx + (1 - λ)y. Then x ≤ t ≤ y and by
the mean value theorem there exist ξ₁ ∈ (x, t) and ξ₂ ∈ (t, y) such that

    (f(t) - f(x))/(t - x) = f′(ξ₁) ≤ f′(ξ₂) = (f(y) - f(t))/(y - t).

Since t - x = (1 - λ)(y - x) and y - t = λ(y - x), it follows that

    λ(f(t) - f(x)) ≤ (1 - λ)(f(y) - f(t))   ⟹   f(t) ≤ λf(x) + (1 - λ)f(y).

Hence, f is convex.
Conversely, let f : (a, b) → R be convex and twice differentiable. Suppose to the contrary that
f″(x₀) < 0 for some x₀ ∈ (a, b). Let c = f′(x₀); put

    φ(x) = f(x) - (x - x₀)c.

Then φ : (a, b) → R is twice differentiable with φ′(x₀) = 0 and φ″(x₀) < 0. Hence, by
Proposition 4.12, φ has a local maximum at x₀. By definition, there is a δ > 0 such that
U_δ(x₀) ⊆ (a, b) and

    φ(x₀ - δ) < φ(x₀),   φ(x₀ + δ) < φ(x₀).

It follows that

    f(x₀) = φ(x₀) > (1/2)(φ(x₀ - δ) + φ(x₀ + δ)) = (1/2)(f(x₀ - δ) + f(x₀ + δ)).

This contradicts the convexity of f if we set x = x₀ - δ, y = x₀ + δ, and λ = 1/2.

Chapter 5
Integration
In the first section of this chapter derivatives will not appear! Roughly speaking, integration
generalizes addition. The formula distance = velocity time is only valid for constant
Rt
velocity. The right formula is s = t01 v(t) dt. We need integrals to compute length of curves,
areas of surfaces, and volumes.
The study of integrals requires a long preparation, but once this preliminary work has been
completed, integrals will be an invaluable tool for creating new functions, and the derivative
will reappear more powerful than ever. The relation between the integral and derivatives is
given in the Fundamental Theorem of Calculus.

The integral formalizes a simple intuitive concept: that of area. It is not a surprise that
learning the definition of an intuitive concept can present great difficulties, and area is certainly
not an exception.

[Figure: a region under a graph over [a, b], labeled "What's the area?"]
5.1 The RiemannStieltjes Integral


In this section we will only define the area of some very special regions: those which are
bounded by the horizontal axis, the vertical lines through (a, 0) and (b, 0), and the graph of a
function f such that f(x) ≥ 0 for all x in [a, b]. If f is negative on a subinterval of [a, b], the
integral will represent the difference of the areas above and below the x-axis.
All intervals [a, b] are finite intervals.
Definition 5.1 Let [a, b] be an interval. By a partition of [a, b] we mean a finite set of points
x₀, x₁, …, xₙ, where

  a = x₀ ≤ x₁ ≤ ⋯ ≤ xₙ = b.

We write

  Δxᵢ = xᵢ − xᵢ₋₁,   i = 1, …, n.
Now suppose f is a bounded real function defined on [a, b]. Corresponding to each partition P
of [a, b] we put

  Mᵢ = sup{f(x) | x ∈ [xᵢ₋₁, xᵢ]},   mᵢ = inf{f(x) | x ∈ [xᵢ₋₁, xᵢ]},   (5.1)

  U(P, f) = Σ_{i=1}^n Mᵢ Δxᵢ,   (5.2)
  L(P, f) = Σ_{i=1}^n mᵢ Δxᵢ,   (5.3)

and finally

  ∫̄_a^b f dx = inf U(P, f),   (5.4)
  ∫̲_a^b f dx = sup L(P, f),   (5.5)

where the infimum and supremum are taken over all partitions P of [a, b]. The left members of
(5.4) and (5.5) are called the upper and lower Riemann integrals of f over [a, b], respectively.
If the upper and lower integrals are equal, we say that f is Riemann-integrable on [a, b] and we
write f ∈ ℛ (that is, ℛ denotes the set of Riemann-integrable functions), and we denote the common
value of (5.4) and (5.5) by

  ∫_a^b f dx   or by   ∫_a^b f(x) dx.   (5.6)
This is the Riemann integral of f over [a, b].

Since f is bounded, there exist two numbers m and M such that m ≤ f(x) ≤ M for all
x ∈ [a, b]. Hence, for every partition P,

  m(b − a) ≤ L(P, f) ≤ U(P, f) ≤ M(b − a),

so that the numbers L(P, f) and U(P, f) form a bounded set. This shows that the upper and the
lower integrals are defined for every bounded function f. The question of their equality, and
hence the question of the integrability of f, is a more delicate one. Instead of investigating it
separately for the Riemann integral, we shall immediately consider a more general situation.
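The upper and lower sums above are easy to experiment with numerically. The following is a minimal Python sketch (the helper name and the sampling strategy are ours, not the text's): it approximates Mᵢ and mᵢ on each subinterval by sampling, which is exact for monotonic integrands since the extrema sit at the sampled endpoints.

```python
# Sketch (not from the text): upper and lower Darboux sums U(P, f) and L(P, f)
# for a bounded f on [a, b]; sup/inf per subinterval approximated by sampling.

def darboux_sums(f, partition, samples=100):
    """Return (L(P, f), U(P, f)) using dense sampling on each [x_{i-1}, x_i]."""
    lower = upper = 0.0
    for x0, x1 in zip(partition, partition[1:]):
        pts = [x0 + (x1 - x0) * k / samples for k in range(samples + 1)]
        vals = [f(p) for p in pts]
        lower += min(vals) * (x1 - x0)   # m_i * Delta x_i
        upper += max(vals) * (x1 - x0)   # M_i * Delta x_i
    return lower, upper

if __name__ == "__main__":
    n = 1000
    P = [i / n for i in range(n + 1)]     # uniform partition of [0, 1]
    L, U = darboux_sums(lambda x: x * x, P)
    print(L, U)   # both close to 1/3, with L <= 1/3 <= U
```

For f(x) = x² on [0, 1] with a uniform partition, U(P, f) − L(P, f) = 1/n, illustrating how refining the partition squeezes the two sums toward the common value 1/3.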
Definition 5.2 Let α be a monotonically increasing function on [a, b] (since α(a) and α(b) are
finite, it follows that α is bounded on [a, b]). Corresponding to each partition P of [a, b], we
write

  Δαᵢ = α(xᵢ) − α(xᵢ₋₁).

It is clear that Δαᵢ ≥ 0. For any real function f which is bounded on [a, b] we put

  U(P, f, α) = Σ_{i=1}^n Mᵢ Δαᵢ,   (5.7)
  L(P, f, α) = Σ_{i=1}^n mᵢ Δαᵢ,   (5.8)
where Mᵢ and mᵢ have the same meaning as in Definition 5.1, and we define

  ∫̄_a^b f dα = inf U(P, f, α),   (5.9)
  ∫̲_a^b f dα = sup L(P, f, α),   (5.10)

where the infimum and the supremum are taken over all partitions P.
If the left members of (5.9) and (5.10) are equal, we denote their common value by

  ∫_a^b f dα   or sometimes by   ∫_a^b f(x) dα(x).   (5.11)

This is the Riemann–Stieltjes integral (or simply the Stieltjes integral) of f with respect to α,
over [a, b]. If (5.11) exists, we say that f is integrable with respect to α in the Riemann sense,
and write f ∈ ℛ(α).
By taking α(x) = x, the Riemann integral is seen to be a special case of the Riemann–Stieltjes
integral. Let us mention explicitly that, in the general case, α need not even be continuous.
We shall now investigate the existence of the integral (5.11). Without saying so every time, f
will be assumed real and bounded, and α monotonically increasing on [a, b].
Definition 5.3 We say that a partition P* is a refinement of the partition P if P ⊂ P* (that
is, every point of P is a point of P*). Given two partitions, P₁ and P₂, we say that P* is their
common refinement if P* = P₁ ∪ P₂.

Lemma 5.1 If P* is a refinement of P, then

  L(P, f, α) ≤ L(P*, f, α)   and   U(P*, f, α) ≤ U(P, f, α).   (5.12)

Proof. We only prove the first inequality of (5.12); the proof of the second one is analogous.
Suppose first that P* contains just one point more than P. Let this extra point be x*, and
suppose xᵢ₋₁ ≤ x* ≤ xᵢ, where xᵢ₋₁ and xᵢ are two consecutive points of P. Put

  w₁ = inf{f(x) | x ∈ [xᵢ₋₁, x*]},   w₂ = inf{f(x) | x ∈ [x*, xᵢ]}.

Clearly, w₁ ≥ mᵢ and w₂ ≥ mᵢ (since inf M ≥ inf N if M ⊂ N, see homework 1.4 (b)),
where, as before, mᵢ = inf{f(x) | x ∈ [xᵢ₋₁, xᵢ]}. Hence

  L(P*, f, α) − L(P, f, α) = w₁(α(x*) − α(xᵢ₋₁)) + w₂(α(xᵢ) − α(x*)) − mᵢ(α(xᵢ) − α(xᵢ₋₁))
    = (w₁ − mᵢ)(α(x*) − α(xᵢ₋₁)) + (w₂ − mᵢ)(α(xᵢ) − α(x*)) ≥ 0.

If P* contains k points more than P, we repeat this reasoning k times, and arrive at (5.12).

Proposition 5.2

  ∫̲_a^b f dα ≤ ∫̄_a^b f dα.

Proof. Let P* be the common refinement of two partitions P₁ and P₂. By Lemma 5.1,

  L(P₁, f, α) ≤ L(P*, f, α) ≤ U(P*, f, α) ≤ U(P₂, f, α).

Hence

  L(P₁, f, α) ≤ U(P₂, f, α).   (5.13)

If P₂ is fixed and the supremum is taken over all P₁, (5.13) gives

  ∫̲_a^b f dα ≤ U(P₂, f, α).   (5.14)

The proposition follows by taking the infimum over all P₂ in (5.14).

Proposition 5.3 (Riemann Criterion) f ∈ ℛ(α) on [a, b] if and only if for every ε > 0 there
exists a partition P such that

  U(P, f, α) − L(P, f, α) < ε.   (RC)

Proof. For every P we have

  L(P, f, α) ≤ ∫̲_a^b f dα ≤ ∫̄_a^b f dα ≤ U(P, f, α).

Thus (RC) implies

  0 ≤ ∫̄_a^b f dα − ∫̲_a^b f dα < ε.

Since the above inequality can be satisfied for every ε > 0, we have

  ∫̄_a^b f dα = ∫̲_a^b f dα,

that is, f ∈ ℛ(α).
Conversely, suppose f ∈ ℛ(α), and let ε > 0 be given. Then there exist partitions P₁ and P₂
such that

  U(P₂, f, α) − ∫_a^b f dα < ε/2,   ∫_a^b f dα − L(P₁, f, α) < ε/2.   (5.15)

We choose P to be the common refinement of P₁ and P₂. Then Lemma 5.1, together with
(5.15), shows that

  U(P, f, α) ≤ U(P₂, f, α) < ∫_a^b f dα + ε/2 < L(P₁, f, α) + ε ≤ L(P, f, α) + ε,

so that (RC) holds for this partition P.


Proposition 5.3 furnishes a convenient criterion for integrability. Before we apply it, we state
some closely related facts.

Lemma 5.4 (a) If (RC) holds for P and some ε, then (RC) holds with the same ε for every
refinement of P.
(b) If (RC) holds for P = {x₀, …, xₙ} and if sᵢ, tᵢ are arbitrary points in [xᵢ₋₁, xᵢ], then

  Σ_{i=1}^n | f(sᵢ) − f(tᵢ) | Δαᵢ < ε.

(c) If f ∈ ℛ(α) and (RC) holds as in (b), then

  | Σ_{i=1}^n f(tᵢ) Δαᵢ − ∫_a^b f dα | < ε.

Proof. Lemma 5.1 implies (a). Under the assumptions made in (b), both f(sᵢ) and f(tᵢ) lie in
[mᵢ, Mᵢ], so that | f(sᵢ) − f(tᵢ) | ≤ Mᵢ − mᵢ. Thus

  Σ_{i=1}^n | f(tᵢ) − f(sᵢ) | Δαᵢ ≤ U(P, f, α) − L(P, f, α),

which proves (b). The obvious inequalities

  L(P, f, α) ≤ Σᵢ f(tᵢ)Δαᵢ ≤ U(P, f, α)

and

  L(P, f, α) ≤ ∫_a^b f dα ≤ U(P, f, α)

prove (c).

Theorem 5.5 If f is continuous on [a, b] then f ∈ ℛ(α) on [a, b].

Proof. Let ε > 0 be given. Choose η > 0 so that

  η(α(b) − α(a)) < ε.

Since f is uniformly continuous on [a, b] (Proposition 3.7), there exists a δ > 0 such that

  | f(x) − f(t) | < η   (5.16)

if x, t ∈ [a, b] and | x − t | < δ. If P is any partition of [a, b] such that Δxᵢ < δ for all i, then
(5.16) implies that

  Mᵢ − mᵢ ≤ η,   i = 1, …, n,   (5.17)

and therefore

  U(P, f, α) − L(P, f, α) = Σ_{i=1}^n (Mᵢ − mᵢ)Δαᵢ ≤ η Σ_{i=1}^n Δαᵢ = η(α(b) − α(a)) < ε.

By Proposition 5.3, f ∈ ℛ(α).


Example 5.1 (a) The proof of Theorem 5.5 together with Lemma 5.4 shows that

  | Σ_{i=1}^n f(tᵢ)Δαᵢ − ∫_a^b f dα | < ε

if Δxᵢ < δ.
We compute I = ∫_a^b sin x dx. Let ε > 0. Since sin x is continuous, f ∈ ℛ. There exists δ > 0
such that | x − t | < δ implies

  | sin x − sin t | < ε/(b − a).   (5.18)

In this case (RC) is satisfied and consequently

  | Σ_{i=1}^n sin(tᵢ)Δxᵢ − ∫_a^b sin x dx | < ε

for every partition P with Δxᵢ < δ, i = 1, …, n.
For P we choose an equidistant partition of [a, b], xᵢ = a + (b − a)i/n, i = 0, …, n. Then
h = Δxᵢ = (b − a)/n, and since | sin x − sin t | ≤ | x − t |, the condition (5.18) is satisfied
provided n > (b − a)²/ε. Using the addition formula cos(x − y) − cos(x + y) = 2 sin x sin y we have

  Σ_{i=1}^n sin(xᵢ)Δxᵢ = Σ_{i=1}^n sin(a + ih)h = h/(2 sin(h/2)) Σ_{i=1}^n 2 sin(h/2) sin(a + ih)
    = h/(2 sin(h/2)) Σ_{i=1}^n (cos(a + (i − 1/2)h) − cos(a + (i + 1/2)h))
    = h/(2 sin(h/2)) (cos(a + h/2) − cos(a + (n + 1/2)h))
    = (h/2)/sin(h/2) · (cos(a + h/2) − cos(b + h/2)).

Since lim_{h→0} sin h/h = 1 and cos x is continuous, the above expression tends to
cos a − cos b. Hence ∫_a^b sin x dx = cos a − cos b.
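As a quick numerical cross-check of this limit (our sketch, not part of the text), the equidistant Riemann sums of sin do approach cos a − cos b:

```python
import math

# Numerical check (a sketch): the equidistant right-endpoint Riemann sum
# sum_{i=1}^n sin(a + i*h) * h, with h = (b-a)/n, tends to cos(a) - cos(b).

def riemann_sum_sin(a, b, n):
    h = (b - a) / n
    return sum(math.sin(a + i * h) for i in range(1, n + 1)) * h

a, b = 0.0, math.pi
print(riemann_sum_sin(a, b, 10**5))   # close to cos(0) - cos(pi) = 2
```

Already for moderate n the sum agrees with cos a − cos b to several digits, matching the closed telescoping formula derived above.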
(b) For x ∈ [a, b] define

  f(x) = 1 if x ∈ ℚ,   f(x) = 0 if x ∉ ℚ.

We will show f ∉ ℛ. Let P be any partition of [a, b]. Since any interval contains rational
as well as irrational points, mᵢ = 0 and Mᵢ = 1 for all i. Hence L(P, f) = 0 whereas
U(P, f) = Σ_{i=1}^n Δxᵢ = b − a. We conclude that the upper and lower Riemann integrals don't
coincide; f ∉ ℛ. A similar reasoning shows f ∉ ℛ(α) if α(b) > α(a), since L(P, f, α) = 0 <
U(P, f, α) = α(b) − α(a).
Proposition 5.6 If f is monotonic on [a, b], and α is continuous on [a, b], then f ∈ ℛ(α).

Proof. Let ε > 0 be given. For any positive integer n, choose a partition such that

  Δαᵢ = (α(b) − α(a))/n,   i = 1, …, n.

This is possible by the intermediate value theorem (Theorem 3.5) since α is continuous.
We suppose that f is monotonically increasing (the proof is analogous in the other case). Then

  Mᵢ = f(xᵢ),   mᵢ = f(xᵢ₋₁),   i = 1, …, n,

so that

  U(P, f, α) − L(P, f, α) = (α(b) − α(a))/n · Σ_{i=1}^n (f(xᵢ) − f(xᵢ₋₁))
    = (α(b) − α(a))/n · (f(b) − f(a)) < ε

if n is taken large enough. By Proposition 5.3, f ∈ ℛ(α).


Without proofs, which can be found in [Rud76, pp. 126–128], we note the following facts.

Proposition 5.7 If f is bounded on [a, b], f has finitely many points of discontinuity on [a, b],
and α is continuous at every point at which f is discontinuous, then f ∈ ℛ(α).

Proof. We give an idea of the proof in the case of the Riemann integral (α(x) = x) and one single
discontinuity at c, a < c < b. For this, let ε > 0 be given, let m ≤ f(x) ≤ M for all x ∈ [a, b], and
put C = M − m. First choose points a′ and b′ with a < a′ < c < b′ < b and C(b′ − a′) < ε.
Let fⱼ, j = 1, 2, denote the restriction of f to the subintervals I₁ = [a, a′] and I₂ = [b′, b],
respectively. Since fⱼ is continuous on Iⱼ, fⱼ ∈ ℛ over Iⱼ and therefore, by the Riemann
criterion, there exist partitions Pⱼ, j = 1, 2, of Iⱼ such that U(Pⱼ, fⱼ) − L(Pⱼ, fⱼ) < ε.
Let P = P₁ ∪ P₂, a partition of [a, b]. Then

  U(P, f) − L(P, f) = U(P₁, f₁) − L(P₁, f₁) + U(P₂, f₂) − L(P₂, f₂) + (M₀ − m₀)(b′ − a′)
    ≤ ε + ε + C(b′ − a′) < 3ε,

where M₀ and m₀ are the supremum and infimum of f(x) on [a′, b′]. The Riemann criterion is
satisfied for f on [a, b]; f ∈ ℛ.
Proposition 5.8 If f ∈ ℛ(α) on [a, b], m ≤ f(x) ≤ M, φ is continuous on [m, M], and
h(x) = φ(f(x)) on [a, b], then h ∈ ℛ(α) on [a, b].

Remark 5.1 (a) A bounded function f is Riemann-integrable on [a, b] if and only if f is
continuous almost everywhere on [a, b]. (The proof of this fact can be found in [Rud76, Theorem 11.33].)
Almost everywhere means that the discontinuities form a set of (Lebesgue) measure 0. A set
M ⊂ ℝ has measure 0 if for given ε > 0 there exist intervals Iₙ, n ∈ ℕ, such that M ⊂ ⋃_{n∈ℕ} Iₙ
and Σ_{n∈ℕ} | Iₙ | < ε. Here, | I | denotes the length of the interval. Examples of sets of measure
0 are finite sets, countable sets, and the Cantor set (which is uncountable).
(b) Note that such a chaotic function (at the point 0) as

  f(x) = cos(1/x) if x ≠ 0,   f(x) = 0 if x = 0,

is integrable on [−π, π] since there is only one single discontinuity, at 0.

5.1.1 Properties of the Integral

Proposition 5.9 (a) If f₁, f₂ ∈ ℛ(α) on [a, b] then f₁ + f₂ ∈ ℛ(α), cf ∈ ℛ(α) for every
constant c, and

  ∫_a^b (f₁ + f₂) dα = ∫_a^b f₁ dα + ∫_a^b f₂ dα,   ∫_a^b cf dα = c ∫_a^b f dα.

(b) If f₁, f₂ ∈ ℛ(α) and f₁(x) ≤ f₂(x) on [a, b], then

  ∫_a^b f₁ dα ≤ ∫_a^b f₂ dα.

(c) If f ∈ ℛ(α) on [a, b] and if a < c < b, then f ∈ ℛ(α) on [a, c] and on [c, b], and

  ∫_a^b f dα = ∫_a^c f dα + ∫_c^b f dα.

(d) If f ∈ ℛ(α) on [a, b] and | f(x) | ≤ M on [a, b], then

  | ∫_a^b f dα | ≤ M(α(b) − α(a)).

(e) If f ∈ ℛ(α₁) and f ∈ ℛ(α₂), then f ∈ ℛ(α₁ + α₂) and

  ∫_a^b f d(α₁ + α₂) = ∫_a^b f dα₁ + ∫_a^b f dα₂;

if f ∈ ℛ(α) and c is a positive constant, then f ∈ ℛ(cα) and

  ∫_a^b f d(cα) = c ∫_a^b f dα.
Proof. If f = f₁ + f₂ and P is any partition of [a, b], we have

  L(P, f₁, α) + L(P, f₂, α) ≤ L(P, f, α) ≤ U(P, f, α) ≤ U(P, f₁, α) + U(P, f₂, α)   (5.19)

since inf_{Iᵢ} f₁ + inf_{Iᵢ} f₂ ≤ inf_{Iᵢ} (f₁ + f₂) and sup_{Iᵢ} (f₁ + f₂) ≤ sup_{Iᵢ} f₁ + sup_{Iᵢ} f₂.
If f₁ ∈ ℛ(α) and f₂ ∈ ℛ(α), let ε > 0 be given. There are partitions Pⱼ, j = 1, 2, such that

  U(Pⱼ, fⱼ, α) − L(Pⱼ, fⱼ, α) < ε.

These inequalities persist if P₁ and P₂ are replaced by their common refinement P. Then (5.19)
implies

  U(P, f, α) − L(P, f, α) < 2ε,

which proves that f ∈ ℛ(α). With the same P we have

  U(P, fⱼ, α) < ∫_a^b fⱼ dα + ε,   j = 1, 2;

since ∫_a^b f dα ≤ U(P, f, α), (5.19) implies

  ∫_a^b f dα ≤ U(P, f, α) < ∫_a^b f₁ dα + ∫_a^b f₂ dα + 2ε.

Since ε was arbitrary, we conclude that

  ∫_a^b f dα ≤ ∫_a^b f₁ dα + ∫_a^b f₂ dα.   (5.20)

If we replace f₁ and f₂ in (5.20) by −f₁ and −f₂, respectively, the inequality is reversed, and
the equality is proved.
(b) Put f = f₂ − f₁. It suffices to prove that ∫_a^b f dα ≥ 0. For every partition P we have mᵢ ≥ 0
since f ≥ 0. Hence

  ∫_a^b f dα ≥ L(P, f, α) = Σ_{i=1}^n mᵢ Δαᵢ ≥ 0,

since in addition Δαᵢ = α(xᵢ) − α(xᵢ₋₁) ≥ 0 (α is increasing).
The proofs of the other assertions are so similar that we omit the details. In part (c) the point is
that (by passing to refinements) we may restrict ourselves to partitions which contain the point
c, in approximating ∫_a^b f dα; cf. Homework 14.5.

Note that in (c), f ∈ ℛ(α) on [a, c] and on [c, b] in general does not imply that f ∈ ℛ(α) on
[a, b]. For example, consider the interval [−1, 1] with

  f(x) = α(x) = 0 for −1 ≤ x < 0,   f(x) = α(x) = 1 for 0 ≤ x ≤ 1.

Then ∫_0^1 f dα = 0; the integral vanishes since α is constant on [0, 1]. However, f ∉ ℛ(α) on
[−1, 1] since for any partition P including the point 0, we have U(P, f, α) = 1 and L(P, f, α) = 0.
Proposition 5.10 If f, g ∈ ℛ(α) on [a, b], then
(a) f g ∈ ℛ(α);
(b) | f | ∈ ℛ(α) and

  | ∫_a^b f dα | ≤ ∫_a^b | f | dα.

Proof. If we take φ(t) = t², Proposition 5.8 shows that f² ∈ ℛ(α) if f ∈ ℛ(α). The identity

  4fg = (f + g)² − (f − g)²

completes the proof of (a).
If we take φ(t) = | t |, Proposition 5.8 shows that | f | ∈ ℛ(α). Choose c = ±1 so that
c ∫ f dα ≥ 0. Then

  | ∫ f dα | = c ∫ f dα = ∫ cf dα ≤ ∫ | f | dα,

since cf ≤ | f |.

The unit step function or Heaviside function H(x) is defined by H(x) = 0 if x < 0 and
H(x) = 1 if x ≥ 0.

Example 5.2 (a) If a < s < b, f is bounded on [a, b], f is continuous at s, and α(x) = H(x − s),
then

  ∫_a^b f dα = f(s).

For the proof, consider the partition P with n = 3; a = x₀ < x₁ < s = x₂ < x₃ = b. Then
Δα₁ = Δα₃ = 0, Δα₂ = 1, and

  U(P, f, α) = M₂,   L(P, f, α) = m₂.

Since f is continuous at s, we see that M₂ and m₂ converge to f(s) as x₁ → s.
(b) Suppose cₙ ≥ 0 for all n = 1, …, N and (sₙ), n = 1, …, N, is a strictly increasing finite
sequence of distinct points in (a, b). Further, α(x) = Σ_{n=1}^N cₙ H(x − sₙ). Then

  ∫_a^b f dα = Σ_{n=1}^N cₙ f(sₙ).

This follows from (a) and Proposition 5.9 (e).

Proposition 5.11 Suppose cₙ ≥ 0 for all positive integers n, Σ_{n=1}^∞ cₙ converges, (sₙ) is a
strictly increasing sequence of distinct points in (a, b), and

  α(x) = Σ_{n=1}^∞ cₙ H(x − sₙ).   (5.21)

Let f be continuous on [a, b]. Then

  ∫_a^b f dα = Σ_{n=1}^∞ cₙ f(sₙ).   (5.22)

Proof. The comparison test shows that the series (5.21) converges for every x. Its sum α is
evidently an increasing function with α(a) = 0 and α(b) = Σ cₙ. Let ε > 0 be given; choose
N so that

  Σ_{n=N+1}^∞ cₙ < ε.

Put

  α₁(x) = Σ_{n=1}^N cₙ H(x − sₙ),   α₂(x) = Σ_{n=N+1}^∞ cₙ H(x − sₙ).

By Proposition 5.9 and Example 5.2,

  ∫_a^b f dα₁ = Σ_{n=1}^N cₙ f(sₙ).

Since α₂(b) − α₂(a) < ε, by Proposition 5.9 (d),

  | ∫_a^b f dα₂ | ≤ Mε,

where M = sup | f(x) |. Since α = α₁ + α₂ it follows that

  | ∫_a^b f dα − Σ_{n=1}^N cₙ f(sₙ) | ≤ Mε.

If we let N → ∞ we obtain (5.22).
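The identity (5.22) is easy to test numerically. A minimal Python sketch (all names are ours, and for the experiment we take finitely many jumps, so it is really the setting of Example 5.2 (b)): Riemann–Stieltjes sums against a step function α collapse onto the weighted point evaluations Σ cₙ f(sₙ).

```python
# Sketch (not from the text): approximate the Stieltjes integral of a
# continuous f against alpha(x) = sum_n c_n * H(x - s_n) by the sums
# sum_i f(x_i) * (alpha(x_i) - alpha(x_{i-1})), and compare with
# sum_n c_n * f(s_n).

def alpha(x, jumps):
    # jumps: list of (s_n, c_n); H(x - s) = 1 for x >= s
    return sum(c for s, c in jumps if x >= s)

def stieltjes_sum(f, a, b, jumps, n):
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x0, x1 = a + (i - 1) * h, a + i * h
        total += f(x1) * (alpha(x1, jumps) - alpha(x0, jumps))
    return total

jumps = [(0.3, 1.0), (0.5, 0.25), (0.9, 0.5)]
f = lambda x: x * x
approx = stieltjes_sum(f, 0.0, 1.0, jumps, 10**5)
exact = sum(c * f(s) for s, c in jumps)
print(approx, exact)
```

Only the subintervals that contain a jump point contribute, and there the tag value f(x₁) is within h of f(sₙ) by continuity, which is exactly why the sums converge to Σ cₙ f(sₙ).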

Proposition 5.12 Assume that α is increasing and α′ ∈ ℛ on [a, b]. Let f be a bounded real
function on [a, b]. Then f ∈ ℛ(α) if and only if fα′ ∈ ℛ. In that case

  ∫_a^b f dα = ∫_a^b f(x) α′(x) dx.   (5.23)

The statement remains true if α is continuous on [a, b] and differentiable up to finitely many
points c₁, c₂, …, cₙ.

Proof. Let ε > 0 be given and apply the Riemann criterion (Proposition 5.3) to α′: There is a
partition P = {x₀, …, xₙ} of [a, b] such that

  U(P, α′) − L(P, α′) < ε.   (5.24)

The mean value theorem furnishes points tᵢ ∈ [xᵢ₋₁, xᵢ] such that

  Δαᵢ = α(xᵢ) − α(xᵢ₋₁) = α′(tᵢ)(xᵢ − xᵢ₋₁) = α′(tᵢ)Δxᵢ,   i = 1, …, n.

If sᵢ ∈ [xᵢ₋₁, xᵢ], then

  Σ_{i=1}^n | α′(sᵢ) − α′(tᵢ) | Δxᵢ < ε   (5.25)

by (5.24) and Lemma 5.4 (b). Put M = sup | f(x) |. Since

  Σ_{i=1}^n f(sᵢ)Δαᵢ = Σ_{i=1}^n f(sᵢ)α′(tᵢ)Δxᵢ,

it follows from (5.25) that

  | Σ_{i=1}^n f(sᵢ)Δαᵢ − Σ_{i=1}^n f(sᵢ)α′(sᵢ)Δxᵢ | ≤ Mε.   (5.26)

In particular,

  Σ_{i=1}^n f(sᵢ)Δαᵢ ≤ U(P, fα′) + Mε

for all choices of sᵢ ∈ [xᵢ₋₁, xᵢ], so that

  U(P, f, α) ≤ U(P, fα′) + Mε.

The same argument leads from (5.26) to

  U(P, fα′) ≤ U(P, f, α) + Mε.

Thus

  | U(P, f, α) − U(P, fα′) | ≤ Mε.   (5.27)

Now (5.25) remains true if P is replaced by any refinement, hence so is (5.27). We conclude that

  | ∫̄_a^b f dα − ∫̄_a^b f(x)α′(x) dx | ≤ Mε.

But ε is arbitrary. Hence

  ∫̄_a^b f dα = ∫̄_a^b f(x)α′(x) dx

for any bounded f. The equality for the lower integrals follows from (5.26) in exactly the same
way. The proposition follows.
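For a smooth increasing α, (5.23) can be checked numerically: the Stieltjes sums and the ordinary Riemann sums of fα′ approach the same value. A short Python sketch (all names are ours, not the text's):

```python
import math

# Numerical sanity check (a sketch, not the author's code): for alpha(x) = x^2
# on [0, 1], the sums  sum f(x_i) * (alpha(x_i) - alpha(x_{i-1}))  and
# sum f(x_i) * alpha'(x_i) * Delta x_i  approach the same limit, as in (5.23).

def compare(f, alpha, alpha_prime, a, b, n):
    h = (b - a) / n
    s1 = s2 = 0.0
    for i in range(1, n + 1):
        x0, x1 = a + (i - 1) * h, a + i * h
        s1 += f(x1) * (alpha(x1) - alpha(x0))   # Stieltjes sum
        s2 += f(x1) * alpha_prime(x1) * h       # Riemann sum of f * alpha'
    return s1, s2

s1, s2 = compare(math.sin, lambda x: x * x, lambda x: 2 * x, 0.0, 1.0, 10**4)
print(s1, s2)   # both near int_0^1 2x sin x dx
```

The two sums differ term by term only by the mean-value correction of the proof, so their gap shrinks like the mesh size h.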
We now summarize the two cases.


Proposition 5.13 Let f be continuous on [a, b]. Suppose that, except at finitely many points
c₀, c₁, …, cₙ with c₀ = a and cₙ = b, the derivative α′(x) exists and is continuous and bounded on
[a, b] \ {c₀, …, cₙ}. Then f ∈ ℛ(α) and

  ∫_a^b f dα = ∫_a^b f(x)α′(x) dx + Σ_{i=1}^{n−1} f(cᵢ)(α(cᵢ + 0) − α(cᵢ − 0))
    + f(a)(α(a + 0) − α(a)) + f(b)(α(b) − α(b − 0)).
Proof (Sketch). (a) Note that the one-sided jumps A⁺ᵢ = α(cᵢ + 0) − α(cᵢ) and
A⁻ᵢ = α(cᵢ) − α(cᵢ − 0) exist by Theorem 3.8. Collect all jumps of α into the step function

  α₁(x) = Σ_{i=1}^n A⁻ᵢ H(x − cᵢ) + Σ_{i=0}^{n−1} A⁺ᵢ (1 − H(cᵢ − x)).

(b) Then α₂ = α − α₁ is continuous.
(c) Since α₁ is piecewise constant, α₁′(x) = 0 for x ≠ cᵢ. Hence α₂′(x) = α′(x) for x ≠ cᵢ.
Applying Proposition 5.12 gives

  ∫_a^b f dα₂ = ∫_a^b f α₂′ dx = ∫_a^b f α′ dx.

Further,

  ∫_a^b f dα = ∫_a^b f d(α₁ + α₂) = ∫_a^b f α′ dx + ∫_a^b f dα₁.

By Proposition 5.11 (applied to both families of jump functions),

  ∫_a^b f dα₁ = Σ_{i=0}^{n−1} A⁺ᵢ f(cᵢ) + Σ_{i=1}^n A⁻ᵢ f(cᵢ),

which is exactly the sum of jump terms in the statement, since A⁺ᵢ + A⁻ᵢ = α(cᵢ + 0) − α(cᵢ − 0)
for the interior points, while only A⁺₀ contributes at a and only A⁻ₙ at b.

Example 5.3 (a) The Fundamental Theorem of Calculus (see Theorem 5.15) yields

  ∫_0^2 3x³ dx = 3 · x⁴/4 |_0^2 = 12.

(b) f(x) = x² and

  α(x) = x for 0 ≤ x < 1;   α(1) = 7;   α(x) = x² + 10 for 1 < x < 2;   α(2) = 64.

Then

  ∫_0^2 f dα = ∫_0^2 f α′ dx + f(1)(α(1 + 0) − α(1 − 0)) + f(2)(α(2) − α(2 − 0))
    = ∫_0^1 x² · 1 dx + ∫_1^2 x² · 2x dx + 1 · (11 − 1) + 4 · (64 − 14)
    = x³/3 |_0^1 + x⁴/2 |_1^2 + 10 + 200 = 1/3 + 8 − 1/2 + 210 = 217 5/6.

Remark 5.2 The three preceding propositions show the flexibility of the Stieltjes process of
integration. If α is a pure step function, the integral reduces to a finite or infinite series. If α has
an integrable derivative, the integral reduces to the ordinary Riemann integral. This makes it
possible to study series and integrals simultaneously, rather than separately.

5.2 Integration and Differentiation


We shall see that integration and differentiation are, in a certain sense, inverse operations.
Theorem 5.14 Let f ∈ ℛ on [a, b]. For a ≤ x ≤ b put

  F(x) = ∫_a^x f(t) dt.

Then F is continuous on [a, b]; furthermore, if f is continuous at x₀ ∈ [a, b], then F is
differentiable at x₀ and

  F′(x₀) = f(x₀).

Proof. Since f ∈ ℛ, f is bounded. Suppose | f(t) | ≤ M on [a, b]. If a ≤ x < y ≤ b, then

  | F(y) − F(x) | = | ∫_x^y f(t) dt | ≤ M(y − x),

by Proposition 5.9 (c) and (d). Given ε > 0, we see that

  | F(y) − F(x) | < ε,

provided that | y − x | < ε/M. This proves continuity (and, in fact, uniform continuity) of F.
Now suppose that f is continuous at x₀. Given ε > 0, choose δ > 0 such that

  | f(t) − f(x₀) | < ε

if | t − x₀ | < δ, t ∈ [a, b]. Hence, if

  x₀ − δ < s ≤ x₀ ≤ t < x₀ + δ   and   a ≤ s < t ≤ b,

we have, by Proposition 5.9 (d) as before,

  | (F(t) − F(s))/(t − s) − f(x₀) | = | 1/(t − s) ∫_s^t f(u) du − 1/(t − s) ∫_s^t f(x₀) du |
    = | 1/(t − s) ∫_s^t (f(u) − f(x₀)) du | < ε.

This in particular holds for s = x₀, that is,

  | (F(t) − F(x₀))/(t − x₀) − f(x₀) | < ε.

It follows that F′(x₀) = f(x₀).

Definition 5.4 A function F : [a, b] → ℝ is called an antiderivative or a primitive of a function
f : [a, b] → ℝ if F is differentiable and F′ = f.

Remarks 5.3 (a) There exist functions f not having an antiderivative; for example, the
Heaviside function H(x) has a simple discontinuity at 0, but by Corollary 4.18 derivatives cannot
have simple discontinuities.
(b) The antiderivative F of a function f (if it exists) is unique up to an additive constant. More
precisely, if F is an antiderivative of f on [a, b], then F₁(x) = F(x) + c is also an antiderivative of f.
If F and G are antiderivatives of f on [a, b], then there is a constant c so that F(x) − G(x) = c.
The first part is obvious since F₁′(x) = F′(x) = f(x). Suppose F and G are antiderivatives
of f. Put H(x) = F(x) − G(x); then H′(x) = 0 and H(x) is constant by Corollary 4.11.
Notation for the antiderivative:

  F(x) = ∫ f(x) dx = ∫ f dx.

The function f is called the integrand. Integration and differentiation are inverse to each other:

  d/dx ∫ f(x) dx = f(x),   ∫ f′(x) dx = f(x).

Theorem 5.15 (Fundamental Theorem of Calculus) Let f : [a, b] → ℝ be continuous.
(a) If

  F(x) = ∫_a^x f(t) dt,

then F(x) is an antiderivative of f(x) on [a, b].
(b) If G(x) is an antiderivative of f(x), then

  ∫_a^b f(t) dt = G(b) − G(a).

Proof. (a) By Theorem 5.14, F(x) = ∫_a^x f(t) dt is differentiable at any point x₀ ∈ [a, b] with
F′(x₀) = f(x₀).
(b) By the above remark, the antiderivative is unique up to a constant, hence F(x) − G(x) = C.
Since F(a) = ∫_a^a f(x) dx = 0 we obtain

  G(b) − G(a) = (F(b) − C) − (F(a) − C) = F(b) − F(a) = F(b) = ∫_a^b f(x) dx.


Note that the FTC is also true if f ∈ ℛ and G is an antiderivative of f on [a, b]. Indeed, let ε > 0
be given. By the Riemann criterion (Proposition 5.3) there exists a partition P = {x₀, …, xₙ}
of [a, b] such that U(P, f) − L(P, f) < ε. By the mean value theorem, there exist points
tᵢ ∈ [xᵢ₋₁, xᵢ] such that

  G(xᵢ) − G(xᵢ₋₁) = f(tᵢ)(xᵢ − xᵢ₋₁),   i = 1, …, n.

Thus

  G(b) − G(a) = Σ_{i=1}^n f(tᵢ)Δxᵢ.

It follows from Lemma 5.4 (c) and the above equation that

  | G(b) − G(a) − ∫_a^b f(x) dx | = | Σ_{i=1}^n f(tᵢ)Δxᵢ − ∫_a^b f(x) dx | < ε.

Since ε > 0 was arbitrary, the proof is complete.
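The FTC relation ∫_a^b f dx = G(b) − G(a) is also easy to confirm numerically. A short Python sketch (our names, not the text's) for f = cos with antiderivative G = sin:

```python
import math

# Sketch (not the author's code): a Riemann sum for f = cos on [a, b], using
# midpoint tags t_i, compared with G(b) - G(a) for the antiderivative G = sin.

def riemann(f, a, b, n):
    h = (b - a) / n
    return sum(f(a + (i - 0.5) * h) for i in range(1, n + 1)) * h

a, b = 0.0, 2.0
lhs = riemann(math.cos, a, b, 10**4)
rhs = math.sin(b) - math.sin(a)      # G(b) - G(a)
print(lhs, rhs)
```

Any choice of tags tᵢ works in the limit; the midpoint tags are used here only because they converge quickly.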

5.2.1 Table of Antiderivatives

By differentiating the right-hand side one gets the left-hand side of the table.

  function        domain                           antiderivative
  x^α             α ∈ ℝ \ {−1}, x > 0              x^(α+1)/(α + 1)
  1/x             x < 0 or x > 0                   log | x |
  e^x             ℝ                                e^x
  a^x             a > 0, a ≠ 1, x ∈ ℝ              a^x/log a
  sin x           ℝ                                −cos x
  cos x           ℝ                                sin x
  1/sin² x        ℝ \ {kπ | k ∈ ℤ}                 −cot x
  1/cos² x        ℝ \ {π/2 + kπ | k ∈ ℤ}           tan x
  1/(1 + x²)      ℝ                                arctan x
  1/√(1 + x²)     ℝ                                arsinh x = log(x + √(x² + 1))
  1/√(1 − x²)     −1 < x < 1                       arcsin x
  1/√(x² − 1)     x < −1 or x > 1                  log(x + √(x² − 1))


5.2.2 Integration Rules

The aim of this subsection is to calculate antiderivatives of composed functions using
antiderivatives of (already known) simpler functions.
Notation: f(x)|_a^b := f(b) − f(a).

Proposition 5.16 (a) Let f and g be functions with antiderivatives F and G, respectively. Then
af(x) + bg(x), a, b ∈ ℝ, has the antiderivative aF(x) + bG(x):

  ∫ (af + bg) dx = a ∫ f dx + b ∫ g dx.   (Linearity)

(b) If f and g are differentiable, and f′(x)g(x) has an antiderivative, then f(x)g′(x) has an
antiderivative, too:

  ∫ f g′ dx = f g − ∫ f′ g dx.   (Integration by parts)   (5.28)

If f and g are continuously differentiable on [a, b], then

  ∫_a^b f g′ dx = f(x)g(x)|_a^b − ∫_a^b f′ g dx.   (5.29)

(c) If φ : D → ℝ is continuously differentiable with φ(D) ⊂ I, and f : I → ℝ has an
antiderivative F, then

  ∫ f(φ(x)) φ′(x) dx = F(φ(x)).   (Change of variable)   (5.30)

If φ : [a, b] → ℝ is continuously differentiable with φ([a, b]) ⊂ I and f : I → ℝ is continuous,
then

  ∫_a^b f(φ(t)) φ′(t) dt = ∫_{φ(a)}^{φ(b)} f(x) dx.

Proof. Since differentiation is linear, (a) follows.
(b) Differentiating the right-hand side, we obtain

  d/dx (f g − ∫ f′ g dx) = f′ g + f g′ − f′ g = f g′,

which proves the statement.
(c) By the chain rule, F(φ(x)) is differentiable with

  d/dx F(φ(x)) = F′(φ(x)) φ′(x) = f(φ(x)) φ′(x),

and (c) follows.
The statements about the Riemann integrals follow from the statements about antiderivatives
using the fundamental theorem of calculus. For example, we show the second part of (c). By
the above part, F(φ(t)) is an antiderivative of f(φ(t))φ′(t). By the FTC we have

  ∫_a^b f(φ(t)) φ′(t) dt = F(φ(t))|_a^b = F(φ(b)) − F(φ(a)).

On the other hand, again by the FTC,

  ∫_{φ(a)}^{φ(b)} f(x) dx = F(x)|_{φ(a)}^{φ(b)} = F(φ(b)) − F(φ(a)).

This completes the proof of (c).

Corollary 5.17 Suppose F is an antiderivative of f. Then

  ∫ f(ax + b) dx = (1/a) F(ax + b),   a ≠ 0;   (5.31)
  ∫ g′(x)/g(x) dx = log | g(x) |   (g differentiable and g(x) ≠ 0).   (5.32)

Example 5.4 (a) The antiderivative of a polynomial. If p(x) = Σ_{k=0}^n aₖxᵏ, then

  ∫ p(x) dx = Σ_{k=0}^n aₖ/(k + 1) · x^{k+1}.

(b) Put f(x) = x and g′(x) = eˣ; then f′(x) = 1 and g(x) = eˣ, and we obtain

  ∫ x eˣ dx = x eˣ − ∫ 1 · eˣ dx = eˣ(x − 1).

(c) On I = (0, ∞):

  ∫ log x dx = ∫ 1 · log x dx = x log x − ∫ x · (1/x) dx = x log x − x.

(d)

  ∫ arctan x dx = ∫ 1 · arctan x dx = x arctan x − ∫ x/(1 + x²) dx
    = x arctan x − (1/2) ∫ (1 + x²)′/(1 + x²) dx = x arctan x − (1/2) log(1 + x²).

In the last equation we made use of (5.32).
(e) Recurrent computation of integrals. Let

  Iₙ := ∫ dx/(1 + x²)ⁿ,   n ∈ ℕ;   I₁ = arctan x.

Since

  1/(1 + x²)ⁿ = ((1 + x²) − x²)/(1 + x²)ⁿ,

we have Iₙ = I_{n−1} − ∫ x² dx/(1 + x²)ⁿ. In the remaining integral put u = x and
v′ = x/(1 + x²)ⁿ. Then u′ = 1 and

  v = ∫ x dx/(1 + x²)ⁿ = (1/2) · (1 + x²)^{1−n}/(1 − n).

Hence,

  Iₙ = I_{n−1} − (1/2) · x(1 + x²)^{1−n}/(1 − n) + 1/(2(1 − n)) ∫ (1 + x²)^{1−n} dx,

  Iₙ = x/((2n − 2)(1 + x²)^{n−1}) + (2n − 3)/(2n − 2) · I_{n−1}.

In particular, I₂ = x/(2(1 + x²)) + (1/2) arctan x and I₃ = x/(4(1 + x²)²) + (3/4) I₂.
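The recurrence can be cross-checked numerically. A short Python sketch (the function names are ours): evaluating the definite integral ∫_0^t dx/(1 + x²)ⁿ via the recurrence (the boundary terms vanish at 0) against a brute-force Riemann sum.

```python
import math

# Sketch (not from the text): I_n(t) = int_0^t dx/(1+x^2)^n via the recurrence
# I_n = x/((2n-2)(1+x^2)^(n-1)) + (2n-3)/(2n-2) * I_(n-1), with I_1 = arctan.

def I(n, t):
    if n == 1:
        return math.atan(t)
    return t / ((2 * n - 2) * (1 + t * t) ** (n - 1)) \
        + (2 * n - 3) / (2 * n - 2) * I(n - 1, t)

def brute(n, t, steps=10**5):
    # midpoint Riemann sum for int_0^t dx/(1+x^2)^n
    h = t / steps
    return sum(h / (1 + (i * h - h / 2) ** 2) ** n for i in range(1, steps + 1))

print(I(3, 1.0), brute(3, 1.0))   # both near int_0^1 dx/(1+x^2)^3
```

Both evaluations agree to many digits, confirming the algebra of the reduction step.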

Proposition 5.18 (Mean Value Theorem of Integration) Let f, φ : [a, b] → ℝ be continuous
functions and φ ≥ 0. Then there exists ξ ∈ [a, b] such that

  ∫_a^b f(x)φ(x) dx = f(ξ) ∫_a^b φ(x) dx.   (5.33)

In particular, in case φ = 1 we have

  ∫_a^b f(x) dx = f(ξ)(b − a)

for some ξ ∈ [a, b].

Proof. Put m = inf{f(x) | x ∈ [a, b]} and M = sup{f(x) | x ∈ [a, b]}. Since φ ≥ 0 we obtain
mφ(x) ≤ f(x)φ(x) ≤ Mφ(x). By Proposition 5.9 (a) and (b) we have

  m ∫_a^b φ(x) dx ≤ ∫_a^b f(x)φ(x) dx ≤ M ∫_a^b φ(x) dx.

Hence there is a μ ∈ [m, M] such that

  ∫_a^b f(x)φ(x) dx = μ ∫_a^b φ(x) dx.

Since f is continuous on [a, b], the intermediate value theorem (Theorem 3.5) ensures that there
is a ξ with μ = f(ξ). The claim follows.

Example 5.5 The trapezoid rule. Let f : [0, 1] → ℝ be twice continuously differentiable. Then
there exists ξ ∈ [0, 1] such that

  ∫_0^1 f(x) dx = (1/2)(f(0) + f(1)) − (1/12) f″(ξ).   (5.34)

Proof. Let φ(x) = (1/2) x(1 − x), so that φ(x) ≥ 0 for x ∈ [0, 1], φ′(x) = 1/2 − x, and
φ″(x) = −1. Using integration by parts twice as well as Theorem 5.18 we find

  ∫_0^1 f(x) dx = − ∫_0^1 φ″(x)f(x) dx = −φ′(x)f(x)|_0^1 + ∫_0^1 φ′(x)f′(x) dx
    = (1/2)(f(0) + f(1)) + φ(x)f′(x)|_0^1 − ∫_0^1 φ(x)f″(x) dx
    = (1/2)(f(0) + f(1)) − f″(ξ) ∫_0^1 φ(x) dx
    = (1/2)(f(0) + f(1)) − (1/12) f″(ξ).

Indeed, ∫_0^1 ((1/2)x − (1/2)x²) dx = (x²/4 − x³/6)|_0^1 = 1/4 − 1/6 = 1/12.
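Formula (5.34) predicts that the trapezoid value overshoots the integral by f″(ξ)/12 for some ξ ∈ [0, 1]. A tiny Python sketch (our names; the antiderivative is supplied by hand so the check is exact up to rounding):

```python
import math

# Sketch (not from the text): the error of the trapezoid value on [0, 1],
# (f(0)+f(1))/2 - int_0^1 f dx, must equal f''(xi)/12 for some xi in [0, 1].

def trapezoid_error(f, F):
    """Return trapezoid minus exact on [0, 1]; F is an antiderivative of f."""
    exact = F(1.0) - F(0.0)
    return 0.5 * (f(0.0) + f(1.0)) - exact

err = trapezoid_error(math.exp, math.exp)   # for exp, f'' = exp as well
print(err)   # lies between min f''/12 = 1/12 and max f''/12 = e/12
```

For f = exp the error (3 − e)/2 ≈ 0.141 indeed falls inside [1/12, e/12], consistent with the mean value form of the remainder.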

5.2.3 Integration of Rational Functions

We will give a useful method to compute antiderivatives of an arbitrary rational function.
Consider a rational function p/q where p and q are polynomials. We will assume that deg p <
deg q; for otherwise we can express p/q as a polynomial function plus a rational function which
is of this form, for example

  x²/(x − 1) = x + 1 + 1/(x − 1).
Polynomials

We need some preliminary facts on polynomials which are stated here without proof. Recall
that a non-zero constant polynomial has degree zero, deg c = 0 if c ≠ 0. By definition, the zero
polynomial has degree −∞, deg 0 = −∞.

Theorem 5.19 (Fundamental Theorem of Algebra) Every polynomial p of positive degree
with complex coefficients has a complex root, i.e. there exists a complex number z such that
p(z) = 0.

Lemma 5.20 (Long Division) Let p and q ≠ 0 be polynomials. Then there exist unique
polynomials s and r such that

  p = qs + r,   deg r < deg q.

Lemma 5.21 Let p be a complex polynomial of degree n ≥ 1 and leading coefficient aₙ. Then
there exist n uniquely determined numbers z₁, …, zₙ (which may be equal) such that

  p(z) = aₙ(z − z₁)(z − z₂) ⋯ (z − zₙ).

Proof. We use induction over n and the two preceding statements. In case n = 1 the linear
polynomial p(z) = az + b can be written in the desired form

  p(z) = a(z + b/a)   with the unique root z₁ = −b/a.

Suppose the statement is true for all polynomials of degree n − 1. We will show it for degree n
polynomials. For this, let zₙ be a complex root of p, which exists by Theorem 5.19; p(zₙ) = 0. Using
long division of p by the linear polynomial q(z) = z − zₙ we obtain a quotient polynomial p₁(z)
and a remainder polynomial r(z) of degree 0 (a constant polynomial) such that

  p(z) = (z − zₙ)p₁(z) + r(z).

Inserting z = zₙ gives p(zₙ) = 0 = r(zₙ); hence the constant r vanishes and we have

  p(z) = (z − zₙ)p₁(z)

with a polynomial p₁(z) of degree n − 1. Applying the induction hypothesis to p₁, the statement
follows.
A root λ of p is said to be a root of multiplicity k, k ∈ ℕ, if λ appears exactly k times among
the zeros z₁, z₂, …, zₙ. In that case (z − λ)ᵏ divides p(z) but (z − λ)^{k+1} does not.
If p is a real polynomial, i.e. a polynomial with real coefficients, and λ is a root of multiplicity
k of p, then λ̄ is also a root of multiplicity k of p. Indeed, conjugating the equation

  p(z) = (z − λ)ᵏ q(z)

and using p(z̄) = conj(p(z)) (the coefficients of p are real), we obtain, after substituting z := z̄,

  p(z) = (z − λ̄)ᵏ q̄(z).

Note that the product of the two complex linear factors z − λ and z − λ̄ yields a real quadratic
factor:

  (z − λ)(z − λ̄) = z² − (λ + λ̄)z + λλ̄ = z² − 2 Re λ · z + | λ |².

Using this fact, the real version of Lemma 5.21 is as follows.

Lemma 5.22 Let q be a real polynomial of degree n with leading coefficient aₙ. Then there
exist real numbers λᵢ, αⱼ, βⱼ and multiplicities rᵢ, sⱼ ∈ ℕ, i = 1, …, k, j = 1, …, l, such that

  q(x) = aₙ ∏_{i=1}^k (x − λᵢ)^{rᵢ} ∏_{j=1}^l (x² − 2αⱼx + βⱼ)^{sⱼ}.

We assume that the quadratic factors cannot be factored further; this means

  αⱼ² − βⱼ < 0,   j = 1, …, l.

Of course, deg q = Σᵢ rᵢ + Σⱼ 2sⱼ = n.

Example 5.6 (a) x⁴ − 4 = (x² + 2)(x² − 2) = (x − √2)(x + √2)(x − i√2)(x + i√2)
= (x − √2)(x + √2)(x² + 2).
(b) x³ + x − 2. One can guess the first zero x₁ = 1. Using long division one gets

  x³ + x − 2 = (x − 1)(x² + x + 2):

    x³         + x − 2  : (x − 1) = x² + x + 2
  −(x³ − x²)
       x²      + x − 2
     −(x²      − x)
              2x − 2
            −(2x − 2)
                   0

There are no further real zeros, since x² + x + 2 has no real roots (its discriminant 1 − 8 is negative).
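The long-division step above is mechanical enough to code. A minimal Python sketch (our helper, not from the text), representing polynomials as coefficient lists with the highest degree first:

```python
# Sketch (not from the text): long division p = q*s + r with deg r < deg q,
# for polynomials given as coefficient lists, highest-degree coefficient first.

def polydiv(p, q):
    p = list(p)
    s = []
    while len(p) >= len(q):
        coef = p[0] / q[0]
        s.append(coef)
        for i, c in enumerate(q):
            p[i] -= coef * c          # cancel the current leading term
        p.pop(0)
    return s, p                        # quotient, remainder

# x^3 + x - 2 divided by x - 1, as in Example 5.6 (b):
s, r = polydiv([1, 0, 1, -2], [1, -1])
print(s, r)   # quotient [1, 1, 2] = x^2 + x + 2, remainder 0
```

The zero remainder confirms that x = 1 is a root and reproduces the quotient x² + x + 2 computed by hand above.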


5.2.4 Partial Fraction Decomposition

Proposition 5.23 Let p(x) and q(x) be real polynomials with deg p < deg q. Then there exist real
numbers A_{ir}, B_{js}, and C_{js} such that

  p(x)/q(x) = Σ_{i=1}^k Σ_{r=1}^{rᵢ} A_{ir}/(x − λᵢ)ʳ
    + Σ_{j=1}^l Σ_{s=1}^{sⱼ} (B_{js}x + C_{js})/(x² − 2αⱼx + βⱼ)ˢ,   (5.35)

where the λᵢ, αⱼ, βⱼ, rᵢ, and sⱼ have the same meaning as in Lemma 5.22.
Example 5.7 (a) Compute ∫ f(x) dx = ∫ x⁴/(x³ − 1) dx. We use long division to obtain a
rational function p/q with deg p < deg q:

  f(x) = x + x/(x³ − 1).

To obtain the partial fraction decomposition (PFD), we need the factorization of the denominator
polynomial q(x) = x³ − 1. One can guess the first real zero x₁ = 1 and divide q by x − 1;
q(x) = (x − 1)(x² + x + 1). The PFD then reads

  x/(x³ − 1) = a/(x − 1) + (bx + c)/(x² + x + 1).

We have to determine a, b, c. Multiplication by x³ − 1 gives

  0·x² + 1·x + 0 = a(x² + x + 1) + (bx + c)(x − 1) = (a + b)x² + (a − b + c)x + a − c.

The two polynomials on the left and on the right must coincide, that is, their coefficients must
be equal:

  0 = a + b,   1 = a − b + c,   0 = a − c;

which gives a = c = 1/3 and b = −1/3. Hence,

  x/(x³ − 1) = (1/3) · 1/(x − 1) − (1/3) · (x − 1)/(x² + x + 1).

We can integrate the first two terms, but we have to rewrite the last one:

  (x − 1)/(x² + x + 1) = (1/2) · (2x + 1)/(x² + x + 1) − (3/2) · 1/((x + 1/2)² + 3/4).

Recall that

  ∫ (2x − 2α)/(x² − 2αx + β) dx = log | x² − 2αx + β |,
  ∫ dx/((x + b)² + a²) = (1/a) arctan((x + b)/a).

Therefore,

  ∫ x⁴/(x³ − 1) dx = x²/2 + (1/3) log | x − 1 | − (1/6) log(x² + x + 1)
    + (1/√3) arctan((2x + 1)/√3).

(b) If q(x) = (x 1)3 (x + 2)(x2 + 2)2 (x2 + 1) and p(x) is any polynomial with deg p < deg q =
10, then the partial fraction decomposition reads as
A11
A12
B11 x + C11 B12 x + C12 B21 + C21
A13
A21
p(x)
=
+
+
+
.
+
+
+
2
3
q(x)
x 1 (x 1)
(x 1)
x+2
x2 + 2
(x2 + 2)2
x2 + 1
(5.36)

5.2 Integration and Differentiation

141

Suppose now that $p(x) \equiv 1$. One can immediately compute $A_{13}$ and $A_{21}$. Multiplying (5.36) by $(x-1)^3$ yields
$$\frac{1}{(x+2)(x^2+2)^2(x^2+1)} = A_{13} + (x-1)p_1(x)$$
with a rational function $p_1$ not having $(x-1)$ in the denominator. Inserting $x = 1$ gives $A_{13} = \frac{1}{3\cdot 3^2\cdot 2} = \frac{1}{54}$. Similarly,
$$A_{21} = \left.\frac{1}{(x-1)^3(x^2+2)^2(x^2+1)}\right|_{x=-2} = \frac{1}{(-3)^3\cdot 6^2\cdot 5} = -\frac{1}{4860}.$$
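The two cover-up evaluations above are easy to reproduce numerically (a check of ours, not from the text):

```python
# Cover-up evaluations for q(x) = (x-1)^3 (x+2) (x^2+2)^2 (x^2+1), p(x) = 1:
# A13 comes from evaluating the remaining factors at x = 1,
# A21 from evaluating the remaining factors at x = -2.
A13 = 1 / ((1 + 2) * (1**2 + 2)**2 * (1**2 + 1))
A21 = 1 / ((-2 - 1)**3 * ((-2)**2 + 2)**2 * ((-2)**2 + 1))

assert A13 == 1 / 54
assert A21 == -1 / 4860
```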

5.2.5 Other Classes of Elementary Integrable Functions

An elementary function is a composition of rational, exponential, and trigonometric functions and their inverse functions, for example
$$f(x) = \frac{e^{\sin(\sqrt{x}-1)}}{x + \log x}.$$
A function is called elementary integrable if it has an elementary antiderivative. Rational functions are elementary integrable. Most functions are not elementary integrable, such as
$$e^{-x^2}, \qquad \frac{e^x}{x}, \qquad \frac{1}{\log x}, \qquad \frac{\sin x}{x}.$$
They define new functions:
$$W(x) := \int_0^x e^{-t^2/2}\,dt \qquad\text{(Gaussian integral)},$$
$$\mathrm{li}(x) := \int_0^x \frac{dt}{\log t} \qquad\text{(integral logarithm)},$$
$$F(\varphi,k) := \int_0^{\varphi} \frac{dx}{\sqrt{1-k^2\sin^2 x}} \qquad\text{(elliptic integral of the first kind)},$$
$$E(\varphi,k) := \int_0^{\varphi} \sqrt{1-k^2\sin^2 x}\,dx \qquad\text{(elliptic integral of the second kind)}.$$

$\int R(\cos x, \sin x)\,dx$

Let $R(u, v)$ be a rational function in two variables $u$ and $v$, that is $R(u,v) = \frac{p(u,v)}{q(u,v)}$ with polynomials $p$ and $q$ in two variables. We substitute $t = \tan\frac{x}{2}$. Then
$$\sin x = \frac{2t}{1+t^2}, \qquad \cos x = \frac{1-t^2}{1+t^2}, \qquad dx = \frac{2\,dt}{1+t^2}.$$
Hence
$$\int R(\cos x, \sin x)\,dx = \int R\left(\frac{1-t^2}{1+t^2}, \frac{2t}{1+t^2}\right)\frac{2\,dt}{1+t^2} = \int R_1(t)\,dt$$
with another rational function $R_1(t)$.

There are 3 special cases where another substitution is appropriate.

(a) $R(-u, v) = -R(u,v)$, $R$ is odd in $u$. Substitute $t = \sin x$.
(b) $R(u, -v) = -R(u,v)$, $R$ is odd in $v$. Substitute $t = \cos x$.
(c) $R(-u, -v) = R(u,v)$. Substitute $t = \tan x$.
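The half-angle identities behind the substitution $t = \tan(x/2)$ can be spot-checked numerically (a check of ours, not from the text):

```python
import math

# Verify sin x = 2t/(1+t^2) and cos x = (1-t^2)/(1+t^2) for t = tan(x/2).
for x in [0.3, 1.0, 2.5, -0.7]:
    t = math.tan(x / 2)
    assert math.isclose(math.sin(x), 2 * t / (1 + t**2), abs_tol=1e-12)
    assert math.isclose(math.cos(x), (1 - t**2) / (1 + t**2), abs_tol=1e-12)
```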

Example 5.8 (1) $\int \sin^3 x\,dx$. Here, $R(u,v) = v^3$ is an odd function in $v$, such that (b) applies; $t = \cos x$, $dt = -\sin x\,dx$, $\sin^2 x = 1 - \cos^2 x = 1 - t^2$. This yields
$$\int \sin^3 x\,dx = -\int \sin^2 x\,(-\sin x\,dx) = -\int (1-t^2)\,dt = -t + \frac{t^3}{3} + \text{const.} = -\cos x + \frac{\cos^3 x}{3} + \text{const.}$$
(2) $\int \tan x\,dx$. Here, $R(u,v) = \frac{v}{u}$. All of (a), (b), and (c) apply to this situation. For example, let $t = \sin x$. Then $\cos^2 x = 1 - t^2$, $dt = \cos x\,dx$ and
$$\int \tan x\,dx = \int \frac{\sin x\cos x\,dx}{\cos^2 x} = \int \frac{t\,dt}{1-t^2} = -\frac12\int \frac{d(1-t^2)}{1-t^2} = -\frac12\log(1-t^2) = -\log|\cos x|.$$

R(x,

n
ax + b) dx

The substitution
t=

ax + b

yields x = (tn b)/a, dx = ntn1 dt/a, and therefore


 n

Z
Z

t b
n
n
R
, t tn1 dt.
R(x, ax + b) dx =
a
a
R

R(x,

ax2 + 2bx + c) dx

Using the method of complete squares the above integral can be written in one of the three basic
forms
Z
Z
Z

2
2
R(t, t 1) dt,
R(t, 1 t2 ) dt.
R(t, t + 1) dt,
Further substitutions

t = sinh u,
t = cosh u,
t = cos u,

t2 + 1 = cosh u,

dt = cosh u du,

t2 1 = sinh u,

dt = sinh u du,

1 t2 = sin u,

reduce the integral to already known integrals.

dt = sin u du

Example 5.9 Compute $I = \int \frac{dx}{\sqrt{x^2+6x+5}}$. Hint: $t = \sqrt{x^2+6x+5} - x$.
Then $(x+t)^2 = x^2 + 2tx + t^2 = x^2 + 6x + 5$ such that $t^2 + 2tx = 6x + 5$ and therefore $x = \frac{t^2-5}{6-2t}$ and
$$dx = \frac{2t(6-2t) + 2(t^2-5)}{(6-2t)^2}\,dt = \frac{-2t^2 + 12t - 10}{(6-2t)^2}\,dt.$$
Hence, using $t + x = t + \frac{t^2-5}{6-2t} = \frac{-t^2+6t-5}{6-2t}$,
$$I = \int \frac{(-2t^2+12t-10)\,dt}{(6-2t)^2\,(t+x)} = \int \frac{-2(t^2-6t+5)(6-2t)}{(-t^2+6t-5)(6-2t)^2}\,dt = 2\int \frac{dt}{6-2t}$$
$$= -\log|6-2t| + \text{const.} = -\log\left|6 - 2\sqrt{x^2+6x+5} + 2x\right| + \text{const.}$$
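A numerical check of ours (not from the text): differentiating the antiderivative just found with a central difference should recover the integrand $1/\sqrt{x^2+6x+5}$.

```python
import math

# F is the antiderivative from Example 5.9, f the integrand; on x > -1 the
# expression inside the absolute value is positive.
def F(x):
    return -math.log(abs(6 - 2 * math.sqrt(x**2 + 6*x + 5) + 2*x))

def f(x):
    return 1 / math.sqrt(x**2 + 6*x + 5)

h = 1e-6
for x in [0.5, 1.0, 3.0]:
    num_deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert math.isclose(num_deriv, f(x), rel_tol=1e-5)
```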

5.3 Improper Integrals


The notion of the Riemann integral defined so far is apparently too tight for some applications:
we can integrate only over finite intervals and the functions are necessarily bounded. If the
integration interval is unbounded or the function to integrate is unbounded we speak about
improper integrals. We consider three cases: one limit of the integral is infinite; the function is
not defined at one of the end points a or b of the interval; both a and b are critical points (either
infinity or the function is not defined there).

5.3.1 Integrals on unbounded intervals


Definition 5.5 Suppose $f \in R$ on $[a, b]$ for all $b > a$ where $a$ is fixed. Define
$$\int_a^{\infty} f(x)\,dx = \lim_{b\to+\infty} \int_a^b f(x)\,dx \qquad(5.37)$$
if this limit exists (and is finite). In that case, we say that the integral on the left converges. If it also converges when $f$ is replaced by $|f|$, it is said to converge absolutely.
If an integral converges absolutely, then it converges, see Example 5.11 below, where
$$\left|\int_a^{\infty} f\,dx\right| \le \int_a^{\infty} |f|\,dx.$$
Similarly, one defines $\int_{-\infty}^b f(x)\,dx$. Moreover,
$$\int_{-\infty}^{\infty} f\,dx := \int_{-\infty}^a f\,dx + \int_a^{\infty} f\,dx$$
if both integrals on the right side converge.

Example 5.10 (a) The integral $\int_1^{\infty} \frac{dx}{x^s}$ converges for $s > 1$ and diverges for $0 < s \le 1$. Indeed,
$$\int_1^R \frac{dx}{x^s} = \frac{1}{1-s}\,\frac{1}{x^{s-1}}\Big|_1^R = \frac{1}{s-1}\left(1 - \frac{1}{R^{s-1}}\right).$$
Since
$$\lim_{R\to+\infty} \frac{1}{R^{s-1}} = \begin{cases} 0, & \text{if } s > 1,\\ +\infty, & \text{if } 0 < s < 1,\end{cases}$$
it follows that
$$\int_1^{\infty} \frac{dx}{x^s} = \frac{1}{s-1} \quad\text{if } s > 1,$$
while the integral diverges for $0 < s < 1$; for $s = 1$ it diverges as well, since $\int_1^R \frac{dx}{x} = \log R \to +\infty$.
(b) $\int_0^{\infty} e^{-x}\,dx = 1$. Indeed,
$$\int_0^R e^{-x}\,dx = -e^{-x}\Big|_0^R = 1 - \frac{1}{e^R} \longrightarrow 1 \quad\text{as } R \to \infty.$$
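A small numerical illustration of ours (function name hypothetical) of Example 5.10 (a), using the closed form of $\int_1^R dx/x^s$ derived above:

```python
import math

# Closed form of the integral over [1, R] from the computation above.
def integral_1_to_R(s, R):
    if s == 1:
        return math.log(R)
    return (1 - R**(1 - s)) / (s - 1)

assert abs(integral_1_to_R(2, 1e6) - 1.0) < 1e-5   # s = 2: tends to 1/(s-1) = 1
assert integral_1_to_R(0.5, 1e6) > 1000            # s = 1/2: grows without bound
assert integral_1_to_R(1, 1e6) > 10                # s = 1: log R also diverges
```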

Proposition 5.24 (Cauchy criterion) The improper integral $\int_a^{\infty} f\,dx$ converges if and only if for every $\varepsilon > 0$ there exists some $b > a$ such that for all $c, d > b$
$$\left|\int_c^d f\,dx\right| < \varepsilon.$$

Proof. The following Cauchy criterion for limits of functions is easily proved using sequences: The limit $\lim_{x\to\infty} F(x)$ exists if and only if
$$\forall\,\varepsilon > 0\ \exists\,R > 0\ \forall\,x, y > R:\ |F(x) - F(y)| < \varepsilon. \qquad(5.38)$$
Indeed, suppose that $(x_n)$ is any sequence converging to $+\infty$ as $n\to\infty$. We will show that $(F(x_n))$ converges if (5.38) is satisfied. Let $\varepsilon > 0$. By assumption, there exists $R > 0$ with the above property. Since $x_n \to +\infty$ there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $x_n > R$. Hence, $|F(x_n) - F(x_m)| < \varepsilon$ as $m, n \ge n_0$. Thus, $(F(x_n))$ is a Cauchy sequence and therefore convergent. This proves one direction of the above criterion. The inverse direction is even simpler: Suppose that $\lim_{x\to+\infty} F(x) = A$ exists (and is finite!). We will show that the above criterion is satisfied. Let $\varepsilon > 0$. By definition of the limit there exists $R > 0$ such that $x, y > R$ imply $|F(x) - A| < \varepsilon/2$ and $|F(y) - A| < \varepsilon/2$. By the triangle inequality,
$$|F(x) - F(y)| = |F(x) - A - (F(y) - A)| \le |F(x) - A| + |F(y) - A| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$
as $x, y > R$, which completes the proof of this Cauchy criterion.
Applying this criterion to the function $F(t) = \int_a^t f\,dx$, noting that $|F(d) - F(c)| = \left|\int_c^d f\,dx\right|$, the limit $\lim_{t\to\infty} F(t)$ exists.


Example 5.11 (a) If $\int_a^{\infty} f\,dx$ converges absolutely, then $\int_a^{\infty} f\,dx$ converges. Indeed, let $\varepsilon > 0$ and suppose $\int_a^{\infty} |f|\,dx$ converges. By the Cauchy criterion for the latter integral and by the triangle inequality, Proposition 5.10, there exists $b > 0$ such that for all $c, d > b$
$$\left|\int_c^d f\,dx\right| \le \int_c^d |f|\,dx < \varepsilon. \qquad(5.39)$$
Hence, the Cauchy criterion is satisfied for $f$ if it holds for $|f|$. Thus, $\int_a^{\infty} f\,dx$ converges.
(b) $\int_1^{\infty} \frac{\sin x}{x}\,dx$. Partial integration with $u = \frac1x$ and $v' = \sin x$ yields $u' = -\frac{1}{x^2}$, $v = -\cos x$ and
$$\int_c^d \frac{\sin x}{x}\,dx = -\frac{\cos x}{x}\Big|_c^d - \int_c^d \frac{\cos x}{x^2}\,dx,$$
$$\left|\int_c^d \frac{\sin x}{x}\,dx\right| \le \frac{|\cos d|}{d} + \frac{|\cos c|}{c} + \int_c^d \frac{dx}{x^2} \le \frac1d + \frac1c + \frac1c - \frac1d = \frac2c < \varepsilon$$
if $c$ and $d$ are sufficiently large. Hence, $\int_1^{\infty} \frac{\sin x}{x}\,dx$ converges.
The integral does not converge absolutely. For non-negative integers $n \in \mathbb{Z}_+$ we have
$$\int_{n\pi}^{(n+1)\pi} \frac{|\sin x|}{x}\,dx \ge \frac{1}{(n+1)\pi}\int_{n\pi}^{(n+1)\pi} |\sin x|\,dx = \frac{2}{(n+1)\pi};$$
hence
$$\int_{\pi}^{(n+1)\pi} \frac{|\sin x|}{x}\,dx \ge \frac{2}{\pi}\sum_{k=1}^{n} \frac{1}{k+1}.$$
Since the harmonic series diverges, so does the integral $\int \frac{|\sin x|}{x}\,dx$.
Proposition 5.25 Suppose $f \in R$ is nonnegative, $f \ge 0$. Then $\int_a^{\infty} f\,dx$ converges if there exists $C > 0$ such that
$$\int_a^b f\,dx < C \quad \text{for all } b > a.$$
The proof is similar to the proof of Lemma 2.19 (c); we omit it. Analogous propositions are true for integrals $\int_{-\infty}^a f\,dx$.

Proposition 5.26 (Integral criterion for series) Assume that $f \in R$ is nonnegative, $f \ge 0$, and decreasing on $[1, +\infty)$. Then $\int_1^{\infty} f\,dx$ converges if and only if the series $\sum_{n=1}^{\infty} f(n)$ converges.
Proof. Since $f(n) \le f(x) \le f(n-1)$ for $n-1 \le x \le n$,
$$f(n) \le \int_{n-1}^{n} f\,dx \le f(n-1).$$
Summation over $n = 2, 3, \dots, N$ yields
$$\sum_{n=2}^{N} f(n) \le \int_1^{N} f\,dx \le \sum_{n=1}^{N-1} f(n).$$


If $\int_1^{\infty} f\,dx$ converges, the series $\sum_{n=1}^{\infty} f(n)$ is bounded and therefore convergent.
Conversely, if $\sum_{n=1}^{\infty} f(n)$ converges, the integral $\int_1^R f\,dx$ is bounded by $\sum_{n=1}^{\infty} f(n)$ as $R \to \infty$, hence convergent by Proposition 5.25.

Example 5.12 $\sum_{n=2}^{\infty} \frac{1}{n(\log n)^{\alpha}}$ converges if and only if $\int_2^{\infty} \frac{dx}{x(\log x)^{\alpha}}$ converges. The substitution $y = \log x$, $dy = \frac{dx}{x}$, gives
$$\int_2^{\infty} \frac{dx}{x(\log x)^{\alpha}} = \int_{\log 2}^{\infty} \frac{dy}{y^{\alpha}},$$
which converges if and only if $\alpha > 1$ (see Example 5.10).
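A numerical illustration of ours (not from the text) for $\alpha = 2$: by the bracketing used in the proof of Proposition 5.26, the partial sums stay below $f(2) + \int_2^{\infty} f\,dx = f(2) + 1/\log 2$.

```python
import math

# f(x) = 1/(x (log x)^2); partial sums of f(n) are bounded by f(2) + 1/log 2.
f = lambda n: 1 / (n * math.log(n)**2)
partial = sum(f(n) for n in range(2, 100000))
bound = f(2) + 1 / math.log(2)

assert partial < bound
assert partial > 1.0   # the series converges to a finite, non-trivial value
```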

5.3.2 Integrals of Unbounded Functions


Definition 5.6 Suppose $f$ is a real function on $[a, b)$ and $f \in R$ on $[a, t]$ for every $t$, $a < t < b$. Define
$$\int_a^b f\,dx = \lim_{t\to b-0} \int_a^t f\,dx$$
if the limit on the right exists. Similarly, one defines
$$\int_a^b f\,dx = \lim_{t\to a+0} \int_t^b f\,dx$$
if $f$ is unbounded at $a$ and integrable on $[t, b]$ for all $t$ with $a < t < b$.
In both cases we say that $\int_a^b f\,dx$ converges.

Example 5.13 (a)
$$\int_0^1 \frac{dx}{\sqrt{1-x^2}} = \lim_{t\to 1-0} \int_0^t \frac{dx}{\sqrt{1-x^2}} = \lim_{t\to 1-0} \arcsin x\Big|_0^t = \lim_{t\to 1-0} \arcsin t = \arcsin 1 = \frac{\pi}{2}.$$
(b)
$$\int_0^1 \frac{dx}{x^{\alpha}} = \lim_{t\to 0+0} \int_t^1 \frac{dx}{x^{\alpha}} = \lim_{t\to 0+0} \begin{cases} \dfrac{1}{1-\alpha}\,x^{1-\alpha}\Big|_t^1, & \alpha \ne 1,\\[1ex] \log x\Big|_t^1, & \alpha = 1 \end{cases} \;=\; \begin{cases} \dfrac{1}{1-\alpha}, & \alpha < 1,\\[1ex] +\infty, & \alpha \ge 1. \end{cases}$$

Remarks 5.4 (a) The analogous statements to Proposition 5.24 and Proposition 5.25 are true for improper integrals $\int_a^b f\,dx$.
For example, $\int_0^1 \frac{dx}{x(1-x)}$ diverges since both improper integrals $\int_0^{1/2} f\,dx$ and $\int_{1/2}^1 f\,dx$ diverge, $\int_0^1 \frac{dx}{\sqrt{x}\,(1-x)}$ diverges since it diverges at $x = 1$, and finally $I = \int_0^1 \frac{dx}{\sqrt{x(1-x)}}$ converges. Indeed, the substitution $x = \sin^2 t$ gives $I = \pi$.


(b) If $f$ is unbounded both at $a$ and at $b$ we define the improper integral
$$\int_a^b f\,dx = \int_a^c f\,dx + \int_c^b f\,dx$$


if $c$ is between $a$ and $b$ and both improper integrals on the right side exist.
(c) Also, if $f$ is unbounded at $a$, define
$$\int_a^{\infty} f\,dx = \int_a^b f\,dx + \int_b^{\infty} f\,dx$$
if the two improper integrals on the right side exist.


(d) If $f$ is unbounded in the interior of the interval $[a, b]$, say at $c$, we define the improper integral
$$\int_a^b f\,dx = \int_a^c f\,dx + \int_c^b f\,dx$$
if the two improper integrals on the right side exist. For example,
$$\int_{-1}^1 \frac{dx}{\sqrt{|x|}} = \int_{-1}^0 \frac{dx}{\sqrt{|x|}} + \int_0^1 \frac{dx}{\sqrt{|x|}} = \lim_{t\to 0-0} \int_{-1}^t \frac{dx}{\sqrt{|x|}} + \lim_{t\to 0+0} \int_t^1 \frac{dx}{\sqrt{|x|}}$$
$$= \lim_{t\to 0-0} \left(-2\sqrt{-x}\right)\Big|_{-1}^t + \lim_{t\to 0+0} 2\sqrt{x}\,\Big|_t^1 = 2 + 2 = 4.$$

5.3.3 The Gamma function


For $x > 0$ set
$$\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt. \qquad(5.40)$$
By Example 5.13, $\Gamma_1(x) = \int_0^1 t^{x-1}e^{-t}\,dt$ converges since for every $t > 0$
$$t^{x-1}e^{-t} \le \frac{1}{t^{1-x}}.$$
By Example 5.10, $\Gamma_2(x) = \int_1^{\infty} t^{x-1}e^{-t}\,dt$ converges since for every $t \ge t_0$
$$t^{x-1}e^{-t} \le \frac{1}{t^2}.$$
Note that $\lim_{t\to\infty} t^{x+1}e^{-t} = 0$ by Proposition 3.11. Hence, $\Gamma(x)$ is defined for every $x > 0$.
Proposition 5.27 For every positive $x$
$$x\Gamma(x) = \Gamma(x+1). \qquad(5.41)$$
In particular, for $n \in \mathbb{N}$ we have $\Gamma(n+1) = n!$.


Proof. Using integration by parts,
$$\int_{\varepsilon}^{R} t^{x}e^{-t}\,dt = -t^{x}e^{-t}\Big|_{\varepsilon}^{R} + x\int_{\varepsilon}^{R} t^{x-1}e^{-t}\,dt.$$
Taking the limits $\varepsilon \to 0+0$ and $R \to +\infty$ one has $\Gamma(x+1) = x\Gamma(x)$. Since by Example 5.10
$$\Gamma(1) = \int_0^{\infty} e^{-t}\,dt = 1,$$
it follows from (5.41) that
$$\Gamma(n+1) = n\Gamma(n) = \cdots = n(n-1)(n-2)\cdots\Gamma(1) = n!$$

The Gamma function interpolates the factorial function n! which is defined only for positive
integers n. However, this property alone is not sufficient for a complete characterization of the
Gamma function. We need another property. This will be done in more detail in the appendix
to this chapter.
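Both the interpolation property and the functional equation (5.41) are easy to check numerically with the standard-library Gamma function (a check of ours, not from the text):

```python
import math

# Gamma(n+1) = n! for small integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n + 1), math.factorial(n), rel_tol=1e-12)

# The functional equation Gamma(x+1) = x * Gamma(x) at a non-integer point.
x = 2.5
assert math.isclose(math.gamma(x + 1), x * math.gamma(x), rel_tol=1e-12)
```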

5.4 Integration of Vector-Valued Functions


A mapping $\gamma: [a,b] \to \mathbb{R}^k$, $\gamma(t) = (\gamma_1(t), \dots, \gamma_k(t))$, is said to be continuous if all the mappings $\gamma_i$, $i = 1, \dots, k$, are continuous. Moreover, if all the $\gamma_i$ are differentiable, we write $\gamma'(t) = (\gamma_1'(t), \dots, \gamma_k'(t))$.
Definition 5.7 Let $f_1, \dots, f_k$ be real functions on $[a, b]$ and let $f = (f_1, \dots, f_k)$ be the corresponding mapping from $[a, b]$ into $\mathbb{R}^k$. If $\alpha$ increases on $[a, b]$, to say that $f \in R(\alpha)$ means that $f_j \in R(\alpha)$ for $j = 1, \dots, k$. In this case we define
$$\int_a^b f\,d\alpha = \left(\int_a^b f_1\,d\alpha, \dots, \int_a^b f_k\,d\alpha\right).$$
In other words, $\int_a^b f\,d\alpha$ is the point in $\mathbb{R}^k$ whose $j$th coordinate is $\int_a^b f_j\,d\alpha$. It is clear that parts (a), (c), and (e) of Proposition 5.9 are valid for these vector-valued integrals; we simply apply the earlier results to each coordinate. The same is true for Proposition 5.12, Theorem 5.14, and Theorem 5.15. To illustrate this, we state the analog of the fundamental theorem of calculus.

Theorem 5.28 If $f = (f_1, \dots, f_k) \in R$ on $[a, b]$ and if $F = (F_1, \dots, F_k)$ is an antiderivative of $f$ on $[a, b]$, then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$

The analog of Proposition 5.10 (b) offers some new features. Let $x = (x_1, \dots, x_k) \in \mathbb{R}^k$ be any vector in $\mathbb{R}^k$. We denote its Euclidean norm by $\|x\| = \sqrt{x_1^2 + \cdots + x_k^2}$.
Proposition 5.29 If $f = (f_1, \dots, f_k) \in R(\alpha)$ on $[a, b]$ then $\|f\| \in R(\alpha)$ and
$$\left\|\int_a^b f\,d\alpha\right\| \le \int_a^b \|f\|\,d\alpha. \qquad(5.42)$$

Proof. By the definition of the norm,
$$\|f\| = \left(f_1^2 + f_2^2 + \cdots + f_k^2\right)^{\frac12}.$$
By Proposition 5.10 (a) each of the functions $f_i^2$ belongs to $R(\alpha)$; hence so does their sum $f_1^2 + f_2^2 + \cdots + f_k^2$. Note that the square root is a continuous function on the positive half line. If we apply Proposition 5.8 we see $\|f\| \in R(\alpha)$.
To prove (5.42), put $y = (y_1, \dots, y_k)$ with $y_j = \int_a^b f_j\,d\alpha$. Then we have $y = \int_a^b f\,d\alpha$, and
$$\|y\|^2 = \sum_{j=1}^k y_j^2 = \sum_{j=1}^k y_j \int_a^b f_j\,d\alpha = \int_a^b \sum_{j=1}^k (y_j f_j)\,d\alpha.$$
By the Cauchy–Schwarz inequality, Proposition 1.25,
$$\sum_{j=1}^k y_j f_j(t) \le \|y\|\,\|f(t)\|, \qquad t \in [a,b].$$
Inserting this into the preceding equation, the monotony of the integral gives
$$\|y\|^2 \le \|y\| \int_a^b \|f\|\,d\alpha.$$
If $y = 0$, (5.42) is trivial. If $y \ne 0$, division by $\|y\|$ gives (5.42).

Integration of Complex Valued Functions


This is a special case of the above arguments with $k = 2$. Let $\varphi: [a,b] \to \mathbb{C}$ be a complex-valued function. Let $u, v: [a,b] \to \mathbb{R}$ be the real and imaginary parts of $\varphi$, respectively; $u = \operatorname{Re}\varphi$ and $v = \operatorname{Im}\varphi$.
The function $\varphi = u + iv$ is said to be integrable if $u, v \in R$ on $[a, b]$, and we set
$$\int_a^b \varphi\,dx = \int_a^b u\,dx + i\int_a^b v\,dx.$$
The fundamental theorem of calculus holds: If the complex function $\varphi$ is Riemann integrable, $\varphi \in R$ on $[a, b]$, and $F(x)$ is an antiderivative of $\varphi$, then
$$\int_a^b \varphi(x)\,dx = F(b) - F(a).$$
Similarly, if $u$ and $v$ are both continuous, $F(x) = \int_a^x \varphi(t)\,dt$ is an antiderivative of $\varphi(x)$.
Proof. Let $F = U + iV$ be the antiderivative of $\varphi$ where $U' = u$ and $V' = v$. By the fundamental theorem of calculus
$$\int_a^b \varphi\,dx = \int_a^b u\,dx + i\int_a^b v\,dx = U(b) - U(a) + i\left(V(b) - V(a)\right) = F(b) - F(a).$$
Example:
$$\int_a^b e^{\lambda t}\,dt = \frac{1}{\lambda}\,e^{\lambda t}\Big|_a^b, \qquad \lambda \in \mathbb{C},\ \lambda \ne 0.$$


5.5 Inequalities
Besides the triangle inequality $\left|\int_a^b f\,d\alpha\right| \le \int_a^b |f|\,d\alpha$, which was shown in Proposition 5.10, we can formulate Hölder's, Minkowski's, and the Cauchy–Schwarz inequalities for Riemann–Stieltjes integrals. For, let $p > 0$ be a fixed positive real number and $\alpha$ an increasing function on $[a, b]$. For $f \in R(\alpha)$ define the $L^p$-norm
$$\|f\|_p = \left(\int_a^b |f|^p\,d\alpha\right)^{\frac1p}. \qquad(5.43)$$

Cauchy–Schwarz Inequality
Proposition 5.30 Let $f, g: [a,b] \to \mathbb{C}$ be complex valued functions and $f, g \in R$ on $[a, b]$. Then
$$\left(\int_a^b |fg|\,dx\right)^2 \le \int_a^b |f|^2\,dx \int_a^b |g|^2\,dx. \qquad(5.44)$$
Proof. Replacing $f$ by $|f|$ and $g$ by $|g|$, it suffices to show $\left(\int fg\,dx\right)^2 \le \int f^2\,dx \int g^2\,dx$ for nonnegative $f$ and $g$. For, put $A = \int_a^b g^2\,dx$, $B = \int_a^b fg\,dx$, and $C = \int_a^b f^2\,dx$. Let $\lambda \in \mathbb{R}$ be arbitrary. By the positivity and linearity of the integral,
$$0 \le \int_a^b (f + \lambda g)^2\,dx = \int_a^b f^2\,dx + 2\lambda \int_a^b fg\,dx + \lambda^2 \int_a^b g^2\,dx = C + 2B\lambda + A\lambda^2 =: h(\lambda).$$
Thus, $h$ is non-negative for all real values $\lambda$.
Case 1. $A = 0$. Inserting this, we get $2B\lambda + C \ge 0$ for all $\lambda \in \mathbb{R}$. This implies $B = 0$ and $C \ge 0$; the inequality is satisfied.
Case 2. $A > 0$. Dividing the above inequality by $A$, we have
$$0 \le \lambda^2 + \frac{2B}{A}\lambda + \frac{C}{A} = \left(\lambda + \frac{B}{A}\right)^2 - \left(\frac{B}{A}\right)^2 + \frac{C}{A}.$$
This is satisfied for all $\lambda$ if and only if
$$\left(\frac{B}{A}\right)^2 \le \frac{C}{A},$$
and, finally, $B^2 \le AC$. This completes the proof.

Proposition 5.31 (a) Cauchy–Schwarz inequality. Suppose $f, g \in R(\alpha)$; then
$$\left|\int_a^b fg\,d\alpha\right| \le \int_a^b |fg|\,d\alpha \le \sqrt{\int_a^b |f|^2\,d\alpha}\;\sqrt{\int_a^b |g|^2\,d\alpha}, \quad\text{or} \qquad(5.45)$$
$$\int_a^b |fg|\,d\alpha \le \|f\|_2\,\|g\|_2. \qquad(5.46)$$
(b) Hölder's inequality. Let $p$ and $q$ be positive real numbers such that $\frac1p + \frac1q = 1$. If $f, g \in R(\alpha)$, then
$$\left|\int_a^b fg\,d\alpha\right| \le \int_a^b |fg|\,d\alpha \le \|f\|_p\,\|g\|_q. \qquad(5.47)$$
(c) Minkowski's inequality. Let $p \ge 1$ and $f, g \in R(\alpha)$; then
$$\|f + g\|_p \le \|f\|_p + \|g\|_p. \qquad(5.48)$$

5.6 Appendix D
The composition of an integrable and a continuous function is integrable
Proof of Proposition 5.8. Let $\varepsilon > 0$. Since $\varphi$ is uniformly continuous on $[m, M]$, there exists $\delta > 0$ such that $\delta < \varepsilon$ and $|\varphi(s) - \varphi(t)| < \varepsilon$ if $|s - t| \le \delta$ and $s, t \in [m, M]$.
Since $f \in R(\alpha)$, there exists a partition $P = \{x_0, x_1, \dots, x_n\}$ of $[a, b]$ such that
$$U(P, f, \alpha) - L(P, f, \alpha) < \delta^2. \qquad(5.49)$$
Let $M_i$ and $m_i$ have the same meaning as in Definition 5.1, and let $M_i^*$ and $m_i^*$ be the analogous numbers for $h$. Divide the numbers $1, 2, \dots, n$ into two classes: $i \in A$ if $M_i - m_i < \delta$ and $i \in B$ if $M_i - m_i \ge \delta$. For $i \in A$ our choice of $\delta$ shows that $M_i^* - m_i^* \le \varepsilon$. For $i \in B$, $M_i^* - m_i^* \le 2K$ where $K = \sup\{|\varphi(t)| \mid m \le t \le M\}$. By (5.49), we have
$$\delta \sum_{i\in B} \Delta\alpha_i \le \sum_{i\in B} (M_i - m_i)\,\Delta\alpha_i < \delta^2 \qquad(5.50)$$
so that $\sum_{i\in B} \Delta\alpha_i < \delta$. It follows that
$$U(P, h, \alpha) - L(P, h, \alpha) = \sum_{i\in A} (M_i^* - m_i^*)\,\Delta\alpha_i + \sum_{i\in B} (M_i^* - m_i^*)\,\Delta\alpha_i$$
$$\le \varepsilon\left(\alpha(b) - \alpha(a)\right) + 2K\delta < \varepsilon\left(\alpha(b) - \alpha(a) + 2K\right).$$
Since $\varepsilon$ was arbitrary, Proposition 5.3 implies that $h \in R(\alpha)$.

Convex Functions are Continuous


Proposition 5.32 Every convex function $f: (a, b) \to \mathbb{R}$, $-\infty \le a < b \le +\infty$, is continuous.
Proof. There is a very nice geometric proof in Rudin's book "Real and Complex Analysis", see [Rud66, 3.2 Theorem]. We give another proof here.
Let $x \in (a, b)$; choose a finite subinterval $(x_1, x_2)$ with $a < x_1 < x < x_2 < b$. Since $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$, $\lambda \in [0,1]$, $f$ is bounded above on $[x_1, x_2]$. Choosing $x_3$ with $x_1 < x_3 < x$, the convexity of $f$ implies
$$\frac{f(x_3) - f(x_1)}{x_3 - x_1} \le \frac{f(x) - f(x_1)}{x - x_1} \implies f(x) \ge f(x_1) + \frac{f(x_3) - f(x_1)}{x_3 - x_1}\,(x - x_1).$$
This means that $f$ is bounded below on $[x_3, x_2]$ by a linear function; hence $f$ is bounded on $[x_3, x_2]$, say $|f(x)| \le C$ on $[x_3, x_2]$.
The convexity implies
$$f(x) = f\left(\tfrac12(x+h) + \tfrac12(x-h)\right) \le \tfrac12\left(f(x+h) + f(x-h)\right) \implies f(x) - f(x-h) \le f(x+h) - f(x).$$
Iteration yields
$$f(x - (\nu-1)h) - f(x - \nu h) \le f(x+h) - f(x) \le f(x + \nu h) - f(x + (\nu-1)h).$$
Summing up over $\nu = 1, \dots, n$ we have
$$f(x) - f(x - nh) \le n\left(f(x+h) - f(x)\right) \le f(x + nh) - f(x)$$
$$\implies \frac1n\left(f(x) - f(x - nh)\right) \le f(x+h) - f(x) \le \frac1n\left(f(x + nh) - f(x)\right).$$
Let $\varepsilon > 0$ be given; choose $n \in \mathbb{N}$ such that $2C/n < \varepsilon$ and choose $h$ such that $x_3 < x - nh < x < x + nh < x_2$. The above inequality then implies
$$|f(x+h) - f(x)| \le \frac{2C}{n} < \varepsilon.$$
This shows continuity of $f$ at $x$.
If $g$ is an increasing convex function and $f$ is a convex function, then $g\circ f$ is convex since $f(\lambda x + \mu y) \le \lambda f(x) + \mu f(y)$, $\lambda + \mu = 1$, $\lambda, \mu \ge 0$, implies
$$g(f(\lambda x + \mu y)) \le g(\lambda f(x) + \mu f(y)) \le \lambda g(f(x)) + \mu g(f(y)).$$

5.6.1 More on the Gamma Function


Let $I \subseteq \mathbb{R}$ be an interval. A positive function $F: I \to \mathbb{R}$ is called logarithmically convex if $\log F: I \to \mathbb{R}$ is convex, i.e. for every $x, y \in I$ and every $\lambda$, $0 \le \lambda \le 1$, we have
$$F(\lambda x + (1-\lambda)y) \le F(x)^{\lambda}\,F(y)^{1-\lambda}.$$
Proposition 5.33 The Gamma function is logarithmically convex.
Proof. Let $x, y > 0$ and $0 < \lambda < 1$ be given. Set $p = 1/\lambda$ and $q = 1/(1-\lambda)$. Then $1/p + 1/q = 1$ and we apply Hölder's inequality to the functions
$$f(t) = t^{\frac{x-1}{p}}\,e^{-\frac{t}{p}}, \qquad g(t) = t^{\frac{y-1}{q}}\,e^{-\frac{t}{q}}$$
and obtain
$$\int_{\varepsilon}^{R} f(t)g(t)\,dt \le \left(\int_{\varepsilon}^{R} f(t)^p\,dt\right)^{\frac1p} \left(\int_{\varepsilon}^{R} g(t)^q\,dt\right)^{\frac1q}.$$
Note that
$$f(t)g(t) = t^{\frac{x}{p} + \frac{y}{q} - 1}\,e^{-t}, \qquad f(t)^p = t^{x-1}e^{-t}, \qquad g(t)^q = t^{y-1}e^{-t}.$$
Taking the limits $\varepsilon \to 0+0$ and $R \to +\infty$ we obtain
$$\Gamma\left(\frac{x}{p} + \frac{y}{q}\right) \le \Gamma(x)^{\frac1p}\,\Gamma(y)^{\frac1q}.$$
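The logarithmic convexity just proved can be spot-checked numerically (a check of ours, not from the text):

```python
import math

# Gamma(lam*x + (1-lam)*y) <= Gamma(x)**lam * Gamma(y)**(1-lam)
for x, y, lam in [(1.0, 5.0, 0.3), (0.5, 2.0, 0.5), (3.0, 7.0, 0.9)]:
    left = math.gamma(lam * x + (1 - lam) * y)
    right = math.gamma(x)**lam * math.gamma(y)**(1 - lam)
    assert left <= right + 1e-12
```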

Remark 5.5 One can prove that a convex function (see Definition 4.4) is continuous, see Proposition 5.32. Also, an increasing convex function of a convex function $f$ is convex; for example, $e^f$ is convex if $f$ is. We conclude that $\Gamma(x)$ is continuous for $x > 0$.
Theorem 5.34 Let $F: (0, +\infty) \to (0, +\infty)$ be a function with
(a) $F(1) = 1$,
(b) $F(x+1) = xF(x)$,
(c) $F$ is logarithmically convex.
Then $F(x) = \Gamma(x)$ for all $x > 0$.
Proof. Since $\Gamma(x)$ has the properties (a), (b), and (c) it suffices to prove that $F$ is uniquely determined by (a), (b), and (c). By (b),
$$F(x+n) = F(x)\,x(x+1)\cdots(x+n-1)$$
for every positive $x$ and every positive integer $n$. In particular $F(n+1) = n!$ and it suffices to show that $F(x)$ is uniquely determined for every $x$ with $x \in (0,1)$. Since $n + x = (1-x)n + x(n+1)$, from (c) it follows
$$F(n+x) \le F(n)^{1-x}\,F(n+1)^x = F(n)^{1-x}\,F(n)^x\,n^x = (n-1)!\,n^x.$$
Similarly, from $n+1 = x(n+x) + (1-x)(n+1+x)$ it follows
$$n! = F(n+1) \le F(n+x)^x\,F(n+1+x)^{1-x} = F(n+x)\,(n+x)^{1-x}.$$
Combining both inequalities,
$$n!\,(n+x)^{x-1} \le F(n+x) \le (n-1)!\,n^x$$
and moreover
$$a_n(x) := \frac{n!\,(n+x)^{x-1}}{x(x+1)\cdots(x+n-1)} \le F(x) \le \frac{(n-1)!\,n^x}{x(x+1)\cdots(x+n-1)} =: b_n(x).$$
Since $\frac{b_n(x)}{a_n(x)} = \frac{(n+x)\,n^x}{n\,(n+x)^x}$ converges to $1$ as $n\to\infty$,
$$F(x) = \lim_{n\to\infty} \frac{(n-1)!\,n^x}{x(x+1)\cdots(x+n-1)}.$$
Hence $F$ is uniquely determined.

Stirling's Formula

We give an asymptotic formula for $n!$ as $n \to \infty$. We call two sequences $(a_n)$ and $(b_n)$ asymptotically equal if $\lim_{n\to\infty} \frac{a_n}{b_n} = 1$, and we write $a_n \sim b_n$.
Proposition 5.35 (Stirling's Formula) The asymptotic behavior of $n!$ is
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n.$$
Proof. Using the trapezoid rule (5.34) with $f(x) = \log x$, $f''(x) = -1/x^2$, we have
$$\int_k^{k+1} \log x\,dx = \frac12\left(\log k + \log(k+1)\right) + \frac{1}{12\,\xi_k^2}$$
with $k \le \xi_k \le k+1$. Summation over $k = 1, \dots, n-1$ gives
$$\int_1^n \log x\,dx = \sum_{k=1}^n \log k - \frac12\log n + \frac{1}{12}\sum_{k=1}^{n-1} \frac{1}{\xi_k^2}.$$
Since $\int \log x\,dx = x\log x - x$ (integration by parts), we have
$$n\log n - n + 1 = \sum_{k=1}^n \log k - \frac12\log n + \frac{1}{12}\sum_{k=1}^{n-1}\frac{1}{\xi_k^2}
\implies \sum_{k=1}^n \log k = \left(n + \frac12\right)\log n - n + \gamma_n,$$
where $\gamma_n = 1 - \frac{1}{12}\sum_{k=1}^{n-1} \frac{1}{\xi_k^2}$.
Exponentiating both sides of the equation we find, with $c_n = e^{\gamma_n}$,
$$n! = n^{n+\frac12}\,e^{-n}\,c_n. \qquad(5.51)$$
Since $0 < 1/\xi_k^2 \le 1/k^2$, the limit
$$\gamma = 1 - \frac{1}{12}\sum_{k=1}^{\infty} \frac{1}{\xi_k^2} = \lim_{n\to\infty}\gamma_n$$
exists, and so does the limit $c = \lim_{n\to\infty} c_n = e^{\gamma}$.
Proof of $c = \sqrt{2\pi}$. Using (5.51) we have
$$\frac{c_n^2}{c_{2n}} = \frac{(n!)^2\,\sqrt{2n}\,(2n)^{2n}}{n^{2n+1}\,(2n)!} = \frac{\sqrt2\;2^{2n}(n!)^2}{\sqrt{n}\,(2n)!}$$
and $\lim_{n\to\infty} \frac{c_n^2}{c_{2n}} = \frac{c^2}{c} = c$. Using Wallis's product formula for $\pi$,
$$\frac{\pi}{2} = \prod_{k=1}^{\infty}\frac{4k^2}{4k^2-1} = \lim_{n\to\infty}\frac{2\cdot2\cdot4\cdot4\cdots2n\cdot2n}{1\cdot3\cdot3\cdot5\cdots(2n-1)(2n+1)}, \qquad(5.52)$$
we have
$$\left(\prod_{k=1}^n \frac{4k^2}{4k^2-1}\right)^{\frac12} = \frac{2\cdot4\cdots2n}{3\cdot5\cdots(2n-1)}\,\frac{1}{\sqrt{2n+1}} = \frac{1}{\sqrt{2n+1}}\,\frac{2^2\cdot4^2\cdots(2n)^2}{2\cdot3\cdot4\cdots(2n-1)(2n)} = \frac{1}{\sqrt{2n+1}}\,\frac{2^{2n}(n!)^2}{(2n)!},$$
such that
$$\sqrt{\frac{\pi}{2}} = \lim_{n\to\infty} \frac{2^{2n}(n!)^2}{\sqrt{2n+1}\,(2n)!} = \frac{1}{\sqrt2}\,\lim_{n\to\infty}\frac{2^{2n}(n!)^2}{\sqrt{n}\,(2n)!}.$$
Consequently, $c = \sqrt2\,\lim_{n\to\infty}\frac{2^{2n}(n!)^2}{\sqrt n\,(2n)!} = \sqrt2\cdot\sqrt2\,\sqrt{\frac{\pi}{2}} = \sqrt{2\pi}$, which completes the proof.
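A numerical illustration of ours (not from the text): the ratio $n!\big/\big(\sqrt{2\pi n}\,(n/e)^n\big)$ indeed approaches $1$, with an error of roughly $1/(12n)$.

```python
import math

def ratio(n):
    return math.factorial(n) / (math.sqrt(2 * math.pi * n) * (n / math.e)**n)

assert abs(ratio(10) - 1) < 0.01
assert abs(ratio(100) - 1) < 0.001
```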

Proof of Hölder's Inequality

Proof of Proposition 5.31. We prove (b). The other two statements are consequences; their proofs are along the lines in Section 1.3. The main idea is to approximate the integral on the left by Riemann sums and use Hölder's inequality (1.22). Let $\varepsilon > 0$; without loss of generality, let $f, g \ge 0$. By Proposition 5.10, $fg, f^p, g^q \in R(\alpha)$ and by Proposition 5.3 there exist partitions $P_1$, $P_2$, and $P_3$ of $[a, b]$ such that $U(fg, P_1, \alpha) - L(fg, P_1, \alpha) < \varepsilon$, $U(f^p, P_2, \alpha) - L(f^p, P_2, \alpha) < \varepsilon$, and $U(g^q, P_3, \alpha) - L(g^q, P_3, \alpha) < \varepsilon$. Let $P = \{x_0, x_1, \dots, x_n\}$ be the common refinement of $P_1$, $P_2$, and $P_3$. By Lemma 5.4 (a) and (c),
$$\int_a^b fg\,d\alpha < \sum_{i=1}^n (fg)(t_i)\,\Delta\alpha_i + \varepsilon, \qquad(5.53)$$
$$\sum_{i=1}^n f(t_i)^p\,\Delta\alpha_i < \int_a^b f^p\,d\alpha + \varepsilon, \qquad(5.54)$$
$$\sum_{i=1}^n g(t_i)^q\,\Delta\alpha_i < \int_a^b g^q\,d\alpha + \varepsilon, \qquad(5.55)$$
for any $t_i \in [x_{i-1}, x_i]$. Using the two preceding inequalities and Hölder's inequality (1.22) we have
$$\sum_{i=1}^n f(t_i)\Delta\alpha_i^{\frac1p}\,g(t_i)\Delta\alpha_i^{\frac1q} \le \left(\sum_{i=1}^n f(t_i)^p\,\Delta\alpha_i\right)^{\frac1p}\left(\sum_{i=1}^n g(t_i)^q\,\Delta\alpha_i\right)^{\frac1q} < \left(\int_a^b f^p\,d\alpha + \varepsilon\right)^{\frac1p}\left(\int_a^b g^q\,d\alpha + \varepsilon\right)^{\frac1q}.$$
By (5.53),
$$\int_a^b fg\,d\alpha < \sum_{i=1}^n (fg)(t_i)\,\Delta\alpha_i + \varepsilon < \left(\int_a^b f^p\,d\alpha + \varepsilon\right)^{\frac1p}\left(\int_a^b g^q\,d\alpha + \varepsilon\right)^{\frac1q} + \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, the claim follows.


Chapter 6
Sequences of Functions and Basic Topology

In the present chapter we draw our attention to complex-valued functions (including the real-valued ones), although many of the theorems and proofs which follow extend to vector-valued functions without difficulty, and even to mappings into more general spaces. We stay within this simple framework in order to focus attention on the most important aspects of the problems that arise when limit processes are interchanged.

6.1 Discussion of the Main Problem


Definition 6.1 Suppose $(f_n)$, $n \in \mathbb{N}$, is a sequence of functions defined on a set $E$, and suppose that the sequence of numbers $(f_n(x))$ converges for every $x \in E$. We can then define a function $f$ by
$$f(x) = \lim_{n\to\infty} f_n(x), \qquad x \in E. \qquad(6.1)$$
Under these circumstances we say that $(f_n)$ converges on $E$ and $f$ is the limit (or the limit function) of $(f_n)$. Sometimes we say that $(f_n)$ converges pointwise to $f$ on $E$ if (6.1) holds.
Similarly, if $\sum_{n=1}^{\infty} f_n(x)$ converges for every $x \in E$, and if we define
$$f(x) = \sum_{n=1}^{\infty} f_n(x), \qquad x \in E, \qquad(6.2)$$
the function $f$ is called the sum of the series $\sum_{n=1}^{\infty} f_n$.
The main problem which arises is to determine whether important properties of the functions $f_n$ are preserved under the limit operations (6.1) and (6.2). For instance, if the functions $f_n$ are continuous, or differentiable, or integrable, is the same true of the limit function? What are the relations between $f_n'$ and $f'$, say, or between the integrals of $f_n$ and that of $f$? To say that $f$ is continuous at $x$ means
$$\lim_{t\to x} f(t) = f(x).$$


Hence, to ask whether the limit of a sequence of continuous functions is continuous is the same as to ask whether
$$\lim_{t\to x}\lim_{n\to\infty} f_n(t) = \lim_{n\to\infty}\lim_{t\to x} f_n(t), \qquad(6.3)$$
i.e. whether the order in which limit processes are carried out is immaterial. We shall now show by means of several examples that limit processes cannot in general be interchanged without affecting the result. Afterwards, we shall prove that under certain conditions the order in which limit operations are carried out is inessential.

Example 6.1 (a) Our first example, and the simplest one, concerns a double sequence. For positive integers $m, n \in \mathbb{N}$ let
$$s_{mn} = \frac{m}{m+n}.$$
Then, for fixed $n$,
$$\lim_{m\to\infty} s_{mn} = 1,$$
so that $\lim_{n\to\infty}\lim_{m\to\infty} s_{mn} = 1$. On the other hand, for every fixed $m$,
$$\lim_{n\to\infty} s_{mn} = 0,$$
so that $\lim_{m\to\infty}\lim_{n\to\infty} s_{mn} = 0$. The two limits cannot be interchanged.
(b) Let $f_n(x) = x^n$ on $[0, 1]$. Then
$$f(x) = \begin{cases} 0, & 0 \le x < 1,\\ 1, & x = 1.\end{cases}$$
All functions $f_n(x)$ are continuous on $[0, 1]$; however, the limit $f(x)$ is discontinuous at $x = 1$; that is,
$$\lim_{t\to 1-0}\lim_{n\to\infty} t^n = 0 \ne 1 = \lim_{n\to\infty}\lim_{t\to 1-0} t^n.$$
The limits cannot be interchanged.
After these examples, which show what can go wrong if limit processes are interchanged carelessly, we now define a new notion of convergence, stronger than pointwise convergence as defined in Definition 6.1, which will enable us to arrive at positive results.

6.2 Uniform Convergence


6.2.1 Definitions and Example
Definition 6.2 A sequence of functions $(f_n)$ converges uniformly on $E$ to a function $f$ if for every $\varepsilon > 0$ there is a positive integer $n_0$ such that $n \ge n_0$ implies
$$|f_n(x) - f(x)| \le \varepsilon$$
for all $x \in E$. We write $f_n \rightrightarrows f$ on $E$.
As a formula, $f_n \rightrightarrows f$ on $E$ if
$$\forall\,\varepsilon > 0\ \exists\,n_0 \in \mathbb{N}\ \forall\,n \ge n_0\ \forall\,x \in E:\ |f_n(x) - f(x)| \le \varepsilon. \qquad(6.4)$$


[Figure: the $\varepsilon$-tube $(f-\varepsilon, f+\varepsilon)$ around $f$. Uniform convergence of $f_n$ to $f$ on $[a, b]$ means that $f_n$ lies in the $\varepsilon$-tube of $f$ for sufficiently large $n$.]

It is clear that every uniformly convergent sequence is pointwise convergent (to the same function). Quite explicitly, the difference between the two concepts is this: If (fn ) converges pointwise on E to a function f , for every > 0 and for every x E, there exists an integer n0
depending on both and x E such that (6.4) holds if n n0 . If (fn ) converges uniformly on
E it is possible, for each > 0 to find one integer n0 which will do for all x E.
P
We say that the series
k=1 fk (x) converges uniformly on E if the sequence (sn (x)) of partial
sums defined by
n
X
fk (x)
sn (x) =
k=1

converges uniformly on E.

Proposition 6.1 (Cauchy criterion) (a) The sequence of functions (fn ) defined on E converges
uniformly on E if and only if for every > 0 there is an integer n0 such that n, m n0 and
x E imply
| fn (x) fm (x) | .
(b) The series of functions

(6.5)

gk (x) defined on E converges uniformly on E if and only if for

k=1

every > 0 there is an integer n0 such that n, m n0 and x E imply




n
X



gk (x) .



k=m

Proof. Suppose $(f_n)$ converges uniformly on $E$ and let $f$ be the limit function. Then there is an integer $n_0$ such that $n \ge n_0$, $x \in E$ implies
$$|f_n(x) - f(x)| \le \frac{\varepsilon}{2},$$
so that
$$|f_n(x) - f_m(x)| \le |f_n(x) - f(x)| + |f_m(x) - f(x)| \le \varepsilon$$
if $m, n \ge n_0$, $x \in E$.
Conversely, suppose the Cauchy condition holds. By Proposition 2.18, the sequence $(f_n(x))$ converges for every $x$ to a limit which we may call $f(x)$. Thus the sequence $(f_n)$ converges pointwise on $E$ to $f$. We have to prove that the convergence is uniform. Let $\varepsilon > 0$ be given, and choose $n_0$ such that (6.5) holds. Fix $n$ and let $m \to \infty$ in (6.5). Since $f_m(x) \to f(x)$ as $m \to \infty$, this gives
$$|f_n(x) - f(x)| \le \varepsilon$$
for every $n \ge n_0$ and $x \in E$.
(b) immediately follows from (a) with $f_n(x) = \sum_{k=1}^n g_k(x)$.
Remark 6.1 Suppose
$$\lim_{n\to\infty} f_n(x) = f(x), \qquad x \in E.$$
Put
$$M_n = \sup_{x\in E} |f_n(x) - f(x)|.$$
Then $f_n \rightrightarrows f$ uniformly on $E$ if and only if $M_n \to 0$ as $n \to \infty$. (Prove!)
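A small numerical illustration of ours (not from the text) of this criterion, for $f_n(x) = x^n$ on $E = [0, 1)$ with pointwise limit $f \equiv 0$: the suprema $M_n$ stay near $1$, so the convergence is not uniform, even though every fixed point tends to $0$.

```python
# Approximate M_n = sup_{x in [0,1)} |x**n - 0| on a fine grid.
def M_n(n, grid_size=100000):
    return max(x**n for x in (i / grid_size for i in range(grid_size)))

assert M_n(50) > 0.99     # the sup over [0, 1) does not tend to 0
assert 0.999**50 < 0.96   # yet each fixed x < 1 gives x**n -> 0
```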

The following comparison test of a function series with a numerical series gives a sufficient criterion for uniform convergence.
Theorem 6.2 (Weierstraß) Suppose $(f_n)$ is a sequence of functions defined on $E$, and suppose
$$|f_n(x)| \le M_n, \qquad x \in E,\ n \in \mathbb{N}. \qquad(6.6)$$
Then $\sum_{n=1}^{\infty} f_n$ converges uniformly on $E$ if $\sum_{n=1}^{\infty} M_n$ converges.
Proof. If $\sum M_n$ converges, then, for arbitrary $\varepsilon > 0$ there exists $n_0$ such that $m, n \ge n_0$ implies $\sum_{i=m}^n M_i \le \varepsilon$. Hence,
$$\left|\sum_{i=m}^n f_i(x)\right| \underset{\text{tr.in.}}{\le} \sum_{i=m}^n |f_i(x)| \underset{(6.6)}{\le} \sum_{i=m}^n M_i \le \varepsilon, \qquad x \in E.$$
Uniform convergence now follows from Proposition 6.1.

Proposition 6.3 (Comparison Test) If $\sum_{n=1}^{\infty} g_n(x)$ converges uniformly on $E$ and $|f_n(x)| \le g_n(x)$ for all sufficiently large $n$ and all $x \in E$, then $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly on $E$.
Proof. Apply the Cauchy criterion. Note that
$$\left|\sum_{n=k}^{m} f_n(x)\right| \le \sum_{n=k}^{m} |f_n(x)| \le \sum_{n=k}^{m} g_n(x) < \varepsilon.$$


Application of the Weierstraß Theorem to Power Series and Fourier Series

Proposition 6.4 Let
$$\sum_{n=0}^{\infty} a_n z^n, \qquad a_n \in \mathbb{C}, \qquad(6.7)$$
be a power series with radius of convergence $R > 0$. Then (6.7) converges uniformly on the closed disc $\{z \mid |z| \le r\}$ for every $r$ with $0 \le r < R$.
Proof. We apply the Weierstraß theorem to $f_n(z) = a_n z^n$. Note that
$$|f_n(z)| = |a_n|\,|z|^n \le |a_n|\,r^n.$$
Since $r < R$, $r$ belongs to the disc of convergence, and the series $\sum_{n=0}^{\infty} |a_n|\,r^n$ converges by Theorem 2.34. By Theorem 6.2, the series $\sum_{n=0}^{\infty} a_n z^n$ converges uniformly on $\{z \mid |z| \le r\}$.
Remark 6.2 (a) The power series
$$\sum_{n=1}^{\infty} n a_n z^{n-1}$$
has the same radius of convergence $R$ as the series (6.7) and hence also converges uniformly on the closed disc $\{z \mid |z| \le r\}$.
Indeed, this simply follows from the fact that
$$\lim_{n\to\infty} \sqrt[n]{(n+1)\,|a_{n+1}|} = \lim_{n\to\infty}\sqrt[n]{n+1}\;\lim_{n\to\infty}\sqrt[n]{|a_n|} = \frac1R.$$
(b) Note that the power series in general does not converge uniformly on the whole open disc of convergence $|z| < R$. As an example, consider the geometric series
$$f(z) = \frac{1}{1-z} = \sum_{k=0}^{\infty} z^k, \qquad |z| < 1.$$
Note that the condition
$$\exists\,\varepsilon_0 > 0\ \forall\,n \in \mathbb{N}\ \exists\,x_n \in E:\ |f_n(x_n) - f(x_n)| \ge \varepsilon_0$$
implies that $(f_n)$ does not converge uniformly to $f$ on $E$. (Prove!)
To $\varepsilon_0 = 1$ and every $n \in \mathbb{N}$ choose $z_n = \frac{n}{n+1}$, and we obtain, using Bernoulli's inequality,
$$z_n^n = \left(1 - \frac{1}{n+1}\right)^n \ge 1 - \frac{n}{n+1} = 1 - z_n, \quad\text{hence}\quad \frac{z_n^n}{1-z_n} \ge 1, \qquad(6.8)$$
so that
$$|s_{n-1}(z_n) - f(z_n)| = \left|\sum_{k=0}^{n-1} z_n^k - \frac{1}{1-z_n}\right| = \left|\sum_{k=n}^{\infty} z_n^k\right| = \frac{z_n^n}{1-z_n} \underset{(6.8)}{\ge} 1.$$
The geometric series doesn't converge uniformly on the whole open unit disc.
Example 6.2 (a) A series of the form
$$\sum_{n=0}^{\infty} a_n\cos(nx) + \sum_{n=1}^{\infty} b_n\sin(nx), \qquad a_n, b_n, x \in \mathbb{R}, \qquad(6.9)$$
is called a Fourier series (see Section 6.3 below). If both $\sum_{n=0}^{\infty} |a_n|$ and $\sum_{n=1}^{\infty} |b_n|$ converge, then the series (6.9) converges uniformly on $\mathbb{R}$ to a function $F(x)$.
Indeed, since $|a_n\cos(nx)| \le |a_n|$ and $|b_n\sin(nx)| \le |b_n|$, by Theorem 6.2 the series (6.9) converges uniformly on $\mathbb{R}$.
(b) Let $f: \mathbb{R} \to \mathbb{R}$ be the sum of the Fourier series
$$f(x) = \sum_{n=1}^{\infty} \frac{\sin nx}{n}. \qquad(6.10)$$
Note that (a) does not apply since $\sum_n |b_n| = \sum_n \frac1n$ diverges.
If $f(x)$ exists, so does $f(x + 2\pi) = f(x)$, and $f(0) = 0$. We will show that the series converges uniformly on $[\delta, 2\pi - \delta]$ for every $\delta > 0$. For, put
$$s_n(x) = \sum_{k=1}^n \sin kx = \operatorname{Im}\left(\sum_{k=1}^n e^{ikx}\right).$$
If $\delta \le x \le 2\pi - \delta$ we have
$$|s_n(x)| \le \left|\sum_{k=1}^n e^{ikx}\right| = \left|\frac{e^{i(n+1)x} - e^{ix}}{e^{ix} - 1}\right| \le \frac{2}{\left|e^{ix/2} - e^{-ix/2}\right|} = \frac{1}{\sin\frac{x}{2}} \le \frac{1}{\sin\frac{\delta}{2}}.$$
Note that $|\operatorname{Im} z| \le |z|$ and $|e^{ix}| = 1$. Since $\sin\frac{x}{2} \ge \sin\frac{\delta}{2}$ for $\frac{\delta}{2} \le \frac{x}{2} \le \pi - \frac{\delta}{2}$, we have for $0 < m < n$
$$\left|\sum_{k=m}^n \frac{\sin kx}{k}\right| = \left|\sum_{k=m}^n \frac{s_k(x) - s_{k-1}(x)}{k}\right| = \left|\sum_{k=m}^n s_k(x)\left(\frac1k - \frac{1}{k+1}\right) + \frac{s_n(x)}{n+1} - \frac{s_{m-1}(x)}{m}\right|$$
$$\le \frac{1}{\sin\frac{\delta}{2}}\left(\sum_{k=m}^n \left(\frac1k - \frac{1}{k+1}\right) + \frac{1}{n+1} + \frac1m\right) = \frac{1}{\sin\frac{\delta}{2}}\left(\frac1m - \frac{1}{n+1} + \frac{1}{n+1} + \frac1m\right) = \frac{2}{m\sin\frac{\delta}{2}}.$$
The right side becomes arbitrarily small as $m \to \infty$. Using Proposition 6.1 (b), uniform convergence of (6.10) on $[\delta, 2\pi - \delta]$ follows.

6.2.2 Uniform Convergence and Continuity


Theorem 6.5 Let $E \subseteq \mathbb{R}$ be a subset and $f_n: E \to \mathbb{R}$, $n \in \mathbb{N}$, be a sequence of continuous functions on $E$ uniformly converging to some function $f: E \to \mathbb{R}$. Then $f$ is continuous on $E$.
Proof. Let $a \in E$ and $\varepsilon > 0$ be given. Since $f_n \rightrightarrows f$ there is an $r \in \mathbb{N}$ such that
$$|f_r(x) - f(x)| \le \varepsilon/3 \quad\text{for all } x \in E.$$
Since $f_r$ is continuous at $a$, there exists $\delta > 0$ such that $|x - a| < \delta$ implies
$$|f_r(x) - f_r(a)| \le \varepsilon/3.$$
Hence $|x - a| < \delta$ implies
$$|f(x) - f(a)| \le |f(x) - f_r(x)| + |f_r(x) - f_r(a)| + |f_r(a) - f(a)| \le \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
This proves the assertion.

The same is true for functions $f \colon E \to \mathbb{C}$ where $E \subseteq \mathbb{C}$.
Example 6.3 (Example 6.2 continued) (a) A power series defines a continuous function on
the disc of convergence $\{z \mid |z| < R\}$.
Indeed, let $|z_0| < R$. Then there exists $r \in \mathbb{R}$ such that $|z_0| < r < R$. By Proposition 6.4,
$\sum_n a_n x^n$ converges uniformly on $\{x \mid |x| \le r\}$. By Theorem 6.5, the function defined by the
sum of the power series is continuous on $[-r, r]$.
(b) The sum of the Fourier series (6.9) is a continuous function on $\mathbb{R}$ if both $\sum_n |a_n|$ and
$\sum_n |b_n|$ converge.
(c) The sum of the Fourier series $f(x) = \sum_n \frac{\sin(nx)}{n}$ is continuous on $[\delta, 2\pi - \delta]$ for all $\delta$ with
$0 < \delta < \pi$ by the above theorem. Clearly, $f$ is $2\pi$-periodic since all partial sums are.
[Figure: graph of the $2\pi$-periodic sawtooth function $f(x)$, with values between $-\pi/2$ and $\pi/2$.]

Later (see the section on Fourier series) we will show that
\[
f(x) = \begin{cases} 0, & x = 0, \\[2pt] \dfrac{\pi - x}{2}, & x \in (0, 2\pi). \end{cases}
\]
Since $f$ is discontinuous at $x_0 = 2\pi n$, the Fourier series does not converge uniformly on $\mathbb{R}$.

Also, Example 6.1 (b) shows that the continuity of the $f_n(x) = x^n$ alone is not sufficient for
the continuity of the limit function. On the other hand, the sequence of continuous functions
$(x^n)$ on $(0, 1)$ converges to the continuous function $0$. However, the convergence is not uniform.
Prove!
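A small numerical illustration of the last claim (an addition to the notes, with an arbitrary grid size): the pointwise limit of $x^n$ on $(0,1)$ is $0$, yet the supremum of $|x^n|$ over the interval stays near $1$, so the convergence cannot be uniform.

```python
def sup_on_grid(n, points=10_000):
    # approximate sup_{x in (0,1)} |x^n - 0| on a uniform grid
    return max((i / points) ** n for i in range(1, points))

for n in (1, 10, 100):
    print(n, sup_on_grid(n))

assert sup_on_grid(100) > 0.9     # the sup norm does not tend to 0 ...
assert 0.5 ** 100 < 1e-20         # ... although each fixed x gives x^n -> 0
```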

6.2.3 Uniform Convergence and Integration


Example 6.4 Let $f_n(x) = 2n^2 x\, e^{-n^2 x^2}$; clearly $\lim_{n\to\infty} f_n(x) = 0$ for all $x \in \mathbb{R}$. Further
\[
\int_0^1 f_n(x)\,dx = -e^{-n^2 x^2} \Big|_0^1 = 1 - e^{-n^2} \longrightarrow 1.
\]
On the other hand $\int_0^1 \lim_{n\to\infty} f_n(x)\,dx = \int_0^1 0\,dx = 0$. Thus, $\lim_{n\to\infty}$ and integration cannot
be interchanged. The reason: $(f_n)$ converges pointwise to $0$ but not uniformly. Indeed,
\[
f_n\!\left(\frac{1}{n}\right) = \frac{2n^2}{n}\, e^{-1} = \frac{2n}{e} \longrightarrow +\infty.
\]
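The failure of interchanging limit and integral in the example above can be checked numerically (this sketch is not from the notes; the midpoint rule and step count are arbitrary choices):

```python
import math

def f(n, x):
    """f_n(x) = 2 n^2 x exp(-n^2 x^2) from the example above."""
    return 2 * n**2 * x * math.exp(-((n * x) ** 2))

def integral_0_1(n, steps=100_000):
    # midpoint rule for  int_0^1 f_n(x) dx ; the exact value is 1 - exp(-n^2)
    h = 1.0 / steps
    return sum(f(n, (i + 0.5) * h) for i in range(steps)) * h

for n in (1, 2, 5):
    assert abs(integral_0_1(n) - (1 - math.exp(-n * n))) < 1e-3

# pointwise limit is 0 at every fixed x, yet the integrals tend to 1
assert f(1000, 0.5) < 1e-12
assert abs(integral_0_1(5) - 1) < 1e-3
```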
Theorem 6.6 Let $\alpha$ be an increasing function on $[a, b]$. Suppose $f_n \in \mathscr{R}(\alpha)$ on $[a,b]$ for all
$n \in \mathbb{N}$ and suppose $f_n \rightrightarrows f$ uniformly on $[a,b]$. Then $f \in \mathscr{R}(\alpha)$ on $[a,b]$ and
\[
\int_a^b f\,d\alpha = \lim_{n\to\infty} \int_a^b f_n\,d\alpha. \tag{6.11}
\]

Proof. Put
\[
\varepsilon_n = \sup_{x\in[a,b]} |f_n(x) - f(x)|.
\]
Then
\[
f_n - \varepsilon_n \le f \le f_n + \varepsilon_n,
\]
so that the upper and the lower integrals of $f$ satisfy
\[
\int_a^b (f_n - \varepsilon_n)\,d\alpha \le \underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha \le \int_a^b (f_n + \varepsilon_n)\,d\alpha. \tag{6.12}
\]
Hence,
\[
0 \le \overline{\int_a^b} f\,d\alpha - \underline{\int_a^b} f\,d\alpha \le 2\varepsilon_n\,(\alpha(b) - \alpha(a)).
\]
Since $\varepsilon_n \to 0$ as $n \to \infty$ (Remark 6.1), the upper and the lower integrals of $f$ are equal. Thus
$f \in \mathscr{R}(\alpha)$. Another application of (6.12) yields
\[
\int_a^b (f_n - \varepsilon_n)\,d\alpha \le \int_a^b f\,d\alpha \le \int_a^b (f_n + \varepsilon_n)\,d\alpha,
\]
hence
\[
\left| \int_a^b f\,d\alpha - \int_a^b f_n\,d\alpha \right| \le \varepsilon_n\,(\alpha(b) - \alpha(a)).
\]
This implies (6.11).

Corollary 6.7 If $f_n \in \mathscr{R}(\alpha)$ on $[a,b]$ and if the series
\[
f(x) = \sum_{n=1}^{\infty} f_n(x), \qquad a \le x \le b,
\]
converges uniformly on $[a,b]$, then
\[
\int_a^b \left( \sum_{n=1}^{\infty} f_n \right) d\alpha = \sum_{n=1}^{\infty} \int_a^b f_n\,d\alpha.
\]
In other words, the series may be integrated term by term.

Corollary 6.8 Let $f_n \colon [a,b] \to \mathbb{R}$ be a sequence of continuous functions uniformly converging
on $[a,b]$ to $f$. Let $x_0 \in [a,b]$.
Then the sequence $F_n(x) = \int_{x_0}^x f_n(t)\,dt$ converges uniformly to $F(x) = \int_{x_0}^x f(t)\,dt$.

Proof. The pointwise convergence of $F_n$ follows from the above theorem with $\alpha(t) = t$ and $a$
and $b$ replaced by $x_0$ and $x$.
We show uniform convergence: Let $\varepsilon > 0$. Since $f_n \rightrightarrows f$ on $[a,b]$, there exists $n_0 \in \mathbb{N}$ such
that $n \ge n_0$ implies $|f_n(t) - f(t)| \le \frac{\varepsilon}{b-a}$ for all $t \in [a,b]$. For $n \ge n_0$ and all $x \in [a,b]$ we
thus have
\[
|F_n(x) - F(x)| = \left| \int_{x_0}^x (f_n(t) - f(t))\,dt \right| \le \left| \int_{x_0}^x |f_n(t) - f(t)|\,dt \right| \le \frac{\varepsilon}{b-a}\,(b-a) = \varepsilon.
\]
Hence, $F_n \rightrightarrows F$ on $[a,b]$.

Example 6.5 (a) For every real $t \in (-1, 1)$ we have
\[
\log(1+t) = t - \frac{t^2}{2} + \frac{t^3}{3} - \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\, t^n. \tag{6.13}
\]
Proof. In Homework 13.5 (a) there was computed the Taylor series
\[
T(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\, x^n
\]
of $\log(1+x)$ and it was shown that $T(x) = \log(1+x)$ if $x \in (0,1)$.
By Proposition 6.4 the geometric series $\sum_{n=0}^{\infty} (-1)^n x^n$ converges uniformly to the function
$\frac{1}{1+x}$ on $[-r, r]$ for all $0 < r < 1$. By Corollary 6.7 we have for all $t \in [-r, r]$
\[
\log(1+t) = \log(1+x)\Big|_0^t = \int_0^t \frac{dx}{1+x} = \int_0^t \sum_{n=0}^{\infty} (-1)^n x^n\,dx
\underset{\text{Cor.\,6.7}}{=} \sum_{n=0}^{\infty} \int_0^t (-1)^n x^n\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n}{n+1}\, t^{n+1} = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\, t^n.
\]
(b) For $|t| < 1$ we have
\[
\arctan t = t - \frac{t^3}{3} + \frac{t^5}{5} - \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n+1}}{2n+1}. \tag{6.14}
\]
As in the previous example we use the uniform convergence of the geometric series on $[-r, r]$
for every $0 < r < 1$, which allows us to exchange integration and summation:
\[
\arctan t = \int_0^t \frac{dx}{1+x^2} = \int_0^t \sum_{n=0}^{\infty} (-1)^n x^{2n}\,dx = \sum_{n=0}^{\infty} \int_0^t (-1)^n x^{2n}\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}\, t^{2n+1}.
\]


Note that you are, in general, not allowed to insert $t = 1$ into the equations (6.13) and (6.14).
However, the following proposition (the proof is in the appendix to this chapter) fills this gap.

Proposition 6.9 (Abel's Limit Theorem) Let $\sum_{n=0}^{\infty} a_n$ be a convergent series of real numbers.
Then the power series
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n
\]
converges for $x \in [0,1]$ and is continuous on $[0,1]$.

As a consequence of the above proposition we have
\[
\log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n},
\]
\[
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}.
\]
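The two alternating series above converge rather slowly, but their partial sums can be checked numerically (an illustration added here, not part of the notes; the cutoff $10^5$ is arbitrary, and the error bound used is the standard alternating-series estimate):

```python
import math

# partial sums of the series for log 2 and pi/4 obtained via Abel's theorem
log2_partial = sum((-1) ** (n - 1) / n for n in range(1, 100_001))
pi4_partial = sum((-1) ** n / (2 * n + 1) for n in range(100_000))

# for an alternating series the error is at most the first omitted term
assert abs(log2_partial - math.log(2)) < 1 / 100_001
assert abs(pi4_partial - math.pi / 4) < 1 / (2 * 100_000 + 1)
```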

Example 6.6 We have $f_n(x) = \frac{1}{n}\, e^{-\frac{x}{n}} \rightrightarrows f(x) = 0$ on $[0, +\infty)$. Indeed, $|f_n(x) - 0| \le \frac{1}{n} < \varepsilon$
if $n \ge n_0$ and for all $x \in \mathbb{R}_+$. However,
\[
\int_0^{+\infty} f_n(t)\,dt = -e^{-\frac{t}{n}} \Big|_0^{+\infty} = \lim_{t\to+\infty} \left( 1 - e^{-\frac{t}{n}} \right) = 1.
\]
Hence
\[
\lim_{n\to\infty} \int_0^{+\infty} f_n(t)\,dt = 1 \ne 0 = \int_0^{+\infty} f(t)\,dt.
\]
That is, Theorem 6.6 fails in the case of improper integrals.

6.2.4 Uniform Convergence and Differentiation


Example 6.7 Let
\[
f_n(x) = \frac{\sin(nx)}{\sqrt{n}}, \qquad x \in \mathbb{R},\ n \in \mathbb{N}, \tag{6.15}
\]
and $f(x) = \lim_{n\to\infty} f_n(x) = 0$. Then $f'(x) = 0$, and
\[
f_n'(x) = \sqrt{n}\, \cos(nx),
\]
so that $(f_n')$ does not converge to $f'$. For instance $f_n'(0) = \sqrt{n} \to +\infty$ as $n \to \infty$, whereas
$f'(0) = 0$. Note that $(f_n)$ converges uniformly to $0$ on $\mathbb{R}$ since $|\sin(nx)/\sqrt{n}| \le 1/\sqrt{n}$ becomes
small, independently of $x \in \mathbb{R}$.
Consequently, uniform convergence of $(f_n)$ implies nothing about the sequence $(f_n')$. Thus,
stronger hypotheses are required for the assertion that $f_n \to f$ implies $f_n' \to f'$.


Theorem 6.10 Suppose $(f_n)$ is a sequence of continuously differentiable functions on $[a,b]$
pointwise converging to some function $f$. Suppose further that $(f_n')$ converges uniformly on
$[a,b]$.
Then $f_n$ converges uniformly to $f$ on $[a,b]$, $f$ is continuously differentiable on $[a,b]$, and
\[
f'(x) = \lim_{n\to\infty} f_n'(x), \qquad a \le x \le b. \tag{6.16}
\]

Proof. Put $g(x) = \lim_{n\to\infty} f_n'(x)$; then $g$ is continuous by Theorem 6.5. By the Fundamental
Theorem of Calculus, Theorem 5.14,
\[
f_n(x) = f_n(a) + \int_a^x f_n'(t)\,dt.
\]
By the assumption on $(f_n')$ and by Corollary 6.8 the sequence $\left( \int_a^x f_n'(t)\,dt \right)$ converges uniformly
on $[a,b]$ to $\int_a^x g(t)\,dt$. Taking the limit $n \to \infty$ in the above equation, we thus obtain
\[
f(x) = f(a) + \int_a^x g(t)\,dt.
\]
Since $g$ is continuous, the right hand side defines a differentiable function, namely the
antiderivative of $g(x)$, by the FTC. Hence, $f'(x) = g(x)$; since $g$ is continuous the proof is now
complete.
For a more general result (without the additional assumption of continuity of $f_n'$) see [Rud76,
7.17 Theorem].
Corollary 6.11 Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series with radius of convergence $R$.
(a) Then $f$ is differentiable on $(-R, R)$ and we have
\[
f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}, \qquad x \in (-R, R). \tag{6.17}
\]
(b) The function $f$ is infinitely often differentiable on $(-R, R)$ and we have
\[
f^{(k)}(x) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\, a_n x^{n-k}, \tag{6.18}
\]
\[
a_n = \frac{1}{n!}\, f^{(n)}(0), \qquad n \in \mathbb{N}_0. \tag{6.19}
\]
In particular, $f$ coincides with its Taylor series.

Proof. (a) By Remark 6.2 (a), the power series $\sum_{n=0}^{\infty} (a_n x^n)'$ has the same radius of convergence
and converges uniformly on every closed subinterval $[-r, r]$ of $(-R, R)$. By Theorem 6.10,
$f(x)$ is differentiable and differentiation and summation can be interchanged.
(b) Iterated application of (a) yields that $f^{(k-1)}$ is differentiable on $(-R, R)$ with (6.18). In
particular, inserting $x = 0$ into (6.18) we find
\[
f^{(k)}(0) = k!\, a_k, \qquad a_k = \frac{f^{(k)}(0)}{k!}.
\]
These are exactly the Taylor coefficients of $f$ at $a = 0$. Hence, $f$ coincides with its Taylor
series.

Example 6.8 For $x \in (-1, 1)$ we have
\[
\sum_{n=1}^{\infty} n x^n = \frac{x}{(1-x)^2}.
\]
Since the geometric series $f(x) = \sum_{n=0}^{\infty} x^n$ equals $1/(1-x)$ on $(-1,1)$, by Corollary 6.11 we
have
\[
\frac{1}{(1-x)^2} = \frac{d}{dx}\, \frac{1}{1-x} = \frac{d}{dx} \left( \sum_{n=0}^{\infty} x^n \right) = \sum_{n=1}^{\infty} (x^n)' = \sum_{n=1}^{\infty} n x^{n-1}.
\]
Multiplying the preceding equation by $x$ gives the result.
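The identity just derived is easy to test numerically (a sketch added here, not from the notes; the truncation at 2000 terms is an arbitrary choice that suffices for $|x| \le 0.9$):

```python
# Numeric check of  sum_{n>=1} n x^n = x/(1-x)^2  on (-1,1),
# obtained above by differentiating the geometric series term by term.
def series(x, terms=2000):
    return sum(n * x**n for n in range(1, terms))

for x in (-0.9, -0.5, 0.0, 0.3, 0.9):
    closed = x / (1 - x) ** 2
    assert abs(series(x) - closed) < 1e-9, (x, series(x), closed)
```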

6.3 Fourier Series


In this section we consider basic notions and results of the theory of Fourier series. The question
is how to write a periodic function as a series of $\cos kx$ and $\sin kx$, $k \in \mathbb{N}$. In contrast to Taylor
expansions the periodic function need not be infinitely often differentiable. Two Fourier
series may have the same behavior in one interval, but may behave in different ways in some
other interval. We have here a very striking contrast between Fourier series and power series.
In this section a periodic function is meant to be a $2\pi$-periodic complex valued function on $\mathbb{R}$,
that is, $f \colon \mathbb{R} \to \mathbb{C}$ satisfies $f(x + 2\pi) = f(x)$ for all $x \in \mathbb{R}$. Special periodic functions are the
trigonometric polynomials.
Definition 6.3 A function $f \colon \mathbb{R} \to \mathbb{R}$ is called a trigonometric polynomial if there are real numbers $a_k, b_k$, $k = 0, \dots, n$, with
\[
f(x) = \frac{a_0}{2} + \sum_{k=1}^{n} a_k \cos kx + b_k \sin kx. \tag{6.20}
\]
The coefficients $a_k$ and $b_k$ are uniquely determined by $f$ since
\[
a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\,dx, \quad k = 0, 1, \dots, n, \qquad
b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\,dx, \quad k = 1, \dots, n. \tag{6.21}
\]


This is immediate from
\[
\int_0^{2\pi} \cos kx \sin mx\,dx = 0, \qquad
\int_0^{2\pi} \cos kx \cos mx\,dx = \pi\,\delta_{km}, \qquad
\int_0^{2\pi} \sin kx \sin mx\,dx = \pi\,\delta_{km}, \quad k, m \in \mathbb{N}, \tag{6.22}
\]
where $\delta_{km} = 1$ if $k = m$ and $\delta_{km} = 0$ if $k \ne m$ is the so-called Kronecker symbol, see
Homework 19.2. For example, if $m \ge 1$ we have
\[
\frac{1}{\pi} \int_0^{2\pi} f(x) \cos mx\,dx
= \frac{1}{\pi} \int_0^{2\pi} \left( \frac{a_0}{2} + \sum_{k=1}^{n} a_k \cos kx + b_k \sin kx \right) \cos mx\,dx
\]
\[
= \frac{1}{\pi} \sum_{k=1}^{n} \int_0^{2\pi} \left( a_k \cos kx \cos mx + b_k \sin kx \cos mx \right) dx
= \frac{1}{\pi} \sum_{k=1}^{n} a_k\, \pi\, \delta_{km} = a_m.
\]
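The orthogonality relations (6.22) can be verified numerically; the following sketch (an addition, not part of the notes) uses the midpoint rule, which is very accurate for periodic integrands over a full period:

```python
import math

def integral(g, steps=20_000):
    # midpoint rule on [0, 2*pi]
    h = 2 * math.pi / steps
    return sum(g((i + 0.5) * h) for i in range(steps)) * h

for k in (1, 2, 3):
    for m in (1, 2, 3):
        cc = integral(lambda x: math.cos(k * x) * math.cos(m * x))
        cs = integral(lambda x: math.cos(k * x) * math.sin(m * x))
        expected = math.pi if k == m else 0.0   # pi * delta_km
        assert abs(cc - expected) < 1e-6, (k, m, cc)
        assert abs(cs) < 1e-6, (k, m, cs)
```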
Sometimes it is useful to consider complex trigonometric polynomials. Using the formulas
expressing $\cos x$ and $\sin x$ in terms of $e^{ix}$ and $e^{-ix}$ we can write the above polynomial (6.20) as
\[
f(x) = \sum_{k=-n}^{n} c_k e^{ikx}, \tag{6.23}
\]
where $c_0 = a_0/2$ and
\[
c_k = \frac{1}{2} \left( a_k - i b_k \right), \qquad c_{-k} = \frac{1}{2} \left( a_k + i b_k \right), \qquad k \ge 1.
\]
To obtain the coefficients $c_k$ using integration we need the notion of an integral of a complex-valued function, see Section 5.5. If $m \ne 0$ we have
\[
\int_a^b e^{imx}\,dx = \frac{1}{im}\, e^{imx} \Big|_a^b.
\]
If $a = 0$ and $b = 2\pi$ and $m \in \mathbb{Z}$ we obtain
\[
\int_0^{2\pi} e^{imx}\,dx = \begin{cases} 0, & m \in \mathbb{Z} \setminus \{0\}, \\ 2\pi, & m = 0. \end{cases}
\]
We conclude,
\[
c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, e^{-ikx}\,dx, \qquad k = 0, \pm 1, \dots, \pm n. \tag{6.24}
\]


Definition 6.4 Let $f \colon \mathbb{R} \to \mathbb{C}$ be a periodic function with $f \in \mathscr{R}$ on $[0, 2\pi]$. We call
\[
c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, e^{-ikx}\,dx, \qquad k \in \mathbb{Z}, \tag{6.25}
\]
the Fourier coefficients of $f$, and the series
\[
\sum_{k=-\infty}^{\infty} c_k e^{ikx}, \tag{6.26}
\]
i.e. the sequence of partial sums
\[
s_n = \sum_{k=-n}^{n} c_k e^{ikx}, \qquad n \in \mathbb{N},
\]
the Fourier series of $f$.


The Fourier series can also be written as
\[
\frac{a_0}{2} + \sum_{k=1}^{\infty} a_k \cos kx + b_k \sin kx, \tag{6.27}
\]
where $a_k$ and $b_k$ are given by (6.21). One can ask whether the Fourier series of a function
converges to the function itself. It is easy to see: If the function $f$ is the uniform limit of a series
of trigonometric polynomials,
\[
f(x) = \sum_{k=-\infty}^{\infty} \gamma_k e^{ikx}, \tag{6.28}
\]
then $f$ coincides with its Fourier series. Indeed, since the series (6.28) converges uniformly, by
Corollary 6.7 we can change the order of summation and integration and obtain
\[
c_k = \frac{1}{2\pi} \int_0^{2\pi} \left( \sum_{m=-\infty}^{\infty} \gamma_m e^{imx} \right) e^{-ikx}\,dx
= \frac{1}{2\pi} \sum_{m=-\infty}^{\infty} \int_0^{2\pi} \gamma_m e^{i(m-k)x}\,dx = \gamma_k.
\]
In general, the Fourier series of $f$ neither converges uniformly nor pointwise to $f$. For Fourier
series, convergence with respect to the $L^2$-norm
\[
\|f\|_2 = \left( \frac{1}{2\pi} \int_0^{2\pi} |f|^2\,dx \right)^{\frac{1}{2}} \tag{6.29}
\]
is the appropriate notion.


6.3.1 An Inner Product on the Periodic Functions


Let $V$ be the linear space of periodic functions $f \colon \mathbb{R} \to \mathbb{C}$, $f \in \mathscr{R}$ on $[0, 2\pi]$. We introduce an
inner product on $V$ by
\[
\langle f, g \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, \overline{g(x)}\,dx, \qquad f, g \in V.
\]
One easily checks the following properties for $f, g, h \in V$, $\lambda \in \mathbb{C}$:
\[
\langle f + g, h \rangle = \langle f, h \rangle + \langle g, h \rangle, \qquad
\langle f, g + h \rangle = \langle f, g \rangle + \langle f, h \rangle,
\]
\[
\langle \lambda f, g \rangle = \lambda \langle f, g \rangle, \qquad
\langle f, \lambda g \rangle = \overline{\lambda}\, \langle f, g \rangle, \qquad
\langle f, g \rangle = \overline{\langle g, f \rangle}.
\]
For every $f \in V$ we have $\langle f, f \rangle = \frac{1}{2\pi} \int_0^{2\pi} |f|^2\,dx \ge 0$. However, $\langle f, f \rangle = 0$ does not imply
$f = 0$ (you can change $f$ at finitely many points without any impact on $\langle f, f \rangle$). If $f \in V$ is
continuous, then $\langle f, f \rangle = 0$ implies $f = 0$, see Homework 14.3. Put $\|f\|_2 = \sqrt{\langle f, f \rangle}$.
Note that in the physical literature the inner product in $L^2(X)$ is often linear in the second component and antilinear in the first component. Defining for $k \in \mathbb{Z}$ the periodic function $e_k \colon \mathbb{R} \to \mathbb{C}$
by $e_k(x) = e^{ikx}$, the Fourier coefficients of $f \in V$ take the form
\[
c_k = \langle f, e_k \rangle, \qquad k \in \mathbb{Z}.
\]
From (6.24) it follows that the functions $e_k$, $k \in \mathbb{Z}$, satisfy
\[
\langle e_k, e_l \rangle = \delta_{kl}. \tag{6.30}
\]
Any such subset $\{e_k \mid k \in \mathbb{Z}\}$ of an inner product space $V$ satisfying (6.30) is called an
orthonormal system (ONS). Using $e_k(x) = \cos kx + i \sin kx$ the real orthogonality relations
(6.22) immediately follow from (6.30).
Lemma 6.12 (Least Square Approximation) Suppose f V has the Fourier coefficients ck ,
k Z and let k C be arbitrary. Then
2
2

n
n




X
X




ck ek f
k ek ,
(6.31)
f




k=n

k=n

and equality holds if and only if ck = k for all k. Further,



2
n
n


X
X


2
ck ek = kf k2
| ck |2 .
f


k=n

k=n

(6.32)


Proof. Let $\sum$ always denote $\sum_{k=-n}^{n}$. Put $g_n = \sum \beta_k e_k$. Then
\[
\langle f, g_n \rangle = \sum \overline{\beta_k}\, \langle f, e_k \rangle = \sum c_k \overline{\beta_k},
\]
and $\langle g_n, e_k \rangle = \beta_k$, such that
\[
\langle g_n, g_n \rangle = \sum |\beta_k|^2.
\]
Noting that $|a - b|^2 = (a-b)\overline{(a-b)} = |a|^2 + |b|^2 - a\overline{b} - \overline{a}b$, it follows that
\[
\|f - g_n\|_2^2 = \langle f - g_n, f - g_n \rangle = \langle f, f \rangle - \langle f, g_n \rangle - \langle g_n, f \rangle + \langle g_n, g_n \rangle
\]
\[
= \|f\|_2^2 - \sum c_k \overline{\beta_k} - \sum \overline{c_k}\, \beta_k + \sum |\beta_k|^2
= \|f\|_2^2 - \sum |c_k|^2 + \sum |\beta_k - c_k|^2, \tag{6.33}
\]
which is evidently minimized if and only if $\beta_k = c_k$. Inserting this into (6.33), equation (6.32)
follows.

Corollary 6.13 (Bessel's Inequality) Under the assumptions of the above lemma we have
\[
\sum_{k=-\infty}^{\infty} |c_k|^2 \le \|f\|_2^2. \tag{6.34}
\]
Proof. By equation (6.32), for every $n \in \mathbb{N}$ we have
\[
\sum_{k=-n}^{n} |c_k|^2 \le \|f\|_2^2.
\]
Taking the limit $n \to \infty$ (or $\sup_{n\in\mathbb{N}}$) shows the assertion.


An ONS $\{e_k \mid k \in \mathbb{Z}\}$ is said to be complete if, instead of Bessel's inequality, equality holds for
all $f \in V$.

Definition 6.5 Let $f_n, f \in V$. We say that $(f_n)$ converges to $f$ in $L^2$ (denoted by $f_n \xrightarrow{\|\cdot\|_2} f$) if
\[
\lim_{n\to\infty} \|f_n - f\|_2 = 0.
\]
Explicitly,
\[
\int_0^{2\pi} |f_n(x) - f(x)|^2\,dx \xrightarrow[n\to\infty]{} 0.
\]


Remarks 6.3 (a) Note that the $L^2$-limit in $V$ is not unique; changing $f(x)$ at finitely many
points of $[0, 2\pi]$ does not change the integral $\int_0^{2\pi} |f - f_n|^2\,dx$.
(b) If $f_n \rightrightarrows f$ on $\mathbb{R}$ then $f_n \xrightarrow{\|\cdot\|_2} f$. Indeed, let $\varepsilon > 0$. Then there exists $n_0 \in \mathbb{N}$ such that
$n \ge n_0$ implies $\sup_{x\in\mathbb{R}} |f_n(x) - f(x)| \le \varepsilon$. Hence
\[
\int_0^{2\pi} |f_n - f|^2\,dx \le \int_0^{2\pi} \varepsilon^2\,dx = 2\pi\varepsilon^2.
\]
This shows $\|f_n - f\|_2 \to 0$.
(c) The above Lemma, in particular (6.32), shows that the Fourier series converges in $L^2$ to $f$ if
and only if
\[
\|f\|_2^2 = \sum_{k=-\infty}^{\infty} |c_k|^2. \tag{6.35}
\]
This is called Parseval's Completeness Relation. We will see that it holds for all $f \in V$.
Let us write
\[
f(x) \sim \sum_{k=-\infty}^{\infty} c_k e^{ikx}
\]
to express the fact that $(c_k)$ are the (complex) Fourier coefficients of $f$. Further,
\[
s_n(f) = s_n(f; x) = \sum_{k=-n}^{n} c_k e^{ikx} \tag{6.36}
\]
denotes the $n$th partial sum.


Theorem 6.14 (Parseval's Completeness Theorem) The ONS $\{e_k \mid k \in \mathbb{Z}\}$ is complete.
More precisely, if $f, g \in V$ with
\[
f \sim \sum_{k=-\infty}^{\infty} c_k e_k, \qquad g \sim \sum_{k=-\infty}^{\infty} \gamma_k e_k,
\]
then
\[
\text{(i)} \quad \lim_{n\to\infty} \frac{1}{2\pi} \int_0^{2\pi} |f - s_n(f)|^2\,dx = 0, \tag{6.37}
\]
\[
\text{(ii)} \quad \frac{1}{2\pi} \int_0^{2\pi} f\, \overline{g}\,dx = \sum_{k=-\infty}^{\infty} c_k \overline{\gamma_k}, \tag{6.38}
\]
\[
\text{(iii)} \quad \frac{1}{2\pi} \int_0^{2\pi} |f|^2\,dx = \sum_{k=-\infty}^{\infty} |c_k|^2 = \frac{a_0^2}{4} + \frac{1}{2} \sum_{k=1}^{\infty} \left( a_k^2 + b_k^2 \right) \quad \text{(Parseval's formula).} \tag{6.39}
\]
The proof is in Rudin's book, [Rud76, 8.16, p. 191]. It uses the Stone–Weierstraß theorem about
the uniform approximation of a continuous function by polynomials. An elementary proof is
in Forster's book [For01, §23].


Example 6.9 (a) Consider the periodic function $f \in V$ given by
\[
f(x) = \begin{cases} 1, & 0 \le x < \pi, \\ -1, & \pi \le x < 2\pi. \end{cases}
\]
Since $f$ is an odd function the coefficients $a_k$ vanish. We compute the Fourier coefficients $b_k$:
\[
b_k = \frac{2}{\pi} \int_0^{\pi} \sin kx\,dx = -\frac{2}{k\pi} \cos kx \Big|_0^{\pi} = \frac{2}{k\pi} \left( (-1)^{k+1} + 1 \right)
= \begin{cases} 0, & \text{if } k \text{ is even}, \\[2pt] \dfrac{4}{k\pi}, & \text{if } k \text{ is odd}. \end{cases}
\]
The Fourier series of $f$ reads
\[
f \sim \frac{4}{\pi} \sum_{n=0}^{\infty} \frac{\sin(2n+1)x}{2n+1}.
\]
Noting that
\[
\|f\|_2^2 = \frac{1}{2\pi} \int_0^{2\pi} dx = 1,
\]
Parseval's formula gives
\[
1 = \sum_{k=-\infty}^{\infty} |c_k|^2 = \frac{a_0^2}{4} + \frac{1}{2} \sum_{n\in\mathbb{N}} \left( a_n^2 + b_n^2 \right)
= \frac{1}{2} \sum_{n\in\mathbb{N}} b_n^2 = \frac{8}{\pi^2} \sum_{n=0}^{\infty} \frac{1}{(2n+1)^2} =: \frac{8}{\pi^2}\, s_1
\implies s_1 = \frac{\pi^2}{8}.
\]
Now we can compute $s = \sum_{n=1}^{\infty} \frac{1}{n^2}$. Since this series converges absolutely we are allowed to
rearrange the elements in such a way that we first add all the odd terms, which gives $s_1$, and then
all the even terms, which gives $s_0$. Using $s_1 = \pi^2/8$ we find
\[
s = s_1 + s_0 = s_1 + \frac{1}{2^2} + \frac{1}{4^2} + \frac{1}{6^2} + \cdots
= s_1 + \frac{1}{2^2} \left( 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots \right) = s_1 + \frac{s}{4}
\]
\[
\implies \frac{3}{4}\, s = s_1 \implies s = \frac{4}{3}\, s_1 = \frac{\pi^2}{6}.
\]
(b) Fix $a \in [0, 2\pi]$ and consider $f \in V$ with
\[
f(x) = \begin{cases} 1, & 0 \le x \le a, \\ 0, & a < x < 2\pi. \end{cases}
\]
The Fourier coefficients of $f$ are $c_0 = \frac{1}{2\pi} \int_0^a dx = \frac{a}{2\pi}$ and
\[
c_k = \langle f, e_k \rangle = \frac{1}{2\pi} \int_0^a e^{-ikx}\,dx = \frac{i}{2\pi k} \left( e^{-ika} - 1 \right), \qquad k \ne 0.
\]
If $k \ne 0$,
\[
|c_k|^2 = \frac{1}{4\pi^2 k^2} \left( e^{-ika} - 1 \right) \left( e^{ika} - 1 \right) = \frac{1 - \cos ka}{2\pi^2 k^2},
\]

hence Parseval's formula gives
\[
\sum_{k=-\infty}^{\infty} |c_k|^2 = \frac{a^2}{4\pi^2} + \sum_{k=1}^{\infty} \frac{1 - \cos ak}{\pi^2 k^2}
= \frac{a^2}{4\pi^2} + \frac{1}{\pi^2} \sum_{k=1}^{\infty} \frac{1}{k^2} - \frac{1}{\pi^2} \sum_{k=1}^{\infty} \frac{\cos ak}{k^2}
= \frac{a^2}{4\pi^2} + \frac{1}{\pi^2} \left( s - \sum_{k=1}^{\infty} \frac{\cos ak}{k^2} \right),
\]
where $s = \sum 1/k^2$. On the other hand,
\[
\|f\|_2^2 = \frac{1}{2\pi} \int_0^a dx = \frac{a}{2\pi}.
\]
Hence, (6.39) reads
\[
\frac{a^2}{4\pi^2} + \frac{1}{\pi^2} \left( s - \sum_{k=1}^{\infty} \frac{\cos ka}{k^2} \right) = \frac{a}{2\pi}
\implies \sum_{k=1}^{\infty} \frac{\cos ka}{k^2} = \frac{a^2}{4} - \frac{\pi a}{2} + \frac{\pi^2}{6} = \frac{(a - \pi)^2}{4} - \frac{\pi^2}{12}. \tag{6.40}
\]

Since the series
\[
\sum_{k=1}^{\infty} \frac{\cos kx}{k^2} \tag{6.41}
\]
converges uniformly on $\mathbb{R}$ (use Theorem 6.2 with $\sum 1/k^2$ as an upper bound), (6.41) is the
Fourier series of the function
\[
\frac{(x - \pi)^2}{4} - \frac{\pi^2}{12}, \qquad x \in [0, 2\pi],
\]
and the Fourier series converges uniformly on $\mathbb{R}$ to the above function. Since the term-by-term
differentiated series converges uniformly on $[\delta, 2\pi - \delta]$, see Example 6.2, we obtain
\[
\sum_{k=1}^{\infty} \frac{\sin kx}{k} = -\frac{d}{dx} \left( \sum_{k=1}^{\infty} \frac{\cos kx}{k^2} \right)
= -\frac{d}{dx} \left( \frac{(x-\pi)^2}{4} - \frac{\pi^2}{12} \right) = \frac{\pi - x}{2},
\]
which is true for $x \in (0, 2\pi)$.


We can also integrate the Fourier series and obtain by Corollary 6.7
\[
\int_0^x \sum_{k=1}^{\infty} \frac{\cos kt}{k^2}\,dt
= \sum_{k=1}^{\infty} \frac{1}{k^2} \int_0^x \cos kt\,dt
= \sum_{k=1}^{\infty} \frac{1}{k^3}\, \sin kt \Big|_0^x
= \sum_{k=1}^{\infty} \frac{\sin kx}{k^3}.
\]
On the other hand,
\[
\int_0^x \left( \frac{(t-\pi)^2}{4} - \frac{\pi^2}{12} \right) dt = \frac{(x-\pi)^3}{12} - \frac{\pi^2}{12}\, x + \frac{\pi^3}{12}.
\]
By Homework 19.5,
\[
f(x) = \sum_{k=1}^{\infty} \frac{\sin kx}{k^3} = \frac{(x-\pi)^3}{12} - \frac{\pi^2}{12}\, x + \frac{\pi^3}{12}
\]
defines a continuously differentiable periodic function on $\mathbb{R}$.

Theorem 6.15 Let $f \colon \mathbb{R} \to \mathbb{R}$ be a continuous periodic function which is piecewise continuously differentiable, i.e. there exists a partition $\{t_0, \dots, t_r\}$ of $[0, 2\pi]$ such that $f|_{[t_{i-1}, t_i]}$
is continuously differentiable.
Then the Fourier series of $f$ converges uniformly to $f$.

Proof. Let $\varphi_i \colon [t_{i-1}, t_i] \to \mathbb{R}$ denote the continuous derivative of $f|_{[t_{i-1}, t_i]}$ and $\varphi \colon \mathbb{R} \to \mathbb{R}$
the periodic function that coincides with $\varphi_i$ on $[t_{i-1}, t_i]$. By Bessel's inequality, the Fourier
coefficients $\gamma_k$ of $\varphi$ satisfy
\[
\sum_{k=-\infty}^{\infty} |\gamma_k|^2 \le \|\varphi\|_2^2 < \infty.
\]
If $k \ne 0$ the Fourier coefficients $c_k$ of $f$ can be found using integration by parts from the Fourier
coefficients of $\varphi$:
\[
\int_{t_{i-1}}^{t_i} f(x)\, e^{-ikx}\,dx = \frac{i}{k} \left( f(x)\, e^{-ikx} \Big|_{t_{i-1}}^{t_i} - \int_{t_{i-1}}^{t_i} \varphi(x)\, e^{-ikx}\,dx \right).
\]
Hence summation over $i = 1, \dots, r$ yields
\[
c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, e^{-ikx}\,dx = \frac{1}{2\pi} \sum_{i=1}^{r} \int_{t_{i-1}}^{t_i} f(x)\, e^{-ikx}\,dx
= -\frac{i}{2\pi k} \int_0^{2\pi} \varphi(x)\, e^{-ikx}\,dx = -\frac{i\gamma_k}{k}.
\]
Note that the term
\[
\sum_{i=1}^{r} f(x)\, e^{-ikx} \Big|_{t_{i-1}}^{t_i}
\]
vanishes since $f$ is continuous and $f(2\pi) = f(0)$. Since for $\alpha, \beta \in \mathbb{C}$ we have $|\alpha\beta| \le
\frac{1}{2}(|\alpha|^2 + |\beta|^2)$, we obtain
\[
|c_k| \le \frac{1}{2} \left( \frac{1}{k^2} + |\gamma_k|^2 \right).
\]

Since both $\sum_{k=1}^{\infty} \frac{1}{k^2}$ and $\sum_{k=-\infty}^{\infty} |\gamma_k|^2$ converge, $\sum_{k=-\infty}^{\infty} |c_k| < \infty$.

Thus, the Fourier series converges uniformly to a continuous function $g$ (see Theorem 6.5).
Since the Fourier series converges both to $f$ and to $g$ in the $L^2$-norm, $\|f - g\|_2 = 0$. Since both
$f$ and $g$ are continuous, they coincide. This completes the proof.
Note that for any $f \in V$, the series $\sum_{k\in\mathbb{Z}} |c_k|^2$ converges, while the series $\sum_{k\in\mathbb{Z}} |c_k|$
converges only if the Fourier series converges uniformly to $f$.

6.4 Basic Topology


In the study of functions of several variables we need some topological notions like neighborhood, open set, closed set, and compactness.

6.4.1 Finite, Countable, and Uncountable Sets


Definition 6.6 If there exists a 1-1 mapping of the set $A$ onto the set $B$ (a bijection), we say
that $A$ and $B$ have the same cardinality or that $A$ and $B$ are equivalent; we write $A \sim B$.
Definition 6.7 For any nonnegative integer $n \in \mathbb{N}_0$ let $\mathbb{N}_n$ be the set $\{1, 2, \dots, n\}$. For any set
$A$ we say that:
(a) $A$ is finite if $A \sim \mathbb{N}_n$ for some $n$. The empty set is also considered to be finite.
(b) $A$ is infinite if $A$ is not finite.
(c) $A$ is countable if $A \sim \mathbb{N}$.
(d) $A$ is uncountable if $A$ is neither finite nor countable.
(e) $A$ is at most countable if $A$ is finite or countable.
For finite sets $A$ and $B$ we evidently have $A \sim B$ if $A$ and $B$ have the same number of elements.
For infinite sets, however, the idea of having the same number of elements becomes quite
vague, whereas the notion of 1-1 correspondence retains its clarity.
Example 6.10 (a) $\mathbb{Z}$ is countable. Indeed, the arrangement
\[
0,\ 1,\ -1,\ 2,\ -2,\ 3,\ -3,\ \dots
\]
gives a bijection between $\mathbb{N}$ and $\mathbb{Z}$. An infinite set $\mathbb{Z}$ can be equivalent to one of its proper
subsets $\mathbb{N}$.


(b) Countable sets represent the smallest infinite cardinality: No uncountable set can be a
subset of a countable set. Any countable set can be arranged in a sequence. In particular, $\mathbb{Q}$ is
countable, see Example 2.6 (c).
(c) The countable union of countable sets is a countable set; this is Cantor's First Diagonal
Process:
\[
\begin{matrix}
x_{11} & x_{12} & x_{13} & x_{14} & \cdots \\
x_{21} & x_{22} & x_{23} & x_{24} & \cdots \\
x_{31} & x_{32} & x_{33} & x_{34} & \cdots \\
x_{41} & x_{42} & x_{43} & x_{44} & \cdots \\
x_{51} & \cdots
\end{matrix}
\]
The elements are enumerated along the diagonals: $x_{11};\ x_{21}, x_{12};\ x_{31}, x_{22}, x_{13};\ \dots$
(d) Let $A = \{(x_n) \mid x_n \in \{0, 1\}\ \forall n \in \mathbb{N}\}$ be the set of all sequences whose elements are $0$
and $1$. This set $A$ is uncountable. In particular, $\mathbb{R}$ is uncountable.
Proof. Suppose to the contrary that $A$ is countable and arrange the elements of $A$ in a sequence
$(s_n)_{n\in\mathbb{N}}$ of distinct elements of $A$. We construct a sequence $s$ as follows. If the $n$th element in
$s_n$ is $1$ we let the $n$th digit of $s$ be $0$, and vice versa. Then the sequence $s$ differs from every
member $s_1, s_2, \dots$ in at least one place; hence $s \notin A$ — a contradiction since $s$ is indeed an
element of $A$. This proves: $A$ is uncountable.
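The diagonal construction can be made concrete on a finite table (a toy illustration added here, not from the notes): given any finite list of 0/1 sequences as rows, flipping the diagonal entries produces a sequence that differs from row $n$ at position $n$.

```python
def diagonal_flip(rows):
    """Flip the diagonal of a square 0/1 table: the result differs
    from rows[n] at index n, so it cannot appear in the list."""
    return [1 - rows[n][n] for n in range(len(rows))]

rows = [
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
s = diagonal_flip(rows)
for n, row in enumerate(rows):
    assert s[n] != row[n]   # s differs from every listed row
```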

6.4.2 Metric Spaces and Normed Spaces


Definition 6.8 A set $X$ is said to be a metric space if to any two points $x, y \in X$ there is
associated a real number $d(x, y)$, called the distance of $x$ and $y$, such that
(a) $d(x, x) = 0$ and $d(x, y) > 0$ for all $x, y \in X$ with $x \ne y$ (positive definiteness);
(b) $d(x, y) = d(y, x)$ (symmetry);
(c) $d(x, y) \le d(x, z) + d(z, y)$ for any $z \in X$ (triangle inequality).
Any function $d \colon X \times X \to \mathbb{R}$ with these three properties is called a distance function or metric
on $X$.
Example 6.11 (a) $\mathbb{C}$, $\mathbb{R}$, $\mathbb{Q}$, and $\mathbb{Z}$ are metric spaces with $d(x, y) := |y - x|$.
Any subset of a metric space is again a metric space.
(b) The real plane $\mathbb{R}^2$ is a metric space with respect to
\[
d_2((x_1, x_2), (y_1, y_2)) := \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}, \qquad
d_1((x_1, x_2), (y_1, y_2)) := |x_1 - y_1| + |x_2 - y_2|.
\]
$d_2$ is called the euclidean metric.


(c) Let $X$ be a set. Define
\[
d(x, y) := \begin{cases} 1, & \text{if } x \ne y, \\ 0, & \text{if } x = y. \end{cases}
\]
Then $(X, d)$ becomes a metric space. It is called the discrete metric space.
Definition 6.9 Let $E$ be a vector space over $\mathbb{C}$ (or $\mathbb{R}$). Suppose on $E$ there is given a function
$\|\cdot\| \colon E \to \mathbb{R}$ which associates to each $x \in E$ a real number $\|x\|$ such that the following three
conditions are satisfied:
(i) $\|x\| \ge 0$ for every $x \in E$, and $\|x\| = 0$ if and only if $x = 0$;
(ii) $\|\lambda x\| = |\lambda|\, \|x\|$ for all $\lambda \in \mathbb{C}$ (in $\mathbb{R}$, resp.);
(iii) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in E$.
Then $E$ is called a normed (vector) space and $\|x\|$ is the norm of $x$.
$\|x\|$ generalizes the length of a vector $x \in E$. Every normed vector space $E$ is a metric space
if we put $d(x, y) = \|x - y\|$. Prove!
However, there are metric spaces that are not normed spaces, for example $(\mathbb{N}, d(m, n) = |n - m|)$.
Example 6.12 (a) $E = \mathbb{R}^k$ or $E = \mathbb{C}^k$. Let $x = (x_1, \dots, x_k) \in E$ and define
\[
\|x\|_2 = \sqrt{ \sum_{i=1}^{k} |x_i|^2 }.
\]
Then $\|\cdot\|_2$ is a norm on $E$. It is called the Euclidean norm.
There are other possibilities to define a norm on $E$. For example,
\[
\|x\|_\infty = \max_{1\le i\le k} |x_i|, \qquad
\|x\|_1 = \sum_{i=1}^{k} |x_i|, \qquad
\|x\|_a = \|x\|_2 + 3\|x\|_1, \qquad
\|x\|_b = \max(\|x\|_1, \|x\|_2).
\]
(b) $E = C([a,b])$. Let $p \ge 1$. Then
\[
\|f\|_\infty = \sup_{x\in[a,b]} |f(x)|, \qquad
\|f\|_p = \left( \int_a^b |f(t)|^p\,dt \right)^{\frac{1}{p}}
\]
define norms on $E$. Note that $\|f\|_p \le \sqrt[p]{b-a}\, \|f\|_\infty$.
(c) Hilbert's sequence space $\ell^2$. $E = \ell^2 = \{(x_n) \mid \sum_{n=1}^{\infty} |x_n|^2 < \infty\}$. Then
\[
\|x\|_2 = \left( \sum_{n=1}^{\infty} |x_n|^2 \right)^{\frac{1}{2}}
\]
defines a norm on $\ell^2$.
(d) The bounded sequences. $E = \ell^\infty = \{(x_n) \mid \sup_{n\in\mathbb{N}} |x_n| < \infty\}$. Then
\[
\|x\|_\infty = \sup_{n\in\mathbb{N}} |x_n|
\]
defines a norm on $E$.
(e) $E = C([a,b])$. Then
\[
\|f\|_1 = \int_a^b |f(t)|\,dt
\]
defines a norm on $E$.

6.4.3 Open and Closed Sets


Definition 6.10 Let $X$ be a metric space with metric $d$. All points and subsets mentioned below
are understood to be elements and subsets of $X$; in particular, let $E \subseteq X$ be a subset of $X$.
(a) The set $U_\varepsilon(x) = \{y \mid d(x, y) < \varepsilon\}$ with some $\varepsilon > 0$ is called the
$\varepsilon$-neighborhood (or $\varepsilon$-ball with center $x$) of $x$. The number $\varepsilon$ is called the radius of
the neighborhood $U_\varepsilon(x)$.
(b) A point $p$ is an interior or inner point of $E$ if there is a neighborhood $U_\varepsilon(p)$
completely contained in $E$. $E$ is open if every point of $E$ is an interior point.
(c) A point $p$ is called an accumulation or limit point of $E$ if every neighborhood of
$p$ has a point $q \ne p$ such that $q \in E$.
(d) $E$ is said to be closed if every accumulation point of $E$ is a point of $E$. The closure of $E$ (denoted by $\overline{E}$) is $E$ together with all accumulation points of $E$. In other
words, $p \in \overline{E}$ if and only if every neighborhood of $p$ has a non-empty intersection
with $E$.
(e) The complement of $E$ (denoted by $E^c$) is the set of all points $p \in X$ such that
$p \notin E$.
(f) $E$ is bounded if there exists a real number $C > 0$ such that $d(x, y) \le C$ for all
$x, y \in E$.
(g) $E$ is dense in $X$ if $\overline{E} = X$.
Example 6.13 (a) $X = \mathbb{R}$ with the standard metric $d(x, y) = |x - y|$. $E = (a, b) \subseteq \mathbb{R}$ is
an open set. Indeed, for every $x \in (a, b)$ we have $U_\varepsilon(x) \subseteq (a, b)$ if $\varepsilon$ is small enough, say
$\varepsilon \le \min\{|x - a|, |x - b|\}$. Hence, $x$ is an inner point of $(a, b)$. Since $x$ was arbitrary, $(a, b)$
is open.
$F = [a, b)$ is not open since $a$ is not an inner point of $[a, b)$. Indeed, $U_\varepsilon(a) \not\subseteq [a, b)$ for every
$\varepsilon > 0$.
We have
\[
\overline{E} = \overline{F} = \text{set of accumulation points} = [a, b].
\]

Indeed, $a$ is an accumulation point of both $(a, b)$ and $[a, b)$. This is true since every neighborhood $U_\varepsilon(a)$, $\varepsilon < b - a$, contains $a + \varepsilon/2 \in (a, b)$ (resp. in $[a, b)$), which is different from $a$. For any
point $x \notin [a, b]$ we find a neighborhood $U_\varepsilon(x)$ with $U_\varepsilon(x) \cap [a, b) = \emptyset$; hence $x \notin \overline{E}$.
The set of rational numbers $\mathbb{Q}$ is dense in $\mathbb{R}$. Indeed, every neighborhood $U_\varepsilon(r)$ of every real
number $r$ contains a rational number, see Proposition 1.11 (b).
For the real line one can prove: Every open set is the at most countable union of disjoint open
(finite or infinite) intervals. A similar description for closed subsets of $\mathbb{R}$ is false. There is no
similar description of open subsets of $\mathbb{R}^k$, $k \ge 2$.
(b) For every metric space $X$, both the whole space $X$ and the empty set $\emptyset$ are open as well as
closed.
(c) Let $B = \{x \in \mathbb{R}^k \mid \|x\|_2 < 1\}$ be the open unit ball in $\mathbb{R}^k$. $B$ is open (see Lemma 6.16
below); $B$ is not closed. For example, $x_0 = (1, 0, \dots, 0)$ is an accumulation point of $B$ since
$x_n = (1 - 1/n, 0, \dots, 0)$ is a sequence of elements of $B$ converging to $x_0$; however, $x_0 \notin B$.
The accumulation points of $B$ are $\overline{B} = \{x \in \mathbb{R}^k \mid \|x\|_2 \le 1\}$. This is also the closure of $B$ in
$\mathbb{R}^k$.
(d) Consider $E = C([a,b])$ with the supremum norm.

[Figure: graph of $f$ with the $\varepsilon$-band between $f - \varepsilon$ and $f + \varepsilon$ containing $g$.]

Then $g \in E$ is in the $\varepsilon$-neighborhood of a function $f \in E$ if and only if
\[
|f(t) - g(t)| < \varepsilon \quad \text{for all } t \in [a, b].
\]
Lemma 6.16 Every neighborhood $U_r(p)$, $r > 0$, of a point $p$ is an open set.

Proof. Let $q \in U_r(p)$. Then there exists $\varepsilon > 0$ such that $d(q, p) = r - \varepsilon$. We will show that $U_\varepsilon(q) \subseteq U_r(p)$. For, let $x \in U_\varepsilon(q)$. Then
by the triangle inequality we have
\[
d(x, p) \le d(x, q) + d(q, p) < \varepsilon + (r - \varepsilon) = r.
\]
Hence $x \in U_r(p)$ and $q$ is an interior point of $U_r(p)$. Since $q$ was
arbitrary, $U_r(p)$ is open.

Remarks 6.4 (a) If p is an accumulation point of a set E, then every neighborhood of p contains
infinitely many points of E.
(b) A finite set has no accumulation points; hence any finite set is closed.
Example 6.14 (a) The open complex unit disc, $\{z \in \mathbb{C} \mid |z| < 1\}$.
(b) The closed unit disc, $\{z \in \mathbb{C} \mid |z| \le 1\}$.
(c) A finite set.
(d) The set $\mathbb{Z}$ of all integers.
(e) $\{1/n \mid n \in \mathbb{N}\}$.
(f) The set $\mathbb{C}$ of all complex numbers.
(g) The interval $(a, b)$.


Here (d), (e), and (g) are regarded as subsets of $\mathbb{R}$. Some properties of these sets are tabulated
below:

        Closed   Open   Bounded
  (a)   No       Yes    Yes
  (b)   Yes      No     Yes
  (c)   Yes      No     Yes
  (d)   Yes      No     No
  (e)   No       No     Yes
  (f)   Yes      Yes    No
  (g)   No       Yes    Yes
Proposition 6.17 A subset $E \subseteq X$ of a metric space $X$ is open if and only if its complement
$E^c$ is closed.
Proof. First, suppose $E^c$ is closed. Choose $x \in E$. Then $x \notin E^c$, and $x$ is not an accumulation
point of $E^c$. Hence there exists a neighborhood $U$ of $x$ such that $U \cap E^c$ is empty, that is
$U \subseteq E$. Thus $x$ is an interior point of $E$ and $E$ is open.
Next, suppose that $E$ is open. Let $x$ be an accumulation point of $E^c$. Then every neighborhood
of $x$ contains a point of $E^c$, so that $x$ is not an interior point of $E$. Since $E$ is open, this means
that $x \in E^c$. It follows that $E^c$ is closed.

6.4.4 Limits and Continuity


In this section we generalize the notions of convergent sequences and continuous functions to
arbitrary metric spaces.
Definition 6.11 Let $X$ be a metric space and $(x_n)$ a sequence of elements of $X$. We say that
$(x_n)$ converges to $x \in X$ if $\lim_{n\to\infty} d(x_n, x) = 0$. We write $\lim_{n\to\infty} x_n = x$ or $x_n \to x$.
In other words, $\lim_{n\to\infty} x_n = x$ if for every neighborhood $U_\varepsilon$, $\varepsilon > 0$, of $x$ there exists an $n_0 \in \mathbb{N}$
such that $n \ge n_0$ implies $x_n \in U_\varepsilon$.
Note that a subset $F$ of a metric space $X$ is closed if and only if $F$ contains all limits of
convergent sequences $(x_n)$, $x_n \in F$.
Two metrics $d_1$ and $d_2$ on a space $X$ are said to be topologically equivalent if $\lim_{n\to\infty} x_n = x$
w.r.t. $d_1$ if and only if $\lim_{n\to\infty} x_n = x$ w.r.t. $d_2$. In particular, two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on the
same linear space $E$ are said to be equivalent if the corresponding metric spaces are topologically equivalent.
Proposition 6.18 Let $E_1 = (E, \|\cdot\|_1)$ and $E_2 = (E, \|\cdot\|_2)$ be normed vector spaces such that
there exist positive numbers $c_1, c_2 > 0$ with
\[
c_1 \|x\|_1 \le \|x\|_2 \le c_2 \|x\|_1 \quad \text{for all } x \in E. \tag{6.42}
\]
Then $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent.


Proof. Condition (6.42) is obviously symmetric with respect to $E_1$ and $E_2$ since $\|x\|_2/c_2 \le
\|x\|_1 \le \|x\|_2/c_1$. Therefore, it is sufficient to show the following: If $x_n \to x$ w.r.t. $\|\cdot\|_1$ then
$x_n \to x$ w.r.t. $\|\cdot\|_2$. Indeed, by definition, $\lim_{n\to\infty} \|x_n - x\|_1 = 0$. By assumption,
\[
c_1 \|x_n - x\|_1 \le \|x_n - x\|_2 \le c_2 \|x_n - x\|_1, \qquad n \in \mathbb{N}.
\]
Since the first and the last expressions tend to $0$ as $n \to \infty$, the sandwich theorem shows that
$\lim_{n\to\infty} \|x_n - x\|_2 = 0$, too. This proves $x_n \to x$ w.r.t. $\|\cdot\|_2$.
Example 6.15 Let $E = \mathbb{R}^k$ or $E = \mathbb{C}^k$ with the norm $\|x\|_p = \sqrt[p]{|x_1|^p + \cdots + |x_k|^p}$, $p \in [1, \infty)$,
together with $\|x\|_\infty = \max_{1\le i\le k} |x_i|$. All these norms are equivalent. Indeed,
\[
\|x\|_\infty \le \|x\|_p \le \sqrt[p]{ \sum_{i=1}^{k} \|x\|_\infty^p } = \sqrt[p]{k}\; \|x\|_\infty \le k\, \|x\|_\infty. \tag{6.43}
\]
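The equivalence bounds (6.43) can be checked on a few sample vectors (a sketch added here, not part of the notes; the vectors and exponents are arbitrary choices):

```python
# Check  ||x||_inf <= ||x||_p <= k^(1/p) ||x||_inf  for vectors in R^k.
def norm_p(x, p):
    return sum(abs(t) ** p for t in x) ** (1 / p)

def norm_inf(x):
    return max(abs(t) for t in x)

vectors = [(1.0, -2.0, 3.0), (0.5, 0.5, 0.5), (-4.0, 0.0, 1.0)]
for x in vectors:
    k = len(x)
    for p in (1, 2, 3, 10):
        assert norm_inf(x) <= norm_p(x, p) + 1e-12
        assert norm_p(x, p) <= k ** (1 / p) * norm_inf(x) + 1e-12
```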

The following proposition is quite analogous to Proposition 2.33 with $k = 2$. Recall that a
complex sequence $(z_n)$ converges if and only if both $\operatorname{Re} z_n$ and $\operatorname{Im} z_n$ converge.

Proposition 6.19 Let $(x_n)$ be a sequence of vectors of the euclidean space $(\mathbb{R}^k, \|\cdot\|_2)$,
$x_n = (x_{n1}, \dots, x_{nk})$.
Then $(x_n)$ converges to $a = (a_1, \dots, a_k) \in \mathbb{R}^k$ if and only if
\[
\lim_{n\to\infty} x_{ni} = a_i, \qquad i = 1, \dots, k.
\]

Proof. Suppose that $\lim_{n\to\infty} x_n = a$. Given $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies
$\|x_n - a\|_2 < \varepsilon$. Thus, for $i = 1, \dots, k$ we have
\[
|x_{ni} - a_i| \le \|x_n - a\|_2 < \varepsilon;
\]
hence $\lim_{n\to\infty} x_{ni} = a_i$.
Conversely, suppose that $\lim_{n\to\infty} x_{ni} = a_i$ for $i = 1, \dots, k$. Given $\varepsilon > 0$ there are $n_{0i} \in \mathbb{N}$ such
that $n \ge n_{0i}$ implies
\[
|x_{ni} - a_i| < \frac{\varepsilon}{k}.
\]
For $n \ge \max\{n_{01}, \dots, n_{0k}\}$ we have (see (6.43))
\[
\|x_n - a\|_2 \le k\, \|x_n - a\|_\infty < \varepsilon;
\]
hence $\lim_{n\to\infty} x_n = a$.

Corollary 6.20 Let $B \subseteq \mathbb{R}^k$ be a bounded subset and $(x_n)$ a sequence of elements of $B$.
Then $(x_n)$ has a converging subsequence.

Proof. Since $B$ is bounded, all coordinates of $B$ are bounded; hence there is a subsequence $(x_n^{(1)})$
of $(x_n)$ such that the first coordinate converges. Further, there is a subsequence $(x_n^{(2)})$ of $(x_n^{(1)})$
such that the second coordinate converges. Finally there is a subsequence $(x_n^{(k)})$ of $(x_n^{(k-1)})$
such that all coordinates converge. By the above proposition the subsequence $(x_n^{(k)})$ converges
in $\mathbb{R}^k$.
The same statement is true for subsets $B \subseteq \mathbb{C}^k$.
Definition 6.12 A mapping $f \colon X \to Y$ from the metric space $X$ into the metric space $Y$ is
said to be continuous at $a \in X$ if one of the following equivalent conditions is satisfied:
(a) For every $\varepsilon > 0$ there exists $\delta > 0$ such that for every $x \in X$
\[
d(x, a) < \delta \quad \text{implies} \quad d(f(x), f(a)) < \varepsilon. \tag{6.44}
\]
(b) For any sequence $(x_n)$, $x_n \in X$, with $\lim_{n\to\infty} x_n = a$ it follows that $\lim_{n\to\infty} f(x_n) = f(a)$.
The mapping $f$ is said to be continuous on $X$ if $f$ is continuous at every point $a$ of $X$.

Proposition 6.21 The composition of two continuous mappings is continuous.


The proof is completely the same as in the real case (see Proposition 3.4) and we omit it.
We give the topological description of continuous functions.
Proposition 6.22 Let $X$ and $Y$ be metric spaces. A mapping $f \colon X \to Y$ is continuous if and
only if the preimage of any open set in $Y$ is open in $X$.
Proof. Suppose that $f$ is continuous and $G \subseteq Y$ is open. If $f^{-1}(G) = \emptyset$, there is nothing to
prove; the empty set is open. Otherwise there exists $x_0 \in f^{-1}(G)$, and therefore $f(x_0) \in G$.
Since $G$ is open, there is $\varepsilon > 0$ such that $U_\varepsilon(f(x_0)) \subseteq G$. Since $f$ is continuous at $x_0$, there
exists $\delta > 0$ such that $x \in U_\delta(x_0)$ implies $f(x) \in U_\varepsilon(f(x_0)) \subseteq G$. That is, $U_\delta(x_0) \subseteq f^{-1}(G)$,
and $x_0$ is an inner point of $f^{-1}(G)$; hence $f^{-1}(G)$ is open.
Suppose now that the condition of the proposition is fulfilled. We will show that $f$ is continuous. Fix $x_0 \in X$ and $\varepsilon > 0$. Since $G = U_\varepsilon(f(x_0))$ is open by Lemma 6.16, $f^{-1}(G)$ is open by
assumption. In particular, $x_0 \in f^{-1}(G)$ is an inner point. Hence, there exists $\delta > 0$ such that
$U_\delta(x_0) \subseteq f^{-1}(G)$. It follows that $f(U_\delta(x_0)) \subseteq U_\varepsilon(f(x_0))$; this means that $f$ is continuous at $x_0$.
Since $x_0$ was arbitrary, $f$ is continuous on $X$.

Remark 6.5 Since the complement of an open set is a closed set, it is obvious that the proposition also holds if we replace "open set" by "closed set".
In general, the image of an open set under a continuous function need not be open; consider for example f(x) = sin x and G = (0, 2π), which is open; however, f((0, 2π)) = [−1, 1] is not open.

6.4 Basic Topology


6.4.5 Completeness and Compactness


(a) Completeness
Definition 6.13 Let (X, d) be a metric space. A sequence (xₙ) of elements of X is said to be a Cauchy sequence if for every ε > 0 there exists a positive integer n₀ ∈ ℕ such that

    d(xₙ, xₘ) < ε   for all m, n ≥ n₀.

A metric space is said to be complete if every Cauchy sequence converges.
A complete normed vector space is called a Banach space.
Remark. The euclidean k-spaces ℝᵏ and ℂᵏ are complete.
The function space (C([a, b]), ‖·‖∞) is complete, see homework 21.2. The Hilbert space ℓ² is complete.
(b) Compactness
The notion of compactness is of great importance in analysis, especially in connection with
continuity.
By an open cover of a set E in a metric space X we mean a collection {G_α | α ∈ I} of open subsets of X such that E ⊆ ⋃_{α∈I} G_α. Here I is any index set and

    ⋃_{α∈I} G_α = {x ∈ X | ∃ α ∈ I : x ∈ G_α}.

Definition 6.14 (Covering definition) A subset K of a metric space X is said to be compact if every open cover of K contains a finite subcover. More explicitly, if {G_α} is an open cover of K, then there are finitely many indices α₁, …, αₙ such that

    K ⊆ G_{α₁} ∪ ⋯ ∪ G_{αₙ}.

Note that the definition does not state that a set is compact if there exists a finite open cover: the whole space X is open and is itself a cover consisting of only one member. Instead, every open cover must have a finite subcover.
Example 6.16 (a) It is clear that every finite set is compact.
(b) Let (xₙ) be a sequence in a metric space X converging to x. Then

    A = {xₙ | n ∈ ℕ} ∪ {x}

is compact.
Proof. Let {G_α} be any open cover of A. In particular, the limit point x is covered by, say, G_{α₀}. Then there is an n₀ ∈ ℕ such that xₙ ∈ G_{α₀} for every n ≥ n₀. Finally, each xₖ, k = 1, …, n₀ − 1, is covered by some G_{αₖ}. Hence the collection

    {G_{αₖ} | k = 0, 1, …, n₀ − 1}

is a finite subcover of A; therefore A is compact.


Proposition 6.23 (Sequence Definition) A subset K of a metric space X is compact if and only if every sequence in K contains a convergent subsequence with limit in K.
Proof. (a) Let K be compact and suppose to the contrary that (xₙ) is a sequence in K without any subsequence converging to a point of K. Then every x ∈ K has a neighborhood Uₓ containing only finitely many elements of the sequence (xₙ). (Otherwise x would be a limit point of (xₙ) and there would be a subsequence converging to x.) By construction,

    K ⊆ ⋃_{x∈K} Uₓ.

Since K is compact, there are finitely many points y₁, …, yₘ ∈ K with

    K ⊆ U_{y₁} ∪ ⋯ ∪ U_{yₘ}.

Since every U_{yᵢ} contains only finitely many elements of (xₙ), there are only finitely many elements of (xₙ) in K, a contradiction.
(b) The proof is in the appendix to this chapter.

Remark 6.6 Further properties. (a) A compact subset of a metric space is closed and bounded.
(b) A closed subset of a compact set is compact.
(c) A subset K of ℝᵏ or ℂᵏ is compact if and only if K is bounded and closed.
Proof. Suppose K is closed and bounded. Let (xₙ) be a sequence in K. By Corollary 6.20, (xₙ) has a convergent subsequence. Since K is closed, the limit is in K. By the above proposition K is compact. The other direction follows from (a).
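The coordinatewise subsequence extraction behind Corollary 6.20 and Remark 6.6 (c) can be seen on a concrete example. The sketch below (the sequence is my own illustration, not from the text) takes a bounded sequence in ℝ² with two cluster points; the full sequence diverges, but the even-indexed subsequence converges.

```python
# A bounded sequence in R^2: the full sequence oscillates between two
# cluster points, but the even-indexed subsequence converges.
def x(n):
    return ((-1) ** n * (1 + 1 / n), float((-1) ** n))

limit = (1.0, 1.0)
errors = [max(abs(x(2 * k)[0] - limit[0]), abs(x(2 * k)[1] - limit[1]))
          for k in range(1, 200)]
assert all(e2 <= e1 for e1, e2 in zip(errors, errors[1:]))  # errors shrink
assert errors[-1] < 0.01                                    # subsequence converges
```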

(c) Compactness and Continuity


As in the real case (see Theorem 3.6) we have the analogous results for metric spaces.
Proposition 6.24 Let X be a compact metric space.
(a) Let f : X → Y be a continuous mapping into the metric space Y. Then f(X) is compact.
(b) Let f : X → ℝ be a continuous mapping. Then f is bounded and attains its maximum and minimum, that is, there are points p and q in X such that

    f(p) = sup_{x∈X} f(x),   f(q) = inf_{x∈X} f(x).

Proof. (a) Let {G_α} be an open cover of f(X). By Proposition 6.22, f⁻¹(G_α) is open for every α. Hence, {f⁻¹(G_α)} is an open cover of X. Since X is compact there is a finite subcover of X, say {f⁻¹(G_{α₁}), …, f⁻¹(G_{αₙ})}. Then {G_{α₁}, …, G_{αₙ}} is a finite subcover of {G_α} covering f(X). Hence, f(X) is compact. We skip (b).
As for real functions, we have the following proposition about uniform continuity. The proof is in the appendix.


Proposition 6.25 Let f : K → ℝ be a continuous function on a compact set K ⊆ ℝ. Then f is uniformly continuous on K.

6.4.6 Continuous Functions in ℝᵏ

Proposition 6.26 (a) The projection mapping pᵢ : ℝᵏ → ℝ, i = 1, …, k, given by pᵢ(x₁, …, xₖ) = xᵢ is continuous.
(b) Let U ⊆ ℝᵏ be open and f, g : U → ℝ be continuous on U. Then f + g, f·g, |f|, and f/g (g ≠ 0) are continuous functions on U.
(c) Let X be a metric space. A mapping

    f = (f₁, …, fₖ) : X → ℝᵏ

is continuous if and only if all components fᵢ : X → ℝ, i = 1, …, k, are continuous.
Proof. (a) Let (xₙ) be a sequence converging to a = (a₁, …, aₖ) ∈ ℝᵏ. Then the sequence (pᵢ(xₙ)) converges to aᵢ = pᵢ(a) by Proposition 6.19. This shows continuity of pᵢ at a.
(b) The proofs are quite similar to the proofs in the real case, see Proposition 2.3. As a sample we carry out the proof in the case f·g. Let a ∈ U and put M = max{|f(a)|, |g(a)|}. Let ε > 0, ε < 3M², be given. Since f and g are continuous at a, there exists δ > 0 such that

    ‖x − a‖ < δ implies |f(x) − f(a)| < ε/(3M),
    ‖x − a‖ < δ implies |g(x) − g(a)| < ε/(3M).  (6.45)

Note that

    fg(x) − fg(a) = (f(x) − f(a))(g(x) − g(a)) + f(a)(g(x) − g(a)) + g(a)(f(x) − f(a)).

Taking the absolute value of the above identity and using the triangle inequality as well as (6.45), we have that ‖x − a‖ < δ implies

    |fg(x) − fg(a)| ≤ |f(x) − f(a)| |g(x) − g(a)| + |f(a)| |g(x) − g(a)| + |g(a)| |f(x) − f(a)|
        < ε²/(9M²) + M·ε/(3M) + M·ε/(3M) < ε/3 + ε/3 + ε/3 = ε.

This proves continuity of f·g at a.
(c) Suppose first that f is continuous at a ∈ X. Since fᵢ = pᵢ ∘ f, fᵢ is continuous by the result of (a) and Proposition 6.21.
Suppose now that all the fᵢ, i = 1, …, k, are continuous at a. Let (xₙ), xₙ ≠ a, be a sequence in X with limₙ→∞ xₙ = a in X. Since fᵢ is continuous, the sequences (fᵢ(xₙ)) of numbers converge to fᵢ(a). By Proposition 6.19, the sequence of vectors f(xₙ) converges to f(a); hence f is continuous at a.

Example 6.17 Let f : ℝ³ → ℝ² be given by

    f(x, y, z) = ( sin( (x² + eᶻ)/(x² + y² + z² + 1) ), log|x² + y² + z² + 1| ).

Then f is continuous on ℝ³. Indeed, since products, sums, and compositions of continuous functions are continuous, √(x² + y² + z² + 1) is a continuous function on ℝ³. We also made use of Proposition 6.26 (a); the coordinate functions x, y, and z are continuous. Since the denominator is nonzero, f₁(x, y, z) = sin( (x² + eᶻ)/(x² + y² + z² + 1) ) is continuous. Since |x² + y² + z² + 1| > 0, f₂(x, y, z) = log|x² + y² + z² + 1| is continuous. By Proposition 6.26 (c), f is continuous.

6.5 Appendix E
(a) A compact subset is closed
Proof. Let K be a compact subset of a metric space X. We shall prove that the complement of K is an open subset of X.
Suppose that p ∈ X, p ∉ K. If q ∈ K, let V_q and U_q be neighborhoods of p and q, respectively, of radius less than d(p, q)/2. Since K is compact, there are finitely many points q₁, …, qₙ in K such that

    K ⊆ U_{q₁} ∪ ⋯ ∪ U_{qₙ} =: U.

If V = V_{q₁} ∩ ⋯ ∩ V_{qₙ}, then V is a neighborhood of p which does not intersect U. Hence V ⊆ Kᶜ, so that p is an interior point of Kᶜ, and K is closed.
We show that K is bounded. Let ε > 0 be given. Since K is compact, the open cover {U_ε(x) | x ∈ K} of K has a finite subcover, say {U_ε(x₁), …, U_ε(xₙ)}. Let U = ⋃ᵢ₌₁ⁿ U_ε(xᵢ); then the maximal distance of two points x and y in U is bounded by

    2ε + max_{1≤i<j≤n} d(xᵢ, xⱼ).

A closed subset of a compact set is compact
Proof. Suppose F ⊆ K ⊆ X, F is closed in X, and K is compact. Let {U_α} be an open cover of F. Since Fᶜ is open, {U_α, Fᶜ} is an open cover of K. Since K is compact, there is a finite subcover Φ of it which covers K. If Fᶜ is a member of Φ, we may remove it from Φ and still retain an open cover of F. Thus we have shown that a finite subcollection of {U_α} covers F.


Equivalence of Compactness and Sequential Compactness


Proof of Proposition 6.23 (b). This direction is hard to prove. It does not work in arbitrary topological spaces and essentially uses that X is a metric space. The proof is roughly along the lines of Exercises 22 to 26 in [Rud76]. We give the proof of Bredon (see [Bre97, 9.4 Theorem]).
Suppose that every sequence in K contains a subsequence converging in K.
1) K contains a countable dense set. For this, we show that for every ε > 0, K can be covered by a finite number of ε-balls (ε is fixed). Suppose this is not true, i.e. K can't be covered by any finite number of ε-balls. Then we construct a sequence (xₙ) as follows. Take an arbitrary x₁. Suppose x₁, …, xₙ are already found; since K is not covered by a finite number of ε-balls, we find xₙ₊₁ whose distance to every preceding element of the sequence is greater than or equal to ε. Consider a limit point x of this sequence and an ε/2-neighborhood U of x. Almost all elements of a suitable subsequence of (xₙ) belong to U, say xᵣ and xₛ with s > r. Since both are in U, their distance is less than ε. But this contradicts the construction of the sequence.
Now take the union of all those finite sets corresponding to ε = 1/n, n ∈ ℕ. This is a countable dense set of K.
2) Any open cover {U_α} of K has a countable subcover. Let x ∈ K be given. Since {U_α}_{α∈I} is an open cover of K, we find α ∈ I and n ∈ ℕ such that U_{2/n}(x) ⊆ U_α. Further, since {xᵢ}_{i∈ℕ} is dense in K, we find i, n ∈ ℕ such that d(x, xᵢ) < 1/n. By the triangle inequality,

    x ∈ U_{1/n}(xᵢ) ⊆ U_{2/n}(x) ⊆ U_α.

To each of the countably many U_{1/n}(xᵢ) choose one U_α ⊇ U_{1/n}(xᵢ). This is a countable subcover of {U_α}.
3) Rename the countable open subcover by {Vₙ}_{n∈ℕ} and consider the decreasing sequence (Cₙ) of closed sets

    Cₙ = K \ ⋃_{k=1}^{n} Vₖ,   C₁ ⊇ C₂ ⊇ ⋯ .

If some Cₖ = ∅ we have found a finite subcover, namely V₁, V₂, …, Vₖ. Suppose that all the Cₙ are nonempty, say xₙ ∈ Cₙ. Further, let x be the limit of a subsequence (x_{nᵢ}). Since x_{nᵢ} ∈ Cₘ for all nᵢ ≥ m and Cₘ is closed, x ∈ Cₘ for all m. Hence x ∈ ⋂_{m∈ℕ} Cₘ. However,

    ⋂_{m∈ℕ} Cₘ = K \ ⋃_{m∈ℕ} Vₘ = ∅.

This contradiction completes the proof.


Proof of Proposition 6.25. Let ε > 0 be given. Since f is continuous, we can associate to each point p ∈ K a positive number δ(p) such that q ∈ K ∩ U_{δ(p)}(p) implies |f(q) − f(p)| < ε/2. Let

    J(p) = {q ∈ K | |p − q| < δ(p)/2}.

Since p ∈ J(p), the collection {J(p) | p ∈ K} is an open cover of K; and since K is compact, there is a finite set of points p₁, …, pₙ in K such that

    K ⊆ J(p₁) ∪ ⋯ ∪ J(pₙ).  (6.46)

We put δ := ½ min{δ(p₁), …, δ(pₙ)}. Then δ > 0. Now let p and q be points of K with |p − q| < δ. By (6.46), there is an integer m, 1 ≤ m ≤ n, such that p ∈ J(pₘ); hence

    |p − pₘ| < ½ δ(pₘ),

and we also have

    |q − pₘ| ≤ |p − q| + |p − pₘ| < δ + ½ δ(pₘ) ≤ δ(pₘ).

Finally, continuity at pₘ gives

    |f(p) − f(q)| ≤ |f(p) − f(pₘ)| + |f(pₘ) − f(q)| < ε.

Proposition 6.27 There exists a real continuous function on the real line which is nowhere differentiable.
Proof. Define

    φ(x) = |x|,   x ∈ [−1, 1],

and extend the definition of φ to all real x by requiring periodicity:

    φ(x + 2) = φ(x).

Then for all s, t ∈ ℝ,

    |φ(s) − φ(t)| ≤ |s − t|.  (6.47)

In particular, φ is continuous on ℝ. Define

    f(x) = Σ_{n=0}^{∞} (3/4)ⁿ φ(4ⁿx).  (6.48)

Since 0 ≤ φ ≤ 1, Theorem 6.2 shows that the series (6.48) converges uniformly on ℝ. By Theorem 6.5, f is continuous on ℝ.
Now fix a real number x and a positive integer m ∈ ℕ. Put

    δₘ = ± ½ · 4⁻ᵐ,

where the sign is chosen such that no integer lies between 4ᵐx and 4ᵐ(x + δₘ). This can be done since 4ᵐ|δₘ| = ½. It follows that |φ(4ᵐ(x + δₘ)) − φ(4ᵐx)| = ½. Define

    γₙ = ( φ(4ⁿ(x + δₘ)) − φ(4ⁿx) ) / δₘ.

When n > m, then 4ⁿδₘ is an even integer, so that γₙ = 0 by periodicity of φ. When 0 ≤ n ≤ m, (6.47) implies |γₙ| ≤ 4ⁿ. Since |γₘ| = 4ᵐ, we conclude that

    | ( f(x + δₘ) − f(x) ) / δₘ | = | Σ_{n=0}^{m} (3/4)ⁿ γₙ | ≥ 3ᵐ − Σ_{n=0}^{m−1} 3ⁿ = ½ (3ᵐ + 1).

As m → ∞, δₘ → 0. It follows that f is not differentiable at x.
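The growth of the difference quotients can be observed numerically on a partial sum of (6.48). In the sketch below (my own illustration; the helper names are not from the text), taking the larger quotient over the two signs of δₘ plays the role of the sign choice in the proof, and the quotients grow roughly like (3ᵐ + 1)/2.

```python
def phi(x):
    # |x| on [-1, 1] extended 2-periodically: distance to the nearest even integer
    return abs(x - 2 * round(x / 2))

def f(x, terms=25):
    # partial sum of f(x) = sum_{n>=0} (3/4)^n phi(4^n x)
    return sum((3 / 4) ** n * phi(4 ** n * x) for n in range(terms))

x0 = 0.1
quots = []
for m in range(1, 8):
    d = 0.5 * 4 ** -m  # |delta_m|; take the better of the two signs
    quots.append(max(abs(f(x0 + s * d) - f(x0)) / d for s in (1.0, -1.0)))

assert all(b > a for a, b in zip(quots, quots[1:]))  # quotients grow with m
assert quots[-1] >= (3 ** 7 + 1) / 2 - 1             # roughly (3^m + 1)/2
```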

Proof of Abel's Limit Theorem, Proposition 6.9. By Proposition 6.4, the series converges on (−1, 1) and the limit function f is continuous there, since the radius of convergence is at least 1 by assumption. Hence it suffices to prove continuity at x = 1, i.e. that lim_{x→1−0} f(x) = f(1).
Put rₙ = Σ_{k=n}^{∞} aₖ; then r₀ = f(1) and rₙ − rₙ₊₁ = aₙ for all nonnegative integers n, and lim_{n→∞} rₙ = 0. Hence there is a constant C with |rₙ| ≤ C, and the series Σ_{n=0}^{∞} rₙ₊₁xⁿ converges for |x| < 1 by the comparison test. We have

    (1 − x) Σ_{n=0}^{∞} rₙ₊₁xⁿ = Σ_{n=0}^{∞} rₙ₊₁xⁿ − Σ_{n=0}^{∞} rₙ₊₁xⁿ⁺¹ = Σ_{n=0}^{∞} rₙ₊₁xⁿ − Σ_{n=0}^{∞} rₙxⁿ + r₀ = −Σ_{n=0}^{∞} aₙxⁿ + f(1);

hence,

    f(1) − f(x) = (1 − x) Σ_{n=0}^{∞} rₙ₊₁xⁿ.

Let ε > 0 be given. Choose N ∈ ℕ such that n ≥ N implies |rₙ| < ε. Put δ = ε/(CN); then x ∈ (1 − δ, 1) implies

    |f(1) − f(x)| ≤ (1 − x) Σ_{n=0}^{N−1} |rₙ₊₁| xⁿ + (1 − x) Σ_{n=N}^{∞} |rₙ₊₁| xⁿ ≤ (1 − x)CN + ε(1 − x) Σ_{n=0}^{∞} xⁿ < ε + ε = 2ε;

hence f(x) tends to f(1) as x → 1 − 0.
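Abel's theorem can be watched numerically on log(1 + x) = Σ_{n≥1} (−1)^{n−1} xⁿ/n, whose coefficient series Σ (−1)^{n−1}/n converges to log 2. The sketch below (my own illustration, not from the text) checks that the partial sums approach log 2 as x → 1 − 0.

```python
import math

def series_log1p(x, terms=50000):
    # partial sum of log(1+x) = sum_{n>=1} (-1)^(n-1) x^n / n, radius 1
    s, p, sign = 0.0, 1.0, 1.0
    for n in range(1, terms + 1):
        p *= x
        s += sign * p / n
        sign = -sign
    return s

# inside (-1, 1) the series agrees with log(1 + x) ...
for x in (0.9, 0.99, 0.999):
    assert abs(series_log1p(x) - math.log(1 + x)) < 1e-6
# ... and as x -> 1-0 the values tend to f(1) = log 2, as Abel's theorem asserts
assert abs(series_log1p(0.999) - math.log(2)) < 1e-3
```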


Definition 6.15 If X is a metric space, C(X) will denote the set of all continuous, bounded functions with domain X. We associate with each f ∈ C(X) its supremum norm

    ‖f‖ = ‖f‖∞ = sup_{x∈X} |f(x)|.  (6.49)

Since f is assumed to be bounded, ‖f‖ < ∞. Note that boundedness of f is redundant if X is a compact metric space (Proposition 6.24). Thus C(X) consists of all continuous functions in that case.
It is clear that C(X) is a vector space, since the sum of bounded functions is again a bounded function (see the triangle inequality below) and the sum of continuous functions is a continuous function (see Proposition 6.26). We show that ‖f‖ is indeed a norm on C(X).
(i) Obviously, ‖f‖ ≥ 0 since the absolute value |f(x)| is nonnegative. Further, ‖0‖ = 0. Suppose now ‖f‖ = 0. This implies |f(x)| = 0 for all x; hence f = 0.
(ii) Clearly, for every (real or complex) number λ we have

    ‖λf‖ = sup_{x∈X} |λf(x)| = |λ| sup_{x∈X} |f(x)| = |λ| ‖f‖.

(iii) If h = f + g, then

    |h(x)| ≤ |f(x)| + |g(x)| ≤ ‖f‖ + ‖g‖,   x ∈ X;

hence

    ‖f + g‖ ≤ ‖f‖ + ‖g‖.

We have thus made C(X) into a normed vector space. Remark 6.1 can be rephrased as:
A sequence (fₙ) converges to f with respect to the norm in C(X) if and only if fₙ → f uniformly on X.
Accordingly, closed subsets of C(X) are sometimes called uniformly closed, the closure of a set A ⊆ C(X) is called the uniform closure, and so on.
Theorem 6.28 The above norm makes C(X) into a Banach space (a complete normed space).
Proof. Let (fₙ) be a Cauchy sequence in C(X). This means that to every ε > 0 corresponds an n₀ ∈ ℕ such that n, m ≥ n₀ implies ‖fₙ − fₘ‖ < ε. It follows by Proposition 6.1 that there is a function f with domain X to which (fₙ) converges uniformly. By Theorem 6.5, f is continuous. Moreover, f is bounded, since there is an n such that |f(x) − fₙ(x)| < 1 for all x ∈ X, and fₙ is bounded.
Thus f ∈ C(X), and since fₙ → f uniformly on X, we have ‖f − fₙ‖ → 0 as n → ∞.
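Convergence in the sup norm is exactly uniform convergence. The sketch below (helper names and the example are mine, not from the text) approximates ‖·‖∞ on a sample grid and checks that fₙ(t) = √(t² + 1/n²) converges uniformly to |t| on [−1, 1], with ‖fₙ − |·|‖∞ = 1/n; each fₙ is smooth, while the uniform limit is not differentiable at 0.

```python
import math

def sup_dist(f, g, grid):
    # approximates ||f - g||_inf = sup_t |f(t) - g(t)| by sampling a grid
    return max(abs(f(t) - g(t)) for t in grid)

grid = [i / 1000 - 1 for i in range(2001)]  # 2001 points of [-1, 1], includes 0
for n in (1, 10, 100):
    f_n = lambda t, n=n: math.sqrt(t * t + 1 / n ** 2)
    # ||f_n - |.| ||_inf = 1/n, attained at t = 0
    assert abs(sup_dist(f_n, abs, grid) - 1 / n) < 1e-12
```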

Chapter 7
Calculus of Functions of Several Variables
In this chapter we consider functions f : U → ℝ or f : U → ℝᵐ where U ⊆ ℝⁿ is an open set. In Subsection 6.4.6 we collected the main properties of continuous functions f. Now we will study differentiation and integration of such functions in more detail.
The Norm of a Linear Mapping
Proposition 7.1 Let T ∈ L(ℝⁿ, ℝᵐ) be a linear mapping of the euclidean space ℝⁿ into ℝᵐ.
(a) Then there exists some C > 0 such that

    ‖T(x)‖₂ ≤ C ‖x‖₂   for all x ∈ ℝⁿ.  (7.1)

(b) T is uniformly continuous on ℝⁿ.
Proof. (a) Using the standard bases of ℝⁿ and ℝᵐ we identify T with its matrix T = (aᵢⱼ), T(eⱼ) = Σ_{i=1}^{m} aᵢⱼeᵢ. For x = (x₁, …, xₙ) we have

    T(x) = ( Σ_{j=1}^{n} a₁ⱼxⱼ, …, Σ_{j=1}^{n} aₘⱼxⱼ );

hence by the Cauchy–Schwarz inequality we have

    ‖T(x)‖₂² = Σ_{i=1}^{m} ( Σ_{j=1}^{n} aᵢⱼxⱼ )² ≤ Σ_{i=1}^{m} ( Σ_{j=1}^{n} |aᵢⱼxⱼ| )² ≤ Σ_{i=1}^{m} ( Σ_{j=1}^{n} aᵢⱼ² )( Σ_{j=1}^{n} xⱼ² ) = C² ‖x‖₂²,

where C = √( Σ_{i,j} aᵢⱼ² ). Consequently,

    ‖Tx‖ ≤ C ‖x‖.

(b) Let ε > 0. Put δ = ε/C with the above C. Then ‖x − y‖ < δ implies

    ‖Tx − Ty‖ = ‖T(x − y)‖ ≤ C ‖x − y‖ < ε,

which proves (b).
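Inequality (7.1) with the constant C = √(Σ aᵢⱼ²) from the proof can be tested on random vectors; note that this C in general overestimates the operator norm ‖T‖ of Definition 7.1 below. (Plain-Python sketch; the matrix and helper names are my own illustration.)

```python
import math
import random

def matvec(T, x):
    # T(x) for a matrix T given as a list of rows
    return [sum(a * t for a, t in zip(row, x)) for row in T]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

T = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]                                   # a linear map R^3 -> R^2
C = math.sqrt(sum(a * a for row in T for a in row))      # the constant from the proof

random.seed(0)
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(3)]
    assert norm2(matvec(T, x)) <= C * norm2(x) + 1e-12   # (7.1) holds
```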


Definition 7.1 Let V and W be normed vector spaces and A ∈ L(V, W). The smallest number C satisfying (7.1) is called the norm of the linear map A and is denoted by ‖A‖:

    ‖A‖ = inf{C | ‖Ax‖ ≤ C ‖x‖ for all x ∈ V}.  (7.2)

By definition,

    ‖Ax‖ ≤ ‖A‖ ‖x‖.  (7.3)

Let T ∈ L(ℝⁿ, ℝᵐ) be a linear mapping. One can show that

    ‖T‖ = sup_{x≠0} ‖Tx‖/‖x‖ = sup_{‖x‖=1} ‖Tx‖ = sup_{‖x‖≤1} ‖Tx‖.

7.1 Partial Derivatives


We consider functions f : U R where U Rn is an open set. We want to find derivatives
one variable at a time.
Definition 7.2 Let U Rn be open and f : U R a real function. Then f is called partial
differentiable at a = (a1 , . . . , an ) U with respect to the ith coordinate if the limit
Di f (a) = lim

h0

f (a1 , . . . , ai + h, . . . , an ) f (a1 , . . . , an )
h

(7.4)

exists where h is real and sufficiently small (such that (a1 , . . . , ai + h, . . . , an ) U).
Di f (x) is called the ith partial derivative of f at a. We also use the notations
Di f (a) =

f
f (a)
(a) =
= fxi (a).
xi
xi

It is important that Di f (a) is the ordinary derivative of a certain function; in fact, if g(x) =
f (a1 , . . . , x, . . . , an ), then Di f (a) = g (ai ). That is, Di f (a) is the slope of the tangent line at
(a, f (a)) to the curve obtained by intersecting the graph of f with the plane xj = aj , j 6= i. It
also means that computation of Di f (a) is a problem we can already solve.
Example 7.1 (a) f(x, y) = sin(xy²). Then D₁f(x, y) = y² cos(xy²) and D₂f(x, y) = 2xy cos(xy²).
(b) Consider the radius function r : ℝⁿ → ℝ,

    r(x) = ‖x‖₂ = √(x₁² + ⋯ + xₙ²),   x = (x₁, …, xₙ) ∈ ℝⁿ.

Then r is partial differentiable on ℝⁿ \ {0} with

    ∂r/∂xᵢ (x) = xᵢ/r(x),   x ≠ 0.  (7.5)

Indeed, the function

    f(ξ) = √(x₁² + ⋯ + ξ² + ⋯ + xₙ²)

is differentiable, where x₁, …, xᵢ₋₁, xᵢ₊₁, …, xₙ are considered to be constant. Using the chain rule one obtains (with ξ = xᵢ)

    ∂r/∂xᵢ (x) = f′(ξ) = 2ξ / ( 2√(x₁² + ⋯ + ξ² + ⋯ + xₙ²) ) = xᵢ/r.

(c) Let f : (0, +∞) → ℝ be differentiable. The composition x ↦ f(r(x)) (with the above radius function r) is denoted by f(r); it is partial differentiable on ℝⁿ \ {0}. The chain rule gives

    ∂/∂xᵢ f(r) = f′(r) ∂r/∂xᵢ = f′(r) xᵢ/r.

(d) Partial differentiability does not imply continuity. Define

    f(x, y) = xy/(x² + y²)² = xy/r⁴  for (x, y) ≠ (0, 0),   f(0, 0) = 0.

Obviously, f is partial differentiable on ℝ² \ {0}. At the origin, by definition of the partial derivative,

    ∂f/∂x (0, 0) = lim_{h→0} f(h, 0)/h = lim_{h→0} 0 = 0.

Since f is symmetric in x and y, ∂f/∂y (0, 0) = 0, too. However, f is not continuous at 0 since f(ε, ε) = 1/(4ε²) becomes large as ε tends to 0.
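Since each partial derivative is an ordinary derivative in one variable, it can be approximated by a one-variable difference quotient. The sketch below (helper names mine, not from the text) checks Example 7.1 (a) numerically.

```python
import math

def partial_diff(f, i, a, h=1e-6):
    # central difference quotient approximating D_i f(a)
    ap, am = list(a), list(a)
    ap[i] += h
    am[i] -= h
    return (f(*ap) - f(*am)) / (2 * h)

f = lambda x, y: math.sin(x * y * y)
x, y = 0.7, 1.3
assert abs(partial_diff(f, 0, (x, y)) - y * y * math.cos(x * y * y)) < 1e-7
assert abs(partial_diff(f, 1, (x, y)) - 2 * x * y * math.cos(x * y * y)) < 1e-7
```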
Remark 7.1 In the next section we will become acquainted with a stronger notion of differentiability which implies continuity. In particular, a continuously partial differentiable function is continuous.
Definition 7.3 Let U ⊆ ℝⁿ be open and f : U → ℝ partial differentiable. The vector

    grad f(x) = ( ∂f/∂x₁ (x), …, ∂f/∂xₙ (x) )  (7.6)

is called the gradient of f at x ∈ U.


Example 7.2 (a) For the radius function r(x) defined in Example 7.1 (b) we have

    grad r(x) = x/r.

Note that x/r is a unit vector (of euclidean norm 1) in the direction x. With the notations of Example 7.1 (c),

    grad f(r) = f′(r) x/r.

(b) Let f, g : U → ℝ be partial differentiable functions. Then we have the following product rule:

    grad (fg) = g grad f + f grad g.  (7.7)

This is immediate from the product rule for functions of one variable,

    ∂/∂xᵢ (fg) = (∂f/∂xᵢ) g + f (∂g/∂xᵢ).

(c) f(x, y) = xʸ. Then grad f(x, y) = (y x^{y−1}, xʸ log x).


Notation. Instead of grad f one also writes ∇f ("nabla f"). ∇ is a vector-valued differential operator:

    ∇ = ( ∂/∂x₁, …, ∂/∂xₙ ).
Definition 7.4 Let U ⊆ ℝⁿ. A vector field on U is a mapping

    v = (v₁, …, vₙ) : U → ℝⁿ.  (7.8)

To every point x ∈ U there is associated a vector v(x) ∈ ℝⁿ.
If the vector field v is partial differentiable (i.e. all components vᵢ are partial differentiable), then

    div v = Σ_{i=1}^{n} ∂vᵢ/∂xᵢ  (7.9)

is called the divergence of the vector field v.
Formally the divergence of v can be written as an inner product of ∇ and v:

    div v = ∇·v = Σ_{i=1}^{n} ∂/∂xᵢ vᵢ.

The product rule gives the following rule for the divergence. Let f : U → ℝ be a partial differentiable function and

    v = (v₁, …, vₙ) : U → ℝⁿ

a partial differentiable vector field; then

    ∂/∂xᵢ (f vᵢ) = (∂f/∂xᵢ) vᵢ + f (∂vᵢ/∂xᵢ).

Summation over i gives

    div (f v) = grad f · v + f div v.

Using the nabla operator this can be rewritten as

    ∇·(f v) = ∇f · v + f ∇·v.

Example 7.3 Let F : ℝⁿ \ {0} → ℝⁿ be the vector field F(x) = x/r, r = ‖x‖. Since

    div x = Σ_{i=1}^{n} ∂xᵢ/∂xᵢ = n   and   x·x = r²,

Example 7.2 gives, with v = x and f(r) = 1/r,

    div (x/r) = grad (1/r) · x + (1/r) div x = −(1/r³) x·x + n/r = −1/r + n/r = (n − 1)/r.  (7.10)
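The identity div(x/r) = (n − 1)/r can be checked numerically by summing difference quotients, one coordinate at a time. (Sketch of mine; the helper names are not from the text.)

```python
import math

def divergence(v, x, h=1e-6):
    # sum of central difference quotients dv_i/dx_i at x
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (v(xp)[i] - v(xm)[i]) / (2 * h)
    return total

F = lambda x: [t / math.sqrt(sum(s * s for s in x)) for t in x]  # F(x) = x/r
p = [1.0, 2.0, 2.0]                                   # n = 3, r = 3
assert abs(divergence(F, p) - (3 - 1) / 3.0) < 1e-7   # div(x/r) = (n-1)/r
```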


7.1.1 Higher Partial Derivatives
Let U ⊆ ℝⁿ be open and f : U → ℝ a partial differentiable function. If all partial derivatives Dᵢf : U → ℝ are again partial differentiable, f is called twice partial differentiable. We can form the partial derivatives DⱼDᵢf of the second order.
More generally, f : U → ℝ is said to be (k + 1)-times partial differentiable if it is k-times partial differentiable and all partial derivatives of order k,

    D_{iₖ}D_{iₖ₋₁}⋯D_{i₁}f : U → ℝ,

are partial differentiable.
A function f : U → ℝ is said to be k-times continuously partial differentiable if it is k-times partial differentiable and all partial derivatives of order less than or equal to k are continuous. The set of all such functions on U is denoted by Cᵏ(U).
We also use the notation

    DⱼDᵢf = ∂²f/∂xⱼ∂xᵢ = f_{xᵢxⱼ},   DᵢDᵢf = ∂²f/∂xᵢ²,   D_{iₖ}⋯D_{i₁}f = ∂ᵏf/∂x_{iₖ}⋯∂x_{i₁}.

Example. Let f(x, y) = sin(xy²). One easily sees that

    f_{yx} = f_{xy} = 2y cos(xy²) − 2xy³ sin(xy²).
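The equality of the mixed second partial derivatives in this example can be checked with nested difference quotients. (Sketch of mine; the helper d2 is not from the text.)

```python
import math

def d2(f, first, a, h=1e-4):
    # mixed second difference quotient; first = 0 differentiates in x first
    x, y = a
    if first == 0:
        g = lambda t: (f(x + h, t) - f(x - h, t)) / (2 * h)   # ~ f_x(., t)
        return (g(y + h) - g(y - h)) / (2 * h)                # ~ f_xy
    g = lambda t: (f(t, y + h) - f(t, y - h)) / (2 * h)       # ~ f_y(t, .)
    return (g(x + h) - g(x - h)) / (2 * h)                    # ~ f_yx

f = lambda x, y: math.sin(x * y * y)
a = (0.7, 1.3)
exact = 2 * a[1] * math.cos(a[0] * a[1] ** 2) - 2 * a[0] * a[1] ** 3 * math.sin(a[0] * a[1] ** 2)
assert abs(d2(f, 0, a) - exact) < 1e-5
assert abs(d2(f, 1, a) - exact) < 1e-5
assert abs(d2(f, 0, a) - d2(f, 1, a)) < 1e-7   # f_xy = f_yx
```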
Proposition 7.2 (Schwarz's Lemma) Let U ⊆ ℝⁿ be open and f : U → ℝ be twice continuously partial differentiable. Then for every a ∈ U and all i, j = 1, …, n we have

    DⱼDᵢf(a) = DᵢDⱼf(a).  (7.11)

Proof. Without loss of generality we assume n = 2, i = 1, j = 2, and a = 0; we write (x, y) in place of (x₁, x₂). Since U is open, there is a small square of side length 2δ > 0 completely contained in U:

    {(x, y) ∈ ℝ² | |x| < δ, |y| < δ} ⊆ U.

For fixed y ∈ U_δ(0) define the function F : (−δ, δ) → ℝ via

    F(x) = f(x, y) − f(x, 0).

By the mean value theorem (Theorem 4.9) there is a ξ with |ξ| ≤ |x| such that

    F(x) − F(0) = x F′(ξ).

But F′(ξ) = f_x(ξ, y) − f_x(ξ, 0). Applying the mean value theorem to the function h(y) = f_x(ξ, y), there is an η with |η| ≤ |y| and

    f_x(ξ, y) − f_x(ξ, 0) = h′(η) y = ∂/∂y f_x(ξ, η) y = f_{xy}(ξ, η) y.

Altogether we have

    F(x) − F(0) = f(x, y) − f(x, 0) − f(0, y) + f(0, 0) = f_{xy}(ξ, η) xy.  (7.12)

The same arguments, but starting with the function G(y) = f(x, y) − f(0, y), show the existence of ξ′ and η′ with |ξ′| ≤ |x|, |η′| ≤ |y| and

    f(x, y) − f(x, 0) − f(0, y) + f(0, 0) = f_{yx}(ξ′, η′) xy.  (7.13)

From (7.12) and (7.13) for xy ≠ 0 it follows that

    f_{xy}(ξ, η) = f_{yx}(ξ′, η′).

If (x, y) approaches (0, 0), so do (ξ, η) and (ξ′, η′). Since f_{xy} and f_{yx} are both continuous, it follows from the above equation that

    D₂D₁f(0, 0) = D₁D₂f(0, 0).

Corollary 7.3 Let U ⊆ ℝⁿ be open and f : U → ℝ be k-times continuously partial differentiable. Then

    D_{iₖ}⋯D_{i₁}f = D_{i_{σ(k)}}⋯D_{i_{σ(1)}}f

for every permutation σ of 1, …, k.
Proof. The proof is by induction on k using the fact that any permutation can be written as a product of transpositions (j, j + 1).
Example 7.4 Let U ⊆ ℝ³ be open and let v : U → ℝ³ be a partial differentiable vector field. One defines a new vector field curl v : U → ℝ³, the curl of v, by

    curl v = ( ∂v₃/∂x₂ − ∂v₂/∂x₃, ∂v₁/∂x₃ − ∂v₃/∂x₁, ∂v₂/∂x₁ − ∂v₁/∂x₂ ).  (7.14)

Formally one can think of curl v as being the vector product of ∇ and v,

    curl v = ∇ × v = det ( e₁  e₂  e₃ ; ∂/∂x₁  ∂/∂x₂  ∂/∂x₃ ; v₁  v₂  v₃ ),

where e₁, e₂, and e₃ are the unit vectors in ℝ³. If f : U → ℝ has continuous second partial derivatives then, by Proposition 7.2,

    curl grad f = 0.  (7.15)

Indeed, the first coordinate of curl grad f is by definition

    ∂²f/∂x₂∂x₃ − ∂²f/∂x₃∂x₂ = 0.

The other two components are obtained by cyclic permutation of the indices.
We have found: curl v = 0 is a necessary condition for a continuously partial differentiable vector field v : U → ℝ³ to be the gradient of a function f : U → ℝ.
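The identity curl grad f = 0 can be observed numerically: build a gradient field from difference quotients and check that its numerical curl is close to zero. (Sketch of mine; the test function f is an arbitrary smooth example, not from the text.)

```python
import math

def grad(f, x, h=1e-5):
    # difference-quotient gradient of f : R^3 -> R
    g = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def curl(v, x, h=1e-4):
    # numerical curl of a vector field v : R^3 -> R^3 at x, as in (7.14)
    def d(i, j):  # difference quotient for dv_i/dx_j
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        return (v(xp)[i] - v(xm)[i]) / (2 * h)
    return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]

f = lambda x: math.exp(x[0]) * math.sin(x[1]) + x[0] * x[2] ** 2
v = lambda x: grad(f, x)                                      # a gradient field
assert all(abs(c) < 1e-4 for c in curl(v, [0.3, -0.5, 1.1]))  # curl grad f = 0
```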


7.1.2 The Laplacian
Let U ⊆ ℝⁿ be open and f ∈ C²(U). Put

    Δf = div grad f = ∂²f/∂x₁² + ⋯ + ∂²f/∂xₙ²  (7.16)

and call

    Δ = ∂²/∂x₁² + ⋯ + ∂²/∂xₙ²

the Laplacian or Laplace operator. The equation Δf = 0 is called the Laplace equation; its solutions are the harmonic functions. If f depends on an additional time variable t, f : U × I → ℝ, (x, t) ↦ f(x, t), one considers the so-called wave equation

    f_tt − a²Δf = 0,  (7.17)

and the so-called heat equation

    f_t − kΔf = 0.  (7.18)

Example 7.5 Let f : (0, +∞) → ℝ be twice continuously differentiable. We want to compute the Laplacian Δf(r), r = ‖x‖, x ∈ ℝⁿ \ {0}. By Example 7.2 we have

    grad f(r) = f′(r) x/r,

and by the product rule and Example 7.3 we obtain

    Δf(r) = div grad f(r) = grad f′(r) · x/r + f′(r) div (x/r) = f″(r) (x/r)·(x/r) + f′(r) (n − 1)/r;

thus

    Δf(r) = f″(r) + ((n − 1)/r) f′(r).

In particular, Δ(1/rⁿ⁻²) = 0 if n ≥ 3 and Δ log r = 0 if n = 2. Prove!
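Besides the (recommended) exercise, both harmonicity claims can be sanity-checked with a numerical Laplacian built from second difference quotients. (Sketch of mine; the helper name is not from the text.)

```python
import math

def laplacian(f, x, h=1e-4):
    # sum of second central differences (f(x + h e_i) - 2 f(x) + f(x - h e_i)) / h^2
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (f(xp) - 2 * f(x) + f(xm)) / (h * h)
    return total

r = lambda x: math.sqrt(sum(t * t for t in x))
# n = 3: 1/r^(n-2) = 1/r is harmonic away from 0;  n = 2: log r is harmonic
assert abs(laplacian(lambda x: 1 / r(x), [1.0, 2.0, 2.0])) < 1e-5
assert abs(laplacian(lambda x: math.log(r(x)), [0.6, 0.8])) < 1e-5
```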

7.2 Total Differentiation


In this section we define (total) differentiability of a function f from ℝⁿ to ℝᵐ. Roughly speaking, f is differentiable (at some point) if it can be approximated by a linear mapping. In contrast to partial differentiability we need not refer to single coordinates. Differentiable functions are continuous. In this section U always denotes an open subset of ℝⁿ. The vector space of linear mappings of a vector space V into a vector space W will be denoted by L(V, W).
Motivation: If f : ℝ → ℝ is differentiable at x ∈ ℝ and f′(x) = a, then

    lim_{h→0} ( f(x + h) − f(x) − ah ) / h = 0.

Note that the mapping h ↦ ah is linear from ℝ → ℝ, and any linear mapping ℝ → ℝ is of that form.

7 Calculus of Functions of Several Variables

200

Definition 7.5 The mapping f : U → ℝᵐ is said to be differentiable at a point x ∈ U if there exists a linear map A : ℝⁿ → ℝᵐ such that

    lim_{h→0} ‖f(x + h) − f(x) − A(h)‖ / ‖h‖ = 0.  (7.19)

The linear map A ∈ L(ℝⁿ, ℝᵐ) is called the derivative of f at x and will be denoted by Df(x). In case n = m = 1 this notion coincides with the ordinary differentiability of a function.
In case n = m = 1 this notion coincides with the ordinary differentiability of a function.
Remark 7.2 We reformulate the definition of differentiability of f at a ∈ U: Define a function φ_a : U_δ(0) ⊆ ℝⁿ → ℝᵐ (depending on both a and h) by

    f(a + h) = f(a) + A(h) + φ_a(h).  (7.20)

Then f is differentiable at a if and only if lim_{h→0} ‖φ_a(h)‖/‖h‖ = 0. Replacing the r.h.s. of (7.20) by f(a) + A(h) (forgetting about φ_a) and inserting x in place of a + h and Df(a) in place of A, we obtain the linearization L : ℝⁿ → ℝᵐ of f at a:

    L(x) = f(a) + Df(a)(x − a).  (7.21)
Lemma 7.4 If f is differentiable at x ∈ U, the linear mapping A is uniquely determined.
Proof. Throughout we refer to the euclidean norms on ℝⁿ and ℝᵐ. Suppose that A′ ∈ L(ℝⁿ, ℝᵐ) is another linear mapping satisfying (7.19). Then for h ∈ ℝⁿ, h ≠ 0,

    ‖A(h) − A′(h)‖ = ‖f(x + h) − f(x) − A′(h) − (f(x + h) − f(x) − A(h))‖
        ≤ ‖f(x + h) − f(x) − A(h)‖ + ‖f(x + h) − f(x) − A′(h)‖;

hence

    ‖A(h) − A′(h)‖ / ‖h‖ ≤ ‖f(x + h) − f(x) − A(h)‖ / ‖h‖ + ‖f(x + h) − f(x) − A′(h)‖ / ‖h‖.

Since the limit h → 0 on the right exists and equals 0, the l.h.s. also tends to 0 as h → 0, that is,

    lim_{h→0} ‖A(h) − A′(h)‖ / ‖h‖ = 0.

Now fix h₀ ∈ ℝⁿ, h₀ ≠ 0, and put h = th₀, t ∈ ℝ, t → 0. Then h → 0 and hence,

    0 = lim_{t→0} ‖A(th₀) − A′(th₀)‖ / ‖th₀‖ = lim_{t→0} |t| ‖A(h₀) − A′(h₀)‖ / ( |t| ‖h₀‖ ) = ‖A(h₀) − A′(h₀)‖ / ‖h₀‖.

Hence, ‖A(h₀) − A′(h₀)‖ = 0, which implies A(h₀) = A′(h₀), so that A = A′.

Definition 7.6 The matrix (aᵢⱼ) ∈ ℝ^{m×n} of the linear map Df(x) with respect to the standard bases in ℝⁿ and ℝᵐ is called the Jacobi matrix of f at x. It is denoted by f′(x); that is,

    aᵢⱼ = (f′(x))ᵢⱼ = eᵢ · Df(x)(eⱼ),

where eᵢ = (0, …, 0, 1, 0, …, 0) with the 1 in the ith place.


Example Let f : ℝⁿ → ℝᵐ be linear, f(x) = B(x) with B ∈ L(ℝⁿ, ℝᵐ). Then Df(x) = B is the constant linear mapping. Indeed,

    f(x + h) − f(x) − B(h) = B(x + h) − B(x) − B(h) = 0

since B is linear. Hence, lim_{h→0} ‖f(x + h) − f(x) − B(h)‖ / ‖h‖ = 0, which proves the claim.

Remark 7.3 (a) Using a column vector h = (h₁, …, hₙ)ᵀ, the map Df(x)(h) is given by matrix multiplication:

    Df(x)(h) = f′(x)·h = (aᵢⱼ)·h = ( Σ_{j=1}^{n} a₁ⱼhⱼ, …, Σ_{j=1}^{n} aₘⱼhⱼ )ᵀ.

Once we have chosen the standard basis in ℝᵐ, we can write f(x) = (f₁(x), …, fₘ(x)) as a vector of m scalar functions fᵢ : ℝⁿ → ℝ. By Proposition 6.19 the limit of the vector function

    lim_{h→0} (1/‖h‖) ( f(x + h) − f(x) − Df(x)(h) )

exists and is equal to 0 if and only if the limit exists for every coordinate i = 1, …, m and is 0:

    lim_{h→0} (1/‖h‖) ( fᵢ(x + h) − fᵢ(x) − Σ_{j=1}^{n} aᵢⱼhⱼ ) = 0,   i = 1, …, m.  (7.22)

We see: f is differentiable at x if and only if all fᵢ, i = 1, …, m, are. In this case the Jacobi matrix f′(x) is just the collection of the row vectors fᵢ′(x), i = 1, …, m:

    f′(x) = ( f₁′(x); …; fₘ′(x) ),   where fᵢ′(x) = (aᵢ₁, aᵢ₂, …, aᵢₙ).

(b) In case m = 1 (f = f₁), f′(a) ∈ ℝ^{1×n} is a linear functional (a row vector). Proposition 7.6 below will show that f′(a) = grad f(a). The linearization L(x) of f at a is given by the linear functional Df(a) from ℝⁿ → ℝ. The graph of L is an n-dimensional hyper plane in ℝⁿ⁺¹ touching the graph of f at the point (a, f(a)). In coordinates, the linearization (hyper plane equation) is

    x_{n+1} = L(x) = f(a) + f′(a)(x − a).

Here f′(a) is the row vector corresponding to the linear functional Df(a) : ℝⁿ → ℝ w.r.t. the standard basis.
Example 7.6 Let C = (cᵢⱼ) ∈ ℝ^{n×n} be a symmetric n×n matrix, that is, cᵢⱼ = cⱼᵢ for all i, j = 1, …, n, and define f : ℝⁿ → ℝ by

    f(x) = x·C(x) = Σ_{i,j=1}^{n} cᵢⱼxᵢxⱼ,   x = (x₁, …, xₙ) ∈ ℝⁿ.

If a, h ∈ ℝⁿ we have

    f(a + h) = (a + h)·C(a + h) = a·C(a) + h·C(a) + a·C(h) + h·C(h)
        = a·C(a) + 2C(a)·h + h·C(h)
        = f(a) + v·h + φ(h),

where v = 2C(a) and φ(h) = h·C(h). Since, by the Cauchy–Schwarz inequality,

    |φ(h)| = |h·C(h)| ≤ ‖h‖ ‖C(h)‖ ≤ ‖h‖ ‖C‖ ‖h‖ = ‖C‖ ‖h‖²,

we have lim_{h→0} φ(h)/‖h‖ = 0. This proves f to be differentiable at a ∈ ℝⁿ with derivative Df(a)(x) = 2C(a)·x. The Jacobi matrix is a row vector, f′(a) = 2(C(a))ᵀ.
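The linear approximation of this quadratic form can be checked numerically: the remainder f(a + h) − f(a) − v·h equals h·C(h), of order ‖h‖². (Sketch of mine; the matrix and names are an arbitrary illustration.)

```python
import random

def quad_form(C, x):
    # f(x) = x . C(x) = sum_ij c_ij x_i x_j
    n = len(x)
    return sum(C[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

C = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 1.0]]                  # symmetric
a = [0.5, -1.0, 2.0]
v = [2 * sum(C[i][j] * a[j] for j in range(3)) for i in range(3)]   # v = 2 C(a)

random.seed(1)
h = [random.uniform(-1, 1) * 1e-5 for _ in range(3)]
# remainder f(a+h) - f(a) - v.h equals h.C(h) = O(||h||^2)
rem = quad_form(C, [ai + hi for ai, hi in zip(a, h)]) - quad_form(C, a) \
      - sum(vi * hi for vi, hi in zip(v, h))
assert abs(rem) < 1e-8
```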

7.2.1 Basic Theorems


Lemma 7.5 If f : U → ℝᵐ is differentiable at x, then f is continuous at x.
Proof. Define φ_x(h) as in Remark 7.2 with Df(x) = A; then

    lim_{h→0} ‖φ_x(h)‖ = 0,

since f is differentiable at x. Since A is continuous by Proposition 7.1, lim_{h→0} A(h) = A(0) = 0. This gives

    lim_{h→0} f(x + h) = f(x) + lim_{h→0} A(h) + lim_{h→0} φ_x(h) = f(x).

This shows continuity of f at x.

Proposition 7.6 Let f : U → ℝᵐ, f(x) = (f₁(x), …, fₘ(x)), be differentiable at x ∈ U. Then all partial derivatives ∂fᵢ/∂xⱼ (x), i = 1, …, m, j = 1, …, n, exist, and the Jacobi matrix f′(x) ∈ ℝ^{m×n} has the form

    (aᵢⱼ) = f′(x) = ( ∂f₁/∂x₁ ⋯ ∂f₁/∂xₙ ; ⋮ ; ∂fₘ/∂x₁ ⋯ ∂fₘ/∂xₙ )(x) = ( ∂fᵢ/∂xⱼ (x) ), i = 1, …, m, j = 1, …, n.  (7.23)

Notation. (a) For the Jacobi matrix we also use the notation

    f′(x) = ∂(f₁, …, fₘ)/∂(x₁, …, xₙ) (x).

(b) In case n = m the determinant det(f′(x)) of the Jacobi matrix is called the Jacobian or functional determinant of f at x. It is denoted by

    det(f′(x)) = ∂(f₁, …, fₙ)/∂(x₁, …, xₙ) (x).


Proof. Inserting h = t e_j = (0, . . . , t, . . . , 0) into (7.22) (see Remark 7.3) we have, since
‖h‖ = |t| and h_k = t δ_{kj} for all k = 1, . . . , n,

    0 = lim_{t→0} | f_i(x + t e_j) − f_i(x) − ∑_{k=1}^n a_{ik} h_k | / ‖t e_j‖
      = lim_{t→0} | f_i(x_1, . . . , x_j + t, . . . , x_n) − f_i(x) − t a_{ij} | / |t|
      = lim_{t→0} | ( f_i(x_1, . . . , x_j + t, . . . , x_n) − f_i(x) )/t − a_{ij} |
      = | ∂f_i(x)/∂x_j − a_{ij} |.

Hence a_{ij} = ∂f_i(x)/∂x_j.

Hyper Planes
A plane in R^3 is the set H = {(x_1, x_2, x_3) ∈ R^3 | a_1 x_1 + a_2 x_2 + a_3 x_3 = a_4} where a_i ∈ R,
i = 1, . . . , 4. The vector a = (a_1, a_2, a_3) is the normal vector to H; a is orthogonal to any
vector x − x′, x, x′ ∈ H. Indeed, a·(x − x′) = a·x − a·x′ = a_4 − a_4 = 0.
The plane H is 2-dimensional since H can be written with two parameters λ_1, λ_2 ∈ R as
(x_1^0, x_2^0, x_3^0) + λ_1 v_1 + λ_2 v_2, where (x_1^0, x_2^0, x_3^0) is some point in H and v_1, v_2 ∈ R^3 are linearly independent
vectors spanning H.
This concept can be generalized to R^n. A hyperplane in R^n is the set of points

    H = {(x_1, . . . , x_n) ∈ R^n | a_1 x_1 + a_2 x_2 + ⋯ + a_n x_n = a_{n+1}},

where a_1, . . . , a_{n+1} ∈ R. The vector a = (a_1, . . . , a_n) ∈ R^n is called the normal vector to the hyperplane H. Note that a is unique only up to scalar multiples. A hyperplane in R^n is of dimension
n − 1 since there are n − 1 linearly independent vectors v_1, . . . , v_{n−1} ∈ R^n and a point h ∈ H
such that

    H = {h + λ_1 v_1 + ⋯ + λ_{n−1} v_{n−1} | λ_1, . . . , λ_{n−1} ∈ R}.
Example 7.7 (a) Special case m = 1; let f : U → R be differentiable. Then

    f′(x) = ( ∂f/∂x_1 (x), . . . , ∂f/∂x_n (x) ) = grad f(x).

It is a row vector and gives a linear functional on R^n which linearly associates to each vector
y = (y_1, . . . , y_n) ∈ R^n a real number

    Df(x)(y) = grad f(x)·y = ∑_{j=1}^n f_{x_j}(x) y_j.


In particular, by Remark 7.3 (b), the equation of the linearization of f at a (the touching hyperplane) is

    x_{n+1} = L(x) = f(a) + grad f(a)·(x − a)
    x_{n+1} = f(a) + (f_{x_1}(a), . . . , f_{x_n}(a))·(x_1 − a_1, . . . , x_n − a_n)
    x_{n+1} = f(a) + ∑_{j=1}^n f_{x_j}(a)(x_j − a_j)
          0 = ∑_{j=1}^n f_{x_j}(a)(x_j − a_j) + (−1)·(x_{n+1} − f(a))
          0 = ñ·(x̃ − ã),

where x̃ = (x_1, . . . , x_n, x_{n+1}), ã = (a_1, . . . , a_n, f(a)), and ñ = (grad f(a), −1) ∈ R^{n+1} is the
normal vector to the touching hyperplane at ã.
(b) Special case n = 1; let f : (a, b) → R^m, f = (f_1, . . . , f_m), be differentiable. Then f is a
curve in R^m with initial point f(a) and end point f(b). f′(t) = (f_1′(t), . . . , f_m′(t)) ∈ R^{m×1} is
the Jacobi matrix of f at t (a column vector). It is the tangent vector to the curve f at t ∈ (a, b).
(c) Let f : R^3 → R^2 be given by

    f(x, y, z) = (f_1, f_2) = ( x³ − 3xy² + z, sin(xyz²) ).

Then

    f′(x, y, z) = ∂(f_1, f_2)/∂(x, y, z) =
        ( 3x² − 3y²        −6xy             1             )
        ( yz² cos(xyz²)    xz² cos(xyz²)    2xyz cos(xyz²) ).

The linearization of f at (a, b, c) is

    L(x, y, z) = f(a, b, c) + f′(a, b, c)·(x − a, y − b, z − c).
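The Jacobi matrix in (c) can be checked numerically (added here; not part of the text, and the evaluation point is an arbitrary choice) by comparing each column against a central difference quotient:

```python
import math

# Check (added; not in the original text) of Example 7.7 (c): compare the
# Jacobi matrix of f(x,y,z) = (x**3 - 3*x*y**2 + z, sin(x*y*z**2)) with
# central difference quotients at an arbitrary point.
def F(x, y, z):
    return [x**3 - 3*x*y**2 + z, math.sin(x * y * z**2)]

def jacobian(x, y, z):
    c = math.cos(x * y * z**2)
    return [[3*x**2 - 3*y**2, -6*x*y, 1.0],
            [y * z**2 * c, x * z**2 * c, 2*x*y*z * c]]

p = (0.3, -0.7, 1.2)
eps = 1e-6
J = jacobian(*p)
for j in range(3):            # column j = derivative w.r.t. the j-th variable
    pp, pm = list(p), list(p)
    pp[j] += eps
    pm[j] -= eps
    col = [(u - v) / (2 * eps) for u, v in zip(F(*pp), F(*pm))]
    for i in range(2):
        assert abs(col[i] - J[i][j]) < 1e-5
```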
Remark 7.4 Note that the existence of all partial derivatives does not imply the existence of
f′(a). Recall Example 7.1 (d): there we gave a function having partial derivatives at the
origin which is not continuous at (0, 0), and hence not differentiable at (0, 0). However, the
next proposition shows that the converse is true provided all partial derivatives are continuous.
Proposition 7.7 Let f : U → R^m be continuously partially differentiable at a point a ∈ U.
Then f is differentiable at a and f′ : U → L(R^n, R^m) is continuous at a.
The proof in the case n = 2, m = 1 is in the appendix to this chapter.
Theorem 7.8 (Chain Rule) If f : R^n → R^m is differentiable at a point a and g : R^m → R^p is
differentiable at b = f(a), then the composition k = g∘f : R^n → R^p is differentiable at a and

    Dk(a) = Dg(b) ∘ Df(a).    (7.24)


Using Jacobi matrices, this can be written as

    k′(a) = g′(b) · f′(a).    (7.25)

Proof. Let A = Df(a), B = Dg(b), and y = f(x). Defining functions φ, ψ, and ρ by

    φ(x) = f(x) − f(a) − A(x − a),            (7.26)
    ψ(y) = g(y) − g(b) − B(y − b),            (7.27)
    ρ(x) = g∘f(x) − g∘f(a) − BA(x − a),       (7.28)

we have

    lim_{x→a} ‖φ(x)‖/‖x − a‖ = 0,   lim_{y→b} ‖ψ(y)‖/‖y − b‖ = 0,    (7.29)

and we have to show that

    lim_{x→a} ‖ρ(x)‖/‖x − a‖ = 0.

Inserting (7.26) and (7.27) we find

    ρ(x) = g(f(x)) − g(f(a)) − BA(x − a) = g(f(x)) − g(f(a)) − B(f(x) − f(a) − φ(x))
         = [ g(f(x)) − g(f(a)) − B(f(x) − f(a)) ] + B φ(x)
         = ψ(f(x)) + B(φ(x)).

Using ‖T(x)‖ ≤ ‖T‖ ‖x‖ (see Proposition 7.1) this shows

    ‖ρ(x)‖/‖x − a‖ ≤ ‖ψ(f(x))‖/‖x − a‖ + ‖B φ(x)‖/‖x − a‖
                   ≤ (‖ψ(y)‖/‖y − b‖)·(‖f(x) − f(a)‖/‖x − a‖) + ‖B‖·‖φ(x)‖/‖x − a‖.

Inserting (7.26) again into the above equation we continue

    ≤ (‖ψ(y)‖/‖y − b‖)·(‖φ(x) + A(x − a)‖/‖x − a‖) + ‖B‖·‖φ(x)‖/‖x − a‖
    ≤ (‖ψ(y)‖/‖y − b‖)·( ‖φ(x)‖/‖x − a‖ + ‖A‖ ) + ‖B‖·‖φ(x)‖/‖x − a‖.

All terms on the right side tend to 0 as x approaches a. This completes the proof.

Remarks 7.5 (a) The chain rule in coordinates. If A = f′(a), B = g′(f(a)), and C = k′(a),
then A ∈ R^{m×n}, B ∈ R^{p×m}, and C ∈ R^{p×n}, and

    ∂(k_1, . . . , k_p)/∂(x_1, . . . , x_n) = ∂(g_1, . . . , g_p)/∂(y_1, . . . , y_m) · ∂(f_1, . . . , f_m)/∂(x_1, . . . , x_n),    (7.30)

    ∂k_r/∂x_j (a) = ∑_{i=1}^m ∂g_r/∂y_i (f(a)) · ∂f_i/∂x_j (a),   r = 1, . . . , p,  j = 1, . . . , n.    (7.31)

(b) In particular, in case p = 1, k(x) = g(f(x)), we have

    ∂k/∂x_j = ∂g/∂y_1 · ∂f_1/∂x_j + ⋯ + ∂g/∂y_m · ∂f_m/∂x_j.


Example 7.8 (a) Let f(u, v) = uv, u = g(x, y) = x² + y², v = h(x, y) = xy, and z =
f(g(x, y), h(x, y)) = (x² + y²)xy = x³y + xy³. Then

    ∂z/∂x = ∂f/∂u · ∂g/∂x + ∂f/∂v · ∂h/∂x = v·2x + u·y = 2x²y + y(x² + y²)
    ∂z/∂x = 3x²y + y³.

(b) Let f(u, v) = u^v, u(t) = v(t) = t. Then F(t) = f(u(t), v(t)) = t^t and

    F′(t) = ∂f/∂u · u′(t) + ∂f/∂v · v′(t) = v u^{v−1}·1 + u^v log u · 1
          = t·t^{t−1} + t^t log t = t^t (log t + 1).
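The derivative in (b) is easy to verify numerically (a check added here, not part of the text; the sample values of t are arbitrary):

```python
import math

# Check (added; not in the original text) of Example 7.8 (b): F(t) = t**t has
# F'(t) = t**t * (log t + 1); compare with a central difference quotient.
def F(t):
    return t ** t

def F_prime(t):
    return t ** t * (math.log(t) + 1)

eps = 1e-6
for t in (0.5, 1.0, 2.0, 3.7):
    numeric = (F(t + eps) - F(t - eps)) / (2 * eps)
    assert abs(numeric - F_prime(t)) < 1e-4
```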

7.3 Taylor's Formula

The gradient of f gives an approximation of a scalar function f by a linear functional. Taylor's
formula generalizes this concept of approximation to higher order. We consider the quadratic approximation of f to determine
local extrema. Throughout this section we refer to the Euclidean
norm ‖x‖ = ‖x‖₂ = √(x_1² + ⋯ + x_n²).

7.3.1 Directional Derivatives


Definition 7.7 Let f : U R be a function, a U, and e Rn a unit vector, kek = 1. The
directional derivative of f at a in the direction of the unit vector e is the limit
(De f )(a) = lim
t0

Note that for e = ej we have De f = Dj f =

f (a + te) f (a)
.
t

(7.32)

f
.
xj

Proposition 7.9 Let f : U → R be continuously differentiable. Then for every a ∈ U and every
unit vector e ∈ R^n, ‖e‖ = 1, we have

    D_e f(a) = e·grad f(a).    (7.33)

Proof. Define g : R → R^n by g(t) = a + te = (a_1 + te_1, . . . , a_n + te_n). For sufficiently small
t ∈ R, say |t| < ε, the composition k = f∘g,

    k : R → R^n → R,   k(t) = f(g(t)) = f(a_1 + te_1, . . . , a_n + te_n),

is defined. We compute k′(t) using the chain rule:

    k′(t) = ∑_{j=1}^n ∂f/∂x_j (a + te) · g_j′(t).


Since g_j′(t) = (a_j + te_j)′ = e_j and g(0) = a, it follows that

    k′(t) = ∑_{j=1}^n ∂f/∂x_j (a + te) e_j,    (7.34)

    k′(0) = ∑_{j=1}^n f_{x_j}(a) e_j = grad f(a)·e.

On the other hand, by the definition of the directional derivative,

    k′(0) = lim_{t→0} ( k(t) − k(0) )/t = lim_{t→0} ( f(a + te) − f(a) )/t = D_e f(a).

This completes the proof.

Remark 7.6 (Geometric meaning of grad f) Suppose that grad f(a) ≠ 0 and let e be a
normed vector, ‖e‖ = 1. Varying e, D_e f(a) = e·grad f(a) becomes maximal if and only if
e and grad f(a) have the same direction. Hence the vector grad f(a) points in the direction of
maximal slope of f at a. Similarly, −grad f(a) is the direction of maximal decline.
For example, f(x, y) = √(1 − x² − y²) has

    grad f(x, y) = ( −x/√(1 − x² − y²), −y/√(1 − x² − y²) ).

The maximal slope of f at (x, y) is in direction e = −(x, y)/√(x² + y²). In this case the tangent
line to the graph points to the z-axis and has maximal slope.
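Proposition 7.9 can be checked numerically for the function of Remark 7.6 (a check added here, not part of the text; the point a and direction e are arbitrary choices):

```python
import math

# Check (added; not in the original text) of Proposition 7.9, D_e f(a) = e·grad f(a),
# for f(x, y) = sqrt(1 - x**2 - y**2) from Remark 7.6; the point a and the
# direction e are arbitrary choices.
def f(x, y):
    return math.sqrt(1 - x * x - y * y)

a = (0.3, 0.4)
grad = (-a[0] / f(*a), -a[1] / f(*a))
e = (math.cos(1.0), math.sin(1.0))      # a unit vector
t = 1e-6
numeric = (f(a[0] + t * e[0], a[1] + t * e[1])
           - f(a[0] - t * e[0], a[1] - t * e[1])) / (2 * t)
assert abs(numeric - (e[0] * grad[0] + e[1] * grad[1])) < 1e-8
```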

Corollary 7.10 Let f : U → R be k-times continuously differentiable, a ∈ U, and x ∈ R^n such
that the whole segment a + tx, t ∈ [0, 1], is contained in U.
Then the function h : [0, 1] → R, h(t) = f(a + tx), is k-times continuously differentiable, with

    h^{(k)}(t) = ∑_{i_1,...,i_k=1}^n D_{i_k} ⋯ D_{i_1} f(a + tx) · x_{i_1} ⋯ x_{i_k}.    (7.35)

In particular,

    h^{(k)}(0) = ∑_{i_1,...,i_k=1}^n D_{i_k} ⋯ D_{i_1} f(a) · x_{i_1} ⋯ x_{i_k}.    (7.36)

Proof. The proof is by induction on k. For k = 1 it is exactly the statement of the Proposition.
We demonstrate the step from k = 1 to k = 2. By (7.34),

    h″(t) = ∑_{i_1=1}^n d/dt ( ∂f(a + tx)/∂x_{i_1} ) x_{i_1}
          = ∑_{i_1=1}^n ∑_{i_2=1}^n ∂²f/∂x_{i_2}∂x_{i_1} (a + tx) · x_{i_2} x_{i_1}.

In the second equality we applied the chain rule to h̃(t) = f_{x_{i_1}}(a + tx).


For brevity we use the following notation for the term on the right of (7.36):

    (x·∇)^k f(a) = ∑_{i_1,...,i_k=1}^n x_{i_1} ⋯ x_{i_k} D_{i_k} ⋯ D_{i_1} f(a).

In particular, (x·∇)f(a) = x_1 f_{x_1}(a) + x_2 f_{x_2}(a) + ⋯ + x_n f_{x_n}(a) and
(x·∇)² f(a) = ∑_{i,j=1}^n x_i x_j ∂²f/∂x_i∂x_j (a).

7.3.2 Taylor's Formula

Theorem 7.11 Let f ∈ C^{k+1}(U), a ∈ U, and x ∈ R^n such that a + tx ∈ U for all t ∈ [0, 1].
Then there exists θ ∈ [0, 1] such that

    f(a + x) = ∑_{m=0}^k (1/m!) (x·∇)^m f(a) + (1/(k+1)!) (x·∇)^{k+1} f(a + θx)    (7.37)

    f(a + x) = f(a) + ∑_{i=1}^n x_i f_{x_i}(a) + (1/2!) ∑_{i,j=1}^n x_i x_j f_{x_i x_j}(a) + ⋯
             + (1/(k+1)!) ∑_{i_1,...,i_{k+1}} x_{i_1} ⋯ x_{i_{k+1}} f_{x_{i_1} ⋯ x_{i_{k+1}}}(a + θx).

The expression R_{k+1}(a, x) = (1/(k+1)!) (x·∇)^{k+1} f(a + θx) is called the Lagrange remainder term.

Proof. Consider the function h : [0, 1] → R, h(t) = f(a + tx). By Corollary 7.10, h is
(k + 1)-times continuously differentiable. By Taylor's theorem for functions of one variable
(Theorem 4.15 with x = 1 and a = 0 therein), there is θ ∈ [0, 1] with

    f(a + x) = h(1) = ∑_{m=0}^k h^{(m)}(0)/m! + h^{(k+1)}(θ)/(k+1)!.

By Corollary 7.10, for m = 1, . . . , k we have

    h^{(m)}(0)/m! = (1/m!) (x·∇)^m f(a)

and

    h^{(k+1)}(θ)/(k+1)! = (1/(k+1)!) (x·∇)^{k+1} f(a + θx);

the assertion follows.


It is often convenient to substitute x := x + a. Then the Taylor expansion reads

    f(x) = ∑_{m=0}^k (1/m!) ((x − a)·∇)^m f(a) + (1/(k+1)!) ((x − a)·∇)^{k+1} f(a + θ(x − a))

    f(x) = f(a) + ∑_{i=1}^n (x_i − a_i) f_{x_i}(a) + (1/2!) ∑_{i,j=1}^n (x_i − a_i)(x_j − a_j) f_{x_i x_j}(a) + ⋯
         + (1/(k+1)!) ∑_{i_1,...,i_{k+1}} (x_{i_1} − a_{i_1}) ⋯ (x_{i_{k+1}} − a_{i_{k+1}}) f_{x_{i_1} ⋯ x_{i_{k+1}}}(a + θ(x − a)).


We write the Taylor formula for the case n = 2, k = 3:

    f(a + x, b + y) = f(a, b) + ( f_x(a, b) x + f_y(a, b) y )
      + (1/2!) ( f_{xx}(a, b) x² + 2 f_{xy}(a, b) xy + f_{yy}(a, b) y² )
      + (1/3!) ( f_{xxx}(a, b) x³ + 3 f_{xxy}(a, b) x²y + 3 f_{xyy}(a, b) xy² + f_{yyy}(a, b) y³ ) + R₄(a, x).

If f ∈ ⋂_{k=0}^∞ C^k(U) = {f : f ∈ C^k(U) for all k ∈ N} and lim_{k→∞} R_k(a, x) = 0 for all x ∈ U, then

    f(x) = ∑_{m=0}^∞ (1/m!) ((x − a)·∇)^m f(a).

The right-hand side is called the Taylor series of f at a.


Example 7.9 (a) We compute the Taylor expansion of f(x, y) = cos x sin y at (0, 0) to the third
order. We have

    f_x = −sin x sin y,     f_y = cos x cos y,      f_{xy} = −sin x cos y,
    f_{xx} = −cos x sin y,  f_{yy} = −cos x sin y,
    f_{xxy} = −cos x cos y, f_{yyy} = −cos x cos y,

so that

    f_x(0, 0) = 0,  f_y(0, 0) = 1,  f_{xx}(0, 0) = 0,  f_{yy}(0, 0) = 0,  f_{xy}(0, 0) = 0,
    f_{xxy}(0, 0) = −1,  f_{yyy}(0, 0) = −1,  f_{xyy}(0, 0) = f_{xxx}(0, 0) = 0.

Inserting this gives

    f(x, y) = y + (1/3!)( −3x²y − y³ ) + R₄(x, y; 0).

The same result can be obtained by multiplying the Taylor series for cos x and sin y:

    ( 1 − x²/2 + x⁴/4! − ⋯ )( y − y³/3! + ⋯ ) = y − x²y/2 − y³/6 + ⋯

(b) The Taylor series of f(x, y) = e^{xy²} at (0, 0) is

    ∑_{n=0}^∞ (xy²)^n / n! = 1 + xy² + (1/2) x²y⁴ + ⋯ ;

it converges on all of R² to f(x, y).
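The cubic expansion in (a) can be checked numerically (a check added here, not part of the text). Since all terms of cos x sin y have odd total degree, the remainder after the cubic polynomial is of fifth order, so halving the step should shrink the error by roughly 2⁵ = 32:

```python
import math

# Numerical check (added; not in the original text) of Example 7.9 (a): the cubic
# Taylor polynomial of f(x, y) = cos x * sin y at (0, 0) is
# P(x, y) = y - x**2*y/2 - y**3/6, with a 5th-order remainder.
def f(x, y):
    return math.cos(x) * math.sin(y)

def P(x, y):
    return y - x**2 * y / 2 - y**3 / 6

h1 = abs(f(0.1, 0.1) - P(0.1, 0.1))
h2 = abs(f(0.05, 0.05) - P(0.05, 0.05))
assert h1 < 2e-6                 # tiny remainder at step 0.1
assert 28 < h1 / h2 < 36         # ratio close to 2**5 = 32
```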

Corollary 7.12 (Mean Value Theorem) Let f : U → R be continuously differentiable, a ∈ U,
and x ∈ R^n such that a + tx ∈ U for all t ∈ [0, 1]. Then there exists θ ∈ [0, 1] such that

    f(a + x) − f(a) = f′(a + θx)·x,   i.e.  f(y) − f(x) = f′((1 − θ)x + θy)·(y − x).    (7.38)

This is the special case of Taylor's formula with k = 0.


Corollary 7.13 Let f : U → R be k-times continuously differentiable, a ∈ U, and x ∈ R^n such
that a + tx ∈ U for all t ∈ [0, 1]. Then there exists ρ : U → R such that

    f(a + x) = ∑_{m=0}^k (1/m!) (x·∇)^m f(a) + ρ(x),    (7.39)

where

    lim_{x→0} ρ(x)/‖x‖^k = 0.

Proof. By Taylor's theorem for f ∈ C^k(U), there exists θ ∈ [0, 1] such that

    f(x + a) = ∑_{m=0}^{k−1} (1/m!) (x·∇)^m f(a) + (1/k!) (x·∇)^k f(a + θx)
             = ∑_{m=0}^k (1/m!) (x·∇)^m f(a) + ρ(x).

This implies

    ρ(x) = (1/k!) ( (x·∇)^k f(a + θx) − (x·∇)^k f(a) ).

Since |x_{i_1} ⋯ x_{i_k}| ≤ ‖x‖ ⋯ ‖x‖ = ‖x‖^k for x ≠ 0,

    |ρ(x)|/‖x‖^k ≤ (1/k!) ∑_{i_1,...,i_k} | D_{i_k} ⋯ D_{i_1} f(a + θx) − D_{i_k} ⋯ D_{i_1} f(a) |.

Since all k-th partial derivatives of f are continuous,

    D_{i_1} D_{i_2} ⋯ D_{i_k} ( f(a + θx) − f(a) ) → 0 as x → 0.

This proves the claim.

Remarks 7.7 (a) With the above notations let

    P_m(x) = (1/m!) ((x − a)·∇)^m f(a).

Then P_m is a polynomial of degree m in the set of variables x = (x_1, . . . , x_n) and we have

    f(x) = ∑_{m=0}^k P_m(x) + ρ(x),   lim_{x→a} ρ(x)/‖x − a‖^k = 0.

Let us consider in more detail the cases m = 0, 1, 2.
Case m = 0.

    P_0(x) = (1/0!) ((x − a)·∇)^0 f(a) = f(a).

P_0 is the constant polynomial with value f(a).
Case m = 1. We have

    P_1(x) = ∑_{j=1}^n f_{x_j}(a)(x_j − a_j) = grad f(a)·(x − a).


Using Corollary 7.13, the first-order approximation of a continuously differentiable function is

    f(x) = f(a) + grad f(a)·(x − a) + ρ(x),   lim_{x→a} ρ(x)/‖x − a‖ = 0.

The linearization of f at a is L(x) = P_0(x) + P_1(x).
Case m = 2.

    P_2(x) = (1/2) ∑_{i,j=1}^n f_{x_i x_j}(a)(x_i − a_i)(x_j − a_j).

Hence P_2(x) is quadratic with the corresponding matrix ( (1/2) f_{x_i x_j}(a) ). As a special case of
Corollary 7.13 (k = 2) we have for f ∈ C²(U)

    f(a + x) = f(a) + grad f(a)·x + (1/2) x·Hess f(a)·x + ρ(x),    (7.40)
    lim_{x→0} ρ(x)/‖x‖² = 0,    (7.41)

where

    (Hess f)(a) = ( f_{x_i x_j}(a) )_{i,j=1}^n    (7.42)

is called the Hessian matrix of f at a ∈ U. The Hessian matrix is symmetric by Schwarz's
lemma.
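A numerical illustration of (7.40)/(7.41) (added here; not part of the text, and the test function f(x, y) = eˣ cos y and base point are my own choices): the error of the quadratic approximation, divided by ‖x‖², should shrink roughly linearly with ‖x‖.

```python
import math

# Check (added; not in the original text) of the second-order expansion (7.40):
#   f(a+x) = f(a) + grad f(a)·x + x·Hess f(a)·x/2 + rho(x),  rho(x)/|x|^2 -> 0,
# for the assumed test function f(x, y) = exp(x)*cos(y).
def f(x, y):
    return math.exp(x) * math.cos(y)

a = (0.2, 0.3)
ex, cy, sy = math.exp(a[0]), math.cos(a[1]), math.sin(a[1])
fa = f(*a)
grad = (ex * cy, -ex * sy)
hess = [[ex * cy, -ex * sy],
        [-ex * sy, -ex * cy]]

def quad(hx, hy):
    lin = grad[0] * hx + grad[1] * hy
    q = (hess[0][0]*hx*hx + 2*hess[0][1]*hx*hy + hess[1][1]*hy*hy) / 2
    return fa + lin + q

# rho(h)/|h|^2 shrinks roughly linearly with |h|:
r1 = abs(f(a[0] + 0.1, a[1] + 0.1) - quad(0.1, 0.1)) / 0.02
r2 = abs(f(a[0] + 0.01, a[1] + 0.01) - quad(0.01, 0.01)) / 0.0002
assert r2 < r1 / 5
```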

7.4 Extrema of Functions of Several Variables

Definition 7.8 Let f : U → R be a function. The point x ∈ U is called a local maximum
(minimum) of f if there exists a neighborhood U(x) ⊆ U of x such that

    f(x) ≥ f(y)   ( f(x) ≤ f(y) )   for all y ∈ U(x).

A local extremum is either a local maximum or a local minimum.

Proposition 7.14 Let f : U → R be partially differentiable. If f has a local extremum at x ∈ U,
then grad f(x) = 0.
Proof. For i = 1, . . . , n consider the function

    g_i(t) = f(x + t e_i).

This is a differentiable function of one variable, defined on a certain interval (−ε, ε). If f has
an extremum at x, then g_i has an extremum at t = 0. By Proposition 4.7,

    g_i′(0) = 0.

Since g_i′(0) = lim_{t→0} ( f(x + t e_i) − f(x) )/t = f_{x_i}(x) and i was arbitrary, it follows that

    grad f(x) = (D_1 f(x), . . . , D_n f(x)) = 0.


Example 7.10 Let f(x, y) = √(1 − x² − y²) be defined on the open unit disc U = {(x, y) ∈
R² | x² + y² < 1}. Then grad f(x, y) = (−x/r, −y/r), with r = √(1 − x² − y²), vanishes if and only
if x = y = 0. If f has an extremum in U then it is at the origin. Obviously, f(x, y) =
√(1 − x² − y²) ≤ 1 = f(0, 0) for all points in U, so that f attains its global (and local)
maximum at (0, 0).
To obtain a sufficient criterion for the existence of local extrema we have to consider the Hessian
matrix. First, we need some facts from linear algebra.
Definition 7.9 Let A ∈ R^{n×n} be a real symmetric n×n matrix, that is a_{ij} = a_{ji} for all
i, j = 1, . . . , n. The associated quadratic form

    Q(x) = ∑_{i,j=1}^n a_{ij} x_i x_j = x·A x

is called

    positive definite       if Q(x) > 0 for all x ≠ 0,
    negative definite       if Q(x) < 0 for all x ≠ 0,
    indefinite              if Q(x) > 0 and Q(y) < 0 for some x, y,
    positive semidefinite   if Q(x) ≥ 0 for all x and Q is not positive definite,
    negative semidefinite   if Q(x) ≤ 0 for all x and Q is not negative definite.

Also, we say that the corresponding matrix A is positive definite if Q(x) is.
Example 7.11 Let n = 2, Q(x) = Q(x_1, x_2). Then Q_1(x) = 3x_1² + 7x_2² is positive definite,
Q_2(x) = −x_1² − 2x_2² is negative definite, Q_3(x) = x_1² − 2x_2² is indefinite, Q_4(x) = x_1² is positive
semidefinite, and Q_5(x) = −x_2² is negative semidefinite.
Proposition 7.15 (Sylvester) Let A be a real symmetric n×n matrix and Q(x) = x·Ax the
corresponding quadratic form. For k = 1, . . . , n let

    A_k = ( a_{11} ⋯ a_{1k} )
          (   ⋮          ⋮  )
          ( a_{k1} ⋯ a_{kk} ),    D_k = det A_k.

Let λ_1, . . . , λ_n be the eigenvalues of A. Then

(a) Q is positive definite if and only if λ_1 > 0, λ_2 > 0, . . . , λ_n > 0. This is the case
if and only if D_1 > 0, D_2 > 0, . . . , D_n > 0.
(b) Q(x) is negative definite if and only if λ_1 < 0, λ_2 < 0, . . . , λ_n < 0. This is the
case if and only if (−1)^k D_k > 0 for all k = 1, . . . , n.
(c) Q(x) is indefinite if and only if A has both positive and negative eigenvalues.

Example 7.12 Case n = 2. Let A = ( a b ; b c ) ∈ R^{2×2} be a symmetric matrix. By
Sylvester's criterion, A is
(a) positive definite if and only if det A > 0 and a > 0,


(b) negative definite if and only if det A > 0 and a < 0,


(c) indefinite if and only if det A < 0,
(d) semidefinite if and only if det A = 0.
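The 2×2 criterion is easy to encode and to test against the quadratic forms of Example 7.11 (a sketch added here; not part of the text):

```python
# Check (added; not in the original text) of Example 7.12: for a symmetric 2x2
# matrix [[a, b], [b, c]], Sylvester's criterion reads off definiteness from
# a and det A = a*c - b*b.
def classify(a, b, c):
    det = a * c - b * b
    if det > 0:
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "semidefinite"

# the quadratic forms of Example 7.11 as matrices:
assert classify(3, 0, 7) == "positive definite"      # Q1 = 3x1^2 + 7x2^2
assert classify(-1, 0, -2) == "negative definite"    # Q2 = -x1^2 - 2x2^2
assert classify(1, 0, -2) == "indefinite"            # Q3 = x1^2 - 2x2^2
assert classify(1, 0, 0) == "semidefinite"           # Q4 = x1^2
```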
Proposition 7.16 Let f : U → R be twice continuously differentiable and let grad f(a) = 0 at
some point a ∈ U.
(a) If Hess f(a) is positive definite, then f has a local minimum at a.
(b) If Hess f(a) is negative definite, then f has a local maximum at a.
(c) If Hess f(a) is indefinite, then f does not have a local extremum at a.
Note that in general there is no information about a if Hess f(a) is semidefinite.
Proof. By (7.40), (7.41) and since grad f(a) = 0,

    f(a + x) = f(a) + (1/2) x·A(x) + ρ(x),   lim_{x→0} ρ(x)/‖x‖² = 0,    (7.43)

where A = Hess f(a).
(a) Let A be positive definite. Since the unit sphere S = {x ∈ R^n | ‖x‖ = 1} is compact (closed
and bounded) and the map Q(x) = x·A(x) is continuous, Q attains its minimum, say m, on S,
see Proposition 6.24:

    m = min{ x·A(x) | x ∈ S }.

Since Q is positive definite and 0 ∉ S, m > 0. If x is nonzero, y = x/‖x‖ ∈ S and therefore

    m ≤ y·A(y) = (x/‖x‖)·A(x/‖x‖) = x·A(x)/‖x‖².

This implies Q(x) = x·A(x) ≥ m‖x‖² for all x.
Since ρ(x)/‖x‖² → 0 as x → 0, there exists δ > 0 such that ‖x‖ < δ implies

    −(m/4)‖x‖² ≤ ρ(x) ≤ (m/4)‖x‖².

From (7.43) it follows that

    f(a + x) = f(a) + (1/2) Q(x) + ρ(x) ≥ f(a) + (1/2) m‖x‖² − (m/4)‖x‖² ≥ f(a) + (m/4)‖x‖²,

hence

    f(a + x) > f(a),   if 0 < ‖x‖ < δ,

and f has a strict (isolated) local minimum at a.
(b) If A = Hess f(a) is negative definite, consider −f in place of f and apply (a).
(c) Let A = Hess f(a) be indefinite. We have to show that in every neighborhood of a there exist
x′ and x″ such that f(x′) < f(a) < f(x″). Since A is indefinite, there is a vector x ∈ R^n \ {0}
such that x·A(x) = m > 0. Then for small t we have

    f(a + tx) = f(a) + (1/2) tx·A(tx) + ρ(tx) = f(a) + (m/2) t² + ρ(tx).

If t is small enough, −(m/4)t² ≤ ρ(tx) ≤ (m/4)t², hence

    f(a + tx) > f(a),   if 0 < |t| < δ.

Similarly, if y ∈ R^n \ {0} satisfies y·A(y) < 0, for sufficiently small t we have f(a + ty) < f(a).

Example 7.13 (a) f(x, y) = x² + y². Here f′ = (2x, 2y) = 0 if and only if x = y = 0.
Furthermore,

    Hess f(x, y) = ( 2 0 ; 0 2 )

is positive definite. f has a (strict) local minimum at (0, 0).
(b) Find the local extrema of z = f(x, y) = 4x² − y² on R² (the graph is a hyperbolic
paraboloid). The necessary condition f′ = 0 implies f_x = 8x = 0, f_y = −2y = 0;
thus x = y = 0. Further,

    Hess f(x, y) = ( f_{xx} f_{xy} ; f_{yx} f_{yy} ) = ( 8 0 ; 0 −2 ).

The Hessian matrix at (0, 0) is indefinite; the function has no extremum at the origin (0, 0).
(c) f(x, y) = x² + y³. f′(x, y) = (2x, 3y²) vanishes if and only if x = y = 0. Furthermore,

    Hess f(0, 0) = ( 2 0 ; 0 0 )

is positive semidefinite. However, there is no local extremum at the origin since f(ε, 0) = ε² >
0 = f(0, 0) > −ε³ = f(0, −ε).
(d) f(x, y) = x² + y⁴. Again the Hessian matrix at (0, 0) is positive semidefinite. However,
(0, 0) is a strict local minimum.
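Cases (c) and (d) can be probed directly (a check added here, not part of the text): both functions have the same semidefinite Hessian at the origin, but sampling near (0, 0) distinguishes them.

```python
# Check (added; not in the original text) of Example 7.13 (c), (d): both
# x**2 + y**3 and x**2 + y**4 have positive semidefinite Hessian at (0, 0),
# yet only the second has a local minimum there.
def f_c(x, y):
    return x**2 + y**3

def f_d(x, y):
    return x**2 + y**4

eps = 1e-3
# (c): values below f(0,0) = 0 occur arbitrarily close to the origin
assert f_c(0.0, -eps) < 0 < f_c(eps, 0.0)
# (d): strictly positive on a punctured-neighborhood sample -> strict local minimum
samples = [(i * eps, j * eps) for i in (-1, 0, 1) for j in (-1, 0, 1)
           if (i, j) != (0, 0)]
assert all(f_d(x, y) > 0 for (x, y) in samples)
```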
Local and Global Extrema
To compute the global extrema of a function f : U̅ → R, where U ⊆ R^n is open and U̅ is the
closure of U, we go along the following lines:
(a) Compute the local extrema of f on U.
(b) Compute the global extrema of f on the boundary ∂U = U̅ ∩ U^c.
(c) If U is unbounded without boundary (as U = R^n), consider the limits at infinity.
Note that

    sup_{x∈U̅} f(x) = max{ maximum of all local maxima in U, sup_{x∈∂U} f(x) }.

To compute the global extremum of f on the boundary one has to find the local extrema in the
interior points of the boundary and to compare them with the values on the boundary of the
boundary.


Example 7.14 Find the global extrema of f(x, y) = x²y on U̅ = {(x, y) ∈ R² | x² + y² ≤ 1}
(where U is the open unit disc).
Since grad f = (f_x, f_y) = (2xy, x²), local extrema can appear only on the y-axis: x = 0, y
arbitrary. The Hessian matrix at (0, y) is

    Hess f(0, y) = ( f_{xx} f_{xy} ; f_{xy} f_{yy} )|_{x=0} = ( 2y 2x ; 2x 0 )|_{x=0} = ( 2y 0 ; 0 0 ).

This matrix is positive semidefinite in case y > 0, negative semidefinite in case y < 0, and 0 at
(0, 0). Hence the above criterion gives no answer; we have to apply the definition directly. In
case y > 0 we have f(x, y) = x²y ≥ 0 for all x; in particular f(x, y) ≥ f(0, y) = 0, hence
(0, y) is a local minimum. Similarly, in case y < 0, f(x, y) ≤ f(0, y) = 0 for all x; hence f
has a local maximum at (0, y), y < 0. However, f takes both positive and negative values in
every neighborhood of (0, 0), for example f(ε, ε) = ε³ and f(ε, −ε) = −ε³. Thus (0, 0) is not a
local extremum.
We have to consider the boundary x² + y² = 1. Inserting x² = 1 − y² we obtain

    g(y) = f(x, y)|_{x²+y²=1} = x²y|_{x²+y²=1} = (1 − y²)y = y − y³,   |y| ≤ 1.

We compute the local extrema of g on the boundary x² + y² = 1 (note that the circle has no
boundary, so these local extrema are actually the global extrema):

    g′(y) = 1 − 3y² = 0  ⟺  |y| = 1/√3.

Since g″(1/√3) < 0 and g″(−1/√3) > 0, g attains its maximum 2/(3√3) at y = 1/√3. Since this
is greater than the value 0 of the local maxima of f at (0, y), y < 0, f attains its global maximum
at the two points

    M_{1,2} = ( ±√(2/3), 1/√3 ),

where f(M_{1,2}) = x²y = 2/(3√3). g attains its minimum −2/(3√3) at y = −1/√3. Since this is
less than the value 0 of the local minima of f at (0, y), y > 0, f attains its global minimum at the
two points

    m_{1,2} = ( ±√(2/3), −1/√3 ),

where f(m_{1,2}) = x²y = −2/(3√3).
The arithmetic–geometric mean inequality gives the same bound for x, y > 0 with x² + y² = 1:

    1/3 = (x² + y²)/3 = ( x²/2 + x²/2 + y² )/3 ≥ ( (x²/2)·(x²/2)·y² )^{1/3} = ( x⁴y²/4 )^{1/3},

hence x²y ≤ 2/(3√3).
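A numerical confirmation of the maximum (added here; not part of the text): since f(r cos t, r sin t) = r³ cos²t sin t, the maximum over the closed disc is attained on the boundary r = 1, which can be sampled densely.

```python
import math

# Numerical confirmation (added; not in the original text) of Example 7.14:
# since f(r cos t, r sin t) = r**3 * cos(t)**2 * sin(t), the maximum of
# f(x, y) = x**2 * y on the closed unit disc is attained on the boundary r = 1
# and equals 2/(3*sqrt(3)).
def f(x, y):
    return x * x * y

N = 100_000
best = max(f(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
           for k in range(N))
exact = 2 / (3 * math.sqrt(3))
assert abs(best - exact) < 1e-6
```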

(b) Among all boxes with volume 1, find the one for which the sum of the lengths of the 12 edges
is minimal.
Let x, y, and z denote the lengths of the three perpendicular edges at one vertex. By assumption
xyz = 1, and g(x, y, z) = 4(x + y + z) is the function to minimize.

Local extrema. Inserting the constraint z = 1/(xy) we have to minimize

    f(x, y) = 4( x + y + 1/(xy) )   on U = {(x, y) | x > 0, y > 0}.

The necessary condition is

    f_x = 4( 1 − 1/(x²y) ) = 0,   f_y = 4( 1 − 1/(xy²) ) = 0
    ⟹  x²y = xy² = 1  ⟹  x = y = 1.

Further,

    f_{xx} = 8/(x³y),   f_{yy} = 8/(xy³),   f_{xy} = 4/(x²y²),

so that

    det Hess f(1, 1) = det ( 8 4 ; 4 8 ) = 64 − 16 > 0;

hence f has an extremum at (1, 1). Since f_{xx}(1, 1) = 8 > 0, f has a local minimum at (1, 1).
Global extrema. We show that (1, 1) is even the global minimum on the first quadrant U.
Consider N = {(x, y) | 1/25 ≤ x, y ≤ 5}. If (x, y) ∉ N,

    f(x, y) ≥ 4(5 + 0 + 0) = 20.

Since f(1, 1) = 12 < 20, the global minimum of f on the first quadrant is attained on the compact
rectangle N. Inserting the four boundary edges x = 5, y = 5, x = 1/25, and y = 1/25, in all cases
f(x, y) ≥ 20, so the local minimum at (1, 1) is also the global minimum.
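The conclusion f ≥ 12 on the whole quadrant can also be probed by random sampling (a check added here, not part of the text; it follows from the AM–GM inequality, since x + y + 1/(xy) ≥ 3):

```python
import random

# Check (added; not in the original text) of the box example: on the quadrant
# x, y > 0 the function f(x, y) = 4*(x + y + 1/(x*y)) has minimum f(1, 1) = 12
# (by the AM-GM inequality, x + y + 1/(x*y) >= 3).
def f(x, y):
    return 4 * (x + y + 1 / (x * y))

assert abs(f(1.0, 1.0) - 12.0) < 1e-12

random.seed(1)
for _ in range(10_000):
    x = random.uniform(0.01, 10.0)
    y = random.uniform(0.01, 10.0)
    assert f(x, y) >= 12.0 - 1e-9
```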

7.5 The Inverse Mapping Theorem


Suppose that f : R → R is differentiable on an open set U ⊆ R containing a, and f′(a) ≠ 0. If
f′(a) > 0, then there is an open interval V ⊆ U containing a such that f′(x) > 0 for all x ∈ V.
Thus f is strictly increasing on V and therefore injective, with an inverse function g defined on
some open interval W containing f(a). Moreover, g is differentiable (see Proposition 4.5) and
g′(y) = 1/f′(x) if f(x) = y. An analogous result in higher dimensions is more involved, but the
result is very important.

(Figure: hatched neighborhoods of a and of f(a), illustrating that f maps a neighborhood of a in R^n onto a neighborhood of f(a).)


Theorem 7.17 (Inverse Mapping Theorem) Suppose that f : R^n → R^n is continuously differentiable on an open set U containing a, and det f′(a) ≠ 0. Then there is an open set V ⊆ U
containing a and an open set W containing f(a) such that f : V → W has a continuous inverse
g : W → V which is differentiable for all y ∈ W. For y = f(x) we have

    g′(y) = (f′(x))^{−1},   Dg(y) = (Df(x))^{−1}.    (7.44)

For the proof see [Rud76, 9.24 Theorem] or [Spi65, 2-11].


Corollary 7.18 Let U ⊆ R^n be open and f : U → R^n continuously differentiable with
det f′(x) ≠ 0 for all x ∈ U. Then f(U) is open in R^n.
Remarks 7.8 (a) One main part is to show that there is an open set V ⊆ U which is mapped
onto an open set W. In general, this is not true for continuous mappings. For example, sin x
maps the open interval (0, 2π) onto the closed set [−1, 1]. Note that sin x does not satisfy the
assumptions of the corollary since sin′(π/2) = cos(π/2) = 0.
(b) Note that continuity of f′(x) in a neighborhood of a, continuity of the determinant mapping
det : R^{n×n} → R, and det f′(a) ≠ 0 imply that det f′(x) ≠ 0 in a neighborhood V_1 of a, see
homework 10.4. This implies that the linear mapping Df(x) is invertible for x ∈ V_1. Thus
Df(x)^{−1} and (f′(x))^{−1} exist for x ∈ V_1: the linear mapping Df(x) is regular.
(c) Let us reformulate the statement of the theorem. Suppose

    y_1 = f_1(x_1, . . . , x_n),
    y_2 = f_2(x_1, . . . , x_n),
    ⋮
    y_n = f_n(x_1, . . . , x_n)

is a system of n equations in n variables x_1, . . . , x_n; y_1, . . . , y_n are given in a neighborhood W
of f(a). Under the assumptions of the theorem, there exists a unique solution x = g(y) of this
system of equations,

    x_1 = g_1(y_1, . . . , y_n),
    x_2 = g_2(y_1, . . . , y_n),
    ⋮
    x_n = g_n(y_1, . . . , y_n),

in a certain neighborhood V of a. Note that the theorem states only the existence of such a
solution; it doesn't provide an explicit formula.
(d) Note that the inverse function g may exist even if det f′(x) = 0. For example f : R → R
defined by f(x) = x³ has f′(0) = 0; however g(y) = ∛y is inverse to f(x). One thing is
certain: if det f′(a) = 0 then g cannot be differentiable at f(a). If g were differentiable at f(a),
the chain rule applied to g(f(x)) = x would give

    g′(f(a)) f′(a) = id
and consequently

    det g′(f(a)) · det f′(a) = det id = 1,

contradicting det f′(a) = 0.
(e) Note that the theorem states that under the given assumptions f is locally invertible. There
is no information about the existence of an inverse function g to f on a fixed open set; see
Example 7.15 (a) below.
Example 7.15 (a) Let x = r cos φ and y = r sin φ be the polar coordinates in R². More
precisely, let

    f(r, φ) = ( r cos φ, r sin φ ) = (x, y),   f : R² → R².

The Jacobian is

    ∂(x, y)/∂(r, φ) = det ( x_r x_φ ; y_r y_φ ) = det ( cos φ  −r sin φ ; sin φ  r cos φ ) = r.

Let f(r_0, φ_0) = (x_0, y_0) ≠ (0, 0); then r_0 ≠ 0 and the Jacobian of f at (r_0, φ_0) is non-zero.
Since all partial derivatives of f with respect to r and φ exist and they are continuous on R², the
assumptions of the theorem are satisfied. Hence, in a neighborhood U of (x_0, y_0) there exists a
continuously differentiable inverse function r = r(x, y), φ = φ(x, y). In this case, the function
can be given explicitly: r = √(x² + y²), φ = arg(x, y). We want to compute the Jacobi matrix
of the inverse function. Since the inverse matrix is

    ( cos φ  −r sin φ ; sin φ  r cos φ )^{−1} = ( cos φ  sin φ ; −(1/r) sin φ  (1/r) cos φ ),

we obtain by the theorem

    g′(x, y) = ∂(r, φ)/∂(x, y) = ( cos φ  sin φ ; −(1/r) sin φ  (1/r) cos φ )
             = ( x/√(x² + y²)   y/√(x² + y²) ; −y/(x² + y²)   x/(x² + y²) );

in particular, the second row gives the partial derivatives of the argument function with respect
to x and y:

    ∂arg(x, y)/∂x = −y/(x² + y²),   ∂arg(x, y)/∂y = x/(x² + y²).

Note that we have not determined the explicit form of the argument function, which is not unique
since f(r, φ + 2kπ) = f(r, φ) for all k ∈ Z. However, the gradient always takes the above form.
Note that det f′(r, φ) ≠ 0 for all r ≠ 0 is not sufficient for f to be injective on R² \ {(0, 0)}.
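The partial derivatives of r(x, y) and of the argument function can be verified numerically (a check added here, not part of the text; `atan2` serves as one concrete branch of arg, and the sample point is arbitrary):

```python
import math

# Check (added; not in the original text) of Example 7.15 (a): the partial
# derivatives of the argument function are arg_x = -y/(x**2 + y**2) and
# arg_y = x/(x**2 + y**2); atan2 is one concrete branch of arg.
def arg(x, y):
    return math.atan2(y, x)

x, y = 1.2, -0.7
r2 = x * x + y * y
eps = 1e-6

dargx = (arg(x + eps, y) - arg(x - eps, y)) / (2 * eps)
dargy = (arg(x, y + eps) - arg(x, y - eps)) / (2 * eps)
assert abs(dargx - (-y / r2)) < 1e-8
assert abs(dargy - (x / r2)) < 1e-8

# and r = sqrt(x**2 + y**2) has r_x = x/r:
drx = (math.hypot(x + eps, y) - math.hypot(x - eps, y)) / (2 * eps)
assert abs(drx - x / math.hypot(x, y)) < 1e-8
```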
(b) Let f : R² → R² be given by (u, v) = f(x, y) where

    u(x, y) = sin x + cos y,   v(x, y) = cos x + sin y.

Then

    ∂(u, v)/∂(x, y) = det ( u_x u_y ; v_x v_y ) = det ( cos x  −sin y ; −sin x  cos y )
                    = cos x cos y − sin x sin y = cos(x + y).


Hence f is locally invertible at (x_0, y_0) = (π/4, −π/4), since the Jacobian there is cos 0 = 1 ≠ 0.
Since f(π/4, −π/4) = (√2, 0), the inverse function g(u, v) = (x, y) is defined in a neighborhood
of (√2, 0), and the Jacobi matrix of g at (√2, 0) is

    ( 1/√2   1/√2 ; −1/√2   1/√2 )^{−1} = ( 1/√2   −1/√2 ; 1/√2   1/√2 ).

Note that at the point (π/4, π/4) the Jacobian of f vanishes, cos(π/2) = 0. There is indeed no
neighborhood of (π/4, π/4) where f is injective, since for all t ∈ R

    f(π/4 + t, π/4 + t) = ( √2 cos t, √2 cos t ) = f(π/4 − t, π/4 − t).

7.6 The Implicit Function Theorem


Motivation: Hyper Surfaces
Suppose that F : U → R is a continuously differentiable function and grad F(x) ≠ 0 for all
x ∈ U. Then

    S = {(x_1, . . . , x_n) | F(x_1, . . . , x_n) = 0}

is called a hyper surface in R^n. A hyper surface in R^n has dimension n − 1. Examples are
hyperplanes a_1x_1 + ⋯ + a_nx_n + c = 0 ((a_1, . . . , a_n) ≠ 0) and spheres x_1² + ⋯ + x_n² = r². The
graph of a differentiable function f : U → R is also a hyper surface, in R^{n+1}:

    Γ_f = {(x, f(x)) ∈ R^{n+1} | x ∈ U}.

Question: Is any hyper surface locally the graph of a differentiable function? More precisely,
we may ask the following question: Suppose that f : R^n × R → R is differentiable and
f(a_1, . . . , a_n, b) = 0. Can we find for each (x_1, . . . , x_n) near (a_1, . . . , a_n) a unique y near b
such that f(x_1, . . . , x_n, y) = 0? The answer to this question is provided by the Implicit Function Theorem (IFT).
Consider the function f : R² → R defined by f(x, y) = x² + y² − 1. If we choose (a, b) with
a, b > 0, there are open intervals A and B containing a and b with the following property: if
x ∈ A, there is a unique y ∈ B with f(x, y) = 0. We can therefore define a function g : A → B
by the conditions g(x) ∈ B and f(x, g(x)) = 0. If b > 0 then g(x) = √(1 − x²); if b < 0 then
g(x) = −√(1 − x²). Both functions g are differentiable. These functions are said to be defined
implicitly by the equation f(x, y) = 0.
On the other hand, there exists no neighborhood of (1, 0) such that f(x, y) = 0 can locally be
solved for y. Note that f_y(1, 0) = 0. However, it can be solved for x = h(y) = √(1 − y²).

Theorem 7.19 Suppose that f : R^n × R^m → R^m, f = f(x, y), is continuously differentiable
in an open set containing (a, b) ∈ R^{n+m} and f(a, b) = 0. Let D_y f(x, y) be the linear mapping
from R^m into R^m given by

    D_y f(x, y) = ( D_{n+j} f_i(x, y) ) = ∂(f_1, . . . , f_m)/∂(y_1, . . . , y_m) (x, y) = ( ∂f_i(x, y)/∂y_j ),   i, j = 1, . . . , m.    (7.45)


If det D_y f(a, b) ≠ 0, there is an open set A ⊆ R^n containing a and an open set B ⊆ R^m
containing b with the following properties: There exists a unique continuously differentiable
function g : A → B such that
(a) g(a) = b,
(b) f(x, g(x)) = 0 for all x ∈ A.
For the derivative Dg(x) ∈ L(R^n, R^m) we have

    Dg(x) = −(D_y f(x, g(x)))^{−1} ∘ D_x f(x, g(x)),   g′(x) = −(f_y(x, g(x)))^{−1} f_x(x, g(x)).

The Jacobi matrix g′(x) is given by

    ∂(g_1, . . . , g_m)/∂(x_1, . . . , x_n) (x) = −( f_y(x, g(x)) )^{−1} · ∂(f_1, . . . , f_m)/∂(x_1, . . . , x_n) (x, g(x)),    (7.46)

    ∂g_k(x)/∂x_j = −∑_{l=1}^m ( (f_y(x, g(x)))^{−1} )_{kl} · ∂f_l(x, g(x))/∂x_j,   k = 1, . . . , m,  j = 1, . . . , n.

Idea of Proof. Define F : R^n × R^m → R^n × R^m by F(x, y) = (x, f(x, y)). Let M = f_y(a, b).
Then

    F′(a, b) = ( 1_n  0_{n,m} ; f_x(a, b)  M )  ⟹  det F′(a, b) = det M ≠ 0.

By the Inverse Mapping Theorem (Theorem 7.17) there exists an open set W ⊆ R^n × R^m
containing F(a, b) = (a, 0) and an open set V ⊆ R^n × R^m containing (a, b), which may be taken
of the form A × B, such that F : A × B → W has a differentiable inverse h : W → A × B.
Since g is differentiable, it is easy to find the Jacobi matrix. In fact, since f_i(x, g(x)) = 0,
i = 1, . . . , m, taking the partial derivative ∂/∂x_j on both sides gives, by the chain rule,

    0 = ∂f_i(x, g(x))/∂x_j + ∑_{k=1}^m ∂f_i(x, g(x))/∂y_k · ∂g_k(x)/∂x_j,

that is, in matrix form,

    0 = ( ∂f_i(x, g(x))/∂x_j ) + f_y(x, g(x)) · ( ∂g_k(x)/∂x_j )
    ⟹  ( ∂f_i(x, g(x))/∂x_j ) = −f_y(x, g(x)) · ( ∂g_k(x)/∂x_j ).

Since det f_y(a, b) ≠ 0, also det f_y(x, y) ≠ 0 in a small neighborhood of (a, b). Hence f_y(x, g(x))
is invertible and we can multiply the preceding equation from the left by −(f_y(x, g(x)))^{−1},
which gives (7.46).
Remarks 7.9 (a) The theorem gives a sufficient condition for locally solving the system of equations

    0 = f₁(x₁, …, x_n, y₁, …, y_m),
    ⋮
    0 = f_m(x₁, …, x_n, y₁, …, y_m)

with given x₁, …, x_n for y₁, …, y_m.


(b) We rewrite the statement in the case n = m = 1: Suppose f(x, y) is continuously differentiable on an open set G ⊂ R² which contains (a, b), and f(a, b) = 0. If f_y(a, b) ≠ 0, then there exist δ, ε > 0 such that the following holds: for every x ∈ U_δ(a) there exists a unique y = g(x) ∈ U_ε(b) with f(x, y) = 0. We have g(a) = b; the function y = g(x) is continuously differentiable with

    g′(x) = −f_x(x, g(x)) / f_y(x, g(x)).

Be careful: note that f_x(x, g(x)) ≠ d/dx (f(x, g(x))).

Example 7.16 (a) Let f(x, y) = sin(x + y) + e^{xy} − 1. Note that f(0, 0) = 0. Since

    f_y(0, 0) = (cos(x + y) + x e^{xy})|_{(0,0)} = cos 0 + 0 = 1 ≠ 0,

f(x, y) = 0 can uniquely be solved for y = g(x) in a neighborhood of x = 0, y = 0. Further,

    f_x(0, 0) = (cos(x + y) + y e^{xy})|_{(0,0)} = 1.

By Remark 7.9 (b),

    g′(x) = −(f_x(x, y)/f_y(x, y))|_{y=g(x)} = −(cos(x + g(x)) + g(x) e^{x g(x)}) / (cos(x + g(x)) + x e^{x g(x)}).

In particular, g′(0) = −1.
Remark. Differentiating the equation f_x + f_y g′ = 0 once more with respect to x we obtain

    0 = f_xx + f_xy g′ + (f_yx + f_yy g′) g′ + f_y g″,
    g″ = −(1/f_y)(f_xx + 2 f_xy g′ + f_yy (g′)²),

and inserting g′ = −f_x/f_y,

    g″ = −(f_xx f_y² − 2 f_xy f_x f_y + f_yy f_x²) / f_y³.

Since

    f_xx(0, 0) = (−sin(x + y) + y² e^{xy})|_{(0,0)} = 0,
    f_yy(0, 0) = (−sin(x + y) + x² e^{xy})|_{(0,0)} = 0,
    f_xy(0, 0) = (−sin(x + y) + e^{xy}(1 + xy))|_{(0,0)} = 1,

we obtain g″(0) = 2. Therefore the Taylor expansion of g(x) around 0 reads

    g(x) = −x + x² + r₃(x).
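As a numerical sanity check (ours, not part of the notes; plain Python with a small Newton solver, names hypothetical), one can solve f(x, y) = 0 for y near (0, 0) and recover g′(0) ≈ −1 and g″(0) ≈ 2 by central differences:

```python
import math

def f(x, y):
    return math.sin(x + y) + math.exp(x * y) - 1

def fy(x, y):
    # partial derivative of f with respect to y
    return math.cos(x + y) + x * math.exp(x * y)

def g(x, y0=0.0, tol=1e-14):
    # solve f(x, y) = 0 for y by Newton's method, starting near y0
    y = y0
    for _ in range(50):
        step = f(x, y) / fy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

h = 1e-4
g1 = (g(h) - g(-h)) / (2 * h)              # central difference for g'(0)
g2 = (g(h) - 2 * g(0.0) + g(-h)) / h**2    # central difference for g''(0)
```

Both values agree with the Taylor expansion g(x) = −x + x² + r₃(x) computed above.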
(b) Let γ(t) = (x(t), y(t)) be a twice continuously differentiable curve, γ ∈ C²([0, 1]), in R². Suppose that in a neighborhood of t = 0 the curve describes a function y = g(x). Find the Taylor polynomial of degree 2 of g at x₀ = x(0).

Inserting the curve into the equation y = g(x) we have y(t) = g(x(t)). Differentiation gives

    ẏ = g′ ẋ,    ÿ = g″ ẋ² + g′ ẍ.

Thus

    g′(x) = ẏ/ẋ,    g″(x) = (ÿ − g′ ẍ)/ẋ² = (ÿ ẋ − ẍ ẏ)/ẋ³.

Now we have the Taylor polynomial of g at x₀:

    T₂(g)(x) = y₀ + g′(x₀)(x − x₀) + (g″(x₀)/2)(x − x₀)²,   where y₀ = y(0) = g(x₀).

(c) The tangent hyperplane to a hypersurface.

    Anyone who understands geometry can understand everything in this world.
    (Galileo Galilei, 1564–1642)

Suppose that F : U → R is continuously differentiable, a ∈ U, F(a) = 0, and grad F(a) ≠ 0. Then

    grad F(a)·(x − a) = Σ_{i=1}^{n} F_{x_i}(a)(x_i − a_i) = 0

is the equation of the tangent hyperplane to the surface F(x) = 0 at the point a.
Proof. Indeed, since the gradient at a is nonzero we may assume without loss of generality that F_{x_n}(a) ≠ 0. By the implicit function theorem, F(x₁, …, x_{n−1}, x_n) = 0 is locally solvable for x_n = g(x₁, …, x_{n−1}) in a neighborhood of a = (a₁, …, a_n) with g(ã) = a_n, where ã = (a₁, …, a_{n−1}) and x̃ = (x₁, …, x_{n−1}). Define the tangent hyperplane to be the graph of the linearization of g at (a₁, …, a_{n−1}, a_n). By Example 7.7 (a) the tangent hyperplane to the graph of g at ã is given by

    x_n = g(ã) + grad g(ã)·(x̃ − ã).    (7.47)
Since F(ã, g(ã)) = 0, by the implicit function theorem

    ∂g(ã)/∂x_j = −F_{x_j}(a)/F_{x_n}(a),   j = 1, …, n − 1.

Inserting this into (7.47) we have

    x_n − a_n = −(1/F_{x_n}(a)) Σ_{j=1}^{n−1} F_{x_j}(a)(x_j − a_j).

Multiplication by F_{x_n}(a) gives

    F_{x_n}(a)(x_n − a_n) + Σ_{j=1}^{n−1} F_{x_j}(a)(x_j − a_j) = 0,  that is,  grad F(a)·(x − a) = 0. ∎
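The geometric fact behind this proof, that grad F is orthogonal to every curve lying in the surface F = 0, can be illustrated numerically. The sketch below (ours, not from the notes) takes the unit sphere F(x, y, z) = x² + y² + z² − 1 and a great circle on it:

```python
import math

def grad_F(p):
    # F(x, y, z) = x^2 + y^2 + z^2 - 1 (unit sphere); grad F = 2p
    return [2 * c for c in p]

def c(t):
    # a curve lying entirely in the surface F = 0
    return [math.sin(t), 0.0, math.cos(t)]

def c_dot(t, h=1e-6):
    # central-difference tangent vector of the curve
    return [(a - b) / (2 * h) for a, b in zip(c(t + h), c(t - h))]

t = 0.7
dot = sum(g * v for g, v in zip(grad_F(c(t)), c_dot(t)))
```

The dot product vanishes (up to finite-difference error), so the tangent vector lies in the tangent hyperplane grad F(a)·(x − a) = 0.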

[Figure: nested level sets U₋₁, U₀, U₁ of f]

Let f : U → R be differentiable. For c ∈ R define the level set U_c = {x ∈ U | f(x) = c}. The set U_c may be empty, may consist of a single point (in the case of a local extremum), or, in the generic case, that is if grad f(a) ≠ 0 and U_c is non-empty, U_c is an (n − 1)-dimensional hypersurface. {U_c | c ∈ R} is a family of non-intersecting subsets of U which cover U.

7.7 Lagrange Multiplier Rule

This is a method to find local extrema of a function under certain constraints.
Consider the following problem: Find the local extrema of a function f(x, y) of two variables where x and y are not independent from each other but satisfy the constraint

    φ(x, y) = 0.

Suppose further that f and φ are continuously differentiable. Note that the level sets U_c = {(x, y) ∈ R² | f(x, y) = c} form a family of non-intersecting curves in the plane.

[Figure: level curves f = c meeting the constraint curve φ = 0]

We have to find the curve f(x, y) = c intersecting the constraint curve φ(x, y) = 0 where c is as large or as small as possible. Usually f = c intersects φ = 0 as c changes monotonically. However, if c is maximal, the curve f = c touches the curve φ = 0. In other words, the tangent lines coincide. This means that the defining normal vectors to the tangent lines are scalar multiples of each other.

Theorem 7.20 (Lagrange Multiplier Rule) Let f, φ : U → R, U ⊆ Rⁿ open, be continuously differentiable and suppose f has a local extremum at a ∈ U under the constraint φ(x) = 0. Suppose that grad φ(a) ≠ 0.
Then there exists a real number λ such that

    grad f(a) = λ grad φ(a).

This number λ is called a Lagrange multiplier.
Proof. The idea is to solve the constraint φ(x) = 0 for one variable and to consider the free extremum problem with one variable less. Suppose without loss of generality that φ_{x_n}(a) ≠ 0. By the implicit function theorem we can solve φ(x) = 0 for x_n = g(x₁, …, x_{n−1}) in a neighborhood of x = a. Differentiating φ(x̃, g(x̃)) = 0 and inserting a = (ã, a_n) as before we have

    φ_{x_j}(a) + φ_{x_n}(a) g_{x_j}(ã) = 0,   j = 1, …, n − 1.    (7.48)

Since h(x̃) = f(x̃, g(x̃)) has a local extremum at ã, all partial derivatives of h vanish at ã:

    f_{x_j}(a) + f_{x_n}(a) g_{x_j}(ã) = 0,   j = 1, …, n − 1.    (7.49)

Setting λ = f_{x_n}(a)/φ_{x_n}(a) and comparing (7.48) and (7.49) we find

    f_{x_j}(a) = λ φ_{x_j}(a),   j = 1, …, n − 1.

Since, by the definition of λ, f_{x_n}(a) = λ φ_{x_n}(a), we finally obtain grad f(a) = λ grad φ(a), which completes the proof. ∎

Example 7.17 (a) Let A = (a_ij) be a real symmetric n × n matrix, and define f(x) = x·Ax = Σ_{i,j} a_ij x_i x_j. We ask for the local extrema of f on the unit sphere S^{n−1} = {x ∈ Rⁿ | ‖x‖ = 1}. This constraint can be written as φ(x) = ‖x‖² − 1 = Σ_{i=1}^{n} x_i² − 1 = 0. Suppose that f attains a local minimum at a ∈ S^{n−1}. By Example 7.6 (b),

    grad f(a) = 2Aa.

On the other hand,

    grad φ(a) = (2x₁, …, 2x_n)|_{x=a} = 2a.

By Theorem 7.20 there exists a real number λ₁ such that

    grad f(a) = 2Aa = λ₁ grad φ(a) = 2λ₁ a.

Hence Aa = λ₁ a; that is, λ₁ is an eigenvalue of A and a a corresponding eigenvector. In particular, A has a real eigenvalue. Since S^{n−1} has no boundary, the global minimum is also a local one. We find: if f(a) = a·Aa = λ a·a = λ is the global minimum, then λ is the smallest eigenvalue.
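This eigenvalue interpretation is easy to test numerically. The sketch below (an illustration of ours, using a small 2 × 2 example whose eigenvalues are 1 and 3) minimizes x·Ax over a fine grid of points on the unit circle and then checks the Lagrange condition Aa = λ₁a at the minimizer:

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]  # symmetric; eigenvalues are 1 and 3

def quad(x):
    # f(x) = x . A x
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

# minimize x^T A x over the unit circle by sampling the constraint set
grid = [k * 2 * math.pi / 100000 for k in range(100000)]
lam_min, t0 = min((quad([math.cos(t), math.sin(t)]), t) for t in grid)
a = [math.cos(t0), math.sin(t0)]
Aa = [sum(A[i][j] * a[j] for j in range(2)) for i in range(2)]
```

The minimum value approximates the smallest eigenvalue, and Aa ≈ λ₁a up to grid resolution.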
(b) Let a be the point of a hypersurface M = {x | φ(x) = 0} with minimal distance to a given point b ∉ M. Then the line through a and b is orthogonal to M.
Indeed, the function f(x) = ‖x − b‖² attains its minimum under the constraint φ(x) = 0 at a. By the theorem, there is a real number λ such that

    grad f(a) = 2(a − b) = λ grad φ(a).

The assertion follows since, by Example 7.16 (c), grad φ(a) is orthogonal to M at a, and b − a is a multiple of the normal vector grad φ(a).
Theorem 7.21 (Lagrange Multiplier Rule, extended version) Let f, φ_i : U → R, i = 1, …, m, m < n, be continuously differentiable functions. Let M = {x ∈ U | φ₁(x) = ⋯ = φ_m(x) = 0} and suppose that f(x) has a local extremum at a under the constraint x ∈ M. Suppose further that the Jacobi matrix φ′(a) ∈ R^{m×n} has maximal rank m.
Then there exist real numbers λ₁, …, λ_m such that

    grad f(a) = grad (λ₁ φ₁ + ⋯ + λ_m φ_m)(a) = λ₁ grad φ₁(a) + ⋯ + λ_m grad φ_m(a).

Note that the rank condition ensures that there is a choice of m variables out of x₁, …, x_n such that the Jacobian of φ₁, …, φ_m with respect to this set of variables is nonzero at a.

7.8 Integrals depending on Parameters

Problem: Define I(y) = ∫_a^b f(x, y) dx; what are the relations between properties of f(x, y) and of I(y), for example with respect to continuity and differentiability?

7.8.1 Continuity of I(y)

Proposition 7.22 Let f(x, y) be continuous on the rectangle R = [a, b] × [c, d].
Then I(y) = ∫_a^b f(x, y) dx is continuous on [c, d].

Proof. Let ε > 0. Since f is continuous on the compact set R, f is uniformly continuous on R (see Proposition 6.25). Hence, there is a δ > 0 such that |x − x′| < δ and |y − y′| < δ and (x, y), (x′, y′) ∈ R imply

    |f(x, y) − f(x′, y′)| < ε.

Therefore, |y − y₀| < δ and y, y₀ ∈ [c, d] imply

    |I(y) − I(y₀)| = |∫_a^b (f(x, y) − f(x, y₀)) dx| ≤ ε(b − a).

This shows continuity of I(y) at y₀. ∎


For example, I(y) = ∫_0^1 arctan(x/y) dx is continuous for y > 0.

Remark 7.10 (a) Note that continuity at y₀ means that we can interchange the limit and the integral:

    lim_{y→y₀} ∫_a^b f(x, y) dx = ∫_a^b lim_{y→y₀} f(x, y) dx = ∫_a^b f(x, y₀) dx.
(b) A similar statement holds for y → ∞: Suppose that f(x, y) is continuous on [a, b] × [c, +∞) and lim_{y→+∞} f(x, y) = φ(x) exists uniformly for all x ∈ [a, b], that is,

    ∀ε > 0 ∃R > 0 ∀x ∈ [a, b], y ≥ R : |f(x, y) − φ(x)| < ε.

Then ∫_a^b φ(x) dx exists and lim_{y→∞} I(y) = ∫_a^b φ(x) dx.

7.8.2 Differentiation of Integrals

Proposition 7.23 Let f(x, y) be defined on R = [a, b] × [c, d] and continuous as a function of x for every fixed y. Suppose that f_y(x, y) exists for all (x, y) ∈ R and is continuous as a function of the two variables x and y.
Then I(y) is differentiable and

    I′(y) = d/dy ∫_a^b f(x, y) dx = ∫_a^b f_y(x, y) dx.

Proof. Let ε > 0. Since f_y(x, y) is continuous, it is uniformly continuous on R. Hence there exists δ > 0 such that |x − x′| < δ and |y − y′| < δ imply |f_y(x′, y′) − f_y(x, y)| < ε. For |h| < δ the mean value theorem gives f(x, y₀ + h) − f(x, y₀) = h f_y(x, y₀ + θh) for some θ ∈ (0, 1), and therefore

    |∫_a^b ((f(x, y₀ + h) − f(x, y₀))/h) dx − ∫_a^b f_y(x, y₀) dx|
      ≤ ∫_a^b |f_y(x, y₀ + θh) − f_y(x, y₀)| dx < ε(b − a).

Since this inequality holds for all small h, it holds for the limit as h → 0, too. Thus,

    |I′(y₀) − ∫_a^b f_y(x, y₀) dx| ≤ ε(b − a).

Since ε was arbitrary, the claim follows. ∎

In case of variable integration limits we have the following theorem.

Proposition 7.24 Let f(x, y) be as in Proposition 7.23. Let α(y) and β(y) be differentiable on [c, d], and suppose that α([c, d]) and β([c, d]) are contained in [a, b].
Let I(y) = ∫_{α(y)}^{β(y)} f(x, y) dx. Then I(y) is differentiable and

    I′(y) = ∫_{α(y)}^{β(y)} f_y(x, y) dx + β′(y) f(β(y), y) − α′(y) f(α(y), y).    (7.50)

Proof. Let F(y, u, v) = ∫_u^v f(x, y) dx; then I(y) = F(y, α(y), β(y)). The fundamental theorem of calculus yields

    ∂F/∂v (y, u, v) = ∂/∂v ∫_u^v f(x, y) dx = f(v, y),
    ∂F/∂u (y, u, v) = −∂/∂u ∫_v^u f(x, y) dx = −f(u, y).    (7.51)

By the chain rule, the previous proposition, and (7.51) we have

    I′(y) = ∂F/∂y + ∂F/∂u α′(y) + ∂F/∂v β′(y)
          = ∂F/∂y (y, α(y), β(y)) + ∂F/∂u (y, α(y), β(y)) α′(y) + ∂F/∂v (y, α(y), β(y)) β′(y)
          = ∫_{α(y)}^{β(y)} f_y(x, y) dx − α′(y) f(α(y), y) + β′(y) f(β(y), y). ∎

Example 7.18 (a) I(y) = ∫_3^4 sin(xy)/x dx is differentiable by Proposition 7.23 since f_y(x, y) = (x cos(xy))/x = cos(xy) is continuous. Hence

    I′(y) = ∫_3^4 cos(xy) dx = sin(xy)/y |_{x=3}^{x=4} = (sin 4y − sin 3y)/y.

(b) I(y) = ∫_{log y}^{sin y} e^{x²y} dx is differentiable with

    I′(y) = ∫_{log y}^{sin y} x² e^{x²y} dx + cos y · e^{y sin² y} − (1/y) e^{y (log y)²}.
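A quick numerical check of part (a) (our sketch, not in the notes; a composite midpoint rule for I(y) and a central difference in y):

```python
import math

def I(y, n=20000):
    # composite midpoint rule for I(y) = integral of sin(xy)/x over [3, 4]
    h = 1.0 / n
    return sum(math.sin((3 + (k + 0.5) * h) * y) / (3 + (k + 0.5) * h)
               for k in range(n)) * h

y = 1.3
eps = 1e-5
numeric = (I(y + eps) - I(y - eps)) / (2 * eps)   # finite-difference I'(y)
closed = (math.sin(4 * y) - math.sin(3 * y)) / y  # formula from Example 7.18 (a)
```

The two values agree, illustrating that differentiation and integration may be interchanged here.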
7.8.3 Improper Integrals with Parameters

Suppose that the improper integral ∫_a^∞ f(x, y) dx exists for y ∈ [c, d].

Definition 7.10 We say that the improper integral ∫_a^∞ f(x, y) dx converges uniformly with respect to y on [c, d] if for every ε > 0 there is an A₀ > 0 such that A > A₀ implies

    |I(y) − ∫_a^A f(x, y) dx| = |∫_A^∞ f(x, y) dx| < ε   for all y ∈ [c, d].

Note that the Cauchy and Weierstraß criteria (see Proposition 6.1 and Theorem 6.2) for uniform convergence of series of functions also hold for improper parametric integrals. For example, the theorem of Weierstraß now reads as follows.

Proposition 7.25 Suppose that ∫_a^A f(x, y) dx exists for all A ≥ a and y ∈ [c, d]. Suppose further that |f(x, y)| ≤ φ(x) for all x ≥ a and ∫_a^∞ φ(x) dx converges.
Then ∫_a^∞ f(x, y) dx converges uniformly with respect to y ∈ [c, d].
Example 7.19 I(y) = ∫_1^∞ e^{−xy} x^y y² dx converges uniformly on [2, 4] since

    |f(x, y)| = e^{−xy} x^y y² ≤ e^{−2x} x⁴ 4² =: φ(x),

and ∫_1^∞ e^{−2x} x⁴ 4² dx < ∞ converges.
If we add the assumption of uniform convergence, then the preceding theorems remain true for improper integrals.

Proposition 7.26 Let f(x, y) be continuous on {(x, y) ∈ R² | a ≤ x < ∞, c ≤ y ≤ d}. Suppose that I(y) = ∫_a^∞ f(x, y) dx converges uniformly with respect to y ∈ [c, d].
Then I(y) is continuous on [c, d].
Proof. This proof was not carried out in the lecture. Let ε > 0. Since the improper integral converges uniformly, there exists A₀ > 0 such that for all A ≥ A₀ we have

    |∫_A^∞ f(x, y) dx| < ε

for all y ∈ [c, d]. Let A ≥ A₀ be fixed. On {(x, y) ∈ R² | a ≤ x ≤ A, c ≤ y ≤ d}, f(x, y) is uniformly continuous; hence there is a δ > 0 such that |x − x′| < δ and |y − y′| < δ imply

    |f(x′, y′) − f(x, y)| < ε/(A − a).

Therefore,

    ∫_a^A |f(x, y) − f(x, y₀)| dx < (ε/(A − a))(A − a) = ε   for |y − y₀| < δ.

Finally,

    |I(y) − I(y₀)| ≤ ∫_a^A |f(x, y) − f(x, y₀)| dx + |∫_A^∞ f(x, y) dx| + |∫_A^∞ f(x, y₀) dx| < 3ε

for |y − y₀| < δ. ∎
We skip the proof of the following proposition.

Proposition 7.27 Let f_y(x, y) be continuous on {(x, y) | a ≤ x < ∞, c ≤ y ≤ d}, and let f(x, y) be continuous with respect to x for every fixed y ∈ [c, d]. Suppose that for all y ∈ [c, d] the integral I(y) = ∫_a^∞ f(x, y) dx exists and the integral ∫_a^∞ f_y(x, y) dx converges uniformly with respect to y ∈ [c, d].
Then I(y) is differentiable and I′(y) = ∫_a^∞ f_y(x, y) dx.

Combining the results of the last proposition and Proposition 7.25 we get the following corollary.

Corollary 7.28 Let f_y(x, y) be continuous on {(x, y) | a ≤ x < ∞, c ≤ y ≤ d}, and let f(x, y) be continuous with respect to x for every fixed y ∈ [c, d]. Suppose that
(a) for all y ∈ [c, d] the integral I(y) = ∫_a^∞ f(x, y) dx exists,
(b) |f_y(x, y)| ≤ φ(x) for all x ≥ a and all y ∈ [c, d],
(c) ∫_a^∞ φ(x) dx exists.
Then I(y) is differentiable and I′(y) = ∫_a^∞ f_y(x, y) dx.
Example 7.20 (a) I(y) = ∫_0^∞ e^{−x²} cos(2yx) dx. Here f(x, y) = e^{−x²} cos(2yx) and f_y(x, y) = −2x sin(2yx) e^{−x²}; the integral ∫_0^∞ f_y(x, y) dx converges uniformly with respect to y since

    |f_y(x, y)| ≤ 2x e^{−x²} ≤ K e^{−x²/2}.

Hence,

    I′(y) = −∫_0^∞ 2x sin(2yx) e^{−x²} dx.

Integration by parts with u = sin(2yx), v′ = −2x e^{−x²} gives u′ = 2y cos(2yx), v = e^{−x²}, and

    −∫_0^A e^{−x²} 2x sin(2yx) dx = sin(2yA) e^{−A²} − ∫_0^A 2y cos(2yx) e^{−x²} dx.

As A → ∞ the first summand on the right tends to 0; thus I(y) satisfies the ordinary differential equation

    I′(y) = −2y I(y).

ODE: y′ = −2xy; dy = −2xy dx; dy/y = −2x dx. Integration yields log y = −x² + c; y = C e^{−x²}.
The general solution is I(y) = C e^{−y²}. We determine the constant C by inserting y = 0. Since I(0) = ∫_0^∞ e^{−x²} dx = √π/2, we find

    I(y) = (√π/2) e^{−y²}.
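The closed form can be verified numerically (our sketch; a midpoint rule on a truncated interval, since the tail of e^{−x²} beyond x = 10 is far below machine precision):

```python
import math

def I(y, L=10.0, n=200000):
    # midpoint rule for the Gaussian integral with oscillating factor on [0, L]
    h = L / n
    return sum(math.exp(-((k + 0.5) * h) ** 2) * math.cos(2 * y * (k + 0.5) * h)
               for k in range(n)) * h

y = 0.8
approx = I(y)
closed = math.sqrt(math.pi) / 2 * math.exp(-y * y)  # (sqrt(pi)/2) e^{-y^2}
```

The quadrature reproduces (√π/2)e^{−y²} to high accuracy.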
(b) The Gamma function Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt is in C^∞(R₊). Let x > 0, say x ∈ [c, d] with 0 < c < d. Recall from Subsection 5.3.3 the definition and the proof of the convergence of the improper integrals Γ₁(x) = ∫_0^1 f(x, t) dt and Γ₂(x) = ∫_1^∞ f(x, t) dt, where f(x, t) = t^{x−1} e^{−t}. Note that Γ₁(x) is an improper integral at t = 0 + 0.

[Figure: graph of the Gamma function on (0, 3)]

By L'Hospital's rule, lim_{t→0+0} t^ε log t = 0 for all ε > 0. In particular, |log t| < t^{−c/2} if 0 < t < t₀ < 1.
Since e^{−t} < 1 and moreover t^{x−1} < t^{c−1} for t < t₀ by Lemma 1.23 (b), we conclude that

    |f_x(x, t)| = |t^{x−1} log t · e^{−t}| ≤ |log t| t^{c−1} ≤ t^{−c/2} t^{c−1} = 1/t^{1−c/2}

for 0 < t < t₀. Since φ(t) = 1/t^{1−c/2} is integrable over [0, 1], Γ₁(x) is differentiable by the Corollary, with Γ₁′(x) = ∫_0^1 t^{x−1} log t · e^{−t} dt. Similarly, Γ₂(x) is an improper integral over the unbounded interval [1, +∞); for sufficiently large t ≥ t₀ > 1 we have log t < t and t^x < t^d, such that

    |f_x(x, t)| = t^{x−1} log t · e^{−t} ≤ t^x e^{−t} ≤ t^d e^{−t} = t^d e^{−t/2} · e^{−t/2} ≤ M e^{−t/2}.

Since t^d e^{−t/2} tends to 0 as t → ∞, it is bounded by some constant M, and e^{−t/2} is integrable on [1, +∞), such that Γ₂(x) is differentiable with

    Γ₂′(x) = ∫_1^∞ t^{x−1} log t · e^{−t} dt.

Consequently, Γ(x) is differentiable for all x > 0 with

    Γ′(x) = ∫_0^∞ t^{x−1} log t · e^{−t} dt.

Similarly one can show that Γ ∈ C^∞(R_{>0}) with

    Γ^{(k)}(x) = ∫_0^∞ t^{x−1} (log t)^k e^{−t} dt.
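The integral formula for Γ′ can be compared with a finite difference of Python's built-in math.gamma (a rough check of ours; the quadrature is a midpoint rule on a truncated interval, which is adequate for x = 2.5 since the integrand vanishes at both ends):

```python
import math

def gamma_deriv(x, n=200000, L=40.0):
    # midpoint rule for the integral of t^(x-1) * log(t) * e^(-t) over (0, L]
    h = L / n
    s = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        s += t ** (x - 1) * math.log(t) * math.exp(-t)
    return s * h

x = 2.5
eps = 1e-5
fd = (math.gamma(x + eps) - math.gamma(x - eps)) / (2 * eps)  # finite-difference Gamma'
gd = gamma_deriv(x)
```

Both approximations of Γ′(2.5) agree to several digits.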

7.9 Appendix

Proof of Proposition 7.7. Let A = (A_ij) = (∂f_i/∂x_j) be the matrix of partial derivatives, considered as a linear map from Rⁿ to Rᵐ. Our aim is to show that

    lim_{h→0} ‖f(a + h) − f(a) − A h‖ / ‖h‖ = 0.

For this, it suffices by Proposition 6.26 to prove the convergence to 0 for each coordinate i = 1, …, m:

    lim_{h→0} (f_i(a + h) − f_i(a) − Σ_{j=1}^{n} A_ij h_j) / ‖h‖ = 0.    (7.52)

Without loss of generality we assume m = 1 and f = f₁. For simplicity, let n = 2, f = f(x, y), a = (a, b), and h = (h, k). Note first that by the mean value theorem we have

    f(a + h, b + k) − f(a, b) = f(a + h, b + k) − f(a, b + k) + f(a, b + k) − f(a, b)
                              = ∂f/∂x (ξ, b + k) h + ∂f/∂y (a, η) k,

where ξ ∈ (a, a + h) and η ∈ (b, b + k). Using this, the expression in (7.52) reads

    (f(a + h, b + k) − f(a, b) − ∂f/∂x (a, b) h − ∂f/∂y (a, b) k) / √(h² + k²)
      = ((∂f/∂x (ξ, b + k) − ∂f/∂x (a, b)) h + (∂f/∂y (a, η) − ∂f/∂y (a, b)) k) / √(h² + k²).

Since both ∂f/∂x and ∂f/∂y are continuous at (a, b), given ε > 0 we find δ > 0 such that (x, y) ∈ U_δ((a, b)) implies

    |∂f/∂x (x, y) − ∂f/∂x (a, b)| < ε,   |∂f/∂y (x, y) − ∂f/∂y (a, b)| < ε.

This shows

    |(∂f/∂x (ξ, b + k) − ∂f/∂x (a, b)) h + (∂f/∂y (a, η) − ∂f/∂y (a, b)) k| / √(h² + k²)
      ≤ ε (|h| + |k|) / √(h² + k²) ≤ 2ε;

hence f is differentiable at (a, b) with Jacobi matrix A = (∂f/∂x (a, b), ∂f/∂y (a, b)).
Since both components of A (the partial derivatives) are continuous functions of (x, y), the assignment x ↦ f′(x) is continuous by Proposition 6.26. ∎

Chapter 8

Curves and Line Integrals

8.1 Rectifiable Curves

8.1.1 Curves in Rᵏ

We consider curves in Rᵏ. We define the tangent vector, regular points, and the angle of intersection.

Definition 8.1 A curve in Rᵏ is a continuous mapping γ : I → Rᵏ, where I ⊆ R is a closed interval consisting of more than one point.
The interval can be I = [a, b], I = [a, +∞), or I = R. In the first case γ(a) and γ(b) are called the initial and end point of γ. These two points define a natural orientation of the curve, from γ(a) to γ(b). Replacing γ(t) by γ(a + b − t) we obtain the curve from γ(b) to γ(a) with the opposite orientation.
If γ(a) = γ(b), γ is said to be a closed curve. The curve is given by a k-tuple γ = (γ₁, …, γ_k) of continuous real-valued functions. If γ is differentiable, the curve is said to be differentiable.
Note that we have defined the curve to be a mapping, not a set of points in Rᵏ. Of course, with each curve γ in Rᵏ there is associated a subset of Rᵏ, namely the image of γ,

    C = γ(I) = {γ(t) ∈ Rᵏ | t ∈ I},

but different curves may have the same image C = γ(I). The curve γ is said to be simple if γ is injective on the inner points I° of I. A simple curve has no self-intersection.
Example 8.1 (a) A circle in R² of radius r > 0 with center (0, 0) is described by the curve

    γ : [0, 2π] → R²,   γ(t) = (r cos t, r sin t).

Note that δ : [0, 4π] → R² with δ(t) = (r cos t, r sin t) has the same image but is different from γ; γ is a simple curve, δ is not.
(b) Let p, q ∈ Rᵏ be fixed points, p ≠ q. Then

    γ₁(t) = (1 − t)p + tq,   t ∈ [0, 1],
    γ₂(t) = (1 − t)p + tq,   t ∈ R,

are the segment from p to q and the line through p and q, respectively. If v ∈ Rᵏ is a vector, then γ₃(t) = p + tv, t ∈ R, is the line through p with direction v.
(c) If f : [a, b] → R is a continuous function, the graph of f is a curve in R²:

    γ : [a, b] → R²,   γ(t) = (t, f(t)).

(d) Implicit curves. Let F : U ⊆ R² → R be continuously differentiable, F(a, b) = 0, and grad F(a, b) ≠ 0 for some point (a, b) ∈ U. By the implicit function theorem, F(x, y) = 0 can locally be solved for y = g(x) or for x = f(y). In both cases γ(t) = (t, g(t)) or γ(t) = (f(t), t) is a curve through (a, b). For example,

    F(x, y) = y² − x³ − x² = 0

is locally solvable except at (a, b) = (0, 0). The corresponding curve is Newton's knot.

Definition 8.2 (a) A simple curve γ : I → Rᵏ is said to be regular at t₀ if γ is continuously differentiable on I and γ′(t₀) ≠ 0. γ is regular if it is regular at every point t₀ ∈ I.
(b) The vector γ′(t₀) is called the tangent vector; ℓ(t) = γ(t₀) + t γ′(t₀), t ∈ R, is called the tangent line to the curve at the point γ(t₀).
Remark 8.1 The moving particle. Let t be the time variable and s(t) the coordinates of a point moving in Rᵏ. Then the tangent vector v(t) = s′(t) is the velocity vector of the moving point. The instantaneous velocity is the euclidean norm of v(t), ‖v(t)‖ = √(s₁′(t)² + ⋯ + s_k′(t)²). The acceleration vector is the second derivative of s(t), a(t) = v′(t) = s″(t).

Let γᵢ : Iᵢ → Rᵏ, i = 1, 2, be two regular curves with a common point γ₁(t₁) = γ₂(t₂). The angle of intersection α between the two curves γᵢ at tᵢ is defined to be the angle between the two tangent lines at γ₁(t₁) = γ₂(t₂). Hence,

    cos α = γ₁′(t₁)·γ₂′(t₂) / (‖γ₁′(t₁)‖ ‖γ₂′(t₂)‖),   α ∈ [0, π].
[Figure: Newton's knot]

Example 8.2 (a) Newton's knot. The curve γ : R → R² given by γ(t) = (t² − 1, t³ − t) is not injective since γ(1) = γ(−1) = (0, 0) =: x₀. The point x₀ is a double point of the curve. In general γ has two different tangent lines at a double point. Since γ′(t) = (2t, 3t² − 1) we have γ′(1) = (2, 2) and γ′(−1) = (−2, 2). The curve is regular since γ′(t) ≠ 0 for all t.

Let us compute the angle of self-intersection. Since γ(1) = γ(−1) = (0, 0), the self-intersection angle α satisfies

    cos α = (2, 2)·(−2, 2) / 8 = 0,
hence α = 90°; the intersection is orthogonal.

(b) Neil's parabola. Let γ : R → R² be given by γ(t) = (t², t³). Since γ′(t) = (2t, 3t²), the origin is the only singular point.
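The self-intersection computation for Newton's knot can be replayed directly (a small check of ours):

```python
import math

def gamma_dot(t):
    # derivative of gamma(t) = (t^2 - 1, t^3 - t)
    return (2 * t, 3 * t * t - 1)

v1 = gamma_dot(1.0)    # tangent vector (2, 2) at t = 1
v2 = gamma_dot(-1.0)   # tangent vector (-2, 2) at t = -1
dot = v1[0] * v2[0] + v1[1] * v2[1]
cos_alpha = dot / (math.hypot(*v1) * math.hypot(*v2))
alpha = math.degrees(math.acos(cos_alpha))
```

The angle comes out as 90 degrees, matching the computation above.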

8.1.2 Rectifiable Curves

The goal of this subsection is to define the length of a curve. For differentiable curves, there is a formula using the tangent vector. However, the length of a curve makes sense for some non-differentiable, continuous curves.
Let γ : [a, b] → Rᵏ be a curve. We associate to each partition P = {t₀, …, t_n} of [a, b] the points xᵢ = γ(tᵢ), i = 0, …, n, and the number

    Λ(P, γ) = Σ_{i=1}^{n} ‖γ(tᵢ) − γ(tᵢ₋₁)‖.    (8.1)

The ith term in this sum is the euclidean distance of the points xᵢ₋₁ = γ(tᵢ₋₁) and xᵢ = γ(tᵢ). Hence Λ(P, γ) is the length of the polygonal path with vertices x₀, …, x_n. As our partition becomes finer and finer, this polygon approaches the image of γ more and more closely.

[Figure: a curve with an inscribed polygonal path with vertices x₀, …, x₃]

Definition 8.3 A curve γ : [a, b] → Rᵏ is said to be rectifiable if the set of non-negative real numbers {Λ(P, γ) | P is a partition of [a, b]} is bounded. In this case

    Λ(γ) = sup_P Λ(P, γ),

where the supremum is taken over all partitions P of [a, b], is called the length of γ.
In certain cases, Λ(γ) is given by a Riemann integral. We shall prove this for continuously differentiable curves, i.e. for curves γ whose derivative γ′ is continuous.

Proposition 8.1 If γ′ is continuous on [a, b], then γ is rectifiable, and

    Λ(γ) = ∫_a^b ‖γ′(t)‖ dt.

Proof. If a ≤ tᵢ₋₁ < tᵢ ≤ b, by Theorem 5.28, γ(tᵢ) − γ(tᵢ₋₁) = ∫_{tᵢ₋₁}^{tᵢ} γ′(t) dt. Applying Proposition 5.29 we have

    ‖γ(tᵢ) − γ(tᵢ₋₁)‖ = ‖∫_{tᵢ₋₁}^{tᵢ} γ′(t) dt‖ ≤ ∫_{tᵢ₋₁}^{tᵢ} ‖γ′(t)‖ dt.

Hence

    Λ(P, γ) ≤ ∫_a^b ‖γ′(t)‖ dt

for every partition P of [a, b]. Consequently,

    Λ(γ) ≤ ∫_a^b ‖γ′(t)‖ dt.

To prove the opposite inequality, let ε > 0 be given. Since γ′ is uniformly continuous on [a, b], there exists δ > 0 such that

    ‖γ′(s) − γ′(t)‖ < ε   if |s − t| < δ.

Let P be a partition with Δtᵢ < δ for all i. If tᵢ₋₁ ≤ t ≤ tᵢ it follows that

    ‖γ′(t)‖ ≤ ‖γ′(tᵢ)‖ + ε.

Hence

    ∫_{tᵢ₋₁}^{tᵢ} ‖γ′(t)‖ dt ≤ ‖γ′(tᵢ)‖ Δtᵢ + ε Δtᵢ
      = ‖∫_{tᵢ₋₁}^{tᵢ} (γ′(t) + (γ′(tᵢ) − γ′(t))) dt‖ + ε Δtᵢ
      ≤ ‖∫_{tᵢ₋₁}^{tᵢ} γ′(t) dt‖ + ‖∫_{tᵢ₋₁}^{tᵢ} (γ′(tᵢ) − γ′(t)) dt‖ + ε Δtᵢ
      ≤ ‖γ(tᵢ) − γ(tᵢ₋₁)‖ + 2ε Δtᵢ.

If we add these inequalities, we obtain

    ∫_a^b ‖γ′(t)‖ dt ≤ Λ(P, γ) + 2ε(b − a) ≤ Λ(γ) + 2ε(b − a).

Since ε was arbitrary,

    ∫_a^b ‖γ′(t)‖ dt ≤ Λ(γ).

This completes the proof. ∎

Special Case k = 2

Let k = 2, γ(t) = (x(t), y(t)), t ∈ [a, b]. Then

    Λ(γ) = ∫_a^b √(x′(t)² + y′(t)²) dt.

In particular, let γ(t) = (t, f(t)) be the graph of a continuously differentiable function f : [a, b] → R. Then Λ(γ) = ∫_a^b √(1 + f′(t)²) dt.

Example 8.3 (a) Catenary curve. Let f(t) = a cosh(t/a), t ∈ [0, b], a, b > 0. Then f′(t) = sinh(t/a), and therefore

    Λ(γ) = ∫_0^b √(1 + sinh²(t/a)) dt = ∫_0^b cosh(t/a) dt = a sinh(t/a) |_0^b = a sinh(b/a).
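A quick check of the closed form (ours, not in the notes): compare the length of an inscribed polygon, as in Definition 8.3, with a sinh(b/a) for sample values of a and b.

```python
import math

def catenary_length(a, b, n=100000):
    # polygonal approximation of the graph of f(t) = a*cosh(t/a) on [0, b]
    h = b / n
    pts = [(k * h, a * math.cosh(k * h / a)) for k in range(n + 1)]
    return sum(math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(pts, pts[1:]))

a, b = 1.5, 2.0
approx = catenary_length(a, b)
closed = a * math.sinh(b / a)
```

The polygonal length converges to the integral value, illustrating Proposition 8.1.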

(b) The position of a bulge in a bicycle tire as it rolls down the street can be parametrized by an angle θ as shown in the figure.

[Figure: cycloid together with its defining circle]

Let the radius of the tire be a. It can be verified by plane trigonometry that

    γ(θ) = (a(θ − sin θ), a(1 − cos θ)).

This curve is called a cycloid.
Find the distance travelled by the bulge for 0 ≤ θ ≤ 2π.
Using 1 − cos θ = 2 sin²(θ/2) we have

    γ′(θ) = a(1 − cos θ, sin θ),
    ‖γ′(θ)‖ = a √((1 − cos θ)² + sin² θ) = a √(2 − 2 cos θ) = a √2 √(1 − cos θ) = 2a sin(θ/2).

Therefore,

    Λ(γ) = 2a ∫_0^{2π} sin(θ/2) dθ = 4a (−cos(θ/2)) |_0^{2π} = 4a(−cos π + cos 0) = 8a.
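The value 8a can also be recovered from the polygonal definition of length (a sketch of ours, for a = 1):

```python
import math

def cycloid(theta, a=1.0):
    # gamma(theta) = (a(theta - sin theta), a(1 - cos theta))
    return (a * (theta - math.sin(theta)), a * (1 - math.cos(theta)))

def polygonal_length(n=100000):
    # length of the inscribed polygon for 0 <= theta <= 2*pi
    pts = [cycloid(2 * math.pi * k / n) for k in range(n + 1)]
    return sum(math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in zip(pts, pts[1:]))

L = polygonal_length()
```

Despite the cusps at θ = 0 and θ = 2π, the polygonal lengths converge to 8.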

(c) The arc element ds. Formally, the arc element of a plane differentiable curve can be computed using the Pythagorean theorem:

    (ds)² = (dx)² + (dy)²  ⟹  ds = √(dx² + dy²),
    ds = dx √(1 + dy²/dx²),
    ds = √(1 + f′(x)²) dx.
(d) Arc of an ellipse. The ellipse with equation x²/a² + y²/b² = 1, 0 < b ≤ a, is parametrized by γ(t) = (a cos t, b sin t), t ∈ [0, t₀], such that γ′(t) = (−a sin t, b cos t). Hence,

    Λ(γ) = ∫_0^{t₀} √(a² sin² t + b² cos² t) dt = ∫_0^{t₀} √(a² − (a² − b²) cos² t) dt
         = a ∫_0^{t₀} √(1 − ε² cos² t) dt,

where ε = √(a² − b²)/a. This integral can be transformed into the function

    E(ε, φ) = ∫_0^{φ} √(1 − ε² sin² t) dt,

the elliptic integral of the second kind as defined in Chapter 5.

(e) A non-rectifiable curve. Consider the graph γ(t) = (t, f(t)), t ∈ [0, 1], of the function f,

    f(t) = t cos(π/(2t)),  0 < t ≤ 1;   f(0) = 0.

Since lim_{t→0+0} f(t) = f(0) = 0, f is continuous and γ(t) is a curve. However, this curve is not rectifiable. Indeed, choose the partition P_k = {t₀ = 0, 1/(4k), 1/(4k − 2), …, 1/4, 1/2, t_{2k+1} = 1} with tᵢ = 1/(4k − 2i + 2), i = 1, …, 2k. Note that t₀ = 0 and t_{2k+1} = 1 play a special role and will be omitted in the calculations below. Then cos(π/(2tᵢ)) = cos((2k − i + 1)π) alternates between +1 and −1, so f(tᵢ) = ±tᵢ with alternating signs and |f(tᵢ) − f(tᵢ₋₁)| = tᵢ + tᵢ₋₁. Thus

    Λ(P_k, γ) ≥ Σ_{i=2}^{2k} √((tᵢ − tᵢ₋₁)² + (f(tᵢ) − f(tᵢ₋₁))²) ≥ Σ_{i=2}^{2k} |f(tᵢ) − f(tᵢ₋₁)|
             = (1/(4k − 2) + 1/(4k)) + ⋯ + (1/2 + 1/4)
             = 1/2 + 2(1/4 + 1/6 + ⋯ + 1/(4k − 2)) + 1/(4k),

which is unbounded as k → ∞ since the harmonic series is unbounded. Hence γ is not rectifiable.
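The divergence can be watched numerically (a sketch of ours): the polygonal lengths Λ(P_k, γ) grow like the harmonic series as k increases.

```python
import math

def f(t):
    # the oscillating function from Example 8.3 (e)
    return 0.0 if t == 0 else t * math.cos(math.pi / (2 * t))

def poly_len(k):
    # partition 0 < 1/(4k) < 1/(4k-2) < ... < 1/4 < 1/2 < 1
    ts = [0.0] + [1.0 / (4 * k - 2 * i + 2) for i in range(1, 2 * k + 1)] + [1.0]
    return sum(math.hypot(b - a, f(b) - f(a)) for a, b in zip(ts, ts[1:]))
```

For instance, poly_len(80) exceeds poly_len(10) by roughly log 8, consistent with the harmonic growth above.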

8.2 Line Integrals

A lot of physical applications are to be found in [MW85, Chapter 18]. Integration of vector fields along curves is of fundamental importance in both mathematics and physics. We use the concept of work to motivate the material in this section.
The motion of an object is described by a parametric curve x⃗ = x⃗(t) = (x(t), y(t), z(t)). By differentiating this function, we obtain the velocity v⃗(t) = x⃗′(t) and the acceleration a⃗(t) = x⃗″(t); here x⃗′(t) and x⃗″(t) denote derivatives with respect to the time t.
According to Newton's law, the total force F acting on an object of mass m is

    F = m a⃗.

Since the kinetic energy K is defined by K = ½ m v⃗² = ½ m v⃗·v⃗, we have

    K′(t) = ½ m (v⃗′·v⃗ + v⃗·v⃗′) = m a⃗·v⃗ = F·v⃗.

The total change of the kinetic energy from time t₁ to t₂, denoted W, is called the work done by the force F along the path x⃗(t):

    W = ∫_{t₁}^{t₂} K′(t) dt = ∫_{t₁}^{t₂} F·v⃗ dt = ∫_{t₁}^{t₂} F(t)·x⃗′(t) dt.
Let us now suppose that the force F at time t depends only on the position x⃗(t). That is, we assume that there is a vector field F⃗(x⃗) such that F(t) = F⃗(x⃗(t)) (gravitational and electrostatic attraction are position-dependent, while magnetic forces are velocity-dependent). Then we may rewrite the above integral as

    W = ∫_{t₁}^{t₂} F⃗(x⃗(t))·x⃗′(t) dt.

In the one-dimensional case, by a change of variables, this can be simplified to

    W = ∫_a^b F(x) dx,

where a and b are the starting and ending positions.

Definition 8.4 Let Γ = {x⃗(t) | t ∈ [r, s]} be a continuously differentiable curve, x⃗(t) ∈ C¹([r, s]), in Rⁿ and f⃗ : Γ → Rⁿ a continuous vector field on Γ. The integral

    ∫_Γ f⃗(x⃗)·dx⃗ = ∫_r^s f⃗(x⃗(t))·x⃗′(t) dt

is called the line integral of the vector field f⃗ along the curve Γ.

Remark 8.2 (a) The definition of the line integral does not depend on the parametrization of Γ.
(b) If we take different curves between the same endpoints, the line integral may be different.
(c) If the vector field f⃗ is orthogonal to the tangent vector, then ∫_Γ f⃗·dx⃗ = 0.
(d) Other notations. If f⃗ = (P, Q) is a vector field in R²,

    ∫_Γ f⃗·dx⃗ = ∫_Γ P dx + Q dy,

where the right side is either a symbol or ∫_Γ P dx = ∫_Γ (P, 0)·dx⃗.
R
Example 8.4 (a) Find the line integral ∫_{Γᵢ} y dx + (x − y) dy, i = 1, 2, where

    Γ₁ = {x⃗(t) = (t, t²) | t ∈ [0, 1]}   and   Γ₂ = Γ₃ ∪ Γ₄,

with Γ₃ = {(t, 0) | t ∈ [0, 1]}, Γ₄ = {(1, t) | t ∈ [0, 1]}. Both paths run from (0, 0) to (1, 1), Γ₂ via the corner point (1, 0).
In the first case x⃗′(t) = (1, 2t); hence

    ∫_{Γ₁} y dx + (x − y) dy = ∫_0^1 (t²·1 + (t − t²)·2t) dt = ∫_0^1 (3t² − 2t³) dt = 1/2.

In the second case ∫_{Γ₂} f⃗·dx⃗ = ∫_{Γ₃} f⃗·dx⃗ + ∫_{Γ₄} f⃗·dx⃗. For the first part (dx, dy) = (dt, 0), for the second part (dx, dy) = (0, dt), such that

    ∫_{Γ₂} f⃗·dx⃗ = ∫_0^1 (0 + (t − 0)·0) dt + ∫_0^1 (1 − t) dt = (t − t²/2)|_0^1 = 1/2.
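Both computations can be reproduced with a small quadrature routine (ours, not from the notes; a midpoint rule over a parametrization of each path):

```python
import math

def line_integral(path, dpath, t0, t1, n=20000):
    # midpoint rule for the line integral of y dx + (x - y) dy along a path
    h = (t1 - t0) / n
    s = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        x, y = path(t)
        dx, dy = dpath(t)
        s += (y * dx + (x - y) * dy) * h
    return s

# Gamma_1: the parabola (t, t^2); Gamma_2 = Gamma_3 + Gamma_4 along the axes
I1 = line_integral(lambda t: (t, t * t), lambda t: (1.0, 2 * t), 0.0, 1.0)
I3 = line_integral(lambda t: (t, 0.0), lambda t: (1.0, 0.0), 0.0, 1.0)
I4 = line_integral(lambda t: (1.0, t), lambda t: (0.0, 1.0), 0.0, 1.0)
```

Both routes give the value 1/2; for this particular field the integral happens to be path independent (in fact y dx + (x − y) dy has the potential xy − y²/2).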
(b) Find the work done by the force field F⃗(x, y, z) = (−y, x, 1) as a particle moves from (1, 0, 0) to (1, 0, 1) along the following paths, λ = ±1:

    x⃗(t) = (cos λt, sin λt, t/(2π)),   t ∈ [0, 2π].

We find

    W = ∫_Γ F⃗·dx⃗ = ∫_0^{2π} (−sin λt, cos λt, 1)·(−λ sin λt, λ cos λt, 1/(2π)) dt
      = ∫_0^{2π} (λ sin² λt + λ cos² λt + 1/(2π)) dt = 2πλ + 1.

In case λ = 1, the motion is with the force, so the work 2π + 1 is positive; for the path with λ = −1, the motion is against the force and the work 1 − 2π is negative.
We can also define a scalar line integral in the following way. Let γ : [a, b] → Rⁿ be a continuously differentiable curve, Γ = γ([a, b]), and f : Γ → R a continuous function. The integral

    ∫_Γ f(x) ds := ∫_a^b f(γ(t)) ‖γ′(t)‖ dt

is called the scalar line integral of f along Γ.


Properties of Line Integrals

Remark 8.3 (a) Linearity.

    ∫_Γ (f⃗ + g⃗)·dx⃗ = ∫_Γ f⃗·dx⃗ + ∫_Γ g⃗·dx⃗,    ∫_Γ λf⃗·dx⃗ = λ ∫_Γ f⃗·dx⃗.

(b) Change of orientation. If x⃗(t), t ∈ [r, s], defines a curve Γ which goes from a = x⃗(r) to b = x⃗(s), then y⃗(t) = x⃗(r + s − t), t ∈ [r, s], defines the curve −Γ which goes in the opposite direction, from b to a. It is easy to see that

    ∫_{−Γ} f⃗·dx⃗ = −∫_Γ f⃗·dx⃗.

(c) Triangle inequality.

    |∫_Γ f⃗·dx⃗| ≤ ℓ(Γ) sup_{x ∈ Γ} ‖f⃗(x)‖,

where ℓ(Γ) is the length of Γ.

Proof. Let x⃗(t), t ∈ [t₀, t₁], be a parametrization of Γ; then, by the triangle inequality and the Cauchy–Schwarz inequality,

    |∫_Γ f⃗·dx⃗| = |∫_{t₀}^{t₁} f⃗(x⃗(t))·x⃗′(t) dt| ≤ ∫_{t₀}^{t₁} ‖f⃗(x⃗(t))‖ ‖x⃗′(t)‖ dt
      ≤ sup_{x⃗ ∈ Γ} ‖f⃗(x⃗)‖ ∫_{t₀}^{t₁} ‖x⃗′(t)‖ dt = sup_{x⃗ ∈ Γ} ‖f⃗(x⃗)‖ ℓ(Γ). ∎

(d) Splitting. If Γ₁ and Γ₂ are two curves such that the ending point of Γ₁ equals the starting point of Γ₂, then

    ∫_{Γ₁ ∪ Γ₂} f⃗·dx⃗ = ∫_{Γ₁} f⃗·dx⃗ + ∫_{Γ₂} f⃗·dx⃗.

8.2.1 Path Independence

Problem: For which vector fields f⃗ is the line integral from a to b independent of the chosen path (see Example 8.4 (a))?

Definition 8.5 A vector field f⃗ : G → Rⁿ, G ⊆ Rⁿ, is called conservative if for any points a and b in G and any curves Γ₁ and Γ₂ from a to b we have

    ∫_{Γ₁} f⃗·dx⃗ = ∫_{Γ₂} f⃗·dx⃗.

In this case we say that the line integral ∫ f⃗·dx⃗ is path independent and we use the notation ∫_a^b f⃗·dx⃗.

Definition 8.6 A vector field f⃗ : G → Rⁿ is called a potential field or gradient vector field if there exists a continuously differentiable function U : G → R such that f⃗(x) = grad U(x) for x ∈ G. We call U the potential or antiderivative of f⃗.
Example 8.5 The gravitational force is given by

    F⃗(x) = −γ x/‖x‖³,

where γ = mM. It is a potential field with potential

    U(x) = γ/‖x‖.

This follows from Example 7.2 (a), grad f(‖x‖) = f′(‖x‖) x/‖x‖, with f(y) = γ/y and f′(y) = −γ/y².
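The relation grad U = F⃗ is easy to confirm numerically (our sketch; the value of the constant γ below is an arbitrary placeholder):

```python
import math

gamma_const = 2.0  # placeholder for the physical constant

def U(x):
    # potential U(x) = gamma / ||x||
    r = math.sqrt(sum(c * c for c in x))
    return gamma_const / r

def F(x):
    # force field F(x) = -gamma * x / ||x||^3
    r = math.sqrt(sum(c * c for c in x))
    return [-gamma_const * c / r ** 3 for c in x]

def grad_U(x, h=1e-6):
    # central-difference gradient of U
    g = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((U(xp) - U(xm)) / (2 * h))
    return g

x = [1.0, 2.0, -0.5]
```

The finite-difference gradient of U matches F component by component.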

Remark 8.4 (a) A vector field f~ is conservative if and only if the line integral over any closed
curve in G is 0. Indeed, suppose that f~ is conservative and = 1 2 is a closed curve,
where 1 is a curve from a to b and 2 is a curve from b to a. By Remark 8.3 (b), changing the
orientation of 2 , the sign of the line integral changes and 2 is again a curve from a to b:
Z
Z
Z
Z 
Z 
~
~
+

f d~x =
f d~x =
f~ d~x = 0.

The proof of the other direction is similar.

(b) Uniqueness of a potential. An open subset G \subseteq R^n is said to be connected if any two
points x, y \in G can be connected by a polygonal path from x to y inside G. If a potential U(x)
exists, it is uniquely determined up to a constant.

8 Curves and Line Integrals


Indeed, if grad U_1(x) = grad U_2(x) = \vec f, put U = U_1 - U_2; then grad U = grad U_1 - grad U_2 =
\vec f - \vec f = 0. Now suppose that x, y \in G can be connected by a segment \overline{xy} inside G. By
the MVT (Corollary 7.12),
U(y) - U(x) = grad U((1-t)x + ty) \cdot (y - x) = 0,
since grad U = 0 on the segment \overline{xy}. This shows that U = U_1 - U_2 is constant on any polygonal
path inside G. Since G is connected, U_1 - U_2 is constant on G.
Theorem 8.2 Let G \subseteq R^n be a domain.
(i) If U : G \to R is continuously differentiable and \vec f = grad U, then \vec f is conservative, and
for every (piecewise continuously differentiable) curve \gamma from a to b, a, b \in G, we have
\int_\gamma \vec f \cdot d\vec x = U(b) - U(a).
(ii) Let \vec f : G \to R^n be a continuous, conservative vector field and a \in G. Put
U(x) = \int_a^x \vec f \cdot d\vec y, \quad x \in G.
Then U(x) is a potential for \vec f, that is, grad U = \vec f.
(iii) A continuous vector field \vec f is conservative in G if and only if it is a potential field.
Proof. (i) Let \gamma = \{\vec x(t) \mid t \in [r, s]\} be a continuously differentiable curve from a = \vec x(r) to
b = \vec x(s). We define \phi(t) = U(\vec x(t)) and compute the derivative using the chain rule:
\phi'(t) = grad U(\vec x(t)) \cdot \vec x'(t) = \vec f(\vec x(t)) \cdot \vec x'(t).
By definition of the line integral we have
\int_\gamma \vec f \cdot d\vec x = \int_r^s \vec f(\vec x(t)) \cdot \vec x'(t)\, dt.
Inserting the above expression and applying the fundamental theorem of calculus, we find
\int_\gamma \vec f \cdot d\vec x = \int_r^s \phi'(t)\, dt = \phi(s) - \phi(r) = U(\vec x(s)) - U(\vec x(r)) = U(b) - U(a).
(ii) Choose h \in R^n small such that x + th \in G for all t \in [0, 1]. By the path independence of
the line integral,
U(x + h) - U(x) = \int_a^{x+h} \vec f \cdot d\vec y - \int_a^x \vec f \cdot d\vec y = \int_x^{x+h} \vec f \cdot d\vec y.
Consider the curve \vec x(t) = x + th, t \in [0, 1], from x to x + h. Then \vec x'(t) = h. By the mean value
theorem of integration (Theorem 5.18 with \phi = 1, a = 0 and b = 1) we have
\int_x^{x+h} \vec f \cdot d\vec y = \int_0^1 \vec f(\vec x(t)) \cdot h\, dt = \vec f(x + \theta h) \cdot h,

where \theta \in [0, 1]. We check grad U(x) = \vec f(x) using the definition of the derivative:
\frac{|U(x+h) - U(x) - \vec f(x) \cdot h|}{\|h\|} = \frac{|(\vec f(x + \theta h) - \vec f(x)) \cdot h|}{\|h\|} \overset{CSI}{\le} \frac{\|\vec f(x + \theta h) - \vec f(x)\|\, \|h\|}{\|h\|}
= \|\vec f(x + \theta h) - \vec f(x)\| \xrightarrow[h \to 0]{} 0,
since \vec f is continuous at x. This shows that \nabla U = \vec f.


(iii) follows immediately from (i) and (ii).

Remark 8.5 (a) In case n = 2, a simple path to compute the line integral (and so the potential
U) in (ii) consists of 2 segments: from (0, 0) via (x, 0) to (x, y). The line integral of P\,dx + Q\,dy
then reads as ordinary Riemann integrals
U(x, y) = \int_0^x P(t, 0)\, dt + \int_0^y Q(x, t)\, dt.

(b) Case n = 3. You can also use just one single segment from the origin to the endpoint
(x, y, z). This path is parametrized by the curve
\vec x(t) = (tx, ty, tz), \quad t \in [0, 1], \quad \vec x'(t) = (x, y, z).
We obtain
U(x, y, z) = \int_{(0,0,0)}^{(x,y,z)} f_1\, dx + f_2\, dy + f_3\, dz   (8.2)
= x \int_0^1 f_1(tx, ty, tz)\, dt + y \int_0^1 f_2(tx, ty, tz)\, dt + z \int_0^1 f_3(tx, ty, tz)\, dt.   (8.3)

(c) Although Theorem 8.2 gives a necessary and sufficient condition for a vector field to be
conservative, we are missing an easy criterion.
Recall from Example 7.4 that a necessary condition for \vec f = (f_1, \dots, f_n) to be a potential vector
field is
\frac{\partial f_i}{\partial x_j} = \frac{\partial f_j}{\partial x_i}, \quad 1 \le i < j \le n,
which is a simple consequence of Schwarz's lemma, since if f_i = U_{x_i} then
\frac{\partial f_i}{\partial x_j} = U_{x_i x_j} = U_{x_j x_i} = \frac{\partial f_j}{\partial x_i}.
The condition \partial f_i/\partial x_j = \partial f_j/\partial x_i, 1 \le i < j \le n, is called the integrability condition for \vec f. It is a
necessary condition for \vec f to be conservative. However, it is not sufficient.


Remark 8.6 Counterexample. Let G = R^2 \setminus \{(0, 0)\} and
\vec f = (P, Q) = \Bigl( \frac{-y}{x^2 + y^2},\ \frac{x}{x^2 + y^2} \Bigr).
The vector field satisfies the integrability condition P_y = Q_x. However, it is not conservative.
For, consider the unit circle \gamma(t) = (\cos t, \sin t), t \in [0, 2\pi]. Then \gamma'(t) = (-\sin t, \cos t) and
\int_\gamma \vec f \cdot d\vec x = \int_\gamma \frac{-y\, dx + x\, dy}{x^2 + y^2} = \int_0^{2\pi} \frac{\sin^2 t + \cos^2 t}{1}\, dt = \int_0^{2\pi} dt = 2\pi.
This contradicts \int_\gamma \vec f \cdot d\vec x = 0 for conservative vector fields. Hence, \vec f is not conservative.
\vec f fails to be conservative since G = R^2 \setminus \{(0, 0)\} has a hole.
For more details, see homework 30.1.
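The value 2\pi can be checked numerically. This sketch is an addition, not part of the original notes; it applies the midpoint rule to the parametrized integrand, which on the unit circle is identically 1:

```python
import math

def P(x, y):
    return -y / (x**2 + y**2)

def Q(x, y):
    return x / (x**2 + y**2)

def circle_integral(n=1000):
    # Midpoint rule for \int_0^{2pi} (P,Q)(cos t, sin t) . (-sin t, cos t) dt
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        total += (P(x, y) * -math.sin(t) + Q(x, y) * math.cos(t)) * h
    return total

print(circle_integral())  # close to 2*pi, not 0
```

The nonzero answer over a closed curve is exactly the obstruction to conservativity discussed above.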
The next proposition shows that under one additional assumption this criterion is also sufficient.
A connected open subset G (a region) of R^n is called simply connected if every closed polygonal
path inside G can be shrunk inside G to a single point.
Roughly speaking, simply connected sets do not have holes.
convex subset of R^n: simply connected
1-torus S^1 = \{z \in C \mid |z| = 1\}: not simply connected
annulus \{(x, y) \in R^2 \mid r^2 < x^2 + y^2 < R^2\}, 0 \le r < R: not simply connected
R^2 \setminus \{(0, 0)\}: not simply connected
R^3 \setminus \{(0, 0, 0)\}: simply connected

The precise mathematical term for a curve to be shrinkable to a point is to be nullhomotopic.


Definition 8.7 (a) A closed curve \gamma : [a, b] \to G, G \subseteq R^n open, is said to be null-homotopic if
there exists a continuous mapping h : [a, b] \times [0, 1] \to G and a point x_0 \in G such that
(a) h(t, 0) = \gamma(t) for all t,
(b) h(t, 1) = x_0 for all t,
(c) h(a, s) = h(b, s) = x_0 for all s \in [0, 1].
(b) G is simply connected if any closed curve in G is null-homotopic.
Proposition 8.3 Let \vec f = (f_1, f_2, f_3) be a continuously differentiable vector field on a region G \subseteq R^3.
(a) If \vec f is conservative, then curl \vec f = 0, i.e.
\frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} = 0, \quad \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1} = 0, \quad \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} = 0.
(b) If curl \vec f = 0 and G is simply connected, then \vec f is conservative.


Proof. (a) Let \vec f be conservative; by Theorem 8.2 there exists a potential U, grad U = \vec f.
However, curl grad U = 0 since, for instance,
\frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} = \frac{\partial^2 U}{\partial x_2 \partial x_3} - \frac{\partial^2 U}{\partial x_3 \partial x_2} = 0
by Schwarz's Lemma.
(b) This will be an application of Stokes' theorem, see below.


Example 8.6 Let, on R^3, \vec f = (P, Q, R) = (6xy^2 + e^x, 6x^2y, 1). Then
curl \vec f = (R_y - Q_z, P_z - R_x, Q_x - P_y) = (0, 0, 12xy - 12xy) = 0;
hence, \vec f is conservative with a potential U(x, y, z).
First method to compute the potential U: ODE ansatz.
The ansatz U_x = 6xy^2 + e^x is integrated with respect to x:
U(x, y, z) = \int U_x\, dx + C(y, z) = \int (6xy^2 + e^x)\, dx + C(y, z) = 3x^2y^2 + e^x + C(y, z).
Hence,
U_y = 6x^2y + C_y(y, z) \overset{!}{=} 6x^2y, \quad U_z = C_z \overset{!}{=} 1.
This implies C_y = 0 and C_z = 1. The solution here is C(y, z) = z + c_1, such that U =
3x^2y^2 + e^x + z + c_1.
Second method: Line Integrals. See Remark 8.5 (b):
U(x, y, z) = x \int_0^1 f_1(tx, ty, tz)\, dt + y \int_0^1 f_2(tx, ty, tz)\, dt + z \int_0^1 f_3(tx, ty, tz)\, dt
= x \int_0^1 (6t^3xy^2 + e^{tx})\, dt + y \int_0^1 6t^3x^2y\, dt + z \int_0^1 dt
= 3x^2y^2 + e^x - 1 + z,
which differs from the potential found by the first method only by a constant, as it must.
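As a numerical cross-check (an addition, not part of the notes), one can approximate the line integral of \vec f along the straight segment from the origin to a point b and compare it with the difference of the potential; the endpoint b below is an arbitrary choice:

```python
import math

def f(x, y, z):
    # f = (6xy^2 + e^x, 6x^2 y, 1) from Example 8.6
    return (6 * x * y**2 + math.exp(x), 6 * x**2 * y, 1.0)

def U(x, y, z):
    # potential computed above (determined up to a constant)
    return 3 * x**2 * y**2 + math.exp(x) + z

def segment_integral(b, n=4000):
    # Midpoint rule for \int_0^1 f(t*b) . b dt, the line integral from 0 to b
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        fx, fy, fz = f(t * b[0], t * b[1], t * b[2])
        total += (fx * b[0] + fy * b[1] + fz * b[2]) * h
    return total

b = (1.0, 2.0, -1.0)  # arbitrary endpoint
print(segment_integral(b), U(*b) - U(0.0, 0.0, 0.0))
```

Both numbers agree, as Theorem 8.2 (i) predicts.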


Chapter 9
Integration of Functions of Several
Variables
References to this chapter are [ON75, Section 4], which is quite elementary and easily accessible. Another elementary approach is [MW85, Chapter 17] (part III). A more advanced but still
quite accessible treatment is [Spi65, Chapter 3]. This will be our main reference here. Rudin's
book [Rud76] is not recommended for an introduction to integration.

9.1 Basic Definition


The definition of the Riemann integral of a function f : A \to R, where A \subseteq R^n is a closed
rectangle, is so similar to that of the ordinary integral that a rapid treatment will be given; see
Section 5.1.
If nothing else is specified, A denotes a rectangle. A rectangle A is the cartesian product
of n intervals,
A = [a_1, b_1] \times \dots \times [a_n, b_n] = \{(x_1, \dots, x_n) \in R^n \mid a_k \le x_k \le b_k,\ k = 1, \dots, n\}.
Recall that a partition of a closed interval [a, b] is a sequence t_0, \dots, t_k where a = t_0 \le t_1 \le
\dots \le t_k = b. The partition divides the interval [a, b] into k subintervals [t_{i-1}, t_i]. A partition
of a rectangle [a_1, b_1] \times \dots \times [a_n, b_n] is a collection P = (P_1, \dots, P_n) where each P_i is a
partition of the interval [a_i, b_i]. Suppose, for example, that P_1 = (t_0, \dots, t_k) is a partition of
[a_1, b_1] and P_2 = (s_0, \dots, s_l) is a partition of [a_2, b_2]. Then the partition P = (P_1, P_2) of
[a_1, b_1] \times [a_2, b_2] divides the closed rectangle [a_1, b_1] \times [a_2, b_2] into kl subrectangles, a typical
one being [t_{i-1}, t_i] \times [s_{j-1}, s_j]. In general, if P_i divides [a_i, b_i] into N_i subintervals, then P =
(P_1, \dots, P_n) divides [a_1, b_1] \times \dots \times [a_n, b_n] into N_1 \cdots N_n subrectangles. These subrectangles
will be called subrectangles of the partition P.
Suppose now A is a rectangle, f : A \to R is a bounded function, and P is a partition of A. For
each subrectangle S of the partition let
m_S = \inf\{f(x) \mid x \in S\}, \quad M_S = \sup\{f(x) \mid x \in S\},
and let v(S) be the volume of the rectangle S. Note that the volume of the rectangle
A = [a_1, b_1] \times \dots \times [a_n, b_n] is
v(A) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n).
The lower and the upper sums of f for P are defined by
L(P, f) = \sum_S m_S\, v(S) \quad and \quad U(P, f) = \sum_S M_S\, v(S),
where the sum is taken over all subrectangles S of the partition P. Clearly, if f is bounded with
m \le f(x) \le M on the rectangle A, then
m\, v(A) \le L(P, f) \le U(P, f) \le M\, v(A),
so that the numbers L(P, f) and U(P, f) form bounded sets. Lemma 5.1 remains true; the proof
is completely the same.
Lemma 9.1 (a) Suppose the partition P' is a refinement of P (that is, each subrectangle of P'
is contained in a subrectangle of P). Then
L(P, f) \le L(P', f) \quad and \quad U(P', f) \le U(P, f).
(b) If P and P' are any two partitions, then L(P, f) \le U(P', f).
It follows from the above lemma that all lower sums are bounded above by any upper sum
and vice versa.
Definition 9.1 Let f : A \to R be a bounded function. The function f is called Riemann integrable on the rectangle A if
\underline{\int_A} f\, dx := \sup_P \{L(P, f)\} = \inf_P \{U(P, f)\} =: \overline{\int_A} f\, dx,
where the supremum and the infimum are taken over all partitions P of A. This common
number is the Riemann integral of f on A and is denoted by
\int_A f\, dx \quad or \quad \int_A f(x_1, \dots, x_n)\, dx_1 \cdots dx_n.
\underline{\int_A} f\, dx and \overline{\int_A} f\, dx are called the lower and the upper integral of f on A, respectively. They
always exist. The set of integrable functions on A is denoted by R(A).
As in the one-dimensional case we have the following criterion.

Proposition 9.2 (Riemann Criterion) A bounded function f : A \to R is integrable if and only
if for every \epsilon > 0 there exists a partition P of A such that U(P, f) - L(P, f) < \epsilon.

Example 9.1 (a) Let f : A \to R be a constant function, f(x) = c. Then for any partition P and
any subrectangle S we have m_S = M_S = c, so that
L(P, f) = U(P, f) = \sum_S c\, v(S) = c \sum_S v(S) = c\, v(A).
Hence, \int_A c\, dx = c\, v(A).
(b) Let f : [0, 1] \times [0, 1] \to R be defined by
f(x, y) = 0 if x is rational, \quad f(x, y) = 1 if x is irrational.
If P is a partition, then every subrectangle S will contain points (x, y) with x rational, and also
points (x, y) with x irrational. Hence m_S = 0 and M_S = 1, so
L(P, f) = \sum_S 0 \cdot v(S) = 0 \quad and \quad U(P, f) = \sum_S 1 \cdot v(S) = v(A) = v([0, 1] \times [0, 1]) = 1.
Therefore, \overline{\int_A} f\, dx = 1 \ne 0 = \underline{\int_A} f\, dx, and f is not integrable.
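The squeeze between lower and upper sums can be watched numerically. The following sketch is an addition to the notes; it computes L(P, f) and U(P, f) for f(x, y) = xy on [0, 1]^2, using that this f increases in each variable, so the infimum and supremum over a subsquare sit at opposite corners:

```python
def darboux_sums(f, n):
    # Uniform n x n partition of [0,1]^2; f is increasing in each
    # variable, so m_S and M_S sit at the lower-left / upper-right corners.
    h = 1.0 / n
    lower = upper = 0.0
    for i in range(n):
        for j in range(n):
            lower += f(i * h, j * h) * h * h
            upper += f((i + 1) * h, (j + 1) * h) * h * h
    return lower, upper

f = lambda x, y: x * y   # \int_A xy dxdy = 1/4
lo, up = darboux_sums(f, 200)
print(lo, up)  # both approach 1/4 as n grows
```

For the Riemann criterion, note that U(P, f) - L(P, f) shrinks as the partition is refined.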

9.1.1 Properties of the Riemann Integral


We briefly write R for R(A).
Remark 9.1 (a) R is a linear space and \int_A (\cdot)\, dx is a linear functional, i.e. f, g \in R imply
\alpha f + \beta g \in R for all \alpha, \beta \in R and
\int_A (\alpha f + \beta g)\, dx = \alpha \int_A f\, dx + \beta \int_A g\, dx.
(b) R is a lattice, i.e., f \in R implies |f| \in R. If f, g \in R, then \max\{f, g\} \in R and
\min\{f, g\} \in R.
(c) R is an algebra, i.e., f, g \in R imply f g \in R.
(d) The triangle inequality holds:
\Bigl| \int_A f\, dx \Bigr| \le \int_A |f|\, dx.
(e) C(A) \subseteq R(A).
(f) If f \in R(A), f(A) \subseteq [a, b], and g \in C[a, b], then g \circ f \in R(A).
(g) If f \in R and f = g except at finitely many points, then g \in R and \int_A f\, dx = \int_A g\, dx.
(h) Let f : A \to R and let P be a partition of A. Then f \in R(A) if and only if f|_S is
integrable for each subrectangle S. In this case
\int_A f\, dx = \sum_S \int_S f|_S\, dx.


9.2 Integrable Functions


We are going to characterize integrable functions. For this, we need the notion of a set of measure
zero.
Definition 9.2 Let A be a subset of R^n. A has (n-dimensional) measure zero if for every \epsilon > 0
there exists a sequence (U_i)_{i \in N} of closed rectangles U_i which cover A such that \sum_{i=1}^\infty v(U_i) < \epsilon.
Open rectangles can also be used in the definition.

Remark 9.2 (a) Any finite set \{a_1, \dots, a_m\} \subseteq R^n is of measure 0. Indeed, let \epsilon > 0 and
choose U_i to be a rectangle with midpoint a_i and volume \epsilon/m. Then \{U_i \mid i = 1, \dots, m\} covers
A and \sum_i v(U_i) \le \epsilon.
(b) Any countable set is of measure 0.
(d) If each A_i, i \in N, has measure 0, then A = A_1 \cup A_2 \cup \dots has measure 0.
Proof. Let \epsilon > 0. Since A_i has measure 0 there exist closed rectangles U_{ik}, i \in N, k \in N,
such that for fixed i the family \{U_{ik} \mid k \in N\} covers A_i, i.e. \bigcup_{k \in N} U_{ik} \supseteq A_i, and
\sum_{k \in N} v(U_{ik}) \le \epsilon/2^{i-1}, i \in N. In this way we have constructed an infinite array \{U_{ik}\} which
covers A. Arranging those sets in a sequence (cf. Cantor's first diagonal process), we obtain a
sequence of rectangles which covers A and
\sum_{i,k=1}^\infty v(U_{ik}) \le \sum_{i=1}^\infty \frac{\epsilon}{2^{i-1}} = 2\epsilon.
Hence, \sum_{i,k=1}^\infty v(U_{ik}) \le 2\epsilon and A has measure 0.

(e) Let A = [a_1, b_1] \times \dots \times [a_n, b_n] be a non-singular rectangle, that is, a_i < b_i for all i = 1, \dots, n.
Then A is not of measure 0. Indeed, we use the following two facts about the volume of finite
unions of rectangles:
(i) v(U_1 \cup \dots \cup U_n) \le \sum_{i=1}^n v(U_i),
(ii) U \subseteq V implies v(U) \le v(V).
Now let \epsilon = v(A)/2 = (b_1 - a_1) \cdots (b_n - a_n)/2 and suppose that the open rectangles (U_i)_{i \in N}
cover the compact set A. Then there exists a finite subcover U_1 \cup \dots \cup U_m \supseteq A. This and (i),
(ii) imply
\epsilon < v(A) \le v\Bigl( \bigcup_{i=1}^m U_i \Bigr) \le \sum_{i=1}^m v(U_i).
This contradicts \sum_{i=1}^\infty v(U_i) \le \epsilon; thus, A is not of measure 0.

Theorem 9.3 Let A be a closed rectangle and f : A \to R a bounded function. Let
B = \{x \in A \mid f is discontinuous at x\}.
Then f is integrable if and only if B is a set of measure 0.
For the proof see [Spi65, 3-8 Theorem] or [Rud76, Theorem 11.33].


9.2.1 Integration over More General Sets


We have so far dealt only with integrals of functions over rectangles. Integrals over other sets
are easily reduced to this type.
If C \subseteq R^n, the characteristic function \chi_C of C is defined by
\chi_C(x) = 1 if x \in C, \quad \chi_C(x) = 0 if x \notin C.


Definition 9.3 Let f : C \to R be bounded and A a rectangle with C \subseteq A. We call f Riemann integrable on C if the
product function f \chi_C : A \to R is Riemann integrable on A. In this case we define
\int_C f\, dx = \int_A f \chi_C\, dx.
This certainly occurs if both f and \chi_C are integrable on
A. Note that \int_C 1\, dx = \int_A \chi_C\, dx =: v(C) is defined to
be the volume or measure of C.
Problem: Under which conditions on C does the volume v(C) = \int_C dx exist? By Theorem 9.3,
\chi_C is integrable if and only if the set B of discontinuities of \chi_C in A has measure 0.
The boundary of a set C

Let C \subseteq A. For every x \in A exactly one of the following three cases occurs:
(a) x has a neighborhood which is completely contained in C (x is an inner point of C),
(b) x has a neighborhood which is completely contained in C^c (x is an inner point of C^c),
(c) every neighborhood of x intersects both C and C^c. In this case we say x belongs to the
boundary \partial C of C. By definition \partial C = \overline{C} \cap \overline{C^c};
also \partial C = \overline{C} \setminus C^\circ.
By the above discussion, A is the disjoint union of two open sets and a closed set:
A = C^\circ \cup \partial C \cup (C^c)^\circ.
Theorem 9.4 The characteristic function \chi_C : A \to R is integrable if and only if the boundary
of C has measure 0.
Proof. Since the boundary \partial C is closed and contained in the bounded set A, \partial C is compact. Suppose
first x is an inner point of C. Then there is an open set U \subseteq C containing x. Thus \chi_C = 1
on U; clearly \chi_C is continuous at x (since it is locally constant). Similarly, if x is an inner
point of C^c, \chi_C is locally constant, namely \chi_C = 0 in a neighborhood of x. Hence \chi_C is
continuous at x. Finally, if x is in the boundary of C, for every open neighborhood U of x there
are y_1 \in U \cap C and y_2 \in U \cap C^c, so that \chi_C(y_1) = 1 whereas \chi_C(y_2) = 0. Hence, \chi_C is not
continuous at x. Thus, the set of discontinuities of \chi_C is exactly the boundary \partial C. The rest
follows from Theorem 9.3.

Definition 9.4 A bounded set C is called Jordan measurable or simply a Jordan set if its boundary has measure 0. The integral v(C) = \int_C 1\, dx is called the n-dimensional Jordan measure of
C or the n-dimensional volume of C; sometimes we write \mu(C) in place of v(C).
Naturally, the one-dimensional volume is the length, and the two-dimensional volume is the
area.
Typical Examples of Jordan Measurable Sets
Hyperplanes \sum_{i=1}^n a_i x_i = c and, more generally, hypersurfaces f(x_1, \dots, x_n) = c, f \in C^1(G),
are sets of measure 0 in R^n. Curves in R^n have measure 0. Graphs of continuous functions,
\Gamma_f = \{(x, f(x)) \in R^{n+1} \mid x \in G\}, are of measure 0 in R^{n+1}. If G is a bounded
region in R^n, the boundary \partial G has measure 0. If G \subseteq R^n is a region, the cylinder
C = \partial G \times R = \{(x, x_{n+1}) \mid x \in \partial G\} \subseteq R^{n+1} is a measure-0 set.
Let D \subseteq R^{n+1} be given by
D = \{(x, x_{n+1}) \mid x \in K,\ 0 \le x_{n+1} \le f(x)\},
where K \subseteq R^n is a compact set and f : K \to R is continuous. Then D is Jordan
measurable. Indeed, D is bounded by the graph \Gamma_f, the hyperplane x_{n+1} = 0,
and the cylinder \partial K \times R = \{(x, x_{n+1}) \mid x \in \partial K\}, and all of these have measure 0 in
R^{n+1}.

9.2.2 Fubini's Theorem and Iterated Integrals
Our goal is to evaluate Riemann integrals; however, so far there was no method to compute
multiple integrals. The following theorem fills this gap.
Theorem 9.5 (Fubini's Theorem) Let A \subseteq R^n and B \subseteq R^m be closed rectangles, and let
f : A \times B \to R be integrable. For x \in A let g_x : B \to R be defined by g_x(y) = f(x, y) and let
L(x) = \underline{\int_B} g_x\, dy = \underline{\int_B} f(x, y)\, dy, \quad U(x) = \overline{\int_B} g_x\, dy = \overline{\int_B} f(x, y)\, dy.
Then L(x) and U(x) are integrable on A and
\int_{A \times B} f\, dxdy = \int_A L(x)\, dx = \int_A \Bigl( \underline{\int_B} f(x, y)\, dy \Bigr) dx,
\int_{A \times B} f\, dxdy = \int_A U(x)\, dx = \int_A \Bigl( \overline{\int_B} f(x, y)\, dy \Bigr) dx.
The integrals on the right are called iterated integrals.
The proof is in the appendix to this chapter.
Remarks 9.3 (a) A similar proof shows that we can exchange the order of integration:
\int_{A \times B} f\, dxdy = \int_B \Bigl( \underline{\int_A} f(x, y)\, dx \Bigr) dy = \int_B \Bigl( \overline{\int_A} f(x, y)\, dx \Bigr) dy.
These integrals are called iterated integrals for f.
(b) In practice it is often the case that each g_x is integrable, so that \int_{A \times B} f\, dxdy =
\int_A \bigl( \int_B f(x, y)\, dy \bigr) dx. This certainly occurs if f is continuous.
(c) If A = [a_1, b_1] \times \dots \times [a_n, b_n] and f : A \to R is continuous, we can apply Fubini's theorem
repeatedly to obtain
\int_A f\, dx = \int_{a_n}^{b_n} \Bigl( \cdots \Bigl( \int_{a_1}^{b_1} f(x_1, \dots, x_n)\, dx_1 \Bigr) \cdots \Bigr) dx_n.
(d) If C \subseteq A \times B, Fubini's theorem can be used to compute \int_C f\, dx since this is by definition
\int_{A \times B} f \chi_C\, dx. Here are two examples in case n = 2 and n = 3.
Let a < b and let \phi(x) and \psi(x) be continuous real-valued functions on
[a, b] with \phi(x) < \psi(x) on [a, b]. Put
C = \{(x, y) \in R^2 \mid a \le x \le b,\ \phi(x) \le y \le \psi(x)\}.
Let f(x, y) be continuous on C. Then f is integrable on C and
\iint_C f\, dxdy = \int_a^b \Bigl( \int_{\phi(x)}^{\psi(x)} f(x, y)\, dy \Bigr) dx.
Let
G = \{(x, y, z) \in R^3 \mid a \le x \le b,\ \phi(x) \le y \le \psi(x),\ g(x, y) \le z \le h(x, y)\},
where all functions are sufficiently nice. Then
\iiint_G f(x, y, z)\, dxdydz = \int_a^b \Bigl( \int_{\phi(x)}^{\psi(x)} \Bigl( \int_{g(x,y)}^{h(x,y)} f(x, y, z)\, dz \Bigr) dy \Bigr) dx.

(e) Cavalieri's Principle. Let A and B be Jordan sets in R^3 and let A_c = \{(x, y) \mid (x, y, c) \in
A\} be the section of A with the plane z = c; B_c is defined similarly. Suppose each A_c and B_c is
Jordan measurable (in R^2) and they have the same area, v(A_c) = v(B_c), for all c \in R.
Then A and B have the same volume, v(A) = v(B).
Example 9.2 (a) Let f(x, y) = xy and
C = \{(x, y) \in R^2 \mid 0 \le x \le 1,\ x^2 \le y \le x\} = \{(x, y) \in R^2 \mid 0 \le y \le 1,\ y \le x \le \sqrt{y}\}.
Then
\iint_C xy\, dxdy = \int_0^1 \Bigl( \int_{x^2}^x xy\, dy \Bigr) dx = \int_0^1 \frac{1}{2} x y^2 \Big|_{y=x^2}^{x} dx = \frac{1}{2} \int_0^1 (x^3 - x^5)\, dx
= \Bigl( \frac{x^4}{8} - \frac{x^6}{12} \Bigr) \Big|_0^1 = \frac{1}{8} - \frac{1}{12} = \frac{1}{24}.
Interchanging the order of integration we obtain
\iint_C xy\, dxdy = \int_0^1 \Bigl( \int_y^{\sqrt{y}} xy\, dx \Bigr) dy = \int_0^1 \frac{1}{2} x^2 y \Big|_{x=y}^{\sqrt{y}} dy = \frac{1}{2} \int_0^1 (y^2 - y^3)\, dy
= \Bigl( \frac{y^3}{6} - \frac{y^4}{8} \Bigr) \Big|_0^1 = \frac{1}{6} - \frac{1}{8} = \frac{1}{24}.

(b) Let G = \{(x, y, z) \in R^3 \mid x, y, z \ge 0,\ x + y + z \le 1\} and f(x, y, z) = 1/(x + y + z + 1)^3.
The set G can be parametrized as follows:
\iiint_G f\, dxdydz = \int_0^1 \Bigl( \int_0^{1-x} \Bigl( \int_0^{1-x-y} \frac{dz}{(1 + x + y + z)^3} \Bigr) dy \Bigr) dx
= \int_0^1 \Bigl( \int_0^{1-x} -\frac{1}{2} \frac{1}{(1 + x + y + z)^2} \Big|_{z=0}^{1-x-y} dy \Bigr) dx
= \int_0^1 \Bigl( \int_0^{1-x} \Bigl( \frac{1}{2(1 + x + y)^2} - \frac{1}{8} \Bigr) dy \Bigr) dx
= \frac{1}{2} \int_0^1 \Bigl( \frac{1}{x + 1} + \frac{x}{4} - \frac{3}{4} \Bigr) dx = \frac{1}{2} \log 2 - \frac{5}{16}.
(c) Let f(x, y) = e^{y/x} and let D be the quadrilateral region with vertices (1, 1), (1, 2), (2, 2), and (2, 4). Compute
the integral of f on D.
D can be parametrized as follows: D = \{(x, y) \mid 1 \le x \le 2,\ x \le y \le 2x\}. Hence,
\iint_D f\, dxdy = \int_1^2 dx \int_x^{2x} e^{y/x}\, dy = \int_1^2 x e^{y/x} \Big|_{y=x}^{2x} dx = \int_1^2 (e^2 - e)\, x\, dx = \frac{3}{2}(e^2 - e).
But trying to reverse the order of integration we encounter two problems. First, we must break
D into several regions:
\iint_D f\, dxdy = \int_1^2 dy \int_1^y e^{y/x}\, dx + \int_2^4 dy \int_{y/2}^2 e^{y/x}\, dx.
This is not a serious problem. A greater problem is that, as a function of x, e^{y/x} has no elementary antiderivative, so
\int_1^y e^{y/x}\, dx and \int_{y/2}^2 e^{y/x}\, dx are very difficult to evaluate. In this example, there is a considerable
advantage of one order of integration over the other.
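Numerically (an added sketch, not part of the notes), the convenient order of integration is easy to carry out:

```python
import math

def integral_dy_dx(n=500):
    # Midpoint rule for \int_1^2 \int_x^{2x} e^{y/x} dy dx
    hx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = 1 + (i + 0.5) * hx
        hy = x / n          # y runs over [x, 2x]
        for j in range(n):
            y = x + (j + 0.5) * hy
            total += math.exp(y / x) * hy * hx
    return total

exact = 1.5 * (math.e**2 - math.e)
print(integral_dy_dx(), exact)
```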

The Cosmopolitan Integral
Let f \ge 0 be a continuous real-valued function on [a, b].
There are some very special solids whose volumes can
be expressed by integrals. The simplest such solid G is a
volume of revolution obtained by revolving the region
under the graph of f \ge 0 on [a, b] around the horizontal
axis. We apply Fubini's theorem to the set
G = \{(x, y, z) \in R^3 \mid a \le x \le b,\ y^2 + z^2 \le f(x)^2\}.
Consequently, the volume v(G) is given by
v(G) = \iiint_G dxdydz = \int_a^b \Bigl( \iint_{G_x} dydz \Bigr) dx,   (9.1)
where G_x = \{(y, z) \in R^2 \mid y^2 + z^2 \le f(x)^2\} is the closed disc of radius f(x) around (0, 0).
For any fixed x \in [a, b] its area is v(G_x) = \iint_{G_x} dydz = \pi f(x)^2. Hence
v(G) = \pi \int_a^b f(x)^2\, dx.   (9.2)

Example 9.3 We compute the volume of the ellipsoid obtained by revolving the graph of the
ellipse
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1
around the x-axis. We have y^2 = f(x)^2 = b^2 \bigl( 1 - \frac{x^2}{a^2} \bigr); hence
v(G) = \pi b^2 \int_{-a}^a \Bigl( 1 - \frac{x^2}{a^2} \Bigr) dx = \pi b^2 \Bigl( x - \frac{x^3}{3a^2} \Bigr) \Big|_{-a}^a = \pi b^2 \Bigl( 2a - \frac{2a^3}{3a^2} \Bigr) = \frac{4}{3} \pi a b^2.

9.3 Change of Variable
We want to generalize the change of variables formula \int_{g(a)}^{g(b)} f(x)\, dx = \int_a^b f(g(y)) g'(y)\, dy.
If B_R is the ball in R^3 with radius R around the origin, we have in cartesian coordinates
\iiint_{B_R} f\, dxdydz = \int_{-R}^R dx \int_{-\sqrt{R^2 - x^2}}^{\sqrt{R^2 - x^2}} dy \int_{-\sqrt{R^2 - x^2 - y^2}}^{\sqrt{R^2 - x^2 - y^2}} dz\, f(x, y, z).

Usually, the complicated limits yield hard computations. Here spherical coordinates are appropriate.
To motivate the formula, consider the area of a parallelogram D in the x-y-plane spanned by the
two vectors a = (a_1, a_2) and b = (b_1, b_2):
D = \{\lambda a + \mu b \mid \lambda, \mu \in [0, 1]\} = \{(g_1(\lambda, \mu), g_2(\lambda, \mu)) \mid \lambda, \mu \in [0, 1]\},
where g_1(\lambda, \mu) = \lambda a_1 + \mu b_1 and g_2(\lambda, \mu) = \lambda a_2 + \mu b_2. As known from linear algebra, the area
of D equals the norm of the vector product:
v(D) = \|a \times b\| = \Bigl\| \det \begin{pmatrix} e_1 & e_2 & e_3 \\ a_1 & a_2 & 0 \\ b_1 & b_2 & 0 \end{pmatrix} \Bigr\| = \|(0, 0, a_1 b_2 - a_2 b_1)\| = |a_1 b_2 - a_2 b_1| =: d.
Introducing new variables \lambda and \mu with
x = \lambda a_1 + \mu b_1, \quad y = \lambda a_2 + \mu b_2,
the parallelogram D in the x-y-plane corresponds to the unit square C = [0, 1] \times [0, 1] in the \lambda-\mu-plane,
and D = g(C). We want to compare the area d of D with the area 1 of C. Note that d is exactly
the absolute value of the Jacobian \frac{\partial(g_1, g_2)}{\partial(\lambda, \mu)}; indeed
\frac{\partial(g_1, g_2)}{\partial(\lambda, \mu)} = \det \begin{pmatrix} \frac{\partial g_1}{\partial \lambda} & \frac{\partial g_2}{\partial \lambda} \\ \frac{\partial g_1}{\partial \mu} & \frac{\partial g_2}{\partial \mu} \end{pmatrix} = \det \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} = a_1 b_2 - a_2 b_1.
Hence,
\iint_D dxdy = \iint_C \Bigl| \frac{\partial(g_1, g_2)}{\partial(\lambda, \mu)} \Bigr|\, d\lambda d\mu.
This is true for any R^n and any regular map g : C \to D.

Theorem 9.6 (Change of variable) Let C and D be compact Jordan sets in R^n, and let M \subseteq C be a
set of measure 0. Let g : C \to D be continuously differentiable with the following properties:
(i) g is injective on C \setminus M.
(ii) g'(x) is regular on C \setminus M.
Let f : D \to R be continuous. Then
\int_D f(y)\, dy = \int_C f(g(x)) \Bigl| \frac{\partial(g_1, \dots, g_n)}{\partial(x_1, \dots, x_n)}(x) \Bigr|\, dx.   (9.3)


Remark 9.4 Why the absolute value of the Jacobian? In R^1 we don't have the absolute value.
But in contrast to R^n, n > 1, we have an orientation of the integration set: \int_a^b f\, dx = -\int_b^a f\, dx.
For the proof see [Rud76, 10.9 Theorem]. The main steps of the proof are: 1) In a small open
set, g can be written as the composition of n flips and n primitive mappings. A flip exchanges
two variables x_i and x_k, whereas a primitive mapping H is equal to the identity except for one
variable, H(x) = x + (h(x) - x_m) e_m, where h : U \to R.
2) If the statement is true for transformations S and T, then it is true for the composition S \circ T,
which follows from \det(AB) = \det A \det B.
3) Use a partition of unity.
Example 9.4 (a) Polar coordinates. Let A = \{(r, \phi) \mid 0 \le r \le R,\ 0 \le \phi < 2\pi\} be a rectangle
in polar coordinates. The mapping g(r, \phi) = (x, y), x = r \cos\phi, y = r \sin\phi, maps this
rectangle continuously differentiably onto the disc D with radius R. Let M = \{(r, \phi) \mid r = 0\}.
Since \frac{\partial(x, y)}{\partial(r, \phi)} = r, the map g is bijective and regular on A \setminus M. The assumptions of the theorem
are satisfied and we have
\iint_D f(x, y)\, dxdy = \iint_A f(r \cos\phi, r \sin\phi)\, r\, drd\phi \overset{Fubini}{=} \int_0^R \int_0^{2\pi} f(r \cos\phi, r \sin\phi)\, r\, d\phi\, dr.
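As a small consistency check (an addition to the notes), take f(x, y) = x^2 + y^2 on the disc of radius R; the polar formula collapses to a one-dimensional integral with value \pi R^4 / 2:

```python
import math

def polar_integral(R, n=2000):
    # \int_0^R \int_0^{2pi} r^2 * r dphi dr; the phi-integration gives 2*pi
    hr = R / n
    return 2 * math.pi * sum(((k + 0.5) * hr) ** 3 for k in range(n)) * hr

R = 1.5
print(polar_integral(R), math.pi * R**4 / 2)
```

The factor r in the integrand is precisely the Jacobian of the polar-coordinates map.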

(b) Spherical coordinates. Recall from the exercise class the spherical coordinates r \in [0, \infty),
\phi \in [0, 2\pi], and \theta \in [0, \pi]:
x = r \sin\theta \cos\phi, \quad y = r \sin\theta \sin\phi, \quad z = r \cos\theta.
The Jacobian reads
\frac{\partial(x, y, z)}{\partial(r, \theta, \phi)} = \det \begin{pmatrix} x_r & x_\theta & x_\phi \\ y_r & y_\theta & y_\phi \\ z_r & z_\theta & z_\phi \end{pmatrix} = \det \begin{pmatrix} \sin\theta \cos\phi & r \cos\theta \cos\phi & -r \sin\theta \sin\phi \\ \sin\theta \sin\phi & r \cos\theta \sin\phi & r \sin\theta \cos\phi \\ \cos\theta & -r \sin\theta & 0 \end{pmatrix} = r^2 \sin\theta.
Sometimes one simply writes \frac{\partial(x, y, z)}{\partial(r, \theta, \phi)} = r^2 \sin\theta.
Hence, if B_1 is the unit ball,
\iiint_{B_1} f(x, y, z)\, dxdydz = \int_0^1 \int_0^\pi \int_0^{2\pi} f(r \sin\theta \cos\phi, r \sin\theta \sin\phi, r \cos\theta)\, r^2 \sin\theta\, d\phi\, d\theta\, dr.

This example was not covered in the lecture. Compute the volume of the ellipsoid E given by
u^2/a^2 + v^2/b^2 + w^2/c^2 = 1. We use scaled spherical coordinates:
u = a r \sin\theta \cos\phi, \quad v = b r \sin\theta \sin\phi, \quad w = c r \cos\theta,
where r \in [0, 1], \theta \in [0, \pi], \phi \in [0, 2\pi]. Since the rows of the spherical Jacobian matrix \frac{\partial(x, y, z)}{\partial(r, \theta, \phi)}
are simply multiplied by a, b, and c, respectively, we have
\frac{\partial(u, v, w)}{\partial(r, \theta, \phi)} = abc\, r^2 \sin\theta.
Hence, if B_1 is the unit ball around 0, we have, using iterated integrals,
v(E) = \iiint_E dudvdw = abc \iiint_{B_1} r^2 \sin\theta\, drd\theta d\phi
= abc \int_0^1 r^2\, dr \int_0^\pi \sin\theta\, d\theta \int_0^{2\pi} d\phi = abc \cdot \frac{1}{3} \cdot (-\cos\theta)\Big|_0^\pi \cdot 2\pi = \frac{4}{3}\pi\, abc.

(c) Compute \iint_C (x^2 + y^2)\, dxdy where C is the region in the first quadrant bounded by the four hyperbolas
xy = 1, \quad xy = 2, \quad x^2 - y^2 = 1, \quad x^2 - y^2 = 4.
We change coordinates, g(x, y) = (u, v):
u = xy, \quad v = x^2 - y^2.
The Jacobian is
\frac{\partial(u, v)}{\partial(x, y)} = \det \begin{pmatrix} y & x \\ 2x & -2y \end{pmatrix} = -2(x^2 + y^2).
The Jacobian of the inverse transform is
\frac{\partial(x, y)}{\partial(u, v)} = \frac{-1}{2(x^2 + y^2)}.
In the (u, v)-plane, the region is a rectangle D = \{(u, v) \in R^2 \mid 1 \le u \le 2,\ 1 \le v \le 4\}.
Hence,
\iint_C (x^2 + y^2)\, dxdy = \iint_D (x^2 + y^2) \Bigl| \frac{\partial(x, y)}{\partial(u, v)} \Bigr|\, dudv = \iint_D \frac{x^2 + y^2}{2(x^2 + y^2)}\, dudv = \frac{1}{2}\, v(D) = \frac{3}{2}.

Physical Applications
If \rho(x) = \rho(x_1, x_2, x_3) is a mass density of a solid C \subseteq R^3, then
m = \iiint_C \rho\, dx is the mass of C, and
\bar x_i = \frac{1}{m} \iiint_C x_i\, \rho(x)\, dx, \quad i = 1, \dots, 3, are the coordinates of the mass center \bar x of C.

The moments of inertia of C are defined as follows:
I_{xx} = \iiint_C \rho\, (y^2 + z^2)\, dxdydz, \quad I_{yy} = \iiint_C \rho\, (x^2 + z^2)\, dxdydz, \quad I_{zz} = \iiint_C \rho\, (x^2 + y^2)\, dxdydz,
I_{xy} = \iiint_C \rho\, xy\, dxdydz, \quad I_{xz} = \iiint_C \rho\, xz\, dxdydz, \quad I_{yz} = \iiint_C \rho\, yz\, dxdydz.
Here I_{xx}, I_{yy}, and I_{zz} are the moments of inertia of the solid with respect to the x-axis, y-axis,
and z-axis, respectively.
Example 9.5 Compute the mass center of a homogeneous half-plate of radius R, C = \{(x, y) \mid
x^2 + y^2 \le R^2,\ y \ge 0\}.
Solution. By the symmetry of C with respect to the y-axis, \bar x = 0. Using polar coordinates we
find
\bar y = \frac{1}{m} \iint_C y\, dxdy = \frac{1}{m} \int_0^R \int_0^\pi r \sin\phi\, r\, d\phi\, dr = \frac{1}{m} \int_0^R r^2\, dr\, (-\cos\phi) \Big|_0^\pi = \frac{1}{m} \frac{2R^3}{3}.
Since the mass is proportional to the area, m = \pi R^2 / 2, and we find that (0, \frac{4R}{3\pi}) is the mass center of
the half-plate.
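This added sketch redoes the computation with the midpoint rule in polar coordinates:

```python
import math

def centroid_y(R, n=400):
    # (1/m) \int_0^R \int_0^pi (r sin phi) r dphi dr, with m = pi R^2 / 2
    m = math.pi * R**2 / 2
    hr, hp = R / n, math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            phi = (j + 0.5) * hp
            total += r * math.sin(phi) * r * hr * hp
    return total / m

R = 2.0
print(centroid_y(R), 4 * R / (3 * math.pi))
```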

9.4 Appendix
Proof of Fubini's Theorem. Let P_A be a partition of A and P_B a partition of B. Together they
give a partition P of A \times B for which any subrectangle S is of the form S_A \times S_B, where S_A is
a subrectangle of the partition P_A, and S_B is a subrectangle of the partition P_B. Thus
L(P, f) = \sum_S m_S\, v(S) = \sum_{S_A, S_B} m_{S_A \times S_B}(f)\, v(S_A \times S_B) = \sum_{S_A} \Bigl( \sum_{S_B} m_{S_A \times S_B}(f)\, v(S_B) \Bigr) v(S_A).
Now, if x \in S_A, then clearly m_{S_A \times S_B}(f) \le m_{S_B}(g_x) since the reference set S_A \times S_B on the
left is bigger than the reference set \{x\} \times S_B on the right. Consequently, for x \in S_A we have
\sum_{S_B} m_{S_A \times S_B}(f)\, v(S_B) \le \sum_{S_B} m_{S_B}(g_x)\, v(S_B) \le \underline{\int_B} g_x\, dy = L(x),
hence
\sum_{S_B} m_{S_A \times S_B}(f)\, v(S_B) \le m_{S_A}(L(x)).
Therefore,
\sum_{S_A} \Bigl( \sum_{S_B} m_{S_A \times S_B}(f)\, v(S_B) \Bigr) v(S_A) \le \sum_{S_A} m_{S_A}(L(x))\, v(S_A) = L(P_A, L).
We thus obtain
L(P, f) \le L(P_A, L) \le U(P_A, L) \le U(P_A, U) \le U(P, f),
where the proof of the last inequality is entirely analogous to the proof of the first. Since f is
integrable, \sup\{L(P, f)\} = \inf\{U(P, f)\} = \int_{A \times B} f\, dxdy. Hence,
\sup\{L(P_A, L)\} = \inf\{U(P_A, L)\} = \int_{A \times B} f\, dxdy.
In other words, L(x) is integrable on A and \int_{A \times B} f\, dxdy = \int_A L(x)\, dx.
The assertion for U(x) follows similarly from the inequalities
L(P, f) \le L(P_A, L) \le L(P_A, U) \le U(P_A, U) \le U(P, f).

Chapter 10
Surface Integrals
10.1 Surfaces in R^3
Recall that a domain G is an open and connected subset of R^n; connected means that for any
two points x and y in G, there exist points x_0, x_1, \dots, x_k with x_0 = x and x_k = y such that
every segment \overline{x_{i-1} x_i}, i = 1, \dots, k, is completely contained in G.
Definition 10.1 Let G \subseteq R^2 be a domain and F : G \to R^3 continuously differentiable. The
mapping F as well as the set \mathcal{F} = F(G) = \{F(s, t) \mid (s, t) \in G\} is called an open regular
surface if the Jacobian matrix F'(s, t) has rank 2 for all (s, t) \in G.
If
F(s, t) = \begin{pmatrix} x(s, t) \\ y(s, t) \\ z(s, t) \end{pmatrix},
the Jacobian matrix of F is
F'(s, t) = \begin{pmatrix} x_s & x_t \\ y_s & y_t \\ z_s & z_t \end{pmatrix}.
The two column vectors of F'(s, t) span the tangent plane to \mathcal{F} at (s, t):
D_1 F(s, t) = \Bigl( \frac{\partial x}{\partial s}(s, t), \frac{\partial y}{\partial s}(s, t), \frac{\partial z}{\partial s}(s, t) \Bigr),
D_2 F(s, t) = \Bigl( \frac{\partial x}{\partial t}(s, t), \frac{\partial y}{\partial t}(s, t), \frac{\partial z}{\partial t}(s, t) \Bigr).
Justification: Suppose (s, t_0) \in G where t_0 is fixed. Then \gamma(s) = F(s, t_0) defines a curve
in \mathcal{F} with tangent vector \gamma'(s) = D_1 F(s, t_0). Similarly, for fixed s_0 we obtain another curve
\delta(t) = F(s_0, t) with tangent vector \delta'(t) = D_2 F(s_0, t). Since F'(s, t) has rank 2 at every point
of G, the vectors D_1 F and D_2 F are linearly independent; hence they span a plane.


Definition 10.2 Let F : G \to R^3 be an open regular surface, and (s_0, t_0) \in G. Then
\vec x = F(s_0, t_0) + \lambda\, D_1 F(s_0, t_0) + \mu\, D_2 F(s_0, t_0), \quad \lambda, \mu \in R,
is called the tangent plane E to \mathcal{F} at F(s_0, t_0). The line through F(s_0, t_0) which is orthogonal
to E is called the normal line to \mathcal{F} at F(s_0, t_0).
Recall that the vector product \vec x \times \vec y of vectors \vec x = (x_1, x_2, x_3) and \vec y = (y_1, y_2, y_3) from R^3 is
the vector
\vec x \times \vec y = \det \begin{pmatrix} e_1 & e_2 & e_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{pmatrix} = (x_2 y_3 - y_2 x_3,\ x_3 y_1 - y_3 x_1,\ x_1 y_2 - y_1 x_2).
It is orthogonal to the plane spanned by the parallelogram P with edges \vec x and \vec y. Its length is
the area of the parallelogram P.
A vector which points in the direction of the normal line is

                             | e1  e2  e3 |
D1F(s0, t0) × D2F(s0, t0) =  | xs  ys  zs | ,                                (10.1)
                             | xt  yt  zt |

~n = (D1F × D2F)/‖D1F × D2F‖,                                                (10.2)

where ~n is the unit normal vector at (s0, t0).


Example 10.1 (Graph of a function) Let F be given by the graph of a function f: G → R, namely F(x, y) = (x, y, f(x, y)). By definition

D1F = (1, 0, fx),   D2F = (0, 1, fy),

hence

             | e1  e2  e3 |
D1F × D2F =  | 1   0   fx | = (−fx, −fy, 1).
             | 0   1   fy |

Therefore, the tangent plane has the equation

−fx (x − x0) − fy (y − y0) + (z − z0) = 0.

Further, the unit normal vector to the tangent plane is

~n = (−fx, −fy, 1)/√(fx² + fy² + 1).


10.1.1 The Area of a Surface


Let F and F be as above. We assume that the continuous vector fields D1 F and D2 F on G can
be extended to continuous functions on the closure G.
Definition 10.3 The number

| F | = F :=

ZZ

kD1 F D2 F k dsdt

(10.3)

is called the area of F and of F.


We call
dS = kD1 F D2 F k dsdt
RR
dS.
the scalar surface element of F . In this notation, | F | =
G

[Figure: the helix surface (s cos t, s sin t, 2t), 0 ≤ s ≤ 2, 0 ≤ t ≤ 4π]

Example 10.2 Let F = {(s cos t, s sin t, 2t) | s ∈ [0, 2], t ∈ [0, 4π]} be the surface spanned by a helix. We shall compute its area. We have

D1F = (cos t, sin t, 0),   D2F = (−s sin t, s cos t, 2),

and the normal vector is

             |   e1       e2     e3 |
D1F × D2F =  |  cos t    sin t   0  | = (2 sin t, −2 cos t, s).
             | −s sin t  s cos t 2  |

Therefore,

|F| = ∫₀^{4π} ∫₀² √(4 cos²t + 4 sin²t + s²) ds dt = 4π ∫₀² √(4 + s²) ds = 8π(√2 + log(√2 + 1)).
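The closed form in Example 10.2 can be sanity-checked numerically. The following sketch (not part of the original text) integrates the surface element ‖D1F × D2F‖ = √(4 + s²) with a midpoint rule; since the integrand does not depend on t, the t-integration just contributes the factor 4π:

```python
from math import sqrt, log, pi

# Midpoint rule for integral_0^2 sqrt(4 + s^2) ds; the t-integral gives 4*pi.
n = 2000
ds = 2.0 / n
integral = sum(sqrt(4 + ((i + 0.5) * ds) ** 2) for i in range(n)) * ds
area = 4 * pi * integral

exact = 8 * pi * (sqrt(2) + log(sqrt(2) + 1))
print(abs(area - exact) < 1e-5)  # True
```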

Example 10.3 (Guldin's Rule (Paul Guldin, 1577–1643, Swiss mathematician)) Let f be a continuously differentiable function on [a, b] with f(x) ≥ 0 for all x ∈ [a, b]. Let the graph of f revolve around the x-axis and let F be the corresponding surface. We have

|F| = 2π ∫_a^b f(x) √(1 + f′(x)²) dx.

Proof. Using polar coordinates in the y-z-plane, we obtain a parametrization of F:

F = {(x, f(x) cos φ, f(x) sin φ) | x ∈ [a, b], φ ∈ [0, 2π]}.

We have

D1F = (1, f′(x) cos φ, f′(x) sin φ),   D2F = (0, −f sin φ, f cos φ),
D1F × D2F = (f f′, −f cos φ, −f sin φ),

so that dS = f(x) √(1 + f′(x)²) dx dφ. Hence

|F| = ∫_a^b ∫₀^{2π} f(x) √(1 + f′(x)²) dφ dx = 2π ∫_a^b f(x) √(1 + f′(x)²) dx.
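As a quick plausibility check of Guldin's rule (a sketch, not from the text), apply it to f(x) = √(R² − x²) on [−R, R]: the surface of revolution is the sphere, and f(x)√(1 + f′(x)²) simplifies to R, so the rule reproduces the sphere area 4πR²:

```python
from math import sqrt, pi

# Guldin's rule for f(x) = sqrt(R^2 - x^2): the integrand f*sqrt(1 + f'^2)
# equals R identically, so the midpoint rule is exact up to roundoff.
R = 3.0
n = 1000
dx = 2 * R / n
area = 0.0
for i in range(n):
    x = -R + (i + 0.5) * dx
    f = sqrt(R * R - x * x)
    fp = -x / f                      # f'(x)
    area += 2 * pi * f * sqrt(1 + fp * fp) * dx

print(abs(area - 4 * pi * R * R) < 1e-9)  # True
```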

10.2 Scalar Surface Integrals

Let F and F be as above, and f: F → R a continuous function on the compact subset F ⊆ R³.

Definition 10.4 The number

∫∫_F f(~x) dS := ∫∫_G f(F(s, t)) ‖D1F(s, t) × D2F(s, t)‖ ds dt

is called the scalar surface integral of f on F.

10.2.1 Other Forms for dS

(a) Let the surface F be given as the graph of a function F(x, y) = (x, y, f(x, y)), (x, y) ∈ G. Then, see Example 10.1,

dS = √(1 + fx² + fy²) dx dy.

(b) Let the surface be given implicitly as F(x, y, z) = 0. Suppose this equation is locally solvable for z in a neighborhood of some point (x0, y0, z0). Then the surface element (up to the sign) is given by

dS = ( √(Fx² + Fy² + Fz²) / |Fz| ) dx dy = ( ‖grad F‖ / |Fz| ) dx dy.

One checks that D1F × D2F = ±(Fx, Fy, Fz)/Fz.


(c) If F is given by F (s, t) = (x(s, t), y(s, t), z(s, t)) we have
dS =

EG H 2 dsdt,

where
E = x2s + ys2 + zs2 ,

G = x2t + yt2 + zt2 ,

H = xs xt + ys yt + zs zt .

10.2 Scalar Surface Integrals

263





~
Indeed, using ~a b = k~ak
by ~a and ~b we get


~
b sin and sin2 = 1 cos2 , where is the angle spanned

EG H 2 = kD1 F k2 kD2 F k2 (D1 F D2 F )2 = kD1 F k2 kD2 F k2 (1 cos2 )


= kD1 F k2 kD2 F k2 sin2 = kD1 F D2 F k2

which proves the claim.
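The chain of equalities above rests on Lagrange's identity ‖a × b‖² = ‖a‖²‖b‖² − (a·b)², which is exactly EG − H² with a = D1F and b = D2F. A small spot-check on random vectors (an illustrative sketch, not from the text):

```python
import random

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
for _ in range(100):
    a = tuple(random.uniform(-1, 1) for _ in range(3))
    b = tuple(random.uniform(-1, 1) for _ in range(3))
    lhs = dot(a, a) * dot(b, b) - dot(a, b) ** 2      # EG - H^2
    rhs = dot(cross(a, b), cross(a, b))               # ||a x b||^2
    assert abs(lhs - rhs) < 1e-12
print("ok")
```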


Example 10.4 (a) We give two different forms for the scalar surface element of a sphere. By (b), the sphere x² + y² + z² = R² has surface element

dS = ( ‖(2x, 2y, 2z)‖ / |2z| ) dx dy = ( R / |z| ) dx dy.

If

x = R cos φ sin θ,   y = R sin φ sin θ,   z = R cos θ,

we obtain

D1F = R(cos φ cos θ, sin φ cos θ, −sin θ),
D2F = R(−sin φ sin θ, cos φ sin θ, 0),
D1F × D2F = R²(cos φ sin²θ, sin φ sin²θ, sin θ cos θ).

Hence,

dS = ‖D1F × D2F‖ dθ dφ = R² sin θ dθ dφ.
(b) Riemann integral in R³ and surface integral over spheres. Let M = {(x, y, z) ∈ R³ | ‖(x, y, z)‖ ≤ R} where R > 0. Let f: M → R be continuous. Let us denote the sphere of radius r by S_r = {(x, y, z) ∈ R³ | x² + y² + z² = r²}. Then

∫∫∫_M f dxdydz = ∫₀^R ( ∫∫_{S_r} f(~x) dS ) dr = ∫₀^R r² ( ∫∫_{S_1} f(r~x) dS(~x) ) dr.

Indeed, by the previous example and by our knowledge of spherical coordinates (r, θ, φ),

dxdydz = r² sin θ dr dθ dφ = dr dS_r.

On the other hand, on the unit sphere S_1, dS = sin θ dθ dφ, such that

dxdydz = r² dr dS,

which establishes the second formula.


10.2.2 Physical Application

(a) If ρ(x, y, z) is the mass density on a surface F, then ∫∫_F ρ dS is the total mass of F. The mass center of F has coordinates (xc, yc, zc) with

xc = ( 1 / ∫∫_F ρ dS ) ∫∫_F x ρ(x, y, z) dS,

and similarly for yc and zc.

(b) Let ρ(~x) be a charge density on a surface F. Then

U(~y) = ∫∫_F ( ρ(~x) / ‖~y − ~x‖ ) dS(~x),   ~y ∉ F,

is the potential generated by F.

10.3 Surface Integrals


10.3.1 Orientation
We want to define the notion of orientation for a regular surface. Let F be a regular (injective) surface with or without boundary. Then for every point x0 ∈ F there exists the tangent plane E_{x0}; the normal line to F at x0 is uniquely defined. However, a unit vector on the normal line can have two different directions.

Definition 10.5 (a) Let F be a surface as above. A unit normal field to F is a continuous function ~n: F → R³ with the following two properties for every x0 ∈ F:
(i) ~n(x0) is orthogonal to the tangent plane to F at x0.
(ii) ‖~n(x0)‖ = 1.
(b) A regular surface F is called orientable if there exists a unit normal field on F.

Suppose F is an oriented, open, regular surface with piecewise smooth boundary ∂F. Let F(s, t) be a parametrization of F. We assume that the vector functions F, D1F, and D2F can be extended to continuous functions on the closure. The unit normal vector is given by

~n = ε (D1F × D2F)/‖D1F × D2F‖,

where ε = +1 or ε = −1 fixes the orientation of F. It turns out that for a regular surface F there exist either exactly two unit normal fields or no such field. If F is provided with an orientation, we write F+ for the pair (F, ~n); for F with the opposite orientation we write F−.


Examples of non-orientable surfaces are the Möbius band and the real projective plane. Analytically the Möbius band is given by

F(s, t) = ( (1 + t cos(s/2)) sin s, (1 + t cos(s/2)) cos s, t sin(s/2) ),   (s, t) ∈ [0, 2π] × [−1/2, 1/2].

Definition 10.6 Let f~: F → R³ be a continuous vector field on F. The number

∫∫_{F+} f~(~x)·~n dS                                                          (10.4)

is called the surface integral of the vector field f~ on F+. We call

dS~ = ~n dS = ε D1F × D2F ds dt

the surface element of F.
the surface element of F.
Remark 10.1 (a) The surface integral is independent of the parametrization of F but depends on the orientation; ∫∫_{F+} f~·dS~ = −∫∫_{F−} f~·dS~.
For, let (s, t) = (s(u, v), t(u, v)) be a new parametrization with F(s(u, v), t(u, v)) = G(u, v). Then the Jacobian gives

ds dt = ∂(s, t)/∂(u, v) du dv = (s_u t_v − s_v t_u) du dv.

Further,

D1G = D1F s_u + D2F t_u,   D2G = D1F s_v + D2F t_v,

so that, using ~x × ~x = 0 and ~x × ~y = −~y × ~x,

D1G × D2G du dv = (D1F s_u + D2F t_u) × (D1F s_v + D2F t_v) du dv
                = (s_u t_v − s_v t_u) D1F × D2F du dv
                = D1F × D2F ds dt.

(b) The scalar surface integral is a special case of the surface integral, namely

∫∫_F f dS = ∫∫_{F+} (f ~n)·~n dS.

(c) Special cases. Let F be the graph of a function f, F = {(x, y, f(x, y)) | (x, y) ∈ C}; then

dS~ = (−fx, −fy, 1) dx dy.

If the surface is given implicitly by F(x, y, z) = 0 and it is locally solvable for z, then

dS~ = ( grad F / Fz ) dx dy.

(d) Still another form of dS~:

∫∫ f~·dS~ = ∫∫ f~(F(s, t))·(D1F × D2F) ds dt,

                           | f1(F(s, t))  f2(F(s, t))  f3(F(s, t)) |
f~(F(s, t))·(D1F × D2F) =  | xs(s, t)     ys(s, t)     zs(s, t)    | .       (10.5)
                           | xt(s, t)     yt(s, t)     zt(s, t)    |

(e) Again another notation. Computing the previous determinant, or the determinant (10.1), explicitly, we have

f~·(D1F × D2F) = f1 | ys zs |  + f2 | zs xs |  + f3 | xs ys |
                    | yt zt |       | zt xt |       | xt yt |
               = f1 ∂(y, z)/∂(s, t) + f2 ∂(z, x)/∂(s, t) + f3 ∂(x, y)/∂(s, t).

Hence,

dS~ = D1F × D2F ds dt = ( ∂(y, z)/∂(s, t), ∂(z, x)/∂(s, t), ∂(x, y)/∂(s, t) ) ds dt =: (dydz, dzdx, dxdy).

Therefore we can write

∫∫ f~·dS~ = ∫∫ (f1 dydz + f2 dzdx + f3 dxdy).

In this setting

∫∫ f1 dydz = ∫∫ (f1, 0, 0)·dS~ = ∫∫_G f1(F(s, t)) ∂(y, z)/∂(s, t) ds dt.

Sometimes one uses

dS~ = ( cos∠(~n, e1), cos∠(~n, e2), cos∠(~n, e3) ) dS,

since cos∠(~n, ei) = ~n·ei = ni and dS~ = ~n dS.
Note that we have surface integrals in the last two lines, not ordinary double integrals, since F is a surface in R³ and f1 = f1(x, y, z) can also depend on x.
The physical meaning of ∫∫_F f~·dS~ is the flow of the vector field f~ through the surface F. The flow is (locally) positive if ~n and f~ are on the same side of the tangent plane to F, and negative in the other case.
Example 10.5 (a) Compute the surface integral

∫∫_{F+} f dzdx

of f(x, y, z) = x²yz, where F is the graph of g(x, y) = x² + y over the unit square G = [0, 1] × [0, 1] with the downward directed unit normal field.
By Remark 10.1 (c),

dS~ = (gx, gy, −1) dx dy = (2x, 1, −1) dx dy.

Hence

∫∫_{F+} f dzdx = ∫∫_{F+} (0, f, 0)·dS~ = ∫∫_G x²y(x² + y) dx dy = ∫₀¹ dx ∫₀¹ (x⁴y + x²y²) dy = 19/90.

(b) Let G denote the upper half ball of radius R in R³:

G = {(x, y, z) | x² + y² + z² ≤ R², z ≥ 0},

and let F be the boundary of G with the orientation of the outer normal. Then F consists of the upper half sphere

F1 = {(x, y, √(R² − x² − y²)) | x² + y² ≤ R²},   z = g(x, y) = √(R² − x² − y²),

with the upper orientation of the unit normal field, and of the disc F2 in the x-y-plane

F2 = {(x, y, 0) | x² + y² ≤ R²},   z = g(x, y) = 0,
with the downward directed normal. Let f~(x, y, z) = (ax, by, cz). We want to compute

∫∫_{F+} f~·dS~.

By Remark 10.1 (c), the surface element of the half-sphere F1 is dS~ = (1/z)(x, y, z) dx dy. Hence

I1 = ∫∫_{F1+} f~·dS~ = ∫∫_{B_R} (ax, by, cz)·(1/z)(x, y, z) dx dy = ∫∫_{B_R} (ax² + by² + cz²)(1/z)|_{z=g(x,y)} dx dy.

Using polar coordinates x = r cos φ, y = r sin φ, r ∈ [0, R], and z = √(R² − x² − y²) = √(R² − r²), we get

I1 = ∫₀^{2π} ∫₀^R ( (ar² cos²φ + br² sin²φ + c(R² − r²)) / √(R² − r²) ) r dr dφ.

Noting ∫₀^{2π} sin²φ dφ = ∫₀^{2π} cos²φ dφ = π, we continue

I1 = π ∫₀^R ( ar³/√(R² − r²) + br³/√(R² − r²) + 2cr √(R² − r²) ) dr.


Using r = R sin t, dr = R cos t dt, we have

∫₀^R r³/√(R² − r²) dr = ∫₀^{π/2} (R³ sin³t · R cos t)/(R √(1 − sin²t)) dt = R³ ∫₀^{π/2} sin³t dt = (2/3) R³.

Hence,

I1 = (2π/3) R³ (a + b) + cπ ∫₀^R √(R² − r²) d(r²)
   = (2π/3) R³ (a + b) + cπ [ −(2/3)(R² − r²)^{3/2} ]₀^R
   = (2π/3) R³ (a + b + c).

In case of the disc F2 we have z = g(x, y) = 0, such that gx = gy = 0 and

dS~ = (0, 0, −1) dx dy

by Remark 10.1 (c). Hence

∫∫_{F2+} f~·dS~ = ∫∫_{B_R} (ax, by, cz)·(0, 0, −1)|_{z=0} dx dy = −c ∫∫_{B_R} z|_{z=0} dx dy = 0.

Hence,

∫∫_{F+} (ax, by, cz)·dS~ = (2π/3) R³ (a + b + c).
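The value (2π/3) R³ (a + b + c) can be cross-checked by computing the flux through the half sphere directly in spherical coordinates. A numerical sketch (the values of a, b, c, R are arbitrary samples, not from the text):

```python
from math import sin, cos, pi

# Midpoint-rule flux of f = (ax, by, cz) through the upper half sphere of
# radius R with outer normal; dS~ = n * R^2 sin(theta) dtheta dphi.
a, b, c, R = 1.0, 2.0, 3.0, 1.5
n = 400
dth, dph = (pi / 2) / n, 2 * pi / n
flux = 0.0
for i in range(n):
    th = (i + 0.5) * dth
    for j in range(n):
        ph = (j + 0.5) * dph
        nx, ny, nz = sin(th)*cos(ph), sin(th)*sin(ph), cos(th)  # outer unit normal
        x, y, z = R * nx, R * ny, R * nz
        flux += (a*x*nx + b*y*ny + c*z*nz) * R*R*sin(th) * dth * dph

exact = 2 * pi / 3 * R**3 * (a + b + c)
print(abs(flux - exact) / exact < 1e-3)  # True
```

The disc contributes no flux, so the hemisphere flux already equals the total.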

10.4 Gauß Divergence Theorem

The aim is to generalize the fundamental theorem of calculus

∫_a^b f′(x) dx = f(b) − f(a)

to higher dimensions. Note that a and b form the boundary of the segment [a, b]. There are three possibilities to do this:

∫∫∫_G g dxdydz = ∫∫_{(∂G)+} f~·dS~   (Gauß' theorem in R³),
∫∫_G g dxdy = ∫_{(∂G)+} f~·d~x       (Green's theorem in R²),
∫∫_{F+} ~g·dS~ = ∫_{(∂F)+} f~·d~x    (Stokes' theorem),

where in each case the integrand g (resp. ~g) on the left is built from derivatives of f~.

Let G ⊆ R³ be a bounded domain (open, connected) such that its boundary F = ∂G satisfies the following assumptions:

1. F is a union of regular, orientable surfaces F_i. The parametrizations F_i(s, t), (s, t) ∈ C_i, of F_i as well as D1F_i and D2F_i are continuous vector functions on C̄_i; C_i is a domain in R².

2. Let F_i be oriented by the outer normal (with respect to G).

3. There is given a continuously differentiable vector field f~: Ḡ → R³ on Ḡ. (More precisely, there exist an open set U ⊇ Ḡ and a continuously differentiable function f: U → R³ such that the restriction of f to Ḡ is f~.)
Theorem 10.1 (Gauß Divergence Theorem) Under the above assumptions we have

∫∫∫_G div f~ dxdydz = ∫∫_{(∂G)+} f~·dS~.                                      (10.6)

Sometimes the theorem is called the Gauß–Ostrogradski theorem or simply Ostrogradski's theorem.


Other writings:

∫∫∫_G ( ∂f1/∂x + ∂f2/∂y + ∂f3/∂z ) dxdydz = ∫∫_{(∂G)+} (f1 dydz + f2 dzdx + f3 dxdy).   (10.7)

The theorem holds for more general regions G ⊆ R³.


We give a proof for

G = {(x, y, z) | (x, y) ∈ C, φ(x, y) ≤ z ≤ ψ(x, y)},

where C ⊆ R² is a domain and φ, ψ ∈ C¹(C̄) define regular bottom and top surfaces of ∂G, respectively. We prove only one part of (10.7), namely for f~ = (0, 0, f3):

∫∫∫_G ∂f3/∂z dxdydz = ∫∫_{(∂G)+} f3 dxdy.                                     (10.8)

Proof.

[Figure: the cylindrical region over C between the graphs of φ and ψ]

By Fubini's theorem, the left side reads

∫∫∫_G ∂f3/∂z dxdydz = ∫∫_C ( ∫_{φ(x,y)}^{ψ(x,y)} ∂f3/∂z dz ) dxdy
                    = ∫∫_C ( f3(x, y, ψ(x, y)) − f3(x, y, φ(x, y)) ) dxdy,    (10.9)

where the last equality is by the fundamental theorem of calculus.


Now we are going to compute the surface integral. The outer normal direction for the top surface F1 is (−ψx(x, y), −ψy(x, y), 1), such that

I1 = ∫∫_{F1+} f3 dxdy = ∫∫_C (0, 0, f3)·(−ψx(x, y), −ψy(x, y), 1) dxdy = ∫∫_C f3(x, y, ψ(x, y)) dxdy.

Since the bottom surface F2 is oriented downward, the outer normal direction is (φx(x, y), φy(x, y), −1), such that

I2 = ∫∫_{F2+} f3 dxdy = −∫∫_C f3(x, y, φ(x, y)) dxdy.

Finally, the shell F3 is parametrized by an angle θ and z:

F3 = {(r(θ) cos θ, r(θ) sin θ, z) | θ ∈ [0, 2π], φ(x, y) ≤ z ≤ ψ(x, y)}.

Since D2F = (0, 0, 1), the normal vector is orthogonal to the z-axis, ~n = (n1, n2, 0). Therefore,

I3 = ∫∫_{F3+} f3 dxdy = ∫∫_{F3+} (0, 0, f3)·(n1, n2, 0) dS = 0.

Comparing I1 + I2 + I3 with (10.9) proves the theorem in this special case.

Remarks 10.2 (a) The Gauß divergence theorem can be used to compute the volume of a domain G ⊆ R³. Suppose the boundary ∂G of G has the orientation of the outer normal. Then

v(G) = ∫∫_{∂G} x dydz = ∫∫_{∂G} y dzdx = ∫∫_{∂G} z dxdy.

(b) Applying the mean value theorem to the left-hand side of Gauß' formula, we have for any bounded region G containing x0

∫∫∫_G div f~ dxdydz = div f~(x0 + ~h) v(G) = ∫∫_{∂G+} f~·dS~,

where ~h is a small vector and v(G) = ∫∫∫_G dxdydz is the volume of G. Hence

div f~(x0) = lim_{G→x0} (1/v(G)) ∫∫_{∂G+} f~·dS~ = lim_{ε→0} (3/(4πε³)) ∫∫_{S_ε(x0)+} f~·dS~,

where the region G tends to x0; in the second formula we have chosen G = B_ε(x0), the open ball of radius ε with center x0. The right-hand side can be thought of as the source density of the field f~. In particular, the right side gives a basis independent description of div f~.
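The source-density formula can be illustrated numerically: for a small ε, the scaled flux through S_ε(x0) should approximate div f~(x0). The field below is an arbitrary illustrative choice, not one from the text:

```python
from math import sin, cos, pi, exp

# f(x, y, z) = (x^2, y*z, exp(z)) has div f = 2x + z + exp(z).
def flux_small_sphere(x0, eps, n=200):
    total = 0.0
    dth, dph = pi / n, 2 * pi / n
    for i in range(n):
        th = (i + 0.5) * dth
        for j in range(n):
            ph = (j + 0.5) * dph
            nx, ny, nz = sin(th)*cos(ph), sin(th)*sin(ph), cos(th)
            x, y, z = x0[0] + eps*nx, x0[1] + eps*ny, x0[2] + eps*nz
            f = (x*x, y*z, exp(z))
            total += (f[0]*nx + f[1]*ny + f[2]*nz) * eps*eps*sin(th) * dth * dph
    return total

x0 = (0.3, -0.2, 0.1)
eps = 1e-2
estimate = 3 / (4 * pi * eps**3) * flux_small_sphere(x0, eps)
div_exact = 2*x0[0] + x0[2] + exp(x0[2])
print(abs(estimate - div_exact) < 1e-2)  # True
```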
Example 10.6 We want to compute the surface integral from Example 10.5 (b) using Gauß' theorem:

∫∫_{F+} f~·dS~ = ∫∫∫_{x²+y²+z²≤R², z≥0} div f~ dxdydz = ∫∫∫ (a + b + c) dxdydz = (2πR³/3)(a + b + c).


We close with some consequences of the Gauß divergence theorem which play an important role in partial differential equations.
Recall (Proposition 7.9 (Prop. 8.9)) that the directional derivative of a function v: U → R, U ⊆ Rⁿ, at x0 in the direction of the unit vector ~n is given by D_~n v(x0) = grad v(x0)·~n.
Notation. Let U ⊆ R³ be open and F+ ⊆ U be an oriented, regular open surface with the unit normal vector ~n(x0) at x0 ∈ F. Let g: U → R be differentiable. Then

∂g/∂~n (x0) = grad g(x0)·~n(x0)                                               (10.10)

is called the normal derivative of g on F+ at x0.


Proposition 10.2 Let G be a region as in Gauß' theorem, the boundary ∂G oriented with the outer normal, and let u, v be twice continuously differentiable on an open set U with Ḡ ⊆ U. Then we have Green's identities:

∫∫∫_G grad u·grad v dxdydz = ∫∫_{∂G} u (∂v/∂~n) dS − ∫∫∫_G u Δv dxdydz,      (10.11)

∫∫∫_G (u Δv − v Δu) dxdydz = ∫∫_{∂G} ( u (∂v/∂~n) − v (∂u/∂~n) ) dS,         (10.12)

∫∫∫_G Δu dxdydz = ∫∫_{∂G} (∂u/∂~n) dS.                                       (10.13)

Proof. Put f~ = u grad v. Then by nabla calculus

div f~ = ∇·(u ∇v) = (∇u)·(∇v) + u Δv = grad u·grad v + u Δv.

Applying Gauß' theorem, we obtain

∫∫∫_G div f~ dxdydz = ∫∫∫_G grad u·grad v dxdydz + ∫∫∫_G u Δv dxdydz
                    = ∫∫_{∂G} u grad v·~n dS = ∫∫_{∂G} u (∂v/∂~n) dS.

This proves Green's first identity. Changing the roles of u and v and taking the difference, we obtain the second formula.
Inserting v = 1 into (10.12) we get (10.13).

Application to Laplace's equation

Let u1 and u2 be functions on G with Δu1 = Δu2 which coincide on the boundary ∂G: u1(x) = u2(x) for all x ∈ ∂G. Then u1 ≡ u2 in G. In other words, a harmonic function is uniquely determined by its boundary values.

Proof. Put u = u1 − u2 and apply Green's first formula (10.11) to u = v. Note that Δu = Δu1 − Δu2 = 0 (u is harmonic in G) and u(x) = u1(x) − u2(x) = 0 on the boundary x ∈ ∂G. Hence

∫∫∫_G grad u·grad u dxdydz = ∫∫_{∂G} u (∂u/∂~n) dS − ∫∫∫_G u Δu dxdydz = 0,

since u = 0 on ∂G and Δu = 0 in G. Since grad u·grad u = ‖grad u‖² ≥ 0 and ‖grad u‖ is a continuous function on G, by homework 14.3, ‖grad u‖² = 0 on G; hence grad u = 0 on G. By the Mean Value Theorem, Corollary 7.12, u is constant on G. Since u = 0 on ∂G, u(x) = 0 for all x ∈ G.

10.5 Stokes' Theorem

Roughly speaking, Stokes' theorem relates a surface integral over a surface F to a line integral over the boundary ∂F. In the case of a plane surface in R², it is called Green's theorem.

10.5.1 Green's Theorem

[Figure: a plane domain G whose boundary consists of an outer curve Γ1 and inner curves Γ2, …, Γk]

Let G be a domain in R² with piecewise smooth (differentiable) boundaries Γ1, Γ2, …, Γk. We give an orientation to the boundary: the outer curve is oriented counterclockwise (mathematically positive), the inner boundaries are oriented in the opposite direction.

Theorem 10.3 (Green's Theorem) Let (P, Q) be a continuously differentiable vector field on Ḡ and let the boundary γ = ∂G be oriented as above. Then

∫∫_G ( ∂Q/∂x − ∂P/∂y ) dxdy = ∫_γ P dx + Q dy.                               (10.14)

Proof. (a) First, we consider a region G of type 1 in the plane, bounded below by the graph of φ and above by the graph of ψ over [a, b], and we will prove that

∫∫_G (∂P/∂y) dxdy = −∫_γ P dx.                                               (10.15)

The double integral on the left may be evaluated as an iterated integral (Fubini's theorem); we have

∫∫_G (∂P/∂y) dxdy = ∫_a^b ( ∫_{φ(x)}^{ψ(x)} P_y(x, y) dy ) dx
                  = ∫_a^b ( P(x, ψ(x)) − P(x, φ(x)) ) dx.


The latter equality is due to the fundamental theorem of calculus. To compute the line integral, we parametrize the four parts of γ in the natural way:

γ1:  ~x1(t) = (a, t),      t ∈ [φ(a), ψ(a)],   dx = 0,    dy = dt,
γ2:  ~x2(t) = (t, φ(t)),   t ∈ [a, b],         dx = dt,   dy = φ′(t) dt,
γ3:  ~x3(t) = (b, t),      t ∈ [φ(b), ψ(b)],   dx = 0,    dy = dt,
γ4:  ~x4(t) = (t, ψ(t)),   t ∈ [a, b],         dx = dt,   dy = ψ′(t) dt.

Since dx = 0 on γ1 and γ3, we are left with the line integrals over γ2 and γ4 (γ4 is traversed in the direction of decreasing parameter):

∫_γ P dx = ∫_a^b P(t, φ(t)) dt − ∫_a^b P(t, ψ(t)) dt.

Comparing with the double integral above proves (10.15).

Let us prove the second part,

∫∫_G (∂Q/∂x) dxdy = ∫_γ Q dy.

Using Proposition 7.24 we have

∫_{φ(x)}^{ψ(x)} Q_x(x, y) dy = d/dx ∫_{φ(x)}^{ψ(x)} Q(x, y) dy − ψ′(x) Q(x, ψ(x)) + φ′(x) Q(x, φ(x)).

Inserting this into ∫∫_G (∂Q/∂x) dxdy = ∫_a^b ( ∫_{φ(x)}^{ψ(x)} Q_x(x, y) dy ) dx, we get

∫∫_G (∂Q/∂x) dxdy = ∫_{φ(b)}^{ψ(b)} Q(b, y) dy − ∫_{φ(a)}^{ψ(a)} Q(a, y) dy          (10.16)
                  − ∫_a^b ψ′(x) Q(x, ψ(x)) dx + ∫_a^b φ′(x) Q(x, φ(x)) dx.          (10.17)

We compute the line integrals (γ1 and γ4 are traversed in the direction of decreasing parameter):

∫_{γ1} Q dy = −∫_{φ(a)}^{ψ(a)} Q(a, y) dy,      ∫_{γ3} Q dy = ∫_{φ(b)}^{ψ(b)} Q(b, y) dy,
∫_{γ2} Q dy = ∫_a^b Q(t, φ(t)) φ′(t) dt,        ∫_{γ4} Q dy = −∫_a^b Q(t, ψ(t)) ψ′(t) dt.

Adding up these integrals and comparing the result with (10.17), the proof for type 1 regions is complete.
[Figure: decomposition of a region into pieces of type 1 and type 2]


Exactly in the same way one can prove that (10.14) holds if G is a type 2 region.
(b) Breaking a region G up into smaller regions, each of which is both of type 1 and type 2, Green's theorem is valid for G: the line integrals along the inner boundaries cancel, leaving the line integral around the boundary of G.
(c) If the region has a hole, one can split it into two simply connected regions, for which Green's theorem is valid by the arguments of (b).

Application: Area of a Region

If γ is a curve which bounds a region G, then the area of G is A = ∫_γ (1 − λ) x dy − λ y dx, where λ ∈ R is arbitrary; in particular,

A = (1/2) ∫_γ x dy − y dx = ∫_γ x dy = −∫_γ y dx.                            (10.18)

Proof. Choosing Q = (1 − λ)x, P = −λy, one has

A = ∫∫_G dxdy = ∫∫_G ( (1 − λ) − (−λ) ) dxdy = ∫∫_G (Q_x − P_y) dxdy
  = ∫_γ P dx + Q dy = −λ ∫_γ y dx + (1 − λ) ∫_γ x dy.

Inserting λ = 0, λ = 1, and λ = 1/2 yields the assertion.

Example 10.7 Find the area bounded by the ellipse γ: x²/a² + y²/b² = 1. We parametrize γ by ~x(t) = (a cos t, b sin t), t ∈ [0, 2π], ~x′(t) = (−a sin t, b cos t). Then (10.18) gives

A = (1/2) ∫₀^{2π} ( a cos t · b cos t − b sin t·(−a sin t) ) dt = (1/2) ∫₀^{2π} ab dt = πab.
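Formula (10.18) can be checked numerically for this ellipse; the following sketch sums x dy − y dx over a fine parameter grid:

```python
from math import sin, cos, pi

# A = (1/2) * closed integral of (x dy - y dx) over the ellipse
# x = a cos t, y = b sin t; the integrand equals a*b dt identically.
a, b = 3.0, 2.0
n = 10000
dt = 2 * pi / n
A = 0.0
for i in range(n):
    t = (i + 0.5) * dt
    x, y = a * cos(t), b * sin(t)
    dx, dy = -a * sin(t) * dt, b * cos(t) * dt
    A += 0.5 * (x * dy - y * dx)

print(abs(A - pi * a * b) < 1e-9)  # True
```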

10.5.2 Stokes' Theorem

Conventions: Let F+ be a regular, oriented surface. Let γ = ∂F+ be the boundary of F with the induced orientation: the orientation of the surface (normal vector) together with the orientation of the boundary form a right-oriented screw. A second way to get the induced orientation: sitting in the arrowhead of the unit normal vector to the surface, the boundary curve has counterclockwise orientation.
Theorem 10.4 (Stokes' theorem) Let F+ be a smooth regular oriented surface with a parametrization F ∈ C²(G), where G is a plane region to which Green's theorem applies. Let γ = ∂F be the boundary with the above orientation. Further, let f~ be a continuously differentiable vector field on F. Then we have

∫∫_{F+} curl f~·dS~ = ∫_{(∂F)+} f~·d~x.                                      (10.19)

This can also be written as

∫∫_{F+} ( (∂f3/∂y − ∂f2/∂z) dydz + (∂f1/∂z − ∂f3/∂x) dzdx + (∂f2/∂x − ∂f1/∂y) dxdy ) = ∫_{∂F} f1 dx + f2 dy + f3 dz.

Proof. Main idea: reduction to Green's theorem. Since both sides of the equation are additive with respect to the vector field f~, it suffices to prove the statement for the vector fields (f1, 0, 0), (0, f2, 0), and (0, 0, f3). We show the theorem for f~ = (f, 0, 0); the other cases are quite analogous:

∫∫_F ( (∂f/∂z) dzdx − (∂f/∂y) dxdy ) = ∫_{∂F} f dx.

Let F(u, v), (u, v) ∈ G, be the parametrization of the surface F. Then

dx = x_u du + x_v dv,

such that the line integral on the right reads, with P(u, v) = f(x(u, v), y(u, v), z(u, v)) x_u and Q(u, v) = f x_v,

∫_{∂F} f dx = ∫_{∂G} f x_u du + f x_v dv = ∫_{∂G} P du + Q dv
  =(Green's th.) ∫∫_G ( ∂Q/∂u − ∂P/∂v ) du dv
  = ∫∫_G ( (f_u x_v + f x_vu) − (f_v x_u + f x_uv) ) du dv
  = ∫∫_G ( f_u x_v − f_v x_u ) du dv
  = ∫∫_G ( (f_x x_u + f_y y_u + f_z z_u) x_v − (f_x x_v + f_y y_v + f_z z_v) x_u ) du dv
  = ∫∫_G ( −f_y (x_u y_v − x_v y_u) + f_z (z_u x_v − z_v x_u) ) du dv
  = ∫∫_G ( −f_y ∂(x, y)/∂(u, v) + f_z ∂(z, x)/∂(u, v) ) du dv
  = ∫∫_F ( −f_y dxdy + f_z dzdx ).

This completes the proof.

Remark 10.3 (a) Green's theorem is a special case with F = G × {0}, ~n = (0, 0, 1) (orientation), and f~ = (P, Q, 0).

(b) The right side of (10.19) is called the circulation of the vector field f~ over the closed curve γ. Now let ~x0 ∈ F be fixed and consider smaller and smaller neighborhoods F0+ of ~x0 with boundaries γ0. By Stokes' theorem and by the mean value theorem of integration,

∫_{γ0} f~·d~x = ∫∫_{F0} curl f~·~n dS = curl f~(x0)·~n(x0) area(F0).

Hence,

curl f~(x0)·~n(x0) = lim_{F0→x0} (1/|F0|) ∫_{∂F0} f~·d~x.

We call curl f~(x0)·~n(x0) the infinitesimal circulation of the vector field f~ at x0 corresponding to the unit normal vector ~n.
(c) Stokes' theorem then says that the integral of the infinitesimal circulation of a vector field f~ corresponding to the unit normal vector ~n over F equals the circulation of the vector field along the boundary of F.
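Stokes' theorem itself is easy to illustrate with a small numerical check; the surface z = 1 − x² − y² over the unit disc and the field f = (z, x, y) below are illustrative choices, not from the text. Here curl f = (1, 1, 1), and both sides evaluate to π:

```python
from math import sin, cos, pi

n = 4000
dt = 2 * pi / n

# Circulation of f = (z, x, y) around the boundary circle x = cos t, y = sin t, z = 0.
circ = 0.0
for i in range(n):
    t = (i + 0.5) * dt
    x, y, z = cos(t), sin(t), 0.0
    dx, dy = -sin(t) * dt, cos(t) * dt
    circ += z * dx + x * dy            # dz = 0 on the boundary

# Flux of curl f = (1,1,1): integrand (1,1,1).(-z_x, -z_y, 1) = 2x + 2y + 1
# over the unit disc, in polar coordinates.
m = 400
flux = 0.0
dr, dph = 1.0 / m, 2 * pi / m
for i in range(m):
    r = (i + 0.5) * dr
    for j in range(m):
        ph = (j + 0.5) * dph
        x, y = r * cos(ph), r * sin(ph)
        flux += (2 * x + 2 * y + 1) * r * dr * dph

print(abs(circ - pi) < 1e-6 and abs(flux - pi) < 1e-6)  # True
```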
Path Independence of Line Integrals

We are going to complete the proof of Proposition 8.3 and show that, for a simply connected region G ⊆ R³ and a twice continuously differentiable vector field f~ with curl f~ = 0 for all x ∈ G, the vector field f~ is conservative.

Proof. Indeed, let γ be a closed, regular, piecewise differentiable curve in G and let γ be the boundary of a smooth regular oriented surface F+, γ = ∂F+, such that γ has the induced orientation. Inserting curl f~ = 0 into Stokes' theorem gives

∫∫_{F+} curl f~·dS~ = 0 = ∫_γ f~·d~x;

the line integral is path independent and hence f~ is conservative. Note that the region must be simply connected; otherwise it is in general impossible to find F with boundary γ.

10.5.3 Vector Potential and the Inverse Problem of Vector Analysis

Let f~ be a continuously differentiable vector field on the simply connected region G ⊆ R³.

Definition 10.7 The vector field f~ on G is called a source-free field (solenoidal field) if there exists a vector field ~g on G with f~ = curl ~g. Then ~g is called the vector potential to f~.

Theorem 10.5 f~ is source-free if and only if div f~ = 0.

Proof. (a) If f~ = curl ~g then div f~ = div(curl ~g) = 0.

(b) To simplify notation, we skip the arrows. We explicitly construct a vector potential g to f with g = (g1, g2, 0) and curl g = f. This means

f1 = −∂g2/∂z,   f2 = ∂g1/∂z,   f3 = ∂g2/∂x − ∂g1/∂y.

Integrating the first two equations, we obtain

g2 = −∫_{z0}^z f1(x, y, t) dt + h(x, y),   g1 = ∫_{z0}^z f2(x, y, t) dt,

where h(x, y) is the integration constant, not depending on z. Inserting this into the third equation, we obtain

∂g2/∂x − ∂g1/∂y = −∫_{z0}^z (∂f1/∂x)(x, y, t) dt + h_x(x, y) − ∫_{z0}^z (∂f2/∂y)(x, y, t) dt
  = −∫_{z0}^z ( ∂f1/∂x + ∂f2/∂y ) dt + h_x
  =(div f = 0) ∫_{z0}^z (∂f3/∂z)(x, y, t) dt + h_x
  = f3(x, y, z) − f3(x, y, z0) + h_x(x, y).

This yields h_x(x, y) = f3(x, y, z0). Integration with respect to x finally gives h(x, y) = ∫_{x0}^x f3(t, y, z0) dt; the third equation is satisfied and curl g = f.
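The construction in part (b) can be imitated numerically. The sketch below (with z0 = x0 = 0 and an illustrative source-free field, not one from the text) builds g = (g1, g2, 0) by quadrature and checks curl g = f at a sample point with central differences:

```python
def f(x, y, z):                      # sample field with div f = 1 + 1 - 2 = 0
    return (x, y, -2.0 * z)

def quad(fun, a, b, n=2000):         # midpoint rule
    h = (b - a) / n
    return sum(fun(a + (i + 0.5) * h) for i in range(n)) * h

def g(x, y, z):
    # g1 = int_0^z f2 dt,  g2 = -int_0^z f1 dt + int_0^x f3(t, y, 0) dt
    g1 = quad(lambda t: f(x, y, t)[1], 0.0, z)
    g2 = -quad(lambda t: f(x, y, t)[0], 0.0, z) + quad(lambda t: f(t, y, 0.0)[2], 0.0, x)
    return (g1, g2, 0.0)

def curl(vf, x, y, z, h=1e-5):
    def dcomp(comp, dx, dy, dz):     # central-difference partial derivative
        a = vf(x + h*dx, y + h*dy, z + h*dz)[comp]
        b = vf(x - h*dx, y - h*dy, z - h*dz)[comp]
        return (a - b) / (2 * h)
    return (dcomp(2, 0, 1, 0) - dcomp(1, 0, 0, 1),
            dcomp(0, 0, 0, 1) - dcomp(2, 1, 0, 0),
            dcomp(1, 1, 0, 0) - dcomp(0, 0, 1, 0))

p = (0.7, -0.4, 0.5)
c = curl(g, *p)
print(all(abs(ci - fi) < 1e-5 for ci, fi in zip(c, f(*p))))  # True
```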
Remarks 10.4 (a) The proof of the second direction is constructive; you can use this method to calculate a vector potential explicitly. You can also try another ansatz, say g = (0, g2, g3) or g = (g1, 0, g3).
(b) If g is a vector potential for f and U ∈ C²(G), then g̃ = g + grad U is also a vector potential for f. Indeed,

curl g̃ = curl g + curl grad U = f.
The Inverse Problem of Vector Analysis

Let h be a function and ~a be a vector field on G, both continuously differentiable.
Problem: Does there exist a vector field f~ such that

div f~ = h   and   curl f~ = ~a ?

Proposition 10.6 The above problem has a solution if and only if div ~a = 0.
Proof. The condition is necessary since div ~a = div curl f~ = 0. We skip the vector arrows. For the other direction we use the ansatz f = r + s with

curl r = 0,   div r = h,                                                     (10.20)
curl s = a,   div s = 0.                                                     (10.21)

Since curl r = 0, by Proposition 8.3 there exists a potential U with r = grad U. Then curl r = 0 and div r = div grad U = ΔU. Hence (10.20) is satisfied if and only if r = grad U and ΔU = h.
Since div a = 0 by assumption, there exists a vector potential g such that curl g = a. Let χ be twice continuously differentiable on G and set s = g + grad χ. Then curl s = curl g = a and div s = div g + div grad χ = div g + Δχ. Hence, div s = 0 if and only if Δχ = −div g.
Both equations ΔU = h and Δχ = −div g are so-called Poisson equations, which can be solved within the theory of partial differential equations (PDE).
The inverse problem does not have a unique solution: choose a harmonic function ω, Δω = 0, and put f1 = f + grad ω. Then

div f1 = div f + div grad ω = div f + Δω = div f = h,
curl f1 = curl f + curl grad ω = curl f = a.

Chapter 11
Differential Forms on Rⁿ

We show that the Gauß, Green, and Stokes theorems are three cases of a general theorem which is also named after Stokes. The simple formula now reads ∫_c dω = ∫_{∂c} ω. The appearance of the Jacobian in the change of variable theorem will become clear. We formulate the Poincaré lemma.
Good references are [Spi65], [AF01], and [vW81].

11.1 The Exterior Algebra Λ(Rⁿ)

Although we are working with the ground field R, all constructions make sense for arbitrary fields K, in particular K = C. Let {e1, …, en} be the standard basis of Rⁿ; for h ∈ Rⁿ we write h = (h1, …, hn) with respect to the standard basis, h = Σ_i h_i e_i.

11.1.1 The Dual Vector Space V*

The interplay between a normed space E and its dual space E* forms the basis of functional analysis. We start with the definition of the (algebraic) dual.

Definition 11.1 Let V be a linear space. The dual vector space V* to V is the set of all linear functionals f: V → R,

V* = {f: V → R | f is linear}.

It turns out that V* is again a linear space if we introduce addition and scalar multiples in the natural way: for f, g ∈ V*, λ ∈ R put

(f + g)(v) := f(v) + g(v),   (λf)(v) := λ f(v).

The evaluation of f ∈ V* on v ∈ V is sometimes denoted by

f(v) = ⟨f, v⟩ ∈ K.

In this case, the brackets denote the dual pairing between V* and V. By definition, the pairing is linear in both components; that is, for all v, w ∈ V, f, g ∈ V*, and for all λ, μ ∈ R,

⟨λf + μg, v⟩ = λ⟨f, v⟩ + μ⟨g, v⟩,   ⟨f, λv + μw⟩ = λ⟨f, v⟩ + μ⟨f, w⟩.

Example 11.1 (a) Let V = Rⁿ with the above standard basis. For i = 1, …, n define the i-th coordinate functional dx_i: Rⁿ → R by

dx_i(h) = dx_i(h1, …, hn) = h_i,   h ∈ Rⁿ.

The functional dx_i associates to each vector h ∈ Rⁿ its i-th coordinate h_i. The functional dx_i is indeed linear, since for all v, w ∈ Rⁿ and λ, μ ∈ R, dx_i(λv + μw) = (λv + μw)_i = λ v_i + μ w_i = λ dx_i(v) + μ dx_i(w).
The linear space (Rⁿ)* also has dimension n. We will show that {dx_1, dx_2, …, dx_n} is a basis of (Rⁿ)*. We call it the dual basis to {e1, …, en}. Using the Kronecker symbol, the evaluation of dx_i on e_j reads as follows:

dx_i(e_j) = δ_ij,   i, j = 1, …, n.

{dx_1, dx_2, …, dx_n} generates V*. Indeed, let f ∈ V*. Then f = Σ_i f(e_i) dx_i, since both sides coincide for all h ∈ V:

Σ_{i=1}^n f(e_i) dx_i(h) = Σ_{i=1}^n f(e_i) h_i =(f homogeneous) Σ_{i=1}^n f(h_i e_i) = f( Σ_i h_i e_i ) = f(h).

In Proposition 11.1 below, we will see that {dx_1, …, dx_n} is not only generating but also linearly independent.
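The dual basis relation dx_i(e_j) = δ_ij and the decomposition f = Σ f(e_i) dx_i are easy to model directly. A small sketch (n = 4 and the coefficients are arbitrary sample values):

```python
n = 4

def dx(i):
    return lambda h: h[i]            # the i-th coordinate functional

e = [[1 if j == i else 0 for j in range(n)] for i in range(n)]  # standard basis

# Kronecker relation dx_i(e_j) = delta_ij.
ok = all(dx(i)(e[j]) == (1 if i == j else 0) for i in range(n) for j in range(n))

# Any linear functional f(h) = sum a_i h_i satisfies f = sum f(e_i) dx_i.
a = [2, -1, 5, 0]
f = lambda h: sum(ai * hi for ai, hi in zip(a, h))
h = [3, 1, -2, 7]
decomp = sum(f(e[i]) * dx(i)(h) for i in range(n))

print(ok and decomp == f(h))  # True
```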
(b) If V = C([0, 1]), the continuous functions on [0, 1], and α is an increasing function on [0, 1], then the Riemann–Stieltjes integral

φ_α(f) = ∫₀¹ f dα,   f ∈ V,

defines a linear functional on V. If a ∈ [0, 1],

δ_a(f) = f(a),   f ∈ V,

defines the evaluation functional of f at a. In case a = 0 this is Dirac's δ-functional, playing an important role in the theory of distributions (generalized functions).
(c) Let a ∈ Rⁿ. Then ⟨a, x⟩ = Σ_{i=1}^n a_i x_i, x ∈ Rⁿ, defines a linear functional on Rⁿ. By (a) this is already the most general form of a linear functional on Rⁿ.


Definition 11.2 Let k ∈ N. An alternating (or skew-symmetric) multilinear form of degree k on Rⁿ, a k-form for short, is a mapping ω: Rⁿ × ⋯ × Rⁿ → R (k factors Rⁿ) which is multilinear and skew-symmetric, i.e.

(MULT)  ω(…, λv_i + μw_i, …) = λ ω(…, v_i, …) + μ ω(…, w_i, …),              (11.1)
(SKEW)  ω(…, v_i, …, v_j, …) = −ω(…, v_j, …, v_i, …),   i, j = 1, …, k, i ≠ j,   (11.2)

for all vectors v_1, v_2, …, v_k, w_i ∈ Rⁿ and λ, μ ∈ R.

We denote the linear space of all k-forms on Rⁿ by Λ^k(Rⁿ), with the convention Λ⁰(Rⁿ) = R. In case k = 1, property (11.2) is an empty condition, such that Λ¹(Rⁿ) = (Rⁿ)* is just the dual space.
Let f1, . . . , fk ∈ (R^n)* be linear functionals on R^n. Then we define the k-form
f1 ∧ ⋯ ∧ fk ∈ Λ^k(R^n) (read: f1 wedge f2 . . . wedge fk) as follows:

    (f1 ∧ ⋯ ∧ fk)(h1, . . . , hk) = det( f_i(h_j) )_{i,j=1}^k,   (11.3)

the k × k determinant with rows f_i(h1), . . . , f_i(hk). In particular, let i1, . . . , ik ∈ {1, . . . , n}
be fixed and choose f_j = dx_{ij}, j = 1, . . . , k. Then

    (dx_{i1} ∧ ⋯ ∧ dx_{ik})(h1, . . . , hk) = det( (h_j)_{i_r} )_{r,j=1}^k,

where (h_j)_{i_r} denotes the i_r-th coordinate of the vector h_j.

f1 ∧ ⋯ ∧ fk is indeed a k-form: since the f_i are linear, the determinant is multilinear,

    | a + a'  b  c |   | a  b  c |   | a'  b  c |
    | d + d'  e  f | = | d  e  f | + | d'  e  f |,
    | g + g'  h  i |   | g  h  i |   | g'  h  i |

and skew-symmetric,

    | b  a  c |     | a  b  c |
    | e  d  f | = − | d  e  f |.
    | h  g  i |     | g  h  i |

For example, let y = (y1, . . . , yn), z = (z1, . . . , zn) ∈ R^n; then

    dx3 ∧ dx1 (y, z) = | y3  z3 | = y3 z1 − y1 z3.
                       | y1  z1 |

If f_r = f_s = f for some r ≠ s, we have f1 ∧ ⋯ ∧ f ∧ ⋯ ∧ f ∧ ⋯ ∧ fk = 0 since determinants
with identical rows vanish. Also, for any ω ∈ Λ^k(R^n),

    ω(h1, . . . , h, . . . , h, . . . , hk) = 0,   h1, . . . , hk, h ∈ R^n,

since swapping the two equal arguments gives ω = −ω by skew-symmetry; for the wedge forms
above this is also visible from the defining determinant having two identical columns.
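The determinant definition (11.3) lends itself to a direct numerical check; a sketch in Python (the helper names `det` and `wedge` are ours, and the Leibniz-formula determinant is chosen only for self-containedness):

```python
from itertools import permutations

def det(M):
    """Leibniz-formula determinant of a square matrix given as a list of rows."""
    n = len(M)
    total = 0.0
    for perm in permutations(range(n)):
        sign = 1
        for a in range(n):                  # count inversions for the sign
            for b in range(a + 1, n):
                if perm[a] > perm[b]:
                    sign = -sign
        prod = 1.0
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign * prod
    return total

def wedge(*fs):
    """f1 ∧ ... ∧ fk evaluated on vectors h1, ..., hk via (11.3)."""
    return lambda *hs: det([[f(h) for h in hs] for f in fs])

dx = lambda i: (lambda h: h[i])             # coordinate functionals, 0-based
y, z = (1.0, 2.0, 7.0), (4.0, 5.0, 6.0)
# dx3 ∧ dx1 (y, z) = y3 z1 − y1 z3 (1-based indices as in the text)
assert wedge(dx(2), dx(0))(y, z) == y[2] * z[0] - y[0] * z[2]
```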

11 Differential Forms on R^n

Proposition 11.1 For k ≤ n the k-forms {dx_{i1} ∧ ⋯ ∧ dx_{ik} | 1 ≤ i1 < i2 < ⋯ < ik ≤ n}
form a basis of the vector space Λ^k(R^n). A k-form with k > n is identically zero. We have

    dim Λ^k(R^n) = (n choose k).
Proof. Any k-form ω is uniquely determined by its values on the k-tuples of vectors
(e_{i1}, . . . , e_{ik}) with 1 ≤ i1 < i2 < ⋯ < ik ≤ n. Indeed, using skew-symmetry of ω, we know
ω on all k-tuples of basis vectors; using linearity in each component, we get ω on all k-tuples
of vectors. This shows that the dx_{i1} ∧ ⋯ ∧ dx_{ik} with 1 ≤ i1 < i2 < ⋯ < ik ≤ n generate
the linear space Λ^k(R^n). We make this precise in case k = 2. With y = Σ_i y_i e_i, z = Σ_j z_j e_j
we have, by linearity and skew-symmetry of ω,

    ω(y, z) = Σ_{i,j=1}^n y_i z_j ω(e_i, e_j) = Σ_{1≤i<j≤n} (y_i z_j − y_j z_i) ω(e_i, e_j)
            = Σ_{i<j} ω(e_i, e_j) | y_i  z_i |
                                  | y_j  z_j |
            = Σ_{i<j} ω(e_i, e_j) dx_i ∧ dx_j (y, z).

Hence,

    ω = Σ_{i<j} ω(e_i, e_j) dx_i ∧ dx_j.

This shows that the (n choose 2) 2-forms {dx_i ∧ dx_j | i < j} generate Λ^2(R^n).
We show their linear independence. Suppose that Σ_{i<j} α_ij dx_i ∧ dx_j = 0 for some α_ij ∈ R.
Evaluating this on (e_r, e_s), r < s, gives

    0 = Σ_{i<j} α_ij dx_i ∧ dx_j (e_r, e_s) = Σ_{i<j} α_ij | δ_ri  δ_si | = Σ_{i<j} α_ij (δ_ri δ_sj − δ_rj δ_si) = α_rs;
                                                           | δ_rj  δ_sj |

hence the above 2-forms are linearly independent. The arguments for general k are similar.

In general, let ω ∈ Λ^k(R^n); then there exist unique numbers a_{i1...ik} = ω(e_{i1}, . . . , e_{ik}) ∈ R,
i1 < i2 < ⋯ < ik, such that

    ω = Σ_{1≤i1<⋯<ik≤n} a_{i1...ik} dx_{i1} ∧ ⋯ ∧ dx_{ik}.

Example 11.2 Let n = 3.
k = 1: {dx1, dx2, dx3} is a basis of Λ^1(R^3).
k = 2: {dx1 ∧ dx2, dx1 ∧ dx3, dx2 ∧ dx3} is a basis of Λ^2(R^3).
k = 3: {dx1 ∧ dx2 ∧ dx3} is a basis of Λ^3(R^3).
Λ^k(R^3) = {0} for k ≥ 4.
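The basis of Λ^k(R^n) is indexed by strictly increasing index tuples, so its size can be enumerated mechanically; a small sketch using the standard library:

```python
from itertools import combinations
from math import comb

n = 3
for k in range(0, 6):
    basis = list(combinations(range(1, n + 1), k))   # tuples i1 < i2 < ... < ik
    assert len(basis) == comb(n, k)                  # dim Λ^k(R^n) = (n choose k)

# k = 2 reproduces the three basis 2-forms dx1∧dx2, dx1∧dx3, dx2∧dx3
assert list(combinations(range(1, 4), 2)) == [(1, 2), (1, 3), (2, 3)]
```

For k > n, `combinations` yields nothing and `comb(n, k)` is 0, matching Λ^k(R^3) = {0} for k ≥ 4.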
Definition 11.3 An algebra A over R is a linear space together with a product map (a, b) ↦ ab,
A × A → A, such that the following holds for all a, b, c ∈ A and λ ∈ R:


(i) a(bc) = (ab)c (associativity),
(ii) (a + b)c = ac + bc, a(b + c) = ab + ac,
(iii) λ(ab) = (λa)b = a(λb).
Standard examples are C(X), the continuous functions on a metric space X, the full n × n
matrix algebra R^{n×n} over R, and the algebra of polynomials R[X].
Let Λ(R^n) = ⊕_{k=0}^n Λ^k(R^n) be the direct sum of linear spaces.

Proposition 11.2 (i) Λ(R^n) is an R-algebra with unity 1 and product defined by

    (dx_{i1} ∧ ⋯ ∧ dx_{ik}) ∧ (dx_{j1} ∧ ⋯ ∧ dx_{jl}) = dx_{i1} ∧ ⋯ ∧ dx_{ik} ∧ dx_{j1} ∧ ⋯ ∧ dx_{jl}.

(ii) If ω_k ∈ Λ^k(R^n) and ω_l ∈ Λ^l(R^n) then ω_k ∧ ω_l ∈ Λ^{k+l}(R^n) and

    ω_k ∧ ω_l = (−1)^{kl} ω_l ∧ ω_k.

Proof. (i) Associativity is clear since concatenation of strings is associative. The distributive
laws are used to extend the multiplication from the basis to the entire space Λ(R^n).
We show (ii) for ω_k = dx_{i1} ∧ ⋯ ∧ dx_{ik} and ω_l = dx_{j1} ∧ ⋯ ∧ dx_{jl}. We already know
dx_i ∧ dx_j = −dx_j ∧ dx_i. There are kl transpositions dx_{ir} ↔ dx_{js} necessary to transport
all the dx_{js} from the right to the left of ω_k. Hence the sign is (−1)^{kl}.
In particular, dx_i ∧ dx_i = 0. The formula dx_i ∧ dx_j = −dx_j ∧ dx_i determines the product
in Λ(R^n) uniquely.
We call Λ(R^n) the exterior algebra of the vector space R^n.
The following formula will be used in the next subsection. Let ω ∈ Λ^k(R^n) and η ∈ Λ^l(R^n);
then for all v1, . . . , v_{k+l} ∈ R^n

    (ω ∧ η)(v1, . . . , v_{k+l}) = 1/(k! l!) Σ_{σ ∈ S_{k+l}} (−1)^σ ω(v_{σ(1)}, . . . , v_{σ(k)}) η(v_{σ(k+1)}, . . . , v_{σ(k+l)}).   (11.4)

Indeed, let ω = f1 ∧ ⋯ ∧ fk, η = f_{k+1} ∧ ⋯ ∧ f_{k+l}, f_i ∈ (R^n)*. The above formula can be
obtained from

    (f1 ∧ ⋯ ∧ fk) ∧ (f_{k+1} ∧ ⋯ ∧ f_{k+l})(v1, . . . , v_{k+l}) = det( f_i(v_j) )_{i,j=1}^{k+l}


when expanding this determinant with respect to the last l rows. This can be done using the
Laplace expansion:

    |A| = Σ_{1≤j1<⋯<jk≤k+l} (−1)^{Σ_{m=1}^k (i_m + j_m)} det( a_{i_r j_s} )_{r,s=1,...,k} · det( a_{i_r j_s} )_{r,s=k+1,...,k+l},

where (i1, . . . , ik) is any fixed ordered multi-index and (j_{k+1}, . . . , j_{k+l}) is the complementary
ordered multi-index to (j1, . . . , jk) such that all integers 1, 2, . . . , k + l appear.

11.1.2 The Pull-Back of k-forms


Definition 11.4 Let A L(Rn , Rm ) a linear mapping and k N. For k (Rm ) we define
a k-form A () k (Rn ) by
(A )(h1 , . . . , hk ) = (A(h1 ), A(h2 ), . . . , A(hk )),

h1 , . . . , hk Rn

We call A () the pull-back of under A.


Note that A L(k (Rm ), k (Rn )) is a linear mapping. In case k = 1 we call A the dual
mapping to A. In case k = 0, R we simply set A () = . We have A () =
A ()A (). Indeed, let k (Rn ), l (Rn ), and hi Rn , i = 1, . . . , k + l, then by
(11.4)
A ()(h1 , . . . ,hk+l ) = ()(A(h1 ), . . . , A(hk+l ))
1 X
(1) (A(v(1) ), . . . , A(v(k) )) (A(v(k+1) ), . . . , A(v(k+l) ))
=
k!l! S
k+l
1 X
(1) A ()(v(1) , . . . , v(k) ) A ()(v(k+1) , . . . , v(k+l) )
=
k!l! S
k+l

= (A ()A ())(h1 , . . . , hk+l ).




Example 11.3 (a) Let A = ( 1 0 3 ; 2 1 0 ) ∈ R^{2×3} be a linear map A : R^3 → R^2, defined
by matrix multiplication, A(v) = Av, v ∈ R^3. Let {e1, e2, e3} and {f1, f2} be the standard
bases of R^3 and R^2, respectively, and let {dx1, dx2, dx3} and {dy1, dy2} be their dual bases.
First we compute A*(dy1) and A*(dy2). Since A(e_i) is the ith column of A,

    A*(dy1)(e_i) = dy1(A(e_i)) = a_{1i},   A*(dy2)(e_i) = a_{2i}.

In particular,

    A*(dy1) = 1 dx1 + 0 dx2 + 3 dx3,   A*(dy2) = 2 dx1 + dx2.

Next we compute A*(dy2 ∧ dy1). By definition, for 1 ≤ i < j ≤ 3,

    A*(dy2 ∧ dy1)(e_i, e_j) = dy2 ∧ dy1 (A(e_i), A(e_j)) = | a_{2i}  a_{2j} |.
                                                           | a_{1i}  a_{1j} |

In particular,

    A*(dy2 ∧ dy1)(e1, e2) = | 2  1 | = −1,   A*(dy2 ∧ dy1)(e1, e3) = | 2  0 | = 6,
                            | 1  0 |                                 | 1  3 |

    A*(dy2 ∧ dy1)(e2, e3) = | 1  0 | = 3.
                            | 0  3 |

Hence,

    A*(dy2 ∧ dy1) = −dx1 ∧ dx2 + 6 dx1 ∧ dx3 + 3 dx2 ∧ dx3.

On the other hand,

    A*(dy2) ∧ A*(dy1) = (2 dx1 + dx2) ∧ (dx1 + 3 dx3) = −dx1 ∧ dx2 + 6 dx1 ∧ dx3 + 3 dx2 ∧ dx3.

(b) Let A ∈ R^{n×n}, A : R^n → R^n, and ω = dx1 ∧ ⋯ ∧ dxn ∈ Λ^n(R^n). Then

    A*(ω) = det(A) ω.
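The pull-back computation of Example 11.3 (a) can be replayed numerically from the definition (A*ω)(h1, h2) = ω(Ah1, Ah2); a sketch (helper names `apply`, `omega`, `pullback` are ours):

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1, 0, 3],
     [2, 1, 0]]                       # the matrix of Example 11.3 (a)

def apply(A, v):
    return [sum(A[r][c] * v[c] for c in range(len(v))) for r in range(len(A))]

def dy(i):                            # coordinate functionals on R^2
    return lambda h: h[i]

def omega(h1, h2):                    # dy2 ∧ dy1 on R^2, via (11.3)
    return det2([[dy(1)(h1), dy(1)(h2)],
                 [dy(0)(h1), dy(0)(h2)]])

def pullback(h1, h2):                 # (A*ω)(h1, h2) = ω(A h1, A h2)
    return omega(apply(A, h1), apply(A, h2))

e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# coefficients of A*(dy2 ∧ dy1) in the basis dx_i ∧ dx_j
assert pullback(e[0], e[1]) == -1
assert pullback(e[0], e[2]) == 6
assert pullback(e[1], e[2]) == 3
```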
11.1.3 Orientation of R^n

If {e1, . . . , en} and {f1, . . . , fn} are two bases of R^n there exists a unique regular matrix
A = (a_ij) (det A ≠ 0) such that e_i = Σ_j a_ij f_j. We say that {e1, . . . , en} and {f1, . . . , fn}
are equivalent if and only if det A > 0. Since det A ≠ 0, there are exactly two equivalence
classes. We say that the two bases {e_i | i = 1, . . . , n} and {f_i | i = 1, . . . , n} define the same
orientation if and only if det A > 0.
Definition 11.5 An orientation of R^n is given by fixing one of the two equivalence classes.
Example 11.4 (a) In R^2 the bases {e1, e2} and {e2, e1} have different orientations since
A = ( 0 1 ; 1 0 ) and det A = −1.
(b) In R^3 the bases {e1, e2, e3}, {e3, e1, e2} and {e2, e3, e1} have the same orientation, whereas
{e1, e3, e2}, {e2, e1, e3}, and {e3, e2, e1} have the opposite orientation.
(c) The standard basis {e1, . . . , en} and {e2, e1, e3, . . . , en} define different orientations.

11.2 Differential Forms


11.2.1 Definition
Throughout this section let U Rn be an open and connected set.
Definition 11.6 (a) A differential k-form on U is a mapping : U k (Rn ), i. e. to every
point p U we associate a k-form (p) k (Rn ). The linear space of differential k-forms on
U is denoted by k (U).


(b) Let ω be a differential k-form on U. Since {dx_{i1} ∧ ⋯ ∧ dx_{ik} | 1 ≤ i1 < i2 < ⋯ < ik ≤ n}
forms a basis of Λ^k(R^n), there exist uniquely determined functions a_{i1...ik} on U such that

    ω(p) = Σ_{1≤i1<⋯<ik≤n} a_{i1...ik}(p) dx_{i1} ∧ ⋯ ∧ dx_{ik}.   (11.5)

If all functions a_{i1...ik} are in C^r(U), r ∈ N ∪ {∞}, we say ω is an r times continuously
differentiable differential k-form on U. The set of those differential k-forms is denoted by
Ω_r^k(U). We define Ω_r^0(U) = C^r(U) and Ω(U) = ⊕_{k=0}^n Ω^k(U). The product in Λ(R^n)
defines a product in Ω(U):

    (ω ∧ η)(x) = ω(x) ∧ η(x),   x ∈ U;

hence Ω(U) is an algebra. For example, if

    ω1 = x^2 dy ∧ dz + xyz dx ∧ dy,   ω2 = (xy^2 + 3z^2) dx

define a differential 2-form and a 1-form on R^3, ω1 ∈ Ω^2(R^3), ω2 ∈ Ω^1(R^3), then

    ω1 ∧ ω2 = (x^3 y^2 + 3x^2 z^2) dx ∧ dy ∧ dz.

11.2.2 Differentiation

Definition 11.7 Let f ∈ Ω^0(U) = C^r(U) and p ∈ U. We define

    df(p) = Df(p);

then df is a differential 1-form on U.
If ω(p) = Σ_{1≤i1<⋯<ik≤n} a_{i1...ik}(p) dx_{i1} ∧ ⋯ ∧ dx_{ik} is a differential k-form, we define

    dω(p) = Σ_{1≤i1<⋯<ik≤n} da_{i1...ik}(p) ∧ dx_{i1} ∧ ⋯ ∧ dx_{ik}.   (11.6)

Then dω is a differential (k + 1)-form. The linear operator d : Ω^k(U) → Ω^{k+1}(U) is called
the exterior differential.
Remarks 11.1 (a) Note that for a function f : U → R, Df ∈ L(R^n, R) = Λ^1(R^n). By
Example 7.7 (a),

    Df(x)(h) = grad f(x) · h = Σ_{i=1}^n ∂f/∂x_i (x) h_i = Σ_{i=1}^n ∂f/∂x_i (x) dx_i(h);

hence

    df(x) = Σ_{i=1}^n ∂f/∂x_i (x) dx_i.   (11.7)

Viewing x_i : U → R as a C^∞-function, by the above formula

    d(x_i)(x) = dx_i.

This justifies the notation dx_i. If f ∈ C^∞(R) we have df(x) = f'(x) dx.
(b) One can show that the definition of dω does not depend on the choice of the basis
{dx1, . . . , dxn} of Λ^1(R^n).
Example 11.5 (a) Let G = R^2 and ω = e^{xy} dx + xy^3 dy. Then

    dω = d(e^{xy}) ∧ dx + d(xy^3) ∧ dy
       = (y e^{xy} dx + x e^{xy} dy) ∧ dx + (y^3 dx + 3xy^2 dy) ∧ dy
       = (y^3 − x e^{xy}) dx ∧ dy.

(b) Let f be continuously differentiable. Then

    df = f_x dx + f_y dy + f_z dz = grad f · (dx, dy, dz) = grad f · d~x.

(c) Let v = (v1, v2, v3) be a C^1-vector field. Put ω = v1 dx + v2 dy + v3 dz. Then we have

    dω = (∂v3/∂y − ∂v2/∂z) dy ∧ dz + (∂v1/∂z − ∂v3/∂x) dz ∧ dx + (∂v2/∂x − ∂v1/∂y) dx ∧ dy
       = curl(v) · (dy ∧ dz, dz ∧ dx, dx ∧ dy) = curl(v) · dS.

(d) Let v be as above. Put ω = v1 dy ∧ dz + v2 dz ∧ dx + v3 dx ∧ dy. Then we have

    dω = div(v) dx ∧ dy ∧ dz.
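The coefficient of dx ∧ dy computed in (a) can be checked against finite differences; a minimal sketch (the sample point is arbitrary, `ddx`/`ddy` are our helper names):

```python
from math import exp

def a1(x, y):
    return exp(x * y)            # ω = e^{xy} dx + x y^3 dy, as in (a)

def a2(x, y):
    return x * y ** 3

def ddx(g, x, y, h=1e-6):        # central difference in x
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def ddy(g, x, y, h=1e-6):        # central difference in y
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 0.7, -1.3
coeff = ddx(a2, x0, y0) - ddy(a1, x0, y0)   # dω = (∂a2/∂x − ∂a1/∂y) dx ∧ dy
assert abs(coeff - (y0 ** 3 - x0 * exp(x0 * y0))) < 1e-5
```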
Proposition 11.3 The exterior differential d is a linear mapping which satisfies
(i) d(ω ∧ η) = dω ∧ η + (−1)^k ω ∧ dη,   ω ∈ Ω_1^k(U), η ∈ Ω_1^l(U),
(ii) d(dω) = 0,   ω ∈ Ω_2^k(U).

Proof. (i) We first prove the Leibniz rule for functions f, g ∈ Ω_1^0(U). By Remarks 11.1 (a),

    d(fg) = Σ_i ∂(fg)/∂x_i dx_i = Σ_i ( ∂f/∂x_i g + f ∂g/∂x_i ) dx_i
          = ( Σ_i ∂f/∂x_i dx_i ) g + f ( Σ_i ∂g/∂x_i dx_i ) = df · g + f · dg.

For I = (i1, . . . , ik) and J = (j1, . . . , jl) we abbreviate dx_I = dx_{i1} ∧ ⋯ ∧ dx_{ik} and
dx_J = dx_{j1} ∧ ⋯ ∧ dx_{jl}. Let ω = Σ_I a_I dx_I and η = Σ_J b_J dx_J. By definition

    d(ω ∧ η) = d( Σ_{I,J} a_I b_J dx_I ∧ dx_J ) = Σ_{I,J} d(a_I b_J) ∧ dx_I ∧ dx_J
             = Σ_{I,J} (da_I · b_J + a_I db_J) ∧ dx_I ∧ dx_J
             = Σ_{I,J} da_I ∧ dx_I ∧ b_J dx_J + (−1)^k Σ_{I,J} a_I dx_I ∧ db_J ∧ dx_J
             = dω ∧ η + (−1)^k ω ∧ dη,

where in the third line we used db_J ∧ dx_I = (−1)^k dx_I ∧ db_J.
(ii) Again by the definition of dω:

    d(dω) = Σ_I d(da_I ∧ dx_I) = Σ_{I,j} d( ∂a_I/∂x_j dx_j ∧ dx_I ) = Σ_{I,i,j} ∂²a_I/(∂x_i ∂x_j) dx_i ∧ dx_j ∧ dx_I
          = Σ_{I,i,j} ∂²a_I/(∂x_j ∂x_i) (−dx_j ∧ dx_i ∧ dx_I) = −d(dω),

where we used Schwarz's lemma. It follows that d(dω) = d²ω = 0.
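The role of Schwarz's lemma in d² = 0 can be seen numerically: the two mixed second partials of a C² function agree, so the dx ∧ dy coefficient of d(df) vanishes. A sketch with nested finite differences (the test function is arbitrary):

```python
from math import sin, exp

def f(x, y):
    return sin(x * y) + exp(x) * y ** 2    # any C^2 function will do

def ddx(g, x, y, h=1e-5):
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def ddy(g, x, y, h=1e-5):
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 0.4, 0.9
# d(df) = (f_yx − f_xy) dx ∧ dy; Schwarz's lemma makes the coefficient vanish
f_xy = ddy(lambda u, v: ddx(f, u, v), x0, y0)
f_yx = ddx(lambda u, v: ddy(f, u, v), x0, y0)
assert abs(f_xy - f_yx) < 1e-3
```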

11.2.3 Pull-Back
Definition 11.8 Let f : U → V be a differentiable function with open sets U ⊆ R^n and
V ⊆ R^m. Let ω ∈ Ω^k(V) be a differential k-form. We define a differential k-form
f*(ω) ∈ Ω^k(U) by

    (f*ω)(p) = (Df(p))*(ω(f(p))),   i. e.
    (f*ω)(p; h1, . . . , hk) = ω(f(p); Df(p)(h1), . . . , Df(p)(hk)),   p ∈ U, h1, . . . , hk ∈ R^n.

In case k = 0 and ω ∈ Ω^0(V) = C^∞(V) we simply set

    (f*ω)(p) = ω(f(p)),   f*ω = ω ∘ f.

We call f*(ω) the pull-back of the differential k-form ω with respect to f.

Note that by definition the pull-back f* is a linear mapping from the space of differential
k-forms on V to the space of differential k-forms on U, f* : Ω^k(V) → Ω^k(U).


Proposition 11.4 Let f be as above and ω, η ∈ Ω(V). Let {dy1, . . . , dym} be the dual basis
to the standard basis in (R^m)*. Then we have, with f = (f1, . . . , fm),

    (a)  f*(dy_i) = Σ_{j=1}^n ∂f_i/∂x_j dx_j = df_i,   i = 1, . . . , m,          (11.8)
    (b)  f*(dω) = d(f*ω),                                                         (11.9)
    (c)  f*(aω) = (a ∘ f) f*(ω),   a ∈ C^∞(V),                                    (11.10)
    (d)  f*(ω ∧ η) = f*(ω) ∧ f*(η).                                               (11.11)

If n = m, then

    (e)  f*(dy1 ∧ ⋯ ∧ dyn) = ∂(f1, . . . , fn)/∂(x1, . . . , xn) dx1 ∧ ⋯ ∧ dxn.   (11.12)

Proof. We show (a). Let h ∈ R^n; by Definition 11.7 and the definition of the derivative we have

    f*(dy_i)(h) = dy_i(Df(p)(h)) = dy_i( ( Σ_{j=1}^n ∂f_k(p)/∂x_j h_j )_{k=1,...,m} )
                = Σ_{j=1}^n ∂f_i(p)/∂x_j h_j = Σ_{j=1}^n ∂f_i(p)/∂x_j dx_j(h).

This shows (a). Equation (11.10) is a special case of (11.11); we prove (d). Let p ∈ U. Using
the pull-back formula for k-forms we obtain

    f*(ω ∧ η)(p) = (Df(p))*((ω ∧ η)(f(p))) = (Df(p))*(ω(f(p)) ∧ η(f(p)))
                 = (Df(p))*(ω(f(p))) ∧ (Df(p))*(η(f(p)))
                 = f*(ω)(p) ∧ f*(η)(p) = (f*(ω) ∧ f*(η))(p).

To show (11.9) we start with a 0-form g and prove that f*(dg) = d(f*g) for functions
g : V → R. By (11.7) and (11.10) we have

    f*(dg)(p) = f*( Σ_{i=1}^m ∂g/∂y_i dy_i )(p) = Σ_{i=1}^m ∂g(f(p))/∂y_i f*(dy_i)
              = Σ_{i=1}^m ∂g(f(p))/∂y_i Σ_{j=1}^n ∂f_i(p)/∂x_j dx_j
              = Σ_{j=1}^n ( Σ_{i=1}^m ∂g(f(p))/∂y_i ∂f_i(p)/∂x_j ) dx_j
              = Σ_{j=1}^n ∂(g ∘ f)(p)/∂x_j dx_j   (chain rule)
              = d(g ∘ f)(p) = d(f*g)(p).

Now let ω = Σ_I a_I dx_I be an arbitrary form. Since by the Leibniz rule

    d(f*(dx_I)) = d( d(f_{i1}) ∧ ⋯ ∧ d(f_{ik}) ) = 0,

we get, again by the Leibniz rule,

    d(f*ω) = d( Σ_I f*(a_I) f*(dx_I) ) = Σ_I ( d(f*(a_I)) ∧ f*(dx_I) + f*(a_I) d(f*(dx_I)) )
           = Σ_I d(f*(a_I)) ∧ f*(dx_I).

On the other hand, by (d) we have

    f*(dω) = f*( Σ_I da_I ∧ dx_I ) = Σ_I f*(da_I) ∧ f*(dx_I).

By the first part of (b), both expressions coincide. This completes the proof of (b).
We finally prove (e). By (a) and (d) we have

    f*(dy1 ∧ ⋯ ∧ dyn) = f*(dy1) ∧ ⋯ ∧ f*(dyn)
      = ( Σ_{i1=1}^n ∂f1/∂x_{i1} dx_{i1} ) ∧ ⋯ ∧ ( Σ_{in=1}^n ∂fn/∂x_{in} dx_{in} )
      = Σ_{i1,...,in=1}^n ∂f1/∂x_{i1} ⋯ ∂fn/∂x_{in} dx_{i1} ∧ ⋯ ∧ dx_{in}.

Since the square of a 1-form vanishes, the only non-vanishing terms in the above sum are
those where (i1, . . . , in) is a permutation of (1, . . . , n). Using skew-symmetry to write
dx_{i1} ∧ ⋯ ∧ dx_{in} as a multiple of dx1 ∧ ⋯ ∧ dxn, we obtain the sign of the permutation
(i1, . . . , in):

    f*(dy1 ∧ ⋯ ∧ dyn) = Σ_{I=(i1,...,in) ∈ Sn} sign(I) ∂f1/∂x_{i1} ⋯ ∂fn/∂x_{in} dx1 ∧ ⋯ ∧ dxn
                      = ∂(f1, . . . , fn)/∂(x1, . . . , xn) dx1 ∧ ⋯ ∧ dxn.

Example 11.6 (a) Let f(r, φ) = (r cos φ, r sin φ) be given on R^2 \ ({0} × R) and let {dr, dφ}
and {dx, dy} be the dual bases to {e_r, e_φ} and {e1, e2}. We have

    f*(x) = r cos φ,   f*(y) = r sin φ,
    f*(dx) = cos φ dr − r sin φ dφ,   f*(dy) = sin φ dr + r cos φ dφ,
    f*(dx ∧ dy) = r dr ∧ dφ,
    f*( −y/(x^2 + y^2) dx + x/(x^2 + y^2) dy ) = dφ.

(b) Let k ∈ N, r ∈ {1, . . . , k + 1}, and α ∈ R. Define a mapping I : R^k → R^{k+1} and a form
ω ∈ Ω^k(R^{k+1}) by

    I(x1, . . . , xk) = (x1, . . . , x_{r−1}, α, x_r, . . . , xk),
    ω(y1, . . . , y_{k+1}) = Σ_{i=1}^{k+1} f_i(y) dy1 ∧ ⋯ ∧ \hat{dy_i} ∧ ⋯ ∧ dy_{k+1},

where f_i ∈ C^∞(R^{k+1}) for all i; the hat means omission of the factor dy_i. Then

    I*(ω)(x) = f_r(x1, . . . , x_{r−1}, α, x_r, . . . , xk) dx1 ∧ ⋯ ∧ dxk.

This follows from

    I*(dy_i) = dx_i,   i = 1, . . . , r − 1,
    I*(dy_r) = 0,
    I*(dy_{i+1}) = dx_i,   i = r, . . . , k.

Roughly speaking: f*(ω) is obtained by substituting the new variables at all places.
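The formula f*(dx ∧ dy) = r dr ∧ dφ from (a) is an instance of (11.12): the coefficient is the Jacobian determinant of the polar map. A numerical sketch with finite-difference partials (the sample point is arbitrary):

```python
from math import cos, sin

def f(r, phi):
    return (r * cos(phi), r * sin(phi))   # the polar-coordinate map of (a)

def jacobian_det(r, phi, h=1e-6):
    # f*(dx ∧ dy) = det Df(r, φ) dr ∧ dφ, cf. (11.12)
    dfr = [(a - b) / (2 * h) for a, b in zip(f(r + h, phi), f(r - h, phi))]
    dfp = [(a - b) / (2 * h) for a, b in zip(f(r, phi + h), f(r, phi - h))]
    return dfr[0] * dfp[1] - dfr[1] * dfp[0]

r0, phi0 = 2.0, 0.6
assert abs(jacobian_det(r0, phi0) - r0) < 1e-6   # f*(dx ∧ dy) = r dr ∧ dφ
```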

11.2.4 Closed and Exact Forms

Motivation: Let f(x) be a continuous function on R. Then ω = f(x) dx is a 1-form. By the
fundamental theorem of calculus, there exists an antiderivative F(x) of f(x) such that
dF(x) = f(x) dx = ω.
Problem: Given ω ∈ Ω^k(U), does there exist η ∈ Ω^{k−1}(U) with dη = ω?

Definition 11.9 ω ∈ Ω^k(U) is called closed if dω = 0.
ω ∈ Ω^k(U) is called exact if there exists η ∈ Ω^{k−1}(U) such that dη = ω.

Remarks 11.2 (a) An exact form is closed; indeed, dω = d(dη) = 0.
(b) A 1-form ω = Σ_i f_i dx_i is closed if and only if curl f~ = 0 for the corresponding vector
field f~ = (f1, . . . , fn). Here the general curl can be defined as a vector with n(n − 1)/2
components

    (curl f~)_{ij} = ∂f_j/∂x_i − ∂f_i/∂x_j.

The form ω is exact if and only if f~ is conservative, that is, f~ is a gradient vector field with
f~ = grad U. Then ω = dU.
(c) There are closed forms that are not exact; for example, the winding form

    ω = −y/(x^2 + y^2) dx + x/(x^2 + y^2) dy

on R^2 \ {(0, 0)} is not exact, cf. homework 30.1.
(d) If dη = ω then d(η + dζ) = ω, too, for all ζ ∈ Ω^{k−2}(U).
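That the winding form of (c) is closed can be checked numerically: its coefficient Q_x − P_y vanishes away from the origin. A sketch with central finite differences (the sample points are arbitrary):

```python
def P(x, y):
    return -y / (x * x + y * y)     # ω = P dx + Q dy, the winding form
def Q(x, y):
    return x / (x * x + y * y)

def ddx(g, x, y, h=1e-6):
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def ddy(g, x, y, h=1e-6):
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

for (x, y) in [(1.0, 0.3), (-0.5, 2.0), (0.2, -0.7)]:
    # dω = (Q_x − P_y) dx ∧ dy vanishes on R^2 \ {(0, 0)}
    assert abs(ddx(Q, x, y) - ddy(P, x, y)) < 1e-5
```

That ω is nevertheless not exact is exactly what the line-integral computation later in this chapter (2πn ≠ 0) makes quantitative.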

Definition 11.10 An open set U is called star-shaped if there exists an x0 ∈ U such that for
all x ∈ U the segment from x0 to x lies in U, i. e. (1 − t)x0 + tx ∈ U for all t ∈ [0, 1].

Convex sets U are star-shaped (take any x0 ∈ U); any star-shaped set is connected and simply
connected.


Lemma 11.5 Let U ⊆ R^n be star-shaped with respect to the origin. Let
ω = Σ_{i1<⋯<ik} a_{i1...ik} dx_{i1} ∧ ⋯ ∧ dx_{ik} ∈ Ω^k(U). Define

    I(ω)(x) = Σ_{i1<⋯<ik} Σ_{r=1}^k (−1)^{r−1} ( ∫_0^1 t^{k−1} a_{i1...ik}(tx) dt ) x_{ir} dx_{i1} ∧ ⋯ ∧ \hat{dx_{ir}} ∧ ⋯ ∧ dx_{ik},   (11.13)

where the hat means omission of the factor dx_{ir}. Then we have

    I(dω) + d(Iω) = ω.   (11.14)

(Without proof.)
Example 11.7 (a) Let k = 1, n = 3, and ω = a1 dx1 + a2 dx2 + a3 dx3. Then

    I(ω) = x1 ∫_0^1 a1(tx) dt + x2 ∫_0^1 a2(tx) dt + x3 ∫_0^1 a3(tx) dt.

Note that this is exactly the formula for the potential U(x1, x2, x3) from Remark 8.5 (b). Let
(a1, a2, a3) be a vector field on U with dω = 0. This is equivalent to curl a = 0 by
Example 11.5 (c). The above lemma shows dU = ω for U = I(ω); this means
grad U = (a1, a2, a3), i. e. U is a potential of the vector field (a1, a2, a3).
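The potential formula of (a) can be evaluated numerically for a concrete closed field; a sketch with the midpoint rule, using the illustrative field a = (2xy, x², 1) whose potential is x²y + z:

```python
def a(p):
    x, y, z = p
    return (2 * x * y, x * x, 1.0)   # curl a = 0; a = grad(x²y + z)

def potential(p, N=4000):
    # U(p) = Σ_i p_i ∫_0^1 a_i(tp) dt, the k = 1 case of (11.13)
    U = 0.0
    for j in range(N):               # midpoint rule on [0, 1]
        t = (j + 0.5) / N
        at = a(tuple(t * c for c in p))
        U += sum(pi * ai for pi, ai in zip(p, at)) / N
    return U

p = (1.2, -0.7, 2.5)
exact = p[0] ** 2 * p[1] + p[2]      # the known potential x²y + z
assert abs(potential(p) - exact) < 1e-5
```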
(b) Let k = 2, n = 3, and ω = a1 dx2 ∧ dx3 + a2 dx3 ∧ dx1 + a3 dx1 ∧ dx2, where a is a
C^1-vector field on U. Then

    I(ω) = ( x3 ∫_0^1 t a2(tx) dt − x2 ∫_0^1 t a3(tx) dt ) dx1
         + ( x1 ∫_0^1 t a3(tx) dt − x3 ∫_0^1 t a1(tx) dt ) dx2
         + ( x2 ∫_0^1 t a1(tx) dt − x1 ∫_0^1 t a2(tx) dt ) dx3.

By Example 11.5 (d), ω is closed if and only if div(a) = 0 on U. Let η = b1 dx1 + b2 dx2 +
b3 dx3 be such that dη = ω. This means curl b = a. The Poincaré lemma shows that b with
curl b = a exists if and only if div(a) = 0. Then b is called a vector potential of a. In case
dω = 0 we can choose ~b · d~x = I(ω).
Theorem 11.6 (Poincaré Lemma) Let U be star-shaped. Then every closed differential form
on U is exact.
Proof. Without loss of generality let U be star-shaped with respect to the origin and dω = 0.
By Lemma 11.5, d(Iω) = ω.


Remarks 11.3 (a) Let U be star-shaped, ω ∈ Ω^k(U). Suppose dη0 = ω for some
η0 ∈ Ω^{k−1}(U). Then the general solution of dη = ω is given by η0 + dζ with ζ ∈ Ω^{k−2}(U).
Indeed, let η be a second solution of dη = ω. Then d(η − η0) = 0. By the Poincaré lemma,
there exists ζ ∈ Ω^{k−2}(U) with η − η0 = dζ, hence η = η0 + dζ.
(b) Let V be a linear space and W a linear subspace of V. We define an equivalence relation
on V by v1 ∼ v2 if v1 − v2 ∈ W. The equivalence class of v is denoted by v + W. One easily
sees that the set of equivalence classes, denoted by V/W, is again a linear space:
(v + W) + (u + W) := v + u + W.
Let U be an arbitrary open subset of R^n. We define

    C^k(U) = {ω ∈ Ω^k(U) | dω = 0}, the cocycles on U,
    B^k(U) = {ω ∈ Ω^k(U) | ω is exact}, the coboundaries on U.

Since exact forms are closed, B^k(U) is a linear subspace of C^k(U). The factor space

    H^k_deR(U) = C^k(U)/B^k(U)

is called the de Rham cohomology of U. If U is star-shaped, H^k_deR(U) = 0 for k ≥ 1, by
Poincaré's lemma. The first de Rham cohomology H^1_deR of R^2 \ {(0, 0)} is non-zero: the
winding form is a non-zero element. We have

    H^0_deR(U) = R^p

if and only if U has exactly p connected components, U = U1 ∪ ⋯ ∪ Up (disjoint union).
Then the characteristic functions χ_{Ui}, i = 1, . . . , p, form a basis of the 0-cocycles C^0(U)
(B^0(U) = 0).

11.3 Stokes Theorem

11.3.1 Singular Cubes, Singular Chains, and the Boundary Operator

A very nice treatment of the topics of this section is [Spi65, Chapter 4]. The set [0, 1]^k =
[0, 1] × ⋯ × [0, 1] = {x ∈ R^k | 0 ≤ x_i ≤ 1, i = 1, . . . , k} is called the k-dimensional unit
cube. Let U ⊆ R^n be open.
Definition 11.11 (a) A singular k-cube in U ⊆ R^n is a continuously differentiable mapping
c_k : [0, 1]^k → U.
(b) A singular k-chain in U is a formal sum

    s_k = n1 c_{k,1} + ⋯ + nr c_{k,r}

with singular k-cubes c_{k,i} and integers n_i ∈ Z.

A singular 0-cube is a point, a singular 1-cube is a curve, and, in general, a singular 2-cube
(in R^3) is a surface with a boundary of 4 pieces which are differentiable curves. Note that a
singular 2-cube can also be a single point; that is where the name "singular" comes from.


Let I_k : [0, 1]^k → R^k be the identity map, i. e. I_k(x) = x, x ∈ [0, 1]^k. It is called the
standard k-cube in R^k. We are going to define the boundary ∂s_k of a singular k-chain s_k. For
i = 1, . . . , k define

    I^k_{(i,0)}(x1, . . . , x_{k−1}) = (x1, . . . , x_{i−1}, 0, x_i, . . . , x_{k−1}),
    I^k_{(i,1)}(x1, . . . , x_{k−1}) = (x1, . . . , x_{i−1}, 1, x_i, . . . , x_{k−1});

insert a 0 and a 1 at the ith component, respectively.
The boundary of the standard k-cube I_k is now defined as the formal sum of mappings
[0, 1]^{k−1} → [0, 1]^k

    ∂I_k = Σ_{i=1}^k (−1)^i ( I^k_{(i,0)} − I^k_{(i,1)} ).   (11.15)

It is the formal sum of 2k singular (k − 1)-cubes, the faces of the k-cube.
The boundary of an arbitrary singular k-cube c_k : [0, 1]^k → U ⊆ R^n is defined by the
composition of the above mappings I^k_{(i,ε)} : [0, 1]^{k−1} → [0, 1]^k with the k-cube c_k:

    ∂c_k = c_k ∘ ∂I_k = Σ_{i=1}^k (−1)^i ( c_k ∘ I^k_{(i,0)} − c_k ∘ I^k_{(i,1)} ),   (11.16)

and for a singular k-chain s_k = n1 c_{k,1} + ⋯ + nr c_{k,r} we set

    ∂s_k = n1 ∂c_{k,1} + ⋯ + nr ∂c_{k,r}.

The boundary operator ∂ associates to each singular k-chain a singular (k − 1)-chain (since
both I^k_{(i,0)} and I^k_{(i,1)} depend on k − 1 variables, all from the segment [0, 1]).
One can show that

    ∂(∂s_k) = 0

for any singular k-chain s_k.
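The combinatorics of (11.15) and the cancellation ∂∂ = 0 can be checked mechanically; a sketch in which a face of [0, 1]^k is encoded by the set of coordinates it freezes (the encoding and helper name `boundary` are ours):

```python
from collections import Counter

def boundary(chain, k):
    """One application of ∂ following (11.15): a face of [0,1]^k is encoded
    by the frozenset of (coordinate, value) pairs it keeps fixed; `chain`
    maps faces to integer coefficients."""
    out = Counter()
    for face, coeff in chain.items():
        fixed = dict(face)
        free = [c for c in range(k) if c not in fixed]
        for i, coord in enumerate(free, start=1):   # i-th free slot, 1-based
            for eps in (0, 1):
                sign = (-1) ** i * (1 if eps == 0 else -1)
                out[face | frozenset({(coord, eps)})] += coeff * sign
    return Counter({f: c for f, c in out.items() if c != 0})

k = 3
cube = Counter({frozenset(): 1})        # the standard cube I^3
faces = boundary(cube, k)               # its 6 signed faces, cf. Example 11.8 (a)
assert len(faces) == 2 * k
assert boundary(faces, k) == Counter()  # ∂(∂ I^3) = 0
```

Each (k − 2)-face arises twice, once for each order in which its two coordinates are frozen, and the two signs are opposite; this is exactly why ∂∂ = 0.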
Example 11.8 (a) In case n = k = 3 we have

    ∂I_3 = −I^3_{(1,0)} + I^3_{(1,1)} + I^3_{(2,0)} − I^3_{(2,1)} − I^3_{(3,0)} + I^3_{(3,1)},

where

    I^3_{(1,0)}(x1, x2) = (0, x1, x2),   I^3_{(1,1)}(x1, x2) = (1, x1, x2),
    I^3_{(2,0)}(x1, x2) = (x1, 0, x2),   I^3_{(2,1)}(x1, x2) = (x1, 1, x2),
    I^3_{(3,0)}(x1, x2) = (x1, x2, 0),   I^3_{(3,1)}(x1, x2) = (x1, x2, 1).

Note that, if we take care of the signs in (11.15), all 6 unit normal vectors
±D1 I^3_{(i,j)} × D2 I^3_{(i,j)} to the faces have the orientation of the outer normal with respect
to the unit 3-cube [0, 1]^3. The above sum ∂I_3 is a formal sum of singular 2-cubes. You are not
allowed to add componentwise: −(0, x1, x2) + (1, x1, x2) ≠ (1, 0, 0).

(b) In case k = 2 we have

    ∂I_2(x) = I^2_{(1,1)} − I^2_{(1,0)} + I^2_{(2,0)} − I^2_{(2,1)} = (1, x) − (0, x) + (x, 0) − (x, 1).

With the corner points E1 = (0, 0), E2 = (1, 0), E3 = (1, 1), E4 = (0, 1) of the square,

    ∂∂I_2 = (E3 − E2) − (E4 − E1) + (E2 − E1) − (E3 − E4) = 0.

Accordingly, for a singular 2-cube c2 the boundary ∂c2 = γ1 + γ2 − γ3 − γ4 is the signed sum
of its four edge curves γ_i.
(c) Let c2 : [0, 2π] × [0, π] → R^3 \ {(0, 0, 0)} be the singular 2-cube (we allow the rectangle
[0, 2π] × [0, π] in place of the unit square)

    c2(s, t) = (cos s sin t, sin s sin t, cos t).

By (b),

    ∂c2(x) = c2 ∘ ∂I_2 = c2(2π, x) − c2(0, x) + c2(x, 0) − c2(x, π)
           = (cos 2π sin x, sin 2π sin x, cos x) − (cos 0 sin x, sin 0 sin x, cos x)
             + (cos x sin 0, sin x sin 0, cos 0) − (cos x sin π, sin x sin π, cos π)
           = (sin x, 0, cos x) − (sin x, 0, cos x) + (0, 0, 1) − (0, 0, −1)
           = (0, 0, 1) − (0, 0, −1).

Hence, the boundary ∂c2 of the singular 2-cube c2 is a degenerate singular 1-chain: it consists
of the two constant curves (0, 0, 1) and (0, 0, −1), the north and the south pole of the sphere.
We come back to this example.

11.3.2 Integration

Definition 11.12 Let c_k : [0, 1]^k → U ⊆ R^n, ~x = c_k(t1, . . . , tk), be a singular k-cube and
ω a k-form on U. Then (c_k)*(ω) is a k-form on the unit cube [0, 1]^k. Thus there exists a
unique function f(t), t ∈ [0, 1]^k, such that

    (c_k)*(ω) = f(t) dt1 ∧ ⋯ ∧ dtk.

Then

    ∫_{c_k} ω := ∫_{I_k} (c_k)*(ω) := ∫_{[0,1]^k} f(t) dt1 ⋯ dtk

is called the integral of ω over the singular cube c_k; on the right there is the k-dimensional
Riemann integral.
If s_k = Σ_{i=1}^r n_i c_{k,i} is a k-chain, set

    ∫_{s_k} ω = Σ_{i=1}^r n_i ∫_{c_{k,i}} ω.

If k = 0, a 0-cube is a single point c_0(0) = x0 and a 0-form is a function φ ∈ C^∞(G). We set
∫_{c_0} φ = (c_0)*(φ)|_{t=0} = φ(c_0(0)) = φ(x0). We discuss two special cases, k = 1 and k = n.


Example 11.9 (a) k = 1. Let c : [0, 1] → R^n be an oriented, smooth curve, Γ = c([0, 1]).
Let ω = f1(x) dx1 + ⋯ + fn(x) dxn be a 1-form on R^n; then

    c*(ω) = ( f1(c(t)) c1'(t) + ⋯ + fn(c(t)) cn'(t) ) dt

is a 1-form on [0, 1] such that

    ∫_c ω = ∫_{[0,1]} c*(ω) = ∫_0^1 ( f1(c(t)) c1'(t) + ⋯ + fn(c(t)) cn'(t) ) dt = ∫_Γ f~ · d~x.

Obviously, ∫_c ω is the line integral of f~ over Γ.
(b) k = n. Let c : [0, 1]^k → R^k be continuously differentiable and let x = c(t). Let
ω = f(x) dx1 ∧ ⋯ ∧ dxk be a differential k-form on R^k. By Proposition 11.4 (e),

    c*(ω) = f(c(t)) ∂(c1, . . . , ck)/∂(t1, . . . , tk) dt1 ∧ ⋯ ∧ dtk.

Therefore,

    ∫_c ω = ∫_{[0,1]^k} f(c(t)) ∂(c1, . . . , ck)/∂(t1, . . . , tk) dt1 ⋯ dtk.   (11.17)

Let c = I_k be the standard k-cube in [0, 1]^k. Then

    ∫_{I_k} ω = ∫_{[0,1]^k} f(x) dx1 ⋯ dxk

is the k-dimensional Riemann integral of f over [0, 1]^k.
Let Ĩ_k(x1, . . . , xk) = (x2, x1, x3, . . . , xk). Then Ĩ_k([0, 1]^k) = I_k([0, 1]^k) = [0, 1]^k;
however

    ∫_{Ĩ_k} ω = ∫_{[0,1]^k} f(x2, x1, x3, . . . , xk) (−1) dx1 ⋯ dxk = −∫_{[0,1]^k} f(x1, x2, x3, . . . , xk) dx1 ⋯ dxk = −∫_{I_k} ω.

We see that ∫ ω is an oriented Riemann integral. Note that in the above formula (11.17) we do
not have the absolute value of the Jacobian.
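The line-integral formula of (a) is easy to evaluate numerically; a sketch that integrates the winding form over the unit circle (helper name `line_integral` is ours), recovering the value 2π used later for the winding number:

```python
from math import cos, sin, pi

def omega(x, y):                 # the winding form as a coefficient pair (P, Q)
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def line_integral(c, dc, N=20000):
    # ∫_c ω = ∫_0^1 (P(c(t)) c1'(t) + Q(c(t)) c2'(t)) dt, midpoint rule
    total = 0.0
    for j in range(N):
        t = (j + 0.5) / N
        (x, y), (dx, dy) = c(t), dc(t)
        P, Q = omega(x, y)
        total += (P * dx + Q * dy) / N
    return total

c = lambda t: (cos(2 * pi * t), sin(2 * pi * t))            # unit circle
dc = lambda t: (-2 * pi * sin(2 * pi * t), 2 * pi * cos(2 * pi * t))
assert abs(line_integral(c, dc) - 2 * pi) < 1e-6
```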

11.3.3 Stokes Theorem


Theorem 11.7 Let U be an open subset of R^n, k ≥ 0 a non-negative integer, and let s_{k+1}
be a singular (k + 1)-chain in U. Let ω be a differential k-form on U, ω ∈ Ω_1^k(U). Then we
have

    ∫_{∂s_{k+1}} ω = ∫_{s_{k+1}} dω.


Proof. (a) Let s_{k+1} = I_{k+1} be the standard (k + 1)-cube; in particular n = k + 1. Let

    ω = Σ_{i=1}^{k+1} f_i(x) dx1 ∧ ⋯ ∧ \hat{dx_i} ∧ ⋯ ∧ dx_{k+1}.

Then

    dω = Σ_{i=1}^{k+1} (−1)^{i+1} ∂f_i(x)/∂x_i dx1 ∧ ⋯ ∧ dx_{k+1};

hence, by Example 11.6 (b), Fubini's theorem, and the fundamental theorem of calculus,

    ∫_{I_{k+1}} dω = Σ_{i=1}^{k+1} (−1)^{i+1} ∫_{[0,1]^{k+1}} ∂f_i/∂x_i dx1 ⋯ dx_{k+1}
      = Σ_{i=1}^{k+1} (−1)^{i+1} ∫_{[0,1]^k} ( ∫_0^1 ∂f_i/∂x_i (x1, . . . , t, . . . , x_{k+1}) dt ) dx1 ⋯ \hat{dx_i} ⋯ dx_{k+1}
      = Σ_{i=1}^{k+1} (−1)^{i+1} ∫_{[0,1]^k} ( f_i(x1, . . . , 1, . . . , x_{k+1}) − f_i(x1, . . . , 0, . . . , x_{k+1}) ) dx1 ⋯ \hat{dx_i} ⋯ dx_{k+1}
      = Σ_{i=1}^{k+1} (−1)^{i+1} ( ∫_{[0,1]^k} (I^{k+1}_{(i,1)})*(ω) − ∫_{[0,1]^k} (I^{k+1}_{(i,0)})*(ω) )
      = ∫_{∂I_{k+1}} ω

by Example 11.6 (b) and the definition of ∂I_{k+1}. The assertion is shown in the case of the
identity map.
(b) The general case of a singular (k + 1)-cube c_{k+1}. Since the pull-back and the differential
commute (Proposition 11.4) we have, using (a),

    ∫_{c_{k+1}} dω = ∫_{I_{k+1}} (c_{k+1})*(dω) = ∫_{I_{k+1}} d((c_{k+1})*ω) = ∫_{∂I_{k+1}} (c_{k+1})*(ω)
      = Σ_{i=1}^{k+1} (−1)^i ( ∫_{I^{k+1}_{(i,0)}} (c_{k+1})*(ω) − ∫_{I^{k+1}_{(i,1)}} (c_{k+1})*(ω) )
      = Σ_{i=1}^{k+1} (−1)^i ( ∫_{c_{k+1} ∘ I^{k+1}_{(i,0)}} ω − ∫_{c_{k+1} ∘ I^{k+1}_{(i,1)}} ω )
      = ∫_{∂c_{k+1}} ω.

(c) Finally, let s_{k+1} = Σ_i n_i c_{k+1,i} with n_i ∈ Z and singular (k + 1)-cubes c_{k+1,i}. By
definition and by (b),

    ∫_{s_{k+1}} dω = Σ_i n_i ∫_{c_{k+1,i}} dω = Σ_i n_i ∫_{∂c_{k+1,i}} ω = ∫_{∂s_{k+1}} ω.


Remark 11.4 Stokes' theorem is valid for arbitrary oriented compact differentiable k-dimensional manifolds F and continuously differentiable (k − 1)-forms ω on F.
Example 11.10 We come back to Example 11.8 (c). Let ω = (x dy ∧ dz + y dz ∧ dx +
z dx ∧ dy)/r^3, r = (x^2 + y^2 + z^2)^{1/2}, be a 2-form on R^3 \ {(0, 0, 0)}. It is easy to show
that ω is closed, dω = 0. We compute ∫_{c2} ω. First note that (c2)*(r^3) = 1, (c2)*(x) =
cos s sin t, (c2)*(y) = sin s sin t, (c2)*(z) = cos t, such that

    (c2)*(dx) = d(cos s sin t) = −sin s sin t ds + cos s cos t dt,
    (c2)*(dy) = d(sin s sin t) = cos s sin t ds + sin s cos t dt,
    (c2)*(dz) = −sin t dt,

and

    (c2)*(ω) = (c2)*(x) (c2)*(dy) ∧ (c2)*(dz) + (c2)*(y dz ∧ dx) + (c2)*(z dx ∧ dy)
             = ( −cos^2 s sin^3 t − sin^2 s sin^3 t − cos t (sin^2 s sin t cos t + cos^2 s sin t cos t) ) ds ∧ dt
             = −sin t ds ∧ dt,

such that

    ∫_{c2} ω = ∫_{[0,2π]×[0,π]} (c2)*(ω) = ∫_{[0,2π]×[0,π]} (−sin t) ds dt = −4π.

Stokes' theorem shows that ω is not exact on R^3 \ {(0, 0, 0)}. Suppose to the contrary that
ω = dη for some η ∈ Ω^1(R^3 \ {(0, 0, 0)}). Since by Example 11.8 (c) ∂c2 is a degenerate
1-chain (it consists of two constant curves), the pull-back (∂c2)*(η) is 0 and so is the integral:

    0 = ∫_{I_1} (∂c2)*(η) = ∫_{∂c2} η = ∫_{c2} dη = ∫_{c2} ω = −4π,

a contradiction; hence ω is not exact.
We come back to the two special cases k = 1, n = 3 and k = 2, n = 3.
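The value ∫_{c2} ω = −4π can be reproduced numerically straight from the definition, evaluating ω on the two finite-difference partial derivatives of the parametrization; a sketch (helper names are ours):

```python
from math import sin, cos, pi

def c2(s, t):
    return (cos(s) * sin(t), sin(s) * sin(t), cos(t))

def omega_at(p, u, v):
    # ω(p)(u, v) = <p, u×v> / |p|^3 for ω = (x dy∧dz + y dz∧dx + z dx∧dy)/r^3
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    r3 = sum(x * x for x in p) ** 1.5
    return sum(x * y for x, y in zip(p, cross)) / r3

N, h, total = 200, 1e-6, 0.0
for a in range(N):                       # midpoint rule over [0,2π]×[0,π]
    s = (a + 0.5) * 2 * pi / N
    for b in range(N):
        t = (b + 0.5) * pi / N
        p = c2(s, t)
        du = [(x - y) / (2 * h) for x, y in zip(c2(s + h, t), c2(s - h, t))]
        dv = [(x - y) / (2 * h) for x, y in zip(c2(s, t + h), c2(s, t - h))]
        total += omega_at(p, du, dv) * (2 * pi / N) * (pi / N)

assert abs(total - (-4 * pi)) < 1e-2     # matches (c2)*(ω) = −sin t ds ∧ dt
```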

11.3.4 Special Cases

k = 1, n = 3. Let c2 : [0, 1]^2 → U ⊆ R^3 be a singular 2-cube such that F = c2([0, 1]^2) is a
regular smooth surface in R^3. Then ∂F is a closed path consisting of 4 pieces, which are
differentiable curves, with the counter-clockwise orientation. Let ω = f1 dx1 + f2 dx2 + f3 dx3
be a differential 1-form on U. By Example 11.9 (a),

    ∫_{∂c2} ω = ∫_{∂F} f1 dx1 + f2 dx2 + f3 dx3.

On the other hand, by Example 11.5 (c),

    dω = curl f · (dx2 ∧ dx3, dx3 ∧ dx1, dx1 ∧ dx2).

In this case Stokes' theorem gives

    ∫_{∂F} f1 dx1 + f2 dx2 + f3 dx3 = ∫_F curl f · (dx2 ∧ dx3, dx3 ∧ dx1, dx1 ∧ dx2).

If F lies in the x1-x2 plane, we obtain Green's theorem.

k = 2, n = 3. Let c3 be a singular 3-cube in R^3 and G = c3([0, 1]^3). Further let

    ω = v1 dx2 ∧ dx3 + v2 dx3 ∧ dx1 + v3 dx1 ∧ dx2

with a continuously differentiable vector field v ∈ C^1(G). By Example 11.5 (d),
dω = div(v) dx1 ∧ dx2 ∧ dx3. The boundary of G consists of the 6 faces c3(∂[0, 1]^3). They
are oriented with the outer unit normal vector. Stokes' theorem then gives

    ∫_{c3} dω = ∫_{∂c3} v1 dx2 ∧ dx3 + v2 dx3 ∧ dx1 + v3 dx1 ∧ dx2,
    ∫_G div v dx dy dz = ∫_{∂G} ~v · d~S.

This is Gauß' divergence theorem.
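The divergence theorem is easy to test on the unit cube itself, where the six outer-normal face integrals can be written down directly; a sketch with the illustrative field v = (x², y, z):

```python
def v(x, y, z):
    return (x * x, y, z)             # a sample C^1 field on the unit cube

def div_v(x, y, z, h=1e-5):          # finite-difference divergence
    return ((v(x + h, y, z)[0] - v(x - h, y, z)[0])
            + (v(x, y + h, z)[1] - v(x, y - h, z)[1])
            + (v(x, y, z + h)[2] - v(x, y, z - h)[2])) / (2 * h)

N = 40
vol = 0.0                            # ∫_G div v dx dy dz (midpoint rule)
for a in range(N):
    for b in range(N):
        for c in range(N):
            x, y, z = (a + .5) / N, (b + .5) / N, (c + .5) / N
            vol += div_v(x, y, z) / N ** 3

flux = 0.0                           # ∮_{∂G} v · dS over the six faces
for a in range(N):
    for b in range(N):
        s, t = (a + .5) / N, (b + .5) / N
        flux += (v(1, s, t)[0] - v(0, s, t)[0]
                 + v(s, 1, t)[1] - v(s, 0, t)[1]
                 + v(s, t, 1)[2] - v(s, t, 0)[2]) / N ** 2

assert abs(vol - flux) < 1e-6        # both equal ∫ (2x + 2) = 3
```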

11.3.5 Applications
The following two applications were not covered by the lecture.

(a) Brouwer's Fixed Point Theorem

Proposition 11.8 (Retraction Theorem) Let G ⊆ R^n be a compact, connected, simply
connected set with smooth boundary ∂G.
There exists no vector field f : G → R^n, f_i ∈ C^2(G), i = 1, . . . , n, such that f(G) ⊆ ∂G and
f(x) = x for all x ∈ ∂G.
Proof. Suppose to the contrary that such an f exists; consider ω ∈ Ω^{n−1}(U),
ω = x1 dx2 ∧ dx3 ∧ ⋯ ∧ dxn. First we show that f*(dω) = 0. By definition, for
v1, . . . , vn ∈ R^n we have

    f*(dω)(p)(v1, . . . , vn) = dω(f(p))(Df(p)v1, Df(p)v2, . . . , Df(p)vn).

Since dim f(G) = dim ∂G = n − 1, the n vectors Df(p)v1, Df(p)v2, . . . , Df(p)vn can be
thought of as being n vectors in an (n − 1)-dimensional linear space; hence, they are linearly
dependent. Consequently, any alternating n-form on those vectors is 0. Thus f*(dω) = 0. By
Stokes' theorem

    ∫_{∂G} f*(ω) = ∫_G f*(dω) = 0.

On the other hand, f = id on ∂G such that

    f*(ω)|_{∂G} = ω|_{∂G} = x1 dx2 ∧ ⋯ ∧ dxn |_{∂G}.

Again, by Stokes' theorem,

    0 = ∫_{∂G} f*(ω) = ∫_{∂G} ω = ∫_G dω = ∫_G dx1 ∧ ⋯ ∧ dxn = |G|,

a contradiction.

Theorem 11.9 (Brouwer's Fixed Point Theorem) Let g : B1 → B1 be a continuous mapping
of the closed unit ball B1 ⊂ R^n into itself.
Then g has a fixed point p, g(p) = p.
Proof. (a) We first prove that the theorem holds true for a twice continuously differentiable
mapping g. Suppose to the contrary that g has no fixed point. For any p ∈ B1 the line through
p and g(p) is then well defined. Let h(p) be the intersection point of this line with the unit
sphere S^{n−1} such that h(p) − p is a positive multiple of p − g(p). In particular, h is a
C^2-mapping from B1 into S^{n−1} which is the identity on S^{n−1}. By the previous
proposition, such a mapping does not exist. Hence, g has a fixed point.
(b) For a continuous mapping g one needs the Stone–Weierstraß theorem to approximate the
continuous components by polynomials.

In case n = 1 Brouwer's theorem is just the intermediate value theorem.

(b) The Fundamental Theorem of Algebra

We give a first proof of the fundamental theorem of algebra, Theorem 5.19:
Every polynomial f(z) = z^n + a1 z^{n−1} + ⋯ + an with complex coefficients a_i ∈ C
has a root in C.
We use two facts: the winding form ω on R^2 \ {(0, 0)} is closed but not exact, and z^n and
f(z) are close together for sufficiently large |z|.
We view C as R^2 with (a, b) = a + bi. Define the following singular 1-cubes on R^2:

    c_{R,n}(s) = (R^n cos(2πns), R^n sin(2πns)) = z^n,
    c_{R,f}(s) = f ∘ c_{R,1}(s) = f(R cos(2πs), R sin(2πs)) = f(z),

where z = z(s) = R(cos 2πs + i sin 2πs), s ∈ [0, 1]; note that |z| = R. Further, let

    c(s, t) = (1 − t) c_{R,f}(s) + t c_{R,n}(s) = (1 − t) f(z) + t z^n,   (s, t) ∈ [0, 1]^2,
    b(s, t) = f((1 − t) R(cos 2πs, sin 2πs)) = f((1 − t) z),   (s, t) ∈ [0, 1]^2,

be singular 2-cubes in R^2.
Lemma 11.10 If |z| = R is sufficiently large, then

    |c(s, t)| ≥ R^n / 2,   (s, t) ∈ [0, 1]^2.

Proof. Since f(z) − z^n is a polynomial of degree less than n,

    (f(z) − z^n)/z^n → 0 as |z| → ∞;

in particular |f(z) − z^n| ≤ R^n / 2 if R is sufficiently large. Then we have

    |c(s, t)| = |(1 − t) f(z) + t z^n| = |z^n + (1 − t)(f(z) − z^n)|
              ≥ |z^n| − (1 − t) |f(z) − z^n| ≥ R^n − R^n/2 = R^n/2.

The only fact we need is c(s, t) ≠ 0 for sufficiently large R; hence c maps the unit square into
R^2 \ {(0, 0)}.
Lemma 11.11 Let $\omega = \omega(x,y) = (-y\,dx + x\,dy)/(x^2 + y^2)$ be the winding form on $\mathbb{R}^2 \setminus \{(0,0)\}$. Then we have
(a) $\partial c = c_{R,f} - c_{R,n}$, $\quad \partial b = f(z) - f(0)$.
(b) For sufficiently large $R$, $c$, $c_{R,n}$, and $c_{R,f}$ are chains in $\mathbb{R}^2 \setminus \{(0,0)\}$ and
$$\int_{c_{R,n}} \omega = \int_{c_{R,f}} \omega = 2\pi n.$$

Proof. (a) Note that $z(0) = z(1) = R$. Since $\partial I^2(x) = (x,0) - (x,1) + (1,x) - (0,x)$ we have
$$\partial c(s) = c(s,0) - c(s,1) + c(1,s) - c(0,s) = f(z) - z^n - \big((1-s)f(R) + sR^n\big) + \big((1-s)f(R) + sR^n\big) = f(z) - z^n.$$
This proves (a) for $c$. Similarly, we have
$$\partial b(s) = b(s,0) - b(s,1) + b(1,s) - b(0,s) = f(z) - f(0) + f((1-s)R) - f((1-s)R) = f(z) - f(0).$$
(b) By Lemma 11.10, $c$ is a singular 2-chain in $\mathbb{R}^2 \setminus \{(0,0)\}$ for sufficiently large $R$. Hence $\partial c$ is a 1-chain in $\mathbb{R}^2 \setminus \{(0,0)\}$. In particular, both $c_{R,n}$ and $c_{R,f}$ take values in $\mathbb{R}^2 \setminus \{(0,0)\}$. Hence $\int_{\partial c} \omega$ is well-defined. We compute $\int_{c_{R,n}} \omega$ using the pull-backs of $dx$ and $dy$:
$$c_{R,n}^*(x^2 + y^2) = R^{2n}, \quad c_{R,n}^*(dx) = -2\pi n R^n \sin(2\pi n s)\,ds, \quad c_{R,n}^*(dy) = 2\pi n R^n \cos(2\pi n s)\,ds,$$
$$c_{R,n}^*(\omega) = 2\pi n\,ds.$$
Hence
$$\int_{c_{R,n}} \omega = \int_0^1 2\pi n\,ds = 2\pi n.$$

By Stokes' theorem and since $\omega$ is closed,
$$\int_{\partial c} \omega = \int_c d\omega = 0,$$
such that by (a) and the above calculation
$$0 = \int_{\partial c} \omega = \int_{c_{R,f}} \omega - \int_{c_{R,n}} \omega,$$
hence
$$\int_{c_{R,f}} \omega = \int_{c_{R,n}} \omega = 2\pi n.$$
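The identity $\int_{c_{R,f}} \omega = 2\pi n$ says that the curve $s \mapsto f(Re^{2\pi i s})$ winds $n$ times around the origin. The following numeric sanity check (not part of the text; the sample polynomial is an assumption) integrates the winding form by summing argument increments along the curve.

```python
import cmath
import math

def winding_number(curve, samples=20000):
    """Integrate the winding form along a closed curve s -> curve(s),
    s in [0, 1], by summing the phase increments between sample points."""
    total = 0.0
    prev = curve(0.0)
    for k in range(1, samples + 1):
        cur = curve(k / samples)
        total += cmath.phase(cur / prev)  # increment of arg in (-pi, pi]
        prev = cur
    return total / (2 * math.pi)

# sample polynomial f(z) = z^3 - 2z + 5 (degree n = 3), R large
f = lambda z: z**3 - 2*z + 5
R = 10.0
w = winding_number(lambda s: f(R * cmath.exp(2j * math.pi * s)))
print(round(w))  # 3
```

For large $R$ the dominant term $z^n$ forces exactly $n$ full turns, matching the lemma.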

[Figure: the 2-cube $b(s,t) = f((1-t)z)$ contracts the curve $c_{R,f}$ (at $t = 0$) to the constant point $f(0)$ (at $t = 1$).]

We complete the proof of the fundamental theorem of algebra. Suppose to the contrary that the polynomial $f(z)$ is non-zero on $\mathbb{C}$; then $b$ as well as $\partial b$ are singular chains in $\mathbb{R}^2 \setminus \{(0,0)\}$. By Lemma 11.11 (a) and again by Stokes' theorem we have
$$\int_{c_{R,f}} \omega = \int_{c_{R,f} - f(0)} \omega = \int_{\partial b} \omega = \int_b d\omega = 0,$$
where the first equality holds since the integral of $\omega$ over the constant 1-cube $f(0)$ vanishes. But this is a contradiction to Lemma 11.11 (b). Hence, $b$ is not a 2-chain in $\mathbb{R}^2 \setminus \{(0,0)\}$, that is, there exist $s, t \in [0,1]$ such that $b(s,t) = f((1-t)z) = 0$. We have found that $(1-t)R(\cos(2\pi s) + i\sin(2\pi s))$ is a zero of $f$.

Actually, we have shown a little more. There is a zero of $f$ in the disc $\{z \in \mathbb{C} \mid |z| \le R\}$ whenever $R \ge \max\{1, 2\sum_{k=1}^n |a_k|\}$. Indeed, in this case, for $|z| = R$,
$$|f(z) - z^n| \le \sum_{k=1}^{n} |a_k|\,|z|^{n-k} \le \sum_{k=1}^{n} |a_k|\,R^{n-1} \le \frac{R^n}{2},$$
and this condition ensures $|c(s,t)| \ne 0$ as in the proof of Lemma 11.10.
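In fact, the same estimate shows that all roots lie in this disc: for $|z| > R$ one has $|f(z)| \ge |z|^{n-1}(|z| - \sum_k |a_k|) > 0$. A quick numeric check (not part of the text; numpy and the sample polynomial are assumptions):

```python
import numpy as np

def root_bound(coeffs):
    """Disc radius R = max(1, 2 * sum|a_k|) for the monic polynomial
    z^n + a_1 z^{n-1} + ... + a_n with coeffs = [a_1, ..., a_n]."""
    return max(1.0, 2.0 * sum(abs(a) for a in coeffs))

coeffs = [0, -2, 5]              # f(z) = z^3 - 2z + 5
roots = np.roots([1] + coeffs)   # prepend the leading coefficient 1
R = root_bound(coeffs)           # here R = 14.0
print(all(abs(r) <= R for r in roots))  # True
```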


Chapter 12
Measure Theory and Integration
12.1 Measure Theory
Citation from Rudin's book, [Rud66, Chapter 1]: Towards the end of the 19th century it became clear to many mathematicians that the Riemann integral should be replaced by some other type of integral, more general and more flexible, better suited for dealing with limit processes. Among the attempts made in this direction, the most notable ones were due to Jordan, Borel, W. H. Young, and Lebesgue. It was Lebesgue's construction which turned out to be the most successful.
In a brief outline, here is the main idea: The Riemann integral of a function $f$ over an interval $[a,b]$ can be approximated by sums of the form
$$\sum_{i=1}^n f(t_i)\, m(E_i),$$
where $E_1, \dots, E_n$ are disjoint intervals whose union is $[a,b]$, $m(E_i)$ denotes the length of $E_i$, and $t_i \in E_i$ for $i = 1, \dots, n$. Lebesgue discovered that a completely satisfactory theory of integration results if the sets $E_i$ in the above sum are allowed to belong to a larger class of subsets of the line, the so-called measurable sets, and if the class of functions under consideration is enlarged to what we call measurable functions. The crucial set-theoretic properties involved are the following: The union and the intersection of any countable family of measurable sets are measurable; ... the notion of length (now called measure) can be extended to them in such a way that
$$m(E_1 \cup E_2 \cup E_3 \cup \dots) = m(E_1) + m(E_2) + m(E_3) + \dots$$
for any countable collection $\{E_i\}$ of pairwise disjoint measurable sets. This property of $m$ is called countable additivity.
The passage from Riemann's theory of integration to that of Lebesgue is a process of completion. It is of the same fundamental importance in analysis as the construction of the real number system from the rationals.

12.1.1 Algebras, $\sigma$-algebras, and Borel Sets


(a) The Measure Problem. Definitions
Lebesgue (1904) stated the following problem: We want to associate to each bounded subset $E$ of the real line a positive real number $m(E)$, called the measure of $E$, such that the following properties are satisfied:
(1) Any two congruent sets (by shift or reflection) have the same measure.
(2) The measure is countably additive.
(3) The measure of the unit interval $[0,1]$ is 1.
He emphasized that he was not able to solve this problem in full detail, but only for a certain class of sets which he called measurable. We will see that this restriction to a large family of bounded sets is unavoidable: the measure problem has no solution.
Definition 12.1 Let $X$ be a set. A non-empty family $\mathcal{A}$ of subsets of $X$ is called an algebra if $A, B \in \mathcal{A}$ implies $A^c \in \mathcal{A}$ and $A \cup B \in \mathcal{A}$.
An algebra $\mathcal{A}$ is called a $\sigma$-algebra if for all countable families $\{A_n \mid n \in \mathbb{N}\}$ with $A_n \in \mathcal{A}$ we have
$$\bigcup_{n \in \mathbb{N}} A_n = A_1 \cup A_2 \cup \dots \cup A_n \cup \dots \in \mathcal{A}.$$
Remarks 12.1 (a) Since $\mathcal{A}$ is non-empty, $A \in \mathcal{A}$ implies $X = A \cup A^c \in \mathcal{A}$ and $\emptyset = X^c \in \mathcal{A}$.
(b) If $\mathcal{A}$ is an algebra, then $A \cap B \in \mathcal{A}$ for all $A, B \in \mathcal{A}$. Indeed, by de Morgan's rules, $(\bigcup_\alpha A_\alpha)^c = \bigcap_\alpha A_\alpha^c$ and $(\bigcap_\alpha A_\alpha)^c = \bigcup_\alpha A_\alpha^c$, we have $A \cap B = (A^c \cup B^c)^c$, and all the members on the right are in $\mathcal{A}$ by the definition of an algebra.
(c) Let $\mathcal{A}$ be a $\sigma$-algebra. Then $\bigcap_{n \in \mathbb{N}} A_n \in \mathcal{A}$ if $A_n \in \mathcal{A}$ for all $n \in \mathbb{N}$. Again by de Morgan's rule,
$$\bigcap_{n \in \mathbb{N}} A_n = \left( \bigcup_{n \in \mathbb{N}} A_n^c \right)^{\!c}.$$
(d) The family $\mathcal{P}(X)$ of all subsets of $X$ is both an algebra as well as a $\sigma$-algebra.
(e) Any $\sigma$-algebra is an algebra, but there are algebras which are not $\sigma$-algebras.
(f) The family of finite and cofinite subsets (these are the complements of finite sets) of an infinite set forms an algebra. Do they form a $\sigma$-algebra?
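On a finite ground set the axioms of Definition 12.1 can be checked by brute force (there, every algebra is automatically a $\sigma$-algebra). A small sketch, not part of the text, with two made-up families:

```python
def is_algebra(X, family):
    """Check Definition 12.1 on a finite ground set X:
    non-empty, closed under complement, closed under union."""
    F = {frozenset(A) for A in family}
    if not F:
        return False
    closed_complement = all(X - A in F for A in F)
    closed_union = all(A | B in F for A in F for B in F)
    return closed_complement and closed_union

X = frozenset({1, 2, 3, 4})
F1 = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]        # an algebra
F2 = [set(), {2}, {1, 2}, {3, 4}, {1, 2, 3, 4}]   # {2}^c is missing
print(is_algebra(X, F1), is_algebra(X, F2))       # True False
```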
(b) Elementary Sets and Borel Sets in $\mathbb{R}^n$

Let $\overline{\mathbb{R}}$ be the extended real axis, $\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty\} \cup \{-\infty\}$. We use the old rules as introduced in Section 3.1.1 at page 79. The new rule which is used in measure theory only is
$$0 \cdot \infty = \infty \cdot 0 = 0.$$
The set
$$I = \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid a_i \lessdot x_i \lessdot b_i,\ i = 1, \dots, n\}$$


is called a rectangle or a box in $\mathbb{R}^n$, where $\lessdot$ either stands for $<$ or for $\le$, and $a_i, b_i \in \overline{\mathbb{R}}$. For example, $a_i = -\infty$ and $b_i = +\infty$ yields $I = \mathbb{R}^n$, whereas $a_1 = 2$, $b_1 = 1$ yields $I = \emptyset$. A subset of $\mathbb{R}^n$ is called an elementary set if it is the union of a finite number of rectangles in $\mathbb{R}^n$. Let $\mathcal{E}_n$ denote the set of elementary subsets of $\mathbb{R}^n$: $\mathcal{E}_n = \{I_1 \cup I_2 \cup \dots \cup I_r \mid r \in \mathbb{N},\ I_j \text{ is a box in } \mathbb{R}^n\}$.
Lemma 12.1 En is an algebra but not a -algebra.
Proof. The complement of a finite interval is the union of two intervals; the complement of an infinite interval is an infinite interval. Hence, the complement of a rectangle in $\mathbb{R}^n$ is a finite union of rectangles.
The countable (disjoint) union $M = \bigcup_{n \in \mathbb{N}} [n, n + \tfrac12]$ is not an elementary set.
Note that any elementary set is the disjoint union of a finite number of rectangles.
Let $\mathcal{B}$ be any non-empty family of subsets of $X$. Let $\sigma(\mathcal{B})$ denote the intersection of all $\sigma$-algebras containing $\mathcal{B}$, i.e. $\sigma(\mathcal{B}) = \bigcap_{i \in I} \mathcal{A}_i$, where $\{\mathcal{A}_i \mid i \in I\}$ is the family of all $\sigma$-algebras $\mathcal{A}_i$ which contain $\mathcal{B}$, $\mathcal{B} \subseteq \mathcal{A}_i$ for all $i \in I$.
Note that the $\sigma$-algebra $\mathcal{P}(X)$ of all subsets is always a member of that family $\{\mathcal{A}_i\}$, such that $\sigma(\mathcal{B})$ exists. We call $\sigma(\mathcal{B})$ the $\sigma$-algebra generated by $\mathcal{B}$. By definition, $\sigma(\mathcal{B})$ is the smallest $\sigma$-algebra which contains the sets of $\mathcal{B}$; it is indeed a $\sigma$-algebra. Moreover, $\sigma(\sigma(\mathcal{B})) = \sigma(\mathcal{B})$, and if $\mathcal{B}_1 \subseteq \mathcal{B}_2$ then $\sigma(\mathcal{B}_1) \subseteq \sigma(\mathcal{B}_2)$.
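On a finite ground set, $\sigma(\mathcal{B})$ can be computed directly by closing $\mathcal{B}$ under complement and union until nothing new appears. A sketch, not part of the text:

```python
def generate_sigma_algebra(X, B):
    """Smallest family containing B, X, and the empty set that is
    closed under complement and pairwise union (finite X only)."""
    X = frozenset(X)
    F = {frozenset(A) for A in B} | {frozenset(), X}
    while True:
        new = {X - A for A in F} | {A | C for A in F for C in F}
        if new <= F:
            return F
        F |= new

F = generate_sigma_algebra({1, 2, 3}, [{1}])
print(sorted(sorted(A) for A in F))  # [[], [1], [1, 2, 3], [2, 3]]
```

The result is the four-element $\sigma$-algebra generated by the single set $\{1\}$.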
Definition 12.2 The Borel algebra $\mathcal{B}_n$ in $\mathbb{R}^n$ is the $\sigma$-algebra generated by the elementary sets $\mathcal{E}_n$. Its elements are called Borel sets.
The Borel algebra $\mathcal{B}_n = \sigma(\mathcal{E}_n)$ is the smallest $\sigma$-algebra which contains all boxes in $\mathbb{R}^n$. We will see that the Borel algebra is a huge family of subsets of $\mathbb{R}^n$ which contains all sets we are ever interested in. Later, we will construct a non-Borel set.
Proposition 12.2 Open and closed subsets of $\mathbb{R}^n$ are Borel sets.
Proof. We give the proof in case of $\mathbb{R}^2$. Let $I_\varepsilon(x_0, y_0) = (x_0 - \varepsilon, x_0 + \varepsilon) \times (y_0 - \varepsilon, y_0 + \varepsilon)$ denote the open square of size $2\varepsilon$ by $2\varepsilon$ with midpoint $(x_0, y_0)$. Then $I_{1/(n+1)} \subseteq I_{1/n}$ for $n \in \mathbb{N}$. Let $M \subseteq \mathbb{R}^2$ be open. To every point $(x_0, y_0) \in M$ with rational coordinates $x_0, y_0$ we choose the largest square $I_{1/n}(x_0, y_0) \subseteq M$ and denote it by $J(x_0, y_0)$. We show that
$$M = \bigcup_{(x_0, y_0) \in M,\ x_0, y_0 \in \mathbb{Q}} J(x_0, y_0).$$
Since the number of rational points in $M$ is countable, the right side is in $\sigma(\mathcal{E}_2)$. Now let $(a,b) \in M$ be arbitrary. Since $M$ is open, there exists $n \in \mathbb{N}$ such that $I_{2/n}(a,b) \subseteq M$. Since the rational points are dense in $\mathbb{R}^2$, there is a rational point $(x_0, y_0)$ which is contained in $I_{1/n}(a,b)$. Then we have
$$I_{1/n}(x_0, y_0) \subseteq I_{2/n}(a,b) \subseteq M.$$
Since $(a,b) \in I_{1/n}(x_0, y_0) \subseteq J(x_0, y_0)$, we have shown that $M$ is the union of the countable family of sets $J$. Since closed sets are the complements of open sets and complements are again in the $\sigma$-algebra, the assertion follows for closed sets.

Remarks 12.2 (a) We have proved that any open subset $M$ of $\mathbb{R}^n$ is the countable union of rectangles $I \subseteq M$.
(b) The Borel algebra $\mathcal{B}_n$ is also the $\sigma$-algebra generated by the family $\mathcal{G}_n$ of open or the family $\mathcal{F}_n$ of closed sets in $\mathbb{R}^n$, $\mathcal{B}_n = \sigma(\mathcal{G}_n) = \sigma(\mathcal{F}_n)$. Countable unions and intersections of open or closed sets are in $\mathcal{B}_n$.
Let us look in more detail at some of the sets in $\sigma(\mathcal{E}_n)$. Let $\mathcal{G}$ and $\mathcal{F}$ be the families of all open and closed subsets of $\mathbb{R}^n$, respectively. Let $\mathcal{G}_\delta$ be the collection of all intersections of sequences of open sets (from $\mathcal{G}$), and let $\mathcal{F}_\sigma$ be the collection of all unions of sequences of sets of $\mathcal{F}$. One can prove that $\mathcal{F} \subseteq \mathcal{G}_\delta$ and $\mathcal{G} \subseteq \mathcal{F}_\sigma$. These inclusions are strict. Since countable intersections and unions of countable intersections and unions are still countable operations, $\mathcal{G}_\delta, \mathcal{F}_\sigma \subseteq \sigma(\mathcal{E}_n)$.
For an arbitrary family $\mathcal{S}$ of sets let $\mathcal{S}_\sigma$ be the collection of all unions of sequences of sets in $\mathcal{S}$, and let $\mathcal{S}_\delta$ be the collection of all intersections of sequences of sets in $\mathcal{S}$. We can iterate the operations represented by $\sigma$ and $\delta$, obtaining from the class $\mathcal{G}$ the classes $\mathcal{G}_\delta$, $\mathcal{G}_{\delta\sigma}$, $\mathcal{G}_{\delta\sigma\delta}$, ... and from $\mathcal{F}$ the classes $\mathcal{F}_\sigma$, $\mathcal{F}_{\sigma\delta}$, .... It turns out that we have inclusions
$$\mathcal{G} \subseteq \mathcal{G}_\delta \subseteq \mathcal{G}_{\delta\sigma} \subseteq \dots \subseteq \sigma(\mathcal{E}_n), \qquad \mathcal{F} \subseteq \mathcal{F}_\sigma \subseteq \mathcal{F}_{\sigma\delta} \subseteq \dots \subseteq \sigma(\mathcal{E}_n).$$
No two of these classes are equal. There are Borel sets that belong to none of them.

12.1.2 Additive Functions and Measures


Definition 12.3 (a) Let $\mathcal{A}$ be an algebra over $X$. An additive function or content on $\mathcal{A}$ is a function $\mu \colon \mathcal{A} \to [0, +\infty]$ such that
(i) $\mu(\emptyset) = 0$,
(ii) $\mu(A \cup B) = \mu(A) + \mu(B)$ for all $A, B \in \mathcal{A}$ with $A \cap B = \emptyset$.
(b) An additive function $\mu$ is called countably additive (or $\sigma$-additive in the German literature) on $\mathcal{A}$ if for any disjoint family $\{A_n \mid A_n \in \mathcal{A},\ n \in \mathbb{N}\}$, that is $A_i \cap A_j = \emptyset$ for all $i \ne j$, with $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{A}$ we have
$$\mu\!\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n=1}^\infty \mu(A_n).$$
(c) A measure is a countably additive function on a $\sigma$-algebra $\mathcal{A}$.

If $X$ is a set, $\mathcal{A}$ a $\sigma$-algebra on $X$ and $\mu$ a measure on $\mathcal{A}$, then the triple $(X, \mathcal{A}, \mu)$ is called a measure space. Likewise, if $X$ is a set and $\mathcal{A}$ a $\sigma$-algebra on $X$, the pair $(X, \mathcal{A})$ is called a measurable space.
Notation. We write $\sum_n A_n$ in place of $\bigcup_n A_n$ if $\{A_n\}$ is a disjoint family of subsets. The countable additivity then reads as follows:
$$\mu(A_1 + A_2 + \dots) = \mu(A_1) + \mu(A_2) + \dots, \qquad \mu\!\left(\sum_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \mu(A_n).$$
We say $\mu$ is finite if $\mu(X) < \infty$. If $\mu(X) = 1$, we call $(X, \mathcal{A}, \mu)$ a probability space. We call $\mu$ $\sigma$-finite if there exist sets $A_n \in \mathcal{A}$ with $\mu(A_n) < \infty$ and $X = \bigcup_{n=1}^\infty A_n$.
Example 12.1 (a) Let $X$ be a set, $x_0 \in X$ and $\mathcal{A} = \mathcal{P}(X)$. Then
$$\delta(A) = \begin{cases} 1, & x_0 \in A, \\ 0, & x_0 \notin A \end{cases}$$
defines a finite measure on $\mathcal{A}$. $\delta$ is called the point mass concentrated at $x_0$.
(b1) Let $X$ be a set and $\mathcal{A} = \mathcal{P}(X)$. Put
$$\mu(A) = \begin{cases} n, & \text{if } A \text{ has } n \text{ elements}, \\ +\infty, & \text{if } A \text{ has infinitely many elements}. \end{cases}$$
$\mu$ is a measure on $\mathcal{A}$, the so-called counting measure.
(b2) Let $X$ be a set and $\mathcal{A} = \mathcal{P}(X)$. Put
$$\mu(A) = \begin{cases} 0, & \text{if } A \text{ has finitely many or countably many elements}, \\ +\infty, & \text{if } A \text{ has uncountably many elements}. \end{cases}$$
$\mu$ is countably additive, not $\sigma$-finite.
(b3) Let $X$ be a set and $\mathcal{A} = \mathcal{P}(X)$. Put
$$\mu(A) = \begin{cases} 0, & \text{if } A \text{ is finite}, \\ +\infty, & \text{if } A \text{ is infinite}. \end{cases}$$
$\mu$ is additive, not $\sigma$-additive.
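The point mass and the counting measure can be exercised on a finite ground set. A small sketch, not part of the text, checking finite additivity on a disjoint family:

```python
def point_mass(x0):
    """delta_{x0} from Example 12.1 (a): 1 if x0 in A, else 0."""
    return lambda A: 1 if x0 in A else 0

def counting(A):
    """Counting measure from Example 12.1 (b1), finite sets only."""
    return len(A)

delta = point_mass(3)
parts = [{1, 2}, {3}, {4, 5, 6}]   # pairwise disjoint family
union = set().union(*parts)

print(delta(union) == sum(delta(A) for A in parts))        # True (1 = 0+1+0)
print(counting(union) == sum(counting(A) for A in parts))  # True (6 = 2+1+3)
```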


(c) $X = \mathbb{R}^n$, $\mathcal{A} = \mathcal{E}_n$ is the algebra of elementary sets of $\mathbb{R}^n$. Every $A \in \mathcal{E}_n$ is the finite disjoint union of rectangles, $A = \sum_{k=1}^m I_k$. We set $\lambda(A) = \sum_{k=1}^m \lambda(I_k)$ where
$$\lambda(I) = (b_1 - a_1) \cdots (b_n - a_n),$$
if $I = \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid a_i \lessdot x_i \lessdot b_i,\ i = 1, \dots, n\}$ and $a_i \lessdot b_i$; $\lambda(\emptyset) = 0$. Then $\lambda$ is an additive function on $\mathcal{E}_n$. It is called the Lebesgue content on $\mathbb{R}^n$. Note that $\lambda$ is not a measure (since $\mathcal{E}_n$ is not a $\sigma$-algebra, and $\lambda$ is not yet shown to be countably additive). However, we will see in Proposition 12.5 below that $\lambda$ is even countably additive. By definition, $\lambda(\text{line in } \mathbb{R}^2) = 0$ and $\lambda(\text{plane in } \mathbb{R}^3) = 0$.
(d) Let $X = \mathbb{R}$, $\mathcal{A} = \mathcal{E}_1$, and $\varphi$ an increasing function on $\mathbb{R}$. For $a, b$ in $\mathbb{R}$ with $a < b$ set
$$\mu_\varphi([a,b]) = \varphi(b+0) - \varphi(a-0), \qquad \mu_\varphi([a,b)) = \varphi(b-0) - \varphi(a-0),$$
$$\mu_\varphi((a,b]) = \varphi(b+0) - \varphi(a+0), \qquad \mu_\varphi((a,b)) = \varphi(b-0) - \varphi(a+0).$$
Then $\mu_\varphi$ is an additive function on $\mathcal{E}_1$ if we set
$$\mu_\varphi(A) = \sum_{i=1}^n \mu_\varphi(I_i), \quad \text{if } A = \sum_{i=1}^n I_i.$$
We call $\mu_\varphi$ the Lebesgue-Stieltjes content.
On the other hand, if $\mu \colon \mathcal{E}_1 \to \mathbb{R}$ is an additive function, then $\varphi_\mu \colon \mathbb{R} \to \mathbb{R}$ defined by
$$\varphi_\mu(x) = \begin{cases} \mu((0,x]), & x \ge 0, \\ -\mu((x,0]), & x < 0, \end{cases}$$
defines an increasing, right-continuous function on $\mathbb{R}$ such that $\mu_{\varphi_\mu} = \mu$. In general $\varphi \ne \varphi_{\mu_\varphi}$, since the function on the right-hand side is continuous from the right whereas $\varphi$ is, in general, not.

Properties of Additive Functions


Proposition 12.3 Let $\mathcal{A}$ be an algebra over $X$ and $\mu$ an additive function on $\mathcal{A}$. Then
(a) $\mu\!\left(\sum_{k=1}^n A_k\right) = \sum_{k=1}^n \mu(A_k)$ if $A_k \in \mathcal{A}$, $k = 1, \dots, n$, form a disjoint family of $n$ subsets.
(b) $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$, $\ A, B \in \mathcal{A}$.
(c) $A \subseteq B$ implies $\mu(A) \le \mu(B)$ ($\mu$ is monotone).
(d) If $A \subseteq B$, $A, B \in \mathcal{A}$, and $\mu(A) < +\infty$, then $\mu(B \setminus A) = \mu(B) - \mu(A)$ ($\mu$ is subtractive).
(e) $\mu\!\left(\bigcup_{k=1}^n A_k\right) \le \sum_{k=1}^n \mu(A_k)$ if $A_k \in \mathcal{A}$, $k = 1, \dots, n$ ($\mu$ is finitely subadditive).
(f) If $\{A_k \mid k \in \mathbb{N}\}$ is a disjoint family in $\mathcal{A}$ and $\sum_{k=1}^\infty A_k \in \mathcal{A}$, then
$$\mu\!\left(\sum_{k=1}^\infty A_k\right) \ge \sum_{k=1}^\infty \mu(A_k).$$

Proof. (a) is by induction. (d), (c), and (b) are easy (cf. Homework 34.4).
(e) We can write $\bigcup_{k=1}^n A_k$ as the finite disjoint union of $n$ sets of $\mathcal{A}$:
$$\bigcup_{k=1}^n A_k = A_1 + (A_2 \setminus A_1) + (A_3 \setminus (A_1 \cup A_2)) + \dots + (A_n \setminus (A_1 \cup \dots \cup A_{n-1})).$$
Since $\mu$ is additive,
$$\mu\!\left(\bigcup_{k=1}^n A_k\right) = \sum_{k=1}^n \mu\big(A_k \setminus (A_1 \cup \dots \cup A_{k-1})\big) \le \sum_{k=1}^n \mu(A_k),$$
where we used $\mu(B \setminus A) \le \mu(B)$ (from (d)).
(f) Since $\mu$ is additive and monotone,
$$\sum_{k=1}^n \mu(A_k) = \mu\!\left(\sum_{k=1}^n A_k\right) \le \mu\!\left(\sum_{k=1}^\infty A_k\right).$$
Taking the supremum over $n$ on the left gives the assertion.

Proposition 12.4 Let $\mu$ be an additive function on the algebra $\mathcal{A}$. Consider the following statements:
(a) $\mu$ is countably additive.
(b) For any increasing sequence $A_n \subseteq A_{n+1}$, $A_n \in \mathcal{A}$, with $\bigcup_{n=1}^\infty A_n = A \in \mathcal{A}$ we have $\lim_{n\to\infty} \mu(A_n) = \mu(A)$.
(c) For any decreasing sequence $A_n \supseteq A_{n+1}$, $A_n \in \mathcal{A}$, with $\bigcap_{n=1}^\infty A_n = A \in \mathcal{A}$ and $\mu(A_n) < \infty$, we have $\lim_{n\to\infty} \mu(A_n) = \mu(A)$.
(d) Statement (c) with $A = \emptyset$ only.
We have (a) $\Leftrightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d). In case $\mu(X) < \infty$ ($\mu$ is finite), all statements are equivalent.


Proof. (a) $\Rightarrow$ (b). Without loss of generality $A_1 = \emptyset$. Put $B_n = A_n \setminus A_{n-1}$ for $n = 2, 3, \dots$. Then $\{B_n\}$ is a disjoint family with $A_n = B_2 + B_3 + \dots + B_n$ and $A = \sum_{n=2}^\infty B_n$. Hence, by countable additivity of $\mu$,
$$\mu(A) = \sum_{k=2}^\infty \mu(B_k) = \lim_{n\to\infty} \sum_{k=2}^n \mu(B_k) = \lim_{n\to\infty} \mu\!\left(\sum_{k=2}^n B_k\right) = \lim_{n\to\infty} \mu(A_n).$$
(b) $\Rightarrow$ (a). Let $\{A_n\}$ be a family of disjoint sets in $\mathcal{A}$ with $\sum_n A_n = A \in \mathcal{A}$; put $B_k = A_1 + \dots + A_k$. Then $(B_k)$ is a sequence increasing to $A$. By (b) and since $\mu$ is additive,
$$\mu(B_n) = \mu\!\left(\sum_{k=1}^n A_k\right) = \sum_{k=1}^n \mu(A_k) \longrightarrow \mu(A) = \mu\!\left(\sum_{k=1}^\infty A_k\right).$$
Thus,
$$\sum_{k=1}^\infty \mu(A_k) = \mu\!\left(\sum_{k=1}^\infty A_k\right).$$
(b) $\Rightarrow$ (c). Since $A_n$ is decreasing to $A$, $A_1 \setminus A_n$ is increasing to $A_1 \setminus A$. By (b),
$$\mu(A_1 \setminus A_n) \longrightarrow \mu(A_1 \setminus A),$$
hence $\mu(A_1) - \mu(A_n) \to \mu(A_1) - \mu(A)$, which implies the assertion.
(c) $\Rightarrow$ (d) is trivial.
Now let $\mu$ be finite, in particular $\mu(B) < \infty$ for all $B \in \mathcal{A}$. We show (d) $\Rightarrow$ (b). Let $(A_n)$ be a sequence in $\mathcal{A}$ increasing to $A$. Then $(A \setminus A_n)$ is a sequence decreasing to $\emptyset$. By (d), $\mu(A \setminus A_n) \to 0$. Since $\mu$ is subtractive (Proposition 12.3 (d)) and all values are finite, $\mu(A_n) \to \mu(A)$.

Proposition 12.5 Let $\varphi \colon \mathbb{R} \to \mathbb{R}$ be a right-continuous increasing function, and $\mu_\varphi$ the corresponding Lebesgue-Stieltjes content on $\mathcal{E}_1$. Then $\mu_\varphi$ is countably additive.
Proof. For simplicity we write $\mu$ for $\mu_\varphi$. Recall that $\mu((a,b]) = \varphi(b) - \varphi(a)$. We will perform the proof in case of
$$(a,b] = \sum_{k=1}^\infty (a_k, b_k]$$
with a disjoint family $(a_k, b_k]$ of intervals. By Proposition 12.3 (f) we already know
$$\mu((a,b]) \ge \sum_{k=1}^\infty \mu((a_k, b_k]). \tag{12.1}$$
We prove the opposite direction. Let $\varepsilon > 0$. Since $\varphi$ is continuous from the right at $a$, there exists $a_0 \in (a,b)$ such that $\varphi(a_0) - \varphi(a) < \varepsilon$ and, similarly, for every $k \in \mathbb{N}$ there exists $c_k > b_k$ such that $\varphi(c_k) - \varphi(b_k) < \varepsilon/2^k$. Hence,
$$[a_0, b] \subseteq \sum_{k=1}^\infty (a_k, b_k] \subseteq \bigcup_{k=1}^\infty (a_k, c_k)$$
is an open covering of a compact set. By Heine-Borel (Definition 6.14) there exists a finite subcover
$$[a_0, b] \subseteq \bigcup_{k=1}^N (a_k, c_k), \quad \text{hence} \quad (a_0, b] \subseteq \bigcup_{k=1}^N (a_k, c_k],$$
such that by Proposition 12.3 (e)
$$\mu((a_0, b]) \le \sum_{k=1}^N \mu((a_k, c_k]).$$
By the choice of $a_0$ and $c_k$,
$$\mu((a_k, c_k]) = \mu((a_k, b_k]) + \varphi(c_k) - \varphi(b_k) \le \mu((a_k, b_k]) + \frac{\varepsilon}{2^k}.$$
Similarly, $\mu((a,b]) = \mu((a, a_0]) + \mu((a_0, b])$, such that
$$\mu((a,b]) \le \varepsilon + \mu((a_0, b]) \le \varepsilon + \sum_{k=1}^N \left( \mu((a_k, b_k]) + \frac{\varepsilon}{2^k} \right) \le \sum_{k=1}^N \mu((a_k, b_k]) + 2\varepsilon \le \sum_{k=1}^\infty \mu((a_k, b_k]) + 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary,
$$\mu((a,b]) \le \sum_{k=1}^\infty \mu((a_k, b_k]).$$
In view of (12.1), $\mu$ is countably additive.

Corollary 12.6 The correspondence $\varphi \mapsto \mu_\varphi$ from Example 12.1 (d) defines a bijection between the countably additive functions on $\mathcal{E}_1$ and the monotonically increasing, right-continuous functions on $\mathbb{R}$ (up to constant functions, i.e. $\varphi$ and $\varphi + c$ define the same additive function).

Historical Note. It was the great achievement of Émile Borel (1871-1956) that he really proved the countable additivity of the Lebesgue measure. He realized that the countable additivity of $\lambda$ is a serious mathematical problem, far from being evident.
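The Lebesgue-Stieltjes content of half-open intervals is easy to evaluate numerically. A sketch, not part of the text; the sample function $\varphi$ (increasing, right-continuous, with a jump of 1 at 0) is an assumption:

```python
import math

def mu(phi, a, b):
    """Lebesgue-Stieltjes content mu_phi((a, b]) = phi(b) - phi(a)
    for a right-continuous increasing phi."""
    return phi(b) - phi(a)

# increasing, right-continuous, jump of height 1 at x = 0
phi = lambda x: math.atan(x) + (1.0 if x >= 0 else 0.0)

a, b = -2.0, 3.0
cuts = [-2.0, -0.5, 0.0, 1.0, 3.0]   # partition of (a, b] into (c_i, c_{i+1}]
pieces = sum(mu(phi, cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1))
print(abs(mu(phi, a, b) - pieces) < 1e-12)  # True: finite additivity
```

Note that the jump at 0 is charged to the interval $(-0.5, 0]$, consistent with right-continuity of $\varphi$.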

12.1.3 Extension of Countably Additive Functions


Here we must stop the rigorous treatment of measure theory. Up to now, we know only two trivial examples of measures (Example 12.1 (a) and (b)). We give an outline of the steps toward the construction of the Lebesgue measure:
(1) Construction of an outer measure $\mu^*$ on $\mathcal{P}(X)$ from a countably additive function $\mu$ on an algebra $\mathcal{A}$.
(2) Construction of the $\sigma$-algebra $\mathcal{A}^*$ of measurable sets.

The extension theory is due to Carathéodory (1914). For a detailed treatment, see [Els02, Section II.4].
Theorem 12.7 (Extension and Uniqueness) Let $\mu$ be a countably additive function on the algebra $\mathcal{A}$.
(a) There exists an extension of $\mu$ to a measure on the $\sigma$-algebra $\sigma(\mathcal{A})$ which coincides with $\mu$ on $\mathcal{A}$. We denote the measure on $\sigma(\mathcal{A})$ also by $\mu$. It is defined as the restriction of the outer measure $\mu^* \colon \mathcal{P}(X) \to [0, \infty]$,
$$\mu^*(A) = \inf \left\{ \sum_{n=1}^\infty \mu(A_n) \;\Big|\; A \subseteq \bigcup_{n=1}^\infty A_n,\ A_n \in \mathcal{A},\ n \in \mathbb{N} \right\},$$
to the $\mu^*$-measurable sets $\mathcal{A}^*$.
(b) This extension is unique if $(X, \mathcal{A}, \mu)$ is $\sigma$-finite.
(For a proof, see [Bro92, (2.6), p. 68].)
Remark 12.3 (a) A subset $A \subseteq X$ is said to be $\mu^*$-measurable if for all $Y \subseteq X$
$$\mu^*(Y) = \mu^*(A \cap Y) + \mu^*(A^c \cap Y).$$
The family of $\mu^*$-measurable sets forms a $\sigma$-algebra $\mathcal{A}^*$.
(b) We have $\mathcal{A} \subseteq \sigma(\mathcal{A}) \subseteq \mathcal{A}^*$ and $\mu^*(A) = \mu(A)$ for all $A \in \mathcal{A}$.
(c) $(X, \mathcal{A}^*, \mu^*)$ is a measure space; in particular, $\mu^*$ is countably additive on the measurable sets $\mathcal{A}^*$, and we redenote it by $\mu$.

12.1.4 The Lebesgue Measure on $\mathbb{R}^n$

Using the facts from the previous subsection we conclude that for any increasing, right-continuous function $\varphi$ on $\mathbb{R}$ there exists a measure $\mu_\varphi$ on the $\sigma$-algebra of Borel sets. We call this measure the Lebesgue-Stieltjes measure on $\mathbb{R}$. In case $\varphi(x) = x$ we call it the Lebesgue measure. Extending the Lebesgue content $\lambda$ on elementary sets of $\mathbb{R}^n$ to the Borel algebra $\mathcal{B}_n$, we obtain the $n$-dimensional Lebesgue measure $\lambda_n$ on $\mathbb{R}^n$.
Completeness
A measure $\mu \colon \mathcal{A} \to [0, +\infty]$ on a $\sigma$-algebra $\mathcal{A}$ is said to be complete if $A \in \mathcal{A}$, $\mu(A) = 0$, and $B \subseteq A$ implies $B \in \mathcal{A}$. It turns out that the Lebesgue measure $\lambda_n$ on the Borel sets of $\mathbb{R}^n$ is not complete. Adjoining to $\mathcal{B}_n$ the subsets of measure-zero sets, we obtain the $\sigma$-algebra $\mathcal{A}_n$ of Lebesgue measurable sets:
$$\mathcal{A}_n = \sigma\big( \mathcal{B}_n \cup \{X \subseteq \mathbb{R}^n \mid \exists B \in \mathcal{B}_n : X \subseteq B,\ \lambda_n(B) = 0\} \big).$$
The Lebesgue measure $\lambda_n$ on $\mathcal{A}_n$ is now complete.


Remarks 12.4 (a) The Lebesgue measure is invariant under the motion group of $\mathbb{R}^n$. More precisely, let $O(n) = \{T \in \mathbb{R}^{n \times n} \mid T^\top T = T T^\top = E_n\}$ be the group of real orthogonal $n \times n$-matrices (motions); then
$$\lambda_n(T(A)) = \lambda_n(A), \quad A \in \mathcal{A}_n, \quad T \in O(n).$$
(b) $\lambda_n$ is translation invariant, i.e. $\lambda_n(A) = \lambda_n(x + A)$ for all $x \in \mathbb{R}^n$. Moreover, the invariance of $\lambda_n$ under translations uniquely characterizes the Lebesgue measure $\lambda_n$: if $\mu$ is a translation invariant measure on $\mathcal{B}_n$, then $\mu = c\lambda_n$ for some $c \in \mathbb{R}_+$.
(c) There exist non-measurable subsets in Rn . We construct a subset E of R that is not Lebesgue
measurable.
We write $x \sim y$ if $x - y$ is rational. This is an equivalence relation since $x \sim x$ for all $x \in \mathbb{R}$, $x \sim y$ implies $y \sim x$ for all $x$ and $y$, and $x \sim y$ and $y \sim z$ implies $x \sim z$. Let $E$ be a subset of $(0,1)$ that contains exactly one point in every equivalence class (the assertion that there is such a set $E$ is a direct application of the axiom of choice). We claim that $E$ is not measurable. Let $E + r = \{x + r \mid x \in E\}$. We need the following two properties of $E$:
(a) If $x \in (0,1)$, then $x \in E + r$ for some rational $r \in (-1,1)$.
(b) If $r$ and $s$ are distinct rationals, then $(E + r) \cap (E + s) = \emptyset$.
To prove (a), note that for every $x \in (0,1)$ there exists $y \in E$ with $x \sim y$. If $r = x - y$, then $x = y + r \in E + r$.
To prove (b), suppose that $x \in (E + r) \cap (E + s)$. Then $x = y + r = z + s$ for some $y, z \in E$. Since $y - z = s - r \ne 0$, we have $y \sim z$, and $E$ contains two equivalent points, in contradiction to our choice of $E$.
Now assume that $E$ is Lebesgue measurable with $\lambda(E) = \alpha$. Define $S = \bigcup_r (E + r)$ where the union is over all rational $r \in (-1,1)$. By (b), the sets $E + r$ are pairwise disjoint; since $\lambda$ is translation invariant, $\lambda(E + r) = \lambda(E) = \alpha$ for all $r$. Since $S \subseteq (-1,2)$, $\lambda(S) \le 3$. The countable additivity of $\lambda$ now forces $\alpha = 0$ and hence $\lambda(S) = 0$. But (a) implies $(0,1) \subseteq S$, hence $1 \le \lambda(S)$, and we have a contradiction.
(d) Any countable set has Lebesgue measure zero. Indeed, every single point is a box with edges of length 0; hence $\lambda(\{\mathrm{pt}\}) = 0$. Since $\lambda$ is countably additive,
$$\lambda(\{x_1, x_2, \dots, x_n, \dots\}) = \sum_{n=1}^\infty \lambda(\{x_n\}) = 0.$$
In particular, the rational numbers have Lebesgue measure 0, $\lambda(\mathbb{Q}) = 0$.


(e) There are uncountable sets with measure zero. The Cantor set (Cantor: 1845-1918, inventor of set theory) is a prominent example:
$$C = \left\{ \sum_{i=1}^\infty \frac{a_i}{3^i} \;\Big|\; a_i \in \{0,2\}\ \forall i \in \mathbb{N} \right\}.$$
Obviously, $C \subseteq [0,1]$; $C$ is compact and can be written as the intersection of a decreasing sequence $(C_n)$ of closed subsets: $C_1 = [0, 1/3] \cup [2/3, 1]$, $\lambda(C_1) = 2/3$, and, recursively,
$$C_{n+1} = \frac13 C_n \cup \left( \frac13 C_n + \frac23 \right) \implies \lambda(C_{n+1}) = \lambda\!\left(\frac13 C_n\right) + \lambda\!\left(\frac13 C_n + \frac23\right) = \frac23 \lambda(C_n).$$
It turns out that $C_n = \left\{ \sum_{i=1}^\infty a_i/3^i \mid a_i \in \{0,2\} \text{ for } i = 1, \dots, n \right\}$ (the digits $a_i$, $i > n$, being arbitrary in $\{0,1,2\}$). Clearly,
$$\lambda(C_{n+1}) = \frac23 \lambda(C_n) = \dots = \left(\frac23\right)^{\!n} \lambda(C_1) = \left(\frac23\right)^{\!n+1}.$$
By Proposition 12.4 (c), $\lambda(C) = \lim_{n\to\infty} \lambda(C_n) = 0$. However, $C$ has the same cardinality as $\{0,2\}^{\mathbb{N}} \cong \{0,1\}^{\mathbb{N}} \cong \mathbb{R}$, which is uncountable.
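The recursion $\lambda(C_{n+1}) = \frac23 \lambda(C_n)$ can be verified exactly by building the middle-thirds intervals. A sketch, not part of the text, using exact rational arithmetic:

```python
from fractions import Fraction

def cantor_stage(n):
    """Intervals of C_n: start from [0, 1] and n times remove the
    open middle third of every remaining interval."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt += [(a, a + third), (b - third, b)]
        intervals = nxt
    return intervals

n = 5
length = sum(b - a for a, b in cantor_stage(n))
print(length == Fraction(2, 3) ** n)  # True: lambda(C_n) = (2/3)^n
```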

12.2 Measurable Functions


Let $\mathcal{A}$ be a $\sigma$-algebra over $X$.
Definition 12.4 A real function $f \colon X \to \overline{\mathbb{R}}$ is called $\mathcal{A}$-measurable if for all $a \in \mathbb{R}$ the set $\{x \in X \mid f(x) > a\}$ belongs to $\mathcal{A}$.
A complex function $f \colon X \to \mathbb{C}$ is said to be $\mathcal{A}$-measurable if both $\operatorname{Re} f$ and $\operatorname{Im} f$ are $\mathcal{A}$-measurable.
A function $f \colon U \to \mathbb{R}$, $U \subseteq \mathbb{R}^n$, is said to be a Borel function if $f$ is $\mathcal{B}_n$-measurable, i.e. $f$ is measurable with respect to the Borel algebra on $\mathbb{R}^n$.
A function $f \colon U \to V$, $U \subseteq \mathbb{R}^n$, $V \subseteq \mathbb{R}^m$, is called a Borel function if $f^{-1}(B)$ is a Borel set for all Borel sets $B \subseteq V$. It is part of Homework 39.3 (b) to prove that in case $m = 1$ these definitions coincide. Also, $f = (f_1, \dots, f_m)$ is a Borel function if all $f_i$ are.
Note that $\{x \in X \mid f(x) > a\} = f^{-1}((a, +\infty))$. From Proposition 12.8 below it becomes clear that the last two notions are consistent. Note that no measure on $(X, \mathcal{A})$ needs to be specified to define a measurable function.
Example 12.2 (a) Any continuous function $f \colon U \to \mathbb{R}$, $U \subseteq \mathbb{R}^n$, is a Borel function. Indeed, since $f$ is continuous and $(a, +\infty)$ is open, $f^{-1}((a, +\infty))$ is open as well and hence a Borel set (cf. Proposition 12.2).
(b) The characteristic function $\chi_A$ is $\mathcal{A}$-measurable if and only if $A \in \mathcal{A}$ (see Homework 35.3).
(c) Let $f \colon U \to V$ and $g \colon V \to W$ be Borel functions. Then $g \circ f \colon U \to W$ is a Borel function, too. Indeed, for any Borel set $C \subseteq W$, $g^{-1}(C)$ is a Borel set in $V$ since $g$ is a Borel function. Since $f$ is a Borel function, $(g \circ f)^{-1}(C) = f^{-1}(g^{-1}(C))$ is a Borel subset of $U$, which shows the assertion.
Proposition 12.8 Let $f \colon X \to \overline{\mathbb{R}}$ be a function. The following are equivalent:
(a) $\{x \mid f(x) > a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (i.e. $f$ is $\mathcal{A}$-measurable).
(b) $\{x \mid f(x) \ge a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$.
(c) $\{x \mid f(x) < a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$.
(d) $\{x \mid f(x) \le a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$.
(e) $f^{-1}(B) \in \mathcal{A}$ for all Borel sets $B \in \mathcal{B}_1$.

Proof. (a) $\Rightarrow$ (b) follows from the identity
$$[a, +\infty] = \bigcap_{n \in \mathbb{N}} (a - 1/n, +\infty]$$
and the invariance of intersections under preimages, $f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B)$. Since $f$ is $\mathcal{A}$-measurable and $\mathcal{A}$ is a $\sigma$-algebra, the countable intersection on the right is in $\mathcal{A}$.
(a) $\Leftrightarrow$ (d) follows from $\{x \mid f(x) \le a\} = \{x \mid f(x) > a\}^c$. The remaining directions are left to the reader (see also Homework 35.5).

Remark 12.5 (a) Let $f, g \colon X \to \overline{\mathbb{R}}$ be $\mathcal{A}$-measurable. Then $\{x \mid f(x) > g(x)\}$ and $\{x \mid f(x) = g(x)\}$ are in $\mathcal{A}$. Proof. Since
$$\{x \mid f(x) < g(x)\} = \bigcup_{q \in \mathbb{Q}} \big( \{x \mid f(x) < q\} \cap \{x \mid q < g(x)\} \big),$$
all sets $\{f < q\}$ and $\{q < g\}$ on the right are in $\mathcal{A}$, and on the right there is a countable union, so the right-hand side is in $\mathcal{A}$. A similar argument works for $\{f > g\}$. Note that the sets $\{f \ge g\}$ and $\{f \le g\}$ are the complements of $\{f < g\}$ and $\{f > g\}$, respectively; hence they belong to $\mathcal{A}$ as well. Finally, $\{f = g\} = \{f \ge g\} \cap \{f \le g\}$.
(b) It is not difficult to see that for any sequence $(a_n)$ of real numbers
$$\varlimsup_{n\to\infty} a_n = \inf_{n \in \mathbb{N}} \sup_{k \ge n} a_k \quad \text{and} \quad \varliminf_{n\to\infty} a_n = \sup_{n \in \mathbb{N}} \inf_{k \ge n} a_k. \tag{12.2}$$
As a consequence we can construct new measurable functions using $\sup$ and $\lim$. Let $(f_n)$ be a sequence of $\mathcal{A}$-measurable real functions on $X$. Then $\sup_n f_n$, $\inf_n f_n$, $\varlimsup_n f_n$, $\varliminf_n f_n$ are $\mathcal{A}$-measurable. In particular, $\lim_n f_n$ is measurable if the limit exists.
Proof. Note that for all $a \in \overline{\mathbb{R}}$ we have
$$\{\sup_n f_n \le a\} = \bigcap_n \{f_n \le a\}.$$
Since all $f_n$ are measurable, so is $\sup_n f_n$. A similar proof works for $\inf_n f_n$. By (12.2), $\varlimsup_n f_n$ and $\varliminf_n f_n$ are measurable, too.

Proposition 12.9 Let $f, g \colon X \to \mathbb{R}$ be Borel functions on $X \subseteq \mathbb{R}^n$. Then $f + g$, $fg$, and $|f|$ are Borel functions, too.
Proof. The function $h(x) = (f(x), g(x)) \colon X \to \mathbb{R}^2$ is a Borel function since its coordinate functions are so. Since the sum $s(x,y) = x + y$ and the product $p(x,y) = xy$ are continuous functions, the compositions $s \circ h$ and $p \circ h$ are Borel functions by Example 12.2 (c). Since the constant functions $\alpha$ and $\beta$ are Borel, so are $\alpha f$, $\beta g$, and finally $\alpha f + \beta g$. Hence, the Borel functions over $X$ form a linear space, moreover a real algebra. In particular, $-f$ is Borel and so is $|f| = \max\{f, -f\}$.


Let $(X, \mathcal{A}, \mu)$ be a measure space and $f \colon X \to \overline{\mathbb{R}}$ arbitrary. Let $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$ denote the positive and negative parts of $f$. We have $f = f^+ - f^-$ and $|f| = f^+ + f^-$; moreover $f^+, f^- \ge 0$.

Corollary 12.10 $f$ is a Borel function if and only if both $f^+$ and $f^-$ are Borel.

12.3 The Lebesgue Integral


We define the Lebesgue integral of a complex function in three steps: first for positive simple functions, then for positive measurable functions, and finally for arbitrary measurable functions. In this section, $(X, \mathcal{A}, \mu)$ is a measure space.

12.3.1 Simple Functions


Definition 12.5 Let $M \subseteq X$ be a subset. The function
$$\chi_M(x) = \begin{cases} 1, & x \in M, \\ 0, & x \notin M, \end{cases}$$
is called the characteristic function of $M$.
An $\mathcal{A}$-measurable function $f \colon X \to \mathbb{R}$ is called simple if $f$ takes only finitely many values $c_1, \dots, c_n$. We denote the set of simple functions on $(X, \mathcal{A})$ by $S$; the set of non-negative simple functions is denoted by $S_+$.
Clearly, if $c_1, \dots, c_n$ are the distinct values of the simple function $f$, then
$$f = \sum_{i=1}^n c_i\, \chi_{A_i},$$
where $A_i = \{x \mid f(x) = c_i\}$. It is clear that $f$ is measurable if and only if $A_i \in \mathcal{A}$ for all $i$. Obviously, $\{A_i \mid i = 1, \dots, n\}$ is a disjoint family of subsets of $X$.
It is easy to see that $f, g \in S$ implies $f + g \in S$, $\max\{f, g\} \in S$, $\min\{f, g\} \in S$, and $fg \in S$.
Step 1: Positive Simple Functions
For $f = \sum_{i=1}^n c_i\, \chi_{A_i} \in S_+$ define
$$\int_X f \, d\mu = \sum_{i=1}^n c_i\, \mu(A_i). \tag{12.3}$$
The convention $0 \cdot (+\infty) = 0$ is used here; it may happen that $c_i = 0$ for some $i$ and $\mu(A_i) = +\infty$.


Remarks 12.6 (a) Since $c_i \ge 0$ for all $i$, the right-hand side is well-defined in $\overline{\mathbb{R}}$.
(b) Given another presentation of $f$, say $f(x) = \sum_{j=1}^m d_j\, \chi_{B_j}(x)$, then $\sum_{j=1}^m d_j\, \mu(B_j)$ gives the same value as (12.3).

The following properties are easily checked.
Lemma 12.11 For $f, g \in S_+$, $A \in \mathcal{A}$, $c \in \mathbb{R}_+$ we have
(1) $\int_X \chi_A \, d\mu = \mu(A)$.
(2) $\int_X c f \, d\mu = c \int_X f \, d\mu$.
(3) $\int_X (f + g)\, d\mu = \int_X f \, d\mu + \int_X g \, d\mu$.
(4) $f \le g$ implies $\int_X f \, d\mu \le \int_X g \, d\mu$.
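On a finite ground set, formula (12.3) is a finite sum and Lemma 12.11 can be checked directly. A sketch, not part of the text; the weights and sample functions are assumptions:

```python
def integral_simple(f, mu):
    """Integral (12.3) of a simple function on a finite ground set:
    f and mu are dicts point -> value; sums c * mu({f == c}) over the
    distinct values c of f."""
    total = 0.0
    for c in set(f.values()):
        total += c * sum(mu[x] for x in f if f[x] == c)
    return total

X = [1, 2, 3, 4]
mu = {1: 0.5, 2: 0.5, 3: 2.0, 4: 1.0}       # a weight per point
f = {1: 3.0, 2: 3.0, 3: 0.0, 4: 1.0}
g = {1: 1.0, 2: 0.0, 3: 1.0, 4: 1.0}
h = {x: f[x] + g[x] for x in X}

lhs = integral_simple(h, mu)
rhs = integral_simple(f, mu) + integral_simple(g, mu)
print(abs(lhs - rhs) < 1e-12)  # True: additivity, Lemma 12.11 (3)
```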

12.3.2 Positive Measurable Functions


The idea is to approximate a positive measurable function with an increasing sequence of positive simple ones.
Theorem 12.12 Let $f \colon X \to [0, +\infty]$ be measurable. There exist simple functions $s_n$, $n \in \mathbb{N}$, on $X$ such that
(a) $0 \le s_1 \le s_2 \le \dots \le f$;
(b) $s_n(x) \to f(x)$, as $n \to \infty$, for every $x \in X$.

Example. $X = (a,b)$, $n = 1$, $1 \le i \le 2$. Then
$$E_{11} = f^{-1}\!\left(\left[0, \tfrac12\right)\right), \quad E_{12} = f^{-1}\!\left(\left[\tfrac12, 1\right)\right), \quad F_1 = f^{-1}([1, +\infty]).$$
[Figure: the graph of $f$ on $(a,b)$ with the approximating step functions $s_1$ and $s_2$ and the level sets $E_{ni}$, $F_n$.]

Proof. For $n \in \mathbb{N}$ and for $1 \le i \le n 2^n$, define
$$E_{ni} = f^{-1}\!\left(\left[\frac{i-1}{2^n}, \frac{i}{2^n}\right)\right) \quad \text{and} \quad F_n = f^{-1}([n, \infty])$$
and put
$$s_n = \sum_{i=1}^{n 2^n} \frac{i-1}{2^n}\, \chi_{E_{ni}} + n\, \chi_{F_n}.$$
Proposition 12.8 shows that $E_{ni}$ and $F_n$ are measurable sets. It is easily seen that the functions $s_n$ satisfy (a). If $x$ is such that $f(x) < +\infty$, then
$$0 \le f(x) - s_n(x) \le \frac{1}{2^n} \tag{12.4}$$
as soon as $n$ is large enough, that is, $x \in E_{ni}$ for some $n, i \in \mathbb{N}$ and not $x \in F_n$. If $f(x) = +\infty$, then $s_n(x) = n$; this proves (b).
From (12.4) it follows that $s_n \to f$ uniformly on $X$ if $f$ is bounded.
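The construction of $s_n$ is pointwise explicit, so estimate (12.4) can be verified numerically. A sketch, not part of the text; the sample function $f(x) = x^2$ is an assumption:

```python
import math

def s(n, fx):
    """Value of the simple function s_n from Theorem 12.12 at a point
    where f takes the value fx >= 0: truncate at n, else round fx
    down to the grid of step 2^{-n}."""
    if fx >= n:
        return n
    return math.floor(fx * 2**n) / 2**n

f = lambda x: x * x                    # sample f on [0, 2)
pts = [k / 100 for k in range(200)]
for n in (1, 2, 5, 10):
    gaps = [f(x) - s(n, f(x)) for x in pts if f(x) < n]
    assert all(0 <= g <= 2**(-n) for g in gaps)            # estimate (12.4)
    assert all(s(n, f(x)) <= s(n + 1, f(x)) for x in pts)  # monotone in n
print("estimate (12.4) and monotonicity hold at all sample points")
```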
Step 2: Positive Measurable Real Functions
Definition 12.6 (Lebesgue Integral) Let $f \colon X \to [0, +\infty]$ be measurable. Let $(s_n)$ be an increasing sequence of non-negative simple functions $s_n$ converging to $f(x)$ for all $x \in X$, $\lim_n s_n(x) = \sup_n s_n(x) = f(x)$. Define
$$\int_X f \, d\mu = \lim_{n\to\infty} \int_X s_n \, d\mu = \sup_n \int_X s_n \, d\mu \tag{12.5}$$
and call this number in $[0, +\infty]$ the Lebesgue integral of $f$ over $X$ with respect to the measure $\mu$, or the $\mu$-integral of $f$ over $X$.
The definition of the Lebesgue integral does not depend on the special choice of the increasing functions $s_n \nearrow f$. One can define
$$\int_X f \, d\mu = \sup \left\{ \int_X s \, d\mu \;\Big|\; s \le f, \text{ and } s \text{ is a simple function} \right\}.$$
Observe that we apparently have two definitions for $\int_X f \, d\mu$ if $f$ is a simple function. However, these assign the same value to the integral since $f$ is the largest simple function less than or equal to $f$.

Proposition 12.13 The properties (1) to (4) from Lemma 12.11 hold for any non-negative measurable functions $f, g \colon X \to [0, +\infty]$, $c \in \mathbb{R}_+$.
(Without proof.)
Step 3: Measurable Real Functions
Let $f \colon X \to \overline{\mathbb{R}}$ be measurable and $f^+(x) = \max\{f(x), 0\}$, $f^-(x) = \max\{-f(x), 0\}$. Then $f^+$ and $f^-$ are both positive and measurable. Define
$$\int_X f \, d\mu = \int_X f^+ \, d\mu - \int_X f^- \, d\mu$$
if at least one of the integrals on the right is finite. We say that $f$ is $\mu$-integrable if both are finite.


Step 4: Measurable Complex Functions
Definition 12.7 (Lebesgue Integral, Continued) A complex, measurable function $f \colon X \to \mathbb{C}$ is called $\mu$-integrable if
$$\int_X |f| \, d\mu < \infty.$$
If $f = u + iv$ is $\mu$-integrable, where $u = \operatorname{Re} f$ and $v = \operatorname{Im} f$ are the real and imaginary parts of $f$, then $u$ and $v$ are real measurable functions on $X$. Define the $\mu$-integral of $f$ over $X$ by
$$\int_X f \, d\mu = \int_X u^+ \, d\mu - \int_X u^- \, d\mu + i \int_X v^+ \, d\mu - i \int_X v^- \, d\mu. \tag{12.6}$$
These four functions $u^+$, $u^-$, $v^+$, and $v^-$ are measurable, real, and non-negative. Since we have $u^+ \le |u| \le |f|$ etc., each of these four integrals is finite. Thus, (12.6) defines the integral on the left as a complex number.
We define $\mathcal{L}^1(X, \mu)$ to be the collection of all complex $\mu$-integrable functions $f$ on $X$. Note that for an integrable function $f$, $\int_X f \, d\mu$ is a finite number.
Proposition 12.14 Let $f, g \colon X \to \mathbb{C}$ be measurable.
(a) $f$ is $\mu$-integrable if and only if $|f|$ is $\mu$-integrable, and we have
$$\left| \int_X f \, d\mu \right| \le \int_X |f| \, d\mu.$$
(b) $f$ is $\mu$-integrable if and only if there exists an integrable function $h$ with $|f| \le h$.
(c) If $f, g$ are integrable, so is $c_1 f + c_2 g$, where
$$\int_X (c_1 f + c_2 g)\, d\mu = c_1 \int_X f \, d\mu + c_2 \int_X g \, d\mu.$$
(d) If $f \le g$ on $X$, then $\int_X f \, d\mu \le \int_X g \, d\mu$.

It follows that the set $\mathcal{L}^1(X, \mu)$ of $\mu$-integrable complex-valued functions on $X$ is a linear space. The Lebesgue integral defines a positive linear functional on $\mathcal{L}^1(X, \mu)$. Note that (b) implies that any measurable and bounded function $f$ on a space $X$ with $\mu(X) < \infty$ is integrable.
Step 5: Integration over a Measurable Subset

Definition 12.8 Let A ∈ 𝔄 and let f : X → ℝ or f : X → ℂ be measurable. The function f is called μ-integrable over A if χ_A f is μ-integrable over X. In this case we put

  ∫_A f dμ = ∫_X χ_A f dμ.

In particular, Lemma 12.11 (1) now reads ∫_A dμ = μ(A).


12.4 Some Theorems on Lebesgue Integrals


12.4.1 The Role Played by Measure Zero Sets
Equivalence Relations
Let X be a set and R ⊆ X × X. For a, b ∈ X we write a ~ b if (a, b) ∈ R.
Definition 12.9 (a) The subset R ⊆ X × X is said to be an equivalence relation if R is reflexive, symmetric, and transitive, that is,
(r) ∀ x ∈ X : x ~ x.
(s) ∀ x, y ∈ X : x ~ y ⟹ y ~ x.
(t) ∀ x, y, z ∈ X : x ~ y ∧ y ~ z ⟹ x ~ z.
For a ∈ X the set ā := {x ∈ X | x ~ a} is called the equivalence class of a. We have ā = b̄ if and only if a ~ b.
(b) A partition P of X is a disjoint family P = {A_i | i ∈ I} of subsets A_i ⊆ X such that ⋃_{i∈I} A_i = X.

The set of equivalence classes is sometimes denoted by X/~.

Example 12.3 (a) On ℤ define a ~ b if 2 | (a - b); a and b are equivalent if both are odd or both are even. There are two equivalence classes, 1̄ = 5̄ = 2ℤ + 1 (the odd numbers) and 0̄ = 100̄ = 2ℤ (the even numbers).
(b) Let W ⊆ V be a subspace of the linear space V. For x, y ∈ V define x ~ y if x - y ∈ W. This is an equivalence relation: indeed, the relation is reflexive since x - x = 0 ∈ W, it is symmetric since x - y ∈ W implies y - x = -(x - y) ∈ W, and it is transitive since x - y, y - z ∈ W implies that their sum (x - y) + (y - z) = x - z ∈ W, so that x ~ z. One has 0̄ = W and x̄ = x + W := {x + w | w ∈ W}. The set of equivalence classes with respect to this equivalence relation is called the factor space or quotient space of V with respect to W and is denoted by V/W. The factor space becomes a linear space if we define x̄ + ȳ := (x + y)‾ and λx̄ := (λx)‾, λ ∈ ℂ. Addition is indeed well-defined: if x ~ x′ and y ~ y′, say x - x′ = w1 and y - y′ = w2 with w1, w2 ∈ W, then x + y - (x′ + y′) = w1 + w2 ∈ W, so that (x + y)‾ = (x′ + y′)‾.
(c) Similarly to (a), for m ∈ ℕ define the equivalence relation a ≡ b (mod m) if m | (a - b). We say a is congruent to b modulo m. This defines a partition of the integers into m disjoint equivalence classes 0̄, 1̄, ..., (m-1)‾, where r̄ = {am + r | a ∈ ℤ}.
(d) Two triangles in the plane are equivalent if
(1) there exists a translation mapping the first one onto the second one;
(2) there exists a rotation around (0, 0) mapping the first one onto the second one;
(3) there exists a motion (a rotation, a translation, a reflection, or a composition of these) mapping the first one onto the second one.
Then (1) to (3) define different equivalence relations on triangles, or more generally on subsets of the plane.
(e) Having equal cardinality is an equivalence relation on sets.
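The residue classes of (c) are easy to verify mechanically. A minimal sketch (the helper name `residue_class` and the restriction to a finite window of integers are ours, since the full classes are infinite):

```python
# Residue classes modulo m: r-bar = {am + r | a in Z}, restricted to a
# finite window of integers so the partition property can be checked.
def residue_class(r, m, window):
    return {x for x in window if x % m == r}

m = 5
window = range(-20, 21)
classes = [residue_class(r, m, window) for r in range(m)]

# The classes are pairwise disjoint ...
pairwise_disjoint = all(classes[i].isdisjoint(classes[j])
                        for i in range(m) for j in range(i + 1, m))
# ... and their union is the whole window, so they form a partition.
covers = set().union(*classes) == set(window)
```

This matches Proposition 12.15 below: the relation a ≡ b (mod m) and the partition into m classes determine each other.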


Proposition 12.15 (a) Let ~ be an equivalence relation on X. Then P = {x̄ | x ∈ X} defines a partition of X, denoted by P_~.
(b) Let P be a partition of X. Then "x ~ y if there exists A ∈ P with x, y ∈ A" defines an equivalence relation ~_P on X.
(c) ~_{P_~} = ~ and P_{~_P} = P.
Let P be a property which a point x may or may not have. For instance, P may be the property "f(x) > 0" if f is a given function, or "(f_n(x)) converges" if (f_n) is a given sequence of functions.
Definition 12.10 If (X, 𝔄, μ) is a measure space and A ∈ 𝔄, we say P holds almost everywhere on A, abbreviated "P holds a.e. on A", if there exists N ∈ 𝔄 such that μ(N) = 0 and P holds for every point x ∈ A \ N.
This concept of course strongly depends on the measure μ, and sometimes we write "μ-a.e." to emphasize the dependence on μ.
(a) Main example. On the set of measurable functions f : X → ℂ we define an equivalence relation by

  f ~ g, if f = g a.e. on X.

This is indeed an equivalence relation. It is reflexive since f(x) = f(x) for all x ∈ X, and symmetric since f(x) = g(x) implies g(x) = f(x). For transitivity, let f = g a.e. on X and g = h a.e. on X, that is, there exist M, N ∈ 𝔄 with μ(M) = μ(N) = 0 such that f(x) = g(x) for all x ∈ X \ M and g(x) = h(x) for all x ∈ X \ N. Hence, f(x) = h(x) for all x ∈ X \ (M ∪ N). Since 0 ≤ μ(M ∪ N) ≤ μ(M) + μ(N) = 0 + 0 = 0 by Proposition 12.3 (e), we get μ(M ∪ N) = 0 and finally f = h a.e. on X.
(b) Note that f = g a.e. on X implies

  ∫_X f dμ = ∫_X g dμ.

Indeed, let N denote the null set where f ≠ g. Then

  | ∫_X f dμ - ∫_X g dμ | ≤ ∫_X | f - g | dμ = ∫_N | f - g | dμ + ∫_{X\N} | f - g | dμ
                          ≤ μ(N)·(+∞) + μ(X \ N)·0 = 0,

using the convention 0·∞ = 0. Here we used that for disjoint sets A, B ∈ 𝔄,

  ∫_{A∪B} f dμ = ∫_X χ_{A∪B} f dμ = ∫_X χ_A f dμ + ∫_X χ_B f dμ = ∫_A f dμ + ∫_B f dμ.

Proposition 12.16 Let f : X → [0, +∞] be measurable. Then ∫_X f dμ = 0 if and only if f = 0 a.e. on X.

Proof. By the argument in (b) above, f = 0 a.e. implies ∫_X f dμ = ∫_X 0 dμ = 0, which proves one direction. The other direction is homework 40.4.


12.4.2 The space Lp(X, μ)

For any measurable function f : X → ℂ and any real p, 1 ≤ p < ∞, define

  ‖f‖_p = ( ∫_X | f |^p dμ )^{1/p}.    (12.7)

This number may be finite or ∞. In the first case, | f |^p is integrable and we write f ∈ 𝓛^p(X, μ).
Proposition 12.17 Let p, q > 1 be given such that 1/p + 1/q = 1.
(a) Let f, g : X → ℂ be measurable functions such that f ∈ 𝓛^p and g ∈ 𝓛^q. Then f g ∈ 𝓛¹ and

  ∫_X | f g | dμ ≤ ‖f‖_p ‖g‖_q    (Hölder's inequality).    (12.8)

(b) Let f, g ∈ 𝓛^p. Then f + g ∈ 𝓛^p and

  ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p    (Minkowski's inequality).    (12.9)

Idea of proof. Hölder's inequality follows from Young's inequality (Proposition 1.31), as in the classical case of Hölder's inequality in ℝⁿ, see Proposition 1.32. Minkowski's inequality follows from Hölder's inequality as in Proposition 1.34.
Note that Minkowski's inequality implies that f, g ∈ 𝓛^p yields ‖f + g‖_p < ∞, so that f + g ∈ 𝓛^p. In particular, 𝓛^p is a linear space.
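For the counting measure on a finite set the integrals in (12.8) and (12.9) become finite sums, so both inequalities can be tested directly. A sketch (the sample vectors and helper `p_norm` are ours):

```python
# Hoelder and Minkowski inequalities for the counting measure on {1,...,4}:
# integrals reduce to finite sums, norms to weighted power sums.
def p_norm(xs, p):
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

f = [1.0, -2.5, 0.5, 3.0]
g = [0.3, 1.2, -0.7, 2.0]
p, q = 3.0, 1.5            # conjugate exponents: 1/p + 1/q = 1

# Hoelder: integral of |f g| vs the product of the norms
holder_lhs = sum(abs(a * b) for a, b in zip(f, g))
holder_rhs = p_norm(f, p) * p_norm(g, q)

# Minkowski: the p-norm is subadditive
mink_lhs = p_norm([a + b for a, b in zip(f, g)], p)
mink_rhs = p_norm(f, p) + p_norm(g, p)
```

Of course a finite check proves nothing; it only illustrates what the two inequalities say in the simplest measure space.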
Let us check the properties of ‖·‖_p. For all measurable f, g and λ ∈ ℂ we have

  ‖f‖_p ≥ 0,    ‖λf‖_p = | λ | ‖f‖_p,    ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

All properties of a norm (see Definition 6.9 at page 179) are satisfied except for definiteness: ‖f‖_p = 0 implies ∫_X | f |^p dμ = 0, which by Proposition 12.16 implies | f |^p = 0 a.e., and hence f = 0 a.e. However, it does not imply f = 0. To overcome this problem, we use the equivalence relation f = g a.e. and consider from now on only equivalence classes of functions in 𝓛^p; that is, we identify functions f and g which are equal a.e.
The set N = {f : X → ℂ | f is measurable and f = 0 a.e.} is a linear subspace of 𝓛^p(X, μ) for all p, and f = g a.e. if and only if f - g ∈ N. The factor space 𝓛^p/N (see Example 12.3 (b)) is again a linear space.

Definition 12.11 Let (X, 𝔄, μ) be a measure space. Lp(X, μ) denotes the set of equivalence classes of functions of 𝓛^p(X, μ) with respect to the equivalence relation f = g a.e., that is,

  Lp(X, μ) = 𝓛^p(X, μ)/N

is the quotient space. (Lp(X, μ), ‖·‖_p) is a normed space. With this norm Lp(X, μ) is complete.


Example 12.4 (a) We have χ_ℚ = 0 in Lp(ℝ, λ) since χ_ℚ = 0 a.e. on ℝ with respect to the Lebesgue measure (note that ℚ is a set of measure zero).
(b) In the case of the sequence spaces 𝓛^p(ℕ) with respect to the counting measure, 𝓛^p = Lp since f = 0 a.e. implies f = 0.
(c) f(x) = x^{-α}, α > 0, is in L²(0, 1) if and only if 2α < 1. We identify functions and their equivalence classes.

12.4.3 The Monotone Convergence Theorem

The following theorem about monotone convergence, due to Beppo Levi (1875–1961), is one of the most important in the theory of integration. The theorem holds for an arbitrary increasing sequence of measurable functions with, possibly, ∫_X f_n dμ = +∞.

Theorem 12.18 (Monotone Convergence Theorem) Let (f_n) be a sequence of measurable functions on X and suppose that

(1) 0 ≤ f_1(x) ≤ f_2(x) ≤ ⋯ ≤ +∞ for all x ∈ X,
(2) f_n(x) → f(x) as n → ∞, for every x ∈ X.

Then f is measurable, and

  lim_{n→∞} ∫_X f_n dμ = ∫_X f dμ = ∫_X ( lim_{n→∞} f_n ) dμ.

(Without proof.) Note that the measurability of f is a consequence of Remark 12.5 (b).
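A small numerical illustration of the theorem on X = [0, 1] with Lebesgue measure: f_n(x) = nx/(1 + nx) increases pointwise to 1 for x > 0 (hence a.e.), and ∫₀¹ f_n dλ = 1 − ln(1+n)/n indeed increases to ∫₀¹ 1 dλ = 1. The example and the midpoint-rule discretization are ours, not from the text:

```python
import math

def f(n, x):
    # 0 <= f_1 <= f_2 <= ... and f_n(x) -> 1 for every x > 0
    return n * x / (1.0 + n * x)

def integral(n, steps=10_000):
    # midpoint rule on [0, 1]; fine enough for this smooth integrand
    h = 1.0 / steps
    return sum(f(n, (k + 0.5) * h) for k in range(steps)) * h

vals = [integral(n) for n in (1, 10, 100, 1000)]
monotone = all(a < b for a, b in zip(vals, vals[1:]))
exact_1000 = 1.0 - math.log(1001) / 1000.0   # closed form of the n = 1000 integral
```

The integrals increase with n and approach 1, the integral of the pointwise limit, exactly as Theorem 12.18 predicts.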


Corollary 12.19 (Beppo Levi) Let f_n : X → [0, +∞] be measurable for all n ∈ ℕ, and let f(x) = Σ_{n=1}^∞ f_n(x) for x ∈ X. Then

  ∫_X Σ_{n=1}^∞ f_n dμ = Σ_{n=1}^∞ ∫_X f_n dμ.

Example 12.5 (a) Let X = ℕ, 𝔄 = 𝒫(ℕ) the σ-algebra of all subsets, and μ the counting measure on ℕ. The functions on ℕ can be identified with the sequences (x_n), f(n) = x_n. Trivially, any function is 𝔄-measurable.
What is ∫_ℕ f dμ? First, let f ≥ 0. For the simple function g_n given by g_n = x_n χ_{{n}}, we obtain ∫ g_n dμ = x_n μ({n}) = x_n. Note that f = Σ_{n=1}^∞ g_n and g_n ≥ 0 since x_n ≥ 0. By Corollary 12.19,

  ∫_ℕ f dμ = Σ_{n=1}^∞ ∫_ℕ g_n dμ = Σ_{n=1}^∞ x_n.

Now, let f be arbitrary integrable, i.e. ∫_ℕ | f | dμ < ∞; thus Σ_{n=1}^∞ | x_n | < ∞. Therefore, (x_n) ∈ 𝓛¹(ℕ, μ) if and only if Σ x_n converges absolutely. The space of absolutely convergent series is denoted by ℓ¹ or ℓ¹(ℕ).

(b) Let a_{mn} ≥ 0 for all n, m ∈ ℕ. Then

  Σ_{n=1}^∞ Σ_{m=1}^∞ a_{mn} = Σ_{m=1}^∞ Σ_{n=1}^∞ a_{mn}.

Proof. Consider the measure space (ℕ, 𝒫(ℕ), μ) from (a). For n ∈ ℕ define functions f_n(m) = a_{mn} and put f(m) = Σ_{n=1}^∞ f_n(m). By (a) and Corollary 12.19 we then have

  Σ_{m=1}^∞ Σ_{n=1}^∞ a_{mn} = Σ_{m=1}^∞ f(m) = ∫_ℕ f dμ = ∫_ℕ Σ_{n=1}^∞ f_n dμ
    = Σ_{n=1}^∞ ∫_ℕ f_n dμ = Σ_{n=1}^∞ Σ_{m=1}^∞ a_{mn}.
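For a concrete instance of (b), take a_{mn} = 2^{-m} 3^{-n} ≥ 0; both iterated sums equal (Σ_m 2^{-m})(Σ_n 3^{-n}) = 1 · 1/2 = 1/2. A truncated check (the truncation level is ours; the geometric tails are negligible):

```python
def a(m, n):
    # a non-negative double array with a known double sum: 1 * 1/2 = 1/2
    return 2.0 ** -m * 3.0 ** -n

N = 60  # truncation; the tails beyond N are geometrically small
sum_nm = sum(sum(a(m, n) for m in range(1, N)) for n in range(1, N))
sum_mn = sum(sum(a(m, n) for n in range(1, N)) for m in range(1, N))
```

Both orders of summation agree up to rounding, as the corollary of Beppo Levi (applied to the counting measure) guarantees for non-negative arrays.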

Proposition 12.20 Let f : X → [0, +∞] be measurable. Then

  ν(A) = ∫_A f dμ,    A ∈ 𝔄,

defines a measure on 𝔄.

Proof. Since f ≥ 0, ν(A) ≥ 0 for all A ∈ 𝔄. Let (A_n) be a countable disjoint family of measurable sets A_n ∈ 𝔄 and let A = ⋃_{n=1}^∞ A_n. By homework 40.1, χ_A = Σ_{n=1}^∞ χ_{A_n}, and therefore, using Corollary 12.19 (B. Levi),

  ν(A) = ∫_A f dμ = ∫_X χ_A f dμ = ∫_X Σ_{n=1}^∞ χ_{A_n} f dμ
       = Σ_{n=1}^∞ ∫_X χ_{A_n} f dμ = Σ_{n=1}^∞ ∫_{A_n} f dμ = Σ_{n=1}^∞ ν(A_n).
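On (ℕ, 𝒫(ℕ), counting measure) with the density f(n) = 2^{-n}, Proposition 12.20 says that ν(A) = Σ_{n∈A} 2^{-n} is a measure. Additivity over a disjoint decomposition can be spot-checked on a finite truncation (the truncation and set choices are ours):

```python
def nu(A):
    # nu(A) = integral over A of f dmu with f(n) = 2**-n, mu = counting measure
    return sum(2.0 ** -n for n in A)

evens = {n for n in range(1, 40) if n % 2 == 0}
odds  = {n for n in range(1, 40) if n % 2 == 1}

total = nu(evens | odds)           # evens and odds are disjoint
additive = abs(total - (nu(evens) + nu(odds))) < 1e-12
```

Here total is Σ_{n=1}^{39} 2^{-n} = 1 − 2^{-39}, and ν splits over the disjoint pieces as the proposition requires.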

12.4.4 The Dominated Convergence Theorem

Besides the monotone convergence theorem, the present theorem is the most important one. It is due to Henri Lebesgue. The great advantage, compared with Theorem 6.6, is that μ(X) = ∞ is allowed, that is, non-compact domains X are included. We only need the pointwise convergence of (f_n), not the uniform convergence. The main assumption here is the existence of an integrable upper bound g for all f_n.
Theorem 12.21 (Dominated Convergence Theorem of Lebesgue) Let f_n : X → ℝ or f_n : X → ℂ and g be measurable functions such that

(1) f_n(x) → f(x) as n → ∞ a.e. on X,
(2) | f_n(x) | ≤ g(x) a.e. on X,
(3) ∫_X g dμ < +∞.

Then f is measurable and integrable, ∫_X | f | dμ < ∞, and

  lim_{n→∞} ∫_X f_n dμ = ∫_X f dμ = ∫_X lim_{n→∞} f_n dμ,
  lim_{n→∞} ∫_X | f_n - f | dμ = 0.    (12.10)

Note that (12.10) shows that (f_n) converges to f in the normed space L¹(X, μ).
Example 12.6 (a) Let A_n ∈ 𝔄, n ∈ ℕ, A_1 ⊆ A_2 ⊆ ⋯ be an increasing sequence with ⋃_{n=1}^∞ A_n = A. If f ∈ 𝓛¹(A, μ), then f ∈ 𝓛¹(A_n, μ) for all n and

  lim_{n→∞} ∫_{A_n} f dμ = ∫_A f dμ.    (12.11)

Indeed, the sequence (χ_{A_n} f) converges pointwise to χ_A f, since χ_A(x) = 1 iff x ∈ A iff x ∈ A_n for all n ≥ n_0 iff lim_{n→∞} χ_{A_n}(x) = 1. Moreover, | χ_{A_n} f | ≤ | χ_A f |, which is integrable. By Lebesgue's theorem,

  lim_{n→∞} ∫_{A_n} f dμ = lim_{n→∞} ∫_X χ_{A_n} f dμ = ∫_X χ_A f dμ = ∫_A f dμ.

However, if we do not assume f ∈ 𝓛¹(A, μ), the statement is not true (see Remark 12.7 below).
Exhaustion theorem. Let (A_n) be an increasing sequence of measurable sets and A = ⋃_{n=1}^∞ A_n. Suppose that f is measurable and that (∫_{A_n} | f | dμ) is a bounded sequence. Then f ∈ 𝓛¹(A, μ) and (12.11) holds.
(b) Let f_n(x) = (-1)^n x^n on [0, 1]. The sequence is dominated by the integrable function 1: | f_n(x) | ≤ 1 for all x ∈ [0, 1]. Hence lim_{n→∞} ∫_{[0,1]} f_n dλ = 0 = ∫_{[0,1]} lim_{n→∞} f_n dλ.
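The computation in (b) can be made explicit: ∫₀¹ (−1)ⁿ xⁿ dx = (−1)ⁿ/(n+1), which tends to 0 exactly as the dominated convergence theorem predicts (f_n → 0 a.e. and |f_n| ≤ 1). A sketch using exact rational arithmetic:

```python
from fractions import Fraction

def integral_fn(n):
    # exact value of the integral of (-1)^n * x^n over [0, 1]
    return Fraction((-1) ** n, n + 1)

vals = [integral_fn(n) for n in range(1, 200)]
# |f_n| <= 1 pointwise on [0, 1], so |integral of f_n| <= 1 as well
dominated = all(abs(v) <= 1 for v in vals)
limit_is_zero = abs(vals[-1]) < Fraction(1, 100)
```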

12.4.5 Application of Lebesgue's Theorem to Parametric Integrals

As a direct application of the dominated convergence theorem we now treat parameter-dependent integrals; see Propositions 7.22 and 7.23.

Proposition 12.22 (Continuity) Let U ⊆ ℝⁿ be an open connected set, t_0 ∈ U, and f : ℝᵐ × U → ℝ a function. Assume that
(a) for every t ∈ U, the function x ↦ f(x, t) is measurable,
(b) for a.e. x ∈ ℝᵐ, the function t ↦ f(x, t) is continuous at t_0,
(c) there exists an integrable function F : ℝᵐ → ℝ such that for every t ∈ U,

  | f(x, t) | ≤ F(x) a.e. on ℝᵐ.

Then the function

  g(t) = ∫_{ℝᵐ} f(x, t) dx

is continuous at t_0.

Proof. First we note that for any fixed t ∈ U, the function f_t(x) = f(x, t) is integrable on ℝᵐ since it is measurable and dominated by the integrable function F. We have to show that for any sequence t_j → t_0, t_j ∈ U, g(t_j) tends to g(t_0) as j → ∞. We set f_j(x) = f(x, t_j) and f_0(x) = f(x, t_0) for all j ∈ ℕ. By (b) we have

  f_0(x) = lim_{j→∞} f_j(x),    a.e. x ∈ ℝᵐ.

By (a) and (c), the assumptions of the dominated convergence theorem are satisfied, and thus

  lim_{j→∞} g(t_j) = lim_{j→∞} ∫_{ℝᵐ} f_j(x) dx = ∫_{ℝᵐ} lim_{j→∞} f_j(x) dx = ∫_{ℝᵐ} f_0(x) dx = g(t_0).

Proposition 12.23 (Differentiation under the Integral Sign) Let I ⊆ ℝ be an open interval and f : ℝᵐ × I → ℝ a function such that
(a) for every t ∈ I, the function x ↦ f(x, t) is integrable,
(b) for almost all x ∈ ℝᵐ, the function t ↦ f(x, t) is finite and continuously differentiable,
(c) there exists an integrable function F : ℝᵐ → ℝ such that for every t ∈ I,

  | ∂f/∂t (x, t) | ≤ F(x),    a.e. x ∈ ℝᵐ.

Then the function g(t) = ∫_{ℝᵐ} f(x, t) dx is differentiable on I with

  g′(t) = ∫_{ℝᵐ} ∂f/∂t (x, t) dx.

The proof uses the previous theorem about the continuity of the parametric integral. A detailed proof is to be found in [Kon90, p. 283].
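As a numerical sanity check of Proposition 12.23, take f(x, t) = e^{−t x²} on [0, 1] × (0, ∞); then ∂f/∂t = −x² e^{−t x²} is dominated on [0, 1] by the integrable function F(x) = x² for t ≥ 0. The example and the midpoint-rule discretization are ours; we compare a central difference of g with the integral of the t-derivative:

```python
import math

def g(t, steps=20_000):
    # g(t) = integral over [0, 1] of exp(-t * x^2) dx, midpoint rule
    h = 1.0 / steps
    return sum(math.exp(-t * ((k + 0.5) * h) ** 2) for k in range(steps)) * h

def g_prime_via_integral(t, steps=20_000):
    # integral over [0, 1] of the t-derivative  -x^2 * exp(-t * x^2)
    h = 1.0 / steps
    return sum(-(((k + 0.5) * h) ** 2) * math.exp(-t * ((k + 0.5) * h) ** 2)
               for k in range(steps)) * h

t0, eps = 1.0, 1e-4
central_diff = (g(t0 + eps) - g(t0 - eps)) / (2 * eps)   # finite-difference g'(1)
```

The two values agree closely, illustrating that differentiation and integration may be interchanged under the domination hypothesis (c).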

Example 12.7 (a) Let f ∈ 𝓛¹(ℝ). Then the Fourier transform f̂ : ℝ → ℂ,

  f̂(t) = (1/√(2π)) ∫_ℝ e^{-itx} f(x) dx,

is continuous on ℝ, see homework 41.3.
(b) Let K ⊆ ℝ³ be a compact subset and ρ : K → ℝ integrable; the Newton potential (with mass density ρ) is given by

  u(t) = ∫_K ρ(x)/‖x - t‖ dx,    t ∉ K.

Then u(t) is a harmonic function on ℝ³ \ K.
Similarly, if K ⊆ ℝ² is compact and ρ ∈ L^∞(K), the Newton potential is given by

  u(t) = ∫_K ρ(x) log ‖x - t‖ dx,    t ∉ K.

Then u(t) is a harmonic function on ℝ² \ K.


12.4.6 The Riemann and the Lebesgue Integrals


Proposition 12.24 Let f be a bounded function on the finite interval [a, b].
(a) f is Riemann integrable on [a, b] if and only if f is continuous a.e. on [a, b].
(b) If f is Riemann integrable on [a, b], then f is Lebesgue integrable, too. Both integrals coincide.
Let I ⊆ ℝ be an interval such that f is Riemann integrable on all compact subintervals of I.
(c) f is Lebesgue integrable on I if and only if | f | is improperly Riemann integrable on I (see Section 5.4); both integrals coincide.
Remarks 12.7 (a) The characteristic function χ_ℚ on [0, 1] is Lebesgue but not Riemann integrable; χ_ℚ is nowhere continuous on [0, 1].
(b) The (improper) Riemann integral

  ∫_1^∞ (sin x)/x dx

converges (see Example 5.11); however, the Lebesgue integral does not exist since the integral does not converge absolutely. Indeed, for integers n ≥ 1 we have, with some c > 0,

  ∫_{nπ}^{(n+1)π} | sin x |/x dx ≥ 1/((n+1)π) ∫_{nπ}^{(n+1)π} | sin x | dx = 2/((n+1)π) ≥ c/(n+1);

hence

  ∫_π^{(n+1)π} | sin x |/x dx ≥ c Σ_{k=1}^n 1/(k+1).

Since the harmonic series diverges, so does the integral ∫_1^∞ | sin x |/x dx.
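The divergence argument can be reproduced numerically: per-period approximations of ∫ |sin x|/x confirm the lower bound 2/((n+1)π), and their partial sums grow like a harmonic series. The Simpson-rule discretization is ours:

```python
import math

def period_integral(n, steps=200):
    # Simpson's rule for the integral of |sin x| / x over [n*pi, (n+1)*pi];
    # the integrand vanishes at both endpoints, so only interior nodes matter
    a, b = n * math.pi, (n + 1) * math.pi
    h = (b - a) / steps
    s = sum((4 if k % 2 else 2) * abs(math.sin(a + k * h)) / (a + k * h)
            for k in range(1, steps))
    return s * h / 3

# lower bound 2 / ((n+1) * pi) per period, as in the estimate above
bound_ok = all(period_integral(n) >= 2 / ((n + 1) * math.pi) - 1e-9
               for n in range(1, 30))

# partial sums over many periods keep growing (like (2/pi) * log n)
partial = sum(period_integral(n) for n in range(1, 200))
```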

12.4.7 Appendix: Fubini's Theorem

Theorem 12.25 Let (X_1, 𝔄_1, μ_1) and (X_2, 𝔄_2, μ_2) be σ-finite measure spaces, let f be an 𝔄_1 ⊗ 𝔄_2-measurable function, and let X = X_1 × X_2.
(a) If f : X → [0, +∞], φ(x_1) = ∫_{X_2} f(x_1, x_2) dμ_2, and ψ(x_2) = ∫_{X_1} f(x_1, x_2) dμ_1, then

  ∫_{X_2} ψ dμ_2 = ∫_{X_1×X_2} f d(μ_1 ⊗ μ_2) = ∫_{X_1} φ dμ_1.

(b) If f ∈ 𝓛¹(X, μ_1 ⊗ μ_2), then

  ∫_{X_1×X_2} f d(μ_1 ⊗ μ_2) = ∫_{X_2} ( ∫_{X_1} f(x_1, x_2) dμ_1 ) dμ_2.

Here 𝔄_1 ⊗ 𝔄_2 denotes the smallest σ-algebra over X which contains all sets A × B with A ∈ 𝔄_1 and B ∈ 𝔄_2. Define μ(A × B) = μ_1(A)·μ_2(B) and extend μ to a measure μ_1 ⊗ μ_2 on 𝔄_1 ⊗ 𝔄_2.

Remark 12.8 In (a), as in Levi's theorem, we don't need any assumption on f to change the order of integration, since f ≥ 0. In (b), f is an arbitrary measurable function on X_1 × X_2; however, the integral ∫_X | f | d(μ_1 ⊗ μ_2) needs to be finite.


Chapter 13
Hilbert Space
Functional analysis is a fruitful interplay between linear algebra and analysis. One defines function spaces with certain properties and certain topologies and considers linear operators between such spaces. The friendliest examples of such spaces are Hilbert spaces.
This chapter is divided into two parts: the first describes the geometry of a Hilbert space, the second is concerned with linear operators on the Hilbert space.

13.1 The Geometry of the Hilbert Space


13.1.1 Unitary Spaces
Let E be a linear space over 𝕂 = ℝ or over 𝕂 = ℂ.

Definition 13.1 An inner product on E is a function ⟨· , ·⟩ : E × E → 𝕂 with
(a) ⟨λ_1 x_1 + λ_2 x_2 , y⟩ = λ_1 ⟨x_1 , y⟩ + λ_2 ⟨x_2 , y⟩ (linearity),
(b) ⟨x , y⟩ = conj(⟨y , x⟩) (hermitian property),
(c) ⟨x , x⟩ ≥ 0 for all x ∈ E, and ⟨x , x⟩ = 0 implies x = 0 (positive definiteness).
A unitary space is a linear space together with an inner product.

Let us list some immediate consequences of these axioms. From (a) and (b) it follows that

(d) ⟨y , λ_1 x_1 + λ_2 x_2⟩ = conj(λ_1) ⟨y , x_1⟩ + conj(λ_2) ⟨y , x_2⟩.

A form on E × E satisfying (a) and (d) is called a sesquilinear form. (a) implies ⟨0 , y⟩ = 0 for all y ∈ E. The mapping x ↦ ⟨x , y⟩ is a linear mapping into 𝕂 (a linear functional) for all y ∈ E.
By (c), we may define ‖x‖, the norm of the vector x ∈ E, to be the square root of ⟨x , x⟩; thus

  ‖x‖² = ⟨x , x⟩.    (13.1)


Proposition 13.1 (Cauchy–Schwarz Inequality) Let (E, ⟨· , ·⟩) be a unitary space. For x, y ∈ E we have

  | ⟨x , y⟩ | ≤ ‖x‖ ‖y‖.

Equality holds if and only if x = λy for some λ ∈ 𝕂.

Proof. Choose α ∈ ℂ, | α | = 1, such that α ⟨y , x⟩ = | ⟨x , y⟩ |. For t ∈ ℝ we then have (since conj(α) ⟨x , y⟩ = conj(α ⟨y , x⟩) = | ⟨x , y⟩ |)

  ⟨x - tαy , x - tαy⟩ = ⟨x , x⟩ - tα ⟨y , x⟩ - t conj(α) ⟨x , y⟩ + t² ⟨y , y⟩
                      = ‖x‖² - 2t | ⟨x , y⟩ | + t² ‖y‖² ≥ 0.

This is a quadratic polynomial at² + bt + c in t with real coefficients. Since this polynomial takes only non-negative values, its discriminant b² - 4ac must be non-positive:

  4 | ⟨x , y⟩ |² - 4 ‖x‖² ‖y‖² ≤ 0.

This implies | ⟨x , y⟩ | ≤ ‖x‖ ‖y‖.
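A quick numerical check of Proposition 13.1 in ℂⁿ with the standard inner product ⟨x , y⟩ = Σ x_k conj(y_k) (the sample vectors are arbitrary choices of ours):

```python
def inner(x, y):
    # standard inner product on C^n: sum of x_k * conj(y_k)
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

x = [1 + 2j, -0.5j, 3.0, 2 - 1j]
y = [0.5, 1 + 1j, -2j, 1.0]

# Cauchy-Schwarz: |<x, y>| <= ||x|| * ||y||
cs_holds = abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12

# equality case: y proportional to x (here y = lam * x)
lam = 2 - 3j
y_prop = [lam * a for a in x]
equality_gap = abs(abs(inner(x, y_prop)) - norm(x) * norm(y_prop))
```

For proportional vectors the gap vanishes (up to rounding), matching the equality statement of the proposition.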
Corollary 13.2 ‖·‖ defines a norm on E.

Proof. It is clear that ‖x‖ ≥ 0. From (c) it follows that ‖x‖ = 0 implies x = 0. Further,

  ‖λx‖ = √⟨λx , λx⟩ = √(| λ |² ⟨x , x⟩) = | λ | ‖x‖.

We prove the triangle inequality. Since 2 Re(z) = z + conj(z), we have by Proposition 1.20 and the Cauchy–Schwarz inequality

  ‖x + y‖² = ⟨x + y , x + y⟩ = ⟨x , x⟩ + ⟨x , y⟩ + ⟨y , x⟩ + ⟨y , y⟩
           = ‖x‖² + ‖y‖² + 2 Re ⟨x , y⟩
           ≤ ‖x‖² + ‖y‖² + 2 | ⟨x , y⟩ |
           ≤ ‖x‖² + ‖y‖² + 2 ‖x‖ ‖y‖ = (‖x‖ + ‖y‖)²;

hence ‖x + y‖ ≤ ‖x‖ + ‖y‖.
By the corollary, any unitary space is a normed space with the norm ‖x‖ = √⟨x , x⟩.
Recall that any normed vector space is a metric space with the metric d(x, y) = kx yk. Hence,
the notions of open and closed sets, neighborhoods, converging sequences, Cauchy sequences,
continuous mappings, and so on make sense in a unitary space. In particular, lim_{n→∞} x_n = x means that the sequence (‖x_n - x‖) of non-negative real numbers tends to 0. Recall from Definition 6.8 that a metric space is said to be complete if every Cauchy sequence converges.
Definition 13.2 A complete unitary space is called a Hilbert space.
Example 13.1 Let 𝕂 = ℂ.
(a) E = ℂⁿ, x = (x_1, ..., x_n) ∈ ℂⁿ, y = (y_1, ..., y_n) ∈ ℂⁿ. Then

  ⟨x , y⟩ = Σ_{k=1}^n x_k conj(y_k)

defines an inner product, with the euclidean norm ‖x‖ = ( Σ_{k=1}^n | x_k |² )^{1/2}. (ℂⁿ, ⟨· , ·⟩) is a Hilbert space.


(b) E = L²(X, μ) is a Hilbert space with the inner product ⟨f , g⟩ = ∫_X f conj(g) dμ.
By Proposition 12.17 with p = q = 2 we obtain the Cauchy–Schwarz inequality

  | ∫_X f conj(g) dμ | ≤ ( ∫_X | f |² dμ )^{1/2} ( ∫_X | g |² dμ )^{1/2}.

Using the CSI one can prove Minkowski's inequality, that is, f, g ∈ L²(X, μ) implies f + g ∈ L²(X, μ). Also, ⟨f , g⟩ is a finite complex number, since f conj(g) ∈ L¹(X, μ).
Note that the inner product is positive definite, since ∫_X | f |² dμ = 0 implies (by Proposition 12.16) | f | = 0 a.e. and therefore f = 0 in L²(X, μ). The proof of the completeness of L²(X, μ) is more complicated; we skip it.
(c) E = ℓ², i.e.

  ℓ² = {(x_n) | x_n ∈ ℂ, n ∈ ℕ, Σ_{n=1}^∞ | x_n |² < ∞}.

Note that the Cauchy–Schwarz inequality in ℝⁿ (Corollary 1.26) implies

  | Σ_{n=1}^k x_n conj(y_n) |² ≤ ( Σ_{n=1}^k | x_n | | y_n | )² ≤ Σ_{n=1}^k | x_n |² Σ_{n=1}^k | y_n |² ≤ Σ_{n=1}^∞ | x_n |² Σ_{n=1}^∞ | y_n |².

Taking the supremum over all k ∈ ℕ on the left, we have

  | Σ_{n=1}^∞ x_n conj(y_n) |² ≤ Σ_{n=1}^∞ | x_n |² Σ_{n=1}^∞ | y_n |²;

hence

  ⟨(x_n) , (y_n)⟩ = Σ_{n=1}^∞ x_n conj(y_n)

is an absolutely converging series, so that the inner product is well-defined on ℓ².


Lemma 13.3 Let E be a unitary space. For any fixed y ∈ E the mappings f, g : E → ℂ given by

  f(x) = ⟨x , y⟩  and  g(x) = ⟨y , x⟩

are continuous functions on E.

Proof. First proof. Let (x_n) be a sequence in E converging to x ∈ E, that is, lim_{n→∞} ‖x_n - x‖ = 0. Then, by the Cauchy–Schwarz inequality,

  | ⟨x_n , y⟩ - ⟨x , y⟩ | = | ⟨x_n - x , y⟩ | ≤ ‖x_n - x‖ ‖y‖ → 0

as n → ∞. This proves the continuity of f. The same proof works for g.

Second proof. The Cauchy–Schwarz inequality implies that for x_1, x_2 ∈ E,

  | ⟨x_1 , y⟩ - ⟨x_2 , y⟩ | = | ⟨x_1 - x_2 , y⟩ | ≤ ‖x_1 - x_2‖ ‖y‖,

which proves that the map x ↦ ⟨x , y⟩ is in fact uniformly continuous (given ε > 0, choose δ = ε/‖y‖; then ‖x_1 - x_2‖ < δ implies | ⟨x_1 , y⟩ - ⟨x_2 , y⟩ | < ε). The same is true for x ↦ ⟨y , x⟩.


Definition 13.3 Let H be a unitary space. We call x and y orthogonal to each other, and write x ⊥ y, if ⟨x , y⟩ = 0. Two subsets M, N ⊆ H are called orthogonal to each other if x ⊥ y for all x ∈ M and y ∈ N.
For a subset M ⊆ H define the orthogonal complement M^⊥ of M to be the set

  M^⊥ = {x ∈ H | ⟨x , m⟩ = 0 for all m ∈ M}.

For example, E = ℝⁿ with the standard inner product and v = (v_1, ..., v_n) ∈ ℝⁿ, v ≠ 0, yields

  {v}^⊥ = {x ∈ ℝⁿ | Σ_{k=1}^n x_k v_k = 0}.

This is a hyperplane in ℝⁿ which is orthogonal to v.

Lemma 13.4 Let H be a unitary space and M ⊆ H an arbitrary subset. Then M^⊥ is a closed linear subspace of H.

Proof. (a) Suppose that x, y ∈ M^⊥. Then for m ∈ M we have

  ⟨λ_1 x + λ_2 y , m⟩ = λ_1 ⟨x , m⟩ + λ_2 ⟨y , m⟩ = 0;

hence λ_1 x + λ_2 y ∈ M^⊥. This shows that M^⊥ is a linear subspace.
(b) We show that any converging sequence (x_n) of elements of M^⊥ has its limit in M^⊥. Suppose lim_{n→∞} x_n = x with x_n ∈ M^⊥ and x ∈ H. Then for all m ∈ M, ⟨x_n , m⟩ = 0. Since the inner product is continuous in the first argument (see Lemma 13.3), we obtain

  0 = lim_{n→∞} ⟨x_n , m⟩ = ⟨x , m⟩.

This shows x ∈ M^⊥; hence M^⊥ is closed.

13.1.2 Norm and Inner Product

Problem. Given a normed linear space (E, ‖·‖), does there exist an inner product ⟨· , ·⟩ on E such that ‖x‖ = √⟨x , x⟩ for all x ∈ E? In this case we call ‖·‖ an inner product norm.

Proposition 13.5 (a) A norm ‖·‖ on a linear space E over 𝕂 = ℂ or 𝕂 = ℝ is an inner product norm if and only if the parallelogram law

  ‖x + y‖² + ‖x - y‖² = 2(‖x‖² + ‖y‖²),    x, y ∈ E,    (13.2)

is satisfied.
(b) If (13.2) is satisfied, the inner product ⟨· , ·⟩ is given by (13.3) in the real case 𝕂 = ℝ and by (13.4) in the complex case 𝕂 = ℂ:

  ⟨x , y⟩ = (1/4) ( ‖x + y‖² - ‖x - y‖² ),  if 𝕂 = ℝ;    (13.3)

  ⟨x , y⟩ = (1/4) ( ‖x + y‖² - ‖x - y‖² + i ‖x + iy‖² - i ‖x - iy‖² ),  if 𝕂 = ℂ.    (13.4)

These equations are called polarization identities.


Proof. We check the parallelogram law and the polarization identity in the real case 𝕂 = ℝ:

  ‖x + y‖² + ‖x - y‖² = ⟨x + y , x + y⟩ + ⟨x - y , x - y⟩
    = ⟨x , x⟩ + ⟨y , x⟩ + ⟨x , y⟩ + ⟨y , y⟩ + (⟨x , x⟩ - ⟨y , x⟩ - ⟨x , y⟩ + ⟨y , y⟩)
    = 2 ‖x‖² + 2 ‖y‖².

Further,

  ‖x + y‖² - ‖x - y‖² = (⟨x , x⟩ + ⟨y , x⟩ + ⟨x , y⟩ + ⟨y , y⟩) - (⟨x , x⟩ - ⟨y , x⟩ - ⟨x , y⟩ + ⟨y , y⟩) = 4 ⟨x , y⟩.

The proof that the parallelogram law is sufficient for E to be a unitary space is in the appendix to this section.

Example 13.2 We show that L¹([0, 2]) with ‖f‖_1 = ∫_0^2 | f | dx is not an inner product space. Indeed, let f = χ_{[1,2]} and g = χ_{[0,1]}. Then f + g = 1 = | f - g | a.e. on [0, 2], and ‖f‖_1 = ‖g‖_1 = ∫_0^1 dx = 1, such that

  ‖f + g‖²_1 + ‖f - g‖²_1 = 2² + 2² = 8 ≠ 4 = 2(‖f‖²_1 + ‖g‖²_1).

The parallelogram law is not satisfied for ‖·‖_1, so L¹([0, 2]) is not an inner product space.
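Both halves of Proposition 13.5 can be tested numerically: the euclidean norm on ℝ² satisfies (13.2), and the real polarization identity (13.3) recovers the dot product, while the 1-norm computation of Example 13.2 gives 8 ≠ 4. The step functions are encoded as values on a midpoint grid; that encoding is ours:

```python
def norm2(v):
    return sum(t * t for t in v) ** 0.5

x, y = (3.0, -1.0), (2.0, 5.0)
s = [a + b for a, b in zip(x, y)]
d = [a - b for a, b in zip(x, y)]
lhs = norm2(s) ** 2 + norm2(d) ** 2          # parallelogram law, left side
rhs = 2 * (norm2(x) ** 2 + norm2(y) ** 2)    # ... and right side
polar = 0.25 * (norm2(s) ** 2 - norm2(d) ** 2)   # (13.3)
dot = sum(a * b for a, b in zip(x, y))

# Example 13.2 with the L^1-norm: f = chi_[1,2], g = chi_[0,1] on [0, 2]
def norm1(values, h):
    return sum(abs(v) for v in values) * h

n, h = 1000, 2.0 / 1000
grid = [(k + 0.5) * h for k in range(n)]
f = [1.0 if t >= 1.0 else 0.0 for t in grid]
g = [1.0 if t < 1.0 else 0.0 for t in grid]
l1_lhs = norm1([a + b for a, b in zip(f, g)], h) ** 2 + norm1([a - b for a, b in zip(f, g)], h) ** 2
l1_rhs = 2 * (norm1(f, h) ** 2 + norm1(g, h) ** 2)
```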

13.1.3 Two Theorems of F. Riesz

(Frigyes Riesz, born January 22, 1880, in Austria-Hungary; died February 28, 1956; a founder of functional analysis.)

Definition 13.4 Let (H_1, ⟨· , ·⟩_1) and (H_2, ⟨· , ·⟩_2) be Hilbert spaces. Let H = {(x_1, x_2) | x_1 ∈ H_1, x_2 ∈ H_2} be the direct sum of the Hilbert spaces H_1 and H_2. Then

  ⟨(x_1, x_2) , (y_1, y_2)⟩ = ⟨x_1 , y_1⟩_1 + ⟨x_2 , y_2⟩_2

defines an inner product on H. With this inner product H becomes a Hilbert space. H = H_1 ⊕ H_2 is called the (direct) orthogonal sum of H_1 and H_2.

Definition 13.5 Two Hilbert spaces H_1 and H_2 are called isomorphic if there exists a bijective linear mapping Φ : H_1 → H_2 such that

  ⟨Φ(x) , Φ(y)⟩_2 = ⟨x , y⟩_1,    x, y ∈ H_1.

Φ is called an isometric isomorphism or a unitary map.

Back to the orthogonal sum H = H_1 ⊕ H_2. Let H̃_1 = {(x_1, 0) | x_1 ∈ H_1} and H̃_2 = {(0, x_2) | x_2 ∈ H_2}. Then x_1 ↦ (x_1, 0) and x_2 ↦ (0, x_2) are isometric isomorphisms from H_i onto H̃_i, i = 1, 2. We have H̃_1 ⊥ H̃_2, and H̃_i, i = 1, 2, are closed linear subspaces of H.
In this situation we say that H is the inner orthogonal sum of the two closed subspaces H̃_1 and H̃_2.

(a) Riesz's First Theorem

Problem. Let H_1 be a closed linear subspace of H. Does there exist another closed linear subspace H_2 such that H = H_1 ⊕ H_2?
Answer: YES.

Lemma 13.6 (Minimal Distance Lemma) Let C be a convex and closed subset of the Hilbert space H. For x ∈ H let

  d(x) = inf{‖x - y‖ | y ∈ C}.

Then there exists a unique element c ∈ C such that d(x) = ‖x - c‖.

[Figure: the point c ∈ C closest to x]

Proof. Existence. Since d(x) is an infimum, there exists a sequence (y_n), y_n ∈ C, which approximates the infimum, lim_{n→∞} ‖x - y_n‖ = d(x). We will show that (y_n) is a Cauchy sequence. By the parallelogram law (see Proposition 13.5) we have

  ‖y_n - y_m‖² = ‖(y_n - x) + (x - y_m)‖²
    = 2 ‖y_n - x‖² + 2 ‖x - y_m‖² - ‖2x - y_n - y_m‖²
    = 2 ‖y_n - x‖² + 2 ‖x - y_m‖² - 4 ‖x - (y_n + y_m)/2‖².

Since C is convex, (y_n + y_m)/2 ∈ C and therefore ‖x - (y_n + y_m)/2‖ ≥ d(x). Hence

  ‖y_n - y_m‖² ≤ 2 ‖y_n - x‖² + 2 ‖x - y_m‖² - 4 d(x)².

By the choice of (y_n), the first two terms each tend to 2d(x)² as m, n → ∞. Thus,

  lim_{m,n→∞} ‖y_n - y_m‖² ≤ 2(d(x)² + d(x)²) - 4 d(x)² = 0;

hence (y_n) is a Cauchy sequence. Since H is complete, there exists an element c ∈ H such that lim_{n→∞} y_n = c. Since y_n ∈ C and C is closed, c ∈ C. By construction, we have ‖y_n - x‖ → d(x). On the other hand, since y_n → c and the norm is continuous (see homework 42.1 (b)), we have

  ‖y_n - x‖ → ‖c - x‖.

This implies d(x) = ‖c - x‖.
Uniqueness. Let c, c′ be two such elements. Then, by the parallelogram law,

  0 ≤ ‖c - c′‖² = ‖(c - x) + (x - c′)‖²
    = 2 ‖c - x‖² + 2 ‖x - c′‖² - 4 ‖x - (c + c′)/2‖²
    ≤ 2(d(x)² + d(x)²) - 4 d(x)² = 0.

This implies c = c′; the point c ∈ C which realizes the infimum is unique.


[Figure: the orthogonal decomposition x = x_1 + x_2 with x_1 ∈ H_1 and x_2 ∈ H_1^⊥]
Theorem 13.7 (Riesz's First Theorem) Let H_1 be a closed linear subspace of the Hilbert space H. Then we have

  H = H_1 ⊕ H_1^⊥,

that is, any x ∈ H has a unique representation x = x_1 + x_2 with x_1 ∈ H_1 and x_2 ∈ H_1^⊥.

Proof. Existence. Apply Lemma 13.6 to the convex, closed set H_1. There exists a unique x_1 ∈ H_1 such that

  d(x) = inf{‖x - y‖ | y ∈ H_1} = ‖x - x_1‖ ≤ ‖x - x_1 - t y_1‖

for all t ∈ 𝕂 and y_1 ∈ H_1. Homework 42.2 (c) now implies x_2 := x - x_1 ⊥ y_1 for all y_1 ∈ H_1. Hence x_2 ∈ H_1^⊥. Therefore, x = x_1 + x_2, and the existence of such a representation is shown.
Uniqueness. Suppose that x = x_1 + x_2 = x_1′ + x_2′ are two possibilities to write x as a sum of elements x_1, x_1′ ∈ H_1 and x_2, x_2′ ∈ H_1^⊥. Then

  x_1 - x_1′ = x_2′ - x_2 = u

belongs to both H_1 and H_1^⊥ (by the linearity of H_1 and H_1^⊥). Hence ⟨u , u⟩ = 0, which implies u = 0. That is, x_1 = x_1′ and x_2 = x_2′.
Let x = x_1 + x_2 be as above. Then the mappings P_1(x) = x_1 and P_2(x) = x_2 are well-defined on H. They are called the orthogonal projections of H onto H_1 and H_1^⊥, respectively. We will consider projections in more detail later.
Example 13.3 Let H be a Hilbert space, z ∈ H, z ≠ 0, and H_1 = 𝕂z the one-dimensional linear subspace spanned by the single vector z. Since any finite-dimensional subspace is closed, Riesz's first theorem applies. We want to compute the projections of x ∈ H with respect to H_1 and H_1^⊥. Let x_1 = λz; we have to determine λ such that ⟨x - x_1 , z⟩ = 0, that is,

  ⟨x - λz , z⟩ = ⟨x , z⟩ - λ ⟨z , z⟩ = 0.

Hence,

  λ = ⟨x , z⟩ / ⟨z , z⟩ = ⟨x , z⟩ / ‖z‖².

The Riesz representation with respect to H_1 = 𝕂z and H_1^⊥ is

  x = (⟨x , z⟩/‖z‖²) z + ( x - (⟨x , z⟩/‖z‖²) z ).
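The decomposition of Example 13.3 is easy to verify numerically in ℂⁿ: with λ = ⟨x , z⟩/‖z‖², the remainder x − λz is orthogonal to z. The sample vectors are arbitrary choices of ours:

```python
def inner(x, y):
    # standard inner product on C^n
    return sum(a * b.conjugate() for a, b in zip(x, y))

x = [1 + 1j, 2.0, -1j]
z = [2.0, 1j, 1 - 1j]

lam = inner(x, z) / inner(z, z)              # lambda = <x, z> / ||z||^2
x1 = [lam * c for c in z]                    # projection onto the line K*z
x2 = [a - b for a, b in zip(x, x1)]          # orthogonal remainder

orthogonal = abs(inner(x2, z)) < 1e-12       # x2 is in (K*z)-perp
recombines = all(abs(a - (b + c)) < 1e-12 for a, b, c in zip(x, x1, x2))
```

This is exactly the unique splitting x = x_1 + x_2 of Riesz's first theorem for the closed subspace H_1 = 𝕂z.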

(b) Riesz's Representation Theorem

Recall from Section 11 that a linear functional on the vector space E is a mapping F : E → 𝕂 such that F(λ_1 x_1 + λ_2 x_2) = λ_1 F(x_1) + λ_2 F(x_2) for all x_1, x_2 ∈ E and λ_1, λ_2 ∈ 𝕂.
Let (E, ‖·‖) be a normed linear space over 𝕂. Recall that a linear functional F : E → 𝕂 is called continuous if x_n → x in E implies F(x_n) → F(x).
The set of all continuous linear functionals F on E forms a linear space E′ under the pointwise linear operations.
Now let (H, ⟨· , ·⟩) be a Hilbert space. By Lemma 13.3, F_y : H → 𝕂, F_y(x) = ⟨x , y⟩, defines a continuous linear functional on H. Riesz's representation theorem states that any continuous linear functional on H is of this form.
Theorem 13.8 (Riesz's Representation Theorem) Let F be a continuous linear functional on the Hilbert space H. Then there exists a unique element y ∈ H such that F(x) = F_y(x) = ⟨x , y⟩ for all x ∈ H.
Proof. Existence. Let H_1 = ker F be the null space of the linear functional F. H_1 is a linear subspace (since F is linear). H_1 is closed since H_1 = F^{-1}({0}) is the preimage of the closed set {0} under the continuous map F. By Riesz's first theorem, H = H_1 ⊕ H_1^⊥.
Case 1. H_1^⊥ = {0}. Then H = H_1 and F(x) = 0 for all x. We can choose y = 0; F(x) = ⟨x , 0⟩.
Case 2. H_1^⊥ ≠ {0}. Suppose u ∈ H_1^⊥, u ≠ 0. Then F(u) ≠ 0 (otherwise u ∈ H_1 ∩ H_1^⊥, so that ⟨u , u⟩ = 0, which implies u = 0). We have

  F( x - (F(x)/F(u)) u ) = F(x) - (F(x)/F(u)) F(u) = 0.

Hence x - (F(x)/F(u)) u ∈ H_1. Since u ∈ H_1^⊥ we have

  0 = ⟨ x - (F(x)/F(u)) u , u ⟩ = ⟨x , u⟩ - (F(x)/F(u)) ⟨u , u⟩,

so that

  F(x) = (F(u)/⟨u , u⟩) ⟨x , u⟩ = ⟨ x , (conj(F(u))/‖u‖²) u ⟩ = F_y(x),

where y = (conj(F(u))/‖u‖²) u.
Uniqueness. Suppose that both y_1, y_2 ∈ H give the same functional F, i.e. F(x) = ⟨x , y_1⟩ = ⟨x , y_2⟩ for all x. This implies

  ⟨y_1 - y_2 , x⟩ = 0,    x ∈ H.

In particular, choose x = y_1 - y_2. This gives ‖y_1 - y_2‖² = 0; hence y_1 = y_2.


(c) Example
Any continuous linear functional on L²(X, μ) is of the form F(f) = ∫_X f conj(g) dμ with some g ∈ L²(X, μ). Any continuous linear functional on ℓ² is given by

  F((x_n)) = Σ_{n=1}^∞ x_n conj(y_n),    with (y_n) ∈ ℓ².
13.1.4 Orthogonal Sets and Fourier Expansion

Motivation. Let E = ℝⁿ be the euclidean space with the standard inner product and standard basis {e_1, ..., e_n}. Then we have, with x_k = ⟨x , e_k⟩,

  x = Σ_{k=1}^n x_k e_k,    ‖x‖² = Σ_{k=1}^n | x_k |²,    ⟨x , y⟩ = Σ_{k=1}^n x_k y_k.

We want to generalize these formulas to arbitrary Hilbert spaces.


(a) Orthonormal Sets
Let (H, ⟨· , ·⟩) be a Hilbert space.

Definition 13.6 Let {x_i | i ∈ I} be a family of elements of H.
{x_i} is called an orthogonal set, or OS, if ⟨x_i , x_j⟩ = 0 for i ≠ j.
{x_i} is called an orthonormal set, or NOS, if ⟨x_i , x_j⟩ = δ_ij for all i, j ∈ I.

Example 13.4 (a) H = ℓ², e_n = (0, 0, ..., 0, 1, 0, ...) with the 1 at the n-th component. Then {e_n | n ∈ ℕ} is an NOS in H.
(b) H = L²((0, 2π)) with the Lebesgue measure, ⟨f , g⟩ = ∫_0^{2π} f conj(g) dx. Then

  {1, sin(nx), cos(nx) | n ∈ ℕ}

is an OS in H, and one checks

  { 1/√(2π), sin(nx)/√π, cos(nx)/√π | n ∈ ℕ }  and  { e^{inx}/√(2π) | n ∈ ℕ }

to be orthonormal sets of H.

Lemma 13.9 (The Pythagorean Theorem) Let {x_1, ..., x_k} be an OS in H. Then

  ‖x_1 + ⋯ + x_k‖² = ‖x_1‖² + ⋯ + ‖x_k‖².

The easy proof is left to the reader.

Lemma 13.10 Let {x_n} be an OS in H. Then Σ_{k=1}^∞ x_k converges if and only if Σ_{k=1}^∞ ‖x_k‖² converges.
The proof is in the appendix.

(b) Fourier Expansion and Completeness
Throughout this paragraph let {x_n | n ∈ ℕ} be an NOS in the Hilbert space H.

Definition 13.7 The numbers $\langle x , x_n \rangle$, $n \in \mathbb{N}$, are called the Fourier coefficients of $x \in H$ with respect to the NOS $\{x_n\}$.


Example 13.5 Consider the NOS $\left\{ \frac{1}{\sqrt{2\pi}}, \frac{\sin(nx)}{\sqrt{\pi}}, \frac{\cos(nx)}{\sqrt{\pi}} \;\middle|\; n \in \mathbb{N} \right\}$ from the previous example on $H = L^2((0, 2\pi))$. Let $f \in H$. Then

\left\langle f \,,\, \frac{\sin(nx)}{\sqrt{\pi}} \right\rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(t) \sin(nt) \, dt,

\left\langle f \,,\, \frac{\cos(nx)}{\sqrt{\pi}} \right\rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(t) \cos(nt) \, dt,

\left\langle f \,,\, \frac{1}{\sqrt{2\pi}} \right\rangle = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} f(t) \, dt.

These are the usual Fourier coefficients, up to a factor. Note that the normalization differs from that in Definition 6.3, since the inner product there carries the factor $1/(2\pi)$.
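These coefficients can be approximated by quadrature. The sketch below (trapezoidal rule, illustrative only) computes them for the sample function $f(t) = t$, whose exact coefficients follow from $\int_0^{2\pi} t \sin(nt)\,dt = -2\pi/n$ and $\int_0^{2\pi} t \cos(nt)\,dt = 0$:

```python
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 20001)
f = t                                   # sample function f(t) = t on (0, 2*pi)

def inner(g):
    """Trapezoidal approximation of the L^2 inner product <f, g>."""
    y = f * g
    return float(np.sum((y[1:] + y[:-1]) / 2.0) * (t[1] - t[0]))

for n in (1, 2, 3):
    b_n = inner(np.sin(n * t)) / np.sqrt(np.pi)   # <f, sin(nt)/sqrt(pi)>
    a_n = inner(np.cos(n * t)) / np.sqrt(np.pi)   # <f, cos(nt)/sqrt(pi)>
    assert np.isclose(b_n, -2.0 * np.pi / (n * np.sqrt(np.pi)), atol=1e-4)
    assert np.isclose(a_n, 0.0, atol=1e-4)

c0 = inner(np.ones_like(t)) / np.sqrt(2.0 * np.pi)   # <f, 1/sqrt(2*pi)>
assert np.isclose(c0, 2.0 * np.pi ** 2 / np.sqrt(2.0 * np.pi), atol=1e-4)
```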
Proposition 13.11 (Bessel's Inequality) For $x \in H$ we have

\sum_{k=1}^{\infty} |\langle x , x_k \rangle|^2 \le \|x\|^2. \qquad (13.5)

Proof. Let $n \in \mathbb{N}$ be a positive integer and $y_n = x - \sum_{k=1}^{n} \langle x , x_k \rangle x_k$. Then

\langle y_n , x_m \rangle = \langle x , x_m \rangle - \sum_{k=1}^{n} \langle x , x_k \rangle \langle x_k , x_m \rangle = \langle x , x_m \rangle - \sum_{k=1}^{n} \langle x , x_k \rangle \delta_{km} = 0

for $m = 1, \dots, n$. Hence $\{ y_n, \langle x , x_1 \rangle x_1, \dots, \langle x , x_n \rangle x_n \}$ is an OS. By Lemma 13.9,

\|x\|^2 = \left\| y_n + \sum_{k=1}^{n} \langle x , x_k \rangle x_k \right\|^2 = \|y_n\|^2 + \sum_{k=1}^{n} |\langle x , x_k \rangle|^2 \|x_k\|^2 \ge \sum_{k=1}^{n} |\langle x , x_k \rangle|^2,

since $\|x_k\|^2 = 1$ for all $k$. Taking the supremum over all $n$ on the right, the assertion follows.
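In a finite-dimensional stand-in for $H$, Bessel's inequality and the orthogonality of $y_n$ to the NOS can be checked directly. A minimal sketch (the orthonormal set comes from a QR factorization; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# NOS {q_1, ..., q_4} in R^10, obtained from a QR factorization.
Q, _ = np.linalg.qr(rng.normal(size=(10, 4)))
x = rng.normal(size=10)

coeffs = Q.T @ x                         # Fourier coefficients <x, q_k>
bessel_sum = float(np.sum(coeffs ** 2))
assert bessel_sum <= np.linalg.norm(x) ** 2 + 1e-12   # Bessel's inequality

# y_n = x - sum_k <x, q_k> q_k is orthogonal to every q_k
y = x - Q @ coeffs
assert np.allclose(Q.T @ y, 0.0)
```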

Corollary 13.12 For any $x \in H$ the series $\sum_{k=1}^{\infty} \langle x , x_k \rangle x_k$ converges in $H$.

Proof. Since $\{ \langle x , x_k \rangle x_k \}$ is an OS, by Lemma 13.10 the series converges if and only if the series $\sum_{k=1}^{\infty} \| \langle x , x_k \rangle x_k \|^2 = \sum_{k=1}^{\infty} |\langle x , x_k \rangle|^2$ converges. By Bessel's inequality, this series converges.

We call $\sum_{k=1}^{\infty} \langle x , x_k \rangle x_k$ the Fourier series of $x$ with respect to the NOS $\{x_k\}$.


Remarks 13.1 (a) In general, the Fourier series of $x$ does not converge to $x$.
(b) The NOS $\left\{ \frac{1}{\sqrt{2\pi}}, \frac{\sin(nx)}{\sqrt{\pi}}, \frac{\cos(nx)}{\sqrt{\pi}} \right\}$ gives the ordinary Fourier series of a function $f$ which is integrable over $(0, 2\pi)$.

Theorem 13.13 Let $\{x_k \mid k \in \mathbb{N}\}$ be an NOS in $H$. The following are equivalent:
(a) $x = \sum_{k=1}^{\infty} \langle x , x_k \rangle x_k$ for all $x \in H$, i.e. the Fourier series of every $x$ converges to $x$.
(b) $\langle z , x_k \rangle = 0$ for all $k \in \mathbb{N}$ implies $z = 0$, i.e. the NOS is maximal.
(c) For every $x \in H$ we have $\|x\|^2 = \sum_{k=1}^{\infty} |\langle x , x_k \rangle|^2$.
(d) If $x \in H$ and $y \in H$, then $\langle x , y \rangle = \sum_{k=1}^{\infty} \langle x , x_k \rangle \langle x_k , y \rangle$.
Formula (c) is called Parseval's identity.


Definition 13.8 An orthonormal set $\{x_k \mid k \in \mathbb{N}\}$ which satisfies the above (equivalent) properties is called a complete orthonormal system, CNOS for short.
Proof. (a) $\Rightarrow$ (d): Since the inner product is continuous in both components, we have

\langle x , y \rangle = \left\langle \sum_{k=1}^{\infty} \langle x , x_k \rangle x_k \,,\, \sum_{n=1}^{\infty} \langle y , x_n \rangle x_n \right\rangle = \sum_{k,n=1}^{\infty} \langle x , x_k \rangle \overline{\langle y , x_n \rangle} \underbrace{\langle x_k , x_n \rangle}_{\delta_{kn}} = \sum_{k=1}^{\infty} \langle x , x_k \rangle \langle x_k , y \rangle.
(d) $\Rightarrow$ (c): Put $y = x$.


(c) $\Rightarrow$ (b): Suppose $\langle z , x_k \rangle = 0$ for all $k$. By (c) we then have

\|z\|^2 = \sum_{k=1}^{\infty} |\langle z , x_k \rangle|^2 = 0; \qquad \text{hence} \qquad z = 0.

(b) $\Rightarrow$ (a): Fix $x \in H$ and put $y = \sum_{k=1}^{\infty} \langle x , x_k \rangle x_k$, which converges according to Corollary 13.12. With $z = x - y$ we have, for all positive integers $n \in \mathbb{N}$,

\langle z , x_n \rangle = \langle x - y , x_n \rangle = \langle x , x_n \rangle - \left\langle \sum_{k=1}^{\infty} \langle x , x_k \rangle x_k \,,\, x_n \right\rangle = \langle x , x_n \rangle - \sum_{k=1}^{\infty} \langle x , x_k \rangle \langle x_k , x_n \rangle = \langle x , x_n \rangle - \langle x , x_n \rangle = 0.

This shows $z = 0$ and therefore $x = y$, i.e. the Fourier series of $x$ converges to $x$.


Example 13.6 (a) $H = \ell^2$; $\{e_n \mid n \in \mathbb{N}\}$ is an NOS. We show that this NOS is complete. To this end, let $x = (x_n)$ be orthogonal to every $e_n$, $n \in \mathbb{N}$; that is, $0 = \langle x , e_n \rangle = x_n$. Hence $x = (0, 0, \dots) = 0$. By (b), $\{e_n\}$ is a CNOS. What does the Fourier series of $x$ look like? The Fourier coefficients of $x$ are $\langle x , e_n \rangle = x_n$, so that

x = \sum_{n=1}^{\infty} x_n e_n

is the Fourier series of $x$. The NOS $\{e_n \mid n \ge 2\}$ is not complete.


(b) $H = L^2((0, 2\pi))$. The systems

\left\{ \frac{e^{inx}}{\sqrt{2\pi}} \;\middle|\; n \in \mathbb{Z} \right\} \quad\text{and}\quad \left\{ \frac{1}{\sqrt{2\pi}}, \frac{\sin(nx)}{\sqrt{\pi}}, \frac{\cos(nx)}{\sqrt{\pi}} \;\middle|\; n \in \mathbb{N} \right\}

are both CNOSs in $H$. This was stated in Theorem 6.14.
(c) Existence of CNOS in a Separable Hilbert Space
Definition 13.9 A metric space $E$ is called separable if there exists a countable dense subset of $E$.
Example 13.7 (a) $\mathbb{R}^n$ is separable: $M = \{(r_1, \dots, r_n) \mid r_1, \dots, r_n \in \mathbb{Q}\}$ is a countable dense set in $\mathbb{R}^n$.
(b) $\mathbb{C}^n$ is separable: $M = \{(r_1 + is_1, \dots, r_n + is_n) \mid r_1, \dots, r_n, s_1, \dots, s_n \in \mathbb{Q}\}$ is a countable dense subset of $\mathbb{C}^n$.
(c) $L^2([a, b])$ is separable. The polynomials $\{1, x, x^2, \dots\}$ are linearly independent in $L^2([a, b])$ and can be orthonormalized via Schmidt's process. As a result we get a countable CNOS in $L^2([a, b])$ (the Legendre polynomials in case $a = -1$, $b = 1$). However, $L^2(\mathbb{R})$ contains no polynomial; in this case the Hermite functions, which are of the form $p_n(x)\, e^{-x^2}$ with polynomials $p_n$, form a countable CNOS.
More generally, $L^2(G, \lambda_n)$ is separable for any region $G \subseteq \mathbb{R}^n$ with respect to the Lebesgue measure.
(d) Any Hilbert space is isomorphic to some $L^2(X, \mu)$ where $\mu$ is the counting measure on $X$; $X = \mathbb{N}$ gives $\ell^2$. An uncountable $X$ gives a non-separable Hilbert space.
Proposition 13.14 (Schmidt's Orthogonalization Process) Let $\{y_k\}$ be an at most countable linearly independent subset of the Hilbert space $H$. Then there exists an NOS $\{x_k\}$ such that for every $n$

\operatorname{lin}\{y_1, \dots, y_n\} = \operatorname{lin}\{x_1, \dots, x_n\}.

The NOS can be computed recursively:

x_1 := \frac{y_1}{\|y_1\|}, \qquad x_{n+1} := \frac{y_{n+1} - \sum_{k=1}^{n} \langle y_{n+1} , x_k \rangle x_k}{\left\| y_{n+1} - \sum_{k=1}^{n} \langle y_{n+1} , x_k \rangle x_k \right\|}.
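The recursion translates directly into code. A minimal NumPy sketch (illustrative; classical Gram-Schmidt without the re-orthogonalization one would use for numerical stability):

```python
import numpy as np

def gram_schmidt(ys):
    """Orthonormalize linearly independent vectors as in Proposition 13.14."""
    xs = []
    for y in ys:
        v = y - sum(np.dot(y, x) * x for x in xs)   # subtract projections
        xs.append(v / np.linalg.norm(v))            # normalize
    return xs

ys = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
xs = gram_schmidt(ys)

# The result is an NOS: <x_i, x_j> = delta_ij.
G = np.array([[np.dot(a, b) for b in xs] for a in xs])
assert np.allclose(G, np.eye(3))
```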

Corollary 13.15 Let $\{e_k \mid k \in N\}$ be an NOS, where $N = \{1, \dots, n\}$ for some $n \in \mathbb{N}$ or $N = \mathbb{N}$. Suppose that $H_1 = \overline{\operatorname{lin}}\{e_k \mid k \in N\}$ is the closed linear span of the NOS. Then $x_1 = \sum_{k \in N} \langle x , e_k \rangle e_k$ is the orthogonal projection of $x \in H$ onto $H_1$.


Proposition 13.16 (a) A Hilbert space $H$ has an at most countable complete orthonormal system (CNOS) if and only if $H$ is separable.
(b) Let $H$ be a separable Hilbert space. Then $H$ is isomorphic either to $\mathbb{K}^n$ for some $n \in \mathbb{N}$ or to $\ell^2$.

13.1.5 Appendix
(a) The Inner Product constructed from an Inner Product Norm
Proof of Proposition 13.5. We consider only the case $\mathbb{K} = \mathbb{R}$. Assume that the parallelogram identity is satisfied. We will show that

\langle x , y \rangle = \frac{1}{4}\left( \|x + y\|^2 - \|x - y\|^2 \right)

defines a bilinear form on $E$.


(a) We show additivity. First note that the parallelogram identity implies

\|x_1 + x_2 + y\|^2 = \tfrac{1}{2}\|x_1 + x_2 + y\|^2 + \tfrac{1}{2}\|x_1 + x_2 + y\|^2
= \tfrac{1}{2}\left( 2\|x_1 + y\|^2 + 2\|x_2\|^2 - \|x_1 - x_2 + y\|^2 \right) + \tfrac{1}{2}\left( 2\|x_2 + y\|^2 + 2\|x_1\|^2 - \|x_2 - x_1 + y\|^2 \right)
= \|x_1 + y\|^2 + \|x_2 + y\|^2 + \|x_1\|^2 + \|x_2\|^2 - \tfrac{1}{2}\left( \|x_1 - x_2 + y\|^2 + \|x_2 - x_1 + y\|^2 \right).

Replacing $y$ by $-y$, we have

\|x_1 + x_2 - y\|^2 = \|x_1 - y\|^2 + \|x_2 - y\|^2 + \|x_1\|^2 + \|x_2\|^2 - \tfrac{1}{2}\left( \|x_1 - x_2 - y\|^2 + \|x_2 - x_1 - y\|^2 \right).

By definition and the above two formulas (note that the subtracted brackets coincide, since $\|x_1 - x_2 - y\| = \|x_2 - x_1 + y\|$),

\langle x_1 + x_2 , y \rangle = \tfrac{1}{4}\left( \|x_1 + x_2 + y\|^2 - \|x_1 + x_2 - y\|^2 \right)
= \tfrac{1}{4}\left( \|x_1 + y\|^2 - \|x_1 - y\|^2 \right) + \tfrac{1}{4}\left( \|x_2 + y\|^2 - \|x_2 - y\|^2 \right)
= \langle x_1 , y \rangle + \langle x_2 , y \rangle,

that is, $\langle \cdot , \cdot \rangle$ is additive in the first variable. It is obviously symmetric and hence additive in the second variable, too.
(b) We show $\langle \lambda x , y \rangle = \lambda \langle x , y \rangle$ for all $\lambda \in \mathbb{R}$, $x, y \in E$. By (a), $\langle 2x , y \rangle = 2\langle x , y \rangle$. By induction on $n$, $\langle nx , y \rangle = n\langle x , y \rangle$ for all $n \in \mathbb{N}$. Now let $\lambda = \frac{m}{n}$, $m, n \in \mathbb{N}$. Then

n\langle \lambda x , y \rangle = n\left\langle \tfrac{m}{n} x , y \right\rangle = \left\langle n \cdot \tfrac{m}{n} x , y \right\rangle = m\langle x , y \rangle,
\quad\text{so}\quad
\langle \lambda x , y \rangle = \tfrac{m}{n}\langle x , y \rangle = \lambda\langle x , y \rangle.

Hence $\langle \lambda x , y \rangle = \lambda\langle x , y \rangle$ holds for all positive rational numbers $\lambda$. Suppose $\lambda \in \mathbb{Q}^+$; then

0 = \langle \lambda x + (-\lambda x) , y \rangle = \langle \lambda x , y \rangle + \langle -\lambda x , y \rangle

implies $\langle -\lambda x , y \rangle = -\lambda\langle x , y \rangle$ and, moreover, $\langle 0 \cdot x , y \rangle = 0$, such that the equation holds for all $\lambda \in \mathbb{Q}$. Suppose that $\lambda \in \mathbb{R}$ is given. Then there exists a sequence $(\lambda_n)$, $\lambda_n \in \mathbb{Q}$, of rational numbers with $\lambda_n \to \lambda$. This implies $\lambda_n x \to \lambda x$ for all $x \in E$ and, since $\|\cdot\|$ is continuous,

\langle \lambda x , y \rangle = \lim_{n\to\infty} \langle \lambda_n x , y \rangle = \lim_{n\to\infty} \lambda_n \langle x , y \rangle = \lambda\langle x , y \rangle.

This completes the proof.

(b) Convergence of Orthogonal Series

We reformulate Lemma 13.10:
Let $\{x_n\}$ be an OS in $H$. Then $\sum_{k=1}^{\infty} x_k$ converges if and only if $\sum_{k=1}^{\infty} \|x_k\|^2$ converges.

Note that the convergence of a series $\sum_{i=1}^{\infty} x_i$ of elements $x_i$ of a Hilbert space $H$ is defined to be the limit of the partial sums, $\lim_{n\to\infty} \sum_{i=1}^{n} x_i$. In particular, the Cauchy criterion applies, since $H$ is complete:
The series $\sum y_i$ converges if and only if for every $\varepsilon > 0$ there exists $n_0 \in \mathbb{N}$ such that $m, n \ge n_0$ implies

\left\| \sum_{i=m}^{n} y_i \right\| < \varepsilon.

Proof. By the above discussion, $\sum_{k=1}^{\infty} x_k$ converges if and only if $\left\| \sum_{k=m}^{n} x_k \right\|^2$ becomes small for sufficiently large $m, n \in \mathbb{N}$. By the Pythagorean theorem this term equals

\sum_{k=m}^{n} \|x_k\|^2;

hence the series $\sum x_k$ converges if and only if the series $\sum \|x_k\|^2$ converges.

13.2 Bounded Linear Operators in Hilbert Spaces


13.2.1 Bounded Linear Operators
Let $(E_1, \|\cdot\|_1)$ and $(E_2, \|\cdot\|_2)$ be normed linear spaces. Recall that a linear map $T: E_1 \to E_2$ is called continuous if $x_n \to x$ in $E_1$ implies $T(x_n) \to T(x)$ in $E_2$.
Definition 13.10 (a) A linear map $T: E_1 \to E_2$ is called bounded if there exists a positive real number $C > 0$ such that

\|T(x)\|_2 \le C \|x\|_1 \quad \text{for all } x \in E_1. \qquad (13.6)


(b) Suppose that $T: E_1 \to E_2$ is a bounded linear map. Then the operator norm of $T$ is the smallest number $C$ satisfying (13.6) for all $x \in E_1$, that is,

\|T\| = \inf\{ C > 0 \mid \forall x \in E_1: \|T(x)\|_2 \le C\|x\|_1 \}.

One can show that

(a) \|T\| = \sup\left\{ \frac{\|T(x)\|_2}{\|x\|_1} \;\middle|\; x \in E_1,\ x \neq 0 \right\},
(b) \|T\| = \sup\{ \|T(x)\|_2 \mid \|x\|_1 \le 1 \},
(c) \|T\| = \sup\{ \|T(x)\|_2 \mid \|x\|_1 = 1 \}.

Indeed, we may restrict ourselves to unit vectors, since

\frac{\|T(x)\|_2}{\|x\|_1} = \frac{|\lambda|\,\|T(x)\|_2}{|\lambda|\,\|x\|_1} = \frac{\|T(\lambda x)\|_2}{\|\lambda x\|_1}.

This shows the equivalence of (a) and (c). Since $\|T(\lambda x)\|_2 = |\lambda|\,\|T(x)\|_2$, the suprema (b) and (c) are equal. Finally, (a) agrees with the infimum definition, because the least upper bound is the infimum over all upper bounds. From (a) it follows that

\|T(x)\|_2 \le \|T\|\,\|x\|_1. \qquad (13.7)

Also, if $E_1 \xrightarrow{S} E_2 \xrightarrow{T} E_3$ are bounded linear mappings, then $T S$ is a bounded linear mapping with

\|T S\| \le \|T\|\,\|S\|.

Indeed, for $x \neq 0$ one has, by (13.7),

\|T(S(x))\|_3 \le \|T\|\,\|S(x)\|_2 \le \|T\|\,\|S\|\,\|x\|_1.

Hence $\|(T S)(x)\|_3 / \|x\|_1 \le \|T\|\,\|S\|$.
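For matrices the operator norm with respect to the Euclidean norm is the largest singular value, and characterization (c) can be probed by sampling unit vectors. A hedged sketch (random sampling only gives a lower estimate of the supremum):

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(3, 3))
S = rng.normal(size=(3, 3))

def op_norm_estimate(A, samples=20000):
    """Estimate ||A|| = sup{||Ax|| : ||x|| = 1} by sampling unit vectors."""
    xs = rng.normal(size=(samples, A.shape[1]))
    xs /= np.linalg.norm(xs, axis=1, keepdims=True)
    return float(np.max(np.linalg.norm(xs @ A.T, axis=1)))

# For the Euclidean norm, ||A|| equals the largest singular value.
for A in (T, S):
    est, exact = op_norm_estimate(A), np.linalg.norm(A, 2)
    assert est <= exact + 1e-9       # a sup over a subset never exceeds ||A||
    assert est >= 0.95 * exact       # dense sampling in R^3 gets close

# Submultiplicativity: ||T S|| <= ||T|| ||S||.
assert np.linalg.norm(T @ S, 2) <= np.linalg.norm(T, 2) * np.linalg.norm(S, 2) + 1e-12
```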
Proposition 13.17 For a linear map $T: E_1 \to E_2$ of a normed space $E_1$ into a normed space $E_2$ the following are equivalent:
(a) $T$ is bounded.
(b) $T$ is continuous.
(c) $T$ is continuous at one point of $E_1$.
Proof. (a) $\Rightarrow$ (b). This follows from the fact that

\|T(x_1) - T(x_2)\| = \|T(x_1 - x_2)\| \le \|T\|\,\|x_1 - x_2\|,

and $T$ is even uniformly continuous on $E_1$. (b) trivially implies (c).
(c) $\Rightarrow$ (a). Suppose $T$ is continuous at $x_0$. To each $\varepsilon > 0$ one can find $\delta > 0$ such that $\|x - x_0\| < \delta$ implies $\|T(x) - T(x_0)\| < \varepsilon$. Let $y = x - x_0$. In other words, $\|y\| < \delta$ implies

\|T(y + x_0) - T(x_0)\| = \|T(y)\| < \varepsilon.

Suppose $z \in E_1$, $\|z\| \le 1$. Then $\|(\delta/2)z\| \le \delta/2 < \delta$; hence $\|T((\delta/2)z)\| < \varepsilon$. By linearity of $T$, $\|T(z)\| < 2\varepsilon/\delta$. This shows $\|T\| \le 2\varepsilon/\delta$.


Definition 13.11 Let $E$ and $F$ be normed linear spaces. Let $\mathcal{L}(E, F)$ denote the set of all bounded linear maps from $E$ to $F$. In case $E = F$ we simply write $\mathcal{L}(E)$ in place of $\mathcal{L}(E, E)$.
Proposition 13.18 Let $E$ and $F$ be normed linear spaces. Then $\mathcal{L}(E, F)$ is a normed linear space if we define the linear structure by

(S + T)(x) = S(x) + T(x), \qquad (\lambda T)(x) = \lambda\, T(x)

for $S, T \in \mathcal{L}(E, F)$, $\lambda \in \mathbb{K}$. The operator norm $\|T\|$ makes $\mathcal{L}(E, F)$ a normed linear space.
Note that $\mathcal{L}(E, F)$ is complete if and only if $F$ is complete.
Example 13.8 (a) Recall that $\mathcal{L}(\mathbb{K}^n, \mathbb{K}^m)$ is a normed vector space with $\|A\| \le \left( \sum_{i,j} |a_{ij}|^2 \right)^{1/2}$, where $A = (a_{ij})$ is the matrix representation of the linear operator $A$; see Proposition 7.1.
(b) The space $E' = \mathcal{L}(E, \mathbb{K})$ of continuous linear functionals on $E$.
(c) $H = L^2((0, 1))$, $g \in C([0, 1])$;

(T_g f)(t) = g(t) f(t)

defines a bounded linear operator on $H$ (see homework).
(d) $H = L^2((0, 1))$, $k(s, t) \in L^2([0, 1] \times [0, 1])$. Then

(Kf)(t) = \int_0^1 k(s, t) f(s)\, ds, \qquad f \in H = L^2([0, 1]),

defines a continuous linear operator $K \in \mathcal{L}(H)$. We have

|(Kf)(t)|^2 = \left| \int_0^1 k(s, t) f(s)\, ds \right|^2 \le \left( \int_0^1 |k(s, t)|\,|f(s)|\, ds \right)^2
\underset{\text{C-S-I}}{\le} \int_0^1 |k(s, t)|^2\, ds \int_0^1 |f(s)|^2\, ds = \int_0^1 |k(s, t)|^2\, ds\ \|f\|_H^2.

Hence,

\|K(f)\|_H^2 \le \int_0^1 \int_0^1 |k(s, t)|^2\, ds\, dt\ \|f\|_H^2, \qquad \|K(f)\|_H \le \|k\|_{L^2([0,1]\times[0,1])}\, \|f\|_H.

This shows $Kf \in H$ and further $\|K\| \le \|k\|_{L^2([0,1]^2)}$. $K$ is called an integral operator; $K$ is compact, i.e. it maps the unit ball into a set whose closure is compact.
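The bound $\|K\| \le \|k\|_{L^2}$ survives discretization: replacing the integral by a Riemann sum turns $K$ into a matrix, and the inequality becomes the (always valid) fact that the spectral norm of a matrix is at most its Frobenius norm. An illustrative sketch with an arbitrary smooth kernel:

```python
import numpy as np

# Discretize (Kf)(t) = int_0^1 k(s,t) f(s) ds on a uniform midpoint grid.
m = 400
s = (np.arange(m) + 0.5) / m
h = 1.0 / m
S, T = np.meshgrid(s, s)
k = np.exp(-(S - T) ** 2)          # sample kernel k(s, t)

A = h * k                                        # discretized operator
op_norm = np.linalg.norm(A, 2)                   # ~ ||K||
hs_norm = float(np.sqrt(h * h * np.sum(k ** 2))) # ~ ||k||_{L^2([0,1]^2)}
assert op_norm <= hs_norm + 1e-12                # ||K|| <= ||k||_{L^2}
```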
(e) $H = L^2(\mathbb{R})$, $a \in \mathbb{R}$;

(V_a f)(t) = f(t - a), \qquad t \in \mathbb{R},

defines a bounded linear operator called the shift operator. Indeed,

\|V_a f\|_2^2 = \int_{\mathbb{R}} |f(t - a)|^2\, dt = \int_{\mathbb{R}} |f(t)|^2\, dt = \|f\|_2^2;

since all quotients $\|V_a(f)\| / \|f\| = 1$, we have $\|V_a\| = 1$.


(f) $H = \ell^2$. We define the right-shift $S$ by

S(x_1, x_2, \dots) = (0, x_1, x_2, \dots).

Obviously, $\|S(x)\| = \|x\| = \left( \sum_{n=1}^{\infty} |x_n|^2 \right)^{1/2}$. Hence $\|S\| = 1$.
(g) Let $E_1 = C^1([0, 1])$ and $E_2 = C([0, 1])$. Define the differentiation operator $(Tf)(t) = f'(t)$. Let $\|f\|_1 = \|f\|_2 = \sup_{t \in [0,1]} |f(t)|$. Then $T$ is linear but not bounded. Indeed, let $f_n(t) = 1 - t^n$. Then $\|f_n\|_1 = 1$ and $(Tf_n)(t) = -n t^{n-1}$, such that $\|Tf_n\|_2 = n$. Thus $\|Tf_n\|_2 / \|f_n\|_1 = n \to +\infty$ as $n \to \infty$; $T$ is unbounded.
However, if we put $\|f\|_1 = \sup_{t \in [0,1]} |f(t)| + \sup_{t \in [0,1]} |f'(t)|$ and keep $\|f\|_2$ as before, then $T$ is bounded, since

\|Tf\|_2 = \sup_{t \in [0,1]} |f'(t)| \le \|f\|_1 \implies \|T\| \le 1.

13.2.2 The Adjoint Operator

In this subsection $H$ is a Hilbert space and $\mathcal{L}(H)$ the space of bounded linear operators on $H$. Let $T \in \mathcal{L}(H)$ be a bounded linear operator and $y \in H$. Then $F(x) = \langle T(x) , y \rangle$ defines a continuous linear functional on $H$. Indeed,

|F(x)| = |\langle T(x) , y \rangle| \underset{\text{CSI}}{\le} \|T(x)\|\,\|y\| \le \underbrace{\|T\|\,\|y\|}_{C}\,\|x\| = C\|x\|.

Hence $F$ is bounded and therefore continuous; in particular,

\|F\| \le \|T\|\,\|y\|.

By Riesz's representation theorem, there exists a unique vector $z \in H$ such that

\langle T(x) , y \rangle = F(x) = \langle x , z \rangle.

Note that, by the above inequality,

\|z\| = \|F\| \le \|T\|\,\|y\|. \qquad (13.8)

Suppose $y_1$ is another element of $H$, which corresponds to $z_1 \in H$ with

\langle T(x) , y_1 \rangle = \langle x , z_1 \rangle.

Finally, let $u \in H$ be the element which corresponds to $y + y_1$,

\langle T(x) , y + y_1 \rangle = \langle x , u \rangle.

Since the element $u$ given by Riesz's representation theorem is unique, we have $u = z + z_1$. Similarly,

\langle T(x) , \lambda y \rangle = \bar{\lambda}\, F(x) = \bar{\lambda}\, \langle x , z \rangle = \langle x , \lambda z \rangle

shows that $\lambda z$ corresponds to $\lambda y$.


Definition 13.12 The above correspondence $y \mapsto z$ is linear. Define the linear operator $T^*$ by $z = T^*(y)$. By definition,

\langle T(x) , y \rangle = \langle x , T^*(y) \rangle, \qquad x, y \in H. \qquad (13.9)

$T^*$ is called the adjoint operator to $T$.


Proposition 13.19 Let $T, T_1, T_2 \in \mathcal{L}(H)$. Then $T^*$ is a bounded linear operator with $\|T^*\| = \|T\|$. We have
(a) $(T_1 + T_2)^* = T_1^* + T_2^*$ and
(b) $(\lambda T)^* = \bar{\lambda}\, T^*$.
(c) $(T_1 T_2)^* = T_2^* T_1^*$.
(d) If $T$ is invertible in $\mathcal{L}(H)$, so is $T^*$, and we have $(T^*)^{-1} = (T^{-1})^*$.
(e) $(T^*)^* = T$.
Proof. Inequality (13.8) shows that

\|T^*(y)\| \le \|T\|\,\|y\|, \qquad y \in H.

By definition, this implies $\|T^*\| \le \|T\|$, and $T^*$ is bounded. Since

\langle T^*(x) , y \rangle = \overline{\langle y , T^*(x) \rangle} = \overline{\langle T(y) , x \rangle} = \langle x , T(y) \rangle,

we get $(T^*)^* = T$. We conclude $\|T\| = \|(T^*)^*\| \le \|T^*\|$, so that $\|T^*\| = \|T\|$.
(a) For $x, y \in H$ we have

\langle (T_1 + T_2)(x) , y \rangle = \langle T_1(x) , y \rangle + \langle T_2(x) , y \rangle = \langle x , T_1^*(y) \rangle + \langle x , T_2^*(y) \rangle = \langle x , (T_1^* + T_2^*)(y) \rangle,

which proves (a).
(c) and (d) are left to the reader.

A mapping $*: A \to A$ such that the above properties (a), (b), (c), and (e) are satisfied is called an involution. An algebra with involution is called a $*$-algebra.
We have seen that $\mathcal{L}(H)$ is a (non-commutative) $*$-algebra. An example of a commutative $*$-algebra is $C(K)$ with the involution $f^*(x) = \overline{f(x)}$.
Example 13.9 (Example 13.8 continued)
(a) $H = \mathbb{C}^n$, $A = (a_{ij}) \in M(n \times n, \mathbb{C})$. Then $A^* = (b_{ij})$ has the matrix elements $b_{ij} = \overline{a_{ji}}$.
(b) $H = L^2([0, 1])$; $T_g^* = T_{\bar{g}}$.
(c) $H = L^2(\mathbb{R})$, $V_a(f)(t) = f(t - a)$ (shift operator); $V_a^* = V_{-a}$.


(d) $H = \ell^2$. The right-shift $S$ is defined by $S((x_n)) = (0, x_1, x_2, \dots)$. We compute the adjoint $S^*$:

\langle S(x) , y \rangle = \sum_{n=2}^{\infty} x_{n-1} \overline{y_n} = \sum_{n=1}^{\infty} x_n \overline{y_{n+1}} = \langle (x_1, x_2, \dots) \,,\, (y_2, y_3, \dots) \rangle.

Hence $S^*((y_n)) = (y_2, y_3, \dots)$ is the left-shift.
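Truncating $\ell^2$ to $\mathbb{C}^n$ turns $S$ into a matrix whose adjoint is the conjugate transpose; the defining relation (13.9) can then be checked directly (finite-dimensional sketch, illustrative only):

```python
import numpy as np

n = 6
S = np.zeros((n, n))
for i in range(1, n):
    S[i, i - 1] = 1.0                  # (Sx)_i = x_{i-1}: the right shift

rng = np.random.default_rng(3)
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = rng.normal(size=n) + 1j * rng.normal(size=n)

inner = lambda a, b: np.sum(a * np.conj(b))       # <a, b> = sum a_k conj(b_k)
S_star = S.conj().T                                # adjoint = conjugate transpose
assert np.isclose(inner(S @ x, y), inner(x, S_star @ y))   # <Sx, y> = <x, S*y>
assert np.allclose(S_star @ y, np.append(y[1:], 0.0))      # S* is the left shift
```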

13.2.3 Classes of Bounded Linear Operators

Let $H$ be a complex Hilbert space.
(a) Self-Adjoint and Normal Operators
Definition 13.13 An operator $A \in \mathcal{L}(H)$ is called
(a) self-adjoint, if $A^* = A$,
(b) normal, if $A A^* = A^* A$.
A self-adjoint operator $A$ is called positive, if $\langle Ax , x \rangle \ge 0$ for all $x \in H$. We write $A \ge 0$. If $A$ and $B$ are self-adjoint, we write $A \ge B$ if $A - B \ge 0$.
A crucial role in proving the simplest properties is played by the polarization identity, which generalizes the polarization identity from Subsection 13.1.2. However, it exists only in complex Hilbert spaces:

4\langle A(x) , y \rangle = \langle A(x + y) , x + y \rangle - \langle A(x - y) , x - y \rangle + i\langle A(x + iy) , x + iy \rangle - i\langle A(x - iy) , x - iy \rangle.

We use the identity as follows:

\langle A(x) , x \rangle = 0 \text{ for all } x \in H \text{ implies } A = 0.

Indeed, by the polarization identity, $\langle A(x) , y \rangle = 0$ for all $x, y \in H$. In particular, $y = A(x)$ yields $A(x) = 0$ for all $x$; thus $A = 0$.

Remarks 13.2 (a) $A$ is normal if and only if $\|A(x)\| = \|A^*(x)\|$ for all $x \in H$. Indeed, if $A$ is normal, then for all $x \in H$ we have $\langle A^* A(x) , x \rangle = \langle A A^*(x) , x \rangle$, which implies $\|A(x)\|^2 = \langle A(x) , A(x) \rangle = \langle A^*(x) , A^*(x) \rangle = \|A^*(x)\|^2$. On the other hand, $\langle A^* A(x) , x \rangle = \langle A A^*(x) , x \rangle$ for all $x$ and the polarization identity imply $\langle (A^* A - A A^*)(x) , x \rangle = 0$ for all $x$; hence $A^* A - A A^* = 0$, which proves the claim.
(b) Sums and real scalar multiples of self-adjoint operators are self-adjoint.
(c) The product $AB$ of self-adjoint operators is self-adjoint if and only if $A$ and $B$ commute with each other, $AB = BA$.
(d) $A$ is self-adjoint if and only if $\langle Ax , x \rangle$ is real for all $x \in H$.
Proof. Let $A^* = A$. Then $\langle Ax , x \rangle = \langle x , Ax \rangle = \overline{\langle Ax , x \rangle}$ is real. For the opposite direction, $\langle A(x) , x \rangle = \langle x , A(x) \rangle$ and the polarization identity yield $\langle A(x) , y \rangle = \langle x , A(y) \rangle$ for all $x, y$; hence $A^* = A$.

(b) Unitary and Isometric Operators
Definition 13.14 Let $T \in \mathcal{L}(H)$. Then $T$ is called
(a) unitary, if $T^* T = I = T T^*$,
(b) isometric, if $\|T(x)\| = \|x\|$ for all $x \in H$.
Proposition 13.20 (a) $T$ is isometric if and only if $T^* T = I$, and if and only if $\langle T(x) , T(y) \rangle = \langle x , y \rangle$ for all $x, y \in H$.
(b) $T$ is unitary if and only if $T$ is isometric and surjective.
(c) If $S, T$ are unitary, so are $ST$ and $T^{-1}$. The unitary operators of $\mathcal{L}(H)$ form a group.

Proof. (a) $T$ isometric yields $\langle T(x) , T(x) \rangle = \langle x , x \rangle$ and further $\langle (T^* T - I)(x) , x \rangle = 0$ for all $x$. The polarization identity implies $T^* T = I$. This implies $\langle (T^* T - I)(x) , y \rangle = 0$ for all $x, y \in H$; hence $\langle T(x) , T(y) \rangle = \langle x , y \rangle$. Inserting $y = x$ shows that $T$ is isometric.
(b) Suppose $T$ is unitary. $T^* T = I$ shows that $T$ is isometric. Since $T T^* = I$, $T$ is surjective.
Suppose now that $T$ is isometric and surjective. Since $T$ is isometric, $T(x) = 0$ implies $x = 0$; hence $T$ is bijective with an inverse operator $T^{-1}$. Insert $y = T^{-1}(z)$ into $\langle T(x) , T(y) \rangle = \langle x , y \rangle$. This gives

\langle T(x) , z \rangle = \langle x , T^{-1}(z) \rangle, \qquad x, z \in H.

Hence $T^{-1} = T^*$ and therefore $T^* T = T T^* = I$.
(c) is easy (see homework 45.4).

Note that an isometric operator is injective with norm 1 (since $\|T(x)\| / \|x\| = 1$ for all $x \neq 0$). In case $H = \mathbb{C}^n$, the unitary operators on $\mathbb{C}^n$ form the unitary group $\mathrm{U}(n)$. In case $H = \mathbb{R}^n$, the "unitary" operators on $H$ form the orthogonal group $\mathrm{O}(n)$.
Example 13.10 (a) $H = L^2(\mathbb{R})$. The shift operator $V_a$ is unitary, since $V_a V_b = V_{a+b}$ and $V_a^* = V_{-a}$, so $V_a^* V_a = V_a V_a^* = V_0 = I$. The multiplication operator $T_g f = gf$ is unitary if and only if $|g| = 1$. $T_g$ is self-adjoint (resp. positive) if and only if $g$ is real (resp. non-negative).
(b) $H = \ell^2$; the right-shift $S((x_n)) = (0, x_1, x_2, \dots)$ is isometric but not unitary, since $S$ is not surjective. $S^*$ is not isometric, since $S^*(1, 0, \dots) = 0$; hence $S^*$ is not injective.
(c) Fourier transform. For $f \in L^1(\mathbb{R})$ define

(\mathcal{F}f)(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-itx} f(x)\, dx.

Let $\mathcal{S}(\mathbb{R}) = \{ f \in C^\infty(\mathbb{R}) \mid \sup_{t \in \mathbb{R}} |t^n f^{(k)}(t)| < \infty,\ n, k \in \mathbb{Z}_+ \}$. $\mathcal{S}(\mathbb{R})$ is called the Schwartz space after Laurent Schwartz. We have $\mathcal{S}(\mathbb{R}) \subseteq L^1(\mathbb{R}) \cap L^2(\mathbb{R})$; for example, $f(x) = e^{-x^2} \in \mathcal{S}(\mathbb{R})$. We will show later that $\mathcal{F}: \mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R})$ is bijective and norm preserving, $\|\mathcal{F}(f)\|_{L^2(\mathbb{R})} = \|f\|_{L^2(\mathbb{R})}$ for $f \in \mathcal{S}(\mathbb{R})$. $\mathcal{F}$ has a unique extension to a unitary operator on $L^2(\mathbb{R})$. The inverse Fourier transform is

(\mathcal{F}^{-1} f)(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{itx} f(x)\, dx, \qquad f \in \mathcal{S}(\mathbb{R}).


13.2.4 Orthogonal Projections

(a) Riesz's First Theorem revisited
Let $H_1 \subseteq H$ be a closed linear subspace. By Theorem 13.7 any $x \in H$ has a unique decomposition $x = x_1 + x_2$ with $x_1 \in H_1$ and $x_2 \in H_1^{\perp}$. The map $P_{H_1}(x) = x_1$ is a linear operator from $H$ to $H$ (see homework 44.1). $P_{H_1}$ is called the orthogonal projection from $H$ onto the closed subspace $H_1$. Obviously, $H_1$ is the image of $P_{H_1}$; in particular, $P_{H_1}$ is surjective if and only if $H_1 = H$. In this case $P_H = I$ is the identity. Since

\|P_{H_1}(x)\|^2 = \|x_1\|^2 \le \|x_1\|^2 + \|x_2\|^2 = \|x\|^2,

we have $\|P_{H_1}\| \le 1$. If $H_1 \neq \{0\}$, there exists a non-zero $x_1 \in H_1$ such that $\|P_{H_1}(x_1)\| = \|x_1\|$. This shows $\|P_{H_1}\| = 1$.
Here is the algebraic characterization of orthogonal projections.
Proposition 13.21 A linear operator $P \in \mathcal{L}(H)$ is an orthogonal projection if and only if $P^2 = P$ and $P^* = P$.
In this case $H_1 = \{x \in H \mid P(x) = x\}$.
Proof. $\Rightarrow$. Suppose that $P = P_{H_1}$ is the projection onto $H_1$. Since $P$ is the identity on $H_1$, $P^2(x) = P(x_1) = x_1 = P(x)$ for all $x \in H$; hence $P^2 = P$.
Let $x = x_1 + x_2$ and $y = y_1 + y_2$ be the unique decompositions of $x$ and $y$ into elements of $H_1$ and $H_1^{\perp}$, respectively. Then

\langle P(x) , y \rangle = \langle x_1 , y_1 + y_2 \rangle = \langle x_1 , y_1 \rangle + \underbrace{\langle x_1 , y_2 \rangle}_{=0} = \langle x_1 , y_1 \rangle = \langle x_1 + x_2 , y_1 \rangle = \langle x , P(y) \rangle,

that is, $P^* = P$.
$\Leftarrow$. Suppose $P^2 = P = P^*$ and put $H_1 = \{x \mid P(x) = x\}$. First note that for $P \neq 0$, $H_1 \neq \{0\}$ is non-trivial. Indeed, since $P(P(x)) = P(x)$, the image of $P$ is part of the eigenspace of $P$ to the eigenvalue 1, $P(H) \subseteq H_1$. Since $P(z) = z$ for $z \in H_1$, also $H_1 \subseteq P(H)$, and thus $H_1 = P(H)$.
Since $P$ is continuous and $\{0\}$ is closed, $H_1 = (P - I)^{-1}(\{0\})$ is a closed linear subspace of $H$. By Riesz's first theorem, $H = H_1 \oplus H_1^{\perp}$. We have to show that $P(x) = x_1$ for all $x$.
Since $P^2 = P$, $P(P(x)) = P(x)$ for all $x$; hence $P(x) \in H_1$. We show $x - P(x) \in H_1^{\perp}$, which completes the proof. For, let $z \in H_1$; then

\langle x - P(x) , z \rangle = \langle x , z \rangle - \langle P(x) , z \rangle = \langle x , z \rangle - \langle x , P(z) \rangle = \langle x , z \rangle - \langle x , z \rangle = 0.

Hence $x = P(x) + (I - P)(x)$ is the unique Riesz decomposition of $x$ with respect to $H_1$ and $H_1^{\perp}$.

Example 13.11 (a) Let $\{x_1, \dots, x_n\}$ be an NOS in $H$. Then

P(x) = \sum_{k=1}^{n} \langle x , x_k \rangle x_k, \qquad x \in H,

defines the orthogonal projection $P: H \to H$ onto $\operatorname{lin}\{x_1, \dots, x_n\}$. Indeed, since $P(x_m) = \sum_{k=1}^{n} \langle x_m , x_k \rangle x_k = x_m$, we get $P^2 = P$, and since

\langle P(x) , y \rangle = \sum_{k=1}^{n} \langle x , x_k \rangle \langle x_k , y \rangle = \left\langle x \,,\, \sum_{k=1}^{n} \langle y , x_k \rangle x_k \right\rangle = \langle x , P(y) \rangle,

we get $P^* = P$, and $P$ is an orthogonal projection.
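In coordinates, with the NOS collected as the columns of a matrix $Q$ with orthonormal columns, this projection is $P = QQ^*$. A small real-valued sketch checking $P^2 = P$, $P^* = P$, and $x - P(x) \perp \operatorname{lin}\{x_k\}$:

```python
import numpy as np

rng = np.random.default_rng(4)
# NOS {x_1, x_2, x_3} in R^6 from a QR factorization; columns of Q are orthonormal.
Q, _ = np.linalg.qr(rng.normal(size=(6, 3)))
P = Q @ Q.T                          # P(x) = sum_k <x, x_k> x_k

assert np.allclose(P @ P, P)         # P^2 = P
assert np.allclose(P, P.T)           # P* = P (real case)

x = rng.normal(size=6)
assert np.allclose(Q.T @ (x - P @ x), 0.0)   # x - P(x) is orthogonal to the span
```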


(b) $H = L^2([0, 1] \cup [2, 3])$, $g \in C([0, 1] \cup [2, 3])$. For $f \in H$ define $T_g f = gf$. Then $T_g = (T_g)^*$ if and only if $g(t)$ is real for all $t$. $T_g$ is an orthogonal projection if moreover $g(t)^2 = g(t)$, such that $g(t) = 0$ or $g(t) = 1$. Since $g$ is continuous, there are only four solutions: $g_1 = 0$, $g_2 = 1$, $g_3 = \chi_{[0,1]}$, and $g_4 = \chi_{[2,3]}$.
In case of $g_3$, the subspace $H_1$ can be identified with $L^2([0, 1])$, since $f \in H_1$ iff $T_g f = f$ iff $gf = f$ iff $f(t) = 0$ for all $t \in [2, 3]$.
(b) Properties of Orthogonal Projections
Throughout this paragraph let $P_1$ and $P_2$ be orthogonal projections onto the closed subspaces $H_1$ and $H_2$, respectively.
Lemma 13.22 The following are equivalent:
(a) $P_1 + P_2$ is an orthogonal projection.
(b) $P_1 P_2 = 0$.
(c) $H_1 \perp H_2$.
Proof. (a) $\Rightarrow$ (b). Let $P_1 + P_2$ be a projection. Then

(P_1 + P_2)^2 = P_1^2 + P_1 P_2 + P_2 P_1 + P_2^2 = P_1 + P_2 + P_1 P_2 + P_2 P_1 = P_1 + P_2;

hence $P_1 P_2 + P_2 P_1 = 0$. Multiplying this from the left by $P_1$ and from the right by $P_1$ yields

P_1 P_2 + P_1 P_2 P_1 = 0 = P_1 P_2 P_1 + P_2 P_1.

This implies $P_1 P_2 = P_2 P_1$ and finally $P_1 P_2 = P_2 P_1 = 0$.
(b) $\Rightarrow$ (c). Let $x_1 \in H_1$ and $x_2 \in H_2$. Then

0 = \langle P_1 P_2 (x_2) , x_1 \rangle = \langle P_2(x_2) , P_1(x_1) \rangle = \langle x_2 , x_1 \rangle.

This shows $H_1 \perp H_2$.
(c) $\Rightarrow$ (b). Let $x, z \in H$ be arbitrary. Then

\langle P_1 P_2(x) , z \rangle = \langle P_2(x) , P_1(z) \rangle = 0,

since $P_2(x) \in H_2$ and $P_1(z) \in H_1$. Hence $P_1 P_2(x) = 0$ and therefore $P_1 P_2 = 0$. The same argument works for $P_2 P_1 = 0$.
(b) $\Rightarrow$ (a). Since $P_1 P_2 = 0$ implies $P_2 P_1 = 0$ (via $H_1 \perp H_2$),

(P_1 + P_2)^* = P_1^* + P_2^* = P_1 + P_2, \qquad (P_1 + P_2)^2 = P_1^2 + P_1 P_2 + P_2 P_1 + P_2^2 = P_1 + 0 + 0 + P_2.


Lemma 13.23 The following are equivalent:
(a) $P_1 P_2$ is an orthogonal projection.
(b) $P_1 P_2 = P_2 P_1$.
In this case, $P_1 P_2$ is the orthogonal projection onto $H_1 \cap H_2$.

Proof. (b) $\Rightarrow$ (a). $(P_1 P_2)^* = P_2^* P_1^* = P_2 P_1 = P_1 P_2$, by assumption. Moreover, $(P_1 P_2)^2 = P_1 P_2 P_1 P_2 = P_1 P_1 P_2 P_2 = P_1 P_2$, which completes this direction.
(a) $\Rightarrow$ (b). $P_1 P_2 = (P_1 P_2)^* = P_2^* P_1^* = P_2 P_1$.
Clearly, $P_1 P_2(H) \subseteq H_1$ and $P_2 P_1(H) \subseteq H_2$; hence $P_1 P_2(H) \subseteq H_1 \cap H_2$. On the other hand, $x \in H_1 \cap H_2$ implies $P_1 P_2 x = x$. This shows $P_1 P_2(H) = H_1 \cap H_2$.
The proof of the following lemma is quite similar to that of the previous two lemmas, so we omit it (see homework 40.5).
Lemma 13.24 The following are equivalent:
(a) $H_1 \subseteq H_2$;
(b) $P_1 P_2 = P_1$;
(c) $P_2 P_1 = P_1$;
(d) $P_1 \le P_2$;
(e) $P_2 - P_1$ is an orthogonal projection;
(f) $\|P_1(x)\| \le \|P_2(x)\|$ for all $x \in H$.
Proof. We show (d) $\Rightarrow$ (c). From $P_1 \le P_2$ we conclude that $I - P_2 \le I - P_1$. Note that both $I - P_1$ and $I - P_2$ are again orthogonal projections, onto $H_1^{\perp}$ and $H_2^{\perp}$, respectively. Thus for all $x \in H$:

\|(I - P_2)P_1(x)\|^2 = \langle (I - P_2)P_1(x) \,,\, (I - P_2)P_1(x) \rangle
= \langle (I - P_2)^* (I - P_2) P_1(x) \,,\, P_1(x) \rangle = \langle (I - P_2)P_1(x) \,,\, P_1(x) \rangle
\le \langle (I - P_1)P_1(x) \,,\, P_1(x) \rangle = \langle P_1(x) - P_1(x) \,,\, P_1(x) \rangle = \langle 0 \,,\, P_1(x) \rangle = 0.

Hence $\|(I - P_2)P_1(x)\|^2 = 0$, which implies $(I - P_2)P_1 = 0$ and therefore $P_1 = P_2 P_1$.

13.2.5 Spectrum and Resolvent


Let $T \in \mathcal{L}(H)$ be a bounded linear operator.
(a) Definitions
Definition 13.15 (a) The resolvent set of $T$, denoted by $\rho(T)$, is the set of all $\lambda \in \mathbb{C}$ such that there exists a bounded linear operator $R_\lambda(T) \in \mathcal{L}(H)$ with

R_\lambda(T)(T - \lambda I) = (T - \lambda I) R_\lambda(T) = I,

i.e. $T - \lambda I$ has a bounded (continuous) inverse $R_\lambda(T)$. We call $R_\lambda(T)$ the resolvent of $T$ at $\lambda$.
(b) The set $\mathbb{C} \setminus \rho(T)$ is called the spectrum of $T$ and is denoted by $\sigma(T)$.
(c) $\lambda \in \mathbb{C}$ is called an eigenvalue of $T$ if there exists a nonzero vector $x$, called an eigenvector, with $(T - \lambda I)x = 0$. The set of all eigenvalues is the point spectrum $\sigma_p(T)$.
Remark 13.3 (a) Note that the point spectrum is a subset of the spectrum, $\sigma_p(T) \subseteq \sigma(T)$. Suppose to the contrary that the eigenvalue $\lambda$ with eigenvector $y$ belongs to the resolvent set. Then there exists $R_\lambda(T) \in \mathcal{L}(H)$ with

y = R_\lambda(T)(T - \lambda I)(y) = R_\lambda(T)(0) = 0,

which contradicts the definition of an eigenvector; hence eigenvalues belong to the spectrum.
(b) $\lambda \in \sigma_p(T)$ is equivalent to $T - \lambda I$ not being injective. It may happen that $T - \lambda I$ is injective but not surjective, which also implies $\lambda \in \sigma(T)$ (see Example 13.12 (b) below).
Example 13.12 (a) $H = \mathbb{C}^n$, $A \in M(n \times n, \mathbb{C})$. Since in finite dimensional spaces $T \in \mathcal{L}(H)$ is injective if and only if $T$ is surjective, $\sigma(A) = \sigma_p(A)$.
(b) $H = L^2([0, 1])$, $(Tf)(x) = x f(x)$. We have

\sigma_p(T) = \emptyset.

Indeed, suppose $\lambda$ is an eigenvalue and $f \in L^2([0, 1])$ an eigenfunction of $T$, that is, $(T - \lambda I)(f) = 0$; hence $(x - \lambda) f(x) = 0$ a.e. on $[0, 1]$. Since $x - \lambda$ is nonzero a.e., $f = 0$ a.e. on $[0, 1]$. That is, $f = 0$ in $H$, which contradicts the definition of an eigenvector. We have

\mathbb{C} \setminus [0, 1] \subseteq \rho(T).

Suppose $\lambda \notin [0, 1]$. Since $x - \lambda \neq 0$ for all $x \in [0, 1]$, $g(x) = \frac{1}{x - \lambda}$ is a continuous (hence bounded) function on $[0, 1]$. Hence

(R_\lambda f)(x) = \frac{1}{x - \lambda} f(x)

defines a bounded linear operator which is inverse to $T - \lambda I$, since

(T - \lambda I)\left( \frac{1}{x - \lambda} f(x) \right) = (x - \lambda) \frac{1}{x - \lambda} f(x) = f(x).

We have

\sigma(T) = [0, 1].

Suppose to the contrary that there exists $\lambda \in \rho(T) \cap [0, 1]$. Then there exists $R_\lambda \in \mathcal{L}(H)$ with

R_\lambda (T - \lambda I) = I. \qquad (13.10)

By homework 39.5 (a), the norm of the multiplication operator $T_g$ is less than or equal to $\|g\|_\infty$ (the supremum norm of $g$). Choose $f_\varepsilon = \chi_{U_\varepsilon(\lambda)}$, the characteristic function of the $\varepsilon$-neighborhood $U_\varepsilon(\lambda) = (\lambda - \varepsilon, \lambda + \varepsilon)$. Since $\chi_M^2 = \chi_M$,

\|(T - \lambda I) f_\varepsilon\| = \left\| (x - \lambda)\, \chi_{U_\varepsilon(\lambda)}(x)\, f_\varepsilon(x) \right\| \le \sup_{x \in [0,1]} \left| (x - \lambda)\, \chi_{U_\varepsilon(\lambda)}(x) \right| \, \|f_\varepsilon\|.

However,

\sup_{x \in [0,1]} \left| (x - \lambda)\, \chi_{U_\varepsilon(\lambda)}(x) \right| = \sup_{x \in [0,1] \cap U_\varepsilon(\lambda)} |x - \lambda| \le \varepsilon.

This shows

\|(T - \lambda I) f_\varepsilon\| \le \varepsilon \|f_\varepsilon\|.

Inserting $f_\varepsilon$ into (13.10) we obtain

\|f_\varepsilon\| = \|R_\lambda (T - \lambda I) f_\varepsilon\| \le \|R_\lambda\| \, \|(T - \lambda I) f_\varepsilon\| \le \varepsilon \|R_\lambda\| \, \|f_\varepsilon\|,

which implies $\|R_\lambda\| \ge 1/\varepsilon$. This contradicts the boundedness of $R_\lambda$, since $\varepsilon > 0$ was arbitrary.
(b) Properties of the Spectrum
Lemma 13.25 Let $T \in \mathcal{L}(H)$. Then

\rho(T^*) = \overline{\rho(T)}, \qquad \sigma(T^*) = \overline{\sigma(T)} \qquad \text{(complex conjugation)}.

Proof. Suppose that $\lambda \in \rho(T)$. Then there exists $R_\lambda(T) \in \mathcal{L}(H)$ such that

R_\lambda(T)(T - \lambda I) = (T - \lambda I) R_\lambda(T) = I,
\bigl( R_\lambda(T)(T - \lambda I) \bigr)^* = \bigl( (T - \lambda I) R_\lambda(T) \bigr)^* = I,
(T^* - \bar{\lambda} I)\, R_\lambda(T)^* = R_\lambda(T)^*\, (T^* - \bar{\lambda} I) = I.

This shows that $R_{\bar{\lambda}}(T^*) = R_\lambda(T)^*$ is again a bounded linear operator on $H$. Hence $\overline{\rho(T)} \subseteq \rho(T^*)$. Since $*$ is an involution ($T^{**} = T$), the opposite inclusion follows. Since $\sigma(T)$ is the complement of the resolvent set, the claim for the spectrum follows as well.

For $\lambda, \mu \in \rho(T)$, and $S \in \mathcal{L}(H)$ with $\lambda \in \rho(S)$, we have the resolvent identities

R_\lambda(T) - R_\mu(T) = (\lambda - \mu) R_\lambda(T) R_\mu(T) = (\lambda - \mu) R_\mu(T) R_\lambda(T),
R_\lambda(T) - R_\lambda(S) = R_\lambda(T)(S - T) R_\lambda(S).

Proposition 13.26 (a) $\rho(T)$ is open and $\sigma(T)$ is closed.
(b) If $\lambda_0 \in \rho(T)$ and $|\lambda - \lambda_0| < \|R_{\lambda_0}(T)\|^{-1}$, then $\lambda \in \rho(T)$ and

R_\lambda(T) = \sum_{n=0}^{\infty} (\lambda - \lambda_0)^n R_{\lambda_0}(T)^{n+1}.

(c) If $|\lambda| > \|T\|$, then $\lambda \in \rho(T)$ and

R_\lambda(T) = -\sum_{n=0}^{\infty} \lambda^{-n-1} T^n.


Proof. (a) follows from (b).

(b) For brevity, we write $R_0$ in place of $R_{\lambda_0}(T)$. With $q = |\lambda - \lambda_0|\, \|R_0\|$, $q \in (0, 1)$, we have

\sum_{n=0}^{\infty} |\lambda - \lambda_0|^n \|R_0\|^{n+1} = \|R_0\| \sum_{n=0}^{\infty} q^n = \frac{\|R_0\|}{1 - q}, \quad \text{which converges.}

By homework 38.4, $\sum x_n$ converges if $\sum \|x_n\|$ converges. Hence

B = \sum_{n=0}^{\infty} (\lambda - \lambda_0)^n R_0^{n+1}

converges in $\mathcal{L}(H)$ with respect to the operator norm. Moreover,

(T - \lambda I) B = (T - \lambda_0 I) B - (\lambda - \lambda_0) B
= \sum_{n=0}^{\infty} (\lambda - \lambda_0)^n (T - \lambda_0 I) R_0^{n+1} - \sum_{n=0}^{\infty} (\lambda - \lambda_0)^{n+1} R_0^{n+1}
= \sum_{n=0}^{\infty} (\lambda - \lambda_0)^n R_0^{n} - \sum_{n=0}^{\infty} (\lambda - \lambda_0)^{n+1} R_0^{n+1} = (\lambda - \lambda_0)^0 R_0^0 = I.

Similarly, one shows $B(T - \lambda I) = I$. Thus $R_\lambda(T) = B$.

(c) Since $|\lambda| > \|T\|$, the series converges with respect to the operator norm, say

C = -\sum_{n=0}^{\infty} \lambda^{-n-1} T^n.

We have

(T - \lambda I) C = -\sum_{n=0}^{\infty} \lambda^{-n-1} T^{n+1} + \sum_{n=0}^{\infty} \lambda^{-n} T^n = \lambda^0 T^0 = I.

Similarly, $C(T - \lambda I) = I$; hence $R_\lambda(T) = C$.
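In finite dimensions the Neumann series of part (c) can be summed numerically and compared against the matrix inverse; a hedged sketch (the value of $\lambda$ and the number of terms are arbitrary choices, picked so the series converges fast):

```python
import numpy as np

rng = np.random.default_rng(5)
T = rng.normal(size=(4, 4))
lam = 2.0 * np.linalg.norm(T, 2) + 1.0     # |lambda| > ||T||, safely so

# R_lambda(T) = -sum_{n>=0} lambda^{-n-1} T^n  (Proposition 13.26 (c))
R = -sum(lam ** (-(n + 1)) * np.linalg.matrix_power(T, n) for n in range(200))

assert np.allclose(R, np.linalg.inv(T - lam * np.eye(4)), atol=1e-10)
assert np.allclose(R @ (T - lam * np.eye(4)), np.eye(4), atol=1e-8)
```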

Remarks 13.4 (a) By (b), $R_\lambda(T)$ is a holomorphic (i.e. complex differentiable) function in the variable $\lambda$ with values in $\mathcal{L}(H)$. One can use this to show that the spectrum is non-empty, $\sigma(T) \neq \emptyset$.
(b) If $\|T\| < 1$, then $T - I$ is invertible with inverse $-\sum_{n=0}^{\infty} T^n$.
(c) Proposition 13.26 (c) means: if $\lambda \in \sigma(T)$ then $|\lambda| \le \|T\|$. However, there is, in general, a smaller disc around 0 which contains the spectrum. By definition, the spectral radius $r(T)$ of $T$ is the smallest non-negative number such that the spectrum is completely contained in the disc around 0 with radius $r(T)$:

r(T) = \sup\{ |\lambda| \mid \lambda \in \sigma(T) \}.

(Figure: the spectrum $\sigma(T)$ contained in the disc of radius $r(T)$ around 0.)


(d) $\lambda \in \sigma(T)$ implies $\lambda^n \in \sigma(T^n)$ for all non-negative integers $n$. Indeed, suppose $\lambda^n \in \rho(T^n)$, that is, $B(T^n - \lambda^n I) = (T^n - \lambda^n I)B = I$ for some bounded $B$. Writing

T^n - \lambda^n I = (T - \lambda I)\, C, \qquad C = \sum_{k=0}^{n-1} T^k \lambda^{n-1-k},

and noting that $C$ commutes with $T$, we obtain $(T - \lambda I)(CB) = (BC)(T - \lambda I) = I$; thus $\lambda \in \rho(T)$, a contradiction.
We shall refine the above statement and give a better upper bound for $\sup\{ |\lambda| \mid \lambda \in \sigma(T) \}$ than $\|T\|$.
Proposition 13.27 Let $T \in \mathcal{L}(H)$ be a bounded linear operator. Then the spectral radius of $T$ is

r(T) = \lim_{n \to \infty} \|T^n\|^{1/n}. \qquad (13.11)

The proof is in the appendix.
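Formula (13.11) is easy to probe numerically. For a non-normal matrix the operator norm can greatly exceed the spectral radius, yet $\|T^n\|^{1/n}$ still converges to the largest eigenvalue modulus (illustrative sketch with a hand-picked matrix):

```python
import numpy as np

T = np.array([[0.5, 10.0],
              [0.0,  0.4]])                 # non-normal: ||T|| >> r(T)
r = float(np.max(np.abs(np.linalg.eigvals(T))))   # spectral radius = 0.5

n = 200
approx = np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1.0 / n)

assert abs(approx - r) < 0.05               # ||T^n||^(1/n) -> r(T)
assert np.linalg.norm(T, 2) > r             # here ||T|| is much larger than r(T)
```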

13.2.6 The Spectrum of Self-Adjoint Operators

Proposition 13.28 Let $T = T^*$ be a self-adjoint operator in $\mathcal{L}(H)$. Then $\lambda \in \rho(T)$ if and only if there exists $C > 0$ such that

\|(T - \lambda I)x\| \ge C \|x\| \qquad \text{for all } x \in H.

Proof. Suppose that $\lambda \in \rho(T)$. Then there exists a (non-zero) bounded operator $R_\lambda(T)$ such that

\|x\| = \|R_\lambda(T)(T - \lambda I)x\| \le \|R_\lambda(T)\| \, \|(T - \lambda I)x\|.

Hence

\|(T - \lambda I)x\| \ge \frac{1}{\|R_\lambda(T)\|} \|x\|, \qquad x \in H.

We can choose $C = 1/\|R_\lambda(T)\|$, and the condition of the proposition is satisfied.

Suppose now the condition is satisfied. We prove the other direction in three steps, i.e. we show that $T - \lambda I$ has a bounded inverse operator which is defined on the whole space $H$.
Step 1. $T - \lambda I$ is injective. Suppose that $(T - \lambda I)x_1 = (T - \lambda I)x_2$. Then

0 = \|(T - \lambda I)(x_1 - x_2)\| \ge C \|x_1 - x_2\|,

and $\|x_1 - x_2\| = 0$ follows; that is, $x_1 = x_2$. Hence $T - \lambda I$ is injective.
Step 2. $H_1 = (T - \lambda I)H$, the range of $T - \lambda I$, is closed. Suppose that $y_n = (T - \lambda I)x_n$, $x_n \in H$, converges to some $y \in H$. We want to show that $y \in H_1$. Clearly $(y_n)$ is a Cauchy sequence, such that $\|y_m - y_n\| \to 0$ as $m, n \to \infty$. By assumption,

\|y_m - y_n\| = \|(T - \lambda I)(x_m - x_n)\| \ge C \|x_m - x_n\|.

Thus $(x_n)$ is a Cauchy sequence in $H$. Since $H$ is complete, $x_n \to x$ for some $x \in H$. Since $T - \lambda I$ is continuous,

y_n = (T - \lambda I)x_n \xrightarrow[n\to\infty]{} (T - \lambda I)x.

Hence $y = (T - \lambda I)x$, and $H_1$ is a closed subspace.
Step 3. $H_1 = H$. By Riesz's first theorem, $H = H_1 \oplus H_1^{\perp}$. We have to show that $H_1^{\perp} = \{0\}$. Let $u \in H_1^{\perp}$; that is, since $T = T^*$,

0 = \langle (T - \lambda I)x , u \rangle = \langle x , (T - \bar{\lambda} I)u \rangle \qquad \text{for all } x \in H.

This shows $(T - \bar{\lambda} I)u = 0$; hence $T(u) = \bar{\lambda} u$. This implies

\langle T(u) , u \rangle = \bar{\lambda} \langle u , u \rangle.

However, $T = T^*$ implies that the left side is real, by Remark 13.2 (d). Hence $\lambda = \bar{\lambda}$ is real. We conclude $(T - \lambda I)u = 0$. By injectivity of $T - \lambda I$, $u = 0$. That is, $H_1 = H$.
We have shown that there exists a linear operator $S = (T - \lambda I)^{-1}$ which is inverse to $T - \lambda I$ and defined on the whole space $H$. Since

\|y\| = \|(T - \lambda I)S(y)\| \ge C \|S(y)\|,

$S$ is bounded with $\|S\| \le 1/C$. Hence $S = R_\lambda(T)$.
Note that for any bounded real function $f(x, y)$ we have

\sup_{x,y} f(x, y) = \sup_x \bigl( \sup_y f(x, y) \bigr) = \sup_y \bigl( \sup_x f(x, y) \bigr).

In particular, $\|x\| = \sup_{\|y\| \le 1} |\langle x , y \rangle|$, since $y = x/\|x\|$ yields the supremum and the Cauchy-Schwarz inequality gives the upper bound. Further, $\|T(x)\| = \sup_{\|y\| \le 1} |\langle T(x) , y \rangle|$, such that

\|T\| = \sup_{\|x\| \le 1} \sup_{\|y\| \le 1} |\langle T(x) , y \rangle| = \sup_{\|x\| \le 1,\ \|y\| \le 1} |\langle T(x) , y \rangle| = \sup_{\|y\| \le 1} \sup_{\|x\| \le 1} |\langle T(x) , y \rangle|.

In case of self-adjoint operators we can generalize this.


Proposition 13.29 Let $T = T^* \in \mathcal{L}(H)$. Then we have

\|T\| = \sup_{\|x\| \le 1} |\langle T(x) , x \rangle|. \qquad (13.12)

Proof. Let $C = \sup_{\|x\| \le 1} |\langle T(x) , x \rangle|$. By the Cauchy-Schwarz inequality, $|\langle T(x) , x \rangle| \le \|T\|\,\|x\|^2$, such that $C \le \|T\|$.
For any real $\alpha > 0$ we have

\|T(x)\|^2 = \langle T(x) , T(x) \rangle = \langle T^2(x) , x \rangle
= \tfrac{1}{4}\Bigl( \bigl\langle T(\alpha x + \alpha^{-1} T(x)) \,,\, \alpha x + \alpha^{-1} T(x) \bigr\rangle - \bigl\langle T(\alpha x - \alpha^{-1} T(x)) \,,\, \alpha x - \alpha^{-1} T(x) \bigr\rangle \Bigr)
\le \tfrac{1}{4}\Bigl( C \bigl\| \alpha x + \alpha^{-1} T(x) \bigr\|^2 + C \bigl\| \alpha x - \alpha^{-1} T(x) \bigr\|^2 \Bigr)
\underset{\text{P.I.}}{=} \tfrac{C}{4}\Bigl( 2\|\alpha x\|^2 + 2\|\alpha^{-1} T(x)\|^2 \Bigr) = \tfrac{C}{2}\Bigl( \alpha^2 \|x\|^2 + \alpha^{-2} \|T(x)\|^2 \Bigr).

Inserting $\alpha^2 = \|T(x)\| / \|x\|$ (for $T(x) \neq 0$) we obtain

\|T(x)\|^2 \le \frac{C}{2}\bigl( \|T(x)\|\,\|x\| + \|x\|\,\|T(x)\| \bigr) = C\,\|T(x)\|\,\|x\|,

which implies $\|T(x)\| \le C \|x\|$. Thus $\|T\| = C$.


Let $m = \inf_{\|x\|=1} \langle T(x), x\rangle$ and $M = \sup_{\|x\|=1} \langle T(x), x\rangle$ denote the lower and the upper bound of $T$. Then we have
$$\sup_{\|x\|\le 1} |\langle T(x), x\rangle| = \max\{|m|,\, M\} = \|T\|,$$
and
$$m\,\|x\|^2 \le \langle T(x), x\rangle \le M\,\|x\|^2 \quad\text{for all } x \in H.$$
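As a finite-dimensional sanity check (not part of the text), the identity $\|T\| = \max\{|m|, M\}$ can be verified numerically for a real symmetric matrix, where $m$ and $M$ are the extreme eigenvalues; the matrix size and random seed below are arbitrary choices.

```python
import numpy as np

# Hypothetical symmetric matrix standing in for T = T* on H = R^3.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
T = (A + A.T) / 2  # symmetrize

eigs = np.linalg.eigvalsh(T)   # real eigenvalues of a symmetric matrix
m, M = eigs.min(), eigs.max()  # lower and upper bound of T

# The operator norm (largest singular value) equals max{|m|, |M|} for T = T*.
op_norm = np.linalg.norm(T, 2)
print(op_norm, max(abs(m), abs(M)))  # the two numbers agree

# Sample check of m||x||^2 <= <Tx, x> <= M||x||^2 on random vectors.
for _ in range(100):
    x = rng.standard_normal(3)
    q = x @ T @ x
    assert m * (x @ x) - 1e-9 <= q <= M * (x @ x) + 1e-9
```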

Corollary 13.30 Let $T = T^* \in \mathcal L(H)$ be a self-adjoint operator. Then
$$\sigma(T) \subseteq [m, M].$$
Proof. Suppose that $\lambda_0 \notin [m, M]$. Then
$$C := \inf_{\lambda \in [m,M]} |\lambda - \lambda_0| > 0.$$
Since $m = \inf_{\|x\|=1} \langle T(x), x\rangle$ and $M = \sup_{\|x\|=1} \langle T(x), x\rangle$, we have for $\|x\| = 1$, by the Cauchy–Schwarz inequality and since $\langle T(x), x\rangle \in [m, M]$:
$$\|(T-\lambda_0 I)x\| = \|x\|\,\|(T-\lambda_0 I)x\| \ge |\langle (T-\lambda_0 I)x,\, x\rangle| = \bigl|\langle T(x), x\rangle - \lambda_0\|x\|^2\bigr| \ge C.$$
This implies
$$\|(T-\lambda_0 I)x\| \ge C\,\|x\| \quad\text{for all } x \in H.$$
By Proposition 13.28, $\lambda_0 \in \rho(T)$.
Example 13.13 (a) Let $H = L^2[0,1]$, $g \in C[0,1]$ a real-valued function, and $(T_g f)(t) = g(t)f(t)$. Let $m = \inf_{t\in[0,1]} g(t)$, $M = \sup_{t\in[0,1]} g(t)$. One proves that $m$ and $M$ are the lower and upper bounds of $T_g$, such that $\sigma(T_g) \subseteq [m, M]$. Since $g$ is continuous, by the intermediate value theorem, $\sigma(T_g) = [m, M]$.
(b) Let $T = T^* \in \mathcal L(H)$ be self-adjoint. Then all eigenvalues of $T$ are real, and eigenvectors corresponding to different eigenvalues are orthogonal to each other. Proof. The first statement is clear from Corollary 13.30. Suppose that $T(x) = \lambda x$ and $T(y) = \mu y$ with $\lambda \ne \mu$ (both real). Then
$$\lambda\langle x, y\rangle = \langle T(x), y\rangle = \langle x, T(y)\rangle = \langle x, \mu y\rangle = \mu\langle x, y\rangle.$$
Since $\lambda \ne \mu$, $\langle x, y\rangle = 0$.
The statement about orthogonality holds for arbitrary normal operators.


Appendix: Compact Self-Adjoint Operator in Hilbert Space


Proof of Proposition 13.27. From the theory of power series, Theorem 2.34, we know that the series
$$\sum_{n=0}^{\infty} \|T^n\|\, z^n \tag{13.13}$$
converges if $|z| < R$ and diverges if $|z| > R$, where
$$R = \frac{1}{\varlimsup_{n\to\infty} \sqrt[n]{\|T^n\|}}. \tag{13.14}$$
Inserting $z = 1/\lambda$ and using homework 38.4, the series $\sum_{n=0}^{\infty} \lambda^{-n-1} T^n$ diverges if $|\lambda| < \varlimsup_n \sqrt[n]{\|T^n\|}$ (and converges if $|\lambda| > \varlimsup_n \sqrt[n]{\|T^n\|}$). The reason for the divergence of the power series is that the spectrum $\sigma(T)$ and the circle with radius $\varlimsup_n \sqrt[n]{\|T^n\|}$ have points in common; hence
$$r(T) = \varlimsup_{n\to\infty} \sqrt[n]{\|T^n\|}.$$
On the other hand, by Remark 13.4 (d), $\lambda \in \sigma(T)$ implies $\lambda^n \in \sigma(T^n)$; hence, by Remark 13.4 (c),
$$|\lambda^n| \le \|T^n\| \quad\Longrightarrow\quad |\lambda| \le \sqrt[n]{\|T^n\|}.$$
Taking the supremum over all $\lambda \in \sigma(T)$ on the left and the $\varliminf$ over all $n$ on the right, we have
$$r(T) \le \varliminf_{n\to\infty} \sqrt[n]{\|T^n\|} \le \varlimsup_{n\to\infty} \sqrt[n]{\|T^n\|} = r(T).$$
Hence the sequence $\sqrt[n]{\|T^n\|}$ converges to $r(T)$ as $n$ tends to $\infty$.

Compact operators generalize finite rank operators. Integral operators on compact sets are compact.
Definition 13.16 A linear operator $T \in \mathcal L(H)$ is called compact if the closure $\overline{T(U_1)}$ of the image of the unit ball $U_1 = \{x \mid \|x\| \le 1\}$ is compact in $H$. In other words, for every sequence $(x_n)$, $x_n \in U_1$, there exists a subsequence $(x_{n_k})$ such that $T(x_{n_k})$ converges.
Proposition 13.31 For $T \in \mathcal L(H)$ the following are equivalent:
(a) $T$ is compact.
(b) $T^*$ is compact.
(c) For all sequences $(x_n)$ converging weakly to $x$, that is, $\langle x_n, y\rangle \to \langle x, y\rangle$ for all $y \in H$, we have $T(x_n) \to T(x)$ in norm.
(d) There exists a sequence $(T_n)$ of operators of finite rank such that $\|T - T_n\| \to 0$.


Definition 13.17 Let $T$ be an operator on $H$ and $H_1$ a closed subspace of $H$. We call $H_1$ a reducing subspace if both $H_1$ and $H_1^{\perp}$ are $T$-invariant, i.e. $T(H_1) \subseteq H_1$ and $T(H_1^{\perp}) \subseteq H_1^{\perp}$.
Proposition 13.32 Let $T \in \mathcal L(H)$ be normal.
(a) The eigenspace $\ker(T-\lambda I)$ is a reducing subspace for $T$, and $\ker(T-\lambda I) = \ker(T^*-\bar\lambda I)$.
(b) If $\lambda, \mu$ are distinct eigenvalues of $T$, then $\ker(T-\lambda I) \perp \ker(T-\mu I)$.


Proof. (a) Since $T$ is normal, so is $T - \lambda I$. Hence $\|(T-\lambda I)(x)\| = \|(T^*-\bar\lambda I)(x)\|$. Thus,
$\ker(T-\lambda I) = \ker(T^*-\bar\lambda I)$. In particular, $T^*(x) = \bar\lambda x$ if $x \in \ker(T-\lambda I)$.
We show invariance. Let $x \in \ker(T-\lambda I)$; then $T(x) = \lambda x \in \ker(T-\lambda I)$. Similarly,
$x \in \ker(T-\lambda I)^{\perp}$ and $y \in \ker(T-\lambda I)$ imply
$$\langle T(x),\, y\rangle = \langle x,\, T^*(y)\rangle = \langle x,\, \bar\lambda y\rangle = \lambda\,\langle x, y\rangle = 0.$$
Hence, $\ker(T-\lambda I)^{\perp}$ is $T$-invariant, too.
(b) Let $T(x) = \lambda x$ and $T(y) = \mu y$. Then (a) and $T^*(y) = \bar\mu y$ imply
$$\lambda\,\langle x, y\rangle = \langle T(x),\, y\rangle = \langle x,\, T^*(y)\rangle = \langle x,\, \bar\mu y\rangle = \mu\,\langle x, y\rangle.$$
Thus $(\lambda - \mu)\,\langle x, y\rangle = 0$; since $\lambda \ne \mu$, $x \perp y$.

Theorem 13.33 (Spectral Theorem for Compact Self-Adjoint Operators) Let $H$ be an infinite-dimensional separable Hilbert space and $T \in \mathcal L(H)$ compact and self-adjoint.
Then there exist a real sequence $(\lambda_n)$ with $\lambda_n \to 0$ and a CNOS $\{e_n \mid n \in \mathbb N\} \cup \{f_k \mid k \in N_1\}$, $N_1 \subseteq \mathbb N$, such that
$$T(e_n) = \lambda_n e_n,\quad n \in \mathbb N, \qquad T(f_k) = 0,\quad k \in N_1.$$
Moreover,
$$T(x) = \sum_{n=1}^{\infty} \lambda_n\,\langle x, e_n\rangle\, e_n, \qquad x \in H. \tag{13.15}$$

Remarks 13.5 (a) Since $\{e_n\} \cup \{f_k\}$ is a CNOS, any $x \in H$ can be written as its Fourier series
$$x = \sum_{n=1}^{\infty} \langle x, e_n\rangle\, e_n + \sum_{k} \langle x, f_k\rangle\, f_k.$$
Applying $T$ and using $T(e_n) = \lambda_n e_n$ we have
$$T(x) = \sum_{n=1}^{\infty} \langle x, e_n\rangle\, \lambda_n e_n + \sum_{k} \langle x, f_k\rangle\, \underbrace{T(f_k)}_{=0},$$
which establishes (13.15). The main point is the existence of a CNOS of eigenvectors $\{e_n\} \cup \{f_k\}$.
(b) In case $H = \mathbb C^n$ ($\mathbb R^n$) the theorem says that any hermitian (symmetric) matrix $A$ is diagonalizable with only real eigenvalues.
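A finite-dimensional illustration of remark (b) (an assumption-free numerical sketch, with $H = \mathbb R^4$ and an arbitrary random symmetric matrix in the role of $T$): `eigh` returns real eigenvalues and an orthonormal basis of eigenvectors, i.e. the CNOS of the theorem, and the expansion (13.15) reproduces $T(x)$.

```python
import numpy as np

# A symmetric matrix plays the role of a compact self-adjoint operator.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
T = (A + A.T) / 2

lam, E = np.linalg.eigh(T)              # T e_n = lambda_n e_n; columns of E
assert np.allclose(E.T @ E, np.eye(4))  # eigenvectors are orthonormal

# T(x) = sum_n lambda_n <x, e_n> e_n, the expansion (13.15)
x = rng.standard_normal(4)
Tx = sum(lam[n] * (x @ E[:, n]) * E[:, n] for n in range(4))
assert np.allclose(Tx, T @ x)
print(lam)  # real eigenvalues, in ascending order
```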


Chapter 14
Complex Analysis
Here are some useful textbooks on Complex Analysis: [FL88] (in German), [Kno78] (in German), [Nee97], [Ruh83] (in German), [Hen88].
The main part of this chapter deals with holomorphic functions, which is another name for functions that are complex differentiable in an open set. On the one hand, we are already familiar with a huge class of holomorphic functions: polynomials, the exponential function, the sine and cosine functions. On the other hand, holomorphic functions possess quite amazing properties, completely unusual from the viewpoint of real analysis. The properties are very strong. For example, it is easy to construct a real function which is 17 times differentiable but not 18 times. A complex differentiable function (in a small region) is automatically infinitely often differentiable.
Good references are Ahlfors [Ahl78], a little harder is Conway [Con78], easier is Howie
[How03].

14.1 Holomorphic Functions


14.1.1 Complex Differentiation
We start with some notations.
$U_r = \{z \mid |z| < r\}$ — open ball of radius $r$ around $0$
$U_R(a) = \{z \mid |z - a| < R\}$ — open ball of radius $R$ around $a$
$\overline{U_r} = \{z \mid |z| \le r\}$ — closed ball of radius $r$ around $0$
$\dot U_r = \{z \mid 0 < |z| < r\}$ — punctured ball of radius $r$
$S_r = \{z \mid |z| = r\}$ — circle of radius $r$ around $0$

Definition 14.1 Let $U \subseteq \mathbb C$ be an open subset of $\mathbb C$ and $f: U \to \mathbb C$ a complex function.
(a) If $z_0 \in U$ and the limit
$$\lim_{z\to z_0} \frac{f(z) - f(z_0)}{z - z_0} =: f'(z_0)$$
exists, we call $f$ complex differentiable at $z_0$ and $f'(z_0)$ the derivative of $f$ at $z_0$.

(b) If $f$ is complex differentiable at every $z_0 \in U$, we say that $f$ is holomorphic in $U$. We call $f'$ the derivative of $f$ on $U$.
(c) $f$ is holomorphic at $z_0$ if it is complex differentiable in a certain neighborhood of $z_0$.
To be quite explicit, $f'(z_0)$ exists if for every $\varepsilon > 0$ there exists some $\delta > 0$ such that $z \in U_\delta(z_0)$ implies
$$\left| \frac{f(z) - f(z_0)}{z - z_0} - f'(z_0) \right| < \varepsilon.$$

Remarks 14.1 (a) Differentiability of $f$ at $z_0$ forces $f$ to be continuous at $z_0$. Indeed, $f$ is differentiable at $z_0$ with derivative $f'(z_0)$ if and only if there exists a function $r(z, z_0)$ such that
$$f(z) = f(z_0) + f'(z_0)(z - z_0) + (z - z_0)\,r(z, z_0),$$
where $\lim_{z\to z_0} r(z, z_0) = 0$. In particular, taking the limit $z \to z_0$ in the above equation we get
$$\lim_{z\to z_0} f(z) = f(z_0),$$
which proves continuity at $z_0$.
Complex conjugation is a uniformly continuous function on $\mathbb C$ since $|\bar z - \bar z_0| = |z - z_0|$ for all $z, z_0 \in \mathbb C$.
(b) The derivative satisfies the well-known sum, product, and quotient rules: if both $f$ and $g$ are holomorphic in $U$, so are $f+g$, $fg$, and $f/g$ (provided $g \ne 0$ in $U$), and we have
$$(f+g)' = f' + g', \qquad (fg)' = f'g + fg', \qquad \left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}.$$
Also, the chain rule holds: if $f: U \to V$ and $g: V \to \mathbb C$ are holomorphic, so is $g \circ f$, and
$$(g \circ f)'(z) = g'(f(z))\, f'(z).$$
The proofs are exactly the same as in the real case. Since the constant functions $f(z) = c$ and the identity $f(z) = z$ are holomorphic in $\mathbb C$, so is every polynomial with complex coefficients and, moreover, every rational function (quotient of two polynomials) $f: U \to \mathbb C$, provided the denominator has no zeros in $U$. So we already know a large class of holomorphic functions. Another, bigger class are the convergent power series.
Example 14.1 $f(z) = |z|^2$ is complex differentiable at $0$ with $f'(0) = 0$; $f$ is not differentiable at $z_0 = 1$. Indeed,
$$\lim_{h\to 0} \frac{f(0+h) - f(0)}{h} = \lim_{h\to 0} \frac{|h|^2}{h} = \lim_{h\to 0} \bar h = 0.$$
On the other hand, let $\varepsilon \in \mathbb R$; then
$$\lim_{\varepsilon\to 0} \frac{|1+\varepsilon|^2 - 1}{\varepsilon} = \lim_{\varepsilon\to 0} \frac{2\varepsilon + \varepsilon^2}{\varepsilon} = 2,$$
whereas
$$\lim_{\varepsilon\to 0} \frac{|1+i\varepsilon|^2 - 1}{i\varepsilon} = \lim_{\varepsilon\to 0} \frac{1 + \varepsilon^2 - 1}{i\varepsilon} = 0.$$
This shows that $f'(1)$ does not exist.
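The directional limits of the example can be checked numerically (a small sketch, not part of the text; the step size is an arbitrary choice): at $z_0 = 0$ the difference quotient of $|z|^2$ tends to $0$, while at $z_0 = 1$ the horizontal and vertical approaches disagree.

```python
# Difference quotients for f(z) = |z|^2.
f = lambda z: abs(z) ** 2

def quotient(z0, h):
    return (f(z0 + h) - f(z0)) / h

eps = 1e-6
print(quotient(0, eps), quotient(0, 1j * eps))  # both ~ 0
print(quotient(1, eps), quotient(1, 1j * eps))  # ~ 2 versus ~ 0: f'(1) fails
```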


14.1.2 Power Series


Recall from Subsection 2.3.9 that a power series $\sum c_n z^n$ has a radius of convergence
$$R = \frac{1}{\varlimsup_{n\to\infty} \sqrt[n]{|c_n|}}.$$
That is, the series converges absolutely for all $z$ with $|z| < R$; the series diverges for $|z| > R$; the behaviour for $|z| = R$ depends on the $(c_n)$. Moreover, it converges uniformly on every closed ball $\overline{U_r}$ with $0 < r < R$, see Proposition 6.4.
We already know that a real power series can be differentiated term by term, see Corollary 6.11. We will see that power series are holomorphic inside their radius of convergence.
Proposition 14.1 Let $a \in \mathbb C$ and
$$f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n \tag{14.1}$$
be a power series with radius of convergence $R$. Then $f: U_R(a) \to \mathbb C$ is holomorphic and the derivative is
$$f'(z) = \sum_{n=1}^{\infty} n\, c_n (z-a)^{n-1}. \tag{14.2}$$

Proof. If the series (14.1) converges in $U_R(a)$, the root test shows that the series (14.2) also converges there. Without loss of generality, take $a = 0$. Denote the sum of the series (14.2) by $g(z)$, fix $w \in U_R(0)$ and choose $r$ so that $|w| < r < R$. If $z \ne w$, we have
$$\frac{f(z) - f(w)}{z - w} - g(w) = \sum_{n=0}^{\infty} c_n \left[ \frac{z^n - w^n}{z - w} - n w^{n-1} \right].$$
The expression in the brackets is $0$ if $n = 1$. For $n \ge 2$ it is (by direct computation of the following term)
$$(z-w) \sum_{k=1}^{n-1} k\, w^{k-1} z^{n-k-1} = \sum_{k=1}^{n-1} \bigl( k\, w^{k-1} z^{n-k} - k\, w^{k} z^{n-k-1} \bigr), \tag{14.3}$$
which gives a telescoping sum if we shift $k \to k+1$ in the first summand. If $|z| < r$, the absolute value of the sum (14.3) is less than
$$|z-w|\, \frac{n(n-1)}{2}\, r^{n-2},$$
so
$$\left| \frac{f(z) - f(w)}{z - w} - g(w) \right| \le |z-w| \sum_{n=2}^{\infty} n^2\, |c_n|\, r^{n-2}. \tag{14.4}$$
Since $r < R$, the last series converges. Hence the left side of (14.4) tends to $0$ as $z \to w$. This says that $f'(w) = g(w)$, and completes the proof.


Corollary 14.2 Since $f'(z)$ is again a power series with the same radius of convergence $R$, the proposition can be applied to $f'(z)$. It follows that $f$ has derivatives of all orders and that each derivative has a power series expansion around $a$:
$$f^{(k)}(z) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\, c_n (z-a)^{n-k}. \tag{14.5}$$
Inserting $z = a$ implies
$$f^{(k)}(a) = k!\, c_k, \qquad k = 0, 1, \dots.$$
This shows that the coefficients $c_n$ in the power series expansion $f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n$ of $f$ with midpoint $a$ are unique.

Example 14.2 The exponential function $e^z = \sum_{n=0}^{\infty} \dfrac{z^n}{n!}$ is holomorphic on the whole complex plane with $(e^z)' = e^z$; similarly, the trigonometric functions $\sin z$ and $\cos z$ are holomorphic in $\mathbb C$ since
$$\sin z = \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n+1}}{(2n+1)!}, \qquad \cos z = \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n}}{(2n)!}.$$
We have $(\sin z)' = \cos z$ and $(\cos z)' = -\sin z$.
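Partial sums of the sine series can be compared with the built-in complex sine to illustrate that the series defines $\sin z$ on all of $\mathbb C$ (a sketch; the sample point and number of terms are arbitrary choices):

```python
import cmath
from math import factorial

def sin_series(z, terms=25):
    # Partial sum of sin z = sum (-1)^n z^(2n+1) / (2n+1)!
    return sum((-1) ** n * z ** (2 * n + 1) / factorial(2 * n + 1)
               for n in range(terms))

z = 1 + 2j
print(sin_series(z), cmath.sin(z))  # agree to machine precision
```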

Definition 14.2 A complex function which is defined on C and which is holomorphic on the
entire complex plane is called an entire function.

14.1.3 CauchyRiemann Equations


Let us identify the complex field $\mathbb C$ with the two-dimensional real plane $\mathbb R^2$ via $z = x + iy$; that is, every complex number $z$ corresponds to an ordered pair $(x, y)$ of real numbers. In this way, a complex function $w = f(z)$ corresponds to a function from $U$ to $\mathbb R^2$, where $U \subseteq \mathbb C$ is open. We have $w = u + iv$, where $u = u(x,y)$ and $v = v(x,y)$ are the real and the imaginary parts of the function $f$; $u = \operatorname{Re} w$ and $v = \operatorname{Im} w$. Problem: What is the relation between complex differentiability and the differentiability of $f$ as a function from $\mathbb R^2$ to $\mathbb R^2$?
Proposition 14.3 Let
$$f: U \to \mathbb C, \qquad U \subseteq \mathbb C \text{ open}, \qquad a \in U$$
be a function. Then the following are equivalent:
(a) $f$ is complex differentiable at $a$.
(b) $f(x,y) = u(x,y) + iv(x,y)$ is real differentiable at $a$ as a function $f: U \subseteq \mathbb R^2 \to \mathbb R^2$, and the Cauchy–Riemann equations are satisfied at $a$:
$$\frac{\partial u}{\partial x}(a) = \frac{\partial v}{\partial y}(a), \qquad \frac{\partial u}{\partial y}(a) = -\frac{\partial v}{\partial x}(a); \tag{14.6}$$
in short, $u_x = v_y$, $u_y = -v_x$.


In this case,
$$f'(a) = u_x + i v_x = v_y - i u_y.$$
Proof. (a) $\Rightarrow$ (b): Suppose that $z = h + ik$ is a complex number such that $a + z \in U$; put $f'(a) = b_1 + ib_2$. By assumption,
$$\lim_{z\to 0} \frac{|f(a+z) - f(a) - z f'(a)|}{|z|} = 0.$$
We shall write this in real form with real variables $h$ and $k$. Note that
$$z f'(a) = (h+ik)(b_1+ib_2) = h b_1 - k b_2 + i(h b_2 + k b_1) = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix} \begin{pmatrix} h \\ k \end{pmatrix}.$$
This implies, with the identification $z = (h, k)$,
$$\lim_{z\to 0} \frac{\left\| f(a+z) - f(a) - \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix} \begin{pmatrix} h \\ k \end{pmatrix} \right\|}{|z|} = 0.$$
That is (see Subsection 7.2), $f$ is real differentiable at $a$ with the Jacobian matrix
$$f'(a) = Df(a) = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}. \tag{14.7}$$
By Proposition 7.6, the Jacobian matrix is exactly the matrix of the partial derivatives, that is,
$$Df(a) = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}.$$
Comparing this with (14.7), we obtain $u_x(a) = v_y(a) = \operatorname{Re} f'(a)$ and $u_y(a) = -v_x(a) = -\operatorname{Im} f'(a)$. This completes the proof of the first direction.
(b) $\Rightarrow$ (a). Since $f = (u, v)$ is differentiable at $a \in U$ as a real function, there exists a linear mapping $Df(a) \in \mathcal L(\mathbb R^2)$ such that
$$\lim_{(h,k)\to 0} \frac{\left\| f(a + (h,k)) - f(a) - Df(a)\begin{pmatrix} h \\ k \end{pmatrix} \right\|}{\|(h,k)\|} = 0.$$

By Proposition 7.6,
$$Df(a) = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}.$$
The Cauchy–Riemann equations show that $Df(a)$ takes the form
$$Df(a) = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix},$$
where $u_x = b_1$ and $v_x = b_2$. Writing
$$Df(a)\begin{pmatrix} h \\ k \end{pmatrix} = \begin{pmatrix} h b_1 - k b_2 \\ h b_2 + k b_1 \end{pmatrix} = z(b_1 + ib_2)$$
in complex form with $z = h + ik$ shows that $f$ is complex differentiable at $a$ with $f'(a) = b_1 + ib_2$.

Example 14.3 (a) We already know that $f(z) = z^2$ is complex differentiable. Hence, the Cauchy–Riemann equations must be fulfilled. From
$$f(z) = z^2 = (x+iy)^2 = x^2 - y^2 + 2ixy, \qquad u(x,y) = x^2 - y^2, \qquad v(x,y) = 2xy,$$
we conclude
$$u_x = 2x, \qquad u_y = -2y, \qquad v_x = 2y, \qquad v_y = 2x.$$
The Cauchy–Riemann equations are satisfied.
(b) $f(z) = |z|^2$. Since $f(z) = x^2 + y^2$, $u(x,y) = x^2 + y^2$, $v(x,y) = 0$. The Cauchy–Riemann equations yield $u_x = 2x = 0 = v_y$ and $u_y = 2y = 0 = -v_x$, such that $z = 0$ is the only solution of the CRE; $z = 0$ is the only point where $f$ is differentiable.
The function $f(z) = \bar z$ is nowhere differentiable since $u(x,y) = x$, $v(x,y) = -y$; thus
$$1 = u_x \ne v_y = -1.$$
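The Cauchy–Riemann equations for $f(z) = z^2$ can also be checked by finite differences (a sketch; the sample point and step size are arbitrary choices):

```python
# u = x^2 - y^2 and v = 2xy are the real and imaginary parts of z^2.
u = lambda x, y: x * x - y * y
v = lambda x, y: 2 * x * y

x0, y0, h = 1.3, -0.7, 1e-6
ux = (u(x0 + h, y0) - u(x0 - h, y0)) / (2 * h)  # central differences
uy = (u(x0, y0 + h) - u(x0, y0 - h)) / (2 * h)
vx = (v(x0 + h, y0) - v(x0 - h, y0)) / (2 * h)
vy = (v(x0, y0 + h) - v(x0, y0 - h)) / (2 * h)

print(ux, vy)   # u_x = v_y   (~ 2.6 = 2*x0)
print(uy, -vx)  # u_y = -v_x  (~ 1.4 = -2*y0)
```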

A function $f: U \to \mathbb C$, $U \subseteq \mathbb C$ open, is called locally constant in $U$ if for every point $a \in U$ there exists a ball $V$ with $a \in V \subseteq U$ such that $f$ is constant on $V$. Clearly, $f$ is constant on every connectedness component of $U$. In fact, one can define $U$ to be connected if every locally constant function $f: U \to \mathbb C$ is constant.

Corollary 14.4 Let $U \subseteq \mathbb C$ be open and $f: U \to \mathbb C$ a holomorphic function on $U$.
(a) If $f'(z) = 0$ for all $z \in U$, then $f$ is locally constant in $U$.
(b) If $f$ takes real values only, then $f$ is locally constant.
(c) If $f$ has a continuous second derivative, then $u = \operatorname{Re} f$ and $v = \operatorname{Im} f$ are harmonic functions, i.e. they satisfy the Laplace equation $\Delta u = u_{xx} + u_{yy} = 0$ and $\Delta v = 0$.
Proof. (a) Since $f'(z) = 0$ for all $z \in U$, the Cauchy–Riemann equations imply $u_x = u_y = v_x = v_y = 0$ in $U$. From real analysis it is known that $u$ and $v$ are then locally constant in $U$ (apply Corollary 7.12 with $\operatorname{grad} f(a+x) = 0$).
(b) Since $f$ takes only real values, $v(x,y) = 0$ for all $(x,y) \in U$. This implies $v_x = v_y = 0$ on $U$. By the Cauchy–Riemann equations, $u_x = u_y = 0$, and $f$ is locally constant by (a).


(c) $u_x = v_y$ implies $u_{xx} = v_{yx}$, and differentiating $u_y = -v_x$ with respect to $y$ yields $u_{yy} = -v_{xy}$. Since both $u$ and $v$ are twice continuously differentiable (since so is $f$), by Schwarz's Lemma the mixed partials agree, and the sum is $u_{xx} + u_{yy} = v_{yx} - v_{xy} = 0$. The same argument works for $v_{xx} + v_{yy} = 0$.

Remarks 14.2 (a) We will soon see that the additional differentiability assumption in (c) is superfluous.
(b) Note that an inverse statement to (c) is easily proved: if $Q = (a,b) \times (c,d)$ is an open rectangle and $u: Q \to \mathbb R$ is harmonic, then there exists a holomorphic function $f: Q \to \mathbb C$ such that $u = \operatorname{Re} f$.

14.2 Cauchys Integral Formula


14.2.1 Integration
The major objective of this section is to prove the converse of Proposition 14.1: every function holomorphic in $D$ can be represented as a power series in $D$. The quickest route to this is via Cauchy's Theorem and Cauchy's Integral Formula. The required integration theory will be developed; it is a useful tool to study holomorphic functions.
Recall from Section 5.4 the definition of the Riemann integral of a bounded complex-valued function $\varphi: [a,b] \to \mathbb C$; it was defined by integrating both the real and the imaginary parts of $\varphi$. In what follows, a path is always a piecewise continuously differentiable curve.
Definition 14.3 Let $U \subseteq \mathbb C$ be open and $f: U \to \mathbb C$ a continuous function on $U$. Suppose that $\gamma: [t_0, t_n] \to U$ is a path in $U$. The integral of $f$ along $\gamma$ is defined as the line integral
$$\int_\gamma f(z)\,dz := \sum_{k=1}^{n} \int_{t_{k-1}}^{t_k} f(\gamma(t))\,\gamma'(t)\,dt, \tag{14.8}$$
where $\gamma$ is continuously differentiable on $[t_{k-1}, t_k]$ for all $k = 1, \dots, n$.
By the change of variable rule, the integral of $f$ along $\gamma$ does not depend on the parametrization of the path $\{\gamma(t) \mid t \in [t_0, t_n]\}$. However, if we exchange the initial and the end point of $\gamma(t)$, the integral changes sign.
Remarks 14.3 (Properties of the complex integral) (a) The integral of $f$ along $\gamma$ is linear over $\mathbb C$:
$$\int_\gamma (\lambda f_1 + \mu f_2)\,dz = \lambda \int_\gamma f_1\,dz + \mu \int_\gamma f_2\,dz, \qquad \int_{\gamma^-} f(z)\,dz = -\int_{\gamma^+} f(z)\,dz,$$
where $\gamma^-$ has the opposite orientation of $\gamma^+$.
(b) If $\gamma_1$ and $\gamma_2$ are two paths such that $\gamma_1$ and $\gamma_2$ join to form $\gamma$, then we have
$$\int_\gamma f(z)\,dz = \int_{\gamma_1} f(z)\,dz + \int_{\gamma_2} f(z)\,dz.$$

(c) From the definition and the triangle inequality, it follows that for a continuously differentiable path $\gamma$,
$$\left| \int_\gamma f(z)\,dz \right| \le M\,\ell,$$
where $|f(z)| \le M$ for all $z \in \gamma$ and $\ell$ is the length of $\gamma$, $\ell = \int_a^b |\gamma'(t)|\,dt$; note that the integral on the right is the length of the curve $\gamma(t)$.
(d) The integral of $f$ over $\gamma$ generalizes the real integral $\int_a^b f(t)\,dt$. Indeed, let $\gamma(t) = t$, $t \in [a,b]$; then
$$\int_\gamma f(z)\,dz = \int_a^b f(t)\,dt.$$
(e) Let $\gamma$ be the circle $S_r(a)$ of radius $r$ with center $a$. We can parametrize the positively oriented circle as $\gamma(t) = a + re^{it}$, $t \in [0, 2\pi]$. Then
$$\int_\gamma f(z)\,dz = ir \int_0^{2\pi} f\bigl(a + re^{it}\bigr)\, e^{it}\,dt.$$

Example 14.4 (a) Let $\gamma_1(t) = e^{it}$, $t \in [0, \pi]$, be the half of the unit circle from $1$ to $-1$ via $i$, and $\gamma_2(t) = -t$, $t \in [-1, 1]$, the segment from $1$ to $-1$. Then $\gamma_1'(t) = ie^{it}$ and $\gamma_2'(t) = -1$. Hence,
$$\int_{\gamma_1} \bar z^2\,dz = i\int_0^\pi e^{-2it}\, e^{it}\,dt = i\int_0^\pi e^{-it}\,dt = \frac{i}{-i}\, e^{-it}\Big|_0^\pi = -(-1-1) = 2,$$
whereas
$$\int_{\gamma_2} \bar z^2\,dz = -\int_{-1}^{1} t^2\,dt = -\frac{2}{3}.$$
In particular, the integral of $\bar z^2$ is not path independent.


(b) For $n \in \mathbb Z$,
$$\int_{S_r} \frac{dz}{z^n} = \int_0^{2\pi} \frac{ire^{it}}{r^n e^{int}}\,dt = i\, r^{1-n} \int_0^{2\pi} e^{(1-n)it}\,dt = \begin{cases} 0, & n \ne 1, \\ 2\pi i, & n = 1. \end{cases}$$
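The contour integral of example (b) can be approximated by a Riemann sum over the parametrization $\gamma(t) = re^{it}$ (a sketch; radius and number of sample points are arbitrary choices). Only $n = 1$ produces $2\pi i$:

```python
import cmath

def circle_integral(f, r=2.0, N=2000):
    # Left-endpoint Riemann sum of the line integral over S_r.
    total = 0j
    for k in range(N):
        t = 2 * cmath.pi * k / N
        z = r * cmath.exp(1j * t)
        dz = 1j * z * (2 * cmath.pi / N)  # gamma'(t) dt
        total += f(z) * dz
    return total

for n in (0, 1, 2, 3):
    print(n, circle_integral(lambda z: z ** -n))
# n = 1 yields approximately 2*pi*i; the other values are ~ 0
```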


14.2.2 Cauchys Theorem


Cauchy's theorem is the main ingredient in the proof that every function holomorphic in a neighborhood of $a$ can be written as a power series with midpoint $a$. As a consequence of Corollary 14.2, holomorphic functions then have derivatives of all orders.
We start with a very weak form; the additional assumption is that $f$ has an antiderivative.
Lemma 14.5 Let $f: U \to \mathbb C$ be continuous, and suppose that $f$ has an antiderivative $F$ which is holomorphic on $U$, $F' = f$. If $\gamma$ is any path in $U$ joining $z_0$ and $z_1$ in $U$, we have
$$\int_\gamma f(z)\,dz = F(z_1) - F(z_0).$$
In particular, if $\gamma$ is a closed path in $U$,
$$\int_\gamma f(z)\,dz = 0.$$
Proof. It suffices to prove the statement for a continuously differentiable curve $\gamma(t)$, $t \in [a,b]$. Put $h(t) = F(\gamma(t))$. By the chain rule,
$$h'(t) = \frac{d}{dt} F(\gamma(t)) = F'(\gamma(t))\,\gamma'(t) = f(\gamma(t))\,\gamma'(t).$$
By the definition of the integral and the fundamental theorem of calculus (see Subsection 5.5),
$$\int_\gamma f(z)\,dz = \int_a^b f(\gamma(t))\,\gamma'(t)\,dt = \int_a^b h'(t)\,dt = h(t)\big|_a^b = h(b) - h(a) = F(z_1) - F(z_0).$$

Example 14.5 (a) $\displaystyle\int_{2+3i}^{1-i} z^3\,dz = \frac{(1-i)^4}{4} - \frac{(2+3i)^4}{4}$.
(b) $\displaystyle\int_{1}^{\pi i} e^z\,dz = e^{\pi i} - e^1 = -1 - e$.

Theorem 14.6 (Cauchy's Theorem) Let $U$ be a simply connected region in $\mathbb C$ and let $f(z)$ be holomorphic in $U$. Suppose that $\gamma(t)$ is a path in $U$ joining $z_0$ and $z_1$ in $U$. Then $\int_\gamma f(z)\,dz$ depends on $z_0$ and $z_1$ only and not on the choice of the path. In particular, $\int_\gamma f(z)\,dz = 0$ for any closed path $\gamma$ in $U$.

Proof. We give the proof under the additional assumption that $f'$ not only exists but is continuous in $U$. In this case, the partial derivatives $u_x$, $u_y$, $v_x$, and $v_y$ are continuous, and we can apply the integrability criterion Proposition 8.3, which was a consequence of Green's theorem, see Theorem 10.3. Note that we need $U$ to be simply connected, in contrast to Lemma 14.5.
Without this additional assumption ($f'$ is continuous), the proof is lengthy (see [FB93, Lan89, Jan93]); it starts with triangular or rectangular paths and is then generalized to arbitrary paths.
We have
$$\int_\gamma f(z)\,dz = \int_\gamma (u+iv)(dx + i\,dy) = \int_\gamma (u\,dx - v\,dy) + i\int_\gamma (v\,dx + u\,dy).$$
We have path independence of the line integral $\int P\,dx + Q\,dy$ if and only if the integrability condition $Q_x = P_y$ is satisfied, if and only if $P\,dx + Q\,dy$ is a closed form.
In our case, the real part is path independent if and only if $-v_x = u_y$. The imaginary part is path independent if and only if $u_x = v_y$. These are exactly the Cauchy–Riemann equations, which are satisfied since $f$ is holomorphic.

Remarks 14.4 (a) The proposition holds under the following weaker assumption: $f$ is continuous in the closure $\overline U$ and holomorphic in $U$, $U$ is a simply connected region, and $\gamma = \partial U$ is a path.
(b) The statement is wrong without the assumption that $U$ is simply connected. Indeed, consider the circle of radius $r$ with center $a$, that is, $\gamma(t) = a + re^{it}$. Then $f(z) = 1/(z-a)$ is singular at $a$, and we have
$$\int_{S_r(a)} \frac{dz}{z-a} = ir\int_0^{2\pi} \frac{e^{it}}{re^{it}}\,dt = i\int_0^{2\pi} dt = 2\pi i.$$

(c) For a non-simply connected region $G$ one cuts $G$ with pairwise inverse-to-each-other paths (in the picture: $\gamma_1$, $\gamma_2$, $\gamma_3$ and $\gamma_4$). The resulting region $\tilde G$ is now simply connected, such that $\int f(z)\,dz = 0$ by (a). Since the integrals along $\gamma_i$, $i = 1, \dots, 4$, cancel, we have
$$\int_{\Gamma_1 + \Gamma_2} f(z)\,dz = 0,$$
where $\Gamma_1$ and $\Gamma_2$ denote the two boundary circles $S_{r_2}$ and $S_{r_1}$ with opposite orientations.
In particular, if $f$ is holomorphic in $\{z \mid 0 < |z-a| < R\}$ and $0 < r_1 < r_2 < R$, then
$$\int_{S_{r_1}(a)} f(z)\,dz = \int_{S_{r_2}(a)} f(z)\,dz$$
if both circles are positively oriented.


Proposition 14.7 Let $U$ be a simply connected region, $z_0 \in U$, $U_0 = U \setminus \{z_0\}$. Suppose that $f$ is holomorphic in $U_0$ and bounded in a certain neighborhood of $z_0$. Then
$$\int_\gamma f(z)\,dz = 0$$
for every non-self-intersecting closed path $\gamma$ in $U_0$.


Proof. Suppose that $|f(z)| \le C$ for $|z - z_0| < \varepsilon_0$. For any $\varepsilon$ with $0 < \varepsilon < \varepsilon_0$ we then have, by Remark 14.3 (c),
$$\left| \int_{S_\varepsilon(z_0)} f(z)\,dz \right| \le 2\pi\varepsilon\, C.$$
By Remark 14.4 (c), $\int_\gamma f(z)\,dz = \int_{S_{\varepsilon_0}(z_0)} f(z)\,dz = \int_{S_\varepsilon(z_0)} f(z)\,dz$. Hence
$$\left| \int_\gamma f(z)\,dz \right| = \left| \int_{S_\varepsilon(z_0)} f(z)\,dz \right| \le 2\pi\varepsilon\, C.$$
Since this is true for all small $\varepsilon > 0$, $\int_\gamma f(z)\,dz = 0$.
We will soon see that, under the conditions of the proposition, $f$ can be extended holomorphically to $z_0$ as well.

14.2.3 Cauchys Integral Formula


Theorem 14.8 (Cauchy's Integral Formula) Let $U$ be a region. Suppose that $f$ is holomorphic in $U$, and $\gamma$ is a non-self-intersecting positively oriented closed path in $U$ such that $\gamma$ is the boundary of $U_0 \subseteq U$; in particular, $U_0$ is simply connected.
Then for every $a \in U_0$ we have
$$f(a) = \frac{1}{2\pi i} \int_\gamma \frac{f(z)\,dz}{z - a}. \tag{14.9}$$

Proof. $a \in U_0$ is fixed. For $z \in U$ we define
$$F(z) = \begin{cases} \dfrac{f(z) - f(a)}{z - a}, & z \ne a, \\[4pt] 0, & z = a. \end{cases}$$
Then $F(z)$ is holomorphic in $U \setminus \{a\}$ and bounded in a neighborhood of $a$, since $f'(a)$ exists and therefore
$$\left| \frac{f(z) - f(a)}{z - a} \right| < |f'(a)| + \varepsilon$$
as $z$ approaches $a$. Using Proposition 14.7 and Remark 14.4 (b) we have $\int_\gamma F(z)\,dz = 0$, that is,
$$\int_\gamma \frac{f(z) - f(a)}{z - a}\,dz = 0,$$
such that
$$\int_\gamma \frac{f(z)\,dz}{z - a} = \int_\gamma \frac{f(a)\,dz}{z - a} = f(a)\int_\gamma \frac{dz}{z - a} = 2\pi i\, f(a).$$

Remark 14.5 The values of a holomorphic function $f$ inside a path $\gamma$ are completely determined by the values of $f$ on $\gamma$.
Example 14.6 Evaluate
$$I_r := \int_{S_r(a)} \frac{\sin z}{z^2 + 1}\,dz$$
in the cases $a = 1 + i$ and $r = \tfrac12, 2, 3$.
Solution. We use the partial fraction decomposition of $1/(z^2+1)$ to obtain linear terms in the denominator:
$$\frac{1}{z^2+1} = \frac{1}{2i}\left( \frac{1}{z-i} - \frac{1}{z+i} \right).$$
Hence, with $f(z) = \sin z$ we have in case $r = 3$ (both $\pm i$ lie inside the circle):
$$I_3 = \int_{S_3(a)} \frac{\sin z\,dz}{z^2+1} = \frac{1}{2i}\int_{S_3(a)} \frac{\sin z}{z-i}\,dz - \frac{1}{2i}\int_{S_3(a)} \frac{\sin z}{z+i}\,dz = \pi\bigl( f(i) - f(-i) \bigr) = 2\pi \sin(i) = \pi i\,(e - 1/e).$$
In case $r = 2$, the function $\dfrac{\sin z}{z+i}$ is holomorphic inside the circle of radius $2$ with center $a$ (only $i$ lies inside). Hence,
$$I_2 = \pi \sin(i) = I_3/2.$$
In case $r = \tfrac12$, both integrands are holomorphic inside the circle, such that $I_{1/2} = 0$.
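Cauchy's Integral Formula (14.9) itself is easy to verify numerically: the value $\sin(a)$ is recovered from the values of $\sin$ on a circle around $a$ (a sketch; the sample point, center, radius, and resolution are arbitrary choices):

```python
import cmath

def cauchy_value(f, a, center, r, N=4000):
    # (1/2*pi*i) * contour integral of f(z)/(z - a) over the circle
    # z = center + r*e^{it}, approximated by a Riemann sum.
    total = 0j
    for k in range(N):
        t = 2 * cmath.pi * k / N
        z = center + r * cmath.exp(1j * t)
        dz = 1j * (z - center) * (2 * cmath.pi / N)
        total += f(z) / (z - a) * dz
    return total / (2j * cmath.pi)

a = 1 + 0.9j  # a point inside the circle of radius 1/2 around 1 + i
approx = cauchy_value(cmath.sin, a, center=1 + 1j, r=0.5)
print(approx, cmath.sin(a))  # agree to high accuracy
```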

Example 14.7 Consider the function $f(z) = e^{iz^2}$, which is an entire function. Let $\gamma_1(t) = t$, $t \in [0, R]$, be the segment from $0$ to $R$ on the real line; let $\gamma_2(t) = Re^{it}$, $t \in [0, \pi/4]$, be the arc of the circle of radius $R$ with center $0$; and let finally $\gamma_3(t) = te^{i\pi/4}$, $t \in [0, R]$, be the segment from $0$ to $Re^{i\pi/4}$. By Cauchy's Theorem,
$$I_1 + I_2 - I_3 = \int_{\gamma_1 + \gamma_2 - \gamma_3} f(z)\,dz = 0.$$
Obviously, since $\bigl(e^{i\pi/4}\bigr)^2 = e^{i\pi/2} = i$,
$$I_1 = \int_0^R e^{it^2}\,dt, \qquad I_3 = e^{i\pi/4}\int_0^R e^{-t^2}\,dt.$$
We shall show that $|I_2(R)| \to 0$ as $R$ tends to $\infty$. We have
$$|I_2(R)| = \left| \int_0^{\pi/4} e^{i R^2 e^{2it}}\, R\, i e^{it}\,dt \right| \le R \int_0^{\pi/4} \left| e^{iR^2(\cos 2t + i\sin 2t)} \right| dt = R \int_0^{\pi/4} e^{-R^2 \sin 2t}\,dt.$$
Note that $\sin t$ is a concave function on $[0, \pi/2]$; that is, the graph of the sine function lies above the graph of the corresponding linear function through $(0,0)$ and $(\pi/2, 1)$; thus $\sin t \ge 2t/\pi$, $t \in [0, \pi/2]$. We have
$$|I_2(R)| \le R \int_0^{\pi/4} e^{-R^2\, 4t/\pi}\,dt = \frac{\pi}{4R}\bigl( 1 - e^{-R^2} \bigr) \le \frac{\pi}{4R}.$$
We conclude that $|I_2(R)|$ tends to $0$ as $R \to \infty$. Since $I_1 + I_2 - I_3 = 0$ for all $R$, we conclude
$$\lim_{R\to\infty} I_1(R) = \int_0^\infty e^{it^2}\,dt = e^{i\pi/4}\int_0^\infty e^{-t^2}\,dt = \lim_{R\to\infty} I_3(R).$$
The integral on the right is $\sqrt\pi/2$ (see below); hence $e^{it^2} = \cos(t^2) + i\sin(t^2)$ implies
$$\int_0^\infty \cos(t^2)\,dt = \int_0^\infty \sin(t^2)\,dt = \frac{\sqrt{2\pi}}{4}.$$
These are the so-called Fresnel integrals. We show that $I = \int_0^\infty e^{-x^2}\,dx = \sqrt\pi/2$. (This was already done in Homework 41.) For this, we compute the double integral using Fubini's theorem:
$$\int_0^\infty\!\!\int_0^\infty e^{-x^2 - y^2}\,dx\,dy = \int_0^\infty e^{-x^2}\,dx \int_0^\infty e^{-y^2}\,dy = I^2.$$
Passing to polar coordinates yields $dx\,dy = r\,dr\,d\varphi$, $x^2 + y^2 = r^2$, such that
$$\iint e^{-x^2-y^2}\,dx\,dy = \lim_{R\to\infty} \int_0^{\pi/2} d\varphi \int_0^R e^{-r^2}\, r\,dr.$$
The change of variables $r^2 = t$, $dt = 2r\,dr$, yields
$$I^2 = \frac{\pi}{2}\cdot\frac{1}{2}\int_0^\infty e^{-t}\,dt = \frac{\pi}{4} \;\Longrightarrow\; I = \frac{\sqrt\pi}{2}.$$
This proves the claim. In addition, the change of variables $x = \sqrt s$ also yields
$$\Gamma\!\left(\tfrac12\right) = \int_0^\infty \frac{e^{-s}}{\sqrt s}\,ds = 2\int_0^\infty e^{-x^2}\,dx = \sqrt\pi.$$
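The Gaussian integral used above can be confirmed with a simple midpoint rule (a sketch; the cutoff $L = 8$ and step count are arbitrary choices, and the tail beyond $L$ is negligible since $e^{-64}$ is far below machine precision):

```python
from math import exp, pi, sqrt

N, L = 200000, 8.0
h = L / N
# Midpoint rule for the integral of e^{-x^2} over [0, L].
integral = sum(exp(-((k + 0.5) * h) ** 2) for k in range(N)) * h
print(integral, sqrt(pi) / 2)  # ~ 0.8862269...
```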


Theorem 14.9 Let $\gamma$ be a path in an open set $U$ and $g$ a continuous function on $\gamma$. If $a$ is not on $\gamma$, define
$$h(a) = \int_\gamma \frac{g(z)}{z - a}\,dz.$$
Then $h$ is holomorphic on the complement of $\gamma$ in $U$ and has derivatives of all orders. They are given by
$$h^{(n)}(a) = n! \int_\gamma \frac{g(z)}{(z - a)^{n+1}}\,dz.$$

Proof. Let $b \in U$ and $b$ not on $\gamma$. Then there exists some $r > 0$ such that $|z - b| \ge r$ for all points $z$ on $\gamma$. Let $0 < s < r$. We shall see that $h$ has a power series expansion in the ball $U_s(b)$. We write
$$\frac{1}{z - a} = \frac{1}{z - b - (a - b)} = \frac{1}{z - b}\cdot\frac{1}{1 - \frac{a-b}{z-b}} = \frac{1}{z - b}\left( 1 + \frac{a-b}{z-b} + \left(\frac{a-b}{z-b}\right)^2 + \cdots \right).$$
This geometric series converges absolutely and uniformly for $|a - b| \le s$ because
$$\left| \frac{a-b}{z-b} \right| \le \frac{s}{r} < 1.$$
Since $g$ is continuous and $\gamma$ is a compact set, $g(z)$ is bounded on $\gamma$, such that by Theorem 6.6 the series $\sum_{n=0}^{\infty} g(z)\frac{(a-b)^n}{(z-b)^{n+1}}$ can be integrated term by term, and we find
$$h(a) = \int_\gamma \sum_{n=0}^{\infty} g(z)\,\frac{(a-b)^n}{(z-b)^{n+1}}\,dz = \sum_{n=0}^{\infty} (a-b)^n \int_\gamma \frac{g(z)}{(z-b)^{n+1}}\,dz = \sum_{n=0}^{\infty} c_n (a-b)^n,$$
where
$$c_n = \int_\gamma \frac{g(z)\,dz}{(z-b)^{n+1}}.$$
This proves that $h$ can be expanded into a power series in a neighborhood of $b$. By Proposition 14.1 and Corollary 14.2, $h$ has derivatives of all orders in a neighborhood of $b$. By the formula in Corollary 14.2,
$$h^{(n)}(b) = n!\, c_n = n! \int_\gamma \frac{g(z)\,dz}{(z-b)^{n+1}}.$$

Remark 14.6 There is an easy way to deduce the formula. Formally, we can exchange the differentiation $\frac{d}{da}$ and the integral $\int_\gamma$:
$$h'(a) = \frac{d}{da}\int_\gamma \frac{g(z)}{z-a}\,dz = \int_\gamma \frac{d}{da}\Bigl( g(z)(z-a)^{-1} \Bigr)\,dz = \int_\gamma \frac{g(z)}{(z-a)^2}\,dz,$$
$$h''(a) = \frac{d}{da}\int_\gamma g(z)(z-a)^{-2}\,dz = 2\int_\gamma \frac{g(z)}{(z-a)^3}\,dz.$$

Theorem 14.10 Suppose that $f$ is holomorphic in $U$ and $U_r(a) \subseteq U$. Then $f$ has a power series expansion in $U_r(a)$,
$$f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n.$$
In particular, $f$ has derivatives of all orders, and we have the following coefficient formula:
$$c_n = \frac{f^{(n)}(a)}{n!} = \frac{1}{2\pi i}\int_{S_r(a)} \frac{f(z)\,dz}{(z-a)^{n+1}}. \tag{14.10}$$
Proof. In view of Cauchy's Integral Formula (Theorem 14.8) we obtain
$$f(a) = \frac{1}{2\pi i}\int_{S_r(a)} \frac{f(z)\,dz}{z-a}.$$
Inserting $g(z) = f(z)/(2\pi i)$ ($f$ is continuous) into Theorem 14.9, we see that $f$ can be expanded into a power series with center $a$ and, therefore, it has derivatives of all orders at $a$,
$$f^{(n)}(a) = \frac{n!}{2\pi i}\int_{S_r(a)} \frac{f(z)\,dz}{(z-a)^{n+1}}.$$
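The coefficient formula (14.10) can be checked numerically for $f = \exp$ at $a = 0$, where the Taylor coefficients are known to be $1/n!$ (a sketch; the radius and resolution below are arbitrary choices):

```python
import cmath
from math import factorial

def coefficient(f, n, r=1.0, N=4000):
    # c_n = (1/2*pi*i) * contour integral of f(z)/z^(n+1) over S_r,
    # approximated by a Riemann sum over z = r*e^{it}.
    total = 0j
    for k in range(N):
        t = 2 * cmath.pi * k / N
        z = r * cmath.exp(1j * t)
        total += f(z) / z ** (n + 1) * 1j * z * (2 * cmath.pi / N)
    return total / (2j * cmath.pi)

for n in range(5):
    print(n, coefficient(cmath.exp, n).real, 1 / factorial(n))  # columns agree
```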

14.2.4 Applications of the Coefficient Formula


Proposition 14.11 (Growth of Taylor Coefficients) Suppose that $f$ is holomorphic in $U$ and is bounded by $M > 0$ in $U_r(a) \subseteq U$; that is, $|z - a| < r$ implies $|f(z)| \le M$.
Let $\sum_{n=0}^{\infty} c_n (z-a)^n$ be the power series expansion of $f$ at $a$. Then we have
$$|c_n| \le \frac{M}{r^n}. \tag{14.11}$$


Proof. By the coefficient formula (14.10) and Remark 14.3 (c) we have, noting that
$$\left| \frac{f(z)}{(z-a)^{n+1}} \right| \le \frac{M}{r^{n+1}} \quad\text{for } z \in S_r(a),$$
the estimate
$$|c_n| = \left| \frac{1}{2\pi i}\int_{S_r(a)} \frac{f(z)}{(z-a)^{n+1}}\,dz \right| \le \frac{1}{2\pi}\cdot\frac{M}{r^{n+1}}\,\ell(S_r(a)) = \frac{1}{2\pi}\cdot\frac{M}{r^{n+1}}\cdot 2\pi r = \frac{M}{r^n}.$$

Theorem 14.12 (Liouville's Theorem) A bounded entire function is constant.
Proof. Suppose that $|f(z)| \le M$ for all $z \in \mathbb C$. Since $f$ is given by a power series $f(z) = \sum_{n=0}^{\infty} c_n z^n$ with radius of convergence $R = \infty$, the previous proposition gives
$$|c_n| \le \frac{M}{r^n}$$
for all $r > 0$. This shows $c_n = 0$ for all $n \ne 0$; hence $f(z) = c_0$ is constant.
Remarks 14.7 (a) Note that we explicitly assume $f$ to be holomorphic on the entire complex plane. For example, $f(z) = e^{1/z}$ is holomorphic and bounded outside every ball $U_\varepsilon(0)$. However, $f$ is not constant.
(b) Note that $f(z) = \sin z$ is an entire function which is not constant. Hence, $\sin z$ is unbounded as a complex function.
Theorem 14.13 (Fundamental Theorem of Algebra) A polynomial $p(z)$ with complex coefficients of degree $\deg p \ge 1$ has a complex root.
Proof. Suppose to the contrary that $p(z) \ne 0$ for all $z \in \mathbb C$. It is known, see Example 3.3, that $\lim_{|z|\to\infty} |p(z)| = +\infty$. In particular there exists $R > 0$ such that
$$|z| \ge R \;\Longrightarrow\; |p(z)| \ge 1.$$
That is, $f(z) = 1/p(z)$ is bounded by $1$ if $|z| \ge R$. On the other hand, $f$ is a continuous function and $\{z \mid |z| \le R\}$ is a compact subset of $\mathbb C$. Hence, $f(z) = 1/p(z)$ is bounded on $\overline{U_R}$, too. That is, $f$ is bounded on the entire plane. By Liouville's theorem, $f$ is constant and so is $p$. This contradicts our assumption $\deg p \ge 1$. Hence, $p$ has a root in $\mathbb C$.
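The theorem guarantees complex roots even when no real root exists; for instance, $z^2 + 1$ has the roots $\pm i$, which a numerical root finder locates directly (a small illustration, not part of the text):

```python
import numpy as np

coeffs = [1, 0, 1]        # the polynomial z^2 + 1, which has no real roots
roots = np.roots(coeffs)  # complex roots, approximately +i and -i
print(roots)
for z in roots:
    # each root really annihilates the polynomial
    assert abs(np.polyval(coeffs, z)) < 1e-9
```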
Now, there is a converse-type statement to Cauchy's Theorem.
Theorem 14.14 (Morera's Theorem) Let $f: U \to \mathbb C$ be a continuous function, where $U \subseteq \mathbb C$ is open. Suppose that the integral of $f$ along each closed triangular path $[z_1, z_2, z_3]$ in $U$ is $0$. Then $f$ is holomorphic in $U$.


Proof. Fix $z_0 \in U$. We show that $f$ has an antiderivative in a small neighborhood $U_\varepsilon(z_0) \subseteq U$. For $a \in U_\varepsilon(z_0)$ define
$$F(a) = \int_{z_0}^{a} f(z)\,dz.$$
Note that $F(a)$ takes the same value for all polygonal paths from $z_0$ to $a$, by the assumption of the theorem. We have
$$\left| \frac{F(a+h) - F(a)}{h} - f(a) \right| = \left| \frac{1}{h}\int_a^{a+h} \bigl( f(z) - f(a) \bigr)\,dz \right|,$$
where the integral on the right is over the segment from $a$ to $a+h$ and we used $\int_a^{a+h} c\,dz = ch$. By Remark 14.3 (c), the right side is less than or equal to
$$\frac{1}{|h|}\,\sup_{z \in U_{|h|}(a)} |f(z) - f(a)|\cdot|h| = \sup_{z \in U_{|h|}(a)} |f(z) - f(a)|.$$
Since $f$ is continuous, the above term tends to $0$ as $h$ tends to $0$. This shows that $F$ is differentiable at $a$ with $F'(a) = f(a)$. Since $F$ is holomorphic in $U_\varepsilon(z_0)$, by Theorem 14.10 it has derivatives of all orders; in particular $f = F'$ is holomorphic.

Corollary 14.15 Suppose that $(f_n)$ is a sequence of holomorphic functions on $U$, uniformly converging to $f$ on $U$. Then $f$ is holomorphic on $U$.
Proof. Since the $f_n$ are continuous and uniformly converging, $f$ is continuous on $U$. Let $\gamma$ be any closed triangular path in $U$. Since $(f_n)$ converges uniformly, we may exchange integration and limit:
$$\int_\gamma f(z)\,dz = \int_\gamma \lim_{n\to\infty} f_n(z)\,dz = \lim_{n\to\infty} \int_\gamma f_n(z)\,dz = \lim_{n\to\infty} 0 = 0,$$
since each $f_n$ is holomorphic. By Morera's theorem, $f$ is holomorphic in $U$.

Summary
Let $U$ be a region and $f: U \to \mathbb C$ a function on $U$. The following are equivalent:
(a) $f$ is holomorphic in $U$.
(b) $f = u + iv$ is real differentiable and the Cauchy–Riemann equations $u_x = v_y$ and $u_y = -v_x$ are satisfied in $U$.
(c) If $U$ is simply connected: $f$ is continuous and for every closed triangular path $\gamma = [z_1, z_2, z_3]$ in $U$, $\int_\gamma f(z)\,dz = 0$ (Morera condition).
(d) $f$ possesses locally an antiderivative; that is, for every $a \in U$ there is a ball $U_\varepsilon(a) \subseteq U$ and a holomorphic function $F$ such that $F'(z) = f(z)$ for all $z \in U_\varepsilon(a)$.
(e) $f$ is continuous and for every ball $U_r(a)$ with $\overline{U_r(a)} \subseteq U$ we have
$$f(b) = \frac{1}{2\pi i}\int_{S_r(a)} \frac{f(z)}{z - b}\,dz, \qquad b \in U_r(a).$$
(f) For every $a \in U$ there exists a ball with center $a$ such that $f$ can be expanded into a power series in that ball.
(g) For every ball $B$ which is completely contained in $U$, $f$ can be expanded into a power series in $B$.

14.2.5 Power Series

Since holomorphic functions are locally representable by power series, it is quite useful to know how to operate with power series. In case a holomorphic function $f$ is represented by a power series, we say that $f$ is an analytic function. By Theorem 14.10, any holomorphic function is analytic, and vice versa.
(a) Uniqueness
If both $\sum c_n z^n$ and $\sum b_n z^n$ converge in a ball around $0$ and define the same function, then $c_n = b_n$ for all $n \in \mathbb{N}_0$.
(b) Multiplication
If both $\sum c_n z^n$ and $\sum b_n z^n$ converge in a ball $U_r(0)$ around $0$, then
$$\left( \sum_{n=0}^{\infty} c_n z^n \right)\left( \sum_{n=0}^{\infty} b_n z^n \right) = \sum_{n=0}^{\infty} d_n z^n, \qquad |z| < r,$$
where $d_n = \sum_{k=0}^{n} c_{n-k} b_k$.
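The coefficient rule above (the Cauchy product) can be sketched numerically; the helper name and the truncation length are arbitrary choices, not part of the text.

```python
def cauchy_product(c, b):
    """Truncated coefficients d_n = sum_{k=0}^{n} c_{n-k} * b_k of the
    product of two power series given by their coefficient lists."""
    n = min(len(c), len(b))
    return [sum(c[m - k] * b[k] for k in range(m + 1)) for m in range(n)]

# Example: (sum z^n)(sum z^n) = 1/(1-z)^2 = sum (n+1) z^n for |z| < 1.
c = [1.0] * 8
print(cauchy_product(c, c))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```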

(c) The Inverse $1/f$
Let $f(z) = \sum_{n=0}^{\infty} c_n z^n$ be a convergent power series with $c_0 \neq 0$. Then $f(0) = c_0 \neq 0$ and, by continuity of $f$, there exists $r > 0$ such that the power series converges in the ball $U_r(0)$ and is non-zero there. Hence $1/f(z)$ is holomorphic in $U_r(0)$, and therefore it can be expanded into a converging power series in $U_r(0)$, see summary (f). Suppose that $1/f(z) = g(z) = \sum_{n=0}^{\infty} b_n z^n$, $|z| < r$. Then $f(z) g(z) = 1 = 1 + 0z + 0z^2 + \cdots$; uniqueness and (b) yield
$$1 = c_0 b_0, \qquad 0 = c_0 b_1 + c_1 b_0, \qquad 0 = c_0 b_2 + c_1 b_1 + c_2 b_0, \quad \dots$$
This system of equations can be solved recursively for $b_n$, $n \in \mathbb{N}_0$; for example, $b_0 = 1/c_0$, $b_1 = -c_1 b_0 / c_0$.
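The recursion just described can be sketched as follows; the function name and the test series are illustrative choices.

```python
def inverse_series(c, n_terms):
    """Coefficients b_n of 1/f, where f(z) = sum c_n z^n with c_0 != 0,
    solved recursively from c_0 b_0 = 1 and sum_k c_{n-k} b_k = 0 (n >= 1)."""
    b = [1.0 / c[0]]
    for n in range(1, n_terms):
        s = sum(c[n - k] * b[k] for k in range(n))  # uses b_0, ..., b_{n-1}
        b.append(-s / c[0])
    return b

# Example: f(z) = 1 + z, so 1/f(z) = 1 - z + z^2 - z^3 + ...
print(inverse_series([1.0, 1.0, 0.0, 0.0, 0.0], 5))  # [1.0, -1.0, 1.0, -1.0, 1.0]
```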
(d) Double Series
Suppose that
$$f_k(z) = \sum_{n=0}^{\infty} c_{kn} (z-a)^n, \qquad k \in \mathbb{N},$$
are power series converging in $U_r(a)$. Suppose further that the series $\sum_{k=1}^{\infty} f_k(z)$ converges locally uniformly in $U_r(a)$ as well. Then
$$\sum_{k=1}^{\infty} f_k(z) = \sum_{n=0}^{\infty} \left( \sum_{k=1}^{\infty} c_{kn} \right) (z-a)^n.$$
In particular, one can form the sum of a locally uniformly convergent series $\sum f_k(z)$ of power series coefficientwise. Note that a series of functions $\sum_{k=1}^{\infty} f_k(z)$ converges locally uniformly at $b$ if there exists $\varepsilon > 0$ such that the series converges uniformly in $U_\varepsilon(b)$.

Note that any locally uniformly converging series of holomorphic functions defines a holomorphic function (Theorem of Weierstraß). Indeed, since the series converges uniformly, line integral and summation can be exchanged: let $\gamma = [z_0, z_1, z_2]$ be any closed triangular path inside $U$; then by Cauchy's theorem
$$\int_\gamma f(z)\,dz = \int_\gamma \sum_{k=1}^{\infty} f_k(z)\,dz = \sum_{k=1}^{\infty} \int_\gamma f_k(z)\,dz = \sum_{k=1}^{\infty} 0 = 0.$$
By Morera's theorem, $f(z) = \sum_{k=1}^{\infty} f_k(z)$ is holomorphic.

(e) Change of Center
Let $f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n$ be convergent in $U_r(a)$, $r > 0$, and let $b \in U_r(a)$. Then $f$ can be expanded into a power series with center $b$,
$$f(z) = \sum_{n=0}^{\infty} b_n (z-b)^n, \qquad b_n = \frac{f^{(n)}(b)}{n!},$$
with radius of convergence at least $r - |b-a|$. The coefficients can also be obtained by reordering for powers of $(z-b)^k$ using the binomial formula
$$(z-a)^n = (z-b+b-a)^n = \sum_{k=0}^{n} \binom{n}{k} (z-b)^k (b-a)^{n-k}.$$
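The binomial reordering can be tried numerically. In this sketch (helper name and truncation length are arbitrary) the geometric series $\sum z^n$ is re-centered at $b$, where the new coefficients are known to be $1/(1-b)^{m+1}$, see Example 14.8 (b) below.

```python
from math import comb

def recenter(c, delta, n_out):
    """New coefficients b_m after shifting the center by delta = b - a,
    via (z-a)^n = sum_k C(n,k) (z-b)^k (b-a)^(n-k) (truncated reordering)."""
    N = len(c)
    return [sum(c[n] * comb(n, m) * delta ** (n - m) for n in range(m, N))
            for m in range(n_out)]

# 1/(1-z) = sum z^n (center 0); re-centered at b, coefficient m is 1/(1-b)^(m+1).
b = 0.3
for m, bm in enumerate(recenter([1.0] * 120, b, 5)):
    assert abs(bm - 1 / (1 - b) ** (m + 1)) < 1e-9
```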

(f) Composition
We restrict ourselves to the case
$$f(z) = a_0 + a_1 z + a_2 z^2 + \cdots, \qquad g(z) = b_1 z + b_2 z^2 + \cdots,$$
where $g(0) = 0$; therefore the image under $g$ of a small neighborhood of $0$ is again a small neighborhood of $0$, and we assume that the first power series $f$ is defined there. Thus $f(g(z))$ is defined and holomorphic in a certain neighborhood of $0$, see Remark 14.1. Hence
$$h(z) = f(g(z)) = c_0 + c_1 z + c_2 z^2 + \cdots,$$
where the coefficients $c_n = h^{(n)}(0)/n!$ can be computed using the chain rule; for example, $c_0 = f(g(0)) = a_0$, $c_1 = f'(g(0))\, g'(0) = a_1 b_1$.
(g) The Composition Inverse $f^{-1}$
Suppose that $f(z) = \sum_{n=1}^{\infty} a_n z^n$, $a_1 \neq 0$, has radius of convergence $r > 0$. Then there exists a power series $g(z) = \sum_{n=1}^{\infty} b_n z^n$ converging on some $U_\varepsilon(0)$ such that $f(g(z)) = z = g(f(z))$ for all $z \in U_\varepsilon(0)$. Using (f) and the uniqueness, the coefficients $b_n$ can be computed recursively.
Example 14.8 (a) The function
$$f(z) = \frac{1}{1+z^2} + \frac{1}{3-z}$$
is holomorphic in $\mathbb{C} \setminus \{i, -i, 3\}$. Expanding $f$ into a power series with center $1$, the closest singularity to $1$ is $i$. Since the disc of convergence cannot contain $i$, the radius of convergence is $|1-i| = \sqrt{2}$. Expanding the power series around $a = 2$, the closest singularity of $f$ is $3$; hence the radius of convergence is now $|3-2| = 1$.
(b) Change of center. We want to expand $f(z) = \frac{1}{1-z}$, which is holomorphic in $\mathbb{C} \setminus \{1\}$, into a power series around $b = i/2$. For arbitrary $b$ with $|b| < 1$ we have
$$\frac{1}{1-z} = \frac{1}{1-b-(z-b)} = \frac{1}{1-b} \cdot \frac{1}{1 - \frac{z-b}{1-b}} = \sum_{n=0}^{\infty} \frac{1}{(1-b)^{n+1}}\, (z-b)^n = \tilde f(z).$$

By the root test, the radius of convergence of this series is $|1-b|$. In case $b = i/2$ we have $r = |1-i/2| = \sqrt{1 + 1/4} = \sqrt{5}/2$. Note that the power series $1 + z + z^2 + \cdots$ has radius of convergence $1$ and a priori defines an analytic (= holomorphic) function in the open unit ball. However, changing the center we obtain an analytic continuation of $f$ to a larger region. This example shows that (under certain assumptions) analytic functions can be extended to a larger region by changing the center of the series.


14.3 Local Properties of Holomorphic Functions


We omit the proof of the Open Mapping Theorem and refer to [Jan93, Satz 11, Satz 13] see
also [Con78].
Theorem 14.16 (Open Mapping Theorem) Let G be a region and f a non-constant holomorphic function on G. Then for every open subset U G, f (U) is open.
The main idea is to show that any at some point a holomorphic function f with f (a) = 0 looks
like a power function, that is, there exists a positive integer k such that f (z) = h(z)k in a small
neighborhood of a, where h is holomorphic at a with a zero of order 1 at a.
Theorem 14.17 (Maximum Modulus Theorem) Let $f$ be holomorphic in the region $U$ and let $a \in U$ be a point such that $|f(a)| \geq |f(z)|$ for all $z \in U$. Then $f$ must be a constant function.
Proof. First proof. Let $V = f(U)$ and $b = f(a)$. By assumption, $|b| \geq |w|$ for all $w \in V$. Now $b$ cannot be an inner point of $V$, since otherwise there is some $c$ in a neighborhood of $b$ with $|c| > |b|$, which contradicts the assumption. Hence $b$ is in the boundary, $b \in \partial V \cap V$. In particular, $V$ is not open. Hence the Open Mapping Theorem says that $f$ is constant.
Second proof. We give a direct proof using Cauchy's integral formula. For simplicity let $a = 0$ and let $U_r(0) \subseteq U$ be a small ball in $U$. By Cauchy's formula with $\gamma = S_r(0)$, $z = \gamma(t) = r e^{it}$, $t \in [0, 2\pi]$, $dz = r i e^{it}\,dt$, we get
$$f(0) = \frac{1}{2\pi i} \int_0^{2\pi} \frac{f(r e^{it})\, r i e^{it}}{r e^{it}}\,dt = \frac{1}{2\pi} \int_0^{2\pi} f(r e^{it})\,dt.$$
In other words, $f(0)$ is the arithmetic mean of the values of $f$ on any circle with center $0$. Let $M = |f(a)| \geq |f(z)|$ be the maximal modulus of $f$ on $U$. Suppose there exists $z_0 = r_0 e^{it_0}$ with $r_0 < r$ and $|f(z_0)| < M$. Since $f$ is continuous, there exists a whole neighborhood $U_\delta(t_0)$ of $t_0$ with $|f(r_0 e^{it})| < M$ for $t \in U_\delta(t_0)$. However, in this case
$$M = |f(0)| \leq \frac{1}{2\pi} \int_0^{2\pi} \left| f(r_0 e^{it}) \right| dt < \frac{1}{2\pi} \int_0^{2\pi} M\,dt = M,$$
which contradicts the mean value property. Hence $|f(z)| = M$ is constant in any sufficiently small ball around $0$. Let $z_1 \in U$ be any point in $U$. We connect $0$ and $z_1$ by a path in $U$. Let $d$ be its distance from the boundary $\partial U$. Letting $z$ move continuously from $0$ to $z_1$, consider the chain of balls with center $z$ and radius $d/2$. By the above, $|f(z)| = M$ in any such ball; hence $|f(z)| = M$ in $U$. It follows from homework 47.2 that $f$ is constant.

Remark 14.8 In other words, if $f$ is holomorphic in $G$ and $\overline U \subseteq G$, then $\sup_{z \in \overline U} |f(z)|$ is attained on the boundary $\partial U$. Note that both theorems fail in the real setting: the image of the open set $(0, 2\pi)$ under the sine function is $[-1, 1]$, which is not open. The maximum of $f(x) = 1 - x^2$ over $(-1, 1)$ is not attained on the boundary, since $f(-1) = f(1) = 0$ while $f(0) = 1$. However, $|z^2 - 1|$ on the closed complex unit ball attains its maximum at $z = \pm i$, on the boundary.

Recall from topology:

An accumulation point of a set $M \subseteq \mathbb{C}$ is a point $z \in \mathbb{C}$ such that for every $\varepsilon > 0$, $U_\varepsilon(z)$ contains infinitely many elements of $M$. Accumulation points are in the closure of $M$, not necessarily in the boundary of $M$. The set of accumulation points of $M$ is closed. (Indeed, suppose that $a$ is in the closure of the set of accumulation points of $M$. Then every neighborhood of $a$ meets the set of accumulation points of $M$; in particular, every neighborhood contains infinitely many elements of $M$. Hence $a$ itself is an accumulation point of $M$.)
$M$ is connected if every locally constant function on $M$ is constant. $M$ is not connected if $M$ is the disjoint union of two non-empty subsets $A$ and $B$, both open and closed in $M$.
For example, $M = \{1/n \mid n \in \mathbb{N}\}$ has no accumulation point in $\mathbb{C} \setminus \{0\}$, but it has one accumulation point, $0$, in $\mathbb{C}$.
Proposition 14.18 Let $U$ be a region and let $f \colon U \to \mathbb{C}$ be holomorphic in $U$. If the set $Z(f)$ of the zeros of $f$ has an accumulation point in $U$, then $f$ is identically $0$ in $U$.
Example 14.9 Consider the holomorphic function $f(z) = \sin\frac{1}{z}$, $f \colon U \to \mathbb{C}$, $U = \mathbb{C} \setminus \{0\}$, with the set of zeros $Z(f) = \{z_n = \frac{1}{n\pi} \mid n \in \mathbb{Z},\ n \neq 0\}$. The only accumulation point of $Z(f)$, namely $0$, does not belong to $U$. The proposition does not apply.
Proof. Suppose $a \in U$ is an accumulation point of $Z(f)$. Expand $f$ into a power series with center $a$:
$$f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n, \qquad |z-a| < r.$$
Since $a$ is an accumulation point of the zeros, there exists a sequence $(z_n)$ of zeros converging to $a$. Since $f$ is continuous at $a$, $\lim_{n\to\infty} f(z_n) = 0 = f(a)$. This shows $c_0 = 0$. The same argument works with the function
$$f_1(z) = \frac{f(z)}{z-a} = c_1 + c_2 (z-a) + c_3 (z-a)^2 + \cdots,$$
which is holomorphic in the same ball with center $a$ and has $a$ as an accumulation point of zeros. Hence $c_1 = 0$. In the same way we conclude that $c_2 = c_3 = \cdots = c_n = \cdots = 0$. This shows that $f$ is identically $0$ on $U_r(a)$. That is, the set
$$A = \{a \in U \mid a \text{ is an accumulation point of } Z(f)\}$$
is an open set. Also,
$$B = \{a \in U \mid a \text{ is not an accumulation point of } Z(f)\}$$
is open (with every non-accumulation point $z$, there is a whole neighborhood of $z$ not containing accumulation points of $Z(f)$). Now $U$ is the disjoint union of $A$ and $B$, both open as well as closed in $U$. Hence the characteristic function of $A$ is a locally constant function on $U$. Since $U$ is connected, either $U = A$ or $U = B$. Since by assumption $A$ is non-empty, $A = U$; that is, $f$ is identically $0$ on $U$.


Theorem 14.19 (Uniqueness Theorem) Suppose that $f$ and $g$ are both holomorphic functions on $U$ and $U$ is a region. Then the following are equivalent:
(a) $f = g$.
(b) The set $D = \{z \in U \mid f(z) = g(z)\}$ where $f$ and $g$ coincide has an accumulation point in $U$.
(c) There exists $z_0 \in U$ such that $f^{(n)}(z_0) = g^{(n)}(z_0)$ for all non-negative integers $n \in \mathbb{N}_0$.
Proof. (a) $\Leftrightarrow$ (b): Apply the previous proposition to the function $f - g$.
(a) implies (c) trivially. Suppose that (c) is satisfied. Then the power series expansion of $f - g$ at $z_0$ is identically $0$. In particular, the set $Z(f-g)$ contains a ball $B_\varepsilon(z_0)$, which has an accumulation point. Hence $f - g = 0$.
The following proposition is an immediate consequence of the uniqueness theorem.
Proposition 14.20 (Uniqueness of Analytic Continuation) Suppose that $M \subseteq U \subseteq \mathbb{C}$, where $U$ is a region and $M$ has an accumulation point in $U$. Let $g$ be a function on $M$ and suppose that $f$ is a holomorphic function on $U$ which extends $g$, that is, $f(z) = g(z)$ on $M$. Then $f$ is unique.
Remarks 14.9 (a) The previous proposition shows a quite amazing property of a holomorphic function: it is completely determined by very few values. This is in striking contrast to $C^\infty$-functions on the real line. For example, the hat function
$$h(x) = \begin{cases} e^{-\frac{1}{1-x^2}}, & |x| < 1, \\ 0, & |x| \geq 1 \end{cases}$$
is identically $0$ on $[2, 3]$ (a set with accumulation points); however, $h$ is not identically $0$. This shows that $h$ is not holomorphic.
(b) For the uniqueness theorem, it is an essential point that $U$ is connected.
(c) It is now clear that the real functions $e^x$, $\sin x$, and $\cos x$ have a unique analytic continuation into the complex plane.
(d) The algebra $\mathcal{O}(U)$ of holomorphic functions on a region $U$ is a domain, that is, $fg = 0$ implies $f = 0$ or $g = 0$. Indeed, suppose that $f(z_0) \neq 0$; then $f(z) \neq 0$ in a certain neighborhood of $z_0$ (by continuity of $f$). Then $g = 0$ on that neighborhood. Since an open set always has an accumulation point in itself, $g = 0$.

14.4 Singularities

We consider functions which are holomorphic in a punctured ball $\dot U_r(a) = U_r(a) \setminus \{a\}$. From information about the behaviour of the function near the center $a$, a number of interesting and useful results will be derived. In particular, we will use these results to evaluate certain improper integrals over the real line which cannot be evaluated by methods of real calculus.


14.4.1 Classification of Singularities


Throughout this subsection $U$ is a region, $a \in U$, and $f \colon U \setminus \{a\} \to \mathbb{C}$ is holomorphic.
Definition 14.4 (a) Let $f$ be holomorphic in $U \setminus \{a\}$, where $U$ is a region and $a \in U$. Then $a$ is said to be an isolated singularity of $f$.
(b) The point $a$ is called a removable singularity if there exists a holomorphic function $g \colon U_r(a) \to \mathbb{C}$ such that $g(z) = f(z)$ for all $z$ with $0 < |z-a| < r$.
Example 14.10 The functions $\frac{\sin z}{z}$, $\frac{1}{z}$, and $e^{1/z}$ all have isolated singularities at $0$. However, only $f(z) = \frac{\sin z}{z}$ has a removable singularity. The holomorphic function $g(z)$ which coincides with $f$ on $\mathbb{C} \setminus \{0\}$ is $g(z) = 1 - z^2/3! + z^4/5! - + \cdots$. Hence, redefining $f(0) := g(0) = 1$ makes $f$ holomorphic in $\mathbb{C}$. We will see later that the other two singularities are not removable. It is convenient to denote the new function $g$, with one more point in its domain (namely $a$), also by $f$.
Proposition 14.21 (Riemann 1851) Suppose that $f \colon U \setminus \{a\} \to \mathbb{C}$, $a \in U$, is holomorphic. Then $a$ is a removable singularity of $f$ if and only if there exists a punctured neighborhood $\dot U_r(a)$ where $f$ is bounded.
Proof. The necessity of the condition follows from the fact that a holomorphic function $g$ is continuous, and the continuous function $|g(z)|$ defined on the compact set $\overline{U_{r/2}(a)}$ is bounded; hence $f$ is bounded.
For the sufficiency we assume, without loss of generality, $a = 0$ (if $a$ is non-zero, consider the function $\tilde f(z) = f(z+a)$ instead). The function
$$h(z) = \begin{cases} z^2 f(z), & z \neq 0, \\ 0, & z = 0 \end{cases}$$
is holomorphic in $\dot U_r(0)$. Moreover, $h$ is differentiable at $0$, since $f$ is bounded in a neighborhood of $0$ and
$$h'(0) = \lim_{z\to 0} \frac{h(z) - h(0)}{z} = \lim_{z\to 0} z f(z) = 0.$$
Thus, $h$ can be expanded into a power series at $0$,
$$h(z) = c_0 + c_1 z + c_2 z^2 + c_3 z^3 + \cdots = c_2 z^2 + c_3 z^3 + \cdots,$$
with $c_0 = c_1 = 0$ since $h(0) = h'(0) = 0$. For non-zero $z$ we have
$$f(z) = \frac{h(z)}{z^2} = c_2 + c_3 z + c_4 z^2 + \cdots.$$
The right side defines a holomorphic function in a neighborhood of $0$ which coincides with $f$ for $z \neq 0$. Setting $f(0) = c_2$ removes the singularity at $0$.


Definition 14.5 (a) An isolated singularity $a$ of $f$ is called a pole of $f$ if there exist a positive integer $m \in \mathbb{N}$ and a holomorphic function $g \colon U_r(a) \to \mathbb{C}$ such that
$$f(z) = \frac{g(z)}{(z-a)^m}.$$
The smallest number $m$ such that $(z-a)^m f(z)$ has a removable singularity at $a$ is called the order of the pole.
(b) An isolated singularity $a$ of $f$ which is neither removable nor a pole is called an essential singularity.
(c) If $f$ is holomorphic at $a$ and there exist a positive integer $m$ and a holomorphic function $g$ such that $f(z) = (z-a)^m g(z)$ and $g(a) \neq 0$, then $a$ is called a zero of order $m$ of $f$.
Note that $m = 0$ in (a) corresponds to a removable singularity. If $f(z)$ has a zero of order $m$ at $a$, then $1/f(z)$ has a pole of order $m$ at $a$, and vice versa.
Example 14.11 The function $f(z) = 1/z^2$ has a pole of order $2$ at $z = 0$, since $z^2 f(z) = 1$ has a removable singularity at $0$ while $z f(z) = 1/z$ has not. The function $f(z) = (\cos z - 1)/z^3$ has a pole of order $1$ at $0$ since $(\cos z - 1)/z^3 = -1/(2z) + z/4! - + \cdots$.

14.4.2 Laurent Series

In a neighborhood of an isolated singularity a holomorphic function cannot, in general, be expanded into a power series; however, it can be expanded into a so-called Laurent series.

Definition 14.6 A Laurent series with center $a$ is a series of the form
$$\sum_{n=-\infty}^{\infty} c_n (z-a)^n,$$
or, more precisely, the pair of series
$$f_-(z) = \sum_{n=1}^{\infty} c_{-n} (z-a)^{-n} \qquad \text{and} \qquad f_+(z) = \sum_{n=0}^{\infty} c_n (z-a)^n.$$
The Laurent series is said to be convergent if both series converge.

14 Complex Analysis

388

[Figure: the annulus $r < |z-a| < R$ around $a$: $f_-(z)$ converges outside $U_r(a)$, $f_+(z)$ converges inside $U_R(a)$.]

Remark 14.10 (a) $f_-(z)$ is a power series in $\frac{1}{z-a}$. Thus, we can derive facts about the convergence of Laurent series from the convergence of power series. In fact, suppose that $1/r$ is the radius of convergence of the power series $\sum_{n=1}^{\infty} c_{-n} w^n$ and $R$ is the radius of convergence of the series $\sum_{n=0}^{\infty} c_n (z-a)^n$; then the Laurent series $\sum_{n\in\mathbb{Z}} c_n (z-a)^n$ converges in the annulus $A_{r,R}(a) = \{z \mid r < |z-a| < R\}$ and defines there a holomorphic function. The power series $f_+(z) = \sum_{n \geq 0} c_n (z-a)^n$ converges in the interior of the ball $U_R(a)$, whereas the series with negative powers, called the principal part of the Laurent series, $f_-(z) = \sum_{n < 0} c_n (z-a)^n$, converges in the exterior of the ball $U_r(a)$. Since both series must converge, $f(z)$ converges in the intersection of the two domains, which is the annulus $A_{r,R}(a)$.
The easiest way to determine the type of an isolated singularity is to use Laurent series, which are, roughly speaking, power series with both positive and negative powers of $z-a$.
Proposition 14.22 Suppose that $f$ is holomorphic in the open annulus $A_{r,R}(a) = \{z \mid r < |z-a| < R\}$. Then $f(z)$ has an expansion into a convergent Laurent series for $z \in A_{r,R}(a)$,
$$f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n + \sum_{n=1}^{\infty} c_{-n} \frac{1}{(z-a)^n} \tag{14.12}$$
with coefficients
$$c_n = \frac{1}{2\pi i} \int_{S_\rho(a)} \frac{f(z)}{(z-a)^{n+1}}\,dz, \qquad n \in \mathbb{Z}, \tag{14.13}$$
where $r < \rho < R$. The series converges uniformly on every annulus $A_{s_1,s_2}(a)$ with $r < s_1 \leq s_2 < R$.


Proof. Let $z$ be in the annulus $A_{s_1,s_2}(a)$ and let $\gamma$ be the closed path around $z$ in the annulus consisting of the two circles $S_{s_1}(a)$, $S_{s_2}(a)$ and two connecting bridges. By Cauchy's integral formula,
$$f(z) = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\,dw = f_1(z) + f_2(z) = \frac{1}{2\pi i} \int_{S_{s_2}(a)} \frac{f(w)}{w-z}\,dw - \frac{1}{2\pi i} \int_{S_{s_1}(a)} \frac{f(w)}{w-z}\,dw.$$
We consider the two functions
$$f_1(z) = \frac{1}{2\pi i} \int_{S_{s_2}(a)} \frac{f(w)}{w-z}\,dw, \qquad f_2(z) = -\frac{1}{2\pi i} \int_{S_{s_1}(a)} \frac{f(w)}{w-z}\,dw$$
separately.
In what follows we will see that $f_1(z)$ is a power series $\sum_{n=0}^{\infty} c_n (z-a)^n$ and that $f_2(z) = \sum_{n=1}^{\infty} c_{-n} \frac{1}{(z-a)^n}$. The first part is completely analogous to the proof of Theorem 14.9.
Case 1: $w \in S_{s_2}(a)$. Then $|z-a| < |w-a|$ and $|q| = \left| \frac{z-a}{w-a} \right| < 1$, such that
$$\frac{1}{w-z} = \frac{1}{w-a} \cdot \frac{1}{1 - \frac{z-a}{w-a}} = \frac{1}{w-a} \sum_{n=0}^{\infty} q^n = \sum_{n=0}^{\infty} \frac{(z-a)^n}{(w-a)^{n+1}}.$$
Since $f(w)$ is bounded on $S_{s_2}(a)$, the geometric series has a converging numerical upper bound. Hence the series converges uniformly with respect to $w$; we can exchange integration and summation:
$$f_1(z) = \frac{1}{2\pi i} \int_{S_{s_2}(a)} \sum_{n=0}^{\infty} \frac{(z-a)^n}{(w-a)^{n+1}}\, f(w)\,dw = \sum_{n=0}^{\infty} \left( \frac{1}{2\pi i} \int_{S_{s_2}(a)} \frac{f(w)\,dw}{(w-a)^{n+1}} \right) (z-a)^n = \sum_{n=0}^{\infty} c_n (z-a)^n,$$
where $c_n = \frac{1}{2\pi i} \int_{S_{s_2}(a)} \frac{f(w)\,dw}{(w-a)^{n+1}}$ are the coefficients of the power series $f_1(z)$.

Case 2: $w \in S_{s_1}(a)$. Then $|z-a| > |w-a|$ and $\left| \frac{w-a}{z-a} \right| < 1$, such that
$$\frac{1}{w-z} = -\frac{1}{z-a} \cdot \frac{1}{1 - \frac{w-a}{z-a}} = -\frac{1}{z-a} \sum_{n=0}^{\infty} \left( \frac{w-a}{z-a} \right)^n = -\sum_{n=0}^{\infty} \frac{(w-a)^n}{(z-a)^{n+1}}.$$
Since $f(w)$ is bounded on $S_{s_1}(a)$, the geometric series has a converging numerical upper bound. Hence the series converges uniformly with respect to $w$; we can exchange integration and summation (the two minus signs cancel):
$$f_2(z) = \frac{1}{2\pi i} \int_{S_{s_1}(a)} \sum_{n=0}^{\infty} \frac{(w-a)^n}{(z-a)^{n+1}}\, f(w)\,dw = \sum_{n=1}^{\infty} \left( \frac{1}{2\pi i} \int_{S_{s_1}(a)} f(w)(w-a)^{n-1}\,dw \right) \frac{1}{(z-a)^n},$$
where $c_{-n} = \frac{1}{2\pi i} \int_{S_{s_1}(a)} f(w)(w-a)^{n-1}\,dw$ are the coefficients of the series $f_2(z)$.

Since the integrand $\frac{f(w)}{(w-a)^k}$, $k \in \mathbb{Z}$, is holomorphic in both annuli $A_{s_1,\rho}$ and $A_{\rho,s_2}$, by Remark 14.4 (c)
$$\int_{S_{s_2}(a)} \frac{f(w)\,dw}{(w-a)^k} = \int_{S_\rho(a)} \frac{f(w)\,dw}{(w-a)^k} \qquad \text{and} \qquad \int_{S_{s_1}(a)} \frac{f(w)\,dw}{(w-a)^k} = \int_{S_\rho(a)} \frac{f(w)\,dw}{(w-a)^k},$$
that is, in the coefficient formulas we can replace both circles $S_{s_1}(a)$ and $S_{s_2}(a)$ by a common circle $S_\rho(a)$. Since a power series converges uniformly on every compact subset of its disc of convergence, the last assertion follows.

Remark 14.11 The Laurent series of $f$ on $A_{r,R}(a)$ is unique; its coefficients $c_n$, $n \in \mathbb{Z}$, are uniquely determined by (14.13). Another value of $\rho$ with $r < \rho < R$ yields the same values $c_n$, by Remark 14.4 (c).
Example 14.12 Find the Laurent expansion of $f(z) = \frac{2}{z^2 - 4z + 3}$ in the three annuli with center $0$:
$$0 < |z| < 1, \qquad 1 < |z| < 3, \qquad 3 < |z|.$$
Using partial fraction decomposition, $f(z) = \frac{1}{1-z} - \frac{1}{3-z}$.
(a) In the case $|z| < 1$ we find
$$\frac{1}{1-z} = \sum_{n=0}^{\infty} z^n, \qquad \frac{1}{3-z} = \frac{1}{3} \cdot \frac{1}{1 - \frac{z}{3}} = \frac{1}{3} \sum_{n=0}^{\infty} \left( \frac{z}{3} \right)^n.$$
Hence,
$$f(z) = \sum_{n=0}^{\infty} \left( 1 - \frac{1}{3^{n+1}} \right) z^n, \qquad |z| < 1.$$
(b) In the case $|z| > 1$,
$$\frac{1}{1-z} = -\frac{1}{z} \cdot \frac{1}{1 - \frac{1}{z}} = -\sum_{n=0}^{\infty} \frac{1}{z^{n+1}},$$
and, as in (a), for $|z| < 3$
$$\frac{1}{3-z} = \frac{1}{3} \sum_{n=0}^{\infty} \left( \frac{z}{3} \right)^n,$$


such that
$$f(z) = -\sum_{n=0}^{\infty} \frac{1}{z^{n+1}} - \sum_{n=0}^{\infty} \frac{1}{3^{n+1}}\, z^n, \qquad 1 < |z| < 3.$$
(c) In the case $|z| > 3$ we have
$$\frac{1}{z-3} = \frac{1}{z} \cdot \frac{1}{1 - \frac{3}{z}} = \sum_{n=0}^{\infty} \frac{3^n}{z^{n+1}},$$
such that
$$f(z) = \sum_{n=1}^{\infty} \left( 3^{n-1} - 1 \right) \frac{1}{z^n}.$$
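The middle expansion can be checked numerically by comparing a partial sum of the Laurent series against $f$ at points in the annulus $1 < |z| < 3$; the truncation length and the sample points are arbitrary choices.

```python
def f(z):
    """f(z) = 2 / (z^2 - 4z + 3), the function of Example 14.12."""
    return 2 / (z * z - 4 * z + 3)

def laurent_middle(z, N=80):
    """Partial sum of the Laurent series valid in 1 < |z| < 3:
    f(z) = -sum_{n>=0} z^(-n-1) - sum_{n>=0} z^n / 3^(n+1)."""
    principal = -sum(z ** (-n - 1) for n in range(N))
    regular = -sum(z ** n / 3 ** (n + 1) for n in range(N))
    return principal + regular

for z in (1.5 + 0.5j, -2.0 + 0j, 0.1 + 2j):  # all with 1 < |z| < 3
    assert abs(laurent_middle(z) - f(z)) < 1e-8
```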

We want to study the behaviour of $f$ in a neighborhood of an essential singularity $a$. It is characterized by the following theorem; for the proof see Conway, [Con78, p. 300].
Theorem 14.23 (Great Picard Theorem (1879)) Suppose that $f(z)$ is holomorphic in the annulus $G = \{z \mid 0 < |z-a| < r\}$ with an essential singularity at $a$. Then there exists a complex number $w_1$ with the following property: for any complex number $w \neq w_1$, there are infinitely many $z \in G$ with $f(z) = w$.
In other words, in every neighborhood of $a$ the function $f(z)$ takes all complex values with possibly one omission. In the case of $f(z) = e^{1/z}$ the number $0$ is omitted; in the case of $f(z) = \sin\frac{1}{z}$ no complex number is omitted.
We will prove a much weaker form of this statement.
Proposition 14.24 (Casorati–Weierstraß) Suppose that $f(z)$ is holomorphic in the annulus $G = \{z \mid 0 < |z-a| < r\}$ with an essential singularity at $a$. Then the image of any neighborhood of $a$ in $G$ is dense in $\mathbb{C}$; that is, for every $w \in \mathbb{C}$ and any $\varepsilon > 0$ and $\delta > 0$ there exists $z \in G$ such that $|z-a| < \delta$ and $|f(z) - w| < \varepsilon$.
Proof. For simplicity, assume that $\delta < r$. Assume, to the contrary, that there exist $w \in \mathbb{C}$ and $\varepsilon > 0$ such that $|f(z) - w| \geq \varepsilon$ for all $z \in \dot U_\delta(a)$. Then the function
$$g(z) = \frac{1}{f(z) - w}, \qquad z \in \dot U_\delta(a),$$
is bounded (by $1/\varepsilon$) in some punctured neighborhood of $a$; hence, by Proposition 14.21, $a$ is a removable singularity of $g(z)$. We conclude that
$$f(z) = \frac{1}{g(z)} + w$$
has a removable singularity at $a$ if $g(a) \neq 0$. If, on the other hand, $g(z)$ has a zero at $a$ of order $m$, that is,
$$g(z) = \sum_{n=m}^{\infty} c_n (z-a)^n, \qquad c_m \neq 0,$$


then the function $(z-a)^m f(z)$ has a removable singularity at $a$; thus $f$ has a pole of order $m$ at $a$. Both conclusions contradict our assumption that $f$ has an essential singularity at $a$.
The Laurent expansion establishes an easy classification of the singularity of $f$ at $a$. We summarize the main facts about isolated singularities.

Proposition 14.25 Suppose that $f(z)$ is holomorphic in the punctured disc $\dot U = \dot U_R(a)$ and possesses there the Laurent expansion $f(z) = \sum_{n=-\infty}^{\infty} c_n (z-a)^n$. Then the singularity at $a$
(a) is removable if $c_n = 0$ for all $n < 0$. In this case, $|f(z)|$ is bounded in $\dot U$.
(b) is a pole of order $m$ if $c_{-m} \neq 0$ and $c_n = 0$ for all $n < -m$. In this case, $\lim_{z\to a} |f(z)| = +\infty$.
(c) is an essential singularity if $c_n \neq 0$ for infinitely many $n < 0$. In this case, $|f(z)|$ has no finite or infinite limit as $z \to a$.
The easy proof is left to the reader. Note that Casorati–Weierstraß implies that $|f(z)|$ has no limit at $a$ in case (c).
Example 14.13 $f(z) = e^{1/z}$ has in $\mathbb{C} \setminus \{0\}$ the Laurent expansion
$$e^{1/z} = \sum_{n=0}^{\infty} \frac{1}{n!\, z^n}, \qquad |z| > 0.$$
Since $c_{-n} \neq 0$ for all $n \geq 0$, $f$ has an essential singularity at $0$.

14.5 Residues

Throughout, $U \subseteq \mathbb{C}$ is an open connected subset of $\mathbb{C}$.

Definition 14.7 Suppose that $f \colon \dot U_r(a) \to \mathbb{C}$ is holomorphic, $0 < r_1 < r$, and let
$$f(z) = \sum_{n \in \mathbb{Z}} c_n (z-a)^n, \qquad c_n = \frac{1}{2\pi i} \int_{S_{r_1}(a)} \frac{f(z)\,dz}{(z-a)^{n+1}}$$
be the Laurent expansion of $f$ in the annulus $\{z \mid 0 < |z-a| < r\}$. Then the coefficient
$$c_{-1} = \frac{1}{2\pi i} \int_{S_{r_1}(a)} f(z)\,dz$$
is called the residue of $f$ at $a$ and is denoted by $\operatorname{Res}_a f(z)$ or $\operatorname{Res}_a f$.
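The defining integral can be approximated directly by sampling the circle $S_r(a)$; this is only a numerical sketch (the radius and number of sample points are arbitrary choices), but for functions holomorphic near the circle the periodic Riemann sum is extremely accurate.

```python
import cmath

def residue(f, a, r=0.5, n=4096):
    """Approximate Res_a f = (1/(2*pi*i)) * integral of f over S_r(a)
    via a Riemann sum over the circle z = a + r*e^(it)."""
    total = 0j
    for k in range(n):
        t = 2 * cmath.pi * k / n
        total += f(a + r * cmath.exp(1j * t)) * 1j * r * cmath.exp(1j * t)
    return total * (2 * cmath.pi / n) / (2j * cmath.pi)

# e^z / z = 1/z + 1 + z/2 + ... has residue 1 at 0;
# 1/(z-2)^2 has residue 0 at 2 (no 1/(z-2) term in its Laurent series).
assert abs(residue(lambda z: cmath.exp(z) / z, 0) - 1) < 1e-10
assert abs(residue(lambda z: 1 / (z - 2) ** 2, 2)) < 1e-10
```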


Remarks 14.12 (a) If $f$ is holomorphic at $a$, then $\operatorname{Res}_a f = 0$ by Cauchy's theorem.
(b) The integral $\int_{S_{r_1}(a)} f(z)\,dz$ depends only on the coefficient $c_{-1}$ in the Laurent expansion of $f(z)$ around $a$. Indeed, every summand $c_n (z-a)^n$ with $n \neq -1$ has an antiderivative in $U \setminus \{a\}$, so that its integral over a closed path is $0$.
(c) $\operatorname{Res}_a f + \operatorname{Res}_a g = \operatorname{Res}_a (f+g)$ and $\operatorname{Res}_a (\lambda f) = \lambda \operatorname{Res}_a f$.

Theorem 14.26 (Residue Theorem) Suppose that $f \colon U \setminus \{a_1, \dots, a_m\} \to \mathbb{C}$, $a_1, \dots, a_m \in U$, is holomorphic. Further, let $\gamma$ be a non-self-intersecting, positively oriented, closed curve in $U$ such that the points $a_1, \dots, a_m$ are in the inner part of $\gamma$. Then
$$\int_\gamma f(z)\,dz = 2\pi i \sum_{k=1}^{m} \operatorname{Res}_{a_k} f. \tag{14.14}$$
Proof. As in Remark 14.4 we can replace $\int_\gamma$ by the sum of integrals over small circles, one around each singularity. As before, we obtain
$$\int_\gamma f(z)\,dz = \sum_{k=1}^{m} \int_{S_\varepsilon(a_k)} f(z)\,dz,$$
where all circles are positively oriented. Applying the definition of the residue, we obtain the assertion.

Remarks 14.13 (a) The residue theorem generalizes Cauchy's Theorem, see Theorem 14.6. Indeed, if $f(z)$ possesses an analytic continuation to the points $a_1, \dots, a_m$, all the residues are zero and therefore $\int_\gamma f(z)\,dz = 0$.
(b) If $g(z)$ is holomorphic in the region $U$, $g(z) = \sum_{n=0}^{\infty} c_n (z-a)^n$, $c_0 = g(a)$, then
$$f(z) = \frac{g(z)}{z-a}, \qquad z \in U \setminus \{a\},$$
is holomorphic in $U \setminus \{a\}$ with Laurent expansion around $a$
$$f(z) = \frac{c_0}{z-a} + c_1 + c_2 (z-a) + \cdots,$$
where $c_0 = g(a) = \operatorname{Res}_a f$. The residue theorem gives
$$\int_{S_r(a)} \frac{g(z)}{z-a}\,dz = 2\pi i \operatorname{Res}_a f = 2\pi i\, c_0 = 2\pi i\, g(a).$$
We have recovered Cauchy's integral formula.


14.5.1 Calculating Residues

(a) Pole of order 1
As in the previous remark, suppose that $f$ has a pole of order $1$ at $a$ and that $g(z) = (z-a) f(z)$ is the corresponding holomorphic function in $U_r(a)$. Then
$$\operatorname{Res}_a f = g(a) = \lim_{z\to a,\, z \neq a} g(z) = \lim_{z\to a} (z-a) f(z). \tag{14.15}$$

(b) Pole of order $m$
Suppose that $f$ has a pole of order $m$ at $a$. Then
$$f(z) = \frac{c_{-m}}{(z-a)^m} + \frac{c_{-m+1}}{(z-a)^{m-1}} + \cdots + \frac{c_{-1}}{z-a} + c_0 + c_1 (z-a) + \cdots, \qquad 0 < |z-a| < r, \tag{14.16}$$
is the Laurent expansion of $f$ around $a$. Multiplying (14.16) by $(z-a)^m$ yields a holomorphic function
$$(z-a)^m f(z) = c_{-m} + c_{-m+1}(z-a) + \cdots + c_{-1}(z-a)^{m-1} + c_0 (z-a)^m + \cdots, \qquad |z-a| < r.$$
Differentiating this $(m-1)$ times, all terms with coefficients $c_{-m}, c_{-m+1}, \dots, c_{-2}$ vanish, and we are left with the power series
$$\frac{d^{m-1}}{dz^{m-1}} \bigl( (z-a)^m f(z) \bigr) = (m-1)!\, c_{-1} + m(m-1)\cdots 2\; c_0 (z-a) + \cdots.$$
Inserting $z = a$ on the left, we obtain $c_{-1}$. However, on the left we have to take the limit $z \to a$, since $f$ is not defined at $a$. Thus, if $f$ has a pole of order $m$ at $a$,
$$\operatorname{Res}_a f(z) = \frac{1}{(m-1)!} \lim_{z\to a} \frac{d^{m-1}}{dz^{m-1}} \bigl( (z-a)^m f(z) \bigr). \tag{14.17}$$

(c) Quotients of Holomorphic Functions
Suppose that $f = \frac{p}{q}$, where $p$ and $q$ are holomorphic at $a$ and $q$ has a zero of order $1$ at $a$, that is, $q(a) = 0 \neq q'(a)$. Then, by (a),
$$\operatorname{Res}_a \frac{p}{q} = \lim_{z\to a} (z-a)\, \frac{p(z)}{q(z)} = \lim_{z\to a} \frac{p(z)}{\frac{q(z)-q(a)}{z-a}} = \frac{\lim_{z\to a} p(z)}{\lim_{z\to a} \frac{q(z)-q(a)}{z-a}} = \frac{p(a)}{q'(a)}. \tag{14.18}$$

Example 14.14 Compute $\int_{S_1(i)} \frac{dz}{1+z^4}$. The only singularities of $f(z) = 1/(1+z^4)$ inside the disc $\{z \mid |z-i| < 1\}$ are $a_1 = e^{i\pi/4} = (1+i)/\sqrt{2}$ and $a_2 = e^{3i\pi/4} = (-1+i)/\sqrt{2}$. Indeed, $|a_1 - i|^2 = 2 - \sqrt{2} < 1$. We apply the Residue Theorem and (c) and obtain
$$\int_{S_1(i)} \frac{dz}{1+z^4} = 2\pi i \left( \operatorname{Res}_{a_1} f + \operatorname{Res}_{a_2} f \right) = 2\pi i \left( \frac{1}{4a_1^3} + \frac{1}{4a_2^3} \right) = -2\pi i\, \frac{a_1 + a_2}{4} = \frac{\pi}{\sqrt{2}},$$
since $a_k^4 = -1$ gives $1/a_k^3 = -a_k$, and $a_1 + a_2 = \sqrt{2}\, i$.
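The value $\pi/\sqrt{2}$ can be cross-checked by evaluating the contour integral numerically; the helper name and the number of sample points are arbitrary choices.

```python
import cmath
import math

def contour_integral(f, center, r, n=4096):
    """Riemann-sum approximation of the integral of f over the
    positively oriented circle S_r(center)."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        total += f(center + r * cmath.exp(1j * t)) * 1j * r * cmath.exp(1j * t)
    return total * (2 * math.pi / n)

val = contour_integral(lambda z: 1 / (1 + z ** 4), 1j, 1.0)
assert abs(val - math.pi / math.sqrt(2)) < 1e-6
```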

14.6 Real Integrals

14.6.1 Rational Functions in Sine and Cosine

Suppose we have to compute the integral of such a function over a full period $[0, 2\pi]$. The idea is to replace $t$ by $z = e^{it}$ on the unit circle, $\cos t$ and $\sin t$ by $(z + 1/z)/2$ and $(z - 1/z)/(2i)$, respectively, and finally $dt$ by $dz/(iz)$.
Proposition 14.27 Suppose that $R(x, y)$ is a rational function in two variables and $R(\cos t, \sin t)$ is defined for all $t \in [0, 2\pi]$. Then
$$\int_0^{2\pi} R(\cos t, \sin t)\,dt = 2\pi i \sum_{a \in U_1(0)} \operatorname{Res}_a f(z), \tag{14.19}$$
where
$$f(z) = \frac{1}{iz}\, R\!\left( \frac{1}{2}\left( z + \frac{1}{z} \right), \frac{1}{2i}\left( z - \frac{1}{z} \right) \right)$$
and the sum is over all isolated singularities of $f(z)$ in the open unit ball.
Proof. By the residue theorem,
$$\int_{S_1(0)} f(z)\,dz = 2\pi i \sum_{a \in U_1(0)} \operatorname{Res}_a f.$$
Let $z = e^{it}$ for $t \in [0, 2\pi]$. Rewriting the integral on the left using $dz = i e^{it}\,dt = iz\,dt$,
$$\int_{S_1(0)} f(z)\,dz = \int_0^{2\pi} R(\cos t, \sin t)\,dt,$$
which completes the proof.

Example 14.15 For $|a| < 1$,
$$\int_0^{2\pi} \frac{dt}{1 - 2a\cos t + a^2} = \frac{2\pi}{1 - a^2}.$$
For $a = 0$ the statement is trivially true; suppose now $a \neq 0$. Indeed, the complex function corresponding to the integrand is
$$f(z) = \frac{1}{iz\left( 1 + a^2 - az - \frac{a}{z} \right)} = \frac{1}{i\left( -az^2 + (1+a^2)z - a \right)} = \frac{i/a}{(z-a)\left( z - \frac{1}{a} \right)}.$$
In the unit disc, $f(z)$ has exactly one pole of order $1$, namely $z = a$. By (14.15), the formula in Subsection 14.5.1,
$$\operatorname{Res}_a f = \lim_{z\to a} (z-a) f(z) = \frac{i/a}{a - \frac{1}{a}} = \frac{i}{a^2 - 1};$$
the assertion follows from the proposition:
$$\int_0^{2\pi} \frac{dt}{1 - 2a\cos t + a^2} = 2\pi i\, \frac{i}{a^2 - 1} = \frac{2\pi}{1 - a^2}.$$
Specializing $R = 1$ and $r = a$ in Homework 49.1, we obtain the same formula.
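The identity can be verified numerically with a plain Riemann sum over the period (which coincides with the trapezoid rule for periodic integrands and is very accurate here); the value of $a$ and the number of sample points are arbitrary choices.

```python
import math

def periodic_integral(g, n=20000):
    """Riemann sum of g over [0, 2*pi]; highly accurate for smooth
    2*pi-periodic integrands."""
    h = 2 * math.pi / n
    return h * sum(g(k * h) for k in range(n))

a = 0.4  # any |a| < 1 works; 0.4 is just a test value
lhs = periodic_integral(lambda t: 1 / (1 - 2 * a * math.cos(t) + a * a))
assert abs(lhs - 2 * math.pi / (1 - a * a)) < 1e-9
```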

14.6.2 Integrals of the Form $\int_{-\infty}^{\infty} f(x)\,dx$

(a) The Principal Value
We often compute improper integrals of the form $\int_{-\infty}^{\infty} f(x)\,dx$. Using the residue theorem, we calculate limits
$$\lim_{R\to\infty} \int_{-R}^{R} f(x)\,dx, \tag{14.20}$$
which is called the principal value (or Cauchy mean value) of the integral over $\mathbb{R}$; we denote it by
$$\mathrm{Vp} \int_{-\infty}^{\infty} f(x)\,dx.$$
The existence of the coupled limit (14.20) in general does not imply the existence of the improper integral
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{R\to\infty} \int_{-R}^{0} f(x)\,dx + \lim_{R'\to\infty} \int_0^{R'} f(x)\,dx.$$
For example, $\mathrm{Vp} \int_{-\infty}^{\infty} x\,dx = 0$, whereas $\int_{-\infty}^{\infty} x\,dx$ does not exist, since $\int_0^{\infty} x\,dx = +\infty$. In general, the existence of the improper integral implies the existence of the principal value. If $f$ is an even function or $f(x) \geq 0$, the existence of the principal value implies the existence of the improper integral.
(b) Rational Functions
The main idea to evaluate the integral $\int_{\mathbb{R}} f(x)\,dx$ is as follows. Let $H = \{z \mid \operatorname{Im}(z) > 0\}$ be the upper half-plane and $f \colon H \setminus \{a_1, \dots, a_m\} \to \mathbb{C}$ be holomorphic. Choose $R > 0$ large enough such that $|a_k| < R$ for all $k = 1, \dots, m$; that is, all isolated singularities of $f$ are in the upper half-disc of radius $R$ around $0$. Consider the closed path consisting of the segment from $-R$ to $R$ on the real line and the half-circle $\Gamma_R$ of radius $R$ in $H$. By the residue theorem,
$$\int_{-R}^{R} f(x)\,dx + \int_{\Gamma_R} f(z)\,dz = 2\pi i \sum_{k=1}^{m} \operatorname{Res}_{a_k} f(z).$$
If
$$\lim_{R\to\infty} \int_{\Gamma_R} f(z)\,dz = 0, \tag{14.21}$$
the above formula implies
$$\lim_{R\to\infty} \int_{-R}^{R} f(x)\,dx = 2\pi i \sum_{k=1}^{m} \operatorname{Res}_{a_k} f.$$
Knowing the existence of the improper integral $\int_{-\infty}^{\infty} f(x)\,dx$, one has
$$\int_{-\infty}^{\infty} f(x)\,dx = 2\pi i \sum_{k=1}^{m} \operatorname{Res}_{a_k} f.$$
Suppose that $f = \frac{p}{q}$ is a rational function such that $q$ has no real zeros and $\deg q \geq \deg p + 2$. Then (14.21) is satisfied. Indeed, since only the two leading terms of $p$ and $q$ determine the limit behaviour of $f(z)$ for $|z| \to \infty$, there exists $C > 0$ with $\left| \frac{p(z)}{q(z)} \right| \leq \frac{C}{R^2}$ on $\Gamma_R$. Using the estimate $\ell(\gamma)\, M(\gamma)$ from Remark 14.3 (c) we get
$$\left| \int_{\Gamma_R} \frac{p(z)}{q(z)}\,dz \right| \leq \frac{C}{R^2}\,(\pi R) = \frac{\pi C}{R} \longrightarrow 0 \quad (R \to \infty).$$
For the same reason, namely $|p(x)/q(x)| \leq C/x^2$ for large $x$, the improper real integral exists (comparison test) and converges absolutely. Thus, we have shown the following proposition.

Proposition 14.28 Suppose that $p$ and $q$ are polynomials with $\deg q \geq \deg p + 2$, $q$ has no real zeros, and $a_1, \dots, a_m$ are all poles of the rational function $f(z) = \frac{p(z)}{q(z)}$ in the open upper half-plane $H$. Then
$$\int_{-\infty}^{\infty} f(x)\,dx = 2\pi i \sum_{k=1}^{m} \operatorname{Res}_{a_k} f.$$

Example 14.16 (a) $\int_{-\infty}^{\infty} \frac{dx}{1+x^2}$. The only zero of $q(z) = z^2 + 1$ in $H$ is $a_1 = i$, and $\deg(1+z^2) = 2 \geq \deg(1) + 2$, such that
$$\int_{-\infty}^{\infty} \frac{dx}{1+x^2} = 2\pi i \operatorname{Res}_i \frac{1}{1+z^2} = 2\pi i \left. \frac{1}{2z} \right|_{z=i} = \pi.$$
(b) It follows from Example 14.14 that
$$\int_{-\infty}^{\infty} \frac{dx}{1+x^4} = 2\pi i \left( \operatorname{Res}_{a_1} f + \operatorname{Res}_{a_2} f \right) = \frac{\pi}{\sqrt{2}}.$$
(c) We compute the integral $\int_0^{\infty} \frac{dt}{1+t^6}$. Since the integrand is even, $\int_0^{\infty} \frac{dt}{1+t^6} = \frac{1}{2} \int_{-\infty}^{\infty} \frac{dt}{1+t^6}$. The zeros of $q(z) = z^6 + 1$ in the upper half-plane are $a_1 = e^{i\pi/6}$, $a_2 = e^{i\pi/2} = i$, and $a_3 = e^{5i\pi/6}$. They are all of multiplicity $1$, such that Formula (14.18) applies:
$$\operatorname{Res}_{a_k} \frac{1}{q(z)} = \frac{1}{q'(a_k)} = \frac{1}{6 a_k^5} = -\frac{a_k}{6}.$$
By Proposition 14.28 and noting that $a_1 + a_3 = i$,
$$\int_{-\infty}^{\infty} \frac{dt}{1+t^6} = 2\pi i \left( -\frac{a_1 + a_2 + a_3}{6} \right) = 2\pi i \left( -\frac{2i}{6} \right) = \frac{2\pi}{3}, \qquad \int_0^{\infty} \frac{dt}{1+t^6} = \frac{\pi}{3}.$$
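The value $2\pi/3$ can be checked by truncating the improper integral at a large $R$ and applying the midpoint rule; $R$ and the number of sample points are arbitrary numerical choices that must match the decay of the integrand.

```python
import math

def truncated_line_integral(f, R=200.0, n=400000):
    """Midpoint-rule value of the integral of f over [-R, R]."""
    h = 2 * R / n
    return h * sum(f(-R + (k + 0.5) * h) for k in range(n))

# 1/(1+x^6) decays like x^(-6), so the tail beyond R = 200 is negligible.
val = truncated_line_integral(lambda x: 1 / (1 + x ** 6))
assert abs(val - 2 * math.pi / 3) < 1e-5
```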
(c) Functions of Type $g(z)\, e^{i\lambda z}$
Proposition 14.29 Suppose that $p$ and $q$ are polynomials with $\deg q \geq \deg p + 1$, $q$ has no real zeros, and $a_1, \dots, a_m$ are all poles of the rational function $g = \frac{p}{q}$ in the open upper half-plane $H$. Put $f(z) = g(z)\, e^{i\lambda z}$, where $\lambda \in \mathbb{R}$ is positive, $\lambda > 0$. Then
$$\int_{-\infty}^{\infty} f(x)\,dx = 2\pi i \sum_{k=1}^{m} \operatorname{Res}_{a_k} f.$$
Proof. (The proof was omitted in the lecture.)
Instead of a semi-circle it is more appropriate to consider a rectangle now.
(Figure: the rectangular contour with vertices −r, r, r + ir, and −r + ir.)

According to the residue theorem,
\[
\int_{-r}^{r} f(x)\,dx + \int_{r}^{r+ir} f(z)\,dz + \int_{r+ir}^{-r+ir} f(z)\,dz + \int_{-r+ir}^{-r} f(z)\,dz
= 2\pi i \sum_{k=1}^{m} \operatorname{Res}_{a_k} f.
\]





Since deg q ≥ deg p + 1, \(\lim_{z\to\infty} \frac{p(z)}{q(z)} = 0\). Thus,
\[
s_r = \sup_{|z| \ge r} \left| \frac{p(z)}{q(z)} \right|
\]
exists and tends to 0 as r → ∞.

14.6 Real Integrals


Consider the second integral I₂ with z = r + it, t ∈ [0, r], dz = i dt. On this segment we have
the following estimate,
\[
\left| \frac{p(z)}{q(z)}\, e^{i\alpha (r+it)} \right| \le s_r\, e^{-\alpha t},
\]
which implies
\[
| I_2 | \le s_r \int_0^{r} e^{-\alpha t}\,dt = \frac{s_r}{\alpha} \left( 1 - e^{-\alpha r} \right) \le \frac{s_r}{\alpha}.
\]
A similar estimate holds for the fourth integral from −r + ir to −r. In case of the third integral
one has z = t + ir, t ∈ [−r, r], dz = dt, such that
\[
| I_3 | \le \int_{-r}^{r} s_r \left| e^{i\alpha (t+ir)} \right| dt = \int_{-r}^{r} s_r\, e^{-\alpha r}\,dt = 2 r s_r\, e^{-\alpha r}.
\]
Since 2re^{−αr} is bounded and s_r → 0 as r → ∞, all three integrals I₂, I₃, and I₄ tend to 0 as
r → ∞. This completes the proof.

Example 14.17 For a > 0,
\[
\int_0^{\infty} \frac{\cos t}{t^2+a^2}\,dt = \frac{\pi}{2a}\, e^{-a}.
\]
Obviously,
\[
\int_0^{\infty} \frac{\cos t}{t^2+a^2}\,dt = \operatorname{Re} \left( \frac{1}{2} \int_{-\infty}^{\infty} \frac{e^{it}}{t^2+a^2}\,dt \right).
\]
The function f(z) = e^{iz}/(z² + a²) has a single pole of order 1 in the upper half-plane at z = ai. By
formula (14.18),
\[
\operatorname{Res}_{ai} \frac{e^{iz}}{z^2+a^2} = \left. \frac{e^{iz}}{2z} \right|_{z=ai} = \frac{e^{-a}}{2ai}.
\]
Proposition 14.29 gives the result.
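As a sanity check, the residue at z = ai can also be approximated numerically from its defining limit and compared with the closed form e^{−a}/(2ai). A Python sketch (ours; the value a = 1.5 is an arbitrary sample):

```python
import cmath
import math

a = 1.5                                           # sample value of the parameter a > 0
f = lambda z: cmath.exp(1j * z) / (z**2 + a**2)

# Simple pole at z = ai: Res f = lim_{z->ai} (z - ai) f(z); approximate the limit.
z0 = a * 1j
h = 1e-6
res_numeric = h * f(z0 + h)
res_exact = cmath.exp(1j * z0) / (2 * z0)         # e^{iz}/(2z) at z = ai, i.e. e^{-a}/(2ai)

integral = (2j * math.pi * res_exact).real        # = pi * e^{-a} / a, by Proposition 14.29
half = 0.5 * integral                             # = int_0^inf cos t / (t^2 + a^2) dt
print(half, math.pi / (2 * a) * math.exp(-a))
```

The two printed numbers agree, confirming the value π e^{−a}/(2a) of Example 14.17.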


(d) A Fourier Transformation
Lemma 14.30 For a ∈ R,
\[
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}x^2}\, e^{-iax}\,dx = e^{-\frac{1}{2}a^2}. \tag{14.22}
\]

Proof. Let f(z) = e^{−½z²}, z ∈ C, and consider the closed rectangular path γ = γ₁ + γ₂ + γ₃ + γ₄
with vertices −R, R, R + ai, −R + ai, as in the picture. Since f is an entire function, by
Cauchy's theorem ∫_γ f(z) dz = 0. Note that γ₂ is parametrized as z = R + ti, t ∈ [0, a],
dz = i dt, such that
\[
\int_{\gamma_2} f(z)\,dz = \int_0^{a} e^{-\frac{1}{2}(R+it)^2}\, i\,dt = \int_0^{a} e^{-\frac{1}{2}(R^2 + 2Rit - t^2)}\, i\,dt,
\]
\[
\left| \int_{\gamma_2} f(z)\,dz \right| \le \int_0^{a} e^{-\frac{1}{2}R^2 + \frac{1}{2}t^2}\,dt
= e^{-\frac{1}{2}R^2} \int_0^{a} e^{\frac{1}{2}t^2}\,dt = C\, e^{-\frac{1}{2}R^2}.
\]
Since e^{−½R²} tends to 0 as R → ∞, the above integral tends to 0 as well; hence
lim_{R→∞} ∫_{γ₂} f(z) dz = 0. Similarly, one can show that lim_{R→∞} ∫_{γ₄} f(z) dz = 0. Since ∫_γ f(z) dz = 0,
we have ∫_{γ₁+γ₃} f(z) dz = 0, that is,
\[
\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} f(x+ai)\,dx.
\]
Using
\[
\int_{-\infty}^{\infty} e^{-\frac{1}{2}x^2}\,dx = \sqrt{2\pi},
\]

which follows from Example 14.7, page 374, or from homework 41.3, we have
\[
\sqrt{2\pi} = \int_{\mathbb{R}} e^{-\frac{1}{2}x^2}\,dx
= \int_{\mathbb{R}} e^{-\frac{1}{2}(x^2 + 2iax - a^2)}\,dx
= e^{\frac{1}{2}a^2} \int_{\mathbb{R}} e^{-\frac{1}{2}x^2 - iax}\,dx,
\]
hence
\[
\frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-\frac{1}{2}x^2 - iax}\,dx = e^{-\frac{1}{2}a^2}.
\]
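Formula (14.22) can be verified numerically. Since the imaginary part of the integrand is odd, only the cosine part contributes; a Python sketch (the function name gauss_ft is ours) approximates the integral with the trapezoidal rule:

```python
import math

def gauss_ft(a, half_width=12.0, n=24000):
    # Trapezoidal approximation of (1/sqrt(2*pi)) * int e^{-x^2/2} cos(a*x) dx
    # over [-half_width, half_width]; the tail beyond |x| = 12 is ~e^{-72}, negligible.
    h = 2 * half_width / n
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * math.exp(-0.5 * x * x) * math.cos(a * x)
    return total * h / math.sqrt(2 * math.pi)

for a in (0.0, 1.0, 2.5):
    print(a, gauss_ft(a), math.exp(-0.5 * a * a))
```

For each a the numerical value matches e^{−a²/2}: the Gaussian is, up to normalization, its own Fourier transform.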

Chapter 15
Partial Differential Equations I: An Introduction
15.1 Classification of PDE
15.1.1 Introduction
There is no general theory known concerning the solvability of all PDE. Such a theory is extremely unlikely to exist, given the rich variety of physical, geometric, probabilistic phenomena
which can be modelled by PDE. Instead, research focuses on various particular PDEs that are
important for applications in mathematics and physics.
Definition 15.1 A partial differential equation (abbreviated as PDE) is an equation of the form
F (x, y, . . . , u, ux, uy , . . . , uxx , uxy , . . . ) = 0

(15.1)

where F is a given function of the independent variables x, y, …, of the unknown function u,
and of a finite number of its partial derivatives.
We call u a solution of (15.1) if after substitution of u(x, y, . . . ) and its partial derivatives (15.1)
is identically satisfied in some region in the space of the independent variables x, y, . . . . The
order of a PDE is the order of the highest derivative that occurs.
A PDE is called linear if it is linear in the unknown function u and its derivatives ux , uy , uxy ,
. . . , with coefficients depending only on the variables x, y, . . . . In other words, a linear PDE
can be written in the form
G(u, ux , uy , . . . , uxx , uxy , . . . ) = f (x, y, . . . ),

(15.2)

where the function f on the right depends only on the variables x, y, . . . and G is linear in all
components with coefficients depending on x, y, . . . . More precisely, the formal differential
operator L(u) = G(u, ux, uy , . . . , uxx , uxy , . . . ) which associates to each function u(x, y, . . . )
a new function L(u)(x, y, . . . ) is a linear operator. The linear PDE (15.2) (L(u) = f ) is called
homogeneous if f = 0 and inhomogeneous otherwise. For example, cos(xy²) u_xxy − y² u_x +
u sin x + tan(x² + y²) = 0 is a linear inhomogeneous PDE of order 3; the corresponding
homogeneous linear PDE is cos(xy²) u_xxy − y² u_x + u sin x = 0.
A PDE is called quasi-linear if it is linear in all partial derivatives of order m (the order of the
PDE) with coefficients which depend on the variables x, y, … and on partial derivatives of order
less than m; for example, u_x u_xx + u² = 0 is quasi-linear, u_xy u_xx + 1 = 0 is not. Semi-linear
equations are those quasi-linear equations in which the coefficients of the highest order terms
do not depend on u and its partial derivatives; (sin x) u_xx + u² = 0 is semi-linear; u_x u_xx + u² = 0
is not. Sometimes one considers systems of PDEs involving one or more unknown functions and
their derivatives.

15.1.2 Examples
(1) The Laplace equation in n dimensions for a function u(x₁, …, xₙ) is the linear second
order equation
\[
\Delta u = u_{x_1 x_1} + \cdots + u_{x_n x_n} = 0.
\]
The solutions u are called harmonic (or potential) functions. In case n = 2 we associate
with a harmonic function u(x, y) its conjugate harmonic function v(x, y) such that the
first-order system of Cauchy-Riemann equations
\[
u_x = v_y, \qquad u_y = -v_x
\]
is satisfied. A real solution (u, v) gives rise to the analytic function f(z) = u + iv. The
Poisson equation is
\[
\Delta u = f,
\]
for a given function f : Ω → R.
The Laplace equation models equilibrium states, while the Poisson equation is important in electrostatics. The Laplace and Poisson equations always describe stationary processes
(there is no time dependence).
(2) The heat equation. Here one coordinate t is distinguished as the time coordinate, while
the remaining coordinates x₁, …, xₙ represent spatial coordinates. We consider
\[
u \colon \Omega \times \mathbb{R}_+ \to \mathbb{R}, \qquad \Omega \ \text{open in } \mathbb{R}^n,
\]
where R₊ = {t ∈ R | t > 0} is the positive time axis, and pose the equation
\[
k u_t = \Delta u, \qquad \text{where} \quad \Delta u = u_{x_1 x_1} + \cdots + u_{x_n x_n}.
\]

The heat equation models heat conduction and other diffusion processes.
(3) The wave equation. With the same notations as in (2), here we have the equation
\[
u_{tt} - a^2 \Delta u = 0.
\]
It models wave and oscillation phenomena.


(4) The Korteweg-de Vries equation
\[
u_t - 6 u\, u_x + u_{xxx} = 0
\]
models the propagation of waves in shallow waters.
(5) The Monge-Ampère equation
\[
u_{xx}\, u_{yy} - u_{xy}^2 = f
\]
with a given function f , is used for finding surfaces with prescribed curvature.
(6) The Maxwell equations for the electric field strength E = (E₁, E₂, E₃) and the magnetic
field strength B = (B₁, B₂, B₃) as functions of (t, x₁, x₂, x₃):
\[
\begin{aligned}
\operatorname{div} B &= 0 && \text{(magnetostatic law)},\\
B_t + \operatorname{curl} E &= 0 && \text{(magnetodynamic law)},\\
\operatorname{div} E &= 4\pi\rho && \text{(electrostatic law, } \rho = \text{charge density)},\\
E_t - \operatorname{curl} B &= -4\pi j && \text{(electrodynamic law, } j = \text{current density)}.
\end{aligned}
\]
(7) The Navier-Stokes equations for the velocity v(x, t) = (v¹, v², v³) and the pressure
p(x, t) of an incompressible fluid of density ρ and viscosity μ:
\[
\rho \Bigl( v^j_t + \sum_{i=1}^{3} v^i\, v^j_{x_i} \Bigr) - \mu\, \Delta v^j = -p_{x_j}, \quad j = 1, 2, 3,
\qquad \operatorname{div} v = 0.
\]

(8) The Schrödinger equation
\[
i\hbar\, u_t = -\frac{\hbar^2}{2m}\, \Delta u + V(x, u)
\]
(m = mass, V = given potential, u : Ω → C) from quantum mechanics is formally
similar to the heat equation, in particular in the case V = 0. The factor i, however, leads
to crucial differences.
Classification
We have seen many rather different-looking PDEs, and it is hopeless to develop a theory that
can treat all these diverse equations. In order to proceed we want to look for criteria to classify
PDEs. Here are some possibilities.
(I) Algebraically.
(a) Linear equations are (1), (2), (3), (6) (which is of first order), and (8);
(b) semi-linear equations are (4) and (7);
(c) a non-linear equation is (5).
Naturally, linear equations are simpler than non-linear ones. We shall therefore mostly
study linear equations.
(II) The order of the equation. The Cauchy-Riemann equations and the Maxwell equations
are linear first order equations. (1), (2), (3), (5), (7), (8) are of second order; (4) is of third
order. Equations of higher order rarely occur. The most important PDEs are second order
PDEs.
(III) Elliptic, parabolic, hyperbolic. In particular, for second order equations the following
classification turns out to be useful. Let x = (x₁, …, xₙ) and
\[
F(x, u, u_{x_i}, u_{x_i x_j}) = 0
\]
be a second-order PDE. We introduce auxiliary variables p_i, p_ij, i, j = 1, …, n, and study
the function F(x, u, p_i, p_ij). The equation is called elliptic in Ω if the matrix
\[
\bigl( F_{p_{ij}}(x, u(x), u_{x_i}(x), u_{x_i x_j}(x)) \bigr)_{i,j=1,\dots,n}
\]
of the first derivatives of F with respect to the variables p_ij is positive definite or negative
definite for all x ∈ Ω. Note that this may depend on the function u. The Laplace equation is the prominent example of an
elliptic equation. Example (5) is elliptic if f(x) > 0.
The equation is called hyperbolic if the above matrix has precisely one negative and n − 1
positive eigenvalues (or conversely, depending on the choice of the sign). Example (3) is
hyperbolic, and so is (5) if f(x) < 0.
Finally, the equation is parabolic if one eigenvalue of the above matrix is 0 and all the
other eigenvalues have the same sign. More precisely, the equation can be written in the
form
\[
u_t = F(t, x, u, u_{x_i}, u_{x_i x_j})
\]
with an elliptic F.
(IV) According to solvability. We consider the second-order PDE F(x, u, u_{x_i}, u_{x_i x_j}) = 0 for
u : Ω → R, and wish to impose additional conditions upon the solution u, typically
prescribing the values of u or of certain first derivatives of u on the boundary ∂Ω or a part
of it.
Ideally such a boundary value problem satisfies the three conditions of Hadamard for a well-posed problem:
(a) Existence of a solution u for the given boundary values;
(b) Uniqueness of the solution;
(c) Stability, meaning continuous dependence on the boundary values.


Example 15.1 In the following examples Ω = R² and u = u(x, y).
(a) Find all solutions u ∈ C²(R²) with u_xx = 0. We first integrate with respect to x and find
that u_x is independent of x, say u_x = a(y). We integrate with respect to x once more and obtain
u(x, y) = x a(y) + b(y) with arbitrary functions a and b. Note that the ODE u″ = 0 has the
general solution ax + b with constant coefficients a, b; now the coefficients are functions of y.
(b) Solve u_xx + u = 0, u ∈ C²(R²). The solution of the corresponding ODE u″ + u = 0,
u = u(x), u ∈ C²(R), is a cos x + b sin x, such that the general solution of the corresponding
PDE in the 2 variables x and y is a(y) cos x + b(y) sin x with arbitrary functions a and b.
(c) Solve u_xy = 0, u ∈ C²(R²). First integrate ∂/∂y (u_x) = 0 with respect to y; we obtain
u_x = f̃(x). Integration with respect to x yields u = ∫ f̃(x) dx + g(y) = f(x) + g(y), where f
is differentiable and g is arbitrary.

15.2 First Order PDE: The Method of Characteristics

We solve first order PDEs by the method of characteristics. It applies to quasi-linear equations
\[
a(x, y, u)\, u_x + b(x, y, u)\, u_y = c(x, y, u) \tag{15.3}
\]
as well as to the linear equation
\[
a(x, y)\, u_x + b(x, y)\, u_y = c_0(x, y)\, u + c_1(x, y). \tag{15.4}
\]
We restrict ourselves to the linear equation with an initial condition given as a parametric curve
in the xyu-space,
\[
\Gamma = \Gamma(s) = (x_0(s), y_0(s), u_0(s)), \qquad s \in (a, b) \subseteq \mathbb{R}. \tag{15.5}
\]
The curve Γ will be called the initial curve. The initial condition then reads
\[
u(x_0(s), y_0(s)) = u_0(s), \qquad s \in (a, b).
\]
(Figure: the initial curve Γ(s), a characteristic curve (x(t, s), y(t, s), u(t, s)) emanating from an initial point, and the integral surface they sweep out.)

The geometric idea behind this method is the following. The solution u = u(x, y) can be
thought of as a surface in R³ = {(x, y, u) | x, y, u ∈ R}. Starting from a point on the initial curve,
we construct a characteristic curve in the surface u. If we do so for every point of the initial curve,
we obtain a one-parameter family of characteristic curves; gluing all these curves we get the
solution surface u.

The linear equation (15.4) can be rewritten as
\[
(a, b, c_0 u + c_1) \cdot (u_x, u_y, -1) = 0. \tag{15.6}
\]


Recall that (u_x, u_y, −1) is a normal vector to the surface (x, y, u(x, y)); that is, the tangent
plane to u at (x₀, y₀, u₀) is
\[
u - u_0 = u_x (x - x_0) + u_y (y - y_0)
\iff (x - x_0,\, y - y_0,\, u - u_0) \cdot (u_x, u_y, -1) = 0.
\]
It follows from (15.6) that (a, b, c₀u + c₁) is a vector in the tangent plane. Finding a curve
(x(t), y(t), u(t)) with exactly this tangent vector,
\[
\bigl( a(x(t), y(t)),\ b(x(t), y(t)),\ c_0(x(t), y(t))\, u(t) + c_1(x(t), y(t)) \bigr),
\]
is equivalent to solving the ODE system
\[
x'(t) = a(x(t), y(t)), \tag{15.7}
\]
\[
y'(t) = b(x(t), y(t)), \tag{15.8}
\]
\[
u'(t) = c_0(x(t), y(t))\, u(t) + c_1(x(t), y(t)). \tag{15.9}
\]

This system is called the characteristic equations; the solutions are called characteristic curves
of the equation. Note that the above system is autonomous, i.e. there is no explicit dependence
on the parameter t.
In order to determine characteristic curves we need an initial condition. We shall require the
initial point to lie on the initial curve Γ(s). Since each curve (x(t), y(t), u(t)) emanates from
a different point Γ(s), we shall explicitly write the curves in the form (x(t, s), y(t, s), u(t, s)).
The initial conditions are written as
\[
x(0, s) = x_0(s), \qquad y(0, s) = y_0(s), \qquad u(0, s) = u_0(s).
\]
Notice that we selected the parameter t such that the characteristic curve is located on the initial
curve at t = 0. Note further that the parametrization (x(t, s), y(t, s), u(t, s)) represents a surface
in R³.
The method of characteristics also applies to quasi-linear equations.
To summarize the method: in the first step we identify the initial curve Γ. In the second step
we select a point s on Γ as initial point and solve the characteristic equations using the point
we selected on Γ as an initial point. After performing these steps for all points on Γ, we obtain
a portion of the solution surface, also called the integral surface, which consists of the union of the
characteristic curves.
Example 15.2 Solve the equation
\[
u_x + u_y = 2
\]
subject to the initial condition u(x, 0) = x². The characteristic equations and the parametric
initial conditions are
\[
x_t(t, s) = 1, \quad y_t(t, s) = 1, \quad u_t(t, s) = 2, \qquad
x(0, s) = s, \quad y(0, s) = 0, \quad u(0, s) = s^2.
\]
It is easy to solve the characteristic equations:
\[
x(t, s) = t + f_1(s), \qquad y(t, s) = t + f_2(s), \qquad u(t, s) = 2t + f_3(s).
\]


Inserting the initial conditions, we find
\[
x(t, s) = t + s, \qquad y(t, s) = t, \qquad u(t, s) = 2t + s^2.
\]
We have obtained a parametric representation of the integral surface. To find an explicit representation we have to invert the transformation (x(t, s), y(t, s)) in the form (t(x, y), s(x, y)),
namely, we have to solve for s and t. In the current example we find t = y, s = x − y. Thus
the explicit formula for the integral surface is
\[
u(x, y) = 2y + (x - y)^2.
\]
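The characteristic system of Example 15.2 can also be integrated numerically. The following Python sketch (ours) uses the explicit Euler method, which happens to be exact here because the right hand sides are constant, and compares the result with the explicit solution:

```python
# Characteristic system for u_x + u_y = 2 with u(x, 0) = x^2:
#   x'(t) = 1, y'(t) = 1, u'(t) = 2,  x(0,s) = s, y(0,s) = 0, u(0,s) = s^2.
def characteristic(s, t_end, n=1000):
    x, y, u = s, 0.0, s * s
    dt = t_end / n
    for _ in range(n):                  # explicit Euler along the characteristic
        x += 1.0 * dt
        y += 1.0 * dt
        u += 2.0 * dt
    return x, y, u

x, y, u = characteristic(s=0.7, t_end=2.0)
exact = 2 * y + (x - y) ** 2            # the explicit solution u(x,y) = 2y + (x-y)^2
print(u, exact)
```

Each choice of s traces one characteristic curve; sweeping s sweeps out the integral surface.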
Remark 15.1 (a) This simple example might lead us to think that each initial value problem
for a first-order PDE possesses a unique solution. But this is not the case. Is the problem (15.3)
together with the initial condition (15.5) well-posed? Under which conditions does there exist
a unique integral surface that contains the initial curve?
(b) Notice that even if the PDE is linear, the characteristic equations are non-linear. It follows
that one can expect at most a local existence theorem for a first order PDE.
(c) The inversion of the parametric representation of the integral surface might hide further difficulties. Recall that the implicit function theorem implies that the inversion locally exists if the
Jacobian ∂(x, y)/∂(t, s) ≠ 0. An explicit computation of the Jacobian at a point s of the initial curve
gives
\[
J = \frac{\partial x}{\partial t}\frac{\partial y}{\partial s} - \frac{\partial x}{\partial s}\frac{\partial y}{\partial t}
= a\, y_0' - b\, x_0'
= \begin{vmatrix} a & b \\ x_0' & y_0' \end{vmatrix}.
\]
Thus, the Jacobian vanishes at some point if and only if the vectors (a, b) and (x₀′, y₀′) are
linearly dependent. The geometrical meaning of J = 0 is that the projection of Γ into the xy
plane is tangent to the projection of the characteristic curve into the xy plane. To ensure a unique
solution near the initial curve we must have J ≠ 0. This condition is called the transversality
condition.
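The transversality condition is a one-line computation; a minimal Python sketch (ours), applied to Example 15.2, where a = b = 1 and the initial curve is (x₀(s), y₀(s)) = (s, 0):

```python
def transversality(a, b, dx0, dy0):
    # J = a * y0'(s) - b * x0'(s); a nonzero J guarantees a unique local solution
    return a * dy0 - b * dx0

# Example 15.2: u_x + u_y = 2 with initial curve (s, 0, s^2): x0' = 1, y0' = 0.
print(transversality(1.0, 1.0, 1.0, 0.0))   # -1.0, transversal, so well-posed
```

Here J = −1 ≠ 0, consistent with the unique solution found above.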
Example 15.3 (Well-posed and Ill-posed Problems) (a) Solve u_x = 1 subject to the initial
condition u(0, y) = g(y). The characteristic equations and the initial conditions are given by
\[
x_t = 1, \quad y_t(t, s) = 0, \quad u_t(t, s) = 1, \qquad
x(0, s) = 0, \quad y(0, s) = s, \quad u(0, s) = g(s).
\]
The parametric integral surface is (x(t, s), y(t, s), u(t, s)) = (t, s, t + g(s)), such that the explicit
solution is u(x, y) = x + g(y).
(b) If we keep u_x = 1 but modify the initial condition into u(x, 0) = h(x), the picture changes
dramatically:
\[
x_t = 1, \quad y_t(t, s) = 0, \quad u_t(t, s) = 1, \qquad
x(0, s) = s, \quad y(0, s) = 0, \quad u(0, s) = h(s).
\]
In this case the parametric solution is
\[
(x(t, s), y(t, s), u(t, s)) = (t + s,\, 0,\, t + h(s)).
\]


Now, however, the transformation x = t + s, y = 0 cannot be inverted. Geometrically, the
projection of the initial curve is the x axis, but this is also the projection of the characteristic
curve. In the special case h(x) = x + c for some constant c, we obtain u(t, s) = t + s + c. Then
it is not necessary to invert (x, y), since we find at once u = x + c + f(y) for every differentiable
function f with f(0) = 0. We have infinitely many solutions; uniqueness fails.
(c) However, for any other choice of h, existence fails: the problem has no solution at all.
Note that the Jacobian is
\[
J = \begin{vmatrix} a & b \\ x_0' & y_0' \end{vmatrix}
= \begin{vmatrix} 1 & 0 \\ 1 & 0 \end{vmatrix} = 0.
\]

Remark 15.2 Because of the special role played by the projections of the characteristics on the
xy plane, we also use the term characteristics to denote them. In case of the linear PDE (15.4)
the ODE for the projection is
\[
x'(t) = \frac{dx}{dt} = a(x(t), y(t)), \qquad y'(t) = \frac{dy}{dt} = b(x(t), y(t)), \tag{15.10}
\]
which yields
\[
y'(x) = \frac{dy}{dx} = \frac{b(x, y)}{a(x, y)}.
\]

15.3 Classification of Semi-Linear Second-Order PDEs


15.3.1 Quadratic Forms
We recall some basic facts about quadratic forms and symmetric matrices.
Proposition 15.1 (Sylvester's Law of Inertia) Suppose that A ∈ R^{n×n} is a symmetric matrix.
(a) Then there exist an invertible matrix B ∈ R^{n×n}, numbers r, s, t ∈ N₀ with r + s + t = n, and a
diagonal matrix D = diag(d₁, d₂, …, d_{r+s}, 0, …, 0) with d_i > 0 for i = 1, …, r and d_i < 0
for i = r + 1, …, r + s, such that
\[
B A B^{\top} = \operatorname{diag}(d_1, \dots, d_{r+s}, 0, \dots, 0).
\]
We call (r, s, t) the signature of A.
(b) The signature does not depend on the change of coordinates, i.e. if there exist another
regular matrix B′ and a diagonal matrix D′ with D′ = B′A(B′)^⊤, then the signatures of D and
D′ coincide.
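Sylvester's law can be illustrated in code: the signature is computable by symmetric Gaussian elimination (simultaneous row and column operations, i.e. congruences), and it is unchanged under A ↦ BAB^⊤. A Python sketch (ours; it assumes a nonzero diagonal pivot can always be found, which holds for the example below):

```python
def signature(A, eps=1e-12):
    """Signature (r, s, t) of a symmetric matrix via congruence transformations."""
    n = len(A)
    M = [row[:] for row in A]
    for k in range(n):
        if abs(M[k][k]) < eps:
            for j in range(k + 1, n):       # try to bring a nonzero diagonal entry to (k, k)
                if abs(M[j][j]) > eps:
                    for r in range(n):
                        M[k][r], M[j][r] = M[j][r], M[k][r]
                    for r in range(n):
                        M[r][k], M[r][j] = M[r][j], M[r][k]
                    break
            else:
                continue                    # the remaining diagonal is zero; skip this pivot
        p = M[k][k]
        for i in range(k + 1, n):
            f = M[i][k] / p
            for j in range(n):
                M[i][j] -= f * M[k][j]      # row operation E M ...
            for j in range(n):
                M[j][i] -= f * M[j][k]      # ... and the matching column operation M E^T
    d = [M[i][i] for i in range(n)]
    r = sum(1 for x in d if x > eps)
    s = sum(1 for x in d if x < -eps)
    return r, s, n - r - s

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, 1, 0], [1, 2, 0], [0, 0, 0]]       # eigenvalues 3, 1, 0, so signature (2, 0, 1)
B = [[1, 2, 0], [0, 1, 3], [1, 0, 1]]       # invertible (det = 7)
Bt = [list(c) for c in zip(*B)]
C = matmul(matmul(B, A), Bt)                # the congruent matrix B A B^T
print(signature(A), signature(C))
```

Both signatures agree, as Proposition 15.1 (b) asserts.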

15.3.2 Elliptic, Parabolic and Hyperbolic


Consider the semi-linear second-order PDE in n variables x₁, …, xₙ in a region Ω ⊆ Rⁿ,
\[
\sum_{i,j=1}^{n} a_{ij}(x)\, u_{x_i x_j} + F(x, u, u_{x_1}, \dots, u_{x_n}) = 0 \tag{15.11}
\]
with continuous coefficients a_ij(x). Since we assume u ∈ C²(Ω), by Schwarz's lemma we may
assume without loss of generality that a_ij = a_ji. Using the terminology of the introduction


(Classification (III), see page 404) we find that the matrix A(x) := (a_ij(x))_{i,j=1,…,n} coincides
with the matrix (F_{p_ij})_{i,j} defined therein.
Definition 15.2 We call the PDE (15.11) elliptic at x₀ if the matrix A(x₀) is positive definite
or negative definite. We call it parabolic at x₀ if A(x₀) is positive or negative semidefinite with
exactly one eigenvalue 0. We call it hyperbolic at x₀ if A(x₀) has the signature (n − 1, 1, 0), i.e. A is
indefinite with n − 1 positive eigenvalues, one negative eigenvalue, and no zero eigenvalue
(or vice versa).

15.3.3 Change of Coordinates


First we study how the coefficients a_ij change if we impose a non-singular transformation
of coordinates y = φ(x);
\[
y_l = \varphi_l(x_1, \dots, x_n), \qquad l = 1, \dots, n.
\]
The transformation is called non-singular if the Jacobian ∂(φ₁, …, φₙ)/∂(x₁, …, xₙ)(x₀) is non-zero at
every point x₀ ∈ Ω. By the Inverse Mapping Theorem, the transformation then possesses locally an
inverse transformation, denoted by x = ψ(y),
\[
x_l = \psi_l(y_1, \dots, y_n), \qquad l = 1, \dots, n.
\]
Putting
\[
\tilde u(y) := u(\psi(y)), \qquad \text{then} \quad u(x) = \tilde u(\varphi(x)),
\]
and if moreover φ_l ∈ C²(Ω), we have by the chain rule


\[
u_{x_i} = \sum_{l=1}^{n} \tilde u_{y_l}\, \frac{\partial \varphi_l}{\partial x_i}, \qquad
u_{x_i x_j} = (u_{x_i})_{x_j}
= \sum_{k,l=1}^{n} \tilde u_{y_l y_k}\, \frac{\partial \varphi_l}{\partial x_i}\, \frac{\partial \varphi_k}{\partial x_j}
+ \sum_{l=1}^{n} \tilde u_{y_l}\, \frac{\partial^2 \varphi_l}{\partial x_i\, \partial x_j}. \tag{15.12}
\]

Inserting (15.12) into (15.11) one has
\[
\sum_{k,l=1}^{n} \tilde u_{y_l y_k} \sum_{i,j=1}^{n} a_{ij}\, \frac{\partial \varphi_l}{\partial x_i}\, \frac{\partial \varphi_k}{\partial x_j}
+ \sum_{l=1}^{n} \tilde u_{y_l} \sum_{i,j=1}^{n} a_{ij}\, \frac{\partial^2 \varphi_l}{\partial x_i\, \partial x_j}
+ \tilde F(y, \tilde u, \tilde u_{y_1}, \dots, \tilde u_{y_n}) = 0. \tag{15.13}
\]

We denote by ã_lk the new coefficients of the second partial derivatives of ũ,
\[
\tilde a_{lk} = \sum_{i,j=1}^{n} a_{ij}(x)\, \frac{\partial \varphi_l}{\partial x_i}\, \frac{\partial \varphi_k}{\partial x_j},
\]
and write (15.13) in the same form as (15.11):
\[
\sum_{k,l=1}^{n} \tilde a_{lk}(y)\, \tilde u_{y_l y_k} + \tilde F(y, \tilde u, \tilde u_{y_1}, \dots, \tilde u_{y_n}) = 0. \tag{15.14}
\]


Equation (15.14) later plays a crucial role in simplifying PDE (15.11): if we want some of the
coefficients ã_lk to be 0, the corresponding sums \(\sum_{i,j} a_{ij}\,\varphi_{l,x_i}\varphi_{k,x_j}\) have to vanish. Writing
\[
b_{lj} = \frac{\partial \varphi_l}{\partial x_j}, \qquad l, j = 1, \dots, n, \qquad B = (b_{lj}),
\]
the new coefficient matrix Ã(y) = (ã_lk(y)) reads as follows:
\[
\tilde A = B A B^{\top}.
\]
By Proposition 15.1, A and Ã have the same signature. We have shown the following proposition.
Proposition 15.2 The type of a semi-linear second order PDE is invariant under the change of
coordinates.
Notation. We call the operator L with
\[
L(u) = \sum_{i,j=1}^{n} a_{ij}(x)\, \frac{\partial^2 u}{\partial x_i\, \partial x_j} + F(x, u, u_{x_1}, \dots, u_{x_n})
\]
a differential operator, and denote by
\[
L_2(u) = \sum_{i,j=1}^{n} a_{ij}(x)\, \frac{\partial^2 u}{\partial x_i\, \partial x_j}
\]
the sum of its highest order terms; L₂ is a linear operator.
Definition 15.3 The second-order PDE L(u) = 0 has normal form if
\[
L_2(u) = \sum_{j=1}^{m} u_{x_j x_j} - \sum_{j=m+1}^{r} u_{x_j x_j}
\]
with some positive integers m ≤ r ≤ n.


Remarks 15.3 (a) It may happen that the type of the equation depends on the point x₀. For
example, the Tricomi equation
\[
y\, u_{xx} + u_{yy} = 0
\]
is of mixed type. More precisely, it is elliptic if y > 0, parabolic if y = 0, and hyperbolic if
y < 0.
(b) The Laplace equation is elliptic, the heat equation is parabolic, and the wave equation is hyperbolic.
(c) The classification is not complete in case n ≥ 3; for example, the quadratic form can be of
type (n − 2, 1, 1).
(d) Case n = 2. The PDE
\[
a\, u_{xx} + 2b\, u_{xy} + c\, u_{yy} + F(x, y, u, u_x, u_y) = 0
\]
with coefficients a = a(x, y), b = b(x, y) and c = c(x, y) is elliptic, parabolic or hyperbolic at
(x₀, y₀) if and only if ac − b² > 0, ac − b² = 0 or ac − b² < 0 at (x₀, y₀), respectively.
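The n = 2 criterion is easy to turn into code; a minimal Python sketch (ours), applied to the Tricomi equation from (a), where a = y, b = 0, c = 1:

```python
def classify_2d(a, b, c, eps=1e-12):
    """Type of a*u_xx + 2b*u_xy + c*u_yy + ... = 0 at a point, via d = a*c - b^2."""
    d = a * c - b * b
    if d > eps:
        return "elliptic"
    if d < -eps:
        return "hyperbolic"
    return "parabolic"

# Tricomi equation y*u_xx + u_yy = 0: elliptic for y > 0, parabolic at y = 0,
# hyperbolic for y < 0, exactly as stated in Remark 15.3 (a).
for y in (1.0, 0.0, -1.0):
    print(y, classify_2d(y, 0.0, 1.0))
```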


15.3.4 Characteristics
Suppose we are given the semi-linear second-order PDE in a region Ω ⊆ Rⁿ,
\[
\sum_{i,j=1}^{n} a_{ij}(x)\, \frac{\partial^2 u}{\partial x_i\, \partial x_j} + F(x, u, u_{x_1}, \dots, u_{x_n}) = 0 \tag{15.15}
\]
with continuous a_ij; a_ij(x) = a_ji(x).


We define the concept of characteristics which plays an important role in the theory of PDEs,
not only of second-order PDEs.
Definition 15.4 Suppose that φ ∈ C¹(Ω) is continuously differentiable with grad φ ≠ 0, and
(a) for some point x₀ of the hypersurface F = {x ∈ Ω | φ(x) = c}, c ∈ R, we have
\[
\sum_{i,j=1}^{n} a_{ij}(x_0)\, \frac{\partial \varphi(x_0)}{\partial x_i}\, \frac{\partial \varphi(x_0)}{\partial x_j} = 0. \tag{15.16}
\]
Then F is said to be characteristic at x₀.
(b) If F is characteristic at every point of F, then F is called a characteristic hypersurface or simply
a characteristic of the PDE (15.11). Equation (15.16) is called the characteristic equation of
(15.11).
In case n = 2 we speak of characteristic lines.
If all hypersurfaces φ(x) = c, a < c < b, are characteristic, this family of hypersurfaces fills
a region Ω such that for any point x ∈ Ω there is exactly one hypersurface with φ(x) = c.
This c can be chosen to be one new coordinate. Setting
\[
y_1 = \varphi(x)
\]
we see from (15.14) that ã₁₁ = 0. That is, the knowledge of one or more characteristic hypersurfaces can simplify the PDE.
Example 15.4 (a) The characteristic equation of u_xy = 0 is φ_x φ_y = 0, such that φ_x = 0 and
φ_y = 0 define the characteristic lines; the parallels to the coordinate axes, y = c₁ and
x = c₂, are the characteristics.
(b) Find type and characteristic lines of
\[
x^2 u_{xx} - y^2 u_{yy} = 0, \qquad x \ne 0,\ y \ne 0.
\]
Since det A = x²(−y²) − 0 = −x²y² < 0, the equation is hyperbolic. The characteristic equation,
in the most general case, is
\[
a\, \varphi_x^2 + 2b\, \varphi_x \varphi_y + c\, \varphi_y^2 = 0.
\]
Since grad φ ≠ 0, φ(x, y) = c is locally solvable for y = y(x), such that y′ = −φ_x/φ_y. Another
way to obtain this is as follows: differentiating the equation φ(x, y) = c yields φ_x dx + φ_y dy =
0, or dy/dx = −φ_x/φ_y. Inserting this into the previous equation we obtain a quadratic ODE,
\[
a\, (y')^2 - 2b\, y' + c = 0,
\]

with solutions
\[
y' = \frac{b \pm \sqrt{b^2 - ac}}{a}, \qquad \text{if } a \ne 0.
\]
We can see that the elliptic equation has no (real) characteristic lines, the parabolic equation has one
family of characteristics, and the hyperbolic equation has two families of characteristic lines.
Hyperbolic case. In general, if c₁ = φ₁(x, y) is the first family of characteristic lines and
c₂ = φ₂(x, y) is the second family of characteristic lines, then
\[
\xi = \varphi_1(x, y), \qquad \eta = \varphi_2(x, y)
\]
is the appropriate change of variables, which gives \(\tilde a = \tilde c = 0\). The transformed equation then
reads
\[
2\tilde b\, \tilde u_{\xi\eta} + \tilde F(\xi, \eta, \tilde u, \tilde u_{\xi}, \tilde u_{\eta}) = 0.
\]
Parabolic case. Since det A = 0, there is only one real family of characteristic lines,
say c₁ = φ₁(x, y). We impose the change of variables
\[
z = \varphi_1(x, y), \qquad y = y.
\]
Since det Ã = 0, the coefficients \(\tilde b\) vanish (together with \(\tilde a\)). The transformed equation reads
\[
\tilde c\, \tilde u_{yy} + \tilde F(z, y, \tilde u, \tilde u_z, \tilde u_y) = 0.
\]
The above two equations are called the characteristic forms of the PDE (15.11).
In our case the characteristic equation is
\[
x^2 (y')^2 - y^2 = 0, \qquad y' = \pm\, \frac{y}{x}.
\]
This yields
\[
\frac{dy}{y} = \pm\, \frac{dx}{x}, \qquad \log |y| = \pm \log |x| + c_0.
\]
We obtain the two families of characteristic lines
\[
y = c_1 x, \qquad y = \frac{c_2}{x}.
\]
Indeed, in our example
\[
\xi = \frac{y}{x} = c_1, \qquad \eta = x y = c_2
\]
gives
\[
\xi_x = -\frac{y}{x^2}, \quad \xi_y = \frac{1}{x}, \quad \xi_{xx} = \frac{2y}{x^3}, \quad \xi_{yy} = 0, \quad \xi_{xy} = -\frac{1}{x^2},
\]
\[
\eta_x = y, \quad \eta_y = x, \quad \eta_{xx} = 0, \quad \eta_{yy} = 0, \quad \eta_{xy} = 1.
\]
In our case (15.12) reads
\[
u_{xx} = \tilde u_{\xi\xi}\, \xi_x^2 + 2 \tilde u_{\xi\eta}\, \xi_x \eta_x + \tilde u_{\eta\eta}\, \eta_x^2 + \tilde u_{\xi}\, \xi_{xx} + \tilde u_{\eta}\, \eta_{xx},
\]
\[
u_{yy} = \tilde u_{\xi\xi}\, \xi_y^2 + 2 \tilde u_{\xi\eta}\, \xi_y \eta_y + \tilde u_{\eta\eta}\, \eta_y^2 + \tilde u_{\xi}\, \xi_{yy} + \tilde u_{\eta}\, \eta_{yy}.
\]


Noting that x² = η/ξ and y² = ξη, and inserting the values of the partial derivatives of ξ and η, we get
\[
u_{xx} = \tilde u_{\xi\xi}\, \frac{y^2}{x^4} - 2\, \frac{y^2}{x^2}\, \tilde u_{\xi\eta} + \tilde u_{\eta\eta}\, y^2 + 2\, \frac{y}{x^3}\, \tilde u_{\xi},
\]
\[
u_{yy} = \tilde u_{\xi\xi}\, \frac{1}{x^2} + 2\, \tilde u_{\xi\eta} + \tilde u_{\eta\eta}\, x^2.
\]
Hence
\[
x^2 u_{xx} - y^2 u_{yy} = -4 y^2\, \tilde u_{\xi\eta} + 2\, \frac{y}{x}\, \tilde u_{\xi} = 0
\quad \Longleftrightarrow \quad
\tilde u_{\xi\eta} - \frac{1}{2xy}\, \tilde u_{\xi} = 0.
\]
Since η = xy, we obtain the characteristic form of the equation to be
\[
\tilde u_{\xi\eta} - \frac{1}{2\eta}\, \tilde u_{\xi} = 0.
\]
Using the substitution v = ũ_ξ, we obtain \(v_\eta - \frac{1}{2\eta} v = 0\), which corresponds to the ODE \(v' - \frac{1}{2\eta} v = 0\). Hence, v(ξ, η) = c(ξ)√η. Integration with respect to ξ gives ũ(ξ, η) = A(ξ)√η + B(η).
Transforming back to the variables x and y, the general solution is
\[
u(x, y) = A\!\left( \frac{y}{x} \right) \sqrt{xy} + B(xy).
\]
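One can test the general solution numerically: for sample choices of A and B the residual x²u_xx − y²u_yy should vanish up to finite-difference error. A Python sketch (ours; the choices A(s) = sin s and B(s) = cos s are arbitrary C² functions):

```python
import math

def u(x, y):
    # sample solution u = A(y/x) * sqrt(x*y) + B(x*y) with A = sin, B = cos
    return math.sin(y / x) * math.sqrt(x * y) + math.cos(x * y)

def second_partial(f, x, y, which, h=1e-4):
    # central second difference in x or y
    if which == "xx":
        return (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    return (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2

x, y = 2.0, 3.0
residual = x**2 * second_partial(u, x, y, "xx") - y**2 * second_partial(u, x, y, "yy")
print(residual)   # ~0: u solves x^2 u_xx - y^2 u_yy = 0
```

The residual is zero to within the O(h²) truncation error of the difference quotients.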

(c) The one-dimensional wave equation u_tt − a²u_xx = 0. The characteristic equation φ_t² = a²φ_x²
yields
\[
\frac{dx}{dt} = -\frac{\varphi_t}{\varphi_x} = \mp a.
\]
The characteristics are x = at + c₁ and x = −at + c₂. The change of variables ξ = x − at
and η = x + at yields ũ_ξη = 0, which has general solution ũ(ξ, η) = f(ξ) + g(η). Hence,
u(x, t) = f(x − at) + g(x + at) is the general solution; see also homework 23.2.
(d) The wave equation in n dimensions has characteristic equation
\[
\varphi_t^2 - a^2 \sum_{i=1}^{n} \varphi_{x_i}^2 = 0.
\]
This equation is satisfied by the characteristic cone
\[
\varphi(x, t) = a^2 (t - t^{(0)})^2 - \sum_{i=1}^{n} (x_i - x_i^{(0)})^2 = 0,
\]
where the point (x^{(0)}, t^{(0)}) is the peak of the cone. Indeed,
\[
\varphi_t = 2a^2 (t - t^{(0)}), \qquad \varphi_{x_i} = -2 (x_i - x_i^{(0)})
\]
implies
\[
\varphi_t^2 - a^2 \sum_{i=1}^{n} \varphi_{x_i}^2
= 4a^2 \Bigl( a^2 (t - t^{(0)})^2 - \sum_{i=1}^{n} (x_i - x_i^{(0)})^2 \Bigr) = 0
\]
on the cone.


Further, there are other characteristic surfaces: the hyperplanes
\[
\varphi(x, t) = a t + \sum_{i=1}^{n} b_i x_i = 0, \qquad \text{where } \|b\| = 1.
\]
(e) The heat equation has characteristic equation \(\sum_{i=1}^{n} \varphi_{x_i}^2 = 0\), which implies φ_{x_i} = 0 for
all i = 1, …, n, such that t = c is the only family of characteristic surfaces (the coordinate
hyperplanes).
(f) The Poisson and Laplace equations have the same characteristic equation; however, we have
one variable less (no t) and obtain grad φ = 0, which is impossible. The Poisson and Laplace
equations don't have characteristic surfaces.

15.3.5 The Vibrating String

(a) The Infinite String on R

We consider the Cauchy problem for an infinite string (no boundary values):
\[
u_{tt} - a^2 u_{xx} = 0, \qquad u(x, 0) = u_0(x), \qquad u_t(x, 0) = u_1(x),
\]
where u₀ and u₁ are given.


Inserting the initial values into the general solution u(x, t) = f(x − at) + g(x + at) (see
Example 15.4 (c)) we get
\[
u_0(x) = f(x) + g(x), \qquad u_1(x) = -a f'(x) + a g'(x).
\]
Differentiating the first one yields u₀′(x) = f′(x) + g′(x), such that
\[
f'(x) = \frac{1}{2} u_0'(x) - \frac{1}{2a} u_1(x), \qquad
g'(x) = \frac{1}{2} u_0'(x) + \frac{1}{2a} u_1(x).
\]
Integrating these equations we obtain
\[
f(x) = \frac{1}{2} u_0(x) - \frac{1}{2a} \int_0^x u_1(y)\,dy + A, \qquad
g(x) = \frac{1}{2} u_0(x) + \frac{1}{2a} \int_0^x u_1(y)\,dy + B,
\]

where A and B are constants such that A + B = 0 (since f(x) + g(x) = u₀(x)). Finally we
have
\[
\begin{aligned}
u(x, t) &= f(x - at) + g(x + at)\\
&= \frac{1}{2}\bigl( u_0(x + at) + u_0(x - at) \bigr)
- \frac{1}{2a} \int_0^{x - at} u_1(y)\,dy + \frac{1}{2a} \int_0^{x + at} u_1(y)\,dy\\
&= \frac{1}{2}\bigl( u_0(x + at) + u_0(x - at) \bigr)
+ \frac{1}{2a} \int_{x - at}^{x + at} u_1(y)\,dy.
\end{aligned} \tag{15.17}
\]


It is clear from (15.17) that u(x, t) is uniquely determined by the values of the initial functions u₀ and u₁
in the interval [x − at, x + at], whose end points are cut out by the characteristic lines through the point
(x, t). This interval represents the domain of dependence for the solution at the point (x, t), as shown in the
figure. (Figure: the point (x, t) and the interval [x − at, x + at] on the x-axis.)
Conversely, the initial values at a point (ξ, 0) of the x-axis influence u(x, t) at points (x, t) in the
wedge-shaped region bounded by the characteristics through (ξ, 0), i.e. for ξ − at < x < ξ + at.
This indicates that our signal or disturbance moves only with speed a.
We want to give some interpretation of the solution (15.17). Suppose u₁ = 0 and
\[
u_0(x) = \begin{cases} 1 - \dfrac{|x|}{a}, & |x| \le a,\\[4pt] 0, & |x| > a. \end{cases}
\]
(Figure: the hat-shaped initial displacement u₀ at t = 0, supported on [−a, a].)
In this example we consider the vibrating string which is plucked at time t = 0 as in the above
picture (given u₀(x)). The initial velocity is zero (u₁ = 0).
(Figures: the string at t = 1/2, t = 1, and t = 2; the displacement at the later times consists of two hat-shaped pulses of height 1/2, centred at ±at.)
In the pictures one can see the behaviour of the string. The initial peak is divided into two
smaller peaks with half the displacement, one moving to the right and one moving to the left
with speed a.
Formula (15.17) is due to d'Alembert (1746). Usually one assumes u₀ ∈ C²(R) and u₁ ∈
C¹(R). In this case, u ∈ C²(R²) and we are able to evaluate the classical wave operator □u,
which gives a continuous function. On the other hand, the right hand side of (15.17) makes
sense for an arbitrary continuous function u₁ and arbitrary u₀. If we want to call such a u(x, t) a
generalized solution of the Cauchy problem, we have to alter the meaning of □u. In particular,
we need a more general notion of functions and derivatives. This is the main objective of the next
section.
(b) The Finite String over [0, l]
We consider the initial boundary value problem (IBVP)
\[
u_{tt} = a^2 u_{xx}, \qquad u(x, 0) = u_0(x), \qquad u_t(x, 0) = u_1(x), \quad x \in [0, l],
\qquad u(0, t) = u(l, t) = 0, \quad t \in \mathbb{R}.
\]


Suppose we are given functions u₀ ∈ C²([0, l]) and u₁ ∈ C¹([0, l]) on [0, l] with
\[
u_0(0) = u_0(l) = 0, \qquad u_1(0) = u_1(l) = 0, \qquad u_0''(0) = u_0''(l) = 0.
\]
To solve the IBVP, we define new functions ũ₀ and ũ₁ on R as follows: first extend both
functions to [−l, l] as odd functions, that is, ũᵢ(−x) = −ũᵢ(x), i = 0, 1. Then extend ũᵢ as a
2l-periodic function to the entire real line. The above assumptions ensure that ũ₀ ∈ C²(R) and
ũ₁ ∈ C¹(R). Put
\[
u(x, t) = \frac{1}{2}\bigl( \tilde u_0(x + at) + \tilde u_0(x - at) \bigr)
+ \frac{1}{2a} \int_{x - at}^{x + at} \tilde u_1(y)\,dy.
\]
Then u(x, t) solves the IBVP.
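The odd 2l-periodic extension can be written down directly; a Python sketch (ours; the sample u₀ is an arbitrary function vanishing at both endpoints):

```python
import math

def extend(u0, l):
    """Odd, 2l-periodic extension of u0: [0, l] -> R (u0(0) = u0(l) = 0 assumed)."""
    def ext(x):
        x = (x + l) % (2 * l) - l       # reduce to the fundamental interval [-l, l)
        return u0(x) if x >= 0 else -u0(-x)
    return ext

u0 = lambda x: math.sin(math.pi * x)    # sample data on [0, 1] with u0(0) = u0(1) = 0
v = extend(u0, 1.0)
print(v(0.25), v(-0.25), v(2.25))       # odd: v(-x) = -v(x); periodic: v(x + 2l) = v(x)
```

For this particular u₀ the extension coincides with sin(πx) on all of R, which is indeed odd and 2-periodic.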

Chapter 16
Distributions
16.1 Introduction: Test Functions and Distributions
In this section we introduce the notion of distributions. Distributions are generalized functions.
The class of distributions has a lot of very nice properties: they are differentiable up to arbitrary
order, one can exchange limit procedures and differentiation, Schwarz lemma holds. Distributions play an important role in the theory of PDE, in particular, the notion of a fundamental
solution of a differential operator can be made rigorous within the theory of distributions only.
Generalized functions were first used by P. Dirac to study quantum mechanical phenomena.
Systematically he made use of the so called -function (better: -distribution). The mathematical foundations of this theory are due to S. L. Sobolev (1936) and L. Schwartz (1950, 1915
2002).
Since then many mathematicians made progress in the theory of distributions. Motivation comes
from problems in mathematical physics and in the theory of partial differential equations.
Good accessible (German) introductions are given in the books of W. Walter [Wal74] and O. Forster [For81, §17]. More detailed explanations of the theory are to be found in the books of H. Triebel (in English and German), V. S. Wladimirow (in Russian and German), and Gelfand/Schilow (in Russian and German, parts I, II, and III), [Tri92, Wla72, GS69, GS64].

16.1.1 Motivation
Distributions generalize the notion of a function. They are linear functionals on certain spaces of
test functions. Using distributions one can express rigorously the density of a mass point, charge
density of a point, the single-layer and the double-layer potentials, see [Arn04, pp. 92]. Roughly
speaking, a generalized function is given at a point by the mean values in the neighborhood
of that point.
The main idea, to associate to each sufficiently nice function f a linear functional Tf (a distribution) on an appropriate function space D, is described by the following formula:

⟨Tf, φ⟩ = ∫_R f(x)φ(x) dx,  φ ∈ D.  (16.1)


On the left we adopt the notation of a dual pairing of vector spaces from Definition 11.1. In
general the bracket hT , i denotes the evaluation of the functional T on the test function .
Sometimes it is also written as T (). It does not denote an inner product; the left and the right
arguments are from completely different spaces.
What we really want of Tf is:
(a) The correspondence should be one-to-one, i.e., different functionals Tf and Tg correspond to different functions f and g. To achieve this, the function space D must be sufficiently large.
(b) The class of functions f should contain at least the continuous functions. However, if f(x) = x^n, the function f(x)φ(x) must be integrable over R, that is, x^n φ(x) ∈ L¹(R). Since polynomials are not in L¹(R), the functions φ must be very small for large |x|. Roughly speaking, there are two possibilities to this end. First, take only those functions φ which are identically zero outside a compact set (which depends on φ). This leads to the test functions D(R). Then Tf is well-defined if f is integrable over every compact subset of R; such functions f are called locally integrable.
Secondly, we take φ(x) to be rapidly decreasing as |x| tends to ∞. More precisely, we want

sup_{x∈R} |x^n φ(x)| < ∞

for all non-negative integers n ∈ Z+. This concept leads to the notion of the so-called Schwartz space S(R).
(c) We want to differentiate f arbitrarily often, even in case f has discontinuities. The only thing we have to do is to give the expression

∫_R f′(x)φ(x) dx

a meaning. Using integration by parts and the fact that φ(+∞) = φ(−∞) = 0, the above expression equals −∫_R f(x)φ′(x) dx. That is, instead of differentiating f, we differentiate the test function φ. In this way the functional makes sense as long as f is integrable. Since we want to differentiate f arbitrarily often, we need the test functions φ to be arbitrarily differentiable, φ ∈ C^∞(R).
Note that conditions (b) and (c) make the space of test functions sufficiently small.

16.1.2 Test Functions D(R^n) and D(Ω)

We want to solve the problem of fφ being integrable for all polynomials f. We use the first approach and consider only functions φ which are 0 outside a bounded set. If nothing is stated otherwise, Ω ⊆ R^n denotes an open, connected subset of R^n.


(a) The Support of a Function and the Space of Test Functions


Let f be a function defined on Ω. The closure

supp f := {x ∈ Ω | f(x) ≠ 0}‾ ⊆ R^n

is called the support of f, denoted by supp f.
Remark 16.1 (a) supp f is always closed; it is the smallest closed subset M such that f(x) = 0 for all x ∈ Ω \ M.
(b) A point x0 ∉ supp f if and only if there exists ε > 0 such that f ≡ 0 in U_ε(x0). This in particular implies that for f ∈ C^∞(R^n) we have f^(k)(x0) = 0 for all k ∈ N.
(c) supp f is compact if and only if it is bounded.
Example 16.1 (a) supp sin = R.
(b) Let f : (−1, 1) → R, f(x) = x(1 − x). Then supp f = [−1, 1].
(c) The characteristic function χ_M has support M‾, the closure of M.
(d) Let h be the hat function on R; note that supp h = [−1, 1], and let f(x) = 2h(x) − 3h(x + 10). Then supp f = [−1, 1] ∪ [−11, −9].
Definition 16.1 (a) The space D(R^n) consists of all infinitely differentiable functions f on R^n with compact support:

D(R^n) = C_0^∞(R^n) = {f ∈ C^∞(R^n) | supp f is compact}.

(b) Let Ω be a region in R^n. Define D(Ω) as follows:

D(Ω) = {f ∈ C^∞(Ω) | supp f is compact and supp f ⊆ Ω}.

We call D(Ω) the space of test functions on Ω.

First of all let us make sure that such functions exist. On the real axis consider the hat function (also called bump function)

h(t) = c·e^{−1/(1−t²)} if |t| < 1,  h(t) = 0 if |t| ≥ 1.

The constant c is chosen such that ∫_R h(t) dt = 1. The function h is continuous on R. It was already shown in Example 4.5 that h^(k)(−1) = h^(k)(1) = 0 for all k ∈ N. Hence h ∈ D(R) is a test function with supp h = [−1, 1]. Accordingly, the function

h(x) = c_n·e^{−1/(1−‖x‖²)} if ‖x‖ < 1,  h(x) = 0 if ‖x‖ ≥ 1,


is an element of D(R^n) with support the closed unit ball, supp h = {x | ‖x‖ ≤ 1}. The constant c_n is chosen such that

∫_{R^n} h(x) dx = ∫_{U_1(0)} h(x) dx = 1.

For ε > 0 we introduce the notation

h_ε(x) = (1/ε^n)·h(x/ε).

Then supp h_ε = {x | ‖x‖ ≤ ε} and

∫_{R^n} h_ε(x) dx = (1/ε^n) ∫_{U_ε(0)} h(x/ε) dx = ∫_{U_1(0)} h(y) dy = 1.

So far we have constructed only one function h(x) (as well as its scaled relatives h_ε(x)) which is C^∞ and has compact support. Using this single hat function h we are able
(a) to restrict the support of an arbitrary integrable function f to a given domain by replacing f by f·h_ε(x − a), which has support in the closed ball U_ε(a)‾;
(b) to make f smooth.

(b) Mollification
In this way we obtain a supply of C^∞ functions with compact support which is large enough for our purposes (especially, to recover the function f from the functional Tf). Using the function h_ε, S. L. Sobolev developed the following mollification method.
Definition 16.2 (a) Let f ∈ L¹(R^n) and g ∈ D(R^n); define the convolution product f ∗ g by

(f ∗ g)(x) = ∫_{R^n} f(y)g(x − y) dy = ∫_{R^n} f(x − y)g(y) dy = (g ∗ f)(x).

(b) We define the mollified function f_ε of f by

f_ε = f ∗ h_ε.

Note that

f_ε(x) = ∫_{R^n} h_ε(x − y)f(y) dy = ∫_{U_ε(x)} h_ε(x − y)f(y) dy.  (16.2)

Roughly speaking, f_ε(x) is the mean value of f in the ε-neighborhood of x. If f is continuous at x0, then f_ε(x0) = f(ξ) for some ξ ∈ U_ε(x0). This follows from Proposition 5.18.
In particular, let f = χ_[1,2] be the characteristic function of the interval [1, 2]. The mollification f_ε looks as follows:

f_ε(x) = 0 for x < 1 − ε,
f_ε(x) = θ for 1 − ε < x < 1 + ε,
f_ε(x) = 1 for 1 + ε < x < 2 − ε,
f_ε(x) = θ for 2 − ε < x < 2 + ε,
f_ε(x) = 0 for 2 + ε < x,

where θ denotes a value between 0 and 1.
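This behavior can be checked numerically. The following sketch is an assumed illustration (not from the text): it normalizes the bump function by quadrature, mollifies the characteristic function of [1, 2], and checks the three regimes of the piecewise description above.

```python
import math

def bump(t):
    # unnormalized bump exp(-1/(1 - t^2)) on (-1, 1), zero outside
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

# normalizing constant c so that c * bump integrates to 1 (midpoint rule)
N = 4000
w0 = 2.0 / N
c = 1.0 / (sum(bump(-1.0 + (k + 0.5) * w0) for k in range(N)) * w0)

def h_eps(x, eps):
    # the scaled mollifier h_eps(x) = c * bump(x/eps) / eps
    return c * bump(x / eps) / eps

def f(x):
    # characteristic function of [1, 2]
    return 1.0 if 1.0 <= x <= 2.0 else 0.0

def mollify(x, eps, n=2000):
    # f_eps(x) = integral of h_eps(x - y) f(y) dy over the eps-neighborhood of x
    lo, hi = x - eps, x + eps
    w = (hi - lo) / n
    return sum(h_eps(x - y, eps) * f(y)
               for y in (lo + (k + 0.5) * w for k in range(n))) * w

eps = 0.1
assert abs(mollify(1.5, eps) - 1.0) < 1e-3   # equals 1 on (1+eps, 2-eps)
assert mollify(0.5, eps) == 0.0              # vanishes left of 1-eps
assert 0.0 < mollify(1.0, eps) < 1.0         # transition zone: a value in (0, 1)
```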


Remarks 16.2 (a) For f L1 (Rn ), f C (Rn ),
(b) f f in L1 (Rn ) as 0.
(c) C0 (Rn ) L1 (Rn ) is dense (with respect to the L1 -norm). In other words, for any
R
f L1 (Rn ) and > 0 there exists g C(Rn ) with supp g is compact and Rn | f g | dx < .
(Sketch of Proof). (A) Any integrable function can be approximated by integrable
functions with compact support. This follows from Example 12.6.
(B) Any integrable function with compact support can be approximated by simple functions (which are finite linear combinations of characteristic functions) with
compact support.
(C) Any characteristic function with compact support can be approximated by characteristic functions Q where Q is a finite union of boxes.
(D) Any Q where Q is a closed box can be approximated by a sequence fn of
continuous functions with compact support:
fn (x) = max{0, n d(x, Q)},

n N,

where d(x, Q) denotes the distance of x from Q. Note that fn is 1 in Q and 0


outside U1/n (Q).
n
1
n
(d) C
0 (R ) L (R ) is dense.

(b) Convergence in D
Notations. For x ∈ R^n and a multi-index α ∈ N_0^n, α = (α1, …, αn), we write

|α| = α1 + α2 + ··· + αn,  α! = α1!···αn!,  x^α = x1^{α1}·x2^{α2}···xn^{αn},

D^α u(x) = ∂^{|α|} u(x) / (∂x1^{α1}···∂xn^{αn}).

It is clear that D(Rn ) is a linear space. We shall introduce an appropriate notion of convergence.
Definition 16.3 A sequence (φn(x)) of functions of D(R^n) converges to φ ∈ D(R^n) if there exists a compact set K ⊂ R^n such that
(a) supp φn ⊆ K for all n ∈ N, and
(b) D^α φn ⇉ D^α φ uniformly on K for all multi-indices α.

We denote this type of convergence by φn →_D φ.


Example 16.2 Let φ ∈ D be a fixed test function and consider the sequences (φn(x)) given by:
(a) φn(x) = φ(x)/n. This sequence converges to 0 in D since supp φn = supp φ for all n and the convergence is uniform for all x ∈ R^n (in fact, it suffices to consider x ∈ supp φ).
(b) φn(x) = φ(x/n)/n. The sequence does not converge to 0 in D since the supports supp(φn) = n·supp(φ), n ∈ N, are not contained in any common compact subset.
(c) φn(x) = φ(nx)/n has no limit in D if φ ≠ 0, see homework 49.2.
Note that D(R^n) is not a metric space; more precisely, there is no metric on D(R^n) such that metric convergence and the above convergence coincide.

16.2 The Distributions D′(R^n)

Definition 16.4 A distribution (generalized function) is a continuous linear functional on the space D(R^n) of test functions. Here a linear functional T on D is said to be continuous if and only if for all sequences (φn), φn, φ ∈ D, with φn →_D φ we have ⟨T, φn⟩ → ⟨T, φ⟩ in C.

The set of distributions is denoted by D′(R^n) or simply by D′. The evaluation of a distribution T ∈ D′ on a test function φ ∈ D is denoted by ⟨T, φ⟩. Two distributions T1 and T2 are equal if and only if ⟨T1, φ⟩ = ⟨T2, φ⟩ for all φ ∈ D.
Remark 16.3 (Characterization of continuity.) (a) A linear functional T on D(R^n) is continuous if and only if φn →_D 0 implies ⟨T, φn⟩ → 0 in C. Indeed, continuity of T trivially implies this statement. Suppose now that φn →_D φ. Then (φn − φ) →_D 0; thus ⟨T, φn − φ⟩ → 0 as n → ∞. Since T is linear, this shows ⟨T, φn⟩ → ⟨T, φ⟩, and T is continuous.
(b) A linear functional T on D is continuous if and only if for every compact set K there exist a constant C > 0 and l ∈ Z+ such that

|⟨T, φ⟩| ≤ C sup_{x∈K, |α|≤l} |D^α φ(x)|  for all φ with supp φ ⊆ K.  (16.3)

We show that the criterion (16.3) implies continuity of T. Indeed, let φn →_D 0. Then there exists a compact subset K ⊂ R^n such that supp φn ⊆ K for all n. By the criterion there are C > 0 and l ∈ Z+ with |⟨T, φn⟩| ≤ C sup |D^α φn(x)|, where the supremum is taken over all x ∈ K and multi-indices α with |α| ≤ l. Since D^α φn ⇉ 0 on K for all α, we in particular have sup |D^α φn(x)| → 0 as n → ∞. This shows ⟨T, φn⟩ → 0, and T is continuous.
For the proof of the converse direction, see [Tri92, p. 52].

16.2.1 Regular Distributions

A large subclass of distributions of D′ is given by ordinary functions via the correspondence f ↦ Tf given by

⟨Tf, φ⟩ = ∫_Ω f(x)φ(x) dx.

We are looking for a class of functions f which is as large as possible.
Definition 16.5 Let Ω be an open subset of R^n. A function f(x) on Ω is said to be locally integrable over Ω if f(x) is integrable over every compact subset K ⊂ Ω; we write in this case f ∈ L¹loc(Ω).
Remark 16.4 The following are equivalent:
(a) f ∈ L¹loc(R^n).
(b) For any R > 0, f ∈ L¹(U_R(0)).
(c) For any x0 ∈ R^n there exists ε > 0 such that f ∈ L¹(U_ε(x0)).
Lemma 16.1 If f is locally integrable, f ∈ L¹loc(Ω), then Tf is a distribution, Tf ∈ D′(Ω).
A distribution T which is of the form T = Tf with some locally integrable function f is called regular.
Proof. First, Tf is a linear functional on D since integration is a linear operation. Secondly, if φm →_D 0, then there exists a compact set K with supp φm ⊆ K for all m. We have the following estimate:

|∫_{R^n} f(x)φm(x) dx| ≤ sup_{x∈K} |φm(x)| · ∫_K |f(x)| dx = C sup_{x∈K} |φm(x)|,

where C = ∫_K |f| dx exists since f ∈ L¹loc. The expression on the right tends to 0 since φm(x) uniformly tends to 0. Hence ⟨Tf, φm⟩ → 0 and Tf belongs to D′.

Example 16.3 (a) C(Ω) ⊂ L¹loc(Ω), L¹(Ω) ⊂ L¹loc(Ω).
(b) f(x) = 1/x is in L¹loc((0, 1)); however, f ∉ L¹((0, 1)) and f ∉ L¹loc(R) since f is not integrable over [−1, 1].
Lemma 16.2 (Du Bois-Reymond, Fundamental Lemma of the Calculus of Variations) Let Ω ⊆ R^n be a region. Suppose that f ∈ L¹loc(Ω) and ⟨Tf, φ⟩ = 0 for all φ ∈ D(Ω). Then f = 0 almost everywhere in Ω.

Proof. For simplicity we consider the case n = 1, Ω = (−π, π). Fix ε with 0 < ε < π. Let φn(x) = e^{−inx} h_ε(x), n ∈ Z. Then supp φn ⊆ [−π, π]. Since both e^{−inx} and h_ε are C^∞-functions, φn ∈ D(Ω) and

cn = ⟨Tf, φn⟩ = ∫_{−π}^{π} f(x) e^{−inx} h_ε(x) dx = 0,  n ∈ Z;

that is, all Fourier coefficients of f·h_ε ∈ L²[−π, π] vanish. From Theorem 13.13 (b) it follows that f·h_ε is 0 in L²(−π, π). By Proposition 12.16 it follows that f·h_ε is 0 a.e. in (−π, π). Since h_ε > 0 on (−ε, ε), f = 0 a.e. on (−ε, ε). Translating the argument to an arbitrary center point shows f = 0 a.e. in Ω.


Remark 16.5 The previous lemma shows: if f1 and f2 are locally integrable and Tf1 = Tf2, then f1 = f2 a.e.; that is, the correspondence f ↦ Tf is one-to-one. In this way we can identify the locally integrable functions with a subspace of the distributions, L¹loc(R^n) ⊆ D′(R^n).

16.2.2 Other Examples of Distributions


Definition 16.6 Every non-regular distribution is called singular. The most important example of a singular distribution is the δ-distribution δa, a ∈ R^n, defined by

⟨δa, φ⟩ = φ(a),  φ ∈ D.

It is immediate that δa is a linear functional on D. Suppose that φn →_D 0; then φn(x) → 0 pointwise. Hence δa(φn) = φn(a) → 0; the functional is continuous on D and therefore a distribution. We will also use the notation δ(x − a) in place of δa, and δ or δ(x) in place of δ0.
Proof that δa is singular. If δa ∈ D′ were regular, there would exist a function f ∈ L¹loc such that δa = Tf, that is, φ(a) = ∫_{R^n} f(x)φ(x) dx.
First proof. Let Ω ⊆ R^n be open such that a ∉ Ω. Let φ ∈ D(Ω), that is, supp φ ⊆ Ω. In particular φ(a) = 0, that is, ∫_Ω f(x)φ(x) dx = 0 for all φ ∈ D(Ω). By Du Bois-Reymond's lemma, f = 0 a.e. in Ω. Since Ω was arbitrary, f = 0 a.e. in R^n \ {a} and therefore f = 0 a.e. in R^n. It follows that Tf = 0 in D′(R^n); however, δa ≠ 0, a contradiction.
Second proof for a = 0. Since f ∈ L¹loc there exists ε > 0 such that

d := ∫_{U_ε(0)} |f(x)| dx < 1.

Putting φ(x) = h(x/ε) with the bump function h, we have supp φ = {x | ‖x‖ ≤ ε} and sup_{x∈R^n} |φ(x)| = φ(0) > 0, such that

|∫_{R^n} f(x)φ(x) dx| ≤ sup |φ(x)| ∫_{U_ε(0)} |f(x)| dx = φ(0)·d < φ(0).

This contradicts ∫_{R^n} f(x)φ(x) dx = |φ(0)| = φ(0).
In the same way one can show that the assignment

⟨T, φ⟩ = D^α φ(a),  a ∈ R^n,

defines an element of D′ which is singular. The distribution

⟨T, φ⟩ = ∫_{R^n} f(x) D^α φ(x) dx,  f ∈ L¹loc,

may be regular or singular, depending on the properties of f.

Locally integrable functions as well as δa describe mass, force, or charge densities. That is why L. Schwartz named the generalized functions "distributions".


16.2.3 Convergence and Limits of Distributions

There are many possibilities to approximate the δ distribution by a sequence of L¹loc functions.

Definition 16.7 A sequence (Tn), Tn ∈ D′(R^n), is said to converge to T ∈ D′(R^n) if and only if for all φ ∈ D(R^n)

lim_{n→∞} Tn(φ) = T(φ).

Similarly, for a family T_ε, ε > 0, of distributions in D′ we say that lim_{ε→0} T_ε = T if lim_{ε→0} T_ε(φ) = T(φ) for all φ ∈ D.
Note that D′(R^n) with the above notion of convergence is complete, see [Wal02, p. 39].

Example 16.4 Let f(x) = (1/2)χ_[−1,1] and f_ε = (1/ε)f(x/ε) the scaling of f. Note that f_ε = 1/(2ε)·χ_[−ε,ε]. We will show that f_ε → δ in D′(R). Indeed, for φ ∈ D(R), by the Mean Value Theorem of integration,

T_{f_ε}(φ) = (1/(2ε)) ∫_R χ_[−ε,ε]·φ(x) dx = (1/(2ε)) ∫_{−ε}^{ε} φ(x) dx = (1/(2ε))·2ε·φ(ξ) = φ(ξ),  ξ ∈ [−ε, ε],

for some ξ. Since φ is continuous at 0, φ(ξ) tends to φ(0) as ε → 0, such that

lim_{ε→0} T_{f_ε}(φ) = φ(0) = δ(φ).

This proves the claim.

The following lemma generalizes this example.

Lemma 16.3 Suppose that f ∈ L¹(R) with ∫_R f(x) dx = 1. For ε > 0 define the scaled function f_ε(x) = (1/ε)f(x/ε). Then lim_{ε→0+} T_{f_ε} = δ in D′(R).
Proof. By the change of variables theorem, ∫_R f_ε(x) dx = 1 for all ε > 0. To prove the claim we have to show that for all φ ∈ D

∫_R f_ε(x)φ(x) dx → φ(0) = ∫_R f_ε(x)φ(0) dx  as ε → 0;

or, equivalently,

∫_R f_ε(x)(φ(x) − φ(0)) dx → 0  as ε → 0.

Using the new coordinate y with x = εy, dx = ε dy, the above integral equals

∫_R ε·f_ε(εy)(φ(εy) − φ(0)) dy = ∫_R f(y)(φ(εy) − φ(0)) dy.

Since φ is continuous at 0, for every fixed y the family of functions (φ(εy) − φ(0)) tends to 0 as ε → 0. Hence the family of functions g_ε(y) = f(y)(φ(εy) − φ(0)) pointwise tends to 0. Further, g_ε has an integrable upper bound, 2C·|f|, where C = sup |φ(x)|. By Lebesgue's theorem on dominated convergence, the limit of the integrals is 0:

lim_{ε→0} ∫_R |f(y)||φ(εy) − φ(0)| dy = ∫_R |f(y)| lim_{ε→0} |φ(εy) − φ(0)| dy = 0.

This proves the claim.

The following families of locally integrable functions approximate δ as ε → 0:

f_ε(x) = (ε/π)·sin²(x/ε)/x²,
f_ε(x) = 1/(2ε√π)·e^{−x²/(4ε²)},
f_ε(x) = (1/π)·ε/(x² + ε²),
f_ε(x) = (1/(πx))·sin(x/ε).  (16.4)

The first three functions satisfy the assumptions of the lemma; the last one does not, since (sin x)/x is not in L¹(R). Later we will see that the above lemma even holds if ∫_R f(x) dx = 1 as an improper Riemann integral.

16.2.4 The Distribution P(1/x)

Since the function 1/x is not locally integrable in a neighborhood of 0, 1/x does not define a regular distribution. However, we can define a substitute that coincides with 1/x for all x ≠ 0.
Recall that the principal value (or Cauchy mean value) of an improper Riemann integral is defined as follows. Suppose f(x) has a singularity at c ∈ [a, b]; then

Vp ∫_a^b f(x) dx := lim_{ε→0} (∫_a^{c−ε} + ∫_{c+ε}^b) f(x) dx.

For example, Vp ∫_{−1}^{1} dx/x^{2n+1} = 0, n ∈ N.
For φ ∈ D define

F(φ) = Vp ∫_R φ(x)/x dx = lim_{ε→0} (∫_{−∞}^{−ε} + ∫_ε^{∞}) φ(x)/x dx.

Then F is obviously linear. We have to show that F(φ) is finite and that F is continuous on D. Suppose that supp φ ⊆ [−R, R]. Define the auxiliary function

ψ(x) = (φ(x) − φ(0))/x if x ≠ 0,  ψ(0) = φ′(0).

Since φ is differentiable at 0, ψ ∈ C(R). Since 1/x is odd, (∫_{−R}^{−ε} + ∫_ε^R) dx/x = 0 and we get

F(φ) = lim_{ε→0} (∫_{−R}^{−ε} + ∫_ε^R) φ(x)/x dx = lim_{ε→0} (∫_{−R}^{−ε} + ∫_ε^R) (φ(x) − φ(0))/x dx = lim_{ε→0} (∫_{−R}^{−ε} + ∫_ε^R) ψ(x) dx = ∫_{−R}^{R} ψ(x) dx.

Since ψ is continuous, the above integral exists.


We now prove continuity of F. By Taylor's theorem, φ(x) = φ(0) + x·φ′(ξ_x) for some value ξ_x between 0 and x. Therefore

|F(φ)| = |lim_{ε→0} (∫_{−∞}^{−ε} + ∫_ε^{∞}) φ(x)/x dx| = |lim_{ε→0} (∫_{−R}^{−ε} + ∫_ε^R) (φ(0) + x·φ′(ξ_x))/x dx| ≤ ∫_{−R}^{R} |φ′(ξ_x)| dx ≤ 2R sup_x |φ′(x)|.

This shows that condition (16.3) in Remark 16.3 is satisfied with C = 2R and l = 1, such that F is a continuous linear functional on D(R), F ∈ D′(R). We denote this distribution by P(1/x).
In quantum physics one needs the so-called Sokhotsky formulas, [Wla72, p. 76]:

lim_{ε→0+} 1/(x + iε) = −iπδ(x) + P(1/x),
lim_{ε→0+} 1/(x − iε) = iπδ(x) + P(1/x).

Idea of proof: show the sum and the difference of the above formulas instead:

lim_{ε→0+} 2x/(x² + ε²) = 2 P(1/x),
lim_{ε→0+} (−2iε)/(x² + ε²) = −2πi·δ.

The second limit follows from (16.4).

16.2.5 Operations with Distributions

Distributions are distinguished by the fact that in many calculations they are much easier to handle than functions. For this purpose it is necessary to define operations on the set D′. We already know how to add distributions and how to multiply them by complex numbers since D′ is a linear space. Our guiding principle for defining multiplication, derivatives, tensor products, convolution, and the Fourier transform is always the same: for regular distributions, i.e. locally integrable functions, we want to recover the old well-known operation.

(a) Multiplication
There is no general notion of a product T1·T2 of distributions. However, we can define aT = Ta for T ∈ D′(R^n), a ∈ C^∞(R^n). What happens in the case of a regular distribution T = Tf?
⟨aTf, φ⟩ = ∫_{R^n} a(x)f(x)φ(x) dx = ∫_{R^n} f(x)·(a(x)φ(x)) dx = ⟨Tf, aφ⟩.  (16.5)

Obviously, aφ ∈ D(R^n): aφ ∈ C^∞(R^n), and since φ has compact support, aφ has compact support, too. Hence the right hand side of (16.5) defines a linear functional on D(R^n). We have to show continuity. Suppose that φn →_D 0; then aφn →_D 0, and ⟨T, aφn⟩ → 0 since T is continuous.


Definition 16.8 For a ∈ C^∞(R^n) and T ∈ D′(R^n) we define aT ∈ D′(R^n) by

⟨aT, φ⟩ = ⟨T, aφ⟩

and call aT the product of a and T.
Example 16.5 (a) x·P(1/x) = 1. Indeed, for φ ∈ D(R),

⟨x P(1/x), φ⟩ = ⟨P(1/x), xφ(x)⟩ = Vp ∫_R (xφ(x))/x dx = ∫_R φ(x) dx = ⟨1, φ⟩.

(b) If f(x) ∈ C^∞(R^n) then

⟨f(x)δa, φ⟩ = ⟨δa, f(x)φ(x)⟩ = f(a)φ(a) = f(a)⟨δa, φ⟩.

This shows f(x)δa = f(a)δa.
(c) Note that multiplication of distributions is no longer associative:

(δ·x)·P(1/x) = 0·P(1/x) = 0, by (b);  δ·(x·P(1/x)) = δ·1 = δ, by (a).
(b) Differentiation
Consider n = 1. Suppose that f ∈ L¹loc is continuously differentiable. Suppose further that φ ∈ D with supp φ ⊆ (−a, a), so that φ(−a) = φ(a) = 0. We want to define (Tf)′ to be T_{f′}. Using integration by parts we find

⟨T_{f′}, φ⟩ = ∫_{−a}^{a} f′(x)φ(x) dx = f(x)φ(x)|_{−a}^{a} − ∫_{−a}^{a} f(x)φ′(x) dx = −∫_{−a}^{a} f(x)φ′(x) dx = −⟨Tf, φ′⟩,

where we used that φ(−a) = φ(a) = 0. Hence it makes sense to define ⟨T′f, φ⟩ = −⟨Tf, φ′⟩. This can easily be generalized to arbitrary partial derivatives D^α Tf.
Definition 16.9 For T ∈ D′(R^n) and a multi-index α ∈ N_0^n define D^α T ∈ D′(R^n) by

⟨D^α T, φ⟩ = (−1)^{|α|} ⟨T, D^α φ⟩.

We have to make sure that D^α T is indeed a distribution. The linearity of D^α T is obvious. To prove continuity let φn →_D 0. By definition, this implies D^α φn →_D 0. Since T is continuous, ⟨T, D^α φn⟩ → 0. This shows ⟨D^α T, φn⟩ → 0; hence D^α T is a continuous linear functional on D.
Note that exactly the fact D^α T ∈ D′ needs the complicated looking notion of convergence in D, D^α φn ⇉ D^α φ. Further, a distribution has partial derivatives of all orders.


Lemma 16.4 Let a ∈ C^∞(R^n) and T ∈ D′(R^n). Then
(a) Differentiation D^α : D′ → D′ is continuous in D′, that is, Tn → T in D′ implies D^α Tn → D^α T in D′.
(b) ∂(aT)/∂xi = (∂a/∂xi)·T + a·(∂T/∂xi), i = 1, …, n (product rule).
(c) For any two multi-indices α and β,

D^{α+β} T = D^α(D^β T) = D^β(D^α T)  (Schwarz's lemma).
Proof. (a) Suppose that Tn → T in D′, that is, ⟨Tn, φ⟩ → ⟨T, φ⟩ for all φ ∈ D. In particular, for ψ = D^α φ, φ ∈ D, we get

(−1)^{|α|}⟨D^α Tn, φ⟩ = ⟨Tn, D^α φ⟩ → ⟨T, D^α φ⟩ = (−1)^{|α|}⟨D^α T, φ⟩.

Since this holds for all φ ∈ D, the assertion follows.
(b) This follows from

⟨∂(aT)/∂xi, φ⟩ = −⟨aT, ∂φ/∂xi⟩ = −⟨T, a·∂φ/∂xi⟩ = −⟨T, ∂(aφ)/∂xi − (∂a/∂xi)·φ⟩ = ⟨∂T/∂xi, aφ⟩ + ⟨T, (∂a/∂xi)·φ⟩ = ⟨a·∂T/∂xi + (∂a/∂xi)·T, φ⟩.

(c) The easy proof uses D^{α+β}φ = D^α(D^β φ) for φ ∈ D.
Example 16.6 (a) Let a ∈ R^n, f ∈ L¹loc(R^n), φ ∈ D. Then

⟨D^α δa, φ⟩ = (−1)^{|α|}⟨δa, D^α φ⟩ = (−1)^{|α|} D^α φ(a),
⟨D^α Tf, φ⟩ = (−1)^{|α|} ∫_{R^n} f·D^α φ dx.

(b) Recall that the so-called Heaviside function H(x) is defined as the characteristic function of the half-line (0, +∞). We compute its derivative in D′:

⟨T′H, φ(x)⟩ = −∫_R H(x)φ′(x) dx = −∫_0^{∞} φ′(x) dx = −φ(x)|_0^{∞} = φ(0) = ⟨δ, φ⟩.

This shows T′H = δ.
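The identity ⟨T′H, φ⟩ = φ(0) can be checked numerically with the bump function as test function. This is an illustrative sketch (the finite-difference derivative and the quadrature are assumptions of the demo, not part of the text):

```python
import math

def phi(x):
    # bump test function supported in [-1, 1]
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def dphi(x, h=1e-5):
    # central finite difference approximation of phi'
    return (phi(x + h) - phi(x - h)) / (2 * h)

H = lambda x: 1.0 if x > 0 else 0.0

# <T_H', phi> = -integral of H(x) phi'(x) dx, midpoint rule on [-2, 2]
n, R = 200000, 2.0
w = 2 * R / n
lhs = -sum(H(-R + (k + 0.5) * w) * dphi(-R + (k + 0.5) * w) for k in range(n)) * w
# equals phi(0) = <delta, phi> = e^{-1}
assert abs(lhs - phi(0.0)) < 1e-3
```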
(c) More generally, let f(x) be differentiable on G = R \ {c} = (−∞, c) ∪ (c, ∞) with a discontinuity of the first kind at c. The derivative of Tf in D′ is

T′f = T_{f′} + h·δc,  where h = f(c+0) − f(c−0)

is the difference between the right-hand and left-hand limits of f at c. Indeed, for φ ∈ D we have

⟨T′f, φ⟩ = −(∫_{−∞}^{c} + ∫_c^{∞}) f(x)φ′(x) dx = −f(c−0)φ(c) + f(c+0)φ(c) + ∫_G f′(x)φ(x) dx = ⟨(f(c+0) − f(c−0))·δc + T_{f′}, φ⟩ = ⟨h·δc + T_{f′}, φ⟩.

(d) We prove that f(x) = log|x| is in L¹loc(R) (see homework 50.4 and 51.5) and compute its derivative in D′(R).
Proof. Since f(x) is continuous on R \ {0}, the only critical point is 0. Since the integral (improper Riemann or Lebesgue)

∫_0^1 log x dx = −∫_0^{∞} t·e^{−t} dt = −1

exists, f is locally integrable at 0 and therefore defines a regular distribution. We will show that f′(x) = P(1/x). We use the fact that ∫_R = ∫_{−∞}^{−ε} + ∫_{−ε}^{ε} + ∫_ε^{∞} for all positive ε > 0; the limit ε → 0 of the right hand side again gives ∫_R. By definition of the derivative,

⟨(log|x|)′, φ(x)⟩ = −⟨log|x|, φ′(x)⟩ = −∫_R log|x|·φ′(x) dx = −(∫_{−∞}^{−ε} + ∫_{−ε}^{ε} + ∫_ε^{∞}) log|x|·φ′(x) dx.

Since ∫_{−1}^{1} |log|x|·φ′(x)| dx < ∞, the middle integral ∫_{−ε}^{ε} log|x|·φ′(x) dx tends to 0 as ε → 0 (apply Lebesgue's theorem to the family of functions g_ε(x) = χ_[−ε,ε](x)·log|x|·φ′(x), which pointwise tends to 0 and is dominated by the integrable function |log|x|·φ′(x)|). We consider the third integral. Integration by parts and φ(+∞) = 0 give

−∫_ε^{∞} log x·φ′(x) dx = −log x·φ(x)|_ε^{∞} + ∫_ε^{∞} φ(x)/x dx = log ε·φ(ε) + ∫_ε^{∞} φ(x)/x dx.

Similarly,

−∫_{−∞}^{−ε} log(−x)·φ′(x) dx = −log ε·φ(−ε) + ∫_{−∞}^{−ε} φ(x)/x dx.

The sum of the two non-integral terms tends to 0 as ε → 0 since ε·log ε → 0. Indeed,

log ε·φ(ε) − log ε·φ(−ε) = log ε · (φ(ε) − φ(−ε))/(2ε) · 2ε → 2·lim_{ε→0} ε log ε · φ′(0) = 0.

Hence,

⟨f′, φ⟩ = lim_{ε→0} (∫_{−∞}^{−ε} + ∫_ε^{∞}) φ(x)/x dx = ⟨P(1/x), φ⟩.


(c) Convergence and Fourier Series

Lemma 16.5 Suppose that (fn) converges locally uniformly to some function f, that is, fn ⇉ f uniformly on every compact set; assume further that fn is locally integrable for all n, fn ∈ L¹loc(R^n). Then
(a) f ∈ L¹loc(R^n) and Tfn → Tf in D′(R^n);
(b) D^α Tfn → D^α Tf in D′(R^n) for all multi-indices α.

Proof. (a) Let K be a compact subset of R^n; we will show that f ∈ L¹(K). Since the fn converge uniformly on K to f, by Theorem 6.6 f is integrable and lim_{n→∞} ∫_K fn(x) dx = ∫_K f dx; hence f ∈ L¹loc(R^n).
We show that Tfn → Tf in D′. Indeed, for any φ ∈ D with compact support K, again by Theorem 6.6 and the uniform convergence of fn on K,

lim_{n→∞} Tfn(φ) = lim_{n→∞} ∫_K fn(x)φ(x) dx = ∫_K (lim_{n→∞} fn(x))·φ(x) dx = ∫_K f(x)φ(x) dx = Tf(φ);

we are allowed to exchange limit and integration since (fn(x)φ(x)) converges uniformly on K. Since this is true for all φ ∈ D, it follows that Tfn → Tf.
(b) By Lemma 16.4 (a), differentiation is a continuous operation in D′. Thus D^α Tfn → D^α Tf.

Example 16.7 (a) Suppose that a, b > 0 and m ∈ N are given such that |cn| ≤ a|n|^m + b for all n ∈ Z. Then the Fourier series

Σ_{n∈Z} cn e^{inx}

converges in D′(R). First consider the series

c0·x^{m+2}/(m+2)! + Σ_{n∈Z, n≠0} cn/(ni)^{m+2}·e^{inx}.  (16.6)

By assumption,

|cn/(ni)^{m+2}·e^{inx}| = |cn|/|n|^{m+2} ≤ (a|n|^m + b)/|n|^{m+2} ≤ (a + b)/|n|².

Since Σ_{n≠0} 1/|n|² < ∞, the series (16.6) converges uniformly on R by the Weierstraß criterion (Theorem 6.2). By Lemma 16.5, the series (16.6) converges in D′, too, and can be differentiated term by term. The (m+2)nd derivative of (16.6) is exactly the given Fourier series.
The 2π-periodic function f(x) = 1/2 − x/(2π), x ∈ [0, 2π), has discontinuities of the first kind at 2πn, n ∈ Z; the jump has height 1 since

f(0+0) − f(0−0) = 1/2 + 1/2 = 1.

Therefore, in D′,

f′(x) = −1/(2π) + Σ_{n∈Z} δ(x − 2πn).

The Fourier series of f(x) is

f(x) ∼ 1/(2πi)·Σ_{n≠0} (1/n)·e^{inx}.

Note that f and the Fourier series g on the right are equal in L²(0, 2π). Hence ∫_0^{2π} |f − g|² = 0. This implies f = g a.e. on [0, 2π]; moreover f = g a.e. on R. Thus f = g in L¹loc(R) and f coincides with g in D′(R):

f(x) = 1/(2πi)·Σ_{n≠0} (1/n)·e^{inx}  in D′(R).

By Lemma 16.5 the series can be differentiated termwise up to arbitrary order. Applying Example 16.6 we obtain:

f′(x) + 1/(2π) = Σ_{n∈Z} δ(x − 2πn) = 1/(2π)·Σ_{n∈Z} e^{inx}  in D′(R).
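Away from the jumps, the Fourier series converges pointwise to the sawtooth, which can be checked numerically. This is an illustrative sketch (not from the text), using the real form (1/π)·Σ sin(nx)/n of the series above:

```python
import math

def f(x):
    # the 2*pi-periodic sawtooth, equal to 1/2 - x/(2*pi) on [0, 2*pi)
    x = math.fmod(x, 2 * math.pi)
    if x < 0:
        x += 2 * math.pi
    return 0.5 - x / (2 * math.pi)

def partial_sum(x, N):
    # 1/(2*pi*i) * sum over 0 < |n| <= N of e^{inx}/n = (1/pi) * sum sin(nx)/n
    return sum(math.sin(n * x) / n for n in range(1, N + 1)) / math.pi

# pointwise convergence at points away from the jumps at 2*pi*n
for x in (1.0, 2.5, 4.0):
    assert abs(partial_sum(x, 200000) - f(x)) < 1e-3
```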

(b) A solution of x^m u(x) = 0 in D′ is

u(x) = Σ_{n=0}^{m−1} cn δ^(n)(x),  cn ∈ C,

since for every φ ∈ D and n = 0, …, m−1 we have

⟨x^m δ^(n)(x), φ⟩ = (−1)^n ⟨δ, (x^m φ(x))^(n)⟩ = (−1)^n (x^m φ(x))^(n)|_{x=0} = 0;

thus the given u satisfies x^m u = 0. One can show that this is the general solution, see [Wla72, p. 84].
(c) The general solution of the ODE u^(m) = 0 in D′ is a polynomial of degree at most m − 1.
Proof. We only prove that u′ = 0 implies u = c in D′; the general statement follows by induction on m. Suppose that u′ = 0, that is, for all φ ∈ D we have 0 = ⟨u′, φ⟩ = −⟨u, φ′⟩. Fix an auxiliary function φ1 ∈ D with ⟨1, φ1⟩ = 1. For φ ∈ D the function

ψ(x) = ∫_{−∞}^{x} (φ(t) − φ1(t)·I) dt,  where I = ⟨1, φ⟩,

belongs to D, since both φ and φ1 do and since ∫_R (φ(t) − φ1(t)·I) dt = I − I = 0. Since ⟨u, ψ′⟩ = 0 and ψ′ = φ − I·φ1, we obtain

0 = ⟨u, ψ′⟩ = ⟨u, φ − φ1·I⟩ = ⟨u, φ⟩ − ⟨u, φ1⟩·⟨1, φ⟩ = ⟨u, φ⟩ − ⟨⟨u, φ1⟩·1, φ⟩ = ⟨u − c·1, φ⟩,

where c = ⟨u, φ1⟩. Since this is true for all test functions φ ∈ D(R), we obtain u − c·1 = 0, that is, u = c, which proves the assertion.

16.3 Tensor Product and Convolution Product



16.3.1 The Support of a Distribution
Let T ∈ D′ be a distribution. We say that T vanishes at x0 if there exists ε > 0 such that ⟨T, φ⟩ = 0 for all functions φ ∈ D with supp φ ⊆ U_ε(x0). Similarly, we say that two distributions T1 and T2 are equal at x0, T1(x0) = T2(x0), if T1 − T2 vanishes at x0. Note that T1 = T2 if and only if T1 = T2 at a for all points a ∈ R^n.
Definition 16.10 Let T ∈ D′ be a distribution. The support of T, denoted by supp T, is the set of all points x such that T does not vanish at x, that is,

supp T = {x | ∀ε > 0 ∃φ ∈ D(U_ε(x)) : ⟨T, φ⟩ ≠ 0}.

Remarks 16.6 (a) If f is continuous, then supp Tf = supp f; for an arbitrary locally integrable function we have, in general, supp Tf ⊆ supp f. The support of a distribution is closed. Its complement is the largest open subset G of R^n such that T restricted to G is 0.
(b) supp δa = {a}, that is, δa vanishes at all points b ≠ a; supp TH = [0, +∞); supp Tχ_Q = ∅ for the characteristic function χ_Q of the rationals, which is 0 a.e.

16.3.2 Tensor Products


(a) Tensor Product of Functions
Let f : R^n → C, g : R^m → C be functions. Then the tensor product f ⊗ g : R^{n+m} → C is defined via (f ⊗ g)(x, y) = f(x)g(y), x ∈ R^n, y ∈ R^m.
If φk ∈ D(R^n) and ψk ∈ D(R^m), k = 1, …, r, we call the function φ(x, y) = Σ_{k=1}^{r} φk(x)ψk(y), which is defined on R^{n+m}, the tensor product of the functions φk and ψk. It is denoted by Σ_k φk ⊗ ψk. The set of such tensors Σ_{k=1}^{r} φk ⊗ ψk is denoted by D(R^n) ⊗ D(R^m). It is a linear space.
Note first that under the above assumptions on φk and ψk the tensor product φ = Σ_k φk ⊗ ψk belongs to C^∞(R^{n+m}). Let K1 ⊂ R^n and K2 ⊂ R^m denote common compact supports of the families {φk} and {ψk}, respectively. Then supp φ ⊆ K1 × K2. Since both K1 and K2 are compact, the product K1 × K2 is again compact. Hence φ(x, y) ∈ D(R^{n+m}); thus D(R^n) ⊗ D(R^m) ⊆ D(R^{n+m}). Moreover, D(R^n) ⊗ D(R^m) is a dense subspace of D(R^{n+m}). That is, for any φ ∈ D(R^{n+m}) there exist positive integers rm ∈ N and test functions φk^(m), ψk^(m) such that

Σ_{k=1}^{rm} φk^(m) ⊗ ψk^(m) →_D φ  as m → ∞.

(b) Tensor Product of Distributions

Definition 16.11 Let T ∈ D′(R^n) and S ∈ D′(R^m) be two distributions. Then there exists a unique distribution F ∈ D′(R^{n+m}) such that for all φ ∈ D(R^n) and ψ ∈ D(R^m)

F(φ ⊗ ψ) = T(φ)S(ψ).

This distribution F is denoted by T ⊗ S.


Indeed, T ⊗ S is linear on D(R^n) ⊗ D(R^m), such that (T ⊗ S)(Σ_{k=1}^{r} φk ⊗ ψk) = Σ_{k=1}^{r} T(φk)S(ψk). By continuity it is extended from D(R^n) ⊗ D(R^m) to D(R^{n+m}). For example, if a ∈ R^n, b ∈ R^m, then δa ⊗ δb = δ(a,b). Indeed, for φ ∈ D(R^n) and ψ ∈ D(R^m) we have

(δa ⊗ δb)(φ ⊗ ψ) = φ(a)ψ(b) = (φ ⊗ ψ)(a, b) = δ(a,b)(φ ⊗ ψ).

Lemma 16.6 Let F = T ⊗ S be the unique distribution in D′(R^{n+m}), where T ∈ D′(R^n) and S ∈ D′(R^m), and let φ(x, y) ∈ D(R^{n+m}).
Then ⟨S(y), φ(x, y)⟩ is in D(R^n) (as a function of x), ⟨T(x), φ(x, y)⟩ is in D(R^m) (as a function of y), and we have

⟨T ⊗ S, φ⟩ = ⟨S, ⟨T, φ⟩⟩ = ⟨T, ⟨S, φ⟩⟩.
For the proof, see [Wla72, II.7].
Example 16.8 (a) Regular distributions. Let f ∈ L¹loc(R^n) and g ∈ L¹loc(R^m). Then f ⊗ g ∈ L¹loc(R^{n+m}) and Tf ⊗ Tg = T_{f⊗g}. Indeed, by Fubini's theorem, for test functions φ and ψ one has

⟨Tf ⊗ Tg, φ ⊗ ψ⟩ = ⟨Tf, φ⟩⟨Tg, ψ⟩ = ∫_{R^n} f(x)φ(x) dx · ∫_{R^m} g(y)ψ(y) dy = ∫_{R^{n+m}} f(x)g(y)·φ(x)ψ(y) dx dy = ⟨T_{f⊗g}, φ ⊗ ψ⟩.

(b) ⟨δ_{x0} ⊗ T, φ⟩ = ⟨T, φ(x0, y)⟩. Indeed,

⟨δ_{x0} ⊗ T, φ(x)ψ(y)⟩ = ⟨δ_{x0}, φ(x)⟩⟨T, ψ⟩ = φ(x0)⟨T, ψ(y)⟩ = ⟨T, φ(x0)ψ(y)⟩.

In particular,

(δa ⊗ Tg)(φ) = ∫_{R^m} g(y)φ(a, y) dy.

(c) For any α ∈ N_0^n, β ∈ N_0^m,

D_x^α D_y^β (T ⊗ S) = (D^α T) ⊗ (D^β S) = D_y^β((D^α T) ⊗ S) = D_x^α(T ⊗ D^β S).

Idea of proof in the case n = m = 1: let φ, ψ ∈ D(R). Then

(∂/∂x)(T ⊗ S)(φ ⊗ ψ) = −(T ⊗ S)((∂/∂x)(φ ⊗ ψ)) = −(T ⊗ S)(φ′ ⊗ ψ) = −T(φ′)S(ψ) = T′(φ)S(ψ) = (T′ ⊗ S)(φ ⊗ ψ).
16.3.3 Convolution Product

Motivation: knowing a fundamental solution E of a linear differential operator L, that is, L(E) = δ, one immediately has a solution of the equation L[u] = f for arbitrary f, namely u = E ∗ f, where ∗ is the convolution product already defined for functions in Definition 16.2.


(a) Convolution Product of Functions


The main problem with convolutions is that we run into trouble with the support: even in case f and g are locally integrable, f ∗ g need not be a locally integrable function. However, there are three cases where all is fine:
1. One of the two functions f or g has compact support.
2. Both functions have support in [0, +∞).
3. Both functions are in L¹(R).
In the last case (f ∗ g)(x) = ∫_R f(y)g(x − y) dy is again integrable. The convolution product is a commutative and associative operation on L¹(R^n).
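Commutativity in the L¹ case can be checked numerically. This is an illustrative sketch (the sample functions f(x) = e^{−|x|} and g = χ_[0,1] are assumptions of the demo):

```python
import math

def conv(f, g, x, R=10.0, n=40000):
    # (f*g)(x) = integral of f(y) g(x - y) dy, midpoint rule on [-R, R]
    h = 2 * R / n
    s = 0.0
    for k in range(n):
        y = -R + (k + 0.5) * h
        s += f(y) * g(x - y)
    return s * h

f = lambda x: math.exp(-abs(x))                  # f in L1(R)
g = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0    # g in L1(R), compact support

# the convolution product is commutative: f*g = g*f
for x in (-0.5, 0.3, 2.0):
    assert abs(conv(f, g, x) - conv(g, f, x)) < 5e-3
```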
(b) Convolution Product of Distributions
Let us consider the case of regular distributions. Suppose that f, g, f ∗ g ∈ L¹loc(R^n). As usual we want to have Tf ∗ Tg = T_{f∗g}. Let φ ∈ D(R^n); then, substituting t = x − y,

⟨T_{f∗g}, φ⟩ = ∫_R (f ∗ g)(x)φ(x) dx = ∫∫_{R^{2n}} f(y)g(x − y)φ(x) dx dy = ∫∫_{R^{2n}} f(y)g(t)φ(y + t) dy dt = (Tf ⊗ Tg)(ψ),  (16.7)

where ψ(y, t) = φ(y + t).

[Figure: the (y, t)-plane showing the unbounded strip supp φ(y + t) = {(y, t) | y + t ∈ supp φ} together with supp ψ.]

There are two problems. (a) In general ψ is not a test function since it has unbounded support
in R²ⁿ: indeed, (y, t) ∈ supp ψ if y + t = c ∈ supp φ, which describes a family of parallel
lines forming an unbounded strip. (b) The integral need not exist. We overcome the second
problem if we impose the condition that the set

  K = {(y, t) ∈ R²ⁿ | y ∈ supp T_f, t ∈ supp T_g, y + t ∈ supp φ}

is bounded for any φ ∈ D(Rⁿ); then the integral (16.7) makes sense.
We want to solve problem (a) by cutting ψ off.


Define

  T_{f∗g}(φ) = lim_{k→∞} (T_f ⊗ T_g)(φ(y + t)η_k(y, t)),

where η_k → 1 as k → ∞ and η_k ∈ D(R²ⁿ). Such a sequence exists: let η(y, t) ∈ D(R²ⁿ)
with η(y, t) = 1 for ‖y‖² + ‖t‖² ≤ 1, and put η_k(y, t) = η(y/k, t/k), k ∈ N. Then
lim_{k→∞} η_k(y, t) = 1 for all (y, t) ∈ R²ⁿ.


Definition 16.12 Let T, S ∈ D′(Rⁿ) be distributions and assume that for every φ ∈ D(Rⁿ) the
set

  K := {(x, y) ∈ R²ⁿ | x + y ∈ supp φ, x ∈ supp T, y ∈ supp S}

is bounded. Define

  ⟨T ∗ S , φ⟩ = lim_{k→∞} ⟨T ⊗ S , φ(x + y)η_k(x, y)⟩.        (16.8)

T ∗ S is called the convolution product of the distributions S and T.


Remark 16.7 (a) The sequence (16.8) becomes stationary for large k, so that the limit exists.
Indeed, for fixed φ, the set K is bounded, hence there exists k₀ ∈ N such that η_k(x, y) = 1 for
all (x, y) ∈ K and all k ≥ k₀. That is, φ(x + y)η_k(x, y) does not change on K for k ≥ k₀.
(b) The limit is a distribution in D′(Rⁿ).
(c) The limit does not depend on the special choice of the sequence η_k.
Remarks 16.8 (Properties) (a) If S or T has compact support, then T ∗ S exists. Indeed,
suppose that supp T is compact. Then x + y ∈ supp φ and x ∈ supp T imply y ∈ supp φ −
supp T = {y₁ − y₂ | y₁ ∈ supp φ, y₂ ∈ supp T}. Hence, (x, y) ∈ K implies

  ‖(x, y)‖ ≤ ‖x‖ + ‖y‖ ≤ ‖x‖ + ‖y₁‖ + ‖y₂‖ ≤ 2C + D

if supp T ⊆ U_C(0) and supp φ ⊆ U_D(0). That is, K is bounded.
(b) If S ∗ T exists, so does T ∗ S and S ∗ T = T ∗ S.
(c) If T ∗ S exists, so do D^α T ∗ S, T ∗ D^α S, D^α(T ∗ S), and they coincide:

  D^α(T ∗ S) = D^α T ∗ S = T ∗ D^α S.
Proof. For simplicity let n = 1 and D = d/dx. Suppose that φ ∈ D(R). Then

  ⟨(S ∗ T)′ , φ⟩ = −⟨S ∗ T , φ′⟩ = −lim_{k→∞} ⟨S ⊗ T , φ′(x + y) η_k(x, y)⟩
    = −lim_{k→∞} ⟨S ⊗ T , ∂/∂x (φ(x + y)η_k(x, y)) − φ(x + y) ∂η_k/∂x⟩
    = lim_{k→∞} ⟨S′ ⊗ T , φ(x + y)η_k(x, y)⟩ + lim_{k→∞} ⟨S ⊗ T , φ(x + y) ∂η_k/∂x⟩
    = ⟨S′ ∗ T , φ⟩,

since ∂η_k/∂x = 0 on the bounded set K for large k, so the second limit vanishes.
The proof of the second equality uses commutativity of the convolution product.
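The differentiation rule D(T ∗ S) = DT ∗ S = T ∗ DS can be illustrated numerically for smooth, rapidly decaying functions (a sketch of our own, with finite differences standing in for D):

```python
import numpy as np

# Check D(f*g) = Df*g = f*Dg on a grid, up to discretization error.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

f = np.exp(-x**2)            # plays the role of T
g = np.exp(-(x - 1)**2 / 2)  # plays the role of S

conv = lambda a, b: np.convolve(a, b, mode="same") * dx
d = lambda a: np.gradient(a, dx)   # second-order finite difference

lhs = d(conv(f, g))      # D(f*g)
mid = conv(d(f), g)      # Df * g
rhs = conv(f, d(g))      # f * Dg

assert np.allclose(lhs, mid, atol=1e-4)
assert np.allclose(mid, rhs, atol=1e-4)
```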

(d) If supp S is compact and η ∈ D(Rⁿ) is such that η(y) = 1 in a neighborhood of supp S,
then

  (T ∗ S)(φ) = ⟨T ⊗ S , φ(x + y)η(y)⟩,   φ ∈ D(Rⁿ).

(e) If T₁, T₂, T₃ ∈ D′(Rⁿ) all have compact support, then T₁ ∗ (T₂ ∗ T₃) and (T₁ ∗ T₂) ∗ T₃ exist
and T₁ ∗ (T₂ ∗ T₃) = (T₁ ∗ T₂) ∗ T₃.


16.3.4 Linear Change of Variables


Suppose that y = Ax + b is a regular, linear change of variables; that is, A is a regular n × n
matrix. As usual, consider first the case of a regular distribution f(x). Let f̃(x) = f(Ax + b);
with y = Ax + b, x = A⁻¹(y − b), dy = |det A| dx. Then

  ⟨f̃(x) , φ(x)⟩ = ∫ f(Ax + b)φ(x) dx = ∫ f(y)φ(A⁻¹(y − b)) (1/|det A|) dy
              = (1/|det A|) ⟨f(y) , φ(A⁻¹(y − b))⟩.
Definition 16.13 Let T ∈ D′(Rⁿ), A a regular n × n matrix and b ∈ Rⁿ. Then T(Ax + b)
denotes the distribution

  ⟨T(Ax + b) , φ(x)⟩ := (1/|det A|) ⟨T(y) , φ(A⁻¹(y − b))⟩.

For example, ⟨T(x − a) , φ(x)⟩ = ⟨T , φ(x + a)⟩; in particular, δ(x − a) = δ_a:

  ⟨δ(x − b) , φ(x)⟩ = ⟨δ(x) , φ(x + b)⟩ = φ(0 + b) = φ(b) = ⟨δ_b , φ⟩.

Example 16.9 (a) δ ∗ S = S ∗ δ = S for all S ∈ D′. The existence is clear since δ has compact
support.

  ⟨δ ∗ S , φ⟩ = lim_{k→∞} ⟨δ(x) ⊗ S(y) , φ(x + y)η_k(x, y)⟩
            = lim_{k→∞} ⟨S(y) , φ(y)η_k(0, y)⟩ = ⟨S , φ⟩.

(b) δ_a ∗ S = S(x − a). Indeed,

  (δ_a ∗ S)(φ) = lim_{k→∞} ⟨δ_a ⊗ S , φ(x + y)η_k(x, y)⟩
             = lim_{k→∞} ⟨S(y) , φ(a + y)η_k(a, y)⟩ = ⟨S(y) , φ(a + y)⟩ = ⟨S(y − a) , φ⟩.

In particular, δ_a ∗ δ_b = δ_{a+b}.
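A discrete picture of these identities (an illustration of our own, not the text's proof): δ_a and δ_b become one-hot vectors, and their convolution is one-hot at a + b, while convolving with δ_a shifts by a.

```python
import numpy as np

N = 32
a, b = 5, 9
delta_a = np.zeros(N); delta_a[a] = 1.0
delta_b = np.zeros(N); delta_b[b] = 1.0

conv = np.convolve(delta_a, delta_b)      # full convolution, length 2N-1
assert conv[a + b] == 1.0
assert conv.sum() == 1.0                  # a single unit mass at index a+b

# delta_a * S shifts S by a:
S = np.arange(N, dtype=float)
shifted = np.convolve(delta_a, S)[:N]     # truncate to the original length
assert np.array_equal(shifted[a:], S[:N - a])
assert np.array_equal(shifted[:a], np.zeros(a))
```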
(c) Let ρ ∈ L¹_loc(Rⁿ) with supp T_ρ compact.
Case n = 2: f(x) = log(1/‖x‖) ∈ L¹_loc(R²). We call

  V(x) = (ρ ∗ f)(x) = ∫∫_{R²} ρ(y) log(1/‖x − y‖) dy

the surface potential with density ρ.
Case n ≥ 3: f(x) = 1/‖x‖^{n−2} ∈ L¹_loc(Rⁿ). We call

  V(x) = (ρ ∗ f)(x) = ∫_{Rⁿ} ρ(y)/‖x − y‖^{n−2} dy

the volume potential with density ρ.
(d) For σ > 0 and x ∈ R put f_σ(x) = (1/(√(2π)σ)) e^{−x²/(2σ²)}. Then f_σ ∗ f_τ = f_{√(σ²+τ²)}.
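This semigroup identity for Gaussian densities can be verified numerically (a discretization of our own; the grid and tolerance are arbitrary choices):

```python
import numpy as np

# Check f_sigma * f_tau = f_sqrt(sigma^2 + tau^2) for Gaussian densities
# f_s(x) = exp(-x^2/(2 s^2)) / (sqrt(2 pi) s).
def f(x, s):
    return np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

x = np.linspace(-12.0, 12.0, 2401)   # odd length, symmetric around 0
dx = x[1] - x[0]
sigma, tau = 1.0, 0.5

conv = np.convolve(f(x, sigma), f(x, tau), mode="same") * dx
assert np.max(np.abs(conv - f(x, np.sqrt(sigma**2 + tau**2)))) < 1e-6
```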


16.3.5 Fundamental Solutions


Suppose that L[u] is a linear differential operator on Rⁿ,

  L[u] = Σ_{|α|≤k} c_α(x) D^α u,

where c_α ∈ C∞(Rⁿ).

Definition 16.14 A distribution E ∈ D′(Rⁿ) is said to be a fundamental solution of the differential operator L if

  L(E) = δ.

Note that E ∈ D′(Rⁿ) need not be unique. It is a general result due to Malgrange and Ehrenpreis (1952) that any linear partial differential operator with constant coefficients possesses a
fundamental solution.
(a) ODE
We start with an example from the theory of ordinary differential equations. Recall that H =
χ_{(0,+∞)} denotes the Heaviside function.

Lemma 16.7 Suppose that u(t) is a solution of the following initial value problem for the ODE

  L[u] = u^{(m)} + a₁(t)u^{(m−1)} + ··· + a_m(t)u = 0,
  u(0) = u′(0) = ··· = u^{(m−2)}(0) = 0,   u^{(m−1)}(0) = 1.

Then E = T_{H(t)u(t)} is a fundamental solution of L, that is, it satisfies L(E) = δ.

Proof. Using the Leibniz rule, Example 16.5 (b), and u(0) = 0 we find

  E′ = u(0)δ + T_{Hu′} = T_{Hu′}.

Similarly, one has

  E″ = T_{Hu″}, …, E^{(m−1)} = T_{Hu^{(m−1)}},   E^{(m)} = T_{Hu^{(m)}} + δ(t).

This yields

  L(E) = E^{(m)} + a₁(t)E^{(m−1)} + ··· + a_m(t)E = T_{H(t)L[u](t)} + δ = T₀ + δ = δ.

Example 16.10 We have the following examples of fundamental solutions:

  y′ + ay = 0:   E = T_{H(x)e^{−ax}};
  y″ + a²y = 0:   E = T_{H(x) sin(ax)/a}.
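That E = T_{H(x)e^{−ax}} is a fundamental solution of y′ + ay can be tested weakly against a concrete test function of our own choosing, since ⟨E′ + aE, φ⟩ = ∫₀^∞ e^{−at}(−φ′(t) + aφ(t)) dt should equal φ(0):

```python
import numpy as np

a = 2.0
t = np.linspace(0.0, 25.0, 200001)       # H(t) restricts everything to t >= 0
dt = t[1] - t[0]
phi = np.exp(-t**2)                       # smooth, rapidly decaying test function
dphi = -2 * t * np.exp(-t**2)             # its derivative

integrand = np.exp(-a * t) * (-dphi + a * phi)
# trapezoidal rule, written out explicitly
lhs = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
assert abs(lhs - phi[0]) < 1e-5           # phi(0) = 1
```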


(b) PDE
Here is the main application of the convolution product: knowing a fundamental solution of
a partial differential operator L, one immediately knows a weak solution of the inhomogeneous
equation L(u) = f for suitable f ∈ D′(Rⁿ).
Theorem 16.8 Suppose that L[u] = Σ_{|α|≤k} c_α D^α u is a linear differential operator in Rⁿ with
constant coefficients c_α. Suppose further that E ∈ D′(Rⁿ) is a fundamental solution of L. Let
f ∈ D′(Rⁿ) be a distribution such that the convolution product S = E ∗ f exists.
Then L(S) = f in D′.
In the set of distributions of D′ which possess a convolution with E, S is the unique solution of
L(S) = f.

Proof. By Remark 16.8 (c) we have

  L(S) = Σ_{|α|≤k} c_α D^α(E ∗ f) = (Σ_{|α|≤k} c_α D^α E) ∗ f = L(E) ∗ f = δ ∗ f = f.

Suppose that S₁ and S₂ are both solutions of L(S) = f, i.e. L(S₁) = L(S₂) = f. Then

  S₁ − S₂ = (S₁ − S₂) ∗ δ = (S₁ − S₂) ∗ Σ_{|α|≤k} c_α D^α E
         = (Σ_{|α|≤k} c_α D^α(S₁ − S₂)) ∗ E = (f − f) ∗ E = 0.        (16.9)

16.4 Fourier Transformation in S(Rⁿ) and S′(Rⁿ)

We want to define the Fourier transformation for test functions as well as for distributions.
The problem with D(Rⁿ) is that the Fourier transform

  F(φ)(ξ) = κ_n ∫_{Rⁿ} e^{−iξ·x} φ(x) dx

of φ is (the restriction to Rⁿ of) an entire analytic function. That is, F(φ) does not have compact
support: the only test function in D which is analytic is 0. To overcome this problem, we
enlarge the space of test functions, D ⊆ S, in such a way that S becomes invariant under the
Fourier transformation, F(S) ⊆ S.

Lemma 16.9 Let φ ∈ D(R). Then the Fourier transform g(z) = κ₁ ∫_R e^{−itz} φ(t) dt is holomorphic in the whole complex plane and bounded in any half-plane H_a = {z ∈ C | Im(z) ≤ a}.


Proof. (a) We show that the complex limit lim_{h→0}(g(z + h) − g(z))/h exists for all z ∈ C.
Indeed,

  (g(z + h) − g(z))/h = κ₁ ∫_R e^{−izt} ((e^{−iht} − 1)/h) φ(t) dt.

Since |e^{−izt} ((e^{−iht} − 1)/h) φ(t)| ≤ C for all t ∈ supp(φ), h ∈ C, |h| ≤ 1, we can apply Lebesgue's
Dominated Convergence theorem:

  lim_{h→0} (g(z + h) − g(z))/h = κ₁ ∫_R e^{−izt} lim_{h→0} ((e^{−iht} − 1)/h) φ(t) dt
                              = κ₁ ∫_R e^{−izt}(−it)φ(t) dt = F(−itφ(t))(z).

(b) Suppose that Im(z) ≤ a. Then

  |g(z)| ≤ κ₁ ∫_R |e^{−it Re(z)}| e^{t Im(z)} |φ(t)| dt ≤ κ₁ sup_{t∈K} |φ(t)| ∫_K e^{ta} dt,

where K is a compact set which contains supp φ.

16.4.1 The Space S(Rⁿ)

Definition 16.15 S(Rⁿ) is the set of all functions f ∈ C∞(Rⁿ) such that for all multi-indices
α and β

  p_{α,β}(f) = sup_{x∈Rⁿ} |x^β D^α f(x)| < ∞.

S is called the Schwartz space or the space of rapidly decreasing functions:

  S(Rⁿ) = {f ∈ C∞(Rⁿ) | ∀α, β : p_{α,β}(f) < ∞}.

Roughly speaking, a Schwartz space function is a function decreasing to 0 (together with all its
partial derivatives) faster than any rational function 1/P(x) as x → ∞. In place of p_{α,β} one can
also use the norms

  p_{k,l}(φ) = Σ_{|α|≤k, |β|≤l} p_{α,β}(φ),   k, l ∈ Z₊,

to describe S(Rⁿ).
The set S(Rⁿ) is a linear space and the p_{α,β} are norms on S.
For example, P(x) ∉ S for any non-zero polynomial P(x); however e^{−‖x‖²} ∈ S(Rⁿ).
S(Rⁿ) is an algebra. Indeed, the generalized Leibniz rule ensures p_{k,l}(φψ) < ∞. For
example, f(x) = p(x)e^{−ax²+bx+c}, a > 0, belongs to S(R) for any polynomial p; g(x) = e^{−|x|}
is not differentiable at 0 and hence not in S(R).
Convergence in S

Definition 16.16 Let φ_n, φ ∈ S. We say that the sequence (φ_n) converges in S to φ, abbreviated
φ_n →_S φ, if one of the following equivalent conditions is satisfied for all multi-indices α and β:

  p_{α,β}(φ − φ_n) → 0;
  x^β D^α(φ_n − φ) → 0, uniformly on Rⁿ;
  x^β D^α φ_n → x^β D^α φ, uniformly on Rⁿ.

Remarks 16.9 (a) In quantum mechanics one defines the position and momentum operators
Q_k and P_k, k = 1, …, n, by

  (Q_k φ)(x) = x_k φ(x),   (P_k φ)(x) = −i ∂φ/∂x_k,

respectively. The space S is invariant under both operators Q_k and P_k; that is, x^β D^α φ(x) ∈
S(Rⁿ) for all φ ∈ S(Rⁿ).
(b) S(Rⁿ) ⊆ L¹(Rⁿ).
Recall that a rational function P(x)/Q(x) is integrable over [1, +∞) if and only if Q(x) ≠ 0
for x ≥ 1 and deg Q ≥ deg P + 2. Indeed, C/x² is then an integrable upper bound. We want
to find a condition on m such that

  ∫_{Rⁿ} dx/(1 + ‖x‖²)^m < ∞.

For this, we use that any non-zero x ∈ Rⁿ can uniquely be written as x = r·y, where r = ‖x‖ and y
is on the unit sphere Sⁿ⁻¹. One can show that dx₁ dx₂ ··· dxₙ = r^{n−1} dr dS, where dS is the
surface element of the unit sphere Sⁿ⁻¹. Using this and Fubini's theorem,

  ∫_{Rⁿ} dx/(1 + ‖x‖²)^m = ∫_{Sⁿ⁻¹} ∫₀^∞ r^{n−1}/(1 + r²)^m dr dS = ω_{n−1} ∫₀^∞ r^{n−1}/(1 + r²)^m dr,

where ω_{n−1} is the (n−1)-dimensional measure of the unit sphere Sⁿ⁻¹. By the above criterion,
the integral is finite if and only if 2m − n + 1 > 1, that is, if and only if m > n/2. In particular,

  ∫_{Rⁿ} dx/(1 + ‖x‖^{n+1}) < ∞.

In case n = 1 the integral is π.
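The n = 1 case can be confirmed with a one-line quadrature (a sanity check of our own, using scipy):

```python
import numpy as np
from scipy.integrate import quad

# int_R dx / (1 + x^2) = pi   (here m = 1 > n/2 = 1/2, so the integral converges)
value, _ = quad(lambda x: 1.0 / (1.0 + x**2), -np.inf, np.inf)
assert abs(value - np.pi) < 1e-10
```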
By the above argument, for φ ∈ S,

  ∫_{Rⁿ} |φ(x)| dx = ∫_{Rⁿ} |(1 + ‖x‖^{2n})φ(x)| dx/(1 + ‖x‖^{2n})
                 ≤ C p_{0,2n}(φ) ∫_{Rⁿ} dx/(1 + ‖x‖^{2n}) < ∞.

(c) D(Rⁿ) ⊆ S(Rⁿ); indeed, the supremum p_{α,β}(φ) of any test function φ ∈ D(Rⁿ) is finite
since the supremum of a continuous function over a compact set is finite. On the other hand,
D ⊊ S since e^{−‖x‖²} is in S but not in D.
(d) In contrast to D(Rⁿ), the rapidly decreasing functions S(Rⁿ) form a metric space. Indeed,
S(Rⁿ) is a locally convex space, that is, a linear space V whose topology is given by a
set of semi-norms p_α separating the elements of V, i.e. p_α(x) = 0 for all α implies x = 0.
Any locally convex linear space whose topology is given by a countable set of semi-norms
is metrizable. Let (p_n)_{n∈N} be the defining family of semi-norms. Then

  d(φ, ψ) = Σ_{n=1}^∞ 2^{−n} p_n(φ − ψ)/(1 + p_n(φ − ψ)),   φ, ψ ∈ V,

defines a metric on V describing the same topology. (In our case, use Cantor's first diagonal
method to write the norms p_{k,l}, k, l ∈ N, from the array into a sequence p_n.)
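The construction of d from countably many seminorms can be sketched in code (an illustration of our own, with stand-in seminorms p_n(f) = sup |xⁿ f(x)| on a grid; none of this is from the text):

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 4001)

def p(n, f):
    # stand-in seminorm p_n(f) = sup |x^n f(x)| on the grid
    return np.max(np.abs(x**n * f))

def d(f, g, N=20):
    # truncated version of d = sum 2^{-n} p_n / (1 + p_n)
    return sum(2.0**-n * p(n, f - g) / (1.0 + p(n, f - g)) for n in range(1, N + 1))

f = np.exp(-x**2)
g = np.exp(-x**2 / 2)
h = x * np.exp(-x**2)

assert d(f, f) == 0.0
assert d(f, g) == d(g, f)
assert d(f, h) <= d(f, g) + d(g, h) + 1e-12   # triangle inequality
```

The map t ↦ t/(1 + t) is what bounds each summand by 1 and keeps the series convergent.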
Definition 16.17 Let f(x) ∈ L¹(Rⁿ). Then the Fourier transform Ff of the function f(x) is
given by

  Ff(ξ) = f̂(ξ) = (1/√(2π))ⁿ ∫_{Rⁿ} e^{−iξ·x} f(x) dx,

where x = (x₁, …, xₙ), ξ = (ξ₁, …, ξₙ), and ξ·x = Σ_{k=1}^n x_k ξ_k.
Let us abbreviate the normalization factor κ_n = (1/√(2π))ⁿ. Caution: Wladimirow [Wla72] uses
another convention, with e^{+iξ·x} under the integral and normalization factor 1 in place of κ_n.
Note that Ff(0) = κ_n ∫_{Rⁿ} f(x) dx.

Example 16.11 We calculate the Fourier transform Fφ of the function φ(x) = e^{−‖x‖²/2} =
e^{−x·x/2}, x ∈ Rⁿ.
(a) n = 1. From complex analysis, Lemma 14.30, we know

  F(e^{−x²/2})(ξ) = (1/√(2π)) ∫_R e^{−x²/2} e^{−ixξ} dx = e^{−ξ²/2}.        (16.10)

(b) Arbitrary n. Thus,

  Fφ(ξ) = κ_n ∫_{Rⁿ} e^{−(1/2) Σ_{k=1}^n x_k²} e^{−i Σ_{k=1}^n x_k ξ_k} dx
        = ∏_{k=1}^n κ₁ ∫_R e^{−x_k²/2 − ix_k ξ_k} dx_k
        =_{(16.10)} ∏_{k=1}^n e^{−ξ_k²/2} = e^{−‖ξ‖²/2}.

Hence, the Fourier transform of e^{−‖x‖²/2} is the function itself. It follows via scaling x ↦ cx that

  F(e^{−c²x²/2})(ξ) = (1/cⁿ) e^{−‖ξ‖²/(2c²)}.
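The fixed-point property F(e^{−x²/2}) = e^{−ξ²/2} is easy to confirm by direct quadrature (a truncated-grid sketch of our own):

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2)

xi = np.linspace(-3.0, 3.0, 13)
# kappa_1 = 1/sqrt(2 pi); the grid sum approximates the integral
F = np.array([np.sum(phi * np.exp(-1j * x * s)) * dx for s in xi]) / np.sqrt(2 * np.pi)

assert np.allclose(F, np.exp(-xi**2 / 2), atol=1e-10)
```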
Theorem 16.10 Let φ, ψ ∈ S(Rⁿ). Then we have:

(i) F(x^α φ(x)) = i^{|α|} D^α(Fφ), that is, F ∘ Q_k = P_k ∘ F, k = 1, …, n.
(ii) F(D^α φ(x))(ξ) = i^{|α|} ξ^α (Fφ)(ξ), that is, F ∘ P_k = Q_k ∘ F, k = 1, …, n.
(iii) F(φ) ∈ S(Rⁿ); moreover, φ_n →_S φ implies Fφ_n →_S Fφ, that is, the Fourier transform
F is a continuous linear operator on S.
(iv) F(φ ∗ ψ) = κ_n^{−1} F(φ) F(ψ).
(v) F(φψ) = κ_n F(φ) ∗ F(ψ).
(vi)

  F(φ(Ax + b))(ξ) = (1/|det A|) e^{i(A^{−⊤}ξ)·b} (Fφ)(A^{−⊤}ξ),

where A is a regular n × n matrix and A^{−⊤} denotes the transpose of A⁻¹. In particular,

  F(φ(λx))(ξ) = (1/|λ|ⁿ) (Fφ)(ξ/λ),   F(φ(x + b))(ξ) = e^{ib·ξ} (Fφ)(ξ).

Proof. (i) We carry out the proof in case α = (1, 0, …, 0). The general case follows similarly.

  (Fφ)(ξ) = κ_n ∫_{Rⁿ} e^{−iξ·x} φ(x) dx.

Since ∂/∂ξ₁ (e^{−iξ·x} φ(x)) = −ix₁ e^{−iξ·x} φ(x) tends to 0 as ‖x‖ → ∞, we can exchange partial
differentiation and integration, see Proposition 12.23. Hence,

  ∂/∂ξ₁ (Fφ)(ξ) = κ_n ∫_{Rⁿ} e^{−iξ·x} (−ix₁)φ(x) dx = F(−ix₁φ(x))(ξ).

(ii) Without loss of generality we again assume α = (1, 0, …, 0). Using integration by parts,
we obtain

  F(∂φ/∂x₁)(ξ) = κ_n ∫_{Rⁿ} e^{−iξ·x} ∂φ/∂x₁(x) dx = −κ_n ∫_{Rⁿ} ∂/∂x₁(e^{−iξ·x}) φ(x) dx
              = iξ₁ κ_n ∫_{Rⁿ} e^{−iξ·x} φ(x) dx = iξ₁ (Fφ)(ξ).

(iii) By (i) and (ii) we have for |α| ≤ k and |β| ≤ l

  |ξ^α D^β(Fφ)(ξ)| ≤ κ_n ∫_{Rⁿ} |D^α(x^β φ(x))| dx
    ≤ c₁ ∫_{Rⁿ} ((1 + ‖x‖^{l+n+1})/(1 + ‖x‖^{n+1})) Σ_{|γ|≤k} |D^γ φ(x)| dx
    ≤ c₂ sup_{x∈Rⁿ} ((1 + ‖x‖^{l+n+1}) Σ_{|γ|≤k} |D^γ φ(x)|) ∫_{Rⁿ} dx/(1 + ‖x‖^{n+1})
    ≤ c₃ p_{k,l+n+1}(φ).

This implies Fφ ∈ S(Rⁿ) and, moreover, that F : S → S is continuous.


(iv) First note that L¹(Rⁿ) is an algebra with respect to the convolution product, with
‖f ∗ g‖_{L¹} ≤ ‖f‖_{L¹}‖g‖_{L¹}. Indeed,

  ‖f ∗ g‖_{L¹} = ∫_{Rⁿ} |f ∗ g| dx ≤ ∫_{Rⁿ} ∫_{Rⁿ} |f(y)g(x − y)| dy dx
             = ∫_{Rⁿ} |f(y)| (∫_{Rⁿ} |g(x − y)| dx) dy = ‖g‖_{L¹} ∫_{Rⁿ} |f(y)| dy = ‖f‖_{L¹}‖g‖_{L¹}.

This in particular shows that |f ∗ g(x)| is finite a.e. on Rⁿ. By definition and Fubini's
theorem we have

  F(φ ∗ ψ)(ξ) = κ_n ∫_{Rⁿ} e^{−iξ·x} ∫_{Rⁿ} φ(y)ψ(x − y) dy dx
    = ∫_{Rⁿ} (κ_n ∫_{Rⁿ} e^{−iξ·(x−y)} ψ(x − y) dx) e^{−iξ·y} φ(y) dy
    =_{z=x−y} κ_n^{−1} F(φ)(ξ) F(ψ)(ξ).

(v) will be done later, after Proposition 16.11.
(vi) is straightforward, using the substitution y = Ax + b as in the computation before
Definition 16.13.
Remark 16.10 Similar properties as F has the operator G, which is also defined on L¹(Rⁿ):

  Gφ(ξ) = φ̃(ξ) = κ_n ∫_{Rⁿ} e^{+iξ·x} φ(x) dx.

Put φ⁻(x) := φ(−x). Then

  Gφ = F(φ⁻) = (Fφ)⁻   and   Fφ = G(φ⁻) = (Gφ)⁻.

It is easy to see that (iv) holds for G, too, that is,

  G(φ ∗ ψ) = κ_n^{−1} G(φ)G(ψ).
Proposition 16.11 (Fourier Inversion Formula) The Fourier transformation is a one-to-one
mapping of S(Rⁿ) onto S(Rⁿ). The inverse Fourier transformation is given by G:

  F(Gφ) = G(Fφ) = φ,   φ ∈ S(Rⁿ).

Proof. Let ψ(x) = e^{−‖x‖²/2} and ψ_ε(x) = ψ(εx) = e^{−ε²‖x‖²/2}. Then Ω_ε(x) := Fψ_ε(x) =
(1/εⁿ)ψ(x/ε), and we have

  κ_n ∫_{Rⁿ} Ω_ε(x) dx = κ_n ∫_{Rⁿ} (1/εⁿ)ψ(x/ε) dx = κ_n ∫_{Rⁿ} ψ(x) dx = Fψ(0) = ψ(0) = 1.

Further,

  κ_n ∫_{Rⁿ} Ω_ε(x)φ(x) dx = κ_n ∫_{Rⁿ} Ω_ε(x)(φ(x) − φ(0)) dx + κ_n ∫_{Rⁿ} Ω_ε(x)φ(0) dx →_{ε→0} φ(0).   (16.11)

In other words, κ_n Ω_ε(x) is a δ-sequence.
We compute G((Fφ)ψ_ε)(x). Using Fubini's theorem we have

  G((Fφ)ψ_ε)(x) = κ_n ∫_{Rⁿ} (Fφ)(ξ)ψ_ε(ξ)e^{iξ·x} dξ
    = κ_n ∫_{Rⁿ} ψ_ε(ξ)e^{iξ·x} (κ_n ∫_{Rⁿ} e^{−iξ·y}φ(y) dy) dξ
    = κ_n ∫_{Rⁿ} φ(y) (κ_n ∫_{Rⁿ} e^{−iξ·(y−x)}ψ_ε(ξ) dξ) dy
    = κ_n ∫_{Rⁿ} φ(y)(Fψ_ε)(y − x) dy
    =_{z:=y−x} κ_n ∫_{Rⁿ} Ω_ε(z) φ(z + x) dz → φ(x) as ε → 0, see (16.11).

On the other hand, by Lebesgue's dominated convergence theorem,

  lim_{ε→0} G((Fφ)ψ_ε)(x) = κ_n ∫_{Rⁿ} (Fφ)(ξ)ψ(0)e^{iξ·x} dξ = κ_n ∫_{Rⁿ} (Fφ)(ξ)e^{iξ·x} dξ = G(Fφ)(x).

This proves the first part. The second part, F(Gφ) = φ, follows from G(φ) = F(φ⁻), F(φ) =
G(φ⁻), and the first part.
We are now going to complete the proof of Theorem 16.10 (v). For this, let φ = Gφ₁ and ψ = Gψ₁
with φ₁, ψ₁ ∈ S. By (iv) we have

  F(φψ) = F(G(φ₁)G(ψ₁)) = F(κ_n G(φ₁ ∗ ψ₁)) = κ_n φ₁ ∗ ψ₁ = κ_n F(φ) ∗ F(ψ).

Proposition 16.12 (Fourier–Plancherel formula) For φ, ψ ∈ S(Rⁿ) we have

  ∫_{Rⁿ} φ ψ̄ dx = ∫_{Rⁿ} F(φ) \overline{F(ψ)} dx;        (16.12)

in particular, ‖φ‖_{L²(Rⁿ)} = ‖F(φ)‖_{L²(Rⁿ)}.

Proof. First note that

  \overline{F(ψ)}(y) = \overline{κ_n ∫ e^{−ix·y} ψ(x) dx} = κ_n ∫ e^{ix·y} ψ̄(x) dx = F(ψ̄)(−y).

By Theorem 16.10 (v),

  ∫_{Rⁿ} φ(x) ψ̄(x) dx = κ_n^{−1} F(φψ̄)(0) = (F(φ) ∗ F(ψ̄))(0)
    = ∫_{Rⁿ} F(φ)(y) F(ψ̄)(0 − y) dy = ∫_{Rⁿ} F(φ)(y) \overline{F(ψ)}(y) dy.


Remark 16.11 S(Rⁿ) ⊆ L²(Rⁿ) is dense. Thus, the Fourier transformation has a unique
extension to a unitary operator F : L²(Rⁿ) → L²(Rⁿ). (To a given f ∈ L² choose a sequence
φ_n ∈ S converging to f in the L²-norm. Since F preserves the L²-norm, ‖Fφ_n − Fφ_m‖ =
‖φ_n − φ_m‖, and since (φ_n) is a Cauchy sequence in L², (Fφ_n) is a Cauchy sequence, too; hence
it converges to some g ∈ L². We define F(f) = g.)
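The norm identity can be checked numerically for a non-Gaussian example, say φ(x) = x e^{−x²/2} (a quadrature sketch of our own):

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 3001)
dx = x[1] - x[0]
phi = x * np.exp(-x**2 / 2)

xi = x.copy()                             # use the same grid for the frequencies
Fphi = np.array([np.sum(phi * np.exp(-1j * x * s)) * dx
                 for s in xi]) / np.sqrt(2 * np.pi)

norm2 = lambda f: np.sum(np.abs(f)**2) * dx
assert abs(norm2(phi) - norm2(Fphi)) < 1e-8   # ||phi||^2 = ||F phi||^2
```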

16.4.2 The Space S′(Rⁿ)

Definition 16.18 A tempered distribution (or slowly increasing distribution) is a continuous
linear functional T on the space S(Rⁿ). The set of all tempered distributions is denoted by
S′(Rⁿ).
A linear functional T on S(Rⁿ) is continuous if for all sequences φ_n ∈ S with φ_n →_S 0,
⟨T , φ_n⟩ → 0.
For φ_n ∈ D with φ_n →_D 0 it follows that φ_n →_S 0. So, every continuous linear functional
on S (restricted to D) is continuous on D. Moreover, the mapping ι : S′ → D′, ι(T) = T|_D,
is injective, since T(φ) = 0 for all φ ∈ D implies T(φ) = 0 for all φ ∈ S; this follows
from the density of D in S and the continuity of T. Using the injection ι, we can identify S′ with a
subspace of D′, S′ ⊆ D′. That is, every tempered distribution is a distribution.
Lemma 16.13 (Characterization of S′) A linear functional T defined on S(Rⁿ) belongs to
S′(Rⁿ) if and only if there exist non-negative integers k and l and a positive number C such
that for all φ ∈ S(Rⁿ)

  |T(φ)| ≤ C p_{kl}(φ),   where p_{kl}(φ) = Σ_{|α|≤k, |β|≤l} p_{α,β}(φ).

Remarks 16.12 (a) With the usual identification f ↔ T_f of functions and regular distributions,
L¹(Rⁿ) ⊆ S′(Rⁿ) and L²(Rⁿ) ⊆ S′(Rⁿ).
(b) L¹_loc ⊄ S′; for example T_f ∉ S′(R) for f(x) = e^{x²}, since T_f(φ) is not well-defined for all
Schwartz functions φ; for example T_{e^{x²}}(e^{−x²}) = +∞.
(c) If T ∈ D′(Rⁿ) and supp T is compact, then T ∈ S′(Rⁿ).
(d) Let f(x) be measurable. If there exist C > 0 and m ∈ N such that

  |f(x)| ≤ C(1 + ‖x‖²)^m   a.e. x ∈ Rⁿ,

then T_f ∈ S′. Indeed, the above estimate and Remark 16.9 imply

  |⟨T_f , φ⟩| = |∫_{Rⁿ} f(x)φ(x) dx| ≤ C ∫_{Rⁿ} (1 + ‖x‖²)^{m+n} |φ(x)| dx/(1 + ‖x‖²)^n
            ≤ C p_{0,2m+2n}(φ) ∫_{Rⁿ} dx/(1 + ‖x‖²)^n < ∞.

By Lemma 16.13, f is a tempered regular distribution, f(x) ∈ S′.


Operations on S′
The operations are defined in the same way as in the case of D′. One has to show that the result is
again in the (smaller) space S′. If T ∈ S′ then:
(a) D^α T ∈ S′ for all multi-indices α.
(b) fT ∈ S′ for all f ∈ C∞(Rⁿ) such that D^α f grows at most polynomially at infinity for all
multi-indices α (i.e. for all multi-indices α there exist C_α > 0 and k_α such that |D^α f(x)| ≤
C_α (1 + ‖x‖)^{k_α}).
(c) T(Ax + b) ∈ S′ for any regular real n × n matrix A and b ∈ Rⁿ.
(d) T ∈ S′(Rⁿ) and S ∈ S′(Rᵐ) implies T ⊗ S ∈ S′(Rⁿ⁺ᵐ).
(e) Let T ∈ S′(Rⁿ), ψ ∈ S(Rⁿ). Define the convolution product

  ⟨ψ ∗ T , φ⟩ = ⟨1(x) ⊗ T(y) , ψ(x)φ(x + y)⟩ = ⟨T , ∫_{Rⁿ} ψ(x)φ(x + y) dx⟩,   φ ∈ S(Rⁿ).

Note that this definition coincides with the more general Definition 16.12 since

  lim_{k→∞} ψ(x)φ(x + y)η_k(x, y) = ψ(x)φ(x + y) in S(R²ⁿ).

16.4.3 Fourier Transformation in S′(Rⁿ)

We are following our guiding principle to define the Fourier transform of a distribution T ∈ S′:
first consider the case of a regular tempered distribution. We want to define F(T_f) := T_{Ff}.
Suppose that f(x) ∈ L¹(Rⁿ) is integrable. Then its Fourier transform Ff exists and is a
bounded continuous function:

  |Ff(ξ)| ≤ κ_n ∫_{Rⁿ} |e^{−iξ·x} f(x)| dx = κ_n ∫_{Rⁿ} |f(x)| dx = κ_n ‖f‖_{L¹} < ∞.

By Remark 16.12 (d), Ff defines a distribution in S′. By Fubini's theorem,

  ⟨T_{Ff} , φ⟩ = ∫_{Rⁿ} Ff(ξ)φ(ξ) dξ = κ_n ∫∫_{R²ⁿ} f(x)e^{−iξ·x}φ(ξ) dξ dx
             = ∫_{Rⁿ} f(x) Fφ(x) dx = ⟨T_f , Fφ⟩.

Hence, ⟨T_{Ff} , φ⟩ = ⟨T_f , Fφ⟩. We take this equation as the definition of the Fourier transform of a distribution T ∈ S′.
Definition 16.19 For T ∈ S′ and φ ∈ S define

  ⟨FT , φ⟩ = ⟨T , Fφ⟩.        (16.13)

We call FT the Fourier transform of the distribution T.


Since Fφ ∈ S(Rⁿ), FT is well-defined on S. Further, F is a linear operator and T a
linear functional, hence FT is again a linear functional. We show that FT is a continuous linear
functional on S. For this, let φ_n →_S φ. By Theorem 16.10, Fφ_n →_S Fφ. Since T is
continuous,

  ⟨FT , φ_n⟩ = ⟨T , Fφ_n⟩ → ⟨T , Fφ⟩ = ⟨FT , φ⟩,

which proves continuity.
Lemma 16.14 The Fourier transformation F : S′(Rⁿ) → S′(Rⁿ) is a continuous bijection.

Proof. (a) We show continuity (see also homework 53.3). Suppose that T_n → T in S′; that is,
for all φ ∈ S, ⟨T_n , φ⟩ → ⟨T , φ⟩. Hence,

  ⟨FT_n , φ⟩ = ⟨T_n , Fφ⟩ →_{n→∞} ⟨T , Fφ⟩ = ⟨FT , φ⟩,

which proves the assertion.
(b) We define a second transformation G : S′ → S′ via ⟨GT , φ⟩ := ⟨T , Gφ⟩
and show that FG = GF = id on S′. Taking into account Proposition 16.11 we have

  ⟨G(FT) , φ⟩ = ⟨FT , Gφ⟩ = ⟨T , F(Gφ)⟩ = ⟨T , φ⟩;

thus, GF = id. The proof of the direction FG = id is similar; hence, F is a bijection.

Remark 16.13 All the properties of the Fourier transformation stated in Theorem 16.10 (i),
(ii), (iii), (iv), and (v) remain valid in the case of S′. In particular, F(x^α T) = i^{|α|} D^α(FT).
Indeed, for φ ∈ S(Rⁿ), by Theorem 16.10 (ii),

  F(x^α T)(φ) = ⟨x^α T , Fφ⟩ = ⟨T , x^α Fφ⟩ = ⟨T , (−i)^{|α|} F(D^α φ)⟩
             = (−i)^{|α|} ⟨FT , D^α φ⟩ = (−1)^{|α|}(−i)^{|α|} ⟨D^α(FT) , φ⟩ = ⟨i^{|α|} D^α(FT) , φ⟩.
Example 16.12 (a) Let a ∈ Rⁿ. We compute Fδ_a. For φ ∈ S(Rⁿ),

  Fδ_a(φ) = δ_a(Fφ) = (Fφ)(a) = κ_n ∫_{Rⁿ} e^{−ix·a} φ(x) dx = T_{κ_n e^{−ix·a}}(φ).

Hence, Fδ_a is the regular distribution corresponding to f(x) = κ_n e^{−ix·a}. In particular, F(δ) =
T_{κ_n·1} is a constant function; note that F(δ) = G(δ) = (1/√(2π))ⁿ T₁. Moreover, F(T₁) =
G(T₁) = (1/κ_n) δ.
(b) n = 1, b > 0.

  F(H(b − |x|))(ξ) = κ₁ ∫_R e^{−ixξ} H(b − |x|) dx
                  = (1/√(2π)) ∫_{−b}^{b} e^{−ixξ} dx = √(2/π) · sin(bξ)/ξ.
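This transform can be confirmed by direct quadrature over the support [−b, b] (a sketch of our own; b and the sample points ξ are arbitrary choices):

```python
import numpy as np

b = 1.5
x = np.linspace(-b, b, 40001)            # the support of H(b - |x|)
dx = x[1] - x[0]

def ft(s):
    vals = np.exp(-1j * x * s)
    # trapezoidal rule on [-b, b], times kappa_1 = 1/sqrt(2 pi)
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dx / np.sqrt(2 * np.pi)

xi = np.array([0.3, 1.0, 2.7, 5.0])
F = np.array([ft(s) for s in xi])
assert np.allclose(F, np.sqrt(2 / np.pi) * np.sin(b * xi) / xi, atol=1e-6)
```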


(c) The single-layer distribution. Suppose that S is a compact, regular, piecewise differentiable,
non-self-intersecting surface in R³ and ρ(x) ∈ L¹_loc(R³) is a function on S (a density function,
or "distribution" in the physical sense). We define the distribution ρδ_S by the scalar surface
integral

  ⟨ρδ_S , φ⟩ = ∫∫_S ρ(x)φ(x) dS.

The support of ρδ_S is contained in S, a set of measure zero with respect to the 3-dimensional
Lebesgue measure. Hence, ρδ_S is a singular distribution.
Similarly, one defines the double-layer distribution (which comes from dipoles) by

  ⟨−∂/∂~n (ρδ_S) , φ⟩ = ∫∫_S ρ(x) ∂φ(x)/∂~n dS,

where ~n denotes the unit normal vector to the surface.


We compute the Fourier transformation of the single layer F(S ) in case of a sphere of radius
r, Sr = {x R3 | kxk = r} and density = 1. By Fubinis theorem,

Z Z Z

1
ix
hFSr , i = Sr (0) , F 3
(x) dx dS
e
3
Sr
R
2

Z
ZZ
1

(cos(x ) i sin(x )) dS (x) dx


= 3
|
{z
}
3
Sr
2 R
is 0

Using spherical coordinates on Sr , where x is fixed to be the z-axis and is the angle between
x and Sr , we have dS = r 2 sin d d and x = r kxk cos . Hence, the inner (surface)
integral reads
=

= 2

cos(kxk r cos )r 2 sin dd,

0
kxkr

kxkr

cos s

s = kxk r cos ,

ds = kxk r sin d

r
r
ds = 4
sin(kxk r).
kxk
kxk

Hence,
2r
hFSr , i =
2

R3

(x)

sin(r kxk)
dx;
kxk

the Fourier transformation of Sr is the regular distribution

2r sin(r kxk)
FSr (x) =
.
kxk
2
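The inner surface integral used above, ∫_{S_r} cos(‖x‖ r cos θ) dS = 4π (r/‖x‖) sin(‖x‖ r), can be checked by one-dimensional quadrature in θ (our own sketch):

```python
import numpy as np

r, kx = 1.3, 2.1                       # radius r and |x| (arbitrary choices)
theta = np.linspace(0.0, np.pi, 200001)
dth = theta[1] - theta[0]

integrand = np.cos(kx * r * np.cos(theta)) * r**2 * np.sin(theta)
# trapezoidal rule in theta, times 2 pi for the azimuthal angle
surf = 2 * np.pi * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dth

assert abs(surf - 4 * np.pi * (r / kx) * np.sin(kx * r)) < 1e-8
```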
(d) The Resolvent of the Laplacian. Consider the Hilbert space H = L²(Rⁿ) and its dense
subspace S(Rⁿ). For φ ∈ S the Laplacian Δφ is defined. Recall that the resolvent of
a linear operator A at λ is the bounded linear operator on H given by R_λ(A) = (A − λI)⁻¹.
Given f ∈ H we are looking for u ∈ H with R_λ(A)f = u. This is equivalent to solving
f = (A − λI)(u) for u. In case A = −Δ we can apply the Fourier transformation to solve
this equation. By Theorem 16.10 (ii),

  −Σ_{k=1}^n ∂²u/∂x_k² − λu = f,   −F(Σ_{k=1}^n ∂²u/∂x_k²) − λFu = Ff,
  Σ_{k=1}^n ξ_k² (Fu)(ξ) − λ(Fu)(ξ) = (Fu)(ξ)(‖ξ‖² − λ) = Ff(ξ),
  Fu(ξ) = Ff(ξ)/(‖ξ‖² − λ),   u(x) = G(Ff(ξ)/(‖ξ‖² − λ))(x).

Hence,

  R_λ(−Δ) = F⁻¹ ∘ (1/(‖ξ‖² − λ)) ∘ F,

where in the middle is the multiplication operator by the function 1/(‖ξ‖² − λ). One can see
that this operator is bounded in H if and only if λ ∈ C ∖ R₊, in accordance with the fact that
the spectrum of −Δ satisfies σ(−Δ) = R₊.
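The Fourier-multiplier form of the resolvent can be illustrated in one dimension with the FFT (a periodic approximation of our own; λ = −1 lies in C ∖ R₊, so the multiplier 1/(ξ² − λ) is bounded):

```python
import numpy as np

# Solve -u'' - lam*u = f spectrally and verify the equation in Fourier space.
N, L = 2048, 40.0
x = np.arange(N) * (L / N) - L / 2
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
lam = -1.0                                   # lambda in C \ R_+

f = np.exp(-x**2 / 2)                        # right-hand side
u = np.fft.ifft(np.fft.fft(f) / (xi**2 - lam)).real

# residual of -u'' - lam*u = f, computed spectrally
minus_uxx = np.fft.ifft(xi**2 * np.fft.fft(u)).real
assert np.allclose(minus_uxx - lam * u, f, atol=1e-10)
```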

16.5 Appendix: More about Convolutions

Since the following proposition is used in several places, we make the statement explicit.
Proposition 16.15 Let T(x, t) and S(x, t) be distributions in D′(Rⁿ⁺¹) with supp T ⊆ Rⁿ × R₊
and supp S ⊆ Γ₊(0,0). Here Γ₊(0,0) = {(x, t) ∈ Rⁿ⁺¹ | ‖x‖ ≤ at} denotes the forward
light cone at the origin.
Then the convolution T ∗ S exists in D′(Rⁿ⁺¹) and can be written as

  ⟨T ∗ S , φ⟩ = ⟨T(x, t) ⊗ S(y, s) , η(t)η(s)η(as − ‖y‖) φ(x + y, t + s)⟩,        (16.14)

φ ∈ D(Rⁿ⁺¹), where η ∈ C∞(R) with η(t) = 1 for t > −ε, η(t) = 0 for t ≤ −2ε, and ε > 0 is
any fixed positive number. The convolution (T ∗ S)(x, t) vanishes for t < 0 and is continuous
in both components, that is:
(a) If T_k → T in D′(Rⁿ⁺¹) and supp T_k, T ⊆ Rⁿ × R₊, then T_k ∗ S → T ∗ S in D′(Rⁿ⁺¹).
(b) If S_k → S in D′(Rⁿ⁺¹) and supp S_k, S ⊆ Γ₊(0,0), then T ∗ S_k → T ∗ S in D′(Rⁿ⁺¹).

Proof. Let φ(x, t) ∈ D(Rⁿ⁺¹) with supp φ ⊆ U_R(0) for some R > 0. Let η_K(x, t, y, s),
K ⊆ R²ⁿ⁺², be a sequence in D(R²ⁿ⁺²) converging to 1 in R²ⁿ⁺², see before Definition 16.12.
For sufficiently large K we then have

  χ_K := η(s)η(t)η(as − ‖y‖) η_K(x, t, y, s) φ(x + y, t + s)
       = η(s)η(t)η(as − ‖y‖) φ(x + y, t + s) =: χ.        (16.15)

To prove this it suffices to show that χ ∈ D(R²ⁿ⁺²). Indeed, χ is arbitrarily often differentiable
and its support is contained in

  {(x, t, y, s) | s, t ≥ −2ε, as − ‖y‖ ≥ −2ε, ‖x + y‖² + |t + s|² ≤ R²},

which is a bounded set.
Since η(t) = 1 in a neighborhood of supp T and η(s)η(as − ‖y‖) = 1 in a neighborhood of
supp S, we have T(x, t) = η(t)T(x, t) and S(y, s) = η(s)η(as − ‖y‖)S(y, s). Using (16.15) we
have

  ⟨T ∗ S , φ⟩ = lim_{K→R²ⁿ⁺²} ⟨T(x, t) ⊗ S(y, s) , η_K(x, t, y, s)φ(x + y, t + s)⟩
            = lim_{K→R²ⁿ⁺²} ⟨T(x, t) ⊗ S(y, s) , χ_K⟩ = ⟨T(x, t) ⊗ S(y, s) , χ⟩.

This proves the first assertion.


We now prove that the right-hand side of (16.14) defines a continuous linear functional on
D(Rⁿ⁺¹). Let φ_k →_D φ as k → ∞. Then

  χ_k := η(t)η(s)η(as − ‖y‖) φ_k(x + y, t + s) →_D χ

as k → ∞. Hence,

  ⟨T ∗ S , φ_k⟩ = ⟨T(x, t) ⊗ S(y, s) , χ_k⟩ → ⟨T(x, t) ⊗ S(y, s) , χ⟩ = ⟨T ∗ S , φ⟩,   k → ∞,

and T ∗ S is continuous.
We show that T ∗ S vanishes for t < 0. For this, let φ ∈ D(Rⁿ⁺¹) with supp φ ⊆ Rⁿ × (−∞, −ε₁].
Choosing ε < ε₁/4 one has

  η(t)η(s)η(as − ‖y‖)φ(x + y, t + s) = 0,

such that ⟨T ∗ S , φ⟩ = 0. Continuity of the convolution product follows from the continuity
of the tensor product.


Chapter 17

PDE II: The Equations of Mathematical Physics

In this chapter we study in detail the Laplace equation, the wave equation, as well as the heat
equation. First, for all space dimensions n we determine the fundamental solutions of the
corresponding differential operators; then we consider initial value problems and initial boundary
value problems. We also study eigenvalue problems for the Laplace equation.
Recall Green's identities, see Proposition 10.2,

  ∫∫∫_G (u Δv − v Δu) dx dy dz = ∫∫_{∂G} (u ∂v/∂~n − v ∂u/∂~n) dS,
  ∫∫∫_G Δu dx dy dz = ∫∫_{∂G} ∂u/∂~n dS.        (17.1)

We also need that for x ∈ Rⁿ ∖ {0},

  Δ(1/‖x‖^{n−2}) = 0,  n ≥ 3;   Δ(log ‖x‖) = 0,  n = 2,

see Example 7.5.

17.1 Fundamental Solutions

17.1.1 The Laplace Equation

Let us denote by ω_n the measure of the unit sphere Sⁿ⁻¹ in Rⁿ; that is, ω₂ = 2π, ω₃ = 4π.

Theorem 17.1 The function

  E_n(x) = (1/(2π)) log ‖x‖,                  n = 2,
  E_n(x) = −1/((n − 2)ω_n) · 1/‖x‖^{n−2},     n ≥ 3,

is locally integrable; the corresponding regular distribution E_n satisfies the equation ΔE_n = δ,
and hence is a fundamental solution for the Laplacian in Rⁿ.

Proof. Step 1. Example 7.5 shows that Δ(E_n(x)) = 0 if x ≠ 0.
Step 2. By homework 50.4, log ‖x‖ is in L¹_loc(R²) and 1/‖x‖^α ∈ L¹_loc(Rⁿ) if and only if α < n.
Hence E_n, n ≥ 2, define regular distributions in Rⁿ.
Let n ≥ 3 and φ ∈ D(Rⁿ). Using that 1/‖x‖^{n−2} is locally integrable and Example 12.6 (a),
Δφ ∈ D implies

  ∫_{Rⁿ} Δφ(x)/‖x‖^{n−2} dx = lim_{ε→0} ∫_{‖x‖≥ε} Δφ(x)/‖x‖^{n−2} dx.

Abbreviating c_n = −1/((n − 2)ω_n), we have

  ⟨ΔE_n , φ⟩ = ⟨E_n , Δφ⟩ = c_n ∫_{Rⁿ} Δφ(x)/‖x‖^{n−2} dx = c_n lim_{ε→0} ∫_{‖x‖≥ε} Δφ(x)/‖x‖^{n−2} dx.

We compute the integral on the right using v(x) = 1/r^{n−2}, r = ‖x‖, which is harmonic for
‖x‖ ≥ ε, Δv = 0. Applying Green's identity to the region {‖x‖ ≥ ε} (whose boundary is the
sphere ‖x‖ = ε), we have

  ∫_{‖x‖≥ε} Δφ(x)/‖x‖^{n−2} dx = −∫_{‖x‖=ε} ((1/r^{n−2}) ∂φ(x)/∂r − φ(x) ∂/∂r(1/r^{n−2})) dS.

Let us consider the first integral as ε → 0. Note that φ and grad φ are both bounded by a
constant C since φ is a test function. We make use of the estimate

  |∫_{‖x‖=ε} (1/r^{n−2}) ∂φ(x)/∂r dS| ≤ (C/ε^{n−2}) ∫_{‖x‖=ε} dS = C ω_n ε^{n−1}/ε^{n−2} = C ω_n ε,

which tends to 0 as ε → 0.
Hence we are left with computing the second integral. Note that the outer unit normal vector
to the sphere is ~n = x/‖x‖, such that ∂/∂~n (1/‖x‖^{n−2}) = −(n − 2)/r^{n−1}, and we obtain

  c_n ∫_{‖x‖=ε} φ(x) (n − 2)/ε^{n−1} dS = (1/(ω_n ε^{n−1})) ∫_{‖x‖=ε} φ(x) dS.

Note that ω_n ε^{n−1} is exactly the (n−1)-dimensional measure of the sphere of radius ε. So the
integral is the mean value of φ over the sphere of radius ε. Since φ is continuous at 0, the mean
value tends to φ(0). This proves the assertion in case n ≥ 3.
The proof in case n = 2 is quite analogous.
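Step 1 can be double-checked symbolically; the following sympy sketch (ours, not the text's) verifies that E₃(x) = −1/(4π‖x‖) is harmonic away from the origin:

```python
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)
r = sp.sqrt(x**2 + y**2 + z**2)
E3 = -1 / (4 * sp.pi * r)

# Laplacian of E3 in Cartesian coordinates
lap = sp.diff(E3, x, 2) + sp.diff(E3, y, 2) + sp.diff(E3, z, 2)
assert sp.simplify(lap) == 0
```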


Corollary 17.2 Suppose that f(x) is a continuous function with compact support. Then S =
E_n ∗ f is a regular distribution and we have ΔS = f in D′. In particular,

  S(x) = (1/(2π)) ∫∫_{R²} log ‖x − y‖ f(y) dy,   n = 2;
  S(x) = −(1/(4π)) ∫∫∫_{R³} f(y)/‖x − y‖ dy,     n = 3.        (17.2)

Proof. By Theorem 16.8, S = E ∗ f is a solution of L[u] = f if E is a fundamental solution of
the differential operator L. Inserting the fundamental solution of the Laplacian for n = 2 and
n = 3 and using that f has compact support, the assertion follows.

Remarks 17.1 (a) The given solution (17.2) is even a classical solution of the Poisson equation. Indeed, we can differentiate the parameter integral as usual.
(b) The function G(x, y) = E_n(x − y) is called the Green's function of the Laplace equation.
17.1.2 The Heat Equation


Proposition 17.3 The function

  F(x, t) = H(t)/(4πa²t)^{n/2} · e^{−‖x‖²/(4a²t)}

defines a regular distribution E = T_F and a fundamental solution of the heat equation
u_t − a²Δu = 0, that is,

  E_t − a²Δ_x E = δ(x)δ(t).        (17.3)

Proof. Step 1. The function F(x, t) is locally integrable since F = 0 for t ≤ 0, F ≥ 0 for
t > 0, and

  ∫_{Rⁿ} F(x, t) dx = (1/(4πa²t)^{n/2}) ∫_{Rⁿ} e^{−‖x‖²/(4a²t)} dx = ∏_{k=1}^n (1/√π) ∫_R e^{−ξ_k²} dξ_k = 1.        (17.4)

Step 2. For t > 0, F ∈ C∞ and therefore

  ∂F/∂t = (‖x‖²/(4a²t²) − n/(2t)) F,
  ∂F/∂x_i = −(x_i/(2a²t)) F,   ∂²F/∂x_i² = (x_i²/(4a⁴t²) − 1/(2a²t)) F;
  F_t − a²ΔF = 0.        (17.5)

See also homework 59.2.
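Step 2 can be verified symbolically for n = 1 (a sympy sketch of our own):

```python
import sympy as sp

# F = (4 pi a^2 t)^(-1/2) exp(-x^2/(4 a^2 t)) for t > 0 satisfies F_t = a^2 F_xx.
x, t, a = sp.symbols("x t a", positive=True)
F = sp.exp(-x**2 / (4 * a**2 * t)) / sp.sqrt(4 * sp.pi * a**2 * t)

residual = sp.diff(F, t) - a**2 * sp.diff(F, x, 2)
assert sp.simplify(residual) == 0
```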


We give a proof using the Fourier transformation with respect to the spatial variables. Let
Ẽ(ξ, t) = (F_x F)(ξ, t). We apply the Fourier transformation to (17.3) and obtain a first-order
ODE with respect to the time variable t:

  ∂Ẽ(ξ, t)/∂t + a²‖ξ‖² Ẽ(ξ, t) = κ_n 1(ξ) δ(t).

Recall from Example 16.10 that u′ + bu = δ has the fundamental solution u(t) = H(t)e^{−bt};
hence

  Ẽ(ξ, t) = H(t) κ_n e^{−a²‖ξ‖²t}.

We want to apply the inverse Fourier transformation with respect to the spatial variables. For
this, note that by Example 16.11,

  F^{−1}(e^{−‖ξ‖²/(2c²)}) = cⁿ e^{−c²‖x‖²/2},

where, in our case, 1/(2c²) = a²t, i.e. c = 1/√(2a²t). Hence,

  E(x, t) = H(t) κ_n F^{−1}(e^{−a²‖ξ‖²t}) = H(t) (1/(2π)^{n/2}) (1/(2a²t)^{n/2}) e^{−‖x‖²/(4a²t)}
         = H(t)/(4πa²t)^{n/2} · e^{−‖x‖²/(4a²t)}.

Corollary 17.4 Suppose that f(x, t) is a continuous function on Rⁿ × R₊ with compact support. Let

  V(x, t) = H(t) (1/(4πa²)^{n/2}) ∫₀^t ∫_{Rⁿ} (e^{−‖x−y‖²/(4a²(t−s))}/(t − s)^{n/2}) f(y, s) dy ds.

Then V(x, t) is a regular distribution in D′(Rⁿ × R₊) and a solution of u_t − a²Δu = f in
D′(Rⁿ × R₊).

Proof. This follows from Theorem 16.8.

17.1.3 The Wave Equation

We shall determine the fundamental solutions for the wave equation in dimensions n = 3,
n = 2, and n = 1. In case n = 3 we again apply the Fourier transformation. For the other
dimensions we use the method of descent.

(a) Case n = 3

Proposition 17.5 The distribution

  E(x, t) = (H(t)/(4πa²t)) δ_{S_{at}} ∈ D′(R⁴)

is a fundamental solution for the wave operator ☐_a u = u_tt − a²(u_{x₁x₁} + u_{x₂x₂} + u_{x₃x₃}), where
δ_{S_{at}} denotes the single-layer distribution of the sphere of radius at around 0.

17.1 Fundamental Solutions

457

Proof. As in the case of the heat equation let $\tilde E(\xi,t) = (F_x E)(\xi,t)$ be the Fourier transform of the
fundamental solution $E(x,t)$. Then $\tilde E(\xi,t)$ satisfies
\[
  \frac{\partial^2}{\partial t^2}\tilde E + a^2 \lVert\xi\rVert^2 \tilde E
  = (2\pi)^{-3/2}\, 1(\xi)\,\delta(t).
\]
Again, this is an ODE of order 2 in $t$. Recall from Example 16.10 that $u'' + a^2 u = \delta$, $a \ne 0$, has
a solution $u(t) = H(t)\,\frac{\sin at}{a}$. Thus,
\[
  \tilde E(\xi,t) = (2\pi)^{-3/2}\, H(t)\, \frac{\sin(a \lVert\xi\rVert t)}{a \lVert\xi\rVert},
\]
where $\xi$ is thought of as a parameter. Apply the inverse Fourier transformation $F_\xi^{-1}$ to this
function. Recall from Example 16.12 (b) that the Fourier transform of the single layer of the sphere
of radius $at$ around 0 is
\[
  F\delta_{S_{at}}(\xi) = \frac{2at}{\sqrt{2\pi}}\, \frac{\sin(at \lVert\xi\rVert)}{\lVert\xi\rVert}.
\]
This shows
\[
  E(x,t) = (2\pi)^{-3/2}\,\frac{\sqrt{2\pi}}{2at}\,\frac{1}{a}\, H(t)\,\delta_{S_{at}}(x)
  = \frac{1}{4\pi a^2 t}\, H(t)\,\delta_{S_{at}}(x).
\]

Let us evaluate $\langle E_3, \varphi(x,t)\rangle$. Using $dx_1\,dx_2\,dx_3 = dS_r\,dr$ where $x = (x_1,x_2,x_3)$ and $r = \lVert x\rVert$,
as well as the transformation $r = at$, $dr = a\,dt$, where $dS$ is the surface element of the sphere
$S_r(0)$, we obtain
\[
  \langle E_3, \varphi(x,t)\rangle
  = \frac{1}{4\pi a^2} \int_0^\infty \frac{1}{t} \iint_{S_{at}} \varphi(x,t)\, dS\, dt \tag{17.6}
\]
\[
  = \frac{1}{4\pi a^2} \int_0^\infty \frac{a}{r} \iint_{S_r} \varphi\Bigl(x, \frac{r}{a}\Bigr)\, dS\, \frac{dr}{a}
  = \frac{1}{4\pi a^2} \int_{\mathbb{R}^3} \frac{\varphi\bigl(x, \frac{\lVert x\rVert}{a}\bigr)}{\lVert x\rVert}\, dx. \tag{17.7}
\]
(b) The Dimensions n = 2 and n = 1
To construct the fundamental solution E2 (x, t), x = (x1 , x2 ), we use the so-called method of
descent.
Lemma 17.6 A fundamental solution $E_2$ of the 2-dimensional wave operator $\Box_{a,2}$ is given by
\[
  \langle E_2, \varphi(x_1,x_2,t)\rangle
  = \lim_{k\to\infty} \langle E_3(x_1,x_2,x_3,t),\ \varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle,
\]
where $E_3$ denotes a fundamental solution of the 3-dimensional wave operator $\Box_{a,3}$ and $\eta_k \in D(\mathbb{R})$
is a sequence of functions converging to 1 as $k \to \infty$.


Proof. Let $\varphi \in D(\mathbb{R}^3)$. Noting that $\eta_k'' \to 0$ uniformly on $\mathbb{R}$ as $k \to \infty$, we get
\[
\begin{aligned}
  \langle \Box_{a,2} E_2,\ \varphi(x_1,x_2,t)\rangle
  &= \langle E_2,\ \Box_{a,2}\varphi(x_1,x_2,t)\rangle \\
  &= \lim_{k\to\infty} \langle E_3,\ \bigl(\Box_{a,2}\varphi(x_1,x_2,t)\bigr)\,\eta_k(x_3)\rangle \\
  &= \lim_{k\to\infty} \bigl\langle E_3,\ \Box_{a,3}\bigl(\varphi(x_1,x_2,t)\,\eta_k(x_3)\bigr)
     + a^2 \varphi\,\eta_k''(x_3)\bigr\rangle \\
  &= \lim_{k\to\infty} \langle \Box_{a,3} E_3,\ \varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle \\
  &= \lim_{k\to\infty} \langle \delta(x_1,x_2,x_3)\,\delta(t),\ \varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle \\
  &= \varphi(0,0,0) = \langle \delta(x_1,x_2)\,\delta(t),\ \varphi(x_1,x_2,t)\rangle.
\end{aligned}
\]
In the third line we used that
$\Delta_{x_1,x_2,x_3}\bigl(\varphi(x_1,x_2,t)\,\eta(x_3)\bigr) = \bigl(\Delta_{x_1,x_2}\varphi(x_1,x_2,t)\bigr)\eta(x_3) + \varphi\,\eta''(x_3)$.
Proposition 17.7 (a) For $x = (x_1,x_2) \in \mathbb{R}^2$ and $t \in \mathbb{R}$, the regular distribution
\[
  E_2(x,t) = \frac{H(at - \lVert x\rVert)}{2\pi a \sqrt{a^2 t^2 - \lVert x\rVert^2}}
  = \begin{cases}
      \dfrac{1}{2\pi a \sqrt{a^2 t^2 - \lVert x\rVert^2}}, & at > \lVert x\rVert,\\[2mm]
      0, & at \le \lVert x\rVert,
    \end{cases}
\]
is a fundamental solution of the 2-dimensional wave operator.
(b) The regular distribution
\[
  E_1(x,t) = \frac{1}{2a}\, H(at - \lvert x\rvert)
  = \begin{cases}
      \dfrac{1}{2a}, & \lvert x\rvert < at,\\[2mm]
      0, & \lvert x\rvert \ge at,
    \end{cases}
\]
is a fundamental solution of the one-dimensional wave operator.
Proof. By the above lemma,
\[
  \langle E_2, \varphi(x_1,x_2,t)\rangle
  = \langle E_3, \varphi(x_1,x_2,t)\,1(x_3)\rangle
  = \frac{1}{4\pi a^2} \int_0^\infty \frac{1}{t} \iint_{S_{at}} \varphi(x_1,x_2,t)\, dS\, dt.
\]
We compute the surface element of the sphere of radius $at$ around 0 in terms of $x_1, x_2$. The upper
half-sphere is the graph of the function $x_3 = f(x_1,x_2) = \sqrt{a^2 t^2 - x_1^2 - x_2^2}$. By the formula before
Example 10.4, $dS = \sqrt{1 + f_{x_1}^2 + f_{x_2}^2}\, dx_1\, dx_2$. In the case of the sphere we have
\[
  dS_{x_1,x_2} = \frac{at\, dx_1\, dx_2}{\sqrt{a^2 t^2 - x_1^2 - x_2^2}}.
\]

Integration over both the upper and the lower half-sphere yields a factor 2:
\[
\begin{aligned}
  \langle E_2, \varphi\rangle
  &= 2\cdot\frac{1}{4\pi a^2} \int_0^\infty \frac{1}{t}
     \iint_{x_1^2 + x_2^2 \le a^2 t^2}
     \frac{at\,\varphi(x_1,x_2,t)}{\sqrt{a^2 t^2 - x_1^2 - x_2^2}}\, dx_1\, dx_2\, dt \\
  &= \frac{1}{2\pi a} \int_0^\infty
     \iint_{\lVert x\rVert \le at}
     \frac{\varphi(x_1,x_2,t)}{\sqrt{a^2 t^2 - x_1^2 - x_2^2}}\, dx_1\, dx_2\, dt.
\end{aligned}
\]


This shows that $E_2(x,t)$ is a regular distribution of the above form.
One can show directly that $E_2 \in L^1_{\mathrm{loc}}(\mathbb{R}^3)$. Indeed,
$\iiint_{\mathbb{R}^2 \times [-R,R]} E_2(x_1,x_2,t)\, dx\, dt < \infty$ for all $R > 0$.
(b) It was already shown in homework 57.2 that $E_1$ is a fundamental solution of the one-dimensional
wave operator. A short proof is to be found in [Wla72, II. 6.5 Example g)]. Here we use the
method of descent to complete the proof. Let $\varphi \in D(\mathbb{R}^2)$. Since
$\int_{\mathbb{R}} \lvert E_2(x_1,x_2,t)\rvert\, dx_2 < \infty$ and $\int_{\mathbb{R}} E_2(x_1,x_2,t)\, dx_2$ again defines a locally integrable
function, we have as in Lemma 17.6
\[
  E_1(\varphi) = \lim_{k\to\infty} \langle E_2(x_1,x_2,t),\ \varphi(x_1,t)\,\eta_k(x_2)\rangle
  = \lim_{k\to\infty} \int_{\mathbb{R}^3} E_2(x_1,x_2,t)\,\eta_k(x_2)\,\varphi(x_1,t)\, dx_1\, dx_2\, dt.
\]
Hence, the fundamental solution $E_1$ is the regular distribution
\[
  E_1(x_1,t) = \int_{\mathbb{R}} E_2(x_1,x_2,t)\, dx_2.
\]
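The last identity can be tested numerically: for $\lvert x_1\rvert < at$ the integral of $E_2$ over $x_2$ should equal $1/(2a)$, the value of $E_1$. The sketch below (sample parameters are arbitrary) substitutes $x_2 = s\sin u$ to remove the integrable endpoint singularities:

```python
import math

a, t, x1 = 1.3, 2.0, 0.4   # sample parameters with |x1| < a*t

def E2(x1, x2):
    """E2(x,t) = H(at - |x|) / (2*pi*a*sqrt(a^2 t^2 - |x|^2)) at fixed t."""
    r2 = a * a * t * t - x1 * x1 - x2 * x2
    return 1.0 / (2 * math.pi * a * math.sqrt(r2)) if r2 > 0 else 0.0

# Integrate E2 over x2 in (-s, s); the substitution x2 = s*sin(u) together
# with the midpoint rule avoids evaluating at the singular endpoints.
s = math.sqrt(a * a * t * t - x1 * x1)
n = 2000
h = math.pi / n
total = 0.0
for i in range(n):
    u = -math.pi / 2 + (i + 0.5) * h
    total += E2(x1, s * math.sin(u)) * s * math.cos(u) * h

print(total, 1 / (2 * a))   # both ~ 0.3846
```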

17.2 The Cauchy Problem


In this section we formulate and study the classical and generalized Cauchy problems for the
wave equation and for the heat equation.

17.2.1 Motivation of the Method


To explain the method, we first apply the theory of distributions to solve an initial value problem
for a linear second-order ODE.
Consider the Cauchy problem
\[
  u''(t) + a^2 u(t) = f(t), \qquad u|_{t=0+} = u_0, \qquad u'|_{t=0+} = u_1, \tag{17.8}
\]
where $f \in C(\mathbb{R}_+)$. We extend the solution $u(t)$ as well as $f(t)$ by 0 for negative values of $t$,
$t < 0$. We denote the new functions by $\tilde u$ and $\tilde f$, respectively. Since $\tilde u$ has a jump of height $u_0$
at 0, by Example 16.6, $\tilde u'(t) = \{u'(t)\} + u_0\,\delta(t)$. Similarly, $\tilde u'(t)$ jumps at 0 by $u_1$, such that
$\tilde u''(t) = \{u''(t)\} + u_0\,\delta'(t) + u_1\,\delta(t)$. Hence, $\tilde u$ satisfies on $\mathbb{R}$ the equation
\[
  \tilde u'' + a^2 \tilde u = \tilde f(t) + u_0\,\delta'(t) + u_1\,\delta(t). \tag{17.9}
\]

We construct the solution $\tilde u$. Since the fundamental solution $E(t) = H(t)\sin(at)/a$ as well as the
right-hand side of (17.9) have positive support, the convolution product exists and equals
\[
\begin{aligned}
  \tilde u &= E * \bigl(\tilde f + u_0\,\delta'(t) + u_1\,\delta(t)\bigr)
   = E * \tilde f + u_0\, E'(t) + u_1\, E(t) \\
  &= \frac{1}{a} \int_0^t f(\tau) \sin a(t-\tau)\, d\tau + u_0\, E'(t) + u_1\, E(t).
\end{aligned}
\]
Since for $t > 0$, $\tilde u$ satisfies (17.9) and the solution of the Cauchy problem is unique, the
above formula gives the classical solution for $t > 0$, that is,
\[
  u(t) = \frac{1}{a} \int_0^t f(\tau) \sin a(t-\tau)\, d\tau + u_0 \cos at + u_1\, \frac{\sin at}{a}.
\]
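The closed-form solution can be sanity-checked numerically; the forcing term and the data below are arbitrary illustrative choices, and the ODE is verified by central differences:

```python
import math

a, u0, u1 = 2.0, 1.0, -0.5
f = lambda t: math.exp(-t)          # sample forcing term

def u(t, n=2000):
    """u0*cos(at) + u1*sin(at)/a + (1/a) * int_0^t f(s) sin(a(t-s)) ds
    with the convolution evaluated by the trapezoid rule."""
    h = t / n
    s = 0.5 * (f(0) * math.sin(a * t) + f(t) * math.sin(0.0))
    for i in range(1, n):
        s += f(i * h) * math.sin(a * (t - i * h))
    conv = s * h / a
    return u0 * math.cos(a * t) + u1 * math.sin(a * t) / a + conv

# check u'' + a^2 u = f at t = 1 by central differences
t, h = 1.0, 1e-3
upp = (u(t + h) - 2 * u(t) + u(t - h)) / (h * h)
print(abs(upp + a * a * u(t) - f(t)))   # small residual
```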


17.2.2 The Wave Equation


(a) The Classical and the Generalized Initial Value ProblemExistence, Uniqueness, and
Continuity
Definition 17.1 (a) The problem
\[
  \Box_a u = f(x,t), \qquad x \in \mathbb{R}^n,\ t > 0, \tag{17.10}
\]
\[
  u|_{t=0+} = u_0(x), \tag{17.11}
\]
\[
  \left.\frac{\partial u}{\partial t}\right|_{t=0+} = u_1(x), \tag{17.12}
\]
where we assume that
\[
  f \in C(\mathbb{R}^n \times \mathbb{R}_+), \qquad u_0 \in C^1(\mathbb{R}^n), \qquad u_1 \in C(\mathbb{R}^n),
\]
is called the classical initial value problem (CIVP, for short) for the wave equation.
A function $u(x,t)$ is called a classical solution of the CIVP if
$u(x,t) \in C^2(\mathbb{R}^n \times (0,+\infty)) \cap C^1(\mathbb{R}^n \times [0,+\infty))$ and
$u(x,t)$ satisfies the wave equation (17.10) for $t > 0$ and the initial conditions (17.11) and
(17.12) as $t \to 0+$.
(b) The problem
\[
  \Box_a U = F(x,t) + U_0(x)\cdot\delta'(t) + U_1(x)\cdot\delta(t)
\]
with $F \in D'(\mathbb{R}^{n+1})$, $U_0, U_1 \in D'(\mathbb{R}^n)$, and $\operatorname{supp} F \subseteq \mathbb{R}^n \times [0,+\infty)$ is called the
generalized initial value problem (GIVP). A generalized function $U \in D'(\mathbb{R}^{n+1})$ with
$\operatorname{supp} U \subseteq \mathbb{R}^n \times [0,+\infty)$ which satisfies the above equation is called a (generalized, weak) solution of the GIVP.
Proposition 17.8 (a) Suppose that $u(x,t)$ is a solution of the CIVP with the given data $f$, $u_0$,
and $u_1$. Then the regular distribution $T_u$ is a solution of the GIVP with the right-hand side
$T_f + T_{u_0}\delta'(t) + T_{u_1}\delta(t)$, provided that $f(x,t)$ and $u(x,t)$ are extended by 0 into the domain
$\{(x,t) \mid (x,t) \in \mathbb{R}^{n+1},\ t < 0\}$.
(b) Conversely, suppose that $U$ is a solution of the GIVP. Let the distributions $F = T_f$, $U_0 = T_{u_0}$,
$U_1 = T_{u_1}$, and $U = T_u$ be regular and satisfy the regularity assumptions of the CIVP.
Then $u(x,t)$ is a solution of the CIVP.
Proof. (b) Suppose that $U$ is a solution of the GIVP; let $\varphi \in D(\mathbb{R}^{n+1})$. By definition of the
tensor product and the derivative,
\[
  \langle U_{tt} - a^2 \Delta U,\ \varphi\rangle
  = \langle F, \varphi\rangle + \langle U_0\,\delta', \varphi\rangle + \langle U_1\,\delta, \varphi\rangle
\]
\[
  = \int_0^\infty \int_{\mathbb{R}^n} f(x,t)\,\varphi(x,t)\, dx\, dt
    - \int_{\mathbb{R}^n} u_0(x)\,\frac{\partial\varphi}{\partial t}(x,0)\, dx
    + \int_{\mathbb{R}^n} u_1(x)\,\varphi(x,0)\, dx. \tag{17.13}
\]


Applying integration by parts with respect to $t$ twice, we find
\[
\begin{aligned}
  \int_0^\infty u\,\varphi_{tt}\, dt
  &= u\,\varphi_t\big|_0^\infty - \int_0^\infty u_t\,\varphi_t\, dt \\
  &= -u(x,0)\,\varphi_t(x,0) - u_t\,\varphi\big|_0^\infty + \int_0^\infty u_{tt}\,\varphi\, dt \\
  &= -u(x,0)\,\varphi_t(x,0) + u_t(x,0)\,\varphi(x,0) + \int_0^\infty u_{tt}\,\varphi\, dt.
\end{aligned}
\]

Since $\mathbb{R}^n$ has no boundary and $\varphi$ has compact support, integration by parts with respect to the
spatial variables $x$ yields no boundary terms.
Hence, by the above formula and $\iint u\,\Delta\varphi\, dt\, dx = \iint \Delta u\,\varphi\, dt\, dx$, we obtain
\[
\begin{aligned}
  \langle U_{tt} - a^2 \Delta U,\ \varphi\rangle
  &= \langle U,\ \varphi_{tt} - a^2\Delta\varphi\rangle
   = \int_{\mathbb{R}^n} \int_0^\infty u(x,t)\,\bigl(\varphi_{tt} - a^2\Delta\varphi\bigr)\, dt\, dx \\
  &= \int_{\mathbb{R}^n} \int_0^\infty \bigl(u_{tt} - a^2\Delta u\bigr)\,\varphi\,(x,t)\, dt\, dx
   - \int_{\mathbb{R}^n} \bigl(u(x,0)\,\varphi_t(x,0) - u_t(x,0)\,\varphi(x,0)\bigr)\, dx. \tag{17.14}
\end{aligned}
\]

For any $\varphi \in D(\mathbb{R}^n \times \mathbb{R}_+)$, $\operatorname{supp}\varphi$ is contained in $\mathbb{R}^n \times (0,+\infty)$, so that
$\varphi(x,0) = \varphi_t(x,0) = 0$. From (17.13) and (17.14) it follows that
\[
  \int_{\mathbb{R}^n} \int_0^\infty \bigl( f - u_{tt} + a^2\Delta u \bigr)\,\varphi\,(x,t)\, dt\, dx = 0.
\]
By Lemma 16.2 (Du Bois-Reymond) it follows that $u_{tt} - a^2\Delta u = f$ on $\mathbb{R}^n \times \mathbb{R}_+$. Inserting this
into (17.13) and (17.14) we have
\[
  \int_{\mathbb{R}^n} \bigl(u_0(x) - u(x,0)\bigr)\,\varphi_t(x,0)\, dx
  - \int_{\mathbb{R}^n} \bigl(u_1(x) - u_t(x,0)\bigr)\,\varphi(x,0)\, dx = 0.
\]
If we set $\varphi(x,t) = \psi(x)\chi(t)$ where $\chi \in D(\mathbb{R})$ and $\chi(t) = 1$ is constant in a neighborhood of 0,
then $\varphi_t(x,0) = 0$ and therefore
\[
  \int_{\mathbb{R}^n} \bigl(u_1(x) - u_t(x,0)\bigr)\,\psi(x)\, dx = 0, \qquad \psi \in D(\mathbb{R}^n).
\]
Moreover,
\[
  \int_{\mathbb{R}^n} \bigl(u_0(x) - u(x,0)\bigr)\,\psi(x)\, dx = 0, \qquad \psi \in D(\mathbb{R}^n),
\]
if we set $\varphi(x,t) = t\,\chi(t)\,\psi(x)$. Again, Lemma 16.2 yields
\[
  u_0(x) = u(x,0), \qquad u_1(x) = u_t(x,0),
\]
and $u(x,t)$ is a solution of the CIVP.


(a) Conversely, if $u(x,t)$ is a solution of the CIVP then (17.14) holds with
$U(x,t) = H(t)\,u(x,t)$.
Comparing this with (17.13) it is seen that
\[
  U_{tt} - a^2 \Delta U = F + U_0\,\delta' + U_1\,\delta,
\]
where $F(x,t) = H(t)f(x,t)$, $U_0(x) = u(x,0)$, and $U_1(x) = u_t(x,0)$.

Corollary 17.9 Suppose that $F$, $U_0$, and $U_1$ are data of the GIVP. Then there exists a unique
solution $U$ of the GIVP. It can be written as
\[
  U = V + V^{(0)} + V^{(1)},
\]
where
\[
  V = E_n * F, \qquad V^{(1)} = E_n *_x U_1, \qquad
  V^{(0)} = \frac{\partial}{\partial t}\bigl( E_n *_x U_0 \bigr).
\]
Here $E_n *_x U_1 := E_n * (U_1(x)\cdot\delta(t))$ denotes the convolution product with respect to the spatial
variables $x$ only. The solution $U$ depends continuously, in the sense of convergence in $D'$, on
$F$, $U_0$, and $U_1$. Here $E_n$ denotes the fundamental solution of the n-dimensional wave operator
$\Box_{a,n}$.

Proof. The supports of the distributions $U_0\,\delta'$ and $U_1\,\delta$ are contained in the hyperplane
$\{(x,t) \in \mathbb{R}^{n+1} \mid t = 0\}$. Hence the support of the distribution $F + U_0\,\delta' + U_1\,\delta$ is contained
in the half-space $\mathbb{R}^n \times \mathbb{R}_+$.
It follows from Proposition 16.15 below that the convolution product
\[
  U = E_n * (F + U_0\,\delta' + U_1\,\delta)
\]
exists and has support in the positive half-space $t \ge 0$. It follows from Theorem 16.8 that $U$ is a
solution of the GIVP. On the other hand, any solution of the GIVP has support in $\mathbb{R}^n \times \mathbb{R}_+$ and
therefore, by Proposition 16.15, possesses the convolution with $E_n$. By Theorem 16.8, the solution $U$ is
unique.
Suppose that the right-hand sides $W_k$ converge to $W$ as $k \to \infty$ in $D'(\mathbb{R}^{n+1})$; then $E_n * W_k \to E_n * W$
by the continuity of the convolution product in $D'$ (see Proposition 16.15).

(b) Explicit Solutions for n = 1, 2, 3


We will make the above formulas from Corollary 17.9 explicit, that is, we compute the above
convolutions to obtain the potentials V , V (0) , and V (1) .
Proposition 17.10 Let $f \in C^2(\mathbb{R}^n \times \mathbb{R}_+)$, $u_0 \in C^3(\mathbb{R}^n)$, and $u_1 \in C^2(\mathbb{R}^n)$ for n = 2, 3; let
$f \in C^1(\mathbb{R} \times \mathbb{R}_+)$, $u_0 \in C^2(\mathbb{R})$, and $u_1 \in C^1(\mathbb{R})$ in case n = 1.
Then there exists a unique solution of the CIVP. It is given in case n = 3 by Kirchhoff's formula
\[
  u(x,t) = \frac{1}{4\pi a^2} \left(
    \iiint_{U_{at}(x)} \frac{f\bigl(y,\ t - \frac{\lVert x-y\rVert}{a}\bigr)}{\lVert x-y\rVert}\, dy
    + \frac{1}{t} \iint_{S_{at}(x)} u_1(y)\, dS_y
    + \frac{\partial}{\partial t}\,\frac{1}{t} \iint_{S_{at}(x)} u_0(y)\, dS_y
  \right). \tag{17.15}
\]


The first term $V$ is called the retarded potential.
In case n = 2, $x = (x_1,x_2)$, $y = (y_1,y_2)$, the solution is given by Poisson's formula
\[
\begin{aligned}
  u(x,t) &= \frac{1}{2\pi a} \int_0^t \iint_{U_{a(t-s)}(x)}
     \frac{f(y,s)}{\sqrt{a^2 (t-s)^2 - \lVert x-y\rVert^2}}\, dy\, ds \\
  &\quad + \frac{1}{2\pi a} \iint_{U_{at}(x)}
     \frac{u_1(y)}{\sqrt{a^2 t^2 - \lVert x-y\rVert^2}}\, dy
   + \frac{1}{2\pi a}\,\frac{\partial}{\partial t} \iint_{U_{at}(x)}
     \frac{u_0(y)}{\sqrt{a^2 t^2 - \lVert x-y\rVert^2}}\, dy. \tag{17.16}
\end{aligned}
\]

In case n = 1 the solution is given by d'Alembert's formula
\[
  u(x,t) = \frac{1}{2a} \int_0^t \int_{x-a(t-s)}^{x+a(t-s)} f(y,s)\, dy\, ds
  + \frac{1}{2a} \int_{x-at}^{x+at} u_1(y)\, dy
  + \frac{1}{2}\bigl( u_0(x+at) + u_0(x-at) \bigr). \tag{17.17}
\]
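For the homogeneous case $f = 0$, d'Alembert's formula is easy to test numerically; the initial data below are arbitrary illustrative choices, and the wave equation is checked by central differences:

```python
import math

a = 1.5
u0 = math.sin                      # initial displacement
u1 = math.cos                      # initial velocity

def u(x, t, n=1000):
    """d'Alembert (f = 0):
    u = (u0(x+at) + u0(x-at))/2 + (1/2a) * int_{x-at}^{x+at} u1(y) dy."""
    lo, hi = x - a * t, x + a * t
    h = (hi - lo) / n
    s = 0.5 * (u1(lo) + u1(hi)) + sum(u1(lo + i * h) for i in range(1, n))
    return 0.5 * (u0(hi) + u0(lo)) + s * h / (2 * a)

# residual of u_tt - a^2 u_xx at an interior point, by central differences
x, t, h = 0.3, 0.8, 1e-3
utt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / (h * h)
uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)
print(abs(utt - a * a * uxx))   # ~ 0
print(abs(u(x, 0.0) - u0(x)))   # initial condition
```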
The solution $u(x,t)$ depends continuously on $u_0$, $u_1$, and $f$ in the following sense: if
\[
  \lvert f - \tilde f\rvert < \varepsilon, \qquad
  \lvert u_0 - \tilde u_0\rvert < \varepsilon_0, \qquad
  \lvert u_1 - \tilde u_1\rvert < \varepsilon_1, \qquad
  \lVert \operatorname{grad}(u_0 - \tilde u_0)\rVert < \varepsilon_0'
\]
(where we impose the last inequality only in cases n = 3 and n = 2), then the corresponding
solutions $u(x,t)$ and $\tilde u(x,t)$ satisfy in a strip $0 \le t \le T$
\[
  \lvert u(x,t) - \tilde u(x,t)\rvert
  < \frac{1}{2}\,\varepsilon T^2 + \varepsilon_1 T + \varepsilon_0 + aT\varepsilon_0',
\]
where the last term is omitted in case n = 1.
Proof (idea). We show Kirchhoff's formula.
(a) The potential term with $f$.
By Proposition 16.15 below, the convolution product $E_3 * \tilde f$ exists. It is shown in [Wla72, p. 153]
that for a locally integrable function $f \in L^1_{\mathrm{loc}}(\mathbb{R}^{n+1})$ with $\operatorname{supp} f \subseteq \mathbb{R}^n \times \mathbb{R}_+$, $E_n * T_f$ is
again a locally integrable function.
Formally, the convolution product is given by
\[
  (E_3 * f)(x,t) = \int_{\mathbb{R}^4} E_3(y,s)\, f(x-y, t-s)\, dy\, ds
  = \int_{\mathbb{R}^4} E_3(x-y, t-s)\, f(y,s)\, dy\, ds,
\]
where the integral is to be understood as the evaluation of $E_3(y,s)$ on the shifted function
$f(x-y, t-s)$. Since $f$ has support on the positive time axis, one can restrict oneself to $s > 0$ and
$t - s > 0$, that is, to $0 < s < t$. Thus formula (17.6) gives
\[
  (E_3 * f)(x,t) = \frac{1}{4\pi a^2} \int_0^t \frac{1}{s} \iint_{S_{as}} f(x-y, t-s)\, dS(y)\, ds.
\]
Using $r = as$, $dr = a\,ds$, we obtain
\[
  V(x,t) = \frac{1}{4\pi a^2} \int_0^{at} \iint_{S_r}
  \frac{1}{r}\, f\Bigl(x-y,\ t - \frac{r}{a}\Bigr)\, dS(y)\, dr.
\]


Using $dy_1\, dy_2\, dy_3 = dr\, dS$ as well as $\lVert y\rVert = r = as$, we can proceed:
\[
  V(x,t) = \frac{1}{4\pi a^2} \iiint_{U_{at}(0)}
  \frac{f\bigl(x-y,\ t - \frac{\lVert y\rVert}{a}\bigr)}{\lVert y\rVert}\, dy_1\, dy_2\, dy_3.
\]
The shift $z = x - y$, $dz_1\, dz_2\, dz_3 = dy_1\, dy_2\, dy_3$, finally yields
\[
  V(x,t) = \frac{1}{4\pi a^2} \iiint_{U_{at}(x)}
  \frac{f\bigl(z,\ t - \frac{\lVert x-z\rVert}{a}\bigr)}{\lVert x-z\rVert}\, dz.
\]
This is the first potential term of Kirchhoff's formula.


(b) We compute $V^{(1)}(x,t)$. By definition,
\[
  V^{(1)} = E_3 * (u_1\,\delta) = E_3 *_x u_1.
\]
Formally, this is given by
\[
  V^{(1)}(x,t)
  = \frac{1}{4\pi a^2 t} \iiint_{\mathbb{R}^3} \delta_{S_{at}}(y)\, u_1(x-y)\, dy
  = \frac{1}{4\pi a^2 t} \iint_{S_{at}} u_1(x-y)\, dS(y)
  = \frac{1}{4\pi a^2 t} \iint_{S_{at}(x)} u_1(y)\, dS(y).
\]
(c) Recall that $(D^\alpha S) * T = D^\alpha (S * T)$, by Remark 16.8 (b). In particular,
\[
  E_3 * (u_0\,\delta') = \frac{\partial}{\partial t}\bigl( E_3 *_x u_0 \bigr),
\]
which immediately gives the third term in view of (b).

Remark 17.2 (a) The stronger regularity (differentiability) conditions on $f$, $u_0$, $u_1$ are necessary to prove $u \in C^2(\mathbb{R}^n \times \mathbb{R}_+)$ and to show stability.
(b) Proposition 17.10 and Corollary 17.9 show that the GIVP for the wave equation is a
well-posed problem (existence, uniqueness, stability).

17.2.3 The Heat Equation


Definition 17.2 (a) The problem
\[
  u_t - a^2 \Delta u = f(x,t), \qquad x \in \mathbb{R}^n,\ t > 0, \tag{17.18}
\]
\[
  u(x,0) = u_0(x), \tag{17.19}
\]
where we assume that
\[
  f \in C(\mathbb{R}^n \times \mathbb{R}_+), \qquad u_0 \in C(\mathbb{R}^n),
\]
is called the classical initial value problem (CIVP, for short) for the heat equation.
A function $u(x,t)$ is called a classical solution of the CIVP if
$u(x,t) \in C^2(\mathbb{R}^n \times (0,+\infty)) \cap C(\mathbb{R}^n \times [0,+\infty))$
and $u(x,t)$ satisfies the heat equation (17.18) and the initial condition (17.19).
(b) The problem
\[
  U_t - a^2 \Delta U = F + U_0\,\delta(t)
\]
with $F \in D'(\mathbb{R}^{n+1})$, $U_0 \in D'(\mathbb{R}^n)$, and $\operatorname{supp} F \subseteq \mathbb{R}^n \times \mathbb{R}_+$ is called the generalized initial
value problem (GIVP). A generalized function $U \in D'(\mathbb{R}^{n+1})$ with $\operatorname{supp} U \subseteq \mathbb{R}^n \times \mathbb{R}_+$ which
satisfies the above equation is called a generalized solution of the GIVP.

The fundamental solution of the heat operator has the following properties:
\[
  \int_{\mathbb{R}^n} E(x,t)\, dx = 1, \qquad
  E(x,t) \to \delta(x) \quad\text{as}\quad t \to 0+.
\]
The fundamental solution describes the heat distribution of a point source at the origin $(0,0)$.
Since $E(x,t) > 0$ for all $t > 0$ and all $x \in \mathbb{R}^n$, the heat propagates with infinite speed. This is in
contrast to our experience. However, for short distances the heat equation gives sufficiently
good results. For long distances one uses the transport equation. We summarize the results,
which are similar to those for the wave equation.

Proposition 17.11 (a) Suppose that $u(x,t)$ is a solution of the CIVP with the given data $f$
and $u_0$. Then the regular distribution $T_u$ is a solution of the GIVP with the right-hand side
$T_f + T_{u_0}\delta$, provided that $f(x,t)$ and $u(x,t)$ are extended to $\tilde f(x,t)$ and $\tilde u(x,t)$ by 0 into the
left half-space $\{(x,t) \mid (x,t) \in \mathbb{R}^{n+1},\ t < 0\}$.
(b) Conversely, suppose that $U$ is a solution of the GIVP. Let the distributions $F = T_f$, $U_0 = T_{u_0}$,
and $U = T_u$ be regular and satisfy the regularity assumptions of the CIVP.
Then $u(x,t)$ is a solution of the CIVP.
Proposition 17.12 Suppose that $F$ and $U_0$ are data of the GIVP. Suppose further that $F$ and $U_0$
both have compact support. Then there exists a solution $U$ of the GIVP which can be written as
\[
  U = V + V^{(0)}, \qquad V = E * F, \qquad V^{(0)} = E *_x U_0.
\]
The solution $U$ varies continuously with $F$ and $U_0$.


Remark 17.3 The theorem differs from the corresponding result for the wave equation in that
there is no statement on uniqueness. It turns out that the GIVP cannot be solved uniquely. A.
Friedman, Partial Differential Equations of Parabolic Type, gave an example of a non-vanishing
distribution which solves the GIVP with $F = 0$ and $U_0 = 0$.


However, if all distributions are regular and we place additional requirements on the growth for
$t, \lVert x\rVert \to \infty$ of the regular distribution, uniqueness can be achieved.
For existence and uniqueness we introduce the following classes of functions:
\[
  M = \{ f \in C(\mathbb{R}^n \times \mathbb{R}_+) \mid f \text{ is bounded on the strip } \mathbb{R}^n \times [0,T] \text{ for all } T > 0 \},
\]
\[
  C_b(\mathbb{R}^n) = \{ f \in C(\mathbb{R}^n) \mid f \text{ is bounded on } \mathbb{R}^n \}.
\]

Corollary 17.13 (a) Let $f \in M$ and $u_0 \in C_b(\mathbb{R}^n)$. Then the two potentials $V(x,t)$ as in
Corollary 17.4 and
\[
  V^{(0)}(x,t) = E *_x T_{u_0}
  = \frac{H(t)}{(4\pi a^2 t)^{n/2}} \int_{\mathbb{R}^n} u_0(y)\, e^{-\frac{\lVert x-y\rVert^2}{4a^2 t}}\, dy
\]
are regular distributions and $u = V + V^{(0)}$ is a solution of the GIVP.
(b) In case $f \in C^2(\mathbb{R}^n \times \mathbb{R}_+)$ with $D^\alpha f \in M$ for all $\alpha$ with $\lvert\alpha\rvert \le 1$ (first-order partial
derivatives), the solution in (a) is a solution of the CIVP. In particular, $V^{(0)}(x,t) \to u_0(x)$ as
$t \to 0+$.
(c) The solution $u$ of the GIVP is unique in the class $M$.
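The statement $V^{(0)}(x,t) \to u_0(x)$ as $t \to 0+$ can be observed numerically for $n = 1$; the initial datum below is an arbitrary bounded continuous example:

```python
import math

a = 1.0
u0 = lambda x: 1.0 / (1.0 + x * x)   # bounded continuous initial datum

def V0(x, t, L=40.0, n=8000):
    """V^(0)(x,t) = (4 pi a^2 t)^(-1/2) * int u0(y) exp(-(x-y)^2/(4 a^2 t)) dy
    for n = 1, evaluated by the trapezoid rule on [-L, L]."""
    h = 2 * L / n
    c = 1.0 / math.sqrt(4 * math.pi * a * a * t)
    s = 0.0
    for i in range(n + 1):
        y = -L + i * h
        w = 0.5 if i in (0, n) else 1.0
        s += w * u0(y) * math.exp(-(x - y) ** 2 / (4 * a * a * t))
    return c * s * h

for t in (0.1, 0.01, 0.001):
    print(t, V0(0.5, t))   # approaches u0(0.5) = 0.8 as t -> 0+
```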

17.2.4 Physical Interpretation of the Results


[Figure: the backward light cone $\Gamma^-(x,t)$, with radius $at$ at $s = 0$, and the forward light cone $\Gamma^+(x,t)$.]
Definition 17.3 We introduce the two cones in $\mathbb{R}^{n+1}$
\[
  \Gamma^-(x,t) = \{(y,s) \mid \lVert x-y\rVert < a(t-s),\ s < t\},
\]
\[
  \Gamma^+(x,t) = \{(y,s) \mid \lVert x-y\rVert < a(s-t),\ s > t\},
\]
which are called the domain of dependence (backward light cone) and the domain of influence (forward
light cone), respectively.
Recall that the boundaries $\partial\Gamma^+$ and $\partial\Gamma^-$ are characteristic surfaces of the wave equation.


(a) Propagation of Waves in Space


Consider the fundamental solution
\[
  E_3(x,t) = \frac{H(t)}{4\pi a^2 t}\,\delta_{S_{at}}
\]
of the 3-dimensional wave equation.
It shows that the disturbance at time $t > 0$ caused by a point source $\delta(x)\delta(t)$ at the origin is
located on a sphere of radius $at$ around 0. The disturbance moves like a spherical wave,
$\lVert x\rVert = at$, with velocity $a$. In the beginning there is silence, then disturbance (on the sphere),
and afterwards again silence. This is called Huygens' principle.
[Figure: the disturbance lies on the sphere $S_{at}$ of radius $at$.]
It follows by the superposition principle that the solution $u(x_0,t_0)$ of an initial disturbance
$u_0(x)\,\delta'(t) + u_1(x)\,\delta(t)$ is completely determined by the values of $u_0$ and $u_1$ on the sphere cut out
by the backward light cone at $t = 0$; that is, by the values $u_0(x)$ and $u_1(x)$ at all $x$ with
$\lVert x - x_0\rVert = a t_0$.
Now let the disturbance be situated in a compact set $K$ rather than in a single point. Suppose
that $d$ and $D$ are the minimal and maximal distances of $x$ from $K$. Then the disturbance
starts to act at $x$ at time $t_0 = d/a$; it lasts for $(D-d)/a$; and again, for $t > D/a = t_1$, there
is silence at $x$. Therefore, we can observe a forward wave front at time $t_0$ and a backward wave
front at time $t_1$.
[Figure: regions of silence and disturbance for a compact initial disturbance $K$.]
This shows that the domain of influence $M(K)$ of a compact set $K$ is the union of all boundaries
of forward light cones $\Gamma^+(y,0)$ with $y \in K$ at time $t = 0$:
\[
  M(K) = \{(y,s) \mid \exists\, x \in K :\ \lVert x-y\rVert = as\}.
\]
(b) Propagation of Plane Waves
Consider the fundamental solution
\[
  E_2(x,t) = \frac{H(at - \lVert x\rVert)}{2\pi a \sqrt{a^2 t^2 - \lVert x\rVert^2}}, \qquad x = (x_1, x_2),
\]
of the 2-dimensional wave equation.


[Figure: the expanding discs of disturbance at times t = 0, 1, 2, 3, 4.]

It shows that the disturbance caused by a point source $\delta(x)\delta(t)$ at the origin at time 0 fills the
disc $U_{at}$ of radius $at$ around 0. One observes a forward wave front moving with speed $a$. In
contrast to the 3-dimensional picture, there exists no back front. The disturbance is permanently
present from $t_0$ on. We speak of wave diffusion; Huygens' principle does not hold.
Diffusion can also be observed in case of an arbitrary initial disturbance $u_0(x)\,\delta'(t) + u_1(x)\,\delta(t)$.
Indeed, the superposition principle shows that the domain of influence of a compact initial
disturbance $K$ is the union of all discs $U_{at}(y)$ with $y \in K$.
(c) Propagation on a Line
Recall that $E_1(x,t) = \frac{1}{2a} H(at - \lvert x\rvert)$. The disturbance at time $t > 0$ caused by a
point source $\delta(x)\delta(t)$ is the whole closed interval $[-at, at]$. We have two forward wave fronts,
one at the point $x = at$ and one at $x = -at$, one moving to the right and one moving to the left.
As in the plane case, there does not exist a back wave front; we observe diffusion.
For more details, see the discussion in Wladimirow, [Wla72, p. 155-159].

17.3 Fourier Method for Boundary Value Problems

A good, easily accessible introduction to the Fourier method is to be found in [KK71].
In this section we use Fourier series to solve BEVPs for the Laplace equation as well as initial
boundary value problems for the wave and heat equations.
Recall that the following sets are CNOS in the Hilbert space $H$:
\[
  \Bigl\{ \tfrac{1}{\sqrt{2\pi}}\, e^{int} \;\Big|\; n \in \mathbb{Z} \Bigr\}, \qquad H = L^2(a, a+2\pi),
\]
\[
  \Bigl\{ \tfrac{1}{\sqrt{b-a}}\, e^{\frac{2\pi i}{b-a} nt} \;\Big|\; n \in \mathbb{Z} \Bigr\}, \qquad H = L^2(a,b),
\]
\[
  \Bigl\{ \tfrac{1}{\sqrt{2\pi}},\ \tfrac{1}{\sqrt{\pi}}\sin(nt),\ \tfrac{1}{\sqrt{\pi}}\cos(nt) \;\Big|\; n \in \mathbb{N} \Bigr\}, \qquad H = L^2(a, a+2\pi),
\]
\[
  \Bigl\{ \tfrac{1}{\sqrt{b-a}},\ \sqrt{\tfrac{2}{b-a}}\sin\Bigl(\tfrac{2\pi}{b-a}\,nt\Bigr),\ \sqrt{\tfrac{2}{b-a}}\cos\Bigl(\tfrac{2\pi}{b-a}\,nt\Bigr) \;\Big|\; n \in \mathbb{N} \Bigr\}, \qquad H = L^2(a,b).
\]
For any function $f \in L^1(0, 2\pi)$ one has an associated Fourier series
\[
  f \sim \sum_{n\in\mathbb{Z}} c_n e^{int}, \qquad
  c_n = \frac{1}{2\pi} \int_0^{2\pi} f(t)\, e^{-int}\, dt.
\]


Lemma 17.14 Each of the following two sets forms a CNOS in $H = L^2(0,\pi)$ (on the half
interval):
\[
  \Bigl\{ \sqrt{\tfrac{2}{\pi}}\, \sin(nt) \;\Big|\; n \in \mathbb{N} \Bigr\}, \qquad
  \Bigl\{ \tfrac{1}{\sqrt{\pi}} \Bigr\} \cup \Bigl\{ \sqrt{\tfrac{2}{\pi}}\, \cos(nt) \;\Big|\; n \in \mathbb{N} \Bigr\}.
\]
Proof. To check that they form an NOS is left to the reader. We show completeness of the first
set. Let $f \in L^2(0,\pi)$. Extend $f$ to an odd function $\tilde f \in L^2(-\pi,\pi)$, that is, $\tilde f(x) = f(x)$ and
$\tilde f(-x) = -f(x)$ for $x \in (0,\pi)$. Since $\tilde f$ is an odd function, in its Fourier series
\[
  \frac{a_0}{2} + \sum_{n=1}^\infty a_n \cos(nt) + b_n \sin(nt)
\]
we have $a_n = 0$ for all $n$. Since the Fourier series $\sum_{n=1}^\infty b_n \sin(nt)$ converges to $\tilde f$ in $L^2(-\pi,\pi)$,
it converges to $f$ in $L^2(0,\pi)$. Thus, the sine system is complete. The proof for the cosine
system is analogous.
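The completeness of the sine system can be illustrated numerically: the sine partial sums of a function on $(0,\pi)$ approach the function as more terms are taken (the test function and quadrature below are illustrative choices):

```python
import math

f = lambda x: x * (math.pi - x)   # sample function on (0, pi)

def bn(n, m=2000):
    """Sine coefficient b_n = (2/pi) * int_0^pi f(x) sin(nx) dx (trapezoid rule)."""
    h = math.pi / m
    s = sum(f(i * h) * math.sin(n * i * h) for i in range(1, m))  # endpoints vanish
    return 2.0 / math.pi * s * h

def partial(x, N):
    """N-th partial sum of the sine series at x."""
    return sum(bn(n) * math.sin(n * x) for n in range(1, N + 1))

x = 1.0
e5 = abs(partial(x, 5) - f(x))
e25 = abs(partial(x, 25) - f(x))
print(e5, e25)   # the error shrinks as N grows
```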

17.3.1 Initial Boundary Value Problems


(a) The Homogeneous Heat Equation, Periodic Boundary Conditions
We consider heat conduction in a closed wire loop of length $2\pi$. Let $u(x,t)$ be the temperature
of the wire at position $x$ and time $t$. Since the wire is closed (a loop), $u(x,t) = u(x+2\pi, t)$;
$u$ is thought of as a $2\pi$-periodic function on $\mathbb{R}$ for every fixed $t$. Thus, we have the following
periodic boundary conditions:
\[
  u(0,t) = u(2\pi, t), \qquad u_x(0,t) = u_x(2\pi, t), \qquad t \in \mathbb{R}_+. \tag{PBC}
\]
The initial temperature distribution at time $t = 0$ is given, such that the BIVP reads
\[
  u_t - a^2 u_{xx} = 0, \quad x \in \mathbb{R},\ t > 0; \qquad
  u(x,0) = u_0(x), \quad x \in \mathbb{R}; \qquad \text{(PBC)}. \tag{17.20}
\]

Separation of variables. We seek solutions of the form
\[
  u(x,t) = f(x)\, g(t),
\]
ignoring the initial conditions for a while. The heat equation then takes the form
\[
  f(x)\,g'(t) = a^2 f''(x)\,g(t)
  \quad\Longrightarrow\quad
  \frac{1}{a^2}\,\frac{g'(t)}{g(t)} = \frac{f''(x)}{f(x)} = \lambda = \text{const}.
\]
We obtain a system of two independent ODEs coupled only by $\lambda$:
\[
  f''(x) - \lambda f(x) = 0, \qquad g'(t) - a^2 \lambda\, g(t) = 0.
\]


The periodic boundary conditions imply
\[
  f(0)\,g(t) = f(2\pi)\,g(t), \qquad f'(0)\,g(t) = f'(2\pi)\,g(t),
\]
which for nontrivial $g$ gives $f(0) = f(2\pi)$ and $f'(0) = f'(2\pi)$. In case $\lambda = 0$, $f''(x) = 0$ has
the general solution $f(x) = \alpha x + \beta$. The only periodic solution is $f(x) = \beta = \text{const}$. Suppose
now that $\lambda = -\mu^2 < 0$. Then the general solution of the second-order ODE is
\[
  f(x) = c_1 \cos(\mu x) + c_2 \sin(\mu x).
\]
Since $f$ is periodic with period $2\pi$, only a discrete set of values $\mu$ is possible, namely $\mu_n = n$,
$n \in \mathbb{Z}$. This implies $\lambda_n = -n^2$, $n \in \mathbb{N}$.
Finally, in case $\lambda = \mu^2 > 0$, the general solution
\[
  f(x) = c_1 e^{\mu x} + c_2 e^{-\mu x}
\]
provides no periodic solutions $f$. So far, we have obtained a set of solutions
\[
  f_n(x) = a_n \cos(nx) + b_n \sin(nx), \qquad n \in \mathbb{N}_0,
\]
corresponding to $\lambda_n = -n^2$. The ODE for $g_n(t)$ now reads
\[
  g_n'(t) + a^2 n^2 g_n(t) = 0.
\]
Its solution is $g_n(t) = c\, e^{-a^2 n^2 t}$. Hence, solutions of the boundary value problem are given by
\[
  u_n(x,t) = e^{-a^2 n^2 t}\bigl( a_n \cos(nx) + b_n \sin(nx) \bigr), \quad n \in \mathbb{N}, \qquad
  u_0(x,t) = \frac{a_0}{2},
\]
and finite or infinite linear combinations thereof:
\[
  u(x,t) = \frac{a_0}{2} + \sum_{n=1}^\infty e^{-a^2 n^2 t}\bigl( a_n \cos(nx) + b_n \sin(nx) \bigr). \tag{17.21}
\]

Consider now the initial value, that is, $t = 0$. The corresponding series is
\[
  u(x,0) = \frac{a_0}{2} + \sum_{n=1}^\infty a_n \cos(nx) + b_n \sin(nx), \tag{17.22}
\]
which is the ordinary Fourier series of $u_0(x)$. That is, the Fourier coefficients $a_n$ and $b_n$ of
the initial function $u_0(x)$ formally give a solution $u(x,t)$.
1. If the Fourier series of $u_0$ converges pointwise to $u_0$, the initial conditions are satisfied by
the function $u(x,t)$ given in (17.21).
2. If the Fourier series of $u_0$ is twice differentiable (with respect to $x$), so is the function
$u(x,t)$ given by (17.21).
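If the initial temperature is already a finite Fourier sum, the series (17.21) terminates and the solution is exact, so both claims can be checked directly (the coefficients below are an arbitrary example):

```python
import math

a = 0.7
# n -> (a_n, b_n): initial temperature u0(x) = 2*cos(x) + sin(3x)
coeffs = {1: (2.0, 0.0), 3: (0.0, 1.0)}

def u(x, t):
    """Series (17.21) for a finite set of modes:
    sum of exp(-a^2 n^2 t) * (a_n cos(nx) + b_n sin(nx))."""
    return sum(math.exp(-a * a * n * n * t)
               * (an * math.cos(n * x) + bn * math.sin(n * x))
               for n, (an, bn) in coeffs.items())

# check the heat equation u_t = a^2 u_xx by central differences
x, t, h = 1.1, 0.4, 1e-4
ut = (u(x, t + h) - u(x, t - h)) / (2 * h)
uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)
print(abs(ut - a * a * uxx))                                # ~ 0
print(abs(u(x, 0) - (2 * math.cos(x) + math.sin(3 * x))))   # = 0
```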


Lemma 17.15 Consider the BIVP (17.20).
(a) Existence. Suppose that $u_0 \in C^4(\mathbb{R})$ is $2\pi$-periodic.
Then the function $u(x,t)$ given by (17.21) is in $C^{2,1}_{x,t}([0,2\pi] \times \mathbb{R}_+)$ and solves the classical BIVP
(17.20).
(b) Uniqueness and Stability. In the class of functions $C^{2,1}_{x,t}([0,2\pi] \times \mathbb{R}_+)$ the solution of the
above BIVP is unique.
Proof. (a) The Fourier coefficients of $u_0^{(4)}$ are bounded, such that the Fourier coefficients of $u_0$
decay like $1/n^4$ (integrate the Fourier series of $u_0^{(4)}$ four times). Then the series for $u_{xx}(x,t)$
and $u_t(x,t)$ are both dominated by the series $\sum_{n=1}^\infty \frac{1}{n^2}$; hence they converge uniformly. This
shows that the series $u(x,t)$ can be differentiated term by term twice w.r.t. $x$ and once w.r.t. $t$.
(b) For any fixed $t \ge 0$, $u(x,t)$ is continuous in $x$. Consider $v(t) := \lVert u(\cdot,t)\rVert^2_{L^2(0,2\pi)}$. Then
\[
\begin{aligned}
  v'(t) &= \frac{d}{dt} \int_0^{2\pi} u(x,t)^2\, dx
   = 2 \int_0^{2\pi} u(x,t)\, u_t(x,t)\, dx
   = 2 \int_0^{2\pi} u(x,t)\, a^2 u_{xx}(x,t)\, dx \\
  &= 2 a^2\, u_x u\big|_0^{2\pi} - 2a^2 \int_0^{2\pi} (u_x(x,t))^2\, dx
   = -2a^2 \int_0^{2\pi} u_x^2\, dx \le 0.
\end{aligned}
\]

This shows that $v(t)$ is monotonically decreasing in $t$.
Suppose now that $u^1$ and $u^2$ both solve the BIVP in the given class. Then $u = u^1 - u^2$ solves
the BIVP in this class with homogeneous initial values, that is, $u(x,0) = 0$; hence $v(0) =
\lVert u(\cdot,0)\rVert^2_{L^2} = 0$. Since $v(t)$ is decreasing for $t \ge 0$ and non-negative, $v(t) = 0$ for all $t \ge 0$;
hence $u(x,t) = 0$ in $L^2(0,2\pi)$ for all $t \ge 0$. Since $u(x,t)$ is continuous in $x$, this implies
$u(x,t) = 0$ for all $x$ and $t$. Thus, $u^1(x,t) = u^2(x,t)$; the solution is unique.
Stability. Since $v(t)$ is decreasing,
\[
  \sup_{t \in \mathbb{R}_+} \lVert u(\cdot,t)\rVert_{L^2(0,2\pi)} \le \lVert u_0\rVert_{L^2(0,2\pi)}.
\]
This shows that small changes in the initial condition $u_0$ imply small changes in the solution
$u(x,t)$. The problem is well-posed.

(b) The Inhomogeneous Heat Equation, Periodic Boundary Conditions
We study the IBVP
\[
  u_t - a^2 u_{xx} = f(x,t), \qquad u(x,0) = 0, \qquad \text{(PBC)}. \tag{17.23}
\]
Solution. Let $e_n(x) = e^{inx}/\sqrt{2\pi}$, $n \in \mathbb{Z}$, be the CNOS in $L^2(0,2\pi)$. These functions are all
eigenfunctions of the differential operator $\frac{d^2}{dx^2}$: $e_n''(x) = -n^2 e_n(x)$. Let $t > 0$ be
fixed and let
\[
  f(x,t) \sim \sum_{n\in\mathbb{Z}} c_n(t)\, e_n(x)
\]
be the Fourier series of $f(x,t)$ with coefficients $c_n(t)$. For $u$, we try the following ansatz:
\[
  u(x,t) \sim \sum_{n\in\mathbb{Z}} d_n(t)\, e_n(x). \tag{17.24}
\]
If $f(x,t)$ is continuous in $x$ and piecewise continuously differentiable with respect to $x$, its
Fourier series converges pointwise and we have
\[
  u_t - a^2 u_{xx} = \sum_n \bigl( d_n'(t) + a^2 n^2 d_n(t) \bigr)\, e_n
  = f(x,t) = \sum_n c_n(t)\, e_n.
\]
For each $n$ this is an ODE in $t$:
\[
  d_n'(t) + a^2 n^2 d_n(t) = c_n(t), \qquad d_n(0) = 0.
\]
From ODE theory the solution is well known:
\[
  d_n(t) = e^{-a^2 n^2 t} \int_0^t e^{a^2 n^2 s}\, c_n(s)\, ds.
\]
Under certain regularity and growth conditions on $f$, (17.24) solves the inhomogeneous IBVP.
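That $d_n$ indeed solves the coefficient ODE can be verified numerically; for numerical stability the integral is evaluated in the equivalent form $d_n(t) = \int_0^t e^{a^2 n^2 (s-t)} c_n(s)\,ds$ (the sample coefficient $c_n$ is an arbitrary choice):

```python
import math

a, n = 1.0, 2
c = lambda t: math.cos(t)     # sample Fourier coefficient c_n(t) of the source

def d(t, m=4000):
    """d_n(t) = int_0^t exp(a^2 n^2 (s - t)) c_n(s) ds (trapezoid rule)."""
    k = a * a * n * n
    h = t / m
    s = 0.5 * (c(0) * math.exp(-k * t) + c(t))
    for i in range(1, m):
        s += c(i * h) * math.exp(k * (i * h - t))
    return s * h

# check d' + a^2 n^2 d = c at t = 0.7 and d(0+) = 0
t, h = 0.7, 1e-4
dp = (d(t + h) - d(t - h)) / (2 * h)
print(abs(dp + a * a * n * n * d(t) - c(t)))   # ~ 0
print(d(1e-12))                                 # ~ 0
```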
(c) The Homogeneous Wave Equation with Dirichlet Conditions
Consider the initial boundary value problem of the vibrating string of length $\pi$:
\[
\begin{aligned}
  &\text{(E)} && u_{tt} - a^2 u_{xx} = 0, \quad 0 < x < \pi,\ t > 0;\\
  &\text{(BC)} && u(0,t) = u(\pi,t) = 0;\\
  &\text{(IC)} && u(x,0) = \varphi(x), \quad u_t(x,0) = \psi(x), \quad 0 < x < \pi.
\end{aligned}
\]
The ansatz $u(x,t) = f(x)\,g(t)$ yields
\[
  \frac{f''}{f} = \frac{g''}{a^2 g} = \lambda, \qquad
  f'' = \lambda f, \qquad g'' = a^2 \lambda\, g.
\]
The boundary conditions imply $f(0) = f(\pi) = 0$. Hence, the first ODE has the only solutions
\[
  f_n(x) = c_n \sin(nx), \qquad \lambda_n = -n^2, \qquad n \in \mathbb{N}.
\]
The corresponding ODEs for $g$ then read
\[
  g_n'' + n^2 a^2 g_n = 0,
\]
which has the general solution $a_n \cos(nat) + b_n \sin(nat)$. Hence,
\[
  u(x,t) = \sum_{n=1}^\infty \bigl( a_n \cos(nat) + b_n \sin(nat) \bigr) \sin(nx)
\]
solves the boundary value problem in the sense of $D'(\mathbb{R}^2)$ (choose any $a_n$, $b_n$ of polynomial
growth). Now, insert the initial conditions at $t = 0$:
\[
  u(x,0) = \sum_{n=1}^\infty a_n \sin(nx) = \varphi(x), \qquad
  u_t(x,0) = \sum_{n=1}^\infty n a\, b_n \sin(nx) = \psi(x).
\]
Since $\{\sqrt{2/\pi}\,\sin(nx) \mid n \in \mathbb{N}\}$ is a CNOS in $L^2(0,\pi)$, we can determine the Fourier coefficients
of $\varphi$ and $\psi$ with respect to this CNOS and obtain $a_n$ and $b_n$, respectively.
Regularity. Suppose that $\varphi \in C^4([0,\pi])$, $\psi \in C^3([0,\pi])$. Then the Fourier sine coefficients $a_n$
and $n a b_n$ of $\varphi$ and $\psi$ decay like $1/n^4$ and $1/n^3$, respectively. Hence, the series
\[
  u(x,t) = \sum_{n=1}^\infty \bigl( a_n \cos(nat) + b_n \sin(nat) \bigr) \sin(nx) \tag{17.25}
\]
can be differentiated twice with respect to $x$ or $t$ since the differentiated series have a summable
upper bound $\sum c/n^2$. Hence, (17.25) solves the IBVP.
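If $\varphi$ and $\psi$ are finite sine sums, the series (17.25) terminates and the solution is exact; the sketch below (coefficients chosen for illustration) checks the wave equation and the boundary values:

```python
import math

a = 1.0
phi = {1: 1.0, 2: 0.5}     # sine coefficients of u(x,0)
psi = {3: 1.0}             # sine coefficients of u_t(x,0)

# matching the initial conditions: a_n = phi_n, b_n = psi_n / (n*a)
modes = {n: (phi.get(n, 0.0), psi.get(n, 0.0) / (n * a))
         for n in set(phi) | set(psi)}

def u(x, t):
    """Series (17.25) for finitely many modes:
    sum (a_n cos(nat) + b_n sin(nat)) sin(nx)."""
    return sum((an * math.cos(n * a * t) + bn * math.sin(n * a * t))
               * math.sin(n * x)
               for n, (an, bn) in modes.items())

x, t, h = 0.9, 0.6, 1e-4
utt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / (h * h)
uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)
print(abs(utt - a * a * uxx))               # ~ 0
print(abs(u(0.0, t)), abs(u(math.pi, t)))   # boundary values ~ 0
```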
(d) The Wave Equation with Inhomogeneous Boundary Conditions
Consider the following problem in $\Omega \subseteq \mathbb{R}^n$:
\[
  u_{tt} - a^2 \Delta u = 0, \qquad u(x,0) = u_t(x,0) = 0, \qquad u|_{\partial\Omega} = w(x,t).
\]
Idea. Find an extension $v(x,t)$ of $w(x,t)$, $v \in C^2(\bar\Omega \times \mathbb{R}_+)$, and look for the function $\tilde u = u - v$.
Then $\tilde u$ has homogeneous boundary conditions and satisfies the IBVP
\[
  \tilde u_{tt} - a^2 \Delta \tilde u = -v_{tt} + a^2 \Delta v, \qquad
  \tilde u(x,0) = -v(x,0), \quad \tilde u_t(x,0) = -v_t(x,0), \qquad
  \tilde u|_{\partial\Omega} = 0.
\]
This problem can be split into two problems, one with zero initial conditions and one with a
homogeneous wave equation.

17.3.2 Eigenvalue Problems for the Laplace Equation

In the previous subsection we have seen that Fourier's method for IBVPs often leads to boundary eigenvalue problems (BEVP) for the Laplace equation.
We formulate the problems. Let $n = 1$ and $\Omega = (0,l)$. One considers the following types of BEVPs for the Laplace equation $f'' = \lambda f$:
Dirichlet boundary conditions: $f(0) = f(l) = 0$.
Neumann boundary conditions: $f'(0) = f'(l) = 0$.
Periodic boundary conditions: $f(0) = f(l)$, $f'(0) = f'(l)$.
Mixed boundary conditions: $\alpha_1 f(0) + \alpha_2 f'(0) = 0$, $\beta_1 f(l) + \beta_2 f'(l) = 0$.
Symmetric boundary conditions: If $u$ and $v$ are functions satisfying these boundary conditions, then $(u'v - uv')\big|_0^l = 0$. In this case integration by parts gives
$$\int_0^l (u''v - uv'')\,dx = (u'v - v'u)\big|_0^l - \int_0^l (u'v' - v'u')\,dx = 0.$$
That is, $\langle u'', v\rangle = \langle u, v''\rangle$ and the Laplace operator becomes symmetric.

Proposition 17.16 Let $\Omega \subseteq \mathbb{R}^n$. The BEVP with Dirichlet conditions
$$\Delta u = \lambda u, \qquad u \in C^2(\Omega)\cap C^1(\overline{\Omega}), \qquad u\,|_{\partial\Omega} = 0, \tag{17.26}$$
has countably many eigenvalues $\lambda_k$. All eigenvalues are negative and of finite multiplicity. If $0 > \lambda_1 > \lambda_2 > \cdots$, then the sequence $(\frac{1}{\lambda_k})$ tends to $0$. The eigenfunctions $u_k$ corresponding to $\lambda_k$ form a CNOS in $L^2(\Omega)$.
Sketch of proof. (a) Let $H = L^2(\Omega)$. We use Green's 1st formula with $u = v$, $u\,|_{\partial\Omega} = 0$,
$$\int_\Omega u\,\Delta u\,dx + \int_\Omega (\nabla u)^2\,dx = 0,$$
to show that all eigenvalues of $\Delta$ are negative. Let $\Delta u = \lambda u$. First note that $\lambda = 0$ is not an eigenvalue of $\Delta$: suppose to the contrary $\Delta u = 0$, that is, $u$ is harmonic; since $u\,|_{\partial\Omega} = 0$, by the uniqueness theorem for the Dirichlet problem, $u = 0$ in $\Omega$. Now, for an eigenfunction $u \ne 0$,
$$\lambda\|u\|^2 = \langle \Delta u, u\rangle = \int_\Omega u\,\Delta u\,dx = -\int_\Omega (\nabla u)^2\,dx < 0.$$
Hence, $\lambda$ is negative.
(b) Assume that a Green's function $G$ for $\Omega$ exists. By (17.34), that is,
$$u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega} u(x)\,\frac{\partial G(x,y)}{\partial\vec n_x}\,dS(x),$$
$u\,|_{\partial\Omega} = 0$ implies
$$u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx.$$
This shows that the integral operator $A\colon L^2(\Omega) \to L^2(\Omega)$ defined by
$$(Av)(y) := \int_\Omega G(x,y)\,v(x)\,dx$$
is inverse to the Laplacian. Since $G(x,y) = G(y,x)$ is real, $A$ is self-adjoint. By (a), its eigenvalues $1/\lambda_k$ are all negative. If
$$\iint_{\Omega\times\Omega} |G(x,y)|^2\,dx\,dy < \infty,$$
$A$ is a compact operator.
We want to justify the last statement. Let $(Kf)(x) = \int_\Omega k(x,y)f(y)\,dy$ be an integral operator on $H = L^2(\Omega)$, with kernel $k(x,y) \in \tilde H = L^2(\Omega\times\Omega)$. Let $\{u_n \mid n \in \mathbb{N}\}$ be a CNOS in $H$; then $\{u_n(x)u_m(y) \mid n,m \in \mathbb{N}\}$ is a CNOS in $\tilde H$. Let $k_{nm}$ be the Fourier coefficients of $k$ with respect to the basis $\{u_n(x)u_m(y)\}$ in $\tilde H$. Then
$$(Kf)(x) = \int_\Omega f(y)\sum_{n,m} k_{nm}u_n(x)u_m(y)\,dy = \sum_n u_n(x)\sum_m k_{nm}\int_\Omega u_m(y)f(y)\,dy = \sum_{n,m} k_{nm}\langle f, u_m\rangle\,u_n.$$
This in particular shows that
$$\|Kf\|^2 = \sum_n\Bigl|\sum_m k_{nm}\langle f, u_m\rangle\Bigr|^2 \le \sum_{m,n} k_{nm}^2\,\sum_m |\langle f, u_m\rangle|^2 = \|f\|^2\sum_{m,n} k_{nm}^2,$$
hence
$$\|K\|^2 \le \sum_{m,n} k_{nm}^2 = \iint |k(x,y)|^2\,dx\,dy.$$
We show that $K$ is approximated by the sequence $(K_n)$ defined by
$$K_nf = \sum_m\sum_{r=1}^n k_{rm}\langle f, u_m\rangle\,u_r$$
of finite-rank operators. Indeed,
$$\|(K - K_n)f\|^2 = \sum_{r=n+1}^\infty\Bigl|\sum_m k_{rm}\langle f, u_m\rangle\Bigr|^2 \le \sum_m\sum_{r=n+1}^\infty k_{rm}^2\,\|f\|^2,$$
such that
$$\|K - K_n\|^2 \le \sum_m\sum_{r=n+1}^\infty k_{rm}^2 \longrightarrow 0$$
as $n \to \infty$, since the double series $\sum_{m,r} k_{rm}^2$ converges. Hence, $K$ is compact.
(c) By (a) and (b), $A$ is a negative, compact, self-adjoint operator. By the spectral theorem for compact self-adjoint operators, Theorem 13.33, there exists an NOS $(u_k)$ of eigenfunctions of $A$ corresponding to the eigenvalues $1/\lambda_k$. The NOS $(u_k)$ is complete since $0$ is not an eigenvalue of $A$.
Example 17.1 Dirichlet Conditions on the Square. Let $Q = (0,\pi)\times(0,\pi) \subseteq \mathbb{R}^2$. The Laplace operator with Dirichlet boundary conditions on $\partial Q$ has eigenfunctions
$$u_{mn}(x,y) = \frac{2}{\pi}\sin(mx)\sin(ny),$$
corresponding to the eigenvalues $\lambda_{mn} = -(m^2 + n^2)$. The eigenfunctions $\{u_{mn} \mid m,n \in \mathbb{N}\}$ form a CNOS in the Hilbert space $L^2(Q)$.
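A one-line numerical check of this example (my addition; $m = 2$, $n = 3$ and the sample point are arbitrary choices): a centered finite-difference Laplacian applied to $\sin(mx)\sin(ny)$ should return $-(m^2 + n^2)$ times the function.

```python
import math

# Check that u_mn(x,y) = sin(m x) sin(n y) satisfies Laplace(u) = -(m^2 + n^2) u
# at an interior point of Q; m = 2, n = 3 are arbitrary choices.
m, n = 2, 3
u = lambda x, y: math.sin(m * x) * math.sin(n * y)

x0, y0, h = 1.0, 1.3, 1e-4
lap = (u(x0 + h, y0) + u(x0 - h, y0) + u(x0, y0 + h) + u(x0, y0 - h)
       - 4 * u(x0, y0)) / h**2
print(lap / u(x0, y0))   # close to -(m^2 + n^2) = -13
```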


Example 17.2 Dirichlet Conditions on the ball $U_1(0)$ in $\mathbb{R}^2$. We consider the BEVP with Dirichlet boundary conditions on the ball:
$$\Delta u = \lambda u, \qquad u\,|_{S_1(0)} = 0.$$
In polar coordinates $u(x,y) = u(r,\varphi)$ this reads
$$\frac{1}{r}\frac{\partial}{\partial r}(r\,u_r) + \frac{1}{r^2}\,u_{\varphi\varphi} = \lambda u, \qquad 0 < r < 1,\ 0 \le \varphi < 2\pi.$$
Separation of variables. We try the ansatz $u(r,\varphi) = R(r)\Phi(\varphi)$. We have the boundary condition $R(1) = 0$, and $R(r)$ is bounded in a neighborhood of $r = 0$. Also, $\Phi$ is $2\pi$-periodic. Then $u_r = R'\Phi$ and
$$\frac{1}{r}\frac{\partial}{\partial r}(r\,u_r) = \frac{1}{r}(rR')'\,\Phi = \Bigl(\frac{R'}{r} + R''\Bigr)\Phi, \qquad u_{\varphi\varphi} = R\,\Phi''.$$
Hence, $\Delta u = \lambda u$ now reads
$$\Bigl(\frac{R'}{r} + R''\Bigr)\Phi + \frac{R\,\Phi''}{r^2} = \lambda R\,\Phi \iff \frac{\frac{R'}{r} + R''}{R} + \frac{1}{r^2}\,\frac{\Phi''}{\Phi} = \lambda \iff \frac{rR' + r^2R'' - \lambda r^2R}{R} = -\frac{\Phi''}{\Phi} = \mu.$$
In this way, we obtain the two one-dimensional problems
$$\Phi'' + \mu\Phi = 0, \qquad \Phi(0) = \Phi(2\pi);$$
$$r^2R'' + rR' + (-\lambda r^2 - \mu)R = 0, \qquad |R(0)| < \infty, \quad R(1) = 0. \tag{17.27}$$
The eigenvalues and eigenfunctions of the first problem are
$$\mu_k = k^2, \qquad \Phi_k(\varphi) = e^{ik\varphi}, \qquad k \in \mathbb{Z}.$$

For $\mu = k^2$, equation (17.27) is the Bessel ODE. Its solution bounded at $r = 0$ is given by the Bessel function $J_k(\sqrt{-\lambda}\,r)$. Recall from homework 21.2 that
$$J_k(x) = \sum_{n=0}^{\infty}\frac{(-1)^n\left(\frac{x}{2}\right)^{2n+k}}{n!\,(n+k)!}, \qquad k \in \mathbb{N}_0.$$
To determine the eigenvalues we use the boundary condition $R(1) = 0$ in (17.27), namely $J_k(\sqrt{-\lambda}) = 0$. Hence, $\sqrt{-\lambda} = \mu_{kj}$, where $\mu_{kj}$, $j = 1,2,\dots$, denote the positive zeros of $J_k$. We obtain
$$\lambda_{kj} = -\mu_{kj}^2, \qquad R_{kj}(r) = J_k(\mu_{kj}\,r), \qquad j = 1,2,\dots.$$
The solutions of the BEVP are
$$\lambda_{kj} = -\mu_{kj}^2, \qquad u_{kj}(x) = J_{|k|}(\mu_{|k|j}\,r)\,e^{ik\varphi}, \qquad k \in \mathbb{Z}, \quad j = 1,2,\dots.$$
Note that for fixed $k$ the Bessel functions $\{J_k(\mu_{kj}\,\cdot\,) \mid j \in \mathbb{N}\}$ and the system $\{e^{ik\varphi} \mid k \in \mathbb{Z}\}$ form a complete OS in $L^2((0,1), r\,dr)$ and in $L^2(0,2\pi)$, respectively. Hence, the OS $\{u_{kj} \mid k \in \mathbb{Z},\ j \in \mathbb{N}\}$ is a complete OS in $L^2(U_1(0))$. Thus, there are no further solutions to the given BEVP. For more details on Bessel functions, see [FK98, p. 383].
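As a numerical complement (not in the original notes), the first positive zero $\mu_{01}$ of $J_0$ can be located directly from the series above by bisection, giving the first Dirichlet eigenvalue $\lambda_{01} = -\mu_{01}^2$ of the unit disk.

```python
import math

def J(k, x, terms=60):
    """Bessel function J_k(x) via the power series above (k in N_0)."""
    return sum((-1)**i * (x / 2)**(2 * i + k)
               / (math.factorial(i) * math.factorial(i + k))
               for i in range(terms))

# Bisection for the first positive zero of J_0 (J_0(2) > 0 > J_0(3)).
lo, hi = 2.0, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    if J(0, lo) * J(0, mid) <= 0:
        hi = mid
    else:
        lo = mid
print(mid, -mid**2)   # mu_01 ~ 2.40483, lambda_01 ~ -5.78319
```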


17.4 Boundary Value Problems for the Laplace and the Poisson Equations
Throughout this section (if nothing is stated otherwise) we will assume that $\Omega$ is a bounded region in $\mathbb{R}^n$, $n \ge 2$. We suppose further that $\partial\Omega$ belongs to the class $C^2$, that is, the boundary consists of finitely many twice continuously differentiable hypersurfaces; $\Omega^- := \mathbb{R}^n\setminus\overline{\Omega}$ is assumed to be connected (i. e. it is a region, too). All functions are assumed to be real valued.

17.4.1 Formulation of Boundary Value Problems

(a) The Inner Dirichlet Problem:
Given $\varphi \in C(\partial\Omega)$ and $f \in C(\Omega)$, find $u \in C(\overline{\Omega})\cap C^2(\Omega)$ such that
$$\Delta u(x) = f(x),\ x \in \Omega, \qquad\text{and}\qquad u(y) = \varphi(y),\ y \in \partial\Omega.$$
(b) The Exterior Dirichlet Problem:
Given $\varphi \in C(\partial\Omega)$ and $f \in C(\Omega^-)$, find $u \in C(\overline{\Omega^-})\cap C^2(\Omega^-)$ such that
$$\Delta u(x) = f(x),\ x \in \Omega^-, \qquad u(y) = \varphi(y),\ y \in \partial\Omega, \qquad \lim_{|x|\to\infty} u(x) = 0.$$
(c) The Inner Neumann Problem:
Given $\varphi \in C(\partial\Omega)$ and $f \in C(\Omega)$, find $u \in C^1(\overline{\Omega})\cap C^2(\Omega)$ such that
$$\Delta u(x) = f(x),\ x \in \Omega, \qquad\text{and}\qquad \frac{\partial u}{\partial\vec n}(y) = \varphi(y),\ y \in \partial\Omega.$$
Here $\frac{\partial u}{\partial\vec n}(y)$ denotes the limit of the directional derivative
$$\frac{\partial u}{\partial\vec n}(y) = \lim_{t\to 0+0}\vec n(y)\cdot\operatorname{grad} u(y - t\,\vec n(y)),$$
and $\vec n(y)$ is the outer normal to $\partial\Omega$ at $y \in \partial\Omega$. That is, $x = y - t\,\vec n$ approaches $y$ in the direction of the normal vector $\vec n(y)$. We assume that this limit exists for all boundary points $y \in \partial\Omega$.

(d) The Exterior Neumann Problem:
Given $\varphi \in C(\partial\Omega)$ and $f \in C(\Omega^-)$, find $u \in C^1(\overline{\Omega^-})\cap C^2(\Omega^-)$ such that
$$\Delta u(x) = f(x),\ x \in \Omega^-, \qquad \frac{\partial u}{\partial\vec n_+}(y) = \varphi(y),\ y \in \partial\Omega, \qquad\text{and}\qquad \lim_{|x|\to\infty} u(x) = 0.$$
Here $\frac{\partial u}{\partial\vec n_+}(y)$ denotes the limit of the directional derivative
$$\frac{\partial u}{\partial\vec n_+}(y) = \lim_{t\to 0+0}\vec n(y)\cdot\operatorname{grad} u(y + t\,\vec n(y)),$$
and $\vec n(y)$ is the outer normal to $\partial\Omega$ at $y$; now $x = y + t\,\vec n$ approaches $y$ from outside. We assume that this limit exists for all boundary points $y$. In both Neumann problems one can also look for a function $u \in C^2(\Omega)\cap C(\overline{\Omega})$ or $u \in C^2(\Omega^-)\cap C(\overline{\Omega^-})$, respectively, provided the above limits exist and define continuous functions on the boundary.
These four problems are intimately connected with each other, and we will obtain solutions to
all of them simultaneously.

17.4.2 Basic Properties of Harmonic Functions

Recall that a function $u \in C^2(\Omega)$ is said to be harmonic if $\Delta u = 0$ in $\Omega$.
We say that an operator $L$ on a function space $V$ over $\mathbb{R}^n$ is invariant under an affine transformation $T$, $T(x) = A(x) + b$, where $A \in L(\mathbb{R}^n)$, $b \in \mathbb{R}^n$, if
$$L\,\tilde T = \tilde T\,L,$$
where $\tilde T\colon V \to V$ is given by $(\tilde Tf)(x) = f(T(x))$, $f \in V$. It follows that the Laplacian is invariant under translations ($T(x) = x + b$) and rotations $T$ (i. e. $T^\top T = TT^\top = I$): under the substitution $x \mapsto Ax + b$ the Laplacian transforms via the matrix $A^\top A$, which is the identity both for translations ($A = I$) and for rotations ($A^\top A = I$); hence the Laplacian is invariant.
In this section we assume that $\Omega \subseteq \mathbb{R}^n$ is a region where the Gauß divergence theorem is valid for all vector fields $f \in C^1(\Omega)\cap C(\overline{\Omega})$:
$$\int_\Omega \operatorname{div} f(x)\,dx = \int_{\partial\Omega} f(y)\cdot d\vec S(y),$$
where the dot denotes the inner product in $\mathbb{R}^n$. The term under the integral can be written as
$$\omega(y) = f(y)\cdot d\vec S(y) = (f_1(y),\dots,f_n(y))\cdot\bigl(dy_2\wedge\cdots\wedge dy_n,\ -dy_1\wedge dy_3\wedge\cdots\wedge dy_n,\ \dots,\ (-1)^{n-1}dy_1\wedge\cdots\wedge dy_{n-1}\bigr)$$
$$= \sum_{k=1}^n (-1)^{k-1} f_k(y)\,dy_1\wedge\cdots\wedge\widehat{dy_k}\wedge\cdots\wedge dy_n,$$


where the hat means omission of this factor. In this way $\omega(y)$ becomes a differential $(n-1)$-form. Using differentiation of forms, see Definition 11.7, we obtain
$$d\omega = \operatorname{div} f(y)\,dy_1\wedge dy_2\wedge\cdots\wedge dy_n.$$
This establishes the above generalized form of the Gauß divergence theorem. If $U\colon \partial\Omega \to \mathbb{R}$ is a continuous scalar function on $\partial\Omega$, one can define $U(y)\,d\vec S(y) := U(y)\,\vec n(y)\,dS(y)$, where $\vec n$ is the outer unit normal vector to the surface $\partial\Omega$.
Recall that we obtain Green's first formula inserting $f(x) = v(x)\nabla u(x)$, $u, v \in C^2(\overline{\Omega})$:
$$\int_\Omega v(x)\,\Delta u(x)\,dx + \int_\Omega \nabla u(x)\cdot\nabla v(x)\,dx = \int_{\partial\Omega} v(y)\,\frac{\partial u}{\partial\vec n}(y)\,dS(y).$$
Interchanging the roles of $u$ and $v$ and taking the difference, we obtain Green's second formula
$$\int_\Omega \bigl(v(x)\Delta u(x) - u(x)\Delta v(x)\bigr)\,dx = \int_{\partial\Omega}\Bigl(v(y)\frac{\partial u}{\partial\vec n}(y) - u(y)\frac{\partial v}{\partial\vec n}(y)\Bigr)\,dS(y). \tag{17.28}$$
Recall that
$$E_2(x) = \frac{1}{2\pi}\log\|x\\|, \qquad E_n(x) = \frac{-1}{(n-2)\omega_n}\,\|x\|^{-n+2}, \quad n \ge 3,$$
are the fundamental solutions of the Laplacian in $\mathbb{R}^n$; here $\omega_n$ denotes the surface area of the unit sphere in $\mathbb{R}^n$.
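A quick sanity check (added here, with an arbitrary test point): away from the origin the fundamental solution $E_3$ is harmonic, so a discrete Laplacian applied to it should be numerically zero.

```python
import math

# E_3(x) = -1/(4 pi |x|); its Laplacian vanishes away from the origin.
def E3(x, y, z):
    return -1.0 / (4 * math.pi * math.sqrt(x * x + y * y + z * z))

p, h = (0.7, -0.4, 1.1), 1e-3    # arbitrary point != 0
lap = (E3(p[0] + h, p[1], p[2]) + E3(p[0] - h, p[1], p[2])
       + E3(p[0], p[1] + h, p[2]) + E3(p[0], p[1] - h, p[2])
       + E3(p[0], p[1], p[2] + h) + E3(p[0], p[1], p[2] - h)
       - 6 * E3(*p)) / h**2
print(lap)   # ~ 0
```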

Theorem 17.17 (Green's representation formula) Let $u \in C^2(\overline{\Omega})$. Then for $x \in \Omega$ we have
$$u(x) = \int_\Omega E_n(x-y)\,\Delta u(y)\,dy + \int_{\partial\Omega}\Bigl(u(y)\,\frac{\partial E_n}{\partial\vec n_y}(x-y) - E_n(x-y)\,\frac{\partial u}{\partial\vec n}(y)\Bigr)\,dS(y). \tag{17.29}$$
Here $\frac{\partial}{\partial\vec n_y}$ denotes the derivative in the direction of the outer normal with respect to the variable $y$.
Note that the distributions $\{\Delta u\}$, $\frac{\partial u}{\partial\vec n}\,\delta_{\partial\Omega}$, and $\frac{\partial}{\partial\vec n}(u\,\delta_{\partial\Omega})$ have compact support such that the convolution products with $E_n$ exist.
Proof (idea). For sufficiently small $\varepsilon > 0$, $U_\varepsilon(x) \subseteq \Omega$, since $\Omega$ is open. We apply Green's second formula with $v(y) = E_n(x-y)$ and $\Omega\setminus\overline{U_\varepsilon(x)}$ in place of $\Omega$. Since $E_n(x-y)$ is harmonic with respect to the variable $y$ in $\Omega\setminus\{x\}$ (recall from Example 7.5, that $E_n(x)$ is harmonic in $\mathbb{R}^n\setminus\{0\}$), we obtain
$$\int_{\Omega\setminus U_\varepsilon(x)} E_n(x-y)\,\Delta u(y)\,dy = \int_{\partial\Omega}\Bigl(E_n(x-y)\frac{\partial u}{\partial\vec n}(y) - u(y)\frac{\partial E_n(x-y)}{\partial\vec n_y}\Bigr)\,dS(y)$$
$$\qquad\qquad + \int_{\partial U_\varepsilon(x)}\Bigl(E_n(x-y)\frac{\partial u}{\partial\vec n}(y) - u(y)\frac{\partial E_n(x-y)}{\partial\vec n_y}\Bigr)\,dS(y). \tag{17.30}$$


In the second integral $\vec n$ denotes the outer normal of $\Omega\setminus U_\varepsilon(x)$, hence the inner normal of $U_\varepsilon(x)$.
We wish to evaluate the limits of the individual integrals in this formula as $\varepsilon \to 0$. Consider the left-hand side of (17.30). Since $u \in C^2(\overline{\Omega})$, $\Delta u$ is bounded; since $E_n(x-y)$ is locally integrable, the lhs converges to
$$\int_\Omega E_n(x-y)\,\Delta u(y)\,dy.$$
On $\partial U_\varepsilon(x)$ we have $E_n(x-y) = -\kappa_n\varepsilon^{-n+2}$ with $\kappa_n = 1/((n-2)\omega_n)$. Thus, as $\varepsilon \to 0$,
$$\Bigl|\int_{\partial U_\varepsilon(x)} E_n(x-y)\frac{\partial u}{\partial\vec n}(y)\,dS\Bigr| \le \kappa_n\varepsilon^{-n+2}\sup_{U_\varepsilon(x)}\Bigl|\frac{\partial u}{\partial\vec n}\Bigr|\int_{S_\varepsilon(x)}dS = \kappa_n\,\omega_n\,\varepsilon\,\sup_{U_\varepsilon(x)}\Bigl|\frac{\partial u}{\partial\vec n}\Bigr| = C\varepsilon \to 0.$$
Furthermore, since $\vec n$ is the interior normal of the ball $U_\varepsilon(x)$, the same calculations as in the proof of Theorem 17.1 show that $\frac{\partial E_n(x-y)}{\partial\vec n_y} = \kappa_n\frac{d}{d\varepsilon}\bigl(\varepsilon^{-n+2}\bigr) = -\varepsilon^{-n+1}/\omega_n$. We obtain
$$-\int_{\partial U_\varepsilon(x)} u(y)\frac{\partial E_n(x-y)}{\partial\vec n_y}\,dS(y) = \underbrace{\frac{1}{\omega_n\varepsilon^{n-1}}\int_{S_\varepsilon(x)} u(y)\,dS(y)}_{\text{spherical mean}} \longrightarrow u(x).$$
In the last line we used that the integral is the mean value of $u$ over the sphere $S_\varepsilon(x)$, and $u$ is continuous at $x$. Rearranging (17.30) and letting $\varepsilon \to 0$ yields (17.29).

Remarks 17.4 (a) Green's representation formula is also true for functions $u \in C^2(\Omega)\cap C^1(\overline{\Omega})$. To prove this, consider Green's representation theorem on smaller regions $\Omega'$ such that $\overline{\Omega'} \subseteq \Omega$.
(b) Applying Green's representation formula to a test function $\varphi \in \mathcal{D}(\Omega)$, see Definition 16.1, with $\varphi(y) = \frac{\partial\varphi}{\partial\vec n}(y) = 0$, $y \in \partial\Omega$, we obtain
$$\varphi(x) = \int_\Omega E_n(x-y)\,\Delta\varphi(y)\,dy.$$
(c) We may now draw the following consequence from Green's representation formula: If one knows $\Delta u$, then $u$ is completely determined by its values and those of its normal derivative on $\partial\Omega$. In particular, a harmonic function on $\Omega$ can be reconstructed from its boundary data. One may ask conversely whether one can construct a harmonic function for arbitrary given values of $u$ and $\frac{\partial u}{\partial\vec n}$ on $\partial\Omega$. Ignoring regularity conditions, we will find out that this is not possible in general. Roughly speaking, only one of these data is sufficient to describe $u$ completely.
(d) In case of a harmonic function $u \in C^2(\Omega)\cap C^1(\overline{\Omega})$, $\Delta u = 0$, Green's representation formula reads ($n = 3$):
$$u(x) = \frac{1}{4\pi}\int_{\partial\Omega}\Bigl(\frac{1}{\|x-y\|}\,\frac{\partial u}{\partial\vec n}(y) - u(y)\,\frac{\partial}{\partial\vec n_y}\frac{1}{\|x-y\|}\Bigr)\,dS(y). \tag{17.31}$$
In particular, the surface potentials $V^{(0)}(x)$ and $V^{(1)}(x)$ can be differentiated arbitrarily often for $x \notin \partial\Omega$. Outside $\partial\Omega$, $V^{(0)}$ and $V^{(1)}$ are harmonic. It follows from (17.31) that any harmonic function is a $C^\infty$-function.

Spherical Means and Ball Means

First of all note that $u \in C^1(\overline{\Omega})$ and $u$ harmonic in $\Omega$ implies
$$\int_{\partial\Omega}\frac{\partial u}{\partial\vec n}(y)\,dS = 0. \tag{17.32}$$
Indeed, this follows from Green's first formula inserting $v = 1$ and $u$ harmonic, $\Delta u = 0$.
Proposition 17.18 (Mean Value Property) Suppose that $u$ is harmonic in $U_R(x_0)$ and continuous in $\overline{U_R(x_0)}$.
(a) Then $u(x_0)$ coincides with its spherical mean over the sphere $S_R(x_0)$:
$$u(x_0) = \frac{1}{\omega_nR^{n-1}}\int_{S_R(x_0)} u(y)\,dS(y) \quad\text{(spherical mean)}. \tag{17.33}$$
(b) Further,
$$u(x_0) = \frac{n}{\omega_nR^n}\int_{U_R(x_0)} u(x)\,dx \quad\text{(ball mean)}.$$

Proof. (a) For simplicity, we consider only the case $n = 3$ and $x_0 = 0$. Apply Green's representation formula (17.31) to the ball $\Omega = U_\rho(0)$ with $\rho < R$. Noting (17.32), from (17.31) it follows that
$$u(0) = \frac{1}{4\pi}\Bigl(\frac{1}{\rho}\int_{S_\rho(0)}\frac{\partial u}{\partial\vec n}\,dS - \int_{S_\rho(0)} u(y)\,\frac{\partial}{\partial\vec n_y}\frac{1}{\|y\|}\,dS\Bigr)$$
$$= -\frac{1}{4\pi}\int_{S_\rho(0)} u(y)\,\frac{\partial}{\partial\vec n_y}\frac{1}{\|y\|}\,dS = \frac{1}{4\pi}\int_{S_\rho(0)} u(y)\,\frac{1}{\rho^2}\,dS = \frac{1}{4\pi\rho^2}\int_{S_\rho(0)} u(y)\,dS.$$
Since $u$ is continuous on the closed ball of radius $R$, the formula remains valid as $\rho \to R$.
(b) Use $dx = dx_1\cdots dx_n = dr\,dS_r$ where $\|x\| = r$. Multiply both sides of (17.33) (with $R$ replaced by $r$) by $r^{n-1}\,dr$ and integrate with respect to $r$ from $0$ to $R$:
$$\int_0^R r^{n-1}u(x_0)\,dr = \int_0^R r^{n-1}\,\frac{1}{\omega_nr^{n-1}}\int_{S_r(x_0)} u(y)\,dS\,dr$$
$$\frac{R^n}{n}\,u(x_0) = \frac{1}{\omega_n}\int_{U_R(x_0)} u(x)\,dx.$$
The assertion follows. Note that $R^n\omega_n/n$ is exactly the $n$-dimensional volume of $U_R(x_0)$. The proof in case $n = 2$ is similar.
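The mean value property lends itself to a direct numerical test. The harmonic polynomial $u = \operatorname{Re}((x+iy)^3) = x^3 - 3xy^2$ and the center and radius below are arbitrary choices, not data from the text.

```python
import math

# u = x^3 - 3 x y^2 = Re((x+iy)^3) is harmonic in the plane; by (17.33) its
# average over any circle S_R(x0) must equal u(x0).
u = lambda x, y: x**3 - 3 * x * y**2

x0, y0, R, N = 0.5, -0.2, 1.5, 2000
mean = sum(u(x0 + R * math.cos(2 * math.pi * k / N),
             y0 + R * math.sin(2 * math.pi * k / N)) for k in range(N)) / N
print(mean, u(x0, y0))   # both values agree
```

The equal-weight average over the circle is the discrete spherical mean; for a trigonometric polynomial of low degree it is exact up to rounding.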


Proposition 17.19 (Minimum-Maximum Principle) Let $u$ be harmonic in $\Omega$ and continuous in $\overline{\Omega}$. Then
$$\max_{x\in\overline{\Omega}} u(x) = \max_{x\in\partial\Omega} u(x);$$
i. e. $u$ attains its maximum on the boundary $\partial\Omega$. The same is true for the minimum.
Proof. Suppose to the contrary that $M = u(x_0) = \max_{x\in\overline{\Omega}} u(x)$ is attained at an inner point $x_0 \in \Omega$ and $M > m = \max_{x\in\partial\Omega} u(x) = u(y_0)$, $y_0 \in \partial\Omega$.
(a) We show that $u(x) = M$ is constant in any ball $U_\varepsilon(x_0) \subseteq \Omega$ around $x_0$.
Suppose to the contrary that $u(x_1) < M$ for some $x_1 \in U_\varepsilon(x_0)$. By continuity of $u$, $u(x) < M$ for all $x$ in a neighborhood $U_\delta(x_1) \subseteq U_\varepsilon(x_0)$. In particular
$$M = u(x_0) = \frac{n}{\omega_n\varepsilon^n}\int_{U_\varepsilon(x_0)} u(x)\,dx < \frac{n}{\omega_n\varepsilon^n}\int_{U_\varepsilon(x_0)} M\,dy = M;$$
this is a contradiction; $u$ is constant in $U_\varepsilon(x_0)$.
(b) $u(x) = M$ is constant in $\Omega$. Let $x_1 \in \Omega$; we will show that $u(x_1) = M$. Since $\Omega$ is connected and bounded, there exists a path from $x_0$ to $x_1$ which can be covered by a chain of finitely many balls in $\Omega$. In all balls, starting with the ball around $x_0$ from (a), $u(x) = M$ is constant. Hence, $u$ is constant in $\Omega$. Since $u$ is continuous, $u$ is constant in $\overline{\Omega}$. This contradicts the assumption $M > m$; hence, the maximum is attained on the boundary $\partial\Omega$.
Passing from $u$ to $-u$, the statement about the minimum follows.
Remarks 17.5 (a) A stronger proposition holds with "local maximum" in place of "maximum".
(b) Another, stricter version of the maximum principle is:
Let $u \in C^2(\Omega)\cap C(\overline{\Omega})$ and $\Delta u \ge 0$ in $\Omega$. Then either $u$ is constant or
$$u(y) < \max_{x\in\partial\Omega} u(x) \qquad\text{for all } y \in \Omega.$$
Corollary 17.20 (Uniqueness) The inner and the outer Dirichlet problem each have at most one solution.
Proof. Suppose that $u_1$ and $u_2$ both are solutions of the Dirichlet problem, $\Delta u_1 = \Delta u_2 = f$. Put $u = u_1 - u_2$. Then $\Delta u(x) = 0$ for all $x \in \Omega$ and $u(y) = 0$ on the boundary, $y \in \partial\Omega$.
(a) Inner problem. By the maximum principle, $u(x) = 0$ for all $x \in \Omega$; that is, $u_1 = u_2$.
(b) Exterior problem. Suppose that $u \not\equiv 0$. Without loss of generality we may assume that $u(x_1) = \alpha > 0$ for some $x_1 \in \Omega^-$. By assumption, $|u(x)| \to 0$ as $|x| \to \infty$. Hence, there exists $r > 0$ such that $|u(x)| < \alpha/2$ for all $\|x\| \ge r$. Since $u$ is harmonic in $B_r(0)\setminus\overline{\Omega}$, the maximum principle yields
$$\alpha = u(x_1) \le \max_{x\in S_r(0)\cup\partial\Omega} u(x) \le \alpha/2,$$
a contradiction.


Corollary 17.21 (Stability) Suppose that $u_1$ and $u_2$ are solutions of the inner Dirichlet problem $\Delta u_1 = \Delta u_2 = f$ with boundary values $\varphi_1(y)$ and $\varphi_2(y)$ on $\partial\Omega$, respectively. Suppose further that
$$|\varphi_1(y) - \varphi_2(y)| \le \varepsilon, \qquad y \in \partial\Omega.$$
Then $|u_1(x) - u_2(x)| \le \varepsilon$ for all $x \in \Omega$.
A similar statement is true for the exterior Dirichlet problem.
Proof. Put $u = u_1 - u_2$. Then $\Delta u = 0$ and $|u(y)| \le \varepsilon$ for all $y \in \partial\Omega$. By the Maximum Principle, $|u(x)| \le \varepsilon$ for all $x \in \Omega$.

Lemma 17.22 Suppose that $u$ is a non-constant harmonic function on $\Omega$ and the maximum of $u(x)$ is attained at $y \in \partial\Omega$. Then $\frac{\partial u}{\partial\vec n}(y) > 0$.
For the proof see [Tri92, 3.4.2. Theorem, p. 174].
Proposition 17.23 (Uniqueness) (a) The exterior Neumann problem has at most one solution.
(b) A necessary condition for solvability of the inner Neumann problem is
$$\int_{\partial\Omega}\varphi\,dS = \int_\Omega f(x)\,dx.$$
Two solutions of the inner Neumann problem differ by a constant.
Proof. (a) Suppose that $u_1$ and $u_2$ are solutions of the exterior Neumann problem; then $u = u_1 - u_2$ satisfies $\Delta u = 0$ and $\frac{\partial u}{\partial\vec n}(y) = 0$. The above lemma shows that $u(x) = c$ is constant in $\Omega^-$. Since $\lim_{|x|\to\infty} u(x) = 0$, the constant $c$ is $0$; hence $u_1 = u_2$.
(b) Inner problem. The uniqueness up to a constant follows as in (a). The necessity of the formula follows from (17.28) with $v \equiv 1$, $\Delta u = f$, $\frac{\partial v}{\partial\vec n} = 0$.

Proposition 17.24 (Converse Mean Value Theorem) Suppose that $u \in C(\Omega)$ and that whenever $x_0 \in \Omega$ and $r > 0$ are such that $\overline{U_r(x_0)} \subseteq \Omega$, we have the mean value property
$$u(x_0) = \frac{1}{\omega_nr^{n-1}}\int_{S_r(x_0)} u(y)\,dS(y) = \frac{1}{\omega_n}\int_{S_1(0)} u(x_0 + ry)\,dS(y).$$
Then $u \in C^\infty(\Omega)$ and $u$ is harmonic in $\Omega$.
Proof. (a) We show that $u \in C^\infty(\Omega)$. The Mean Value Property ensures that the mollification $h_\varepsilon * u$ equals $u$ as long as $\overline{U_\varepsilon(x)} \subseteq \Omega$; that is, the mollification does not change $u$. By homework 49.4, $u \in C^\infty$ since $h_\varepsilon$ is. We prove $h_\varepsilon * u = u$. Here $g$ denotes the 1-dimensional bump function, see page 419.
$$u_\varepsilon(x) = (u * h_\varepsilon)(x) = \int_{\mathbb{R}^n} u(y)\,h_\varepsilon(x-y)\,dy = \int_{\mathbb{R}^n} u(x-y)\,h_\varepsilon(y)\,dy$$
$$= \varepsilon^{-n}\int_{U_\varepsilon(0)} u(x-y)\,h(y/\varepsilon)\,dy \overset{y=\varepsilon z}{=} \int_{U_1(0)} u(x-\varepsilon z)\,h(z)\,dz$$
$$= \int_0^1\int_{S_1(0)} u(x-\varepsilon ry)\,g(r)\,r^{n-1}\,dS(y)\,dr = \omega_n\,u(x)\int_0^1 g(r)\,r^{n-1}\,dr = u(x)\int_{\mathbb{R}^n} h(y)\,dy = u(x).$$
Second Part. Differentiating the mean value equation with respect to $r$ yields $\int_{U_r(x)}\Delta u(y)\,dy = 0$ for any ball in $\Omega$, since the left-hand side $u(x)$ does not depend on $r$:
$$0 = \frac{d}{dr}\int_{S_1(0)} u(x+ry)\,dS(y) = \int_{S_1(0)} y\cdot\nabla u(x+ry)\,dS(y)$$
$$= \int_{S_r(0)} (r^{-1}z)\cdot\nabla u(x+z)\,r^{1-n}\,dS(z) = r^{1-n}\int_{S_r(0)}\vec n(z)\cdot\nabla u(x+z)\,dS(z)$$
$$= r^{1-n}\int_{S_r(0)}\frac{\partial u}{\partial\vec n}(x+z)\,dS(z) = r^{1-n}\int_{S_r(x)}\frac{\partial u}{\partial\vec n}(y)\,dS(y) = r^{1-n}\int_{U_r(x)}\Delta u(y)\,dy.$$
In the last line we used Green's formula with $v = 1$. Thus $\int_{U_r(x)}\Delta u\,dx = 0$. Suppose to the contrary that $\Delta u(x_0) \ne 0$, say $\Delta u(x_0) > 0$. By continuity of $\Delta u(x)$, $\Delta u(x) > 0$ for $x \in U_\delta(x_0)$. Hence $\int_{U_\delta(x_0)}\Delta u(x)\,dx > 0$, which contradicts the above equation. We conclude that $u$ is harmonic in $\Omega$.

Remark 17.6 A regular distribution $u \in \mathcal{D}'(\Omega)$ is called harmonic if $\Delta u = 0$, that is,
$$\langle\Delta u, \varphi\rangle = \int_\Omega u(x)\,\Delta\varphi(x)\,dx = 0, \qquad \varphi \in \mathcal{D}(\Omega).$$
Weyl's Lemma: Any harmonic regular distribution is a harmonic function; in particular, $u \in C^\infty(\Omega)$.
Example 17.3 Solve
$$\Delta u = -2, \qquad (x,y) \in \Omega = (0,a)\times(-b/2,\,b/2), \qquad u\,|_{\partial\Omega} = 0.$$
Ansatz: $u = w + v$ with $\Delta w = -2$. For example, choose $w = -x^2 + ax$. Then $\Delta v = 0$ with boundary conditions
$$v(0,y) = v(a,y) = 0, \qquad v(x,-b/2) = v(x,b/2) = x^2 - ax.$$
Use separation of variables, $u(x,y) = X(x)Y(y)$, to solve the problem.
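A sketch of the suggested separation of variables, under the assumed values $a = b = 1$: with $w = -x^2 + x$, the harmonic part has the sine/cosh series $v(x,y) = \sum_n c_n\sin(n\pi x)\cosh(n\pi y)/\cosh(n\pi/2)$, where $c_n = -8/(n\pi)^3$ for odd $n$ are the sine coefficients of $x^2 - x$ on $(0,1)$. The specific numbers and the coefficient formula are my additions, not part of the notes.

```python
import math

# Assumed data: a = b = 1.  u = w + v with w = -x^2 + x (so Laplace(w) = -2)
# and v harmonic with v(x, +-1/2) = x^2 - x.
def v(x, y, terms=200):
    s = 0.0
    for n in range(1, terms, 2):                      # only odd n contribute
        c_n = -8.0 / (math.pi * n)**3                 # sine coeff. of x^2 - x
        s += (c_n * math.sin(n * math.pi * x)
              * math.cosh(n * math.pi * y) / math.cosh(n * math.pi / 2))
    return s

def u(x, y):
    return -x**2 + x + v(x, y)

# u should vanish on all four sides of the rectangle:
print(u(0.3, 0.5), u(0.3, -0.5), u(0.0, 0.2), u(1.0, 0.2))
```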

17.5 Appendix
17.5.1 Existence of Solutions to the Boundary Value Problems
(a) Green's Function
Let $u \in C^2(\Omega)\cap C^1(\overline{\Omega})$. Let us combine Green's representation formula and Green's 2nd formula with a harmonic function $v(x) = v_y(x)$, $x \in \Omega$, where $y \in \Omega$ is thought to be a parameter:
$$u(y) = \int_\Omega E_n(x-y)\,\Delta u(x)\,dx + \int_{\partial\Omega}\Bigl(u(x)\frac{\partial E_n}{\partial\vec n_x}(x-y) - E_n(x-y)\frac{\partial u}{\partial\vec n}(x)\Bigr)\,dS(x),$$
$$0 = \int_\Omega v_y(x)\,\Delta u(x)\,dx + \int_{\partial\Omega}\Bigl(u(x)\frac{\partial v_y}{\partial\vec n}(x) - v_y(x)\frac{\partial u}{\partial\vec n}(x)\Bigr)\,dS(x).$$
Adding up these two lines and denoting $G(x,y) = E_n(x-y) + v_y(x)$, we get
$$u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega}\Bigl(u(x)\frac{\partial G(x,y)}{\partial\vec n_x} - G(x,y)\frac{\partial u}{\partial\vec n}(x)\Bigr)\,dS(x).$$
Suppose now that $G(x,y)$ vanishes for all $x \in \partial\Omega$; then the last surface integral is $0$ and
$$u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega} u(x)\,\frac{\partial G(x,y)}{\partial\vec n_x}\,dS(x). \tag{17.34}$$
In the above formula, $u$ is completely determined by its boundary values and $\Delta u$ in $\Omega$. This motivates the following definition.
Definition 17.4 A function $G\colon \overline{\Omega}\times\Omega \to \mathbb{R}$ satisfying
(a) $G(x,y) = 0$ for all $x \in \partial\Omega$, $y \in \Omega$, $x \ne y$,
(b) $v_y(x) = G(x,y) - E_n(x-y)$ is harmonic in $x$ for all $y \in \Omega$,
is called a Green's function of $\Omega$. More precisely, $G(x,y)$ is a Green's function to the inner Dirichlet problem on $\Omega$.
Remarks 17.7 (a) The function $v_y(x)$ is in particular harmonic at $x = y$. Since $E_n(x-y)$ has a pole at $x = y$, $G(x,y)$ has a pole of the same order at $x = y$ such that $G(x,y) - E_n(x-y)$ has no singularity.
If such a function $G(x,y)$ exists, for all $u \in C^2(\overline{\Omega})$ we have (17.34).

(b) In particular, if in addition $u$ is harmonic in $\Omega$,
$$u(y) = \int_{\partial\Omega} u(x)\,\frac{\partial G(x,y)}{\partial\vec n_x}\,dS(x). \tag{17.35}$$
This is the so-called Poisson formula for $\Omega$. In general, it is difficult to find Green's function. For most regions it is even impossible to give $G(x,y)$ explicitly. However, if $\Omega$ has some kind of symmetry, one can use the reflection principle to construct $G(x,y)$ explicitly. Nevertheless, $G(x,y)$ exists for all well-behaved $\Omega$ (the boundary is a $C^2$-set and the Gauß divergence theorem holds for $\Omega$).

(c) The Reflection Principle

This is a method to calculate Green's function explicitly in case of domains $\Omega$ with the following property: using repeated reflections in spheres and in hyperplanes occurring as boundaries of $\Omega$ and its reflections, the whole $\mathbb{R}^n$ can be filled up without overlapping.
Example 17.4 Green's function on a ball $U_R(0)$. Here we use the reflection in the sphere $S_R(0)$: for $y \in \mathbb{R}^n$ put
$$y^* := \frac{R^2}{\|y\|^2}\,y, \quad y \ne 0; \qquad y^* := \infty, \quad y = 0.$$
Note that this map has the property $\langle y, y^*\rangle = R^2$ and $\|y\|^2y^* = R^2y$. Points on the sphere $S_R(0)$ are fixed under this map, $y^* = y$. Let $\tilde E_n\colon \mathbb{R}_+ \to \mathbb{R}$ denote the radial scalar function corresponding to $E_n$, with $E_n(x) = \tilde E_n(\|x\|)$, that is, $\tilde E_n(r) = -1/((n-2)\omega_nr^{n-2})$, $n \ge 3$. Then we put
$$G(x,y) = \begin{cases}\tilde E_n(\|x-y\|) - \tilde E_n\Bigl(\dfrac{\|y\|}{R}\,\|x-y^*\|\Bigr), & y \ne 0,\\[2mm] \tilde E_n(\|x\|) - \tilde E_n(R), & y = 0.\end{cases} \tag{17.36}$$

For $x \ne y$, $G(x,y)$ is harmonic in $x$, since for $\|y\| < R$ we have $\|y^*\| > R$ and therefore $x - y^* \ne 0$ in $U_R(0)$. The function $G(x,y)$ has only one singularity in $U_R(0)$, namely at $x = y$, and this is the same as that of $E_n(x-y)$. Therefore,
$$v_y(x) = G(x,y) - E_n(x-y) = \begin{cases}-\tilde E_n\Bigl(\dfrac{\|y\|}{R}\,\|x-y^*\|\Bigr), & y \ne 0,\\[2mm] -\tilde E_n(R), & y = 0,\end{cases}$$
is harmonic for all $x \in \Omega$. For $x \in \partial\Omega = S_R(0)$ we have for $y \ne 0$
$$G(x,y) = \tilde E_n\bigl((\|x\|^2 + \|y\|^2 - 2x\cdot y)^{\frac12}\bigr) - \tilde E_n\Bigl(\frac{\|y\|}{R}\bigl(\|x\|^2 + \|y^*\|^2 - 2x\cdot y^*\bigr)^{\frac12}\Bigr)$$
$$= \tilde E_n\bigl((R^2 + \|y\|^2 - 2x\cdot y)^{\frac12}\bigr) - \tilde E_n\Bigl(\Bigl(\|y\|^2 + \frac{\|y\|^2\|y^*\|^2}{R^2} - 2\,\frac{\|y\|^2}{R^2}\,x\cdot y^*\Bigr)^{\frac12}\Bigr)$$
$$= \tilde E_n\bigl((R^2 + \|y\|^2 - 2x\cdot y)^{\frac12}\bigr) - \tilde E_n\bigl((\|y\|^2 + R^2 - 2x\cdot y)^{\frac12}\bigr) = 0,$$
since $\|y\|^2\|y^*\|^2 = R^4$ and $\|y\|^2\,x\cdot y^* = R^2\,x\cdot y$. For $y = 0$ we have
$$G(x,0) = \tilde E_n(\|x\|) - \tilde E_n(R) = \tilde E_n(R) - \tilde E_n(R) = 0.$$
This proves that $G(x,y)$ is a Green's function for $U_R(0)$. In particular, the above calculation shows that $r = \|x-y\|$ and $r^* = \frac{\|y\|}{R}\|x-y^*\|$ are equal if $x \in \partial\Omega$.
One can show that the Green's function is symmetric, that is, $G(x,y) = G(y,x)$. This is a general property of Green's functions.
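The vanishing of $G$ on the sphere can be confirmed numerically; the radius $R = 2$ and the test points below are arbitrary values chosen for the check ($n = 3$, so $\tilde E_3(r) = -1/(4\pi r)$).

```python
import math

# Numerical check of (17.36) for n = 3: G(x,y) = 0 whenever |x| = R.
R = 2.0
Et = lambda r: -1.0 / (4 * math.pi * r)          # radial E_3

def G(x, y):
    ry = math.sqrt(sum(t * t for t in y))
    ystar = tuple(t * R**2 / ry**2 for t in y)   # reflection in S_R(0)
    return Et(math.dist(x, y)) - Et(ry / R * math.dist(x, ystar))

y = (0.3, -0.5, 0.7)                             # interior point
x = (4/3, 2/3, 4/3)                              # |x| = 2 = R
print(G(x, y))   # ~ 0
```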

To apply formula (17.34) we have to compute the normal derivative $\frac{\partial}{\partial\vec n_x}G(x,y)$. Note first that for any constant $z \in \mathbb{R}^n$ and $x \in S_R(0)$
$$\frac{\partial}{\partial\vec n_x}f(\|x-z\|) = \frac{x}{\|x\|}\cdot\nabla_x f(\|x-z\|) = f'(\|x-z\|)\,\frac{(x-z)\cdot x}{\|x\|\,\|x-z\|}.$$
Note further that for $\|x\| = R$ we have
$$r = \|x-y\| = \frac{\|y\|}{R}\,\|x-y^*\|, \tag{17.37}$$
$$(x-y)\cdot x - \frac{\|y\|^2}{R^2}\,(x-y^*)\cdot x = R^2 - \|y\|^2. \tag{17.38}$$
Hence, for $y \ne 0$, with $\tilde E_n'(r) = 1/(\omega_nr^{n-1})$,
$$\frac{\partial}{\partial\vec n_x}G(x,y) = \frac{1}{\omega_nr^{n-1}}\,\frac{(x-y)\cdot x}{R\,r} - \frac{\|y\|}{R}\,\frac{1}{\omega_nr^{n-1}}\,\frac{(x-y^*)\cdot x}{R\,\|x-y^*\|}$$
$$= \frac{1}{\omega_nr^nR}\Bigl((x-y)\cdot x - \frac{\|y\|^2}{R^2}\,(x-y^*)\cdot x\Bigr),$$
where (17.37) was used twice. By (17.38), the expression in the brackets is $R^2 - \|y\|^2$. Hence,
$$\frac{\partial}{\partial\vec n_x}G(x,y) = \frac{1}{\omega_nR}\,\frac{R^2 - \|y\|^2}{\|x-y\|^n}.$$
This formula holds true also in case $y = 0$. Inserting this into (17.34), for any harmonic function $u \in C^2(U_R(0))\cap C(\overline{U_R(0)})$ we have
$$u(y) = \frac{R^2 - \|y\|^2}{\omega_nR}\int_{S_R(0)}\frac{u(x)}{\|x-y\|^n}\,dS(x). \tag{17.39}$$
This is the so-called Poisson formula for the ball $U_R(0)$.
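Formula (17.39) specializes for $n = 2$ ($\omega_2 = 2\pi$) to the classical Poisson integral for the disk. The following sketch reconstructs an arbitrarily chosen harmonic function $u = x_1^2 - x_2^2$ at an assumed interior point from its boundary values.

```python
import math

# Poisson formula (17.39) on the unit disk (n = 2, omega_2 = 2*pi).
u = lambda x1, x2: x1**2 - x2**2        # harmonic test function (assumed)
R, N = 1.0, 4000
y = (0.3, 0.2)                          # interior point, |y| < R

total = 0.0
for k in range(N):
    t = 2 * math.pi * k / N
    x = (R * math.cos(t), R * math.sin(t))
    total += u(*x) / ((x[0] - y[0])**2 + (x[1] - y[1])**2) * (2 * math.pi * R / N)
uy = (R**2 - y[0]**2 - y[1]**2) / (2 * math.pi * R) * total
print(uy, u(*y))   # both ~ 0.05
```

The equal-weight sum over the circle is spectrally accurate for this smooth periodic integrand, so the boundary integral reproduces the interior value essentially exactly.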


Proposition 17.25 Let $n \ge 2$. Consider the inner Dirichlet problem in $\Omega = U_R(0)$ with $f = 0$. The function
$$u(y) = \begin{cases}\dfrac{R^2 - \|y\|^2}{\omega_nR}\displaystyle\int_{S_R(0)}\frac{\varphi(x)}{\|x-y\|^n}\,dS(x), & \|y\| < R,\\[2mm] \varphi(y), & \|y\| = R,\end{cases}$$
is continuous on the closed ball $\overline{U_R(0)}$ and harmonic in $U_R(0)$.
In case $n = 2$ the function $u(y)$ can be written in the following form:
$$u(y) = \operatorname{Re}\Bigl(\frac{1}{2\pi i}\int_{S_R(0)}\varphi(z)\,\frac{z+y}{z-y}\,\frac{dz}{z}\Bigr), \qquad y \in U_R(0) \subseteq \mathbb{C}.$$
For the proof of the general statement with $n \ge 2$, see [Jos02, Theorem 1.1.2] or [Joh82, p. 107]. We show the last statement for $n = 2$. Since $y\bar z - \bar yz$ is purely imaginary,
$$\operatorname{Re}\frac{z+y}{z-y} = \operatorname{Re}\frac{(z+y)(\bar z-\bar y)}{(z-y)(\bar z-\bar y)} = \operatorname{Re}\frac{|z|^2 - |y|^2 + y\bar z - \bar yz}{|z-y|^2} = \frac{R^2 - |y|^2}{|z-y|^2}.$$
Using the parametrization $z = Re^{it}$, $dt = \frac{dz}{iz}$, we obtain
$$\operatorname{Re}\Bigl(\frac{1}{2\pi}\int_{S_R(0)}\varphi(z)\,\frac{z+y}{z-y}\,\frac{dz}{iz}\Bigr) = \frac{1}{2\pi}\int_0^{2\pi}\frac{R^2 - |y|^2}{|z-y|^2}\,\varphi(z)\,dt = \frac{R^2 - |y|^2}{2\pi R}\int_{S_R(0)}\frac{\varphi(x)}{|x-y|^2}\,|dx|.$$
In the last line we have a (real) line integral of the first kind, using $x = (x_1,x_2) = x_1 + ix_2 = z$, $x \in S_R(0)$, and $|dx| = R\,dt$ on the circle.
Other Examples. (a) $n = 3$. The half-space $\Omega = \{(x_1,x_2,x_3) \in \mathbb{R}^3 \mid x_3 > 0\}$. We use the ordinary reflection with respect to the plane $x_3 = 0$, which is given by $y = (y_1,y_2,y_3) \mapsto y^* = (y_1,y_2,-y_3)$. Then the Green's function of $\Omega$ is
$$G(x,y) = E_3(x-y) - E_3(x-y^*) = \frac{1}{4\pi}\Bigl(\frac{1}{\|x-y^*\|} - \frac{1}{\|x-y\|}\Bigr)$$
(see homework 57.1).
(b) $n = 3$. The half ball $\Omega = \{(x_1,x_2,x_3) \in \mathbb{R}^3 \mid \|x\| < R,\ x_3 > 0\}$. We use both reflections, $y \mapsto \bar y = (y_1,y_2,-y_3)$ in the plane and $y \mapsto y^* = R^2y/\|y\|^2$ in the sphere $S_R(0)$. Then
$$G(x,y) = E_3(x-y) - \frac{R}{\|y\|}E_3(x-y^*) - E_3(x-\bar y) + \frac{R}{\|y\|}E_3(x-\bar y^{\,*})$$
is the Green's function of $\Omega$.
(c) $n = 3$, $\Omega = \{(x_1,x_2,x_3) \in \mathbb{R}^3 \mid x_2 > 0,\ x_3 > 0\}$. We introduce the reflections $y \mapsto y' = (y_1,-y_2,y_3)$ and $y \mapsto y'' = (y_1,y_2,-y_3)$. Then the Green's function of $\Omega$ is
$$G(x,y) = E_3(x-y) - E_3(x-y') - E_3(x-y'') + E_3(x-(y')'').$$
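For the half-space case the boundary condition $G = 0$ on the plane $x_3 = 0$ is immediate, since such an $x$ is equidistant from $y$ and its mirror image. A two-line check with arbitrary test points:

```python
import math

# Half-space Green's function: G(x,y) = E3(x-y) - E3(x-y*), y* the mirror image.
E3 = lambda d: -1.0 / (4 * math.pi * d)

def G(x, y):
    ystar = (y[0], y[1], -y[2])
    return E3(math.dist(x, y)) - E3(math.dist(x, ystar))

x = (0.4, -1.2, 0.0)      # boundary point (x3 = 0)
y = (1.0, 0.5, 2.0)       # interior point (y3 > 0)
print(G(x, y))   # exactly 0 up to rounding
```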


Consider the Neumann problem and the ansatz analogous to Green's function in case of the Dirichlet problem:
$$u(y) = \int_\Omega H(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega}\Bigl(u(x)\frac{\partial H(x,y)}{\partial\vec n_x} - H(x,y)\frac{\partial u}{\partial\vec n}(x)\Bigr)\,dS(x). \tag{17.40}$$
We want to choose a Green's function of the second kind $H(x,y)$ in such a way that only the last surface integral remains present.
Inserting $u = 1$ in the above formula, we have
$$1 = \int_{\partial\Omega}\frac{\partial H(x,y)}{\partial\vec n_x}\,dS(x).$$
Imposing $\frac{\partial H(x,y)}{\partial\vec n_x} = \text{const.}$ on $\partial\Omega$, this constant must be $1/\operatorname{vol}(\partial\Omega)$. Note that one defines a Green's function to the Neumann problem replacing the condition $G(x,y) = 0$ on $\partial\Omega$ by the condition $\frac{\partial}{\partial\vec n_x}H(x,y) = \text{const.}$
The Green's function of second kind (Neumann problem) to the ball of radius $R$ in $\mathbb{R}^3$, $U_R(0)$, is
$$H(x,y) = -\frac{1}{4\pi}\Bigl(\frac{1}{\|x-y\|} + \frac{R}{\|y\|\,\|x-y^*\|} + \frac{1}{R}\log\frac{2R^2}{R^2 - x\cdot y + \|y\|\,\|x-y^*\|}\Bigr).$$

(c) Existence Theorems

The aim of these considerations is to sketch the method of proving existence of solutions of the four BVPs.
Definition. Suppose that $\Omega \subseteq \mathbb{R}^n$, $n \ge 3$, and let $\mu(y)$ be a continuous function on the boundary $\partial\Omega$, that is, $\mu \in C(\partial\Omega)$. We call
$$w(x) = \int_{\partial\Omega}\mu(y)\,E_n(x-y)\,dS(y), \qquad x \in \mathbb{R}^n, \tag{17.41}$$
a single-layer potential and
$$v(x) = \int_{\partial\Omega}\mu(y)\,\frac{\partial E_n}{\partial\vec n_y}(x-y)\,dS(y), \qquad x \in \mathbb{R}^n, \tag{17.42}$$
a double-layer potential.
Remarks 17.8 (a) For $x \notin \partial\Omega$ the integrals (17.41) and (17.42) exist.
(b) The single-layer potential $w(x)$ is continuous on $\mathbb{R}^n$. The double-layer potential jumps at $y_0 \in \partial\Omega$ by $\mu(y_0)$ as $x$ approaches $y_0$, see (17.43) below.
Theorem 17.26 Let $\Omega$ be a connected bounded region in $\mathbb{R}^n$ of the class $C^2$ and let $\Omega^- = \mathbb{R}^n\setminus\overline{\Omega}$ also be connected.
Then the interior Dirichlet problem to the Laplace equation has a unique solution. It can be represented in form of a double-layer potential. The exterior Neumann problem likewise has a unique solution which can be represented in form of a single-layer potential.


Theorem 17.27 Under the same assumptions as in the previous theorem, the inner Neumann problem to the Laplace equation has a solution if and only if $\int_{\partial\Omega}\varphi(y)\,dS(y) = 0$. If this condition is satisfied, the solution is unique up to a constant.
The exterior Dirichlet problem has a unique solution.
Remark. Let $\mu_{ID}$ denote the continuous function which produces the solution $v(x)$ of the interior Dirichlet problem, i. e. $v(x) = \int_{\partial\Omega}\mu_{ID}(y)\,K(x,y)\,dS(y)$, where $K(x,y) = \frac{\partial E_n}{\partial\vec n_y}(x-y)$. Because of the jump relation for $v(x)$ at $x_0 \in \partial\Omega$,
$$\lim_{x\to x_0,\,x\in\Omega} v(x) - \frac12\mu(x_0) = v(x_0) = \lim_{x\to x_0,\,x\in\Omega^-} v(x) + \frac12\mu(x_0), \tag{17.43}$$
$\mu_{ID}$ satisfies the integral equation
$$\varphi(x) = \frac12\mu_{ID}(x) + \int_{\partial\Omega}\mu_{ID}(y)\,K(x,y)\,dS(y), \qquad x \in \partial\Omega.$$
The above equation can be written as $\varphi = (A + \frac12I)\mu_{ID}$, where $A$ is the above integral operator in $L^2(\partial\Omega)$. One can prove the following facts: $A$ is compact, $A + \frac12I$ is injective and surjective, and $\varphi$ continuous implies $\mu_{ID}$ continuous. For details, see [Tri92, 3.4].
Application to the Poisson Equation
Consider the inner Dirichlet problem $\Delta u = f$ in $\Omega$, and $u = \varphi$ on $\partial\Omega$. We suppose that $f \in C(\overline{\Omega})\cap C^1(\Omega)$. We already know that
$$w(x) = (E_n * f)(x) = \frac{-1}{(n-2)\omega_n}\int_{\mathbb{R}^n}\frac{f(y)}{\|x-y\|^{n-2}}\,dy$$
is a distributive solution of the Poisson equation, $\Delta w = f$. By the assumptions on $f$, $w \in C^2(\Omega)$ and therefore is a classical solution. To solve the problem we try the ansatz $u = w + v$. Then $\Delta u = \Delta w + \Delta v = f + \Delta v$. Hence, $\Delta u = f$ if and only if $\Delta v = 0$. Thus, the inner Dirichlet problem for the Poisson equation reduces to the inner Dirichlet problem for the Laplace equation $\Delta v = 0$ with boundary values
$$v(y) = u(y) - w(y) = \varphi(y) - w(y) =: \psi(y), \qquad y \in \partial\Omega.$$
Since $\varphi$ and $w$ are continuous on $\partial\Omega$, so is $\psi$.

17.5.2 Extremal Properties of Harmonic Functions and the Dirichlet


Principle
(a) The Dirichlet Principle
Consider the inner Dirichlet problem to the Poisson equation with given data f C() and
C().
Put
C1 () := {v C1 () | v = on }.

17.5 Appendix

491

On this space define the Dirichlet integral by

$$E(v) = \frac12 \int_\Omega \|\nabla v\|^2\, dx + \int_\Omega f v\, dx, \qquad v \in C^1_\varphi(\overline\Omega). \qquad (17.44)$$

This integral is also called the energy integral. The Dirichlet principle says that among all functions $v$ with given boundary values $\varphi$, the function $u$ with $\Delta u = f$ minimizes the energy integral $E$.
Proposition 17.28 A function $u \in C^1_\varphi(\overline\Omega) \cap C^2(\Omega)$ is a solution of the inner Dirichlet problem if and only if the energy integral $E$ attains its minimum on $C^1_\varphi(\overline\Omega)$ at $u$.
Proof. (a) Suppose first that $u \in C^1_\varphi(\overline\Omega) \cap C^2(\Omega)$ is a solution of the inner Dirichlet problem, $\Delta u = f$. For $v \in C^1_\varphi(\overline\Omega)$ let $w = v - u \in C^1_0(\overline\Omega)$. Then

$$E(v) = E(u + w) = \frac12 \int_\Omega \nabla(u+w)\cdot\nabla(u+w)\, dx + \int_\Omega (u+w)\, f\, dx$$
$$= \frac12 \int_\Omega \|\nabla u\|^2\, dx + \frac12 \int_\Omega \|\nabla w\|^2\, dx + \int_\Omega \nabla u \cdot \nabla w\, dx + \int_\Omega (u+w)\, f\, dx.$$

Since $u$ and $v$ satisfy the same boundary conditions, $w|_{\partial\Omega} = 0$. Further, $\Delta u = f$. By Green's first formula,

$$0 = \int_{\partial\Omega} \frac{\partial u}{\partial n}\, w\, dS = \int_\Omega \nabla u \cdot \nabla w\, dx + \int_\Omega (\Delta u)\, w\, dx = \int_\Omega \nabla u \cdot \nabla w\, dx + \int_\Omega f w\, dx.$$

Inserting this into the above equation, we have

$$E(v) = \frac12 \int_\Omega \|\nabla u\|^2\, dx + \frac12 \int_\Omega \|\nabla w\|^2\, dx - \int_\Omega f w\, dx + \int_\Omega (u+w)\, f\, dx$$
$$= E(u) + \frac12 \int_\Omega \|\nabla w\|^2\, dx \ \ge\ E(u).$$

This shows that $E(u)$ is minimal.
(b) Conversely, let $u \in C^1_\varphi(\overline\Omega) \cap C^2(\Omega)$ minimize the energy integral. In particular, for any test function $\psi \in D(\Omega)$, which has zero boundary values, the function

$$g(t) = E(u + t\psi) = E(u) + t \int_\Omega \left( \nabla u \cdot \nabla\psi + f \psi \right) dx + \frac12\, t^2 \int_\Omega \|\nabla\psi\|^2\, dx$$

has a local minimum at $t = 0$. Hence, $g'(0) = 0$, which is, again by Green's first formula and $\psi|_{\partial\Omega} = 0$, equivalent to

$$0 = \int_\Omega \left( \nabla u \cdot \nabla\psi + f \psi \right) dx = \int_\Omega \left( -\Delta u + f \right) \psi\, dx.$$

By the fundamental lemma of the calculus of variations, $\Delta u = f$ almost everywhere on $\Omega$. Since both $\Delta u$ and $f$ are continuous, this equation holds pointwise for all $x \in \Omega$.
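The Dirichlet principle is easy to observe numerically in one dimension. A sketch with hypothetical data (not from the text): for $f \equiv 1$ on $(0,1)$ with zero boundary values, the solution of $u'' = f$ is $u(x) = (x^2 - x)/2$, and $E(u)$ should not exceed $E(u + w)$ for any perturbation $w$ vanishing at the boundary:

```python
import numpy as np

# Energy integral E(v) = 1/2 * int |v'|^2 dx + int f v dx on (0,1), f = 1,
# approximated with central differences and the trapezoidal rule.
x = np.linspace(0.0, 1.0, 2001)
f = np.ones_like(x)

def trapz(g):                                   # trapezoidal rule for int g dx
    return float(np.sum((g[1:] + g[:-1]) / 2 * np.diff(x)))

def energy(v):
    dv = np.gradient(v, x)                      # numerical derivative v'
    return trapz(0.5 * dv**2 + f * v)

u = (x**2 - x) / 2.0                            # solves u'' = f, u(0) = u(1) = 0
E_u = energy(u)

# Perturbations w with w(0) = w(1) = 0 only increase the energy, since
# E(u + w) = E(u) + 1/2 * int |w'|^2 dx >= E(u).
for eps in (0.5, 0.1, -0.3):
    w = eps * np.sin(np.pi * x)
    assert energy(u + w) >= E_u
print(E_u)                                      # close to the exact value -1/24
```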

17 PDE II The Equations of Mathematical Physics
(b) Hilbert Space Methods

We want to give another reformulation of the Dirichlet problem. Consider the problem

$$-\Delta u = f, \qquad u|_{\partial\Omega} = 0.$$

On $C^1(\overline\Omega)$ define a bilinear map

$$\langle u, v\rangle_E = \int_\Omega \nabla u \cdot \nabla v\, dx.$$

$C^1(\overline\Omega)$ is not yet an inner product space since for any non-vanishing constant function $u$, $\langle u, u\rangle_E = 0$. Denote by $C^1_0(\overline\Omega)$ the subspace of functions in $C^1(\overline\Omega)$ vanishing on the boundary $\partial\Omega$. Now $\langle\cdot,\cdot\rangle_E$ is an inner product on $C^1_0(\overline\Omega)$; the positive definiteness is a consequence of the Poincaré inequality below. Its corresponding norm is $\|u\|_E^2 = \int_\Omega \|\nabla u\|^2\, dx$. Let $u$ be a solution of the above Dirichlet problem. Then for any $v \in C^1_0(\overline\Omega)$, by Green's first formula,

$$\langle v, u\rangle_E = \int_\Omega \nabla v \cdot \nabla u\, dx = -\int_\Omega v\, \Delta u\, dx = \int_\Omega v f\, dx = \langle v, f\rangle_{L^2}.$$

This suggests that $u$ can be found by representing the known linear functional in $v$,

$$F(v) = \int_\Omega v f\, dx,$$

as an inner product $\langle v, u\rangle_E$. To make use of Riesz's representation theorem, Theorem 13.8, we have to complete $C^1_0(\overline\Omega)$ into a Hilbert space $W$ with respect to the energy norm $\|\cdot\|_E$ and to prove that the above linear functional $F$ is bounded with respect to the energy norm. This is a consequence of the next lemma. We make the same assumptions on $\Omega$ as in the beginning of Section 17.4.
Lemma 17.29 (Poincaré inequality) Let $\Omega \subset \mathbb R^n$ be bounded. Then there exists $C > 0$ such that for all $u \in C^1_0(\overline\Omega)$

$$\|u\|_{L^2(\Omega)} \le C\, \|u\|_E.$$

Proof. Let $\Omega$ be contained in the cube $Q = \{x \in \mathbb R^n \mid |x_i| \le a,\ i = 1, \dots, n\}$. We extend $u$ by zero outside $\Omega$. For any $x = (x_1, \dots, x_n)$, by the Fundamental Theorem of Calculus,

$$u(x)^2 = \left( \int_{-a}^{x_1} u_{x_1}(y_1, x_2, \dots, x_n)\, dy_1 \right)^2 = \left( \int_{-a}^{x_1} 1 \cdot u_{x_1}(y_1, x_2, \dots, x_n)\, dy_1 \right)^2$$
$$\underset{\text{CSI}}{\le} \int_{-a}^{x_1} dy_1 \int_{-a}^{x_1} u_{x_1}^2\, dy_1 = (x_1 + a) \int_{-a}^{x_1} u_{x_1}^2\, dy_1 \le 2a \int_{-a}^{a} u_{x_1}^2\, dy_1.$$

Since the last integral does not depend on $x_1$, integration with respect to $x_1$ gives

$$\int_{-a}^{a} u(x)^2\, dx_1 \le 4a^2 \int_{-a}^{a} u_{x_1}^2\, dy_1.$$

Integrating over $x_2, \dots, x_n$ from $-a$ to $a$ we find

$$\int_Q u^2\, dx \le 4a^2 \int_Q u_{x_1}^2\, dy.$$

The same inequality holds for $x_i$, $i = 2, \dots, n$, in place of $x_1$, so that

$$\|u\|_{L^2}^2 = \int u^2\, dx \le \frac{4a^2}{n} \int \|\nabla u\|^2\, dx = C^2\, \|u\|_E^2,$$

where $C = 2a/\sqrt n$.
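The constant $C = 2a/\sqrt n$ can be checked numerically. A small sketch with hypothetical sample functions ($n = 1$, $\Omega = (-a, a)$): for $u$ vanishing at $\pm a$ the lemma guarantees $\|u\|_{L^2} \le 2a\, \|u'\|_{L^2}$.

```python
import numpy as np

# Numerical check of the Poincare inequality in one dimension:
# for u in C^1_0((-a, a)) it guarantees ||u||_{L^2} <= 2a * ||u'||_{L^2}.
a = 1.5
x = np.linspace(-a, a, 4001)

def l2_norm(g):           # trapezoidal-rule approximation of the L^2 norm
    return float(np.sqrt(np.sum((g[1:]**2 + g[:-1]**2) / 2 * np.diff(x))))

# A few sample functions vanishing at the boundary x = +/- a.
samples = [
    np.cos(np.pi * x / (2 * a)),      # here ||u|| / ||u'|| = 2a/pi exactly
    a**2 - x**2,
    np.sin(np.pi * x / a) * (a - np.abs(x)),
]
for u in samples:
    du = np.gradient(u, x)            # numerical derivative u'
    assert l2_norm(u) <= 2 * a * l2_norm(du)
print("Poincare inequality holds for all samples")
```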
The Poincaré inequality is sometimes called the Poincaré–Friedrichs inequality. It remains true for functions $u$ in the completion $W$. Let us discuss the elements of $W$ in more detail. By definition, $f \in W$ if there is a Cauchy sequence $(f_n)$ in $C^1_0(\overline\Omega)$ such that $(f_n)$ converges to $f$ in the energy norm. By the Poincaré inequality, $(f_n)$ is also an $L^2$-Cauchy sequence. Since $L^2(\Omega)$ is complete, $(f_n)$ has an $L^2$-limit $f$. This shows $W \subset L^2(\Omega)$. For simplicity, let $\Omega \subset \mathbb R^1$. By definition of the energy norm, $\int_\Omega |(f_n - f_m)'|^2\, dx \to 0$ as $m, n \to \infty$; that is, $(f_n')$ is an $L^2$-Cauchy sequence, too. Hence, $(f_n')$ has also some $L^2$-limit, say $g \in L^2(\Omega)$. So far,

$$\|f_n - f\|_{L^2} \to 0, \qquad \|f_n' - g\|_{L^2} \to 0. \qquad (17.45)$$

We will show that the above limits imply $f' = g$ in $D'(\Omega)$. Indeed, by (17.45) and the Cauchy–Schwarz inequality, for all $\psi \in D(\Omega)$,

$$\left| \int (f_n - f)\, \psi'\, dx \right| \le \left( \int |f_n - f|^2\, dx \right)^{\frac12} \left( \int |\psi'|^2\, dx \right)^{\frac12} \to 0,$$
$$\left| \int (f_n' - g)\, \psi\, dx \right| \le \left( \int |f_n' - g|^2\, dx \right)^{\frac12} \left( \int |\psi|^2\, dx \right)^{\frac12} \to 0.$$

Hence, using integration by parts for $f_n \in C^1_0(\overline\Omega)$,

$$-\int f\, \psi'\, dx = -\lim_{n\to\infty} \int f_n\, \psi'\, dx = \lim_{n\to\infty} \int f_n'\, \psi\, dx = \int g\, \psi\, dx.$$

This shows $f' = g$ in $D'(\Omega)$. One says that the elements of $W$ possess weak derivatives, that is, their distributive derivatives are $L^2$-functions (and hence regular distributions).
Also, the inner product $\langle\cdot,\cdot\rangle_E$ is positive definite since the $L^2$-inner product is. It turns out that $W$ is a separable Hilbert space. $W$ is the so-called Sobolev space $W^{1,2}_0(\Omega)$, sometimes also denoted by $H^1_0(\Omega)$. The upper indices $1$ and $2$ in $W^{1,2}_0(\Omega)$ refer to the highest order of partial derivatives ($|\alpha| = 1$) and the $L^p$-space ($p = 2$) in the definition of $W$, respectively. The lower index $0$ refers to the so-called generalized boundary values $0$. For further reading on Sobolev spaces, see [Fol95, Chapter 6].
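The defining identity of the weak derivative, $\int f \psi'\, dx = -\int g \psi\, dx$ for all test functions $\psi$, can be checked numerically for a function with a kink. A sketch with a hypothetical example: the hat function $f(x) = 1 - |x|$ on $(-1, 1)$ lies in $H^1_0$ with weak derivative $g = -\operatorname{sign}(x)$.

```python
import numpy as np

# Weak-derivative identity  int f psi' dx = -int g psi dx  for the hat
# function f(x) = 1 - |x| on (-1, 1), whose weak derivative is -sign(x).
x = np.linspace(-1.0, 1.0, 20001)
f = 1.0 - np.abs(x)
g = -np.sign(x)

# Test function psi in D((-1, 1)); both values are 1/3 for this choice.
psi = x * (1.0 - x**2)**2
dpsi = np.gradient(psi, x)

def trapz(h):                               # trapezoidal rule for int h dx
    return float(np.sum((h[1:] + h[:-1]) / 2 * np.diff(x)))

lhs = trapz(f * dpsi)                       # int f psi' dx
rhs = -trapz(g * psi)                       # -int g psi dx
print(lhs, rhs)                             # both approximately 1/3
assert abs(lhs - rhs) < 1e-4
```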
Corollary 17.30 $F(v) = \langle v, f\rangle_{L^2} = \int_\Omega f v\, dx$ defines a bounded linear functional on $W$.
Proof. By the Cauchy–Schwarz and Poincaré inequalities,

$$|F(v)| \le \int_\Omega |f v|\, dx \le \|f\|_{L^2}\, \|v\|_{L^2} \le C\, \|f\|_{L^2}\, \|v\|_E.$$

Hence, $F$ is bounded with $\|F\| \le C\, \|f\|_{L^2}$.


Corollary 17.31 Let $f \in C(\overline\Omega)$. Then there exists a unique $u \in W$ such that

$$\langle v, u\rangle_E = \langle v, f\rangle_{L^2}, \qquad v \in W.$$

This $u$ solves

$$-\Delta u = f \quad \text{in } D'(\Omega).$$

The first statement is a consequence of Riesz's representation theorem; note that $F$ is a bounded linear functional on the Hilbert space $W$. The last statement follows from $D(\Omega) \subset C^1_0(\overline\Omega)$ and $\langle \psi, u\rangle_E = \langle \nabla\psi, \nabla u\rangle = -\langle \Delta u, \psi\rangle = \langle f, \psi\rangle_{L^2}$ for $\psi \in D(\Omega)$. This is the so-called modified Dirichlet problem. It remains an open task to identify the solution $u \in W$ with an ordinary function $u \in C^2(\Omega)$.

17.5.3 Numerical Methods

(a) Difference Methods
Since most ODE and PDE are not solvable in closed form, there are many methods to find approximate solutions to a given equation or a given problem. A general principle is discretization. One replaces the derivative $u'(x)$ by one of its difference quotients

$$\partial^+ u(x) = \frac{u(x+h) - u(x)}{h}, \qquad \partial^- u(x) = \frac{u(x) - u(x-h)}{h},$$

where $h$ is called the step size. One can also use a symmetric difference quotient $\frac{u(x+h) - u(x-h)}{2h}$. The five-point formula for the Laplacian in $\mathbb R^2$ is then given by

$$\Delta_h u(x,y) := (\partial_x^- \partial_x^+ + \partial_y^- \partial_y^+)\, u(x,y) = \frac{u(x-h, y) + u(x+h, y) + u(x, y-h) + u(x, y+h) - 4u(x,y)}{h^2}.$$

Besides the equation, the domain $\Omega$ as well as its boundary $\partial\Omega$ undergo a discretization: if $\Omega = (0,1) \times (0,1)$ then

$$\Omega_h = \{(nh, mh) \in \Omega \mid n, m \in \mathbb N\}, \qquad \partial\Omega_h = \{(nh, mh) \in \partial\Omega \mid n, m \in \mathbb Z\}.$$

The discretization of the inner Dirichlet problem then reads

$$\Delta_h u = f, \quad x \in \Omega_h, \qquad u|_{\partial\Omega_h} = \varphi.$$

Also, Neumann problems have discretizations, see [Hac92, Chapter 4].
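The five-point stencil is easy to test directly, since $\Delta_h$ reproduces the Laplacian exactly on quadratic polynomials: for example $\Delta_h(x^2 - y^2) = 0$ and $\Delta_h(x^2 + y^2) = 4$ at every interior grid point. A small sketch with hypothetical grid data:

```python
import numpy as np

# Five-point discrete Laplacian at the interior grid points:
# (u(x-h,y) + u(x+h,y) + u(x,y-h) + u(x,y+h) - 4 u(x,y)) / h^2.
def laplace_h(u, h):
    return (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
            - 4.0 * u[1:-1, 1:-1]) / h**2

h = 1.0 / 32
grid = np.arange(0.0, 1.0 + h / 2, h)
X, Y = np.meshgrid(grid, grid, indexing="ij")

# Delta_h is exact on quadratics: x^2 - y^2 is (discretely) harmonic,
# while Delta_h (x^2 + y^2) = 4 at every interior point.
assert np.allclose(laplace_h(X**2 - Y**2, h), 0.0, atol=1e-9)
assert np.allclose(laplace_h(X**2 + Y**2, h), 4.0, atol=1e-9)
print("five-point stencil is exact on quadratic polynomials")
```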


(b) The Ritz–Galerkin Method
Suppose we have a boundary value problem in its variational formulation:

Find $u \in V$, so that $\langle u, v\rangle_E = F(v)$ for all $v \in V$,

where we are thinking of the Sobolev space $V = W$ from the previous paragraph. Of course, $F$ is assumed to be bounded.
Difference methods arise through discretizing the differential operator. Now we wish to leave the differential operator, which is hidden in $\langle\cdot,\cdot\rangle_E$, unchanged. The Ritz–Galerkin method consists in replacing the infinite-dimensional space $V$ by a finite-dimensional space $V_N$,

$$V_N \subset V, \qquad \dim V_N = N < \infty.$$

$V_N$ equipped with the norm $\|\cdot\|_E$ is still a Banach space. Since $V_N \subset V$, both the inner product $\langle\cdot,\cdot\rangle_E$ and $F$ are defined for $u, v \in V_N$. Thus, we may pose the problem:

Find $u_N \in V_N$, so that $\langle u_N, v\rangle_E = F(v)$ for all $v \in V_N$.

The solution to the above problem, if it exists, is called the Ritz–Galerkin solution (belonging to $V_N$).
An introductory example is to be found in [Hac92, 8.1.11, p. 164]; see also [Bra01, Chapter 2].
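A minimal sketch of the Ritz–Galerkin method, with a hypothetical 1D example not taken from the text: take $V = H^1_0((0,1))$, $\langle u, v\rangle_E = \int u' v'\, dx$, $F(v) = \int f v\, dx$ with $f \equiv 1$ (so the underlying problem is $-u'' = 1$, $u(0) = u(1) = 0$), and let $V_N$ be spanned by piecewise linear hat functions. The Galerkin conditions reduce to a tridiagonal linear system for the nodal values.

```python
import numpy as np

# Ritz-Galerkin with V_N = span of hat functions on a uniform grid:
# find u_N in V_N with <u_N, v>_E = F(v) for all v in V_N, where
# <u, v>_E = int u'v' dx and F(v) = int f v dx, f = 1 on (0, 1).
N = 9                                  # number of interior nodes
h = 1.0 / (N + 1)
nodes = np.linspace(h, 1.0 - h, N)

# Stiffness matrix A_ij = <phi_i, phi_j>_E for the hat functions phi_i.
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h
# Load vector b_i = F(phi_i) = int phi_i dx = h for f = 1.
b = h * np.ones(N)

c = np.linalg.solve(A, b)              # nodal values of the Galerkin solution

# For -u'' = 1, u(0) = u(1) = 0 the exact solution is u(x) = x(1-x)/2;
# with linear elements the Galerkin solution is exact at the nodes.
assert np.allclose(c, nodes * (1.0 - nodes) / 2.0)
print("Ritz-Galerkin nodal values match the exact solution")
```

In this 1D case the linear system coincides with the finite difference scheme, which explains why the nodal values are exact; in general the Ritz–Galerkin solution is only the best approximation of $u$ in $V_N$ with respect to $\|\cdot\|_E$.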


Bibliography

[AF01] I. Agricola and T. Friedrich. Globale Analysis (in German). Friedr. Vieweg & Sohn, Braunschweig, 2001.

[Ahl78] L. V. Ahlfors. Complex analysis. An introduction to the theory of analytic functions of one complex variable. International Series in Pure and Applied Mathematics. McGraw-Hill Book Co., New York, 3rd edition, 1978.

[Arn04] V. I. Arnold. Lectures on Partial Differential Equations. Universitext. Springer and Phasis, Berlin, Moscow, 2004.

[Bra01] D. Braess. Finite elements. Theory, fast solvers, and applications in solid mechanics. Cambridge University Press, Cambridge, 2001.

[Bre97] G. E. Bredon. Topology and geometry. Number 139 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1997.

[Bro92] Th. Bröcker. Analysis II (in German). B. I. Wissenschaftsverlag, Mannheim, 1992.

[Con78] J. B. Conway. Functions of one complex variable. Number 11 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1978.

[Con90] J. B. Conway. A course in functional analysis. Number 96 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1990.

[Cou88] R. Courant. Differential and integral calculus I-III. Wiley Classics Library. John Wiley & Sons, New York etc., 1988.

[Die93] J. Dieudonné. Treatise on analysis. Volumes I-IX. Pure and Applied Mathematics. Academic Press, Boston, 1993.

[Els02] J. Elstrodt. Maß- und Integrationstheorie. Springer, Berlin, 3rd edition, 2002.

[Eva98] L. C. Evans. Partial differential equations. Number 19 in Graduate Studies in Mathematics. AMS, Providence, 1998.

[FB93] E. Freitag and R. Busam. Funktionentheorie. Springer-Lehrbuch. Springer-Verlag, Berlin, 1993.

[FK98] H. Fischer and H. Kaul. Mathematik für Physiker. Band 2. Teubner Studienbücher: Mathematik. Teubner, Stuttgart, 1998.

[FL88] W. Fischer and I. Lieb. Ausgewählte Kapitel der Funktionentheorie. Number 48 in Vieweg Studium. Friedrich Vieweg & Sohn, Braunschweig, 1988.

[Fol95] G. B. Folland. Introduction to partial differential equations. Princeton University Press, Princeton, 1995.

[For81] O. Forster. Analysis 3 (in German). Vieweg Studium: Aufbaukurs Mathematik. Vieweg, Braunschweig, 1981.

[For01] O. Forster. Analysis 1-3 (in German). Vieweg Studium: Grundkurs Mathematik. Vieweg, Braunschweig, 2001.

[GS64] I. M. Gelfand and G. E. Schilow. Verallgemeinerte Funktionen (Distributionen). III: Einige Fragen zur Theorie der Differentialgleichungen (in German). Number 49 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1964.

[GS69] I. M. Gelfand and G. E. Schilow. Verallgemeinerte Funktionen (Distributionen). I: Verallgemeinerte Funktionen und das Rechnen mit ihnen (in German). Number 47 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1969.

[Hac92] W. Hackbusch. Elliptic differential equations. Theory and numerical treatment. Number 18 in Springer Series in Computational Mathematics. Springer-Verlag, Berlin, 1992.
[Hen88] P. Henrici. Applied and computational complex analysis. Vol. 1. Wiley Classics Library. John Wiley & Sons, Inc., New York, 1988.

[How03] J. M. Howie. Complex analysis. Springer Undergraduate Mathematics Series. Springer-Verlag, London, 2003.

[HS91] F. Hirzebruch and W. Scharlau. Einführung in die Funktionalanalysis. Number 296 in BI-Hochschultaschenbücher. BI-Wissenschaftsverlag, Mannheim, 1991.

[HW96] E. Hairer and G. Wanner. Analysis by its history. Undergraduate Texts in Mathematics. Readings in Mathematics. Springer-Verlag, New York, 1996.

[Jan93] K. Jänich. Funktionentheorie. Springer-Lehrbuch. Springer-Verlag, Berlin, 3rd edition, 1993.

[Joh82] F. John. Partial differential equations. Number 1 in Applied Mathematical Sciences. Springer-Verlag, New York, 1982.

[Jos02] J. Jost. Partial differential equations. Number 214 in Graduate Texts in Mathematics. Springer-Verlag, New York, 2002.


[KK71] A. Kufner and J. Kadlec. Fourier Series. G. A. Toombs Iliffe Books, London, 1971.

[Kno78] K. Knopp. Elemente der Funktionentheorie. Number 2124 in Sammlung Göschen. Walter de Gruyter, Berlin, New York, 9th edition, 1978.

[Kon90] K. Königsberger. Analysis 1 (English). Springer-Verlag, Berlin, Heidelberg, New York, 1990.

[Lan89] S. Lang. Undergraduate Analysis. Undergraduate Texts in Mathematics. Springer, New York, Heidelberg, 2nd edition, 1989.

[MW85] J. Marsden and A. Weinstein. Calculus. I, II, III. Undergraduate Texts in Mathematics. Springer-Verlag, New York etc., 1985.

[Nee97] T. Needham. Visual complex analysis. Oxford University Press, New York, 1997.

[ON75] P. V. O'Neil. Advanced calculus. Collier Macmillan Publishing Co., London, 1975.

[RS80] M. Reed and B. Simon. Methods of modern mathematical physics. I: Functional Analysis. Academic Press, Inc., New York, 1980.

[Rud66] W. Rudin. Real and Complex Analysis. International Student Edition. McGraw-Hill Book Co., New York, Toronto, 1966.

[Rud76] W. Rudin. Principles of mathematical analysis. International Series in Pure and Applied Mathematics. McGraw-Hill Book Co., New York, Auckland, Düsseldorf, 3rd edition, 1976.

[Ruh83] F. Rühs. Funktionentheorie. Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 4th edition, 1983.

[Spi65] M. Spivak. Calculus on manifolds. W. A. Benjamin, New York, Amsterdam, 1965.

[Spi80] M. Spivak. Calculus. Publish or Perish, Inc., Berkeley, California, 1980.

[Str92] W. A. Strauss. Partial differential equations. John Wiley & Sons, New York, 1992.

[Tri92] H. Triebel. Higher analysis. Hochschulbücher für Mathematik. Johann Ambrosius Barth Verlag GmbH, Leipzig, 1992.

[vW81] C. von Westenholz. Differential forms in mathematical physics. Number 3 in Studies in Mathematics and its Applications. North-Holland Publishing Co., Amsterdam, New York, 2nd edition, 1981.

[Wal74] W. Walter. Einführung in die Theorie der Distributionen (in German). Bibliographisches Institut, B.I.-Wissenschaftsverlag, Mannheim, Wien, Zürich, 1974.

[Wal02] W. Walter. Analysis 1-2 (in German). Springer-Lehrbuch. Springer, Berlin, 5th edition, 2002.


[Wla72] V. S. Wladimirow. Gleichungen der mathematischen Physik (in German). Number 74 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1972.

PD Dr. A. Schuler
Mathematisches Institut
Universität Leipzig
04009 Leipzig
Axel.Schueler@math.uni-leipzig.de
