JACK SPIELBERG
Contents
1. Axioms for the real numbers
2. Cardinality (briefly)
3. Decimal representation of real numbers
4. Metric spaces
5. The topology of metric spaces
6. The Cantor set
7. Sequences
8. Continuous functions
9. Limits of functions
10. Sequences in R
11. Limsup and liminf
12. Infinite limits and limits at infinity
13. Cauchy sequences and complete metric spaces
14. Compactness
15. Continuity and compactness
16. Connectedness
17. Continuity and connectedness
18. Uniform continuity
19. Convergence of functions
20. Differentiation
21. Higher order derivatives and Taylor’s theorem
22. The Riemann integral
23. The “Darboux” approach
24. Measure zero and integration
25. The fundamental theorem of calculus
26. The Weierstrass approximation theorem
27. Uniform convergence and the interchange of limits
28. Infinite series
29. Series of functions
30. Power series
31. Compactness in function space
32. Conditional convergence
C(n, j) = n!/(j!(n − j)!).
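As a quick numerical sanity check of the factorial formula for binomial coefficients, here is a small Python sketch (an illustration, not part of the notes), compared against the standard library's `math.comb`:

```python
from math import comb, factorial

def binom(n, j):
    # n! / (j! * (n - j)!), computed with exact integer arithmetic
    return factorial(n) // (factorial(j) * factorial(n - j))

# agrees with Python's built-in binomial coefficient
assert all(binom(n, j) == comb(n, j) for n in range(12) for j in range(n + 1))
```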
We will assume familiarity with this stuff. It is interesting, though, to consider what is
actually included in the phrase “this stuff.” What facts from high school algebra are covered
by the field axioms? Here is an example of something that is not covered.
Example 1.2. Let F be a field. Are the elements 1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1, . . .
all distinct? In fact, if we just have the field axioms, we can neither prove nor disprove
that these are all distinct elements. Notice that these are what we normally refer to as the
natural numbers (denoted N). So it isn’t clear that the natural numbers even make sense in
an arbitrary field.
Exercise 1.3. Explain why the “fact” stated in the previous example is true.
NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010 3
(8) |x − a| < r if and only if a − r < x < a + r (draw a picture on the number line).
(9) If a < x < b and a < y < b then |x − y| < b − a.
(10) Let x ∈ F . Suppose that |x| < ε for every positive element ε ∈ F . Then x = 0.
Property 10 above can be strengthened a bit, in a way that can be very useful. (Don’t
cite property 10 when proving this.)
Exercise 1.6. Let F be an ordered field, and let x ∈ F. Suppose that p, q ∈ F⁺ are such
that for every ε ∈ F with 0 < ε < p, we have |x| < qε. Then x = 0.
Remark 1.7. Here is another consequence of the ordered field axioms. Let b > 0. Then
(1 + b)^n = 1 + nb + · · · > nb.
Now let 0 < a < 1. Then 1/a > 1, so
(1 − a)/a = 1/a − 1 > 0.
Let b = 1/a − 1. Then a = 1/(1 + b), and
a^n = 1/(1 + b)^n < 1/(nb) = a/((1 − a)n).
Now we ask the following question (assuming some familiarity with the concept of limit,
but only for the sake of the discussion): if 0 < a < 1, does a^n tend towards 0 as n → ∞?
Another way to put this is to ask: if c is any fixed positive element, does there exist n0 ∈ N
such that a^n < c for all n ≥ n0? Using the above computations, we see that we can answer
this question affirmatively if we could show that for any fixed positive element c, there exists
n0 ∈ N such that a/((1 − a)n) < c for all n ≥ n0. Now observe that we could do this if we could
find n0 ∈ N such that a/((1 − a)n0) < c. In other words, we could prove that a^n → 0 if we could
find n0 ∈ N such that n0 > a/((1 − a)c). But since c is an arbitrary positive element, then so
is a/((1 − a)c). So this all comes down to trying to prove that for any positive element x, there
is a natural number n0 such that n0 > x. An ordered field in which this is true is called
Archimedean.
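The estimate a^n < a/((1 − a)n) derived in Remark 1.7 can be checked numerically; the following Python sketch (an illustration, not part of the notes) tests it for sample values of a:

```python
def power_bound_holds(a, n_max):
    # Verify a**n < a / ((1 - a) * n) for 1 <= n < n_max, given 0 < a < 1 (Remark 1.7).
    return all(a**n < a / ((1 - a) * n) for n in range(1, n_max))

assert power_bound_holds(0.7, 200)
assert power_bound_holds(0.99, 200)
```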
Definition 1.8. Let F be an ordered field. F is called Archimedean if for every x ∈ F there
exists a natural number n such that x < n.
It is evident that Q is an Archimedean ordered field, and we “know” that R is one too.
But we can’t prove it yet, because not all ordered fields are Archimedean!! In other words,
we don’t yet have enough axioms for the real numbers, since we can’t prove the most basic
fact from advanced calculus. Along with the field and order axioms, there is one more axiom
that is necessary to characterize the real numbers. We need some definitions before we can
present it.
Definition 1.9. Let F be an ordered field, let S ⊆ F , and let x ∈ F .
(1) x is an upper bound of S if y ≤ x for every y ∈ S.
(2) x is a lower bound of S if y ≥ x for every y ∈ S.
(3) S is bounded above if there exists an upper bound for S.
(4) S is bounded below if there exists a lower bound for S.
(5) S is bounded if it is bounded above and below.
Exercise 1.10. Is the empty set bounded?
natural numbers march off arbitrarily far to the right. Our first theorem about the real
numbers is this fact. As we pointed out before, the proof must rely on the completeness
axiom, since not all ordered fields are Archimedean.
Theorem 1.17. R is Archimedean: for every x ∈ R there exists n ∈ N such that x < n.
Proof. We suppose that R is not Archimedean, and derive a contradiction. So let x ∈ R
be such that x ≥ n for all n ∈ N. This just means that x is an upper bound for N. Thus
the (non-empty) subset N of R is bounded above. By the completeness axiom, N has a
supremum. Let z = sup(N). Now z − 1 < z. By Definition 1.11 (2′′), there is an element
n ∈ N with n > z − 1. But then n + 1 > z. Since n + 1 ∈ N, this contradicts Definition 1.11
(1). Therefore R is Archimedean.
We now present some corollaries of the Archimedean property.
Corollary 1.18. If x ∈ R with x > 0, then there exists n ∈ N with 1/n < x.
Proof. By the Archimedean property there is n ∈ N with n > 1/x. Then 1/n < x.
Before stating the next corollary, we recall the well-ordering principle (WOP) and one of
its variations. The WOP states that a non-empty subset of N contains a smallest element.
This is a fundamental property of the natural numbers — it is logically equivalent to the
principle of mathematical induction. The variation we need states that a non-empty subset
of Z that is bounded below (in Z) contains a smallest element.
Corollary 1.19. For x ∈ R there exists a unique n ∈ Z with n ≤ x < n + 1.
Proof. Let x ∈ R. By the Archimedean property there is m ∈ N with m > |x|. Then
x > −m, so the set {k ∈ Z : k > x} is non-empty and bounded below (by −m). Let n + 1
be its smallest element. Then n + 1 > x. But since n < n + 1, n is not in this set, so
n ≤ x. This proves existence. For uniqueness, suppose that n and n′ both do the job. Then
x − 1 < n, n′ ≤ x, so (by property 9 of absolute value) we have |n − n′| < 1. Since n, n′ ∈ Z
then n = n′.
The integer n of Corollary 1.19 is denoted [x]. The function [·] : R → Z is called the greatest
integer function. (Some people denote it by ⌊x⌋; ⌊·⌋ is also called the floor function.)
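The greatest integer function of Corollary 1.19 coincides with the floor function available in Python's standard library; a minimal sketch (an illustration, not part of the notes):

```python
import math

def greatest_integer(x):
    # The unique n in Z with n <= x < n + 1 (Corollary 1.19).
    n = math.floor(x)
    assert n <= x < n + 1
    return n

assert greatest_integer(2.7) == 2
assert greatest_integer(-0.5) == -1
assert greatest_integer(3) == 3
```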
Corollary 1.20. For x ∈ R and for N ∈ N there exists a unique n ∈ Z such that
n/N ≤ x < (n + 1)/N.
Proof. Apply Corollary 1.19 to N x.
Corollary 1.21. For x, ε ∈ R with ε > 0, there exists y ∈ Q such that |x − y| < ε.
Proof. By Corollary 1.18 there is N ∈ N with 1/N < ε. By Corollary 1.20 there is n ∈ Z such
that n/N ≤ x < (n + 1)/N. Let y = n/N. Then y ∈ Q, and
|x − y| = x − y < (n + 1)/N − n/N = 1/N < ε.
The conclusion of Corollary 1.21 is often expressed as: Q is dense in R.
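The proofs of Corollaries 1.18–1.21 are constructive, and can be turned directly into a procedure for approximating a real number by a rational; a Python sketch (an illustration, not part of the notes):

```python
import math
from fractions import Fraction

def rational_approx(x, eps):
    # Following the proof of Corollary 1.21: choose N with 1/N < eps,
    # then y = n/N with n = floor(N*x) satisfies n/N <= x < (n+1)/N, so |x - y| < eps.
    N = math.floor(1 / eps) + 1
    n = math.floor(N * x)
    return Fraction(n, N)

y = rational_approx(math.pi, 1e-3)
assert abs(math.pi - y) < 1e-3
```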
The completeness axiom is actually stronger than the Archimedean property. The next
result does not follow from the Archimedean property (as can be seen from the fact that the
conclusion does not hold in Q).
Theorem 1.22. Let n ∈ N. Every positive real number has a unique positive nth root.
Proof. We first prove uniqueness. If 0 < y < z then y^n < z^n, so two distinct positive real
numbers cannot be nth roots of the same real number. We now prove existence. Let a > 1.
(If 0 < a < 1, then 1/a > 1. In this case, if we show that 1/a has a positive nth root, then
the inverse of that root will be a positive nth root for a.) Let E = {x ≥ 0 : x^n ≤ a}. We
note that E ≠ ∅ since 1 ∈ E. We claim that E is bounded above. To see this, note that if
x ∈ E then
x^n ≤ a ≤ a^n.
Therefore x ≤ a, and we see that a is an upper bound for E. Thus the completeness axiom
implies that y = sup(E) exists. We will show that y^n = a, finishing the proof.
First note that y ≥ 1, since 1 ∈ E. We will use Exercise 1.6. Let 0 < ε < 1. Note
that since y − ε < y < y + ε, we have
(1) (y − ε)^n < y^n < (y + ε)^n.
Since y − ε < y, property (2′′) of Definition 1.11 implies that there is x ∈ E with y − ε < x.
Then (y − ε)^n < x^n ≤ a. Also, since y + ε > y then y + ε ∉ E, and hence a < (y + ε)^n.
Therefore
(2) (y − ε)^n < a < (y + ε)^n.
From (1) and (2), and property 9 of absolute value, we have |y^n − a| < (y + ε)^n − (y − ε)^n.
We have
We have
n
n n
X n
y n−j εj − y n−j (−ε)j
(y + ε) − (y − ε) =
j=0
j
n
X n n−j j
y ε 1 − (−1)j
=
j=0
j
n
X n n−j j
=2 y ε
j=1
j
j odd
n
X n n−j
<
2 y ε, since ε < 1.
j=1
j
j odd
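The supremum y = sup{x ≥ 0 : x^n ≤ a} of the proof can be approximated numerically by bisection; a Python sketch (an illustration under floating-point arithmetic, not part of the notes):

```python
def nth_root(a, n, tol=1e-12):
    # Approximate y = sup{x >= 0 : x**n <= a} by bisection, for a > 1.
    # As in the proof, a itself is an upper bound for E = {x >= 0 : x**n <= a}.
    lo, hi = 0.0, a
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid**n <= a:
            lo = mid
        else:
            hi = mid
    return lo

assert abs(nth_root(2.0, 2) - 2**0.5) < 1e-9
assert abs(nth_root(5.0, 3) ** 3 - 5.0) < 1e-8
```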
is much bigger than the set of rationals (Corollary 2.12). Before proving this, we will first
review some basic facts about the size of sets.
2. Cardinality (briefly)
Definition 2.1. Let A and B be sets.
(1) A and B are equivalent, written A ∼ B, if there exists a bijection from A to B. In
this case, A and B are said to be of the same cardinality.
(2) A is subequivalent to B, written A ≼ B, if there is a one-to-one function from A to
B.
The proof of the following proposition is elementary.
Proposition 2.2. For any sets A, B and C,
• A ∼ A.
• If A ∼ B then B ∼ A.
• If A ∼ B and B ∼ C then A ∼ C.
The next theorem is very useful, and its proof is a nice exercise.
Theorem 2.3. (Cantor-Bernstein) Let A and B be sets. If A ≼ B and B ≼ A then A ∼ B.
Definition 2.4. Let A be a set.
(1) A is finite if there is n ∈ N ∪ {0} such that A ∼ {1, 2, . . . , n}.
(2) A is infinite if A is not finite.
(3) A is denumerable if A ∼ N.
(4) A is countable if A is finite or denumerable.
(5) A is uncountable if A is not countable.
Proposition 2.5. (1) If m ≠ n then {1, 2, . . . , m} ≁ {1, 2, . . . , n}.
(2) N is infinite.
(3) A is countable if and only if A ≼ N.
(4) Let A1, A2, . . . be countable sets. Then ⋃_{n=1}^∞ An is countable, and for each n, A1 ×
· · · × An is countable.
(5) Q is countable.
Proof. The first three statements can be proved as exercises. For the fourth, let An =
{x_{n1}, x_{n2}, . . .}. Consider the list: x_{11}, x_{12}, x_{21}, x_{13}, x_{22}, x_{31}, . . . . For each entry, delete all
subsequent occurrences. What is left is a list, without duplications, of the elements of the
union. This defines a bijection from N to the union.
Suppose inductively that A1 × · · · × An is countable. Then
A1 × · · · × An+1 = ⋃_{x∈An+1} A1 × · · · × An × {x}
is countable.
For the last statement, first note that Z is countable, as can be seen from the list: 0, 1,
−1, 2, −2, . . . . Since Z ∼ (1/n)Z, it follows from Proposition 2.2 that (1/n)Z is countable. Then
Q = ⋃_{n=1}^∞ (1/n)Z is countable.
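The diagonal listing used in the proof of part (4) can be generated explicitly; a Python sketch (an illustration, not part of the notes) that enumerates the index pairs along anti-diagonals:

```python
from itertools import islice

def diagonal_indices():
    # Enumerate index pairs in the order used in the proof of Proposition 2.5(4):
    # (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ... (anti-diagonals of the array x_{nk})
    s = 2
    while True:
        for n in range(1, s):
            yield (n, s - n)
        s += 1

first_six = list(islice(diagonal_indices(), 6))
assert first_six == [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```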
Example 2.6. Let X = ∏_{i=1}^∞ {0, 1} = {(x1, x2, . . .) : xi ∈ {0, 1} for all i}. (Thus X is the
set of all sequences of 0’s and 1’s.)
Proposition 2.7. X is uncountable.
Proof. We will show that if f : N → X is any function, then f is not onto. Therefore there
does not exist a bijection from N to X.
So let f : N → X be given. Let f(n) be the sequence (x_{n1}, x_{n2}, x_{n3}, . . .). Define an
element y = (y1, y2, . . .) ∈ X by yn = 1 − x_{nn}. Then for each n, y and f(n) differ in the nth
slot, so that y ≠ f(n). Therefore y is not in the range of f. Therefore f is not onto.
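The diagonal construction in this proof is directly computable on finite truncations; a Python sketch (an illustration, not part of the notes):

```python
def diagonal_escape(rows):
    # Given the first n values of f (each a 0/1 sequence, truncated to length n),
    # flip the diagonal: y_n = 1 - x_{nn}. Then y differs from row n in the nth slot.
    return [1 - row[i] for i, row in enumerate(rows)]

rows = [[0, 0, 0], [1, 1, 1], [0, 1, 0]]
y = diagonal_escape(rows)
assert all(y[i] != rows[i][i] for i in range(len(rows)))
```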
Remark 2.8. X ∼ P(N). To define a bijection from X to P(N), send a sequence x =
(x1, x2, . . .) to the set {n ∈ N : xn = 1}. It is easy to check that this works. In fact, this is a
special case of a general theorem of Cantor.
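The bijection of Remark 2.8 and its inverse can be written out on finite truncations; a Python sketch (an illustration, not part of the notes):

```python
def seq_to_set(x):
    # The map of Remark 2.8: a 0/1 sequence goes to {n in N : x_n = 1}.
    return {n for n, bit in enumerate(x, start=1) if bit == 1}

def set_to_seq(s, length):
    # The inverse direction, truncated to the first `length` coordinates.
    return [1 if n in s else 0 for n in range(1, length + 1)]

x = [1, 0, 1, 1, 0]
assert seq_to_set(x) == {1, 3, 4}
assert set_to_seq(seq_to_set(x), 5) == x
```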
Theorem 2.9. If S is any set, and if f : S → P(S) is any function, then f is not onto.
Thus for any set S, S ≁ P(S). (Since it is evident that S ≼ P(S), we observe that P(S)
has a larger cardinality than S.)
Proof. Given f, let E = {x ∈ S : x ∉ f(x)}. It is easy to check that E is not in the range
of f .
The next result will be proved later (Corollary 6.4).
Theorem 2.10. R ∼ X.
Corollary 2.11. R is uncountable.
The previous corollary (and hence also the next corollary) can be proved from the results
of the next section, rather than from Corollary 6.4.
Corollary 2.12. The set of irrational numbers is uncountable.
Thus we have defined x0 ∈ Z and xn ∈ {0, 1, . . . , 9} for n ≥ 1 so that (1) holds for all n.
(2) x = sup{Σ_{i=0}^{n} xi 10^{−i} : n ≥ 0}.
Proof. The proof is left as an exercise.
(3) (xn) is not eventually equal to 9; precisely, for every n there is m ≥ n such that
xm ≠ 9. The point is that, for example, if we start with x = 1, we will obtain the
expansion 1.0000 · · · , and NOT 0.9999 · · · .
Proof. The proof is left as an exercise.
(4) If x ≠ y then there exists k such that xk ≠ yk. In other words, the map that takes a
real number to its decimal expansion is one-to-one.
Proof. Let x < y. Choose n such that 10^{−n} < y − x. Then
Σ_{i=0}^{n} xi 10^{−i} ≤ x < y − 10^{−n} < Σ_{i=0}^{n} yi 10^{−i}.
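The digits x0, x1, x2, . . . described above can be computed exactly with rational arithmetic; a Python sketch (an illustration, not part of the notes) that also exhibits the 1.0000 · · · convention from item (3):

```python
from fractions import Fraction
from math import floor

def digits(x, n):
    # Return [x0, x1, ..., xn] with x0 = [x] and xk the k-th decimal digit,
    # computed with exact Fraction arithmetic.
    x = Fraction(x)
    x0 = floor(x)
    out = [x0]
    frac = x - x0
    for _ in range(n):
        frac *= 10
        d = floor(frac)
        out.append(d)
        frac -= d
    return out

assert digits(Fraction(1, 3), 4) == [0, 3, 3, 3, 3]
assert digits(1, 4) == [1, 0, 0, 0, 0]   # 1.0000..., not 0.9999...
```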
4. Metric spaces
Much of what we do in analysis ultimately comes down to measuring the distance between
two real numbers. We use the absolute value for this: |x − y| is the distance between the
numbers x and y. There are many other situations where we use the distance between
points in an essential way. For example, the Pythagorean theorem is used to define the usual
distance between points in R2, and even in Rn. One of the wonderful abstractions of twentieth-
century mathematics is a generalization of this notion of distance. In fact, it isn’t too hard to
notice that everything we use distance for in advanced calculus (e.g. limits, continuity, etc.)
relies only on a few very coarse aspects of the distance function. The following definition
sets these out precisely, and gives the basic setting for this course.
Definition 4.1. Let X be a set. A metric on X is a function d : X × X → R such that
(1) d(x, y) ≥ 0 for all x, y ∈ X (positivity).
(2) d(x, y) = 0 if and only if x = y (definiteness).
Corollary 4.10. Let V be an inner product space. For x ∈ V let ‖x‖ = ⟨x, x⟩^{1/2}. Then ‖·‖
is a norm on V.
Proof. We will prove the triangle inequality, leaving the verification of the other properties
of a norm as an exercise. Let x, y ∈ V. Then by the Cauchy-Schwarz inequality,
‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ = ‖x‖² + 2⟨x, y⟩ + ‖y‖²
≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².
Example 4.11. The usual norm on Rn arises from the usual inner product. The corre-
sponding metric space is usually referred to as (n-dimensional) Euclidean space. We note
the following important inequalities for the Euclidean norm (proof by squaring).
Remark 4.12. Let x ∈ Rn. Then for any i,
|xi| ≤ ‖x‖ ≤ |x1| + · · · + |xn|.
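The inequalities of Remark 4.12 can be tested numerically; a Python sketch (an illustration, not part of the notes) for a sample vector:

```python
from math import sqrt

def norm(x):
    # Euclidean norm on R^n
    return sqrt(sum(t * t for t in x))

x = [3.0, -4.0, 12.0]
assert norm(x) == 13.0
assert all(abs(t) <= norm(x) for t in x)      # |x_i| <= ||x||
assert norm(x) <= sum(abs(t) for t in x)      # ||x|| <= |x_1| + ... + |x_n|
```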
Definition 4.13. Let (X, d) be a metric space, and let Y ⊆ X. If we restrict the metric d
to points of Y then Y becomes a metric space, called a subspace of X.
Example 4.14. The circle (or torus) is a subspace of Euclidean space: T = {(x, y) ∈ R2 :
x² + y² = 1}. (Thus, for example, d((1, 0), (0, 1)) = √2.)
It is very important to remember that, while pictures can give a lot of valuable intuition,
they are not a substitute for a proof. In this course, you may never use a picture as part
of a proof (though they can be included to help explain what you are doing). Well, it isn’t
really enough to just tell you not to touch the stove — you really have to burn yourself. The
following example is much more frequently encountered than you might imagine the first
time you see it. You should work through carefully on your own the details of the proof that
it is a metric space, and try to visualize it in some way (it’s unclear what that means!). It
provides a counterexample to many “obvious” facts about metric spaces that are not actually
true. The point is this: any theorem that we prove about metric spaces must be true for all
metric spaces. In particular, it will be true for the metric space in the next example.
Example 4.15. Recall the set X from Example 2.6. We define a metric on X as follows.
For x, y ∈ X with x ≠ y, the set {i : xi ≠ yi} is non-empty. By the well-ordering principle,
it has a least element. We set k(x, y) = min{i : xi ≠ yi}. Then we define
d(x, y) = 1/k(x, y) if x ≠ y, and d(x, y) = 0 if x = y.
We claim that d is a metric on X. The proofs of positive definiteness and symmetry are
immediate. We will verify the triangle inequality. In fact, we will prove something stronger,
called the ultrametric inequality.
Lemma 4.16. For any x, y, z ∈ X, d(x, y) ≤ max{d(x, z), d(y, z)}.
Proof. We will write “k(x, x) = ∞” as a kind of shorthand. (But notice that then we have
that d(x, y) < d(u, v) if and only if k(x, y) > k(u, v), for any points x, y, u, v ∈ X.)
Now let x, y, z ∈ X. If d(x, y) ≤ d(x, z) then the inequality holds. So suppose that
d(x, y) > d(x, z). Then k(x, y) < k(x, z). Since xi = zi for i < k(x, z), we have that
z_{k(x,y)} = x_{k(x,y)} ≠ y_{k(x,y)}. Therefore k(y, z) ≤ k(x, y), and hence d(x, y) ≤ d(y, z). Therefore
max{d(x, z), d(y, z)} = d(y, z) ≥ d(x, y).
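The ultrametric inequality of Lemma 4.16 can be verified by brute force on finite truncations of the sequences; a Python sketch (an illustration, not part of the notes):

```python
from itertools import product

def k(x, y):
    # Index (1-based) of the first coordinate where x and y differ (x != y assumed).
    return next(i for i, (a, b) in enumerate(zip(x, y), start=1) if a != b)

def d(x, y):
    # The metric of Example 4.15, restricted to finite truncations.
    return 0.0 if x == y else 1.0 / k(x, y)

pts = list(product([0, 1], repeat=5))
# check the ultrametric inequality over all triples of length-5 sequences
assert all(d(x, y) <= max(d(x, z), d(y, z))
           for x in pts for y in pts for z in pts)
```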
The following is another example of a metric space that varies from what our intuition
suggests. This one often seems like a stupid metric space . . . well, it is stupid, but it is
also a metric space. Every theorem about metric spaces must be true for it, and hence any
statement that is not true for this example, cannot be proven using only the axioms of a
metric space.
Example 4.17. Let S be any set. The discrete metric on S is defined by
d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.
Remark 4.18. It is easy to see that the discrete metric on a set with n points can be
realized as a subspace of Euclidean n-space. It is a little harder to find a natural setting for
the discrete metric on N. The discrete metric on R is a useful counterexample to keep in
mind.
Proof. These are easy exercises using DeMorgan’s laws (and the notation for families of
sets).
Example 5.6. (1) In any metric space X, X and ∅ are both open and closed. (It is a
fairly deep fact (to be proved later) that if X = Rn then these are the only sets that
are simultaneously open and closed.)
(2) A singleton set in a metric space is a closed set.
Definition 5.7. An open box in Rn is a set of the form (a1 , b1 ) × · · · × (an , bn ), where
−∞ ≤ ai ≤ bi ≤ ∞ for each i. Closed boxes in Rn are defined similarly, by including all
finite endpoints of the interval factors of the Cartesian product. We note that an open box
is a finite intersection of (at most 2n) open half-spaces, and hence is an open set. Similarly,
closed boxes are closed sets.
It is important to remember that, while the complement of an open set is a closed set, the
opposite of “open” is not “closed” — many (most, even) sets are neither open nor closed.
We next introduce the operations of interior and closure. These provide important open and
closed sets associated with arbitrary subsets of a metric space.
Definition 5.8. Let X be a metric space and let E ⊆ X. The interior of E is the set
int(E) = ∪{U : U ⊆ E and U is open}.
Proof. The first two items follow immediately from the definitions. For the third item, we
have
Ē = ∩{K : E ⊆ K, K closed}
(Ē)ᶜ = ∪{Kᶜ : E ⊆ K, K closed}
= ∪{Kᶜ : Kᶜ ⊆ Eᶜ, Kᶜ open}
= ∪{U : U ⊆ Eᶜ, U open}
= int(Eᶜ).
Taking complements of both sides yields the first formula. If we apply the first formula to
Eᶜ, and take complements of both sides, we obtain the second formula.
The above definitions are abstract, in that they don’t give an explicit criterion to use to
decide if a point does or does not belong to the interior or closure of a set. We now give
such criteria.
Proposition 5.10. (1) x ∈ int(E) if and only if there is r > 0 such that Br(x) ⊆ E.
(2) x ∈ Ē if and only if for every r > 0, Br(x) ∩ E ≠ ∅.
Proof. (1) This is almost instantly obtained from the definition, and we leave the details
as an exercise.
(2) We note that x ∈ (Ē)ᶜ if and only if x ∈ int(Eᶜ), by Remark 5.9(3). But this is true
if and only if there is r > 0 such that Br(x) ⊆ Eᶜ, by part (1). But this is true if and
only if there is r > 0 such that Br(x) ∩ E = ∅. By negating the first and last items
in this chain of equivalent statements, we find that x ∈ Ē if and only if for all r > 0,
Br(x) ∩ E ≠ ∅.
Example 5.11. It is worth thinking about the above definitions and results in the context
of some examples. We note that in any metric space, if E is open then int(E) = E, while if
E is closed then Ē = E.
In R,
(1) int((a, b]) = (a, b).
(2) int(Z) = ∅.
(3) int(Q) = ∅.
(4) The closure of (0, 1] is [0, 1].
(5) Z̄ = Z.
(6) Q̄ = R.
It might seem tempting to try to describe the new points sucked in by the closure operation,
i.e. the points of Ē that are not already in E. However it turns out to be much more useful
to describe the property that brings them into the closure. This property may apply also to
some points already in E.
Definition 5.12. Let X be a metric space, E ⊆ X, and a ∈ X. The point a is a cluster
point of E (also called by some people limit point or accumulation point) if for every r > 0,
the intersection E ∩ Br(a) is infinite. We write E′ for the set of cluster points of E.
Example 5.13. Let X = R.
It is a good idea to draw a picture. It isn’t hard to see that C is nonempty: all the
endpoints of the closed subintervals making up the Fn ’s belong to C. Still, this set of
endpoints is a countable set. In fact, C is much bigger, as we will now see. Recall the space
X of Example 2.6. We will prove that C ∼ X.
Definition 6.2. We define f : X → C as follows. Let x = (x1, x2, . . .) ∈ X. For each n
define a closed interval In(x) recursively by I0(x) = [0, 1] and
In+1(x) = the left piece of In(x) ∩ Fn+1 if xn+1 = 0, and the right piece of In(x) ∩ Fn+1 if xn+1 = 1.
Then I0(x) ⊇ I1(x) ⊇ · · · . Let us write In(x) = [an, bn]. The nesting of these intervals
implies that
a1 ≤ a2 ≤ · · · ≤ b2 ≤ b1.
Let α = sup{a1, a2, . . .} and β = inf{b1, b2, . . .}. We claim that ⋂_{n=0}^∞ In(x) = [α, β]. To see
this, we first note that since an ≤ α ≤ β ≤ bn for all n, [α, β] ⊆ ⋂_{n=0}^∞ In(x). On the other
hand, if t ∈ ⋂_{n=0}^∞ In(x), then an ≤ t ≤ bn for all n. Hence t is an upper bound for the set of an’s,
and a lower bound for the set of bn’s. Thus α ≤ t ≤ β. This proves the claim. Finally, since
bn − an = 3^{−n}, we have β − α ≤ 3^{−n} for all n. Therefore α = β. It follows that ⋂_{n=0}^∞ In(x) is
the singleton set {α}. We define f by setting f(x) = α. More precisely, the above argument
allows us to describe f as follows:
{f(x)} = ⋂_{n=0}^∞ In(x).
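The point f(x) can be computed explicitly: in the standard ternary description of the Cantor set (an assumption consistent with the left/right-third construction above, though the notes do not state it), choosing the left or right piece at step n contributes the base-3 digit 0 or 2. A Python sketch (an illustration, not part of the notes), for finite truncations of x:

```python
from fractions import Fraction
from itertools import product

def cantor_point(x):
    # f(x) for a finite 0/1 tuple x, via the ternary formula:
    # the n-th choice contributes 2*x_n / 3**n (digit 0 or 2 in base 3).
    return sum(Fraction(2 * bit, 3 ** (n + 1)) for n, bit in enumerate(x))

assert cantor_point((1,)) == Fraction(2, 3)
assert cantor_point((0, 1)) == Fraction(2, 9)
# distinct sequences give distinct points (cf. the injectivity in Proposition 6.3)
values = {cantor_point(x) for x in product([0, 1], repeat=6)}
assert len(values) == 2 ** 6
```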
Proposition 6.3. f is bijective.
Proof. We first show that f is injective. Let x, y ∈ X with x ≠ y. Let k = k(x, y) (recall
Example 4.15). For i < k, xi = yi, so that Ii(x) = Ii(y). Since xk ≠ yk, Ik(x) and Ik(y) are
two disjoint subintervals of Ik−1(x) = Ik−1(y). Since f(x) ∈ Ik(x) and f(y) ∈ Ik(y), we must
have f(x) ≠ f(y).
We now show that f is surjective. Let t ∈ C. Then t ∈ Fn for all n. For each n, let In
be the subinterval of Fn containing t. Since In and In+1 are subintervals of Fn and Fn+1,
respectively, either In ⊇ In+1 or In ∩ In+1 = ∅. Since both contain t, we must have
In ⊇ In+1. Thus we must have ⋂_{n=0}^∞ In = {t}. Now let
xn = 0 if In is the left piece of In−1 ∩ Fn, and xn = 1 if In is the right piece of In−1 ∩ Fn.
Letting x = (x1, x2, . . .) ∈ X, we see that In = In(x) for all n, so that t = f(x).
Corollary 6.4. R, C, X, and P(N) are equivalent sets. In particular, R is uncountable.
Proof. In Remark 2.8 we sketched the proof that X ∼ P(N), while in Proposition 6.3 we saw
that X ∼ C. We finish the proof by showing that R ∼ C. Since C ⊆ R we have C ≼ R. By
the Cantor-Bernstein theorem, it suffices to show that R ≼ C. Since C ∼ P(N), it suffices
to show that R ≼ P(N). But since N ∼ Q, we know that P(N) ∼ P(Q). Thus we will
be finished if we can show that R ≼ P(Q). We do that as follows. We define a function
g : R → P(Q) by
g(t) = {q ∈ Q : q < t}.
If s ≠ t are distinct points of R, say s < t, then by the density of Q in R there exists
q ∈ Q with s < q < t. Then q ∈ g(t) and q ∉ g(s), and we have g(s) ≠ g(t). Hence g is
one-to-one.
Exercise 6.5. (1) int(C) = ∅.
(2) C′ = C.
7. Sequences
Definition 7.1. Let X be a set. A sequence in X is a function x : N → X.
Remark 7.2. We usually write xn instead of x(n), but the latter notation is often useful
too. We sometimes write (xn)_{n=1}^∞, or (xn), for x. It is important to remember that in this
notation, n is a dummy variable — it is the argument of the function x. (So, in particular,
there is nothing special about the letter n used as the argument — it will often be convenient
to use a different letter.) Some texts use curly braces instead of parentheses, but we will
avoid this notation, for the following reason. The range of the sequence x is the subset
{xn : n ∈ N} of X. This is often referred to as the set of terms of (xn ). It is important
to distinguish between the sequence itself (which is a function from N to X), and its set of
terms (which is a subset of X).
While we are on the subject of the subtlety of the notation for sequences, let me point
out a common mistake to guard against. What should we make of the following statement
(taken from more than one actual homework paper!): “Let (xn ) be a sequence, and let (xi )
be another sequence.”? Of course, this deserves a quantity of red ink, but you should think
carefully about the precise error. (And PLEASE don’t make this mistake too.)
Definition 7.3. Let (xn ) be a sequence in a metric space X, and let a ∈ X. (xn ) converges
to a if for every ε > 0, there exists n0 ∈ N such that for all n ≥ n0 we have d(xn , a) < ε. We
write xn → a (as n → ∞) to indicate that (xn ) converges to a.
Lemma 7.4. A sequence in a metric space converges to at most one point.
Proof. Suppose that xn → a and xn → b. Let ε > 0. There exist n1 , n2 ∈ N such that
d(xn , a) < ε/2 for all n ≥ n1 , and d(xn , b) < ε/2 for all n ≥ n2 . Let n = max{n1 , n2 }. Then
d(a, b) ≤ d(a, xn ) + d(xn , b) < ε/2 + ε/2 = ε. Since d(a, b) < ε for all ε > 0, it follows that
a = b.
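Definition 7.3 is constructive for concrete sequences: given ε, one exhibits an n0 that works. A Python sketch (an illustration, not part of the notes) for the sequence xn = 1/n converging to 0 in R with the usual metric:

```python
import math

def n0_for(eps):
    # For x_n = 1/n -> 0: any n0 with n0 > 1/eps works, since n >= n0
    # implies |1/n - 0| <= 1/n0 < eps (cf. Corollary 1.18).
    return math.floor(1 / eps) + 1

for eps in (0.1, 0.01, 0.001):
    n0 = n0_for(eps)
    # a finite spot-check of the tail condition d(x_n, 0) < eps for n >= n0
    assert all(abs(1.0 / n - 0.0) < eps for n in range(n0, n0 + 500))
```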
Definition 7.5. If xn → a, a is called the limit of (xn ), and we write limn→∞ xn = a. We
say that (xn ) converges if it has a limit; otherwise it diverges.
Proposition 7.6. Let X be a metric space, let E ⊆ X, and let a ∈ X.
(1) a ∈ Ē if and only if there is a sequence in E converging to a.
(2) a ∈ E′ if and only if there is a sequence in E \ {a} converging to a.
(3) E is closed if and only if every sequence in E that converges in X has its limit in E.
Proof. We prove part of the proposition, and leave the rest as an exercise.
(1) (⇒): Let a ∈ Ē. By Proposition 5.10(2), for each n ∈ N we have E ∩ B1/n(a) ≠ ∅.
Choose xn ∈ E with d(xn, a) < 1/n. Then xn → a.
Remark 7.7. Sequences are an important tool for studying metric spaces. One can think
of a sequence as a kind of “probe” — a function from N to the space picks out a certain
countable subset in a manner indexed by the natural numbers. It is also useful to use
sequences as tools to study a sequence itself. This leads to the next definition.
Definition 7.8. Let x be a sequence in a set X, and let n be a strictly increasing sequence
in N. (Thus n : N → N satisfies n1 < n2 < n3 < · · · .) Then x ◦ n is another sequence in X.
It is called a subsequence of x.
Remark 7.9. The terms of the subsequence x ◦ n may be denoted (x ◦ n)_i = x(n(i)) =
x_{n(i)} = x_{n_i}. Thus we may write x ◦ n = (x_{n_i})_{i=1}^∞.
The idea of a subsequence is pretty simple, but the notation can lead to lots of silly
mistakes, against which you should be on guard. For example, let (xn) be a sequence. The
expression x_{50} makes sense — it is the 50th term of the sequence. Now let (x_{n_i}) be a
subsequence. The expression x_{n_50} makes sense — it is the 50th term of the subsequence,
and equivalently, it is the n_{50}-th term of the original sequence. However, the expression x_{50_i}
does not make sense. If we try to interpret it, we first realize that it is the value of the
function x at the argument 50_i. So 50_i must be an element of the domain of x, namely a
natural number. Now 50_i must be the value of the function 50 at the argument i. But this
is nonsense — ‘50’ is not a function, so it can’t be ‘evaluated’ at the argument i.
Here is another example to keep in mind. Suppose that we have a bunch of sequences
in X. Say that x1, x2, . . . are all sequences (i.e. we have a sequence of sequences). How
should we write the terms of the nth sequence? We have that xn : N → X, so we can write
xn = (xn(i))_{i=1}^∞, using function notation for xn. Note carefully that i is the argument of the
function xn, and not the argument of n (which is not a function). We have to be careful
about using subscript notation. If we weren’t being careful, we might write xn = (x_{ni})_{i=1}^∞.
But this is the same as the notation for a subsequence of a sequence x. One resolution of this
ambiguity is to use more parentheses: xn = ((xn)_i)_{i=1}^∞. The more usual way is to use two
subscripts: xn = (x_{ni})_{i=1}^∞, and this is what we will do when we are faced with this situation.
Writing it out longhand for clarity gives xn = (x_{n1}, x_{n2}, x_{n3}, . . .). Note that it is necessary to
write so clearly that the reader does not mistake the second subscript for a sub-subscript.
Here is a simple result about subsequences.
Proposition 7.10. Let (xn ) be a convergent sequence in a metric space. Then every subse-
quence of (xn ) is also convergent, and has the same limit.
Remark 7.11. Before proving the proposition, we observe that if n : N → N is strictly
increasing, then n_i ≥ i for all i. This is easily proved by induction on i, and we omit the
proof. We do point out that equality is possible. In fact, letting n_i = i for all i shows that
any sequence is a subsequence of itself.
Proof. (of Proposition 7.10) Let xn → a, and let (x_{n_i}) be a subsequence. We will show that
x_{n_i} → a. Let ε > 0. Since xn → a, there is m such that d(xn, a) < ε whenever n ≥ m. Now
if i ≥ m, then n_i ≥ m, by the remark, so that d(x_{n_i}, a) < ε. Thus x_{n_i} → a (as i → ∞).
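Definition 7.8's view of a subsequence as a composition of functions can be written out directly; a Python sketch (an illustration, not part of the notes):

```python
def subsequence(x, n):
    # Definition 7.8: a subsequence is the composition x o n, where
    # x : N -> X is the sequence and n : N -> N is strictly increasing.
    return lambda i: x(n(i))

x = lambda k: 1.0 / k        # the sequence (1/k)
n = lambda i: 2 * i          # the strictly increasing index sequence 2, 4, 6, ...
sub = subsequence(x, n)
assert sub(3) == x(6)        # the 3rd term of the subsequence is the n_3-th term of x
assert all(n(i) >= i for i in range(1, 100))   # Remark 7.11: n_i >= i
```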
Remark 7.12. It is clear from the definition that convergence or divergence of a sequence
is unaffected if finitely many terms are changed. Convergence, divergence, and the limit (if
convergent) are examples of properties of a sequence that depend only on the ultimate behavior
of the sequence. In fact, such properties are the only ones that are important for sequences.
One way to describe this is by means of tails of a sequence. If (xn) is a sequence, the nth
tail is the subsequence (xi)_{i=n}^∞. Thus, if the sequence converges to L, then every tail of
the sequence also converges to L. We sometimes say that a property holds eventually for a
sequence if it holds for some tail.
8. Continuous functions
Definition 8.1. Let (X, d) and (Y, ρ) be metric spaces, f : X → Y a function, and x0 ∈ X.
f is continuous at x0 if for every ε > 0 there exists δ > 0 such that for every x ∈ X, if
d(x, x0) < δ then ρ(f(x), f(x0)) < ε. f is continuous if it is continuous at each point of X.
Remark 8.2. Here are some equivalent formulations of continuity at a point x0.
(1) For every ε > 0 there exists δ > 0 such that f(Bδ(x0)) ⊆ Bε(f(x0)).
(2) For every open ball C with center f(x0), there exists an open ball B with center x0
such that f(B) ⊆ C.
(3) For every ε > 0 there exists δ > 0 such that Bδ(x0) ⊆ f⁻¹(Bε(f(x0))).
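To make the ε–δ definition concrete, here is a hypothetical worked example (a Python sketch, not part of the notes) exhibiting a δ for f(x) = x² at a point x0 of R with the usual metric:

```python
import random

def delta_for_square(x0, eps):
    # If |x - x0| < delta <= 1, then |x^2 - x0^2| = |x - x0| * |x + x0|
    # < delta * (2*|x0| + 1), so delta = min(1, eps / (2*|x0| + 1)) works.
    return min(1.0, eps / (2 * abs(x0) + 1))

x0, eps = 3.0, 1e-2
delta = delta_for_square(x0, eps)
# random spot-check of the implication d(x, x0) < delta => |f(x) - f(x0)| < eps
for _ in range(1000):
    x = x0 + random.uniform(-delta, delta)
    assert abs(x * x - x0 * x0) < eps
```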
Homeomorphic metric spaces have the same topological structure and properties. It is
colloquial to describe this by saying that one space can be deformed into the other by
bending and stretching without tearing. Here are some simple examples.
Example 8.8. (1) Any two open disks in R2 are homeomorphic.
(2) Any two closed disks in R2 having positive radii are homeomorphic.
(3) No open disk in R2 is homeomorphic to any closed disk in R2 . (This is not an obvious
one.)
(4) Every open ball in Rn is homeomorphic to every open box in Rn .
(5) The unit circle T = {x ∈ R2 : ‖x‖ = 1} is not homeomorphic to the unit interval
[0, 1] ⊆ R. (Again, it isn’t so obvious how to prove this.)
Example 8.9. Recall the function f : X → C from Definition 6.2, where X = ∏∞i=1 {0, 1} is
as in Example 2.6, and C is the Cantor set (Definition 6.1). We will show that f and f −1
are continuous functions. First some notation. If (a1 , a2 , . . . , an ) ∈ ∏ni=1 {0, 1}, let
Z(a1 , . . . , an ) = {x ∈ X : xi = ai for 1 ≤ i ≤ n}.
Such sets are called cylinder sets. Note that cylinder sets are clopen: Z(a1 , . . . , an ) =
B1/n (x) = B̄1/(n+1) (x) for any x ∈ Z(a1 , . . . , an ). Note also that f (Z(a1 , . . . , an )) = C ∩ In (x)
(again for any x ∈ Z(a1 , . . . , an )), which is a clopen subset of C (recall the definition of In (x)
from Definition 6.2). Thus these two families of clopen subsets are paired by the function
f . Since every open subset of X is a union of open balls, i.e. of cylinder sets, and every
open subset of C is a union of subsets of the form C ∩ In (x) (an exercise!), it follows from
Theorem 8.5 that f and f −1 are continuous.
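The pairing of cylinder sets with the sets C ∩ In (x) is concrete enough to compute with. Below is a small numerical sketch; since Definition 6.2 is not reproduced in this excerpt, the formula f (x) = ∑i 2xi /3^i used for f is the standard Cantor map and should be read as our assumption, as should the helper names.

```python
# Sketch of the (assumed) Cantor map f(x) = sum_i 2*x_i / 3^i, truncated at
# finitely many coordinates.  Definition 6.2 is not reproduced here, so this
# formula, and the helper names, are our assumptions.

def f_partial(bits):
    """Image of a finite 0-1 string under the first len(bits) terms of f."""
    return sum(2 * b / 3 ** (i + 1) for i, b in enumerate(bits))

def cylinder_interval(prefix):
    """All extensions of `prefix` map into this interval of length 3**-n."""
    lo = f_partial(prefix)
    return (lo, lo + 3 ** (-len(prefix)))

# Two sequences agreeing in the first 3 coordinates have images within 3**-3
# of each other: small distance in X forces small distance in C.
x = [1, 0, 1] + [0] * 20
y = [1, 0, 1] + [1] * 20
assert abs(f_partial(x) - f_partial(y)) <= 3 ** (-3)
lo, hi = cylinder_interval([1, 0, 1])
assert lo <= f_partial(x) <= hi and lo <= f_partial(y) <= hi
```

Lengthening the common prefix shrinks the containing interval by a factor of 3 each time, which is the continuity of f in action.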
The proofs of the next two results are easy, and so are left as exercises.
Corollary 8.10. (of Theorem 8.5) Let X be a metric space. f : X → R is continuous if
and only if f −1 ((a, b)) is open for all a < b in R. Equivalently, f : X → R is continuous if
and only if {f < a} and {f > a} are open for all a ∈ R.
Theorem 8.11. Let f : X → Y and g : Y → Z be functions between metric spaces, and let
x0 ∈ X. If f is continuous at x0 , and g is continuous at f (x0 ), then g ◦ f is continuous at
x0 .
9. Limits of functions
The definition we gave a while ago for the limit of a sequence is a special case of a general
notion of limit of a function — after all, a sequence is just a special kind of function. But
sequences are quite special. The definition of the limit of a function is a little bit more
involved. We will need it, in principle, when we talk about differentiation.
Definition 9.1. Let (X, d) and (Y, ρ) be metric spaces, let E ⊆ X, let f : E → Y , let
x0 ∈ E ′ , and let y0 ∈ Y . The limit of f , as x approaches x0 , equals y0 if for every ε > 0
there exists δ > 0 such that for all x ∈ E, if 0 < d(x, x0 ) < δ then ρ(f (x), y0 ) < ε. (The final
implication can also be expressed as f (E ∩ Bδ (x0 ) \ {x0 }) ⊆ Bε (y0 ).) We write limx→x0 f (x) = y0 .
Remark 9.2. Note that f might or might not be defined at x0 (according as x0 ∈ E or
x0 ∉ E). We require x0 ∈ E ′ so that for every δ > 0 there will exist points x satisfying the
hypothesis of the implication. Even if x0 ∈ E, the definition of the limit as x → x0 never
requires that f be evaluated at x0 : the value of f at x0 is irrelevant.
Note further that if we tried to apply this definition to a point x0 that is not a cluster
point of E, then we would find that the definition is satisfied by every point y0 ∈ Y . To avoid
this situation, we only consider limits at cluster points of the domain of the function.
Exercise 9.3. Show that in the situation of Definition 9.1, if the limit exists it is unique.
(Be sure to note explicitly where the hypothesis that x0 ∈ E ′ is used.)
10. Sequences in R
Theorem 10.1. Let (an ) and (bn ) be sequences in R. Suppose that an → a, and bn → b.
Then
(1) an + bn → a + b.
(2) an bn → ab.
(3) If b ≠ 0 then an /bn → a/b (where at most finitely many terms are not defined).
(4) If an ≤ bn for all n, then a ≤ b.
Proof. These are good exercises, so we will only prove part of the third statement; namely,
the case where an = 1 for all n. First, let’s sort out the parenthetical comment. If b ≠ 0,
then |b| > 0. By definition of convergence, there is n0 such that |bn − b| < |b| for all n ≥ n0 .
But then, for all n ≥ n0 we have |bn | = |b − (b − bn )| ≥ |b| − |b − bn | > |b| − |b| = 0. Therefore
bn ≠ 0 if n ≥ n0 . The quotient sequence will fail to be defined if the denominator equals
zero, but this can only happen for finitely many n (all less than n0 ).
Now let’s prove that if bn → b ≠ 0, then 1/bn → 1/b. Let ε > 0. Let n1 be such that
|bn − b| < |b|/2 whenever n ≥ n1 . We can improve on the previous paragraph. If n ≥ n1 we
have that |bn | ≥ |b| − |b − bn | > |b| − |b|/2 = |b|/2. Now let n2 be such that |bn − b| < |b|^2 ε/2
whenever n ≥ n2 . Let n0 = max{n1 , n2 }. For n ≥ n0 we have
|1/bn − 1/b| = |b − bn |/|b bn | = |bn − b| · (1/|b|) · (1/|bn |) < (|b|^2 ε/2) · (1/|b|) · (2/|b|) = ε.
Therefore 1/bn → 1/b.
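The estimates in this proof are easy to check numerically. The sketch below uses the illustrative sequence bn = b + 1/n (our choice, not from the notes) and verifies that once |bn − b| < |b|^2 ε/2 and |bn | > |b|/2, the bound |1/bn − 1/b| < ε holds.

```python
# Check: once |bn - b| < |b|**2 * eps / 2 and |bn| > |b|/2, the proof's
# estimate gives |1/bn - 1/b| < eps.  The sequence bn = b + 1/n is an
# illustrative choice, not taken from the notes.
b, eps = -3.0, 1e-4
for n in range(1, 100000):
    bn = b + 1.0 / n
    if abs(bn - b) < abs(b) ** 2 * eps / 2 and abs(bn) > abs(b) / 2:
        assert abs(1 / bn - 1 / b) < eps
```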
Remark 10.2. The first three statements in the theorem mean that the functions + and
· : R2 → R, and ÷ : R × (R \ {0}) → R are continuous.
Remark 10.3. (1) It follows from Theorem 10.1(4) that if an < bn for all n, then a ≤ b.
Note that even with strict inequalities in the hypotheses, the conclusion will in general
only be a weak inequality. This reflects a general principle: limits change strict
inequalities into weak inequalities.
(2) The following well-known lemma also follows from Theorem 10.1(4).
Lemma 10.4. Let (an ) and (bn ) be real sequences, suppose that |an | ≤ |bn | for all n, and
suppose that bn → 0. Then an → 0.
Lemma 10.5. Let (xi ) be a sequence in Rn . We write the ith term of the sequence as an
n-tuple thus: (xi1 , . . . , xin ) (cf. Remark 7.9). If a = (a1 , . . . , an ) ∈ Rn , then xi → a if and
only if xij → aj (as i → ∞) for all j = 1, . . ., n.
Proof. This follows easily from Remark 4.12.
We now establish convergence of some special, familiar sequences in R.
Proposition 10.6.
(1) For any k ∈ N, 1/n^{1/k} → 0 as n → ∞.
(2) For any 0 < a < 1, a^n → 0 as n → ∞.
(3) n^{1/n} → 1 as n → ∞.
(4) For any a ∈ R with 0 < a < 1, and any k ∈ N, a^n n^k → 0 as n → ∞.
Proof. (1) Let ε > 0. Choose n0 > 1/ε^k . If n ≥ n0 then n^{1/k} ≥ n0^{1/k} > 1/ε, and hence
1/n^{1/k} < ε.
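These limits are easy to probe numerically. The sketch below spot-checks (1)–(4) at a single large n; the particular values a = 0.9 and k = 3 are illustrative choices satisfying the hypotheses.

```python
# Numerical spot-check of Proposition 10.6 at n = 10**6; the values a = 0.9
# and k = 3 are arbitrary choices with 0 < a < 1 and k in N.
n = 10 ** 6
a, k = 0.9, 3
assert 1 / n ** (1 / k) < 0.02          # (1): 1/n^(1/k) -> 0
assert a ** n < 1e-100                  # (2): a^n -> 0
assert abs(n ** (1 / n) - 1) < 1e-4     # (3): n^(1/n) -> 1
assert a ** n * n ** k < 1e-80          # (4): a^n * n^k -> 0
```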
is an upper bound, we have xn ≤ c < c + ε for all n. Since c − ε < c, c − ε is not an upper
bound, so there exists n0 with c − ε < xn0 . Then for all n ≥ n0 we have c − ε < xn . Thus
we get that c − ε < xn < c + ε whenever n ≥ n0 . Thus xn → c.
Exercise 10.9. A bounded monotone sequence is convergent.
Example 10.10. Does √(2 + √(2 + √(2 + · · · ))) mean anything? OK, this is phrased as a
philosophical question, i.e. it’s a joke. But we can still try to give the expression some
kind of sense. For example, we could argue that IF it does represent a real number, call
it x, then x must satisfy the equation x = √(2 + x). Then it’s easy to see that x = 2.
But this is not valid since we haven’t shown that the expression does indeed represent a
real number. Some people might try to make sense of it by interpreting it as a sequence:
(√2, √(2 + √2), √(2 + √(2 + √2)), . . .). They would define the expression to be the limit of this
sequence, assuming that the sequence converges. We could argue about whether this is a
reasonable definition for the expression, but we can’t argue with the intelligibility of the new
problem: does the given sequence converge, and if so, to what? (Other people might argue
that the limit of this sequence (if it exists) is actually the definition of a different expression;
namely, √(· · · + √(2 + √(2 + √2))).) However we come to study this sequence, it is a nice exercise
in induction to prove that it is bounded above and increasing. Therefore it converges. Using
the recursive definition of the sequence (an+1 = √(2 + an )), and Theorem 10.1, it is easy to
prove that the limit is, in fact, 2.
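The recursion is easy to run. A minimal sketch of the iteration:

```python
import math

# Iterate a_{n+1} = sqrt(2 + a_n) from a_1 = sqrt(2): the terms increase,
# stay below 2, and approach the limit 2 established above.
a = math.sqrt(2)
terms = [a]
for _ in range(20):
    a = math.sqrt(2 + a)
    terms.append(a)

assert all(s < t for s, t in zip(terms, terms[1:]))  # increasing
assert all(t < 2 for t in terms)                     # bounded above by 2
assert abs(terms[-1] - 2) < 1e-9                     # converging to 2
```

Each step roughly quarters the distance to 2, since a_{n+1} − 2 = (a_n − 2)/(√(2 + a_n) + 2).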
We next point out that continuity of real-valued functions is preserved by pointwise arithmetic of functions.
Corollary 10.11. Let f , g : X → R, and let a ∈ X. If f and g are continuous at a, then
so are f + g, f g, and f /g (the latter provided g(a) ≠ 0).
Proof. This follows from Theorems 10.1 and 8.4.
Remark 10.12. The result for limits analogous to the one in Corollary 10.11 holds, as can
be seen by using Lemma 9.4.
Definition 10.13. If f : X → Rn we define the coordinate functions of f by fi = πi ◦ f :
X → R (recall the coordinate projections πi from Example 8.3 (6)). We can then write
f (x) = (f1 (x), . . . , fn (x)).
Corollary 10.14. Let f : X → Rn . Then f is continuous if and only if all fi are continuous.
Proof. (=⇒): Use Theorem 8.11 and Example 8.3 (6).
(⇐=): Use Remark 4.12 and Theorem 8.4.
of the sequence (an ).) Notice that {ak : k ≥ n} ⊇ {ak : k ≥ n + 1}, and hence that
supk≥n ak ≥ supk≥n+1 ak . Of course, since L ≤ ak for all k, we also have L ≤ supk≥n ak for
all n. Therefore the sequence of suprema of tails, (supk≥n ak )∞n=1 , is decreasing and bounded
below, and hence converges.
Definition 11.1. Let (an ) be a bounded sequence in R. The limit superior, or limsup, of
(an ) is the real number
lim supn→∞ an = limn→∞ supk≥n ak .
Similarly, the limit inferior, or liminf, of (an ) is lim infn→∞ an = limn→∞ infk≥n ak . The
justification is the opposite of the above: the sequence (infk≥n ak )∞n=1 is increasing and
bounded above, so it has a limit.
Theorem 11.2. Let (an ) be a bounded sequence in R.
(1) lim infn→∞ an ≤ lim supn→∞ an .
(2) (an ) converges if and only if lim infn→∞ an = lim supn→∞ an , and in this case,
limn→∞ an = lim infn→∞ an = lim supn→∞ an .
Definition 11.6. We introduce here some standard terminology regarding sequences, re-
flecting the idea that it is only the “ultimate” behavior of a sequence that is of interest (cf.
Remark 7.12). Our phrasing is very general, hence vague, but expresses a useful notion that
is easy to understand once you see the idea. Let (an ) be a sequence, and let P be some
property that the terms of the sequence might have. We say that P holds eventually if P
holds for all terms in some tail of the sequence; in other words, if there exists n0 such that
P (an ) is true for all n ≥ n0 . We say that P holds frequently if every tail of the sequence
contains a term for which P holds; in other words, if for all n0 there exists n ≥ n0 such that
P (an ) is true.
For example, you can check your understanding of these terms by working through the
following statements.
(1) (an ) converges to c if and only if for every ε > 0, an ∈ Bε (c) eventually.
(2) (an ) has a subsequence converging to c if and only if for every ε > 0, an ∈ Bε (c)
frequently.
Exercise 11.7. Let (an ) be a bounded real sequence, and let x ∈ R. Prove the following:
(1) x < lim sup an =⇒ x < an frequently =⇒ x ≤ lim sup an
(2) x > lim sup an =⇒ x > an eventually =⇒ x ≥ lim sup an
(3) x < lim inf an =⇒ x < an eventually =⇒ x ≤ lim inf an
(4) x > lim inf an =⇒ x > an frequently =⇒ x ≥ lim inf an
(The exercise is not only to prove the eight implications, but also to show that none of these
implications can be reversed.)
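For a concrete illustrative sequence, an = (−1)^n (1 + 1/n), the suprema and infima of tails can be computed directly; they decrease to lim sup an = 1 and increase to lim inf an = −1.

```python
# a_n = (-1)^n * (1 + 1/n): lim sup = 1 and lim inf = -1, approached by the
# decreasing suprema and increasing infima of tails.
N = 2000
a = [(-1) ** n * (1 + 1 / n) for n in range(1, N + 1)]

sup_tails = [max(a[n:]) for n in range(N // 2)]
inf_tails = [min(a[n:]) for n in range(N // 2)]

assert all(s >= t for s, t in zip(sup_tails, sup_tails[1:]))  # decreasing
assert all(s <= t for s, t in zip(inf_tails, inf_tails[1:]))  # increasing
assert abs(sup_tails[-1] - 1) < 0.01 and abs(inf_tails[-1] + 1) < 0.01
```

Here x = 0.99 < lim sup, and indeed an > 0.99 frequently (at even n) but not eventually.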
With the above definition and remarks in mind, we can extend the arithmetic of limits
from Corollary 10.11 and Remark 10.12 to include infinite limits (and, of course, limits of
sequences as well as of functions). By this we mean that the limit of the sum/difference/product/quotient
of two functions equals the sum/difference/product/quotient of the two limits,
IF that arithmetic combination of the limits is permissible. We leave it as an exercise to
write a precise theorem and its proof.
A different use of the symbols ±∞ is in the description of limits at infinity.
Definition 12.3. Let X be a metric space, let f : R → X, and let x0 ∈ X. We write
limt→∞ f (t) = x0 if for every ε > 0 there exists M ∈ R such that for all t ∈ R with t ≥ M
we have d(f (t), x0 ) < ε. There is a similar definition for limits at minus infinity.
Remark 12.4. We mention that in this context, the symbols ∞ and −∞ merely indicate
“directions”, and are not to be thought of as “numbers” in any way.
We define a norm on V by ‖x‖ = (∑_{i=1}^{∞} xi^2 )^{1/2} (note that the sum is actually finite). It’s
easy to see that this is a norm: the properties defining a norm only involve finitely many
vectors at a time, and then the required property actually occurs in some Euclidean space,
where we already know the properties hold. Now, let
vn = (1/2, 1/4, 1/8, . . . , 1/2^n , 0, 0, 0, . . .) ∈ V.
If m < n, we have
‖vm − vn ‖^2 = ‖(0, 0, . . . , 0, 1/2^{m+1} , . . . , 1/2^n , 0, 0, . . .)‖^2 = ∑_{i=m+1}^{n} (1/2^i )^2 = (1/4^{m+1}) ∑_{i=0}^{n−m−1} 1/4^i < 1/4^m .
Thus (vn ) is Cauchy in V . But we claim that (vn ) does not converge. To prove this, let
y = (yn ) be an arbitrary vector in V . There is k such that yi = 0 for i > k. For n > k,
‖y − vn ‖^2 = ∑_{i=1}^{∞} (yi − vni )^2 = ∑_{i=1}^{n} (yi − 1/2^i )^2 ≥ (yk+1 − 1/2^{k+1} )^2 = 1/4^{k+1} .
Thus d(vn , y) ≥ 2^{−(k+1)} for all n > k. Therefore vn ↛ y.
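Both estimates in this example can be checked numerically by representing elements of V as Python lists padded with zeros; the helper names below are ours.

```python
import math

# Elements of V (finitely supported sequences) as lists, padded for comparison.
def norm_sq(x, y):
    L = max(len(x), len(y))
    xp = x + [0.0] * (L - len(x))
    yp = y + [0.0] * (L - len(y))
    return sum((s - t) ** 2 for s, t in zip(xp, yp))

def v(n):
    return [2.0 ** (-i) for i in range(1, n + 1)]  # (1/2, 1/4, ..., 1/2^n)

# Cauchy estimate: ||v_m - v_n||^2 < 1/4^m for m < n.
for m in range(1, 10):
    for n in range(m + 1, 12):
        assert norm_sq(v(m), v(n)) < 4.0 ** (-m)

# Non-convergence: if y is supported on the first k slots, then
# d(v_n, y) >= 2^-(k+1) for all n > k.
y = [3.0, -1.0, 0.5]  # an arbitrary vector with k = 3
k = len(y)
for n in range(k + 1, 20):
    assert math.sqrt(norm_sq(v(n), y)) >= 2.0 ** (-(k + 1))
```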
Definition 13.6. A metric space is called complete if every Cauchy sequence converges.
Theorem 13.7. Rn is complete.
We will give the proof after a couple of lemmas about Cauchy sequences in general metric
spaces.
Lemma 13.8. A Cauchy sequence is bounded.
Proof. Let (an ) be a Cauchy sequence. Then there is L such that d(am , an ) < 1 for all m,
n ≥ L. Let R = max{d(a1 , aL ), . . . , d(aL−1 , aL )} + 2. Then d(an , aL ) < R for all n, and
hence (an ) is bounded.
Lemma 13.9. A Cauchy sequence having a convergent subsequence is convergent.
Proof. Let (an ) be a Cauchy sequence, and let (ani ) be a convergent subsequence, with limit
c. We claim an → c. Let ε > 0. Since (an ) is Cauchy there is L such that d(am , an ) < ε/2
for all m, n ≥ L. By the definition of convergence, there is i0 such that d(ani , c) < ε/2 for
all i ≥ i0 . Let i1 ≥ i0 be such that ni1 ≥ L. Then for any n ≥ ni1 we have
d(an , c) ≤ d(an , ani1 ) + d(ani1 , c) < ε/2 + ε/2 = ε.
Hence an → c.
Proof. (of Theorem 13.7) We first show that R is complete. Let (an ) be a Cauchy sequence
in R. By Lemma 13.8 we know that (an ) is bounded. By Theorem 11.3 we know that (an )
has a convergent subsequence. Then by Lemma 13.9 we know that (an ) converges. Thus R
is complete. Now it follows easily from Remark 4.12 that Rn is complete (the details are left
as an exercise).
Exercise 13.10. A closed subset of a complete metric space is complete.
Exercise 13.11. Let (X, d) be a metric space. Recall the diameter of a subset of X from
Exercise 5.19.
(1) Suppose that X is complete. Prove that for every decreasing sequence
F1 ⊇ F2 ⊇ · · ·
of nonempty closed subsets of X with limn→∞ diam(Fn ) = 0, there exists an element
a ∈ X such that
⋂∞n=1 Fn = {a}.
(2) (converse of part (1)) Suppose that whenever F1 , F2 , . . . are nonempty closed subsets
of X such that F1 ⊇ F2 ⊇ · · · and limn→∞ diam(Fn ) = 0, then ⋂∞n=1 Fn ≠ ∅. Prove
that X is a complete metric space.
14. Compactness
Compactness is probably the most important concept in analysis. It can be described in
various ways. The “right” way is not necessarily the easiest to understand. Before we give
the definition, here is some motivation for why it is reasonable. The basic problem that
compactness addresses is the transition from local information to global information. That
may sound cryptic, and it is meant to be a catchy phrase that will become more intelligible
as you get more used to these ideas. But it isn’t hard to see what it is about. Local (near a
point) means in an open ball centered at that point. Here is a simple example of using this
terminology. If a function is continuous at a point, then it is bounded in some open ball
centered at that point. Thus if a function is continuous on a set, it is bounded locally on
that set: each point in the set has a neighborhood on which the function is bounded. On
the other hand, global (on a set) means on the whole set. A function is “globally bounded”
if it is bounded on its domain, i.e. if it is a bounded function. Is every continuous function
bounded? Of course not! For example, a non-constant polynomial on R is continuous, but
not bounded. Local boundedness does not generally imply global boundedness. However if
the domain of the polynomial is taken to be a closed bounded interval, then the extreme
value theorem from calculus implies that the polynomial is bounded on the interval. The
great insight was that it is a property of the domain that lets us pass from local boundedness
to global boundedness, and this property is called compactness.
Now, recall what the word local means: in a neighborhood of a point. A property holds
locally on a set if for each point, there is an open ball centered at the point such that the
property holds in that ball. If the set is infinite, this will give an infinite collection of open
balls, one for each point. We could obtain the property globally if we had a finite collection
of balls instead of an infinite collection. Compactness of the set means that we can always
reduce to a finite collection.
You might notice that a lot of mathematics seems to proceed in this way: what would we
like to have? Let’s give a name to the situation where we have what we want. Now let’s
analyze the situation to see what exactly we were asking for. In fact, compactness can be
described in a variety of ways that seem very different. This is useful: we can prove that a
space is compact using an easy description, and then apply compactness through a more
complicated one.
OK, with that as motivation, here is the precise definition.
Definition 14.1. Let X be a set. A cover of X is a collection of sets whose union contains
X. If U is a cover of X, a subcover of U is a subcollection of U that is also a cover of X.
Example 14.2. (1) The set of all open intervals is a cover of R.
(2) {(a, b) : a < b, a, b ∈ Z} is a subcover of the cover in (1).
Definition 14.3. Let X be a metric space, and let E ⊆ X. An open cover of E is a cover
of E whose elements are open subsets of X.
Definition 14.4. Let X be a metric space, and let E ⊆ X. E is compact if every open
cover of E has a finite subcover.
Example 14.5. (1) Example 14.2(1) is an open cover of R having a finite subcover.
(2) Example 14.2(2) is an open cover of R not having a finite subcover. In particular, it
follows that R is not compact.
Example 14.6. (1) Finite sets are compact.
(2) {0, 1, 1/2, 1/3, . . .} is a compact subset of R.
(3) [0, 1] is a compact subset of R (this is a special case of Corollary 14.30).
Proof. Let U be an open cover of [0, 1]. Let E = {x ∈ [0, 1] : [0, x] is finitely covered
by U}. Note that 0 ∈ E, so E ≠ ∅. Let c = sup E. Then c ∈ [0, 1]. We first
claim that c ∈ E. To see this, choose U0 ∈ U with c ∈ U0 . Then there exists r > 0
such that (c − r, c + r) ⊆ U0 . By the definition of supremum, there is y ∈ E with
y > c − r. By definition of E there is a finite subcollection V ⊆ U with [0, y] ⊆ ⋃V.
But then V ∪ {U0 } is a finite subcollection of U covering [0, c], proving that c ∈ E.
Now we note that, in fact, V ∪ {U0 } covers [0, a] for any number a between c and
c + r. Thus if c < 1 we could find a larger element of E than c, contradicting its
status as supremum. So we have shown that c = 1. Thus [0, 1] is finitely covered by
U.
(4) [0, 1) is not compact.
Proof. {(−1, 1 − 1/n) : n ∈ N} is an open cover not having a finite subcover.
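Concretely: any finite subfamily has a largest index N, and then the point 1 − 1/(2N) of [0, 1) escapes all of its members. A sketch (the helper name `covered` is ours):

```python
# For the finite subfamily {(-1, 1 - 1/n) : n <= N}, the point 1 - 1/(2N)
# lies in [0, 1) but in none of the chosen intervals.
def covered(x, N):
    return any(-1 < x < 1 - 1 / n for n in range(1, N + 1))

for N in range(1, 100):
    x = 1 - 1 / (2 * N)
    assert 0 <= x < 1
    assert not covered(x, N)   # the finite subfamily misses x
    assert covered(x, 4 * N)   # a larger member of the full cover catches it
```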
Definition 14.7. A metric space X is compact if X is a compact subset of itself.
By now our waffling use of the qualifier “subset” after the word “compact” may be causing
some trauma. We will remedy this now, but first we need the important notion of relatively
open set.
Definition 14.8. Let X be a metric space. Recall that a subset E ⊆ X is also a metric
space (cf. Definition 4.13). A subset of E is called relatively open (in E) if it is an open
subset of the metric space E.
Example 14.9. (1) Let X = R, and let E = [0, 1] ⊆ X. Then [0, 1/2) is relatively open
in E, but not open in X.
(2) Let X = R2 , and let E = R × {0} ⊆ X (we think of E as being the x-axis in R2 ).
Then (0, 1) × {0} is just the usual open unit interval in the x-axis — it is relatively
open in E, but is not open in X.
Lemma 14.10. Let X be a metric space, and let E ⊆ X. For U ⊆ E, U is relatively open
in E if and only if there exists an open subset V of X such that U = E ∩ V .
Proof. We will use a superscript E to distinguish open balls in the metric space E from open
balls in X. For a ∈ E and r > 0 we see that
BrE (a) = {x ∈ E : d(x, a) < r} = {x ∈ X : d(x, a) < r} ∩ E = Br (a) ∩ E.
Thus U is relatively open in E if and only if for every x ∈ U there exists r(x) > 0 such that
Br(x) (x) ∩ E ⊆ U . In this case, we have that U = (⋃x∈U Br(x) (x)) ∩ E, and we may use the set
in parentheses for V . Conversely, suppose that U = V ∩ E for some open set V of X. Then
for a point x ∈ U there is r > 0 such that Br (x) ⊆ V . Then BrE (x) = Br (x) ∩ E ⊆ V ∩ E = U ,
so we have that U is relatively open in E.
Proposition 14.11. Let X be a metric space, and let E ⊆ X. E is a compact subset of X
if and only if E is a compact metric space.
Proof. Suppose that E is a compact subset of X. Let U be an open cover of (the metric space)
E. By Lemma 14.10, for each U ∈ U there is an open set VU ⊆ X such that U = VU ∩ E.
Then
E = ⋃_{U ∈U} U = ⋃_{U ∈U} (VU ∩ E) = (⋃_{U ∈U} VU ) ∩ E,
and hence {VU : U ∈ U} is an open cover of E in X. By hypothesis this open cover has
a finite subcover. Thus there are U1 , . . ., Uk ∈ U such that E ⊆ VU1 ∪ · · · ∪ VUk . Hence
E ⊆ U1 ∪ · · · ∪ Uk , so that U has a finite subcover. Therefore the metric space E is compact.
The converse is left as an exercise.
Thus compactness is an intrinsic property of a metric space that cannot be lost when the
space is realized as a subspace of another metric space (in contrast to openness, which does
depend on the ambient metric space, as seen in Example 14.9). We now develop the chief
properties of compactness.
Proposition 14.12. A closed subset of a compact space is compact.
Proof. Let X be a compact metric space, and let E ⊆ X be a closed subset. Let U be an
open cover of E. Since E is closed, E c is open. Then U ∪ {E c } is an open cover of X. Since
X is compact, this open cover has a finite subcover. The subcover consists of finitely many
sets from U, possibly together with E c . But then the sets from U must cover E, so that U
has a finite subcover (of E). Therefore E is compact.
Exercise 14.13. It is a nice exercise to prove a sort of converse to this. Namely, a compact
subset of a metric space is closed. We won’t do it here, as this fact will follow from a later
result (Corollary 14.20).
Proposition 14.14. A compact subset of a metric space is bounded.
Proof. Let E be a compact subset of the metric space X. Choose any point x0 ∈ X. Then
{Bn (x0 ) : n = 1, 2, 3, . . .} is an open cover of X, hence also of E. Since E is compact, there
is a finite subcover. But since the open balls increase with n, this means that there is n such
that E ⊆ Bn (x0 ). Thus E is bounded.
Of course, the converse of Proposition 14.14 is false.
Theorem 14.15. (Finite Intersection Property, or FIP) Let X be a compact metric space.
Let {Ei }i∈I be a collection of nonempty closed subsets of X. Suppose that every finite
subcollection has nonempty intersection: for all k ∈ N, for all i1 , . . ., ik ∈ I, we have
Ei1 ∩ · · · ∩ Eik ≠ ∅. Then ⋂i∈I Ei ≠ ∅.
Proof. Suppose not. Then taking complements we have ∪i∈I Eic = X. This means that
{Eic : i ∈ I} is an open cover of X. Since X is compact there are i1 , . . ., ik ∈ I with
Eic1 ∪ · · · ∪ Eick = X. But then by complements again, we get that Ei1 ∩ · · · ∩ Eik = ∅, a
contradiction.
Example 14.16. The theorem may fail if the sets are not closed: consider {(0, 1/n) : n ∈ N}.
This family does have the FIP, but the intersection is empty.
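A quick sketch of this failure (the helper name `in_E` is ours): each finite subfamily shares the point 1/(2k), yet every candidate x > 0 is expelled by some member.

```python
# E_n = (0, 1/n): finitely many always intersect, but nothing lies in all of them.
def in_E(x, n):
    return 0 < x < 1 / n

# Finite intersections are nonempty: x = 1/(2k) lies in E_1, ..., E_k.
for k in range(1, 50):
    assert all(in_E(1 / (2 * k), n) for n in range(1, k + 1))

# Total intersection is empty: any x > 0 fails to be in E_n once 1/n <= x.
for x in [0.5, 0.01, 1e-9]:
    n = int(1 / x) + 1
    assert not in_E(x, n)
```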
Definition 14.17. A metric space X is sequentially compact if every sequence in X has a
convergent subsequence (convergent in X, of course).
Example 14.18. [a, b] is sequentially compact by Theorem 11.3, and the fact that [a, b] is
closed.
Theorem 14.19. A compact metric space is sequentially compact.
Corollary 14.20. A compact subset of a metric space is closed.
The proof of the theorem will be made easier by the following preliminary “computation.”
Lemma 14.21. Let (xn ) be a sequence in a metric space, and let y be a point. Then (xn )
has a subsequence converging to y if and only if for every ε > 0 and for every m ∈ N, there
exists n ≥ m such that d(xn , y) < ε.
Proof. (⇒): Suppose limi→∞ xni = y. Let ε > 0 and m ∈ N. By the hypothesized conver-
gence there is i0 such that d(xni , y) < ε whenever i ≥ i0 . Since ni → ∞ as i → ∞ there
exists j ≥ i0 such that nj ≥ m. Then d(xnj , y) < ε. So nj is the desired ‘n’.
(⇐): Suppose the condition in the statement holds. We apply it repeatedly. First choose
n1 such that d(xn1 , y) < 1. Then choose n2 > n1 such that d(xn2 , y) < 1/2. Continuing
this way we construct a subsequence (xni )∞ i=1 such that d(xni , y) < 1/i for all i. Evidently
xni → y as i → ∞.
Proof. (of Theorem 14.19) We will prove the contrapositive of the statement in the theorem.
So suppose that X is not sequentially compact. Then there is a sequence (xn ) having no
convergent subsequence. Thus for all y ∈ X, (xn ) does not have a subsequence converging to
y. Negating the condition in Lemma 14.21, we find that for all y ∈ X there exists εy > 0 and
there exists ny ∈ N such that for all n ≥ ny , d(xn , y) ≥ εy . Let U = Bεy (y) : y ∈ X . U is
obviously an open cover of X. But if y1 , . . ., yk ∈ X are any finite collection of points, choose
n > max{ny1 , . . . , nyk }. Then d(xn , y) ≥ εyi for i = 1, . . ., k. Hence xn 6∈ ∪ki=1 Bεyi (yi ). Thus
U has no finite subcover. Therefore X is not compact.
Proposition 14.22. A sequentially compact metric space is complete.
Proof. This follows from Lemma 13.9.
Exercise 14.23. A metric space X is sequentially compact if and only if every infinite subset
of X has a cluster point.
We now turn to the role of boundedness for compact metric spaces. By way of introduction,
we mention that the most famous result about compact metric spaces is the Heine-Borel
theorem: a subset of Rn is compact if and only if it is closed and bounded. We will prove this
later, but now we want to point out that this result is special to Rn — it is NOT true in
arbitrary metric spaces. The reason is that Rn is (duh!) finite dimensional. This may not
seem so special now, but many of the most important metric spaces in analysis are infinite
dimensional, and you will surely run into them (maybe not today, maybe not tomorrow,
but...yeah, yeah.)
Here is a simple part of the Heine-Borel theorem that we have essentially proved already.
For E ⊆ R, if E is bounded then every sequence in E has a convergent subsequence. If E is
both closed and bounded, then the limit of the convergent subsequence must belong to E.
Thus we see that for subsets of R, closed and bounded imply sequentially compact.
Here are two examples to show that for general metric spaces, boundedness is too weak a
notion. The first is simple-minded, but the second is more interesting.
Example 14.24. (1) Let X be an infinite set with the discrete metric (Example 4.17).
Then X is bounded, but not sequentially compact.
(2) Let V be the normed space of finite real sequences (Example 13.5). Then the closed
unit ball B̄1 (0) is closed and bounded, but not sequentially compact.
In fact, the situation is worse than might be realized if you just think about the
non-convergent Cauchy sequence from Example 13.5. Consider the sequence (en )
in V , where en = (0, 0, . . . , 0, 1, 0, 0, . . .) (with 1 in the nth slot). This sequence is
contained in the unit ball of V , but does not even have a Cauchy subsequence.
These examples show that the problem with boundedness is that a huge space can hide
inside a bounded set. The correct definition is the following.
Definition 14.25. A subset E of a metric space is called totally bounded if for every ε > 0
there are finitely many balls of radius ε that cover E.
Remark 14.26. (1) The definition is unaffected by specifying the type of the balls (open
vs. closed).
(2) A totally bounded subset of a metric space is bounded. A subset of a totally bounded
set is totally bounded.
The proofs are left as exercises.
The next lemma shows what makes Rn so special.
Lemma 14.27. In Rn , every bounded subset is totally bounded.
Proof. Let E ⊆ Rn be bounded, and let ε > 0. Choose C > 0 such that E ⊆ [−C, C]n .
Choose k > 2C√n/ε. Write
[−C, C] = ⋃_{i=1}^{k} [−C + 2C(i − 1)/k, −C + 2Ci/k] = ⋃_{i=1}^{k} Si ,
where S1 , . . ., Sk are closed intervals of length 2C/k < ε/√n. Then
[−C, C]n = (S1 ∪ · · · ∪ Sk ) × · · · × (S1 ∪ · · · ∪ Sk ) = ⋃_{i1 ,...,in =1}^{k} Si1 × · · · × Sin = ⋃_{j=1}^{k^n} Fj ,
where each Fj is a closed cube of side 2C/k. Then the diameter of each Fj , which equals
the length of the diagonal of Fj , equals (2C/k)√n < ε. Let xj ∈ Fj be arbitrary. Then
Fj ⊆ Bε (xj ). It follows that E ⊆ [−C, C]n ⊆ ⋃_{j=1}^{k^n} Bε (xj ).
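The grid construction is concrete enough to run. The sketch below carries it out in dimension n = 2, taking the cube centers as the points xj (the proof allows any xj ∈ Fj ); the particular values of C and ε are illustrative.

```python
import itertools, math, random

# Grid from Lemma 14.27 in dimension n = 2: cover [-C, C]^2 by k^2 closed
# squares of side 2C/k, with k > 2C*sqrt(n)/eps so each square has diameter < eps.
n, C, eps = 2, 5.0, 0.7
k = int(2 * C * math.sqrt(n) / eps) + 1
side = 2 * C / k
centers = [tuple(-C + (i + 0.5) * side for i in idx)
           for idx in itertools.product(range(k), repeat=n)]

# Every point of a bounded sample set lies within eps of some center.
random.seed(0)
sample = [(random.uniform(-C, C), random.uniform(-C, C)) for _ in range(500)]
for p in sample:
    assert min(math.dist(p, c) for c in centers) < eps
```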
We now return to our development of the properties of compactness.
Exercise 15.2. One can also prove this theorem using sequences and sequential compactness.
Corollary 15.3. If X is compact and f : X → Y is continuous, then f (X) is a closed
bounded subset of Y (in fact, totally bounded).
Corollary 15.4. (Extreme value theorem) Let X be a compact metric space, and let f :
X → R be continuous. Then f achieves its maximum and minimum at points of X: there
exist x0 , x1 ∈ X such that for all x ∈ X, f (x0 ) ≤ f (x) ≤ f (x1 ).
Proof. A (non-empty) closed bounded subset of R contains its infimum and supremum.
Corollary 15.5. A continuous (R-valued) function on a closed bounded interval has a maximum and a minimum.
Definition 15.6. Let X and Y be metric spaces, and let f : X → Y . f is an open map if
f (A) is an open subset of Y whenever A is an open subset of X. f is a closed map if f (A)
is a closed subset of Y whenever A is a closed subset of X.
Remark 15.7. Note that the above definitions refer to the forward set map defined by f ,
which is less well behaved than the reverse set map. For the reverse map, the analogous
properties are equivalent to continuity (Theorem 8.5 and Exercise 8.6).
Theorem 15.8. Let X be compact, and let f : X → Y be continuous. Then f is a closed
map.
Proof. The proof is an exercise.
Example 15.9. (1) Let T be the unit circle, and let f : [0, 1] → T be given by f (t) =
(cos 2πt, sin 2πt). Then f is continuous but not an open map: [0, 1/2) is an open
subset of [0, 1], but f ([0, 1/2)) is not an open subset of T, since it contains its non-interior
point (1, 0).
(2) Define g : [0, ∞) → T by g(t) = (cos(2πt/(t + 1)), sin(2πt/(t + 1))). Then g is bijective
and continuous, but is neither a closed map nor an open map: [1, ∞) is a closed
subset of [0, ∞), but g([1, ∞)) is not a closed subset of T since it does not contain
its limit point (1, 0). As in the previous example, [0, 1) is an open subset of [0, ∞),
but g([0, 1)) is not an open subset of T.
Theorem 15.10. Let X and Y be metric spaces with X compact, and let f : X → Y be
continuous and bijective. Then f is an open map.
Proof. Let U ⊆ X be open. Then U c is closed, hence compact. Therefore f (U c ) is compact,
hence closed. But f (U c ) = f (U )c since f is bijective. Therefore f (U ) is open.
Corollary 15.11. In the above theorem, f −1 is continuous.
16. Connectedness
Let’s recall for a moment Example 8.8(5): T and [0, 1] are not homeomorphic metric
spaces. How might we go about proving this? A clever observation is the following: if we
remove a point from T, the result is still “one piece” (in fact, it is easy to see that for any
z ∈ T, T \ {z} is homeomorphic to R). On the other hand, if we remove a point from [0, 1]
(other than one of the two endpoints), the result “consists of two pieces”. It is an even
cleverer observation that it is not very easy to say more precisely what we mean by “consists
of two pieces”. For example, any set containing more than one point can be divided into two
nonempty disjoint pieces. But surely, the division [0, 1] \ {1/2} = [0, 1/2) ⊔ (1/2, 1] is a special way
of dividing a set into two pieces. What is special about it?
We need a topological property, and the following is the right one: no sequence in one
of the pieces can converge to a point of the other. Well, this is clearly true of the division
of [0, 1] \ {1/2} described above. But it pushes the problem back over to the other side: can
we prove that it is not possible to divide R into two nonempty disjoint pieces such that no
sequence in one piece can converge to a point of the other piece?
At some point, we just have to bite the bullet and try to prove a hard result. In this section
we will do this, and prove the fact about R stated in the previous paragraph. This is a deep
consequence of the completeness axiom. The relevant property of R is called connectedness.
As the above discussion has indicated, connectedness is a sort of “negative” property. We
will begin with the corresponding “positive” property. First, notice that to say that no sequence in A converges to a point of B is the same thing as saying that Ā ∩ B = ∅. We use this for our definition (notice that disjointness of A and B is implied).
Definition 16.1. Let X be a metric space. We call X separated if there exist nonempty subsets A and B such that A ∪ B = X and Ā ∩ B = ∅ = A ∩ B̄. X is called connected if it is not separated.
Remark 16.2. If E ⊆ X is a subset, we call E separated (or connected) if as a metric space
in its own right E has that property. We note that in the above definition of separation,
if A and B are subsets of E with union equal to E, the closures may be taken relative to
E, or in X — the intersections Ā ∩ B and A ∩ B̄ will be the same. Thus being separated
or connected is an intrinsic property of E; it does not depend on whether E is given as a
subspace of another metric space.
There is another way to describe connectedness. Suppose that the metric space X is separated, and let A and B be subsets as in the definition. Since X = A ∪ B, we know that X = A ∪ B̄. Since A ∩ B̄ = ∅, then A = (B̄)^c. Thus A is an open set in X. Since A ∩ B = ∅ also, we know that B = A^c, hence B is closed. By the symmetry of the situation we know that A is also closed, and B is open.
Definition 16.3. Let X be a metric space. A subset of X is clopen if it is both closed and
open.
Lemma 16.4. The metric space X is separated if and only if it contains a proper nonempty
clopen subset. X is connected if and only if its only clopen subsets are X and ∅.
Proof. The proof is elementary, and we leave it as an exercise.
Remark 16.5. Let X be a metric space, and let E ⊆ X. What does it mean for A ⊆ E to
be relatively clopen in E? We know that A is relatively open in E if and only if A = E ∩ U
for some open set U ⊆ X. Similarly, one can check that A is relatively closed in E if and
only if A = E ∩ K for some closed set K ⊆ X. Thus A is relatively clopen in E if and only if
there are two sets U and K in X, with U open and K closed, such that A = E ∩ U = E ∩ K.
(Note that it is NOT NECESSARILY true that A equals the intersection of E with a clopen
subset of X.)
Exercise 16.6. The Cantor set (Definition 6.1) is not connected.
Thus E ∩ [a, b] and (I \ E) ∩ [a, b] are closed subsets of R. Let c = sup(E ∩ [a, b]). Then c ∈ E ∩ [a, b] since this set is closed. Also, c < b since b ∉ E. Hence (c, b] ⊆ (I \ E) ∩ [a, b], and so c ∈ (I \ E) ∩ [a, b] since this set is closed. This leads to the contradiction c ∈ E ∩ (I \ E).
The following theorem is very useful, and we place it here because it deals with intervals
(although it is not a result about connectedness).
Theorem 16.11. Let U ⊆ R be open. Then U equals the union of countably many open
intervals. Moreover, U can be written as the union of a countable collection of pairwise
disjoint open intervals, and this collection is unique.
Proof. For x ∈ U choose a(x), b(x) ∈ Q with x ∈ (a(x), b(x)) ⊆ U. Let E = {(a(x), b(x)) : x ∈ U}. Then E is a collection of open intervals. Since E ⊆ {(α, β) : α, β ∈ Q, α < β}, a set that injects into Q², we see that E is a countable collection. It is clear that U = ⋃E.
The proof of the second statement of the Theorem is left as an exercise.
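The decomposition in Theorem 16.11 can be sketched computationally. The following is an illustrative helper (not part of the notes) for the special case of finitely many open intervals: sort them and merge overlapping ones into the pairwise disjoint open intervals whose union is U.

```python
# Merge finitely many open intervals (a, b) into the pairwise disjoint
# open intervals making up their union.  Abutting open intervals such as
# (0, 1) and (1, 2) are NOT merged, since their union is not an interval.
def merge_intervals(intervals):
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:    # genuinely overlaps the last piece
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

print(merge_intervals([(0, 2), (1, 3), (5, 6), (5.5, 7)]))   # [(0, 3), (5, 7)]
```

The theorem itself handles countably many intervals; the finite case shown here captures the idea that the disjoint pieces are the "components" of the union.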
(3) By (1) and Exercise 17.13, the closure of C(x) is connected. Since it contains x, the closure of C(x) is one of the sets in the union defining C(x); thus the closure of C(x) is contained in C(x), and C(x) is closed.
(4) Any connected set containing C(x) is one of the sets in the union defining C(x), and hence must equal C(x).
Definition 17.18. Let X be a metric space. A component of X is a maximal connected
subset. Thus the components of X are the sets C(x) from Theorem 17.17.
Theorem 17.19. Let U ⊆ R^n be open. Then U has countably many components, and these are open sets.
Proof. Let x ∈ U , and y ∈ C(x). Since U is open there is r > 0 such that Br (y) ⊆ U . Then
C(x) ∪ Br (y) is connected by Lemma 17.15 (and Corollary 17.9). Then C(x) ∪ Br (y) ⊆ C(x)
by the definition of C(x), hence Br (y) ⊆ C(x). Thus C(x) is open.
Since the components of U are open, we may choose an element of Q^n in each one. This defines a map from the set of components to Q^n. Since the distinct components are disjoint, this map is one-to-one. Since Q^n is countable, so is the set of components.
(3) Let h : (0, 1) → R be given by h(t) = sin(1/t). Then h is not uniformly continuous.
Proof. We choose ε = 2. Let δ > 0 be given. Choose n > 1/√δ. Let s = 2/[(2n + 1)π] and let t = 2/[(2n + 3)π]. Then
|s − t| = (2/π)(1/(2n + 1) − 1/(2n + 3)) = (2/π) · 2/((2n + 1)(2n + 3)) ≤ 1/n² < δ.
But |h(s) − h(t)| = |1 − (−1)| = 2 ≥ ε. Therefore h is not uniformly continuous.
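A quick numerical companion to the proof above: the points s_n = 2/((2n+1)π) and t_n = 2/((2n+3)π) get arbitrarily close as n grows, while the values of h at them stay exactly 2 apart.

```python
import math

# Witness pairs for the non-uniform continuity of h(t) = sin(1/t):
# the gap |s - t| shrinks like 1/n^2, but |h(s) - h(t)| = 2 for every n.
def h(t):
    return math.sin(1.0 / t)

for n in (10, 100, 1000):
    s = 2.0 / ((2 * n + 1) * math.pi)
    t = 2.0 / ((2 * n + 3) * math.pi)
    print(n, abs(s - t), abs(h(s) - h(t)))
```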
The following theorem is a classic use of compactness to get a global result from local
information.
Theorem 18.3. Suppose f : X → Y is continuous, and X is compact. Then f is uniformly
continuous.
Proof. Let ε > 0 be given. Since f is continuous, for each x ∈ X there is r_x > 0 such that f(B_{r_x}(x)) ⊆ B_{ε/2}(f(x)). The collection {B_{r_x/2}(x) : x ∈ X} is an open cover of X. Since X is compact, there are x_1, . . ., x_n ∈ X such that X = ⋃_{i=1}^n B_{r_{x_i}/2}(x_i). Let δ = min{r_{x_i}/2 : 1 ≤ i ≤ n}. Let y, z ∈ X with d(y, z) < δ. There is i such that d(y, x_i) < r_{x_i}/2. Then d(z, x_i) ≤ d(z, y) + d(y, x_i) < δ + r_{x_i}/2 ≤ r_{x_i}. Then f(y), f(z) ∈ B_{ε/2}(f(x_i)), so that d(f(y), f(z)) < ε.
19. Convergence of functions
Definition 19.1. Let X be a set. (Note that we really do mean set. Later we will let X
be a metric space, but for now, that is not relevant.) Let fn : X → Rk for n = 1, 2, 3, . . ..
(We remark that Rk may be replaced by another metric space. For ease of exposition, we
restrict our attention to the case where the codomain is Euclidean space.) For a ∈ X we say that (fn) converges at a if (fn(a))_{n=1}^∞ is a convergent sequence in R^k. If (fn) converges at each point of X, define f : X → R^k by f(x) = lim_{n→∞} fn(x). We say that (fn) converges to
f (pointwise).
We may specify this more precisely as: for every ε > 0, for every x ∈ X, there exists
n0 ∈ N such that for all n ≥ n0, ‖fn(x) − f(x)‖ < ε. (Note that n0 ≡ n0(ε, x) depends on
both ε and on x.)
Example 19.2. (1) Let fn : [0, 1] → R be given by fn(x) = (1/n)x. Then fn → 0.
(2) Let gn : [0, 1] → R be given by gn(x) = x^n. Then gn → g, where
g(x) = 0 if x < 1, and g(x) = 1 if x = 1.
Definition 19.3. Let f, fn : X → R^k. We say that (fn) converges to f uniformly (on X) if for each ε > 0, there exists n0 ∈ N such that for every x ∈ X, and for every n ≥ n0, ‖fn(x) − f(x)‖ < ε. (Note that n0 ≡ n0(ε) depends only on ε.)
Formally, the difference between pointwise convergence and uniform convergence is only in the order of the two quantified variables n0 and x. The difference, practically, is profound, and it is important that you get a good feel for it.
Example 19.4. (1) (1/n)x → 0 uniformly on [0, 1].
(2) x^n ↛ 0 uniformly on [0, 1].
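The two parts of this example can be checked numerically. For f_n(x) = x/n the sup of |f_n| over [0, 1] is 1/n, which tends to 0; for f_n(x) = x^n the point x_n = 2^(−1/n) < 1 satisfies f_n(x_n) = 1/2 while the pointwise limit g vanishes there, so sup |f_n − g| ≥ 1/2 for every n.

```python
# Uniform convergence of x/n versus the failure of x^n to converge uniformly.
for n in (1, 10, 100, 1000):
    sup_linear = 1.0 / n            # sup of |x/n| over [0, 1], attained at x = 1
    x_n = 0.5 ** (1.0 / n)          # a witness point strictly below 1
    print(n, sup_linear, x_n ** n)  # last column stays at 0.5 for every n
```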
(where the first and third occurrences of ε/3 are due to the uniform approximation of f by
fn , and the second is due to the continuity of fn at a). Therefore f is continuous at a.
Corollary 19.8. The uniform limit of continuous functions is continuous.
Example 19.9. (1) Consider the sequence of functions x^n on [0, 1]. We have seen that this sequence has a pointwise limit, which is not continuous. Since x^n is continuous for each n, the theorem implies that the convergence is not uniform (this is an easier proof than the direct proof we gave earlier).
(2) The above argument cannot be used in reverse. For example, let fn : [0, 1] → R be given by
fn(x) = 2nx, if 0 ≤ x ≤ 1/(2n); −2n(x − 1/n), if 1/(2n) ≤ x ≤ 1/n; 0, if 1/n ≤ x ≤ 1.
(It will be helpful to draw a picture.) Then fn → 0 pointwise on [0, 1], but not uniformly, even though the limit is continuous.
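The tent functions from this example can be evaluated directly: for each fixed x the values f_n(x) are eventually 0, while the peak value f_n(1/(2n)) is always 1, so the sup never shrinks.

```python
# The tent function f_n: climbs from 0 to 1 on [0, 1/(2n)], descends back
# to 0 on [1/(2n), 1/n], and is identically 0 on [1/n, 1].
def f(n, x):
    if 0 <= x <= 1 / (2 * n):
        return 2 * n * x
    if x <= 1 / n:
        return -2 * n * (x - 1 / n)
    return 0.0

x = 0.3
print([f(n, x) for n in (1, 10, 100)])             # [0.6, 0.0, 0.0]: f_n(x) -> 0
print([f(n, 1 / (2 * n)) for n in (1, 10, 100)])   # the peak is always 1.0
```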
Example 19.10. Recall function space from Example 4.6: if X is a set, B(X, R^k) is the vector space of all bounded functions from X to R^k. B(X, R^k) is a normed vector space, with norm given by ‖f‖ = sup_{x∈X} ‖f(x)‖. Thus B(X, R^k) is a metric space.
Proposition 19.11. Let f , fn : X → Rk be bounded functions.
(1) fn → f in B(X, Rk ) if and only if fn → f uniformly on X.
(2) (fn ) is Cauchy in B(X, Rk ) if and only if (fn ) is uniformly Cauchy on X.
Proof. This follows immediately from the definitions.
Corollary 19.12. B(X, Rk ) is a complete metric space.
Proof. This follows from Proposition 19.6 and the above proposition.
Definition 19.13. Let X be a metric space. Cb (X, Rk ) is the space of all bounded continuous
functions from X to Rk .
Note that Cb (X, Rk ) is a vector subspace of B(X, Rk ), since the sum and (scalar) product
of continuous functions is continuous.
Proposition 19.14. Cb (X, Rk ) is a complete metric space.
Proof. This follows from Corollary 19.8.
Remark 19.15. If X is a compact metric space, then C(X, Rk ) = Cb (X, Rk ).
20. Differentiation
Definition 20.1. Let I ⊆ R be open, let f : I → R, and let a ∈ I. f is differentiable at a if
lim_{x→a} (f(x) − f(a))/(x − a)
exists (equivalently, if lim_{h→0} (f(a + h) − f(a))/h exists). The limit is called the derivative of f at a, and is denoted f′(a) (or df/dx(a), or (df/dx)|_{x=a}). We say that f is differentiable on I if it is differentiable at each point of I. We refer to the quantity (f(x) − f(a))/(x − a) as the difference quotient.
Suppose that f is differentiable at a. Let L(x) = f(a) + f′(a)(x − a) (L is a “linear function”, in that its graph is a straight line). The function f is well-approximated by L in the following sense:
(3) f(a) = L(a)
(4) lim_{x→a} (f(x) − L(x))/(x − a) = 0.
Remark 20.2. There exists at most one linear function L having these properties. Unique-
ness is an exercise, while existence is equivalent to differentiability.
There is a third equivalent formulation of differentiability. We motivate it as follows. Let f be differentiable at a. Define u : I → R by
u(x) = (f(x) − f(a) − f′(a)(x − a))/(x − a), if x ≠ a; 0, if x = a.
Then lim_{x→a} u(x) = lim_{x→a} (f(x) − L(x))/(x − a) = 0, so that u is continuous at a. Moreover, f(x) = f(a) + f′(a)(x − a) + u(x)(x − a). Thus we see that if f is differentiable at a, then f differs from L by a function that tends to zero as x tends to a, even when divided by x − a.
Theorem 20.3. f is differentiable at a if and only if there exist a linear function L(x) =
m(x − a) + b, and a function u(x), such that
(1) u(a) = 0.
(2) u is continuous at a.
(3) f (x) = L(x) + u(x)(x − a).
In this case, f′(a) = m (and of course, b = f(a)).
Proof. The ‘only if’ direction was proved in the remarks before the statement of the theorem.
For the ‘if’ direction, let L and u be as in the statement of the theorem. Letting x = a in
the third item of the statement gives f (a) = b. Then dividing by x − a, and letting x → a,
we get
lim_{x→a} (f(x) − f(a))/(x − a) = lim_{x→a} (m(x − a) + u(x)(x − a))/(x − a) = lim_{x→a} (m + u(x)) = m.
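A numerical sanity check of Theorem 20.3 for a concrete choice (f = exp at a = 0, where f′(0) = 1): the function u(x) = (f(x) − f(0) − x)/x should tend to 0 as x → 0.

```python
import math

# u from Theorem 20.3 for f = exp at a = 0; u(x) should vanish as x -> 0.
def u(x):
    return (math.exp(x) - 1.0 - x) / x

for x in (0.1, 0.01, 0.001):
    print(x, u(x))    # behaves like x/2, so it shrinks with x
```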
Theorem 20.7. (The chain rule.) Let I, J ⊆ R be open, let f : I → R and g : J → R, let
a ∈ I, suppose that f (a) ∈ J, and suppose that f is differentiable at a and g is differentiable
at f(a). Then g ◦ f is differentiable at a, and (g ◦ f)′(a) = g′(f(a)) f′(a).
Proof. We apply Theorem 20.3 to f and g to obtain functions u : I → R and v : J → R
such that
(1) u and v vanish at a and f (a), respectively.
(2) u and v are continuous at a and f (a), respectively.
(3)
f(x) = f(a) + f′(a)(x − a) + u(x)(x − a)
g(y) = g(f(a)) + g′(f(a))(y − f(a)) + v(y)(y − f(a)).
Substituting y = f(x) and using the first equation, we obtain
g(f(x)) = g(f(a)) + g′(f(a))(f′(a)(x − a) + u(x)(x − a)) + v(f(x))(f′(a)(x − a) + u(x)(x − a))
= g(f(a)) + g′(f(a)) f′(a)(x − a) + [g′(f(a)) u(x) + v(f(x)) f′(a) + v(f(x)) u(x)](x − a).
Then by Theorem 20.3 it suffices to show that the expression in square brackets vanishes and is continuous at x = a. We check this for each of the three terms separately. It is true for the first term because it is true for u. It is true for the second term because f is continuous at a (by Lemma 20.4), v is continuous and vanishes at f(a), and Theorem 8.11 applies. It is true for the third term by both of the above.
We now draw out some consequences of differentiability on intervals. First we give a
general definition.
Definition 20.8. Let X be a metric space, let U ⊆ X be open, let a ∈ U and let f : U → R.
f has a local maximum (respectively local minimum) at a if there is r > 0 such that for all
x ∈ Br (a) we have f (x) ≤ f (a) (respectively, f (x) ≥ f (a)). Local maxima and minima are
called local extrema.
Lemma 20.9. Let I ⊆ R be an open interval, let a ∈ I, and let f : I → R. Suppose that f
is differentiable at a. If f has a local extremum at a, then f′(a) = 0.
Proof. We prove the contrapositive. Suppose that f′(a) ≠ 0. For definiteness we assume f′(a) > 0 (the proof in the case f′(a) < 0 is analogous). We then have that lim_{x→a} (f(x) − f(a))/(x − a) > 0. Then there is δ > 0 such that (a − δ, a + δ) ⊆ I, and such that for x ∈ I, if 0 < |x − a| < δ then (f(x) − f(a))/(x − a) > 0. Now, for any x with a − δ < x < a, we have x − a < 0. Since the difference quotient is positive, we must have f(x) − f(a) < 0; thus f does not have a local minimum at a. Similarly, for any x with a < x < a + δ, we have x − a > 0. Again, since the difference quotient is positive, we must have f(x) − f(a) > 0;
thus f does not have a local maximum at a. Therefore, f does not have a local extremum
at a.
This lemma has several famous applications.
Theorem 20.10. (Rolle’s theorem) Let f : [a, b] → R be continuous, and assume that f is
differentiable on (a, b). Suppose further that f (a) = f (b) = 0. Then there exists c ∈ (a, b)
such that f′(c) = 0.
Rolle’s theorem is a special case of the following theorem.
Theorem 20.11. (Mean value theorem) Let f : [a, b] → R be continuous, and assume that f is differentiable on (a, b). Then there exists c ∈ (a, b) such that f′(c) = (f(b) − f(a))/(b − a).
The idea of the theorem, and the proof, is easy to see from a simple sketch: on the graph of f, draw the straight line between the endpoints of the graph, (a, f(a)) and (b, f(b)). Let
L(x) be the linear function whose graph passes through these two points. The point c in the
theorem is (one of) the place(s) where the vertical distance between the graphs of f and L
is stationary, i.e. has a local extremum. A little algebraic manipulation of the expression
f (x) − L(x) yields the beginning of the following proof.
Proof. Let h(x) = (f(x) − f(a))(b − a) − (f(b) − f(a))(x − a). Then h is continuous on [a, b] and differentiable on (a, b). Also h(a) = h(b) = 0. By the extreme value theorem (Corollary 15.4), h takes on its maximum and minimum values on [a, b]. We note that at least one of these occurs in the interior (a, b). For if both occur at the endpoints, then h must be identically zero, and hence achieves its maximum and minimum at every point of [a, b]. Let c ∈ (a, b) be such a point. By Lemma 20.9 we have h′(c) = 0. Differentiating h gives h′(x) = f′(x)(b − a) − (f(b) − f(a)). Then the equation h′(c) = 0 gives the desired result.
Remark 20.12. There is an alternate phrasing of the mean value theorem that is often
convenient. Let f : I → R be differentiable, where I is an open interval. Let a ∈ I and
h ∈ R \ {0} be such that a + h ∈ I. If we wish to apply the mean value theorem to the closed
interval having a and a + h as endpoints, we would like to express the conclusion without
declaring which is the left, and which the right, endpoint. We avoid this inconvenience in the
following way: the point c lies (strictly) between a and a + h if and only if there is a number
0 < θ < 1 such that c = a + θh. Thus we reexpress the mean value theorem in the following
way: if a, a + h ∈ I then there exists 0 < θ < 1 such that f(a + h) = f(a) + h f′(a + θh).
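For a concrete instance of the θ-form: with f(x) = x², expanding f(a + h) = f(a) + h·f′(a + θh) gives a² + 2ah + h² = a² + 2h(a + θh), so h² = 2θh² and θ = 1/2 regardless of a and h. A small check (the helper name is ours, not the notes'):

```python
# Solve (a+h)^2 - a^2 = h * 2*(a + θh) for θ; the answer is always 1/2.
def theta(a, h):
    return ((a + h) ** 2 - a ** 2 - 2 * a * h) / (2 * h * h)

print(theta(1.0, 0.5), theta(-3.0, 2.0))   # 0.5 0.5
```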
Now we give some corollaries of the mean value theorem.
Corollary 20.13. Let I ⊆ R be an open interval, and let f : I → R be differentiable. If f′ = 0 on I, then f is constant on I.
Proof. Let x0 ∈ I, and apply the mean value theorem to the interval between x0 and x, for any x ∈ I. We find that there is c strictly between x0 and x such that f(x) − f(x0) = f′(c)(x − x0) = 0. Thus f(x) = f(x0) for all x ∈ I.
Corollary 20.14. Let I be as in the previous corollary, and let f , g : I → R be differentiable.
If f′ = g′ on I, then f − g is a constant function.
Proof. Apply the previous corollary to f − g.
a. A similar argument shows that the minimum does not occur at b. Hence f has a local minimum in the open interval (a, b), and at this point f′ = 0.
Proof. (of Theorem 20.19) By Theorem 20.20 we know that f′ > 0 on I, or that f′ < 0 on I. By Corollary 20.16 it follows that f is strictly monotone on I. It follows from the intermediate value theorem that f(I) is an open interval, and that f⁻¹ is continuous. We now
show that f⁻¹ is differentiable, and compute its derivative. For x ∈ I let y = f(x) ∈ f(I). For w ∈ f(I) with w ≠ y, there is t ∈ I such that w = f(t). Since f is one-to-one, t ≠ x. We have
(f⁻¹(w) − f⁻¹(y))/(w − y) = (t − x)/(f(t) − f(x)) = ((f(t) − f(x))/(t − x))⁻¹.
Since f⁻¹ is continuous, lim_{w→y} t = x. Moreover t ≠ x during this limiting process. Therefore
lim_{w→y} (f⁻¹(w) − f⁻¹(y))/(w − y) = lim_{t→x} ((f(t) − f(x))/(t − x))⁻¹ = 1/f′(x) = 1/f′(f⁻¹(y)).
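The formula (f⁻¹)′(y) = 1/f′(f⁻¹(y)) can be checked numerically on a familiar pair, f = exp and f⁻¹ = log: at y = 2 the right-hand side is 1/exp(log 2) = 1/2, and the difference quotient of log at 2 indeed approaches 1/2.

```python
import math

# Difference quotient of log at y; it should approach 1/y as dy -> 0.
def dq_log(y, dy):
    return (math.log(y + dy) - math.log(y)) / dy

for dy in (1e-2, 1e-4, 1e-6):
    print(dy, dq_log(2.0, dy))   # approaches 0.5
```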
Corollary 20.21. If f is C^r (in addition to the hypotheses of the inverse function theorem), then so is f⁻¹.
Proof. The formula for (f⁻¹)′ shows that it is continuous if f′ is continuous. Similarly, it is differentiable if f′ is differentiable, etc.
If you consider the function h used in the proof of the mean value theorem, you will notice
the beginnings of some symmetry: the function f and the identity function play opposite
roles. Remarkably, the identity function can be replaced by another function like f . The
result is
Theorem 20.22. (Cauchy mean value theorem.) Let f, g : [a, b] → R be continuous, and differentiable on (a, b). Then there exists c ∈ (a, b) such that (f(b) − f(a)) g′(c) = (g(b) − g(a)) f′(c).
Proof. Let h(t) = (f(b) − f(a))(g(t) − g(a)) − (f(t) − f(a))(g(b) − g(a)). Then h is continuous on [a, b], differentiable on (a, b), and h(a) = h(b) = 0. Now the mean value theorem gives the result.
We apply Cauchy’s mean value theorem to prove L’Hôpital’s rule on the computation of
indeterminate limits. The proof applies to any form of continuous limit — here we phrase
it for one-sided limits.
Theorem 20.23. (L’Hôpital’s rule.) Let f, g : (a, b) → R be differentiable. Suppose that lim_{t→a+} f(t) = lim_{t→a+} g(t) = 0, and that g(t) ≠ 0 on (a, b). If lim_{t→a+} f′(t)/g′(t) = L, then lim_{t→a+} f(t)/g(t) = L.
Proof. Define f(a) = g(a) = 0. Then f and g are continuous on [a, b). By the hypothesis on the limit of f′/g′, we are implicitly assuming that g′(t) ≠ 0, at least for all t close enough to a. Replacing b by a smaller value, we may assume that g′ ≠ 0 on (a, b). Now, for t ∈ (a, b), we apply Cauchy’s mean value theorem to f and g on the interval [a, t]. Thus there exists c ∈ (a, t) with (f(t) − f(a)) g′(c) = (g(t) − g(a)) f′(c). Since f(a) = g(a) = 0, we get f(t) g′(c) = g(t) f′(c). By hypothesis we have g(t) ≠ 0. Thus we have
f(t)/g(t) = f′(c)/g′(c).
Moreover, a < c < t. Of course, c depends on t, but we see that as t → a+ then also c → a+. Hence
lim_{t→a+} f(t)/g(t) = lim_{c→a+} f′(c)/g′(c) = L.
With a bit more work, the same result can be proved in the case where we assume lim_{t→a+} f(t) = lim_{t→a+} g(t) = ∞. This is an interesting exercise, or you may look up the proof (e.g. in Rudin). If lim_{t→a+} f(t) = 0 while lim_{t→a+} g(t) = ±∞, evaluating the limit lim_{t→a+} f(t)g(t) presents us with the third kind of indeterminate form, namely 0 · ∞. In this case, we would instead consider the limit of f/(1/g), which is indeterminate of form 0/0.
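A quick numerical illustration of the rule, with f(t) = 1 − cos t and g(t) = t² as t → 0+: both tend to 0, f′(t)/g′(t) = sin t/(2t) → 1/2, and indeed f/g approaches 1/2 as well.

```python
import math

# Compare f/g with f'/g' for f = 1 - cos t, g = t^2 near t = 0+.
for t in (0.5, 0.05, 0.005):
    print(t, (1 - math.cos(t)) / t ** 2, math.sin(t) / (2 * t))
```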
We see by this lemma that it is easy to find a polynomial that approximates f well at the
point a. It is not as easy to see how well this polynomial approximates f near the point a.
For this, we have Taylor’s Theorem. One can think of it as the generalization of the mean
value theorem from order 0 to order k. The proof is a bit tricky; we will use Cauchy’s mean
value theorem.
(2) Define g : R → R by
g(x) = e^{−1/x} if x > 0, and g(x) = 0 if x ≤ 0.
It is a nice exercise to show that g has derivatives of all orders at 0 (this is clear at other points of R), and that g^{(j)}(0) = 0 for all j. Thus all Taylor polynomials of g at 0 are identically zero. Therefore the Taylor polynomials of g do not approximate g uniformly in any neighborhood of zero.
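The flatness of g at 0 that underlies this example can be observed directly: g(x)/x^k → 0 as x → 0+ for every k, consistent with all Taylor coefficients of g at 0 vanishing.

```python
import math

# g from the example: e^(-1/x) for x > 0, and 0 otherwise.
def g(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

for k in (1, 5, 10):
    print(k, g(0.01) / 0.01 ** k)   # tiny for every k: g is flatter than any power
```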
Among the first properties of integration that are presented in calculus are the “sum” and
“scalar multiple” rules:
∫(f + g) = ∫f + ∫g,  ∫(cf) = c ∫f.
In fact, these are indicating precisely that integration is a linear functional. Linear algebra
is an essential part of modern analysis, and the analysis of linear functionals, functional
analysis, is one of its broadest subdisciplines.
Well, the notion of linear map presupposes the idea of vector spaces: the domain and
codomain of a linear map should be vector spaces. This is a fundamental idea, that is
almost completely lost in a calculus course: the collection of functions that can be integrated
should be a vector space. To be candid, we don’t really talk at all about the “space of
integrable functions” in a calculus course. At best, we try to explain why certain functions
are integrable, e.g. continuous, or piecewise continuous, functions. This time, we will directly
address this question. Not only will we carefully define what integrable means and prove that the set of integrable functions is a vector space; we will also give an independent characterization
(due to Lebesgue) of exactly which functions are integrable. This is useful even just in the
context of Riemann integration. Many important results that would otherwise require fussy
proofs will become effortless (so to speak). But it also prods us to a larger view. Once we
are able to see the space of Riemann integrable functions as a whole, we can also begin to
see its limitations, and where it might give way to generalization. In the next semester we
will spend some time (how much???) exploring Lebesgue’s version of integration.
That is the end of the “introduction”. We have to get started, and the beginning is
very basic — after all, integration is just a lot of arithmetic. We will follow Pugh’s idea
of emphasizing the fact that there are two usual ways to present the integral; he refers to
them as the Riemann and the Darboux approaches. Without any expertise in the history of
mathematics, or any effort at tracking down that history, we will just adopt this terminology.
First we give the Riemann approach. We let f be a real-valued function on a compact interval
[a, b].
Definition 22.1. A partition of [a, b] is a finite set P ⊆ [a, b] such that a, b ∈ P .
The idea of a partition is that it defines a subdivision of [a, b] into a finite number of
subintervals. The easiest way of indicating this is by giving the set of endpoints of the
subintervals, which is what our definition does. We usually write a partition in the form
P = {x0 , x1 , . . . , xn },
where a = x0 < x1 < · · · < xn = b. This is a slight abuse of notation, since the definition
of P as a set does not indicate that the numbers in the set are given in (strictly) increasing
order. From the partition P we obtain n subintervals of [a, b]: [x0 , x1 ], . . ., [xn−1 , xn ]. Note
that the number n associated with P is obtained from the relation n + 1 = #(P ). We use
the term mesh for the length of the largest subinterval: mesh(P ) = max1≤i≤n (xi − xi−1 ).
The mesh is a rough sort of description of how fine the partition is.
Definition 22.2. A partition pair is a partition P together with a list T = (t1 , . . . , tn ) such
that xi−1 ≤ ti ≤ xi for 1 ≤ i ≤ n.
Thus the list T consists of a selection of one element from each subinterval of the partition.
Definition 22.3. Let f : [a, b] → R, and let (P, T ) be a partition pair for the interval [a, b].
The Riemann sum associated to this data is the number
R(f, P, T) = Σ_{i=1}^n f(t_i) Δx_i,
where Δx_i = x_i − x_{i−1}, the length of the ith subinterval.
Now we have the terminology we need to define Riemann integrability and the Riemann
integral. As mentioned above, Riemann sums are just a lot of (carefully organized) arith-
metic. To pass to the integral is a limiting process. The following definition is the usual
notion of limit, but is based on the mesh.
Definition 22.4. The function f : [a, b] → R is Riemann integrable if there is a number L
such that for every ε > 0, there exists δ > 0 such that for every partition pair (P, T ) of [a, b],
if mesh(P) < δ then |R(f, P, T) − L| < ε.
We write L = lim_{mesh(P)→0} R(f, P, T) to indicate this limit. The number L is unique, if it exists. This is proved in the usual way of limits, and is left to you as an exercise. If f is Riemann integrable, we write ∫_a^b f (or ∫_a^b f dx, or ∫_a^b f(x) dx, or just ∫f) for the number L. We will write R[a, b] for the set of all Riemann integrable functions on [a, b].
There is an important detail hidden in the last definition. For the limit to exist it must
be the case that the approximation holds independently of the choice of the list T in the
partition pair. In other words, if P is a partition with mesh(P ) < δ, then the Riemann sum
is within ε of L for any choice of T .
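Definition 22.4 can be watched in action numerically: for f(x) = x² on [0, 1] (whose integral is 1/3), uniform partitions with randomly chosen tags t_i give Riemann sums that approach 1/3 as the mesh 1/n shrinks, no matter how the tags are chosen.

```python
import random

# Riemann sum for f on [a, b] with a uniform n-piece partition and tags
# drawn at random from each subinterval.
def riemann_sum(f, a, b, n, rng):
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    return sum(f(rng.uniform(xs[i], xs[i + 1])) * dx for i in range(n))

rng = random.Random(0)
for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n, rng))   # -> 1/3
```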
We now give some consequences of the definition.
Theorem 22.5. If f is Riemann integrable then f is bounded.
Proof. We apply the definition of integrability with ε = 1: there exist L and δ > 0 such that if P is any partition with mesh(P) < δ, then |R(f, P, T) − L| < 1. (As we mentioned above, this estimate holds for any choice of T.) It follows from the triangle inequality that
|Σ_{i=1}^n f(t_i) Δx_i| < 1 + |L|.
We will show that f is bounded on each subinterval of [a, b] defined by P. It will then follow that f is bounded on [a, b]. Fix i0 ∈ {1, 2, . . . , n}. For i ≠ i0 choose t_i ∈ [x_{i−1}, x_i]. For any t ∈ [x_{i0−1}, x_{i0}] we apply the above inequality to the list T = (t_1, . . . , t_{i0−1}, t, t_{i0+1}, . . . , t_n):
|f(t) Δx_{i0}| − |Σ_{i≠i0} f(t_i) Δx_i| ≤ |R(f, P, T)| < 1 + |L|.
We find that
|f(t)| ≤ (Δx_{i0})⁻¹ (1 + |L| + |Σ_{i≠i0} f(t_i) Δx_i|).
Thus the right hand side is an upper bound for |f| on [x_{i0−1}, x_{i0}].
Theorem 22.6. R[a, b] is a vector space, and integration defines a linear functional on it.
Proof. We note that for a fixed partition pair (P, T), the Riemann sum is linear in f:
R(cf + g, P, T) = Σ_i (cf + g)(t_i) Δx_i = Σ_i (c f(t_i) + g(t_i)) Δx_i = c Σ_i f(t_i) Δx_i + Σ_i g(t_i) Δx_i = c R(f, P, T) + R(g, P, T).
Since addition and multiplication in R are continuous, we get
lim_{mesh(P)→0} R(cf + g, P, T) = lim_{mesh(P)→0} (c R(f, P, T) + R(g, P, T)) = c ∫f + ∫g.
For a bounded function f : [a, b] → R and a partition P = {x0, x1, . . . , xn} of [a, b], set
m_i = inf_{x_{i−1} ≤ t ≤ x_i} f(t),  M_i = sup_{x_{i−1} ≤ t ≤ x_i} f(t),
L(f, P) = Σ_i m_i Δx_i,  U(f, P) = Σ_i M_i Δx_i.
These are referred to as lower and upper sums. Notice that for any partition pair (P, T )
we have that L(f, P ) ≤ R(f, P, T ) ≤ U (f, P ). Finally we define
I̲(f) = sup_P L(f, P),  Ī(f) = inf_P U(f, P).
These are referred to as the lower and upper integrals of f on [a, b]. It is standard to write ∫_a^b f with a bar underneath for I̲(f), and with a bar above for Ī(f). Finally, we say that f is Darboux integrable on [a, b] if I̲ = Ī, and in this case the common value is called the (Darboux) integral.
Our goal for this section is to prove that the Riemann and Darboux approaches yield the
same result. Before doing this we need to talk a bit about refinements of partitions, and
their effect on upper and lower sums and integrals.
Definition 23.2. Let P and P′ be partitions of [a, b]. We say that P′ refines P if P ⊆ P′.
It is easy to see that P′ refines P if and only if every subinterval associated to P′ is contained in one of the subintervals associated to P.
Lemma 23.3. (Refinement Principle) Let P′ refine P. Then L(f, P) ≤ L(f, P′) and U(f, P′) ≤ U(f, P).
In other words, refining the partition causes the lower sum to increase, and the upper sum
to decrease. The idea of the proof is to proceed from P to P′ by adding one point at a time.
Then the change in the lower and upper sums happens on only one subinterval of P . We
leave as an exercise the writing of a precise proof.
In general, if P1 and P2 are two partitions of [a, b], then neither one need refine the other.
Thus there is in general no relation between the upper and lower sums for two partitions.
However, P1 and P2 always have a common refinement; for example, P1 ∪ P2 contains both
P1 and P2 . This device gives us the following important result: every lower sum for f is less
than or equal to every upper sum for f .
Lemma 23.4. Let P1 and P2 be two partitions of [a, b]. Then L(f, P1 ) ≤ U (f, P2 ).
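The Refinement Principle and Lemma 23.4 can both be seen in a small computation. For an increasing function (here f(x) = x², increasing on [0, 1]) the inf and sup over each subinterval are just the endpoint values, which makes the lower and upper sums easy to evaluate; the helper below is ours, not notation from the notes.

```python
# Lower and upper Darboux sums for an increasing f on a sorted partition P.
def darboux(f, P):
    L = sum(f(P[i - 1]) * (P[i] - P[i - 1]) for i in range(1, len(P)))
    U = sum(f(P[i]) * (P[i] - P[i - 1]) for i in range(1, len(P)))
    return L, U

f = lambda x: x * x
P = [0.0, 0.5, 1.0]
Pref = [0.0, 0.25, 0.5, 0.75, 1.0]       # a refinement of P
print(darboux(f, P))                     # (0.125, 0.625)
print(darboux(f, Pref))                  # (0.21875, 0.46875): L went up, U went down
```

Note that every lower sum is below every upper sum, as Lemma 23.4 asserts, and both squeeze toward the integral 1/3 as the partition refines.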
Ī = lim_{P→∞} U(f, P).
If f is Darboux integrable, then we have ∫f = lim_{P→∞} L(f, P) = lim_{P→∞} U(f, P).)
We are now ready to prove the main theorem of this section.
Theorem 23.6. Let f : [a, b] → R. Then f is Riemann integrable if and only if f is Darboux
integrable. For an integrable function, the two integrals coincide.
Proof. We first assume that f is Riemann integrable. Let ε > 0. There exist a number L and δ > 0 such that if P is any partition with mesh(P) < δ, then for any list T associated to P we have |R(f, P, T) − L| < ε. Fix any partition P with mesh(P) < δ. Then we have (for any T)
L − ε < R(f, P, T) < L + ε.
Recall that for any partition pair (P, T ), we have L(f, P ) ≤ R(f, P, T ) ≤ U (f, P ). Moreover,
it is easy to see that
L(f, P) = inf_T R(f, P, T) and U(f, P) = sup_T R(f, P, T).
It follows that
L − ε ≤ L(f, P )
L + ε ≥ U (f, P ).
Therefore U (f, P ) − L(f, P ) ≤ 2ε. Hence f is Darboux integrable.
Now we assume that f is Darboux integrable. The proof of this direction is a bit trickier than the other one. In particular, it relies upon the standard technique of dividing the sum into two kinds of terms, and estimating them differently. Since f is bounded, there is K such that |f| ≤ K on [a, b]. Let L = ∫_a^b f (the Darboux integral of f). Let ε > 0. Choose a partition P such that
U(f, P) − L(f, P) < ε.
Write P = {x0, x1, . . . , xn}. Set δ = ε/n. We will show that if (Q, T) is any partition pair with mesh(Q) < δ, then |R(f, Q, T) − L| < (2K + 1)ε, proving Riemann integrability (and also showing that the two integrals coincide). In fact, it will suffice to show that U(f, Q) − L(f, Q) < (2K + 1)ε, since both L and R(f, Q, T) lie between the lower and upper sums.
So let Q = {y0 , y1 , . . . , yk } have mesh less than δ. We will write Ii = [xi−1 , xi ] for 1 ≤ i ≤ n,
and Jj = [yj−1 , yj ] for 1 ≤ j ≤ k. We divide the subintervals associated to Q into two groups
as follows:
S1 = {j : there exists i with xi ∈ int(Jj )}
S2 = {1, 2, . . . , k} \ S1 .
Thus S2 indicates those Jj ’s that are entirely contained in one of the Ii ’s; S1 indicates those
Jj ’s that straddle more than one of the Ii ’s. There are at most n elements in S1 (in fact,
there are at most n − 1). Now we will use m(I) and M (I) for the infimum and supremum
of f over an interval I. For j ∈ S1 we have
−K ≤ m(Jj ) ≤ M (Jj ) ≤ K.
For j ∈ S2 there is i such that Jj ⊆ Ii . Then
m(Ii ) ≤ m(Jj ) ≤ M (Jj ) ≤ M (Ii ).
Hence for this j and i we have
M (Jj ) − m(Jj ) ≤ M (Ii ) − m(Ii ).
Now we estimate:
U(f, Q) − L(f, Q) = Σ_{j=1}^k (M(J_j) − m(J_j)) Δy_j
= Σ_{j∈S1} (M(J_j) − m(J_j)) Δy_j + Σ_{j∈S2} (M(J_j) − m(J_j)) Δy_j
≤ Σ_{j∈S1} 2K Δy_j + Σ_{i=1}^n Σ_{j∈S2, J_j⊆I_i} (M(I_i) − m(I_i)) Δy_j
< 2Knδ + Σ_{i=1}^n (M(I_i) − m(I_i)) Δx_i
= 2Kε + U(f, P) − L(f, P)
< (2K + 1)ε.
There are various situations where it is fairly easy to prove integrability (or non-integrability) using the Darboux definition. These are useful exercises in working with the definition.
In the next section we will prove a deep theorem that will make them trivial to verify.
Example 23.7. (1) Continuous functions are Riemann integrable.
(2) Monotone functions are Riemann integrable.
(3) Step functions are Riemann integrable. (A step function on [a, b] is a function for
which there exists a partition of [a, b] such that the function is constant on the interior
of each subinterval.) In particular, the characteristic function χ[c,d] of a subinterval [c, d] of [a, b] is Riemann integrable over [a, b], where χE(x) = 1 if x ∈ E and χE(x) = 0 if x ∉ E.
(4) More generally, a bounded function that is continuous at all but finitely many points
of [a, b] is Riemann integrable.
(5) The characteristic function of Q is not Riemann integrable over any interval.
Proof. Proofs for the previous three assertions are left as exercises.
(7) The Cantor set C has measure zero.
Proof. Recall from our construction of C that C = ⋂_{n=1}^∞ Fn, where Fn is the union of 2^n closed intervals, each of length 3^{−n}. Stretching each of these a little bit, we can produce 2^n open intervals Ui, each having length less than (2.5)^{−n} and having union containing Fn (and hence C). Then Σ_{i=1}^{2^n} |Ui| < (2/2.5)^n, which tends to zero as n → ∞.
(8) If a < b then [a, b] does not have measure zero. This is a good exercise, even if it
isn’t homework (but it might be).
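Item (7)'s covering estimate can be checked with a two-line computation; this snippet is only an illustration, not part of the notes.

```python
# Stage-n cover of the Cantor set: 2^n open intervals, each of length
# a bit less than 2.5^(-n); the total length (2/2.5)^n = 0.8^n -> 0.

def cantor_cover_length(n):
    """Upper bound for the total length of the stage-n open cover."""
    return (2 ** n) * 2.5 ** (-n)

for n in (1, 5, 10, 50):
    print(n, cantor_cover_length(n))   # 0.8^n, tending to zero
```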
Before stating the main theorem, we recall the notion of oscillation of a function at a
point. The definition makes sense for a function between general metric spaces, but for
clarity we will state it only for functions whose codomain is R.
Definition 24.3. Let X be a metric space, and let f : X → R. Let a ∈ X. The oscillation
of f at a is
osc(f, a) = inf_{r>0} sup_{x,y∈Br(a)} |f(x) − f(y)|.
This is the precise description of a very natural idea. Let’s briefly take the definition apart.
Fix r > 0. This defines an open ball about a. How much can the function vary over this
ball? The supremum in the parentheses is exactly how much. If we let r become smaller,
then the ball becomes smaller, so that there are fewer points in the ball to put inside of f .
Thus as r decreases, the supremum also decreases. In fact, the infimum over r is actually
equal to the limit as r → 0. This limiting value is the minimum amount that f can be made
to jump, no matter to how small a ball (centered at a) you confine its argument. That is
what we mean by the oscillation at a.
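The limiting process in the definition can be imitated numerically. In this sketch (not part of the notes) the grid-based estimate of the supremum, the jump function, and the sample sizes are all illustrative choices.

```python
# Estimate sup_{x,y in B_r(a)} |f(x) - f(y)| = max f - min f on a grid
# over the ball B_r(a), for shrinking r.  For a jump discontinuity the
# estimate stabilizes at the jump size; at a point of continuity it
# shrinks to 0.

def osc_estimate(f, a, r, samples=1001):
    """Grid approximation of sup over B_r(a) of |f(x) - f(y)|."""
    xs = [a - r + 2 * r * k / (samples - 1) for k in range(samples)]
    vals = [f(x) for x in xs]
    return max(vals) - min(vals)

jump = lambda x: 1.0 if x >= 0 else 0.0
for r in (1.0, 0.1, 0.001):
    print(r, osc_estimate(jump, 0.0, r))      # stays 1.0: osc(jump, 0) = 1
print(osc_estimate(lambda x: x * x, 0.0, 0.001))  # tiny: x^2 is continuous at 0
```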
We can think of the oscillation of f at a as a measure of the size of the discontinuity of f
at a. That is an interpretation of the first part of the following lemma (which should have
been homework earlier in the semester).
Lemma 24.4. Let X be a metric space, let f : X → R, and let a ∈ X.
(1) f is continuous at a if and only if osc(f, a) = 0.
(2) For c > 0, {x ∈ X : osc(f, x) ≥ c} is a closed set.
Theorem 24.5. Let f : [a, b] → R be bounded. Let E be the set of points in [a, b] where f is
discontinuous. Then f is Riemann integrable if and only if E has measure zero.
Proof. We first assume that f is Riemann integrable. Let En = {x ∈ [a, b] : osc(f, x) ≥ 1/n}. By Lemma 24.4 (1), we know that E = ⋃_{n=1}^∞ En. Thus it suffices to show that En has measure zero for each n. So now fix n, let ε > 0, and choose a partition P such that U(f, P) − L(f, P) < ε/n. Let
S = {I : I is a subinterval of P, and int(I) ∩ En ≠ ∅}.
For I ∈ S we have that M(I) − m(I) ≥ 1/n. (The reason is that there must exist a point a ∈ En in the interior of I, so that I ⊇ Br(a) for some r > 0.) But now we estimate
Σ_{I∈S} (1/n)|I| ≤ U(f, P) − L(f, P) < ε/n,
so that Σ_{I∈S} |I| < ε. Now the union ⋃_{I∈S} I contains all points of En except possibly some of the endpoints of subintervals of P not in S. There can be only finitely many such points. Let T be a collection of open intervals centered at these points with total length so small that Σ_{I∈S} |I| + Σ_{J∈T} |J| < ε. Then {int(I) : I ∈ S} ∪ T is a finite collection of open intervals covering En and having total length less than ε. Therefore En has measure zero.
Now we prove the converse. Suppose that E has measure zero. Let |f| ≤ K on [a, b], and let ε > 0 be given. Let E0 = {x ∈ [a, b] : osc(f, x) ≥ ε}. Then E0 ⊆ E, so that E0 also has measure zero. Let U1, U2, . . . be open intervals such that E0 ⊆ ⋃_{i=1}^∞ Ui and Σ_{i=1}^∞ |Ui| < ε. By Lemma 24.4(2), E0 is closed. Since E0 ⊆ [a, b], E0 is compact. Thus there is n such that E0 ⊆ U1 ∪ · · · ∪ Un. Let P0 = (⋃_{i=1}^n ∂Ui ∩ [a, b]) ∪ {a, b}, a partition of [a, b].
We will find a suitable refinement P of P0 such that U (f, P ) − L(f, P ) < (2K + b − a)ε,
which will conclude the proof. Since P0 contains the endpoints of the Ui ’s, each subinterval
associated to P0 is either contained in some Ui , or is disjoint from all of the Ui ’s. Let S1
denote the collection of those subintervals that are contained in some Ui , and let S2 denote
the remaining subintervals. Then for I ∈ S1 we have
M (I) − m(I) ≤ 2K.
Hence
Σ_{I∈S1} (M(I) − m(I))|I| ≤ 2K Σ_{i=1}^n |Ui| < 2Kε.
Now consider a subinterval I ∈ S2 . Then I ∩ E0 = ∅, so the oscillation of f at each point
of I is less than ε. Thus for each x ∈ I there is an open interval Ix centered at x such that
M(Ix) − m(Ix) < ε. The collection {Ix : x ∈ I} is an open cover of the compact interval I, hence has a finite subcover: there are x1, . . ., xk ∈ I such that I ⊆ ⋃_{i=1}^k Ixi. We define P by including into P0 all endpoints of the Ixi that lie in I:
P = P0 ∪ ⋃_{I∈S2} ⋃_i (∂Ixi ∩ I).
Let us consider the subintervals of P contained in some I ∈ S2; let J be one such. Then J ⊆ Ixi for some i, and hence M(J) − m(J) < ε. Therefore
Σ_{I∈S2} Σ_{J⊆I} (M(J) − m(J))|J| < Σ_{I∈S2} Σ_{J⊆I} ε|J| = ε Σ_{I∈S2} |I| ≤ ε(b − a).
We now have
U(f, P) − L(f, P) = Σ_{I∈S1} (M(I) − m(I))|I| + Σ_{I∈S2} Σ_{J⊆I} (M(J) − m(J))|J| < 2Kε + ε(b − a) = (2K + b − a)ε.
Proof. The integrability follows from the previous corollary. It is easy to use the definition
of the integral to show that the integral is zero.
Corollary 24.8. Riemann integrability, and the value of the Riemann integral, of a function
are unaffected when the function is altered at finitely many points.
Proof. The altered function equals the sum of the original function with a function that is
zero except at finitely many points. Thus the previous corollary, together with linearity of
the integral, give the result.
Corollary 24.9. Monotone functions are Riemann integrable.
Proof. This follows from the fact that a monotone function has countably many discontinu-
ities. To see this, note that a monotone function has one-sided limits at all points, and is
discontinuous at a point if and only if the two one-sided limits at that point are distinct. If
we let
f(x±) = lim_{t→x±} f(t),
then for any two discontinuities x ≠ y we have (f(x−), f(x+)) ∩ (f(y−), f(y+)) = ∅. Thus if we let q(x) be a rational number in the interval (f(x−), f(x+)) for each discontinuity x of f, then q is a
one-to-one function from the set of discontinuities into Q. Therefore the set of discontinuities
is countable, and hence of measure zero.
Corollary 24.10. The product of Riemann integrable functions is Riemann integrable.
Proof. If f and g are Riemann integrable, then the set of discontinuities of f g is contained in the union of the sets of discontinuities of f and of g.
Corollary 24.11. Let f be Riemann integrable on [a, b], and let ϕ be a continuous function
defined on the range of f . Then ϕ ◦ f is Riemann integrable (also on [a, b]).
Proof. Since composition preserves continuity, the set of points where f is continuous is
contained in the set of points where ϕ ◦ f is continuous. Hence the sets of discontinuities
satisfy the reverse containment.
Remark 24.12. The order in which the two functions are composed in the previous corollary is crucial: f ◦ ϕ need not be integrable. (You can remember which order preserves integrability by noting that in the corollary, the composition has the same domain as the integrable function.)
Corollary 24.13. If f is Riemann integrable, then so is |f |.
Proof. |f | = | · | ◦ f .
Corollary 24.14. Let f be Riemann integrable on [a, b], and let [c, d] ⊆ [a, b]. Then f is Riemann integrable on [c, d]. Moreover, ∫_c^d f = ∫_a^b f χ[c,d].
Proof. For the first statement, note that any discontinuity of f in [c, d] is also a discontinuity
in [a, b]. The second statement follows easily from either definition of the integral by including
{c, d} into a partition of [a, b].
Corollary 24.15. Let f be Riemann integrable on [a, b], and let c ∈ (a, b). Then
∫_a^b f = ∫_a^c f + ∫_c^b f.
Proof. This follows from linearity of the integral and Corollary 24.8, since f χ[a,b] and f χ[a,c] + f χ[c,b] can differ only at c.
The image of a set of measure zero under a continuous function need not have measure zero. This is a pretty strange phenomenon. The upshot is that continuity is not really such a strong property. It is important, therefore, that a stronger version of continuity does suffice to preserve sets of measure zero.
Lemma 24.16. Let g : [a, b] → R be a Lipschitz function, and let E ⊆ [a, b] have measure
zero. Then g(E) has measure zero.
Proof. Let c > 0 be a Lipschitz constant for g. We claim that if I is an open interval contained in [a, b], then |g(I)| ≤ c|I|. To see this, let I = (t − r, t + r). Then by the Lipschitz condition, g(I) ⊆ (g(t) − cr, g(t) + cr). Now let ε > 0. Let U1, U2, . . . be open intervals with E ⊆ ⋃_i Ui and Σ_i |Ui| < ε/c. Let us assume that Ui ⊆ [a, b]; this is not a serious restriction, as we may extend the domain of g to all of R (e.g. by letting g be constant on (−∞, a] and on [b, ∞)) without changing the Lipschitz constant. Then g(E) ⊆ ⋃_i g(Ui), and
Σ_i |g(Ui)| ≤ c Σ_i |Ui| < c · (ε/c) = ε.
Proof. One of a, b and c lies between the other two. By symmetry, we may assume without loss of generality that it is b that lies in the middle. Again without loss of generality, we may assume that a < b < c. Now, if f is integrable on [a, c], we are done by Corollary 24.14. On the other hand, if f is integrable on [a, b] and [b, c], then f is integrable on [a, c] by Theorem 24.5.
Remark 25.3. If a < b, then |∫_a^b f| ≤ ∫_a^b |f|.
Proof. Since g′ is continuous on [c, d], it does not change sign. We first consider the case where g′ > 0 on [c, d]. Then g(c) < g(d). Note that g^{−1} is also continuously differentiable, by the inverse function theorem, and hence that g^{−1} is Lipschitz. By Corollary 24.17, f ◦ g is Riemann integrable, and hence so is (f ◦ g)g′. Let L and L′ be the two integrals in the statement of the theorem, and let ε > 0. Let δ > 0 be such that for any partition pair (Q, U) of [g(c), g(d)] with mesh(Q) < δ we have |R(f, Q, U) − L| < ε. Since g is uniformly continuous on [c, d] there is η1 > 0 such that if |x − x′| < η1 then |g(x) − g(x′)| < δ. Choose η2 > 0 such that for any partition pair (P, T) of [c, d] with mesh(P) < η2 we have |R((f ◦ g)g′, P, T) − L′| < ε. Fix a partition P of [c, d] with mesh(P) < min{η1, η2}.
Write P = {x0, x1, . . . , xn}. Let yi = g(xi), and let Q = g(P) = {y0, y1, . . . , yn}. Since mesh(P) < η1 we know that mesh(Q) < δ. The mean value theorem applied to g on [xi−1, xi] gives ti ∈ (xi−1, xi) such that
g(xi) − g(xi−1) = g′(ti)(xi − xi−1), i.e. Δyi = g′(ti)Δxi.
Let ui = g(ti), and set T = (t1, . . . , tn) and U = (u1, . . . , un). Then (Q, U) is a partition pair of [g(c), g(d)], and
R(f, Q, U) = Σ_{i=1}^n f(ui)Δyi = Σ_{i=1}^n f(g(ti)) g′(ti)Δxi = R((f ◦ g)g′, P, T).
Therefore
|L − L′| ≤ |L − R(f, Q, U)| + |R((f ◦ g)g′, P, T) − L′| < ε + ε = 2ε.
Hence L = L′.
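The identity just proved can be tested numerically. The sketch below is not part of the notes; the choices f = cos, g(x) = x² on [c, d] = [1, 2], and left-endpoint Riemann sums with a fixed fine partition are all illustrative.

```python
import math

# Compare a Riemann sum for the integral of f over [g(c), g(d)] with a
# Riemann sum for (f o g) * g' over [c, d]; by the change-of-variables
# theorem both approximate the same number (here sin 4 - sin 1).

def riemann(func, a, b, n=100000):
    """Left-endpoint Riemann sum of func over [a, b], uniform partition."""
    dx = (b - a) / n
    return sum(func(a + i * dx) for i in range(n)) * dx

f = math.cos
g = lambda x: x * x
gp = lambda x: 2 * x                      # g'

lhs = riemann(f, g(1.0), g(2.0))          # integral of f over [g(c), g(d)]
rhs = riemann(lambda x: f(g(x)) * gp(x), 1.0, 2.0)
print(lhs, rhs, math.sin(4) - math.sin(1))   # all three agree closely
```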
If, on the other hand, g′ < 0 on [c, d], then g(d) < g(c). Note that Δyi = −g′(ti)Δxi (and i runs backward). But R(f, Q, U) approximates ∫_{g(d)}^{g(c)} f = −L.
and so
∫_δ^1 gn(t) dt = (1/cn) ∫_δ^1 (1 − t²)^n dt ≤ ((n + 1)/2) ∫_δ^1 (1 − δ²)^n dt = ((n + 1)(1 − δ)/2)(1 − δ²)^n → 0
as n → ∞. Similarly, ∫_{−1}^{−δ} gn → 0 as n → ∞. Hence ∫_{−δ}^δ gn → 1.
Now we will state and prove the Weierstrass approximation theorem.
It follows that
pn(x) = Σ_{j=0}^{2n} (∫_0^1 f(u) Σ_{i=0}^{2n} aij u^i du) x^j
is a polynomial in x.
Moreover,
∫_a^b (h − g) = ∫_a^b (h0 − g0 + 2η) = ∫_a^b (h0 − g0) + 2η(b − a) < η(1 + 2(b − a)) = ε.
Theorem 27.2. Let I be an interval, and let fn : I → R be differentiable. Suppose that (fn′) converges uniformly on I to a function g. Suppose additionally that there is a ∈ I such that the sequence of function values (fn(a)) converges. Then (fn) converges to a differentiable function f, and f′ = g. Moreover, the convergence of fn to f is uniform on any bounded subinterval of I.
Proof. We first show that there is a function f to which (fn) converges, and that this convergence is uniform on bounded subintervals. Let ε > 0 be given. Choose N so that ‖fn′ − fm′‖u < ε and |fn(a) − fm(a)| < ε for all m, n ≥ N. Let J ⊆ I be a bounded subinterval; say |x| ≤ M for x ∈ J. If x ∈ J, we have
|fn(x) − fm(x)| = |(fn − fm)(x)|
= |(fn − fm)(x) − (fn − fm)(a) + fn(a) − fm(a)|
≤ |(fn − fm)′(c)| |x − a| + |fn(a) − fm(a)| (for some c between a and x, by the mean value theorem)
≤ ε|x − a| + ε
= ε(|x − a| + 1)
≤ ε(M + |a| + 1).
Thus (fn ) is uniformly Cauchy on J, and hence converges uniformly on J (and pointwise on
all of I too).
Let f be the limit of fn. We now show that f is differentiable, and that f′ = g. Let ε > 0. Choose N so that ‖fn′ − fm′‖u < ε/3 for m, n ≥ N. Letting m → ∞, we see also that ‖fn′ − g‖u ≤ ε/3. Now fix n ≥ N, and fix x ∈ I. For any h ≠ 0 such that x + h ∈ I, we have
(fn(x + h) − fn(x))/h − (f(x + h) − f(x))/h = lim_{m→∞} [(fn(x + h) − fn(x))/h − (fm(x + h) − fm(x))/h]
= lim_{m→∞} ((fn − fm)(x + h) − (fn − fm)(x))/h
= lim_{m→∞} (fn − fm)′(x + θh),
for some θ = θ(m) ∈ (0, 1) (by the mean value theorem).
As a last example of the interchange of two limiting processes, we give a result on differen-
tiating an integral. For this we recall from earlier experience the notion of partial derivative.
Let f : [a, b] × [c, d] → R, and suppose that for each y ∈ [c, d] the function x 7→ f (x, y) is
differentiable on [a, b]. The partial derivative of f with respect to x is defined by
(∂f/∂x)(x, y) = lim_{h→0} (f(x + h, y) − f(x, y))/h.
Theorem 27.3. Let f : [a, b] × [c, d] → R be continuous, and suppose that ∂f/∂x exists and is continuous on [a, b] × [c, d]. Let G : [a, b] → R be defined by G(x) = ∫_c^d f(x, y) dy. Then G is differentiable on [a, b], and G′(x) = ∫_c^d (∂f/∂x)(x, y) dy.
Proof. Let ε > 0. Since ∂f /∂x is continuous on the compact set [a, b] × [c, d], it is uniformly
continuous. Let δ > 0 be as in the definition of uniform continuity for ∂f /∂x on [a, b] × [c, d]
and for the positive quantity ε/(d − c). Now if x, x + h ∈ [a, b] with 0 < |h| < δ, then
|(G(x + h) − G(x))/h − ∫_c^d (∂f/∂x)(x, y) dy| = |∫_c^d [(f(x + h, y) − f(x, y))/h − (∂f/∂x)(x, y)] dy|
= |∫_c^d [(∂f/∂x)(x + θh, y) − (∂f/∂x)(x, y)] dy|, for some θ ≡ θ(x, y, h) ∈ (0, 1),
≤ (ε/(d − c))(d − c) = ε.
It follows that G′(x) = ∫_c^d (∂f/∂x)(x, y) dy.
Example 28.5. (1) The series Σ_{n=0}^∞ (−1)^n diverges, since lim_{n→∞} (−1)^n does not exist (and hence is not equal to zero).
(2) The series Σ_{n=1}^∞ n^{−1/n} diverges, since lim_{n→∞} n^{−1/n} = 1/(lim_{n→∞} n^{1/n}) = 1 is nonzero.
Proposition 28.6. If Σ an and Σ bn converge, then so does Σ (λan + µbn), and Σ (λan + µbn) = λ Σ an + µ Σ bn.
Proof. This follows immediately from the corresponding results for sequences.
Proof. This follows immediately from the corresponding results for sequences.
This last is the partial sum of a geometric series with ratio 2^{1−p}. Since p > 1, the ratio is less than 1, and hence the geometric series converges. It follows that the partial sums of Σ 1/n^p are bounded, hence it converges.
Next we suppose that 0 < p ≤ 1. We have that
Σ_{i=1}^{2^n} ai = a1 + a2 + (a3 + a4) + (a5 + · · · + a8) + · · · + (a_{2^{n−1}+1} + · · · + a_{2^n})
≥ a2 + 2a4 + 4a8 + · · · + 2^{n−1} a_{2^n},
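The dichotomy in this proof shows up clearly in numerical partial sums; the snippet below (not part of the notes) simply compares p = 2 with p = 1.

```python
# Partial sums of the p-series: bounded (by pi^2/6) for p = 2, slowly
# divergent (like log N) for p = 1.

def partial_sum(p, N):
    """sum_{n=1}^{N} 1/n^p."""
    return sum(1.0 / n ** p for n in range(1, N + 1))

for N in (10, 1000, 100000):
    print(N, partial_sum(2, N), partial_sum(1, N))
# the p = 2 column stabilizes near 1.6449...; the p = 1 column keeps growing
```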
We also have the following facts as immediate corollaries of the theorems on uniform
convergence.
Theorem 29.4. Let fn : X → R be functions.
(1) If fn is continuous for all n, and Σ fn converges uniformly, then Σ fn is continuous.
(2) If X = [a, b], fn ∈ R[a, b] for all n, and Σ fn converges uniformly, then Σ fn ∈ R[a, b], and ∫_a^b Σ fn = Σ ∫_a^b fn.
(3) If X = [a, b], fn is differentiable for all n, Σ fn′ converges uniformly, and Σ fn(x0) converges for some x0 ∈ [a, b], then Σ fn is differentiable, and (Σ fn)′ = Σ fn′.
We will characterize the compact subsets of C(X, Rk ). Recall that a set is compact if
and only if it is complete and totally bounded. Since C(X, Rk ) is already a complete metric
space, a subset is complete if and only if it is closed. Therefore we will focus our attention
on the property of total boundedness: how can we describe in a more intrinsic way what it
means for a subset of C(X, Rk ) to be totally bounded?
Let F ⊆ C(X, Rk) be totally bounded. Let ε > 0. Then there are f1, . . ., fn ∈ F such that F ⊆ ⋃_{i=1}^n Bε(fi). Since X is compact, and the fi are continuous, they are uniformly continuous. Thus for each i there is δi > 0 such that for all x, y ∈ X, if d(x, y) < δi then ‖fi(x) − fi(y)‖ < ε. Let δ = min{δ1, . . . , δn}. We claim that for any function f ∈ F, this δ works in the definition of uniform continuity. To see this, let f ∈ F, and let x, y ∈ X with d(x, y) < δ. There is i0, 1 ≤ i0 ≤ n, such that ‖f − fi0‖ < ε. Then
‖f(x) − f(y)‖ ≤ ‖f(x) − fi0(x)‖ + ‖fi0(x) − fi0(y)‖ + ‖fi0(y) − f(y)‖ < ε + ε + ε = 3ε.
Thus we have shown that the functions in the family F are “equally uniformly continuous”.
This phrase has been shortened to “equicontinuous”.
Definition 31.1. Let F be a family of functions between metric spaces X and Y. Let x0 ∈ X.
(1) F is equicontinuous at x0 if for each ε > 0 there is δ > 0 such that for each f ∈ F and for all x ∈ X, if dX(x, x0) < δ then dY(f(x), f(x0)) < ε. (I.e. δ is independent of the choice of f ∈ F.)
(2) F is equicontinuous (on X) if it is equicontinuous at each point of X.
(3) F is uniformly equicontinuous (on X) if for each ε > 0 there is δ > 0 such that for each f ∈ F and for all x, z ∈ X, if dX(x, z) < δ then dY(f(x), f(z)) < ε.
Exercise 31.2. If X is compact, and F is equicontinuous, then F is uniformly equicontin-
uous.
Because of this exercise, when X is compact we need not distinguish between equicontinu-
ity and uniform equicontinuity. We remark that there are stupid examples of equicontinuous
families. For example, in C(X, R), we may consider the family of all constant functions.
This family is clearly equicontinuous, but is not totally bounded (or even bounded). For this
reason we identify another property of a family of functions.
Definition 31.3. F ⊆ C(X, Rk) is pointwise bounded if for each x ∈ X, the set F(x) := {f(x) : f ∈ F} is a bounded subset of Rk.
Exercise 31.4. If F ⊆ C(X, Rk ) is pointwise bounded and equicontinuous, then F is a
bounded subset (of C(X, Rk )).
We remark that a totally bounded subset of C(X, Rk ) is also bounded, and hence pointwise
bounded. Thus we have already proved the following result.
Lemma 31.5. Let X be compact and F ⊆ C(X, Rk ). If F is totally bounded, then F is
pointwise bounded and equicontinuous.
The Arzela-Ascoli theorem is the converse of the lemma. It is usually phrased in terms of
precompactness: a subset of a metric space is precompact if its closure is compact. In the
setting of C(X, Rk ), then, precompactness is the same as total boundedness.
Theorem 31.6. Let X be a compact metric space, and let F ⊆ C(X, Rk ). Then F is
precompact if and only if it is pointwise bounded and equicontinuous.
Proof. As remarked above, we have already proved the “only if” direction. So we assume that F is pointwise bounded and equicontinuous. We use Exercise 31.2; hence F is uniformly equicontinuous. Let ε > 0. Choose δ > 0 as in the definition of uniform equicontinuity of F. Since X is compact, X is totally bounded. Then there are x1, . . ., xp ∈ X such that X = ⋃_{i=1}^p Bδ(xi). Now we use the pointwise boundedness of F. For each i, the set F(xi) = {f(xi) : f ∈ F} is a bounded subset of Rk, hence is totally bounded (by Lemma 14.27). Then the union ⋃_{i=1}^p F(xi) is also totally bounded. So we can choose points y1, . . ., yq ∈ Rk such that
⋃_{i=1}^p F(xi) ⊆ ⋃_{j=1}^q Bε(yj).
Now we come to the interesting part of the argument. Let f ∈ F. For each i, choose j
such that f (xi ) ∈ Bε (yj ). This defines a function ηf : {1, 2, . . . , p} → {1, 2, . . . , q}. Thus ηf
satisfies the formula
f (xi ) ∈ Bε (yηf (i) ).
But notice that there are only a finite number of possible functions η : {1, 2, . . . , p} →
{1, 2, . . . , q}. For each such function η, let
Cη = {f ∈ F : ηf = η}.
Then F ⊆ ⋃_η Cη, a finite union. Each Cη is a subset of C(X, Rk). To finish the proof, we will show that Cη has diameter at most 4ε. Let f, g ∈ Cη, for some η. Then for i = 1, . . ., p, we have f(xi), g(xi) ∈ Bε(yη(i)). For any x ∈ X choose i with x ∈ Bδ(xi). Then
‖f(x) − g(x)‖ ≤ ‖f(x) − f(xi)‖ + ‖f(xi) − g(xi)‖ + ‖g(xi) − g(x)‖
< ε + ‖f(xi) − g(xi)‖ + ε, by the uniform equicontinuity of F (and the choice of δ),
< ε + 2ε + ε,
since f(xi) and g(xi) belong to a ball of radius ε. Thus ‖f − g‖u ≤ 4ε. Therefore Cη has diameter at most 4ε.
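To see why equicontinuity cannot be dropped from the theorem, a standard example (sketched here numerically; not worked out in the notes) is the family F = {x^n : n ∈ N} in C([0, 1], R): it is pointwise bounded, but not equicontinuous at x0 = 1.

```python
# For f_n(x) = x^n, the jump |f_n(1) - f_n(1 - delta)| = 1 - (1 - delta)^n
# can be made close to 1 for EVERY delta by taking n large, so no single
# delta works for all members of the family: F is not equicontinuous at 1.

def worst_jump(delta, max_n=10000):
    """sup over n <= max_n of |f_n(1) - f_n(1 - delta)|."""
    return max(1.0 - (1.0 - delta) ** n for n in range(1, max_n + 1))

for delta in (0.1, 0.01, 0.001):
    print(delta, worst_jump(delta))   # close to 1 however small delta is
```

Consistently with Theorem 31.6, the sequence (x^n) has no uniformly convergent subsequence on [0, 1].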
Σ_{j=m}^n aj bj = Σ_{j=m}^n (sj − s_{j−1}) bj
= Σ_{j=m}^n sj bj − Σ_{j=m−1}^{n−1} sj b_{j+1}
= Σ_{j=m}^{n−1} sj (bj − b_{j+1}) + sn bn − s_{m−1} bm.
We then have
|Σ_{j=m}^n aj bj| ≤ Σ_{j=m}^{n−1} |sj| (bj − b_{j+1}) + |sn bn| + |s_{m−1} bm|
≤ M Σ_{j=m}^{n−1} (bj − b_{j+1}) + M bn + M bm
= M(bm − bn) + M bn + M bm = 2M bm.
If bm → 0 as m → ∞, the series Σ an bn converges by the Cauchy criterion.
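The summation-by-parts identity above can be verified mechanically on arbitrary data. The helper functions and the test vectors below are illustrative, not from the notes.

```python
# Check: sum_{j=m}^{n} a_j b_j
#      = sum_{j=m}^{n-1} s_j (b_j - b_{j+1}) + s_n b_n - s_{m-1} b_m,
# where s_j = a_1 + ... + a_j (and s_0 = 0).  The lists are 0-based; the
# indices m, n are 1-based, as in the text.

def direct(a, b, m, n):
    return sum(a[j - 1] * b[j - 1] for j in range(m, n + 1))

def by_parts(a, b, m, n):
    s = lambda j: sum(a[:j])                 # s_j, with s_0 = 0
    tail = sum(s(j) * (b[j - 1] - b[j]) for j in range(m, n))
    return tail + s(n) * b[n - 1] - s(m - 1) * b[m - 1]

a = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5]
b = [0.9, 0.8, 0.55, 0.4, 0.2, 0.1]
print(direct(a, b, 2, 6), by_parts(a, b, 2, 6))   # the two agree
```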
Corollary 32.3. (Alternating series test.) Let (bn) be a decreasing sequence with limit 0. Then the alternating series
b1 − b2 + b3 − · · · = Σ_{n=1}^∞ (−1)^{n−1} bn
converges, and |Σ_{j=n+1}^∞ (−1)^{j−1} bj| ≤ b_{n+1}.
Proof. With an = (−1)^{n−1}, Abel's theorem proves convergence, and gives the estimate with a factor of 2. However, since the partial sums of Σ an are all non-negative (either 0 or 1), the estimate in that proof can be improved as in the statement of the corollary. We leave the details to the interested reader.
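The improved tail estimate can be checked against the alternating harmonic series, whose sum is log 2 (as noted later in these examples). The snippet is illustrative, not part of the notes.

```python
import math

# Tail of the alternating harmonic series: |log 2 - S_n| should be at
# most b_{n+1} = 1/(n+1), by the alternating series test.

def tail(n):
    """|sum_{j=n+1}^infinity (-1)^(j-1)/j|, using the known sum log 2."""
    partial = sum((-1) ** (j - 1) / j for j in range(1, n + 1))
    return abs(math.log(2) - partial)

for n in (10, 100, 1000):
    print(n, tail(n), 1.0 / (n + 1))   # the tail stays below 1/(n+1)
```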
Example 32.4. (1) (The alternating harmonic series.) Σ_{n=1}^∞ (−1)^{n−1}/n = 1 − 1/2 + 1/3 − 1/4 + 1/5 − · · · converges by the alternating series test. (We will see later that the sum is log 2.)
(2) Let θ be an irrational number. (In fact, the argument we present applies to any non-integral real number θ.) In the following, we will apply the formula for the sum of a finite geometric series to complex numbers.
Σ_{j=1}^n sin 2πjθ = Im Σ_{j=0}^n (cos 2πjθ + i sin 2πjθ)
= Im Σ_{j=0}^n (cos 2πθ + i sin 2πθ)^j.
Hence
|Σ_{j=1}^n sin 2πjθ| ≤ 2/|1 − (cos 2πθ + i sin 2πθ)|
= 2/√((1 − cos 2πθ)² + sin² 2πθ)
= √(2/(1 − cos 2πθ)).
Thus the series Σ sin 2πnθ has bounded partial sums. By Abel's theorem, the series Σ_n (sin 2πnθ)/n converges.
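The bound just derived is easy to test numerically; here θ = √2 is an illustrative irrational choice, and the check is not part of the notes.

```python
import math

# Running partial sums of sin(2*pi*n*theta) for theta = sqrt(2): they
# oscillate but stay below sqrt(2 / (1 - cos(2*pi*theta))) in absolute value.

theta = math.sqrt(2)
bound = math.sqrt(2.0 / (1.0 - math.cos(2 * math.pi * theta)))

s, worst = 0.0, 0.0
for n in range(1, 100001):
    s += math.sin(2 * math.pi * n * theta)
    worst = max(worst, abs(s))
print(worst, bound)   # the worst observed partial sum stays below the bound
```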
Abel also proved the following theorem on the behavior of a power series at an endpoint
of the interval of convergence.
Theorem 32.5. Let Σ_{n=0}^∞ an(x − x0)^n have radius of convergence R, with 0 < R < ∞. Suppose that the series converges at an endpoint of the interval of convergence. Then the series converges uniformly on the closed interval from x0 to that endpoint.
Corollary 32.6. With the hypotheses of the theorem, let f (x) denote the sum of the series
in its domain of convergence. Then f is continuous.
Proof. (of theorem) A linear change of variables reduces the theorem to the case where x0 = 0 and R = 1. We consider the case where the series converges at the right-hand endpoint; the other case has a similar proof. Thus we have a power series Σ_{n=0}^∞ an x^n with radius of convergence 1, and such that Σ an converges. Let ε > 0 be given. Applying the Cauchy criterion to Σ an, we obtain n0 ∈ N such that for all n0 ≤ m ≤ n we have
|Σ_{j=m}^n aj| < ε/2.
For any x ∈ [0, 1], the sequence (x^n) is decreasing. We apply Abel's theorem to the series Σ_{j=n0}^∞ aj x^j to get
|Σ_{j=m}^n aj x^j| ≤ 2 · (ε/2) · x^m ≤ ε,
a uniform estimate.
Example 32.7. (1) From the geometric series 1/(1 − x) = Σ_{n=0}^∞ x^n for |x| < 1, we integrate term-by-term to obtain
−log(1 − x) = ∫_0^x dt/(1 − t) = Σ_{n=0}^∞ ∫_0^x t^n dt = Σ_{n=0}^∞ x^{n+1}/(n + 1) = Σ_{n=1}^∞ x^n/n,
still with radius of convergence equal to 1. Replacing x by −x we get
(∗∗) log(1 + x) = Σ_{n=1}^∞ (−1)^{n−1} x^n/n,
valid for |x| < 1. When x = 1 we have the alternating harmonic series, which
converges. By Abel’s theorem, the power series converges uniformly on [0, 1], and
hence the limit is continuous. Since the equality in (∗∗) holds on [0, 1), and both
sides are continuous on [0, 1], the equality must hold at x = 1. This gives
log 2 = 1 − 1/2 + 1/3 − 1/4 + · · · .
(2) Again starting with the geometric series, we replace x by −x² to get
1/(1 + x²) = Σ_{n=0}^∞ (−1)^n x^{2n}.
Since |−x²| < 1 if and only if |x| < 1, this equation is also valid for |x| < 1. Now we
integrate term-by-term to get
arctan x = ∫_0^x dt/(1 + t²) = Σ_{n=0}^∞ ((−1)^n/(2n + 1)) x^{2n+1},
valid for |x| < 1. Again, the series converges for x = 1 by the alternating series test.
By Abel’s theorem, the series is continuous on [0, 1], and so the above equation is
still valid at x = 1. We obtain the classical series
π/4 = 1 − 1/3 + 1/5 − 1/7 + · · · .
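A quick numerical look (not part of the notes) at the series just obtained: the partial sums creep toward π/4, with the error controlled by the alternating series test.

```python
import math

# Partial sums of the Leibniz series 1 - 1/3 + 1/5 - 1/7 + ...

def leibniz_partial(N):
    """sum_{n=0}^{N-1} (-1)^n / (2n + 1)."""
    return sum((-1) ** n / (2 * n + 1) for n in range(N))

for N in (10, 1000, 100000):
    print(N, leibniz_partial(N), abs(math.pi / 4 - leibniz_partial(N)))
# the error at stage N is below the next term, 1/(2N + 1)
```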
(3) We consider f(x) = (1 + x)^α for α > 0, α ∉ N. Repeated differentiation gives f^{(n)}(x) = α(α − 1) · · · (α − n + 1)(1 + x)^{α−n}. Thus the Taylor series for f is given by
1 + Σ_{n=1}^∞ (α(α − 1) · · · (α − n + 1)/n!) x^n.
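The Taylor coefficients above satisfy a simple one-factor-per-term recursion, which makes the series easy to evaluate. This numerical sketch (not part of the notes) compares a truncated series with (1 + x)^α for the illustrative values α = 1/2, x = 0.3.

```python
# Truncated binomial series
#   1 + sum_{n=1}^{N} [alpha(alpha-1)...(alpha-n+1)/n!] x^n,
# with the coefficient updated by one factor per term.

def binomial_partial(alpha, x, N):
    total, coeff = 1.0, 1.0
    for n in range(1, N + 1):
        coeff *= (alpha - n + 1) / n     # alpha(alpha-1)...(alpha-n+1)/n!
        total += coeff * x ** n
    return total

alpha, x = 0.5, 0.3
print(binomial_partial(alpha, x, 50), (1 + x) ** alpha)   # agree closely
```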