
Towards a Course in Analysis

Under construction at 09FALL/3142all.tex

Franz Rothe
Department of Mathematics
University of North Carolina at Charlotte
Charlotte, NC 28223
frothe@uncc.edu

September 27, 2011

Contents
1 Convergence 4
1.1 Metric spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Sequential Compactness 6
2.1 The Bolzano-Weierstrass Theorem . . . . . . . . . . . . . . . . . . . . 6
2.2 Limit sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Some examples of how to use compactness . . . . . . . . . . . . . . . 11
2.4 Various notions related to compactness . . . . . . . . . . . . . . . . . . 14
2.5 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Normed Spaces 19
3.1 The Basic Convergence Theorem . . . . . . . . . . . . . . . . . . . . . 19

4 Power series 22
4.1 The radius of convergence . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Termwise integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Termwise differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5 Matrices and Bounded Operators 32


5.1 Matrix norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Inverse matrices and continuity . . . . . . . . . . . . . . . . . . . . . . 37
5.3 The Spectral Radius . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.4 Subadditive Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

6 The eigenvalue problem 43
6.1 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.2 Back to arbitrary matrices . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.3 Normal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

7 The Hilbert-Schmidt Norm 51

8 The Radius of Convergence and Complex Analysis 56


8.1 Consequences for the Spectral Radius . . . . . . . . . . . . . . . . . . 59

9 A Mean Ergodic Theorem 60

10 The contraction mapping principle 64

11 Local inverses 67
11.1 The Local Inverse Theorem . . . . . . . . . . . . . . . . . . . . . . . . 67
11.2 The Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . 71
11.3 Implicit curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
11.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

12 Does Pythagoras Theorem imply the Euclidean Parallel Postulate? 84


12.1 Different Formulations of the Problem . . . . . . . . . . . . . . . . . . 84
12.2 The Main Results obtained with Calculus . . . . . . . . . . . . . . . . 85
12.3 Using Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
12.4 The Lunes of Pythagoras . . . . . . . . . . . . . . . . . . . . . . . . . 87
12.5 Examples for the Lunes of Pythagoras . . . . . . . . . . . . . . . . . . 89
12.6 Using the Local Implicit Function Theorem . . . . . . . . . . . . . . . 94
12.7 Some Global Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

13 Approximation Theory 98

14 Lagrange Multipliers 103

List of Figures
2.1 A sequence of closed boxed squares contains a common point. . . . . . . . . 7
8.1 The arbitrary cut (−∞, 0] does not alter the radius of convergence. . . . . 58
11.1 Look at my Folium. Is it of Descartes? . . . . . . . . . . . . . . . . . . . . 80
11.2 Nicomedes concoid for p > d has a saddle point at zero, and a loop. . . . . . 81
11.3 Agnesi curve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
11.4 Nicomedes concoid for p < d has a 2-d minimum at critical point. . . . . . . 82
12.1 The Pythagoras lune with a = 45 . . . . . . . . . . . . . . . . . . . . . . . 90
12.2 The Pythagoras lune with a = 70 . . . . . . . . . . . . . . . . . . . . . . . 90

12.3 The Pythagoras lune with a = 90 is highly degenerate. . . . . . . . . . . . 91
12.4 The Pythagoras lune with a = 120 . . . . . . . . . . . . . . . . . . . . . . 91
12.5 The Pythagoras lune with a = 150 . . . . . . . . . . . . . . . . . . . . . . 92
12.6 The Pythagoras lune with a = 180 corresponds to a flat right triangle with sides 3, 4 and 5; it is degenerate, too. . . . . . . . . . . . . . . . . . . . . . 92
12.7 An approximately isosceles lune. . . . . . . . . . . . . . . . . . . . . . . . 93

1 Convergence
1.1 Metric spaces
Definition 1.1 (Metric space). A metric space is a set X, together with a distance dist(·, ·) satisfying the following requirements:

(i) dist(x, y) ≥ 0 for all x, y ∈ X.

(ii) dist(x, y) = 0 implies x = y.

(iii) Symmetry: dist(x, y) = dist(y, x) for all x, y ∈ X.

(iv) The triangle inequality: dist(x, z) ≤ dist(x, y) + dist(y, z).

Definition 1.2 (Cauchy sequence). A sequence x_n in a metric space is a Cauchy sequence if and only if the following holds: for all ε > 0 there exists N(ε) ∈ N such that

dist(x_n, x_{n+p}) < ε for all n > N(ε) and all p ∈ N.

Definition 1.3 (Completeness). A metric space in which every Cauchy sequence has
a limit is called complete.
10 Problem 1.1. Investigate how this naming arose.¹
10 Problem 1.2. Give a reason why in any metric space a finite set is always closed.

Reasoning using sequences. Take any convergent sequence a_n → a with elements a_n ∈ S from the given finite set S. Since the set S contains only finitely many elements, infinitely many a_n are equal to some common value ā. We can extract the constant subsequence a_{n_s} = ā for all s ∈ N, which clearly converges to ā. Hence a = ā = a_{n_s} ∈ S.

By this reasoning, we have checked that the limit of any convergent sequence of elements of S lies in S. Hence the set S is closed.
Reasoning using neighborhoods. Let the finite set be S = {s_1, s_2, . . . , s_n}. We check whether the complement E \ S is open. Does every element of the complement have a neighborhood contained in the complement?

Take any x ∈ E \ S. Since x ≠ s_1, x ≠ s_2, . . . , x ≠ s_n, the distances dist(x, s_1), dist(x, s_2), . . . , dist(x, s_n) are all positive. Let

ε = min[dist(x, s_1), dist(x, s_2), . . . , dist(x, s_n)] > 0

The ball B_ε(x) does not contain any one of the elements s_1, s_2, . . . , s_n. Thus B_ε(x) ⊆ E \ S.

This argument confirms that the set E \ S is open. Hence its complement S is closed.
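The neighborhood argument can be carried out concretely. Here is a small numeric sketch in R²; the finite set S and the point x are arbitrary choices for illustration:

```python
import math

# Numeric sketch of the neighborhood argument in R^2; the finite set S and
# the point x in its complement are arbitrary choices.
def dist(p, q):
    # Euclidean distance in the plane
    return math.hypot(p[0] - q[0], p[1] - q[1])

S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
x = (0.4, 0.3)

# epsilon = minimal distance from x to the finite set S is positive ...
eps = min(dist(x, s) for s in S)
assert eps > 0

# ... and the open ball B_eps(x) contains no element of S
assert all(dist(x, s) >= eps for s in S)
```

The same computation works for any point x outside S; only finiteness of S is used to make the minimum positive.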
¹ That is a difficult problem that may carry you away. Perhaps it is only appropriate for frequent visitors of Hilbert's rose garden . . .

Still another solution. Use induction on the number of elements of the set S.

Induction start: We check that a set S = {a} of one element is closed. The only sequence of elements of S is the constant sequence a_n = a for all n ∈ N. This sequence converges to a ∈ S. Hence the one-element set S is closed.

Induction step: We have already checked that any set of n elements is closed. Let S⁺ be a set of n + 1 elements, of which we want to check closedness.

Clearly S⁺ = S ∪ {a}, where S is a set of n elements. As shown above, both S and the one-element set {a} are closed. The union of any two closed sets is closed.² Hence the union S ∪ {a} is closed, too.

10 Problem 1.3. Let x_n be a Cauchy sequence in any metric space.

(i) Prove that the sequence is convergent if there exists a convergent subsequence.

(ii) Conclude that a Cauchy sequence in a compact metric space is always convergent.

(iii) In other words, a compact metric space is always complete.

Answer. (i) Suppose x_n is a Cauchy sequence with the convergent subsequence x_{n_s} → x. Given any ε > 0, there exist N₁(ε) and N₂(ε) such that

dist(x_n, x_{n+p}) < ε for all n, n + p > N₁(ε)

dist(x_{n_s}, x) < ε for all n_s > N₂(ε)

Let N = max(N₁, N₂). We can choose n_s = n + p > N, since the subsequence contains terms with arbitrarily high index. Hence

dist(x_n, x) ≤ dist(x_n, x_{n+p}) + dist(x_{n+p}, x) < 2ε for all n > N(ε)

Hence the Cauchy sequence converges to the limit x obtained from its subsequence.

(ii) In a compact space, any sequence has a convergent subsequence. Hence item (i)
implies that the Cauchy sequence is convergent.

(iii) Completeness means that every Cauchy sequence has a limit, lying in the given
space. In a compact space, any sequence has a convergent subsequence, the limit
of which lies in the given space. Hence item (i) implies that the Cauchy sequence
is convergent, and its limit is still lying in the given space.
² I leave it as an exercise to prove this claim.

10 Problem 1.4. Let 0 < q < 1, K > 0, and let x_n be a sequence in any metric space about which it is assumed that

dist(x_n, x_{n+1}) ≤ K q^n

for all n ∈ N. Prove that x_n is a Cauchy sequence.

Answer. For any n, p ∈ N, we estimate the distance

dist(x_n, x_{n+p}) ≤ dist(x_n, x_{n+1}) + dist(x_{n+1}, x_{n+2}) + · · · + dist(x_{n+p−1}, x_{n+p})
≤ K q^n + K q^{n+1} + · · · + K q^{n+p−1} = K (q^n − q^{n+p})/(1 − q) ≤ K q^n/(1 − q)

Since 0 < q < 1 was assumed, we get

lim_{n→∞} K q^n/(1 − q) = 0

Hence, for any given ε > 0, there exists N(ε) such that

dist(x_n, x_{n+p}) ≤ K q^n/(1 − q) < ε for all n > N(ε)

and all natural numbers p. We conclude that x_n is a Cauchy sequence.
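The estimate can be probed numerically. The sketch below uses the arbitrary choices q = 1/2, K = 1 and the real sequence x_n = 1 + q + · · · + q^{n−1}, for which dist(x_n, x_{n+1}) = q^n:

```python
# Numeric sketch of Problem 1.4, with the arbitrary choices q = 1/2, K = 1 and
# the geometric partial sums x_n = 1 + q + ... + q^(n-1) on the real line.
q, K = 0.5, 1.0

def x(n):
    return sum(q ** k for k in range(n))

for n in range(1, 20):
    # the hypothesis dist(x_n, x_{n+1}) <= K q^n ...
    assert abs(x(n + 1) - x(n)) <= K * q ** n + 1e-12
    for p in range(1, 20):
        # ... yields the Cauchy estimate dist(x_n, x_{n+p}) <= K q^n / (1 - q)
        assert abs(x(n + p) - x(n)) <= K * q ** n / (1 - q) + 1e-12
```

The factor 1/(1 − q) is exactly the tail of the geometric series that appears in the proof.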

2 Sequential Compactness
2.1 The Bolzano-Weierstrass Theorem
Definition 2.1. An accumulation point of any subset A ⊆ R^d is a point x such that for all ε > 0 there exists a point y ∈ A with dist(x, y) < ε which is different from x.

Main Theorem (The Bolzano-Weierstrass Theorem).
Any bounded sequence in R^d has a convergent subsequence.

Any infinite bounded subset of R^d has an accumulation point.

Corollary 1. A subset of R^d is sequentially compact if and only if it is closed and bounded.

Proof of the Corollary. Using Problem 2.2 below, we see that a sequentially compact set needs to be closed. Too, any sequentially compact set is bounded. Conversely, we have to prove that a set A that is closed and bounded is sequentially compact.

Let x_n be any sequence in A. By the Bolzano-Weierstrass Theorem, the sequence has a convergent subsequence x_{n_s} → x. Since x_{n_s} ∈ A for all n_s and the set A is closed by assumption, we conclude that x ∈ A.

This argument confirms that the set A is sequentially compact.

Proof of the Bolzano-Weierstrass Theorem. The argument is now explained for the unit square Q = [0, 1] × [0, 1]. I leave it to the reader to do the more general d-dimensional case. For convenience, we write y_s ⊑ x_n if y_s = x_{n_s} is a subsequence of x_n. Given is any sequence x_n of points in the unit square Q. We divide the unit square into four congruent closed subsquares

[0, 1/2] × [0, 1/2] , [0, 1/2] × [1/2, 1] , [1/2, 1] × [0, 1/2] , [1/2, 1] × [1/2, 1]

At least one of these four smaller squares contains an infinite subsequence x_n^[1] ⊑ x_n. We call such a subsequence x_n^[1] and the square Q^[1].

Now the process can be repeated. Once more, the square Q^[1] is divided into four closed congruent squares of half side length. At least one of these four smaller squares, now called Q^[2], contains an infinite subsequence x_n^[2] ⊑ x_n^[1].

Figure 2.1: A sequence of closed boxed squares contains a common point.

Inductively, we get a sequence of closed boxed squares

Q ⊃ Q^[1] ⊃ Q^[2] ⊃ . . .

and a sequence of more and more sparse subsequences

x ⊒ x^[1] ⊒ x^[2] ⊒ . . .

For the sequence of boxed squares, there exists a point in the infinite intersection

x ∈ ⋂_{n∈N} Q^[n]

Question. Justify this assertion using earlier results.


Answer. We can take from each square its lower left corner point q_n. Both coordinates of q_n are bounded monotone sequences, and hence convergent.

On the other hand, for the sequence of subsequences, one can use a diagonal argument. We define the diagonal sequence

x_i = x_i^[i]

its first element is the first element from the first subsequence, its second element is the second element from the second subsequence, and so on.

For any given natural number n, almost all elements of the diagonal sequence, except the initial n − 1 elements, are a subsequence of x^[n]. Hence they lie in the square Q^[n] and hence

dist(x_i, x) ≤ diam(Q^[n]) = √2/2^n for all i ≥ n

We see that the diagonal sequence converges to x. Clearly, the diagonal sequence x_i is a subsequence of the original sequence x_n. Altogether, this argument confirms that any given sequence of points in the square Q has a convergent subsequence, and hence the square Q is compact.
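The subdivision step can be imitated numerically. In the sketch below, "contains infinitely many terms" is replaced by "keeps the largest share of finitely many sampled indices", and the bounded sample sequence in the unit square is an arbitrary choice:

```python
import math

# Numeric sketch of the subdivision argument; the sample sequence is an
# arbitrary bounded, non-convergent sequence in the unit square.
def seq(n):
    return (0.5 + 0.45 * math.cos(n), 0.5 + 0.45 * math.sin(n))

indices = list(range(1, 20001))
square = (0.0, 0.0, 1.0)        # lower-left corner (x0, y0) and side length s

for step in range(10):
    x0, y0, s = square
    h = s / 2
    # the four congruent closed subsquares of half side length
    quadrants = [(x0, y0), (x0 + h, y0), (x0, y0 + h), (x0 + h, y0 + h)]
    chosen, kept = None, []
    for (qx, qy) in quadrants:
        inside = [n for n in indices
                  if qx <= seq(n)[0] <= qx + h and qy <= seq(n)[1] <= qy + h]
        if len(inside) > len(kept):
            chosen, kept = (qx, qy, h), inside
    square, indices = chosen, kept

# by the pigeonhole principle some terms always survive; they now lie in a
# square of side 2^-10, hence of diameter sqrt(2)/2^10
assert indices
assert square[2] == 2 ** -10
```

Ten halvings shrink the surviving square's diameter by the factor 2^10, exactly as in the estimate dist(x_i, x) ≤ √2/2^n of the proof.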
10 Problem 2.1. If A ⊆ R^n and B ⊆ R^p are any compact sets, then the Cartesian product set A × B is compact, too. Prove this statement.

Answer. Take any sequence with terms (x_n, y_n) in the Cartesian product A × B. Since A is compact, the sequence x_n has a convergent subsequence x_{n_s} → x ∈ A. Since B is compact, too, the sequence y_{n_s} has a convergent subsequence y_{n_{s,t}} → y ∈ B.

Finally, x_{n_{s,t}} → x and y_{n_{s,t}} → y imply (x_{n_{s,t}}, y_{n_{s,t}}) → (x, y) ∈ A × B. Thus we conclude that the Cartesian product A × B is compact.

Remark. As a consequence, we get the Bolzano-Weierstrass Theorem for R^n from its version for the reals R: every bounded sequence in R^n has a convergent subsequence.

10 Problem 2.2. Let A ⊆ R^n be any compact set and B ⊆ A any subset. Show that B is compact if and only if it is closed.

Answer. If the subset B is compact, it is closed, since any compact set is closed. The converse is not true in general: a closed set need not be compact.

But assume that B ⊆ A is a closed subset of the compact set A. In that case B is compact, as we check now: Let b_n be a sequence in B. Since B ⊆ A, and A is assumed to be compact, there exists a convergent subsequence

b_{n_s} → b ∈ A

Now the assumption that B is closed implies that b ∈ B. Thus we have checked that B is compact.

2.2 Limit sets

Definition 2.2 (Limit set of a sequence). The limit set ω{x_n} of any sequence x_n consists of the limits of all convergent subsequences of x_n.

10 Problem 2.3. Give examples where

(i) the limit set is empty,

(ii) the limit set consists of exactly two points,

(iii) the limit set is the entire interval [0, 1].

Answer. (i) The limit set of the sequence x_n = n for all n ∈ N is empty.

(ii) The limit set of the sequence x_n = (−1)^n consists of the two points +1 and −1.

(iii) Let the sequence x_n enumerate all rational numbers in the interval (0, 1). Its limit set is the entire interval [0, 1], since each real number in the closed unit interval, no matter whether it is rational or not, is the limit of a non-repeating sequence of rational numbers.³
Remark. (i) The empty set is not a sequence.

(iii) Another nice example is the sequence

y_n := sin² n

10 Problem 2.4. Prove or disprove: in any metric space, any sequence is convergent if and only if its limit set consists of exactly one point.

Answer. The statement is not true. Indeed, any convergent sequence has a limit set consisting of exactly one point. This is a consequence of the fact that all subsequences of a convergent sequence are convergent, with the same limit.

But there exist sequences which are not convergent and nevertheless have exactly one limit point. Here is an example:

x_n = 0 if n is even, and x_n = n if n is odd

This sequence has the limit set ω{x_n} = {0}, but does not converge.
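The example can be probed numerically; the sketch below samples the first terms of this sequence:

```python
# Numeric sketch: x_n = 0 for even n and x_n = n for odd n has the single
# limit point 0, yet is unbounded and hence divergent.
def x(n):
    return 0 if n % 2 == 0 else n

N = 10000
terms = [x(n) for n in range(1, N + 1)]

# every neighborhood of 0 contains infinitely many terms (here: half of the sample)
assert sum(1 for t in terms if abs(t) < 1e-9) == N // 2

# no other value is approached infinitely often: each odd value occurs only once,
# and the odd terms grow without bound, so the sequence cannot converge
assert max(terms) == N - 1 and terms.count(N - 1) == 1
```
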
³ One cannot use the constant sequence infinitely repeating the given rational number. The required subsequence needs to be obtained as a subsequence of the enumerating sequence x_n.

10 Problem 2.5. Prove that a bounded sequence in R^n the limit set of which is exactly one point is convergent.⁴

Answer. By the Bolzano-Weierstrass Theorem, a bounded sequence a_k in R^n has a convergent subsequence a_{k_s} → a. Since the limit set contains a, it is not empty.

We assume towards a contradiction that the sequence does not converge to a. There would exist a critical ε > 0 such that |a_k − a| ≥ ε holds for infinitely many k. From these a_k we extract, once more by the Theorem of Bolzano-Weierstrass, another convergent subsequence a_{k_t} → b. Since |b − a| ≥ ε, the two subsequences a_{k_t} and a_{k_s} have different limits b ≠ a. Hence the limit set contains two different points, contrary to the assumption.

This contradiction confirms that the sequence converges to a.
Question. (i) For which type of metric spaces does the claim remain valid?

(ii) Does the same statement remain true for any metric space? Why not?

Answer. The entire reasoning remains valid in any compact metric space. But the claim is false for a metric space that is not compact. For example, take the sequence of functions f_{2n} = cos nt, f_{2n+1} = (sin nt)/n in the space of continuous functions C[0, 2π]. This sequence is not convergent, but its limit set has exactly one element, namely the zero function.

10 Problem 2.6. Prove that the limit set of any sequence is closed.

Answer. Assume that y_k → y is a convergent sequence of points y_k in the limit set ω{x_n}. Choose a real null sequence ε_p → 0, for example ε_p = 1/p. The convergent sequence y_k contains a subsequence y_{k_p} ⊑ y_k such that

dist(y_{k_p}, y) < ε_p for all p ∈ N

Since the term y_{k_p} is assumed to be in the limit set, there exists a subsequence of the sequence x_n convergent to y_{k_p}. From this subsequence, we can choose a term x_{n_p} such that n_p > n_{p−1} and

dist(x_{n_p}, y_{k_p}) < ε_p for all p ∈ N

Hence

dist(x_{n_p}, y) ≤ dist(x_{n_p}, y_{k_p}) + dist(y_{k_p}, y) < 2 ε_p

Hence the subsequence x_{n_p} ⊑ x_n is convergent with limit x_{n_p} → y. By definition, this limit y is an element of the limit set ω{x_n}.

Altogether, the argument shows that the limit set contains the limits of convergent sequences from the limit set. Hence the limit set is closed.
⁴ More simply, one can say: a bounded sequence with exactly one limit point. But this is somewhat misleading, because limit and limit point have different meanings!

10 Problem 2.7. Let A be any compact set and x_n ∈ A be any sequence. Use the previously solved problems to conclude that the limit set is compact and not empty.

Answer. By definition of compactness, there exists a convergent subsequence x_{n_s} → x ∈ A. Hence the limit set is a nonvoid subset of A. By previous problems, the limit set is always closed. Furthermore, a closed subset of a compact set is compact. Hence the limit set is compact and not empty.

2.3 Some examples of how to use compactness


Main Theorem (Continuous inverse on a compact). A continuous injective func-
tion on a compact metric space, mapping to a second metric space, has a continuous
inverse.

Proof. Let f : K → H be a continuous function from a compact metric space K to the metric space H. Let R := f(K) ⊆ H be the range of the function. Because the function is assumed to be injective, there exists a unique inverse f⁻¹ defined on R.

We check continuity of the inverse by means of sequences. Assume that

y_k → y with y_k, y ∈ R

is a convergent sequence in the range. Hence y_k = f(x_k) with unique x_k ∈ K. Since K is assumed to be compact, there exists a convergent subsequence x_{k_s} ⊑ x_k:

x_{k_s} → x ∈ K.

By continuity of f, we conclude that

y_{k_s} = f(x_{k_s}) → f(x) ∈ R.

Since the limit is unique, we conclude

y = f(x)

We have to confirm that not only the subsequence x_{k_s}, but indeed the sequence x_k is convergent to x. Since K is compact, it is enough that every convergent subsequence of x_k has the same limit. Assume for a second subsequence x_{k_t} ⊑ x_k

x_{k_t} → x̄ ∈ K.

The same argument as above yields

y = f(x̄)

Hence f(x) = f(x̄), and since f is injective, we get x = x̄.

Hence all convergent subsequences of x_k ∈ K have the same limit x. Since K is compact, this implies that

f⁻¹(y_k) = x_k → x = f⁻¹(y)

The entire reasoning shows that the inverse f⁻¹ of a continuous injective function on a compact set is continuous.
The general principle behind Theorem 2.3 is:

compactness + uniqueness yields convergence.

I give a second example: Take any monic⁵ polynomial

P(x) = x^m + a_1 x^{m−1} + a_2 x^{m−2} + · · · + a_m

of degree m ≥ 1 with real coefficient vector (a_1, . . . , a_m) ∈ R^m.


10 Problem 2.8. Prove that any zero of this polynomial, P(x) = 0, satisfies

|x| ≤ 1 + Σ_{i=1}^m |a_i|

Answer. The assertion clearly holds for all zeros with |x| ≤ 1. On the other hand,

|P(x)| ≥ |x|^m − |a_1| |x|^{m−1} − |a_2| |x|^{m−2} − · · · − |a_m|

Hence |x| > 1 implies

|P(x)| ≥ |x|^m − Σ_{i=1}^m |a_i| |x|^{m−i} ≥ |x|^m − Σ_{i=1}^m |a_i| |x|^{m−1} = |x|^{m−1} ( |x| − Σ_{i=1}^m |a_i| )

Now P(x) = 0 and |x| > 1 imply that the right-hand side is non-positive, and hence

|x| ≤ Σ_{i=1}^m |a_i|
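The two-step estimate can be checked numerically; the cubic below is an arbitrary example:

```python
# Numeric sketch of Problem 2.8: outside the disc |x| <= 1 + sum|a_i| the
# lower bound |P(x)| >= |x|^(m-1) (|x| - sum|a_i|) is positive, so no zero
# lies there. The cubic's coefficients are an arbitrary choice.
a = [2.0, -3.0, 0.5]                 # a_1, a_2, a_3, so m = 3
m = len(a)
S = sum(abs(c) for c in a)           # sum of the |a_i|
bound = 1 + S

def P(x):
    # the monic polynomial x^m + a_1 x^(m-1) + ... + a_m
    return x ** m + sum(a[i] * x ** (m - 1 - i) for i in range(m))

for x in (bound + 0.1, -(bound + 0.1), 2 * bound, -10 * bound):
    lower = abs(x) ** (m - 1) * (abs(x) - S)
    assert abs(P(x)) >= lower > 0    # hence P(x) != 0 for |x| > bound
```
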

10 Problem 2.9. Let P^[n] be a sequence of monic polynomials, all of the same degree m ≥ 1. Assume the coefficient vectors a^[n] ∈ R^m are a convergent sequence with limit a. Let P be the limit polynomial. Take any sequence x^[n] of zeros:

P^[n](x^[n]) = 0 for all n ∈ N.

Give reasons for the following statements:

⁵ A polynomial with highest coefficient one is called monic.

(i) If x^[n] → x, then the limit is a zero of the limiting polynomial P.

(ii) Whether the sequence x^[n] is convergent or not, it always has limit points.

(iii) All limit points of the sequence x^[n] are zeros of the limiting polynomial P.

(iv) To take a nice example, we assume that the limiting polynomial is P(x) = x^m − 1 and m is odd. Show that the sequence x^[n] is convergent.

Answer. (i) Assume a^[n] → a and x^[n] → x for a sequence of some zeros of the corresponding polynomials. One takes the limit of

P^[n](x^[n]) = 0 for all n ∈ N

By the rules dealing with limits, one obtains as required:

P(x) = lim_{n→∞} P^[n](x^[n]) = 0

(ii) By assumption, the sequence of coefficients a^[n] is convergent. It is hence bounded by some constant A. From

|a_i^[n]| ≤ A for all n ∈ N and i = 1 . . . m

the estimate from Problem 2.8 yields

|x^[n]| ≤ 1 + mA for all n ∈ N

for all zeros of the corresponding polynomials. By the Bolzano-Weierstrass Theorem, any bounded sequence has a convergent subsequence. Hence the sequence x^[n] always has limit points.

(iii) Since the limit points are limits of convergent subsequences, part (i) implies that they are zeros of the limiting polynomial P.

(iv) The limiting polynomial P(x) = x^m − 1 with m odd has exactly one real zero x = 1. The sequence x^[n] is bounded by part (ii). Hence, by the Bolzano-Weierstrass Theorem, it has at least one limit point.

By part (iii), each of its limit points is a zero of the limiting polynomial P. Since this polynomial has exactly one real zero x = 1, the sequence x^[n] has exactly one limit point.

A bounded sequence with exactly one limit point is convergent, as I wanted to tell you earlier.
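Statement (iv) can be illustrated numerically. The sketch below uses the arbitrary family P^[n](x) = x³ − (1 + 1/n), whose coefficient vectors converge to those of P(x) = x³ − 1, and locates the unique real zero by bisection:

```python
# Numeric sketch for (iv): the monic cubics x^3 - (1 + 1/n) have coefficient
# vectors converging to those of x^3 - 1; their real zeros (1 + 1/n)^(1/3),
# located by bisection, converge to the unique real zero x = 1.
def real_root(c):
    # bisection for the unique real zero of f(t) = t^3 + c on [-10, 10]
    f = lambda t: t ** 3 + c
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

roots = [real_root(-(1 + 1 / n)) for n in (1, 10, 100, 1000)]

# the zeros decrease monotonically towards 1
assert all(abs(r2 - 1) <= abs(r1 - 1) for r1, r2 in zip(roots, roots[1:]))
assert abs(roots[-1] - 1.0) < 1e-3
```
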

2.4 Various notions related to compactness
Definition 2.3. An accumulation point of any subset A ⊆ X in any metric space X is a point x each neighborhood of which contains a point y ∈ A which is different from x.

Definition 2.4. A metric space X is said to have the Bolzano-Weierstrass property if any infinite subset A ⊆ X has an accumulation point.

Definition 2.5. A metric space X is called sequentially compact if any sequence has a convergent subsequence.

10 Problem 2.10. Convince yourself that any metric space which has the Bolzano-Weierstrass property is sequentially compact, too. You need to use the following two facts:

(i) Any two points x ≠ y have a positive distance dist(x, y) > 0.

(ii) The Archimedean axiom holds for the real numbers, and hence: for any ε > 0 there exists a natural number n ∈ N such that 1/n < ε.

Answer. Suppose that in the metric space X, the Bolzano-Weierstrass property holds. To check whether X is sequentially compact, take any sequence x_n in X. Let A := {x_n : n ∈ N} be the set of members of the sequence.

In case A is finite, at least one member a ∈ A appears infinitely often in the sequence. Hence the sequence x_n has a constant subsequence x_{n_k} = a for all k ∈ N. Clearly the constant sequence converges to a.

In case A is infinite, we use the assumed Bolzano-Weierstrass property, and conclude that the set A has an accumulation point x. Inductively we get a subsequence y_s ⊑ x_n convergent to x:⁶ Let y_1 ≠ x be a member of x_n with 0 < dist(y_1, x) < 1. Let y_2 ≠ x be a member of x_n with higher index such that 0 < dist(y_2, x) < min[1/2, dist(y_1, x)].

After y_1, y_2, . . . , y_n have been chosen, we choose for y_{n+1} a member of the sequence x_n with higher index such that 0 < dist(y_{n+1}, x) < min[1/(n+1), dist(y_n, x)] =: ε. Such a member exists, because the ball of radius ε around x contains infinitely many terms of the sequence, all different from x.

By the Archimedean axiom, for any ε > 0 there exists a natural number n ∈ N such that 1/n < ε. Hence the subsequence y_s ⊑ x_n is convergent to x.

Definition 2.6. A collection of open sets O_ι with ι ∈ I is called an open covering of any set A iff

⋃_{ι∈I} O_ι ⊇ A

A covering is called countable iff the index set I is countable, and finite for a finite index set, for example I = {1, 2, . . . , N}.

⁶ For convenience, we write y_s ⊑ x_n if y_s = x_{n_s} is a subsequence of x_n.

Main Theorem (The Heine-Borel Theorem). Any subset A ⊆ R^d is closed and bounded if and only if any open cover has a finite subcover.

Definition 2.7. A metric space X is said to be compact if any open cover of X has a finite subcover.

Definition 2.8. A metric space X is said to be countably compact if any countable open cover ⋃_{i∈N} O_i ⊇ X has a finite subcover ⋃_{i=1}^N O_i ⊇ X.

Corollary 2. For any subset A ⊆ R^d, these statements are equivalent:

(i) A is closed and bounded.

(ii) A is sequentially compact.

(iii) A is compact.

(iv) A is countably compact.

The equivalence of (ii), (iii) and (iv) holds for any metric space. More interestingly, these three statements are still equivalent for many topological spaces, but not for all.

2.5 Connectivity
Definition 2.9 (Separating sets). We say that two open sets U and V separate a given set A in a metric space iff

both A ∩ U ≠ ∅ and A ∩ V ≠ ∅, but A ∩ U ∩ V = ∅,

and nevertheless A ⊆ U ∪ V.

Definition 2.10 (Connected set). A set A in any metric space is called connected iff there does not exist any pair of separating open sets.

Theorem 2.1. Any interval in the reals is connected. It does not matter whether the interval is bounded or unbounded, open or closed or half-open.

Proof. Let the given interval be (0, 1). The minor modifications to cover the other cases are left to the reader. Assume towards a contradiction that the two open sets U ⊆ R and V ⊆ R separate the interval (0, 1).

There exist two points u ∈ U ∩ (0, 1) and v ∈ V ∩ (0, 1). We may assume without loss of generality that 0 < u < v < 1. We define the cut-point

(2.1) ū := sup U ∩ [u, v]

Since U is open, there exists ε > 0 such that (u − ε, u + ε) ⊆ U, and hence u + ε ≤ ū, and u < ū.

Since V is open, too, there exists δ > 0 such that (v − δ, v + δ) ⊆ V. By the definition of separating sets, [u, v] ∩ U ∩ V = ∅. We conclude ū ≤ v − δ. Hence

u < ū < v

Since by definition of separating sets [u, v] ⊆ (0, 1) ⊆ U ∪ V, two possibilities are left:

case (a): ū ∈ U, case (b): ū ∈ V.

Both cases can be ruled out, leading to a contradiction:

case (a): ū ∈ U and U open imply that there exists ε > 0 such that (ū − ε, ū + ε) ⊆ U, which contradicts the definition (2.1) of the cut-point.

case (b): ū ∈ V and V open imply there exists ε > 0 such that (ū − ε, ū + ε) ⊆ V, and hence U ∩ [u, v] ⊆ [u, ū − ε], which again contradicts the definition (2.1) of the cut-point.

This contradiction rules out that there exist any two open sets U ⊆ R and V ⊆ R separating the interval (0, 1). Hence the interval (0, 1) is connected.

Remark. We have shown: for any two nonempty open disjoint subsets of (0, 1), their union can never be the entire interval (0, 1).⁷

10 Problem 2.11. Given are again two open nonempty disjoint sets, and u ∈ U ⊆ R and v ∈ V ⊆ R, and u < v. As an alternative, we define the cut-point

(2.2) ũ := sup{c ∈ [u, 1) : [u, c] ⊆ U}

(i) Check that u ≤ ũ.

(ii) Prove independently that u < ũ < v.

(iii) Prove that both cases

case (a): ũ ∈ U, case (b): ũ ∈ V

lead to a contradiction.

Hence, once more, the two open sets U and V cannot separate the interval (0, 1). Hence the interval (0, 1) is connected.
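The contradiction can be made concrete numerically. The sketch below uses the arbitrary choices U = (0, 1/2) and V = (1/2, 1); they are open, nonempty and disjoint, and the cut-point of (2.2) lands on the point 1/2, which lies in neither set:

```python
# Numeric sketch for the cut-point (2.2) with the arbitrary open sets
# U = (0, 0.5) and V = (0.5, 1); they miss the point 0.5, so they cannot
# separate the interval (0, 1).
def in_U(x): return 0.0 < x < 0.5
def in_V(x): return 0.5 < x < 1.0

u, v = 0.25, 0.75
assert in_U(u) and in_V(v)

# approximate the cut-point sup{c in [u, 1) : [u, c] contained in U} on a grid
step = 1e-4
c = u
while in_U(c + step):
    c += step

# the cut-point sits at the boundary point 0.5 ...
assert abs(c - 0.5) < 1e-3
# ... which lies in neither U nor V, matching cases (a) and (b) of the proof
assert not in_U(0.5) and not in_V(0.5)
```
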

Theorem 2.2 (The limit set of a continuous trajectory is connected). Let X be either a closed and bounded subset of R^d or any compact metric space. Let

C : t ∈ [0, ∞) ↦ x(t) ∈ X

be a continuous trajectory in the set X. Then the limit set ω(C) is compact and connected.

⁷ At least a paper always fits between two clouds.

10 Problem 2.12. Assume that a continuous trajectory t ∈ [0, ∞) ↦ x(t) ∈ R^d has a bounded nonempty limit set. Prove that ω(C) is closed. Conclude that the limit set is compact and connected.

Corollary 3. Let

C : t ∈ [0, ∞) ↦ x(t) ∈ R^d

be a continuous trajectory in R^d. If the limit set ω(C) is compact and nonempty, then it is connected, too.

10 Problem 2.13. Find a continuous trajectory in R², the limit set of which is empty.

10 Problem 2.14. Find a continuous trajectory in R², the limit set of which consists of two disjoint rays.

Proof of Theorem 2.2. We need to confirm that the set A := ω(C) does not have any pair of separating open sets U and V. Their definition implies

ω(C) ∩ U ⊆ ω(C) \ V =: K,

which is closed and hence compact. Similarly,

ω(C) ∩ V ⊆ ω(C) \ U =: H,

which is closed and compact, too. Furthermore, the two sets H and K are disjoint. Indeed, x ∈ K ∩ H would imply x ∈ ω(C), but x ∉ U ∪ V. This contradicts the assumption that ω(C) ⊆ U ∪ V.
Proposition 2.1. Any two disjoint compact sets K and H in a metric space have a positive distance. In other words, there exists d > 0 such that

dist(k, h) ≥ d for all k ∈ K and h ∈ H.

10 Problem 2.15 (Disjoint compact sets have a positive distance). Prove Proposition 2.1.

From the Proposition, we see that there exists a distance d > 0 such that

(2.3) dist(ω(C) ∩ U, ω(C) ∩ V) ≥ dist(ω(C) \ V, ω(C) \ U) ≥ d > 0

By the assumptions made, there exist two different points x′ ∈ ω(C) ∩ U and x″ ∈ ω(C) ∩ V.

10 Problem 2.16. Show that there exists an x* ∈ ω(C) ∩ ∂U.
This is the coup de grâce of the proof of Theorem 2.2: Since U is open, x* ∈ ∂U implies x* ∉ U. Too, x* ∈ ω(C) ∩ ∂U ⊆ ω(C) ∩ Ū, and the estimate (2.3) then implies x* ∉ ω(C) ∩ V, and hence x* ∉ V. Since we know that x* ∈ ω(C), we see that ω(C) ⊄ U ∪ V. Thus the two open sets U and V cannot separate the limit set ω(C), which hence is connected.

Solution of Problem 2.16. Define d := dist(x′, x″). Since U and V are both open, there exists an ε ∈ (0, d/2) such that both B_ε(x′) ⊆ U and B_ε(x″) ⊆ V, and these two balls are disjoint.

By definition of the limit set, there exist sequences t_k → ∞ and τ_k → ∞ such that x(t_k) → x′ and x(τ_k) → x″. By taking subsequences, we can assume that t_k < τ_k < t_{k+1} for all k ∈ N. There exists K such that for all k > K, we get

x(t_k) ∈ B_ε(x′) ⊆ U and x(τ_k) ∈ B_ε(x″) ⊆ V

Define

w_k := inf{s ∈ (t_k, τ_k) : x(s) ∉ U}

which exists, and w_k ∈ (t_k, τ_k). Since x([t_k, w_k)) ⊆ U, continuity implies x(w_k) ∈ Ū; from the infimum we get even x(w_k) ∈ ∂U.

By the Bolzano-Weierstrass Theorem, there exists a convergent subsequence x(w_k) → x* ∈ ω(C), and hence x* ∈ ω(C) ∩ ∂U, as to be shown.
10 Problem 2.17. Provide an illustration for the proof of the crucial Problem 2.16.

10 Problem 2.18. Define d := dist(x′, x″), and for all 0 ≤ s ≤ d let

Ω_s := {x ∈ ω(C) : dist(x′, x) = s}

Prove that for all s ∈ [0, d], the sets Ω_s are compact and nonempty.
Answer. By definition of the limit set, there exist sequences t_k → ∞ and τ_k → ∞ such that x(t_k) → x′ and x(τ_k) → x″. By taking subsequences, we can assume that t_k < τ_k < t_{k+1} for all k ∈ N.

The claim is obviously true for s = 0 and s = d. Hence we can assume s ∈ (0, d) and choose ε = min[s, d − s]. There exists K such that for all k > K, we get

dist(x′, x(t_k)) < s

dist(x″, x(τ_k)) < d − s and hence dist(x′, x(τ_k)) > s

The Intermediate Value Theorem, applied to the continuous function t ↦ dist(x′, x(t)), yields the existence of parameters w_k ∈ (t_k, τ_k) such that

dist(x′, x(w_k)) = s

for all k > K. By the Bolzano-Weierstrass Theorem, there exists a convergent subsequence x(w_k) → x*(s) ∈ ω(C) such that dist(x′, x*(s)) = s. Hence x*(s) ∈ Ω_s, and Ω_s is nonempty. The set is compact, since it is the intersection of a closed set, the sphere of radius s around x′, with a compact set.

3 Normed Spaces

Definition 3.1 (Normed space). A normed space is a linear vector space X, together with a norm ‖·‖ satisfying the following requirements:

(i) ‖x‖ ≥ 0 for all x ∈ X.

(ii) ‖x‖ = 0 implies x = 0.

(iii) ‖λx‖ = |λ| ‖x‖ for all λ ∈ R, or λ ∈ C, for a real or complex normed space, respectively.

(iv) The triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖.

10 Problem 3.1. Explain how every normed space naturally becomes a metric space.
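A sketch for Problem 3.1: setting dist(x, y) := ‖x − y‖ turns the norm axioms into the metric axioms. The check below uses the maximum norm on R³ and random sample points, both arbitrary choices for illustration:

```python
import itertools
import random

# Sketch for Problem 3.1: dist(x, y) := ||x - y|| defines a metric; the norm
# here is the maximum norm on R^3, an arbitrary choice.
def norm(x):
    return max(abs(c) for c in x)

def dist(x, y):
    return norm([a - b for a, b in zip(x, y)])

random.seed(0)
pts = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(15)]

for x, y, z in itertools.product(pts, repeat=3):
    assert dist(x, y) >= 0                                # from norm axiom (i)
    assert dist(x, y) == dist(y, x)                       # from axiom (iii) with lambda = -1
    assert dist(x, z) <= dist(x, y) + dist(y, z) + 1e-12  # from axiom (iv)
for x in pts:
    assert dist(x, x) == 0                                # since ||0|| = 0
```
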

Definition 3.2 (Completeness). A normed space in which every Cauchy sequence has a limit is called complete.

Definition 3.3 (Banach space). A Banach space is a complete normed space.

3.1 The Basic Convergence Theorem


Main Theorem (Majorized convergence). Let M_n be a sequence of positive reals such that the partial sums are bounded:

(3.1)  Σ_{n=1}^{N} M_n ≤ K   for all N ∈ ℕ.

Assume that the sequence x_n from a complete normed space is majorized by M_n:

‖x_n‖ ≤ M_n   for all n ∈ ℕ.

Then the infinite sum

Σ_{n=1}^{∞} x_n = x

exists and the series is absolutely convergent.

Proof. Let

M := sup_{N∈ℕ} Σ_{n=1}^{N} M_n

This supremum exists since, by assumption, the set of partial sums is bounded. Given any ε > 0, there exists N(ε) such that

(3.2)  M − ε < Σ_{n=1}^{m} M_n ≤ M   for all m > N(ε).

(3.3)  Σ_{n=m+1}^{m+p} M_n < ε   for all m > N(ε) and all p ∈ ℕ.

We check that the sequence of partial sums

s_m := Σ_{n=1}^{m} x_n

is a Cauchy sequence. Indeed, for all m > N(ε) and all p ∈ ℕ, we estimate

‖s_{m+p} − s_m‖ = ‖ Σ_{n=m+1}^{m+p} x_n ‖ ≤ Σ_{n=m+1}^{m+p} ‖x_n‖ ≤ Σ_{n=m+1}^{m+p} M_n < ε

Because the space is assumed to be complete, the sequence converges. By definition, the limit of the partial sums yields the infinite sum

Σ_{n=1}^{∞} x_n = x

Too, the triangle inequality yields

‖ x − Σ_{n=1}^{m} x_n ‖ ≤ ε   for all m > N(ε).  □
Corollary 4. The sum of a majorized series does not depend on the order of its terms.

Corollary 5. A uniformly majorized series of continuous real or vector valued functions on any metric space converges uniformly and absolutely to a continuous function.

Proof. The space of continuous real or vector valued functions x : E → ℝ^d on any metric space E with the maximum norm

‖x‖ := sup{ ‖x(t)‖ : t ∈ E }

is a complete normed space.  □
10 Problem 3.2. Prove Corollary 4. (footnote 8)

Answer. Given any bijection r : ℕ → ℕ, let the reordered series be M̃_n := M_{r(n)} and x̃_n := x_{r(n)}. The reordered majorizing series is bounded, too: For all Ñ, there exists N such that r([1, Ñ]) ⊆ [1, N], and hence

Σ_{n=1}^{Ñ} M̃_n = Σ_{n=1}^{Ñ} M_{r(n)} ≤ Σ_{n=1}^{N} M_n ≤ K

Hence, by majorized convergence, the reordered infinite series has some limit x̃.
It remains to show that x̃ = x. Given ε > 0, we choose N(ε) such that estimate (3.2) holds. Hence

‖ x − Σ_{n=1}^{m} x_n ‖ ≤ ε   for all m > N(ε).

There exist N̂(ε) > N(ε) and M > N(ε) such that

[1, N(ε)] ⊆ r([1, N̂(ε)]) ⊆ [1, M]

For all m ≥ N̂(ε), we choose m̄ such that

[1, N(ε)] ⊆ r([1, m]) ⊆ [1, m̄]

For the sums we get

Σ_{n=1}^{m} x̃_n = Σ_{n=1}^{m} x_{r(n)} = Σ_{n=1}^{N(ε)} x_n + Σ_{i∈D} x_i

with D = r([1, m]) \ [1, N(ε)] ⊆ (N(ε), m̄]. Now we estimate norms

‖ Σ_{n=1}^{m} x̃_n − Σ_{n=1}^{N(ε)} x_n ‖ ≤ Σ_{i∈D} ‖x_i‖ ≤ Σ_{N(ε)<i≤m̄} M_i < ε

The triangle inequality yields

‖ Σ_{n=1}^{m} x̃_n − x ‖ ≤ ‖ Σ_{n=1}^{m} x̃_n − Σ_{n=1}^{N(ε)} x_n ‖ + ‖ Σ_{n=1}^{N(ε)} x_n − x ‖ ≤ 2ε

for all m ≥ N̂(ε), and hence convergence.
Footnote 8: The principal idea is clear, but hard to write down; an excellent exercise for a future mathematician.

Proposition 3.1. A normed space, in which every absolutely convergent series is convergent, is complete.

Proof. Let x_n be a Cauchy sequence in the given normed space. We use the Cauchy criterion with ε_s = 2^{−s} for all s = 0, 1, 2, .... Hence there exists an increasing sequence N_s such that

‖x_{n+p} − x_n‖ < 2^{−s−1}   for all n ≥ N_s and all p ≥ 1

holds for all s = 0, 1, 2, .... We get an absolutely convergent series, because of the estimate

Σ_{s=1}^{∞} ‖x_{N_s} − x_{N_{s−1}}‖ < Σ_{s=1}^{∞} 2^{−s} = 1

By the assumption of the theorem, this series is convergent in norm. In other words, the limit

x − x_{N_0} := lim_{S→∞} Σ_{s=1}^{S} [x_{N_s} − x_{N_{s−1}}] = lim_{S→∞} x_{N_S} − x_{N_0}

exists. Too, we can estimate the speed of convergence for this subsequence:

x − x_{N_S} = (x − x_{N_0}) − (x_{N_S} − x_{N_0}) = lim_{p→∞} Σ_{s=S+1}^{S+p} [x_{N_s} − x_{N_{s−1}}]

‖x − x_{N_S}‖ ≤ lim_{p→∞} ‖ Σ_{s=S+1}^{S+p} (x_{N_s} − x_{N_{s−1}}) ‖ ≤ lim_{p→∞} Σ_{s=S+1}^{S+p} ‖x_{N_s} − x_{N_{s−1}}‖

≤ Σ_{s=S+1}^{∞} 2^{−s} = 2^{−S}

Indeed, any Cauchy sequence with a convergent subsequence is convergent. For convenience, we directly establish convergence of the entire sequence. Indeed, for all n ≥ N_{S+1}

‖x − x_n‖ ≤ ‖x − x_{N_{S+1}}‖ + ‖x_n − x_{N_{S+1}}‖ ≤ 2^{−S−1} + 2^{−S−1} = 2^{−S}

and hence lim_{n→∞} x_n = x, as to be shown.  □

4 Power series
4.1 The radius of convergence
In the context of analytic functions, the Corollary 5 about uniformly majorized series
of continuous functions yields

Corollary 6 (Weierstrass M-test). Suppose a series of continuous functions u_k : T → ℂ on the domain T ⊆ ℂ is uniformly majorized:

|u_k(z)| ≤ M_k   for all z ∈ T and k ∈ ℕ

and the majorizing series Σ_{k=1}^{∞} M_k < ∞ is convergent. Then the series of functions

Σ_{k=1}^{∞} u_k(z)

converges uniformly and absolutely to a continuous function.

Theorem 4.1. A power series Σ_{n=0}^{∞} c_n zⁿ, with arbitrary complex coefficients c_n, has a radius of convergence ρ ∈ [0, ∞]. The power series is absolutely convergent if |z| < ρ and divergent if |z| > ρ. (footnote 9)

(i) If ρ = 0, the power series is only convergent for z = 0 and no other value of z.

(ii) If 0 < ρ < ∞, the set where the power series is convergent contains the open disk D(0, ρ) and is contained in the closed disk D̄(0, ρ).

(iii) If ρ = ∞, the power series is absolutely convergent for all z ∈ ℂ.

Corollary 7. The convergence is uniform in any closed disk D̄(0, r) of radius r < ρ, as well as in any compact subset K ⊆ D(0, ρ) of the open convergence disk.

Remark. The convergence on the boundary circle {z : |z| = ρ} is a much more difficult problem. This question is really equivalent to the deep and difficult subject of convergence of Fourier series.
It is known that there exist cases with conditional or absolute convergence on a large variety of different subsets of the boundary circle.
Proof of Theorem 4.1. Let

ρ := sup { r ≥ 0 : Σ_{n=0}^{∞} |c_n rⁿ| < ∞ }
ρ_B := sup { r ≥ 0 : the sequence |c_n rⁿ| is bounded }

We prove that both ρ ≤ ρ_B and ρ_B ≤ ρ. Since the series is absolutely convergent for |z| < ρ, and divergent for |z| > ρ_B, the result follows.
Let r < ρ. Since the terms of the convergent series Σ_{n=0}^{∞} |c_n|rⁿ have the limit zero, they are bounded, too. Hence r < ρ implies r ≤ ρ_B. Hence ρ ≤ ρ_B.
Footnote 9: In the context of power series, one puts 0⁰ = 1; hence the sum for z = 0 is c₀.

We now prove the reversed inequality. Assume ρ_B > 0, otherwise nothing is to be shown. Let 0 < q < 1 be arbitrarily given and put r := q² ρ_B. Since r/q = q ρ_B < ρ_B, there exists a bound M such that

|c_n| rⁿ/qⁿ ≤ M   for all n ∈ ℕ
|c_n rⁿ| ≤ M qⁿ   for all n ∈ ℕ

Σ_{n=0}^{∞} |c_n rⁿ| ≤ M Σ_{n=0}^{∞} qⁿ = M/(1 − q) < ∞

The Weierstrass M-test given in Corollary 6 above implies that the series converges absolutely and uniformly in the disk D̄(0, r), since it is uniformly majorized. Hence q² ρ_B = r ≤ ρ. Since q < 1 is arbitrary, we conclude ρ_B ≤ ρ.  □

Remark. The argument proves, too, that the function f(z) := Σ_{n=0}^{∞} c_n zⁿ satisfies the estimate

|f(z)| ≤ ( r / (r − |z|) ) · sup{ |c_n|rⁿ : n ∈ ℕ }

for |z| < r < ρ. (footnote 10)
Theorem 4.2. A power series has the same radius of convergence as the termwise differentiated, or termwise integrated series.

Proof. We deal with the termwise differentiated series. Let

ρ′ := sup { r ≥ 0 : Σ_{n=0}^{∞} n|c_n|r^{n−1} < ∞ }

be its radius of convergence. The Weierstrass M-test implies ρ′ ≤ ρ, as the reader can check.
We now prove the reversed inequality. Let 0 < q < 1 be arbitrarily given. Bernoulli's inequality implies

n q^{n−1} ≤ (1 − qⁿ)/(1 − q) ≤ 1/(1 − q)

for all natural n. (footnote 11) We can now use the Weierstrass M-test for the termwise differentiated series, and prove uniform convergence for |z| ≤ r with r := q²ρ.

n |c_n ρ^{n−1} q^{2n−2}| ≤ |c_n ρ^{n−1} q^{n−1}| / (1 − q)   for all n ∈ ℕ

Σ_{n=0}^{∞} n |c_n ρ^{n−1} q^{2n−2}| ≤ (1/(1 − q)) Σ_{n=0}^{∞} |c_n ρ^{n−1} q^{n−1}| < ∞

Hence q²ρ ≤ ρ′ for arbitrary q < 1. We conclude ρ ≤ ρ′, as to be shown.  □


Footnote 10: Often the maximum principle allows a sharper estimate.
Footnote 11: Who does not know Bernoulli's inequality, including professors, needs to do this as an exercise.

Proposition 4.1. The power series Σ_{n=0}^{∞} c_n zⁿ, with arbitrary complex coefficients c_n, has the radius of convergence

(4.1)  ρ := 1 / lim sup_{n→∞} |c_n|^{1/n}

Proof. Let ρ ∈ [0, ∞] be the root expression above. We show that the series is absolutely convergent for |z| < ρ, but divergent for |z| > ρ. Indeed, we show the convergence is uniform on any closed disk of smaller radius r < ρ.
Assume |z| ≤ r < ρ. Since

r · lim sup_{n→∞} |c_n|^{1/n} < 1

there exist q < 1 and N such that

r · lim sup_{n→∞} |c_n|^{1/n} < q < 1
r |c_n|^{1/n} ≤ q
|c_n rⁿ| ≤ qⁿ   for all n ≥ N

Hence for all |z| ≤ r and all n ≥ N

|c_n zⁿ| ≤ qⁿ
Σ_{n=N}^{∞} |c_n zⁿ| ≤ Σ_{n=N}^{∞} qⁿ = q^N/(1 − q) < ∞

Hence the Weierstrass M-test shows the series is absolutely and uniformly convergent for |z| ≤ r, and this statement is true for all r strictly less than ρ.
On the other hand, assume |z| > ρ. Hence

|z| · lim sup_{n→∞} |c_n|^{1/n} > 1

Hence there exists a subsequence with indices n_s such that

|z| |c_{n_s}|^{1/n_s} > 1
|c_n zⁿ| > 1   for all n from the subsequence
Σ_{n=1}^{∞} |c_n zⁿ| ≥ Σ_{s=1}^{∞} |c_{n_s} z^{n_s}| = ∞

Hence the series is divergent for |z| > ρ.  □


10 Problem 4.1. Prove

(4.2)  lim_{n→∞} ⁿ√n = 1
Answer. Given 3 > ε > 0. By the Archimedean axiom there exists N such that N > 9 ε⁻². This choice allows the following calculation for all n ≥ N. We use Bernoulli's inequality, and a further elementary estimate, too.

9 ε⁻² < N ≤ n
1 < n ε²/9
n < n² ε²/9 = (nε/3)² ≤ (1 + ε/3)^{2n}   by Bernoulli's inequality

(1 + ε/3)² ≤ 1 + ε   holds for 0 < ε < 3

n < (1 + ε/3)^{2n} ≤ (1 + ε)ⁿ
1 ≤ ⁿ√n < 1 + ε
|ⁿ√n − 1| < ε

We have shown: for any given ε > 0, there exists N such that for all n ≥ N, the last line is true. Hence lim_{n→∞} ⁿ√n = 1 has been confirmed.
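A quick numerical check of the limit (4.2), added by the editor as an illustration:

```python
# Numeric check of (4.2): n**(1/n) -> 1, and it is decreasing for n >= 3.
values = [n ** (1.0 / n) for n in (10, 1000, 100000)]
assert values[0] > values[1] > values[2] > 1.0
assert values[-1] - 1.0 < 1e-3
```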
Using the limit (4.2), and the formula (4.1) for the radius of convergence, we get an independent proof of Theorem 4.2. Indeed, the radius of convergence of the termwise differentiated series equals that of the original series.
Proposition 4.2. The radius of convergence ρ of the power series Σ_{n=0}^{∞} c_n zⁿ with nonzero coefficients has the bounds

(4.3)  lim inf_{n→∞} |c_n|/|c_{n+1}| ≤ ρ ≤ lim sup_{n→∞} |c_n|/|c_{n+1}|

Proof. Let ρ₁, ρ₂ ∈ [0, ∞] be the limit inferior and the limit superior of the quotients above. We show the series is absolutely convergent for |z| < ρ₁. Secondly, we show the series is divergent for |z| > ρ₂.
If ρ₁ = 0, nothing has to be proved. (Why?) We may assume |z| ≤ r < q ρ₁ < ρ₁ with some quotient q < 1. By definition of the limit inferior, there exists N such that for all n ≥ N we have c_n ≠ 0 and

r/q ≤ |c_n|/|c_{n+1}|
|c_{n+1} r^{n+1}| ≤ |c_n rⁿ| q

By induction we prove

|c_{N+p} r^{N+p}| ≤ |c_N r^N| q^p
for all p ∈ ℕ. Similarly to the proof of Proposition 4.1, the estimate

Σ_{n=N}^{∞} |c_n zⁿ| ≤ |c_N r^N| Σ_{p=0}^{∞} q^p = |c_N r^N| / (1 − q) < ∞

and the Weierstrass M-test show the series is absolutely and uniformly convergent for |z| ≤ r < ρ₁.
On the other hand, it is as easy to show that the series is divergent for |z| > ρ₂. If ρ₂ = +∞, nothing has to be proved. Assume ρ₂ < +∞. By the definition of the limit superior, there exists N such that for all n ≥ N we have c_n ≠ 0 and

|c_n|/|c_{n+1}| ≤ |z|
|c_n zⁿ| ≤ |c_{n+1} z^{n+1}|

Of course, induction shows |c_n zⁿ| ≥ |c_N z^N| > 0 for all n ≥ N. Hence the series diverges, since the terms of a convergent series have the limit zero.  □
Corollary 8. If c_n ≠ 0 except for finitely many coefficients, and the limit

(4.4)  q := lim_{n→∞} |c_n|/|c_{n+1}|

exists, or this sequence of quotients diverges to +∞, then q = ρ ∈ [0, ∞] is the radius of convergence of the power series Σ_{n=0}^{∞} c_n zⁿ.

10 Problem 4.2. Prove: If the power series Σ_{n=0}^{∞} c_n zⁿ has radius of convergence ρ, then the power series Σ_{n=0}^{∞} c_n z^{2n} has radius of convergence √ρ.

Remark. It is easier to get the radius of convergence of an even series Σ_{n=0}^{∞} c_n z^{2n}, or similarly an odd series, directly with the quotient criterion.
10 Problem 4.3. Find the radius of convergence of the series

Σ_{n=1}^{∞} (nⁿ / n!) zⁿ

Use your result to calculate the limit

lim_{n→∞} ⁿ√(n!) / n
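A hedged numeric sketch for Problem 4.3 (editor's illustration, not a solution): with c_n = nⁿ/n!, the quotient |c_n|/|c_{n+1}| equals (1 + 1/n)^{−n}, which tends to 1/e; so the radius of convergence should be 1/e, and by Corollary 8 the limit ⁿ√(n!)/n should also tend to 1/e.

```python
import math

# c_n = n^n / n!  gives  |c_n| / |c_{n+1}| = (1 + 1/n)^(-n)  ->  1/e.
n = 2000
quotient = (1.0 + 1.0 / n) ** (-n)           # |c_n| / |c_{n+1}|
root = math.exp(math.lgamma(n + 1) / n) / n  # (n!)^(1/n) / n via log-Gamma

assert abs(quotient - 1.0 / math.e) < 1e-3
assert abs(root - 1.0 / math.e) < 1e-2
```

The log-Gamma function is used only to evaluate (n!)^{1/n} without overflow.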
Proposition 4.3. Let |c_n| > 0 for all n. If the limit

(4.5)  q := lim_{n→∞} |c_n|/|c_{n+1}|

exists, then the limit

(4.6)  lim_{n→∞} |c_n|^{−1/n}

exists, too, and the two are equal. Similarly, if the first sequence diverges to infinity, the second sequence diverges to infinity.

10 Problem 4.4. Find positive sequences a_n for which the quotients a_n/a_{n+1} and the roots a_n^{−1/n} have different limiting values.

4.2 Termwise integration

Theorem 4.3 (Integration with uniform convergence). Let T ⊆ ℝ^d be a connected set and γ be a piecewise smooth path in T of finite length. Suppose a sequence of continuous functions f_n : T → ℝ is uniformly convergent on the path γ ⊆ T:

lim_{n→∞} f_n(x) = f(x)   uniformly for x ∈ γ

Then the integrals converge, too:

lim_{n→∞} ∫_γ f_n(x) dx = ∫_γ f(x) dx

Remark. The remainder can be estimated

| ∫_γ f(z) dz − ∫_γ f_n(z) dz | ≤ ∫_γ |f(z) − f_n(z)| |dz| ≤ |γ| max{ |f(z) − f_n(z)| : z ∈ γ }

where |γ| denotes the length of the path γ.


Corollary 9 (Integration with Weierstrass M-test). Suppose a series of continuous functions u_k : T → ℂ on the domain T ⊆ ℂ is uniformly majorized:

|u_k(z)| ≤ M_k   for all z ∈ T and k ∈ ℕ

and the majorizing series Σ_{k=1}^{∞} M_k < ∞ is convergent. Then the series of functions

u(z) = Σ_{k=1}^{∞} u_k(z)

converges uniformly and absolutely to a continuous function u(z), and can be termwise integrated along any piecewise smooth path γ ⊆ T of finite length:

∫_γ u(z) dz = Σ_{k=1}^{∞} ∫_γ u_k(z) dz

Remark. The remainder can be estimated

| ∫_γ u(z) dz − Σ_{k=0}^{N} ∫_γ u_k(z) dz | ≤ |γ| Σ_{k=N+1}^{∞} M_k

where |γ| denotes the length of the path γ.

Corollary 10. Any Taylor series with positive radius of convergence ρ

f(z) = Σ_{n=0}^{∞} c_n (z − z₀)ⁿ

defines a continuous function in its open disk of convergence D(z₀, ρ). Let γ be a compact path from the center z₀ to any point z inside the open disk of convergence. Then termwise integration yields

∫_γ f(ζ) dζ = Σ_{n=0}^{∞} ( c_n / (n+1) ) (z − z₀)^{n+1}

Corollary 11. Especially, the integral is independent of the integration path. Hence integration defines a function

F(z) := ∫_{z₀}^{z} f(ζ) dζ

for all z in the open disk D(z₀, ρ). This function has the Taylor series

F(z) = Σ_{n=0}^{∞} ( c_n / (n+1) ) (z − z₀)^{n+1}

obtainable by termwise integration. Its convergence is uniform on any compact subset of the open disk D(z₀, ρ).

Remark. The remainder can be estimated

| F(z) − Σ_{n=0}^{N} ( c_n / (n+1) ) (z − z₀)^{n+1} | = | ∫_{z₀}^{z} [ f(ζ) − Σ_{n=0}^{N} c_n (ζ − z₀)ⁿ ] dζ |

≤ ∫_{z₀}^{z} Σ_{n=N+1}^{∞} |c_n (ζ − z₀)ⁿ| |dζ|

≤ Σ_{n=N+1}^{∞} |c_n (z − z₀)^{n+1}| / (n+1)

4.3 Termwise differentiation

Theorem 4.4. Any Taylor series with positive radius of convergence ρ

f(z) = Σ_{n=0}^{∞} c_n (z − z₀)ⁿ

defines a continuously complex differentiable function in its open disk of convergence D(z₀, ρ). The derivative is obtained by termwise differentiation

df/dz = Σ_{n=0}^{∞} n c_n (z − z₀)^{n−1}

and the convergence of this series is uniform on any compact subset of the open disk D(z₀, ρ).

Proof. We define

g(z) := Σ_{n=0}^{∞} n c_n (z − z₀)^{n−1}

The radius of convergence of this termwise differentiated series is again ρ, and the convergence of this series is uniform on any compact subset of the open disk D(z₀, ρ). Hence we can integrate termwise and obtain

∫_{z₀}^{z} g(ζ) dζ = Σ_{n=1}^{∞} c_n (z − z₀)ⁿ

with the same radius of convergence, and uniform convergence of the power series on any compact subset of the open disk D(z₀, ρ). Hence we get

∫_{z₀}^{z} g(ζ) dζ = f(z) − c₀

Too, we have shown above that the integral is independent of the integration path. Hence the function f(z) is differentiable and f′(z) = g(z).  □
Second proof. Instead, one can avoid all integrals and their path independence. We prove differentiability head-on: Take any point z ∈ D(z₀, ρ) and restrict the increment such that |z + Δz − z₀| < ρ. The difference quotient minus the expected derivative has the series expansion

[f(z + Δz) − f(z)]/Δz − g(z) = Σ_{n=2}^{∞} c_n [ ((z + Δz − z₀)ⁿ − (z − z₀)ⁿ)/Δz − n(z − z₀)^{n−1} ]

The last bracket can be estimated by

| ((z + Δz − z₀)ⁿ − (z − z₀)ⁿ)/Δz − n(z − z₀)^{n−1} | ≤ (n(n−1)/2) |Δz| max[ |z − z₀|^{n−2}, |z + Δz − z₀|^{n−2} ]

There exists δ > 0 such that max[ |z − z₀|, |z + Δz − z₀| ] ≤ (1 − δ)² ρ. Hence

| [f(z + Δz) − f(z)]/Δz − g(z) | ≤ Σ_{n=2}^{∞} (n(n−1)/2) |c_n| ((1 − δ)² ρ)^{n−2} |Δz| ≤ K |Δz|

The series

K := Σ_{n=2}^{∞} (n(n−1)/2) |c_n| ((1 − δ)² ρ)^{n−2}

is convergent: by Bernoulli's inequality, (1 + δ)^{n−2} grows faster than n², and hence

(n(n−1)/2) (1 − δ)^{n−2} ≤ n² (1 − δ)^{n−2} ≤ n² / (1 + δ)^{n−2} ≤ C(δ)

with a constant C(δ) depending only on δ. Therefore

K ≤ C(δ) Σ_{n=2}^{∞} |c_n| ((1 − δ)ρ)^{n−2} < ∞

since (1 − δ)ρ < ρ lies inside the radius of convergence. In the end, we have confirmed

lim_{Δz→0} [f(z + Δz) − f(z)]/Δz − g(z) = 0

and hence f′(z) = g(z), as to be shown.  □


10 Problem 4.5. Prove that for any complex numbers a ≠ b and natural number n ≥ 2

(aⁿ − bⁿ)/(a − b) − n b^{n−1} = (a − b) Σ_{k=1}^{n−1} k a^{n−1−k} b^{k−1}

Answer. Take the formula for the finite geometric series, differentiate both sides by x. Multiply both sides with (1 − x). Then we put x = b/a and multiply both sides with a^{n−1}:

Σ_{k=0}^{n−1} x^k = (1 − xⁿ)/(1 − x)

Σ_{k=1}^{n−1} k x^{k−1} = ( −n x^{n−1}(1 − x) + 1 − xⁿ ) / (1 − x)²

(1 − x) Σ_{k=1}^{n−1} k x^{k−1} = (1 − xⁿ)/(1 − x) − n x^{n−1}

(a − b) Σ_{k=1}^{n−1} k b^{k−1} a^{n−1−k} = (aⁿ − bⁿ)/(a − b) − n b^{n−1}

10 Problem 4.6. Use the last problem to get the estimate

| (aⁿ − bⁿ)/(a − b) − n b^{n−1} | ≤ |a − b| (n(n−1)/2) max[ |a|^{n−2}, |b|^{n−2} ]

5 Matrices and Bounded Operators

5.1 Matrix norms

For any vector norm ‖x‖ for x ∈ ℝ^d, the corresponding matrix norm for d × d matrices is defined to be

(5.1)  ‖A‖ := sup { ‖Ax‖ / ‖x‖ : x ∈ ℝ^d, x ≠ 0 }

Proposition 5.1. Any matrix norm has the following properties:

(i) The supremum in equation (5.1) can be restricted to the unit sphere:

(5.2)  ‖A‖ = sup { ‖Ax‖ : x ∈ ℝ^d, ‖x‖ = 1 }

(ii) For all x ∈ ℝ^d

(5.3)  ‖Ax‖ ≤ ‖A‖ ‖x‖

(iii) The matrix norm is the least constant M such that for all x ∈ ℝ^d

‖Ax‖ ≤ M ‖x‖

10 Problem 5.1. To convince yourself that Proposition 5.1 is correct, we define:

N(A) := sup { ‖Ax‖ / ‖x‖ : x ∈ ℝ^d, x ≠ 0 }
B(A) := sup { ‖Ax‖ : x ∈ ℝ^d, ‖x‖ = 1 }
C(A) := inf { M : ‖Ax‖ ≤ M ‖x‖ for all x ∈ ℝ^d }

Prove that N(A) = B(A) = C(A) and ‖Ax‖ ≤ N(A) ‖x‖ for all x ∈ ℝ^d.
Answer.
Proposition 5.2. The matrix norm is a norm on the vector space of matrices:

(iv) For all matrices, ‖A‖ ≥ 0, and ‖A‖ = 0 implies A = 0.

(v) ‖λA‖ = |λ| ‖A‖

(vi) ‖A + B‖ ≤ ‖A‖ + ‖B‖

(vii) ‖AB‖ ≤ ‖A‖ ‖B‖

10 Problem 5.2. Prove that the matrix norm is a norm on the vector space of matrices.

Answer.
10 Problem 5.3 (The row-sum norm of matrices). The maximum norm for vectors x ∈ ℝ^d is defined as

(5.4)  ‖x‖_∞ := max{ |x_k| : 1 ≤ k ≤ d }

Prove that the corresponding matrix norm is

(5.5)  ‖A‖_∞ := max_{i=1}^{d} Σ_{k=1}^{d} |a_{ik}|

Answer. For any vector x ∈ ℝ^d,

‖Ax‖_∞ = max_{i=1}^{d} | Σ_{k=1}^{d} a_{ik} x_k | ≤ max_{i=1}^{d} Σ_{k=1}^{d} |a_{ik}| |x_k|

(5.6)  ≤ [ max_{i=1}^{d} Σ_{k=1}^{d} |a_{ik}| ] [ max_{k=1}^{d} |x_k| ] = [ max_{i=1}^{d} Σ_{k=1}^{d} |a_{ik}| ] ‖x‖_∞

Hence

‖A‖_∞ ≤ max_{i=1}^{d} Σ_{k=1}^{d} |a_{ik}|

We now check this is the best constant. Let i₀ be the minimal index such that

Σ_{k=1}^{d} |a_{i₀k}| = max_{i=1}^{d} Σ_{k=1}^{d} |a_{ik}|

Choose the vector x by setting

x_k = sign a_{i₀k}   for all 1 ≤ k ≤ d

For this choice of vector x, all steps in the estimate (5.6) become equalities. Hence the estimate is optimal.
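The row-sum formula (5.5) and the optimizing vector from the answer can be checked numerically; the following editor's sketch uses an arbitrary example matrix.

```python
import numpy as np

# Check of (5.5): the operator norm induced by the max norm equals the
# maximal row sum, attained at the sign vector x_k = sign(a_{i0 k}).
A = np.array([[1.0, -2.0, 3.0],
              [0.5,  0.0, -4.0],
              [2.0,  2.0,  1.0]])
row_sum = np.abs(A).sum(axis=1).max()

i0 = np.abs(A).sum(axis=1).argmax()
x = np.sign(A[i0])            # the optimizing vector from the proof
x[x == 0] = 1.0               # any sign works where a_{i0 k} = 0

assert np.linalg.norm(x, np.inf) == 1.0
assert np.isclose(np.linalg.norm(A @ x, np.inf), row_sum)
assert np.isclose(np.linalg.norm(A, np.inf), row_sum)  # NumPy's induced inf-norm agrees
```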
10 Problem 5.4 (The column-sum norm of matrices). The sum norm for vectors x ∈ ℝ^d is defined as

(5.7)  ‖x‖₁ := Σ_{k=1}^{d} |x_k|

Prove that the corresponding matrix norm is

(5.8)  ‖A‖₁ := max_{k=1}^{d} Σ_{i=1}^{d} |a_{ik}|

Hence ‖A‖₁ = ‖Aᵀ‖_∞.
Answer. For any vector x ∈ ℝ^d,

‖Ax‖₁ = Σ_{i=1}^{d} | Σ_{k=1}^{d} a_{ik} x_k | ≤ Σ_{i=1}^{d} Σ_{k=1}^{d} |a_{ik}| |x_k|

(5.9)  = Σ_{k=1}^{d} [ Σ_{i=1}^{d} |a_{ik}| ] |x_k| ≤ [ max_{k=1}^{d} Σ_{i=1}^{d} |a_{ik}| ] Σ_{k=1}^{d} |x_k|
       = [ max_{k=1}^{d} Σ_{i=1}^{d} |a_{ik}| ] ‖x‖₁

Hence

‖A‖₁ ≤ max_{k=1}^{d} Σ_{i=1}^{d} |a_{ik}|

We now check this is the best constant. Let k₀ be the minimal index such that

Σ_{i=1}^{d} |a_{ik₀}| = max_{k=1}^{d} Σ_{i=1}^{d} |a_{ik}|

Choose the vector x by setting

(5.10)  x_k = 1 if k = k₀, and x_k = 0 if k ≠ k₀

For this choice of vector x, all steps in the estimate (5.9) become equalities. Hence the estimate is optimal.

Definition 5.1 (The Euclidean norms). The Euclidean norm for vectors x ∈ ℝ^d is defined as

(5.11)  ‖x‖₂ := √( Σ_{k=1}^{d} |x_k|² ) = √(xᵀx)

The corresponding matrix norm is

(5.12)  ‖A‖₂ = max{ ‖Ax‖₂ : x ∈ ℝ^d, ‖x‖₂ = 1 }

Definition 5.2 (Transposition and Symmetry). The transpose of an m × n matrix A with elements a_{ik} is the n × m matrix Aᵀ with the elements

(Aᵀ)_{ki} = A_{ik}   for i = 1 ... m and k = 1 ... n
10 Problem 5.5. Show (AB)ᵀ = BᵀAᵀ holds for any two matrices where the multiplication on either side is well defined.

I shall often use matrix multiplication of row and column vectors. Hence for a, b ∈ ℝ^d,

aᵀb = Σ_{i=1}^{d} a_i b_i

Proposition 5.3. The Cauchy-Schwarz inequality tells that for all a, b ∈ ℝ^d

|aᵀb| ≤ ‖a‖₂ ‖b‖₂

The equality |aᵀb| = ‖a‖₂ ‖b‖₂ holds in Cauchy-Schwarz if and only if the two vectors a and b are linearly dependent.

10 Problem 5.6. Prove the Cauchy-Schwarz inequality, and deal with the case of equality.

10 Problem 5.7. Explain why any d × d matrix satisfies ‖Aᵀ‖₂ = ‖A‖₂.

Answer. By the Cauchy-Schwarz inequality, we get for all x, y ∈ ℝ^d

|(Aᵀx)ᵀ y| = |xᵀAy| ≤ ‖x‖₂ ‖Ay‖₂ ≤ ‖x‖₂ ‖A‖₂ ‖y‖₂

We choose y := Aᵀx. Because of |(Aᵀx)ᵀ Aᵀx| = ‖Aᵀx‖₂², we get

‖Aᵀx‖₂² ≤ ‖x‖₂ ‖A‖₂ ‖y‖₂ = ‖x‖₂ ‖A‖₂ ‖Aᵀx‖₂
‖Aᵀx‖₂ ≤ ‖x‖₂ ‖A‖₂   for all x ∈ ℝ^d
‖Aᵀ‖₂ ≤ ‖A‖₂

Since (Aᵀ)ᵀ = A, we get ‖A‖₂ ≤ ‖Aᵀ‖₂ similarly, and hence ‖Aᵀ‖₂ = ‖A‖₂, as to be shown.
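A numeric spot check of Problem 5.7 (editor's illustration): the Euclidean operator norm, which NumPy computes as the largest singular value, agrees for a random matrix and its transpose.

```python
import numpy as np

# The Euclidean operator norms of A and A^T coincide.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
assert np.isclose(np.linalg.norm(A, 2), np.linalg.norm(A.T, 2))
```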

Lemma 5.1. The maximum norm, Euclidean norm, and sum norm satisfy

‖x‖_∞ ≤ ‖x‖₂ ≤ ‖x‖₁ ≤ √d ‖x‖₂ ≤ d ‖x‖_∞   for all x ∈ ℝ^d

Remark. We shall improve this estimate later on.

10 Problem 5.8. Prove Lemma 5.1. You need the Cauchy-Schwarz inequality at one point.
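The chain of inequalities in Lemma 5.1 can be checked on random vectors; the following is an editor's numeric sketch, not a proof.

```python
import numpy as np

# Check of Lemma 5.1 on random vectors:
# ||x||_inf <= ||x||_2 <= ||x||_1 <= sqrt(d) ||x||_2 <= d ||x||_inf.
rng = np.random.default_rng(1)
d = 5
for _ in range(100):
    x = rng.standard_normal(d)
    ninf = np.linalg.norm(x, np.inf)
    n2 = np.linalg.norm(x, 2)
    n1 = np.linalg.norm(x, 1)
    assert ninf <= n2 + 1e-12
    assert n2 <= n1 + 1e-12
    assert n1 <= np.sqrt(d) * n2 + 1e-12   # Cauchy-Schwarz step
    assert np.sqrt(d) * n2 <= d * ninf + 1e-12
```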

Lemma 5.2. The maximum norm, Euclidean norm, and sum norm of d × d matrices satisfy

‖A‖_∞ ≤ √d ‖A‖₂ ≤ d ‖A‖₁
‖A‖₁ ≤ √d ‖A‖₂ ≤ d ‖A‖_∞

10 Problem 5.9. Prove Lemma 5.2.


Lemma 5.3. For any norm ‖·‖ in ℝ^d, there exist constants m, M > 0 such that

(5.13)  m ‖x‖_∞ ≤ ‖x‖ ≤ M ‖x‖_∞   for all x ∈ ℝ^d

Proof. Any vector x ∈ ℝ^d has a unique decomposition in the standard basis

x = Σ_{k=1}^{d} x_k e_k

Hence the triangle inequality implies

‖x‖ ≤ Σ_{k=1}^{d} |x_k| ‖e_k‖ ≤ [ Σ_{k=1}^{d} ‖e_k‖ ] max_{k=1}^{d} |x_k| = [ Σ_{k=1}^{d} ‖e_k‖ ] ‖x‖_∞

Since ‖x‖_∞ = max_{k=1}^{d} |x_k|, the right hand inequality of (5.13) holds with

M := Σ_{k=1}^{d} ‖e_k‖

The set

K := { x ∈ ℝ^d : ‖x‖_∞ = 1 }

is bounded and closed, and hence compact. Too, we see that the function

‖·‖ : x ∈ ℝ^d ↦ ‖x‖ ∈ ℝ

is continuous. On the compact set K, this function ‖·‖ takes a minimal and a maximal value. Its maximum value is at most M (Why?). Its minimum value is m > 0. Indeed m = 0 is impossible, since K does not contain the zero vector, and hence ‖x‖ > 0 for all x ∈ K. Hence we have confirmed that

m ≤ ‖x‖ ≤ M   for all x ∈ ℝ^d with ‖x‖_∞ = 1

which easily implies, by homogeneity of both norms,

m ‖x‖_∞ ≤ ‖x‖ ≤ M ‖x‖_∞   for all x ∈ ℝ^d   □
Lemma 5.4. The supremum in equation (5.2) is attained. In other words, for any vector norm ‖·‖ and any matrix A, there exists a vector x ∈ ℝ^d such that ‖x‖ = 1 and ‖Ax‖ = ‖A‖.

Proof. The function

H : x ∈ ℝ^d ↦ ‖Ax‖ ∈ ℝ

is continuous (why?). The set

K_norm := { x ∈ ℝ^d : ‖x‖ = 1 }

is closed and bounded (why?), and hence compact. On the set K_norm, the function H takes a maximum value. This maximum value can only be the matrix norm ‖A‖.  □

Lemma 5.5. For any two norms ‖·‖ and ‖·‖_standard in ℝ^d, there exist constants m, M > 0 such that

m ‖x‖_standard ≤ ‖x‖ ≤ M ‖x‖_standard   for all x ∈ ℝ^d

10 Problem 5.10. Give the simple reason why Lemma 5.3 implies Lemma 5.5. (footnote 12)

5.2 Inverse matrices and continuity

10 Problem 5.11. Let ‖A‖ denote any matrix norm for d × d matrices.

(i) If ‖A‖ < 1, then the geometric series

I + A + A² + ...

is convergent in the matrix norm.

(ii) Use the identity

(I − A)(I + A + A² + ⋯ + A^{n−1}) = (I + A + A² + ⋯ + A^{n−1})(I − A) = I − Aⁿ

to show: If ‖A‖ < 1, then the geometric series has limit (I − A)⁻¹. Hence I − A is invertible.

(iii) If ‖A‖ < 1, then

‖(I − A)⁻¹ − I‖ ≤ ‖A‖ / (1 − ‖A‖)

(iv) If there exists a natural number m such that ‖Aᵐ‖ < 1, then I − A is invertible.

Answer.
Footnote 12: The values of the bounds m and M are the personal secret of the gods who invented compactness.

10 Problem 5.12. Use the previous problem to derive:

(i) If A is invertible, and ‖B‖ < 1/‖A⁻¹‖, then A + B is invertible.

(ii) If A is invertible, and ‖B‖ < 1/‖A⁻¹‖, then

‖(A + B)⁻¹ − A⁻¹‖ ≤ ‖A⁻¹‖ ‖BA⁻¹‖ / (1 − ‖BA⁻¹‖)

Answer.

10 Problem 5.13. Use the previous problem to derive:

(i) The set of invertible matrices is open.

(ii) The inverse depends continuously on the matrix.

Answer.
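The geometric (Neumann) series of Problem 5.11 can be illustrated numerically; the following editor's sketch uses an arbitrary 2 × 2 example with the row-sum norm.

```python
import numpy as np

# For ||A|| < 1 the series I + A + A^2 + ... converges to (I - A)^{-1},
# and part (iii) bounds the distance of the inverse from I.
A = np.array([[0.2, 0.1],
              [0.0, 0.3]])
a = np.linalg.norm(A, np.inf)
assert a < 1.0

S = np.eye(2)
term = np.eye(2)
for _ in range(200):        # partial sum of the geometric series
    term = term @ A
    S = S + term

inv = np.linalg.inv(np.eye(2) - A)
assert np.allclose(S, inv)
# error bound of part (iii): ||(I-A)^{-1} - I|| <= ||A|| / (1 - ||A||)
assert np.linalg.norm(inv - np.eye(2), np.inf) <= a / (1 - a) + 1e-9
```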

5.3 The Spectral Radius

Definition 5.3. The spectral radius of any d × d matrix A is

(5.14)  sp A := inf_{n∈ℕ} ‖Aⁿ‖^{1/n}

Lemma 5.6. For any two norms ‖·‖ and ‖·‖_standard in ℝ^d, there exist constants m, M > 0 such that

m ‖x‖_standard ≤ ‖x‖ ≤ M ‖x‖_standard   for all x ∈ ℝ^d

Hence for any matrix A, the two corresponding matrix norms satisfy

(m/M) ‖A‖_standard ≤ ‖A‖ ≤ (M/m) ‖A‖_standard

10 Problem 5.14. Prove the second assertion of Lemma 5.6.

Lemma 5.7. The spectral radius does not depend on the norm used in the definition.

Lemma 5.8. The spectral radius of a matrix and its transpose are equal.

10 Problem 5.15. Give the reason for Lemma 5.8.

Answer. Since the spectral radius does not depend on the norm used in the definition, we can use the Euclidean norm. By Problem 5.7, we know that ‖Aᵀ‖₂ = ‖A‖₂, and hence ‖(Aᵀ)ⁿ‖₂ = ‖(Aⁿ)ᵀ‖₂ = ‖Aⁿ‖₂ for all n ∈ ℕ, too. The definition of the spectral radius yields

sp Aᵀ = inf_{n∈ℕ} ‖(Aᵀ)ⁿ‖₂^{1/n} = inf_{n∈ℕ} ‖Aⁿ‖₂^{1/n} = sp A

as to be shown.

Proposition 5.4. For all λ ∈ ℂ with |λ| > sp A, the series

(5.15)  [I − λ⁻¹A]⁻¹ = I + Σ_{n=1}^{∞} Aⁿ/λⁿ

is convergent in norm and yields the inverse, and hence the matrix I − λ⁻¹A is invertible.

10 Problem 5.16. Give the reason for Proposition 5.4.

Proof. Since

|λ| > sp A = inf_{n∈ℕ} ‖Aⁿ‖^{1/n}

there exists a natural number m such that

|λ| > ‖Aᵐ‖^{1/m}
1 > ‖λ⁻ᵐ Aᵐ‖

The norm convergence of the series (5.15) can be seen by grouping the terms according to the division n/m. For all n ∈ ℕ, division yields

n = qm + r

with quotient q = floor(n/m) and remainder 0 ≤ r < m. The norm series is easily checked to be convergent. With the convention A⁰ = C⁰ = I and C := λ⁻¹A, we get

Σ_{n=0}^{∞} ‖Cⁿ‖ = Σ_{q=0}^{∞} Σ_{r=0}^{m−1} ‖C^{qm+r}‖ ≤ Σ_{q=0}^{∞} Σ_{r=0}^{m−1} ‖Cᵐ‖^q ‖Cʳ‖

= [ Σ_{r=0}^{m−1} ‖Cʳ‖ ] [ Σ_{q=0}^{∞} ‖Cᵐ‖^q ] = ( Σ_{r=0}^{m−1} ‖Cʳ‖ ) / ( 1 − ‖Cᵐ‖ )

Hence the spectral series (5.15) is absolutely convergent. Let B(λ) be its limit. In the identity

(I − C)(I + C + C² + ⋯ + C^{n−1}) = (I + C + C² + ⋯ + C^{n−1})(I − C) = I − Cⁿ

one can take the limit n → ∞ with convergence in the matrix norm. One gets

(I − C)B(λ) = B(λ)(I − C) = I

In other words, the spectral series (5.15) converges absolutely to the inverse of I − C = I − λ⁻¹A, as to be shown.  □

Lemma 5.9. Assume sp A > 0. For any λ ∈ ℂ with 0 < |λ| ≤ sp A, the terms of the series (5.15) all have norm at least 1 and the series is divergent. For any λ ∈ ℂ with 0 < |λ| < sp A, the terms are unbounded and even diverge to infinity.
10 Problem 5.17. Prove Lemma 5.9.

Proof. Again, we put C := λ⁻¹A. For 0 < |λ| ≤ sp A, the definition of the spectral radius yields

1 ≤ sp A / |λ| = inf_{n∈ℕ} ‖Cⁿ‖^{1/n}
1 ≤ ‖Cⁿ‖   for all n ∈ ℕ

Now we check the second part: If 0 < |λ| < sp A, there exists δ > 0 such that

|λ|(1 + δ) < sp A (1 − δ) < ‖Aⁿ‖^{1/n}   for all n ∈ ℕ

(1 + δ)ⁿ < ( sp A (1 − δ) / |λ| )ⁿ < ‖Cⁿ‖   for all n ∈ ℕ

Hence even the terms of the series (5.15) diverge to infinity.  □

Theorem 5.1 (The radius of convergence of the spectral series). The spectral series

(5.15)  [I − λ⁻¹A]⁻¹ = I + Σ_{n=1}^{∞} Aⁿ/λⁿ

is absolutely convergent in norm for |λ| > sp A, and divergent for 0 < |λ| ≤ sp A.
The matrix I − λ⁻¹A is invertible for all λ ∈ ℂ with |λ| > sp A.

5.4 Subadditive Sequences

Proposition 5.5. Let a_n be a sequence of real numbers for which the subadditive property

(5.16)  a_{m+n} ≤ a_m + a_n   for all m, n ∈ ℕ

is assumed. Then

(5.17)  lim_{n→∞} a_n/n = inf_{n∈ℕ} a_n/n ∈ [−∞, ∞)

In detail: If the sequence a_n/n is bounded below, then its limit and the infimum both exist and are equal reals. If the sequence a_n/n is not bounded below, then this sequence diverges to −∞.

Proposition 5.6 (Multiplicative version). Let b_n ≥ 0 be a sequence of nonnegative numbers for which

(5.18)  b_{m+n} ≤ b_m b_n   for all m, n ∈ ℕ

is assumed. Then

(5.19)  lim_{n→∞} b_n^{1/n} = inf_{n∈ℕ} b_n^{1/n}

The limit and the infimum both exist, and are equal nonnegative reals.

Quicker proof of the multiplicative version. The division n/m yields

n = qm + r

with the remainder 0 ≤ r < m and quotient q = floor(n/m). The assumption (5.18) implies

b_n ≤ (b_m)^q b_r

Since mq/n = 1 − r/n, taking the n-th root yields

b_n^{1/n} ≤ [ b_m^{1/m} ]^{mq/n} b_r^{1/n} = b_m^{1/m} [ b_m^{−r/m} b_r ]^{1/n}

We fix m and take the limit n → ∞ to get

lim sup_{n→∞} b_n^{1/n} ≤ b_m^{1/m}   for all m ∈ ℕ

lim sup_{n→∞} b_n^{1/n} ≤ inf_{m∈ℕ} b_m^{1/m} ≤ lim inf_{m→∞} b_m^{1/m}

Hence the last inequality is an equality, the limit exists, and it equals the infimum.  □


Elaborate proof of the additive version. The division n/m yields

n = qm + r

with the remainder 0 ≤ r < m and quotient q = floor(n/m). The subadditivity (5.16) implies

a_n ≤ q a_m + a_r

Since mq/n = 1 − r/n ≤ 1, division by n yields

a_n/n ≤ (mq/n)(a_m/m) + a_r/n = a_m/m − (r/n)(a_m/m) + a_r/n ≤ a_m/m + ( |a_m| + max_{r=0}^{m−1} |a_r| ) / n

If the sequence a_m/m is not bounded below, this estimate implies that a_n/n diverges to −∞.
Assume now that the sequence a_n/n is bounded below. Let its infimum be

b := inf_{n∈ℕ} a_n/n ∈ ℝ

Given ε > 0. We choose in the first place M large enough such that

min_{m=1}^{M} a_m/m ≤ b + ε/2

Next choose N(ε) such that

max_{r=0}^{M} |a_r| / N(ε) ≤ ε/4

and conclude

a_n/n ≤ a_m/m + ε/2   for all m ≤ M and all n ≥ N(ε)
a_n/n ≤ min_{m=1}^{M} a_m/m + ε/2   for all n ≥ N(ε)
b ≤ a_n/n ≤ b + ε/2 + ε/2   for all n ≥ N(ε)

The last estimate implies the convergence

lim_{n→∞} a_n/n = b

as to be shown.
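A numeric illustration of Proposition 5.5 (editor's addition): the sequence a_n = √n is subadditive, and a_n/n descends to its infimum 0, as the lemma predicts.

```python
import math

# Fekete-type subadditive sequence a_n = sqrt(n): a_{m+n} <= a_m + a_n,
# and a_n / n tends to the infimum, here 0.
def a(n):
    return math.sqrt(n)

for m in range(1, 30):
    for n in range(1, 30):
        assert a(m + n) <= a(m) + a(n) + 1e-12   # subadditivity (5.16)

assert a(10 ** 6) / 10 ** 6 < 1e-2               # a_n / n approaches inf = 0
```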
Theorem 5.2 (The spectral radius is a limit). The spectral radius of any d × d matrix A is

(5.20)  sp A = inf_{n∈ℕ} ‖Aⁿ‖^{1/n} = lim_{n→∞} ‖Aⁿ‖^{1/n}

independently of the matrix norm ‖·‖ chosen.

Answer. Since ‖A^{m+n}‖ ≤ ‖Aᵐ‖ ‖Aⁿ‖, the sequence b_n := ‖Aⁿ‖ satisfies the assumption

b_{m+n} ≤ b_m b_n

of Proposition 5.6. Hence the limit

lim_{n→∞} b_n^{1/n} = inf_{n∈ℕ} b_n^{1/n}

exists and is equal to the infimum. Obviously, the infimum is defined to be the spectral radius sp A ∈ [0, ∞).

Remark. If A^N = 0 for some N, then Aⁿ = 0 for all n ≥ N and hence the assertion holds with sp A = 0. But in an infinite dimensional space, it is possible that Aⁿ ≠ 0 for all n ∈ ℕ and nevertheless sp A = 0.
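Theorem 5.2 can be observed numerically. The following editor's sketch compares ‖Aⁿ‖^{1/n} with the largest eigenvalue modulus of an example matrix; the identification of sp A with that modulus is an assumption here, anticipating the eigenvalue theory developed below.

```python
import numpy as np

# ||A^n||^(1/n) tends to the spectral radius; for this 2x2 example the
# eigenvalues are (1 +/- i)/2, with modulus sqrt(1/2).
A = np.array([[0.0, 1.0],
              [-0.5, 1.0]])
eig_radius = max(abs(np.linalg.eigvals(A)))

powers = [np.linalg.norm(np.linalg.matrix_power(A, n), 2) ** (1.0 / n)
          for n in (1, 10, 50, 200)]
assert abs(eig_radius - 0.5 ** 0.5) < 1e-12
assert abs(powers[-1] - eig_radius) < 1e-2
```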
10 Problem 5.18. Give the reason why the spectral radius does not depend on the norm used in the definition, as stated in Lemma 5.7.
Lemma 5.10. For any two operators, sp(BA) = sp(AB).

Proof. Use the definition and calculate

sp(BA) = inf_{n∈ℕ} ‖(BA)ⁿ‖^{1/n}
sp(BA) ≤ ‖(BA)ⁿ‖^{1/n} = ‖B(AB)^{n−1}A‖^{1/n}
sp(BA) ≤ [ ‖B‖ ‖(AB)^{n−1}‖ ‖A‖ ]^{1/n}   for all n ∈ ℕ

If AB = 0, we see that sp(BA) = 0, even in the awkward case that BA ≠ 0, as can happen. In the case AB ≠ 0, we take the limit n → ∞ and use the limit formula (5.20):

[ ‖B‖ ‖A‖ ]^{1/n} → 1   and   ‖(AB)^{n−1}‖^{1/n} = [ ‖(AB)^{n−1}‖^{1/(n−1)} ]^{(n−1)/n} → sp(AB)

In the limit, one gets sp(BA) ≤ sp(AB). Since sp(AB) ≤ sp(BA) can be shown similarly, the equality holds.  □
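A numeric check of Lemma 5.10 (editor's illustration): here the spectral radius is evaluated as the largest eigenvalue modulus, which is an assumption at this point of the notes, anticipating Lemma 6.4; AB and BA agree in spectral radius even though the products themselves differ.

```python
import numpy as np

# sp(AB) = sp(BA), checked via eigenvalue moduli on random matrices.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

r_ab = max(abs(np.linalg.eigvals(A @ B)))
r_ba = max(abs(np.linalg.eigvals(B @ A)))

assert not np.allclose(A @ B, B @ A)   # the products are different matrices
assert np.isclose(r_ab, r_ba)          # yet their spectral radii agree
```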

10 Problem 5.19. Assume two d × d matrices commute: AB = BA. Show that sp(AB) ≤ sp A · sp B.

Answer. By the definition of the left hand side, and commutativity

(sp AB)ⁿ ≤ ‖(AB)ⁿ‖ = ‖AⁿBⁿ‖ ≤ ‖Aⁿ‖ ‖Bⁿ‖
sp AB ≤ ‖Aⁿ‖^{1/n} ‖Bⁿ‖^{1/n}

for all n ∈ ℕ. We can take the limit n → ∞ and use the limit property (5.20) for the spectral radius. Thus we get the required inequality sp(AB) ≤ sp A · sp B.

10 Problem 5.20. Show: Any invertible matrix has positive spectral radius. A matrix with spectral radius zero cannot be invertible.

Answer. If the given matrix A is invertible, there exists B such that AB = BA = I. Hence

1 = sp I = sp AB ≤ sp A · sp B

Hence sp A ≠ 0.

6 The eigenvalue problem

Definition 6.1 (Eigenvalue, eigenvector, eigenspace). An eigenvalue of an operator A is a value λ ∈ ℂ such that the eigenvalue problem

Ax = λx

has a nontrivial solution x ≠ 0. Such a solution is called an eigenvector. The linear subspace of all eigenvectors (and zero) is called the eigenspace.

Lemma 6.1. All eigenvalues of any operator are in absolute value less than or equal to the spectral radius.

10 Problem 6.1. Give the reason for Lemma 6.1.

Answer. Assume Ax = λx has a nontrivial solution x ≠ 0. Take any vector norm ‖·‖. We can normalize to get ‖x‖ = 1. Taking the norm of the eigenvalue problem, we get

‖Ax‖ = |λ| ‖x‖

and hence by the definition of the matrix norm

|λ| = ‖Ax‖ / ‖x‖ ≤ ‖A‖

Since this holds for all eigenvalues, we get

(6.1)  max{ |λ| : Ax = λx has a solution x ≠ 0 } ≤ ‖A‖

Since Ax = λx implies Aⁿx = λⁿx, we can use the estimate (6.1) for all n ∈ ℕ and get

|λⁿ| = ‖Aⁿx‖ / ‖x‖ ≤ ‖Aⁿ‖
|λ| ≤ ‖Aⁿ‖^{1/n}   for all n ∈ ℕ
|λ| ≤ inf_{n∈ℕ} ‖Aⁿ‖^{1/n} = sp A
max{ |λ| : Ax = λx has a solution x ≠ 0 } ≤ sp A

as to be shown.
Definition 6.2 (Characteristic polynomial, characteristic equation). The characteristic polynomial of any d × d matrix A is defined to be

(6.2)  p_A(λ) = det[λI − A]

The equation

det[λI − A] = 0

is called the characteristic equation of the matrix A.

Definition 6.3 (Geometric and algebraic multiplicity). The dimension of the eigenspace is called the geometric multiplicity of the eigenvalue.
The algebraic multiplicity of the eigenvalue is the multiplicity of the eigenvalue as a root of the characteristic polynomial.

Main Theorem (The eigenvalues are obtained from the characteristic equation).

(i) The eigenvalues of any matrix are exactly the zeros of the characteristic polynomial.

(ii) The geometric multiplicities of the eigenvalues are less than or equal to the algebraic multiplicities.

(iii) If the characteristic equation has only simple roots, then all eigenspaces are one-dimensional.

Definition 6.4. The trace tr A of a matrix A is the sum of its diagonal elements:

tr A = Σ_{i=1}^{d} a_{ii}.

10 Problem 6.2. The trace is cyclically commutative: Check that for any two
matrices tr (AB) = tr (BA).
Give an example of three 2 2 matrices such that tr (ABC) 6= tr (BAC).
Lemma 6.2. The trace of a matrix is the sum of its eigenvalues, counting algebraic multiplicities. The determinant of the matrix is the product of its eigenvalues, again counting algebraic multiplicities:

    $\mathrm{tr}\, A = \sum_i \nu_i \lambda_i$
    $\det A = \prod_i \lambda_i^{\nu_i}$

Proof. In its factored form, the characteristic polynomial is

    $p_A(\lambda) = \prod_i (\lambda - \lambda_i)^{\nu_i}$

with the product taken over all eigenvalues $\lambda_i$, and the $\nu_i$ their algebraic multiplicities. Distributing the terms yields

    $p_A(\lambda) = \lambda^d - \Big[\sum_i \nu_i \lambda_i\Big] \lambda^{d-1} + \dots + (-1)^d \prod_i \lambda_i^{\nu_i}$

On the other hand, the definition of the characteristic polynomial yields

    $p_A(\lambda) = \det(\lambda I - A) = \lambda^d - [\mathrm{tr}\, A]\,\lambda^{d-1} + \dots + (-1)^d \det A$

It is left to the reader to check this by induction. Comparison of the two results yields the claim.
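Lemma 6.2 can be verified numerically for a small symmetric matrix whose eigenvalues are known by hand; the following Python sketch uses $A = [[2,1],[1,2]]$ with eigenvalues $1$ and $3$:

```python
import math

# Numeric check of Lemma 6.2 for A = [[2,1],[1,2]], eigenvalues 1 and 3.
A = [[2, 1], [1, 2]]

tr_A  = A[0][0] + A[1][1]                      # 4
det_A = A[0][0]*A[1][1] - A[0][1]*A[1][0]      # 3

# roots of the characteristic polynomial  lam^2 - (tr A) lam + det A
disc = math.sqrt(tr_A**2 - 4*det_A)
lam1, lam2 = (tr_A - disc)/2, (tr_A + disc)/2  # 1.0 and 3.0

assert abs(lam1 + lam2 - tr_A)  < 1e-12   # trace = sum of eigenvalues
assert abs(lam1 * lam2 - det_A) < 1e-12   # det   = product of eigenvalues
```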
Lemma 6.3. For any matrix $A$ and any invertible matrix $S$, the two matrices $A$ and $SAS^{-1}$ have the same characteristic polynomials, and hence the same eigenvalues with the same algebraic multiplicities, and the same trace and determinant.
They can have different eigenspaces; still, the geometric multiplicities of the eigenvalues do match, too.
Proof. The characteristic polynomial of $SAS^{-1}$ is

    $p_{SAS^{-1}}(\lambda) = \det[\lambda I - SAS^{-1}] = \det[S(\lambda I - A)S^{-1}] = \det S \cdot \det[\lambda I - A] \cdot \det S^{-1}$
    $= \det S \cdot \det S^{-1} \cdot \det[\lambda I - A] = \det[SS^{-1}] \det[\lambda I - A] = \det I \cdot \det[\lambda I - A]$
    $= \det[\lambda I - A]$
Thus the two matrices $A$ and $SAS^{-1}$ have the same characteristic polynomials, and hence the same eigenvalues with the same algebraic multiplicities.
If $x$ is an eigenvector of $A$ with eigenvalue $\lambda$, then $Sx$ is an eigenvector of $SAS^{-1}$ with the same eigenvalue $\lambda$, since

    $Ax = \lambda x$ implies $(SAS^{-1})\, Sx = SAx = \lambda Sx$

Hence $S$ induces an injective mapping from the eigenspace of $A$ into the eigenspace of $SAS^{-1}$. The inverse $S^{-1}$ induces an injective mapping from the eigenspace of $SAS^{-1}$ into the eigenspace of $A$. Hence $S$ induces a bijection from the eigenspace of $A$ onto the eigenspace of $SAS^{-1}$, and they have the same dimension.

Definition 6.5 (Similar matrices). Two square matrices $A$ and $B$ are called similar if there exists an invertible matrix $S$ such that $B = SAS^{-1}$.

10 Problem 6.3. Prove that similarity of matrices is an equivalence relation.

Lemma 6.4. For any two matrices the products AB and BA have the same character-
istic polynomials, and hence the same eigenvalues with the same algebraic multiplicities.

Proof. If $B$ is invertible, we use $BA = B(AB)B^{-1}$ to see that $AB$ and $BA$ are similar. Hence they have the same characteristic polynomial.
If $B \neq 0$ is not invertible, let $m$ be the minimum of the absolute values of the nonzero eigenvalues of $B$:

    $m = \min\{|\lambda| > 0 : Bx = \lambda x \text{ has a solution } x \neq 0\}$

(if $B$ has no nonzero eigenvalue, set $m = \infty$). For all $\epsilon \in (0, m)$, the matrix $B - \epsilon I$ is invertible. Hence the first part shows that $A(B - \epsilon I)$ and $(B - \epsilon I)A$ have the same characteristic polynomial:

    $\det[\lambda I - A(B - \epsilon I)] = \det[\lambda I - (B - \epsilon I)A]$

for all $\epsilon \in (0, m)$ and all $\lambda \in \mathbb{C}$. Taking the limit $\epsilon \to 0$ yields that $AB$ and $BA$ have the same characteristic polynomial, as claimed.

Lemma 6.5. For any two matrices, the products $AB$ and $BA$ have the same eigenvalues. For the nonzero eigenvalues, even the geometric multiplicities are equal.
If either $A$ or $B$ is invertible, all eigenvalues have the same geometric multiplicities.
In the case that neither $A$ nor $B$ is invertible, the geometric multiplicities of the eigenvalue $0$ can be different.

Proof. For any eigenvalue $\lambda \neq 0$, let $E_\lambda(AB)$ be the eigenspace for $AB$. Since $ABx = \lambda x$ implies $(BA)Bx = \lambda Bx$, we see that $B$ maps $E_\lambda(AB)$ into $E_\lambda(BA)$:

    $B E_\lambda(AB) \subseteq E_\lambda(BA)$

From the assumption $\lambda \neq 0$, we get furthermore $AB E_\lambda = E_\lambda$, and hence $B$ restricted to $E_\lambda(AB)$ as well as $A$ restricted to $BE_\lambda$ are both injective. Hence $B$ is a bijection $E_\lambda \mapsto BE_\lambda$, and $A$ is a bijection $BE_\lambda \mapsto E_\lambda$.
Hence $E_\lambda$ and $BE_\lambda$, which are the eigenspaces of $\lambda \neq 0$ for the matrices $AB$ and $BA$, have the same dimension.
If $B$ is invertible, we use $BA = B(AB)B^{-1}$; if $A$ is invertible, we can use $BA = A^{-1}(AB)A$, and see that the two matrices are similar.
All arguments break down in one special case: for the eigenvalue zero of $AB$, with neither $A$ nor $B$ invertible. The following simple example confirms that the geometric multiplicities of the eigenvalue zero can be different for $AB$ and $BA$:

    $A := \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $B := \begin{pmatrix} 0 & b \\ 0 & 1 \end{pmatrix}$

One calculates

    $AB = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $BA = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$

The geometric multiplicity of eigenvalue zero is one for $AB$, but two for $BA$.
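The example above can be computed explicitly; in the following Python sketch the parameter $b = 5$ is an arbitrary choice, and `matmul` is our helper:

```python
# The example from the proof, computed explicitly (b = 5 chosen arbitrarily).
b = 5
A = [[0, 1], [0, 0]]
B = [[0, b], [0, 1]]

def matmul(X, Y):
    d = len(X)
    return [[sum(X[i][k]*Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

AB, BA = matmul(A, B), matmul(B, A)
print(AB)  # [[0, 1], [0, 0]]  -- rank 1, so the kernel has dimension 1
print(BA)  # [[0, 0], [0, 0]]  -- rank 0, so the kernel has dimension 2
```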
Definition 6.6 (Resolvent set). The resolvent set of an operator $A$ is defined as

    $R_A := \{\lambda \in \mathbb{C} : \text{the inverse } [\lambda I - A]^{-1} \text{ exists}\}$

Question. Explain why the set $R_A$ is open.
Answer. The set of all invertible matrices is open in matrix norm. Hence the set of all $\lambda$ for which the matrix $[\lambda I - A]$ is invertible is open in $\mathbb{C}$.

6.1 Symmetric Matrices


Definition 6.7 (Important classes of matrices). A square $d \times d$ matrix is called symmetric if $A^T = A$, antisymmetric if $A^T = -A$, orthogonal if $A^T A = I$, and special orthogonal if $O^T O = I$ and $\det O = 1$.
Proposition 6.1. For a symmetric operator $A = A^T$, the matrix norm corresponding to the Euclidean vector norm satisfies $[\|A\|_2]^2 = \|A^2\|_2$.
10 Problem 6.4. Derive Proposition 6.1. Begin with the definition of the norm on the left-hand side, and use the Cauchy-Schwarz inequality.
Answer. We begin with the definition of the norm on the left-hand side, and use the Cauchy-Schwarz inequality:

    $[\|A\|_2]^2 = \max \dfrac{\|Ax\|_2^2}{\|x\|_2^2} = \max \dfrac{(Ax)^T Ax}{x^T x} = \max \dfrac{x^T A^2 x}{x^T x}$
    $\leq \max \dfrac{\|x\|_2 \|A^2 x\|_2}{\|x\|_2^2} = \max \dfrac{\|A^2 x\|_2}{\|x\|_2} = \|A^2\|_2$

Since the properties of the matrix norm yield the reversed inequality

    $\|A^2\|_2 \leq [\|A\|_2]^2$

all are equalities.
Proposition 6.2. For a symmetric operator $A = A^T$, the matrix norm corresponding to the Euclidean vector norm is equal to the spectral radius: $\|A\|_2 = \mathrm{sp}\, A$.
Proof. By induction, the last proposition yields

    $\|A\|_2 = \big[\|A^{2^s}\|_2\big]^{1/2^s}$

for all $s \geq 0$. As we have shown, the limit of the right-hand side for $s \to \infty$ is the spectral radius. Hence $\|A\|_2 = \mathrm{sp}\, A$, as to be shown.
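The limit $\|A^n\|^{1/n} \to \mathrm{sp}\, A$ can be observed numerically. The Python sketch below uses the easily computed Hilbert-Schmidt norm in place of $\|\cdot\|_2$ (any submultiplicative norm gives the same limit), for the symmetric matrix $A = [[2,1],[1,2]]$ with eigenvalues $1$ and $3$:

```python
import math

# Numeric illustration: ||A^n||^(1/n) approaches sp A = 3 for
# A = [[2,1],[1,2]], using the Hilbert-Schmidt norm of A^n.
def matmul(X, Y):
    d = len(X)
    return [[sum(X[i][k]*Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def hs_norm(X):
    return math.sqrt(sum(x*x for row in X for x in row))

A = [[2, 1], [1, 2]]
P = [[1, 0], [0, 1]]            # will hold A^n
for _ in range(30):
    P = matmul(P, A)
est = hs_norm(P) ** (1.0/30)
print(est)                       # very close to 3.0
```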
Definition 6.8 (Rayleigh quotient). The Rayleigh quotient is defined as

    $R(x) := \dfrac{x^T A x}{x^T x} = \dfrac{\sum_{i=1}^d \sum_{k=1}^d x_i a_{ik} x_k}{\sum_{i=1}^d x_i x_i}$
Theorem 6.1. For a symmetric matrix $A$, the minimum and maximum values of the Rayleigh quotient are the minimal and maximal eigenvalues.
The Rayleigh quotient is stationary at $x$ if and only if $x$ is an eigenvector. In this case $R(x)$ is the corresponding eigenvalue.
Proof. It is enough to consider the Rayleigh quotient on the Euclidean unit sphere

    $UB_2 = \{x \in \mathbb{R}^d : \sum_{i=1}^d x_i^2 = 1\}$

Since $UB_2$ is compact and the Rayleigh quotient is a continuous function, it assumes its minimal and maximal value. At the extremal values, the gradient of the Rayleigh quotient is zero. (Why?) A simple calculation yields

    $\dfrac{\partial}{\partial x_l} \sum_{i=1}^d \sum_{k=1}^d x_i a_{ik} x_k = \sum_{k=1}^d a_{lk} x_k + \sum_{i=1}^d x_i a_{il}$
    $\nabla\, x^T A x = Ax + A^T x = 2Ax$
    $\nabla \dfrac{x^T A x}{x^T x} = \dfrac{1}{x^T x}\, \nabla\, x^T A x - \dfrac{x^T A x}{(x^T x)^2}\, \nabla\, x^T x$
    $\nabla R(x) = \dfrac{1}{x^T x}\, [2Ax - R(x) \cdot 2x] = \dfrac{2}{x^T x}\, [Ax - R(x)x]$

Hence the Rayleigh quotient is stationary at $x$ if and only if $x$ is an eigenvector. In this case, $R(x)$ is the corresponding eigenvalue.
At the eigenvectors for the minimal and maximal eigenvalues, the Rayleigh quotient assumes its absolute minimum and maximum, respectively.
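Theorem 6.1 is easy to watch in action. The Python sketch below evaluates the Rayleigh quotient of $A = [[2,1],[1,2]]$, whose eigenvectors $(1,1)$ and $(1,-1)$ belong to the eigenvalues $3$ and $1$:

```python
# Rayleigh quotient R(x) = x^T A x / x^T x for A = [[2,1],[1,2]]:
# at the eigenvectors it returns the eigenvalues, and every other
# vector gives a value strictly in between.
A = [[2, 1], [1, 2]]

def rayleigh(x):
    Ax = [sum(A[i][k]*x[k] for k in range(2)) for i in range(2)]
    return sum(x[i]*Ax[i] for i in range(2)) / sum(t*t for t in x)

print(rayleigh([1, 1]))    # 3.0, the maximal eigenvalue
print(rayleigh([1, -1]))   # 1.0, the minimal eigenvalue
print(rayleigh([1, 0]))    # 2.0, strictly in between
```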

Lemma 6.6. For any symmetric matrix

    $\max\{|\lambda|^2 : A^2 x = \lambda^2 x \text{ has a solution } x \neq 0\}$
    $= \max\{|\mu|^2 : Ay = \mu y \text{ has a solution } y \neq 0\}$

Proof. Assume $A^2 x = \lambda^2 x$ has a nontrivial solution $x \neq 0$. Hence $(A + \lambda I)(A - \lambda I)x = 0$. In case that $(A - \lambda I)x = 0$, we take $y := x$, $\mu := \lambda$. In case that $(A - \lambda I)x \neq 0$, we take $y := (A - \lambda I)x$, $\mu := -\lambda$.
In both cases, we see that $Ay = \mu y$ has a nontrivial solution $y \neq 0$ with $\mu^2 = \lambda^2$. Thus we conclude that the left-hand side is less or equal the right-hand side. The reversed inequality is easier, and left as an exercise.

Proposition 6.3. For a symmetric $d \times d$ matrix, the maximal absolute value of all the eigenvalues is equal to the spectral radius:

    $\max\{|\lambda| : Ax = \lambda x \text{ has a solution } x \neq 0\} = \mathrm{sp}\, A$

Answer. All eigenvalues of any operator are in absolute value less or equal to the spectral radius. Squared, we get

    $\max\{|\lambda|^2 : Ax = \lambda x \text{ has a solution } x \neq 0\}$
    $\leq [\mathrm{sp}\, A]^2 \leq [\|A\|_2]^2 = \max \dfrac{\|Ax\|_2^2}{\|x\|_2^2} = \max \dfrac{(Ax)^T Ax}{x^T x} = \max \dfrac{x^T A^2 x}{x^T x}$
    $= \max\{|\lambda|^2 : A^2 x = \lambda^2 x \text{ has a solution } x \neq 0\}$
    $\leq \max\{|\mu|^2 : Ax = \mu x \text{ has a solution } x \neq 0\}$

Everywhere equality!
Remark. The maximal absolute value relevant for the spectral radius can be either the absolute value of the maximal, or of the minimal eigenvalue. Hence we get the two cases. 13
In the Main Theorem (8.1) below, we shall prove the corresponding fact for arbitrary matrices.

6.2 Back to arbitrary matrices


The goal of this section is to prove the following, still not optimal, theorem.

Theorem 6.2. For any $d \times d$ matrix:

    $\max\{|\lambda| : Ax = \lambda x \text{ has a solution } x \neq 0\} \leq \mathrm{sp}\, A \leq \sqrt{\mathrm{sp}\, A^T A}$
    $\sqrt{\mathrm{sp}\, A^T A} = \|A\|_2 = \max\{|\mu| : A^T A y = \mu^2 y \text{ has a solution } y \neq 0\}$

13 If father and mother have the same opinion, mother is right; if they have different opinions, father is right.

10 Problem 6.5. Prove the matrices $A^T A$ and $AA^T$ have only nonnegative eigenvalues. Prove that $A^T A$ and $A$ have the same kernel.
Answer. For any eigenvector $x$ of $A^T A$,

    $A^T A x = \lambda x$ implies
    $(Ax)^T Ax = x^T A^T A x = \lambda x^T x$
    $\|Ax\|_2^2 = \lambda \|x\|_2^2$

and hence $\lambda \geq 0$. Similarly, we see that $A^T A x = 0$ implies $\|Ax\|_2^2 = 0$ and hence $Ax = 0$. The converse is obvious, and hence

    $A^T A x = 0$ if and only if $Ax = 0$
Proposition 6.4. For any $d \times d$ matrix, the matrix norm corresponding to the Euclidean vector norm is the square root of the maximal eigenvalue of the matrix $A^T A$.
The equality $\|Ax\|_2 = \|A\|_2 \|x\|_2$ holds if and only if $x$ is an eigenvector for the largest eigenvalue of $A^T A$:

    $\|Ax\|_2 = \|A\|_2 \|x\|_2$ if and only if $A^T A x = \mu^2 x$ with $\mu^2$ maximal.

Proof. The matrix norm for the Euclidean vector norm $\|x\|_2 = \sqrt{\sum_{i=1}^d x_i^2}$ has the square

    $[\|A\|_2]^2 = \max \dfrac{\|Ax\|_2^2}{\|x\|_2^2} = \max \dfrac{(Ax)^T Ax}{x^T x} = \max \dfrac{x^T A^T A x}{x^T x}$

This is just the maximal value of the Rayleigh quotient of the symmetric matrix $A^T A$, and hence the maximal eigenvalue of this matrix.
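Proposition 6.4 suggests a simple numerical recipe for $\|A\|_2$: power iteration on the symmetric matrix $A^T A$ finds its largest eigenvalue, whose square root is $\|A\|_2$. A Python sketch (the helper names are ours), applied to the shift matrix from Problem 6.6 below:

```python
import math

# Power iteration on A^T A to estimate ||A||_2 numerically.
A = [[0, 1], [0, 0]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_vec(M, v):
    return [sum(M[i][k]*v[k] for k in range(len(v))) for i in range(len(M))]

def ata_vec(v):                        # v -> A^T A v
    return mat_vec(transpose(A), mat_vec(A, v))

x = [1.0, 1.0]
for _ in range(50):                    # iterate and renormalize
    y = ata_vec(x)
    norm = math.sqrt(sum(t*t for t in y))
    x = [t/norm for t in y]

lam = sum(x[i]*ata_vec(x)[i] for i in range(2))   # Rayleigh quotient at x
print(math.sqrt(lam))                  # = ||A||_2 = 1.0 for this A
```

This sketch assumes the start vector is not orthogonal to the dominant eigenvector, the usual caveat of power iteration.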
Proposition 6.5. For any $d \times d$ matrix, the matrix norm corresponding to the Euclidean vector norm is the square root of the spectral radius of the matrix $A^T A$:

    $\|A\|_2 = \sqrt{\mathrm{sp}(A^T A)} = \sqrt{\mathrm{sp}(AA^T)}$

Proof. Recapitulating and continuing the proof of Proposition 6.4:

    $[\|A\|_2]^2 = \max \dfrac{\|Ax\|_2^2}{\|x\|_2^2} = \max \dfrac{(Ax)^T Ax}{x^T x} = \max \dfrac{x^T A^T A x}{x^T x}$

This is just the maximal value of the Rayleigh quotient of the symmetric matrix $A^T A$, and hence the maximal eigenvalue:

    $[\|A\|_2]^2 = \max\{|\mu|^2 : A^T A x = \mu^2 x \text{ has a solution } x \neq 0\}$

By Lemma 6.1, all eigenvalues are less or equal the spectral radius:

    $[\|A\|_2]^2 \leq \mathrm{sp}(A^T A) \leq \|A^T A\|_2 \leq \|A^T\|_2 \|A\|_2 = [\|A\|_2]^2$

All equalities! Hence $\|A\|_2 = \sqrt{\mathrm{sp}(A^T A)}$, as to be shown.

p
10 Problem 6.6. Give an example of a $2 \times 2$ matrix for which $\|A\|_2 = \sqrt{\mathrm{sp}(A^T A)} > \mathrm{sp}\, A$.

Answer.

    $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$

Since $A^2 = 0$, we get $\mathrm{sp}\, A = 0$. But

    $A^T A = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ and $AA^T = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$

have the eigenvalues $0, 1$ and hence $\mathrm{sp}(A^T A) = 1$. Also, one can check directly that $\|A\|_2 = 1$.

6.3 Normal Matrices


Definition 6.9. A square matrix that commutes with its transpose is called normal:

    $A$ is normal if and only if $A^T A = AA^T$.

Theorem 6.3. For a normal matrix, the Euclidean norm equals the spectral radius:

    $\|A\|_2 = \mathrm{sp}\, A$ for normal matrices $A$.

Especially, this holds for symmetric, antisymmetric and orthogonal matrices.

Corollary 12. For normal matrices, the Euclidean matrix norm $\|A\|_2$ is minimal among all matrix norms.

10 Problem 6.7. Prove Theorem 6.3.

Answer. Combine Proposition 6.5 with Proposition 5.19. Hence

    $\|A\|_2^2 = \mathrm{sp}(A^T A) \leq \mathrm{sp}\, A^T \cdot \mathrm{sp}\, A \leq \|A^T\|_2 \|A\|_2 = \|A\|_2^2$

since $\|A^T\|_2 = \|A\|_2$. Hence everywhere equality holds. Now $\mathrm{sp}\, A^T = \mathrm{sp}\, A$ implies that $\|A\|_2 = \mathrm{sp}\, A$, as claimed.

7 The Hilbert-Schmidt Norm


10 Problem 7.1. Check directly that the trace is commutative: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for any two $d \times d$ matrices $A$ and $B$.

Definition 7.1. The Hilbert-Schmidt norm of a matrix is the square root of the sum of the squares of its elements:

(7.1)    $N(A) = \sqrt{\sum_{i=1}^d \sum_{k=1}^d a_{ik}^2}$

10 Problem 7.2. Check that the Hilbert-Schmidt norm is

    $N(A) = \sqrt{\mathrm{tr}(A^T A)} = \sqrt{\mathrm{tr}(AA^T)}$
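The identity of Problem 7.2 can be confirmed numerically for a concrete matrix; the helper names in this Python sketch are ours:

```python
import math

# Check N(A) = sqrt(tr(A^T A)) for a concrete 2x2 matrix.
A = [[1, 2], [3, 4]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    d = len(X)
    return [[sum(X[i][k]*Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

n_direct = math.sqrt(sum(a*a for row in A for a in row))   # sqrt(1+4+9+16)
n_trace  = math.sqrt(trace(matmul(transpose(A), A)))
print(n_direct, n_trace)   # both equal sqrt(30)
```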
10 Problem 7.3. Use the Cauchy-Schwarz inequality to get

    $|\mathrm{tr}(AB)| \leq N(A)\, N(B)$

for any two $d \times d$ matrices $A$ and $B$. In which case does one get equality?
Answer. Equality occurs if and only if the matrix $A$ and the transpose $B^T$ are linearly dependent.
10 Problem 7.4. Prove that the Hilbert-Schmidt norm is a vector norm on the vector space of $d \times d$ matrices.
10 Problem 7.5. Prove that the Hilbert-Schmidt norm satisfies

    $N(AB) \leq \|A\|_2 N(B)$ and $N(AB) \leq N(A)\|B\|_2$

for any two $d \times d$ matrices $A$ and $B$.
Check that the equality $N(AB) = \|A\|_2 N(B)$ holds if and only if $A^T A B = \|A\|_2^2\, B$.
Answer. We prove the first inequality from the column-wise decomposition of $B$. The second inequality follows similarly by a row-wise decomposition of $A$.

    $B = [b_1, b_2, \dots, b_d]$
    $N(B)^2 = \sum_{i=1}^d \|b_i\|_2^2$
    $AB = [Ab_1, Ab_2, \dots, Ab_d]$
    $N(AB)^2 = \sum_{i=1}^d \|Ab_i\|_2^2 \leq \sum_{i=1}^d \|A\|_2^2 \|b_i\|_2^2 = \|A\|_2^2 \sum_{i=1}^d \|b_i\|_2^2$
    $N(AB)^2 \leq \|A\|_2^2\, N(B)^2$

We now check under which assumptions equality does hold. The equality $N(AB) = \|A\|_2 N(B)$ holds if and only if $\|Ab_i\|_2 = \|A\|_2 \|b_i\|_2$ for all column vectors $b_i$ of $B$. By Proposition 6.4, the equality $\|Ax\|_2 = \|A\|_2 \|x\|_2$ holds if and only if $x$ is an eigenvector for the largest eigenvalue of $A^T A$. Since this holds for all column vectors $b_i = x$ of $B$, we get in matrix notation

    $A^T A B = \|A\|_2^2\, B$
10 Problem 7.6. Prove that the Hilbert-Schmidt norm satisfies

    $\|A\|_2 \leq N(A)$

for any $d \times d$ matrix $A$.

Answer. The Cauchy-Schwarz inequality yields

    $\|Ax\|_2^2 = \sum_{i=1}^d \Big[\sum_{k=1}^d a_{ik} x_k\Big]^2 \leq \sum_{i=1}^d \Big[\sum_{k=1}^d a_{ik}^2\Big] \Big[\sum_{k=1}^d x_k^2\Big] = N(A)^2 \|x\|_2^2$
    $\|Ax\|_2 \leq N(A)\|x\|_2$ for all $x \in \mathbb{R}^d$
    $\|A\|_2 \leq N(A)$

Lemma 7.1. The equality $\|A\|_2 = N(A)$ occurs if and only if the matrix $A$ has rank zero or one. In dyadic notation: $A = yx^T$.
Proof. Equality holds in the Cauchy-Schwarz inequality $|a^T b| \leq \|a\|_2 \|b\|_2$ if and only if the two vectors $a$ and $b$ are linearly dependent. Assume now that $\|A\|_2 = N(A)$. There exists $x \in \mathbb{R}^d$ such that $\|Ax\|_2 = \|A\|_2 \|x\|_2$ and $\|x\|_2 = 1$. Then equality holds in the Cauchy-Schwarz estimate from Problem 7.6, in all components:

    $\Big[\sum_{k=1}^d a_{ik} x_k\Big]^2 = \Big[\sum_{k=1}^d a_{ik}^2\Big] \Big[\sum_{k=1}^d x_k^2\Big]$ for all $1 \leq i \leq d$.

Hence the vectors $a_{i\cdot}$ and $x$ are linearly dependent for all $1 \leq i \leq d$. Hence there exist $y_i$ such that

    $a_{ik} = y_i x_k$ for all $1 \leq i \leq d$ and all $1 \leq k \leq d$.

We see that the equality $\|A\|_2 = N(A)$ occurs if and only if the matrix $A$ has rank zero or one. In dyadic notation: $A = yx^T$.
Corollary 13. The Hilbert-Schmidt norm satisfies

    $\|A\|_2 \leq N(A) \leq \sqrt{d}\, \|A\|_2$

for any $d \times d$ matrix $A$. This estimate is optimal.

Lemma 7.2. Given any matrix norm arising from a vector norm, an estimate $l\|A\| \leq N(A) \leq L\|A\|$ for all matrices can only hold (but need not hold) with constants $l \leq 1$ and $L \geq \sqrt{d}$.

Corollary 14. Neither the Hilbert-Schmidt norm nor any multiple of it is a matrix norm arising from a vector norm.

Proof of the lemma. Let $\|\cdot\|$ be any vector norm, for which we may assume that

    $1 \cdot \|x\| \leq \|x\|_2 \leq K\|x\|$ for all $x \in \mathbb{R}^d$

Furthermore, we may assume that $1$ and $K$ are the optimal constants with which such an estimate is always true.
Take any vectors $a, b \in \mathbb{R}^d$ and let $A = ab^T$ be the rank one matrix with the elements $A_{ik} = a_i b_k$ for $i = 1 \dots d$, $k = 1 \dots d$. Then

    $\|Ax\| = \|a (b^T x)\|$ for all $x \in \mathbb{R}^d$.

With $x = b$ we get $\|Ab\| = \|a (b^T b)\| = \|a\| \|b\|_2^2 \geq \|a\| \|b\|_2 \|b\|$, and hence

    $\|ab^T\| \geq \|a\| \|b\|_2$

The Hilbert-Schmidt norm of the rank one matrix is $N(ab^T) = \|a\|_2 \|b\|_2$. Hence the assumed estimate $l\|A\| \leq N(A)$ implies

    $l \|a\| \|b\|_2 \leq l \|ab^T\| \leq N(ab^T) = \|a\|_2 \|b\|_2$

for all $a, b \in \mathbb{R}^d$. Hence

    $l\|a\| \leq \|a\|_2$ for all $a \in \mathbb{R}^d$.

Since the constant $1$ in the lower estimate was assumed to be optimal, such an estimate cannot hold for any $l > 1$; we conclude that $l \leq 1$.
The upper bound follows by taking $A = I$, the identity matrix, since $\|I\| = 1$ but $N(I) = \sqrt{d}$.

10 Problem 7.7. Let $\|\cdot\|$ be any vector norm, for which we assume

    $\|x\| \leq \|x\|_2 \leq K\|x\|$ for all $x \in \mathbb{R}^d$

Take any vectors $a, b \in \mathbb{R}^d$ and let $A = ab^T$ be a rank one matrix with the matrix elements $A_{ik} = a_i b_k$ for $i = 1 \dots d$ and $k = 1 \dots d$. Prove that the norm of the rank one matrix satisfies the estimate

    $\|a\| \|b\|_2 \leq \|ab^T\| \leq K \|a\| \|b\|_2$

Solution of Problem 7.7. The upper estimate of the matrix norm arises from the Cauchy-Schwarz inequality:

    $\|Ax\| = \|a (b^T x)\| = \|a\| |b^T x|$
    $\|Ax\| \leq \|a\| \|b\|_2 \|x\|_2 \leq K \|a\| \|b\|_2 \|x\|$ for all $x \in \mathbb{R}^d$
    $\|A\| \leq K \|a\| \|b\|_2$

On the other hand, by choosing $x = b$, we get a lower bound:

    $\|Ab\| = \|a (b^T b)\| = \|a\| \|b\|_2^2 \geq \|a\| \|b\|_2 \|b\|$
    $\|A\| \geq \|a\| \|b\|_2$

Put together, we have checked that

    $\|a\| \|b\|_2 \leq \|ab^T\| \leq K \|a\| \|b\|_2$

holds for the norm of the rank one matrix.
Proposition 7.1. For any matrix norm derived from a vector norm, the following are equivalent:
(i) $\|A\| \leq N(A)$ holds for all $d \times d$ matrices $A$.
(ii) $\|A\| = \|A\|_2$ holds for all $d \times d$ matrices $A$.
Proof. It has been shown in Problem 7.6 that (ii) implies (i). We need only to show that (i) implies (ii). Take any vectors $a, b \in \mathbb{R}^d$ and define the rank one matrix $A = ab^T$. The assumption (i) implies

    $\|ab^T\| \leq N(ab^T) = \|a\|_2 \|b\|_2$

Hence for all $x \in \mathbb{R}^d$

    $\|a\| |b^T x| = \|a (b^T x)\| \leq \|ab^T\| \|x\| \leq \|a\|_2 \|b\|_2 \|x\|$

Taking $x = b$ yields

    $\|a\| \|b\|_2^2 \leq \|a\|_2 \|b\|_2 \|b\|$

Since this holds universally, we get

    $\dfrac{\|a\|}{\|a\|_2} \leq \dfrac{\|b\|}{\|b\|_2}$

for all $a, b \neq 0$. But this can only be true if the quotient is a positive constant. The corresponding matrix norm is the matrix norm $\|A\|_2$, independently of this constant.
10 Problem 7.8. Prove that the square of the Hilbert-Schmidt norm of any matrix $A$ is the sum of the eigenvalues of the matrix $A^T A$.
In the more special case of a symmetric matrix, the square of the Hilbert-Schmidt norm is the sum of the squares of the eigenvalues of $A$:

    $N(A) = \sqrt{\sum_i \lambda_i^2}$

with the sum over all eigenvalues $Ax = \lambda_i x$.
Answer. The first claim is easy to check:

    $N(A)^2 = \mathrm{tr}(A^T A) = \sum \{\mu_i^2 : \text{all eigenvalues } A^T A x = \mu_i^2 x\}$

For a symmetric matrix $A^T = A$, we can use Lemma 6.6:

    $N(A)^2 = \mathrm{tr}(A^T A) = \sum \{\lambda_i^2 : \text{all eigenvalues } A^2 x = \lambda_i^2 x\}$
    $= \sum \{\lambda_i^2 : \text{all eigenvalues } Ax = \lambda_i x\}$
8 The Radius of Convergence and Complex Analysis
Complex analysis yields a very useful characterization of the radius of convergence of any
power series. The following theorem is a rather immediate consequence of the Cauchy
integral formula.
Main Theorem (The radius of convergence is given by the maximal circle inside the maximal domain of analyticity). Let $f$ be any analytic function in any open domain $O \subseteq \mathbb{C}$.
(i) The power series of $f$ about any point $z_0 \in O$

(8.1)    $f(z) = \sum_{n=0}^{\infty} \dfrac{f^{(n)}(z_0)}{n!} (z - z_0)^n$

is convergent inside the largest open disk $D(z_0)$ around $z_0$ entirely lying inside $O$.
(ii) Even more, this power expansion is convergent inside the largest disk $D_{max}(z_0)$ about $z_0$ to which an analytic continuation of the function $f$ is possible.
(iii) If the radius of this maximal disk is finite, there exists at least one point on its boundary where the function $f$ has no Taylor expansion, and hence a singularity.
(iv) Assume that the restriction to a small circle around $z_0$ of the function $f$ has already been extended to its maximal domain of analyticity $O_{max}$; this can even be a Riemann surface. 14 Under this assumption, the radius of convergence $R(z_0)$ of the Taylor expansion at any point $z_0 \in O_{max}$ is given by its distance to the boundary of $O_{max}$.
(v) Altogether, we have claimed that

(8.2)    $R(z_0) = \min\{|z - z_0| : z \in \partial O_{max}\} \geq \min\{|z - z_0| : z \in \partial O\}$

A short indication of the reason. We need only to deal with item (i). Everything else follows from there rather easily. The open disk $D(z_0) \subseteq O$ was chosen maximally. Take any circle $C \subset D(z_0)$ around $z_0$ lying inside the maximally chosen disk $D(z_0)$. The Cauchy integral formula

    $f(z) = \dfrac{1}{2\pi i} \oint_C \dfrac{f(\zeta)}{\zeta - z}\, d\zeta$

yields a power expansion of $f$ around $z_0$ which is convergent for all points $z$ in the interior region of $C$.
14 For a Riemann surface, there is the awkward possibility that the extensions of the function $f$ from a small neighborhood of $z_0$, on the one hand to the maximal disk $D_{max}(z_0)$ and on the other hand to the originally given domain $O$, are in conflict. In that case, just the first extension, to the largest disk, is what matters.

10 Problem 8.1. Students of MATH 3146:
Provide this power expansion.
10 Problem 8.2. Students of MATH 3146:
Estimate $f^{(n)}(z_0)$ in terms of $\max\{|f(\zeta)| : \zeta \in C\}$.
By uniqueness of the power expansion of a given function around a given point, we always get the same expansion, independent of the radius of the circle $C$.
10 Problem 8.3. Students of MATH 3146:
Prove that the power expansion of a given function around a given point is unique.
10 Problem 8.4. Students of MATH 3146:
Give a second reason why the expansion cannot depend on the radius of the circle $C$.
Since the circle $C$ can be chosen arbitrarily near to the boundary of $D(z_0)$, we get a power expansion of $f$ around $z_0$ which is convergent for all points $z$ in the open disk $D(z_0)$. In other words, we have an estimate

(8.3)    $R(z_0) \geq \min\{|z - z_0| : z \in \partial O\}$

for the convergence radius $R(z_0)$ of the Taylor series (8.1).


10 Problem 8.5. What is really astonishing about this result?
Answer. One gets an estimate for the radius of convergence of the Taylor series without ever calculating its coefficients. One can use the estimate without knowing anything about the reasoning behind it.
It is important to remember that the Taylor expansion of a function in a small neighborhood of any point already uniquely determines the analytic extension to its maximal domain of analyticity $O_{max}$, and this can even be a Riemann surface.
The remaining ideas leading from (8.3) to (8.2) are clever, but easy to grasp, learned from the geniuses on whose shoulders we stand.

10 Problem 8.6. Students of MATH 3146:

For the following functions:

(i) $f(z) = \tan z$

(ii) $g(z) = \dfrac{1}{z^2 - z + 1}$

(iii) $h(z) = \dfrac{1}{z^2 - z - 1}$

(iv) $L(z) = \dfrac{e^z + e^{-z} - 1}{e^z - e^{-z} - z}$

(v) $K(z) = \sqrt{1 - z^2}$

find the radius of convergence of the Taylor expansions about $z_0 = 0$.

10 Problem 8.7. Students of MATH 3146:

Define $F(z) = \sqrt{z}$ in the region $O = \mathbb{C} \setminus (-\infty, 0]$. What is the Riemann surface for this function? What is the radius of convergence of the Taylor expansion around $z_0 = -1 + i$, and how is it determined? Explain parts (iv) and (v) of the Main Theorem (8), and especially the footnote.

Figure 8.1: The arbitrary cut $(-\infty, 0]$ does not alter the radius of convergence.

Answer. The Riemann surface for the function $\sqrt{z}$ consists of two sheets winding around the singularity at $z = 0$. This is the maximal domain of analyticity $O_{max}$.
The radius of convergence $R(z_0)$ of the Taylor expansion at the point $z_0 = -1 + i$ is given by the distance to the boundary of $O_{max}$, which consists of the single point $0$. Hence $R = |-1 + i - 0| = \sqrt{2}$. The Taylor series is convergent in the open maximal disk $D_{max}$, but divergent in the exterior of this disk.
The distance of $z_0$ to the boundary of the originally given region $O = \mathbb{C} \setminus (-\infty, 0]$ is smaller, indeed

    $\min\{|z - z_0| : z \in \partial O\} = 1 < \sqrt{2}$

The reason is the arbitrary choice of $(-\infty, 0]$ for the cut, which does not influence the radius of convergence.
The extensions of the function $\sqrt{z}$ from a small neighborhood of $z_0 = -1 + i$, on the one hand to the maximal disk $D_{max}(z_0)$ and on the other hand to the originally given domain $O$, are different in the circular segment $D_{max}(z_0) \cap \{z = x + iy \in \mathbb{C} : y < 0\}$.
That does not matter for the radius of convergence $R$.

8.1 Consequences for the Spectral Radius


So far one can go without complex analysis. The Cauchy integral formula yields the even much stronger result:

Theorem 8.1. There exists a $\lambda \in \mathbb{C}$ with $|\lambda| = \mathrm{sp}\, A$, for which the operator $\lambda I - A$ is not invertible.

Main Theorem (The spectral radius is the maximal absolute value for the eigenvalues). For any $d \times d$ matrix $A$, the maximal absolute value for the eigenvalues is equal to the spectral radius.

Main Theorem (All bounds). Put all together: any $d \times d$ matrix satisfies:

    $\max\{|\lambda| : Ax = \lambda x \text{ has a solution } x \neq 0\} = \mathrm{sp}\, A \leq \sqrt{\mathrm{sp}\, A^T A}$
    $\sqrt{\mathrm{sp}\, A^T A} = \|A\|_2 = \max\{|\mu| : A^T A y = \mu^2 y \text{ has a solution } y \neq 0\} \leq N(A)$
    $N(A) = \sqrt{\mathrm{tr}(A^T A)} = \sqrt{\sum \{\mu_i^2 : \text{all eigenvalues } A^T A x = \mu_i^2 x\}}$

Lemma 8.1. The function $\lambda \mapsto [\lambda I - A]^{-1}$ is an analytic function on the resolvent set $R_A$ where the inverse of $\lambda I - A$ exists.
The function $z \mapsto [I - zA]^{-1}$ is an analytic function on the set of $z \in \mathbb{C}$ where $I - zA$ is invertible. 15 In the case of matrices, we state equivalently that all $d^2$ elements of the matrix are analytic functions on this set. 16

10 Problem 8.8. Use the geometric series to check that

    $[I - zA]^{-1} = \sum_{n=0}^{\infty} [z - z_0]^n A^n [I - z_0 A]^{-(n+1)}$
    $[\lambda I - A]^{-1} = \sum_{n=0}^{\infty} [\lambda_0 - \lambda]^n [\lambda_0 I - A]^{-(n+1)}$

By means of these identities, get Lemma 8.1 for any arbitrary operator $A$.
15 This is the reciprocal resolvent set $\{0\} \cup R_A^{-1}$.
16 In case of an operator, all possible definitions turn out to be equivalent.

Proof of Theorem 8.1. With $z = \lambda^{-1}$, the spectral expansion becomes

    $[I - zA]^{-1} = \sum_{n=0}^{\infty} z^n A^n$

We know by Theorem 5.1 that this power series has the positive, possibly infinite, radius of convergence $R = (\mathrm{sp}\, A)^{-1} \in (0, \infty]$. If $\mathrm{sp}\, A = 0$, then the matrix $A$ is not invertible, as shown in Problem 5.20. 17
From now on, we assume that $\mathrm{sp}\, A > 0$. By the Main Theorem (8), the radius of convergence $R$ is given by the maximal circle inside the maximal domain of analyticity. The radius of this maximal circle needs to be $R = (\mathrm{sp}\, A)^{-1} \in (0, \infty)$ by Theorem 5.1.
Hence for $\mathrm{sp}\, A \neq 0$, there exists a point $z$ with $|z| = R = (\mathrm{sp}\, A)^{-1}$ at which the function $z \mapsto [I - zA]^{-1}$ is not analytic. Hence, by Lemma 8.1, the operator $I - zA$ is not invertible. With $\lambda = z^{-1}$, the operator $\lambda I - A$ is not invertible either.

Corollary 15 (The Fundamental Theorem of Algebra). Every nonconstant polynomial with real or complex coefficients has at least one complex zero.

10 Problem 8.9. Use Theorem 8.1 to derive the Fundamental Theorem of Algebra.

9 A Mean Ergodic Theorem


We deal with a mean ergodic theorem about the fractional parts of irrational numbers.

10 Problem 9.1. Given is the sequence

(9.1)    $r_n := \sin^2 bn$

with

    $\dfrac{b}{\pi} = \dfrac{p}{q}$

with $p$ and $q$ relatively prime. How many points does the limit set have? Determine the limit set of the sequence. How does it depend on the denominator $q$? Why does it not depend on the numerator $p$? One needs to use that there exist integers $k, l$ such that

    $kp - lq = 1$

as you should know from modern algebra.

17 We have proved this without complex analysis; I think this is the simpler approach.

Answer. The given sequence

    $r_n := \sin^2 \dfrac{p\pi n}{q}$

can only take the $q$ values

    $\sin^2 \dfrac{\pi}{q},\ \sin^2 \dfrac{2\pi}{q},\ \sin^2 \dfrac{3\pi}{q},\ \dots,\ \sin^2 \dfrac{q\pi}{q}$

It is obvious they all occur for $p = 1$. The same is indeed true if $p$ and $q$ are relatively prime. Let $k$ and $l$ be integers such that $kp - lq = 1$. Indeed, we get terms

    $r_{kn} = \sin^2 \dfrac{kp\pi n}{q} = \sin^2 \dfrac{(kp - lq)\pi n}{q} = \sin^2 \dfrac{\pi n}{q}$

with $n = 1, 2, 3, \dots, q$, which are equal to the numbers above.
Remark. The index $k$ for which $r_k = \sin^2 \frac{\pi}{q}$ yields an inverse of $p$ modulo the denominator $q$. The set of all numbers in $1, 2, \dots, q - 1$ which are relatively prime to $q$ forms a group, denoted by $\mathbb{Z}_q^*$. The number of its elements is $\varphi(q)$, called Euler's totient function.
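The answer can be checked numerically. Since $\sin^2$ has period $\pi$, the term $r_n$ only depends on the residue $pn \bmod q$, and since $\gcd(p, q) = 1$ every residue occurs; the following Python sketch verifies this for the sample choice $p = 3$, $q = 7$:

```python
import math

# Numeric check for p = 3, q = 7: r_n = sin^2(p*pi*n/q) only depends on
# the residue (p*n) mod q, and every residue mod q occurs.
p, q = 3, 7

residues = {(p*n) % q for n in range(1, q+1)}
assert residues == set(range(q))          # all residues occur

for n in range(1, 50):
    k = (p*n) % q
    assert abs(math.sin(math.pi*p*n/q)**2 - math.sin(math.pi*k/q)**2) < 1e-9

# the limit set: sin^2(k*pi/q) for k = 0..q-1; the pairs k and q-k coincide
distinct = {round(math.sin(math.pi*k/q)**2, 9) for k in range(q)}
print(sorted(distinct))   # 4 distinct values for q = 7
```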
Let $a \in (0, 1)$ be irrational. Seeking some rational approximation with any denominator $q$, we define

    $p := \mathrm{floor}(aq) \in \mathbb{Z}$
    $\mathrm{frac}(aq) := aq - \mathrm{floor}(aq) \in (0, 1)$

to get

    $a = \dfrac{p}{q} + \dfrac{\mathrm{frac}(aq)}{q}$

What is the distribution of the sequence $\mathrm{frac}(aq)$ for $q \in \mathbb{N}$? The answer is astonishingly easy: In the mean, it is equally distributed. It does not depend on the irrational real number $a$.
Main Theorem (A Mean Ergodic Theorem). In the mean, the fractional parts $\mathrm{frac}(aq)$ are equally distributed, independently of the irrational real number $a$.
(i) For all $x \in [0, 1]$,

    $\lim_{N \to \infty} \dfrac{\text{number of } \mathrm{frac}(aq) \in [0, x] \text{ for } q = 1 \dots N}{N} = x$

(ii) For all continuous functions $f(x)$ on $[0, 1]$ with $f(0) = f(1)$:

    $\lim_{N \to \infty} \dfrac{\sum_{q=1}^N f(\mathrm{frac}(aq))}{N} = \int_0^1 f(t)\, dt$

(iii) For all nonzero integer numbers $k \in \mathbb{Z}$, $k \neq 0$:

    $\lim_{N \to \infty} \dfrac{\sum_{q=1}^N e^{2\pi i k q a}}{N} = 0$

(iv) For any $x \in [0, 1]$, there exists a subsequence of $\mathrm{frac}(aq)$ convergent to $x$.
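Part (i) can be observed empirically; the following Python sketch uses the sample choices $a = \sqrt{2}$ and $x = 0.3$, which are ours:

```python
import math

# Empirical check of part (i) for a = sqrt(2) (irrational) and x = 0.3:
# the proportion of q <= N with frac(aq) in [0, x] approaches x.
a, x, N = math.sqrt(2), 0.3, 100000
count = sum(1 for q in range(1, N + 1) if (a*q) % 1.0 <= x)
print(count / N)   # close to 0.3
```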
Idea of proof. It is a straightforward application of the geometric series to get part (iii).
To derive part (ii) from part (iii), we use the fact that any periodic function can be uniformly approximated by trigonometric polynomials.
To derive part (iv) from part (ii), we use a nonnegative continuous periodic function $f$ with a sharp peak at the point $x$.
Finally, deriving part (i) from part (ii) requires squeezing a step function between two approximating continuous functions.
Part (iii) implies part (ii). For any given continuous periodic function $f$ and any $\epsilon > 0$, there exists a trigonometric polynomial

    $P(t) = a_0 + \sum_{k=1}^K a_k \cos 2\pi k t + b_k \sin 2\pi k t$

such that

    $|f(t) - P(t)| \leq \epsilon$ for all $t$

Elementary properties of sums and integrals imply

    $\left| \dfrac{\sum_{q=1}^N f(\mathrm{frac}(aq))}{N} - \dfrac{\sum_{q=1}^N P(\mathrm{frac}(aq))}{N} \right| \leq \epsilon$
    $\left| \int_0^1 P(t)\, dt - \int_0^1 f(t)\, dt \right| \leq \epsilon$

Part (iii) implies there exists $N_\epsilon$ such that

    $\left| \dfrac{\sum_{q=1}^N P(\mathrm{frac}(aq))}{N} - \int_0^1 P(t)\, dt \right| \leq \epsilon$

for all $N > N_\epsilon$. Together we get

    $\left| \dfrac{\sum_{q=1}^N f(\mathrm{frac}(aq))}{N} - \int_0^1 f(t)\, dt \right| \leq \left| \dfrac{\sum_{q=1}^N f(\mathrm{frac}(aq))}{N} - \dfrac{\sum_{q=1}^N P(\mathrm{frac}(aq))}{N} \right|$
    $+ \left| \dfrac{\sum_{q=1}^N P(\mathrm{frac}(aq))}{N} - \int_0^1 P(t)\, dt \right|$
    $+ \left| \int_0^1 P(t)\, dt - \int_0^1 f(t)\, dt \right|$
    $\leq 3\epsilon$

as to be shown.
Part (ii) implies part (iv). Given is $x \in [0, 1]$. Assume towards a contradiction that no subsequence of $\mathrm{frac}(aq)$ converges to $x$. Then there would exist an (exceptional) $\epsilon > 0$ such that the interval $(x - \epsilon, x + \epsilon)$ would contain only finitely many terms of the sequence $\mathrm{frac}(aq)$.
There exists a periodic continuous function $f$ such that

    $f(t) \geq 0$ for all $t$
    $\int_0^1 f(t)\, dt = 1$
    $f(t) = 0$ for all $t$ with $|x - t| \geq \epsilon$

For such a function, we would get

    $\lim_{N \to \infty} \dfrac{\sum_{q=1}^N f(\mathrm{frac}(aq))}{N} = 0$

and hence by part (ii) $\int_0^1 f(t)\, dt = 0$, which is a contradiction.
Part (ii) implies part (i). Given are $x \in [0, 1]$ and $\epsilon > 0$. The characteristic function $\chi_x$ of the interval $[0, x]$ can be squeezed between two continuous functions $f$ and $g$:

    $0 \leq f \leq \chi_x \leq g \leq 1$
    $f(t) = 1$ for all $t \leq x - \epsilon$
    $g(t) = 0$ for all $t \geq x + \epsilon$

Monotonicity implies

    $\dfrac{\sum_{q=1}^N f(\mathrm{frac}(aq))}{N} \leq \dfrac{\text{number of } \mathrm{frac}(aq) \in [0, x] \text{ for } q = 1 \dots N}{N} \leq \dfrac{\sum_{q=1}^N g(\mathrm{frac}(aq))}{N}$

Monotonicity of the integral implies

    $x - \epsilon \leq \int_0^1 f(t)\, dt$ and $\int_0^1 g(t)\, dt \leq x + \epsilon$

We use part (ii) for the two functions $f$ and $g$. Hence there exists $N_\epsilon$ such that

    $\int_0^1 f(t)\, dt - \epsilon \leq \dfrac{\sum_{q=1}^N f(\mathrm{frac}(aq))}{N}$
    $\dfrac{\sum_{q=1}^N g(\mathrm{frac}(aq))}{N} \leq \int_0^1 g(t)\, dt + \epsilon$

for all $N > N_\epsilon$. Altogether we get

    $x - 2\epsilon \leq \int_0^1 f(t)\, dt - \epsilon \leq \dfrac{\sum_{q=1}^N f(\mathrm{frac}(aq))}{N}$
    $\leq \dfrac{\text{number of } \mathrm{frac}(aq) \in [0, x] \text{ for } q = 1 \dots N}{N}$
    $\leq \dfrac{\sum_{q=1}^N g(\mathrm{frac}(aq))}{N} \leq \int_0^1 g(t)\, dt + \epsilon \leq x + 2\epsilon$

which confirms part (i).


10 Problem 9.2. As an easy consequence, we consider the sequence

(9.2)    $z_n := \sin^2 cn$

with $\dfrac{c}{\pi}$ irrational. Determine the limit set of this sequence. Conclude that the sequence

(common)    $y_n := \sin^2 n$

has subsequences convergent to any given number $x \in [0, 1]$.
Answer. With $a = \dfrac{c}{\pi}$, since

    $\sin^2 (\pi\, \mathrm{frac}(aq)) = \sin^2 \pi a q = \sin^2 cq = z_q$

we can use the Mean Ergodic Theorem, part (iv). Hence, for any $x \in [0, 1]$, there exists a subsequence such that

    $\lim_{s \to \infty} \mathrm{frac}(aq_s) = x$
    $\lim_{s \to \infty} \sin^2 (\pi\, \mathrm{frac}(aq_s)) = \lim_{s \to \infty} \sin^2 \pi a q_s = \sin^2 \pi x$

Again, we can prescribe for $y = \sin^2 \pi x$ to be any value $y \in [0, 1]$.

Since $\pi$ is irrational (proved already by Lambert), the case $a = \dfrac{1}{\pi}$, $c = 1$ is covered. We conclude that the sequence

(common)    $y_n := \sin^2 n$

has subsequences convergent to any given number $x \in [0, 1]$.

10 The contraction mapping principle


Definition 10.1 (Contraction). A contraction is a mapping $T : X \mapsto X$ of a metric space $X$ into itself, for which there exists a contraction constant $0 \leq q < 1$ such that

    $\mathrm{dist}(Tx, Ty) \leq q\, \mathrm{dist}(x, y)$

for all $x, y \in X$.
Definition 10.2 (Fixed point). A fixed point of a mapping T : X 7 X is a solution
of the equation
Tx = x

Main Theorem (The contraction mapping principle, or Banach Fixed Point Theorem). A contraction mapping from a complete metric space to itself has exactly one fixed point.

Uniqueness of the fixed point. Let $X$ be a complete metric space, and $T : X \mapsto X$ be a mapping such that

    $\mathrm{dist}(Tx, Ty) \leq q\, \mathrm{dist}(x, y)$

with contraction constant $0 \leq q < 1$. Assume that both $x$ and $y$ are fixed points. Hence

    $\mathrm{dist}(x, y) = \mathrm{dist}(Tx, Ty) \leq q\, \mathrm{dist}(x, y)$
    $(1 - q)\, \mathrm{dist}(x, y) \leq 0$
    $\mathrm{dist}(x, y) = 0$
    $x = y$

confirming uniqueness.
Existence of a fixed point. We choose any initial point $x_0 \in X$ and define the sequence of iterates

    $x_1 := Tx_0,\ x_2 := Tx_1,\ \dots,\ x_{n+1} := Tx_n,\ \dots$

From the contraction assumption we get inductively

    $\mathrm{dist}(x_n, x_{n+1}) \leq q^n\, \mathrm{dist}(x_0, Tx_0)$

for all $n \in \mathbb{N}$. Hence, as shown in Problem 1.4, the sequence $x_n$ is a Cauchy sequence and

    $\mathrm{dist}(x_n, x_{n+p}) \leq \dfrac{q^n}{1 - q}\, \mathrm{dist}(x_0, Tx_0)$

for all $n, p \in \mathbb{N}$. Since the space $X$ is assumed to be complete, the Cauchy sequence has a limit $x$. Taking the limit in the iteration $x_{n+1} = Tx_n$ implies $x = Tx$, confirming that the limit is a fixed point.
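The iteration from the existence proof can be run directly. The Python sketch below uses the sample contraction $T(x) = \cos x$ on $[0, 1]$, which has contraction constant $q = \sin 1 \approx 0.84 < 1$ by the mean value theorem:

```python
import math

# The iteration x_{n+1} = T(x_n) for the contraction T(x) = cos(x) on [0,1].
def T(x):
    return math.cos(x)

x = 0.0
for n in range(200):
    x = T(x)

print(x)                        # converges to the fixed point 0.739085...
assert abs(T(x) - x) < 1e-12    # x is (numerically) the fixed point
```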

Corollary 16. Any iteration sequence satisfies

    $\mathrm{dist}(x_n, x) \leq \dfrac{q^n}{1 - q}\, \mathrm{dist}(x_0, Tx_0)$

for all $n \in \mathbb{N}$.

Corollary 17. Suppose that the contraction assumption

    $\mathrm{dist}(Tx, Ty) \leq q\, \mathrm{dist}(x, y)$

holds inside a ball $B_r(x_0)$ of radius $r > 0$ about the initial point $x_0 \in X$. If the radius $r$ and the contraction constant $q$ satisfy

    $\mathrm{dist}(Tx_0, x_0) \leq (1 - q)r$

then the ball $B_r(x_0)$ is mapped by the contraction into itself. Furthermore, there exists exactly one fixed point in the closed ball $\overline{B_r(x_0)}$.
Proof. For any point $x \in B_r(x_0)$,

    $\mathrm{dist}(x, x_0) \leq r$
    $\mathrm{dist}(Tx, x_0) \leq \mathrm{dist}(Tx, Tx_0) + \mathrm{dist}(Tx_0, x_0) \leq q\, \mathrm{dist}(x, x_0) + \mathrm{dist}(Tx_0, x_0)$
    $\leq qr + (1 - q)r = r$

and hence $Tx \in B_r(x_0)$. Now existence and uniqueness of a fixed point in the closed ball $\overline{B_r(x_0)}$ can be shown as above.
Corollary 18. Suppose that the contraction assumptions

    $\mathrm{dist}(T_i x, T_i y) \leq q\, \mathrm{dist}(x, y)$

hold for both mappings $T_1 : X \mapsto X$ and $T_2 : X \mapsto X$. Then the distance between their two fixed points is at most

(10.1)    $\mathrm{dist}(x_1, x_2) \leq \dfrac{1}{1 - q} \max\{\mathrm{dist}(T_1 x, T_2 x) : x \in X\}$

Proof. The two fixed points satisfy

    $\mathrm{dist}(x_1, x_2) = \mathrm{dist}(T_1 x_1, T_2 x_2)$
    $\leq \mathrm{dist}(T_1 x_1, T_1 x_2) + \mathrm{dist}(T_1 x_2, T_2 x_2)$
    $\leq q\, \mathrm{dist}(x_1, x_2) + \max\{\mathrm{dist}(T_1 x, T_2 x) : x \in X\}$
    $(1 - q)\, \mathrm{dist}(x_1, x_2) \leq \max\{\mathrm{dist}(T_1 x, T_2 x) : x \in X\}$

Dividing by $1 - q$ yields the claim.
10 Problem 10.1. Let f : K → K be a mapping of a compact metric space to
itself, which is weakly contracting:

        dist(f(x), f(y)) < dist(x, y)  for all pairs x ≠ y in K.

Show that there exists a unique fixed point b of f . For any initial value x0 ∈ K, the
sequence of iterates xn+1 = f(xn) converges to b.¹⁸

¹⁸ I find it best to calculate with sets.

Answer. Define a sequence of sets K1 := K and inductively Kn+1 := f(Kn) for all
n ∈ ℕ. They satisfy

        Kn+1 ⊂ Kn  for all n ∈ ℕ,

and all Kn are compact and nonempty. The intersection B = ∩_{n∈ℕ} Kn is compact and
nonempty, since it contains the limit (or limits) of all convergent subsequences of any
sequence kn ∈ Kn. Also, the intersection satisfies f(B) = B. (Why?)
It remains to check that B consists of exactly one point. Since B is compact, there
exist c ∈ B and d ∈ B such that diam(B) = dist(c, d). (Why?) Since B = f(B), there
exist c′, d′ ∈ B such that c = f(c′), d = f(d′). Under the assumption that B contains
more than one point, we would get

        diam(B) = dist(c, d) = dist(f(c′), f(d′)) < dist(c′, d′) ≤ diam(B)

which is a contradiction.
Hence B consists of exactly one point b. Since

        ∩_{n∈ℕ} f^[n](K) = {b}

we conclude that for any initial value x1 ∈ K, the sequence of iterates converges to b.
Hence b is a fixed point. Uniqueness of the fixed point is obvious.
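The statement can be explored numerically; in this sketch the space K = [0, 1] and the map f = sin are illustrative choices made here. On K, f is weakly contracting (|f′(x)| = cos x < 1 for x > 0) but not a contraction, since no single q < 1 works near x = 0; the iterates still converge to the unique fixed point b = 0, though only at an algebraic rate.

```python
import math

f = math.sin          # weakly contracting on K = [0, 1], fixed point b = 0
x = 1.0               # initial value in K
steps = []
for _ in range(100_000):
    x_next = f(x)
    steps.append(abs(x - x_next))   # dist(x_n, x_{n+1})
    x = x_next

# the successive distances decrease monotonically ...
monotone = all(d2 <= d1 for d1, d2 in zip(steps, steps[1:]))
# ... and the iterates creep toward b = 0, roughly like sqrt(3/n)
final_value = x
```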

11 Local inverses
11.1 The Local Inverse Theorem
Main Theorem (The Local Inverse Function Theorem). Let O ⊂ ℝᵈ be open,
and the mapping F : O → ℝᵈ and all its first partial derivatives be continuous.
For any point x0 ∈ O at which the derivative matrix DF(x0) is invertible, there exist
open neighborhoods U of x0 and V of its image y0 = F(x0) such that the restriction of
F to U is a bijection from U to V . The local inverse is continuous and has continuous
first partial derivatives

        DF⁻¹(y) = [DF(F⁻¹(y))]⁻¹

for all y ∈ V .

Theorem 11.1 (The detailed Inverse Function Theorem). Make the same as-
sumptions as for the Inverse Function Theorem. Then there exist open balls Br(x0) and
Bs(y0) such that

(i) All y ∈ Bs(y0) have exactly one preimage x ∈ Br(x0) such that y = F(x).

(ii) The restriction of F to U := F⁻¹(Bs(y0)) ∩ Br(x0) is a bijection from U to V :=
    Bs(y0).

(iii) There exists a constant L such that

        (1/L) ‖x1 − x2‖ ≤ ‖F(x1) − F(x2)‖ ≤ L ‖x1 − x2‖  for all x1, x2 ∈ Br(x0)

    In other words, the mapping F restricted to Br(x0) is both Lipschitz continuous
    and expansive.

(iv) At the point y0, the local inverse has the total derivative

        DF⁻¹(y0) = [DF(x0)]⁻¹

Proof of part (i), detailed local Inverse Function Theorem 11.1. Given y ∈ Bs(y0), we
have to find all solutions of the equation

(11.1)  y = F(x)

with x ∈ Br(x0). Also, suitable radii r and s have to be found.

To this end, equation (11.1) is converted into a fixed point equation for a suitable
contraction T . To construct this contraction, we use the assumption that the matrix
A := DF(x0) is invertible, with inverse A⁻¹. Also, we need the
tangent approximation

(11.2)  F(x) = y0 + A(x − x0) + g(x)

The remainder function g is continuously differentiable with g(x0) = 0, Dg(x0) = 0.

Hence for all ε > 0, there exists r > 0 such that

(11.3)  ‖g(x1)‖ ≤ ε ‖x1 − x0‖

(11.4)  ‖g(x1) − g(x2)‖ ≤ ε ‖x1 − x2‖  for all x1, x2 ∈ Br(x0)

Equation (11.1) is equivalent to

(11.5)  y = y0 + A(x − x0) + g(x)

(11.6)  x = x0 + A⁻¹(y − y0) − A⁻¹ g(x)

From the right hand side, we define the operator

(11.7)  T x := x0 + A⁻¹(y − y0) − A⁻¹ g(x)

The radius r > 0 has to be chosen small enough such that

(a) Br(x0) ⊂ O.

(b) The mapping T is a contraction: ‖T x1 − T x2‖ ≤ q ‖x1 − x2‖ with 0 < q < 1.

(c) The mapping T maps the ball Br(x0) into itself. To this end, it is enough to require
    ‖T x0 − x0‖ ≤ r(1 − q) with 0 < q < 1 from above.

To obtain requirement (b), we calculate

        ‖T x1 − T x2‖ = ‖A⁻¹[g(x1) − g(x2)]‖ ≤ ‖A⁻¹‖ ‖g(x1) − g(x2)‖
                      ≤ ‖A⁻¹‖ ε ‖x1 − x2‖ = q ‖x1 − x2‖

Hence one needs to choose any contraction constant 0 < q < 1 and set

        ε ≤ q / ‖A⁻¹‖

Finally the radius r > 0 is determined by requirements (a) and (11.4). To obtain require-
ment (c), we calculate

        ‖T x0 − x0‖ = ‖A⁻¹[y − y0]‖ ≤ ‖A⁻¹‖ ‖y − y0‖ ≤ r(1 − q)

Hence

        ‖y − y0‖ ≤ r(1 − q) / ‖A⁻¹‖ =: s

where s > 0 is the radius of the ball restricting y. After q, ε, r and s have been chosen
as explained, we see that the mapping T is a contraction which maps the ball Br(x0)
into itself.
Hence, by the contraction mapping principle 10, there exists exactly one fixed point,
and hence exactly one solution of equation (11.1) with x ∈ Br(x0).
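The construction in this proof can be run numerically. In the sketch below, the concrete map F(x1, x2) = (x1 + x2², x2 + x1²), the point x0 = (0, 0) and the target value y are illustrative choices made here; then y0 = F(x0) = (0, 0) and A = DF(x0) = I, so the operator (11.7) simplifies to T x = x + A⁻¹(y − F(x)).

```python
def F(x):
    # illustrative map with DF(0, 0) = identity
    return (x[0] + x[1] ** 2, x[1] + x[0] ** 2)

def T(x, y):
    # the contraction (11.7); here A = I, so A^{-1}(y - F(x)) = y - F(x)
    Fx = F(x)
    return (x[0] + y[0] - Fx[0], x[1] + y[1] - Fx[1])

y = (0.1, -0.05)      # target value near y0 = (0, 0)
x = (0.0, 0.0)        # start the iteration at x0
for _ in range(60):
    x = T(x, y)

# the fixed point of T solves y = F(x)
residual = max(abs(F(x)[0] - y[0]), abs(F(x)[1] - y[1]))
```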
Proof of part (ii) from the detailed local Inverse Function Theorem 11.1. As shown earlier,
the inverse image of an open set under a continuous mapping is open. Hence F⁻¹(Bs(y0)),
and the intersection U := F⁻¹(Bs(y0)) ∩ Br(x0), are open. Let V := Bs(y0). This set is
covered, as shown in part (i). Hence

        Bs(y0) ⊂ F(Br(x0))
        F(U) = F[F⁻¹(Bs(y0)) ∩ Br(x0)] = Bs(y0) ∩ F(Br(x0)) = Bs(y0) = V

Also, the restriction F : U → V is one-to-one, as shown above, and hence a bijection.
Proof of part (iii), detailed local Inverse Function Theorem 11.1. The tangent approx-
imation (11.2) and the estimate (11.4) of the remainder term imply

        ‖F(x1) − F(x2)‖ = ‖A(x1 − x2) + g(x1) − g(x2)‖ ≤ ‖A(x1 − x2)‖ + ε ‖x1 − x2‖ ≤ [‖A‖ + ε] ‖x1 − x2‖

On the other hand, subtracting two instances with i = 1, 2 of equation (11.6) yields

        xi = x0 + A⁻¹(yi − y0) − A⁻¹ g(xi)
        x1 − x2 = A⁻¹(y1 − y2) − A⁻¹[g(x1) − g(x2)]
        ‖x1 − x2‖ ≤ ‖A⁻¹‖ ‖y1 − y2‖ + ‖A⁻¹‖ ε ‖x1 − x2‖
        ‖x1 − x2‖ ≤ ‖A⁻¹‖ / (1 − ε ‖A⁻¹‖) · ‖y1 − y2‖

With the bound for ε used above, we get

(11.8)  ‖x1 − x2‖ ≤ ‖A⁻¹‖ / (1 − ε ‖A⁻¹‖) · ‖F(x1) − F(x2)‖ ≤ ‖A⁻¹‖ / (1 − q) · ‖F(x1) − F(x2)‖

Proof of part (iv) from the detailed local Inverse Function Theorem 11.1. To obtain the
total derivative of the local inverse at the point y0, we take equation (11.6) and obtain

        F⁻¹(y) − F⁻¹(y0) = x − x0 = A⁻¹(y − y0) − A⁻¹ g(x)

        ‖F⁻¹(y) − F⁻¹(y0) − A⁻¹(y − y0)‖ = ‖A⁻¹ g(x)‖
                ≤ ‖A⁻¹‖ ε ‖x − x0‖ = ‖A⁻¹‖ ε ‖F⁻¹(y) − F⁻¹(y0)‖
                ≤ ε ‖A⁻¹‖² / (1 − ε ‖A⁻¹‖) · ‖y − y0‖

In the last step, we have used equation (11.8). Since ε can be chosen as small as
needed (at the expense of diminishing r and s), we see that the local inverse has the total
derivative
        DF⁻¹(y0) = A⁻¹ = [DF(x0)]⁻¹
as claimed.

Corollary 19. Let O ⊂ ℝᵈ be open and the mapping F : O → ℝᵈ, and all its first partial
derivatives, be continuous. Assume that the derivative matrix DF(x) is invertible for
all x ∈ O.
Then the local inverse F⁻¹ has continuous first partial derivatives for all y ∈ F(O).

Proof of Corollary 19. This follows from part (iv), together with the fact that the inverse
of an invertible matrix depends continuously on the matrix.

Corollary 20. As in Corollary 19, let O ⊂ ℝᵈ be open and the mapping F : O → ℝᵈ and
all its first partial derivatives be continuous. Furthermore, assume that the derivative
matrix DF(x) is invertible for all x ∈ O. Under these assumptions

(i) the image F(O) is open.

(ii) the set of preimages of any y ∈ F(O) can only accumulate on the boundary ∂O.

(iii) If the domain is open and connected, the image F(O) is open and connected.

Proof of part (i) of Corollary 20. For each point y = F(x) ∈ F(O), we have constructed
two neighborhoods U of x and V of y such that F(U) = V . Hence for any point
y ∈ F(O), the image set F(O) ⊃ F(U) = V contains a neighborhood V of y.

Proof of part (ii) of Corollary 20. Suppose xn ∈ O is a sequence of points with the same
image F(xn) = F(x1) for all n ∈ ℕ. Suppose furthermore convergence xn → x. If x ∈ O,
we get a contradiction to the local Inverse Function Theorem. Hence the accumulation
point x ∈ ∂O.
Proof of part (iii) of Corollary 20. A set which is open and connected is path-wise con-
nected. Hence the image F(O) is open and path-wise connected, and hence connected,
too.

11.2 The Implicit Function Theorem


Main Theorem (The Implicit Function Theorem). On the open set O ⊂ ℝⁿ⁺ᵈ,
the mapping F : O → ℝᵈ is given. Let u = (x, y) with x ∈ ℝⁿ, y ∈ ℝᵈ and z = F(x, y).
Assume the mapping F and its first partial derivatives

        Dy F(x, y)

are continuous. Assume that at the point u0 = (x0, y0) ∈ O the derivative matrix Dy F(x0, y0)
from above is invertible. Then the equation

(11.9)  F(x, y) = z

locally has a unique solution

(11.10)  y = G(x, z)

(i) Indeed, there exist p, r, s > 0 such that for ‖x − x0‖ < p, ‖y − y0‖ < r and
    ‖z − z0‖ < s with z0 = F(x0, y0), the implicit equation (11.9) and its solu-
    tion (11.10) are equivalent. Moreover, the function G and its partial derivatives
    Dz G are continuous.

(ii) Assume additionally that both first partial derivatives

        Dx F(x, y) , Dy F(x, y)

    exist and are continuous. Then the function G is even continuously differentiable.

(iii) All existing partial derivatives of the solution G can be obtained via the chain rule.

Corollary 21 (Dini's Theorem). Assume the mapping F : O ⊂ ℝ² → ℝ is continu-
ously differentiable and

        Dy F(x0, y0) ≠ 0

at the point (x0, y0) ∈ O. Let z0 = F(x0, y0).
Then the equation

        F(x, y) = z0

locally has a unique solution

        y = G(x)

Indeed, there exists r > 0 such that for |x − x0| < r, |y − y0| < r the implicit equation
and its solution are equivalent. Moreover, the function G is continuously differentiable.
Remark. I give at first the proof of the Implicit Function Theorem for continuously
differentiable mappings, part (ii). Indeed, the proof is easier because of the additional
assumption that all partial derivatives exist and are continuous. The trick is to define
a new function, to which one applies the Inverse Function Theorem. Also, one easily
checks that the solution is continuously differentiable with respect to all variables.
Later, the proof of part (i) follows. Here only continuous dependence on the param-
eter x is assumed; continuous derivatives with respect to the parameter x are not assumed
to exist. Under this restriction, the only available proof is a rather tedious one. Step by
step, the proof of the Inverse Function Theorem is modified in order to apply it to the
mapping F(x, ·) with x as parameter, y as independent and z as dependent variable. As
expected for this case, the solution is merely continuous in the parameter x, and
continuously differentiable in the variable z.
Simplified proof of part (ii) for continuously differentiable mappings. I begin with this
part, because the proof is easier under the additional assumption that all partial deriva-
tives exist and are continuous. We define a new function

        H : (x, y) ∈ O → (x, z) ∈ ℝⁿ × ℝᵈ
        H(x, y) = (x, F(x, y))

with second coordinate the given function; the additional first coordinate is mapped by
the identity. The derivative matrix of the function H is

        DH = [ I       0    ]
             [ Dx F    Dy F ]

The inverse of this matrix at the point (x0, y0) is

        [DH(x0, y0)]⁻¹ = [ I                 0        ]
                         [ −[Dy F]⁻¹ Dx F   [Dy F]⁻¹  ]

Hence we can apply the Inverse Function Theorem to the function H. We get the local
inverse

        H⁻¹(x, z) = (x, G(x, z))

the second coordinate of which gives the solution of the implicit equation (11.9).
Proof of the Implicit Function Theorem part (iii). Direct calculation yields

        DH⁻¹ = [ I       0    ]
               [ Dx G    Dz G ]

By the Inverse Function Theorem, and the calculation above,

        DH⁻¹ = [DH]⁻¹ = [ I                 0        ]
                        [ −[Dy F]⁻¹ Dx F   [Dy F]⁻¹  ]

Hence we get by comparison

        Dx G = −[Dy F]⁻¹ Dx F
        Dz G = [Dy F]⁻¹

It is left to the reader to check that the chain rule leads to the same result.
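These formulas admit a quick numerical sanity check. The scalar example F(x, y) = x² + y² with n = d = 1 is an illustrative choice made here: solving F(x, y) = z for y > 0 gives G(x, z) = √(z − x²), and the comparison above predicts Dx G = −(2y)⁻¹(2x) = −x/y and Dz G = (2y)⁻¹.

```python
import math

def G(x, z):
    # explicit solution of x**2 + y**2 = z with y > 0
    return math.sqrt(z - x * x)

x0, z0 = 0.6, 1.0
y0 = G(x0, z0)                 # = 0.8

h = 1e-6                       # central finite-difference step
DxG_num = (G(x0 + h, z0) - G(x0 - h, z0)) / (2 * h)
DzG_num = (G(x0, z0 + h) - G(x0, z0 - h)) / (2 * h)

DxG_formula = -x0 / y0         # -[Dy F]^{-1} Dx F
DzG_formula = 1.0 / (2 * y0)   # [Dy F]^{-1}
```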
Proof of the Implicit Function Theorem part (i). Without assuming continuous deriva-
tives with respect to the parameter x, the only available proof is a modification of the
proof of the Inverse Function Theorem. The Inverse Function Theorem is used for the
mapping F(x, ·) with x as parameter, y as independent and z as dependent variable.
By assumption, the partial derivative A := Dy F(x0, y0) has a nonsingular inverse
A⁻¹. We need the tangent approximation

(11.11)  F(x, y) = z0 + A(y − y0) + g(x, y)

The remainder function g is continuously differentiable with

        g(x0, y0) = 0 ,  Dy g(x0, y0) = 0

Hence for all ε > 0, there exists r > 0 such that

(11.12)  ‖g(x, y1) − g(x, y2)‖ ≤ ε ‖y1 − y2‖  for all x ∈ Br(x0) , y1, y2 ∈ Br(y0)

Given z ∈ Bs(z0) and x ∈ Bp(x0), we have to find all solutions of the implicit equation

(11.9)  F(x, y) = z

with y ∈ Br(y0). Also, suitable radii p, r and s have to be found.


To this end, equation (11.9) is converted into a fixed point equation for a suitable
contraction T , which now contains the variables x and z as parameters.

(11.13)  z = z0 + A(y − y0) + g(x, y)

(11.14)  y = y0 + A⁻¹(z − z0) − A⁻¹ g(x, y)

From the right hand side, we define the operator

(11.15)  T(x, z)y := y0 + A⁻¹(z − z0) − A⁻¹ g(x, y)

with parameters x ∈ Bp(x0) , z ∈ Bs(z0). The radii p, r, s > 0 have to be chosen small
enough such that

(a) Br(x0) × Br(y0) ⊂ O.

(b) The mapping T is a contraction: ‖T(x, z)y1 − T(x, z)y2‖ ≤ q ‖y1 − y2‖ with 0 <
    q < 1, uniformly for all x ∈ Bp(x0) , z ∈ Bs(z0).

(c) The mappings T(x, z) all map the ball Br(y0) into itself. To this end, it is enough
    to require ‖T(x, z)y0 − y0‖ ≤ r(1 − q), with 0 < q < 1 from above, uniformly for
    all x ∈ Bp(x0) , z ∈ Bs(z0).

To obtain requirement (b), we calculate

        ‖T(x, z)y1 − T(x, z)y2‖ = ‖A⁻¹[g(x, y1) − g(x, y2)]‖ ≤ ‖A⁻¹‖ ‖g(x, y1) − g(x, y2)‖
                ≤ ‖A⁻¹‖ ε ‖y1 − y2‖ ≤ q ‖y1 − y2‖

As in the Inverse Function Theorem, one needs to choose any contraction constant
0 < q < 1 and ε such that

        ε ≤ q / ‖A⁻¹‖

The radius r > 0 is determined by requirements (a) and (11.12), where at first we may
choose p = r. But to obtain requirement (c), we may have to diminish p still further. We
calculate

        ‖T(x, z)y0 − y0‖ = ‖A⁻¹[z − z0 − g(x, y0)]‖ ≤ ‖A⁻¹‖ [ ‖z − z0‖ + ‖g(x, y0)‖ ]

and require the right-hand side to be at most r(1 − q). To this end, it is sufficient

(1) to require
        ‖z − z0‖ ≤ r(1 − q) / (2 ‖A⁻¹‖) =: s
    where s > 0 is the radius of the ball restricting z.

(2) to choose p ∈ (0, r] small enough such that

(11.16)  ‖g(x, y0)‖ ≤ s  for all x ∈ Bp(x0)

After q, ε, r, s and finally p have been chosen as explained, we see that the mappings
T(x, z) are contractions which map the ball Br(y0) into itself, for all parameters x ∈
Bp(x0) , z ∈ Bs(z0).
Hence, by the contraction mapping principle 10, there exists exactly one family of
fixed points, and hence for any given x ∈ Bp(x0) , z ∈ Bs(z0) exactly one solution with
y ∈ Br(y0) of equation (11.9), which we now define to be y = G(x, z).

Proof of continuity. To obtain continuity at a point (x2, z2), we take any tolerance δ > 0
and aim to achieve ‖G(x1, z1) − G(x2, z2)‖ < δ for ‖x1 − x2‖ and ‖z1 − z2‖ small enough.
To this end, for any two xi ∈ Bp(x0) , zi ∈ Bs(z0), we estimate the difference

        ‖G(x1, z1) − G(x2, z2)‖ ≤ ‖G(x1, z1) − G(x1, z2)‖ + ‖G(x1, z2) − G(x2, z2)‖

For both terms, we use the fixed point equations

        G(xi, zk) = y0 + A⁻¹(zk − z0) − A⁻¹ g(xi, G(xi, zk))

from which we obtain by subtraction

        G(x1, z1) − G(x1, z2) = A⁻¹[z1 − z2] − A⁻¹[g(x1, G(x1, z1)) − g(x1, G(x1, z2))]

and

        G(x1, z2) − G(x2, z2) = −A⁻¹[g(x1, G(x1, z2)) − g(x2, G(x2, z2))]
                              = A⁻¹[g(x2, G(x2, z2)) − g(x1, G(x2, z2))]
                              + A⁻¹[g(x1, G(x2, z2)) − g(x1, G(x1, z2))]

Now we get the norm estimates, firstly with x := x1:

        ‖G(x, z1) − G(x, z2)‖ ≤ ‖A⁻¹‖ [ ‖z1 − z2‖ + ‖g(x, G(x, z1)) − g(x, G(x, z2))‖ ]
        ‖G(x, z1) − G(x, z2)‖ ≤ ‖A⁻¹‖ [ ‖z1 − z2‖ + ε ‖G(x, z1) − G(x, z2)‖ ]
        ‖G(x1, z1) − G(x1, z2)‖ ≤ ‖A⁻¹‖ / (1 − ε ‖A⁻¹‖) · ‖z1 − z2‖

and secondly with z := z2:

        ‖G(x2, z) − G(x1, z)‖ ≤ ‖A⁻¹‖ ‖g(x2, G(x2, z)) − g(x1, G(x2, z))‖
                              + ‖A⁻¹‖ ‖g(x1, G(x2, z)) − g(x1, G(x1, z))‖
                              ≤ ‖A⁻¹‖ ‖g(x2, G(x2, z)) − g(x1, G(x2, z))‖
                              + ‖A⁻¹‖ ε ‖G(x2, z) − G(x1, z)‖
        ‖G(x2, z) − G(x1, z)‖ ≤ ‖A⁻¹‖ / (1 − ε ‖A⁻¹‖) · ‖g(x2, G(x2, z)) − g(x1, G(x2, z))‖

Given any tolerance δ > 0, we require

        ‖z1 − z2‖ ≤ (1 − ε ‖A⁻¹‖) / (2 ‖A⁻¹‖) · δ

and ‖x1 − x2‖ small enough such that

        ‖g(x2, G(x2, z2)) − g(x1, G(x2, z2))‖ ≤ (1 − ε ‖A⁻¹‖) / (2 ‖A⁻¹‖) · δ ,

too. Thus we get ‖G(x1, z1) − G(x2, z2)‖ < δ as required.

11.3 Implicit curves
Definition 11.1 (Critical points, critical values). For any continuously differen-
tiable mapping F : O ⊂ ℝ² → ℝ, the set of critical points is defined as

        CP = {(x, y) ∈ O : Dx F(x, y) = Dy F(x, y) = 0}

The set of critical values is defined to be

        CV = {F(x, y) ∈ ℝ : Dx F(x, y) = Dy F(x, y) = 0}

10 Problem 11.1. Check that Ω = O \ (CP ∩ F⁻¹(0)) is an open set.

Answer. The set CP ∩ F⁻¹(0) = Dx F⁻¹(0) ∩ Dy F⁻¹(0) ∩ F⁻¹(0) is closed since the
functions Dx F, Dy F and F are assumed to be continuous. Hence its complement is
open. Thus its intersection with the open set O is open, too.
Definition 11.2 (α- and ω-limit sets of a curve). For any curve C : t ∈ (a, b) →
(x(t), y(t)), the limit sets are defined to be

        α(C) := {(x, y) ∈ ℝ² : there exists a sequence ti → a such that (x(ti), y(ti)) → (x, y)}
        ω(C) := {(x, y) ∈ ℝ² : there exists a sequence ti → b such that (x(ti), y(ti)) → (x, y)}

Theorem 11.2 (About the Accumulation of Implicit Curves). Assume the map-
ping F : O ⊂ ℝ² → ℝ is continuously differentiable. Let

        Ω = {(x, y) ∈ O : if F(x, y) = 0, then either Dx F(x, y) ≠ 0 or Dy F(x, y) ≠ 0}

be called the noncritical domain. Then the solutions of the implicit equation

        F(x, y) = 0

have the following properties:

(i) They are given by the union of an at most countable number of continuously differ-
    entiable curves
        Ck : t → (xk(t), yk(t))
    where k ∈ ℕ, together with the subset of critical points CP ∩ F⁻¹(0).
(ii) Through each point of Ω ∩ F⁻¹(0), there passes exactly one of these curves.
(iii) Two different curves do not intersect inside the noncritical domain Ω.
(iv) Each curve Ck can be extended to a finite or infinite maximal interval (ak, bk).
(v) Either one gets by extension a periodic orbit, or the limit sets α(C) and ω(C) of
    the maximally extended curve C : t ∈ (t−, t+) → (a(t), b(t)) are closed connected
    subsets of the boundary ∂Ω.

(vi) Sequences (xk(tk), yk(tk)), k ∈ ℕ, of points from different curves can only accu-
    mulate on the boundary ∂Ω.
Proof. (ii) At any point (x0, y0) ∈ Ω with F(x0, y0) = 0, either Dx F(x0, y0) ≠ 0 or
Dy F(x0, y0) ≠ 0. In the second case, we apply Dini's Theorem and get a continu-
ously differentiable curve x = t, y = G(t) through the point (x0, y0). In the first case,
the roles of the variables x and y are switched, and we get a curve x = G̃(t), y = t.

(iii) Two different curves do not intersect in the noncritical domain, since the Implicit
Function Theorem guarantees locally a unique solution.

(iv) Each curve Ck can clearly be extended to a finite or infinite maximal interval
(ak, bk), by taking the union of the domains of all extensions.

(v) Take any point (x, y) in the limit set α(C). By definition, there exists a sequence
ti → ak such that (x(ti), y(ti)) → (x, y).
Assume towards a contradiction that the limit point lies in the noncritical domain:
(x, y) ∈ Ω. Since Ω = O \ (CP ∩ F⁻¹(0)) is an open set, F(x, y) = 0 by continuity,
and either Dx F(x, y) ≠ 0 or Dy F(x, y) ≠ 0. Hence Dini's Theorem can be applied, and the curve
can be extended to a neighborhood of (x, y), contradicting maximality.
This contradiction implies that (x, y) ∉ Ω. Since the definition of a limit point
implies that (x, y) lies in the closure of Ω, the only remaining possibility is (x, y) ∈ ∂Ω.
As shown in an earlier exercise, the limit set is always closed. The connectedness has been shown
in Theorem 2.2.

(v) Here is another proof using the Heine-Borel Theorem: Take any compact set
Aj ⊂ Ω, as defined by equation (11.17). The local implicit function theorem can
be applied at all points of K = Aj ∩ F⁻¹(0), and yields at every point (x, y) ∈ K
an open neighborhood with a unique smooth solution curve inside of it. By the
Heine-Borel Theorem, one can cover the compact set K by finitely many such
neighborhoods. The solution curves can be pieced together where the neighbor-
hoods of this finite cover of K overlap, since these intersections are open. Either
one gets a periodic orbit, or, if not, a solution curve reaches the boundary ∂Aj.
One now continues the process for all j ∈ ℕ. Thus one exhausts all solutions
of F(x, y) = 0 in the entire noncritical domain Ω, since ∪_{j∈ℕ} Aj = Ω by equa-
tion (11.18). Once more, where the neighborhoods overlap, the solution curves
can be extended. Either further periodic orbits appear, or, if not, a solution curve
reaches the boundary ∂Ω.

(vi) Take an accumulation point (x, y) of a sequence (xk(tk), yk(tk)) of points from different
curves Ck with k = 1, 2, 3, . . . .
Assume towards a contradiction that the accumulation point lies in the noncritical
domain: (x, y) ∈ Ω. By Dini's Theorem there exists a locally unique solution curve
C through (x, y), which in some neighborhood N is unique by the Implicit Function
Theorem.
All points (xk(tk), yk(tk)) for k large enough lie in this neighborhood N, and
hence on the curve C. This contradicts the assumption that the (xk(tk), yk(tk)) are points
from different curves Ck.
(i) The not so obvious point is that the solutions of the implicit equation in the non-
critical domain are an at most countable union of curves.
For all j ∈ ℕ, we define

(11.17)  Aj := {(x, y) ∈ Ω : x² + y² ≤ j² , dist((x, y), ∂Ω) ≥ 1/j}

Question. Convince yourself that the sets Aj (a compact exhaustion) are compact
subsets of the noncritical domain Ω, with union

(11.18)  ∪_{j∈ℕ} Aj = Ω

Exhaustion by compact sets. Each set Aj is compact, since it is closed and bounded.
Indeed, each set Aj is contained in a disk of radius j, and hence bounded. To check
whether Aj is closed, let (xn, yn) be any sequence of points in Aj, for which we
assume convergence (xn, yn) → (x, y). Is the limit (x, y) in Aj?
The assumption xn² + yn² ≤ j² for all n ∈ ℕ implies x² + y² ≤ j² for the limit, too.
The assumption dist((xn, yn), ∂Ω) ≥ 1/j implies

        dist((xn, yn), (a, b)) ≥ 1/j

for all (a, b) ∈ ∂Ω and all n ∈ ℕ. Hence

        dist((x, y), (a, b)) ≥ 1/j

for all (a, b) ∈ ∂Ω holds for the limit, and hence

        dist((x, y), ∂Ω) ≥ 1/j

Thus we have checked that (x, y) ∈ Aj, and confirmed that Aj is closed.

Since a set which is closed and bounded is compact, each set Aj is indeed compact.
Next we check that the union of all sets Aj is Ω. Take any point (x, y) ∈ Ω. Since
Ω is assumed to be open, there exists an ε > 0 such that Bε(x, y) ⊂ Ω. By the
Archimedean axiom, there exists j′ ∈ ℕ such that j′ ≥ 1/ε. Hence

        dist((x, y), ∂Ω) ≥ 1/j′

Again by the Archimedean axiom, there exists j″ ∈ ℕ such that x² + y² ≤ j″².

We conclude that (x, y) ∈ Aj for j = max(j′, j″). Hence the set equation (11.18)
is confirmed.

Question. Prove that any of the sets Aj defined by equation (11.17) can only in-
tersect finitely many solution curves of F(x, y) = 0.
Answer. Suppose towards a contradiction that one set Aj intersects infinitely many
different curves Ck with k = 1, 2, 3, . . . . By the Bolzano-Weierstrass Theorem, any
sequence of points from these curves has a convergent subsequence, which we de-
note by (xk(t), yk(t)) → (x, y) ∈ Aj. We may choose the parameter t independent
of k.
Since the limit point (x, y) ∈ Aj has distance at least 1/j from the boundary
∂Ω, it is an interior point of Ω.
As already shown in part (vi), sequences of points from different curves can only
accumulate on the boundary ∂Ω. Hence we get a contradiction.
Question. Why are the solutions of the implicit equation in the noncritical domain
an at most countable union of curves?
Answer. Each set Aj is intersected by at most finitely many solution curves. Hence
their union

        Ω = ∪_{j∈ℕ} Aj

is intersected by at most countably many solution curves.

11.4 Examples
10 Problem 11.2. The Folium of Descartes is given by the implicit equation

        x³ + y³ − 3xy = 0

Determine the critical points. Determine the points of the curve with horizontal, vertical
and slope −1 tangents.

Answer. The function F(x, y) = x³ + y³ − 3xy has the partial derivatives

        Fx = 3x² − 3y ,  Fy = 3y² − 3x
        Fxx = 6x ,  Fxy = −3 ,  Fyy = 6y

Solving the system Fx = Fy = 0 yields two critical points (x, y) = (0, 0) and (x, y) =
(1, 1). The only critical point on the level F(x, y) = 0 is (x, y) = (0, 0). The determinant
of the Hessian at this critical point is Fxx Fyy − Fxy² = −9. Hence it is a saddle point,
where two solution curves cross each other.
Points of the curve with horizontal tangent satisfy F = Fx = 0. These equations
yield two solutions (x, y) = (0, 0) or (x, y) = (∛2, ∛4). The second solution is not
Figure 11.1: Look at my Folium. Is it of Descartes?

critical. Hence the implicit function theorem tells us that there exists a locally unique curve
through the second point.
Vertical tangents can only occur for F = Fy = 0. These equations yield two solutions
(x, y) = (0, 0) or (x, y) = (∛4, ∛2). By the implicit function theorem, there exists a
locally unique curve through the second point.
Tangents of slope −1 occur for −1 = h′ = −Fx/Fy, where F(x, h(x)) ≡ 0. Solving the
equations F = 0, Fx = Fy easily yields the solution (x, y) = (3/2, 3/2). I did not check
carefully that this is the only one, as it turns out to be.
10 Problem 11.3. Explain which curve I have drawn in Figure 11.1.
How is it related to the folium of Descartes?
10 Problem 11.4. Check that the parametric curve

        x = 3t/(1 + t³) ,  y = 3t²/(1 + t³)

is actually the folium. Which interval is needed for the parameter t? Locate the points
calculated above.
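Before answering, one can at least confirm numerically that the parametrization lies on the folium; the sample parameter values below are an illustrative spot check (note that t = −1 must be excluded).

```python
def folium_residual(t):
    # plug the parametric point into F(x, y) = x**3 + y**3 - 3*x*y
    x = 3 * t / (1 + t ** 3)
    y = 3 * t ** 2 / (1 + t ** 3)
    return x ** 3 + y ** 3 - 3 * x * y

samples = [-0.9, -0.5, 0.0, 0.5, 1.0, 2.0, 10.0]
max_residual = max(abs(folium_residual(t)) for t in samples)
```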

10 Problem 11.5. Let d, p > 0. The conchoid of Nicomedes is given by the implicit
equation

        (x² + y²)(x − d)² − p²x² = 0

(i) Under which condition is the critical point (0, 0) a local extremum?

(ii) Determine the points of the curve with vertical slope of tangent.

(iii) Determine the points of the curve with horizontal slope of tangent.

(iv) Under which condition is the critical point (0, 0) a saddle point? How many parts
    does the curve have in the nonsingular domain in that case? How many parts are
    linked at a critical point?

(v) Draw Agnesi's curve, which is the boundary case p = d. How many parts does the
    curve have?

Figure 11.2: Nicomedes' conchoid for p > d has a saddle point at zero, and a loop.

Answer.

Figure 11.3: Agnesi's curve.

Figure 11.4: Nicomedes' conchoid for p < d has a local minimum of F at the critical point.

The function F(x, y) = (x² + y²)(x − d)² − p²x² has the partial derivatives

        Fx = 2x(x − d)² + 2(x² + y²)(x − d) − 2p²x

        Fy = 2y(x − d)²

Under the assumption p, d > 0, the only critical point on the level F(x, y) = 0 is
(x, y) = (0, 0). The second derivatives at (x, y) = (0, 0) are

        Fxx = 2d² − 2p² ,  Fxy = 0 ,  Fyy = 2d²

(i) For d > p > 0, we get Fxx > 0, Fyy > 0, Fxy = 0. Hence the Hessian is positive
    definite, which implies a local minimum of F .

(ii) The points of the curve with vertical slope are obtained by solving

        y(x − d)² = 0
        (x² + y²)(x − d)² − p²x² = 0

    Under the assumption p, d > 0, we get the solutions (x, y) = (d + p, 0), (x, y) =
    (d − p, 0), and the critical point (x, y) = (0, 0). The first two are indeed points of
    the curve with vertical tangent. But the implicit function theorem cannot be used
    at a critical point. Indeed, we do not get a vertical tangent at (0, 0)!

(iii) The points of the curve with horizontal slope are obtained by solving

        x(x − d)² + (x² + y²)(x − d) − p²x = 0

        (x² + y²)(x − d)² − p²x² = 0

    A bit of effort in calculation yields the solution

        x = d − ∛(p²d) ,  y = [∛(p²) − ∛(d²)]^(3/2)

    This solution occurs only for p ≥ d > 0.


(iv) The determinant of the Hessian at this critical point is Fxx Fyy − Fxy² = 4d²(d² − p²).
    Hence it is a saddle point for p > d > 0.
    How many parts are linked at the critical point (0, 0)? Two curves are crossing at
    the saddle point. They are linked to one loop, and two branches going to infinity.
    How many parts does the curve have in the nonsingular domain in that case? The conchoid
    has a fourth branch with x > d, lying on the right hand side of the guiding line
    x = d. Altogether the curve in the nonsingular domain consists of four branches.

(v) Solving for y is easy. One can use the graphing calculator to plot

        y = ± x/(x − d) · √((p + d − x)(p − d + x))

    Also, the factored form shows the vertical tangents immediately. Agnesi's curve is
    the boundary case p = d. One can simplify to get

        y = ± x √(x(2d − x)) / (x − d)

    The branch on the right hand side of the guiding line x = d has a vertical tangent
    at (2d, 0).
    On the left hand side of the guiding line, at the critical point (0, 0), two branches
    merge with horizontal tangents. Altogether the curve in the nonsingular domain
    consists of three branches.
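The horizontal-tangent point of part (iii) can be verified numerically. The closed form x = d − ∛(p²d), y = [∛(p²) − ∛(d²)]^(3/2) is my reconstruction from the two equations above, and the sample values p = 2, d = 1 are an illustrative choice; at this point F and Fx should vanish, while Fy ≠ 0 confirms a genuine horizontal tangent.

```python
p, d = 2.0, 1.0                       # sample parameters with p > d
x = d - (p * p * d) ** (1.0 / 3.0)    # x = d - (p**2 * d)**(1/3)
y = ((p * p) ** (1.0 / 3.0) - (d * d) ** (1.0 / 3.0)) ** 1.5

# the conchoid and its partial derivatives at (x, y)
F  = (x * x + y * y) * (x - d) ** 2 - p * p * x * x
Fx = 2 * x * (x - d) ** 2 + 2 * (x * x + y * y) * (x - d) - 2 * p * p * x
Fy = 2 * y * (x - d) ** 2
```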

12 Does Pythagoras' Theorem imply the Euclidean
   Parallel Postulate?
12.1 Different Formulations of the Problem
A purely geometric proof would have to struggle with some obvious difficulties. At first,
the squares of the sides cannot be interpreted as the area of any figure in either spherical
or hyperbolic geometry. In these geometries, there do not exist any similar figures
other than congruent ones, and squares do not exist at all. So the squares of the sides
can only be interpreted as numbers. I am for now considering these difficulties as not
crucial. Here are possible approaches to address the question posed:
(1) One can look at the special Pythagorean triples of integers such that a² + b² = c².
    The smallest one is the triple 3² + 4² = 5². So a meaningful question is whether
    there exist right triangles with the sides 3, 4 and 5 in either spherical or hyperbolic
    geometry.
(2) Take any right triangle △ABC. In spherical geometry its sides satisfy
        cos c = cos a cos b
    Does this imply that Pythagoras' Theorem can never hold?
(3) Take any right triangle △ABC. In hyperbolic geometry (with Gaussian curvature
    K = −1) the sides of a right triangle satisfy
        cosh c = cosh a cosh b
    Does this imply that Pythagoras' Theorem can never hold?

I do not know about any approach to answer at least part of these questions in the
framework of neutral geometry, where they would naturally fit, similarly to Legendre's
Theorems about the angle sum.

12.2 The Main Results obtained with Calculus

In the following, we approach problems (2) and (3) using substantial tools from trigonom-
etry and calculus. The main results are:

Theorem 12.1. If a spherical right triangle has the short arc as its hypotenuse c, this
side is strictly shorter than for the Euclidean flat triangle with legs of the same lengths.
Hence in single elliptic geometry, the hypotenuse c of any right triangle is strictly
shorter than for the corresponding Euclidean flat triangle with legs of the same lengths.

Theorem 12.2 (The lunes of Pythagoras). For spherical right triangles where
the hypotenuse c is the longer arc, this side may be either shorter or longer, or of equal
length as for the Euclidean flat triangle with legs of the same lengths.
Assume the length of the hypotenuse is restricted to the interval c ∈ (π, 2π). The
legs satisfy a² + b² = c² as in the Euclidean case if and only if (a, b) lies on one
curve C ⊂ E, which connects the boundary points (s*, 0) and (0, s*) inside the quarter
circle

        E = {(a, b) : 0 < a < 2π , 0 < b < 2π , a² + b² < 4π²}

Here the number s* ≈ 257° is the unique solution of tan s = s in the interval (π, 3π/2).
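The number s* is easy to compute; plain bisection below is my choice of method, and the tolerances are illustrative. The run also confirms the values used later in this section: s* ≈ 4.4934, about 257°, and √2·π < s*.

```python
import math

def h(s):
    return math.tan(s) - s

# on (pi, 3*pi/2), h is negative just above pi and tends to +infinity
# just below 3*pi/2, so bisection applies
lo, hi = math.pi + 1e-6, 1.5 * math.pi - 1e-6
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if h(mid) < 0:
        lo = mid
    else:
        hi = mid

s_star = 0.5 * (lo + hi)
s_star_degrees = math.degrees(s_star)   # about 257.45
```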

Theorem 12.3. In hyperbolic geometry, the hypotenuse c is longer than for the Eu-
clidean flat triangle with legs of the same lengths.

12.3 Using Convexity


We now proceed to the proof of these results. The main tool to get Theorem 12.1 and
Theorem 12.3 turns out to be convexity. We begin with a proposition which yields the
result of Theorem 12.1 assuming an additional convenient, but a bit more restrictive,
condition.

Proposition 12.1. Assume the sides a > 0, b > 0 and c of a small spherical right
triangle satisfy the following two assumptions:

(i) The sum of the two legs satisfies a + b ≤ s*, where s* ≈ 257° is the solution of
    tan s = s in the interval (π, 3π/2).

(ii) The hypotenuse c is the short arc: c ≤ π.

Then c² < a² + b².

Proof of the Proposition. We use the identities

        a² + b² = [(a + b)² + (a − b)²] / 2
        cos a cos b = [cos(a + b) + cos(a − b)] / 2

and define x := a + b and y := a − b ≥ 0. The assertion √(a² + b²) > c will be deduced
from the inequalities cos √(a² + b²) < cos c and √(a² + b²) + c ≤ 2π. The key point is to use
the convexity of the function cos √x. The relevant facts are gathered in the following
lemma, the proof of which is left to the reader.
Lemma 12.1. Define the function

        φ(x) := cos √x       for x ≥ 0
        φ(x) := cosh √(−x)   for x ≤ 0

This function can be continued to an entire analytic function. Actually its McLaurin
series is

        φ(z) = Σ_{n=0}^∞ (−z)ⁿ / (2n)!

which is convergent for all z ∈ ℂ. It is easy to check that

        φ′(x) = − sin √x / (2 √x)

        φ″(x) = 1/(4x) · [ sin √x / √x − cos √x ]

and easy to check that φ″(0) = 1/12. The smallest real zero of φ″ is (s*)², and hence
φ″(x) > 0 for all real x ∈ (−∞, (s*)²).
Convexity implies

        φ((x′ + y′)/2) < [φ(x′) + φ(y′)] / 2   for all y′ < x′ ≤ (s*)².
10 Problem 12.1. Students of MATH 3146:
Prove the Lemma in detail.
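The lemma also invites a numerical cross-check. The following Python sketch (an illustration, not part of the proof) locates s* by bisection from tan s = s, written as sin s − s cos s = 0 on (π, 3π/2), and confirms the sign change of φ″ at (s*)².

```python
import math

def phi_dd(x):
    # second derivative of phi(x) = cos(sqrt(x)) for x > 0:
    # phi''(x) = (sin(sqrt(x))/sqrt(x) - cos(sqrt(x))) / (4x)
    u = math.sqrt(x)
    return (math.sin(u) / u - math.cos(u)) / (4 * x)

# Bisection for tan s = s on (pi, 3pi/2), via h(s) = sin s - s cos s;
# h is positive at pi and negative just below 3pi/2.
lo, hi = math.pi, 1.5 * math.pi - 1e-9
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if math.sin(mid) - mid * math.cos(mid) > 0:
        lo = mid
    else:
        hi = mid
s_star = 0.5 * (lo + hi)

print(s_star)                    # about 4.4934
print(math.degrees(s_star))      # about 257.45 degrees
print(phi_dd(0.99 * s_star**2))  # positive: phi still convex here
print(phi_dd(1.01 * s_star**2))  # negative: convexity lost
```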
Now let x′ := x² and y′ := y². Because of the Lemma, and the additional assumption
a + b = x ≤ s*, we conclude that indeed

cos √( (x² + y²)/2 ) < [ cos x + cos y ] / 2   for all 0 ≤ y < x ≤ s*.

cos √(a² + b²) < [ cos(a + b) + cos(a − b) ] / 2 = cos a cos b = cos c


for all 0 < b ≤ a with a + b ≤ s*. From this estimate, d := √(a² + b²) > c indeed follows
immediately if a, b, c are small. Here is a completely rigorous argument. Because of the
identity

2 sin((c + d)/2) sin((d − c)/2) = cos c − cos d > 0

we are ready under the assumption that c + d ≤ 2π. But the case that c + d > 2π and
d < c would imply c > π, which we had to rule out.

End of the proof of Theorem 12.1. We rule out the case a² + b² ≤ c² by deriving a
contradiction. Since (a + b)² ≤ 2a² + 2b², we would get

a + b ≤ √(2(a² + b²)) ≤ √2 c ≤ √2 π < s*.

Indeed √2 π = 4.4429 < s* = 4.4934. Hence both assumptions (i) and (ii) of the
Proposition hold, and we get a² + b² > c², a contradiction.
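The inequality of Theorem 12.1 can also be probed numerically: choose random legs, recover the hypothenuse from cos c = cos a cos b (the short arc), and compare. A minimal Python sketch; the sampling range keeps a + b below s*, so both hypotheses of Proposition 12.1 hold.

```python
import math
import random

random.seed(42)
for _ in range(10_000):
    a = random.uniform(0.01, 1.5)
    b = random.uniform(0.01, 1.5)
    # spherical Pythagoras: cos c = cos a cos b, with c the short arc
    c = math.acos(math.cos(a) * math.cos(b))
    # the spherical hypothenuse is strictly shorter than the flat one
    assert c * c < a * a + b * b
print("c^2 < a^2 + b^2 for all sampled spherical right triangles")
```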
Proof of Theorem 12.3. We use the identities

a² + b² = [(a + b)² + (a − b)²] / 2

cosh a cosh b = [cosh(a + b) + cosh(a − b)] / 2

and define x := a + b and y := a − b ≥ 0. The key point is to use the convexity of the
function f(x) = cosh √x. This follows from the Lemma above, but it is easy to check
independently. Now let x′ := x² and y′ := y².

cosh √( (x² + y²)/2 ) < [ cosh x + cosh y ] / 2   for all 0 ≤ y < x.

cosh √(a² + b²) < [ cosh(a + b) + cosh(a − b) ] / 2 = cosh a cosh b = cosh c

for all 0 < b ≤ a. The assertion a² + b² < c² now follows immediately.

12.4 The Lunes of Pythagoras


It requires more work to derive Theorem 12.2. We prove existence and calculate some
exceptional spherical right triangles, for which the hypothenuse satisfies c > π and
nevertheless a² + b² = c² happens to hold. I like to call these triangles the lunes of
Pythagoras.

Definition 12.1 (The lunes of Pythagoras). The lunes of Pythagoras are right
spherical triangles, the sides a, b, c ∈ (0, 2π) of which satisfy both

(12.1)   cos c = cos a cos b   and   c² = a² + b²
To study them, we apply Dini's Theorem to the function

f(a, b) := cos √(a² + b²) − cos a cos b

defined on the open quarter circle

E = {(a, b) : 0 < a < 2π, 0 < b < 2π, a² + b² < 4π²}

Throughout, I shall use c := √(a² + b²) as an abbreviation.
10 Problem 12.2. Calculate the partial derivatives f_a and f_b. At first, check that

dc/da = a/c

Answer.

f_a = sin a cos b − (a/c) sin c

f_b = cos a sin b − (b/c) sin c
10 Problem 12.3. Simplify f_a − f_b by means of the addition theorem for sin.
Show that the function f has exactly one critical point inside the open quarter circle E.
Calculate this critical point. [19]

Solution.

f_a − f_b = sin(a − b) − (a − b) sin c / c

At the critical points, both partial derivatives vanish, and hence f_a − f_b = 0 as well
as f_a + f_b = 0. Hence

sin(a − b) / (a − b) = sin c / c = sin(a + b) / (a + b) =: S

The key point is that the function x ∈ [0, 2π] ↦ sin x / x assumes neither any nonnegative
value S ≥ 0 twice, nor any negative value S < 0 three times.

Assume at first a > b. Since 0 < a − b < c < a + b and c ≤ 2π, we conclude S < 0,
since a nonnegative value is never assumed twice. Hence only the two points
a − b < c ≤ 2π can realize the value S inside [0, 2π]; since sin x / x is positive on
(2π, 3π), the third point must satisfy 3π ≤ a + b. This implies
a² + b² ≥ (a + b)²/2 ≥ 9π²/2 > 4π², contrary to the assumption (a, b) ∈ E. Similarly,
one gets a contradiction in the case that a < b.

Finally, to cover the borderline case a = b > 0, we observe c = √2 a, and hence we get

(12.2)   sin c / c = sin(√2 c) / (√2 c) = S

It is impossible that √2 c > 2π: this would imply c ≤ 2π but √2 c ≥ 3π, hence
c ≥ 3π/√2 > 2π, which is impossible. Hence we get S < 0, π < c, and √2 c ≤ 2π. There
exists exactly one solution of equation (12.2) in this range. Approximately, we get
c = 3.7613, corresponding to 215.51°. Hence a_c = c/√2, corresponding to 152.39°.

[19] I have found several different arguments to get this result, all a bit strange; one needs to check
all such arguments carefully!
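Equation (12.2) is easy to solve by bisection, since sin c / c decreases and sin(√2 c)/(√2 c) increases on the relevant range; the following Python sketch reproduces the values quoted above.

```python
import math

def S_diff(c):
    # difference of the two sides of equation (12.2)
    return math.sin(c) / c - math.sin(math.sqrt(2) * c) / (math.sqrt(2) * c)

lo, hi = 3.2, 4.4            # S_diff(lo) > 0 > S_diff(hi)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if S_diff(mid) > 0:
        lo = mid
    else:
        hi = mid
c = 0.5 * (lo + hi)
a_c = c / math.sqrt(2)

print(math.degrees(c))    # about 215.51 degrees
print(math.degrees(a_c))  # about 152.39 degrees
```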

10 Problem 12.4. Determine the critical points of the function f on the boundary
∂E of the quarter circle E.

Answer. Both axes a = 0 and b = 0 consist entirely of critical points. There are no other
critical points on the circle {(a, b) : a² + b² = 4π²}; this claim follows for example from
the solution of the last problem.

12.5 Examples for the Lunes of Pythagoras


In the figures below, one can see several examples, obtained by solving the equation

g(a, x) := [ cos √(a² + x²) − cos a cos x ] / x² = 0

numerically for several values of a. I show the flat triangle, its sides in radian measure,
together with the spherical triangle in stereographic projection, with side b on the equator
(red), and side a extended through the south pole (blue). Sides c and b intersect in
both vertex A and its antipode. The lengths of the sides are given in degree measure.

10 Problem 12.5. Give your own comments about these figures.

Answer.

Figure 12.1: The Pythagoras lune with a = 45°.

Figure 12.2: The Pythagoras lune with a = 70°.

Figure 12.3: The Pythagoras lune with a = 90° is highly degenerate.

Figure 12.4: The Pythagoras lune with a = 120°.

Figure 12.5: The Pythagoras lune with a = 150°.

Figure 12.6: The Pythagoras lune with a = 180° corresponds to a flat right triangle with
sides 3, 4 and 5; it is degenerate, too.

Figure 12.7: An approximately isosceles lune.

10 Problem 12.6. Let a* denote the smallest positive solution of cos² a = cos(√2 a).
Find the lunes of Pythagoras for which a = b. Show that there is exactly one such
solution.

10 Problem 12.7. Find the lunes of Pythagoras in the special cases that

(i) a = b.

(ii) a = π/2

(iii) a = π

(iv) Take the limit a → 0.

Set up a table with the exact values of a, b, c as well as f_a, f_b and db/da = −f_a/f_b.

  a       b       c        f_a                f_b                db/da = −f_a/f_b
  0+      s*      s*       0                  0                  0
  π/2     √2 π    3π/2     1/3 + cos(√2 π)    2√2/3              −√2 (3 cos(√2 π) + 1)/4
  π       4π/3    5π/3     3√3/10             9√3/10             −1/3
  a*      a*      √2 a*    f_a(a*, a*)        f_a(a*, a*)        −1
  4π/3    π       5π/3     9√3/10             3√3/10             −3
  √2 π    π/2     3π/2     2√2/3              1/3 + cos(√2 π)    −2√2 / (3 cos(√2 π) + 1)
  s*      0+      s*       0                  0                  —

In the row a = b = a*, the two partial derivatives agree by symmetry, so the slope
there is −1.

By the way, we see from this table that the answer to the first question (1) posed above
is positive:

Yes, in spherical geometry, there exists a special right triangle with the sides
of 180°, 240° and 300°.

10 Problem 12.8. Get numerical approximations for this table, using angle
measurement in degrees.

Answer.

  a        b        c        f_a       f_b       db/da
  0+       257.45   257.45   0         0         0
  90       254.56   270      0.06708   0.94281   −0.07115
  180      240      300      0.51962   1.55885   −0.33333
  218.03   218.03   308.35   0.38214   0.38214   −1
  240      180      300      1.55885   0.51962   −3
  254.56   90       270      0.94281   0.06708   −14.0554
  257.45   0+       257.45   0         0         —
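Each interior row of this table can be recomputed by a one-dimensional bisection in b. The sketch below checks the row a = 90°, where the exact values are b = √2 π and c = 3π/2.

```python
import math

def f(a, b):
    c = math.hypot(a, b)
    return math.cos(c) - math.cos(a) * math.cos(b)

a = math.pi / 2
lo, hi = 4.0, 4.6            # f(a, lo) < 0 < f(a, hi)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if f(a, mid) < 0:
        lo = mid
    else:
        hi = mid
b = 0.5 * (lo + hi)
c = math.hypot(a, b)

print(math.degrees(b))   # about 254.56 (exact value: sqrt(2)*pi)
print(math.degrees(c))   # about 270   (exact value: 3*pi/2)
```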

12.6 Using the Local Implicit Function Theorem


10 Problem 12.9. What does the local implicit function theorem yield about
solutions of the system (12.1), directly from these data?

Answer. There exist neighborhoods of the five points (a, b) with a, b > 0, as given in
the table, inside of which the solutions of system (12.1) consist of a single continuously
differentiable curve b = b(a). The slope of its tangent at these points is

db/da = −f_a / f_b

The local implicit function theorem cannot decide by itself whether these local solutions
are parts of one solution curve, or whether there exist several solution curves.

10 Problem 12.10. Why does one not get the corresponding result at the first
point with a = 0, nor at the last point with b = 0?

Answer. All points on both axes are critical points. In the case that both f_a = 0 and
f_b = 0, the implicit function theorem cannot be applied.

10 Problem 12.11. Find the one-variable Taylor expansion of the function b ↦
f(a₀, b), up to second order, at any point (a₀, 0). What can one say about the terms of
odd order? Convince yourself that the Taylor polynomial of second order consists of only
one term k b², and determine k by means of l'Hôpital's rule.

Answer. Since the function b ↦ f(a₀, b) is even, all terms of odd order are zero. Moreover,
it is easy to see that f(a₀, 0) = 0 for all a₀ ∈ ℝ. Hence the Taylor polynomial of second
order consists of only one term k b². Now apply l'Hôpital's rule:

k = lim_{b→0} f(a₀, b) / b² = lim_{b→0} f_b(a₀, b) / (2b)

  = lim_{b→0} [ cos a₀ sin b − (b/c) sin c ] / (2b)

  = lim_{b→0} [ (cos a₀) sin b / (2b) − sin c / (2c) ] = cos a₀ / 2 − sin a₀ / (2a₀)

Hence the Taylor expansion with respect to b, up to even third order, is

(12.3)   f(a₀, b) = [ cos a₀ / 2 − sin a₀ / (2a₀) ] b² + O(b⁴)
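The coefficient in (12.3) can be confirmed numerically: for small b, the quotient f(a₀, b)/b² should approach k = cos a₀/2 − sin a₀/(2a₀). A quick Python check at a few sample points:

```python
import math

def f(a, b):
    return math.cos(math.hypot(a, b)) - math.cos(a) * math.cos(b)

def k(a0):
    # quadratic coefficient from (12.3)
    return math.cos(a0) / 2 - math.sin(a0) / (2 * a0)

for a0 in (0.7, 1.9, 3.1, 4.4934):
    ratio = f(a0, 1e-4) / 1e-8    # f(a0, b) / b^2 at b = 1e-4
    assert abs(ratio - k(a0)) < 1e-5
print("quadratic coefficient of (12.3) confirmed at all sample points")
```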
10 Problem 12.12. The following exercise needs a bit of complex analysis.

(i) Convince yourself that the function (a, b) ∈ ℂ² ↦ f(a, b) is an entire analytic
function in both variables.

(ii) Give the reason why the function z ∈ ℂ ↦ f(a₀, z)/z² is an entire analytic function,
at least for a₀ ≠ 0.

(iii) Find the two-variable Taylor expansion up to order four at (0, 0) of the function
(a, b) ∈ ℂ² ↦ f(a, b)/(a² b²).

(iv) Convince yourself that the function (a, b) ∈ ℂ² ↦ f(a, b)/(a² b²) is an entire analytic
function in both variables.

Solution. (i) The function f(a, b) = φ(a² + b²) − cos a cos b is an entire analytic function
in both variables, since both z ∈ ℂ ↦ φ(z) and z ∈ ℂ ↦ cos(z) are entire
functions.

(ii) Assume for now a₀ ≠ 0. The function z ∈ ℂ ↦ g(z) := f(a₀, z)/z² is analytic for
z ∈ ℂ \ {0}. It could have a pole of first or second order at z = 0. From the
Taylor expansion (12.3) it follows that the function g is bounded for z → 0 along
the real axis. But an analytic function goes to ∞ at a pole, in the sense that
|g(z)| → ∞ for |z| → 0. Hence there cannot occur a pole, hence the Laurent
expansion is indeed the Taylor expansion, and hence the function g is analytic in
a complex neighborhood of zero. Since there are no other singularities, it is an
entire function. [20]

[20] The example of the function z ∈ ℂ ↦ e^(−1/z²) shows that one needs to take the behavior as
z → 0 for complex values of z into account for this argument to work.

(iii) A simple calculation shows that

(12.4)   f(a, b) / (a² b²) = −1/6 + (a² + b²)/60 + O(|a|⁴ + |b|⁴)

(iv) From this expansion, we see that at (a, b) → (0, 0), there is no singularity either.
Hence by the same reasoning as in item (ii), the function (a, b) ∈ ℂ² ↦ f(a, b)/(a² b²)
is an entire analytic function in both variables.

We now continue the analysis of the solutions of system (12.1), for real variables
(a, b) ∈ E ⊂ ℝ².

10 Problem 12.13. Prove that the solutions of system (12.1) in a neighborhood
of the point (s*, 0) consist of two smooth curves: the axis b = 0, and a second smooth
curve with horizontal tangent at the intersection point (s*, 0).
Prove that in a neighborhood of a point (a₀, 0) where a₀ ≠ 0 and a₀ ≠ s*, the solutions
of the system (12.1) consist of one curve, which is the axis b = 0.
Prove that the solutions of the system (12.1) in a neighborhood of the point (0, 0)
consist of two smooth curves: the vertical and the horizontal axes.

Answer. Both axes are solutions of (12.1), since f(a, 0) = f(0, b) = 0 for all a and b. To
obtain solutions with a ≠ 0, b ≠ 0, but b small, we solve the equation

g(a, b) := f(a, b) / b² = 0

From the Taylor expansion (12.3), we conclude that

(12.5)   g(a, b) = cos a / 2 − sin a / (2a) + O(b²)

This function gets zero at b = 0 if and only if tan a = a, which happens if a = 0 or
a = s*, which are the only solutions in [0, 2π]. Hence there exist no solutions of system
(12.1) with small b ≠ 0, except in these two cases.
In the case a = s*, we can apply the implicit function theorem to the function g.
Indeed

2 g_a(a, 0) = d/da [ cos a − sin a / a ] = −sin a − (a cos a − sin a)/a²

and hence

g_a(s*, 0) = −(sin s*)/2 ≠ 0 ,   g_b(s*, 0) = 0

The local implicit function theorem yields one branch of solutions with horizontal
tangent.
In the case a = 0, the Taylor expansion (12.4) shows that there exist no further
solutions near (0, 0) other than both axes.

12.7 Some Global Results
Finally, I address the analysis of the global nature of solutions of system (12.1), still
for real variables (a, b) ∈ E ⊂ ℝ². How many solution curves are there inside the
noncritical domain Ω = E? The general Theorem 11.2 about the accumulation of
implicit curves would still allow countably infinitely many curves!

Proposition 12.2. Any periodic orbit on which f = 0 contains a critical point inside
the bounded domain it encloses.

Proof. Let D ⊂ E be the bounded domain, on the boundary of which the function f
is zero. [21] Since f is continuous and the closure of D is compact, f assumes its minimum
and maximum values there. Either the minimum or the maximum is assumed at an
interior point of D, or the function f is identically zero on D. In all cases, there exists
a critical point inside D, at which both f_a = f_b = 0.

Proposition 12.3. The solutions of system (12.1) inside the open quarter circle E are
just one smooth curve, which connects the points (s*, 0) and (0, s*) on the boundary of
E.

Proof. As explained by Theorem 11.2, each solution curve C ⊂ E in the noncritical
domain can be extended to a finite or infinite maximal interval (t−, t+). Either one
gets a periodic orbit, or the limit sets α(C) and ω(C) of the maximally extended curve
C : t ∈ (t−, t+) ↦ (a(t), b(t)) are closed connected subsets of the boundary ∂Ω.
In the present example, there are no critical points in the interior of the given domain
E at which f = f_a = f_b = 0. Hence the boundary is ∂Ω = ∂E. This rules out periodic
orbits, too. [22] By Proposition 12.2, a periodic orbit on which f = 0 contains a critical
point inside the bounded domain it encloses.
But in the present case, as shown in Problem 12.3, there exists only one critical point
(a_c, a_c) inside E, which the periodic orbit would have to surround. Hence the periodic
orbit would have to cross the line a = b at least twice inside E. This would contradict
the solution to Problem 12.6: the line a = b contains only one solution of f(a, a) = 0
with (a, a) ∈ E, namely the point (a*, a*).
As a result of Problem 12.13, the solution curve can approach the boundary, from
inside E, only at the two points (s*, 0) and (0, s*). In other words, both α(C) ⊂
{(s*, 0), (0, s*)} and ω(C) ⊂ {(s*, 0), (0, s*)}.
By Theorem 2.2, the limit set of a continuous trajectory is connected, hence cannot
consist of two discrete points. By Problem 2.5, a bounded sequence in ℝⁿ the limit
set of which is exactly one point is convergent. Hence both limits lim_{t→t−} (a(t), b(t))
and lim_{t→t+} (a(t), b(t)) exist, and each can only be either (s*, 0) or (0, s*). Can these
limits be equal, which would mean existence of a homoclinic orbit?

[21] To be picky, we apply here the Jordan Curve Theorem.
[22] The lunes of Pythagoras shall never come back!

Well, in Problem 12.13, we have applied the local implicit function theorem to the
divided function g(a, b) = f(a, b)/b². As a consequence, exactly one solution curve
approaches the boundary point (s*, 0) from inside E.
Existence of a homoclinic orbit would imply that at least three solution curves
approach either the boundary point (s*, 0) or (0, s*) from inside E, which is impossible.
By a similar reasoning, it is impossible that several solution curves connect (s*, 0) and
(0, s*).
Hence, in the end, we have confirmed that exactly one smooth curve of solutions of
system (12.1) exists inside the quarter circle E, and it connects the points (s*, 0) and
(0, s*) on the boundary of E.
10 Problem 12.14. Prove that there is a critical point inside the domain H
bounded by the lune curve connecting the points (s*, 0) and (0, s*), and the straight
segments from (0, s*) to (0, 0) and from (0, 0) to (s*, 0) on the axes.

Answer. We use the same reasoning as in the proof of Proposition 12.2: On the boundary
∂H, the function f is zero. Since f is continuous and the closure of H is compact, f
assumes its minimum and maximum values there. Either the minimum or the maximum
is assumed at an interior point of H, or the function f is identically zero on H. In all
cases, there exists a critical point inside H, at which both f_a = f_b = 0.

10 Problem 12.15. Find the minimum value of the function f in the domain E.

Answer. Extremal values of any differentiable function can be assumed only at critical
points, or on the boundary of a domain. The uniqueness of the critical point in E was
shown, and the critical point has been calculated, in Problem 12.3. The level is

f(a_c, a_c) = cos(√2 a_c) − cos² a_c = −1.56992

This is the absolute minimum of f, since f is zero on ∂H and the axes, positive in E \ H,
but negative only in the interior of H ⊂ E.
For the students who think all this can be done simpler, I pose the following problem.
Anyway, monotonicity arguments are interesting and lead to even sharper results.
Observe that it is impossible that both partial derivatives satisfy f_a(a, b) > 0 and
f_b(a, b) > 0 for all (a, b) ∈ E.

13 Approximation Theory
Main Theorem (The Weierstrass Approximation Theorem). Any continuous
function on a closed bounded interval can be uniformly approximated by polynomials,
with any given accuracy.

Main Theorem (Polynomial Approximation in Several Dimensions). Any
continuous function on a compact subset of ℝᵈ can be uniformly approximated by
polynomials in d variables, with any given accuracy.

Main Theorem (Trigonometric Approximation). Any continuous 2π-periodic
function can be uniformly approximated by trigonometric polynomials, with any given
accuracy.

There exist many different proofs of these basic theorems. I present the proof based
on Bernstein polynomials, which can easily be modified to include the case of several
dimensions.
We begin with the motivation from probability theory. Let x be the probability to
flip heads with an arbitrary, nonsymmetric coin. The probability to get k times heads
in a series of n independent coin flips is

p_k(x) = (n choose k) x^k (1 − x)^(n−k)

Counting the number of heads in such a series of coin flips defines a random variable
X. From the calculation of the expectation value and variance of X, one knows the
formulas

(13.1)   Σ_{k=0}^{n} p_k(x) = 1

(13.2)   Σ_{k=0}^{n} k p_k(x) = n x

(13.3)   Σ_{k=0}^{n} (k − n x)² p_k(x) = n x (1 − x)
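These are the normalization, mean, and variance of the binomial distribution; with exact binomial coefficients they can be verified directly, for instance:

```python
import math

def p(n, k, x):
    # probability of k heads in n flips, success probability x
    return math.comb(n, k) * x**k * (1 - x)**(n - k)

n, x = 20, 0.37
total    = sum(p(n, k, x) for k in range(n + 1))
mean     = sum(k * p(n, k, x) for k in range(n + 1))
variance = sum((k - n * x)**2 * p(n, k, x) for k in range(n + 1))

print(total)     # identity (13.1): equals 1 up to rounding
print(mean)      # identity (13.2): equals n*x = 7.4 up to rounding
print(variance)  # identity (13.3): equals n*x*(1-x) = 4.662 up to rounding
```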

Proof of the Weierstrass Approximation Theorem. Now we consider p_k(x) as
polynomial functions of x, defined on the interval [0, 1]. Given any continuous function
f on [0, 1], let its Bernstein approximation polynomial of degree n be

(13.4)   B_n[f](x) := Σ_{k=0}^{n} f(k/n) p_k(x)

How close is this approximation? With the identity (13.1), one can estimate the
difference

(13.5)   |f(x) − B_n[f](x)| = | Σ_{k=0}^{n} [ f(x) − f(k/n) ] p_k(x) |
                            ≤ Σ_{k=0}^{n} | f(x) − f(k/n) | p_k(x)

Given is any accuracy ε > 0. Because of the uniform continuity of f on the compact
interval [0, 1], there exists δ > 0 such that

(13.6)   |f(x) − f(x′)| < ε/2   if |x − x′| < δ

Furthermore, we know that a continuous function on a compact interval is bounded.
Hence there exists M such that

(13.7)   |f(x)| ≤ M   for all x ∈ [0, 1]

In order to estimate the difference |f(x) − f(k/n)|, we distinguish the two cases

near-zone: |x − k/n| < δ
far-zone:  |x − k/n| ≥ δ

In the near-zone, we use continuity. In the far-zone, the Tschebychev trick works,
thanks to the variance identity (13.3). In both near-zone and far-zone, we get

(13.8)   |f(x) − f(k/n)| ≤ max{ ε/2 , (2M/δ²)(x − k/n)² } ≤ ε/2 + (2M/δ²)(x − k/n)²

Now we plug into the estimate (13.5):

|f(x) − B_n[f](x)| ≤ Σ_{k=0}^{n} |f(x) − f(k/n)| p_k(x)
                   ≤ Σ_{k=0}^{n} [ ε/2 + (2M/δ²)(x − k/n)² ] p_k(x)
                   = (ε/2) Σ_{k=0}^{n} p_k(x) + (2M/δ²) Σ_{k=0}^{n} (x − k/n)² p_k(x)

The first expression can be calculated with the sum identity (13.1). The last expression
can be calculated with the variance identity (13.3), divided by n². Hence we get:

(13.9)   |f(x) − B_n[f](x)| ≤ ε/2 + (2M/δ²) · x(1 − x)/n ≤ ε/2 + (2M/δ²) · 1/(4n)

In order to get the required bound, we need to choose the degree n of the approximating
polynomial high enough such that

n ≥ M/(δ² ε)

and get

|f(x) − B_n[f](x)| ≤ ε

uniformly for all x ∈ [0, 1], as required.
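A direct implementation of (13.4) makes the uniform convergence visible; here it is tried on f(x) = |x − 1/2|, whose kink at 1/2 is the worst point. A minimal Python sketch:

```python
import math

def bernstein(f, n, x):
    # B_n[f](x) as in (13.4)
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def f(t):
    return abs(t - 0.5)

grid = [i / 200 for i in range(201)]
for n in (4, 16, 64):
    err = max(abs(f(x) - bernstein(f, n, x)) for x in grid)
    print(n, err)   # the uniform error decreases as n grows
```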

10 Problem 13.1. Assume that f(0) = 0, f(x) ≥ 0 for all x ∈ [0, 1], and a
Lipschitz condition

|f(x) − f(x′)| ≤ L |x − x′|

for all x, x′ ∈ [0, 1]. Check that you can choose δ = ε/(2L), and that even

|f(x) − B_n[f](x)| ≤ ε/2 + (L/δ²) · 1/(4n)

Optimize ε > 0, and prove that even

(13.10)   |f(x) − B_n[f](x)| ≤ 3L / ∛(16 n)

Answer. The assumptions imply 0 ≤ f(x) ≤ L, hence |f(x) − f(k/n)| ≤ L, and we can
replace 2M by L in the basic estimate (13.9). Moreover, the Lipschitz condition allows
us to set δ = ε/(2L). Thus one gets

(13.11)   |f(x) − B_n[f](x)| ≤ ε/2 + (L/δ²) · x(1 − x)/n ≤ ε/2 + L³/(n ε²)

The minimum of the right-hand side is assumed for ε := L ∛(4/n). The minimum value
yields the estimate as claimed.
Proof of approximation in several dimensions. Given any continuous function
f on the square [0, 1] × [0, 1], let its Bernstein approximation polynomial of degree n be

(13.12)   B_n[f](x, y) := Σ_{k=0}^{n} Σ_{l=0}^{n} f(k/n, l/n) p_k(x) p_l(y)

For given accuracy ε > 0, we use the uniform continuity and boundedness of the function
f on the compact set [0, 1] × [0, 1]. In order to estimate the difference
|f(x, y) − f(k/n, l/n)|, we distinguish the three cases

near-zone:     |x − k/n| < δ and |y − l/n| < δ
far-zone in x: |x − k/n| ≥ δ
far-zone in y: |y − l/n| ≥ δ

In all three cases we get

(13.13)   |f(x, y) − f(k/n, l/n)| ≤ ε/2 + (2M/δ²)(x − k/n)² + (2M/δ²)(y − l/n)²

and get the estimate

|f(x, y) − B_n[f](x, y)| ≤ Σ_{k=0}^{n} Σ_{l=0}^{n} |f(x, y) − f(k/n, l/n)| p_k(x) p_l(y)
   ≤ Σ_{k=0}^{n} Σ_{l=0}^{n} [ ε/2 + (2M/δ²)(x − k/n)² + (2M/δ²)(y − l/n)² ] p_k(x) p_l(y)
   = ε/2 + (2M/δ²) · x(1 − x)/n + (2M/δ²) · y(1 − y)/n
   ≤ ε/2 + (4M/δ²) · 1/(4n)

In order to get the required bound, we need to choose the degree n of the approximating
polynomial high enough such that

n ≥ 2M/(δ² ε)

and get

|f(x, y) − B_n[f](x, y)| ≤ ε

uniformly for all (x, y) ∈ [0, 1] × [0, 1], as required.


Proof of approximation by trigonometric polynomials. Given any continuous
2π-periodic function f on [0, 2π], we define a function g(x, y) on the unit circle by
setting

g(cos t, sin t) = f(t)

for all t ∈ [0, 2π]. One can extend this function to a continuous function on a square,
for example by setting

g(r cos t, r sin t) = r f(t)

for all t ∈ [0, 2π] and r ≥ 0 such that (r cos t, r sin t) ∈ [−1, 1] × [−1, 1]. By the last
theorem, this function g can be uniformly approximated by a polynomial

(13.14)   P(x, y) = Σ_{k=0}^{n} Σ_{l=0}^{n} a_kl x^k y^l

To convert P into a trigonometric polynomial, use the complex variable z with

x = (z + z̄)/2 ,   y = (z − z̄)/(2i)

The binomial theorem can convert

(13.15)   P(x, y) = Σ_{k=0}^{n} Σ_{l=0}^{n} a_kl (z + z̄)^k (z − z̄)^l / (2^{k+l} i^l)

into a sum

(13.16)   P(x, y) = Σ_{p=0}^{n} Σ_{q=0}^{n} b_pq z^p z̄^q = Σ_{p=0}^{n} Σ_{q=0}^{n} b_pq z^{p−q} |z|^{2q}

Finally one goes back to polar coordinates by putting in z^s = r^s (cos st + i sin st):

(13.17)   P(x, y) = Σ_{p=0}^{n} Σ_{q=0}^{n} b_pq r^{p+q} [ cos(p − q)t + i sin(p − q)t ]

If P is real, then b_qp = b̄_pq, and hence with b_pq = c_pq + i d_pq and c_qp = c_pq,
d_qp = −d_pq we have a real trigonometric polynomial

(13.18)   P(r cos t, r sin t) = Σ_{p=0}^{n} Σ_{q=0}^{n} r^{p+q} [ c_pq cos(p − q)t − d_pq sin(p − q)t ]

One may now restrict to r = 1 to get a trigonometric polynomial which approximates
the given function f(t) = g(cos t, sin t) with the prescribed accuracy ε > 0.
10 Problem 13.2. Convert the two-variable polynomial

P(x, y) = 2xy + 32x²y³

into a trigonometric polynomial as in formula (13.18). Check your result with the
graphing calculator.

Answer. With x = (z + z̄)/2, y = (z − z̄)/(2i), one gets

P(x, y) = 2 · (z + z̄)/2 · (z − z̄)/(2i) + 32 · (z + z̄)² (z − z̄)³ / (4 · 8 i³)
        = (z² − z̄²)/(2i) + i (z² − z̄²)² (z − z̄)
        = (z² − z̄²)/(2i) + i (z⁴ − 2|z|⁴ + z̄⁴)(z − z̄)
        = (z² − z̄²)/(2i) + i (z⁵ − |z|² z³ − 2|z|⁴ z) − i (z̄⁵ − |z|² z̄³ − 2|z|⁴ z̄)
        = r² sin 2t + r⁵ (−2 sin 5t + 4 sin t + 2 sin 3t)
        = 4r⁵ sin t + r² sin 2t + 2r⁵ sin 3t − 2r⁵ sin 5t
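The conversion can be verified numerically: the polynomial and its trigonometric form must agree on every circle of radius r. A quick Python check:

```python
import math

def P(x, y):
    return 2 * x * y + 32 * x**2 * y**3

def trig(t, r):
    return (4 * r**5 * math.sin(t) + r**2 * math.sin(2 * t)
            + 2 * r**5 * math.sin(3 * t) - 2 * r**5 * math.sin(5 * t))

for i in range(360):
    t = math.radians(i)
    for r in (0.3, 1.0):
        assert abs(P(r * math.cos(t), r * math.sin(t)) - trig(t, r)) < 1e-12
print("polynomial and trigonometric forms agree on both circles")
```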

14 Lagrange Multipliers
10 Problem 14.1. Use Lagrange multipliers to get the extrema of f(x, y) = x⁵ + y⁶
under the constraint x² + y² = 1. How many extrema, how many local minima, how
many local maxima has the function f on the unit circle? Calculate the values at the
extremal points, too.
(I get a fourth order equation for x, which one can solve numerically on the graphing
calculator. The character of minima or maxima I have checked numerically, too.)

