
An Introduction to Nonlinear Waves

Erik Wahlén
Copyright © 2011 Erik Wahlén
Preface

The aim of these notes is to give an introduction to the mathematics of nonlinear waves.
The waves are modelled by partial differential equations (PDE), in particular hyperbolic or
dispersive equations. Some aspects of completely integrable systems and soliton theory are
also discussed. While the goal is to discuss the nonlinear theory, this cannot be achieved
without first discussing linear PDE. The prerequisites include a good background in real and
complex analysis, vector calculus and ordinary differential equations. Some basic knowledge
of Lebesgue integration and (the language of) functional analysis is also useful. There are two
appendices providing the necessary background in these areas. I have not assumed any prior
knowledge of PDE.
The notes have been used for a graduate course on nonlinear waves in Lund, 2011. I
want to thank my students Fatemeh Mohammadi, Karl-Mikael Perfekt and Johan Richter for
pointing out some errors. I also want to thank my colleague Mats Ehrnström for letting me use
parts of his lecture notes on ‘Waves and solitons’.

Erik Wahlén,
Lund, December 2011

Contents

Preface

1 Introduction
   1.1 Linear waves
      1.1.1 Sinusoidal waves
      1.1.2 The wave equation
      1.1.3 Dispersion
   1.2 Nonlinear waves
   1.3 Solitons

2 The Fourier transform
   2.1 The Schwartz space
   2.2 The Fourier transform on S
   2.3 Extension of the Fourier transform to L^p

3 Linear PDE with constant coefficients
   3.1 The heat equation
      3.1.1 Derivation
      3.1.2 Solution by Fourier's method
      3.1.3 The heat kernel
      3.1.4 Qualitative properties
   3.2 The wave equation
      3.2.1 Derivation
      3.2.2 Solution by Fourier's method
      3.2.3 Solution formulas
      3.2.4 Qualitative properties
   3.3 The Laplace equation
   3.4 General PDE and the classification of second order equations
   3.5 Well-posedness of the Cauchy problem
   3.6 Hyperbolic operators

4 First order quasilinear equations
   4.1 Some definitions
   4.2 The method of characteristics
   4.3 Conservation laws
      4.3.1 Discontinuous solutions
      4.3.2 The Riemann problem
      4.3.3 Entropy conditions

5 Distributions and Sobolev spaces
   5.1 Distributions
   5.2 Operations on distributions
   5.3 Tempered distributions
   5.4 Approximation with smooth functions
   5.5 Sobolev spaces

6 Dispersive equations
   6.1 Calculus in Banach spaces
   6.2 NLS
      6.2.1 Linear theory
      6.2.2 Nonlinear theory
   6.3 KdV
      6.3.1 Linear theory
      6.3.2 Nonlinear theory
      6.3.3 Derivation of the KdV equation
   6.4 Further reading

7 Solitons and complete integrability
   7.1 Miura's transformation, Gardner's extension and infinitely many conservation laws
   7.2 Soliton solutions
   7.3 Lax pairs
      7.3.1 The scattering problem
      7.3.2 The evolution equation
      7.3.3 The relation to Lax pairs
      7.3.4 A Lax-pair formulation for the KdV equation

A Functional analysis
   A.1 Banach spaces
   A.2 Hilbert spaces
   A.3 Metric spaces and Banach's fixed point theorem
   A.4 Completions

B Integration
   B.1 Lebesgue measure
   B.2 Lebesgue integral
   B.3 Convergence properties
   B.4 L^p spaces
   B.5 Convolutions and approximation with smooth functions
   B.6 Interpolation

Chapter 1

Introduction

You are probably familiar with many different kinds of wave phenomena such as surface
waves on the ocean, sound waves in the air or other media, and electromagnetic waves, of
which visible light is a special case. A common feature of these examples is that they all can
be described by partial differential equations (PDE). The purpose of these lecture notes is to
give an introduction to various kinds of PDE describing waves. In particular we will focus on
nonlinear equations. In this chapter we introduce some basic concepts and give an overview
of the contents of the lecture notes.

1.1 Linear waves


1.1.1 Sinusoidal waves
The first encounter with the mathematical theory of waves is usually with cosine (or sine)
waves of the form
u(x,t) = a cos(kx ± ωt).
Here x,t ∈ R denote space and time, respectively, and the parameters a, k and ω are positive
numbers. a is the amplitude of the wave, k the wave number and ω the angular frequency.
Note that u is periodic in both space and time with periods


λ = 2π/k    and    T = 2π/ω.
λ is usually called the wavelength and T simply the period. The wave has crests (local maxima) when kx ± ωt is an even multiple of π and troughs (local minima) when it is an odd multiple of π.
Rewriting u as
u(x,t) = a cos(k(x ± ct)),

Figure 1.1: A cosine wave.

where

c = ω/k,    (1.1)
we see that as t changes, the fixed shape

X ↦ cos(kX)

is simply being translated at constant speed c along the x-axis. The number c is called the wave
speed or phase speed. The wave moves to the left if the sign is positive and to the right if the
sign is negative. A wave of permanent shape moving at constant speed is called a travelling
wave or sometimes a steady wave.
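As a quick numerical illustration of these definitions (the values of a, k and c below are arbitrary choices, not taken from the text), the following sketch checks that the cosine wave at time t is the initial profile translated by ct, and that it is periodic with periods λ = 2π/k in space and T = 2π/ω in time:

```python
import numpy as np

# Illustrative parameters (not from the text).
a, k, c = 2.0, 3.0, 1.5
omega = c * k                       # from (1.1), omega = c k

u = lambda x, t: a * np.cos(k * (x - c * t))

x = np.linspace(-10.0, 10.0, 2001)
t = 0.8

# The profile at time t is the initial shape translated by ct:
assert np.allclose(u(x, t), u(x - c * t, 0.0))

# Periodicity in space and time with the stated periods:
wavelength, period = 2 * np.pi / k, 2 * np.pi / omega
assert np.allclose(u(x + wavelength, t), u(x, t))
assert np.allclose(u(x, t + period), u(x, t))
```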

1.1.2 The wave equation


Assume that the phenomenon being studied is linear so that more complicated wave forms can
be obtained by adding different cosine waves. In order to do this, we must be more specific
about the relation between the parameters c and k (the parameter ω is determined by c and k
through (1.1)). One often makes the assumption that c is independent of k, so that all cosine
waves move at the same speed. Note that the waves then satisfy the PDE

utt (x,t) = c2 uxx (x,t) (1.2)

(subscripts denote partial derivatives). This is the wave equation in one (spatial) dimension.
The assumption that one can add the waves together agrees with the linearity of the wave
equation; any linear combination of solutions of (1.2) is also a solution of (1.2).

A better way of deriving the wave equation is to start from physical principles. Here is an
example.

Example 1.1. What we perceive as sound is really a pressure wave in the air. Sound waves
are longitudinal waves, meaning that the oscillations take place in the same direction as the
wave is moving. The equations governing acoustic waves are the equations of gas dynamics.

These form a system of nonlinear PDE for the velocity u and the density ρ. Assuming that the
oscillations are small, one can derive a set of linearized equations:

ρ0 ut + c0² ∇ρ = 0,
ρt + ρ0 ∇ · u = 0,

where ρ0 is the density and c0 the speed of sound in still air. Here u(·,t) : Ω → R³ while
ρ(·,t) : Ω → R for some open set Ω ⊂ R³. If we consider a one-dimensional situation (e.g. a
thin pipe), the system simplifies to

ρ0 ut + c0² ρx = 0,
ρt + ρ0 ux = 0.

Differentiating both equations with respect to t and x, we obtain after some simplifications
that both u and ρ satisfy the wave equation:

utt = c0² uxx

and

ρtt = c0² ρxx.

Note that ρ describes the fluctuation of the density around ρ0 . In the linear approximation,
the pressure fluctuation p around the constant atmospheric pressure is simply proportional to
ρ. It follows that p also satisfies the wave equation. The nonlinear equations of gas dynamics
will be discussed in Chapter 4.
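One can verify the one-dimensional linearized system directly on a travelling wave. The sketch below takes a hypothetical smooth profile f(s) = e^{−s²} and illustrative values of ρ0 and c0, and checks at a sample point that u = f(x − c0 t), ρ = (ρ0/c0) f(x − c0 t) makes both residuals vanish:

```python
import numpy as np

# Illustrative constants and a hypothetical profile f(s) = exp(-s^2).
rho0, c0 = 1.2, 340.0
fp = lambda s: -2.0 * s * np.exp(-s**2)    # f'(s)

# For u = f(x - c0 t) and rho = (rho0/c0) f(x - c0 t), the chain rule gives
# the exact partial derivatives below at a sample point:
x, t = 0.3, 0.001
s = x - c0 * t
ut, ux = -c0 * fp(s), fp(s)
rhot, rhox = -rho0 * fp(s), (rho0 / c0) * fp(s)

assert abs(rho0 * ut + c0**2 * rhox) < 1e-9   # rho0 u_t + c0^2 rho_x = 0
assert abs(rhot + rho0 * ux) < 1e-12          # rho_t + rho0 u_x = 0
```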

We have assumed that the waves are even in x ± ct. To find the most general form of the
wave we also have to include terms of the form sin(k(x ± ct)). In order to make the notation
simpler it is convenient to use the complex exponential function. The original real-valued
function can be recovered by separating into real and imaginary parts. It is convenient also to
allow k to be negative. A general linear combination of sinusoidal waves can be written

u(x,t) = ∑_k (a_k e^{ik(x−ct)} + b_k e^{ik(x+ct)}),

which is often rewritten (with other coefficients) in the form

u(x,t) = ∑_k e^{ikx} (a_k cos(kct) + b_k sin(kct)).

Depending on the physical situation there might be boundary conditions which limit the range
of wave numbers k. We might e.g. only look for a wave in a bounded interval, say [0, π], which
vanishes at the end points. This forces us to look for solutions of the form

u(x,t) = ∑_{k=1}^∞ sin(kx) (a_k cos(kct) + b_k sin(kct)).

In these notes we will not discuss boundaries, and hence there is no restriction on k. The most
general linear combination is then not a sum, but an integral
u(x,t) = ∫_R e^{iξx} (a(ξ) cos(ξct) + b(ξ) sin(ξct)) dξ

(we have changed k to ξ to emphasize that it is a continuous variable). If you have studied
Fourier analysis, you will recognize this as a Fourier integral. Suppose that we know the
initial position and velocity of u, that is, u(x, 0) and ut (x, 0). Then a and b are determined by
u(x, 0) = ∫_R e^{iξx} a(ξ) dξ

and

ut(x, 0) = ∫_R e^{iξx} cξ b(ξ) dξ.
There are of course many question marks to be sorted out if one wants to make this into
rigorous mathematics. In particular, one has to know that the functions u(x, 0) and ut (x, 0)
can be written as Fourier integrals and that the functions a and b are unique. This question is
discussed in Chapters 2 and 6. In Chapter 3 we will discuss the application of Fourier methods
to solving PDE such as the wave equation.

There is also another way of solving the wave equation discovered by d’Alembert. The
idea is that the equation can be rewritten in the factorized form
(∂t − c∂x )(∂t + c∂x )u = 0.
Introducing the new variable v = ut + cux , we therefore obtain the two equations
ut + cux = v,
vt − cvx = 0.
The expression vt − cvx is the directional derivative of the function v in the direction (−c, 1).
The second equation therefore expresses that v is constant on the lines x + ct = const. In other
words, v is a function of x + ct, v(x,t) = f (x + ct). The first equation now becomes
ut + cux = f(x + ct).
The left hand side of this equation can be interpreted as the directional derivative of u in
the direction (c, 1). Evaluating u along the straight line γ(s) = (y + cs, s), where y ∈ R is a
parameter, we find that
(d/ds)(u ◦ γ)(s) = (v ◦ γ)(s),
and hence
(u ◦ γ)(s) = ∫ (v ◦ γ)(r) dr
           = ∫ f(y + 2cr) dr
           = F(y + 2cs) + G(y)

where F(s) = (1/(2c)) ∫_0^s f(r) dr and G(y) is an integration constant (∫ is used to denote the indefinite integral). Setting s = t, y = x − ct, we obtain that

u(x,t) = F(x + ct) + G(x − ct).

This can be interpreted as saying that the general solution is a sum of a wave moving to the left,
F(x + ct), and a wave moving to the right, G(x − ct). One usually derives this formula by
making the change of variables y = x − ct, z = x + ct, which results in the equation uyz = 0.
The approach used here is however useful to keep in mind when we discuss the method of
characteristics in Chapter 4 (see also Section 1.2 below).
The functions F and G can again be uniquely determined once we know the initial data
u(x, 0) = u0 (x) and ut (x, 0) = u1 (x). Indeed, we obtain the system of equations

F + G = u0,
F − G = (1/c) U1,

where U1′ = u1, which has the unique solution

F = (1/2) u0 + (1/(2c)) U1,
G = (1/2) u0 − (1/(2c)) U1.
Since U1(s) = ∫_0^s u1(r) dr + C, this results in d'Alembert's solution formula

u(x,t) = (1/2)(u0(x + ct) + u0(x − ct)) + (1/(2c)) ∫_{x−ct}^{x+ct} u1(s) ds.

Since we have derived a formula for the solution, it follows that it is unique under suitable
regularity assumptions.

Theorem 1.2. Assume that u0 ∈ C2 (R), u1 ∈ C1 (R) and c > 0. There exists a unique solution
u ∈ C2 (R2 ) of the initial-value problem

utt = c² uxx,
u(x, 0) = u0(x),
ut(x, 0) = u1(x).

The solution is given by d’Alembert’s formula


u(x,t) = (1/2)(u0(x + ct) + u0(x − ct)) + (1/(2c)) ∫_{x−ct}^{x+ct} u1(s) ds.
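A numerical spot check of the theorem, with hypothetical initial data u0(x) = e^{−x²}, u1(x) = −2x e^{−x²} and c = 2: the integral in d'Alembert's formula is approximated by the trapezoidal rule, and the wave equation is tested with centered differences at a sample point.

```python
import numpy as np

# Hypothetical initial data (any u0 in C^2, u1 in C^1 would do).
u0 = lambda x: np.exp(-x**2)
u1 = lambda x: -2.0 * x * np.exp(-x**2)
c = 2.0

def dalembert(x, t, n=4001):
    """Evaluate d'Alembert's formula; the integral of u1 is computed
    with the trapezoidal rule on n points."""
    s = np.linspace(x - c * t, x + c * t, n)
    y = u1(s)
    integral = np.sum((y[:-1] + y[1:]) * np.diff(s)) / 2.0
    return 0.5 * (u0(x + c * t) + u0(x - c * t)) + integral / (2 * c)

# The initial condition is matched exactly ...
assert abs(dalembert(0.3, 0.0) - u0(0.3)) < 1e-12
# ... and utt = c^2 uxx holds up to finite-difference error:
x0, t0, h = 0.3, 0.5, 1e-3
utt = (dalembert(x0, t0 + h) - 2 * dalembert(x0, t0) + dalembert(x0, t0 - h)) / h**2
uxx = (dalembert(x0 + h, t0) - 2 * dalembert(x0, t0) + dalembert(x0 - h, t0)) / h**2
assert abs(utt - c**2 * uxx) < 1e-3
```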

1.1.3 Dispersion
In many cases, the assumption that c is constant as a function of k is unrealistic. If c is a
non-constant function of k one says that the wave (or rather the equation) is dispersive. This
means that waves of different wavelengths travel with different speeds. An initially localized
wave will disintegrate into separate components according to wavelength and disperse. The
relation between ω and k is called the dispersion relation (although some authors use this term
for the relation between c and k).
Example 1.3. The equation
ih̄ ut + (h̄²/(2m)) uxx = 0
is the one-dimensional version of the famous free Schrödinger equation from quantum mechanics, describing a quantum system consisting of one particle. Here h̄ is the reduced Planck constant and m the mass of the particle. The function u, called the wave function, is complex-valued and its modulus squared, |u|², can be interpreted as a probability density. Assuming that
u is square-integrable and has been normalized so that
∫_R |u(x,t)|² dx = 1,

the probability of finding the particle in the interval [a, b] is given by


∫_a^b |u(x,t)|² dx.

After rescaling t we can rewrite the Schrödinger equation in the form

iut + uxx = 0.

Making the Ansatz u(x,t) = e^{i(kx−ω(k)t)}, we obtain the equation

ω(k) = k²

and therefore

c(k) = ω(k)/k = k.
Note that we allow c, k and ω to be negative now, so that waves can travel either to the left
(k < 0) or to the right (k > 0).
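The effect of dispersion can be seen by evolving a localized wave packet. The sketch below (all parameters are illustrative) propagates a Gaussian packet under iut + uxx = 0 by multiplying each Fourier mode e^{iξx} by the phase e^{−iξ²t}, in line with ω(ξ) = ξ², and checks that the centre of |u|² travels at 2k0, twice the phase speed k0 (this is the group speed, introduced in Section 1.1.3):

```python
import numpy as np

# Illustrative discretization and packet parameters.
N, L, k0, t = 1024, 200.0, 2.0, 5.0
x = (np.arange(N) - N // 2) * L / N
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

# Gaussian packet modulating the carrier e^{i k0 x}:
u0 = np.exp(-x**2 / 2) * np.exp(1j * k0 * x)

# Each Fourier mode evolves by the phase e^{-i xi^2 t}:
u = np.fft.ifft(np.exp(-1j * xi**2 * t) * np.fft.fft(u0))

# The centre of mass of |u|^2 has moved at speed 2 k0, not k0:
centre = np.sum(x * np.abs(u)**2) / np.sum(np.abs(u)**2)
assert abs(centre - 2 * k0 * t) < 1e-4
```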
Example 1.4. The linearized Korteweg-de Vries (KdV) equation

ut + uxxx = 0

appears as a model for small-amplitude water waves with long wavelength. The equation
is sometimes called Airy's equation, although that name is also used for the closely related ordinary differential equation y″ − xy = 0. Substituting u(x,t) = e^{i(kx−ω(k)t)} into the equation, we find that

ω(k) = −k³

and hence
c(k) = −k².
The fact that c is negative means that the waves travel to the left.

Since c(k) depends on k for dispersive waves, the wave speed only gives the speed of a
wave with a single wave number. A different speed can be found if one looks for a solution
which consists of a narrow band of wave numbers. For simplicity, we suppose that the equation
has cosine solutions cos(kx − ω(k)t) and consider the sum of two such waves with nearly the
same wave numbers k1 and k2 . Let ω1 and ω2 be the corresponding angular frequencies
and suppose that the waves have the same amplitude a. Using the trigonometric identity
cos(α) + cos(β) = 2 cos((α − β)/2) cos((α + β)/2), the sum can be written

2a cos((k1 − k2)x/2 − (ω1 − ω2)t/2) cos((k1 + k2)x/2 − (ω1 + ω2)t/2).
This is a product of two cosine waves, one with wave speed (ω1 + ω2 )/(k1 + k2 ) and the other
with speed (ω1 − ω2 )/(k1 − k2 ). Setting k1 = k0 + ∆k and k2 = k0 − ∆k, we obtain as ∆k → 0
the approximate form
2a cos(∆k(x − ω′(k0)t)) cos(k0(x − c(k0)t)).
We thus find an approximate solution in the form of a carrier wave, cos(k0(x − c(k0)t)), with speed c(k0), multiplied by an envelope, cos(∆k(x − ω′(k0)t)), with speed ω′(k0). This is an example of a wave packet. Notice that the carrier has wavelength 2π/k0 whereas the envelope has the much longer wavelength 2π/∆k. The speed
cg = dω/dk
of the envelope is called the group speed. Notice the similarity with the definition of the phase
speed

c = ω/k.
In a wave packet, the group speed measures the speed of the packet as a whole, whereas the
phase speed measures the speed of the individual components of the packet. See Figure 1.2
for an illustration. We will analyze this in more detail in Chapter 6.
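The two-wave computation above can be reproduced numerically. With the Schrödinger dispersion relation ω(k) = k² and illustrative values of k0 and ∆k, the sketch below checks the product (carrier times envelope) form against the sum of the two cosines, and confirms that the envelope speed (ω1 − ω2)/(k1 − k2) equals the group speed 2k0:

```python
import numpy as np

# Illustrative wave numbers; omega(k) = k^2 is the Schrodinger dispersion relation.
k0, dk = 5.0, 0.1
k1, k2 = k0 + dk, k0 - dk
w = lambda k: k**2
w1, w2 = w(k1), w(k2)
a, t = 1.0, 0.3

x = np.linspace(-20.0, 20.0, 4001)
u = a * np.cos(k1 * x - w1 * t) + a * np.cos(k2 * x - w2 * t)

# Product form from the trigonometric identity:
envelope = 2 * a * np.cos(((k1 - k2) * x - (w1 - w2) * t) / 2)
carrier = np.cos(((k1 + k2) * x - (w1 + w2) * t) / 2)
assert np.allclose(u, envelope * carrier)

# The envelope speed equals the group speed exactly here,
# since (w1 - w2)/(k1 - k2) = k1 + k2 = 2 k0:
assert np.isclose((w1 - w2) / (k1 - k2), 2 * k0)
```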
Example 1.5.
• For the wave equation, we have ω± (k) = ±ck, so the group speed and the phase speed
are equal.
• For the Schrödinger equation we have cg(k) = ω′(k) = 2k while c(k) = k.
• For the linearized KdV equation, we have cg(k) = ω′(k) = −3k² and c(k) = −k².
We remark that for each k ≠ 0, the wave equation has solutions travelling to the left and
solutions travelling to the right, whereas the linearized KdV equation only has solutions
travelling in one direction. We say that the wave equation is bidirectional while the linearized
KdV equation is unidirectional. The Schrödinger equation is somewhere in between; it allows
waves travelling in both directions, but the direction is determined by the sign of k.
Figure 1.2: A wave packet. The group speed, cg, is the speed of the packet as a whole, while the phase speed c is the speed of the individual crests and troughs.

1.2 Nonlinear waves


The (inviscid) Burgers equation
ut + uux = 0 (1.3)
is a simplified model for fluid flow. It is named after the Dutch physicist Jan Burgers who
studied the viscous version
ut + uux = εuxx , ε > 0,
(viscosity is the internal friction of a fluid) as a model for turbulent flow. Recalling our discus-
sion of the wave equation above, we can think of the Burgers equation as

ut + c(u)ux = 0,

with ‘speed’ c(u) = u. Again, the left hand side can be thought of as a directional derivative in
the direction (u, 1), which now depends on u. When studying the equation ut + cux = 0 with
constant c, we found that the solution was constant on straight lines in the direction (c, 1).
This suggests that the solution of Burgers’ equation should be constant on curves with tangent
(u(x,t), 1). Such curves are given by (x(t),t), where x(t) is a solution of the differential
equation
ẋ(t) = u(x(t),t).
This doesn’t seem to help much since we don’t know u. However, since u is constant on the
curve, the equation actually simplifies to

ẋ(t) = u0 (x0 ),

where x0 = x(0) and u0 (x) = u(x, 0). This means that the curves (x(t),t) actually are straight
lines,
(x(t),t) = (x0 + u0 (x0 )t,t),
and that we have
u(x0 + u0 (x0 )t,t) = u0 (x0 ). (1.4)
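The implicit construction can be carried out numerically. In the sketch below the profile u0(x) = e^{−x²} is a hypothetical choice; characteristics first cross at time T = −1/min u0′, and for t < T the map x0 ↦ x0 + u0(x0)t is strictly increasing, so the characteristic equation can be solved for x0 by bisection:

```python
import numpy as np

# Hypothetical initial profile (not from the text).
u0 = lambda x: np.exp(-x**2)
du0 = lambda x: -2.0 * x * np.exp(-x**2)

# Characteristics first cross at T = -1 / min u0', estimated on a grid:
xs = np.linspace(-5.0, 5.0, 100001)
T = -1.0 / du0(xs).min()

def u(x, t, lo=-10.0, hi=10.0):
    """Solve x0 + u0(x0) t = x for x0 by bisection and return u0(x0).
    Valid for 0 <= t < T, where x0 -> x0 + u0(x0) t is strictly increasing."""
    g = lambda x0: x0 + u0(x0) * t - x
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return u0(0.5 * (lo + hi))

t, x = 0.5 * T, 1.0
val = u(x, t)
x0 = x - val * t                         # foot of the characteristic through (x, t)
assert abs(x0 + u0(x0) * t - x) < 1e-8   # the implicit relation (1.4) holds
assert abs(u0(x0) - val) < 1e-8
```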

Figure 1.3: The evolution of a positive solution of the Burgers equation. ‘Particles’ located
higher up have greater speed. The solution steepens and eventually becomes multivalued. Of
course, it is then no longer a function and doesn’t satisfy equation (1.3).

This determines, at least formally, the solution in an implicit way. Note that this gives credence
to our interpretation of u as a speed: considering a ‘particle’ situated originally on the graph of
u at (x0 , u0 (x0 )), this particle will travel parallel to the x-axis with constant speed u0 (x0 ) (the
particle will travel backwards if u0 (x0 ) < 0). This suggests that (1.4) actually does define the
solution implicitly for some time. However, suppose that the function u0 has negative slope
somewhere, so that there exist points x1 < x2 with u0 (x1 ) > u0 (x2 ). We suppose for simplicity
that these values are positive. The point originally located at (x1 , u0 (x1 )) will then have greater
speed than the point located at (x2, u0(x2)), so that at some time T the solution will develop a
vertical tangent. Continuing the 'Lagrangian'¹ solution beyond this point in time, the function
u will become multivalued (see Figure 1.3). Clearly, u then ceases to be a solution of the
original equation (1.3). In Chapter 4 we will study a different way of continuing the solution
past the time T , namely as a shock wave. The solution then continues to be the graph of a
function, but now has a jump discontinuity. Clearly, this solution can’t satisfy (1.3) either, and
so we have to develop a different way of interpreting solutions.

The above example shows that nonlinear waves may develop singularities. It turns out that
dispersion can sometimes help, however. If we add to the Burgers equation the dispersion
term uxxx , we obtain the famous Korteweg-de Vries (KdV) equation

ut + 6uux + uxxx = 0 (1.5)

(the coefficient 6 is inessential). This equation is named after the Dutch mathematician
Diederik Korteweg and his student Gustav de Vries, who derived it as an equation for long
water waves in 1895. The equation had in fact been derived previously by Joseph Boussinesq
[3]. As we will show in Chapter 6, the solutions of the KdV equation remain smooth for all
times. Naively one can think of the nonlinear terms as trying to focus and steepen the wave.
The dispersive term on the other hand tends to spread the wave apart and therefore might be
able to counteract the nonlinearity.

¹ In the Lagrangian approach to fluid mechanics, the coordinates follow the fluid particles. In the Eulerian approach the coordinates are fixed while the fluid moves.

1.3 Solitons
The KdV equation was derived in an attempt to explain a phenomenon observed earlier by the
Scottish naval engineer John Scott Russell. In 1834, on the Union canal between Edinburgh
and Glasgow, Scott Russell made a remarkable discovery. This is his own account of the event
[22].

I was observing the motion of a boat which was rapidly drawn along a narrow
channel by a pair of horses, when the boat suddenly stopped—not so the mass
of water in the channel which it had put in motion; it accumulated round the
prow of the vessel in a state of violent agitation; then suddenly leaving it behind,
rolled forward with great velocity, assuming the form of a large solitary elevation,
a rounded, smooth and well defined heap of water, which continued its course
along the channel apparently without change of form or diminution of speed. I
followed it on horseback, and overtook it still rolling on at a rate of some eight
or nine miles an hour, preserving its original figure some thirty feet long and a foot to a foot and a half in height. Its height gradually diminished, and after a chase
of one or two miles I lost it in the windings of the channel. Such, in the month
of August, 1834, was my first chance interview with that singular and beautiful
phenomenon which I have called the Wave of Translation.

As far as we know this is the first reported observation of a solitary water wave. Scott Russell was to spend a great deal of time reproducing the phenomenon in laboratory experiments.
He built a small water channel, at one end of which he dropped weights, producing solitary
bump-like waves of the sort he had earlier observed. Figure 1.4 shows Scott Russell’s own
illustrations of his experiments.
At that time the predominant mathematical theory for water waves was linear. In particular,
the esteemed mathematicians and fluid dynamicists Sir George Gabriel Stokes and Sir George Biddell Airy (both with chairs at the University of Cambridge) did not accept Scott Russell's conclusions, believing that they contradicted the mathematical and physical laws of fluid motion. In
fact, if one considers only linear dispersive equations one finds that solitary travelling waves on
the whole line are impossible. For the linearized KdV equation, any wave is the superposition
of periodic travelling waves of different wave numbers which disperse with time.
To explain Scott Russell’s observations, we look for a travelling wave solution of the KdV
equation which decays to 0 at infinity. Such a solution is called a solitary wave.² We thus
make the Ansatz u(x,t) = U(x − ct). This yields the ODE

−cU′(X) + 6U(X)U′(X) + U‴(X) = 0,    (1.6)

where X = x − ct. Since we are looking for a solitary wave we also require that

lim_{X→±∞} U(X) = 0.    (1.7)

² A solitary wave could also converge to a non-zero constant at infinity.

Figure 1.4: John Scott Russell’s illustrations of his experiments from [22].

Figure 1.5: The phase portrait of (1.9) with c > 0. The thick curve represents the homoclinic orbit corresponding to the solitary wave. The filled circles represent the equilibria (0, 0) and (c/3, 0).

We may integrate equation (1.6) to

U″ = cU − 3U² + C1,    C1 ∈ R.

In view of (1.7), we set C1 = 0. Otherwise U″ would converge to a non-zero number, yielding a contradiction. We thus arrive at the second order ODE

U″ = cU − 3U².    (1.8)

This can be written as the first-order system

U′ = V,
V′ = cU − 3U²,    (1.9)

and we find that the function

H(U,V) = V² − cU² + 2U³

is constant on the orbits. We can thus find the phase portrait by drawing the level curves of H. Moreover, the orbit corresponding to a solitary wave must lie in the level set H⁻¹(0). This reveals that we must take c > 0 in order to find a solitary wave. The phase portrait in this case is shown in Figure 1.5. We find that there is a homoclinic orbit connecting the origin with itself. Since the ODE is autonomous, this orbit corresponds to a family of solutions X ↦ U(X − X0), X0 ∈ R, of (1.6)–(1.7). We also note from Figure 1.5 that U(X) > 0 for all X.
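The phase-plane analysis is easy to check numerically. The sketch below (with the illustrative choice c = 1) integrates the system (1.9) with a classical Runge-Kutta step, starting at the crest U(0) = c/2, V(0) = 0 of the homoclinic orbit, and verifies that H stays at 0 and that the orbit tracks the explicit solution (1.10) derived below:

```python
import numpy as np

c = 1.0                              # illustrative wave speed

H = lambda U, V: V**2 - c * U**2 + 2 * U**3

def rhs(y):
    U, V = y
    return np.array([V, c * U - 3 * U**2])

def rk4_step(y, h):
    k1 = rhs(y); k2 = rhs(y + h / 2 * k1)
    k3 = rhs(y + h / 2 * k2); k4 = rhs(y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Start at the crest of the homoclinic orbit: U(0) = c/2, V(0) = 0, H = 0.
y = np.array([c / 2, 0.0])
h, steps = 1e-3, 5000
for _ in range(steps):
    y = rk4_step(y, h)

# H is conserved along the numerical orbit (up to integration error) ...
assert abs(H(*y)) < 1e-8
# ... and the orbit tracks the explicit solution (1.10):
X = steps * h
assert abs(y[0] - c / 2 / np.cosh(np.sqrt(c) * X / 2)**2) < 1e-6
```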
We can in fact find an explicit formula for U(X). To make U unique, we assume (without loss of generality) that U(0) = c/2 and hence U′(0) = V(0) = 0 (from the equation H(U,V) = 0). Note that

X ↦ (U(−X), −V(−X))

also is a solution of (1.9) with the same initial data, so that by the Picard–Lindelöf theorem, U(X) = U(−X) (U is even). It therefore suffices to find the solution for X > 0. Also, note that U(X) ∈ (0, c/2] for all X. The equation H(U,U′) = 0 can be written

(U′)² = cU² − 2U³,

which yields

U′ = −U √(c − 2U)

for X ≥ 0. This is a separable ODE. For X > 0 we know that U(X) < c/2 and hence

−∫_{U(X0)}^{U(X)} dU/(U √(c − 2U)) = X − X0,    X0 > 0, X > 0.

By continuity this also holds in the limit X0 → 0, so that

−∫_{c/2}^{U(X)} dU/(U √(c − 2U)) = X,    X > 0.
To calculate this integral, we introduce the new variable θ defined by

U = c/(2 cosh²(θ)),    ∂U/∂θ = −c sinh(θ)/cosh³(θ),    θ > 0.

Note that θ = 0 corresponds to U = c/2 and θ = ∞ to U = 0. The identity cosh²(θ) − sinh²(θ) = 1 implies that

√(c − 2U) = √c √(1 − 1/cosh²(θ)) = √c √((cosh²(θ) − 1)/cosh²(θ)) = √c sinh(θ)/cosh(θ),

and

(1/(U √(c − 2U))) ∂U/∂θ = −(2 cosh²(θ)/c)(cosh(θ)/(√c sinh(θ)))(c sinh(θ)/cosh³(θ)) = −2/√c.
We therefore obtain that

(2/√c) θ(X) = ∫_0^{θ(X)} (2/√c) dθ = −∫_{c/2}^{U(X)} dU/(U √(c − 2U)) = X,

where

U(X) = c/(2 cosh²(θ(X))).
It follows that

U(X) = c/(2 cosh²(√c X/2)) = (c/2) sech²((√c/2) X),    X ≥ 0,    (1.10)

where sech = 1/cosh is the hyperbolic secant. Since U and sech are even, the formula also holds for X < 0. Going back to the original variables, we have proved the following result.

Figure 1.6: The great wave of translation. Three solitary-wave solutions of the KdV equation
of the form (1.10). Notice that fast waves are tall and narrow.

Theorem 1.6. For each c > 0 there is a family of solitary-wave solutions

u(x,t) = (c/2) sech²((√c/2)(x − ct − x0)),    x0 ∈ R,    (1.11)

of the KdV equation (1.5). For c < 0 there are no solitary wave solutions.
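A finite-difference spot check that the solitary wave indeed solves (1.5); the sample point, wave speed and step size below are arbitrary numerical choices:

```python
import numpy as np

def soliton(x, t, c=1.0, x0=0.0):
    """The solitary wave (1.11)."""
    return c / 2 / np.cosh(np.sqrt(c) / 2 * (x - c * t - x0))**2

# Sample point and step size are arbitrary.
x, t, h = 0.7, 0.3, 1e-2
u = soliton(x, t)
ut = (soliton(x, t + h) - soliton(x, t - h)) / (2 * h)
ux = (soliton(x + h, t) - soliton(x - h, t)) / (2 * h)
uxxx = (soliton(x + 2 * h, t) - 2 * soliton(x + h, t)
        + 2 * soliton(x - h, t) - soliton(x - 2 * h, t)) / (2 * h**3)

# Residual of ut + 6 u ux + uxxx = 0, up to O(h^2) finite-difference error:
assert abs(ut + 6 * u * ux + uxxx) < 1e-2
```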

Note that the amplitude is proportional to the wave speed c, so that higher waves travel faster. The width of the wave³, on the other hand, is inversely proportional to √c. In other words, the waves become narrower the taller they are (see Figure 1.6). Another interesting aspect is that the solitary waves move in the opposite direction of the travelling waves of the linearized KdV equation (cf. Example 1.4). In fact, if one perturbs a solitary wave, one will generally see a small 'dispersive tail' travelling in the opposite direction.

The KdV equation was largely forgotten about by the mathematical community until the
1960’s. In a numerical study of the initial-value problem

ut + uux + δ² uxxx = 0,    u(x, 0) = cos(πx),    (1.12)

with periodic boundary conditions and δ ≠ 0 but small, Martin Kruskal and Norman Zabusky discovered that the solution splits up into finitely many distinct waves resembling the sech²-wave, which interacted with each other almost as if the equation were linear. They called the waves solitons. We will investigate this in much more detail in Chapter 7. Figure 1.7 shows some pictures from Kruskal and Zabusky's original paper.
³ The width can e.g. be defined as the distance between two points where the elevation is half of the amplitude, u = c/4.
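Kruskal and Zabusky integrated (1.12) with a finite-difference scheme; as a rough modern substitute, the sketch below discretizes (1.12) with a Fourier pseudospectral method and an RK4 time step, using δ = 0.022 from [30]. The grid size, time step and final time are illustrative, and the assertions only test two invariants: the mean of u (preserved exactly by the scheme) and the mean of u² (conserved by the equation, so it should drift only slightly).

```python
import numpy as np

# Illustrative discretization of (1.12) on 0 <= x < 2 with periodic
# boundary conditions.
N, L, delta, dt, steps = 128, 2.0, 0.022, 1e-4, 1000
x = np.arange(N) * L / N
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)     # spectral wave numbers

def rhs(u):
    """u_t = -u u_x - delta^2 u_xxx, derivatives computed spectrally."""
    u_hat = np.fft.fft(u)
    ux = np.real(np.fft.ifft(1j * k * u_hat))
    uxxx = np.real(np.fft.ifft((1j * k)**3 * u_hat))
    return -u * ux - delta**2 * uxxx

u = np.cos(np.pi * x)
mass0, energy0 = u.mean(), (u**2).mean()
for _ in range(steps):                         # RK4 up to t = steps * dt
    k1 = rhs(u); k2 = rhs(u + dt / 2 * k1)
    k3 = rhs(u + dt / 2 * k2); k4 = rhs(u + dt * k3)
    u = u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# The mean of u is an invariant of the scheme; the mean of u^2 is
# conserved by KdV and drifts only through discretization error:
assert abs(u.mean() - mass0) < 1e-10
assert abs((u**2).mean() - energy0) < 1e-3
```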
Figure 1.7: The first reported solitons, from the pioneering paper [30] by Kruskal and Zabusky. The left plot depicts the temporal development of a solution of (1.12) with δ = 0.022. The wave first seems to break, but instead eventually separates into eight solitary waves whose paths through space-time can be followed in the picture to the right. We see that the velocities are constant except when the solitons interact (the paths are nearly straight lines interrupted by sudden shifts).
Chapter 2

The Fourier transform

2.1 The Schwartz space


We begin by studying the Fourier transform on a class of very nice functions. This means
that we don’t have to worry much about technical details. In order to define this class it is
practical to use the ‘multi-index’ notation due to Laurent Schwartz. Let N = {0, 1, 2, . . .}. If
α = (α1 , . . . , αd ) ∈ Nd , we define

∂_x^α = ∂_{x_1}^{α_1} ∂_{x_2}^{α_2} · · · ∂_{x_d}^{α_d}.

One can think of ∂_x as the d-tuple (∂_{x_1}, . . . , ∂_{x_d}). We similarly define x^α = x_1^{α_1} · · · x_d^{α_d} for x ∈ R^d
(or even C^d). The number |α| = α_1 + · · · + α_d is called the order of the multi-index and we
also write α! = α_1! · · · α_d!. All functions in this chapter are allowed to be complex-valued.
Definition 2.1 (Schwartz space). The Schwartz space S (Rd ) is the vector space of rapidly
decreasing smooth functions,
S(R^d) := { f ∈ C^∞(R^d) : sup_{x∈R^d} |x^α ∂_x^β f(x)| < ∞ for all α, β ∈ N^d }.

S (Rd ) contains the class of C∞ functions with compact support, C0∞ (Rd ). See Lemma
B.24 for a non-trivial example of functions belonging to this class. The main reason for
using S instead of C0∞ is that, as we will see, the former class is invariant under the Fourier
transform. This is not the case for C0∞ .
The topology on S (Rd ) is defined by the family of semi-norms (see Section A.1),

‖f‖_{α,β} := sup_{x∈R^d} |x^α ∂_x^β f(x)|,  α, β ∈ N^d.

In other words, we say that ϕ_n → ϕ in S(R^d) if

lim_{n→∞} ‖ϕ_n − ϕ‖_{α,β} = 0


for all α, β ∈ Nd . One can in fact show that


ρ(f, g) := ∑_{α,β} 2^{−(|α|+|β|)} \frac{‖f − g‖_{α,β}}{1 + ‖f − g‖_{α,β}},

defines a metric on S (Rd ). The topology defined by this metric is the same as the one defined
by the semi-norms and the metric space (S (Rd ), ρ) is complete. S (Rd ) is an example of
a generalization of Banach spaces called Fréchet spaces, but we will not use that in any substantial way. Notice, though, that the Schwartz space is not a Banach space, i.e. it cannot be
equipped with a norm making it complete (in particular, f 7→ ρ( f , 0) does not define a norm).

Proposition 2.2.
(1) If f ∈ S(R^d), then for all N ≥ 0 and α ∈ N^d there exists a C > 0 such that |∂_x^α f(x)| ≤ C(1+|x|)^{−N}, x ∈ R^d.
(2) S (Rd ) is closed under multiplication by polynomials.
(3) S (Rd ) is closed under differentiation.
(4) S (Rd ) is an algebra: f g ∈ S (Rd ) if f , g ∈ S (Rd ).
(5) C0∞ (Rd ) is a dense subspace of S (Rd ).
(6) S (Rd ) is a dense subspace of L p (Rd ) for 1 ≤ p < ∞.
Proof. The first property is basically a restatement of the definition of S (Rd ). Properties (2)
and (3) follow from the definition of the Schwartz space. Property (4) is a consequence of
Leibniz’ formula for derivatives of products.
It is clear that C0∞ (Rd ) ⊂ S (Rd ). To show that it is dense, let f ∈ S (Rd ) and define
fn (x) = ϕ(x/n) f (x), where ϕ ∈ C0∞ (Rd ) with ϕ(x) = 1 for |x| ≤ 1 (see Lemma B.24). Then

fn − f = (ϕ(x/n) − 1) f (x)

with support in Rd \ Bn (0). From property (1) it follows that | f (x)| ≤ CN (1 + |x|)−N for any
N ≥ 0. Therefore

k fn − f kL∞ (Rd ) ≤ CN (1 + kϕkL∞ (Rd ) )(1 + n)−N → 0

as n → ∞. Since N is arbitrary, the same is true if we multiply fn − f by a polynomial. Since


the partial derivatives of ϕ(x/n) − 1 all have support in Rd \ Bn (0) and are bounded uniformly
in n, the convergence remains true after differentiation. This proves (5).
By estimating

|f(x)| ≤ C(1 + |x|)^{−d−1}

and using the fact that

∫_{R^d} (1 + |x|)^{−p(d+1)} dx = C ∫_0^∞ \frac{r^{d−1}}{(1 + r)^{p(d+1)}} dr < ∞,

it follows that f ∈ L p (Rd ) for 1 ≤ p < ∞, if f ∈ S (Rd ). Since C0∞ (Rd ) is dense in L p (Rd )
(cf. Corollary B.26), property (6) now follows from property (5) (we only use the part that
C0∞ (Rd ) ⊂ S (Rd )).
Example 2.3. The Gaussian function e^{−|x|^2} is an example of a function in S(R^d) which is
not in C_0^∞(R^d).

2.2 The Fourier transform on S


Definition 2.4 (Fourier transform). The Fourier transform, F(f), of a function f ∈ S(R^d)
is defined by

F(f)(ξ) = \frac{1}{(2π)^{d/2}} ∫_{R^d} f(x) e^{−ix·ξ} dx.    (2.1)

We also use the notation fˆ = F ( f ).

It is clear from the inequality

\Big| ∫_{R^d} f(x) e^{−ix·ξ} dx \Big| ≤ ∫_{R^d} |f(x)| dx    (2.2)

and Proposition 2.2 (6) that fˆ(ξ ) is defined for each ξ ∈ Rd .

Lemma 2.5. Let f ∈ S (Rd ). Then fˆ is a bounded continuous function with

‖f̂‖_{L^∞(R^d)} ≤ (2π)^{−d/2} ‖f‖_{L^1(R^d)}.

Proof. The boundedness follows directly from the inequality (2.2). The continuity is a conse-
quence of the dominated convergence theorem (see Theorem B.15).

Proposition 2.6. For f ∈ S (Rd ), we have that fˆ ∈ C∞ (Rd ) and

f(x + y) ↦ e^{iy·ξ} f̂(ξ),
e^{iy·x} f(x) ↦ f̂(ξ − y),
f(Ax) ↦ |det A|^{−1} f̂(A^{−T} ξ),
f(λx) ↦ |λ|^{−d} f̂(λ^{−1} ξ),
∂_x^α f(x) ↦ (iξ)^α f̂(ξ),
x^α f(x) ↦ (i∂_ξ)^α f̂(ξ),

where ↦ denotes the action of the Fourier transform F, y ∈ R^d, λ ∈ R \ {0} and A is an invertible real d × d matrix.



Proof. The first four properties follow by changing variables in the integral in (2.1). The fifth
property follows by partial integration and the sixth by differentiation under the integral sign.
This also proves that fˆ ∈ C∞ (Rd ).
Example 2.7. Let us compute the Fourier transform of the Gaussian function e^{−|x|^2/2}, x ∈ R^d.
We begin by considering the case d = 1. The function u(x) = e^{−x^2/2} satisfies the differential
equation u′(x) = −xu(x). Taking Fourier transforms we therefore find that iξ û(ξ) = −iû′(ξ),
or equivalently û′(ξ) = −ξ û(ξ). But this means that

û(ξ) = C e^{−ξ^2/2},

where

C = û(0) = \frac{1}{\sqrt{2π}} ∫_R e^{−x^2/2} dx.

This integral can be computed using the standard trick

C^2 = \frac{1}{2π} ∬_{R^2} e^{−(x^2+y^2)/2} dx dy = ∫_0^∞ e^{−r^2/2} r dr = 1.

Since C > 0 it follows that C = 1. Hence,

F(e^{−x^2/2})(ξ) = e^{−ξ^2/2},

that is, u is invariant under the Fourier transform!


This can be generalized to Rd , since

e^{−|x|^2/2} = ∏_{j=1}^d e^{−x_j^2/2},  x = (x_1, . . . , x_d) ∈ R^d.

From this it follows that

F(e^{−|x|^2/2})(ξ) = ∏_{j=1}^d F(e^{−x_j^2/2})(ξ_j) = ∏_{j=1}^d e^{−ξ_j^2/2} = e^{−|ξ|^2/2}.

Note in particular that the Fourier transform of a Gaussian function is in S (Rd ). This is
no coincidence.
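The invariance of the Gaussian can also be checked numerically, by truncating the integral in (2.1) to a large interval and approximating it with a Riemann sum; the grid and cutoff below are ad hoc choices.

```python
import numpy as np

def fourier_transform(f, x, xi):
    """Riemann-sum approximation of (2.1) in dimension d = 1."""
    dx = x[1] - x[0]
    return np.sum(f * np.exp(-1j * x * xi)) * dx / np.sqrt(2 * np.pi)

x = np.arange(-20.0, 20.0, 0.01)       # truncated real line
f = np.exp(-x**2 / 2)
err = max(abs(fourier_transform(f, x, xi) - np.exp(-xi**2 / 2))
          for xi in [0.0, 0.5, 1.0, 2.0])
```

Because the Gaussian decays so fast, both the truncation and the quadrature error are negligible here.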

Theorem 2.8. For any f ∈ S (Rd ), F ( f ) ∈ S (Rd ), and F is a continuous transformation


of S (Rd ) to itself.

Proof. We estimate the S(R^d) semi-norms of f̂ as follows:

‖f̂‖_{α,β} = ‖ξ^α ∂_ξ^β f̂‖_{L^∞(R^d)}
= ‖F(∂_x^α(x^β f))‖_{L^∞(R^d)}
≤ (2π)^{−d/2} ‖∂_x^α(x^β f)‖_{L^1(R^d)}
= (2π)^{−d/2} ‖(1 + |x|)^{−(d+1)}(1 + |x|)^{d+1} ∂_x^α(x^β f)‖_{L^1(R^d)}
≤ C ‖(1 + |x|)^{d+1} ∂_x^α(x^β f)‖_{L^∞(R^d)}.

The last line can be estimated by a finite number of semi-norms of f using Leibniz’ formula.
It follows that F maps S (Rd ) to itself continuously.
Recall that the convolution of two functions f and g is defined by
Z
( f ∗ g)(x) := f (x − y)g(y) dy, (2.3)
Rd

whenever this makes sense (see Section B.5).


Theorem 2.9 (Convolution theorem). Let f, g ∈ S(R^d). Then \widehat{f ∗ g} = (2π)^{d/2} f̂ ĝ.
Proof. Using Fubini's theorem, we have that

(2π)^{d/2} F(f ∗ g)(ξ) = ∫_{R^d} \Big( ∫_{R^d} f(x − y) g(y) dy \Big) e^{−ix·ξ} dx
= ∫_{R^d} \Big( ∫_{R^d} f(x − y) e^{−ix·ξ} dx \Big) g(y) dy
= ∫_{R^d} \Big( ∫_{R^d} f(z) e^{−i(z+y)·ξ} dz \Big) g(y) dy
= \Big( ∫_{R^d} f(z) e^{−iz·ξ} dz \Big) \Big( ∫_{R^d} g(y) e^{−iy·ξ} dy \Big)
= (2π)^d f̂(ξ) ĝ(ξ).
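As a concrete check, one can take f = g to be the Gaussian of Example 2.7 and compare both sides of the theorem on a grid; the discrete convolution below is a stand-in for the integral in (2.3), and all grid parameters are arbitrary.

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 4001)           # symmetric grid with dx = 0.01
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)
conv = np.convolve(f, f, mode="same") * dx   # approximates (f * f)(x) on the grid

def fourier_transform(h, xi):
    return np.sum(h * np.exp(-1j * x * xi)) * dx / np.sqrt(2 * np.pi)

xi = 1.3
lhs = fourier_transform(conv, xi)
# Theorem 2.9 together with Example 2.7 predicts sqrt(2*pi) * e^{-xi^2/2} * e^{-xi^2/2}
rhs = np.sqrt(2 * np.pi) * np.exp(-xi**2)
```

With an odd number of symmetric grid points, `mode="same"` lines the discrete convolution up exactly with the grid x, so the comparison is clean.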

The inverse Fourier transform is defined by


F^{−1}(f)(x) = F(f)(−x) = \frac{1}{(2π)^{d/2}} ∫_{R^d} f(ξ) e^{ix·ξ} dξ.    (2.4)

It is clearly also continuous on S (Rd ). The name is explained by the following theorem.
Theorem 2.10 (Fourier’s inversion formula). Let f ∈ S (Rd ). Then f (x) = F −1 ( fˆ)(x).
Proof. Let K(x) = (2π)^{−d/2} e^{−|x|^2/2} and recall from Example 2.7 that K̂ = K. In particular,
this implies that ∫_{R^d} K(x) dx = (2π)^{d/2} K̂(0) = 1. By assumption, the iterated integral

\frac{1}{(2π)^{d/2}} ∫_{R^d} \Big( ∫_{R^d} f(y) e^{−iy·ξ} dy \Big) e^{ix·ξ} dξ

exists for all x ∈ R^d. Squeezing a Gaussian in makes it absolutely convergent, and

f_ε(x) := ∫_{R^d} f̂(ξ) e^{ix·ξ} K(εξ) dξ → \frac{1}{(2π)^{d/2}} ∫_{R^d} f̂(ξ) e^{ix·ξ} dξ  as ε → 0,    (2.5)

by the dominated convergence theorem. Changing order of integration, we also see that

f_ε(x) = \frac{1}{(2π)^{d/2}} ∬_{R^d×R^d} f(y) e^{i(x−y)·ξ} K(εξ) dy dξ
= ∫_{R^d} f(y) \Big( \frac{1}{(2π)^{d/2}} ∫_{R^d} e^{−i(y−x)·ξ} K(εξ) dξ \Big) dy
= ∫_{R^d} f(y) F(K(ε ·))(y − x) dy
= ε^{−d} ∫_{R^d} f(y) K(ε^{−1}(y − x)) dy
= (K_ε ∗ f)(x),

where K_ε(x) = ε^{−d} K(ε^{−1} x). Theorem B.27 implies that

lim_{ε→0} f_ε(x) = f(x).    (2.6)

The result follows by combining (2.5) and (2.6).

Corollary 2.11. The Fourier transform is a linear bijection of S (Rd ) to itself with inverse
equal to F −1 . That is,
F −1 F = F F −1 = Id on S (Rd ).

Proof. The above results imply that F −1 F = Id on S (Rd ). To prove the corollary, we
must also show that F F −1 = Id. Define the reflection operator R : S (Rd ) → S (Rd ) by
R( f )(x) = f (−x). R commutes with F and F −1 (Proposition 2.6), is its own inverse and
satisfies F −1 = RF . It follows that

F F −1 = F RF = F −1 F = Id .

Corollary 2.12. Let f , g ∈ S (Rd ). Then f ∗ g ∈ S (Rd ).

Proof. This follows from the convolution theorem, Fourier’s inversion formula and the fact
that S (Rd ) is an algebra, since f ∗ g = (2π)d/2 F −1 ( fˆĝ).

2.3 Extension of the Fourier transform to L p


Since S (Rd ) is dense in L p (Rd ) for 1 ≤ p < ∞ (Proposition 2.2), it is natural to try to extend
the Fourier transform to L p by continuity. We have the following abstract result.
Proposition 2.13. Suppose that X is a normed vector space, E ⊂ X is a dense linear subspace,
and Y is a Banach space. If T : E → Y is a bounded linear operator in the sense that there
exists C > 0 such that
kT xkY ≤ CkxkX
for each x ∈ E, then there is a unique bounded extension T̃ : X → Y of T .
Proof. Let S be another bounded extension. If x ∈ X and {x_n} is a sequence in E which
converges to x, we find that

lim_{n→∞} (T̃ − S)x_n = (T̃ − S)x,
while at the same time
(T̃ − S)xn = (T − T )xn = 0.
It follows that T̃ x = Sx.
To prove the existence of T̃ , let x and {xn } be as above. Then {xn } is a Cauchy sequence in
X. Since T is bounded it follows that {T xn } is a Cauchy sequence in Y . By completeness of Y
there exists a limit, which we define to be T̃ x. The limit is clearly independent of the sequence
xn since T applied to the difference of two such sequences will converge to 0. Linearity and
boundedness follow by a limiting procedure.
We begin by extending the Fourier transform to L1 . This extension can in fact be defined
without using the above proposition. In order to simplify the notation, we also write fˆ = F ( f )
for the extension. Let Cb (Rd ) be the space of bounded continuous functions on Rd , equipped
with the supremum norm.
Theorem 2.14 (Riemann-Lebesgue lemma). F : S (Rd ) → S (Rd ) extends uniquely to a
bounded linear operator L1 (Rd ) → Cb (Rd ), defined as the absolutely convergent integral
(2.1). Moreover,
k fˆkL∞ (Rd ) ≤ (2π)−d/2 k f kL1 (Rd ) (2.7)
and lim_{|x|→∞} f̂(x) = 0.
Proof. Repeating the proof of Lemma 2.5, we find that the extension exists and defines a
bounded operator satisfying the estimate (2.7). Proposition 2.13 shows that it is unique.
To prove that fˆ(x) → 0 as x → ∞, we take ε > 0 and g ∈ S (Rd ) with k fˆ − ĝkL∞ (Rd ) ≤
(2π)−d/2 k f − gkL1 (Rd ) < ε/2. Choosing R large enough, we find that |ĝ(x)| < ε/2 for |x| ≥ R.
Hence,
| fˆ(x)| ≤ k fˆ − ĝkL∞ (Rd ) + |ĝ(x)| < ε
for |x| ≥ R.
In order to extend the Fourier transform to L2 (Rd ), we need an estimate similar to (2.7).
In fact, we can do even better.

Lemma 2.15. For f, g ∈ S(R^d), we have that

(F(f), g)_{L^2(R^d)} = (f, F^{−1}(g))_{L^2(R^d)},

where

(f, g)_{L^2(R^d)} = ∫_{R^d} f(x) \overline{g(x)} dx

is the inner product on L^2(R^d).

Proof. Just as in the proof of Theorem 2.9, we use Fubini's theorem to change the order of
integration:

∫_{R^d} F(f)(x) \overline{g(x)} dx = \frac{1}{(2π)^{d/2}} ∫_{R^d} \Big( ∫_{R^d} f(y) e^{−ix·y} dy \Big) \overline{g(x)} dx
= ∫_{R^d} f(y) \overline{\Big( \frac{1}{(2π)^{d/2}} ∫_{R^d} g(x) e^{ix·y} dx \Big)} dy
= ∫_{R^d} f(y) \overline{F^{−1}(g)(y)} dy.

In particular, if f , g ∈ S (Rd ), one finds that


(F ( f ), F (g))L2 (Rd ) = ( f , F −1 F (g))L2 (Rd ) = ( f , g)L2 (Rd ) (2.8)
by Fourier’s inversion theorem. Taking g = f , we find that
kF ( f )k2L2 (Rd ) = k f k2L2 (Rd ) = kF −1 ( f )k2L2 (Rd ) , (2.9)

where the last equality follows from the first by replacing f with F −1 ( f ). This is sometimes
called Parseval’s formula. A bijective linear operator which preserves the inner product is
called a unitary operator. Note that a unitary operator automatically is bounded.
Theorem 2.16 (Plancherel’s theorem). The Fourier transform F and the inverse Fourier
transform F −1 on S (Rd ) extend uniquely to unitary operators on L2 (Rd ) satisfying F −1 F =
F F −1 = Id.
Proof. The existence and uniqueness of the extensions follow from Proposition 2.13 and (2.9).
Since F F −1 = F −1 F = Id on a dense subspace, it follows that the same is true on L2 (Rd ).
By continuity, we have that (F ( f ), g)L2 (Rd ) = ( f , F −1 (g))L2 (Rd ) for f , g ∈ L2 (Rd ), so F and
F −1 are unitary.
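The discrete counterpart of Plancherel's theorem is that the FFT with the unitary normalization preserves the Euclidean norm and is inverted exactly by the inverse FFT; NumPy exposes this via the `norm="ortho"` option.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(256) + 1j * rng.standard_normal(256)
fhat = np.fft.fft(f, norm="ortho")       # unitary discrete Fourier transform
finv = np.fft.ifft(fhat, norm="ortho")   # its inverse, also unitary
```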

Since the Fourier transform is defined for L1 (Rd ) and L2 (Rd ), one can in fact extend it to
L p (Rd ), 1 ≤ p ≤ 2, by interpolation. The following result is a direct consequence of Theorems
2.14, 2.16 and B.28.
Theorem 2.17 (Hausdorff-Young inequality). Let p ∈ [1, 2] and q ∈ [2, ∞] with 1/p + 1/q = 1. The
Fourier transform extends uniquely to a bounded linear operator F : L^p(R^d) → L^q(R^d), with

‖F(f)‖_{L^q(R^d)} ≤ (2π)^{\frac{d}{2}(1 − \frac{2}{p})} ‖f‖_{L^p(R^d)},  f ∈ L^p(R^d).

The extension to L p (Rd ) for p > 2 is a completely different matter. We will return to this
in Chapter 5.
Note that many of the properties of the Fourier transform on S (Rd ) automatically hold on
L p (Rd ),
1 ≤ p ≤ 2, by continuity. This is e.g. true for the four first properties in Proposition
2.6. One also obtains the following extension of Fourier’s inversion formula.

Proposition 2.18. Let f ∈ L p (Rd ) for some p ∈ [1, 2] and fˆ ∈ L1 (Rd ). Then f is continuous1
and f (x) = F −1 ( fˆ)(x) for all x ∈ Rd .

Note that we have defined the extension in two different ways if e.g. f ∈ L1 (Rd ) ∩ L2 (Rd ).
However, approximating f by a sequence { fn } in C0∞ (Rd ) with fn → f in L1 and L2 , we find
that fˆn → FL1 ( f ) in Cb and fˆn → FL2 ( f ) in L2 , where the subscripts on the Fourier transforms
are used to differentiate the two extensions. But this implies that FL2 ( f ) = FL1 ( f ) a.e.

Example 2.19. Let

f(x) = 1 for x ∈ [−1, 1],  f(x) = 0 for x ∉ [−1, 1],

be the characteristic function of the interval [−1, 1]. f ∈ L^1(R), so

f̂(ξ) = \frac{1}{\sqrt{2π}} ∫_R f(x) e^{−ixξ} dx
= \frac{1}{\sqrt{2π}} ∫_{−1}^{1} e^{−ixξ} dx
= \frac{1}{\sqrt{2π}} \Big[ \frac{e^{−ixξ}}{−iξ} \Big]_{x=−1}^{1}
= \frac{1}{\sqrt{2π}} \frac{e^{iξ} − e^{−iξ}}{iξ}
= \sqrt{\frac{2}{π}} \frac{\sin ξ}{ξ}.

f̂ is not in L^1(R), so we can't use the pointwise inversion formula in Proposition 2.18. We do
however have f = F^{−1}(f̂) in the sense of Theorem 2.16 since f ∈ L^2(R). Moreover, we have

‖f‖²_{L²(R)} = ‖f̂‖²_{L²(R)},

by Parseval's formula. When written out, this means that

∫_R \frac{\sin^2 ξ}{ξ^2} dξ = π.

1 Meaning that there is an element of the equivalence class which is continuous.
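The last identity is easy to test numerically. The midpoint-rule sketch below truncates the integral at ±A, which costs roughly 1/A since sin²ξ/ξ² has average tail size 1/(2ξ²); the grid parameters are arbitrary.

```python
import numpy as np

A, dx = 4000.0, 0.01
xi = np.arange(-A, A, dx) + dx / 2      # midpoint grid on [-A, A]
integrand = np.sinc(xi / np.pi) ** 2    # np.sinc(t) = sin(pi t)/(pi t), so this is (sin(xi)/xi)^2
val = np.sum(integrand) * dx            # approximates the integral, close to pi
```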


Chapter 3

Linear PDE with constant coefficients

In this chapter we will study the simplest kind of PDE, namely linear PDE with constant coef-
ficients. This means in particular that the superposition principle holds, so that new solutions
can be constructed by forming linear combinations of known solutions. There are two main
reasons for why this chapter is included in the lecture notes. The first reason is that some
important equations describing waves are linear. The second is that many nonlinear wave
equations can be treated as perturbations of linear ones. A knowledge of linear equations is
therefore essential for understanding nonlinear problems.
We will concentrate on the Cauchy problem (the initial-value problem), in which, in ad-
dition to the PDE, the values of the unknown function and sufficiently many time derivatives
are prescribed at t = 0. This presupposes that there is some distinguished variable t which can
be interpreted as time. Usually this is clear from the physical background of the equation in
question. If you are familiar with the Cauchy problem for ordinary differential equations you
know that there always exists a unique solution under sufficient regularity hypotheses. We
will look for conditions under which the same property is true for PDE. Our goal is in fact to
convert the PDE to ODE by using the Fourier transform with respect to the spatial variables
x. From now on, Fourier transforms will only be applied with respect to the spatial variables.
We will consider functions u(x,t) defined for all x ∈ Rd . This is of course a simplification;
in the ‘physical world’, u(x,t) will only be defined for x in some bounded domain and there
will be additional boundary conditions imposed on the boundary of this domain. Our study
will not be completely restricted to equations describing waves, but in the end we will arrive
at a condition which guarantees that the solutions behave like waves in the sense that there
is a well-defined speed of propagation. Equations having this property are called hyperbolic.
Not all equations which describe waves are hyperbolic. In particular, dispersive equations
such as the Schrödinger equation and the linearized KdV equation are not. Nevertheless, we
will prove that the Cauchy problem is uniquely solvable for these equations as well. Further
properties of dispersive equations will be studied in Chapter 6.
We begin by considering three classical second order equations, namely the heat equation,
the wave equation and the Laplace equation. These examples are instructive in that they illus-
trate that the properties of the solutions of PDE depend heavily on the form of the equation.
They also give us some intuition of what to expect in general.


The presentation of the material in this chapter is close to that in Rauch [20]. The treatment
of Kirchhoff’s solution formula for the wave equation in R3 follows Stein and Shakarchi [24].
A more detailed account of the theory of linear PDE with constant coefficients can be found
in Hörmander [13].

3.1 The heat equation


3.1.1 Derivation
The heat equation in R3 is the equation

ut = k∆u, (3.1)

where ∆u = ∂_{x_1}^2 u + ∂_{x_2}^2 u + ∂_{x_3}^2 u and k > 0 is a constant. As the name suggests, the equation
describes the flow of heat. Recall that heat is a form of energy transferred between two bodies,
or regions of space, at different temperatures. Heat transfer to a region increases or decreases
the internal energy in that region. The internal energy of a system consists of all the different
forms of energy of the molecules in the system, but excludes the energy associated with the
movement of the system as a whole. We look at this from a macroscopic perspective and
suppose that the internal energy can be described by a density e(x,t). If E(t) is the total
internal energy in the domain Ω ⊂ R^3 we therefore have that

E(t) = ∫_Ω e(x,t) dx,

and the rate of change is

E′(t) = ∫_Ω e_t(x,t) dx.

The heat flux J is a vector-valued function which describes the flow of heat. The total transfer
of heat across an oriented surface Σ is given by

∫_Σ J(x,t) · n(x) dS(x),

where n denotes the unit normal and dS the element of surface area on Σ. The rate at which
heat leaves Ω is therefore ∫_{∂Ω} J · n dS, where n is the outer unit normal. Using the divergence
theorem, this can be rewritten as ∫_Ω ∇ · J dx. Assuming that the only change in internal energy
is due to conduction of heat across the boundary, we find that

∫_Ω (e_t + ∇ · J) dx = 0.

Since this holds for all sufficiently nice domains Ω we obtain the continuity equation

et + ∇ · J = 0

if all the involved functions are continuous. To derive the heat equation we use two constitutive
laws. First of all, we assume that over a sufficiently small range of temperatures, e depends
linearly on the temperature T , that is,

e = e0 + ρc(T − T0 ),

where ρ is the density, c is the specific heat capacity and T0 a reference temperature. We
assume that ρ and c are independent of T . We can simply think of the temperature as what
we measure with a thermometer. On a microscopic level, the temperature is a measure of
the average kinetic energy of the molecules. The above assumption means that an increase in
internal energy results in an increase in temperature. This is not always true, as an increase
in internal energy may instead result in an increase in potential energy. This is e.g. what
happens in phase transitions. The second constitutive law is Fourier’s law, which says that
heat flows from hot to cold regions proportionally to the temperature gradient. In other words,
J = −κ∇T , where κ is a scalar quantity called the thermal conductivity. Assuming that c and
ρ are independent of t, we therefore obtain the equation

cρTt = ∇ · (κ∇T ).

If in addition c, ρ and κ are constant, we find that the temperature satisfies the heat equation
with k = κ/(cρ).
The heat equation is sometimes also called the diffusion equation. The physical interpreta-
tion is then that u describes the concentration of a substance in some medium (e.g. a chemical
substance in some liquid). Diffusion means that the substance tends to move from regions
of higher concentration to regions of lower concentration. The heat equation is obtained by
assuming that the rate of motion is proportional to the concentration gradient (Fick’s law of
diffusion).

3.1.2 Solution by Fourier’s method


The Cauchy problem for the heat equation in R^d reads

u_t(x,t) = ∆u(x,t),  t > 0,
u(x, 0) = u_0(x),    (3.2)

where u : Rd × [0, ∞) → C and u0 : Rd → C. Note that one can assume that k = 1 in (3.1)
by using the transformation t 7→ kt. It is useful to first consider a solution in the form of an
individual Fourier mode û(t)e^{iξ·x}. Substituting this in the equation yields

û′(t) = −|ξ|^2 û(t).

This is an ODE with solution û(t) = e^{−|ξ|^2 t} û(0). The corresponding solution is therefore

u(x,t) = û(0) e^{−|ξ|^2 t} e^{ix·ξ} = û(0) e^{i(x·ξ + i|ξ|^2 t)}.    (3.3)

This is written in the form of a complex sinusoidal wave aei(x·ξ −ω(ξ )t) considered in Chapter
1, except that the angular frequency ω(ξ ) = −i|ξ |2 now is imaginary. This implies that the ab-
solute value of the solution is strictly decreasing in t. Since there are no temporal oscillations,
we don’t normally think of the heat equation as an equation describing waves.
Our strategy for solving the Cauchy problem in general is to find a solution which is a
‘continuous linear combination’ of solutions of the form (3.3), that is, a Fourier integral in
x. We work formally and assume that u and u0 are sufficiently regular, so that their Fourier
transforms are well-defined. We also assume that we can differentiate under the integral sign,
so that ∂t F (u) = F (∂t u). We then obtain the transformed problem
û_t(ξ,t) = −|ξ|^2 û(ξ,t),  t > 0,
û(ξ, 0) = û_0(ξ),

with solution û(ξ,t) = e^{−t|ξ|^2} û_0(ξ). Applying the inverse Fourier transform, we obtain that
u(x,t) = \frac{1}{(2π)^{d/2}} ∫_{R^d} e^{ix·ξ} e^{−t|ξ|^2} û_0(ξ) dξ,  t ≥ 0.    (3.4)

The integral will in general not converge for t < 0 due to the exponentially growing factor
e^{−t|ξ|^2}. Under the assumption that u_0 ∈ S(R^d), one can differentiate under the integral sign as
many times as one likes for t > 0 by Theorem B.15. This is still true if we only allow one-sided
time derivatives at t = 0. It follows that u defined by this formula belongs to C^∞(R^d × [0, ∞))^1.
Carrying out the differentiation one also verifies that u satisfies the heat equation and the
initial condition. Since we have derived an explicit formula, it follows that the solution is
unique whenever the steps involved in deriving the formula can be justified. This holds e.g.
if u(·,t), ut (·,t) ∈ S (Rd ) for each t ≥ 0 and û(·,t) and ût (·,t) can be dominated by some
L1 functions which are independent of t, so that Theorem B.15 applies. There is however a
more natural condition which guarantees that the steps can be justified. The idea is to take the
analogy with ordinary differential equations one step further and think of the solution u as a
function of t with ’values’ in the function space S (Rd ).
Definition 3.1. Let I be an open interval in R.
(1) A function u : I → S (Rd ) is said to be continuous if ku(t + h) − u(t)kα,β → 0 as h →
0 for every semi-norm k · kα,β and every t ∈ I. C(I; S (Rd )) is the vector space of
continuous functions I → S (Rd ).
(2) A function u : I → S(R^d) is said to be differentiable with derivative u′ : I → S(R^d) if

lim_{h→0} \Big\| \frac{u(t + h) − u(t)}{h} − u′(t) \Big\|_{α,β} = 0

for all α, β ∈ N^d and t ∈ I. The space C^1(I; S(R^d)) consists of differentiable functions
u : I → S(R^d) such that u, u′ ∈ C(I; S(R^d)).
u : I → S (Rd ) such that u, u0 ∈ C(I; S (Rd )).
1 Meaning that at t = 0 derivatives with respect to t are only taken from the right.

(3) The space C^k(I; S(R^d)), k ≥ 2, is defined inductively by requiring that u ∈ C(I; S(R^d))
and u′ ∈ C^{k−1}(I; S(R^d)), and C^∞(I; S(R^d)) = ∩_{k=0}^∞ C^k(I; S(R^d)).

If I is any interval (not necessarily open), we say that u ∈ Ck (I; S (Rd )) if it is k times continu-
ously differentiable from the interior of I to S (Rd ) and all the derivatives extend continuously
in S (Rd ) to the endpoints.

A function u ∈ C(I; S (Rd )) also defines a function v : Rd × I → C, v(x,t) = u(t)(x). The


definition implies that v ∈ C(Rd × I) and that v(·,t) ∈ S (Rd ) for each t ∈ I. When there is no
risk of confusion we shall write u for v. Also, to clarify the notation, we will write ut or ∂t u for
u0 . If u ∈ C1 (I; S (Rd )) it follows in particular that t 7→ ∂xα u(x,t) is continuously differentiable
on I for each α ∈ Nd and each fixed x ∈ Rd .

Lemma 3.2. Assume that u ∈ Ck (I; S (Rd )). Then û ∈ Ck (I; S (Rd )) with ∂tk F (u) = F (∂tk u).

Proof. The result follows immediately from Theorem 2.8.


We therefore obtain uniqueness in the class C^∞([0, ∞); S(R^d)) for instance. The fact that
the solution u given by (3.4) belongs to this class follows immediately from the fact that
t ↦ û_0(ξ)e^{−t|ξ|^2} does so and Lemma 3.2.

There is a final important property to discuss. In real life it is impossible to measure some-
thing with infinite accuracy. It is therefore essential that the solution depends continuously on
the data in some sense, so that small measuring errors don’t grow large (at least in finite time).
In the case of the heat equation, we e.g. have that the map

S (Rd ) 3 u0 7→ u ∈ C∞ ([0, ∞); S (Rd ))

is continuous if the space C∞ ([0, ∞); S (Rd )) is equipped with the countable family of semi-
norms

sup_{0≤t≤n} ‖∂_t^k u(·,t)‖_{α,β},  n, k ∈ N, α, β ∈ N^d.

Note that t ↦ ‖∂_t^k u(·,t)‖_{α,β} is a continuous function so that the supremum is finite. The
continuity of the map u_0 ↦ u follows from the formula û(ξ,t) = e^{−t|ξ|^2} û_0(ξ).

In summary, we have proved the following result.

Theorem 3.3. There exists a unique solution u ∈ C∞ ([0, ∞); S (Rd )) of the Cauchy problem
(3.2). Moreover, the solution map S (Rd ) 3 u0 7→ u ∈ C∞ ([0, ∞); S (Rd )) is continuous.

A convenient way of describing the solution is by looking at the map S(t) : S (Rd ) →
S (Rd ) from the data u0 to the solution u(·,t) at time t ≥ 0. S is called the propagator for
the heat equation. The continuous dependence of the solution on the initial data means that
S(·) ∈ C(S (Rd );C∞ ([0, ∞); S (Rd ))). The map S has two important properties. First of all,

S(0) = Id, (3.5)



where Id is the identity map on S (Rd ). Secondly, note that the heat equation is invariant under
translations in t. This means that the solution with initial data u0 at t = t0 is given by S(t −t0 )u0 ,
t ≥ t_0. Consider the solution u with initial data u_0 at time 0. If s,t ≥ 0 we can express u(·, s + t)
in two different ways. We can of course express it as S(s + t)u_0. On the other hand, we can
solve the equation up to time t, which gives us S(t)u_0. We can then use this as our new initial
data and solve for a further time s, obtaining S(s)S(t)u_0. In other words,

S(s + t) = S(s)S(t),  s,t ≥ 0.    (3.6)

Definition 3.4. Let X be a vector space. A family of linear operators {S(t)}t≥0 on X with the
properties (3.5) and (3.6) is called a semigroup. If the family is defined for all t ∈ R and (3.6)
holds for all s,t ∈ R it is called a group.
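On the Fourier side the semigroup property is transparent: by the solution formula, S(t) acts as multiplication by e^{−t|ξ|²}, so (3.6) is just the law of exponents. A discrete sketch (all grid sizes are arbitrary):

```python
import numpy as np

def S(t, u0hat, k):
    """Heat propagator on the Fourier side: multiplication by exp(-t |k|^2)."""
    return np.exp(-t * k**2) * u0hat

N, L = 256, 20.0
x = L * np.arange(N) / N - L / 2
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
u0hat = np.fft.fft(np.exp(-x**2))
a = S(0.2, S(0.3, u0hat, k), k)   # S(0.2) S(0.3) u0
b = S(0.5, u0hat, k)              # S(0.2 + 0.3) u0
```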

3.1.3 The heat kernel


Define the heat kernel

K_t(x) := \frac{1}{(4πt)^{d/2}} e^{−\frac{|x|^2}{4t}},  t > 0.
Using the convolution theorem and the fact that

F^{−1}(e^{−t|·|^2})(x) = \frac{1}{(2t)^{d/2}} e^{−\frac{|x|^2}{4t}} = (2π)^{d/2} K_t(x),  t > 0

(see Example 2.7), we can express the solution of the heat equation without using the Fourier
transform.

Theorem 3.5. The unique solution of (3.2) with u0 ∈ S (Rd ) is given by

u(x,t) = (Kt ∗ u0 )(x), t > 0, (3.7)

where Kt is the heat kernel.

Note that (x,t) 7→ Kt (x) is itself a solution of the heat equation in the half space Rd × (0, ∞)
with (t 7→ Kt ) ∈ C∞ ((0, ∞); S (Rd )). Solving the heat equation with initial data Kt we therefore
find that
Ks ∗ Kt = Ks+t , s,t > 0.

This is another way of stating the semigroup property of the propagator S(t). We also note
that

‖K_t‖_{L^1(R^d)} = ∫_{R^d} K_t(x) dx = (2π)^{d/2} F(K_t)(0) = 1.
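Both identities can be verified numerically in dimension one; the grid and cutoff below are arbitrary choices.

```python
import numpy as np

def K(x, t):
    """One-dimensional heat kernel."""
    return np.exp(-x**2 / (4 * t)) / np.sqrt(4 * np.pi * t)

dx = 0.01
x = np.arange(-30.0, 30.0, dx)
mass = np.sum(K(x, 0.5)) * dx                     # int K_t dx, should be 1
semi = np.sum(K(1.0 - x, 0.3) * K(x, 0.2)) * dx   # (K_0.3 * K_0.2)(1.0)
exact = K(1.0, 0.5)                               # K_{0.3 + 0.2}(1.0)
```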

3.1.4 Qualitative properties


Decay in time
The solutions of the heat equation decay as t → ∞. Physically, this is related to the fact that
heat flows from hot regions to cold regions, so that the temperature evens out. Since we are
working in the whole space Rd and assume that the solutions converge to 0 at spatial infinity,
the temperature will converge to 0 everywhere as t → ∞. The total internal energy is however
conserved. Mathematically this is reflected in the conservation law
∫_{R^d} u(x,t) dx = (2π)^{d/2} û(0,t) = (2π)^{d/2} û_0(0).

The L2 norm decays to zero however.


Proposition 3.6. Assume that u0 ∈ S (Rd ) and let u be the corresponding solution of (3.2).
Then

lim_{t→∞} ‖u(·,t)‖_{L²} = 0.
Proof. Using Parseval's formula it follows that

‖u(·,t)‖²_{L²} = ∫_{R^d} e^{−2t|ξ|^2} |û_0(ξ)|^2 dξ.

Since e^{−2t|ξ|^2} → 0 monotonically as t → ∞, the proposition follows.
The L² norm of the solution or its derivative can often be interpreted as some form of
energy. Due to the above result, one often thinks of the heat equation as a dissipative equation,
meaning that energy is lost. This is not true in the physical sense however, since the physical
energy is given by the conserved quantity ∫_{R^d} u(x,t) dx. From a mathematical point of view,
it is however useful to think of the heat equation as dissipative, in contrast to e.g. the wave
equation (see Section 3.2.4).
One can also use the heat kernel to prove the decay of the solution as t → ∞. Using (3.7)
and Young’s inequality one can prove a number of estimates.
Proposition 3.7. Assume that u0 ∈ S (Rd ) and let u be the corresponding solution of (3.2).
All the norms ku(·,t)kL p , p ≥ 1, are decreasing functions of t. For p > 1, we have that
lim_{t→∞} ‖u(·,t)‖_{L^p} = 0.

Proof. Due to the translation invariance in t, it suffices to show that ku(·,t)kL p ≤ ku0 kL p to
prove the monotonicity of the L p norm. This bound follows from Young’s inequality (Theorem
B.29):
ku(·,t)kL p ≤ kKt kL1 ku0 kL p = ku0 kL p .
The second part also follows from Young's inequality:

‖u(·,t)‖_{L^p} ≤ ‖K_t‖_{L^p} ‖u_0‖_{L^1} = C t^{−\frac{d}{2}(1−\frac{1}{p})} ‖u_0‖_{L^1} → 0
if p > 1, for some constant C > 0.

We mention an alternative way of proving the monotonicity. This is especially useful if one
wants to consider the heat equation in a domain other than Rd , where the Fourier transform is
not available. For simplicity we assume that u0 (and hence u) is real-valued. Multiplying the
equation ut = ∆u by u and integrating we find that
\frac{d}{dt} ∫_{R^d} u^2 dx = 2 ∫_{R^d} u u_t dx = 2 ∫_{R^d} u ∆u dx = −2 ∫_{R^d} |∇u|^2 dx ≤ 0,

where we used integration by parts in the last step. The monotonicity of the other L p norms,
1 ≤ p < ∞ can be proved in a similar way, although special care has to be taken in the case
1 ≤ p < 2 due to regularity problems. The method of multiplying the equation by some
quantity and integrating to obtain a bound is called the energy method. One can sometimes
guess the form of the integral directly.

Smoothing
The heat equation also has a smoothing property. Again this makes sense physically. Since
the heat tends to spread out evenly, we expect discontinuities to be ‘evened out’. To formalize
this smoothing property we need to consider non-smooth initial data.
Proposition 3.8. The propagator S(t) : S (Rd ) → S (Rd ), t ≥ 0, extends uniquely to a con-
tinuous map S(t) : L p (Rd ) → L p (Rd ) for 1 ≤ p < ∞, given by
S(t)u0 = Kt ∗ u0 , t >0
and S(0)u0 = u0 .
Proof. It follows from Young’s inequality that Kt ∗ u0 is defined for t > 0 and defines a con-
tinuous extension of S(t) : S (Rd ) → S (Rd ). The uniqueness follows from Propositions 2.13
and 3.7.
Proposition 3.9. Let u_0 ∈ L^p(R^d), 1 ≤ p < ∞. Then S(·)u_0 ∈ C∞(R^d × (0, ∞)) and S(t)u_0 solves the heat equation for t > 0. Moreover, S(t)u_0 → u_0 in L^p(R^d) as t → 0+.
Proof. This follows by using the formula S(t)u0 = Kt ∗ u0 . Since Kt (x) is exponentially de-
caying in x for t > 0, we can apply Theorem B.15 to differentiate under the integral sign as
many times as we like. It follows that S(t)u0 is smooth in Rd × (0, ∞). The convergence of
S(t)u0 = Kt ∗ u0 to u0 in L p is a consequence of Theorem B.27.

Infinite speed of propagation


Suppose that u_0 ≥ 0. Formula (3.7) then shows that u(x,t) ≥ 0 for t > 0 and all x ∈ R^d. In fact it also shows that u(x,t) > 0 unless u_0 ≡ 0. This holds even if u_0 has compact support. We say that the heat equation has infinite speed of propagation. In fact, a much stronger property
holds. One can show that u is a real analytic function for t > 0, that is, it can be expanded
in a convergent power series in a neighbourhood of each point (x,t) ∈ Rd × (0, ∞). If u(·,t)
vanishes on an open subset Ω ⊂ Rd for some t > 0, then ∂tk ∂xα u(x,t) = ∆k ∂xα u(x,t) = 0 for
(x,t) ∈ Ω for all k ∈ N and α ∈ Nd . It then follows that u(x,t) ≡ 0 on Rd × (0, ∞) by the
unique continuation property for real analytic functions.
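The instantaneous spreading can be seen directly from (3.7). The sketch below (a numerical illustration, not part of the notes) convolves compactly supported initial data with the heat kernel and evaluates the result far outside the original support:

```python
import numpy as np

x = np.linspace(-40.0, 40.0, 8001)
dx = x[1] - x[0]
u0 = np.where(np.abs(x) <= 1.0, 1.0, 0.0)        # supported in [-1, 1]

t = 10.0
K = np.exp(-x**2 / (4 * t)) / np.sqrt(4 * np.pi * t)
u = np.convolve(u0, K, mode="same") * dx          # u(., t) = (K_t * u0)(.)

idx = np.argmin(np.abs(x - 30.0))                 # a point far from supp u0
print(u0[idx], u[idx])  # 0 at t = 0, but strictly positive (though tiny) at t = 10
```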

3.2 The wave equation


3.2.1 Derivation
We consider next the wave equation in R^d,
\[ u_{tt}(x,t) = c^2\Delta u(x,t). \]
The equation describes many different physical phenomena. We discuss two examples.
Example 3.10 (Acoustic waves). Consider again the linearized equations of gas dynamics
\[ \rho_0 u_t + c_0^2\nabla\rho = 0, \qquad \rho_t + \rho_0\nabla\cdot u = 0, \]
from Example 1.1. Taking the divergence of the first equation with respect to x and differentiating the second equation with respect to t, one finds that
\[ \rho_{tt} = c_0^2\Delta\rho. \]
Suppose next that the vector field u is irrotational, ∇ × u = 0, for some t. It is an easy matter to show that it is then irrotational for all t. Differentiating the first equation with respect to t and the second with respect to x and using the identity
\[ \nabla\times(\nabla\times F) = \nabla(\nabla\cdot F) - \Delta F, \tag{3.8} \]
one finds that u satisfies the wave equation
\[ u_{tt} = c_0^2\Delta u \]
(meaning that each component u_j of u satisfies the wave equation).
Example 3.11 (Electromagnetic waves). Maxwell's equations couple electric and magnetic fields. In vacuum they are
\[ \nabla\cdot E = 0, \qquad \nabla\times E = -\frac{\partial B}{\partial t}, \]
\[ \nabla\cdot B = 0, \qquad \nabla\times B = \mu_0\varepsilon_0\,\frac{\partial E}{\partial t}, \]
where E is the electric field, B the magnetic field, ε_0 the electric constant and µ_0 the magnetic constant. Uppercase letters are used to indicate vector quantities and lowercase letters to indicate scalar quantities. Taking the curl of the last two equations we obtain that
\[ \nabla\times(\nabla\times E) = -\frac{\partial}{\partial t}\nabla\times B = -\mu_0\varepsilon_0\,\frac{\partial^2 E}{\partial t^2}, \]
\[ \nabla\times(\nabla\times B) = \mu_0\varepsilon_0\,\frac{\partial}{\partial t}\nabla\times E = -\mu_0\varepsilon_0\,\frac{\partial^2 B}{\partial t^2}. \]
Using (3.8) it follows that
\[ E_{tt} = c_0^2\Delta E, \qquad B_{tt} = c_0^2\Delta B, \]
where \(c_0 = 1/\sqrt{\mu_0\varepsilon_0}\) is the speed of light in vacuum.

3.2.2 Solution by Fourier’s method


To simplify the notation, we shall assume that c = 1. In fact, this can always be achieved by
the change of variables t 7→ ct. We thus consider the Cauchy problem

\[ \begin{cases} u_{tt}(x,t) = \Delta u(x,t), & t > 0,\\ u(x,0) = u_0(x),\\ u_t(x,0) = u_1(x). \end{cases} \tag{3.9} \]

In contrast to the heat equation, there are two initial conditions. This is due to the fact that the
equation is of second order in time. The transformed problem is

\[ \begin{cases} \hat u_{tt}(\xi,t) = -|\xi|^2\hat u(\xi,t), & t > 0,\\ \hat u(\xi,0) = \hat u_0(\xi),\\ \hat u_t(\xi,0) = \hat u_1(\xi), \end{cases} \]

with solution
\[ \hat u(\xi,t) = \cos(t|\xi|)\,\hat u_0(\xi) + \frac{\sin(t|\xi|)}{|\xi|}\,\hat u_1(\xi) \]
and hence
\[ u(x,t) = \frac{1}{(2\pi)^{d/2}}\int_{\mathbb R^d} e^{ix\cdot\xi}\left(\cos(t|\xi|)\,\hat u_0(\xi) + \frac{\sin(t|\xi|)}{|\xi|}\,\hat u_1(\xi)\right) d\xi. \]
Just as for the heat equation, one obtains an existence and uniqueness result from these formu-
las. In contrast to the heat equation, the solution is now also defined for negative times. This
is related to the fact that, contrary to the heat equation, the wave equation is invariant under
the transformation t 7→ −t.
Theorem 3.12. There exists a unique solution u ∈ C∞(R; S(R^d)) of the Cauchy problem (3.9). Moreover, the solution map S(R^d)² ∋ (u_0, u_1) ↦ u ∈ C∞(R; S(R^d)) is continuous.
To define a propagator with the semigroup property, one has to consider both u and u_t. We leave it as an exercise to prove that the maps S(t)(u_0, u_1) := (u(·,t), u_t(·,t)) in fact form a group on S(R^d)².

3.2.3 Solution formulas


One can express the solution u without using the Fourier transform, although it requires more
work than for the heat equation. The problem is that the functions cos(t|ξ |) and sin(t|ξ |)/|ξ |
don’t belong to S (Rd ) and that we therefore can’t use the convolution theorem and Fourier’s
inversion formula directly. Note that
\[ \partial_t\int_{\mathbb R^d} e^{ix\cdot\xi}\,\frac{\sin(t|\xi|)}{|\xi|}\,\hat f(\xi)\,d\xi = \int_{\mathbb R^d} e^{ix\cdot\xi}\cos(t|\xi|)\,\hat f(\xi)\,d\xi, \]

which implies that the solution corresponding to the case u1 = 0 can be obtained by differenti-
ating the solution corresponding to the case u0 = 0 with respect to t. It turns out that the form
of the solution depends on the dimension. We discuss the cases d = 1, 2 and 3 separately.

One dimension

In this case we already know from Theorem 1.2 that the solution is given by d’Alembert’s
formula
\[ u(x,t) = \frac12\bigl(u_0(x+t) + u_0(x-t)\bigr) + \frac12\int_{x-t}^{x+t} u_1(y)\,dy. \tag{3.10} \]
We can recover this formula by calculating the inverse Fourier transform of sin(t|ξ |)/|ξ |.
Using d'Alembert's formula we can in fact guess what the solution must be. For t > 0 we find that
\[ \mathscr F(\chi_{(-t,t)})(\xi) = \frac{1}{\sqrt{2\pi}}\int_{-t}^{t} e^{-ix\xi}\,dx = \frac{2\sin(t\xi)}{\sqrt{2\pi}\,\xi} = \frac{2\sin(t|\xi|)}{\sqrt{2\pi}\,|\xi|}. \]
It follows that
\[ \mathscr F^{-1}\left(\frac{\sin(t|\xi|)}{|\xi|}\right) = \frac{\sqrt{2\pi}}{2}\,\chi_{(-t,t)}, \quad t > 0. \]
Using the convolution theorem (which extends to L¹ by continuity) we now find that
\[ \mathscr F^{-1}\left(\frac{\sin(t|\xi|)}{|\xi|}\,\hat u_1(\xi)\right)(x) = \frac12(\chi_{(-t,t)} * u_1)(x) = \frac12\int_{\mathbb R} \chi_{(-t,t)}(x-y)\,u_1(y)\,dy = \frac12\int_{x-t}^{x+t} u_1(y)\,dy. \]

The formula extends directly to negative t. Finally, the solution corresponding to u0 is obtained
by differentiating with respect to t and replacing u1 by u0 .
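As a quick numerical sanity check (an illustration, not part of the notes): with u_1 = 0, d'Alembert's formula reduces to u(x,t) = (u_0(x+t) + u_0(x−t))/2, and second difference quotients confirm that it satisfies u_tt = u_xx:

```python
import numpy as np

u0 = lambda s: np.exp(-s**2)                      # smooth initial displacement
u = lambda x, t: 0.5 * (u0(x + t) + u0(x - t))    # d'Alembert formula with u1 = 0

x, t, h = 0.3, 0.7, 1e-4
u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
print(abs(u_tt - u_xx))        # ≈ 0, up to finite-difference error
print(abs(u(x, 0.0) - u0(x)))  # the initial condition u(x, 0) = u0(x) holds
```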

Three dimensions

In three dimensions, it is not possible to make sense of the identity
\[ \mathscr F^{-1}\left(\frac{\sin(t|\xi|)}{|\xi|}\,\hat u_1\right) = (2\pi)^{-3/2}\,\mathscr F^{-1}\left(\frac{\sin(t|\xi|)}{|\xi|}\right) * u_1 \]
directly. As we shall see, one can make sense of it by thinking of \(\mathscr F^{-1}\bigl(\frac{\sin(t|\xi|)}{|\xi|}\bigr)\) not as a function, but as a linear functional. This is our first encounter with distributions. They will be discussed more in Chapter 5. As in one dimension, we shall compute the 'inverse transform' by making an educated guess. Note that d'Alembert's formula involves the average of the function u_1 over the interval [x − t, x + t] and of u_0 over the boundary of the interval (that is, its endpoints). It turns out that the correct idea in the three-dimensional case is to consider the spherical average
\[ M_t(f)(x) := \frac{1}{|\partial B_t(x)|}\int_{\partial B_t(x)} f(y)\,dS(y), \]

of a function f over the sphere ∂B_t(x) of radius t around x, where |∂B_t(x)| = 4πt² is the area of the sphere. It is useful to rewrite this as
\[ M_t(f)(x) = \frac{1}{4\pi}\int_{\partial B_1(0)} f(x-ty)\,dS(y). \]

Note that M_t(f) extends to an even function of t, defined for t < 0 by taking the average over the sphere of radius |t|.

Lemma 3.13. Let f ∈ S (R3 ). Then the map t 7→ Mt ( f ) belongs to C∞ (R; S (R3 )).

Proof. To prove that M_t(f) is rapidly decreasing for fixed t, we simply estimate |f(x − ty)| ≤ C_N(1 + |x − ty|)^{−N} ≤ C'_N(1 + |x|)^{−N} when |y| = 1 (separate into the cases |x| ≤ 2t|y| and |x| ≥ 2t|y| and use the reverse triangle inequality in the latter case) and hence |M_t(f)(x)| ≤ C''_N(1 + |x|)^{−N} by integration. Differentiating under the integral sign, we find that M_t(f) ∈ S(R³) for all t ∈ R. The difference quotient
\[ \frac{f(x-(t+h)y) - f(x-ty)}{h} - \nabla f(x-ty)\cdot(-y) \]
can be estimated using Taylor's theorem by \(\bigl(\max_{|\alpha|=2}\sup_{z\in B_{|t|+|h|}(x)}|\partial^\alpha f(z)|\bigr)|h|\) if h is small. Again, this is rapidly decreasing in x and therefore converges to zero as h → 0 when multiplied by any polynomial. Similar estimates for the derivatives ∂_x^α f(x − ty) are easily obtained. It follows that t ↦ M_t(f) ∈ C∞(R; S(R³)).

Lemma 3.14.
\[ \frac{1}{4\pi}\int_{\partial B_1(0)} e^{-ix\cdot\xi}\,dS(x) = \frac{\sin|\xi|}{|\xi|}. \]

Proof. We begin by noting that the integral is radial in ξ. Indeed, if R is a rotation matrix, then RᵀR = I, so
\[ \int_{\partial B_1(0)} e^{-ix\cdot R\xi}\,dS(x) = \int_{\partial B_1(0)} e^{-iR^T x\cdot\xi}\,dS(x) = \int_{\partial B_1(0)} e^{-iy\cdot\xi}\,dS(y), \]

where y = Rᵀx. It therefore suffices to take ξ = (0, 0, |ξ|), in which case the integral equals
\[ \frac{1}{4\pi}\int_0^{2\pi}\!\!\int_0^\pi e^{-i|\xi|\cos\theta}\sin\theta\,d\theta\,d\varphi = \frac12\left[\frac{e^{-i|\xi|\cos\theta}}{i|\xi|}\right]_{\theta=0}^{\pi} = \frac{\sin|\xi|}{|\xi|}. \]
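The identity in Lemma 3.14 is easy to check numerically (an illustration, not part of the notes); after the reduction to ξ = (0, 0, r) the surface integral becomes (1/2)∫₀^π e^{−ir cos θ} sin θ dθ:

```python
import numpy as np

r = 2.7                                  # an arbitrary value of |xi|
theta = np.linspace(0.0, np.pi, 20001)
dtheta = theta[1] - theta[0]
f = np.exp(-1j * r * np.cos(theta)) * np.sin(theta)

# trapezoidal rule for (1/2) * integral of f over [0, pi]
value = 0.5 * (np.sum(f) - 0.5 * (f[0] + f[-1])) * dtheta
print(value.real, np.sin(r) / r)  # the two numbers agree
print(abs(value.imag))            # the imaginary part vanishes
```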

Theorem 3.15 (Kirchhoff's formula). The solution of (3.9) when d = 3 is given by
\[ u(x,t) = \frac{1}{4\pi t}\int_{|x-y|=|t|} u_1(y)\,dS(y) + \partial_t\left(\frac{1}{4\pi t}\int_{|x-y|=|t|} u_0(y)\,dS(y)\right). \tag{3.11} \]

Proof. It suffices to take u_0 = 0. We find that
\[ \begin{aligned} (2\pi)^{3/2}\,\mathscr F(M_t(u_1))(\xi) &= \frac{1}{4\pi}\int_{\mathbb R^3}\int_{\partial B_1(0)} e^{-ix\cdot\xi}\,u_1(x-ty)\,dS(y)\,dx\\ &= \frac{1}{4\pi}\int_{\mathbb R^3}\int_{\partial B_1(0)} u_1(z)\,e^{-i(z+ty)\cdot\xi}\,dS(y)\,dz\\ &= \left(\int_{\mathbb R^3} u_1(z)\,e^{-iz\cdot\xi}\,dz\right)\left(\frac{1}{4\pi}\int_{\partial B_1(0)} e^{-ity\cdot\xi}\,dS(y)\right)\\ &= (2\pi)^{3/2}\,\frac{\sin(t|\xi|)}{t|\xi|}\,\hat u_1(\xi), \end{aligned} \]
by Lemma 3.14. Hence t M_t(u_1), which is the first term in (3.11), has Fourier transform sin(t|ξ|)û_1(ξ)/|ξ|, as required.

Two dimensions

It’s more difficult to directly calculate a solution formula in the two-dimensional case, but
there is a trick which works well. The idea is to look at the solution u(x1 , x2 ,t) as a solution of
the three-dimensional wave equation which is independent of x3 . This is called the method of
descent.

Theorem 3.16. The solution of (3.9) when d = 2 is given by
\[ u(x,t) = \frac{\operatorname{sgn} t}{2\pi}\int_{|x-y|\le|t|}\frac{u_1(y)}{(t^2-|x-y|^2)^{1/2}}\,dy + \partial_t\left(\frac{\operatorname{sgn} t}{2\pi}\int_{|x-y|\le|t|}\frac{u_0(y)}{(t^2-|x-y|^2)^{1/2}}\,dy\right). \tag{3.12} \]

Proof. As usual it suffices to take u0 = 0. Let ũ be the solution of the wave equation for d = 3
with initial data ũ0 (x̃) = 0 and ũ1 (x̃) = u1 (x), where x̃ = (x, x3 ), x = (x1 , x2 ) ∈ R2 . According
to Theorem 3.15, we then have that

1 1
Z Z
ũ(x̃,t) = ũ1 (ỹ) dS(ỹ) = ũ1 (ỹ) dS(ỹ),
4πt 2πt
|x̃−ỹ|=|t| |x̃−ỹ|=|t|
y3 ≥0

since ũ1 is independent of y3 . The upper hemisphere {ỹ : |x̃− ỹ| = |t|, y3 ≥ 0} can be parametrized
by setting y3 = x3 + (t 2 − |x − y|2 )1/2 , |x − y| ≤ |t|. The element of surface area is then
dS = |t|(t 2 − |x − y|2 )−1/2 dy, and

sgnt u1 (y)
Z
u(x,t) = dy
2π |x−y|≤|t| (t − |x − y|2 )1/2
2

There is a small problem in that ũ1 6∈ S (R3 ). This can e.g. be taken care of by replacing it
with u1 (x)ϕ(x3 ) where ϕ is in C0∞ (R) with ϕ(x) = 1 for |x| ≤ R. As long as |t| ≤ R this will
not affect u. One therefore finds that u defined by (3.12) is the unique solution of the wave
equation.

3.2.4 Qualitative properties


Finite speed of propagation
The solution formulas (3.10), (3.11) and (3.12) show that the wave equation has finite speed
of propagation, in the sense that the initial data at x0 only influences the solution at (x,t) if
|x − x0 | ≤ |t|. There are various ways of making this more precise.
Definition 3.17.
(1) We say that a point P = (x0 ,t0 ) ∈ Rd × R influences a future point Q = (x,t) ∈ Rd × R,
t ≥ t0 , if for every spatial neighbourhood U ⊂ Rd of x0 there exist two solutions u and v
of the wave equation, such that (u, ut ) = (v, vt ) outside of U at time t0 but (u, ut ) 6= (v, vt )
at Q.

(2) The (future) domain of influence of a point P is the set of future points which are influ-
enced by P. If U ⊂ Rd × R, the domain of influence of U is the union of the domains of
influence of all P ∈ U.

(3) The domain of dependence of a point Q ∈ Rd × R is the set of past points which influ-
ences it.
Remark 3.18. The reader should be aware that the terminology isn’t standardized. In the def-
inition of the domain of influence one e.g. sometimes considers both future and past points.
Note also that since the equation is linear one can replace v by the zero function in the defini-
tion.
Proposition 3.19. Consider the wave equation in Rd . If d = 1, 2, the domain of influence of a
point (x0 ,t0 ) ∈ Rd × R is the set {(x,t) ∈ Rd × R : t ≥ t0 , |x − x0 | ≤ t − t0 }. If d = 3, it is the
set {(x,t) ∈ Rd × R : t ≥ t0 , |x − x0 | = t − t0 }. Similarly, the domain of dependence of (x0 ,t0 )
is {(x,t) ∈ Rd × R : t ≤ t0 , |x − x0 | ≤ t0 −t} for d = 1, 2 and {(x,t) ∈ Rd × R : t ≤ t0 , |x − x0 | =
t0 − t} for d = 3.
Proof. It suffices to consider the domain of influence. The domain of dependence is easily
obtained by considering the domains of influence of points in the past. Since the wave equation
is invariant under translations in t, we can assume that t0 = 0.
Consider first the cases d = 1, 2. The proof follows by inspecting the solution formulas
(3.10) and (3.12). If u0 and u1 have supports in BR (x0 ), the supports of u(·,t) and ut (·,t) for
t ≥ 0 are contained in {x ∈ Rd : ∃y ∈ BR (x0 ), |x −y| ≤ t}. Picking u0 (x) ≡ 0 and u1 (x) ≥ 0 with
supp u1 = BR (x0 ), we find that supp u(·,t) = {x ∈ Rd : ∃y ∈ BR (x0 ), |x − y| ≤ t}. Choosing R
arbitrarily small, the result follows.
In the three-dimensional case, the solution formula (3.11) only involves integration over
spheres rather than balls. Repeating the above argument shows that the domain of influence is
now {(x,t) ∈ Rd × R : t ≥ 0, |x − x0 | = t}.
The cones {(x,t) ∈ Rd × R : t ≥ t0 , |x − x0 | = t −t0 } and {(x,t) ∈ Rd × R : t ≤ t0 , |x − x0 | =
t0 − t} are called the future and past light cones of (x0 ,t0 ).

Figure 3.1: Left: Domain of influence for the wave equation. In dimensions 1 and 2, the
domain of influence of the point P is the shaded cone. In dimension 3 it is the boundary of the
cone. Right: Domain of dependence for the wave equation.

We can summarize this phenomenon by saying that information travels with speed less
than or equal to 1 for the wave equation. This is in fact true in all dimensions. In three
dimensions information travels exactly at speed 1. This is usually called Huygens’ principle
and one can prove that it holds in all odd dimensions except 1. This means that in three-
dimensional space, a localized disturbance will cause a wave which is only observed for a
finite amount of time by an observer who is standing still. In two dimensions, the observer
will notice the wave for all future times, although the amplitude decays in time. A more useful fact is that Huygens' principle holds in all dimensions if one considers the speed at which singularities travel. This is known as the generalized Huygens' principle.

Extension of the solution formulas and conservation of energy


The solution formulas (3.10), (3.11) and (3.12) extend in two different ways. First of all, it is
not necessary to require that u0 and u1 are rapidly decreasing. Because of the finite speed of
propagation, it suffices to assume that u0 , u1 ∈ C∞ (Rd ). Indeed, given a non-negative function
ϕ ∈ C0∞ (Rd ), we construct a family {ψα }α∈Zd of non-negative functions in C0∞ (Rd ) such that
supp ψ_α = supp ϕ + α and ∑_{α∈Z^d} ψ_α(x) = 1, by setting ϕ_α(x) = ϕ(x − α) and
\[ \psi_\alpha(x) = \frac{\varphi_\alpha(x)}{\sum_{\beta\in\mathbb Z^d}\varphi_\beta(x)}. \]
Note that the sum in the denominator only ranges over finitely many α for every fixed x. The
sequence {ψα }α∈Zd is an example of a locally finite partition of unity. Writing u0 = ∑α u0 ψα
and u1 = ∑α u1 ψα and solving the wave equation with initial data u0 ψα , u1 ψα ∈ C0∞ (Rd ) for
every α, we obtain a solution for the initial data u0 , u1 ∈ C∞ (Rd ) by adding all the solutions.
Due to the finite speed of propagation, the sum will only range over finitely many indices at
each point.
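The partition of unity above is easy to realize concretely. The sketch below (a one-dimensional illustration, not part of the notes) builds the ψ_α from integer translates of the standard bump function and checks that they sum to 1:

```python
import numpy as np

def phi(x):
    """A non-negative bump supported in [-1, 1], so that integer translates overlap."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

x = np.linspace(-3.0, 3.0, 1201)
translates = np.array([phi(x - a) for a in range(-5, 6)])   # phi_a, a in Z
psi = translates / translates.sum(axis=0)                   # psi_a = phi_a / sum_b phi_b

print(np.max(np.abs(psi.sum(axis=0) - 1.0)))  # ≈ 0: the psi_a sum to 1 on [-3, 3]
```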
Secondly, the initial data don’t have to be infinitely differentiable. We saw already in
Theorem 1.2 that it suffices that u0 ∈ C2 and u1 ∈ C1 in one dimension. The amount of
regularity depends on the dimension however. In three dimensions we e.g. have the following
result.
Proposition 3.20. Let u0 ∈ Ck (R3 ), u1 ∈ Ck−1 (R3 ) with k ≥ 3. Formula (3.11) defines a
solution u ∈ Ck−1 (R3 ×R) of the wave equation with initial data u(x, 0) = u0 (x) and ut (x, 0) =
u1 (x).

The proof simply relies on changing variables to fix the domain of integration and differ-
entiation under the integral sign. The difference with respect to the one-dimensional case has
to do with the fact that the integral in (3.11) is over a two-dimensional surface in R3 , whereas
the integral in d’Alembert’s formula is over an open interval in R. One can show that the ‘loss
of derivatives’ gets worse the higher the dimension. It is however possible to avoid this by
instead measuring the regularity in terms of Sobolev norms. We will return to this in Chapter
5. All of this is in stark contrast to the heat equation, for which initial data which may not
even be continuous evolve into smooth solutions for t > 0 (cf. Proposition 3.9).
Proposition 3.20 only provides an existence result. Uniqueness can also be proved using the energy method.
Theorem 3.21. Let BR (0) be the ball of radius R around the origin in Rd and let Γ = {(x,t) ∈
Rd × R : |x| ≤ R − |t|}. Assume that u ∈ C2 (Γ) is a solution of the wave equation and that u
and ut vanish in BR (0) × {0}. Then u(x,t) ≡ 0 in Γ.
Proof. For simplicity we consider the case t ≥ 0. The other case follows by the change of
variables t 7→ −t. Since the real and imaginary parts each satisfy the wave equation, we may
also assume that u is real. Let n(x) = x/(R − t) be the exterior unit normal of the ball BR−t (0)
in R^d and dS(x) the surface measure on ∂B_{R−t}(0). Introduce
\[ e(t) = \frac12\int_{|x|\le R-t}\bigl(u_t^2 + |\nabla u|^2\bigr)\,dx, \quad 0\le t\le R. \]

Then
\[ \begin{aligned} e'(t) &= \int_{|x|\le R-t}(u_t u_{tt} + \nabla u\cdot\nabla u_t)\,dx - \frac12\int_{|x|=R-t}(u_t^2 + |\nabla u|^2)\,dS(x)\\ &= \int_{|x|\le R-t} u_t(u_{tt} - \Delta u)\,dx - \frac12\int_{|x|=R-t}\bigl(u_t^2 + |\nabla u|^2 - 2u_t\,\nabla u\cdot n(x)\bigr)\,dS(x)\\ &\le 0, \end{aligned} \]

since u satisfies the wave equation and 2|ut ∇u · n(x)| ≤ ut2 + |∇u|2 . This shows that e(t) is
decreasing. Since e(0) = 0 and e(t) ≥ 0 it follows that e(t) ≡ 0.
Since the wave equation is linear, this guarantees that any C² solution is uniquely determined by its initial data. If we assume that ∇u and u_t are decaying at infinity and instead integrate over R^d, the boundary terms disappear when we integrate by parts. This proves the following result.
Proposition 3.22. Let u_0, u_1 ∈ S(R^d). Then
\[ E(t) = \frac12\int_{\mathbb R^d}\bigl(|u_t|^2 + |\nabla u|^2\bigr)\,dx \]
is independent of t.
The quantity E can be interpreted as the total energy of the wave. This explains the name 'energy method' used above. Alternatively, the conservation of energy can be derived by using the Fourier representation of the solution. We leave this as an exercise.

3.3 The Laplace equation


Laplace’s equation
∆u = 0
arises e.g. when considering time-independent solutions of the heat or wave equation. A
solution is called a harmonic function. Consider Laplace’s equation in Rd+1 . Though one
usually thinks of this as an equation for steady states and not as an evolution equation, we
can of course pick one of the variables, say xd+1 , and think of that as our time-variable. We
thus consider Laplace’s equation in the half-space xd+1 > 0 with data given at the hyperplane
xd+1 = 0. To emphasize this, we set t = xd+1 and end up with the problem

\[ \begin{cases} u_{tt}(x,t) = -\Delta u(x,t), & t > 0,\\ u(x,0) = u_0(x),\\ u_t(x,0) = u_1(x). \end{cases} \]

The corresponding problem on the Fourier side is



\[ \begin{cases} \hat u_{tt}(\xi,t) = |\xi|^2\hat u(\xi,t), & t > 0,\\ \hat u(\xi,0) = \hat u_0(\xi),\\ \hat u_t(\xi,0) = \hat u_1(\xi), \end{cases} \]

yielding
\[ \hat u(\xi,t) = \cosh(t|\xi|)\,\hat u_0(\xi) + \frac{\sinh(t|\xi|)}{|\xi|}\,\hat u_1(\xi). \]
Here we run into problems since û(·,t) will not even be bounded for t > 0 unless û0 and û1 are
exponentially decaying (note that cosh(t|ξ |) and sinh(t|ξ |) grow like et|ξ | at infinity). Thus,
there is in general no solution of the form discussed above, unless we assume e.g. that û0 and
û1 have compact support. One could of course look for a solution with less regularity, but using
distribution theory one can in fact show under very mild conditions that there is no solution.
There is however the possibility that û_0 and û_1 fit together in such a way that the exponentially large terms cancel. This will be the case precisely if û_1(ξ) = −|ξ|û_0(ξ). If we are satisfied with only prescribing u(x, 0) we can therefore find a solution, namely û(ξ,t) = e^{−t|ξ|}û_0(ξ).
This gives the solution formula
\[ u(x,t) = \frac{1}{(2\pi)^{d/2}}\int_{\mathbb R^d} e^{ix\cdot\xi} e^{-t|\xi|}\,\hat u_0(\xi)\,d\xi, \]

defining a harmonic function in the half-space t > 0. As for the heat equation, it is possible to obtain a solution formula directly in terms of u_0 by calculating F^{−1}(e^{−t|ξ|}). The result is
\[ u(x,t) = (P_t * u_0)(x), \]
where
\[ P_t(x) = \frac{\Gamma((d+1)/2)}{\pi^{(d+1)/2}}\,\frac{t}{(t^2+|x|^2)^{(d+1)/2}} \]
is the Poisson kernel for the half-space t > 0, Γ denoting the Gamma function.
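For d = 1 the formula gives P_t(x) = (1/π) t/(t² + x²), since Γ(1) = 1. A quick check (a numerical illustration, not part of the notes) is that P_t has total integral 1, consistent with the Fourier representation û(ξ,t) = e^{−t|ξ|}û_0(ξ) evaluated at ξ = 0:

```python
import numpy as np

t = 1.0
x = np.linspace(-10_000.0, 10_000.0, 2_000_001)
dx = x[1] - x[0]
P = t / (np.pi * (t**2 + x**2))      # Poisson kernel for d = 1 (Gamma(1) = 1)

total = np.sum(P) * dx
print(total)  # ≈ 1; the tails beyond |x| = 10^4 contribute only ~ 2t/(pi*10^4)
```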

3.4 General PDE and the classification of second order equations
In general, a homogeneous linear second order PDE with real constant coefficients can be
written
\[ \sum_{i,j=1}^d a_{ij}\,\frac{\partial^2 u}{\partial x_i\partial x_j} + \sum_{i=1}^d a_i\,\frac{\partial u}{\partial x_i} + a_0 u = 0. \tag{3.13} \]
The coefficients ai j , ai are real numbers. We assume that not all ai j = 0. Without loss of
generality we can assume that ai j = a ji . Consider a linear change of variables y = Bx, where
B is a d × d matrix, that is,
\[ y_k = \sum_l b_{kl} x_l. \]
Using the chain rule, we obtain that
\[ \frac{\partial u}{\partial x_i} = \sum_k \frac{\partial u}{\partial y_k}\frac{\partial y_k}{\partial x_i} = \sum_k b_{ki}\frac{\partial u}{\partial y_k} \]
and
\[ \frac{\partial^2 u}{\partial x_i\partial x_j} = \left(\sum_k b_{ki}\frac{\partial}{\partial y_k}\right)\left(\sum_l b_{lj}\frac{\partial}{\partial y_l}\right) u = \sum_{k,l} b_{ki} b_{lj}\,\frac{\partial^2 u}{\partial y_k\partial y_l}. \]
The second order part of the PDE is therefore converted as follows:
\[ \sum_{i,j=1}^d a_{ij}\,\frac{\partial^2 u}{\partial x_i\partial x_j} = \sum_{k,l}\left(\sum_{i,j} b_{ki} a_{ij} b_{lj}\right)\frac{\partial^2 u}{\partial y_k\partial y_l}. \]

We associate to the second order terms the quadratic form
\[ p(\xi) = \sum_{i,j} a_{ij}\,\xi_i\xi_j = \xi^T A\xi, \]

where ξ ∈ R^d is thought of as a column vector and ξ^T is the transpose of ξ. Let q(η) be the form corresponding to the transformed second order part. Then
\[ q(\eta) = \eta^T BAB^T\eta. \]
Note that this is just the change of variables formula associated with the transformation ξ = B^T η. Recall that the number of positive (m₊) and negative (m₋) eigenvalues of the symmetric matrix associated with a quadratic form is invariant under linear transformations and that we can find a transformation B so that
\[ q(\eta) = p(B^T\eta) = \sum_{j=1}^d \sigma_j\eta_j^2, \]

in which σ_j ∈ {−1, 0, 1} with m₊ positive and m₋ negative squares. We can make a classification of second order PDE by considering the quadratic form q (or p). The names correspond to the character of the quadratic surface q(η) = c.

Definition 3.23.

• We say that the PDE (3.13) is elliptic if the quadratic form p is positive or negative definite, that is, if all the eigenvalues are positive or if all are negative. In other words this means that after a linear change of variables, we can transform the PDE to the Laplace equation plus terms of lower order.

• If one eigenvalue is positive and the others negative (or vice versa) we say that the
equation is hyperbolic. This means that we can transform the equation to the wave
equation plus lower order terms.

• If the quadratic form is non-degenerate, but not hyperbolic or elliptic, we say that the
equation is ultrahyperbolic. This means that both m+ and m− are ≥ 2 and that d =
m+ + m− . Note in particular that we must have d ≥ 4 for this to be possible.

• Finally, if one eigenvalue is zero and the others have the same sign, we say that the
equation is parabolic. Note that this is a degenerate case and that one needs information
on the first order terms to somehow compare the equation with the heat equation.

Roughly speaking, we expect elliptic equations to behave like Laplace's equation, hyperbolic equations to behave like the wave equation and parabolic equations to behave like the heat equation. A large portion of the history of PDE has been devoted to finding suitable generalizations of these definitions for operators of arbitrary order. Note that this classification is by no means exhaustive, not even for second order equations.
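The classification can be automated for a given coefficient matrix A = (a_ij). The helper below (an illustration, not part of the notes; a small numerical tolerance stands in for exact zero eigenvalues) counts eigenvalue signs:

```python
import numpy as np

def classify(A):
    """Classify the second order part of (3.13) from the eigenvalues of A = (a_ij)."""
    eig = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    tol = 1e-9 * max(1.0, float(np.max(np.abs(eig))))
    pos = int(np.sum(eig > tol))
    neg = int(np.sum(eig < -tol))
    d = len(eig)
    if pos == d or neg == d:
        return "elliptic"
    if (pos == d - 1 and neg == 1) or (neg == d - 1 and pos == 1):
        return "hyperbolic"
    if pos + neg == d:
        return "ultrahyperbolic"      # both pos and neg are >= 2 here
    if pos + neg == d - 1 and (pos == 0 or neg == 0):
        return "parabolic"            # degenerate: lower order terms matter
    return "degenerate"

print(classify([[1, 0], [0, 1]]))     # Laplace u_xx + u_yy: elliptic
print(classify([[1, 0], [0, -1]]))    # wave u_tt - u_xx: hyperbolic
print(classify([[1, 0], [0, 0]]))     # heat (second order part u_xx): parabolic
```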
Using multi-index notation, we can write a general linear partial differential operator of
order m with constant coefficients as

\[ \sum_{|\alpha|\le m} a_\alpha\,\partial^\alpha, \]

where aα ∈ C. There is no need to assume that the coefficients are real, but note that a
complex equation always can be written as a system of two real equations. It turns out to be
more convenient to work with the partial derivatives D j = ∂ j /i and D = (∂1 , . . . , ∂d )/i. We
therefore consider operators of the form

\[ P(D) = \sum_{|\alpha|\le m} a_\alpha D^\alpha, \]

and associate with such an operator the symbol

\[ P(\xi) = \sum_{|\alpha|\le m} a_\alpha \xi^\alpha \]

obtained by replacing D j by ξ j . The reason for this choice is that D j corresponds to ξ j on the
Fourier side, and hence
F (P(D)u)(ξ ) = P(ξ )û(ξ ). (3.14)

The principal symbol Pm (ξ ) is defined by


\[ P_m(\xi) = \sum_{|\alpha|=m} a_\alpha \xi^\alpha, \]

and the principal part is Pm (D). Just as for second order operators, one can relate properties
of the PDE to those of the polynomials P and Pm . This is e.g. evident by considering relation
(3.14). Our aim will be to find a suitable general definition of hyperbolicity in terms of P
and Pm . The idea is to consider two particular characteristics of the wave equation, namely
the well-posedness of the initial-value problem and the finite speed of propagation. We shall
therefore devote the next section to a study of the initial-value problem for partial differential
operators with constant coefficients.

For completeness, we also mention the general definition of an elliptic operator.


Definition 3.24 (Ellipticity). We say that the operator P(D) is elliptic if Pm (ξ ) 6= 0 for ξ ∈
Rd \ {0}.
Note that this agrees with the definition for second order equations. The idea is essentially
that one can solve the equation P(D)u = f by dividing with P(ξ ) on the Fourier side. This
idea doesn’t quite work since P(ξ ) might vanish for small |ξ |. However, ellipticity implies
that |P(ξ )| ≥ c|ξ |m for large |ξ | and this implies e.g. that u is smooth if f is smooth, since the
regularity of u has to do with the decay of û at infinity. The equation
Pm (ξ ) = 0
is called the characteristic equation. We will return to this later on. On the other hand, one
can show that the initial-value problem is always ill-posed in the sense described in Section
3.3 for elliptic equations if d ≥ 2. We will not study elliptic operators more in these notes
since they’re not directly relevant to waves. Note however that elliptic equations appear in the
study of special types of waves, such as standing or travelling waves.

3.5 Well-posedness of the Cauchy problem


We consider the Cauchy problem
\[ \begin{cases} u_t = A(D_x)u, & t > 0,\\ u = u_0, & t = 0, \end{cases} \tag{3.15} \]

where A(ξ ) is an n × n matrix with polynomial entries and u is defined on Rd × [0, ∞) with val-
ues in Rn . Note that scalar equations of order n in t can be reduced to this form by introducing
the time derivatives of order ≤ n − 1 as new variables. The corresponding partial differential
operator is now the matrix-valued operator with symbol
\[ P(\xi,\tau) = i\tau I - A(\xi), \]
where I is the unit n × n matrix and we use τ to denote the Fourier variable corresponding to t.

Example 3.25. The wave equation u_tt = ∆u can be written
\[ \begin{cases} u_t = v,\\ v_t = \Delta u, \end{cases} \]
which is of the above form with
\[ A(\xi) = \begin{pmatrix} 0 & 1\\ -|\xi|^2 & 0 \end{pmatrix}. \]

Definition 3.26 (Well-posedness in the sense of Hadamard). We say that the Cauchy problem
is well-posed if:

(1) there exists a solution for each u0 ;

(2) the solution is unique;

(3) the solution depends continuously on the data u0 .

Of course, this has to be made more precise in the sense that one has to specify the class of
initial data, the class of solutions and the topology used to measure the continuity of the map
u0 7→ u. We will consider data u0 ∈ S (Rd )n , meaning that each component of u0 belongs to
S (Rd ). Note that Definition 3.1 extends in a natural way to functions with values in S (Rd )n
by imposing the conditions on each component.

Definition 3.27. The Cauchy problem (3.15) is said to be well-posed in S if

(1) for every u0 ∈ S (Rd )n there exists a unique solution

u = S(·)u0 ∈ C∞ ([0, ∞); S (Rd )n );

(2) the map S : S (Rd )n → C∞ ([0, ∞); S (Rd )n ), S(t)u0 = u(t), is continuous.

If the Cauchy problem is not well-posed it is said to be ill-posed (in S ).

Remark 3.28. The assumption that u ∈ C∞ ([0, ∞); S (Rd )n ) in the uniqueness result can be
relaxed. It suffices e.g. to assume that u ∈ C([0, ∞); S (Rd )n ) ∩C1 ((0, ∞); S (Rd )n ). From the
equation ut = A(Dx )u it is then easy to prove that u ∈ C2 ((0, ∞); S (Rd )n ) with ∂t ut = A(Dx )ut
and ut (0) = A(Dx )u0 . Using induction, we find that u ∈ C∞ ((0, ∞); S (Rd )n ). Moreover, all
the derivatives extend continuously to t = 0.

We look for a general condition under which the initial-value problem is well-posed in S .
Taking the Fourier transform of (3.15) we obtain the problem
\[ \begin{cases} \hat u_t = A(\xi)\hat u, & t > 0,\\ \hat u = \hat u_0, & t = 0. \end{cases} \tag{3.16} \]

Note that (3.16) is a linear system of ordinary differential equations. We therefore know that for each ξ and û_0(ξ), (3.16) has a unique solution
\[ \hat u(\xi,t) = e^{tA(\xi)}\,\hat u_0(\xi). \]


Moreover, since A(ξ ) is a polynomial in ξ , û ∈ C∞ (Rd × (0, ∞))n . In order for û(·,t) to be of
Schwartz class, we need to control the growth of etA(ξ ) û0 (ξ ) for large ξ . The idea is that this
can be done by prescribing a condition on the eigenvalues of A.
Definition 3.29 (Petrowsky’s condition). We say that A(Dx ) satisfies Petrowsky’s condition if
there exists a number CA ∈ R such that the eigenvalues λ (ξ ) of A(ξ ) satisfy Re λ (ξ ) ≤ CA for
all ξ ∈ Rd .
Remark 3.30.
(1) Consider the symbol P(ξ , τ) = iτI − A(ξ ) of the operator ∂t I − A(Dx ). An equivalent
formulation of Petrowsky’s condition is that det P(ξ , τ) 6= 0 if Im τ ≤ −CA .
(2) Petrowsky’s original condition was in fact a logarithmic bound on Re λ (ξ ). It was
shown by Gårding in the special case of hyperbolic operators that this condition is
equivalent to the one in the definition. The equivalence in general follows from the
Tarski-Seidenberg theorem discussed below. This was first observed by Hörmander.
Theorem 3.31. The Cauchy problem (3.15) is well-posed in S if A(Dx ) satisfies Petrowsky’s
condition.
To clarify the proof, we first treat the scalar case n = 1. Then A(ξ) = a(ξ) is a complex number for each ξ and the condition means that Re a(ξ) ≤ C_A for all ξ ∈ R^d. It follows that
\[ |e^{ta(\xi)}| = e^{t\operatorname{Re} a(\xi)} \le e^{C_A t}, \quad t \ge 0. \]
Hence
\[ \hat u(\xi,t) = e^{ta(\xi)}\,\hat u_0(\xi) \in \mathscr S(\mathbb R^d) \]
for each t ≥ 0. A familiar argument shows that
\[ u(x,t) = \frac{1}{(2\pi)^{d/2}}\int_{\mathbb R^d} e^{ix\cdot\xi} e^{ta(\xi)}\,\hat u_0(\xi)\,d\xi \]

defines a solution of the equation. A straight-forward proof shows that the function t 7→ u(·,t)
satisfies the conditions of Definition 3.27.
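For the system in Example 3.25, Petrowsky's condition can be checked directly (a numerical illustration, not part of the notes): the eigenvalues of A(ξ) = [[0, 1], [−|ξ|², 0]] are ±i|ξ|, so Re λ(ξ) = 0 and the condition holds with C_A = 0.

```python
import numpy as np

max_real = []
for xi in [0.5, 1.0, 10.0, 100.0]:
    A = np.array([[0.0, 1.0], [-xi**2, 0.0]])   # symbol A(xi) from Example 3.25
    lam = np.linalg.eigvals(A)                  # eigenvalues are +-i|xi|
    max_real.append(float(np.max(lam.real)))

print(max_real)  # all ≈ 0: Re lambda(xi) is bounded uniformly in xi
```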
For the general case one needs a property of the matrix exponential. This property is best
proved using complex analysis.
Lemma 3.32. There exists a constant C > 0 such that
\[ \|e^A\| \le C(1 + \|A\|^{n-1})\,e^{\max\operatorname{Re}\sigma(A)} \]
for A ∈ C^{n×n}, where σ(A) denotes the spectrum (set of eigenvalues) of A and ‖·‖ denotes the matrix norm.

Proof. The proof relies on the following representation of the matrix exponential:
\[ e^{tA} = \frac{1}{2\pi i}\oint_{|z|=r} e^{tz}(zI-A)^{-1}\,dz, \tag{3.17} \]
where r is chosen so that all the eigenvalues of A are contained in the ball B_r(0) = {z ∈ C : |z| < r} and the contour is traversed counterclockwise. The path integral of a matrix-valued function is just the matrix formed by taking the path integral of each element of the matrix.
To prove formula (3.17), let B(t) be the function defined by the path integral. Notice first
that the function z 7→ etz (zI − A)−1 is analytic away from the eigenvalues of A (again defined
by considering each element). It follows that B(t) is independent of r. By the properties of the
matrix exponential, it suffices to check that

\[ B'(t) = AB(t) \]
and B(0) = I. Using the identity
\[ (zI-A)^{-1} = z^{-1}\bigl(I + A(zI-A)^{-1}\bigr), \]

and Cauchy’s theorem, we find that


\[ \begin{aligned} B(0) &= \frac{1}{2\pi i}\oint_{|z|=r} (zI-A)^{-1}\,dz\\ &= \frac{1}{2\pi i}\oint_{|z|=r} z^{-1}\bigl(I + A(zI-A)^{-1}\bigr)\,dz\\ &= I + \frac{1}{2\pi i}\oint_{|z|=r} z^{-1}A(zI-A)^{-1}\,dz. \end{aligned} \]
Recall now Cramer’s rule, which says that

    (A^{−1})_{j,k} = (−1)^{j+k} det(A^{k,j}) / det A,

where A^{k,j} is the matrix obtained if one removes row number k and column number j from A.
From this it follows that ‖(zI − A)^{−1}‖ ≤ C|z|^{−1} as |z| → ∞. Consequently,

    ‖(1/2πi) ∮_{|z|=r} z^{−1} A(zI − A)^{−1} dz‖ ≤ C/r → 0

as r → ∞, so that B(0) = I. Finally, since (zI − A)(zI − A)^{−1} = I, we have that

    B′(t) − AB(t) = (1/2πi) ∮_{|z|=r} e^{tz} I dz = 0

by Cauchy’s theorem, z ↦ e^{tz} being entire. This proves formula (3.17).


Define now a path Γ in C as follows. Let λ_j, j = 1, . . . , k, be the distinct eigenvalues of A,
in order of increasing real part. For each j, let B_1(λ_j) be the open disc of radius 1 around λ_j.
Form the union U = ∪_{j=1}^k B_1(λ_j) (note that some of the discs may intersect). Let Γ be the
boundary of U, traversed counterclockwise. Deforming the contour |z| = r into Γ, we obtain
from (3.17) that

    e^A = (1/2πi) ∮_Γ e^z (zI − A)^{−1} dz.

On Γ we have that |z − λ_j| ≥ 1 for each j. Writing

    det(zI − A) = ∏_{j=1}^k (z − λ_j)^{m_j},

where m_j is the algebraic multiplicity of λ_j, it follows that

    |det(zI − A)| ≥ 1.

Using Cramer’s rule, we find that

    ((zI − A)^{−1})_{j,k} = (−1)^{j+k} det((zI − A)^{k,j}) / det(zI − A),

where det((zI − A)^{k,j}) is a polynomial of degree at most n − 1 in z, the coefficients of which
are products of the elements of A. We can therefore estimate

    ‖(zI − A)^{−1}‖ ≤ C_1(|z|^{n−1} + ‖A‖^{n−1}).

On the other hand, when z belongs to Γ it follows that |z − λ_j| = 1 for some j. Using the
trivial estimate |λ_j| ≤ ‖A‖, which follows from the definition of an eigenvalue, we find that
|z| ≤ ‖A‖ + 1 on Γ. We conclude that ‖(zI − A)^{−1}‖ ≤ C_2(1 + ‖A‖^{n−1}). Finally, |e^z| = e^{Re z} ≤
e^{Re λ_k + 1} on Γ and the total length of Γ is at most 2πn. It follows that

    ‖e^A‖ = ‖(1/2πi) ∮_Γ e^z (zI − A)^{−1} dz‖ ≤ C(1 + ‖A‖^{n−1}) e^{Re λ_k},

where C = nC_2 e.
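The contour representation (3.17) is easy to check numerically. The following Python sketch (an illustration, not part of the text; the random 3 × 3 matrix and the number of quadrature points are arbitrary choices) discretizes the contour integral by the trapezoidal rule and compares it with scipy's matrix exponential:

```python
import numpy as np
from scipy.linalg import expm  # reference value for e^A

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
n = A.shape[0]

# Choose r so that the circle |z| = r encloses all eigenvalues of A.
r = np.abs(np.linalg.eigvals(A)).max() + 1.0

# Trapezoidal discretization of (1/2*pi*i) * integral of e^z (zI - A)^{-1} dz
# over |z| = r, parametrized by z = r e^{i*theta}, dz = i z dtheta.
N = 4000
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
B = np.zeros((n, n), dtype=complex)
for th in theta:
    z = r * np.exp(1j * th)
    B += np.exp(z) * np.linalg.inv(z * np.eye(n) - A) * (1j * z)
B *= (2.0 * np.pi / N) / (2j * np.pi)

print(np.max(np.abs(B - expm(A))))  # essentially zero
```

Because the integrand is analytic and periodic on the contour, the trapezoidal rule converges geometrically, so even a modest N reproduces e^A to machine accuracy.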
Proof of Theorem 3.31. Given the assumptions in the theorem and the previous lemma it is
not difficult to see that

    ‖e^{tA(ξ)}‖ ≤ C(1 + |t|^{n−1}|ξ|^m) e^{C_A t},

where m is the maximum degree of the elements of A(ξ) times n − 1. We therefore find that

    ‖ξ^α û(ξ, t)‖_{L∞(R^d)} ≤ C ∑_{|β| ≤ |α| + m} ‖ξ^β û0(ξ)‖_{L∞(R^d)}

uniformly for t in a compact interval, for some other constant C. The solution of a system of
ordinary differential equations whose coefficients are C∞ functions of some parameter depends
smoothly on that parameter. It follows that û is C∞. Differentiating (3.16) with respect to ξ_k,
we find that

    ∂_{ξ_k} û_t = A(ξ) ∂_{ξ_k} û + (∂_{ξ_k} A(ξ)) û,   t > 0,
    ∂_{ξ_k} û = ∂_{ξ_k} û0,                            t = 0,

and hence

    ∂_{ξ_k} û(ξ, t) = e^{tA(ξ)} ∂_{ξ_k} û0(ξ) + ∫_0^t e^{(t−s)A(ξ)} (∂_{ξ_k} A(ξ)) û(ξ, s) ds.

We also have that

    ∂_t û(ξ, t) = e^{tA(ξ)} A(ξ) û0(ξ).

Each differentiation thus increases |û(ξ, t)| by at most a polynomial factor, and the same holds
whenever we take a finite number of derivatives. Since u0 ∈ S, we therefore find that û(·, t) ∈ S
and hence u(·, t) ∈ S.
To prove that t ↦ u(·, t) is continuous we use the formula

    u(x, t) = (2π)^{−d/2} ∫_{R^d} e^{ix·ξ} e^{tA(ξ)} û0(ξ) dξ.

It follows that

    ‖u(·, t + h) − u(·, t)‖_{L∞(R^d)} ≤ C ∫_{R^d} ‖e^{(t+h)A(ξ)} − e^{tA(ξ)}‖ |û0(ξ)| dξ.
For any ε > 0 we can choose R so large that

    ∫_{|ξ|≥R} ‖e^{(t+h)A(ξ)} − e^{tA(ξ)}‖ |û0(ξ)| dξ ≤ C ∫_{|ξ|≥R} (1 + |ξ|^m) |û0(ξ)| dξ ≤ ε,

while

    ∫_{|ξ|≤R} ‖e^{(t+h)A(ξ)} − e^{tA(ξ)}‖ |û0(ξ)| dξ ≤ Ch ∫_{|ξ|≤R} |û0(ξ)| dξ → 0
as h → 0. It follows that ‖u(·, t + h) − u(·, t)‖_{L∞(R^d)} → 0 as h → 0. Similar estimates are easily
obtained for all the other semi-norms. To prove that u is C¹ in t one needs in addition the
identity

    e^{(t+h)A(ξ)} − e^{tA(ξ)} = ∫_t^{t+h} A(ξ) e^{sA(ξ)} ds,

from which it follows that

    ‖(e^{(t+h)A(ξ)} − e^{tA(ξ)})/h‖ ≤ C(1 + |ξ|^{m′}) e^{(t+h)C_A}

for some m′. Splitting the integral as above it then follows that ‖(u(·, t + h) − u(·, t))/h − u_t(·, t)‖_{α,β} → 0
as h → 0 for every semi-norm, where u_t is the formal time-derivative of u,

    u_t(x, t) = (2π)^{−d/2} ∫_{R^d} e^{ix·ξ} A(ξ) e^{tA(ξ)} û0(ξ) dξ.

The continuity of u_t is shown as above. It follows that t ↦ u(·, t) belongs to C¹([0, ∞); S(R^d))
and hence to C∞([0, ∞); S(R^d)) using the equation (see Remark 3.28). The continuity of the
map u0 ↦ u follows from the estimates

    ‖u(·, t)‖_{α,β} ≤ C ∑_{|β′| ≤ |β| + m} ‖u0‖_{α,β′}

and the linearity of the equation (again using Remark 3.28).



One can show that Petrowsky’s condition is also necessary for well-posedness in S. For
this it is necessary to use a result from algebraic geometry known as the Tarski–Seidenberg
theorem. This result implies that the maximum real part of the eigenvalues of A(ξ) either is
bounded or grows as a power of |ξ| as ξ → ∞. More precisely, one has that

    max{Re λ : det(λI − A(ξ)) = 0 for some ξ ∈ R^d with |ξ| = r} = cr^α (1 + o(1))

as r → ∞ for some α ∈ Q and c ∈ R. Suppose that Petrowsky’s condition is violated. We can
then find a sequence ξ_j → ∞ and corresponding eigenvalues λ_j with Re λ_j ∼ c|ξ_j|^α, where
c, α > 0. Let e_j be a sequence of corresponding unit eigenvectors. Choosing

    û_j(ξ) = |ξ_j|^{−1} ϕ(ξ − ξ_j) e_j,

for some fixed function ϕ ∈ C_0^∞(R^d) with ϕ(0) = 1, we obtain that

    |e^{tA(ξ_j)} û_j(ξ_j)| = |ξ_j|^{−1} e^{(Re λ_j)t} ∼ |ξ_j|^{−1} e^{c|ξ_j|^α t},

so that u_j(·, t) = F^{−1}(e^{tA(·)} û_j) is unbounded in S(R^d) for any fixed t > 0, while u_{0,j} =
F^{−1}(û_j) → 0 in S(R^d). This proves that (2) in Definition 3.27 is violated. In fact, one
can show that for ‘most’ initial data, there is not even a solution in the sense of Definition
3.27.
Example 3.33. Consider the scalar equation u_t = a(D_x)u, where

    a(ξ) = a_m ξ^m

with a_m ≠ 0. To avoid trivialities, we suppose that m ≥ 1. If m is odd, the equation is well-
posed if and only if a_m is purely imaginary. If m is even, the equation is well-posed if and only
if Re a_m ≤ 0.
Example 3.34. The linearized KdV equation ut + uxxx = 0 can be written ut = a(Dx )u with
a(ξ ) = iξ 3 . Thus, it is well-posed. So is the Schrödinger equation iut + uxx = 0, which can be
written ut = a(Dx )u with a(ξ ) = −iξ 2 .
Example 3.35. The heat equation ut = ∆u can be written ut = a(Dx )u with a(ξ ) = −|ξ |2 .
Thus, it is well-posed. However, the backwards heat equation ut = −∆u is ill-posed. This
equation is obtained by considering the evolution of the heat equation for t < 0.
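These examples can be explored numerically. The Python sketch below (an illustration only; the sample radii are arbitrary choices) evaluates Re a(ξ) for the symbols of Examples 3.33–3.35 on two grids. Petrowsky's condition asks for Re a(ξ) ≤ C_A uniformly in ξ, so a supremum that grows with the sample radius signals that the condition fails:

```python
import numpy as np

def sup_re(a, R):
    """Sampled sup of Re a(xi) over |xi| <= R (20001 grid points)."""
    xi = np.linspace(-R, R, 20001)
    return float(np.max(a(xi).real)) + 0.0  # + 0.0 normalizes -0.0

symbols = {
    "linearized KdV (a = i xi^3)": lambda xi: 1j * xi**3,
    "Schroedinger (a = -i xi^2)": lambda xi: -1j * xi**2,
    "heat (a = -xi^2)": lambda xi: -(xi**2) + 0j,
    "backwards heat (a = xi^2)": lambda xi: xi**2 + 0j,
}

for name, a in symbols.items():
    # Bounded sup for the first three symbols, growth for the backwards
    # heat equation, matching the well-posedness claims in the examples.
    print(name, "->", sup_re(a, 10.0), sup_re(a, 100.0))
```

Of course a finite sample cannot prove boundedness; it only illustrates why Re a(ξ) = ξ² rules out well-posedness while the purely imaginary symbols do not.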
Since we have assumed that A(D_x) is independent of t, the semigroup property follows at
once if Petrowsky’s condition is satisfied. On the Fourier side this corresponds to the identity

    e^{(s+t)A(ξ)} = e^{sA(ξ)} e^{tA(ξ)}.

One sometimes uses the notation

    S(t) = e^{tA(D_x)}

due to the similarity with the matrix exponential.
Finally, we consider the inhomogeneous problem

    u_t = A(D_x)u + f,   t > 0,
    u = u0,              t = 0,        (3.18)

which will play a big role when we consider nonlinear equations.

Theorem 3.36 (Duhamel’s formula). Let f ∈ C∞([0, ∞); S(R^d)) and u0 ∈ S(R^d), and assume
that A satisfies Petrowsky’s condition. The inhomogeneous Cauchy problem (3.18) has
a unique solution u ∈ C∞([0, ∞); S(R^d)) given by

    u(t) = e^{tA(D_x)} u0 + ∫_0^t e^{(t−s)A(D_x)} f(s) ds,   t ≥ 0.

The solution formula should be interpreted pointwise for each x ∈ R^d.


Proof. Working on the Fourier side, we find from the theory of ordinary differential equations
that any solution must satisfy

    û(ξ, t) = e^{tA(ξ)} û0(ξ) + ∫_0^t e^{(t−s)A(ξ)} f̂(ξ, s) ds,   t ≥ 0.

On the other hand, û(ξ, t) defined by this formula is a solution of the transformed version
of the problem. Using the same approach as in the proof of Theorem 3.31 one shows that
e^{(t−s)A(ξ)} f̂(ξ, s) is a smooth function of s and t with values in S(R^d) when 0 ≤ s ≤ t. It is
then an easy matter to prove that û belongs to C∞([0, ∞); S(R^d)). It follows from Lemma 3.2
that the same is true for u = F^{−1}(û). Since (ξ, s) ↦ e^{(t−s)A(ξ)} f̂(ξ, s) is in L¹(R^d × [0, t]) we
can change the order of integration to deduce that

    F^{−1}(∫_0^t e^{(t−s)A(ξ)} f̂(ξ, s) ds) = ∫_0^t F^{−1}(e^{(t−s)A(ξ)} f̂(ξ, s)) ds = ∫_0^t e^{(t−s)A(D_x)} f(s) ds.
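On the Fourier side, Duhamel's formula is just the variation-of-constants formula for ODEs, and it can be sanity-checked mode by mode. The sketch below (illustrative; the scalar symbol a = −2 and forcing f(s) = cos s are arbitrary choices, not from the text) compares the formula, with the integral evaluated by the trapezoidal rule, against the exact solution of û′ = aû + f̂:

```python
import numpy as np

# Single Fourier mode: u' = a u + f(t), u(0) = u0, with a = -2, f = cos.
a, u0 = -2.0, 1.0
t = 1.5

# Duhamel: u(t) = e^{ta} u0 + int_0^t e^{(t-s)a} f(s) ds  (trapezoidal rule).
s = np.linspace(0.0, t, 20001)
g = np.exp((t - s) * a) * np.cos(s)
ds = s[1] - s[0]
duhamel = np.exp(t * a) * u0 + ds * (0.5 * g[0] + g[1:-1].sum() + 0.5 * g[-1])

# Exact solution of u' = -2u + cos t, u(0) = 1 (undetermined coefficients):
# u(t) = (2 cos t + sin t)/5 + (3/5) e^{-2t}.
exact = (2.0 * np.cos(t) + np.sin(t)) / 5.0 + 0.6 * np.exp(-2.0 * t)
print(duhamel - exact)  # essentially zero
```

Applying this for every frequency ξ, with a replaced by the matrix A(ξ), gives exactly the solution formula of Theorem 3.36.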

3.6 Hyperbolic operators


As mentioned in Section 3.4, hyperbolic operators are those for which the Cauchy problem
is well-posed and which have the property of finite speed of propagation. By finite speed
of propagation we mean that the solution u(·, t) has compact support for each fixed t if u0 has
compact support. One can show that a necessary and sufficient condition for this to hold is
that

    det(λI − A(ξ)) has degree at most n in ξ.    (3.19)
The idea is that if u(·, t) has compact support, then its Fourier transform

    û(ξ, t) = (2π)^{−d/2} ∫_{supp u(·,t)} u(x, t) e^{−ix·ξ} dx

can be extended to an analytic function on all of C^d. Moreover, we have an estimate of the form

    |û(ξ, t)| ≤ C e^{K|Im ξ|},   ξ ∈ C^d.    (3.20)

On the other hand, we know that û(ξ, t) = e^{tA(ξ)} û0(ξ) and one can show that this together with
(3.20) implies a growth condition of the form |λ(ξ)| ≤ C(1 + |ξ|) on the eigenvalues of A(ξ).
This leads to the condition (3.19). For the converse, one needs a characterization of functions
of compact support in terms of their Fourier transform. One can show that if û(·, t) is an entire

analytic function satisfying (3.20), then u(·,t) has compact support. Results of this type are
called Paley-Wiener theorems. A proof of these facts is beyond the scope of these notes.
An equation which is well-posed in S (Rd ) and has finite speed of propagation is in fact
well-posed in C∞ (see the discussion before Proposition 3.20). The hypothesis of finite speed
of propagation is crucial for this to hold. It can be shown that if the condition (3.19) is violated,
there exists a non-zero solution u ∈ C∞ (Rd × R) of ut = A(Dx )u which vanishes in the half-
space t ≤ 1. Thus, uniqueness is lost for the Cauchy problem.

We are now ready for the definition of a hyperbolic operator with constant coefficients. The
operator ∂t I − A(Dx ) is said to be hyperbolic if it satisfies condition (3.19) and Petrowsky’s
condition (Definition 3.29). For a general partial differential operator P(D) in Rd , it may
not be clear what to consider as the time-variable. Moreover, one may also wish to solve
the Cauchy problem with data on a hypersurface in Rd . It is therefore useful to have a more
general definition of hyperbolicity.

Definition 3.37 (Hyperbolicity).

(1) A scalar partial differential operator P(D) of order m is said to be hyperbolic in the
direction ν ∈ R^d if its principal part satisfies

    P_m(ν) ≠ 0,

and there exists τ0 ∈ R such that

    P(ξ + τν) ≠ 0,   ξ ∈ R^d,  Im τ ≤ τ0.

(2) P(D) is said to be strictly hyperbolic in the direction ν if, in addition, for every ξ ∈ R^d
not parallel with ν, the equation

    P_m(ξ + τν) = 0

has only simple real roots τ.

(3) A matrix-valued operator P(D) is said to be (strictly) hyperbolic if det P(D) is (strictly)
hyperbolic.

Note that we recover the definition of hyperbolicity for P(D) = ∂t I − A(Dx ) by taking
ν = (0, 1) ∈ Rd × R.
In the following discussion we consider a scalar operator P(D). The condition P_m(ν) ≠ 0
is the analogue of (3.19). We say that the direction ν is non-characteristic for P if this condition
holds. A smooth (d − 1)-dimensional hypersurface Σ in R^d is said to be non-characteristic
if at each point of Σ, the normal is a non-characteristic direction for P. The Cauchy problem
with data on Σ consists of prescribing the values of the function u and all normal derivatives
of u of order strictly less than m on Σ. One can show that the Cauchy problem for a hyperbolic
operator with C∞ data on a non-characteristic hypersurface is well-posed. If one only assumes

that the surface is non-characteristic, one can still prove the (local) existence and uniqueness
of solutions with real-analytic data if Σ is real-analytic. This is the Cauchy-Kovalevskaya the-
orem. Intuitively, the assumption means that the normal derivative of order m can be expressed
in terms of the normal derivatives of order ≤ (m − 1) (the data) and their tangential deriva-
tives. This can be used to construct a formal power series solution, and the proof consists
of showing that the series converges. In fact, the theorem also holds for nonlinear equations.
The assumption of real-analyticity is often too strong for the theorem to be of any practical
relevance. Moreover, the solution may not depend continuously on the data.

Let us briefly mention something about operators

    P(x, D) = ∑_{|α|≤m} a_α(x) D^α,

with (smooth) variable coefficients. We say that P is (strictly) hyperbolic in the direction ν
at x0 if P(x0 , D) is (strictly) hyperbolic in the direction ν. It turns out that strict hyperbolicity
is sufficient for well-posedness, whereas the weaker notion is not. However, there are certain
notions in between which are useful. Consider e.g. the first order system

    A_0(x, t) ∂_t u − ∑_{k=1}^d A_k(x, t) ∂_k u − B(x, t) u = 0,    (3.21)

where u = (u1, . . . , un) ∈ R^n, and B and A_k, k = 0, . . . , d, are n × n real matrices.

Definition 3.38. The system (3.21) is called a symmetric hyperbolic system if all the matrices
Ak (x,t) are symmetric and A0 (x,t) is positive definite for all (x,t).

One can prove that the Cauchy problem with initial data on t = 0 is well-posed also for
such systems.
Chapter 4

First order quasilinear equations

After our study of linear equations in the previous chapter, we now turn to nonlinear hyper-
bolic equations. We will focus on real first order equations since these can be solved by the
method of characteristics. The main emphasis is on showing that initially smooth solutions
can develop singularities. This is something one encounters already in a course on ordinary
differential equations, where solutions of the nonlinear equation ẋ = x² blow up in finite time.
We will also briefly discuss how one can continue a solution after the development of a sin-
gularity. To do this we introduce the concept of weak solutions. Since weak solutions are in
general not differentiable, they cannot satisfy the PDE in the classical sense. To interpret the
solution one has to go back to the original physical problem. We will do this for a class of
equations called conservation laws, for which weak solutions have a natural interpretation.

More details on the method of characteristics can be found in Evans [7] or John [14]. For
more on hyperbolic conservation laws, see e.g. Bressan [4], Hörmander [11], Holden and
Risebro [10] and Lax [17].

4.1 Some definitions


A general partial differential equation of order m can be written

F(x, ∂ α u(x); |α| ≤ m) = 0, x ∈ Ω, (4.1)

where Ω ⊂ Rd and the notation means that the real-valued function F depends on u and all of
its derivatives of order less than or equal to m. Note that we now assume that u is real-valued.
This will be the case throughout the entire chapter. In the previous chapter we discussed linear
PDE,

    ∑_{|α|≤m} a_α(x) ∂^α u(x) = f(x),   x ∈ Ω,

for given functions a_α, f : Ω → R. We mostly discussed the homogeneous case f = 0. Nonlinear
equations are divided into different classes depending on ‘how nonlinear’ they are.


Definition 4.1.

(1) A PDE is said to be semilinear if it has the form

    ∑_{|α|=m} a_α(x) ∂^α u + b(x, ∂^β u; |β| ≤ m − 1) = 0.    (4.2)

(2) It is said to be quasilinear if it has the form

    ∑_{|α|=m} a_α(x, ∂^β u; |β| ≤ m − 1) ∂^α u + b(x, ∂^β u; |β| ≤ m − 1) = 0.    (4.3)

(3) It is said to be fully nonlinear if it depends nonlinearly on the highest derivatives.

In other words, the highest derivatives appear linearly in a quasilinear equation. If the
coefficients of the highest derivatives are also independent of the function u and its derivatives,
the equation is semilinear. A rule of thumb is that a PDE is more difficult to handle the more
the nonlinearity affects the highest derivatives.

Example 4.2.

• The KdV equation

    u_t + uu_x + u_xxx = 0

is an example of a semilinear equation. So is the nonlinear wave equation

    u_tt − ∆u + f(u) = 0,

where f : R → R is a given function.

• Burgers’ equation

    u_t + uu_x = 0

is an example of a quasilinear equation, which is not semilinear.

• An important example of a fully nonlinear equation is obtained as follows. Consider a
hypersurface Σ in R^d × R, given implicitly by S(x, t) = 0, where S is a C¹-function. Σ is
characteristic for the wave equation u_tt = ∆u if the normal is a characteristic direction for
the wave equation at each point of Σ. Since the normal at (x, t) is given by ∇_{(x,t)} S(x, t),
this condition can be written

    S_t² − |∇S|² = 0.

This is the eikonal¹ equation. Assuming that S(x, t) = t − u(x), the eikonal equation is
sometimes rewritten

    |∇u|² = 1.

1 From the Greek word for image.



One can define characteristic directions and hyperbolicity also for nonlinear equations. For
a semilinear equation it is natural to define ξ ∈ R^d to be a characteristic direction at the point
x ∈ Ω if ξ is a characteristic direction of the linear operator ∑_{|α|=m} a_α(x) ∂^α. The definition
is similar in the quasilinear case: ξ ∈ R^d is a characteristic direction if ξ is characteristic
for ∑_{|α|=m} a_α(x, u(x), . . . , ∂^{m−1}u(x)) ∂^α. This condition depends on both x and the function u.
Strict hyperbolicity is defined in an analogous manner. In the fully nonlinear case, one can
define strict hyperbolicity in terms of the linearization at u. The linearization at u is obtained
by calculating

    (d/dε)|_{ε=0} F(x, ∂^α(u + εv)) = ∑_{|α|≤m} a_α(x) ∂^α v(x),

where

    a_α(x) = (∂F/∂(∂^α u))(x, ∂^β u(x)).

We say that the equation (4.1) is strictly hyperbolic if its linearization is strictly hyperbolic.
We leave it as an exercise to prove that this agrees with the previous definitions in the case that
the equation is semilinear or quasilinear.

Example 4.3. Consider the first order quasilinear equation

    a_1(x, u) ∂_1 u + · · · + a_d(x, u) ∂_d u = f(x, u)

in R^d. The direction ν ∈ R^d is non-characteristic at x and u if

    a(x, u) · ν ≠ 0,

where a(x, u) = (a_1(x, u), . . . , a_d(x, u)). The equation is then automatically strictly hyperbolic,
since the equation

    a(x, u) · (ξ + τν) = 0

has the real root τ = −(a(x, u) · ξ)/(a(x, u) · ν) for each ξ ∈ R^d. The equation also shares
the finite speed of propagation property with the linear hyperbolic equations with constant
coefficients discussed in the previous chapter. This follows from the method of characteristics
discussed below.

Many of the models which appear in physics are in fact systems of PDE. The only differ-
ence is that u and F are then vector valued. One usually assumes that the number of equations
equals the number of unknown functions. One can make a similar division into linear, semi-
linear, quasilinear and fully nonlinear equations.

4.2 The method of characteristics


We consider now first order quasilinear equations

    a_1(x, u) ∂_1 u + · · · + a_d(x, u) ∂_d u = f(x, u).    (4.4)

The coefficients a_j are assumed to be C¹ functions. Our goal is to solve (4.4) by turning it into
an appropriate system of ODE. Given u, we can define a curve x : I → R^d, I = (s0, s1) ⊂ R, by
ẋ(s) = a(x(s), u(x(s))). The left hand side of (4.4) is the derivative of u along the curve x(s).
In other words, ż(s) = f(x(s), z(s)), where z(s) = u(x(s)). This closes the system and we have
obtained the characteristic equations

    ẋ(s) = a(x(s), z(s)),
    ż(s) = f(x(s), z(s)).    (4.5)

The curves x(s) are called characteristics, although this name is sometimes used for the curves
(x(s), z(s)) in R^{d+1}. If equation (4.4) is semilinear, the first equation doesn’t involve z(s). We
can then first find the characteristic x(s) and then solve the second equation. If the original
equation is linear this simply entails integrating f(x(s)).

Example 4.4. Consider the linear equation x_1 ∂_2 u − x_2 ∂_1 u = f(x) in R². The characteristic
equations are

    ẋ_1(s) = −x_2(s),
    ẋ_2(s) = x_1(s),
    ż(s) = f(x(s)).

The characteristics are circles, x(s) = r(cos s, sin s), r > 0 fixed. Once we know this, we obtain
that

    u(r cos s, r sin s) = ∫_0^s f(r cos θ, r sin θ) dθ + g(r),

where g is arbitrary. Note that a necessary condition in order to obtain a global solution is that

    ∫_0^{2π} f(r cos θ, r sin θ) dθ = 0

for every r, since otherwise we will not obtain the same value of u when we go once around
the origin.
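The characteristic system of this example is easy to integrate numerically. The following Python sketch (an illustration, not part of the text; the right-hand side f(x) = x_1 and the radius r = 2 are arbitrary choices) verifies that the characteristics stay on circles and that z(s) reproduces the integral of f along them, which for f(x) = x_1 is r sin s:

```python
import numpy as np
from scipy.integrate import solve_ivp

f = lambda x1, x2: x1  # arbitrary illustrative right-hand side f(x) = x_1

def rhs(s, w):
    # Characteristic system: x1' = -x2, x2' = x1, z' = f(x1, x2).
    x1, x2, z = w
    return [-x2, x1, f(x1, x2)]

r = 2.0
sol = solve_ivp(rhs, [0.0, 2.0 * np.pi], [r, 0.0, 0.0],
                rtol=1e-10, atol=1e-12)

x1, x2, z = sol.y
s = sol.t
# The characteristic stays on the circle of radius r ...
print(np.max(np.abs(np.hypot(x1, x2) - r)))   # essentially zero
# ... and z(s) = int_0^s f(r cos t, r sin t) dt = r sin s.
print(np.max(np.abs(z - r * np.sin(s))))      # essentially zero
```

Note that z returns to 0 after one full revolution, consistent with the compatibility condition above: for this f the integral of f over a full circle vanishes.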

We now illustrate how the characteristics can be used to solve the Cauchy problem in the
neighbourhood of a point. For simplicity we assume that the initial data u0 is given on the
hyperplane xd = 0.

Theorem 4.5 (Local solvability). Let u0 ∈ C¹(U) for some open set U ⊂ R^{d−1} and let y0 ∈ U.
Assume moreover that a and f are C¹ in R^d and that the equation is non-characteristic in
the direction e_d at (y0, 0) and u0. Then there exists an open subset W ⊂ R^d with W ∩ (R^{d−1} × {0}) ⊂
U such that equation (4.4) has a unique solution u ∈ C¹(W) with u(y, 0) = u0(y).

Proof. The solution is obtained by solving the characteristic equations (4.5) with initial data
x(0; y) = (y, 0) and z(0; y) = u0 (y), where y ∈ U. By the theory of ordinary differential equa-
tions, there exists for each y a maximal solution x(s; y) defined for s in some open inter-
val I(y) containing the origin. Moreover, the function (y, s) 7→ x(s; y) is C1 on the open set

∪_{y∈U} {y} × I(y). We wish to define the solution for s ≠ 0 as u(x(s; y)) = z(s; y). This
requires that the map y ↦ x(s; y) is injective, in other words, that the characteristics don’t cross.
Moreover, as y and s vary, the curves x(s; y) should trace out an open set W ⊂ R^d. Both of
these conditions follow from the inverse function theorem if we can prove that the Jacobian
of the map (y, s) ↦ x(s; y) is non-zero at s = 0 and y = y0. Since x(0; y) = (y, 0) it follows that
∂_{y_j} x_i(0; y) = δ_{ij}. We also have that ∂_s x(0; y) = a((y, 0), u0(y)) from (4.5). Hence, the Jacobian
is

    | 1  0  · · ·  a_1((y0, 0), u0(y0)) |
    | 0  1  · · ·  a_2((y0, 0), u0(y0)) |
    | ⋮  ⋮   ⋱    ⋮                     |
    | 0  0  · · ·  a_d((y0, 0), u0(y0)) |  =  a_d((y0, 0), u0(y0)) ≠ 0

by the non-characteristic assumption. The fact that u defined in this way is a solution of the
problem follows from the inverse function theorem and the definition of the characteristics.
The uniqueness of the solution follows from the construction.

For details on the generalization of the method of characteristics to fully nonlinear first
order equations and the solution of the Cauchy problem with data on a noncharacteristic hy-
persurface in Rd we refer to Evans [7, Chapter 3.2].

4.3 Conservation laws


We used the conservation of energy in the derivation of the heat equation in Section 3.1.1.
Suppose more generally that we have a time-dependent real-valued function u on Rd which
describes the ‘density’ of some quantity, e.g. the concentration of some substance in a fluid.
The total amount of substance in the region Ω ⊂ R^d is then given by

    ∫_Ω u(x, t) dx.

Assume that the production of the substance at a point (x,t) in space and time is given by
g(x,t) (substance is destroyed if g(x,t) < 0) and that the flow of the substance is described by
the vector-valued function f : Rd × R → Rd . The same considerations as in Section 3.1.1 lead
to the equation

    d/dt ∫_Ω u(x, t) dx = − ∫_{∂Ω} f(x, t) · n dS + ∫_Ω g(x, t) dx.

Assuming again that all the involved functions are continuous, we obtain the continuity equation

    u_t + ∇ · f = g.
To get any further one has to know the form of f and g, which depends on the specific problem.
In many cases, f and g depend on u and possibly its derivatives. The relation between these
quantities is called a constitutive law. We shall assume throughout the rest of the chapter that

f(x, t) = f(u(x, t)), that is, f depends solely on u. We will also mostly consider the case g ≡ 0.
The equation of interest is therefore

    u_t + ∇ · f(u) = 0.

This is called a scalar conservation law. The function f is sometimes called the flux function
in this context. Carrying out the differentiation, we find that

    u_t + a_1(u) ∂_1 u + · · · + a_d(u) ∂_d u = 0,

where a_i(u) = f_i′(u), i = 1, . . . , d. In particular, we notice that a scalar conservation law is
always hyperbolic in the t-direction (cf. Example 4.3). Many of the interesting problems from
physics are in fact systems of conservation laws,

    ∂_t u_j + ∇ · f_j(u) = 0,    (4.6)

where u = (u1, . . . , un) : R^d → R^n and f_j : R^n → R^d, j = 1, . . . , n.

Example 4.6 (Burgers’ equation). Burgers’ equation u_t + uu_x = 0 can be rewritten as the scalar
one-dimensional conservation law

    u_t + (u²/2)_x = 0

with flux f(u) = u²/2.

Example 4.7 (Equations of gas dynamics). The macroscopic description of a gas (or any other
fluid) involves the fluid velocity u, the density ρ, the pressure p and the energy E. The first is a
vector quantity while the others are scalar quantities. At a given moment in time, the functions
are defined in some open subset of R3 . The flow of substance across a surface is given by ρu·n,
where n is the unit normal. The conservation of mass can therefore be expressed in the form

ρt + ∇ · (ρu) = 0. (4.7)

Basic principles of physics require balance of momentum. This is expressed as the conservation
law²

    ∂_t(ρu_j) + ∇ · (ρu_j u) + ∂_j p = ρF_j,    (4.8)

where ρu_j are the components of the momentum density, ρu_j u · n the components of the
momentum flow across a surface, p is the pressure and F_j the components of the body forces
(e.g. gravity). Here we have neglected viscosity. Equation (4.8) can be written in vector
notation as

    ∂_t(ρu) + ∇ · (ρu ⊗ u) + ∇p = ρF,

2 We’re abusing the terminology here, since the equations are not of the form (4.6). Introducing new variables
w_j = ρu_j we would however obtain (4.6) with source terms.

where u ⊗ u = (u_j u_k)_{j,k} ∈ R^{d×d} is the outer product of the vector u with itself. In addition, one
needs an equation for the conservation of energy. Ignoring heat conduction and production,
this takes the form

    ∂_t E + ∇ · (u(E + p)) = ρF · u,    (4.9)

where E is the total energy per unit volume. E is the sum of internal energy and kinetic
energy associated with the macroscopic motion, E = ρe + (1/2)ρ|u|², where e denotes the specific
internal energy (internal energy per unit mass). In general one should add a heat flux term
as in Section 3.1.1 and possibly a heat production term. The considerations so far are not
specific to gases, and would work equally well for a liquid. Note that we have six unknowns
(counting all components of u) but only five equations (4.7)–(4.9). This means that we have an
underdetermined system. Liquids are assumed to be incompressible, which mathematically
introduces the extra equation ∇ · u = 0. For a gas one adds instead a constitutive relation of the
form e = e(p, ρ). Assuming that we are dealing with an ideal gas with constant entropy (an
isentropic gas) the equations can be reduced to (4.7)–(4.8) together with a constitutive relation

p = f (ρ),

where f(ρ) = κρ^γ for some constants κ > 0 and γ > 1. Here we have also assumed that the
gas is polytropic, that is, that it has constant specific heat capacities. Equation (4.8) can be
simplified using (4.7). Carrying out the differentiation in (4.8), one obtains that

    (ρ_t + ∇ · (ρu)) u_j + ρ(∂_t u_j + u · ∇u_j) + ∂_j p = ρF_j,

which simplifies to

    u_t + (u · ∇)u + (1/ρ)∇p = F    (4.10)

due to the identity (4.7). Equation (4.10) is the usual form of Euler’s equations. Note that
(4.7) can be written in the similar form

    ρ_t + u · ∇ρ + ρ∇ · u = 0.

The differential operator

    D/Dt = ∂_t + u · ∇,

called the material derivative, has a very natural interpretation. Suppose that we follow a very
small parcel of gas. The parcel is approximated by a single mass at the point x(t) (a ‘fluid
particle’). The evolution of this particle is governed by the equation

    ẋ(t) = u(x(t), t).

The material derivative of some physical quantity, such as the density, is then simply the
derivative of the quantity along the path of a fluid particle, that is,

    (d/dt) ρ(x(t), t) = ρ_t(x(t), t) + u(x(t), t) · ∇ρ(x(t), t) = (Dρ/Dt)(x(t), t).

See Whitham [28, Chapter 6] for a much more extensive discussion of gas dynamics.

Example 4.8 (The linearized equations of gas dynamics). We now show how the linearized
equations of gas dynamics discussed in Example 1.1 can be derived from (4.7) and (4.10)
under the constitutive assumption p = f(ρ). We assume in addition that there are no external
forces (F = 0). Note that the constant state u ≡ 0, ρ ≡ ρ0 and p ≡ p0 := f(ρ0) is a solution of
the equations. Consider now a small perturbation of the constant state: ρ = ρ0 + ερ̃ and u = εũ,
where ε is a small parameter. Substituting this into the governing equations, expanding in
powers of ε and dividing by ε, we obtain

    ρ̃_t + ρ0 ∇ · ũ = O(ε)

and

    ũ_t + (f′(ρ0)/ρ0) ∇ρ̃ = O(ε).

Neglecting terms of order ε and dropping the tildes for notational convenience, we arrive at
the linearized equations

    ρ0 u_t + c0² ∇ρ = 0,
    ρ_t + ρ0 ∇ · u = 0,

from Example 1.1, where c0 = √(f′(ρ0)).

Example 4.9 (Traffic model). We discuss a simple model for traffic flow introduced independently
by Lighthill and Whitham [18] and Richards [21]. Consider a road consisting of a
single lane. The traffic is described by the density ρ(x, t) of cars per unit length. The number
of cars in the stretch [a, b] is given by ∫_a^b ρ(x, t) dx. The function ρ is in reality a piecewise
continuous function of x (if we count fractions of cars), but if we replace it by the local average
over an interval which is sufficiently large compared with the average length of the
cars and sufficiently small compared with the total length of the road, we can assume that it
is continuous. We also assume that it is sufficiently differentiable for the following considerations
to go through. We suppose that there are no exits or entrances, so that the only way a car
can enter or leave a stretch of road is through the endpoints. Let v(x, t) be the velocity of an
individual vehicle at (x, t). The rate of cars passing through the point x at time t is then given
by v(x, t)ρ(x, t) and we have the relation

    d/dt ∫_a^b ρ(x, t) dx = v(a, t)ρ(a, t) − v(b, t)ρ(b, t).

We therefore obtain the conservation law

ρt + (vρ)x = 0.

The model has to be completed with a constitutive relation between v and ρ (and possibly
(x,t)). Different relations can be used to describe different traffic situations. Assuming that
the road looks the same everywhere and at all times, we postulate that v is a function only of ρ.
A natural assumption is that v is a decreasing function of ρ (cars slow down when the traffic
becomes denser). We assume that when the traffic is light, a car will drive at maximum speed

vmax and that when the traffic reaches a critical density ρmax the cars will stop. The simplest
possible constitutive relation which satisfies these conditions is given by

    v = vmax(1 − ρ/ρmax),

meaning that ρ satisfies the conservation law

    ρ_t + f(ρ)_x = 0

with f(ρ) = ρ vmax(1 − ρ/ρmax). According to this model, the maximum flow rate vρ is attained
when ρ = ρmax/2. Note that the ‘wave velocity’ a(ρ) = f′(ρ) = vmax(1 − 2ρ/ρmax) is less than
the individual velocity of the cars v(ρ). The interpretation is that information travels backward
through the traffic and that the drivers are influenced by what happens ahead. Setting u =
ρ/ρmax and x = vmax x̃ turns the equation into

    u_t + (u(1 − u))_x = 0,

where we have dropped the tilde for notational convenience. Setting w = 1 − 2u one recovers
Burgers’ equation. See Whitham [28, Chapter 3] for more on traffic models.
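The claims about the flux are easy to verify numerically. The sketch below (illustrative only; the values vmax = 30 and ρmax = 0.15 are arbitrary choices, not from the text) checks that the sampled flow rate peaks at ρ = ρmax/2 and that the wave speed a(ρ) stays below the car speed v(ρ):

```python
import numpy as np

vmax, rhomax = 30.0, 0.15  # arbitrary illustrative values

# Flux f(rho) = v(rho) * rho with v(rho) = vmax (1 - rho/rhomax).
flux = lambda rho: rho * vmax * (1.0 - rho / rhomax)

rho = np.linspace(0.0, rhomax, 100001)
q = flux(rho)
print(rho[np.argmax(q)], rhomax / 2.0)   # argmax sits at rhomax/2

# Wave speed a(rho) = f'(rho) versus individual car speed v(rho):
a = vmax * (1.0 - 2.0 * rho / rhomax)
v = vmax * (1.0 - rho / rhomax)
print(bool(np.all(a[1:] < v[1:])))  # a < v except at rho = 0
```

The strict inequality a(ρ) < v(ρ) for ρ > 0 is exactly what makes information propagate backwards relative to the cars.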

There are well-developed theories for scalar conservation laws in any dimension (n = 1)
and for systems of conservation laws in one spatial dimension (d = 1). Very little is known
when both d > 1 and n > 1. For simplicity, we will consider scalar conservation laws in one
space-dimension (d = n = 1). The Cauchy problem is

    u_t + f(u)_x = 0,   t > 0,    (4.11)
    u(x, 0) = u0(x).              (4.12)

The integrated form of (4.11) is

    d/dt ∫_a^b u(x, t) dx + [f(u(x, t))]_{x=a}^{x=b} = 0.    (4.13)

Carrying out the differentiation in (4.11), we obtain

    u_t + a(u)u_x = 0,    (4.14)

where a(u) = f′(u). Equation (4.14) can be thought of as a generalization of the ‘transport
equation’ u_t + au_x = 0, a constant, in which the velocity a(u) depends on the solution u. As
discussed in Section 1.2 of the introduction, the crossing of the characteristics can be interpreted
as saying that the portion of the graph of u initially located at x1 eventually overtakes
the portion of the graph initially located at x2 > x1, given that a(u0(x1)) > a(u0(x2)). This
causes the function u to become multivalued.
We now summarize the local theory for the problem (4.11)– (4.12).

Figure 4.1: Crossing characteristics for the conservation law (4.11).

Theorem 4.10. Let u0 ∈ Cb1 (R) and f ∈ C2 (R). There exists a maximal time T > 0 such that
(4.11)–(4.12) has a unique solution u ∈ C1 ([0, T ) × R). The corresponding characteristics are
straight lines. If a(u0 ) is increasing, the maximal existence time is infinite, T = ∞. Otherwise,

$$T = -\frac{1}{\inf_{x\in\mathbb{R}} \partial_x\big(a(u_0(x))\big)}.$$

Proof. The proof essentially follows as for Theorem 4.5, the difference being that we are now considering the global picture. We begin by proving the uniqueness. Due to the form of the
equation we can parametrize the characteristics using t and write them in the form (x(t; y),t).
The characteristic equations simplify to
$$\begin{cases} \dot{x} = a(z),\\ \dot{z} = 0,\end{cases}$$

meaning that u is constant along the characteristics and that the characteristics are straight
lines t 7→ (y + a(u0 (y))t,t). The map y 7→ x(t; y) defines a change of variables as long as it is
strictly increasing with limy→±∞ x(t; y) = ±∞. The latter property will hold for all t since u0
is bounded. To verify the former property, we calculate

∂y x(t; y) = 1 + ∂y (a(u0 (y)))t,

which is positive for all y precisely when t < T as defined above. This shows that the so-
lution is uniquely determined by u0 . Note that the map y 7→ x(t; y) is not monotone for
t > T . This means that there cannot exist a C1 solution beyond T, since we would then have x(t; x1) = x(t; x2) := x and u(x,t) = u0(x1) = u0(x2) for some numbers x1 < x2, so that the characteristics starting at x1 and x2 intersect at (x,t). On the other hand, these characteristics then have the same slope a(u0(x1)) = a(u0(x2)), so they can never intersect. This is a contradiction. The existence of the solution simply follows by defining u(x,t) = u0(g(x,t)),
where x 7→ g(x,t) is the inverse of the map y 7→ x(t; y). The inverse function theorem guaran-
tees that g, and hence u, is a C1 function. From the definition of the characteristics it follows
that u satisfies the differential equation, and that u(x, 0) = u0 (g(x, 0)) = u0 (x).
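For Burgers' equation (f(u) = u²/2, so a(u) = u) with the illustrative initial datum u0(x) = exp(−x²), the blow-up time in Theorem 4.10 can be evaluated numerically; the exact value here is T = −1/min u0′ = √(e/2). A sketch (the datum and grid are ad hoc choices, not from the notes):

```python
import numpy as np

x = np.linspace(-10, 10, 400001)
u0 = np.exp(-x**2)            # initial data; a(u0) = u0 for Burgers' equation
slope = np.gradient(u0, x)    # approximates d/dx (a(u0(x)))
T = -1.0/slope.min()          # blow-up time from Theorem 4.10

# the infimum of u0' is attained at x = 1/sqrt(2), giving T = sqrt(e/2)
assert abs(T - np.sqrt(np.e/2)) < 1e-3
```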

4.3.1 Discontinuous solutions


The above shows that we must allow discontinuous solutions in order to have any hope of
developing a global theory. Equation (4.11) does not make sense in this case, but the integrated
form (4.13) might still be sensible. Suppose that u is a classical solution of (4.11) on either
side of a curve x = x(t) and that u and its derivatives extend continuously to the curve from
the left and the right. Let ul (t) be the limit from the left and ur (t) the limit from the right of u.
Taking a < x(t) < b, we obtain
$$\begin{aligned}
\frac{d}{dt}\int_a^b u\,dx &= \frac{d}{dt}\left(\int_a^{x(t)} u\,dx + \int_{x(t)}^b u\,dx\right)\\
&= \int_a^b u_t\,dx + (u_l(t) - u_r(t))\dot{x}(t)\\
&= -\int_a^b f(u)_x\,dx + (u_l(t) - u_r(t))\dot{x}(t)\\
&= (u_l(t) - u_r(t))\dot{x}(t) + f(u_r(t)) - f(u_l(t)) - \Big[f(u(x,t))\Big]_{x=a}^{x=b}.
\end{aligned}$$

We therefore obtain the necessary condition

f (ur (t)) − f (ul (t)) = ẋ(t)(ur (t) − ul (t))

for u to be a solution of the integrated version of the conservation law. This is called the
Rankine-Hugoniot condition. It is often written

[[ f (u)]] = ẋ(t)[[u]],

where [[v]] = vr (t) − vl (t) for a function v having limits vl (t) from the left and vr (t) from the
right at the curve x = x(t).
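In code the Rankine-Hugoniot condition is just a difference quotient of the flux. A minimal helper (an illustrative sketch, using Burgers' flux f(u) = u²/2 as the example):

```python
def shock_speed(f, ul, ur):
    """Rankine-Hugoniot speed c = [[f(u)]]/[[u]] of a jump from ul to ur."""
    return (f(ur) - f(ul)) / (ur - ul)

f = lambda u: u**2/2                      # Burgers' flux
assert shock_speed(f, 0.0, 2.0) == 1.0    # for Burgers, c = (ul + ur)/2
```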
Equation (4.13) still requires that we can make sense of the derivative of $\int_a^b u(x,t)\,dx$. One

could obtain an even weaker notion of solutions by integrating this relation over a time interval.
There is a related concept which we now describe. It has the advantage that it can be used to
define discontinuous solutions of other types of differential equations than conservation laws.
Let u be a C1 solution of (4.11) and multiply the equation by a function ϕ ∈ C0∞ (R2 ). If we
now integrate by parts, we obtain that
$$\begin{aligned}
0 &= \iint_{\mathbb{R}^2_+} \big(u_t(x,t) + f(u(x,t))_x\big)\varphi(x,t)\,dx\,dt\\
&= -\iint_{\mathbb{R}^2_+} \big(u(x,t)\varphi_t(x,t) + f(u(x,t))\varphi_x(x,t)\big)\,dx\,dt - \int_{\mathbb{R}} u(x,0)\varphi(x,0)\,dx,
\end{aligned}$$

where R2+ = R × (0, ∞). Hence


$$\iint_{\mathbb{R}^2_+} \big(u(x,t)\varphi_t(x,t) + f(u(x,t))\varphi_x(x,t)\big)\,dx\,dt + \int_{\mathbb{R}} u_0(x)\varphi(x,0)\,dx = 0. \qquad (4.15)$$

Definition 4.11. Let $u_0 \in L^\infty_{loc}(\mathbb{R})$. A function $u \in L^\infty_{loc}(\mathbb{R}^2_+)$ is called a weak solution of (4.11) if (4.15) holds for every $\varphi \in C_0^\infty(\mathbb{R}^2)$.

By $u \in L^\infty_{loc}(\mathbb{R}^2_+)$ we mean that $u \in L^\infty(K)$ for every compact subset $K$ of $\overline{\mathbb{R}^2_+} := \mathbb{R}\times[0,\infty)$. We have already seen that a classical solution is a weak solution. If a weak solution u is C1 it is easy to see that it must in fact be a classical solution. Indeed, integrating by parts one finds that
$$\iint_{\mathbb{R}^2_+} \big(u_t(x,t) + f(u(x,t))_x\big)\varphi(x,t)\,dx\,dt + \int_{\mathbb{R}} \big(u(x,0) - u_0(x)\big)\varphi(x,0)\,dx = 0.$$

It now follows that ut (x,t) + f (u(x,t))x = 0 for t > 0. Otherwise one could find an open
set Ω where this is non-zero and of constant sign and then choose ϕ ≥ 0 and 6≡ 0 with support
in Ω to obtain a contradiction. Having established this, the initial condition follows from a
similar argument.

Proposition 4.12. Assume that $x : [0,\infty) \to \mathbb{R}$ is a C1 function with x(0) = 0. Assume that u is a classical C1 solution on each side of the curve x = x(t) and that u and its derivatives extend continuously to x = x(t) from each side, with $u_l(t) = \lim_{x\to x(t)^-} u(x,t)$ and $u_r(t) = \lim_{x\to x(t)^+} u(x,t)$. Then u is a weak solution if and only if it satisfies the Rankine-Hugoniot condition along the curve x = x(t).

Proof. Let Ωl = {(x,t) ∈ R2+ : x < x(t)} and Ωr = {(x,t) ∈ R2+ : x > x(t)}. We can write
$$\iint_{\mathbb{R}^2_+} (u\varphi_t + f(u)\varphi_x)\,dx\,dt = \iint_{\Omega_l} (u\varphi_t + f(u)\varphi_x)\,dx\,dt + \iint_{\Omega_r} (u\varphi_t + f(u)\varphi_x)\,dx\,dt.$$

Since ∇(x,t) · ( f (u)ϕ, uϕ) = (ut + f (u)x )ϕ + uϕt + f (u)ϕx = uϕt + f (u)ϕx in Ωl , we obtain
that
$$\begin{aligned}
\iint_{\Omega_l} (u\varphi_t + f(u)\varphi_x)\,dx\,dt &= \iint_{\Omega_l} \nabla_{(x,t)}\cdot(f(u)\varphi, u\varphi)\,dx\,dt\\
&= \int_{\partial\Omega_l} (f(u)\varphi, u\varphi)\cdot n\,ds\\
&= -\int_{-\infty}^{x(0)} u_0(x)\varphi(x,0)\,dx + \int_0^\infty \big(f(u_l(t)) - \dot{x}(t)u_l(t)\big)\varphi(x(t),t)\,dt
\end{aligned}$$

by the divergence theorem, where n denotes the outer unit normal. In order to use the diver-
gence theorem, we should work in a bounded domain. Note however that ϕ(x,t) = 0 outside
a compact set, so that we could do the calculations in Ωl ∩ R where R = (−A, A) × (−B, B) is
a large rectangle in R2 containing supp ϕ and {(x(t),t) : 0 ≤ t < B}. The result would be the
same since ϕ and its derivatives vanish on ∂ R. Performing the same calculation for Ωr , we
find that
$$\iint_{\Omega_r} (u\varphi_t + f(u)\varphi_x)\,dx\,dt = -\int_{x(0)}^{\infty} u_0(x)\varphi(x,0)\,dx - \int_0^\infty \big(f(u_r(t)) - \dot{x}(t)u_r(t)\big)\varphi(x(t),t)\,dt.$$

Figure 4.2: Characteristics for the Riemann problem with a(ul ) > a(ur ).

Adding the two parts and comparing with (4.15), we find that u is a weak solution if and only
if
$$\int_0^\infty \big([[f(u)]] - \dot{x}(t)[[u]]\big)\varphi(x(t),t)\,dt = 0$$

for all ϕ. This is equivalent to the Rankine-Hugoniot condition [[ f (u)]] = ẋ(t)[[u]].

Note that the condition is automatically satisfied if ul = ur , so that a continuous, piecewise C1 function of the above form is always a weak solution.

4.3.2 The Riemann problem


Having obtained a necessary condition on a solution with a jump discontinuity, we are left
with the question of whether such solutions exist. We consider initial data of the form
$$u_0(x) = \begin{cases} u_l, & x \le 0,\\ u_r, & x > 0,\end{cases}$$

where ul and ur are constants with ul 6= ur . We suppose in addition that a(ul ) 6= a(ur ).
Consider first the case a(ul ) > a(ur ). If we try to solve the problem using the method of
characteristics, we see that the region x < a(ur )t only contains characteristics emanating from
the negative half-axis, while the region x > a(ul )t only contains characteristics emanating
from the positive half-axis. We should therefore set
$$u(x,t) = \begin{cases} u_l, & x < a(u_r)t,\\ u_r, & x > a(u_l)t.\end{cases}$$

However, in the region a(ur )t ≤ x ≤ a(ul )t the characteristics intersect and it is unclear how
to define the solution (see Figure 4.2). We propose the solution
$$u(x,t) = \begin{cases} u_l, & x \le ct,\\ u_r, & x > ct.\end{cases} \qquad (4.16)$$

Figure 4.3: Shock wave and corresponding characteristics. Here a(ul ) > a(ur ).

Figure 4.4: Characteristics for the Riemann problem with a(ul ) < a(ur ).

Clearly u solves (4.11) on either side of the discontinuity. From the previous discussion, we
see that u is a weak solution if and only if the Rankine-Hugoniot condition

$$c = \frac{[[f(u)]]}{[[u]]} \qquad (4.17)$$

is satisfied. The solution defined in (4.16)–(4.17) is called a shock wave.


Consider next the case a(ul ) < a(ur ) and assume for simplicity that a is strictly monotone
(see e.g. Holden and Risebro [10] for the general case). We no longer have the problem that
characteristics cross. Instead we have the problem that no characteristics emanating from
the x-axis enter the region a(ul )t ≤ x ≤ a(ur )t. Thus, it is again unclear how to define the
solution in this region. The shock solution (4.16)–(4.17) is still a weak solution even if ul < ur .
However, we can also define a solution by setting

$$u(x,t) = \begin{cases} u_l, & x \le a(u_l)t,\\ a^{-1}(x/t), & a(u_l)t \le x \le a(u_r)t,\\ u_r, & x \ge a(u_r)t.\end{cases}$$



Figure 4.5: A rarefaction wave and corresponding characteristics.

Indeed, it is straightforward to show that this defines a piecewise C1 function, which satisfies
the differential equation in each of the three different regions. It follows that it is a weak
solution. We thus have the unhappy situation that weak solutions are not unique. The function
u is called a rarefaction wave. Figure 4.5 shows a rarefaction wave corresponding to Burgers’
equation, for which a−1 (x/t) = x/t and the solution is given by a straight line in the interval
a(ul )t ≤ x ≤ a(ur )t.
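The two cases can be packaged into a small exact solver for Burgers' equation (f(u) = u²/2, so a(u) = u and a⁻¹(s) = s), returning the shock when a(ul) > a(ur) and the rarefaction fan otherwise; this is the admissible choice singled out by the entropy conditions of the next subsection. An illustrative sketch, not from the notes:

```python
def riemann_burgers(ul, ur, x, t):
    """Solution of the Riemann problem for Burgers' equation at (x, t), t > 0:
    a shock if ul > ur, a rarefaction wave if ul < ur."""
    if ul > ur:                       # characteristics cross: shock
        c = 0.5*(ul + ur)             # Rankine-Hugoniot speed
        return ul if x <= c*t else ur
    if x <= ul*t:                     # rarefaction wave
        return ul
    if x >= ur*t:
        return ur
    return x/t                        # the fan: u = a^{-1}(x/t)

assert riemann_burgers(2.0, 0.0, 0.5, 1.0) == 2.0   # behind the shock (c = 1)
assert riemann_burgers(0.0, 2.0, 1.0, 1.0) == 1.0   # inside the fan
```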

4.3.3 Entropy conditions


There is no way to single out the ‘correct’ solution by simply using the PDE. Precisely as
when we introduced weak solutions above, we have to go back to the physics behind the
model to find a criterion. A common way of singling out the physically relevant solution
is to introduce some form of entropy condition. The name comes from the equations of gas
dynamics, where these conditions have a physical interpretation. The basic idea is to introduce
a direction of time. Note that the original conservation law (4.11) is reversible, in the sense
that if u(x,t) is a solution, then so is u(−x, −t). Thus the qualitative behavior forward and
backward in time should be identical. This is in contrast to the heat equation ut = uxx . As
we saw in Chapter 3, the heat equation is well-posed forward in time, but not backward in
time. Thus, there is a ‘preferred’ sense of direction. This is also illustrated by the fact that for
the heat equation the L2 -norm is strictly decreasing for solutions in S (R) (see Section 3.1.4),
whereas it is conserved for Schwartz class solutions of the conservation law (4.11). Arguing
that there always is some diffusion or dissipation in a real physical system, one imposes the
condition that the ‘correct’ solution should be a limit as ε → 0+ of a solution uε of the equation

ut + f (u)x = εuxx , ε > 0. (4.18)

Since this is difficult to verify directly, we now derive a necessary condition on u for this to hold, which is easier to check. Suppose that η is a C1 function and define q by q0 (u) =

f 0 (u)η 0 (u). It is then easily confirmed that if u is a classical solution of the conservation law,
then η(u)t + q(u)x = 0. Suppose now that η is C2 and convex so that η 00 (u) ≥ 0. We will
show that η(u)t + q(u)x ≤ 0 in a suitable weak sense if u is a weak solution obtained as a limit
of C2 solutions of (4.18). We assume that $u^\varepsilon \to u$ in $L^1_{loc}(\mathbb{R}^2_+)$ and that $u^\varepsilon$ is locally bounded
uniformly in ε. Note that we don’t prove that such a convergence result holds; the main point
is to motivate condition (4.19) below. Multiplying (4.18) by η 0 (uε )ϕ, where ϕ ∈ C0∞ (R2+ ) with
ϕ ≥ 0, we find that
$$\begin{aligned}
0 &= \iint_{\mathbb{R}^2_+} \big(u^\varepsilon_t + f(u^\varepsilon)_x - \varepsilon u^\varepsilon_{xx}\big)\eta'(u^\varepsilon)\varphi\,dx\,dt\\
&= \iint_{\mathbb{R}^2_+} \big(\eta(u^\varepsilon)_t + q(u^\varepsilon)_x - \varepsilon\eta'(u^\varepsilon)u^\varepsilon_{xx}\big)\varphi\,dx\,dt\\
&= \iint_{\mathbb{R}^2_+} \big(-\eta(u^\varepsilon)\varphi_t - q(u^\varepsilon)\varphi_x\big)\,dx\,dt + \varepsilon\iint_{\mathbb{R}^2_+} \big(\eta''(u^\varepsilon)(u^\varepsilon_x)^2\varphi - \eta(u^\varepsilon)\varphi_{xx}\big)\,dx\,dt,
\end{aligned}$$

so that
$$\iint_{\mathbb{R}^2_+} \big(-\eta(u^\varepsilon)\varphi_t - q(u^\varepsilon)\varphi_x\big)\,dx\,dt \le \varepsilon\iint_{\mathbb{R}^2_+} \eta(u^\varepsilon)\varphi_{xx}\,dx\,dt.$$

In the limit as ε → 0+ we therefore find that


$$\iint_{\mathbb{R}^2_+} \big(-\eta(u)\varphi_t - q(u)\varphi_x\big)\,dx\,dt \le 0 \qquad (4.19)$$

for every ϕ ∈ C0∞ (R2+ ) with ϕ ≥ 0. This is sometimes expressed as saying that η(u)t + q(u)x ≤
0 in the sense of distributions. The pair (η, q) is called an entropy-flux pair.

Definition 4.13. A weak solution u is said to be entropy admissible if (4.19) holds for every
entropy-flux pair (η, q).
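For a concrete entropy-flux pair, take Burgers' flux f(u) = u²/2 and the convex entropy η(u) = u²; integrating q′ = f′η′ gives q(u) = 2u³/3. An illustrative symbolic check (not part of the notes):

```python
import sympy as sp

u = sp.symbols('u')
f = u**2/2            # Burgers' flux
eta = u**2            # a convex entropy
q = sp.integrate(sp.diff(f, u)*sp.diff(eta, u), u)   # q' = f' * eta'

assert sp.simplify(q - sp.Rational(2, 3)*u**3) == 0
```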

For a fixed number $u_\star$, the function $\eta_\delta(u) = ((u-u_\star)^2 + \delta^2)^{1/2}$ is convex and C∞. It converges to $|u-u_\star|$ as δ → 0. Choosing the corresponding flux function $q_\delta$ appropriately, we find that
$$\begin{aligned}
q_\delta(u) &= \int_{u_\star}^u \frac{f'(s)(s-u_\star)}{\big((s-u_\star)^2 + \delta^2\big)^{1/2}}\,ds\\
&\to \int_{u_\star}^u f'(s)\,\mathrm{sgn}(s-u_\star)\,ds\\
&= \big(f(u) - f(u_\star)\big)\,\mathrm{sgn}(u-u_\star)
\end{aligned}$$

pointwise as δ → 0 by dominated convergence. Note that $\eta(u) = |u-u_\star|$ is convex and smooth everywhere except at $u_\star$, and that $q(u) = (f(u) - f(u_\star))\,\mathrm{sgn}(u-u_\star)$ satisfies $q'(u) = f'(u)\eta'(u)$ for $u \ne u_\star$. Thus (η, q) can be thought of as a Lipschitz continuous entropy-flux

pair. Approximating (η, q) with (ηδ , qδ ) in (4.19) and passing to the limit using the dominated convergence theorem, one obtains the inequality

$$\iint_{\mathbb{R}^2_+} \big(|u-u_\star|\varphi_t + \mathrm{sgn}(u-u_\star)(f(u)-f(u_\star))\varphi_x\big)\,dx\,dt \ge 0$$

for every $u_\star \in \mathbb{R}$. This is Kružkov’s admissibility condition. One can in fact show that Kružkov’s condition is equivalent to the entropy condition in Definition 4.13.
The entropy condition turns out to be exactly what is needed to prove existence and unique-
ness. The proof of the following theorem is beyond the scope of these notes. See e.g. Bressan
[4] and Hörmander [11].
Theorem 4.14. Assume that $u_0 \in L^\infty(\mathbb{R})$ and that f is locally Lipschitz continuous. There exists a unique entropy admissible weak solution $u \in C([0,\infty); L^1_{loc}(\mathbb{R})) \cap L^\infty(\mathbb{R}\times[0,\infty))$ of the Cauchy problem (4.11)–(4.12). If in addition $u_0 \in L^1(\mathbb{R})$, then $u \in C([0,\infty); L^1(\mathbb{R}))$ and the map $S(t) : u_0 \mapsto u(\cdot,t)$ from initial data to solution extends to a continuous semigroup on $L^1(\mathbb{R})$ with $\|S(t)u_0 - S(t)v_0\|_{L^1(\mathbb{R})} \le \|u_0 - v_0\|_{L^1(\mathbb{R})}$.

We consider now the Riemann problem and show how the admissibility condition can
be used to single out a unique solution. The argument used to prove the Rankine-Hugoniot
condition shows that (4.19) leads to the jump condition

[[q(u)]] ≤ c[[η(u)]] (4.20)

for every C1 entropy-flux pair (η, q). Once again, the condition is automatically satisfied by
a piecewise C1 solution. This means e.g. that a rarefaction wave is always admissible. A
limiting procedure shows again that the inequality continues to hold if we take $\eta(u) = |u-u_\star|$ and $q(u) = \mathrm{sgn}(u-u_\star)(f(u) - f(u_\star))$, whence we obtain

$$\mathrm{sgn}(u_r-u_\star)\big(f(u_r) - f(u_\star)\big) - \mathrm{sgn}(u_l-u_\star)\big(f(u_l) - f(u_\star)\big) \le c\big(|u_r-u_\star| - |u_l-u_\star|\big).$$

Suppose that $u_l < u_r$ and take $u_\star = \alpha u_r + (1-\alpha)u_l$ for $\alpha \in [0,1]$, so that $u_\star \in [u_l, u_r]$. We then obtain the chain of inequalities

$$\begin{aligned}
f(u_r) + f(u_l) - 2f(u_\star) &\le c(u_r + u_l - 2u_\star)\\
\Rightarrow\quad f(u_r) + f(u_l) - 2f(u_\star) &\le c(1-2\alpha)(u_r - u_l)\\
\Rightarrow\quad f(u_r) + f(u_l) - 2f(u_\star) &\le (1-2\alpha)\big(f(u_r) - f(u_l)\big)\\
\Rightarrow\quad \alpha f(u_r) + (1-\alpha)f(u_l) &\le f\big(\alpha u_r + (1-\alpha)u_l\big),
\end{aligned}$$

where we used the Rankine-Hugoniot condition in the third line. We leave it as an exercise to show that the inequality is reversed if $u_r < u_l$. The conditions

$$\alpha f(u_r) + (1-\alpha)f(u_l) \le f\big(\alpha u_r + (1-\alpha)u_l\big), \qquad \alpha \in [0,1],$$

if $u_l < u_r$, and

$$\alpha f(u_r) + (1-\alpha)f(u_l) \ge f\big(\alpha u_r + (1-\alpha)u_l\big), \qquad \alpha \in [0,1],$$



if ur < ul , are called the Lax shock conditions. They have a very simple geometric interpre-
tation, saying that the graph of f must lie above the line segment between (ul , f (ul )) and
(ur , f (ur )) in the first case and below the line segment in the second case. In particular, if f is
convex, we obtain the condition
ur < ul
for a shock. For the Riemann problem, this means that the shock is the correct entropy solution
in the case ur < ul , whereas the rarefaction wave is the correct entropy solution in the case
ul < ur . The condition is reversed if f is concave.
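The geometric chord formulation of the Lax conditions is easy to test numerically for a given flux; a sketch of such a check (the function name, grid size and tolerances are ad hoc choices):

```python
import numpy as np

def lax_admissible(f, ul, ur, n=1001):
    """Sample the Lax shock condition: the graph of f must lie above the
    chord between (ul, f(ul)) and (ur, f(ur)) if ul < ur, below it if ur < ul."""
    alpha = np.linspace(0.0, 1.0, n)
    chord = alpha*f(ur) + (1 - alpha)*f(ul)
    graph = f(alpha*ur + (1 - alpha)*ul)
    if ul < ur:
        return bool(np.all(chord <= graph + 1e-12))
    return bool(np.all(chord >= graph - 1e-12))

f = lambda u: u**2/2                      # convex flux (Burgers)
assert lax_admissible(f, 2.0, 0.0)        # shock admissible when ur < ul
assert not lax_admissible(f, 0.0, 2.0)    # not admissible: rarefaction instead
```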

Example 4.15. Consider the traffic model from Example 4.9, written in the normalized form
ut + f (u)x = 0, where f (u) = u(1 − u) is concave. This means that a shock wave will form if
ul < ur (meaning that the density is greater in the direction of the traffic). Consider e.g. the
case when ur = 1, meaning that the cars are standing still ahead. This might e.g. model a queue forming at a red light. The velocity of the shock will then be c = ( f (ur ) − f (ul ))/(ur − ul ) < 0
(assuming that ul ∈ (0, 1)), so the shock wave is moving backwards, that is cars driving towards
the traffic light are coming to a sudden stop. If on the other hand ul > ur we get a rarefaction
wave. If ul = 1, this might model what happens when the light turns green.
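Plugging concrete (illustrative) numbers into the red-light scenario, with the normalized flux f(u) = u(1 − u), the Rankine-Hugoniot speed is c = −ul, which is indeed negative:

```python
f = lambda u: u*(1 - u)          # normalized traffic flux, concave
ul, ur = 0.4, 1.0                # moving traffic runs into a standing queue
c = (f(ur) - f(ul))/(ur - ul)    # Rankine-Hugoniot shock speed

# c = -ul < 0: the back of the queue propagates against the traffic
assert abs(c + 0.4) < 1e-12
```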
Chapter 5

Distributions and Sobolev spaces

In the previous chapter we encountered weak solutions of PDE. The key was to multiply the
equation by a ‘test function’ ϕ ∈ C0∞ (Rd ) and integrate by parts to put all the derivatives on
ϕ. The name test function refers to the fact that the equation is ‘tested’ against ϕ. This gives
a suitable definition of weak solutions in which the solution u doesn’t need to be differen-
tiable. If u is a sufficiently regular weak solution, one shows that it is a classical solution,
meaning that the PDE holds pointwise. What is involved here is the concept of ‘weak’ or
‘distributional’ derivatives. The weak derivative of a non-differentiable function is in general
not a function, but a distribution. The purpose of this chapter is partly to clarify what this
means. Distributions or ‘generalized functions’ have a long prehistory. The theory was made
rigorous in the middle of the 20th century, in particular by Laurent Schwartz (1915–2002).
Distributions are indispensable in the modern theory of linear partial differential equations.

Distributions are not quite as useful for nonlinear equations, since one can in general
not define the product of two distributions. Moreover, the space of distributions is not a
normed vector space. For this purpose we introduce a generalization of Lebesgue spaces
called Sobolev spaces after Sergei L’vovich Sobolev (1908–1989). Sobolev was also involved
in the early theory of distributions. The Sobolev space H k simply consists of L2 functions
whose weak derivatives up to order k lie in L2 . As we will see, H k is a Banach space and
moreover an algebra if k is sufficiently large. Another important reason for using Sobolev
spaces is that they behave well with respect to the Fourier transform. This is not the case for
spaces based on the supremum norm.

The present chapter can only serve as a brief introduction. For more details on distributions we refer to Friedlander [9], Duistermaat and Kolk [6], Hörmander [12], Rauch [20] and
Strichartz [25]. For more on Sobolev spaces we refer to the standard reference Adams and
Fournier [1]. In particular, this reference discusses Sobolev spaces on subsets of Rd , where
the regularity of the boundary plays an important role. In these notes we shall only discuss
Sobolev spaces on Rd .


5.1 Distributions
1 (Rd ) gives rise to a linear functional1 ` : C∞ (Rd ) →
Note that a locally integrable function u ∈ Lloc u 0
C defined by `u (ϕ) = hu, ϕi, where
Z
hu, ϕi := u(x)ϕ(x) dx. (5.1)
Rd

Note that this differs from the inner product in L2 (Rd ) in that there is no complex conjugate
on ϕ and that we don’t require u, ϕ ∈ L2 (Rd ). It is clear at least if u is continuous that `u
determines u uniquely, since if `u (ϕ) = 0 for all ϕ ∈ C0∞ (Rd ) then u ≡ 0. Below we shall
prove that this is also the case if $u \in L^1_{loc}(\mathbb{R}^d)$.
In keeping with the notation of Laurent Schwartz, we denote the space of test functions
C0∞ (Rd ) by D(Rd ).
Definition 5.1. A distribution ` is a linear functional on D(Rd ) which is continuous in the
following sense. If {ϕn } ⊂ D(Rd ) has the following properties:
(1) there exists a compact set K ⊂ Rd such that supp ϕn ⊆ K for all n, and

(2) there exists a function ϕ ∈ D(Rd ) such that for all α ∈ Nd , ∂ α ϕn → ∂ α ϕ uniformly,
then `(ϕn ) → `(ϕ) as n → ∞. The space of distributions is denoted D 0 (Rd ).
Remark 5.2. D 0 (Rd ) is simply the space of continuous linear functionals on D(Rd ) when
D(Rd ) is given the topology defined by (1) and (2) in the above definition. When (1) and (2)
hold, we say that ϕn → ϕ in D(Rd ). One can define D(Ω) if Ω is an open subset of Rd in a
similar fashion.
The most common ways that distributions appear is through the expression (5.1). The fact
that this expression defines a distribution follows from the estimate
$$|\langle u, \varphi\rangle| \le \left(\int_K |u(x)|\,dx\right)\|\varphi\|_{L^\infty(\mathbb{R}^d)},$$

if $\varphi \in \mathcal{D}(\mathbb{R}^d)$ with $\mathrm{supp}\,\varphi \subseteq K$. Let us prove that (5.1) determines the distribution $\ell_u$ uniquely. Note that by linearity, it suffices to show that $u = 0$ (as an element of $L^1_{loc}(\mathbb{R}^d)$) if $\ell_u = 0$.
Theorem 5.3. Let $u \in L^1_{loc}(\mathbb{R}^d)$. If $\langle u, \varphi\rangle = 0$ for all $\varphi \in \mathcal{D}(\mathbb{R}^d)$, then u(x) = 0 a.e.
Proof. It suffices to prove that $u = 0$ a.e. in the ball $B_R(0) \subset \mathbb{R}^d$ for any $R > 0$. Let $v = \chi_{B_{2R}(0)}u \in L^1(\mathbb{R}^d)$. Then $\int_{\mathbb{R}^d} v(x)\varphi(x)\,dx = \int_{\mathbb{R}^d} u(x)\varphi(x)\,dx = 0$ for any $\varphi \in \mathcal{D}(\mathbb{R}^d)$ with $\mathrm{supp}\,\varphi \subset B_{2R}(0)$. Let $J$ be a mollifier as in Theorem B.25. Then $J_\varepsilon * v \to v$ in $L^1(\mathbb{R}^d)$ by Theorem B.25 (4) and hence also in $L^1(B_R(0))$. On the other hand, $(J_\varepsilon * v)(x) = \int_{\mathbb{R}^d} v(y)J_\varepsilon(x-y)\,dy = 0$ for $x \in B_R(0)$ and $\varepsilon > 0$ sufficiently small, since $\mathrm{supp}\,J_\varepsilon(x-\cdot) \subset B_{R+\varepsilon}(0)$. It follows that $u = v = 0$ a.e. in $B_R(0)$.
1A linear functional is just a linear map with values in C.

The above theorem implies that we can identify u with the linear functional `u . We will do
this without further mention in what follows. We shall also use the notation hu, ϕi for u(ϕ)
even when u is just an element of D 0 (Rd ). Let us give an example of a distribution which does
not correspond to a locally integrable function.

Example 5.4. The Dirac2 distribution δa at a ∈ Rd is defined by

hδa , ϕi := ϕ(a).

It is clear that hδa , ϕn i = ϕn (a) → ϕ(a) = hδa , ϕi if ϕn → ϕ in D(Rd ), so that δa really is a


distribution. It is also clear that δa doesn’t correspond to a locally integrable function, since
hδa , ϕi = 0 if ϕ(a) = 0. If hδa , ϕi = hu, ϕi for some $u \in L^1_{loc}(\mathbb{R}^d)$, this would imply that

u(x) = 0 a.e. and hence that hδa , ϕi = hu, ϕi = 0 for all ϕ ∈ D(Rd ).

We also need to know what is meant by convergence of a sequence of distributions.

Definition 5.5. We say that un → u in D 0 (Rd ) if hun , ϕi → hu, ϕi for all ϕ ∈ D(Rd ).

Example 5.6. Consider a function $K \in L^1(\mathbb{R}^d)$ with $\int_{\mathbb{R}^d} K(x)\,dx = 1$ and set $K_\varepsilon(x) = \varepsilon^{-d}K(\varepsilon^{-1}x)$ for $\varepsilon > 0$. Then $K_\varepsilon \to \delta_0$ as $\varepsilon \to 0^+$ since $\langle K_\varepsilon, \varphi\rangle = \int_{\mathbb{R}^d} K_\varepsilon(y)\varphi(y)\,dy = (K_\varepsilon * R(\varphi))(0) \to \varphi(0)$ as $\varepsilon \to 0^+$ by Theorem B.27, where $R(\varphi)(x) = \varphi(-x)$. In particular, we have that $K_t \to \delta_0$ as $t \to 0^+$, where $K_t$ is the heat kernel.
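Example 5.6 can be illustrated numerically in one dimension with a Gaussian kernel and an arbitrary test function (a sketch; the grid sizes and the particular test function are ad hoc choices):

```python
import numpy as np

x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
phi = np.cos(x)*np.exp(-x**2/4)      # a test function with phi(0) = 1

def K_eps(eps):
    """Rescaled Gaussian kernel with integral 1."""
    return np.exp(-(x/eps)**2/2)/(eps*np.sqrt(2*np.pi))

pairings = [np.sum(K_eps(e)*phi)*dx for e in (1.0, 0.1, 0.01)]
# <K_eps, phi> approaches phi(0) = 1 as eps -> 0+
assert abs(pairings[-1] - 1.0) < 1e-3
```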

5.2 Operations on distributions


We now discuss how to extend various operations on functions to distributions. To explain the
method, we consider translations in x. Given a function ϕ ∈ D(Rd ) and a vector h ∈ Rd , we
define (τh ϕ)(x) = ϕ(x − h). If ϕ, ψ ∈ D(Rd ) we have that
$$\langle\tau_h\varphi, \psi\rangle = \int_{\mathbb{R}^d} \varphi(x-h)\psi(x)\,dx = \int_{\mathbb{R}^d} \varphi(y)\psi(y+h)\,dy = \langle\varphi, \tau_{-h}\psi\rangle.$$

It therefore makes sense to define

hτh u, ϕi := hu, τ−h ϕi (5.2)

if u is just a distribution. Note that τ−h ϕn → τ−h ϕ if ϕn → ϕ in D(Rd ), so that (5.2) indeed
defines a distribution τh u. It is clear that the new definition agrees with the old one if u ∈
D(Rd ) in the sense that the distribution τh u defined by (5.2) can be represented by the function
u(· − h).

Example 5.7. We have τh δ0 = δh , since

hτh δ0 , ϕi = hδ0 , τ−h ϕi = ϕ(0 + h) = ϕ(h).

2 After the physicist Paul Dirac (1902–1984).



The above extension of the translation operator is an example of a general principle.

Proposition 5.8. Suppose that L is a linear operator on D(Rd ) which is continuous in the
sense that ϕn → ϕ in D(Rd ) implies L(ϕn ) → L(ϕ). Suppose, moreover, that there exists a
continuous linear operator LT on D(Rd ), with the property that hL(ϕ), ψi = hϕ, LT (ψ)i for
all ϕ, ψ ∈ D(Rd ). Then L extends to a continuous linear operator on D 0 (Rd ) given by

hL(u), ϕi = hu, LT (ϕ)i for all u ∈ D 0 (Rd ) and ϕ ∈ D(Rd ). (5.3)

Proof. The continuity of LT shows that L(u) defined in (5.3) is a distribution. It is also clear
from the definition of the transpose that (5.3) defines an extension of the original operator L.
Finally, if un → u in D 0 (Rd ), then

hL(un ), ϕi = hun , LT (ϕ)i → hu, LT (ϕ)i = hL(u), ϕi,

so the extension is continuous on D 0 (Rd ).


We call the operator LT the transpose of L. Note that this might differ from the L2 adjoint
since there is no complex conjugate on ψ in hϕ, ψi. We now give a list of common operators
and their transposes. Here (σλ ϕ)(x) = ϕ(λ x) is dilation by λ ∈ R \ {0} and (Mψ ϕ)(x) =
ψ(x)ϕ(x) multiplication by ψ ∈ C∞ (Rd ).

$$\begin{array}{ll}
L & L^T\\
\hline
\tau_h & \tau_{-h}\\
\sigma_\lambda & |\lambda|^{-d}\,\sigma_{1/\lambda}\\
M_\psi & M_\psi\\
\partial^\alpha & (-1)^{|\alpha|}\,\partial^\alpha
\end{array}$$
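The transpose identities in the table can be verified by quadrature in one dimension; for instance, for the dilation σλ (a sketch with Gaussian stand-ins for the test functions; parameters are arbitrary):

```python
import numpy as np

x = np.linspace(-30, 30, 600001)
dx = x[1] - x[0]
phi = lambda s: np.exp(-s**2)            # stand-ins for test functions (d = 1)
psi = lambda s: np.exp(-(s - 1)**2/2)
lam = 2.5

lhs = np.sum(phi(lam*x)*psi(x))*dx            # <sigma_lam phi, psi>
rhs = np.sum(phi(x)*psi(x/lam)/abs(lam))*dx   # <phi, |lam|^{-1} sigma_{1/lam} psi>
assert abs(lhs - rhs) < 1e-10
```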

As a corollary to Proposition 5.8 we therefore have the following result.

Proposition 5.9. Each of the operators τh , σλ , Mψ and ∂ α on D(Rd ) has a unique extension to a continuous map on D 0 (Rd ). The extensions are given by hLu, ϕi = hu, LT ϕi where L and LT are as in the above table.

Note the restriction to multiplication with smooth functions. In general we can’t define the
product of a distribution and a non-smooth function ψ since then ϕψ 6∈ D(Rd ). In particular,
one can in general not define the product of two distributions. We will get back to this question
later on.

Example 5.10. The Heaviside3 function


$$H(x) = \begin{cases} 1, & x > 0,\\ 0, & x \le 0,\end{cases}$$

3 After the physicist Oliver Heaviside (1850–1925).



is locally integrable and hence $H \in \mathcal{D}'(\mathbb{R})$. The distributional derivative of $H$ is calculated as follows:

$$\langle H', \varphi\rangle = -\langle H, \varphi'\rangle = -\int_0^\infty \varphi'(x)\,dx = \varphi(0) = \langle\delta_0, \varphi\rangle.$$

It follows that $H' = \delta_0 \in \mathcal{D}'(\mathbb{R})$. Note that the pointwise derivative of $H$ is zero everywhere except at 0, where it is undefined. We can continue the process and calculate higher derivatives of $H$. E.g.

$$\langle H'', \varphi\rangle = \langle\delta_0', \varphi\rangle = -\langle\delta_0, \varphi'\rangle = -\varphi'(0).$$
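The pairing ⟨H′, ϕ⟩ = −⟨H, ϕ′⟩ = ϕ(0) can also be checked by numerical quadrature for a concrete test function; here the hypothetical choice ϕ(x) = exp(−(x − 1/2)²) is rapidly decaying rather than compactly supported, which makes no difference numerically (a sketch):

```python
import numpy as np

x = np.linspace(-10, 10, 400001)
dx = x[1] - x[0]
H = (x > 0).astype(float)             # Heaviside function
phi = np.exp(-(x - 0.5)**2)           # a test function
dphi = -2*(x - 0.5)*phi               # its derivative phi'

pairing = -np.sum(H*dphi)*dx          # <H', phi> = -<H, phi'>
assert abs(pairing - np.exp(-0.25)) < 1e-3   # phi(0) = e^{-1/4}
```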

5.3 Tempered distributions


One operation which can not be defined on D 0 (Rd ) is the Fourier transform. As the following
lemma shows, F is its own transpose.

Lemma 5.11. For ϕ, ψ ∈ S (Rd ), we have that

hF (ϕ), ψi = hϕ, F (ψ)i.

Proof. The result can be proved using Lemma 2.15, but it is simpler to give a direct proof. By
Fubini’s theorem we have that
$$\begin{aligned}
\langle\mathcal{F}(\varphi), \psi\rangle &= (2\pi)^{-d/2}\int_{\mathbb{R}^d}\left(\int_{\mathbb{R}^d} \varphi(x)e^{-ix\cdot\xi}\,dx\right)\psi(\xi)\,d\xi\\
&= (2\pi)^{-d/2}\int_{\mathbb{R}^d} \varphi(x)\left(\int_{\mathbb{R}^d} \psi(\xi)e^{-ix\cdot\xi}\,d\xi\right)dx\\
&= \langle\varphi, \mathcal{F}(\psi)\rangle.
\end{aligned}$$

The problem with defining hF (u), ϕi = hu, F (ϕ)i is that F doesn’t map D(Rd ) to itself
(compact support is lost). The key is to instead use S (Rd ) as the space of test functions.

Definition 5.12. A continuous linear functional on S (Rd ) is called a tempered distribution4 .


The space of tempered distributions is denoted S 0 (Rd ).

Convergence in S 0 (Rd ) is defined analogously to convergence in D 0 (Rd ).

Definition 5.13. We say that un → u in S 0 (Rd ) if un (ϕ) → u(ϕ) for all ϕ ∈ S (Rd ).

Continuity of an element u ∈ S 0 (Rd ) means that

hu, ϕn i → hu, ϕi

if ϕn → ϕ in S (Rd ). If u ∈ S 0 (Rd ), then hu, ϕi is well-defined if ϕ ∈ D(Rd ). Moreover, if ϕn → ϕ in D(Rd ), then clearly ϕn → ϕ in the topology of S (Rd ). Consequently, the restriction of a tempered distribution to D(Rd ) is an element of D 0 (Rd ). Moreover, u is uniquely
4 Sometimes called a temperate distribution.

determined by its restriction to D(Rd ), since D(Rd ) is a dense subspace of S (Rd ) (Proposi-
tion 2.2 (5)). It therefore makes sense to write S 0 (Rd ) ⊂ D 0 (Rd ). The inclusion is strict, in
the sense that there exist elements u ∈ D 0 (Rd ) which can not be extended to continuous linear
functionals on S (Rd ).

Example 5.14.

(1) The Dirac distribution δa belongs to S 0 (Rd ).

(2) Any locally integrable function u with (1 + | · |2 )−N u ∈ L1 (Rd ) for some N belongs to
S 0 (Rd ) since
$$\begin{aligned}
\int_{\mathbb{R}^d} |u(x)\varphi(x)|\,dx &\le \int_{\mathbb{R}^d} (1+|x|^2)^{-N}|u(x)|\,(1+|x|^2)^N|\varphi(x)|\,dx\\
&\le \|(1+|\cdot|^2)^{-N}u\|_{L^1(\mathbb{R}^d)}\,\|(1+|\cdot|^2)^N\varphi\|_{L^\infty(\mathbb{R}^d)}.
\end{aligned}$$

(3) In particular, any u ∈ L p (Rd ), p ∈ [1, ∞], belongs to S 0 (Rd ) since (1 + | · |2 )−N u ∈
L1 (Rd ) if N is sufficiently large by Hölder’s inequality.

(4) u(x) = e|x| is in D 0 (Rd ) but not in S 0 (Rd ). Strictly speaking this means that the func-
tional ϕ 7→ hu, ϕi, ϕ ∈ D(Rd ), can not be extended to an element of S 0 (Rd ). This can
e.g. be seen by approximating a function ϕ ∈ S (Rd ) satisfying ϕ(x) = e−|x|/2 for large
|x| with a sequence in D(Rd ) in the topology of S (Rd ).

The proof of the following result is exactly the same as the proof of Proposition 5.8.

Proposition 5.15. The result in Proposition 5.8 remains true if D(Rd ) is replaced by S (Rd )
and D 0 (Rd ) by S 0 (Rd ).

As a consequence, Proposition 5.9 holds also for S 0 (Rd ) if we in the definition of the
multiplication operator assume that ψ ∈ S (Rd ), so that ψϕ ∈ S (Rd ) if ϕ ∈ S (Rd ). More
generally, one can assume that ψ and its derivatives grow at most polynomially.

Proposition 5.16. Let ψ ∈ C∞ (Rd ) and assume that ψ and its derivatives grow at most polynomially. Each of the operators τh , σλ , Mψ and ∂ α on S (Rd ) extends to a continuous operator on S 0 (Rd ).

Alternatively, one can view these operators as restrictions of the corresponding operators on
D 0 (Rd ) to S 0 (Rd ).
The main purpose of introducing tempered distributions is to extend the Fourier transform. The fact that F : S (Rd ) → S (Rd ) is continuous shows that the following definition makes sense.

Definition 5.17. The Fourier transform on S 0 (Rd ) is the continuous operator F : S 0 (Rd ) →
S 0 (Rd ) defined by
hF (u), ϕi := hu, F (ϕ)i,
for u ∈ S 0 (Rd ) and ϕ ∈ S (Rd ). The inverse Fourier transform is the continuous operator
F −1 : S 0 (Rd ) → S 0 (Rd ) defined by

hF −1 (u), ϕi := hu, F −1 (ϕ)i.

It follows easily from the definition of F −1 on S (Rd ) that F −1 = RF = F R. As


usual, we write û for F (u) also if u ∈ S 0 (Rd ). The following properties are analogous to the
ones in Proposition 2.6.

Proposition 5.18. For u ∈ S 0 (Rd ), we have that


$$\begin{aligned}
\tau_{-y}u &\overset{\mathcal{F}}{\mapsto} e^{iy\cdot\xi}\hat{u},\\
e^{iy\cdot x}u &\overset{\mathcal{F}}{\mapsto} \tau_y\hat{u},\\
\sigma_\lambda u &\overset{\mathcal{F}}{\mapsto} |\lambda|^{-d}\sigma_{1/\lambda}\hat{u},\\
\partial_x^\alpha u &\overset{\mathcal{F}}{\mapsto} (i\xi)^\alpha\hat{u},\\
x^\alpha u &\overset{\mathcal{F}}{\mapsto} (i\partial_\xi)^\alpha\hat{u},
\end{aligned}$$

where y ∈ Rd and λ ∈ R \ {0}.

Proof. The identities follow by combining the definitions of these operators in terms of their
transposes with the corresponding properties for Schwartz class functions. We illustrate the
method by proving the first identity and leave the others to the reader. Considering ϕ as a
function of ξ we have that

hF (τ−y u), ϕi = hτ−y u, ϕ̂i = hu, τy ϕ̂i = hu, F (eiy·ξ ϕ)i = hû, eiy·ξ ϕi = heiy·ξ û, ϕi.
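The rule ∂xα u ↦ (iξ)α û can be illustrated numerically with the convention F(f)(ξ) = (2π)^{−1/2}∫f(x)e^{−ixξ}dx used in these notes, for a Gaussian and a sample frequency (a sketch; the grid and tolerance are arbitrary choices):

```python
import numpy as np

x = np.linspace(-20, 20, 400001)
dx = x[1] - x[0]
xi = 1.3                                  # a sample frequency
phi = np.exp(-x**2/2)                     # Gaussian test function
dphi = -x*phi                             # its derivative

def F(f):
    """Fourier transform at the fixed frequency xi by Riemann sum,
    with the convention (2*pi)^(-1/2) * integral of f(x) e^{-i x xi} dx."""
    return np.sum(f*np.exp(-1j*xi*x))*dx/np.sqrt(2*np.pi)

assert abs(F(dphi) - 1j*xi*F(phi)) < 1e-8
```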

Theorem 5.19. F : S 0 (Rd ) → S 0 (Rd ) is an isomorphism with inverse F −1 .

Proof. We simply note that

hF −1 F (u), ϕi = hF (u), F −1 (ϕ)i = hu, F F −1 (ϕ)i = hu, ϕi.

Similarly, hF F −1 (u), ϕi = hu, F −1 F (ϕ)i = hu, ϕi.

Example 5.20. We have $\langle\hat{\delta}_0, \varphi\rangle = \langle\delta_0, \hat{\varphi}\rangle = \hat{\varphi}(0) = (2\pi)^{-d/2}\int_{\mathbb{R}^d}\varphi(x)\,dx$, so $\hat{\delta}_0 = (2\pi)^{-d/2}$. By Fourier’s inversion formula it follows that $\mathcal{F}(1) = R\mathcal{F}^{-1}(1) = (2\pi)^{d/2}R(\delta_0) = (2\pi)^{d/2}\delta_0$.

In order to extend convolutions to S 0 (Rd ) we first prove that ϕ ∗ is a continuous linear


operator on S (Rd ) for a fixed ϕ ∈ S (Rd ).
Lemma 5.21. Assume that ϕ ∈ S (Rd ) and that ψn → ψ in S (Rd ). Then ϕ ∗ ψn → ϕ ∗ ψ in
S (Rd ).
Proof. From the convolution theorem it follows that F (ϕ ∗ ψn ) = (2π)d/2 ϕ̂ ψ̂n . The result
follows since Mϕ̂ , F and F −1 are continuous operators on S (Rd ).
The transpose of ϕ∗ is computed by changing the order of integration in hϕ ∗ u, ψi:

∫_{R^d} ( ∫_{R^d} ϕ(x − y)u(y) dy ) ψ(x) dx = ∫_{R^d} u(y) ( ∫_{R^d} ϕ(x − y)ψ(x) dx ) dy

if u ∈ S(R^d), so that (ϕ∗)^T = R(ϕ)∗, where as usual R(ϕ)(x) = ϕ(−x).


Proposition 5.22. For ϕ ∈ S (Rd ), the operator ϕ ∗ extends to a continuous operator on
S 0 (Rd ) given by hϕ ∗ u, ψi := hu, R(ϕ) ∗ ψi. The extension satisfies

F (ϕ ∗ u) = (2π)d/2 ϕ̂ û,
∂ α (ϕ ∗ u) = (∂ α ϕ) ∗ u = ϕ ∗ (∂ α u).

Proof. The existence of the extension is a consequence of Proposition 5.15 and Lemma 5.21.
The identity F (ϕ ∗ u) = (2π)d/2 ϕ̂ û follows from the calculation

hF (ϕ ∗ u), ψi = hϕ ∗ u, ψ̂i
= hu, R(ϕ) ∗ ψ̂i
= hu, F F −1 (R(ϕ) ∗ ψ̂)i
= hû, F −1 (R(ϕ) ∗ ψ̂)i
= hû, (2π)d/2 ϕ̂ψi
= h(2π)d/2 ϕ̂ û, ψi,

in which the convolution theorem for S (Rd ) and the relation F −1 = F R = RF have been
used. The second identity follows from similar manipulations and its proof is left to the
reader.
Since convolution is commutative on S (Rd ) we shall also write u ∗ ϕ instead of ϕ ∗ u.
Keeping u fixed and varying ϕ we obtain a continuous linear operator from S (Rd ) to S 0 (Rd ).
Note that u ∗ v is in general not well-defined if u, v ∈ S 0 (Rd ).

Example 5.23. We have hδ0 ∗ ϕ, ψi = hδ0, R(ϕ) ∗ ψi = ∫_{R^d} ϕ(x)ψ(x) dx = hϕ, ψi. Thus, δ0∗ is the identity on S(R^d). Similarly, (∂^α δ0)∗ = ∂^α on S(R^d).


Example 5.24. If Kε is as in Example 5.6, we find that Kε ∗ ϕ → δ0 ∗ ϕ = ϕ in S 0 (Rd ). This
agrees with what we already know about Kε .

5.4 Approximation with smooth functions


There is an important question that we have avoided. In Propositions 5.9 and 5.15 we didn’t
say anything about the uniqueness of the extensions. The extensions are in fact unique, but to
prove this one has to show that D(R^d) is dense in D′(R^d) in the first case and that S(R^d) is dense in S′(R^d) in the second case. This is done by the usual procedure of mollification and
multiplication with smooth ‘cut-off’ functions, but there are some technical details. We will
present the proofs for the interested reader. The section may be skipped on a first reading since
the results are not used elsewhere in the notes.
We begin by noting that the definition hu ∗ ϕ, ψi := hu, R(ϕ) ∗ ψi also makes sense when
u ∈ D 0 (Rd ) and ϕ, ψ ∈ D(Rd ), since then R(ϕ) ∗ ψ ∈ D(Rd ) (see Section B.5). There is
however another natural way of defining convolutions. We show that these two definitions
agree.

Lemma 5.25. For u ∈ D 0 (Rd ) and ϕ ∈ D(Rd ) we have that u ∗ ϕ ∈ C∞ (Rd ) with

(u ∗ ϕ)(x) = hu, τx R(ϕ)i = hu, ϕ(x − ·)i.

Proof. Since x 7→ ϕ(x − ·) is continuous from Rd to D(Rd ) it follows that x 7→ hu, ϕ(x − ·)i ∈
C(Rd ). Moreover, limh→0 h−1 (ϕ(x + he j − ·) − ϕ(x − ·)) = ∂ j ϕ(x − ·) in D(Rd ) for each
x and j, and x 7→ ∂ j ϕ(x − ·) is continuous from Rd to D(Rd ). From this it follows that
x 7→ hu, ϕ(x − ·)i ∈ C1 (Rd ). Proceeding by induction, one shows that x 7→ hu, ϕ(x − ·)i ∈
C∞ (Rd ). The result therefore follows if we prove the equality (u ∗ ϕ)(x) = hu, ϕ(x − ·)i. Let
v(x) = hu, ϕ(x − ·)i. Using Riemann sums we find that

hv, ψi = ∫_{R^d} v(x)ψ(x) dx
= lim_{n→∞} ∑_{α∈Z^d} ψ(α/n) hu, ϕ(α/n − ·)i n^{−d}
= hu, lim_{n→∞} ∑_{α∈Z^d} ψ(α/n) ϕ(α/n − ·) n^{−d}i
= hu, R(ϕ) ∗ ψi

for ψ ∈ D(Rd ), where we have used the fact that ∑α∈Zd ψ(α/n)ϕ(α/n−·)n−d → R(ϕ)∗ψ in
D(Rd ) (left as an exercise for the reader). The expression on the last line is simply hu ∗ ϕ, ψi,
so this proves the equality u ∗ ϕ = v.

Theorem 5.26. D(Rd ) is dense in S 0 (Rd ) and D 0 (Rd ).

Proof. We first approximate u ∈ D 0 (Rd ) with the sequence ψ(·/n)u in D 0 (Rd ), where ψ ∈
D(Rd ) with ψ(x) = 1 for |x| ≤ 1. It is clear that ψ(·/n)ϕ → ϕ in D(Rd ) if ϕ ∈ D(Rd )
(they are equal for large n) and hence ψ(·/n)u → u in D 0 (Rd ). Similarly, if u ∈ S 0 (Rd ), the
sequence converges to u in S 0 (Rd ). We can therefore replace u by ψu, for some function
ψ ∈ D(Rd ), in what follows. We define un = (ψu) ∗ J1/n , where J is an even mollifier (see

Section B.5). By the previous lemma it follows that un ∈ C∞ (Rd ) as an element of D 0 (Rd ).
Moreover, un has compact support since un (x) = hψu, J1/n (x − ·)i = hu, ψJ1/n (x − ·)i = 0 if
dist(x, supp ψ) ≥ 1/n. Finally,

hun , ϕi = hψu, J1/n ∗ ϕi = hψu, ψ̃(J1/n ∗ ϕ)i,

where ψ̃ ∈ D(Rd ) with ψ̃ = 1 on supp ψ. We have that ∂ α (J1/n ∗ ϕ) = J1/n ∗ ∂ α ϕ → ∂ α ϕ


uniformly. Hence ψ̃(J1/n ∗ ϕ) → ψ̃ϕ in D(Rd ) and un → ψu in D 0 (Rd ). To prove the same
convergence in the topology of S′(R^d), we first note that h(ψu) ∗ J_{1/n}, ϕi is given by hw, ϕi, where w(x) = hψu, J_{1/n}(x − ·)i is in D(R^d), whenever ϕ ∈ D(R^d). However, since D(R^d) is
dense in S (Rd ) and w has compact support, the same is true by an approximation argument
whenever ϕ ∈ S (Rd ). We still have that ∂ α (J1/n ∗ ϕ) = J1/n ∗ ∂ α ϕ → ∂ α ϕ uniformly and
hence ψ̃(J1/n ∗ ϕ) → ψ̃ϕ in S (Rd ). It follows that un → ψu in S 0 (Rd ).

Corollary 5.27. The extensions in Propositions 5.9 and 5.15 are unique.

Example 5.28. Consider the three-dimensional wave equation. For initial data u0 = 0 and u1 ∈ S(R³) the solution is expressed on the Fourier side as (sin(t|ξ|)/|ξ|) û1(ξ). Note that sin(t|·|)/|·| ∈ S′(R³). By Proposition 5.22 we therefore have that

u(x,t) = (2π)^{−3/2} F^{−1}( sin(t|ξ|)/|ξ| ) ∗ u1

in the sense of distributions. At least if u1 ∈ D(R³), we can make sense of (2π)^{−3/2} F^{−1}(sin(t|ξ|)/|ξ|) ∗ u1 as

hv_t, u1(x − ·)i,

where v_t = (2π)^{−3/2} F^{−1}(sin(t|ξ|)/|ξ|) ∈ S′(R³). The distribution v_t is nothing but tM_t from Section 3.2, that is,

hv_t, ϕi = tM_t(ϕ) = (1/(4πt)) ∫_{|x|=|t|} ϕ(x) dS(x).

5.5 Sobolev spaces


Definition 5.29. The Sobolev space H k (Rd ) is the space of u ∈ L2 (Rd ) such that ∂ α u ∈ L2 (Rd )
for any |α| ≤ k, where the derivatives are interpreted in the sense of distributions.

The notation W^{k,2}(R^d) is also frequently used instead of H^k(R^d). One then regards H^k(R^d) as a member of a more general family W^{k,p}(R^d) of Sobolev spaces, in which L²(R^d) is replaced by L^p(R^d). Note that

(u, v)_{H^k(R^d)} := ∑_{|α|≤k} ∫_{R^d} ∂^α u(x) \overline{∂^α v(x)} dx

defines an inner product on H k (Rd ). Using the completeness of L2 (Rd ) and the definition
of distributional derivatives, it is easy to prove that H k (Rd ) is in fact a Hilbert space with
this inner product. Alternatively, one can prove this by first characterizing H k (Rd ) using the
Fourier transform. The following result is a direct consequence of Parseval’s formula.

Lemma 5.30. Let P_k(ξ) = ∑_{|α|≤k} ξ^{2α}. Then u ∈ L²(R^d) is in H^k(R^d) if and only if

∫_{R^d} P_k(ξ)|û(ξ)|² dξ < ∞.   (5.4)

Moreover,

‖u‖²_{H^k(R^d)} = ∫_{R^d} P_k(ξ)|û(ξ)|² dξ.   (5.5)

Proof. If u ∈ L2 (Rd ), then its distributional derivatives satisfy F (∂ α u) = (iξ )α û by Proposi-


tion 5.18. If u ∈ H k (Rd ), we therefore obtain the equality (5.5) which implies (5.4). On the
other hand, if (5.4) holds, we obtain that all the derivatives of order less than or equal to k are
in L2 (Rd ). We therefore obtain that u ∈ H k (Rd ) and that (5.5) holds.
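For d = 1 and k = 1 the identity (5.5) reads ‖u‖²_{H¹} = ∫(1 + ξ²)|û(ξ)|² dξ, and it can be checked numerically. The sketch below is an illustration only (not part of the notes); it approximates û with the FFT and u′ with a finite-difference gradient.

```python
import numpy as np

N, L = 2**14, 80.0
dx = L / N
x = (np.arange(N) - N // 2) * dx
u = np.exp(-(x - 1)**2) * (1 + x)          # smooth, rapidly decaying, non-symmetric

# Physical side: ||u||^2_{L2} + ||u'||^2_{L2} with a finite-difference derivative
du = np.gradient(u, dx)
physical = (u**2 + du**2).sum() * dx

# Fourier side: int (1 + xi^2) |u-hat(xi)|^2 d xi, u-hat approximated via the FFT
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)
uhat = np.fft.fft(u) * dx / np.sqrt(2 * np.pi)
fourier_side = ((1 + xi**2) * np.abs(uhat)**2).sum() * (2 * np.pi / L)
```

The two quantities agree up to discretization error, as Parseval's formula predicts.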

This result suggests a generalization of the Sobolev spaces. We introduce the notation
hξ i = (1 + |ξ |2 )1/2 . Note that Pk (ξ ) ∼ hξ i2k in the sense that there exist constants C1 ,C2 > 0
such that C1 hξ i2k ≤ Pk (ξ ) ≤ C2 hξ i2k for all ξ ∈ Rd . We can therefore replace Pk (ξ ) by hξ i2k
in the definition of the norm of H k (Rd ). This allows us to extend the definition of Sobolev
spaces to non-integer indices and to negative indices.

Definition 5.31. Let s ∈ R. H^s(R^d) is the space of tempered distributions u such that û ∈ L¹_loc(R^d) with

‖u‖_{H^s(R^d)} := ( ∫_{R^d} (1 + |ξ|²)^s |û(ξ)|² dξ )^{1/2} < ∞.   (5.6)

These spaces are usually called fractional Sobolev spaces (even though s doesn’t have to
be rational). From now on we will use (5.6) as the norm on H s (Rd ), even when s is a non-
negative integer. Note that S (Rd ) ⊂ H t (Rd ) ⊂ H s (Rd ) ⊂ S 0 (Rd ) when s < t, each inclusion
being continuous.

Theorem 5.32. H^s(R^d) is a Hilbert space with inner product

(u, v)_{H^s(R^d)} := ∫_{R^d} (1 + |ξ|²)^s û(ξ) \overline{v̂(ξ)} dξ.

Proof. It is clear that (·, ·)_{H^s(R^d)} defines an inner product on H^s(R^d) with (u, u)_{H^s(R^d)} = ‖u‖²_{H^s(R^d)}, so we only need to prove completeness. This follows since the map T : u ↦ hξi^s û is an isometric isomorphism from H^s(R^d) to L²(R^d).

Note that the elements of H s (Rd ) need not be functions when s < 0.

Example 5.33. The distribution δ0 ∈ S′(R^d) satisfies δ̂0 = (2π)^{−d/2}. Using polar coordinates, we find that ∫_{R^d} (1 + |ξ|²)^s dξ < ∞ if and only if s < −d/2. Hence δ0 ∈ H^s(R^d) if and only if s < −d/2.
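This convergence criterion is easy to probe numerically (an illustration, not from the notes): in d = 1 the truncated integrals ∫_{−R}^{R}(1 + ξ²)^s dξ stabilize for s < −1/2 and keep growing for s > −1/2.

```python
import numpy as np

def truncated_integral(s, R, n=200001):
    # int_{-R}^{R} (1 + xi^2)^s d xi, approximated by a Riemann sum
    xi = np.linspace(-R, R, n)
    return ((1 + xi**2) ** s).sum() * (xi[1] - xi[0])

# s = -1 < -1/2: the integrals stabilize (the limit is pi), so delta_0 in H^{-1}(R)
conv = [truncated_integral(-1.0, R) for R in (10, 100, 1000)]
# s = -1/4 > -1/2: the truncated integrals grow without bound, so delta_0 not in H^{-1/4}(R)
div = [truncated_integral(-0.25, R) for R in (10, 100, 1000)]
```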
Riesz’ representation theorem (Theorem A.12) implies that the dual space of H^s(R^d) can be identified with H^s(R^d) itself. The element v ∈ H^s(R^d) is here identified with the bounded linear functional

H^s(R^d) ∋ u ↦ (u, v)_{H^s(R^d)} = ∫_{R^d} (1 + |ξ|²)^s û(ξ) \overline{v̂(ξ)} dξ.   (5.7)

The dual can also be identified with H^{−s}(R^d) using an extension of the usual pairing between S′(R^d) and S(R^d). Rewriting

hw, ϕi = ∫_{R^d} ŵ(ξ)ϕ̂(−ξ) dξ = ∫_{R^d} hξi^{−s} ŵ(ξ) hξi^s ϕ̂(−ξ) dξ,

we see that h·, ·i extends to a bilinear map H^{−s}(R^d) × H^s(R^d) → C. It is also clear that the functional defined by v ∈ H^s(R^d) according to (5.7) is identical to the functional u ↦ hw, ui, where w := Λ^{2s} C v. Here C v = \overline{v} for v ∈ S(R^d), extended by duality to S′(R^d), and Λ^{2s} := F^{−1} M_{hξi^{2s}} F : H^s(R^d) → H^{−s}(R^d). Whenever one identifies the dual with either H^s(R^d) or H^{−s}(R^d) one has to keep in mind how the identification is done.

Proposition 5.34. C0∞ (Rd ) is dense in H s (Rd ).


Proof. We can approximate hξi^s û arbitrarily well in L²(R^d) by ϕ ∈ C0^∞(R^d). Hence, u is well approximated in H^s(R^d) by F^{−1}(hξi^{−s} ϕ) ∈ S(R^d). Since C0^∞(R^d) is dense in S(R^d) it follows that C0^∞(R^d) is also dense in H^s(R^d).

In particular this implies that H s (Rd ) is the completion of C0∞ (Rd ) with respect to the norm
k · kH s (Rd ) . This gives an alternative way of introducing Sobolev spaces.
Sobolev spaces are typically used in connection with PDE. One usually proves the ex-
istence of a solution in some Sobolev space. It is of interest to know when the solution is
sufficiently differentiable in the usual sense to define a classical solution of the PDE.
Definition 5.35. Ċ^k(R^d) is the space of k times continuously differentiable functions u : R^d → C such that lim_{|x|→∞} |∂^α u(x)| = 0 for all α ∈ N^d with |α| ≤ k, endowed with the norm ‖u‖_{C_b^k(R^d)} := max_{|α|≤k} sup_{R^d} |∂^α u(x)|.

Lemma 5.36. Ċ^k(R^d) is a closed subspace of the Banach space C_b^k(R^d).

Proof. Let {u_n} ⊂ Ċ^k(R^d) with u_n → u in C_b^k(R^d). For ε > 0 we can find a number N such that ‖u_n − u‖_{C_b^k(R^d)} < ε if n ≥ N. Choosing R such that |∂^α u_N(x)| < ε for |x| ≥ R and |α| ≤ k, we find that |∂^α u(x)| ≤ |∂^α u_N(x)| + ‖u − u_N‖_{C_b^k(R^d)} < 2ε if |x| ≥ R. Hence, u ∈ Ċ^k(R^d).

Note that this means that Ċk (Rd ) is itself a Banach space.

Theorem 5.37 (Sobolev embedding theorem). Let k ≥ 0. H^s(R^d) is continuously embedded in Ċ^k(R^d) when s > k + d/2, meaning that H^s(R^d) ⊂ Ċ^k(R^d) and that the inclusion map from H^s(R^d) to Ċ^k(R^d) is continuous.

Proof. We begin by proving the inequality

‖ϕ‖_{C_b^k(R^d)} ≤ C‖ϕ‖_{H^s(R^d)}

for ϕ ∈ S(R^d), where C = C(k, s, d). For α ∈ N^d with |α| ≤ k we have |ξ^α| ≤ hξi^k, and it follows from Hölder's inequality that

|∂^α ϕ(x)| ≤ (2π)^{−d/2} ∫_{R^d} hξi^k |ϕ̂(ξ)| dξ
= (2π)^{−d/2} ∫_{R^d} ( hξi^k / (1 + |ξ|²)^{s/2} ) (1 + |ξ|²)^{s/2} |ϕ̂(ξ)| dξ
≤ (2π)^{−d/2} ( ∫_{R^d} hξi^{2k} (1 + |ξ|²)^{−s} dξ )^{1/2} ( ∫_{R^d} (1 + |ξ|²)^s |ϕ̂(ξ)|² dξ )^{1/2}
= C‖ϕ‖_{H^s(R^d)},

where C = (2π)^{−d/2} ‖hξi^{k−s}‖_{L²(R^d)} < ∞, since hξi^{2(k−s)} is integrable when 2(k − s) < −d, that is, when s > k + d/2. Taking the maximum over all α we find that

‖ϕ‖_{C_b^k(R^d)} ≤ C‖ϕ‖_{H^s(R^d)}.   (5.8)

Pick {ϕ_n} ⊂ S(R^d) with ϕ_n → f in H^s(R^d) as n → ∞. The inequality ‖ϕ_n − ϕ_m‖_{C_b^k(R^d)} ≤ C‖ϕ_n − ϕ_m‖_{H^s(R^d)} follows from (5.8) and implies that {ϕ_n} is a Cauchy sequence in Ċ^k(R^d). Since Ċ^k(R^d) is a Banach space it follows that {ϕ_n} converges to some element g in Ċ^k(R^d). But then {ϕ_n} converges both to f and g in S′(R^d), so f = g as distributions. It follows from Theorem 5.3 that f = g a.e. Taking limits, one obtains (5.8) with f substituted for ϕ.

Remark 5.38. The elements of H s (Rd ) are actually equivalence classes, not functions. The
Sobolev embedding theorem implies that, for s > k + d/2, there is a continuous member of
each such class. As usual we identify the equivalence class with the continuous representative.
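To make the embedding concrete, here is a numerical check (an illustration only) for d = 1, k = 0, s = 1. The proof above gives the constant C = (2π)^{−1/2}‖hξi^{−1}‖_{L²} = 2^{−1/2}, so sup|u| ≤ 2^{−1/2}‖u‖_{H¹} should hold for any sample function.

```python
import numpy as np

# Check sup|u| <= 2^(-1/2) ||u||_{H^1(R)} on a grid; the H^1 norm is computed
# from P_1 = 1 + xi^2 on the physical side as int (u^2 + u'^2) dx.
x = np.linspace(-30.0, 30.0, 600001)
dx = x[1] - x[0]
u = np.exp(-x**2 / 2) * np.cos(3 * x)      # smooth, rapidly decaying test function
du = np.gradient(u, dx)
h1_norm = np.sqrt((u**2 + du**2).sum() * dx)
sup_norm = np.abs(u).max()
C = 1 / np.sqrt(2)
```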

Finally, we prove that H s (Rd ) is an algebra if s > d/2.

Theorem 5.39. H s (Rd ) is an algebra if s > d/2, meaning that u, v ∈ H s (Rd ) ⇒ uv ∈ H s (Rd )
and that there exists C = C(s, d) > 0 such that

kuvkH s (Rd ) ≤ CkukH s (Rd ) kvkH s (Rd ) . (5.9)

We need the following lemma.



Lemma 5.40. The inequality

hξ + ηi^s ≤ 2^{s−1}(hξi^s + hηi^s)

holds for s ≥ 1 and ξ, η ∈ R^d.

Proof. First note that

hξ + ηi = (1 + |ξ + η|²)^{1/2} ≤ (1 + |ξ|²)^{1/2} + |η| ≤ hξi + hηi   (5.10)

by the triangle inequality in R^{d+1} applied to the vectors (1, ξ) and (0, η). The function t ↦ t^s, t ≥ 0, is convex for s ≥ 1 (the derivative is increasing), and hence

((t₁ + t₂)/2)^s ≤ t₁^s/2 + t₂^s/2,

so that

(t₁ + t₂)^s ≤ 2^{s−1}(t₁^s + t₂^s),   (5.11)

when t₁, t₂ ≥ 0. The result follows by combining (5.10) and (5.11).
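Before using the lemma, one can stress-test it numerically (an illustration only; the exponent s = 3 and random points in R² are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def bracket(v):
    # Japanese bracket <v> = (1 + |v|^2)^(1/2) for vectors in R^2
    return np.sqrt(1 + np.sum(v**2, axis=-1))

s = 3.0
xi = rng.normal(scale=5.0, size=(1000, 2))
eta = rng.normal(scale=5.0, size=(1000, 2))
lhs = bracket(xi + eta) ** s
rhs = 2 ** (s - 1) * (bracket(xi) ** s + bracket(eta) ** s)
all_hold = bool(np.all(lhs <= rhs))
```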

Proof of Theorem 5.39. Note that F(uv) = RF^{−1}(uv) = (2π)^{−d/2} R(F^{−1}(u) ∗ F^{−1}(v)) = (2π)^{−d/2}(û ∗ v̂) for u, v ∈ S(R^d) by the convolution theorem. By continuity, we also have that F(uv) = (2π)^{−d/2}(û ∗ v̂) when u, v ∈ L²(R^d), and hence when u, v ∈ H^s(R^d), s > d/2. We therefore have that

(2π)^{d/2}‖uv‖_{H^s(R^d)} = ‖hξi^s (û ∗ v̂)‖_{L²(R^d)} = ( ∫_{R^d} hξi^{2s} |(û ∗ v̂)(ξ)|² dξ )^{1/2}.

Note that

hξi^s |(û ∗ v̂)(ξ)| ≤ ∫_{R^d} hξi^s |û(ξ − η)||v̂(η)| dη.

By the previous lemma,

hξi^s = hξ − η + ηi^s ≤ 2^{s−1}(hξ − ηi^s + hηi^s).

Hence,

hξi^s |(û ∗ v̂)(ξ)| ≤ 2^{s−1} ( ∫_{R^d} hξ − ηi^s |û(ξ − η)||v̂(η)| dη + ∫_{R^d} hηi^s |û(ξ − η)||v̂(η)| dη ).

By the special case (B.3) of Young's inequality, we therefore obtain that

‖hξi^s (û ∗ v̂)‖_{L²(R^d)} ≤ C(‖hξi^s û‖_{L²(R^d)} ‖v̂‖_{L¹(R^d)} + ‖û‖_{L¹(R^d)} ‖hξi^s v̂‖_{L²(R^d)}).

Since ‖hξi^s û‖_{L²(R^d)} = ‖u‖_{H^s(R^d)} and

‖û‖_{L¹(R^d)} = ∫_{R^d} hξi^{−s} hξi^s |û(ξ)| dξ ≤ ( ∫_{R^d} hξi^{−2s} dξ )^{1/2} ( ∫_{R^d} hξi^{2s} |û(ξ)|² dξ )^{1/2} ≤ C‖u‖_{H^s(R^d)},

where we have used the assumption that 2s > d, the result follows.
Chapter 6

Dispersive equations

In this chapter we will focus on nonlinear dispersive wave equations of the form
ut − a(Dx )u = F(u, Dx u).
Here F is some nonlinear function of u and Dx u, while a(Dx ) is a differential operator and
Dx = ∇/i. We assume that F vanishes to second order at the origin. By the equation being
dispersive, we mean that the linear part ut = a(Dx )u is dispersive. Recall that in the one-
dimensional case this means that the equation has sinusoidal waves ei(kx−ω(k)t) = eik(x−c(k)t) ,
where c(k) = ω(k)/k ∈ R is a non-constant function of k. Since the dispersion relation is given
by ω(k) = ia(k), one should assume that a(k) is a polynomial of order 2 or higher, which takes
purely imaginary values for k ∈ R.1 In the higher dimensional case, the sinusoidal waves take
the form ei(k·x−ω(k)t) = eik·(x−c(k)t) , with c(k) = ω(k)k/|k|2 = ia(k)k/|k|2 . In this case it is
more difficult to define what is meant by a dispersive equation. Even for the wave equation
one obtains that there is some weak dispersion in the sense that c(k) = ±k/|k| depends on
the direction of k.2 At the very least, one should assume that a takes imaginary values when
k ∈ Rd .
To illustrate the ideas we will concentrate on two particular examples:
(1) the nonlinear Schrödinger (NLS) equation: iut + ∆u = σ |u| p−1 u, where σ = ±1 and
p ≥ 3 is an odd integer;
(2) the Korteweg-de Vries (KdV) equation: ut + 6uux + uxxx = 0.
Taking NLS as an example, we first consider the Cauchy problem
iut + ∆u = f(x,t) (6.1)
with u(x, 0) = u0 (x). Using Duhamel’s formula, we express the solution u = L(u0 , f ) in terms
of u0 and f , where L is a linear operator. A solution of the NLS equation is then obtained
by solving the fixed-point equation u = L(u0 , σ |u| p−1 u) using Banach’s fixed point theorem.
In order to carry out this procedure, we need to identify a suitable function space in which to
study (6.1). We begin with a short introduction to calculus in Banach spaces.
1 If a(k) = ia0 + ia1 k, with a0, a1 ∈ R \ {0}, we obtain the transport equation vt = a1 ∂x v after replacing u by v := e^{−ia0 t} u. This equation is therefore not truly dispersive.
2 The wave equation is second order in time, and doesn't quite fit into the above framework.


6.1 Calculus in Banach spaces


We restrict ourselves to functions of one real variable with values in a Banach space (X, k · k).
Definition 6.1. Let I be an open interval in R.
(1) A function u : I → X is said to be continuous at t0 ∈ I if limt→t0 u(t) = u(t0 ). It is said
to be continuous if it is continuous at all points of I. C(I; X) is the vector space of
continuous functions I → X.
(2) A function u : I → X is said to be differentiable at t0 with derivative u′(t0) ∈ X if

lim_{h→0} (u(t0 + h) − u(t0))/h = u′(t0)
in X. u is said to be differentiable if it is differentiable at all points in I. Ck (I; X) is the
vector space of k times continuously differentiable functions u : I → X.
If I is any interval (not necessarily open), we say that u ∈ Ck (I; X) if it is k times continu-
ously differentiable in the interior of I and all the derivatives extend continuously in X to the
endpoints. From Proposition 6.2 below it then follows that u has one-sided derivatives at the
endpoints.3

Assume that I is a compact interval. One easily verifies that t ↦ ‖u(t)‖ is continuous if u ∈ C(I; X). From this it follows that ‖u(·)‖ is bounded. C(I; X) is therefore a normed vector space with

‖u‖_{C(I;X)} = sup_{t∈I} ‖u(t)‖_X.

The proof that C(I) is a Banach space can be repeated word for word to prove that C(I; X) is a Banach space. The assumption that X is a Banach space is essential here.
The mean-value theorem is replaced by the following result.
Proposition 6.2. Assume that u ∈ C([a, b]; X) and that u is differentiable on (a, b). Then

‖u(b) − u(a)‖ ≤ (b − a) sup_{t∈(a,b)} ‖u′(t)‖.

Proof. If sup_{t∈(a,b)} ‖u′(t)‖ = ∞ there is nothing to prove, so we can assume that it is finite. Let C > sup_{t∈(a,b)} ‖u′(t)‖. Take α, β ∈ (a, b) with α < β and let I = {t ∈ [α, β] : ‖u(t) − u(α)‖ ≤ C(t − α)}. I is non-empty since it contains α. It is also closed since u is continuous. Hence it has a largest element β⋆. If β⋆ < β, we obtain that

‖u(β⋆ + h) − u(α)‖ ≤ ‖u(β⋆ + h) − u(β⋆)‖ + ‖u(β⋆) − u(α)‖ ≤ Ch + C(β⋆ − α) = C(β⋆ + h − α)

if h > 0 is sufficiently small, since ‖u(β⋆ + h) − u(β⋆)‖ ≤ h‖u′(β⋆)‖ + o(h) < Ch as h → 0+ according to the definition of differentiability and the triangle inequality. It follows that β⋆ + h ∈ I, contradicting the maximality of β⋆, and therefore β⋆ = β. Hence, ‖u(β) − u(α)‖ ≤ C(β − α) and since u is continuous on [a, b] it follows that ‖u(b) − u(a)‖ ≤ C(b − a). As C > sup_{t∈(a,b)} ‖u′(t)‖ is arbitrary we obtain the desired result.

3 For instance, if u′(a) = lim_{t→a+} u′(t), it follows from Proposition 6.2 applied to v(t) = u(t) − u(a) − u′(a)(t − a) that ‖u(a + h) − u(a) − u′(a)h‖ = o(h) as h → 0+.

We immediately obtain the following corollary.


Corollary 6.3. Let u ∈ C¹(I; X) with u′(t) = 0 on I. Then u(t) is constant on I.

We also need to integrate Banach-space-valued functions. For our purposes it suffices to consider continuous functions defined on a compact interval I, for which the theory is very simple. There is also a generalization of the Lebesgue integral for Banach-space-valued functions called the Bochner integral, but we will not need this here (see e.g. Yosida [29]). A step function s : I → X is a piecewise constant function from I to X, that is,

s = ∑_{k=1}^N χ_{I_k}(t) s_k,   (6.2)

s_k ∈ X, where {I_k}_{k=1}^N is a partition of I into pairwise disjoint intervals (I = ∪_{k=1}^N I_k and I_j ∩ I_k = ∅ if j ≠ k). We define

∫_I s(t) dt = ∑_{k=1}^N s_k |I_k|,

where |I_k| is the length of the interval I_k. The representation (6.2) is not unique, but the integral is independent of the particular representation. One easily verifies that the integral is linear and that the triangle inequality ‖∫_I s(t) dt‖ ≤ ∫_I ‖s(t)‖ dt holds. The integral of a continuous function u is defined by approximating u with step functions. Precisely as for the Riemann integral, one uses the fact that a continuous function on a compact set is uniformly continuous.4
Proposition 6.4. Assume that f ∈ C(I; X). Then there exists a sequence of step functions {s_n} such that s_n(t) → f(t) uniformly on I (sup_{t∈I} ‖s_n(t) − f(t)‖_X → 0 as n → ∞). The sequence {∫_I s_n(t) dt} converges in X and the limit, denoted ∫_I f(t) dt, is the same for any sequence of step functions which converges uniformly to f on I.

Proof. The existence of such a sequence is proved by taking s_n(t) = ∑_{1≤k≤N_n} f(t_k) χ_{I_{n,k}} with t_k ∈ I_{n,k} for any sequence of partitions {I_{n,k}}_{k=1}^{N_n} of I with max_k |I_{n,k}| → 0 as n → ∞. Since a continuous function on a compact interval is uniformly continuous, one easily shows that s_n(t) → f(t) uniformly on I. But this means that ∫_I ‖s_n(t) − f(t)‖ dt → 0 and hence ∫_I ‖s_n(t) − s_m(t)‖ dt → 0 as n, m → ∞ by the triangle inequality in X. Using the triangle inequality for the integral, we find that {∫_I s_n(t) dt} is a Cauchy sequence, and hence that it has a limit in X. The fact that this limit is independent of the particular sequence of step functions follows again by an application of the triangle inequality.
4 This can be proved using the Bolzano-Weierstraß theorem, exactly as for real-valued functions.

We define ∫_b^a u(t) dt := −∫_a^b u(t) dt when a < b. The proof of the following proposition is left to the reader.
left to the reader.
Proposition 6.5. Let X, Y be Banach spaces, I = [a, c] with a < b < c, f, g ∈ C(I; X), α, β ∈ C and T : X → Y a bounded linear operator. Then

(1) ∫_I (α f(t) + β g(t)) dt = α ∫_I f(t) dt + β ∫_I g(t) dt;
(2) ∫_I f(t) dt = ∫_a^b f(t) dt + ∫_b^c f(t) dt;
(3) ‖∫_I f(t) dt‖_X ≤ ∫_I ‖f(t)‖_X dt;
(4) T(∫_I f(t) dt) = ∫_I T f(t) dt.
The usual relation between derivatives and integrals holds.
Rt
Proposition 6.6. Let u ∈ C([a, b]; X). Then t ↦ ∫_a^t u(τ) dτ is differentiable on (a, b) with

(d/dt) ∫_a^t u(τ) dτ = u(t).

Proof. We simply note that

(1/h)( ∫_a^{t+h} u(τ) dτ − ∫_a^t u(τ) dτ ) − u(t) = (1/h) ∫_t^{t+h} (u(τ) − u(t)) dτ.

If ε > 0 is given, we can choose δ > 0 such that ‖u(τ) − u(t)‖ ≤ ε if |τ − t| ≤ δ. We then obtain that

‖(1/h) ∫_t^{t+h} (u(τ) − u(t)) dτ‖ ≤ (1/|h|) | ∫_t^{t+h} ‖u(τ) − u(t)‖ dτ | ≤ ε

if |h| ≤ δ. The result follows.


Combining this with Corollary 6.3 we obtain the following result.
Corollary 6.7. Let f ∈ C(I; X) and t0 ∈ I. Then u ∈ C¹(I; X) with u′(t) = f(t) on I if and only if u(t) = ∫_{t0}^t f(τ) dτ + u0 for some u0 ∈ X.
As an application of the above, we record the following existence and uniqueness result
for ordinary differential equations in Banach spaces.
Theorem 6.8. Let I = [0, b] for some b > 0. Assume that f ∈ C(X × I; X) and that for every R > 0 there exists a constant L ≥ 0 such that ‖f(x,t) − f(y,t)‖ ≤ L‖x − y‖ for t ∈ I and x, y ∈ B_R(0) ⊂ X. Then there exists a T > 0 such that the Cauchy problem

u′(t) = f(u(t), t),  u(0) = u0,

has a unique solution u ∈ C¹([0, T]; X).
The proof of the theorem is precisely the same as for ODE in Rn . The reader is invited to
fill in the details. The theorem has very limited applicability to PDE, however. Consider e.g.
the nonlinear Schrödinger equation ut = i∆u − iσ |u| p−1 u. Taking X = H s (Rd ) as a Sobolev
space, one finds that f (u,t) = i∆u − iσ |u| p−1 u does not define a continuous function from
X × I to X. As we will see, a modification of the method is on the other hand useful for
solving the NLS equation.
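Although Theorem 6.8 does not apply directly to NLS, its proof method (Picard iteration for the equivalent integral equation u(t) = u0 + ∫_0^t f(u(τ), τ) dτ) is easy to illustrate numerically. The sketch below is an illustration only, with the hypothetical choices f(u) = u, u0 = 1 and X = C([0, 1/2]); the iterates converge to the exact solution e^t.

```python
import numpy as np

T, n = 0.5, 2001
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]

def picard(u):
    # One Picard step u -> u0 + int_0^t f(u(s), s) ds with f(u) = u, u0 = 1,
    # the integral approximated by the cumulative trapezoidal rule.
    incr = np.cumsum((u[1:] + u[:-1]) / 2) * dt
    return 1.0 + np.concatenate(([0.0], incr))

u = np.ones(n)                      # initial guess: the constant function u0
for _ in range(30):
    u = picard(u)
err = np.max(np.abs(u - np.exp(t))) # compare with the exact solution of u' = u
```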

6.2 NLS
The nonlinear Schrödinger equation shows up in many different models for physical phenom-
ena. The main reason is that it appears as a universal model when one looks for solutions of
wave packet type. To explain this we concentrate on one example, namely the KdV equation.
Several other interesting examples are provided by Sulem & Sulem [26].

Example 6.9. We look for an approximate solution of the KdV equation ut + 6uux + uxxx = 0 of the form

u(x,t) = εA1(X,T)e^{i(kx−ωt)} + ε\overline{A1}(X,T)e^{−i(kx−ωt)}
+ ε²A2(X,T)e^{2i(kx−ωt)} + ε²\overline{A2}(X,T)e^{−2i(kx−ωt)} + ε²A0(X,T),

where T = ε²t, X = ε(x + 3k²t) and ω = −k³. The functions A_j are complex-valued, and the bars indicate complex conjugation. Note that u is therefore real-valued. The parameter ε is assumed to be small, so that u is given to leading order by the wave packet εA1(X,T)e^{i(kx−ωt)} + ε\overline{A1}(X,T)e^{−i(kx−ωt)}. Note that this can be rewritten in the form 2εR(X,T)cos(kx − ωt + θ(X,T)), where R = |A1| and θ = arg A1. Since the variable T changes at a slower rate than X, which in turn changes at a slower rate than kx − ωt, we roughly have an envelope 2εR(X,T) travelling with group velocity c_g = −3k² and a carrier wave cos(kx − ωt + θ(X,T)) with wave number k and phase velocity c = −k². To see how the NLS equation arises, we substitute the above ansatz into the equation. Equating terms of the same order in ε, we arrive at the following equations:

k²A0 = −2|A1|²,
∂_T A1 = −3ik ∂_X² A1 − 6ik(A2 \overline{A1} + A1 A0),
k²A2 = A1²,

where two integration constants have been set to 0. When these equations are satisfied, we formally find that ut + 6uux + uxxx = O(ε⁴). Combining the three equations, we obtain the cubic one-dimensional NLS equation

i∂_T A1 − 3k ∂_X² A1 + (6/k)|A1|² A1 = 0

(the minus sign in front of the second order derivative can be removed by changing T to −T). For a rigorous justification of this approximation we refer to Schneider [23].
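A small numerical cross-check (an illustration only) of the kinematics used above: for ω(k) = −k³ the group velocity ω′(k) = −3k², matching the moving frame X = ε(x + 3k²t), while the phase velocity ω(k)/k = −k² is different, so the envelope and the carrier wave travel at different speeds.

```python
def omega(k):
    return -k**3          # KdV linear dispersion relation, from u_t + u_xxx = 0

k, h = 1.7, 1e-6
group = (omega(k + h) - omega(k - h)) / (2 * h)   # central difference for d(omega)/dk
phase = omega(k) / k
```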

6.2.1 Linear theory


We first consider the Cauchy problem for the inhomogeneous Schrödinger equation

iut + ∆u = f,  t > 0,
u = u0,  t = 0,   (6.3)

where f and u0 are given functions. If u0 ∈ S(R^d) and f ∈ C^∞([0, ∞); S(R^d)) there is a unique solution u ∈ C^∞([0, ∞); S(R^d)) given by

u(t) = S(t)u0 − i ∫_0^t S(t − τ) f(τ) dτ   (6.4)

according to Theorem 3.36. Here S(t)ϕ = F^{−1}(e^{−it|ξ|²} ϕ̂) for ϕ ∈ S(R^d). Note that we can write the solution of (6.3) with f ≡ 0 as

u(x,t) = F^{−1}(e^{−it|ξ|²} û0) = K_t ∗ u0,

where K_t = (2π)^{−d/2} F^{−1}(e^{−it|ξ|²}) ∈ S′(R^d).
Lemma 6.10. Let z ∈ C with Re z > 0. Then

F(e^{−z|·|²/2})(ξ) = (1/z^{d/2}) e^{−|ξ|²/(2z)},   (6.5)

where z^{d/2} is defined using the principal branch of the square root if d is odd.

Proof. The assumption Re z > 0 implies that e^{−z|·|²/2} ∈ S(R^d) so that F(e^{−z|·|²/2}) ∈ S(R^d). Moreover, the rapid decay implies that

F(e^{−z|·|²/2})(ξ) = (2π)^{−d/2} ∫_{R^d} e^{−z|x|²/2 − ix·ξ} dx

is complex differentiable as a function of z for each fixed ξ ∈ R^d (the order of complex differentiation and integration can be interchanged). We therefore have that z ↦ F(e^{−z|·|²/2}) is analytic in the right half-plane Re z > 0. On the other hand, the function z^{−d/2} e^{−|ξ|²/(2z)} is also analytic in the right half-plane and the two functions agree when Im z = 0 (Example 2.7). The unique continuation principle for analytic functions therefore proves that they are equal in the whole right half-plane.
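Formula (6.5) can be verified numerically at a sample point z with Re z > 0 (an illustration only, in d = 1, with the principal branch of the square root taken by numpy's complex power):

```python
import numpy as np

z = 1.0 + 1.0j                      # sample point with Re z > 0
x = np.linspace(-40.0, 40.0, 400001)
dx = x[1] - x[0]
f = np.exp(-z * x**2 / 2)

def lhs(xi):
    # F(e^{-z|.|^2/2})(xi) with convention (2*pi)^(-1/2) * int f(x) e^{-i x xi} dx
    return (f * np.exp(-1j * x * xi)).sum() * dx / np.sqrt(2 * np.pi)

def rhs(xi):
    return z**-0.5 * np.exp(-xi**2 / (2 * z))   # right-hand side of (6.5), d = 1

errs = [abs(lhs(xi) - rhs(xi)) for xi in (0.0, 1.5, 3.0)]
```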
Proposition 6.11. The formula (6.5) extends to z ∈ C with Re z = 0 and z ≠ 0.

Proof. Let z be as in the statement of the proposition. By the previous lemma we have that

he^{−(z+ε)|·|²/2}, ϕ̂i = hF(e^{−(z+ε)|·|²/2}), ϕi = h(z + ε)^{−d/2} e^{−|·|²/(2(z+ε))}, ϕi

for ε > 0. Letting ε → 0+ we find that

he^{−z|·|²/2}, ϕ̂i = hz^{−d/2} e^{−|·|²/(2z)}, ϕi.

Theorem 6.12. For u0 ∈ S(R^d) the solution of (6.3) with f ≡ 0 can be written

(S(t)u0)(x) = (1/(4πit)^{d/2}) ∫_{R^d} e^{i|x−y|²/(4t)} u0(y) dy   (6.6)

for t ≠ 0.

Proof. From the previous proposition, applied with z = 2it, we find that

(2π)^{d/2} K_t(x) = F^{−1}(e^{−it|·|²})(x) = F(e^{−2it|·|²/2})(−x) = (1/(2it)^{d/2}) e^{i|x|²/(4t)}.

We therefore find that

S(t)u0 = (1/(4πit)^{d/2}) e^{i|·|²/(4t)} ∗ u0   (6.7)

in the sense of distributions, meaning that

hS(t)u0, ϕi = h(1/(4πit)^{d/2}) e^{i|·|²/(4t)}, R(u0) ∗ ϕi

for all ϕ ∈ S(R^d). On the other hand, the convolution integral in (6.6) converges and defines a continuous, bounded function of x for each t ≠ 0. Since u0(x − y)ϕ(x) is in L¹(R^d × R^d), we can change the order of integration to show that

h(1/(4πit)^{d/2}) ∫_{R^d} e^{i|·−y|²/(4t)} u0(y) dy, ϕi
= h(1/(4πit)^{d/2}) ∫_{R^d} e^{i|y|²/(4t)} u0(· − y) dy, ϕi
= ∫_{R^d} ( (1/(4πit)^{d/2}) ∫_{R^d} e^{i|y|²/(4t)} u0(x − y) dy ) ϕ(x) dx
= ∫_{R^d} ( ∫_{R^d} u0(x − y)ϕ(x) dx ) (1/(4πit)^{d/2}) e^{i|y|²/(4t)} dy
= h(1/(4πit)^{d/2}) e^{i|·|²/(4t)}, R(u0) ∗ ϕi,

so that the identity (6.7) actually holds pointwise. This proves (6.6).

Proposition 6.13. Let u0 ∈ S(R^d). Then

‖S(t)u0‖_{L^p(R^d)} ≤ (4π|t|)^{−(d/2)(1/q − 1/p)} ‖u0‖_{L^q(R^d)}

for p ∈ [2, ∞], where 1/p + 1/q = 1.

Proof. For p = 2 we have that q = 2 and the result follows from Parseval's formula since e^{−it|ξ|²} has modulus 1. For p = ∞, the result follows from the representation formula (6.6). For p ∈ (2, ∞) the result follows from the Riesz-Thorin interpolation theorem (Theorem B.28).
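The p = ∞ decay can be seen numerically (an illustration only, in d = 1, with S(t) implemented via the FFT). For Gaussian data u0(x) = e^{−x²/2} one has ‖u0‖_{L¹} = √(2π), and Lemma 6.10 with z = 1 + 2it gives the exact sup-norm (1 + 4t²)^{−1/4}, so the estimate ‖S(t)u0‖_∞ ≤ (4π|t|)^{−1/2}‖u0‖_{L¹} is nearly saturated.

```python
import numpy as np

N, L = 4096, 200.0
dx = L / N
x = (np.arange(N) - N // 2) * dx
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)
u0 = np.exp(-x**2 / 2)                       # Gaussian data, ||u0||_{L1} = sqrt(2*pi)

times = (1.0, 4.0, 16.0)
sups, bounds = [], []
for t in times:
    ut = np.fft.ifft(np.exp(-1j * t * xi**2) * np.fft.fft(u0))   # u(t) = S(t)u0
    sups.append(np.abs(ut).max())
    bounds.append((4 * np.pi * t) ** -0.5 * np.sqrt(2 * np.pi))  # dispersive bound
exact = [(1 + 4 * t**2) ** -0.25 for t in times]                 # exact sup-norm
```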

Recall that solutions of the heat equation also have decay properties (see Section 3.1.4). The decay for the Schrödinger equation is of a different nature. This can be seen by looking at an individual Fourier mode û(t)e^{iξ·x}. For the heat equation the solution has the form û(0)e^{ix·ξ−|ξ|²t}, while the corresponding solution of the Schrödinger equation is û(0)e^{ix·ξ−i|ξ|²t}. Thus, the individual Fourier modes decay for the heat equation, but not for Schrödinger's equation. In the latter case, the decay is instead related to the fact that different modes travel with different speeds, that is, dispersion. An initially localized solution consists of a continuous linear combination of different Fourier modes. The dispersion of these modes causes the decay.

The solution formula (6.4) can be extended to u0 ∈ S 0 (Rd ) and f ∈ C([0, T ); S 0 (Rd ))
if one interprets the integral in a suitable weak sense. Here we shall content ourselves with
data in Sobolev spaces. We assume that u0 ∈ H s (Rd ) and that f ∈ C([0, T ); H s (Rd )) for some
T > 0. We begin by recording some properties of the propagator S(t).

Proposition 6.14. The family of operators {S(t)}_{t∈R} defined for u ∈ H^s(R^d) by S(t)u = F^{−1}(e^{−i|ξ|²t} û) satisfies the following properties:

(1) S(t) is a unitary operator on H^s(R^d);

(2) S(t) commutes with partial derivatives, meaning that ∂_j(S(t)u) = S(t)(∂_j u);

(3) S(0) = Id;

(4) S(t1 + t2) = S(t1)S(t2);

(5) the map t ↦ S(t) is strongly continuous, meaning that t ↦ S(t)u is in C(R; H^s(R^d)) for every u ∈ H^s(R^d);

(6) for u ∈ H^s(R^d) we have that t ↦ S(t)u is in C¹(R; H^{s−2}(R^d)) with

(d/dt)(S(t)u) = i∆S(t)u.

Proof. We work on the Fourier transform side, where S(t) corresponds to multiplication with e^{−i|ξ|²t}. Properties (1)–(4) now follow from the facts that e^{−i|ξ|²t} has modulus 1 and satisfies e^{−i|ξ|²(t1+t2)} = e^{−i|ξ|²t1} e^{−i|ξ|²t2}, while ∂_j corresponds on the Fourier side to multiplication with iξ_j. Property (5) is proved by noting that t ↦ e^{−i|ξ|²t} is continuous for fixed ξ and applying the dominated convergence theorem. Property (6) is also proved using the dominated convergence theorem and the facts that ∂_t e^{−i|ξ|²t} = −i|ξ|² e^{−i|ξ|²t} and F(∆u) = −|ξ|² û.

Properties (3)–(5) are the defining properties of a strongly continuous group of operators
on a normed vector space (it is also assumed that the operators are bounded).
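Properties (1) and (4) of Proposition 6.14 can be illustrated numerically. The following sketch approximates R by a large periodic grid (an assumption of the demo) and applies the discrete analogue of S(t) via the FFT:

```python
import numpy as np

# Discrete sketch of S(t) = F^{-1} e^{-i|xi|^2 t} F in one dimension.
# The real line is approximated by a periodic grid (an assumption of this demo).
N, L = 256, 40.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # discrete Fourier multipliers

def S(t, u):
    """Apply the free Schroedinger propagator to grid samples of u."""
    return np.fft.ifft(np.exp(-1j * xi**2 * t) * np.fft.fft(u))

u0 = np.exp(-x**2)                             # Gaussian initial datum

# (1) unitarity: the discrete L^2 norm is preserved
norm0, norm1 = np.linalg.norm(u0), np.linalg.norm(S(0.7, u0))

# (4) group property: S(t1 + t2) = S(t1) S(t2)
err_group = np.max(np.abs(S(0.7, u0) - np.array(S(0.3, S(0.4, u0)))))
```

Dispersion is visible in the same experiment: for moderate times (e.g. t = 1) the sup norm of S(t)u0 is noticeably smaller than that of u0, even though the L² norm is unchanged.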

Proposition 6.15. Assume that f ∈ C([0, T ); H s (Rd )) and that u0 ∈ H s (Rd ). There exists a
unique solution u ∈ C([0, T ); H s (Rd )) ∩C1 ([0, T ); H s−2 (Rd )) of (6.3) given by (6.4).

Proof. Writing (6.4) in the form u(t) = S(t)u0 − iS(t) ∫₀ᵗ S(−τ) f (τ) dτ and using Proposition 6.14 it is easy to verify that u defines a solution in C([0, T ); H s (Rd )) ∩ C1 ([0, T ); H s−2 (Rd ))
of (6.3). On the other hand, if v is another solution, then w(t) := S(−t)(u(t) − v(t)) satisfies
the differential equation w0 (t) = 0 with w(0) = 0, so w(t) = 0 for all t by Corollary 6.3. Hence
u(t) ≡ v(t).

6.2.2 Nonlinear theory


We now return to the nonlinear Schrödinger equation

iut + ∆u = σ |u| p−1 u (6.8)

where σ = ±1 and p ≥ 3 is an odd integer. The two different cases, σ = −1 and σ = 1, are
referred to as focusing and defocusing, respectively. The distinction between these two cases
will become clear later on.

Local well-posedness of the Cauchy problem


The first goal is to prove the local well-posedness of the Cauchy problem
(
iut + ∆u = σ |u| p−1 u, t > 0,
(6.9)
u = u0 , t = 0,

We assume that u0 ∈ H s (Rd ) for some s > d/2. We are looking for a solution u ∈ C([0, T ); H s (Rd ))∩
C1 ([0, T ); H s−2 (Rd )). Repeated use of Theorem 5.39 shows that |u|^{p−1} u = u^{(p+1)/2} ū^{(p−1)/2} ∈ H s (Rd ) if u ∈ H s (Rd ), where we also use the fact that ū ∈ H s (Rd ) whenever u ∈ H s (Rd ).
Lemma 6.16. Let s > d/2 and f (z) = σ |z|^{p−1} z. The map u 7→ f (u) is locally Lipschitz
continuous on H s (Rd ), meaning that for each R > 0 there exists a constant C such that k f (u)−
f (v)kH s (Rd ) ≤ Cku − vkH s (Rd ) , whenever u, v ∈ BR (0) ⊂ H s (Rd ).

Proof. Since f (z) is a polynomial in z and z, we can find two polynomials P1 (z1 , z2 , z3 , z4 ) and
P2 (z1 , z2 , z3 , z4 ), such that

|u| p−1 u − |v| p−1 v = P1 (u, v, u, v)(u − v) + P2 (u, v, u, v)(u − v).

This proves that

k f (u) − f (v)kH s (Rd ) ≤ C(kP1 (u, v, u, v)kH s (Rd ) + kP2 (u, v, u, v)kH s (Rd ) )ku − vkH s (Rd ) .

Using Theorem 5.39 we find that kPj (u, v, u, v)kH s (Rd ) , j = 1, 2, is bounded by a constant
(depending on R) when kukH s (Rd ) , kvkH s (Rd ) ≤ R.

It follows that t 7→ f (u(t)) is in C([0, T ); H s (Rd )) if u ∈ C([0, T ); H s (Rd )). We cannot


directly view (6.8) as an ODE in H s (Rd ) since ∆ is not a bounded operator on H s (Rd ).
Proposition 6.15 shows that the problem is equivalent to
u(t) = S(t)u0 − i ∫₀ᵗ S(t − τ) f (u(τ)) dτ, (6.10)

where S(t) is the propagator for the linear Schrödinger equation.


Definition 6.17. A function u ∈ C([0, T ); H s (Rd )) is called a strong solution of (6.9) if it
satisfies (6.10).

It follows from Proposition 6.15 that a strong solution u is also in C1 ([0, T ); H s−2 (Rd ))
and solves (6.8) for each t ∈ (0, T ) with u(x, 0) = u0 (x). Just as for finite-dimensional ODE it
is simpler to work with the integral equation (6.10) than the original problem. Note however
that (6.10) differs from the usual integral equation
u(t) = u0 + i ∫₀ᵗ (∆u(τ) − f (u(τ))) dτ, (6.11)

which we would obtain if we were to treat the problem as an infinite-dimensional ODE. The
advantage with (6.10) is that the right hand side defines a continuous map on C([0, T ); H s (Rd ))
for s > d/2; this is not the case for (6.11).
Theorem 6.18. Let s > d/2. For any u0 ∈ H s (Rd ), there exists a time T1 = T1 (ku0 kH s (Rd ) ) > 0
such that the Cauchy problem (6.9) has a unique strong solution u ∈ C([0, T1 ]; H s (Rd )).
Proof. The strategy is to prove that the nonlinear operator defined by the right hand side of
(6.10) is a contraction if the time interval is sufficiently small. We choose X as the Banach
space
X = C([0, T1 ]; H s (Rd ))
where T1 is to be determined and let

K = {u ∈ X : kukX ≤ 2R},

where ku0 kH s (Rd ) < R. K is a closed subset of the Banach space X and hence a complete metric
space. We need to prove that F(K) ⊆ K and that kF(u) − F(v)kX ≤ θ ku − vkX for some θ ∈
(0, 1), where F(u) denotes the right hand side of (6.10). Note that kS(·)u0 kX = ku0 kH s (Rd ) < R.
Using Lemma 6.16 we find that there exists a constant C = C(R) > 0 such that

k f (u) − f (v)kH s (Rd ) ≤ Cku − vkH s (Rd )

for u, v ∈ H s (Rd ) with kukH s (Rd ) , kvkH s (Rd ) ≤ 2R. Therefore


k ∫₀ᵗ S(t − τ)( f (u(τ)) − f (v(τ))) dτ kH s (Rd ) ≤ CT1 ku − vkX , 0 ≤ t ≤ T1 .

Taking T1 = 1/(2C), the contraction property follows with θ = 1/2. We thus have kF(u) − F(v)kX ≤ (1/2)ku − vkX . Taking v = 0, it follows that kF(u)kX ≤ kF(0)kX + (1/2)kukX = ku0 kH s (Rd ) + (1/2)kukX ≤ 2R, so that F(K) ⊆ K. From Banach’s fixed point theorem (Theorem A.15) we therefore find a unique strong solution u ∈ K. It remains to prove that the solution is unique in X.
This follows if we can prove that any solution in X is in fact in K. To see this, we estimate
k f (u)kH s (Rd ) = k f (u) − f (0)kH s (Rd ) ≤ Cku − 0kH s (Rd ) = CkukH s (Rd ) and

kF(u)(t)kH s (Rd ) ≤ ku0 kH s (Rd ) + ∫₀ᵗ k f (u(τ))kH s (Rd ) dτ ≤ R + 2RCt = (1 + t/T1 )R,

as long as u(τ) remains in the ball of radius 2R around the origin in H s (Rd ). But this implies
that ku(t)kH s (Rd ) = kF(u)(t)kH s (Rd ) ≤ 2R when 0 ≤ t ≤ T1 , since ku(t)kH s (Rd ) is continuous
with ku(0)kH s (Rd ) < R. In other words u ∈ K. The proof is complete.
Note that T1 only depends on the Lipschitz constant C in Lemma 6.16. Since the optimal
constant is an increasing function of R, we can assume that T1 is a decreasing function of
ku0 kH s (Rd ) .
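The contraction scheme in the proof of Theorem 6.18 can be imitated numerically: iterating the map F on a discretized version of (6.10) produces a rapidly convergent sequence. A rough sketch for the cubic case p = 3, σ = 1; the periodic grid and the left-endpoint quadrature in τ are assumptions of the demo:

```python
import numpy as np

# Picard iteration for u(t) = S(t)u0 - i \int_0^t S(t - tau) f(u(tau)) dtau
# with f(u) = |u|^2 u, discretized on a periodic grid; T is small so that
# the map is a contraction, as in the proof of Theorem 6.18.
N, L, T, M = 128, 30.0, 0.05, 40
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
t = np.linspace(0, T, M)
dt = t[1] - t[0]

def S(s, u):
    return np.fft.ifft(np.exp(-1j * xi**2 * s) * np.fft.fft(u))

f = lambda u: np.abs(u)**2 * u
u0 = np.exp(-x**2)

u = np.array([S(tk, u0) for tk in t])          # first iterate: linear solution
for _ in range(5):                             # contraction iterations
    new = np.empty_like(u)
    for k, tk in enumerate(t):
        integral = sum(S(tk - t[j], f(u[j])) for j in range(k)) * dt
        new[k] = S(tk, u0) - 1j * integral
    gap, u = np.max(np.abs(new - u)), new
```

The gap between successive iterates shrinks by roughly the contraction factor θ per sweep, mirroring the fixed point argument.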
There is nothing special about the point t = 0. If the initial condition is prescribed at some
point t0 we instead obtain the integral equation
u(t) = S(t − t0 )u0 − i ∫_{t0}^t S(t − τ) f (u(τ)) dτ.

Suppose that u is a strong solution on [0, T1 ] and that v is a strong solution on [T1 , T2 ] with
initial data v0 = u(T1 ) at t = T1 . Set w(t) = u(t) on [0, T1 ] and w(t) = v(t) on [T1 , T2 ], so that
w ∈ C([0, T2 ]; H s (Rd )). Clearly w is a strong solution on [0, T1 ]. It is also a strong solution on
[0, T2 ] since
S(t − T1 )u(T1 ) − i ∫_{T1}^t S(t − τ) f (v(τ)) dτ
= S(t − T1 )S(T1 )u0 − i ∫₀^{T1} S(t − T1 )S(T1 − τ) f (u(τ)) dτ − i ∫_{T1}^t S(t − τ) f (v(τ)) dτ
= S(t)u0 − i ∫₀ᵗ S(t − τ) f (w(τ)) dτ,

for t ∈ [T1 , T2 ] by Proposition 6.14.


Suppose that u, v ∈ C([0, T ]; H s (Rd )) are two strong solutions for some T > T1 . By the
above theorem, the solutions coincide on [0, T1 ]. Let T? ≤ T be the maximal time such that
u(t) = v(t) on [0, T? ). By continuity it follows that they also coincide at T? . But then T? = T , for
otherwise we could solve the equation with initial data u(T? ) = v(T? ) at t = T? . The uniqueness
assertion of Theorem 6.18 proves that u(t) = v(t) in a neighbourhood of T? , giving a contradic-
tion. This proves that we have uniqueness on an arbitrary time interval. It also implies that we
can extend the local solution to some maximal interval in a unique way. Suppose that T is the
maximal existence time, meaning that there is a solution on [0, T ), which can not be extended
beyond T . If T < ∞ we have that limt→T ku(t)kH s (Rd ) = ∞. Indeed, otherwise there exists
an R > 0 and a sequence t j → T such that ku(t j )kH s (Rd ) ≤ R. If we solve the equation with
initial data u(t j ) at time t j ∈ [0, T ), the solution will exist at least on a time interval [t j ,t j + T1 ]
where T1 = T1 (ku(t j )kH s (Rd ) ) ≥ T1 (R) > 0 is the existence time provided by Theorem 6.18.
Choosing j sufficiently large we obtain a contradiction.
We can also prove that the solution depends continuously on the initial data. Indeed, if
u0 , v0 ∈ H s (Rd ), we obtain that
ku(t) − v(t)kH s (Rd ) ≤ ku0 − v0 kH s (Rd ) + C ∫₀ᵗ ku(τ) − v(τ)kH s (Rd ) dτ,

if u, v ∈ C([0, T ]; H s (Rd )) are the corresponding solutions (T is chosen strictly less than the
maximal existence time for each solution), ku(t)kH s (Rd ) , kv(t)kH s (Rd ) ≤ R on [0, T ] and C is
a Lipschitz constant for f (u) on {u ∈ H s (Rd ) : kukH s (Rd ) ≤ R}. It follows from Grönwall’s
inequality that
ku(t) − v(t)kH s (Rd ) ≤ ku0 − v0 kH s (Rd ) eCt , 0 ≤ t ≤ T.
In conclusion, we have the following result.

Theorem 6.19. There exists a maximal time T > 0 such that the Cauchy problem (6.9) has a
strong solution in C([0, T ); H s (Rd )). The solution is unique in this class and if T < ∞ then
limt→T ku(t)kH s (Rd ) = ∞. The solution depends continuously on the initial data in the sense
described above.

Global existence theory and conservation laws


We now turn to the global existence theory. For simplicity we will assume that d = 1 from
now on. One of the main ways of proving global existence is to use conserved quantities.
Define the functionals
M(u) = (1/2) ∫_R |u|² dx
and
E(u) = ∫_R ( (1/2)|ux |² + (σ/(p + 1))|u|^{p+1} ) dx.
One can formally show that M(u(t)) and E(u(t)) are independent of t by differentiating with
respect to t, using the equation and integrating by parts (see Theorem 6.23 below). The pa-
rameters p, σ don’t play a big role in the local well-posedness theory.5 They are extremely
important for the global theory, however. This can be realized by inspecting the conserved
functional E above. If σ = 1, we can estimate kuk²H 1 (R) ≤ 2(M(u) + E(u)). Thus we can
prevent the blow-up of the H 1 norm and obtain a global existence result for H 1 solutions. In
order to do this, we must first rigorously prove that E(u) and M(u) are conserved for such
solutions.
If u0 ∈ H s (R) with s > 1/2, Theorem 6.19 guarantees the existence of a maximal solution u ∈ C([0, Ts ); H s (R)). Let 1/2 < r < s. Since H s (R) ⊂ H r (R), we also obtain the existence of a solution in C([0, Tr ); H r (R)), where a priori we could have Tr 6= Ts . By the uniqueness assertion of Theorem 6.19, the two solutions must agree on their common domain of definition and clearly Ts ≤ Tr . One could however imagine that the solution remains in H s (R) only up to the time Ts < Tr and that it thereafter loses regularity (consider e.g. the weak solutions discussed in Chapter 4). We shall now prove that this cannot happen.

Lemma 6.20. Let 1/2 < r < s. Then

kuvkH s (R) ≤ C(kukH s (R) kvkH r (R) + kukH r (R) kvkH s (R) ).

5 They do play a role if one considers initial data with lower regularity.

Proof. From the proof of Theorem 5.39, we find that

kuvkH s (R) ≤ C(kukH s (R) kv̂kL1 (R) + kûkL1 (R) kvkH s (R) ).

On the other hand kûkL1 (R) ≤ CkukH r (R) by the same proof.

Remark 6.21. One actually has the stronger result

kuvkH s (Rd ) ≤ C(kukH s (Rd ) kvkL∞ (Rd ) + kukL∞ (Rd ) kvkH s (Rd ) )

for s ≥ 0, but the proof requires more sophisticated methods (see Tao [27]).

Theorem 6.22 (Persistence of regularity for NLS). Let u0 ∈ H s (R), where 1/2 < r < s. Then
Ts (u0 ) = Tr (u0 ).

Proof. Let T < Tr (u0 ), so that C := sup_{t∈[0,T ]} ku(t)kH r (R) < ∞. Repeated application of Lemma 6.20 shows that

ku(t)kH s (R) ≤ ku0 kH s (R) + ∫₀ᵗ k f (u(τ))kH s (R) dτ
≤ ku0 kH s (R) + K ∫₀ᵗ ku(τ)k^{p−1}_{H r (R)} ku(τ)kH s (R) dτ
≤ ku0 kH s (R) + KC^{p−1} ∫₀ᵗ ku(τ)kH s (R) dτ.

It follows from Grönwall’s inequality that

ku(t)kH s (R) ≤ ku0 kH s (R) e^{KC^{p−1} t} , 0 ≤ t < min{T, Ts },

so that T ≤ Ts (u0 ) ≤ Tr (u0 ). Since T < Tr (u0 ) is arbitrary we obtain that Ts (u0 ) = Tr (u0 ).

Theorem 6.23. Let u ∈ C([0, T ); H 1 (R)) be a strong solution of the NLS equation. Then the
quantities M(u(t)) and E(u(t)) are independent of t.

Proof. First assume that u0 ∈ H 3 (R) and let u ∈ C([0, T ); H 3 (R)) ∩C1 ([0, T ); H 1 (R)) be the
corresponding solution. We have that

d/dt M(u(t)) = Re ∫_R ut ū dx
= Re ∫_R i(uxx − σ |u|^{p−1} u)ū dx
= − Re ∫_R i(|ux |² + σ |u|^{p+1} ) dx
= 0

(note that u(t) ∈ H 3 (R) ⊂ Cb2 (R), so that the integration by parts is justified). Similarly,

d/dt E(u(t)) = Re ∫_R (uxt ūx + σ |u|^{p−1} ut ū) dx
= − Re ∫_R ut (ūxx − σ |u|^{p−1} ū) dx
= − Re ∫_R i|uxx − σ |u|^{p−1} u|² dx
= 0.

If u0 ∈ H 1 (R) we approximate it in H 1 (R) by a sequence {u0,n } ⊂ H 3 (R). By Theorems 6.18


and 6.22, there exist corresponding solutions {un } ⊂ C([0, T ]; H 3 (R)), where T > 0 can be
taken independent of n. We also have that

lim_{n→∞} kun − ukC([0,T ];H 1 (R)) = 0

by Theorem 6.19. By the above reasoning we have that E(un (t)) = E(u0,n ) and M(un (t)) =
M(u0,n ) for all t ∈ [0, T ]. Since the maps u 7→ E(u) and u 7→ M(u) are continuous from H 1 (R)
to R, it follows that
E(u(t)) = E(u0 ), M(u(t)) = M(u0 ).
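The conservation of M(u) is mirrored exactly by structure-preserving discretizations. In the split-step Fourier sketch below (not taken from the text; the periodic grid, step size and splitting scheme are assumptions of the demo), each substep is L²-unitary, so the discrete mass is conserved to roundoff:

```python
import numpy as np

# Strang splitting for the defocusing cubic NLS i u_t + u_xx = |u|^2 u:
# alternate the exact nonlinear flow u -> e^{-i|u|^2 dt} u with the linear
# propagator; both substeps preserve the discrete L^2 norm.
N, L, dt, steps = 256, 40.0, 1e-3, 500
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
lin = np.exp(-1j * xi**2 * dt)                 # linear multiplier for one step

u = np.exp(-x**2).astype(complex)
M0 = 0.5 * np.sum(np.abs(u)**2) * (L / N)      # discrete M(u0)
for _ in range(steps):
    u *= np.exp(-1j * np.abs(u)**2 * dt / 2)   # half nonlinear step
    u = np.fft.ifft(lin * np.fft.fft(u))       # full linear step
    u *= np.exp(-1j * np.abs(u)**2 * dt / 2)   # half nonlinear step
M1 = 0.5 * np.sum(np.abs(u)**2) * (L / N)      # discrete M(u(t))
```

The energy E(u) is not conserved exactly by this scheme, but its drift is small and does not grow secularly, which is typical of splitting methods.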

Theorem 6.24. If σ = 1 and u0 ∈ H s (R), s ≥ 1, the strong solution of (6.9) in H s (R) exists
for all t ≥ 0.

Proof. This follows from Theorems 6.19 and 6.23 and the inequality kuk2H 1 (R) ≤ 2(E(u) +
M(u)).
In the focusing case, we generally don’t have global existence. The problem is that one
generally can’t control the H 1 norm using M and E since the second term in E is negative. An
exception is when the nonlinearity is cubic.

Theorem 6.25. If σ = −1, p = 3 and u0 ∈ H s (R), s ≥ 1, the strong solution of (6.9) in H s (R)
exists for all t ≥ 0.

Lemma 6.26. Let u ∈ H 1 (R). Then

kuk²L∞ (R) ≤ 2kukL2 (R) ku′ kL2 (R) .

Proof. Assume that u ∈ C0∞ (R). Then

|u(x)|² = 2 Re ∫_{−∞}^x u′ (y)ū(y) dy,

yielding
kuk²L∞ (R) ≤ 2kukL2 (R) ku′ kL2 (R) .
The result now follows by using the fact that C0∞ (R) is dense in H 1 (R).
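The inequality of Lemma 6.26 is easy to test numerically; the sketch below checks it for a Gaussian, with the derivative computed spectrally on a periodic grid (an assumption of the demo):

```python
import numpy as np

# Check ||u||_inf^2 <= 2 ||u||_{L^2} ||u'||_{L^2} for u(x) = e^{-x^2}.
N, L = 1024, 40.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
dx = L / N

u = np.exp(-x**2)
ux = np.fft.ifft(1j * xi * np.fft.fft(u)).real   # spectral derivative

lhs = np.max(np.abs(u))**2                       # sup-norm squared (= 1 here)
rhs = 2 * np.sqrt(np.sum(u**2) * dx) * np.sqrt(np.sum(ux**2) * dx)
```

For the Gaussian the right hand side evaluates to 2(π/2)^{1/2} ≈ 2.51, so the inequality holds with room to spare.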

Proof of Theorem 6.25. We can estimate

∫_R |u|^{p+1} dx ≤ kuk^{p−1}_{L∞ (R)} kuk²_{L2 (R)} ≤ 2^{(p−1)/2} kuk^{(p+3)/2}_{L2 (R)} ku′ k^{(p−1)/2}_{L2 (R)} .

If p = 3 this gives

(1/4) ∫_R |u|⁴ dx ≤ (1/2)kuk³_{L2 (R)} ku′ kL2 (R) ≤ (1/4)ku′ k²_{L2} + (1/4)kuk⁶_{L2} ,

by the arithmetic-geometric inequality, so that

E(u) = (1/2)ku′ k²_{L2 (R)} − (1/4) ∫_R |u|⁴ dx ≥ (1/4)ku′ k²_{L2 (R)} − 2M(u)³ .

We can therefore bound kuk²_{H 1 (R)} ≤ 4(E(u) + M(u)) + 8M(u)³ . The conservation of E(u) and M(u) now implies that ku(t)kH 1 is bounded.

We conclude by showing a blow-up result. For this we need the additional result that if the
initial data satisfies xu0 ∈ L2 (R), then xu ∈ L2 (R) for all t for which the solution is defined.
This can e.g. be shown by proving a well-posedness result in the space

H 1,1 (R) = {u ∈ H 1 (R) : xu ∈ L2 (R)},

with norm kukH 1,1 (R) = (kuk2H 1 (R) + kxuk2L2 (R) )1/2 (see Tao [27]). We also need the follow-
ing result, which can formally be verified by carrying out the differentiation and integrating
by parts. The rigorous justification relies on an approximation argument, as in the proof of
Theorem 6.23.

Lemma 6.27. Define
V (u) = ∫_R x² |u(x)|² dx.
Then
d/dt V (u(t)) = 4 Im ∫_R x ū ux dx
and
d²/dt² V (u(t)) = 16E(u(t)) + (4σ (p − 5)/(p + 1)) ∫_R |u(t)|^{p+1} dx.

Theorem 6.28. Suppose that σ = −1 and p ≥ 5, and consider u0 ∈ H 1 (R) with V (u0 ) finite.
If
E(u0 ) < 0,
the solution with initial data u0 has finite existence time.

Proof. Abbreviate V (t) = V (u(t)) and E = E(u(t)) = E(u0 ). If p ≥ 5 and σ = −1, we obtain that V ′′ (t) ≤ 16E. Thus,

V (t) ≤ 8Et² +V ′ (0)t +V (0), 0 ≤ t < T,

with E negative by assumption. Hence, T < ∞ since we would otherwise obtain the contra-
diction V (t) < 0 for t large.

Remark 6.29. Note that E(λ u0 ) < 0 if λ > 0 is sufficiently large for a fixed function u0 ∈
H 1,1 (R), u0 6= 0.

6.3 KdV
As mentioned in the introduction, the KdV equation was introduced in an attempt to explain
the solitary water wave observed by John Scott Russell. The equation can be derived as an
approximation of the equations of fluid mechanics. Since the derivation is quite long it has
been placed at the end of the section.

6.3.1 Linear theory


The Cauchy problem for the linearized KdV equation is
(
ut + uxxx = f , t > 0,
(6.12)
u = u0 , t = 0,

where f and u0 are given. If u0 ∈ S (R) and f ∈ C∞ ([0, ∞); S (R)) there is a unique solution u ∈ C∞ ([0, ∞); S (R)) given by

u(t) = S(t)u0 + ∫₀ᵗ S(t − τ) f (τ) dτ, (6.13)

where S(t)ϕ = F −1 (e^{itξ³} ϕ̂) for ϕ ∈ S (R). Just as for the NLS equation this solution formula can be extended to u0 ∈ H s (R) and f ∈ C([0, T ); H s (R)). Using the same arguments as in the proof of Propositions 6.14 and 6.15, we obtain the following results.
Proposition 6.30. The family of operators {S(t)}t∈R defined for u ∈ H s (R) by S(t)u = F −1 (e^{iξ³t} û)
is a strongly continuous group of unitary operators on H s (R). The operator S(t) commutes
with ∂x for each t. If u ∈ H s (R) then t 7→ S(t)u is in C1 (R; H s−3 (R)) with

d/dt (S(t)u) = −∂x³ S(t)u.
Proposition 6.31. Assume that f ∈ C([0, T ); H s (R)) and that u0 ∈ H s (R). There exists a
unique solution u ∈ C([0, T ); H s (R)) ∩C1 ([0, T ); H s−3 (R)) of (6.12) given by (6.13).

Just as in the case of the Schrödinger equation, the solution can be expressed as a convo-
lution of u0 with a function Kt (x) if f ≡ 0 and u0 is sufficiently smooth (e.g. u0 ∈ S (R)).
Define the tempered distribution Ai ∈ S 0 (R) through Ai = (2π)−1/2 F −1 (e^{iξ³/3} ). We then have
u = Kt ∗ u0
in the sense of distributions, where Kt is the tempered distribution defined by
Kt (x) = (3t)−1/3 Ai((3t)−1/3 x).
Here we have abused notation slightly by writing Ai(λ x) instead of σλ Ai. This abuse of notation is justified since Ai is in fact given by a bounded, smooth function.
Lemma 6.32. The tempered distribution Ai is given by the smooth function defined by the path integral

(1/2π) ∫_{Im ζ =η} e^{iζ³/3+ixζ} dζ , (6.14)

where η > 0 is arbitrary.
Proof. We have that Re(iζ³/3 + ixζ ) = −ξ²η + η³/3 − xη for ζ = ξ + iη, so that the integral in (6.14) converges for each η > 0. Moreover, the function ζ 7→ e^{iζ³/3+ixζ} is analytic for fixed x,
showing that the integral is independent of η. Since the integrand is smooth (in fact, real
analytic) in x and all the derivatives decay rapidly as | Re ζ | → ∞, (6.14) defines a smooth
function of x. It remains to show that the distribution defined by this function coincides with
Ai. Changing the order of integration, we find that
⟨(1/2π) ∫_{Im ζ =η} e^{iζ³/3+ixζ} dζ , ϕ⟩ = (1/2π) ∫_R ( ∫_R e^{i(ξ +iη)³/3+ix(ξ +iη)} ϕ(x) dx ) dξ
→ (1/2π) ∫_R ( ∫_R e^{iξ³/3+ixξ} ϕ(x) dx ) dξ
= ⟨(2π)−1/2 e^{iξ³/3} , F −1 (ϕ)⟩
= ⟨Ai, ϕ⟩
as η → 0+ . Since (6.14) is independent of η, the result follows.
From (6.14) one also sees that

Ai′′ (x) − x Ai(x) = 0, (6.15)

since

Ai′′ (x) − x Ai(x) = −(1/2π) ∫_{Im ζ =η} (ζ² + x)e^{iζ³/3+ixζ} dζ
= −(1/2πi) lim_{R→∞} [e^{iζ³/3+ixζ}]_{ζ =−R+iη}^{ζ =R+iη}
= 0.

Equation (6.15) is called Airy’s equation.

Proposition 6.33. Ai(x) = O(|x|^{−1/4} ) as x → −∞, while Ai(x) = O(x^{−1/4} e^{−2x^{3/2}/3} ) as x → ∞.

Proof. We begin with the case x → ∞ and thus assume that x > 0. The derivative of ζ 7→ iζ³/3 + ixζ vanishes when ζ² = −x, that is, when ζ = ±i√x. Write ζ = ξ + iη. Along the line R + i√x we have

i(ξ + i√x)³/3 + ix(ξ + i√x) = iξ³/3 − √x ξ² − 2x^{3/2}/3,

and thus,

Ai(x) = (e^{−2x^{3/2}/3}/2π) ∫_R e^{−√x ξ² + iξ³/3} dξ . (6.16)

We can estimate

| ∫_R e^{−√x ξ² + iξ³/3} dξ | ≤ ∫_R e^{−√x ξ²} dξ = √π/x^{1/4} ,

yielding

| Ai(x)| ≤ e^{−2x^{3/2}/3}/(2√π x^{1/4} ), x > 0. (6.17)
One has to work a bit harder to find the asymptotic behaviour when x → −∞. Note first that (6.16) continues to hold when | arg x| < π. So does the estimate (6.17), if one replaces x^{3/2} with Re x^{3/2} and x^{1/4} with (Re x^{1/2})^{1/2} in the right hand side. Next note that Ai(ωx) is also a solution of (6.15) if ω is a cubic root of unity, that is, if ω = ωk , where ωk = e^{2πki/3} , k = 0, 1, 2. We claim that any two of these solutions are linearly independent. This follows since

Ai(0) = (1/2π) ∫_{R+i} e^{iζ³/3} dζ = (1/π) Re e^{iπ/6} ∫₀^∞ e^{−t³/3} dt = 3^{−1/6} Γ(1/3)/(2π),
Ai′ (0) = (1/2π) ∫_{R+i} iζ e^{iζ³/3} dζ = −(1/π) Im e^{iπ/3} ∫₀^∞ t e^{−t³/3} dt = −3^{1/6} Γ(2/3)/(2π),

where we have deformed the path R + i into {e^{−iπ/6} t : −∞ < t ≤ 0} ∪ {e^{iπ/6} t : 0 ≤ t < ∞}. Hence, Ai(ωx) has the value Ai(0) at 0, while its derivative at 0 is ω Ai′ (0), confirming the linear independence. We also have

∑_{k=0}^{2} ωk Ai(ωk x) = 0,

since the value of this solution of the Airy equation at 0 is Ai(0) ∑_{k=0}^{2} ωk = 0 and its derivative is Ai′ (0) ∑_{k=0}^{2} ωk² = 0. Expressing Ai(−r) = −ω1 Ai(−ω1 r) − ω2 Ai(−ω2 r), r > 0, with arg(−ω1 r) = −π/3 and arg(−ω2 r) = π/3, we therefore find that

| Ai(−r)| ≤ | Ai(−ω1 r)| + | Ai(−ω2 r)| ≤ Cr^{−1/4} , r > 0,

for some constant C > 0.



Remark 6.34. With a little more work one can also prove that Ai(−r) = π^{−1/2} r^{−1/4} (sin(2r^{3/2}/3 + π/4) + O(r^{−3/2} )) as r → ∞, which shows that the Airy function oscillates rapidly for large negative values of x. The treatment of the Airy function given here follows the approach in Hörmander [12].

Corollary 6.35. There exists a constant C > 0 such that

kS(t)u0 kL p (R) ≤ Ct^{−(1/3)(1/q−1/p)} ku0 kLq (R)

for u0 ∈ S (R) and p ∈ [2, ∞], where 1/p + 1/q = 1.
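The contour representation (6.14) also gives a practical way to evaluate Ai and to check the values derived above. A sketch (the contour height η = 1, the truncation of the line and the quadrature rule are assumptions of the demo):

```python
import numpy as np
from math import gamma, pi

# Evaluate Ai(x) = (1/2pi) \int_{Im z = 1} e^{i z^3/3 + i x z} dz.
# With z = s + i the real part of the exponent behaves like -s^2, so a
# truncated Riemann sum on a fine grid is accurate.
s = np.linspace(-30.0, 30.0, 40001)
z = s + 1j
ds = s[1] - s[0]

def Ai(x):
    return (np.sum(np.exp(1j * z**3 / 3 + 1j * x * z)) * ds).real / (2 * pi)

# Compare with the exact value Ai(0) = 3^{-1/6} Gamma(1/3) / (2 pi) from the proof
Ai0_exact = 3**(-1 / 6) * gamma(1 / 3) / (2 * pi)
err = abs(Ai(0.0) - Ai0_exact)
```

The rapid decay for x > 0 predicted by (6.17) is also visible: Ai(5) is already of order 10⁻⁴.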

6.3.2 Nonlinear theory


We next consider the Cauchy problem
(
ut + 6uux + uxxx = 0, t > 0,
(6.18)
u = u0 , t = 0,

where u is assumed to be real-valued. One could consider more general nonlinearities of the
form uk ux , where k ≥ 1 is an integer, but we shall concentrate on the case k = 1 for simplicity.
The local results hold for any power of the nonlinearity, while the global results depend heavily
on k.

Local well-posedness

The Cauchy problem for KdV is harder than for NLS, the reason being that the nonlinearity
contains derivatives of u. We begin by stating the main result. The proof is conceptually easy
but contains some technical details.

Theorem 6.36. Let u0 ∈ H k (R), k ≥ 2. There exists a maximal time T > 0 such that the
Cauchy problem (6.18) has a strong solution in C([0, T ); H k (R)). The solution is unique in
this class and if T < ∞, then limt→T ku(t)kH k (R) = ∞. The solution depends continuously on
the initial data.

Remark 6.37. The reason for using Sobolev spaces of integer order is to simplify the proof.
Using the same methods one can prove local well-posedness in H s (R) with s > 3/2.

We first prove uniqueness.

Lemma 6.38. Let u, v ∈ C([0, T ]; H 2 (R)) ∩ C1 ([0, T ]; H −1 (R)) be solutions of (6.18) with
initial data u0 ∈ H 2 (R). Then u = v.

Proof. Since w := u − v satisfies the equation wt + 3((u + v)w)x + wxxx = 0, it follows that

d/dt kw(t)k²_{L2 (R)} = −2⟨wxxx (t) + 3((u(t) + v(t))w(t))x , w(t)⟩
= 6 ∫_R wx (x,t)w(x,t)(u(x,t) + v(x,t)) dx
= −3 ∫_R w² (x,t)(ux (x,t) + vx (x,t)) dx
≤ Ckw(t)k²_{L2 (R)} ,

where C = 3 sup_{0≤t≤T} (kux (·,t)kL∞ (R) + kvx (·,t)kL∞ (R) ) < ∞ due to the Sobolev embedding H 2 (R) ⊂ Cb1 (R). Note that we have to interpret wxxx (t) + 3((u(t) + v(t))w(t))x as an element of H −1 (R) in the first line. The pairing with w(t) is well-defined since w(t) ∈ H 2 (R) ⊂ H 1 (R). Using Grönwall’s inequality and the fact that w(0) = 0, it follows that
w(t) ≡ 0 in [0, T ].

Remark 6.39. If we instead assume that u, v ∈ C([0, T ]; H 4 (R)) ∩C1 ([0, T ]; H 1 (R)), then all
the derivatives can be interpreted in the classical sense. The reason for working with the lower
regularity is that we need it to prove global existence.

The strategy for proving the existence of a local solution is to replace the Cauchy problem
(6.18) by the approximate problem
(
ut + 6uux + uxxx + εuxxxx = 0, t > 0,
(6.19)
u = u0 , t = 0,

with ε > 0. This is basically the same idea which was involved when we discussed uniqueness
for weak solutions of conservation laws in Section 4.3.3. In fact, one could instead have used
ut + 6uux + uxxx = εuxx , although this gets more technical.
The strategy is the following.

(1) Prove that (6.19) has a solution uε defined on some time interval [0, T (ε)).

(2) Prove that {uεn } converges to a solution of (6.18) as n → ∞, where {εn } is a sequence
of positive numbers which converges to 0.

There are two technical complications here. First of all, one must show that the lifespan
T (εn ) is bounded away from 0 as n → ∞. Secondly, we will only prove convergence in
C([0, T ]; H k−2 (R)) if u0 ∈ H k (R) in the second step. In order to prove the existence of a solu-
tion in C([0, T ]; H k (R)) we approximate the initial data by smooth functions and use the KdV
equation to construct a sequence of approximate solutions which converges in the stronger
norm.

The Cauchy problem (6.19) can be reformulated as the fixed point equation6

u(t) = Sε (t)u0 − 3 ∫₀ᵗ Sε (t − τ)(u² (τ))x dτ, (6.20)

where Sε (t) = e^{−t(ε∂x⁴ + ∂x³)} . We write Ŝε (t) = e^{−t(εξ⁴ − iξ³)} .

Lemma 6.40.
k∂x Sε (t) f kH k (R) ≤ C((εt)^{−1/2} + (εt)^{−1/4} )k f kH k−1 (R) .

Proof. The result follows by estimating kξ Ŝε (t)kL∞ (R) and kξ² Ŝε (t)kL∞ (R) , since

F (∂x Sε (t) f )(ξ ) = iξ Ŝε (t) fˆ(ξ ) and F (∂x² Sε (t) f )(ξ ) = −ξ² Ŝε (t) fˆ(ξ ).

We have

kξ Ŝε (t)kL∞ = kξ e^{−tεξ⁴} kL∞ = (tε)^{−1/4} k(tε)^{1/4} ξ e^{−tεξ⁴} kL∞ ≤ C(tε)^{−1/4} .

In a similar way one proves that kξ² Ŝε (t)kL∞ (R) ≤ C(tε)^{−1/2} .
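The constants in these multiplier bounds can be found by elementary calculus: |ξ |e^{−aξ⁴} is maximized at ξ = (4a)^{−1/4} and ξ²e^{−aξ⁴} at ξ⁴ = 1/(2a), where a = tε. A quick numerical confirmation (the grid is an assumption of the demo):

```python
import numpy as np

# max over xi of |xi| e^{-a xi^4} equals (4a)^{-1/4} e^{-1/4}, and
# max over xi of xi^2 e^{-a xi^4} equals (2a)^{-1/2} e^{-1/2}; here a = t*eps.
a = 0.01
xi = np.linspace(0.0, 50.0, 200001)

max1 = np.max(xi * np.exp(-a * xi**4))
max2 = np.max(xi**2 * np.exp(-a * xi**4))
exact1 = (4 * a)**(-0.25) * np.exp(-0.25)
exact2 = (2 * a)**(-0.5) * np.exp(-0.5)
```

This makes the scaling (tε)^{−1/4} and (tε)^{−1/2} in the lemma explicit.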

Lemma 6.41. Let T > 0, ε > 0 and f ∈ C([0, T ]; H k (R)). There exists C = C(T, ε) such that

k ∫₀ᵗ ∂x Sε (t − τ) f (τ) dτkH k (R) ≤ Ck f kC([0,T ];H k−1 (R)) , t ∈ [0, T ].

Proof. This follows immediately by moving the norm inside the integral and applying the previous lemma. Here it is essential that τ 7→ (t − τ)^{−1/2} is integrable on [0,t].

Theorem 6.42. Let k ≥ 2 be an integer and u0 ∈ H k (R). For any ε > 0 there exists a unique
maximal strong solution u ∈ C([0, T ); H k (R)) of (6.19) with u(0) = u0 . If T < ∞, we have that
limt→T ku(t)kH k (R) = ∞. The solution has the additional property that u ∈ C∞ ((0, T ); H m (R))
for any integer m ≥ 2.

Proof. The proof of the existence and uniqueness of a solution u ∈ C([0, T ); H k (R)) follows as
for the NLS equation in view of Lemma 6.41. To prove the additional regularity property of the
solution, we note that Sε (t)u0 ∈ C∞ ((0, ∞); H m (R)) for any m due to the rapid decay of Ŝε (t) in ξ for t > 0. On the other hand, ∫₀ᵗ ∂x Sε (t − τ)u² (τ) dτ ∈ C([0, T ); H k+1 (R)) by Lemma 6.41. It follows that u ∈ C((0, T ); H k+1 (R)). Solving the equation with initial data u(t0 ) ∈ H k+1 (R), for any t0 ∈ (0, T ), we see by the same argument that u ∈ C((t0 , T ); H k+2 (R)). Since t0 is

6 The proof of the equivalence requires some modifications of the method in the proof of Proposition 6.15
since Sε (−t) is not bounded for t > 0. To prove the uniqueness of the solution of the inhomogeneous linear
problem one can instead use energy estimates.

arbitrary it follows that u ∈ C((0, T ); H k+2 (R)) and by induction that u ∈ C((0, T ); H m (R))
for any m ≥ 2. By differentiating under the integral sign, one now finds that
u(t) = Sε (t − t0 )u(t0 ) − 3 ∫_{t0}^t ∂x Sε (t − τ)u² (τ) dτ ∈ C∞ ((t0 , T ); H m (R))

for any m and any t0 ∈ (0, T ). Since t0 is arbitrary it follows that u ∈ C∞ ((0, T ); H m (R)) for
any m ≥ 2.
As mentioned earlier, the maximal existence time T = T (ε) is a function of ε. Our first task
is to prove that it doesn’t shrink to 0 as ε → 0+ . To do so, we must simply bound the H k norm
of the solution. We begin with a lemma related to Grönwall’s inequality.

Lemma 6.43. Assume that f ∈ C[0, T ] with f ≥ 0, and that there exist a, b, α ∈ R with a > 0 and α > 1 such that

f (t) ≤ a + b ∫₀ᵗ f^α (τ) dτ

when t ∈ [0, T ]. Then
f (t) ≤ (a^{1−α} − (α − 1)bt)^{1/(1−α)} .

Proof. Let F(t) = a + b ∫₀ᵗ f^α (τ) dτ, so that F ∈ C1 ([0, T ]) with F ′ (t) = b f^α (t) and F(t) > 0, f (t) ≤ F(t) for all t ∈ [0, T ]. Then

F ′ (t) ≤ bF^α (t), t ∈ [0, T ],

which implies that F ′ (t)F^{−α} (t) ≤ b. Integrating this inequality, we find that

(1/(1 − α))(F^{1−α} (t) − F^{1−α} (0)) ≤ bt,

yielding
F^{1−α} (t) ≥ F(0)^{1−α} − (α − 1)bt = a^{1−α} − (α − 1)bt
and hence
f (t) ≤ F(t) ≤ (a^{1−α} − (α − 1)bt)^{1/(1−α)} .
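The bound in Lemma 6.43 is attained when the integral inequality is an equality; a short worked example (with a = b = 1 and α = 3, not taken from the text) makes the blow-up mechanism explicit:

```latex
f'(t) = f(t)^3, \quad f(0) = 1
\quad\Longrightarrow\quad
f(t) = (1-2t)^{-1/2} = \bigl(a^{1-\alpha} - (\alpha-1)bt\bigr)^{\frac{1}{1-\alpha}},
```

so f blows up exactly at t = a^{1−α}/((α − 1)b) = 1/2. The same mechanism is what produces the lower bound on the existence time in the next lemma.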

Lemma 6.44. T (ε) is bounded away from 0 as ε → 0+ .

Proof. Let 0 ≤ j ≤ k. For t > 0 we have that

d/dt k∂x^j u(t)k²_{L2 (R)} = −2(∂x^j u, ∂x^j (εuxxxx + uxxx + 3(u² )x ))L2 (R)
= −2εk∂x^{j+2} uk²_{L2 (R)} + 2(∂x^{j+1} u, ∂x^{j+2} u)L2 (R) − 6(∂x^j u, ∂x^{j+1} (u² ))L2 (R)
≤ −6(∂x^j u, ∂x^{j+1} (u² ))L2 (R)
= −6 ∑_{m=0}^{j+1} ( j+1 choose m ) (∂x^j u, ∂x^m u ∂x^{j+1−m} u)L2 (R) .

If 1 ≤ m ≤ j − 1, we can estimate the terms in the sum by k∂x^j ukL2 (R) k∂x^m ukL∞ (R) k∂x^{j+1−m} ukL2 (R) ≤ Ckuk³_{H k (R)} , where the Sobolev embedding theorem is used. The case m = j can be estimated by k∂x^j uk²_{L2 (R)} k∂x ukL∞ (R) ≤ Ckuk³_{H k (R)} . This leaves the term −12⟨∂x^j u, u ∂x^{j+1} u⟩. Integration by parts yields that

−2⟨∂x^j u, u∂x^{j+1} u⟩ = ∫_R (∂x^j u)² ∂x u dx ≤ k∂x ukL∞ (R) k∂x^j uk²_{L2 (R)} ≤ Ckuk³_{H k (R)} .

Altogether we find that

d/dt ku(t)k²_{H k (R)} ≤ Cku(t)k³_{H k (R)}

for t ∈ (0, T ). Integrating this inequality from t0 to t and letting t0 → 0+ , we find that

ku(t)k²_{H k (R)} ≤ ku0 k²_{H k (R)} + C ∫₀ᵗ ku(τ)k³_{H k (R)} dτ.

Applying the previous lemma with f (t) = ku(t)k²_{H k (R)} , a = ku0 k²_{H k (R)} , b = C and α = 3/2, we find that

ku(t)k²_{H k (R)} ≤ (ku0 k^{−1}_{H k (R)} − (1/2)Ct)^{−2} .

This proves that the maximal existence time is bounded from below by T0 = 2ku0 k^{−1}_{H k (R)} /C, for otherwise the left hand side would blow up at some time t < T0 . This would result in a contradiction, since the right hand side doesn’t blow up until T0 .

We now consider a sequence {εn } with εn → 0+ and a sequence {un } of solutions in


C([0, T ]; H k (R)) of (6.19) with ε = εn and un (0) = u0 ∈ H k (R), where k ≥ 4. By the previous
lemma we can assume that the sequence {kun kC([0,T ];H k (R)) } is bounded.

Lemma 6.45. The sequence {un } converges in C([0, T ]; H k−2 (R)) to a strong solution u ∈
C([0, T ]; H k−2 (R)) of (6.18).
Proof. We show that {un } is a Cauchy sequence in C([0, T ]; H k−2 (R)). Let w := un − um . Then

wt + 3((un + um )w)x + wxxx + (εn − εm )∂x⁴ un + εm ∂x⁴ w = 0.

One finds that

(1/2) d/dt k∂x^j w(t)k²_{L2 (R)} = (εm − εn )(∂x^{j+2} w, ∂x^{j+2} un )L2 (R) − εm (∂x^{j+2} w, ∂x^{j+2} w)L2 (R) − 3(∂x^j w, ∂x^{j+1} ((un + um )w))L2 (R)
≤ (εm − εn )(∂x^{j+2} w, ∂x^{j+2} un )L2 (R) − 3(∂x^j w, ∂x^{j+1} ((un + um )w))L2 (R) .

The first term can be estimated by |εm − εn |kwkC([0,T ];H k (R)) kun kC([0,T ];H k (R)) , while the remaining term can be estimated by Ckw(t)k²_{H k−2 (R)} kun + um kC([0,T ];H k (R)) for j ≤ k − 2, since

2(∂x^j w, (un + um )∂x^{j+1} w)L2 (R) = − ∫_R (∂x^j w)² ∂x (un + um ) dx.

Summing over all j ≤ k − 2, we therefore find that

d/dt kw(t)k²_{H k−2 (R)} ≤ C(|εm − εn | + kw(t)k²_{H k−2 (R)} ).

It follows from Grönwall’s inequality that

kun (t) − um (t)k2H k−2 (R) ≤ C|εm − εn |(eCt − 1) + kun (0) − um (0)k2H k−2 (R) eCt .

Taking the supremum over [0, T ] and using the fact that εn , εm → 0 as n, m → ∞, we find that {un } is Cauchy in C([0, T ]; H k−2 (R)). Letting εn → 0 in (6.20), we find that the limit u ∈ C([0, T ]; H k−2 (R)) is a strong solution of (6.18).

Proof of Theorem 6.36. The uniqueness of the solution follows from Lemma 6.38. We ap-
proximate u0 by a sequence {u0n } ⊂ C0∞ (R) in the H k (R) norm. Since C0∞ (R) ⊂ H k+2 (R), we
can find a solution un ∈ C([0, Tn ]; H k (R)) with initial data u0n for each n. It remains to prove
that we can take Tn = T0 , for some T0 > 0 which is independent of n, and that {un } converges
in C([0, T0 ]; H k (R)). To do this, we use the same arguments as in Lemmas 6.44 and 6.45. The
only purpose of working in H k−2 (R) instead of H k (R) in Lemma 6.45 was to handle the term
εuxxxx . With this term gone, we find that there exists a T0 > 0 such that

d/dt kun (t) − um (t)k²_{H k (R)} ≤ Ckun (t) − um (t)k²_{H k (R)} , t ∈ [0, T0 ],

with C = C(ku0 kH k (R) ), and hence that

kun − um k²_{C([0,T0 ];H k (R))} ≤ ku0n − u0m k²_{H k (R)} e^{CT0} ,

showing that {un } is Cauchy in C([0, T0 ]; H k (R)). This proves the existence of a local solution.
If u0 , v0 ∈ BR (0) ⊂ H k (R) the same argument shows that

kun − vn k²_{C([0,T0 ];H k (R))} ≤ ku0n − v0n k²_{H k (R)} e^{CT0} ,

where u0n , v0n are smooth initial data approximating u0 and v0 and un , vn the corresponding
solutions on the time interval [0, T0 ] with T0 = T0 (R). Passing to the limit, we find that

ku − vk²_{C([0,T0 ];H k (R))} ≤ ku0 − v0 k²_{H k (R)} e^{CT0} ,

proving that the solutions depend continuously on the initial data over the interval [0, T0 ]. The
argument used for the NLS equation shows that the solution can be extended to a maximal
solution u ∈ C([0, T ); H k (R)). Since the time interval in the above continuity argument just
depends on the norm of the initial data, one can extend it to any interval [0, T0 ] on which the
two solutions exist.
Dispersive equations 121

Global existence and conservation laws


We now turn to the question of global existence. As for the NLS equation, this will be given a
positive answer due to the existence of suitable conservation laws. Define
$$I(u) = \int_{\mathbb R} u\, dx, \qquad M(u) = \frac12 \int_{\mathbb R} u^2\, dx,$$
$$E(u) = \int_{\mathbb R} \Big( \frac12 u_x^2 - u^3 \Big)\, dx, \qquad F(u) = \int_{\mathbb R} \Big( \frac12 u_{xx}^2 - 5u u_x^2 + \frac52 u^4 \Big)\, dx.$$
These are formally conserved for the KdV equation. The first conservation law follows by
writing the equation as
ut + (3u2 + uxx )x = 0.
Assuming that 3u2 + uxx → 0 as x → ±∞, we obtain that I(u) is conserved. Multiplying the
KdV equation by u we obtain
uut + 6u2 ux + uuxxx = 0,
which can be rewritten
   
$$\partial_t\Big(\tfrac12 u^2\Big) + \partial_x\Big(u u_{xx} + 2u^3 - \tfrac12 u_x^2\Big) = 0.$$

This gives the conservation of M(u). The conservation of E(u) follows by subtracting 3u2 ×
(KdV) from ux × ∂x (KdV). After some work one obtains that
   
$$\partial_t\Big(\tfrac12 u_x^2 - u^3\Big) + \partial_x\Big(6u u_x^2 + u_x u_{xxx} - 3u^2 u_{xx} - \tfrac12 u_{xx}^2 - \tfrac92 u^4\Big) = 0.$$
This formally results in the conservation of E(u). The conservation of F(u) can be proved
in a similar manner, although the calculations are rather tedious. By adding uxx × ∂x2 (KdV),
−5u2x × (KdV), −10uux × ∂x (KdV) and 5u3 × (KdV) one finds after a long calculation that
 
$$\partial_t\Big(\tfrac12 u_{xx}^2 - 5u u_x^2 + \tfrac52 u^4\Big) + \partial_x R(u) = 0,$$
where
$$R(u) = u_{xx} u_{xxxx} - \tfrac12 u_{xxx}^2 + 5 u_x^2 u_{xx} + 8 u u_{xx}^2 - 10 u u_x u_{xxx} + 10 u^3 u_{xx} - 45 u^2 u_x^2 + 12 u^5.$$
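All four conservation-law identities above can be checked symbolically. A minimal sketch, assuming sympy; the jet-variable setup and all names are mine, not from the text. Each density-flux pair is verified to satisfy ∂t(density) + ∂x(flux) = 0 modulo the KdV equation:

```python
import sympy as sp

# Jet variables w[k] standing for the k-th x-derivative of u
N = 10
w = sp.symbols(f'w0:{N}')

def Dx(f):
    """Total x-derivative of a polynomial in the jet variables."""
    return sum(sp.diff(f, w[k]) * w[k + 1] for k in range(N - 1))

def Dt(f):
    """Total t-derivative, using u_t = -6*u*u_x - u_xxx to eliminate u_t."""
    g, total = -6 * w[0] * w[1] - w[3], sp.Integer(0)
    for k in range(N - 1):
        total += sp.diff(f, w[k]) * g
        g = Dx(g)
    return total

u, ux, uxx, uxxx, uxxxx = w[:5]
pairs = [
    (u, 3 * u**2 + uxx),                                          # I(u)
    (u**2 / 2, u * uxx + 2 * u**3 - ux**2 / 2),                   # M(u)
    (ux**2 / 2 - u**3,                                            # E(u)
     6 * u * ux**2 + ux * uxxx - 3 * u**2 * uxx - uxx**2 / 2 - sp.Rational(9, 2) * u**4),
    (uxx**2 / 2 - 5 * u * ux**2 + sp.Rational(5, 2) * u**4,       # F(u)
     uxx * uxxxx - uxxx**2 / 2 + 5 * ux**2 * uxx + 8 * u * uxx**2
     - 10 * u * ux * uxxx + 10 * u**3 * uxx - 45 * u**2 * ux**2 + 12 * u**5),
]
residuals = [sp.expand(Dt(d) + Dx(f)) for d, f in pairs]
print(residuals)  # [0, 0, 0, 0]
```

The formal operators Dt and Dx act on polynomials in u and its spatial derivatives, which is exactly the setting of the "tedious calculations" mentioned above.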
The conserved quantities I(u), M(u) and E(u) are related to mass, momentum and energy for
water waves. The quantity F(u) has no direct physical interpretation. The existence of this
conserved quantity is related to the complete integrability of the KdV equation, a property
which will be discussed further in the next chapter.

If u ∈ H 2 (R), the quantities M(u), E(u) and F(u) are well-defined by Sobolev’s embed-
ding theorem. By repeating the procedure in the proof of Theorem 6.23 one obtains the fol-
lowing result.
Proposition 6.46. Let u ∈ C([0, T ); H 2 (R)) be a strong solution of the KdV equation. The
quantities M(u(t)), E(u(t)) and F(u(t)) are independent of t.
For the proof one needs a persistence of regularity result. We denote by Tk (u0 ) the maximal
existence time of the H k solution of KdV with initial data u0 ∈ H k (R).
Lemma 6.47 (Persistence of regularity for KdV). Let u0 ∈ H k (R), where k ≥ 2. Then Tk (u0 ) =
T2 (u0 ).
To illustrate the idea we consider the proof that T3 (u0 ) = T2 (u0 ). One formally has that
$$\frac12 \frac{d}{dt}\|u_{xxx}(t)\|_{L^2(\mathbb R)}^2 = (u_{xxx}, -6(uu_x)_{xxx} - u_{xxxxxx})_{L^2(\mathbb R)} = (u_{xxx}, -6(uu_x)_{xxx})_{L^2(\mathbb R)}.$$

The last expression can be written using Leibniz’ formula as a linear combination of terms
of the form (uxxx , uuxxxx )L2 (R) , (uxxx , ux uxxx )L2 (R) , (uxxx , u2xx )L2 (R) . Integration by parts reveals
that 2(uxxx , uuxxxx )L2 (R) = −(u2xxx , ux )L2 (R) , while (uxxx , u2xx )L2 (R) = (uxxx uxx , uxx )L2 (R) = 0. We
can estimate |(u2xxx , ux )L2 (R) | ≤ kux kL∞ (R) kuxxx k2L2 (R) ≤ CkukH 2 (R) kuxxx k2L2 (R) . Since kukH 2 (R)
is bounded on [0, T ] for any T < T2 (u0 ) we obtain by Grönwall’s inequality that ku(t)kH 3 (R) is
bounded on [0, T ] and hence that T3 (u0 ) = T2 (u0 ). The calculation doesn’t quite make sense,
since uxxx (t) ∉ C1 ([0, T3 ); L2 (R)) in general. This can be taken care of by first proving the
result for (6.19) and then letting ε → 0+ . One shows similarly that Tk+1 (u0 ) = Tk (u0 ) = T2 (u0 )
for all k ≥ 2.
Theorem 6.48. For u0 ∈ H k (R), k ≥ 2, the solution of (6.18) exists for all t ≥ 0.
Proof. We have that

kuk2H 1 (R) ≤ 2(E(u) + M(u)) + 4kukL∞ (R) M(u)


≤ 2(E(u) + M(u)) +CkukH 1 (R) M(u)
$$\le 2(E(u) + M(u)) + \frac12\|u\|_{H^1(\mathbb R)}^2 + \frac{C^2}{2} M(u)^2,$$
where we have used the arithmetic-geometric inequality. This shows that
$$\frac12\|u(t)\|_{H^1(\mathbb R)}^2 \le 2\big(E(u(t)) + M(u(t))\big) + \frac{C^2}{2} M(u(t))^2 = 2\big(E(u(0)) + M(u(0))\big) + \frac{C^2}{2} M(u(0))^2$$
is bounded. Similarly, we find that

kuxx k2L2 (R) ≤ 2F(u) + 10kukL∞ (R) kuk2H 1 (R) + 10kuk2L∞ (R) M(u)
≤ 2F(u) +C(kuk3H 1 (R) + kuk4H 1 (R) ).

This, together with the bound on ku(t)kH 1 (R) and the conservation of F(u(t)), shows that
ku(t)kH 2 (R) is bounded for all t. We conclude using Theorem 6.36 that if u0 ∈ H 2 (R), then
the corresponding solution is defined for all t, that is, u ∈ C([0, ∞); H 2 (R)). If u0 ∈ H k (R),
with k ≥ 2, we conclude by Lemma 6.47 that u ∈ C([0, ∞); H k (R)).

6.3.3 Derivation of the KdV equation


The usual mathematical model for water waves is the following. We assume that the water
is bounded from below by a rigid flat bottom z = 0 and above by a moving boundary z =
η(x, y,t) (we assume that it is the graph of a function). The fluid domain is Ωt := {(x, y, z) ∈
R3 : 0 < z < η(x, y,t)}. We denote the surface by St := {(x, y, z) ∈ R3 : z = η(x, y,t)} and the
bottom by B := {(x, y, z) ∈ R3 : z = 0}. The flow of the water is described by the velocity field
u = (u1 , u2 , u3 ). The velocity field satisfies Euler’s equation
$$u_t + (u \cdot \nabla)u = F - \frac{1}{\rho}\nabla p, \tag{6.21}$$
where ρ is the density, p the pressure and F the body forces acting on the fluid. In addition,
the density satisfies the continuity equation

ρt + ∇ · (ρu) = 0.

For water it is reasonable to assume that the density is constant, in which case the last equation
reduces to
∇ · u = 0. (6.22)
Moreover, we assume that the only force is gravity, so that F = (0, 0, −g), where g is the
gravitational constant of acceleration. We assume in addition that all functions are independent
of y, so that the waves move only in the horizontal direction x and are uniform in the y-
direction. The equations of motion then reduce to

$$v_x + w_z = 0, \qquad v_t + v v_x + w v_z = -\frac{1}{\rho} p_x, \qquad w_t + v w_x + w w_z = -\frac{1}{\rho} p_z - g, \tag{6.23}$$

if we set u1 = v, u2 = 0 and u3 = w. In addition to (6.23) one has to impose boundary condi-


tions at the surface and the bottom. These are usually assumed to be of the form

$$w = 0 \ \text{ on } B, \qquad w = \eta_t + v\eta_x \ \text{ on } S_t, \qquad p = p_{atm} \ \text{ on } S_t, \tag{6.24}$$

where patm is the constant atmospheric pressure at the water surface. The first condition simply
means that there is no flow of water across the bottom. The second condition means that the
surface moves with the fluid, so that it always contains the same fluid particles. If (x(t), z(t))

describes the position of a water particle, then the condition for a water particle to remain
on the surface is that z(t) = η(x(t),t). Differentiating this with respect to t, one obtains
ż(t) = ηt (x(t),t) + ηx (x(t),t)ẋ(t). Since ẋ = v and ż = w, the second boundary condition
follows. There are two important difficulties to note here. First of all, the domain Ωt depends
on t since the surface is time-dependent. Even more importantly, the surface is not prescribed,
it is one of the unknowns of the problem. We therefore have to solve PDE in a domain which
is unknown! This is an example of a free boundary problem.
To derive the KdV equation, we assume that the waves we are describing have small am-
plitude. We therefore introduce a small parameter ε (the amplitude parameter) and write the
equation for the surface in the form z = h0 + h0 εη(x,t), where h0 is the depth of the undis-
turbed water. Rescaling the other variables in an appropriate way, one obtains the new set of
equations
 
$$\begin{aligned} &v_x + w_z = 0, && w = 0 &&\text{on } z = 0,\\ &v_t + \varepsilon(v v_x + w v_z) = -p_x, && p = \eta &&\text{on } z = 1 + \varepsilon\eta, \tag{6.25}\\ &\varepsilon\{w_t + \varepsilon(v w_x + w w_z)\} = -p_z, && w = \eta_t + \varepsilon v \eta_x &&\text{on } z = 1 + \varepsilon\eta. \end{aligned}$$
 

The appropriate change of variables is


$$x = \frac{h_0}{\sqrt\varepsilon}\,\tilde x, \quad z = h_0 \tilde z, \quad t = \frac{1}{\sqrt\varepsilon}\sqrt{\frac{h_0}{g}}\,\tilde t, \quad v = \sqrt{g h_0}\,\varepsilon\,\tilde v, \quad w = \sqrt{g h_0}\,\varepsilon^{3/2}\,\tilde w,$$
$$p = p_{atm} + \rho g h_0 (1 - \tilde z) + \rho g h_0 \varepsilon\,\tilde p, \quad \eta = h_0 + h_0 \varepsilon\,\tilde\eta.$$

Equations (6.25) are obtained from (6.23)–(6.24) by using the tilde-variables. We have dropped
the tildes in (6.25) for notational convenience. For an explanation of this change of variables
we refer to Johnson [15].
To derive the leading order approximation we set ε = 0. This yields vx + wz = 0, vt = −px ,
pz = 0 and the boundary conditions w = 0 at z = 0 and p = η, w = ηt at z = 1. A calculation
which is left as an exercise to the reader then shows that

ηtt = ηxx

(the crucial point is that p(x, z,t) = η(x,t)). In other words, the surface profile satisfies the
wave equation to leading order. This suggests the introduction of new variables

ξ = x − t, τ = εt,

meaning that we are seeking a perturbation of a wave moving to the right with unit speed.
The equations (6.25) are rewritten in the form
 
$$\begin{aligned} &v_\xi + w_z = 0, && w = 0 &&\text{on } z = 0,\\ &-v_\xi + \varepsilon(v_\tau + v v_\xi + w v_z) = -p_\xi, && p = \eta &&\text{on } z = 1 + \varepsilon\eta,\\ &\varepsilon\{-w_\xi + \varepsilon(-w_\tau + v w_\xi + w w_z)\} = -p_z, && w = -\eta_\xi + \varepsilon(\eta_\tau + v\eta_\xi) &&\text{on } z = 1 + \varepsilon\eta. \end{aligned}$$
 

(A similar analysis can be made for waves moving to the left.)



A solution is sought in the form of asymptotic expansions

v = v0 + εv1 + · · · , w = w0 + εw1 + · · · , η = η0 + εη1 + · · · , p = p0 + ε p1 + · · · .

The KdV equation is obtained by substituting this in the equations and collecting terms of
the same order in ε. The boundary conditions on the surface z = 1 + εη have to be handled
by expanding p(ξ , 1 + εη, τ) = p(ξ , 1, τ) + ε pz (ξ , 1, τ)η + O(ε 2 ) and similarly for the other
variables. Keeping this in mind, one obtains at order 0
 
$$\begin{aligned} &v_{0\xi} + w_{0z} = 0, && w_0 = 0 &&\text{on } z = 0,\\ &-v_{0\xi} = -p_{0\xi}, && p_0 = \eta_0 &&\text{on } z = 1,\\ &p_{0z} = 0, && w_0 = -\eta_{0\xi} &&\text{on } z = 1, \end{aligned}$$
 

which is solved by
p0 = η0 = v0 , w0 = −zη0ξ
(one can add an arbitrary function of (z,t) to v0 ). Using this, one obtains at the next order
 
$$\begin{aligned} &v_{1\xi} + w_{1z} = 0, && w_1 = 0 &&\text{on } z = 0,\\ &-v_{1\xi} + v_{0\tau} + v_0 v_{0\xi} = -p_{1\xi}, && p_1 = \eta_1 &&\text{on } z = 1,\\ &p_{1z} = w_{0\xi}, && w_1 + w_{0z}\eta_0 = -\eta_{1\xi} + \eta_{0\tau} + v_0\eta_{0\xi} &&\text{on } z = 1. \end{aligned}$$
 

We have p1z = w0ξ = −zη0ξ ξ , which together with the boundary condition p1 = η1 on z = 1
gives p1 = ½(1 − z²)η0ξ ξ + η1 . From this it follows that

$$-w_{1z} = v_{1\xi} = v_{0\tau} + v_0 v_{0\xi} + p_{1\xi} = \eta_{0\tau} + \eta_0\eta_{0\xi} + \tfrac12(1 - z^2)\eta_{0\xi\xi\xi} + \eta_{1\xi},$$
which in combination with the bottom boundary condition gives
$$w_1 = -\Big(\eta_{0\tau} + \eta_0\eta_{0\xi} + \eta_{1\xi} + \tfrac12\eta_{0\xi\xi\xi}\Big)z + \tfrac16 z^3 \eta_{0\xi\xi\xi}.$$
Substituting this in the surface boundary condition, one finally arrives at
$$-\Big(\eta_{0\tau} + \eta_0\eta_{0\xi} + \eta_{1\xi} + \tfrac12\eta_{0\xi\xi\xi}\Big) + \tfrac16\eta_{0\xi\xi\xi} - \eta_0\eta_{0\xi} = -\eta_{1\xi} + \eta_{0\tau} + \eta_0\eta_{0\xi},$$
that is,
$$2\eta_{0\tau} + 3\eta_0\eta_{0\xi} + \tfrac13\eta_{0\xi\xi\xi} = 0.$$
Up to a simple rescaling this is the KdV equation.
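The order-by-order computation above can be confirmed symbolically. A sketch, assuming sympy; the function names are mine. It checks the interior relations p1z = w0ξ and w1z = −v1ξ, and that the surface boundary condition at z = 1 yields the reduced KdV equation:

```python
import sympy as sp

xi, tau, z = sp.symbols('xi tau z')
eta0 = sp.Function('eta0')(xi, tau)
eta1 = sp.Function('eta1')(xi, tau)

# Leading order: p0 = eta0 = v0, w0 = -z*eta0_xi
w0 = -z * eta0.diff(xi)

# Order eps: p1 and w1 as computed in the text
p1 = sp.Rational(1, 2) * (1 - z**2) * eta0.diff(xi, 2) + eta1
w1 = (-(eta0.diff(tau) + eta0 * eta0.diff(xi) + eta1.diff(xi)
        + sp.Rational(1, 2) * eta0.diff(xi, 3)) * z
      + z**3 / 6 * eta0.diff(xi, 3))

# Interior equations: p1_z = w0_xi and w1_z = -v1_xi
v1xi = (eta0.diff(tau) + eta0 * eta0.diff(xi)
        + sp.Rational(1, 2) * (1 - z**2) * eta0.diff(xi, 3) + eta1.diff(xi))
print(sp.expand(p1.diff(z) - w0.diff(xi)))   # 0
print(sp.expand(w1.diff(z) + v1xi))          # 0

# Surface condition at z = 1: w1 + w0_z*eta0 = -eta1_xi + eta0_tau + eta0*eta0_xi
bc = (w1 + w0.diff(z) * eta0
      - (-eta1.diff(xi) + eta0.diff(tau) + eta0 * eta0.diff(xi)))
kdv = (2 * eta0.diff(tau) + 3 * eta0 * eta0.diff(xi)
       + sp.Rational(1, 3) * eta0.diff(xi, 3))
print(sp.expand(bc.subs(z, 1) + kdv))        # 0
```

The last line shows that the residual of the surface condition is exactly the negative of the reduced KdV expression, so the condition holds precisely when η0 satisfies the equation above.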

6.4 Further reading


The material presented in this chapter can only be considered as a first glimpse of the subject.
In particular, we have not made any serious use of the properties of the propagator S(t) for the

linear problems when considering the nonlinear Cauchy problem. We only used the bound-
edness of S(t) on H s (Rd ). Our approach could therefore essentially be used for any equa-
tion (or even system) ut − a(Dx )u = f (u, Dx u) where the operator a(Dx ) satisfies Petrowsky’s
condition from Chapter 3 and f e.g. is a polynomial. This includes the conservation law
ut + f (u)x = 0 from Chapter 4, for which a(Dx ) = 0. With the technique used to treat the KdV
equation one can prove local well-posedness. The method is of course more complicated than
the method of characteristics, but has the advantage that it extends to higher dimensions and
to systems.
In the last decades, an important development has been the use of finer properties of S(t)
in order to lower the regularity assumptions in the well-posedness results. Taking the NLS
equation as an example, the basic ingredient is the estimate
 
$$\|S(t)u_0\|_{L^q(\mathbb R^d)} \le (4\pi|t|)^{-\frac d2\left(\frac1p - \frac1q\right)} \|u_0\|_{L^p(\mathbb R^d)}$$
from Proposition 6.13. This estimate can be used to prove the continuity of the map
u0 7→ S(t)u0
from L2 (Rd ) to Lq (R; Lr (Rd )) under certain conditions on q and r. One also has continuity of
the operator
$$f \mapsto \int_0^t S(t-\tau) f(\tau)\, d\tau$$
from Lγ ((0, T ); Lρ (Rd )) into Lq (I; Lr (Rd )) ∩ C([0, T ]; L2 (Rd )) under appropriate conditions
on the exponents. These types of estimates go under the name of Strichartz estimates. They
can e.g. be used to prove the local well-posedness of the NLS equation in L2 (Rd ) under the
condition 1 < p < 1 + 4/d on the exponent. The idea is to replace the space X in the proof
of Theorem 6.18 with a space which is compatible with the Strichartz estimates. The great
advantage of having such a local well-posedness result is that it automatically implies global
existence, irrespective of the sign of σ , due to the conservation of the L2 norm.
For the KdV equation, the Strichartz estimates do not suffice to lower the regularity re-
quirements. The problem is as usual the derivative in the nonlinearity. To handle this, one
needs smoothing properties of the operator $f \mapsto \int_0^t S(t-\tau) f(\tau)\, d\tau$. A very simple example of
such a smoothing result is the estimate


k∂x S(t)u0 kLx∞ (R;Lt2 (R)) ≤ Cku0 kL2 (R) ,
in which the indices in the left hand side indicate that we first integrate in t and then take the
essential supremum in x. This estimate follows by writing
$$\partial_x S(t)u_0(x) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb R} e^{it\xi^3 + ix\xi}\, i\xi\, \hat u_0(\xi)\, d\xi = \frac{1}{3\sqrt{2\pi}} \int_{\mathbb R} e^{it\eta} e^{ix\eta^{1/3}}\, i\eta^{-1/3}\, \hat u_0(\eta^{1/3})\, d\eta$$
and applying Parseval’s formula with respect to the t variable.
For details on well-posedness results with low regularity and applications to global exis-
tence we refer to Linares & Ponce [19], Sulem & Sulem [26] and Tao [27]. The derivation
of the KdV equation from the governing equations for water waves is essentially taken from
Johnson [15].
Chapter 7

Solitons and complete integrability

In the final chapter we will consider some of the remarkable properties of the KdV equation
alluded to in the introduction and the previous chapter. These properties can be summarized
by saying that the equation is ‘completely integrable’. It isn’t easy to give a precise meaning to
this phrase, but it involves the existence of infinitely many independent conserved quantities
and a way to transform the equation into a linear problem. In the context of finite-dimensional
Hamiltonian systems (ODE) one can give a more precise definition (see [2]). There are many
other interesting examples of completely integrable PDE, e.g. the cubic nonlinear Schrödinger
equation in one dimension, but most equations do not have this property. We will focus on
the KdV equation here for simplicity. The considerations in this chapter will mostly be of a
formal nature. In other words, we will not specify the function spaces that the solutions have
to belong to for the results to be true. It is useful in this chapter to rewrite the KdV equation
in the equivalent form
ut − 6uux + uxxx = 0,
which is obtained using the transformation u 7→ −u. Transformations between different PDE
play a central role in this chapter. As a first illustration, we consider the Cole-Hopf transfor-
mation.
Example 7.1. Consider the viscous Burgers equation

ut + uux = εuxx , ε > 0.


Let $w(x,t) = \int_0^x u(s,t)\, ds + g(t)$. If g is chosen appropriately, one finds that
$$w_t + \frac12 w_x^2 = \varepsilon w_{xx}$$
(show this!). Setting $v(x,t) = e^{-w(x,t)/(2\varepsilon)}$, one obtains
$$v_t = -\frac{w_t}{2\varepsilon}\, v, \qquad v_x = -\frac{w_x}{2\varepsilon}\, v \tag{7.1}$$
and
$$v_{xx} = \left( -\frac{w_{xx}}{2\varepsilon} + \frac{w_x^2}{(2\varepsilon)^2} \right) v$$


so that
$$v_t - \varepsilon v_{xx} = -\frac{1}{2\varepsilon}\left( w_t + \frac12 w_x^2 - \varepsilon w_{xx} \right) v = 0.$$
The transformation u 7→ v is called the Cole-Hopf transformation. Note that this transforms
the nonlinear Burgers equation to the linear heat equation.
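The Cole-Hopf computation is easy to check with a computer algebra system. A minimal sketch, assuming sympy; the names are mine:

```python
import sympy as sp

x, t, eps = sp.symbols('x t epsilon', positive=True)
w = sp.Function('w')(x, t)
v = sp.exp(-w / (2 * eps))

# v_t - eps*v_xx collapses to a multiple of the quadratic equation for w
lhs = v.diff(t) - eps * v.diff(x, 2)
rhs = -(w.diff(t) + w.diff(x)**2 / 2 - eps * w.diff(x, 2)) * v / (2 * eps)
print(sp.expand(lhs - rhs))  # 0
```

Thus if w solves its quadratic equation, v solves the heat equation, exactly as claimed.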
The existence of the Cole-Hopf transformation is very surprising and seems almost like
magic. There is a way to transform the KdV equation into a linear problem, but as we will
see, it is much more complicated. There is however a transformation of the KdV equation to
a different nonlinear equation which will turn out to be very useful.

7.1 Miura’s transformation, Gardner’s extension and infinitely many conservation laws
The modified KdV equation (mKdV)
ut − 6u2 ux + uxxx = 0
is obtained by changing the power of the nonlinearity in KdV. Suppose that
u = v2 + vx . (7.2)
Then
ut − 6uux + uxxx = (2vvt + vxt ) − 6(v2 + vx )(2vvx + vxx ) + 6vx vxx + 2vvxxx + vxxxx
= (2v + ∂x )(vt − 6v2 vx + vxxx ),
so that u solves the KdV equation if v is a solution of the mKdV equation (but not necessarily
vice versa). The transformation v 7→ u is known as the Miura transformation. So far this
doesn’t seem to gain very much since the mKdV equation is just as complicated as the KdV
equation. We shall soon see that it in fact gives rise to an infinite number of conservation laws
for the KdV equation, but first we need an extension due to Gardner. The idea is to introduce
a parameter ε. Gardner’s transformation is
u = v + εvx + ε 2 v2 . (7.3)
Substitution of this in the KdV equation gives
ut − 6u2 ux + uxxx = (1 + ε∂x + 2ε 2 v)(vt − 6(v + ε 2 v2 )vx + vxxx ).
Hence, u is a solution of the KdV equation if v is a solution of the Gardner equation
vt − 6(v + ε 2 v2 )vx + vxxx = 0.
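Both the Miura identity and the Gardner identity can be confirmed symbolically. A sketch, assuming sympy; the names are mine:

```python
import sympy as sp

x, t, eps = sp.symbols('x t epsilon')
v = sp.Function('v')(x, t)

def kdv(f):
    return f.diff(t) - 6 * f * f.diff(x) + f.diff(x, 3)

# Miura: u = v^2 + v_x maps mKdV solutions to KdV solutions
u1 = v**2 + v.diff(x)
E1 = v.diff(t) - 6 * v**2 * v.diff(x) + v.diff(x, 3)          # mKdV expression
res1 = sp.expand(kdv(u1) - (2 * v * E1 + E1.diff(x)))
print(res1)  # 0

# Gardner: u = v + eps*v_x + eps^2*v^2
u2 = v + eps * v.diff(x) + eps**2 * v**2
E2 = v.diff(t) - 6 * (v + eps**2 * v**2) * v.diff(x) + v.diff(x, 3)  # Gardner eq.
res2 = sp.expand(kdv(u2) - (E2 + eps * E2.diff(x) + 2 * eps**2 * v * E2))
print(res2)  # 0
```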
This is in conservation form and hence
Z
v dx
R

is a conserved quantity. For ε = 0 we have v = u, in which case this is just the conservation of I(u)
(see Section 6.3). The idea is now to expand v in powers of ε. Assume that v can be expanded
in a series

$$v = \sum_{k=0}^\infty \varepsilon^k v_k.$$
Substituting this in (7.3), we obtain
$$\begin{aligned} u &= \sum_{k=0}^\infty \varepsilon^k v_k + \varepsilon \sum_{k=0}^\infty \varepsilon^k \partial_x v_k + \varepsilon^2 \Big( \sum_{k=0}^\infty \varepsilon^k v_k \Big)^2 \\ &= \sum_{k=0}^\infty \varepsilon^k v_k + \varepsilon \sum_{k=0}^\infty \varepsilon^k \partial_x v_k + \varepsilon^2 \sum_{k=0}^\infty \varepsilon^k \sum_{j=0}^k v_{k-j} v_j \\ &= v_0 + \varepsilon (v_1 + \partial_x v_0) + \sum_{k=2}^\infty \varepsilon^k \Big( v_k + \partial_x v_{k-1} + \sum_{j=0}^{k-2} v_{k-2-j} v_j \Big). \end{aligned}$$

Thus
v0 = u, v1 = −∂x v0 = −ux , (7.4)
and
$$v_k = -\Big( \partial_x v_{k-1} + \sum_{j=0}^{k-2} v_{k-2-j} v_j \Big), \qquad k \ge 2. \tag{7.5}$$

The first few expressions are easily computed. We have

v2 = uxx − u2 ,
v3 = −(uxx − u2 )x + 2uux ,
v4 = −(2uux − (uxx − u2 )x )x − 2u(uxx − u2 ) − u2x .
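The recursion (7.4)-(7.5) is easy to implement. A sketch, assuming sympy and with names of my choosing, reproducing the expressions for v2, v3 and v4:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)

# v0 = u, v1 = -u_x, then the recursion (7.5)
vs = [u, -u.diff(x)]
for k in range(2, 5):
    vs.append(-(vs[k - 1].diff(x)
                + sum(vs[k - 2 - j] * vs[j] for j in range(k - 1))))

# Compare with the expressions listed in the text
v2 = u.diff(x, 2) - u**2
v3 = -(u.diff(x, 2) - u**2).diff(x) + 2 * u * u.diff(x)
v4 = (-(2 * u * u.diff(x) - (u.diff(x, 2) - u**2).diff(x)).diff(x)
      - 2 * u * (u.diff(x, 2) - u**2) - u.diff(x)**2)
checks = [sp.expand(vs[k] - e) for k, e in [(2, v2), (3, v3), (4, v4)]]
print(checks)  # [0, 0, 0]
```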
The fact that $\int_{\mathbb R} v\, dx = \mathrm{const}$ moreover implies that
$$\int_{\mathbb R} v_k(t,x)\, dx = \mathrm{const}, \qquad k \in \mathbb N.$$

In general, it turns out that the relations corresponding to odd powers of ε are trivial (the
corresponding vk are exact differentials), whereas the even ones induce an infinite family of
nontrivial conserved quantities [5]. Note that v2 and v4 correspond to the functionals M(u)
and E(u) discussed in the previous chapter, respectively.

7.2 Soliton solutions


We now show how the Miura transformation can be used to generate new solutions of the KdV
equation from known solutions. The basic idea is that the mKdV equation has a symmetry
which the KdV equation doesn’t have: it is invariant under the reflection v 7→ −v. In other

words, if v solves the mKdV equation, then so does −v. In addition, the Miura transformation
is not injective. For example, any solution of the Riccati equation v2 + vx = 0 is mapped by the
Miura transformation to the 0 solution of the KdV equation. We could therefore in principle
generate a new solution of the KdV equation by mapping u1 = 0 to a solution v1 of the Riccati
equation, setting $v_2 = -v_1$ and $u_2 = v_2^2 + v_{2x} = v_1^2 - v_{1x}$. One problem is that solutions of
the KdV equation are not automatically mapped to solutions of the mKdV equation by the
(inverse) Miura transformation. There is however a free parameter in the solution of the Riccati
equation, which can be adjusted to guarantee that v1 solves mKdV. Another problem is that
the solution v1 is only local, unless it vanishes identically. To remedy this, we introduce a
parameter λ in Miura’s transformation, so that

u = λ + v2 + vx .

The modified KdV equation is now replaced by

vt − 6(v2 + λ )vx + vxxx = 0, (7.6)

meaning as usual that u solves the former equation if v satisfies the latter. The above idea for
generating new solutions can now be summarized as follows.

(1) Start with a known solution u1 of the KdV equation and let v(x,t; f ) be the general
solution of the equation λ + vx + v2 = u1 . This will in general depend upon an arbitrary
function f = f (t).

(2) Adapt the function f so that v solves equation (7.6).

(3) Set u2 = λ + v2 − vx . u2 is a new solution of the KdV equation.

In the second step we note that (7.6) can be simplified using the identity vx = −λ − v2 + u1 .
Indeed, we have vxx = −2vvx + u1x and vxxx = −2v2x − 2vvxx + u1xx = −2v2x + 4v2 vx − 2vu1x +
u1xx . We can therefore rewrite (7.6) as

vt = 6(v2 + λ )vx + 2v2x − 4v2 vx + 2vu1x − u1xx


= 2(v2 + vx )vx + 6λ vx + 2vu1x − u1xx
= 2(u1 − λ )vx + 6λ vx + 2vu1x − u1xx
= 4λ vx + 2(u1 v)x − u1xx .

Example 7.2. We illustrate the method for u1 = 0. We should then choose v as a solution of
the ordinary differential equation
vx + v2 + λ = 0.
This can be integrated to
v(x,t) = κ tanh(κx + f (t)),
where λ = −κ 2 (< 0) and f is arbitrary. The equation for vt becomes

vt = −4κ 2 vx ,

and hence f 0 (t) = −4κ 3 , so that f (t) = −4κ 3t − κx0 , where x0 is arbitrary. We therefore
obtain the new solution

2κ 2
u2 (x,t) = −κ 2 + v2 (x,t) − vx (x,t) = − .
cosh2 (κ(x − x0 − 4κ 2t))

This is the solitary wave from the introduction with wave speed c = 4κ 2 (recall that we
changed u to −u in the KdV equation). This solution is valid only if |v| < κ. If |v| > κ,
we obtain the singular solution

v(x,t) = −κ coth(κ(x − x0 − 4κ 2t)).
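One can verify directly that the solitary wave obtained above solves the KdV equation. A sketch, assuming sympy; the symbol names are mine:

```python
import sympy as sp

x, t = sp.symbols('x t')
kappa, x0 = sp.symbols('kappa x0', positive=True)

# the solitary wave produced by the Miura/Riccati construction
u = -2 * kappa**2 / sp.cosh(kappa * (x - x0 - 4 * kappa**2 * t))**2

# residual of u_t - 6 u u_x + u_xxx; rewrite in exponentials before simplifying
res = sp.simplify((u.diff(t) - 6 * u * u.diff(x) + u.diff(x, 3)).rewrite(sp.exp))
print(res)  # 0
```

Rewriting the hyperbolic functions in exponentials turns the residual into a rational expression, which sympy cancels to zero.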

The above transformation of solutions of the KdV equation to other solutions is an example
of a Bäcklund transformation. Assume that u and v are functions of two variables x and t and
that
P(u, ux , ut , . . . ; x,t) = 0 and Q(v, vx , vt , . . . ; x,t) = 0
are two uncoupled partial differential equations. A Bäcklund transformation for P and Q is a
system of differential equations

Fj (u, v, ux , vx , ut , vt , . . . ; x,t) = 0, j = 1, 2,

which is solvable for u if and only if Q(v) = 0 and has the property that u then solves P(u) = 0,
and vice versa. If P = Q, then Fj = 0 is called an auto-Bäcklund transformation.

Example 7.3. The simplest example of a Bäcklund transformation is the Cauchy-Riemann


system
$$u_x = v_y, \qquad u_y = -v_x,$$
which is an auto-Bäcklund transformation for Laplace’s equation.
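A concrete illustration of this example, with a pair of my own choosing: the real and imaginary parts of the analytic function $(x + iy)^3$ satisfy the Cauchy-Riemann system, and each component is then harmonic:

```python
import sympy as sp

x, y = sp.symbols('x y')
# components of the analytic function u + i*v = (x + i*y)**3
u = x**3 - 3 * x * y**2
v = 3 * x**2 * y - y**3

cr1 = sp.expand(u.diff(x) - v.diff(y))        # first Cauchy-Riemann equation
cr2 = sp.expand(u.diff(y) + v.diff(x))        # second Cauchy-Riemann equation
h_u = sp.expand(u.diff(x, 2) + u.diff(y, 2))  # Laplacian of u
h_v = sp.expand(v.diff(x, 2) + v.diff(y, 2))  # Laplacian of v
print(cr1, cr2, h_u, h_v)  # 0 0 0 0
```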

The above transformation for the KdV equation may also be seen as an auto-Bäcklund
transformation. Introducing two functions w1 and w2 such that w jx = u j , the system defining
the Bäcklund transformation can be written

$$(w_1 + w_2)_x = 2\lambda + \tfrac12(w_1 - w_2)^2, \qquad (w_1 - w_2)_t = 3(w_{1x}^2 - w_{2x}^2) - (w_1 - w_2)_{xxx}. \tag{7.7}$$

The relation with (7.6) is that 2v = w1 − w2 is a solution and u1 = λ + v2 + vx , while u2 =


λ +v2 −vx . The amazing thing about the auto-Bäcklund transformation is that we can continue
generating new solutions from the solitary wave. In practice this turns out to be a bit difficult
since we have to integrate two ordinary differential equations in each step (one in x and one
in t). There is an elegant way to avoid this. The idea is to use the parameter λ . For this it is
convenient to work with (7.7). We first relabel the solutions. Let w0 be the known solution.

We solve equation (7.7) with w2 replaced by w0 and with two different values of λ , say λ1 and
λ2 . The two solutions are denoted w1 and w2 , so that
$$(w_1 + w_0)_x = 2\lambda_1 + \tfrac12(w_1 - w_0)^2, \tag{7.8}$$
$$(w_2 + w_0)_x = 2\lambda_2 + \tfrac12(w_2 - w_0)^2. \tag{7.9}$$
Using these solutions, we define w12 and w21 by
$$(w_{12} + w_1)_x = 2\lambda_2 + \tfrac12(w_{12} - w_1)^2, \tag{7.10}$$
$$(w_{21} + w_2)_x = 2\lambda_1 + \tfrac12(w_{21} - w_2)^2. \tag{7.11}$$
Theorem 7.4 (Permutability theorem). Assume that w1 and w2 are solutions of (7.8)–(7.9).
There exist solutions w12 and w21 of (7.10)–(7.11) with w12 = w21 .
Such a result was originally proved for a similar problem in differential geometry by
Bianchi. Assume for a moment that the theorem is true, so that w12 = w21 = w. Forming
the combination (7.11)−(7.10)−((7.9)−(7.8)), we deduce that
$$0 = 4(\lambda_1 - \lambda_2) + \tfrac12\big( (w - w_2)^2 - (w - w_1)^2 - (w_2 - w_0)^2 + (w_1 - w_0)^2 \big).$$
Solving for w, we obtain the formula
$$w = w_0 - \frac{4(\lambda_1 - \lambda_2)}{w_1 - w_2}. \tag{7.12}$$
This may be seen as a nonlinear superposition principle. From the two solutions u1 and u2
corresponding to w1 and w2 , we obtain a new solution u = wx .
Example 7.5. We take w0 = 0, w1 = −2 tanh(x − 4t) and w2 = −4 coth(2x − 32t) so that
λ1 = −1 and λ2 = −4. We therefore obtain
$$w(x,t) = -\frac{6}{2\coth(2x - 32t) - \tanh(x - 4t)}.$$
The corresponding solution of the KdV equation u(x,t) = wx (x,t) is (after some simplifica-
tions) given by
$$u(x,t) = -12\, \frac{3 + 4\cosh(2x - 8t) + \cosh(4x - 64t)}{\big( 3\cosh(x - 28t) + \cosh(3x - 36t) \big)^2}.$$
This is a so-called two-soliton solution. As t → ±∞, the solution has the asymptotic form
$$u(x,t) \sim -8\,\mathrm{sech}^2\big(2(x - 16t) \mp \tfrac12\log 3\big) - 2\,\mathrm{sech}^2\big(x - 4t \pm \tfrac12\log 3\big).$$
The solution can therefore be interpreted as describing the interaction of the two solitary waves
$-8\,\mathrm{sech}^2(2(x - 16t) \mp \tfrac12\log 3)$ and $-2\,\mathrm{sech}^2(x - 4t \pm \tfrac12\log 3)$. For large negative t, the solution is
basically a linear combination of these two solutions. This is not surprising since the solitary
waves are very localized and are far apart. For t close to 0 the solutions interact in a nonlinear
way. The amazing thing is that the waves come out of the interaction looking exactly the
same as before. The only difference is a phase shift, so that the waves are not at the positions
one would expect if the interaction were linear. It is this property that gave rise to the name
solitons; the solitary waves interact almost like particles. Figure 7.1 illustrates the soliton
interaction.
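The two-soliton formula can be checked against the KdV equation, here numerically at a few sample points (the sample points are my own choice), assuming sympy:

```python
import sympy as sp

x, t = sp.symbols('x t')
U = -12 * (3 + 4 * sp.cosh(2 * x - 8 * t) + sp.cosh(4 * x - 64 * t)) \
    / (3 * sp.cosh(x - 28 * t) + sp.cosh(3 * x - 36 * t))**2

# residual of u_t - 6 u u_x + u_xxx, evaluated numerically
res = sp.lambdify((x, t), U.diff(t) - 6 * U * U.diff(x) + U.diff(x, 3))
err = max(abs(res(xv, tv)) for xv in (-1.0, 0.3, 2.0) for tv in (-0.2, 0.0, 0.1))
print(err < 1e-6)  # True: the residual is zero up to rounding
```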

Figure 7.1: A two-soliton solution – before, during and after interaction. Note that u has been
changed to −u, so that this is a solution of ut + 6uux + uxxx = 0.

Proof of Theorem 7.4. To check the validity of the permutability theorem, we simply substi-
tute w given by (7.12) back into equations (7.10) and (7.11). This results in

$$(w_1 + w_0)_x + \frac{4(\lambda_1 - \lambda_2)}{(w_1 - w_2)^2}(w_1 - w_2)_x = 2\lambda_2 + \frac12\left( w_1 - w_0 + \frac{4(\lambda_1 - \lambda_2)}{w_1 - w_2} \right)^2$$
and
$$(w_2 + w_0)_x + \frac{4(\lambda_1 - \lambda_2)}{(w_1 - w_2)^2}(w_1 - w_2)_x = 2\lambda_1 + \frac12\left( w_2 - w_0 + \frac{4(\lambda_1 - \lambda_2)}{w_1 - w_2} \right)^2.$$
We now use (7.8) and (7.9) to rewrite this as
$$\frac{4(\lambda_1 - \lambda_2)}{(w_1 - w_2)^2}(w_1 - w_2)_x = 2(\lambda_2 - \lambda_1) + \frac{4(\lambda_1 - \lambda_2)(w_1 - w_0)}{w_1 - w_2} + \frac12\left( \frac{4(\lambda_1 - \lambda_2)}{w_1 - w_2} \right)^2,$$
$$\frac{4(\lambda_1 - \lambda_2)}{(w_1 - w_2)^2}(w_1 - w_2)_x = 2(\lambda_1 - \lambda_2) + \frac{4(\lambda_1 - \lambda_2)(w_2 - w_0)}{w_1 - w_2} + \frac12\left( \frac{4(\lambda_1 - \lambda_2)}{w_1 - w_2} \right)^2,$$
or equivalently,
$$(w_1 - w_2)_x = -\tfrac12(w_1 - w_2)^2 + (w_1 - w_2)(w_1 - w_0) + 2(\lambda_1 - \lambda_2) = \tfrac12\big( (w_1 - w_0)^2 - (w_2 - w_0)^2 \big) + 2(\lambda_1 - \lambda_2),$$
$$(w_1 - w_2)_x = \tfrac12(w_1 - w_2)^2 + (w_1 - w_2)(w_2 - w_0) + 2(\lambda_1 - \lambda_2) = \tfrac12\big( (w_1 - w_0)^2 - (w_2 - w_0)^2 \big) + 2(\lambda_1 - \lambda_2).$$


This is nothing but the difference between (7.8) and (7.9). We also need to verify the second
part of the Bäcklund transformation. That is, we need to show that
$$w_t - 3w_x^2 + w_{xxx} = w_{2t} - 3w_{2x}^2 + w_{2xxx} = 0,$$
where we have used the fact that w2x solves the KdV equation. An argument similar to the one
above shows that this is satisfied. The calculations are straightforward but rather lengthy. We
leave them to the interested reader.

7.3 Lax pairs


We now return to the problem of transforming the KdV equation to a linear problem. As we
will see, the Miura transformation is useful also in this aspect.

7.3.1 The scattering problem


In the last section, we sometimes treated (7.2) as an equation in the unknown v for a given u.
We mentioned that this is a Riccati equation, a type which can always be rewritten as a second
order linear ordinary differential equation through the introduction of a potential ϕ such that
$$v = \frac{\varphi_x}{\varphi} = \partial_x \ln(\varphi).$$

This leads to
$$u = \partial_x\left( \frac{\varphi_x}{\varphi} \right) + \left( \frac{\varphi_x}{\varphi} \right)^2 = \frac{\varphi_{xx}\varphi - \varphi_x^2}{\varphi^2} + \frac{\varphi_x^2}{\varphi^2} = \frac{\varphi_{xx}}{\varphi},$$

so that
ϕxx − uϕ = 0,

at least as long as ϕ 6= 0. Ingeniously, Gardner, Greene, Kruskal and Miura generalized this,
and went on to investigate the scattering problem

ψxx + (λ − u)ψ = 0. (7.13)

For every fixed t, and equipped with appropriate boundary conditions, this is a Sturm–Liouville
problem. For given u and for every t, there are only certain λ (called eigenvalues) that admit
solutions ψ (called eigenfunctions). Furthermore, it turns out that (7.13) constitutes an implicit
linearization of the KdV equation: the evolution of u(x,t) according to the KdV equation,
forces λ (t) and ψ(x,t) to evolve in a specific way, which will be calculated below.

7.3.2 The evolution equation

What happens when one substitutes (7.13) into the KdV equation? To simplify calculations,
define
R(x,t) := ψt (x,t) + ux (x,t)ψ(x,t) − 2 (u(x,t) + 2λ (t)) ψx (x,t)

and let
W ( f , g) := f gx − fx g

be the Wronskian of two functions f and g (with respect to x). Then

 
$$\begin{aligned} W(\psi, R) &= \psi^2 \left( \frac{R}{\psi} \right)_x = \psi^2 \left( \frac{\psi_t + u_x\psi - 2(u+2\lambda)\psi_x}{\psi} \right)_x \\ &= \psi_{tx}\psi - \psi_t\psi_x + u_{xx}\psi^2 + u_x\psi_x\psi - u_x\psi\psi_x - 2u_x\psi_x\psi \\ &\quad - 2(u+2\lambda)\psi_{xx}\psi + 2(u+2\lambda)\psi_x^2 \\ &= \psi_{tx}\psi - \psi_t\psi_x + u_{xx}\psi^2 - 2u_x\psi_x\psi + 2(u+2\lambda)\big(\psi_x^2 - \psi_{xx}\psi\big), \end{aligned}$$


and

$$\begin{aligned} \partial_x W(\psi, R) &= \psi_{txx}\psi + \psi_{tx}\psi_x - \psi_{tx}\psi_x - \psi_t\psi_{xx} + u_{xxx}\psi^2 + 2u_{xx}\psi\psi_x \\ &\quad - 2u_{xx}\psi_x\psi - 2u_x\psi_{xx}\psi - 2u_x\psi_x^2 + 2u_x\big(\psi_x^2 - \psi_{xx}\psi\big) \\ &\quad + 2(u+2\lambda)\big( 2\psi_x\psi_{xx} - \psi_{xxx}\psi - \psi_{xx}\psi_x \big) \\ &= \psi^2 \left( \frac{\psi_{xx}}{\psi} \right)_t + u_{xxx}\psi^2 - 4u_x\psi_{xx}\psi + 2(u+2\lambda)\big( \psi_x\psi_{xx} - \psi_{xxx}\psi \big) \\ &= \psi^2 (u-\lambda)_t + u_{xxx}\psi^2 - 4u_x\psi^2(u-\lambda) - 2(u+2\lambda)\psi^2 \left( \frac{\psi_{xx}}{\psi} \right)_x \\ &= \psi^2 \big( (u-\lambda)_t + u_{xxx} - 4u_x(u-\lambda) - 2(u+2\lambda)(u-\lambda)_x \big) \\ &= \psi^2 \big( u_t - \lambda_t + u_{xxx} - 6uu_x \big). \end{aligned}$$

Thus, if (7.13) holds, then

ψ 2 ∂t λ = ∂xW (R, ψ) exactly when ut + uxxx = 6uux .

In particular, if W (ψ, R) has decay, then we may integrate over R to obtain that
$$\int_{\mathbb R} \lambda_t\, \psi^2\, dx = 0.$$

This is an unexpected and amazing result: if u(x,t) solves the KdV equation, then the eigen-
values λ of (7.13) are time-independent, although both u and ψ depend on time. Moreover,
W (ψ, R) is a function of time only, and thus

$$W(\psi, R) = W(\psi, R)\big|_{x=\infty} = 0,$$

if we assume, for example, that ψ(·,t) ∈ S (R). Put differently,


 
$$\psi^2\, \partial_x\!\left( \frac{R}{\psi} \right) = 0,$$

so that R/ψ, too, is a function of time only. Given that

$$\frac{R}{\psi} \to C \qquad \text{as } x \to \infty,$$

with C independent of time, we can deduce the following evolution equation for ψ:

ψt + ux ψ − 2(u + 2λ )ψx = Cψ. (7.14)

Thus, at least formally, we have related the KdV equation in u to a linear problem in ψ:

ψxx + (λ − u)ψ = 0,
(7.15)
ψt + (ux −C)ψ − 2(u + 2λ )ψx = 0.

Given (7.15), one can show (do this!) that the compatibility condition

ψtxx = ψxxt ,

is equivalent to the relation

ψ (ut − λt + uxxx − 6uux ) = 0,

found above. One sometimes says that the KdV equation is the compatibility condition for the
system (7.15), meaning that λ = const in (7.15) if and only if u satisfies the KdV equation.
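The compatibility computation suggested above ("do this!") can be carried out symbolically. A sketch, assuming sympy; the first-order reduction with placeholder symbols for ψ and ψx is my own device:

```python
import sympy as sp

x, t, C = sp.symbols('x t C')
p, q = sp.symbols('psi psi_x')          # placeholders for psi and psi_x
u = sp.Function('u')(x, t)
lam = sp.Function('lambda')(t)

def Dx(f):
    # total x-derivative; the chain rule uses psi_x -> psi_xx = (u - lambda)*psi
    return sp.diff(f, x) + sp.diff(f, p) * q + sp.diff(f, q) * (u - lam) * p

pt = -(u.diff(x) - C) * p + 2 * (u + 2 * lam) * q   # psi_t from (7.15)
qt = Dx(pt)                                         # psi_{xt} = (psi_t)_x

def Dt(f):
    return sp.diff(f, t) + sp.diff(f, p) * pt + sp.diff(f, q) * qt

lhs = Dx(Dx(pt))            # psi_{txx}
rhs = Dt((u - lam) * p)     # psi_{xxt}, since psi_xx = (u - lambda)*psi
kdv = u.diff(t) - lam.diff(t) + u.diff(x, 3) - 6 * u * u.diff(x)
final = sp.expand(lhs - rhs + p * kdv)
print(final)  # 0
```

The output shows that ψtxx − ψxxt = −ψ(ut − λt + uxxx − 6uux), so the mixed derivatives agree exactly when the bracket vanishes.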

7.3.3 The relation to Lax pairs


The formulation (7.15) can be used to solve the KdV equation by means of the inverse scatter-
ing transform. The idea is basically that one starts with the initial datum u0 , and calculates the
corresponding eigenvalues λ and eigenfunctions ψ. One then lets these evolve in time, and at a
later point in time tries to recover u from λ and ψ. Note, however, that the evolution equation
for ψ involves u. It is therefore not feasible to keep track of the actual eigenfunctions. More-
over, for eigenvalue problems such as the one above, the spectrum does not only consist of
eigenvalues, but also of so called continuous spectrum. It turns out, however, that the function
u in the scattering problem can be determined from the eigenvalues and some additional con-
stants, whose evolution is also known. For an introduction to this beautiful subject we refer
to [5]. The scattering/inverse scattering approach to the KdV equation seems like magic, and
it was for some time unclear if it applied to other equations. In 1968, Peter Lax discovered
an abstract structure behind the calculations above. This structure is called a Lax pair in his
honour. There are by now many examples of equations having Lax pairs.
One first notes that (7.15) can be reformulated as

Lψ = λ ψ,
(7.16)
ψt = Bψ,

where L is the linear and symmetric operator in the scattering problem, and B is the cor-
responding evolution operator. Now, assume that ∂t λ = 0. Taking the time-derivative of
Lψ = λ ψ, we then get that

0 = Lt ψ + Lψt − λ ψt
= Lt ψ + LBψ − λ Bψ
= Lt ψ + LBψ − Bλ ψ
= (Lt + LB − BL) ψ.

Denoting BL − LB by [B, L], we see that

Lt = [B, L],

for all ψ which solve (7.16).



Contrariwise, suppose that


Lt = [B, L],

where L and B are spatial, but time-dependent, operators on some Hilbert space X, and L is
linear and symmetric, i.e.

(Lψ, ϕ)X = (ψ, Lϕ)X , ψ, ϕ ∈ X.

One may then consider the eigenvalue problem

Lψ = λ ψ, 0 ≠ ψ ∈ X.

We assume also that L, B and the eigenfunctions of L are continuously differentiable with
respect to t. Just as above, we find that

Lt ψ + Lψt = λt ψ + λ ψt ,

whence
λt ψ = (L − λ )ψt + [B, L]ψ
     = (L − λ )ψt + Bλ ψ − LBψ
     = (L − λ )ψt + (λ − L)Bψ
     = (L − λ )(ψt − Bψ),    (7.17)

since Lt = [B, L] and Lψ = λ ψ. By using the symmetry of L it follows that

λt (ψ, ψ)X = (ψ, (L − λ )(ψt − Bψ))X
           = ((L − λ )ψ, ψt − Bψ)X
           = (0, ψt − Bψ)X = 0.

Thus λt = 0, and we infer from (7.17) that

(L − λ )(ψt − Bψ) = 0,

i.e. ψt − Bψ belongs to the eigenspace of L corresponding to the eigenvalue λ (it is an eigenfunction unless it vanishes). In particular, when the eigenspace corresponding to λ is one-dimensional, then

ψt − Bψ = f (t)ψ,

for some function f depending only on time. Since f commutes with the spatial operator L,
we find that B̃ := B + f satisfies both

ψt = B̃ψ and Lt = [B̃, L].



7.3.4 A Lax-pair formulation for the KdV equation


Theorem 7.6. The KdV equation can be written as the Lax equation Lt = [B, L] for the Lax
pair
L := −∂x² + u and B := −4∂x³ + 6u∂x + 3ux + C, C ∈ R.

Remark 7.7. Notice that this Lax pair appears already in the linear formulation (7.15) of the KdV equation. This follows since the scattering problem

λ ψ = Lψ = (u − ∂x²)ψ

implies that

ψt = −(ux − C)ψ + 2(u + 2λ )ψx
   = [−ux + C + 2u∂x + 4λ ∂x ]ψ
   = [−ux + C + 2u∂x + 4∂x (u − ∂x²)]ψ
   = [−ux + C + 2u∂x + 4(ux + u∂x − ∂x³)]ψ
   = [−4∂x³ + 6u∂x + 3ux + C]ψ
   = Bψ,

where we used that λ ψx = ∂x (λ ψ) = ∂x (u − ∂x²)ψ, λ being independent of x; C is an arbitrary constant.

Proof. Elementary calculations yield that

LB = 4∂x⁵ − 4u∂x³ − 6(uxx ∂x + 2ux ∂x² + u∂x³) + 6u²∂x
     − 3(uxxx + 2uxx ∂x + ux ∂x²) + 3uux + CL
   = 4∂x⁵ − 10u∂x³ − 15ux ∂x² − 12uxx ∂x + 6u²∂x − 3uxxx + 3uux + CL

and

BL = 4∂x⁵ − 4(uxxx + 3uxx ∂x + 3ux ∂x² + u∂x³) − 6u∂x³
     + 6u(ux + u∂x ) − 3ux ∂x² + 3uux + CL
   = 4∂x⁵ − 10u∂x³ − 15ux ∂x² − 12uxx ∂x + 6u²∂x − 4uxxx + 9uux + CL.

We see that

Lt = ut and [B, L] = BL − LB = −uxxx + 6uux ,

whence the Lax equation Lt = [B, L] is just the KdV equation ut = −uxxx + 6uux itself. This explains why the eigenvalues are constant in time, as was also calculated explicitly above.
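The commutator computation above is easy to check symbolically. The following sketch (assuming the sympy library is available) applies L and B to a generic function ψ(x) and verifies that (BL − LB)ψ = (−uxxx + 6uux)ψ, i.e. that [B, L] acts as multiplication by −uxxx + 6uux.

```python
import sympy as sp

x, C = sp.symbols('x C')
u = sp.Function('u')(x)
psi = sp.Function('psi')(x)

def L(f):
    # L = -d_x^2 + u (Schroedinger operator with potential u)
    return -sp.diff(f, x, 2) + u * f

def B(f):
    # B = -4 d_x^3 + 6 u d_x + 3 u_x + C
    return (-4 * sp.diff(f, x, 3) + 6 * u * sp.diff(f, x)
            + 3 * sp.diff(u, x) * f + C * f)

# [B, L] psi = (BL - LB) psi, with the convention [B, L] = BL - LB
commutator = sp.expand(B(L(psi)) - L(B(psi)))
multiplication = sp.expand((-sp.diff(u, x, 3) + 6 * u * sp.diff(u, x)) * psi)
print(sp.simplify(commutator - multiplication))  # 0
```

All derivative terms of ψ cancel, exactly as in the proof, so only the multiplication operator −uxxx + 6uux survives.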
Appendix A

Functional analysis

The basic idea of functional analysis is to consider functions (or other objects) as points or
vectors in a vector space (also called a linear space). The space should be equipped with some
topology, so that one can discuss things such as convergence and continuity. By proving results
based on the structure of the vector space and not on the specific objects under consideration,
one can deduce results which apply in many different concrete cases. In most cases discussed
in these notes, the topology is defined in terms of a norm. We assume that the reader is familiar
with the definition of a vector space. Throughout the chapter, if nothing else is said, X denotes a vector space over the real or complex numbers. The letter K will denote R or C, depending on whether X is real or complex. Since the idea is to give a brief overview, we will not give complete proofs of all the statements. The reader will find proofs in any book on functional analysis, e.g. [16].

A.1 Banach spaces


Definition A.1. A norm on X is a function k · k : X → [0, ∞) such that

(1) kλ xk = |λ |kxk (positive homogeneity);

(2) kx + yk ≤ kxk + kyk (triangle inequality);

(3) kxk = 0 ⇔ x = 0 (positive definiteness);

for all x, y ∈ X and λ ∈ K. The pair (X, k · k) is called a normed vector space.

We will sometimes also make use of functions k · k : X → [0, ∞) which satisfy conditions
(1) and (2) of the above definition, but not necessarily (3). Such a function is called a semi-
norm. Whenever we have a semi-norm k · kX on X, we can however obtain a normed vector
space by introducing an equivalence relation. We say that x ∼ y if kx − ykX = 0. The tri-
angle inequality shows that ∼ really defines an equivalence relation. By the reverse triangle
inequality, we find that
|kxkX − kykX | ≤ kx − ykX = 0


if x ∼ y. We can therefore define a norm k · kY by setting k[x]kY = kxkX for the equivalence
class [x] ∈ Y containing the element x ∈ X (the above argument shows that it is independent of
the choice of representative). We leave it to the reader to verify that k · kY really is a norm on
Y.

Example A.2.
kuk = sup |u(x)|
0≤x≤1

defines a norm on C[0, 1], the space of continuous functions on the interval [0, 1]. It only defines
a semi-norm on C(R), since any function u which vanishes outside of the interval [0, 1] satisfies
kuk = 0.

From now on we suppose that X is endowed with a norm k · k. Using the norm one can
define the open ball Br (x) of radius r > 0 around a point x ∈ X as Br (x) = {y ∈ X : kx−yk < r}.
Using these open balls one can define open and closed sets and continuous functions just as in
Rd .

Note that any two norms k · k1 and k · k2 on a finite-dimensional space are equivalent, in the sense that there exist constants C1 , C2 > 0 such that C1 kxk1 ≤ kxk2 ≤ C2 kxk1 for all x. This implies that the norms define the same topology, i.e. the same notion of open sets.

Of particular interest are linear operators between normed vector spaces. We have the
following basic result.

Proposition A.3. Let (X, k · kX ) and (Y, k · kY ) be normed vector spaces and let T : X → Y be
a linear operator. The following are equivalent.

(1) T is continuous at some point in X;

(2) T is continuous at every point in X;

(3) there exists a constant C ≥ 0 such that

kT xkY ≤ CkxkX

for all x ∈ X.

Due to (3), a continuous linear operator X → Y is also called a bounded linear operator. The smallest possible constant in (3),

kT k := sup_{kxkX =1} kT xkY = sup_{x ∈ X\{0}} kT xkY / kxkX ,

is called the operator norm of T . By definition, we have kT xkY ≤ kT k kxkX . Note that the set B(X,Y ) of bounded linear operators X → Y is a normed vector space when endowed with the operator norm.
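In finite dimensions a linear operator is a matrix, and the supremum in the definition can be approximated directly. The following sketch (assuming the numpy library is available) samples unit vectors in R² and compares the brute-force estimate of sup_{kxk=1} kAxk with the largest singular value of A, which is the exact Euclidean operator norm.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

# Approximate ||A|| = sup_{||x|| = 1} ||Ax|| by sampling unit vectors in R^2
angles = np.linspace(0.0, 2.0 * np.pi, 200001)
xs = np.stack([np.cos(angles), np.sin(angles)])   # columns are unit vectors
estimate = np.max(np.linalg.norm(A @ xs, axis=0))

exact = np.linalg.norm(A, 2)   # largest singular value of A
print(estimate, exact)         # both ~ 2.2882456, i.e. sqrt(3 + sqrt(5))
```

For this particular A the operator norm is √(3 + √5), the square root of the largest eigenvalue of AᵀA.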

Every linear operator defined on a finite-dimensional vector space is bounded. If X is infinite-dimensional there always exist unbounded linear operators on X, although they cannot be constructed explicitly. The simplest kind of linear operators are the linear functionals, that is, operators X → K.

Definition A.4. The dual of X is the space of bounded linear functionals on X. It is denoted
X ∗.

The dual of a finite-dimensional space X can be identified with itself. Indeed, given a basis
{e1 , . . . , en } for X, any linear functional ` on X can be written

`(x1 e1 + · · · + xn en ) = c1 x1 + · · · + cn xn ,

where c j = `(e j ). This defines a linear isomorphism X ∗ → Kn .


The linear functionals on X induce a topology on X which is different from the one induced
by the norm.

Definition A.5. We say that a sequence {xn }∞_{n=1} in X converges weakly to x ∈ X if `(xn ) → `(x) for every ` ∈ X ∗ . We then write xn ⇀ x. The topology defined by weak convergence is called the weak topology on X.

This topology is important since it is often much easier to prove weak convergence than strong
convergence (convergence in norm). There are many cases where it is enough to have a weak
limit.
An important property which separates R from Q is its completeness. This can be expressed in many ways, e.g. as the property that an increasing sequence which is bounded
from above always has a limit in R (this is not true in Q as can be seen e.g. by looking at
the decimal expansion of any irrational number). An equivalent formulation of this is that any
Cauchy sequence in R is convergent. This turns out to be the correct generalization. Recall
that a Cauchy sequence is a sequence {xn } satisfying xn − xm → 0 as n, m → ∞.

Definition A.6. X is said to be complete if every Cauchy sequence in X is convergent. A


complete normed vector space is also called a Banach space.

Proposition A.7. A closed subspace of a Banach space is a Banach space.

The basic example of a function space which is a Banach space is the space C(I) of continuous real-valued functions on a compact interval I, endowed with the supremum norm. The completeness property basically follows from the fact that R is complete and that uniform limits of continuous functions are continuous. If I is not compact, one merely has to add the condition sup_{t∈I} |u(t)| < ∞ to the definition. We denote the space of bounded continuous real-valued functions on I by Cb (I). If I is compact we have that Cb (I) = C(I). This example can be generalized in many ways. One example which we use repeatedly in the notes is the following.

Example A.8. Let X be a Banach space and I a compact interval. Then the space C(I; X) consisting of continuous functions u : I → X is a Banach space, with norm given by

kukC(I;X) = sup_{t∈I} ku(t)kX .

The proof of completeness is almost word for word the same as in the case of real-valued functions. One only has to use the completeness of X instead of the completeness of R. Similarly, for any k ≥ 0, we introduce the space Ck (I; X) consisting of k times continuously differentiable functions from I to X. Here, differentiability of u at t means that there exists an element v ∈ X such that

lim_{h→0} (u(t + h) − u(t))/h = v

in X. Ck (I; X) is also a Banach space when endowed with the norm

kukCk (I;X) = max_{0≤ j≤k} sup_{t∈I} ku( j) (t)kX .

A.2 Hilbert spaces


A norm often comes from an inner product. We recall the definition.
Definition A.9. An inner product on X is a function (·, ·) : X × X → K such that

(1) (λ x + µy, z) = λ (x, z) + µ(y, z) (linearity in the first argument);

(2) (x, y) = \overline{(y, x)} (conjugate symmetry);

(3) (x, x) ≥ 0 with equality iff x = 0 (positive definiteness);

for every x, y, z ∈ X and λ , µ ∈ K. The pair (X, (·, ·)) is called an inner product space.

Using the conjugate symmetry one finds that (x, λ y + µz) = \overline{λ}(x, y) + \overline{µ}(x, z). One says that the inner product is sesquilinear. If K = R, then (2) simply means that (x, y) = (y, x) and the inner product is then bilinear. The norm associated with the inner product is kxk = (x, x)^{1/2} . We recall the Cauchy–Schwarz inequality: |(x, y)| ≤ kxk kyk, with equality if and only if x and y are linearly dependent.
Definition A.10. A complete inner product space is called a Hilbert space.
Example A.11. The expression (u, v)L2 = ∫_{−1}^{1} u(x)v(x) dx defines an inner product on C[−1, 1]. C[−1, 1] is not a Hilbert space with this inner product. The sequence un (x) = nx/(n|x| + 1) is an example of a Cauchy sequence which does not converge to an element of C[−1, 1]. Indeed, if v(x) = 1 for x > 0 and v(x) = −1 for x < 0, we find that

kun − vk²_{L2} = 2 ∫_0^1 dx/(nx + 1)² = 2/(n + 1) → 0

as n → ∞. It follows that kun − um kL2 ≤ kun − vkL2 + kv − um kL2 → 0 as n, m → ∞, so that {un } is a Cauchy sequence. On the other hand the sequence cannot have a limit u ∈ C[−1, 1], since we would then obtain that ∫_{−1}^{1} |u(x) − v(x)|² dx = 0, which would imply that u(x) = 1 for x > 0 and u(x) = −1 for x < 0, contradicting the continuity of u at 0.
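The closed form 2/(n + 1) is easy to confirm numerically. The following sketch (assuming the numpy library is available) approximates kun − vk²_{L2} by a midpoint rule on (0, 1), using the symmetry of the integrand, and compares it with the exact value.

```python
import numpy as np

def sq_dist(n, m=1_000_000):
    # ||u_n - v||_{L^2}^2 = 2 * int_0^1 dx / (nx + 1)^2 by symmetry,
    # approximated with the midpoint rule on m subintervals of (0, 1)
    x = (np.arange(m) + 0.5) / m
    return 2.0 * np.mean(1.0 / (n * x + 1.0) ** 2)

for n in [1, 10, 100]:
    print(n, sq_dist(n), 2.0 / (n + 1))  # the two columns agree
```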

We saw earlier that the dual of a finite-dimensional space can be identified with the space
itself. Generally, this is not the case in infinite dimensions. However, for Hilbert spaces it is
true. Note that any y ∈ X defines a bounded linear functional `y through the formula

`y (x) = (x, y).

The fact that this is a linear functional is obvious; the fact that it is bounded with norm kyk follows from the Cauchy–Schwarz inequality. Riesz' representation theorem asserts that the converse is also true.
Theorem A.12 (Riesz’ representation theorem). Let (X, (·, ·)) be a Hilbert space. Given any
bounded linear functional ` ∈ X ∗ , there exists a unique element y = y` ∈ X such that

`(x) = (x, y)

for every x ∈ X. The map Φ : X ∗ → X, ` 7→ y` is an isometric anti-isomorphism, meaning that

(1) Φ is bijective;

(2) kΦ(`)k = k`k;

(3) Φ(λ `1 + µ`2 ) = \overline{λ} Φ(`1 ) + \overline{µ} Φ(`2 );

for all `1 , `2 ∈ X ∗ and µ, λ ∈ K.

A.3 Metric spaces and Banach’s fixed point theorem


When solving nonlinear equations, the most important tool is Banach’s fixed point theorem
(also called the contraction principle). This is e.g. the main ingredient in the implicit function
theorem and the inverse function theorem. It can also be used to prove the Picard-Lindelöf
theorem on existence and uniqueness of solutions of the initial-value problem for ordinary
differential equations. Although we will always use it in Banach spaces, it is more natural to
present it in the setting of metric spaces.
Definition A.13. Let X be a set. A map ρ : X × X → [0, ∞) is called a metric if
(1) ρ(x, y) = 0 ⇔ x = y;

(2) ρ(x, y) = ρ(y, x);

(3) ρ(x, z) ≤ ρ(x, y) + ρ(y, z);


for every x, y, z ∈ X. (X, ρ) is called a metric space.
Given a metric, one can again define a topology by introducing the open balls Br (x) = {y ∈
X : ρ(x, y) < r}. Any norm induces a metric ρ(x, y) = kx − yk. Note that we can still define
Cauchy sequences in metric spaces by requiring that ρ(xn , xm ) → 0 as m, n → ∞. A metric
space is said to be complete if every Cauchy sequence converges.

Definition A.14. Let (X, ρX ) and (Y, ρY ) be metric spaces. A function f : X → Y is called a
contraction if there exists a constant θ ∈ [0, 1) such that

ρY ( f (x), f (y)) ≤ θ ρX (x, y)

for every x, y ∈ X.
Theorem A.15 (Banach’s fixed point theorem). Let (X, ρX ) be a complete metric space. If
f : X → X is a contraction, then the fixed point equation

f (x) = x

has a unique solution.


Proof. If x and y are solutions, then ρ(x, y) = ρ( f (x), f (y)) ≤ θ ρ(x, y), which forces ρ(x, y) = 0 and hence x = y. This proves uniqueness.
To prove the existence of a solution, we pick an arbitrary element x0 ∈ X and set x1 = f (x0 ),
x2 = f (x1 ), x3 = f (x2 ), . . .. We have

ρ(xn+1 , xn ) = ρ( f (xn ), f (xn−1 )) ≤ θ ρ(xn , xn−1 )

and hence ρ(xn+1 , xn ) ≤ θ^n ρ(x1 , x0 ). If m > n it follows that

ρ(xm , xn ) ≤ ρ(xm , xm−1 ) + · · · + ρ(xn+1 , xn )
           ≤ (θ^{m−1} + · · · + θ^n )ρ(x1 , x0 )
           ≤ ∑_{k=n}^∞ θ^k ρ(x1 , x0 )
           = (θ^n /(1 − θ )) ρ(x1 , x0 )
           → 0

as n → ∞. It follows that {xn } is a Cauchy sequence and thus has a limit x since X is complete.
The fact that x is a solution of the fixed point problem follows by passing to the limit in the
relation xn+1 = f (xn ) (note that a contraction is always continuous).
Note that any closed subset of a Banach space X is a complete metric space with the metric
induced by the norm on X. This is the setting in which we will use the above theorem.
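The proof is constructive: iterating f from any starting point converges to the fixed point. A minimal sketch in Python, using the contraction cos on [0, 1] (cos maps [0, 1] into itself and |cos′(x)| ≤ sin 1 < 1 there); the helper name banach_iterate is ours, not from the notes.

```python
import math

def banach_iterate(f, x0, tol=1e-12, max_iter=10_000):
    # Picard iteration x_{n+1} = f(x_n) from the proof above
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError('no convergence within max_iter steps')

# cos is a contraction on [0, 1] with theta = sin(1) < 1;
# its unique fixed point is the so-called Dottie number
x = banach_iterate(math.cos, 0.5)
print(x)  # ~0.7390851332151607
```

The a priori estimate from the proof, ρ(x, xn) ≤ θⁿρ(x1, x0)/(1 − θ), predicts the (geometric) rate at which the iteration converges.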

A.4 Completions
A subset U of a metric space (X, ρ) is said to be dense in X if for every x ∈ X there exists a sequence {xn } in U such that xn → x. An example is that Q is dense in R. Whenever U is a subset of X, U will be dense in the closure \overline{U}, defined as the smallest closed subset of X containing U (that such a set exists follows from the fact that intersections of closed sets are closed).

For many purposes in functional analysis it is important to work in a complete space. As an example we mention Banach's fixed point theorem above. The question is what to do if the space is not complete. There is an abstract answer to this question which is contained in the following proposition. An isometry between two metric spaces is a map which preserves distances. Note that the inverse of a bijective isometry is also an isometry.
Proposition A.16. Given any metric space (X, ρ), one can find a complete metric space (X̃, ρ̃)
and a map Φ : X → X̃ which is a bijective isometry onto a dense subset of X̃.
Proof. We give a sketch of the proof. The idea is to let X̃ consist of Cauchy sequences in X and to define the distance as the limit of the distance between elements of two such sequences. However, two distinct sequences with the same limit would then have distance 0, so one has to introduce equivalence classes. Two sequences {xn } and {yn } in X are defined to be equivalent if ρ(xn , yn ) → 0. The space X̃ consists of all equivalence classes of Cauchy sequences in X and the metric ρ̃ is defined as

ρ̃([{xn }], [{yn }]) = lim_{n→∞} ρ(xn , yn ),

where [{xn }] denotes the equivalence class containing the sequence {xn }. The limit on the right exists since

|ρ(xm , ym ) − ρ(xn , yn )| ≤ |ρ(xm , ym ) − ρ(xn , ym )| + |ρ(xn , ym ) − ρ(xn , yn )|
                           ≤ ρ(xm , xn ) + ρ(ym , yn ),

so that {ρ(xn , yn )} is a Cauchy sequence in R, which is complete. The limit is independent of the choice of representatives by a similar calculation. The map Φ is defined by letting Φ(x) be the constant sequence x, x, . . . (or rather the corresponding equivalence class). This map is clearly an isometry. Φ(X) is dense since we can approximate x = [{xn }] ∈ X̃ by x(m) = Φ(xm ) = [{xm , xm , . . .}]. Indeed, ρ̃(x, x(m)) = lim_{n→∞} ρ(xn , xm ), which converges to 0 as m → ∞. The only difficult part is the proof of completeness. The trick is to approximate a Cauchy sequence in X̃ by one in Φ(X) (we proved that Φ(X) is dense in X̃). On the other hand, we know that a Cauchy sequence Φ(xn ) in Φ(X) has a limit in X̃, namely [{xn }].
More generally, one says that a complete metric space (Y, ρY ) is a completion of (X, ρX )
if there exists a map Φ : X → Y which is a bijective isometry onto a dense subset of Y . One
usually identifies X with the set Φ(X).
The following properties of the completion are important.
Proposition A.17. If X is a normed space then the completion is a Banach space (with the
norm corresponding to the metric). Similarly, if X is an inner product space then the comple-
tion is a Hilbert space with the natural inner product. Moreover, the map Φ is an isometric
isomorphism, that is, a linear bijection which preserves norms.
Consider the normed vector space from Example A.11. The completion of this space is
a Hilbert space. In Appendix B, we show that the completion can be identified with the space
of square-integrable functions in the sense of Lebesgue, L2 (−1, 1). This is clearly preferable
to working with equivalence classes of Cauchy sequences.
Appendix B

Integration

The Riemann integral is insufficient for many purposes in analysis. Specifically, it doesn't behave well with respect to limits. Suppose that { fn } is an increasing sequence of non-negative Riemann integrable functions. Then both f (x) = lim_{n→∞} fn (x) and lim_{n→∞} ∫ fn dx exist, and it would be natural to expect that ∫ f dx = lim_{n→∞} ∫ fn dx. However, f might not be Riemann integrable, as the following example illustrates.

Example B.1. Let {xn }∞_{n=1} be an enumeration of the rational numbers in the interval [0, 1]. Define the functions fn : [0, 1] → R by setting fn (x) = 1 if x ∈ {x1 , . . . , xn } and fn (x) = 0 otherwise. Each fn is 0 except at finitely many points, so it is Riemann integrable with ∫ fn dx = 0. The sequence { fn (x)} is increasing and bounded from above for each x, so it has a limit f (x). In fact, it is clear that f (x) is 1 at the rational numbers in [0, 1] and zero everywhere else. The function f is not Riemann integrable.
As you know from calculus, integrals measure some form of area or volume. It is therefore
natural to start by discussing measure, that is, the generalization of length, area and volume
to arbitrary dimensions. By assigning a measure to the set of rational points in [0, 1], one
can integrate the function f in the previous example. There are other ways of introducing
the Lebesgue integral on Rd , e.g. by approximation with continuous functions of compact
support.
The main purpose of this appendix is to describe the construction of the Lebesgue integral
and to collect some results which are used throughout the text. Proofs of the theorems can be
found in any book on measure theory and real analysis, e.g. [8].

B.1 Lebesgue measure


The measure of a set should be a function µ which assigns a non-negative number µ(E) (the measure of E) to sets E ⊂ Rd . There are two things to note here. First of all, it is natural to allow µ(E) to take the value +∞. We let [−∞, ∞] denote the extended real line. Any operation involving ±∞ which makes sense in an intuitive way, e.g. x + ∞ = ∞ if x ∈ R, is defined in this way. We define 0 · (±∞) = 0. Otherwise, any operation which can't be defined in an intuitive


way, such as ∞ − ∞, is undefined. Secondly, we will not be able to measure all subsets of Rd .
This is not a big problem, since the class of sets we can measure is very large (e.g. it contains
all open sets and all closed sets).
Definition B.2 (Sigma algebras). A collection Σ of subsets of Rd is called a σ -algebra if the following conditions hold:

(1) Rd ∈ Σ;

(2) if A ∈ Σ, then Ac ∈ Σ;

(3) if A j ∈ Σ, j = 1, 2, . . ., then ∪∞_{j=1} A j ∈ Σ.
It follows from the definition that

• ∅ = (Rd )c ∈ Σ;

• ∩∞_{j=1} A j = (∪∞_{j=1} Acj )c ∈ Σ if A j ∈ Σ for all j;

• A \ B = A ∩ Bc ∈ Σ if A, B ∈ Σ.
Definition B.3 (Measures). A measure on a σ -algebra Σ is a function µ : Σ → [0, ∞] which is countably additive, that is,

µ(∪∞_{j=1} A j ) = ∑_{j=1}^∞ µ(A j )

whenever A j ∈ Σ for all j and the sets A j are pairwise disjoint (A j ∩ Ak = ∅ if j ≠ k), and satisfies µ(∅) = 0.
One can show from the definition that
• µ(A) ≤ µ(B) if A, B ∈ Σ and A ⊂ B;
• µ(∪∞j=1 A j ) ≤ ∑∞j=1 µ(A j ) if A j ∈ Σ for all j;
• µ(∪∞j=1 A j ) = lim j→∞ µ(A j ) if A j ∈ Σ and A j ⊂ A j+1 for all j.

Theorem B.4 (Lebesgue measure). There exist a σ -algebra Σ of subsets of Rd and a measure µ on Σ such that

(1) every open set (and thus every closed set) in Rd belongs to Σ;

(2) if A ⊂ B and B ∈ Σ with µ(B) = 0, then A ∈ Σ with µ(A) = 0;

(3) if A = [a1 , b1 ] × [a2 , b2 ] × · · · × [ad , bd ], then µ(A) = ∏_{j=1}^d (b j − a j ).

We denote this measure | · | and call it the Lebesgue measure on Rd . A set in the above
σ -algebra is called Lebesgue measurable. Since we will only deal with the Lebesgue measure,
we usually drop the word ‘Lebesgue’ in what follows. The Lebesgue measure is a generalization of the intuitive idea of length in R, area in R2 and volume in R3 . It gives the usual value to sets such as polygons, balls etc., but it also allows us to measure ‘wilder’ sets and sets in higher dimensions than 3.

Definition B.5. A set of measure zero is called a null set. A property which holds everywhere
on Rd , except on a null set, is said to hold almost everywhere, abbreviated a.e.

Any countable set is a null set, in particular the set [0, 1] ∩ Q which appeared in Example
B.1.

B.2 Lebesgue integral


Just as in the previous section, we will only be able to define the integral for certain functions.

Definition B.6. Let A be a measurable set. A function f : A → [−∞, ∞] is said to be measurable if the set

{x ∈ A : f (x) > a}

is measurable for each a ∈ R.

Proposition B.7.

(1) If f is measurable, then so is | f |;

(2) if f and g are measurable and real-valued, then so are f + g and f g;

(3) if { f j } is a sequence of measurable functions, then sup j f j , inf j f j , lim sup j→∞ f j and
lim inf j→∞ f j are measurable;

(4) a continuous function defined on a measurable set is measurable;

(5) if f : R → R is continuous and g is measurable and real-valued, then the composition


f ◦ g is measurable.

Recall that the characteristic function χA of a set A ⊂ Rd is defined by

χA (x) = 1 if x ∈ A and χA (x) = 0 if x ∉ A.

Definition B.8. A simple function s on Rd is a finite linear combination of characteristic functions of measurable subsets of Rd , that is,

s(x) = ∑_{j=1}^n a j χA j (x),

where a j ∈ R for all j.

Definition B.9. The Lebesgue integral of a non-negative simple function s = ∑_{j=1}^n a j χA j , a j ≥ 0, is defined as

∫_{Rd} s(x) dx = ∑_{j=1}^n a j |A j |.

The Lebesgue integral of a non-negative measurable function f : Rd → [0, ∞] is defined as

∫_{Rd} f (x) dx = sup ∫_{Rd} s(x) dx,

where the supremum is taken over all simple functions s satisfying 0 ≤ s(x) ≤ f (x) for all x. Note that the integral of a non-negative function can have the value +∞. If it is finite, we say that the function is integrable. We allowed f to take the value +∞, but if f is integrable this cannot happen very often.

Proposition B.10. If f : Rd → [0, ∞] is integrable, then f (x) < ∞ a.e.

If f is measurable and real-valued, we set f = f + − f − , where f + (x) = max{ f (x), 0} and f − (x) = − min{ f (x), 0} are both measurable and non-negative. We define

∫_{Rd} f (x) dx = ∫_{Rd} f + (x) dx − ∫_{Rd} f − (x) dx,

provided that at least one of f + and f − is integrable. If both f + and f − are integrable, we say that f is integrable (some authors use the word summable). Note that this is equivalent to

∫_{Rd} | f (x)| dx < ∞,

so Lebesgue integrals are always absolutely convergent. The set of integrable functions on Rd is denoted 𝓛 1 (Rd ). This set is closed under linear combinations and hence a vector space with the usual pointwise operations.

We can also define integrals over a measurable subset A ⊂ Rd . If f is a measurable function on A, then the extension f˜(x) = f (x), x ∈ A, f˜(x) = 0, x ∈ Ac is also measurable and we set

∫_A f (x) dx = ∫_{Rd} f˜(x) dx

when the right hand side is well-defined. Similarly, if f is a measurable function on Rd , we define the integral of f over A as the integral over Rd of the function χA f (when this is possible). Complex-valued functions are integrated by separating into real and imaginary parts; vector-valued functions are integrated componentwise.

Note that all the usual rules that you have learned from Riemann integration also apply to Lebesgue integrals, e.g. ∫( f + g) dx = ∫ f dx + ∫ g dx and |∫ f dx| ≤ ∫ | f | dx. Moreover, any Riemann integrable function is also Lebesgue integrable and the values of the two integrals coincide. The only thing one has to watch out for is generalized integrals. The Lebesgue integral automatically handles absolutely convergent generalized integrals, but one still needs a limiting procedure to take care of conditionally convergent generalized integrals, such as ∫_R (sin x)/x dx. One thing which is much easier for the Lebesgue integral than for the Riemann integral is iterated integration.

Theorem B.11 (Fubini–Tonelli theorem¹). Let f be a measurable function on Rn+d and suppose that at least one of the integrals

I1 = ∫_{Rn+d} | f (x, y)| dx dy,
I2 = ∫_{Rd} ( ∫_{Rn} | f (x, y)| dx ) dy,
I3 = ∫_{Rn} ( ∫_{Rd} | f (x, y)| dy ) dx

exists and is finite. For I2 and I3 this means that the inner integral exists a.e. and defines an integrable function. Then all the integrals exist and I1 = I2 = I3 . Moreover,

(1) f (·, y) ∈ L1 (Rn ) for a.e. y ∈ Rd and y 7→ ∫_{Rn} f (x, y) dx belongs to L1 (Rd );

(2) f (x, ·) ∈ L1 (Rd ) for a.e. x ∈ Rn and x 7→ ∫_{Rd} f (x, y) dy belongs to L1 (Rn );

(3) ∫_{Rn+d} f (x, y) dx dy = ∫_{Rd} ( ∫_{Rn} f (x, y) dx ) dy = ∫_{Rn} ( ∫_{Rd} f (x, y) dy ) dx.
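The equality of the two iterated integrals in (3) is easy to observe numerically. The following sketch (assuming the numpy library is available) computes both iterated Riemann sums for f (x, y) = e^{−x²−y²}; both agree with the exact value π = (∫_R e^{−x²} dx)².

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 2001)
y = np.linspace(-6.0, 6.0, 1501)
dx, dy = x[1] - x[0], y[1] - y[0]
F = np.exp(-(x[:, None] ** 2 + y[None, :] ** 2))  # f(x, y) = e^{-x^2 - y^2}

I2 = np.sum(np.sum(F, axis=0) * dx) * dy  # integrate in x first, then y
I3 = np.sum(np.sum(F, axis=1) * dy) * dx  # integrate in y first, then x
print(I2, I3, np.pi)  # all three agree to high accuracy
```

The hypothesis matters: f here is non-negative and integrable, so Tonelli's part of the theorem applies directly.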

B.3 Convergence properties


One of the most useful aspects of the Lebesgue integral is that it behaves well with respect to
limits. More precisely, we have the following convergence results.
Theorem B.12 (Fatou’s lemma). Assume that { fn } is a sequence of non-negative, measurable functions. Then

∫_{Rd} lim inf_{n→∞} fn dx ≤ lim inf_{n→∞} ∫_{Rd} fn dx.

Theorem B.13 (Monotone convergence theorem). Assume that { fn } is an increasing sequence of non-negative measurable functions. Then

∫_{Rd} lim_{n→∞} fn dx = lim_{n→∞} ∫_{Rd} fn dx.

Theorem B.14 (Dominated convergence theorem). Assume that { fn } is a sequence of complex-valued integrable functions and that

fn → f a.e.

If

| fn | ≤ g a.e.

for some non-negative integrable function g, then

∫_{Rd} fn dx → ∫_{Rd} f dx.

¹ The first part is due to Tonelli and the second to Fubini. For simplicity we will mostly refer to the theorem as Fubini’s theorem.

Using the dominated convergence theorem one can prove the following important results
about the dependence of integrals on parameters.

Theorem B.15. Let Ω be an open subset of Rn . Suppose that f : Ω × Rd → C and that f (x, ·) : Rd → C is integrable for each x ∈ Ω. Let F(x) = ∫_{Rd} f (x, y) dy.

(1) Suppose that there exists g ∈ 𝓛 1 (Rd ) such that | f (x, y)| ≤ g(y) for all x, y. If lim_{x→x0} f (x, y) = f (x0 , y) for every y, then lim_{x→x0} F(x) = F(x0 ). In particular, if f (·, y) is continuous for each y, then F is continuous.

(2) Suppose that ∂x j f (x, y) exists for all x, y and that there exists g ∈ 𝓛 1 (Rd ) such that |∂x j f (x, y)| ≤ g(y) for all x, y. Then ∂x j F(x) exists and is given by

∂x j F(x) = ∫_{Rd} ∂x j f (x, y) dy.
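A numerical illustration of part (2), assuming the numpy library is available: for f (x, y) = e^{−y²} cos(xy) the partial derivative ∂x f = −y e^{−y²} sin(xy) is dominated by g(y) = |y| e^{−y²} ∈ 𝓛 1 (R), so F may be differentiated under the integral sign (here F(x) = √π e^{−x²/4}).

```python
import math
import numpy as np

y = np.linspace(-8.0, 8.0, 400001)
dy = y[1] - y[0]

def F(x):
    # F(x) = int e^{-y^2} cos(xy) dy  (= sqrt(pi) e^{-x^2/4})
    return np.sum(np.exp(-y ** 2) * np.cos(x * y)) * dy

def dF(x):
    # differentiate under the integral sign: the integrand
    # -y e^{-y^2} sin(xy) is dominated by |y| e^{-y^2}, integrable on R
    return np.sum(-y * np.exp(-y ** 2) * np.sin(x * y)) * dy

h = 1e-5
print((F(1.0 + h) - F(1.0 - h)) / (2 * h))        # central difference
print(dF(1.0))                                     # same value
print(-0.5 * math.sqrt(math.pi) * math.exp(-0.25)) # exact F'(1)
```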

B.4 L p spaces
If p ∈ [1, ∞) and f is a measurable complex-valued function on Rd , we define

k f kL p (Rd ) = ( ∫_{Rd} | f |^p dx )^{1/p} .

We will also write k f kL p if the dimension d is clear from the context. We also define

k f kL∞ (Rd ) = ess sup | f |,

where the essential supremum of a measurable function g is defined as the minimal c ∈ [−∞, ∞] such that g(x) ≤ c a.e., that is,

ess sup g = inf{c ∈ [−∞, ∞] : |{x : g(x) > c}| = 0}.

We say that f ∈ 𝓛 p (Rd ) if k f kL p (Rd ) < ∞.


Theorem B.16 (Hölder’s inequality). Suppose that 1 ≤ p ≤ ∞ and 1/p + 1/q = 1. If f and g are measurable functions on Rd , then

k f gkL1 ≤ k f kL p kgkLq .

In particular, f g ∈ 𝓛 1 if f ∈ 𝓛 p and g ∈ 𝓛 q .
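Hölder’s inequality also holds with sums in place of integrals (counting measure), which makes it easy to test numerically. A sketch assuming the numpy library is available, with the conjugate pair p = 3, q = 3/2:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(1000)
g = rng.standard_normal(1000)

p, q = 3.0, 1.5                      # conjugate indices: 1/3 + 2/3 = 1
lhs = np.sum(np.abs(f * g))                         # ||fg||_1
rhs = (np.sum(np.abs(f) ** p) ** (1 / p)
       * np.sum(np.abs(g) ** q) ** (1 / q))         # ||f||_p ||g||_q
print(lhs <= rhs)  # True
```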

Theorem B.17 (Minkowski’s inequality). If 1 ≤ p ≤ ∞ and f , g ∈ 𝓛 p (Rd ), then

k f + gkL p ≤ k f kL p + kgkL p .

Minkowski’s inequality shows that k · kL p defines a semi-norm on 𝓛 p for 1 ≤ p ≤ ∞. It is not a norm, however, since k f kL p = 0 whenever f = 0 a.e. This is easy to fix.

Definition B.18. We introduce an equivalence relation on measurable functions by defining two functions as equivalent if they are equal a.e. For 1 ≤ p ≤ ∞, the space L p (Rd ) is defined as 𝓛 p (Rd ) modulo this equivalence relation.

The definition means that in contrast to 𝓛 p , an element of L p is not a function; it is an equivalence class of functions which are equal a.e. However, in practice little confusion is caused by identifying equivalence classes with representatives. From now on, we shall make no difference between 𝓛 p and L p .

Minkowski’s inequality can be iterated to give an inequality for a sum of finitely many functions, or even a series. Since an integral can be thought of as a continuous sum, it is natural to expect the following result.
Theorem B.19 (Minkowski’s inequality for integrals). Let 1 ≤ p < ∞. Suppose that f is measurable on Rn × Rd , that f (·, y) ∈ L p (Rn ) for almost every y ∈ Rd , and that the function y 7→ k f (·, y)kL p (Rn ) belongs to L1 (Rd ). Then the function x 7→ ∫_{Rd} f (x, y) dy belongs to L p (Rn ) and

k ∫_{Rd} f (·, y) dy kL p (Rn ) ≤ ∫_{Rd} k f (·, y)kL p (Rn ) dy.
Proof. After splitting the function f into real and imaginary parts, and then splitting these into positive and negative parts, we can assume that f ≥ 0. For p = 1 the result holds with equality by Fubini’s theorem. If p > 1 we write

( ∫_{Rd} f (x, y) dy )^p = ( ∫_{Rd} f (x, y) dy ) w(x),

where w(x) = ( ∫_{Rd} f (x, y) dy )^{p−1} . By Fubini’s theorem and Hölder’s inequality, it follows that

∫_{Rn} ( ∫_{Rd} f (x, y) dy ) w(x) dx = ∫_{Rd} ( ∫_{Rn} f (x, y)w(x) dx ) dy
    ≤ ∫_{Rd} ( ∫_{Rn} f (x, y)^p dx )^{1/p} ( ∫_{Rn} w(x)^q dx )^{1/q} dy
    = ( ∫_{Rd} ( ∫_{Rn} f (x, y)^p dx )^{1/p} dy ) ( ∫_{Rn} ( ∫_{Rd} f (x, y) dy )^p dx )^{1/q} ,

where 1/q = 1 − 1/p and we used that w^q = ( ∫_{Rd} f (x, y) dy )^p since (p − 1)q = p. Hence,

∫_{Rn} ( ∫_{Rd} f (x, y) dy )^p dx ≤ ( ∫_{Rd} ( ∫_{Rn} f (x, y)^p dx )^{1/p} dy ) ( ∫_{Rn} ( ∫_{Rd} f (x, y) dy )^p dx )^{1/q} .

The result follows after dividing by the second factor on the right hand side.

As already mentioned in Appendix A, the great advantage of working with L p spaces is that they are complete.

Theorem B.20. L p (Rd ) is a Banach space for 1 ≤ p ≤ ∞. For p = 2 it is a Hilbert space.

Recall that the support of a continuous function f is defined as supp f = \overline{{x ∈ Rd : f (x) ≠ 0}}, the closure of the set where f is nonzero. We denote by C0 (Rd ) the set of continuous functions on Rd with compact support (the support is always closed, so the only condition is that it should be bounded). If f is only measurable, one instead defines the essential support, ess supp f , as the smallest closed set outside of which f = 0 a.e. We will use the notation supp f for ess supp f when f is not continuous. The fact that L p (Rd ) is the completion of C0 (Rd ) now follows from the following theorem.

Theorem B.21. C0 (Rd ) is dense in L p (Rd ) for 1 ≤ p < ∞.

Note that Hölder’s inequality implies that a function g ∈ Lq defines a bounded linear func-
tional on L p , Z
`g ( f ) = f g dx, (B.1)
Rd
where p and q are conjugate indices ( 1p + 1q = 1). Moreover, k`g k ≤ kgkq .

Theorem B.22. For 1 ≤ p < ∞, the dual of L^p(ℝ^d) can be identified with L^q(ℝ^d), where p and q are conjugate indices. That is, the map g ↦ ℓ_g from L^q(ℝ^d) to (L^p(ℝ^d))^∗ defined by (B.1) is an isometric isomorphism.
In the case p = 2 this is a special case of Riesz’ representation theorem, at least if one replaces g by its complex conjugate ḡ.
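The fact that the bound ‖ℓ_g‖ ≤ ‖g‖_q is attained can be illustrated numerically: for a real-valued g, the “extremal” function f = |g|^{q−1} sgn(g)/‖g‖_q^{q−1} has ‖f‖_p = 1 and ℓ_g(f) = ‖g‖_q. The following sketch checks this on a grid (NumPy assumed; the choices of g, p, q are arbitrary):

```python
import numpy as np

# Numerical illustration of (B.1): for a real-valued g in L^q, the
# extremal function f = |g|^{q-1} sgn(g) / ||g||_q^{q-1} satisfies
# ||f||_p = 1 and l_g(f) = ||g||_q (p, q conjugate), so ||l_g|| = ||g||_q.
x = np.linspace(-5.0, 5.0, 4001)
dx = x[1] - x[0]
g = np.sin(2.0 * x) * np.exp(-x**2)      # an arbitrary test function

p, q = 3.0, 1.5                          # conjugate: 1/3 + 2/3 = 1

def lp_norm(h, s):
    return (np.sum(np.abs(h) ** s) * dx) ** (1.0 / s)

f = np.abs(g) ** (q - 1.0) * np.sign(g) / lp_norm(g, q) ** (q - 1.0)
pairing = np.sum(f * g) * dx             # the functional l_g applied to f

print(lp_norm(f, p), pairing, lp_norm(g, q))
assert abs(lp_norm(f, p) - 1.0) < 1e-9
assert abs(pairing - lp_norm(g, q)) < 1e-9
```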

We can also define L^p on a measurable subset A of ℝ^d by replacing the domain of integration by A. Recall that the function |x|^a is integrable on B_1(0) ⊂ ℝ^d if and only if a > −d, whereas it is integrable on ℝ^d \ B_1(0) if and only if a < −d. If we are only interested in local integrability properties and not in the decay at infinity, the following spaces are useful.
Definition B.23. We say that f ∈ L^p_loc(ℝ^d) if f ∈ L^p(K) for any compact subset K ⊂ ℝ^d.

B.5 Convolutions and approximation with smooth functions


The convolution f ∗ g of two measurable functions f and g is defined by
$$(f * g)(x) = \int_{\mathbb{R}^d} f(x - y) g(y)\, dy$$

whenever this makes sense. From the definition it follows that (f ∗ g)(x) = (g ∗ f)(x) a.e. By Hölder’s inequality, we obtain that f ∗ g ∈ L^∞ if f ∈ L^p and g ∈ L^q with 1/p + 1/q = 1, p, q ∈ [1, ∞]. Moreover,
$$\| f * g \|_{L^\infty} \le \| f \|_{L^p} \| g \|_{L^q}. \qquad (B.2)$$

From Minkowski’s inequality for integrals, we obtain that
$$\| f * g \|_{L^p} \le \| f \|_{L^p} \| g \|_{L^1}, \quad 1 \le p \le \infty, \qquad (B.3)$$
so that f ∗ g ∈ L^p if f ∈ L^p and g ∈ L^1. If f has compact support, it also follows from the definition that
$$\operatorname{supp}(f * g) \subset \operatorname{supp} f + \operatorname{supp} g.$$
Finally, a very important property of convolutions is the following. If g ∈ L^1(ℝ^d) and f ∈ C_b^1(ℝ^d), then f ∗ g ∈ C^1 with
$$\partial_{x_k}(f * g) = (\partial_{x_k} f) * g.$$
This follows from the definition of the convolution and Theorem B.15.
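Both bounds can be checked on a grid, where the convolution becomes a Riemann sum and (B.2)–(B.3) reduce to their discrete analogues. A short sketch (NumPy assumed; the test functions are arbitrary choices), using p = q = 2 for (B.2) and p = 2 for (B.3):

```python
import numpy as np

# Discrete sanity check of (B.2) and (B.3) for a convolution
# approximated by a Riemann sum on a uniform grid.
n = 2000
x = np.linspace(-10.0, 10.0, n)
dx = x[1] - x[0]
f = np.exp(-x**2) * np.sin(3 * x)
g = 1.0 / (1.0 + x**2)

conv = np.convolve(f, g, mode="same") * dx   # Riemann-sum approximation of f*g

def lp_norm(h, p):
    return (np.sum(np.abs(h) ** p) * dx) ** (1.0 / p)

# (B.2) with p = q = 2:  sup |f*g| <= ||f||_2 ||g||_2
assert np.max(np.abs(conv)) <= lp_norm(f, 2) * lp_norm(g, 2) * (1 + 1e-10)

# (B.3) with p = 2:  ||f*g||_2 <= ||f||_2 ||g||_1
assert lp_norm(conv, 2) <= lp_norm(f, 2) * np.sum(np.abs(g)) * dx * (1 + 1e-10)
```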

In many situations it is useful to approximate functions in L^p by nicer functions. E.g. a part of a proof might only make sense for C^∞ functions, whereas the outcome of the proof makes sense in L^p. One can then prove the result in L^p by approximating with smooth functions. We already saw that C_0(ℝ^d) is dense in L^p(ℝ^d), but we shall now show that much more is true. We begin by proving that there exist smooth functions with compact support.
We begin by proving that there exist smooth functions with compact support.
Lemma B.24 (Existence of functions in C_0^∞(ℝ^d)). There exists a non-negative function ϕ ∈ C^∞(ℝ^d) with supp ϕ = B̄_1(0).
Proof. Note first that the function
$$f(t) = \begin{cases} e^{-1/t}, & t > 0, \\ 0, & t \le 0, \end{cases}$$
is C^∞ on ℝ with support [0, ∞). Indeed, for t > 0 all the derivatives are polynomials in 1/t times e^{−1/t}, and t^a e^{−1/t} → 0 as t → 0+ for any a ∈ ℝ. The function ϕ can e.g. be defined as ϕ(x) = f(1 − |x|²).
We can now construct a smooth ‘approximation of the identity’ as follows. After dividing the function ϕ in Lemma B.24 by ∫_{ℝ^d} ϕ dx, we can assume that ∫_{ℝ^d} ϕ dx = 1. We set
$$J_\varepsilon(x) = \varepsilon^{-d} \varphi(\varepsilon^{-1} x), \qquad \varepsilon > 0.$$
Then J_ε ∈ C_0^∞(ℝ^d) with supp J_ε = B̄_ε(0) and ∫_{ℝ^d} J_ε(x) dx = 1 for all ε > 0 (change of variables). One can think of J_ε as converging to a ‘unit mass’ at the origin. This can be made sense of using distribution theory.
Generally, we call a function J ∈ C_0^∞(ℝ^d) with J ≥ 0 and ∫_{ℝ^d} J dx = 1 a mollifier. We set J_ε(x) = ε^{−d} J(ε^{−1}x). The convolution
$$(J_\varepsilon * f)(x) = \int_{\mathbb{R}^d} J_\varepsilon(x - y) f(y)\, dy = \int_{\mathbb{R}^d} J_\varepsilon(y) f(x - y)\, dy$$
is defined for f ∈ L^1_loc(ℝ^d) and is called the mollification or regularization of f. (J_ε ∗ f)(x) can be thought of as an average of f over a small neighborhood of x.
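A concrete mollifier can be built exactly as in Lemma B.24. The sketch below (NumPy assumed; the grid, the value of ε and the mollified function are illustrative choices) normalizes ϕ(x) = f(1 − |x|²) and applies J_ε to the discontinuous sign function, which becomes smooth while still equal to ±1 away from the jump:

```python
import numpy as np

# A concrete mollifier as in Lemma B.24: phi(x) = f(1 - x^2) with
# f(t) = exp(-1/t) for t > 0, normalized so that its integral is 1.

def bump(x):
    # smooth, non-negative, supported in [-1, 1]
    t = 1.0 - x**2
    out = np.zeros_like(x)
    pos = t > 0
    out[pos] = np.exp(-1.0 / t[pos])
    return out

x = np.linspace(-3.0, 3.0, 6001)
dx = x[1] - x[0]

def mollify(f_vals, eps):
    J = bump(x / eps) / eps              # J_eps(x) = eps^{-1} phi(x/eps) in 1D
    J /= J.sum() * dx                    # renormalize on the grid
    return np.convolve(f_vals, J, mode="same") * dx

f_vals = np.sign(x)                      # discontinuous at 0
smooth = mollify(f_vals, eps=0.5)

# J_eps * f equals f away from the jump (supp J_eps has radius eps) ...
assert abs(smooth[x > 1.0].max() - 1.0) < 1e-6
# ... and passes smoothly through 0 at the jump (by symmetry)
assert abs(smooth[np.argmin(np.abs(x))]) < 1e-6
```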

Theorem B.25. Let J be a mollifier.

(1) If f ∈ L^1_loc(ℝ^d), then J_ε ∗ f ∈ C^∞(ℝ^d).

(2) If f ∈ L^1(ℝ^d) with compact support, then J_ε ∗ f ∈ C_0^∞(ℝ^d).

(3) If f ∈ C(ℝ^d), then J_ε ∗ f → f uniformly on compact subsets of ℝ^d.

(4) If f ∈ L^p(ℝ^d), where 1 ≤ p < ∞, then ‖J_ε ∗ f‖_{L^p} ≤ ‖f‖_{L^p} and lim_{ε→0+} ‖J_ε ∗ f − f‖_{L^p} = 0.

Proof. The first statement follows by repeated differentiation under the integral sign.
The second statement follows since supp Jε ⊂ supp f + supp Jε and both f and Jε have
compact supports.
We have that
$$(J_\varepsilon * f)(x) - f(x) = \int_{\mathbb{R}^d} J_\varepsilon(y) (f(x - y) - f(x))\, dy = \int_{\operatorname{supp} J} J(y) (f(x - \varepsilon y) - f(x))\, dy.$$
Since f is uniformly continuous on any compact subset K ⊂ ℝ^d, it follows that sup_{(x,y) ∈ K × supp J} |f(x − εy) − f(x)| → 0 as ε → 0+.
The first part of (4) is a consequence of (B.3). For the second part, let δ > 0. Using Theorem B.21 we can find g ∈ C_0(ℝ^d) such that ‖f − g‖_{L^p} < δ. It follows from (2) that J_ε ∗ g − g has support in a compact set which is independent of ε (sufficiently small), and hence from (3) that J_ε ∗ g → g uniformly. It follows that ‖J_ε ∗ g − g‖_{L^p} < δ if ε is sufficiently small. For ε sufficiently small, we therefore have

$$\begin{aligned}
\| J_\varepsilon * f - f \|_{L^p} &\le \| J_\varepsilon * f - J_\varepsilon * g \|_{L^p} + \| J_\varepsilon * g - g \|_{L^p} + \| g - f \|_{L^p} \\
&\le 2 \| g - f \|_{L^p} + \| J_\varepsilon * g - g \|_{L^p} \\
&< 3\delta.
\end{aligned}$$

This proves (4).

Corollary B.26. C_0^∞(ℝ^d) is dense in L^p(ℝ^d) for 1 ≤ p < ∞.

Proof. This follows by approximating f ∈ L^p(ℝ^d) with a function g ∈ C_0(ℝ^d) using Theorem B.21 and then approximating g with a function in C_0^∞(ℝ^d) using Theorem B.25.

We also prove the following variant which has many applications.

Theorem B.27. Suppose that K ∈ L^1(ℝ^d) and that ∫_{ℝ^d} K(x) dx = 1. Define K_ε(x) = ε^{−d} K(ε^{−1}x), ε > 0.

(1) If f ∈ C_b(ℝ^d), then lim_{ε→0+} K_ε ∗ f = f uniformly on compact subsets of ℝ^d. If f is uniformly continuous, the convergence is uniform on ℝ^d.

(2) If f ∈ L^p(ℝ^d), 1 ≤ p < ∞, then K_ε ∗ f → f in L^p(ℝ^d).

Proof. (1) Write
$$K_\varepsilon * f - f = \int_{\mathbb{R}^d} K(y) (f(x - \varepsilon y) - f(x))\, dy = \int_{|y| \le R} K(y) (f(x - \varepsilon y) - f(x))\, dy + \int_{|y| \ge R} K(y) (f(x - \varepsilon y) - f(x))\, dy.$$

The second term is estimated as follows:
$$\Big| \int_{|y| \ge R} K(y) (f(x - \varepsilon y) - f(x))\, dy \Big| \le 2 \sup_{x \in \mathbb{R}^d} |f(x)| \int_{|y| \ge R} |K(y)|\, dy \to 0$$

as R → ∞. To estimate the first term, we note that lim_{ε→0+} (f(x − εy) − f(x)) = 0 uniformly for |y| ≤ R and x in a compact subset of ℝ^d. It therefore follows that the first integral converges to 0 locally uniformly in x for fixed R. The result is obtained by combining these two estimates. The second part is obvious because the estimate for the first term is now uniform over ℝ^d.
(2) Let δ > 0. Approximate f by g ∈ C_0(ℝ^d) with ‖f − g‖_{L^p} < δ and K by J ∈ C_0(ℝ^d) with ‖K − J‖_{L^1} < δ. We then have
$$\| K_\varepsilon * f - f \|_{L^p} \le \| K_\varepsilon * f - K_\varepsilon * g \|_{L^p} + \| K_\varepsilon * g - J_\varepsilon * g \|_{L^p} + \| J_\varepsilon * g - g \|_{L^p} + \| g - f \|_{L^p}.$$
We have that
$$\| K_\varepsilon * f - K_\varepsilon * g \|_{L^p} \le \| K_\varepsilon \|_{L^1} \| f - g \|_{L^p} \le \delta \| K \|_{L^1},$$
$$\begin{aligned}
\| K_\varepsilon * g - J_\varepsilon * g \|_{L^p} &\le \| K_\varepsilon - J_\varepsilon \|_{L^1} \| g \|_{L^p} \\
&\le \| K - J \|_{L^1} ( \| f - g \|_{L^p} + \| f \|_{L^p} ) \\
&\le \delta ( \delta + \| f \|_{L^p} ).
\end{aligned}$$
J_ε ∗ g − g converges to 0 uniformly as ε → 0+ and has support contained in a fixed compact set, since J_ε and g both do. It follows that
$$\| J_\varepsilon * g - g \|_{L^p} \to 0$$
as ε → 0+ for fixed δ. Since δ is arbitrary, the result follows.

B.6 Interpolation
If E has finite measure, it follows from Hölder’s inequality that L^{p_2}(E) ⊂ L^{p_1}(E) if p_1 < p_2. This means that L^{p_2}_loc(ℝ^d) ⊂ L^{p_1}_loc(ℝ^d) (a higher exponent means less singular behavior locally). However, at infinity the inclusion is reversed in the sense that L^∞(ℝ^d) ∩ L^{p_1}(ℝ^d) ⊂ L^{p_2}(ℝ^d) (a

smaller exponent means more decay). In fact, more generally, we have that f ∈ L^{p_1} ∩ L^{p_2} ⇒ f ∈ L^p for any p ∈ (p_1, p_2). This follows from Hölder’s inequality:
$$\begin{aligned}
\| f \|_{L^p}^p &= \int_{\mathbb{R}^d} |f|^p\, dx \\
&= \int_{\mathbb{R}^d} |f|^{\theta p_1} |f|^{(1-\theta) p_2}\, dx \\
&\le \Big( \int_{\mathbb{R}^d} |f|^{p_1}\, dx \Big)^{\theta} \Big( \int_{\mathbb{R}^d} |f|^{p_2}\, dx \Big)^{1-\theta} \\
&= \| f \|_{L^{p_1}}^{p_1 \theta} \| f \|_{L^{p_2}}^{p_2 (1-\theta)},
\end{aligned} \qquad (B.4)$$
where p = θp_1 + (1 − θ)p_2, θ ∈ (0, 1).
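Since the discretized integrals obey Hölder's inequality for sums, (B.4) holds (up to rounding) for Riemann sums as well, which gives a quick numerical check. A sketch (NumPy assumed; the function and exponents below are arbitrary test choices):

```python
import numpy as np

# Check of the interpolation inequality (B.4) on a grid:
#   ||f||_p^p <= ||f||_{p1}^{p1*theta} * ||f||_{p2}^{p2*(1-theta)}
# with p = theta*p1 + (1 - theta)*p2.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
f = np.abs(np.sin(x)) * np.exp(-np.abs(x) / 3.0)

def lp_norm(h, s):
    return (np.sum(np.abs(h) ** s) * dx) ** (1.0 / s)

p1, p2, theta = 1.5, 6.0, 0.4
p = theta * p1 + (1.0 - theta) * p2

lhs = lp_norm(f, p) ** p
rhs = lp_norm(f, p1) ** (p1 * theta) * lp_norm(f, p2) ** (p2 * (1.0 - theta))
print(lhs, rhs)
assert lhs <= rhs * (1 + 1e-12)
```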
Suppose that T is a linear map defined on a dense subspace X ⊂ L^{p_1}. Suppose also that X is dense in L^{p_2} and that T is bounded from X ⊂ L^{p_j} to L^{r_j} in the sense that T f ∈ L^{r_j} and
$$\| T f \|_{L^{r_j}} \le C_j \| f \|_{L^{p_j}}, \qquad f \in X.$$
Note that this means that T extends uniquely to a bounded operator T : L^{p_j} → L^{r_j} (we abuse notation by using the same letter for the extension). Indeed, if x_n → x ∈ L^{p_j} then {T x_n} is a Cauchy sequence in L^{r_j}. Since L^{r_j} is complete we can define T x = lim_{n→∞} T x_n. This operator will be bounded (by the same constant) and this is the only bounded extension.
One can also extend T to any L^p with p ∈ (p_1, p_2). Indeed, let A = {x : |f(x)| > 1}. Write f = f_1 + f_2, where f_1 = χ_A f and f_2 = χ_{A^c} f. Then f_1 ∈ L^{p_1}, since A has finite measure if f ∈ L^p, and f_2 ∈ L^{p_2}, since f_2 ∈ L^p ∩ L^∞. Hence we can define
$$T f = T f_1 + T f_2 \in L^{r_1} + L^{r_2}.$$
The next result shows that in fact T f ∈ L^r for some r between r_1 and r_2.
Theorem B.28 (Riesz-Thorin interpolation theorem). Let T be as above and let
$$\frac{1}{p} = \frac{\theta}{p_1} + \frac{1-\theta}{p_2} \quad \text{and} \quad \frac{1}{r} = \frac{\theta}{r_1} + \frac{1-\theta}{r_2}$$
for some θ ∈ (0, 1). Then T : L^p → L^r is bounded with
$$\| T f \|_{L^r} \le C_1^{\theta} C_2^{1-\theta} \| f \|_{L^p}.$$
Theorem B.29 (Young’s inequality). Suppose that p, q, r ∈ [1, ∞] and 1/p + 1/q = 1 + 1/r. If f ∈ L^p and g ∈ L^q, then f ∗ g ∈ L^r with
$$\| f * g \|_{L^r} \le \| f \|_{L^p} \| g \|_{L^q}.$$
Proof. Let q and g be fixed. The special case p = 1, r = q follows from Minkowski’s inequality for integrals. The case p = q/(q − 1), r = ∞ follows from Hölder’s inequality. The general case now follows from the Riesz-Thorin theorem.
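On a grid, the discrete analogue of Young's inequality holds for Riemann-sum convolutions, so the theorem can be sanity-checked numerically. A sketch (NumPy assumed; the functions and exponents are arbitrary test choices):

```python
import numpy as np

# Discrete check of Young's inequality: with 1/p + 1/q = 1 + 1/r we expect
# ||f * g||_r <= ||f||_p ||g||_q, the convolution being a Riemann sum.
x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]
f = np.exp(-x**2)
g = np.exp(-np.abs(x)) * np.cos(x)

p, q = 1.5, 2.0
r = 1.0 / (1.0 / p + 1.0 / q - 1.0)      # here r = 6

def lp_norm(h, s):
    return (np.sum(np.abs(h) ** s) * dx) ** (1.0 / s)

conv = np.convolve(f, g, mode="same") * dx
print(lp_norm(conv, r), lp_norm(f, p) * lp_norm(g, q))
assert lp_norm(conv, r) <= lp_norm(f, p) * lp_norm(g, q) * (1 + 1e-10)
```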