Erik Wahlén
Copyright © 2011 Erik Wahlén
Preface
The aim of these notes is to give an introduction to the mathematics of nonlinear waves.
The waves are modelled by partial differential equations (PDE), in particular hyperbolic or
dispersive equations. Some aspects of completely integrable systems and soliton theory are
also discussed. While the goal is to discuss the nonlinear theory, this cannot be achieved
without first discussing linear PDE. The prerequisites include a good background in real and
complex analysis, vector calculus and ordinary differential equations. Some basic knowledge
of Lebesgue integration and (the language of) functional analysis is also useful. There are two
appendices providing the necessary background in these areas. I have not assumed any prior
knowledge of PDE.
The notes have been used for a graduate course on nonlinear waves in Lund, 2011. I
want to thank my students Fatemeh Mohammadi, Karl-Mikael Perfekt and Johan Richter for
pointing out some errors. I also want to thank my colleague Mats Ehrnström for letting me use
parts of his lecture notes on ‘Waves and solitons’.
Erik Wahlén,
Lund, December 2011
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1 Introduction 9
1.1 Linear waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.1 Sinusoidal waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.2 The wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.1.3 Dispersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2 Nonlinear waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3 Solitons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6 Dispersive equations 97
6.1 Calculus in Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.2 NLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.2.1 Linear theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.2.2 Nonlinear theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3 KdV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.3.1 Linear theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.3.2 Nonlinear theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3.3 Derivation of the KdV equation . . . . . . . . . . . . . . . . . . . . 123
6.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
B Integration 149
B.1 Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
B.2 Lebesgue integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
B.3 Convergence properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
B.4 L p spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
B.5 Convolutions and approximation with smooth functions . . . . . . . . . . . . 156
Chapter 1

Introduction
You are probably familiar with many different kinds of wave phenomena such as surface
waves on the ocean, sound waves in the air or other media, and electromagnetic waves, of
which visible light is a special case. A common feature of these examples is that they all can
be described by partial differential equations (PDE). The purpose of these lecture notes is to
give an introduction to various kinds of PDE describing waves. In particular we will focus on
nonlinear equations. In this chapter we introduce some basic concepts and give an overview
of the contents of the lecture notes.
λ = 2π/k  and  T = 2π/ω.
λ is usually called the wavelength and T simply the period. The wave has crests (local max-
ima) when kx ± ωt is an even multiple of π and troughs (local minima) when it is an odd
multiple of π.
Rewriting u as
u(x,t) = a cos(k(x ± ct)),
where

c = ω/k,    (1.1)
we see that as t changes, the fixed shape
X ↦ a cos(kX)
is simply being translated at constant speed c along the x-axis. The number c is called the wave
speed or phase speed. The wave moves to the left if the sign is positive and to the right if the
sign is negative. A wave of permanent shape moving at constant speed is called a travelling
wave or sometimes a steady wave.
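As a quick numerical sanity check of this interpretation (a sketch, not part of the notes; the values of a, k and ω below are arbitrary), the wave with the minus sign is exactly the profile X ↦ a cos(kX) translated a distance ct to the right:

```python
import math

# A right-moving sinusoidal wave u(x, t) = a*cos(k*(x - c*t)) is the fixed
# profile X -> a*cos(k*X) translated at the phase speed c = omega/k.
a, k, omega = 2.0, 3.0, 1.5
c = omega / k  # phase speed (1.1)

def u(x, t):
    return a * math.cos(k * (x - c * t))

def profile(X):
    return a * math.cos(k * X)

# At any time t, the wave equals the profile shifted by c*t:
for x in [-1.0, 0.0, 0.7, 2.3]:
    for t in [0.0, 0.5, 4.0]:
        assert abs(u(x, t) - profile(x - c * t)) < 1e-12

# Crests sit where k*x - omega*t is an even multiple of pi:
t = 0.8
x_crest = (2 * math.pi + omega * t) / k
assert abs(u(x_crest, t) - a) < 1e-12
```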
utt = c²uxx    (1.2)

(subscripts denote partial derivatives). This is the wave equation in one (spatial) dimension.
The assumption that one can add the waves together agrees with the linearity of the wave
equation; any linear combination of solutions of (1.2) is also a solution of (1.2).
A better way of deriving the wave equation is to start from physical principles. Here is an
example.
Example 1.1. What we perceive as sound is really a pressure wave in the air. Sound waves
are longitudinal waves, meaning that the oscillations take place in the same direction as the
wave is moving. The equations governing acoustic waves are the equations of gas dynamics.
These form a system of nonlinear PDE for the velocity u and the density ρ. Assuming that the
oscillations are small, one can derive a set of linearized equations:
ρ0 ut + c0²∇ρ = 0,
ρt + ρ0 ∇ · u = 0,
where ρ0 is the density and c0 the speed of sound in still air. Here u(·,t) : Ω → R3 while
ρ(·,t) : Ω → R for some open set Ω ⊂ R3 . If we consider a one-dimensional situation (e.g. a
thin pipe), the system simplifies to
ρ0 ut + c0²ρx = 0,
ρt + ρ0 ux = 0.
Differentiating both equations with respect to t and x, we obtain after some simplifications
that both u and ρ satisfy the wave equation:

utt = c0²uxx  and  ρtt = c0²ρxx.
Note that ρ describes the fluctuation of the density around ρ0 . In the linear approximation,
the pressure fluctuation p around the constant atmospheric pressure is simply proportional to
ρ. It follows that p also satisfies the wave equation. The nonlinear equations of gas dynamics
will be discussed in Chapter 4.
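The elimination in Example 1.1 can be checked numerically. The following sketch (not part of the notes; the parameter values are arbitrary) verifies by finite differences that a rightward plane wave solves the linearized one-dimensional system, and that u then satisfies the wave equation with speed c0:

```python
import math

# A plane wave solving the linearized 1D acoustics system
#   rho0*u_t + c0^2*rho_x = 0,   rho_t + rho0*u_x = 0.
rho0, c0, k = 1.2, 2.0, 3.0

def u(x, t):
    return math.cos(k * (x - c0 * t))

def rho(x, t):
    return (rho0 / c0) * math.cos(k * (x - c0 * t))

def dx(f, x, t, h=1e-5):   # central difference in x
    return (f(x + h, t) - f(x - h, t)) / (2 * h)

def dt(f, x, t, h=1e-5):   # central difference in t
    return (f(x, t + h) - f(x, t - h)) / (2 * h)

def dxx(f, x, t, h=1e-4):
    return (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h**2

def dtt(f, x, t, h=1e-4):
    return (f(x, t + h) - 2 * f(x, t) + f(x, t - h)) / h**2

x, t = 0.3, 0.7
assert abs(rho0 * dt(u, x, t) + c0**2 * dx(rho, x, t)) < 1e-6
assert abs(dt(rho, x, t) + rho0 * dx(u, x, t)) < 1e-6
# eliminated form: the wave equation u_tt = c0^2 * u_xx
assert abs(dtt(u, x, t) - c0**2 * dxx(u, x, t)) < 1e-3
```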
We have assumed that the waves are even in x ± ct. To find the most general form of the
wave we also have to include terms of the form sin(k(x ± ct)). In order to make the notation
simpler it is convenient to use the complex exponential function. The original real-valued
function can be recovered by separating into real and imaginary parts. It is convenient also to
allow k to be negative. A general linear combination of sinusoidal waves can be written

u(x,t) = ∑_k e^{ikx}(ak cos(kct) + bk sin(kct)).
Depending on the physical situation there might be boundary conditions which limit the range
of wave numbers k. We might e.g. only look for a wave in a bounded interval, say [0, π], which
vanishes at the end points. This forces us to look for solutions of the form
u(x,t) = ∑_{k=1}^∞ sin(kx)(ak cos(kct) + bk sin(kct)).
In these notes we will not discuss boundaries, and hence there is no restriction on k. The most
general linear combination is then not a sum, but an integral
u(x,t) = ∫_R e^{iξx}(a(ξ) cos(ξct) + b(ξ) sin(ξct)) dξ
(we have changed k to ξ to emphasize that it is a continuous variable). If you have studied
Fourier analysis, you will recognize this as a Fourier integral. Suppose that we know the
initial position and velocity of u, that is, u(x, 0) and ut (x, 0). Then a and b are determined by
u(x, 0) = ∫_R e^{iξx} a(ξ) dξ

and

ut(x, 0) = ∫_R e^{iξx} cξ b(ξ) dξ.
There are of course many question marks to be sorted out if one wants to make this into
rigorous mathematics. In particular, one has to know that the functions u(x, 0) and ut (x, 0)
can be written as Fourier integrals and that the functions a and b are unique. This question is
discussed in Chapters 2 and 6. In Chapter 3 we will discuss the application of Fourier methods
to solving PDE such as the wave equation.
There is also another way of solving the wave equation discovered by d’Alembert. The
idea is that the equation can be rewritten in the factorized form
(∂t − c∂x )(∂t + c∂x )u = 0.
Introducing the new variable v = ut + cux , we therefore obtain the two equations
ut + cux = v,
vt − cvx = 0.
The expression vt − cvx is the directional derivative of the function v in the direction (−c, 1).
The second equation therefore expresses that v is constant on the lines x + ct = const. In other
words, v is a function of x + ct, v(x,t) = f (x + ct). The first equation now becomes
ut + cux = f(x + ct).
The left hand side of this equation can be interpreted as the directional derivative of u in
the direction (c, 1). Evaluating u along the straight line γ(s) = (y + cs, s), where y ∈ R is a
parameter, we find that
d/ds (u ◦ γ)(s) = (v ◦ γ)(s),

and hence

(u ◦ γ)(s) = ∫ (v ◦ γ)(r) dr = ∫ f(y + 2cr) dr = F(y + 2cs) + G(y),
where F(s) = (1/(2c)) ∫_0^s f(r) dr and G(y) is an integration constant (∫ is used to denote the indefinite integral). Setting s = t, y = x − ct, we obtain that

u(x,t) = F(x + ct) + G(x − ct).

This can be interpreted as saying that the general solution is a sum of a wave moving to the left,
F(x + ct), and a wave moving to the right, G(x − ct). One usually derives this formula by
making the change of variables y = x − ct, z = x + ct, which results in the equation uyz = 0.
The approach used here is however useful to keep in mind when we discuss the method of
characteristics in Chapter 4 (see also Section 1.2 below).
The functions F and G can again be uniquely determined once we know the initial data
u(x, 0) = u0 (x) and ut (x, 0) = u1 (x). Indeed, we obtain the system of equations
F + G = u0,
F − G = (1/c)U1,

where U1 is an antiderivative of u1. Solving for F and G gives

F = (1/2)u0 + (1/(2c))U1,
G = (1/2)u0 − (1/(2c))U1.

Since U1(s) = ∫_0^s u1(r) dr + C, this results in d'Alembert's solution formula
u(x,t) = (1/2)(u0(x + ct) + u0(x − ct)) + (1/(2c)) ∫_{x−ct}^{x+ct} u1(s) ds.
Since we have derived a formula for the solution, it follows that it is unique under suitable
regularity assumptions.
Theorem 1.2. Assume that u0 ∈ C2 (R), u1 ∈ C1 (R) and c > 0. There exists a unique solution
u ∈ C2 (R2 ) of the initial-value problem
utt = c²uxx,
u(x, 0) = u0(x),
ut(x, 0) = u1(x).
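d'Alembert's formula is easy to test numerically. The sketch below uses the arbitrary data u0(x) = e^{−x²} and u1(x) = cos(x), for which the integral term has the closed form (sin(x + ct) − sin(x − ct))/(2c), and checks the initial conditions and the PDE by central finite differences:

```python
import math

c = 2.0

def u0(x):
    return math.exp(-x**2)

def u(x, t):
    # d'Alembert's formula with u1 = cos, whose antiderivative is sin
    integral = math.sin(x + c * t) - math.sin(x - c * t)
    return 0.5 * (u0(x + c * t) + u0(x - c * t)) + integral / (2 * c)

# initial position is u0 ...
assert abs(u(0.4, 0.0) - u0(0.4)) < 1e-12
# ... initial velocity is u1 (central difference in t) ...
h = 1e-6
assert abs((u(0.4, h) - u(0.4, -h)) / (2 * h) - math.cos(0.4)) < 1e-6
# ... and u_tt = c^2 u_xx holds (second central differences):
x, t, h = 0.4, 0.3, 1e-4
utt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
assert abs(utt - c**2 * uxx) < 1e-3
```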
1.1.3 Dispersion
In many cases, the assumption that c is constant as a function of k is unrealistic. If c is a
non-constant function of k one says that the wave (or rather the equation) is dispersive. This
means that waves of different wavelengths travel with different speeds. An initially localized
wave will disintegrate into separate components according to wavelength and disperse. The
relation between ω and k is called the dispersion relation (although some authors use this term
for the relation between c and k).
Example 1.3. The equation
iħut + (ħ²/(2m))uxx = 0
is the one-dimensional version of the famous free Schrödinger equation from quantum me-
chanics, describing a quantum system consisting of one particle. h̄ is the reduced Planck’s
constant, and m the mass of the particle. The function u, called the wave function, is complex-
valued and its modulus square, |u|2 , can be interpreted as a probability density. Assuming that
u is square-integrable and has been normalized so that

∫_R |u(x,t)|² dx = 1,

the normalization is preserved for all times t. After a rescaling of variables, the equation takes the simpler form

iut + uxx = 0.

Substituting u(x,t) = e^{i(kx−ω(k)t)} into the equation, we find that

ω(k) = k²

and therefore

c(k) = ω(k)/k = k.

Note that we allow c, k and ω to be negative now, so that waves can travel either to the left
(k < 0) or to the right (k > 0).
Example 1.4. The linearized Korteweg-de Vries (KdV) equation
ut + uxxx = 0
appears as a model for small-amplitude water waves with long wavelength. The equation
is sometimes called Airy’s equation, although that name is also used for the closely related
ordinary differential equation y″ − xy = 0. Substituting u(x,t) = e^{i(kx−ω(k)t)} into the equation,
we find that

ω(k) = −k³
and hence
c(k) = −k2 .
The fact that c is negative means that the waves travel to the left.
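The dispersion relations in Examples 1.3 and 1.4 can be verified numerically by inserting plane waves and approximating the derivatives by finite differences. In the sketch below (the wave number k and the evaluation point are arbitrary), only the stated choice of ω(k) gives a vanishing residual:

```python
import cmath, math

k, h = 1.7, 1e-3

def residual_schrodinger(w):
    # |i*u_t + u_xx| for u = e^{i(kx - w t)}, derivatives by central differences
    u = lambda x, t: cmath.exp(1j * (k * x - w * t))
    x, t = 0.3, 0.2
    ut = (u(x, t + h) - u(x, t - h)) / (2 * h)
    uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return abs(1j * ut + uxx)

def residual_kdv(w):
    # |u_t + u_xxx| for u = cos(kx - w t); central third difference for u_xxx
    u = lambda x, t: math.cos(k * x - w * t)
    x, t = 0.3, 0.2
    ut = (u(x, t + h) - u(x, t - h)) / (2 * h)
    uxxx = (u(x + 2*h, t) - 2*u(x + h, t) + 2*u(x - h, t) - u(x - 2*h, t)) / (2 * h**3)
    return abs(ut + uxxx)

assert residual_schrodinger(k**2) < 1e-3    # w = k^2 works ...
assert residual_schrodinger(-k**2) > 1e-1   # ... w = -k^2 does not
assert residual_kdv(-k**3) < 1e-3           # w = -k^3 works ...
assert residual_kdv(k**3) > 1e-1            # ... w = +k^3 does not
```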
Since c(k) depends on k for dispersive waves, the wave speed only gives the speed of a
wave with a single wave number. A different speed can be found if one looks for a solution
which consists of a narrow band of wave numbers. For simplicity, we suppose that the equation
has cosine solutions cos(kx − ω(k)t) and consider the sum of two such waves with nearly the
same wave numbers k1 and k2 . Let ω1 and ω2 be the corresponding angular frequencies
and suppose that the waves have the same amplitude a. Using the trigonometric identity
cos(α) + cos(β ) = 2 cos((α − β )/2) cos((α + β )/2), the sum can be written
2a cos( ((k1 − k2)x − (ω1 − ω2)t)/2 ) cos( ((k1 + k2)x − (ω1 + ω2)t)/2 ).
This is a product of two cosine waves, one with wave speed (ω1 + ω2 )/(k1 + k2 ) and the other
with speed (ω1 − ω2 )/(k1 − k2 ). Setting k1 = k0 + ∆k and k2 = k0 − ∆k, we obtain as ∆k → 0
the approximate form
2a cos(∆k(x − ω′(k0)t)) cos(k0(x − c(k0)t)).
We thus find an approximate solution in the form of a carrier wave, cos(k0(x − c(k0)t)), with
speed c(k0), multiplied by an envelope, cos(∆k(x − ω′(k0)t)), with speed ω′(k0). This is an
example of a wave packet. Notice that the carrier has wavelength 2π/k0 whereas the envelope
has the much longer wavelength 2π/∆k. The speed
cg = dω/dk
of the envelope is called the group speed. Notice the similarity with the definition of the phase
speed
c = ω/k.
In a wave packet, the group speed measures the speed of the packet as a whole, whereas the
phase speed measures the speed of the individual components of the packet. See Figure 1.2
for an illustration. We will analyze this in more detail in Chapter 6.
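The wave-packet computation above can be checked numerically. The following sketch (not part of the notes) uses the linearized KdV dispersion relation ω(k) = −k³ with arbitrary k0 and a small Δk; the trigonometric identity holds exactly, and the envelope speed (ω1 − ω2)/(k1 − k2) approaches the group speed ω′(k0) = −3k0² as Δk → 0:

```python
import math

def w(k):
    return -k**3   # linearized KdV dispersion relation

a, k0, dk = 1.0, 2.0, 1e-3
k1, k2 = k0 + dk, k0 - dk
w1, w2 = w(k1), w(k2)

# identity cos(alpha) + cos(beta) = 2 cos((alpha-beta)/2) cos((alpha+beta)/2)
x, t = 1.3, 0.9
lhs = a * math.cos(k1 * x - w1 * t) + a * math.cos(k2 * x - w2 * t)
rhs = 2 * a * math.cos(((k1 - k2) * x - (w1 - w2) * t) / 2) \
            * math.cos(((k1 + k2) * x - (w1 + w2) * t) / 2)
assert abs(lhs - rhs) < 1e-12

envelope_speed = (w1 - w2) / (k1 - k2)
group_speed = -3 * k0**2          # w'(k0)
phase_speed = w(k0) / k0          # = -k0^2
assert abs(envelope_speed - group_speed) < 1e-5
assert abs(group_speed - 3 * phase_speed) < 1e-12  # for this w, cg = 3c
```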
Example 1.5.
• For the wave equation, we have ω± (k) = ±ck, so the group speed and the phase speed
are equal.
• For the Schrödinger equation we have cg(k) = ω′(k) = 2k while c(k) = k.
• For the linearized KdV equation, we have cg(k) = ω′(k) = −3k² and c(k) = −k².
We remark that for each k ≠ 0, the wave equation has solutions travelling to the left and
solutions travelling to the right, whereas the linearized KdV equation only has solutions
travelling in one direction. We say that the wave equation is bidirectional while the linearized
KdV equation is unidirectional. The Schrödinger equation is somewhere in between; it allows
waves travelling in both directions, but the direction is determined by the sign of k.
Figure 1.2: A wave packet. The group speed, cg, is the speed of the packet as a whole, while
the phase speed c is the speed of the individual crests and troughs.
1.2 Nonlinear waves

The simplest model of a nonlinear wave is Burgers' equation

ut + uux = 0,    (1.3)

which we can write as

ut + c(u)ux = 0,

with 'speed' c(u) = u. Again, the left hand side can be thought of as a directional derivative in
the direction (u, 1), which now depends on u. When studying the equation ut + cux = 0 with
constant c, we found that the solution was constant on straight lines in the direction (c, 1).
This suggests that the solution of Burgers’ equation should be constant on curves with tangent
(u(x,t), 1). Such curves are given by (x(t),t), where x(t) is a solution of the differential
equation
ẋ(t) = u(x(t),t).
This doesn’t seem to help much since we don’t know u. However, since u is constant on the
curve, the equation actually simplifies to
ẋ(t) = u0 (x0 ),
where x0 = x(0) and u0 (x) = u(x, 0). This means that the curves (x(t),t) actually are straight
lines,
(x(t),t) = (x0 + u0 (x0 )t,t),
and that we have
u(x0 + u0 (x0 )t,t) = u0 (x0 ). (1.4)
Figure 1.3: The evolution of a positive solution of the Burgers equation. ‘Particles’ located
higher up have greater speed. The solution steepens and eventually becomes multivalued. Of
course, it is then no longer a function and doesn’t satisfy equation (1.3).
This determines, at least formally, the solution in an implicit way. Note that this gives credence
to our interpretation of u as a speed: considering a ‘particle’ situated originally on the graph of
u at (x0 , u0 (x0 )), this particle will travel parallel to the x-axis with constant speed u0 (x0 ) (the
particle will travel backwards if u0 (x0 ) < 0). This suggests that (1.4) actually does define the
solution implicitly for some time. However, suppose that the function u0 has negative slope
somewhere, so that there exist points x1 < x2 with u0 (x1 ) > u0 (x2 ). We suppose for simplicity
that these values are positive. The point originally located at (x1 , u0 (x1 )) will then have greater
speed than the point located at (x2, u0(x2)), so that at some time T the solution will develop a
vertical tangent. Continuing the ‘Lagrangian’1 solution beyond this point in time, the function
u will become multivalued (see Figure 1.3). Clearly, u then ceases to be a solution of the
original equation (1.3). In Chapter 4 we will study a different way of continuing the solution
past the time T , namely as a shock wave. The solution then continues to be the graph of a
function, but now has a jump discontinuity. Clearly, this solution can’t satisfy (1.3) either, and
so we have to develop a different way of interpreting solutions.
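The picture of crossing characteristics can be made concrete. The sketch below (with the arbitrary initial datum u0(x) = e^{−x²}) computes the time at which two characteristics meet, and, before that time, solves the implicit relation (1.4) by bisection and checks the PDE ut + uux = 0 by finite differences:

```python
import math

# Through each x0 the solution is constant along the line x = x0 + u0(x0)*t.
# Characteristics from x1 < x2 with u0(x1) > u0(x2) meet at
# t* = (x2 - x1)/(u0(x1) - u0(x2)), where u becomes multivalued.
def u0(x):
    return math.exp(-x**2)

x1, x2 = 0.5, 1.5            # u0 decreases here, so u0(x1) > u0(x2)
assert u0(x1) > u0(x2)

t_star = (x2 - x1) / (u0(x1) - u0(x2))
meet1 = x1 + u0(x1) * t_star
meet2 = x2 + u0(x2) * t_star
assert abs(meet1 - meet2) < 1e-12   # the two straight lines cross

# Before breaking, solve x0 + u0(x0)*t = x for x0 by bisection
# (the map is monotone in x0 for small t).
def solve_x0(x, t, lo=-10.0, hi=10.0):
    g = lambda x0: x0 + u0(x0) * t - x
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def u(x, t):
    return u0(solve_x0(x, t))   # the implicit relation (1.4)

x, t, h = 0.8, 0.3, 1e-5
ut = (u(x, t + h) - u(x, t - h)) / (2 * h)
ux = (u(x + h, t) - u(x - h, t)) / (2 * h)
assert abs(ut + u(x, t) * ux) < 1e-5   # Burgers' equation holds
```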
The above example shows that nonlinear waves may develop singularities. It turns out that
dispersion can sometimes help, however. If we add to the Burgers equation the dispersion
term uxxx, we obtain the famous Korteweg–de Vries (KdV) equation

ut + 6uux + uxxx = 0    (1.5)

(the coefficient 6 is inessential). This equation is named after the Dutch mathematician
Diederik Korteweg and his student Gustav de Vries, who derived it as an equation for long
water waves in 1895. The equation had in fact been derived previously by Joseph Boussinesq
[3]. As we will show in Chapter 6, the solutions of the KdV equation remain smooth for all
times. Naively one can think of the nonlinear terms as trying to focus and steepen the wave.
The dispersive term on the other hand tends to spread the wave apart and therefore might be
able to counteract the nonlinearity.
¹In the Lagrangian approach to fluid mechanics, the coordinates follow the fluid particles. In the Eulerian approach the coordinates are fixed while the fluid moves.
1.3 Solitons
The KdV equation was derived in an attempt to explain a phenomenon observed earlier by the
Scottish naval engineer John Scott Russell. In 1834, on the Union Canal between Edinburgh
and Glasgow, Scott Russell made a remarkable discovery. This is his own account of the event
[22].
I was observing the motion of a boat which was rapidly drawn along a narrow
channel by a pair of horses, when the boat suddenly stopped—not so the mass
of water in the channel which it had put in motion; it accumulated round the
prow of the vessel in a state of violent agitation; then suddenly leaving it behind,
rolled forward with great velocity, assuming the form of a large solitary elevation,
a rounded, smooth and well defined heap of water, which continued its course
along the channel apparently without change of form or diminution of speed. I
followed it on horseback, and overtook it still rolling on at a rate of some eight
or nine miles an hour, preserving its original figure some thirty feet long and a
foot to a foot and a half in height. Its height gradually diminished, and after a chase
of one or two miles I lost it in the windings of the channel. Such, in the month
of August, 1834, was my first chance interview with that singular and beautiful
phenomenon which I have called the Wave of Translation.
As far as we know this is the first reported observation of a solitary water wave. Scott Rus-
sell was to spend a great deal of time reproducing the phenomenon in laboratory experiments.
He built a small water channel, at one end of which he dropped weights, producing solitary
bump-like waves of the sort he had earlier observed. Figure 1.4 shows Scott Russell’s own
illustrations of his experiments.
At that time the predominant mathematical theory for water waves was linear. In particular,
the esteemed mathematicians and fluid dynamicists Sir George Gabriel Stokes and Sir George
Biddell Airy, both with chairs at the University of Cambridge, did not accept Russell's theories,
because they believed that these contradicted the mathematical/physical laws of fluid motion. In
fact, if one considers only linear dispersive equations one finds that solitary travelling waves on
the whole line are impossible. For the linearized KdV equation, any wave is the superposition
of periodic travelling waves of different wave numbers which disperse with time.
To explain Scott Russell’s observations, we look for a travelling wave solution of the KdV
equation which decays to 0 at infinity. Such a solution is called a solitary wave.2 We thus
make the Ansatz u(x,t) = U(x − ct). This yields the ODE

−cU′ + 6UU′ + U‴ = 0,    (1.6)

where X = x − ct and ′ denotes d/dX. Since we are looking for a solitary wave we also require that

U(X), U′(X), U″(X) → 0 as X → ±∞.    (1.7)
Figure 1.4: John Scott Russell’s illustrations of his experiments from [22].
Figure 1.5: The phase portrait of (1.9) with c > 0. The thick curve represents the homoclinic
orbit corresponding to the solitary wave. The filled circles represent the equilibria (0, 0) and
(c/3, 0).
Integrating the equation once, we obtain

U″ = cU − 3U² + C1,  C1 ∈ R,

and the decay condition (1.7) forces C1 = 0:

U″ = cU − 3U².    (1.8)

Writing (1.8) as the first-order system

U′ = V,
V′ = cU − 3U²,    (1.9)

one finds that the function

H(U,V) = V² − cU² + 2U³
is constant on the orbits. We can thus find the phase portrait by drawing the level curves of
H. Moreover, the orbit corresponding to a solitary wave must lie in the level set H −1 (0). This
reveals that we must take c > 0 in order to find a solitary wave. The phase portrait in this case is
shown in Figure 1.5. We find that there is a homoclinic orbit connecting the origin with itself.
Since the ODE is autonomous, this orbit corresponds to a family of solutions X ↦ U(X − X0),
X0 ∈ R, of (1.6)–(1.7). We also note from Figure 1.5 that U(X) > 0 for all X.
We can in fact find an explicit formula for U(X). To make U unique, we assume (without
loss of generality) that U(0) = c/2 and hence U′(0) = V(0) = 0 (from the equation H(U,V) =
0). Note that
X ↦ (U(−X), −V(−X))
also is a solution of (1.9) with the same initial data, so that by the Picard-Lindelöf theorem,
U(X) = U(−X) (U is even). It therefore suffices to find the solution for X > 0. Also, note that
U(X) ∈ (0, c/2] for all X. The equation H(U,U′) = 0 can be written

(U′)² = cU² − 2U³,
which yields

U′ = −U√(c − 2U)

for X ≥ 0. This is a separable ODE. For X > 0 we know that U(X) < c/2 and hence

−∫_{U(X0)}^{U(X)} dU/(U√(c − 2U)) = X − X0,  X0 > 0, X > 0.

By continuity this also holds in the limit X0 → 0, so that

−∫_{c/2}^{U(X)} dU/(U√(c − 2U)) = X,  X > 0.
To calculate this integral, we introduce the new variable θ defined by
U = c/(2cosh²(θ)),  ∂U/∂θ = −c sinh(θ)/cosh³(θ),  θ > 0.
Note that θ = 0 corresponds to U = c/2 and θ = ∞ to U = 0. The identity cosh²(θ) − sinh²(θ) = 1 implies that

√(c − 2U) = √c √(1 − 1/cosh²(θ)) = √c √((cosh²(θ) − 1)/cosh²(θ)) = √c sinh(θ)/cosh(θ),

and

(1/(U√(c − 2U))) ∂U/∂θ = −(2cosh²(θ)/c)(cosh(θ)/(√c sinh(θ)))(c sinh(θ)/cosh³(θ)) = −2/√c.
We therefore obtain that

(2/√c) θ(X) = ∫_0^{θ(X)} (2/√c) dθ = −∫_{c/2}^{U(X)} dU/(U√(c − 2U)) = X,

where

U(X) = c/(2cosh²(θ(X))).
It follows that

U(X) = c/(2cosh²(√c X/2)) = (c/2) sech²( (√c/2) X ),  X ≥ 0,    (1.10)
where sech = 1/ cosh is the hyperbolic secant. Since U and sech are even, the formula also
holds for X < 0. Going back to the original variables, we have proved the following result.
Figure 1.6: The great wave of translation. Three solitary-wave solutions of the KdV equation
of the form (1.10). Notice that fast waves are tall and narrow.
Theorem 1.6. For each c > 0 and x0 ∈ R, the function

u(x,t) = (c/2) sech²( (√c/2)(x − ct − x0) )

is a solitary-wave solution of the KdV equation (1.5). For c < 0 there are no solitary wave solutions.
Note that the amplitude is proportional to the wave speed c, so that higher waves travel
faster. The width of the wave³, on the other hand, is inversely proportional to √c. In other
words, the waves become narrower the taller they are (see Figure 1.6). Another interesting
aspect is that the solitary waves move in the opposite direction of the travelling waves of the
linearized KdV equation (cf. Example 1.4). In fact, if one perturbs a solitary wave, one will
generally see a small ‘dispersive tail’ travelling in the opposite direction.
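One can verify formula (1.10) numerically. The sketch below (the value of c > 0 is arbitrary) checks by finite differences that U satisfies (1.8), that the first integral (U′)² = cU² − 2U³ vanishes along the wave, and that U has amplitude c/2 and decays at infinity:

```python
import math

c = 1.7

def U(X):
    # the solitary wave (1.10): U(X) = (c/2) * sech^2(sqrt(c)*X/2)
    return (c / 2) / math.cosh(math.sqrt(c) * X / 2)**2

h = 1e-4
for X in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    Upp = (U(X + h) - 2 * U(X) + U(X - h)) / h**2       # U'' by central differences
    assert abs(Upp - (c * U(X) - 3 * U(X)**2)) < 1e-5    # equation (1.8)
    Up = (U(X + h) - U(X - h)) / (2 * h)                 # U'
    first_integral = Up**2 - (c * U(X)**2 - 2 * U(X)**3)
    assert abs(first_integral) < 1e-6                    # (U')^2 = c*U^2 - 2*U^3

assert U(0.0) == c / 2      # amplitude c/2 at the crest
assert U(25.0) < 1e-8       # decay at infinity
```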
The KdV equation was largely forgotten about by the mathematical community until the
1960's. In a numerical study of the initial-value problem

ut + uux + δ²uxxx = 0,  u(x, 0) = cos(πx),    (1.12)

with periodic boundary conditions and δ ≠ 0 but small, Martin Kruskal and Norman Zabusky
discovered that the solution splits up into finitely many distinct waves resembling the sech²-
wave, which interacted with each other almost as if the equation were linear. They called the
waves solitons. We will investigate this in much more detail in Chapter 7. Figure 1.7 shows
some pictures from Kruskal and Zabusky’s original paper.
³The width can e.g. be defined as the distance between two points where the elevation is half of the amplitude,
u = c/4.
Figure 1.7: The first reported solitons, from the pioneering paper [30] by Kruskal and Zabusky.
The left plot depicts the temporal development of a solution of (1.12) with δ = 0.22. The
wave first seems to break, but instead eventually separates into eight solitary waves whose
paths through space-time can be followed in the picture to the right. We see that the velocities
are constant except when the solitons interact (the paths are nearly straight lines interrupted
by sudden shifts).
Chapter 2

The Fourier transform
One can think of ∂x as the d-tuple (∂x1 , . . . , ∂xd ). We similarly define xα = x1α1 · · · xdαd for x ∈ Rd
(or even Cd ). The number |α| = α1 + · · · + αd is called the order of the multi-index and we
also write α! = α1! · · · αd!. All functions in this chapter are allowed to be complex-valued.
Definition 2.1 (Schwartz space). The Schwartz space S (Rd ) is the vector space of rapidly
decreasing smooth functions,
S(Rd) := { f ∈ C∞(Rd) : sup_{x∈Rd} |x^α ∂x^β f(x)| < ∞ for all α, β ∈ Nd }.
S (Rd ) contains the class of C∞ functions with compact support, C0∞ (Rd ). See Lemma
B.24 for a non-trivial example of functions belonging to this class. The main reason for
using S instead of C0∞ is that, as we will see, the former class is invariant under the Fourier
transform. This is not the case for C0∞ .
The topology on S (Rd ) is defined by the family of semi-norms (see Section A.1),
defines a metric on S (Rd ). The topology defined by this metric is the same as the one defined
by the semi-norms and the metric space (S (Rd ), ρ) is complete. S (Rd ) is an example of
a generalization of Banach spaces called Fréchet spaces, but we will not use that in any sub-
stantial way. Notice, though, that the Schwartz space is not a Banach space, i.e. it cannot be
equipped with a norm making it complete (in particular, f ↦ ρ(f, 0) does not define a norm).
Proposition 2.2.
(1) For each f ∈ S(Rd), N ≥ 0 and α ∈ Nd there exists a C > 0 such that |∂x^α f(x)| ≤ C(1 + |x|)^{−N}, x ∈ Rd.
(2) S (Rd ) is closed under multiplication by polynomials.
(3) S (Rd ) is closed under differentiation.
(4) S (Rd ) is an algebra: f g ∈ S (Rd ) if f , g ∈ S (Rd ).
(5) C0∞ (Rd ) is a dense subspace of S (Rd ).
(6) S (Rd ) is a dense subspace of L p (Rd ) for 1 ≤ p < ∞.
Proof. The first property is basically a restatement of the definition of S (Rd ). Properties (2)
and (3) follow from the definition of the Schwartz space. Property (4) is a consequence of
Leibniz’ formula for derivatives of products.
It is clear that C0∞ (Rd ) ⊂ S (Rd ). To show that it is dense, let f ∈ S (Rd ) and define
fn (x) = ϕ(x/n) f (x), where ϕ ∈ C0∞ (Rd ) with ϕ(x) = 1 for |x| ≤ 1 (see Lemma B.24). Then
Then

fn − f = (ϕ(x/n) − 1) f(x),

which is supported in Rd \ Bn(0). From property (1) it follows that |f(x)| ≤ CN(1 + |x|)^{−N} for any
N ≥ 0. Therefore every semi-norm of fn − f tends to 0 as n → ∞, which proves (5).

Since property (1) gives |f(x)|^p ≤ C(1 + |x|)^{−p(d+1)} and

∫_{Rd} (1 + |x|)^{−p(d+1)} dx = C ∫_0^∞ r^{d−1}(1 + r)^{−p(d+1)} dr < ∞,
it follows that f ∈ L p (Rd ) for 1 ≤ p < ∞, if f ∈ S (Rd ). Since C0∞ (Rd ) is dense in L p (Rd )
(cf. Corollary B.26), property (6) now follows from property (5) (we only use the part that
C0∞ (Rd ) ⊂ S (Rd )).
Example 2.3. The Gaussian function e^{−|x|²} is an example of a function in S(Rd) which is
not in C0∞(Rd).
Proof. The boundedness follows directly from the inequality (2.2). The continuity is a conse-
quence of the dominated convergence theorem (see Theorem B.15).
Under F:

f(x + y) ↦ e^{iy·ξ} f̂(ξ),
e^{iy·x} f(x) ↦ f̂(ξ − y),
f(Ax) ↦ |det A|^{−1} f̂(A^{−T}ξ),
f(λx) ↦ |λ|^{−d} f̂(λ^{−1}ξ),
∂x^α f(x) ↦ (iξ)^α f̂(ξ),
x^α f(x) ↦ (i∂ξ)^α f̂(ξ).
Proof. The first four properties follow by changing variables in the integral in (2.1). The fifth
property follows by partial integration and the sixth by differentiation under the integral sign.
This also proves that fˆ ∈ C∞ (Rd ).
Example 2.7. Let us compute the Fourier transform of the Gaussian function e^{−|x|²/2}, x ∈ Rd.
We begin by considering the case d = 1. The function u(x) = e^{−x²/2} satisfies the differential
equation u′(x) = −xu(x). Taking Fourier transforms we therefore find that iξ û(ξ) = −iû′(ξ),
or equivalently û′(ξ) = −ξ û(ξ). But this means that

û(ξ) = C e^{−ξ²/2},

where

C = û(0) = (1/√(2π)) ∫_R e^{−x²/2} dx.
Passing to polar coordinates, we find

C² = (1/(2π)) ∫∫_{R²} e^{−(x²+y²)/2} dx dy = ∫_0^∞ e^{−r²/2} r dr = 1,

and since C > 0 it follows that

F(e^{−x²/2})(ξ) = e^{−ξ²/2}.
In the d-dimensional case we can use the fact that

e^{−|x|²/2} = ∏_{j=1}^d e^{−x_j²/2},  x = (x1, . . . , xd) ∈ Rd.
Note in particular that the Fourier transform of a Gaussian function is in S (Rd ). This is
no coincidence.
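This can be illustrated numerically. The sketch below approximates the d = 1 Fourier integral by the trapezoidal rule (the truncation interval [−15, 15] and the number of points are arbitrary choices) and compares with e^{−ξ²/2}; only the cosine part contributes, since the imaginary part of the integrand is odd:

```python
import math

def gauss_hat(xi, R=15.0, n=3000):
    # trapezoidal approximation of (2*pi)^(-1/2) * int exp(-x^2/2) e^{-i x xi} dx
    h = 2 * R / n
    total = 0.0
    for i in range(n + 1):
        x = -R + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(-x**2 / 2) * math.cos(x * xi)
    return total * h / math.sqrt(2 * math.pi)

# the Gaussian is (numerically) its own Fourier transform:
for xi in [0.0, 0.5, 1.0, 2.0, 3.5]:
    assert abs(gauss_hat(xi) - math.exp(-xi**2 / 2)) < 1e-10
```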
The last line can be estimated by a finite number of semi-norms of f using Leibniz’ formula.
It follows that F maps S (Rd ) to itself continuously.
Recall that the convolution of two functions f and g is defined by
(f ∗ g)(x) := ∫_{Rd} f(x − y)g(y) dy.    (2.3)
The inverse Fourier transform, F⁻¹(g)(x) := (2π)^{−d/2} ∫_{Rd} g(ξ)e^{ix·ξ} dξ, is clearly also continuous on S(Rd). The name is explained by the following theorem.
Theorem 2.10 (Fourier’s inversion formula). Let f ∈ S (Rd ). Then f (x) = F −1 ( fˆ)(x).
Proof. Let K(x) = (2π)^{−d/2} e^{−|x|²/2} and recall from Example 2.7 that K̂ = K. In particular,
this implies that ∫_{Rd} K(x) dx = (2π)^{d/2} K(0) = 1. By assumption, the iterated integral

(2π)^{−d/2} ∫_{Rd} ( (2π)^{−d/2} ∫_{Rd} f(y)e^{−iy·ξ} dy ) e^{ix·ξ} dξ
by the dominated convergence theorem. Changing order of integration, we also see that
fε(x) = (2π)^{−d/2} ∫∫_{Rd×Rd} f(y)e^{i(x−y)·ξ} K(εξ) dy dξ
      = (2π)^{−d/2} ∫_{Rd} f(y) ( ∫_{Rd} e^{−i(y−x)·ξ} K(εξ) dξ ) dy
      = ∫_{Rd} f(y) F(K(ε·))(y − x) dy
      = ε^{−d} ∫_{Rd} f(y) K(ε^{−1}(y − x)) dy
      = (Kε ∗ f)(x),
Corollary 2.11. The Fourier transform is a linear bijection of S (Rd ) to itself with inverse
equal to F −1 . That is,
F −1 F = F F −1 = Id on S (Rd ).
Proof. The above results imply that F −1 F = Id on S (Rd ). To prove the corollary, we
must also show that F F −1 = Id. Define the reflection operator R : S (Rd ) → S (Rd ) by
R( f )(x) = f (−x). R commutes with F and F −1 (Proposition 2.6), is its own inverse and
satisfies F −1 = RF . It follows that
F F −1 = F RF = F −1 F = Id .
Proof. This follows from the convolution theorem, Fourier’s inversion formula and the fact
that S (Rd ) is an algebra, since f ∗ g = (2π)d/2 F −1 ( fˆĝ).
where the last equality follows from the first by replacing f with F −1 ( f ). This is sometimes
called Parseval’s formula. A bijective linear operator which preserves the inner product is
called a unitary operator. Note that a unitary operator automatically is bounded.
Theorem 2.16 (Plancherel’s theorem). The Fourier transform F and the inverse Fourier
transform F −1 on S (Rd ) extend uniquely to unitary operators on L2 (Rd ) satisfying F −1 F =
F F −1 = Id.
Proof. The existence and uniqueness of the extensions follow from Proposition 2.13 and (2.9).
Since F F −1 = F −1 F = Id on a dense subspace, it follows that the same is true on L2 (Rd ).
By continuity, we have that (F ( f ), g)L2 (Rd ) = ( f , F −1 (g))L2 (Rd ) for f , g ∈ L2 (Rd ), so F and
F −1 are unitary.
Since the Fourier transform is defined for L1 (Rd ) and L2 (Rd ), one can in fact extend it to
L p (Rd ), 1 ≤ p ≤ 2, by interpolation. The following result is a direct consequence of Theorems
2.14, 2.16 and B.28.
Theorem 2.17 (Hausdorff-Young inequality). Let p ∈ [1, 2] and q ∈ [2, ∞] with 1p + 1q = 1. The
Fourier transform extends uniquely to a bounded linear operator F : L p (Rd ) → Lq (Rd ), with
\[
\|\mathcal{F}(f)\|_{L^q(\mathbb{R}^d)} \le (2\pi)^{\frac{d}{2}\left(1 - \frac{2}{p}\right)} \|f\|_{L^p(\mathbb{R}^d)}, \qquad f \in L^p(\mathbb{R}^d).
\]
The extension to L p (Rd ) for p > 2 is a completely different matter. We will return to this
in Chapter 5.
Note that many of the properties of the Fourier transform on S (Rd ) automatically hold on
L p (Rd ), 1 ≤ p ≤ 2, by continuity. This is e.g. true for the first four properties in Proposition
2.6. One also obtains the following extension of Fourier’s inversion formula.
Proposition 2.18. Let f ∈ L p (Rd ) for some p ∈ [1, 2] and fˆ ∈ L1 (Rd ). Then f is continuous1
and f (x) = F −1 ( fˆ)(x) for all x ∈ Rd .
Note that we have defined the extension in two different ways if e.g. f ∈ L1 (Rd ) ∩ L2 (Rd ).
However, approximating f by a sequence { fn } in C0∞ (Rd ) with fn → f in L1 and L2 , we find
that fˆn → FL1 ( f ) in Cb and fˆn → FL2 ( f ) in L2 , where the subscripts on the Fourier transforms
are used to differentiate the two extensions. But this implies that FL2 ( f ) = FL1 ( f ) a.e.
For example, let f = χ(−1,1) on R. Then
\[
\begin{aligned}
\hat f(\xi) &= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x)\, e^{-ix\xi}\, dx
= \frac{1}{\sqrt{2\pi}} \int_{-1}^{1} e^{-ix\xi}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \left[ \frac{e^{-ix\xi}}{-i\xi} \right]_{x=-1}^{1}
= \frac{1}{\sqrt{2\pi}}\, \frac{e^{i\xi} - e^{-i\xi}}{i\xi}
= \sqrt{\frac{2}{\pi}}\, \frac{\sin\xi}{\xi}.
\end{aligned}
\]
fˆ is not in L1 (R), so we can’t use the pointwise inversion formula in Proposition 2.18. We do
however have f = F −1 ( fˆ) in the sense of Theorem 2.16 since f ∈ L2 (R). Moreover, we have
\[
\int_{\mathbb{R}} \frac{\sin^2\xi}{\xi^2}\, d\xi = \pi.
\]
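As a sanity check (not part of the original notes), both the integral above and the Plancherel identity ‖f̂‖²_{L²} = ‖χ_{(−1,1)}‖²_{L²} = 2 can be verified numerically. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Midpoint-rule check of the identity: the integrand is even, so integrate (0, 20000).
dx = 0.005
xi = (np.arange(4_000_000) + 0.5) * dx
integral = 2 * np.sum(np.sin(xi)**2 / xi**2) * dx
assert abs(integral - np.pi) < 1e-3

# Plancherel: ||f-hat||^2 = (2/pi) * integral should equal ||f||^2 = 2.
assert abs((2 / np.pi) * integral - 2.0) < 1e-3
```

The factor 2/π comes from the formula f̂(ξ) = √(2/π) sin ξ/ξ computed above.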
In this chapter we will study the simplest kind of PDE, namely linear PDE with constant coef-
ficients. This means in particular that the superposition principle holds, so that new solutions
can be constructed by forming linear combinations of known solutions. There are two main
reasons why this chapter is included in the lecture notes. The first is that some
important equations describing waves are linear. The second is that many nonlinear wave
equations can be treated as perturbations of linear ones. A knowledge of linear equations is
therefore essential for understanding nonlinear problems.
We will concentrate on the Cauchy problem (the initial-value problem), in which, in ad-
dition to the PDE, the values of the unknown function and sufficiently many time derivatives
are prescribed at t = 0. This presupposes that there is some distinguished variable t which can
be interpreted as time. Usually this is clear from the physical background of the equation in
question. If you are familiar with the Cauchy problem for ordinary differential equations you
know that there always exists a unique solution under sufficient regularity hypotheses. We
will look for conditions under which the same property is true for PDE. Our goal is in fact to
convert the PDE to ODE by using the Fourier transform with respect to the spatial variables
x. From now on, Fourier transforms will only be applied with respect to the spatial variables.
We will consider functions u(x,t) defined for all x ∈ Rd . This is of course a simplification;
in the ‘physical world’, u(x,t) will only be defined for x in some bounded domain and there
will be additional boundary conditions imposed on the boundary of this domain. Our study
will not be completely restricted to equations describing waves, but in the end we will arrive
at a condition which guarantees that the solutions behave like waves in the sense that there
is a well-defined speed of propagation. Equations having this property are called hyperbolic.
Not all equations which describe waves are hyperbolic. In particular, dispersive equations
such as the Schrödinger equation and the linearized KdV equation are not. Nevertheless, we
will prove that the Cauchy problem is uniquely solvable for these equations as well. Further
properties of dispersive equations will be studied in Chapter 6.
We begin by considering three classical second order equations, namely the heat equation,
the wave equation and the Laplace equation. These examples are instructive in that they illus-
trate that the properties of the solutions of PDE depend heavily on the form of the equation.
They also give us some intuition of what to expect in general.
36 Linear PDE with constant coefficients
The presentation of the material in this chapter is close to that in Rauch [20]. The treatment
of Kirchhoff’s solution formula for the wave equation in R3 follows Stein and Shakarchi [24].
A more detailed account of the theory of linear PDE with constant coefficients can be found
in Hörmander [13].
\[
u_t = k\Delta u, \tag{3.1}
\]
where ∆u = ∂_{x_1}^2 u + ∂_{x_2}^2 u + ∂_{x_3}^2 u and k > 0 is a constant. As the name suggests, the equation
describes the flow of heat. Recall that heat is a form of energy transferred between two bodies,
or regions of space, at different temperatures. Heat transfer to a region increases or decreases
the internal energy in that region. The internal energy of a system consists of all the different
forms of energy of the molecules in the system, but excludes the energy associated with the
movement of the system as a whole. We look at this from a macroscopic perspective and
suppose that the internal energy can be described by a density e(x,t). If E(t) is the total
internal energy in the domain Ω ⊂ R3 we therefore have that
\[
E(t) = \int_\Omega e(x,t)\, dx.
\]
Since this holds for all sufficiently nice domains Ω we obtain the continuity equation
et + ∇ · J = 0
if all the involved functions are continuous. To derive the heat equation we use two constitutive
laws. First of all, we assume that over a sufficiently small range of temperatures, e depends
linearly on the temperature T , that is,
e = e0 + ρc(T − T0 ),
where ρ is the density, c is the specific heat capacity and T0 a reference temperature. We
assume that ρ and c are independent of T . We can simply think of the temperature as what
we measure with a thermometer. On a microscopic level, the temperature is a measure of
the average kinetic energy of the molecules. The above assumption means that an increase in
internal energy results in an increase in temperature. This is not always true, as an increase
in internal energy may instead result in an increase in potential energy. This is e.g. what
happens in phase transitions. The second constitutive law is Fourier’s law, which says that
heat flows from hot to cold regions proportionally to the temperature gradient. In other words,
J = −κ∇T , where κ is a scalar quantity called the thermal conductivity. Assuming that c and
ρ are independent of t, we therefore obtain the equation
cρTt = ∇ · (κ∇T ).
If in addition c, ρ and κ are constant, we find that the temperature satisfies the heat equation
with k = κ/(cρ).
The heat equation is sometimes also called the diffusion equation. The physical interpreta-
tion is then that u describes the concentration of a substance in some medium (e.g. a chemical
substance in some liquid). Diffusion means that the substance tends to move from regions
of higher concentration to regions of lower concentration. The heat equation is obtained by
assuming that the rate of motion is proportional to the concentration gradient (Fick’s law of
diffusion).
We now consider the Cauchy problem
\[
\begin{cases}
u_t = \Delta u, & t > 0, \\
u(\cdot, 0) = u_0,
\end{cases}
\tag{3.2}
\]
where u : Rd × [0, ∞) → C and u0 : Rd → C. Note that one can assume that k = 1 in (3.1)
by using the transformation t 7→ kt. It is useful to first consider a solution in the form of an
individual Fourier mode û(t)e^{iξ·x}. Substituting this in the equation yields û′(t) = −|ξ|² û(t), and hence
\[
u(x,t) = a e^{i x\cdot\xi - |\xi|^2 t}. \tag{3.3}
\]
This is written in the form of a complex sinusoidal wave ae^{i(x·ξ−ω(ξ)t)} considered in Chapter
1, except that the angular frequency ω(ξ) = −i|ξ|² now is imaginary. This implies that the ab-
solute value of the solution is strictly decreasing in t. Since there are no temporal oscillations,
we don’t normally think of the heat equation as an equation describing waves.
Our strategy for solving the Cauchy problem in general is to find a solution which is a
‘continuous linear combination’ of solutions of the form (3.3), that is, a Fourier integral in
x. We work formally and assume that u and u0 are sufficiently regular, so that their Fourier
transforms are well-defined. We also assume that we can differentiate under the integral sign,
so that ∂t F (u) = F (∂t u). We then obtain the transformed problem
\[
\begin{cases}
\hat u_t(\xi, t) = -|\xi|^2 \hat u(\xi, t), & t > 0, \\
\hat u(\xi, 0) = \hat u_0(\xi),
\end{cases}
\]
with solution û(ξ,t) = e^{−t|ξ|²} û0(ξ). Applying the inverse Fourier transform, we obtain that
\[
u(x,t) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} e^{ix\cdot\xi} e^{-t|\xi|^2} \hat u_0(\xi)\, d\xi, \quad t \ge 0. \tag{3.4}
\]
The integral will in general not converge for t < 0 due to the exponentially growing factor
e^{−t|ξ|²}. Under the assumption that u0 ∈ S (Rd ), one can differentiate under the integral sign as
many times as one likes for t > 0 by Theorem B.15. This still is true if we only allow one-sided
time derivatives at t = 0. It follows that u defined by this formula belongs to C∞ (Rd × [0, ∞))1 .
Carrying out the differentiation one also verifies that u satisfies the heat equation and the
initial condition. Since we have derived an explicit formula, it follows that the solution is
unique whenever the steps involved in deriving the formula can be justified. This holds e.g.
if u(·,t), ut (·,t) ∈ S (Rd ) for each t ≥ 0 and û(·,t) and ût (·,t) can be dominated by some
L1 functions which are independent of t, so that Theorem B.15 applies. There is however a
more natural condition which guarantees that the steps can be justified. The idea is to take the
analogy with ordinary differential equations one step further and think of the solution u as a
function of t with ’values’ in the function space S (Rd ).
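The Fourier recipe û(ξ,t) = e^{−t|ξ|²} û0(ξ) and the inversion integral (3.4) translate directly into an FFT computation if the real line is replaced by a large periodic box. The following numerical sketch (not part of the notes, assuming NumPy) compares the result against the exact heat-kernel solution for Gaussian initial data:

```python
import numpy as np

# Solve u_t = u_xx via the Fourier recipe u-hat(xi,t) = exp(-t xi^2) u0-hat(xi),
# with the FFT on a large periodic box standing in for the real line.
L, N, t = 40.0, 1024, 0.5
dx = L / N
x = (np.arange(N) - N // 2) * dx
u0 = np.exp(-x**2 / 2)                          # Gaussian initial data (Schwartz class)
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)        # angular frequencies
u = np.fft.ifft(np.exp(-t * xi**2) * np.fft.fft(u0)).real

# Exact solution: convolution of two Gaussians (variances add), so
# u(x,t) = (1+2t)^(-1/2) exp(-x^2 / (2(1+2t))).
exact = np.exp(-x**2 / (2 * (1 + 2 * t))) / np.sqrt(1 + 2 * t)
assert np.max(np.abs(u - exact)) < 1e-6
```

The agreement is at rounding-error level because a Gaussian is effectively band-limited on this grid, so the periodic approximation introduces no visible error.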
Definition 3.1. Let I be an open interval in R.
(1) A function u : I → S (Rd ) is said to be continuous if ‖u(t + h) − u(t)‖_{α,β} → 0 as h → 0 for every semi-norm ‖·‖_{α,β} and every t ∈ I. C(I; S (Rd )) is the vector space of continuous functions I → S (Rd ).
(2) A function u : I → S (Rd ) is said to be differentiable with derivative u′ : I → S (Rd ) if
\[
\lim_{h \to 0} \left\| \frac{u(t+h) - u(t)}{h} - u'(t) \right\|_{\alpha,\beta} = 0
\]
for all α, β ∈ Nd and t ∈ I. The space C1(I; S (Rd )) consists of differentiable functions u : I → S (Rd ) such that u, u′ ∈ C(I; S (Rd )).
1 Meaning that at t = 0 derivatives with respect to t are only taken from the right.
(3) The space Ck(I; S (Rd )), k ≥ 2, is defined inductively by requiring that u ∈ C(I; S (Rd )) and u′ ∈ Ck−1(I; S (Rd )), and C∞(I; S (Rd )) = ∩_{k=0}^{∞} Ck(I; S (Rd )).
If I is any interval (not necessarily open), we say that u ∈ Ck (I; S (Rd )) if it is k times continu-
ously differentiable from the interior of I to S (Rd ) and all the derivatives extend continuously
in S (Rd ) to the endpoints.
Lemma 3.2. Assume that u ∈ Ck (I; S (Rd )). Then û ∈ Ck (I; S (Rd )) with ∂tk F (u) = F (∂tk u).
There is a final important property to discuss. In real life it is impossible to measure some-
thing with infinite accuracy. It is therefore essential that the solution depends continuously on
the data in some sense, so that small measuring errors don’t grow large (at least in finite time).
In the case of the heat equation, we e.g. have that the map u0 ↦ u is continuous if the space C∞([0, ∞); S (Rd )) is equipped with the countable family of semi-norms
\[
\sup_{0 \le t \le n} \|\partial_t^k u(\cdot,t)\|_{\alpha,\beta}, \qquad n, k \in \mathbb{N},\ \alpha, \beta \in \mathbb{N}^d.
\]
Note that t ↦ ‖∂_t^k u(·,t)‖_{α,β} is a continuous function so that the supremum is finite. The
continuity of the map u0 ↦ u follows from the formula û(ξ,t) = e^{−t|ξ|²} û0(ξ).
Theorem 3.3. There exists a unique solution u ∈ C∞ ([0, ∞); S (Rd )) of the Cauchy problem
(3.2). Moreover, the solution map S (Rd ) ∋ u0 ↦ u ∈ C∞([0, ∞); S (Rd )) is continuous.
A convenient way of describing the solution is by looking at the map S(t) : S (Rd ) →
S (Rd ) from the data u0 to the solution u(·,t) at time t ≥ 0. S is called the propagator for
the heat equation. The continuous dependence of the solution on the initial data means that
S(·) ∈ C(S (Rd );C∞ ([0, ∞); S (Rd ))). The map S has two important properties. First of all,
\[
S(0) = \mathrm{Id}, \tag{3.5}
\]
where Id is the identity map on S (Rd ). Secondly, note that the heat equation is invariant under
translations in t. This means that the solution with initial data u0 at t = t0 is given by S(t −t0 )u0 ,
t ≥ t0 . Consider the solution u with initial data u0 at time 0. If s,t ≥ 0 we can express u(·, s +t)
in two different ways. We can of course express it as S(s + t)u0 . On the other hand, we can
solve the equation up to time t, which gives us S(t)u0 . We can then use this as our new initial
data and solve up to time s + t. In other words, S(s + t)u0 = S((s + t) − t)S(t)u0 = S(s)S(t)u0 ,
that is,
S(s + t) = S(s)S(t), s,t ≥ 0. (3.6)
Definition 3.4. Let X be a vector space. A family of linear operators {S(t)}t≥0 on X with the
properties (3.5) and (3.6) is called a semigroup. If the family is defined for all t ∈ R and (3.6)
holds for all s,t ∈ R it is called a group.
Since
\[
\mathcal{F}^{-1}\big(e^{-t|\cdot|^2}\big)(x) = \frac{1}{(2t)^{d/2}}\, e^{-\frac{|x|^2}{4t}} = (2\pi)^{d/2} K_t(x), \quad t > 0
\]
(see Example 2.7), we can express the solution of the heat equation without using the Fourier
transform.
Note that (x,t) 7→ Kt (x) is itself a solution of the heat equation in the half space Rd × (0, ∞)
with (t 7→ Kt ) ∈ C∞ ((0, ∞); S (Rd )). Solving the heat equation with initial data Kt we therefore
find that
Ks ∗ Kt = Ks+t , s,t > 0.
This is another way of stating the semigroup property of the propagator S(t). We also note
that
\[
\|K_t\|_{L^1(\mathbb{R}^d)} = \int_{\mathbb{R}^d} K_t(x)\, dx = (2\pi)^{d/2} \mathcal{F}(K_t)(0) = 1.
\]
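In one dimension the kernel is K_t(x) = (4πt)^{−1/2} e^{−x²/(4t)}, and both the normalization and the semigroup identity K_s ∗ K_t = K_{s+t} can be checked on a grid. A hedged numerical sketch (not from the notes, assuming NumPy):

```python
import numpy as np

# d = 1: K_t(x) = (4*pi*t)^(-1/2) * exp(-x^2 / (4t)).
def K(x, t):
    return np.exp(-x**2 / (4 * t)) / np.sqrt(4 * np.pi * t)

x = np.linspace(-30, 30, 6001)                   # symmetric grid: convolution stays aligned
dx = x[1] - x[0]
s, t = 0.3, 0.7
assert abs(np.sum(K(x, t)) * dx - 1) < 1e-8      # ||K_t||_{L^1} = 1
conv = np.convolve(K(x, s), K(x, t), mode="same") * dx
assert np.max(np.abs(conv - K(x, s + t))) < 1e-8 # K_s * K_t = K_{s+t}
```

The odd number of grid points keeps the "same"-mode convolution centered at x = 0, so the grid convolution lines up exactly with K_{s+t}.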
Proof. Due to the translation invariance in t, it suffices to show that ku(·,t)kL p ≤ ku0 kL p to
prove the monotonicity of the L p norm. This bound follows from Young’s inequality (Theorem
B.29):
ku(·,t)kL p ≤ kKt kL1 ku0 kL p = ku0 kL p .
The second part also follows from Young’s inequality:
\[
\|u(\cdot,t)\|_{L^p} \le \|K_t\|_{L^p} \|u_0\|_{L^1} = C\, t^{-\frac{d}{2}\left(1 - \frac{1}{p}\right)} \|u_0\|_{L^1} \to 0
\]
if p > 1, for some constant C > 0.
We mention an alternative way of proving the monotonicity. This is especially useful if one
wants to consider the heat equation in a domain other than Rd , where the Fourier transform is
not available. For simplicity we assume that u0 (and hence u) is real-valued. Multiplying the
equation ut = ∆u by u and integrating we find that
\[
\frac{d}{dt} \int_{\mathbb{R}^d} u^2\, dx = 2 \int_{\mathbb{R}^d} u u_t\, dx = 2 \int_{\mathbb{R}^d} u \Delta u\, dx = -2 \int_{\mathbb{R}^d} |\nabla u|^2\, dx \le 0,
\]
where we used integration by parts in the last step. The monotonicity of the other L p norms,
1 ≤ p < ∞ can be proved in a similar way, although special care has to be taken in the case
1 ≤ p < 2 due to regularity problems. The method of multiplying the equation by some
quantity and integrating to obtain a bound is called the energy method. One can sometimes
guess the form of the integral directly.
Smoothing
The heat equation also has a smoothing property. Again this makes sense physically. Since
the heat tends to spread out evenly, we expect discontinuities to be ‘evened out’. To formalize
this smoothing property we need to consider non-smooth initial data.
Proposition 3.8. The propagator S(t) : S (Rd ) → S (Rd ), t ≥ 0, extends uniquely to a con-
tinuous map S(t) : L p (Rd ) → L p (Rd ) for 1 ≤ p < ∞, given by
S(t)u0 = Kt ∗ u0 , t >0
and S(0)u0 = u0 .
Proof. It follows from Young’s inequality that Kt ∗ u0 is defined for t > 0 and defines a con-
tinuous extension of S(t) : S (Rd ) → S (Rd ). The uniqueness follows from Propositions 2.13
and 3.7.
Proposition 3.9. Let u0 ∈ L p (Rd ), 1 ≤ p < ∞. Then S(·)u0 ∈ C∞ (Rd × (0, ∞)) and S(t)u0
solves the heat equation for t > 0. Moreover, S(t)u0 → u0 in L p (Rd ).
Proof. This follows by using the formula S(t)u0 = Kt ∗ u0 . Since Kt (x) is exponentially de-
caying in x for t > 0, we can apply Theorem B.15 to differentiate under the integral sign as
many times as we like. It follows that S(t)u0 is smooth in Rd × (0, ∞). The convergence of
S(t)u0 = Kt ∗ u0 to u0 in L p is a consequence of Theorem B.27.
We now consider the Cauchy problem for the wave equation,
\[
\begin{cases}
u_{tt} = \Delta u, \\
u(\cdot, 0) = u_0, \\
u_t(\cdot, 0) = u_1.
\end{cases}
\tag{3.9}
\]
In contrast to the heat equation, there are two initial conditions. This is due to the fact that the
equation is of second order in time. The transformed problem is
\[
\begin{cases}
\hat u_{tt}(\xi, t) = -|\xi|^2 \hat u(\xi, t), & t > 0, \\
\hat u(\xi, 0) = \hat u_0(\xi), \\
\hat u_t(\xi, 0) = \hat u_1(\xi),
\end{cases}
\]
with solution
\[
\hat u(\xi, t) = \cos(t|\xi|)\hat u_0(\xi) + \frac{\sin(t|\xi|)}{|\xi|}\, \hat u_1(\xi)
\]
and hence
\[
u(x,t) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} e^{ix\cdot\xi} \left( \cos(t|\xi|)\hat u_0(\xi) + \frac{\sin(t|\xi|)}{|\xi|}\, \hat u_1(\xi) \right) d\xi.
\]
Just as for the heat equation, one obtains an existence and uniqueness result from these formu-
las. In contrast to the heat equation, the solution is now also defined for negative times. This
is related to the fact that, contrary to the heat equation, the wave equation is invariant under
the transformation t 7→ −t.
Theorem 3.12. There exists a unique solution u ∈ C∞ (R; S (Rd )) of the Cauchy problem
(3.9). Moreover, the solution map S (Rd )² ∋ (u0 , u1 ) ↦ u ∈ C∞(R; S (Rd )) is continuous.
To define a propagator with the semigroup property, one has to consider both u and ut . We
leave it as an exercise to prove that the map S(t)(u0 , u1 ) := (u(·,t), ut (·,t)) in fact is a group
in S (Rd )2 .
which implies that the solution corresponding to the case u1 = 0 can be obtained by differenti-
ating the solution corresponding to the case u0 = 0 with respect to t. It turns out that the form
of the solution depends on the dimension. We discuss the cases d = 1, 2 and 3 separately.
One dimension
In this case we already know from Theorem 1.2 that the solution is given by d’Alembert’s
formula
\[
u(x,t) = \frac{1}{2}\big(u_0(x+t) + u_0(x-t)\big) + \frac{1}{2} \int_{x-t}^{x+t} u_1(y)\, dy. \tag{3.10}
\]
We can recover this formula by calculating the inverse Fourier transform of sin(t|ξ |)/|ξ |.
Using d'Alembert's formula we can in fact guess what the solution must be. For t > 0 we find
that
\[
\mathcal{F}(\chi_{(-t,t)})(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-t}^{t} e^{-ix\xi}\, dx = \frac{2}{\sqrt{2\pi}}\, \frac{\sin(t\xi)}{\xi} = \frac{2}{\sqrt{2\pi}}\, \frac{\sin(t|\xi|)}{|\xi|}.
\]
It follows that
\[
\mathcal{F}^{-1}\left( \frac{\sin(t|\xi|)}{|\xi|} \right) = \frac{\sqrt{2\pi}}{2}\, \chi_{(-t,t)}, \quad t > 0.
\]
Using the convolution theorem (which extends to L1 by continuity) we now find that
\[
\begin{aligned}
\mathcal{F}^{-1}\left( \frac{\sin(t|\xi|)}{|\xi|}\, \hat u_1(\xi) \right)(x)
&= \frac{1}{2} (\chi_{(-t,t)} * u_1)(x) \\
&= \frac{1}{2} \int_{\mathbb{R}} \chi_{(-t,t)}(x - y)\, u_1(y)\, dy \\
&= \frac{1}{2} \int_{x-t}^{x+t} u_1(y)\, dy.
\end{aligned}
\]
The formula extends directly to negative t. Finally, the solution corresponding to u0 is obtained
by differentiating with respect to t and replacing u1 by u0 .
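For u1 = 0, d'Alembert's formula can be compared directly with the Fourier-side solution û(ξ,t) = cos(t|ξ|)û0(ξ). A numerical sketch (not from the notes, assuming NumPy, with the FFT on a large box standing in for the real line):

```python
import numpy as np

# Compare d'Alembert's formula (u1 = 0) with the Fourier-side solution
# u-hat(xi,t) = cos(t|xi|) u0-hat(xi), computed with the FFT on a large box.
L, N, t = 80.0, 2048, 3.0
dx = L / N
x = (np.arange(N) - N // 2) * dx
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)
u0 = np.exp(-x**2)
u_fourier = np.fft.ifft(np.cos(t * np.abs(xi)) * np.fft.fft(u0)).real
u_dalembert = 0.5 * (np.exp(-(x + t)**2) + np.exp(-(x - t)**2))
assert np.max(np.abs(u_fourier - u_dalembert)) < 1e-8
```

The initial Gaussian splits into two half-amplitude pulses travelling left and right with speed 1, exactly as (3.10) predicts.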
Three dimensions
For t > 0, let
\[
M_t(f)(x) = \frac{1}{|\partial B_t(x)|} \int_{\partial B_t(x)} f(y)\, dS(y)
\]
denote the average of a function f over the sphere ∂Bt(x) of radius t around x, where |∂Bt(x)| = 4πt² is the area of the sphere. It is useful to rewrite this as
\[
M_t(f)(x) = \frac{1}{4\pi} \int_{\partial B_1(0)} f(x - ty)\, dS(y).
\]
Note that Mt ( f ) extends to an even function of t, defined by taking the average over the sphere of
radius |t| for t < 0.
Lemma 3.13. Let f ∈ S (R3 ). Then the map t 7→ Mt ( f ) belongs to C∞ (R; S (R3 )).
Proof. To prove that Mt ( f ) is rapidly decreasing for fixed t, we simply estimate | f (x − ty)| ≤
C_N(1 + |x − ty|)^{−N} ≤ C′_N(1 + |x|)^{−N} when |y| = 1 (separate into the cases |x| ≤ 2t|y| and |x| ≥
2t|y| and use the reverse triangle inequality in the latter case) and hence |Mt ( f )(x)| ≤ C″_N(1 +
|x|)^{−N} by integration. Differentiating under the integral sign, we find that Mt ( f ) ∈ S (R3 ) for
all t ∈ R. The difference quotient
\[
\frac{f(x - (t+h)y) - f(x - ty)}{h} - \nabla f(x - ty) \cdot (-y)
\]
can be estimated using Taylor’s theorem by (max|α|=2 supz∈B|t|+|h| (x) |∂ α f (z)|)|h| if h is small.
Again, this is rapidly decreasing in x and therefore converges to zero as h → 0 when multiplied
by any polynomial. Similar estimates for the derivatives ∂xα f (x − ty) are easily obtained. It
follows that t 7→ Mt ( f ) ∈ C∞ (R; S (R3 )).
Lemma 3.14.
\[
\frac{1}{4\pi} \int_{\partial B_1(0)} e^{-ix\cdot\xi}\, dS(x) = \frac{\sin|\xi|}{|\xi|}.
\]
Proof. We begin by noting that the integral is radial in ξ . Indeed, if R is a rotation matrix,
then RT R = I, so
\[
\int_{\partial B_1(0)} e^{-ix\cdot R\xi}\, dS(x) = \int_{\partial B_1(0)} e^{-iR^T x\cdot\xi}\, dS(x) = \int_{\partial B_1(0)} e^{-iy\cdot\xi}\, dS(y),
\]
where y = R^T x. It therefore suffices to take ξ = (0, 0, |ξ|), in which case the integral equals
\[
\frac{1}{4\pi} \int_0^{2\pi}\!\! \int_0^{\pi} e^{-i|\xi|\cos\theta} \sin\theta\, d\theta\, d\varphi
= \frac{1}{2} \left[ \frac{e^{-i|\xi|\cos\theta}}{i|\xi|} \right]_{\theta=0}^{\pi}
= \frac{\sin|\xi|}{|\xi|}.
\]
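Lemma 3.14 can also be spot-checked statistically by Monte Carlo integration over the unit sphere. A hedged sketch (not from the notes, assuming NumPy; the vector ξ is an arbitrary hypothetical choice):

```python
import numpy as np

# Monte Carlo check of (1/(4*pi)) * integral over S^2 of e^{-i x.xi} dS(x) = sin|xi|/|xi|.
rng = np.random.default_rng(0)
v = rng.normal(size=(1_000_000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)    # uniform samples on the unit sphere
xi = np.array([0.7, -1.2, 2.0])                  # arbitrary test frequency
mean = np.mean(np.exp(-1j * v @ xi))             # spherical average of e^{-i x.xi}
r = np.linalg.norm(xi)
assert abs(mean - np.sin(r) / r) < 5e-3
```

Normalizing Gaussian vectors gives the uniform distribution on the sphere, so the sample mean approximates the spherical average with error of order N^{−1/2}.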
We can now compute the Fourier transform of Mt (u1 ):
\[
\begin{aligned}
(2\pi)^{3/2} \mathcal{F}(M_t(u_1))(\xi)
&= \frac{1}{4\pi} \int_{\mathbb{R}^3} \int_{\partial B_1(0)} e^{-ix\cdot\xi} u_1(x - ty)\, dS(y)\, dx \\
&= \frac{1}{4\pi} \int_{\mathbb{R}^3} \int_{\partial B_1(0)} u_1(z)\, e^{-i(z+ty)\cdot\xi}\, dS(y)\, dz \\
&= \int_{\mathbb{R}^3} u_1(z)\, e^{-iz\cdot\xi}\, dz\; \frac{1}{4\pi} \int_{\partial B_1(0)} e^{-ity\cdot\xi}\, dS(y) \\
&= (2\pi)^{3/2}\, \hat u_1(\xi)\, \frac{\sin(t|\xi|)}{t|\xi|}.
\end{aligned}
\]
Two dimensions
It’s more difficult to directly calculate a solution formula in the two-dimensional case, but
there is a trick which works well. The idea is to look at the solution u(x1 , x2 ,t) as a solution of
the three-dimensional wave equation which is independent of x3 . This is called the method of
descent.
Proof. As usual it suffices to take u0 = 0. Let ũ be the solution of the wave equation for d = 3
with initial data ũ0 (x̃) = 0 and ũ1 (x̃) = u1 (x), where x̃ = (x, x3 ), x = (x1 , x2 ) ∈ R2 . According
to Theorem 3.15, we then have that
\[
\tilde u(\tilde x, t) = \frac{1}{4\pi t} \int_{|\tilde x - \tilde y| = |t|} \tilde u_1(\tilde y)\, dS(\tilde y)
= \frac{1}{2\pi t} \int_{\substack{|\tilde x - \tilde y| = |t| \\ y_3 \ge 0}} \tilde u_1(\tilde y)\, dS(\tilde y),
\]
since ũ1 is independent of y3 . The upper hemisphere {ỹ : |x̃− ỹ| = |t|, y3 ≥ 0} can be parametrized
by setting y3 = x3 + (t 2 − |x − y|2 )1/2 , |x − y| ≤ |t|. The element of surface area is then
dS = |t|(t 2 − |x − y|2 )−1/2 dy, and
\[
u(x,t) = \frac{\operatorname{sgn} t}{2\pi} \int_{|x-y| \le |t|} \frac{u_1(y)}{(t^2 - |x-y|^2)^{1/2}}\, dy. \tag{3.12}
\]
There is a small problem in that ũ1 6∈ S (R3 ). This can e.g. be taken care of by replacing it
with u1 (x)ϕ(x3 ) where ϕ is in C0∞ (R) with ϕ(x) = 1 for |x| ≤ R. As long as |t| ≤ R this will
not affect u. One therefore finds that u defined by (3.12) is the unique solution of the wave
equation.
(2) The (future) domain of influence of a point P is the set of future points which are influ-
enced by P. If U ⊂ Rd × R, the domain of influence of U is the union of the domains of
influence of all P ∈ U.
(3) The domain of dependence of a point Q ∈ Rd × R is the set of past points which influ-
ences it.
Remark 3.18. The reader should be aware that the terminology isn’t standardized. In the def-
inition of the domain of influence one e.g. sometimes considers both future and past points.
Note also that since the equation is linear one can replace v by the zero function in the defini-
tion.
Proposition 3.19. Consider the wave equation in Rd . If d = 1, 2, the domain of influence of a
point (x0 ,t0 ) ∈ Rd × R is the set {(x,t) ∈ Rd × R : t ≥ t0 , |x − x0 | ≤ t − t0 }. If d = 3, it is the
set {(x,t) ∈ Rd × R : t ≥ t0 , |x − x0 | = t − t0 }. Similarly, the domain of dependence of (x0 ,t0 )
is {(x,t) ∈ Rd × R : t ≤ t0 , |x − x0 | ≤ t0 −t} for d = 1, 2 and {(x,t) ∈ Rd × R : t ≤ t0 , |x − x0 | =
t0 − t} for d = 3.
Proof. It suffices to consider the domain of influence. The domain of dependence is easily
obtained by considering the domains of influence of points in the past. Since the wave equation
is invariant under translations in t, we can assume that t0 = 0.
Consider first the cases d = 1, 2. The proof follows by inspecting the solution formulas
(3.10) and (3.12). If u0 and u1 have supports in BR (x0 ), the supports of u(·,t) and ut (·,t) for
t ≥ 0 are contained in {x ∈ Rd : ∃y ∈ BR (x0 ), |x −y| ≤ t}. Picking u0 (x) ≡ 0 and u1 (x) ≥ 0 with
supp u1 = BR (x0 ), we find that supp u(·,t) = {x ∈ Rd : ∃y ∈ BR (x0 ), |x − y| ≤ t}. Choosing R
arbitrarily small, the result follows.
In the three-dimensional case, the solution formula (3.11) only involves integration over
spheres rather than balls. Repeating the above argument shows that the domain of influence is
now {(x,t) ∈ Rd × R : t ≥ 0, |x − x0 | = t}.
The cones {(x,t) ∈ Rd × R : t ≥ t0 , |x − x0 | = t −t0 } and {(x,t) ∈ Rd × R : t ≤ t0 , |x − x0 | =
t0 − t} are called the future and past light cones of (x0 ,t0 ).
Figure 3.1: Left: Domain of influence for the wave equation. In dimensions 1 and 2, the
domain of influence of the point P is the shaded cone. In dimension 3 it is the boundary of the
cone. Right: Domain of dependence for the wave equation.
We can summarize this phenomenon by saying that information travels with speed less
than or equal to 1 for the wave equation. This is in fact true in all dimensions. In three
dimensions information travels exactly at speed 1. This is usually called Huygens’ principle
and one can prove that it holds in all odd dimensions except 1. This means that in three-
dimensional space, a localized disturbance will cause a wave which is only observed for a
finite amount of time by an observer who is standing still. In two dimensions, the observer
will notice the wave for all future times, although the amplitude decays in time. A more useful
fact is that Huygens' principle holds in all dimensions if one considers the speed at which
singularities travel. This is known as the generalized Huygens’ principle.
The proof simply relies on changing variables to fix the domain of integration and differ-
entiation under the integral sign. The difference with respect to the one-dimensional case has
to do with the fact that the integral in (3.11) is over a two-dimensional surface in R3 , whereas
the integral in d’Alembert’s formula is over an open interval in R. One can show that the ‘loss
of derivatives’ gets worse the higher the dimension. It is however possible to avoid this by
instead measuring the regularity in terms of Sobolev norms. We will return to this in Chapter
5. All of this is in stark contrast to the heat equation, for which initial data which may not
even be continuous evolve into smooth solutions for t > 0 (cf. Proposition 3.9).
Proposition 3.20 only provides an existence result. Uniqueness can also be proved using
the energy method.
Theorem 3.21. Let BR (0) be the ball of radius R around the origin in Rd and let Γ = {(x,t) ∈
Rd × R : |x| ≤ R − |t|}. Assume that u ∈ C2 (Γ) is a solution of the wave equation and that u
and ut vanish in BR (0) × {0}. Then u(x,t) ≡ 0 in Γ.
Proof. For simplicity we consider the case t ≥ 0. The other case follows by the change of
variables t 7→ −t. Since the real and imaginary parts each satisfy the wave equation, we may
also assume that u is real. Let n(x) = x/(R − t) be the exterior unit normal of the ball BR−t (0)
in Rd and dS(x) the surface measure on ∂ BR−t (0). Introduce
\[
e(t) = \frac{1}{2} \int_{|x| \le R - t} (u_t^2 + |\nabla u|^2)\, dx, \quad 0 \le t \le R.
\]
Differentiating with respect to t, we find that
\[
\begin{aligned}
e'(t) &= \int_{|x| \le R-t} (u_t u_{tt} + \nabla u \cdot \nabla u_t)\, dx - \frac{1}{2} \int_{|x| = R-t} (u_t^2 + |\nabla u|^2)\, dS(x) \\
&= \int_{|x| \le R-t} u_t (u_{tt} - \Delta u)\, dx - \frac{1}{2} \int_{|x| = R-t} (u_t^2 + |\nabla u|^2 - 2 u_t \nabla u \cdot n(x))\, dS(x) \\
&\le 0,
\end{aligned}
\]
since u satisfies the wave equation and 2|ut ∇u · n(x)| ≤ ut2 + |∇u|2 . This shows that e(t) is
decreasing. Since e(0) = 0 and e(t) ≥ 0 it follows that e(t) ≡ 0.
Since the wave equation is linear, this guarantees that any C2 solution is uniquely deter-
mined by its initial data. If we assume that ∇u and ut are decaying at infinity and instead
integrate over Rd , the boundary terms disappear when we integrate by parts. This proves the
following result.
Proposition 3.22. Let u0 , u1 ∈ S (Rd ). Then,
\[
E(t) = \frac{1}{2} \int_{\mathbb{R}^d} (|u_t|^2 + |\nabla u|^2)\, dx
\]
is independent of t.
The quantity E can be interpreted as the total energy of the wave. This explains the name
‘energy method’ used above. Alternatively, the conservation of energy can be derived by using
the Fourier representation of the solution. We leave this as an exercise.
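The conservation of energy can also be observed numerically by evolving the Fourier-side solution with the FFT. The following is only an illustrative sketch for d = 1 and u1 = 0 (not the requested derivation, assuming NumPy):

```python
import numpy as np

# Check numerically that E(t) = (1/2) * integral of (u_t^2 + u_x^2) dx is constant
# for d = 1, u1 = 0, evolving u-hat(xi,t) = cos(t|xi|) u0-hat(xi) with the FFT.
L, N = 80.0, 2048
dx = L / N
x = (np.arange(N) - N // 2) * dx
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)
u0h = np.fft.fft(np.exp(-x**2))

def energy(t):
    ut = np.fft.ifft(-np.abs(xi) * np.sin(t * np.abs(xi)) * u0h).real
    ux = np.fft.ifft(1j * xi * np.cos(t * np.abs(xi)) * u0h).real
    return 0.5 * np.sum(ut**2 + ux**2) * dx

assert abs(energy(5.0) - energy(0.0)) < 1e-10 * energy(0.0)
```

Mode by mode, |û_t|² + |ξ|²|û|² = |ξ|²|û0|²(sin² + cos²) is independent of t, which is exactly the Fourier-side mechanism behind Proposition 3.22.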
yielding
\[
\hat u(\xi, t) = \cosh(t|\xi|)\hat u_0(\xi) + \frac{\sinh(t|\xi|)}{|\xi|}\, \hat u_1(\xi).
\]
Here we run into problems since û(·,t) will not even be bounded for t > 0 unless û0 and û1 are
exponentially decaying (note that cosh(t|ξ |) and sinh(t|ξ |) grow like et|ξ | at infinity). Thus,
there is in general no solution of the form discussed above, unless we assume e.g. that û0 and
û1 have compact support. One could of course look for a solution with less regularity, but using
distribution theory one can in fact show under very mild conditions that there is no solution.
There is however the possibility that û0 and û1 fit together in such a way that the exponentially
large terms cancel. This will be the case precisely if û1 (ξ ) = −|ξ |û0 (ξ ). If we are satisfied
with only prescribing u(x, 0) we can therefore find a solution, namely û(ξ,t) = e^{−t|ξ|} û0(ξ).
This gives the solution formula
\[
u(x,t) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} e^{ix\cdot\xi} e^{-t|\xi|}\, \hat u_0(\xi)\, d\xi,
\]
defining a harmonic function in the half plane t > 0. As for the heat equation, it is possible to
obtain a solution formula directly in terms of u0 by calculating F −1 (e−t|ξ | ). The result is
u(x,t) = (Pt ∗ u0 )(x),
where
\[
P_t(x) = \frac{\Gamma\big((d+1)/2\big)}{\pi^{(d+1)/2}}\, \frac{t}{(t^2 + |x|^2)^{(d+1)/2}}
\]
is the Poisson kernel for the half-space t > 0, Γ denoting the Gamma function.
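In one dimension, Γ(1) = 1 and P_t(x) = t/(π(t² + x²)), and the relation F^{−1}(e^{−t|·|}) = (2π)^{1/2} P_t can be verified by direct quadrature. A hedged numerical sketch (not from the notes, assuming NumPy):

```python
import numpy as np

# d = 1: check F^{-1}(e^{-t|.|})(x) = sqrt(2*pi) * P_t(x), P_t(x) = t/(pi*(t^2 + x^2)),
# by approximating the inversion integral with a Riemann sum in xi.
t = 1.0
x = np.linspace(-5, 5, 11)
xi = np.linspace(-100, 100, 100_001)
dxi = xi[1] - xi[0]
finv = (np.exp(1j * np.outer(x, xi)) * np.exp(-t * np.abs(xi))).sum(axis=1)
finv = finv.real * dxi / np.sqrt(2 * np.pi)
P = t / (np.pi * (t**2 + x**2))
assert np.max(np.abs(finv - np.sqrt(2 * np.pi) * P)) < 1e-3
```

The slow quadratic decay of P_t in x (versus the Gaussian decay of the heat kernel) reflects the failure of e^{−t|ξ|} to be smooth at ξ = 0.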
The second order part of the equation corresponds to a quadratic form p(ξ) = ξ^T Aξ with A symmetric, where ξ ∈ Rd is thought of as a column vector and ξ^T is the transpose of ξ. Let q(η) be the
form corresponding to the transformed second order part. Then
q(η) = η T BABT η.
Note that this is just the change of variables formula associated with the transformation ξ =
BT η. Recall that the number of positive (m+ ) and negative (m− ) eigenvalues of the symmetric
matrix associated with a quadratic form is invariant under linear transformations and that we
can find a transformation B so that
\[
q(\eta) = p(B^T \eta) = \sum_{j=1}^{d} \sigma_j \eta_j^2,
\]
in which σ j ∈ {−1, 0, 1} with m+ positive and m− negative squares. We can make a classifi-
cation of second order PDE by considering the quadratic form q (or p). The names correspond
to the character of the quadratic surface q(η) = c.
Definition 3.23.
• We say that the PDE (3.13) is elliptic if the quadratic form p is positive or negative
definite, that is, if all the eigenvalues are positive or if all are negative. In other words
this means that after linear change of variables, we can transform the PDE to the Laplace
equation plus terms of lower order.
• If one eigenvalue is positive and the others negative (or vice versa) we say that the
equation is hyperbolic. This means that we can transform the equation to the wave
equation plus lower order terms.
• If the quadratic form is non-degenerate, but not hyperbolic or elliptic, we say that the
equation is ultrahyperbolic. This means that both m+ and m− are ≥ 2 and that d =
m+ + m− . Note in particular that we must have d ≥ 4 for this to be possible.
• Finally, if one eigenvalue is zero and the others have the same sign, we say that the
equation is parabolic. Note that this is a degenerate case and that one needs information
on the first order terms to somehow compare the equation with the heat equation.
Roughly speaking, we expect elliptic equations to behave like Laplace's equation, hyperbolic equations to behave like the wave equation and parabolic equations like the heat equation. A
large portion of the history of PDE has been devoted to finding suitable generalizations of
these definitions for operators of arbitrary order. Note that this classification is by no means
exhaustive, not even for second order equations.
Using multi-index notation, we can write a general linear partial differential operator of
order m with constant coefficients as
\[
\sum_{|\alpha| \le m} a_\alpha \partial^\alpha,
\]
where aα ∈ C. There is no need to assume that the coefficients are real, but note that a
complex equation always can be written as a system of two real equations. It turns out to be
more convenient to work with the partial derivatives D j = ∂ j /i and D = (∂1 , . . . , ∂d )/i. We
therefore consider operators of the form
\[
P(D) = \sum_{|\alpha| \le m} a_\alpha D^\alpha,
\]
with symbol
\[
P(\xi) = \sum_{|\alpha| \le m} a_\alpha \xi^\alpha
\]
obtained by replacing D j by ξ j . The reason for this choice is that D j corresponds to ξ j on the
Fourier side, and hence
F (P(D)u)(ξ ) = P(ξ )û(ξ ). (3.14)
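Relation (3.14) can be illustrated numerically in one dimension with P(D) = D² = −∂²_x, replacing the Fourier transform by the FFT. A hedged sketch (not from the notes, assuming NumPy):

```python
import numpy as np

# Illustrate F(P(D)u) = P(xi) u-hat for P(D) = D^2 = -d^2/dx^2, d = 1.
# The second derivative of u(x) = exp(-x^2) is computed by hand.
L, N = 40.0, 1024
dx = L / N
x = (np.arange(N) - N // 2) * dx
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)
u = np.exp(-x**2)
lhs = np.fft.fft((2 - 4 * x**2) * np.exp(-x**2))   # F(-u'')
rhs = xi**2 * np.fft.fft(u)                        # P(xi) u-hat, with P(xi) = xi^2
assert np.max(np.abs(lhs - rhs)) < 1e-8 * np.max(np.abs(rhs))
```

The definition D = ∂/i is what makes the symbol P(ξ) = ξ² come out without factors of i, as in the display above.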
The polynomial Pm(ξ) = ∑_{|α|=m} aα ξ^α consisting of the terms of highest order is called the principal symbol, and the principal part is Pm(D). Just as for second order operators, one can relate properties
of the PDE to those of the polynomials P and Pm . This is e.g. evident by considering relation
(3.14). Our aim will be to find a suitable general definition of hyperbolicity in terms of P
and Pm . The idea is to consider two particular characteristics of the wave equation, namely
the well-posedness of the initial-value problem and the finite speed of propagation. We shall
therefore devote the next section to a study of the initial-value problem for partial differential
operators with constant coefficients.
We now consider first order systems of the form
\[
u_t = A(D_x)u, \tag{3.15}
\]
where A(ξ) is an n × n matrix with polynomial entries and u is defined on Rd × [0, ∞) with values in Rn. Note that scalar equations of order n in t can be reduced to this form by introducing
the time derivatives of order ≤ n − 1 as new variables. The corresponding partial differential
operator is now the matrix-valued operator with symbol
P(ξ , τ) = iτI − A(ξ ),
where I is the n × n identity matrix and we use τ to denote the Fourier variable corresponding to t.
Definition 3.26 (Well-posedness in the sense of Hadamard). We say that the Cauchy problem is well-posed if:
(1) a solution exists for all initial data;
(2) the solution is unique;
(3) the solution depends continuously on the initial data.
Of course, this has to be made more precise in the sense that one has to specify the class of
initial data, the class of solutions and the topology used to measure the continuity of the map
u0 7→ u. We will consider data u0 ∈ S (Rd )n , meaning that each component of u0 belongs to
S (Rd ). Note that Definition 3.1 extends in a natural way to functions with values in S (Rd )n
by imposing the conditions on each component.
Definition 3.27. The Cauchy problem is well-posed in S if:
(1) for each u0 ∈ S(Rd)n there exists a unique solution u ∈ C∞([0, ∞); S(Rd)n);
(2) the map S : S(Rd)n → C∞([0, ∞); S(Rd)n), S(t)u0 = u(t), is continuous.
Remark 3.28. The assumption that u ∈ C∞ ([0, ∞); S (Rd )n ) in the uniqueness result can be
relaxed. It suffices e.g. to assume that u ∈ C([0, ∞); S (Rd )n ) ∩C1 ((0, ∞); S (Rd )n ). From the
equation ut = A(Dx )u it is then easy to prove that u ∈ C2 ((0, ∞); S (Rd )n ) with ∂t ut = A(Dx )ut
and ut (0) = A(Dx )u0 . Using induction, we find that u ∈ C∞ ((0, ∞); S (Rd )n ). Moreover, all
the derivatives extend continuously to t = 0.
We look for a general condition under which the initial-value problem is well-posed in S .
Taking the Fourier transform of (3.15) we obtain the problem
ût = A(ξ)û, t > 0,
û = û0,     t = 0.       (3.16)
Note that (3.16) is a linear system of ordinary differential equations. We therefore know that for each ξ and û0(ξ), (3.16) has a unique solution

û(ξ, t) = e^{tA(ξ)} û0(ξ),

so that, whenever the inverse transform makes sense, u(·, t) = F⁻¹(e^{tA(·)} û0) defines a solution of the equation. A straightforward proof shows that the function t ↦ u(·, t) satisfies the conditions of Definition 3.27. The relevant condition on A is the following.

Definition 3.29 (Petrowsky's condition). The operator A(Dx) satisfies Petrowsky's condition if there is a constant C_A such that Re λ ≤ C_A for every eigenvalue λ of A(ξ) and every ξ ∈ Rd.

Theorem 3.31. If A satisfies Petrowsky's condition, then the Cauchy problem (3.15) is well-posed in the sense of Definition 3.27.

For the general case one needs a property of the matrix exponential. This property is best proved using complex analysis.
Lemma 3.32. There exists a constant C > 0 such that

‖e^A‖ ≤ C(1 + ‖A‖^{n−1}) e^{max Re σ(A)}

for A ∈ C^{n×n}, where σ(A) denotes the spectrum (set of eigenvalues) of A and ‖·‖ denotes the matrix norm.
Proof. The proof relies on the following representation of the matrix exponential:

e^{tA} = (1/2πi) ∮_{|z|=r} e^{tz} (zI − A)^{−1} dz,       (3.17)
where r is chosen so that all the eigenvalues of A are contained in the ball Br(0) = {z ∈ C : |z| < r} and the contour is traversed counterclockwise. The path integral of a matrix-valued function is just the matrix formed by taking the path integral of each element of the matrix.
To prove formula (3.17), let B(t) be the function defined by the path integral. Notice first
that the function z 7→ etz (zI − A)−1 is analytic away from the eigenvalues of A (again defined
by considering each element). It follows that B(t) is independent of r. By the properties of the
matrix exponential, it suffices to check that

B′(t) = AB(t), B(0) = I.
Form the union U = ∪_{j=1}^k B_1(λ_j(A)) (note that some of the discs may intersect). Let Γ be the
boundary of U, traversed counterclockwise. Deforming the contour |z| = r into Γ, we obtain
from (3.17) that
e^A = (1/2πi) ∮_Γ e^z (zI − A)^{−1} dz.
On Γ we have that |z − λ_j| ≥ 1 for each j. Writing

det(zI − A) = ∏_{j=1}^k (z − λ_j)^{m_j},

where m_j denotes the algebraic multiplicity of λ_j, it follows that

|det(zI − A)| ≥ 1

on Γ. By Cramer's rule, the entries of the inverse are given by

(zI − A)^{−1}_{j,k} = (−1)^{j+k} det((zI − A)_{k,j}) / det(zI − A),

where (zI − A)_{k,j} denotes the submatrix obtained by deleting row k and column j. Each entry of (zI − A)_{k,j} is bounded by |z| + ‖A‖ in absolute value.
On the other hand, when z belongs to Γ it follows that |z − λ_j| = 1 for some j. Using the trivial estimate |λ_j| ≤ ‖A‖, which follows from the definition of an eigenvalue, we find that |z| ≤ ‖A‖ + 1 on Γ. We conclude that ‖(zI − A)^{−1}‖ ≤ C₂(1 + ‖A‖^{n−1}). Finally, |e^z| = e^{Re z} ≤ e^{max Re σ(A) + 1} on Γ and the total length of Γ is at most 2πn. It follows that

‖e^A‖ = ‖(1/2πi) ∮_Γ e^z (zI − A)^{−1} dz‖ ≤ C(1 + ‖A‖^{n−1}) e^{max Re σ(A)},

where C = nC₂e.
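The bound in Lemma 3.32 lends itself to a numerical sanity check. The sketch below is my own illustration (not from the notes); it approximates e^A by a truncated Taylor series, which is adequate for the moderate matrix norms occurring here, and verifies that the ratio ‖e^A‖ / ((1 + ‖A‖^{n−1}) e^{max Re σ(A)}) stays bounded over random samples:

```python
import numpy as np

def expm_taylor(A, terms=80):
    # matrix exponential via a truncated Taylor series (fine for moderate ||A||)
    E = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    return E

rng = np.random.default_rng(0)
n = 4
ratios = []
for _ in range(200):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    lhs = np.linalg.norm(expm_taylor(A), 2)
    bound = (1 + np.linalg.norm(A, 2)**(n - 1)) * np.exp(np.linalg.eigvals(A).real.max())
    ratios.append(lhs / bound)
print(max(ratios))  # stays bounded, as the lemma predicts (with some constant C)
```

The observed ratio is far below the constant nC₂e produced by the proof; the proof's constant is not optimal, only finite.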
Proof of Theorem 3.31. Given the assumptions in the theorem and the previous lemma it is not difficult to see that

‖e^{tA(ξ)}‖ ≤ C(1 + |t|^{n−1} |ξ|^m) e^{C_A t},

where C_A is the constant in Petrowsky's condition and m is n − 1 times the maximum degree of the entries of A(ξ). We therefore find that

|û(ξ, t)| ≤ C(1 + |ξ|^m) |û0(ξ)|

uniformly for t in a compact interval, for some other constant C. The solution of a system of ordinary differential equations whose coefficients are C∞ functions of some parameter depends smoothly on that parameter. It follows that û is C∞. Differentiating (3.16) with respect to ξk,
we find that

∂ξk ût = A(ξ) ∂ξk û + (∂ξk A(ξ)) û, t > 0,
∂ξk û = ∂ξk û0,                     t = 0,
and hence

∂ξk û(ξ, t) = e^{tA(ξ)} ∂ξk û0(ξ) + ∫₀ᵗ e^{(t−s)A(ξ)} (∂ξk A(ξ)) û(ξ, s) ds.
We also have that

∂t û(ξ, t) = e^{tA(ξ)} A(ξ) û0(ξ).
This will at most increase |û(ξ ,t)| by a polynomial factor. The same holds whenever we take
a finite number of derivatives. Since u0 ∈ S , we therefore find that û(·,t) ∈ S and hence
u(·,t) ∈ S .
To prove that t ↦ u(·, t) is continuous we use the formula

u(x, t) = (2π)^{−d/2} ∫_{Rd} e^{ix·ξ} e^{tA(ξ)} û0(ξ) dξ.
It follows that

‖u(·, t + h) − u(·, t)‖_{L∞(Rd)} ≤ C ∫_{Rd} ‖e^{(t+h)A(ξ)} − e^{tA(ξ)}‖ |û0(ξ)| dξ.
For any ε > 0 we can choose R so large that

∫_{|ξ|≥R} ‖e^{(t+h)A(ξ)} − e^{tA(ξ)}‖ |û0(ξ)| dξ ≤ C ∫_{|ξ|≥R} (1 + |ξ|^m) |û0(ξ)| dξ ≤ ε,
while

∫_{|ξ|≤R} ‖e^{(t+h)A(ξ)} − e^{tA(ξ)}‖ |û0(ξ)| dξ ≤ Ch ∫_{|ξ|≤R} |û0(ξ)| dξ → 0
as h → 0. It follows that ‖u(·, t + h) − u(·, t)‖_{L∞(Rd)} → 0 as h → 0. Similar estimates are easily
obtained for all the other semi-norms. To prove that u is C1 in t one needs in addition the
identity
e^{(t+h)A(ξ)} − e^{tA(ξ)} = ∫_t^{t+h} A(ξ) e^{sA(ξ)} ds,
from which it follows that

‖(e^{(t+h)A(ξ)} − e^{tA(ξ)})/h‖ ≤ C(1 + |ξ|^{m′}) e^{(t+h)C_A}

for some m′. Splitting the integral as above, it then follows that ‖(u(·, t + h) − u(·, t))/h − ut(·, t)‖_{α,β} → 0 as h → 0 for every semi-norm, where ut is the formal time-derivative of u,
ut(x, t) = (2π)^{−d/2} ∫_{Rd} e^{ix·ξ} A(ξ) e^{tA(ξ)} û0(ξ) dξ.
The continuity of ut is shown as above. It follows that t ↦ u(·, t) belongs to C1([0, ∞); S(Rd)) and hence to C∞([0, ∞); S(Rd)) using the equation (see Remark 3.28). The continuity of the map u0 ↦ u follows from the estimates
‖u(·, t)‖_{α,β} ≤ C ∑_{|β′|≤|β|+m} ‖u0‖_{α,β′}.
One can show that Petrowsky’s condition is also necessary for well-posedness in S . For
this it is necessary to use a result from algebraic geometry known as the Tarski-Seidenberg
theorem. This result implies that the maximum real part of the eigenvalues of A(ξ ) either is
bounded or grows as a power in |ξ | as ξ → ∞. More precisely, one has that
max{Re λ : det(λ I − A(ξ )) = 0 for some ξ ∈ Rd with |ξ | = r} = crα (1 + o(1))
as r → ∞ for some α ∈ Q and c ∈ R. Suppose that Petrowsky’s condition is violated. We can
then find a sequence ξ j → ∞ and corresponding eigenvalues λ j with Re λ j ∼ c|ξ j |α , c, α > 0.
Let e j be a sequence of corresponding unit eigenvectors. Choosing
û j (ξ ) = |ξ j |−1 ϕ(ξ − ξ j )e j ,
for some fixed function ϕ ∈ C0∞(Rd) with ϕ(0) = 1, we obtain that

|e^{tA(ξ_j)} û_j(ξ_j)| = |ξ_j|^{−1} e^{(Re λ_j)t} ∼ |ξ_j|^{−1} e^{c|ξ_j|^α t},

so that u_j(·, t) = F⁻¹(e^{tA(·)} û_j) is unbounded in S(Rd) for any fixed t > 0, while u0,j = F⁻¹(û_j) → 0 in S(Rd). This proves that (2) in Definition 3.27 is violated. In fact, one can show that for ‘most’ initial data, there is not even a solution in the sense of Definition 3.27.
Example 3.33. Consider the scalar equation ut = a(Dx)u, where

a(ξ) = a_m ξ^m

with a_m ≠ 0. To avoid trivialities, we suppose that m ≥ 1. If m is odd, the equation is well-posed if and only if a_m is purely imaginary. If m is even, the equation is well-posed if and only if Re a_m ≤ 0.
Example 3.34. The linearized KdV equation ut + uxxx = 0 can be written ut = a(Dx )u with
a(ξ ) = iξ 3 . Thus, it is well-posed. So is the Schrödinger equation iut + uxx = 0, which can be
written ut = a(Dx )u with a(ξ ) = −iξ 2 .
Example 3.35. The heat equation ut = ∆u can be written ut = a(Dx )u with a(ξ ) = −|ξ |2 .
Thus, it is well-posed. However, the backwards heat equation ut = −∆u is ill-posed. This equation is obtained from the heat equation by reversing the direction of time.
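In these scalar examples, Petrowsky's condition reduces to Re a(ξ) being bounded from above. A quick numerical look at the four symbols (my own sketch, not part of the notes) makes the dichotomy visible:

```python
import numpy as np

# Re a(xi) for the symbols of Examples 3.33-3.35 on a sample grid:
# bounded above <-> Petrowsky's condition holds in the scalar case.
xi = np.linspace(-100.0, 100.0, 20001)
symbols = {
    "heat, a(xi) = -xi^2": -xi**2,
    "backwards heat, a(xi) = xi^2": xi**2,
    "Schrodinger, a(xi) = -i xi^2": -1j * xi**2,
    "linearized KdV, a(xi) = i xi^3": 1j * xi**3,
}
for name, a in symbols.items():
    print(name, "-> max Re a on grid:", float(np.asarray(a).real.max()))
```

Only the backwards heat symbol has a real part growing without bound as |ξ| → ∞; the other three stay ≤ 0, matching the well-posedness claims above.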
Since we have assumed that A(Dx ) is independent of t, the semigroup property follows at
once if Petrowsky’s condition is satisfied. On the Fourier side this corresponds to the identity
e(s+t)A(ξ ) = esA(ξ ) etA(ξ ) .
One sometimes uses the notation
S(t) = etA(Dx )
due to the similarity with the matrix exponential.
Finally, we consider the inhomogeneous problem
ut = A(Dx)u + f, t > 0,
u = u0,          t = 0,       (3.18)
which will play a big role when we consider nonlinear equations.
Theorem 3.36 (Duhamel’s formula). Let f ∈ C∞ ([0, ∞); S (Rd )) and u0 ∈ S (Rd ) and as-
sume that A satisfies Petrowsky’s condition. The inhomogeneous Cauchy problem (3.18) has
a unique solution u ∈ C∞ ([0, ∞); S (Rd )) given by
u(t) = e^{tA(Dx)} u0 + ∫₀ᵗ e^{(t−s)A(Dx)} f(s) ds, t ≥ 0.
Proof. Taking the Fourier transform of (3.18), any solution must be given by

û(ξ, t) = e^{tA(ξ)} û0(ξ) + ∫₀ᵗ e^{(t−s)A(ξ)} f̂(ξ, s) ds

from the theory of ordinary differential equations. On the other hand, û(ξ, t) defined by this formula is a solution of the transformed version of the problem. Using the same approach as in the proof of Theorem 3.31 one shows that e^{(t−s)A(ξ)} f̂(ξ, s) is a smooth function of s and t with values in S(Rd) when 0 ≤ s ≤ t. It is then an easy matter to prove that û belongs to C∞([0, ∞); S(Rd)). It follows from Lemma 3.2 that the same is true for u = F⁻¹(û). Since (ξ, s) ↦ e^{(t−s)A(ξ)} f̂(ξ, s) is in L¹(Rd × [0, t]) we can change the order of integration to deduce
that

F⁻¹(∫₀ᵗ e^{(t−s)A(ξ)} f̂(ξ, s) ds) = ∫₀ᵗ F⁻¹(e^{(t−s)A(ξ)} f̂(ξ, s)) ds = ∫₀ᵗ e^{(t−s)A(Dx)} f(s) ds.
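In the scalar case (n = d = 1, with A(Dx) replaced by multiplication by a constant a), Duhamel's formula can be checked directly against a closed-form solution. The sketch below is my own illustration; the values a = −1 and f(s) = cos s are arbitrary sample choices:

```python
import numpy as np

def duhamel(a, u0, f, t, n=4000):
    # scalar version of u(t) = e^{ta} u0 + integral_0^t e^{(t-s)a} f(s) ds,
    # with the integral approximated by the trapezoidal rule
    s = np.linspace(0.0, t, n)
    vals = np.exp((t - s) * a) * f(s)
    integral = np.sum((vals[1:] + vals[:-1]) / 2) * (s[1] - s[0])
    return np.exp(t * a) * u0 + integral

a, u0, t = -1.0, 2.0, 1.5
f = lambda s: np.cos(s)
# for u' = -u + cos t, u(0) = u0, the exact solution is
# u(t) = (u0 - 1/2) e^{-t} + (cos t + sin t)/2
exact = (u0 - 0.5) * np.exp(-t) + (np.cos(t) + np.sin(t)) / 2
print(abs(duhamel(a, u0, f, t) - exact))  # small discretization error
```

The remaining discrepancy is the O(h²) error of the trapezoidal rule, not of the formula itself.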
can be extended to the whole complex plane. Moreover, we have an estimate of the form
On the other hand, we know that û(ξ ,t) = etA(ξ ) û0 (ξ ) and one can show that this together with
(3.20) implies a growth condition of the form |λ (ξ )| ≤ C(1 + |ξ |) on the eigenvalues of A(ξ ).
This leads to the condition (3.19). For the converse, one needs a characterization of functions
of compact support in terms of their Fourier transform. One can show that if û(·,t) is an entire
analytic function satisfying (3.20), then u(·,t) has compact support. Results of this type are
called Paley-Wiener theorems. A proof of these facts is beyond the scope of these notes.
An equation which is well-posed in S (Rd ) and has finite speed of propagation is in fact
well-posed in C∞ (see the discussion before Proposition 3.20). The hypothesis of finite speed
of propagation is crucial for this to hold. It can be shown that if the condition (3.19) is violated,
there exists a non-zero solution u ∈ C∞ (Rd × R) of ut = A(Dx )u which vanishes in the half-
space t ≤ 1. Thus, uniqueness is lost for the Cauchy problem.
We are now ready for the definition of a hyperbolic operator with constant coefficients. The
operator ∂t I − A(Dx ) is said to be hyperbolic if it satisfies condition (3.19) and Petrowsky’s
condition (Definition 3.29). For a general partial differential operator P(D) in Rd , it may
not be clear what to consider as the time-variable. Moreover, one may also wish to solve
the Cauchy problem with data on a hypersurface in Rd . It is therefore useful to have a more
general definition of hyperbolicity.
Definition 3.37.
(1) A scalar partial differential operator P(D) of order m is said to be hyperbolic in the direction ν ∈ Rd if its principal part satisfies Pm(ν) ≠ 0 and there exists a number τ0 ∈ R such that

P(ξ + τν) ≠ 0, ξ ∈ Rd, Im τ ≤ τ0.
(2) P(D) is said to be strictly hyperbolic in the direction ν if, in addition, for every ξ ∈ Rd not parallel with ν, the equation

Pm(ξ + τν) = 0

has m distinct real roots τ.
(3) A matrix-valued operator P(D) is said to be (strictly) hyperbolic if det P(D) is (strictly)
hyperbolic.
Note that we recover the definition of hyperbolicity for P(D) = ∂t I − A(Dx ) by taking
ν = (0, 1) ∈ Rd × R.
In the following discussion we consider a scalar operator P(D). The condition Pm (ν) 6= 0
is the analogue of (3.19). We say that the direction ν is non-characteristic for P if this condi-
tion holds. A smooth (d − 1)-dimensional hypersurface Σ in Rd is said to be non-characteristic
if at each point of Σ, the normal is a non-characteristic direction for P. The Cauchy problem
with data on Σ consists of prescribing the values of the function u and all normal derivatives
of u of order strictly less than m on Σ. One can show that the Cauchy problem for a hyperbolic
operator with C∞ data on a non-characteristic hypersurface is well-posed. If one only assumes
that the surface is non-characteristic, one can still prove the (local) existence and uniqueness
of solutions with real-analytic data if Σ is real-analytic. This is the Cauchy-Kovalevskaya the-
orem. Intuitively, the assumption means that the normal derivative of order m can be expressed
in terms of the normal derivatives of order ≤ (m − 1) (the data) and their tangential deriva-
tives. This can be used to construct a formal power series solution, and the proof consists
of showing that the series converges. In fact, the theorem also holds for nonlinear equations.
The assumption of real-analyticity is often too strong for the theorem to be of any practical
relevance. Moreover, the solution may not depend continuously on the data.
The definitions can be extended to operators

P(x, D) = ∑_{|α|≤m} a_α(x) D^α

with (smooth) variable coefficients. We say that P is (strictly) hyperbolic in the direction ν
at x0 if P(x0 , D) is (strictly) hyperbolic in the direction ν. It turns out that strict hyperbolicity
is sufficient for well-posedness, whereas the weaker notion is not. However, there are certain
notions in between which are useful. Consider e.g. the first order system
A0(x, t) ∂t u − ∑_{k=1}^d Ak(x, t) ∂k u − B(x, t) u = 0,      (3.21)
Definition 3.38. The system (3.21) is called a symmetric hyperbolic system if all the matrices
Ak (x,t) are symmetric and A0 (x,t) is positive definite for all (x,t).
One can prove that the Cauchy problem with initial data on t = 0 is well-posed also for
such systems.
Chapter 4

First order quasilinear equations
After our study of linear equations in the previous chapter, we now turn to nonlinear hyper-
bolic equations. We will focus on real first order equations since these can be solved by the
method of characteristics. The main emphasis is on showing that initially smooth solutions
can develop singularities. This is something one encounters already in a course on ordinary
differential equations, where solutions of the nonlinear equation ẋ = x² blow up in finite time.
We will also briefly discuss how one can continue a solution after the development of a sin-
gularity. To do this we introduce the concept of weak solutions. Since weak solutions are in
general not differentiable, they cannot satisfy the PDE in the classical sense. To interpret the
solution one has to go back to the original physical problem. We will do this for a class of
equations called conservation laws, for which weak solutions have a natural interpretation.
More details on the method of characteristics can be found in Evans [7] or John [14]. For
more on hyperbolic conservation laws, see e.g. Bressan [4], Hörmander [11], Holden and
Risebro [10] and Lax [17].
The most general PDE of order m can be written

F(x, u(x), . . . , ∂^m u(x)) = 0, x ∈ Ω,      (4.1)

where Ω ⊂ Rd and the notation means that the real-valued function F depends on u and all of its derivatives of order less than or equal to m. Note that we now assume that u is real-valued. This will be the case throughout the entire chapter. In the previous chapter we discussed linear PDE,

∑_{|α|≤m} a_α(x) ∂^α u(x) = f(x), x ∈ Ω,
66 First order quasilinear equations
Definition 4.1. The equation (4.1) is called quasilinear if it has the form

∑_{|α|=m} a_α(x, u, . . . , ∂^{m−1}u) ∂^α u + a_0(x, u, . . . , ∂^{m−1}u) = 0.
In other words, the highest derivatives appear linearly in a quasilinear equation. If the
coefficients of the highest derivatives are also independent of the function u and its derivatives,
the equation is semilinear. A rule of thumb is that a PDE is more difficult to handle the more
the nonlinearity affects the highest derivatives.
Example 4.2.
• The nonlinear wave equation

utt − ∆u + f(u) = 0

is an example of a semilinear equation.
• Burgers’ equation
ut + uux = 0
is an example of a quasilinear equation, which is not semilinear.
One can define characteristic directions and hyperbolicity also for nonlinear equations. For
a semilinear equation it is natural to define ξ ∈ Rd to be a characteristic direction at the point x ∈ Ω if ξ is a characteristic direction of the linear operator ∑_{|α|=m} a_α(x) ∂^α. The definition is similar in the quasilinear case: ξ ∈ Rd is a characteristic direction if ξ is characteristic for ∑_{|α|=m} a_α(x, u(x), . . . , ∂^{m−1}u(x)) ∂^α. This condition depends on both x and the function u.
Strict hyperbolicity is defined in an analogous manner. In the fully nonlinear case, one can
define strict hyperbolicity in terms of the linearization at u. The linearization at u is obtained
by calculating
d/dε|_{ε=0} F(x, ∂^α(u + εv)) = ∑_{|α|≤m} a_α(x) ∂^α v(x),

where

a_α(x) = (∂F/∂(∂^α u))(x, ∂^β u(x)).
We say that the equation (4.1) is strictly hyperbolic if its linearization is strictly hyperbolic.
We leave it as an exercise to prove that this agrees with the previous definitions in the case that
the equation is semilinear or quasilinear.
For a scalar first order quasilinear equation, ∑_{j=1}^d a_j(x, u) ∂_j u = f(x, u), the direction ν ∈ Rd is non-characteristic provided

a(x, u) · ν ≠ 0,

where a(x, u) = (a1(x, u), . . . , ad(x, u)). The equation is then automatically strictly hyperbolic,
since the equation
a(x, u) · (ξ + τν) = 0
has the real root τ = −(a(x, u) · ξ )/(a(x, u) · ν) for each ξ ∈ Rd . The equation also shares
the finite speed of propagation property with the linear hyperbolic equations with constant
coefficients discussed in the previous chapter. This follows from the method of characteristics
discussed below.
Many of the models which appear in physics are in fact systems of PDE. The only differ-
ence is that u and F are then vector valued. One usually assumes that the number of equations
equals the number of unknown functions. One can make a similar division into linear, semi-
linear, quasilinear and fully nonlinear equations.
The coefficients a j are assumed to be C1 functions. Our goal is to solve (4.4) by turning it into
an appropriate system of ODE. Given u, we can define a curve x : I → Rd , I = (s0 , s1 ) ⊂ R, by
ẋ(s) = a(x(s), u(x(s))). The left hand side of (4.4) is the derivative of u along the curve x(s).
In other words, ż(s) = f (x(s), z(s)), where z(s) = u(x(s)). This closes the system and we have
obtained the characteristic equations
ẋ(s) = a(x(s), z(s)),
ż(s) = f(x(s), z(s)).        (4.5)
The curves x(s) are called characteristics, although this name is sometimes used for the curves
(x(s), z(s)) in Rd+1 . If equation (4.4) is semilinear, the first equation doesn’t involve z(s). We
can then first find the characteristic x(s) and then solve the second equation. If the original
equation is linear this simply entails integrating f (x(s)).
Example 4.4. Since ẋ(s) = (−x2(s), x1(s)) for the equation

−x2 ∂1u + x1 ∂2u = f(x1, x2),

the characteristics are circles, x(s) = r(cos s, sin s), r > 0 fixed. Once we know this, we obtain that

u(r cos s, r sin s) = ∫₀ˢ f(r cos θ, r sin θ) dθ + g(r),
where g is arbitrary. Note that a necessary condition in order to obtain a global solution is that

∫₀^{2π} f(r cos θ, r sin θ) dθ = 0
for every r, since otherwise we will not obtain the same value of u when we go one time
around the origin.
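As I read this example, the characteristics x(s) = r(cos s, sin s) solve ẋ = (−x2, x1), so the underlying equation is −x2 ∂1u + x1 ∂2u = f. For the sample right-hand side f(x1, x2) = x1 the characteristic formula (with g ≡ 0) gives u(x1, x2) = x2, which a finite-difference check confirms (my own sketch):

```python
# Assumed reading of the example: the PDE is -x2*u_x1 + x1*u_x2 = f.
# For f(x1, x2) = x1 the characteristic formula (with g = 0) yields u = x2,
# since the integral of r*cos(theta) from 0 to s equals r*sin(s) = x2.
u = lambda x1, x2: x2
f = lambda x1, x2: x1
h = 1e-6
for x1, x2 in [(0.3, 0.7), (-1.1, 0.4), (2.0, -0.5)]:
    ux1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)   # central differences
    ux2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
    residual = -x2 * ux1 + x1 * ux2 - f(x1, x2)
    assert abs(residual) < 1e-8
print("PDE residual vanishes at the sample points")
```

Note also that this f satisfies the compatibility condition, since cos θ integrates to zero over a full period.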
We now illustrate how the characteristics can be used to solve the Cauchy problem in the
neighbourhood of a point. For simplicity we assume that the initial data u0 is given on the
hyperplane xd = 0.
Theorem 4.5 (Local solvability). Let u0 ∈ C1(U) for some open set U ⊂ Rd−1 and let y0 ∈ U. Assume moreover that a and f are C1 and that the equation is non-characteristic in the direction ed at (y0, 0) for the data u0. Then there exists an open subset W ⊂ Rd with W ∩ (Rd−1 × {0}) ⊂ U such that equation (4.4) has a unique solution u ∈ C1(W) with u(y, 0) = u0(y).
Proof. The solution is obtained by solving the characteristic equations (4.5) with initial data
x(0; y) = (y, 0) and z(0; y) = u0 (y), where y ∈ U. By the theory of ordinary differential equa-
tions, there exists for each y a maximal solution x(s; y) defined for s in some open inter-
val I(y) containing the origin. Moreover, the function (y, s) 7→ x(s; y) is C1 on the open set
∪_{y∈U} {y} × I(y). We wish to define the solution for s ≠ 0 as u(x(s; y)) = z(s; y). This requires that the map y ↦ x(s; y) is injective, in other words, that the characteristics don't cross. Moreover, as y and s vary, the curves x(s; y) should trace out an open set W ⊂ Rd. Both of
these conditions follow from the inverse function theorem if we can prove that the Jacobian
of the map (y, s) ↦ x(s; y) is non-zero at s = 0 and y = y0. Since x(0; y) = (y, 0) it follows that ∂_{y_j} x_i(0; y) = δ_{ij}. We also have that ∂_s x(0; y) = a((y, 0), u0(y)) from (4.5). Hence, the Jacobian
is

| 1 0 · · · a1((y0, 0), u0(y0)) |
| 0 1 · · · a2((y0, 0), u0(y0)) |
| ⋮ ⋮ ⋱ ⋮ |
| 0 0 · · · ad((y0, 0), u0(y0)) |  =  ad((y0, 0), u0(y0)) ≠ 0
by the non-characteristic assumption. The fact that u defined in this way is a solution of the
problem follows from the inverse function theorem and the definition of the characteristics.
The uniqueness of the solution follows from the construction.
For details on the generalization of the method of characteristics to fully nonlinear first
order equations and the solution of the Cauchy problem with data on a noncharacteristic hy-
persurface in Rd we refer to Evans [7, Chapter 3.2].
Consider a substance with density u(x, t). Assume that the production of the substance at a point (x, t) in space and time is given by
g(x,t) (substance is destroyed if g(x,t) < 0) and that the flow of the substance is described by
the vector-valued function f : Rd × R → Rd . The same considerations as in Section 3.1.1 lead
to the equation
d/dt ∫_Ω u(x, t) dx = −∫_{∂Ω} f(x, t) · n dS + ∫_Ω g(x, t) dx.
Assuming again that all the involved functions are continuous, we obtain the continuity equa-
tion
ut + ∇ · f = g.
To get any further one has to know the form of f and g, which depends on the specific problem.
In many cases, f and g depend on u and possibly its derivatives. The relation between these
quantities is called a constitutive law. We shall assume throughout the rest of the chapter that
f (x,t) = f (u(x,t)), that is, f depends solely on u. We will also mostly consider the case g ≡ 0.
The equation of interest is therefore
ut + ∇ · f (u) = 0.
This is called a scalar conservation law. The function f is sometimes called the flux function
in this context. Carrying out the differentiation, we find that
ut + a1 (u)∂1 u + · · · + ad (u)∂d u = 0,
where a_i(u) = f_i′(u), i = 1, . . . , d. In particular, we notice that a scalar conservation law is always hyperbolic in the t-direction (cf. Example 4.3). Many of the interesting problems from physics are in fact systems of conservation laws,

∂t u_j + ∇ · f_j(u) = 0,  j = 1, . . . , n.      (4.6)
Example 4.6 (Burgers' equation). Burgers' equation ut + uux = 0 can be rewritten as the scalar one-dimensional conservation law

ut + (u²/2)x = 0.
Example 4.7 (Equations of gas dynamics). The macroscopic description of a gas (or any other
fluid) involves the fluid velocity u, the density ρ, the pressure p and the energy E. The first is a
vector quantity while the others are scalar quantities. At a given moment in time, the functions
are defined in some open subset of R3 . The flow of substance across a surface is given by ρu·n,
where n is the unit normal. The conservation of mass can therefore be expressed in the form
ρt + ∇ · (ρu) = 0. (4.7)
Basic principles of physics require balance of momentum. This is expressed as the conservation law²

∂t(ρu_j) + ∇ · (ρu_j u) + ∂_j p = ρF_j,      (4.8)
where ρu j are the components of the momentum density, ρu j u · n the components of the
momentum flow across a surface, p is the pressure and Fj the components of the body forces
(e.g. gravity). Here we have neglected viscosity. Equation (4.8) can be written in vector
notation as
∂t (ρu) + ∇ · (ρu ⊗ u) + ∇p = ρF,
2 We’re abusing the terminology here, since the equations are not of the form (4.6). Introducing new variables
w j = ρu j we would however obtain (4.6) with source terms.
where u ⊗ u = (u_j u_k)_{j,k} ∈ R^{d×d} is the outer product of the vector u with itself. In addition, one
needs an equation for the conservation of energy. Ignoring heat conduction and production,
this takes the form
∂t E + ∇ · (u(E + p)) = ρF · u, (4.9)
where E is the total energy per unit volume. E is the sum of internal energy and kinetic energy associated with the macroscopic motion, E = ρe + ½ρ|u|², where e denotes the specific internal energy (internal energy per unit mass). In general one should add a heat flux term as in Section 3.1.1 and possibly a heat production term. The considerations so far are not specific to gases, and would work equally well for a liquid. Note that we have six unknowns (counting all components of u) but only five equations (4.7)–(4.9). This means that we have an underdetermined system. Liquids are assumed to be incompressible, which introduces the extra equation ∇ · u = 0. For a gas one adds instead a constitutive relation of the
form e = e(p, ρ). Assuming that we are dealing with an ideal gas with constant entropy (an
isentropic gas) the equations can be reduced to (4.7)–(4.8) together with a constitutive relation
p = f (ρ),
where f (ρ) = κρ γ for some constants κ > 0 and γ > 1. Here we have also assumed that the
gas is polytropic, that is, that it has constant specific heat capacities. Equation (4.8) can be simplified using (4.7). Carrying out the differentiation in (4.8), one obtains that

(ρt + ∇ · (ρu)) u_j + ρ(∂t u_j + u · ∇u_j) + ∂_j p = ρF_j,

which simplifies to

ut + (u · ∇)u + (1/ρ)∇p = F      (4.10)

due to the identity (4.7). Equation (4.10) is the usual form of Euler's equations. Note that (4.7) can be written in the similar form

ρt + u · ∇ρ + ρ∇ · u = 0.
The paths of the individual fluid particles are obtained by solving

ẋ(t) = u(x(t), t).
The material derivative of some physical quantity, such as the density, is then simply the
derivative of the quantity along the path of a fluid particle, that is,
(d/dt) ρ(x(t), t) = ρt(x(t), t) + u(x(t), t) · ∇ρ(x(t), t) = (Dρ/Dt)(x(t), t).
See Whitham [28, Chapter 6] for a much more extensive discussion of gas dynamics.
Example 4.8 (The linearized equations of gas dynamics). We now show how the linearized equations of gas dynamics discussed in Example 1.1 can be derived from (4.7) and (4.10) under the constitutive assumption p = f(ρ). We assume in addition that there are no external forces (F = 0). Note that the constant state u ≡ 0, ρ ≡ ρ0 and p ≡ p0 := f(ρ0) is a solution of the equations.
Consider now a small perturbation of the constant state: ρ = ρ0 + ε ρ̃ and u = ε ũ, where ε is a
small parameter. Substituting this into the governing equations, expanding in powers of ε and
dividing by ε, we obtain
ρ̃t + ρ0 ∇ · ũ = O(ε)
and

ũt + (f′(ρ0)/ρ0) ∇ρ̃ = O(ε).
Neglecting terms of order ε and dropping the tildes for notational convenience, we arrive at
the linearized equations
ρ0 ut + c0² ∇ρ = 0,
ρt + ρ0 ∇ · u = 0,

from Example 1.1, where c0 = √(f′(ρ0)).
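For the polytropic constitutive law p = f(ρ) = κρ^γ above, the sound speed is c0 = √(f′(ρ0)) = √(κγρ0^{γ−1}). A one-line numerical check of this formula (my own sketch; the parameter values are arbitrary):

```python
import numpy as np

# Hypothetical sample parameters for a polytropic gas p = f(rho) = kappa*rho^gamma;
# the sound speed should satisfy c0 = sqrt(f'(rho0)) = sqrt(kappa*gamma*rho0^(gamma-1)).
kappa, gamma, rho0 = 1.0, 1.4, 1.2
f = lambda rho: kappa * rho**gamma
h = 1e-6
c0_numeric = np.sqrt((f(rho0 + h) - f(rho0 - h)) / (2 * h))  # central difference for f'
c0_formula = np.sqrt(kappa * gamma * rho0**(gamma - 1))
print(c0_numeric, c0_formula)  # agree to many digits
```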
Example 4.9 (Traffic model). We discuss a simple model for traffic flow introduced inde-
pendently by Lighthill and Whitham [18] and Richards [21]. Consider a road consisting of a
single lane. The traffic is described by the density ρ(x, t) of cars per unit length. The number of cars in the stretch [a, b] is given by ∫_a^b ρ(x, t) dx. The function ρ is in reality a piecewise continuous function of x (if we count fractions of cars), but if we replace it by the local average over an interval which is sufficiently large compared with the average length of the cars and sufficiently small compared with the total length of the road, we can assume that it is continuous. We also assume that it is sufficiently differentiable for the following considerations to go through. We suppose that there are no exits or entrances, so that the only way a car
can enter or leave a stretch of road is through the endpoints. Let v(x,t) be the velocity of an
individual vehicle at (x,t). The rate of cars passing through the point x at time t is then given
by v(x, t)ρ(x, t) and we have the relation

d/dt ∫_a^b ρ(x, t) dx = v(a, t)ρ(a, t) − v(b, t)ρ(b, t).

Assuming that ρ and v are continuously differentiable, this leads to the conservation law

ρt + (vρ)x = 0.
The model has to be completed with a constitutive relation between v and ρ (and possibly
(x,t)). Different relations can be used to describe different traffic situations. Assuming that
the road looks the same everywhere and at all times, we postulate that v is a function only of ρ.
A natural assumption is that v is a decreasing function of ρ (cars slow down when the traffic
becomes denser). We assume that when the traffic is light, a car will drive at maximum speed
vmax and that when the traffic reaches a critical density ρmax the cars will stop. The simplest
possible constitutive relation which satisfies these conditions is given by

v = vmax (1 − ρ/ρmax),

meaning that ρ satisfies the conservation law

ρt + f(ρ)x = 0,

with f(ρ) = ρ vmax (1 − ρ/ρmax). According to this model, the maximum flow rate vρ is attained when ρ = ρmax/2. Note that the ‘wave velocity’ a(ρ) = f′(ρ) = vmax (1 − 2ρ/ρmax) is less than
the individual velocity of the cars v(ρ). The interpretation is that information travels backward
through the traffic and that the drivers are influenced by what happens ahead. Setting u =
ρ/ρmax and x = vmax x̃ turns the equation into
ut + (u(1 − u))x = 0,
where we have dropped the tilde for notational convenience. Setting w = 1 − 2u one recovers
Burgers’ equation. See Whitham [28, Chapter 3] for more on traffic models.
There are well-developed theories for scalar conservation laws in any dimension (n = 1)
and for systems of conservation laws in one spatial dimension (d = 1). Very little is known
when both d > 1 and n > 1. For simplicity, we will consider scalar conservation laws in one
space-dimension (d = n = 1). The Cauchy problem is

ut + f(u)x = 0,     (4.11)
u(x, 0) = u0(x).    (4.12)

Carrying out the differentiation, the equation can also be written

ut + a(u)ux = 0,    (4.14)

where a(u) = f′(u). Equation (4.14) can be thought of as a generalization of the ‘transport
equation’ ut + aux = 0, a constant, in which the velocity a(u) depends on the solution u. As
discussed in Section 1.2 of the introduction, the crossing of the characteristics can be inter-
preted as saying that the portion of the graph of u initially located at x1 eventually overtakes
the portion of the graph initially located at x2 > x1 , given that a(u0 (x1 )) > a(u0 (x2 )). This
causes the function u to become multivalued.
We now summarize the local theory for the problem (4.11)– (4.12).
Theorem 4.10. Let u0 ∈ Cb¹(R) and f ∈ C²(R). There exists a maximal time T > 0 such that (4.11)–(4.12) has a unique solution u ∈ C¹([0, T) × R). The corresponding characteristics are straight lines. If a(u0) is increasing, the maximal existence time is infinite, T = ∞. Otherwise,

T = −1 / inf_{x∈R} ∂x(a(u0(x))).
Proof. The proof essentially follows as for Theorem 4.5, the difference being that we are now considering the global picture. We begin by proving the uniqueness. Due to the form of the
equation we can parametrize the characteristics using t and write them in the form (x(t; y),t).
The characteristic equations simplify to

ẋ = a(z),
ż = 0,
meaning that u is constant along the characteristics and that the characteristics are straight
lines t ↦ (y + a(u0(y))t, t). The map y ↦ x(t; y) defines a change of variables as long as it is
strictly increasing with limy→±∞ x(t; y) = ±∞. The latter property will hold for all t since u0
is bounded. To verify the former property, we calculate

∂y x(t; y) = 1 + ∂y(a(u0(y))) t,

which is positive for all y precisely when t < T as defined above. This shows that the solution is uniquely determined by u0. Note that the map y ↦ x(t; y) is not monotone for
t > T . This means that there cannot exist a C1 solution beyond T since we would then have
x(t; x1) = x(t; x2) := x and u(x, t) = u0(x1) = u0(x2) for some numbers x1 < x2. This means that the characteristics starting at x1 and x2 intersect at (x, t). On the other hand, the slope of these characteristics is a(u0(x1)) = a(u0(x2)), which shows that they can never intersect. This is a contradiction. The existence of the solution simply follows by defining u(x, t) = u0(g(x, t)),
where x 7→ g(x,t) is the inverse of the map y 7→ x(t; y). The inverse function theorem guaran-
tees that g, and hence u, is a C1 function. From the definition of the characteristics it follows
that u satisfies the differential equation, and that u(x, 0) = u0 (g(x, 0)) = u0 (x).
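The breaking time in Theorem 4.10 is easy to evaluate numerically. The sketch below does so for Burgers' equation, where a(u) = u; the initial datum u0(x) = e^(−x²) and the helper name breaking_time are our own illustrative choices, not taken from the notes.

```python
import numpy as np

def breaking_time(a_u0, x):
    """T = -1 / inf_x d/dx [a(u0(x))], or infinity if a(u0) is non-decreasing."""
    m = np.gradient(a_u0, x).min()
    return np.inf if m >= 0 else -1.0 / m

x = np.linspace(-5, 5, 20001)
T = breaking_time(np.exp(-x**2), x)     # Burgers: a(u) = u

# u0'(x) = -2x e^{-x^2} attains its minimum -sqrt(2/e) at x = 1/sqrt(2),
# so the characteristics first cross at T = sqrt(e/2).
assert abs(T - np.sqrt(np.e / 2)) < 1e-3
```

An increasing datum (e.g. u0 = tanh) would instead return infinity, matching the global-existence case of the theorem.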
for u to be a solution of the integrated version of the conservation law. This is called the
Rankine-Hugoniot condition. It is often written
[[ f (u)]] = ẋ(t)[[u]],
where [[v]] = vr (t) − vl (t) for a function v having limits vl (t) from the left and vr (t) from the
right at the curve x = x(t).
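For a concrete instance of the Rankine-Hugoniot condition: with Burgers' flux f(u) = u²/2 it gives c = (ul + ur)/2. A minimal sketch (the helper name is our own):

```python
def shock_speed(f, ul, ur):
    """Rankine-Hugoniot speed c = [[f(u)]] / [[u]] of a jump from ul to ur."""
    return (f(ur) - f(ul)) / (ur - ul)

burgers = lambda u: 0.5 * u**2
assert shock_speed(burgers, 2.0, 0.0) == 1.0     # c = (ul + ur) / 2
assert shock_speed(burgers, -1.0, 3.0) == 1.0
```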
Equation (4.13) still requires that we can make sense of the derivative of ∫_a^b u(x,t) dx. One
could obtain an even weaker notion of solutions by integrating this relation over a time interval.
There is a related concept which we now describe. It has the advantage that it can be used to
define discontinuous solutions of other types of differential equations than conservation laws.
Let u be a C1 solution of (4.11) and multiply the equation by a function ϕ ∈ C0∞ (R2 ). If we
now integrate by parts, we obtain that
0 = ∫∫R2+ (ut (x,t) + f (u(x,t))x )ϕ(x,t) dx dt

= − ∫∫R2+ (u(x,t)ϕt (x,t) + f (u(x,t))ϕx (x,t)) dx dt − ∫R u(x, 0)ϕ(x, 0) dx,
By u ∈ L∞loc (R2+ ) we mean that u ∈ L∞ (K) for every compact subset K of R2+ := R × [0, ∞).
We have already seen that a classical solution is a weak solution. If a weak solution u is C1 it
is easy to see that it must in fact be classical solution. Indeed, integrating by parts one finds
that
∫∫R2+ (ut (x,t) + f (u(x,t))x )ϕ(x,t) dx dt + ∫R (u(x, 0) − u0 (x))ϕ(x, 0) dx = 0.
It now follows that ut (x,t) + f (u(x,t))x = 0 for t > 0. Otherwise one could find an open
set Ω where this is non-zero and of constant sign and then choose ϕ ≥ 0 and 6≡ 0 with support
in Ω to obtain a contradiction. Having established this, the initial condition follows from a
similar argument.
Proposition 4.12. Assume that x : [0, ∞) → R is a C1 function with x(0) = 0. Assume that
u is a classical C1 solution on each side of the curve x = x(t) and that u and its derivatives
extend continuously to x = x(t) from each side, with ul (t) = limx→x(t)− u(x,t) and ur (t) =
limx→x(t)+ u(x,t). Then u is a weak solution if and only if it satisfies the Rankine-Hugoniot
condition along the curve x = x(t).
Proof. Let Ωl = {(x,t) ∈ R2+ : x < x(t)} and Ωr = {(x,t) ∈ R2+ : x > x(t)}. We can write
∫∫R2+ (uϕt + f (u)ϕx ) dx dt = ∫∫Ωl (uϕt + f (u)ϕx ) dx dt + ∫∫Ωr (uϕt + f (u)ϕx ) dx dt.
Since ∇(x,t) · ( f (u)ϕ, uϕ) = (ut + f (u)x )ϕ + uϕt + f (u)ϕx = uϕt + f (u)ϕx in Ωl , we obtain
that
∫∫Ωl (uϕt + f (u)ϕx ) dx dt = ∫∫Ωl ∇(x,t) · ( f (u)ϕ, uϕ) dx dt

= ∫∂Ωl ( f (u)ϕ, uϕ) · n ds

= − ∫_{−∞}^{x(0)} u0 (x)ϕ(x, 0) dx + ∫_0^∞ ( f (ul (t)) − ẋ(t)ul (t))ϕ(x(t),t) dt
by the divergence theorem, where n denotes the outer unit normal. In order to use the diver-
gence theorem, we should work in a bounded domain. Note however that ϕ(x,t) = 0 outside
a compact set, so that we could do the calculations in Ωl ∩ R where R = (−A, A) × (−B, B) is
a large rectangle in R2 containing supp ϕ and {(x(t),t) : 0 ≤ t < B}. The result would be the
same since ϕ and its derivatives vanish on ∂ R. Performing the same calculation for Ωr , we
find that
∫∫Ωr (uϕt + f (u)ϕx ) dx dt = − ∫_{x(0)}^∞ u0 (x)ϕ(x, 0) dx − ∫_0^∞ ( f (ur (t)) − ẋ(t)ur (t))ϕ(x(t),t) dt.
Figure 4.2: Characteristics for the Riemann problem with a(ul ) > a(ur ).
Adding the two parts and comparing with (4.15), we find that u is a weak solution if and only
if

∫_0^∞ ([[ f (u)]] − ẋ(t)[[u]])ϕ(x(t),t) dt = 0
for all test functions ϕ, which holds precisely when the Rankine-Hugoniot condition is satisfied
along the curve. We now consider the Riemann problem, that is, (4.11) with initial data
u0 (x) = ul for x < 0 and u0 (x) = ur for x > 0, where ul and ur are constants with ul ≠ ur .
We suppose in addition that a(ul ) ≠ a(ur ).
Consider first the case a(ul ) > a(ur ). If we try to solve the problem using the method of
characteristics, we see that the region x < a(ur )t only contains characteristics emanating from
the negative half-axis, while the region x > a(ul )t only contains characteristics emanating
from the positive half-axis. We should therefore set
u(x,t) = ul for x < a(ur )t, and u(x,t) = ur for x > a(ul )t.
However, in the region a(ur )t ≤ x ≤ a(ul )t the characteristics intersect and it is unclear how
to define the solution (see Figure 4.2). We propose the solution
u(x,t) = ul for x ≤ ct, and u(x,t) = ur for x > ct. (4.16)
Figure 4.3: Shock wave and corresponding characteristics. Here a(ul ) > a(ur ).
Figure 4.4: Characteristics for the Riemann problem with a(ul ) < a(ur ).
Clearly u solves (4.11) on either side of the discontinuity. From the previous discussion, we
see that u is a weak solution if and only if the Rankine-Hugoniot condition

c = [[ f (u)]]/[[u]] (4.17)

holds.
Indeed, it is straightforward to show that this defines a piecewise C1 function, which satisfies
the differential equation in each of the three different regions. It follows that it is a weak
solution. We thus have the unhappy situation that weak solutions are not unique. The function
u is called a rarefaction wave. Figure 4.5 shows a rarefaction wave corresponding to Burgers’
equation, for which a−1 (x/t) = x/t, and the solution is given by a straight line in the interval
a(ul )t ≤ x ≤ a(ur )t.
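The rarefaction wave for Burgers' equation can be sketched directly; the function below is an illustrative implementation of the three-region formula (constant states glued to the fan u = x/t), assuming ul < ur. The function name is our own.

```python
import numpy as np

def rarefaction(x, t, ul, ur):
    """Burgers rarefaction: ul left of the fan, x/t inside, ur to the right."""
    return np.where(x <= ul * t, ul, np.where(x >= ur * t, ur, x / t))

x = np.linspace(-2.0, 2.0, 9)
u = rarefaction(x, 1.0, -1.0, 1.0)
assert u[0] == -1.0 and u[-1] == 1.0 and u[4] == 0.0
assert np.all(np.diff(u) >= 0)      # the fan interpolates monotonically
```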
Since this is difficult to verify directly, we now derive a necessary condition on u for this
to hold, which is easier to check. Suppose that η is a C1 function and define q by q′ (u) =
f ′ (u)η ′ (u). It is then easily confirmed that if u is a classical solution of the conservation law,
then η(u)t + q(u)x = 0. Suppose now that η is C2 and convex so that η ″ (u) ≥ 0. We will
show that η(u)t + q(u)x ≤ 0 in a suitable weak sense if u is a weak solution obtained as a limit
of C2 solutions of (4.18). We assume that uε → u in L1loc (R2+ ) and that uε is locally bounded
uniformly in ε. Note that we don't prove that such a convergence result holds; the main point
is to motivate condition (4.19) below. Multiplying (4.18) by η 0 (uε )ϕ, where ϕ ∈ C0∞ (R2+ ) with
ϕ ≥ 0, we find that
0 = ∫∫R2+ (uεt + f (uε )x − εuεxx )η ′ (uε )ϕ dx dt

= ∫∫R2+ (η(uε )t + q(uε )x − εη ′ (uε )uεxx )ϕ dx dt

= ∫∫R2+ (−η(uε )ϕt − q(uε )ϕx ) dx dt + ε ∫∫R2+ (η ″ (uε )(uεx )2 ϕ − η(uε )ϕxx ) dx dt,

so that

∫∫R2+ (−η(uε )ϕt − q(uε )ϕx ) dx dt ≤ ε ∫∫R2+ η(uε )ϕxx dx dt

for every ϕ ∈ C0∞ (R2+ ) with ϕ ≥ 0. Letting ε → 0, we obtain

∫∫R2+ (η(u)ϕt + q(u)ϕx ) dx dt ≥ 0 (4.19)

for every such ϕ. This is sometimes expressed as saying that η(u)t + q(u)x ≤
0 in the sense of distributions. The pair (η, q) is called an entropy-flux pair.
Definition 4.13. A weak solution u is said to be entropy admissible if (4.19) holds for every
entropy-flux pair (η, q).
For a fixed number u? , the function ηδ (u) = ((u − u? )2 + δ 2 )1/2 is convex and C∞ . It
converges to |u − u? | as δ → 0. Choosing the corresponding flux function qδ appropriately,
we find that

qδ (u) = ∫_{u?}^{u} f ′ (s)(s − u? ) / ((s − u? )2 + δ 2 )1/2 ds → ∫_{u?}^{u} f ′ (s) sgn(s − u? ) ds = ( f (u) − f (u? )) sgn(u − u? )

as δ → 0, so that η(u) = |u − u? | and q(u) = ( f (u) − f (u? )) sgn(u − u? ) form a limiting entropy-flux
pair. Approximating (η, q) with (ηδ , qδ ) in (4.20) and passing to the limit using the dominated
convergence theorem, one obtains the inequality
∫∫R2+ (|u − u? |ϕt + sgn(u − u? )( f (u) − f (u? ))ϕx ) dx dt ≥ 0
for every u? ∈ R. This is Kružkov’s admissibility condition. One can in fact show that
Kružkov’s condition is equivalent to the entropy condition in Definition 4.13.
The entropy condition turns out to be exactly what is needed to prove existence and unique-
ness. The proof of the following theorem is beyond the scope of these notes. See e.g. Bressan
[4] and Hörmander [11].
Theorem 4.14. Assume that u0 ∈ L∞ (R) and that f is locally Lipschitz continuous. There
exists a unique entropy admissible weak solution u ∈ C([0, ∞); L1loc (R)) ∩ L∞ (R × [0, ∞)) of
the Cauchy problem (4.11)–(4.12). If in addition u0 ∈ L1 (R), then u ∈ C([0, ∞); L1 (R)) and
the map S(t) : u0 7→ u(·,t) from initial data to solution extends to a continuous semigroup on
L1 (R) with kS(t)u0 − S(t)v0 kL1 (R) ≤ ku0 − v0 kL1 (R) .
We consider now the Riemann problem and show how the admissibility condition can
be used to single out a unique solution. The argument used to prove the Rankine-Hugoniot
condition shows that (4.19) leads to the jump condition

[[q(u)]] ≤ ẋ(t)[[η(u)]]

for every C1 entropy-flux pair (η, q). Once again, the condition is automatically satisfied by
a piecewise C1 solution. This means e.g. that a rarefaction wave is always admissible. A
limiting procedure shows again that the inequality continues to hold if we take η(u) = |u − u? |
and q(u) = sgn(u − u? )( f (u) − f (u? )), whence we obtain

[[sgn(u − u? )( f (u) − f (u? ))]] ≤ ẋ(t)[[ |u − u? | ]].
Suppose that ul < ur and take u? = αul + (1 − α)ur for α ∈ [0, 1], so that u? ∈ [ul , ur ]. We then
obtain the chain of inequalities
where we used the Rankine-Hugoniot condition in the third line. We leave it as an exercise to
show that the inequality is reversed if ur < ul . The conditions

f (αul + (1 − α)ur ) ≥ α f (ul ) + (1 − α) f (ur ), α ∈ [0, 1],

if ul < ur , and

f (αul + (1 − α)ur ) ≤ α f (ul ) + (1 − α) f (ur ), α ∈ [0, 1],

if ur < ul , are called the Lax shock conditions. They have a very simple geometric interpretation, saying that the graph of f must lie above the line segment between (ul , f (ul )) and
(ur , f (ur )) in the first case and below the line segment in the second case. In particular, if f is
convex, we obtain the condition
ur < ul
for a shock. For the Riemann problem, this means that the shock is the correct entropy solution
in the case ur < ul , whereas the rarefaction wave is the correct entropy solution in the case
ul < ur . The condition is reversed if f is concave.
Example 4.15. Consider the traffic model from Example 4.9, written in the normalized form
ut + f (u)x = 0, where f (u) = u(1 − u) is concave. This means that a shock wave will form if
ul < ur (meaning that the density is greater in the direction of the traffic). Consider e.g. the
case when ur = 1, meaning that the cars are standing still ahead. This might e.g. model a queue
forming at a red light. The velocity of the shock will then be c = ( f (ur ) − f (ul ))/(ur − ul ) < 0
(assuming that ul ∈ (0, 1)), so the shock wave is moving backwards, that is cars driving towards
the traffic light are coming to a sudden stop. If on the other hand ul > ur we get a rarefaction
wave. If ul = 1, this might model what happens when the light turns green.
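The computation in Example 4.15 can be checked mechanically; for f(u) = u(1 − u) and ur = 1 the shock speed simplifies to c = −ul. A small sketch (the names are our own):

```python
def shock_speed(ul, ur):
    """Rankine-Hugoniot speed for the normalized traffic flux f(u) = u(1 - u)."""
    f = lambda u: u * (1.0 - u)
    return (f(ur) - f(ul)) / (ur - ul)

# Red light ahead: ur = 1 (standing traffic); the shock moves backwards
# with speed c = -ul for every incoming density ul in (0, 1).
for ul in (0.2, 0.5, 0.9):
    c = shock_speed(ul, 1.0)
    assert c < 0 and abs(c + ul) < 1e-12
```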
Chapter 5

Distributions and Sobolev spaces
In the previous chapter we encountered weak solutions of PDE. The key was to multiply the
equation by a ‘test function’ ϕ ∈ C0∞ (Rd ) and integrate by parts to put all the derivatives on
ϕ. The name test function refers to the fact that the equation is ‘tested’ against ϕ. This gives
a suitable definition of weak solutions in which the solution u doesn’t need to be differen-
tiable. If u is a sufficiently regular weak solution, one shows that it is a classical solution,
meaning that the PDE holds pointwise. What is involved here is the concept of ‘weak’ or
‘distributional’ derivatives. The weak derivative of a non-differentiable function is in general
not a function, but a distribution. The purpose of this chapter is partly to clarify what this
means. Distributions or ‘generalized functions’ have a long prehistory. The theory was made
rigorous in the middle of the 20th century, in particular by Laurent Schwartz (1915–2002).
Distributions are indispensable in the modern theory of linear partial differential equations.
Distributions are not quite as useful for nonlinear equations, since one can in general
not define the product of two distributions. Moreover, the space of distributions is not a
normed vector space. For this purpose we introduce a generalization of Lebesgue spaces
called Sobolev spaces after Sergei L’vovich Sobolev (1908–1989). Sobolev was also involved
in the early theory of distributions. The Sobolev space H k simply consists of L2 functions
whose weak derivatives up to order k lie in L2 . As we will see, H k is a Banach space and
moreover an algebra if k is sufficiently large. Another important reason for using Sobolev
spaces is that they behave well with respect to the Fourier transform. This is not the case for
spaces based on the supremum norm.
The present chapter can only serve as brief introduction. For more details on distribu-
tions we refer to Friedlander [9], Duistermaat and Kolk [6], Hörmander [12], Rauch [20] and
Strichartz [25]. For more on Sobolev spaces we refer to the standard reference Adams and
Fournier [1]. In particular, this reference discusses Sobolev spaces on subsets of Rd , where
the regularity of the boundary plays an important role. In these notes we shall only discuss
Sobolev spaces on Rd .
84 Distributions and Sobolev spaces
5.1 Distributions
Note that a locally integrable function u ∈ L1loc (Rd ) gives rise to a linear functional1 ℓu : C0∞ (Rd ) → C
defined by ℓu (ϕ) = ⟨u, ϕ⟩, where

⟨u, ϕ⟩ := ∫Rd u(x)ϕ(x) dx. (5.1)
Note that this differs from the inner product in L2 (Rd ) in that there is no complex conjugate
on ϕ and that we don't require u, ϕ ∈ L2 (Rd ). It is clear, at least if u is continuous, that ℓu
determines u uniquely, since if ℓu (ϕ) = 0 for all ϕ ∈ C0∞ (Rd ) then u ≡ 0. Below we shall
prove that this is also the case if u ∈ L1loc (Rd ).
In keeping with the notation of Laurent Schwartz, we denote the space of test functions
C0∞ (Rd ) by D(Rd ).
Definition 5.1. A distribution ℓ is a linear functional on D(Rd ) which is continuous in the
following sense. If {ϕn } ⊂ D(Rd ) has the following properties:
(1) there exists a compact set K ⊂ Rd such that supp ϕn ⊆ K for all n, and
(2) there exists a function ϕ ∈ D(Rd ) such that for all α ∈ Nd , ∂ α ϕn → ∂ α ϕ uniformly,
then ℓ(ϕn ) → ℓ(ϕ) as n → ∞. The space of distributions is denoted D ′ (Rd ).
Remark 5.2. D ′ (Rd ) is simply the space of continuous linear functionals on D(Rd ) when
D(Rd ) is given the topology defined by (1) and (2) in the above definition. When (1) and (2)
hold, we say that ϕn → ϕ in D(Rd ). One can define D(Ω) if Ω is an open subset of Rd in a
similar fashion.
The most common way that distributions appear is through the expression (5.1). The fact
that this expression defines a distribution follows from the estimate
|⟨u, ϕ⟩| ≤ ( ∫K |u(x)| dx ) ‖ϕ‖L∞ (Rd ) ,
if ϕ ∈ D(Rd ) with supp ϕ ⊆ K. Let us prove that (5.1) determines the distribution `u uniquely.
Note that by linearity, it suffices to show that u = 0 (as an element of L1loc (Rd )) if ℓu = 0.

Theorem 5.3. Let u ∈ L1loc (Rd ). If ⟨u, ϕ⟩ = 0 for all ϕ ∈ D(Rd ), then u(x) = 0 a.e.
Proof. It suffices to prove that u = 0 a.e. in the ball BR (0) ⊂ Rd for any R > 0. Let v =
χB2R (0) u ∈ L1 (Rd ). Then ∫Rd v(x)ϕ(x) dx = ∫Rd u(x)ϕ(x) dx = 0 for any ϕ ∈ D(Rd ) with
supp ϕ ⊂ B2R (0). Let J be a mollifier as in Theorem B.25. Then Jε ∗ v → v in L1 (Rd ) by Theorem
B.25 (4) and hence also in L1 (BR (0)). On the other hand, (Jε ∗ v)(x) = ∫Rd v(y)Jε (x −
y) dy = 0 for x ∈ BR (0) and ε > 0 sufficiently small, since supp Jε (x − ·) ⊂ BR+ε (0). It follows
that u = v = 0 a.e. in BR (0).
1 A linear functional is just a linear map with values in C.
The above theorem implies that we can identify u with the linear functional `u . We will do
this without further mention in what follows. We shall also use the notation hu, ϕi for u(ϕ)
even when u is just an element of D 0 (Rd ). Let us give an example of a distribution which does
not correspond to a locally integrable function.
Example 5.4. For a ∈ Rd , the Dirac distribution δa is defined by

⟨δa , ϕ⟩ := ϕ(a).

If δa = ℓu for some u ∈ L1loc (Rd ), then testing against functions ϕ ∈ D(Rd ) which vanish near
a shows (by Theorem 5.3) that u(x) = 0 a.e. and hence that ⟨δa , ϕ⟩ = ⟨u, ϕ⟩ = 0 for all ϕ ∈ D(Rd ),
contradicting ⟨δa , ϕ⟩ = ϕ(a).
Definition 5.5. We say that un → u in D ′ (Rd ) if ⟨un , ϕ⟩ → ⟨u, ϕ⟩ for all ϕ ∈ D(Rd ).
Example 5.6. Consider a function K ∈ L1 (Rd ) with ∫Rd K(x) dx = 1 and set Kε (x) = ε −d K(ε −1 x).
Then Kε → δ0 in D ′ (Rd ) as ε → 0.
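The convergence Kε → δ0 can be observed numerically by evaluating the pairing ⟨Kε , ϕ⟩ against a fixed test function; the Gaussian K and the function ϕ below are our own illustrative choices (ϕ is only a stand-in for a compactly supported test function).

```python
import numpy as np

# d = 1: <K_eps, phi> -> phi(0) as eps -> 0, i.e. K_eps -> delta_0 in D'(R).
x, dx = np.linspace(-10, 10, 200001, retstep=True)
phi = np.cos(x) * np.exp(-x**2)         # stand-in test function, phi(0) = 1

def pairing(eps):
    # K(x) = (2 pi)^{-1/2} e^{-x^2/2}, so K_eps(x) = eps^{-1} K(x / eps)
    K_eps = np.exp(-(x / eps) ** 2 / 2) / (eps * np.sqrt(2 * np.pi))
    return np.sum(K_eps * phi) * dx     # quadrature for the integral

errors = [abs(pairing(e) - 1.0) for e in (0.5, 0.1, 0.02)]
assert errors[0] > errors[1] > errors[2]    # convergence as eps -> 0
assert errors[2] < 1e-3
```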
if u is just a distribution. Note that τ−h ϕn → τ−h ϕ if ϕn → ϕ in D(Rd ), so that (5.2) indeed
defines a distribution τh u. It is clear that the new definition agrees with the old one if u ∈
D(Rd ) in the sense that the distribution τh u defined by (5.2) can be represented by the function
u(· − h).
Proposition 5.8. Suppose that L is a linear operator on D(Rd ) which is continuous in the
sense that ϕn → ϕ in D(Rd ) implies L(ϕn ) → L(ϕ). Suppose, moreover, that there exists a
continuous linear operator LT on D(Rd ), with the property that hL(ϕ), ψi = hϕ, LT (ψ)i for
all ϕ, ψ ∈ D(Rd ). Then L extends to a continuous linear operator on D ′ (Rd ) given by

⟨L(u), ϕ⟩ := ⟨u, LT (ϕ)⟩, ϕ ∈ D(Rd ). (5.3)
Proof. The continuity of LT shows that L(u) defined in (5.3) is a distribution. It is also clear
from the definition of the transpose that (5.3) defines an extension of the original operator L.
Finally, if un → u in D ′ (Rd ), then ⟨L(un ), ϕ⟩ = ⟨un , LT (ϕ)⟩ → ⟨u, LT (ϕ)⟩ = ⟨L(u), ϕ⟩, so
that L(un ) → L(u) in D ′ (Rd ).

The transposes of the basic operators are summarized in the following table:

L : LT
τh : τ−h
σλ : |λ |−d σ1/λ
Mψ : Mψ
∂ α : (−1)|α| ∂ α
Proposition 5.9. Each of the operators τh , σλ , Mψ and ∂ α on D(Rd ) has a unique extension
to a continuous map on D ′ (Rd ). The extensions are given by ⟨Lu, ϕ⟩ = ⟨u, LT ϕ⟩ where L and
LT are as in the above table.
Note the restriction to multiplication with smooth functions. In general we can’t define the
product of a distribution and a non-smooth function ψ since then ϕψ 6∈ D(Rd ). In particular,
one can in general not define the product of two distributions. We will get back to this question
later on.
Proof. The result can be proved using Lemma 2.15, but it is simpler to give a direct proof. By
Fubini’s theorem we have that
⟨F (ϕ), ψ⟩ = ∫Rd ( (2π)−d/2 ∫Rd ϕ(x)e−ix·ξ dx ) ψ(ξ ) dξ

= (2π)−d/2 ∫Rd ϕ(x) ( ∫Rd ψ(ξ )e−ix·ξ dξ ) dx

= ⟨ϕ, F (ψ)⟩.
The problem with defining hF (u), ϕi = hu, F (ϕ)i is that F doesn’t map D(Rd ) to itself
(compact support is lost). The key is to instead use S (Rd ) as the space of test functions.
Definition 5.13. We say that un → u in S ′ (Rd ) if un (ϕ) → u(ϕ) for all ϕ ∈ S (Rd ).
A tempered distribution u is uniquely determined by its restriction to D(Rd ), since D(Rd ) is a
dense subspace of S (Rd ) (Proposition 2.2 (5)) and ⟨u, ϕn ⟩ → ⟨u, ϕ⟩ whenever ϕn → ϕ in
S (Rd ). It therefore makes sense to write S ′ (Rd ) ⊂ D ′ (Rd ). The inclusion is strict, in
the sense that there exist elements u ∈ D 0 (Rd ) which can not be extended to continuous linear
functionals on S (Rd ).
Example 5.14.
(2) Any locally integrable function u with (1 + | · |2 )−N u ∈ L1 (Rd ) for some N belongs to
S 0 (Rd ) since
∫Rd |u(x)ϕ(x)| dx ≤ ∫Rd (1 + |x|2 )−N |u(x)| (1 + |x|2 )N |ϕ(x)| dx

≤ ‖(1 + | · |2 )−N u‖L1 (Rd ) ‖(1 + | · |2 )N ϕ‖L∞ (Rd ) .
(3) In particular, any u ∈ L p (Rd ), p ∈ [1, ∞], belongs to S 0 (Rd ) since (1 + | · |2 )−N u ∈
L1 (Rd ) if N is sufficiently large by Hölder’s inequality.
(4) u(x) = e|x| is in D 0 (Rd ) but not in S 0 (Rd ). Strictly speaking this means that the func-
tional ϕ 7→ hu, ϕi, ϕ ∈ D(Rd ), can not be extended to an element of S 0 (Rd ). This can
e.g. be seen by approximating a function ϕ ∈ S (Rd ) satisfying ϕ(x) = e−|x|/2 for large
|x| with a sequence in D(Rd ) in the topology of S (Rd ).
The proof of the following result is exactly the same as the proof of Proposition 5.8.
Proposition 5.15. The result in Proposition 5.8 remains true if D(Rd ) is replaced by S (Rd )
and D 0 (Rd ) by S 0 (Rd ).
As a consequence, Proposition 5.9 holds also for S 0 (Rd ) if we in the definition of the
multiplication operator assume that ψ ∈ S (Rd ), so that ψϕ ∈ S (Rd ) if ϕ ∈ S (Rd ). More
generally, one can assume that ψ and its derivatives grow at most polynomially.
Proposition 5.16. Let ψ ∈ C∞ (Rd ) and assume that ψ and its derivatives grow at most polynomially.
Each of the operators τh , σλ , Mψ and ∂ α on S (Rd ) extends to a continuous operator
on S ′ (Rd ).
Alternatively, one can view these operators as restrictions of the corresponding operators on
D 0 (Rd ) to S 0 (Rd ).
The main purpose of introducing tempered distributions is to extend the Fourier transform.
The fact that F : S (Rd ) → S (Rd ) is continuous shows that the following definition makes
sense.
Definition 5.17. The Fourier transform on S 0 (Rd ) is the continuous operator F : S 0 (Rd ) →
S 0 (Rd ) defined by
hF (u), ϕi := hu, F (ϕ)i,
for u ∈ S ′ (Rd ) and ϕ ∈ S (Rd ). The inverse Fourier transform is the continuous operator
F −1 : S ′ (Rd ) → S ′ (Rd ) defined by

⟨F −1 (u), ϕ⟩ := ⟨u, F −1 (ϕ)⟩.
Proof. The identities follow by combining the definitions of these operators in terms of their
transposes with the corresponding properties for Schwartz class functions. We illustrate the
method by proving the first identity and leave the others to the reader. Considering ϕ as a
function of ξ we have that
⟨F (τ−y u), ϕ⟩ = ⟨τ−y u, ϕ̂⟩ = ⟨u, τy ϕ̂⟩ = ⟨u, F (eiy·ξ ϕ)⟩ = ⟨û, eiy·ξ ϕ⟩ = ⟨eiy·ξ û, ϕ⟩.
Example 5.20. We have ⟨δ̂0 , ϕ⟩ = ⟨δ0 , ϕ̂⟩ = ϕ̂(0) = (2π)−d/2 ∫Rd ϕ(x) dx, so δ̂0 = (2π)−d/2 .
By Fourier's inversion formula it follows that F (1) = RF −1 (1) = (2π)d/2 R(δ0 ) = (2π)d/2 δ0 .
F (ϕ ∗ u) = (2π)d/2 ϕ̂ û,
∂ α (ϕ ∗ u) = (∂ α ϕ) ∗ u = ϕ ∗ (∂ α u).
Proof. The existence of the extension is a consequence of Proposition 5.15 and Lemma 5.21.
The identity F (ϕ ∗ u) = (2π)d/2 ϕ̂ û follows from the calculation
⟨F (ϕ ∗ u), ψ⟩ = ⟨ϕ ∗ u, ψ̂⟩
= ⟨u, R(ϕ) ∗ ψ̂⟩
= ⟨u, F F −1 (R(ϕ) ∗ ψ̂)⟩
= ⟨û, F −1 (R(ϕ) ∗ ψ̂)⟩
= ⟨û, (2π)d/2 ϕ̂ψ⟩
= ⟨(2π)d/2 ϕ̂ û, ψ⟩,
in which the convolution theorem for S (Rd ) and the relation F −1 = F R = RF have been
used. The second identity follows from similar manipulations and its proof is left to the
reader.
Since convolution is commutative on S (Rd ) we shall also write u ∗ ϕ instead of ϕ ∗ u.
Keeping u fixed and varying ϕ we obtain a continuous linear operator from S (Rd ) to S 0 (Rd ).
Note that u ∗ v is in general not well-defined if u, v ∈ S 0 (Rd ).
Example 5.23. We have ⟨δ0 ∗ ϕ, ψ⟩ = ⟨δ0 , R(ϕ) ∗ ψ⟩ = ∫Rd ϕ(x)ψ(x) dx = ⟨ϕ, ψ⟩. Thus,
δ0 ∗ ϕ = ϕ.
Lemma 5.25. For u ∈ D ′ (Rd ) and ϕ ∈ D(Rd ) we have that u ∗ ϕ ∈ C∞ (Rd ) with

(u ∗ ϕ)(x) = ⟨u, ϕ(x − ·)⟩.
Proof. Since x ↦ ϕ(x − ·) is continuous from Rd to D(Rd ) it follows that x ↦ ⟨u, ϕ(x − ·)⟩ ∈
C(Rd ). Moreover, limh→0 h−1 (ϕ(x + he j − ·) − ϕ(x − ·)) = ∂ j ϕ(x − ·) in D(Rd ) for each
x and j, and x ↦ ∂ j ϕ(x − ·) is continuous from Rd to D(Rd ). From this it follows that
x ↦ ⟨u, ϕ(x − ·)⟩ ∈ C1 (Rd ). Proceeding by induction, one shows that x ↦ ⟨u, ϕ(x − ·)⟩ ∈
C∞ (Rd ). The result therefore follows if we prove the equality (u ∗ ϕ)(x) = ⟨u, ϕ(x − ·)⟩. Let
v(x) = ⟨u, ϕ(x − ·)⟩. Using Riemann sums we find that
⟨v, ψ⟩ = ∫Rd v(x)ψ(x) dx

= lim n→∞ ∑α∈Zd ψ(α/n)⟨u, ϕ(α/n − ·)⟩n−d

= ⟨ u, lim n→∞ ∑α∈Zd ψ(α/n)ϕ(α/n − ·)n−d ⟩

= ⟨u, R(ϕ) ∗ ψ⟩
for ψ ∈ D(Rd ), where we have used the fact that ∑α∈Zd ψ(α/n)ϕ(α/n−·)n−d → R(ϕ)∗ψ in
D(Rd ) (left as an exercise for the reader). The expression on the last line is simply hu ∗ ϕ, ψi,
so this proves the equality u ∗ ϕ = v.
Proof. We first approximate u ∈ D 0 (Rd ) with the sequence ψ(·/n)u in D 0 (Rd ), where ψ ∈
D(Rd ) with ψ(x) = 1 for |x| ≤ 1. It is clear that ψ(·/n)ϕ → ϕ in D(Rd ) if ϕ ∈ D(Rd )
(they are equal for large n) and hence ψ(·/n)u → u in D 0 (Rd ). Similarly, if u ∈ S 0 (Rd ), the
sequence converges to u in S 0 (Rd ). We can therefore replace u by ψu, for some function
ψ ∈ D(Rd ), in what follows. We define un = (ψu) ∗ J1/n , where J is an even mollifier (see
Section B.5). By the previous lemma it follows that un ∈ C∞ (Rd ) as an element of D 0 (Rd ).
Moreover, un has compact support since un (x) = hψu, J1/n (x − ·)i = hu, ψJ1/n (x − ·)i = 0 if
dist(x, supp ψ) ≥ 1/n. Finally, un = (ψu) ∗ J1/n → ψu in D ′ (Rd ) as n → ∞, which completes
the proof.
Corollary 5.27. The extensions in Propositions 5.9 and 5.15 are unique.
Example 5.28. Consider the three-dimensional wave equation. For initial data u0 = 0 and
u1 ∈ S (Rd ) the solution is expressed on the Fourier side as (sin(t|ξ |)/|ξ |) û1 (ξ ). Note that
sin(t| · |)/| · | ∈ S ′ (Rd ). By Proposition 5.22 we therefore have that

u(x,t) = (2π)−3/2 ( F −1 (sin(t|ξ |)/|ξ |) ) ∗ u1

in the sense of distributions. At least if u1 ∈ D(Rd ), we can make sense of
(2π)−3/2 F −1 (sin(t|ξ |)/|ξ |) ∗ u1 as

⟨vt , u1 (x − ·)⟩,

where vt = (2π)−3/2 F −1 (sin(t|ξ |)/|ξ |) ∈ S ′ (Rd ). The distribution vt is nothing but tMt from
Section 3.2, that is,

⟨vt , ϕ⟩ = tMt (ϕ) = (1/(4πt)) ∫|x|=|t| ϕ(x) dS(x).
5.2 Sobolev spaces

The notation W k,2 (Rd ) is also frequently used instead of H k (Rd ). One then regards
H k (Rd ) as a member of a more general family W k,p (Rd ) of Sobolev spaces, in which L2 (Rd )
is replaced by L p (Rd ).
Note that

(u, v)H k (Rd ) := ∑|α|≤k ∫Rd ∂ α u(x) ∂ α v̄(x) dx
defines an inner product on H k (Rd ). Using the completeness of L2 (Rd ) and the definition
of distributional derivatives, it is easy to prove that H k (Rd ) is in fact a Hilbert space with
this inner product. Alternatively, one can prove this by first characterizing H k (Rd ) using the
Fourier transform. The following result is a direct consequence of Parseval’s formula.
Moreover,

‖u‖2H k (Rd ) = ∫Rd Pk (ξ )|û(ξ )|2 dξ . (5.5)
This result suggests a generalization of the Sobolev spaces. We introduce the notation
hξ i = (1 + |ξ |2 )1/2 . Note that Pk (ξ ) ∼ hξ i2k in the sense that there exist constants C1 ,C2 > 0
such that C1 hξ i2k ≤ Pk (ξ ) ≤ C2 hξ i2k for all ξ ∈ Rd . We can therefore replace Pk (ξ ) by hξ i2k
in the definition of the norm of H k (Rd ). This allows us to extend the definition of Sobolev
spaces to non-integer indices and to negative indices.
Definition 5.31. Let s ∈ R. H s (Rd ) is the space of tempered distributions u such that û ∈
L1loc (Rd ) with

‖u‖H s (Rd ) := ( ∫Rd (1 + |ξ |2 )s |û(ξ )|2 dξ )1/2 < ∞. (5.6)
These spaces are usually called fractional Sobolev spaces (even though s doesn’t have to
be rational). From now on we will use (5.6) as the norm on H s (Rd ), even when s is a non-
negative integer. Note that S (Rd ) ⊂ H t (Rd ) ⊂ H s (Rd ) ⊂ S 0 (Rd ) when s < t, each inclusion
being continuous.
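The monotonicity of the scale H^s in s is visible numerically: for a fixed u, the norm (5.6) increases with s. The sketch below uses a Gaussian, whose Fourier transform is known in closed form (an illustrative choice of ours).

```python
import numpy as np

# H^s norms via (5.6) in d = 1 for u = e^{-x^2/2}, using u^ = e^{-xi^2/2}.
xi, dxi = np.linspace(-40, 40, 400001, retstep=True)
u_hat = np.exp(-xi**2 / 2)

def H_norm(s):
    return np.sqrt(np.sum((1 + xi**2) ** s * np.abs(u_hat) ** 2) * dxi)

norms = [H_norm(s) for s in (-1.0, 0.0, 1.0, 2.0)]
assert all(a < b for a, b in zip(norms, norms[1:]))   # larger s, larger norm
assert abs(H_norm(0.0) - np.pi ** 0.25) < 1e-6        # L^2 norm is pi^{1/4}
```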
Proof. It is clear that (·, ·)H s (Rd ) defines an inner product on H s (Rd ) with (u, u)H s (Rd ) =
‖u‖2H s (Rd ) , so we only need to prove the completeness. This follows since the map T : u ↦
⟨ξ ⟩s û is an isometric isomorphism from H s (Rd ) to L2 (Rd ).
Note that the elements of H s (Rd ) need not be functions when s < 0.
Example 5.33. The distribution δ0 ∈ S ′ (Rd ) satisfies δ̂0 = (2π)−d/2 . Using polar coordinates,
we find that

∫Rd (1 + |ξ |2 )s dξ < ∞

if and only if s < −d/2. Hence δ0 ∈ H s (Rd ) if and only if s < −d/2.
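The dichotomy in Example 5.33 can be seen numerically in d = 1: partial integrals of (1 + |ξ|²)^s stabilize for s < −1/2 and keep growing for s > −1/2. A hedged sketch (grid parameters are our own choices):

```python
import numpy as np

# Integral of (1+|xi|^2)^s over |xi| <= R as R grows: bounded increments for
# s = -3/4 < -1/2, unbounded growth for s = -1/4 > -1/2.
def partial_integral(R, s, n=200001):
    xi, dxi = np.linspace(-R, R, n, retstep=True)
    return np.sum((1 + xi**2) ** s) * dxi

conv = [partial_integral(R, -0.75) for R in (1e2, 1e3, 1e4)]
div = [partial_integral(R, -0.25) for R in (1e2, 1e3, 1e4)]
assert conv[2] - conv[1] < conv[1] - conv[0]    # increments shrink: convergent
assert div[2] - div[1] > 2 * (div[1] - div[0])  # keeps growing: divergent
```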
Riesz’ representation theorem (Theorem A.12) implies that the dual space of H s (Rd ) can
be identified with H s (Rd ) itself. The element v ∈ H s (Rd ) is here identified with the bounded
linear functional
H s (Rd ) ∋ u ↦ (u, v)H s (Rd ) = ∫Rd (1 + |ξ |2 )s û(ξ )v̂(ξ ) dξ . (5.7)
Rd
The dual can also be identified with H −s (Rd ) using an extension of the usual pairing between
S 0 (Rd ) and S (Rd ). Rewriting
⟨w, ϕ⟩ = ∫Rd ŵ(ξ )ϕ̂(−ξ ) dξ = ∫Rd ⟨ξ ⟩−s ŵ(ξ ) ⟨ξ ⟩s ϕ̂(−ξ ) dξ ,
we see that ⟨·, ·⟩ extends to a bilinear map H −s (Rd ) × H s (Rd ) → C. It is also clear that the
functional defined by v ∈ H s (Rd ) according to (5.7) is identical to the functional u ↦ ⟨w, u⟩,
where w := Λ2s C v, in which C v = v̄ for v ∈ S (Rd ) and C is extended by duality to S ′ (Rd ),
and Λ2s := F −1 M⟨ξ ⟩2s F : H s (Rd ) → H −s (Rd ). Whenever one identifies the dual with either
H s (Rd ) or H −s (Rd ) one has to keep in mind how the identification is done.
In particular this implies that H s (Rd ) is the completion of C0∞ (Rd ) with respect to the norm
k · kH s (Rd ) . This gives an alternative way of introducing Sobolev spaces.
Sobolev spaces are typically used in connection with PDE. One usually proves the ex-
istence of a solution in some Sobolev space. It is of interest to know when the solution is
sufficiently differentiable in the usual sense to define a classical solution of the PDE.
Definition 5.35. Ċk (Rd ) is the space of k times continuously differentiable functions u : Rd →
C, such that lim|x|→∞ |∂ α u(x)| = 0 for all α ∈ Nd with |α| ≤ k, endowed with the norm
‖u‖Cbk (Rd ) := max|α|≤k supRd |∂ α u(x)|.
Lemma 5.36. Ċk (Rd ) is a closed subspace of the Banach space Cbk (Rd ).
Proof. Let {un } ⊂ Ċk (Rd ) with un → u in Cbk (Rd ). For ε > 0 we can find a number N such
that ‖un − u‖Cbk (Rd ) < ε if n ≥ N. Choosing R such that |∂ α uN (x)| < ε for |x| ≥ R and |α| ≤ k,
we find that |∂ α u(x)| ≤ |∂ α uN (x)| + ‖u − uN ‖Cbk (Rd ) < 2ε if |x| ≥ R. Hence, u ∈ Ċk (Rd ).
Note that this means that Ċk (Rd ) is itself a Banach space.
Theorem 5.37 (Sobolev embedding theorem). Let s > k + d/2. Then H s (Rd ) ⊂ Ċk (Rd ), with

‖ϕ‖Cbk (Rd ) ≤ C‖ϕ‖H s (Rd ) (5.8)

for ϕ ∈ S (Rd ), where C = C(k, s, d).

Proof. For α ∈ Nd with |α| ≤ k it follows from Hölder's inequality that
|∂ α ϕ(x)| ≤ (2π)−d/2 ∫Rd |ξ |k |ϕ̂(ξ )| dξ

= (2π)−d/2 ∫Rd ( |ξ |k / (1 + |ξ |2 )s/2 ) (1 + |ξ |2 )s/2 |ϕ̂(ξ )| dξ

≤ (2π)−d/2 ( ∫Rd |ξ |2k / (1 + |ξ |2 )s dξ )1/2 ( ∫Rd (1 + |ξ |2 )s |ϕ̂(ξ )|2 dξ )1/2

= C‖ϕ‖H s (Rd ) ,

where C = (2π)−d/2 ‖ |ξ |k ⟨ξ ⟩−s ‖L2 (Rd ) < ∞ since |ξ |2k ⟨ξ ⟩−2s ≤ 2|ξ |2(k−s) for large |ξ | and
2(k − s) < −d. Taking the maximum over all α we find that (5.8) holds for ϕ ∈ S (Rd ). Now
let f ∈ H s (Rd ).
Pick {ϕn } ⊂ S (Rd ) with ϕn → f in H s (Rd ) as n → ∞. The inequality ‖ϕn − ϕm ‖Cbk (Rd ) ≤
C‖ϕn − ϕm ‖H s (Rd ) follows from (5.8) and implies that {ϕn } is a Cauchy sequence in Ċk (Rd ).
Since Ċk (Rd ) is a Banach space it follows that {ϕn } converges to some element g in Ċk (Rd ).
But then {ϕn } converges both to f and g in S ′ (Rd ), so f = g as distributions. It follows from
Theorem 5.3 that f = g a.e. Taking limits, one obtains (5.8) with f substituted for ϕ.
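The embedding inequality (5.8) can be sanity-checked numerically for k = 0, s = 1, d = 1, where the constant appearing in the proof is C = (2π)^{−1/2} ‖⟨ξ⟩^{−1}‖_{L²} = 2^{−1/2}; the Gaussian below is an illustrative choice.

```python
import numpy as np

# sup|phi| <= 2^{-1/2} ||phi||_{H^1} for phi = e^{-x^2/2} (phi^ = e^{-xi^2/2}).
xi, dxi = np.linspace(-40, 40, 400001, retstep=True)
phi_hat = np.exp(-xi**2 / 2)

sup_norm = 1.0                 # sup |phi| = phi(0) = 1 for this Gaussian
H1_norm = np.sqrt(np.sum((1 + xi**2) * phi_hat**2) * dxi)
C = 2 ** -0.5
assert sup_norm <= C * H1_norm
```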
Remark 5.38. The elements of H s (Rd ) are actually equivalence classes, not functions. The
Sobolev embedding theorem implies that, for s > k + d/2, there is a continuous member of
each such class. As usual we identify the equivalence class with the continuous representative.
Theorem 5.39. H s (Rd ) is an algebra if s > d/2, meaning that u, v ∈ H s (Rd ) ⇒ uv ∈ H s (Rd )
and that there exists C = C(s, d) > 0 such that

‖uv‖H s (Rd ) ≤ C‖u‖H s (Rd ) ‖v‖H s (Rd ) .
Proof of Theorem 5.39. Note that F (uv) = RF −1 (uv) = (2π)−d/2 R(F −1 (u) ∗ F −1 (v)) =
(2π)−d/2 (û ∗ v̂) for u, v ∈ S (Rd ) by the convolution theorem. By continuity, we also have
that F (uv) = (2π)−d/2 (û ∗ v̂) when u, v ∈ L2 (Rd ), and hence when u, v ∈ H s (Rd ), s > d/2.
We therefore have that
(2π)d/2 ‖uv‖H s (Rd ) = ‖⟨ξ ⟩s (û ∗ v̂)‖L2 (Rd ) = ( ∫Rd ⟨ξ ⟩2s |(û ∗ v̂)(ξ )|2 dξ )1/2 .
Note that

⟨ξ ⟩s |(û ∗ v̂)(ξ )| ≤ ∫Rd ⟨ξ ⟩s |û(ξ − η)||v̂(η)| dη.
By the previous lemma,
⟨ξ ⟩s = ⟨ξ − η + η⟩s ≤ 2s−1 (⟨ξ − η⟩s + ⟨η⟩s ).
Hence,
⟨ξ ⟩s |(û ∗ v̂)(ξ )| ≤ 2s−1 ( ∫Rd ⟨ξ − η⟩s |û(ξ − η)||v̂(η)| dη + ∫Rd ⟨η⟩s |û(ξ − η)||v̂(η)| dη ).
By the special case (B.3) of Young’s inequality, we therefore obtain that
‖⟨ξ ⟩s (û ∗ v̂)‖L2 (Rd ) ≤ C(‖⟨ξ ⟩s û‖L2 (Rd ) ‖v̂‖L1 (Rd ) + ‖û‖L1 (Rd ) ‖⟨ξ ⟩s v̂‖L2 (Rd ) ).
Since ‖⟨ξ ⟩s û‖L2 (Rd ) = ‖u‖H s (Rd ) and

‖û‖L1 (Rd ) = ∫Rd ⟨ξ ⟩−s ⟨ξ ⟩s |û(ξ )| dξ

≤ ( ∫Rd ⟨ξ ⟩−2s dξ )1/2 ( ∫Rd ⟨ξ ⟩2s |û(ξ )|2 dξ )1/2

≤ C‖u‖H s (Rd ) ,

where we have used the assumption that 2s > d, the result follows.
Chapter 6
Dispersive equations
In this chapter we will focus on nonlinear dispersive wave equations of the form
ut − a(Dx )u = F(u, Dx u).
Here F is some nonlinear function of u and Dx u, while a(Dx ) is a differential operator and
Dx = ∇/i. We assume that F vanishes to second order at the origin. By the equation being
dispersive, we mean that the linear part ut = a(Dx )u is dispersive. Recall that in the one-
dimensional case this means that the equation has sinusoidal waves ei(kx−ω(k)t) = eik(x−c(k)t) ,
where c(k) = ω(k)/k ∈ R is a non-constant function of k. Since the dispersion relation is given
by ω(k) = ia(k), one should assume that a(k) is a polynomial of order 2 or higher, which takes
purely imaginary values for k ∈ R.1 In the higher dimensional case, the sinusoidal waves take
the form ei(k·x−ω(k)t) = eik·(x−c(k)t) , with c(k) = ω(k)k/|k|2 = ia(k)k/|k|2 . In this case it is
more difficult to define what is meant by a dispersive equation. Even for the wave equation
one obtains that there is some weak dispersion in the sense that c(k) = ±k/|k| depends on
the direction of k.2 At the very least, one should assume that a takes imaginary values when
k ∈ Rd .
To illustrate the ideas we will concentrate on two particular examples:
(1) the nonlinear Schrödinger (NLS) equation: iut + ∆u = σ |u| p−1 u, where σ = ±1 and
p ≥ 3 is an odd integer;
(2) the Korteweg-de Vries (KdV) equation: ut + 6uux + uxxx = 0.
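The dispersion relations of the two linearized examples can be tabulated directly: ω(k) = −k³ for the linear part of KdV (ut + uxxx = 0) and ω(k) = k² for the free Schrödinger equation (iut + uxx = 0 in d = 1), so both phase speeds c(k) = ω(k)/k depend on k. A trivial sketch:

```python
import numpy as np

k = np.array([0.5, 1.0, 2.0, 3.0])
c_kdv = -(k**3) / k            # c(k) = -k^2: waves move left, faster for large k
c_schrodinger = (k**2) / k     # c(k) = k: faster oscillations travel faster

# Non-constant phase speed is exactly what "dispersive" means here.
assert not np.allclose(c_kdv, c_kdv[0])
assert not np.allclose(c_schrodinger, c_schrodinger[0])
```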
Taking NLS as an example, we first consider the Cauchy problem
iut + ∆u = f (x,t), (6.1)

with u(x, 0) = u0 (x). Using Duhamel's formula, we express the solution u = L(u0 , f ) in terms
of u0 and f , where L is a linear operator. A solution of the NLS equation is then obtained
by solving the fixed-point equation u = L(u0 , σ |u| p−1 u) using Banach’s fixed point theorem.
In order to carry out this procedure, we need to identify a suitable function space in which to
study (6.1). We begin with a short introduction to calculus in Banach spaces.
¹ If \(a(k) = ia_0 + ia_1 k\) with \(a_0, a_1 \in \mathbb{R} \setminus \{0\}\), we obtain the transport equation \(v_t = a_1 \partial_x v\) after replacing u by \(v := e^{-ia_0 t}u\). This equation is therefore not truly dispersive.
² The wave equation is second order in time and doesn't quite fit into the above framework.
Assume that I is a compact interval. One easily verifies that t 7→ ku(t)k is continuous if
u ∈ C(I; X). From this it follows that ku(·)k is bounded. C(I; X) is therefore a normed vector
space with
\[
\|u\|_{C(I;X)} = \sup_{t \in I} \|u(t)\|_X.
\]
The proof that C(I) is a Banach space can be repeated word for word to prove that C(I; X) is
a Banach space. The assumption that X is a Banach space is essential here.
The mean-value theorem is replaced by the following result.
Proposition 6.2. Assume that \(u \in C([a,b];X)\) and that u is differentiable on (a,b). Then
\[
\|u(b) - u(a)\| \le (b-a) \sup_{t\in(a,b)} \|u'(t)\|.
\]
Proof. If \(\sup_{t\in(a,b)}\|u'(t)\| = \infty\) there is nothing to prove, so we can assume that it is finite. Let \(C > \sup_{t\in(a,b)}\|u'(t)\|\). Take \(\alpha, \beta \in (a,b)\) with \(\alpha < \beta\) and let \(I = \{t \in [\alpha,\beta] : \|u(t)-u(\alpha)\| \le C(t-\alpha)\}\). I is non-empty since it contains α. It is also closed since u is continuous. Hence it has a largest element \(\beta_\star\). If \(\beta_\star < \beta\), we obtain that
\[
\begin{aligned}
\|u(\beta_\star + h) - u(\alpha)\| &= \|u(\beta_\star+h) - u(\beta_\star) + u(\beta_\star) - u(\alpha)\| \\
&\le \|u(\beta_\star+h) - u(\beta_\star)\| + C(\beta_\star - \alpha) \\
&\le Ch + C(\beta_\star - \alpha) \\
&= C(\beta_\star + h - \alpha),
\end{aligned}
\]
³ For instance, if \(u'(a) = \lim_{t\to a+} u'(t)\), it follows from Proposition 6.2 applied to \(v(t) = u(t) - u(a) - u'(a)(t-a)\) that \(\|u(a+h) - u(a) - u'(a)h\| = o(h)\) as \(h \to 0+\).
if h > 0 is sufficiently small, since \(\|u(\beta_\star+h) - u(\beta_\star)\| \le h\|u'(\beta_\star)\| + o(h)\) as \(h \to 0\) according to the definition of differentiability and the triangle inequality. It follows that \(\beta_\star + h \in I\), contradicting the maximality of \(\beta_\star\); therefore \(\beta_\star = \beta\). Hence \(\|u(\beta) - u(\alpha)\| \le C(\beta - \alpha)\), and since u is continuous on [a,b] it follows that \(\|u(b) - u(a)\| \le C(b-a)\). As \(C > \sup_{t\in(a,b)}\|u'(t)\|\) is arbitrary, we obtain the desired result.
A step function is a function \(s : I \to X\) of the form
\[
s(t) = s_k, \quad t \in I_k, \tag{6.2}
\]
with \(s_k \in X\), where \(\{I_k\}_{k=1}^N\) is a partition of I into pairwise disjoint intervals (\(I = \cup_{k=1}^N I_k\) and \(I_j \cap I_k = \emptyset\) if \(j \ne k\)). We define
\[
\int_I s(t)\, dt = \sum_{k=1}^N s_k |I_k|,
\]
where \(|I_k|\) is the length of the interval \(I_k\). The representation (6.2) is not unique, but the integral is independent of the particular representation. One easily verifies that the integral is linear and that the triangle inequality \(\|\int_I s(t)\, dt\| \le \int_I \|s(t)\|\, dt\) holds. The integral of a continuous function u is defined by approximating u with step functions. Precisely as for the Riemann integral, one uses the fact that a continuous function on a compact set is uniformly continuous.⁴
Proposition 6.4. Assume that \(f \in C(I;X)\). Then there exists a sequence of step functions \(\{s_n\}\) such that \(s_n(t) \to f(t)\) uniformly on I (\(\sup_{t\in I}\|s_n(t) - f(t)\|_X \to 0\) as \(n \to \infty\)). The sequence \(\{\int_I s_n(t)\, dt\}\) converges in X and the limit, denoted \(\int_I f(t)\, dt\), is the same for any sequence of step functions which converges uniformly to f on I.
Proof. The existence of such a sequence is proved by taking \(s_n(t) = \sum_{1\le k\le N_n} f(t_k)\chi_{I_{n,k}}\) with \(t_k \in I_{n,k}\), for any sequence of partitions \(\{I_{n,k}\}_{k=1}^{N_n}\) of I with \(\max_k |I_{n,k}| \to 0\) as \(n \to \infty\). Since f is uniformly continuous on the compact interval I, it follows that \(s_n \to f\) uniformly. The triangle inequality for step functions then shows that \(\{\int_I s_n(t)\, dt\}\) is a Cauchy sequence in X, and the same argument shows that the limit is independent of the approximating sequence.
We define \(\int_b^a u(t)\, dt := -\int_a^b u(t)\, dt\) when \(a < b\). The proof of the following proposition is left to the reader.
Proposition 6.5. Let X, Y be Banach spaces, \(I = [a,c]\) with \(a < b < c\), \(f, g \in C(I;X)\), \(\alpha, \beta \in \mathbb{C}\) and \(T : X \to Y\) a bounded linear operator. Then
(1) \(\int_I (\alpha f(t) + \beta g(t))\, dt = \alpha \int_I f(t)\, dt + \beta \int_I g(t)\, dt\);
(2) \(\int_I f(t)\, dt = \int_a^b f(t)\, dt + \int_b^c f(t)\, dt\);
(3) \(\|\int_I f(t)\, dt\|_X \le \int_I \|f(t)\|_X\, dt\);
(4) \(T(\int_I f(t)\, dt) = \int_I T f(t)\, dt\).
The usual relation between derivatives and integrals holds.
Proposition 6.6. Let \(u \in C([a,b];X)\). Then \(\int_a^t u(\tau)\, d\tau\) is differentiable on (a,b) with
\[
\frac{d}{dt}\int_a^t u(\tau)\, d\tau = u(t).
\]
Proof. We simply note that
\[
\frac{1}{h}\Bigl(\int_a^{t+h} u(\tau)\, d\tau - \int_a^t u(\tau)\, d\tau\Bigr) - u(t) = \frac{1}{h}\int_t^{t+h} (u(\tau) - u(t))\, d\tau.
\]
If \(\varepsilon > 0\) is given, we can choose \(\delta > 0\) such that \(\|u(\tau) - u(t)\| \le \varepsilon\) if \(|\tau - t| \le \delta\). We then obtain that
\[
\Bigl\|\frac{1}{h}\int_t^{t+h} (u(\tau) - u(t))\, d\tau\Bigr\| \le \frac{1}{h}\int_t^{t+h} \|u(\tau) - u(t)\|\, d\tau \le \varepsilon
\]
for \(0 < h \le \delta\); the case of negative h is handled similarly.
6.2 NLS
The nonlinear Schrödinger equation shows up in many different models for physical phenom-
ena. The main reason is that it appears as a universal model when one looks for solutions of
wave packet type. To explain this we concentrate on one example, namely the KdV equation.
Several other interesting examples are provided by Sulem & Sulem [26].
Example 6.9. We look for an approximate solution of the KdV equation \(u_t + 6uu_x + u_{xxx} = 0\) of the form
\[
u(x,t) = \varepsilon A_1(X,T)e^{i(kx-\omega t)} + \varepsilon \bar A_1(X,T)e^{-i(kx-\omega t)} + \varepsilon^2\bigl(A_0(X,T) + A_2(X,T)e^{2i(kx-\omega t)} + \bar A_2(X,T)e^{-2i(kx-\omega t)}\bigr) + \cdots,
\]
where \(T = \varepsilon^2 t\), \(X = \varepsilon(x + 3k^2 t)\) and \(\omega = -k^3\). The functions \(A_j\) are complex-valued, and the bars indicate complex conjugation. Note that u is therefore real-valued. The parameter ε is assumed to be small, so that u is given to leading order by the wave packet \(\varepsilon A_1(X,T)e^{i(kx-\omega t)} + \varepsilon\bar A_1(X,T)e^{-i(kx-\omega t)}\). Note that this can be rewritten in the form \(2\varepsilon R(X,T)\cos(kx-\omega t + \theta(X,T))\), where \(R = |A_1|\) and \(\theta = \arg A_1\). Since the variable T changes at a slower rate than
X, which in turn changes at a slower rate than kx−ωt, we roughly have an envelope 2εR(X, T )
travelling with group velocity cg = −3k2 and a carrier wave cos(kx − ωt + θ (X, T )) with wave
number k and phase velocity c = −k2 . To see how the NLS equation arises, we substitute the
above Ansatz into the equation. Equating terms of the same order in ε, we arrive at the fol-
lowing equations:
\[
\begin{aligned}
k^2 A_0 &= -2|A_1|^2, \\
\partial_T A_1 &= -3ik\,\partial_X^2 A_1 - 6ik(A_2 \bar A_1 + A_1 A_0), \\
k^2 A_2 &= A_1^2,
\end{aligned}
\]
where two integration constants have been set to 0. When these equations are satisfied, we formally find that \(u_t + 6uu_x + u_{xxx} = O(\varepsilon^4)\). Combining the three equations, we obtain the cubic one-dimensional NLS equation
\[
i\partial_T A_1 - 3k\,\partial_X^2 A_1 + \frac{6}{k} A_1 |A_1|^2 = 0
\]
(the minus sign in front of the second order derivative can be removed by changing T to −T ).
For a rigorous justification of this approximation we refer to Schneider [23].
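The algebra leading from the three amplitude equations to the NLS equation can be verified symbolically. The following sketch is my own check (not part of the notes); it treats \(A_1\), its conjugate and \(\partial_X^2 A_1\) as independent symbols.

```python
import sympy as sp

k = sp.symbols('k', positive=True)
A1, A1c, A1xx = sp.symbols('A1 A1c A1xx')  # A1, its conjugate, d^2 A1/dX^2

A0 = -2 * A1 * A1c / k**2   # from k^2 A0 = -2|A1|^2
A2 = A1**2 / k**2           # from k^2 A2 = A1^2

# Evolution equation: dA1/dT = -3ik A1_XX - 6ik (A2*conj(A1) + A1*A0)
A1T = -3 * sp.I * k * A1xx - 6 * sp.I * k * (A2 * A1c + A1 * A0)

# The cubic NLS  i A1_T - 3k A1_XX + (6/k) A1 |A1|^2  should vanish identically
nls = sp.I * A1T - 3 * k * A1xx + (6 / k) * A1 * (A1 * A1c)
print(sp.expand(nls))   # -> 0
```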
We first consider the linear Cauchy problem
\[
\begin{cases} iu_t + \Delta u = f(x,t), & t > 0, \\ u(x,0) = u_0(x), \end{cases} \tag{6.3}
\]
where f and u_0 are given functions. If \(u_0 \in \mathscr{S}(\mathbb{R}^d)\) and \(f \in C^\infty([0,\infty); \mathscr{S}(\mathbb{R}^d))\) there is a unique solution \(u \in C^\infty([0,\infty); \mathscr{S}(\mathbb{R}^d))\) given by
\[
u(t) = S(t)u_0 - i\int_0^t S(t-\tau) f(\tau)\, d\tau \tag{6.4}
\]
according to Theorem 3.36. Here \(S(t)\varphi = \mathscr{F}^{-1}(e^{-it|\xi|^2}\hat\varphi)\) for \(\varphi \in \mathscr{S}(\mathbb{R}^d)\). Note that we can write the solution of (6.3) with \(f \equiv 0\) as
\[
u(x,t) = \mathscr{F}^{-1}(e^{-it|\xi|^2}\hat u_0) = K_t * u_0,
\]
where \(K_t = (2\pi)^{-d/2}\mathscr{F}^{-1}(e^{-it|\xi|^2}) \in \mathscr{S}'(\mathbb{R}^d)\).
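The propagator S(t) is a Fourier multiplier and can therefore be implemented with the FFT. The sketch below is my own illustration (the grid parameters and the Gaussian datum are arbitrary choices); it checks a one-dimensional implementation against the exact Gaussian solution, which follows from Lemma 6.10 below by analytic continuation.

```python
import numpy as np

N, L = 1024, 40.0
dx = L / N
x = (np.arange(N) - N // 2) * dx             # grid on [-L/2, L/2)
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)     # dual Fourier variable

def S(t, u0):
    """Free Schroedinger propagator u_t = i*u_xx: multiplier exp(-i*xi^2*t)."""
    return np.fft.ifft(np.exp(-1j * xi**2 * t) * np.fft.fft(u0))

t = 0.7
u0 = np.exp(-x**2 / 2).astype(complex)
u = S(t, u0)

# The L^2 norm is conserved since the multiplier has modulus one
mass_err = abs(np.sum(np.abs(u)**2) - np.sum(np.abs(u0)**2)) * dx

# Exact solution for Gaussian data: (1+2it)^(-1/2) exp(-x^2/(2(1+2it)))
exact = np.exp(-x**2 / (2 * (1 + 2j * t))) / np.sqrt(1 + 2j * t)
max_err = np.max(np.abs(u - exact))
print(mass_err, max_err)   # both essentially at machine precision
```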
Lemma 6.10. Let \(z \in \mathbb{C}\) with \(\operatorname{Re} z > 0\). Then
\[
\mathscr{F}(e^{-z|\cdot|^2/2})(\xi) = \frac{1}{z^{d/2}}\, e^{-\frac{|\xi|^2}{2z}}, \tag{6.5}
\]
where \(z^{d/2}\) is defined using the principal branch of the square root if d is odd.
Proof. The assumption \(\operatorname{Re} z > 0\) implies that \(e^{-z|\cdot|^2/2} \in \mathscr{S}(\mathbb{R}^d)\), so that \(\mathscr{F}(e^{-z|\cdot|^2/2}) \in \mathscr{S}(\mathbb{R}^d)\). Moreover, the rapid decay implies that
\[
\mathscr{F}(e^{-z|\cdot|^2/2})(\xi) = (2\pi)^{-d/2}\int_{\mathbb{R}^d} e^{-\frac{z|x|^2}{2} - ix\cdot\xi}\, dx
\]
is complex differentiable as a function of z for each fixed \(\xi \in \mathbb{R}^d\) (the order of complex differentiation and integration can be interchanged). We therefore have that \(z \mapsto \mathscr{F}(e^{-z|\cdot|^2/2})\) is analytic in the right half-plane \(\operatorname{Re} z > 0\). On the other hand, the function \(z^{-d/2}e^{-|\xi|^2/(2z)}\) is
also analytic in the right half-plane and the two functions agree when Im z = 0 (Example 2.7).
The unique continuation principle for analytic functions therefore proves that they are equal
in the whole right half-plane.
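Formula (6.5) is easy to check numerically in one dimension. This is my own sketch (the values of z and ξ are arbitrary), using the Fourier convention \(\mathscr{F}f(\xi) = (2\pi)^{-1/2}\int f(x)e^{-ix\xi}\,dx\) of the notes.

```python
import numpy as np

z, xi = 1.0 + 2.0j, 1.3    # Re z > 0; sample frequency

# Left-hand side of (6.5) by direct quadrature (d = 1)
x = np.linspace(-30.0, 30.0, 200001)
f = np.exp(-z * x**2 / 2)
lhs = (2 * np.pi)**-0.5 * np.sum(f * np.exp(-1j * x * xi)) * (x[1] - x[0])

# Right-hand side: principal branch of the square root
rhs = z**-0.5 * np.exp(-xi**2 / (2 * z))
print(abs(lhs - rhs))   # quadrature error only
```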
Proposition 6.11. The formula (6.5) extends to z ∈ C with Re z = 0 and z 6= 0.
Proof. Let z be as in the statement of the proposition. By the previous lemma we have that
\[
\langle e^{-(z+\varepsilon)|\cdot|^2/2}, \hat\varphi\rangle = \langle \mathscr{F}(e^{-(z+\varepsilon)|\cdot|^2/2}), \varphi\rangle = \langle (z+\varepsilon)^{-d/2} e^{-|\cdot|^2/(2(z+\varepsilon))}, \varphi\rangle
\]
for \(\varepsilon > 0\). Letting \(\varepsilon \to 0+\) we find that
\[
\langle e^{-z|\cdot|^2/2}, \hat\varphi\rangle = \langle z^{-d/2} e^{-|\cdot|^2/(2z)}, \varphi\rangle
\]
for \(t \ne 0\). Indeed, taking \(z = 2it\) in (6.5), extended by Proposition 6.11, we obtain
\[
(2\pi)^{d/2} K_t(x) = \mathscr{F}^{-1}(e^{-it|\cdot|^2})(x) = \mathscr{F}(e^{-2it|\cdot|^2/2})(-x) = \frac{1}{(2it)^{d/2}}\, e^{\frac{i|x|^2}{4t}}.
\]
for all \(\varphi \in \mathscr{S}(\mathbb{R}^d)\). On the other hand, the convolution integral in (6.6) converges and defines a continuous, bounded function of x for each \(t \ne 0\). Since \(u_0(x-y)\varphi(x)\) is in \(L^1(\mathbb{R}^d\times\mathbb{R}^d)\), we can change the order of integration to show that
\[
\begin{aligned}
\Bigl\langle \frac{1}{(4\pi i t)^{d/2}}\int_{\mathbb{R}^d} e^{\frac{i|\cdot - y|^2}{4t}} u_0(y)\, dy, \varphi\Bigr\rangle
&= \Bigl\langle \frac{1}{(4\pi i t)^{d/2}}\int_{\mathbb{R}^d} e^{\frac{i|y|^2}{4t}} u_0(\cdot - y)\, dy, \varphi\Bigr\rangle \\
&= \int_{\mathbb{R}^d} \frac{1}{(4\pi i t)^{d/2}} \int_{\mathbb{R}^d} e^{\frac{i|y|^2}{4t}} u_0(x-y)\, dy\, \varphi(x)\, dx \\
&= \int_{\mathbb{R}^d} \Bigl(\int_{\mathbb{R}^d} u_0(x-y)\varphi(x)\, dx\Bigr) \frac{1}{(4\pi i t)^{d/2}} e^{\frac{i|y|^2}{4t}}\, dy \\
&= \Bigl\langle \frac{1}{(4\pi i t)^{d/2}} e^{\frac{i|\cdot|^2}{4t}}, R(u_0) * \varphi\Bigr\rangle,
\end{aligned}
\]
so that the identity (6.7) actually holds pointwise. This proves (6.6).
Proof. For p = 2 we have q = 2, and the result follows from Parseval's formula since \(e^{-it|\xi|^2}\) has modulus 1. For p = ∞ the result follows from the representation formula (6.6). For \(p \in (2,\infty)\) the result follows from the Riesz–Thorin interpolation theorem (Theorem B.28).
Recall that solutions of the heat equation also have decay properties (see Section 3.1.4).
The decay for the Schrödinger equation is of a different nature. This can be seen by looking at an individual Fourier mode \(\hat u(t)e^{i\xi\cdot x}\). For the heat equation the solution has the form \(\hat u(0)e^{ix\cdot\xi - |\xi|^2 t}\), while the corresponding solution of the Schrödinger equation is \(\hat u(0)e^{ix\cdot\xi - i|\xi|^2 t}\). Thus the individual Fourier modes decay for the heat equation, but not for the Schrödinger equation. In the latter case, the decay is instead related to the fact that different modes travel with different speeds, that is, dispersion. An initially localized solution consists of a continuous linear combination of different Fourier modes. The dispersion of these modes causes the decay.
The solution formula (6.4) can be extended to u0 ∈ S 0 (Rd ) and f ∈ C([0, T ); S 0 (Rd ))
if one interprets the integral in a suitable weak sense. Here we shall content ourselves with
data in Sobolev spaces. We assume that u0 ∈ H s (Rd ) and that f ∈ C([0, T ); H s (Rd )) for some
T > 0. We begin by recording some properties of the propagator S(t).
Proposition 6.14. The family of operators \(\{S(t)\}_{t\in\mathbb{R}}\), defined for \(u \in H^s(\mathbb{R}^d)\) by \(S(t)u = \mathscr{F}^{-1}(e^{-i|\xi|^2 t}\hat u)\), satisfies the following properties:
(1) S(t) is a unitary operator on \(H^s(\mathbb{R}^d)\) for each t;
(2) S(t) commutes with partial derivatives, meaning that \(\partial_j(S(t)u) = S(t)(\partial_j u)\);
(3) S(0) = I;
(4) \(S(t_1 + t_2) = S(t_1)S(t_2)\) for all \(t_1, t_2 \in \mathbb{R}\);
(5) the map \(t \mapsto S(t)\) is strongly continuous, meaning that \(t \mapsto S(t)u\) is in \(C(\mathbb{R}; H^s(\mathbb{R}^d))\) for every \(u \in H^s(\mathbb{R}^d)\);
(6) for \(u \in H^s(\mathbb{R}^d)\) we have that \(t \mapsto S(t)u\) is in \(C^1(\mathbb{R}; H^{s-2}(\mathbb{R}^d))\) with
\[
\frac{d}{dt}(S(t)u) = i\Delta S(t)u.
\]
Proof. We work on the Fourier transform side, where S(t) corresponds to multiplication with \(e^{-i|\xi|^2 t}\). Properties (1)–(4) now follow from the facts that \(e^{-i|\xi|^2 t}\) has modulus 1 and satisfies \(e^{-i|\xi|^2(t_1+t_2)} = e^{-i|\xi|^2 t_1} e^{-i|\xi|^2 t_2}\), while \(\partial_j\) corresponds on the Fourier side to multiplication with \(i\xi_j\). Property (5) is proved by noting that \(t \mapsto e^{-i|\xi|^2 t}\) is continuous for fixed ξ and applying the dominated convergence theorem. Property (6) is also proved using the dominated convergence theorem and the facts that \(\partial_t e^{-i|\xi|^2 t} = -i|\xi|^2 e^{-i|\xi|^2 t}\) and \(\mathscr{F}(\Delta u) = -|\xi|^2 \hat u\).
Properties (3)–(5) are the defining properties of a strongly continuous group of operators
on a normed vector space (it is also assumed that the operators are bounded).
Proposition 6.15. Assume that f ∈ C([0, T ); H s (Rd )) and that u0 ∈ H s (Rd ). There exists a
unique solution u ∈ C([0, T ); H s (Rd )) ∩C1 ([0, T ); H s−2 (Rd )) of (6.3) given by (6.4).
Proof. Writing (6.4) in the form \(u(t) = S(t)u_0 - iS(t)\int_0^t S(-\tau)f(\tau)\, d\tau\) and using Proposition 6.14, it is easy to verify that u defines a solution in \(C([0,T); H^s(\mathbb{R}^d)) \cap C^1([0,T); H^{s-2}(\mathbb{R}^d))\) of (6.3). On the other hand, if v is another solution, then \(w(t) := S(-t)(u(t) - v(t))\) satisfies the differential equation \(w'(t) = 0\) with \(w(0) = 0\), so \(w(t) = 0\) for all t by Corollary 6.3. Hence \(u(t) \equiv v(t)\).
We now turn to the Cauchy problem for the nonlinear Schrödinger equation
\[
iu_t + \Delta u = \sigma|u|^{p-1}u, \tag{6.8}
\]
where σ = ±1 and p ≥ 3 is an odd integer. The two different cases, σ = −1 and σ = 1, are referred to as focusing and defocusing, respectively. The distinction between these two cases will become clear later on.
We assume that \(u_0 \in H^s(\mathbb{R}^d)\) for some s > d/2, and we look for a solution \(u \in C([0,T); H^s(\mathbb{R}^d)) \cap C^1([0,T); H^{s-2}(\mathbb{R}^d))\). Repeated use of Theorem 5.39 shows that \(|u|^{p-1}u = u^{(p+1)/2}\bar u^{(p-1)/2} \in H^s(\mathbb{R}^d)\) if \(u \in H^s(\mathbb{R}^d)\), where we have also used the fact that \(\bar u \in H^s(\mathbb{R}^d)\).
Lemma 6.16. Let s > d/2 and \(f(z) = \sigma|z|^{p-1}z\). The map \(u \mapsto f(u)\) is locally Lipschitz continuous on \(H^s(\mathbb{R}^d)\), meaning that for each R > 0 there exists a constant C such that \(\|f(u) - f(v)\|_{H^s(\mathbb{R}^d)} \le C\|u-v\|_{H^s(\mathbb{R}^d)}\) whenever \(u, v \in B_R(0) \subset H^s(\mathbb{R}^d)\).
Proof. Since \(f(z)\) is a polynomial in z and \(\bar z\), we can find two polynomials \(P_1(z_1,z_2,z_3,z_4)\) and \(P_2(z_1,z_2,z_3,z_4)\) such that
\[
\|f(u) - f(v)\|_{H^s(\mathbb{R}^d)} \le C\bigl(\|P_1(u,v,\bar u,\bar v)\|_{H^s(\mathbb{R}^d)} + \|P_2(u,v,\bar u,\bar v)\|_{H^s(\mathbb{R}^d)}\bigr)\|u-v\|_{H^s(\mathbb{R}^d)}.
\]
Using Theorem 5.39 we find that \(\|P_j(u,v,\bar u,\bar v)\|_{H^s(\mathbb{R}^d)}\), j = 1, 2, is bounded by a constant (depending on R) when \(\|u\|_{H^s(\mathbb{R}^d)}, \|v\|_{H^s(\mathbb{R}^d)} \le R\).
It follows from Proposition 6.15 that a strong solution u is also in C1 ([0, T ); H s−2 (Rd ))
and solves (6.8) for each t ∈ (0, T ) with u(x, 0) = u0 (x). Just as for finite-dimensional ODE it
is simpler to work with the integral equation (6.10) than the original problem. Note however
that (6.10) differs from the usual integral equation
\[
u(t) = u_0 + i\int_0^t \bigl(\Delta u(\tau) - f(u(\tau))\bigr)\, d\tau, \tag{6.11}
\]
which we would obtain if we were to treat the problem as an infinite-dimensional ODE. The
advantage with (6.10) is that the right hand side defines a continuous map on C([0, T ); H s (Rd ))
for s > d/2; this is not the case for (6.11).
Theorem 6.18. Let s > d/2. For any u0 ∈ H s (Rd ), there exists a time T1 = T1 (ku0 kH s (Rd ) ) > 0
such that the Cauchy problem (6.9) has a unique strong solution u ∈ C([0, T1 ]; H s (Rd )).
Proof. The strategy is to prove that the nonlinear operator defined by the right hand side of
(6.10) is a contraction if the time interval is sufficiently small. We choose X as the Banach
space
X = C([0, T1 ]; H s (Rd ))
where T1 is to be determined and let
K = {u ∈ X : kukX ≤ 2R},
where ku0 kH s (Rd ) < R. K is a closed subset of the Banach space X and hence a complete metric
space. We need to prove that \(F(K) \subseteq K\) and that \(\|F(u) - F(v)\|_X \le \theta\|u-v\|_X\) for some θ ∈ (0,1), where F(u) denotes the right-hand side of (6.10). Note that \(\|S(\cdot)u_0\|_X = \|u_0\|_{H^s(\mathbb{R}^d)} < R\).
Using Lemma 6.16 we find that there exists a constant C = C(R) > 0 such that
\[
\|F(u)(t) - F(v)(t)\|_{H^s(\mathbb{R}^d)} \le \int_0^t \|f(u(\tau)) - f(v(\tau))\|_{H^s(\mathbb{R}^d)}\, d\tau \le C T_1 \|u - v\|_X
\]
for \(u, v \in K\) and \(0 \le t \le T_1\). Taking \(T_1 = 1/(2C)\), the contraction property follows with θ = 1/2: \(\|F(u) - F(v)\|_X \le \frac12\|u-v\|_X\). Taking v = 0, it follows that \(\|F(u)\|_X \le \|F(0)\|_X + \frac12\|u\|_X = \|u_0\|_{H^s(\mathbb{R}^d)} + \frac12\|u\|_X \le 2R\), so that \(F(K) \subseteq K\). From Banach's fixed point theorem (Theorem A.15) we therefore find a unique strong solution \(u \in K\). It remains to prove that the solution is unique in X.
This follows if we can prove that any solution in X is in fact in K. To see this, we estimate \(\|f(u)\|_{H^s(\mathbb{R}^d)} = \|f(u) - f(0)\|_{H^s(\mathbb{R}^d)} \le C\|u\|_{H^s(\mathbb{R}^d)}\) and
\[
\|F(u)(t)\|_{H^s(\mathbb{R}^d)} \le \|u_0\|_{H^s(\mathbb{R}^d)} + \int_0^t \|f(u(\tau))\|_{H^s(\mathbb{R}^d)}\, d\tau \le R + 2RCt = (1 + t/T_1)R,
\]
as long as u(τ) remains in the ball of radius 2R around the origin in \(H^s(\mathbb{R}^d)\). But this implies that \(\|u(t)\|_{H^s(\mathbb{R}^d)} = \|F(u)(t)\|_{H^s(\mathbb{R}^d)} \le 2R\) when \(0 \le t \le T_1\), since \(\|u(t)\|_{H^s(\mathbb{R}^d)}\) is continuous with \(\|u(0)\|_{H^s(\mathbb{R}^d)} < R\). In other words, \(u \in K\). The proof is complete.
Note that T1 only depends on the Lipschitz constant C in Lemma 6.16. Since the optimal
constant is an increasing function of R, we can assume that T1 is a decreasing function of
ku0 kH s (Rd ) .
There is nothing special about the point t = 0. If the initial condition is prescribed at some
point t0 we instead obtain the integral equation
\[
u(t) = S(t-t_0)u_0 - i\int_{t_0}^t S(t-\tau) f(u(\tau))\, d\tau.
\]
Suppose that u is a strong solution on [0, T1 ] and that v is a strong solution on [T1 , T2 ] with
initial data v0 = u(T1 ) at t = T1 . Set w(t) = u(t) on [0, T1 ] and w(t) = v(t) on [T1 , T2 ], so that
w ∈ C([0, T2 ]; H s (Rd )). Clearly w is a strong solution on [0, T1 ]. It is also a strong solution on
[0, T2 ] since
\[
\begin{aligned}
S(t-T_1)u(T_1) - i\int_{T_1}^t S(t-\tau)f(v(\tau))\, d\tau
&= S(t-T_1)S(T_1)u_0 - i\int_0^{T_1} S(t-T_1)S(T_1-\tau)f(u(\tau))\, d\tau - i\int_{T_1}^t S(t-\tau)f(v(\tau))\, d\tau \\
&= S(t)u_0 - i\int_0^t S(t-\tau)f(w(\tau))\, d\tau
\end{aligned}
\]
for \(t \in [T_1, T_2]\).
The solution also depends continuously on the initial data: we have
\[
\|u(t) - v(t)\|_{H^s(\mathbb{R}^d)} \le \|u_0 - v_0\|_{H^s(\mathbb{R}^d)} + C\int_0^t \|u(\tau) - v(\tau)\|_{H^s(\mathbb{R}^d)}\, d\tau
\]
if \(u, v \in C([0,T]; H^s(\mathbb{R}^d))\) are the solutions corresponding to the initial data \(u_0\) and \(v_0\) (T is chosen strictly less than the maximal existence time for each solution), \(\|u(t)\|_{H^s(\mathbb{R}^d)}, \|v(t)\|_{H^s(\mathbb{R}^d)} \le R\) on [0,T] and C is a Lipschitz constant for f(u) on \(\{u \in H^s(\mathbb{R}^d) : \|u\|_{H^s(\mathbb{R}^d)} \le R\}\). It follows from Grönwall's inequality that
\[
\|u(t) - v(t)\|_{H^s(\mathbb{R}^d)} \le \|u_0 - v_0\|_{H^s(\mathbb{R}^d)}\, e^{Ct}, \quad 0 \le t \le T.
\]
In conclusion, we have the following result.
Theorem 6.19. There exists a maximal time T > 0 such that the Cauchy problem (6.9) has a
strong solution in C([0, T ); H s (Rd )). The solution is unique in this class and if T < ∞ then
limt→T ku(t)kH s (Rd ) = ∞. The solution depends continuously on the initial data in the sense
described above.
\[
\|uv\|_{H^s(\mathbb{R})} \le C(\|u\|_{H^s(\mathbb{R})}\|v\|_{H^r(\mathbb{R})} + \|u\|_{H^r(\mathbb{R})}\|v\|_{H^s(\mathbb{R})}).
\]
⁵ They do play a role if one considers initial data with lower regularity.
\[
\|uv\|_{H^s(\mathbb{R})} \le C(\|u\|_{H^s(\mathbb{R})}\|\hat v\|_{L^1(\mathbb{R})} + \|\hat u\|_{L^1(\mathbb{R})}\|v\|_{H^s(\mathbb{R})}).
\]
On the other hand, \(\|\hat u\|_{L^1(\mathbb{R})} \le C\|u\|_{H^r(\mathbb{R})}\) by the same proof as before. The estimate
\[
\|uv\|_{H^s(\mathbb{R}^d)} \le C(\|u\|_{H^s(\mathbb{R}^d)}\|v\|_{L^\infty(\mathbb{R}^d)} + \|u\|_{L^\infty(\mathbb{R}^d)}\|v\|_{H^s(\mathbb{R}^d)})
\]
in fact holds for s ≥ 0, but the proof requires more sophisticated methods (see Tao [27]).
Theorem 6.22 (Persistence of regularity for NLS). Let \(u_0 \in H^s(\mathbb{R})\), where 1/2 < r < s, and let \(T_s(u_0)\) and \(T_r(u_0)\) denote the maximal existence times of the corresponding strong solutions in \(H^s(\mathbb{R})\) and \(H^r(\mathbb{R})\). Then \(T_s(u_0) = T_r(u_0)\).
Proof. Let \(T < T_r(u_0)\), so that \(C := \sup_{t\in[0,T]}\|u(t)\|_{H^r(\mathbb{R})} < \infty\). Repeated application of Lemma 6.20 shows that
\[
\begin{aligned}
\|u(t)\|_{H^s(\mathbb{R})} &\le \|u_0\|_{H^s(\mathbb{R})} + \int_0^t \|f(u(\tau))\|_{H^s(\mathbb{R})}\, d\tau \\
&\le \|u_0\|_{H^s(\mathbb{R})} + K\int_0^t \|u(\tau)\|_{H^r(\mathbb{R})}^{p-1} \|u(\tau)\|_{H^s(\mathbb{R})}\, d\tau \\
&\le \|u_0\|_{H^s(\mathbb{R})} + KC^{p-1}\int_0^t \|u(\tau)\|_{H^s(\mathbb{R})}\, d\tau.
\end{aligned}
\]
Grönwall's inequality therefore shows that \(\|u(t)\|_{H^s(\mathbb{R})}\) remains bounded on [0,T], so that \(T \le T_s(u_0) \le T_r(u_0)\). Since \(T < T_r(u_0)\) is arbitrary, we obtain that \(T_s(u_0) = T_r(u_0)\).
Theorem 6.23. Let \(u \in C([0,T); H^1(\mathbb{R}))\) be a strong solution of the NLS equation. Then the quantities
\[
M(u(t)) = \frac12\int_{\mathbb{R}} |u|^2\, dx \quad \text{and} \quad E(u(t)) = \frac12\int_{\mathbb{R}} |u_x|^2\, dx + \frac{\sigma}{p+1}\int_{\mathbb{R}} |u|^{p+1}\, dx
\]
are independent of t.
Proof. First assume that \(u_0 \in H^3(\mathbb{R})\) and let \(u \in C([0,T); H^3(\mathbb{R})) \cap C^1([0,T); H^1(\mathbb{R}))\) be the corresponding solution. We have that
\[
\frac{d}{dt}M(u(t)) = \operatorname{Re}\int_{\mathbb{R}} u_t \bar u\, dx = \operatorname{Re}\int_{\mathbb{R}} i(u_{xx} - \sigma|u|^{p-1}u)\bar u\, dx = -\operatorname{Re}\, i\int_{\mathbb{R}} \bigl(|u_x|^2 + \sigma|u|^{p+1}\bigr)\, dx = 0
\]
(note that \(u \in \dot C^2(\mathbb{R})\), so that the integration by parts is justified). Similarly,
\[
\frac{d}{dt}E(u(t)) = \operatorname{Re}\int_{\mathbb{R}} \bigl(u_{xt}\bar u_x + \sigma|u|^{p-1}u_t\bar u\bigr)\, dx = \operatorname{Re}\int_{\mathbb{R}} u_t\,\overline{(-u_{xx} + \sigma|u|^{p-1}u)}\, dx = -\operatorname{Re}\, i\int_{\mathbb{R}} |u_t|^2\, dx = 0,
\]
where we have integrated by parts and used the equation in the form \(iu_t = -u_{xx} + \sigma|u|^{p-1}u\).
For general \(u_0 \in H^1(\mathbb{R})\), we choose a sequence \(u_{0,n} \in H^3(\mathbb{R})\) with \(u_{0,n} \to u_0\) in \(H^1(\mathbb{R})\) and let \(u_n\) be the corresponding solutions given by Theorem 6.18, which converge to u in \(C([0,T]; H^1(\mathbb{R}))\) by continuous dependence. By the above reasoning we have that \(E(u_n(t)) = E(u_{0,n})\) and \(M(u_n(t)) = M(u_{0,n})\) for all \(t \in [0,T]\). Since the maps \(u \mapsto E(u)\) and \(u \mapsto M(u)\) are continuous from \(H^1(\mathbb{R})\) to \(\mathbb{R}\), it follows that
\[
E(u(t)) = E(u_0), \quad M(u(t)) = M(u_0).
\]
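The conservation laws can be observed numerically. The sketch below is my own illustration: it evolves the defocusing cubic NLS with a Strang split-step scheme, using the standard expressions \(M(u) = \frac12\int|u|^2\,dx\) and \(E(u) = \frac12\int|u_x|^2\,dx + \frac{\sigma}{4}\int|u|^4\,dx\) for p = 3; the grid, step size and Gaussian datum are arbitrary choices.

```python
import numpy as np

sigma = 1.0                       # defocusing cubic NLS: i u_t + u_xx = |u|^2 u
N, L, dt, steps = 512, 40.0, 1e-3, 2000
dx = L / N
x = (np.arange(N) - N // 2) * dx
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)

def functionals(u):
    ux = np.fft.ifft(1j * xi * np.fft.fft(u))   # spectral derivative
    M = 0.5 * np.sum(np.abs(u)**2) * dx
    E = 0.5 * np.sum(np.abs(ux)**2) * dx + sigma / 4 * np.sum(np.abs(u)**4) * dx
    return M, E

u = np.exp(-x**2).astype(complex)
M0, E0 = functionals(u)
half = np.exp(-1j * xi**2 * dt / 2)                  # half step of u_t = i u_xx
for _ in range(steps):
    u = np.fft.ifft(half * np.fft.fft(u))
    u = u * np.exp(-1j * sigma * np.abs(u)**2 * dt)  # exact nonlinear substep
    u = np.fft.ifft(half * np.fft.fft(u))
M1, E1 = functionals(u)
print(abs(M1 - M0), abs(E1 - E0))   # M exact up to roundoff, E accurate to O(dt^2)
```

Both substeps are pointwise modulus-preserving in the appropriate variable, which is why M is conserved exactly by the scheme while E is only conserved up to the splitting error.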
Theorem 6.24. If σ = 1 and u0 ∈ H s (R), s ≥ 1, the strong solution of (6.9) in H s (R) exists
for all t ≥ 0.
Proof. This follows from Theorems 6.19 and 6.23 and the inequality \(\|u\|_{H^1(\mathbb{R})}^2 \le 2(E(u) + M(u))\), which holds when σ = 1.
In the focusing case, we generally don’t have global existence. The problem is that one
generally can’t control the H 1 norm using M and E since the second term in E is negative. An
exception is when the nonlinearity is cubic.
Theorem 6.25. If σ = −1, p = 3 and u0 ∈ H s (R), s ≥ 1, the strong solution of (6.9) in H s (R)
exists for all t ≥ 0.
Proof. For \(u \in C_0^\infty(\mathbb{R})\) we have
\[
|u(x)|^2 = 2\operatorname{Re}\int_{-\infty}^x u_x(y)\bar u(y)\, dy \le 2\|u\|_{L^2(\mathbb{R})}\|u_x\|_{L^2(\mathbb{R})},
\]
yielding
\[
\|u\|_{L^\infty(\mathbb{R})}^2 \le 2\|u\|_{L^2(\mathbb{R})}\|u_x\|_{L^2(\mathbb{R})}.
\]
The result now follows by using the fact that \(C_0^\infty(\mathbb{R})\) is dense in \(H^1(\mathbb{R})\).
If p = 3 this gives
\[
\frac14\int_{\mathbb{R}} |u|^4\, dx \le \frac12 \|u\|_{L^2(\mathbb{R})}^3 \|u_x\|_{L^2(\mathbb{R})} \le \frac14\|u_x\|_{L^2}^2 + \frac14\|u\|_{L^2}^6
\]
by the arithmetic–geometric inequality, so that
\[
E(u) = \frac12\|u_x\|_{L^2(\mathbb{R})}^2 - \frac14\int_{\mathbb{R}} |u|^4\, dx \ge \frac14\|u_x\|_{L^2(\mathbb{R})}^2 - 2M(u)^3.
\]
We can therefore bound \(\|u\|_{H^1(\mathbb{R})}^2 \le 4(E(u) + M(u)) + 8M(u)^3\). The conservation of E(u) and M(u) now implies that \(\|u(t)\|_{H^1}\) is bounded.
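A symbolic check (my own addition; this solitary wave is not displayed in the surrounding text, but it is the standard one) shows why the focusing cubic case is special: \(u(x,t) = \sqrt2\,\mathrm{sech}(x)\,e^{it}\) solves \(iu_t + u_{xx} + |u|^2 u = 0\) exactly, so the global H¹ bound above coexists with genuine nonlinear structure.

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)

# Solitary wave of the focusing cubic NLS  i*u_t + u_xx + |u|^2 u = 0
# (sigma = -1, p = 3). Since |exp(i*t)| = 1, |u|^2 = 2*sech(x)^2.
u = sp.sqrt(2) / sp.cosh(x) * sp.exp(sp.I * t)
mod_sq = 2 / sp.cosh(x)**2

residual = sp.I * sp.diff(u, t) + sp.diff(u, x, 2) + mod_sq * u
print(sp.simplify(residual))   # -> 0
```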
We conclude by showing a blow-up result. For this we need the additional result that if the
initial data satisfies xu0 ∈ L2 (R), then xu ∈ L2 (R) for all t for which the solution is defined.
This can e.g. be shown by proving a well-posedness result in the space
\[
H^{1,1}(\mathbb{R}) = \{u \in H^1(\mathbb{R}) : xu \in L^2(\mathbb{R})\}
\]
with norm \(\|u\|_{H^{1,1}(\mathbb{R})} = (\|u\|_{H^1(\mathbb{R})}^2 + \|xu\|_{L^2(\mathbb{R})}^2)^{1/2}\) (see Tao [27]). We also need the following result, which can formally be verified by carrying out the differentiation and integrating
by parts. The rigorous justification relies on an approximation argument, as in the proof of
Theorem 6.23.
Theorem 6.28. Suppose that σ = −1 and p ≥ 5, and consider \(u_0 \in H^1(\mathbb{R})\) for which \(V(u_0) := \int_{\mathbb{R}} x^2|u_0(x)|^2\, dx\) is finite. If
\[
E(u_0) < 0,
\]
the solution with initial data \(u_0\) has finite existence time.
Proof. Abbreviate V(t) = V(u(t)) and \(E = E(u(t)) = E(u_0)\). If p ≥ 5 and σ = −1, we obtain that \(V''(t) \le 16E\). Thus,
\[
V(t) \le V(0) + V'(0)t + 8Et^2,
\]
with E negative by assumption. Hence T < ∞, since we would otherwise obtain the contradiction V(t) < 0 for t large.
Remark 6.29. Note that E(λ u0 ) < 0 if λ > 0 is sufficiently large for a fixed function u0 ∈
H 1,1 (R), u0 6= 0.
6.3 KdV
As mentioned in the introduction, the KdV equation was introduced in an attempt to explain
the solitary water wave observed by John Scott Russell. The equation can be derived as an
approximation of the equations of fluid mechanics. Since the derivation is quite long it has
been placed at the end of the section.
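The solitary wave mentioned above can be written down explicitly. The following symbolic check is my own addition (the formula is the standard KdV one-soliton, not derived in this excerpt): \(u(x,t) = \frac{c}{2}\,\mathrm{sech}^2\bigl(\frac{\sqrt c}{2}(x - ct)\bigr)\) solves the KdV equation for every speed c > 0.

```python
import sympy as sp

x, t, c = sp.symbols('x t c', positive=True)

# One-soliton of KdV  u_t + 6 u u_x + u_xxx = 0, travelling with speed c
u = c / 2 / sp.cosh(sp.sqrt(c) / 2 * (x - c * t))**2

residual = sp.diff(u, t) + 6 * u * sp.diff(u, x) + sp.diff(u, x, 3)

# Evaluate the residual at a sample point; it vanishes up to roundoff
print(residual.subs({x: sp.Rational(3, 10), t: sp.Rational(1, 5), c: 2}).evalf())
```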
We first consider the linear Cauchy problem
\[
\begin{cases} u_t + u_{xxx} = f(x,t), & t > 0, \\ u = u_0, & t = 0, \end{cases} \tag{6.12}
\]
where f and u_0 are given. If \(u_0 \in \mathscr{S}(\mathbb{R})\) and \(f \in C^\infty([0,\infty); \mathscr{S}(\mathbb{R}))\) there is a unique solution \(u \in C^\infty([0,\infty); \mathscr{S}(\mathbb{R}))\) given by
\[
u(t) = S(t)u_0 + \int_0^t S(t-\tau) f(\tau)\, d\tau, \tag{6.13}
\]
where \(S(t)\varphi = \mathscr{F}^{-1}(e^{it\xi^3}\hat\varphi)\) for \(\varphi \in \mathscr{S}(\mathbb{R})\). Just as for the NLS equation, this solution formula can be extended to \(u_0 \in H^s(\mathbb{R})\) and \(f \in C([0,T); H^s(\mathbb{R}))\). Using the same arguments as in the proofs of Propositions 6.14 and 6.15, we obtain the following results.
Proposition 6.30. The family of operators \(\{S(t)\}_{t\in\mathbb{R}}\), defined for \(u \in H^s(\mathbb{R})\) by \(S(t)u = \mathscr{F}^{-1}(e^{i\xi^3 t}\hat u)\), is a strongly continuous group of unitary operators on \(H^s(\mathbb{R})\). The operator S(t) commutes with \(\partial_x\) for each t. If \(u \in H^s(\mathbb{R})\), then \(t \mapsto S(t)u\) is in \(C^1(\mathbb{R}; H^{s-3}(\mathbb{R}))\) with
\[
\frac{d}{dt}(S(t)u) = -\partial_x^3 S(t)u.
\]
Proposition 6.31. Assume that f ∈ C([0, T ); H s (R)) and that u0 ∈ H s (R). There exists a
unique solution u ∈ C([0, T ); H s (R)) ∩C1 ([0, T ); H s−3 (R)) of (6.12) given by (6.13).
Just as in the case of the Schrödinger equation, the solution can be expressed as a convo-
lution of u0 with a function Kt (x) if f ≡ 0 and u0 is sufficiently smooth (e.g. u0 ∈ S (R)).
Define the tempered distribution \(\operatorname{Ai} \in \mathscr{S}'(\mathbb{R})\) through \(\operatorname{Ai} = (2\pi)^{-1/2}\mathscr{F}^{-1}(e^{i\xi^3/3})\). We then have
\[
u = K_t * u_0
\]
in the sense of distributions, where \(K_t\) is the tempered distribution defined by
\[
K_t(x) = (3t)^{-1/3}\operatorname{Ai}\bigl((3t)^{-1/3}x\bigr).
\]
Here we have abused notation slightly by writing Ai(λ x) instead of σλ Ai. This abuse of
notation is justified since Ai is in fact given by a bounded, smooth function.
Lemma 6.32. The tempered distribution Ai is given by the smooth function defined by the path integral
\[
\operatorname{Ai}(x) = \frac{1}{2\pi}\int_{\operatorname{Im}\zeta = \eta} e^{i\zeta^3/3 + ix\zeta}\, d\zeta, \tag{6.14}
\]
where η > 0 is arbitrary.
Proof. Writing \(\zeta = \xi + i\eta\), we have that \(\operatorname{Re}(i\zeta^3/3 + ix\zeta) = -\xi^2\eta + \eta^3/3 - x\eta\), so that the integral in (6.14) converges for each η > 0. Moreover, the function \(\zeta \mapsto e^{i\zeta^3/3+ix\zeta}\) is analytic for fixed x, showing that the integral is independent of η. Since the integrand is smooth (in fact, real analytic) in x and all the derivatives decay rapidly as \(|\operatorname{Re}\zeta| \to \infty\), (6.14) defines a smooth function of x. It remains to show that the distribution defined by this function coincides with
Ai. Changing the order of integration, we find that
\[
\Bigl\langle \frac{1}{2\pi}\int_{\operatorname{Im}\zeta=\eta} e^{i\zeta^3/3+ix\zeta}\, d\zeta, \varphi\Bigr\rangle
= \frac{1}{2\pi}\int_{\mathbb{R}}\int_{\mathbb{R}} e^{i(\xi+i\eta)^3/3 + ix(\xi+i\eta)}\varphi(x)\, dx\, d\xi
\to \frac{1}{2\pi}\int_{\mathbb{R}}\int_{\mathbb{R}} e^{i\xi^3/3+ix\xi}\varphi(x)\, dx\, d\xi
= \langle (2\pi)^{-1/2} e^{i\xi^3/3}, \mathscr{F}^{-1}(\varphi)\rangle = \langle \operatorname{Ai}, \varphi\rangle
\]
as \(\eta \to 0+\). Since (6.14) is independent of η, the result follows.
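Formula (6.14) is directly computable. The sketch below is my own illustration (η, the truncation length and the grid size are arbitrary choices); it evaluates the contour integral by quadrature and compares with SciPy's Airy function, which uses the same normalization.

```python
import numpy as np
from scipy.special import airy

def Ai_contour(x, eta=1.0, L=10.0, n=20001):
    """Evaluate (6.14) along Im(zeta) = eta; the integrand decays like exp(-xi^2*eta)."""
    xi = np.linspace(-L, L, n)
    zeta = xi + 1j * eta
    vals = np.exp(1j * zeta**3 / 3 + 1j * x * zeta)
    return ((xi[1] - xi[0]) * np.sum(vals)).real / (2 * np.pi)

for x in [-2.0, 0.0, 2.0]:
    print(x, Ai_contour(x), airy(x)[0])   # the two columns agree
```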
From (6.14) one also sees that
\[
\operatorname{Ai}''(x) - x\operatorname{Ai}(x) = 0, \tag{6.15}
\]
since
\[
\operatorname{Ai}''(x) - x\operatorname{Ai}(x) = -\frac{1}{2\pi}\int_{\operatorname{Im}\zeta=\eta} (\zeta^2 + x)\,e^{i\zeta^3/3+ix\zeta}\, d\zeta
= -\frac{1}{2\pi i}\lim_{R\to\infty}\Bigl[e^{i\zeta^3/3+ix\zeta}\Bigr]_{\zeta=-R+i\eta}^{\zeta=R+i\eta} = 0.
\]
Equation (6.15) is called Airy's equation.
Proposition 6.33. \(\operatorname{Ai}(x) = O(|x|^{-1/4})\) as \(x \to -\infty\), while \(\operatorname{Ai}(x) = O(x^{-1/4}e^{-2x^{3/2}/3})\) as \(x \to \infty\).
Proof. For x > 0 we may take \(\eta = x^{1/2}\) in (6.14), and thus
\[
\operatorname{Ai}(x) = \frac{e^{-2x^{3/2}/3}}{2\pi}\int_{\mathbb{R}} e^{-\sqrt{x}\,\xi^2 + i\xi^3/3}\, d\xi. \tag{6.16}
\]
We can estimate
\[
\Bigl|\int_{\mathbb{R}} e^{-\sqrt{x}\,\xi^2 + i\xi^3/3}\, d\xi\Bigr| \le \int_{\mathbb{R}} e^{-\sqrt{x}\,\xi^2}\, d\xi = \frac{\sqrt{\pi}}{x^{1/4}},
\]
yielding
\[
|\operatorname{Ai}(x)| \le \frac{e^{-2x^{3/2}/3}}{2\sqrt{\pi}\,x^{1/4}}, \quad x > 0. \tag{6.17}
\]
One has to work a bit harder to find the asymptotic behaviour when \(x \to -\infty\). Note first that (6.16) continues to hold when \(|\arg x| < \pi\). So does the estimate (6.17), if one replaces \(x^{3/2}\) by \(\operatorname{Re} x^{3/2}\) and \(x^{1/4}\) by \((\operatorname{Re} x^{1/2})^{1/2}\) in the right-hand side. Next note that \(\operatorname{Ai}(\omega x)\) is also a solution of (6.15) if ω is a cubic root of unity, that is, if \(\omega = \omega_k\), where \(\omega_k = e^{2\pi ki/3}\),
k = 0, 1, 2. We claim that any two of these solutions are linearly independent. This follows since
\[
\operatorname{Ai}(0) = \frac{1}{2\pi}\int_{\mathbb{R}+i} e^{i\zeta^3/3}\, d\zeta = \frac{1}{\pi}\operatorname{Re}\, e^{i\pi/6}\int_0^\infty e^{-t^3/3}\, dt = \frac{3^{-1/6}\Gamma(1/3)}{2\pi},
\]
\[
\operatorname{Ai}'(0) = \frac{1}{2\pi}\int_{\mathbb{R}+i} i\zeta\, e^{i\zeta^3/3}\, d\zeta = -\frac{1}{\pi}\operatorname{Im}\, e^{i\pi/3}\int_0^\infty t\, e^{-t^3/3}\, dt = -\frac{3^{1/6}\Gamma(2/3)}{2\pi},
\]
where we have deformed the path \(\mathbb{R}+i\) into \(\{e^{-i\pi/6}t : -\infty < t \le 0\} \cup \{e^{i\pi/6}t : 0 \le t < \infty\}\). Hence \(\operatorname{Ai}(\omega x)\) has the value Ai(0) at 0, while the derivative is \(\omega\operatorname{Ai}'(0)\), confirming the linear
independence. We also have
\[
\sum_{k=0}^{2} \omega_k \operatorname{Ai}(\omega_k x) = 0,
\]
since the value of this solution of the Airy equation at 0 is \(\operatorname{Ai}(0)\sum_{k=0}^{2}\omega_k = 0\) and its derivative is \(\operatorname{Ai}'(0)\sum_{k=0}^{2}\omega_k^2 = 0\). Expressing \(\operatorname{Ai}(-r) = -\omega_1\operatorname{Ai}(-\omega_1 r) - \omega_2\operatorname{Ai}(-\omega_2 r)\), r > 0, with \(\arg(-\omega_1 r) = -\pi/3\) and \(\arg(-\omega_2 r) = \pi/3\), we therefore find that \(\operatorname{Ai}(-r) = O(r^{-1/4})\) as \(r \to \infty\).
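The two special values above can be cross-checked against SciPy (my own verification, not part of the notes):

```python
import numpy as np
from scipy.special import airy, gamma

Ai0, Aip0 = airy(0.0)[0], airy(0.0)[1]   # Ai(0) and Ai'(0)
print(abs(Ai0 - 3**(-1 / 6) * gamma(1 / 3) / (2 * np.pi)))   # ~ 0
print(abs(Aip0 + 3**(1 / 6) * gamma(2 / 3) / (2 * np.pi)))   # ~ 0
```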
Remark 6.34. With a little more work one can also prove that \(\operatorname{Ai}(-r) = \pi^{-1/2}r^{-1/4}\bigl(\sin(2r^{3/2}/3 + \pi/4) + O(r^{-3/2})\bigr)\) as \(r \to \infty\), which shows that the Airy function oscillates rapidly for large negative values of x. The treatment of the Airy function given here follows the approach in Hörmander [12].
We now turn to the Cauchy problem
\[
\begin{cases} u_t + 6uu_x + u_{xxx} = 0, & t > 0, \\ u = u_0, & t = 0, \end{cases} \tag{6.18}
\]
where u is assumed to be real-valued. One could consider more general nonlinearities of the form \(u^k u_x\), where k ≥ 1 is an integer, but we shall concentrate on the case k = 1 for simplicity. The local results hold for any power of the nonlinearity, while the global results depend heavily on k.
Local well-posedness
The Cauchy problem for KdV is harder than for NLS, the reason being that the nonlinearity
contains derivatives of u. We begin by stating the main result. The proof is conceptually easy
but contains some technical details.
Theorem 6.36. Let u0 ∈ H k (R), k ≥ 2. There exists a maximal time T > 0 such that the
Cauchy problem (6.18) has a strong solution in C([0, T ); H k (R)). The solution is unique in
this class and if T < ∞, then limt→T ku(t)kH k (R) = ∞. The solution depends continuously on
the initial data.
Remark 6.37. The reason for using Sobolev spaces of integer order is to simplify the proof.
Using the same methods one can prove local well-posedness in H s (R) with s > 3/2.
Lemma 6.38. Let u, v ∈ C([0, T ]; H 2 (R)) ∩ C1 ([0, T ]; H −1 (R)) be solutions of (6.18) with
initial data u0 ∈ H 2 (R). Then u = v.
Proof. Since \(w := u - v\) satisfies the equation \(w_t + 3((u+v)w)_x + w_{xxx} = 0\), it follows that
\[
\frac{d}{dt}\|w(t)\|_{L^2(\mathbb{R})}^2 = 2\langle w_{xxx}(t) - 3((u(t)+v(t))w(t))_x, w(t)\rangle = 6\int_{\mathbb{R}} w_x w(u+v)\, dx = -3\int_{\mathbb{R}} w^2 (u+v)_x\, dx \le C\|w(t)\|_{L^2(\mathbb{R})}^2,
\]
where \(C = 3\sup_{0\le t\le T}(\|u_x(\cdot,t)\|_{L^\infty(\mathbb{R})} + \|v_x(\cdot,t)\|_{L^\infty(\mathbb{R})}) < \infty\) due to the Sobolev embedding \(H^2(\mathbb{R}) \subset C_b^1(\mathbb{R})\). Note that we have to interpret \(f(t) := w_{xxx}(t) - 3((u(t)+v(t))w(t))_x\) as an element of \(H^{-1}(\mathbb{R})\) in the first line. The expression \(\langle f(t), w(t)\rangle\) is well-defined since \(w(t) \in H^2(\mathbb{R}) \subset H^1(\mathbb{R})\). Using Grönwall's inequality and the fact that w(0) = 0, it follows that \(w(t) \equiv 0\) on [0,T].
Remark 6.39. If we instead assume that u, v ∈ C([0, T ]; H 4 (R)) ∩C1 ([0, T ]; H 1 (R)), then all
the derivatives can be interpreted in the classical sense. The reason for working with the lower
regularity is that we need it to prove global existence.
The strategy for proving the existence of a local solution is to replace the Cauchy problem
(6.18) by the approximate problem
\[
\begin{cases} u_t + 6uu_x + u_{xxx} + \varepsilon u_{xxxx} = 0, & t > 0, \\ u = u_0, & t = 0, \end{cases} \tag{6.19}
\]
with ε > 0. This is basically the same idea which was involved when we discussed uniqueness
for weak solutions of conservation laws in Section 4.3.3. In fact, one could instead have used
ut + 6uux + uxxx = εuxx , although this gets more technical.
The strategy is the following.
(1) Prove that (6.19) has a solution uε defined on some time interval [0, T (ε)).
(2) Prove that {uεn } converges to a solution of (6.18) as n → ∞, where {εn } is a sequence
of positive numbers which converges to 0.
There are two technical complications here. First of all, one must show that the lifespan
T (εn ) is bounded away from 0 as n → ∞. Secondly, we will only prove convergence in
C([0, T ]; H k−2 (R)) if u0 ∈ H k (R) in the second step. In order to prove the existence of a solu-
tion in C([0, T ]; H k (R)) we approximate the initial data by smooth functions and use the KdV
equation to construct a sequence of approximate solutions which converges in the stronger
norm.
Dispersive equations 117
The Cauchy problem (6.19) can be reformulated as the fixed point equation⁶
\[
u(t) = S_\varepsilon(t)u_0 - 3\int_0^t S_\varepsilon(t-\tau)\,\partial_x\bigl(u^2(\tau)\bigr)\, d\tau, \tag{6.20}
\]
where \(S_\varepsilon(t) = e^{-t(\varepsilon\partial_x^4 + \partial_x^3)}\), that is, \(\hat S_\varepsilon(t) = e^{-t(\varepsilon\xi^4 - i\xi^3)}\).
Lemma 6.40. \(\|\partial_x S_\varepsilon(t)f\|_{H^k(\mathbb{R})} \le C((\varepsilon t)^{-1/2} + (\varepsilon t)^{-1/4})\|f\|_{H^{k-1}(\mathbb{R})}\).
Proof. The result follows by estimating \(\|\xi\hat S_\varepsilon(t)\|_{L^\infty(\mathbb{R})}\) and \(\|\xi^2\hat S_\varepsilon(t)\|_{L^\infty(\mathbb{R})}\), since \(\mathscr{F}(\partial_x S_\varepsilon(t)f)(\xi) = i\xi\hat S_\varepsilon(t)\hat f(\xi)\) and \(\mathscr{F}(\partial_x^2 S_\varepsilon(t)f)(\xi) = -\xi^2\hat S_\varepsilon(t)\hat f(\xi)\). We have
\[
\|\xi\hat S_\varepsilon(t)\|_{L^\infty} = \|\xi e^{-t\varepsilon\xi^4}\|_{L^\infty} = (t\varepsilon)^{-1/4}\|(t\varepsilon)^{1/4}\xi\, e^{-t\varepsilon\xi^4}\|_{L^\infty} \le C(t\varepsilon)^{-1/4},
\]
and the estimate \(\|\xi^2\hat S_\varepsilon(t)\|_{L^\infty} \le C(t\varepsilon)^{-1/2}\) follows in the same way.
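The scaling in the proof can be confirmed numerically (my own check): substituting \(s = (t\varepsilon)^{1/4}\xi\) shows that \(\sup_\xi|\xi e^{-t\varepsilon\xi^4}| = C(t\varepsilon)^{-1/4}\) with \(C = \max_s s\,e^{-s^4} = 4^{-1/4}e^{-1/4}\).

```python
import numpy as np

C = 0.25**0.25 * np.exp(-0.25)   # max of s*exp(-s^4), attained at s = 4**(-1/4)

for teps in [1e-2, 1e-1, 1.0]:
    xi = np.linspace(0.0, 50.0 * teps**-0.25, 400001)
    sup = np.max(xi * np.exp(-teps * xi**4))
    print(teps, sup * teps**0.25, C)   # rescaled sup equals C for every t*eps
```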
Lemma 6.41. Let T > 0, ε > 0 and \(f \in C([0,T]; H^k(\mathbb{R}))\). There exists C = C(T, ε) such that
\[
\Bigl\|\int_0^t \partial_x S_\varepsilon(t-\tau) f(\tau)\, d\tau\Bigr\|_{H^k(\mathbb{R})} \le C\|f\|_{C([0,T];H^{k-1}(\mathbb{R}))}, \quad t \in [0,T].
\]
Proof. This follows immediately by moving the norm inside the integral and applying the previous lemma. Here it is essential that \(\tau \mapsto (t-\tau)^{-1/2}\) is integrable on [0,t].
Theorem 6.42. Let k ≥ 2 be an integer and u0 ∈ H k (R). For any ε > 0 there exists a unique
maximal strong solution u ∈ C([0, T ); H k (R)) of (6.19) with u(0) = u0 . If T < ∞, we have that
limt→T ku(t)kH k (R) = ∞. The solution has the additional property that u ∈ C∞ ((0, T ); H m (R))
for any integer m ≥ 2.
Proof. The proof of the existence and uniqueness of a solution \(u \in C([0,T); H^k(\mathbb{R}))\) follows as for the NLS equation in view of Lemma 6.41. To prove the additional regularity property of the solution, we note that \(S_\varepsilon(t)u_0 \in C^\infty((0,\infty); H^m(\mathbb{R}))\) for any m due to the rapid decay of \(\hat S_\varepsilon(t)\) in ξ for t > 0. On the other hand, \(\int_0^t \partial_x S_\varepsilon(t-\tau)u^2(\tau)\, d\tau \in C([0,T); H^{k+1}(\mathbb{R}))\) by Lemma 6.41. It follows that \(u \in C((0,T); H^{k+1}(\mathbb{R}))\). Solving the equation with initial data \(u(t_0) \in H^{k+1}(\mathbb{R})\), for any \(t_0 \in (0,T)\), we see by the same argument that \(u \in C((t_0,T); H^{k+2}(\mathbb{R}))\). Since \(t_0\) is
⁶ The proof of the equivalence requires some modifications of the method in the proof of Proposition 6.15, since \(S_\varepsilon(-t)\) is not bounded for t > 0. To prove the uniqueness of the solution of the inhomogeneous linear problem one can instead use energy estimates.
arbitrary, it follows that \(u \in C((0,T); H^{k+2}(\mathbb{R}))\) and, by induction, that \(u \in C((0,T); H^m(\mathbb{R}))\) for any m ≥ 2. By differentiating under the integral sign, one now finds that
\[
u(t) = S_\varepsilon(t-t_0)u(t_0) - 3\int_{t_0}^t \partial_x S_\varepsilon(t-\tau)\, u^2(\tau)\, d\tau \in C^\infty((t_0, T); H^m(\mathbb{R}))
\]
for any m and any \(t_0 \in (0,T)\). Since \(t_0\) is arbitrary it follows that \(u \in C^\infty((0,T); H^m(\mathbb{R}))\) for any m ≥ 2.
As mentioned earlier, the maximal existence time T = T(ε) is a function of ε. Our first task is to prove that it doesn't shrink to 0 as \(\varepsilon \to 0+\). To do so, we need only bound the \(H^k\) norm of the solution. We begin with a lemma related to Grönwall's inequality.
Lemma 6.43. Assume that \(f \in C[0,T]\) with f ≥ 0, and that there exist a, b, α ∈ ℝ with a > 0, b ≥ 0 and α > 1 such that
\[
f(t) \le a + b\int_0^t f^\alpha(\tau)\, d\tau
\]
when \(t \in [0,T]\). Then
\[
f(t) \le \bigl(a^{1-\alpha} - (\alpha-1)bt\bigr)^{\frac{1}{1-\alpha}}
\]
as long as \((\alpha-1)bt < a^{1-\alpha}\).
Proof. Let \(F(t) = a + b\int_0^t f^\alpha(\tau)\, d\tau\), so that \(F \in C^1([0,T])\) with \(F'(t) = bf^\alpha(t)\) and \(F(t) \ge a > 0\). Since \(f \le F\), we obtain that \(F'(t) \le bF^\alpha(t)\), which implies that \(F'(t)F^{-\alpha}(t) \le b\). Integrating this inequality, we find that
\[
\frac{1}{1-\alpha}\bigl(F^{1-\alpha}(t) - F^{1-\alpha}(0)\bigr) \le bt,
\]
yielding
\[
F^{1-\alpha}(t) \ge F(0)^{1-\alpha} - (\alpha-1)bt = a^{1-\alpha} - (\alpha-1)bt
\]
and hence
\[
f(t) \le F(t) \le \bigl(a^{1-\alpha} - (\alpha-1)bt\bigr)^{\frac{1}{1-\alpha}}.
\]
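The bound in Lemma 6.43 is sharp: when \(f' = bf^\alpha\), the two sides agree. A quick numerical check (my own illustration, with arbitrary a, b, α):

```python
import numpy as np

a, b, alpha = 2.0, 0.5, 1.5   # the bound blows up at t = a**(1-alpha)/((alpha-1)*b)

def f(t):
    """Equality case: f(t) = (a^(1-alpha) - (alpha-1)*b*t)^(1/(1-alpha))."""
    return (a**(1 - alpha) - (alpha - 1) * b * t)**(1 / (1 - alpha))

T = 1.0                        # well below the blow-up time (about 2.83 here)
tau = np.linspace(0.0, T, 200001)
integral = np.sum(f(tau)**alpha) * (tau[1] - tau[0])
print(abs(f(T) - (a + b * integral)))   # ~ 0 up to quadrature error
```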
If 1 ≤ m ≤ j − 1, we can estimate the terms in the sum by ‖∂x^j u‖_{L²(R)} ‖∂x^m u‖_{L∞(R)} ‖∂x^{j+1−m} u‖_{L²(R)} ≤
C‖u‖³_{H^k(R)}, where the Sobolev embedding theorem is used. The case m = j can be estimated
by ‖∂x^j u‖²_{L²(R)} ‖∂x u‖_{L∞(R)} ≤ C‖u‖³_{H^k(R)}. This leaves the term −12⟨∂x^j u, u ∂x^{j+1} u⟩. Integration
by parts yields that

−2⟨∂x^j u, u ∂x^{j+1} u⟩ = ∫_R (∂x^j u)² ∂x u dx ≤ ‖∂x u‖_{L∞(R)} ‖∂x^j u‖²_{L²(R)} ≤ C‖u‖³_{H^k(R)}.
Applying the previous lemma with f(t) = ‖u(t)‖²_{H^k(R)}, a = ‖u0‖²_{H^k(R)}, b = C and α = 3/2,
we find that

‖u(t)‖²_{H^k(R)} ≤ (‖u0‖^{−1}_{H^k(R)} − Ct/2)^{−2}.

This proves that the maximal existence time is bounded from below by T0 = 2‖u0‖^{−1}_{H^k(R)}/C,
for otherwise the left-hand side would blow up at some time t < T0. This would result in a
contradiction, since the right-hand side doesn't blow up until T0.
Lemma 6.45. The sequence {un } converges in C([0, T ]; H k−2 (R)) to a strong solution u ∈
C([0, T ]; H k−2 (R)) of (6.18).
Proof. We show that {un } is a Cauchy sequence in C([0, T ]; H k−2 (R)). Let w := un − um .
Then
wt + 3((un + um )w)x + wxxx + (εn − εm )unxxxx + εm wxxxx = 0.
One finds that

(1/2) d/dt ‖∂x^j w(t)‖²_{L²(R)} = (εm − εn)(∂x^{j+2} w, ∂x^{j+2} un)_{L²(R)} − εm ‖∂x^{j+2} w‖²_{L²(R)}
  − 3(∂x^j w, ∂x^{j+1}((un + um)w))_{L²(R)}
 ≤ (εm − εn)(∂x^{j+2} w, ∂x^{j+2} un)_{L²(R)} − 3(∂x^j w, ∂x^{j+1}((un + um)w))_{L²(R)}.

For j ≤ k − 2, the first term can be estimated by |εm − εn| ‖w‖_{C([0,T];H^k(R))} ‖un‖_{C([0,T];H^k(R))}, while the second
term can be estimated by a constant times ‖w(t)‖²_{H^{k−2}(R)} ‖un + um‖_{C([0,T];H^k(R))}, since

2(∂x^j w, (un + um) ∂x^{j+1} w)_{L²(R)} = − ∫_R (∂x^j w)² ∂x(un + um) dx.
Summing over 0 ≤ j ≤ k − 2, it follows that

d/dt ‖w(t)‖²_{H^{k−2}(R)} ≤ C(|εm − εn| + ‖w(t)‖²_{H^{k−2}(R)}),

and hence, by Grönwall's inequality,

‖un(t) − um(t)‖²_{H^{k−2}(R)} ≤ C|εm − εn|(e^{Ct} − 1) + ‖un(0) − um(0)‖²_{H^{k−2}(R)} e^{Ct}.
Taking the supremum over [0, T] and using the fact that εn, εm → 0 as n, m → ∞, we find
that {un} is Cauchy in C([0, T]; H^{k−2}(R)). Letting n → ∞ in (6.20), we find that the limit
u ∈ C([0, T]; H^{k−2}(R)) is a strong solution of (6.18).
Proof of Theorem 6.36. The uniqueness of the solution follows from Lemma 6.38. We ap-
proximate u0 by a sequence {u0n } ⊂ C0∞ (R) in the H k (R) norm. Since C0∞ (R) ⊂ H k+2 (R), we
can find a solution un ∈ C([0, Tn ]; H k (R)) with initial data u0n for each n. It remains to prove
that we can take Tn = T0 , for some T0 > 0 which is independent of n, and that {un } converges
in C([0, T0 ]; H k (R)). To do this, we use the same arguments as in Lemmas 6.44 and 6.45. The
only purpose of working in H k−2 (R) instead of H k (R) in Lemma 6.45 was to handle the term
εuxxxx . With this term gone, we find that there exists a T0 > 0 such that
d/dt ‖un(t) − um(t)‖²_{H^k(R)} ≤ C‖un(t) − um(t)‖²_{H^k(R)},  t ∈ [0, T0],

and hence

‖un − um‖²_{C([0,T0];H^k(R))} ≤ ‖u0n − u0m‖²_{H^k(R)} e^{CT0},
showing that {un } is Cauchy in C([0, T0 ]; H k (R)). This proves the existence of a local solution.
If u0 , v0 ∈ BR (0) ⊂ H k (R) the same argument shows that
‖un − vn‖²_{C([0,T0];H^k(R))} ≤ ‖u0n − v0n‖²_{H^k(R)} e^{CT0},
where u0n , v0n are smooth initial data approximating u0 and v0 and un , vn the corresponding
solutions on the time interval [0, T0 ] with T0 = T0 (R). Passing to the limit, we find that
‖u − v‖²_{C([0,T0];H^k(R))} ≤ ‖u0 − v0‖²_{H^k(R)} e^{CT0},
proving that the solutions depend continuously on the initial data over the interval [0, T0 ]. The
argument used for the NLS equation shows that the solution can be extended to a maximal
solution u ∈ C([0, T ); H k (R)). Since the time interval in the above continuity argument just
depends on the norm of the initial data, one can extend it to any interval [0, T0 ] on which the
two solutions exist.
This gives the conservation of M(u). The conservation of E(u) follows by subtracting 3u² ×
(KdV) from ux × ∂x(KdV). After some work one obtains that

∂t((1/2) ux² − u³) + ∂x(6u ux² + ux uxxx − 3u² uxx − (1/2) uxx² − (9/2) u⁴) = 0.
This formally results in the conservation of E(u). The conservation of F(u) can be proved
in a similar manner, although the calculations are rather tedious. By adding uxx × ∂x²(KdV),
−5ux² × (KdV), −10u ux × ∂x(KdV) and 5u³ × (KdV) one finds after a long calculation that

∂t((1/2) uxx² − 5u ux² + (5/2) u⁴) + ∂x R(u) = 0,

where

R(u) = uxx uxxxx − (1/2) uxxx² + 5ux² uxx + 8u uxx² − 10u ux uxxx + 10u³ uxx − 45u² ux² + 12u⁵.
The conserved quantities I(u), M(u) and E(u) are related to mass, momentum and energy for
water waves. The quantity F(u) has no direct physical interpretation. The existence of this
conserved quantity is related to the complete integrability of the KdV equation, a property
which will be discussed further in the next chapter.
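The first of the conservation-law identities above can be verified symbolically. The following sketch (an illustration, not part of the notes) assumes the KdV form ut + 6uux + uxxx = 0 used in this chapter and checks with sympy that the time derivative of the density ½ux² − u³ is an exact x-derivative modulo the equation:

```python
import sympy as sp

x, t = sp.symbols('x t')
u = sp.Function('u')(x, t)
ux = sp.diff(u, x)
ut = -6*u*ux - sp.diff(u, x, 3)              # KdV: u_t = -6uu_x - u_xxx

# d/dt (u_x^2/2 - u^3), with every u_t eliminated via the equation
d_density_dt = ux*sp.diff(ut, x) - 3*u**2*ut
flux = (6*u*ux**2 + ux*sp.diff(u, x, 3) - 3*u**2*sp.diff(u, x, 2)
        - sp.Rational(1, 2)*sp.diff(u, x, 2)**2 - sp.Rational(9, 2)*u**4)
# the identity  d_t(density) + d_x(flux) = 0  holds exactly
assert sp.expand(d_density_dt + sp.diff(flux, x)) == 0
```

The same computation, with the density ½uxx² − 5u ux² + (5/2)u⁴ and the flux R(u), verifies the second identity.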
If u ∈ H 2 (R), the quantities M(u), E(u) and F(u) are well-defined by Sobolev’s embed-
ding theorem. By repeating the procedure in the proof of Theorem 6.23 one obtains the fol-
lowing result.
Proposition 6.46. Let u ∈ C([0, T ); H 2 (R)) be a strong solution of the KdV equation. The
quantities M(u(t)), E(u(t)) and F(u(t)) are independent of t.
For the proof one needs a persistence of regularity result. We denote by Tk (u0 ) the maximal
existence time of the H k solution of KdV with initial data u0 ∈ H k (R).
Lemma 6.47 (Persistence of regularity for KdV). Let u0 ∈ H k (R), where k ≥ 2. Then Tk (u0 ) =
T2 (u0 ).
To illustrate the idea we consider the proof that T3(u0) = T2(u0). One formally has that

(1/2) d/dt ‖uxxx(t)‖²_{L²(R)} = (uxxx, −6(u ux)xxx − uxxxxxx)_{L²(R)} = (uxxx, −6(u ux)xxx)_{L²(R)},

since (uxxx, uxxxxxx)_{L²(R)} = 0 after three integrations by parts.
The last expression can be written using Leibniz' formula as a linear combination of terms
of the form (uxxx, u uxxxx)_{L²(R)}, (uxxx, ux uxxx)_{L²(R)}, (uxxx, uxx²)_{L²(R)}. Integration by parts reveals
that 2(uxxx, u uxxxx)_{L²(R)} = −(uxxx², ux)_{L²(R)}, while (uxxx, uxx²)_{L²(R)} = (1/3) ∫_R ∂x(uxx³) dx = 0. We
can estimate |(uxxx², ux)_{L²(R)}| ≤ ‖ux‖_{L∞(R)} ‖uxxx‖²_{L²(R)} ≤ C‖u‖_{H²(R)} ‖uxxx‖²_{L²(R)}. Since ‖u‖_{H²(R)}
is bounded on [0, T] for any T < T2(u0), we obtain by Grönwall's inequality that ‖u(t)‖_{H³(R)} is
bounded on [0, T] and hence that T3(u0) = T2(u0). The calculation doesn't quite make sense,
since uxxx (t) 6∈ C1 ([0, T3 ); L2 (R)) in general. This can be taken care of by first proving the
result for (6.19) and then letting ε → 0+ . One shows similarly that Tk+1 (u0 ) = Tk (u0 ) = T2 (u0 )
for all k ≥ 2.
Theorem 6.48. For u0 ∈ H^k(R), k ≥ 2, the solution of (6.18) exists for all t ≥ 0.

Proof. We have that

‖uxx‖²_{L²(R)} ≤ 2F(u) + 10‖u‖_{L∞(R)} ‖u‖²_{H¹(R)} + 10‖u‖²_{L∞(R)} M(u)
 ≤ 2F(u) + C(‖u‖³_{H¹(R)} + ‖u‖⁴_{H¹(R)}).
This, together with the bound on ‖u(t)‖_{H¹(R)} and the conservation of F(u(t)), shows that
‖u(t)‖_{H²(R)} is bounded for all t. Using Theorem 6.36, we conclude that if u0 ∈ H²(R), then
the corresponding solution is defined for all t, that is, u ∈ C([0, ∞); H²(R)). If u0 ∈ H^k(R)
with k ≥ 2, we conclude by Lemma 6.47 that u ∈ C([0, ∞); H^k(R)).
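The conservation of the invariants (Proposition 6.46) and the persistence of the solitary wave can be illustrated numerically. The sketch below is not from the notes: it assumes the KdV form ut + 6uux + uxxx = 0 of this chapter, takes M(u) = ∫u² dx and E(u) = ∫(½ux² − u³) dx up to normalization, and evolves a solitary wave with a standard Fourier integrating-factor RK4 scheme:

```python
import numpy as np

N, L = 256, 40.0
x = L*np.arange(N)/N - L/2
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)        # spectral wavenumbers
ik3 = 1j*k**3                               # symbol of the linear term -u_xxx

def nonlin(uhat):
    u = np.fft.ifft(uhat).real
    return -3j*k*np.fft.fft(u*u)            # Fourier transform of -6uu_x

def step(uhat, dt):                         # integrating-factor RK4
    E1 = np.exp(ik3*dt/2); E2 = E1*E1
    a = nonlin(uhat)
    b = nonlin(E1*(uhat + dt/2*a))
    c = nonlin(E1*uhat + dt/2*b)
    d = nonlin(E2*uhat + dt*E1*c)
    return E2*uhat + dt/6*(E2*a + 2*E1*(b + c) + d)

def M(u):                                   # mass-type invariant
    return np.sum(u**2)*(L/N)

def En(u):                                  # energy-type invariant
    ux = np.fft.ifft(1j*k*np.fft.fft(u)).real
    return np.sum(ux**2/2 - u**3)*(L/N)

u0 = 2.0/np.cosh(x)**2                      # solitary wave, kappa = 1, speed 4
uhat = np.fft.fft(u0)
for _ in range(1000):
    uhat = step(uhat, 1e-3)                 # evolve up to t = 1
u1 = np.fft.ifft(uhat).real

assert np.max(np.abs(u1 - 2.0/np.cosh(x - 4.0)**2)) < 1e-4
assert abs(M(u1) - M(u0)) < 1e-5*abs(M(u0))
assert abs(En(u1) - En(u0)) < 1e-5*abs(En(u0))
```

Both invariants drift only at the level of the time-discretization error, and the wave simply translates with speed 4, consistent with the global well-posedness result above.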
ρt + ∇ · (ρu) = 0.
For water it is reasonable to assume that the density is constant, in which case the last equation
reduces to
∇ · u = 0. (6.22)
Moreover, we assume that the only force is gravity, so that F = (0, 0, −g), where g is the
gravitational constant of acceleration. We assume in addition that all functions are independent
of y, so that the waves move only in the horizontal direction x and are uniform in the y-
direction. The equations of motion then reduce to
vx + wz = 0,
vt + v vx + w vz = −(1/ρ) px,   (6.23)
wt + v wx + w wz = −(1/ρ) pz − g,
where patm is the constant atmospheric pressure at the water surface. The first condition simply
means that there is no flow of water across the bottom. The second condition means that the
surface moves with the fluid, so that it always contains the same fluid particles. If (x(t), z(t))
describes the position of a water particle, then the condition for a water particle to remain
on the surface is that z(t) = η(x(t),t). Differentiating this with respect to t, one obtains
ż(t) = ηt (x(t),t) + ηx (x(t),t)ẋ(t). Since ẋ = v and ż = w, the second boundary condition
follows. There are two important difficulties to note here. First of all, the domain Ωt depends
on t since the surface is time-dependent. Even more importantly, the surface is not prescribed,
it is one of the unknowns of the problem. We therefore have to solve PDE in a domain which
is unknown! This is an example of a free boundary problem.
To derive the KdV equation, we assume that the waves we are describing have small am-
plitude. We therefore introduce a small parameter ε (the amplitude parameter) and write the
equation for the surface in the form z = h0 + h0 εη(x,t), where h0 is the depth of the undis-
turbed water. Rescaling the other variables in an appropriate way, one obtains the new set of
equations
vx + wz = 0,     w = 0 on z = 0,
vt + ε(v vx + w vz) = −px,     p = η on z = 1 + εη,     (6.25)
ε{wt + ε(v wx + w wz)} = −pz,     w = ηt + εv ηx on z = 1 + εη.
Equations (6.25) are obtained from (6.23)–(6.24) by using the tilde-variables. We have dropped
the tildes in (6.25) for notational convenience. For an explanation of this change of variables
we refer to Johnson [15].
To derive the leading order approximation we set ε = 0. This yields vx + wz = 0, vt = −px ,
pz = 0 and the boundary conditions w = 0 at z = 0 and p = η, w = ηt at z = 1. A calculation
which is left as an exercise to the reader then shows that
ηtt = ηxx
(the crucial point is that p(x, z,t) = η(x,t)). In other words, the surface profile satisfies the
wave equation to leading order. This suggests the introduction of new variables
ξ = x − t, τ = εt,
meaning that we are seeking a perturbation of a wave moving to the right with unit speed7 .
The equations (6.25) are rewritten in the form

vξ + wz = 0,     w = 0 on z = 0,
−vξ + ε(vτ + v vξ + w vz) = −pξ,     p = η on z = 1 + εη,
ε{−wξ + ε(wτ + v wξ + w wz)} = −pz,     w = −ηξ + ε(ητ + v ηξ) on z = 1 + εη.
The KdV equation is obtained by expanding the unknowns in powers of ε, e.g. v = v0 + εv1 + ⋯,
substituting in the equations and collecting terms of
the same order in ε. The boundary conditions on the surface z = 1 + εη have to be handled
by expanding p(ξ, 1 + εη, τ) = p(ξ, 1, τ) + ε pz(ξ, 1, τ)η + O(ε²) and similarly for the other
variables. Keeping this in mind, one obtains at order 0
v0ξ + w0z = 0,     w0 = 0 on z = 0,
−v0ξ = −p0ξ,     p0 = η0 on z = 1,
p0z = 0,     w0 = −η0ξ on z = 1,
which is solved by
p0 = η0 = v0 , w0 = −zη0ξ
(one can add an arbitrary function of (z,t) to v0 ). Using this, one obtains at the next order
v1ξ + w1z = 0,     w1 = 0 on z = 0,
−v1ξ + v0τ + v0 v0ξ = −p1ξ,     p1 = η1 on z = 1,
p1z = w0ξ,     w1 + w0z η0 = −η1ξ + η0τ + v0 η0ξ on z = 1.
We have p1z = w0ξ = −z η0ξξ; integrating with respect to z and using p1 = η1 on z = 1, it
follows that p1 = (1/2)(1 − z²) η0ξξ + η1. From this it follows that
−w1z = v1ξ = v0τ + v0 v0ξ + p1ξ = η0τ + η0 η0ξ + (1/2)(1 − z²) η0ξξξ + η1ξ,
which in combination with the bottom boundary condition gives

w1 = −(η0τ + η0 η0ξ + η1ξ + (1/2) η0ξξξ) z + (1/6) z³ η0ξξξ.
Substituting this in the surface boundary condition, one finally arrives at

−(η0τ + η0 η0ξ + η1ξ + (1/2) η0ξξξ) + (1/6) η0ξξξ − η0 η0ξ = −η1ξ + η0τ + η0 η0ξ,

that is,

2η0τ + 3η0 η0ξ + (1/3) η0ξξξ = 0.

Up to a simple rescaling this is the KdV equation.
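The bookkeeping in the last steps can be checked symbolically. The sketch below (an illustration, not part of the notes) feeds the formulas for v0, w0, p1 and w1 into the order-ε surface condition with sympy and confirms that the η1 terms cancel, leaving exactly 2η0τ + 3η0η0ξ + (1/3)η0ξξξ = 0:

```python
import sympy as sp

xi, tau, z = sp.symbols('xi tau z')
eta0 = sp.Function('eta0')(xi, tau)
eta1 = sp.Function('eta1')(xi, tau)

v0 = eta0                                    # order-0 solution
w0 = -z*sp.diff(eta0, xi)
p1 = sp.Rational(1, 2)*(1 - z**2)*sp.diff(eta0, xi, 2) + eta1
v1xi = sp.diff(v0, tau) + v0*sp.diff(v0, xi) + sp.diff(p1, xi)
w1 = -sp.integrate(v1xi, (z, 0, z))          # w1z = -v1xi with w1 = 0 on z = 0

# order-eps surface condition: w1 + w0z*eta0 = -eta1_xi + eta0_tau + v0*eta0_xi
lhs = w1.subs(z, 1) + sp.diff(w0, z).subs(z, 1)*eta0
rhs = -sp.diff(eta1, xi) + sp.diff(eta0, tau) + v0*sp.diff(eta0, xi)
kdv = (2*sp.diff(eta0, tau) + 3*eta0*sp.diff(eta0, xi)
       + sp.Rational(1, 3)*sp.diff(eta0, xi, 3))
assert sp.expand(lhs - rhs + kdv) == 0       # surface condition <=> KdV
```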
linear problems when considering the nonlinear Cauchy problem. We only used the bound-
edness of S(t) on H s (Rd ). Our approach could therefore essentially be used for any equa-
tion (or even system) ut − a(Dx )u = f (u, Dx u) where the operator a(Dx ) satisfies Petrowsky’s
condition from Chapter 3 and f e.g. is a polynomial. This includes the conservation law
ut + f (u)x = 0 from Chapter 4, for which a(Dx ) = 0. With the technique used to treat the KdV
equation one can prove local well-posedness. The method is of course more complicated than
the method of characteristics, but has the advantage that it extends to higher dimensions and
to systems.
In the last decades, an important development has been the use of finer properties of S(t)
in order to lower the regularity assumptions in the well-posedness results. Taking the NLS
equation as an example, the basic ingredient is the estimate
‖S(t)u0‖_{L^q(R^d)} ≤ (4π|t|)^{−(d/2)(1/p − 1/q)} ‖u0‖_{L^p(R^d)}
from Proposition 6.13. This estimate can be used to prove the continuity of the map
u0 7→ S(t)u0
from L²(R^d) to L^q(R; L^r(R^d)) under certain conditions on q and r. One also has continuity of
the operator

f ↦ ∫_0^t S(t − τ) f(τ) dτ

from L^γ((0, T); L^ρ(R^d)) into L^q(I; L^r(R^d)) ∩ C([0, T]; L²(R^d)) under appropriate conditions
on the exponents. These types of estimates go under the name of Strichartz estimates. They
can e.g. be used to prove the local well-posedness of the NLS equation in L2 (Rd ) under the
condition 1 < p < 1 + 4/d on the exponent. The idea is to replace the space X in the proof
of Theorem 6.18 with a space which is compatible with the Strichartz estimates. The great
advantage of having such a local well-posedness result is that it automatically implies global
existence, irrespective of the sign of σ , due to the conservation of the L2 norm.
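The dispersive decay underlying these estimates is easy to observe numerically. The sketch below (an illustration with assumed parameters, not the statement of Proposition 6.13) evolves a Gaussian under the free Schrödinger equation i ut + uxx = 0 by FFT and checks the one-dimensional sup-norm decay rate t^{−1/2}:

```python
import numpy as np

L, N = 400.0, 8192
x = L*np.arange(N)/N - L/2
k = 2*np.pi*np.fft.fftfreq(N, d=L/N)
u0hat = np.fft.fft(np.exp(-x**2))

def sup_norm(t):
    # free evolution: u_hat(t) = exp(-i k^2 t) u_hat(0) for i u_t + u_xx = 0
    return np.max(np.abs(np.fft.ifft(np.exp(-1j*k**2*t)*u0hat)))

# doubling t should divide the sup norm by about sqrt(2)
ratio = sup_norm(10.0)/sup_norm(20.0)
assert abs(ratio - 2**0.5) < 0.01
```

For this Gaussian datum the exact sup norm is (1 + 16t²)^{−1/4}, so the ratio approaches √2 for large times, which the FFT computation reproduces.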
For the KdV equation, the Strichartz estimates do not suffice to lower the regularity re-
quirements. The problem is as usual the derivative in the nonlinearity. To handle this, one
needs smoothing properties of the operator f ↦ ∫_0^t S(t − τ) f(τ) dτ. A very simple example of
In the final chapter we will consider some of the remarkable properties of the KdV equation
alluded to in the introduction and the previous chapter. These properties can be summarized
by saying that the equation is ‘completely integrable’. It isn’t easy to give a precise meaning to
this phrase, but it involves the existence of infinitely many independent conserved quantities
and a way to transform the equation into a linear problem. In the context of finite-dimensional
Hamiltonian systems (ODE) one can give a more precise definition (see [2]). There are many
other interesting examples of completely integrable PDE, e.g. the cubic nonlinear Schrödinger
equation in one dimension, but most equations do not have this property. We will focus on
the KdV equation here for simplicity. The considerations in this chapter will mostly be of a
formal nature. In other words, we will not specify the function spaces that the solutions have
to belong to for the results to be true. It is useful in this chapter to rewrite the KdV equation
in the equivalent form
ut − 6uux + uxxx = 0,
which is obtained using the transformation u 7→ −u. Transformations between different PDE
play a central role in this chapter. As a first illustration, we consider the Cole-Hopf transfor-
mation.
Example 7.1. Consider the viscous Burgers equation

ut + u ux = ε uxx,

where ε > 0. Writing u = wx and integrating with respect to x, one obtains wt + (1/2) wx² = ε wxx. Setting v = e^{−w/(2ε)},
so that
vt − ε vxx = −(1/(2ε)) (wt + (1/2) wx² − ε wxx) v = 0.
The transformation u 7→ v is called the Cole-Hopf transformation. Note that this transforms
the nonlinear Burgers equation to the linear heat equation.
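The computation in Example 7.1 can be reproduced symbolically; the sketch below checks with sympy that the heat-equation residual of v = e^{−w/(2ε)} is proportional to the integrated Burgers residual of w:

```python
import sympy as sp

x, t, eps = sp.symbols('x t epsilon', positive=True)
w = sp.Function('w')(x, t)
v = sp.exp(-w/(2*eps))                        # Cole-Hopf substitution

heat = sp.diff(v, t) - eps*sp.diff(v, x, 2)
burgers = sp.diff(w, t) + sp.diff(w, x)**2/2 - eps*sp.diff(w, x, 2)
# heat residual = -(1/(2 eps)) * (burgers residual) * v
assert sp.simplify(heat + burgers*v/(2*eps)) == 0
```

So v solves the heat equation exactly when w solves the integrated Burgers equation.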
The existence of the Cole-Hopf transformation is very surprising and seems almost like
magic. There is a way to transform the KdV equation into a linear problem, but as we will
see, it is much more complicated. There is however a transformation of the KdV equation to
a different nonlinear equation which will turn out to be very useful.
is a conserved quantity. For ε = 0 we have v = u, in which case this is just the conservation of I(u)
(see Section 6.3). The idea is now to expand v in powers of ε. Assume that v can be expanded
in a series
v = ∑_{k=0}^∞ ε^k v_k.
Substituting this in (7.3), we obtain

u = ∑_{k=0}^∞ ε^k v_k + ε ∑_{k=0}^∞ ε^k ∂x v_k + ε² (∑_{k=0}^∞ ε^k v_k)²

 = ∑_{k=0}^∞ ε^k v_k + ε ∑_{k=0}^∞ ε^k ∂x v_k + ε² ∑_{k=0}^∞ ε^k ∑_{j=0}^k v_{k−j} v_j

 = v0 + ε(v1 + ∂x v0) + ∑_{k=2}^∞ ε^k (v_k + ∂x v_{k−1} + ∑_{j=0}^{k−2} v_{k−2−j} v_j).
Thus

v0 = u,  v1 = −∂x v0 = −ux,   (7.4)

and

v_k = −(∂x v_{k−1} + ∑_{j=0}^{k−2} v_{k−2−j} v_j),  k ≥ 2.   (7.5)
The first few coefficients are

v2 = uxx − u²,
v3 = −(uxx − u²)x + 2u ux,
v4 = −(2u ux − (uxx − u²)x)x − 2u(uxx − u²) − ux².
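The recursion (7.5) is easy to implement symbolically; the sketch below reproduces v2 and v3 with sympy as a spot check of the formulas above:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
v = [u, -sp.diff(u, x)]                      # v0, v1 from (7.4)
for k in range(2, 5):                        # recursion (7.5)
    v.append(-(sp.diff(v[k-1], x) + sum(v[k-2-j]*v[j] for j in range(k-1))))

assert sp.expand(v[2] - (sp.diff(u, x, 2) - u**2)) == 0
v3_expected = -sp.diff(sp.diff(u, x, 2) - u**2, x) + 2*u*sp.diff(u, x)
assert sp.expand(v[3] - v3_expected) == 0
```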
The fact that ∫_R v dx = const moreover implies that

∫_R v_k(x, t) dx = const,  k ∈ N.
In general, it turns out that the relations corresponding to odd powers of ε are trivial (the
corresponding vk are exact differentials), whereas the even ones induce an infinite family of
nontrivial conserved quantities [5]. Note that v2 and v4 correspond to the functionals M(u)
and E(u) discussed in the previous chapter, respectively.
In other words, if v solves the mKdV equation, then so does −v. In addition, the Miura transformation
is not injective. For example, any solution of the Riccati equation v² + vx = 0 is mapped by the
Miura transformation to the 0 solution of the KdV equation. We could therefore in principle
generate a new solution of the KdV equation by mapping u1 = 0 to a solution v1 of the Riccati
equation, setting v2 = −v1 and u2 = v2² + v2,x = v1² − v1,x. One problem is that solutions of
the KdV equation are not automatically mapped to solutions of the mKdV equation by the
(inverse) Miura transformation. There is however a free parameter in the solution of the Riccati
equation, which can be adjusted to guarantee that v1 solves mKdV. Another problem is that
the solution v1 is only local, unless it vanishes identically. To remedy this, we introduce a
parameter λ in Miura’s transformation, so that
u = λ + v² + vx.
meaning as usual that u solves the former equation if v satisfies the latter. The above idea for
generating new solutions can now be summarized as follows.
(1) Start with a known solution u1 of the KdV equation and let v(x,t; f ) be the general
solution of the equation λ + vx + v² = u1. This will in general depend upon an arbitrary
function f = f (t).
In the second step we note that (7.6) can be simplified using the identity vx = −λ − v² + u1.
Indeed, we have vxx = −2v vx + u1x and vxxx = −2vx² − 2v vxx + u1xx = −2vx² + 4v² vx − 2v u1x +
u1xx. We can therefore rewrite (7.6) as
Example 7.2. We illustrate the method for u1 = 0. We should then choose v as a solution of
the ordinary differential equation
vx + v² + λ = 0.
This can be integrated to
v(x,t) = κ tanh(κx + f (t)),
where λ = −κ² (< 0) and f is arbitrary. The equation for vt becomes

vt = −4κ² vx,
and hence f′(t) = −4κ³, so that f(t) = −4κ³t − κx0, where x0 is arbitrary. We therefore
obtain the new solution
u2(x, t) = −κ² + v²(x, t) − vx(x, t) = −2κ²/cosh²(κ(x − x0 − 4κ²t)).
This is the solitary wave from the introduction with wave speed c = 4κ² (recall that we
changed u to −u in the KdV equation). This solution is valid only if |v| < κ. If |v| > κ,
we obtain the singular solution
The above transformation of solutions of the KdV equation to other solutions is an example
of a Bäcklund transformation. Assume that u and v are functions of two variables x and t and
that
P(u, ux , ut , . . . ; x,t) = 0 and Q(v, vx , vt , . . . ; x,t) = 0
are two uncoupled partial differential equations. A Bäcklund transformation for P and Q is a
system of differential equations
Fj (u, v, ux , vx , ut , vt , . . . ; x,t) = 0, j = 1, 2,
which is solvable for u if and only if Q(v) = 0 and has the property that u then solves P(u) = 0,
and vice versa. If P = Q, then Fj = 0 is called an auto-Bäcklund transformation.
The above transformation for the KdV equation may also be seen as an auto-Bäcklund
transformation. Introducing two functions w1 and w2 such that w jx = u j , the system defining
the Bäcklund transformation can be written
We solve equation (7.7) with w2 replaced by w0 and with two different values of λ , say λ1 and
λ2 . The two solutions are denoted w1 and w2 , so that
(w1 + w0)x = 2λ1 + (1/2)(w1 − w0)²,   (7.8)
(w2 + w0)x = 2λ2 + (1/2)(w2 − w0)².   (7.9)

Using these solutions, we define w12 and w21 by

(w12 + w1)x = 2λ2 + (1/2)(w12 − w1)²,   (7.10)
(w21 + w2)x = 2λ1 + (1/2)(w21 − w2)².   (7.11)
Theorem 7.4 (Permutability theorem). Assume that w1 and w2 are solutions of (7.8)–(7.9).
There exist solutions w12 and w21 of (7.10)–(7.11) with w12 = w21.
Such a result was originally proved for a similar problem in differential geometry by
Bianchi. Assume for a moment that the theorem is true, so that w12 = w21 = w. Forming
the combination (7.11) − (7.10) − ((7.9) − (7.8)), we deduce that

0 = 4(λ1 − λ2) + (1/2)((w − w2)² − (w − w1)² − (w2 − w0)² + (w1 − w0)²).
Solving for w, we obtain the formula

w = w0 − 4(λ1 − λ2)/(w1 − w2).   (7.12)
This may be seen as a nonlinear superposition principle. From the two solutions u1 and u2
corresponding to w1 and w2 , we obtain a new solution u = wx .
Example 7.5. We take w0 = 0, w1 = −2 tanh(x − 4t) and w2 = −4 coth(2x − 32t), so that
λ1 = −1 and λ2 = −4. We therefore obtain

w(x, t) = −6/(2 coth(2x − 32t) − tanh(x − 4t)).
The corresponding solution of the KdV equation u(x,t) = wx (x,t) is (after some simplifica-
tions) given by
u(x, t) = −12 (3 + 4 cosh(2x − 8t) + cosh(4x − 64t))/(3 cosh(x − 28t) + cosh(3x − 36t))².
This is a so-called two-soliton solution. As t → ±∞, the solution has the asymptotic form

u(x, t) ∼ −8 sech²(2(x − 16t) ∓ (1/2) log 3) − 2 sech²(x − 4t ± (1/2) log 3).

The solution can therefore be interpreted as describing the interaction of the two solitary waves
−8 sech²(2(x − 16t)) and −2 sech²(x − 4t), up to phase shifts. For large negative t, the solution is
basically a linear combination of these two solutions. This is not surprising since the solitary
waves are very localized and are far apart. For t close to 0 the solutions interact in a nonlinear
way. The amazing thing is that the waves come out of the interaction looking exactly the
same as before. The only difference is a phase shift, so that the waves are not at the positions
one would expect if the interaction were linear. It is this property that gave rise to the name
solitons; the solitary waves interact almost like particles. Figure 7.1 illustrates the soliton
interaction.
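One can convince oneself that the two-soliton formula really solves ut − 6uux + uxxx = 0 by evaluating the residual numerically; in the sketch below the derivatives are computed exactly with sympy and then sampled on a grid (the grid and tolerances are assumptions of this illustration):

```python
import numpy as np
import sympy as sp

x, t = sp.symbols('x t')
u = -12*(3 + 4*sp.cosh(2*x - 8*t) + sp.cosh(4*x - 64*t)) \
    / (3*sp.cosh(x - 28*t) + sp.cosh(3*x - 36*t))**2
residual = sp.diff(u, t) - 6*u*sp.diff(u, x) + sp.diff(u, x, 3)
f = sp.lambdify((x, t), residual, 'numpy')

X = np.linspace(-2.0, 2.0, 21)
assert np.max(np.abs(f(X, 0.05))) < 1e-6     # zero up to round-off

# at t = 0 the formula collapses to -6 sech^2(x)
g = sp.lambdify(x, u.subs(t, 0), 'numpy')
assert np.max(np.abs(g(X) + 6/np.cosh(X)**2)) < 1e-10
```

The second assertion recovers the well-known fact that the initial datum u0(x) = −6 sech²x splits into exactly two solitons.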
Figure 7.1: A two-soliton solution – before, during and after interaction. Note that u has been
changed to −u, so that this is a solution of ut + 6uux + uxxx = 0.
Proof of Theorem 7.4. To check the validity of the permutability theorem, we simply substitute
w given by (7.12) back into equations (7.10) and (7.11). This results in

(w1 + w0)x + (4(λ1 − λ2)/(w1 − w2)²)(w1 − w2)x = 2λ2 + (1/2)(w1 − w0 + 4(λ1 − λ2)/(w1 − w2))²

and

(w2 + w0)x + (4(λ1 − λ2)/(w1 − w2)²)(w1 − w2)x = 2λ1 + (1/2)(w2 − w0 + 4(λ1 − λ2)/(w1 − w2))².

Using (7.8) and (7.9) to rewrite the left-hand sides and subtracting one equation from the
other, one is left with nothing but the difference between (7.8) and (7.9). We also need to verify the second
part of the Bäcklund transformation. That is, we need to show that
wt − 3wx² + wxxx = w2t − 3w2x² + w2xxx = 0,
where we have used the fact that w2x solves the KdV equation. An argument similar to the one
above shows that this is satisfied. The calculations are straightforward but rather lengthy. We
leave them to the interested reader.
This leads to

u = ∂x(ϕx/ϕ) + (ϕx/ϕ)² = (ϕxx ϕ − ϕx²)/ϕ² + ϕx²/ϕ² = ϕxx/ϕ,
so that
ϕxx − uϕ = 0,
at least as long as ϕ ≠ 0. Ingeniously, Gardner, Greene, Kruskal and Miura generalized this,
and went on to investigate the scattering problem

ψxx + (λ − u)ψ = 0.   (7.13)

For every fixed t, and equipped with appropriate boundary conditions, this is a Sturm–Liouville
problem. For given u and for every t, there are only certain λ (called eigenvalues) that admit
solutions ψ (called eigenfunctions). Furthermore, it turns out that (7.13) constitutes an implicit
linearization of the KdV equation: the evolution of u(x, t) according to the KdV equation
forces λ(t) and ψ(x, t) to evolve in a specific way, which will be calculated below.
What happens when one substitutes (7.13) into the KdV equation? To simplify calculations,
define
R(x,t) := ψt (x,t) + ux (x,t)ψ(x,t) − 2 (u(x,t) + 2λ (t)) ψx (x,t)
and let

W(f, g) := f gx − fx g

denote the Wronskian. A direct computation gives

W(ψ, R) = ψ² (R/ψ)x = ψ² ((ψt + ux ψ − 2(u + 2λ)ψx)/ψ)x
 = ψtx ψ − ψt ψx + uxx ψ² + ux ψx ψ − ux ψ ψx − 2ux ψx ψ
   − 2(u + 2λ)ψxx ψ + 2(u + 2λ)ψx²
 = ψtx ψ − ψt ψx + uxx ψ² − 2ux ψx ψ + 2(u + 2λ)(ψx² − ψxx ψ),
and, using the KdV equation together with the scattering relation (7.13), one finds that W(ψ, R)x = λt ψ².
In particular, if W(ψ, R) has decay, then we may integrate over R to obtain that

λt ∫_R ψ² dx = 0.
This is an unexpected and amazing result: if u(x, t) solves the KdV equation, then the eigenvalues
λ of (7.13) are time-independent, although both u and ψ depend on time. Moreover,
W(ψ, R) is then a function of time only, and thus

W(ψ, R) = W(ψ, R)|_{x=∞} = 0.

Assuming that

R/ψ → C as x → ∞,

with C independent of time, we can deduce the following evolution equation for ψ:

ψt + (ux − C)ψ − 2(u + 2λ)ψx = 0.

Thus, at least formally, we have related the KdV equation in u to a linear problem in ψ:

ψxx + (λ − u)ψ = 0,
ψt + (ux − C)ψ − 2(u + 2λ)ψx = 0.   (7.15)
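The time-independence of the eigenvalues can be sampled numerically. For the soliton u(x, t) = −2 sech²(x − 4t) (κ = 1, so λ = −1 by Example 7.2), a finite-difference discretization of −ψxx + uψ = λψ returns the same lowest eigenvalue at different times; the grid sizes below are assumptions of this sketch:

```python
import numpy as np

def lowest_eigenvalue(t, N=1200, L=30.0):
    # -psi'' + u psi = lambda psi with the soliton potential
    # u(x, t) = -2 sech^2(x - 4t), central differences on [-L/2, L/2]
    x = np.linspace(-L/2, L/2, N)
    h = x[1] - x[0]
    u = -2.0/np.cosh(x - 4*t)**2
    H = (np.diag(2.0/h**2 + u)
         + np.diag(-np.ones(N - 1)/h**2, 1)
         + np.diag(-np.ones(N - 1)/h**2, -1))
    return np.linalg.eigvalsh(H)[0]

lam0 = lowest_eigenvalue(0.0)
lam1 = lowest_eigenvalue(0.5)
assert abs(lam0 + 1.0) < 1e-3       # lambda = -kappa^2 = -1
assert abs(lam1 + 1.0) < 1e-3       # unchanged after the soliton has moved
```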
Given (7.15), one can show (do this!) that the compatibility condition

ψtxx = ψxxt

leads back to the KdV equation
found above. One sometimes says that the KdV equation is the compatibility condition for the
system (7.15), meaning that λ = const in (7.15) if and only if u satisfies the KdV equation.
Lψ = λψ,  ψt = Bψ,   (7.16)
where L is the linear and symmetric operator in the scattering problem, and B is the cor-
responding evolution operator. Now, assume that ∂t λ = 0. Taking the time-derivative of
Lψ = λ ψ, we then get that
0 = Lt ψ + Lψt − λ ψt
= Lt ψ + LBψ − λ Bψ
= Lt ψ + LBψ − Bλ ψ
= (Lt + LB − BL) ψ.
Since this holds for every eigenfunction ψ, one arrives at the Lax equation

Lt = [B, L],

where L and B are spatial, but time-dependent, operators on some Hilbert space X, and L is
linear and symmetric. We consider the eigenvalue problem

Lψ = λψ,  0 ≠ ψ ∈ X.
We assume also that L, B and the eigenfunctions of L are continuously differentiable with
respect to t. Just as above, we find that
Lt ψ + Lψt = λt ψ + λ ψt ,
whence
λt ψ = (L − λ)ψt + [B, L]ψ
 = (L − λ)ψt + Bλψ − LBψ
 = (L − λ)ψt + (λ − L)Bψ   (7.17)
 = (L − λ)(ψt − Bψ).
Hence, if λt = 0, then (L − λ)(ψt − Bψ) = 0, so that, if λ is a simple eigenvalue,

ψt − Bψ = f(t)ψ,
for some function f depending only on time. Since f commutes with the spatial operator L,
we find that B̃ := B + f satisfies both
Remark 7.7. Notice that this Lax pair appears already in the linear formulation (7.15) of the
KdV equation. This follows, since the scattering problem
λψ = Lψ = (u − ∂x²)ψ,
implies that
and
We see that
Lt = ut , and [B, L] = −uxxx + 6uux ,
whence the Lax equation Lt = [B, L] is just the KdV equation itself. This explains why the
eigenvalues are constant in time, as also explicitly calculated above.
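Remark 7.7 can be verified symbolically. The formula for B falls outside this excerpt; the sketch below uses the standard choice B = −4∂x³ + 6u∂x + 3ux (an assumption here) and checks with sympy that [B, L] acts as multiplication by 6uux − uxxx:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.Function('u')(x)
psi = sp.Function('psi')(x)

def L(f):                                    # L = -d^2/dx^2 + u
    return -sp.diff(f, x, 2) + u*f

def B(f):                                    # B = -4 d^3/dx^3 + 6u d/dx + 3u_x (assumed)
    return -4*sp.diff(f, x, 3) + 6*u*sp.diff(f, x) + 3*sp.diff(u, x)*f

commutator = sp.expand(B(L(psi)) - L(B(psi)))
multiplication = sp.expand((6*u*sp.diff(u, x) - sp.diff(u, x, 3))*psi)
assert sp.expand(commutator - multiplication) == 0
```

All third- and second-order derivatives of ψ cancel in the commutator, leaving a pure multiplication operator, so Lt = [B, L] is equivalent to ut − 6uux + uxxx = 0.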
Appendix A
Functional analysis
The basic idea of functional analysis is to consider functions (or other objects) as points or
vectors in a vector space (also called a linear space). The space should be equipped with some
topology, so that one can discuss things such as convergence and continuity. By proving results
based on the structure of the vector space and not on the specific objects under consideration,
one can deduce results which apply in many different concrete cases. In most cases discussed
in these notes, the topology is defined in terms of a norm. We assume that the reader is familiar
with the definition of a vector space. Throughout the chapter, if nothing else is said, X denotes
a vector space over the real or complex numbers. The letter K will denote R or C, depending
on whether X is real or complex. Since the idea is to give a brief overview, we will not give
complete proofs of all the statements. The reader will find the proofs in any book on functional
analysis, e.g. [16].
for all x, y ∈ X and λ ∈ K. The pair (X, k · k) is called a normed vector space.
We will sometimes also make use of functions k · k : X → [0, ∞) which satisfy conditions
(1) and (2) of the above definition, but not necessarily (3). Such a function is called a semi-
norm. Whenever we have a semi-norm k · kX on X, we can however obtain a normed vector
space by introducing an equivalence relation. We say that x ∼ y if ‖x − y‖X = 0. The triangle
inequality shows that ∼ really defines an equivalence relation. By the reverse triangle
inequality, we find that

|‖x‖X − ‖y‖X| ≤ ‖x − y‖X = 0
if x ∼ y. We can therefore define a norm ‖·‖Y by setting ‖[x]‖Y = ‖x‖X for the equivalence
class [x] ∈ Y containing the element x ∈ X (the above argument shows that it is independent of
the choice of representative). We leave it to the reader to verify that ‖·‖Y really is a norm on
Y.
Example A.2. The formula

‖u‖ = sup_{0≤x≤1} |u(x)|
defines a norm on C[0, 1], the space of continuous functions on the interval [0, 1]. It only defines
a semi-norm on C(R), since any function u which vanishes on the interval [0, 1] satisfies
‖u‖ = 0.
From now on we suppose that X is endowed with a norm k · k. Using the norm one can
define the open ball Br (x) of radius r > 0 around a point x ∈ X as Br (x) = {y ∈ X : kx−yk < r}.
Using these open balls one can define open and closed sets and continuous functions just as in
Rd .
Note that any two norms ‖·‖1 and ‖·‖2 on a finite-dimensional space are equivalent, in
the sense that there exist constants C1, C2 > 0 such that C1‖x‖1 ≤ ‖x‖2 ≤ C2‖x‖1. This implies
that the norms define the same topology, i.e. the same notion of open sets.
Of particular interest are linear operators between normed vector spaces. We have the
following basic result.
Proposition A.3. Let (X, ‖·‖X) and (Y, ‖·‖Y) be normed vector spaces and let T : X → Y be
a linear operator. The following are equivalent.

(1) T is continuous.
(2) T is continuous at 0.
(3) There exists a constant C > 0 such that

‖T x‖Y ≤ C‖x‖X

for all x ∈ X.
Due to (3), a continuous linear operator X → Y is also called a bounded linear operator.
The smallest possible constant in (3),

‖T‖ := sup_{‖x‖X=1} ‖T x‖Y = sup_{x∈X\{0}} ‖T x‖Y/‖x‖X,
is called the operator norm of T . By definition, we have kT xkY ≤ kT kkxkX . Note that the set
B(X,Y ) of bounded linear operators X → Y is a normed vector space when endowed with the
operator norm.
Definition A.4. The dual of X is the space of bounded linear functionals on X. It is denoted
X ∗.
The dual of a finite-dimensional space X can be identified with itself. Indeed, given a basis
{e1 , . . . , en } for X, any linear functional ` on X can be written
`(x1 e1 + · · · + xn en ) = c1 x1 + · · · + cn xn ,
This topology is important since it is often much easier to prove weak convergence than strong
convergence (convergence in norm). There are many cases where it is enough to have a weak
limit.
An important property which separates R from Q is its completeness. This can be expressed
in many ways, e.g. as the property that an increasing sequence which is bounded
from above always has a limit in R (this is not true in Q as can be seen e.g. by looking at
the decimal expansion of any irrational number). An equivalent formulation of this is that any
Cauchy sequence in R is convergent. This turns out to be the correct generalization. Recall
that a Cauchy sequence is a sequence {xn} satisfying ‖xn − xm‖ → 0 as n, m → ∞.
The basic example of a function space which is a Banach space is the space C(I) of continuous
real-valued functions on a compact interval I, endowed with the supremum norm. The
completeness property basically follows from the fact that R is complete and that uniform
limits of continuous functions are continuous. If I is not compact, one merely has to add the
condition sup_{t∈I} |u(t)| < ∞ to the definition. We denote the space of bounded continuous
real-valued functions on I by Cb(I). If I is compact we have that Cb(I) = C(I). This example
can be generalized in many ways. One example which we use repeatedly in the notes is the
following.
Example A.8. Let X be a Banach space and I a compact interval. Then the space C(I; X)
consisting of continuous functions u : I → X is a Banach space, with norm given by
kukC(I;X) = sup ku(t)kX .
t∈I
The proof of completeness is almost word for word the same as in case of real-valued func-
tions. One only has to use the completeness of X instead of the completeness of R. Similarly,
for any k ≥ 0, we introduce the space Ck (I; X) consisting of k times continuously differentiable
functions from I to X. Here, differentiability of u at t means that there exists an element v ∈ X
such that
lim_{h→0} (u(t + h) − u(t))/h = v
in X. Ck (I; X) is also a Banach space when endowed with the norm
kukCk (I;X) = max sup ku( j) (t)kX .
0≤ j≤k t∈I
We saw earlier that the dual of a finite-dimensional space can be identified with the space itself. In general this is not the case in infinite dimensions. However, for Hilbert spaces it is true. Note that any y ∈ X defines a bounded linear functional ℓ_y through the formula ℓ_y(x) = (x, y). The fact that this is a linear functional is obvious; the fact that it is bounded with norm ‖y‖ follows from the Cauchy–Schwarz inequality. Riesz' representation theorem asserts that the converse is also true.

Theorem A.12 (Riesz' representation theorem). Let (X, (·, ·)) be a Hilbert space. Given any bounded linear functional ℓ ∈ X*, there exists a unique element y = y_ℓ ∈ X such that

ℓ(x) = (x, y) for all x ∈ X.
Definition A.14. Let (X, ρ_X) and (Y, ρ_Y) be metric spaces. A function f : X → Y is called a contraction if there exists a constant θ ∈ [0, 1) such that

ρ_Y(f(x), f(y)) ≤ θ ρ_X(x, y)

for every x, y ∈ X.

Theorem A.15 (Banach's fixed point theorem). Let (X, ρ_X) be a complete metric space. If f : X → X is a contraction, then the fixed point equation

f(x) = x

has a unique solution x ∈ X.

Proof sketch. Pick any x_0 ∈ X and define the iterates x_{n+1} = f(x_n). Repeated application of the contraction property gives ρ_X(x_{n+1}, x_n) ≤ θ^n ρ_X(x_1, x_0), and hence, summing a geometric series, ρ_X(x_{n+m}, x_n) ≤ θ^n (1 − θ)^{−1} ρ_X(x_1, x_0) → 0 as n → ∞. It follows that {x_n} is a Cauchy sequence and thus has a limit x since X is complete. The fact that x is a solution of the fixed point problem follows by passing to the limit in the relation x_{n+1} = f(x_n) (note that a contraction is always continuous). Uniqueness follows since ρ_X(x, y) ≤ θ ρ_X(x, y) forces ρ_X(x, y) = 0 for any two fixed points x and y.
Note that any closed subset of a Banach space X is a complete metric space with the metric
induced by the norm on X. This is the setting in which we will use the above theorem.
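The iteration in the proof is easy to run numerically. As an illustration (a sketch, not part of the original notes), the following Python snippet applies the scheme x_{n+1} = f(x_n) to f = cos, which is a contraction on [cos 1, 1] since |cos'| = |sin| ≤ sin 1 < 1 there:

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Banach iteration x_{n+1} = f(x_n); stops when successive
    iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# The iteration converges to the unique solution of cos x = x,
# regardless of the starting point.
x = fixed_point(math.cos, 1.0)
print(x)                     # approximately 0.739085
print(abs(math.cos(x) - x))  # residual, essentially 0
```

Since the contraction constant here is θ ≈ 0.67, the error shrinks by a fixed factor per step, which is exactly the geometric decay used in the proof.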
A.4 Completions
A subset U of a metric space (X, ρ) is said to be dense in X if for every x ∈ X there exists a sequence {x_n} in U such that x_n → x. An example is that Q is dense in R. Whenever U is a subset of X, U will be dense in its closure Ū, defined as the smallest closed subset of X containing U (that such a set exists follows from the fact that intersections of closed sets are closed).
Integration
The Riemann integral is insufficient for many purposes in analysis. Specifically, it doesn't behave well with respect to limits. Suppose that {f_n} is an increasing sequence of non-negative Riemann integrable functions. Then both f(x) = lim_{n→∞} f_n(x) and lim_{n→∞} ∫ f_n dx exist (possibly as +∞), and it would be natural to expect that ∫ f dx = lim_{n→∞} ∫ f_n dx. However, f might not be Riemann integrable, as the following example illustrates.
Example B.1. Let {x_n}_{n=1}^∞ be an enumeration of the rational numbers in the interval [0, 1]. Define the functions f_n : [0, 1] → R by setting f_n(x) = 1 if x ∈ {x_1, . . . , x_n} and f_n(x) = 0 otherwise. Each f_n is 0 except at finitely many points, so it is Riemann integrable with ∫ f_n dx = 0. The sequence {f_n(x)} is increasing and bounded from above for each x, so it has a limit f(x). In fact, it is clear that f(x) is 1 at the rational numbers in [0, 1] and zero everywhere else. The function f is not Riemann integrable.
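To see concretely why f escapes the Riemann theory, note that every Riemann sum for f depends entirely on where the sample points are taken. The following Python sketch (an illustration, not part of the notes) encodes rationality of the sample points in their type, since floating-point numbers cannot distinguish rationals from irrationals:

```python
import math
from fractions import Fraction

def riemann_sum(f, n, pick):
    """Riemann sum of f over [0, 1] with n subintervals, sampling
    the k-th subinterval [k/n, (k+1)/n] at the point pick(k, n)."""
    return sum(f(pick(k, n)) for k in range(n)) / n

# f = indicator function of the rationals; a sample point counts
# as rational exactly when it is represented as a Fraction.
f = lambda x: 1 if isinstance(x, Fraction) else 0

# Rational sample points k/n give the value 1 for every n ...
rational_sum = riemann_sum(f, 1000, lambda k, n: Fraction(k, n))
# ... while irrational points k/n + sqrt(2)/(2n) give 0.
irrational_sum = riemann_sum(f, 1000, lambda k, n: k / n + math.sqrt(2) / (2 * n))
print(rational_sum, irrational_sum)  # 1.0 0.0
```

Since both the rationals and the irrationals are dense, every upper Darboux sum of f equals 1 and every lower sum equals 0, no matter how fine the partition: the Riemann integral cannot exist.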
As you know from calculus, integrals measure some form of area or volume. It is therefore
natural to start by discussing measure, that is, the generalization of length, area and volume
to arbitrary dimensions. By assigning a measure to the set of rational points in [0, 1], one
can integrate the function f in the previous example. There are other ways of introducing
the Lebesgue integral on Rd , e.g. by approximation with continuous functions of compact
support.
The main purpose of this appendix is to describe the construction of the Lebesgue integral
and to collect some results which are used throughout the text. Proofs of the theorems can be
found in any book on measure theory and real analysis, e.g. [8].
Firstly, we allow measures to take values in [0, ∞]; arithmetic in [0, ∞] is defined in the natural way, but an expression which cannot be interpreted in a natural way, such as ∞ − ∞, is undefined. Secondly, we will not be able to measure all subsets of R^d. This is not a big problem, since the class of sets we can measure is very large (e.g. it contains all open sets and all closed sets).
Definition B.2 (Sigma algebras). A collection Σ of subsets of R^d is called a σ-algebra if the following conditions hold:

(1) R^d ∈ Σ;

(2) if A ∈ Σ, then A^c ∈ Σ;

(3) if A_j ∈ Σ, j = 1, 2, . . ., then ∪_{j=1}^∞ A_j ∈ Σ.

It follows from the definition that

• ∅ = (R^d)^c ∈ Σ;

• ∩_{j=1}^∞ A_j = (∪_{j=1}^∞ A_j^c)^c ∈ Σ if A_j ∈ Σ for all j;

• A \ B = A ∩ B^c ∈ Σ if A, B ∈ Σ.
Definition B.3 (Measures). A measure on a σ-algebra Σ is a function μ : Σ → [0, ∞] which is countably additive, that is,

μ(∪_{j=1}^∞ A_j) = ∑_{j=1}^∞ μ(A_j)

whenever A_j ∈ Σ for all j and the sets A_j are pairwise disjoint (A_j ∩ A_k = ∅ if j ≠ k), and which satisfies μ(∅) = 0.

One can show from the definition that

• μ(A) ≤ μ(B) if A, B ∈ Σ and A ⊂ B;

• μ(∪_{j=1}^∞ A_j) ≤ ∑_{j=1}^∞ μ(A_j) if A_j ∈ Σ for all j;

• μ(∪_{j=1}^∞ A_j) = lim_{j→∞} μ(A_j) if A_j ∈ Σ and A_j ⊂ A_{j+1} for all j.
Theorem B.4 (Lebesgue measure). There exists a σ-algebra Σ of subsets of R^d and a measure μ on Σ such that

(1) every open set (and thus every closed set) in R^d belongs to Σ;

(2) if A ⊂ B and B ∈ Σ with μ(B) = 0, then A ∈ Σ with μ(A) = 0;

(3) if A = [a_1, b_1] × [a_2, b_2] × · · · × [a_d, b_d], then μ(A) = ∏_{j=1}^d (b_j − a_j).

We denote this measure | · | and call it the Lebesgue measure on R^d. A set in the above σ-algebra is called Lebesgue measurable. Since we will only deal with the Lebesgue measure, we usually drop the word 'Lebesgue' in what follows. The Lebesgue measure is a generalization of the intuitive idea of lengths in R, areas in R² and volumes in R³. It gives the usual value to sets such as polygons, balls etc., but it also allows us to measure 'wilder' sets and sets in dimensions higher than 3.
Definition B.5. A set of measure zero is called a null set. A property which holds everywhere
on Rd , except on a null set, is said to hold almost everywhere, abbreviated a.e.
Any countable set is a null set, in particular the set [0, 1] ∩ Q which appeared in Example
B.1.
Proposition B.7. If {f_j} is a sequence of measurable functions, then sup_j f_j, inf_j f_j, lim sup_{j→∞} f_j and lim inf_{j→∞} f_j are measurable.

A simple function is a finite linear combination s = ∑_{j=1}^n a_j χ_{A_j} of characteristic functions of measurable sets A_j ⊂ R^d, with a_j ∈ R for all j. The integral of a non-negative measurable function f : R^d → [0, ∞] is defined by

∫_{R^d} f dx = sup ∫_{R^d} s dx,

where the supremum is taken over all simple functions s satisfying 0 ≤ s(x) ≤ f(x) for all x.
Note that the integral of a non-negative function can have the value +∞. If it is finite, we say that the function is integrable. We allowed f to take the value +∞, but if it is integrable this cannot happen very often.

Proposition B.10. If f : R^d → [0, ∞] is integrable, then f(x) < ∞ a.e.
For a general measurable function f one writes f = f^+ − f^−, where f^± = max(±f, 0), and defines

∫_{R^d} f dx = ∫_{R^d} f^+ dx − ∫_{R^d} f^− dx,

provided that at least one of f^+ and f^− is integrable. If both f^+ and f^− are integrable, we say that f is integrable (some authors use the word summable). Note that this is equivalent to

∫_{R^d} |f(x)| dx < ∞,

so Lebesgue integrals are always absolutely convergent. The set of integrable functions on R^d is denoted L^1(R^d). This set is closed under linear combinations and hence a vector space with the usual pointwise operations.
Note that all the usual rules that you have learned from Riemann integration also apply to Lebesgue integrals, e.g. ∫(f + g) dx = ∫ f dx + ∫ g dx and |∫ f dx| ≤ ∫ |f| dx. Moreover, any Riemann integrable function is also Lebesgue integrable and the values of the two integrals coincide. The only thing one has to watch out for is generalized integrals. The Lebesgue integral automatically handles absolutely convergent generalized integrals, but one still needs a limiting procedure to take care of conditionally convergent generalized integrals, such as ∫_R (sin x)/x dx. One thing which is much easier for the Lebesgue integral than for the Riemann integral is iterated integration.
Theorem B.11 (Fubini–Tonelli theorem¹). Let f be a measurable function on R^{n+d} and suppose that at least one of the integrals

I_1 = ∫_{R^{n+d}} |f(x, y)| dx dy,
I_2 = ∫_{R^d} ( ∫_{R^n} |f(x, y)| dx ) dy,
I_3 = ∫_{R^n} ( ∫_{R^d} |f(x, y)| dy ) dx

exists and is finite. For I_2 and I_3 this means that the inner integral exists a.e. and defines an integrable function. Then all the integrals exist and I_1 = I_2 = I_3. Moreover,

(1) f(·, y) ∈ L^1(R^n) for a.e. y ∈ R^d and y ↦ ∫_{R^n} f(x, y) dx belongs to L^1(R^d);

(2) f(x, ·) ∈ L^1(R^d) for a.e. x ∈ R^n and x ↦ ∫_{R^d} f(x, y) dy belongs to L^1(R^n);

(3) ∫_{R^{n+d}} f(x, y) dx dy = ∫_{R^d} ( ∫_{R^n} f(x, y) dx ) dy = ∫_{R^n} ( ∫_{R^d} f(x, y) dy ) dx.
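The discrete analogue of the theorem — an absolutely summable double series can be summed in either order — is easy to test. A small Python check (illustrative; the array is an ad hoc choice):

```python
# Discrete analogue of Fubini-Tonelli: for a non-negative (hence
# absolutely summable) double array the two iterated sums agree.
# Here a[j][k] = 2^-j * 3^-k, so the full sum is close to 2 * (3/2) = 3.
n = 50
a = [[0.5 ** j * 3.0 ** -k for k in range(n)] for j in range(n)]

rows_first = sum(sum(row) for row in a)                             # sum_j sum_k
cols_first = sum(sum(a[j][k] for j in range(n)) for k in range(n))  # sum_k sum_j

print(abs(rows_first - cols_first))  # essentially 0
print(rows_first)                    # approximately 3
```

Non-negativity plays the role of the hypothesis on I_1, I_2, I_3: without some absolute summability the two orders of summation can genuinely disagree.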
Theorem (Dominated convergence theorem). Suppose that {f_n} is a sequence of measurable functions such that f_n → f a.e. If there exists g ∈ L^1(R^d) such that |f_n| ≤ g a.e. for every n, then f ∈ L^1(R^d) and lim_{n→∞} ∫_{R^d} f_n dx = ∫_{R^d} f dx.
¹ The first part is due to Tonelli and the second to Fubini. For simplicity we will mostly refer to the theorem as Fubini's theorem.
Using the dominated convergence theorem one can prove the following important results about the dependence of integrals on parameters. Let f = f(x, y) with f(x, ·) measurable on R^d for each x, and set F(x) = ∫_{R^d} f(x, y) dy.

(1) Suppose that there exists g ∈ L^1(R^d) such that |f(x, y)| ≤ g(y) for all x, y. If lim_{x→x_0} f(x, y) = f(x_0, y) for every y, then lim_{x→x_0} F(x) = F(x_0). In particular, if f(·, y) is continuous for each y, then F is continuous.

(2) Suppose that ∂_{x_j} f(x, y) exists for all x, y and that there exists g ∈ L^1(R^d) such that |∂_{x_j} f(x, y)| ≤ g(y) for all x. Then ∂_{x_j} F(x) exists and is given by

∂_{x_j} F(x) = ∫_{R^d} ∂_{x_j} f(x, y) dy.
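Result (2) can be sanity-checked numerically by comparing a difference quotient of F with the integral of the derivative of the integrand. A Python sketch (illustrative; the integrand is an arbitrary choice and both integrals are midpoint-rule approximations):

```python
import math

def midpoint_integral(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

f  = lambda x, y: math.exp(-x * y * y)           # integrand f(x, y)
fx = lambda x, y: -y * y * math.exp(-x * y * y)  # its partial derivative in x

F = lambda x: midpoint_integral(lambda y: f(x, y), 0.0, 1.0)

x0, dx = 1.0, 1e-5
difference_quotient = (F(x0 + dx) - F(x0 - dx)) / (2 * dx)
derivative_inside = midpoint_integral(lambda y: fx(x0, y), 0.0, 1.0)
print(abs(difference_quotient - derivative_inside))  # small: the two agree
```

Here g(y) = y² plays the role of the dominating function on, say, x ∈ [1/2, 2], so the hypotheses of (2) are satisfied.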
B.4 L^p spaces

If p ∈ [1, ∞) and f is a measurable complex-valued function on R^d, we define

‖f‖_{L^p(R^d)} = ( ∫_{R^d} |f|^p dx )^{1/p}.

We will also write ‖f‖_{L^p} if the dimension d is clear from the context. We also define

‖f‖_{L^∞(R^d)} = ess sup_{x∈R^d} |f(x)|,

where the essential supremum of a measurable function g is defined as the minimal c ∈ [−∞, ∞] such that g(x) ≤ c a.e. Hölder's inequality states that if p and q are conjugate exponents (1/p + 1/q = 1, p, q ∈ [1, ∞]), then

‖fg‖_{L^1} ≤ ‖f‖_{L^p} ‖g‖_{L^q}.

In particular, fg ∈ L^1 if f ∈ L^p and g ∈ L^q.
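Hölder's inequality also holds for sums (counting measure), which makes it easy to check numerically. A Python sketch (illustrative; the vectors and the exponent p = 3 are arbitrary choices):

```python
import random

def lp_norm(v, p):
    """l^p norm of a vector (counting-measure analogue of the L^p norm)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(0)
f = [random.uniform(-1.0, 1.0) for _ in range(1000)]
g = [random.uniform(-1.0, 1.0) for _ in range(1000)]

p = 3.0
q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1
lhs = sum(abs(a * b) for a, b in zip(f, g))  # ||fg||_1
rhs = lp_norm(f, p) * lp_norm(g, q)          # ||f||_p ||g||_q
print(lhs <= rhs)  # True

# Equality holds when |g| is proportional to |f|^(p-1):
g2 = [abs(a) ** (p - 1.0) for a in f]
lhs2 = sum(abs(a * b) for a, b in zip(f, g2))
rhs2 = lp_norm(f, p) * lp_norm(g2, q)
print(abs(lhs2 - rhs2) / rhs2)  # essentially 0
```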
Minkowski's inequality for integrals states that if f is measurable on R^{n+d} and y ↦ ‖f(·, y)‖_{L^p(R^n)} is integrable, then the function x ↦ ∫_{R^d} f(x, y) dy belongs to L^p(R^n) and

‖ ∫_{R^d} f(·, y) dy ‖_{L^p(R^n)} ≤ ∫_{R^d} ‖f(·, y)‖_{L^p(R^n)} dy.
Proof. After splitting the function f into real and imaginary parts, and then splitting these into positive and negative parts, we can assume that f ≥ 0. For p = 1 the result holds with equality by Fubini's theorem. If p > 1 we write

( ∫_{R^d} f(x, y) dy )^p = ( ∫_{R^d} f(x, y) dy ) w(x),

where w(x) = ( ∫_{R^d} f(x, y) dy )^{p−1}. By Fubini's theorem and Hölder's inequality (with 1/p + 1/q = 1), it follows that

∫_{R^n} ( ∫_{R^d} f(x, y) dy ) w(x) dx = ∫_{R^d} ( ∫_{R^n} f(x, y) w(x) dx ) dy
  ≤ ∫_{R^d} ( ∫_{R^n} f(x, y)^p dx )^{1/p} ( ∫_{R^n} w(x)^q dx )^{1/q} dy
  = ( ∫_{R^d} ( ∫_{R^n} f(x, y)^p dx )^{1/p} dy ) ( ∫_{R^n} ( ∫_{R^d} f(x, y) dy )^p dx )^{1/q},

where we used that w^q = ( ∫_{R^d} f(x, y) dy )^p since (p − 1)q = p. The result follows after dividing by the second factor in the right-hand side.
Recall that the support of a continuous function f is defined as supp f, the closure of the set {x ∈ R^d : f(x) ≠ 0}. We denote by C_0(R^d) the set of continuous functions on R^d with compact support (the support is always closed, so the only condition is that it should be bounded). If f is only measurable, one instead defines the essential support, ess supp f, as the smallest closed set outside of which f = 0 a.e. We will use the notation supp f for ess supp f when f is not continuous. The fact that L^p(R^d) is the completion of C_0(R^d) now follows from the following theorem.

Theorem B.21. C_0(R^d) is dense in L^p(R^d) for 1 ≤ p < ∞.
Note that Hölder’s inequality implies that a function g ∈ Lq defines a bounded linear func-
tional on L p , Z
`g ( f ) = f g dx, (B.1)
Rd
where p and q are conjugate indices ( 1p + 1q = 1). Moreover, k`g k ≤ kgkq .
Theorem B.22. For 1 ≤ p < ∞, the dual of L p (Rd ) can be identified with Lq (Rd ), where p
and q are conjugate indices. That is, the map g 7→ `g from Lq (Rd ) → (Lq (Rd ))∗ defined by
(B.1) is an isometric isomorphism.
In the case p = 2 this is a special case of Riesz’ representation theorem, at least if one replaces
g by g.
The convolution of two measurable functions f and g on R^d is defined by

(f ∗ g)(x) = ∫_{R^d} f(x − y) g(y) dy

whenever this makes sense. From the definition it follows that (f ∗ g)(x) = (g ∗ f)(x) a.e. By Hölder's inequality, we obtain that f ∗ g ∈ L^∞ if f ∈ L^p and g ∈ L^q with 1/p + 1/q = 1, p, q ∈ [1, ∞]. Moreover,

‖f ∗ g‖_{L^∞} ≤ ‖f‖_{L^p} ‖g‖_{L^q}.    (B.2)
Lemma B.24. Let f(t) = e^{−1/t} for t > 0 and f(t) = 0 for t ≤ 0. Then f is C^∞ on R with support [0, ∞). Indeed, for t > 0 all the derivatives are polynomials in 1/t times e^{−1/t}, and t^a e^{−1/t} → 0 as t → 0+ for any a ∈ R. A non-negative function ϕ ∈ C_0^∞(R^d) with support in the closed unit ball can e.g. be defined as ϕ(x) = f(1 − |x|²).

We can now construct a smooth 'approximation of the identity' as follows. After dividing ϕ in Lemma B.24 by ∫_{R^d} ϕ dx, we can assume that ∫_{R^d} ϕ dx = 1. We set J = ϕ and J_ε(x) = ε^{−d} J(ε^{−1}x), so that ∫_{R^d} J_ε dx = 1 for every ε > 0 (change of variables). One can think of J_ε as converging to a 'unit mass' at the origin. This can be made sense of using distribution theory.
Generally, we call a function J ∈ C_0^∞(R^d) with J ≥ 0 and ∫_{R^d} J dx = 1 a mollifier. We set J_ε(x) = ε^{−d} J(ε^{−1}x). The convolution

(J_ε ∗ f)(x) = ∫_{R^d} J_ε(x − y) f(y) dy = ∫_{R^d} J_ε(y) f(x − y) dy

is defined for f ∈ L^1_loc(R^d) and is called the mollification or regularization of f. (J_ε ∗ f)(x) can be thought of as an average of f over a small neighborhood of x.
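The averaging effect can be seen numerically. The following Python sketch (an illustration, not from the notes; the kernel is discretized with the midpoint rule in one dimension) mollifies the step function; away from the jump, J_ε ∗ f agrees with f as soon as ε is smaller than the distance to the jump:

```python
import math

n = 4000
h = 2.0 / n
mid = [-1.0 + (k + 0.5) * h for k in range(n)]  # midpoints of [-1, 1]

def J(x):
    """Standard bump kernel, supported on (-1, 1) (unnormalized)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

Z = h * sum(J(y) for y in mid)  # normalizing constant: (1/Z) J has integral 1

def mollify(f, x, eps):
    """(J_eps * f)(x) = (1/Z) * integral of J(y) f(x - eps*y) dy, midpoint rule."""
    return (h / Z) * sum(J(y) * f(x - eps * y) for y in mid)

step = lambda x: 1.0 if x >= 0 else 0.0  # discontinuous at x = 0

print(mollify(step, 0.5, eps=0.1))   # ~1.0: support of J_eps misses the jump
print(mollify(step, -0.5, eps=0.1))  # ~0.0
print(mollify(step, 0.0, eps=0.1))   # ~0.5, by symmetry of J
```

At the jump itself the mollification interpolates smoothly between the two values, which is exactly the local-averaging picture described above.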
Proof. The first statement follows by repeated differentiation under the integral sign. The second statement follows since supp(J_ε ∗ f) ⊂ supp f + supp J_ε and both f and J_ε have compact supports.
We have that

(J_ε ∗ f)(x) − f(x) = ∫_{R^d} J_ε(y)(f(x − y) − f(x)) dy = ∫_{supp J} J(y)(f(x − εy) − f(x)) dy.
Proof. This follows by approximating f ∈ L^p(R^d) with a function g ∈ C_0(R^d) using Theorem B.21 and then approximating g with a function in C_0^∞(R^d) using Theorem B.25.
Theorem B.27. Suppose that K ∈ L^1(R^d) with ∫_{R^d} K(x) dx = 1, and define K_ε(x) = ε^{−d} K(ε^{−1}x) for ε > 0.

(1) If f is bounded and continuous, then K_ε ∗ f → f uniformly on compact sets as ε → 0+; if f is in addition uniformly continuous, the convergence is uniform on R^d.

(2) If f ∈ L^p(R^d), 1 ≤ p < ∞, then K_ε ∗ f → f in L^p(R^d) as ε → 0+.
Proof. (1) For R > 0 write

(K_ε ∗ f)(x) − f(x) = ∫_{|y|≤R} K(y)(f(x − εy) − f(x)) dy + ∫_{|y|>R} K(y)(f(x − εy) − f(x)) dy.

The second term is bounded by 2 sup |f| ∫_{|y|>R} |K(y)| dy, which tends to 0 as R → ∞. To estimate the first term, we note that lim_{ε→0+}(f(x − εy) − f(x)) = 0 uniformly for |y| ≤ R and x in a compact subset of R^d. It therefore follows that the first integral converges to 0 locally uniformly in x for fixed R. The result is obtained by combining these two estimates. The second part is obvious because the estimate for the first term is now uniform over R^d.
(2) Let δ > 0. Approximate f by g ∈ C_0(R^d) with ‖f − g‖_{L^p} < δ and K by J ∈ C_0(R^d) with ‖K − J‖_{L^1} < δ. We then have

‖K_ε ∗ f − f‖_{L^p} ≤ ‖K_ε ∗ (f − g)‖_{L^p} + ‖(K_ε − J_ε) ∗ g‖_{L^p} + ‖J_ε ∗ g − g‖_{L^p} + ‖g − f‖_{L^p},

where

‖K_ε ∗ (f − g)‖_{L^p} ≤ ‖K_ε‖_{L^1} ‖f − g‖_{L^p} ≤ δ ‖K‖_{L^1}

and ‖(K_ε − J_ε) ∗ g‖_{L^p} ≤ ‖K − J‖_{L^1} ‖g‖_{L^p} ≤ δ ‖g‖_{L^p}, while

‖J_ε ∗ g − g‖_{L^p} → 0

as ε → 0+, since the supports of the functions J_ε ∗ g are contained in a fixed compact set. Since δ > 0 is arbitrary, the result follows.
B.6 Interpolation

If E has finite measure, it follows from Hölder's inequality that L^{p_2}(E) ⊂ L^{p_1}(E) if p_1 < p_2. This means that L^{p_2}_loc(R^d) ⊂ L^{p_1}_loc(R^d) (a higher exponent means less singular behavior locally). However, at infinity the inclusion is reversed in the sense that L^∞(R^d) ∩ L^{p_1}(R^d) ⊂ L^{p_2}(R^d) (a smaller exponent means more decay). In fact, more generally, we have that f ∈ L^{p_1} ∩ L^{p_2} ⇒ f ∈ L^p for any p ∈ (p_1, p_2). This follows from Hölder's inequality:
‖f‖_{L^p}^p = ∫_{R^d} |f|^p dx
  = ∫_{R^d} |f|^{θp_1} |f|^{(1−θ)p_2} dx
  ≤ ( ∫_{R^d} |f|^{p_1} dx )^θ ( ∫_{R^d} |f|^{p_2} dx )^{1−θ}    (B.4)
  = ‖f‖_{L^{p_1}}^{p_1 θ} ‖f‖_{L^{p_2}}^{p_2 (1−θ)},

where p = θ p_1 + (1 − θ)p_2, θ ∈ (0, 1), and where Hölder's inequality is applied with the conjugate exponents 1/θ and 1/(1 − θ).
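Inequality (B.4) holds verbatim with sums in place of integrals, so it can be checked numerically. A Python sketch (illustrative; the vector and the exponents are ad hoc choices):

```python
import random

def lp_norm(v, p):
    """l^p norm of a vector (counting-measure analogue of ||.||_{L^p})."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(1)
f = [random.uniform(-2.0, 2.0) for _ in range(500)]

p1, p2, theta = 1.0, 4.0, 0.5
p = theta * p1 + (1.0 - theta) * p2  # p = 2.5 lies between p1 and p2

lhs = lp_norm(f, p) ** p             # ||f||_p^p
rhs = lp_norm(f, p1) ** (p1 * theta) * lp_norm(f, p2) ** (p2 * (1.0 - theta))

print(lhs <= rhs)  # True: the interpolation inequality (B.4) holds
```

Equality in (B.4) would require |f| to be (a.e.) constant, so for a random vector the inequality is strict with a comfortable margin.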
Suppose that T is a linear map defined on a dense subspace X ⊂ L^{p_1}. Suppose also that X is dense in L^{p_2} and that T is bounded from X ⊂ L^{p_j} to L^{r_j} in the sense that T f ∈ L^{r_j} and

‖T f‖_{L^{r_j}} ≤ C_j ‖f‖_{L^{p_j}},   f ∈ X.

Note that this means that T extends uniquely to a bounded operator T : L^{p_j} → L^{r_j} (we abuse notation by using the same letter for the extension). Indeed, if X ∋ x_n → x in L^{p_j}, then {T x_n} is a Cauchy sequence in L^{r_j}. Since L^{r_j} is complete we can define T x = lim_{n→∞} T x_n. This operator is bounded (by the same constant) and is the only bounded extension.

One can also extend T to any L^p with p ∈ (p_1, p_2). Indeed, given f ∈ L^p, let A = {x : |f(x)| > 1} and write f = f_1 + f_2, where f_1 = χ_A f and f_2 = χ_{A^c} f. Then f_1 ∈ L^{p_1} since A has finite measure if f ∈ L^p, and f_2 ∈ L^{p_2} since f_2 ∈ L^p ∩ L^∞. Hence we can define

T f = T f_1 + T f_2 ∈ L^{r_1} + L^{r_2}.

The next result shows that in fact T f ∈ L^r for some r between r_1 and r_2.
Theorem B.28 (Riesz–Thorin interpolation theorem). Let T be as above and let

1/p = θ/p_1 + (1 − θ)/p_2  and  1/r = θ/r_1 + (1 − θ)/r_2

for some θ ∈ (0, 1). Then T : L^p → L^r is bounded with

‖T f‖_{L^r} ≤ C_1^θ C_2^{1−θ} ‖f‖_{L^p}.
Theorem B.29 (Young's inequality). Suppose that p, q, r ∈ [1, ∞] and 1/p + 1/q = 1 + 1/r. If f ∈ L^p and g ∈ L^q, then f ∗ g ∈ L^r with

‖f ∗ g‖_{L^r} ≤ ‖f‖_{L^p} ‖g‖_{L^q}.

Proof. Let q and g be fixed. The special case p = 1, r = q follows from Minkowski's inequality for integrals. The case p = q/(q − 1), r = ∞ follows from Hölder's inequality. The general case now follows from the Riesz–Thorin theorem.
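Young's inequality also holds for sequences on Z with the discrete convolution, which gives a quick numerical check. A Python sketch (illustrative; the sequences and the exponents p = q = 4/3, for which Young's relation forces r = 2, are arbitrary choices):

```python
import random

def lp_norm(v, p):
    """l^p norm of a sequence (counting-measure analogue of ||.||_{L^p})."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def convolve(f, g):
    """Full discrete convolution (f*g)[n] = sum_k f[k] g[n-k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

random.seed(2)
f = [random.uniform(-1.0, 1.0) for _ in range(200)]
g = [random.uniform(-1.0, 1.0) for _ in range(300)]

p = q = 4.0 / 3.0
r = 2.0  # from 1/p + 1/q = 1 + 1/r
lhs = lp_norm(convolve(f, g), r)
rhs = lp_norm(f, p) * lp_norm(g, q)
print(lhs <= rhs)  # True: ||f*g||_r <= ||f||_p ||g||_q
```

The limiting cases are familiar: p = q = 1 gives r = 1 (convolution maps l¹ × l¹ to l¹), and conjugate p, q give r = ∞, which is inequality (B.2).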
Bibliography
[1] R. A. Adams and J. J. F. Fournier, Sobolev spaces, vol. 140 of Pure and Applied Mathematics (Amsterdam), Elsevier/Academic Press, Amsterdam, second ed., 2003.

[3] J. Boussinesq, Théorie des ondes et des remous qui se propagent le long d'un canal rectangulaire horizontal, en communiquant au liquide contenu dans ce canal des vitesses sensiblement pareilles de la surface au fond, Liouville J., 2 (1872), pp. 55–109.

[4] A. Bressan, Hyperbolic systems of conservation laws, vol. 20 of Oxford Lecture Series in Mathematics and its Applications, Oxford University Press, Oxford, 2000. The one-dimensional Cauchy problem.

[8] G. B. Folland, Real analysis, Pure and Applied Mathematics (New York), John Wiley & Sons Inc., New York, second ed., 1999. Modern techniques and their applications, A Wiley-Interscience Publication.

[10] H. Holden and N. H. Risebro, Front tracking for hyperbolic conservation laws, vol. 152 of Applied Mathematical Sciences, Springer-Verlag, New York, 2002.
[13] L. Hörmander, The analysis of linear partial differential operators. II, Classics in Mathematics, Springer-Verlag, Berlin, 2005. Differential operators with constant coefficients, Reprint of the 1983 original.

[16] P. D. Lax, Functional analysis, Pure and Applied Mathematics (New York), Wiley-Interscience [John Wiley & Sons], New York, 2002.

[20] J. Rauch, Partial differential equations, vol. 128 of Graduate Texts in Mathematics, Springer-Verlag, New York, 1991.

[21] P. I. Richards, Shock waves on the highway, Operations Res., 4 (1956), pp. 42–51.

[22] J. S. Russell, Report on Waves, Report of the fourteenth meeting of the British Association for the Advancement of Science, (1844), pp. 311–390, plates XLVII–LVII.

[25] R. S. Strichartz, A guide to distribution theory and Fourier transforms, River Edge, NJ: World Scientific, 2003.

[26] C. Sulem and P.-L. Sulem, The nonlinear Schrödinger equation. Self-focusing and wave collapse, Applied Mathematical Sciences 139, New York, NY: Springer, 1999.
[27] T. Tao, Nonlinear dispersive equations. Local and global analysis, CBMS Regional Conference Series in Mathematics 106, Providence, RI: American Mathematical Society (AMS), 2006.

[28] G. B. Whitham, Linear and nonlinear waves, Pure and Applied Mathematics (New York), John Wiley & Sons Inc., New York, 1974.