
M5A44 COMPUTATIONAL STOCHASTIC PROCESSES

Professor G.A. Pavliotis
Department of Mathematics, Imperial College London, UK
g.pavliotis@imperial.ac.uk

WINTER TERM 2014-2015


14/01/2015


Lectures: Mondays, 11:00-13:00, Wednesdays 12:00-13:00, Huxley 658.
Office Hours: Mondays 14:00-15:00, Wednesdays 14:00-15:00 or by appointment. Huxley 621.
Course webpage: http://www.ma.imperial.ac.uk/~pavl/comp_stoch_proc.htm
Text: Lecture notes, available from the course webpage. Also, recommended reading from various textbooks/review articles.


This is an introductory course on computational stochastic processes, aimed towards 4th year, MSc and MRes students in applied mathematics and theoretical physics.
Prior knowledge of basic stochastic processes in continuous time, scientific computing and a programming language such as Matlab or C is assumed.


PART I: INTRODUCTION

Random number generators.
Random variables, probability distribution functions.
Introduction to Monte Carlo techniques.
Variance reduction techniques.

2. SIMULATION OF STOCHASTIC PROCESSES

Introduction to stochastic processes.
Brownian motion and related stochastic processes and their simulation.
Gaussian stochastic processes.
Karhunen-Loève expansion and simulation of Gaussian random fields.
Numerical solution of ODEs and PDEs with random coefficients.


3. NUMERICAL SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS

Basic properties of stochastic differential equations.
Itô's formula and stochastic Taylor expansions.
Examples of numerical methods: the Euler-Maruyama and Milstein schemes.
Theoretical issues: convergence, consistency, stability. Weak versus strong convergence.
Implicit schemes and stiff SDEs.


4. ERGODIC DIFFUSION PROCESSES

Deterministic methods: numerical solution of the Fokker-Planck equation.
Numerical calculation of the invariant measure/steady state simulation.
Calculation of expectation values, the Feynman-Kac formula.

5. MARKOV CHAIN MONTE CARLO (MCMC)

The Metropolis-Hastings and MALA algorithms.
Diffusion limits for MCMC algorithms.
Peskun and Tierney orderings and Markov chain comparison.
Optimization of MCMC algorithms.
Bias correction and variance reduction techniques.


6. STATISTICAL INFERENCE FOR DIFFUSION PROCESSES

Estimation of the diffusion coefficient (volatility).
Maximum likelihood estimation of drift coefficients in SDEs.
Nonparametric and Bayesian techniques.

7. APPLICATIONS

Probabilistic methods for the numerical solution of partial differential equations.
Exit time problems.
Molecular dynamics/computational statistical mechanics.
Molecular motors, stochastic resonance.


Prerequisites
Basic knowledge of ODEs and PDEs.
Familiarity with the theory of stochastic processes.
Stochastic differential equations.
Basic knowledge of functional analysis and stochastic analysis.
Numerical methods and scientific computing.
Familiarity with a programming language: Matlab, C, Fortran...


Assessment
Based on 1 project (25%) and a final exam (75%).
Mastery exam: additional project/paper to study and write a report.
The project will be (mostly) computational.
The exam will be theoretical (e.g. analysis of algorithms).


Bibliography
Lecture notes will be provided for all the material that we will cover in this course. They will be posted on the course webpage.
Books that cover (parts of) the contents of this course are:

G.A. Pavliotis, Stochastic Processes and Applications, Springer (2014).
Kloeden & Platen, Numerical Solution of Stochastic Differential Equations, Springer (1992).
S.M. Iacus, Simulation and Inference for Stochastic Differential Equations, Springer (2008).
Asmussen & Glynn, Stochastic Simulation, Springer (2007).
Huynh, Lai & Soumare, Stochastic Simulation and Applications in Finance with Matlab Programs, Wiley (2008).


Lectures
Slides and whiteboard.
Computer experiments in Matlab.


Why Computational Stochastic Processes?

Many models in science, engineering and economics are probabilistic in nature and we have to deal with uncertainty.
Many deterministic problems can be solved more efficiently using probabilistic techniques.


Deriving dynamical models from paleoclimatic records

F. Kwasniok and G. Lohmann, Phys. Rev. E 80(6), 066104 (2009).


Fit this data to a bistable SDE

    dx = -V'(x; a) \, dt + \sigma \, dW,  \quad  V(x) = \sum_{j=1}^{4} a_j x^j.   (1)

Estimate the coefficients in the drift from the paleoclimatic data using the unscented Kalman filter.
The resulting potential is highly asymmetric.


Computational Statistical Physics

Sample from the Boltzmann distribution:

    \pi(x) = \frac{1}{Z} e^{-\beta U(x)},   (2)

where \beta = (k_B T)^{-1} is the inverse temperature and the normalization constant Z is the partition function

    Z = \int_{\mathbb{R}^{3k}} e^{-\beta U(x)} \, dx.   (3)

x denotes the configuration of the system, x = \{x_i : i = 1, \ldots, k\}, x_i = (x_{i1}, x_{i2}, x_{i3}).
The potential U(x) is a sum of pairwise interactions

    U(x) = \sum_{i,j} \Phi(|x_i - x_j|),  \quad  \Phi(r) = 4\epsilon \left( \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right).

We need to be able to calculate Z(\beta), which is a very high dimensional integral.
We want to be able to calculate expectations with respect to \pi(x):

    \mathbb{E}_\pi f(X) = \int_{\mathbb{R}^{3k}} f(x) \pi(x) \, dx.

One method for doing this is by simulating a diffusion process (MCMC):

    dX_t = -\nabla U(X_t) \, dt + \sqrt{2\beta^{-1}} \, dW_t.
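As an illustration, the following is a minimal Matlab sketch (not from the original slides) of sampling from \pi(x) \propto e^{-\beta U(x)} in one dimension by discretizing this diffusion with an explicit Euler step; the potential U and all parameter values are illustrative choices.

% Minimal sketch: sample exp(-beta*U(x)) via the (unadjusted) Langevin SDE
% dX = -U'(X) dt + sqrt(2/beta) dW, discretized with an Euler step.
% U, beta, dt and the sample size are illustrative choices.
beta = 1; dt = 1e-3; N = 1e6;
U = @(x) x.^4/4 - x.^2/2;        % example bistable potential
gradU = @(x) x.^3 - x;           % its derivative
X = zeros(1,N); X(1) = 0;
for n = 2:N
    X(n) = X(n-1) - gradU(X(n-1))*dt + sqrt(2*dt/beta)*randn;
end
hist(X,50);                      % histogram approximates exp(-beta*U)/Z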


Some interesting questions

How many experiments should we do?
How do we compute expectations with respect to stationary distributions?
Can we exploit the problem structure to speed up the computation?
How do we efficiently compute probabilities of rare events?
How do we estimate the sensitivity of a stochastic model to changes in a parameter?
How do we use stochastic simulation to optimize our choice of decision parameters?


Example: Brownian motion in a periodic potential

Consider the Langevin dynamics in a smooth periodic potential

    \ddot{q} = -\nabla V(q) - \gamma \dot{q} + \sqrt{2\gamma\beta^{-1}} \, \dot{W},   (4)

where W(t) denotes standard d-dimensional Brownian motion, \gamma > 0 the friction coefficient and \beta the inverse temperature (strength of the noise).
Write as a first order system:

    dq_t = p_t \, dt,   (5a)
    dp_t = -\nabla_q V(q_t) \, dt - \gamma p_t \, dt + \sqrt{2\gamma\beta^{-1}} \, dW_t.   (5b)

At long times the solution q_t performs an effective Brownian motion with diffusion coefficient D: \mathbb{E} q_t^2 \approx 2Dt, t \gg 1.
We compute the diffusion coefficient for V(q) = \cos(q) using the formula

    D = \lim_{t \to \infty} \frac{\mathrm{Var}(q_t)}{2t},

as a function of \gamma, at a fixed temperature \beta^{-1}.
The SDE (5) has to be solved numerically using an appropriate scheme.
The numerical simulations suggest that

    D \approx \frac{1}{\gamma},   (6)

for both \gamma \ll 1 and \gamma \gg 1.

Figure: Variance and effective diffusion coefficient for various values of \gamma. (a) Var(q_t)/2t vs t; (b) D vs \gamma.


The One-Dimensional Random Walk

We let time be discrete, i.e. t = 0, 1, \ldots. Consider the following stochastic process S_n:
S_0 = 0;
at each time step it moves to \pm 1 with equal probability 1/2.

In other words, at each time step we flip a fair coin. If the outcome is heads, we move one unit to the right. If the outcome is tails, we move one unit to the left.

Alternatively, we can think of the random walk as a sum of independent random variables:

    S_n = \sum_{j=1}^{n} X_j,

where X_j \in \{-1, 1\} with P(X_j = \pm 1) = 1/2.



We can simulate the random walk on a computer:

We need a (pseudo)random number generator to generate n independent random variables which are uniformly distributed in the interval [0,1].
If the value of the random variable is > 1/2 then the particle moves to the left, otherwise it moves to the right.
We then take the sum of all these random moves.
The sequence \{S_n\}_{n=1}^{N} indexed by the discrete time T = \{1, 2, \ldots, N\} is the path of the random walk. We use a linear interpolation (i.e. connect the points \{n, S_n\} by straight lines) to generate a continuous path; a sketch is given below.
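A minimal Matlab sketch (not part of the original slides; the number of steps and paths are illustrative choices):

% Minimal sketch: simulate and plot paths of the random walk.
% N (number of steps) and M (number of paths) are illustrative choices.
N = 50; M = 3;
X = 2*(rand(N,M) > 0.5) - 1;   % i.i.d. steps in {-1,+1} from U(0,1) samples
S = [zeros(1,M); cumsum(X)];   % paths, starting from S_0 = 0
plot(0:N, S);                  % linear interpolation between the points (n, S_n)
xlabel('n'); ylabel('S_n');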


Figure: Three paths of the random walk of length N = 50.



Figure: Three paths of the random walk of length N = 1000.



Every path of the random walk is different: it depends on the outcome of a sequence of independent random experiments.
We can compute statistics by generating a large number of paths and computing averages. For example, \mathbb{E}(S_n) = 0, \mathbb{E}(S_n^2) = n.
The paths of the random walk (without the linear interpolation) are not continuous: the random walk has a jump of size 1 at each time step.
This is an example of a discrete time, discrete space stochastic process.
The random walk is a time-homogeneous (the probabilistic law of evolution is independent of time) Markov (the future depends only on the present and not on the past) process.
If we take a large number of steps, the random walk starts looking like a continuous time process with continuous paths.


Consider the sequence of continuous time stochastic processes

    Z_t^n := \frac{1}{\sqrt{n}} S_{\lfloor nt \rfloor}.

In the limit as n \to \infty, the sequence \{Z_t^n\} converges (in some appropriate sense) to a Brownian motion with diffusion coefficient D = \frac{\Delta x^2}{2 \Delta t} = \frac{1}{2}.
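To visualize this invariance principle one can plot the rescaled walk for a large n; a minimal Matlab sketch (not from the original slides; n is an illustrative choice):

% Minimal sketch: rescaled random walk Z_t^n = S_{floor(nt)}/sqrt(n),
% which for large n looks like a Brownian path. n is an illustrative choice.
n = 10000;
X = 2*(rand(n,1) > 0.5) - 1;
S = [0; cumsum(X)];
t = (0:n)/n;                 % time grid on [0,1]
plot(t, S/sqrt(n));          % one path of Z_t^n
xlabel('t'); ylabel('Z_t^n');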


Figure: Sample Brownian paths (mean of 1000 paths and 5 individual paths).

Brownian motion W(t) is a continuous time stochastic process with continuous paths that starts at 0 (W(0) = 0) and has independent, normally distributed (Gaussian) increments.
We can simulate Brownian motion on a computer using a random number generator that generates normally distributed, independent random variables.


We can write an equation for the evolution of the paths of a Brownian motion X_t with diffusion coefficient D starting at x:

    dX_t = \sqrt{2D} \, dW_t,  \quad  X_0 = x.

This is an example of a stochastic differential equation.
The probability of finding X_t at y at time t, given that it was at x at time t = 0, the transition probability density \rho(y, t), satisfies the PDE

    \frac{\partial \rho}{\partial t} = D \frac{\partial^2 \rho}{\partial y^2},  \quad  \rho(y, 0) = \delta(y - x).

This is an example of the Fokker-Planck equation.
The connection between Brownian motion and the diffusion equation was made by Einstein in 1905.


ELEMENTS OF PROBABILITY THEORY


A collection \mathcal{F} of subsets of a set \Omega is called a \sigma-algebra if it contains \Omega and is closed under the operations of taking complements and countable unions of its elements.
A sub-\sigma-algebra is a collection of subsets of a \sigma-algebra which satisfies the axioms of a \sigma-algebra.
A measurable space is a pair (\Omega, \mathcal{F}) where \Omega is a set and \mathcal{F} is a \sigma-algebra of subsets of \Omega.
Let (\Omega, \mathcal{F}) and (E, \mathcal{G}) be two measurable spaces. A function X : \Omega \to E such that the event

    \{\omega \in \Omega : X(\omega) \in A\} =: \{X \in A\}

belongs to \mathcal{F} for arbitrary A \in \mathcal{G} is called a measurable function or random variable.


Let (\Omega, \mathcal{F}) be a measurable space. A function \mu : \mathcal{F} \to [0, 1] is called a probability measure if \mu(\emptyset) = 0, \mu(\Omega) = 1 and \mu(\cup_{k=1}^{\infty} A_k) = \sum_{k=1}^{\infty} \mu(A_k) for all sequences of pairwise disjoint sets \{A_k\}_{k=1}^{\infty} \in \mathcal{F}.
The triplet (\Omega, \mathcal{F}, \mu) is called a probability space.

Let X be a random variable (measurable function) from (\Omega, \mathcal{F}, \mu) to (E, \mathcal{G}). If E is a metric space then we may define expectation with respect to the measure \mu by

    \mathbb{E}[X] = \int_{\Omega} X(\omega) \, d\mu(\omega).

More generally, let f : E \to \mathbb{R} be \mathcal{G}-measurable. Then,

    \mathbb{E}[f(X)] = \int_{\Omega} f(X(\omega)) \, d\mu(\omega).


Let U be a topological space. We will use the notation \mathcal{B}(U) to denote the Borel \sigma-algebra of U: the smallest \sigma-algebra containing all open sets of U. Every random variable from a probability space (\Omega, \mathcal{F}, \mu) to a measurable space (E, \mathcal{B}(E)) induces a probability measure on E:

    \mu_X(B) = \mathbb{P} X^{-1}(B) = \mu(\omega \in \Omega ; X(\omega) \in B),  \quad  B \in \mathcal{B}(E).

The measure \mu_X is called the distribution (or sometimes the law) of X.

Example
Let I denote a subset of the positive integers. A vector \rho_0 = \{\rho_{0,i}, i \in I\} is a distribution on I if it has nonnegative entries and its total mass equals 1: \sum_{i \in I} \rho_{0,i} = 1.

We can use the distribution of a random variable to compute expectations and probabilities:

    \mathbb{E}[f(X)] = \int_{E} f(x) \, d\mu_X(x)

and

    \mathbb{P}[X \in G] = \int_{G} d\mu_X(x),  \quad  G \in \mathcal{B}(E).

When E = \mathbb{R}^d and we can write d\mu_X(x) = \rho(x) \, dx, then we refer to \rho(x) as the probability density function (pdf), or density with respect to Lebesgue measure, for X.
When E = \mathbb{R}^d then by L^p(\Omega; \mathbb{R}^d), or sometimes L^p(\Omega; \mu) or even simply L^p(\mu), we mean the Banach space of measurable functions on \Omega with norm

    \|X\|_{L^p} = \left( \mathbb{E}|X|^p \right)^{1/p}.

Example
Consider the random variable X : \Omega \to \mathbb{R} with pdf

    \gamma_{\sigma,m}(x) := (2\pi\sigma)^{-1/2} \exp\left( -\frac{(x-m)^2}{2\sigma} \right).

Such an X is termed a Gaussian or normal random variable. The mean is

    \mathbb{E}X = \int_{\mathbb{R}} x \, \gamma_{\sigma,m}(x) \, dx = m

and the variance is

    \mathbb{E}(X - m)^2 = \int_{\mathbb{R}} (x - m)^2 \gamma_{\sigma,m}(x) \, dx = \sigma.

Example (Continued)
Let m \in \mathbb{R}^d and \Sigma \in \mathbb{R}^{d \times d} be symmetric and positive definite. The random variable X : \Omega \to \mathbb{R}^d with pdf

    \gamma_{\Sigma,m}(x) := \left( (2\pi)^d \det\Sigma \right)^{-1/2} \exp\left( -\frac{1}{2} \langle \Sigma^{-1}(x - m), (x - m) \rangle \right)

is termed a multivariate Gaussian or normal random variable.
The mean is

    \mathbb{E}(X) = m   (7)

and the covariance matrix is

    \mathbb{E}\left( (X - m) \otimes (X - m) \right) = \Sigma.   (8)

Since the mean and variance specify completely a Gaussian random variable on \mathbb{R}, the Gaussian is commonly denoted by \mathcal{N}(m, \sigma). The standard normal random variable is \mathcal{N}(0, 1).
Since the mean and covariance matrix completely specify a Gaussian random variable on \mathbb{R}^d, the Gaussian is commonly denoted by \mathcal{N}(m, \Sigma).


The Characteristic Function

Many of the properties of (sums of) random variables can be studied using the Fourier transform of the distribution function.
The characteristic function of the random variable X is defined to be the Fourier transform of the distribution function

    \phi(t) = \int_{\mathbb{R}} e^{i\lambda t} \, d\mu_X(\lambda) = \mathbb{E}(e^{itX}).   (9)

The characteristic function determines uniquely the distribution function of the random variable, in the sense that there is a one-to-one correspondence between F(\lambda) and \phi(t).
The characteristic function of a \mathcal{N}(m, \Sigma) random variable is

    \phi(t) = e^{i\langle m, t \rangle - \frac{1}{2}\langle t, \Sigma t \rangle}.


Lemma
Let \{X_1, X_2, \ldots, X_n\} be independent random variables with characteristic functions \phi_j(t), j = 1, \ldots, n, and let Y = \sum_{j=1}^{n} X_j with characteristic function \phi_Y(t). Then

    \phi_Y(t) = \prod_{j=1}^{n} \phi_j(t).

Lemma
Let X be a random variable with characteristic function \phi(t) and assume that it has finite moments. Then

    \mathbb{E}(X^k) = \frac{1}{i^k} \phi^{(k)}(0).

Types of Convergence and Limit Theorems

One of the most important aspects of the theory of random variables is the study of limit theorems for sums of random variables.
The most well known limit theorems in probability theory are the law of large numbers and the central limit theorem.
There are various different types of convergence for sequences of random variables.


Definition
Let \{Z_n\}_{n=1}^{\infty} be a sequence of random variables. We will say that
(a) Z_n converges to Z with probability one (almost surely) if

    \mathbb{P}\left( \lim_{n \to +\infty} Z_n = Z \right) = 1.

(b) Z_n converges to Z in probability if for every \epsilon > 0

    \lim_{n \to +\infty} \mathbb{P}\left( |Z_n - Z| > \epsilon \right) = 0.

(c) Z_n converges to Z in L^p if

    \lim_{n \to +\infty} \mathbb{E}\left( |Z_n - Z|^p \right) = 0.

(d) Let F_n(\lambda), n = 1, \ldots, +\infty, and F(\lambda) be the distribution functions of Z_n, n = 1, \ldots, +\infty, and Z, respectively. Then Z_n converges to Z in distribution if

    \lim_{n \to +\infty} F_n(\lambda) = F(\lambda)

for all \lambda at which F is continuous.

Let \{X_n\}_{n=1}^{\infty} be iid random variables with \mathbb{E}X_n = V. Then, the strong law of large numbers states that the average of the sum of the iid random variables converges to V with probability one:

    \mathbb{P}\left( \lim_{N \to +\infty} \frac{1}{N} \sum_{n=1}^{N} X_n = V \right) = 1.

The strong law of large numbers provides us with information about the behavior of a sum of random variables (or, a large number of repetitions of the same experiment) on average.
We can also study fluctuations around the average behavior. Indeed, let \mathbb{E}(X_n - V)^2 = \sigma^2. Define the centered iid random variables Y_n = X_n - V. Then, the sequence of random variables \frac{1}{\sigma\sqrt{N}} \sum_{n=1}^{N} Y_n converges in distribution to a \mathcal{N}(0, 1) random variable:

    \lim_{N \to +\infty} \mathbb{P}\left( \frac{1}{\sigma\sqrt{N}} \sum_{n=1}^{N} Y_n \leqslant a \right) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2} \, dx.

Assume that \mathbb{E}|X| < \infty and let \mathcal{G} be a sub-\sigma-algebra of \mathcal{F}. The conditional expectation of X with respect to \mathcal{G} is defined to be the function \mathbb{E}[X|\mathcal{G}] : \Omega \to E which is \mathcal{G}-measurable and satisfies

    \int_{G} \mathbb{E}[X|\mathcal{G}] \, d\mu = \int_{G} X \, d\mu  \quad  \forall G \in \mathcal{G}.

We can define \mathbb{E}[f(X)|\mathcal{G}] and the conditional probability \mathbb{P}[X \in F|\mathcal{G}] = \mathbb{E}[I_F(X)|\mathcal{G}], where I_F is the indicator function of F, in a similar manner.


Random Number Generators

How can a deterministic machine produce random numbers?
We can use a chaotic dynamical system:

    x_{n+1} = f(x_n),  \quad  x_0 = x,  \quad  n = 1, 2, \ldots

Provided that this dynamical system is sufficiently chaotic, we can hope that, for n sufficiently large, the sequence of numbers x_n, x_{n+1}, x_{n+2}, \ldots is random with nice statistical properties.
Statistical tests can be used in order to determine whether this sequence of pseudo-random numbers can be used in order to perform Monte Carlo simulations, simulate stochastic processes etc.
Modern programming languages have well tested built-in RNGs. In Matlab:
rand ~ U(0, 1).
randn ~ N(0, 1).

Definition
A uniform pseudo-number generator is an algorithm which, starting from an initial value x_0 (the seed), produces a sequence u_i = D^i u_0 of values in [0, 1].

Lemma
Suppose U ~ U(0, 1) and let F be a 1d cumulative distribution function. Then X = F^{-1}(U) with F^{-1}(u) = \inf\{x ; F(x) \geqslant u\} has the distribution F.

We can use the Box-Muller algorithm to generate Gaussian random variables, given a pseudo-number generator of uniform r.v.: given U_1, U_2 ~ U(0, 1) independent, then

    Z_0 = \sqrt{-2 \ln U_1} \cos(2\pi U_2),
    Z_1 = \sqrt{-2 \ln U_1} \sin(2\pi U_2)

are independent N(0, 1) r.v.
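A minimal Matlab sketch of the Box-Muller transform (not part of the original slides; the sample size is an illustrative choice):

% Minimal sketch: Box-Muller transform of uniforms into standard Gaussians.
N = 1e5;                          % illustrative sample size
U1 = rand(N,1); U2 = rand(N,1);   % independent U(0,1) samples
Z0 = sqrt(-2*log(U1)).*cos(2*pi*U2);
Z1 = sqrt(-2*log(U1)).*sin(2*pi*U2);
% Z0 and Z1 are independent N(0,1) samples; compare with randn:
fprintf('mean %.3f, var %.3f\n', mean(Z0), var(Z0));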



Fast Chaotic Noise

Fast/slow system:

    \frac{dx}{dt} = x - x^3 + \frac{\lambda}{\epsilon} y_2,
    \frac{dy_1}{dt} = \frac{10}{\epsilon^2} (y_2 - y_1),
    \frac{dy_2}{dt} = \frac{1}{\epsilon^2} (28 y_1 - y_2 - y_1 y_3),
    \frac{dy_3}{dt} = \frac{1}{\epsilon^2} \left( y_1 y_2 - \frac{8}{3} y_3 \right).

Effective dynamics [Melbourne, Stuart '11]:

    dX_t = A \left( X_t - X_t^3 \right) dt + \sqrt{\sigma} \, dW_t,

true values: A = 1, \lambda = \frac{2}{45}, \sigma = \lambda^2 \lim_{T \to \infty} \frac{1}{T} \int_0^T \int_0^{\infty} \psi_s(y) \psi_{s+t}(y) \, ds \, dt.

Fast Chaotic Noise: Estimators

Values for \sigma reported in the literature (\epsilon = 10^{-3/2}):
0.126 \pm 0.003 via Gaussian moment approximation;
0.13 \pm 0.01 via HMM.
Here: \epsilon = 10^{-1}: \sigma \approx 0.121, and \epsilon = 10^{-3/2}: \sigma \approx 0.124.
But we estimate also A.

Generation of Multidimensional Gaussian Random Variables

We are given Z ~ \mathcal{N}(0, I), where I \in \mathbb{R}^{d \times d} denotes the identity matrix.
We want to generate a Gaussian random vector Y ~ \mathcal{N}(b, \Sigma), \mathbb{E}Y = b, \mathbb{E}((Y - b)(Y - b)^T) = \Sigma.
X = Y - b is mean zero, so it is sufficient to consider this case.
We can use the Cholesky decomposition: assuming that \Sigma > 0, there exists a unique lower triangular matrix \Lambda such that

    \Sigma = \Lambda \Lambda^T.   (10)

Then for X = \Lambda Z we have X ~ \mathcal{N}(0, \Sigma).
The computational cost is O(d^3).
In Matlab: chol(\Sigma); see the sketch below.
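A minimal sketch (not from the original slides; the mean vector and covariance matrix are illustrative choices):

% Minimal sketch: sample Y ~ N(b, Sigma) via the Cholesky factorization.
% b and Sigma are illustrative choices.
b = [1; 2];
Sigma = [2, 0.5; 0.5, 1];
L = chol(Sigma, 'lower');   % lower triangular factor, Sigma = L*L'
Z = randn(2, 1e4);          % standard Gaussian samples
Y = b + L*Z;                % samples with mean b and covariance Sigma
cov(Y')                     % empirical covariance, approximately Sigma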

Eigenvalue decomposition of Cov(X)

The Cholesky decomposition is not unique for covariance matrices that are not strictly positive definite.
The eigenvalues and eigenvectors of the covariance matrix provide us with complete information about the Gaussian random vector: Principal Component Analysis.
Let X ~ \mathcal{N}(0, \Sigma), \Sigma > 0. Then

    \Sigma = P D P^T,  \quad  P P^T = I,  \quad  D = \mathrm{diag}(d_1, \ldots, d_d).   (11)

Set \Lambda = P \sqrt{D} and proceed as with the Cholesky factor.
In Matlab: [P,D] = eig(\Sigma).

Other techniques for generating random numbers with a given distribution:
Acceptance-rejection algorithm.
Markov Chain Monte Carlo.


Monte Carlo Techniques

Example:

    I = \int_{D} f(x) \, dx = \mathbb{E}f(X),  \quad  X ~ U(D),

    I \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i) =: I_N,  \quad  x_i iid U(D).

To prove convergence use the (strong) law of large numbers:

    \lim_{N \to +\infty} \frac{1}{N} \sum_{i=1}^{N} f(x_i) = I  \quad  a.s.


To obtain an estimate on the rate of convergence use the central limit theorem:

    \sqrt{N} \left( I_N - I \right) \to \mathcal{N}(0, \sigma^2),  \quad  in distribution,

where \sigma^2 = \mathrm{Var}(f(X)).
The error of the Monte Carlo approximation is O(N^{-1/2}), regardless of the dimensionality of x.
For smooth functions, deterministic approximations (Riemann sum, trapezoidal rule...) can have a better convergence rate, but they don't scale well as the number of dimensions increases.


More generally, consider the integral

    I = \int_{\mathbb{R}^d} f(x) \pi(x) \, dx,   (12)

where \pi(x) is a probability distribution. We can always write integrals in this form (multiply and divide by \pi(x)).
Monte Carlo estimator:

    I_N = \frac{1}{N} \sum_{i=1}^{N} f(X_i),  \quad  X_i ~ \pi(x), iid.   (13)

The MC estimator is unbiased:

    \mathbb{E} I_N = I.

Variance of the estimator:

    \sigma_I^2 := \mathrm{var}(I_N) = \frac{1}{N} \mathrm{var}(f(X)) = \frac{1}{N} \int_{\mathbb{R}^d} (f(x) - I)^2 \pi(x) \, dx.

From the strong law of large numbers:

    \lim_{N \to +\infty} I_N = I  \quad  a.s.

From the weak LLN and Chebyshev's inequality:

    \mathbb{P}\left( |I_N - I| \leqslant \epsilon \right) \geqslant 1 - \frac{\sigma_f^2}{N \epsilon^2} = 1 - \delta.   (14)

Accuracy of the MC estimator depends on N and on the variance \sigma_I^2. In order for

    \mathbb{P}\left( I_N \in [I - \epsilon, I + \epsilon] \right) = 1 - \delta

for a confidence level \delta \in (0, 1) we need

    N \geqslant \frac{\sigma_f^2}{\delta \epsilon^2}.

The MC error scales like 1/\sqrt{N}, independently of the number of dimensions. It is expected that \sigma_f^2 increases as the number of dimensions increases.

Example (Huynh et al., Stochastic Simulation and Applications in Finance, ch. 5)

function [Call, VarCall] = CalculateCall2(N)
% Price a call option
% N: number of generated points to compute the expectation
x = exp(sqrt(0.1)*randn(1,N) + 5);
Call = mean(max(x - 110, 0));
VarCall = var(max(x - 110, 0))/N;
end


VARIANCE REDUCTION TECHNIQUES

To improve the accuracy of the MC estimator and to reduce the computational cost we can use variance reduction techniques.
Examples of such techniques are:

Antithetic variables.
Control variates.
Importance sampling.

To implement these techniques we need to be able to estimate the mean and variance "on the fly":

    I_{N+1} = \frac{1}{N+1} \left( N I_N + f(X_{N+1}) \right)

and

    \mathrm{var}(I_N) = \frac{1}{N} \left( \frac{1}{N-1} \sum_{i=1}^{N} |f(X_i)|^2 - \frac{N}{N-1} I_N^2 \right).

Antithetic Variables: generate additional random variables that have the same distribution as X_i, i = 1, \ldots, N, but are negatively correlated to X_i.
Example: suppose we want to estimate the mean \mu_X of iid random variables X_i, i = 1, \ldots, N. We introduce the auxiliary r.v. X_i^a, i = 1, \ldots, N, with

    \mathbb{E}[(X_i^a - \mu_X)(X_j - \mu_X)] = -\delta_{ij} \sigma_X^2.

The variance of the sample mean \widehat{\mu}^a (based on the 2N variables) is

    \mathrm{Var}(\widehat{\mu}^a) = \frac{1}{2N} \sigma_X^2 + \frac{1}{4N^2} \sum_{k,j=1}^{N} \mathbb{E}[(X_k^a - \mu_X)(X_j - \mu_X)] = \frac{1}{4N} \sigma_X^2.

It is not possible in general to find perfectly anti-correlated random variables.

Example
Let X ~ U(0, 1). Then X^a = 1 - X is an antithetic variable of X.

Example
Let X ~ \mathcal{N}(\mu, \sigma^2). Then X^a = 2\mu - X is an antithetic variable of X.

Example
Let X ~ \mathcal{N}(\mu, \Sigma) be a 2d Gaussian random vector. We use antithetic variables to reduce the variance and increase the accuracy of the estimation of \mathbb{E}f(X_1, X_2). We will estimate the expectation of Y_j = e^{X_j}, j = 1, 2. These integrals can be calculated analytically.


(Huynh et al., Stochastic Simulation and Applications in Finance, ch. 5)

function Rep=SimulationsAnti
%calculate the mean of an exponential gaussian using
%antithetic variables.
NbTraj=100;
% construct the covariance matrix
sigma1=0.2; sigma2=0.5; rho=-0.3;
MoyTheo=[10;15];
CovTheo=[sigma1^2,rho*sigma1*sigma2;rho*sigma1*sigma2,sigma2^2];
L=chol(CovTheo)';
%Simulation of the random variables.
Sample=randn(2,NbTraj);
SampleSimple=repmat(MoyTheo,1,NbTraj)+L*Sample;
%Introduction of the antithetic variables
XAnti=cat(2,SampleSimple,2*repmat(MoyTheo,1,NbTraj)-SampleSimple);
%Transformation of Xi into Zi
Z1Anti=exp(XAnti(1,:));
Z2Anti=exp(XAnti(2,:));
Rep=mean(Z1Anti,2);
end

CONTROL VARIATES
Let Z be a random variable. We want to estimate \mathbb{E}Z.
We look for a r.v. W that is strongly correlated with Z and with known \mathbb{E}W.
Consider the new r.v.

    X = Z + \alpha (W - \mathbb{E}W).   (15)

We have

    \mathrm{Var}(X) = \mathrm{Var}(Z) + \alpha^2 \mathrm{Var}(W) + 2\alpha \mathrm{Cov}(Z, W).

We minimize the variance with respect to \alpha:

    \alpha = -\frac{\mathrm{Cov}(Z, W)}{\mathrm{Var}(W)}  \quad  and  \quad  \mathrm{Var}_{\min}(X) = \mathrm{Var}(Z)(1 - \rho^2).

CONTROL VARIATES
The estimator (15) is unbiased and the variance is reduced when the correlation coefficient \rho^2 = \frac{\mathrm{Cov}(Z, W)^2}{\mathrm{Var}(Z)\mathrm{Var}(W)} is close to 1.
The optimal value of \alpha and \mathrm{Var}(X) can be estimated using the empirical means; see the sketch below.
We can also construct confidence intervals using the empirical values.
We can add R control variates:

    X = Z + \sum_{j=1}^{R} \alpha_j (W_j - \mathbb{E}W_j).   (16)

A similar idea can be used for vector valued r.v. X, or even in infinite dimensions.
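A minimal sketch with a single control variate (not from the original slides; the estimand E[exp(X)], the choice W = X with known EW = 1/2, and the sample size are illustrative):

% Minimal sketch: estimate E[exp(X)], X ~ U(0,1), with control variate W = X,
% for which EW = 1/2 is known exactly. All choices are illustrative.
N = 1e5;
X = rand(N,1);
Z = exp(X);                        % plain MC samples of exp(X)
W = X;                             % control variate with known mean 1/2
C = cov(Z, W);                     % 2x2 empirical covariance matrix
alpha = -C(1,2)/C(2,2);            % empirical estimate of the optimal alpha
Xcv = Z + alpha*(W - 0.5);         % control-variate samples, same mean as Z
fprintf('plain var %.2e, cv var %.2e\n', var(Z)/N, var(Xcv)/N);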

Importance sampling: choose a good probability distribution from which to simulate random variables:

    \int_{\mathbb{R}^d} f(x) \pi(x) \, dx = \int_{\mathbb{R}^d} \frac{f(x) \pi(x)}{\psi(x)} \psi(x) \, dx = \mathbb{E}_{\psi}\left( \frac{f(X) \pi(X)}{\psi(X)} \right),

where \psi(x) is a probability distribution.
Consider the MC estimator

    I_N = \frac{1}{N} \sum_{j=1}^{N} \frac{f(X_j) \pi(X_j)}{\psi(X_j)},  \quad  X_j ~ \psi(x).

The estimator is unbiased. The variance is

    \mathrm{var}(I_N) = \frac{1}{N} \mathrm{var}\left( \frac{f(X) \pi(X)}{\psi(X)} \right).

We want to choose \psi(x) so that it has a similar shape to f(x)\pi(x).
The closer \frac{f(X)\pi(X)}{\psi(X)} is to a constant, the smaller the variance is.

Example
We use importance sampling to estimate the integral

    I = \int_0^1 c \exp(-a(x - b)^4) \, dx = 18.128,

with c = 100, a = 10000, b = 0.5. We choose \psi = \mathcal{N}(0.5, 0.05^2).
For N = 10000 we have I_N = 18.2279, var(I_N) = 0.1193, err_N = 0.0999, I_{IC} = 18.1271, var(I_{IC}) = 0.0041, err(I_{IC}) = 8.56 \times 10^{-4}.


Figure: c exp(-a(x - b)^4) and the importance sampling density \mathcal{N}(0.5, 0.05^2).



function [yun,varyun,errun,yIC,varyIC,errIC] = quarticIS(N)
% Use importance sampling to calculate the mean
% of exp(-V) where V is a quartic polynomial
a = 10000; b = 0.5; c = 100;
intexact = c*0.18128;
x = rand(N,1);
yun = mean(c*exp(-a*(x-b).^4));
varyun = var(c*exp(-a*(x-b).^4))/N;
errun = abs(intexact-yun);
mu = 0.5; sigma = 0.05;
xx = mu + sigma*randn(N,1);
yIC = mean((c*exp(-a*(xx-b).^4))./normpdf(xx,mu,sigma));
varyIC = var((c*exp(-a*(xx-b).^4))./normpdf(xx,mu,sigma))/N;
errIC = abs(intexact-yIC);


ELEMENTS OF THE THEORY OF STOCHASTIC PROCESSES


Let T be an ordered set. A stochastic process is a collection of random variables X = \{X_t ; t \in T\} where, for each fixed t \in T, X_t is a random variable from (\Omega, \mathcal{F}) to (E, \mathcal{G}).
The measurable space (\Omega, \mathcal{F}) is called the sample space. The space (E, \mathcal{G}) is called the state space.
In this course we will take the set T to be [0, +\infty).
The state space E will usually be \mathbb{R}^d equipped with the \sigma-algebra of Borel sets.
A stochastic process X may be viewed as a function of both t \in T and \omega \in \Omega. We will sometimes write X(t), X(t, \omega) or X_t(\omega) instead of X_t. For a fixed sample point \omega \in \Omega, the function X_t(\omega) : T \to E is called a sample path (realization, trajectory) of the process X.


The finite dimensional distributions (fdd) of a stochastic process are the distributions of the E^k-valued random variables (X(t_1), X(t_2), \ldots, X(t_k)) for arbitrary positive integer k and arbitrary times t_i \in T, i \in \{1, \ldots, k\}:

    F(x) = \mathbb{P}(X(t_i) \leqslant x_i, i = 1, \ldots, k),

with x = (x_1, \ldots, x_k).
We will say that two processes X_t and Y_t are equivalent if they have the same finite dimensional distributions.
From experiments or numerical simulations we can only obtain information about the fdd of a process.


Definition
A Gaussian process is a stochastic process for which E = \mathbb{R}^d and all the finite dimensional distributions are Gaussian, i.e. with density

    (2\pi)^{-n/2} (\det K_k)^{-1/2} \exp\left( -\frac{1}{2} \langle K_k^{-1}(x - \mu_k), x - \mu_k \rangle \right),

for some vector \mu_k and a symmetric positive definite matrix K_k.

A Gaussian process x(t) is characterized by its mean

    m(t) := \mathbb{E} x(t)

and the covariance function

    C(t, s) = \mathbb{E}\left( \left( x(t) - m(t) \right) \otimes \left( x(s) - m(s) \right) \right).

Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process.

Examples of Gaussian stochastic processes

Random Fourier series: let \xi_i, \zeta_i ~ \mathcal{N}(0, 1), i = 1, \ldots, N, and define

    X(t) = \sum_{j=1}^{N} \left( \xi_j \cos(2\pi j t) + \zeta_j \sin(2\pi j t) \right).

Brownian motion is a Gaussian process with m(t) = 0, C(t, s) = \min(t, s).
Brownian bridge is a Gaussian process with m(t) = 0, C(t, s) = \min(t, s) - ts.
The Ornstein-Uhlenbeck process is a Gaussian process with m(t) = 0, C(t, s) = \lambda e^{-\alpha|t-s|} with \alpha, \lambda > 0.


To simulate a Gaussian stochastic process:

Fix \Delta t and define t_j = (j-1)\Delta t, j = 1, \ldots, N.
Set X_j := X(t_j) and define the Gaussian random vector X^N = \{X_j^N\}_{j=1}^{N}. Then X^N ~ \mathcal{N}(\mu^N, \Gamma^N) with \mu^N = (\mu(t_1), \ldots, \mu(t_N)) and \Gamma^N_{ij} = C(t_i, t_j).
Then X^N = \mu^N + \Lambda \mathcal{N}(0, I) with \Gamma^N = \Lambda \Lambda^T.
We can calculate the square root of the covariance matrix either using the Cholesky factorization or by calculating its eigenvalue decomposition (PCA).
\Gamma^N might be a full matrix and the calculation of \Lambda can be computationally expensive.


function [t,x,gamma,mx,varx] = gaussiansimulation(T,dt,M)
% simulate a mean zero gaussian process
% use the eigenvalue decomposition to calculate sqrt{Gamma}
% Input: T:     length of path
%        dt:    timestep
%        gamma: covariance
%        M:     number of paths simulated
% calculate first and second moment
t = 0:dt:T;
N = length(t);
for i = 1:N,
    for j = 1:N,
%       gamma(i,j) = covBM(t(i),t(j));
%       gamma(i,j) = covOU(t(i),t(j),1,1);
%       gamma(i,j) = covBB(t(i),t(j));
        gamma(i,j) = covfBM(t(i),t(j),0.2);
    end
end
[vv,dd] = eig(gamma); lambda = vv*sqrt(dd)*vv';
x = lambda*randn(N,M);
mx = mean(x'); varx = mean(x'.*x');
figure; plot(t,x(:,1:10),'Linewidth',1.0);

function rr = covBM(x,y)
% covariance function of Brownian motion
rr = min(x,y);
end

function rr = covBB(x,y)
% covariance function of the Brownian bridge
rr = min(x,y) - x*y;
end

function rr = covOU(x,y,dd,aa)
% covariance function of Ornstein-Uhlenbeck process
% aa: inverse correlation time
% dd: diffusion coefficient, rr(0) = dd/aa
rr = (dd/aa)*exp(-aa*abs(x-y));
end


function rr = covfBM(x,y,H)
% covariance function of fractional Brownian motion
% H: Hurst exponent
rr = (1/2)*(abs(x)^(2*H) + abs(y)^(2*H) - abs(x-y)^(2*H));
end



Let (\Omega, \mathcal{F}, \mathbb{P}) be a probability space. Let X_t, t \in T (with T = \mathbb{R} or \mathbb{Z}) be a real-valued random process on this probability space with finite second moment, \mathbb{E}|X_t|^2 < +\infty (i.e. X_t \in L^2).

Definition
A stochastic process X_t \in L^2 is called second-order stationary or wide-sense stationary if the first moment \mathbb{E}X_t is a constant and the second moment \mathbb{E}(X_t X_s) depends only on the difference t - s:

    \mathbb{E}X_t = \mu,  \quad  \mathbb{E}(X_t X_s) = C(t - s).

The constant \mu is called the expectation of the process X_t. We will set \mu = 0.
The function C(t) is called the covariance or the autocorrelation function of X_t.
Notice that C(t) = \mathbb{E}(X_t X_0), whereas C(0) = \mathbb{E}(X_t^2), which is finite, by assumption.
Since we have assumed that X_t is a real valued process, we have that C(t) = C(-t), t \in \mathbb{R}.


The correlation function of a second order stationary process enables us to associate a time scale to X_t, the correlation time \tau_{cor}:

    \tau_{cor} = \frac{1}{C(0)} \int_0^{\infty} C(\tau) \, d\tau = \int_0^{\infty} \mathbb{E}(X_\tau X_0)/\mathbb{E}(X_0^2) \, d\tau.

The slower the decay of the correlation function, the larger the correlation time is. We have to assume sufficiently fast decay of correlations so that the correlation time is finite.


Example
Consider the mean zero, second order stationary process with covariance function

    R(t) = \frac{D}{\alpha} e^{-\alpha|t|}.   (17)

The spectral density of this process is:

    f(x) = \frac{1}{2\pi} \frac{D}{\alpha} \int_{-\infty}^{+\infty} e^{-ixt} e^{-\alpha|t|} \, dt
         = \frac{1}{2\pi} \frac{D}{\alpha} \left( \int_{-\infty}^{0} e^{-ixt} e^{\alpha t} \, dt + \int_{0}^{+\infty} e^{-ixt} e^{-\alpha t} \, dt \right)
         = \frac{1}{2\pi} \frac{D}{\alpha} \left( \frac{1}{ix + \alpha} + \frac{1}{-ix + \alpha} \right)
         = \frac{D}{\pi} \frac{1}{x^2 + \alpha^2}.

Example (Continued)
This function is called the Cauchy or the Lorentz distribution.
The Gaussian stochastic process with covariance function (17) is called the stationary Ornstein-Uhlenbeck process.
The correlation time is (we have that C(0) = D/\alpha)

    \tau_{cor} = \int_0^{\infty} e^{-\alpha t} \, dt = \alpha^{-1}.


Second order stationary processes are ergodic: time averages equal phase space (ensemble) averages. An example of an ergodic theorem for stationary processes is the following L^2 (mean-square) ergodic theorem.

Theorem
Let \{X_t\}_{t \geqslant 0} be a second order stationary process on a probability space (\Omega, \mathcal{F}, \mathbb{P}) with mean \mu and covariance R(t), and assume that R(t) \in L^1(0, +\infty). Then

    \lim_{T \to +\infty} \mathbb{E} \left| \frac{1}{T} \int_0^T X(s) \, ds - \mu \right|^2 = 0.   (18)

Proof.
We have

    \mathbb{E} \left| \frac{1}{T} \int_0^T X(s) \, ds \right|^2 = \frac{1}{T^2} \int_0^T \int_0^T R(t - s) \, dt \, ds
        = \frac{2}{T^2} \int_0^T \int_0^t R(t - s) \, ds \, dt
        = \frac{2}{T^2} \int_0^T (T - u) R(u) \, du \to 0,

using the dominated convergence theorem and the assumption R(\cdot) \in L^1. In the above we used the fact that R is a symmetric function, together with the change of variables u = t - s, v = t and an integration over v.

Definition
A stochastic process is called (strictly) stationary if all finite dimensional distributions are invariant under time translation: for any integer k and times t_i \in T, the distribution of (X(t_1), X(t_2), \ldots, X(t_k)) is equal to that of (X(s + t_1), X(s + t_2), \ldots, X(s + t_k)) for any s such that s + t_i \in T for all i \in \{1, \ldots, k\}. In other words,

    \mathbb{P}(X_{t_1+t} \in A_1, X_{t_2+t} \in A_2, \ldots, X_{t_k+t} \in A_k) = \mathbb{P}(X_{t_1} \in A_1, X_{t_2} \in A_2, \ldots, X_{t_k} \in A_k),  \quad  \forall t \in T.

Let X_t be a strictly stationary stochastic process with finite second moment (i.e. X_t \in L^2). The definition of strict stationarity implies that \mathbb{E}X_t = \mu, a constant, and \mathbb{E}((X_t - \mu)(X_s - \mu)) = C(t - s).
Hence, a strictly stationary process with finite second moment is also stationary in the wide sense. The converse is not true.


Remarks

A sequence Y_0, Y_1, \ldots of independent, identically distributed random variables is a stationary process with R(k) = \sigma^2 \delta_{k0}, \sigma^2 = \mathbb{E}(Y_k)^2.
Let Z be a single random variable with known distribution and set Z_j = Z, j = 0, 1, 2, \ldots. Then the sequence Z_0, Z_1, Z_2, \ldots is a stationary sequence with R(k) = \sigma^2.
The first two moments of a Gaussian process are sufficient for a complete characterization of the process. A corollary of this is that a second order stationary Gaussian process is also a (strictly) stationary process.

The most important continuous time stochastic process is Brownian motion. Brownian motion is a mean zero, continuous (i.e. it has continuous sample paths: for a.e. \omega \in \Omega the function X_t(\omega) is a continuous function of time) process with independent Gaussian increments.
A process X_t has independent increments if for every sequence t_0 < t_1 < \ldots < t_n the random variables

    X_{t_1} - X_{t_0}, X_{t_2} - X_{t_1}, \ldots, X_{t_n} - X_{t_{n-1}}

are independent.
If, furthermore, for any t_1, t_2 and Borel set B \subset \mathbb{R},

    \mathbb{P}(X_{t_2+s} - X_{t_1+s} \in B)

is independent of s, then the process X_t has stationary independent increments.

Definition
A one dimensional standard Brownian motion W(t) : \mathbb{R}^+ \to \mathbb{R} is a real valued stochastic process with the following properties:
1. W(0) = 0;
2. W(t) is continuous;
3. W(t) has independent increments;
4. for every t > s \geqslant 0, W(t) - W(s) has a Gaussian distribution with mean 0 and variance t - s. That is, the density of the random variable W(t) - W(s) is

    g(x; t, s) = \left( 2\pi(t - s) \right)^{-1/2} \exp\left( -\frac{x^2}{2(t - s)} \right).   (19)

Definition (Continued)
A d-dimensional standard Brownian motion W(t) : \mathbb{R}^+ \to \mathbb{R}^d is a collection of d independent one dimensional Brownian motions:

    W(t) = (W_1(t), \ldots, W_d(t)),

where W_i(t), i = 1, \ldots, d are independent one dimensional Brownian motions. The density of the Gaussian random vector W(t) - W(s) is thus

    g(x; t, s) = \left( 2\pi(t - s) \right)^{-d/2} \exp\left( -\frac{\|x\|^2}{2(t - s)} \right).

Brownian motion is sometimes referred to as the Wiener process.


To generate Brownian paths we can use the general algorithm for simulating Gaussian processes with \mu^N = 0 and \Gamma^N = \left( \min(t_i, t_j) \right)_{i,j=1}^{N}.
It is easier to use the fact that Brownian motion has independent increments with dW ~ \mathcal{N}(0, dt):

    W_j = W_{j-1} + dW_j,  \quad  j = 1, 2, \ldots, N,

where dW_j = \sqrt{\Delta t} \, \mathcal{N}(0, 1); see the sketch below.
After having generated Brownian paths we can compute statistics using Monte Carlo:

    \mathbb{E}f(W_t) \approx \frac{1}{N} \sum_{j=1}^{N} f(W_t^{(j)}).

We can also simulate processes related to Brownian motion, such as the geometric Brownian motion:

    S_t = S_0 e^{at + \sigma W_t}.   (20)
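A minimal Matlab sketch of path generation by summing increments (not from the original slides; T, dt and the number of paths are illustrative choices):

% Minimal sketch: Brownian paths from independent N(0,dt) increments.
% T, dt and the number of paths M are illustrative choices.
T = 1; dt = 1e-3; M = 5;
t = 0:dt:T;
N = length(t);
dW = sqrt(dt)*randn(N-1, M);     % independent increments
W = [zeros(1,M); cumsum(dW)];    % W_0 = 0, W_j = W_{j-1} + dW_j
plot(t, W);
xlabel('t'); ylabel('W_t');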

Figure: Brownian sample paths (mean of 1000 paths and 5 individual paths).


It is possible to prove rigorously the existence of the Wiener process (Brownian motion):

Theorem
(Wiener) There exists an almost surely continuous process W_t with independent increments and W_0 = 0, such that for each t \geqslant 0 the random variable W_t is \mathcal{N}(0, t). Furthermore, W_t is almost surely locally Hölder continuous with exponent \alpha for any \alpha \in (0, \frac{1}{2}).

Notice that Brownian paths are not differentiable.


Brownian motion is a Gaussian process. For the d-dimensional Brownian motion, and for I the d \times d dimensional identity, we have (see (7) and (8))

    \mathbb{E}W(t) = 0  \quad  \forall t \geqslant 0

and

    \mathbb{E}\left( (W(t) - W(s)) \otimes (W(t) - W(s)) \right) = (t - s)I.   (21)

Moreover,

    \mathbb{E}\left( W(t) \otimes W(s) \right) = \min(t, s)I.   (22)

From the formula for the Gaussian density g(x, t - s), eqn. (19), we immediately conclude that W(t) - W(s) and W(t + u) - W(s + u) have the same pdf. Consequently, Brownian motion has stationary increments.
Notice, however, that Brownian motion itself is not a stationary process.
Since W(t) = W(t) - W(0), the pdf of W(t) is

    g(x, t) = \frac{1}{\sqrt{2\pi t}} e^{-x^2/2t}.

We can easily calculate all moments of the Brownian motion:

    \mathbb{E}(x^n(t)) = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{+\infty} x^n e^{-x^2/2t} \, dx
        = \begin{cases} 1 \cdot 3 \cdots (n-1) \, t^{n/2}, & n \ \text{even}, \\ 0, & n \ \text{odd}. \end{cases}

We can define the OU process through the Brownian motion via a time change.

Lemma
Let W(t) be a standard Brownian motion and consider the process

    V(t) = e^{-t} W(e^{2t}).

Then V(t) is a Gaussian second order stationary process with mean 0 and covariance

    K(s, t) = e^{-|t-s|}.


Definition
A (normalized) fractional Brownian motion W_t^H, t \geqslant 0, with Hurst parameter H \in (0, 1) is a centered Gaussian process with continuous sample paths whose covariance is given by

    \mathbb{E}(W_t^H W_s^H) = \frac{1}{2} \left( s^{2H} + t^{2H} - |t - s|^{2H} \right).   (23)

Fractional Brownian motion has the following properties.
1. When H = \frac{1}{2}, W_t^{1/2} becomes the standard Brownian motion.
2. W_0^H = 0, \mathbb{E}W_t^H = 0, \mathbb{E}(W_t^H)^2 = |t|^{2H}, t \geqslant 0.
3. It has stationary increments, \mathbb{E}(W_t^H - W_s^H)^2 = |t - s|^{2H}.
4. It has the following self similarity property:

    (W_{\alpha t}^H, t \geqslant 0) = (\alpha^H W_t^H, t \geqslant 0),  \quad  \alpha > 0,

where the equivalence is in law.

A very useful result is that we can expand every centered (\mathbb{E}X_t = 0) stochastic process with continuous covariance (and hence, every L^2-continuous centered stochastic process) into a random Fourier series.
Let (\Omega, \mathcal{F}, \mathbb{P}) be a probability space and \{X_t\}_{t \in T} a centered process which is mean square continuous. Then, the Karhunen-Loève theorem states that X_t can be expanded in the form

    X_t = \sum_{n=1}^{\infty} \xi_n \phi_n(t),   (24)

where the \xi_n are an orthogonal sequence of random variables with \mathbb{E}|\xi_k|^2 = \lambda_k, where \{\lambda_k, \phi_k\}_{k=1}^{\infty} are the eigenvalues and eigenfunctions of the integral operator whose kernel is the covariance of the process X_t. The convergence is in L^2(\mathbb{P}) for every t \in T.
If X_t is a Gaussian process then the \xi_k are independent Gaussian random variables.

Set T = [0, 1]. Let \mathcal{K} : L^2(T) \to L^2(T) be defined by

    (\mathcal{K}\phi)(t) = \int_0^1 K(s, t) \phi(s) \, ds.

The kernel K of this integral operator is a continuous (in both s and t), symmetric, nonnegative function. Hence, the corresponding integral operator has eigenvalues and eigenfunctions

    \mathcal{K}\phi_n = \lambda_n \phi_n,  \quad  \lambda_n \geqslant 0,  \quad  (\phi_n, \phi_m)_{L^2} = \delta_{nm},

such that the covariance operator can be expanded in a uniformly convergent series

    B(t, s) = \sum_{n=1}^{\infty} \lambda_n \phi_n(t) \phi_n(s).


The random variables \xi_n are defined as

    \xi_n = \int_0^1 X(t) \phi_n(t) \, dt.

The orthogonality of the eigenfunctions of the covariance operator implies that

    \mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm}.

Thus the random variables are orthogonal. When X(t) is Gaussian, the \xi_k are Gaussian random variables. Furthermore, since they are also orthogonal, they are independent Gaussian random variables.


Example
The Karhunen-Loève Expansion for Brownian Motion. We set T = [0, 1]. The covariance function of Brownian motion is C(t, s) = \min(t, s). The eigenvalue problem C\phi_n = \lambda_n \phi_n becomes

    \int_0^1 \min(t, s) \phi_n(s) \, ds = \lambda_n \phi_n(t).

Or,

    \int_0^t s \phi_n(s) \, ds + t \int_t^1 \phi_n(s) \, ds = \lambda_n \phi_n(t).

We differentiate this equation twice:

    \int_t^1 \phi_n(s) \, ds = \lambda_n \phi_n'(t)

and

    -\phi_n(t) = \lambda_n \phi_n''(t),

where primes denote differentiation with respect to t.



Example
From the above equations we immediately see that the right boundary conditions are \phi(0) = \phi'(1) = 0. The eigenvalues and eigenfunctions are

    \phi_n(t) = \sqrt{2} \sin\left( \frac{1}{2}(2n - 1)\pi t \right),  \quad  \lambda_n = \left( \frac{2}{(2n - 1)\pi} \right)^2.

Thus, the Karhunen-Loève expansion of Brownian motion on [0, 1] is

    W_t = \sqrt{2} \sum_{n=1}^{\infty} \xi_n \frac{\sin\left( \left( n - \frac{1}{2} \right) \pi t \right)}{\left( n - \frac{1}{2} \right) \pi}.   (25)

Example: KL expansion for Brownian motion

function [t,bb] = KLBmotion(dt)
% generate paths of Brownian motion on [0,1]
% using the KL expansion
t = 0:dt:1;
N = length(t);
bb = zeros(1,N);
for i = 1:N,
    xi = sqrt(2)*randn/((i-0.5)*pi);
    bb = bb + xi*sin((i-0.5)*pi*t);
end
figure; plot(t,bb,'Linewidth',2);
end


Example
The Karhunen-Loève expansion of the Brownian bridge B_t = W_t - tW_1 on [0, 1] is

    B_t = \sum_{n=1}^{\infty} \xi_n \frac{\sqrt{2} \sin(n\pi t)}{n\pi}.   (26)


Example: KL expansion for Brownian bridge

function [t,bb] = KLBBridge(dt)
% generate paths of Brownian bridge on [0,1]
% using the KL expansion
t = 0:dt:1;
N = length(t);
bb = zeros(1,N);
for i = 1:N,
    xi = sqrt(2)*randn/(i*pi);
    bb = bb + xi*sin(i*pi*t);
end
figure; plot(t,bb,'Linewidth',2);
end


STOCHASTIC DIFFERENTIAL EQUATIONS (SDEs)


In this part of the course we will study stochastic differential equations (SDEs): ODEs driven by Gaussian white noise.
Let W(t) denote a standard m-dimensional Brownian motion, h : \mathcal{Z} \to \mathbb{R}^d a smooth vector-valued function and \gamma : \mathcal{Z} \to \mathbb{R}^{d \times m} a smooth matrix valued function (in this course we will take \mathcal{Z} = \mathbb{T}^d, \mathbb{R}^d or \mathbb{R}^l \times \mathbb{T}^{d-l}).
Consider the SDE

    \frac{dz}{dt} = h(z) + \gamma(z) \frac{dW}{dt},  \quad  z(0) = z_0.   (27)

We think of the term \frac{dW}{dt} as representing Gaussian white noise: a mean-zero Gaussian process with correlation \delta(t - s)I.
The function h in (27) is sometimes referred to as the drift and \gamma as the diffusion coefficient.


Such a process exists only as a distribution. The precise interpretation of (27) is as an integral equation for z(t) \in C(\mathbb{R}^+, \mathcal{Z}):

    z(t) = z_0 + \int_0^t h(z(s)) \, ds + \int_0^t \gamma(z(s)) \, dW(s).   (28)

In order to make sense of this equation we need to define the stochastic integral against W(s).


The Itô Stochastic Integral

For the rigorous analysis of stochastic differential equations it is necessary to define stochastic integrals of the form

    I(t) = \int_0^t f(s) \, dW(s),   (29)

where W(t) is a standard one dimensional Brownian motion. This is not straightforward because W(t) does not have bounded variation.
In order to define the stochastic integral we assume that f(t) is a random process, adapted to the filtration \mathcal{F}_t generated by the process W(t), and such that

    \mathbb{E} \int_0^T f(s)^2 \, ds < \infty.


The Itô stochastic integral I(t) is defined as the L^2 limit of the Riemann sum approximation of (29):

    I(t) := \lim_{K \to \infty} \sum_{k=1}^{K-1} f(t_{k-1}) \left( W(t_k) - W(t_{k-1}) \right),   (30)

where t_k = k\Delta t and K\Delta t = t.
Notice that the function f(t) is evaluated at the left end of each interval [t_{n-1}, t_n] in (30).
The resulting Itô stochastic integral I(t) is a.s. continuous in t.
These ideas are readily generalized to the case where W(s) is a standard d-dimensional Brownian motion and f(s) \in \mathbb{R}^{m \times d} for each s.


The resulting integral satisfies the Itô isometry

    \mathbb{E}|I(t)|^2 = \int_0^t \mathbb{E}|f(s)|_F^2 \, ds,   (31)

where |\cdot|_F denotes the Frobenius norm |A|_F = \sqrt{\mathrm{tr}(A^T A)}.
The Itô stochastic integral is a martingale:

    \mathbb{E}I(t) = 0

and

    \mathbb{E}[I(t)|\mathcal{F}_s] = I(s)  \quad  \forall t \geqslant s,

where \mathcal{F}_s denotes the filtration generated by W(s).


Example
Consider the Itô stochastic integral

    I(t) = \int_0^t f(s) \, dW(s),

where f, W are scalar-valued. This is a martingale with quadratic variation

    \langle I \rangle_t = \int_0^t (f(s))^2 \, ds.

More generally, for f, W in arbitrary finite dimensions, the integral I(t) is a martingale with quadratic variation

    \langle I \rangle_t = \int_0^t \left( f(s) \otimes f(s) \right) ds.


The Stratonovich Stochastic Integral

In addition to the Itô stochastic integral, we can also define the Stratonovich stochastic integral. It is defined as the L^2 limit of a different Riemann sum approximation of (29), namely

    I_{strat}(t) := \lim_{K \to \infty} \sum_{k=1}^{K-1} \frac{1}{2} \left( f(t_{k-1}) + f(t_k) \right) \left( W(t_k) - W(t_{k-1}) \right),   (32)

where t_k = k\Delta t and K\Delta t = t. Notice that the function f(t) is evaluated at both endpoints of each interval [t_{n-1}, t_n] in (32).
The multidimensional Stratonovich integral is defined in a similar way. The resulting integral is written as

    I_{strat}(t) = \int_0^t f(s) \circ dW(s).

The limit in (32) gives rise to an integral which differs from the Itô integral.
The situation is more complex than that arising in the standard theory of Riemann integration for functions of bounded variation: in that case the points in [t_{k-1}, t_k] where the integrand is evaluated do not affect the definition of the integral, via a limiting process.
In the case of integration against Brownian motion, which does not have bounded variation, the limits differ.
When f and W are correlated through an SDE, a formula exists to convert between them.


The Itô and Stratonovich stochastic integrals coincide provided that the integrand is sufficiently regular.
Suppose that there exist C < \infty, \epsilon > 0 such that

    \mathbb{E}|f(t, \omega) - f(s, \omega)|^2 \leqslant C|t - s|^{1+\epsilon},

for all s, t \in [0, T]. Then

    \int_0^T f(t, \omega) \, dW_t = \int_0^T f(t, \omega) \circ dW_t.

This is more than the regularity of the solution to an SDE.
On the other hand:

    \int_0^T W_t \, dW_t = \frac{1}{2} W_T^2 - \frac{1}{2} T,  \quad  \int_0^T W_t \circ dW_t = \frac{1}{2} W_T^2.

Itô and Stratonovich stochastic integrals (Higham 2001)

function [itoerr, straterr] = stint(T,N)
%
% Ito and Stratonovich integrals of W dW
%
dt = T/N;
dW = sqrt(dt)*randn(1,N);          % increments
W = cumsum(dW);                    % cumulative sum
ito = sum([0,W(1:end-1)].*dW)
strat = sum((0.5*([0,W(1:end-1)]+W) + ...
    0.5*sqrt(dt)*randn(1,N)).*dW)
itoerr = abs(ito - 0.5*(W(end)^2-T))
straterr = abs(strat - 0.5*W(end)^2)
end

We can associate a second order differential operator with an SDE, the generator of the process.
Consider the Itô SDE

    dX_t = b(X_t) \, dt + \sigma(X_t) \, dW(t).   (33)

We define

    \Sigma(x) = \sigma(x)\sigma(x)^T.   (34)

The generator of X_t is

    \mathcal{L} = b \cdot \nabla + \frac{1}{2} \sum_{i,j=1}^{d} \Sigma_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j}.   (35)

Using the generator we can write Itô's formula, which enables us to calculate the rate of change in time of functions V(t, X_t).
In the absence of noise the rate of change of V can be written as

    \frac{d}{dt} V(t, x_t) = \frac{\partial V}{\partial t}(t, x_t) + \mathcal{A}V(t, x_t),   (36)

where x_t is the solution of the ODE \dot{x} = b(x) and \mathcal{A} is

    \mathcal{A} = b(x) \cdot \nabla.   (37)

For the SDE the chain rule (36) has to be modified by the addition of a term due to noise:

    \frac{d}{dt} V(t, X_t) = \frac{\partial V}{\partial t}(t, X_t) + \mathcal{L}V(t, X_t) + \left\langle \nabla V(t, X_t), \sigma(X_t) \frac{dW}{dt}(t) \right\rangle.   (38)

The additional term in \mathcal{L}V is proportional to \Sigma and it arises from the lack of smoothness of Brownian motion.
The precise interpretation of the expression for the rate of change of V is in integrated form:

    V(t, X_t) = V(x(0)) + \int_0^t \frac{\partial V}{\partial s}(s, X_s) \, ds + \int_0^t \mathcal{L}V(s, X_s) \, ds + \int_0^t \langle \nabla V(s, X_s), \sigma(X_s) \, dW(s) \rangle.   (39)

The Brownian differential scales like the square root of the differential in time: \mathbb{E}(dW_t)^2 = dt. We can write Itô's formula in the form

    dV(t, x(t)) = \frac{\partial V}{\partial t} \, dt + \sum_i \frac{\partial V}{\partial x_i} \, dx_i + \frac{1}{2} \sum_{i,j=1}^{d} \frac{\partial^2 V}{\partial x_i \partial x_j} \, dx_i dx_j.   (40)

We are using the convention

    dW_i(t) dW_j(t) = \delta_{ij} \, dt,  \quad  dW_i(t) \, dt = 0,  \quad  i, j = 1, \ldots, d.

Thus, we can think of (40) as a generalization of the Leibniz rule (36) where second order differentials are kept.

Taking the expectation of (40) and using the fact that the stochastic integral is a martingale we obtain (we assume that V is independent of time and that the I.C. are deterministic)

    \mathbb{E}V(X_t) = V(X_0) + \mathbb{E} \int_0^t \mathcal{L}V(X_s) \, ds.

We can use Itô's formula to calculate the expectation value of functionals of the solution of an SDE.
For a Stratonovich SDE the rules of standard calculus apply; let X_t denote the solution of the Stratonovich SDE

    dX_t = b(X_t) \, dt + \sigma(X_t) \circ dW_t,   (41)

where b : \mathbb{R}^d \to \mathbb{R}^d and \sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}. Then the generator of X_t is

    \mathcal{L} = b \cdot \nabla + \frac{1}{2} \sigma : \nabla \left( \sigma^T \nabla \right).   (42)

Furthermore, the Newton-Leibniz chain rule applies: let h \in C^2(\mathbb{R}^d) and define Y_t = h(X_t). Then

    dY_t = \nabla h(X_t) \cdot \left( b(X_t) \, dt + \sigma(X_t) \circ dW_t \right).   (43)

Example (Geometric Brownian motion)

Consider the linear Itô SDE with multiplicative noise

    dX_t = \mu X_t \, dt + \sigma X_t \, dW_t,  \quad  X_0 = x.   (44)

The generator of X_t is

    \mathcal{L} = \mu x \frac{\partial}{\partial x} + \frac{\sigma^2 x^2}{2} \frac{\partial^2}{\partial x^2}.

The solution of (44) is

    X(t) = x \exp\left( \left( \mu - \frac{\sigma^2}{2} \right) t + \sigma W(t) \right).   (45)

Example (Geometric Brownian motion contd.)

To derive this formula, we apply Itô's formula to the function f(x) = \log(x):

    d \log(X_t) = \mathcal{L} \log(X_t) \, dt + \sigma x \, \partial_x \log(X_t) \, dW_t
        = \left( \mu X_t \frac{1}{X_t} - \frac{\sigma^2 X_t^2}{2} \frac{1}{X_t^2} \right) dt + \sigma \, dW_t
        = \left( \mu - \frac{\sigma^2}{2} \right) dt + \sigma \, dW_t.

Consequently:

    \log\left( \frac{X_t}{X_0} \right) = \left( \mu - \frac{\sigma^2}{2} \right) t + \sigma W(t),

from which (45) follows.
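One can check (45) numerically by comparing an Euler-Maruyama path against the exact solution driven by the same Brownian increments; a minimal sketch (not from the original slides; all parameter values are illustrative):

% Minimal sketch: exact GBM solution (45) vs Euler-Maruyama on the same noise.
% mu, sigma, X0, T, dt are illustrative choices.
mu = 1; sigma = 0.5; X0 = 1; T = 1; dt = 1e-3;
N = round(T/dt);
dW = sqrt(dt)*randn(1,N);
W = cumsum(dW);
t = dt*(1:N);
Xexact = X0*exp((mu - 0.5*sigma^2)*t + sigma*W);
Xem = X0;
for j = 1:N
    Xem = Xem + mu*Xem*dt + sigma*Xem*dW(j);
end
fprintf('EM endpoint %.4f, exact endpoint %.4f\n', Xem, Xexact(end));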


Existence and Uniqueness of solutions for SDEs

Definition
By a solution of (27) we mean an \mathbb{R}^d-valued stochastic process \{X_t\} on t \in [0, T] with the properties:
1. X_t is continuous and \mathcal{F}_t-adapted, where the filtration is generated by the Brownian motion W(t);
2. h(X_t) \in L^1((0, T)), \gamma(X_t) \in L^2((0, T));
3. equation (27) holds for every t \in [0, T] with probability 1.

The solution is called unique if any two solutions X_i(t), i = 1, 2, satisfy

    \mathbb{P}(X_1(t) = X_2(t), \forall t \in [0, T]) = 1.


It is well known that existence and uniqueness of solutions for ODEs (i.e. when \gamma \equiv 0 in (27)) holds for globally Lipschitz vector fields h(x).
A very similar theorem holds when \gamma \neq 0.
As for ODEs, the conditions can be weakened when a priori bounds on the solution can be found.


Theorem
Assume that both h(\cdot) and \gamma(\cdot) are globally Lipschitz on \mathbb{R}^d and that X_0 is a random variable independent of the Brownian motion W(t) with \mathbb{E}|X_0|^2 < \infty.
Then the SDE (27) has a unique solution X_t \in C(\mathbb{R}^+; \mathbb{R}^d) with

    \mathbb{E} \int_0^T |X_t|^2 \, dt < \infty  \quad  \forall T < \infty.

Furthermore, the solution of the SDE is a Markov process.


Remarks
The Stratonovich analogue of (27) is

    \frac{dX}{dt} = h(X) + \gamma(X) \circ \frac{dW}{dt},  \quad  X(0) = X_0.   (46)

By this we mean that X \in C(\mathbb{R}^+, \mathbb{R}^d) satisfies the integral equation

    X(t) = X(0) + \int_0^t h(X(s)) \, ds + \int_0^t \gamma(X(s)) \circ dW(s).   (47)

By using definitions (30) and (32) it can be shown that X satisfying the Stratonovich SDE (46) also satisfies the Itô SDE

    \frac{dX}{dt} = h(X) + \frac{1}{2} \nabla \cdot \left( \gamma(X)\gamma(X)^T \right) - \frac{1}{2} \gamma(X) \nabla \cdot \left( \gamma(X)^T \right) + \gamma(X) \frac{dW}{dt},   (48a)
    X(0) = X_0,   (48b)

provided that \gamma(z) is differentiable.

White noise is, in most applications, an idealization of a stationary random process with short correlation time. In this context the Stratonovich interpretation of an SDE is particularly important because it often arises as the limit obtained by using smooth approximations to white noise.
On the other hand, the martingale machinery which comes with the Itô integral makes it more important as a mathematical object.
It is very useful that we can convert from the Itô to the Stratonovich interpretation of the stochastic integral.
There are other interpretations of the stochastic integral, e.g. the Klimontovich stochastic integral.

Examples of SDEs
The Ornstein-Uhlenbeck process (\alpha, \sigma > 0):

    dX_t = -\alpha X_t \, dt + \sigma \, dW_t.   (49)

The solution is

    X_t = e^{-\alpha t} X_0 + \sigma \int_0^t e^{-\alpha(t-s)} \, dW_s.

Geometric Brownian motion:

    dX_t = r X_t \, dt + \sigma X_t \, dW_t.   (50)

The solution is

    X_t = X_0 \, e^{\sigma W_t + (r - \frac{1}{2}\sigma^2)t}.

Note that the solution is different for the Itô and the Stratonovich SDEs.
The Cox-Ingersoll-Ross SDE (\alpha, b, \sigma > 0):

    dX_t = \alpha(b - X_t) \, dt + \sigma \sqrt{X_t} \, dW_t.   (51)

Examples of SDEs (contd.)

Stochastic Verhulst equation (population dynamics):

    dX_t = (\lambda X_t - X_t^2) \, dt + \sigma X_t \, dW_t.   (52)

Lotka-Volterra SDEs:

    dX_i(t) = X_i(t) \left( a_i + \sum_{j=1}^{d} b_{ij} X_j(t) \right) dt + \sigma_i X_i(t) \, dW_i(t).   (53)

Protein kinetics:

    dX_t = \left( \alpha - X_t + \lambda X_t(1 - X_t) \right) dt + \sigma X_t(1 - X_t) \, dW_t.   (54)

Examples of SDEs (contd.)

Tracer particle (turbulent diffusion):

    dX_t = u(X_t, t) \, dt + \sigma \, dW_t,  \quad  \nabla \cdot u(x, t) = 0.   (55)

Josephson junction (pendulum with friction and noise):

    \ddot{\phi}_t = -\sin(\phi_t) - \gamma \dot{\phi}_t + \sqrt{2\gamma\beta^{-1}} \, \dot{W}_t.   (56)

Noisy Duffing oscillator (stochastic resonance):

    \ddot{X}_t = X_t - X_t^3 - \dot{X}_t + A\cos(\omega t) + \sigma \dot{W}_t.   (57)

Stochastic heat equation:

    \partial_t u = \partial_x^2 u + \partial_t W(x, t),   (58)

on [0, 1] with Dirichlet boundary conditions and W(x, t) denoting an infinite dimensional Brownian motion.

Consider the scalar Itô SDE (63) below. The generator \mathcal{L} and its L^2-adjoint are

    \mathcal{L} = b(x) \partial_x + \frac{1}{2} \sigma^2(x) \partial_x^2,

and

    \mathcal{L}^* \rho = -\partial_x \left( b(x) \rho \right) + \frac{1}{2} \partial_x^2 \left( \sigma^2(x) \rho \right).

We can obtain evolution equations for the expectation of functionals of the solution of the SDE and for the transition probability density.

Theorem
(Kolmogorov) Let f(x) \in C_b(\mathbb{R}) and let

    u(x, s) := \mathbb{E}(f(X_t)|X_s = x) = \int f(y) p(y, t|x, s) \, dy \in C_b^2(\mathbb{R}).

Assume furthermore that the functions b(x), \Sigma(x) = \sigma^2(x) are continuous in x. Then u(x, s) \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+) and it solves the final value problem

    -\frac{\partial u}{\partial s} = b(x) \frac{\partial u}{\partial x} + \frac{1}{2} \Sigma(x) \frac{\partial^2 u}{\partial x^2},  \quad  \lim_{s \to t} u(s, x) = f(x).   (59)

Theorem
(Kolmogorov) Assume that p(y, t|\cdot, \cdot), b(y), \Sigma(y) \in C^2(\mathbb{R} \times \mathbb{R}^+). Then the transition probability density of X_t satisfies the equation

    \frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}\left( b(y) p \right) + \frac{1}{2} \frac{\partial^2}{\partial y^2}\left( \Sigma(y) p \right),  \quad  p(y, 0|x) = \delta(x - y).   (60)

Assume that the initial distribution of X_t is \rho_0(x). Define

    p(y, t) := \int p(y, t|x) \rho_0(x) \, dx.

We multiply the forward Kolmogorov equation (60) by \rho_0(x) and integrate with respect to x to obtain the equation

    \frac{\partial p(y, t)}{\partial t} = -\frac{\partial}{\partial y}\left( b(y) p(y, t) \right) + \frac{1}{2} \frac{\partial^2}{\partial y^2}\left( \Sigma(y) p(y, t) \right),   (61)

together with the initial condition

    p(y, 0) = \rho_0(y).   (62)

Numerical Solution of SDEs

We consider the scalar Itô SDE

    dX_t = b(X_t) \, dt + \sigma(X_t) \, dW_t,  \quad  X_0 = x,   (63)

posed on the interval [0, T]. The initial condition can be either deterministic or random.
Our goal is to obtain an approximate solution to this equation to compute quantities of the form \mathbb{E}f(X_t), where \mathbb{E} denotes the expectation with respect to the law of the process X_t.
The simplest numerical method is the Euler-Maruyama method, which is the analogue of the explicit Euler method for ODEs.

We partition the interval [0, T] into N equal subintervals of size Δt with NΔt = T.
We set
    X_j := X(jΔt),    j = 0, 1, . . . , N.
The Euler–Maruyama method for the SDE (63) is
    X_j = X_{j−1} + b(X_{j−1})Δt + σ(X_{j−1})ΔW_j,    j = 1, . . . , N,    (64)
where ΔW_j = W_j − W_{j−1} denotes the jth Brownian increment.
We can write ΔW_j = √Δt ξ_j with ξ_j ∼ N(0, 1) i.i.d.
The Euler–Maruyama scheme can then be written as
    X_j = X_{j−1} + b(X_{j−1})Δt + σ(X_{j−1})√Δt ξ_j,    j = 1, . . . , N.    (65)
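A minimal MATLAB sketch of the scheme (65), applied to the OU process (49); the parameter values and the initial condition below are illustrative choices, not the ones used later in the notes.

alpha = 1; sigma = 1; T = 10; N = 1024; dt = T/N;   % illustrative values
X = zeros(1, N+1); X(1) = 1;                        % initial condition
xi = randn(1, N);                                   % i.i.d. N(0,1) variables
for j = 1:N
    X(j+1) = X(j) - alpha*X(j)*dt + sigma*sqrt(dt)*xi(j);   % EM step (65)
end
plot(0:dt:T, X)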

Figure: Five sample paths of the Ornstein–Uhlenbeck process.

We solve the Ornstein–Uhlenbeck SDE in one dimension using the EM scheme:
    dX_t = −αX_t dt + √(2λ) dW_t.    (66)
We solve (66) for α = 1, λ = 1/2 for t ∈ [0, 10] with Δt = 0.0098 and initial conditions X_0 ∼ 2U(0, 1), where U(0, 1) denotes the uniform distribution in the interval (0, 1).
In Figure 7 we present five sample paths of the OU process.
In Figure 8 we plot the first two moments of the Euler–Maruyama approximation of the OU process. We compare against the theoretical solution.

Figure: First and second moments of the Ornstein–Uhlenbeck process using the Euler–Maruyama method (a. E X_t, b. E X_t²; EM approximation against the exact formula).

We use the analytical solution to investigate the error of the Euler–Maruyama scheme as a function of Δt.
We generate 1,000 paths using the EM scheme as well as the exact solution, using the same Brownian paths.
Since the noise in the OU SDE is additive, we expect that the strong order of convergence is 1.
This is verified by our numerical experiments, see Figure 9.

Figure: Strong order of convergence for the Euler–Maruyama method for the Ornstein–Uhlenbeck process (log–log plot of the error against Δt). The linear equation with slope 1 is plotted for comparison.

Consider the motion of a Brownian particle in a bistable potential:
    dX_t = −V′(X_t) dt + √(2β^{−1}) dW_t,    (67)
with
    V(x) = x⁴/4 − x²/2.    (68)
The deterministic dynamical system has two stable equilibria.
For weak noise strengths the solution of the SDE spends most time oscillating around the minima of the potential.
In Figure 10 we present a sample path of the SDE with β = 10, obtained using the EM algorithm.

Figure: Sample path of the SDE (67).

Figure: Second moment of the bistable SDE (67).

Another standard numerical method for solving stochastic differential equations is Milstein's method:
    X_j = X_{j−1} + b(X_{j−1})Δt + σ(X_{j−1})√Δt ξ_j + ½σ(X_{j−1})σ′(X_{j−1})Δt(ξ_j² − 1).    (69)
Notice that there is an additional term, in comparison to the EM scheme.
The Milstein scheme converges strongly with order 1.
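As an illustration, a sketch of the Milstein scheme (69) for geometric Brownian motion (50), for which σ(x) = μx and σ′(x) = μ; the parameter values are illustrative.

r = 1; mu = 0.5; T = 1; N = 1000; dt = T/N;   % illustrative values
X = zeros(1, N+1); X(1) = 1;
for j = 1:N
    xi = randn;
    X(j+1) = X(j) + r*X(j)*dt + mu*X(j)*sqrt(dt)*xi ...
             + 0.5*mu^2*X(j)*dt*(xi^2 - 1);   % Milstein correction term
end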

The drift term is discretized in the same way in the Euler and Milstein schemes.
On the other hand, the Milstein scheme has an additional term, which is related to the approximation of the stochastic term in (63).
We can derive the Milstein scheme as follows. First, we write an increment of the solution to the SDE in the form
    X_{j+1} = X_j + ∫_{jΔt}^{(j+1)Δt} b(X_s) ds + ∫_{jΔt}^{(j+1)Δt} σ(X_s) dW_s.    (70)
We apply Itô's formula to the drift and diffusion coefficients to obtain
    b(X_s) = b(X_j) + ∫_{jΔt}^s (Lb)(X_ℓ) dℓ + ∫_{jΔt}^s (σb′)(X_ℓ) dW_ℓ
and
    σ(X_s) = σ(X_j) + ∫_{jΔt}^s (Lσ)(X_ℓ) dℓ + ∫_{jΔt}^s (σσ′)(X_ℓ) dW_ℓ,
for s ∈ [jΔt, (j+1)Δt], where L denotes the generator of the process X_t.

We substitute these formulas into (70) to obtain, with ξ_j ∼ N(0, 1),
    X_{j+1} = X_j + b(X_j)Δt + σ(X_j)ΔW_j
        + ∫_{jΔt}^{(j+1)Δt} ∫_{jΔt}^s (Lb)(X_ℓ) dℓ ds + ∫_{jΔt}^{(j+1)Δt} ∫_{jΔt}^s (σb′)(X_ℓ) dW_ℓ ds
        + ∫_{jΔt}^{(j+1)Δt} ∫_{jΔt}^s (Lσ)(X_ℓ) dℓ dW_s + ∫_{jΔt}^{(j+1)Δt} ∫_{jΔt}^s (σσ′)(X_ℓ) dW_ℓ dW_s
    = X_j + b(X_j)Δt + σ(X_j)ΔW_j + (σσ′)(X_j) ∫_{jΔt}^{(j+1)Δt} ∫_{jΔt}^s dW_ℓ dW_s + O((Δt)^{3/2})
    ≈ X_j + b(X_j)Δt + σ(X_j)ΔW_j + ½(σσ′)(X_j)(ΔW_j² − Δt)
    = X_j + b(X_j)Δt + σ(X_j)ΔW_j + ½Δt(σσ′)(X_j)(ξ_j² − 1).
In the above, we have used the fact that (Δt)^α(ΔW_j)^β = O((Δt)^{α+β/2}) and that, in one dimension,
    ∫_{jΔt}^{(j+1)Δt} ∫_{jΔt}^s dW_ℓ dW_s = ½(ΔW_j² − Δt).    (71)

We will say that a numerical scheme has strong order of convergence γ if there exists a positive constant C such that
    E|X̂_j − X(jΔt)| ≤ CΔt^γ,    (72)
for any j = 1, . . . , N and for Δt sufficiently small.
We usually evaluate (72) at t = T.
We will show that the Euler–Maruyama method has strong order of convergence 1/2.
The Milstein scheme has strong order 1.
The strong order of convergence for the Euler–Maruyama method becomes 1 when the noise is additive.
There are applications such as filtering and statistical inference where strong convergence of the numerical solution of the SDE is needed.

Quite often we are interested in the calculation of statistical quantities of solutions to SDEs, such as moments, rather than in pathwise properties.
For such a purpose, the strong convergence (72) is more than what we actually need.
We will say that a numerical method has weak order of convergence γ provided that there exists a positive constant C such that for all f ∈ C_P^{2(γ+1)}(R) (the class of 2(γ+1) times continuously differentiable functions which, together with their derivatives up to and including order 2(γ+1), have at most polynomial growth)
    |E f(X̂_j) − E f(X(jΔt))| ≤ CΔt^γ,    (73)
for any j = 1, . . . , N and for Δt sufficiently small.
Both the Euler–Maruyama and the Milstein schemes have weak order of convergence 1.
The Milstein scheme requires more function evaluations.

For the calculation of the expectation in (72) and (73) we need to use a Monte Carlo method.
We generate N sample paths of the exact solution of the SDE and of the numerical discretization that are driven by the same noise:
    ε̂ = (1/N) Σ_{k=1}^N |X_T^k − X̂_T^k|.
ε̂ is a random variable for which we can prove a central limit theorem.
In addition to the discretization error we also have the Monte Carlo error.
We can perform similar numerical experiments in order to study the weak order of convergence. In this case the exact and the approximate solutions do not need to be driven by the same noise.
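A sketch of such a strong-convergence experiment for the EM scheme applied to gBM (50), where the exact solution driven by the same Brownian increments is available in closed form; the parameters are illustrative.

r = 1; mu = 0.5; T = 1; Npaths = 1000;          % illustrative values
dts = 2.^(-(5:10)); err = zeros(size(dts));
for k = 1:length(dts)
    dt = dts(k); N = round(T/dt); e = 0;
    for m = 1:Npaths
        dW = sqrt(dt)*randn(1, N);              % shared Brownian increments
        X = 1;
        for j = 1:N
            X = X + r*X*dt + mu*X*dW(j);        % EM step
        end
        Xexact = exp((r - 0.5*mu^2)*T + mu*sum(dW));  % exact gBM solution
        e = e + abs(X - Xexact);
    end
    err(k) = e/Npaths;                          % estimate of E|X_T - X^em_T|
end
p = polyfit(log(dts), log(err), 1);             % fitted slope, approx. 1/2
loglog(dts, err, 'o-')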

To estimate the strong order of convergence of a numerical scheme we plot the error as a function of the step size in log–log scale and use least squares fitting:
    log(err) = log C + γ log Δt.
The weak convergence error consists of two parts, the discretization error and the statistical error due to the MC approximation.
We estimate (X̂_T denotes the solution of the numerical scheme):
    ε = |E f(X_T) − (1/M) Σ_{m=1}^M f(X̂_{NΔt}^m)|
      ≤ |E f(X_T) − E f(X̂_T)| + |E f(X̂_T) − (1/M) Σ_{m=1}^M f(X̂_{NΔt}^m)|
      ≤ CN^{−γ} + CM^{−1/2}.
Fixed computational resources: optimize over N, M.

The EM or Milstein schemes provide us with the solution of the SDE only at the discretization times.
The values at intermediate instants can be estimated using interpolation:
Constant interpolation:
    X̂(t) = X̂_{j_t},    j_t = max{j = 0, 1, 2, . . . , N : jΔt ≤ t};
Linear interpolation:
    X̂(t) = X̂_{j_t} + ((t − t_{j_t})/(t_{j_t+1} − t_{j_t}))(X̂_{j_t+1} − X̂_{j_t}),
where t_j = jΔt and where we have used X̂(t) to denote the interpolated numerical solution of the SDE.

Consider an approximate Brownian motion W^h(t) constructed as a Gaussian random walk at t_n = nh, h = T/N, T = 1, and by linear interpolation in between. Then
    E ∫_0^1 |W^h(t) − W(t)| dt = c_1/N^{1/2},
where c_1 = √(π/32). Furthermore,
    lim_{N→+∞} √(N/log N) E sup_{0≤t≤1} |W^h(t) − W(t)| = c_2,
for some constant c_2 ∈ (0, +∞).
We can obtain similar results for diffusion processes.
It is possible to simulate exactly a diffusion process: Exact simulation of diffusions, A. Beskos and G.O. Roberts, The Annals of Applied Probability, 2005.

Consider an SDE in arbitrary dimensions:
    dX_t = b(X_t) dt + σ(X_t) dW_t,    (74)
where X_t : [0, T] → R^d, b ∈ C(R^d), σ ∈ C(R^{d×d}), and W_t is a standard d-dimensional Brownian motion.
The EM scheme is (we use the notation b_n^i = b^i(X_n) for the ith component of the vector b etc.)
    X_{n+1}^i = X_n^i + b_n^i Δt + Σ_{j=1}^d σ_n^{ij} ΔW_n^j.    (75)
It is straightforward to implement this scheme.

For the Milstein scheme we have:
    X_{n+1}^i = X_n^i + b_n^i Δt + Σ_{j=1}^d σ_n^{ij} ΔW_n^j + Σ_{j1,j2=1}^d Σ_{ℓ=1}^d σ_n^{ℓj1} (∂σ^{ij2}/∂x_ℓ)(X_n) I_{j1j2},    (76)
where I_{j1j2} denotes a double Itô integral:
    I_{j1j2} = ∫_{t_n}^{t_{n+1}} ∫_{t_n}^{s_1} dW_{s_2}^{j1} dW_{s_1}^{j2}.    (77)
The diagonal terms I_{jj} are given by the 1d formula. The off-diagonal elements might be difficult/expensive to simulate.
We can obtain higher order schemes using the stochastic Taylor expansion.
Higher order schemes require the evaluation of a large number of derivatives of the drift and diffusion coefficients.

We can exploit the structure of the SDE to simplify the calculation of the Milstein correction to the EM scheme.
Consider a nonlinear oscillator with friction and noise (example: the noisy Duffing–Van der Pol oscillator):
    Ẍ_t = b(X_t) − γ(X_t)Ẋ_t + Σ_{j=1}^d σ_j(X_t) Ẇ^j.    (78)
Write it as a first order system:
    dX_t^1 = X_t^2 dt,
    dX_t^2 = (b(X_t^1) − γ(X_t^1)X_t^2) dt + Σ_{j=1}^d σ_j(X_t^1) dW^j.
The Milstein discretization for this SDE is
    Y_{n+1}^1 = Y_n^1 + Y_n^2 Δt,
    Y_{n+1}^2 = Y_n^2 + (b(Y_n^1) − γ(Y_n^1)Y_n^2) Δt + Σ_{j=1}^d σ_j(Y_n^1) ΔW_n^j.

No Milstein correction appears (the double Itô integrals vanish). This scheme has strong order of convergence 1.
Solve the first equation for Y_n^2 and substitute into the second to obtain a multistep scheme for Y_n^1:
    Y_{n+2}^1 = (2 − γ(Y_n^1)Δt) Y_{n+1}^1 − (1 − γ(Y_n^1)Δt) Y_n^1 + b(Y_n^1)(Δt)² + Σ_{j=1}^d σ_j(Y_n^1) ΔW_n^j Δt.    (79)
This is the Lepingle–Ribemont scheme. The first equation in the Milstein scheme can be used as a starting routine and for calculating Y_n^2.

Population dynamics:
    dX_t = rX_t(K − X_t) dt + σX_t dW_t,    X(0) = X_0.
Motion in a periodic incompressible velocity field subject to molecular diffusion:
    dX_t = v(X_t) dt + √(2σ) dW_t,    ∇·v(x) = 0.

% compute effective diffusivities for particles moving in periodic
% velocity fields using Monte Carlo. The velocity field is a
% Taylor-Green flow.
% dt: time step   N: number of steps   M: number of paths
% D: molecular diffusion
%
% chsowx, chsowy are the x- and y-components of the Taylor-Green
% velocity field, defined elsewhere; e.g. (assumed definitions)
% chsowx = @(x,y) cos(x).*sin(y); chsowy = @(x,y) -sin(x).*cos(y);
function [kxx, kyy, tint] = effdiff(N, dt, M, D)
T = dt*N;
tint = 0:dt:T;
%
u = zeros(M,N);                  % x-coordinates of the M particles
v = zeros(M,N);                  % y-coordinates of the M particles
uin = u(:,1);
vin = v(:,1);
%
DDD = num2str(D);
MMM = num2str(M);
elll = num2str(dt);
%
dW1 = sqrt(dt)*randn(M,N);       % Brownian increments
dW2 = sqrt(dt)*randn(M,N);
for j = 1:N-1
    u(:,j+1) = u(:,j) + chsowx(u(:,j),v(:,j))*dt + sqrt(2*D)*dW1(:,j);
    v(:,j+1) = v(:,j) + chsowy(u(:,j),v(:,j))*dt + sqrt(2*D)*dW2(:,j);
end
%
v = [vin, v];
u = [uin, u];
%
figure
plot(u(1,:), v(1,:))
grid on
%
kxx = var(u)./(2*tint);          % effective diffusivities from the
kyy = var(v)./(2*tint);          % variance of the particle positions
%
figure
plot(tint, kxx, 'r', 'LineWidth', 3)
ylabel('var(x)/2t', 'Fontsize', 16, 'Rotation', 0)
xlabel('t', 'Fontsize', 16)
title(['var(x)/2t', '  D = ' DDD ', M = ' MMM ', dt = ' elll])
%
figure
plot(tint, kyy, 'r', 'LineWidth', 3)
ylabel('var(y)/2t', 'Fontsize', 16, 'Rotation', 0)
xlabel('t', 'Fontsize', 16)
title(['var(y)/2t', '  D = ' DDD ', M = ' MMM ', dt = ' elll])
end

We consider the Itô SDE
    dX_t = b(X_t) dt + σ(X_t) dW_t,    X_0 = x.    (80)
We assume that the coefficients are smooth, globally Lipschitz continuous and satisfy the linear growth condition:
    |b(x) − b(y)| + |σ(x) − σ(y)| ≤ C|x − y|    (81)
and
    |b(x)| + |σ(x)| ≤ C(1 + |x|).    (82)
We also assume that the initial condition is a random variable independent of the Brownian motion W_t with
    E|X_0|² ≤ C.
Under the above assumptions there exists a unique strong solution to (80) for t ∈ [0, T].

Lemma
Under the above assumptions the solution to the SDE satisfies
    E sup_{t∈[0,T]} |X_t|² ≤ C(T).    (83)
Proof.
Use the assumptions on the coefficients b(·) and σ(·) and on the initial condition, together with the Cauchy–Schwarz, Burkholder–Davis–Gundy (BDG) and Gronwall inequalities.
The BDG inequality that we need is
    c₂ E ∫_0^t σ²(s) ds ≤ E sup_{t∈[0,T]} |∫_0^t σ(s) dW_s|² ≤ C₂ E ∫_0^t σ²(s) ds.    (84)

Let {X_n^{Δt}}_{n=0}^N denote the Euler discretisation of the SDE (80) with constant interpolation.
We partition the interval [0, T]: t_n = nΔt, n = 0, . . . , N, Δt = T/N.
Lemma
Under the above assumptions we have
    E sup_{t∈[t_n, t_{n+1}]} |X_t − X_{t_n}|² ≤ CΔt.    (85)
Proof.
Use the assumptions on the coefficients, the BDG inequality and Lemma 34.

Theorem
Let {X_n^{Δt}}_{n=0}^N denote the Euler discretisation of the SDE (80) with constant interpolation. Under the above assumptions we have
    E sup_{t∈[0,T]} |X_t − X_t^{Δt}|² ≤ CΔt.    (86)
For the proof we will need the discrete Gronwall inequality in the following form: let {y_n}_{n=0}^N be a sequence and A, B > 0 satisfying
    y_0 = 0,    y_n ≤ A + Bh Σ_{j=0}^{n−1} y_j,    1 ≤ n ≤ N,  h = 1/N.
Then
    max_{0≤i≤N} y_i ≤ A e^B.    (87)

We can write the Euler scheme in the form
    X_{n+1}^{Δt} = X_n^{Δt} + ∫_{t_n}^{t_{n+1}} b(X_n^{Δt}) ds + ∫_{t_n}^{t_{n+1}} σ(X_n^{Δt}) dW_s.
For the exact solution we have
    X_{n+1} = X_n + ∫_{t_n}^{t_{n+1}} b(X_n) ds + ∫_{t_n}^{t_{n+1}} σ(X_n) dW_s + ε_n,
where
    ε_n = ∫_{t_n}^{t_{n+1}} (b(X_s) − b(X_n)) ds + ∫_{t_n}^{t_{n+1}} (σ(X_s) − σ(X_n)) dW_s.

Take the difference between the Euler approximation and the exact solution (use the notation ΔX_n = X_n − X_n^{Δt}, Δb(X_n) = b(X_n) − b(X_n^{Δt}) etc.):
    ΔX_{n+1} = ΔX_n + ∫_{t_n}^{t_{n+1}} Δb_n ds + ∫_{t_n}^{t_{n+1}} Δσ_n dW_s + ε_n.
Sum over n and use ΔX_0 = 0 to obtain
    ΔX_{n+1} = Σ_{n=0}^{N−1} ∫_{t_n}^{t_{n+1}} Δb_n ds + Σ_{n=0}^{N−1} ∫_{t_n}^{t_{n+1}} Δσ_n dW_s + R_N,
where
    R_N = Σ_{n=0}^{N−1} ε_n.

Use the Lipschitz continuity, Cauchy–Schwarz and BDG inequalities to obtain (for Δt ≤ 1)
    E|ΔX_{n+1}|² ≤ C(T)Δt Σ_{n=0}^{N−1} E|ΔX_n|² + C E|R_N|².
Use Lemma 35 together with Lipschitz continuity, Cauchy–Schwarz and BDG to obtain
    E|R_N|² ≤ CΔt.
Use the discrete Gronwall inequality to conclude the proof of Theorem 36.

Theorem
Let {X_n^{Δt}}_{n=0}^N denote the Euler discretisation of the SDE (80) with constant interpolation and assume that b, σ ∈ C_b⁴ and f ∈ C_P⁴. Then
    |E f(X_T^{Δt}) − E f(X_T)| ≤ CΔt.    (88)

For the proof of this theorem we will use the backward Kolmogorov equation. Let u(t, x) denote the solution of the backward Kolmogorov equation with u(T, x) = f(x). We have
    E f(X_T^{Δt}) − E f(X_T) = E(u(T, X_T^{Δt})) − u(0, X_0)
    = Σ_{i=0}^{n−1} E(u(t_{i+1}, X_{t_{i+1}}^{Δt}) − u(t_i, X_{t_i}^{Δt}))
    = Σ_{i=0}^{n−1} E ∫_{t_i}^{t_{i+1}} (∂_t u(s, X_s^{Δt}) + L(X_{t_i}^{Δt}) u(s, X_s^{Δt})) ds
    = Σ_{i=0}^{n−1} E ∫_{t_i}^{t_{i+1}} ((∂_t + L(X_{t_i}^{Δt}))(u(s, X_s^{Δt}) − u(t_i, X_{t_i}^{Δt}))) ds,
where L(X_{t_i}^{Δt}) denotes the generator with its coefficients frozen at X_{t_i}^{Δt}.
We use Itô's formula:
    E|∂_t u(s, X_s^{Δt}) − ∂_t u(t_i, X_{t_i}^{Δt})| = E|φ(X_{t_i})| O(Δt).

When we are interested in weak approximation, we can replace the Brownian increments by different random variables that might be easier to simulate and have the right statistics.
We can use, for example, the two-point distributed random variable √Δt ξ_n, where the ξ_n are i.i.d. with
    P(ξ_n = ±1) = 1/2.    (89)
For higher order approximations we can use the 3-point distributed random variable with
    P(ξ_n = ±√3) = 1/6,    P(ξ_n = 0) = 2/3.    (90)
This is useful when considering weak convergence for fully implicit schemes.
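A sketch of how the increments (89) and (90) can be sampled from uniform random variables; the sample size is illustrative.

N = 10^5;
xi2 = 2*(rand(1,N) < 0.5) - 1;          % two-point rv (89): P(+-1) = 1/2
u = rand(1,N);                          % three-point rv (90)
xi3 = sqrt(3)*((u < 1/6) - (u > 5/6));  % P(+-sqrt(3)) = 1/6, P(0) = 2/3
disp([mean(xi2) var(xi2); mean(xi3) var(xi3)])   % check E xi = 0, E xi^2 = 1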

Itô's formula can be used to obtain a probabilistic description of solutions to linear partial differential equations of parabolic type.
Let X_t^x be a diffusion process with drift b(·), diffusion Σ(·) = σσ^T(·) and generator L, with X_0^x = x, and let f ∈ C_0²(R^d) and V ∈ C(R^d), bounded from below. Then the function
    u(x, t) = E(e^{−∫_0^t V(X_s^x) ds} f(X_t^x))    (91)
is the solution to the initial value problem
    ∂u/∂t = Lu − Vu,    (92a)
    u(0, x) = f(x).    (92b)

To derive this result, we introduce the variable Y_t = ∫_0^t V(X_s) ds.
We rewrite the SDE for X_t as
    dX_t^x = b(X_t^x) dt + σ(X_t^x) dW_t,    X_0^x = x,    (93a)
    dY_t^x = V(X_t^x) dt,    Y_0^x = 0.    (93b)
The process {X_t^x, Y_t^x} is a diffusion process with generator
    L_{x,y} = L + V(x) ∂/∂y.
We can write
    E(e^{−∫_0^t V(X_s^x) ds} f(X_t^x)) = E(φ(X_t^x, Y_t^x)),
where φ(x, y) = f(x)e^{−y}. We apply Itô's formula to this function to obtain (92).

The representation formula (91) for the solution of the initial value problem (92) is called the Feynman–Kac formula.
We can use the Feynman–Kac formula to develop a numerical scheme for solving parabolic PDEs based on the numerical solution of the underlying SDE.
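A sketch of such a scheme: u(x, t) in (91) is estimated by averaging the Feynman–Kac weight over EM paths started at x. The coefficients b, σ, the potential V and the data f below are illustrative choices.

b = @(x) -x;  sigma = 1;  V = @(x) x.^2;  f = @(x) cos(x);   % assumed data
x0 = 0.5; t = 1; dt = 1e-3; N = round(t/dt); M = 10^4;
X = x0*ones(M,1); I = zeros(M,1);     % I approximates int_0^t V(X_s) ds
for n = 1:N
    I = I + V(X)*dt;                  % left-point quadrature of the exponent
    X = X + b(X)*dt + sigma*sqrt(dt)*randn(M,1);   % EM step for the paths
end
u = mean(exp(-I).*f(X));              % Monte Carlo estimate of u(x0, t)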

Stability analysis of numerical schemes
Goal: solve SDEs accurately over long time intervals.
The constant in the strong/weak order of convergence estimates grows exponentially fast in time (Gronwall's lemma).
Stability analysis: fix Δt, study the T → +∞ limit.
Test the stability of the scheme on the linear SDE (geometric BM)
    dX_t = λX_t dt + μX_t dW_t,    X(0) = X_0,    (94)
where λ, μ ∈ C.

We will say that an SDE is stable in mean square if for all X_0 ≠ 0 with prob. 1,
    lim_{t→+∞} E|X_t|² = 0.    (95)
We will say that an SDE is asymptotically stable if for all X_0 ≠ 0 with prob. 1,
    P(lim_{t→+∞} |X_t| = 0) = 1.    (96)
For the gBM (94) we have
    lim_{t→+∞} E|X_t|² = 0  ⟺  Re(λ) + ½|μ|² < 0,    (97)
and
    P(lim_{t→+∞} |X_t| = 0) = 1  ⟺  Re(λ − ½μ²) < 0.    (98)

Lemma
The geometric Brownian motion (94) with λ, μ ∈ C is stable in mean square provided that
    Re(λ) + ½|μ|² < 0.    (99)

Proof.
We apply Itô's formula to |X_t|² = (Re X_t)² + (Im X_t)² to obtain
    d|X_t|² = (2Re(λ) + |μ|²)|X_t|² dt + dM_t,
where M_t is a martingale. We take the expectation to obtain
    (d/dt) E|X_t|² = (2Re(λ) + |μ|²) E|X_t|².
The solution to this equation is
    E|X_t|² = exp((2Re(λ) + |μ|²)t) E|X_0|²,
from which (99) follows.

Mean square stability is more stringent than asymptotic stability.
Suppose that the parameters λ and μ are chosen so that the gBM is mean square stable. For what values of Δt is the EM scheme also mean square stable?
    lim_{j→+∞} E|X_j|² = 0  ⟺  |1 + Δtλ|² + Δt|μ|² < 1.    (100)
Similar results can be obtained for the Milstein scheme.
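A sketch of a Monte Carlo check of (100) for real λ, μ: with λ = −3, μ = 1 the SDE is mean square stable by (99), while the EM scheme is unstable for Δt = 1 and stable for the smaller step sizes.

lambda = -3; mu = 1; T = 20; M = 5000;   % SDE stable: Re(lambda)+mu^2/2 < 0
for dt = [1, 1/4, 1/16]
    N = round(T/dt); X = ones(M,1);
    for n = 1:N
        X = X + lambda*X*dt + mu*X.*(sqrt(dt)*randn(M,1));   % EM step
    end
    fprintf('dt = %g: |1+dt*lambda|^2 + dt*mu^2 = %g, E|X_T|^2 = %g\n', ...
            dt, (1+dt*lambda)^2 + dt*mu^2, mean(X.^2));
end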

Definition
Let X_n^{Δt} denote a time-discrete approximation of the SDE
    dX_t = b(X_t) dt + σ(X_t) dW_t    (101)
with step size Δt starting at X_0^{Δt} at t = 0, and let X̄_n^{Δt} denote the approximation starting at X̄_0^{Δt}. We will say that X_n^{Δt} is stochastically numerically stable for the SDE (101) if for any finite interval [0, T] there exists Δt_0 > 0 such that for all ε > 0 and Δt ∈ (0, Δt_0) we have
    lim_{|X_0^{Δt} − X̄_0^{Δt}|→0} sup_{t∈[0,T]} P(|X̄_{n_t}^{Δt} − X_{n_t}^{Δt}| ≥ ε) = 0.    (102)
We will say that a numerical scheme is stochastically numerically stable if it satisfies (102) for all SDEs for which it is convergent.

Asymptotic stochastic stability: replace (102) with
    lim_{|X_0^{Δt} − X̄_0^{Δt}|→0} lim_{T→+∞} P(|X̄_{n_t}^{Δt} − X_{n_t}^{Δt}| ≥ ε) = 0.    (103)
Consider the linear SDE
    dX_t = λX_t dt + dW_t.
It is mean square stable for Re(λ) < 0.
Consider numerical schemes that can be written in the form
    X_{n+1}^{Δt} = X_n^{Δt} G(λΔt) + Z_n^{Δt}.
The region of absolute stability of the scheme is the set of λΔt for which
    Re(λ) < 0 and |G(λΔt)| < 1.
If the region of absolute stability is the left half complex plane, the scheme is A-stable.

Just as with ODEs, implicit schemes have better stability properties than explicit schemes.
The implementation of an implicit scheme requires the solution of an additional algebraic equation at each time step, which can usually be done using the Newton–Raphson algorithm.
Treating the noise implicitly involves reciprocals of Gaussian random variables, which do not have finite absolute moments.
We will therefore treat the drift implicitly and the noise explicitly.
The implicit Euler scheme is
    X_{n+1} = X_n + b(X_{n+1})Δt + σ(X_n)ΔW_n.    (104)
If we are only interested in weak convergence we can replace √Δt N(0, 1) by random variables that are accurate in the weak sense and whose reciprocals are easier to handle.

Family of implicit Euler schemes (stochastic theta method, STM):
    X_{n+1} = X_n + (θb(X_{n+1}) + (1 − θ)b(X_n))Δt + σ(X_n)ΔW_n,    (105)
for θ ∈ [0, 1].
Family of implicit Milstein schemes:
    X_{n+1} = X_n + (θb(X_{n+1}) + (1 − θ)b(X_n))Δt + σ(X_n)ΔW_n + ½σ(X_n)σ′(X_n)((ΔW_n)² − Δt).    (106)
For both schemes we have
    G(λΔt) = (1 + (1 − θ)λΔt)/(1 − θλΔt).
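A sketch of one path of the STM (105) for a scalar SDE; the implicit equation for X_{n+1} is solved with fzero here (a Newton iteration would also do), and the coefficients are illustrative.

b = @(x) x - x.^3;  sig = 0.5;          % illustrative drift; constant noise
theta = 0.5; dt = 0.1; N = 100;
X = zeros(1, N+1); X(1) = 0;
for n = 1:N
    dW = sqrt(dt)*randn;
    rhs = X(n) + (1-theta)*b(X(n))*dt + sig*dW;   % explicit part of (105)
    g = @(y) y - theta*b(y)*dt - rhs;             % implicit equation g = 0
    X(n+1) = fzero(g, X(n));                      % solve for X_{n+1}
end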

Test problem: geometric Brownian motion (linear SDE with multiplicative noise), eqn. (94).
For what stepsizes Δt does the stochastic theta method share the stability properties of the test problem?
Use the concept of mean square stability.
For the SDE the mean square stability depends on the parameters λ, μ.
For the numerical scheme it depends on the model parameters λ, μ and on the numerical scheme parameters Δt, θ.
Our goal is to find for what values of Δt, θ, λ, μ the stochastic theta method is also mean square stable.

The STM for the geometric Brownian motion becomes
    X_{n+1} = X_n + (1 − θ)hλX_n + θhλX_{n+1} + √h μξ_n X_n,    (108)
with h := Δt. We can rewrite this equation as
    X_{n+1} = (a + bξ_n)X_n,    (109)
with
    a = (1 + (1 − θ)hλ)/(1 − θhλ),    b = √h μ/(1 − θhλ).    (110)
This is a homogeneous Markov chain with an uncountable state space.
To study the stability properties of the STM we need to study the long time behavior of the Markov chain (109).

Definition
The numerical scheme (Markov chain) (109) is mean square stable iff
    lim_{n→+∞} E|X_n|² = 0.    (111)
Theorem
The MC (109) is mean square stable if and only if
    |a|² + |b|² < 1.    (112)
In particular, for the STM we have
    (|1 + (1 − θ)hλ|² + h|μ|²)/|1 − θhλ|² < 1.    (113)

Define the stability regions for the gBM and for the STM:
    S_SDE := {λ, μ ∈ C : Re(λ) + ½|μ|² < 0}    (114)
and
    S_STM(θ, h) := {λ, μ ∈ C : (|1 + (1 − θ)hλ|² + h|μ|²)/|1 − θhλ|² < 1}.    (115)
Theorem
For all h > 0 we have
    S_STM(θ, h) ⊂ S_SDE for θ ∈ [0, ½);
    S_STM(θ, h) = S_SDE for θ = ½;
    S_STM(θ, h) ⊃ S_SDE for θ ∈ (½, 1].

Definition
The numerical scheme (109) is (mean square) A-stable provided that whenever the test problem (94) is mean square stable, then (109) is also mean square stable for all Δt > 0.
Corollary
The STM is A-stable for all θ ∈ [½, 1].
If the SDE is unstable, then so is the STM for all h.
If the SDE is stable, then so is the STM for sufficiently small h. In this case, the resulting stepsize restriction can be arbitrarily severe.

Numerical methods for ergodic SDEs
We consider SDEs in R^d:
    dX_t = b(X_t) dt + σ(X_t) dW_t.    (116)
We assume that X_t is ergodic with respect to the invariant measure μ(dx) = ρ(x) dx.
Assuming sufficient regularity, the invariant density ρ(x) is the unique normalizable solution of the stationary Fokker–Planck equation
    L*ρ = 0.    (117)
In many applications we need to be able to calculate expectations with respect to the stationary distribution:
    E^μ f(x) = ∫_{R^d} f(x)ρ(x) dx.    (118)

In high dimensions it is computationally prohibitive to solve the stationary Fokker–Planck equation.
We also need to calculate the high dimensional integral (118).
For ergodic SDEs time averages equal phase-space averages:
    lim_{T→+∞} (1/T) ∫_0^T f(X_s) ds = E^μ f(x).    (119)
We can solve the SDE (116) numerically and then calculate the time average in (119) to compute E^μ f(x).
Let X_n^h = X^h(nh), h = Δt, denote the solution of the numerical scheme. We need to estimate the difference
    (1/N) Σ_{n=1}^N f(X_n^h) − ∫_{R^d} f(x)ρ(x) dx    (120)
as a function of h and N.

Let {X_n^{Δt}}_{n=0}^{n_T} denote a (long) numerically calculated trajectory. We have
    F_T := (1/T) ∫_0^T f(X_s) ds ≈ (1/n_T) Σ_{n=0}^{n_T−1} f(X_n^{Δt}) =: F_T^{Δt}.
Define
    F^{Δt} = lim_{T→+∞} F_T^{Δt}.
We will say that a time discrete approximation X_n^{Δt} converges with respect to the ergodic criterion with order β > 0 to the ergodic diffusion process X_t as Δt → 0 provided that for every f ∈ C_P^ℓ(R^d, R) there exist a positive constant C_f independent of Δt and a Δt_0 such that
    |F^{Δt} − E^μ f(x)| ≤ C_f (Δt)^β.    (121)

This is an extension of the weak convergence criterion to the infinite time horizon T = ∞.
The explicit Euler scheme converges with respect to the ergodic criterion with order β = 1.0.
We expect that most of the numerical schemes with weak order β also converge with respect to the ergodic criterion with the same order β, under the assumptions of the Hasminskii criterion.

We need to address the following issues:
1. Does the numerical scheme have an invariant measure (density) ρ^h? How quickly does it converge to it?
2. How close is ρ^h to ρ?
3. How close is the time averaging estimator to the stationary measure of the SDE?
The standard weak convergence results (e.g. for the explicit Euler method) are valid only over finite time intervals. In order to be able to control the difference between the long time average of the numerical scheme and E^μ f, we need estimates on the numerical scheme that are uniform in time.
For this we need to use the structural property of (geometric) ergodicity of the SDE.

A very important concept in the study of limit theorems for stochastic processes is that of ergodicity.
This concept, in the context of Markov processes, provides us with information on the long-time behavior of a Markov semigroup.
Definition
A Markov process is called ergodic if the equation
    P_t g = g,    g ∈ C_b(E), ∀t ≥ 0
has only constant solutions.
Roughly speaking, ergodicity corresponds to the case where the semigroup P_t is such that P_t − I has only constants in its null space or, equivalently, to the case where the generator L has only constants in its null space. This follows from the definition of the generator of a Markov process.

Under some additional compactness assumptions, an ergodic Markov process has an invariant measure μ with the property that, in the case T = R⁺,
    lim_{t→+∞} (1/t) ∫_0^t g(X_s) ds = E g(x),
where E denotes the expectation with respect to μ.
This is a physicist's definition of an ergodic process: time averages equal phase space averages.
Using the adjoint semigroup we can define an invariant measure as the solution of the equation
    P_t* μ = μ.
If this measure is unique, then the Markov process is ergodic.

Using this, we can obtain an equation for the invariant measure in terms of the adjoint of the generator L*, which is the generator of the semigroup P_t*. Indeed, from the definition of the generator of a semigroup and the definition of an invariant measure, we conclude that a measure μ is invariant if and only if
    L*μ = 0
in some appropriate generalized sense ((L*μ, f) = 0 for every bounded measurable function).
Assume that μ(dx) = ρ(x) dx. Then the invariant density satisfies the stationary Fokker–Planck equation
    L*ρ = 0.
The invariant measure (distribution) governs the long-time dynamics of the Markov process.

If X_0 is distributed according to ρ, then so is X_t for all t > 0. The resulting stochastic process, with X_0 distributed in this way, is stationary.
In this case the transition probability density (the solution of the Fokker–Planck equation) is independent of time: ρ(x, t) = ρ(x).
Consequently, the statistics of the Markov process are independent of time.

Example
The one dimensional Brownian motion is not an ergodic process: the null space of the generator L = ½ d²/dx² on R is not one dimensional!
Example
Consider a one-dimensional Brownian motion on [0, 1], with periodic boundary conditions. The generator of this Markov process L is the differential operator L = ½ d²/dx², equipped with periodic boundary conditions on [0, 1]. This operator is self-adjoint. The null space of both L and L* comprises constant functions on [0, 1]. Both the backward Kolmogorov and the Fokker–Planck equation reduce to the heat equation
    ∂ρ/∂t = ½ ∂²ρ/∂x²
with periodic boundary conditions in [0, 1]. Fourier analysis shows that the solution converges to a constant at an exponential rate.

Example
The one dimensional Ornstein–Uhlenbeck (OU) process is a Markov process with generator
    L = −αx d/dx + D d²/dx².
The null space of L comprises constants in x. Hence, it is an ergodic Markov process. In order to calculate the invariant measure we need to solve the stationary Fokker–Planck equation:
    L*ρ = 0,    ρ ≥ 0,    ‖ρ‖_{L¹(R)} = 1.    (122)

Example (Continued)
Let us calculate the L²-adjoint of L. Assuming that f, h decay sufficiently fast at infinity, we have:
    ∫_R (Lf)h dx = ∫_R ((−αx∂_x f)h + (D∂_x² f)h) dx
                 = ∫_R (f ∂_x(αxh) + f(D∂_x² h)) dx =: ∫_R f L*h dx,
where
    L*h := (d/dx)(αxh) + D d²h/dx².
We can calculate the invariant distribution by solving equation (122).
The invariant measure of this process is the Gaussian measure
    μ(dx) = √(α/(2πD)) exp(−(α/(2D))x²) dx.
If the initial condition of the OU process is distributed according to the invariant measure, then the OU process is stationary.

Approximation of Invariant Measures and MCMC
Consider the SDE in R^d:
    dX_t = b(X_t) dt + σ(X_t) dW_t,    X_0 = x.    (123)
The generator of X_t is
    L = b(x)·∇ + ½ Σ_{i,j=1}^d Σ_{ij}(x) ∂²/∂x_i∂x_j,    Σ(x) = σ(x)σ(x)^T.    (124)
The L²-adjoint (Fokker–Planck operator) is
    L*· = ∇·(−b(x)· + ½∇·(Σ(x)·)).    (125)

The expectation u(x, t) = E(φ(X_t)|X_0 = x) is the solution of the initial value problem for the backward Kolmogorov equation
    ∂u/∂t = Lu,    u(x, 0) = φ(x).    (126)
Using the transition probability density p(t, x, y) we can write
    u(x, t) = ∫_{R^d} p(t, x, y)φ(y) dy.
p(t, x, y) is the solution of the forward Kolmogorov (Fokker–Planck) equation
    ∂p/∂t = L_y* p,    p(0, x, y) = δ(x − y).    (127)

Connection between SDEs, (elliptic and parabolic) PDEs and probability measures:
We can use SDEs in order to calculate functional integrals, the solution of elliptic and parabolic PDEs, and to sample from a given probability distribution.
We need to study the ergodic properties of the SDE (123).
We denote by P_t and P_t* the semigroups generated by L and L*, respectively:
    P_t = e^{Lt} and P_t* = e^{L*t}.
P_t acts on L^∞ functions and P_t* on probability measures.

A probability measure π is invariant under the dynamics (123) provided that
    P_t* π = π.    (128)
We will say that the dynamics X_t is ergodic if there exists only one invariant measure. In this case
    lim_{T→+∞} (1/T) ∫_0^T f(X_s) ds = ∫_{R^d} f(x) π(dx).    (129)
Let π(dx) = p_∞(x) dx. We divide (128) by t and pass to the limit t → 0 to obtain the stationary Fokker–Planck equation
    L* p_∞ = 0,    (130)
together with the appropriate boundary conditions.
The SDE (123) is ergodic provided that there exists a unique normalizable solution to (130).

Consider the dynamics (123) with X_0 ∼ p_0(x). The law of the process X_t is the solution of the initial value problem for the Fokker–Planck equation
    ∂p/∂t = L*p,    p(0, x) = p_0(x).    (131)
When X_t is ergodic we have that
    lim_{t→+∞} ‖p(·, t) − p_∞(·)‖_{L¹(R^d)} = 0.
In applications to MCMC we need quantitative information on the rate of convergence to equilibrium.

We can study the ergodic properties of an SDE using techniques from stochastic analysis, functional analysis (functional inequalities), Lyapunov function techniques etc.
Hasminskii's criterion: if the drift and diffusion coefficients are smooth, the diffusion coefficient is strictly positive definite and bounded, and there exist a constant α > 0 and a compact subset K ⊂ R^d such that
    ⟨b(x), x⟩ ≤ −α|x|²
for all x ∈ R^d \ K, then X_t is ergodic.
Even if we can prove that an SDE is ergodic, we usually do not have an analytic formula for the invariant measure.
Example: nonequilibrium steady states.

We can consider numerical schemes that can be written in the form
    X_{n+1}^h = X_n^h + b̂(X_n^h; h)h + σ̂(X_n^h; h)√h ξ_n,    (132)
where b̂ : R^d × (0, 1) → R^d, σ̂ : R^d × (0, 1) → R^{d×m}, and {ξ_n} is a collection of i.i.d. real-valued random variables with
    E ξ_{n,i} = E ξ_{n,i}³ = 0,    E ξ_{n,i}² = 1,    E ξ_{n,i}^{2r} < +∞,
for r sufficiently large.
For example, we can have ξ_{n,i} ∼ N(0, 1) i.i.d., or i.i.d. two-point random variables (89).

The convergence of the long time average of the solution to the SDE to its invariant measure can be proved by studying the solution of the Poisson equation
    −Lφ = f − E^π f,    E^π φ = 0.    (133)
We apply Itô's formula to φ:
    dφ(X_t) = Lφ(X_t) dt + dM_t = −(f(X_t) − E^π f) dt + dM_t,
where M_t = ∫_0^t ∇φ(X_t)σ(X_t) dW_t. Consequently,
    (1/T) ∫_0^T f(X_t) dt − E^π f = −(1/T)(φ(X_T) − φ(X_0)) + (1/T) M_T.

Assume that solutions of the Poisson equation (133) are bounded (for example, if the diffusion is uniformly elliptic and the state space is compact, e.g. the unit torus T^d). Then:
    lim_{T→+∞} E((1/T)(φ(X_T) − φ(X_0)))² = 0
and
    lim_{T→+∞} E((1/T)M_T)² = lim_{T→+∞} (1/T²) E⟨M⟩_T = 0,
using the law of large numbers and the central limit theorem for martingales.
Consequently:
    E((1/T) ∫_0^T f(X_t) dt − E^π f)² ≤ C/T.

This is a quantitative version of the mean ergodic theorem. We can also prove an almost sure ergodic theorem (strong law of large numbers):
    lim_{T→+∞} (1/T) ∫_0^T f(X_t) dt = E^π f    a.s.
The long time average of f(X_t) is an estimator for the integral E^π f. Its bias and variance are:
    Bias((1/T) ∫_0^T f(X_t) dt) := E((1/T) ∫_0^T f(X_t) dt) − E^π f = O(1/T)    (134)
and
    Var((1/T) ∫_0^T f(X_t) dt) := E((1/T) ∫_0^T f(X_t) dt − E^π f)² = O(1/T).    (135)

We have that
    lim_{T→+∞} T Var((1/T) ∫_0^T f(X_t) dt) = 2⟨φ, f − E^π f⟩_π,    (136)
where φ denotes the solution of the Poisson equation (133) and ⟨h, g⟩_π = ∫ hg π(dx) denotes the L²(π) inner product.
This follows from Itô's formula or, equivalently, from (setting f̄ = f − E^π f)
    lim_{T→+∞} T Var((1/T) ∫_0^T f(X_t) dt) = 2 ∫_0^{+∞} E^π(f̄(X_t)f̄(X_0)) dt    (137)
and the backward Kolmogorov equation.

We have:
    Var((1/T) ∫_0^T f̄(X_t) dt) = E((1/T) ∫_0^T f̄(X_t) dt)²
    = (1/T²) ∫_0^T ∫_0^T E^π(f̄(X_t)f̄(X_s)) dt ds
    = (1/T²) ∫_0^T ∫_0^T R_{f̄}(t, s) dt ds
    = (2/T²) ∫_0^T (T − s) R_{f̄}(s) ds
    = (2/T) ∫_0^T (1 − s/T) E^π(f̄(X_s)f̄(X_0)) ds,
from which (137) follows.

Let u(x, t) denote the solution of the backward Kolmogorov equation with initial condition f̄(x). We can formally write
    E(f̄(X_t) | X_0 = x) = u(x, t) = e^{Lt} f̄(x),
where L denotes the generator of X_t.
For the calculation of the time integral of the stationary autocorrelation function we have (X denotes the state space):
    ∫_0^{+∞} E^π(f̄(X_t)f̄(X_0)) dt = ∫_X ∫_0^{+∞} (e^{Lt}f̄(x)) f̄(x) dt π(dx)
    = ∫_X f̄(x) ∫_0^{+∞} e^{Lt}f̄(x) dt π(dx)
    = ∫_X f̄(x)(−L)^{−1}f̄(x) π(dx)
    = ⟨f̄, φ⟩_π.

Consider the long time average estimator of E^π f:
    f̂_N = (1/N) Σ_{n=0}^{N−1} f(X_n^h),    (138)
where {X_n^h}_{n=0}^{N−1} is the Markov chain obtained from the numerical discretization of the SDE, for example using the EM scheme.
We want to calculate the variance of this estimator by generating a long trajectory:
1. We generate a long trajectory of length M·T with T = Nh and we split it into M blocks of length T.
2. We evaluate the estimators f̂_{m,N}. Assuming that T is large enough and that we have fast decay of correlations, these estimators are close to being uncorrelated.
3. We compute the sampled variance
    D̂ = (1/(M−1)) Σ_{m=1}^M (f̂_{m,N} − (1/M) Σ_{m=1}^M f̂_{m,N})².    (139)

For sufficiently large N, M we have the following confidence interval:
    E f̂_N ∈ (f̂_{NM} − c√(D̂/M), f̂_{NM} + c√(D̂/M)),    (140)
with probability 95% for c = 2 and probability 99.7% for c = 3.

MARKOV CHAIN MONTE CARLO


Goal: sample from a distribution π(x) that is known only up to a constant. We will sometimes write
    π(x) = (1/Z) e^{−V(x)},    Z = ∫_{R^N} e^{−V(x)} dx.    (141)
We want to calculate integrals of the form
    E^π f := ∫_{R^d} f(x) π(dx).
MC approach: calculate the normalization constant
    Z = ∫_{R^d} e^{−V(x)} dx
and the integral
    E^π f(x) = ∫_{R^d} f(x) π(x) dx
using MC, together with an appropriate variance reduction technique.
SDE approach: construct an appropriate stochastic dynamics whose invariant distribution is π(x).

Markov Chain Monte Carlo
Construct an ergodic stochastic dynamics whose invariant distribution is π(x).
There are many different dynamics whose invariant distribution is given by π(x).
Different discretizations of the corresponding SDE can behave very differently, even fail to converge to π(x).
Computational efficiency:
1. Choose the dynamics that converges to equilibrium as quickly as possible (bias correction).
2. Choose the dynamics that leads to the minimum asymptotic variance (variance reduction).
Consider these problems either for a class of observables or for a specific observable.

We want to calculate the integral
    E^π f = ∫_{R^d} f(x) π(dx),    (142)
where π(dx) is known up to the normalization constant.
We want to use a diffusion process X_t that is ergodic with respect to π(dx):
    lim_{T→+∞} (1/T) ∫_0^T f(X_s) ds = E^π f    a.s.    (143)
for all f ∈ L¹(π).
We will consider diffusions that satisfy a functional central limit theorem:
    lim_{T→+∞} √T ((1/T) ∫_0^T f(X_s) ds − E^π f) = N(0, 2σ_f²),    (144)
in distribution, for all f ∈ L²(π).
Our goal is to choose the diffusion process so that we can speed up convergence to equilibrium and minimize the asymptotic variance.

In order for π(x) to be the invariant distribution of X_t, we require that it is the (unique) solution of the stationary Fokker–Planck equation
    ∇·(−b(x)π(x) + ½∇·(Σ(x)π(x))) = 0,    (145)
together with appropriate boundary conditions (if we are in a bounded domain).
If the detailed balance condition
    J_s := −bπ + ½∇·(Σπ) = 0    (146)
is satisfied, then the process X_t is reversible with respect to π(x).

There are (infinitely) many reversible diffusions that can be used in order to sample from π(x). A reversible diffusion can be written in the form
    dX_t = (Σ(X_t)∇log π(X_t) + ∇·Σ(X_t)) dt + √(2Σ(X_t)) dW_t.    (147)
The rate of convergence to equilibrium depends on the tails of the distribution π(x) and on the choice of the diffusion process X_t.
The standard choice is the overdamped Langevin dynamics, 2Σ = I, b(x) = ½∇log π(x):
    dX_t = ½∇log π(X_t) dt + dW_t.    (148)

Definition
A stationary stochastic process X_t is time reversible if its law is invariant under time reversal: for every T ∈ (0, +∞), X_t and the time-reversed process X_{T−t} have the same distribution.
The processes X_t and X_{T−t} then have the same finite dimensional distributions. Equivalently, for each N ∈ N⁺, a collection of times 0 = t_0 < t_1 < · · · < t_N = T, and bounded measurable functions with compact support f_j, j = 0, . . . , N, we have that
    E^μ Π_{j=0}^N f_j(X_{t_j}) = E^μ Π_{j=0}^N f_j(X_{T−t_j}),    (149)
where μ(dx) denotes the invariant measure of X_t and E^μ denotes expectation with respect to μ.
Reversible diffusion processes can be characterized in terms of the properties of their generator: time-reversal is equivalent to the selfadjointness of the generator in the Hilbert space L²(R^d; μ).

Theorem
A stationary Markov process X_t in R^d with generator L and invariant measure μ is reversible if and only if its generator is selfadjoint in L²(R^d; μ).

Proof.
Assume first (149). We take N = 1 and t_0 = 0, t_1 = T to deduce that
    E^μ(f_0(X_0)f_1(X_T)) = E^μ(f_0(X_T)f_1(X_0)),    ∀f_0, f_1 ∈ L²(R^d; μ).
This is equivalent to
    ∫ (e^{Lt}f_0(x)) f_1(x) μ(dx) = ∫ f_0(x) (e^{Lt}f_1(x)) μ(dx),
i.e.
    ⟨e^{Lt}f_1, f_2⟩_{L²_μ} = ⟨f_1, e^{Lt}f_2⟩_{L²_μ},    ∀f_1, f_2 ∈ L²(R^d; μ).    (150)
Consequently, the semigroup e^{Lt} generated by L is selfadjoint. Differentiating (150) at t = 0 gives that L is selfadjoint.

Proof (continued).
Conversely, assume that L is selfadjoint in L²(R^d; μ). We will use an induction argument. Our assumption of selfadjointness implies that (149) is true for N = 1:
    E^μ Π_{j=0}^1 f_j(X_{t_j}) = E^μ Π_{j=0}^1 f_j(X_{T−t_j}).    (151)
Assume that it is true for N = k. We have that
    E^μ Π_{j=0}^k f_j(X_{t_j}) = ∫ · · · ∫ f_0(x_0) μ(dx_0) Π_{j=1}^k f_j(x_j) p(t_j − t_{j−1}, x_{j−1}, dx_j)
    = ∫ · · · ∫ f_k(x_k) μ(dx_k) Π_{j=1}^k f_{j−1}(x_{j−1}) p(t_j − t_{j−1}, x_j, dx_{j−1}).    (152)

Proof (continued).
Now we show that (149) is true for N = k + 1. We calculate, using (151) and (152),
    E^μ Π_{j=0}^{k+1} f_j(X_{t_j}) = E^μ (Π_{j=0}^k f_j(X_{t_j})) f_{k+1}(X_{t_{k+1}})
    = ∫ · · · ∫ μ(dx_0) f_0(x_0) Π_{j=1}^k f_j(x_j) p(t_j − t_{j−1}, x_{j−1}, dx_j) f_{k+1}(x_{k+1}) p(t_{k+1} − t_k, x_k, dx_{k+1})
    = ∫ · · · ∫ μ(dx_k) f_0(x_k) Π_{j=1}^k f_{j−1}(x_{j−1}) p(t_j − t_{j−1}, x_j, dx_{j−1}) f_{k+1}(x_{k+1}) p(t_{k+1} − t_k, x_k, dx_{k+1})
    = ∫ · · · ∫ μ(dx_{k+1}) f_0(x_{k+1}) Π_{j=1}^{k+1} f_{j−1}(x_{j−1}) p(t_j − t_{j−1}, x_j, dx_{j−1})
    = E^μ Π_{j=0}^{k+1} f_j(X_{T−t_j}),
where the middle equality uses the induction hypothesis (152) and the last steps use the selfadjointness (151).

In order to sample from π(x) using the overdamped Langevin dynamics (148) we first have to discretize the SDE. The Euler discretization gives
    X_{n+1} = X_n + ½h∇log π(X_n) + √h ξ_n,    ξ_n ∼ N(0, 1).    (153)
This can be written as
    X_{n+1} ∼ N(X_n + ½h∇log π(X_n), hI).    (154)
This defines a Markov chain {X_n}_{n=0}^N. It is not clear that this Markov chain has the same ergodic properties as the SDE.
We can choose to either accept or reject the next move X_{n+1} with a certain probability. Introducing this accept–reject step to a given Markov chain leads to the Metropolis–Hastings algorithm.
The resulting Metropolis adjusted algorithm is always ergodic with respect to the target distribution π(x).

%SDE parameters
dt = 1.0e-2;
kappa = 0.1;
noise_amp = sqrt(2*kappa*dt);    % noise amplitude per step

%number of iterations
N = 10^6;

%batch count (N must be divisible by m)
m = floor(sqrt(N));

%Significance value for confidence intervals
alpha = 0.01; % 99% CI

%Derivative of potential
Vprime = @(x) x^3 - x;

%Observable
f = @(x) x;

obs = zeros(1, N);
x = 0;

%simulate SDE
for i=1:N
    x = x - Vprime(x)*dt + noise_amp*randn(1);
    obs(i) = f(x);
end;

%The estimator is then given by
av = mean(obs);
disp(['Mean ', num2str(av)]);

%Compute the confidence intervals using a batch-means estimator
B = reshape(obs, m, []);

%Compute batch means
bmeans = mean(B, 1);

%Compute batch variance
D = var(bmeans);
disp(['Variance ', num2str(D)]);

%Compute confidence interval
cinv = tinv(1-alpha/2, m-1)*sqrt(D/m);

%Now do all the plotting
hold on
plot((1:N)*dt, cumsum(obs)./(1:N), '-r');
xlim([10 N*dt])

hline = refline(0, av - cinv);
set(hline,'LineStyle',':')
hline = refline(0, av + cinv);
set(hline,'LineStyle',':')
ylabel('$f(X_t)$', 'interpreter', 'latex')
xlabel('$t$', 'interpreter', 'latex')
title('Estimating $f(x)=x$ from $e^{-(x^2-1)^2/4\kappa}$', 'interpreter', 'latex');
hold off

The Metropolis–Hastings Algorithm
We are given a target distribution π(x) on R^d that is known up to the normalisation constant.
We are also given a Markov chain {X_n} with transition kernel density q(x, y). Examples include:
For the random walk with variance δ² we have q(x, ·) = N(x, δ²).
For the Euler discretization of the overdamped Langevin dynamics we consider the Markov chain (154).
Given π and {X_n}, a proposed value Y_{n+1} is generated from the density q(X_n, y) and is then accepted with probability
    α(x, y) = min{(π(y)q(y, x))/(π(x)q(x, y)), 1} if π(x)q(x, y) > 0, and α(x, y) = 1 if π(x)q(x, y) = 0.    (155)
If the proposed value is accepted then set X_{n+1} = Y_{n+1}; otherwise set X_{n+1} = X_n.

The transition probability density for the Metropolis adjusted Markov chain is
    p(x, y) = q(x, y)α(x, y),    for y ≠ x,    (156)
and the probability of remaining at the same point is given by
    r(x) = P(x, {x}) = ∫ q(x, y)[1 − α(x, y)] dy.
With this choice of the acceptance probability, the resulting Markov chain is ergodic (in fact, reversible) with respect to the target measure π(dx):
    π(A) = ∫ π(x)P(x, A) dx,    x ∈ X, A ∈ B,    (157)
where X denotes the state space and B the Borel sets.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% rwmh: Implementation of Random-Walk Metropolis-Hastings
% Code written by A. Duncan
% Parameters:
%   X0: starting state.
%   delta: step size
%   nsamp: number of samples
%   targetDistribution: function handle to the density function.
%
% Output:
%   X: [nsamp, length(X0)]-dimensional array of samples
%   acc: Array of accept-reject flags
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ X, acc ] = rwmh( X0, delta, nsamp, targetDistribution)
%Metropolis-Hastings algorithm

dim = length(X0);
X = zeros(nsamp, dim);
acc = zeros([1 nsamp]);

%set the initial state
X(1,:) = X0;

for i = 1:nsamp-1
    %Generate proposal
    Y = proposal(X(i,:), delta);

    %Compute acceptance probability
    pX_given_Y = proposal_prob(Y, X(i,:), delta);
    pY_given_X = proposal_prob(X(i,:), Y, delta);

    piY = targetDistribution(Y);
    piX = targetDistribution(X(i,:));
    alpha = min(1, pX_given_Y*piY/(pY_given_X*piX));

    %Accept/Reject sample.
    if (rand < alpha)
        X(i+1,:) = Y;
        acc(i) = 1;
    else
        X(i+1,:) = X(i,:);
        acc(i) = 0;
    end
end

end

% RWMH proposal function
function Y = proposal(X, delta)
    Y = normrnd(X, sqrt(delta), [1,length(X)]);
end

% RWMH proposal probability
function prob = proposal_prob(Y, X, delta)
    d = dot((Y - X), (Y - X));
    prob = exp(-d/(2*delta));
end

Given essentially any probability distribution π, the Metropolis–Hastings algorithm provides us with a method for generating a Markov chain {X_n} that has π as a stationary distribution.
When the transition kernel density q(·, ·) is symmetric, the formula for the acceptance probability simplifies to
    α(x, y) = min(π(y)/π(x), 1).
For the symmetric Random Walk Metropolis algorithm (RWM), q(·) = N(x, δ²): set X_n = x, choose Y_{n+1} ∼ N(x, δ²), and set
    X_{n+1} = Y_{n+1} with probability α(x, y),
    X_{n+1} = X_n with probability 1 − α(x, y).    (158)

We consider an MCMC algorithm for sampling from π(x), with or without an accept–reject step: a Markov chain ξ_n.
The n-step transition probability is
    P^n(x, A) = P(ξ_n ∈ A | ξ_0 = x),
where x ∈ X (the state space) and A ∈ B.
For the RWMH and the Metropolis adjusted Langevin algorithm (MALA) it converges to π in total variation (and a strong law of large numbers holds):
    ‖P^n(x, ·) − π‖_TV := ½ sup_{A∈B} |P^n(x, A) − π(A)| → 0.    (159)
Under additional assumptions we also have a central limit theorem.
The rate of convergence to the target distribution and the asymptotic variance can be used to measure the efficiency of the MCMC algorithm.

For an MCMC algorithm to be efficient, it has to be geometrically ergodic (exponentially fast convergence to equilibrium).
Without the Metropolis–Hastings step, the (unadjusted) Euler discretization of the Langevin dynamics (ULA) might fail to be (geometrically) ergodic.
This depends on the tails of the target distribution.
We can test the convergence of different MCMC algorithms for a class of 1d distributions E(β, γ):
    π ∈ E(β, γ)  ⟺  π(x) ∝ e^{−γ|x|^β},    |x| ≥ x_0,    (160)
for some x_0 and some positive constants β, γ.
The ULA Markov chain might fail to be (geometrically) ergodic even when the Langevin dynamics is, depending on the values of β and γ and on the step size.

Optimal Scaling for the Metropolis–Hastings Algorithm
For the RWMH algorithm we can choose the variance of the proposal: q(·) = N(·, δ²).
We want to choose the parameter δ in an optimal way.
The RWMH can perform arbitrarily badly for δ ≪ 1 and for δ ≫ 1.
There exists an optimal value δ*.
We can measure the efficiency of the MCMC algorithm in terms of the asymptotic variance (for all f ∈ L²(π)). The efficiency of the algorithm depends on the acceptance rate
    a = ∫∫ α(x, y)π(x)q(x, y) dx dy    (161)
      = lim_{n→+∞} (1/n) · number{accepted moves}.    (162)
Choosing the optimal scaling reduces the computational cost.

The optimal acceptance rate for RWMH is
    a_opt = 0.234.
For MALA:
    a_opt = 0.574.
The MALA algorithm asymptotically mixes considerably faster than the RWMH algorithm.
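A sketch of a single MALA step: an Euler proposal of the form (153)–(154), corrected by the Metropolis–Hastings acceptance probability (155); logPi and gradLogPi are assumed function handles for the log-target and its gradient.

function Xnew = mala_step(X, h, logPi, gradLogPi)
% One MALA step; logPi, gradLogPi are assumed handles for the target.
    Y = X + 0.5*h*gradLogPi(X) + sqrt(h)*randn(size(X));   % proposal (154)
    % log proposal densities log q(x,y), up to the common Gaussian constant
    logqXY = -norm(Y - X - 0.5*h*gradLogPi(X))^2/(2*h);
    logqYX = -norm(X - Y - 0.5*h*gradLogPi(Y))^2/(2*h);
    logAlpha = min(0, logPi(Y) + logqYX - logPi(X) - logqXY);
    if log(rand) < logAlpha
        Xnew = Y;                 % accept the proposed move
    else
        Xnew = X;                 % reject: stay at the current state
    end
end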

Ordering for MCMC algorithms
Consider the class S of diffusion processes that are ergodic with respect to π, the distribution from which we want to sample.
1. Choose X_t so that it converges as quickly as possible to equilibrium.
2. Choose X_t so that the stationary (asymptotic) variance is minimized:
    σ_f² = Var_π(f) = ⟨(−L)^{−1}f, f⟩_π.    (163)
We can use the asymptotic variance (163) to introduce a partial ordering in S (the Peskun–Tierney ordering).

We can consider the nonreversible dynamics (Hwang et al 1993, 2005):
    dX_t^b = (−∇V(X_t) + b(X_t)) dt + √2 dW_t,    (164)
where b is taken to be divergence-free with respect to the invariant distribution π(dx) = Z^{−1}e^{−V} dx:
    ∇·(b e^{−V}) = 0.    (165)
This ensures that π(dx) is still the invariant measure of the dynamics (164).
We can construct such vector fields by taking
    b = J∇V,    J = −J^T.    (166)

The dynamics (164) is non-reversible: (X_t^b)_{0≤t≤T} has the same law as the time-reversed process (X_{T−t}^{−b})_{0≤t≤T}, and thus not the same law as (X_{T−t}^b)_{0≤t≤T}.
Equivalently, the system does not satisfy detailed balance: the stationary probability flux is not zero.
From (166) it is clear that there are many (in fact, infinitely many) different ways of modifying the reversible dynamics without changing the invariant measure.

We consider the nonreversible dynamics
    dX_t = −(I + δJ)∇V(X_t) dt + √(2β^{−1}) dW_t,    (167)
with δ ∈ R and J the standard 2×2 antisymmetric matrix, i.e. J₁₂ = 1, J₂₁ = −1.
For this class of nonreversible perturbations the parameter that we wish to choose in an optimal way is δ. However, the numerical experiments will illustrate that even a non-optimal choice of δ can significantly accelerate convergence to equilibrium.
We will use the potential
    V(x, y) = ¼(x² − 1)² + ½y².    (168)
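A sketch of the EM discretization of (167) for the potential (168), estimating the second moment over M paths; δ, β and the step size are illustrative.

delta = 10; beta = 10; dt = 1e-3; N = 5000; M = 1000;   % illustrative values
J = [0 1; -1 0];  A = eye(2) + delta*J;
gradV = @(Z) [Z(:,1).^3 - Z(:,1), Z(:,2)];   % gradient of the potential (168)
Z = zeros(M, 2);                             % zero initial conditions
m2 = zeros(1, N);
for n = 1:N
    Z = Z - gradV(Z)*A'*dt + sqrt(2*dt/beta)*randn(M, 2);   % EM step of (167)
    m2(n) = mean(sum(Z.^2, 2));              % estimate of E(x^2 + y^2)
end
plot((1:N)*dt, m2)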

Figure: Second moment E(x² + y²) as a function of time for (167) with the potential (168), for δ = 0 and δ = 10. We take 0 initial conditions and β^{−1} = 0.1.

STATISTICAL INFERENCE FOR STOCHASTIC DIFFERENTIAL EQUATIONS

Statistical Inference for SDEs
Many of the stochastic models that are used in applications include unknown parameters that have to be determined from observations.
We can use techniques from statistics to estimate parameters in the drift and diffusion coefficients.
We will consider one dimensional Itô SDEs of the form
    dX_t = b(X_t; θ) dt + σ(X_t; θ) dW_t,    X_0 = x,    (169)
where θ ∈ Θ ⊂ R^N is a finite set of parameters that we want to estimate from observations.

We can consider either the case where only discrete observations are available, or the case where an entire path X_t, t ∈ [0, T], is observed.
The length of the path can be either fixed, or we can consider the case where the observation interval increases, T → +∞.
We can also consider noisy observations:
    Y_{t_j} = X_{t_j} + ε_{t_j},    ε_{t_j} ∼ N(0, 1), i.i.d.

Examples
Estimate the diffusion coefficient of Brownian motion:
    dX_t = √(2σ) dW_t.
Estimate the drift and diffusion coefficients of the (stationary) OU process:
    dX_t = −αX_t dt + √(2σ) dW_t.
Estimate the drift and diffusion coefficients θ = (A, B, σ_a, σ_b) in the Landau–Stuart equation with additive and multiplicative noise:
    dX_t = (AX_t − BX_t³) dt + √(σ_a² + σ_b²X_t²) dW_t.    (170)
Estimate the volatility σ in geometric Brownian motion:
    dX_t = μX_t dt + σX_t dW_t.

Estimate the integrated stochastic volatility in the Heston model:
    dX_t = μX_t dt + √σ_t X_t dW_t¹,
    dσ_t = κ(θ − σ_t) dt + γ√σ_t dW_t²,
where E(W_t¹ W_t²) = ρt.
The integrated stochastic volatility is
    Σ(T) = ∫_0^T σ_t X_t² dt.

To estimate parameters in the diffusion coefficient we can use the quadratic variation of the process X_t:
    ⟨X, X⟩_t := ∫_0^t σ²(X_s; θ) ds = lim_{Δt_k→0} Σ_{t_k≤t} |X_{t_{k+1}} − X_{t_k}|²,    (171)
where the limit is in probability.
When the diffusion coefficient is constant, the convergence in (171) becomes almost sure:
    lim_{n→+∞} Σ_{i=1}^{2^n} (X_{iT2^{−n}} − X_{(i−1)T2^{−n}})² = σ²T    a.s.    (172)
If we fix the length of the observation interval [0, T] and let the number of observations become infinite, n → +∞, we can in fact determine (not only estimate) the diffusion coefficient.
This is called the high frequency limit.
T can be (arbitrarily) small.
For the estimation of the diffusion coefficient we do not need to assume that the process X_t is stationary.
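A sketch of the quadratic-variation estimator suggested by (172), on synthetic data generated here by the EM scheme for an OU-type SDE with constant σ; the drift contributes only a higher order term.

sigma = 0.7; T = 1; N = 2^16; dt = T/N;   % illustrative values
X = zeros(1, N+1);
for j = 1:N
    X(j+1) = X(j) - X(j)*dt + sigma*sqrt(dt)*randn;   % EM path
end
QV = sum(diff(X).^2);     % sum of squared increments (quadratic variation)
sigma2_hat = QV/T;        % estimates sigma^2
fprintf('sigma^2 = %g, estimate = %g\n', sigma^2, sigma2_hat)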

Theorem
Let {X_j}_{j=0}^J be a sequence of equidistant observations of the SDE (169) with constant diffusion coefficient σ, with time step δ and Jδ = T fixed. Assume that the drift b(x; θ) is bounded and define
    σ̂_J² = (1/(Jδ)) Σ_{j=0}^{J−1} |X_{j+1} − X_j|².    (173)
Then
    |E σ̂_J² − σ²| ≤ C(δ + δ^{1/2}).    (174)
In particular,
    lim_{J→+∞} |E σ̂_J² − σ²| = 0.    (175)

We have
    X_{j+1} − X_j = ∫_{jδ}^{(j+1)δ} b(X_s; θ) ds + σΔW_j,
where ΔW_j = W_{(j+1)δ} − W_{jδ} ∼ N(0, δ). We substitute this into (173):
    σ̂_J² = (σ²/(Jδ)) Σ_{j=0}^{J−1} (ΔW_j)² + (2σ/(Jδ)) Σ_{j=0}^{J−1} I_j M_j + (1/(Jδ)) Σ_{j=0}^{J−1} I_j²,
where
    I_j := ∫_{jδ}^{(j+1)δ} b(X_s; θ) ds and M_j := ΔW_j.
Note that E(ΔW_n)² = δ.

From the boundedness of b(x; θ) and using the Cauchy–Schwarz inequality we get
    E I_j² = E(∫_{jδ}^{(j+1)δ} b(X_s; θ) ds)² ≤ Cδ².
Consequently:
    |E σ̂_J² − σ²| ≤ (1/(Jδ)) Σ_j (E I_j² + 2σ E|I_j M_j|)
    ≤ (1/(Jδ)) Σ_j (E I_j² + σ(ε^{−1} E I_j² + ε E M_j²)) ≤ C(δ + δ^{1/2}).
In the above we used Cauchy's inequality with ε = δ^{1/2}.

Assume that we have already estimated the diffusion coefficient.
Consider the SDE
    dX_t = b(X_t; θ) dt + dW_t.    (176)
Our goal is to estimate the unknown parameters θ in the drift from discrete observations.
We denote the true value by θ_0.
We will use the maximum likelihood approach, which is based on maximizing the likelihood function.
We first study the problem of estimating parameters in the distribution function of random variables using observations.

We consider a random variable X whose probability distribution function f(x|θ) is known up to parameters θ that we want to estimate from observations.
An example is when X is a Gaussian random variable, in which case the parameters to be estimated are the mean and variance, θ = (μ, σ).
Suppose that we have N independent observations of the random variable X. We define the likelihood function
    L({x_i}_{i=1}^N | θ) = Π_{i=1}^N f(x_i|θ).    (177)
The likelihood function is essentially the probability density function of the random variable X, viewed as a function of the parameters θ. The maximum likelihood estimator (MLE) is then
    θ̂ = argmax_θ L(x|θ),    (178)
with x = {x_i}_{i=1}^N.

The MLE is a random variable that depends on the observations {x_i}_{i=1}^N. When X is a Gaussian random variable, X ∼ N(μ, σ²), the likelihood function takes the form
  L({x_i}_{i=1}^N | θ) = (2πσ²)^{−N/2} exp( − Σ_{i=1}^N (x_i − μ)² / (2σ²) ).   (179)
Maximizing the log likelihood function with respect to μ and σ² we obtain the maximum likelihood estimators
  μ̂ = (1/N) Σ_{i=1}^N x_i,   σ̂² = (1/N) Σ_{i=1}^N (x_i − μ̂)².
Notice that
  E μ̂ = μ  and  E σ̂² = ((N−1)/N) σ²,
so the variance estimator is biased for finite N.
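A quick Monte Carlo check of the bias formula E σ̂² = ((N−1)/N) σ² (a sketch; the parameter values and sample sizes are arbitrary):

% Monte Carlo check of the bias of the Gaussian variance MLE.
mu = 1; sigma2 = 4; N = 10; M = 1e5;
x = mu + sqrt(sigma2)*randn(N,M);        % M independent samples of size N
s2_hat = mean((x - mean(x,1)).^2, 1);    % MLE divides by N, not N-1
fprintf('E sigma2_hat ~ %f, theory %f\n', mean(s2_hat), (N-1)/N*sigma2);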

Under appropriate assumptions the maximum likelihood estimator is consistent: we can estimate the true values of the parameters as accurately as we wish if we have a sufficiently large number of observations N.
We use the LLN and the CLT to study the asymptotic (large N) properties of the MLE.
In particular, we can show that
  lim_{N→+∞} θ̂ = θ₀,
either in probability or almost surely, and
  lim_{N→+∞} √N (θ̂ − θ₀) = N(0, D²),
in distribution, for an appropriate constant D².

We want to use this idea in order to estimate the parameters in the drift of the SDE (176).
The (independent) observations of the random variable X are now replaced by observations of the process X_t: either discrete observations {X_i}_{i=1}^N with X_i = X_{ih} and hN = T, or an entire path X_t, t ∈ [0, T].
Assume then that a path of the process X_t is observed.
The analogue of the likelihood function (177) is the law of the process on path space.
To obtain a formula for the likelihood function we need to use Girsanov's theorem.

The law of X_t, denoted by P_X, is absolutely continuous with respect to the Wiener measure P_W, the law of Brownian motion.
The density of P_X with respect to the Wiener measure is given by the Radon-Nikodym derivative:
  dP_X/dP_W = exp( (1/σ²) ∫_0^T b(X_s; θ) dX_s − (1/(2σ²)) ∫_0^T b(X_s; θ)² ds ) =: L({X_t}_{t∈[0,T]}; θ, T).   (180)
The maximum likelihood estimator (MLE) is defined as
  θ̂ = argmax_θ L({X_t}_{t∈[0,T]}; θ).   (181)
The MLE is a random variable that depends on the path {X_t}_{t∈[0,T]}.

Assume that the diffusion process (176) is stationary.
The MLE (181) is asymptotically unbiased: in the limit as the window of observation becomes infinite, T → +∞, the MLE θ̂ converges to the true value θ₀.
Assume that there are N parameters to be estimated, θ = (θ₁, . . . , θ_N). The MLE is obtained by solving the (generally nonlinear) system of equations
  ∂L/∂θ_i = 0,  i = 1, . . . , N.   (182)
The solution of this system of equations can be expressed in terms of functionals (e.g. moments) of the observed path {X_t}_{t∈[0,T]}:
  θ̂ = F({X_t}_{t∈[0,T]}).
Example
(MLE for the stationary OU process). Consider the stationary OU process
  dX_t = −α X_t dt + σ dW_t   (183)
with X₀ ∼ N(0, σ²/(2α)). The log likelihood function is
  log L = −(α/σ²) ∫_0^T X_t dX_t − (α²/(2σ²)) ∫_0^T X_t² dt.
We solve the equation ∂ log L/∂α = 0, from which we obtain
  α̂ = −( ∫_0^T X_t dX_t ) / ( ∫_0^T X_t² dt ) =: −B₁({X_t}_{t∈[0,T]}) / M₂({X_t}_{t∈[0,T]}),   (184)
where we have used the notation
  B_n({X_t}_{t∈[0,T]}) := ∫_0^T X_t^n dX_t,   M_n({X_t}_{t∈[0,T]}) := ∫_0^T X_t^n dt.   (185)

Given a set of discrete equidistant observations {X_j}_{j=0}^J, X_j = X_{jΔt}, ΔX_j = X_{j+1} − X_j, formula (184) can be approximated by
  α̂ = −( Σ_{j=0}^{J−1} X_j ΔX_j ) / ( Σ_{j=0}^{J−1} |X_j|² Δt ).   (186)
The MLE (184) becomes asymptotically unbiased in the large sample limit J → +∞, Δt fixed.
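A simulation sketch of the discrete MLE (186), not part of the original notes; the parameter values are illustrative assumptions.

% Euler-Maruyama for dX = -alpha*X dt + sigma dW, then the MLE (186).
alpha = 1; sigma = 0.5; dt = 1e-2; J = 1e5;
X = zeros(J+1,1); X(1) = sigma/sqrt(2*alpha)*randn;   % stationary start
for j = 1:J
    X(j+1) = X(j) - alpha*X(j)*dt + sigma*sqrt(dt)*randn;
end
dX = diff(X);
alpha_hat = -sum(X(1:end-1).*dX)/(sum(X(1:end-1).^2)*dt);
fprintf('alpha = %f, alpha_hat = %f\n', alpha, alpha_hat);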


Figure: MLE for the OU process.

Using Itô's formula we can obtain an alternative formula for the maximum likelihood estimator of the drift coefficient for the OU process.
The numerator in (184) can be written as
  ∫_0^t X_s dX_s = −α ∫_0^t X_s² ds + σ ∫_0^t X_s dW_s.
We apply Itô's formula to the function V(x) = ½ x² to obtain
  dV(X_t) = −α X_t² dt + (σ²/2) dt + σ X_t dW_t.
We combine the above two equations to obtain
  ∫_0^t X_s dX_s = (X_t² − X₀²)/2 − (σ² t)/2.
The formula for the MLE now becomes
  α̂ = −( X_T² − X₀² − σ² T ) / ( 2 ∫_0^T X_t² dt ).   (187)

Example
Consider the following generalization of the previous example:
  dX_t = θ b(X_t) dt + σ dW_t,   (188)
where b(x) is such that the equation has a unique ergodic solution.
The log likelihood function is
  log L = (θ/σ²) ∫_0^T b(X_t) dX_t − (θ²/(2σ²)) ∫_0^T b(X_t)² dt.
The MLE is
  θ̂ = ( ∫_0^T b(X_t) dX_t ) / ( ∫_0^T b(X_t)² dt ).

Example
(MLE for a stationary bistable SDE). Consider the SDE
  dX_t = (α X_t − β X_t³) dt + σ dW_t.   (189)
This SDE is of the form dX_t = −V′(X_t) dt + σ dW_t with
  V(x) = −(α/2) x² + (β/4) x⁴
and is ergodic with invariant distribution
  ρ(x) = Z⁻¹ e^{−2V(x)/σ²}.
Our goal is to estimate the coefficients α and β from observations using the maximum likelihood approach. The log likelihood function reads
  log L = (1/σ²) ∫_0^T (α X_t − β X_t³) dX_t − (1/(2σ²)) ∫_0^T (α X_t − β X_t³)² dt
        =: (1/σ²) ( α B₁ − β B₃ − ½ α² M₂ − ½ β² M₆ + α β M₄ ),
using the notation (185).
Example (continued)
Equations (182) become
  ∂ log L/∂α (α̂, β̂) = 0,   ∂ log L/∂β (α̂, β̂) = 0.
This leads to a linear system of equations:
  M₂ α̂ − M₄ β̂ = B₁,
  M₄ α̂ − M₆ β̂ = B₃,
the solution of which is
  α̂ = (B₁ M₆ − B₃ M₄) / (M₂ M₆ − M₄²),   β̂ = (B₁ M₄ − B₃ M₂) / (M₂ M₆ − M₄²).   (190)
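A sketch of the same computation on a discretely sampled path x with time step dt (not in the original notes): the stochastic integrals B₁, B₃ and the moments M₂, M₄, M₆ of (185) are replaced by discrete approximations and the 2x2 system (190) is solved explicitly.

% MLE for dX = (alpha*X - beta*X^3) dt + sigma dW from a sampled path.
function [alpha_hat, beta_hat] = bistable_mle(x, dt)
dx = diff(x); x = x(1:end-1);
B1 = sum(x.*dx);   B3 = sum(x.^3.*dx);           % Ito-sum approximations
M2 = sum(x.^2)*dt; M4 = sum(x.^4)*dt; M6 = sum(x.^6)*dt;
den = M2*M6 - M4^2;                              % determinant of (190)
alpha_hat = (B1*M6 - B3*M4)/den;
beta_hat  = (B1*M4 - B3*M2)/den;
end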

Figure: MLE estimators for a bistable potential.

The rigorous justification of the MLE is based on Girsanov's theorem.
Here we present a heuristic derivation of (180) that is based on the Euler-Maruyama discretization of (176):
  X_{n+1} − X_n = b(X_n; θ) Δt + σ ΔW_n,   (191)
where X_n = X(nΔt) and ΔW_n = W_{(n+1)Δt} − W_{nΔt} =: ξ_n.
We have that ξ_n ∼ N(0, Δt), mutually independent.
Our goal is to calculate the Radon-Nikodym derivative of the law of the discrete-time process {X_n}_{n=0}^{N−1} with respect to the law of the discretized Brownian motion.
In the discrete case this derivative becomes the ratio between the distribution functions of the two processes.

We rewrite (191) in the form
  ΔX_n = b_n Δt + σ ξ_n,   (192)
where b_n := b(X_n; θ) and ξ_n = ΔW_n, or equivalently, with η_n ∼ N(0, 1) i.i.d.,
  ΔX_n = b_n Δt + σ η_n √Δt.   (193)
For notational simplicity we set σ = 1 in the calculation that follows; the general case is recovered by rescaling.
The distribution function of the discretized Brownian motion is
  p_W^N = Π_{i=0}^{N−1} (2πΔt)^{−1/2} exp( −(ΔW_i)² / (2Δt) )
        = (2πΔt)^{−N/2} exp( −(1/(2Δt)) Σ_{i=0}^{N−1} (ΔW_i)² ).   (194)

Similarly, for the law of the discretized process {X_n}_{n=0}^{N−1}, using the fact that p(X_{i+1}|X_i) ∼ N(X_i + b_i Δt, Δt), we can write
  p_X^N = (2πΔt)^{−N/2} exp( −(1/(2Δt)) Σ_{i=0}^{N−1} ( (ΔX_i)² + b_i² Δt² − 2 b_i ΔX_i Δt ) ).   (195)
Now we can calculate the ratio of the laws of the two processes, evaluated at the path {X_n}_{n=0}^{N−1}:
  dP_X^N / dP_W^N = exp( −(1/2) Σ_{i=0}^{N−1} b_i² Δt + Σ_{i=0}^{N−1} b_i ΔX_i ).
Passing now (formally) to the limit as N → +∞ while keeping T = NΔt fixed, we obtain (180) (with σ = 1; the σ² factors in (180) are restored by undoing the rescaling).

Lamperti's Transformation and Girsanov's Theorem
For SDEs in one dimension it is possible to map multiplicative noise to additive noise.
Consider a one dimensional Itô SDE with multiplicative noise:
  dX_t = f(X_t) dt + σ(X_t) dW_t.   (196)
We ask whether there exists a transformation z = h(x) that maps (196) into an SDE with additive noise.
We apply Itô's formula to obtain
  dZ_t = Lh(X_t) dt + h′(X_t) σ(X_t) dW_t,
where L denotes the generator of X_t.

In order to obtain an SDE with unit diffusion coefficient we need to impose the condition
  h′(x) σ(x) = 1,
from which we deduce that
  h(x) = ∫_{x₀}^x (1/σ(z)) dz,   (197)
where x₀ is arbitrary. We have that
  Lh(x) = f(x)/σ(x) − (1/2) σ′(x).
Consequently, the transformed SDE has the form
  dY_t = f_Y(Y_t) dt + dW_t   (198)
with
  f_Y(y) = f(h⁻¹(y)) / σ(h⁻¹(y)) − (1/2) σ′(h⁻¹(y)).
This is called the Lamperti transformation.

Consider the Cox-Ingersoll-Ross (CIR) SDE
  dX_t = κ(θ − X_t) dt + γ √X_t dW_t,  X₀ = x > 0,
where κ, θ, γ > 0. From (197) we deduce h(x) = (2/γ) √x.
The generator of the CIR process is
  L = κ(θ − x) d/dx + (γ²/2) x d²/dx².
We have that
  Lh(x) = (κθ/γ − γ/4) x^{−1/2} − (κ/γ) x^{1/2}.
The CIR SDE becomes, for Y_t = (2/γ) √X_t,
  dY_t = ( (2κθ/γ² − 1/2) Y_t⁻¹ − (κ/2) Y_t ) dt + dW_t.   (199)
When γ² = 4κθ the equation above becomes the Ornstein-Uhlenbeck process for Y_t.
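A numerical sanity check of the Lamperti transformation for the CIR process (a sketch; the parameter values are illustrative assumptions): after the transformation Y = (2/γ)√X the process should have unit diffusion coefficient, so its quadratic variation over [0, T] should be close to T.

% Simulate CIR and check that Y = 2*sqrt(X)/gamma has unit diffusion.
kappa = 2; theta = 1; gamma = 0.5; T = 1; N = 2^18; dt = T/N;
X = zeros(N+1,1); X(1) = theta;
for n = 1:N
    X(n+1) = abs(X(n) + kappa*(theta - X(n))*dt ...
                 + gamma*sqrt(X(n))*sqrt(dt)*randn);   % reflect at zero
end
Y = 2*sqrt(X)/gamma;
fprintf('QV of Y = %f (should be close to T = %f)\n', sum(diff(Y).^2), T);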
The Lamperti transformation is very useful when estimating parameters for an SDE model using the MLE approach.
Such a transformation does not exist for arbitrary SDEs in higher dimensions. In particular, it is not possible, in general, to transform a multidimensional Itô SDE with multiplicative noise into an SDE with additive noise.
Multidimensional diffusion processes for which the reduction from multiplicative to additive noise is possible are called reducible.
Conditions on the coefficients of an SDE so that it is reducible are obtained in Aït-Sahalia, Closed-form likelihood expansions for multivariate diffusions, Annals of Statistics (2008) 36(2), 906-937.

Conversely, it is also sometimes possible to remove the drift term from an SDE and obtain an equation where only (multiplicative) noise is present.
Consider the one dimensional SDE:
  dX_t = b(X_t) dt + dW_t.   (200)
We introduce the following functional of X_t:
  M_t = exp( −∫_0^t b(X_s) dW_s − (1/2) ∫_0^t b²(X_s) ds ).   (201)
We can write M_t = e^{Y_t}, where Y_t is the solution to the SDE
  dY_t = −(1/2) b²(X_t) dt − b(X_t) dW_t,  Y₀ = 0.   (202)
We can now apply Itô's formula to obtain the SDE
  dM_t = −M_t b(X_t) dW_t.   (203)
Notice that this is an equation without drift.


Under appropriate conditions on the drift b(·), it is possible to show that the law of the process X_t, denoted by P, which is a probability measure over the space of continuous functions, is absolutely continuous with respect to the Wiener measure P_W, the law of the Brownian motion W_t.
The Radon-Nikodym derivative between these two measures is the inverse of the stochastic process M_t given in (201):
  dP/dP_W (X) = exp( ∫_0^t b(X_s) dW_s + (1/2) ∫_0^t b²(X_s) ds ).   (204)
This is a form of Girsanov's theorem.
Using now equation (200) we can rewrite (204) as
  dP/dP_W (X) = exp( ∫_0^t b(X_s) dX_s − (1/2) ∫_0^t |b(X_s)|² ds ).   (205)

A form of Girsanov's theorem that is very useful in statistical inference for diffusion processes is the following. Consider the two SDEs
  dX_t = b₁(X_t) dt + σ(X_t) dW_t,  X₀ = x₁,  t ∈ [0, T],   (206a)
  dX_t = b₂(X_t) dt + σ(X_t) dW_t,  X₀ = x₂,  t ∈ [0, T],   (206b)
where σ(x) > 0. We assume that we have existence and uniqueness of strong solutions for both SDEs.
Assume that x₁ and x₂ are random variables with densities f₁(x) and f₂(x) with respect to the Lebesgue measure which have the same support, or nonrandom and equal to the same constant.
Let P₁ and P₂ denote the laws of these two SDEs. Then these two measures are equivalent and their Radon-Nikodym derivative is
  dP₂/dP₁ (X) = ( f₂(X₀) / f₁(X₀) ) exp( ∫_0^T ( b₂(X_t) − b₁(X_t) ) / σ²(X_t) dX_t − (1/2) ∫_0^T ( b₂²(X_t) − b₁²(X_t) ) / σ²(X_t) dt ).   (207)

Example: estimation of the diffusion and the drift for the OU process

% Estimate parameters in the OU process written in the form
% dX = -alpha*X dt + sqrt(2*lambda) dW (so lambda = sigma^2/2).
% Use MLE for the drift and the quadratic variation for the diffusion.
% Input: a path of the SDE and the observation times, q = [t, x].
function [alpha, lambda, t, x] = param_est(q)
t = q(:,1); x = q(:,2);
figure; plot(t,x,'Linewidth',2)
hold on
N = length(t); dt = t(2) - t(1); T = N*dt;
lambda = (1/(2*T))*sum(diff(x).^2)                  % quadratic variation
alpha = - sum(x(1:end-1).*diff(x))/(sum(x.^2)*dt)   % discretised MLE (186)
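A hypothetical driver script (not in the notes) showing how param_est might be called on simulated data; the parameter values are illustrative.

% Simulate dX = -alpha0*X dt + sqrt(2*lambda0) dW and recover the
% parameters with param_est.
alpha0 = 1; lambda0 = 0.5; dt = 1e-3; N = 5e5;
t = (0:N-1)'*dt; x = zeros(N,1);
for n = 1:N-1
    x(n+1) = x(n) - alpha0*x(n)*dt + sqrt(2*lambda0*dt)*randn;
end
[alpha_hat, lambda_hat] = param_est([t x]);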

The MLE (181) depends on the path {X_t}_{t∈[0,T]} and consequently it is a random variable.
We have to prove that, in the large sample limit J → +∞, Δt fixed, and for appropriate assumptions on the diffusion process X_t, the MLE converges to the true value θ₀, and also to obtain information about the fluctuations around the limiting value θ₀.
Assuming that X_t is stationary we can prove that the MLE θ̂ converges in the limit as T → +∞ (assuming that the entire path {X_t}_{t∈[0,T]} is available to us) to θ₀.
Furthermore, we can prove asymptotic normality of the maximum likelihood estimator,
  √T (θ̂ − θ₀) → N(0, D²),   (208)
for a variance D² that can be calculated.

Theorem
Let X_t be the stationary OU process
  dX_t = −α X_t dt + dW_t,  X₀ ∼ N(0, 1/(2α)),
and let α̂ denote the MLE (184). Then
  lim_{T→+∞} √T (α̂ − α) = N(0, 2α)   (209)
in distribution.

For the proof of this theorem we will need the following result from probability theory.

Theorem
(Slutsky) Let {X_n}_{n=1}^{+∞}, {Y_n}_{n=1}^{+∞} be sequences of random variables such that X_n converges in distribution to a random variable X and Y_n converges in probability to a constant c ≠ 0. Then
  lim_{n→+∞} Y_n⁻¹ X_n = c⁻¹ X,
in distribution.

Proof of Theorem 57.
First we observe that
  α̂ = −( ∫_0^T X_t dX_t ) / ( ∫_0^T X_t² dt ),
and consequently
  α̂ − α = −( ∫_0^T X_t dW_t ) / ( ∫_0^T X_t² dt ).
Therefore
  √T (α̂ − α) = −( (1/√T) ∫_0^T X_t dW_t ) / ( (1/T) ∫_0^T X_t² dt )
  (law)=  −W( (1/T) ∫_0^T X_t² dt ) / ( (1/T) ∫_0^T X_t² dt ),
where the scaling property of Brownian motion was used, together with
  ∫_0^T f(s) dW(s)  (law)=  W( ∫_0^T f²(s) ds ).   (210)

The process X_t is stationary. We use the ergodic theorem for stationary Markov processes to obtain
  lim_{T→+∞} (1/T) ∫_0^T X_t² dt = E X_t² = 1/(2α).   (211)
Let now Y = N(0, σ²) with σ² = 1/(2α). We can write Y = W(1/(2α)).
We use the Hölder continuity of Brownian motion to conclude that, almost surely,
  | W( (1/T) ∫_0^T X_t² dt ) − Y | = | W( (1/T) ∫_0^T X_t² dt ) − W( 1/(2α) ) |   (212)
  ≤ Höl(W) | (1/T) ∫_0^T X_t² dt − 1/(2α) |^{1/2} → 0,   (213)
where Höl(W) denotes the Hölder constant of Brownian motion.

We have (in distribution)
  lim_{T→+∞} (1/√T) ∫_0^T X_t dW_t = N(0, 1/(2α)).   (214)
We combine (211) with (214) and use Slutsky's theorem to conclude that (in distribution)
  lim_{T→+∞} √T (α̂ − α) = N(0, 2α).   (215)

Econometrics: Market Microstructure Noise
Zhang, Mykland, Aït-Sahalia, J. American Statistical Association 2005, 100(472), pp. 1394-1411.
S_t is the price process of a security. X_t = log(S_t) is the solution of
  dX_t = μ_t dt + σ_t dW_t¹.   (216)
μ_t, σ_t are stochastic processes.
For example, we can take (Heston model)
  μ_t = μ − ν_t/2,  σ_t² = ν_t,
and
  dν_t = κ(α − ν_t) dt + γ ν_t^{1/2} dW_t²,   (217)
with E(dW_t¹ dW_t²) = ρ dt.
Goal: estimate the integrated stochastic volatility of X_t from noisy observations Y_t.
Assume that we have market microstructure (observation error):
  Y_{t_i} = X_{t_i} + ε_{t_i},  i = 0, . . . , N,   (218)
where the ε_{t_i} are i.i.d. with E ε_{t_i} = 0 and E ε_{t_i}² = Eε².
Continuous time modelling of discrete time data:
The quadratic estimator of the volatility fails when the data is sampled at the highest frequencies.
In the absence of market microstructure noise:
  lim_{Δt→0} [X, X]_T = ∫_0^T σ_t² dt,   (219)
in probability, where
  [X, X]_T = Σ_{t_i} ( X_{t_{i+1}} − X_{t_i} )².

On the other hand, for the noisy measurements:
  lim_{Δt→0} [Y, Y]_T = 2N Eε² + O(N^{1/2}).
Thus (1/(2N)) lim_{Δt→0} [Y, Y]_T provides us with an estimate for the variance of the noise!
The standard estimator for the integrated volatility gives us the wrong result.
We need to ignore high frequency data.
This is called the "fifth best estimator" in Aït-Sahalia et al.

Fourth best estimator: sample at an arbitrary sparse frequency n_s:
  [Y, Y]_T^{sp} ≈ ⟨X, X⟩_T + 2 n_s Eε² + C(ε, σ_t) Z,  Z ∼ N(0, 1).
Third best estimator: sample at the optimal sparse frequency, obtained by minimizing the mean squared error:
  n_opt = ( T / (4 (Eε²)²) ∫_0^T σ_t⁴ dt )^{1/3}.
Since we are using small sample sizes, the variance can be quite large (use one out of every 300 observations).

Second best estimator: average over the data (moving average). Averaging over K grids of average size n̄ we have
  [Y, Y]_T^{avg} ≈ ⟨X, X⟩_T + 2 n̄ Eε² + C(ε, σ_t) Z,  Z ∼ N(0, 1).
The optimal choice of n̄ is
  n̄_opt = ( T / (6 (Eε²)²) ∫_0^T σ_t⁴ dt )^{1/3}.
The first best estimator: subsampling, averaging and bias correction. Use the fifth best estimator to estimate the noise and subtract it off from the averaged (second best) estimator:
  ⟨X, X⟩*_T = [Y, Y]_T^{avg} − (n̄/n) [Y, Y]_T^{all}.
This estimator is unbiased:
  ⟨X, X⟩*_T = ⟨X, X⟩_T + n^{−1/6} C(ε, σ_t) Z,  Z ∼ N(0, 1).
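A sketch of the first best (two-scales) estimator for constant volatility with i.i.d. microstructure noise (not in the original notes; all parameter values, and the choice of K = 300 subgrids, are illustrative assumptions):

% Two-scales realized volatility: subsample, average, bias-correct.
sigma = 0.3; T = 1; n = 23400; dt = T/n; noisevar = 1e-5;
X = cumsum([0; sigma*sqrt(dt)*randn(n,1)]);   % efficient log-price
Y = X + sqrt(noisevar)*randn(n+1,1);          % noisy observations
YY_all = sum(diff(Y).^2);                     % fifth best: ~ 2*n*noisevar
K = 300; nbar = n/K;
YY_avg = 0;
for k = 1:K                                   % K sparse subgrids
    YY_avg = YY_avg + sum(diff(Y(k:K:end)).^2);
end
YY_avg = YY_avg/K;                            % second best (averaged)
TSRV = YY_avg - (nbar/n)*YY_all;              % first best (bias-corrected)
fprintf('true IV = %f, TSRV = %f, noise var est = %e\n', ...
        sigma^2*T, TSRV, YY_all/(2*n));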

Thermal Motion in a Two-Scale Potential
A.M. Stuart and G.P., J. Stat. Phys. 127(4), 741-781 (2007).
Consider the SDE
  dx^ε(t) = −∇V( x^ε(t), x^ε(t)/ε; α ) dt + √(2σ) dW(t).
Separable potential, linear in the coefficient α:
  V(x, y; α) := α V(x) + p(y).
p(y) is a mean-zero smooth periodic function.
x^ε(t) → X(t) weakly in C([0, T]; R^d), the solution of the homogenized equation:
  dX(t) = −α K ∇V(X(t)) dt + √(2σK) dW(t).


Figure: Bistable potential with periodic fluctuations.

The coefficients A, Σ are given by the standard homogenization formulas.
Goal: fit a time series of x^ε(t), the solution of the two-scale SDE above, to the homogenized SDE.
Problem: the data is not compatible with the homogenized equation at small scales.
Model misspecification.
Similar difficulties arise when studying inverse problems for PDEs with a multiscale structure.

In one dimension:
  dx^ε(t) = −α V′(x^ε(t)) dt − (1/ε) p′( x^ε(t)/ε ) dt + √(2σ) dW(t).
The homogenized equation is
  dX(t) = −A V′(X(t)) dt + √(2Σ) dW(t).
(A, Σ) are given by
  A = α L² / (Z Ẑ),   Σ = σ L² / (Z Ẑ),
  Z = ∫_0^L e^{−p(y)/σ} dy,   Ẑ = ∫_0^L e^{p(y)/σ} dy.
A and Σ decay to 0 exponentially fast as σ → 0.
The homogenized coefficients satisfy (detailed balance):
  A/α = Σ/σ.

We are given a path of
  dx^ε(t) = −α V′(x^ε(t)) dt − (1/ε) p′( x^ε(t)/ε ) dt + √(2σ) dW(t).
We want to fit the data to
  dX(t) = −Â V′(X(t)) dt + √(2Σ̂) dW(t).
It is reasonable to assume that we have some information on the large-scale structure of the potential V(x).
We do not assume that we know anything about the small-scale fluctuations.

We fit the drift and diffusion coefficients via maximum likelihood and quadratic variation, respectively.
For simplicity we fit scalars A, Σ in
  dx(t) = −A ∇V(x(t)) dt + √(2Σ) dW(t).
The Radon-Nikodym derivative of the law of this SDE with respect to Wiener measure is
  L = exp( −(1/(2Σ)) ( ∫_0^T A ⟨∇V(x(s)), dx(s)⟩ + (1/2) ∫_0^T |A ∇V(x(s))|² ds ) ).
This is the maximum likelihood function.

Let x denote {x(t)}_{t∈[0,T]} or {x(nδ)}_{n=0}^N with Nδ = T.
Diffusion coefficient estimated from the quadratic variation:
  Σ̂_{N,δ}(x) = (1/(2Nδ)) Σ_{n=0}^{N−1} |x_{n+1} − x_n|².
Choose Â to maximize log L:
  Â(x) = −( ∫_0^T ⟨∇V(x(s)), dx(s)⟩ ) / ( ∫_0^T |∇V(x(s))|² ds ).

In practice we use the estimators on discrete time data, with the following discretisations:
  Σ̂_{N,δ}(x) = (1/(2Nδ)) Σ_{n=0}^{N−1} |x_{n+1} − x_n|²,
  Â_{N,δ}(x) = −( Σ_{n=0}^{N−1} ⟨∇V(x_n), (x_{n+1} − x_n)⟩ ) / ( δ Σ_{n=0}^{N−1} |∇V(x_n)|² ),
  Ã_{N,δ}(x) = Σ̂_{N,δ} ( Σ_{n=0}^{N−1} ΔV(x_n) ) / ( Σ_{n=0}^{N−1} |∇V(x_n)|² ),
where Δ denotes the Laplacian.

No Subsampling
Generate data from the unhomogenized equation (quadratic or bistable potential, simple trigonometric perturbation).
Solve the SDE numerically using Euler-Maruyama for a single realization of the noise. The time step is sufficiently small so that errors due to discretization are negligible.
Fit to the homogenized equation.
Use data on a fine scale δ ≪ ε² (i.e. use all data).
Parameter estimation fails.

Figure: Â, Σ̂ vs ε for the quadratic potential.

Figure: Â, Σ̂ vs σ for the quadratic potential with ε = 0.1.

Subsampling
Generate data from the unhomogenized equation.
Fit to the homogenized equation.
Use data on a coarse scale ε² ≪ δ ≪ 1.
More precisely,
  δ := Δt_sam = 2^k Δt,  k = 0, 1, . . . .
Study the estimators as a function of Δt_sam.
Parameter estimation succeeds.
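A sketch of the subsampled estimators (not in the original notes) for a path x generated on a fine grid with step dt: keep every k-th observation, so that the sampling step is δ = k·dt, and apply the discretised estimators above. The choice ∇V(x) = x (quadratic potential V(x) = x²/2) is an illustrative assumption.

% Subsampled quadratic-variation and MLE estimators (quadratic potential).
function [A_hat, Sigma_hat] = subsampled_est(x, dt, k)
xs = x(1:k:end); delta = k*dt;          % subsampled path, step delta
N  = length(xs) - 1;
dxs = diff(xs);
Sigma_hat = sum(dxs.^2)/(2*N*delta);    % quadratic variation estimator
gradV = xs(1:end-1);                    % grad V(x) = x (assumption)
A_hat = -sum(gradV.*dxs)/(sum(gradV.^2)*delta);
end

Plotting A_hat and Sigma_hat against Δt_sam = k·dt for k = 1, 2, 4, . . . reproduces the behaviour shown in the figures that follow.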

Figure: Â, Σ̂ vs Δt_sam for the quadratic potential with ε = 0.1.

Figure: Â, B̂ vs Δt_sam for the bistable potential with σ = 0.5, ε = 0.1.

Figure: B̂_ij, i, j = 1, 2 vs Δt_sam for the 2d quadratic potential with σ = 0.5, ε = 0.1.

Conclusions From Numerical Experiments
Parameter estimation fails when we take the small-scale (high frequency) data into account:
  Â, Σ̂ become exponentially wrong as ε → 0;
  Â, Σ̂ do not improve as ε → 0.
Parameter estimation succeeds when we subsample (use only data on a coarse scale):
  there is an optimal sampling rate, which depends on ε;
  the optimal sampling rate is different in different directions in higher dimensions.

Theorem (No Subsampling)
Let x^ε(t) : R⁺ → R^d be generated by the unhomogenized equation. Then
  lim_{ε→0} lim_{T→∞} Â(x^ε(t)) = α,  a.s.
Fix T = Nδ. Then for every ε > 0,
  lim_{N→∞} Σ̂_{N,δ}(x^ε(t)) = σ,  a.s.
Thus the unhomogenized parameters are estimated: the wrong answer.

Theorem (With Subsampling)
Fix T = Nδ with δ = ε^γ, γ ∈ (0, 1). Then
  lim_{ε→0} Σ̂_{N,δ}(x^ε) = Σ,
in distribution.
Let δ = ε^γ with γ ∈ (0, 1) and N = [ε^{−ζ}] with ζ > γ. Then
  lim_{ε→0} Â_{N,δ}(x^ε) = A,
in distribution.
Thus we get the right answer provided subsampling is used.
