
equations

Justin Sirignano∗ and Konstantinos Spiliopoulos†‡§

September 7, 2018

Abstract

arXiv:1708.07469v5 [q-fin.MF] 5 Sep 2018

High-dimensional PDEs have been a longstanding computational challenge. We propose to solve high-dimensional PDEs by approximating the solution with a deep neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions. Our algorithm is meshfree, which is key since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled time and space points. The algorithm is tested on a class of high-dimensional free boundary PDEs, which we are able to accurately solve in up to 200 dimensions. The algorithm is also tested on a high-dimensional Hamilton-Jacobi-Bellman PDE and Burgers’ equation. The deep learning algorithm approximates the general solution to the Burgers’ equation for a continuum of different boundary conditions and physical conditions (which can be viewed as a high-dimensional space). We call the algorithm a “Deep Galerkin Method (DGM)” since it is similar in spirit to Galerkin methods, with the solution approximated by a neural network instead of a linear combination of basis functions. In addition, we prove a theorem regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs.

High-dimensional partial differential equations (PDEs) are used in physics, engineering, and finance. Their numerical solution has been a longstanding challenge. Finite difference methods become infeasible in higher dimensions due to the explosion in the number of grid points and the demand for reduced time step size. If there are $d$ space dimensions and 1 time dimension, the mesh is of size $O^{d+1}$, where $O$ is the number of grid points in each dimension. This quickly becomes computationally intractable when the dimension $d$ becomes even moderately large. We propose to solve high-dimensional PDEs using a meshfree deep learning algorithm. The method is similar in spirit to the Galerkin method, but with several key changes using ideas from machine learning. The Galerkin method is a widely used computational method which seeks a reduced-form solution to a PDE as a linear combination of basis functions. The deep learning algorithm, or “Deep Galerkin Method” (DGM), uses a deep neural network instead of a linear combination of basis functions. The deep neural network is trained to satisfy the differential operator, initial condition, and boundary conditions using stochastic gradient descent at randomly sampled spatial points. By randomly sampling spatial points, we avoid the need to form a mesh (which is infeasible in higher dimensions) and instead convert the PDE problem into a machine learning problem.
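The following is a rough count of how fast the mesh grows. The per-dimension resolution $O = 100$ is an illustrative assumption and is not a value used in this paper:

```latex
O^{d+1}\big|_{O=100,\; d=3} = 100^{4} = 10^{8},
\qquad
O^{d+1}\big|_{O=100,\; d=200} = 100^{201} = 10^{402}.
```

The first count is large but still storable. The second is far beyond any computer memory.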

DGM is a natural merger of Galerkin methods and machine learning. The algorithm is, in principle, straightforward; see Section 2. Promising numerical results are presented in Section 4 for a class of high-dimensional free boundary PDEs. We also accurately solve a high-dimensional Hamilton-Jacobi-Bellman PDE in Section 5 and Burgers’ equation in Section 6. DGM converts the computational cost of finite difference methods into a more convenient form: instead of a huge mesh of size $O^{d+1}$ (which is infeasible to handle), many batches of random spatial points are generated. Although the total number of spatial points could be vast, the algorithm can process the spatial points sequentially without harming the convergence rate.

∗ University of Illinois at Urbana-Champaign, Urbana, E-mail: jasirign@illinois.edu

† Department of Mathematics and Statistics, Boston University, Boston, E-mail: kspiliop@math.bu.edu

‡ The authors thank seminar participants at the JP Morgan Machine Learning and AI Forum seminar, the Imperial College London Applied Mathematics and Mathematical Physics seminar, the Department of Applied Mathematics at the University of Colorado Boulder, Princeton University, and Northwestern University for their comments. The authors would also like to thank participants at the 2017 INFORMS Applied Probability Conference, the 2017 Greek Stochastics Conference, and the 2018 SIAM Annual Meeting for their comments.

§ Research of K.S. supported in part by the National Science Foundation (DMS 1550918). Computations for this paper were performed using the Blue Waters supercomputer grant “Distributed Learning with Neural Networks”.

Deep learning has revolutionized fields such as image, text, and speech recognition. These fields require statistical approaches which can model nonlinear functions of high-dimensional inputs. Deep learning, which uses multi-layer neural networks (i.e., “deep neural networks”), has proven very effective in practice for such tasks. A multi-layer neural network is essentially a “stack” of nonlinear operations where each operation is prescribed by certain parameters that must be estimated from data. Performance in practice can strongly depend upon the specific form of the neural network architecture and the training algorithms which are used. The design of neural network architectures and training methods has been the focus of intense research over the past decade. Given the success of deep learning, there is also growing interest in applying it to a range of other areas in science and engineering (see Section 1.2 for some examples).

Evaluating the accuracy of the deep learning algorithm is not straightforward. PDEs with semi-analytic solutions may not be sufficiently challenging. (After all, the semi-analytic solution exists because the PDE can be transformed into a lower-dimensional equation.) Nor can the algorithm be benchmarked against traditional finite difference methods, which fail in high dimensions. We therefore test the deep learning algorithm on a class of high-dimensional free boundary PDEs which have the special property that error bounds can be calculated for any approximate solution. This provides a unique opportunity to evaluate the accuracy of the deep learning algorithm on a class of high-dimensional PDEs with no semi-analytic solutions.

This class of high-dimensional free boundary PDEs also has important applications in finance, where it is used to price American options. An American option is a financial derivative on a portfolio of stocks. The number of space dimensions in the PDE equals the number of stocks in the portfolio. Financial institutions are interested in pricing options on portfolios ranging from dozens to even hundreds of stocks [43]. Therefore, there is a significant need for numerical methods to accurately solve high-dimensional free boundary PDEs.

We also test the deep learning algorithm, with accurate results, on a high-dimensional Hamilton-Jacobi-Bellman PDE motivated by the problem of optimally controlling a stochastic heat equation.

Finally, it is often of interest to find the solution of a PDE over a range of problem setups (e.g., different physical conditions and boundary conditions). For example, this may be useful for the design of engineering systems or uncertainty quantification. The problem setup space may be high-dimensional and therefore may require solving many PDEs for many different problem setups, which can be computationally expensive. We use our deep learning algorithm to approximate the general solution to the Burgers’ equation for different boundary conditions, initial conditions, and physical conditions.

In the remainder of the Introduction, we provide an overview of our results regarding the approximation power of neural networks for quasilinear parabolic PDEs (Section 1.1) and review the relevant literature (Section 1.2). The deep learning algorithm for solving PDEs is presented in Section 2. An efficient scheme for evaluating the diffusion operator is developed in Section 3. Numerical analysis of the algorithm is presented in Sections 4, 5, and 6. We implement and test the algorithm on a class of high-dimensional free boundary PDEs in up to 200 dimensions. The theorem and proof for the approximation of PDE solutions with neural networks are presented in Section 7. Conclusions are in Section 8. For readability, proofs from Section 7 have been collected in Appendix A.

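We also prove a theorem regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs. Consider the potentially nonlinear PDE

$$
\begin{aligned}
\partial_t u(t,x) + \mathcal{L}u(t,x) &= 0, && (t,x) \in [0,T]\times\Omega,\\
u(0,x) &= u_0(x), && x \in \Omega,\\
u(t,x) &= g(t,x), && (t,x) \in [0,T]\times\partial\Omega,
\end{aligned}
\tag{1.1}
$$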


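where $\partial\Omega$ is the boundary of the domain $\Omega$. The solution $u(t,x)$ is of course unknown, but an approximate solution $f(t,x)$ can be found by minimizing the $L^2$ error

$$
J(f) = \left\| \partial_t f + \mathcal{L}f \right\|^2_{2,[0,T]\times\Omega} + \left\| f - g \right\|^2_{2,[0,T]\times\partial\Omega} + \left\| f(0,\cdot) - u_0 \right\|^2_{2,\Omega}.
$$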

The error function $J(f)$ measures how well the approximate solution $f$ satisfies the differential operator, boundary condition, and initial condition. Note that no knowledge of the actual solution $u$ is assumed; $J(f)$ can be directly calculated from the PDE (1.1) for any approximation $f$. The goal is to construct functions $f$ for which $J(f)$ is as close to 0 as possible. Define $\mathcal{C}^n$ as the class of neural networks with a single hidden layer and $n$ hidden units.¹ Let $f^n$ be a neural network with $n$ hidden units which minimizes $J(f)$. We prove that, under certain conditions,

$$
f^n \to u \quad \text{as } n \to \infty,
$$

strongly in $L^\rho([0,T]\times\Omega)$, with $\rho < 2$, for a class of quasilinear parabolic PDEs; see subsection 7.2 and Theorem 7.3 therein for the precise statement. That is, the neural network converges in $L^\rho$, $\rho < 2$, to the solution of the PDE as the number of hidden units tends to infinity. The precise statement of the theorem and its proof are presented in Section 7. The proof requires the joint analysis of the approximation power of neural networks and the continuity properties of partial differential equations. Note that $J(f^n) \to 0$ does not by itself imply that $f^n \to u$, given that we only have $L^2$ control on the approximation error. The argument therefore proceeds in steps. First, we prove that $J(f^n) \to 0$ as $n \to \infty$. We then establish that each neural network $f^n$, $n = 1, 2, \ldots$, satisfies a PDE with a source term $h^n(t,x)$. Finally, we prove, under certain conditions, the convergence $f^n \to u$ as $n \to \infty$ in $L^\rho([0,T]\times\Omega)$, for $\rho < 2$, using the smoothness of the neural network approximations and compactness arguments.

Theorem 7.3 establishes the approximation power of neural networks for solving PDEs (at least within a class of quasilinear parabolic PDEs); however, directly minimizing $J(f)$ is not computationally tractable since it involves high-dimensional integrals. The DGM algorithm minimizes $J(f)$ using a meshfree approach; see Section 2.

Solving PDEs with a neural network as an approximation is a natural idea, and has been considered in various forms previously. [29], [30], [46], [31], and [35] propose to use neural networks to solve PDEs and ODEs. These papers estimate neural network solutions on an a priori fixed mesh. This paper proposes using deep neural networks and is meshfree, which is key to solving high-dimensional PDEs.

This paper introduces several innovations. First, we focus on high-dimensional PDEs and apply deep learning advances of the past decade to this problem (deep neural networks instead of shallow neural networks, improved optimization methods for neural networks, etc.). Algorithms for high-dimensional free boundary PDEs are developed, efficiently implemented, and tested; in particular, we develop an iterative method to address the free boundary. Second, to avoid ever forming a mesh, we sample a sequence of random spatial points. This produces a meshfree method, which is essential for high-dimensional PDEs. Third, the algorithm incorporates a new computational scheme for the efficient computation of neural network gradients arising from the second derivatives of high-dimensional PDEs.

Recently, [41, 42] develop physics-informed deep learning models. They estimate deep neural network models which merge data observations with PDE models. This allows for the estimation of physical models from limited data by leveraging a priori knowledge that the physical dynamics should obey a class of PDEs. Their approach solves PDEs in one and two spatial dimensions using deep neural networks. [32] uses a deep neural network to model the Reynolds stresses in a Reynolds-averaged Navier-Stokes (RANS) model. RANS is a reduced-order model for turbulence in fluid dynamics. [15, 2] have also recently developed a scheme for solving a class of quasilinear PDEs which can be represented as forward-backward stochastic differential equations (FBSDEs), and [16] further develops the algorithm. The algorithm developed in [15, 2, 16] focuses on computing the value of the PDE solution at a single point. The algorithm that we present here is different; in particular, it does not rely on the availability of FBSDE representations and yields the entire solution of the PDE across all time and space. In addition, the deep neural network architecture that we use, which is different from the ones used in [15, 2], appears to recover the entire solution accurately (at least for the equations that we studied). [49] use a convolutional neural network to solve a large sparse linear system which is required in the numerical solution of the Navier-Stokes PDE. In addition, [9] has recently developed a novel partial differential equation approach to optimize deep neural networks.

¹ A neural network with a single hidden layer and $n$ hidden units is a function in the class

$$
\mathcal{C}^n = \left\{ h(t,x): \mathbb{R}^{1+d} \mapsto \mathbb{R} \;:\; h(t,x) = \sum_{i=1}^n \beta_i\, \psi\Big(\alpha_{0,i}\, t + \sum_{j=1}^d \alpha_{j,i}\, x_j + c_i\Big) \right\},
$$

where $\psi: \mathbb{R} \to \mathbb{R}$ is a nonlinear “activation” function such as a sigmoid or tanh function.
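A member $h(t,x)$ of the single-hidden-layer class $\mathcal{C}^n$ from footnote 1 can be evaluated directly. The sketch below is written in plain Python. The function name `single_layer_net` and the argument layout are hypothetical. The coefficient of $t$ is passed separately as `alpha0`.

```python
import math

def single_layer_net(t, x, alpha0, alpha, beta, c, psi=math.tanh):
    """Evaluate h(t, x) = sum_i beta_i * psi(alpha0_i * t + sum_j alpha_{j,i} x_j + c_i).

    alpha0, beta, c: length-n lists (one entry per hidden unit).
    alpha: d x n nested list; alpha[j][i] multiplies x_j inside hidden unit i.
    psi: scalar activation (tanh by default; a sigmoid would also fit the footnote).
    """
    n, d = len(beta), len(x)
    total = 0.0
    for i in range(n):
        pre_activation = alpha0[i] * t + sum(alpha[j][i] * x[j] for j in range(d)) + c[i]
        total += beta[i] * psi(pre_activation)
    return total

# Two hidden units, one spatial dimension: h = 2 tanh(0.5) - tanh(0.25).
h = single_layer_net(0.5, [0.25], alpha0=[1.0, 0.0], alpha=[[0.0, 1.0]],
                     beta=[2.0, -1.0], c=[0.0, 0.0])
```

Choosing the weights so that $J(h)$ is small is the training problem addressed by the DGM algorithm in Section 2.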

[33] developed an algorithm for the solution of a discrete-time version of a class of free boundary PDEs. Their algorithm, commonly called the “Longstaff-Schwartz method”, uses dynamic programming and approximates the solution using a separate function approximator at each discrete time (typically a linear combination of basis functions). Our algorithm directly solves the PDE, and uses a single function approximator for all space and all time. The Longstaff-Schwartz algorithm has been further analyzed by [45], [23], and others. Sparse grid methods have also been used to solve high-dimensional PDEs; see [43], [44], [22], [6], and [7].

Regarding general results on the approximation power of neural networks, we refer the interested reader to classical works [11, 25, 26, 39], and we also mention the recent work [38], where the authors study the necessary and sufficient complexity of ReLU neural networks required for approximating classifier functions in the mean square sense.

2 Algorithm

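Consider a parabolic PDE with $d$ spatial dimensions:

$$
\begin{aligned}
\frac{\partial u}{\partial t}(t,x) + \mathcal{L}u(t,x) &= 0, && (t,x) \in [0,T]\times\Omega,\\
u(t=0,x) &= u_0(x), &&\\
u(t,x) &= g(t,x), && x \in \partial\Omega,
\end{aligned}
\tag{2.1}
$$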

where $x \in \Omega \subset \mathbb{R}^d$. The DGM algorithm approximates $u(t,x)$ with a deep neural network $f(t,x;\theta)$, where $\theta \in \mathbb{R}^K$ are the neural network’s parameters. Note that the differential operators $\frac{\partial f}{\partial t}(t,x;\theta)$ and $\mathcal{L}f(t,x;\theta)$ can be calculated analytically. Construct the objective function:

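$$
J(f) = \left\| \frac{\partial f}{\partial t}(t,x;\theta) + \mathcal{L}f(t,x;\theta) \right\|^2_{[0,T]\times\Omega,\,\nu_1} + \left\| f(t,x;\theta) - g(t,x) \right\|^2_{[0,T]\times\partial\Omega,\,\nu_2} + \left\| f(0,x;\theta) - u_0(x) \right\|^2_{\Omega,\,\nu_3}.
$$

Here, $\|f(y)\|^2_{\mathcal{Y},\nu} = \int_{\mathcal{Y}} |f(y)|^2\,\nu(y)\,dy$, where $\nu(y)$ is a positive probability density on $y \in \mathcal{Y}$. $J(f)$ measures how well the function $f(t,x;\theta)$ satisfies the PDE differential operator, boundary conditions, and initial condition. If $J(f) = 0$, then $f(t,x;\theta)$ is a solution to the PDE (2.1).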
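To see that $J(f)$ can be computed without knowing $u$, the sketch below forms a Monte Carlo estimate of $J$ from random points only. Everything in it is an illustrative assumption and does not come from this paper:
- the 1D heat equation as the test PDE;
- uniform densities $\nu_1, \nu_2, \nu_3$;
- central finite differences in place of the analytic derivatives DGM uses for a network;
- the helper names `pde_residual`, `estimate_J`, `exact`, and `too_slow`.

```python
import math
import random

# Hypothetical test problem: du/dt + L u = 0 with L u = -d^2u/dx^2 on Omega = (0, 1),
# u0(x) = sin(pi x) and g = 0 on the boundary {0, 1}.
# Its exact solution is u(t, x) = exp(-pi^2 t) sin(pi x).
T = 1.0

def u0(x):
    return math.sin(math.pi * x)

def g(t, x):
    return 0.0

def pde_residual(f, t, x, h=1e-3):
    """df/dt + L f with L f = -f_xx, via central finite differences
    (a stand-in for the exact derivatives computed for a network)."""
    f_t = (f(t + h, x) - f(t - h, x)) / (2 * h)
    f_xx = (f(t, x + h) - 2 * f(t, x) + f(t, x - h)) / h ** 2
    return f_t - f_xx

def estimate_J(f, n_samples=20_000, seed=0):
    """Monte Carlo estimate of J(f) with uniform densities nu1, nu2, nu3."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t, x = rng.uniform(0.0, T), rng.uniform(0.0, 1.0)      # interior point ~ nu1
        tau, z = rng.uniform(0.0, T), rng.choice([0.0, 1.0])  # boundary point ~ nu2
        w = rng.uniform(0.0, 1.0)                             # initial-time point ~ nu3
        total += (pde_residual(f, t, x) ** 2
                  + (f(tau, z) - g(tau, z)) ** 2
                  + (f(0.0, w) - u0(w)) ** 2)
    return total / n_samples

def exact(t, x):
    return math.exp(-math.pi ** 2 * t) * math.sin(math.pi * x)

def too_slow(t, x):
    # Satisfies the initial and boundary conditions but decays at the wrong rate.
    return math.exp(-t) * math.sin(math.pi * x)

J_exact = estimate_J(exact)
J_too_slow = estimate_J(too_slow)
```

No mesh is formed; only randomly drawn points are evaluated. `J_exact` is tiny because only finite-difference error remains. `J_too_slow` is large, even though `estimate_J` never uses the true solution.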

The goal is to find a set of parameters $\theta$ such that the function $f(t,x;\theta)$ minimizes the error $J(f)$. If the error $J(f)$ is small, then $f(t,x;\theta)$ will closely satisfy the PDE differential operator, boundary conditions, and initial condition. Therefore, a $\theta$ which minimizes $J(f(\cdot;\theta))$ produces a reduced-form model $f(t,x;\theta)$ which approximates the PDE solution $u(t,x)$.

Estimating $\theta$ by directly minimizing $J(f)$ is infeasible when the dimension $d$ is large, since the integral over $\Omega$ is computationally intractable. However, borrowing a machine learning approach, one can instead minimize $J(f)$ using stochastic gradient descent on a sequence of time and space points drawn at random from $\Omega$ and $\partial\Omega$. This avoids ever forming a mesh.

The DGM algorithm is:

1. Generate random points $(t_n, x_n)$ from $[0,T]\times\Omega$ and $(\tau_n, z_n)$ from $[0,T]\times\partial\Omega$ according to respective probability densities $\nu_1$ and $\nu_2$. Also, draw the random point $w_n$ from $\Omega$ with probability density $\nu_3$.

2. Calculate the squared error $G(\theta_n, s_n)$ at the randomly sampled points $s_n = \{(t_n, x_n), (\tau_n, z_n), w_n\}$, where:

$$
G(\theta_n,s_n) = \left( \frac{\partial f}{\partial t}(t_n,x_n;\theta_n) + \mathcal{L}f(t_n,x_n;\theta_n) \right)^2 + \left( f(\tau_n,z_n;\theta_n) - g(\tau_n,z_n) \right)^2 + \left( f(0,w_n;\theta_n) - u_0(w_n) \right)^2.
$$

3. Take a descent step at the random point $s_n$:

$$
\theta_{n+1} = \theta_n - \alpha_n \nabla_\theta G(\theta_n, s_n).
$$

The “learning rate” $\alpha_n$ decreases with $n$. The steps $\nabla_\theta G(\theta_n,s_n)$ are unbiased estimates of $\nabla_\theta J(f(\cdot;\theta_n))$:

$$
\mathbb{E}\left[ \nabla_\theta G(\theta_n,s_n) \,\middle|\, \theta_n \right] = \nabla_\theta J(f(\cdot;\theta_n)).
$$

Therefore, the stochastic gradient descent algorithm will on average take steps in a descent direction for the objective function $J$. A descent direction is one along which the objective function decreases for a sufficiently small step (i.e., $J(f(\cdot;\theta_{n+1})) < J(f(\cdot;\theta_n))$ when $\alpha_n$ is small enough), in which case $\theta_{n+1}$ is a better parameter estimate than $\theta_n$.

Under (relatively mild) technical conditions (see [3]), the iterates $\theta_n$ converge to a critical point of the objective function $J(f(\cdot;\theta))$ as $n \to \infty$:

$$
\lim_{n\to\infty} \left\| \nabla_\theta J(f(\cdot;\theta_n)) \right\| = 0.
$$

It is important to note that $\theta_n$ may converge only to a local minimum, because the objective $J(f(\cdot;\theta))$ is generally non-convex in $\theta$. This limitation is common to non-convex optimization and is not specific to this paper’s algorithm. In particular, the training objectives of deep neural networks are non-convex, so it is well known that stochastic gradient descent may converge only to a local minimum (and not a global minimum) for a neural network. Nevertheless, stochastic gradient descent has proven very effective in practice and is the fundamental building block of nearly all approaches for training deep learning models.
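The sketch below runs the three DGM steps (sample, compute the gradient of $G$, descend) in plain Python. To keep it small, a hypothetical two-parameter trial function $f(t,x;a,b) = a\,e^{-bt}\sin(\pi x)$ stands in for the neural network, and its gradient is derived by hand rather than by backpropagation. The 1D heat equation, the uniform densities, the starting point, and the schedule $\alpha_n = \alpha_0/(1 + n/5000)$ are all illustrative assumptions. The exact solution corresponds to $a = 1$, $b = \pi^2$.

```python
import math
import random

# Hypothetical trial function standing in for the network:
#   f(t, x; a, b) = a * exp(-b t) * sin(pi x),  theta = (a, b).
# Hypothetical test PDE: du/dt + L u = 0 with L u = -u_xx on Omega = (0, 1),
# u0(x) = sin(pi x), g = 0 on {0, 1}. The exact solution is a = 1, b = pi^2.
PI2 = math.pi ** 2
T = 1.0

def grad_G(a, b, t, x, tau, z, w):
    """Hand-derived gradient (dG/da, dG/db) of the three-term squared error G."""
    e, s = math.exp(-b * t), math.sin(math.pi * x)
    f = a * e * s
    r = (PI2 - b) * f                        # f_t + L f = -b f + pi^2 f
    dr_da = (PI2 - b) * e * s
    dr_db = -f * (1.0 + (PI2 - b) * t)
    eb = math.exp(-b * tau) * math.sin(math.pi * z)
    bd = a * eb                              # f(tau, z) - g(tau, z), with g = 0
    dbd_da, dbd_db = eb, -tau * a * eb
    sw = math.sin(math.pi * w)
    ic = (a - 1.0) * sw                      # f(0, w) - u0(w); independent of b
    dG_da = 2.0 * (r * dr_da + bd * dbd_da + ic * sw)
    dG_db = 2.0 * (r * dr_db + bd * dbd_db)
    return dG_da, dG_db

def dgm_sgd(a=0.5, b=8.0, steps=40_000, alpha0=0.05, seed=0):
    rng = random.Random(seed)
    for n in range(steps):
        # Step 1: draw (t_n, x_n), (tau_n, z_n) and w_n from uniform densities.
        t, x = rng.uniform(0.0, T), rng.uniform(0.0, 1.0)
        tau, z = rng.uniform(0.0, T), rng.choice([0.0, 1.0])
        w = rng.uniform(0.0, 1.0)
        # Step 2: gradient of the squared error G at the sampled points.
        dG_da, dG_db = grad_G(a, b, t, x, tau, z, w)
        # Step 3: descent step with a decreasing learning rate alpha_n.
        alpha = alpha0 / (1.0 + n / 5000.0)
        a, b = a - alpha * dG_da, b - alpha * dG_db
    return a, b

a_hat, b_hat = dgm_sgd()
```

Each iteration uses a single random point per term, as in the algorithm above. With a real network, `grad_G` would be replaced by automatic differentiation through $f$, $\partial f/\partial t$, and $\mathcal{L}f$.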

tives

This section describes a modified algorithm which may be more computationally efficient in some cases. The term $\mathcal{L}f(t,x;\theta)$ contains second derivatives $\frac{\partial^2 f}{\partial x_i \partial x_j}(t,x;\theta)$, which may be expensive to compute in higher dimensions. For instance, in $d = 200$ dimensions roughly 20,000 second derivatives must be calculated (by symmetry, $d(d+1)/2 = 20{,}100$ distinct entries).

The complicated architectures of neural networks can make it computationally costly to calculate the second derivatives (for example, see the neural network architecture (4.2)). The computational cost for calculating second derivatives (in both total arithmetic operations and memory) is $O(d^2 \times N)$, where $d$ is the spatial dimension of $x$ and $N$ is the batch size. In comparison, the computational cost for calculating first derivatives is $O(d \times N)$. The cost associated with the second derivatives is further increased since we actually need the third-order derivatives $\nabla_\theta \frac{\partial^2 f}{\partial x^2}(t,x;\theta)$ for the stochastic gradient descent algorithm. Instead of directly calculating these second derivatives, we approximate the second derivatives using a Monte Carlo method.

Suppose the sum of the second derivatives in $\mathcal{L}f(t,x;\theta)$ is of the form $\frac{1}{2}\sum_{i,j=1}^d \rho_{i,j}\,\sigma_i(x)\sigma_j(x)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(t,x;\theta)$, assume $[\rho_{i,j}]_{i,j=1}^d$ is a positive definite matrix, and define $\sigma(x) = \left(\sigma_1(x), \ldots, \sigma_d(x)\right)$. For example, such PDEs arise when considering expectations of functions of stochastic differential equations, where $\sigma(x)$ represents the diffusion coefficient; see equation (4.1) and the corresponding discussion. A generalization of the algorithm in this section to second derivatives with nonlinear coefficient dependence on $u(t,x)$ is also possible. Then,

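$$
\sum_{i,j=1}^d \rho_{i,j}\,\sigma_i(x)\sigma_j(x)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(t,x;\theta) = \lim_{\Delta\to 0} \mathbb{E}\left[ \sum_{i=1}^d \frac{\frac{\partial f}{\partial x_i}(t, x+\sigma(x)W_\Delta;\theta) - \frac{\partial f}{\partial x_i}(t,x;\theta)}{\Delta}\,\sigma_i(x)\,W_\Delta^i \right],
\tag{3.1}
$$

where $W_t \in \mathbb{R}^d$ is a Brownian motion with $\mathrm{Cov}[W_t^i, W_t^j] = \rho_{i,j}\,t$, $\sigma(x)W_\Delta$ denotes the vector with entries $\sigma_i(x)W_\Delta^i$, and $\Delta \in \mathbb{R}_+$ is the step size. The convergence rate for (3.1) is $O(\sqrt{\Delta})$.²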
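The sketch below is a small numerical check of (3.1) in plain Python. It estimates the second-order sum using first derivatives only and compares the result with the exact Hessian-based value. Every choice in it is hypothetical and serves only as an illustration:
- dimension $d = 3$ and a fixed evaluation point;
- constant coefficients $\sigma_i(x)$;
- the correlation matrix $\rho$;
- the test function $f$, which has no $t$ or $\theta$ dependence;
- the value $\Delta = 10^{-3}$.

```python
import math
import random

# Hypothetical setup: d = 3, constant sigma_i(x) = sig[i], correlation rho,
# and f(x) = exp(0.5 x_1) + x_1 x_2 + sin(x_3) (no t or theta dependence).
rho = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.3],
       [0.2, 0.3, 1.0]]
sig = [0.3, 0.2, 0.4]
x = [0.2, -0.1, 0.7]

def grad_f(y):
    return [0.5 * math.exp(0.5 * y[0]) + y[1], y[0], math.cos(y[2])]

def hess_f(y):
    return [[0.25 * math.exp(0.5 * y[0]), 1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, -math.sin(y[2])]]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def mc_second_order(delta=1e-3, n_samples=100_000, seed=0):
    """Sample average of the expectation on the right of (3.1) at a fixed small delta.
    Only first derivatives (grad_f) are evaluated."""
    rng = random.Random(seed)
    chol, d = cholesky(rho), len(x)
    g0 = grad_f(x)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # W_delta ~ N(0, delta * rho), built from independent normals via Cholesky.
        W = [math.sqrt(delta) * sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        g1 = grad_f([x[i] + sig[i] * W[i] for i in range(d)])
        total += sum((g1[i] - g0[i]) / delta * sig[i] * W[i] for i in range(d))
    return total / n_samples

H = hess_f(x)
exact_value = sum(rho[i][j] * sig[i] * sig[j] * H[i][j] for i in range(3) for j in range(3))
mc_value = mc_second_order()
```

The test function and $\Delta$ are chosen so that the $\Delta$-bias is small. The remaining gap between `mc_value` and `exact_value` is therefore mostly Monte Carlo noise, which shrinks like $1/\sqrt{N}$. Each sample needs two gradient evaluations ($O(d)$ work with this hand-coded gradient) instead of the $O(d^2)$ Hessian entries.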

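² Let $f$ be a three-times differentiable function in $x$ with bounded third-order derivatives in $x$. Then it follows directly from a Taylor expansion that

$$
\left| \sum_{i,j=1}^d \rho_{i,j}\,\sigma_i(x)\sigma_j(x)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(t,x;\theta) - \mathbb{E}\left[ \sum_{i=1}^d \frac{\frac{\partial f}{\partial x_i}(t, x+\sigma(x)W_\Delta;\theta) - \frac{\partial f}{\partial x_i}(t,x;\theta)}{\Delta}\,\sigma_i(x)\,W_\Delta^i \right] \right| \le C(x)\sqrt{\Delta}.
$$

The constant $C(x)$ depends upon $\rho$, $f_{xxx}(t,x;\theta)$, and $\sigma(x)$.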

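Define:

$$
\begin{aligned}
G_1(\theta_n,s_n) &:= \left( \frac{\partial f}{\partial t}(t_n,x_n;\theta_n) + \mathcal{L}f(t_n,x_n;\theta_n) \right)^2,\\
G_2(\theta_n,s_n) &:= \left( f(\tau_n,z_n;\theta_n) - g(\tau_n,z_n) \right)^2,\\
G_3(\theta_n,s_n) &:= \left( f(0,w_n;\theta_n) - u_0(w_n) \right)^2.
\end{aligned}
$$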

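The DGM algorithm uses the gradient $\nabla_\theta G_1(\theta_n,s_n)$, which requires the calculation of the second derivative terms in $\mathcal{L}f(t_n,x_n;\theta_n)$. Define the first derivative operators as

$$
\mathcal{L}_1 f(t_n,x_n;\theta_n) := \mathcal{L}f(t_n,x_n;\theta_n) - \frac{1}{2}\sum_{i,j=1}^d \rho_{i,j}\,\sigma_i(x_n)\sigma_j(x_n)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(t_n,x_n;\theta_n),
$$

and set

$$
\begin{aligned}
\tilde G_1(\theta_n,s_n) := {}& 2\left( \frac{\partial f}{\partial t}(t_n,x_n;\theta_n) + \mathcal{L}_1 f(t_n,x_n;\theta_n) + \frac{1}{2}\sum_{i=1}^d \frac{\frac{\partial f}{\partial x_i}(t_n, x_n+\sigma(x_n)W_\Delta;\theta_n) - \frac{\partial f}{\partial x_i}(t_n,x_n;\theta_n)}{\Delta}\,\sigma_i(x_n)\,W_\Delta^i \right)\\
&\times \nabla_\theta\left( \frac{\partial f}{\partial t}(t_n,x_n;\theta_n) + \mathcal{L}_1 f(t_n,x_n;\theta_n) + \frac{1}{2}\sum_{i=1}^d \frac{\frac{\partial f}{\partial x_i}(t_n, x_n+\sigma(x_n)\tilde W_\Delta;\theta_n) - \frac{\partial f}{\partial x_i}(t_n,x_n;\theta_n)}{\Delta}\,\sigma_i(x_n)\,\tilde W_\Delta^i \right),
\end{aligned}
$$

where $W_\Delta$ is a $d$-dimensional normal random variable with $\mathbb{E}[W_\Delta] = 0$ and $\mathrm{Cov}[(W_\Delta)_i, (W_\Delta)_j] = \rho_{i,j}\Delta$. $\tilde W_\Delta$ has the same distribution as $W_\Delta$, and $W_\Delta$ and $\tilde W_\Delta$ are independent. $\tilde G_1(\theta_n,s_n)$ is a Monte Carlo approximation of $\nabla_\theta G_1(\theta_n,s_n)$ with $O(\sqrt{\Delta})$ bias. This approximation error can be further reduced via the following scheme using “antithetic variates”: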

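$$
\begin{aligned}
\tilde G_{1,a}(\theta_n,s_n) := {}& \left( \frac{\partial f}{\partial t}(t_n,x_n;\theta_n) + \mathcal{L}_1 f(t_n,x_n;\theta_n) + \frac{1}{2}\sum_{i=1}^d \frac{\frac{\partial f}{\partial x_i}(t_n, x_n+\sigma(x_n)W_\Delta;\theta_n) - \frac{\partial f}{\partial x_i}(t_n,x_n;\theta_n)}{\Delta}\,\sigma_i(x_n)\,W_\Delta^i \right)\\
&\times \nabla_\theta\left( \frac{\partial f}{\partial t}(t_n,x_n;\theta_n) + \mathcal{L}_1 f(t_n,x_n;\theta_n) + \frac{1}{2}\sum_{i=1}^d \frac{\frac{\partial f}{\partial x_i}(t_n, x_n+\sigma(x_n)\tilde W_\Delta;\theta_n) - \frac{\partial f}{\partial x_i}(t_n,x_n;\theta_n)}{\Delta}\,\sigma_i(x_n)\,\tilde W_\Delta^i \right),\\
\tilde G_{1,b}(\theta_n,s_n) := {}& \left( \frac{\partial f}{\partial t}(t_n,x_n;\theta_n) + \mathcal{L}_1 f(t_n,x_n;\theta_n) - \frac{1}{2}\sum_{i=1}^d \frac{\frac{\partial f}{\partial x_i}(t_n, x_n-\sigma(x_n)W_\Delta;\theta_n) - \frac{\partial f}{\partial x_i}(t_n,x_n;\theta_n)}{\Delta}\,\sigma_i(x_n)\,W_\Delta^i \right)\\
&\times \nabla_\theta\left( \frac{\partial f}{\partial t}(t_n,x_n;\theta_n) + \mathcal{L}_1 f(t_n,x_n;\theta_n) - \frac{1}{2}\sum_{i=1}^d \frac{\frac{\partial f}{\partial x_i}(t_n, x_n-\sigma(x_n)\tilde W_\Delta;\theta_n) - \frac{\partial f}{\partial x_i}(t_n,x_n;\theta_n)}{\Delta}\,\sigma_i(x_n)\,\tilde W_\Delta^i \right),
\end{aligned}
$$

$$
\tilde G_1(\theta_n,s_n) := \tilde G_{1,a}(\theta_n,s_n) + \tilde G_{1,b}(\theta_n,s_n).
\tag{3.2}
$$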

The approximation (3.2) has $O(\Delta)$ bias as an approximation for $\nabla_\theta G_1(\theta_n,s_n)$. (3.2) uses antithetic variates in the sense that $\tilde G_{1,a}(\theta_n,s_n)$ uses the random variables $(W_\Delta, \tilde W_\Delta)$ while $\tilde G_{1,b}(\theta_n,s_n)$ uses $(-W_\Delta, -\tilde W_\Delta)$. See [1] for background on antithetic variates in simulation algorithms. A Taylor expansion can be used to show the approximation error is $O(\Delta)$. It is important to highlight that there is no computational cost associated with the magnitude of $\Delta$; an arbitrarily small $\Delta$ can be chosen with no additional computational cost (although there may be numerical underflow or overflow problems). The modified algorithm using the Monte Carlo approximation for the second derivatives is:

1. Generate random points $(t_n, x_n)$ from $[0,T]\times\Omega$ and $(\tau_n, z_n)$ from $[0,T]\times\partial\Omega$ according to respective densities $\nu_1$ and $\nu_2$. Also, draw the random point $w_n$ from $\Omega$ with density $\nu_3$.

2. Calculate the step $\tilde G(\theta_n,s_n) = \tilde G_1(\theta_n,s_n) + \nabla_\theta G_2(\theta_n,s_n) + \nabla_\theta G_3(\theta_n,s_n)$ at the randomly sampled points $s_n = \{(t_n,x_n),(\tau_n,z_n),w_n\}$. $\tilde G(\theta_n,s_n)$ is an approximation for $\nabla_\theta G(\theta_n,s_n)$.

3. Take a step at the random point $s_n$:

$$
\theta_{n+1} = \theta_n - \alpha_n \tilde G(\theta_n,s_n).
$$

In conclusion, the modified algorithm here is computationally less expensive than the original algorithm in Section 2 but introduces some bias and variance. The variance essentially increases the i.i.d. noise in the stochastic gradient descent step, although this noise averages out over a large number of samples. The original algorithm in Section 2 is unbiased and has lower variance, but is computationally more expensive. We numerically implement the algorithm for a class of free boundary PDEs in Section 4. Future research may investigate other methods to further improve the computational evaluation of the second derivative terms (for instance, multi-level Monte Carlo).

We test our algorithm on a class of high-dimensional free boundary PDEs. These free boundary PDEs are used in finance to price American options and are often referred to as “American option PDEs”. An American option is a financial derivative on a portfolio of stocks. The option owner may at any time $t \in [0,T]$ choose to exercise the American option and receive a payoff which is determined by the underlying prices of the stocks in the portfolio. $T$ is called the maturity date of the option, and the payoff function is $g(x): \mathbb{R}^d \to \mathbb{R}$. Let $X_t \in \mathbb{R}^d$ be the prices of $d$ stocks. If at time $t$ the stock prices are $X_t = x$, the price of the option is $u(t,x)$. The price function $u(t,x)$ satisfies a free boundary PDE on $[0,T]\times\mathbb{R}^d$. For American options, one is primarily interested in the solution $u(0,X_0)$, since this is the fair price to buy or sell the option.

Besides the high dimensions and the free boundary, the American option PDE is challenging to solve numerically since the payoff function $g(x)$ (which both appears in the initial condition and determines the free boundary) is not continuously differentiable.

Section 4.1 states the free boundary PDE and the deep learning algorithm to solve it. To address the free boundary, we supplement the algorithm presented in Section 2 with an iterative method; see Section 4.1. Section 4.2 describes the architecture and implementation details for the neural network. Section 4.3 reports numerical accuracy for a case where a semi-analytic solution exists. Section 4.4 reports numerical accuracy for a case where no semi-analytic solution exists.

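We now specify the free boundary PDE for $u(t,x)$. The stock price dynamics and option price are:

$$
\begin{aligned}
dX_t^i &= \mu(X_t^i)\,dt + \sigma(X_t^i)\,dW_t^i,\\
u(t,x) &= \sup_{\tau \ge t} \mathbb{E}\left[ e^{-r(\tau\wedge T)}\, g(X_{\tau\wedge T}) \,\middle|\, X_t = x \right],
\end{aligned}
$$

where $W_t \in \mathbb{R}^d$ is a standard Brownian motion and $\mathrm{Cov}[dW_t^i, dW_t^j] = \rho_{i,j}\,dt$. The price of the American option is $u(0,X_0)$.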

The model (4.1) for the stock price dynamics is widely used in practice and captures several desirable characteristics. First, the drift $\mu(x)$ measures the “average” growth in the stock prices. The Brownian motion $W_t$ represents the randomness in the stock price, and the magnitude of the randomness is given by the coefficient function $\sigma(X_t^i)$. The movements of stock prices are correlated (e.g., if Microsoft’s price increases, it is likely that Apple’s price will also increase). The magnitude of the correlation between two stocks $i$ and $j$ is specified by the parameter $\rho_{i,j}$. An example is the well-known Black-Scholes model, with $\mu(x) = \mu x$ and $\sigma(x) = \sigma x$. In the Black-Scholes model, the average rate of return for each stock is $\mu$.
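The stock dynamics above can be sampled directly. The sketch below, in plain Python, draws terminal prices for two correlated stocks in the Black-Scholes special case $\mu(x) = \mu x$, $\sigma(x) = \sigma x$. It uses the exact log-normal solution rather than a time-stepping scheme. It then checks two properties of the model: $\mathbb{E}[X^i_T] = X^i_0 e^{\mu T}$, and correlation $\rho$ between the log-returns. The parameter values and the helper `terminal_prices` are hypothetical.

```python
import math
import random

# Hypothetical parameters: two stocks under Black-Scholes, mu(x) = mu x and
# sigma(x) = sigma x, with Cov[dW^1, dW^2] = rho dt.
def terminal_prices(x0, mu, sigma, rho, T, rng):
    """One draw of (X^1_T, X^2_T) from the exact log-normal solution
    X^i_T = X^i_0 exp((mu - sigma^2 / 2) T + sigma W^i_T)."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    w1 = math.sqrt(T) * z1
    w2 = math.sqrt(T) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # corr(w1, w2) = rho
    drift = (mu - 0.5 * sigma ** 2) * T
    return x0[0] * math.exp(drift + sigma * w1), x0[1] * math.exp(drift + sigma * w2)

rng = random.Random(0)
x0, mu, sigma, rho, T = (1.0, 2.0), 0.05, 0.2, 0.75, 1.0
n = 50_000
paths = [terminal_prices(x0, mu, sigma, rho, T, rng) for _ in range(n)]

# Under this model E[X^i_T] = X^i_0 exp(mu T).
mean_1 = sum(p[0] for p in paths) / n
mean_2 = sum(p[1] for p in paths) / n

# The log-returns should have correlation close to rho.
l1 = [math.log(p[0] / x0[0]) for p in paths]
l2 = [math.log(p[1] / x0[1]) for p in paths]
m1, m2 = sum(l1) / n, sum(l2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(l1, l2)) / n
var1 = sum((a - m1) ** 2 for a in l1) / n
var2 = sum((b - m2) ** 2 for b in l2) / n
corr = cov / math.sqrt(var1 * var2)
```

DGM itself never simulates these paths: it samples points in $[0,T]\times\Omega$ directly. A simulation like this is only a sanity check of the model assumptions.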

An American option is a financial derivative which the owner can choose to “exercise” at any time $t \in [0,T]$. If the owner exercises the option, they receive the financial payoff $g(X_t)$, where $X_t$ is the vector of prices of the underlying stocks. If the owner does not choose to exercise the option, they receive the payoff $g(X_T)$ at the final time $T$. The value (or price) of the American option at time $t$ is $u(t,X_t)$. Some typical examples of the payoff function $g(x): \mathbb{R}^d \to \mathbb{R}$ are

$$
g(x) = \max\left( \Big(\prod_{i=1}^d x_i\Big)^{1/d} - K,\; 0 \right)
\quad\text{and}\quad
g(x) = \max\left( \frac{1}{d}\sum_{i=1}^d x_i - K,\; 0 \right).
$$

The former is referred to as a “geometric payoff function”, while the latter is called an “arithmetic payoff function”. $K$ is the “strike price” and is a positive number.
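Both payoff functions take only a few lines of code. The sketch below is in plain Python, and the function names are hypothetical. The geometric mean is computed from a sum of logarithms, which assumes strictly positive prices.

```python
import math

def geometric_payoff(x, K):
    """g(x) = max((x_1 * ... * x_d)^(1/d) - K, 0); assumes all prices x_i > 0.
    The geometric mean is computed from a sum of logs to avoid overflow of the
    raw product when d is large."""
    geo_mean = math.exp(sum(math.log(xi) for xi in x) / len(x))
    return max(geo_mean - K, 0.0)

def arithmetic_payoff(x, K):
    """g(x) = max((x_1 + ... + x_d) / d - K, 0)."""
    return max(sum(x) / len(x) - K, 0.0)

print(geometric_payoff([4.0, 9.0], 5.0))   # about 1.0 (geometric mean 6)
print(arithmetic_payoff([4.0, 9.0], 5.0))  # 1.5 (arithmetic mean 6.5)
```

The log-sum form matters in high dimensions. For example, with $d = 200$ stocks priced around 50, the raw product is about $50^{200} \approx 10^{340}$, which exceeds the double-precision range.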

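The price function $u(t,x)$ in (4.1) is the solution to a free boundary PDE and will satisfy:

$$
\begin{aligned}
0 &= \frac{\partial u}{\partial t}(t,x) + \mu(x)\cdot\frac{\partial u}{\partial x}(t,x) + \frac{1}{2}\sum_{i,j=1}^d \rho_{i,j}\,\sigma(x_i)\sigma(x_j)\,\frac{\partial^2 u}{\partial x_i \partial x_j}(t,x) - ru(t,x), && \forall\, (t,x) \in \{u(t,x) > g(x)\},\\
u(t,x) &\ge g(x), && \forall\, (t,x),\\
u(t,x) &\in C^1(\mathbb{R}_+\times\mathbb{R}^d), && \forall\, (t,x) \in \{u(t,x) = g(x)\},\\
u(T,x) &= g(x), && \forall\, x.
\end{aligned}
\tag{4.1}
$$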

The free boundary set is $F = \{(t,x) : u(t,x) = g(x)\}$. $u(t,x)$ satisfies a partial differential equation “above” the free boundary set $F$, and $u(t,x)$ equals the function $g(x)$ “below” the free boundary set $F$.

The deep learning algorithm for solving the PDE (4.1) requires simulating points above and below the free boundary set $F$. We use an iterative method to address the free boundary. The free boundary set $F$ is approximated using the current parameter estimate $\theta_n$. This approximate free boundary is used in the probability measure from which we simulate points. The gradient is not taken with respect to the $\theta_n$ input of the probability density used to simulate random points. For this purpose, define the objective function:

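$$
\begin{aligned}
J(f;\theta,\tilde\theta) = {}& \left\| \frac{\partial f}{\partial t}(t,x;\theta) + \mu(x)\cdot\frac{\partial f}{\partial x}(t,x;\theta) + \frac{1}{2}\sum_{i,j=1}^d \rho_{i,j}\,\sigma(x_i)\sigma(x_j)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(t,x;\theta) - rf(t,x;\theta) \right\|^2_{[0,T]\times\Omega,\,\nu_1(\tilde\theta)}\\
&+ \left\| \max\left(g(x) - f(t,x;\theta),\, 0\right) \right\|^2_{[0,T]\times\Omega,\,\nu_2(\tilde\theta)} + \left\| f(T,x;\theta) - g(x) \right\|^2_{\Omega,\,\nu_3}.
\end{aligned}
$$

Descent steps are taken in the direction $-\nabla_\theta J(f;\theta,\tilde\theta)$. $\nu_1(\tilde\theta)$ and $\nu_2(\tilde\theta)$ are the densities of the points in $\tilde B^1$ and $\tilde B^2$, which are defined below. The deep learning algorithm is: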

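1. Generate the random batch of points $B^1 = \{(t_m, x_m)\}_{m=1}^M$ from $[0,T]\times\Omega$ according to the probability density $\nu_1^0$. Select the points $\tilde B^1 = \{(t,x) \in B^1 : f(t,x;\theta_n) > g(x)\}$.

2. Generate the random batch of points $B^2 = \{(\tau_m, z_m)\}_{m=1}^M$ from $[0,T]\times\Omega$ according to the probability density $\nu_2^0$. Select the points $\tilde B^2 = \{(\tau,z) \in B^2 : f(\tau,z;\theta_n) \le g(z)\}$.

3. Generate the random batch of points $B^3 = \{w_m\}_{m=1}^M$ from $\Omega$ with probability density $\nu_3$.

4. Calculate the objective function for the batch $S_n = \{\tilde B^1, \tilde B^2, B^3\}$:

$$
\begin{aligned}
J(f;\theta_n,S_n) = {}& \frac{1}{|\tilde B^1|}\sum_{(t_m,x_m)\in\tilde B^1} \left( \frac{\partial f}{\partial t}(t_m,x_m;\theta_n) + \mu(x_m)\cdot\frac{\partial f}{\partial x}(t_m,x_m;\theta_n) + \frac{1}{2}\sum_{i,j=1}^d \rho_{i,j}\,\sigma(x_i)\sigma(x_j)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(t_m,x_m;\theta_n) - rf(t_m,x_m;\theta_n) \right)^2\\
&+ \frac{1}{|\tilde B^2|}\sum_{(\tau_m,z_m)\in\tilde B^2} \max\left( g(z_m) - f(\tau_m,z_m;\theta_n),\, 0 \right)^2 + \frac{1}{|B^3|}\sum_{w_m\in B^3} \left( f(T,w_m;\theta_n) - g(w_m) \right)^2.
\end{aligned}
$$

5. Take a descent step for the random batch $S_n$:

$$
\theta_{n+1} = \theta_n - \alpha_n \nabla_\theta J(f;\theta_n,S_n).
$$

6. Repeat until a convergence criterion is satisfied.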

The second derivatives in the above algorithm can be approximated using the method from Section 3.
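The sketch below, in plain Python, shows the batch-selection logic of steps 1 and 2 together with the penalty term on $\tilde B^2$. The PDE-residual term is omitted because it needs derivatives of a real network. The following are all hypothetical:
- the arithmetic payoff with $K = 1$;
- $d = 2$ and a shared uniform sampling density on $[0,T]\times[0.5,1.5]^2$;
- the placeholder `f_current`, which stands in for $f(t,x;\theta_n)$.

```python
import math
import random

# Hypothetical setup: d = 2 stocks, arithmetic payoff with strike K, and a
# made-up current approximation f_current(t, x) standing in for f(t, x; theta_n).
K, T, d, M = 1.0, 1.0, 2, 1000

def g(x):
    return max(sum(x) / len(x) - K, 0.0)

def f_current(t, x):
    # Payoff plus a time-value bump that shrinks as t -> T, shifted down by 0.02
    # so that the approximation dips below the payoff near maturity.
    m = sum(x) / len(x)
    return g(x) + 0.1 * (T - t) * math.exp(-4.0 * (m - K) ** 2) - 0.02

def sample_batch(rng, size):
    # Uniform density on [0, T] x [0.5, 1.5]^d (an arbitrary choice of nu_1^0, nu_2^0).
    return [(rng.uniform(0.0, T), [rng.uniform(0.5, 1.5) for _ in range(d)])
            for _ in range(size)]

rng = random.Random(0)
B1, B2 = sample_batch(rng, M), sample_batch(rng, M)
# Step 1: keep points where the current approximation lies above the payoff.
B1_tilde = [(t, x) for (t, x) in B1 if f_current(t, x) > g(x)]
# Step 2: keep points where it lies on or below the payoff.
B2_tilde = [(t, x) for (t, x) in B2 if f_current(t, x) <= g(x)]
# Penalty term of J(f; theta_n, S_n) evaluated on B2_tilde.
penalty = (sum(max(g(x) - f_current(t, x), 0.0) ** 2 for (t, x) in B2_tilde)
           / max(len(B2_tilde), 1))
```

The selection uses the current estimate only to decide where points go. Consistent with the text, no gradient flows through this selection step.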

This section provides details for the implementation of the algorithm, including the DGM network architecture, hyperparameters, and computational approach.

The architecture of a neural network can be crucial to its success. Frequently, different applications require different architectures. For example, convolutional networks are essential for image recognition, while long short-term memory networks (LSTMs) are useful for modeling sequential data. Clever choices of architectures, which exploit a priori knowledge about an application, can significantly improve performance. In the PDE applications in this paper, we found that a neural network architecture similar in spirit to that of LSTM networks improved performance.

The PDE solution requires a model $f(t,x;\theta)$ which can make “sharp turns” due to the final condition, which is of the form $u(T,x) = \max(p(x), 0)$ (the first derivative is discontinuous when $p(x) = 0$). The shape of the solution $u(t,x)$ for $t < T$, although “smoothed” by the diffusion term in the PDE, will still have a nonlinear profile which is rapidly changing in certain spatial regions. In particular, we found the following network architecture to be effective:

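$$
\begin{aligned}
S^1 &= \sigma(W^1\vec{x} + b^1),\\
Z^\ell &= \sigma(U^{z,\ell}\vec{x} + W^{z,\ell}S^\ell + b^{z,\ell}), && \ell = 1,\dots,L,\\
G^\ell &= \sigma(U^{g,\ell}\vec{x} + W^{g,\ell}S^1 + b^{g,\ell}), && \ell = 1,\dots,L,\\
R^\ell &= \sigma(U^{r,\ell}\vec{x} + W^{r,\ell}S^\ell + b^{r,\ell}), && \ell = 1,\dots,L,\\
H^\ell &= \sigma\big(U^{h,\ell}\vec{x} + W^{h,\ell}(S^\ell\odot R^\ell) + b^{h,\ell}\big), && \ell = 1,\dots,L,\\
S^{\ell+1} &= (1 - G^\ell)\odot H^\ell + Z^\ell\odot S^\ell, && \ell = 1,\dots,L,\\
f(t,x;\theta) &= WS^{L+1} + b,
\end{aligned}
\tag{4.2}
$$

where $\vec{x} = (t,x)$, the number of hidden layers is $L+1$, and $\odot$ denotes element-wise multiplication (i.e., $z\odot v = (z_0v_0,\dots,z_Nv_N)$). The parameters are

$$\theta = \Big\{W^1, b^1, \big(U^{z,\ell}, W^{z,\ell}, b^{z,\ell}\big)_{\ell=1}^L, \big(U^{g,\ell}, W^{g,\ell}, b^{g,\ell}\big)_{\ell=1}^L, \big(U^{r,\ell}, W^{r,\ell}, b^{r,\ell}\big)_{\ell=1}^L, \big(U^{h,\ell}, W^{h,\ell}, b^{h,\ell}\big)_{\ell=1}^L, W, b\Big\}.$$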

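$$\sigma(z) = \big(\phi(z_1), \phi(z_2), \dots, \phi(z_M)\big), \tag{4.3}$$

where $\phi:\mathbb{R}\to\mathbb{R}$ is a nonlinear activation function such as the tanh function, the sigmoidal function $\frac{e^y}{1+e^y}$, or the rectified linear unit (ReLU) $\max(y,0)$. The parameters in $\theta$ have dimensions $W^1\in\mathbb{R}^{M\times(d+1)}$, $b^1\in\mathbb{R}^M$, $U^{z,\ell}\in\mathbb{R}^{M\times(d+1)}$, $W^{z,\ell}\in\mathbb{R}^{M\times M}$, $b^{z,\ell}\in\mathbb{R}^M$, $U^{g,\ell}\in\mathbb{R}^{M\times(d+1)}$, $W^{g,\ell}\in\mathbb{R}^{M\times M}$, $b^{g,\ell}\in\mathbb{R}^M$, $U^{r,\ell}\in\mathbb{R}^{M\times(d+1)}$, $W^{r,\ell}\in\mathbb{R}^{M\times M}$, $b^{r,\ell}\in\mathbb{R}^M$, $U^{h,\ell}\in\mathbb{R}^{M\times(d+1)}$, $W^{h,\ell}\in\mathbb{R}^{M\times M}$, $b^{h,\ell}\in\mathbb{R}^M$, $W\in\mathbb{R}^{1\times M}$, and $b\in\mathbb{R}$.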

The architecture (4.2) is relatively complicated. Within each layer, there are actually many “sub-layers”

of computations. The important feature is the repeated element-wise multiplication of nonlinear functions

of the input. This helps to model more complicated functions which are rapidly changing in certain time

and space regions. The neural network architecture (4.2) is similar to the architecture for LSTM networks

(see [24]) and highway networks (see [47]).

The key hyperparameters in the neural network (4.2) are the number of layers L, the number of units

M in each sub-layer, and the choice of activation unit φ(y). We found for the applications in this paper

that the hyperparameters L = 3 (i.e., four hidden layers), M = 50, and φ(y) = tanh(y) were effective. It

is worthwhile to note that the choice of φ(y) = tanh(y) means that f (t, x; θ) is smooth and therefore can


solve for a “classical solution” of the PDE. The neural network parameters are initialized using the Xavier

initialization (see [18]). The architecture (4.2) is bounded in the input x (for a fixed choice of parameters

θ) if σ(·) is a tanh or sigmoidal function; it may be helpful to allow the network to be unbounded for

approximating unbounded/growing functions. We found that replacing the σ(·) in the H ` sub-layer with the

identity function can be an effective way to develop an unbounded network.
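The sub-layer bookkeeping in (4.2) and (4.3) may be easier to follow in code. The following is a minimal pure-Python sketch of one forward pass, not the authors' code: `init_dgm` and `dgm_forward` are hypothetical names, weights use the uniform variant of Xavier (Glorot) initialization with zero biases (both choices are our assumptions), the $G^\ell$ gate uses $S^1$ exactly as written in (4.2), and `h_identity=True` replaces $\sigma(\cdot)$ in the $H^\ell$ sub-layer by the identity, as suggested above for unbounded solutions. A practical implementation would use TensorFlow or PyTorch tensors so that derivatives in $t$, $x$ and $\theta$ come from automatic differentiation.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def add(*vs: Vector) -> Vector:
    return [sum(c) for c in zip(*vs)]

def xavier(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Xavier/Glorot uniform: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
    a = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-a, a) for _ in range(cols)] for _ in range(rows)]

def init_dgm(d: int, M: int, L: int, seed: int = 0) -> Dict:
    rng = random.Random(seed)
    params = {"W1": xavier(M, d + 1, rng), "b1": [0.0] * M, "layers": []}
    for _ in range(L):
        layer = {}
        for k in ("z", "g", "r", "h"):
            layer["U" + k] = xavier(M, d + 1, rng)  # acts on the input vector (t, x)
            layer["W" + k] = xavier(M, M, rng)      # acts on a hidden state of width M
            layer["b" + k] = [0.0] * M
        params["layers"].append(layer)
    params["W"] = xavier(1, M, rng)[0]
    params["b"] = 0.0
    return params

def dgm_forward(params: Dict, t: float, x: Vector, phi=math.tanh, h_identity: bool = False) -> float:
    xv = [t] + list(x)                                  # input vector (t, x)
    act = lambda v: [phi(vi) for vi in v]               # sigma(z) = (phi(z_1), ..., phi(z_M))
    act_h = (lambda v: list(v)) if h_identity else act  # optional unbounded H sub-layer
    S1 = act(add(matvec(params["W1"], xv), params["b1"]))
    S = S1
    for q in params["layers"]:
        Z = act(add(matvec(q["Uz"], xv), matvec(q["Wz"], S), q["bz"]))
        G = act(add(matvec(q["Ug"], xv), matvec(q["Wg"], S1), q["bg"]))  # S^1, as in (4.2)
        R = act(add(matvec(q["Ur"], xv), matvec(q["Wr"], S), q["br"]))
        SR = [s * rr for s, rr in zip(S, R)]            # element-wise product of S and R
        H = act_h(add(matvec(q["Uh"], xv), matvec(q["Wh"], SR), q["bh"]))
        S = [(1.0 - gg) * hh + zz * s for gg, hh, zz, s in zip(G, H, Z, S)]
    return sum(w * s for w, s in zip(params["W"], S)) + params["b"]
```

With the hyperparameters reported above one would call `init_dgm(d, M=50, L=3)`; the small sizes in a quick experiment are only for speed.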

We emphasize that the only input to the network is (t, x). We do not use any custom-designed nonlinear

transformations of (t, x). If properly chosen, such additional inputs might help performance. For example,

the European option PDE solution (which has an analytic formula) could be included as an input.

A regularization term (such as an $\ell_2$ penalty) could also be included in the objective function for the

algorithm. Such regularization terms are used for reducing overfitting in machine learning models estimated

using datasets which have a limited number of data samples. (For example, a model estimated on a dataset

of 60,000 images.) However, it is unclear whether this would be helpful in the context of this paper’s application,

since there is no strict upper bound on the size of the dataset (i.e., one can always simulate more time/space

points).

Our computational approach to training the neural network involved several components. The second

derivatives are approximated using the method from Section 3. Training is distributed across 6 GPU nodes

using asynchronous stochastic gradient descent (we provide more details on this below). Parameters are

updated using the well-known ADAM algorithm (see [27]) with a decaying learning rate schedule (more

details on the learning rate are provided below). Accuracy can be improved by calculating a running average

of the neural network solutions over a sequence of training iterations (essentially a computationally cheap

approach for building a model ensemble). We also found that model ensembles (even of a size as small as 5) can slightly increase accuracy.

Training of the neural network is distributed across several GPU nodes in order to accelerate training.

We use asynchronous stochastic gradient descent, which is a widely-used method for parallelizing training of

machine learning models. On each node, i.i.d. space and time samples are generated. Each node calculates

the gradient of the objective function with respect to the parameters on its respective batch of simulated data.

These gradients are then used to update the model, which is stored on a central node called a “parameter

server”. Figure 1 displays the computational setup. Updates occur asynchronously; that is, node i updates

the model immediately upon completion of its work, and does not wait for node j to finish its work. The

“work” here is calculating the gradients for a batch of simulated data. Before a node calculates the gradient

for a new batch of simulated data, it receives an updated model from the parameter server. For more details

on asynchronous stochastic gradient descent, see [13].
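The parameter-server protocol described above can be mimicked on a single machine with threads. In this hedged sketch (`ParameterServer` and `async_sgd` are hypothetical names), each thread plays the role of a GPU node: it pulls the latest model, computes a gradient, and pushes an update immediately without waiting for the other workers. A fixed quadratic loss stands in for the DGM objective and plain gradient steps stand in for ADAM; neither choice comes from the paper, and Python threads only illustrate the update protocol, not real parallel speedup.

```python
import threading
from typing import Callable, List, Tuple

class ParameterServer:
    """Central node that stores the model and applies pushed gradients."""
    def __init__(self, theta: List[float]):
        self.theta = list(theta)
        self.updates = 0
        self._lock = threading.Lock()

    def pull(self) -> List[float]:
        with self._lock:
            return list(self.theta)

    def push(self, grad: List[float], lr: float) -> None:
        with self._lock:
            self.theta = [p - lr * g for p, g in zip(self.theta, grad)]
            self.updates += 1

def worker(server: ParameterServer, grad_fn: Callable[[List[float]], List[float]],
           lr: float, steps: int) -> None:
    for _ in range(steps):
        theta = server.pull()   # receive the latest model before each batch
        grad = grad_fn(theta)   # the node's "work": a gradient on its own batch
        server.push(grad, lr)   # update immediately; no waiting for other workers

def async_sgd(theta0: List[float], grad_fn: Callable[[List[float]], List[float]],
              lr: float = 0.02, n_workers: int = 5, steps: int = 200) -> Tuple[List[float], int]:
    server = ParameterServer(theta0)
    threads = [threading.Thread(target=worker, args=(server, grad_fn, lr, steps))
               for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return server.theta, server.updates
```

Because a worker may compute its gradient on a model that other workers have already updated, some gradients are stale; a small learning rate keeps this toy example stable.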

During training, we decrease the learning rate as the number of iterations increases. We use a learning rate

schedule where the learning rate is a piecewise constant function of the number of iterations. This is a typical

choice. We found the following learning rate schedule to be effective:

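$$
\alpha_n = \begin{cases}
10^{-4}, & n \le 5{,}000,\\
5\times10^{-4}, & 5{,}000 < n \le 10{,}000,\\
10^{-5}, & 10{,}000 < n \le 20{,}000,\\
5\times10^{-6}, & 20{,}000 < n \le 30{,}000,\\
10^{-6}, & 30{,}000 < n \le 40{,}000,\\
5\times10^{-7}, & 40{,}000 < n \le 45{,}000,\\
10^{-7}, & 45{,}000 < n.
\end{cases}
$$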

We use approximately 100,000 iterations. An “iteration” involves batches of size 1,000 on each of the GPU nodes. Therefore, there are 5,000 simulated time/space points for each iteration. In total, we used approximately 500 million simulated time/space points to train the neural network.
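The piecewise-constant schedule can be written as a lookup function. In this sketch `learning_rate`, `SCHEDULE` and `TOTAL_POINTS` are hypothetical names; the rates are transcribed exactly as printed in the schedule for $\alpha_n$ (including the $5\times10^{-4}$ bracket), and the last lines only restate the sample-count arithmetic from the paragraph above.

```python
from bisect import bisect_left

# (upper iteration bound, learning rate), transcribed from the schedule for alpha_n
SCHEDULE = [(5_000, 1e-4), (10_000, 5e-4), (20_000, 1e-5), (30_000, 5e-6),
            (40_000, 1e-6), (45_000, 5e-7)]
FINAL_RATE = 1e-7  # used for n > 45,000

def learning_rate(n: int) -> float:
    """Return alpha_n: the first bracket whose upper bound satisfies n <= bound."""
    bounds = [upper for upper, _ in SCHEDULE]
    k = bisect_left(bounds, n)  # index of the first bound >= n
    return SCHEDULE[k][1] if k < len(SCHEDULE) else FINAL_RATE

# 100,000 iterations with 5,000 simulated time/space points per iteration
ITERATIONS = 100_000
POINTS_PER_ITERATION = 5_000
TOTAL_POINTS = ITERATIONS * POINTS_PER_ITERATION  # 500 million points
```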

We implement the algorithm using TensorFlow and PyTorch, which are software libraries for deep learning. TensorFlow has reverse mode automatic differentiation which allows the calculation of derivatives for a broad range of functions. For example, TensorFlow can be used to calculate the gradient of the neural network (4.2) with respect to x or θ. TensorFlow also allows for the training of models on graphics processing units (GPUs). A GPU, which has thousands of cores, can be used to highly parallelize the training of

deep learning models. We furthermore distribute our computations across multiple GPU nodes, as described

above. The computations in this paper were performed on the Blue Waters supercomputer which has a large

number of GPU nodes.

We implement our deep learning algorithm to solve the PDE (4.1). The accuracy of our deep learning

algorithm is evaluated in up to 200 dimensions. The results are reported below in Table 1.

Number of dimensions    Error
3                       0.05%
20                      0.03%
100                     0.11%
200                     0.22%

Table 1: The deep learning algorithm solution is compared with a semi-analytic solution for the Black-Scholes model. The parameters are $\mu(x) = (r - c)x$ and $\sigma(x) = \sigma x$. All stocks are identical with correlation $\rho_{i,j} = 0.75$, volatility $\sigma = 0.25$, initial stock price $X_0 = 1$, dividend rate $c = 0.02$, and interest rate $r = 0$. The maturity of the option is $T = 2$ and the strike price is $K = 1$. The payoff function is $g(x) = \max\big((\prod_{i=1}^d x_i)^{1/d} - K, 0\big)$. The error is reported for the price $u(0, X_0)$ of the at-the-money American call option. The error is $\frac{|f(0,X_0;\theta) - u(0,X_0)|}{|u(0,X_0)|}\times 100\%$.

The semi-analytic solution used in Table 1 is provided below. Let $\mu(x) = (r - c)x$, $\sigma(x) = \sigma x$, and $\rho_{i,j} = \rho$ for $i \ne j$ (i.e., the Black-Scholes model). If the payoff function in (4.1) is $g(x) = \max\big((\prod_{i=1}^d x_i)^{1/d} - K, 0\big)$, then there is a semi-analytic solution to (4.1):

$$u(t,x) = v\Big(t, \big(\textstyle\prod_{i=1}^d x_i\big)^{1/d} - K\Big), \tag{4.4}$$

where $v$ solves

$$
\begin{aligned}
0 &= \frac{\partial v}{\partial t}(t,x) + \hat\mu x\frac{\partial v}{\partial x}(t,x) + \frac{1}{2}\hat\sigma^2 x^2\frac{\partial^2 v}{\partial x^2}(t,x) - rv(t,x), && \forall\,(t,x) : v(t,x) > \hat g(x),\\
v(t,x) &\ge \hat g(x), && \forall\,(t,x),\\
v(t,x) &\in C^1(\mathbb{R}_+\times\mathbb{R}^d), && \forall\,(t,x) : v(t,x) = \hat g(x),
\end{aligned}
\tag{4.5}
$$

where $\hat\sigma^2 = \frac{d\sigma^2 + d(d-1)\rho\sigma^2}{d^2}$, $\hat\mu = (r - c) - \frac{1}{2}\hat\sigma^2 + \frac{1}{2}\sigma^2$, and $\hat g(x) = \max(x, 0)$. The one-dimensional PDE (4.5) can be solved using finite difference methods. If $f(t,x;\theta)$ is the deep learning algorithm’s estimate for the PDE solution at $(t,x)$, the relative error at the point $(t,x)$ is $\frac{|f(t,x;\theta) - u(t,x)|}{|u(t,x)|}\times 100\%$ and the absolute error at the point $(t,x)$ is $|f(t,x;\theta) - u(t,x)|$. The relative error and absolute error at the point $(t,x)$ can be evaluated using the semi-analytic solution (4.4).
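The reduction to one dimension rests on $\hat\sigma^2$, which equals $\sigma^2(1 + (d-1)\rho)/d$, the variance rate of $\frac{\sigma}{d}\sum_i W^i$ for equicorrelated Brownian motions, i.e., of the logarithm of the geometric average. The sketch below checks this numerically and computes the relative error used above. The names are hypothetical, and the one-factor construction $W^i = \sqrt{\rho}\,Z_0 + \sqrt{1-\rho}\,Z_i$ (valid for $\rho \ge 0$) is our assumption for generating equicorrelated normals.

```python
import math
import random

def sigma_hat_sq(d: int, sigma: float, rho: float) -> float:
    """(d sigma^2 + d(d-1) rho sigma^2) / d^2, i.e. sigma^2 (1 + (d-1) rho) / d."""
    return (d * sigma**2 + d * (d - 1) * rho * sigma**2) / d**2

def mc_sigma_hat_sq(d: int, sigma: float, rho: float, n: int = 20_000, seed: int = 0) -> float:
    """Sample variance of (sigma/d) * sum_i W_i over unit-time increments with correlation rho."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)  # common factor shared by all coordinates
        w = [math.sqrt(rho) * z0 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append(sigma / d * sum(w))
    m = sum(samples) / n
    return sum((s - m) ** 2 for s in samples) / (n - 1)

def relative_error_pct(f_val: float, u_val: float) -> float:
    """|f - u| / |u| * 100%, the relative error used in Table 1."""
    return abs(f_val - u_val) / abs(u_val) * 100.0
```

For the Table 1 parameters ($d = 20$, $\sigma = 0.25$, $\rho = 0.75$) the closed form gives $\hat\sigma^2 = 0.04765625$.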

Although the solution at (t, x) = (0, X0 ) is of primary interest for American options, most other PDE

applications are interested in the entire solution u(t, x). The deep learning algorithm provides an approximate

solution across all time and space (t, x) ∈ [0, T ] × Ω. As an example, we present in Figure 2 contour plots of

the absolute error and percent error across time and space for the American option PDE in 20 dimensions.

The contour plot is produced in the following way:

1. Sample time points $t^\ell$ uniformly on $[0,T]$ and sample spatial points $x^\ell = (x_1^\ell,\dots,x_{20}^\ell)$ from the joint distribution of $(X_t^1,\dots,X_t^{20})$ in equation (4.1). This produces an “envelope” of sampled points since $X_t$ spreads out as a diffusive process from $X_0 = 1$.

2. Calculate the error $E^\ell$ at each sampled point $(t^\ell, x^\ell)$ for $\ell = 1,\dots,L$.

3. Aggregate the error over a two-dimensional subspace $\big(t^\ell, (\prod_{i=1}^{20} x_i^\ell)^{1/20}, E^\ell\big)$ for $\ell = 1,\dots,L$.

4. Produce a contour plot from the data $\big\{\big(t^\ell, (\prod_{i=1}^{20} x_i^\ell)^{1/20}, E^\ell\big)\big\}_{\ell=1}^L$. The x-axis is $t$ and the y-axis is the geometric average $(\prod_{i=1}^{20} x_i)^{1/20}$, which corresponds to the final condition $g(x)$.
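Steps 1 to 3 can be sketched as follows, assuming the Table 1 dynamics ($\mu(x) = (r-c)x$, $\sigma(x) = \sigma x$, equal correlation $\rho \ge 0$), for which each coordinate of $X_t$ is a geometric Brownian motion that can be sampled exactly. The names `contour_data` and `error_fn` are hypothetical; `error_fn` stands for whatever error $E^\ell$ is being plotted, and `project` switches between the geometric average used here and the arithmetic average used for the heterogeneous case. Step 4 is left to a plotting library (for example matplotlib's `tricontourf`, which accepts scattered points).

```python
import math
import random
from typing import Callable, List, Tuple

def contour_data(
    error_fn: Callable[[float, List[float]], float],  # hypothetical: error E at (t, x)
    n_points: int, d: int, T: float, x0: float, r: float, c: float,
    sigma: float, rho: float, project: str = "geometric", seed: int = 0,
) -> List[Tuple[float, float, float]]:
    """Return triples (t^l, y^l, E^l), with y^l the geometric or arithmetic average of x^l."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_points):
        t = rng.uniform(0.0, T)           # step 1: t^l uniform on [0, T]
        z0 = rng.gauss(0.0, 1.0)          # common factor for equal correlation
        x = []
        for _ in range(d):
            w = math.sqrt(rho) * z0 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            # exact marginal of dX = (r - c) X dt + sigma X dW started at x0
            x.append(x0 * math.exp((r - c - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * w))
        if project == "geometric":
            y = math.exp(sum(math.log(xi) for xi in x) / d)  # (prod x_i)^(1/d)
        else:
            y = sum(x) / d                                   # (1/d) sum x_i
        out.append((t, y, error_fn(t, x)))                   # steps 2 and 3
    return out
```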

Figure 2 reports both the absolute error and the percent error. The percent error $\frac{|f(t,x;\theta) - u(t,x)|}{|u(t,x)|}\times 100\%$

is reported for points where |u(t, x)| > 0.05. The absolute error becomes relatively large in a few areas;

however, the solution u(t, x) also grows large in these areas and therefore the percent error remains small.

We now consider a case of the American option PDE which does not have a semi-analytic solution. The

American option PDE has the special property that it is possible to calculate error bounds on an approximate

solution. Therefore, we can evaluate the accuracy of the deep learning algorithm even on cases where no

semi-analytic solution is available.

We previously only considered a symmetrical case where ρi,j = 0.75 and σ = 0.25 for all stocks. This

section solves a more challenging heterogeneous case where ρi,j and σi vary across all dimensions i =

1, 2, . . . , d. The coefficients are fitted to actual data for the stocks IBM, Amazon, Tiffany, Amgen, Bank

of America, General Mills, Cisco, Coca-Cola, Comcast, Deere, General Electric, Home Depot, Johnson &

Johnson, Morgan Stanley, Microsoft, Nordstrom, Pfizer, Qualcomm, Starbucks, and Tyson Foods from 2000-2017. This produces a PDE with widely varying coefficients for each of the $\frac{d^2+d}{2}$ second-derivative terms. The correlation coefficients $\rho_{i,j}$ range from −0.53 to 0.80 for $i \ne j$, and $\sigma_i$ ranges from 0.09 to 0.69.

Let f (t, x; θ) be the neural network approximation. [45] derived that the PDE solution u(t, x) lies in the

interval:

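$$
\begin{aligned}
u(t,x) &\in \big[\underline{u}(t,x), \overline{u}(t,x)\big],\\
\underline{u}(t,x) &= \mathbb{E}\big[g(X_\tau)\,\big|\,X_t = x, \tau > t\big],\\
\overline{u}(t,x) &= \mathbb{E}\Big[\sup_{s\in[t,T]}\big(e^{-r(s-t)}g(X_s) - M_s\big)\Big],
\end{aligned}
\tag{4.6}
$$

where $\tau = \inf\{t\in[0,T] : f(t,X_t;\theta) < g(X_t)\}$ and $M_s$ is a martingale constructed from the approximate solution $f(t,x;\theta)$:

$$
\begin{aligned}
M_s = f(s,X_s;\theta) - f(t,X_t;\theta) - \int_t^s \Big(&\frac{\partial f}{\partial t}(s',X_{s'};\theta) + \mu(X_{s'})\frac{\partial f}{\partial x}(s',X_{s'};\theta)\\
&+ \frac{1}{2}\sum_{i,j=1}^d \sigma(X_{s',i})\sigma(X_{s',j})\frac{\partial^2 f}{\partial x_i\partial x_j}(s',X_{s'};\theta) - rf(s',X_{s'};\theta)\Big)\,ds'.
\end{aligned}
\tag{4.7}
$$

Figure 2: Top: Absolute error. Bottom: Percent error. For reference, the price at time 0 is 0.1003 and the solution at time T is max(geometric average of x − 1, 0).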

The bounds (4.6) depend only on the approximation $f(t,x;\theta)$, which is known, and can be evaluated via Monte Carlo simulation. The integral for $M_s$ must also be discretized. The best estimate for the price of the American option is the midpoint of the interval $[\underline{u}(0,X_0), \overline{u}(0,X_0)]$, which has an error bound of $\frac{\overline{u}(0,X_0) - \underline{u}(0,X_0)}{2\underline{u}(0,X_0)}\times 100\%$. Numerical results are in Table 2.
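A per-path sketch of how (4.6) and a discretized (4.7) could be evaluated is given below; averaging the per-path samples over many simulated paths gives the Monte Carlo estimates. This is an illustration under stated assumptions, not the procedure of [45]: `bound_samples`, `mc_bounds` and the callable `Af` (the PDE operator applied to $f$, i.e. the integrand in (4.7)) are hypothetical, the integral uses a left-point Riemann sum, the option is assumed to be exercised at maturity if the rule $f < g$ never triggers, and the lower-bound sample omits discounting exactly as (4.6) is printed (the experiments use $r = 0$).

```python
import math
from typing import Callable, List, Sequence, Tuple

def bound_samples(
    times: Sequence[float],                  # t_0 = t < t_1 < ... < t_K = T
    path: Sequence,                          # simulated states X_{t_k}
    f: Callable[[float, object], float],     # approximate solution f(t, x; theta)
    Af: Callable[[float, object], float],    # integrand of (4.7) evaluated for f (hypothetical)
    g: Callable[[object], float],            # payoff
    r: float,
) -> Tuple[float, float]:
    t0, x0 = times[0], path[0]
    # Martingale (4.7), with a left-point Riemann sum for the time integral
    M, integral = [0.0], 0.0
    for k in range(1, len(times)):
        integral += Af(times[k - 1], path[k - 1]) * (times[k] - times[k - 1])
        M.append(f(times[k], path[k]) - f(t0, x0) - integral)
    upper = max(math.exp(-r * (s - t0)) * g(x) - m for s, x, m in zip(times, path, M))
    # Lower bound: exercise at the first time after t0 where f < g, else at maturity
    lower = g(path[-1])
    for s, x in zip(times[1:], path[1:]):
        if f(s, x) < g(x):
            lower = g(x)
            break
    return lower, upper

def mc_bounds(samples: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Average per-path samples; return (lower, upper, midpoint estimate)."""
    lo = sum(s[0] for s in samples) / len(samples)
    hi = sum(s[1] for s in samples) / len(samples)
    return lo, hi, 0.5 * (lo + hi)
```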

Strike price Neural network solution Lower Bound Upper Bound Error bound

0.90 0.14833 0.14838 0.14905 0.23%

0.95 0.12286 0.12270 0.12351 0.33%

1.00 0.10136 0.10119 0.10193 0.37%

1.05 0.08334 0.08315 0.08389 0.44%

1.10 0.06841 0.06809 0.06893 0.62%

Table 2: The accuracy of the deep learning algorithm is evaluated on a case where there is no semi-analytic solution. The parameters are $\mu(x) = (r - c)x$ and $\sigma(x) = \sigma x$. The correlations $\rho_{i,j}$ and volatilities $\sigma_i$ are estimated from data to generate a heterogeneous diffusion matrix. The initial stock price is $X_0 = 1$, dividend rate $c = 0.02$, and interest rate $r = 0$ for all stocks. The maturity of the option is $T = 2$. The payoff function is $g(x) = \max\big(\frac{1}{d}\sum_{i=1}^d x_i - K, 0\big)$. The neural network solution and its error bounds are reported for the price $u(0,X_0)$ of the American call option. The best estimate for the price of the American option is the midpoint of the interval $[\underline{u}(0,X_0), \overline{u}(0,X_0)]$, which has an error bound of $\frac{\overline{u}(0,X_0) - \underline{u}(0,X_0)}{2\underline{u}(0,X_0)}\times 100\%$. In order to calculate the upper bound, the integral (4.7) is discretized with time step size $\Delta = 5\times10^{-4}$.
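As a consistency check on the error-bound formula, the short script below recomputes the last column of Table 2 from the lower and upper bounds, dividing the half-width of the interval by the lower bound. The names `TABLE2`, `error_bound_pct` and `midpoint` are hypothetical; the numbers are copied from the table.

```python
# (strike K, lower bound, upper bound, reported error bound in %) from Table 2
TABLE2 = [(0.90, 0.14838, 0.14905, 0.23), (0.95, 0.12270, 0.12351, 0.33),
          (1.00, 0.10119, 0.10193, 0.37), (1.05, 0.08315, 0.08389, 0.44),
          (1.10, 0.06809, 0.06893, 0.62)]

def error_bound_pct(lower: float, upper: float) -> float:
    """(upper - lower) / (2 * lower) * 100%: half-width relative to the lower bound."""
    return (upper - lower) / (2.0 * lower) * 100.0

def midpoint(lower: float, upper: float) -> float:
    """Best price estimate: the midpoint of [lower, upper]."""
    return 0.5 * (lower + upper)

for K, lo, hi, reported in TABLE2:
    print(f"K={K:.2f}  midpoint={midpoint(lo, hi):.5f}  "
          f"error bound={error_bound_pct(lo, hi):.2f}%  (reported {reported:.2f}%)")
```

Rounded to two decimals, every recomputed value matches the reported column.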

We present in Figure 3 contour plots of the absolute error bound and percent error bound across time and

space for the American option PDE in 20 dimensions for strike price K = 1. The contour plot is produced

in the following way:

1. Sample time points $t^\ell$ uniformly on $[0,T]$ and sample spatial points $x^\ell = (x_1^\ell,\dots,x_{20}^\ell)$ from the joint distribution of $(X_t^1,\dots,X_t^{20})$ in equation (4.1).

2. Calculate the error $E^\ell$ at each sampled point $(t^\ell, x^\ell)$ for $\ell = 1,\dots,L$.

3. Aggregate the error over a two-dimensional subspace $\big(t^\ell, \frac{1}{20}\sum_{i=1}^{20} x_i^\ell, E^\ell\big)$ for $\ell = 1,\dots,L$.

4. Produce a contour plot from the data $\big\{\big(t^\ell, \frac{1}{20}\sum_{i=1}^{20} x_i^\ell, E^\ell\big)\big\}_{\ell=1}^L$. The x-axis is $t$ and the y-axis is the arithmetic average $\frac{1}{20}\sum_{i=1}^{20} x_i$, which corresponds to the final condition $g(x)$.

Figure 3 reports both the absolute error and the percent error. The percent error $\frac{|f(t,x;\theta) - u(t,x)|}{|u(t,x)|}\times 100\%$ is reported for points where $|u(t,x)| > 0.05$. It should be emphasized that these are error bounds; therefore, the actual error could be lower. The contour plot in Figure 3 requires significant computation. For each point at which we calculate an error bound, a new simulation of (4.6) is required. In total, a large number of simulations are required, which we distribute across hundreds of GPUs on the Blue Waters supercomputer.

We also test the deep learning algorithm on a high-dimensional Hamilton-Jacobi-Bellman (HJB) equation

corresponding to the optimal control of a stochastic heat equation. Specifically, we demonstrate that the

deep learning algorithm accurately solves the high-dimensional PDE (5.5). The PDE (5.5) is motivated by

the problem of optimally controlling the stochastic partial differential equation (SPDE):

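$$
\begin{aligned}
\frac{\partial v}{\partial t}(t,x) &= \alpha\frac{\partial^2 v}{\partial x^2}(t,x) + u(x) + \sigma\frac{\partial^2 W}{\partial t\,\partial x}(t,x), && x\in[0,L],\\
v(t, x=0) &= \bar v(0),\\
v(t, x=L) &= \bar v(L),\\
v(t=0, x) &= v_0(x),
\end{aligned}
\tag{5.1}
$$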


Figure 3: Top: Absolute error. Bottom: Percent error. For reference, u(0, X0 ) ∈ [0.10119, 0.10193] and the

solution at time T is max(average of x − 1, 0).

where $u(x)$ is the control and $W(t,x)$ is a Brownian sheet (i.e., $\frac{\partial^2 W}{\partial t\,\partial x}(t,x)$ is space-time white noise) defined on

a stochastic basis $(\Omega, \mathcal{F}, \mathcal{F}_t, \mathbb{P})$. The control $u$, which is square integrable and adapted to the filtration $\mathcal{F}_t$, is a source/sink term which can be used to guide the temperature $v(t,x)$ towards a target profile $\bar v(x)$ on $[0,L]$. As discussed in [10], such problems admit unique solutions in the appropriate generalized sense (see Theorem 3.1 in [10]). The endpoints at $x = 0, L$ are held at the target temperatures. Specifically, the optimal control minimizes

$$\mathbb{E}\Big[\int_0^\infty\int_0^L e^{-\gamma s}\Big((v(s,x) - \bar v(x))^2 + \lambda u(x)^2\Big)\,dx\,ds\Big]. \tag{5.2}$$

The constant γ > 0 is a discount factor. The constant λ > 0 penalizes large values for the control u(x). The

goal is to reach the target v̄(x) while expending the minimum amount of energy. The optimal control u(x)


satisfies an infinite-dimensional HJB equation. We refer the reader to Theorems 5.3 and 5.4 of [10] as well

as [14] and [36] for an analysis of infinite-dimensional HJB equations for the stochastic heat equation.

An example of a problem represented by the SPDE (5.1) is the heating of a rod to a target temperature

profile. One can control the heat applied to each portion of the rod along its length. There are also random

fluctuations in the temperature of the rod due to other environmental factors, which are represented by the

Brownian sheet W (t, x). The goal is to guide the temperature profile of the rod to the target profile while

expending the least amount of energy; see the objective function (5.2).

(5.1) can be discretized in space, which yields a system of stochastic differential equations (SDEs). (For

example, see Section 3.2 of [19].) This system of SDEs can be used to derive a finite, high-dimensional PDE

for the value function and optimal control. That is, we first approximate the SPDE with a finite-dimensional

system of SDEs, and then we solve the high-dimensional PDE corresponding to the finite-dimensional system

of SDEs.

$$dX_t^j = \frac{\alpha}{\Delta^2}\big(X_t^{j+1} - 2X_t^j + X_t^{j-1}\big)\,dt + U_t^j\,dt + \frac{\sigma}{\sqrt{\Delta}}\,dW_t^j, \qquad X_0^j = v_0(j\Delta), \tag{5.3}$$

where $\Delta$ is the mesh size, $v(t, j\Delta) = X_t^j$, $u(j\Delta) = U_t^j$, and $W_t^j$ are independent standard Brownian motions (see [12], [21], and [19] regarding numerical schemes for stochastic parabolic PDEs of the form considered in this section). The dimension of the SDE system (5.3) is $d = \frac{L}{\Delta} - 1$. Note that (5.3) uses a central difference scheme for the diffusion term in (5.1).
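One way to simulate the finite-dimensional system (5.3), for example to generate the sample points used later, is an explicit Euler-Maruyama scheme; that choice, the function name `simulate_spde_sdes`, and the feedback-control callable `control` are our assumptions. The mesh size follows from $d = L/\Delta - 1$, and the fixed boundary values $\bar v(0)$ and $\bar v(L)$ pad the state on both ends. As with any explicit scheme for the heat equation, the time step should satisfy roughly $dt \le \Delta^2/(2\alpha)$, which is about 0.1 for the parameters used below.

```python
import math
import random
from typing import Callable, List

def simulate_spde_sdes(
    v0: List[float],        # initial values X_0^j = v_0(j * Delta), j = 1..d
    vbar0: float,           # boundary value v-bar(0), held fixed as X^0
    vbarL: float,           # boundary value v-bar(L), held fixed as X^{d+1}
    L: float, alpha: float, sigma: float,
    control: Callable[[List[float]], List[float]],  # hypothetical feedback: U_t = control(X_t)
    T: float, dt: float, seed: int = 0,
) -> List[float]:
    d = len(v0)
    delta = L / (d + 1)                 # mesh size, so that d = L / Delta - 1
    rng = random.Random(seed)
    X = list(v0)
    for _ in range(int(round(T / dt))):
        U = control(X)
        Xp = [vbar0] + X + [vbarL]      # pad with the boundary values
        X = [X[j]
             + (alpha / delta**2 * (Xp[j + 2] - 2.0 * Xp[j + 1] + Xp[j]) + U[j]) * dt
             + sigma / math.sqrt(delta) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for j in range(d)]
    return X
```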

The objective function (5.2) becomes:

$$V(x) = \inf_{U_t\in\mathcal{U}}\mathbb{E}\Big[\int_0^\infty e^{-\gamma s}\sum_{j=1}^d\Big((X_s^j - \bar v(j\Delta))^2 + \lambda(U_s^j)^2\Big)\Delta\,ds\;\Big|\;X_0 = x\Big]. \tag{5.4}$$

The value function V (x) satisfies a nonlinear PDE with d spatial dimensions x1 , x2 , . . . , xd .

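$$
\begin{aligned}
0 ={}& \Delta(x-\bar v)^\top(x-\bar v) - \frac{1}{4\lambda\Delta}\sum_{j=1}^d\Big(\frac{\partial V}{\partial x_j}(x)\Big)^2\\
&+ \frac{\sigma^2}{2\Delta}\sum_{j=1}^d\frac{\partial^2 V}{\partial x_j^2}(x) + \frac{\alpha}{\Delta^2}\sum_{j=1}^d(x_{j+1} - 2x_j + x_{j-1})\frac{\partial V}{\partial x_j}(x) - \gamma V(x).
\end{aligned}
\tag{5.5}
$$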

The vector $\bar v = (\bar v(\Delta), \bar v(2\Delta), \dots, \bar v(d\Delta))$. Note that the values $x_{d+1} = \bar v(L)$ and $x_0 = \bar v(0)$ are constants which correspond to the boundary conditions in (5.1). The PDE (5.5) is high dimensional since the number of dimensions is $d = \frac{L}{\Delta} - 1$. The optimal control is

$$U_t^j = -\frac{1}{2\lambda\Delta}\frac{\partial V}{\partial x_j}(X_t). \tag{5.6}$$

We solve the PDE (5.5) using the deep learning algorithm for $d = 21$ dimensions. The size of the domain is $L = 10^{-1}$. The coefficients are $\alpha = 10^{-4}$, $\sigma = 10^{-1/2}$, $\lambda = 1$, and $\gamma = 1$. The target profile is $\bar v(x) = 0$.
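The sketch below shows how (5.5) and (5.6) fit together: `hjb_residual` evaluates the right-hand side of (5.5) for a candidate value function supplied through its value, gradient, and second-derivative diagonal, and `optimal_control` applies (5.6). Both names are hypothetical. As a worked check we use a hypothetical one-dimensional case ($d = 1$, so $\Delta = L/2$) with $\bar v = 0$, in which substituting $V(x) = px^2 + c$ into (5.5) gives $p^2/(\lambda\Delta) + (4\alpha/\Delta^2 + \gamma)p - \Delta = 0$ and $c = \sigma^2 p/(\gamma\Delta)$. This scalar reduction is our own illustration, not the iterative Riccati method mentioned in footnote 3.

```python
import math
from typing import List

def hjb_residual(x: List[float], V: float, grad: List[float], hess_diag: List[float],
                 vbar: List[float], vbar0: float, vbarL: float,
                 delta: float, alpha: float, sigma: float, lam: float, gamma: float) -> float:
    """Right-hand side of (5.5) at x; it vanishes when V solves the HJB equation."""
    d = len(x)
    xp = [vbar0] + list(x) + [vbarL]   # x_0 = vbar(0), x_{d+1} = vbar(L)
    running = delta * sum((xj - vj) ** 2 for xj, vj in zip(x, vbar))
    control_term = -sum(gj * gj for gj in grad) / (4.0 * lam * delta)
    diffusion = sigma**2 / (2.0 * delta) * sum(hess_diag)
    drift = alpha / delta**2 * sum((xp[j + 2] - 2.0 * xp[j + 1] + xp[j]) * grad[j] for j in range(d))
    return running + control_term + diffusion + drift - gamma * V

def optimal_control(grad: List[float], lam: float, delta: float) -> List[float]:
    """Feedback control (5.6): U^j = -(1 / (2 lam Delta)) dV/dx_j."""
    return [-gj / (2.0 * lam * delta) for gj in grad]

# Hypothetical 1-D check using the coefficients quoted above
L, alpha, sigma, lam, gamma = 1e-1, 1e-4, 10 ** -0.5, 1.0, 1.0
delta = L / 2                          # d = 1, so Delta = L / (d + 1)
a, b = 1.0 / (lam * delta), 4.0 * alpha / delta**2 + gamma
p = (-b + math.sqrt(b * b + 4.0 * a * delta)) / (2.0 * a)   # positive root
c = sigma**2 * p / (gamma * delta)
```

In the 21-dimensional problem, the same residual function would be the quantity that the DGM objective drives towards zero at sampled points.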

The deep learning algorithm’s accuracy can be evaluated since a semi-analytic solution is available for (5.5).³ Figure 4 shows a contour plot of the percent error over space. The contour plot is produced in the following way:

1. Sample spatial points $x^\ell = (x_1^\ell,\dots,x_{21}^\ell)$ from the distribution of (5.3) for $\ell = 1,\dots,L$.

2. Calculate the percent error at each sampled point. The percent error is $A^\ell = \frac{|f(x^\ell;\theta) - V(x^\ell)|}{|V(x^\ell)|}\times 100\%$.

3. Aggregate the accuracy over a two-dimensional subspace $\big(x_{11}^\ell, \frac{1}{21}\sum_{i=1}^{21} x_i^\ell, A^\ell\big)$ for $\ell = 1,\dots,L$.

4. Produce a contour plot from the data $\big\{\big(x_{11}^\ell, \frac{1}{21}\sum_{i=1}^{21} x_i^\ell, A^\ell\big)\big\}_{\ell=1}^L$. The x-axis is $x_{11}$ and the y-axis is the average $\frac{1}{21}\sum_{i=1}^{21} x_i$. These correspond to $v(t,x)$ at the midpoint $x = \frac{L}{2}$ and the average $\frac{1}{L}\int_0^L v(t,x)\,dx$, respectively.

³ The PDE (5.5) has a semi-analytic solution which satisfies a Riccati equation. The Riccati equation can be solved using an iterative method.

The average percent error over the entire space is 0.1%.

Figure 4: Contour plot of the percent error for the deep learning algorithm for a 21-dimensional Hamilton-Jacobi-Bellman PDE. The horizontal axis is the 11th dimension. The vertical axis is the average of all dimensions.

Lastly, we mention that in the recent paper [15] (see also [2]) the authors develop a machine learning algorithm that provides the value, at a single point in time and space, of the solution to a class of HJB equations which admit explicit solutions that can be obtained through the Cole-Hopf transformation. Their method relies on characterizing the solution via backward stochastic differential equations (BSDEs). In contrast, the current work (a) does not rely on BSDE-type representations through nonlinear Feynman-Kac formulas, and (b) allows one to recover the whole object (i.e., the solution across all points in time and space).

6 Burgers’ equation

It is often of interest to find the solution of a PDE over a range of problem setups (e.g., different physical

conditions and boundary conditions). For example, this may be useful for the design of engineering systems

or uncertainty quantification. The problem setup space may be high-dimensional and therefore may require

solving many PDEs for many different problem setups, which can be computationally expensive.

Let the variable p represent the problem setup (i.e., physical conditions, boundary conditions, and initial

conditions). The variable p takes values in the space P, and we are interested in the solution of the PDE

u(t, x; p). (This is sometimes called a “parameterized class of PDEs”.) In particular, suppose u(t, x; p)


satisfies the PDE

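$$
\begin{aligned}
\frac{\partial u}{\partial t}(t,x;p) &= \mathcal{L}_p u(t,x;p), && (t,x)\in[0,T]\times\Omega,\\
u(t,x;p) &= g_p(x), && (t,x)\in[0,T]\times\partial\Omega,\\
u(t=0,x;p) &= h_p(x), && x\in\Omega.
\end{aligned}
\tag{6.1}
$$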

A traditional approach would be to discretize the P-space and re-solve the PDE many times for many

different points p. However, the total number of grid points (and therefore the number of PDEs that must

be solved) grows exponentially with the number of dimensions, and P is typically high-dimensional.

We propose to use the DGM algorithm to approximate the general solution to the PDE (6.1) for different

boundary conditions, initial conditions, and physical conditions. The deep neural network is trained using

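stochastic gradient descent on a sequence of random time, space, and problem setup points $(t, x, p)$. As before:

• Initialize θ.

• Repeat until convergence:

– Generate random samples $(t, x, p)$ from $[0,T]\times\Omega\times\mathcal{P}$, $(\tilde t, \tilde x)$ from $[0,T]\times\partial\Omega$, and $\hat x$ from $\Omega$.

– Construct the objective function

$$J(\theta) = \Big(\frac{\partial f}{\partial t}(t,x,p;\theta) - \mathcal{L}_p f(t,x,p;\theta)\Big)^2 + \big(g_p(\tilde x) - f(\tilde t,\tilde x,p;\theta)\big)^2 + \big(h_p(\hat x) - f(0,\hat x,p;\theta)\big)^2. \tag{6.2}$$

– Take a descent step

$$\theta \longrightarrow \theta - \alpha\nabla_\theta J(\theta), \tag{6.3}$$

where α is the learning rate.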

If x is low-dimensional (d ≤ 3), which is common in many physical PDEs, the first and second partial

derivatives of f can be calculated via chain rule or approximated by finite difference. We implement our

algorithm for Burgers’ equation on a finite domain.

$$
\begin{aligned}
\frac{\partial u}{\partial t} &= \nu\frac{\partial^2 u}{\partial x^2} - \alpha u\frac{\partial u}{\partial x}, && (t,x)\in[0,1]\times[0,1],\\
u(t, x=0) &= a,\\
u(t, x=1) &= b,\\
u(t=0, x) &= g(x), && x\in[0,1].
\end{aligned}
$$

The problem setup space is $\mathcal{P} = (\nu,\alpha,a,b)\in\mathbb{R}^4$. The initial condition $g(x)$ is chosen to be a linear function which matches the boundary conditions $u(t, x=0) = a$ and $u(t, x=1) = b$. We train a single neural network to approximate the solution of $u(t,x;p)$ over the entire space $(t,x,\nu,\alpha,a,b)\in[0,1]\times[0,1]\times[10^{-2},10^{-1}]\times[10^{-2},1]\times[-1,1]\times[-1,1]$. We use a larger network (6 layers, 200 units per layer) than in

the previous numerical examples. Figure 5 compares the deep learning solution with the exact solution for

several different problem setups p. The solutions are very close; in several cases, the two solutions are visibly

indistinguishable. The deep learning algorithm is able to accurately capture the shock layers and boundary

layers.
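For the Burgers' setup, the objective (6.2) can be estimated by averaging over batches of sampled points. In this hedged sketch, `burgers_objective` and `sample_setup` are hypothetical names, the candidate solution is any callable `f(t, x, p)` with $p = (\nu, \alpha, a, b)$, the derivatives are central finite differences (one of the options mentioned above for low-dimensional $x$), and sampling the problem setup uniformly over the stated box is our assumption.

```python
import random
from typing import Callable, List, Tuple

Setup = Tuple[float, float, float, float]  # p = (nu, alpha, a, b)

def sample_setup(rng: random.Random) -> Setup:
    """Uniform draw from [1e-2, 1e-1] x [1e-2, 1] x [-1, 1] x [-1, 1] (uniformity is assumed)."""
    return (rng.uniform(1e-2, 1e-1), rng.uniform(1e-2, 1.0),
            rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))

def burgers_objective(f: Callable[[float, float, Setup], float],
                      interior: List[Tuple[float, float, Setup]],
                      boundary: List[Tuple[float, float, Setup]],  # x must be 0.0 or 1.0
                      initial: List[Tuple[float, Setup]],
                      h: float = 1e-4) -> float:
    def residual(t: float, x: float, p: Setup) -> float:
        nu, alpha, _, _ = p
        u = f(t, x, p)
        u_t = (f(t + h, x, p) - f(t - h, x, p)) / (2 * h)
        u_x = (f(t, x + h, p) - f(t, x - h, p)) / (2 * h)
        u_xx = (f(t, x + h, p) - 2 * u + f(t, x - h, p)) / h**2
        return u_t - nu * u_xx + alpha * u * u_x   # zero for an exact solution

    def bc(x: float, p: Setup) -> float:
        return p[2] if x == 0.0 else p[3]          # u(t, 0) = a, u(t, 1) = b

    def ic(x: float, p: Setup) -> float:
        return p[2] + (p[3] - p[2]) * x            # linear g(x) matching a and b

    def mean(v: List[float]) -> float:
        return sum(v) / len(v)

    return (mean([residual(t, x, p) ** 2 for t, x, p in interior])
            + mean([(bc(x, p) - f(t, x, p)) ** 2 for t, x, p in boundary])
            + mean([(ic(x, p) - f(0.0, x, p)) ** 2 for x, p in initial]))
```

A linear profile satisfies the boundary and initial conditions exactly, so for it only the nonlinear advection term contributes to the objective, which makes it a convenient sanity check.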

Figure 6 presents the accuracy of the deep learning algorithm for different times t and different choices

of ν. As ν becomes smaller, the solution becomes steeper. It also shows the shock layer forming over time.

The contour plot in Figure 7 reports the absolute error of the deep learning solution for different choices of b and ν.

Figure 5: The deep learning solution is in red. The “exact solution”, found via finite difference, is in blue. Solutions are reported at time t = 1. The solutions are very close; in several cases, the two solutions are visibly indistinguishable. The problem setups, in counter-clockwise order, are (ν, α, a, b) = (0.01, 0.95, 0.9, −0.9), (0.02, 0.95, 0.9, −0.9), (0.01, 0.95, −0.95, 0.95), (0.02, 0.9, 0.9, 0.8), (0.01, 0.75, 0.9, 0.1), and (0.09, 0.95, 0.5, −0.5).

Let the $L^2$ error $J(f)$ measure how well the neural network $f$ satisfies the differential operator, boundary condition, and initial condition. Define $\mathcal{C}^n$ as the class of neural networks with $n$ hidden units and let $f^n$ be a neural network with $n$ hidden units which minimizes $J(f)$. We prove that

$$f^n \to u \text{ as } n\to\infty,$$

in the appropriate sense, for a class of quasilinear parabolic PDEs with the principal term in divergence form under certain growth and smoothness assumptions on the nonlinear terms. Our theoretical result only covers a class of quasilinear parabolic PDEs as described in this section. However, the numerical results of this paper indicate that the results are more broadly applicable.

Figure 6: The deep learning solution is in red. The “exact solution”, found via finite difference, is in blue. Left plot: Comparison of solutions at times t = 0.1, 0.25, 0.5, 1 for (ν, α, a, b) = (0.03, 0.9, 0.95, −0.95). Right plot: Comparison of solutions for ν = 0.01, 0.02, 0.05, 0.09 at time t = 1 and with (α, a, b) = (0.8, 0.75, −0.75).

Figure 7: Contour plot of the average absolute error of the deep learning solution for different b and ν (the viscosity). The absolute error is averaged across x ∈ [0, 1] for time t = 1.

The proof requires the joint analysis of the approximation power of neural networks as well as the

continuity properties of partial differential equations. First, we show that the neural network can satisfy the

differential operator, boundary condition, and initial condition arbitrarily well for sufficiently large n.

$$J(f^n) \to 0 \text{ as } n\to\infty. \tag{7.1}$$

Let $u$ be the solution to the PDE. The statement (7.1) does not necessarily imply that $f^n \to u$. One challenge to proving convergence is that we only have $L^2$ control of the error. We prove convergence for the case of homogeneous boundary data, i.e., $g(t,x) = 0$, by first establishing that each neural network in the sequence $\{f^n\}_{n=1}^\infty$ satisfies a PDE with a source term $h^n(t,x)$. Importantly, the source terms $h^n(t,x)$ are only known to be vanishing in $L^2$. We then use compactness arguments to prove that $f^n \to u$ as $n\to\infty$ in the appropriate space.

The precise statement of the theorem and the presentation of the proof are in the next two sections. Section 7.1 proves that $J(f^n) \to 0$ as $n\to\infty$. Section 7.2 contains convergence results of $f^n$ to the solution $u$ of the PDE as $n\to\infty$. The main result is Theorem 7.3. For readability purposes the corresponding proofs are in Appendix A.

In this section, we present a theorem guaranteeing the existence of multilayer feedforward networks f able

to universally approximate solutions of quasilinear parabolic PDEs in the sense that there is f that makes

the objective function J(f ) arbitrarily small. To do so, we use the results of [26] on universal approximation

of functions and their derivatives and make appropriate assumptions on the coefficients of the PDEs to

guarantee that a classical solution exists (since then the results of [26] apply).

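Consider a bounded set $\Omega\subset\mathbb{R}^d$ with a smooth boundary $\partial\Omega$ and denote $\Omega_T = (0,T]\times\Omega$ and $\partial\Omega_T = (0,T]\times\partial\Omega$. In this subsection we consider the class of quasilinear parabolic PDEs of the form

$$
\begin{aligned}
\partial_t u(t,x) - \operatorname{div}\big(\alpha(t,x,u(t,x),\nabla u(t,x))\big) + \gamma(t,x,u(t,x),\nabla u(t,x)) &= 0, && (t,x)\in\Omega_T,\\
u(0,x) &= u_0(x), && x\in\Omega,\\
u(t,x) &= g(t,x), && (t,x)\in\partial\Omega_T.
\end{aligned}
\tag{7.2}
$$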

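For notational convenience, let us write the operator of (7.2) as $G$. Namely, let us denote

$$G[u](t,x) = \partial_t u(t,x) - \operatorname{div}\big(\alpha(t,x,u(t,x),\nabla u(t,x))\big) + \gamma(t,x,u(t,x),\nabla u(t,x)).$$

Expanding the divergence, the operator can equivalently be written as

$$G[u](t,x) = \partial_t u(t,x) - \sum_{i,j=1}^d \frac{\partial\alpha_i(t,x,u(t,x),\nabla u(t,x))}{\partial u_{x_j}}\,\partial_{x_i,x_j}u(t,x) + \hat\gamma(t,x,u(t,x),\nabla u(t,x)),$$

where

$$\hat\gamma(t,x,u,p) = \gamma(t,x,u,p) - \sum_{i=1}^d\frac{\partial\alpha_i(t,x,u,p)}{\partial u}\,\partial_{x_i}u - \sum_{i=1}^d\frac{\partial\alpha_i(t,x,u,p)}{\partial x_i}.$$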

For the purposes of this section, we consider equations of the type (7.2) that have classical solutions.

In particular we assume that there is a unique u(t, x) solving (7.2) such that

$$u(t,x)\in C(\bar\Omega_T)\cap C^{1+\eta/2,\,2+\eta}(\Omega_T) \text{ with } \eta\in(0,1) \quad\text{and that}\quad \sup_{(t,x)\in\Omega_T}\sum_{k=1}^2\big|\nabla_x^{(k)}u(t,x)\big| < \infty. \tag{7.3}$$

We refer the interested reader to Theorems 5.4, 6.1 and 6.2 of Chapter V in [28] for specific general

conditions on α, γ guaranteeing the validity of the aforementioned statement.

Universal approximation results for single functions and their derivatives have been obtained under

various assumptions in [11, 25, 26]. In this paper, we use Theorem 3 of [26]. Let us recall the setup

appropriately modified for our case of interest. Let ψ be an activation function, e.g., of sigmoid type, of the

hidden units and define the set

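$$\mathcal{C}^n(\psi) = \Big\{\zeta(t,x):\mathbb{R}^{1+d}\mapsto\mathbb{R} \,:\, \zeta(t,x) = \sum_{i=1}^n\beta_i\,\psi\Big(\alpha_{1,i}t + \sum_{j=1}^d\alpha_{j,i}x_j + c_i\Big)\Big\}, \tag{7.4}$$

where $\theta = (\beta_1,\dots,\beta_n,\alpha_{1,1},\dots,\alpha_{d,n},c_1,c_2,\dots,c_n)\in\mathbb{R}^{2n+n(1+d)}$ compose the elements of the parameter space. Then we have the following result.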

Theorem 7.1. Let $\mathcal{C}^n(\psi)$ be given by (7.4), where $\psi$ is assumed to be in $C^2(\mathbb{R}^d)$, bounded and non-constant. Set $\mathcal{C}(\psi) = \bigcup_{n\ge1}\mathcal{C}^n(\psi)$. Assume that $\Omega_T$ is compact and consider the measures $\nu_1, \nu_2, \nu_3$ whose support is contained in $\Omega_T$, $\Omega$ and $\partial\Omega_T$ respectively. In addition, assume that the PDE (7.2) has a unique classical solution such that (7.3) holds. Also, assume that the nonlinear terms $\frac{\partial\alpha_i(t,x,u,p)}{\partial p_j}$ and $\hat\gamma(t,x,u,p)$ are locally Lipschitz in $(u,p)$ with Lipschitz constant that can have at most polynomial growth on $u$ and $p$, uniformly with respect to $t, x$. Then, for every $\varepsilon > 0$, there exists a positive constant $K > 0$ that may depend on $\sup_{\Omega_T}|u|$, $\sup_{\Omega_T}|\nabla_x u|$ and $\sup_{\Omega_T}|\nabla_x^{(2)}u|$ such that there exists a function $f\in\mathcal{C}(\psi)$ that satisfies $J(f)\le K\varepsilon$.
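For readers who prefer code to set notation, the sketch below evaluates a member of the single-hidden-layer class (7.4) and counts its parameters. The names `zeta` and `n_params` are hypothetical, and the sketch uses a separate argument `alpha_t` for the time coefficient (written $\alpha_{1,i}$ in (7.4)) to keep it distinct from the spatial weights. The default $\psi = \tanh$ is smooth, bounded and non-constant, so it fits the activation assumptions of Theorem 7.1.

```python
import math
from typing import Callable, List

def zeta(t: float, x: List[float], beta: List[float], alpha_t: List[float],
         alpha_x: List[List[float]], c: List[float],
         psi: Callable[[float], float] = math.tanh) -> float:
    """sum_i beta_i * psi(alpha_t[i] * t + sum_j alpha_x[i][j] * x_j + c_i), a member of C^n(psi)."""
    n = len(beta)
    return sum(beta[i] * psi(alpha_t[i] * t
                             + sum(alpha_x[i][j] * x[j] for j in range(len(x)))
                             + c[i])
               for i in range(n))

def n_params(n: int, d: int) -> int:
    """Dimension of theta: n output weights, n(1 + d) input weights, n biases."""
    return 2 * n + n * (1 + d)
```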

We now prove, under stronger conditions, the convergence of the neural networks $f^n$ to the solution $u$ of the PDE

$$
\begin{aligned}
\partial_t u(t,x) - \operatorname{div}\big(\alpha(t,x,u(t,x),\nabla u(t,x))\big) + \gamma(t,x,u(t,x),\nabla u(t,x)) &= 0, && (t,x)\in\Omega_T,\\
u(0,x) &= u_0(x), && x\in\Omega,\\
u(t,x) &= 0, && (t,x)\in\partial\Omega_T,
\end{aligned}
\tag{7.5}
$$

as $n\to\infty$. Notice that we have restricted the discussion to homogeneous boundary data. We do this for both presentation and mathematical reasons.⁴

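The objective function is

$$J(f) = \|G[f]\|_{2,\Omega_T}^2 + \|f\|_{2,\partial\Omega_T}^2 + \|f(0,\cdot) - u_0\|_{2,\Omega}^2.$$

Recall that the norms above are $L^2(X)$ norms in the respective spaces $X = \Omega_T$, $\partial\Omega_T$ and $\Omega$. From Theorem 7.1, we have that

$$J(f^n) \to 0 \text{ as } n\to\infty.$$

Setting $h^n = G[f^n]$, $u_0^n = f^n(0,\cdot)$ and $g^n = f^n|_{\partial\Omega_T}$, each neural network $f^n$ satisfies

$$
\begin{aligned}
G[f^n](t,x) &= h^n(t,x), && (t,x)\in\Omega_T,\\
f^n(0,x) &= u_0^n(x), && x\in\Omega,\\
f^n(t,x) &= g^n(t,x), && (t,x)\in\partial\Omega_T,
\end{aligned}
\tag{7.6}
$$

with

$$\|h^n\|_{2,\Omega_T}^2 + \|g^n\|_{2,\partial\Omega_T}^2 + \|u_0^n - u_0\|_{2,\Omega}^2 \to 0 \text{ as } n\to\infty. \tag{7.7}$$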

For the purposes of this section, we make the following set of assumptions.

Condition 7.2.

• There is a constant $\mu > 0$ and positive functions $\kappa(t,x)$, $\lambda(t,x)$ such that for all $(t,x)\in\Omega_T$ we have

⁴ We set $u(t,x) = 0$ for $(t,x) \in \partial\Omega_T$, i.e., $g = 0$, to circumvent certain technical difficulties arising from inhomogeneous boundary conditions. If $g \neq 0$ is the trace of some appropriately smooth function, say $\phi$, then one can reduce the inhomogeneous boundary conditions on $\partial\Omega_T$ to homogeneous ones by introducing the new function $u - \phi$ in place of $u$; see Section 4 of Chapter V in [28] or Chapter 8 of [20] for details on such considerations. We do not explore this here, because our goal is not to prove the most general result possible, but to provide a concrete setup in which we can prove the validity of the approximation results of interest.


• α(t, x, u, p) and γ(t, x, u, p) are Lipschitz continuous in (t, x, u, p) ∈ ΩT × R × Rd uniformly on compacts

of the form {(t, x) ∈ Ω̄T , |u| ≤ C, |p| ≤ C}.

• α(t, x, u, p) is differentiable with respect to (x, u, p) with continuous derivatives.

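• There is a positive constant $\nu > 0$ such that

$$\alpha(t,x,u,p) \cdot p \ge \nu |p|^2$$

and

$$\big\langle \alpha(t,x,u,p_1) - \alpha(t,x,u,p_2),\, p_1 - p_2 \big\rangle > 0 \quad \text{for every } p_1, p_2 \in \mathbb{R}^d,\ p_1 \neq p_2.$$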

• $u_0(x) \in C^{0,2+\xi}(\bar\Omega)$ for some $\xi > 0$⁵, with $u_0$ and its first derivative bounded in $\bar\Omega$.

• For every $n \in \mathbb{N}$, $f^n \in C^{1,2}(\bar\Omega_T)$. In addition, $(f^n)_{n\in\mathbb{N}}$ is uniformly bounded in $L^2(\Omega_T)$.

Theorem 7.3. Assume that Condition 7.2 and (7.7) hold. Then problem (7.5) has a unique bounded solution in

$$C^{0,\delta,\delta/2}(\bar\Omega_T) \cap L^2\big(0,T; W_0^{1,2}(\Omega)\big) \cap W_0^{(1,2),2}(\Omega'_T)$$

for some $\delta > 0$ and any interior subdomain $\Omega'_T$ of $\Omega_T$.⁶ In addition, $f^n$ converges to $u$, the unique solution to (7.5), strongly in $L^\rho(\Omega_T)$ for every $\rho < 2$. If, in addition, the sequence $\{f^n(t,x)\}_{n\in\mathbb{N}}$ is uniformly bounded in $n$ and equicontinuous, then the convergence to $u$ is uniform in $\Omega_T$.

The proof of this theorem is in the Appendix. We conclude this section with some remarks and an

example.

Remark 7.4. Despite the restriction to zero boundary data, we expect that our results are also valid for reasonably smooth inhomogeneous boundary data. In addition, if we make further assumptions on the nonlinearities $\alpha(t,x,u,p)$ and $\gamma(t,x,u,p)$ and on the initial data $u_0(x)$, then one can establish existence and uniqueness of classical solutions; see, for example, Section 6 of Chapter V in [28] for details. In fact, the results of Chapter V.6 in [28] show that assuming slightly more on the growth of the derivatives of the nonlinear functions $\alpha(t,x,u,p)$ and $\gamma(t,x,u,p)$ leads to $\nabla_x u \in C^{0,\delta',\delta'/2}(\Omega_T)$ for some $\delta' > 0$. Furthermore, stronger claims can be made if more properties of the approximating family $\{f^n\}$ are known, such as a priori bounds on appropriate Sobolev norms, but we do not explore this further here.

Remark 7.5. The uniform-in-$n$ $L^2$ bound for the sequence $\{f^n\}_{n\in\mathbb{N}}$ is easily satisfied by a bounded neural network approximation sequence $f^n(t,x)$. However, we believe that it holds for a wider class of models; after all, one expects it to hold if $f^n$ indeed converges in $L^\rho$ for $\rho < 2$. The equicontinuity condition on $\{f^n(t,x)\}$ allows us both to simplify the proof and to make a stronger claim. However, it is only a sufficient condition, not a necessary one. The paper [8] (see Theorems 19 and 20 therein) discusses structural restrictions (a priori boundedness and summability) that can be imposed on the unknown weights of feedforward neural networks belonging to the class $C(\psi) = \bigcup_{n \ge 1} C^n(\psi)$ defined in (7.4), which then guarantee both equicontinuity and universal approximation properties of the neural network for continuous and bounded functions. As also discussed in [8], equicontinuity is related to fault-tolerance properties of neural networks, a subject worthy of further study in the context of PDEs. However, we do not discuss this further here, as it would be a topic for a different paper.

Let us present the case of linear parabolic PDEs in Example 7.6 below.

⁵ In general, the Hölder space $C^{0,\xi}(\bar\Omega)$ is the Banach space of continuous functions in $\bar\Omega$ having continuous derivatives up to order $[\xi]$ in $\bar\Omega$, with finite corresponding uniform norms and a finite uniform $(\xi - [\xi])$-Hölder norm. Analogously, we define the Hölder space $C^{0,\xi,\xi/2}(\bar\Omega_T)$, which in addition has finite regular-derivative norms up to order $[\xi]/2$ and a finite $(\xi - [\xi])/2$-Hölder norm in time. These spaces are denoted by $H^\xi(\bar\Omega)$ and $H^{\xi,\xi/2}(\bar\Omega_T)$, respectively, in [28].

⁶ Here $W_0^{(1,2),2}(\Omega'_T)$ denotes the Banach space that is the closure of $C_0^\infty(\Omega'_T)$, with elements from $L^2(\Omega'_T)$ having generalized derivatives of the form $D_t^r D_x^s$ with $r, s$ such that $2r + s \le 2$, equipped with the usual Sobolev norm.


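Example 7.6 (Linear case). Let us assume that the operator $G$ is linear in $u$ and $\nabla u$. In particular, let us set

$$\alpha_i(t,x,u,p) = \sum_{j=1}^{d} \big(\sigma\sigma^T\big)_{i,j}(t,x)\, p_j, \qquad i = 1, \dots, d,$$

and

$$\gamma(t,x,u,p) = -\langle b(t,x), p \rangle + \sum_{i,j=1}^{d} \frac{\partial}{\partial x_i}\big(\sigma\sigma^T\big)_{i,j}(t,x)\, p_j - c(t,x)\, u.$$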
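The extra term $\sum_{i,j} \partial_{x_i}(\sigma\sigma^T)_{i,j}\, p_j$ in $\gamma$ is there so that the first-order terms produced by the divergence cancel: with this choice, $-\operatorname{div}\alpha + \gamma = -\sum_{i,j}(\sigma\sigma^T)_{i,j}\,\partial_{x_i x_j}u - \langle b, \nabla u\rangle - c\,u$. The sketch below checks this identity numerically in $d = 1$ using central finite differences. The coefficients $a = \sigma\sigma^T$, $b$, $c$ and the test function are hypothetical choices made only for illustration, and the time derivative is omitted because it is identical on both sides.

```python
import math

# Hypothetical 1-D data (assumptions for illustration): a = sigma sigma^T > 0,
# drift b and zeroth-order coefficient c.
def a(x):
    return 1.0 + x * x


def b(x):
    return math.sin(x)


def c(x):
    return math.cos(x)


H = 1e-4  # finite-difference step


def d_dx(g, x, h=H):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


def divergence_form(u, x):
    """-(d/dx) alpha + gamma with alpha = a u_x and gamma = -b u_x + a' u_x - c u."""
    def flux(y):
        return a(y) * d_dx(u, y)      # alpha evaluated along u
    u_x = d_dx(u, x)
    gamma = -b(x) * u_x + d_dx(a, x) * u_x - c(x) * u(x)
    return -d_dx(flux, x) + gamma


def non_divergence_form(u, x):
    """-a u_xx - b u_x - c u: the spatial part of the linear operator."""
    u_xx = (u(x + H) - 2.0 * u(x) + u(x - H)) / H**2
    return -a(x) * u_xx - b(x) * d_dx(u, x) - c(x) * u(x)
```

For a smooth test function such as $u(x) = e^x \sin(2x)$, the two forms agree up to finite-difference error at every point, which is the cancellation described above.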

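Assume that there are positive constants $\nu, \mu > 0$ such that, for every $\xi \in \mathbb{R}^d$, the matrix $\big[(\sigma\sigma^T)_{i,j}(t,x)\big]_{i,j=1}^{d}$ satisfies

$$\nu|\xi|^2 \le \sum_{i,j=1}^{d} \big(\sigma\sigma^T\big)_{i,j}(t,x)\, \xi_i \xi_j \le \mu |\xi|^2$$

and

$$\Big\| \sum_{i=1}^{d} b_i^2 \Big\|_{q,r,\Omega_T} + \|c\|_{q,r,\Omega_T} \le \mu,$$

where we recall, for example, that $\|c\|_{q,r,\Omega_T} = \Big( \int_0^T \Big( \int_\Omega |c(t,x)|^q \, dx \Big)^{r/q} dt \Big)^{1/r}$, and $r, q$ satisfy the relations

$$\frac{1}{r} + \frac{d}{2q} = 1, \qquad
\begin{cases}
q \in (d/2, \infty],\ r \in [1, \infty), & d \ge 2,\\
q \in [1, \infty],\ r \in [1, 2], & d = 1.
\end{cases}$$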

In particular, the previous bounds always hold for coefficients $b$ and $c$ that are bounded in $\Omega_T$. Under these conditions, standard results for linear PDEs (see, for instance, Theorem 4.5 of Chapter III of [28] for a related result) show that approximation results analogous to that of Theorem 7.3 hold.
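Both structural conditions of Example 7.6 are easy to check for given data. The following is a minimal pure-Python sketch with hypothetical helper names: `admissible_r` returns the exponent $r$ paired with $q$ through $1/r + d/(2q) = 1$ and rejects $q$ outside the admissible range, and `ellipticity_constants` returns the sharpest $\nu, \mu$ in the ellipticity bound at a fixed $(t,x)$ as the smallest and largest eigenvalues of the symmetric matrix $\sigma\sigma^T$. Restricting the eigenvalue computation to $d = 2$ is an assumption made to keep the example dependency-free.

```python
import math


def admissible_r(d, q):
    """Return r with 1/r + d/(2q) = 1, checking the ranges of Example 7.6.

    q may be math.inf. Raises ValueError if (d, q) is not admissible.
    """
    if d >= 2 and not q > d / 2:
        raise ValueError("for d >= 2 we need q in (d/2, inf]")
    if d == 1 and not q >= 1:
        raise ValueError("for d = 1 we need q in [1, inf]")
    inv_r = 1.0 - d / (2.0 * q)       # d / (2 * inf) evaluates to 0.0
    return 1.0 / inv_r


def ellipticity_constants(sigma):
    """Sharpest (nu, mu) with nu|xi|^2 <= xi^T (sigma sigma^T) xi <= mu|xi|^2,
    for a 2x2 matrix sigma given as [[s11, s12], [s21, s22]]."""
    (s11, s12), (s21, s22) = sigma
    # Entries of the symmetric matrix A = sigma sigma^T.
    a11 = s11 * s11 + s12 * s12
    a12 = s11 * s21 + s12 * s22
    a22 = s21 * s21 + s22 * s22
    # Eigenvalues of a symmetric 2x2 matrix: mean +/- radius.
    mean = (a11 + a22) / 2.0
    radius = math.hypot((a11 - a22) / 2.0, a12)
    return mean - radius, mean + radius
```

For instance, `admissible_r(3, 3)` returns `2.0`, bounded coefficients correspond to $q = \infty$, $r = 1$ when $d \ge 2$, and for $\sigma = \mathrm{diag}(1, 2)$ the constants are $\nu = 1$ and $\mu = 4$.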

8 Conclusion

We believe that deep learning could become a valuable approach for solving high-dimensional PDEs, which

are important in physics, engineering, and finance. The PDE solution can be approximated with a deep

neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions.

We prove that the neural network converges to the solution of the partial differential equation as the number

of hidden units increases.

Our deep learning algorithm for solving PDEs is meshfree, which is key since meshes become infeasible in

higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled

time and space points. The approach is implemented for a class of high-dimensional free boundary PDEs in

up to 200 dimensions with accurate results. We also test it on a high-dimensional Hamilton-Jacobi-Bellman

PDE with accurate results.
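The meshfree training described above can be illustrated with a batch sampler. The sketch below assumes, purely for illustration, that $\Omega = (0,1)^d$ is the unit cube and that points are drawn from uniform measures; it returns interior points $(t,x) \in \Omega_T$, lateral-boundary points on $[0,T] \times \partial\Omega$ (uniform over the $2d$ faces of equal area), and initial points with $t = 0$. A batch holds on the order of (batch size) $\times\, d$ numbers, whereas a spatial tensor grid with only 10 nodes per axis in $d = 200$ would need $10^{200}$ nodes. The function name `sample_batch` and the sampling measures are assumptions, not the paper's exact scheme.

```python
import random


def sample_batch(d, T, batch_size, rng):
    """Draw one training batch for a PDE on Omega_T = (0, T] x (0, 1)^d.

    Returns three lists of (t, x) pairs, with x a list of d floats:
    interior points, lateral-boundary points and initial-time points.
    """
    interior, boundary, initial = [], [], []
    for _ in range(batch_size):
        # Interior point: uniform in time and space.
        interior.append((rng.uniform(0.0, T), [rng.random() for _ in range(d)]))

        # Boundary point: choose one of the 2d faces uniformly, then a
        # uniform point on that face.
        x = [rng.random() for _ in range(d)]
        x[rng.randrange(d)] = rng.choice((0.0, 1.0))
        boundary.append((rng.uniform(0.0, T), x))

        # Initial point: t = 0 and x uniform in the cube.
        initial.append((0.0, [rng.random() for _ in range(d)]))
    return interior, boundary, initial
```

In a training loop of this kind, a fresh batch would be drawn at every optimization step, and the three groups of points would feed the three terms of the objective $J$.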

The DGM algorithm can be easily modified to apply to hyperbolic, elliptic, and partial-integral differential

equations. The algorithm remains essentially the same for these other types of PDEs. However, numerical performance for these other types of PDEs remains to be investigated.

It is also important to put the numerical results in Sections 4, 5 and 6 in a proper context. PDEs with

highly non-monotonic or oscillatory solutions may be more challenging to solve and further developments

in architecture will be necessary. Further numerical development and testing is therefore required to better

judge the usefulness of deep learning for the solution of PDEs in other applications. However, the numerical results of this paper provide sufficient evidence to warrant further exploration of deep neural network approaches for solving PDEs.


In addition, it would be of interest to establish results analogous to Theorem 7.3 for PDEs beyond the

class of quasilinear parabolic PDEs considered in this paper. Stability analysis of deep learning and machine

learning algorithms for solving PDEs is also an important question. It would certainly be interesting to

study machine learning algorithms that use a more direct variational formulation of the involved PDEs. We

leave these questions for future work.

In this section we have gathered the proofs of the theoretical results of Section 7.

Proof of Theorem 7.1. By Theorem 3 of [26], the class $C(\psi)$ is uniformly 2-dense on compacts of $C^2(\mathbb{R}^{1+d})$. This means that for $u \in C^{1,2}([0,T] \times \mathbb{R}^d)$ and $\epsilon > 0$, there is $f \in C(\psi)$ such that

$$\sup_{(t,x)\in\Omega_T} |\partial_t u(t,x) - \partial_t f(t,x;\theta)| + \max_{|a| \le 2} \sup_{(t,x)\in\bar\Omega_T} |\partial_x^{(a)} u(t,x) - \partial_x^{(a)} f(t,x;\theta)| < \epsilon. \tag{A.1}$$

We have assumed that $(u,p) \mapsto \hat\gamma(t,x,u,p)$ is locally Lipschitz continuous in $(u,p)$, with a Lipschitz constant that can have at most polynomial growth in $u$ and $p$, uniformly with respect to $t, x$. This means that

$$|\hat\gamma(t,x,u,p) - \hat\gamma(t,x,v,s)| \le \big(|u|^{q_1/2} + |p|^{q_2/2} + |v|^{q_3/2} + |s|^{q_4/2}\big)\big(|u - v| + |p - s|\big)$$

for some constants $0 \le q_1, q_2, q_3, q_4 < \infty$. Therefore, using Hölder's inequality with conjugate exponents $r_1, r_2$, and writing $f = f(t,x;\theta)$ and $u = u(t,x)$ inside the integrals, we obtain

$$
\begin{aligned}
&\int_{\Omega_T} |\hat\gamma(t,x,f,\nabla_x f) - \hat\gamma(t,x,u,\nabla_x u)|^2 \, d\nu_1(t,x)\\
&\quad\le K \int_{\Omega_T} \big( |f|^{q_1} + |\nabla_x f|^{q_2} + |u|^{q_3} + |\nabla_x u|^{q_4} \big) \big( |f-u|^2 + |\nabla_x f - \nabla_x u|^2 \big) \, d\nu_1(t,x)\\
&\quad\le K \Big( \int_{\Omega_T} \big( |f|^{q_1} + |\nabla_x f|^{q_2} + |u|^{q_3} + |\nabla_x u|^{q_4} \big)^{r_1} d\nu_1 \Big)^{1/r_1} \Big( \int_{\Omega_T} \big( |f-u|^2 + |\nabla_x f - \nabla_x u|^2 \big)^{r_2} d\nu_1 \Big)^{1/r_2}\\
&\quad\le K \Big( \int_{\Omega_T} \big( |f-u|^{q_1} + |\nabla_x f - \nabla_x u|^{q_2} + |u|^{q_1 \vee q_3} + |\nabla_x u|^{q_2 \vee q_4} \big)^{r_1} d\nu_1 \Big)^{1/r_1} \Big( \int_{\Omega_T} \big( |f-u|^2 + |\nabla_x f - \nabla_x u|^2 \big)^{r_2} d\nu_1 \Big)^{1/r_2}\\
&\quad\le K \Big( \epsilon^{q_1} + \epsilon^{q_2} + \sup_{\Omega_T} |u|^{q_1 \vee q_3} + \sup_{\Omega_T} |\nabla_x u|^{q_2 \vee q_4} \Big) \epsilon^2,
\end{aligned}
\tag{A.2}
$$

where the unimportant constant $K < \infty$ may change from line to line and, for two numbers, $q_1 \vee q_3 = \max\{q_1, q_3\}$. In the last step we used (A.1).

In addition, we have also assumed that for every $i, j \in \{1, \dots, d\}$, the mapping $(u,p) \mapsto \partial\alpha_i(t,x,u,p)/\partial p_j$ is locally Lipschitz in $(u,p)$, with a Lipschitz constant that can have at most polynomial growth in $u$ and $p$, uniformly with respect to $t, x$. This means that

$$\left| \frac{\partial\alpha_i(t,x,u,p)}{\partial p_j} - \frac{\partial\alpha_i(t,x,v,s)}{\partial s_j} \right| \le \big(|u|^{q_1/2} + |p|^{q_2/2} + |v|^{q_3/2} + |s|^{q_4/2}\big)\big(|u - v| + |p - s|\big).$$

Define

$$\xi(t,x,u,\nabla u,\nabla^2 u) = \sum_{i,j=1}^{d} \frac{\partial\alpha_i(t,x,u(t,x),\nabla u(t,x))}{\partial u_{x_j}}\, \partial_{x_i,x_j} u(t,x).$$


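Then, similarly to (A.2), after an application of Hölder's inequality with conjugate exponents $p, q > 1$ (so that $1/p + 1/q = 1$), we have, for some constant $K < \infty$ that may change from line to line, and again writing $f = f(t,x;\theta)$ and $u = u(t,x)$ inside the integrals,

$$
\begin{aligned}
&\int_{\Omega_T} \big|\xi(t,x,f,\nabla_x f,\nabla_x^2 f) - \xi(t,x,u,\nabla_x u,\nabla_x^2 u)\big|^2 \, d\nu_1(t,x)\\
&\quad\le K \int_{\Omega_T} \Big| \sum_{i,j=1}^{d} \Big( \frac{\partial \alpha_i(t,x,f,\nabla f)}{\partial f_{x_j}} - \frac{\partial \alpha_i(t,x,u,\nabla u)}{\partial u_{x_j}} \Big) \partial_{x_i,x_j} u \Big|^2 d\nu_1(t,x)\\
&\qquad + K \int_{\Omega_T} \Big| \sum_{i,j=1}^{d} \frac{\partial \alpha_i(t,x,f,\nabla f)}{\partial f_{x_j}} \big( \partial_{x_i,x_j} f - \partial_{x_i,x_j} u \big) \Big|^2 d\nu_1(t,x)\\
&\quad\le K \sum_{i,j=1}^{d} \Big( \int_{\Omega_T} |\partial_{x_i,x_j} u|^{2p} \, d\nu_1 \Big)^{1/p} \Big( \int_{\Omega_T} \Big| \frac{\partial \alpha_i(t,x,f,\nabla f)}{\partial f_{x_j}} - \frac{\partial \alpha_i(t,x,u,\nabla u)}{\partial u_{x_j}} \Big|^{2q} d\nu_1 \Big)^{1/q}\\
&\qquad + K \sum_{i,j=1}^{d} \Big( \int_{\Omega_T} \Big| \frac{\partial \alpha_i(t,x,f,\nabla f)}{\partial f_{x_j}} \Big|^{2p} d\nu_1 \Big)^{1/p} \Big( \int_{\Omega_T} |\partial_{x_i,x_j} f - \partial_{x_i,x_j} u|^{2q} \, d\nu_1 \Big)^{1/q}\\
&\quad\le K \sum_{i,j=1}^{d} \Big( \int_{\Omega_T} |\partial_{x_i,x_j} u|^{2p} \, d\nu_1 \Big)^{1/p} \Big( \int_{\Omega_T} \big( |f-u|^{q_1} + |\nabla_x f - \nabla_x u|^{q_2} + |u|^{q_1 \vee q_3} + |\nabla_x u|^{q_2 \vee q_4} \big)^{q r_1} d\nu_1 \Big)^{1/(q r_1)}\\
&\qquad\qquad \times \Big( \int_{\Omega_T} \big( |f-u|^2 + |\nabla_x f - \nabla_x u|^2 \big)^{q r_2} d\nu_1 \Big)^{1/(q r_2)}\\
&\qquad + K \sum_{i,j=1}^{d} \Big( \int_{\Omega_T} \Big| \frac{\partial \alpha_i(t,x,f,\nabla f)}{\partial f_{x_j}} \Big|^{2p} d\nu_1 \Big)^{1/p} \Big( \int_{\Omega_T} |\partial_{x_i,x_j} f - \partial_{x_i,x_j} u|^{2q} \, d\nu_1 \Big)^{1/q}\\
&\quad\le K \epsilon^2,
\end{aligned}
\tag{A.3}
$$

where in the last step we followed the computation in (A.2) and used (A.1).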

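Using (A.1) and (A.2)-(A.3), we subsequently obtain for the objective function (note that $G[u](t,x) = 0$ for $u$ that solves the PDE, that $u = g$ on $\partial\Omega_T$, and that $u(0,\cdot) = u_0$)

$$
\begin{aligned}
J(f) &= \|G[f](t,x)\|^2_{\Omega_T,\nu_1} + \|f(t,x;\theta) - g(t,x)\|^2_{\partial\Omega_T,\nu_2} + \|f(0,x;\theta) - u_0(x)\|^2_{\Omega,\nu_3}\\
&= \|G[f](t,x) - G[u](t,x)\|^2_{\Omega_T,\nu_1} + \|f(t,x;\theta) - g(t,x)\|^2_{\partial\Omega_T,\nu_2} + \|f(0,x;\theta) - u_0(x)\|^2_{\Omega,\nu_3}\\
&\le K \int_{\Omega_T} |\partial_t u(t,x) - \partial_t f(t,x;\theta)|^2 \, d\nu_1(t,x) + K \int_{\Omega_T} \big|\xi(t,x,f,\nabla f,\nabla^2 f) - \xi(t,x,u,\nabla u,\nabla^2 u)\big|^2 \, d\nu_1(t,x)\\
&\quad + K \int_{\Omega_T} |\hat\gamma(t,x,f,\nabla_x f) - \hat\gamma(t,x,u,\nabla_x u)|^2 \, d\nu_1(t,x) + \int_{\partial\Omega_T} |f(t,x;\theta) - u(t,x)|^2 \, d\nu_2(t,x)\\
&\quad + \int_{\Omega} |f(0,x;\theta) - u(0,x)|^2 \, d\nu_3(x)\\
&\le K \epsilon^2
\end{aligned}
$$

for an appropriate constant $K < \infty$. The last step completes the proof of the theorem after rescaling $\epsilon$.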

Proof of Theorem 7.3. Existence, regularity and uniqueness for (7.5) follow from Theorem 2.1 of [40] combined with Theorems 6.3-6.5 of Chapter V.6 in [28] (see also Theorem 6.6 of Chapter V.6 of [28]). Boundedness follows from Theorem 2.1 in [40] and Chapter V.2 in [28]. The convergence proof follows from the smoothness of the neural networks together with compactness arguments, as we explain below.

Let us first consider problem (7.6) with $g^n(t,x) = 0$, and let us denote the solution to this problem by $\hat f^n(t,x)$. Due to Condition 7.2, Lemma 4.1 of [40] applies and gives that $\{\hat f^n\}_{n\in\mathbb{N}}$ is uniformly bounded with respect to $n$ in at least $L^\infty\big(0,T;L^2(\Omega)\big) \cap L^2\big(0,T;W_0^{1,2}(\Omega)\big)$ (regarding such uniform energy bounds, we also refer the reader to Theorem 2.1 and Remark 2.14 of [5] for the case $\gamma = 0$ and to [34, 37] for related results in more general cases). In fact, $\hat f^n$ is more regular than stated, see Section 6, Chapter V of [28], but we will not use this fact in the proof of convergence of $\hat f^n$ to $u$. These uniform energy bounds imply that we can extract a subsequence, also denoted by $\{\hat f^n\}_{n\in\mathbb{N}}$, which converges to some $u$ in the weak-* sense in $L^\infty\big(0,T;L^2(\Omega)\big)$ and weakly in $L^2\big(0,T;W_0^{1,2}(\Omega)\big)$, and to some $v$ weakly in $L^2(\Omega)$.

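Next, let us set $q = 1 + \frac{d}{d+4} \in (1,2)$ and note that, for conjugates $r_1, r_2 > 1$ with $1/r_1 + 1/r_2 = 1$,

$$
\begin{aligned}
\int_{\Omega_T} \big|\gamma(t,x,\hat f^n,\nabla_x \hat f^n)\big|^q \, dt\,dx &\le \int_{\Omega_T} |\lambda(t,x)|^q \, |\nabla_x \hat f^n(t,x)|^q \, dt\,dx\\
&\le \Big( \int_{\Omega_T} |\lambda(t,x)|^{r_1 q} \, dt\,dx \Big)^{1/r_1} \Big( \int_{\Omega_T} |\nabla_x \hat f^n(t,x)|^{r_2 q} \, dt\,dx \Big)^{1/r_2}.
\end{aligned}
\tag{A.4}
$$

Let us choose $r_2 = 2/q > 1$. Then $r_1 = \frac{r_2}{r_2 - 1} = \frac{2}{2-q}$, and hence $r_1 q = d + 2$, while $r_2 q = 2$. Recalling the assumption $\lambda \in L^{d+2}(\Omega_T)$ and the uniform bound on $\nabla_x \hat f^n$ in $L^2(\Omega_T)$, we obtain that, for $q = 1 + \frac{d}{d+4}$, there is a constant $C < \infty$ such that

$$\int_{\Omega_T} \big|\gamma(t,x,\hat f^n,\nabla_x \hat f^n)\big|^q \, dt\,dx \le C.$$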
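The exponent bookkeeping behind (A.4) can be checked with exact rational arithmetic. The following sketch uses a hypothetical helper, `holder_exponents`, which is not part of the paper; it computes $q$, $r_1$, $r_2$ for a given $d$ and confirms that $r_1, r_2$ are conjugate, that $r_1 q = d + 2$ (so the $\lambda$ factor is controlled by $\lambda \in L^{d+2}(\Omega_T)$), and that $r_2 q = 2$ (so the gradient factor is controlled by the uniform $L^2$ energy bound).

```python
from fractions import Fraction


def holder_exponents(d):
    """Exponents used in (A.4): q = 1 + d/(d+4), r2 = 2/q, r1 = r2/(r2 - 1)."""
    q = 1 + Fraction(d, d + 4)
    r2 = 2 / q
    r1 = r2 / (r2 - 1)
    return q, r1, r2


for d in range(1, 11):
    q, r1, r2 = holder_exponents(d)
    assert 1 < q < 2 and r2 > 1
    assert 1 / r1 + 1 / r2 == 1          # conjugate exponents
    assert r1 * q == d + 2               # lambda enters through L^{d+2}
    assert r2 * q == 2                   # gradient enters through L^2
    assert r1 == Fraction(2) / (2 - q)   # agrees with r1 = 2/(2 - q)
```

For example, $d = 1$ gives $q = 6/5$, $r_1 = 5/2$ and $r_2 = 5/3$.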

The latter estimate, together with the growth assumptions on $\alpha(\cdot)$ from Condition 7.2, implies that $\{\partial_t \hat f^n\}_{n\in\mathbb{N}}$ is bounded uniformly with respect to $n$ in $L^{1+d/(d+4)}(\Omega_T)$ and in $L^2\big(0,T;W^{-1,2}(\Omega)\big)$. Consider conjugates $\delta_1, \delta_2$ with $1/\delta_1 + 1/\delta_2 = 1$ and $\delta_2 > \max\{2, d\}$. Due to the embeddings

$$W^{-1,2}(\Omega) \subset W^{-1,\delta_1}(\Omega), \qquad L^q(\Omega) \subset W^{-1,\delta_1}(\Omega), \qquad L^2(\Omega) \subset W^{-1,\delta_1}(\Omega),$$

we have that $\{\partial_t \hat f^n\}_{n\in\mathbb{N}}$ is bounded uniformly with respect to $n$ in $L^1\big(0,T;W^{-1,\delta_1}(\Omega)\big)$. Define now the spaces $X = W_0^{1,2}(\Omega)$, $B = L^2(\Omega)$ and $Y = W^{-1,\delta_1}(\Omega)$, and notice that

$$X \subset B \subset Y,$$

with the first embedding being compact. Then Corollary 4 of [48] yields relative compactness of $\{\hat f^n\}_{n\in\mathbb{N}}$ in $L^2(\Omega_T)$, so that, up to a subsequence, $\{\hat f^n\}_{n\in\mathbb{N}}$ converges strongly to $u$ in that space. Thus, up to subsequences, $\{\hat f^n\}_{n\in\mathbb{N}}$ converges almost everywhere to $u$ in $\Omega_T$.

The nonlinearity of the functions $\alpha$ and $\gamma$ with respect to the gradient prevents us from passing to the limit directly in the respective weak formulation. However, the uniform boundedness of $\{\hat f^n\}_{n\in\mathbb{N}}$ in $L^\sigma\big(0,T;W_0^{1,\sigma}(\Omega)\big)$ with $\sigma > 1$ (in fact, here $\sigma = 2$) and its weak convergence to $u$ in that space allow us to conclude, as in Theorem 3.3 of [4], that

$$\nabla \hat f^n \to \nabla u \quad \text{almost everywhere in } \Omega_T.$$

Hence, $\{\hat f^n\}_{n\in\mathbb{N}}$ converges to $u$ strongly also in $L^\rho\big(0,T;W_0^{1,\rho}(\Omega)\big)$ for every $\rho < 2$.

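In preparation for passing to the limit as $n \to \infty$ in the weak formulation, we need to study the behavior of the nonlinear terms. Recalling the assumptions on $\alpha(t,x,u,p)$, we have, for $\rho < 2$ and a measurable set $A \subset \Omega_T$ (the constant $K < \infty$ may change from line to line),

$$
\begin{aligned}
\int_A \big|\alpha(t,x,\hat f^n,\nabla \hat f^n)\big|^\rho \, dt\,dx &\le K \Big[ \int_A |\kappa(t,x)|^\rho \, dt\,dx + \int_A |\nabla \hat f^n(t,x)|^\rho \, dt\,dx \Big]\\
&\le K \Big[ \int_A |\kappa(t,x)|^\rho \, dt\,dx + \Big( \int_{\Omega_T} |\nabla \hat f^n(t,x)|^2 \, dt\,dx \Big)^{\rho/2} |A|^{1-\rho/2} \Big]\\
&\le K \Big[ \int_A |\kappa(t,x)|^\rho \, dt\,dx + |A|^{1-\rho/2} \Big].
\end{aligned}
$$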

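In the latter display we used Hölder's inequality with exponent $2/\rho > 1$. Since the right-hand side tends to zero as $|A| \to 0$, uniformly in $n$, Vitali's theorem yields

$$\alpha(t,x,\hat f^n,\nabla \hat f^n) \to \alpha(t,x,u,\nabla u) \quad \text{strongly in } L^\rho(\Omega_T)$$

as $n \to \infty$, for every $1 < \rho < 2$. For the same reason, an estimate analogous to (A.4) gives, for $q = 1 + \frac{d}{d+4}$ and uniformly in $n$,

$$\int_A \big|\gamma(t,x,\hat f^n,\nabla_x \hat f^n)\big|^q \, dt\,dx \le K \Big( \int_A |\lambda(t,x)|^{d+2} \, dt\,dx \Big)^{(2-q)/2} \le K |A|^{\frac{\eta}{d+2+\eta}},$$

so that, again by Vitali's theorem, $\gamma(t,x,\hat f^n,\nabla_x \hat f^n) \to \gamma(t,x,u,\nabla_x u)$ strongly in $L^q(\Omega_T)$ as $n \to \infty$.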
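The Hölder step used in the $\alpha$ estimate, $\int_A |g|^\rho \le \big(\int_A |g|^2\big)^{\rho/2} |A|^{1-\rho/2}$ for $1 < \rho < 2$, holds for any finite measure, in particular for a Riemann-sum (discrete) measure. The following sketch checks it on hypothetical sets $A$ made of equal-volume grid cells with random values of $g$; the helper name `holder_sides`, the cell volume and the random data are assumptions made only for illustration.

```python
import random


def holder_sides(values, cell_volume, rho):
    """Return (lhs, rhs) of  int_A |g|^rho <= (int_A g^2)^(rho/2) |A|^(1 - rho/2),
    where g takes the given values on cells of equal volume covering A."""
    measure_A = cell_volume * len(values)
    lhs = sum(abs(v) ** rho for v in values) * cell_volume
    l2_sq = sum(v * v for v in values) * cell_volume
    rhs = l2_sq ** (rho / 2) * measure_A ** (1 - rho / 2)
    return lhs, rhs


rng = random.Random(1)
for _ in range(200):
    n_cells = rng.randint(1, 50)
    values = [rng.gauss(0.0, 3.0) for _ in range(n_cells)]
    rho = rng.uniform(1.0, 2.0)
    lhs, rhs = holder_sides(values, cell_volume=1e-3, rho=rho)
    assert lhs <= rhs * (1 + 1e-12)   # tiny slack for rounding at equality
```

Because $\int_{\Omega_T} |\nabla \hat f^n|^2$ is bounded uniformly in $n$, the right-hand side is small whenever $|A|$ is small, which is the uniform integrability that Vitali's theorem requires.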

Notice also that by construction we have that the initial condition un0 converges to u0 strongly in L2 (Ω).

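The weak formulation of the PDE (7.6) with $g^n = 0$ reads as follows: for every $t_1 \in (0,T]$,

$$\int_{\Omega_{t_1}} \Big[ -\hat f^n \partial_t \phi + \big\langle \alpha(t,x,\hat f^n,\nabla \hat f^n), \nabla\phi \big\rangle + \big(\gamma(t,x,\hat f^n,\nabla \hat f^n) - h^n\big)\phi \Big](t,x)\,dx\,dt + \int_\Omega \hat f^n(t_1,x)\phi(t_1,x)\,dx - \int_\Omega u_0^n(x)\phi(0,x)\,dx = 0$$

for every $\phi \in C_0^\infty(\Omega_T)$. Using the above convergence results, and recalling that $h^n \to 0$ in $L^2(\Omega_T)$ by (7.7), we then obtain that the limit point $u$ satisfies, for every $t_1 \in (0,T]$, the equation

$$\int_{\Omega_{t_1}} \Big[ -u\,\partial_t \phi + \big\langle \alpha(t,x,u,\nabla u), \nabla\phi \big\rangle + \gamma(t,x,u,\nabla u)\phi \Big](t,x)\,dx\,dt + \int_\Omega u(t_1,x)\phi(t_1,x)\,dx - \int_\Omega u_0(x)\phi(0,x)\,dx = 0,$$

which is the weak formulation of (7.5).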

It remains to discuss the convergence of $f^n - \hat f^n$ to zero, where we recall that $f^n$ is the neural network approximation satisfying (7.6) and $\hat f^n$ satisfies (7.6) with $g^n = 0$. The functions $f^n$ belong to $C^{1,2}(\bar\Omega_T)$ and $\bar\Omega_T$ is compact. We have also assumed that $\{f^n\}_n$ is uniformly bounded in $L^2(\Omega_T)$. This implies that, up to a subsequence, $f^n$ converges at least weakly in $L^2(\Omega_T)$. Moreover, by (7.7), the boundary values $g^n(t,x)$ in (7.6), which are nothing but $f^n(t,x)$ evaluated on the smooth boundary $\partial\Omega_T$, converge to zero strongly in $L^2$. It is then a standard result that $g^n$ converges to zero at least almost uniformly along a subsequence; see, for example, Lemma 2.1 in Chapter II of [28]. It then follows, for example by the proof of Theorems 6.3-6.5 of Chapter V.6 in [28], using smoothness and uniqueness, that $f^n$ differs from $\hat f^n(t,x)$, the solution to the PDE (7.6) with $g^n = 0$, by a negligible amount as $n \to \infty$ in the almost everywhere sense. The assumed uniform $L^2$ bound for $\{f^n\}_{n\in\mathbb{N}}$, together with the previously derived uniform $L^2(\Omega_T)$ bound for $\{\hat f^n\}_{n\in\mathbb{N}}$, yields uniform $L^2(\Omega_T)$ boundedness of $\{f^n - \hat f^n\}_{n\in\mathbb{N}}$. Then, by Vitali's theorem again, $\{f^n - \hat f^n\}_{n\in\mathbb{N}}$ goes to zero strongly in $L^\rho(\Omega_T)$ for every $\rho < 2$.

The strong convergence of $\{f^n - \hat f^n\}_{n\in\mathbb{N}}$ to zero in $L^\rho(\Omega_T)$ for every $\rho < 2$, together with the strong $L^2(\Omega_T)$ convergence of $\{\hat f^n\}_{n\in\mathbb{N}}$ to $u$, concludes the proof of convergence in $L^\rho(\Omega_T)$ for every $\rho < 2$ by the triangle inequality.


If $\{f^n(t,x)\}$ is equicontinuous, then Lemma 3.2 of [17] gives uniform convergence of $g^n$ to zero. Hence, by the previous analysis, $\{f^n\}_{n\in\mathbb{N}}$ converges to $u$ in $L^\rho(\Omega_T)$ for every $\rho < 2$. This $L^\rho$ convergence, together with the boundedness and equicontinuity of the sequence $\{f^n(t,x)\}$, then yields uniform convergence by the Arzelà-Ascoli theorem.

References

[1] S. Asmussen and P. Glynn, Stochastic Simulation: Algorithms and Analysis, Springer, 2007.

[2] C. Beck, W. E., A. Jentzen. Machine learning approximation algorithms for high-dimensional fully

nonlinear partial differential equations and second-order backward stochastic differential equations,

arXiv:1709.05963, 2017.

[3] D. Bertsekas and J. Tsitsiklis, Gradient convergence in gradient methods with errors, SIAM Journal on Optimization, Vol. 10, No. 3, pgs. 627-642, 2000.

[4] L. Boccardo, A. Dall'Aglio, T. Gallouët and L. Orsina, Nonlinear parabolic equations with measure data, Journal of Functional Analysis, Vol. 147, pp. 237-258, 1997.

[5] L. Boccardo, M.M. Porzio and A. Primo, Summability and existence results for nonlinear parabolic

equations, Nonlinear Analysis: Theory, Methods and Applications, Vol. 71, Issue 304, pp. 1-15, 2009.

[6] H. Bungartz, A. Heinecke, D. Pfluger, and S. Schraufstetter, Option pricing with a direct adaptive sparse

grid approach, Journal of Computational and Applied Mathematics, Vol. 236, No. 15, pgs. 3741-3750,

2012.

[7] H. Bungartz and M Griebel, Sparse Grids, Acta numerica, Vol. 13, pgs. 174-269, 2004.

[8] P. Chandra and Y. Singh, Feedforward sigmoidal networks - equicontinuity and fault-tolerance proper-

ties, IEEE Transactions on Neural Networks, Vol. 15, Issue 6, pp. 1350-1366, 2004.

[9] P. Chaudhari, A. Oberman, S. Osher, S. Soatto, and G. Carlier. Deep relaxation: partial differential

equations for optimizing deep neural networks, 2017.

[10] S. Cerrai, Stationary Hamilton-Jacobi Equations in Hilbert Spaces and Applications to a Stochastic

Optimal Control Problem, SIAM Journal on Control and Optimization, Vol. 40, No. 3, pgs. 824-852,

2001.

[11] G. Cybenko, Approximation by superposition of a sigmoidal function, Mathematics of Control, Signals

and Systems, Vol. 2, pp. 303-314, 1989.

[12] A. Davie and J. Gaines, Convergence of Numerical Schemes for the Solution of Parabolic Stochastic

Partial Differential Equations, Mathematics of Computation, Vol. 70, No. 233, pgs. 121-134, 2000.

[13] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. Le, and

A. Ng. Large scale distributed deep networks. In Advances in Neural Information Processing Systems,

pp. 1223-1231, 2012.

[14] A. Debussche, M. Fuhrman, and G. Tessitore, Optimal control of a stochastic heat equation with

boundary-noise and boundary-control, ESAIM: Control, Optimisation and Calculus of Variations, Vol.

13, No. 1, pgs. 178-205, 2007.

[15] W. E., J. Han, and A. Jentzen. Deep learning-based numerical methods for high-dimensional parabolic

partial differential equations and backward stochastic differential equations, Communications in Math-

ematics and Statistics, Springer, 2017.

[16] M. Fujii, A. Takahashi, and M. Takahashi. Asymptotic Expansion as Prior Knowledge in Deep Learning

Method for high dimensional BSDEs, arXiv:1710.07030, 2017.


[17] A. M. Garcia, E. Rodemich, H. Rumsey Jr. and M. Rosenblatt, A real variable lemma and the continuity

of paths of some Gaussian processes, Indiana University Mathematics Journal, Vol. 20, No. 6, pp. 565-

578, December, 1970.

[18] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks.

In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.

249-256, 2010.

[19] J. Gaines. Numerical experiments with SPDEs. London Mathematical Society Lecture Note Series,

55-71, 1995.

[20] D. Gilbarg and N.S. Trudinger. Elliptic partial differential equations of second order, second edition. Springer-Verlag, Berlin Heidelberg, 1983.

[21] I. Gyöngy, Lattice Approximations for Stochastic Quasi-Linear Parabolic Partial Differential Equations Driven by Space-Time White Noise I, Potential Analysis, Vol. 9, Issue 1, pgs. 1-25, 1998.

[22] A. Heinecke, S. Schraufstetter, and H. Bungartz, A highly parallel Black-Scholes solver based on adaptive

sparse grids, International Journal of Computer Mathematics, Vol. 89, No. 9, pgs. 1212-1238, 2012.

[23] M. Haugh and L. Kogan, Pricing American Options: A Duality Approach, Operations Research, Vol.

52, No. 2, pgs. 258-270, 2004.

[24] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, Vol. 9, No. 8, pgs.

1735-1780, 1997.

[25] K. Hornik, M. Stinchcombe and H. White, Universal approximation of an unknown mapping and its

derivatives using multilayer feedforward networks, Neural Networks, Vol. 3, Issue 5, pp. 551-560, 1990.

[26] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks, Vol. 4, pp.

251-257, 1991.

[27] D. Kingma and J. Ba, ADAM: A method for stochastic optimization. arXiv: 1412.6980, 2014.

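[28] O. A. Ladyzhenskaya, V. A. Solonnikov and N. N. Ural'tseva, Linear and Quasi-linear Equations of Parabolic Type (Translations of Mathematical Monographs Reprint), American Mathematical Society, Vol. 23, 1988.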

[29] I. Lagaris, A. Likas, and D. Fotiadis, Artificial neural networks for solving ordinary and partial differ-

ential equations, IEEE Transactions on Neural Networks, Vol. 9, No. 5, pgs. 987-1000, 1998.

[30] I. Lagaris, A. Likas, and D. Papageorgiou, Neural-network methods for boundary value problems with

irregular boundaries, IEEE Transactions on Neural Networks, Vol. 11, No. 5, pgs. 1041-1049, 2000.

[31] H. Lee, Neural Algorithm for Solving Differential Equations, Journal of Computational Physics, Vol.

91, pgs. 110-131, 1990.

[32] J. Ling, A. Kurzawski, and J. Templeton. Reynolds averaged turbulence modelling using deep neural

networks with embedded invariance. Journal of Fluid Mechanics, Vol. 807, pgs. 155-166, 2016.

[33] F. Longstaff and E. Schwartz, Valuing American Options by Simulation: A Simple Least-Squares Ap-

proach, Review of Financial Studies, Vol. 14, pgs. 113-147, 2001.

[34] M. Magliocca, Existence results for a Cauchy-Dirichlet parabolic problem with a repulsive gradient

term, Nonlinear Analysis, Vol. 166, pp. 102-143, 2018

[35] A. Malek and R. Beidokhti, Numerical solution for high order differential equations using a hybrid neural

network-optimization method, Applied Mathematics and Computation, Vol. 183, No. 1, pgs. 260-271,

2006.


[36] F. Masiero, HJB equations in infinite dimensions, Journal of Evolution Equations, Vol. 16, No. 4, pgs.

789-824, 2016.

[37] R. Di Nardo, F. Feo, O. Guibé, Existence result for nonlinear parabolic equations with lower order

terms, Anal. Appl.(Singap.), Vol. 09, No. 02, pp. 161-186, 2011.

[38] P. Petersen and F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU

neural networks, (2017), arXiv: 1709.05289v4.

[39] A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, Vol. 8, pp. 143-195, 1999.

[40] M. M. Porzio, Existence of solutions for some “noncoercive” parabolic equations, Discrete and Contin-

uous Dynamical Systems, Vol. 5, Issue 3, pp. 553-568, 1999.

[41] M. Raissi, P. Perdikaris, and G. Karniadakis. Physics Informed Deep Learning (Part I): Data-driven

Solutions of Nonlinear Partial Differential Equations, arXiv:1711.10561, 2017.

[42] M. Raissi, P. Perdikaris, and G. Karniadakis. Physics Informed Deep Learning (Part II): Data-driven

Discovery of Nonlinear Partial Differential Equations, arXiv:1711.10566, 2017.

[43] C. Reisinger and G. Wittum, Efficient hierarchical approximation of high-dimensional option pricing

problems, SIAM Journal on Scientific Computing, Vol. 29, No. 1, pgs. 440-458, 2007.

[44] C. Reisinger, Analysis of linear difference schemes in the sparse grid combination technique, IMA Journal

of Numerical Analysis, Vol. 33, No. 2, pgs. 544-581, 2012.

[45] L.C.G. Rogers, Monte-Carlo Valuation of American Options, Mathematical Finance, Vol. 12, No. 3, pgs.

271-286, 2002.

[46] K. Rudd, Solving Partial Differential Equations using Artificial Neural Networks, PhD Thesis, Duke

University, 2013.

[47] R. Srivastava, K. Greff, and J. Schmidhuber, Training very deep networks, In Advances in Neural

Information Processing Systems, pp. 2377-2385, 2015.

[48] J. Simon, Compact sets in the space Lp (0, T ; B), Annali di Matematica Pura ed Applicata, Vol. 146, pp.

65-96, 1987.

[49] J. Tompson, K. Schlachter, P. Sprechmann, and K. Perlin. Accelerating Eulerian Fluid Simulation with

Convolutional Networks, Proceedings of Machine Learning Research, Vol. 70, pgs. 3424-3433, 2017.

