Handbook of Uncertainty Quantification

Roger Ghanem, David Higdon, Houman Owhadi (Editors)
Editors

Roger Ghanem
Department of Civil and Environmental Engineering
University of Southern California
Los Angeles, CA, USA

David Higdon
Social and Decision Analytics Laboratory
Virginia Bioinformatics Institute
Virginia Tech
Arlington, VA, USA
Houman Owhadi
Computing and Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
Contents
Volume 1
Volume 2
Volume 3
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2039
About the Editors
Roger Ghanem
Department of Civil and Environmental
Engineering
University of Southern California
Los Angeles, CA, USA
Email: ghanem@usc.edu
David Higdon
Social and Decision Analytics
Laboratory
Virginia Bioinformatics Institute
Virginia Tech
Arlington, VA, USA
Email: dhigdon@vbi.vt.edu
Houman Owhadi
Computing and Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
Email: owhadi@caltech.edu
Section Editors
Methodology
Wei Chen
Department of Mechanical
Engineering
Northwestern University
Evanston, IL, USA
Email: weichen@northwestern.edu
Forward Problems
Habib N. Najm
Sandia National Laboratories
P.O.Box 969, MS 9051
Livermore, CA 94551, USA
Email: hnnajm@sandia.gov
Bertrand Iooss
Industrial Risk Management Department
EDF R&D
EDF Lab Chatou
6 Quai Watier
78401 Chatou, France
and
Institut de Mathématiques de Toulouse
Université Paul Sabatier
31062 Toulouse
France
Email: bertrand.iooss@edf.fr
Risk
James R. Stewart
Optimization and UQ Department 1441
Sandia National Laboratories
P.O. Box 5800, MS 1318
Albuquerque, NM 87185
USA
Email: jrstewa@sandia.gov
Mike McKerns
California Institute of Technology
Computing and Mathematical Sciences
102 Steele House MC 9-94
Pasadena CA, 91125
USA
Email: mmckerns@caltech.edu
Contributors
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1 Introduction
R. Ghanem
Department of Civil and Environmental Engineering, University of Southern California,
Los Angeles, CA, USA
e-mail: ghanem@usc.edu
D. Higdon
Social Decision Analytics Laboratory, Biocomplexity Institute, Virginia Polytechnic Institute and
State University, Arlington, VA, USA
e-mail: dhigdon@vbi.vt.edu
H. Owhadi
Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
e-mail: owhadi@caltech.edu
of opinions on the part of the editors-in-chief and organizes the handbook around
methodological developments, algorithms for the statistical exploration of the
forward model, sensitivity analysis, risk assessment, codes of practice, and software.
Most inference problems of current significance to the UQ community can be
assembled using building blocks from these six components. The contributions
consist of overview articles of interest both to newcomers and veterans of UQ.
Scientific progress proceeds in increments, and its transformative jumps invari-
ably entail falsifying prevalent theories. This involves comparing predictions from
theory with experimental evidence. While this recipe for advancing knowledge
remains as effective today as it has been throughout history, its two key ingredients
carry within them a signature of their own time and are thus continually evolving.
Indeed, both predicting and observing the physical world, the two main ingredients
of the scientific process, reflect our perspective on the physical world and are
delineated by technology.
The pace of innovation across the whole scientific spectrum, coupled with
previously unimaginable capabilities to both observe and analyze the physical
world, has heightened the expectations that the scientific machinery can anticipate
the state of the world and can thus serve to improve comfort and health and to
mitigate disasters.
Uncertainty quantification is the rational process by which proximity between
predictions and observations is characterized. It can be thought of as the task
of determining appropriate uncertainties associated with model-based predictions.
More broadly, it is a field that combines concepts from applied mathematics,
engineering, computational science, and statistics, producing methodology, tools,
and research to connect computational models to the actual physical systems
they simulate. In this broader interpretation, UQ is relevant to a wide span
of investigations. These range from seeking detailed quantitative predictions for
a well-understood and accurately modeled engineering system to exploratory
investigations focused on understanding trade-offs in a new or even hypothetical
physical system.
Uncertainty in model-based predictions arises from a variety of sources including
(1) uncertainty in model inputs (e.g. parameters, initial conditions, boundary
conditions, forcings), (2) model discrepancy or inadequacy due to the difference
between the model and the true system, (3) computational costs, limiting the number
of model runs and supporting analysis computations that can be carried out, and
(4) solution and coding errors. Verification can help eliminate solution and coding
errors. Speeding up a model by replacing it with a reduced order model is a strategy
for trading off error/uncertainty between (2) and (3) above. Similarly, obtaining
additional data, or higher quality data, is often helpful in reducing uncertainty due to
(1) but will do little to reduce uncertainty from other sources. The multidisciplinary
nature of UQ makes it ripe for exploiting synergies at the intersection of a number
of disciplines that comprise this new field. More specifically, for instance,
2 Bayes Linear Emulation, History Matching, and Forecasting for …

M. Goldstein and N. Huntley

Abstract
Computer simulators are a useful tool for understanding complicated systems.
However, any inferences made from them should recognize the inherent limitations
and approximations in the simulator's predictions for reality, the data
used to run and calibrate the simulator, and the lack of knowledge about the best
inputs to use for the simulator. This article describes the methods of emulation
and history matching, where fast statistical approximations to the computer
simulator (emulators) are constructed and used to reject implausible choices
of input (history matching). Also described is a simple and tractable approach
to estimating the discrepancy between simulator and reality induced by certain
intrinsic limitations and uncertainties in the simulator and input data. Finally,
a method for forecasting based on this approach is presented. The analysis is
based on the Bayes linear approach to uncertainty quantification, which is similar
in spirit to the standard Bayesian approach but takes expectation, rather than
probability, as the primitive for the theory, with consequent simplifications in the
prior uncertainty specification and analysis.
Keywords
Computer simulators · Bayes linear · Emulation · Model discrepancy · History matching · Calibration · Internal discrepancy · Forecasting
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Example: Rainfall Runoff Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 The Bayesian Analysis of Computer Simulators for Physical Systems . . . . . . . . . . . . . . . 12
1 Introduction
One of the main tools for studying complex real-world phenomena is the creation
of mathematical models for such phenomena, typically implemented as computer
simulators. There is a growing field of study which is concerned with the uncertain-
ties arising when computer simulators are used to make inferences about real-world
behavior. Two characteristic features of this field are, firstly, the need to analyze
uncertainties for simulators which are slow to evaluate and, secondly, the need to
recognize and assess the difference between the simulator and the physical system
which the simulator purports to represent. In this article, the Bayes linear approach
to the assessment of uncertainty for such problems is described. This Bayes linear
approach has been successfully applied in a variety of areas, including oil reservoir
management [3], climate modeling [17], and simulators of galaxy formation [16].
The aim of this paper is to present a survey of the common general methodology
followed in each such application.
The structure of the article is as follows. First, the Bayesian analysis of computer
simulators for physical systems is discussed, and the role of Bayes linear analysis in
this context is described and motivated. This is followed by a discussion of the
issues arising when the simulator is slow to evaluate and the role of emulation
in such problems. Next, ways to assess the structural discrepancy arising from
the mismatch between the simulator and the physical system are described. These
methods are then used in the context of history matching, namely, finding collections
of simulator evaluations which are consistent with historical observations, within the
levels of uncertainty associated with the problem. Finally, the role of the simulator
in forecasting within the Bayes linear framework is described. The running example
used to illustrate the development is based around flood modeling. This example
has the merit that the code is freely available as an R package, so that the interested
reader may try out the type of analysis described and compare it to alternative
approaches to the same problems. This work was supported by NERC under the
PURE research program.
The running example is of a rainfall runoff simulator, that is, a simulator that
predicts stream flow in a river. The particular simulator considered in this article
is FUSE (The authors thank their colleagues, from the NERC PURE program,
Wouter Buytaert, Nataliya Bulygina, and in particular Claudia Vitolo, for their help
in running and interpreting the FUSE simulator.) (Framework for Understanding
Structural Errors), described in [2], which can be downloaded for R from R-Forge at
https://r-forge.r-project.org/R/?group_id=411. FUSE was designed as a toolbox of
different modeling choices (there are 1248 different simulators available in FUSE),
but in this article, only one (model 17) is considered. For brevity, this specific
simulator within FUSE is referred to throughout simply as FUSE.
FUSE takes as its input a time series of average rainfall across the catchment
in question, on whatever time scale one desires, and a time series of potential
evapotranspiration on the same time scale. Its output is a time series of predicted
stream flow, on the same scale. For this example, the simulator was run using data
from the Pontbren catchment in Wales; the dataset is freely available from the Centre
for Ecology & Hydrology (https://eip.ceh.ac.uk/). After some processing, the data
consist of hourly readings over approximately 15 months, where the rainfall data is
derived from six rainfall gauges giving readings every 10 min, and the evapotranspi-
ration is calculated hourly using the Penman-Monteith equation [12]. The output is
compared to hourly readings of stream flow at a gauge near the end of the catchment.
FUSE models water storage as consisting of three compartments, the size of
which is governed by three corresponding parameters. Flows into, out of, and
between these compartments are governed by simple equations that rely on five
further parameters. Finally, a time delay, governed by one parameter, is applied to
the predicted stream flow. Full details of these equations can be found in [2].
In summary, FUSE is run by providing a time series of rainfall, a time series of
evapotranspiration, nine parameters, and finally an initial condition that specifies
how much water is in the compartments at the beginning. Figure 2.1 shows an
example of FUSE output for a decent choice of parameters (blue line) and a very
poor choice of parameters (green line) compared with observed stream flow (red
line) for a small section of the data.

Fig. 2.1 Observed discharge (red, vertical axis) compared with FUSE runs for two choices of parameters
3 The Bayesian Analysis of Computer Simulators for Physical Systems

This article is concerned with problems in which computer simulators are used to
represent physical systems. Each simulator can be conceived as a function f(x),
where x is an input vector representing properties of the physical system and f(x)
is an output vector representing system behavior. Typically, some of the elements of
x represent unknown physical properties of the system, some are tuning parameters
to compensate for approximations in the simulator, and some are control parameters
which correspond to ways in which system behavior may be influenced by external
actions.

Analysis of simulator behavior provides qualitative insights into the behavior
of the physical system. Usually, it is of interest to identify appropriate (in some
sense) choices, x*, for the system properties x and to assess how informative
f(x*) is for actual system behavior, y. Often, historical observations, z, observed
with error, are available, corresponding to a historical subvector, y_h, of y, which
may be used both to test and to constrain the simulator, by comparison with the
corresponding subvector f_h(x) of f(x). Typically, there is an ensemble of simulator
evaluations F_n = (f(x_1), …, f(x_n)) made at an evaluation set of input choices
x_n = (x_1, …, x_n), which is used to help address these questions.
There are many uncertainties associated with this analysis. The parameter values
are unknown, and there are uncertainties in the forcing functions, boundary conditions,
and initial conditions required to evaluate the simulator. Many simulators have
random outputs. The complexity of the underlying mathematics forces the solutions
of the system equations to be approximated. Typically, computer simulators are slow
to evaluate, and so the value of the function f(x) must be treated as unknown for
all x except for the design values x_n. The data to which the simulator is matched
is only observed with error. Even if all of these considerations could be addressed
precisely, the computer simulator would still be an imperfect representation of the
physical system.

A common representation for simulator discrepancy in practice is to suppose that
there is an appropriate choice of system properties x* (currently unknown), so that
f(x*) contains all the information about the system, expressed as the relation

y = f(x*) + ε   (2.1)

where ε is a random vector expressing the uncertainty about the physical system that
would remain were the simulator to be evaluated at x*. Typically, ε is taken to be
independent of f and of x*, though modifications to this form are discussed in later
sections. The variance of an individual component ε_i expresses one's confidence in
the ability of the simulator to reproduce the corresponding physical process, while
the correlation between components of ε expresses the similarity between different
types of structural error, for example, whether underpredictions for the simulator
in the past suggest that the simulator will continue to underpredict in the future.
For many problems, whether this formulation is appropriate is, itself, the question
of interest; that is, the goal is to determine whether there are any choices of input
parameters for which the simulator outputs are in rough agreement with the system
observations in the sense of consistency with (2.1).
The specification is completed by a relationship between observations z and
historical system values y_h, which often is taken to be of the form

z = y_h + e   (2.2)
One may view Bayes linear adjustment in two ways. Firstly, this may be viewed
as a fast approximation to a full Bayes analysis based on linear fitting. Secondly,
one may view the Bayes linear approach as giving the appropriate analysis under a
direct partial specification of means, variances, and covariances, where expectation
is treated as primitive. This view is based on an axiomatic treatment of temporal
coherence [7]. In this view, full Bayes analysis is simply a special case where
expectations of families of indicator functions are specified, and probabilistic
conditioning is simply the corresponding special case of (2.3). For a full account
of the Bayes linear approach, see [11].
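The adjustment formulas (2.3) and (2.4) themselves fall outside this excerpt, but in standard Bayes linear form the adjusted expectation and variance of y given z are E_z(y) = E(y) + Cov(y, z) Var(z)⁻¹ (z − E(z)) and Var_z(y) = Var(y) − Cov(y, z) Var(z)⁻¹ Cov(z, y). A minimal numerical sketch (all names illustrative):

```python
import numpy as np

def bayes_linear_adjust(Ey, Ez, Vy, Vz, Cyz, z):
    """Bayes linear adjusted expectation and variance of y given data z.

    Ey, Ez : prior expectation vectors for y and z
    Vy, Vz : prior variance matrices
    Cyz    : prior covariance matrix Cov(y, z)
    z      : the observed value of z
    """
    # Solve Vz w = (z - Ez) rather than forming an explicit inverse.
    w = np.linalg.solve(Vz, z - Ez)
    E_adj = Ey + Cyz @ w
    V_adj = Vy - Cyz @ np.linalg.solve(Vz, Cyz.T)
    return E_adj, V_adj

# Toy illustration: scalar y with one correlated scalar observation.
E_adj, V_adj = bayes_linear_adjust(
    Ey=np.array([0.0]), Ez=np.array([0.0]),
    Vy=np.array([[1.0]]), Vz=np.array([[1.0]]),
    Cyz=np.array([[0.8]]), z=np.array([1.0]))
# Adjusted expectation 0.8; adjusted variance 1 - 0.8**2 = 0.36.
```

Note that only expectations, variances, and covariances are required as inputs, which is exactly the partial prior specification the Bayes linear view treats as primitive.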
5 Emulation
In (2.5), B = {β_ij} are unknown scalars and g_ij(x) are known deterministic
functions of x (e.g., polynomials). Bg(x) expresses global variation in f_i(x),
namely, those features of the surface which can be assessed by making simulator
evaluations anywhere in the input space. Therefore, the functional forms g_ij must
be chosen, and also a second-order specification for the elements of B is required.

As f_i(x) is continuous in x, the residual function u_i(x) is also a continuous
function. u_i(x) expresses local variation, namely, those aspects of the surface which
can only be learned about by making evaluations of the simulator within a subregion
near x. Typically, u_i(x) is represented as a second-order stationary stochastic
process, with a correlation function which expresses the notion that the correlation
between the values of u_i for any two inputs x, x′ is a decreasing function of the
distance between the two input values. A common choice is the squared exponential,
which is used for the illustrations in this article, and is of the form
which is used for the illustrations in this article, and is of form
2 !
0 kx x 0 k
Corr.ui .x/; ui .x // D exp : (2.6)
i
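A correlation matrix under (2.6) can be computed directly over any set of input points; the following is a minimal sketch, assuming a single scalar correlation length for simplicity (names are illustrative, not from the source):

```python
import numpy as np

def sq_exp_corr(X1, X2, theta):
    """Squared-exponential correlation (2.6) between rows of X1 and X2.
    A single scalar correlation length theta is assumed here; in practice
    it may be chosen per input dimension."""
    # Pairwise squared Euclidean distances between input points.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / theta**2)

X = np.array([[0.0], [0.5], [1.0]])
K = sq_exp_corr(X, X, theta=1.0)
# Unit diagonal; correlation decays smoothly as distance grows.
```

Because the exponent is quadratic in distance, sample paths with this correlation are very smooth, matching the judgement described in the text.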
This form corresponds to a judgement that the function is very smooth. There
are many choices for the form of the correlation function and a wide literature
on the merits of different choices [15]. How important this choice is depends
largely on the proportion of the uncertainty that can be attributed to the global
portion of the emulator. In this article, an approach in which a lot of effort is
expended on the choice of global fit is favored. Firstly, this is because the choice
of appropriate residual correlation function is a difficult problem, largely because,
in practice, most functional outputs display different degrees of smoothness in
different regions of the input space. Therefore, reducing dependence on this choice
The form (2.8) acts as a prior specification for f_i(x). Then, a relatively small
evaluation set F_n can be chosen, which, in combination with a representation of
the relationship between the forms of the global surface in the two simulators, for
example,

f_1(x) = β_1 + Σ_{j=1}^{9} β_{1jj} x_(j)² + Σ_{j=1}^{9} Σ_{k<j} β_{1jk} x_(j) x_(k) + Σ_{j=1}^{9} β_{1j} x_(j) + u_1(x).   (2.10)
The next step is to seek inactive variables. These can be suggested through
stepwise selection, removing at each step the variable that has the smallest influence.
Following such a strategy for this example suggested removing parameters x_(2), x_(4),
x_(6), and x_(7), with the R² falling only to 0.95 after all are removed. Removing a fifth
variable (either x_(3) or x_(5)) would reduce the R² to 0.94; this is still a small change
but it was decided not to remove any further variables. Thus, the final form of the
emulator is, following (2.7),

f_1(x) = β_1 + Σ_{j∈A} β_{1jj} x_(j)² + Σ_{j,k∈A, k<j} β_{1jk} x_(j) x_(k) + Σ_{j∈A} β_{1j} x_(j) + u_1(x_A) + δ_1(x_I)   (2.11)

where A = {1, 3, 5, 8, 9}, and recalling that x_(8) is still the logarithm of the original
parameter.
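The global quadratic surface in (2.10)/(2.11) can be fitted by ordinary least squares over the available simulator runs. The following sketch uses an invented stand-in response in place of the actual FUSE outputs, purely to exercise the construction:

```python
import numpy as np
from itertools import combinations

def quadratic_design(X, active):
    """Design matrix for the quadratic global surface of (2.10)/(2.11):
    constant, squared, pairwise-interaction, and linear terms in the
    active inputs only."""
    XA = X[:, active]
    n, d = XA.shape
    cols = [np.ones(n)]
    cols += [XA[:, j] ** 2 for j in range(d)]                # squares
    cols += [XA[:, j] * XA[:, k]
             for j, k in combinations(range(d), 2)]          # interactions
    cols += [XA[:, j] for j in range(d)]                     # linear
    return np.column_stack(cols)

# Hypothetical stand-in for the 1000 simulator runs (invented response).
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 9))
active = [0, 2, 4, 7, 8]        # A = {1, 3, 5, 8, 9} in 0-based indexing
y = X[:, 0] + X[:, 2] ** 2 + 0.01 * rng.normal(size=1000)
G = quadratic_design(X, active)
beta, *_ = np.linalg.lstsq(G, y, rcond=None)
residual_var = np.var(y - G @ beta)   # estimate for Var(u_1), as in the text
```

With five active inputs, the design has 1 + 5 + 10 + 5 = 21 columns, and the residual variance of the fit supplies the prior variance for the local term, as described in the next section.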
This is almost everything needed to perform the Bayes linear adjustment in (2.3)
and (2.4). The goal is to predict the value f_1 of the simulator at some new parameter
choice x, using the values f_1(x_n) calculated earlier. Thus, in the Bayes linear
adjustment equations, f_1(x) plays the role of y, and f_1(x_n) plays the role of z.
Applying the Bayes linear adjustment in (2.3) and (2.4) requires prior specifications
for the relevant expectations and variances. It is assumed a priori that
E(u_1(x)) = 0 and E(δ_1(x)) = 0, and that both Var(u_1(x)) and Var(δ_1(x)) do not
depend on x. Correlation between u_1(x) and u_1(x′) is given by (2.6), and it is
assumed that there is no correlation between δ_1(x) and δ_1(x′). If there were suitable
expert knowledge and little data, the remaining values would have to be carefully
elicited from the experts. In this example, however, there is little expert knowledge,
but there are lots of simulator runs. It is therefore reasonable to use some of the
estimates from the least squares fit: for E(β_1j) and Var(β_1j), the corresponding
estimates from the fit can be used (where, with 1000 runs, the variances will be very
low), and for Var(u_1), the residual variance can be used. This leaves only Var(δ_1):
this is estimated by fixing values of x_A and running the simulator for a range of x_I.
These runs gave an estimate for Var(δ_1) of 3 × 10⁻⁵, which is very small compared
with f_1 (usually around 3). This therefore gives some confidence that this estimation
method should be sufficient.
Only one quantity remains to be assessed: θ_1. Following the argument from
[16] that, for an emulator whose mean function is a polynomial of order p, a
plausible choice for θ_1 is 1/(p + 1), an initial choice of 1/3 was tried, which
was then refined by cross-validation. For a given value of θ_1, for each x_i in x_n,
one can calculate E_{f_1(x_{−i})}(f_1(x_i)) and Var_{f_1(x_{−i})}(f_1(x_i)) using the other elements
x_{−i} of x_n to perform the adjustment. Here, E_{f_1(x_{−i})}(f_1(x_i)) is the adjusted
expectation (see (2.3)) for f_1(x_i) having observed f_1(x_j) for each other x_j in x_n,
and Var_{f_1(x_{−i})}(f_1(x_i)) is the corresponding adjusted variance (see (2.4)). Then,
one can check whether E_{f_1(x_{−i})}(f_1(x_i)) is close to f_1(x_i), relative to the size of
Var_{f_1(x_{−i})}(f_1(x_i)). If the prediction is many standard deviations away from f_1(x_i)
for many x_i, this suggests a problem with the model: either it is not fit for purpose, or
the correlations implied by θ_1 are too strong. On the other hand, if all the predictions
are very close to f_1(x_i) relative to the size of the standard deviations, this suggests
that the adjusted variance at many points is too low. This could happen because the
correlation between nearby points is being underestimated, and hence a higher θ_1
could be tried to give accurate predictions with a lower variance.
The choice θ_1 = 1/3 leads to the latter case: the emulator successfully predicts
all 1000 points to within 3 standard deviations, but almost all points are predicted
to within 1 standard deviation, and indeed over 80% of the points are predicted to
within 0.5 standard deviations. That is, the emulator variance is too conservative,
which would lead to poor performance in the history-matching process later. This
could be because θ_1 is too small. Indeed, increasing θ_1 to 0.45 gave more sensible
variances and predicted 98% of the simulations within three standard deviations.
There are several reasons why one might be comfortable with 98% accuracy. In
a sample of 1000, one would expect some predictions to be this inaccurate anyway.
Furthermore, investigation of the parameters that led to poor predictions revealed
that they were mostly simulations that gave very high or very low values for f_1,
and the emulator was correctly predicting that these parameters would give extreme
values. The only failure of the emulator was in underestimating quite how extreme
these were. As shall soon be seen, the two main dangers of an inaccurate emulator
are predicting f_1 to be close to reality when it is in fact far away, or predicting f_1 to
be far from reality when it is in fact close. For this emulator, these cases occur only
very rarely.
As a summary of this validation, Fig. 2.2 shows an example of the predictions
(black line), three standard deviations (blue lines), and observed values (red line) for
predicting 100 parameter choices from the other 900. Observe that large differences
between predictions and observations do indeed occur when the adjusted variance
is high.
To conclude this section, note that, once the fit has been performed, the updates
in (2.3) and (2.4) are performed very quickly. Inverting Var(f_1(x_n)) need only
be done once, so for a particular candidate point x, the only computations are
calculating distances between x and x_n and then performing a matrix multiplication.
Thus, the emulator can be evaluated at very many locations. Even for a
simulator as fast as FUSE this is useful; for slower simulators it is invaluable.

Fig. 2.2 Emulator of f_1(x) (black line) with three standard deviation bounds (blue lines), together
with observed value of f_1(x) (red line), for 100 parameter choices, predicted from 900 other
simulator runs. The vertical axis shows the log maximum of stream flow; the horizontal axis
indexes the parameter choice being predicted. The horizontal dashed line represents the observed
log maximum of stream flow
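The "invert once, evaluate cheaply many times" pattern can be sketched as follows (a stripped-down emulator with zero prior mean; all names are illustrative):

```python
import numpy as np

class FastEmulator:
    """Bare-bones emulator illustrating the point above: the expensive
    inversion of Var(f_1(x_n)) happens once, at construction; each later
    prediction only computes distances to the design points and cheap
    matrix-vector products. Zero prior mean assumed for brevity."""

    def __init__(self, X, f, theta, var_u, nugget=1e-6):
        self.X, self.theta, self.var_u = X, theta, var_u
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = var_u * np.exp(-d2 / theta**2) + nugget * np.eye(len(f))
        self.Kinv = np.linalg.inv(K)      # the one-off expensive step
        self.alpha = self.Kinv @ f        # reused for every prediction

    def predict(self, x):
        d2 = ((self.X - x) ** 2).sum(-1)
        k = self.var_u * np.exp(-d2 / self.theta**2)
        mean = k @ self.alpha                      # adjusted expectation
        var = self.var_u - k @ self.Kinv @ k       # adjusted variance
        return mean, var
```

Once constructed, `predict` can be called at millions of candidate inputs, which is what makes emulator-based history matching over large input spaces feasible.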
7 Model Discrepancy
A physical model is a description of the way in which system properties (the inputs
to the model) affect system behavior (the outputs of the model). This description
involves two basic types of simplification.
Firstly, the properties of the system are approximated, as these properties are
too complicated to describe fully, and anyway they are not completely understood.
Secondly, the rules for finding system behavior, given system properties, are
approximated, because of necessary mathematical simplifications, simplifications
for numerical tractability, and because the physical laws which govern the process
are not fully understood.
Neither of these approximations invalidates the modeling process. Problems only
arise when these simplifications are forgotten, and the uncertainty analysis of the
simulator is confused with the corresponding uncertainty analysis for the physical
system itself. As a basic principle, it is always better to recognize than to ignore
uncertainty, even if modeling and analysis of such uncertainty is difficult and partial.
Thus, it is important to recognize that models do not produce statements about
reality, however carefully they are analyzed. Such statements require structural
uncertainty assessment, taking into account the mismatch between the simulator and
the physical system.
One may distinguish two types of model discrepancy, internal and external. The
term internal discrepancy is used to refer to any aspect of structural discrepancy
whose magnitude can be assessed by experiments on the computer simulator.
Internal discrepancy analysis gives a lower bound on the structural uncertainty that
must be introduced into the simulator analyses.
How internal discrepancy is assessed depends on the access that one has to
the simulator code and one's ability and willingness to carry out the relevant
experiments. For example, many simulators assess, at each time point, a state vector
from which the system outputs are evaluated, and then propagate the state vector to
the next time point by various system rules. If it is straightforward to access the state
vector, then it may be possible to introduce an additional stochastic term into the
state vector propagation to express uncertainty as to the principles underlying this
process. In the flood example, the forcing functions and initial conditions are varied.
Also assessed is the effect of allowing various simulator parameters, considered
constant within the simulator, to vary slowly over time.
The term external discrepancy relates to all of those aspects of structural discrepancy
that have not been addressed by one's assessments for internal discrepancy. In
particular, it relates to the inherent limitations of the modeling process embodied in
the simulator, based on missing or misconceived aspects of the physical simulator.
There are no experiments on the simulator which may reveal the magnitude of this
where one might model one's judgements relating B and B* in a similar manner
to (2.9) and specify a correlation function relating u(x) and u*(x), while treating
u*(x, w), with additional parameters, w, as uncorrelated with the remaining terms
in the emulator. For discussion and illustration of reification, including a treatment
of structured reification which incorporates systematic modeling for all aspects of
simulator deficiency whose effects one wishes to represent explicitly, see [9].
This article uses a simple method for estimating internal discrepancy, following
a similar approach to [10]. The strengths of this method are that it is easy to
apply, does not involve any complicated theory or calculations, can incorporate
expert knowledge where available but can be performed without any, is trivially
parallelizable, and does not require a huge number of simulator runs (although it
would still not be suitable for a very slow simulator). The main disadvantage is
that the estimate derived is likely to be crude, as shall be seen in later sections.
The core idea is to identify aspects of internal discrepancy, make suitable
perturbations of these aspects, run the simulator for each perturbation, and assess
the variability of this output. The aspects perturbed in this example were the rainfall
and evapotranspiration time series, the initial condition, and the parameter vector.
The details of the perturbations themselves can be found in the Appendix. Once the
perturbation method is determined, the procedure used is as follows.
The justification for taking the maximum rather than, say, the mean, is that the
estimate for internal discrepancy should be conservative, because it could be that x*
is in a region of high internal discrepancy. Indeed, it could be argued that even the
maximum is not conservative enough, since only a small number of parameters have
been sampled. In this example, the authors' experiences with the model suggested
that the region of high internal discrepancy was unlikely to include x*, so the
maximum was considered to be acceptable.
Running such perturbations for n = 50 and m = 300, the log maximum of
discharge yielded a maximum internal discrepancy variance of 0.124². Observe that
this is very high relative to the emulator variances from earlier. In practice, this will
often be the case; later, one way of refining this estimate is considered.
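The perturbation procedure can be sketched generically; the simulator and perturbation scheme below are toy stand-ins (not FUSE and not the actual perturbations of the Appendix):

```python
import numpy as np

def internal_discrepancy(simulator, params, perturb, m, rng):
    """For each sampled parameter choice, run m perturbed simulations and
    record the variance of the output; return the maximum variance across
    parameter choices, following the conservative procedure in the text."""
    variances = []
    for x in params:
        outputs = [simulator(perturb(x, rng)) for _ in range(m)]
        variances.append(np.var(outputs))
    # Maximum rather than mean across parameter choices: x* may lie in a
    # region of high internal discrepancy, so err on the conservative side.
    return max(variances)

# Toy stand-ins, purely illustrative:
rng = np.random.default_rng(1)
sim = lambda x: np.sin(5.0 * x[0]) + x[1] ** 2
perturb = lambda x, rng: x + rng.normal(scale=0.02, size=x.shape)
params = rng.uniform(size=(50, 2))   # n = 50 parameter choices
var_id = internal_discrepancy(sim, params, perturb, m=300, rng=rng)
```

Each parameter choice is handled independently, which is what makes the procedure trivially parallelizable, as noted above.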
9 History Matching
A typical Bayesian approach to the analysis of relations (2.1) and (2.2) treats the
quantity x* as a true but unknown parameter, specifies a prior distribution for x*,
and carries out a simulator calibration exercise to derive the posterior distribution of
x* given z. However, in many problems, for example, the rainfall runoff simulator
being used as an illustration, it may not be believed that there is a unique true but
unknown choice for x*. Indeed, it is often possible that there are no good choices
for x*, due to deficiencies in the simulator.
A simple alternative to calibration is history matching. This method describes
the identification of all input choices x for which the match of the simulator to
the data is judged to be consistent with the uncertainty expressed in relations (2.1)
and (2.2). Sometimes, this analysis will be sufficient for the problem at hand. At
other times, it may serve as a precursor to a more traditional simulator calibration.
As full Bayes calibration analysis is technically difficult and non-robust for large
and complex simulators, it is usually helpful to identify the region of the input space
which offers acceptable fits to the data, firstly, to act as a check on the simulator,
by testing whether there are any input values for which the simulator gives an
acceptable match to the data, and, secondly, to (massively) reduce the search space
for the Bayesian algorithm.
The history match is carried out by assessing, for each x, the number of standard
deviations between z and f(x). If x is one of the members x_i of the evaluation
set, x^n = (x_1, …, x_n), then the value of f(x) is known, so the variance of
(z − f(x)) is the sum of the observation variance and the discrepancy variance.
For any other value of x, z must instead be compared to the emulator mean for
f(x), and the difference, (z − E(f(x))), has variance V(x) given by the sum of the
observation, discrepancy, and emulator variances.
For each value x, the number of standard errors between z and E(f(x)) can be
assessed. This distance is expressed as an implausibility measure. For example,
taking a single component f_i(x) corresponding to data output z_i, a potential
implausibility measure is

    I_(i)(x) = (z_i − E(f_i(x)))² / V_i(x),    (2.12)

where V_i(x) is the corresponding sum of observation, discrepancy, and emulator
variances.
Large values of I_(i)(x) suggest that it is implausible that this choice of x will
give an acceptable representation of the observed data. Small values of I_(i)(x) are
consistent with the possibility that x will offer an acceptable representation, but
could simply correspond to very large values of the emulator variance, so these
values may not necessarily be viewed as plausible. Because the emulator is fast
to evaluate, the implausibility measure is correspondingly fast to evaluate as x is
varied.
The implausibility calculation can be performed on individual outputs or by using
the corresponding multivariate calculation over subvectors. The implausibilities are
then combined. For example, one might assess I_M(x) = max_i I_(i)(x) and consider
that an input x would not be an acceptable choice if it gave a poor fit to any of
the outputs of interest. It is a matter of judgement as to the magnitude of I_M(x) that
is considered unacceptable. There are various probabilistic heuristics that may be
used as guidance in this judgement, for example, the 3σ rule, which states that
for any continuous, unimodal distribution, at least 95% of the probability lies within
3 standard deviations of the mean (see [14]).
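As a concrete sketch of this calculation, the code below evaluates the squared implausibility for a few candidate inputs and applies the 3² cutoff used later in the chapter; the emulator means and variances are made-up numbers, not output from the chapter's emulator.

```python
import numpy as np

def implausibility(z, emu_mean, emu_var, obs_var, disc_var):
    """Squared implausibility I(x) = (z - E(f(x)))^2 / V(x), with V(x) the sum
    of observation, discrepancy, and emulator variances."""
    return (z - emu_mean) ** 2 / (obs_var + disc_var + emu_var)

z1 = 2.89                                # observed log maximum discharge
obs_var = (1 / 30) ** 2                  # gauge measurement error
disc_var = 0.124 ** 2 + (1 / 30) ** 2    # internal + external discrepancy

# Hypothetical emulator summaries at three candidate inputs
emu_mean = np.array([2.95, 3.80, 2.70])
emu_var = np.array([0.01, 0.02, 0.05])

I = implausibility(z1, emu_mean, emu_var, obs_var, disc_var)
non_implausible = I <= 3 ** 2            # keep inputs passing the 3-sigma-style cutoff
```

With several outputs, the same function would be evaluated per output and the results combined, e.g., via I_M(x) = max_i I_(i)(x).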
24 M. Goldstein and N. Huntley
Most of the components needed in (2.12) are now in place. The emulator gives
E_{f_1(x^n)}(f_1(x)) for any x. From the data, z_1 = 2.89. V_1(x) is the sum of the
emulator variance (calculated earlier), the measurement error (assessed as (1/30)²
for the gauges in question), and the model discrepancy. This final component is
a sum of the internal discrepancy (estimated as 0.124² earlier) and the external
discrepancy, assessed informally as the same as the measurement error. Observe
that the internal discrepancy is much higher than the external discrepancy and the
measurement error, so the exact choices for these two quantities should be expected
to have relatively minor influence on the history match.
Visualizing history matching is difficult in many dimensions, but useful two-
dimensional plots of implausibility can be drawn. Consider a grid on the five-
dimensional active parameter space, with say 50 grid points in each dimension,
giving 50⁵ points in total. For each x in this grid, I_1(x) can be calculated and
compared to 3². Now consider the grid for only two of these five parameters. Each
point in this grid corresponds to 50³ points in the full grid. For each such point,
the proportion of non-rejected parameters can be computed and these proportions
can be plotted. The plots in Fig. 2.3 show such an approach, using parameter pairs
x_(1) and x_(22), and x_(19) and x_(22). A location is colored red if most parameter
choices at this pair are rejected and colored white if most are accepted. This allows
Fig. 2.3 Proportions of parameter space rejected at each pair of (x_(1), x_(22)) and (x_(19), x_(22)). Red
represents almost all points rejected, and white represents almost all points accepted
visualization of the shape of the non-implausible region or regions. Note that this
procedure involves very many calculations and requires the emulator: running the
simulator at all these grid points would not be feasible even for FUSE.
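The projection just described can be sketched as follows; the implausibility function, the coarse grid size, and the dimension split are stand-ins chosen for illustration, not the chapter's emulator.

```python
import numpy as np

def implaus(x):
    # Stand-in for an emulator-based I_1(x) on the unit cube
    return 40.0 * np.sum((x - 0.3) ** 2, axis=-1)

m = 10                                  # grid points per dimension (50 in the text)
g = np.linspace(0.0, 1.0, m)
# Full grid over five active parameters, flattened to shape (m**5, 5)
X = np.stack(np.meshgrid(*[g] * 5, indexing="ij"), axis=-1).reshape(-1, 5)
rejected = implaus(X) > 3 ** 2          # compare each I_1(x) to 3^2

# Project onto the first two parameters: proportion rejected among the m**3
# points sharing each pair of values
prop_rejected = rejected.reshape([m] * 5).mean(axis=(2, 3, 4))
# prop_rejected[i, j] near 1 would be colored red, near 0 white
```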
The next step is to apply the standard method of refining the non-implausible
region through successive waves of emulation and history matching. A new
hypercube of n_2 = 2000 parameter choices was generated, and for each
x in this hypercube, I_1(x) from (2.12) was calculated. Any x with I_1(x) > 9
was removed. This leaves 1087 non-implausible parameter choices, at which the
simulator is run. From this, a new emulator is built for the reduced parameter
space (or, if the non-implausible space were several regions, an emulator would
be built for each region; this does not occur in this example). This new emulator
takes into account only the behavior of the simulator in the reduced space and so
will usually be a superior fit to the global emulator. Given a new set of parameters,
the original emulator can be used to discard some implausible parameter choices
and then the new emulator used to discard even more. In this particular example,
however, the second emulator has relatively little impact: from a third hypercube
of 2000 parameters, 915 were discarded using the first emulator, and 44 more were
discarded using the second emulator. This suggests that the possibilities of f_1 have
been exhausted for the moment.
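The wave structure just described can be sketched as a loop; everything here (the stand-in simulator, the crude linear-regression "emulator", and the variances) is invented for illustration and is far simpler than the chapter's actual emulators.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(x):
    # Stand-in for a slow simulator output f_1 over a 2D input space
    return 3.0 * x[:, 0] - 2.0 * x[:, 1] + 0.1 * np.sin(8.0 * x[:, 0])

z = 1.0                                   # "observed" value
var_obs_disc = 0.05 ** 2 + 0.02 ** 2      # observation + discrepancy variance

def fit_emulator(X, y):
    # Crude linear-regression emulator with a constant residual variance
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid_var = float(np.mean((A @ beta - y) ** 2))
    return lambda Xn: (np.column_stack([np.ones(len(Xn)), Xn]) @ beta, resid_var)

space = rng.uniform(0.0, 1.0, size=(2000, 2))   # wave-1 hypercube of candidates
for wave in range(3):
    design = space[:200]                  # points actually run on the simulator
    emu = fit_emulator(design, simulator(design))
    mean, emu_var = emu(space)
    I = (z - mean) ** 2 / (var_obs_disc + emu_var)
    space = space[I <= 9]                 # retain the non-implausible region
    if len(space) == 0:
        break
```

Each wave refits only on the surviving region, mirroring the way the second emulator in the text discards points the first could not.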
10.1 Introducing f_2
As expected, very few extra points were removed in the previous step. The next step
is to bring back the quantity f_2, the total stream flow over the period of interest;
recall that this was initially difficult to emulate. Now, in the reduced space, a cubic
fit is possible, giving an R² of 0.86, with inactive parameters x_(4), x_(6), and x_(9). This is
sufficient to perform a history match. Assessing the nugget effect of these inactive
parameters, however, suggested a high nugget variance. That is, one or more of the
supposedly inactive parameters has a noticeable influence. Reactivating parameter
x_(6) improved this, although the nugget variance in this case is still not much below
the measurement error, which is a concern. Reactivating other parameters, however,
resulted in either a very high emulator variance or a very poor cross-validation
result, depending on the choice of θ₂. It was therefore decided to accept the nugget,
with parameters x_(4) and x_(9) inactive.
Some new quantities are required: a new measurement error (estimated as around
42.6², using the assessment of measurement error at a single observation, and a
similar argument for correlation as was used for the rain gauge measurement error),
internal discrepancy (estimated as 251² in the same manner as that for f_1), and a new
external discrepancy (assessed informally as (200/3)²). The observed quantity z_2 is
2321. Observe that the measurement error is smaller relative to the observation than
for f_1; this is a common feature of using total discharge rather than the maximum. With
an emulator, observed data, and all relevant discrepancy estimates, I_2(x) from (2.12)
can again be calculated for any new x.
A new hypercube of n = 2500 parameters was generated, and history matching
was run sequentially on f_1 and then on f_2. This yielded 1001 non-implausible
parameter choices, at which the simulator was run. Using these 1001, another
emulator was built for f_2, giving a total of four emulators. In this new emulator,
parameter x_(9) became active, and the nugget effect now dropped to a near-
insignificant value. Finally, a history match on 2800 new points with each emulator
was performed, yielding 1086 non-implausible points. Leaving out the fourth
history match would have yielded 1109 non-implausible points, so it appears that
the limit of history matching on f_2 has been reached as well.
As an example, the plots from Fig. 2.3 are extended to cover each wave of history
matching; this is shown in Figs. 2.4 and 2.5.
2 Bayes Linear Emulation, History Matching, and Forecasting for …
Fig. 2.4 Proportions of parameter space rejected at each pair of (x_(1), x_(22)), over all four waves of
history matching. Red represents almost all points rejected, and white represents almost all points
accepted
Fig. 2.5 Proportions of parameter space rejected at each pair of (x_(19), x_(22)), over all four waves
of history matching. Red represents almost all points rejected, and white represents almost all
points accepted
Having again reached the limit of f_2, but still with a fairly large non-implausible
parameter space, the likely next step in practice would be to seek additional
summaries to emulate or perhaps now attempt emulation of the whole time series at
once. This would follow essentially the same pattern, so this example is not pursued
further in this way.
This approach can be used to investigate the influence of the various discrepancies.
One question is whether the external discrepancy is too low or conversely whether
the estimated internal discrepancy represents most of the model discrepancy. If
the history-matching procedure were returning no or very few non-implausible
parameters, this might be a sign that the external discrepancy was too low. In the
example, the non-implausible region is still large, so there is no particular reason to
increase the external discrepancy. Note that if a third summary f_3 were introduced,
and this rejected almost all points, then there might be reason to increase the external
discrepancy at earlier stages.
On the other hand, it seems likely given its relative size that the internal
discrepancy is the driving factor. To investigate this, the external discrepancy was
set to zero. A history match under these conditions yielded 1048 non-implausible
points, compared with 1086 with external discrepancy included. As expected, these
numbers are close. Given that the estimate of internal discrepancy was quite crude,
but very influential, this quantity would seem to need more attention.
There is noticeable variation in the 20 estimates for internal discrepancy calculated
earlier. Using the maximum of the 20 estimates at all points could therefore
be significantly overestimating the internal discrepancy at many points (and
underestimating at some). This could be overcome by another use of emulation. The
results of the internal discrepancy experiments can be viewed as a slow computer
simulator, and so the emulation framework could be used to predict the result of
those experiments at any other parameter choice. This would provide an expected
value and a variance for the internal discrepancy variance at any parameter choice
x, which would be used in the calculation of I_(i)(x). Pursuing this idea is outside the
scope of this article.
One final investigation is to suppose that the simulator is perfect, that is, that
the measurement error, internal discrepancy, and external discrepancy are all zero.
Running the history-matching procedure rejects all but 3% of the parameter space.
This is very small compared with the reductions already seen. The conclusion is that
the emulators are powerful, and it is the intrinsic uncertainties in the simulator and
observations that are restricting the ability to remove parameter choices.
11 Forecasting
Consider the prediction of future system outcomes, y_p, from data on past system
outcomes, z. In order to make a Bayes linear forecast for y_p given z, the second-
order specification relating these two vectors must be constructed; z is linked to y_p
through relations (2.1) and (2.2). This specification divides into two parts. The first
part arises from uncertainty as to the simulator values and the appropriate choice
for x*. The mean and variance of f(x), for any x, are obtained from the mean and
variance functions of the emulator for f. Using these values, the mean and variance
of f(x*) can be computed by first conditioning on x* and then integrating out x*,
over the input region identified by history matching. These values are combined with
the second-order specification for the simulator discrepancy, derived, typically,
as a combination of internal and external discrepancies. Adding the observational
variance for z gives the joint second-order structure required to obtain, from (2.3)
and (2.4), the Bayes linear adjusted mean and variance for y_p given z; for details,
see [4].
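The adjusted mean and variance referred to here have the standard Bayes linear (resolvent) form; the sketch below implements it with a small made-up second-order specification for (y_p, z), not numbers from the chapter.

```python
import numpy as np

def bayes_linear_adjust(E_y, E_z, V_y, V_z, C_yz, z):
    """Bayes linear adjustment of y by z:
       E_z(y)   = E(y) + Cov(y, z) Var(z)^{-1} (z - E(z))
       Var_z(y) = Var(y) - Cov(y, z) Var(z)^{-1} Cov(z, y)"""
    K = C_yz @ np.linalg.inv(V_z)
    return E_y + K @ (z - E_z), V_y - K @ C_yz.T

# Hypothetical joint second-order specification for a scalar y_p and 2D z
E_y, E_z = np.array([2.9]), np.array([2.8, 3.1])
V_y = np.array([[0.20]])
V_z = np.array([[0.25, 0.05],
                [0.05, 0.30]])
C_yz = np.array([[0.10, 0.08]])

adj_mean, adj_var = bayes_linear_adjust(E_y, E_z, V_y, V_z, C_yz, np.array([3.0, 3.0]))
```

Adjusting by data never increases the second-order uncertainty, so the adjusted variance is bounded above by Var(y).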
This analysis is fast and straightforward, even for high-dimensional systems. It
is sufficiently tractable to serve as a basis for a general forecasting methodology,
including suggesting informative designs for evaluating the simulator in order to
minimize forecast variance, careful sensitivity analyses to identify the impact on
forecast accuracy of the various modeling assessments that have been made, and
real-time control of the process.
This analysis is largely based on exploiting the global variation in the emulator
and is most effective when the local component of emulator variation is small. A
more detailed forecast, exploiting the local component of variation, can be made by
using the method of Bayes linear calibrated forecasting. In this approach, one must
construct a Bayes linear assessment x̂ for x* given z, evaluate the simulator at x̂ to
give the value f̂ = f(x̂), compute the covariance between f̂ and (z, y_p), and construct
the Bayes linear adjustment of y_p based on both z and f̂; for details, see [8].
The quantity chosen to forecast was the log maximum in the next rainfall period (the
period beginning in October in Fig. 2.1). Only the simplest approach to forecasting
is applied in this example. Note that across the non-implausible region, the simulator
typically predicts a log maximum in the range 2.5–3.5 for this period (the observed
value is 3.03), so this gives an idea of the best accuracy of prediction that can be
hoped for.
The first step is to build an emulator for this new quantity f_3 in the non-
implausible region. This proceeded as usual, using a cubic fit and θ₃ = 0.35.
With this emulator, along with estimates of the discrepancy for f_3 and a given
x*, a prediction for f_3(x*), and hence z_3, could be made. The same measurement
error and external discrepancy as for f_1 were used, and the internal discrepancy was
estimated as usual from the 20 internal discrepancy experiments.
So, given x*, the expected value and the variance of z_3 can be calculated. Since
x* is unknown, a uniform prior over the non-implausible region is assumed (since
there is no particular information to suggest a nonuniform distribution). Therefore,
x* can be integrated out to find the mean and variance of z_3, no longer conditional
on x*.
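Integrating x* out numerically can be done by simple Monte Carlo over the non-implausible region, using the laws of total expectation and variance; the conditional mean and variance functions below are invented stand-ins for the emulator-plus-discrepancy summaries of z_3.

```python
import numpy as np

rng = np.random.default_rng(1)

def cond_mean(x):   # stand-in for E(z_3 | x*)
    return 2.5 + 0.8 * x[:, 0] * x[:, 1]

def cond_var(x):    # stand-in for Var(z_3 | x*): emulator + discrepancy + measurement
    return 0.05 + 0.02 * x[:, 0]

# Uniform samples standing in for the non-implausible region
X = rng.uniform(0.0, 1.0, size=(100_000, 2))

m = cond_mean(X)
E_z3 = m.mean()                         # E(z_3) = E_x[E(z_3 | x)]
Var_z3 = cond_var(X).mean() + m.var()   # total variance: within-x + between-x
sd_z3 = np.sqrt(Var_z3)
```

The second term of `Var_z3` is the "variability within the non-implausible region" contribution discussed in the text.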
In the example, this gives E(z_3) = 2.93 with a standard deviation of 0.31.
As expected, the uncertainty on this estimate is high. Although the high internal
discrepancy contributes to this, the main problem is the size of, and variability
within, the non-implausible region; this contributes around 75% of the variance.
In practice, this would be a sign either to assess discrepancies more carefully and
history match again or to find new quantities to match.
13 Conclusion
Emulation and history matching give a powerful and tractable framework for
learning about the output of a computer simulator (through the simple statistical
approximation of the emulator) and good parameter choices at which to run the sim-
ulator (through the iterative procedure of history matching). Bayes linear analysis
provides a theoretical basis for performing these history matches without needing
a complex and computationally challenging full Bayes analysis, but still allowing
incorporation of prior judgements. This framework brings together judgements
about parameters and about the simulator and its shortcomings, training runs of
the simulator and of faster approximate simulators, and discrepancy estimates on
the simulator. As shown in the example, this can involve straightforward and
computationally inexpensive calculations to quickly reduce the non-implausible
space of parameters. Once this is completed, forecasting for the future follows
naturally.
The example contained simple estimations of both external and internal model
discrepancy. These calculations can be extended in several ways if a more careful
assessment of model discrepancy is required. As discussed earlier, the external
discrepancy can be explored more thoroughly by the introduction of the reified
model f*. The internal discrepancy calculations can be improved by emulating
the result of the internal discrepancy experiments, that is, by building a model for
the standard deviation (and possibly the expected value) of the internal discrepancy for
each parameter choice x. This would give low internal discrepancy in regions where
there is reason to believe it should be low, avoiding the problem encountered in the
example where the high internal discrepancies occurred in implausible regions of
the parameter space.
not be worth perturbing; if it has a small influence, it may be worth including but not
expending much effort on; if it has a large influence, it is worth carefully modeling.
The outcome of this exploration for FUSE suggested that the initial condition was
hardly relevant, the evapotranspiration was worth including, and the rainfall and
parameter perturbations deserved more attention.
The final step is to consider how to generate perturbations of each quantity. The
initial condition is ignored. For evapotranspiration, good estimates of observation
uncertainty are lacking, but given the low influence of this quantity this is not too
worrying: any plausible perturbation should be sufficient. Each evapotranspiration
observation was multiplied by a perturbation drawn from a log-normal distribution,
such that most observations were perturbed by no more than 10%. Correlation
between observations within 24 h was also included, so that if a particular observation
has a high perturbation, nearby ones do too. This is motivated by the daily period
of the evapotranspiration time series.
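One way to generate such perturbations is sketched below, under stated assumptions: hourly observations, a Gaussian correlation decaying over about a day, and a log-normal spread sized so that roughly 95% of multipliers stay within about ±10%. The exact forms used by the authors are not given in the text.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 240                                  # ten days of hourly observations
t = np.arange(n)
# Correlation decaying over roughly 24 h, so nearby perturbations move together
corr = np.exp(-(((t[:, None] - t[None, :]) / 24.0) ** 2))
sigma = np.log(1.1) / 2.0                # ~95% of multipliers within about +/-10%
cov = sigma ** 2 * corr

# Draw a correlated Gaussian vector via an eigendecomposition (robust to the
# near-singular covariance), then exponentiate to get log-normal multipliers
w, V = np.linalg.eigh(cov)
eps = V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(n))
multipliers = np.exp(eps)
# perturbed_evap = observed_evap * multipliers
```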
Parameter perturbations are performed by multiplying each initial parameter by
some random perturbation, with nearby multipliers being correlated. The size of the
perturbations was chosen such that the parameters rarely changed by much more
than 10% over the course of a simulation. This creates collections of perturbations
that cause the parameters to evolve slowly without sudden large changes and without
a large change overall. The parameter perturbations have a significant effect on the
output, but expert opinion about how these are likely to change over time and by how
much is lacking. In principle, in such a situation one should make the correlation and
the magnitude of the perturbations configurable parameters, so as to understand their
influence. For this example, however, this complication is avoided. An example of
the evolution of a particular choice of x.1/ for a particular perturbation can be seen
in Fig. 2.6.
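A perturbation of this type can be sketched as a slowly mixing multiplicative random walk; the autocorrelation, the innovation size, and the starting value (taken near the range shown in Fig. 2.6) are all illustrative guesses, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(3)

n_steps = 500
rho = 0.995            # nearby multipliers strongly correlated -> slow evolution
tau = 0.002            # small innovations -> rarely drifts much beyond ~10% overall

eps = np.zeros(n_steps)
for k in range(1, n_steps):
    eps[k] = rho * eps[k - 1] + tau * rng.standard_normal()

path = 2750.0 * np.exp(eps)   # evolving value of a parameter starting at 2750
```

The AR(1) structure gives small step-to-step changes and a bounded overall drift, matching the qualitative behavior described above.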
Perturbing the rainfall also has a significant effect on the output. In this case,
however, there is some more guidance on the perturbations required. Sources of
Fig. 2.6 The evolution of parameter x_(1) for a particular parameter perturbation
uncertainty in the rainfall were attributed to three significant causes: the local gauge
measurement error, the process of aggregating readings to the nearest hour, and the
process of averaging over the catchment by kriging. Suitable perturbations from
these errors were generated and combined.
The overall rainfall perturbations generated for this process typically display
occasional noticeable differences but mostly small differences. This suggests that
the rainfall error could contribute significantly to discrepancy for maximum stream
flow, but not so much for discrepancy for average stream flow.
References
1. Bastos, L.S., O'Hagan, A.: Diagnostics for Gaussian process emulators. Technometrics 51,
425–438 (2008)
2. Clark, M.P., Slater, A.G., Rupp, D.E., Woods, R.A., Vrugt, J.A., Gupta, H.V., Wagener, T.,
Hay, L.E.: Framework for Understanding Structural Errors (FUSE): a modular framework to
diagnose differences between hydrological models. Water Resour. Res. 44, W00B02 (2008)
3. Craig, P.S., Goldstein, M., Seheult, A.H., Smith, J.A.: Pressure matching for hydrocarbon
reservoirs: a case study in the use of Bayes linear strategies for large computer experiments
(with discussion). In: Gatsonis, C., et al. (eds.) Case Studies in Bayesian Statistics, vol. III,
pp. 37–93. Springer, New York (1997)
4. Craig, P.S., Goldstein, M., Rougier, J.C., Seheult, A.H.: Bayesian forecasting using large
computer models. JASA 96, 717–729 (2001)
5. Cumming, J., Goldstein, M.: Small sample Bayesian designs for complex high-dimensional
models based on information gained using fast approximations. Technometrics 51, 377–388
(2009)
6. de Finetti, B.: Theory of Probability, vols. 1 & 2. Wiley, New York (1974, 1975)
7. Goldstein, M.: Subjective Bayesian analysis: principles and practice. Bayesian Anal. 1,
403–420 (2006)
8. Goldstein, M., Rougier, J.C.: Bayes linear calibrated prediction for complex systems. JASA
101, 1132–1143 (2006)
9. Goldstein, M., Rougier, J.C.: Reified Bayesian modelling and inference for physical systems
(with discussion). JSPI 139, 1221–1239 (2008)
10. Goldstein, M., Seheult, A., Vernon, I.: Assessing model adequacy. In: Wainwright, J., Mulligan,
M. (eds.) Environmental Modelling: Finding Simplicity in Complexity, 2nd edn., pp. 435–449.
Wiley, Chichester (2010)
11. Goldstein, M., Wooff, D.A.: Bayes Linear Statistics: Theory and Methods. Wiley, Chich-
ester/Hoboken (2007)
12. Monteith, J.L.: Evaporation and environment. Symp. Soc. Exp. Biol. 19, 205–224 (1965)
13. O'Hagan, A.: Bayesian analysis of computer code outputs: a tutorial. Reliab. Eng. Syst. Saf.
91, 1290–1300 (2006)
14. Pukelsheim, F.: The three sigma rule. Am. Stat. 48, 88–91 (1994)
15. Santner, T., Williams, B., Notz, W.: The Design and Analysis of Computer Experiments.
Springer, New York (2003)
16. Vernon, I., Goldstein, M., Bower, R.: Galaxy formation: a Bayesian uncertainty analysis
(with discussion). Bayesian Anal. 5, 619–670 (2010)
17. Williamson, D., Goldstein, M., Allison, L., Blaker, A., Challenor, P., Jackson, L., Yamazaki, K.:
History matching for exploring and reducing climate model parameter space using observations
and a large perturbed physics ensemble. Clim. Dyn. 41(7–8), 1703–1729 (2013)
3 Inference Given Summary Statistics
Habib N. Najm and Kenny Chowdhary
Abstract
In many practical situations, where one is interested in employing Bayesian
inference methods to infer parameters of interest, a significant challenge is that
actual data is not available. Rather, what is most commonly available in the
literature are summary statistics on the data, on parameters of interest, or on
functions thereof. In this chapter, we present a general framework relying on the
maximum entropy principle, and employing approximate Bayesian computation
methods, to infer a joint posterior density on parameters of interest given
summary statistics, as well as other known details about the experiment or
observational system behind the published statistics. By essentially redoing the
experimental fitting using proposed data sets, the method ensures that the inferred
joint posterior density on model parameters is consistent with the given statistics
and with the model.
Keywords
Approximate Bayesian computation · Bayesian inference · Maximum entropy · Missing data · Sufficient statistic
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2 Mathematical Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.1 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2 Entropic Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3 Sufficient Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1 Introduction
Statistical inference methods are useful for estimation of uncertain model inputs
and/or parameters based on experimental measurements of observable model
outputs. Parameter estimation based on fitting to data can, of course, be done
using least-squares fitting methods [4, 15, 20]. Further, this fitting can, under certain
conditions such as Gaussian noise and model linearity in parameters, provide
measures of uncertainty in fitted parameters. However, more generally, when these
assumptions are relaxed, the most natural framework for uncertain parameter
estimation based on data is a statistical framework [5].
Further, the Bayesian viewpoint on probability theory [2, 18], and associated
statistical inference methods [6, 28], provide numerous advantages over alternate
frequentist methods. Bayesian inference methods involve an in-built requirement
for an explicit statement on prior knowledge. This requisite encapsulation of
prior information in the fitting process is often necessary due to the need for
regularization because of data sparsity, as is self-evident from the wide-scale use
of regularization in deterministic optimization studies. The Bayesian framework
incorporates regularization naturally through the prior [25, 31, 32]. Further, priors
provide a basis for sequential learning and for learning from multiple experiments
and heterogeneous data sources, among other uses. Bayesian methods also provide
well-founded means for dealing with nuisance parameters, hypothesis testing,
as well as model comparison/selection and averaging [9]. In the following, we
place ourselves squarely within the Bayesian probabilistic framework and more
particularly in the context of Bayesian inference for parameter estimation.
Given availability of measurement data on observable model outputs, there
is a wealth of literature on the practical use of Bayesian inference methods for
estimation of model inputs/parameters with quantified uncertainty [6,28]. However,
there are many practical situations where estimation of uncertain model parameters
is necessary, but no data is actually available. Rather, what is typically available to
the analyst is information in the form of data summaries. Our goal is to discuss the
estimation of a joint probability density function (PDF) on the parameters in the
absence of actual measurement data, but given such data summaries.
This is in fact a very typical scenario. A probabilistic characterization of
uncertain model parameters is not typically available in the literature for a wide
range of relevant models. Rather, one might find published information on nominal
values and error bars on model parameters. Alternately, one might find similar
¹It is worth noting that, presuming reported error bars in ln k(T) over a sufficient number of
temperature points and considering a Gaussian parameter PDF, the joint correlation structure of
p(A, n, E) has been inferred through other means [23]. However, neither of these assumptions is
required in the present approach.
a meaningful PDF on model parameters in the absence of actual data, but given
some set of summary statistics based on a missing data set.
This problem is in fact solvable with recourse to the maximum entropy (MaxEnt)
principle [7, 18]. In the absence of data, but given constraints on a distribution, the
MaxEnt density is that density which maximizes relative entropy while satisfying
the constraints. Maximizing entropy translates to maximizing uncertainty up to
available information. The application of the MaxEnt principle relies on an explicit
statement of the constraints in a partition function. This is not easily done in
the above scenarios of interest because of the complex nature of the constraints. For
example, we are not after a density that simply satisfies the stated nominals and
bounds, but rather one that satisfies these and is consistent with the fitted model
and the fitting that was done on the missing data to arrive at the given statistics.
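For orientation, recall the classical finite-constraint form of the MaxEnt solution (a standard textbook result stated here as background, with generic notation not taken from this chapter): maximizing entropy relative to a base measure m(x) subject to moment constraints gives an exponential-family density,

```latex
\begin{aligned}
&\max_{p}\; -\int p(x)\,\ln\frac{p(x)}{m(x)}\,dx
\quad\text{s.t.}\quad \int p(x)\,dx = 1,\;\; \int c_i(x)\,p(x)\,dx = \mu_i,\\
&\Longrightarrow\quad p(x) = \frac{m(x)\,\exp\!\big(\textstyle\sum_i \lambda_i\, c_i(x)\big)}{Z(\lambda)},
\qquad Z(\lambda) = \int m(x)\,\exp\!\Big(\sum_i \lambda_i\, c_i(x)\Big)\,dx,
\end{aligned}
```

where the Lagrange multipliers λ_i are chosen to satisfy the constraints and Z(λ) is the partition function mentioned above. The difficulty described in the text is that the constraints implied by published summary statistics do not reduce to simple moment conditions c_i, so this closed form is not directly usable.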
Rather, what is needed is a general computational procedure that allows the
estimation of a joint density on parameters of interest given arbitrary and complex
constraints ultimately stated as summary statistics. This is the procedure we outline
in this chapter. The construction of this data-free inference (DFI) procedure is
based on the MaxEnt principle, coupled with Approximate Bayesian Computation
(ABC) methods. In the following, we outline the mathematical formulation of the
method, followed by its general algorithmic structure. We then present illustrations
of its use in simple problems as well as more complex problems of practical
relevance.
2 Mathematical Setup
In this section, we will provide the mathematical background for understanding the
DFI procedure. We will start with a brief introduction to Bayesian inference and
how it relates to entropic inference, i.e., MaxEnt.
For the examples in this chapter, it will be sufficient to consider only finite-
dimensional random variables that live in a subset of Euclidean space. In general,
however, let (Ω, A, P) be the canonical probability triplet, where Ω is the sample
space, A the associated σ-algebra, and P the probability measure on the
σ-algebra A. Let X : Ω → R^d be a d-dimensional random variable which maps
the event space into a subset of finite-dimensional Euclidean space. The random
variable induces a probability triplet (R^d, B, μ), where B is the Borel σ-algebra
and μ ∈ P(R^d) is the measure induced by the random variable, where P(R^d) is the
space of all probability measures on R^d. In particular, we have μ(B) = P(ω ∈
X⁻¹(B)) for B ∈ B. When the probability measure μ is absolutely continuous
w.r.t. the Lebesgue measure, we can write the density of the random variable as
π(x), x ∈ R^d. From here on, we will assume that all measures introduced
in this chapter are absolutely continuous w.r.t. the Lebesgue measure, which means
that all probability measures will have an associated probability density function.
where y_true represents the noiseless observation of the model. For example, f(t; x)
could be a linear function of x, e.g., f(t; x) = t·x. Note that in many applications,
one can make a reasonable assumption on the model form for f(t; x), and the goal
of the inference is then to determine the model parameters. To illustrate this point
further, f(t; x) may be a polynomial in t, with x representing the coefficients. Inferring
x in this model is referred to as regression.
In the context of Bayesian inference, modeling x as a realization of the random
variable X ∈ R^d, the inference of x is the estimation of the posterior density on X
given (typically noisy) observations of y_true. For the purposes of forward uncertainty
quantification, one might also be interested in propagating the uncertain random
variable X through the model f(t; X) to obtain a distribution on the output. For the
purposes of this work, we will focus on the Bayesian parameter estimation problem,
where the goal is to characterize the posterior density on X given observations of
the model output.
As alluded to earlier, one does not directly observe measurements in the form of $y_{\mathrm{true}}$. Instead, one observes the true model perturbed by some instrument error/noise. The Bayesian procedure now proceeds as follows. First, let $\pi(x) \in \mathcal{P}(\mathbb{R}^d)$ be a prior density on $X$, which represents our prior belief about the uncertain parameters [18]. Next, the likelihood, or probability that the data came from a model $y$ (not to be confused with $y_{\mathrm{true}}$) with uncertain parameter $X = x$, will be denoted by $\pi(y|x) \in \mathcal{P}(\mathbb{R}\,|\,\mathbb{R}^d)$ and is, not surprisingly, referred to as the likelihood density. Now we have all the ingredients to apply Bayes' rule. The posterior density on the uncertain random variable $X$ is given by the following ratio:

    \pi(x|y) = \frac{\pi(y|x)\,\pi(x)}{\int_{\mathbb{R}^d} \pi(y|x)\,\pi(x)\,dx}.    (3.2)
The denominator is the evidence, or the marginal likelihood, being the integral
of the likelihood over the prior-distributed parameters. For purposes of parameter
estimation, it is sufficient to treat the evidence simply as a normalization constant.
Thus, there is no need to evaluate the marginal likelihood integral.
For convenience, it is useful to consider the following observation model, where the noise is additive Gaussian. Thus, we write

    y = f(t; x) + \epsilon    (3.3)

    \pi(x|y) \propto \prod_{i=1}^{n} \pi(y_i|x)\,\pi(x).    (3.5)
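As a concrete illustration of Eqs. (3.3) and (3.5), the following sketch evaluates the unnormalized posterior on a grid for a hypothetical linear model $f(t; x) = t\,x$ with a known noise standard deviation. The noise level, prior support, and all numerical values here are assumptions made for the example, not taken from the chapter:

```python
import numpy as np

# Hypothetical data model y = t*x + noise, with known noise std sigma.
rng = np.random.default_rng(0)
x_true, sigma = 1.5, 0.2
t = np.linspace(0.1, 1.0, 10)
y = t * x_true + sigma * rng.standard_normal(t.size)

# Evaluate the unnormalized posterior pi(x|y) ~ prod_i pi(y_i|x) * pi(x)
# on a grid, with a flat prior on [0, 3].
x_grid = np.linspace(0.0, 3.0, 601)
dx = x_grid[1] - x_grid[0]
log_like = np.array([-0.5 * np.sum((y - t * x) ** 2) / sigma**2
                     for x in x_grid])
post = np.exp(log_like - log_like.max())  # stabilize before exponentiating
post /= post.sum() * dx                   # the "evidence" is just this constant

x_map = x_grid[np.argmax(post)]           # posterior mode, near x_true
```

Note that the evidence integral is never computed explicitly; normalizing the grid values plays exactly that role.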
Now that we have a density for $X$, given by $\pi(x|y)$, let us assume we can obtain samples from this density (by no means a trivial task, but one that will help us illustrate the next point). If we propagate those samples through the data model in Eq. (3.3), we obtain the posterior predictive density $\pi'(f(t; X) + \epsilon)$, where $X \sim \pi(x|y)$ and $\epsilon \sim N(0, 1)$ [11, 22]. The posterior predictive is useful as a means of predicting the data and is often employed as a diagnostic to evaluate the quality of the inference. On the other hand, for prediction purposes, in order to estimate model predictions with quantified uncertainty, what is required is to push forward $\pi(x|y)$ through the predictive model. This can be done by propagating random samples of $X$ through $f(t; x)$ alone, without the noise model. In this case, one obtains the pushed forward posterior density $\pi(f(t; X))$. Note the distinction between the pushed forward posterior and the posterior predictive. The posterior predictive propagates the uncertainty in $X$, as well as the noise $\epsilon$ (and any associated uncertainty in noise model hyperparameters), through the data model. The pushed forward posterior, most commonly associated with forward uncertainty quantification, propagates the uncertainty in $X$ through the predictive model $f(t; x)$. We will come back to these definitions after we discuss the idea of entropic inference and maximum entropy.
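The distinction between the pushed forward posterior and the posterior predictive can be checked with a few lines of code. Here the posterior samples are faked as Gaussian draws purely for illustration; the model $f(t; x) = t\,x$ and all numerical values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend posterior samples of X (in practice these come from MCMC).
x_post = rng.normal(1.5, 0.1, size=5000)
t = 0.5
f = t * x_post                           # hypothetical model f(t; x) = t*x

pushed_forward = f                       # pi(f(t; X)): parameter uncertainty only
posterior_predictive = f + rng.standard_normal(f.size)  # adds the N(0,1) data noise

# The predictive is always at least as spread out as the pushed-forward
# posterior: the noise variance adds to the model-output variance.
var_pf = pushed_forward.var()
var_pp = posterior_predictive.var()
```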
To motivate this transition, let us review the concept of Bayesian inference and how it relates to DFI. In the Bayesian procedure, one has the observation $y^*$ and a likelihood function, $\pi(y^*|x)$, which relates the data to the observation, measuring the likelihood of observing $y^*$ given that the uncertain random variable is $x$. In the DFI context, we no longer observe $y^*$ directly, but rather are given some function or summary statistic of $y^*$, call it $S(y^*)$, which may be a nonlinear, non-invertible mapping; e.g., $S(y^*)$ could be the variance of the observations $y^*$, or it could be the mean of the posterior distribution on $x$, $\pi(x|y^*)$. In this scenario, how do we calculate the likelihood for a particular realization $y^*$ of the observed data if we only know $S(y^*)$? In other words, how do we calculate $\tilde{\pi}(S(y^*)|x)$, where $\tilde{\pi}$ does not necessarily have the same form as $\pi$? The hope is that the likelihood can be written explicitly as a function of $S(y)$, but this is rarely the case. Alternatively, for each
$x$, one could sample $S(y)$, fit a probability density, and measure the likelihood of $S(y^*)$, but this could be very costly. A less costly and more accurate procedure integrates these two ideas: the idea is to approximate the likelihood with

    \tilde{\pi}(S(y)|x) \approx \pi(y|x)\,w(y),    (3.6)

where $w(y)$ is itself a probability density which weighs how close $S(y)$ is to the observed summary statistics. Essentially, this function holds information about the density of $S(y)$ w.r.t. the data space and is the workhorse of this procedure. For example, with $w(y) = \delta(S(y) - S(y^*))$, the approximation $\tilde{\pi}(S(y)|x)$ is nonzero only when $y = y^*$ (assuming $S$ is one-to-one). In the next section, we discuss how the Bayesian procedure can proceed with this new likelihood, given that we only have $S(y^*)$ and not $y^*$ directly.
In this section, we derive the Bayesian procedure from an entropic point of view [7], in which the DFI setting and the likelihood defined in Eq. (3.6) fit naturally. First, let $q(x)$ be a prior density on the parameters and $q(y|x)$ be the likelihood function relating the data and the parameters. Consider the joint density $q(y, x) = q(y|x)\,q(x)$ on both the data and parameters. Let $p(y, x)$ be some unknown joint density on the data and parameters, such that $p$ is absolutely continuous w.r.t. $q$. The relative entropy between $p$ and $q$ is given by

    E(p, q) = -D_{KL}(p \| q) = -\int \log\!\left( \frac{p(y, x)}{q(y, x)} \right) p(y, x)\,dy\,dx,    (3.7)

where $D_{KL}(p \| q)$ is the Kullback-Leibler (KL) divergence from $p$ to $q$ [10]. $D_{KL}$ is a pseudo-metric which enjoys some useful properties, such as convexity in both $p$ and $q$, lower semi-continuity, and nonnegativity [10]. However, it should be noted that it is neither symmetric nor does it satisfy the triangle inequality, which is why it is a pseudo-metric rather than a complete metric. The idea of relative entropy originally came from Claude Shannon and was popularized by E.T. Jaynes [18, 26].
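A minimal numeric illustration of the KL divergence, its nonnegativity, and its asymmetry, for discrete densities (the example densities are arbitrary):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete densities p, q (arrays summing to 1)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # 0 * log(0/q) = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
# Nonnegative, zero iff p == q, but not symmetric (hence "pseudo-metric").
```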
We now claim that the posterior density, $q(x|y)$, is a member of the class of densities that maximizes $E(p, q)$ subject to a particular set of constraints. In the simplest case, where we have an observation $y^*$, the single constraint on the density is

    p(y) = \delta(y - y^*),    (3.8)

so that the data is constrained to be the observation $y^*$; but in general, this need not be the case. In fact, in our construction, our constraint can be written more generally as

Considering first the simple case with the constraint in Eq. (3.8), the feasible region for the relative entropy maximization problem is given by

    P_0 := \left\{ p(y, x) : p(y) = \delta(y - y^*),\ \int p(y, x)\,dy\,dx = 1 \right\},    (3.10)
To summarize, the problem defined in Eq. (3.11) is to find the joint density p.y; x/ 2
P0 which maximizes E.p; q/ or (equivalently) is closest to the joint prior q.y; x/ in
terms of DKL .pkq/.
In order to solve this optimization problem, one can use the method of Lagrange multipliers. Note that since the constraint is a function, the Lagrange multiplier must be infinite-dimensional. Denote the Lagrange multiplier by $\lambda(y) \in \mathbb{R}$. The Lagrange function is given by

    L(p, \lambda, \gamma) = E(p, q) + \gamma\!\left( \int p(y, x)\,dy\,dx - 1 \right) + \int \lambda(y)\left( p(y) - \delta(y - y^*) \right) dy.    (3.12)

Setting the variation of $L$ with respect to $p$ to zero yields

    p(y, x) = q(y, x)\,\frac{e^{\lambda(y)}}{Z},    (3.13)

where $Z$ absorbs the normalization multiplier. Marginalizing over $x$ gives

    p(y) = \int p(y, x)\,dx = q(y)\,\frac{e^{\lambda(y)}}{Z},    (3.14)

and imposing the constraint (3.8) requires

    q(y)\,\frac{e^{\lambda(y)}}{Z} = \delta(y - y^*).    (3.15)

Thus,

    p(y, x) = q(x|y)\,q(y)\,\frac{e^{\lambda(y)}}{Z} = q(x|y)\,\delta(y - y^*).    (3.16)
Thus, we are looking at joint densities for which $y \sim w(y)$, among the other constraints. Now, it can be shown (see [7]) that the joint density maximizing the entropy $E(p, q)$ is

Recall that marginalizing with the case $w(y) = \delta(y - y^*)$ brings us back to the original Bayes' rule; but in this case, since $w(y)$ may in general be any density, the analogous solution is computing the mean posterior density of $x$ over $w(y)$. Now the DFI algorithm starts to take form. The idea is to first compute consistent data sets $y \sim w(y)$ and then, for each consistent data set, compute the posterior $q(x|y)$. Finally, to obtain samples from $p(x)$, we marginalize the joint samples over $y \sim w(y)$. This is also called linear average pooling. One disadvantage of linear pooling is that it is not externally Bayesian [3, 12], i.e., Bayesian updating and pooling do not commute, with the result that each consistent posterior has to be computed individually from each consistent data set, and saved, before proceeding to pooling. This can be quite inefficient. A more computationally efficient option is logarithmic pooling, which is, as discussed below, externally Bayesian. In this context, we average/pool the posterior densities as follows:

    p(x) = \exp\!\left( \int \log(q(x|y))\,w(y)\,dy \right).    (3.21)

Intuitively, linear pooling of densities is similar to taking the union of the densities, whereas logarithmic pooling of densities is similar to taking their intersection.
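The union-versus-intersection intuition can be checked numerically. The sketch below pools two hypothetical unit-variance Gaussian "consistent posteriors" both linearly (a mixture) and logarithmically (a normalized geometric mean); for this Gaussian case the log pool is again a single Gaussian centered between the two:

```python
import numpy as np

x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Two hypothetical consistent posteriors q(x|y_1) and q(x|y_2).
q1 = gauss(x, -1.0, 1.0)
q2 = gauss(x, +1.0, 1.0)

# Linear (arithmetic) pooling: a mixture; union-like, keeps both modes' mass.
p_lin = 0.5 * (q1 + q2)

# Logarithmic (geometric) pooling: normalized sqrt(q1*q2); intersection-like.
p_log = np.sqrt(q1 * q2)
p_log /= p_log.sum() * dx

mean_log = (x * p_log).sum() * dx
var_log = ((x - mean_log) ** 2 * p_log).sum() * dx   # ~1: a single N(0,1)
mean_lin = (x * p_lin).sum() * dx
var_lin = ((x - mean_lin) ** 2 * p_lin).sum() * dx   # ~2: between-mode spread adds
```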
42 H.N. Najm and K. Chowdhary
    P_2 = \left\{ p(y, x) : p(y) = w(y),\ \int p(y, x)\,dy\,dx = 1,\ p(y, x) = p(y)\,p(x) \right\}.    (3.22)

Then, it can be shown that the maximum entropy solution is Eq. (3.21).
As already indicated, an important characteristic of logarithmic (or log, for short) average pooling is that it is externally Bayesian [3, 12]. This means one can log pool the individual posteriors or, equivalently, calculate the posterior based on the log pooled likelihoods. More formally, suppose we have $M$ samples from $w(y)$: $y_1, \ldots, y_M$. Each data sample has a respective posterior density $q(x|y_i)$ for $i = 1, \ldots, M$. We can write $q(x|y_i) = q(y_i|x)\,\pi(x)/q(y_i)$, where $q(y_i) = \int q(y_i, x)\,dx$ and $\pi(x)$ is a prior on the parameter space. For simplicity, let us assume that the prior for each posterior density is the same. If we let $T$ denote the logarithmic average pooling operator of the $M$ posterior densities, then it can be shown [12] that

    T(q(x|y_1), \ldots, q(x|y_M)) = T\!\left( \frac{q(y_1|x)\,\pi(x)}{q(y_1)}, \ldots, \frac{q(y_M|x)\,\pi(x)}{q(y_M)} \right)    (3.23)

                                  = \frac{T(q(y_1|x), \ldots, q(y_M|x))\,\pi(x)}{T(q(y_1), \ldots, q(y_M))}.    (3.24)
Thus far, we have introduced a Bayesian procedure to infer our uncertain model parameters $x$, where the likelihood $\tilde{\pi}(S(y)|x)$ is approximated as $\pi(y|x)\,w(y)$, and where $S$ might be some mapping of some complicated statistic on the data. Ideally, one would like to observe $y$ directly, since we have an analytic form for the data likelihood. So when is $\tilde{\pi}(S(y)|x)$ a good substitute for $\pi(y|x)$? The answer is: when $S(y)$ is a sufficient statistic for the posterior density on $x$. In this case, any data set $y$ that produces the same sufficient statistic, $S(y^*)$, as the true observed information will yield the same posterior density on $x$. In general, when using a likelihood-based inference, any two data sets with the same sufficient statistics for the underlying model parameters will yield the same inference on the posterior of those model parameters. This is due to the Fisher factorization theorem [21].
Let $p(y|x)$ be any likelihood function and let $S(y)$ be a sufficient statistic for the model parameter $x$. We will show that any set of data with the same value of $S(y)$ will yield the same inference, e.g., posterior, on $x$. By the Fisher factorization theorem, there exist nonnegative functions $p(S(y)|x)$ and $q(y)$ such that
Before we move on to the algorithm behind DFI, we need to spend some time discussing the workhorse of the procedure, i.e., sampling consistent data sets $y \sim w(y)$. Recall that we need to do this in order to approximate the likelihood which relates the data $y$ to the uncertain parameters $x$. Ideally, given an observed set of statistics $S(y^*)$, one would like to sample from $w(y) = \delta(S(y) - S(y^*))$, but this would be nearly impossible unless we can explicitly invert the operator $S$. Moreover, in most cases, e.g., if $S$ calculates the mean, this inverse problem is ill-posed and has infinitely many solutions. Instead, if we relax the constraint that $S(y)$ must exactly match $S(y^*)$, then the problem becomes well-posed again, although our likelihood becomes an approximation. This leads to Approximate Bayesian Computation (ABC) methods [1, 27].
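A minimal ABC rejection sampler illustrating this relaxation, under an assumed toy setup: the data are $N(x, 1)$ and the only published statistic is the sample mean. The prior, tolerance, and all values are hypothetical, and real DFI uses an MCMC chain rather than plain rejection:

```python
import numpy as np

rng = np.random.default_rng(2)

# We only see the summary statistic S(y) = mean(y) of the unpublished data.
n = 50
s_obs = 0.8       # published sample mean
eps = 0.05        # ABC tolerance: relaxes S(y) == S(y*)

accepted = []
for _ in range(20000):
    x = rng.uniform(-3, 3)                 # draw a candidate from the prior
    y = rng.normal(x, 1.0, size=n)         # simulate a data set
    if abs(y.mean() - s_obs) < eps:        # keep x if its statistic is close
        accepted.append(x)

accepted = np.array(accepted)
# The accepted x's approximate the posterior given only the summary statistic.
post_mean = accepted.mean()
```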
With ABC methods, one has to derive a reasonable way to measure the discrepancy between the observed statistics $S(y^*)$ and the sample statistic $S(y)$. For example, Berry et al. introduced a weight function [3] that penalized certain quantiles of the observed statistics, using the KL divergence pseudo-metric. We proceed in a similar manner. First, let us define the following subset of the data space:

where the right-hand side is the indicator function. Alternatively, we can approximate the data likelihood directly using $d$, i.e.,

where the right-hand side is at its maximum when $y = y^*$. We can also add a simulated annealing parameter $\lambda \geq 1$, where
3 Algorithmic Structure
where $p(x, y|I)$ is the MaxEnt joint posterior on data and parameters that satisfies the given constraints. The DFI procedure provides a means of estimating this marginal density without explicit estimation of the MaxEnt joint density. This is done by generating consistent data samples $\{y^1, y^2, \ldots, y^M\}$ and pooling the resulting parametric posteriors $\{p_1, p_2, \ldots, p_M\}$, where $p_i \equiv p(x|y^i)$ [3]. In this context, consistency is used to mean consistency with respect to the given information $I$; i.e., consistent data samples are samples from the data density $p(y|I)$. Further, as indicated above, pooling [3, 13] provides a means of combining information from different sources, in the form of consistent posteriors $p_i$, in a manner corresponding to the marginalization in Eq. (3.31), optionally with additional constraints. In particular, the above-introduced linear average pooling provides the pooled posterior density as the arithmetic average

    p_a = \frac{1}{M} \sum_i p_i,    (3.32)

while logarithmic pooling provides it as the geometric average

    p_g = \left[ \prod_{i=1}^{M} p_i \right]^{1/M},    (3.33)
where $S_k(\xi)$ is the $k$-th statistic on $\xi$ and $F_k(\cdot)$ is the associated kernel density, e.g., of the form

    F_k(S_k(\xi)) = F\!\left( \frac{w_k\,\rho(S_k, S_k^*)}{\delta_k} \right),    (3.36)

where $w_k$ and $\delta_k$ are scaling factors, $\rho(S_k, S_k^*)$ is a measure of distance between $S_k(\xi)$ and the requisite value $S_k^*$ of this statistic, and $F(\cdot)$ is the Gaussian kernel

    F(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}.    (3.37)
An outer single-site MCMC chain on the data space can be written simply as shown in Algorithm 1. In this algorithm illustration, being with no loss of generality a single-site MCMC algorithm, $s$ is the MCMC step index, $\xi^s$ is the $\xi$ at step $s$, and $\tilde{p}^s$ is the corresponding un-normalized posterior value, with

    \tilde{p}(\xi|I) \propto p(I|\xi)\,\pi(\xi),    (3.38)

and $v$ is a vector of variances for the normal proposal distribution used in the algorithm.
At any step in the outer chain, given $\xi^s = (y^s, \sigma^s)$, we run an inner MCMC chain on the parameter space to evaluate the requisite statistics $S_k(\xi^s)$, $k = 1, \ldots, K$. In general, we define any statistic $S(\xi)$ as a functional on the posterior density $p(\cdot|y)$ of the model parameters inferred given data $y$. Presuming some hyperparameters for the inner chain also, we define the state vector for the inner chain as $\theta = (\lambda, \sigma)$ and write the inner chain posterior as
where $\pi(\theta)$ is the prior on $\theta$, capturing prior belief on $\theta$, and $p(y|\theta)$ is the likelihood of data $y$ given the fit model and $\theta$. The formulation of the inner chain likelihood function depends on the presumed data model for the missing data from the experiment, involving both the fit model and a statistical model for the discrepancy between the model and the data. For example, presuming an additive zero-bias i.i.d. Gaussian noise model for the discrepancy, we have the data model

    y = f(\lambda) + \epsilon,\qquad \epsilon \sim N(0, \sigma^2),    (3.40)

    p(y^s|\theta) \equiv L^s(\theta) = \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(f(\lambda) - y_j^s)^2}{2\sigma^2} \right).    (3.41)
With this likelihood and the prior $\pi(\theta)$, any suitable MCMC chain construction can be employed to generate samples from the posterior $p(\theta|y^s)$. With these samples, the requisite statistics $S_k(\xi^s)$ can be evaluated. For example, statistics such as $E_{\theta|y^s}[\lambda]$, $\mathrm{Var}_{\theta|y^s}[\lambda]$, $E_{\theta|y^s}[f(\lambda)]$, or $\mathrm{Var}_{\theta|y^s}[f(\lambda)]$ can be easily evaluated and employed in the outer/data chain ABC likelihood to check consistency with associated given values.
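An inner chain of this kind can be sketched as a simple Metropolis sampler. The model $f(\lambda) = \lambda$, the fixed noise level, and all values below are simplifying assumptions made for the example (the chapter's inner chain also infers the noise parameter):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical inner-chain setup: data model y_j = f(lam) + eps,
# eps ~ N(0, sigma^2), with f(lam) = lam and sigma held fixed for brevity.
y = rng.normal(2.0, 0.5, size=20)
sigma = 0.5

def log_post(lam):
    # Log posterior up to a constant, with a flat prior on lam.
    return -0.5 * np.sum((y - lam) ** 2) / sigma**2

lam, lp = 0.0, log_post(0.0)
chain = []
for _ in range(5000):
    prop = lam + 0.3 * rng.standard_normal()   # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        lam, lp = prop, lp_prop
    chain.append(lam)
chain = np.array(chain[1000:])                 # discard burn-in

# Statistics S_k reported to the outer data chain:
s_mean = chain.mean()   # estimate of E[lam | y]
s_var = chain.var()     # estimate of Var[lam | y]
```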
With this, we have a number of consistent data sets $y^s$, and associated consistent posteriors $p(\theta|y^s)$, for $s = 1, \ldots, M$. A suitably pooled posterior can be estimated in a straightforward manner as outlined above, arriving at, e.g., an arithmetically or geometrically averaged posterior as in Eqs. (3.32) and (3.33). In particular, considering logarithmic pooling, which is externally Bayesian, one can, in order to avoid laborious parameterization of posterior densities, pool the data rather than explicitly pooling the posteriors. For example, for the above i.i.d. Gaussian likelihood in Eq. (3.41), we have

    p(\theta|y^{\mathrm{pool}}) \propto \left[ \prod_{s=1}^{M} \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(f(\lambda) - y_j^s)^2}{2\sigma^2} \right) \right]^{1/M} \pi(\theta).    (3.43)

Thus, a single MCMC chain on the parameters, with the full data vector $y^{\mathrm{pool}}$ given by $\{y_j^s\}$, $s = 1, \ldots, M$, $j = 1, \ldots, J$, and with the modified likelihood

    p(y^{\mathrm{pool}}|\theta) \equiv L^{\mathrm{pool}}(\theta) = \left[ \prod_{s=1}^{M} \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(f(\lambda) - y_j^s)^2}{2\sigma^2} \right) \right]^{1/M},    (3.44)

provides the logarithmically pooled posterior.
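The effect of the $1/M$ exponent in Eq. (3.44) is simply a $1/M$ factor on the pooled log-likelihood, which the following sketch demonstrates with stand-in "consistent" data sets and the simplest possible model $f(\lambda) = \lambda$ (all values are hypothetical, and the mode is located on a grid rather than by MCMC):

```python
import numpy as np

rng = np.random.default_rng(4)

# M consistent data sets of size J, data model y_j^s = lam + noise,
# with known sigma; these arrays stand in for DFI-generated data sets.
M, J, sigma = 5, 10, 0.3
y_pool = rng.normal(1.0, sigma, size=(M, J))

def log_L_pool(lam):
    # Sum of all M*J Gaussian residual terms; the 1/M exponent of
    # Eq. (3.44) becomes a 1/M factor on the log-likelihood.
    ll = -0.5 * np.sum((y_pool - lam) ** 2) / sigma**2
    return ll / M

# A single chain on lam with this likelihood yields the log-pooled
# posterior; here we just locate its mode on a grid.
grid = np.linspace(0, 2, 2001)
lam_hat = grid[np.argmax([log_L_pool(g) for g in grid])]
```

For this quadratic log-likelihood the mode is the grand mean of all pooled data, which the grid search recovers to within the grid spacing.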
Irrespective of the chosen pooling method, in the limit of sufficiently dense
sampling of the data space, the resulting pooled posterior is the sought-after
marginal MaxEnt density, being marginalized over the data. It is the density which
is consistent with all given information on the model, the experiment, and summary
statistics.
3.1 Remarks
The above DFI construction has been employed in a range of problems [3, 8, 24].
Before moving forward to a discussion of illustrative applications, we give in the
following some general remarks regarding the algorithm.
1. One of the key challenges in the algorithm is the potential high dimensionality of the data space. Leaving aside any hyperparameters, with $N$ data pairs $(x_i, y_i)$, the data chain is $2N$-dimensional. It is quite feasible to imagine many practical
problems with large N , with an associated high-dimensional MCMC chain on
the data. In this context, it becomes crucial that numerous MCMC details be
judiciously chosen to ensure accurate results with reasonable computational
efficiency. This includes careful choices of the chain starting point, priors on data
space, and the proposal distribution. We have found that data chain burn-in, in
particular, can be a significant computational burden in high dimensions. Further,
it is precisely because of the challenges in proposal distribution selection in high
dimensions that our above-illustrated MCMC chain on the data space relies on
single-site proposal structure.
2. The outer chain, by construction, is sampling along a manifold in data space,
where the manifold is defined by the given constraints. While the chosen ABC
kernel width adds a certain small width to the manifold, sampling along the
manifold in data space does remain a challenge. While it remains to be shown, it
is likely that geometric MCMC methods that take the geometry of the manifold
into consideration in adapting the proposal distribution would be useful in this regard.
4 Applications
4.1.1 Setup
Consider a nonlinear model problem that involves the chemical decomposition of an unstable radical, with initial concentration $c_0$ at time $t = 0$. Synthetic data pairs of time and concentration, $(t_i, y_i)$, are generated using a double exponential decay truth model $s(t)$, with a multiplicative data noise model:

    s_i = 2e^{-2t_i} + \frac{1}{2}e^{-\frac{1}{2}t_i},    (3.45)
    \sigma_i = \frac{1}{4}s_i + \frac{1}{20},
    y_i = s_i + \sigma_i\,\varepsilon_i,

where $\varepsilon_i$ are i.i.d. $N(0, 1)$ random variables and the $t_i$ are the time values in the set

    T = \{ i/2 : i = 1, \ldots, 10 \},    (3.46)

with $t_i \in [0, 5]$. Also, at each time point, 10 data points are generated, for a total of 100 data points; i.e., the experiment was performed ten times. Here, $s_i$ is the true signal, $\varepsilon_i$ is Gaussian white noise, and the $y_i$ are our observations.
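The synthetic data generation of Eq. (3.45) can be reproduced directly (the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# Double exponential truth model with multiplicative-plus-floor noise:
# s_i = 2 exp(-2 t_i) + (1/2) exp(-t_i/2), sigma_i = s_i/4 + 1/20,
# y_i = s_i + sigma_i * eps_i, with 10 replicates per time point.
t = np.arange(1, 11) / 2.0                       # t_i in {0.5, 1.0, ..., 5.0}
s = 2 * np.exp(-2 * t) + 0.5 * np.exp(-0.5 * t)  # true signal s_i
sig = s / 4 + 1 / 20                             # noise std sigma_i

reps = 10
eps = rng.standard_normal((reps, t.size))        # i.i.d. N(0, 1) noise
y = s + sig * eps                                # 10 x 10 = 100 data points
```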
A researcher observes these time-concentration data pairs and reports fitting a
single exponential decay model to this data (not knowing the true underlying double
exponential decay model). Furthermore, instead of publishing the data pairs, .ti ; yi /,
the researcher publishes summary statistics, namely, nominal values and error bars
on the predicted concentration from the fitted single exponential decay model at a set
of time instances [16]. This means that our S .y/ function will involve a complicated
and potentially computationally expensive procedure, which will be outlined in the
next two subsections.
As readers of this published report, we want to use the author's results and
produce detailed predictions with associated uncertainty estimates, i.e., we want
to know the full posterior distribution for the uncertain model parameters. We could
easily do this if the observed data pairs were published, but in this case, we have
no such luck. Thus, we need to infer consistent data sets w.r.t. the given summary
statistics, then use those data pairs to perform our own model fitting, and pool
those results to obtain an averaged posterior. Before we describe the procedure for
generating consistent data sets, let us describe the single exponential decay model
that the inference is based on. This will help determine the mapping $S(y)$. It should be emphasized that the fitted model and the procedure for obtaining the processed data products or summary statistics are presumably reported in the author's work.
    p(y|x, \sigma) \propto \left[ \prod_{i=1}^{n} \left( \sigma_1\,a e^{-k t_i} + \sigma_2 \right) \right]^{-1} \exp\!\left\{ -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{y_i - a e^{-k t_i}}{\sigma_1\,a e^{-k t_i} + \sigma_2} \right)^{2} \right\}.    (3.50)

Using improper uniform priors on $\sigma$, the posterior on $x$ and $\sigma$ is then given by

² We will assume that the author had an assumption of normality, hence their interpretation as 95 % quantiles.
Fig. 3.1 Summary statistics or processed data products provided in the published report. These
are error bars on the pushed forward posterior through the single exponential decay model [30]
where

    \bar{f}_i(y) = E_{p(X|y)}[f(t_i; X)] = \int_{\mathbb{R}^d} f(t_i; x)\,p(x|y)\,dx,    (3.53)

    \sigma^2_{f_i}(y) = E_{p(X|y)}\!\left[ (f(t_i; X) - \bar{f}_i)^2 \right] = \int_{\mathbb{R}^d} (f(t_i; x) - \bar{f}_i)^2\,p(x|y)\,dx.    (3.54)
The first term $w^{(1)}$ will try to force consistency between the data sets and the given pushed forward posterior means (contained in $S(y^*)$). Likewise, the second term $w^{(2)}(y)$ will try to force consistency between the data sets and the pushed forward posterior quantiles (interpreted as $\bar{f}_i(y) \pm 2\sigma_{f_i}(y)$, also contained in $S(y^*)$).
In order to check consistency between the data sets and the pushed forward posterior mean, we define the function $w^{(1)}(y) : \mathbb{R}^n \to \mathbb{R}$ as follows:

    w^{(1)}(y) = \prod_{i=1}^{N_t} \exp\!\left( -\lambda_1 \left( \bar{f}_i(y) - \bar{f}_i(y^*) \right)^2 \right),    (3.56)

or, equivalently,

    w^{(1)}(y) = \prod_{i=1}^{N_t} \exp\!\left( -\lambda_1 \left( \int_{\mathbb{R}^d} f(t_i; x)\,p(x|y)\,dx - \bar{f}_i(y^*) \right)^{2} \right).    (3.57)
It is clear that this weight function is enforcing the mean of the pushed forward posterior.

To enforce the variance or quantiles of the pushed forward posterior on model outputs, one can take a similar approach to (3.56) and simply replace $\bar{f}_i(y)$ with $\sigma_{f_i}(y)$. However, we choose a more robust approach outlined in [3], which performs a trinomial quantile check. Instead of comparing means and variances directly, which can be highly sensitive to sampling noise, i.e., via Monte Carlo estimation, we sample the pushed forward posterior of a sample data set $y$ and bin the samples according to the 2.5 %/95 %/2.5 % quantiles given by the summary statistics. This binning produces a trinomial density for the sample data set $y$. Ideally, the trinomial density will have 2.5 %/95 %/2.5 % probability masses in each bin. To check consistency, we use the relative entropy/KL divergence to compare the sample trinomial density, computed from the pushed forward posterior based on $y$, with the ideal trinomial density, with probability masses 2.5 %/95 %/2.5 %.
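A sketch of this trinomial quantile check: bin pushed-forward samples into the three regions implied by a reported mean and $\pm 2\sigma$ band, and score the empirical trinomial density against $Q = [0.025, 0.95, 0.025]$ with the KL divergence. The sample distributions and band values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)

def trinomial_kl(samples, lo, hi, Q=(0.025, 0.95, 0.025)):
    """Bin samples into (below lo, [lo, hi], above hi) and return the KL
    divergence of the empirical trinomial density P from the ideal Q."""
    n = samples.size
    P = np.array([(samples < lo).sum(),
                  ((samples >= lo) & (samples <= hi)).sum(),
                  (samples > hi).sum()], float) / n
    P = np.clip(P, 1e-12, None)            # avoid log(0) in empty bins
    Q = np.asarray(Q, float)
    return float(np.sum(P * np.log(P / Q)))

# Samples consistent with the reported mean +- 2 sigma band score near
# zero; shifted samples score much higher.
mu, s2 = 1.0, 0.2
good = rng.normal(mu, s2, size=20000)
d_good = trinomial_kl(good, mu - 2 * s2, mu + 2 * s2)
d_bad = trinomial_kl(good + 0.5, mu - 2 * s2, mu + 2 * s2)
```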
In order to properly define this pseudo-metric, we must first make the assumption that the data provided for the standard deviation can be equivalently interpreted as quantiles. That is, if we are given the mean $\mu$ and standard deviation $\sigma$, we can equivalently say that 95 % of the data falls within $[\mu - 2\sigma, \mu + 2\sigma]$, 2.5 % of the data falls below $\mu - 2\sigma$, and 2.5 % above $\mu + 2\sigma$. If the underlying distribution is normal, this assumption is justified. If this assumption cannot be justified, then using (3.56) with $\sigma_{f_i}$ instead of $\bar{f}_i$ will suffice. In the case that the processed data products are provided in the form of quantiles, we do not need to make any assumption at all, and the following approach is justified.

That being said, let us assume that the quantiles for the pushed forward posterior, $f(t_i; x)$, are given to us in the form of the bins

    \left( -\infty,\ \bar{f}_i(y^*) - 2\sigma_{f_i}(y^*) \right],\quad \left[ \bar{f}_i(y^*) - 2\sigma_{f_i}(y^*),\ \bar{f}_i(y^*) + 2\sigma_{f_i}(y^*) \right],\quad \left[ \bar{f}_i(y^*) + 2\sigma_{f_i}(y^*),\ \infty \right),    (3.58)
respectively. In other words, $g_i(y)$ measures the likelihood that the correct binning is achieved. To measure the mismatch, we use the KL density [3, 10]. In order to define $g_i(y)$, let us introduce the following probabilities. Let $p_{f(t_i;x)}(y)$ be the pushed forward posterior density at $t_i$. Then, define

    p_i(y) := \int_{y < \bar{f}_i(y^*) - 2\sigma_{f_i}(y^*)} p_{f(t_i;x)}(y)\,dy,    (3.59)

    q_i(y) := \int_{y > \bar{f}_i(y^*) + 2\sigma_{f_i}(y^*)} p_{f(t_i;x)}(y)\,dy.

Let $P_i(y) = [p_i, r_i, q_i](y)$ represent a probability density, with $p_i + q_i + r_i = 1$, which represents the fraction of samples that fall within the 2.5 %/95 %/2.5 % bins defined in (3.58). Ideally, we would like this density to be $Q = [0.025, 0.95, 0.025]$,

where increasing $\lambda_2 > 0$ acts to improve the match between $P_i$ and $Q$ by increasing the severity of the application of the constraint. Finally, we want this to hold true for $i = 1, \ldots, N_t$, which gives us the corresponding weight
    w^{(2)}(y) \propto \prod_{i=1}^{N_t} g_i(y) = \prod_{i=1}^{N_t} \exp\!\left( -\lambda_2\,D_{KL}(P_i(y) \| Q) \right).    (3.61)
Combining the weights $w^{(1)}$ and $w^{(2)}$ gives

    w(y) \propto \left[ \prod_{i=1}^{N_t} \exp\!\left( -\lambda_1 (\bar{f}_i(y) - \bar{f}_i(y^*))^2 \right) \right] \left[ \prod_{i=1}^{N_t} \exp\!\left( -\lambda_2\,D_{KL}(P_i(y) \| Q) \right) \right].    (3.62)
As we will see in the examples, it may be desirable to only match or penalize a subset of the provided quantiles. That is, let $B \subseteq \{1, \ldots, N_t\}$ and consider the reduced weight function

    \tilde{w}(y) \propto \left[ \prod_{i \in B} \exp\!\left( -\lambda_1 (\bar{f}_i(y) - \bar{f}_i(y^*))^2 \right) \right] \left[ \prod_{i \in B} \exp\!\left( -\lambda_2\,D_{KL}(P_i(y) \| Q) \right) \right].    (3.63)

Note that $\tilde{w}$ does not enforce a matching between all $N_t$ means and quantiles, but rather a smaller subset $B$ of those same means and quantiles. We now explain in the following how we employ these constraints to generate consistent data sets and arrive at the pooled posterior of interest.
³ Again, we do this to reduce the computational cost of generating data sets over multiple dimensions. Although we do not marginalize over the dimensionality, we show that for a reasonable choice of the dimension parameter $N = 5$, we can still recover consistent data sets.
Fig. 3.2 The black lines denote the true quantiles (means and 2.5 % and 97.5 % quantiles) based on the original raw data set (hidden from us, but displayed here for comparison purposes) at four different times. The green dots represent the means for each consistent data set generated using Algorithm 1 with $\lambda_{1,2} = 300$. Likewise, the red and magenta dots represent the upper and lower quantiles, respectively, for each consistent data set [30]
means and covariance of each individual sample versus the original posterior,
similar to what is displayed in Figure 3.2. Figure 3.5 displays the log and linear
pooled posterior from all 150,000 samples, superposed on the true posterior. In
this case, both pooled posteriors are very similar, since the generated data does not
exhibit much sample-to-sample variance in the posterior. More importantly, they
both agree very well with the true posterior, given (a) the correct interpretation of
the error bars in this case, and (b) the evident near sufficiency of the given statistics.
We also show, in Fig. 3.6, quantiles for consistent data when $\lambda_2 = 150$ instead of $\lambda_2 = 300$. As expected, the quantiles are less tightly consistent, but still acceptably so. We also repeated the experiment with the dimensionality $N = 10$ to simulate the true dimension of the data set. We found that the results were very similar to $N = 5$, so we omit them.
Fig. 3.3 The black lines denote the true posterior means from the raw data set, unbeknownst to us. The red dots represent the means for the model fit parameter after each consistent data set, generated using Algorithm 1, is fit to the single exponential decay model via a Bayesian approach [30]

Fig. 3.4 The black lines denote the true posterior covariance from the original (synthetic) data set, unbeknownst to us. The green dots represent the covariance for the model fit parameter after each consistent data set, generated using Algorithm 1, is fit to the single exponential decay model via a Bayesian approach [30]
Fig. 3.5 The true posterior, based on the original (synthetic) data set, is shown in black contours.
The linear and log pooled covariance is shown in the left and right images, respectively, using
Algorithm 1 when we have error bars on the pushed forward posterior [30]
Fig. 3.6 The black lines denote the true quantiles (means and 2.5 % and 97.5 % quantiles) based on the original, raw synthetic data set (hidden from us, but displayed here for comparison purposes) at four different times. The green dots represent the means for each consistent data set generated using Algorithm 1 with $\lambda_{1,2} = 150$. Likewise, the red and magenta dots represent the upper and lower quantiles, respectively, for each consistent data set [30]
Fig. 3.7 Ignition time computed with GRImech3.0 over a range of variation of initial temperature,
including also the corresponding noisy synthetic data points used for inference [17]
from the marginal parameter posteriors, we then proceed to the application of DFI,
forgetting for the moment both the data and the true posterior structure.
For the application of DFI, we presume 8 data pairs, as opposed to the true original number of 11, for illustration. We do this in order to accentuate the point that in an actual application with experimental data that is not reported, the number of missing data points may not necessarily be known. As we indicated earlier, one can in principle propose a density that encapsulates knowledge of the number of points. Here, we simply posit a number that is different from the truth. An analysis of the impact of different choices for the number of points, not reported here, suggests a small effect on the result of the inference in this particular situation.
The construction of the data chain likelihood relies on kernel densities that enforce the nominal values of $(\ln A, \ln E)$ and their 5 %/95 % quantiles. With the data chain state $\xi = (z, \ln \sigma_d)$, where $z$ is the vector of 8 data pairs and $\sigma_d$ is the standard deviation of the noise, we define the data chain likelihood as

    p(I|\xi) = w(z, \ln \sigma_d) = F(z)\,\frac{p(z, \ln \sigma_d\,|\,\lambda^0)}{\max_{\lambda, \sigma} p(z, \ln \sigma\,|\,\lambda)},    (3.64)

with

    F(z) \propto \prod_{i=1}^{2} f\!\left( [\,p_i(z),\ 1 - p_i(z) - q_i(z),\ q_i(z)\,] \,\middle\|\, [\,0.05,\ 0.90,\ 0.05\,] \right),    (3.67)
where

    p_1(z) = \int_{\ln A < (\ln A)_5} p(\lambda\,|\,z)\,d\lambda,\qquad q_1(z) = \int_{\ln A > (\ln A)_{95}} p(\lambda\,|\,z)\,d\lambda,    (3.68)

    p_2(z) = \int_{\ln E < (\ln E)_5} p(\lambda\,|\,z)\,d\lambda,\qquad q_2(z) = \int_{\ln E > (\ln E)_{95}} p(\lambda\,|\,z)\,d\lambda,    (3.69)

and

    f\!\left( [\,p, r, q\,] \,\middle\|\, [\,0.05, 0.90, 0.05\,] \right) = \exp\!\left\{ -\left( p \ln\frac{p}{0.05} + r \ln\frac{r}{0.90} + q \ln\frac{q}{0.05} \right) \right\},    (3.70)

being the KL density [3] formed of the KL divergence from the density $[\,p, r, q\,]$
Fig. 3.9 A segment of the DFI data chain, showing the MCMC sampled values of the data vector. In the bottom two frames, each line corresponds to the sampled value of each data point, $(T_i^o, \tau_i)$, $i = 1, \ldots, 8$ [17]
Fig. 3.10 A plot of the means, as well as the 5 % and 95 % quantiles from the parameter posteriors
from each data chain step, for all 50 chains, and for the last 500 steps from each chain. The left
frame shows those for ln A, while the right frame shows those for ln E. The solid lines indicate the
corresponding values from the reference posterior [17]
Fig. 3.11 Scatter plot of 50 chains, with the first 1500 burn-in states removed from each chain.
Only every 25th chain state is plotted for convenience [17]
do with the amount of given information, the nature of the summary statistics, and
the structure of the model. In this case, evidently, posteriors consistent with all this
given information are essentially equivalent to the missing posterior. In other words,
the available information constitutes a nearly sufficient statistic [6].
Finally, the statistical convergence of the algorithm is highlighted in Fig. 3.13.
Here, the convergence of the KL divergence between successive pooled posteriors,
with increasing data volume, is illustrated. The abscissa k is the number of chains pooled.
64 H.N. Najm and K. Chowdhary
Fig. 3.12 Two-dimensional marginal parameter posteriors on (ln A, ln E) resulting from pooling
50 chains of data (red), compared with the reference known posterior (black) [17]
Fig. 3.13 Plot showing the statistical convergence of the algorithm in terms of the KL divergence
(KLdiv) between successive pooled posteriors. Also shown is the scatter in each of the KLdiv
values. The red line indicates the evolution of the mean with increased number of chains [17]
3 Inference Given Summary Statistics 65
The results highlight the scatter of D_k, for any given k, resulting from 1024
combinatorial choices of k chains out of 50. The average divergence, however,
exhibits a roughly inverse linear dependence on k, such that D̄_k ∝ 1/k. Thus, with
additional data, we have an associated convergence of the pooled posterior.
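The 1/k trend can be reproduced with a small Monte Carlo sketch. Everything below is illustrative rather than the chapter's computation: standard-normal samples stand in for the DFI chains, and the pooled posteriors are summarized by moment-fitted Gaussians, for which the KL divergence has a closed form:

```python
import numpy as np

rng = np.random.default_rng(7)

def gauss_kl(m1, s1, m2, s2):
    # Closed-form KL( N(m1, s1^2) || N(m2, s2^2) )
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2.0 * s2**2) - 0.5

def pooled_kl(k, n=2000):
    # KL divergence between Gaussians fit to the k-chain and (k+1)-chain
    # pools; every synthetic "chain" samples the same N(0, 1) posterior.
    a = rng.normal(size=k * n)
    b = rng.normal(size=(k + 1) * n)
    return gauss_kl(a.mean(), a.std(), b.mean(), b.std())

# Averaged over repeats, the successive-pool divergence shrinks roughly as 1/k.
d1 = np.mean([pooled_kl(1) for _ in range(30)])
d16 = np.mean([pooled_kl(16) for _ in range(30)])
```

The closed-form Gaussian KL is never negative, so the decay of the averaged divergence with k is the only effect at play in this toy version.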
5 Conclusion
This chapter has focused on the context of Bayesian learning from summary
statistics based on missing data. We motivated this focus with practical challenges
faced in forward UQ given probabilistically incomplete published information on
uncertain model parameters, this being most commonly in the form of summary
statistics. We outlined a maximum entropy framework within which the problem can
be posed in general and discussed the relationship between the maximum entropy
and Bayesian data analysis frameworks. We also outlined a general algorithm,
relying on the use of Approximate Bayesian Computation methods for solving
the MaxEnt problem, providing sampling from the MaxEnt posterior on the joint
data-parametric space, and pooling/marginalizing over the data space to arrive at
the marginal posterior density on parameters of interest. Finally, we illustrated the
application of this algorithm, with synthetic data, on an algebraic model problem
as well as the practical context of estimation of Arrhenius reaction rate coefficients
for a global chemical model of methane-air oxidation, based on reported summary
statistics on these parameters.
Aside from these applications, the method has been used for estimation of rate
constants for an elementary H2/O2 reaction, namely, H + O2 = OH + O, based
on reported processed data summaries from a shock tube experiment [19]. In this
application in particular, the practical utility of having a meaningful joint density on
model parameters, rather than the default of ignoring such unreported correlations,
has been illustrated in terms of the associated impact on uncertainties in predicted
model observables. Similarly, the utility of the method in allowing the handling
of other previously measured uncertain model parameters as nuisance parameters,
and exploring the correlation among these and the parameters of interest, has been
studied.
In closing, it is useful to emphasize that, given the legacy of published literature
on physical models in general, where partial information is provided on uncertain
model inputs, largely composed of nominal values and error bars on each parameter,
with hardly ever any other information on correlations among parameters, let alone
the joint PDF on all parameters, there is a significant need for further development
and demonstration of statistical methods, such as the above DFI algorithm, that
can recover meaningful joint posterior densities on model parameters from such
incomplete information.
References
1. Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate Bayesian computation in population
genetics. Genetics 162(4), 2025–2035 (2002)
2. Bernardo, J., Smith, A.: Bayesian Theory. Wiley Series in Probability and Statistics. Wiley,
Chichester (2000)
3. Berry, R., Najm, H., Debusschere, B., Adalsteinsson, H., Marzouk, Y.: Data-free inference of
the joint distribution of uncertain model parameters. J. Comput. Phys. 231, 2180–2198 (2012)
4. Bevington, P., Robinson, D.: Data Reduction and Error Analysis for the Physical Sciences, 2nd
edn. McGraw-Hill, New York (1992)
5. Box, G.E., Hunter, J.S., Hunter, W.G.: Statistics for Experimenters: Design, Innovation, and
Discovery, 2nd edn. Wiley, New York (2005)
6. Carlin, B.P., Louis, T.A.: Bayesian Methods for Data Analysis. Chapman and Hall/CRC, Boca
Raton (2011)
7. Caticha, A., Preuss, R.: Maximum entropy and Bayesian data analysis: entropic prior distribu-
tions. Phys. Rev. E 70(4), 046127 (2004)
8. Chowdhary, K., Najm, H.: Data free inference with processed data products. Stat. Comput.
1–21 (2014). doi:10.1007/s11222-014-9484-y
9. Clyde, M.: Bayesian model averaging and model search strategies (with discussion). In:
Bernardo, J., Berger, J., Dawid, A., Smith, A. (eds.) Bayesian Statistics 6, pp. 157–185. Oxford
University Press, New York (1999)
10. Dupuis, P., Ellis, R.: A Weak Convergence Approach to the Theory of Large Deviations. Wiley-
Interscience, New York (1997)
11. Gelman, A., Meng, X.L., Stern, H.: Posterior predictive assessment of model fitness via
realized discrepancies. Statistica Sinica 6, 733–807 (1996)
12. Genest, C.: A characterization theorem for externally Bayesian groups. Ann. Stat. 12(3),
1100–1105 (1984)
13. Genest, C., Zidek, J.: Combining probability distributions: a critique and an annotated
bibliography. Stat. Sci. 1(1), 114–135 (1986)
14. Gregory, P.: Bayesian Logical Data Analysis for the Physical Sciences. Cambridge University
Press, Cambridge (2010)
15. Hansen, P.C., Pereyra, V., Scherer, G.: Least Squares Data Fitting with Applications. The Johns
Hopkins University Press, Baltimore (2013)
16. Hebrard, E., Dobrijevic, M.: How measurements of rate coefficients at low temperature
increase the predictivity of photochemical models of Titan's atmosphere. J. Phys. Chem. 113,
11227–11237 (2009)
17. IJUQ: Reprinted from Najm, H.N., Berry, R.D., Safta, C., Sargsyan, K., Debusschere, B.J.:
Data-free inference of uncertain parameters in chemical models. Int. J. Uncertain. Quantif. 4,
111–132 (2014); Copyright (2014); with permission from Begell House, Inc.
18. Jaynes, E., Bretthorst, G.L. (eds.): Probability Theory: The Logic of Science. Cambridge
University Press, Cambridge (2003)
19. Khalil, M., Najm, H.: Probabilistic inference of reaction rate parameters based on summary
statistics. In: Proceedings of the 9th U.S. National Combustion Meeting, Cincinnati (2015)
20. Lakowicz, J.R.: Principles of Fluorescence Spectroscopy, 2nd edn. Kluwer Academic, New
York (1999). Support plane analysis: see pp. 122–123
21. Lehmann, E., Casella, G.: Theory of Point Estimation. Springer Texts in Statistics. Springer,
New York (2003). https://books.google.com/books?id=9St7DCbu9AUC
22. Lynch, S., Western, B.: Bayesian posterior predictive checks for complex models. Sociol.
Methods Res. 32(3), 301–335 (2004). doi:10.1177/0049124103257303
23. Nagy, T., Turányi, T.: Determination of the uncertainty domain of the Arrhenius parameters
needed for the investigation of combustion kinetic models. Reliab. Eng. Syst. Saf. 107, 29–34
(2012)
24. Najm, H., Berry, R., Safta, C., Sargsyan, K., Debusschere, B.: Data free inference of
uncertain parameters in chemical models. Int. J. Uncertain. Quantif. 4(2), 111–132 (2014).
doi:10.1615/Int.J.UncertaintyQuantification.2013005679
25. Park, T., Casella, G.: The Bayesian Lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008)
26. Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mobile
Comput. Commun. Rev. 5(1), 3–55 (2001)
27. Sisson, S.A., Fan, Y.: Likelihood-free Markov chain Monte Carlo. In: Brooks, S. (ed.)
Handbook of Markov Chain Monte Carlo. Chapman & Hall, London (2010)
28. Sivia, D.S., Carlile, C.J.: Molecular spectroscopy and Bayesian spectral analysis – how many
lines are there? J. Chem. Phys. 96(1), 170–178 (1992)
29. Smith, G., Golden, D., Frenklach, M., Moriarty, N., Eiteneer, B., Goldenberg, M., Bowman,
C., Hanson, R., Song, S., Gardiner, W., Jr., Lissianski, V., Zhiwei, Q.: GRI mechanism for
methane/air, version 3.0 (1999), 30 July 1999. www.me.berkeley.edu/gri_mech
30. STCO: Reprinted from Chowdhary, K., Najm, H.N.: Data free inference with processed
data products. Stat. Comput. 1–21 (2014); Copyright (2014); with permission from Springer,
U.S.
31. Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM,
Philadelphia (2005)
32. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B
(Methodol.) 58, 267288 (1996)
Multi-response Approach to Improving
Identifiability in Model Calibration 4
Zhen Jiang, Paul D. Arendt, Daniel W. Apley, and Wei Chen
Abstract
In physics-based engineering modeling, two primary sources of model uncer-
tainty that account for the differences between computer models and physical
experiments are parameter uncertainty and model discrepancy. One of the
main challenges in model updating results from the difficulty in distinguishing
between the effects of calibration parameters versus model discrepancy. In this
chapter, this identifiability problem is illustrated with several examples that
explain the mechanisms behind it and that attempt to shed light on when a system
may or may not be identifiable. For situations in which identifiability cannot
be achieved using only a single response, an approach is developed to improve
identifiability by using multiple responses that share a mutual dependence on the
calibration parameters. Furthermore, prior to conducting physical experiments
but after conducting computer simulations, in order to address the issue of
how to select the most appropriate set of responses to measure experimentally
to best enhance identifiability, a preposterior analysis approach is presented to
predict the degree of identifiability that will result from using different sets of
responses to measure experimentally. To handle the computational challenges of
the preposterior analysis, we also present a surrogate preposterior analysis based
on the Fisher information of the calibration parameters.
Keywords
Parameter uncertainty · Model discrepancy · Experimental uncertainty · Calibration ·
Bias correction · (Non)identifiability · Identifiability · Model uncertainty
quantification · Calibration parameters · Discrepancy function · Gaussian process ·
Modular Bayesian approach · Hyperparameters · Simply supported beam ·
Non-informative prior · Multi-response Gaussian process · Multi-response modular
Bayesian approach · Spatial correlation · Non-spatial covariance · Preposterior
covariance · Preposterior analysis · Fixed-θ preposterior analysis · Surrogate
preposterior analysis · Observed Fisher information
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2 Identifiability Challenge in Model Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.1 Overview of Lack of Identifiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.2 Case Study: An Illustrative Simply Supported Beam Example . . . . . . . . . . . . . . . . . 76
2.3 When Is Calibration Identifiability Possible? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3 Improving Identifiability in Model Calibration Using Multiple Responses . . . . . . . . . . . . 85
3.1 Multi-response Modular Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.2 Applying the Multi-response Approach to the Simply Supported Beam . . . . . . . . . 91
3.3 Remarks on the Multi-response Modular Bayesian Approach . . . . . . . . . . . . . . . . . 95
4 Preposterior Analyses for Predicting Identifiability in Model Calibration . . . . . . . . . . . . . 97
4.1 Preposterior Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2 A Modified Algorithm for Investigating the Behavior of the
Preposterior Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.3 Fisher Information-Based Surrogate Preposterior Analysis . . . . . . . . . . . . . . . . . . . . 105
4.4 Single-Response Case Study: Simply Supported Beam Example . . . . . . . . . . . . . . . 106
4.5 Using the Preposterior Analysis to Select Ne . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.6 Multi-response Case Study: Simply Supported Beam Example . . . . . . . . . . . . . . . . 112
4.7 Remarks on the Preposterior Analyses Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Appendix A: Estimates of the Hyperparameters for the Computer Model MRGP . . . . . . . . . 121
Appendix B: Posterior Distributions of the Computer Responses . . . . . . . . . . . . . . . . . . . . . . 122
Appendix C: Estimates of the Hyperparameters for the Discrepancy Functions MRGP . . . . 123
Appendix D: Posterior Distribution of the Calibration Parameters . . . . . . . . . . . . . . . . . . . . . 124
Appendix E: Posterior Distribution of the Experimental Responses . . . . . . . . . . . . . . . . . . . . 124
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
1 Introduction
Computer simulations have been widely used for design and optimization in
many fields of science and engineering, as they are often much less expensive
than physical experiments in analyzing complex systems. However, computer
simulations never agree completely with experiments, because no computer model
is perfect. Several sources of uncertainty accounting for the differences between
computer simulations and physical experiments have been reported in the literature
[1, 2]. Parameter uncertainty and model discrepancy are typically the two main
sources. One of the main challenges in model updating is that many different
combinations of the calibration parameters and the discrepancy function result
in equally good agreement with the observed data. Most of the existing model
uncertainty quantification
techniques [1, 2, 4, 20] treat the process as a black-box and have the objective of
improving the experimental response prediction but with little regard to identifying
the true model calibration parameters and model discrepancy. Loeppky et al. [12]
explored the identifiability issue and concluded that accurate prediction of the
experimental response is often possible, even if the individual calibration parameters
and discrepancy function are not identified precisely. Furthermore, they concluded
that the true values of the calibration parameters are only attainable when the model
discrepancy between simulation and reality is neither present nor included in the
model updating formulation. Higdon et al. [2] found that in some simple cases
the calibration parameters and discrepancy function can be accurately identified,
but in other cases their effects cannot be distinguished even though the response
predictions may be reasonable.
A broader view should be taken that good identifiability is often critically
important for many reasons: (1) Learning the calibration parameters may in itself be
a primary goal with broad-reaching implications for product/process improvement
(e.g., if these calibration parameters are needed for a system-level simulation or
if the calibration parameters themselves reflect the performance of interest but
cannot be observed directly). (2) Knowledge of the model discrepancy improves
the understanding of the deficiencies of the computer model for improving future
generations. (3) It results in more accurate prediction over a broader set of input
regions, because the model adjustment from learning the calibration parameters is
more global than the adjustment from learning the model discrepancy. In the context
of using predictive modeling for engineering design, adjusting the calibration
parameters changes the prediction of the experimental response for a wide range
of values for the design variables, whereas adjusting the discrepancy function
tends to change the prediction of the experimental response predominantly within
some neighborhood of the specific design variable settings that were used in the
experiments.
In many engineering applications, good identifiability is virtually impossible if
considering a single experimental response. In this chapter, we aim to provide a
better understanding of the issue of identifiability and offer feasible treatments to
predict and to enhance identifiability via a multi-response approach. The remainder
of the chapter is organized as follows: Sect. 2 investigates the issue of identifiability
using a simply supported beam example. Whereas the example and discussion in
Sects. 2.1 and 2.2 convey the conclusion that identifiability is often very difficult,
or impossible, in typical implementations, Sect. 2.3 discusses situations in which
it is reasonable to expect that good identifiability can be achieved. Section 3
further investigates the identifiability issue and demonstrates that incorporating
multiple experimental responses that share a mutual dependence on a common set of
calibration parameters can substantially improve identifiability. Section 4 provides
a detailed description of a preposterior analysis, in conjunction with the multi-
response uncertainty quantification approach, for predicting identifiability prior to
conducting physical experiments. Furthermore, to handle computational challenges
Following the seminal work of Kennedy and O'Hagan [1], the model uncertainty
quantification formulation that incorporates the various uncertainties is [1, 2, 4, 21]:

$$y^e(\mathbf{x}) = y^m(\mathbf{x}, \boldsymbol{\theta}^*) + \delta(\mathbf{x}) + \varepsilon, \qquad (4.1)$$

where the controllable inputs (aka design variables) are denoted by the vector x =
(x_1, …, x_d)^T and the calibration parameters, which are unknown model parameters,
are denoted by the vector θ = (θ_1, …, θ_r)^T. A calibration parameter is defined as
any physical parameter that can be specified as an input to the computer model and
that is unknown or not measurable when conducting the physical experiments [1].
θ* (an r × 1 vector of constants) denotes the true values of the unknown calibration
parameters over the course of the physical experiments. y^e(x) is the experimental
response, y^m(x, θ) is the computer model response as a function of x and θ, δ(x) is
the additive discrepancy function that represents the difference between the model
response (using the true θ*) and the experimental response, and ε accounts for the
experimental uncertainty. Experimental uncertainty is typically assumed to follow
a normal distribution with mean 0 and variance λ, denoted ε ~ N(0, λ).
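The structure of Eq. (4.1) can be made concrete with a toy data-generating sketch. The model form, the discrepancy, and the values of θ* and λ below are hypothetical placeholders, not the chapter's beam model:

```python
import numpy as np

rng = np.random.default_rng(1)

def y_m(x, theta):
    # Toy computer model response y^m(x, theta)
    return theta * np.sin(x)

def delta(x):
    # Toy additive discrepancy function delta(x)
    return 0.1 * x

theta_star = 2.0   # true (unknown) calibration parameter
lam = 0.01         # experimental-error variance lambda

x = np.linspace(0.0, 3.0, 8)                       # design-variable settings
eps = rng.normal(0.0, np.sqrt(lam), size=x.size)   # eps ~ N(0, lambda)
y_e = y_m(x, theta_star) + delta(x) + eps          # Eq. (4.1)
```

Calibration then amounts to inferring θ* and δ(·) jointly from the (x, y^e) pairs, which is exactly where the identifiability problem discussed in this chapter arises.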
Parameter uncertainty has two forms. A parameter may be constant but
unknown and its uncertainty represented via a probability distribution. Alternatively,
a parameter x may vary randomly (from run to run over the physical experiments
and/or over any future instances for which the model is used to predict reality)
according to some probability distribution. An example of the latter could be the
blank thicknesses in a sheet metal stamping operation that vary randomly from blank
to blank due to manufacturing variation. Calibration generally refers to the former
form of parameter uncertainty, with the goal of identifying the true values of the
constant parameters. Constant parameters refer to those that do not change over
the duration of the physical experiments. Accordingly, in this chapter, only this form
of parameter uncertainty is considered. Calibration in the case of parameters that
vary randomly throughout the physical experiment would be far more challenging.
One might assume some distribution for the randomly varying parameters and define
the calibration goal as to identify the statistical parameters of the distribution (e.g.,
the mean and variance of a normal distribution) [20]. However, this would require
far more experimental observations than are needed for the goal in this chapter of
identifying constant physical parameters.
The formulation in Eq. (4.1) is comprehensive in that it accounts for parameter
uncertainty, model discrepancy, experimental uncertainty, and interpolation
uncertainty. To quantify uncertainty in the calibration parameters and discrepancy
function, the modular Bayesian approach presented in [1] is used to calculate their
posterior distributions. Notice that the discrepancy function in Eq. (4.1) is not
directly observable from the collected data, since the true value of θ* is unknown.
Notation-wise, two Gaussian process (GP) models are used to infer the computer
model response and the discrepancy function such that:

$$y^m(\cdot,\cdot) \sim \mathrm{GP}\!\left(\mathbf{h}^m(\cdot,\cdot)\,\boldsymbol{\beta}^m,\ \sigma_m^2\, R^m\big((\cdot,\cdot),(\cdot,\cdot)\big)\right), \qquad (4.2)$$

$$\delta(\cdot) \sim \mathrm{GP}\!\left(\mathbf{h}^\delta(\cdot)\,\boldsymbol{\beta}^\delta,\ \sigma_\delta^2\, R^\delta(\cdot,\cdot)\right). \qquad (4.3)$$

The prior mean function of the computer model response is comprised of the
unknown regression coefficients β^m and the known regression functions h^m(x, θ).
The prior covariance function is the product of an unknown constant σ_m² and a
correlation function R^m((x, θ), (x′, θ′)), where (x, θ) and (x′, θ′) denote two sets
of computer model inputs. The notations for the discrepancy function are similarly
defined. R^m((x, θ), (x′, θ′)) and R^δ(x, x′) are parameterized by a (d + r) × 1 vector
ω^m and a d × 1 vector ω^δ such that:

$$R^m\big((\mathbf{x},\boldsymbol{\theta}),(\mathbf{x}',\boldsymbol{\theta}')\big) = \exp\left\{-\sum_{k=1}^{d}\omega_k^m (x_k - x_k')^2 - \sum_{k=d+1}^{d+r}\omega_k^m (\theta_{k-d} - \theta_{k-d}')^2\right\}, \qquad (4.4)$$

$$R^\delta(\mathbf{x},\mathbf{x}') = \exp\left\{-\sum_{k=1}^{d}\omega_k^\delta (x_k - x_k')^2\right\}. \qquad (4.5)$$
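Both correlation functions in Eqs. (4.4) and (4.5) are instances of the same anisotropic Gaussian kernel, which can be written once and applied to stacked (x, θ) inputs with ω^m or to x alone with ω^δ. A sketch with illustrative input values:

```python
import numpy as np

def gaussian_corr(X1, X2, omega):
    # R[i, j] = exp( -sum_k omega[k] * (X1[i, k] - X2[j, k])**2 ),
    # the anisotropic Gaussian correlation of Eqs. (4.4)-(4.5).
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    omega = np.asarray(omega, dtype=float)
    sq = (X1[:, None, :] - X2[None, :, :]) ** 2   # pairwise squared diffs
    return np.exp(-(sq * omega).sum(axis=-1))

# R^m: stack (x, theta) as columns and pass a (d + r)-vector omega_m.
Xm = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 3.0]])
Rm = gaussian_corr(Xm, Xm, omega=[0.5, 0.2])

# R^delta: x columns only, with a d-vector omega_delta.
Xd = np.array([[0.0], [1.0], [2.0]])
Rd = gaussian_corr(Xd, Xd, omega=[0.5])
```

Larger entries of ω make the correlation drop faster with distance, i.e., a rougher process; this is the "roughness parameter" role discussed in Sect. 2.3.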
Fig. 4.1 An illustration of identifiability. While it is easy to tell that (c) is the least likely estimate
of θ, it may be impossible to identify which of (a) and (b) is better

For each simulation realization y^m(x, θ̂^(i)) (i = 1, …, 3), an estimated discrepancy
function δ̂^(i)(x) (i = 1, …, 3) can be found so that the resulting experimental
response predictions y^m(x, θ̂^(1)) + δ̂^(1)(x), y^m(x, θ̂^(2)) + δ̂^(2)(x), and y^m(x, θ̂^(3)) +
δ̂^(3)(x) are quite similar and in equally good agreement with the experimental data
within the experimental region. Intuitively, from the smoothness of the observed
experimental data, the combination of θ̂^(3) and δ̂^(3)(x) seems less likely than
the other two combinations to be the true parameter value and true discrepancy
function, as the simulation is highly nonlinear. Rigorous calculation can show that its
posterior probability density p(θ̂^(3) | y^m, y^e) has a smaller value than p(θ̂^(1) | y^m, y^e)
and p(θ̂^(2) | y^m, y^e). However, it may be virtually and computationally impossible
to identify which of θ̂^(1) and θ̂^(2) is a better estimate of θ*, since in both cases
(Fig. 4.1a, b) the computer simulations are consistent with the experiments to a
similar degree, and the values of p(θ̂^(1) | y^m, y^e) and p(θ̂^(2) | y^m, y^e) are close.
Figure 4.2 depicts how the posterior distribution of θ can be used to assess
the level of identifiability for a hypothetical single-parameter example. A tight
posterior distribution of θ with a clear mode indicates good identifiability. In
sharp contrast, with a widely dispersed posterior distribution, the identifiability is
poor. As in much of the prior work that assessed calibration identifiability (e.g.,
[4, 12, 13]), identifiability can be quantified via the posterior covariance matrix of
the set of calibration parameters (posterior to observing the physical experimental
data, in addition to the simulation data). One might also consider using the posterior
covariance function of the discrepancy function as a measure of identifiability;
however, this is infinite dimensional and cumbersome. Moreover, the posterior
covariance of the discrepancy function is closely related to the posterior covariance
of the calibration parameters, in the sense that precisely estimated calibration
parameters generally imply a precisely estimated discrepancy function (which is
somewhat obvious from Eq. (4.1)). Consequently, the posterior covariance matrix
of the calibration parameters is more manageable to work with. Here, the term
identifiability is used rather informally to refer to whether one can expect to
achieve reasonable and useful estimation of the calibration parameters with typical
finite sample sizes (which is sometimes possible), as opposed to whether estimators
are consistent and guaranteed to asymptotically converge to the true values when
sample sizes approach infinity in some sense (which is perhaps never possible under
realistic assumptions).
Fig. 4.4 Material stress–strain curves for (a) the computer model and (b) the physical experiments.
E is Young's modulus and the calibration parameter; σ_y is the yield stress (225 MPa)

constant stress C = 2068 MPa and strain-hardening exponent n = 0.5) [23]. These
material laws are in part governed by Young's modulus, E, which is treated as the
unknown calibration parameter (Fig. 4.4). For the physical experiments, the true
value of the calibration parameter is set to θ* = 206.8 GPa but is treated as unknown
during the analysis and assigned a uniform prior over the range 150–300 GPa.
The prior distribution was chosen to be relatively wide in order to minimize the
effect of the prior on the posterior distributions and to avoid choosing a prior that
does not contain any support at the true value (in which case the posterior will
also have no support there).
Fig. 4.5 The posterior distributions for (a) the experimental response, (b) the discrepancy
function, and (c) the calibration parameter showing a lack of identifiability
Fig. 4.6 The computer response y^m(x, θ), the experimental response y^e(x) (bullets indicate
experimental data), and the estimated discrepancy function δ̂(x, θ) = y^e(x) − y^m(x, θ) for (a)
θ = 150 GPa and (b) θ = 250 GPa
large amount of experimental data and no experimental error ε. In spite of this, the
calibration parameter and the discrepancy function are not identifiable, as illustrated
below.
To assess identifiability, the posterior distributions of the calibration parameter
(Fig. 4.5c) and the discrepancy function (Fig. 4.5b) are used. The large posterior
variance of the calibration parameter and the large width of the prediction intervals
for the discrepancy function indicate the poor identifiability of the calibration
parameter and discrepancy function. Indeed, several different combinations of these
are approximately equally probable and give approximately the same (quite accurate
and precise) predicted response in Fig. 4.6a. It is also worth noting that neither
posterior reflects the true values of the calibration parameter and the discrepancy function.
Because of the high uncertainty reflected by the posterior distributions, the system
is not identifiable, even with the relatively dense spacing of the experimental data.
Moreover, collecting additional experimental observations would help little.
To understand how the large uncertainty and the inaccuracy of the posterior
distributions lead to a lack of identifiability, consider the estimated discrepancy
function δ̂(x, θ) = y^e(x) − y^m(x, θ) shown in Fig. 4.6 for two different values of θ.
For θ underestimated at 150 GPa in Fig. 4.6a (the true θ* = 206.8 GPa) and
for θ overestimated at 250 GPa in Fig. 4.6b, neither θ nor the resulting estimated
discrepancy function is inconsistent with their priors or in disagreement with
the data. They both result in very similar, and quite accurate, predictions of the
experimental response. Consequently, without additional information, it is virtually
impossible to distinguish between the effects of θ and δ(x) and accurately identify
each in this example.
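This confounding can be reproduced numerically. In the sketch below, a smooth hypothetical model stands in for the beam simulation (the functional forms and numbers are illustrative, not the chapter's finite-element model); for both an underestimated and an overestimated θ, the implied discrepancy y^e(x) − y^m(x, θ) stays smooth and is therefore plausible under a smooth GP prior:

```python
import numpy as np

def y_m(x, theta):
    # Hypothetical smooth computer model with GPa-scale theta
    return x**2 / theta

theta_star = 206.8  # true parameter, as in the beam example

def y_e(x):
    # "Reality": model at theta_star plus a small smooth bias
    return y_m(x, theta_star) + 0.01 * x

x = np.linspace(1.0, 3.0, 13)
# Implied discrepancy for an under- and an overestimated theta
d_under = y_e(x) - y_m(x, 150.0)
d_over = y_e(x) - y_m(x, 250.0)
# Both candidate discrepancies are smooth (tiny second differences),
# so neither value of theta is ruled out by the data.
```

Smoothness here is measured crudely through second differences; both candidates pass, which is precisely why the single response cannot separate θ from δ(x).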
To improve identifiability, previous literature [4, 24] recommended using infor-
mative (i.e., low variance or tightly dispersed) prior distributions for the calibration
parameters, the discrepancy function, or both. However, specifying informative
priors for the discrepancy function (e.g., a linear or quadratic functional form
[20, 25]) is often difficult because one rarely has significant prior knowledge.
Fig. 4.7 Posterior distributions for the discrepancy function δ(x) and the calibration parameter θ
for the three prior distributions in Table 4.2 (legend: true calibration parameter; posterior
probability distribution function (PDF))
Likewise, one does not have such prior information on the calibration parameters
(otherwise, they would be viewed as known parameters and not considered cali-
bration parameters). However, to illustrate how this would improve identifiability,
three versions of the preceding beam example are considered using three different
normal prior distributions for θ, each with different means and/or variances. The
resulting posterior distributions for θ and the discrepancy function are shown in
Fig. 4.7 and Table 4.2.
Case 1 assumes a less informative (larger standard deviation) prior distribution
centered about the midpoint of the calibration parameter range (mean of 225 GPa).
In this case, the posterior distribution for the calibration parameter and the dis-
crepancy function were neither accurate nor precise, which indicates a lack of
identifiability. In Case 2, an informative prior (small standard deviation) with a mean
equal to the true θ* results in an identifiable system, as is evident from the accurate
and precise posterior distributions. In contrast, Case 3 shows the inherent danger of
using an informative prior that is inaccurate albeit precise. In this case, the posterior
for the discrepancy function has a small prediction interval and the posterior of the
calibration parameter has a small variance, which would lead one to believe that
they are precisely estimated. However, the posteriors are inaccurate and
do not reflect the true discrepancy function and θ*.
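The three cases can be mimicked with a one-line conjugate-normal update (all numbers illustrative: a prior θ ~ N(m0, s0²) combined with a single synthetic measurement of θ with standard deviation 20 GPa):

```python
import numpy as np

def posterior(m0, s0, y=206.8, s=20.0):
    # Conjugate update: prior N(m0, s0^2), observation y ~ N(theta, s^2)
    w = s0**2 / (s0**2 + s**2)
    return m0 + w * (y - m0), np.sqrt(w) * s   # posterior mean, sd

case1 = posterior(225.0, 50.0)   # vague prior: wide, data-dominated posterior
case2 = posterior(206.8, 5.0)    # informative and accurate: tight, near truth
case3 = posterior(280.0, 5.0)    # informative but wrong: tight, yet biased
```

Case 3 reproduces the danger discussed above: a small posterior standard deviation falsely suggests a well-identified parameter even though the posterior mean stays near the inaccurate prior mean.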
In conclusion, since one rarely has such accurate and informative prior knowl-
edge about the sources of uncertainty present in the engineering system, assuming
informative priors for the calibration parameters or the discrepancy function is not
a satisfying solution to the identifiability problem.
The simply supported beam represents a system for which we cannot accurately and
precisely identify both the calibration parameters and the discrepancy function (e.g.,
via their posterior distributions). This leads one to question whether identifiability is
ever possible, given the nature of the model in Eq. (4.1). This section demonstrates
that the answer is yes, but under certain assumptions. This is illustrated with a
simple example in which the requisite assumptions are that the discrepancy function
is reasonably smooth and can be represented by a smooth GP model (i.e., a GP
with small roughness parameters, ω). Loosely speaking, "smooth" means that the
function does not change abruptly for small changes in the design variables. In the
context of our GP-based modeling, smooth means that the function is consistent
with a GP model having relatively small roughness parameters (ω), as discussed
later at the end of this section.
As evident from the posterior distributions of Fig. 4.5, the simply supported
beam is not identifiable. The conceptual explanation (see the discussion surrounding
Fig. 4.6) is that, for many different values of θ, the estimated discrepancy function
is smooth and consistent with a GP model. In contrast, an example is considered
next in which the estimated discrepancy function has behavior that is inconsistent
with a GP model when θ ≠ θ*, but consistent when θ = θ*. It will be shown that
this is the essential ingredient for good identifiability [14]. Consider the following
step function as the computer model

$$y^m(x, \theta) = \begin{cases} 1, & x < 2 \\ 1 + \theta, & x \geq 2 \end{cases}, \qquad (4.6)$$
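The mechanism is easy to verify numerically. In the sketch below (the true value θ* = 2.0 and the zero discrepancy are illustrative assumptions, not the chapter's settings), the implied discrepancy for a wrong θ inherits a jump of height θ* − θ at x = 2, which a smooth GP cannot explain, while the correct θ leaves a perfectly smooth residual:

```python
import numpy as np

def y_m(x, theta):
    # Step-function computer model of Eq. (4.6)
    return np.where(x < 2.0, 1.0, 1.0 + theta)

theta_star = 2.0          # assumed true calibration parameter

def y_e(x):
    # Assumed reality: the model at theta_star with no discrepancy
    return y_m(x, theta_star)

x = np.linspace(1.0, 3.0, 201)
d_wrong = y_e(x) - y_m(x, 1.0)          # jump of height 1 at x = 2
d_right = y_e(x) - y_m(x, theta_star)   # identically zero (smooth)
```

Because only θ = θ* removes the jump, the smoothness assumption on δ(x) pins down the calibration parameter: this is the sense in which the step example is identifiable.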
Fig. 4.8 Computer model and experimental response for Eqs. (4.6) and (4.7) with (a) θ = 1.0
and (c) θ = 3.0. Corresponding estimated discrepancy function δ̂(x; θ) for (b) θ = 1.0 and (d)
θ = 3.0
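The mechanism can be made concrete with a small numerical sketch. It assumes a step computer model of the form y^m(x, θ) = 1 for x < 2 and 1 + θ for x ≥ 2, a noise-free experiment generated at the true θ* = 2.5, and illustrative helper names that are not from the chapter's code:

```python
# Sketch: why an incorrect calibration parameter forces a discontinuous
# discrepancy in the step example. Assumed (hypothetical) model form:
# y_m(x, theta) = 1 for x < 2 and 1 + theta for x >= 2, true theta* = 2.5,
# zero experimental error.

def y_m(x, theta):
    """Step computer model (assumed form)."""
    return 1.0 if x < 2.0 else 1.0 + theta

def y_e(x, theta_true=2.5):
    """Experimental response: the computer model evaluated at the true theta."""
    return y_m(x, theta_true)

def discrepancy(x, theta):
    """Estimated discrepancy delta(x; theta) = y_e(x) - y_m(x, theta)."""
    return y_e(x) - y_m(x, theta)

xs = [1.0, 1.99, 2.01, 3.0]
# At the true theta the discrepancy vanishes everywhere (trivially smooth).
assert all(abs(discrepancy(x, 2.5)) < 1e-12 for x in xs)
# At an incorrect theta the discrepancy jumps at x = 2 (rough).
deltas = [discrepancy(x, 1.0) for x in xs]
assert deltas[0] == 0.0 and deltas[2] == 1.5
```

At the true value the discrepancy is identically zero; at any other θ it has a jump of magnitude |θ* − θ| at x = 2, to which a smooth GP prior assigns very low likelihood.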
The step function of Eq. (4.6) was difficult to fit using the 168 simulations, due to the discontinuity at x = 2.
Typically, a GP model with a Gaussian correlation function is not a good choice
to model a step function because of its inherent smoothness (see Ref. [26, p. 123]
and subsequent discussion). However, Fig. 4.9 shows that, with the large amount of
data and allowing large roughness parameters (the MLEs of the hyperparameters
are ω^m = [ω_x, ω_θ]^T, where ω_x = 377.52 and ω_θ = 50, and ω_θ fell at the upper
boundary of what is allowed in the MLE algorithm for numerical purposes), a fairly
accurate GP metamodel results. Although the fitted GP metamodel is somewhat
rough and interpolates less accurately near the discontinuity, it serves to illustrate
when identifiability is achievable.
After creating the GP model for the computer model, the modular Bayesian
approach estimates the hyperparameters for the GP model representing the
discrepancy function. Fourteen experimental data points were collected at X^e =
[1, 1.167, 1.333, …, 3, 1.99, 2.01]^T (the black dots in Fig. 4.10a). The same x
locations were chosen for the simulation and experimental runs in order to focus
on identifiability rather than interpolation performance. Finally, the posterior
distributions for the calibration parameters, the experimental response, and the
discrepancy function are calculated. Figure 4.10 shows the resulting posterior distributions.
The accurate and precise (low posterior standard deviation) posterior distribu-
tions for the discrepancy function in Fig. 4.10b and the calibration parameter in
Fig. 4.10c demonstrate that good identifiability is achievable for this example.
Fig. 4.10 Posterior distributions for (a) the experimental response (black dots indicate experimen-
tally observed response values), (b) the discrepancy function, and (c) the calibration parameter
84 Z. Jiang et al.
Notice that the posterior mean of the experimental response in Fig. 4.10a is a less
accurate and precise predictor near the step discontinuity. In spite of this, reasonable
identifiability of the calibration parameter was achieved, which was the objective of
this analysis.
To understand why the system is identifiable in this example, consider the
estimated discrepancy functions δ̂(x; θ) for two incorrect values of θ shown in
Fig. 4.8b (θ = 1.0) and Fig. 4.8d (θ = 3.0). For reference, the estimated
discrepancy function δ̂(x; θ*) for the true θ* = 2.5 is shown in both figures. The
salient feature is that the δ̂(x; θ) for the incorrect values of θ have discontinuities that
are inconsistent with a smooth GP model for the discrepancy function. The only
value of θ that results in a δ̂(x; θ) consistent with the model is the true value θ* = 2.5.
By a δ̂(x; θ) inconsistent with the model, it is meant that, when that value of θ
and its corresponding δ̂(x; θ) are plugged into the likelihood function, the resulting
likelihood is exceptionally low. Because the estimated discrepancy function entails
a high likelihood only when θ is close to θ*, and the likelihood drops off abruptly
as θ deviates from θ*, the second derivative of the likelihood function (with respect
to θ) is quite large. This translates to a large observed Fisher information, which
results in a tight posterior distribution for θ. This is, in essence, the definition
of identifiability. The preceding example illustrates that identifiability is possible
when small changes in the calibration parameters about their true values result in an
estimated discrepancy function (which reflects the differences between the observed
experimental data and the simulation model) that is inconsistent with the GP model
of the discrepancy function.
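The curvature argument can be illustrated numerically: under a Laplace approximation, a sharply peaked log-likelihood yields a large observed Fisher information and hence a small posterior standard deviation. A toy i.i.d. Gaussian likelihood (not the chapter's GP-based likelihood) suffices as a sketch:

```python
# Sketch of the curvature argument: the sharper the log-likelihood peaks at
# theta*, the larger the observed Fisher information I = -d2(log p)/dtheta2,
# and (via the Laplace approximation) the tighter the posterior, std ~ 1/sqrt(I).
# Toy i.i.d. Gaussian likelihood, not the chapter's GP-based one.
import math

def log_lik(theta, data, sigma=0.1):
    # log of prod_i N(y_i | theta, sigma^2), additive constants dropped
    return -sum((y - theta) ** 2 for y in data) / (2.0 * sigma**2)

def observed_fisher(theta, data, h=1e-4):
    # numerical second derivative: I = -d2/dtheta2 log p, via central differences
    f = lambda t: log_lik(t, data)
    return -(f(theta + h) - 2.0 * f(theta) + f(theta - h)) / h**2

data = [2.5] * 10                     # ten observations at theta* = 2.5
I = observed_fisher(2.5, data)        # analytic value: n / sigma^2 = 1000
approx_post_std = 1.0 / math.sqrt(I)  # Laplace-approximate posterior std
assert abs(I - 1000.0) < 1.0
assert approx_post_std < 0.1          # tight posterior: good identifiability
```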
The identifiability issues in the preceding discussion can also be viewed in a
slightly different light. In essence, the approach evaluates identifiability by consid-
ering the simulation data, considering the experimental data, and then assessing the
likelihood that the observed differences between the simulation and experimental
data are explained by the discrepancy function, by an adjustment of the calibration
parameters, or by various combinations thereof. Hypothetically, if there is roughly
equal likelihood that the differences between the simulation and experimental data
are explained by two (or more) different combinations of a discrepancy function and
an adjustment in the calibration parameters, then identifiability would be poor in this
case. On the other hand, if the likelihood is high that one particular combination of
discrepancy function and adjustment in the calibration parameters accounts for the
differences between the simulation and experimental data, then identifiability would
be good in this case.
In either case, the likelihood must be evaluated with respect to the assumed
prior distribution for the discrepancy function. And in this sense, identifiability is
possible only by making certain assumptions on the prior distribution model for the
discrepancy function. Although identifiability is achieved in the step example via
certain assumptions on the prior distributions, this is quite different than assuming a
tight, informative prior distribution for the calibration parameters. The latter would
also generally result in identifiability (i.e., a tight posterior), but it would be an
artificial identifiability in that one would have to begin with precise knowledge of
the parameters via a tight prior. In all of the examples of this section, only relatively
non-informative prior distributions have been used for the calibration parameters.
Note that, in the preceding, the term likelihood is used, but it is technically a
Bayesian posterior probability, which considers both the likelihood and the prior
distributions for all parameters and hyperparameters.
The single-response modular Bayesian approach [1] calculates the posterior dis-
tributions, which quantify the posterior uncertainty of the calibration parameters
and the discrepancy function, by integrating data from both simulations and exper-
iments. In this section, the modular Bayesian approach is extended to incorporate
multiple responses [15]. To do this, the model updating formulation of Kennedy and
O'Hagan [1] is reformulated as [15, 34]:
in two separate modules (Modules 1 and 2). After fitting each MRGP model, one
calculates the posterior distributions of the calibration parameters, the discrepancy
functions, and the experimental responses (Modules 3 and 4). Details of each
module in the multi-response modular Bayesian approach are provided in the
following.
where y^m(x, θ) = [y_1^m(x, θ), …, y_q^m(x, θ)] denotes the multiple responses from
the computer model. The prior mean function is comprised of an arbitrary vec-
tor of specified regression functions h^m(x, θ) = [h_1^m(x, θ), …, h_p^m(x, θ)] and a
matrix of unknown regression coefficients B^m = [β_1^m, …, β_q^m], where β_i^m =
[β_{1,i}^m, …, β_{p,i}^m]^T. A frequently used regression function is h^m(x, θ) = 1, which
corresponds to a constant prior mean. The prior covariance function is the prod-
uct of an unknown nonspatial q × q covariance matrix Σ^m and a spatial cor-
relation function R^m((x, θ), (x′, θ′)), where (x, θ) and (x′, θ′) denote two sets
of computer model inputs. This prior covariance function can be rewritten as
Cov[y_i^m(x, θ), y_j^m(x′, θ′)] = Σ_{ij}^m R^m((x, θ), (x′, θ′)), where Σ_{ij}^m is the covariance
between the ith and jth computer model responses at the same input values. Further,
this covariance structure reduces to the covariance function for a single-response
Gaussian process model when q = 1. A Gaussian correlation function is used:

R^m((x, θ), (x′, θ′)) = exp{ −Σ_{k=1}^d ω_k^m (x_k − x_k′)² } · exp{ −Σ_{k=1}^r ω_{d+k}^m (θ_k − θ_k′)² },   (4.9)
Fig. 4.11 Schematic showing the MRGP model's covariance function represented by spatial
correlation and nonspatial covariance for a fixed θ
spatial correlations, then one can model each response independently, allowing each
to have a different spatial correlation function [31].
Figure 4.11 illustrates the separation of the covariance function into spatial and
nonspatial portions for the case of two responses. The gray arrow below the x-axis of
Fig. 4.11 signifies the spatial correlation portion of the covariance function, which is
represented by R^m((x, θ), (x′, θ)) for computer model inputs x and x′ with fixed θ.
Alternatively, the nonspatial covariance matrix portion of the covariance function,
Σ^m, represents the covariance across response functions, as signified in Fig. 4.11
by the gray arrow between the responses. Whereas the spatial correlation function
dictates the smoothness of both responses, the nonspatial covariance matrix dictates
the similarity between any trends or fluctuations (not represented by the prior mean
function) in the two responses.
To obtain maximum likelihood estimates (MLEs) of the hyperparameters
φ^m = {B^m, Σ^m, ω^m} for the computer model MRGP, the multivariate normal
log-likelihood function based on the simulation data Y^m = [y_1^m, …, y_q^m] is
maximized, where y_i^m = [y_i^m(x_1^m, θ_1^m), …, y_i^m(x_{N_m}^m, θ_{N_m}^m)]^T, collected at N_m input
sites X^m = [x_1^m, …, x_{N_m}^m]^T and θ^m = [θ_1^m, …, θ_{N_m}^m]^T. Previous literature [37, 38]
suggests a space-filling experimental design for the input settings of X^m and θ^m.
Furthermore, to improve the numerical stability of the MLE algorithm, the inputs
X^m and θ^m can be transformed to the range of 0 to 1, and the simulation data can
be standardized to have a sample mean of 0 and a sample standard deviation of 1
[39]. Details of the MLE algorithm can be found in Appendix A.
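The preprocessing described above can be sketched as follows (hypothetical helper names; the chapter's own MLE algorithm is given in Appendix A):

```python
# Preprocessing before MLE: scale each input column to [0, 1] and standardize
# each response column to sample mean 0 and sample standard deviation 1.
# Minimal sketch with plain Python lists; helper names are illustrative.

def scale_to_unit(col):
    """Affinely map a column of inputs onto [0, 1]."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

def standardize(col):
    """Center and scale a response column (sample std, divide by n - 1)."""
    n = len(col)
    mean = sum(col) / n
    std = (sum((v - mean) ** 2 for v in col) / (n - 1)) ** 0.5
    return [(v - mean) / std for v in col]

x = [150.0, 225.0, 300.0]
assert scale_to_unit(x) == [0.0, 0.5, 1.0]
z = standardize([1.0, 2.0, 3.0])
assert abs(sum(z)) < 1e-12 and abs(z[-1] - 1.0) < 1e-12
```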
Inference of the computer model response y_i^m(x, θ) at any x and θ can be
performed by inserting the MLEs of the hyperparameters into the MRGP model
posterior mean and covariance equations, Eqs. (4.22) and (4.23) of Appendix B.
After obtaining the MLEs of the hyperparameters, the next step is to estimate the
where y^e(x) = [y_1^e(x), …, y_q^e(x)] denotes the multiple experimental responses. The
q × q diagonal covariance matrix Λ with diagonal entries λ_1, …, λ_q represents the
experimental uncertainty, i.e., ε_i ~ N(0, λ_i). Since Eq. (4.11) depends on the true
calibration parameters θ*, the objective is now to estimate the hyperparameters of
the MRGP model for the discrepancy functions without knowing the true value
of θ*.
Kennedy and O'Hagan [1] developed a procedure to obtain estimates of the
hyperparameters of a single discrepancy function using the prior distribution of
the calibration parameters. Specifically, this procedure constructs a likelihood
function, marginalized with respect to the prior distribution of the calibration
parameters, from the simulation and experimental data and using the MLEs of the
computer model hyperparameters from Module 1. The experimental data consist
of experimental response observations at N_e input sites, X^e = [x_1^e, …, x_{N_e}^e]^T
(transformed to the range [0, 1]) with responses Y^e = [y_1^e, …, y_q^e], where
y_i^e = [y_i^e(x_1^e), …, y_i^e(x_{N_e}^e)]^T (standardized). Then the MLE estimates of the
hyperparameters φ^δ = {B^δ, Σ^δ, ω^δ, Λ} occur at the location of the maximum of
the likelihood function (marginalized with respect to the priors of the calibration
parameters). Appendix C contains a detailed description of this procedure. After
estimating the hyperparameters of the two MRGP models in Modules 1 and 2, Bayes'
theorem is used to calculate the posterior distribution of θ, as follows.
p(θ | d, φ̂) = p(d | θ, φ̂) p(θ) / p(d | φ̂),   (4.12)
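On a discrete grid of θ values, Eq. (4.12) amounts to multiplying the likelihood by the prior and normalizing by the evidence. A minimal sketch with a toy likelihood (the chapter's actual likelihood is the marginalized GP likelihood):

```python
# Eq. (4.12) on a grid: posterior p(theta | d) proportional to likelihood times
# prior, normalized by the evidence p(d | phi-hat). Toy Gaussian log-likelihood.
import math

def grid_posterior(thetas, log_lik, prior):
    """Normalized posterior weights over a grid of theta values."""
    unnorm = [math.exp(log_lik(t)) * prior(t) for t in thetas]
    evidence = sum(unnorm)  # approximates p(d | phi-hat) up to the grid spacing
    return [u / evidence for u in unnorm]

thetas = [1.0, 2.0, 2.5, 3.0]
log_lik = lambda t: -((t - 2.5) ** 2) / (2 * 0.25**2)  # peaked at theta* = 2.5
prior = lambda t: 1.0                                  # uniform (uninformative)
post = grid_posterior(thetas, log_lik, prior)
assert abs(sum(post) - 1.0) < 1e-12
assert max(post) == post[2]   # posterior mode at the true theta = 2.5
```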
This section revisits the simply supported beam example, introduced in Sect. 2.2, to
show how using multiple responses can improve the identifiability of the calibration
parameters and the discrepancy functions. Briefly (refer to Sect. 2.2 for details), the
simply supported beam is fixed at one end and supported by a roller on the other end.
A static force is applied to the midpoint of the beam to induce various responses,
e.g., deflection and stress. The magnitude of this force was chosen as the design
variable x, while Young's modulus was treated as the calibration parameter θ. For
generating the physical experiment data (which is taken to be the same computer
simulation but with a more sophisticated material law), the true value of
the calibration parameter is set to θ* = 206.8 GPa. However, θ* is treated as
unknown during the analysis, and we use a uniform prior distribution over the range
150–300 GPa. The experimental uncertainty was set to zero (i.e., λ_i = 0).
Fig. 4.12 Posterior distributions of the discrepancy function and the calibration parameter using
a single response for (a) y_1, (b) y_2, (c) y_3, and (d) y_4
data will not improve identifiability when using only a single response. Instead, in
the remainder of this section, the use of multiple responses is explored to enhance
identifiability.
Fig. 4.13 Posterior distributions of the discrepancy functions and the calibration parameter using
multiple responses for (a) y_1 and y_2, (b) y_2 and y_3, and (c) y_3 and y_4
θ* = 206.8 GPa (marked by the vertical line in Fig. 4.13c). Moreover, the posteriors of
the discrepancy functions δ_3(x) and δ_4(x) capture the true functions with relatively
small uncertainty. These three cases are representative of the varied improvements
in identifiability for all pairs of responses listed in Table 4.5.
In the simply supported beam example, 12 out of the 15 pairs of responses
resulted in an identifiability improvement of 25% or more. The degree of improve-
ment varied from little (e.g., 14%), to moderate (e.g., 24% and 32%), to substantial
(e.g., 87%). This example illustrates that, if identifiability cannot be obtained
using a single response, multiple responses may be used to achieve improved
identifiability.
In this section, it has been shown that identifiability can be enhanced by using
multiple responses in model updating. The approach is most beneficial when
data from a single response results in a lack of identifiability. The existing
generalize how many responses are necessary to achieve identifiability based on the
number of calibration parameters.
Together, this section and the previous section (Sect. 2) shed light on the chal-
lenging problem of identifying calibration parameters and discrepancy functions
when combining data from a computer model and physical experiments. They
demonstrate that identifiability is often possible and can be reasonably achieved
with proper analyses in certain engineering systems.
It was shown in Sect. 2.3 that the degree of identifiability depends strongly on
the nature of the computer model response as a function of the input variables
and the calibration parameters, as well as on the prior assumptions regarding the
discrepancy function (e.g., the smoothness of the discrepancy function). Computer
simulations are inherently multi-response, as there are many intermediate and
final response variables at many different spatial and temporal locations that are
automatically calculated during the course of the simulation. Section 3.2 showed
that, for systems with poor identifiability based on a single measured physical
experimental response, identifiability may be improved by measuring multiple
physical responses. In all situations, identifiability obviously depends on the design
of the physical experiment, i.e., the number of experimental runs and the input
settings for each run.
The same can be said when estimating the parameters of any parametric model
based on physical experimental data, and standard criteria for designing such
physical experiments are often based on measures related to parameter identifia-
bility. However, calibration of computer simulation models involves a fundamental
distinction: Prior to designing the physical experiment, one can learn via simulation
a great deal about the nature of the functional dependence of the response(s) on
the input variables and on the calibration parameters, and this knowledge can be
exploited when designing the physical experiment to provide better identifiability.
Indeed, because of the cost and difficulty in developing and placing apparatus for
measuring certain responses, it is generally not feasible to measure experimentally
all of the great many responses that are automatically calculated in the simulations.
Moreover, it is generally not necessary, because measuring only a subset of the
responses may result in acceptable identifiability. It is preferable to select a relatively
small but most appropriate subset of responses to measure experimentally, to best
enhance identifiability.
In order to accomplish this, one needs to predict or approximate the degree of
identifiability prior to conducting the physical experiments, but after conducting
the computer simulations. The primary purpose of this section is to introduce a
preposterior analysis developed in [41–43] and investigate the use of the prepos-
terior covariance matrix [44, 45] of the calibration parameters for this purpose
and to demonstrate that it provides a reasonable prediction of the actual posterior
covariance, at least for the examples that are considered. Consequently, the prepos-
terior covariance matrix can serve as a reasonable criterion for designing physical
experiments in order to achieve good identifiability when calibrating computer
simulation models.
It is worth noting that a preposterior analysis cannot improve one's knowledge of
the calibration parameters, because it is conducted prior to observing any physical
experimental data. However, it can provide a reasonable quantification of the
uncertainty in the calibration parameters that one expects will result after observing
the experimental data. This is analogous to what occurs in standard optimal design
of experiments [46, 47] for fitting a response surface via linear regression. Given the
design matrix and the standard deviation of the random observation error, one can
calculate the covariance matrix of the estimated parameters prior to conducting the
experiment and use this as the experimental design criterion. Of course, one cannot
estimate the parameters until the experimental data are collected.
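That classical calculation can be sketched directly: for linear regression y = Xβ + ε with error standard deviation σ, the covariance Cov(β̂) = σ²(XᵀX)⁻¹ depends only on the design, so it is computable before any data are collected. A sketch with NumPy (helper names are illustrative):

```python
# Preposterior idea in classical optimal design: the covariance of the
# least-squares estimate is known from the design matrix and sigma alone.
import numpy as np

def prior_parameter_cov(X, sigma):
    """Cov(beta-hat) = sigma^2 (X^T X)^{-1}, available before data collection."""
    return sigma**2 * np.linalg.inv(X.T @ X)

# Design matrix for a straight-line fit at x = 0, 1, 2, 3 (intercept + slope).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
C = prior_parameter_cov(X, sigma=0.5)
assert C.shape == (2, 2)
# Doubling the number of runs at the same settings halves each variance.
C2 = prior_parameter_cov(np.vstack([X, X]), sigma=0.5)
assert np.allclose(C2, C / 2)
```

A scalar summary of C (e.g., its trace or determinant) can then serve as the design criterion, exactly as the preposterior covariance does for calibration.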
The format of the remainder of the section is as follows. Section 4.1 presents
the approach to calculate the preposterior covariance matrix of the calibration
parameters. Section 4.2 discusses a modification of the preposterior analysis that
provides insight into the behavior of the preposterior covariance as a means of
predicting the posterior covariance. Section 4.3 provides a detailed description
of the surrogate preposterior analysis, in conjunction with the multi-response
modular Bayesian approach, for substantially reducing the computational cost in
the preposterior analysis. Sections 4.4, 4.5, and 4.6 illustrate the effectiveness of
the preposterior analysis in predicting identifiability prior to conducting physical
experiments by using a number of examples and discuss the issue of identifiability
of the calibration parameters. Remarks are made in Sect. 4.7.
Fig. 4.14 Flowchart of the preposterior analysis for selecting experiment responses
non-informative prior for B^m are used. Over the remainder of the preposterior
analysis, the MRGP model of y^m(x, θ) is not updated from that of Step 1a.
Next, a total of N_e experimental input settings X^e = [x_1^e, …, x_{N_e}^e]^T are
defined, representing the input settings, i.e., the experimental design, that one
intends to use for the physical experiment (in the MC simulations of Step 2,
simulated experimental response values will be generated at these input settings).
Step 1c then assigns the prior distribution parameters for the priors of the MRGP
hyperparameters in the discrepancy function. The prior for the discrepancy function
MRGP hyperparameters φ^δ = {B^δ, Σ^δ, ω^δ} [analogous to the hyperparameters
φ^m for the MRGP model of y^m(x, θ)] captures the prior uncertainty and nonlinearity
of the discrepancy functions; Σ^δ in Eq. (4.10) controls the magnitude of uncertainty
of the MRGP model, and ω^δ controls its roughness (rougher surfaces correspond
to larger values of ω^δ). This prior for φ^δ is used only in Step 2a-iii of the MC
loop, in which values of φ^δ are sampled from the specified prior before generating
a hypothetical discrepancy function. Because the modular Bayesian approach
calculates MLEs of φ^δ in Step 2b, the prior for φ^δ is not incorporated via maximum
a posteriori estimators, although one could modify the approach to do this if desired.
For computational convenience, {B^δ, Σ^δ, ω^δ} can be assigned point mass
priors.
Additionally in Step 1c, the user specifies the prior distribution for the exper-
imental error, which is assumed to be i.i.d. normal with mean 0 and a prior
variance λ chosen based on prior knowledge of the level of measurement error.
λ is treated in the same manner as φ^δ. Its prior is used in Step 2a-iv when generating
a random value for λ, but this prior is not used when calculating the MLE of λ in
Step 2b.
Lastly in Step 1c, the user assigns a prior distribution p(θ) to the calibration
parameters θ. The prior for the calibration parameters should have support spanning
the entire range of possible values for θ; for a less informative prior, one can use a
uniform distribution over a broad range, while a normal distribution with specified
mean and variance can be used for a more informative prior. Since the uncertainty
in the calibration parameters is of direct interest, Module 3 of the modular Bayesian
approach in Step 2c calculates a full posterior distribution for θ based on the
specified prior, in contrast to the MLE-only estimation for φ^m and φ^δ in Modules 1
and 2. Based on the posterior of θ, Module 3 also calculates the posterior covariance
of θ for each MC replicate.
A realization θ^(i) of the calibration parameters is first generated from its prior
distribution specified in Step 1. The generated θ^(i) will be treated as the true value
for θ when generating the hypothetical physical experimental response observations
on the ith MC replicate.
Next, using the MRGP model for the computer simulation obtained in Step 1,
a realization of the computer simulation response values is generated at param-
eter values θ^(i) and input settings X^e, denoted by Y^{m(i)} = ŷ^m(X^e, θ^(i)) =
[ŷ_1^{m(i)}, …, ŷ_q^{m(i)}], where ŷ_j^{m(i)} = [ŷ_j(x_1^e, θ^(i)), …, ŷ_j(x_{N_e}^e, θ^(i))]^T
(j = 1, …, q). Here, Y^{m(i)} is generated from a multivariate normal distribution whose
mean vector and covariance matrix are determined by the MRGP model. The hat
notation ^ over y^m(X^e, θ^(i)) denotes that interpolated data are drawn from
the MRGP model (instead of the original model) of y^m.
Subsequently, using the priors for φ^δ = {B^δ, Σ^δ, ω^δ} specified in Step 1c
and the MRGP model of Eq. (4.10), a realization of the discrepancy functions
is generated at the design settings X^e, denoted by Δ^(i) = [δ_1^(i), …, δ_q^(i)], where
δ_j^(i) (j = 1, …, q) represents the realization of the discrepancy function for the jth
response at X^e. Similar to generating Y^{m(i)}, a multivariate normal distribution is
used to generate Δ^(i), but with mean vector and covariance matrix determined by
the MRGP model for the discrepancy functions. Specifically, the mean is H^δ B^{δ(i)},
where H^δ = [h^δ(x_1^e)^T, …, h^δ(x_{N_e}^e)^T]^T with h^δ(x) some specified regression basis
functions (in the examples, a constant h^δ(x) = 1 will be used), and the covariance is
Σ^{δ(i)} ⊗ R^δ, where R^δ is the N_e × N_e matrix with kth-row, jth-column entry
equal to the prior correlation between δ(x_k^e) and δ(x_j^e) using parameters ω^{δ(i)}.
Finally, a realization E^(i) of the observation errors is generated at the design
settings X^e, assuming that the experimental error ε_j (j = 1, …, q) follows i.i.d.
normal distributions with mean 0 and variance λ_j specified in Step 1c. E^(i) is an
N_e × q matrix whose uth-row, vth-column entry is the realization of the observation
error for the vth response at the uth design setting x_u^e. Based on Eq. (4.7), the
realization Y^{e(i)} of the simulated experimental responses at the design settings X^e
is calculated via:
Cov_pp ≈ (1/N_mc) Σ_{i=1}^{N_mc} Cov_pp(θ^(i)),   θ^(i) ~ p(θ),

where p(θ) is the prior distribution of θ specified in Step 1c-iii. In other words, the
preposterior covariance is the expected value of the fixed-θ preposterior covariance
with respect to the prior of θ.
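Steps 2a-i through 2a-iv can be sketched for a single response as follows; the computer model is evaluated directly here rather than through a fitted MRGP, and all function and variable names are illustrative:

```python
# One Monte Carlo replicate of Step 2a: draw theta from its prior, draw a
# discrepancy realization from a GP prior with Gaussian correlation, add
# i.i.d. observation error, and sum the pieces as Y^e(i) = Y^m(i) + D(i) + E(i).
import numpy as np

rng = np.random.default_rng(0)

def gp_prior_draw(xe, sigma2, omega, rng):
    """Draw delta(xe) ~ N(0, sigma2 * R) with Gaussian correlation R."""
    d = xe[:, None] - xe[None, :]
    R = np.exp(-omega * d**2)
    return rng.multivariate_normal(np.zeros(len(xe)), sigma2 * R)

def simulate_replicate(xe, y_m, theta_prior, sigma2_delta, omega, lam, rng):
    theta = theta_prior(rng)                             # Step 2a-i
    ym = y_m(xe, theta)                                  # Step 2a-ii (direct model
                                                         # here, MRGP draw in text)
    delta = gp_prior_draw(xe, sigma2_delta, omega, rng)  # Step 2a-iii
    eps = rng.normal(0.0, np.sqrt(lam), len(xe))         # Step 2a-iv
    return ym + delta + eps                              # hypothetical Y^e(i)

xe = np.linspace(0.0, 1.0, 5)
ye = simulate_replicate(
    xe, lambda x, t: np.sin(t * x), lambda r: r.uniform(0, 4),
    sigma2_delta=0.5, omega=3.0, lam=1e-4, rng=rng)
assert ye.shape == xe.shape
```

Repeating this N_mc times and running the modular Bayesian analysis on each hypothetical data set yields the sample of posterior covariances that the preposterior average is taken over.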
[I(θ)]_{u,v} = −(∂²/∂θ_u ∂θ_v) log p(Y^m, Y^e | θ, φ̂)   (u, v = 1, 2, …, r),   (4.16)

where p(Y^m, Y^e | θ, φ̂) is the likelihood function for the simulation and experimen-
tal data together, as in Eq. (4.12), and θ represents the true parameters. It measures
the amount of information that the yet-to-be-collected experimental data Y^e, together
with the already-observed Y^m, provide about the calibration parameters θ. It should be
noted that θ in Eq. (4.16) is meant to be the true values of the calibration parameters.
Hence, in Eq. (4.16) the distribution of the computer response Y^m (which is known)
by itself does not depend on θ. Only the distribution of Y^e depends on θ.
The Fisher-like surrogate preposterior criterion that will be introduced shortly in
this section, which is used only for reducing the number of response combinations
that must be considered, is a modified version of Eq. (4.16). To handle the compli-
cation that the yet-to-be-collected experimental data Y^e are unknown in Eq. (4.16),
the simple substitution of ŷ^m(X^e, θ) for Y^e is made. Here, ŷ^m(X^e, θ) represents the
predicted value of Y^e via interpolating the data from the MRGP model of y^m.

[Î]_{u,v} = −(1/N′_mc) Σ_{i=1}^{N′_mc} (∂²/∂θ_u ∂θ_v) log p(Y^m, ŷ^m(X^e, θ^(i)) | θ, φ̂^(i)) |_{θ = θ^(i)},   (4.17)

where φ̂^(i) denotes the values of φ estimated in the modular Bayesian algorithm on
the ith MC replicate. Various scalar metrics of Î (e.g., trace, determinant, maximum
eigenvalue, etc.) can be used as the surrogate measure of identifiability. For a simple
case with a single calibration parameter θ, Î is a scalar. The larger Î is, the more
information there is about θ from Y^m and Y^e, and the more likely it is to achieve
good identifiability.
After calculating Î from Eq. (4.17), the top N_c subsets of responses (based on one
of the scalar measures of identifiability extracted from Î) are chosen to be further
analyzed in the full preposterior analysis. Figure 4.15 is a flowchart of the entire
Fisher information-based surrogate preposterior procedure.
The surrogate analysis is clearly much more efficient than the full prepos-
terior analysis. The number of MC replicates N′_mc can be significantly smaller
than what is required in the preposterior analysis (N_mc), since the MC sam-
pling now is only with respect to θ. In addition, and more importantly, the
calculations in each MC replicate are much simpler. Within each MC replicate,
neither an inner-level MC simulation nor numerical integration is needed. The
main computation within each MC replicate is to calculate the MLEs of the
hyperparameters φ̂^(i) and then to evaluate the likelihood at a few discrete values
in the neighborhood of θ^(i) to calculate the second-order derivative in Eq. (4.17)
numerically.
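That finite-difference computation, for a single calibration parameter, can be sketched as follows (toy quadratic log-likelihood; function names are illustrative):

```python
# The "few discrete values" evaluation: estimate -d2(log-likelihood)/dtheta2 at
# each MC draw theta_i with a three-point central difference, then average over
# the draws, giving the scalar version of Eq. (4.17).

def neg_second_derivative(f, t, h=1e-3):
    """-f''(t) via central differences (three likelihood evaluations)."""
    return -(f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

def surrogate_info(log_lik, theta_draws, h=1e-3):
    """Scalar surrogate criterion: average curvature over the MC draws."""
    vals = [neg_second_derivative(log_lik, t, h) for t in theta_draws]
    return sum(vals) / len(vals)

log_lik = lambda t: -5.0 * (t - 2.0) ** 2   # true curvature: -10 everywhere
I_hat = surrogate_info(log_lik, [0.5, 1.5, 2.5, 3.5])
assert abs(I_hat - 10.0) < 1e-3
```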
In this section, the same beam example (see Sects. 2.2 and 3.2) illustrates how the
preposterior analysis can be used, after conducting the computer simulations but
before conducting the physical experiments, to predict the identifiability that will
result from a proposed experimental design. The system in this example is inherently
difficult to identify with a single response, even with a large number of physical
experimental observations.
Fig. 4.17 Plot of the fixed-θ preposterior STD and the posterior STD (both normalized by the
prior STD of θ) versus θ_t for the beam example. PP STD is the fixed-θ preposterior STD, and
P STD is the posterior STD
true θ* = 206.8 GPa. Consequently, the relative difference in the preposterior STD
[which was calculated as the integral of the fixed-θ preposterior STD in Fig. 4.17,
with respect to p(θ)] accurately reflects the relative difference in the actual posterior
STD in this example. Specifically, when N_e is increased from 3 to 8 in Fig. 4.17, the
posterior STD decreases by 10.3% (from 0.8231 to 0.7383), and the preposterior
STD decreases by 12.5% (from 0.6676 to 0.5840). This correctly indicates that the
experimental design with N_e = 8 will result in somewhat better identifiability than
the experimental design with N_e = 3, as measured by the posterior STD that would
result after the experiments are conducted.
The preposterior STD was then calculated from the fixed-θ preposterior STDs
in Fig. 4.17, as described in Sect. 4.2. The analyses were repeated for various N_e,
and Fig. 4.18 shows a scatter plot of the preposterior STD versus the posterior
STD (each normalized by the prior STD of θ) for N_e = 3, 4, …, 15. The scatter plot
conveys a number of interesting characteristics of the beam example. Although
the preposterior STD slightly underestimates the posterior STD for the different
designs, the trend is the same: when N_e increases from 3 to 4, the preposterior
STD correctly predicts that the posterior STD will decrease. Then, as N_e increases
from 4 through 15, the preposterior STD again correctly predicts that there will
be negligible further decrease in the posterior STD. The overall conclusion that a
user might draw from this preposterior analysis is that the system inherently suffers
from a lack of identifiability, because an increase in N_e beyond 4 results in almost no
further improvement in the preposterior STD. Sections 2 and 3 discussed the reasons
behind the lack of identifiability in detail and also demonstrated that identifiability
is greatly improved when multiple responses are measured experimentally. In the
next subsection, a second example will be presented in which the identifiability of the system
continues to improve as experimental data are added.
One way in which the preposterior analysis can be used is to help the user choose
the number N_e of experimental runs [41]. Figure 4.19, which plots the normalized
preposterior STD versus N_e for the beam example, illustrates how this might be
accomplished. One might conclude that increasing N_e beyond 4 will provide little
further knowledge of θ and that the beam system is inherently difficult to identify
with only the single experimentally measured response (which was strain at the
beam midpoint).
The beam example had only a single input variable x and used uniformly spaced
grid designs. We now consider an example that uses Latin hypercube designs with
two input variables and a calibration parameter. Suppose the computer model is:

y^m(x_1, x_2, θ) = sin(θ x_1) + x_1 x_2,
x_1 ∈ [0, 1], x_2 ∈ [0, 1], θ ∈ [0, 4],
and the physical experimental data are generated from the model:
We began with a Latin hypercube design for the computer experiment with 40
points in three dimensions {x_1, x_2, θ}. An optimal Latin hypercube design based on
the maximin criterion was used. A Gaussian process model was fit to the computer
experiment data via MLE in Step 1a of Fig. 4.14. For the physical experimental
Fig. 4.19 Preposterior STD (normalized) versus Ne for the beam example
Fig. 4.20 Fixed-θ preposterior STD (normalized by the prior STD of θ) versus θ_t for Latin
hypercube designs of various sizes N_e (N_e is specified in the legend)
design specified in Step 1b, maximin Latin hypercube designs of various size
Ne ∈ {10, 20, 30, 40, 50} were used over the two dimensions {x1, x2}. Because of the
random nature of the Latin hypercube designs, none of the physical experimental
designs had x settings that coincided with the computer experiment x settings. Next,
in Step 1c-i, the GP hyperparameters of the discrepancy function were set to
βδ = 0, ωδ = 3, and σδ = 0.75, which corresponds to assigning them point
mass priors. The relatively large value of σδ results in a variety of relatively large
discrepancy functions being generated within Step 2 of the preposterior algorithm.
In Step 1c-ii, λ = 0.012 was specified. Lastly, in Step 1c-iii, the prior for θ was
specified to be uniform over the entire range [0, 4].
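The design step above can be sketched as follows. The helpers `latin_hypercube` and `maximin_lhs` are illustrative stand-ins (not the chapter's code), and the random-restart search is only a crude approximation of a true maximin-optimal LHS routine:

```python
import numpy as np

rng = np.random.default_rng(0)

def latin_hypercube(n, d, rng):
    """Random Latin hypercube on [0,1]^d: one point per stratum per dimension."""
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T  # (n, d)
    return (strata + rng.random((n, d))) / n

def maximin_lhs(n, d, rng, tries=200):
    """Keep the candidate LHS whose minimum pairwise distance is largest."""
    best, best_score = None, -np.inf
    for _ in range(tries):
        x = latin_hypercube(n, d, rng)
        dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        score = dists[np.triu_indices(n, 1)].min()
        if score > best_score:
            best, best_score = x, score
    return best

# 40-point design over (x1, x2, theta) in [0,1] x [0,1] x [0,4]
design = maximin_lhs(40, 3, rng)
design[:, 2] *= 4.0                      # rescale theta column to [0, 4]
x1, x2, theta = design.T
ym = np.sin(theta * x1) + x1 * x2        # the computer model of this example
```

Each column of `design` is stratified (one point per 1/40 bin), which is the defining property of a Latin hypercube.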
From the preposterior analysis, Fig. 4.20 shows the resulting fixed-θ preposterior
standard deviation versus θt for the designs with different Ne, and Fig. 4.21 shows
the (normalized) preposterior standard deviation as a function of Ne. From Fig. 4.21,
the preposterior standard deviation does not substantially decrease beyond roughly
Ne = 30.
In this section, the simply supported beam example is once again employed to
demonstrate the effectiveness of the preposterior and surrogate preposterior analysis
approaches under the multi-response scenario. The physical meanings of the six output
responses are as defined earlier in the chapter.
Fig. 4.21 Preposterior STD (normalized by the prior STD of θ) versus Ne for the Latin hypercube designs
The prior for the calibration parameter θ is a uniform distribution over [150, 300]
GPa. Additionally, for the preposterior analysis, point mass priors are assigned to
the hyperparameters of the discrepancy functions with masses at Bδ = 0, Σδ =
diag(1.11 × 10⁻⁷, 1.11 × 10⁻⁷), and ωδ = 2. The experimental errors are assigned
independent normal distributions: ε4 ∼ N(0, 0.0070) and ε5 ∼ N(0, 2.943 × 10⁻⁹).
In the MC loop of the preposterior analysis, there are a total of Nmc = 1600
MC replicates for this specific subset of responses. Within the ith MC replicate,
a realization of the simulation response Ym(i) at the input settings Xe is generated
from the MRGP model fitted in Step 1a (the mean of which is shown in Fig. 4.22),
a realization of the model discrepancy δ(i) is generated based on the specified
prior for the hyperparameters, and a realization of the observational error E(i) is
generated based on the distributions of ε4 and ε5. Based on these, a realization of
experimental data Ye(i) is calculated via Step 2a-iv and can be used by the multi-response
modular Bayesian approach to calculate the sample posterior variance for
the ith MC replicate. Assuming a uniform prior for θ over the range [150, 300] GPa,
the preposterior variance is estimated as the average of all 1600 sample posterior
variances, which is σ̃θ² = 2.0054 GPa².
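The structure of this MC loop can be illustrated with a deliberately simplified scalar stand-in for the chapter's beam/MRGP setting: a linear model y = θx with a flat prior on θ and a grid-based posterior. The model, grid, noise level, and replicate count below are all assumptions made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

x_e = np.linspace(0.1, 1.0, 10)            # "experimental" input settings
theta_grid = np.linspace(0.0, 1.0, 201)    # uniform prior support for theta
sigma_eps = 0.05                           # assumed known observation-error STD

post_vars = []
for _ in range(200):                       # MC replicates (the chapter uses 1600)
    theta_true = rng.uniform(0.0, 1.0)     # draw theta from its prior
    y_hyp = theta_true * x_e + rng.normal(0.0, sigma_eps, x_e.size)
    # Grid posterior: Gaussian likelihood times flat prior, then normalize.
    resid = y_hyp[None, :] - theta_grid[:, None] * x_e[None, :]
    loglik = -0.5 * (resid ** 2).sum(axis=1) / sigma_eps ** 2
    w = np.exp(loglik - loglik.max())
    w /= w.sum()
    mean = (w * theta_grid).sum()
    post_vars.append((w * (theta_grid - mean) ** 2).sum())

# Preposterior variance: average of the sample posterior variances.
preposterior_var = float(np.mean(post_vars))
```

The final average plays the role of the chapter's σ̃θ²: an expected posterior variance, computed before any real data exist.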
In the MC loop of the surrogate preposterior analysis, N′mc equally spaced values
of θ are considered. Within the ith MC replicate, a realization of the simulation
Ym(i) is generated, as in the preposterior analysis. However, without generating
δ(i) and E(i), we directly take Ye(i) = Ym(i) and apply the multi-response
modular Bayesian approach to calculate the negative second-order derivative of the
log-likelihood function (the term inside the brackets in Eq. (4.17)). The results
for all 16 values of θ are plotted in Fig. 4.23b. The Fisher information-based scalar
predictor Î, taken to be the average of the 16 values, is 351.8. Because θ is scalar
with a uniform prior for this example, taking the average over the 16 evenly spaced
values of θ is more computationally efficient than generating random draws of θ
from its prior and using Eq. (4.17).
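A minimal sketch of this surrogate predictor follows, using a toy Gaussian likelihood in place of the chapter's MRGP likelihood; the model form, input grid, noise level, and θ grid placement are assumptions. The key step is the same: evaluate the negative second derivative of the log-likelihood at hypothetical noise-free data Ye := Ym(θ), then average over an evenly spaced θ grid:

```python
import numpy as np

x_e = np.linspace(0.0, 1.0, 8)   # toy experimental inputs
sigma = 0.05                     # assumed observation-error STD

def model(x, theta):
    return np.sin(theta * x)     # toy computer model

def loglik(theta, y_obs):
    r = y_obs - model(x_e, theta)
    return -0.5 * np.sum(r ** 2) / sigma ** 2

def neg_second_deriv(theta, h=1e-4):
    """-d^2/dtheta^2 log L, by central differences, at surrogate data y = model(x, theta)."""
    y = model(x_e, theta)        # hypothetical noise-free data (Ye := Ym)
    return -(loglik(theta + h, y) - 2.0 * loglik(theta, y) + loglik(theta - h, y)) / h ** 2

theta_grid = np.linspace(0.25, 3.75, 16)  # 16 evenly spaced values (placement assumed)
I_hat = float(np.mean([neg_second_deriv(t) for t in theta_grid]))
```

At the surrogate data point the log-likelihood is maximized at θ itself, so the finite-difference value matches the analytic Fisher information Σᵢ (xᵢ cos(θxᵢ))²/σ² for this toy model.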
To better illustrate the relationship between the preposterior and the surrogate
preposterior analyses, the preposterior analysis is conducted in the same fixed-θ
manner described in the preceding paragraph for the surrogate preposterior analysis.
Fig. 4.23 16 equally spaced values of θ and the corresponding (a) sample average of the posterior
variance (unit: GPa²) using hypothetical data over 100 MC simulations per θ, and (b) negative
second-order derivative of the log-likelihood function
That is, the 16 values of θ equally spaced within [150, 300] GPa are
considered, and for each specific value, 100 MC replicates are used to calculate
the sample posterior variance. A procedure identical to that described in Sect. 4.1,
but with θ fixed over the 100 MC replicates, was used to estimate the preposterior
variance (as the average of the 100 sample posterior variances over the 16 values
of θ).
of ). The results are shown in Fig. 4.23a. Comparing Fig. 4.23a, b, a clear negative
correlation can be seen between the preposterior variance and the Fisher information
criterion. This is expected, considering that the preposterior variance is an estimate
of the actual posterior variance, which is closely related to the inverse of the Fisher
information; a larger value of the former generally corresponds to a larger value of
the latter.
The same procedure was repeated for every other pair of responses. Table 4.6
shows the results of the preposterior and surrogate preposterior analyses. The ranks
in columns 2 and 3 are based on the predicted identifiability; 1 corresponds to the
smallest preposterior variance/largest Î and 15 to the largest preposterior variance/smallest
Î. The rankings provide us with predictions of which subsets are most likely
to enhance identifiability (lower ranks) and which less likely (higher ranks). The
posterior columns in Table 4.6, which are taken from Table 4.5, show the actual
posterior covariances that resulted from each combination of measured responses
after the experiments were conducted. They serve as a basis for comparison, and
ideally we would like the rankings from the posterior analyses to coincide with the
predicted rankings from the preposterior and surrogate preposterior analyses. It can
be seen that the preposterior analysis is in good relative agreement with the posterior.
Although the values of preposterior and posterior standard deviations are off by
roughly a factor of 2.5, both analyses indicate that {y4 ; y5 } together lead to the best
identifiability, while {y1 ; y2 } together lead to the worst identifiability. Overall, the
rankings are in very close agreement. The improvement on identifiability relative
to the worst case (i.e., {y1 ; y2 } for both analyses) is also calculated and provided
in the table. The relative improvements are also in very close agreement. The
Table 4.6 Comparisons between posterior, preposterior, and surrogate preposterior analyses

| Responses | Posterior STD | Rank | Improvement | Preposterior σ̃ | Rank | Improvement | Surrogate Î | Rank |
| y4, y5    |  3.63 |  1 | 86.7% |  1.4161 |  1 | 87.0% | 351.8 |  1 |
| y4, y6    |  3.90 |  2 | 85.8% |  1.5224 |  3 | 86.0% | 186.9 |  6 |
| y5, y6    |  4.67 |  3 | 83.0% |  1.6528 |  4 | 84.8% | 218.2 |  3 |
| y3, y6    |  4.86 |  4 | 82.2% |  1.6618 |  5 | 84.7% | 161.6 |  7 |
| y3, y4    |  5.49 |  5 | 80.0% |  1.4801 |  2 | 86.4% | 273.0 |  2 |
| y3, y5    |  6.27 |  6 | 77.1% |  2.1117 |  6 | 80.6% | 202.9 |  5 |
| y1, y4    |  9.29 |  7 | 66.1% |  3.0933 |  7 | 71.6% | 209.6 |  4 |
| y1, y3    | 10.13 |  8 | 63.0% |  3.2583 |  8 | 70.1% | 113.3 |  9 |
| y2, y3    | 10.96 |  9 | 60.0% |  3.6595 |  9 | 66.4% | 93.34 | 10 |
| y2, y5    | 15.42 | 10 | 43.4% |  6.0765 | 11 | 44.2% | 114.3 |  8 |
| y1, y5    | 15.57 | 11 | 43.1% |  5.5379 | 10 | 49.2% | 90.92 | 11 |
| y2, y4    | 18.35 | 12 | 33.9% |  8.7816 | 12 | 19.4% | 65.94 | 14 |
| y1, y6    | 24.08 | 13 | 12.0% |  9.1008 | 13 | 16.5% | 81.19 | 12 |
| y2, y6    | 26.46 | 14 |  3.3% | 10.1850 | 14 |  6.5% | 80.99 | 13 |
| y1, y2    | 27.37 | 15 |   –   | 10.8931 | 15 |   –   | 47.09 | 15 |
results provided by the surrogate preposterior analysis are also in close agreement
with the posterior standard deviation results, although slightly less so than the
preposterior standard deviations. The top 7 pairs of responses coincide with the
top 7 pairs from the actual posterior analysis. Hence, the surrogate preposterior
analysis would have effectively narrowed down the candidate pairs to consider in
the preposterior analysis. Considering its extremely low computational cost, the
surrogate preposterior analysis is a useful enhancement to the preposterior analysis
for reducing the number of response pairs to consider.
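The degree of rank agreement claimed above can be checked directly from the rank columns of Table 4.6 with the textbook formula for Spearman's rank correlation; this quick computation is an illustration, not part of the chapter's analysis:

```python
# Rank columns from Table 4.6, with rows ordered by posterior rank (1..15).
posterior_rank    = list(range(1, 16))
preposterior_rank = [1, 3, 4, 5, 2, 6, 7, 8, 9, 11, 10, 12, 13, 14, 15]
surrogate_rank    = [1, 6, 3, 7, 2, 5, 4, 9, 10, 8, 11, 14, 12, 13, 15]

def spearman(r1, r2):
    """Spearman's rho from two full (tie-free) rankings:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rho_pre = spearman(posterior_rank, preposterior_rank)  # 0.975
rho_sur = spearman(posterior_rank, surrogate_rank)     # 0.900
```

Both correlations are high, and the preposterior ranking agrees with the posterior ranking slightly better than the surrogate ranking does, consistent with the discussion above.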
The results from three analyses in Table 4.6 are in good accordance with the
underlying physics of the system. For example, the strain y1 and the plastic strain
y2 are perfectly correlated with each other; their values are off by a constant
(equal to the value of elastic strain) after plastic deformation. Therefore, their
combination adds no more information about θ and enhances identifiability little
beyond using either single response. In contrast, the internal energy y4 and the
midpoint displacement y5 follow a nonlinear relationship, and thus the degree of
improvement in identifiability is substantial.
It is not surprising to observe the absolute differences between the posterior
variance and preposterior variances. In the MC simulations of the preposterior
analysis, the hypothetical experimental data are generated based on discrepancy
functions generated from their assigned prior distribution. In contrast, the posterior
variance calculation is based on the single realization that is the actual discrepancy
function. Consequently, the Bayesian analysis modules inside the MC loops are
hypothetical and are not expected to obtain the same values of preposterior variance
as the actual posterior variance. The surrogate preposterior analysis involves further
Fig. 4.24 Preposterior standard deviation and Fisher information-based identifiability predictor,
versus posterior standard deviation, demonstrating very high correlation
Fig. 4.25 Posterior distribution of the calibration parameter using a single response
Fig. 4.26 Posterior distribution of the calibration parameter using multiple responses
Both the preposterior variance and the surrogate Fisher information criterion Î, calculated
via the algorithm of Fig. 4.15, constitute
reasonable predictions of the posterior covariance that will result when the physical
experimental data are collected, at least for the examples considered. As such, the
preposterior and the surrogate preposterior analyses can provide insight into the
identifiability of a system and serve as criteria for helping to design an effective
physical experimental design, based on the results of a previously conducted set of
computer simulations.
Regarding the conditions on δ(x) required for reasonable identifiability, clearly
there must be some, and δ(x) is not allowed to be an arbitrary function. The assumption that
δ(x) can be represented by a Gaussian process model is not a particularly strong one,
because, for various choices of correlation function, a Gaussian process prior can
produce an extremely wide variety of stochastic realizations of δ(x). This flexibility
of the Gaussian process model is one of the reasons for its popularity in the
calibration literature. As discussed in Sect. 2.3, good identifiability is more difficult
sufficient for this example, and no appreciable difference was found in the results
when the number of MC replicates was increased to 8500. The computational expense will also
increase with the number of variables. For the example depicted in Figs. 4.20
and 4.21, each MC replicate took roughly 0.028 min on average, and this was
also largely independent of Ne . In order to use the approach in a formal design
optimization algorithm, one would have to calculate the preposterior standard
deviation for a great many different candidate designs as the exact locations of
the x settings are varied. Hence, the surrogate preposterior analysis (Fig. 4.15) is
needed to reduce computational expenses and to adapt the algorithm for these
purposes.
5 Conclusions
To use a computer model for simulation-based design, designers must build confi-
dence in using the model for predicting physical systems, which is accomplished by
quantifying uncertainty via model updating. The model updating formulation proposed
by Kennedy and O'Hagan is the most comprehensive and applicable to design
under uncertainty. However, a limitation of this model updating formulation is that
it can be difficult to distinguish between the effects of the calibration parameters
(parameter uncertainty) and the discrepancy function (model discrepancy), which
results in a lack of identifiability. It is important to identify, or differentiate, between
these two sources of uncertainty in order to better understand the underlying sources
of model uncertainty and ultimately further improve the model to better represent
reality.
The degree of identifiability can be measured by the posterior covariance of the
calibration parameters in a typical model uncertainty quantification framework. A
better understanding of the identifiability problem is provided in this chapter in
relation to model updating by using several illustrative examples. It was first shown
how the calibration parameters and the discrepancy function were not identifiable in
the simply supported beam example, even when significant amounts of simulation
and experimental data are available. There are several different combinations of
the computer model (with different values for the calibration parameter) and the
estimated discrepancy function that combine to accurately predict the experimental
response, which translates to poor identifiability. To improve identifiability, one
can use informative prior distributions for the calibration parameters and/or the
discrepancy function; however, this is a crude solution. Furthermore, in practice,
informative priors typically do not exist. In the step function example in Sect. 2.3,
it was shown that identifiability is possible under the relatively mild assumption
of a smooth discrepancy function. In this example, the estimated discrepancy
function is only smooth and consistent with the GP model when the calibration
parameter equals the true value, which resulted in good identifiability. In conclusion,
identifiability is a highly nuanced and difficult problem, but not impossible.
In spite of the identifiability challenges, good identifiability may often be
achieved in model uncertainty quantification by measuring multiple experimental
responses that are automatically calculated in the simulation and that share a mutual
dependence on a common set of calibration parameters. An intriguing phenomenon
was observed that some combinations of responses may result in drastically different
identifiability than others: By revisiting the beam example, it was shown that
measuring certain responses will achieve substantial improvement in identifiability,
while measuring other combinations of responses provides little improvement in
identifiability beyond measuring only a single response. To take advantage of this, a
method is needed for predicting multi-response identifiability prior to conducting
the physical experiments, to allow users to choose the most appropriate set of
responses to measure experimentally.
As it is generally not feasible, nor necessary, to measure experimentally all
of the great many responses that are automatically calculated in the simulations,
a preposterior analysis was introduced to address the issue of how to select the
most appropriate subset of responses to measure experimentally, to best enhance
identifiability. Prior to conducting the physical experiments but after conducting the
computer simulations, the preposterior analysis can predict the relative improvement
in identifiability that will result using different subsets of responses. It is based
on Monte Carlo simulations within a modular Bayesian multi-response Gaussian
process (MRGP) framework. The case studies showed that the approach is effective
in predicting which subset of responses will provide the largest improvement in
identifiability. Even though there are absolute differences between the preposterior
and actual posterior covariances, the relative differences and the rankings derived
from them are quite consistent, indicating that the method can be used effectively to
choose the best combination of responses to measure experimentally.
Furthermore, to render the approach computationally feasible in engineering
applications with a large number of responses, a simpler, surrogate preposterior
analysis based on the expected Fisher information of the calibration parameters was
introduced. The expected Fisher information matrix is the frequentist counterpart to
the Bayesian preposterior covariance matrix of the parameters. It was demonstrated
that, while being much faster to calculate than the preposterior covariance, it still
provides a reasonable indication of the resulting identifiability. For real engineering
applications with many system responses (and hence many different combinations
of responses), it is recommended to use the surrogate preposterior analysis to
eliminate the responses that are unlikely to lead to good identifiability and reduce
the number of response combinations that are considered in the preposterior
analysis.
To obtain the MLEs of the hyperparameters for the computer model MRGP model,
the multivariate normal likelihood function is first constructed as:
$$ \ln p\big(\mathrm{vec}(Y^m)\,\big|\,B^m, \Sigma^m, \omega^m\big) = -\frac{qN_m}{2}\ln(2\pi) - \frac{N_m}{2}\ln|\Sigma^m| - \frac{q}{2}\ln|R^m| - \frac{1}{2}\,\mathrm{vec}(Y^m - H^m B^m)^T (\Sigma^m \otimes R^m)^{-1} \mathrm{vec}(Y^m - H^m B^m). \tag{4.19} $$

The MLE of $B^m$ is found by setting the derivative of Eq. (4.19) with respect to $B^m$
equal to zero, which gives:

$$ \hat{B}^m = \big[(H^m)^T (R^m)^{-1} H^m\big]^{-1} (H^m)^T (R^m)^{-1} Y^m. \tag{4.20} $$

The MLE of $\Sigma^m$ is found using Result 4.10 of Ref. [52], which yields:

$$ \hat{\Sigma}^m = \frac{1}{N_m}\,(Y^m - H^m \hat{B}^m)^T (R^m)^{-1} (Y^m - H^m \hat{B}^m). \tag{4.21} $$
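Equations (4.20) and (4.21) amount to a generalized-least-squares fit followed by a covariance estimate from the whitened residuals. A small numeric sketch with numpy follows; the matrices H, R, and Y here are synthetic stand-ins for $H^m$, $R^m$, and $Y^m$ (sizes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
Nm, q, p = 30, 2, 3                  # runs, responses, regression functions (illustrative)

H = rng.random((Nm, p))              # regression matrix H^m
A = rng.random((Nm, Nm))
R = A @ A.T + Nm * np.eye(Nm)        # a symmetric positive-definite stand-in for R^m
Y = rng.random((Nm, q))              # simulation outputs Y^m

Rinv = np.linalg.inv(R)
# Eq. (4.20): generalized-least-squares estimate of B^m
B_hat = np.linalg.solve(H.T @ Rinv @ H, H.T @ Rinv @ Y)
# Eq. (4.21): MLE of the between-response covariance Sigma^m
resid = Y - H @ B_hat
Sigma_hat = resid.T @ Rinv @ resid / Nm
```

By construction, the residuals satisfy the generalized normal equations (H^T R^{-1} resid = 0), and Sigma_hat is symmetric positive semidefinite.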
After observing $Y^m$, the posterior of the computer response $y^m_i(x, \theta)$ given $Y^m$ (and
given $\Sigma^m$ and $\omega^m$, and assuming a non-informative prior for $B^m$) is Gaussian with
mean and covariance:

$$ \mathrm{E}\big[y^m(x,\theta)\,\big|\,Y^m, \phi^m\big] = h^m(x,\theta)\hat{B}^m + r^m(x,\theta)^T (R^m)^{-1}\big(Y^m - H^m\hat{B}^m\big), \tag{4.22} $$

$$ \mathrm{Cov}\big[y^m(x,\theta), y^m(x',\theta')\,\big|\,Y^m, \phi^m\big] = \Sigma^m \Big[ R^m\big((x,\theta),(x',\theta')\big) - r^m(x,\theta)^T (R^m)^{-1} r^m(x',\theta') + \big(h^m(x,\theta) - r^m(x,\theta)^T (R^m)^{-1} H^m\big)\big[(H^m)^T (R^m)^{-1} H^m\big]^{-1}\big(h^m(x',\theta') - r^m(x',\theta')^T (R^m)^{-1} H^m\big)^T \Big], \tag{4.23} $$

where $r^m(x, \theta)$ is the $N_m \times 1$ vector whose $i$th element is $R^m((x^m_i, \theta^m_i), (x, \theta))$. Using
an empirical Bayes approach, the MLEs of the hyperparameters from Appendix A
are plugged into Eqs. (4.22) and (4.23) to calculate the posterior distribution of the
computer responses. Notice that Eqs. (4.22) and (4.23) are analogous to the single-
response GP model results.
Given $\theta$ and the discrepancy-function hyperparameters $\phi^{\delta}$, the mean and covariance of the experimental response $y^e(x)$ are:

$$ \mathrm{E}\big[y^e(x)\,\big|\,Y^m, \hat{\phi}^m, \phi^{\delta}, \theta\big] = \mathrm{E}\big[y^m(x,\theta)\,\big|\,Y^m, \hat{\phi}^m\big] + h^{\delta}(x)\,B^{\delta}, \tag{4.24} $$

$$ \mathrm{Cov}\big[y^e(x), y^e(x')\,\big|\,Y^m, \hat{\phi}^m, \phi^{\delta}, \theta\big] = \Sigma^{\delta} R^{\delta}(x, x') + \mathrm{Cov}\big[y^m(x,\theta), y^m(x',\theta)\,\big|\,Y^m, \hat{\phi}^m\big], \tag{4.25} $$

where $\hat{\phi}^m$ are the MLEs of the hyperparameters for the computer model MRGP
model. Since Eqs. (4.24) and (4.25) depend on the unknown true value of $\theta$, these
two equations are integrated with respect to the prior distribution of $\theta$, $p(\theta)$, via:

$$ \mathrm{E}\big[y^e(x)\,\big|\,Y^m\big] = \int \mathrm{E}\big[y^e(x)\,\big|\,Y^m, \hat{\phi}^m, \hat{\phi}^{\delta}, \theta\big]\,p(\theta)\,d\theta, $$
$$ \mathrm{Cov}\big[y^e(x), y^e(x')\,\big|\,Y^m\big] = \int \mathrm{Cov}\big[y^e(x), y^e(x')\,\big|\,Y^m, \hat{\phi}^m, \hat{\phi}^{\delta}, \theta\big]\,p(\theta)\,d\theta. \tag{4.26} $$
Kennedy and O'Hagan [53] provide closed-form solutions for Eq. (4.26) under
the conditions of Gaussian correlation functions, constant regression functions, and
normal prior distributions for θ (for details, refer to Section 3 of [53] and Section 4.5
of [1]). In this chapter, similar closed-form solutions are used, except that a uniform
prior distribution is assumed for θ.
After observing the experimental data Ye, one can construct a multivariate
normal likelihood function with mean and variance from Eq. (4.26). The MLEs of
the discrepancy-function hyperparameters maximize this likelihood function. The MLE of Bδ can be found by setting the
analytical derivative of this likelihood function with respect to Bδ equal to zero (see
Section 2 of Ref. [53]). However, there are no analytical derivatives with respect to
the hyperparameters Σδ, ωδ, and λ. Therefore, numerical optimization techniques
are needed to find these MLEs.
The posterior for the calibration parameters in Eq. (4.12) involves the likelihood
function p(d | θ, φ̂) and the marginal posterior distribution for the data p(d | φ̂). The
likelihood function is multivariate normal with mean vector and covariance matrix
defined as:
$$ m(\theta) = H(\theta)\hat{B} = \begin{bmatrix} I_q \otimes H^m & 0 \\ I_q \otimes H^m(X^e,\theta) & I_q \otimes H^{\delta} \end{bmatrix} \begin{bmatrix} \mathrm{vec}(\hat{B}^m) \\ \mathrm{vec}(\hat{B}^{\delta}) \end{bmatrix}, \tag{4.27} $$

$$ V(\theta) = \begin{bmatrix} \hat{\Sigma}^m \otimes R^m & \hat{\Sigma}^m \otimes C^m \\ (\hat{\Sigma}^m \otimes C^m)^T & \hat{\Sigma}^m \otimes R^m(X^e,\theta) + \hat{\Sigma}^{\delta} \otimes R^{\delta} + \hat{\Lambda} \otimes I_{N_e} \end{bmatrix}, \tag{4.28} $$

where $\hat{B} = \big[H(\theta)^T V(\theta)^{-1} H(\theta)\big]^{-1} H(\theta)^T V(\theta)^{-1} d$, which is calculated based on
the entire data set (instead of using the estimates from Modules 1 and 2 for $B^m$ and
$B^{\delta}$), as detailed in Section 4 of [53]. $H^m(X^e, \theta) = [h^m(x^e_1, \theta)^T, \ldots, h^m(x^e_{N_e}, \theta)^T]^T$
and $H^{\delta} = [h^{\delta}(x^e_1)^T, \ldots, h^{\delta}(x^e_{N_e})^T]^T$ denote the specified regression functions
for the computer model and the discrepancy functions at the input settings $X^e$. $C^m$
denotes the $N_m \times N_e$ matrix with $i$th-row, $j$th-column entries $R^m((x^m_i, \theta^m_i), (x^e_j, \theta))$.
$R^m(X^e, \theta)$ denotes the $N_e \times N_e$ matrix with $i$th-row, $j$th-column entries
$R^m((x^e_i, \theta), (x^e_j, \theta))$. $R^{\delta}$ denotes the $N_e \times N_e$ matrix with $i$th-row, $j$th-column
entries $R^{\delta}(x^e_i, x^e_j)$. Finally, $I_q$ and $I_{N_e}$ denote the $q \times q$ and $N_e \times N_e$ identity
matrices.
The marginal posterior distribution for the data p(d | φ̂) is:

$$ p(d \mid \hat{\phi}) = \int p(d \mid \theta, \hat{\phi})\,p(\theta)\,d\theta, \tag{4.29} $$
which can be calculated using any numerical integration technique. In this chapter,
Legendre-Gauss quadrature is used for the low-dimensional examples. Alterna-
tively, Markov chain Monte Carlo (MCMC) could be used to sample complex
posterior distributions such as those in Eq. (4.12).
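With a uniform prior on a bounded interval, the integral in Eq. (4.29) can be evaluated with Legendre-Gauss quadrature via `numpy.polynomial.legendre.leggauss`. The sketch below uses a toy Gaussian-bump likelihood (an assumption, not the chapter's MRGP likelihood) over the [150, 300] GPa prior range:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

a, b = 150.0, 300.0              # uniform prior support for theta (GPa)

def likelihood(theta):
    """Toy stand-in for p(d | theta, phi_hat): a Gaussian bump in theta."""
    return np.exp(-0.5 * ((theta - 210.0) / 15.0) ** 2)

def marginal(f, a, b, order=40):
    """Integrate f(theta) * p(theta) over [a, b] by Gauss-Legendre quadrature,
    where p(theta) = 1 / (b - a) is the uniform prior density."""
    z, w = leggauss(order)                        # nodes/weights on [-1, 1]
    theta = 0.5 * (b - a) * z + 0.5 * (a + b)     # map nodes onto [a, b]
    return 0.5 * (b - a) * np.sum(w * f(theta)) / (b - a)

p_d = marginal(likelihood, a, b)                  # the marginal p(d | phi_hat)
```

For this toy likelihood the result is close to the analytic value $15\sqrt{2\pi}/150$, since the bump lies well inside the integration interval; a 40-node rule is far more than enough here.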
where:

$$ t(x,\theta) = \begin{bmatrix} \hat{\Sigma}^m \otimes R^m\big((X^m, \Theta^m), (x,\theta)\big) \\ \hat{\Sigma}^m \otimes R^m\big((X^e, \theta), (x,\theta)\big) + \hat{\Sigma}^{\delta} \otimes R^{\delta}(X^e, x) \end{bmatrix}, \tag{4.32} $$

$$ h(x,\theta) = \big[\, I_q \otimes h^m(x,\theta) \quad I_q \otimes h^{\delta}(x) \,\big]. \tag{4.33} $$
where the outer expectation and covariance are with respect to the posterior
distribution of the calibration parameters. Equations (4.34) and (4.35) are derived
using the law of total expectation and the law of total covariance [54]. Due
to the complexity of the posterior distribution of the calibration parameters, the
marginalization requires numerical integration methods. For the examples in this
chapter, Legendre-Gauss quadrature is used.
References
1. Kennedy, M.C., O'Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B 63(3), 425–464 (2001)
2. Higdon, D., Kennedy, M.C., Cavendish, J., Cafeo, J., Ryne, R.: Combining field data and computer simulations for calibration and prediction. SIAM J. Sci. Comput. 26(2), 448–466 (2004)
3. Reese, C.S., Wilson, A.G., Hamada, M., Martz, H.F.: Integrated analysis of computer and physical experiments. Technometrics 46(2), 153–164 (2004)
4. Bayarri, M.J., Berger, J.O., Paulo, R., Sacks, J., Cafeo, J.A., Cavendish, J., Lin, C.H., Tu, J.: A framework for validation of computer models. Technometrics 49(2), 138–154 (2007)
5. Higdon, D., Gattiker, J., Williams, B., Rightley, M.: Computer model calibration using high-dimensional output. J. Am. Stat. Assoc. 103(482), 570–583 (2008)
6. Chen, W., Xiong, Y., Tsui, K.L., Wang, S.: A design-driven validation approach using Bayesian prediction models. J. Mech. Des. 130(2), 021101 (2008)
7. Qian, P.Z.G., Wu, C.F.J.: Bayesian hierarchical modeling for integrating low-accuracy and high-accuracy experiments. Technometrics 50(2), 192–204 (2008)
8. Wang, S., Tsui, K.L., Chen, W.: Bayesian validation of computer models. Technometrics 51(4), 439–451 (2009)
9. Drignei, D.: A kriging approach to the analysis of climate model experiments. J. Agric. Biol. Environ. Stat. 14(1), 99–112 (2009)
10. Akkaram, S., Agarwal, H., Kale, A., Wang, L.: Meta modeling techniques and optimal design of experiments for transient inverse modeling applications. Paper presented at the ASME International Design Engineering Technical Conference, Montreal (2010)
11. Huan, X., Marzouk, Y.M.: Simulation-based optimal Bayesian experimental design for nonlinear systems. J. Comput. Phys. 232(1), 288–317 (2013)
12. Loeppky, J., Bingham, D., Welch, W.: Computer model calibration or tuning in practice. Technical Report, University of British Columbia, Vancouver, p. 20 (2006)
13. Han, G., Santner, T.J., Rawlinson, J.J.: Simultaneous determination of tuning and calibration parameters for computer experiments. Technometrics 51(4), 464–474 (2009)
14. Arendt, P.D., Apley, D.W., Chen, W.: Quantification of model uncertainty: calibration, model discrepancy, and identifiability. J. Mech. Des. 134(10) (2012)
15. Arendt, P.D., Apley, D.W., Chen, W., Lamb, D., Gorsich, D.: Improving identifiability in model calibration using multiple responses. J. Mech. Des. 134(10) (2012)
16. Ranjan, P., Lu, W., Bingham, D., Reese, S., Williams, B., Chou, C., Doss, F., Grosskopf, M., Holloway, J.: Follow-up experimental designs for computer models and physical processes. J. Stat. Theory Pract. 5(1), 119–136 (2011)
17. Williams, B.J., Loeppky, J.L., Moore, L.M., Macklem, M.S.: Batch sequential design to achieve predictive maturity with calibrated computer models. Reliab. Eng. Syst. Saf. 96(9), 1208–1219 (2011)
18. Tuo, R., Wu, C.F.J., Vu, D.: Surrogate modeling of computer experiments with different mesh densities. Technometrics 56(3), 372–380 (2014)
19. Maheshwari, A.K., Pathak, K.K., Ramakrishnan, N., Narayan, S.P.: Modified Johnson-Cook material flow model for hot deformation processing. J. Mater. Sci. 45(4), 859–864 (2010)
20. Xiong, Y., Chen, W., Tsui, K.L., Apley, D.W.: A better understanding of model updating strategies in validating engineering models. Comput. Methods Appl. Mech. Eng. 198(15–16), 1327–1337 (2009)
21. Liu, F., Bayarri, M.J., Berger, J.O., Paulo, R., Sacks, J.: A Bayesian analysis of the thermal challenge problem. Comput. Methods Appl. Mech. Eng. 197(29–32), 2457–2466 (2008)
22. Arendt, P., Apley, D.W., Chen, W.: Updating predictive models: calibration, bias correction, and identifiability. Paper presented at the ASME 2010 International Design Engineering Technical Conferences, Montreal (2010)
23. Chakrabarty, J.: Theory of Plasticity, 3rd edn. Elsevier/Butterworth-Heinemann, Burlington (2006)
24. Liu, F., Bayarri, M.J., Berger, J.O.: Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Anal. 4(1), 119–150 (2009)
25. Joseph, V., Melkote, S.: Statistical adjustments to engineering models. J. Qual. Technol. 41(4), 362–375 (2009)
26. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge (2006)
27. Qian, P.Z.G., Wu, H.Q., Wu, C.F.J.: Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics 50(3), 383–396 (2008)
28. McMillan, N.J., Sacks, J., Welch, W.J., Gao, F.: Analysis of protein activity data by Gaussian stochastic process models. J. Biopharm. Stat. 9(1), 145–160 (1999)
29. Cressie, N.: Statistics for Spatial Data. Wiley Series in Probability and Statistics. Wiley, New York (1993)
30. Ver Hoef, J., Cressie, N.: Multivariable spatial prediction. Math. Geol. 25(2), 219–240 (1993)
31. Conti, S., Gosling, J.P., Oakley, J.E., O'Hagan, A.: Gaussian process emulation of dynamic computer codes. Biometrika 96(3), 663–676 (2009)
32. Conti, S., O'Hagan, A.: Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Inference 140(3), 640–651 (2010)
33. Williams, B., Higdon, D., Gattiker, J., Moore, L.M., McKay, M.D., Keller-McNulty, S.: Combining experimental data and computer simulations, with an application to flyer plate experiments. Bayesian Anal. 1(4), 765–792 (2006)
34. Bayarri, M.J., Berger, J.O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R.J., Paulo, R., Sacks, J., Walsh, D.: Computer model validation with functional output. Ann. Stat. 35(5), 1874–1906 (2007)
35. McFarland, J., Mahadevan, S., Romero, V., Swiler, L.: Calibration and uncertainty analysis for computer simulations with multivariate output. AIAA J. 46(5), 1253–1265 (2008)
36. Drignei, D.: A kriging approach to the analysis of climate model experiments. J. Agric. Biol. Environ. Stat. 14(1), 99–114 (2009)
37. Kennedy, M.C., Anderson, C.W., Conti, S., O'Hagan, A.: Case studies in Gaussian process modelling of computer codes. Reliab. Eng. Syst. Saf. 91(10–11), 1301–1309 (2006)
38. Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Stat. Sci. 4(4), 409–423 (1989)
39. Rasmussen, C.E.: Evaluation of Gaussian Processes and Other Methods for Non-linear Regression. University of Toronto (1996)
40. Lancaster, T.: An Introduction to Modern Bayesian Econometrics. Blackwell, Malden (2004)
41. Arendt, P.D., Apley, D.W., Chen, W.: A preposterior analysis to predict identifiability in experimental calibration of computer models. IIE Trans. 48(1), 75–88 (2016)
42. Jiang, Z., Chen, W., Apley, D.W.: Preposterior analysis to select experimental responses for improving identifiability in model uncertainty quantification. Paper presented at the ASME 2013 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, Portland (2013)
43. Jiang, Z., Apley, D.W., Chen, W.: Surrogate preposterior analyses for predicting and enhancing identifiability in model calibration. Int. J. Uncertain. Quantif. 5(4), 341–359 (2015)
44. Berger, J.O.: Statistical Decision Theory and Bayesian Analysis. Springer Series in Statistics. Springer, New York (1985)
45. Carlin, B.P., Louis, T.A.: Empirical Bayes: past, present and future. J. Am. Stat. Assoc. 95(452), 1286–1289 (2000)
46. Wu, C., Hamada, M.: Experiments: Planning, Analysis, and Optimization. Wiley, New York (2009)
47. Montgomery, D.C.: Design and Analysis of Experiments, 7th edn. Wiley, Hoboken (2008)
48. Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions. Dover, New York (1972)
49. Beyer, W.H.: CRC Standard Mathematical Tables, 28th edn. CRC, Boca Raton (1987)
50. Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2004)
51. Smith, A., Gelfand, A.: Bayesian statistics without tears: a sampling-resampling perspective. Am. Stat. 46(2), 84–88 (1992)
52. Johnson, R., Wichern, D.: Applied Multivariate Statistical Analysis, 6th edn. Prentice Hall, Upper Saddle River (2007)
53. Kennedy, M.C., O'Hagan, A.: Supplementary Details on Bayesian Calibration of Computer Models, pp. 1–13. University of Sheffield, Sheffield (2000)
54. Billingsley, P.: Probability and Measure, Anniversary edn. Wiley, Hoboken (2011)
5 Validation of Physical Models in the Presence of Uncertainty
Robert D. Moser and Todd A. Oliver
Abstract
As the field of computational modeling continues to mature and simulation
results are used to inform more critical decisions, validation of the physical
models that form the basis of these simulations takes on increasing importance.
While model validation is not a new concept, traditional techniques such as visual
comparison of model outputs and experimental observations without accounting
for uncertainties are insufficient for assessing model validity, particularly for
the case where the intended purpose of the model is to make extrapolative
predictions. This work provides an overview of validation of physical models in
the presence of uncertainty. In particular, two issues are discussed: comparison of
model outputs and observational data when both the model and observations are
uncertain, and the process of building confidence in extrapolative predictions. For
comparing uncertain model outputs and data, a Bayesian probabilistic perspec-
tive is adopted in which the problem of assessing the consistency of the model
and the observations becomes one of Bayesian model checking. A broadly appli-
cable approach to Bayesian model checking for physical models is described.
For validating extrapolative predictions, a recently developed process termed
predictive validation is discussed. This process relies on the ideas of Bayesian
model checking but goes beyond comparison of model and data to assess the
conditions necessary for reliable extrapolation using physics-based models.
Keywords
Extrapolative predictions · Posterior predictive assessment · Validation under uncertainty
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
1.1 Measuring Agreement Under Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
1.2 Different Uses of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
2 Comparing Model Outputs and Data in the Presence of Uncertainty . . . . . . . . . . . . . . . . . 134
2.1 Sources of Uncertainty in Validation Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
2.2 Assessing Consistency of Models and Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.3 Data for Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3 Validating Models for Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
3.1 Mathematical Structure for Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
3.2 Validation for Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4 Conclusions and Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
1 Introduction
Over the last century, the field of computational modeling has grown tremendously,
from virtually nonexistent to pervasive. During this time, simultaneous advances
in simulation algorithms and computer hardware have enabled the development
and application of increasingly complicated and detailed models to represent
evermore complex physical phenomena. These advances are revolutionizing the
ways in which models are used in the design and analysis of complex systems,
enabling simulation results to be used in support of critical design and operational
decisions [13, 26]. With continued advances in models, algorithms, and hardware,
numerical simulations will only become more critical in modern science and
engineering.
Given the importance of computational modeling, it is increasingly important to
assess the reliability, in light of the purpose of a given simulation, of the models
that form the basis of computational simulations. This reliability assessment is the
domain of validation. While the concept of model validation is not new, it has
recently received renewed attention due to the rapid growth in the use of models
as a basis for making decisions [2, 4, 29]. This article provides an overview of the
state of the art in validation of physical models in the presence of uncertainty.
In science and engineering, the word validation is often used to refer to simple
comparisons between model outputs and experimental data such as plotting the
model results and data on the same axes to allow visual assessment of agreement
or lack thereof. While comparisons between model and data are at the core
of any validation procedure, there are a number of problems with such naive
comparisons. First, these comparisons tend to lead to qualitative rather than
quantitative assessments of agreement. While such qualitative assessments are often
5 Validation of Physical Models in the Presence of Uncertainty 131
instructive and important, they are clearly incomplete, particularly as a basis for
making decisions regarding model validity. Second, in naive comparisons, it is
common to ignore or only partially account for uncertainty, e.g., uncertainty in
the experimental observations or the model input parameters. Without accounting
for these uncertainties, it is not possible to appropriately determine whether the
model and data agree. Third, by focusing entirely on the agreement in the observable
quantities, such comparisons neglect the intended uses of the model and, in general,
cannot on their own determine whether the model is sufficient for the intended
purposes.
These drawbacks of straightforward but naive comparisons highlight the two
primary difficulties in model validation. First, one must quantitatively measure the
agreement between model outputs and experimental observations while accounting
for uncertainties in both. This fact is widely recognized, particularly in the statistics
community, and there are a number of possible approaches. Second, depending
on the intended use of the model, an assessment of the agreement between model
outputs and available data is not sufficient for validation. Recognizing the purpose
of the model is crucial to designing an appropriate validation approach.
While the intended uses of a model will be important in a complete assessment of the
validity of the model for those uses, all validation methods rely in some way on an
assessment of whether the model is consistent with some set of observational data.
In general, both the observations and the model, whether through input parameters,
the model structure, or both, are subject to uncertainties that must be accounted
for in this comparison. Indeed, if both the model and the experiments are free from
any uncertainty, then they can only be consistent if the model perfectly reproduces
all the data. To define consistency in the far more common situation where at least
some aspect of the model and/or data is uncertain, one must supply mathematical
representations of all relevant uncertainties, a quantitative method for comparing
uncertain quantities using the chosen representations of uncertainty, and a tolerance
defining how closely model and data must agree to be declared consistent.
A wide range of formalisms have been proposed to represent uncertainty [8, 10,
12, 21, 28, 34], and there is still considerable controversy in the literature regarding
the most appropriate approach, especially for so-called epistemic uncertainty (see
below). This work focuses on the Bayesian interpretation of probability, where
probability provides a representation of the degree of plausibility of a proposition,
to represent all uncertainties [21, 35]. This choice is popular and has many
advantages, including a well-defined method for updating probabilistic models to
incorporate new data (Bayes' theorem) and an extensive and rapidly growing set of
available algorithms for both forward and inverse UQ [1, 9, 23, 24, 27, 31]. While
a full discussion of the controversy over uncertainty representations is beyond
the scope of this article, for the purposes of model validation, most uncertainty
representations that have been proposed are either overly simplistic (e.g., using
Computational models are used for many different purposes. In science and
engineering, these different purposes can be split into three broad categories: (1)
investigation of the consequences of theories, (2) analysis of experimental data, and
(3) prediction.
Scientific theories often lead to models that are sufficiently complex that compu-
tations are required to evaluate whether the theory is consistent with reality. When
computation is used in this way, the computational model will be an expression of
given some observed discrepancy between the model and data, is the model likely
to produce predictions of the QoI with unacceptably large error?
While this type of question has generally been left to expert judgment [2, 4],
a recently proposed predictive validation process aims to systematically and quanti-
tatively address such issues [30]. The process involves developing stochastic models
to represent uncertainty in both the physical model and validation data, allowing
rigorous assessment of the agreement between the model and data under uncertainty,
as discussed in Sect. 2. Further, these stochastic models, coupled with the common
structure of physics-based models, allow one to pose a much richer set of validation
questions and assessments to determine whether extrapolation with the model to the
prediction is supported by the available data and other knowledge. This predictive
validation process will be discussed further in Sect. 3.
where the error is represented here as additive, though other choices are possible.
The error ε_u is imperfectly known and may be represented by an inadequacy
model E_u(x; ξ_u) [30], with inadequacy model parameters ξ_u that are also
calibrated and uncertain.
Observations of the phenomenon modeled by U are generally made in the
context of some larger system. This larger system has observable quantities v, which
will be the basis of the validation test. The validation system must also be modeled,
with a model V that is a mapping from a set of inputs y and the modeled quantities
u to the observables v. The dependence on u is necessary since the system involves
the phenomenon being modeled by U. The model V will in general involve model
parameters θ_v, which are uncertain, and V may itself be imperfect with error ε_v,
which is modeled as E_v with parameters ξ_v. Thus, a preliminary representation of
the validation system is given by

v = V(u, y; θ_v) + E_v(y; ξ_v) = Ṽ(u, y; θ_v, ξ_v), (5.2)

where Ṽ is the validation system model enriched with the inadequacy model E_v.
To complete the validation model, u in (5.2) is expressed in terms of the model
U, which means that the inputs x to U must be determined from the inputs y
to V using a third model X with parameters θ_x, which may also be imperfect,
introducing uncertain errors ε_x, modeled as E_x with parameters ξ_x, yielding

x = X(y; θ_x) + E_x(y; ξ_x) = X̃(y; θ_x, ξ_x). (5.3)
Because the model U of the phenomenon is introduced into a larger model of the
validation system, it is called an embedded model [30].
Finally, errors η_v are introduced in the physical observations of v themselves,
commonly identified as observation or instrument error. The complete model of the
validation test is then

v = Ṽ(U(X̃(y; θ_x, ξ_x); θ_u) + E_u(X̃(y; θ_x, ξ_x); ξ_u), y; θ_v, ξ_v) + η_v. (5.4)

Here, the E_u term is retained explicitly to emphasize that the validation test is
directed at the physical model U and the associated inadequacy model E_u, if any.
In this model of the validation test, there are four types of uncertainties: uncertainties
in the model parameters (θ_x, θ_u, θ_v, ξ_x, ξ_u, and ξ_v); uncertainties in the validation
inputs y; uncertainties due to the model errors (E_u, E_x, and E_v); and finally
uncertainties due to the observation or instrument errors (η_v). Note that in some
cases, it may be convenient to include the response of the measuring instrument(s)
in the validation system model V. In this case, the instrument errors are included
in ε_v.
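To make the nested structure of (5.4) concrete, it can be sketched as a composition of functions. Everything below, the functional forms and the parameter values alike, is a hypothetical placeholder chosen only to illustrate how X̃, U, E_u, and Ṽ compose; it is not part of the formulation.

```python
import numpy as np

# Hypothetical stand-in models illustrating the composition in Eq. (5.4).
def X_tilde(y, theta_x, xi_x):
    """Input model plus its inadequacy: x = X(y; theta_x) + E_x(y; xi_x)."""
    return theta_x * y + xi_x

def U(x, theta_u):
    """Embedded model of the phenomenon of interest."""
    return np.sin(theta_u * x)

def E_u(x, xi_u):
    """Inadequacy model for the embedded model U."""
    return xi_u * x**2

def V_tilde(u, y, theta_v, xi_v):
    """Validation-system model enriched with its inadequacy model."""
    return theta_v * u + y + xi_v

def observable(y, p, eta_v=0.0):
    """Complete model of the validation test, following Eq. (5.4)."""
    x = X_tilde(y, p["theta_x"], p["xi_x"])
    u = U(x, p["theta_u"]) + E_u(x, p["xi_u"])
    return V_tilde(u, y, p["theta_v"], p["xi_v"]) + eta_v

params = dict(theta_x=1.0, xi_x=0.0, theta_u=2.0, xi_u=0.01,
              theta_v=0.5, xi_v=0.0)
v = observable(0.3, params)   # observation error eta_v omitted here
```

In an actual validation test, each θ and ξ would carry a (posterior) probability distribution and η_v a noise model, so that the model output v is itself uncertain.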
Clearly, the design of an experimental observation will seek to minimize the
uncertainties not directly related to the model being studied (i.e., other than the
uncertainties in θ_u and E_u). Furthermore, in the event that the model of the
validation test (5.4) is found to be inconsistent with observations of v, all that can be said
surrounding validation under uncertainty are not unique to any particular uncertainty
representation. However, in the current discussion of how one actually makes
an assessment of the consistency of models and observation in the presence of
uncertainty, it will be helpful to consider a particular uncertainty representation:
Bayesian probability. This representation is used here for the reasons discussed in
Sect. 1.
Since, for the purposes of this work, uncertainty is represented using Bayesian
probability, the question of consistency of the model with data falls in the domain
of Bayesian model checking. There is a rich literature on this subject, which, for
brevity, cannot be discussed extensively here. Instead, a broadly applicable approach
to model checking, which is generally consistent with common notions of validation
for models of physical systems, is described. The ideas outlined are most closely
aligned with those of Andrew Gelman and collaborators [14–18], and the reader is
directed to these references for a more detailed statistical perspective. In particular,
see [16] for a broad perspective on the meaning and practice of Bayesian model
checking.
Fig. 5.1 Hypothetical model output distribution and a number of possible observations illustrating
observations that are clearly consistent with the output distribution (i = 1, 2, 3), observations that
are clearly inconsistent (i = 4, 5), and observations where the agreement is marginal (i = 6)
if it is sufficiently small (e.g., 0.05 or 0.01), one could conclude that v̂ is an unlikely
outcome of the model, so that the validity of the model is suspect. This leads to
the concept of a credibility interval, an interval in which, according to the model,
it is highly probable (e.g., probability of 0.95 or 0.99) that an observation will fall.
Given a probability distribution p(v), there are many possible credibility intervals,
or more generally credibility sets, with a given probability. One could, for example,
choose a credibility interval centered on the mean of the distribution or one defined
so that the probability of obtaining an observation greater than the upper bound of
the interval is equal to that of an observation less than the lower bound. Either of
these credibility intervals has the disturbing property of including the point v_4 in
Fig. 5.1, which is clearly not a likely draw from the distribution plotted.
A credibility region that is more consistent with an intuitive understanding of
credible observations for skewed and/or multimodal distributions such as that shown
in Fig. 5.1 is the highest posterior density (HPD) credibility region [7]. The β-HPD
(0 ≤ β ≤ 1) credible region S_β is the set for which the probability of belonging to
S_β is β and the probability density for each point in S_β is greater than that of points
outside S_β. Thus, for a multimodal distribution like that shown in Fig. 5.1, an HPD
region may consist of multiple disjoint intervals [20] around the peaks, leaving out
the low probability density regions between the peaks.
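For a distribution known only through samples, an HPD region can be approximated by ranking histogram bins by density and keeping the densest bins until the target probability mass is reached. A minimal sketch, using a hypothetical bimodal sample resembling the distribution in Fig. 5.1:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bimodal sample, similar in shape to Fig. 5.1
samples = np.concatenate([rng.normal(-2, 0.5, 50_000),
                          rng.normal(2, 0.5, 50_000)])

def hpd_bins(samples, beta=0.95, nbins=200):
    """Approximate the beta-HPD region by keeping the highest-density
    histogram bins until they contain probability mass beta."""
    counts, edges = np.histogram(samples, bins=nbins)
    order = np.argsort(counts)[::-1]          # densest bins first
    total, mass, keep = counts.sum(), 0, []
    for i in order:
        keep.append(i)
        mass += counts[i]
        if mass / total >= beta:
            break
    keep = np.sort(keep)
    return [(edges[i], edges[i + 1]) for i in keep]

region = hpd_bins(samples)
# For this bimodal density, the kept bins cluster around the two peaks
# and omit the low-density gap near x = 0.
gap_covered = any(lo < 0 < hi for lo, hi in region)
```

The returned region consists of the kept bins; for the bimodal example, these form two disjoint clusters, matching the behavior described above for HPD regions of multimodal distributions.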
However, because HPD credibility sets are defined in terms of the probability
density, they are not invariant to a change of variables. This is particularly
undesirable when formulating a validation metric because it means that one's
conclusions about model validity would depend on the arbitrary choice of variables
(e.g., whether one considers the observable to be the frequency or the period of
an oscillation). To avoid this problem, a modification of the HPD set is introduced
in which the credible set is defined in terms of the probability density relative to
a specified distribution q [30]. An appropriate definition of q would be one that
represents no information about v [21]. Using this definition of the highest posterior
relative density (HPRD), a conceptually attractive credibility metric can be defined
as γ = 1 − β_min, where β_min is the smallest value of β for which the observation v̂
is in the β-HPRD-credibility set for v according to the model. That is,

γ = 1 − ∫_S p(v) dv,  where  S = { v : p(v)/q(v) ≥ p(v̂)/q(v̂) }. (5.6)

When γ is smaller than some tolerance, say less than 0.05 or 0.01, v̂ is considered
an implausible outcome of the model, i.e., there is an inconsistency between the
model and the observation.
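When p and q are available in closed form and p can be sampled, the metric (5.6) can be estimated by Monte Carlo: γ is the probability, under the model, that a sample has lower relative density than the observation. The distributions and the observation below are hypothetical stand-ins.

```python
import numpy as np
from scipy import stats

def hprd_credibility(v_hat, sample_p, logpdf_p, logpdf_q,
                     n=200_000, seed=0):
    """Monte Carlo estimate of the HPRD metric in (5.6):
    gamma = P_p[ p(v)/q(v) < p(v_hat)/q(v_hat) ]."""
    rng = np.random.default_rng(seed)
    v = sample_p(rng, n)                         # samples from the model
    log_ratio = logpdf_p(v) - logpdf_q(v)        # log relative density
    log_ratio_hat = logpdf_p(v_hat) - logpdf_q(v_hat)
    return float(np.mean(log_ratio < log_ratio_hat))

# Hypothetical example: model output v ~ N(0, 1), a very broad normal as
# the reference q, and an observation far out in the tail.
p = stats.norm(0, 1)
q = stats.norm(0, 100)
gamma = hprd_credibility(3.5,
                         lambda rng, n: rng.normal(0, 1, n),
                         p.logpdf, q.logpdf)
```

Here γ falls well below a 0.01 tolerance, so v̂ = 3.5 would be flagged as an implausible outcome of this model.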
While this directly answers the question of how credible the observations are
as samples from the model distribution, it is generally difficult to compute when
N is large since it involves evaluating a high-dimensional integral over a complex
region. An alternative approach is to consider one or more test quantities [17, 18].
A test quantity T(V) is a mapping from an N-vector to a scalar. When evaluated
for a random vector, it is a random scalar, which is designed to summarize some
important feature of V. The idea then is to ask whether T(V̂) is a plausible sample
from the distribution of T(V). One could, for example, compute p-values for this
comparison. The HPRD metric could also be used, but p(T(V)), which is part of its
definition, is usually difficult to compute.
In addition to being potentially more tractable, the use of test statistics has
another advantage. With rare exceptions, the uncertainty representations leading
to the stochastic model being validated are based on crude and/or convenient
assumptions about the uncertainties. Indeed, iid Gaussian random variables are often
used to model experimental noise, and while this is sometimes justified, it is often
assumed purely out of convenience. Thus, one does not necessarily expect, nor is
it generally required, that the model distribution p(v) be representative of the
random processes that led to the variability in v̂ from observation to observation. In
this case, it is necessary only that the uncertainty representations characterize
what is important about the uncertainty for the purposes for which the model
is to be used. While the HPRD metric given in (5.8) does not take this into
account, validating using test quantities gives one the opportunity to choose T to
characterize an important aspect of the uncertainty. For example, if the model is
to be used to evaluate extreme deviations from nominal behavior or conditions,
then it might make sense to perform validation comparisons based on the test
quantity T(V) = max_i v_i. A few example test quantities are discussed in the next
subsection.
Finally, the assumption of negligible epistemic uncertainty, while useful in
simplifying this discussion, is not generally applicable. This assumption can be
removed by marginalizing over the epistemic uncertainties represented in the model,
as described in Sect. 2.2.4.
theorem implies that A(V) is approximately N(μ, σ²/N), where μ and σ² are the
mean and variance of v. This leads to the Z-test in statistics.
To test whether the variability of v is consistent with the observed variability of
v̂, the test quantity

X²(V) = Σ_{i=1}^{N} (v_i − μ)²/σ² (5.10)

could be used, which is a χ² discrepancy. Note that this test quantity is different
because it depends explicitly on characteristics of the model (mean and variance).
If, in addition to being iid, the v obtained from the model are normally distributed,
the model distribution p(X²(V)) will be the χ² distribution, with N degrees of
freedom.
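Both test quantities have known reference distributions, so the consistency checks reduce to tail probabilities. A minimal sketch, assuming iid observations and a model-specified mean and variance; the observations here are synthetic stand-ins:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma2 = 0.0, 1.0                              # model mean and variance of v
v_obs = rng.normal(mu, np.sqrt(sigma2), size=50)   # synthetic observations
N = len(v_obs)

# Average test quantity: by the central limit theorem, A(V) is
# approximately N(mu, sigma2/N), giving a Z-test on the observed average.
z = (v_obs.mean() - mu) / np.sqrt(sigma2 / N)
p_avg = 2.0 * stats.norm.sf(abs(z))                # two-sided tail probability

# Chi-square discrepancy (5.10): for a normal model, X2(V) ~ chi2(N).
x2 = np.sum((v_obs - mu) ** 2) / sigma2
p_var = stats.chi2.sf(x2, df=N)                    # upper-tail probability
```

A very small p_avg or p_var would indicate that the observed mean or variability, respectively, is implausible under the model.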
When the test quantity has a known distribution, as A(V) and X²(V) discussed
above do, it simplifies assessing consistency using, for example, HPRD criteria or
p-values. This is so because tail integrals of these distributions are known. However,
it is not necessary that exact distributions be known, since the relevant integrals can
be approximated using Monte Carlo or other uncertainty propagation algorithms.
For example, in some problems, one is concerned with improbable events with
large consequences. In this case, one is interested in the tail of the distribution of v
and the extreme values that v may take. A simple test quantity that is sensitive to
this aspect of the distribution of v is the maximum attained value. That is,
M(V) = max_i v_i. (5.11)

The random variable M(V) that is implied by the model can be sampled by simply
generating N samples of v and determining the maximum. Thus the p-value relative
to the observation M(V̂) can be computed by Monte Carlo simulation.
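This Monte Carlo p-value computation can be sketched as follows; the model distribution and observation set are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def max_pvalue(v_obs, sample_model, n_rep=20_000):
    """Monte Carlo p-value for the test quantity M(V) = max_i v_i: the
    probability, under the model, of a maximum at least as large as the
    observed one."""
    N = len(v_obs)
    m_obs = np.max(v_obs)
    m_rep = np.array([np.max(sample_model(N)) for _ in range(n_rep)])
    return float(np.mean(m_rep >= m_obs))

# Hypothetical example: the model says v ~ N(0, 1), but the observed set
# contains an extreme value the model rarely produces in N = 20 draws.
v_obs = np.append(rng.normal(0.0, 1.0, 19), 4.8)
p_max = max_pvalue(v_obs, lambda N: rng.normal(0.0, 1.0, N))
```

Because the test quantity targets the tail, this check flags the extreme observation even though the other 19 values are entirely consistent with the model.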
There is a large literature on statistical test quantities (cf. [22]) for use in a wide
variety of applications. Texts on statistics should be consulted for more details.
Among commonly used statistical tests are those which test whether a population
is consistent with a distribution with known characteristics (e.g., the mean and/or
variance) as with the average and 2 test quantities discussed above. These are also
useful when the model can be used to compute these characteristics with much
higher statistical accuracy than they can be estimated from the data (e.g., one can
generate many more samples from the model than there are data samples). In other
situations, the model may be expensive to validate, so that the number of samples of
the posterior distribution of the model is limited. In this case, test quantities that test
whether two populations are drawn from distributions with the same characteristics
are useful. A simple example is the two-sample t-test (Welch's test), which is used
to test whether two populations have the same mean.
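A two-sample comparison of this kind is a one-liner with standard statistical libraries; the sample sizes and distributions below are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical two-sample comparison: a limited number of samples from the
# model's output distribution versus a small set of observations.
model_samples = rng.normal(1.0, 0.5, size=30)
data_samples = rng.normal(1.1, 0.7, size=12)

# Welch's two-sample t-test; equal_var=False selects the Welch variant,
# which does not assume the two populations share a variance.
res = stats.ttest_ind(model_samples, data_samples, equal_var=False)
```

A small `res.pvalue` would indicate that the two populations are unlikely to share a mean, i.e., an inconsistency between model and data in this summary.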
multiple different quantities, the predictions of which are affected by both epistemic
and aleatoric uncertainties. This section generalizes the validation comparisons
discussed previously to this more complicated situation.
Consider the model of the observable expressed in (5.4). This model includes
uncertain parameters, the θ's and ξ's; representations of the modeling errors ε_u, ε_x,
and ε_v; and the representation of the observation errors η_v. Generally, the uncertainties
in the parameters are considered to be epistemic; that is, there are ideal values of
these parameters, which are imperfectly known. Their values do not change from
observation to observation in the same system. The observation error η_v is often
considered to be aleatoric, for example, arising from instrument noise. However, there may
also be systematic errors in the measurements, which are imperfectly known and
thus epistemically uncertain. Similarly, depending on the nature of the phenomena
involved and how they are modeled, the errors ε_u, ε_x, and ε_v in the models U, X,
and V may be epistemic, aleatoric, or a mixture of the two. The models for these
uncertainties could then include an epistemic part and an aleatoric part, that is,

ε_x = E_x^e + E_x^a,   η_v = D_v^e + D_v^a, (5.12)
where superscripts e and a denote epistemic and aleatoric, respectively. One
challenge is then to compare the model and possibly repeated observations under
these mixed uncertainties.
In addition to multiple observations, there may be multiple observables. These
observables may be of the same quantity (e.g., the temperature) for different values
of the model inputs y (e.g., at different points in space or time) or of different
quantities (e.g., the temperature and the pressure). The observable v should thus
be considered to be a vector of observables of dimension n. In general, the aleatoric
uncertainties in the various observables will be correlated. In the model, these corre-
lations will arise both from the way the aleatoric inadequacy uncertainties impact the
observables through the model and from correlations inherent in the dependencies
of the aleatoric uncertainties (the E^a's and D_v^a) on model inputs. Due to the presence
of aleatoric uncertainties, a validation test may make multiple (N) observations v̂ of
the observable vector v, resulting in a set of observations V̂ = {v̂_1, …, v̂_N}, which
are taken to be independent and identically distributed (iid) here.
To facilitate further discussion, let us consolidate the epistemic uncertainties
(including those associated with E_u^e, E_x^e, E_v^e, and D_v^e) into a vector ζ, which may
be infinite dimensional. The validation model for the observables (5.4) can then
be considered to be a statistical model S for the aleatorically uncertain v, which
depends on the epistemically uncertain ζ; that is, v = S(ζ). The validation
question is then whether the set of observations V̂ is consistent with S, given what
is known about ζ.
If ζ is known precisely (i.e., no epistemic uncertainty), then the situation is
similar to that discussed in Sect. 2.2.2. For a given ζ, the model S defines a
probability distribution p(v|ζ) in the n-dimensional space of observables. The
observables are not independent, so this is not a simple product of one-dimensional
distributions, but this introduces no conceptual difficulties. A set of N observations
then defines a probability space of dimension nN. Because the observations are iid,
the probability distribution is written:

p(V |S, ζ) = ∏_{i=1}^{N} p(v_i |S, ζ). (5.13)

Here, the probability density is conditional on the model S because these are
the distributions of outputs v implied by the model. Consistency of V̂ with S can
then in principle be determined through something like the HPRD-credibility
metric (5.6). However, as discussed in Sect. 2.2.2, this is generally not computationally
tractable, nor is it generally desirable. Alternatively, one or more test quantities T
may be defined to characterize what is important about the aleatoric variation of v,
and as discussed in Sect. 2.2.2, the consistency between the observed T(V̂) and the
distribution obtained from the model p(T(V)|S, ζ) can be tested. For example,
one could use the p-value P_>, which now depends on ζ.
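For a fixed value of the epistemic parameters, the iid factorization (5.13) is simply a sum of per-observation log densities. A minimal sketch, with a hypothetical correlated bivariate normal standing in for the model-implied distribution of the observables:

```python
import numpy as np
from scipy import stats

# Hypothetical model-implied distribution of the n = 2 observables for a
# fixed epistemic parameter value: a correlated bivariate normal.
mean = np.array([1.0, 2.0])
cov = np.array([[0.10, 0.03],
                [0.03, 0.20]])
model_dist = stats.multivariate_normal(mean, cov)

# N = 3 iid observations of the observable vector; per (5.13), the joint
# log density is the sum of the per-observation log densities.
V_hat = np.array([[1.1, 2.2],
                  [0.9, 1.8],
                  [1.0, 2.1]])
log_like = model_dist.logpdf(V_hat).sum()
```

The off-diagonal covariance entry carries the correlation between observables, so the density is not a product of one-dimensional distributions, exactly as noted above.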
When the parameters are uncertain, with uncertainty that is entirely epistemic
by construction, the validation question is whether the observed value of the test
quantity T(V̂) is plausible for plausible values of ζ. In validation, it is presumed that
the parameters in the models (the θ's and ξ's in (5.4)) have been calibrated (e.g.,
via Bayesian inference) and that the epistemic uncertainties are now expressed as
probability distributions p(ζ|S, ŵ), where ŵ represents the data for the observables
w used to calibrate the parameters. This calibration data may or may not be included
in the validation data v̂. In Bayesian inference, this is the posterior distribution,
and so the relevant distribution of T(V) or P_> is that induced by the posterior
distribution of ζ.
As suggested by Box [6], Rubin [32], and Gelman et al. [17], in this situation, the
consistency of the observations V̂ with the model can be determined by considering
the distribution of T(V) implied by the distribution of ζ:

p(T(V)|S, ŵ) = ∫ p(T(V)|S, ζ) p(ζ|S, ŵ) dζ. (5.14)
data. While true, this does not provide useful guidance for designing experimental
observations for the purpose of validation. Selecting appropriate validation data
requires consideration of several problem-specific issues, so it is difficult to specify
generally applicable techniques for designing validation tests. Instead, a list of broad
guidelines for designing validation experiments is provided.
1. Often, the first point at which models are confronted with data is when they are
calibrated. As part of the calibration process, the calibrated model should also be
validated against the calibration data. Because the model has been calibrated to fit
the calibration data, consistency with the data will not greatly increase confidence
in the model. But if the model is inconsistent with the data with which it has been
calibrated, it is a very strong indictment of the model. With parsimonious models,
which describe a rich phenomenon with few parameters, failure to reproduce the
calibration data is a common mode of validation failure.
2. To increase confidence in the validity of a model, it should be tested against data
that was not used in its calibration. Sometimes this is done by holding back some
portion of the calibration data set so that it can be used only for validation, which
leads to cross-validation techniques [3]. This is generally of limited utility for
physics-based models. A much stronger validation test is to use a completely
different data set, from experiments with different validation models V in (5.4).
For example, calibration of a model might be done using a set of relatively simple
experiments in a laboratory facility, while validation experiments are in more
complex scenarios in completely different facilities.
3. The development of computational models often involves various approxima-
tions and assumptions. To increase confidence in the models, one should design
validation experiments that test these approximations, that is, experiments and
measurements should be designed so that the observed quantities are expected to
be sensitive to any errors introduced by the approximation. Sensitivity analysis
applied to the model can help identify such experiments. Furthermore, test
quantities used in validation assessment should also be designed to be sensitive
to the approximations being tested.
4. Models of complex physical systems commonly involve sub-models (embedded
models) of several physical phenomena that are of questionable reliability. When
possible, the individual embedded models should be calibrated and validated
separately, using experiments that are designed to probe the modeled phenomena
individually. The resulting experiments will generally be much simpler, less
expensive, easier to instrument, and easier to simulate than the complete system.
This simplicity could allow many more experiments and more measurements
to be used for calibration and/or validation. When possible, further experiments
simultaneously involving a few of the relevant phenomena should be performed
to validate models in combination. Experiments of increasing complexity and
involving more phenomena can then be pursued, until measurements in systems
similar in complexity to the target system are performed. This structure has
been described as a validation pyramid [5], with abundant simple inexpensive,
An abstract structure of a validation test is defined in Sect. 2.1. Here, this mathemat-
ical structure is extended to include features common in models of physical systems.
To fix the main ideas, this structure is presented first in the simplest possible context
(Sect. 3.1.1), with a more general abstraction outlined briefly in Sect. 3.1.2.
R(u, λ; r) = 0, (5.16)
λ = T(u; s, θ), (5.17)

where s is a set of scenario parameters, possibly distinct from r, for the embedded
model, and θ is a set of calibration or tuning parameters. Since the embedded model
T is not known a priori to be reliable for the desired prediction, it will be the focus
of the validation process.
The system of equations consisting of (5.16) and (5.17) is closed, but for
calibration, validation, and prediction, some additional relationships are necessary.
In particular, expressions for the experimental observables v and the prediction QoIs
q are required. Here, it is assumed that there are maps V and Q that determine v
and q, respectively, from the model state u, the modeled quantity λ, and the global
scenario r:

v = V(u, λ; r), (5.18)
q = Q(u, λ; r). (5.19)
This closed set of Eqs. (5.16), (5.17), (5.18), and (5.19), which allows calculation
of both the experimental observables and the prediction QoIs, is referred to as a
composite model, since it is a composition of high- and low-fidelity components.
Because the foundation of the composite model is a reliable theory whose validity
is not questioned, it is possible for such a model to make reliable predictions
despite the fact that a less reliable embedded model is also involved. All that is
required is that the less reliable embedded model should not be used outside the
range where it has been calibrated and tested. This restriction does not necessarily
limit the reliability of the composite model in extrapolation since the relevant
scenario space for each embedded model is specific to that embedded model, not
the composite model in which it is embedded. For example, an elastic constitutive
relation for the deformation of a material can only be relied upon provided the
strain remains within the bounds in which it has been calibrated and tested. Despite
this restriction, a model for a complex structure made from the material, which is
based on conservation of momentum, can reliably predict a wide range of structural
responses.
3.1.2 Generalizations
The simple problem statement in Sect. 3.1.1 is sufficient to introduce many of
the concepts that are critical in validation for prediction, including the distinction
between the QoI and available observable quantities and the notion of an embedded
model. However, there are several important generalizations that are required to
represent the validation and prediction process in complex physical systems. These
are outlined below. A more detailed description can be found in [30].
R^0(u^0, {λ}^0; r^0) = 0,   q = Q(u^0, {λ}^0; r^0),
R^j(u^j, {λ}^j; r^j) = 0,   v^j = V^j(u^j, {λ}^j; r^j),   for 1 ≤ j ≤ N_e. (5.20)

In this formulation, the embedded models required for the prediction (j = 0) and
the validation scenarios (1 ≤ j ≤ N_e) are given by

λ_k^j = T_k(W_k(u^j); θ_k, s_k^j)   for 1 ≤ k ≤ N^j, (5.21)

where N^j is the number of embedded models in scenario j.
As in the simple problem statement from Sect. 3.1.1, in this generalized problem
the high-fidelity theory forming the basis of the model can enable reliable pre-
dictions, despite the need to extrapolate from available data. However, additional
complexity arises from confounding introduced by the presence of multiple embed-
ded models in the validation experiments. To avoid this confounding, one would
ideally use experiments where there are no extra embedded models beyond those
needed for the prediction or where any such extra embedded models introduce small
error or uncertainty in the context of the experiment. Of course, the assessment of
any extra embedded models would itself form another validation exercise.
To further avoid confounding uncertainties, it is preferable to use experiments in
which only one of the embedded models used in the prediction model is exercised.
Such experiments are powerful because they provide the most direct assessment
possible of the embedded model in question. However, even if experiments that
separately exercise all of the embedded models necessary for prediction are
available, in general these experiments alone are not sufficient for validation because
they cannot exercise couplings and interactions between the modeled phenomena.
This fact leads to the idea of a validation pyramid [5], as discussed in Sect. 2.3.
Even if a model is found to be consistent with the available observations, as discussed in
Sect. 2, this does not imply that the model is valid for making predictions. The
reason is that the prediction QoI may be sensitive to some error or omission in
the model that the observed quantities are not. To preclude this possibility and
gain confidence in the prediction, further assessments of the validation process are
needed. These are discussed in Sect. 3.2.2 below.
The opposite situation represents a different fundamental challenge in the
validation of predictions. Even if a model is found to be inconsistent with available
data, this does not imply that the model is invalid for making the desired predictions.
The reason is that the prediction QoI may be insensitive to the error or omission in
the model that caused the inconsistency with the observations. To determine whether
a prediction can be made despite the errors in the model requires that the impact
of the modeling errors on the predicted QoI be quantified. This quantification of
uncertainty due to model inadequacy is discussed in Sect. 3.2.1.
1. The calibration and validation of the embedded models are sufficient to give
confidence in the prediction.
2. The embedded models are being used within their domain of applicability.
3. The resulting prediction, with its uncertainties, is sufficient for the purpose for
which the prediction is being made.
5 Validation of Physical Models in the Presence of Uncertainty 151
These predictive assessments are outlined in the following paragraphs and in more
detail in [30].
1. Suppose that the prediction QoI is highly sensitive to one of the embedded
models T, as measured, for example, by the Fréchet derivative of the QoI with
respect to T at some representative point. If none of the validation quantities are
sensitive to T, then the validation process has not provided a test of the validity
of T, and a prediction based on T would be unreliable. More plausibly, it may
be that none of the validation quantities for scenarios higher in the validation
pyramid are sensitive to T. The integration of T into a composite model similar
to that used in the predictions would then not have been tested, which would
make its use in the prediction suspect [30]. To guard against this and similar
possible failures of T, the predictive assessment process should determine
whether validation quantities in scenarios close enough to the prediction
scenario are sufficiently sensitive to T to provide a good test of its use in the
prediction. The determination of what is close enough and what constitutes
sufficient sensitivity must be made based on knowledge of the model, of the
approximations that went into it, and of the way the models are embedded into
the composite models of the validation and prediction scenarios.
2. Suppose that the prediction QoI is highly sensitive to the value of a particular
parameter θ in an embedded model. In this case, it is important to determine
whether the value of this parameter is well constrained by reliable information.
If, for example, none of the calibration data has informed the value of θ, then
152 R.D. Moser and T.A. Oliver
only other available information (prior information in the Bayesian context) has
determined its value. Further, if none of the validation quantities are sensitive
to the value of θ, then the validation process has not tested whether the
information used to determine θ is in fact valid in the current context. The
prediction QoI is then being determined to a significant extent by the untested
prior information used to determine θ, which leaves little confidence in the
reliability of the prediction, unless the prior information is itself highly reliable
(e.g., θ is the speed of light). Alternatively, when the available prior information
is questionable (e.g., θ is the reaction rate of a poorly understood chemical
reaction), the predictions based on θ will not be reliable.
3. Suppose that uncertainty in the prediction QoI is largely due to the uncertainty
model E representing the inadequacy of the embedded model T. In this case, it is
important to ensure that E is a valid description of the inadequacy of T. As with
the embedded model sensitivities discussed above, validation tests from high on
the validation pyramid are most valuable for assessing whether the uncertainty
model represents inadequacy in the context of a composite model similar to that
for the prediction. If, however, the available validation data are for quantities that
are insensitive to E, then the veracity of E in representing the uncertainty in the
QoI will be suspect. Reliable predictions will then be possible only if there is
independent information that the inadequacy representation is trustworthy.
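As a concrete illustration of the first of these assessments, the sensitivity comparison can be sketched numerically. The toy composite models and the finite-difference probe below are our own assumptions for illustration, not constructions from the chapter.

```python
# Hypothetical sketch of the sensitivity comparison in assessment 1:
# perturb an embedded model's output and compare, by finite differences,
# the sensitivity of the prediction QoI with that of a validation observable.
# The toy models below are assumptions chosen only to make the point.

def embedded(u, delta=0.0):
    """Toy embedded model xi = T(u), with an output perturbation delta."""
    return 0.5 * u + delta

def qoi(delta=0.0):
    """Prediction QoI from the composite model (toy)."""
    u = 2.0
    return 3.0 * embedded(u, delta) ** 2

def validation_observable(delta=0.0):
    """Validation quantity from a validation-scenario composite model (toy).
    Here it is only weakly coupled to the embedded model."""
    u = 2.0
    return 1.0 + 0.01 * embedded(u, delta)

def sensitivity(f, eps=1e-6):
    """Central finite-difference sensitivity to the embedded-model perturbation."""
    return (f(eps) - f(-eps)) / (2 * eps)

s_qoi = sensitivity(qoi)
s_val = sensitivity(validation_observable)
# If the QoI is far more sensitive than every validation observable, the
# validation process has not tested the embedded model's use in the prediction.
print(s_qoi, s_val)
```

In this toy case the QoI sensitivity is several hundred times the observable sensitivity, which is precisely the warning sign the predictive assessment is meant to raise.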
For an embedded model that has nonlocal dependence on the state, defining the relevant scenario space
and, hence, the region of scenario space that defines the domain of applicability, is
significantly more difficult.
A Major Caveat
The predictive assessment process can determine whether, given what is known
about the system, the calibration and validation processes are sufficient to make a
reliable prediction. But the well-known problem of unknown unknowns remains.
If the system being simulated involves an unrecognized phenomenon, then clearly
an embedded model to represent it will not be included in the composite model for
the system. As with the examples above, the prediction QoI could be particularly
sensitive to this phenomenon, while the validation observables are not sensitive.
In this situation, one would not be able to detect that anything is missing from the
composite model. Further, one could not even identify that the validation observables
were insufficient; that is, the predictive assessment could not detect the inadequacy
of the validation process. This is a special case of a broader issue. The predictive
validation process developed here relies explicitly on reliable knowledge about the
system and the models used to represent it. This knowledge is considered to not
need independent validation and is thus what allows for extrapolative predictions.
However, if this externally supplied information is in fact incorrect, then the
predictive validation process may not be able to detect it.
References
1. Adams, B.M., Ebeida, M.S., Eldred, M.S., et al.: Dakota, a Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 6.2 User's Manual. Sandia National Laboratories, Albuquerque (2014). https://dakota.sandia.gov/documentation.html
2. AIAA Computational Fluid Dynamics Committee on Standards: AIAA Guide for Verification and Validation of Computational Fluid Dynamics Simulations. AIAA Paper G-077-1999 (1998)
3. Arlot, S., Celisse, A.: A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79 (2010). doi:10.1214/09-SS054
4. ASME Committee V&V 10: Standard for Verification and Validation in Computational Solid Mechanics. ASME (2006)
5. Babuška, I., Nobile, F., Tempone, R.: Reliability of computational science. Numer. Methods Partial Differ. Equ. 23(4), 753–784 (2007). doi:10.1002/num.20263
6. Box, G.E.P.: Sampling and Bayes' inference in scientific modelling and robustness. J. R. Stat. Soc. Ser. A 143, 383–430 (1980)
7. Box, G., Tiao, G.C.: Bayesian Inference in Statistical Analysis. Wiley Classics, New York (1973)
8. Cox, R.T.: The Algebra of Probable Inference. Johns Hopkins University Press, Baltimore (1961)
9. Cui, T., Martin, J., Marzouk, Y.M., Solonen, A., Spantini, A.: Likelihood-informed dimension reduction for nonlinear inverse problems. Inverse Probl. 30(11), 114015 (2014)
10. Dubois, D., Prade, H.: Possibility Theory: An Approach to Computerized Processing of Uncertainty. Plenum Press, New York (1988)
11. Ferson, S., Ginzburg, L.R.: Different methods are needed to propagate ignorance and variability. Reliab. Eng. Syst. Saf. 54, 133–144 (1996)
12. Fine, T.L.: Theories of Probability. Academic, New York (1973)
13. Firm uses DOE's fastest supercomputer to streamline long-haul trucks. Office of Science, U.S. Department of Energy, Stories of Discovery and Innovation (2011). http://science.energy.gov/discovery-and-innovation/stories/2011/127008/
14. Gelman, A.: Comment: Bayesian checking of the second levels of hierarchical models. Stat. Sci. 22, 349–352 (2007). doi:10.1214/07-STS235A
15. Gelman, A., Rubin, D.B.: Avoiding model selection in Bayesian social research. Sociol. Methodol. 25, 165–173 (1995)
16. Gelman, A., Shalizi, C.R.: Philosophy and the practice of Bayesian statistics. Br. J. Math. Stat. Psychol. 66(1), 8–38 (2013)
17. Gelman, A., Meng, X.L., Stern, H.: Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 6, 733–807 (1996)
18. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.: Bayesian Data Analysis, 3rd edn. CRC Press, Boca Raton (2014)
19. Hsu, J.: Multiple Comparisons: Theory and Methods. Chapman and Hall/CRC, London (1996)
20. Hyndman, R.J.: Computing and graphing highest density regions. Am. Stat. 50(2), 120–126 (1996)
21. Jaynes, E.T.: Probability Theory: The Logic of Science. Cambridge University Press, Cambridge/New York (2003)
22. Kanji, G.K.: 100 Statistical Tests, 3rd edn. Sage Publications, London/Thousand Oaks (2006)
23. Le Maître, O., Knio, O., Najm, H., Ghanem, R.: Uncertainty propagation using Wiener–Haar expansions. J. Comput. Phys. 197(1), 28–57 (2004)
24. Li, J., Marzouk, Y.M.: Adaptive construction of surrogates for the Bayesian solution of inverse problems. SIAM J. Sci. Comput. 36(3), A1163–A1186 (2014)
25. Miller, R.G.J.: Simultaneous Statistical Inference, 2nd edn. Springer, New York (1981)
26. Miller, L.K.: Simulation-based engineering for industrial competitive advantage. Comput. Sci. Eng. 12(3), 14–21 (2010). doi:10.1109/MCSE.2010.71
27. Najm, H.N.: Uncertainty quantification and polynomial chaos techniques in computational fluid dynamics. Annu. Rev. Fluid Mech. 41, 35–52 (2009)
28. Oberkampf, W.L., Helton, J.C., Sentz, K.: Mathematical representation of uncertainty. AIAA Paper 2001-1645
29. Oden, J.T., Belytschko, T., Fish, J., Hughes, T.J.R., Johnson, C., Keyes, D., Laub, A., Petzold, L., Srolovitz, D., Yip, S.: Revolutionizing engineering science through simulation: a report of the National Science Foundation blue ribbon panel on simulation-based engineering science (2006). http://www.nsf.gov/pubs/reports/sbes_final_report.pdf
30. Oliver, T.A., Terejanu, G., Simmons, C.S., Moser, R.D.: Validating predictions of unobserved quantities. Comput. Methods Appl. Mech. Eng. 283, 1310–1335 (2015). doi:10.1016/j.cma.2014.08.023
31. Petra, N., Martin, J., Stadler, G., Ghattas, O.: A computational framework for infinite-dimensional Bayesian inverse problems, Part II: stochastic Newton MCMC with application to ice sheet flow inverse problems. SIAM J. Sci. Comput. 36(4), A1525–A1555 (2014)
32. Rubin, D.B.: Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Stat. 12, 1151–1172 (1984)
33. Sentz, K., Ferson, S.: Combination of evidence in Dempster–Shafer theory. Technical report SAND 2002-0835, Sandia National Laboratories (2002)
34. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
35. Van Horn, K.S.: Constructing a logic of plausible inference: a guide to Cox's theorem. Int. J. Approx. Reason. 34(1), 3–24 (2003)
Toward Machine Wald
6
Houman Owhadi and Clint Scovel
Abstract
The past century has seen a steady increase in the need to estimate and
predict complex systems and to make (possibly critical) decisions with limited
information. Although computers have made possible the numerical evaluation
of sophisticated statistical models, these models are still designed by humans
because there is currently no known recipe or algorithm for dividing the
design of a statistical model into a sequence of arithmetic operations. Indeed,
enabling computers to think as humans do, especially when faced with uncer-
tainty, is challenging in several major ways: (1) finding optimal statistical
models remains to be formulated as a well-posed problem when information
on the system of interest is incomplete and comes in the form of a complex
combination of sample data, partial knowledge of constitutive relations, and
a limited description of the distribution of input random variables; (2) the
space of admissible scenarios, along with the space of relevant information,
assumptions, and/or beliefs, tends to be infinite dimensional, whereas calculus
on a computer is necessarily discrete and finite. With this purpose in mind, this chapter
explores the foundations of a rigorous framework for the scientific computation
of optimal statistical estimators/models and reviews their connections with
decision theory, machine learning, Bayesian inference, stochastic optimization,
robust optimization, optimal uncertainty quantification, and information-based
complexity.
Keywords
Abraham Wald · Decision theory · Machine learning · Uncertainty quantification · Game theory
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
2 The UQ Problem Without Sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2.1 Čebyšev, Markov, and Kreĭn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2.2 Optimal Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
2.3 Worst-Case Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.4 Stochastic and Robust Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.5 Čebyšev Inequalities and Optimization Theory . . . . . . . . . . . . . . . . . . . . . . . . . 164
3 The UQ Problem with Sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
3.1 From Game Theory to Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
3.2 The Optimization Approach to Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
3.3 Abraham Wald . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
3.4 Generalization to Unknown Pairs of Functions and Measures and to
Arbitrary Sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
3.5 Model Error and Optimal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
3.6 Mean Squared Error, Variance, and Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
3.7 Optimal Interval of Confidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
3.8 Ordering the Space of Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
3.9 Mixing Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
4 The Complete Class Theorem and Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
4.1 The Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
4.2 Relation Between Adversarial Model Error and Bayesian Error . . . . . . . . . . . . . . . . 174
4.3 Complete Class Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
5 Incorporating Complexity and Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.1 Machine Wald . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
5.2 Reduction Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
5.3 Stopping Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.4 On the Borel-Kolmogorov Paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.5 On Bayesian Robustness/Brittleness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.6 Information-Based Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Construction of D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Conditional Expectation as an Orthogonal Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
1 Introduction
During the past century, the need to solve large complex problems in applications
such as fluid dynamics, neutron transport, or ballistic prediction drove the parallel
development of computers and numerical methods for solving ODEs and PDEs. It
is now clear that this development led to a paradigm shift. Before, each new PDE
required the development of new theoretical methods and the employment of large
teams of mathematicians and physicists; in most cases, information on solutions was
6 Toward Machine Wald 159
and
\[
\mathcal{L}(\mathcal{A}) := \inf_{\mu \in \mathcal{A}} \mu[X \ge a]
\]
where
\[
\mathcal{A} := \left\{ \mu \in \mathcal{M}([0, 1]) \;\middle|\; \mathbb{E}_{\mu}[X] \le m \right\}
\]
and M([0, 1]) is the set of Borel probability measures on [0, 1]. It is easy to observe
that the extremum of (6.1) can be achieved only when μ is the weighted sum of a
Dirac mass at 0 and a Dirac mass at a. It follows that, although (6.1) is an infinite-
dimensional optimization problem, it can be reduced to the simple one-dimensional
optimization problem obtained by letting p ∈ [0, 1] denote the weight of the Dirac
mass at a and 1 − p the weight of the Dirac mass at 0: maximize p subject to
ap = m, producing the Markov bound m/a as solution.
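The reduction above can be checked numerically. The sketch below is our own illustration, with hypothetical values m = 0.1 and a = 0.5: the optimizing two-point measure attains the bound m/a, while randomly generated feasible discrete measures never exceed it.

```python
# A quick numerical illustration (our construction, not from the text) of the
# reduction: among measures on [0, 1] with mean <= m, the tail probability
# mu[X >= a] is maximized by a two-point measure with mass p at a and 1 - p
# at 0, giving the Markov bound p = m / a.

import random

m, a = 0.1, 0.5                  # hypothetical mean bound and threshold
markov_bound = m / a             # = 0.2

def tail_prob(atoms, weights):
    """mu[X >= a] for a discrete measure with the given atoms and weights."""
    return sum(w for x, w in zip(atoms, weights) if x >= a)

# The optimizing two-point measure: mass m/a at a, remainder at 0.
assert abs(tail_prob([0.0, a], [1 - m / a, m / a]) - markov_bound) < 1e-12

# Random feasible discrete measures never exceed the bound.
random.seed(0)
for _ in range(1000):
    atoms = [random.random() for _ in range(4)]
    raw = [random.random() for _ in range(4)]
    weights = [w / sum(raw) for w in raw]
    mean = sum(x * w for x, w in zip(atoms, weights))
    if mean <= m:                # keep only measures satisfying E[X] <= m
        assert tail_prob(atoms, weights) <= markov_bound + 1e-12
```

The infinite-dimensional problem over all Borel measures thus collapses, for this quantity of interest, to a one-parameter search over the weight p.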
Problems such as (6.1) can be traced back to Čebyšev [77, Pg. 4]: "Given: length,
weight, position of the centroid and moment of inertia of a material rod with a
density varying from point to point. It is required to find the most accurate limits
for the weight of a certain segment of this rod." According to Kreĭn [77], although
Čebyšev did solve this problem, it was his student Markov who supplied the proof
in his thesis. See Kreĭn [77] for an account of the history of this subject along with
substantial contributions by Kreĭn.
Therefore, in the absence of sample data, in the context of this generalization, one is
interested in estimating Φ(f†, μ†), where (f†, μ†) ∈ F(X) × M(X) corresponds
to an unknown reality: the function f† represents a response function of interest, and
μ† represents the probability distribution of the inputs of f†. If A represents all that
is known about (f†, μ†) (in the sense that (f†, μ†) ∈ A and that any (f, μ) ∈ A
could, a priori, be (f†, μ†) given the available information), then [114] shows that
the quantities
H(X) is Polish, it follows that the approximation space H(X) × M(X) is the
product of a Polish space and a separable metric space. When furthermore X is
Polish, it follows that the approximation space is the product of Polish spaces and is
therefore Polish.
Robust control [176] and robust optimization [7, 14] have been founded
upon the worst-case approach to uncertainty. Recall that robust optimization
describes optimization involving uncertain parameters. While these uncertain
164 H. Owhadi and C. Scovel
To motivate the general formulation in the presence of sample data, consider another
simple warm-up problem.
The only difference between Problems 3 and 1 lies in the availability of data sampled
from the underlying unknown distribution. Observe that, in the presence of this sample
data, the notions of sharpest estimate or smallest interval of confidence are far
from being transparent and call for clear and precise definitions. Note also that if
the constraint E[X] ≤ m is ignored, and the number n of sample data is large,
then one could use the central limit theorem or a concentration inequality (such
as Hoeffding's inequality) to derive an interval of confidence for μ†[X ≥ a]. A
nontrivial question of practical importance is what to do when n is not large.
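For the large-n route just mentioned, a Hoeffding-based interval of confidence for μ[X ≥ a] can be sketched as follows. This is a minimal illustration under our own assumptions (uniform toy data, confidence level 1 − δ with δ = 0.05), not a prescription from the text: since the indicators 1{Xᵢ ≥ a} lie in [0, 1], Hoeffding's inequality gives P(|p̂ − p| ≥ t) ≤ 2 exp(−2 n t²), i.e. half-width t = √(log(2/δ) / (2n)).

```python
# Sketch (our illustration): a distribution-free confidence interval for
# p = P[X >= a] built from the empirical frequency via Hoeffding's inequality.

import math
import random

def hoeffding_interval(samples, a, delta=0.05):
    """Interval for P[X >= a], valid for any distribution of the samples,
    with coverage at least 1 - delta by Hoeffding's inequality."""
    n = len(samples)
    p_hat = sum(1 for x in samples if x >= a) / n
    t = math.sqrt(math.log(2 / delta) / (2 * n))   # half-width
    return max(0.0, p_hat - t), min(1.0, p_hat + t)

random.seed(1)
samples = [random.random() for _ in range(100)]    # toy data: uniform on [0, 1]
lo, hi = hoeffding_interval(samples, a=0.5)
print(lo, hi)   # brackets the true value P[X >= 0.5] = 0.5
```

Note the half-width is about 0.136 at n = 100 regardless of the data: informative only for moderately large n, which is exactly why the small-n case calls for the machinery developed next.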
Writing Φ(μ†) := μ†[X ≥ a] as the quantity of interest, observe that an
estimation of Φ(μ†) is a function (which will be written θ) of the data d. Ideally,
one would like to choose θ so that the estimation error θ(d) − Φ(μ†) is as close
as possible to zero. Since d is random, a more robust notion of error is that of
a statistical error E(θ, μ) defined by weighting the error with respect to a real
measurable positive loss function V : ℝ → ℝ and the distribution of the data, i.e.,
\[
\mathcal{E}(\theta, \mu) := \mathbb{E}_{d \sim \mu^n}\left[ V\left(\theta(d) - \Phi(\mu)\right) \right] \tag{6.5}
\]
Note that for V(x) = x², the statistical error E(θ, μ) defined in (6.5) is the mean
squared error, with respect to the distribution of the data d, of the estimation error.
For V(x) = 1_{[ε,∞)}(|x|), defined for some ε > 0, E(θ, μ) is the probability, with
respect to the distribution of d, that the estimation error is larger than ε.
Now since μ† is unknown, the statistical error E(θ, μ†) of any θ is also unknown.
However, one can still identify the least upper bound on that statistical error through
a worst-case scenario with respect to all possible candidates for μ†, i.e.,
\[
\sup_{\mu \in \mathcal{A}} \mathcal{E}(\theta, \mu). \tag{6.6}
\]
The sharpest estimator (possibly within a given class) is then naturally obtained as
the minimizer of (6.6) over all functions θ of the data d within that class, i.e., as the
minimizer of
\[
\inf_{\theta} \sup_{\mu \in \mathcal{A}} \mathcal{E}(\theta, \mu). \tag{6.7}
\]
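A brute-force caricature of this worst-case criterion may help fix ideas. In the sketch below, which is entirely our own construction, the admissible set A is discretized into a handful of two-point measures on {0, a} (an assumption for illustration), the statistical error with V(x) = x² is estimated by Monte Carlo, and two candidate estimators are ranked by their worst-case error.

```python
# Brute-force sketch (our illustration) of the minimax comparison: estimate
# the worst-case mean squared error, over a discretized admissible set, of
# two candidate estimators of Phi(mu) = mu[X >= a].

import random

a, m, n = 0.5, 0.1, 20            # threshold, mean bound, sample size (toy values)

def sample(p, size):
    """Draw from the two-point measure with mass p at a and 1 - p at 0."""
    return [a if random.random() < p else 0.0 for _ in range(size)]

def stat_error(theta, p, trials=2000):
    """Monte Carlo estimate of E[(theta(d) - p)^2] with d ~ mu^n, where
    Phi(mu) = p for the two-point measure above."""
    return sum((theta(sample(p, n)) - p) ** 2 for _ in range(trials)) / trials

def empirical(d):
    """Plug-in estimator: the empirical frequency of X >= a."""
    return sum(1 for x in d if x >= a) / len(d)

def shrunk(d):
    """Toy alternative (an assumption): shrink the frequency toward 0.1."""
    return 0.5 * empirical(d) + 0.05

candidates = [0.0, 0.05, 0.1, 0.15, 0.2]   # feasible values p <= m / a
random.seed(2)
worst_empirical = max(stat_error(empirical, p) for p in candidates)
worst_shrunk = max(stat_error(shrunk, p) for p in candidates)
# Under the minimax criterion, the estimator with the smaller worst-case
# error is preferred, even if the other wins at some individual candidates.
print(worst_empirical, worst_shrunk)
```

At this small sample size the shrunk estimator has the smaller worst-case error, illustrating why the minimax optimum need not be the naive plug-in rule; the actual framework optimizes over all estimators and all of A rather than over a finite grid.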
Observe that the optimal estimator is identified independently from the obser-
vation/realization of the data, and if the minimum of (6.7) is not achieved, then
one can still use a near-optimal θ. Then, when the data is observed, the estimate
of the quantity of interest Φ(μ†) is derived by evaluating the near-optimal θ
on the data d. The notion of optimality described here is that of Wald's statistical
decision theory [156–158, 160, 161], evidently influenced by Von Neumann's game
theory [153, 155]. In Wald's formulation [157], which cites both Von Neumann
[153] and Von Neumann and Morgenstern [155], the statistician finds himself in
an adversarial game played against the Universe, in which he tries to minimize a risk
function E(θ, μ) with respect to θ in a worst-case scenario with respect to what the
Universe's choice of μ could be.
The optimization approach to statistics is not new, and this section will now give a
short, albeit incomplete, description of its development, primarily using Lehmann's
account [87]. Accordingly, it began with Gauss and Laplace with the nonparametric
result referred to as the Gauss–Markov theorem, asserting that the least squares esti-
mates are the linear unbiased estimates with minimum variance. Then, in Fisher's
fundamental paper [39], for parametric models, he proposes the maximum likeli-
hood estimator and claims (but does not prove) that such estimators are consistent
and asymptotically efficient. According to Lehmann, the situation is complex, but
"under suitable restrictions Fisher's conjecture is essentially correct ...". Fisher's
maximum likelihood principle was first proposed on intuitive grounds, and then its
optimality properties were developed. However, according to Lehmann [86, Pg. 10–11],
Pearson then asked Neyman "Why these tests rather than any of the many others that
could be proposed?" This question resulted in Neyman and Pearson's 1928 papers
[104] on the likelihood ratio method, which gives the same answer as Fisher's tests
under normality assumptions. However, Neyman was not satisfied. He agreed that
the likelihood ratio principle was appealing but felt that it was lacking a logically
convincing justification. This then led to the publication of Neyman and Pearson
[105], containing their now famous Neyman–Pearson lemma; according to
Lehmann [87], "In a certain sense this is the true start of optimality theory." In a
major extension of the Neyman–Pearson work, Huber [58] proves a robust version
of the Neyman–Pearson lemma, in particular providing an optimality criterion
defining the robust estimator, giving rise to a rigorous theory of robust statistics
based on optimality; see Huber's Wald lecture [59]. Robustness is particularly suited
to the Wald framework since robustness considerations are easily formulated with
the proper choices of admissible functions and measures in the Wald framework.
Another example is Kiefer's introduction of optimality into experimental design,
resulting in Kiefer's 40 papers on "Optimum Experimental Designs" [74].
Not everyone was happy with optimality as a guiding principle. For example,
Lehmann [87] states that at a 1958 meeting of the Royal Statistical Society at which
Kiefer presented a survey talk [73] on "Optimum Experimental Designs", Barnard
quotes Kiefer as saying of procedures proposed in a paper by Box and Wilson that
they often "[are] not even well-defined rules of operation." Barnard's reply:
in the field of practical human activity, rules of operation which are not well-defined may
be preferable to rules which are.
Wynn [173], in his introduction to a reprint of Kiefer's paper, calls this a "clash of
statistical cultures." Indeed, it is interesting to read the generally negative responses
to Kiefer's article [73] and the remarkable rebuttal by Kiefer therein. Tukey had
other criticisms, regarding "The tyranny of the best" in [147] and "The dangers of
optimization" in [148]. In the latter he writes:
Some [statisticians] seem to equate [optimization] to statistics, an attitude which, if widely
adopted, is guaranteed to produce a dried-up, encysted field with little chance of real growth.
For an account of how the Exploratory Data Analysis approach of Tukey fits within
the Fisher/Neyman–Pearson debate, see Lehnard [88].
Let us also remark on the influence that "Student" (William Sealy Gosset) had
on both Fisher and Pearson. As presented in Lehmann's [85] "Student and small-
sample theory", quoting F. N. David [79]: "I think he [Gosset] was really the big
influence in statistics ... He asked the questions and Pearson or Fisher put them into
statistical language and then Neyman came to work with the mathematics. But I
think most of it stems from Gosset." The aim of Lehmann's paper [85] is to consider
to what extent David's conclusion is justified. Indeed, the claim is surprising since
Gosset is mainly known for only one contribution, that is, Student [141], with
the introduction of Student's t-test and its analysis under the normal distribution.
According to Lehmann, "Today the pathbreaking nature of this paper is generally
recognized and has been widely commented upon ...". Gosset's primary concern in
communicating with both Fisher and Pearson was the robustness of the test to non-
normality. Lehmann concludes that the main ideas leading to Pearson's research
were indeed provided by Student. See Lehmann [85] for the full account, including
Gosset's relationship to the Fisher–Pearson debate, Pearson [116] for a statistical
biography of Gosset, and Fisher [40] for a eulogy. Consequently, modern statistics
appears to owe a lot to Gosset. Moreover, the reason for the pseudonym was a policy
by Gosset's employer, the brewery Arthur Guinness, Sons, and Co., against work
done for the firm being made public. Allowing Gosset to publish under a pseudonym
was a concession that resulted in the birth of the statistician "Student". The
authors would like to take this opportunity to thank the Guinness Brewery for
its influence on statistics today, and for such a fine beer.
the introduction of Wald [156]. Brown [22] quotes the students of Neyman in 1966
from [103]:
The concepts of confidence intervals and of the Neyman–Pearson theory have proved
immensely fruitful. A natural but far-reaching extension of their scope can be found in
Abraham Wald's theory of statistical decision functions. The elaboration and application of
the statistical tools related to these ideas has already occupied a generation of statisticians.
It continues to be the main lifestream of theoretical statistics.
Brown's purpose was to address whether the last sentence in the quote was still true in
2000.
Wolfowitz [170] describes the primary accomplishments of Wald's statistical
decision theory as follows:
Wald's greatest achievement was the theory of statistical decision functions, which includes
almost all problems which are the raison d'être of statistics.
Leonard [89, Chp. 12] portrays Von Neumann's return to game theory as partly
an early reaction to upheaval and war. However, he adds that eventually Von
Neumann became personally involved in the war effort, and with that involvement
came a significant, unforeseeable moment in the history of game theory: this
new mathematics made its wartime entrance into the world, not as the abstract
theory of social order central to the book, but as a problem-solving technique.
Moreover, on pages 278–280, Leonard discusses the statistical research groups at
Berkeley, Columbia, and Princeton, in particular Wald at Columbia, and how the
effort to develop inspection and testing procedures led Wald to the development
of sequential methods, apparently yielding significant economies in inspection
in the Navy and leading to the publication of Wald and Wolfowitz's [162] proof
of the optimality of the sequential probability ratio test and Wald's book [159]
Sequential Analysis. Leonard's claim essentially is that the war stimulated these
fine theoretical minds to pursue activities with real application value. In this regard,
it is relevant to note Mangel and Samaniego's [93] stimulating description of Wald's
work on aircraft survivability, along with the contemporary, albeit somewhat vague,
description of "How a Story from World War II Shapes Facebook Today" by Wilson
[167]. Indeed, in the problem of how to allocate armoring to the Allied bombers
based on their condition upon return from their missions, it was discovered that
armoring where the previous planes had been hit was not improving their rate of
return. Wald's ingenious insight was that these were the returning bombers, not the
ones which had been shot down, so the places where the returning bombers were hit
are more likely to be the places where one does not need to add armoring. Evidently,
his rigorous and unconventional innovations to transform this intuition into a real
methodology saved many lives. Wolfowitz [170] states:
Wald not only posed his statistical problems clearly and precisely, but he posed them to
fit the practical problem and to accord with the decisions the statistician was called on
to make. This, in my opinion, was the key to his success: a high level of mathematical
talent of the most abstract sort, and a true feeling for, and insight into, practical problems.
The combination of the two in his person at such high levels was what gave him his
outstanding character.
6 Toward Machine Wald 169
The section on Von Neumann and Remak (along with the Notes that follow it)
in Kurz and Salvadori [78] describes Wald and Von Neumann's relations. Brown
[21] credits Wald as the creator of the minimax idea in statistics, evidently
given axiomatic justification by Gilboa and Schmeidler [47]. This certainly
had something to do with his friendship with Morgenstern and his relationship
with Von Neumann, who together authored the famous book [155], but this
influence can be explicitly seen in Wald's citation of Von Neumann
[153] and Von Neumann and Morgenstern [155] in his introduction [157]
of the minimax idea in statistical decision theory. Indeed, Wolfowitz states
that:
…he was also spurred on by the connection between the newly announced results of [Von
Neumann and Morgenstern] [155] and his own theory, and by the general interest among
economists and others aroused by the theory of games.
Wolfowitz asserts that Wald's work [156] Contributions to the Theory of Statis-
tical Estimation and Testing Hypotheses is probably his most important paper
but that it went almost completely unnoticed, possibly because "the use of
Bayes solutions was a deterrent" and Wald did not really emphasize that he was
using Bayes solutions only as a tool. Moreover, although Wolfowitz considered
Wald's Statistical Decision Functions [161] his greatest achievement, he also
says:
The statistician who wants to apply the results of [161] to specific problems is likely
to be disappointed. Except for special problems, the complete classes are difficult to
characterize in a simple manner and have not yet been characterized. Satisfactory general
methods are not yet known for obtaining minimax solutions. If one is not always
going to use a minimax solution (to which serious objections have been raised) or a
solution satisfying some given criterion, then the statistician should have the opportunity
to choose from among representative decision functions on the basis of their risk
functions. These are not available except for the simplest cases. It is clear that much
remains to be done before the use of decision functions becomes common. The theory
provides a rational basis for attacking almost any statistical problem, and, when some
computational help is available and one makes some reasonable compromises in the
interest of computational feasibility, one can obtain a practical answer to many problems
which the classical theory is unable to answer or answers in an unsatisfactory
manner.
D : A → M(D),   (6.8)
where M(D) is given the Borel structure corresponding to the weak topology, to
define this relation. The idea is that D(f,μ) is the probability distribution of the
observed sample data d if (f†,μ†) = (f,μ), and for this reason it may be called
the data map (or, even more loosely, the observation operator). Now consider the
following problem.
Since in real applications true optimality will never be achieved, it is natural to
generalize to considering near-minimizers of (6.11) as near-optimal models/estimators.
Remark 1. In situations where the data map is imperfectly known (e.g., when the
data d is corrupted by some noise of imperfectly known distribution), one has to
include a supremum over all possible candidates D ∈ 𝔻 in the calculation of the
least upper bound on the statistical error.
Var_{d∼D(f,μ)}[θ(d)] := E_{d∼D(f,μ)}[θ(d)²] − (E_{d∼D(f,μ)}[θ(d)])².

The following equation, whose proof is straightforward, shows that for V(x) = x²,
the least upper bound on the mean squared error of θ is equal to the least upper
bound on the sum of the variance of θ and the square of its bias:

sup_{(f,μ)∈A} E(θ, (f,μ)) = sup_{(f,μ)∈A} [ Var_{d∼D(f,μ)}[θ(d)] + ( E_{d∼D(f,μ)}[θ(d)] − Φ(f,μ) )² ].
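This bias-variance identity can be checked numerically. The following sketch uses an illustrative admissible set (n i.i.d. Bernoulli(p) draws with p ∈ [0, 1]), quantity of interest Φ(p) = p, and the shrinkage estimator θ(d) = (S + 1)/(n + 2); these choices are assumptions made for the example only, not part of the text above.

```python
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def mse_direct(n, p):
    # E[(theta(d) - p)^2] with theta(d) = (S + 1)/(n + 2), S ~ Binomial(n, p)
    return sum(binom_pmf(n, k, p) * ((k + 1) / (n + 2) - p) ** 2 for k in range(n + 1))

def mse_bias_var(n, p):
    # variance of theta plus its squared bias, computed in closed form
    mean = (n * p + 1) / (n + 2)
    var = n * p * (1 - p) / (n + 2) ** 2
    return var + (mean - p) ** 2

n = 10
grid = [i / 100 for i in range(101)]
lhs = max(mse_direct(n, p) for p in grid)    # sup of the mean squared error
rhs = max(mse_bias_var(n, p) for p in grid)  # sup of variance + bias^2
print(lhs, rhs)  # identical: the identity holds pointwise in p
```

The two suprema agree to machine precision because the identity holds pointwise in (f, μ), not merely after taking least upper bounds.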
note that sup_{(f,μ)∈A} E(θ, (f,μ)) is the least upper bound on the probability (with
respect to the distribution of d) that the difference between the true value of the
quantity of interest Φ(f†,μ†) and its estimated value θ(d) is larger than ε. Let
α ∈ (0, 1). Define

ε* := inf{ ε > 0 | inf_θ sup_{(f,μ)∈A} E(θ, (f,μ)) ≤ α },

and observe that if θ* is a minimizer of inf_θ sup_{(f,μ)∈A} E(θ, (f,μ)), then [θ*(d) −
ε*, θ*(d) + ε*] is the smallest interval of confidence (random interval obtained as a
function of the data) containing Φ(f†,μ†) with probability at least 1 − α. Observe
also that this formulation is a natural extension of the OUQ formulation as described
in [114]. Indeed, in the absence of sample data, it is easy to show that θ* is the
midpoint of the optimal interval [L(A), U(A)].
Remark 2. We refer to [37, 38, 137] and in particular to Stein's notorious paradox
[138] for the importance of a careful choice of the loss function.
∑_{i=1}^{n} α_i = 1 is a finite-dimensional convex optimization problem in α. If estimators
are seen as models of reality, then this observation supports the idea that one
can obtain improved models by mixing them (with the goal of achieving minimal
statistical errors).
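The observation that mixing estimators is a finite-dimensional convex problem can be illustrated with a hypothetical worst-case error table (the numbers below are invented for the example). For two estimators, the simplex is one-dimensional and the convex minimax objective can simply be scanned on a grid; a proper implementation would hand the same problem to a linear-programming solver.

```python
# Hypothetical error table: err[i][j] is the statistical error of
# estimator i under candidate model j (numbers invented for illustration).
err = [[1.0, 0.2, 0.8],
       [0.3, 0.9, 0.4]]

def worst_case(alpha):
    # worst-case error of the mixture alpha * theta_0 + (1 - alpha) * theta_1
    return max(alpha * e0 + (1 - alpha) * e1 for e0, e1 in zip(*err))

# The objective is convex (a max of affine functions of alpha), so a fine
# grid scan over the one-dimensional simplex suffices for this toy instance.
alphas = [i / 1000 for i in range(1001)]
best = min(alphas, key=worst_case)
print(best, worst_case(best))  # the mixture beats both pure estimators
```

The pure estimators have worst-case errors 1.0 and 0.9, while the optimal mixture achieves roughly 0.6, illustrating how mixing can yield improved models.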
E_{(f,μ)∼π, d′∼D(f,μ)}[ Φ(f,μ) | d′ = d ]   (6.12)

as the estimator θ(d). This requires giving A the structure of a measurable space
such that important quantities of interest, such as (f,μ) ↦ μ[f(X) ≥ a] and
(f,μ) ↦ E_μ[f], are measurable. This can be achieved using results of Ressel [121]
providing conditions under which the map (f,μ) ↦ f∗μ from function/measure
pairs to the induced law is Borel measurable. We will henceforth assume A to
be a Suslin space and proceed to construct the probability measure π ⋉ D of
((f,μ), d) on A × D via a natural generalization of the Campbell measure and
Palm distribution associated with a random measure as described in [68]; see also
[25, Ch. 13] for a more current treatment. We refer to Sect. 6 of the appendix for the
details of the construction of the distribution π ⋉ D of ((f,μ), d) when (f,μ) ∼ π
and d ∼ D(f,μ), of the marginal distribution π_D of π ⋉ D on the data space
D, and of the resulting regular conditional expectation (6.12). Consequently, the nested
expectation E_{(f,μ)∼π, d∼D(f,μ)} appearing in (6.12) will from now on be rigorously
written as the expectation E_{((f,μ),d)∼π⋉D}.
When A is Suslin and when (f†,μ†) is not a random sample from π but simply an
unknown element of A, then a straightforward application of the reduction theorems
of [114] implies that when Π = M(A), then (6.14) is equal to (6.11), i.e.,

sup_{(f,μ)∈A} E(θ, (f,μ)) = sup_{π∈M(A)} E(θ, π).   (6.15)
When Φ has a second moment with respect to π, one can utilize the classical
variational description of conditional expectation as follows: letting L²_{π_D}(D)
denote the space of (π_D-a.e. equivalence classes of) real-valued measurable
functions on D that are square-integrable with respect to the measure π_D, one
has (see Sect. 6)

E_{π⋉D}[ Φ | d ] := arg min_{h ∈ L²_{π_D}(D)} E_{((f,μ),d)∼π⋉D}[ ( Φ(f,μ) − h(d) )² ].
In other words, if (f,μ) is sampled from the measure π, then E_{π⋉D}[Φ(f,μ) | d] is
the best mean-square approximation of Φ(f,μ) in the space of square-integrable
functions of d. As with the regular conditional probabilities, the real-valued function
on D

θ_π(d) = E_{((f,μ),d′)∼π⋉D}[ Φ(f,μ) | d′ = d ],   d ∈ D,   (6.16)
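The variational characterization of the conditional expectation can be illustrated by Monte Carlo. In the sketch below (a toy model assumed for the example, not taken from the text: p drawn uniformly from the prior, d the number of successes in n Bernoulli(p) trials, Φ(p) = p), the posterior mean (S + 1)/(n + 2) attains a smaller empirical mean-square error than other functions of the data:

```python
import random
random.seed(0)

n, N = 10, 100_000
samples = []
for _ in range(N):
    p = random.random()                              # parameter drawn from the uniform prior
    s = sum(random.random() < p for _ in range(n))   # observed number of successes
    samples.append((p, s))

def mse(h):
    # empirical mean-square error of the estimator h as a function of the data s
    return sum((p - h(s)) ** 2 for p, s in samples) / N

post_mean = lambda s: (s + 1) / (n + 2)   # posterior mean under the uniform prior
mle = lambda s: s / n                     # maximum likelihood estimator
const = lambda s: 0.5                     # constant guess, ignores the data
print(mse(post_mean), mse(mle), mse(const))
```

Among square-integrable functions of d, no alternative beats the conditional expectation in mean square, which is what the arg min characterization asserts.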
classical arguments in game theory suggest that one extend θ to random estimators
and random selection in A. To that end, let the set of randomized estimators be the
set of Markov kernels

𝓡 := { θ̂ : Xⁿ × B(ℝ) → [0, 1] | θ̂ Markov }.
176 H. Owhadi and C. Scovel
Let 𝓡 be endowed with the topology of pointwise convergence with respect to this
action, that is, the weak topology with respect to integration against C_b(ℝ) × L(Aⁿ).
Moreover, this weak topology also facilitates a definition of the space 𝓡̄ of
generalized random estimators as bilinear real-valued maps ϑ : C_b(ℝ) × L(Aⁿ) →
ℝ satisfying |ϑ(f, μ)| ≤ ‖f‖_∞ ‖μ‖, ϑ(f, μ) ≥ 0 for f ≥ 0 and μ ≥ 0, and
ϑ(1, μ) = μ(Xⁿ). By [139, Thm. 42.3], the set 𝓡̄ of generalized random estimators
is compact and convex, and by [139, Thm. 42.5] of Le Cam [82], 𝓡 is dense in
𝓡̄ in the weak topology. Moreover, when Aⁿ is dominated and one can restrict to a
compact subset C ⊂ ℝ of the decision space, Strasser [139, Cor. 42.8] asserts
that 𝓡̄ = 𝓡.
Returning to our illustration, if one lets W_{(f,μ)}, (f,μ) ∈ A, defined by
W_{(f,μ)}(r) := V(r − Φ(f,μ)), r ∈ ℝ, denote the associated family of loss functions, one
can now define a generalization of the statistical error function E(θ, (f,μ)) of (6.5) to
randomized estimators θ̂ by

E(θ̂, (f,μ)) := ∫∫ W_{(f,μ)}(r) θ̂(xⁿ, dr) μⁿ(dxⁿ),   θ̂ ∈ 𝓡, (f,μ) ∈ A.

This definition reduces to the previous one (6.5) when the random estimator θ̂
corresponds to a point estimator and extends naturally to 𝓡̄. Finally, one says
that an estimator ϑ* ∈ 𝓡̄ is Bayesian if there exists a probability measure 𝕞 with
finite support on A such that
∫ E(ϑ*, (f,μ)) 𝕞(d(f,μ)) ≤ ∫ E(ϑ, (f,μ)) 𝕞(d(f,μ)),   ϑ ∈ 𝓡̄.
The following complete class theorem follows from Strasser [139, Thm. 47.9,
Cor. 42.8], since one can naturally compactify the decision space ℝ when the
quantity of interest Φ is bounded and the loss function V is sublevel compact, that
is, has compact sublevel sets.

Theorem 1. Suppose that the loss function V is sublevel compact and the quantity
of interest Φ : A → ℝ is bounded. Then, for each generalized randomized estimator
ϑ ∈ 𝓡̄, there exists a weak limit ϑ* ∈ 𝓡̄ of Bayesian estimators such that
E(ϑ*, (f,μ)) ≤ E(ϑ, (f,μ)) for all (f,μ) ∈ A.
require the actual implementation of the estimator on a machine and its numerical
evaluation against the data.
That is, on a par with the rigorous performance analysis, machine learning also
requires that solutions be efficiently implementable on a computer, and often such
efficiency is established by proving bounds on the amount of computation required
to produce a solution with a given algorithm. Although Wald's theory of
optimal statistical decisions has resulted in many important statistical discoveries,
looking through the three Lehmann symposia of Rojo and Pérez-Abreu [126] in
2004 and Rojo [124, 125] in 2006 and 2009, it is clear that the incorporation of
the analysis of the computational algorithm, both in terms of its computational
efficiency and its statistical optimality, has not begun. Therefore, a natural answer to
fundamental challenges in UQ appears to be the full incorporation of computation
into a natural generalization of Wald's statistical decision function framework,
producing a framework one might call Machine Wald.
Many of these optimization problems will not be tractable. However, even in the
tractable case, which comes with rigorous guarantees on the amount of computation
required to obtain an approximate optimum, it will be useful to have stopping criteria
for the specific algorithm and the specific problem instance under consideration, which
can be used to guarantee when an approximate optimum has been achieved. Although
in the intractable case no such guarantee will exist in general, intelligent choices
of algorithms may result in the attainment of approximate optima, and such tests
guarantee that fact. Ermoliev, Gaivoronski, and Nedeva [36] successfully develop
such stopping criteria using Lagrangian duality and the generalized Benders
decompositions of Geoffrion [46] for certain stochastic optimization problems which are also
relevant here. In addition, the approximation of intractable problems by tractable
ones will be important. Recently, Hanasusanto, Roitch, Kuhn, and Wiesemann
[53] derived explicit conic reformulations for tractable problem classes and suggested
efficiently computable conservative approximations for intractable ones.
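As a toy illustration of such a duality-based stopping criterion (this is only a sketch, not the method of [36]): for the instance min{x² : x ≥ 1}, the Lagrangian dual is q(λ) = λ − λ²/4, and weak duality makes f(x) − q(λ) a certified bound on the suboptimality of any feasible candidate x.

```python
# Toy instance (not the method of [36]): minimize f(x) = x^2 subject to x >= 1.
# Lagrangian: x^2 + lam * (1 - x); its minimizer is x = lam / 2, giving the
# dual q(lam) = lam - lam^2 / 4. Weak duality: q(lam) <= f(x) for feasible x,
# so the gap f(x) - q(lam) certifies how suboptimal the candidate x is.

def f(x):
    return x * x

def q(lam):
    return lam - lam * lam / 4.0

tol, step = 1e-6, 0.5
lam = 0.0
for _ in range(10_000):
    x_lam = lam / 2.0                 # unconstrained minimizer of the Lagrangian
    x = max(1.0, x_lam)               # feasible primal candidate
    gap = f(x) - q(lam)               # certified suboptimality bound
    if gap < tol:                     # duality-gap stopping criterion
        break
    lam = max(0.0, lam + step * (1.0 - x_lam))  # dual (sub)gradient ascent
print(x, lam, gap)  # x near 1, lam near 2, gap below tolerance
```

The point of the criterion is that the algorithm stops with a guarantee: whatever the iterate history, gap < tol certifies that the returned x is within tol of the optimum.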
An oftentimes overlooked difficulty with Bayesian estimators lies in the fact that,
for a prior π ∈ M(A), the posterior (6.12) is not a measurable function of d but
a convex set Θ(π) of measurable functions of d that are almost surely equal to
each other under the measure π_D on D.
A notorious pathological consequence is the Borel–Kolmogorov paradox (see
Chapter 5 of [76] and Section 15.7 of [63]). Recall that in the formulation of this
paradox, one considers the uniform distribution on the two-dimensional sphere and
one is interested in obtaining the conditional distribution associated with a great
circle of that sphere. If the problem is parameterized in spherical coordinates, then
the resulting conditional distribution is uniform for the equator but nonuniform for
the meridian (the great circle at fixed longitude through the poles). The following
theorem suggests that this paradox is generic and dissipates the idea that it could be
limited to fabricated toy examples. See also Singpurwalla and Swift [132] for
implications of this paradox in modeling and inference.
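The paradox can be reproduced numerically: conditioning a uniformly distributed point on the sphere on a thin band around the equator gives a uniform longitude distribution, while conditioning on a thin band around the prime meridian gives a latitude distribution proportional to cos(latitude). A minimal Monte Carlo sketch (the band half-width delta is an arbitrary choice for the example):

```python
import math, random
random.seed(1)

delta, N = 0.05, 200_000
eq_lons, mer_lats = [], []
for _ in range(N):
    z = random.uniform(-1.0, 1.0)            # uniform z gives the uniform area measure
    lon = random.uniform(-math.pi, math.pi)  # longitude
    lat = math.asin(z)                       # latitude
    if abs(lat) < delta:                     # thin band around the equator
        eq_lons.append(lon)
    if abs(lon) < delta:                     # thin band around the prime meridian
        mer_lats.append(lat)

def frac_in(xs, lo, hi):
    return sum(lo <= x < hi for x in xs) / len(xs)

# Longitude given the equatorial band is uniform; latitude given the
# meridian band is proportional to cos(latitude), hence non-uniform.
print(frac_in(eq_lons, 0.0, math.pi / 2))           # close to 0.25
print(frac_in(mer_lats, math.pi / 4, math.pi / 2))  # close to (1 - sin(pi/4))/2
```

Both bands shrink to great circles of the same sphere, yet the limiting conditional distributions differ, which is exactly the Borel–Kolmogorov phenomenon.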
Recall that, for π ∈ M(A), Θ(π) is defined as the convex set of
measurable functions which are equal π_D-almost everywhere to the regular conditional
expectation (6.12). Despite this indeterminateness, it is comforting to know that

Theorem 2. Assume that V(x) = x² and that the image Φ(A) is a nontrivial
interval. If π′ is not absolutely continuous with respect to π, then

1/4 ≤ [ sup_{θ₁,θ₂∈Θ(π)} ( E(θ₂, π′) − E(θ₁, π′) ) ] / [ (U(A) − L(A))² sup_{B∈B(D): (π_D)(B)=0} (π′_D)(B) ] ≤ 1.   (6.19)

Here

θ_m := (L(A) + U(A)) / 2

denotes the midpoint estimator.
As a remedy, one can try (see [144, 145] and [117]) constructing conditional
expectations as disintegration or derivation limits defined as

E_{π⋉D}[ Φ(f,μ) | d′ = d ] = lim_{B ↓ {d}} E_{π⋉D}[ Φ(f,μ) | d′ ∈ B ].   (6.20)
As much as classical numerical analysis shows that there are stable and unstable
ways to discretize a partial differential equation, positive [13, 23, 28, 75, 80, 140, 152]
and negative results [6, 27, 42, 43, 65, 84, 108, 109, 112, 113] are forming an emerging
understanding of stable and unstable ways to apply Bayes' rule in practice.
Most statisticians would acknowledge that an analysis is not complete unless the sensitivity
of the conclusions to the assumptions is investigated. Yet, in practice, such sensitivity
analyses are rarely used. This is because sensitivity analyses involve difficult computations
that must often be tailored to the specific problem. This is especially true in Bayesian
inference where the computations are already quite difficult. [165]
Another aspect concerns situations where Bayes' rule is applied iteratively, so that
posterior values become prior values for the next iteration. Observe in particular
that when posterior distributions (which are later on used as prior distributions) are
only approximated (e.g., via MCMC methods), stability requires the convergence of
the MCMC method in the same metric used to quantify the sensitivity of posterior
distributions with respect to prior distributions.
In the context of the framework being developed here, recent results [108, 109,
112, 113] on the extreme sensitivity (brittleness) of Bayesian inference in the TV
and Prokhorov metrics appear to suggest that robust inference, in a continuous world
under finite information, should perhaps be done with reduced/coarse models rather
than highly sophisticated/complex models (with a level of coarseness/reduction
depending on the available finite information) [113].
From the point of view of practical applications, it is clear that the set of possible
models entering the minimax problem (6.11) must be restricted by introducing
constraints on computational complexity. For example, finding optimal models of
materials in extreme environments is not the correct objective when these models
require full quantum mechanics calculations. A more productive approach is to
search for computationally tractable optimal models in a given complexity class.
Here one may wonder if Bayesian models remain a complete class for the resulting
complexity constrained minimax problems. It is also clear that computationally
tractable optimal models may not use all the available information, for instance,
a material model of bounded complexity should not use the state of every atom. The
idea that fast computation requires computation with partial information forms the
core of information-based complexity, the branch of computational complexity that
studies the complexity of approximating continuous mathematical operations with
discrete and finite ones up to a specified level of accuracy [101, 115, 146, 171, 172],
where it is also augmented by concepts of contaminated and priced information
associated with, for example, truncation errors and the cost of numerical operations.
Recent results [107] suggest that decision theory concepts could be used, not
only to identify reduced models but also algorithms of near-optimal complexity
by reformulating the process of computing with partial information and limited
resources as that of playing underlying hierarchies of adversarial information
games.
6 Conclusion
Although uncertainty quantification is still in its formative stage, much like the state
of probability theory before its rigorous formulation by Kolmogorov in the 1930s,
it has the potential to have an impact on the process of scientific discovery that is
similar to the advent of scientific computing. Its emergence remains sustained by
the persistent need to make critical decisions with partial information and limited
resources. There are many paths to its development, but one such path appears to be
the incorporation of notions of computation and complexity in a generalization of
Wald's decision framework built on Von Neumann's theory of adversarial games.
Appendix
Construction of π ⋉ D

The function

D̂ : A × B(D) → ℝ,

defined by

D̂((f,μ), B) := D(f,μ)(B),   for (f,μ) ∈ A, B ∈ B(D),

is a transition function in the sense that, for fixed (f,μ) ∈ A, D̂((f,μ), ·) is
a probability measure and, for fixed B ∈ B(D), D̂(·, B) is Borel measurable.
Therefore, by [18, Thm. 10.7.2], any π ∈ M(A) defines a probability measure
π ⋉ D ∈ M(B(A) ⊗ B(D)) through

(π ⋉ D)(A′ × B) := E_{(f,μ)∼π}[ 1_{A′}(f,μ) D(f,μ)(B) ],   for A′ ∈ B(A), B ∈ B(D),   (6.21)

so that π ⋉ D ∈ M(A × D). Its marginal on the data space D is

π_D(B) := E_{(f,μ)∼π}[ D(f,μ)(B) ],   for B ∈ B(D).
Since both D and A are Suslin, it follows that the product A × D is Suslin.
Consequently, [18, Cor. 10.4.6] asserts that regular conditional probabilities exist
for any sub-σ-algebra of B(A × D). In particular, the product theorem of [18,
Thm. 10.4.11] asserts that product regular conditional probabilities exist,

(π ⋉ D)|_d ∈ M(A),   for d ∈ D.
Proof of Theorem 2

If π′_D is not absolutely continuous with respect to π_D, then there exists B ∈ B(D)
such that (π_D)(B) = 0 and (π′_D)(B) > 0. Let θ ∈ Θ(π) and, for y ∈ ℝ, define

θ_y(d) := y 1_B(d) + θ(d) 1_{B^c}(d),   d ∈ D.

Then it is easy to see that if y is in the range of Φ, then θ_y ∈ Θ(π). Now observe
that, for y, z ∈ Image(Φ),

E(θ_y, π′) − E(θ_z, π′) = E_{((f,μ),d)∼π′⋉D}[ 1_B(d) ( V(y − Φ(f,μ)) − V(z − Φ(f,μ)) ) ]

with

Φ̄ := E_{π′⋉D}[ Φ(f,μ) | d ∈ B ],

which proves

sup_{θ₂∈Θ(π)} E(θ₂, π′) − inf_{θ₁∈Θ(π)} E(θ₁, π′) ≥ (1/4) (U(A) − L(A))² sup_{B∈B(D): (π_D)(B)=0} (π′_D)(B).

To obtain the right-hand side of (6.19), observe that (see for instance [29, Sec. 5])
there exists B ∈ B(D) such that
References
1. Richardson, L.F.: Weather Prediction by Numerical Process. Cambridge Mathematical
Library. Cambridge University Press, Cambridge (1922)
2. Ackerman, N.L., Freer, C.E., Roy, D.M.: On the computability of conditional probability.
arXiv:1005.3014 (2010)
3. Adams, M., Lashgari, A., Li, B., McKerns, M., Mihaly, J.M., Ortiz, M., Owhadi, H.,
Rosakis, A.J., Stalzer, M., Sullivan, T.J.: Rigorous model-based uncertainty quantification
with application to terminal ballistics. Part II: systems with uncontrollable inputs and large
scatter. J. Mech. Phys. Solids 60(5), 1002–1019 (2012)
4. Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker's Guide, 3rd edn.
Springer, Berlin (2006)
5. Anderson, T.W.: The integral of a symmetric unimodal function over a symmetric convex set
and some probability inequalities. Proc. Am. Math. Soc. 6(2), 170–176 (1955)
6. Belot, G.: Bayesian orgulity. Philos. Sci. 80(4), 483–503 (2013)
7. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton Series in Applied
Mathematics. Princeton University Press, Princeton (2009)
8. Ben-Tal, A., Hochman, E.: More bounds on the expectation of a convex function of a random
variable. J. Appl. Probab. 9, 803–812 (1972)
9. Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Math. Oper. Res. 23(4), 769–805
(1998)
10. Bentkus, V.: A remark on the inequalities of Bernstein, Prokhorov, Bennett, Hoeffding, and
Talagrand. Liet. Mat. Rink. 42(3), 332–342 (2002)
11. Bentkus, V.: On Hoeffding's inequalities. Ann. Probab. 32(2), 1650–1673 (2004)
12. Bentkus, V., Geuze, G.D.C., van Zuijlen, M.C.A.: Optimal Hoeffding-like inequalities under
a symmetry assumption. Statistics 40(2), 159–164 (2006)
13. Bernstein, S.N.: Collected Works. Izdat. Nauka, Moscow (1964)
14. Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization.
SIAM Rev. 53(3), 464–501 (2011)
15. Bertsimas, D., Popescu, I.: Optimal inequalities in probability theory: a convex optimization
approach. SIAM J. Optim. 15(3), 780–804 (electronic) (2005)
16. Birge, J.R., Wets, R.J.-B.: Designing approximation schemes for stochastic optimization
problems, in particular for stochastic programs with recourse. Math. Prog. Stud. 27, 54–102
(1986)
17. Blackwell, D.: Equivalent comparisons of experiments. Ann. Math. Stat. 24(2), 265–272
(1953)
18. Bogachev, V.I.: Measure Theory, vol. II. Springer, Berlin (2007)
19. Bot, R.I., Lorenz, N., Wanka, G.: Duality for linear chance-constrained optimization prob-
lems. J. Korean Math. Soc. 47(1), 17–28 (2010)
20. Boucheron, S., Lugosi, G., Massart, P.: A sharp concentration inequality with applications.
Random Struct. Algorithms 16(3), 277–292 (2000)
21. Brown, L.D.: Minimaxity, more or less. In: Gupta, S.S., Berger, J.O. (eds.) Statistical Decision
Theory and Related Topics V, pp. 1–18. Springer, New York (1994)
22. Brown, L.D.: An essay on statistical decision theory. J. Am. Stat. Assoc. 95(452), 1277–1281
(2000)
23. Castillo, I., Nickl, R.: Nonparametric Bernstein–von Mises theorems in Gaussian white noise.
Ann. Stat. 41(4), 1999–2028 (2013)
24. Chen, W., Sim, M., Sun, J., Teo, C.-P.: From CVaR to uncertainty set: implications in joint
chance-constrained optimization. Oper. Res. 58(2), 470–485 (2010)
25. Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes: General
Theory and Structure. Probability and its Applications (New York), vol. II, 2nd edn. Springer,
New York (2008)
26. Dantzig, G.B.: Linear programming under uncertainty. Manag. Sci. 1, 197–206 (1955)
27. Diaconis, P., Freedman, D.A.: On the consistency of Bayes estimates. Ann. Stat. 14(1), 1–67
(1986). With a discussion and a rejoinder by the authors
28. Doob, J.L.: Application of the theory of martingales. In: Le Calcul des Probabilités et ses
Applications, Colloques Internationaux du Centre National de la Recherche Scientifique,
vol. 13, pp. 23–27. Centre National de la Recherche Scientifique, Paris (1949)
29. Doob, J.L.: Measure Theory. Graduate Texts in Mathematics, vol. 143. Springer, New York
(1994)
30. Drenick, R.F.: Aseismic design by way of critical excitation. J. Eng. Mech. Div. Am. Soc.
Civ. Eng. 99(4), 649–667 (1973)
31. Dubins, L.E.: On extreme points of convex sets. J. Math. Anal. Appl. 5(2), 237–244 (1962)
32. Dudley, R.M.: Real Analysis and Probability. Cambridge Studies in Advanced Mathematics,
vol. 74. Cambridge University Press, Cambridge (2002). Revised reprint of the 1989 original
33. Dvoretzky, A., Wald, A., Wolfowitz, J.: Elimination of randomization in certain statistical
decision procedures and zero-sum two-person games. Ann. Math. Stat. 22(1), 1–21 (1951)
34. Edmundson, H.P.: Bounds on the expectation of a convex function of a random variable.
Technical report, DTIC Document (1957)
35. Elishakoff, I., Ohsaki, M.: Optimization and Anti-optimization of Structures Under Uncer-
tainty. World Scientific, London (2010)
36. Ermoliev, Y., Gaivoronski, A., Nedeva, C.: Stochastic optimization problems with incomplete
information on distribution functions. SIAM J. Control Optim. 23(5), 697–716 (1985)
37. Fisher, R.: The Design of Experiments. Oliver and Boyd, Edinburgh (1935)
38. Fisher, R.: Statistical methods and scientific induction. J. R. Stat. Soc. Ser. B 17, 69–78
(1955)
39. Fisher, R.A.: On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc.
Lond. Ser. A 222, 309–368 (1922)
40. Fisher, R.A.: Student. Ann. Eugen. 9(1), 1–9 (1939)
41. Frauendorfer, K.: Solving SLP recourse problems with arbitrary multivariate distributions:
the dependent case. Math. Oper. Res. 13(3), 377–394 (1988)
42. Freedman, D.A.: On the asymptotic behavior of Bayes estimates in the discrete case. Ann.
Math. Stat. 34, 1386–1403 (1963)
43. Freedman, D.A.: On the Bernstein–von Mises theorem with infinite-dimensional parameters.
Ann. Stat. 27(4), 1119–1140 (1999)
44. Gaivoronski, A.A.: A numerical method for solving stochastic programming problems with
moment constraints on a distribution function. Ann. Oper. Res. 31(1), 347–369 (1991)
45. Gassmann, H., Ziemba, W.T.: A tight upper bound for the expectation of a convex function
of a multivariate random variable. In: Stochastic Programming 84 Part I. Mathematical
Programming Study, vol. 27, pp. 39–53. Springer, Berlin (1986)
46. Geoffrion, A.M.: Generalized Benders decomposition. JOTA 10(4), 237–260 (1972)
47. Gilboa, I., Schmeidler, D.: Maxmin expected utility with non-unique prior. J. Math. Econ.
18(2), 141–153 (1989)
48. Godwin, H.J.: On generalizations of Tchebychef's inequality. J. Am. Stat. Assoc. 50(271),
923–945 (1955)
49. Goh, J., Sim, M.: Distributionally robust optimization and its tractable approximations. Oper.
Res. 58(4, part 1), 902–917 (2010)
50. Halmos, P.R., Savage, L.J.: Application of the Radon–Nikodym theorem to the theory of
sufficient statistics. Ann. Math. Stat. 20(2), 225–241 (1949)
51. Han, S., Tao, M., Topcu, U., Owhadi, H., Murray, R.M.: Convex optimal uncertainty
quantification. SIAM J. Optim. 25(3), 1368–1387 (2015). arXiv:1311.7130
52. Han, S., Topcu, U., Tao, M., Owhadi, H., Murray, R.: Convex optimal uncertainty quantifica-
tion: algorithms and a case study in energy storage placement for power grids. In: American
Control Conference (ACC), 2013, Washington, DC, pp. 1130–1137. IEEE (2013)
53. Hanasusanto, G.A., Roitch, V., Kuhn, D., Wiesemann, W.: A distributionally robust per-
spective on uncertainty quantification and chance constrained programming. Math. Program.
151(1), 35–62 (2015)
54. Hoeffding, W.: On the distribution of the number of successes in independent trials. Ann.
Math. Stat. 27(3), 713–721 (1956)
55. Hotelling, H.: Abraham Wald. Am. Stat. 5(1), 18–19 (1951)
56. Huang, C.C., Vertinsky, I., Ziemba, W.T.: Sharp bounds on the value of perfect information.
Oper. Res. 25(1), 128–139 (1977)
57. Huang, C.C., Ziemba, W.T., Ben-Tal, A.: Bounds on the expectation of a convex function of
a random variable: with applications to stochastic programming. Oper. Res. 25(2), 315–325
(1977)
58. Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101 (1964)
59. Huber, P.J.: The 1972 Wald lecture. Robust statistics: a review. Ann. Math. Stat. 1041–1067
(1972)
60. Isii, K.: On a method for generalizations of Tchebycheff's inequality. Ann. Inst. Stat. Math.
Tokyo 10(2), 65–88 (1959)
61. Isii, K.: The extrema of probability determined by generalized moments. I. Bounded random
variables. Ann. Inst. Stat. Math. 12(2), 119–134; errata, 280 (1960)
62. Isii, K.: On sharpness of Tchebycheff-type inequalities. Ann. Inst. Stat. Math. 14(1), 185–197
(1962/1963)
63. Jaynes, E.T.: Probability Theory. Cambridge University Press, Cambridge (2003)
64. Joe, H.: Majorization, randomness and dependence for multivariate distributions. Ann.
Probab. 15(3), 1217–1225 (1987)
65. Johnstone, I.M.: High dimensional Bernstein–von Mises: simple examples. In: Borrowing
Strength: Theory Powering Applications. A Festschrift for Lawrence D. Brown. Inst. Math.
Stat. Collect., vol. 6, pp. 87–98. Inst. Math. Statist., Beachwood (2010)
66. Kac, M., Slepian, D.: Large excursions of Gaussian processes. Ann. Math. Stat. 30, 1215–
1228 (1959)
67. Kall, P.: Stochastic programming with recourse: upper bounds and moment problems: a
review. Math. Res. 45, 86–103 (1988)
68. Kallenberg, O.: Random Measures. Akademie-Verlag, Berlin (1975). Schriftenreihe des
Zentralinstituts für Mathematik und Mechanik bei der Akademie der Wissenschaften der
DDR, Heft 23
69. Kamga, P.-H.T., Li, B., McKerns, M., Nguyen, L.H., Ortiz, M., Owhadi, H., Sullivan, T.J.:
Optimal uncertainty quantification with model uncertainty and legacy data. J. Mech. Phys.
Solids 72, 1–19 (2014)
70. Karlin, S., Studden, W.J.: Tchebycheff Systems: With Applications in Analysis and Statistics.
Pure and Applied Mathematics, vol. XV. Interscience Publishers/Wiley, New York/London/
Sydney (1966)
71. Kendall, D.G.: Simplexes and vector lattices. J. Lond. Math. Soc. 37(1), 365–371 (1962)
72. Kidane, A.A., Lashgari, A., Li, B., McKerns, M., Ortiz, M., Owhadi, H., Ravichandran, G.,
Stalzer, M., Sullivan, T.J.: Rigorous model-based uncertainty quantification with application
to terminal ballistics. Part I: Systems with controllable inputs and small scatter. J. Mech. Phys.
Solids 60(5), 983–1001 (2012)
73. Kiefer, J.: Optimum experimental designs. J. R. Stat. Soc. Ser. B 21, 272–319 (1959)
74. Kiefer, J.: Collected Works, vol. III. Springer, New York (1985)
75. Kleijn, B.J.K., van der Vaart, A.W.: The Bernstein–von Mises theorem under misspecification.
Electron. J. Stat. 6, 354–381 (2012)
76. Kolmogorov, A.N.: Foundations of the Theory of Probability. Chelsea Publishing Co., New
York (1956). Translation edited by Nathan Morrison, with an added bibliography by A. T.
Bharucha-Reid
77. Kreĭn, M.G.: The ideas of P. L. Čebyšev and A. A. Markov in the theory of limiting values
of integrals and their further development. In: Dynkin, E.B. (ed.) Eleven Papers on Analysis,
Probability, and Topology, American Mathematical Society Translations, Series 2, vol. 12,
pp. 1–122. American Mathematical Society, New York (1959)
78. Kurz, H.D., Salvadori, N.: Understanding Classical Economics: Studies in Long Period
Theory. Routledge, London/New York (2002)
79. Laird, N.M.: A conversation with F. N. David. Stat. Sci. 4, 235–246 (1989)
80. Le Cam, L.: On some asymptotic properties of maximum likelihood estimates and related
Bayes estimates. Univ. Calif. Publ. Stat. 1, 277–329 (1953)
81. Le Cam, L.: An extension of Wald's theory of statistical decision functions. Ann. Math. Stat.
26, 69–81 (1955)
82. Le Cam, L.: Sufficiency and approximate sufficiency. Ann. Math. Stat. 35, 1419–1455 (1964)
83. Le Cam, L.: Asymptotic Methods in Statistical Decision Theory. Springer, New York (1986)
84. Leahu, H.: On the Bernstein–von Mises phenomenon in the Gaussian white noise model.
Electron. J. Stat. 5, 373–404 (2011)
85. Lehmann, E.L.: Student and small-sample theory. Stat. Sci. 14(4), 418–426 (1999)
86. Lehmann, E.L.: Optimality and symposia: some history. Lect. Notes Monogr. Ser. 44, 1–10
(2004)
87. Lehmann, E.L.: Some history of optimality. Lect. Notes Monogr. Ser. 57, 11–17 (2009)
88. Lenhard, J.: Models and statistical inference: the controversy between Fisher and Neyman–
Pearson. Br. J. Philos. Sci. 57(1), 69–91 (2006)
89. Leonard, R.: Von Neumann, Morgenstern, and the Creation of Game Theory: From Chess to
Social Science, 1900–1960. Cambridge University Press, New York (2010)
90. Lynch, P.: The origins of computer weather prediction and climate modeling. J. Comput. Phys.
227(7), 3431–3444 (2008)
91. Madansky, A.: Bounds on the expectation of a convex function of a multivariate random
variable. Ann. Math. Stat. 743–746 (1959)
92. Madansky, A.: Inequalities for stochastic linear programming problems. Manag. Sci. 6(2),
197–204 (1960)
93. Mangel, M., Samaniego, F.J.: Abraham Wald's work on aircraft survivability. J. Am. Stat.
Assoc. 79(386), 259–267 (1984)
94. Marshall, A.W., Olkin, I.: Multivariate Chebyshev inequalities. Ann. Math. Stat. 31(4), 1001–
1014 (1960)
95. Marshall, A.W., Olkin, I.: Inequalities: Theory of Majorization and Its Applications.
Mathematics in Science and Engineering, vol. 143. Academic [Harcourt Brace Jovanovich
Publishers], New York (1979)
96. McKerns, M.M., Strand, L., Sullivan, T.J., Fang, A., Aivazis, M.A.G.: Building a framework
for predictive science. In: Proceedings of the 10th Python in Science Conference (SciPy 2011)
(2011)
97. Morgenstern, O.: Abraham Wald, 1902–1950. Econometrica: J. Econom. Soc. 361–367
(1951)
98. Mulholland, H.P., Rogers, C.A.: Representation theorems for distribution functions. Proc.
Lond. Math. Soc. (3) 8(2), 177–223 (1958)
99. Nash, J.: Non-cooperative games. Ann. Math. (2) 54, 286–295 (1951)
100. Nash, J.F. Jr.: Equilibrium points in n-person games. Proc. Natl. Acad. Sci. U. S. A. 36, 48–49
(1950)
101. Nemirovsky, A.S.: Information-based complexity of linear operator equations. J. Complex.
8(2), 153–175 (1992)
102. Neyman, J.: Outline of a theory of statistical estimation based on the classical theory of
probability. Philos. Trans. R. Soc. Lond. Ser. A 236(767), 333–380 (1937)
103. Neyman, J.: A Selection of Early Statistical Papers of J. Neyman. University of California
Press, Berkeley (1967)
104. Neyman, J., Pearson, E.S.: On the use and interpretation of certain test criteria for purposes
of statistical inference. Biometrika 20A, 175–240, 263–294 (1928)
105. Neyman, J., Pearson, E.S.: On the problem of the most efficient tests of statistical hypotheses.
Philos. Trans. R. Soc. Lond. Ser. A 231, 289–337 (1933)
106. Olkin, I., Pratt, J.W.: A multivariate Tchebycheff inequality. Ann. Math. Stat. 29(1), 226–234
(1958)
6 Toward Machine Wald 189
107. Owhadi, H.: Multigrid with rough coefficients and multiresolution operator decomposition
from hierarchical information games. SIAM Rev. (Research spotlights) (2016, to appear).
arXiv:1503.03467
108. Owhadi, H., Scovel, C.: Qualitative robustness in Bayesian inference. arXiv:1411.3984
(2014)
109. Owhadi, H., Scovel, C.: Brittleness of Bayesian inference and new Selberg formulas.
Commun. Math. Sci. 14(1), 83145 (2016)
110. Owhadi, H., Scovel, C.: Extreme points of a ball about a measure with finite support.
Commun. Math. Sci. (2015, to appear). arXiv:1504.06745
111. Owhadi, H., Scovel, C.: Separability of reproducing kernel Hilbert spaces. Proc. Am. Math.
Soc. (2015, to appear). arXiv:1506.04288
112. Owhadi, H., Scovel, C., Sullivan, T.J.: Brittleness of Bayesian inference under finite informa-
tion in a continuous world. Electron. J. Stat. 9, 179 (2015)
113. Owhadi, H., Scovel, C., Sullivan, T.J.: On the Brittleness of Bayesian Inference. SIAM Rev.
(Research Spotlights) (2015)
114. Owhadi, H., Scovel, C., Sullivan, T.J., McKerns, M., Ortiz, M.: Optimal Uncertainty
Quantification. SIAM Rev. 55(2), 271345 (2013)
115. Packel, E.W.: The algorithm designer versus nature: a game-theoretic approach to
information-based complexity. J. Complex. 3(3), 244257 (1987)
116. Pearson, E.S.: Student A Statistical Biography of William Sealy Gosset. Clarendon Press,
Oxford (1990)
117. Pfanzagl, J.: Conditional distributions as derivatives. Ann. Probab. 7(6), 10461050 (1979)
118. Pinelis, I.: Exact inequalities for sums of asymmetric random variables, with applications.
Probab. Theory Relat. Fields 139(3-4):605635 (2007)
119. Pinelis, I.: On inequalities for sums of bounded random variables. J. Math. Inequal. 2(1), 17
(2008)
120. Platzman, G.W.: The ENIAC computations of 1950-gateway to numerical weather prediction.
Bull. Am. Meteorol. Soc. 60, 302312 (1979)
121. Ressel, P.: Some continuity and measurability results on spaces of measures. Mathematica
Scandinavica 40, 6978 (1977)
122. Rikun, A.D.: A convex envelope formula for multilinear functions. J. Global Optim. 10(4),
425437 (1997)
123. Rockafellar, R.T.: Augmented Lagrange multiplier functions and duality in nonconvex
programming. SIAM J. Control 12(2), 268285 (1974)
124. Rojo, J.: Optimality: The Second Erich L. Lehmann Symposium. IMS, Beachwood (2006)
125. Rojo, J.: Optimality: The Third Erich L. Lehmann Symposium. IMS, Beachwood (2009)
126. Rojo, J., Prez-Abreu, V.: The First Erich L. Lehmann Symposium: Optimality. IMS,
Beachwood (2004)
127. Rustem, B., Howe, M.: Algorithms for Worst-Case Design and Applications to Risk
Management. Princeton University Press, Princeton (2002)
128. Savage, L.J.: The theory of statistical decision. J. Am. Stat. Assoc. 46, 5567 (1951)
129. Scovel, C., Hush, D., Steinwart, I.: Approximate duality. J. Optim. Theory Appl. 135(3), 429
443 (2007)
130. Shapiro, A., Kleywegt, A.: Minimax analysis of stochastic problems. Optim. Methods Softw.
17(3), 523542 (2002)
131. Sherali, H.D.: Convex envelopes of multilinear functions over a unit hypercube and over
special discrete sets. Acta Math. Vietnam. 22(1), 245270 (1997)
132. Singpurwalla, N.D., Swift, A.: Network reliability and Borels paradox. Am. Stat. 55(3), 213
218 (2001)
133. Smith, J.E.: Generalized Chebychev inequalities: theory and applications in decision analysis.
Oper. Res. 43(5), 807825 (1995)
190 H. Owhadi and C. Scovel
134. Sniedovich, M.: The art and science of modeling decision-making under severe uncertainty.
Decis. Mak. Manuf. Serv. 1(12), 111136 (2007)
135. Sniedovich, M.: A classical decision theoretic perspective on worst-case analysis. Appl. Math.
56(5), 499509 (2011)
136. Sniedovich, M.: Black Swans, new Nostradamuses, Voodoo decision theories, and the science
of decision making in the face of severe uncertainty. Int. Trans. Oper. Res. 19(12), 253281
(2012)
137. Spanos, A.: Why the Decision-Theoretic Perspective Misrepresents Frequentist Inference
(2014). https://secure.hosting.vt.edu/www.econ.vt.edu/directory/spanos/spanos10.pdf
138. Stein, C.: Inadmissibility of the usual estimator for the mean of a multivariate normal
distribution. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics
and Probability, 19541955, vol. I, pp. 197206. University of California Press, Berkeley/Los
Angeles (1956)
139. Strasser, H.: Mathematical Theory of Statistics: Statistical Experiments and Asymptotic
Decision Theory, vol. 7. Walter de Gruyter, Berlin/New York (1985)
140. Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numer. 19, 451559 (2010)
141. Student: The probable error of a mean. Biometrika 125 (1908)
142. Sullivan, T.J., McKerns, M., Meyer, D., Theil, F., Owhadi, H., Ortiz, M.: Optimal uncertainty
quantification for legacy data observations of Lipschitz functions. ESAIM Math. Model.
Numer. Anal. 47(6), 16571689 (2013)
143. Tintner, G.: Abraham Walds contributions to econometrics. Ann. Math. Stat. 23, 2128
(1952)
144. Tjur, T.: Conditional Probability Distributions, Lecture Notes, No. 2. Institute of Mathemati-
cal Statistics, University of Copenhagen, Copenhagen (1974)
145. Tjur, T.: Probability Based on Radon Measures. Wiley Series in Probability and Mathematical
Statistics. Wiley, Chichester (1980)
146. Traub, J.F., Wasilkowski, G.W., Wozniakowski, H.: Information-Based Complexity. Com-
puter Science and Scientific Computing. Academic, Boston (1988). With contributions by A.
G. Werschulz and T. Boult
147. Tukey, J.W.: Statistical and Quantitative Methodology. Trends in Social Science, pp. 84136.
Philisophical Library, New York (1961)
148. Tukey, J.W.: The future of data analysis. Ann. Math. Stat. 33, 167 (1962)
149. Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 11341142 (1984)
150. Vandenberghe, L., Boyd, S., Comanor, K.: Generalized Chebyshev bounds via semidefinite
programming. SIAM Rev. 49(1), 5264 (electronic) (2007)
151. Varadarajan, V.S.: Groups of automorphisms of Borel spaces. Trans. Am. Math. Soc. 109(2),
191220 (1963)
152. von Mises, R.: Mathematical Theory of Probability and Statistics. Edited and Complemented
by Hilda Geiringer. Academic, New York (1964)
153. Von Neumann, J.: Zur Theorie der Gesellschaftsspiele. Math. Ann. 100(1), 295320 (1928)
154. Von Neumann, J., Goldstine, H.H.: Numerical inverting of matrices of high order. Bull. Am.
Math. Soc. 53, 10211099 (1947)
155. Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton
University Press, Princeton (1944)
156. Wald, A.: Contributions to the theory of statistical estimation and testing hypotheses. Ann.
Math. Stat. 10(4), 299326 (1939)
157. Wald, A.: Statistical decision functions which minimize the maximum risk. Ann. Math. (2)
46, 265280 (1945)
158. Wald, A.: An essentially complete class of admissible decision functions. Ann. Math. Stat.
18, 549555 (1947)
159. Wald, A.: Sequential Analysis. 1947.
160. Wald, A.: Statistical decision functions. Ann. Math. Stat. 20, 165205 (1949)
161. Wald, A.: Statistical Decision Functions. Wiley, New York (1950)
6 Toward Machine Wald 191
162. Wald, A., Wolfowitz, J.: Optimum character of the sequential probability ratio test. Ann.
Math. Stat. 19(3), 326339 (1948)
163. Wald, A., Wolfowitz, J.: Characterization of the minimal complete class of decision functions
when the number of distributions and decisions is finite. In: Proceedings of the Second
Berkeley Symposium on Mathematical Statistics and Probability, pp. 149157. University
of California Press, Berkeley (1951)
164. Wasserman, L.: Rise of the Machines. Past, Present and Future of Statistical Science. CRC
Press, Boca Raton (2013)
165. Wasserman, L., Lavine, M., Wolpert, R.L.: Linearization of Bayesian robustness problems. J.
Stat. Plann. Inference 37(3), 307316 (1993)
166. Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res.
62(6), 13581376 (2014)
167. Wilson, M.: How a story from World War II shapes Facebook today. IBM Wat-
son (2012). http://www.fastcodesign.com/1671172/how-a-story-from-world-war-ii-shapes-
facebook-today.
168. Winkler, G.: On the integral representation in convex noncompact sets of tight measures.
Mathematische Zeitschrift 158(1), 7177 (1978)
169. Winkler, G.: Extreme points of moment sets. Math. Oper. Res. 13(4), 581587 (1988)
170. Wolfowitz, J.: Abraham Wald, 19021950. Ann. Math. Stat. 23, 113 (1952)
171. Wozniakowski, H.: Probabilistic setting of information-based complexity. J. Complex. 2(3),
255269 (1986)
172. Wozniakowski, H.: What is information-based complexity? In Essays on the Complexity of
Continuous Problems, pp. 8995. European Mathematical Society, Zrich (2009)
173. Wynn, H.P.: Introduction to Kiefer (1959) Optimum Experimental Designs. In Breakthroughs
in Statistics, pp. 395399. Springer, New York (1992)
174. Xu, L., Yu, B., Liu, W.: The distributionally robust optimization reformulation for stochastic
complementarity problems. Abstr. Appl. Anal. 2014, Art. ID 469587, (2014)
175. ckov, J.: On minimax solutions of stochastic linear programming problems. Casopis Pest.
Mat. 91, 423430 (1966)
176. Zhou, K., Doyle, J.C., Glover, K.: Robust and Optimal Control. Prentice Hall, Upper Saddle
River (1996)
177. Zymler, S., Kuhn, D., Rustem, B.: Distributionally robust joint chance constraints with
second-order moment information. Math. Program. 137(1-2, Ser. A), 167198 (2013)
7 Hierarchical Models for Uncertainty Quantification: An Overview

Christopher K. Wikle
Abstract
Analyses of complex processes should account for the uncertainty in the
data, the processes that generated the data, and the models that are used to
represent the processes and data. Accounting for these uncertainties can be
daunting in traditional statistical analyses. In recent years, hierarchical statistical
models have provided a coherent probabilistic framework that can accommodate
these multiple sources of quantifiable uncertainty. This overview describes
a science-based hierarchical statistical modeling approach and the associated
Bayesian inference. In addition, given that many complex processes involve the
dynamical evolution of spatial processes, an overview of hierarchical dynamical
spatio-temporal models is also presented. The hierarchical and spatio-temporal
modeling frameworks are illustrated with a problem concerned with assimilating
ocean vector wind observations from satellite and weather center analyses.
Keywords
Bayesian · Basis functions · BHM · Integro-difference equations · Latent process · Quadratic nonlinearity · MCMC · Multivariate · Ocean · Reduced-rank representation · Spatio-temporal · Wind
Contents
1 Introduction
2 Hierarchical Modeling in the Presence of Uncertainty
2.1 Basic Hierarchical Structure
2.2 Data Models
2.3 Process Models
2.4 Parameter Models
2.5 Bayesian Formulation
1 Introduction
A scientific modeling approach considers a model for the process of interest, say W (here, for wind). Recognizing that one's understanding of such scientific processes is always limited, this uncertainty is accounted for via a stochastic representation, denoted by the distribution [W | θ_W], where θ_W are parameters. The traditional statistical approach described above does not explicitly account for this uncertainty, nor for the uncertainty about the relationship between D and W. To see this more clearly, one might decompose the joint distribution of the data and the process, given the associated parameters, as
[D, W | θ_D, θ_W] = [D | W, θ_D] [W | θ_W], (7.1)
in which the process W is present. Integrating W out gives the marginal form of this data distribution, [D | θ_D, θ_W] = ∫ [D | W, θ_D] [W | θ_W] dW. Typically, this integration cannot be done analytically, and so one does not know the actual form of this marginal data likelihood, nor could one generally account for the complicated multivariate spatio-temporal dependence in such a parameterization for real-world processes (e.g., nonlinearity, nonstationarity, non-Gaussianity). Even in the rare situation where one can do the integration analytically (e.g., Gaussian data and process models), the marginal dependence structure is typically more complicated than can be captured by traditional spatio-temporal parameterizations; that is, the dependence is some complicated function of the parameters θ_D and θ_W. Perhaps more importantly, in the motivating application considered here, the interest is in W, so one does not want to integrate it out of (7.1). Indeed, one typically wants to predict this process distribution. This
separation between the data model conditional on the process and the process
model is exactly the paradigm in traditional state-space models in engineering
and time-series applications [e.g., 29]. More generally, the trade-off between
considering a statistical model from the marginal perspective, in which the random process (and parameters) is integrated out, and the conditional perspective, in which
complicated dependence structures must be parameterized, is just the well-known
trade-off that occurs in traditional mixed-model analysis in statistics [e.g., 31].
The decomposition given in (7.1) above is powerful in the sense that it separates the uncertainty associated with the process from the uncertainty associated with the observation of the process. However, it does not factor in the uncertainty associated with the parameters themselves. Using basic probability, one can always decompose a joint distribution into a sequence of conditional and marginal distributions; for example, [A, B, C] = [A | B, C] [B | C] [C]. Thus, the hierarchical decomposition can be written as

[D, W, θ_D, θ_W] = [D | W, θ_D] [W | θ_W] [θ_D, θ_W]. (7.3)
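As a concrete illustration of this three-stage factorization, the following sketch samples top-down through a toy hierarchy (parameter, then process given parameter, then data given process) and evaluates the joint density as the product of the three stages. The Gaussian choices for each stage and all numbers are illustrative assumptions, not the chapter's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gau(x, mean, sd):
    # log density of a univariate Gaussian
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mean)**2 / (2 * sd**2)

# Parameter model [theta]: here theta is a log-precision for the process stage.
theta = rng.normal(0.0, 1.0)
proc_sd = np.exp(-0.5 * theta)

# Process model [W | theta]:
W = rng.normal(0.0, proc_sd)

# Data model [D | W], with known measurement standard deviation:
sigma_d = 0.5
D = rng.normal(W, sigma_d)

# Joint log density is the sum of the three stages' log densities,
# i.e., [D, W, theta] = [D | W][W | theta][theta].
log_joint = (log_gau(D, W, sigma_d)
             + log_gau(W, 0.0, proc_sd)
             + log_gau(theta, 0.0, 1.0))
```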
Each of the stages of the hierarchy given in (7.3) can be decomposed into products
of distributions or submodels. For example, say there are three datasets for the near-ocean surface wind process (W), denoted by D^(1), D^(2), and D^(3). These might correspond to the satellite scatterometer data mentioned previously, ocean buoy data, and the weather center analysis data product. These observations need not be coincident nor even of the same spatial or temporal support as the other data nor the process. In this case, the data model might be represented as

[D^(1), D^(2), D^(3) | W, θ_D] = [D^(1) | W, θ_D^(1)] [D^(2) | W, θ_D^(2)] [D^(3) | W, θ_D^(3)], (7.4)

where the datasets are assumed conditionally independent given the true process, and θ_D = {θ_D^(1), θ_D^(2), θ_D^(3)}.
Typically, the process model in the hierarchical decomposition can also be further decomposed into component distributions. For example, in the case of the wind example described here, the wind process is a vector composed of two components, speed and direction or, equivalently, north-south and east-west components, which depend on pressure. That is, one might write

[W^(1), W^(2), W^(3) | θ_W] = [W^(1), W^(2) | W^(3), θ_W^(1,2)] [W^(3) | θ_W^(3)], (7.5)

where W^(1) and W^(2) correspond to the east-west and north-south wind components (typically denoted by u and v, respectively) and W^(3) corresponds to the near-surface atmospheric pressure (typically denoted P). The decomposition in (7.5) is not unique, but it is sensible in this case because there is strong scientific justification for conditioning the wind on the pressure [e.g., 14]. The process parameters are again decomposed into those components associated with each distribution, θ_W = {θ_W^(1,2), θ_W^(3)}. The decomposition in (7.5) simplifies the joint dependence structure. Temporal dependence can be simplified similarly; for example, a first-order Markov assumption gives

[W_0, W_1, …, W_T | θ_W] = ∏_{t=1}^{T} [W_t | W_{t−1}, θ_W] [W_0]. (7.6)
This suggests that one requires a flexible inferential paradigm that allows one to
perform inference and prediction on both process and parameters and even their
joint interaction.
The Bayesian paradigm fits naturally with hierarchical models because the posterior distribution is proportional to the product of the distributions in the hierarchical decomposition. For example, in the schematic representation of [4] given in (7.3), the posterior distribution can be written via Bayes' rule as

[process, parameters | data] ∝ [data | process, parameters] [process | parameters] [parameters], (7.7)

where the normalizing constant is the integral (in the case of continuous distributions) of (7.3) with respect to the process and parameters (i.e., the marginal distribution of the data). In the context of the wind example, the posterior distribution can be written

[W, θ_W, θ_D | D] ∝ [D | W, θ_D] [W | θ_W] [θ_D, θ_W]. (7.8)

When the parameters are assumed known, the posterior for the process simplifies to

[W | D, θ_W, θ_D] ∝ [D | W, θ_D] [W | θ_W]. (7.9)
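To make (7.9) concrete, the following sketch works out the scalar Gaussian case with the parameters held fixed: normalizing likelihood × prior on a dense grid recovers the closed-form Gaussian posterior. The numerical values (σ, τ, and the datum D) are illustrative, not from the chapter.

```python
import numpy as np

# Fixed parameters (illustrative): D | W ~ Gau(W, sigma^2), W ~ Gau(0, tau^2).
sigma, tau, D = 0.5, 2.0, 1.3

# Closed-form Gaussian posterior obtained from likelihood x prior.
post_var = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)
post_mean = post_var * D / sigma**2

# Brute-force check: normalize likelihood x prior on a dense grid.
w = np.linspace(-10.0, 10.0, 20001)
dw = w[1] - w[0]
unnorm = np.exp(-(D - w)**2 / (2 * sigma**2) - w**2 / (2 * tau**2))
dens = unnorm / (unnorm.sum() * dw)
grid_mean = (w * dens).sum() * dw
grid_var = ((w - grid_mean)**2 * dens).sum() * dw
```

The grid-based moments match the closed form, illustrating that the posterior really is just the normalized product of the two conditional distributions in the hierarchy.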
In applications where the component models are not too complex, these parameters can be estimated using classical statistical approaches, and then the parameters are used in a "plug-in" fashion in the model. For example, in state-space modeling, one might estimate the parameters through an EM algorithm and then evaluate the process distributions through a Kalman filter/smoother [e.g., 29]. Such an approach is sometimes called empirical Bayes or, in the context of hierarchical models, empirical hierarchical modeling (EHM) [e.g., 7]. A potential concern with such an approach is that it does not fully account for the uncertainty in the parameter estimates. In some cases, this uncertainty can be accounted for by Taylor approximations or bootstrap resampling methods [e.g., 29]. Typically, in complex models, the BHM framework provides a more sensible approach to uncertainty quantification than EHM approaches.
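The plug-in idea can be sketched in a few lines: estimate a hyperparameter by maximizing the marginal likelihood (here on a grid, for a toy Gaussian–Gaussian model that is an illustrative assumption, not the chapter's) and then use the estimate as if it were known.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0                          # known data-model standard deviation
tau_true = 2.0                       # process standard deviation to recover
W = rng.normal(0.0, tau_true, 500)   # latent process values
D = W + rng.normal(0.0, sigma, 500)  # observations

# Empirical-Bayes step: the marginal likelihood [D | tau] here is
# Gau(0, tau^2 + sigma^2); maximize it over a grid of tau values.
taus = np.linspace(0.1, 5.0, 500)
marg_var = taus**2 + sigma**2
loglik = -0.5 * len(D) * np.log(2 * np.pi * marg_var) - 0.5 * (D**2).sum() / marg_var
tau_hat = taus[np.argmax(loglik)]

# Plug-in posterior mean for each latent W_i; the uncertainty in tau_hat
# itself is ignored, which is exactly the concern noted in the text.
post_mean = D * tau_hat**2 / (tau_hat**2 + sigma**2)
```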
A general data model for a DSTM takes the form

Z_t(·) = H(Y_t(·); θ_d(t)) + ε_t(·), (7.10)

where Z_t(·) corresponds to the data at time t and Y_t(·) is the corresponding latent process of interest, with a linear or nonlinear mapping function, H(·), that relates the data to the latent process. The data model error is given by ε_t(·), and data model parameters are represented by θ_d(t). These parameters may vary spatially and/or temporally in general. As discussed more generally above, an important assumption that is present here, and in many hierarchical representations of DSTMs, is that the data Z_t(·) are independent in time when conditioned on the true process, Y_t(·),
and parameters θ_d(t). Thus, the observations conditioned on the true process and parameters can be represented as

[Z_1(·), …, Z_T(·) | Y_1(·), …, Y_T(·), {θ_d(t)}] = ∏_{t=1}^{T} [Z_t(·) | Y_t(·), θ_d(t)].
The key component of the DSTM is the dynamical process model. As discussed above, one can simplify this by making use of conditional independence through Markov assumptions. For example, a first-order Markov process can be written as

Y_t(·) = M(Y_{t−1}(·); θ_p(t)) + η_t(·), (7.11)

for t = 1, 2, …, so that

[Y_0(·), Y_1(·), …, Y_T(·) | {θ_p(t), t = 0, …, T}] = ∏_{t=1}^{T} [Y_t(·) | Y_{t−1}(·), θ_p(t)] [Y_0(·) | θ_p(0)],

where M(·) is the evolution operator (linear or nonlinear), η_t(·) is the noise (error) process, and θ_p(t) are process model parameters that may vary with time and/or space. Typically, one would also specify a distribution for the initial state, Y_0(·) | θ_p(0).
The hierarchical model then requires distributions to be assigned to the parameters {θ_d(t), θ_p(t); t = 0, …, T}. Specific distributional forms for the parameters (e.g., spatially or temporally varying, dependence on auxiliary covariate information, etc.) depend strongly on the problem of interest. Indeed, as mentioned above,
one of the most critical aspects of complex hierarchical modeling is the specification
of these distributions. This is illustrated below with regard to linear and nonlinear
DSTMs.
In the case where one has a discrete set of spatial locations D_s = {s_1, s_2, …, s_n} of interest (e.g., a lattice or grid), the first-order evolution process model (7.11) can be written as

Y_t(s_i) = ∑_{j=1}^{n} m_ij(θ_m) Y_{t−1}(s_j) + η_t(s_i), (7.12)
or, collecting the process values into the vector Y_t ≡ (Y_t(s_1), …, Y_t(s_n))′,

Y_t = M Y_{t−1} + η_t, (7.14)

where the n × n transition matrix M has elements {m_ij}, and the associated time-varying spatial error process η_t ≡ (η_t(s_1), …, η_t(s_n))′ is typically specified to be zero mean and Gaussian, with spatial variance-covariance matrix C_η. Usually, M and C_η are assumed to depend on parameters θ_m and θ_η, respectively, to mitigate the curse of dimensionality that often occurs in spatio-temporal modeling. As discussed below, the parameterization of these matrices is one way that additional mechanistic information can be incorporated into the HM framework.
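A minimal simulation of the linear DSTM (7.14) can illustrate such a parameterization; the three-parameter local kernel standing in for m_ij(θ_m) and the exponential spatial covariance for η_t are illustrative assumptions for the sketch, not the chapter's specification.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50                                    # spatial grid locations s_1..s_n
s = np.arange(n)

# Transition matrix parameterized by a local kernel (theta_m) rather than
# n^2 free entries: each location is driven by itself and its neighbors.
theta_m = (0.4, 0.25)                     # (self weight, neighbor weight)
M = (theta_m[0] * np.eye(n)
     + theta_m[1] * np.eye(n, k=1)
     + theta_m[1] * np.eye(n, k=-1))

# Spatially correlated innovations: C_eta exponential in distance.
C_eta = np.exp(-np.abs(s[:, None] - s[None, :]) / 5.0)
L = np.linalg.cholesky(C_eta + 1e-8 * np.eye(n))

T = 100
Y = np.zeros((T + 1, n))
for t in range(1, T + 1):                 # Y_t = M Y_{t-1} + eta_t
    Y[t] = M @ Y[t - 1] + L @ rng.standard_normal(n)
```

The kernel weights are chosen so that M is stable (spectral radius below one), giving a stationary spatio-temporal process.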
Many mechanistic processes are best modeled nonlinearly, at least at some spatial and temporal scales of variability. A class of nonlinear statistical DSTMs can be specified to accommodate such processes with quadratic interactions. Such a general quadratic nonlinear (GQN) DSTM [35] can be written as

Y_t(s_i) = ∑_{j=1}^{n} m_ij Y_{t−1}(s_j) + ∑_{k=1}^{n} ∑_{ℓ=1}^{n} b_{i,kℓ} Y_{t−1}(s_k) g(Y_{t−1}(s_ℓ); θ_g) + η_t(s_i), (7.15)

where m_ij are the linear transition coefficients seen previously and the quadratic interaction transition coefficients are denoted by b_{i,kℓ}. A transformation of one of the components of the quadratic interaction is specified through the function g(·), which can depend on parameters θ_g. This function g(·) is responsible for the "general" in GQN, and such transformations are critical for many processes such as density-dependent growth that one may see in an epidemic or invasive species
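A toy simulation of the GQN model (7.15) is sketched below; the sparse choice of b_{i,kℓ} (self-interactions only) and the density-dependent transformation g(y) = 1 − y/K are illustrative assumptions, not the chapter's specification.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 10, 200
K = 10.0                              # theta_g: carrying-capacity parameter

m = 0.3 * np.eye(n)                   # linear coefficients m_ij
b = np.zeros((n, n, n))               # quadratic coefficients b_{i,kl}
for i in range(n):
    b[i, i, i] = 0.1                  # sparse choice: self-interactions only

def g(y):
    # density-dependent transformation -- the "general" in GQN
    return 1.0 - y / K

Y = np.zeros((T + 1, n))
Y[0] = rng.uniform(0.5, 1.5, n)
for t in range(1, T + 1):
    lin = m @ Y[t - 1]
    # sum_{k,l} b_{i,kl} * Y_{t-1}(s_k) * g(Y_{t-1}(s_l))
    quad = np.einsum('ikl,k,l->i', b, Y[t - 1], g(Y[t - 1]))
    Y[t] = lin + quad + 0.05 * rng.standard_normal(n)
```

With these coefficients the quadratic term acts as a logistic-type growth correction, so the simulated process remains bounded.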
That is, processes 1 and 2 are conditionally independent at time t given process 3 at time t and previous values of processes 1 and 2 at time t − 1, and process 3 at time t is conditionally independent of the others given its previous values. Such
a model formulation has the advantage of being able to match up to mechanistic
knowledge about the processes and their interactions. However, if such knowledge
is not available, this conditional formulation is arbitrary (or there is no basis for the
conditional independence assumptions), and such an approach is not recommended.
The third primary approach for modeling multivariate dynamical spatio-temporal processes is to condition the J processes on one or more latent processes, much like what is done in multivariate factor analysis. For a set of K ≤ J common latent dynamical processes, {γ^(k)_{ℓ,t}}, which may or may not be spatially referenced, consider

Y_t^(j)(s_i) = ∑_{ℓ=1}^{n_γ} ∑_{k=1}^{K} h^(jk)_{i,ℓ} γ^(k)_{ℓ,t} + η_t^(j)(s_i), (7.16)

for i = 1, …, n, j = 1, …, J, where h^(jk)_{i,ℓ} are interaction coefficients that account for how the ℓth element of the kth latent process influences the jth process at location s_i.
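A sketch of the latent-process construction (7.16), simplified to scalar (non-spatial) latent factors so that the inner sum over ℓ collapses; the AR(1) factor dynamics and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, J, K, T = 30, 3, 2, 100   # locations, processes, latent factors, times

# Common latent dynamical factors gamma_t^{(k)}, here scalar AR(1) processes.
phi = np.array([0.8, 0.5])
gamma = np.zeros((T + 1, K))
for t in range(1, T + 1):
    gamma[t] = phi * gamma[t - 1] + 0.3 * rng.standard_normal(K)

# Interaction coefficients h_i^{(jk)}: how factor k drives process j at s_i.
H = rng.normal(0.0, 1.0, (J, K, n))

# Y_t^{(j)}(s_i) = sum_k h_i^{(jk)} gamma_t^{(k)} + eta_t^{(j)}(s_i)
Y = np.einsum('jki,tk->tji', H, gamma) + 0.1 * rng.standard_normal((T + 1, J, n))
```

All J processes inherit their temporal dependence from the K shared factors, which is what induces cross-process dependence without modeling it directly.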
Y_t = μ_t + Φ α_t + ν_t + ε_t, (7.18)
The MFS (Mediterranean Forecasting System) produces 10-day forecasts for upper ocean fields every day. This forecast model resolves medium-scale (in time and space) features (e.g., synoptic scale) in the upper ocean, and the most uncertain parts of the forecast fields correspond to so-called mesoscales (i.e., hourly and 10–50 km scales). These are the
primary scales of the upper ocean hydrodynamic instabilities driven by the surface
wind. Thus, modeling uncertainty in the surface wind field can be an important
means of quantifying uncertainty in the MFS ocean forecasts on the scales that are
most important to daily users.
[23] describe the details of a surface vector wind (SVW) BHM for the MFS, and [24] discuss the impacts of the resulting BHM SVW fields in an ensemble forecast methodology built around realizations from the posterior distribution for SVW from the BHM.
The process model in [23] involves the leading-order terms in a Rayleigh friction
equation (RFE) approximation at synoptic scales, with extra spatial variability added to account for turbulent scaling relationships in the wind field. A critical
component of the [23] model is that it is multivariate in terms of modeling the east-
west (u) and north-south (v) wind components and surface pressure (all of which
are spatio-temporal processes) such that the wind components are independently
conditioned on the pressure, which is a reasonable and justifiable assumption to
first order. However, higher-order interactions of wind components are most likely
important even after conditioning on the pressure field, so the model presented here
considers a multivariate low-rank representation of the residual wind components
after accounting for potential pressure gradient effects as suggested by the RFE.
The data, process, and parameter models are described below.
d_t^{Qu} | u_t, σ_Q² ~ ind. Gau(H_t^Q u_t, σ_Q² I),
d_t^{Qv} | v_t, σ_Q² ~ ind. Gau(H_t^Q v_t, σ_Q² I),
d_t^{Eu} | u_t, σ_E² ~ ind. Gau(H_t^E u_t, σ_E² I),
d_t^{Ev} | v_t, σ_E² ~ ind. Gau(H_t^E v_t, σ_E² I),
where d_t^{Qu} and d_t^{Qv} are m_t-dimensional vectors of scatterometer u-wind and v-wind observations, respectively, within a specified time window indexed by t; and d_t^{Eu} and d_t^{Ev} are ECMWF u-wind and v-wind component observations, respectively, within the same window. The spatially vectorized true wind process components are given by the n-dimensional vectors u_t and v_t. The mapping matrices for the scatterometer and ECMWF observations are given by H_t^Q and H_t^E, respectively. In this case, these are just incidence matrices that map the observations to the nearest process grid location [see 7, Chapter 7 for details]. The measurement errors are assumed to have Gaussian distributions that are independent in space and time, conditioned upon the true process values. The measurement-error variances, σ_Q² and σ_E², correspond to scatterometer and ECMWF wind components, respectively. The conditional independence of these data models follows from the more general discussion above concerning the relative ease of incorporating multiple data sources in the BHM framework.
The wind data for February 2, 2005, are shown in Figs. 7.1 and 7.2. These plots show the QuikSCAT scatterometer and ECMWF analysis observations available within a window of ±3 h of t = 00:00, 06:00, 12:00, and 18:00 UTC (Coordinated Universal Time). The ECMWF analysis winds and pressures are specified on a 0.5° × 0.5° spatial grid, and they are available at each time period for all locations. This grid is also used for the process vectors, u_t and v_t. As described above, the QuikSCAT observations are available intermittently in space due to the polar orbit of the satellite, but at much higher spatial resolution (25 km) when they are available. Thus, the mapping matrices for the scatterometer data, H_t^Q, are defined as incidence matrices such that all scatterometer observations within 0.25° of a process grid point, and within ±3 h of time t, are associated with the wind process at that grid point and time.
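The construction of such an incidence matrix can be sketched as follows; the domain bounds mimic the plotted region, but the synthetic observation locations and the nearest-neighbor tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Process grid mimicking the 0.5-degree analysis domain in the figures.
grid_lon = np.arange(-10.0, 40.0, 0.5)             # 100 longitudes
grid_lat = np.arange(30.0, 46.0, 0.5)              # 32 latitudes
glon, glat = np.meshgrid(grid_lon, grid_lat)
grid = np.column_stack([glon.ravel(), glat.ravel()])
n = grid.shape[0]                                  # n = 3200 grid points

# Scattered observation locations (synthetic stand-ins for swath data).
m_t = 200
obs = np.column_stack([rng.uniform(-10.0, 39.5, m_t),
                       rng.uniform(30.0, 45.5, m_t)])

# Nearest grid point for each observation, kept if within roughly half a
# grid-cell diagonal; the exact tolerance is an illustrative choice.
d2 = ((obs[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)
keep = d2[np.arange(m_t), nearest] <= 0.13

H = np.zeros((keep.sum(), n))                      # incidence matrix H_t
H[np.arange(keep.sum()), nearest[keep]] = 1.0      # exactly one 1 per row
```

Multiplying H by a gridded process vector then returns the process values collocated with the observations, which is all the Gaussian data models above require.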
The process model conditions the true wind components on the gradients of the ECMWF surface pressure field and a common latent process,

u_t = D_x p_t β_ux + D_y p_t β_uy + Φ_u α_t, (7.19)
v_t = D_x p_t β_vx + D_y p_t β_vy + Φ_v α_t, (7.20)

where D_x and D_y are matrix operators that give the x-direction- and y-direction-centered differences of the spatial field vector on which they operate, respectively, and p_t is the vectorized gridded ECMWF pressure data (assumed known here). In the context of the process rank reduction decomposition given in (7.18), Φ_u and Φ_v correspond to n × n_α matrices of basis functions for the u- and v-wind components, respectively, with the common random reduced-rank latent process expansion coefficients, α_t. In addition, relative to (7.18), let μ_{u,t} ≡ D_x p_t β_ux + D_y p_t β_uy and μ_{v,t} ≡ D_x p_t β_vx + D_y p_t β_vy, where these terms represent the dependence of the winds on the gradient of pressure, as controlled by the parameters {β_ux, β_uy, β_vx, β_vy}. This particular formulation does not include a separate small-scale spatial variability component (e.g., ν_t in (7.18)) for simplicity; [36] and [23] include such a term
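The centered-difference operators D_x and D_y can be built as banded matrices via Kronecker products; the one-sided boundary treatment and the small grid size here are illustrative assumptions for the sketch.

```python
import numpy as np

ny, nx, h = 16, 20, 0.5          # grid rows (y), columns (x), spacing

def centered_diff(m, h):
    # centered differences in the interior, one-sided at the boundaries
    D = np.zeros((m, m))
    for i in range(1, m - 1):
        D[i, i - 1], D[i, i + 1] = -1.0, 1.0
    D[0, 0], D[0, 1] = -2.0, 2.0
    D[-1, -2], D[-1, -1] = -2.0, 2.0
    return D / (2.0 * h)

# Fields are vectorized row-by-row (rows indexed by y), so x-differences act
# within each row and y-differences act across rows at a fixed column.
Dx = np.kron(np.eye(ny), centered_diff(nx, h))
Dy = np.kron(centered_diff(ny, h), np.eye(nx))

# Sanity check on the linear field p(x, y) = 2x + 3y.
x = np.arange(nx) * h
y = np.arange(ny) * h
p = (2.0 * x[None, :] + 3.0 * y[:, None]).ravel()
gx = Dx @ p                      # should be identically 2
gy = Dy @ p                      # should be identically 3
```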
Fig. 7.1 Wind observations from February 2, 2005. From top to bottom, the panels correspond to the available data at 00:00, 06:00, 12:00, and 18:00 UTC (Coordinated Universal Time). The panels correspond to the ECMWF analysis winds on a 0.5° × 0.5° grid. The length of the wind quiver (arrow) corresponds to speed, where the smallest is 0.06 m/s and the largest is 17.7 m/s.
Fig. 7.2 Wind observations from February 2, 2005. From top to bottom, the panels correspond to the available data at 00:00, 06:00, 12:00, and 18:00 UTC (Coordinated Universal Time). The panels correspond to the high-resolution (25 km), but spatially intermittent, QuikSCAT scatterometer wind retrievals. The length of the wind quiver (arrow) corresponds to speed, where the smallest is 0.2 m/s and the largest is 21.3 m/s.
In addition, EOFs generally are useful for dynamical reduced-rank modeling because the dimension reduction is quite significant (in the case here, n = 4096 and n_α = 32, which accounts for approximately 98% of the variability in the ECMWF wind data). Although they can be quite useful for DSTM rank reduction, given that EOFs are essentially spatial principal component loadings, they are optimal for variance reduction but not typically for dynamical propagation. Regarding the notion of including mechanistic information in BHMs, note that by conditioning the wind components on the common processes p_t and α_t, the process decomposition in (7.19) and (7.20) allows a reasonable mechanistic-based approach for building in the conditional independence between the wind components.
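The EOF computation amounts to an SVD of the anomaly field. The following sketch, on synthetic data rather than the ECMWF winds, retains enough EOFs to explain about 98% of the variance, mirroring the n = 4096 to n_α = 32 reduction described above; the synthetic patterns and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 120, 400                  # time points, spatial locations

# Synthetic space-time field built from four known spatial patterns.
s = np.linspace(0.0, 1.0, n)
patterns = np.column_stack([np.sin((k + 1) * np.pi * s) for k in range(4)])
amps = rng.standard_normal((T, 4)) * np.array([5.0, 3.0, 2.0, 1.0])
field = amps @ patterns.T + 0.1 * rng.standard_normal((T, n))

# EOFs are the spatial principal components of the anomaly field.
anom = field - field.mean(axis=0)
U, sv, Vt = np.linalg.svd(anom, full_matrices=False)
var_frac = np.cumsum(sv**2) / np.sum(sv**2)

# Retain enough EOFs to explain ~98% of the variance.
n_alpha = int(np.searchsorted(var_frac, 0.98) + 1)
Phi = Vt[:n_alpha].T             # n x n_alpha basis matrix
alpha = anom @ Phi               # reduced-rank expansion coefficients
recon = alpha @ Phi.T            # rank-n_alpha reconstruction
```

Because the singular values are ordered, the retained columns of Φ are exactly the leading EOFs, and the reconstruction error equals the discarded variance.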
The dynamical evolution of the common latent process coefficients is specified fairly simply in this illustrative example as a first-order vector autoregression,

α_t = M(θ_m) α_{t−1} + η_t, η_t ~ ind. Gau(0, C_η),

for t = 1, …, T.
4.3 Implementation
The posterior distribution for the random components of the model is given by

$$
\begin{aligned}
\big[\{\boldsymbol{\alpha}_t\}_{t=0}^{T},\, \mathbf{m},\, \mathbf{C} \,\big|\, \{\mathbf{d}_t^{Qu}\}_{t=1}^{T}, \{\mathbf{d}_t^{Qv}\}_{t=1}^{T}, \{\mathbf{d}_t^{Eu}\}_{t=1}^{T}, \{\mathbf{d}_t^{Ev}\}_{t=1}^{T}\big]
\;\propto\;\; & \prod_{t=1}^{T} \big[\mathbf{d}_t^{Qu} \mid \boldsymbol{\alpha}_t\big]\, \big[\mathbf{d}_t^{Qv} \mid \boldsymbol{\alpha}_t\big] \\
\times\;\; & \prod_{t=1}^{T} \big[\mathbf{d}_t^{Eu} \mid \boldsymbol{\alpha}_t\big]\, \big[\mathbf{d}_t^{Ev} \mid \boldsymbol{\alpha}_t\big] \\
\times\;\; & \prod_{t=1}^{T} \big[\boldsymbol{\alpha}_t \mid \boldsymbol{\alpha}_{t-1}, \mathbf{m}, \mathbf{C}\big]\; [\boldsymbol{\alpha}_0]\, [\mathbf{m}]\, [\mathbf{C}],
\end{aligned}
$$

where the superscripts $Qu$, $Qv$ denote the scatterometer data streams and $Eu$, $Ev$ denote the ECMWF analysis data streams for the $u$ and $v$ wind components.
214 C.K. Wikle
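The structure encoded in this factorization — two data streams that are conditionally independent given $\boldsymbol{\alpha}_t$, with $\boldsymbol{\alpha}_t$ evolving as a first-order Markov process — can be exercised by forward simulation. All dimensions, observation mappings, and noise levels below are illustrative assumptions, not the chapter's actual model:

```python
import numpy as np

rng = np.random.default_rng(2)
T, p = 50, 3
M = 0.8 * np.eye(p)                  # propagator m (illustrative)
C = 0.1 * np.eye(p)                  # innovation covariance C (illustrative)
Lc = np.linalg.cholesky(C)

# process stage: alpha_t | alpha_{t-1}, m, C  (first-order Markov)
alpha = np.zeros((T + 1, p))
for t in range(1, T + 1):
    alpha[t] = M @ alpha[t - 1] + Lc @ rng.standard_normal(p)

# data stage: two streams, conditionally independent given alpha_t
H_scat = rng.standard_normal((8, p))     # hypothetical scatterometer mapping
H_anal = rng.standard_normal((4, p))     # hypothetical analysis-wind mapping
d_scat = alpha[1:] @ H_scat.T + 0.3 * rng.standard_normal((T, 8))
d_anal = alpha[1:] @ H_anal.T + 0.5 * rng.standard_normal((T, 4))
print(d_scat.shape, d_anal.shape)
```

An MCMC implementation would simply invert this generative structure, alternating draws of the latent coefficients, the propagator, and the covariances from their full conditionals.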
4.4 Results
The posterior mean wind fields for 12:00 UTC on February 2, 2005, are shown in
Fig. 7.3. In addition, Fig. 7.4 shows a portion of the prediction domain, in which ten
samples widely separated in the MCMC chain are plotted. One can see from this
plot that the uncertainty associated with the spatial prediction of the wind fields is
not homogeneous in the domain. For example, the strong flow off the south coast
of France into the Gulf of Lion (so-called mistral winds) shows more variability
in wind direction than areas over land. Note that this area of increased posterior
variability is over a fairly small spatial region, which is important when one is
using winds to force an ocean model such as with the MFS. That is, the small-scale
variations in the wind forcing can lead to similar-scale uncertainties in the ocean
state variables, which can make a substantial difference in ocean forecasts [see 24,
for an in-depth discussion].
[Figure. Axes: Longitude (deg E), −10 to 40; Latitude (deg N), 30 to 45.]
Fig. 7.3 Posterior mean wind vectors for 12:00 UTC on February 2, 2005, on the prediction grid.
Wind speed is proportional to the length of the vectors, with the direction of wind toward the
arrowhead on the wind quiver. The length of the wind quiver corresponds to speed, where the
smallest is 0.0 m/s and the largest is 18.0 m/s.
[Figure. Axes: Longitude (deg E), 0 to 10; Latitude (deg N), 35 to 45.]
Fig. 7.4 Ten samples of wind vectors taken from the posterior distribution for 12:00 UTC on
February 2, 2005, over the western Mediterranean basins. Wind speed is proportional to the length
of the vectors. The direction of the vector is away from the vertex at the center of its grid cell.
In this case, the arrowhead on the wind quivers is suppressed so that the variability in the wind
corresponds to the width of the downwind portion of the vector.
5 Conclusion

The SVW example illustrates the practical trade-offs in the BHM framework
presented above. There are many observations from two different observation
sources in this example, so the wind component process model is actually fairly
simple relative to a mechanistic model that would be used with little or no data.
In other SVW implementations, more sophisticated process models may have to be
used depending on the data coverage and complexity of the dynamic environment
[e.g., 36]. On the other hand, if one specifies a very complex mechanistic process
model, but has very little data, there may not be enough information in the data
to inform the posterior distributions associated with the parameters and process
[e.g., 9]. That is, when the data are not rich enough to learn about the process
and parameters, then one effectively has a practical lack of identifiability that may
inhibit fitting the BHM. In practice, one tries to strike a balance between these two
competing data/model trade-offs.
Perhaps the greatest challenge with implementing complex BHMs is recognizing
the need to trade the complexity of the model for computational simplicity, or what
[7] call the "computing/model compromise." Despite the ever-advancing state of
statistical computation for HMs, the algorithms can still be difficult to implement,
both in terms of time required to code and the effort required to tune the algorithm.
Software packages to implement BHMs are increasing in number and quality, but
it is still often difficult to implement very complex BHMs with these packages.
Thus, one is often faced with the dilemma of either simplifying the model and
sacrificing some realism or utilizing an approximate estimation/inference approach
(e.g., ABC, INLA, variational Bayes, etc.) and either limiting the sorts of inference
that can be accomplished or accepting some inaccuracy relative to the true posterior
distribution of interest. Thus, when implementing a complex BHM, one must always
consider the difference between what one wants to do and what one can do and
whether it is best for the particular problem at hand to sacrifice model complexity
or computational efficiency. Regardless, the BHM paradigm still remains one of the
most powerful frameworks in which to quantify uncertainty.
This chapter is concerned with science-based hierarchical modeling, in which
one has mechanistic information available to inform the model components (either
data, model, or parameters). In recent years, alternative hierarchical modeling
approaches have been developed from the statistical learning perspective [e.g., see
the review outlined in 3], which typically do not make use of scientific/mechanistic
information, but seek to build multilayer models (e.g., deep learning) through
nonparametric approaches. These approaches can sometimes be useful in situations
where subject-matter knowledge is not readily available yet can overfit in situations
with complex spatial and temporal dependencies. In both the science-based and
statistical learning-based HM approaches, much more work remains to be done on
the theoretical properties of the estimators and predictors under various amounts
of uncertainty in observations, process models and parameter structure, as well
as data volume. Promising approaches are being developed [e.g., 1], but to date,
these approaches have not been able to speak to the multilevel-dependent parameter
structures common in the science-based BHM setting.
References
1. Agapiou, S., Stuart, A., Zhang, Y.X.: Bayesian posterior contraction rates for linear severely ill-posed inverse problems. J. Inverse Ill-Posed Probl. 22, 297–321 (2014). doi:10.1515/jip-2012-0071
2. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neur. Comput. 15(6), 1373–1396 (2003)
3. Bengio, S., Deng, L., Larochelle, H., Lee, H., Salakhutdinov, R.: Guest editors' introduction: special section on learning deep architectures. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1795–1797 (2013). doi:10.1109/TPAMI.2013.118
4. Berliner, L.: Hierarchical Bayesian time-series models. Fund. Theor. Phys. 79, 15–22 (1996)
5. Berliner, L.M., Milliff, R.F., Wikle, C.K.: Bayesian hierarchical modeling of air-sea interaction. J. Geophys. Res. Oceans (1978–2012) 108(C4), 2156–2202 (2003)
6. Brooks, S., Gelman, A., Jones, G., Meng, X.L.: Handbook of Markov Chain Monte Carlo. CRC Press, Boca Raton (2011)
7. Cressie, N., Wikle, C.: Statistics for Spatio-Temporal Data, vol. 465. Wiley, Hoboken (2011)
8. Cressie, N., Calder, C., Clark, J., Hoef, J., Wikle, C.: Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling. Ecol. Appl. 19(3), 553–570 (2009)
9. Fiechter, J., Herbei, R., Leeds, W., Brown, J., Milliff, R., Wikle, C., Moore, A., Powell, T.: A Bayesian parameter estimation method applied to a marine ecosystem model for the coastal Gulf of Alaska. Ecol. Model. 258, 122–133 (2013)
10. Gelfand, A.E., Smith, A.F.: Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85(410), 398–409 (1990)
11. Gladish, D., Wikle, C.: Physically motivated scale interaction parameterization in reduced rank quadratic nonlinear dynamic spatio-temporal models. Environmetrics 25(4), 230–244 (2014)
12. Higdon, D., Gattiker, J., Williams, B., Rightley, M.: Computer model calibration using high-dimensional output. J. Am. Stat. Assoc. 103(482), 570–583 (2008)
13. Hoar, T.J., Milliff, R.F., Nychka, D., Wikle, C.K., Berliner, L.M.: Winds from a Bayesian hierarchical model: computation for atmosphere-ocean research. J. Comput. Graph. Stat. 12(4), 781–807 (2003)
14. Holton, J.: Dynamic Meteorology. Elsevier, Burlington (2004)
15. Hooten, M., Leeds, W., Fiechter, J., Wikle, C.: Assessing first-order emulator inference for physical parameters in nonlinear mechanistic models. J. Agric. Biol. Environ. Stat. 16(4), 475–494 (2011)
16. Hooten, M.B., Wikle, C.K.: A hierarchical Bayesian non-linear spatio-temporal model for the spread of invasive species with application to the Eurasian collared-dove. Environ. Ecol. Stat. 15(1), 59–70 (2008)
17. Hooten, M.B., Wikle, C.K., Dorazio, R.M., Royle, J.A.: Hierarchical spatiotemporal matrix models for characterizing invasions. Biometrics 63(2), 558–567 (2007)
18. Kennedy, M.C., O'Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 63(3), 425–464 (2001)
19. Leeds, W., Wikle, C., Fiechter, J.: Emulator-assisted reduced-rank ecological data assimilation for nonlinear multivariate dynamical spatio-temporal processes. Stat. Methodol. (2013). doi:10.1016/j.stamet.2012.11.004
20. Leeds, W., Wikle, C., Fiechter, J., Brown, J., Milliff, R.: Modeling 3-D spatio-temporal biogeochemical processes with a forest of 1-D statistical emulators. Environmetrics 24, 1–12 (2013). doi:10.1002/env.2187
21. Marin, J.M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22(6), 1167–1180 (2012)
22. van der Merwe, R., Leen, T., Lu, Z., Frolov, S., Baptista, A.: Fast neural network surrogates for very high dimensional physics-based models in computational oceanography. Neur. Netw. 20(4), 462–478 (2007)
23. Milliff, R., Bonazzi, A., Wikle, C., Pinardi, N., Berliner, L.: Ocean ensemble forecasting. Part I: ensemble Mediterranean winds from a Bayesian hierarchical model. Q. J. R. Meteorol. Soc. 137(657), 858–878 (2011)
24. Pinardi, N., Bonazzi, A., Dobricic, S., Milliff, R., Wikle, C., Berliner, L.: Ocean ensemble forecasting. Part II: Mediterranean Forecast System response. Q. J. R. Meteorol. Soc. 137(657), 879–893 (2011)
25. Robert, C., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, New York (2004)
26. Royle, J., Berliner, L., Wikle, C., Milliff, R.: A Hierarchical Spatial Model for Constructing Wind Fields from Scatterometer Data in the Labrador Sea. Lecture Notes in Statistics, pp. 367–382. Springer, New York (1999)
27. Rue, H., Martino, S., Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 71(2), 319–392 (2009)
28. Sacks, J., Welch, W., Mitchell, T., Wynn, H.: Design and analysis of computer experiments. Stat. Sci. 4(4), 409–423 (1989)
29. Shumway, R.H., Stoffer, D.S.: Time Series Analysis and Its Applications: With R Examples. Springer, New York (2010)
30. Šmídl, V., Quinn, A.: The Variational Bayes Method in Signal Processing. Springer, Berlin/New York (2006)
31. Verbeke, G., Molenberghs, G.: Linear Mixed Models for Longitudinal Data. Springer, New York (2009)
32. Wikle, C.: Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology 84(6), 1382–1394 (2003)
33. Wikle, C., Berliner, L.: A Bayesian tutorial for data assimilation. Phys. D: Nonlinear Phenom. 230(1), 1–16 (2007)
34. Wikle, C., Holan, S.: Polynomial nonlinear spatio-temporal integro-difference equation models. J. Time Ser. Anal. 32(4), 339–350 (2011)
35. Wikle, C., Hooten, M.: A general science-based framework for dynamical spatio-temporal models. Test 19(3), 417–451 (2010)
36. Wikle, C., Milliff, R., Nychka, D., Berliner, L.: Spatiotemporal hierarchical Bayesian modeling: tropical ocean surface winds. J. Am. Stat. Assoc. 96(454), 382–397 (2001)
37. Wikle, C.K.: A kernel-based spectral model for non-Gaussian spatio-temporal processes. Stat. Model. 2(4), 299–314 (2002)
38. Wikle, C.K.: Hierarchical models in environmental science. Int. Stat. Rev. 71(2), 181–199 (2003)
39. Wikle, C.K.: Low-Rank Representations for Spatial Processes. Handbook of Spatial Statistics, pp. 107–118. CRC, Boca Raton (2010)
40. Wikle, C.K.: Modern perspectives on statistics for spatio-temporal data. Wiley Interdiscip. Rev. Comput. Stat. 7(1), 86–98 (2015)
41. Wikle, C.K., Cressie, N.: A dimension-reduced approach to space-time Kalman filtering. Biometrika 86(4), 815–829 (1999)
42. Wikle, C.K., Berliner, L.M., Milliff, R.F.: Hierarchical Bayesian approach to boundary value problems with stochastic boundary conditions. Mon. Weather Rev. 131(6), 1051–1062 (2003)
43. Wikle, C.K., Milliff, R.F., Herbei, R., Leeds, W.B.: Modern statistical methods in oceanography: a hierarchical perspective. Stat. Sci. 28(4), 466–486 (2013). doi:10.1214/13-STS436, http://dx.doi.org/10.1214/13-STS436
44. Wu, G., Holan, S.H., Wikle, C.K.: Hierarchical Bayesian spatio-temporal Conway–Maxwell Poisson models with dynamic dispersion. J. Agric. Biol. Environ. Stat. 18(3), 335–356 (2013)
45. Xu, K., Wikle, C.K., Fox, N.I.: A kernel-based spatio-temporal dynamical model for nowcasting weather radar reflectivities. J. Am. Stat. Assoc. 100(472), 1133–1144 (2005)
Random Matrix Models and Nonparametric
Method for Uncertainty Quantification 8
Christian Soize
Abstract
This chapter deals with the fundamental mathematical tools and the associated
computational aspects for constructing the stochastic models of random matrices
that appear in the nonparametric method for uncertainty quantification and in the
random constitutive equations for multiscale stochastic modeling of heterogeneous
materials. The explicit construction of ensembles of random matrices, as well as
numerical tools for constructing general ensembles of random matrices, are
presented and can be used for high stochastic dimension. The developments
presented are illustrated for the nonparametric method for multiscale stochastic
modeling of heterogeneous linear elastic materials and for the nonparametric
stochastic models of uncertainties in computational structural dynamics.
Keywords
Random matrix · Symmetric random matrix · Positive-definite random matrix · Nonparametric uncertainty · Nonparametric method for uncertainty quantification · Random vector · Maximum entropy principle · Non-Gaussian · Generator · Random elastic medium · Uncertainty quantification in linear structural dynamics · Uncertainty quantification in nonlinear structural dynamics · Parametric-nonparametric uncertainties · Identification · Inverse problem · Statistical inverse problem
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
2 Notions on Random Matrices and on the Nonparametric Method for
Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
2.1 What Is a Random Matrix? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
2.2 What Is the Nonparametric Method for Uncertainty Quantification? . . . . . . . . . . 223
C. Soize (✉)
Laboratoire Modélisation et Simulation Multi Echelle (MSME), Université Paris-Est,
Marne-la-Vallée, France
e-mail: christian.soize@univ-paris-est.fr
1 Introduction

It is well known that the parametric method for uncertainty quantification consists
in constructing stochastic models of the uncertain physical parameters of a compu-
tational model that results from the discretization of a boundary value problem. The
parametric method is efficient for taking into account the variabilities of physical
parameters, but it does not have the capability to take into account the model
uncertainties induced by modeling errors that are introduced during the construction
of the computational model. The nonparametric method for uncertainty quantification
is a way of constructing a stochastic model of the model uncertainties induced by
the modeling errors. It is also an approach for constructing stochastic models of
constitutive equations of materials involving non-Gaussian tensor-valued random
fields, such as in the framework of elasticity, thermoelasticity, electromagnetism,
etc. The random matrix theory is a fundamental tool that is particularly efficient for
performing stochastic modeling of the matrices that appear in the nonparametric
method of uncertainties and in the random constitutive equations for multiscale
stochastic modeling of heterogeneous materials.
2.1 What Is a Random Matrix?
A real (or complex) matrix is a rectangular or a square array of real (or complex)
numbers, arranged in rows and columns. The individual items in a matrix are called
its elements or its entries.
A real (or complex) random matrix is a matrix-valued random variable, which
means that its entries are real (or complex) random variables.
3 A Brief History

The random matrix theory (RMT) was introduced and developed in mathematical
statistics by Wishart and others in the 1930s and was intensively studied by
physicists and mathematicians in the context of nuclear physics. These works began
with Wigner [125] in the 1950s and received an important effort in the 1960s by
Dyson, Mehta, Wigner [36, 37, 126], and others. In 1965, Porter [92] published a
volume of important papers in this field, followed, in 1967, by the first edition of
the Mehta book [72] whose second edition [73] published in 1991 gives a synthesis
of the random matrix theory. For applications in physics, an important ensemble of
the random matrix theory is the Gaussian orthogonal ensemble (GOE) for which
the elements are constituted of real symmetric random matrices with statistically
independent entries and which are invariant under orthogonal linear transformations
(this ensemble can be viewed as a generalization of a Gaussian real-valued random
variable to a symmetric real square random matrix).
For an introduction to multivariate statistical analysis, we refer the reader to
[2], for an overview on explicit probability distributions of ensembles of random
matrices and their properties, to [55] and, for analytical mathematical methods
devoted to the random matrix theory, to [74].
RMT has been used in domains other than nuclear physics. In 1984 and 1986,
Bohigas et al. [14, 15] found that the level fluctuations of the quantum Sinai billiard
could be predicted with the GOE of random matrices. In 1989, Weaver [124]
showed that the higher frequencies of an elastodynamic structure consisting of a
small aluminum block had the behavior of the eigenvalues of a matrix belonging to
the GOE. Then, Bohigas, Legrand, Schmidt, and Sornette [16, 65, 66, 99] studied the
high-frequency spectral statistics with the GOE for elastodynamics and vibration
problems in the high-frequency range. Langley [64] showed that, in the high-
frequency range, the system of natural frequencies of linear uncertain dynamic
systems is a non-Poisson point process. These results have been validated for the
high-frequency range in elastodynamics. A synthesis of these aspects related to
quantum chaos and random matrix theory, devoted to linear acoustics and vibration,
can be found in the book edited by Wright and Weaver [127].
The nonparametric method was initially introduced by Soize [106, 107] in 1999–
2000 for uncertainty quantification in computational linear structural dynamics,
in order to take into account the model uncertainties induced by the modeling
errors that could not be addressed by the parametric method. The concept of
the nonparametric method then consisted in modeling the generalized matrices
of the reduced-order model of the computational model by random matrices. It
should be noted that the terminology "nonparametric" is not at all connected to
nonparametric statistics, but was introduced to mark the difference between
the well-known parametric method, consisting in constructing a stochastic model of
the uncertain physical parameters of the computational model, and the newly proposed
nonparametric method, consisting in modeling the generalized matrices of the
reduced-order model, related to the operators of the problem, by random matrices.
Later, the parametric-nonparametric method was introduced [113].
Early in the development of the concept of the nonparametric method, a problem
occurred in the choice of ensembles of random matrices. Indeed, the ensembles
of random matrices coming from the RMT were not adapted to the stochastic modeling
required by the nonparametric method. For instance, the GOE of random matrices
could not be used for the generalized mass matrix, which must be positive definite,
which is not the case for a random matrix belonging to the GOE. Consequently, new
ensembles of random matrices had to be developed [76, 107, 108, 110, 115],
using the maximum entropy (MaxEnt) principle, for implementing the concept
of the nonparametric method for various computational models in mechanics,
for which the matrices must satisfy various algebraic properties. In addition,
parameterizations of the new ensembles of random matrices have been introduced
in the different constructions in order to allow the level of uncertainties to be
quantified simply. These ensembles of random matrices have been constructed
with a parameterization exhibiting a small number of hyperparameters, which allows
the hyperparameters to be identified using experimental data by solving a statistical
inverse problem for random matrices that are, in general, of very high dimension. In
these constructions, for certain types of available information, an explicit solution
of the MaxEnt principle has been obtained, giving an explicit description of the
ensembles of random matrices and of the corresponding generators of realizations.
Nevertheless, for other cases of available information coming from computational
models, there is no explicit solution of the MaxEnt principle, and therefore, a numerical
tool adapted to the high dimension has had to be developed [112].
Finally, during the last 15 years, the nonparametric method has been extensively
used and extended, with experimental validations, to many problems in linear and
nonlinear structural dynamics, in fluid-structure interaction, and in vibroacoustics,
4 Overview
5 Notations
The following algebraic notations are used throughout the developments of this
chapter.

$$\mathbb{M}_n^+(\mathbb{R}) \subset \mathbb{M}_n^{+0}(\mathbb{R}) \subset \mathbb{M}_n^S(\mathbb{R}) \subset \mathbb{M}_n(\mathbb{R}).$$

(i) The determinant of a matrix $[G]$ in $\mathbb{M}_n(\mathbb{R})$ is denoted as $\det[G]$, and its trace is
denoted as $\operatorname{tr}[G] = \sum_{j=1}^n G_{jj}$.
Let $[G]$ and $[H]$ be two matrices in $\mathbb{M}_n^+(\mathbb{R})$. The notation $[G] > [H]$ means that the
matrix $[G] - [H]$ belongs to $\mathbb{M}_n^+(\mathbb{R})$.
The measure of uncertainties using the entropy of information has been introduced
by Shannon [103] in the framework of the development of information theory.
The maximum entropy (MaxEnt) principle (that is to say, the maximization of
the level of uncertainties) has been introduced by Jaynes [58] and allows a prior
probability model of any random variables to be constructed, under the constraints
defined by the available information. This principle appears as a major tool to
construct the prior probability models. All the ensembles of random matrices
presented hereinafter (including the well-known Gaussian Orthogonal Ensemble)
are constructed in the framework of a unified presentation using the MaxEnt. This
means that the probability distributions of the random matrices belonging to these
ensembles are constructed using the MaxEnt.
This section deals with the definition of a probability density function (pdf) of
a random matrix $[\mathbf{G}]$ with values in the Euclidean space $\mathbb{M}_n^S(\mathbb{R})$ (the set of all the
symmetric $(n \times n)$ real matrices, equipped with the inner product
$\langle [G], [H] \rangle = \operatorname{tr}\{[G]^T [H]\}$). In order to correctly define integration on the Euclidean space
$\mathbb{M}_n^S(\mathbb{R})$, it is necessary to define the volume element on this space.
defined by

$$d^S G = 2^{n(n-1)/4} \prod_{1 \le j \le k \le n} dG_{jk}. \qquad (8.1)$$
It should be noted that, in the context of the construction of the unknown pdf $p_{[\mathbf{G}]}$,
it is assumed that the support $\mathcal{S}_n$ is a given (known) set.
which can be rewritten as $\mathcal{E}(p_{[\mathbf{G}]}) = -E\{\log p_{[\mathbf{G}]}([\mathbf{G}])\}$. For any pdf $p_{[\mathbf{G}]}$ defined
on $\mathbb{M}_n^S(\mathbb{R})$ with support $\mathcal{S}_n$, the entropy $\mathcal{E}(p_{[\mathbf{G}]})$ is a real number. The uncertainty
increases when the Shannon entropy increases: the smaller the Shannon entropy,
the smaller the level of uncertainties. If $\mathcal{E}(p_{[\mathbf{G}]})$ goes to $-\infty$, then the level of
uncertainties goes to zero, and the random matrix $[\mathbf{G}]$ goes to a deterministic matrix
in the sense of convergence in probability distribution (in probability law).
As explained before, the use of the MaxEnt principle requires correctly defining
the available information related to the random matrix $[\mathbf{G}]$ for which the pdf $p_{[\mathbf{G}]}$
(which is unknown, with a given support $\mathcal{S}_n$) has to be constructed.

$$\mathbf{h}(p_{[\mathbf{G}]}) = \mathbf{0}, \qquad (8.5)$$

For example, if the mean value $E\{[\mathbf{G}]\} = [\underline{G}]$ is a
given matrix in $\mathcal{S}_n$, and if this mean value $[\underline{G}]$ corresponds to the only available
information, then $h_\lambda(p_{[\mathbf{G}]}) = \int_{\mathcal{S}_n} G_{jk}\, p_{[\mathbf{G}]}([G])\, d^S G - \underline{G}_{jk}$, in which $\lambda = 1, \ldots, \mu$
is associated with the pair of indices $(j, k)$ such that $1 \le j \le k \le n$ and where
$\mu = n(n+1)/2$.
6.3.2 The Admissible Sets for the pdf
The following admissible sets $\mathcal{C}_{\mathrm{free}}$ and $\mathcal{C}_{\mathrm{ad}}$ are introduced for defining the optimiza-
tion problem resulting from the use of the MaxEnt principle in order to construct the
pdf of random matrix $[\mathbf{G}]$. The set $\mathcal{C}_{\mathrm{free}}$ is made up of all the pdfs $p : [G] \mapsto p([G])$,
defined on $\mathbb{M}_n^S(\mathbb{R})$, with support $\mathcal{S}_n \subset \mathbb{M}_n^S(\mathbb{R})$,

$$\mathcal{C}_{\mathrm{free}} = \Big\{ [G] \mapsto p([G]) : \mathbb{M}_n^S(\mathbb{R}) \to \mathbb{R}^+,\ \operatorname{supp} p = \mathcal{S}_n,\ \int_{\mathcal{S}_n} p([G])\, d^S G = 1 \Big\}. \qquad (8.6)$$

The set $\mathcal{C}_{\mathrm{ad}}$ is the subset of $\mathcal{C}_{\mathrm{free}}$ for which all the pdfs $p$ in $\mathcal{C}_{\mathrm{free}}$ satisfy the constraint
defined by

$$\mathcal{C}_{\mathrm{ad}} = \{ p \in \mathcal{C}_{\mathrm{free}},\ \mathbf{h}(p) = \mathbf{0} \}. \qquad (8.7)$$
A fundamental ensemble for symmetric real random matrices is the Gaussian
orthogonal ensemble (GOE), which is an ensemble of random matrices $[\mathbf{G}]$, defined
on a probability space $(\Theta, \mathcal{T}, \mathcal{P})$, with values in $\mathbb{M}_n^S(\mathbb{R})$, defined by a pdf $p_{[\mathbf{G}]}$ on
$\mathbb{M}_n^S(\mathbb{R})$ with respect to the volume element $d^S G$, for which the support $\mathcal{S}_n$ of $p_{[\mathbf{G}]}$ is
$\mathbb{M}_n^S(\mathbb{R})$, and satisfying the additional properties defined hereinafter.

The additional properties of a random matrix $[\mathbf{G}]$ belonging to the GOE are (i) invari-
ance under any real orthogonal transformation, that is to say, for any orthogonal
$(n \times n)$ real matrix $[R]$ such that $[R]^T [R] = [R][R]^T = [I_n]$, the pdf (with respect
to $d^S G$) of $[R]^T [\mathbf{G}] [R]$ coincides with the pdf of $[\mathbf{G}]$, and (ii) the second-order properties

$$E\{G_{jk}\} = 0, \qquad E\{G_{jk}\, G_{j'k'}\} = \delta_{jj'}\,\delta_{kk'}\,(1+\delta_{jk})\,\frac{\delta^2}{n+1}. \qquad (8.9)$$

The pdf is then written as

$$p_{[\mathbf{G}]}([G]) = c_G\, \exp\Big(\!-\frac{n+1}{4\,\delta^2}\,\operatorname{tr}\{[G]^2\}\Big), \qquad G_{kj} = G_{jk},\quad 1 \le j \le k \le n, \qquad (8.10)$$

in which $c_G$ is the constant of normalization such that Eq. (8.2) is verified. It can
then be deduced that $\{G_{jk},\ 1 \le j \le k \le n\}$ are independent Gaussian real random
variables such that Eq. (8.9) is verified. Consequently, for all $1 \le j \le k \le n$,
the pdf (with respect to $dg$ on $\mathbb{R}$) of the Gaussian real random variable $G_{jk}$
is $p_{G_{jk}}(g) = (\sqrt{2\pi}\,\sigma_{jk})^{-1} \exp\{-g^2/(2\sigma_{jk}^2)\}$, in which the variance of the random
variable $G_{jk}$ is $\sigma_{jk}^2 = (1 + \delta_{jk})\,\delta^2/(n+1)$.
Let $[\mathbf{G}^{\mathrm{GOE}}]$ be the random matrix with values in $\mathbb{M}_n^S(\mathbb{R})$ such that $[\mathbf{G}^{\mathrm{GOE}}] =
[I_n] + [\mathbf{G}]$, in which $[\mathbf{G}]$ is a random matrix belonging to the GOE defined before.
Therefore $[\mathbf{G}^{\mathrm{GOE}}]$ is not centered, and its mean value is $E\{[\mathbf{G}^{\mathrm{GOE}}]\} = [I_n]$. The
coefficient of variation of the random matrix $[\mathbf{G}^{\mathrm{GOE}}]$ is defined [109] by

$$\delta^{\mathrm{GOE}} = \left\{ \frac{E\{\| [\mathbf{G}^{\mathrm{GOE}}] - E\{[\mathbf{G}^{\mathrm{GOE}}]\} \|_F^2\}}{\| E\{[\mathbf{G}^{\mathrm{GOE}}]\} \|_F^2} \right\}^{1/2} = \left\{ \frac{1}{n}\, E\{\| [\mathbf{G}^{\mathrm{GOE}}] - [I_n] \|_F^2\} \right\}^{1/2}, \qquad (8.11)$$

and $\delta^{\mathrm{GOE}} = \delta$. The parameter $\delta\sqrt{2/(n+1)}$ can be used to specify a scale.
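A minimal sketch of sampling $[\mathbf{G}^{\mathrm{GOE}}] = [I_n] + [\mathbf{G}]$ from the second-order structure of Eq. (8.9), with a Monte Carlo check that the mean is $[I_n]$ and that the empirical dispersion of Eq. (8.11) matches $\delta$. The symmetrization trick `A + A.T` is an implementation choice, not taken from the chapter:

```python
import numpy as np

def sample_GGOE(n, delta, rng):
    """One draw of [G^GOE] = [I_n] + [G], where [G] has the GOE second-order
    structure of Eq. (8.9): independent centered Gaussian entries with
    variance (1 + delta_jk) * delta**2 / (n + 1)."""
    s2 = delta**2 / (n + 1.0)
    A = rng.standard_normal((n, n)) * np.sqrt(s2 / 2.0)
    G = A + A.T          # off-diagonal variance s2, diagonal variance 2*s2
    return np.eye(n) + G

rng = np.random.default_rng(3)
n, delta = 6, 0.4
S = np.stack([sample_GGOE(n, delta, rng) for _ in range(20000)])
# empirical dispersion per Eq. (8.11); should be close to delta
d_hat = np.sqrt(((S - np.eye(n)) ** 2).sum(axis=(1, 2)).mean() / n)
print(d_hat, np.allclose(S.mean(axis=0), np.eye(n), atol=0.02))
```

As the surrounding text emphasizes, draws from this ensemble are symmetric but not positive definite in general, which is why the ensembles below are needed.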
8 Random Matrix Models and Nonparametric Method for Uncertainty: : : 233
The GOE can then be viewed as a generalization of the Gaussian real random
variable to Gaussian symmetric real random matrices. It can be seen that $[\mathbf{G}^{\mathrm{GOE}}]$
takes its values in $\mathbb{M}_n^S(\mathbb{R})$ but is not positive definite. In addition, for all fixed $n$:
(i) It has been proved by Weaver [124] and others (see [127] and included refer-
ences) that the GOE is well adapted for describing universal fluctuations of the
eigenfrequencies for generic elastodynamical, acoustical, and elastoacoustical
systems, in the high-frequency range corresponding to the asymptotic behavior
of the largest eigenfrequencies.
(ii) On the other hand, the random matrix $[\mathbf{G}^{\mathrm{GOE}}]$ cannot be used for the stochastic
modeling of a symmetric real matrix for which a positiveness property and
an integrability of its inverse are required. Such a situation is similar to the
following well-known one for the scalar case. Let us consider the scalar
equation in $u$: $(\underline{G} + G)\, u = v$, in which $v$ is a given real number, $\underline{G}$ is
a given positive number, and $G$ is a real parameter. This equation has a
unique solution $u = (\underline{G} + G)^{-1} v$. Let us assume that $G$ is uncertain and is
modeled by a centered random variable $\mathbf{G}$. We then obtain the random equation
in $U$: $(\underline{G} + \mathbf{G})\, U = v$. If the random solution $U$ must have finite statistical
fluctuations, that is to say, if $U$ must be a second-order random variable (this is
generally required due to physical considerations), then $\mathbf{G}$ cannot be chosen as
a Gaussian second-order centered real random variable, because with such a
Gaussian stochastic modeling, the solution $U = (\underline{G} + \mathbf{G})^{-1} v$ is not a second-
order random variable: $E\{U^2\} = +\infty$ due to the non-integrability of
the function $g \mapsto (\underline{G} + g)^{-2}$ at the point $g = -\underline{G}$.
analyzed for constructing other ensembles of random matrices used for the nonpara-
metric stochastic modeling of matrices encountered in uncertainty quantification.
MaxEnt with the following available information, which defines mapping h (see
Eq. (8.5)):
respect to the volume element $d^S G$ on the set $\mathbb{M}_n^S(\mathbb{R})$) verifies the normalization
condition.
The positive parameter $\delta$ is such that $0 < \delta < (n+1)^{1/2}(n+5)^{-1/2}$; it allows
the level of statistical fluctuations of the random matrix $[\mathbf{G}_0]$ to be controlled and
is defined by

$$\delta = \left\{ \frac{E\{\| [\mathbf{G}_0] - E\{[\mathbf{G}_0]\} \|_F^2\}}{\| E\{[\mathbf{G}_0]\} \|_F^2} \right\}^{1/2} = \left\{ \frac{1}{n}\, E\{\| [\mathbf{G}_0] - [I_n] \|_F^2\} \right\}^{1/2}. \qquad (8.15)$$
where, for all $z > 0$, $\Gamma(z) = \int_0^{+\infty} t^{z-1} e^{-t}\, dt$. Note that $\{ [\mathbf{G}_0]_{jk},\ 1 \le j \le k \le n \}$
are dependent random variables. If $(n+1)/\delta^2$ is an integer, then this pdf coincides
with the Wishart probability distribution [2, 107]. If $(n+1)/\delta^2$ is not an integer, then
this probability density function can be viewed as a particular case of the Wishart
distribution, in infinite dimension, for stochastic processes [104].
$$p_{\boldsymbol{\Lambda}}(\lambda) = \mathbb{1}_{]0,+\infty[^n}(\lambda)\; c\; \Big\{ \prod_{j=1}^n \lambda_j^{(n+1)\frac{(1-\delta^2)}{2\delta^2}} \Big\} \Big\{ \prod_{j<k} |\lambda_k - \lambda_j| \Big\} \exp\Big\{ -\frac{n+1}{2\delta^2} \sum_{k=1}^n \lambda_k \Big\}, \qquad (8.19)$$

in which $c$ is a constant of normalization defined by the equation $\int_0^{+\infty} \cdots \int_0^{+\infty} p_{\boldsymbol{\Lambda}}(\lambda)\, d\lambda = 1$. All the random eigenvalues $\Lambda_j$ of a random matrix $[\mathbf{G}_0]$ in $\mathrm{SG}^+_0$ are
positive almost surely, while this assertion is not true for the random eigenvalues
$\Lambda_j^{\mathrm{GOE}}$ of the random matrix $[\mathbf{G}^{\mathrm{GOE}}] = [I_n] + [\mathbf{G}]$, in which $[\mathbf{G}]$ is a random matrix
belonging to the GOE ensemble.
It should be noted that the set $\{ \{U_{jk}\}_{1 \le j < k \le n},\ \{V_j\}_{1 \le j \le n} \}$ of random variables
is statistically independent, and the pdf of each diagonal element $[\mathbf{L}]_{jj}$ of the random
matrix $[\mathbf{L}]$ depends on the rank $j$ of the entry.

For $\theta \in \Theta$, any realization $[\mathbf{G}_0(\theta)]$ is then deduced from the algebraic
representation given before, using the realizations $\{U_{jk}(\theta)\}_{1 \le j < k \le n}$ of the $n(n-1)/2$
independent copies of a normalized (zero mean and unit variance) Gaussian real
random variable, and using the realizations $\{V_j(\theta)\}_{1 \le j \le n}$ of the $n$ independent
positive-valued gamma random variables $V_j$ with parameter $a_j$.
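The chapter does not reproduce here the explicit formulas for the scale factor and the gamma shape parameters $a_j$; the sketch below assumes the standard parameterization of this generator, $\sigma = \delta\,(n+1)^{-1/2}$ and $a_j = (n+1)/(2\delta^2) + (1-j)/2$, which should be checked against [107, 110] before use:

```python
import numpy as np

def sample_SG0(n, delta, rng):
    """One draw [G0] = [L]^T [L] of the SG+_0 ensemble (assumed standard
    parameterization: sigma = delta/sqrt(n+1); for j < k, L_jk = sigma * U_jk
    with U_jk a standard Gaussian; L_jj = sigma * sqrt(2 * V_j) with V_j gamma
    distributed with shape a_j = (n+1)/(2*delta**2) + (1-j)/2)."""
    sigma = delta / np.sqrt(n + 1.0)
    L = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    L[iu] = sigma * rng.standard_normal(iu[0].size)   # strictly upper entries
    for j in range(1, n + 1):                         # j is the rank in the text
        a_j = (n + 1.0) / (2.0 * delta**2) + (1.0 - j) / 2.0
        L[j - 1, j - 1] = sigma * np.sqrt(2.0 * rng.gamma(a_j))
    return L.T @ L

rng = np.random.default_rng(0)
n, delta = 4, 0.3
G = np.stack([sample_SG0(n, delta, rng) for _ in range(20000)])
print(np.allclose(G.mean(axis=0), np.eye(n), atol=0.05))  # E{[G0]} = [I_n]
print(bool(np.linalg.eigvalsh(G).min() > 0))              # positive definite
```

Every draw is positive definite by construction (the Cholesky-type factor has strictly positive diagonal), which is exactly the property the GOE lacks.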
The ensemble $\mathrm{SG}^+_\varepsilon$ is a subset of all the positive-definite random matrices for which
the mean value is the unit matrix and for which there is an arbitrary lower bound
that is a positive-definite matrix controlled by an arbitrary positive number $\varepsilon$ that
can be chosen as small as desired. In this ensemble, the lower bound does not
correspond to a given matrix that results from a physical model.
Ensemble $\mathrm{SG}^+_\varepsilon$ is the set of the random matrices $[\mathbf{G}]$ with values in $\mathbb{M}_n^+(\mathbb{R})$,

$$[\mathbf{G}] - [G_\ell] = \frac{1}{1+\varepsilon}\, [\mathbf{G}_0] > 0, \qquad (8.21)$$
is such that

$$[\mathbf{G}] = \frac{1}{1+\varepsilon}\, \big( [\mathbf{G}_0] + \varepsilon\, [I_n] \big). \qquad (8.24)$$

Random matrix $[\mathbf{G}]$ is invertible almost surely, and its inverse $[\mathbf{G}]^{-1}$ is a second-order
random variable: $E\{\| [\mathbf{G}]^{-1} \|_F^2\} < +\infty$.
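Assuming the shift construction implied by Eq. (8.21), a draw of $\mathrm{SG}^+_\varepsilon$ is obtained from a draw of $\mathrm{SG}^+_0$ as follows; the placeholder draw of $[G_0]$ below is illustrative and is not the actual $\mathrm{SG}^+_0$ generator:

```python
import numpy as np

rng = np.random.default_rng(4)
n, eps = 5, 1e-2
# stand-in for a draw [G0] from SG+_0 (any positive-definite matrix will do
# for checking the algebra; this is NOT the SG+_0 generator)
A = rng.standard_normal((n, n)) / np.sqrt(n)
G0 = np.eye(n) + 0.1 * (A + A.T)            # illustrative placeholder draw
G = (G0 + eps * np.eye(n)) / (1.0 + eps)    # shift into SG+_eps
Gl = (eps / (1.0 + eps)) * np.eye(n)        # deterministic lower bound [G_l]
# Eq. (8.21): [G] - [G_l] = [G0]/(1+eps), a positive-definite matrix
print(np.allclose(G - Gl, G0 / (1.0 + eps)))
print(bool(np.linalg.eigvalsh(G - Gl).min() > 0))
```

The lower bound is thus guaranteed algebraically, whatever the realization of $[\mathbf{G}_0]$.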
The ensemble $\mathrm{SG}^+_{\mathrm{b}}$ is a subset of all the positive-definite random matrices for which
the mean value is either the unit matrix or is not given, and for which a lower bound
and an upper bound are given positive-definite matrices. In this ensemble, the lower
bound and the upper bound are not arbitrary positive-definite matrices, but are given
matrices that result from a physical model.
The ensemble SGC b is constituted of random matrices Gb , defined on the
probability space .; T ; P/, with values in the set MC n .R/ Mn .R/, such that
S
in which the lower bound G` and the upper bound Gu are given matrices in
MCn .R/ such that G` < Gu . The support of the pdf pGb (with respect to
the volume element d S G on MSn .R/) of random matrix Gb is the subset Sn of
MCn .R/ Mn .R/ such that
S
Sn D f G 2 MC
n .R/ j G` < G < Gu g : (8.27)
The available information associated with the presence of the lower and upper bounds is defined by

in which ν_ℓ and ν_u are two constants such that |ν_ℓ| < +∞ and |ν_u| < +∞. The mean value [Ḡ_b] = E{[G_b]} is given by

[Ḡ_b] = ∫_{S_n} [G] p_[Gb]([G]) d^S G .   (8.29)

δ_b = ( E{||[G_b] − [Ḡ_b]||_F^2} / ||[Ḡ_b]||_F^2 )^{1/2} .   (8.30)
in which c_Gb is the normalization constant and where α > (n−1)/2 and β > (n−1)/2 are two real parameters that are unknown and that depend on the two unknown constants ν_ℓ and ν_u. The mean value [Ḡ_b] must be calculated using Eqs. (8.29) and (8.31), and the hyperparameter δ_b, which characterizes the level of statistical fluctuations, must be calculated using Eqs. (8.30) and (8.31). Consequently, [Ḡ_b] and δ_b depend on α and β. It can be seen that, for n ≥ 2, the two scalar parameters α and β are not sufficient for identifying the mean value [Ḡ_b], which is in S_n, and the hyperparameter δ_b. An efficient algorithm for generating realizations of [G_b] can be found in [28].
This equation shows that the random matrix [A_0] is with values in M_n^+(R). Introducing the mean value [Ā_0] = E{[A_0]}, which belongs to M_n^+(R), and its Cholesky factorization [Ā_0] = [L̄_0]^T [L̄_0], in which [L̄_0] is an upper triangular real (n×n) matrix, random matrix [A_0] can be written as [A_0] = [L̄_0]^T [G_0] [L̄_0], with [G_0] belonging to ensemble SG_0^+ and depending on the hyperparameter δ defined by Eq. (8.15). The inversion of Eq. (8.32) yields

[G_b] = [G_ℓ] + ( [L̄_0]^T [G_0] [L̄_0] + [G_ℓu]^{-1} )^{-1} .   (8.33)

It can then be seen that, for any arbitrarily small ε_0 > 0 (for instance, ε_0 = 10^{-6}), we have

[L̄_0]^opt = arg min_{[L̄_0] ∈ U_L} F([L̄_0]),   (8.35)

in which the cost function F is deduced from Eq. (8.34) and is written as
The ensemble SG^+ is a subset of all the positive-definite random matrices for which the mean value is the unit matrix, for which the lower bound is the zero matrix, and for which the second-order moments of the diagonal entries are imposed. In the context of the nonparametric stochastic modeling of uncertainties, this ensemble allows for imposing the variances of certain random eigenvalues of stochastic generalized eigenvalue problems.

MaxEnt is used with the following available information, which defines the mapping h (see Eq. (8.5)):
p_[G]([G]) = 1_{S_n}([G]) C_G (det [G])^{α−1} exp{ −tr{[μ]^T [G]} − Σ_{j=1}^m λ_j [G]_jj^2 },   (8.38)

δ = ( E{||[G] − E{[G]}||_F^2} / ||E{[G]}||_F^2 )^{1/2} = ( (1/n) E{||[G] − [I_n]||_F^2} )^{1/2},   (8.39)

δ^2 = (1/n) Σ_{j=1}^m s_j^2 + ( n + 1 − (m/n)(n + 2α − 1) ) / (n + 2α − 1) .   (8.40)
The ensemble SE_0^+ is a subset of all the positive-definite random matrices for which the mean values are given and differ from the unit matrix (unlike ensemble SG_0^+). This ensemble is constructed as a transformation of ensemble SG_0^+ that keeps all the mathematical properties of ensemble SG_0^+, such as positiveness [107].

[Ā] = [L_A]^T [L_A],   (8.42)

in which [L_A] is an upper triangular matrix in M_n(R). Taking into account Eq. (8.41) and the definition of ensemble SG_0^+, any random matrix [A_0] in ensemble SE_0^+ is written as

[A_0] = [L_A]^T [G_0] [L_A],   (8.43)

in which the random matrix [G_0] belongs to ensemble SG_0^+, with mean value E{[G_0]} = [I_n], and for which the level of statistical fluctuations is controlled by the hyperparameter δ defined by Eq. (8.15).
Remark 1. It should be noted that the mean matrix [Ā] could also be written as [Ā] = [Ā]^{1/2} [Ā]^{1/2}, in which [Ā]^{1/2} is the square root of [Ā] in M_n^+(R); the random matrix [A_0] could then be written as [A_0] = [Ā]^{1/2} [G_0] [Ā]^{1/2}.

The inverse [A_0]^{-1} exists almost surely and is a second-order random variable,

C_{jk,j'k'} = (δ^2/(n+1)) ( [Ā]_{j'k} [Ā]_{jk'} + [Ā]_{jj'} [Ā]_{kk'} ),   (8.46)

and the variance σ_{jk}^2 = C_{jk,jk} of random variable [A_0]_{jk} is

σ_{jk}^2 = (δ^2/(n+1)) ( [Ā]_{jk}^2 + [Ā]_{jj} [Ā]_{kk} ) .   (8.47)

The coefficient of variation δ_{A_0} of random matrix [A_0] is defined by

δ_{A_0} = ( E{||[A_0] − [Ā]||_F^2} / ||[Ā]||_F^2 )^{1/2} .   (8.48)

Since E{||[A_0] − [Ā]||_F^2} = Σ_{j=1}^n Σ_{k=1}^n σ_{jk}^2, we have

δ_{A_0} = (δ/√(n+1)) ( 1 + (tr [Ā])^2 / ||[Ā]||_F^2 )^{1/2} .   (8.49)
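Equation (8.49) can be verified by Monte Carlo simulation, using a compact vectorized generator for [G_0] (the construction recalled earlier in this chapter) and a hypothetical mean matrix [Ā]; all numerical values are ours.

```python
import numpy as np

def sample_G0(n, delta, rng):
    # compact SG_0^+ generator (construction recalled earlier in the chapter)
    sigma = delta / np.sqrt(n + 1.0)
    L = np.triu(sigma * rng.standard_normal((n, n)), k=1)
    a = (n + 1.0) / (2.0 * delta**2) + (1.0 - np.arange(1, n + 1)) / 2.0
    np.fill_diagonal(L, sigma * np.sqrt(2.0 * rng.gamma(shape=a)))
    return L.T @ L

rng = np.random.default_rng(2)
n, delta = 4, 0.25
A_bar = np.diag([1.0, 2.0, 3.0, 4.0])            # hypothetical mean matrix
L_bar = np.linalg.cholesky(A_bar).T              # upper factor, A_bar = L^T L
num = np.mean([np.linalg.norm(L_bar.T @ sample_G0(n, delta, rng) @ L_bar
                              - A_bar, "fro")**2 for _ in range(20000)])
delta_A0_mc = np.sqrt(num / np.linalg.norm(A_bar, "fro")**2)

# closed form of Eq. (8.49)
delta_A0 = (delta / np.sqrt(n + 1)
            * np.sqrt(1 + np.trace(A_bar)**2
                      / np.linalg.norm(A_bar, "fro")**2))
assert abs(delta_A0_mc - delta_A0) / delta_A0 < 0.05
```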
The ensemble SE_ε^+ is a set of positive-definite random matrices for which the mean value is a given positive-definite matrix and for which there is an arbitrary lower bound.

[A] = [L_A]^T [G] [L_A],   (8.50)

in which [G] belongs to ensemble SG_ε^+ and with ν_A = ν_{A_0} − n log(1+ε). For all ε > 0, random matrix [A] in ensemble SE_ε^+ is a second-order random variable,
and the bilinear form b_A(X, Y) = ((A X, Y)) on L_n^2 × L_n^2 is such that

Random matrix [A] is invertible almost surely and its inverse [A]^{-1} is a second-order random variable,

and is such that

δ_A = (1/(1+ε)) δ_{A_0},   (8.57)

in which δ_{A_0} is defined by Eq. (8.49).
[Ā] ∈ M_m^{+0}(R),   dim null([Ā]) = m_null < m,   null([A]) = null([Ā]) .   (8.58)

[Ā] = [R_A]^T [R_A] .   (8.59)

[A] = [R_A]^T [G] [R_A],   (8.60)
The ensemble SE^rect is an ensemble of rectangular random matrices for which the mean value is a given rectangular matrix and which is constructed with the MaxEnt. Such an ensemble depends on the available information and, consequently, is not unique. We present hereinafter the construction proposed in [110], which is based on the use of a fundamental algebraic property of rectangular real matrices, which allows ensemble SE_ε^+ to be used.

[Ā] = [Ū] [T̄],   (8.61)

in which the square matrix [T̄] and the rectangular matrix [Ū] are such that

[T̄] ∈ M_n^+(R) and [Ū] ∈ M_{m,n}(R) with [Ū]^T [Ū] = [I_n] .   (8.62)

The construction of the decomposition defined by Eq. (8.61) can be performed, for instance, by using the singular value decomposition of [Ā].

[A] = [Ū] [T],   (8.63)

[T] = [L_T]^T [G] [L_T] .   (8.64)
ω [D(ω)] = [N̂_I(ω)],   [K(ω)] = [K_0] + [N̂_R(ω)] .   (8.65)

in which p.v. denotes the Cauchy principal value. The real matrix [K_0] belongs to M_n^+(R) and can be written as

[K_0] = [K(0)] + (2/π) ∫_0^{+∞} [D(ω)] dω = lim_{|ω|→+∞} [K(ω)] .   (8.67)

For ω ≥ 0, the construction of the stochastic model of the family of random matrices [D(ω)] and [K(ω)] is carried out as follows:
(i) Constructing the family {[D(ω)]} of random matrices such that, for fixed ω, [D(ω)] = [L_D(ω)]^T [G_D] [L_D(ω)], where [L_D(ω)] is the upper triangular real (n×n) matrix resulting from the Cholesky decomposition of the positive-definite symmetric real matrix [D̄(ω)] = [L_D(ω)]^T [L_D(ω)], and where [G_D] is an (n×n) random matrix belonging to ensemble SG_ε^+, for which the hyperparameter δ is rewritten as δ_D. Hyperparameter δ_D allows the level of uncertainties to be controlled for random matrix [D(ω)].

(ii) Constructing the random matrix [K(0)] = [L_K(0)]^T [G_K(0)] [L_K(0)], in which [L_K(0)] is the upper triangular real (n×n) matrix resulting from the Cholesky decomposition of the positive-definite symmetric real matrix [K̄(0)] = [L_K(0)]^T [L_K(0)], and where [G_K(0)] is an (n×n) random matrix belonging to ensemble SG_ε^+, for which the hyperparameter δ is rewritten as δ_K. Hyperparameter δ_K allows the level of uncertainties to be controlled for random matrix [K(0)].

(iii) For fixed ω ≥ 0, constructing the random matrix [K(ω)] using the equation

[K(ω)] = [K(0)] + (ω/π) p.v. ∫_{−∞}^{+∞} (1/(ω − ω′)) [D(ω′)] dω′,   (8.71)
or, equivalently,

[K(ω)] = [K(0)] + (2ω²/π) p.v. ∫_0^{+∞} (1/(ω² − ω′²)) [D(ω′)] dω′ .   (8.72)

The last equation can also be rewritten as the following equation, recommended for computation (because the singularity at u = 1 is independent of ω):

[K(ω)] = [K(0)] + (2ω/π) p.v. ∫_0^{+∞} (1/(1 − u²)) [D(ωu)] du
       = [K(0)] + (2ω/π) lim_{η→0⁺} ( ∫_0^{1−η} + ∫_{1+η}^{+∞} ) (1/(1 − u²)) [D(ωu)] du .   (8.73)

(iv) For fixed ω < 0, [K(ω)] is calculated using the even property [K(−ω)] = [K(ω)]. With such a construction, it can be verified that, for all ω ≥ 0, [K(ω)] is a positive-definite random matrix. The following sufficient condition is proved in [115]: if, for all real vectors y = (y_1, …, y_n), the random function ω ↦ <[D(ω)] y, y> is almost surely decreasing in ω for ω ≥ 0, then, for all ω ≥ 0, [K(ω)] is a positive-definite random matrix.
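The principal-value formula (8.73) can be evaluated numerically by pairing the points u = 1 − s and u = 1 + s around the singularity. The sketch below uses a hypothetical scalar damping law D(ω) = 1/(1+ω²), for which the principal-value integral has the closed form K(ω) = K(0) + ω²/(1+ω²); the grids and truncations are our choices.

```python
import numpy as np

def trap(fx, x):
    # simple trapezoidal rule (kept explicit for portability)
    return float(np.sum((fx[1:] + fx[:-1]) * np.diff(x)) / 2.0)

def K_of_omega(omega, D, K0=0.0):
    # Eq. (8.73): K(w) = K(0) + (2w/pi) p.v. int_0^inf D(w u)/(1-u^2) du.
    # With 1/(1-u^2) = (1/2)[1/(1-u) + 1/(1+u)], the Cauchy singularity
    # at u = 1 is removed by pairing u = 1-s with u = 1+s.
    f = lambda u: D(omega * u)
    s = np.linspace(1e-9, 1.0, 200001)
    pv_core = trap((f(1 - s) - f(1 + s)) / s, s)    # p.v. of 1/(1-u) on (0,2)
    u_t = np.linspace(2.0, 500.0, 200001)
    tail = trap(f(u_t) / (1.0 - u_t), u_t)          # regular part beyond u = 2
    u_a = np.linspace(0.0, 500.0, 200001)
    soft = trap(f(u_a) / (1.0 + u_a), u_a)          # non-singular 1/(1+u) part
    return K0 + (2.0 * omega / np.pi) * 0.5 * (pv_core + tail + soft)

D = lambda w: 1.0 / (1.0 + w**2)                    # hypothetical damping law
for omega in (0.5, 1.0, 3.0):
    assert abs(K_of_omega(omega, D) - omega**2 / (1.0 + omega**2)) < 1e-3
```

Note that the closed form is consistent with Eq. (8.67): K(∞) − K(0) = (2/π) ∫_0^∞ D(ω) dω = 1.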
Let [A] be a random matrix defined on the probability space (Θ, T, P), with values in any subset S_n of M_n^S(R), possibly with S_n = M_n^S(R). For instance, S_n can be M_n^+(R). Let p_[A] be the pdf of [A] with respect to the volume element d^S A on M_n^S(R) (see Eq. (8.1)). The support, denoted as supp p_[A], of pdf p_[A] is S_n. Thus,

E{G([A])} = f,   (8.75)

[A] = [A(y)],   (8.76)

For random vector Y, the available information is deduced from Eqs. (8.75) to (8.77) and is written as

E{g(Y)} = f .   (8.79)
[A] = [L_A]^T ( ε [I_n] + [A_0] ) [L_A],

with ε > 0, where [A_0] belongs to M_n^+(R) and where [L_A] is the upper triangular (n×n) real matrix corresponding to the Cholesky factorization [L_A]^T [L_A] = [Ā] of the mean matrix [Ā] = E{[A]}, which is given in M_n^+(R). Positive-definite matrix [A_0] can be written in two different forms (inducing different properties for random matrix [A]):

For these two representations, the parameterization is constructed by taking for y the N = n(n+1)/2 independent entries {[G]_{jk}, 1 ≤ j ≤ k ≤ n} of a symmetric real matrix [G]. Then, for all y in R^N, [A] = [A(y)] is in S_n, that is to say, is a positive-definite matrix.

The unknown pdf p_Y is constructed using the MaxEnt principle, for which the available information is

E{g(Y)} = f,   (8.81)
where log is the Napierian logarithm. In order to solve the optimization problem defined by Eq. (8.84), a Lagrange multiplier λ_0 ∈ R^+ (associated with the constraint defined by Eq. (8.80)) and a Lagrange multiplier λ ∈ C_λ ⊂ R^μ (associated with the constraint defined by Eq. (8.82)) are introduced, in which the admissible set C_λ is defined by

C_λ = { λ ∈ R^μ : ∫_{R^N} exp(−<λ, g(y)>) dy < +∞ } .   (8.86)

The solution of Eq. (8.84) can be written (see the proof in the next section) as

The introduction of the Lagrange multipliers λ_0 and λ, and the analysis of the existence and uniqueness of the solution of the MaxEnt corresponding to the solution of the optimization problem defined by Eq. (8.84), are presented hereafter [53].
The first step of the proof consists in assuming that there exists a unique solution (denoted as p_Y) to the optimization problem defined by Eq. (8.84). The functionals

p ↦ ∫_{R^N} p(y) dy − 1   and   p ↦ ∫_{R^N} g(y) p(y) dy − f,   (8.88)

L(p; λ_0, λ) = E(p) − (λ_0 − 1) ( ∫_{R^N} p(y) dy − 1 ) − <λ, ∫_{R^N} g(y) p(y) dy − f> .   (8.89)

From Theorem 2, p. 188, of [68], it can be deduced that there exists (λ_0^sol, λ^sol) such that the functional (p, λ_0, λ) ↦ L(p; λ_0, λ) is stationary at p_Y (given by Eq. (8.87)) for λ_0 = λ_0^sol and λ = λ^sol.

The second step deals with the explicit construction of a family F_p of pdfs indexed by (λ_0, λ), which renders p ↦ L(p; λ_0, λ) extremal. It is further proved that this extremum is unique and turns out to be a maximum. For any (λ_0, λ) fixed in R^+ × C_λ, it can first be deduced from the calculus of variations (Theorem 3.11.16, p. 341, in [101]) that the aforementioned extremum, denoted by p_{λ_0,λ}, is written as

∫_{R^N} exp(−λ_0 − <λ, g(y)>) dy = 1 .   (8.91)
∫_{R^N} g(y) exp(−λ_0 − <λ, g(y)>) dy = f .   (8.92)

It is assumed that the optimization problem stated by Eq. (8.84) is well posed, in the sense that the constraints are algebraically independent, that is to say, that there exists a bounded subset S of R^N, with ∫_S dy > 0, such that for any nonzero vector v in R^{1+μ},

∫_S <v, G(y)>^2 dy > 0 .   (8.94)

If λ^sol (for which the constraint defined by Eq. (8.93) is fulfilled) exists, then λ^sol is unique and corresponds to the solution of the following optimization problem:

where H is the strictly convex function defined by Eq. (8.95). The unique solution p_Y of the optimization problem defined by Eq. (8.84) is then given by Eq. (8.87) with (λ_0^sol, λ^sol).
Since exp(−λ_0) = c_0(λ), and taking into account Eq. (8.101), the constraint equation defined by Eq. (8.92) can be rewritten as

∫_{R^N} g(y) c_0(λ) exp(−<λ, g(y)>) dy = f .   (8.102)

The optimization problem defined by Eq. (8.99), which allows for calculating (λ_0^sol, λ^sol), is replaced by the more convenient optimization problem that allows λ^sol to be computed,

Γ(λ) = <λ, f> − log(c_0(λ)) .   (8.104)
Once λ^sol is calculated, c_0^sol is given by c_0^sol = c_0(λ^sol). Let {Y_λ, λ ∈ C_λ} be the family of random variables with values in R^N for which the pdf p_{Y_λ} is defined, for all λ in C_λ, by

∇Γ(λ) = f − E{g(Y_λ)},   (8.106)

[Γ''(λ)] = E{g(Y_λ) g(Y_λ)^T} − E{g(Y_λ)} E{g(Y_λ)}^T .   (8.107)

Matrix [Γ''(λ)] is thus the covariance matrix of the random vector g(Y_λ) and is positive definite (the constraints have been assumed to be algebraically independent). Consequently, the function λ ↦ Γ(λ) is strictly convex and reaches its minimum for λ^sol, which is such that ∇Γ(λ^sol) = 0. The optimization problem defined by Eq. (8.103) can be solved using any minimization algorithm. Since function Γ is strictly convex, the Newton iterative method can be applied to the increasing function λ ↦ ∇Γ(λ) for searching for λ^sol such that ∇Γ(λ^sol) = 0. This iterative method is not unconditionally convergent. Consequently, an under-relaxation parameter α is introduced and the iterative algorithm is written as

λ^{ℓ+1} = λ^ℓ − α [Γ''(λ^ℓ)]^{-1} ∇Γ(λ^ℓ),   (8.108)

err(ℓ) = ||f − E{g(Y_{λ^ℓ})}|| / ||f|| = ||∇Γ(λ^ℓ)|| / ||f||,   (8.109)

such that w(Y_λ) = g(Y_λ) or w(Y_λ) = g(Y_λ) g(Y_λ)^T. These two choices allow for calculating the mathematical expectations in high dimension, E{g(Y_λ)} and E{g(Y_λ) g(Y_λ)^T}, which are required for computing the gradient and the Hessian defined by Eqs. (8.106) and (8.107).
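A one-dimensional illustration of the iteration (8.108)-(8.109): the constraints E{Y} = μ and E{Y²} = μ² + σ² admit the Gaussian N(μ, σ²) as MaxEnt solution, with known multipliers λ^sol = (−μ/σ², 1/(2σ²)). In one dimension the expectations in Eqs. (8.106)-(8.107) can be computed by quadrature rather than by MCMC; the grid, starting point, and under-relaxation value below are our choices.

```python
import numpy as np

mu, sig = 1.0, 0.5
f = np.array([mu, mu**2 + sig**2])
y = np.linspace(-8.0, 8.0, 160001)
h = y[1] - y[0]
wq = np.full(y.size, h); wq[0] = wq[-1] = h / 2.0  # trapezoidal weights
g = np.vstack([y, y**2]).T                         # g(y) = (y, y^2)

def moments(lam):
    p = np.exp(-g @ lam) * wq
    p /= p.sum()                                   # c0(lam) normalization
    Eg = g.T @ p
    Egg = (g * p[:, None]).T @ g
    return Eg, Egg

lam = np.array([-2.0, 1.0])                        # starting point in C_lambda
alpha = 0.5                                        # under-relaxation, Eq. (8.108)
for _ in range(500):
    Eg, Egg = moments(lam)
    grad = f - Eg                                  # gradient, Eq. (8.106)
    hess = Egg - np.outer(Eg, Eg)                  # covariance of g, Eq. (8.107)
    step = np.linalg.solve(hess, grad)
    t = alpha
    while lam[1] - t * step[1] <= 0.0:             # stay inside C_lambda
        t /= 2.0
    lam = lam - t * step
    if np.linalg.norm(grad) / np.linalg.norm(f) < 1e-12:  # err, Eq. (8.109)
        break

assert np.allclose(lam, [-mu / sig**2, 1.0 / (2.0 * sig**2)], atol=1e-6)
```

The step-halving guard is a practical safeguard keeping λ in the admissible set C_λ, in the spirit of the under-relaxation discussed above.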
The estimation of E{w(Y_λ)} requires a generator of realizations of random vector Y_λ, which is constructed using the Markov chain Monte Carlo method (MCMC) [59, 95, 117]. With the MCMC method, the transition kernel of the homogeneous Markov chain can be constructed using the Metropolis-Hastings algorithm [57, 75] (which requires the definition of a good proposal distribution), Gibbs sampling [42] (which requires the knowledge of the conditional distributions), or slice sampling [83] (which can exhibit difficulties related to the general shape of the probability distribution, in particular for multimodal distributions). In general, these algorithms are efficient, but they can become inefficient when there exist attraction regions that do not correspond to the invariant measure under consideration, and they can be tricky to use in high dimension. Such cases cannot easily be detected and are time consuming.

We refer the reader to the references given hereinbefore for the usual MCMC methods, and we present hereinafter a more advanced method, very robust in high dimension, which has been introduced in [112] and used, for instance, in [11, 51]. The method presented resembles the Gibbs approach but corresponds to a more direct construction of a random generator of realizations for random variable Y_λ, whose probability distribution is p_{Y_λ} dy. The difference between the Gibbs algorithm and the proposed algorithm is that the convergence of the proposed method can be studied with all the mathematical results concerning the existence and uniqueness of Itô stochastic differential equations (ISDE). In addition, a parameter is introduced that allows the transient part of the response to be killed, in order to reach more rapidly the stationary solution corresponding to the invariant measure. Thus, the construction of the transition kernel by using the detailed balance equation is replaced by the construction of an ISDE that admits p_{Y_λ} dy (defined by Eq. (8.105)) as a unique invariant measure. The ergodic method or the Monte Carlo method is used for estimating E{w(Y_λ)}.
Under hypotheses (i) to (iii), and using Theorems 4 to 7 on pages 211 to 216 of Ref. [105], in which the Hamiltonian is taken as H(u, v) = ||v||^2/2 + Φ(u), and using [33, 62] for the ergodic property, it can be deduced that the problem defined by Eqs. (8.111), (8.112), and (8.113) admits a unique solution. This solution is a second-order diffusion stochastic process {(U(r), V(r)), r ∈ R^+}, which converges, when r goes to infinity, to a stationary and ergodic diffusion stochastic process {(U_st(r_st), V_st(r_st)), r_st ≥ 0}, associated with the invariant probability measure P_st(du, dv) = ρ_st(u, v) du dv. The probability density function (u, v) ↦ ρ_st(u, v) on R^N × R^N is the unique solution of the steady-state Fokker-Planck equation associated with Eqs. (8.111) and (8.112), and is written (see pp. 120 to 123 in [105]) as

ρ_st(u, v) = c_N exp( −(1/2) ||v||^2 − Φ(u) ),   (8.117)
Random variable Y_λ (for which the pdf p_{Y_λ} is defined by Eq. (8.105)) can then be written, for any fixed positive value of r_st, as

The free parameter f_0 > 0 introduced in Eq. (8.112) allows a dissipation term to be introduced in the nonlinear dynamical system, in order to obtain more rapidly the asymptotic behavior corresponding to the stationary and ergodic solution associated with the invariant measure. Using Eq. (8.119) and the ergodic property of the stationary stochastic process U_st yields

E{w(Y_λ)} = lim_{R→+∞} (1/R) ∫_0^R w(U(r; θ)) dr,   (8.120)

U^{k+1/2} = U^k + (Δr/2) V^k,   (8.121)

V^{k+1} = ((1−b)/(1+b)) V^k + (Δr/(1+b)) L^{k+1/2} + (√f_0/(1+b)) ΔW^{k+1},   (8.122)

U^{k+1} = U^{k+1/2} + (Δr/2) V^{k+1},   (8.123)
with the initial condition defined by Eq. (8.113), where b = f_0 Δr/4 and where L^{k+1/2} is the R^N-valued random variable such that L^{k+1/2} = { −∇_u Φ(u) }_{u = U^{k+1/2}} .

For a given realization θ in Θ, the sequence {U^k(θ); k = 1, …, M} is constructed using Eqs. (8.121), (8.122), and (8.123). The discretization of Eq. (8.120) yields the following estimation of the mathematical expectation:

E{w(Y_λ)} = lim_{M→+∞} ŵ_M,   ŵ_M = (1/(M − M_0 + 1)) Σ_{k=M_0}^{M} w(U^k(θ)),   (8.124)

in which, for f_0 fixed, the integer M_0 > 1 is chosen to remove the transient part of the response induced by the initial condition.

For details concerning the optimal choice of the numerical parameters, such as M_0, M, f_0, Δr, u_0, and v_0, we refer the reader to [11, 51, 54, 112].
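The scheme (8.121)-(8.123) is easy to exercise on a toy potential: with Φ(u) = ||u||²/2, the invariant measure (8.117) makes U_st standard Gaussian, so the ergodic estimator (8.124) must return zero mean and unit second moments. In the MaxEnt setting, Φ(u) = <λ^sol, g(u)> up to an additive constant; all numerical values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, f0, dr = 2, 1.5, 0.05          # dimension, dissipation, step (illustrative)
M0, M = 2000, 300000              # transient cutoff and chain length
b = f0 * dr / 4.0
grad_Phi = lambda u: u            # toy potential Phi(u) = ||u||^2 / 2

U, V = np.zeros(N), np.zeros(N)   # initial conditions u_0, v_0
acc = np.zeros(N); acc2 = np.zeros(N); cnt = 0
for k in range(M):
    Uh = U + 0.5 * dr * V                                   # Eq. (8.121)
    Lh = -grad_Phi(Uh)                                      # L^{k+1/2}
    dW = np.sqrt(dr) * rng.standard_normal(N)               # Wiener increment
    V = ((1.0 - b) * V + dr * Lh + np.sqrt(f0) * dW) / (1.0 + b)  # Eq. (8.122)
    U = Uh + 0.5 * dr * V                                   # Eq. (8.123)
    if k >= M0:                   # discard the transient, as in Eq. (8.124)
        acc += U; acc2 += U * U; cnt += 1

mean, second = acc / cnt, acc2 / cnt
assert np.all(np.abs(mean) < 0.08)            # E{U_st} = 0
assert np.all(np.abs(second - 1.0) < 0.08)    # E{U_st^2} = 1
```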
This section deals with a nonparametric stochastic model for random elasticity matrices in the framework of three-dimensional linear elasticity in continuum mechanics, using the methodologies and some of the results that have been given in the two previous sections: "Fundamental Ensembles for Positive-Definite Symmetric Real Random Matrices" and "MaxEnt as a Numerical Tool for Constructing Ensembles of Random Matrices". The developments given hereinafter correspond to a synthesis of the works detailed in [51, 53, 54].

From a continuum mechanics point of view, the framework is the 3D linear elasticity of a homogeneous random medium (material) at a given scale. Let [C̃] be the random elasticity matrix for which the nonparametric stochastic model has to be derived. Random matrix [C̃] is defined on the probability space (Θ, T, P) and takes its values in M_n^+(R) with n = 6. This matrix corresponds to the so-called Kelvin matrix representation of the fourth-order symmetric elasticity tensor in 3D linear elasticity [71]. The symmetry classes for a linear elastic material, that is to say, the linear elastic symmetries, are [23]: isotropic, cubic, transversely isotropic, trigonal, tetragonal, orthotropic, monoclinic, and anisotropic. From a stochastic modeling point of view, the random elasticity matrix [C̃] satisfies the following properties:

(i) Random matrix [C̃] is assumed to have a mean value that belongs to M_n^+(R), but that is, in mean, close to a given symmetry class induced by a material symmetry, denoted as M_n^sym(R), which is a subset of M_n^+(R),

E{[C̃]} ∈ M_n^+(R) .   (8.125)
in which {[E_j^sym], j = 1, …, N} is the matrix algebraic basis of M_n^sym(R) (Walpole's tensor basis [122]) and where the admissible subset C_m of R^N is such that

C_m = { m ∈ R^N | Σ_{j=1}^N m_j [E_j^sym] ∈ M_n^+(R) } .   (8.128)

It should be noted that the basis matrices [E_j^sym] are symmetric matrices belonging to M_n^S(R), but are not positive definite, that is to say, do not belong to M_n^+(R). The dimension N for the material symmetry classes is 2 for isotropic, 3 for cubic, 5 for transversely isotropic, 6 or 7 for trigonal, 6 or 7 for tetragonal, 9 for orthotropic, 13 for monoclinic, and 21 for anisotropic. The following properties are proved (see [54, 122]):
(i) If [M] and [M'] belong to M_n^sym(R), then

[C̃] = [C_ℓ] + [C],   (8.131)

in which the lower bound is the deterministic matrix [C_ℓ] belonging to M_n^+(R) and where [C] = [C̃] − [C_ℓ] is a random matrix with values in M_n^+(R). The mean value [C̄] = E{[C]} of [C] is written as

[C̄] = E{[C̃]} − [C_ℓ] ∈ M_n^+(R),   (8.132)

in which E{[C̃]} is defined by Eq. (8.125). Such a lower bound can be defined in two ways:

(i) For a given symmetry class with N < 21, if there are no anisotropic statistical fluctuations, then the mean matrix [C̄] belongs to M_n^sym(R) and, consequently, [Ā] is equal to [C̄].
(ii) If the class of symmetry is anisotropic (thus N = 21), then M_n^sym(R) coincides with M_n^+(R) and, again, [Ā] is equal to the mean matrix [C̄], which belongs to M_n^+(R).

(iii) In general, for a given symmetry class with N < 21, and due to the presence of anisotropic statistical fluctuations, the mean matrix [C̄] of random matrix [C] belongs to M_n^+(R) but does not belong to M_n^sym(R). For this case, an invertible deterministic (n×n) real matrix [S] is introduced such that

[C̄] = [S]^T [Ā] [S] .   (8.134)

[C̄] = [L_C]^T [L_C],   [Ā] = [L_A]^T [L_A] .   (8.135)

[S] = [L_A]^{-1} [L_C] .   (8.136)

It should be noted that, for cases (i) and (ii), Eq. (8.136) shows that [S] = [I_n].
In order that the statistical fluctuations of random matrix [C] belong mainly to the symmetry class M_n^sym(R) while exhibiting more or less anisotropic fluctuations around this symmetry class, and in order that the level of statistical fluctuations in the symmetry class be controlled independently of the level of anisotropic statistical fluctuations, the use of the nonparametric method leads us to introduce the following representation:

in which:

δ_A = ( E{||[A] − [Ā]||_F^2} / ||[Ā]||_F^2 )^{1/2} .   (8.139)

Taking into account the statistical independence of [A] and [G_0], and taking the mathematical expectation of the two members of Eq. (8.137), yields Eq. (8.134).
In this section, random matrix [A], which allows for describing the statistical fluctuations in the class of symmetry M_n^sym(R) with N < 21, is constructed using the MaxEnt principle and, in particular, using all the results and notations introduced in Sect. 10.

in which [Ā] is the matrix in S_n defined by Eq. (8.133), and where the second available information is introduced in order that the pdf [A] ↦ p_[A]([A]) decreases toward zero when ||[A]||_F goes to zero. The constant c_A, which has no physical meaning, is re-expressed as a function of the hyperparameter δ_A defined by Eq. (8.139). This available information defines the vector f = (f_1, …, f_μ) in R^μ, with μ = n(n+1)/2 + 1, and defines the mapping [A] ↦ G([A]) = (G_1([A]), …, G_μ([A])) from S_n into R^μ, such that (see Eq. (8.75))

E{G([A])} = f .   (8.141)

[A] = [A(y)],   (8.142)
A D A.y/; (8.142)
in which, due to Eq. (8.129) and due to the invertibility of A 1=2 , N is a unique
sym
matrix belonging to Mn .R/. Using Eq. (8.130), matrix N has the following
representation:
N
X sym
N D expM .N .y//; N .y/ D yj Ej ; y D .y1 ; : : : ; yN / 2 RN ;
j D1
(8.144)
N
X sym
N D expM .N .Y//; N .Y/ D Yj Ej ; (8.146)
j D1
8 Random Matrix Models and Nonparametric Method for Uncertainty: : : 267
EfNg D In : (8.147)
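The map expM and the basis representation (8.144) can be made concrete for the isotropic class (N = 2), whose classical basis in Kelvin 6×6 notation consists of the spherical projector and its deviatoric complement; the labels E1, E2 below are ours.

```python
import numpy as np

def expM(S):
    # matrix exponential of a symmetric matrix via its eigendecomposition
    w, Q = np.linalg.eigh(S)
    return (Q * np.exp(w)) @ Q.T

# isotropic class, N = 2: spherical projector E1 and deviatoric complement E2
m = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
E1 = np.outer(m, m) / 3.0
E2 = np.eye(6) - E1

def N_of_y(y):                # Eq. (8.144): [N] = expM(sum_j y_j [E_j^sym])
    return expM(y[0] * E1 + y[1] * E2)

assert np.linalg.matrix_rank(E1) == 1      # symmetric but not positive definite
for y in ([0.0, 0.0], [-3.0, 2.0], [5.0, -4.0]):
    assert np.all(np.linalg.eigvalsh(N_of_y(y)) > 0)   # [N(y)] > 0 for every y
# orthogonal projectors commute, so expM acts sphere/deviator-wise
assert np.allclose(N_of_y([-1.0, 0.5]),
                   np.exp(-1.0) * E1 + np.exp(0.5) * E2)
```

This illustrates why the exponential parameterization is convenient: the basis matrices themselves are not positive definite, yet [N(y)] is positive definite for every y in R^N.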
The case of damped linear elastic structures for which the damping and the
stiffness matrices of the computational model are independent of the frequency.
The case of linear viscoelastic structures for which the damping and the stiffness
matrices of the computational model depend on the frequency.
13.1 Methodology
(i) A mean computational model for the linear dynamics of the structure,
(ii) A reduced-order model (ROM) of the mean computational model,
(iii) The nonparametric stochastic modeling of both the model-parameter uncer-
tainties and the model uncertainties induced by modeling errors, consisting in
modeling the mass, damping, and stiffness matrices of the ROM by random
matrices,
(iv) A prior probability model of the random matrices based on the use of the
fundamental ensembles of random matrices introduced previously,
(v) An estimation of the hyperparameters of the prior probability model of
uncertainties if some experimental data are available.
The extension to the case of vibrations of free linear structures (presence of rigid
body displacements and of elastic deformations) is straightforward, because it is
sufficient to construct the ROM (which is then devoted only to the prediction of the
The dynamical system is a damped fixed elastic structure for which the vibrations are studied around a static equilibrium configuration, considered as a natural state without prestresses, and which is subjected to an external load. For given nominal values of the parameters of the dynamical system, the finite element model [128] is called the mean computational model, which is written, in the time domain, as

The solution {x(t), t > 0} of the time evolution problem is constructed by solving Eq. (8.148) for t > 0 with the initial conditions x(0) = x_0 and ẋ(0) = v_0. The forced response {x(t), t ∈ R} is such that, for all t fixed in R, x(t) verifies Eq. (8.148), and its Fourier transform x̂(ω) = ∫_R e^{−iωt} x(t) dt is such that, for all ω in R,

(−ω² [M] + iω [D] + [K]) x̂(ω) = f̂(ω),   (8.149)
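Equation (8.149) amounts to one complex linear solve per frequency. A toy 2-DOF sketch (all matrix values hypothetical, with an assumed Rayleigh-type damping):

```python
import numpy as np

M = np.diag([1.0, 2.0])
K = np.array([[300.0, -100.0], [-100.0, 100.0]])
D = 0.01 * K + 0.5 * M                    # Rayleigh-type damping, assumed
fhat = np.array([1.0, 0.0])

omegas = np.linspace(0.1, 30.0, 300)
xhat = np.array([np.linalg.solve(-w**2 * M + 1j * w * D + K, fhat)
                 for w in omegas])        # Eq. (8.149) per frequency
assert xhat.shape == (300, 2)
# as w -> 0 the response approaches the static solution [K]^{-1} fhat
assert np.allclose(xhat[0], np.linalg.solve(K, fhat), atol=1e-2)
```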
The ROM of the mean computational model is constructed for analyzing the response of the structure over a frequency band B (a bounded symmetric interval of pulsations, in rad/s) such that

B = [−ω_max, −ω_min] ∪ [ω_min, ω_max],   0 ≤ ω_min < ω_max < +∞,   (8.150)

and is obtained by using the method of modal superposition (or modal analysis) [8, 87]. The generalized eigenvalue problem associated with the mass and stiffness matrices of the mean computational model is written as

[K] φ = λ [M] φ,   (8.151)

In general, [D̄] is a full matrix. The generalized force f(ω) is a C^n-vector such that f(ω) = [φ]^T f̂(ω), in which f̂ is the Fourier transform of f, which is assumed to be a bounded function on R.

∀ n ≥ n_0,   ∫_B ||ĥ^n(ω) − ĥ(ω)||_F^2 dω ≤ ε_0 ∫_B ||ĥ(ω)||_F^2 dω,   (8.157)

in which [ĥ^n(ω)] = [φ] (−ω² [M̄] + iω [D̄] + [K̄])^{-1} [φ]^T. In practice, for a large computational model, Eq. (8.157) is replaced by a convergence analysis of x^n to x on B for a given subset of generalized forces f.
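The modal-superposition construction above can be sketched end to end on a toy 5-DOF chain (all numerical values hypothetical): solve the generalized eigenvalue problem, keep n < m modes, and compare the reduced response with the full one at a frequency below the truncated modes.

```python
import numpy as np
from scipy.linalg import eigh

m = 5
K = 1000.0 * (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))
M = np.eye(m)                               # unit mass matrix keeps it simple
D = 0.02 * K + 0.1 * M                      # Rayleigh-type damping, assumed

lam, phi = eigh(K, M)                       # generalized eigenproblem, Eq. (8.151)
n = 4                                       # ROM dimension n < m
P = phi[:, :n]                              # modal projection basis
Mr, Dr, Kr = (P.T @ A @ P for A in (M, D, K))

f = np.zeros(m); f[0] = 1.0
w = 10.0                                    # below the truncated modes
x_full = np.linalg.solve(-w**2 * M + 1j * w * D + K, f)
q = np.linalg.solve(-w**2 * Mr + 1j * w * Dr + Kr, P.T @ f)
x_rom = P @ q
assert np.linalg.norm(x_rom - x_full) < 0.1 * np.linalg.norm(x_full)
```

In a real study, the ROM dimension n_0 would be fixed by the convergence criterion (8.157) over the whole band B rather than by a single-frequency comparison.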
For the given frequency band of analysis B, and for n fixed to the value n_0 such that Eq. (8.157) is verified, the nonparametric stochastic model of uncertainties consists in replacing, in Eq. (8.155), the deterministic matrices [M̄], [D̄], and [K̄] by random matrices [M], [D], and [K] defined on the probability space (Θ, T, P), with values in M_n^+(R). The deterministic ROM defined by Eqs. (8.154) and (8.155) is then replaced by the following stochastic ROM:

in which, for all ω in B, X^n(ω) and Q(ω) are C^m- and C^n-valued random vectors defined on the probability space (Θ, T, P).

(iii) The prior probability model of these random matrices must be chosen such that, for all ω in B, the solution Q(ω) of Eq. (8.159) is a second-order C^n-valued random variable, that is to say, such that
The dynamical system is a fixed viscoelastic structure for which the vibrations are studied around a static equilibrium configuration, considered as a natural state without prestresses, and which is subjected to an external load. Consequently, in the frequency domain, the damping and stiffness matrices depend on the frequency ω, instead of being independent of the frequency as in the previously analyzed case. Two aspects must therefore be addressed. The first one is relative to the choice of the basis for constructing the ROM; the second one is the nonparametric stochastic modeling of the frequency-dependent damping and stiffness matrices, which are related by a Hilbert transform. For such a nonparametric stochastic modeling, we then use ensemble SE^HT of pairs of positive-definite matrix-valued random functions related by a Hilbert transform.

For constructing the ROM, the projection basis is chosen as previously, by taking the stiffness matrix at zero frequency. The generalized eigenvalue problem, defined by Eq. (8.151), is then rewritten as [K(0)] φ = λ [M] φ. With such a choice of basis, Eqs. (8.154) to (8.156), which defined the ROM for all ω belonging to the frequency band of analysis B, are replaced by

The matrices [D̄(ω)] and [K̄(ω)] are full matrices belonging to M_n^+(R), which verify (see [89]) all the mathematical properties introduced in the construction of ensemble SE^HT and, in particular, verify Eqs. (8.65) to (8.68). For ε_0 fixed, the value n_0 of the dimension n of the ROM is such that Eq. (8.157) holds (an equation in which the frequency dependence of the damping and stiffness matrices is introduced). In practice, for a large computational model, this criterion is replaced by a convergence analysis of x^n to x on B for a given subset of generalized forces f.

in which, for all ω in B, X^n(ω) and Q(ω) are C^m- and C^n-valued random vectors defined on the probability space (Θ, T, P).
(i) Random matrices [M], [D(ω)], and [K(ω)] take their values in M_n^+(R).
(ii) The mean values of these random matrices are chosen as the corresponding matrices in the ROM of the mean computational model.
(iii) The random matrices [D(ω)] and [K(ω)] are such that
(iv) The prior probability model of these random matrices must be chosen such that, for all ω in B, the solution Q(ω) of Eq. (8.167) is a second-order C^n-valued random variable, that is to say, such that
(v) The algebraic dependence between [D(ω)] and [K(ω)] induced by causality must be preserved, which means that random matrix [K(ω)] is given by Eq. (8.72) as a function of random matrix [K(0)] and the family of random matrices {[D(ω)], ω ≥ 0},

[K(ω)] = [K(0)] + (2ω²/π) p.v. ∫_0^{+∞} (1/(ω² − ω′²)) [D(ω′)] dω′,   ∀ ω ≥ 0 .   (8.171)
δ_npar^opt = arg max_{δ_npar ∈ C_npar} Σ_{ℓ=1}^{ν_exp} log p_W(w_ℓ^exp; δ_npar),   (8.172)

in which w_1^exp, …, w_{ν_exp}^exp are ν_exp independent experimental data corresponding to W.
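A minimal sketch of the maximum-likelihood identification (8.172): for each candidate hyperparameter, the pdf of the random observation W is estimated from forward simulations of the stochastic model, and the experimental log-likelihood is accumulated. The toy observation model W = δ·Z with Z standard Gaussian is assumed purely for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
delta_true = 0.4
w_exp = delta_true * rng.standard_normal(50)    # pseudo-experimental data

def log_likelihood(delta, n_sim=20000):
    w_sim = delta * rng.standard_normal(n_sim)  # forward stochastic-model outputs
    pdf = gaussian_kde(w_sim)                   # nonparametric estimate of p_W
    return np.sum(np.log(np.maximum(pdf(w_exp), 1e-300)))

grid = np.linspace(0.1, 1.0, 10)                # admissible set C_npar (grid)
delta_opt = grid[np.argmax([log_likelihood(d) for d in grid])]
assert abs(delta_opt - delta_true) <= 0.1       # recovers the true level
```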
14 Parametric-Nonparametric Uncertainties in Computational Nonlinear Structural Dynamics

The last two sections have been devoted to the nonparametric stochastic modeling of both the model-parameter uncertainties and the model uncertainties induced by modeling errors, without separating the contributions of these two types of uncertainties. Sometimes, there is an interest in separating the uncertainties on a small number of model parameters, to which the responses exhibit an important sensitivity, from the uncertainties induced by the model uncertainties and by the uncertainties on the other model parameters.

Such an objective requires the use of a parametric-nonparametric stochastic model of uncertainties, also called the generalized probabilistic approach of uncertainties in computational structural dynamics, which has been introduced in [113].

As the nonparametric stochastic model of uncertainties has been presented in the previous sections for linear dynamical systems formulated in the frequency domain, in the present section the parametric-nonparametric stochastic model of uncertainties is presented for computational nonlinear structural dynamics formulated in the time domain.
The dynamical system is a damped fixed structure for which the nonlinear vibrations
are studied in the time domain around a static equilibrium configuration considered
as a natural state without prestresses and subjected to an external load. For given
8 Random Matrix Models and Nonparametric Method for Uncertainty: : : 275
nominal values of the model parameters of the dynamical system, the basic finite
element model is called the mean nonlinear computational model. In addition, it is
assumed that a set of model parameters has been identified as sensitive parameters
that are uncertain. These uncertain model parameters are the components of a vector
$\tilde{\mathbf{y}}$ belonging to an admissible set $\mathcal{C}_{\mathrm{par}}$, which is a subset of $\mathbb{R}^N$. It is assumed that a parameterization is constructed such that $\tilde{\mathbf{y}} = \mathcal{Y}(\mathbf{y})$, in which $\mathbf{y} \mapsto \mathcal{Y}(\mathbf{y})$ is a given and known function from $\mathbb{R}^N$ into $\mathcal{C}_{\mathrm{par}}$. For instance, if the component $\tilde{y}_j$ of $\tilde{\mathbf{y}}$ must belong to $]0,+\infty[$, then $\tilde{y}_j$ could be defined as $\exp(y_j)$ with $y_j \in \mathbb{R}$, which yields $\mathcal{Y}_j(\mathbf{y}) = \exp(y_j)$. Hereinafter, it is then assumed that the uncertain model parameters are represented by the vector $\mathbf{y} = (y_1,\ldots,y_N)$ belonging to $\mathbb{R}^N$.
The nonlinear mean computational model, depending on the uncertain model parameter $\mathbf{y}$, is written as

$[M(\mathbf{y})]\,\ddot{\mathbf{x}}(t) + [D(\mathbf{y})]\,\dot{\mathbf{x}}(t) + [K(\mathbf{y})]\,\mathbf{x}(t) + \mathbf{f}^{\mathrm{NL}}(\mathbf{x}(t),\dot{\mathbf{x}}(t);\mathbf{y}) = \mathbf{f}(t;\mathbf{y}), \qquad (8.173)$

in which $\mathbf{x}(t)$ is the unknown time-response vector of the $m$ degrees of freedom (DOF) (displacements and/or rotations); $\dot{\mathbf{x}}(t)$ and $\ddot{\mathbf{x}}(t)$ are the velocity and acceleration vectors, respectively; $\mathbf{f}(t;\mathbf{y})$ is the known external load vector of the $m$ inputs (forces and/or moments); $[M(\mathbf{y})]$, $[D(\mathbf{y})]$, and $[K(\mathbf{y})]$ are the mass, damping, and stiffness matrices of the linear part of the mean nonlinear computational model, respectively, which belong to $\mathbb{M}^+_m(\mathbb{R})$; and $(\mathbf{x}(t),\dot{\mathbf{x}}(t)) \mapsto \mathbf{f}^{\mathrm{NL}}(\mathbf{x}(t),\dot{\mathbf{x}}(t);\mathbf{y})$ is the nonlinear mapping that models the local nonlinear forces (such as nonlinear elastic barriers).

We are interested in the time evolution problem defined by Eq. (8.173) for $t > 0$, with the initial conditions $\mathbf{x}(0) = \mathbf{x}_0$ and $\dot{\mathbf{x}}(0) = \mathbf{v}_0$.
for which the eigenvalues $0 < \lambda_1(\mathbf{y}) \le \lambda_2(\mathbf{y}) \le \ldots \le \lambda_m(\mathbf{y})$ and the associated elastic structural modes $\{\boldsymbol{\varphi}_1(\mathbf{y}), \boldsymbol{\varphi}_2(\mathbf{y}), \ldots, \boldsymbol{\varphi}_m(\mathbf{y})\}$ are such that

$[K(\mathbf{y})]\,\boldsymbol{\varphi}_\alpha(\mathbf{y}) = \lambda_\alpha(\mathbf{y})\,[M(\mathbf{y})]\,\boldsymbol{\varphi}_\alpha(\mathbf{y}), \qquad \alpha = 1,\ldots,m,$

in which the generalized mass, damping, and stiffness matrices $[\underline{M}(\mathbf{y})]$, $[\underline{D}(\mathbf{y})]$, and $[\underline{K}(\mathbf{y})]$ belong to $\mathbb{M}^+_n(\mathbb{R})$ and are such that

$[\underline{M}(\mathbf{y})]_{\alpha\beta} = \boldsymbol{\varphi}_\alpha(\mathbf{y})^T [M(\mathbf{y})]\,\boldsymbol{\varphi}_\beta(\mathbf{y}), \quad [\underline{D}(\mathbf{y})]_{\alpha\beta} = \boldsymbol{\varphi}_\alpha(\mathbf{y})^T [D(\mathbf{y})]\,\boldsymbol{\varphi}_\beta(\mathbf{y}), \quad [\underline{K}(\mathbf{y})]_{\alpha\beta} = \boldsymbol{\varphi}_\alpha(\mathbf{y})^T [K(\mathbf{y})]\,\boldsymbol{\varphi}_\beta(\mathbf{y}),$

for $\alpha, \beta = 1,\ldots,n$.

Convergence of the ROM with respect to $n$. Let $n_0$ be the value of $n$ for which, for a given accuracy and for all $\mathbf{y}$ in $\mathbb{R}^N$, the response $\mathbf{x}^n$ has converged to $\mathbf{x}$ for all $n > n_0$.
Throughout this section, the value of $n$ is fixed at the value $n_0$ defined above.
14.3.1 Methodology
The parametric stochastic modeling of uncertainties consists in modeling the uncertain model parameter $\mathbf{y}$ by a second-order random variable $\mathbf{Y} = (Y_1,\ldots,Y_N)$, defined on the probability space $(\Theta, \mathcal{T}, \mathcal{P})$, with values in $\mathbb{R}^N$. Consequently, the parametric-nonparametric stochastic modeling of uncertainties consists:

(i) in modeling $[\underline{M}(\mathbf{y})]$, $[\underline{D}(\mathbf{y})]$, and $[\underline{K}(\mathbf{y})]$ by random matrices $[\mathbf{M}(\mathbf{y})]$, $[\mathbf{D}(\mathbf{y})]$, and $[\mathbf{K}(\mathbf{y})]$;

(ii) in modeling the uncertain model parameter $\mathbf{y}$ by the $\mathbb{R}^N$-valued random variable $\mathbf{Y}$.
in which, for all $\mathbf{y}$ in $\mathbb{R}^N$, $[L_M(\mathbf{y})]$, $[L_D(\mathbf{y})]$, and $[L_K(\mathbf{y})]$ are the upper triangular matrices such that (Cholesky factorization) $[\underline{M}(\mathbf{y})] = [L_M(\mathbf{y})]^T [L_M(\mathbf{y})]$, $[\underline{D}(\mathbf{y})] = [L_D(\mathbf{y})]^T [L_D(\mathbf{y})]$, and $[\underline{K}(\mathbf{y})] = [L_K(\mathbf{y})]^T [L_K(\mathbf{y})]$. In Eqs. (8.184) to (8.186), $[\mathbf{G}_M]$, $[\mathbf{G}_D]$, and $[\mathbf{G}_K]$ are independent random matrices defined on the probability space $(\Theta', \mathcal{T}', \mathcal{P}')$, with values in $\mathbb{M}^+_n(\mathbb{R})$, independent of $\mathbf{y}$, and belonging to the fundamental ensemble $\mathrm{SG}^+_\varepsilon$ of random matrices. The level of nonparametric uncertainties is controlled by the coefficients of variation $\delta_M$, $\delta_D$, and $\delta_K$ defined by Eq. (8.24), and the vector-valued parameter $\boldsymbol{\delta}_{\mathrm{npar}}$ is defined as $\boldsymbol{\delta}_{\mathrm{npar}} = (\delta_M, \delta_D, \delta_K)$, which belongs to an admissible set $\mathcal{C}_{\mathrm{npar}}$. The generator of realizations $\{[\mathbf{G}_M(\theta')], [\mathbf{G}_D(\theta')], [\mathbf{G}_K(\theta')]\}$ for $\theta'$ in $\Theta'$ is explicitly described in the section devoted to the construction of the ensembles $\mathrm{SG}^+_\varepsilon$ and $\mathrm{SG}^+_0$.
The value of $n$ is fixed at the value $n_0$ that has been defined above. The parametric-nonparametric stochastic model of uncertainties is controlled by the hyperparameters defined by Eq. (8.188).
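A minimal sketch of the nonparametric randomization can be written as follows. The generator below uses the standard Cholesky-type construction of a positive-definite random matrix $[\mathbf{G}]$ with mean equal to the identity and dispersion parameter $\delta$ (the chapter's exact generator for the regularized ensemble $\mathrm{SG}^+_\varepsilon$, described in the dedicated section, is not reproduced here); function names and the test matrix are assumptions made for the illustration.

```python
import numpy as np

def sample_G(n, delta, rng):
    """One realization of [G] with E[G] = I_n and dispersion parameter delta,
    via the Cholesky-type construction G = L^T L with L upper triangular."""
    s = delta / np.sqrt(n + 1.0)
    L = np.zeros((n, n))
    for j in range(n):
        # gamma-distributed diagonal, Gaussian off-diagonal entries
        a = (n + 1.0) / (2.0 * delta**2) + (1.0 - (j + 1)) / 2.0
        L[j, j] = s * np.sqrt(2.0 * rng.gamma(a))
        L[:j, j] = s * rng.standard_normal(j)
    return L.T @ L

def randomize(Abar, delta, rng):
    """[A] = L_A^T [G] L_A with Abar = L_A^T L_A (Cholesky factorization),
    so that E[A] = Abar and [A] is positive definite a.s."""
    LA = np.linalg.cholesky(Abar).T  # upper-triangular factor
    return LA.T @ sample_G(Abar.shape[0], delta, rng) @ LA

rng = np.random.default_rng(0)
Kbar = np.array([[2.0, 0.3, 0.0], [0.3, 1.5, 0.2], [0.0, 0.2, 1.0]])
K_sample = randomize(Kbar, delta=0.3, rng=rng)  # one realization of [K]
```

Averaging many realizations of `randomize(Kbar, ...)` recovers the mean matrix, while the spread of the realizations is controlled by `delta`.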
$\mathbf{r}^{\mathrm{opt}} = \arg\max_{\mathbf{r} \in \mathcal{C}_r} \sum_{\ell=1}^{\nu_{\exp}} \log p_{\mathbf{W}}(\mathbf{w}^{\exp}_\ell; \mathbf{r}), \qquad (8.189)$

in which $\mathbf{w}^{\exp}_1, \ldots, \mathbf{w}^{\exp}_{\nu_{\exp}}$ are $\nu_{\exp}$ independent experimental data corresponding to $\mathbf{W}$.
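Equation (8.189) can be implemented by a brute-force grid search when the probability density function $p_{\mathbf{W}}$ is itself estimated from Monte Carlo samples of the stochastic model. The sketch below is a deliberately simplified stand-in: a scalar toy model plays the role of the stochastic computational model, a Gaussian kernel density estimate approximates $p_{\mathbf{W}}$, and all names, the toy model, and the candidate grid are assumptions for the illustration.

```python
import numpy as np

def kde_logpdf(samples, x, bw=None):
    """Gaussian kernel density estimate of log p(x) from Monte Carlo samples."""
    if bw is None:  # Silverman's rule of thumb
        bw = 1.06 * samples.std() * samples.size ** (-0.2)
    z = (x[:, None] - samples[None, :]) / bw
    dens = np.exp(-0.5 * z**2).mean(axis=1) / (bw * np.sqrt(2.0 * np.pi))
    return np.log(dens + 1e-300)  # guard against log(0)

def fit_hyperparameter(w_exp, model, candidates, n_mc=2000, seed=0):
    """Grid-search maximum likelihood over hyperparameter values r:
    r_opt = argmax_r sum_l log p_W(w_exp_l ; r), p_W estimated by KDE."""
    best, best_ll = None, -np.inf
    for r in candidates:
        rng = np.random.default_rng(seed)  # common random numbers across candidates
        ll = kde_logpdf(model(r, n_mc, rng), w_exp).sum()
        if ll > best_ll:
            best, best_ll = r, ll
    return best

# toy stochastic model: W = exp(r Z), Z standard normal (r plays the role of delta)
model = lambda r, n, rng: np.exp(r * rng.standard_normal(n))
rng = np.random.default_rng(1)
w_exp = model(0.4, 200, rng)  # synthetic "experimental" data, true r = 0.4
r_opt = fit_hyperparameter(w_exp, model, np.linspace(0.1, 1.0, 10))
```

With enough Monte Carlo samples per candidate, the recovered `r_opt` lands near the value used to generate the synthetic data.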
The stochastic modeling introduces random vectors and random matrices into the stochastic computational models. Consequently, a stochastic solver is required. Two distinct classes of techniques can be used, among them the Monte Carlo simulation method and its variants, such as the local domain Monte Carlo simulation [93]. The Monte Carlo simulation method is a stochastic solver that is particularly well adapted to the high stochastic dimension induced by the random matrices introduced by the nonparametric method of uncertainties.
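In its plain form, the Monte Carlo solver amounts to averaging independent realizations and monitoring the running estimate for convergence, which is what makes it insensitive to the stochastic dimension. A minimal sketch follows, with an invented scalar observable standing in for a response quantity of the stochastic computational model.

```python
import numpy as np

def monte_carlo(model, n_samples, rng):
    """Plain Monte Carlo estimation with a running convergence record."""
    running = np.empty(n_samples)
    acc = 0.0
    for j in range(n_samples):
        acc += model(rng)
        running[j] = acc / (j + 1)  # estimate after j+1 realizations
    return running

# toy observable: static response 1/K of a 1-DOF system with random stiffness
# K ~ Gamma(25, 1/25) (mean 1); purely illustrative
model = lambda rng: 1.0 / rng.gamma(shape=25.0, scale=1.0 / 25.0)
rng = np.random.default_rng(0)
conv = monte_carlo(model, 5000, rng)
```

Plotting `conv` against the number of realizations gives the usual convergence diagnostic; here the exact mean of $1/K$ is $25/24 \approx 1.042$, which the running estimate approaches.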
Some important ingredients have been developed to provide the tools required for performing the nonparametric stochastic modeling of uncertainties in the linear and nonlinear dynamics of mechanical systems, in particular:
The dynamic substructuring with uncertain substructures which allows for the
nonparametric modeling of nonhomogeneous uncertainties in different parts of a
structure [116];
The nonparametric stochastic modeling of uncertain structures with uncertain
boundary conditions/coupling between substructures [78];
The nonparametric stochastic modeling of matrices that depend on the frequency
and that are related by a Hilbert transform due to the existence of causality
properties, such as those encountered in the linear viscoelasticity theory [89,115];
The multi-body dynamics for which there are uncertain bodies (mass, center of mass, inertia tensor), for which the uncertainties in the bodies come from a lack of knowledge of the distribution of the mass inside the bodies (for instance, the spatial distribution of the passengers inside a high-speed train) [10];
The nonparametric stochastic modeling in vibroacoustics of complex systems for
low- and medium-frequency domains, including the stochastic modeling of the
coupling matrices between the structure and the acoustic cavities [38, 89, 110];
The formulation of the nonparametric stochastic modeling of the nonlinear
operators occurring in the static and the dynamics of uncertain geometrically
nonlinear structures [21, 77, 81].
These stochastic models have been applied and experimentally validated in numerous areas. In dynamics:
Aeronautics and aerospace engineering systems [7, 20, 78, 88, 91]
Biomechanics [30, 31]
Environment for well integrity for geologic CO2 sequestration [32]
Nuclear engineering [9, 12, 13, 29]
Pipe conveying fluid [94]
Rotordynamics [79, 80, 82]
Soil-structure interaction and wave propagation in soils [4, 5, 26, 27]
Vibration of turbomachines [18, 19, 22, 70]
Vibroacoustics of automotive vehicles [3, 38–40, 61]
In continuum mechanics of heterogeneous materials:
Composites reinforced with fibers [48]
Heat transfer of complex composite panels [98]
Nonlinear thermomechanics in heterogeneous materials [97]
Polycrystalline microstructures [49]
Porous materials [52]
Random elasticity tensors of materials exhibiting symmetry properties [51, 53]
16 Conclusions
References
1. Agmon, N., Alhassid, Y., Levine, R.D.: An algorithm for finding the distribution of maximal entropy. J. Comput. Phys. 30, 250–258 (1979)
2. Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 3rd edn. John Wiley & Sons, New York (2003)
3. Arnoux, A., Batou, A., Soize, C., Gagliardini, L.: Stochastic reduced order computational model of structures having numerous local elastic modes in low frequency dynamics. J. Sound Vib. 332(16), 3667–3680 (2013)
4. Arnst, M., Clouteau, D., Chebli, H., Othman, R., Degrande, G.: A non-parametric probabilistic model for ground-borne vibrations in buildings. Probab. Eng. Mech. 21(1), 18–34 (2006)
5. Arnst, M., Clouteau, D., Bonnet, M.: Inversion of probabilistic structural models using measured transfer functions. Comput. Methods Appl. Mech. Eng. 197(6–8), 589–608 (2008)
6. Au, S.K., Beck, J.L.: Subset simulation and its application to seismic risk based on dynamic analysis. J. Eng. Mech. ASCE 129(8), 901–917 (2003)
7. Avalos, J., Swenson, E.D., Mignolet, M.P., Lindsley, N.J.: Stochastic modeling of structural uncertainty/variability from ground vibration modal test data. J. Aircr. 49(3), 870–884 (2012)
8. Bathe, K.J., Wilson, E.L.: Numerical Methods in Finite Element Analysis. Prentice-Hall, New York (1976)
9. Batou, A., Soize, C.: Experimental identification of turbulent fluid forces applied to fuel assemblies using an uncertain model and fretting-wear estimation. Mech. Syst. Signal Pr. 23(7), 2141–2153 (2009)
10. Batou, A., Soize, C.: Rigid multibody system dynamics with uncertain rigid bodies. Multibody Syst. Dyn. 27(3), 285–319 (2012)
11. Batou, A., Soize, C.: Calculation of Lagrange multipliers in the construction of maximum entropy distributions in high stochastic dimension. SIAM/ASA J. Uncertain. Quantif. 1(1), 431–451 (2013)
12. Batou, A., Soize, C., Audebert, S.: Model identification in computational stochastic dynamics using experimental modal data. Mech. Syst. Signal Pr. 50–51, 307–322 (2014)
13. Batou, A., Soize, C., Corus, M.: Experimental identification of an uncertain computational dynamical model representing a family of structures. Comput. Struct. 89(13–14), 1440–1448 (2011)
14. Bohigas, O., Giannoni, M.J., Schmit, C.: Characterization of chaotic quantum spectra and universality of level fluctuation laws. Phys. Rev. Lett. 52(1), 1–4 (1984)
15. Bohigas, O., Giannoni, M.J., Schmit, C.: Spectral fluctuations of classically chaotic quantum systems. In: Seligman, T.H., Nishioka, H. (eds.) Quantum Chaos and Statistical Nuclear Physics, pp. 18–40. Springer, New York (1986)
16. Bohigas, O., Legrand, O., Schmit, C., Sornette, D.: Comment on spectral statistics in elastodynamics. J. Acoust. Soc. Am. 89(3), 1456–1458 (1991)
17. Burrage, K., Lenane, I., Lythe, G.: Numerical methods for second-order stochastic differential equations. SIAM J. Sci. Comput. 29, 245–264 (2007)
18. Capiez-Lernout, E., Soize, C.: Nonparametric modeling of random uncertainties for dynamic response of mistuned bladed disks. ASME J. Eng. Gas Turbines Power 126(3), 600–618 (2004)
19. Capiez-Lernout, E., Soize, C., Lombard, J.P., Dupont, C., Seinturier, E.: Blade manufacturing tolerances definition for a mistuned industrial bladed disk. ASME J. Eng. Gas Turbines Power 127(3), 621–628 (2005)
20. Capiez-Lernout, E., Pellissetti, M., Pradlwarter, H., Schueller, G.I., Soize, C.: Data and model uncertainties in complex aerospace engineering systems. J. Sound Vib. 295(3–5), 923–938 (2006)
21. Capiez-Lernout, E., Soize, C., Mignolet, M.: Post-buckling nonlinear static and dynamical analyses of uncertain cylindrical shells and experimental validation. Comput. Methods Appl. Mech. Eng. 271(1), 210–230 (2014)
22. Capiez-Lernout, E., Soize, C., Mbaye, M.: Mistuning analysis and uncertainty quantification of an industrial bladed disk with geometrical nonlinearity. J. Sound Vib. 356, 124–143 (2015)
23. Chadwick, P., Vianello, M., Cowin, S.C.: A new proof that the number of linear elastic symmetries is eight. J. Mech. Phys. Solids 49, 2471–2492 (2001)
24. Chebli, H., Soize, C.: Experimental validation of a nonparametric probabilistic model of nonhomogeneous uncertainties for dynamical systems. J. Acoust. Soc. Am. 115(2), 697–705 (2004)
25. Chen, C., Duhamel, D., Soize, C.: Probabilistic approach for model and data uncertainties and its experimental identification in structural dynamics: case of composite sandwich panels. J. Sound Vib. 294(1–2), 64–81 (2006)
26. Cottereau, R., Clouteau, D., Soize, C.: Construction of a probabilistic model for impedance matrices. Comput. Methods Appl. Mech. Eng. 196(17–20), 2252–2268 (2007)
27. Cottereau, R., Clouteau, D., Soize, C.: Probabilistic impedance of foundation, impact of the seismic design on uncertain soils. Earthq. Eng. Struct. D. 37(6), 899–918 (2008)
28. Das, S., Ghanem, R.: A bounded random matrix approach for stochastic upscaling. Multiscale Model. Simul. 8(1), 296–325 (2009)
29. Desceliers, C., Soize, C., Cambier, S.: Non-parametric–parametric model for random uncertainties in nonlinear structural dynamics: application to earthquake engineering. Earthq. Eng. Struct. Dyn. 33(3), 315–327 (2004)
30. Desceliers, C., Soize, C., Grimal, Q., Talmant, M., Naili, S.: Determination of the random anisotropic elasticity layer using transient wave propagation in a fluid-solid multilayer: model and experiments. J. Acoust. Soc. Am. 125(4), 2027–2034 (2009)
31. Desceliers, C., Soize, C., Naili, S., Haiat, G.: Probabilistic model of the human cortical bone with mechanical alterations in ultrasonic range. Mech. Syst. Signal Pr. 32, 170–177 (2012)
32. Desceliers, C., Soize, C., Yanez-Godoy, H., Houdu, E., Poupard, O.: Robustness analysis of an uncertain computational model to predict well integrity for geologic CO2 sequestration. Comput. Mech. 17(2), 307–323 (2013)
33. Doob, J.L.: Stochastic Processes. John Wiley & Sons, New York (1990)
34. Doostan, A., Iaccarino, G.: A least-squares approximation of partial differential equations with high dimensional random inputs. J. Comput. Phys. 228(12), 4332–4345 (2009)
35. Duchereau, J., Soize, C.: Transient dynamics in structures with nonhomogeneous uncertainties induced by complex joints. Mech. Syst. Signal Pr. 20(4), 854–867 (2006)
36. Dyson, F.J.: Statistical theory of the energy levels of complex systems. Parts I, II, III. J. Math. Phys. 3, 140–175 (1962)
37. Dyson, F.J., Mehta, M.L.: Statistical theory of the energy levels of complex systems. Parts IV, V. J. Math. Phys. 4, 701–719 (1963)
38. Durand, J.F., Soize, C., Gagliardini, L.: Structural-acoustic modeling of automotive vehicles in presence of uncertainties and experimental identification and validation. J. Acoust. Soc. Am. 124(3), 1513–1525 (2008)
39. Fernandez, C., Soize, C., Gagliardini, L.: Fuzzy structure theory modeling of sound-insulation layers in complex vibroacoustic uncertain systems – theory and experimental validation. J. Acoust. Soc. Am. 125(1), 138–153 (2009)
40. Fernandez, C., Soize, C., Gagliardini, L.: Sound-insulation layer modelling in car computational vibroacoustics in the medium-frequency range. Acta Acust. United Ac. 96(3), 437–444 (2010)
41. Fishman, G.S.: Monte Carlo: Concepts, Algorithms, and Applications. Springer, New York (1996)
284 C. Soize
42. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6(6), 721–741 (1984)
43. Ghanem, R., Spanos, P.D.: Polynomial chaos in stochastic finite elements. J. Appl. Mech. Trans. ASME 57(1), 197–202 (1990)
44. Ghanem, R., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991)
45. Ghanem, R., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach, rev. edn. Dover Publications, New York (2003)
46. Ghosh, D., Ghanem, R.: Stochastic convergence acceleration through basis enrichment of polynomial chaos expansions. Int. J. Numer. Methods Eng. 73(2), 162–184 (2008)
47. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)
48. Guilleminot, J., Soize, C., Kondo, D.: Mesoscale probabilistic models for the elasticity tensor of fiber reinforced composites: experimental identification and numerical aspects. Mech. Mater. 41(12), 1309–1322 (2009)
49. Guilleminot, J., Noshadravan, A., Soize, C., Ghanem, R.G.: A probabilistic model for bounded elasticity tensor random fields with application to polycrystalline microstructures. Comput. Methods Appl. Mech. Eng. 200, 1637–1648 (2011)
50. Guilleminot, J., Soize, C.: Probabilistic modeling of apparent tensors in elastostatics: a MaxEnt approach under material symmetry and stochastic boundedness constraints. Probab. Eng. Mech. 28(SI), 118–124 (2012)
51. Guilleminot, J., Soize, C.: Generalized stochastic approach for constitutive equation in linear elasticity: a random matrix model. Int. J. Numer. Methods Eng. 90(5), 613–635 (2012)
52. Guilleminot, J., Soize, C., Ghanem, R.: Stochastic representation for anisotropic permeability tensor random fields. Int. J. Numer. Anal. Met. Geom. 36(13), 1592–1608 (2012)
53. Guilleminot, J., Soize, C.: On the statistical dependence for the components of random elasticity tensors exhibiting material symmetry properties. J. Elast. 111(2), 109–130 (2013)
54. Guilleminot, J., Soize, C.: Stochastic model and generator for random fields with symmetry properties: application to the mesoscopic modeling of elastic random media. Multiscale Model. Simul. (A SIAM Interdiscip. J.) 11(3), 840–870 (2013)
55. Gupta, A.K., Nagar, D.K.: Matrix Variate Distributions. Chapman & Hall/CRC, Boca Raton (2000)
56. Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration. Structure-Preserving Algorithms for Ordinary Differential Equations. Springer, Heidelberg (2002)
57. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)
58. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620–630 and 108(2), 171–190 (1957)
59. Kaipio, J., Somersalo, E.: Statistical and Computational Inverse Problems. Springer, New York (2005)
60. Kapur, J.N., Kesavan, H.K.: Entropy Optimization Principles with Applications. Academic, San Diego (1992)
61. Kassem, M., Soize, C., Gagliardini, L.: Structural partitioning of complex structures in the medium-frequency range. An application to an automotive vehicle. J. Sound Vib. 330(5), 937–946 (2011)
62. Khasminskii, R.: Stochastic Stability of Differential Equations, 2nd edn. Springer, Heidelberg (2012)
63. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Heidelberg (1992)
64. Langley, R.S.: A non-Poisson model for the vibration analysis of uncertain dynamic systems. Proc. R. Soc. Ser. A 455, 3325–3349 (1999)
65. Legrand, O., Sornette, D.: Coarse-grained properties of the chaotic trajectories in the stadium. Physica D 44, 229–235 (1990)
66. Legrand, O., Schmit, C., Sornette, D.: Quantum chaos methods applied to high-frequency plate vibrations. Europhys. Lett. 18(2), 101–106 (1992)
67. Le Maître, O.P., Knio, O.M.: Spectral Methods for Uncertainty Quantification with Applications to Computational Fluid Dynamics. Springer, Heidelberg (2010)
68. Luenberger, D.G.: Optimization by Vector Space Methods. John Wiley & Sons, New York (2009)
69. Matthies, H.G., Keese, A.: Galerkin methods for linear and nonlinear elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Eng. 194(12–16), 1295–1331 (2005)
70. Mbaye, M., Soize, C., Ousty, J.P., Capiez-Lernout, E.: Robust analysis of design in vibration of turbomachines. J. Turbomach. 135(2), 021008-1–021008-8 (2013)
71. Mehrabadi, M.M., Cowin, S.C.: Eigentensors of linear anisotropic elastic materials. Q. J. Mech. Appl. Math. 43(1), 15–41 (1990)
72. Mehta, M.L.: Random Matrices and the Statistical Theory of Energy Levels. Academic, New York (1967)
73. Mehta, M.L.: Random Matrices, Revised and Enlarged, 2nd edn. Academic Press, San Diego (1991)
74. Mehta, M.L.: Random Matrices, 3rd edn. Elsevier, San Diego (2014)
75. Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44(247), 335–341 (1949)
76. Mignolet, M.P., Soize, C.: Nonparametric stochastic modeling of linear systems with prescribed variance of several natural frequencies. Probab. Eng. Mech. 23(2–3), 267–278 (2008)
77. Mignolet, M.P., Soize, C.: Stochastic reduced order models for uncertain nonlinear dynamical systems. Comput. Methods Appl. Mech. Eng. 197(45–48), 3951–3963 (2008)
78. Mignolet, M.P., Soize, C., Avalos, J.: Nonparametric stochastic modeling of structures with uncertain boundary conditions/coupling between substructures. AIAA J. 51(6), 1296–1308 (2013)
79. Murthy, R., Mignolet, M.P., El-Shafei, A.: Nonparametric stochastic modeling of uncertainty in rotordynamics – Part I: formulation. J. Eng. Gas Turb. Power 132, 092501-1–092501-7 (2009)
80. Murthy, R., Mignolet, M.P., El-Shafei, A.: Nonparametric stochastic modeling of uncertainty in rotordynamics – Part II: applications. J. Eng. Gas Turb. Power 132, 092502-1–092502-11 (2010)
81. Murthy, R., Wang, X.Q., Perez, R., Mignolet, M.P., Richter, L.A.: Uncertainty-based experimental validation of nonlinear reduced order models. J. Sound Vib. 331, 1097–1114 (2012)
82. Murthy, R., Tomei, J.C., Wang, X.Q., Mignolet, M.P., El-Shafei, A.: Nonparametric stochastic modeling of structural uncertainty in rotordynamics: unbalance and balancing aspects. J. Eng. Gas Turb. Power 136, 062506-1–062506-11 (2014)
83. Neal, R.M.: Slice sampling. Ann. Stat. 31, 705–767 (2003)
84. Nouy, A.: Recent developments in spectral stochastic methods for the numerical solution of stochastic partial differential equations. Arch. Comput. Methods Eng. 16(3), 251–285 (2009)
85. Nouy, A.: Proper Generalized Decomposition and separated representations for the numerical solution of high dimensional stochastic problems. Arch. Comput. Methods Eng. 16(3), 403–434 (2010)
86. Nouy, A., Soize, C.: Random fields representations for stochastic elliptic boundary value problems and statistical inverse problems. Eur. J. Appl. Math. 25(3), 339–373 (2014)
87. Ohayon, R., Soize, C.: Structural Acoustics and Vibration. Academic, San Diego (1998)
88. Ohayon, R., Soize, C.: Advanced computational dissipative structural acoustics and fluid-structure interaction in low- and medium-frequency domains. Reduced-order models and uncertainty quantification. Int. J. Aeronaut. Space Sci. 13(2), 127–153 (2012)
89. Ohayon, R., Soize, C.: Advanced Computational Vibroacoustics. Reduced-Order Models and Uncertainty Quantification. Cambridge University Press, New York (2014)
90. Papoulis, A.: Signal Analysis. McGraw-Hill, New York (1977)
91. Pellissetti, M., Capiez-Lernout, E., Pradlwarter, H., Soize, C., Schueller, G.I.: Reliability analysis of a satellite structure with a parametric and a non-parametric probabilistic model. Comput. Methods Appl. Mech. Eng. 198(2), 344–357 (2008)
92. Porter, C.E.: Statistical Theories of Spectra: Fluctuations. Academic, New York (1965)
93. Pradlwarter, H.J., Schueller, G.I.: Local domain Monte Carlo simulation. Struct. Saf. 32(5), 275–280 (2010)
94. Ritto, T.G., Soize, C., Rochinha, F.A., Sampaio, R.: Dynamic stability of a pipe conveying fluid with an uncertain computational model. J. Fluid Struct. 49, 412–426 (2014)
95. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2005)
96. Rubinstein, R.Y., Kroese, D.P.: Simulation and the Monte Carlo Method, 2nd edn. John Wiley & Sons, New York (2008)
97. Sakji, S., Soize, C., Heck, J.V.: Probabilistic uncertainties modeling for thermomechanical analysis of plasterboard submitted to fire load. J. Struct. Eng. ASCE 134(10), 1611–1618 (2008)
98. Sakji, S., Soize, C., Heck, J.V.: Computational stochastic heat transfer with model uncertainties in a plasterboard submitted to fire load and experimental validation. Fire Mater. 33(3), 109–127 (2009)
99. Schmit, C.: Quantum and classical properties of some billiards on the hyperbolic plane. In: Giannoni, M.J., Voros, A., Zinn-Justin, J. (eds.) Chaos and Quantum Physics, pp. 333–369. North-Holland, Amsterdam (1991)
100. Schueller, G.I.: Efficient Monte Carlo simulation procedures in structural uncertainty and reliability analysis – recent advances. Struct. Eng. Mech. 32(1), 1–20 (2009)
101. Schwartz, L.: Analyse II: Calcul Différentiel et Équations Différentielles. Hermann, Paris (1997)
102. Serfling, R.J.: Approximation Theorems of Mathematical Statistics. John Wiley & Sons, New York (1980)
103. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–659 (1948)
104. Soize, C.: Oscillators submitted to squared Gaussian processes. J. Math. Phys. 21(10), 2500–2507 (1980)
105. Soize, C.: The Fokker-Planck Equation for Stochastic Dynamical Systems and its Explicit Steady State Solutions. World Scientific Publishing Co Pte Ltd, Singapore (1994)
106. Soize, C.: A nonparametric model of random uncertainties in linear structural dynamics. In: Bouc, R., Soize, C. (eds.) Progress in Stochastic Structural Dynamics, pp. 109–138. Publications LMA-CNRS, Marseille (1999). ISBN 2-909669-16-5
107. Soize, C.: A nonparametric model of random uncertainties for reduced matrix models in structural dynamics. Probab. Eng. Mech. 15(3), 277–294 (2000)
108. Soize, C.: Maximum entropy approach for modeling random uncertainties in transient elastodynamics. J. Acoust. Soc. Am. 109(5), 1979–1996 (2001)
109. Soize, C.: Random matrix theory and non-parametric model of random uncertainties. J. Sound Vib. 263(4), 893–916 (2003)
110. Soize, C.: Random matrix theory for modeling random uncertainties in computational mechanics. Comput. Methods Appl. Mech. Eng. 194(12–16), 1333–1366 (2005)
111. Soize, C.: Non-Gaussian positive-definite matrix-valued random fields for elliptic stochastic partial differential operators. Comput. Methods Appl. Mech. Eng. 195(1–3), 26–64 (2006)
112. Soize, C.: Construction of probability distributions in high dimension using the maximum entropy principle. Applications to stochastic processes, random fields and random matrices. Int. J. Numer. Methods Eng. 76(10), 1583–1611 (2008)
113. Soize, C.: Generalized probabilistic approach of uncertainties in computational dynamics using random matrices and polynomial chaos decompositions. Int. J. Numer. Methods Eng. 81(8), 939–970 (2010)
114. Soize, C.: Stochastic Models of Uncertainties in Computational Mechanics. American Society of Civil Engineers (ASCE), Reston (2012)
115. Soize, C., Poloskov, I.E.: Time-domain formulation in computational dynamics for linear viscoelastic media with model uncertainties and stochastic excitation. Comput. Math. Appl. 64(11), 3594–3612 (2012)
116. Soize, C., Chebli, H.: Random uncertainties model in dynamic substructuring using a nonparametric probabilistic model. J. Eng. Mech.-ASCE 129(4), 449–457 (2003)
117. Spall, J.C.: Introduction to Stochastic Search and Optimization. John Wiley & Sons, Hoboken (2003)
118. Talay, D., Tubaro, L.: Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. 8(4), 94–120 (1990)
119. Talay, D.: Simulation and numerical analysis of stochastic differential systems. In: Kree, P., Wedig, W. (eds.) Probabilistic Methods in Applied Physics. Lecture Notes in Physics, vol. 451, pp. 54–96. Springer, Heidelberg (1995)
120. Talay, D.: Stochastic Hamiltonian system: exponential convergence to the invariant measure and discretization by the implicit Euler scheme. Markov Process. Relat. Fields 8, 163–198 (2002)
121. Tipireddy, R., Ghanem, R.: Basis adaptation in homogeneous chaos spaces. J. Comput. Phys. 259, 304–317 (2014)
122. Walpole, L.J.: Elastic behavior of composite materials: theoretical foundations. Adv. Appl. Mech. 21, 169–242 (1981)
123. Walter, E., Pronzato, L.: Identification of Parametric Models from Experimental Data. Springer, Berlin (1997)
124. Weaver, R.L.: Spectral statistics in elastodynamics. J. Acoust. Soc. Am. 85(3), 1005–1013 (1989)
125. Wigner, E.P.: On the statistical distribution of the widths and spacings of nuclear resonance levels. Proc. Camb. Philos. Soc. 47, 790–798 (1951)
126. Wigner, E.P.: Distribution laws for the roots of a random Hermitian matrix. In: Porter, C.E. (ed.) Statistical Theories of Spectra: Fluctuations, pp. 446–461. Academic, New York (1965)
127. Wright, M., Weaver, R.: New Directions in Linear Acoustics and Vibration. Quantum Chaos, Random Matrix Theory, and Complexity. Cambridge University Press, New York (2010)
128. Zienkiewicz, O.C., Taylor, R.L.: The Finite Element Method for Solid and Structural Mechanics, 6th edn. Elsevier, Butterworth-Heinemann, Amsterdam (2005)
9 Maximin Sliced Latin Hypercube Designs with Application to Cross Validating Prediction Error
Abstract
This chapter introduces an approach to construct a new type of design, called a maximin sliced Latin hypercube design. This design is a special type of Latin hypercube design that can be partitioned into smaller slices of Latin hypercube designs, where both the whole design and each slice are optimal under the maximin criterion. To construct these designs, a two-step construction method for generating sliced Latin hypercubes is proposed. Several examples are presented to evaluate the performance of the algorithm. An application of this type of optimal design to estimating prediction error by cross validation is also illustrated.
Keywords
Computer experiments · Maximin design · Enhanced stochastic evolutionary algorithm · Design of experiments
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
2 Notation and Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
2.1 Sliced Latin Hypercube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
2.2 Optimality Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
2.3 The Enhanced Stochastic Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 293
2.4 An Alternate Construction Method for SLHDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
3 A Motivating Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
1 Introduction

A design $X$ is maximin if it maximizes the minimal inter-site distance,

$\max_X \min_{i \ne j} d_{ij}, \qquad (9.1)$

where $d_{ij}$ is the Euclidean distance between the $i$th point and the $j$th point of the design. The asymptotic optimality of maximin designs is also discussed in [5]. A related criterion, called $\phi_p$, is proposed in [10], which is scalar valued and computationally cheaper than criterion (9.1).
The sliced Latin hypercube design (SLHD) was introduced in [13] for the collective evaluation of computer models. An SLHD is a special Latin hypercube design that can be partitioned into slices of smaller Latin hypercube designs, and it can be constructed by random permutation of each column of the design matrix [13]. As a result, the generated design is not optimal under any given criterion. This chapter focuses on constructing a best SLHD that preserves maximin optimality not only as one LHD but also in each slice. Note that the existing criteria for optimal LHDs, such as the maximin criterion (9.1), are not directly applicable here, as they cannot control the distances in each slice and in the whole design simultaneously. Also, in algorithmic searches for optimal designs, new designs are often obtained by exchanging two elements in one column, which could destroy the sliced structure of SLHDs. These issues motivate us to define a new maximin criterion for SLHDs and to modify the enhanced stochastic evolutionary algorithm [4] to construct the new maximin SLHD.
9 Maximin Sliced Latin Hypercube Designs with Application to Cross: : : 291
The chapter is organized as follows. Relevant definitions and notations are given
in Sect. 2. An example that motivates the proposed maximin criterion for SLHDs is
shown in Sect. 3. The algorithm for constructing maximin SLHDs is introduced in
Sect. 4 and evaluated with a simulation study in Sect. 5. An application of maximin
SLHDs in cross validation of prediction error for Gaussian process regression is
discussed in Sect. 6. The conclusions can be found in Sect. 7.
$(X - U)/n, \qquad (9.2)$
Consider three different schemes for selecting the input values: (i) IID, take an independent and identically distributed sample of $m$ runs for each $f_i$, with the $t$ samples generated independently; (ii) LHD, obtain $t$ independent ordinary Latin hypercube designs of $m$ runs, each of which is associated with one $f_i$; (iii) SLHD, generate an $n \times q$ SLHD with $t$ slices, where each slice is assigned to one $f_i$.

For any of these schemes, let $D^{(i)}$ denote the design set for $f_i$, $i = 1,\ldots,t$. Denote the $k$th row of $D^{(i)}$ as $\mathbf{d}_k^{(i)}$. For $i = 1,\ldots,t$, $\mu_i$ is estimated by

$\hat{\mu}_i = \frac{1}{m} \sum_{k=1}^{m} f_i(\mathbf{d}_k^{(i)}), \qquad (9.4)$

and $\mu$ is estimated by

$\hat{\mu} = \sum_{i=1}^{t} \hat{\mu}_i. \qquad (9.5)$
The following theorem, taken from [13], shows the effect of the design set on the variances of the $\hat{\mu}_i$ and $\hat{\mu}$:

$\mathrm{var}_{\mathrm{SLHD}}(\hat{\mu}) \;\le\; \mathrm{var}_{\mathrm{LHD}}(\hat{\mu}) \;\le\; \mathrm{var}_{\mathrm{IID}}(\hat{\mu}). \qquad (9.7)$
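The second inequality in this ordering can be illustrated empirically. The sketch below compares only the IID and LHD schemes on a one-dimensional toy integrand (the SLHD case additionally needs the sliced construction discussed later in the chapter); the test function and sample sizes are arbitrary choices for the illustration.

```python
import numpy as np

def lhs(n, rng):
    """One-dimensional Latin hypercube sample: one point per stratum [i/n, (i+1)/n)."""
    return (rng.permutation(n) + rng.random(n)) / n

rng = np.random.default_rng(0)
f = lambda u: u**2          # toy integrand, true mean over [0, 1] is 1/3
reps, n = 2000, 10

# replicate the estimator (9.4) under each sampling scheme
est_iid = [f(rng.random(n)).mean() for _ in range(reps)]
est_lhd = [f(lhs(n, rng)).mean() for _ in range(reps)]
```

Across replicates, both estimators are unbiased, but the stratification in the LHD sample removes the main-effect variance, so `np.var(est_lhd)` is markedly smaller than `np.var(est_iid)`.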
The optimal design defined by the original maximin criterion (9.1) is not unique. To break ties among all the maximin designs, the following criterion is proposed in [10]. For a given design $X$, define a distance list $d = (d_1, d_2, \dots, d_k)$ whose elements are the distinct values of the distances between two rows of $X$, sorted in increasing order. Let $J = (J_1, J_2, \dots, J_k)$ be the corresponding index list, where $J_i$ is the number of pairs of sites in the design separated by distance $d_i$. Under the generalized maximin criterion, design $X$ is optimal if it sequentially maximizes the $d_i$ and minimizes the $J_i$ in the following order: $d_1, J_1, d_2, J_2, \dots, d_k, J_k$. A scalar-valued design criterion was also proposed in [10] as a computationally cheaper substitute for the maximin criterion:

$$\phi_p = \Bigg[\sum_{i=1}^{k} J_i\, d_i^{-p}\Bigg]^{1/p}, \qquad (9.8)$$

where $p$ is a positive integer. For large enough $p$, the designs that minimize $\phi_p$ are optimal designs under the generalized maximin criterion.
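As a concrete illustration, the distance list $(d, J)$ and the $\phi_p$ surrogate in (9.8) can be sketched in a few lines of Python; the function names and the Euclidean-distance choice are illustrative, not from the chapter:

```python
import itertools
import math

def distance_lists(X):
    """Distinct inter-point distances d_1 < d_2 < ... and their counts J_i."""
    dists = {}
    for a, b in itertools.combinations(X, 2):
        d = round(math.dist(a, b), 12)   # round to guard against float noise
        dists[d] = dists.get(d, 0) + 1
    d_sorted = sorted(dists)
    return d_sorted, [dists[d] for d in d_sorted]

def phi_p(X, p=30):
    """Scalar surrogate (9.8): phi_p = (sum_i J_i d_i^{-p})^{1/p}; smaller is better."""
    d, J = distance_lists(X)
    return sum(Ji * di ** (-p) for di, Ji in zip(d, J)) ** (1.0 / p)
```

For large $p$ the smallest distance dominates the sum, so designs with clustered points receive a much larger $\phi_p$ value than well-spread ones.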
In order to break the proposed algorithm for constructing maximin SLHDs into two stages, Algorithm 2 is introduced here as a new construction for an $n \times q$ sliced Latin hypercube with $t$ slices and $m$ points in each slice.
Proof. Since $B$ consists of $t$ Latin hypercubes of size $m$, there are $t$ entries in each column of $B$ with value $i$, for $i \in \{1, 2, \dots, m\}$. By construction, the corresponding entries in $C$ are set to be a permutation of $\{0, 1, \dots, t-1\}$. As a result, each column of $X$ takes values from $\{t i - j \mid i = 1, 2, \dots, m;\ j = 0, 1, \dots, t-1\} = \{1, 2, \dots, n\}$. By definition, the resulting design $X$ is a Latin hypercube of size $n$.
Furthermore, the $i$th block of $\lceil X/t \rceil$ is $B^{(i)}$, an $m \times q$ Latin hypercube. Hence, design $X$ is a sliced Latin hypercube with $t$ slices and $m$ points in each slice.
This alternative method generates the design slice by slice, which provides
flexibility for optimizing smaller designs as part of the construction of maximin
SLHDs. The proposed algorithm in Sect. 4 is therefore based on this construction
for SLHDs.
Step 2: Each column of $C$ contains three permutations of $\{0, 1, 2, 3\}$, one for each level of $B$, so that each of the 12 combinations between $\{1, 2, 3\}$ and $\{0, 1, 2, 3\}$ appears exactly once in the pairing between a column of $B$ and the corresponding column of $C$. Here,

$$C^T = \begin{pmatrix} 0 & 1 & 3 & 3 & 1 & 1 & 2 & 2 & 0 & 2 & 3 & 0 \\ 0 & 2 & 2 & 0 & 1 & 3 & 3 & 3 & 1 & 0 & 2 & 1 \end{pmatrix}.$$

So

$$X^T = 4B^T - C^T = \begin{pmatrix} 4 & 7 & 9 & 5 & 3 & 11 & 2 & 6 & 12 & 10 & 1 & 8 \\ 8 & 10 & 2 & 4 & 11 & 5 & 1 & 9 & 7 & 12 & 6 & 3 \end{pmatrix}.$$
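The two-matrix construction $X = tB - C$ above can be sketched as follows; this is a plain illustration of the idea, with hypothetical function names, not the chapter's Algorithm 2 verbatim:

```python
import random

def sliced_lhd(m, t, q, rng=None):
    """Sketch of X = t*B - C: B stacks t independent m-run Latin hypercubes;
    column by column, the t rows of B sharing each level i receive a random
    permutation of {0, ..., t-1} in C."""
    rng = rng or random.Random(0)
    n = m * t
    B = [[0] * q for _ in range(n)]
    for s in range(t):                    # slice s occupies rows s*m .. s*m+m-1
        for j in range(q):
            perm = list(range(1, m + 1))
            rng.shuffle(perm)
            for k in range(m):
                B[s * m + k][j] = perm[k]
    C = [[0] * q for _ in range(n)]
    for j in range(q):
        for level in range(1, m + 1):
            rows = [r for r in range(n) if B[r][j] == level]  # one row per slice
            offsets = list(range(t))
            rng.shuffle(offsets)
            for r, c in zip(rows, offsets):
                C[r][j] = c
    # X is an n-run Latin hypercube whose t row blocks become m-run Latin
    # hypercubes under ceil(X/t), exactly as in the proof above
    return [[t * B[r][j] - C[r][j] for j in range(q)] for r in range(n)]
```

Generating the slices of $B$ separately is what later allows each slice to be optimized on its own in the two-stage algorithm.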
3 A Motivating Example
In this section, the general ideas of maximin SLHDs are derived based on the
comparison of four SLHDs with two slices, given in Fig. 9.1. Black circles and
red squares are used to represent different slices. Design (a) is based on a sliced
Latin hypercube generated by random permutation as in [13], while design (d)
is constructed by the proposed algorithm in Sect. 4. Design (b) is obtained by
Fig. 9.1 Example SLHDs with ten runs, two factors, and two slices
To construct such designs, it is necessary to define a new maximin criterion for SLHDs
and to modify the enhanced stochastic evolutionary algorithm to optimize for such
a design.
A maximin SLHD should maximize the minimum distance in the whole design and in each slice simultaneously. The original $\phi_p$ criterion in (9.8) is generalized for SLHDs as follows:

$$\phi_p^{s}(X) = w \sum_{i=1}^{t} \phi_p\big(X^{(i)}\big)/t + (1-w)\,\phi_p(X), \qquad (9.9)$$

where $X$ is an SLHD with $t$ slices $\{X^{(1)}, X^{(2)}, \dots, X^{(t)}\}$, and $w$ is a weight to be specified. An SLHD is defined to be a maximin SLHD if it minimizes the criterion $\phi_p^{s}(\cdot)$ among all possible SLHDs of the same size.
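A direct transcription of the sliced criterion (9.9) might look like the following sketch (the helper recomputes (9.8) from pairwise distances; names are illustrative):

```python
import itertools
import math

def phi_p(X, p=30):
    """Morris-Mitchell surrogate (9.8); smaller values mean better spread."""
    return sum(math.dist(a, b) ** (-p)
               for a, b in itertools.combinations(X, 2)) ** (1.0 / p)

def phi_p_s(slices, w=0.5, p=30):
    """Sliced criterion (9.9): weighted mix of the average within-slice phi_p
    and the phi_p of the pooled design; `slices` is a list of t designs."""
    t = len(slices)
    whole = [x for s in slices for x in s]
    return w * sum(phi_p(s, p) for s in slices) / t + (1 - w) * phi_p(whole, p)
```

With $w = 1$ only the slices matter, with $w = 0$ only the pooled design; the chapter's experiments use $w = 1/2$.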
Based on the alternative construction of sliced Latin hypercubes proposed in Sect. 2.4, the algorithm for constructing maximin SLHDs contains two steps: (1) generate the different slices in $B = \lceil X/t \rceil$ as $t$ independent maximin LHDs; (2) construct an optimal $C$ while keeping the between-slice distance structure of $\lceil X/t \rceil$. The optimal $C$ is defined by the following criterion:

$$\bar\phi_p = \Bigg[\sum_{1 \le i, j \le n,\ j - i > m} (d_{ij})^{-p}\Bigg]^{1/p}, \qquad (9.10)$$

where the constraint $\{j - i > m\}$ ensures that $d_{ij}$ measures the distances between points from different slices in $X$. This $\bar\phi_p$ works in the same way as $\phi_p$ in (9.8) but only controls the between-slice distance. The details of the proposed method for constructing maximin SLHDs are given in Algorithm 3.
A simple modification can be made to the enhanced stochastic evolutionary algorithm to construct optimal sliced Latin hypercubes: the element exchanges within one column are restricted to those that keep the sliced structure. There are two types of such sliced exchanges within one column: (i) exchange two elements within the same slice; (ii) exchange two elements in different slices if they correspond to the same level in $\lceil X/t \rceil$. For example, for a sliced Latin hypercube with three slices, levels $\{1, 2, 3\}$ in the same column of the design can be switched, because $\lceil 1/3 \rceil = \lceil 2/3 \rceil = \lceil 3/3 \rceil = 1$. Algorithm 3 is therefore compared to this modified enhanced stochastic evolutionary algorithm, called SlicedEx, in which new designs are obtained by the two types of element exchanges that preserve the structure of sliced Latin hypercubes.
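The two exchange types can be sketched as a single candidate-move routine; it assumes rows are grouped so that slice $s$ occupies rows $s\,m, \dots, (s+1)m - 1$, and all names are illustrative rather than the chapter's pseudocode:

```python
import random

def sliced_exchange(X, t, m, col, rng=None):
    """One candidate move for a SlicedEx-style search: either (i) swap two
    entries of `col` within the same slice, or (ii) swap entries in different
    slices that share the same level of ceil(X/t)."""
    rng = rng or random.Random()
    n = m * t
    if rng.random() < 0.5:                       # (i) within-slice swap
        s = rng.randrange(t)
        r1, r2 = rng.sample(range(s * m, (s + 1) * m), 2)
    else:                                        # (ii) between-slice, same level
        level = rng.randrange(1, m + 1)
        rows = [r for r in range(n) if (X[r][col] + t - 1) // t == level]
        r1, r2 = rng.sample(rows, 2)
    X[r1][col], X[r2][col] = X[r2][col], X[r1][col]
    return r1, r2
```

Both move types leave each column a permutation of $\{1, \dots, n\}$ and leave $\lceil X/t \rceil$ a Latin hypercube within every slice, which is exactly why the search never leaves the space of sliced Latin hypercubes.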
The proposed maximin SLHD construction is closely related to a similar proposal in a recent article [1]. Both papers independently proposed the same maximin criterion for SLHDs, as shown in (9.9), but used different construction algorithms. The main differences between the two algorithms are summarized as follows; the actual numerical comparison is shown in Sect. 5. (i) The algorithm in [1] is based on a standard simulated annealing (SA) algorithm established in [7], which accepts a new design $X_{\mathrm{try}}$ with probability $\exp\{-[f(X_{\mathrm{try}}) - f(X)]/T_h\}$, where $T_h$ is monotonically reduced by a cooling schedule. (ii) The first stage of Algorithm 3 constructs each slice of $B$ separately, whereas the method in [1] works directly on $B$, avoiding duplicated rows by swapping elements. (iii) The second stage of Algorithm 3 uses (9.10) to control between-slice distances, whereas the optimality criterion used in [1] is still (9.9). (iv) The algorithm in [1] uses a more efficient way to update $\phi_p(X_{\mathrm{try}})$, which allows it to be faster for large designs. The maximin SLHDs proposed in this chapter are mainly applied to cross validating prediction errors, whereas the designs constructed in [1] are used for fitting computer experiments with both continuous and categorical variables.
5 Numerical Illustration

$$D = (X - 0.5)/n, \qquad (9.11)$$

where $X$ is a sliced Latin hypercube. Hence, the projection of the design onto any factor consists of the midpoints of $n$ evenly spaced intervals on $[0, 1)$. Each of the five algorithms specified below is used to generate $X$ first; the resulting design $D$ is then obtained by formula (9.11). The optimality criteria in (9.8), (9.9), and (9.10) are evaluated on $D$ in the search process of the algorithms. By adjusting the parameters of the enhanced stochastic evolutionary algorithm, the time for constructing the designs is set to be close for all the algorithms. Let $D_1$ denote the worst slice in $D$ under criterion $\phi_p$ in (9.8). Each procedure is replicated 50 times. The means of $\phi_p(D)$, $\phi_p^{s}(D)$, $\phi_p(D_1)$, and CPU time are summarized in tables, where $p$ is set to 30 in each criterion and the weight $w$ is chosen to be $1/2$. Boxplots of $\phi_p(D)$ and $\phi_p(D_1)$ are also given. Each box extends from the first quartile to the third quartile, and the line within the box indicates the median; the whiskers extending vertically from the boxes show variability outside the upper and lower quartiles.
Example 3. Consider the construction of a maximin SLHD with 120 runs, 2 factors,
and 12 slices. The results are shown in Table 9.3 and Fig. 9.4.
Table 9.1 Construction of maximin SLHDs with 50 runs, 2 factors, and 5 slices

Method        Proposed  SlicedEx  Combine  Split   BA-SA
φ_p(D)        12.41     9.688     34.20    9.183   10.21
φ_p(D_1)      4.474     8.375     3.629    8.454   4.517
φ_p^s(D)      8.212     8.538     18.87    8.485   7.177
CPU time (s)  6.700     6.436     6.457    6.906   6.01
Fig. 9.2 Boxplot of φ_p(D_1) and φ_p(D) for maximin SLHDs with 50 runs, 2 factors, and 5 slices
Table 9.2 Construction of maximin SLHDs with 80 runs, 2 factors, and 8 slices

Method        Proposed  SlicedEx  Combine  Split   BA-SA
φ_p(D)        14.54     12.60     53.96    12.95   12.03
φ_p(D_1)      4.629     10.39     3.629    11.50   4.916
φ_p^s(D)      9.277     10.17     28.74    11.23   8.167
CPU time (s)  16.95     14.29     15.20    13.43   14.12
Fig. 9.3 Boxplot of φ_p(D_1) and φ_p(D) for maximin SLHDs with 80 runs, 2 factors, and 8 slices
Table 9.3 Construction of maximin SLHDs with 120 runs, 2 factors, and 12 slices

Method        Proposed  SlicedEx  Combine  Split   BA-SA
φ_p(D)        15.96     17.13     53.04    16.97   14.50
φ_p(D_1)      4.658     11.50     3.644    13.99   5.711
φ_p^s(D)      11.99     17.79     41.07    19.23   9.565
CPU time (s)  26.03     25.00     22.57    22.89   25.73
Fig. 9.4 Boxplot of φ_p(D_1) and φ_p(D) for maximin SLHDs with 120 runs, 2 factors, and 12 slices
Table 9.4 Construction of maximin SLHDs with 50 runs, 10 factors, and 5 slices

Method        Proposed  SlicedEx  Combine  Split   BA-SA
φ_p(D)        1.419     1.171     2.031    1.165   1.171
φ_p(D_1)      0.8648    1.042     0.8578   1.048   0.8895
φ_p^s(D)      1.136     1.098     1.441    1.098   1.024
CPU time (s)  38.70     38.94     38.13    37.45   41.40
Fig. 9.5 Boxplot of φ_p(D_1) and φ_p(D) for maximin SLHDs with 50 runs, 10 factors, and 5 slices
Combine. (ii) The proposed method is the best under criterion $\phi_p^{s}$ in (9.9) except for Example 4, in which the design has ten points in ten dimensions for each of the five slices. In that example, the advantage of Proposed over SlicedEx in terms of between-slice distance is not as obvious, since the pairwise distances between ten points in ten-dimensional space are not very small. (iii) Based on the results in Examples 1, 2, and 3, SlicedEx is slightly better than the two-step method in terms of the whole design when there are only five slices, but the difference tends to shrink as more slices are added; eventually the proposed method outperforms SlicedEx when the number of slices is 12. This makes sense intuitively: compared with working on the whole design, it is more efficient to optimize and evaluate the different slices separately, especially when the number of slices is not small. Compared with the proposed method, the algorithm in [1] is worse in terms of $\phi_p(D_1)$ but better in terms of $\phi_p^{s}(D)$ and $\phi_p(D)$. This could be a result of the more efficient updating scheme for $\phi_p$ used in [1], or of subtle differences between the SA algorithm in [1] and the ESE algorithm, such as the parameter settings and the acceptance criterion.
where the Gaussian process $Z(x)$ is assumed to have mean 0 and covariance function

$$R(x, w) = \sigma^2 \prod_{j=1}^{q} \exp\{-\theta_j (x_j - w_j)^2\}. \qquad (9.13)$$
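Covariance (9.13) translates directly into code; the default $\sigma^2$ and $\theta_j$ below are placeholder values, since the chapter estimates them by maximum likelihood:

```python
import math

def gp_covariance(x, w, sigma2=1.0, theta=None):
    """Covariance (9.13): sigma^2 * prod_j exp(-theta_j * (x_j - w_j)^2).
    theta defaults to all ones (illustrative, not the fitted values)."""
    theta = theta or [1.0] * len(x)
    return sigma2 * math.exp(-sum(th * (a - b) ** 2
                                  for th, a, b in zip(theta, x, w)))
```

At $x = w$ the covariance equals $\sigma^2$, and it decays smoothly as points move apart, which is what makes the kriging predictor interpolate the training data.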
Given the set of responses $\{f_i(x) : x \in D_i\}$, the predictor $\hat f_i$ is obtained by maximizing the likelihood of (9.12), or, equivalently, as the BLUP estimator. Similarly, let $\hat f$ denote the predictor of $f$ using the response set from the whole design $D = (D_1, D_2, \dots, D_t)$.
In the simulation, the following five designs are used for $D = (D_1, D_2, \dots, D_t)$:
$$f_1(x) = \log\Big(\frac{1}{\sqrt{x_1}} + \frac{1}{\sqrt{x_2}}\Big), \qquad f_2(x) = \log\Big(\frac{0.98}{\sqrt{x_1}} + \frac{0.95}{\sqrt{x_2}}\Big),$$

$$f_3(x) = \log\Big(\frac{1.02}{\sqrt{x_1}} + \frac{1.02}{\sqrt{x_2}}\Big), \qquad f_4(x) = \log\Big(\frac{1}{\sqrt{x_1}} + \frac{1.03}{\sqrt{x_2}}\Big),$$

and $f(x) = \frac{1}{4}\sum_{i=1}^{4} f_i(x)$. For each choice of the size of $D_i$ (denoted as $m$), the
parameters for the algorithm are adjusted so that the construction time for the three
types of optimal designs is close. Table 9.5 provides the comparison between the
designs in terms of the mean squared prediction error on a separate 1000 run LHD
test design. In this example, MSLH and IMLH achieve the lowest prediction error for $f_1$; MLHD is the best considering the prediction error for $f$. Here, a lower $\phi_p$ value of the design matrix is associated with a smaller prediction error of the kriging model.
In this section, the mean squared prediction error (MSPE) for Gaussian process regression is estimated by cross validation [16]. Given the values of an unknown function $f$ at $n_L$ data points, $y_{L,i} = f(x_{L,i})$ for $i = 1, 2, \dots, n_L$, let $\hat f$ be the BLUP predictor obtained with maximum likelihood parameter estimates. Given a large testing data set $T = (x_{T,i}, y_{T,i})$ of $n_T$ observations, the MSPE can be estimated by

$$\mathrm{MSPE}_{\mathrm{test}} = \frac{1}{n_T}\sum_{i=1}^{n_T}\big(y_{T,i} - \hat f(x_{T,i})\big)^2.$$
However, a testing data set is usually not available in practice, whereas $K$-fold cross validation can easily be implemented to find an estimate based on the learning data $L$ only. The procedure starts by dividing $L$ into $t$ slices, $L_1, L_2, \dots, L_t$, with $m$ points in each slice. For $k = 1, 2, \dots, K$, build a predictor $\hat f_k$ based on the data $L \setminus L_k$ by Gaussian process regression, and then calculate $\mathrm{MSPE}_{\mathrm{CV},k}$ as the mean squared prediction error of $\hat f_k$ on the remaining slice $L_k$. The cross-validation estimator is the average of the mean squared prediction errors over all $t$ slices:

$$\mathrm{MSPE}_{\mathrm{CV}} = \frac{1}{t}\sum_{k=1}^{t} \mathrm{MSPE}_{\mathrm{CV},k}.$$
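The cross-validation estimator can be sketched generically; here `fit` stands in for Gaussian process regression (any training routine that returns a predictor works), and the names are illustrative:

```python
def mspe(pred, pts, vals):
    """Mean squared prediction error of a predictor on a test set."""
    return sum((y - pred(x)) ** 2 for x, y in zip(pts, vals)) / len(pts)

def kfold_mspe(slices, fit):
    """Cross-validation estimator: for each slice L_k, fit a predictor on the
    remaining slices and score it on L_k; return the average of the fold
    errors. `slices` is a list of (points, values) pairs; `fit` maps a
    training set (points, values) to a callable predictor."""
    errs = []
    for k, (pts_k, vals_k) in enumerate(slices):
        train_pts = [x for i, (p, _) in enumerate(slices) if i != k for x in p]
        train_vals = [y for i, (_, v) in enumerate(slices) if i != k for y in v]
        f_hat = fit(train_pts, train_vals)
        errs.append(mspe(f_hat, pts_k, vals_k))
    return sum(errs) / len(errs)
```

Using the slices of an SLHD as the folds is precisely what makes each training set $L \setminus L_k$ itself a space-filling design.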
In the simulation, the five designs discussed previously are again used to generate the learning data $L$ for $K$-fold cross validation. The leave-one-out estimator is a special case of $K$-fold cross validation in which the number of slices $t$ equals the size of the training data $n_L$. To simplify the computation, [15] proposes a pseudo version of the leave-one-out estimator, in which $\hat f_k$ uses the maximum likelihood estimates of the covariance parameters in (9.13) based on the complete data $L$. For reference, these two leave-one-out estimates of MSPE are also computed based on the optimal design MLHD.
The quantity $\mathrm{MSPE}_{\mathrm{test}}$ is close to the true squared prediction error given a large testing data set. Hence, $\mathrm{MSPE}_{\mathrm{CV}} - \mathrm{MSPE}_{\mathrm{test}}$ is taken as the bias of the cross-validation estimates when evaluating different designs. In addition, the standard deviation of the prediction errors over the $t$ slices, $\{\mathrm{MSPE}_{\mathrm{CV},1}, \mathrm{MSPE}_{\mathrm{CV},2}, \dots, \mathrm{MSPE}_{\mathrm{CV},t}\}$, is also computed and denoted $s_{\mathrm{cv}}$. The computation time to construct the optimal designs MLHD, IMLH, and MSLH is set to be close by adjusting the parameters of these algorithms.
Example 2. This example considers the Branin function [6] on the domain $[-5, 10] \times [0, 15]$. The response takes the form

$$f(x_1, x_2) = \Big(x_2 - \frac{5.1}{4\pi^2}x_1^2 + \frac{5}{\pi}x_1 - 6\Big)^2 + 10\Big(1 - \frac{1}{8\pi}\Big)\cos(x_1) + 10.$$
The testing set is generated by LHD with 5000 runs. The predictor fO is obtained
by Gaussian process regression on a learning data set of size 36. The K-fold
cross-validation estimate with six slices is obtained for each of the five sampling
schemes. The leave-one-out estimate and its pseudo version are also calculated using
a maximin LHD. The result of 50 replications is shown in Fig. 9.6 and Table 9.6.
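For reference, the Branin response can be coded directly; its known minimum value $10/(8\pi) \approx 0.3979$, attained for example at $(\pi, 2.275)$, gives a quick sanity check:

```python
import math

def branin(x1, x2):
    """Branin test function on [-5, 10] x [0, 15]."""
    return ((x2 - 5.1 / (4 * math.pi ** 2) * x1 ** 2
             + 5 / math.pi * x1 - 6) ** 2
            + 10 * (1 - 1 / (8 * math.pi)) * math.cos(x1) + 10)
```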
Example 3. This example uses the multimodal six-hump camel back function from [14]:

$$f(x_1, x_2) = (4 - 2.1x_1^2 + x_1^4/3)\,x_1^2 + x_1 x_2 + (-4 + 4x_2^2)\,x_2^2,$$
Table 9.6 Sample statistics for the cross-validation estimates using 50 replicates in Example 2

                       K-fold                                          Leave-one-out
Method                 LHD      MLHD     IMLH     SLHD     MSLH        True     Pseudo
Mean of MSPE_test      0.7599   0.3153   0.5364   1.217    0.4866      0.3153   0.3153
S.d. of MSPE_test      1.948    0.8242   1.131    4.302    0.7816      0.8242   0.8242
Mean of MSPE_CV        5.339    6.710    1.760    5.682    2.295       1.072    17.77
S.d. of MSPE_CV        5.199    9.538    1.925    9.856    3.046       1.840    24.43
s_cv                   9.437    12.00    2.978    11.28    3.655       4.852    75.39
Time (s)                        6.554    7.527             5.832
where $x_1 \in [-3, 3]$ and $x_2 \in [-2, 2]$. The testing set is generated by an LHD with 5000
runs. The predictor fO is obtained by Gaussian process regression on a learning data
set of size 49. The K-fold cross-validation estimate with seven slices is obtained
for each of the five sampling schemes. The leave-one-out estimate and its pseudo
version are also calculated using a maximin LHD. The result of 50 replications is
shown in Fig. 9.7 and Table 9.7.
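Similarly for the six-hump camel back function; its two global minima, approximately $-1.0316$ at $\pm(0.0898, -0.7126)$, serve as a check:

```python
def camel6(x1, x2):
    """Six-hump camel back function on [-3, 3] x [-2, 2]."""
    return ((4 - 2.1 * x1 ** 2 + x1 ** 4 / 3) * x1 ** 2
            + x1 * x2 + (-4 + 4 * x2 ** 2) * x2 ** 2)
```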
Table 9.7 Sample statistics for the cross-validation estimates using 50 replicates in Example 3

                       K-fold                                          Leave-one-out
Method                 LHD      MLHD     IMLH     SLHD     MSLH        True     Pseudo
Mean of MSPE_test      4.617    2.854    9.215    3.629    12.74       2.854    2.854
S.d. of MSPE_test      7.738    3.900    9.155    5.237    13.33       3.900    3.900
Mean of MSPE_CV        30.66    46.40    6.075    17.03    13.60       9.917    40.84
S.d. of MSPE_CV        29.08    25.91    5.959    14.32    9.370       12.08    22.77
s_cv                   49.473   62.72    9.278    27.97    14.38       47.67    134.7
Time (s)                        13.25    14.62             14.60
Example 4. This example uses the three-dimensional Ishigami function from [3]. The original function is defined on $[-\pi, \pi]^3$.
The testing set is generated by an LHD with 10000 runs. The predictor fO is
obtained by Gaussian process regression on a learning data set of size 80. The K-fold
cross-validation estimate with eight slices is obtained for each of the five sampling
schemes. The leave-one-out estimate and its pseudo version are also calculated using
a maximin LHD. The result of 50 replications is shown in Fig. 9.8 and Table 9.8.
Example 5. This example uses the borehole function:

$$f(r, r_w, T_u, T_l, H_u, H_l, L, K_w) = \frac{2\pi T_u (H_u - H_l)}{\log(r/r_w)\Big[1 + \dfrac{2 L T_u}{\log(r/r_w)\, r_w^2 K_w} + \dfrac{T_u}{T_l}\Big]}.$$
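The response above is the well-known borehole water-flow function; a direct implementation is sketched below, with one typical input combination used as a smoke test (the test values are illustrative midpoints of the usual variable ranges, not taken from the chapter):

```python
import math

def borehole(r, rw, Tu, Tl, Hu, Hl, L, Kw):
    """Borehole water-flow response; argument order follows the text."""
    log_rr = math.log(r / rw)
    return (2 * math.pi * Tu * (Hu - Hl)
            / (log_rr * (1 + 2 * L * Tu / (log_rr * rw ** 2 * Kw) + Tu / Tl)))
```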
The testing set is generated by an LHD with 10000 runs. The predictor fO is
obtained by Gaussian process regression on a learning data set of size 90. The
Table 9.8 Sample statistics for the cross-validation estimates using 50 replicates in Example 4

                       K-fold                                          Leave-one-out
Method                 LHD      MLHD     IMLH     SLHD     MSLH        True     Pseudo
Mean of MSPE_test      2.668    2.074    2.378    2.662    2.185       2.074    2.074
S.d. of MSPE_test      0.5678   0.4919   0.5508   0.4803   0.4179      0.4919   0.4919
Mean of MSPE_CV        3.386    4.896    2.189    2.957    2.414       3.611    4.230
S.d. of MSPE_CV        0.9334   1.189    0.6558   0.9348   0.7453      1.048    1.112
s_cv                   2.680    3.081    1.403    2.060    1.590       8.739    9.010
Time (s)                        17.76    18.96             20.24
Table 9.9 Sample statistics for the cross-validation estimates using 50 replicates in Example 5

                       K-fold                                          Leave-one-out
Method                 LHD      MLHD     IMLH     SLHD     MSLH        True     Pseudo
Mean of MSPE_test      18.23    23.67    18.59    17.37    18.79       23.67    23.67
S.d. of MSPE_test      5.697    8.284    3.635    4.634    4.224       8.284    8.284
Mean of MSPE_CV        31.10    34.04    14.02    28.15    16.65       11.48    44.04
S.d. of MSPE_CV        13.73    12.55    4.101    12.91    6.763       5.972    13.36
s_cv                   25.53    27.09    8.383    21.88    12.02       53.81    127.9
Time (s)                        49.84    47.81             51.30
K-fold cross-validation estimate with five slices is obtained for each of the five
sampling schemes. The leave-one-out estimate and its pseudo version are also
calculated based on a maximin LHD. The result of 50 replications is shown in
Fig. 9.9 and Table 9.9.
Based on the examples in this section, the conclusions are as follows.
– The true version of the leave-one-out estimate is the best according to the bias criterion $\mathrm{MSPE}_{\mathrm{CV}} - \mathrm{MSPE}_{\mathrm{test}}$. However, this estimate is known to have large variation because of the similarity between the training data sets. Furthermore, given a larger learning data set, the leave-one-out estimate is not practical because of the repeated calculation of the maximum likelihood estimates.
– The pseudo version of the leave-one-out estimate can be applied to large data sets, but it is not accurate.
– Randomly splitting an LHD or a maximin LHD to perform K-fold cross validation leads to large variation in the estimate.
– The designs IMLH and MSLH are the best among the five choices in terms of the bias of the K-fold cross validation. For the standard deviation of the estimate, MSLH is slightly better. In Example 5, K-fold cross validation using IMLH underestimates the prediction error, which could be a result of small between-slice distances.
7 Conclusion
One important application of SLHDs (and therefore maximin SLHDs) is fitting multiple computer models with similar responses, which, as discussed in [13], has the advantage of reducing predictive variance. Other potential applications include computer models with qualitative and quantitative factors [2, 11, 12], cross validation, and stochastic optimization. It has been shown in [5] that the maximin design is asymptotically D-optimal under the Gaussian process regression model with correlation function $\mathrm{corr}(x_i, x_j) = \psi(\|x_i - x_j\|)^k$, as $k \to \infty$, where $\psi(\cdot)$ is a decreasing function. With this asymptotic D-optimality, the maximin SLHD is expected to improve the performance of the SLHD in these various applications.
References
1. Ba, S., Brenneman, W.A., Myers, W.R.: Optimal sliced Latin hypercube designs. Technometrics 57, 479–487 (2015)
2. Han, G., Santner, T.J., Notz, W.I., Bartel, D.L.: Prediction for computer experiments having quantitative and qualitative input variables. Technometrics 51, 278–288 (2009)
3. Ishigami, T., Homma, T.: An importance quantification technique in uncertainty analysis for computer models. In: Proceedings of the First International Symposium on Uncertainty Modeling and Analysis (ISUMA '90), University of Maryland, pp. 398–403 (1990)
4. Jin, R., Chen, W., Sudjianto, A.: An efficient algorithm for constructing optimal design of computer experiments. J. Stat. Plann. Inference 134, 268–287 (2005)
5. Johnson, M.E., Moore, L.M., Ylvisaker, D.: Minimax and maximin distance designs. J. Stat. Plann. Inference 26, 131–148 (1990)
6. Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79, 157–181 (1993)
7. Lundy, M., Mees, A.: Convergence of an annealing algorithm. Math. Program. 34, 111–124 (1986)
8. McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245 (1979)
9. Morris, M.D., Mitchell, T.J., Ylvisaker, D.: Bayesian design and analysis of computer experiments: use of derivatives in surface prediction. Technometrics 35, 243–255 (1993)
10. Morris, M.D., Mitchell, T.J.: Exploratory designs for computational experiments. J. Stat. Plann. Inference 43, 381–402 (1995)
11. Qian, P.Z.G., Wu, C.F.J.: Sliced space-filling designs. Biometrika 96, 945–956 (2009)
12. Qian, P.Z.G., Wu, H., Wu, C.F.J.: Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics 50, 383–396 (2008)
13. Qian, P.Z.G.: Sliced Latin hypercube designs. J. Am. Stat. Assoc. 107, 393–399 (2012)
14. Szegő, G.P.: Towards Global Optimization II. North-Holland, New York (1978)
15. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J., Morris, M.D.: Screening, predicting, and computer experiments. Technometrics 34, 15–25 (1992)
16. Zhang, Q., Qian, P.Z.G.: Designs for cross-validating approximation models. Biometrika 100, 997–1004 (2013)
The Bayesian Approach to Inverse Problems
10
Masoumeh Dashti and Andrew M. Stuart
Abstract
These lecture notes highlight the mathematical and computational structure
relating to the formulation of, and development of algorithms for, the Bayesian
approach to inverse problems in differential equations. This approach is fundamental in the quantification of uncertainty within applications involving the
blending of mathematical models with data. The finite-dimensional situation is
described first, along with some motivational examples. Then the development of
probability measures on separable Banach space is undertaken, using a random
series over an infinite set of functions to construct draws; these probability
measures are used as priors in the Bayesian approach to inverse problems.
Regularity of draws from the priors is studied in the natural Sobolev or Besov
spaces implied by the choice of functions in the random series construction, and
the Kolmogorov continuity theorem is used to extend regularity considerations
to the space of Hölder continuous functions. Bayes' theorem is derived in
this prior setting, and here interpreted as finding conditions under which the
posterior is absolutely continuous with respect to the prior, and determining
a formula for the Radon-Nikodym derivative in terms of the likelihood of the
data. Having established the form of the posterior, we then describe various
properties common to it in the infinite-dimensional setting. These properties
include well-posedness, approximation theory, and the existence of maximum
a posteriori estimators. We then describe measure-preserving dynamics, again
on the infinite-dimensional space, including Markov chain Monte Carlo and
sequential Monte Carlo methods, and measure-preserving reversible stochastic
M. Dashti
Department of Mathematics, University of Sussex, Brighton, UK
e-mail: m.dashti@sussex.ac.uk
A.M. Stuart
Mathematics Institute, University of Warwick, Coventry, UK
e-mail: a.m.stuart@warwick.ac.uk
Keywords
Inverse problems · Bayesian inversion · Tikhonov regularization and MAP estimators · Markov chain Monte Carlo · Sequential Monte Carlo · Langevin stochastic partial differential equations
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
1.1 Bayesian Inversion on Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
1.2 Inverse Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
1.3 Elliptic Inverse Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
1.4 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
2 Prior Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
2.1 General Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
2.2 Uniform Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
2.3 Besov Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
2.4 Gaussian Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
2.5 Random Field Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
2.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
3 Posterior Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
3.1 Conditioned Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
3.2 Bayes Theorem for Inverse Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
3.3 Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
3.4 Elliptic Inverse Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
3.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
4 Common Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
4.1 Well Posedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
4.2 Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
4.3 MAP Estimators and Tikhonov Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
4.4 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
5 Measure Preserving Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
5.1 General Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
5.2 Metropolis-Hastings Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
5.3 Sequential Monte Carlo Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
5.4 Continuous Time Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
5.5 Finite-Dimensional Langevin Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
5.6 Infinite-Dimensional Langevin Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
5.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
A Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
A.1 Function Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
A.2 Probability and Integration In Infinite Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . 402
1 Introduction
$$y = G(u).$$
1. The first difficulty, which may be illustrated in the case where $n = J$, concerns the fact that often the equation is perturbed by noise, so that we should really consider the equation

$$y = G(u) + \eta, \qquad (10.1)$$

where $\eta \in \mathbb{R}^J$ represents the observational noise which enters the observed data. Assume further that $G$ maps $\mathbb{R}^J$ into a proper subset of itself, $\mathrm{Im}\,G$, and that $G$ has a unique inverse as a map from $\mathrm{Im}\,G$ into $\mathbb{R}^J$. It may then be the case that, because of the noise, $y \notin \mathrm{Im}\,G$, so that simply inverting $G$ on the data $y$ is not possible. Furthermore, the specific instance of $\eta$ which enters the data may not be known to us; typically, at best, only the statistical properties of a typical noise are known. Thus we cannot subtract $\eta$ from the observed data $y$ to obtain something in $\mathrm{Im}\,G$. Even if $y \in \mathrm{Im}\,G$, the uncertainty caused by the presence of noise causes problems for the inversion.
2. The second difficulty is manifest in the case where n > J so that the system
is underdetermined: the number of equations is smaller than the number of
unknowns. How do we attach a sensible meaning to the concept of solution in
this case where, generically, there will be many solutions?
distribution of $(u, y)$. We then define the solution of the inverse problem to be the probability distribution of $u$ given $y$, denoted $u|y$. This allows us to model the noise via its statistical properties, even if we do not know the exact instance of the noise entering the given data. It also allows us to specify a priori the form of solutions that we believe to be more likely, thereby enabling us to attach weights to multiple solutions which explain the data. This is the Bayesian approach to inverse problems.
To this end, we define a random variable $(u, y) \in \mathbb{R}^n \times \mathbb{R}^J$ as follows. We let $u \in \mathbb{R}^n$ be a random variable with (Lebesgue) density $\rho_0(u)$. Assume that $y|u$ ($y$ given $u$) is defined via the formula (10.1), where $G : \mathbb{R}^n \to \mathbb{R}^J$ is measurable and $\eta$ is independent of $u$ (we sometimes write this as $\eta \perp u$) and distributed according to measure $\mathbb{Q}_0$ with Lebesgue density $\rho(\cdot)$. Then $y|u$ is simply found by shifting $\mathbb{Q}_0$ by $G(u)$ to the measure $\mathbb{Q}_u$ with Lebesgue density $\rho(y - G(u))$. It follows that $(u, y) \in \mathbb{R}^n \times \mathbb{R}^J$ is a random variable with Lebesgue density $\rho(y - G(u))\rho_0(u)$.
The following theorem allows us to calculate the distribution of the random variable $u|y$:
$$\pi^y(u) = \frac{1}{Z}\,\rho\big(y - G(u)\big)\,\rho_0(u).$$

This is a form of Bayes' formula,

$$\mathbb{P}(u|y) = \frac{1}{\mathbb{P}(y)}\,\mathbb{P}(y|u)\,\mathbb{P}(u).$$
$$\frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp\big(-\Phi(u; y)\big), \qquad Z = \int_{\mathbb{R}^n} \exp\big(-\Phi(u; y)\big)\,\mu_0(du). \qquad (10.2)$$

Thus the posterior is absolutely continuous with respect to the prior, and the Radon-Nikodym derivative is proportional to the likelihood. This is a rewriting of Bayes' formula in the form

$$\frac{1}{\mathbb{P}(u)}\,\mathbb{P}(u|y) = \frac{1}{\mathbb{P}(y)}\,\mathbb{P}(y|u).$$

$$\mathbb{E}^{\mu^y} f(u) = \mathbb{E}^{\mu_0}\Big[\frac{d\mu^y}{d\mu_0}(u)\,f(u)\Big].$$
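In the finite-dimensional setting, these formulas can be explored by brute force: the sketch below normalizes $\rho(y - G(u))\rho_0(u)$ on a one-dimensional grid, assuming Gaussian noise; all names are illustrative rather than part of the text:

```python
import math

def posterior_on_grid(y, G, grid, prior_pdf, noise_sd=1.0):
    """Normalized posterior pi^y(u) on a 1-D grid, for y = G(u) + eta with
    Gaussian noise: pi^y(u) is proportional to rho(y - G(u)) * rho_0(u)."""
    rho = lambda e: math.exp(-0.5 * (e / noise_sd) ** 2)   # noise density
    weights = [rho(y - G(u)) * prior_pdf(u) for u in grid]
    Z = sum(weights)                                       # normalizing constant
    return [w / Z for w in weights]
```

For a linear $G$ and Gaussian prior this reproduces the familiar Gaussian posterior; for nonlinear $G$ the same two lines of arithmetic still apply, which is the point of the density formulation.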
This inverse problem illustrates the first difficulty, labeled 1. in the previous subsection, which motivates the Bayesian approach to inverse problems. Let $D \subset \mathbb{R}^d$ be a bounded open set with Lipschitz boundary $\partial D$. Then define the Hilbert space $H$ and the operator $A$ as follows:

$$H = \big(L^2(D),\ \langle \cdot, \cdot \rangle,\ \|\cdot\|\big);$$
We make the following assumption about the spectrum of A which is easily verified
for simple geometries, but in fact holds quite generally.
10 The Bayesian Approach to Inverse Problems 317
Assumption 1. The eigenvalue problem

$$A\varphi_j = \lambda_j\varphi_j$$

has normalized eigenfunctions $\{\varphi_j\}$ which form a basis for $H$. Furthermore, the eigenvalues are positive and, if ordered to be increasing, satisfy $\lambda_j \asymp j^{2/d}$.

Here and in the remainder of the notes, the notation $\asymp$ denotes the existence of constants $C^\pm > 0$ such that

$$C^- j^{2/d} \le \lambda_j \le C^+ j^{2/d}$$

for all $j \in \mathbb{N}$.
Any $w \in H$ can be written as

$$w = \sum_{j=1}^\infty \langle w, \varphi_j\rangle\,\varphi_j,$$

and the spaces $\mathcal{H}^t$ are defined through the norm

$$\|w\|^2_{\mathcal{H}^t} = \sum_{j=1}^\infty j^{\frac{2t}{d}}\,|w_j|^2,$$

where $w_j = \langle w, \varphi_j\rangle$.
Consider the heat equation

$$\frac{dv}{dt} + Av = 0, \qquad v(0) = u. \tag{10.4}$$

Lemma 1. Let Assumption 1 hold. Then for every $u \in H$ and every $s > 0$, there is a unique solution $v$ of equation (10.4) in the space $C([0,\infty); H) \cap C((0,\infty); \mathcal{H}^s)$. We write $v(t) = \exp(-At)u$.
To motivate this statement, and in particular the high degree of regularity seen at each fixed $t$, we argue as follows. Note that, if the initial condition is expanded in the eigenbasis as

$$u = \sum_{j=1}^\infty u_j\varphi_j, \qquad u_j = \langle u, \varphi_j\rangle,$$

then the solution has the form

$$v(t) = \sum_{j=1}^\infty u_j\,e^{-\lambda_j t}\varphi_j.$$
Thus

$$\|v(t)\|^2_{\mathcal{H}^s} = \sum_{j=1}^\infty j^{2s/d}e^{-2\lambda_j t}|u_j|^2 \asymp \sum_{j=1}^\infty \lambda_j^{s}e^{-2\lambda_j t}|u_j|^2$$
$$= t^{-s}\sum_{j=1}^\infty (\lambda_j t)^{s}e^{-2\lambda_j t}|u_j|^2 \le C t^{-s}\sum_{j=1}^\infty |u_j|^2 = C t^{-s}\|u\|^2_H.$$
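The smoothing estimate just derived is easy to observe numerically. The sketch below uses the explicit Dirichlet eigenpairs on $D = (0,1)$, a truncation level, and rough initial data as illustrative choices.

```python
import numpy as np

# Spectral solution of the heat equation (10.4) on D = (0,1), where
# A = -d^2/dx^2 with Dirichlet boundary conditions has eigenvalues
# lambda_j = (pi j)^2.  Sobolev norms are computed via eigenvalue weights.
J = 5000
j = np.arange(1, J + 1)
lam = (np.pi * j) ** 2
u_coeff = 1.0 / j                  # rough initial data: in H, not in H^1

def Hs_norm_sq(c, s):
    # squared H^s norm of the function with eigenbasis coefficients c
    return np.sum(lam ** s * c ** 2)

norm_H = Hs_norm_sq(u_coeff, 0.0)
# smoothing estimate ||v(t)||_{H^1}^2 <= C t^{-1} ||u||_H^2, where each
# mode contributes at most (1/(2e)) |u_j|^2 since sup_x x e^{-2x} = 1/(2e)
ratios = [t * Hs_norm_sq(u_coeff * np.exp(-lam * t), 1.0) / norm_H
          for t in (1e-1, 1e-2, 1e-3)]
assert max(ratios) <= 1 / (2 * np.e)
```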
The inverse problem is to recover the initial condition from a noisy observation of the solution at time $t = 1$:

$$y = v(1) + \eta = G(u) + \eta = e^{-A}u + \eta.$$

Here $\eta \in H$ is noise and $G(u) := v(1) = e^{-A}u$. Formally this looks like an infinite-dimensional linear version of the inverse problem (10.1), extended from finite dimensions to a Hilbert space setting. However, the infinite-dimensional setting throws up significant new issues. To see this, assume that there is $c > 0$ such that $\eta$ has regularity $\mathcal{H}^\beta$ if and only if $\beta < c$. Then $y$ is not in the image space of $G$, which is, of course, contained in $\cap_{s>0}\mathcal{H}^s$. Applying the formal inverse of $G$ to $y$ results in an object which is not in $H$.
To overcome this problem, we will apply a Bayesian approach and hence will need to put probability measures on the Hilbert space $H$; in particular we will want to study $\mathbb{P}(u)$, $\mathbb{P}(y|u)$ and $\mathbb{P}(u|y)$, all probability measures on $H$.
One motivation for adopting the Bayesian approach to inverse problems is that prior modeling is a transparent approach to dealing with under-determined inverse problems; it forms a rational approach to dealing with the second difficulty, labeled 2. in Sect. 1.1. The elliptic inverse problem we now describe is a concrete example of an under-determined inverse problem.
As in Sect. 1.2, $D \subset \mathbb{R}^d$ denotes a bounded open set, with Lipschitz boundary $\partial D$. We define the Gelfand triple of Hilbert spaces $V \subset H \subset V^*$ by

$$H = \bigl(L^2(D), \langle\cdot,\cdot\rangle, \|\cdot\|\bigr), \qquad V = \bigl(H^1_0(D), \langle\nabla\cdot,\nabla\cdot\rangle, \|\cdot\|_V := \|\nabla\cdot\|\bigr), \tag{10.5}$$

with $V^*$ the dual of $V$ with respect to the pairing induced by $H$. Note that $\|\cdot\| \le C_p\|\cdot\|_V$ for some constant $C_p$: the Poincaré inequality. Let $\kappa \in X := L^\infty(D)$ satisfy

$$\operatorname*{ess\,inf}_{x\in D}\,\kappa(x) =: \kappa_{\min} > 0, \tag{10.6}$$

and consider the equation

$$-\nabla\cdot\bigl(\kappa\nabla p\bigr) = f, \quad x \in D, \tag{10.7a}$$
$$p = 0, \quad x \in \partial D. \tag{10.7b}$$
Lemma 2. Assume that $f \in V^*$ and that $\kappa$ satisfies (10.6). Then (10.7) has a unique weak solution $p \in V$. This solution satisfies

$$\|p\|_V \le \|f\|_{V^*}/\kappa_{\min}$$

and, if $f \in H$,

$$\|p\|_V \le C_p\|f\|/\kappa_{\min}.$$

The inverse problem is to find $\kappa$ from a finite set of noisy continuous linear functionals of the solution:

$$y_j = l_j(p) + \eta_j, \qquad j = 1,\dots,J. \tag{10.8}$$

Concatenating, we write

$$y = G(u) + \eta \tag{10.9}$$
in the case where $\kappa = u$. On the other hand, if $\kappa = \exp(u)$, then the positivity constraint (10.6) is satisfied for any $u \in X$, and we may take $X^+ = X$.
Notice that we again need probability measures on function space, here the Banach space $X = L^\infty(D)$. Furthermore, in the case where $\kappa = u$, these probability measures should charge only positive functions, in view of the desired inequality (10.6). Probability on Banach spaces of functions is most naturally developed in the setting of separable spaces, which $L^\infty(D)$ is not. This difficulty can be circumvented in various different ways, as we describe in what follows.
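Although the measure-theoretic issues are infinite-dimensional, the forward elliptic problem itself is straightforward to discretize. The following sketch solves a one-dimensional analogue of (10.7) by finite differences with $\kappa = \exp(u)$ and checks the energy bound of Lemma 2 discretely; the grid, $u$, and $f$ are illustrative choices.

```python
import numpy as np

# Finite-difference solve of the Darcy problem (10.7) on D = (0,1):
# -(kappa p')' = f, p(0) = p(1) = 0, with kappa = exp(u) so that the
# positivity condition (10.6) holds for any bounded u.
def solve_darcy(kappa_half, f, h):
    # standard 3-point scheme with kappa evaluated at cell midpoints
    n = len(f)
    main = (kappa_half[:-1] + kappa_half[1:]) / h**2
    off = -kappa_half[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(A, f)

n = 199
h = 1.0 / (n + 1)
x_half = np.linspace(h / 2, 1 - h / 2, n + 1)      # cell midpoints
u = np.sin(2 * np.pi * x_half)
kappa = np.exp(u)                                   # kappa_min >= exp(-1)
p = solve_darcy(kappa, np.ones(n), h)

# discrete check of the bound ||p||_V <= C_p ||f|| / kappa_min (Lemma 2),
# with C_p = 1/pi the Poincare constant on (0,1) and ||f|| = 1
grad_p = np.diff(np.concatenate(([0.0], p, [0.0]))) / h
norm_V = np.sqrt(h * np.sum(grad_p**2))
bound = (1 / np.pi) / np.exp(-1.0)
assert 0 < norm_V <= bound
```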
Section 1.1. See [11] for a general overview of the Bayesian approach to
statistics in the finite-dimensional setting. The Bayesian approach to linear
inverse problems with Gaussian noise and prior in finite dimensions is discussed
in [92, Chapters 2 and 6] and, with a more algorithmic flavor, in the book [53].
Section 1.2. For details on the heat equation as an ODE in Hilbert space, and the
regularity estimates of Lemma 1, see [70, 80]. The classical approach to linear
inverse problems is described in numerous books; see, for example, [32,51]. The
case where the spectrum of the forward map G decays exponentially, as arises for
the heat equation, is sometimes termed severely ill posed. The Bayesian approach
to linear inverse problems was developed systematically in [68, 71], following
from the seminal paper [36] in which the approach was first described; for further
reading on ill-posed linear problems, see [92, Chapters 3 and 6]. Recovering
the truth underlying the data from the Bayesian approach, known as Bayesian
posterior consistency, is the topic of [3, 55]; generalizations to severely ill-posed
problems, such as the heat equation, may be found in [4, 56].
Section 1.3. See [33] for the Lax-Milgram theory which gives rise to Lemma 2.
For classical inversion theory for the elliptic inverse problem determining the
permeability from the pressure in a Darcy model of flow in a porous medium
see [8, 86]; for Bayesian formulations see [24, 25]. For posterior consistency
results see [99].
2 Prior Modeling
be used for our random series; in the case where X is not separable, the resulting
probability measure will be constructed on a separable subspace X 0 of X (see the
discussion in Sect. 2.1).
Section 2.1 describes this general setting, and Sects. 2.2, 2.3 and 2.4 consider,
in turn, three classes of priors termed uniform, Besov and Gaussian. In Sect. 2.5
we link the random series construction to the widely used random field perspective
on spatial stochastic processes and we summarize in Sect. 2.6. We denote the prior
measures constructed in this section by $\mu_0$.
We let $\{\varphi_j\}_{j=1}^\infty$ denote an infinite sequence in the Banach space $X$, with norm $\|\cdot\|$, of $\mathbb{R}$-valued functions defined on a domain $D$. We will either take $D \subset \mathbb{R}^d$, a bounded, open set with Lipschitz boundary, or $D = \mathbb{T}^d$, the $d$-dimensional torus. We normalize these functions so that $\|\varphi_j\| = 1$ for all $j$. We also introduce another element $m_0 \in X$, not necessarily normalized to $1$. Define the function $u$ by

$$u = m_0 + \sum_{j=1}^\infty u_j\varphi_j. \tag{10.11}$$
$u$ live. We discuss this perspective in the summary Sect. 2.6. In dealing with the random series construction, we will also find it useful to consider the truncated random functions

$$u^N = m_0 + \sum_{j=1}^N u_j\varphi_j, \qquad u_j = \gamma_j\xi_j. \tag{10.12}$$
For the uniform prior we take $\xi = \{\xi_j\}_{j\ge1}$ i.i.d. uniform on $[-1,1]$ and assume that $\|\gamma\|_{\ell^1} \le \frac{\delta}{1+\delta}m_{\min}$; then

$$\operatorname*{ess\,inf}_{x\in D}\,u(x) \ge m_{\min} - \sum_{j=1}^\infty|\gamma_j| \ge m_{\min} - \|\gamma\|_{\ell^1} \ge \frac{1}{1+\delta}\,m_{\min},$$

so that

$$u(x) \ge \frac{1}{1+\delta}\,m_{\min} > 0, \quad \text{a.e. } x \in D. \tag{10.13}$$
Since the r.h.s. is nonrandom, we have that for all $r \in \mathbb{Z}^+$ the random variable $p \in L^r_{\mathbb{P}}(\Omega; V)$:

$$\mathbb{E}\|p\|^r_V < \infty.$$
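A draw from the uniform prior, and the bound (10.13), can be simulated directly from the random series; the basis, mean function, and the choice of $\gamma$ below are illustrative, with the $\ell^1$ budget enforced explicitly.

```python
import numpy as np

# Draws of u = m_0 + sum_j gamma_j xi_j phi_j with xi_j ~ U[-1,1] i.i.d.,
# m_0 = m_min = 1 constant, and ||gamma||_1 = delta/(1+delta) m_min, so
# that every draw obeys the lower bound (10.13).
rng = np.random.default_rng(0)
J, m_min, delta = 64, 1.0, 1.0
x = np.linspace(0, 1, 512, endpoint=False)
phi = np.array([np.sin(np.pi * (j + 1) * x) for j in range(J)])
phi /= np.abs(phi).max(axis=1, keepdims=True)       # ||phi_j||_inf = 1
gamma = 1.0 / np.arange(1, J + 1) ** 2
gamma *= (delta / (1 + delta)) * m_min / gamma.sum()  # l^1 budget

lower = m_min / (1 + delta)
for _ in range(100):
    xi = rng.uniform(-1, 1, J)
    u = m_min + (gamma[:, None] * xi[:, None] * phi).sum(axis=0)
    assert u.min() >= lower - 1e-12                 # bound (10.13)
```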
and

$$S_1 = \sum_{j=1}^\infty|\gamma_j|^2, \qquad S_2 = \sum_{j=1}^\infty|\gamma_j|^2\, j^{a} < \infty.$$
Example 2. Let $\{\varphi_j\}$ denote the Fourier basis for $L^2(D)$ with $D = [0,1]^d$. Then we may take $a = 1$. If $\gamma_j = j^{-s}$, then $s > 1$ ensures $\gamma \in \ell^1$. Furthermore,

$$\sum_{j=1}^\infty|\gamma_j|^2\, j^{a} = \sum_{j=1}^\infty j^{\,a-2s} < \infty,$$

since $a - 2s < -1$.
of real-valued periodic functions in dimension $d \le 3$, with inner product and norm denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively. We then set $m_0 = 0$ and let $\{\varphi_j\}_{j=1}^\infty$ be an orthonormal basis for $X$. Consequently, for any $u \in X$, we have for a.e. $x \in \mathbb{T}^d$,

$$u(x) = \sum_{j=1}^\infty u_j\varphi_j(x), \qquad u_j = \langle u, \varphi_j\rangle. \tag{10.16}$$
where

$$\|u\|_{X^{t,q}} = \Bigl(\sum_{j=1}^\infty j^{\bigl(\frac{tq}{d}+\frac{q}{2}-1\bigr)}\,|u_j|^q\Bigr)^{\frac1q},$$

with $q \in [1,\infty)$ and $t > 0$. If $\{\varphi_j\}$ form the Fourier basis and $q = 2$, then $X^{t,2}$ is the Sobolev space $\dot H^t(\mathbb{T}^d)$ of mean-zero periodic functions with $t$ (possibly non-integer) square-integrable derivatives; in particular, $X^{0,2} = \dot L^2(\mathbb{T}^d)$. On the other hand, if the $\{\varphi_j\}$ form certain wavelet bases, then $X^{t,q}$ is the Besov space $B^t_{qq}$.
1
As described above, we assume that uj D j j where D fj gj D1 is an i.i.d.
sequence and D fj g1 j D1 is deterministic. Here we assume that 1 is drawn from
the centred measure on R with density proportional to exp 12 jxjq for some 1
q < 1 we refer to this as a q-exponential distribution, noting that q D 2 gives
a Gaussian and q D 1 a Laplace-distributed random variable. Then for s > 0 and
> 0, we define
q1
. ds C 12 q1 / 1
j D j : (10.17)
The parameter is a key scaling parameter which will appear in the statement of
exponential moment bounds below.
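Sampling the $q$-exponential distribution is convenient via a Gamma change of variables: if $T \sim \mathrm{Gamma}(1/q,\,2)$ and $S$ is an independent random sign, then $S\,T^{1/q}$ has density proportional to $\exp(-\frac12|x|^q)$. The following sketch checks this for the Gaussian ($q=2$) and Laplace ($q=1$) cases.

```python
import numpy as np

# Sampler for the q-exponential density ~ exp(-|x|^q / 2): if
# T ~ Gamma(shape=1/q, scale=2) and S is a random sign, then S * T^(1/q)
# has the desired density (one-line change-of-variables check).
rng = np.random.default_rng(1)

def q_exponential(q, size):
    t = rng.gamma(shape=1.0 / q, scale=2.0, size=size)
    return rng.choice([-1.0, 1.0], size=size) * t ** (1.0 / q)

# q = 2 reproduces the standard Gaussian: Var = 1.
xs = q_exponential(2.0, 200_000)
assert abs(xs.var() - 1.0) < 0.02
# q = 1 gives a Laplace variable with density ~ exp(-|x|/2): Var = 8.
xl = q_exponential(1.0, 200_000)
assert abs(xl.var() - 8.0) < 0.3
```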
We now prove convergence of the series (found from (10.12) with $m_0 = 0$)

$$u^N = \sum_{j=1}^N u_j\varphi_j, \qquad u_j = \gamma_j\xi_j, \tag{10.18}$$
This is a Banach space when equipped with the norm $\bigl(\mathbb{E}\|v\|^q_{X^{t,q}}\bigr)^{1/q}$; thus every Cauchy sequence is convergent in this space. One may show that the sequence $\{u^N\}$ is Cauchy in the Banach space $L^q_{\mathbb{P}}(\Omega; X^{t,q})$. Thus the infinite series (10.19) exists as an $L^q_{\mathbb{P}}$-limit and takes values in $X^{t,q}$ almost surely, for all $t < s - \frac{d}{q}$.
Theorem 5. Assume that $u$ is given by (10.19) and (10.17) with $\xi_1$ drawn from a centred $q$-exponential distribution. Then the following are equivalent:

(i) $\|u\|_{X^{t,q}} < \infty$ almost surely;
(ii) $\mathbb{E}\exp\bigl(\alpha\|u\|^q_{X^{t,q}}\bigr) < \infty$ for all $\alpha \in [0, \delta/2)$;
(iii) $t < s - \frac{d}{q}$.

Proof. (iii) $\Rightarrow$ (ii). We compute

$$\mathbb{E}\exp\bigl(\alpha\|u\|^q_{X^{t,q}}\bigr) = \mathbb{E}\exp\Bigl(\alpha\delta^{-1}\sum_{j=1}^\infty j^{-\frac{(s-t)q}{d}}|\xi_j|^q\Bigr) = \prod_{j=1}^\infty\Bigl(1 - 2\alpha\delta^{-1}j^{-\frac{(s-t)q}{d}}\Bigr)^{-\frac1q}.$$

For $\alpha < \delta/2$ the product converges if $\frac{(s-t)q}{d} > 1$, i.e., $t < s - \frac{d}{q}$, as required.
(ii) $\Rightarrow$ (i). If (i) does not hold, then $Z := \|u\|^q_{X^{t,q}}$ is infinite on a set of positive measure $S$. Since, for $\alpha > 0$, $\exp(\alpha Z) = +\infty$ if $Z = +\infty$, and $\mathbb{E}\exp(\alpha Z) \ge \mathbb{E}\bigl(\mathbb{1}_S\exp(\alpha Z)\bigr)$, we get a contradiction to (ii).
(i) $\Rightarrow$ (iii). To show that (i) implies (iii), note that (i) implies that, almost surely,

$$\sum_{j=1}^\infty j^{\frac{(t-s)q}{d}}|\xi_j|^q < \infty.$$

This implies that $t < s$. To see this, assume for contradiction that $t \ge s$. Then, almost surely,

$$\sum_{j=1}^\infty|\xi_j|^q < \infty.$$

Since there is a constant $c > 0$ with $\mathbb{E}|\xi_j|^q = c$ for every $j \in \mathbb{N}$, this contradicts the law of large numbers.

Now define $\chi_j = j^{\frac{(t-s)q}{d}}|\xi_j|^q$. Using the fact that the $\chi_j$ are nonnegative and independent, we deduce from Lemma 3 (below) that

$$\sum_{j=1}^\infty \mathbb{E}\bigl(\chi_j\wedge 1\bigr) = \sum_{j=1}^\infty \mathbb{E}\Bigl(j^{\frac{(t-s)q}{d}}|\xi_j|^q\wedge 1\Bigr) < \infty,$$
where

$$I_j \propto j^{\frac{(s-t)q}{d}}\int_{j^{(s-t)/d}}^\infty x^q\,e^{-x^q/2}\,dx.$$
Noting that, since $q \ge 1$, the function $x \mapsto x^q e^{-x^q/2}$ is bounded, up to a constant of proportionality, by the function $x \mapsto e^{-\varepsilon x}$ for any $\varepsilon < \frac12$, we see that there is a positive constant $K$ such that

$$I_j \le K j^{\frac{(s-t)q}{d}}\int_{j^{(s-t)/d}}^\infty e^{-\varepsilon x}\,dx = \frac{K}{\varepsilon}\, j^{\frac{(s-t)q}{d}}\exp\bigl(-\varepsilon j^{(s-t)/d}\bigr) =: K_j.$$
Since the $K_j$ are summable, this implies that

$$\sum_{j=1}^\infty j^{\frac{(t-s)q}{d}} < \infty,$$

and hence $t < s - \frac{d}{q}$, as required. $\sqcap\!\sqcup$
Lemma 3. Let $\{I_j\}_{j\ge1}$ be an independent sequence of nonnegative random variables. Then

$$\sum_{j=1}^\infty I_j < \infty \ \ \text{a.s.} \iff \sum_{j=1}^\infty \mathbb{E}\bigl(I_j\wedge 1\bigr) < \infty.$$
As in the previous subsection, we now study the situation where the family $\{\varphi_j\}$ has a uniform Hölder exponent, and study the implications for Hölder continuity of the random function $u$. In this case, however, the basis functions are normalized in $\dot L^2$ and not $L^\infty$; thus we must make additional assumptions on the possible growth of the $L^\infty$ norms of $\{\varphi_j\}$ with $j$. We assume that there are $C, a, b > 0$ and $\alpha \in (0,1]$ such that (10.20) holds. We also assume that $a > b$ since, because $\|\varphi_j\|_{L^2} = 1$, it is natural that the premultiplication constant in the Hölder estimate on the $\{\varphi_j\}$ grows in $j$ at least as fast as the bound on the functions themselves.
Theorem 6. Assume that $u$ is given by (10.19) and (10.17) with $\xi_1$ drawn from a centred $q$-exponential distribution. Suppose also that (10.20) holds and that $s > d\bigl(b + \frac1q + \frac{\eta}{2}(a-b)\bigr)$ for some $\eta \in (0,2)$. Then $\mathbb{P}$-a.s. we have $u \in C^{0,\beta}(\mathbb{T}^d)$ for all $\beta < \frac{\eta\alpha}{2}$.
Proof. It suffices to bound the sums

$$S_1 = \sum_{j=1}^\infty|\gamma_j|^2\, j^{2b} \lesssim \sum_{j=1}^\infty j^{-c_1}, \qquad S_2 = \sum_{j=1}^\infty|\gamma_j|^2\, j^{2b+\eta(a-b)} \lesssim \sum_{j=1}^\infty j^{-c_2},$$

where

$$c_1 = \frac{2s}{d} - \frac{2}{q} + 1 - 2b, \qquad c_2 = \frac{2s}{d} - \frac{2}{q} + 1 - 2b - \eta(a-b).$$

We require $c_1 > 1$ and $c_2 > 1$, and since $a > b$, satisfaction of the second of these will imply the first. Satisfaction of the second gives the desired lower bound on $s$. $\sqcap\!\sqcup$
We note that the result of Theorem 6 holds true when the mean function is nonzero if it satisfies

$$|m_0(x)| \le C, \qquad x \in D.$$
We have the following sharper result if the family $\{\varphi_j\}$ is regular enough to be a basis for $B^t_{qq}$ instead of satisfying (10.20):

Theorem 7. Assume that $u$ is given by (10.19) and (10.17) with $\xi_1$ drawn from a centred $q$-exponential distribution. Suppose also that $\{\varphi_j\}_{j\in\mathbb{N}}$ form a basis for $B^t_{qq}$ for some $t < s - \frac{d}{q}$. Then $u \in C^{0,t}(\mathbb{T}^d)$, $\mathbb{P}$-almost surely.
Proof. We compute moments of the Besov norm:

$$\|u\|^{mq}_{B^t_{mq,mq}} = (\delta^{-1})^m\sum_{j=1}^\infty j^{\bigl(\frac{mqt}{d}+\frac{mq}{2}-1\bigr)}\, j^{-mq\bigl(\frac{s}{d}+\frac12-\frac1q\bigr)}\,|\xi_j|^{mq}.$$

For every $m \in \mathbb{N}$, there exists a constant $C_m$ with $\mathbb{E}|\xi_j|^{mq} = C_m$. Since each term of the above series is measurable, we can swap the sum and the integration and write

$$\mathbb{E}\|u\|^{mq}_{B^t_{mq,mq}} = C_m(\delta^{-1})^m\sum_{j=1}^\infty j^{\frac{mq}{d}(t-s)+m-1} =: \tilde C_m,$$
noting that the exponent of $j$ is smaller than $-1$ (since $t < s - d/q$). Now, for a given $t < s - d/q$, one can choose $m$ large enough so that $\frac{d}{mq} < s - \frac{d}{q} - t$. Then the embedding $B^{t_1}_{mq,mq} \subset C^t$, valid for any $t_1$ satisfying $t + \frac{d}{mq} < t_1 < s - \frac{d}{q}$, implies that $\mathbb{E}\|u\|^{mq}_{C^t(\mathbb{T}^d)} < \infty$. It follows that $u \in C^t$, $\mathbb{P}$-almost surely. $\sqcap\!\sqcup$
t
If the mean function m0 is t -Hlder continuous, the result of the above theorem
holds for a random series with nonzero mean function as well.
This is in fact a Hilbert space, although we will not use the Hilbert space structure. We will only use the fact that $L^2_{\mathbb{P}}$ is a Banach space when equipped with the norm $\bigl(\mathbb{E}\|v\|^2_{\mathcal{H}^t}\bigr)^{\frac12}$, and that hence every Cauchy sequence is convergent.

Theorem 8. Assume that $\gamma_j \asymp j^{-\frac{s}{d}}$. Then the sequence of functions $\{u^N\}_{N=1}^\infty$ given by (10.18) is Cauchy in the Hilbert space $L^2_{\mathbb{P}}(\Omega; \mathcal{H}^t)$, $t < s - \frac{d}{2}$. Thus, the infinite series (10.19) exists as an $L^2_{\mathbb{P}}$ limit and takes values in $\mathcal{H}^t$ almost surely, for $t < s - \frac{d}{2}$.
Proof. For $N > M$,

$$\mathbb{E}\|u^N - u^M\|^2_{\mathcal{H}^t} = \mathbb{E}\sum_{j=M+1}^N j^{\frac{2t}{d}}|u_j|^2 \asymp \sum_{j=M+1}^N j^{\frac{2(t-s)}{d}} \le \sum_{j=M+1}^\infty j^{\frac{2(t-s)}{d}}.$$

The sum on the right-hand side tends to $0$ as $M \to \infty$, provided $\frac{2(t-s)}{d} < -1$, i.e. $t < s - \frac{d}{2}$, by the dominated convergence theorem. This completes the proof. $\sqcap\!\sqcup$
The preceding theorem shows that the sum (10.18) has an $L^2_{\mathbb{P}}$ limit in $\mathcal{H}^t$ when $t < s - d/2$, as one can also see from the following direct calculation:

$$\mathbb{E}\|u\|^2_{\mathcal{H}^t} = \sum_{j=1}^\infty j^{\frac{2t}{d}}\,\mathbb{E}\bigl(\gamma_j^2\xi_j^2\bigr) = \sum_{j=1}^\infty j^{\frac{2t}{d}}\gamma_j^2 \asymp \sum_{j=1}^\infty j^{\frac{2(t-s)}{d}} < \infty.$$
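The direct calculation above is easy to verify by Monte Carlo for a truncated series; $d$, $s$, $t$, the truncation level, and the sample size below are illustrative choices.

```python
import numpy as np

# Monte Carlo check of E||u||^2_{H^t} = sum_j j^{2t/d} gamma_j^2 for the
# Gaussian series u = sum_j gamma_j xi_j phi_j with gamma_j = j^{-s/d}
# and xi_j ~ N(0,1) i.i.d.  Here d = 1, s = 2, t = 1, so t < s - d/2.
rng = np.random.default_rng(2)
d, s, t, J = 1, 2.0, 1.0, 10_000
j = np.arange(1, J + 1)
gamma = j ** (-s / d)
weights = j ** (2 * t / d)

exact = np.sum(weights * gamma**2)         # = sum_j j^{2(t-s)/d}
xi = rng.standard_normal((500, J))
norms2 = (weights * (gamma * xi) ** 2).sum(axis=1)
assert abs(norms2.mean() - exact) / exact < 0.1
```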
The following formal calculation, which can be made rigorous if $C$ is trace class on $\mathcal{H}$, gives an expression for the covariance operator:

$$C = \mathbb{E}\,u\otimes u = \mathbb{E}\sum_{j=1}^\infty\sum_{k=1}^\infty \gamma_j\gamma_k\,\xi_j\xi_k\,\varphi_j\otimes\varphi_k = \sum_{j=1}^\infty\sum_{k=1}^\infty \gamma_j\gamma_k\,\mathbb{E}(\xi_j\xi_k)\,\varphi_j\otimes\varphi_k = \sum_{j=1}^\infty \gamma_j^2\,\varphi_j\otimes\varphi_j.$$
From this expression for the covariance, we may find eigenpairs explicitly:

$$C\varphi_k = \sum_{j=1}^\infty \gamma_j^2\,\bigl(\varphi_j\otimes\varphi_j\bigr)\varphi_k = \sum_{j=1}^\infty \gamma_j^2\,\langle\varphi_j,\varphi_k\rangle\,\varphi_j = \gamma_k^2\varphi_k.$$
In the case where $\mathcal{H} = \dot L^2(\mathbb{T}^d)$, we are in the setting of Sect. 2.3 and we briefly consider this case. We assume that the $\{\varphi_j\}_{j=1}^\infty$ constitute the Fourier basis. Let $A = -\triangle$ denote the negative Laplacian equipped with periodic boundary conditions on $[0,1)^d$ and restricted to functions which integrate to zero over $[0,1)^d$. This operator is positive self-adjoint and has eigenvalues which grow like $j^{2/d}$, analogously to Assumption 1 made in the case of Dirichlet boundary conditions. It then follows that $\mathcal{H}^t = D(A^{t/2}) = \dot H^t(\mathbb{T}^d)$, the Sobolev space of periodic functions on $[0,1)^d$ with spatial mean equal to zero and $t$ (possibly negative or fractional) square-integrable derivatives. Thus, by the preceding Remark 2, $u$ defined by (10.19) is in the space $\dot H^t$ a.s., $t < s - \frac{d}{2}$. In fact we can say more about regularity, using the Kolmogorov continuity test and Corollary 4; this we now do.
In this case we must bound the sums

$$S_1 = \sum_{j=1}^\infty \gamma_j^2, \qquad S_2 = \sum_{j=1}^\infty \gamma_j^2\, j^{\eta/d}.$$

The corollary delivers the desired result after noting that any $\eta < 2s - d$ will make $S_2$, and hence $S_1$, summable.
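On the torus, samples of such Gaussian measures can be generated with the FFT by scaling independent Gaussian coefficients with the eigenvalues of the negative Laplacian. The sketch below takes $d = 2$ and, for brevity, takes the real part of a complex draw rather than enforcing Hermitian symmetry of the noise, which only rescales the variance; all parameters are illustrative.

```python
import numpy as np

# One draw of (a rescaled version of) N(0, (-Laplacian)^{-alpha}) on the
# 2-d torus: independent N(0,1) Fourier coefficients scaled by
# lambda_k^{-alpha/2}, lambda_k = 4 pi^2 |k|^2, with the mean mode removed.
rng = np.random.default_rng(3)
n, alpha = 128, 2.0          # alpha > d/2 = 1, so draws lie in L^2 a.s.

k = np.fft.fftfreq(n, d=1.0 / n)             # integer wavenumbers
k2 = k[:, None] ** 2 + k[None, :] ** 2
scale = np.zeros_like(k2)
nz = k2 > 0
scale[nz] = (4 * np.pi**2 * k2[nz]) ** (-alpha / 2)

noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
u = np.fft.ifft2(scale * noise).real * n     # sample on an n x n grid
assert np.isfinite(u).all()
```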
The previous example illustrates the fact that, although we have constructed Gaussian measures in a Hilbert space setting, and they are naturally defined on a range of Hilbert (Sobolev-like) spaces defined through fractional powers of the Laplacian, they may also be defined on Banach spaces, such as the space of Hölder continuous functions. We now return to the setting of the general domain $D$, rather than the $d$-dimensional torus. In this general context, it is important to highlight the Fernique theorem, here restated from the Appendix because of its importance: if $\mu_0$ is a Gaussian measure on the separable Banach space $X$, then there exists $\alpha > 0$ such that

$$\mathbb{E}^{\mu_0}\exp\bigl(\alpha\|u\|^2_X\bigr) < \infty.$$
Example 3. Consider the random function (10.11) in the case where $\mathcal{H} = \dot L^2(\mathbb{T}^d)$ and $\mu_0 = N(0, A^{-s})$, $s > \frac{d}{2}$, as in the preceding example. Then we know that, $\mu_0$-a.s., $u \in C^{0,t}$, $t < 1\wedge(s - \frac{d}{2})$. Set $\kappa = e^u$ in the elliptic PDE (10.7), so that the coefficient $\kappa$, and hence the solution $p$, are random variables on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\kappa_{\min}$ given in (10.6) satisfies

$$\kappa_{\min} \ge \exp\bigl(-\|u\|_\infty\bigr).$$
By Lemma 2 we obtain

$$\|p\|_V \le \exp\bigl(\|u\|_\infty\bigr)\,\|f\|_{V^*}.$$

Furthermore, for any $\varepsilon > 0$, there is a constant $K_2 = K_2(\varepsilon)$ such that $\exp(K_1 r x) \le K_2\exp(\varepsilon x^2)$ for all $x \ge 0$. Thus

$$\|p\|^r_V \le \exp\bigl(K_1 r\|u\|_{C^{0,t}}\bigr)\,\|f\|^r_{V^*} \le K_2\exp\bigl(\varepsilon\|u\|^2_{C^{0,t}}\bigr)\,\|f\|^r_{V^*},$$

and by the Fernique theorem it follows that $\mathbb{E}\|p\|^r_V < \infty$.
This is because the coefficient $\kappa$, while positive a.s., does not satisfy a uniform positive lower bound across the probability space.
where $\omega$ is taken from a common probability space on which the random element $u \in X$ is defined. It is thus necessary to study the joint distribution of a set of $K$ $\mathbb{R}^n$-valued random variables, all on a common probability space. Such $\mathbb{R}^{nK}$-valued random variables are, of course, only defined up to a set of zero measure. It is desirable that all such finite-dimensional distributions are defined on a common subset $\Omega_0 \subset \Omega$ with full measure, so that $u$ may be viewed as a function $u: \Omega_0\times D \to \mathbb{R}^n$; such a choice of random field is termed a modification. When reinterpreting the previous subsections in terms of random fields, statements about almost sure (regularity) properties should be viewed as statements concerning the existence of a modification possessing the stated almost sure regularity property.

We may define the space of functions

$$L^q_{\mathbb{P}}(\Omega; X) := \Bigl\{ v: \Omega\times D \to \mathbb{R}^n \,\Bigm|\, \mathbb{E}\|v\|^q_X < \infty\Bigr\}.$$
This is a Banach space when equipped with the norm $\bigl(\mathbb{E}\|v\|^q_X\bigr)^{\frac1q}$. We have used such spaces in the preceding subsections when demonstrating convergence of the randomized series. Note that we often simply write $u(x)$, suppressing the explicit dependence on the probability space.
Thus the covariance operator of a Gaussian random field is an integral operator with kernel given by the covariance function. As such, we may also view the covariance function as the Green's function of the inverse covariance, or precision.

A mean-zero Gaussian random field is termed stationary if $c(x, y) = s(x - y)$ for some matrix-valued function $s$, so that shifting the field by a fixed random vector does not change the statistics. It is isotropic if it is stationary and, in addition, $s(\cdot) = \iota(|\cdot|)$ for some matrix-valued function $\iota$.
In the previous subsection, we demonstrated how the regularity of random fields may be established from the properties of the sequences $\gamma$ (deterministic, with decay) and $\xi$ (i.i.d. random). Here we show similar results but express them in terms of properties of the covariance function and covariance operator.

Theorem 11. Assume that the covariance function is Hölder continuous with any exponent up to $1$. Then $u$ is almost surely Hölder continuous on $D$ with any exponent smaller than $\frac12$.
Proof. We have

$$\mathbb{E}|u(x) - u(y)|^2 \le C|x - y|.$$

The Kolmogorov continuity test, applied with $2p$ moments, gives Hölder continuity with any exponent smaller than $\frac12 - \frac{d}{2p}$; taking the supremum over $p$,

$$\sup_{p\in\mathbb{N}}\,\min\Bigl\{1, \frac12 - \frac{d}{2p}\Bigr\} = \frac12,$$

and the result follows. $\sqcap\!\sqcup$
Theorem 12. Let $u$ be a sample from the measure $N(0, C)$ in the case where $C = A^{-s}$, with $A$ satisfying Assumption 2 and $s > \frac{d}{2}$. Then, $\mathbb{P}$-a.s., $u \in \dot H^t$ for $t < s - \frac{d}{2}$, and $u \in C^{0,t}(D)$ for $t < 1\wedge(s - \frac{d}{2})$.

Example 4. Consider the case $d = 2$, $n = 1$ and $D = [0,1]^2$. Define the Gaussian random field through the measure $\mu = N(0, (-\triangle)^{-\alpha})$, where $-\triangle$ is the negative Laplacian with domain $H^1_0(D)\cap H^2(D)$. Then Assumption 2 is satisfied by $-\triangle$. By Theorem 12 it follows that choosing $\alpha > 1$ suffices to ensure that draws from $\mu$ are almost surely in $L^2(D)$. It also follows that, in fact, draws from $\mu$ are almost surely in $C(D)$.
2.6 Summary
In the preceding four subsections, we have shown how to create random functions by
randomizing the coefficients of a series of functions. Using these random series, we
have also studied the regularity properties of the resulting functions. Furthermore
we have extended our perspective in the Gaussian case to determine regularity
properties from the properties of the covariance function or the covariance operator.
For the uniform prior, we have shown that the random functions all live in a subset of $X = L^\infty$, characterized by the upper and lower bounds given in Theorem 2 and found as the closure of the linear span of the set of functions $(m_0, \{\varphi_j\}_{j=1}^\infty)$; denote this subset, which is a separable Banach space, by $X'$. For the Besov priors, we have shown in Theorem 5 that the random functions live in the separable Banach spaces $X^{t,q}$ for all $t < s - d/q$; denote any one of these Banach spaces by $X'$. And finally for the Gaussian priors, we have shown in Theorem 8 that the random function exists as an $L^2$ limit in any of the Hilbert spaces $\mathcal{H}^t$ for $t < s - d/2$. Furthermore, we have indicated that, by use of the Kolmogorov continuity theorem, we can also show that the Gaussian random functions lie in certain Hölder spaces; these Hölder spaces are not separable but, by the discussion in Sect. A.1.2, we
can embed the spaces $C^{0,\gamma'}$ in the separable uniform Hölder spaces $C_0^{0,\gamma}$ for any $\gamma < \gamma'$; since the upper bound on the range of Hölder exponents established by use of the Kolmogorov continuity theorem is open, this means we can work in the same range of Hölder exponents, but restricted to uniform Hölder spaces, thereby regaining separability. In this Gaussian case, we denote any of the separable Hilbert or Banach spaces where the Gaussian random function lives almost surely by $X'$.
Thus, in all of these examples, we have created a probability measure $\mu_0$ which is the pushforward of the measure $\mathbb{P}$ on the i.i.d. sequence $\xi$ under the map which takes the sequence into the random function. The resulting measure lives on the separable Banach space $X'$, and we will often write $\mu_0(X') = 1$ to denote this fact. This is shorthand for saying that functions drawn from $\mu_0$ are in $X'$ almost surely. Separability of $X'$ naturally leads to the use of the Borel $\sigma$-algebra to define a canonical measurable space, and to the development of an integration theory (Bochner integration) which is natural on this space; see Sect. A.2.2.
of the Gaussian measure which follows Theorem 8 leads to the answer which
may be rigorously justified by using characteristic functions; see, for example,
Proposition 2.18 in [28]. All three texts include statement and proof of the
Fernique theorem in the generality given here. The Kolmogorov continuity
theorem is discussed in [28] and [1]. Proof of Hölder regularity adapted to the case of the periodic setting may be found in [40] and [92, Chapter 6]. For further
reading on Gaussian measures, see [27].
Section 2.5. A key tool in making the random field perspective rigorous is the
Kolmogorov Extension Theorem 29.
Section 2.6. For a discussion of measure theory on general spaces, see [15]. The
notion of Bochner integral is introduced in [13]; we discuss it in Sect. A.2.2.
3 Posterior Distribution
Key to the development of Bayes' Theorem, and the posterior distribution, is the notion of conditional random variables. In this section we state an important theorem concerning conditioning.

Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ denote a pair of measurable spaces, and let $\nu$ and $\pi$ be probability measures on $X\times Y$. We assume that $\nu \ll \pi$. Thus there exists a $\pi$-measurable $\phi: X\times Y \to \mathbb{R}$ with $\phi \in L^1_\pi$ (see Sect. A.1.4 for the definition of $L^1_\pi$) and

$$\frac{d\nu}{d\pi}(x, y) = \phi(x, y). \tag{10.28}$$
Theorem 13. Assume that the conditional random variable $x|y$ exists under $\pi$, with probability distribution denoted $\pi^y(dx)$. Then the conditional random variable $x|y$ under $\nu$ exists, with probability distribution denoted by $\nu^y(dx)$. Furthermore, $\nu^y \ll \pi^y$ and, if $c(y) := \int_X \phi(x, y)\,d\pi^y(x) > 0$, then

$$\frac{d\nu^y}{d\pi^y}(x) = \frac{1}{c(y)}\,\phi(x, y).$$
Example 5. Let $X = C([0,1]; \mathbb{R})$, $Y = \mathbb{R}$. Let $\pi$ denote the measure on $X\times Y$ induced by the random variable $\bigl(w(\cdot), w(1)\bigr)$, where $w$ is a draw from standard unit Wiener measure on $\mathbb{R}$, starting from $w(0) = z$. Let $\pi^y$ denote the measure on $X$ found by conditioning Brownian motion to satisfy $w(1) = y$; thus $\pi^y$ is a Brownian bridge measure with $w(0) = z$, $w(1) = y$.
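The conditioned measure in Example 5 can be sampled with the standard pathwise construction of the Brownian bridge: subtract from a Brownian path the linear interpolant of its endpoint error. A minimal sketch, with illustrative endpoints:

```python
import numpy as np

# Brownian motion from z, conditioned on its endpoint via the standard
# bridge construction  b(t) = w(t) - t * (w(1) - y),  which pins b(1) = y
# without changing the starting point b(0) = z.
rng = np.random.default_rng(4)
z, y, n = 0.5, 1.5, 1000
t = np.linspace(0, 1, n + 1)
dw = rng.standard_normal(n) * np.sqrt(1.0 / n)
w = z + np.concatenate(([0.0], np.cumsum(dw)))      # Brownian path, w(0)=z
b = w - t * (w[-1] - y)                             # bridge: b(0)=z, b(1)=y
assert abs(b[0] - z) < 1e-12 and abs(b[-1] - y) < 1e-12
```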
Assume that $\nu \ll \pi$ with

$$\frac{d\nu}{d\pi}(x, y) = \exp\bigl(-\Phi(x, y)\bigr).$$

Assume further that

$$\sup_{x\in X}\,\Phi(x, y) = \Phi^+(y) < \infty.$$
Lemma 4. Let $(Z, \mathcal{B})$ be a Borel measurable topological space and assume that $G \in C(Z; \mathbb{R})$ and that $\pi(Z) = 1$ for some probability measure $\pi$ on $(Z, \mathcal{B})$. Then $G$ is a $\pi$-measurable function.
Let $X$ and $Y$ be separable Banach spaces, equipped with the Borel $\sigma$-algebra, and $G: X \to Y$ a measurable mapping. We wish to solve the inverse problem of finding $u$ from $y$, where

$$y = G(u) + \eta, \tag{10.29}$$

and we make the following probabilistic assumptions:

Prior: $u \sim \mu_0$, a measure on $X$.
Noise: $\eta \sim \mathbb{Q}_0$, a measure on $Y$, and (recalling that $\perp$ denotes independence) $\eta \perp u$.

The random variable $y|u$ is then distributed according to the measure $\mathbb{Q}_u$, the translate of $\mathbb{Q}_0$ by $G(u)$. We assume throughout the following that $\mathbb{Q}_u \ll \mathbb{Q}_0$ for $u$ $\mu_0$-a.s. Thus, for some potential $\Phi: X\times Y \to \mathbb{R}$,

$$\frac{d\mathbb{Q}_u}{d\mathbb{Q}_0}(y) = \exp\bigl(-\Phi(u; y)\bigr). \tag{10.30}$$

Thus, for fixed $u$, $\Phi(u; \cdot): Y \to \mathbb{R}$ is measurable and $\mathbb{E}^{\mathbb{Q}_0}\exp\bigl(-\Phi(u; y)\bigr) = 1$. For a given instance of the data $y$, $-\Phi(\cdot; y)$ is termed the log likelihood.

Define $\nu_0$ to be the product measure

$$\nu_0(du, dy) = \mu_0(du)\,\mathbb{Q}_0(dy).$$

We assume in what follows that $\Phi(\cdot, \cdot)$ is $\nu_0$-measurable. Then the random variable $(u, y) \in X\times Y$ is distributed according to the measure $\nu(du, dy) = \mu_0(du)\,\mathbb{Q}_u(dy)$. Furthermore, it then follows that $\nu \ll \nu_0$ with

$$\frac{d\nu}{d\nu_0}(u, y) = \exp\bigl(-\Phi(u; y)\bigr).$$
Proof. First note that the positivity of $Z$ holds for $y$ $\mathbb{Q}_0$-almost surely, and hence, by absolute continuity of $\nu$ with respect to $\nu_0$, for $y$ $\nu$-almost surely. The proof is an application of Theorem 13, with $\pi$ replaced by $\nu_0$, $(x, y) \to (u, y)$, and $\phi(x, y) = \exp\bigl(-\Phi(u; y)\bigr)$. Since $\nu_0(du, dy)$ has product form, the conditional distribution of $u|y$ under $\nu_0$ is simply $\mu_0$. The result follows. $\sqcap\!\sqcup$
We will show how to carry out this program for two examples in the following subsections. The following remark will also be used in studying one of the examples.

We apply Bayesian inversion to the heat equation from Sect. 1.2. Recall that for $G(u) = e^{-A}u$, we have the relationship

$$y = G(u) + \eta.$$

Under Assumption 1, we have $\lambda_j \asymp j^{2/d}$, so that this family of spaces is identical with the Hilbert scale of spaces $\mathcal{H}^t$ as defined in Sects. 1.2 and 2.4.
We choose the prior $\mu_0 = N(0, A^{-\alpha})$, $\alpha > \frac{d}{2}$. Thus $\mu_0(X) = \mu_0(H) = 1$. Indeed, the analysis in Sect. 2.4 shows that $\mu_0(\mathcal{H}^t) = 1$ for $t < \alpha - \frac{d}{2}$. For the likelihood we assume that $\eta \perp u$, with $\mathbb{Q}_0 = N(0, A^{-\beta})$ and $\beta \in \mathbb{R}$. This measure satisfies $\mathbb{Q}_0(\mathcal{H}^{t'}) = 1$ for $t' < \beta - \frac{d}{2}$, and we thus choose $Y = \mathcal{H}^{t'}$ for some $t' < \beta - \frac{d}{2}$. Notice that our analysis includes the case of white observational noise, for which $\beta = 0$. The Cameron--Martin Theorem 32, together with the fact that $e^{-A}$ commutes with arbitrary fractional powers of $A$, can be used to show that $y|u \sim \mathbb{Q}_u := N(G(u), A^{-\beta})$, where $\mathbb{Q}_u \ll \mathbb{Q}_0$ with
$$\frac{d\mathbb{Q}_u}{d\mathbb{Q}_0}(y) = \exp\bigl(-\Phi(u; y)\bigr),$$

and

$$\Phi(u; y) = \frac12\bigl\|A^{\frac{\beta}{2}}e^{-A}u\bigr\|^2 - \bigl\langle A^{\frac{\beta}{2}}e^{-\frac{A}{2}}y,\, A^{\frac{\beta}{2}}e^{-\frac{A}{2}}u\bigr\rangle.$$
In the following we repeatedly use the fact that $A^{\gamma}e^{-\varepsilon A}$, $\varepsilon > 0$, is a bounded linear operator from $\mathcal{H}^a$ to $\mathcal{H}^b$, for any $a, b, \gamma \in \mathbb{R}$. Recall that $\nu_0(du, dy) = \mu_0(du)\,\mathbb{Q}_0(dy)$. Note that $\nu_0(H\times\mathcal{H}^{t'}) = 1$. Using the boundedness of $A^{\gamma}e^{-\varepsilon A}$, it may be shown that

$$\Phi: H\times\mathcal{H}^{t'} \to \mathbb{R}$$

is continuous, and hence $\nu_0$-measurable by Lemma 4. It follows that the posterior is given by

$$\frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp\bigl(-\Phi(u; y)\bigr), \qquad Z = \int_H \exp\bigl(-\Phi(u; y)\bigr)\,\mu_0(du),$$
provided that $Z > 0$ for $y$, $\mathbb{Q}_0$-a.s. We establish this positivity in the remainder of the proof. Since $y \in \mathcal{H}^{t'}$ for any $t' < \beta - \frac{d}{2}$, $\mathbb{Q}_0$-a.s., we have that $y = A^{-t'/2}w_0$ for some $w_0 \in H$ and $t' < \beta - \frac{d}{2}$. Thus we may write

$$\Phi(u; y) = \frac12\bigl\|A^{\frac{\beta}{2}}e^{-A}u\bigr\|^2 - \bigl\langle A^{\frac{\beta - t'}{2}}e^{-\frac{A}{2}}w_0,\, A^{\frac{\beta}{2}}e^{-\frac{A}{2}}u\bigr\rangle. \tag{10.34}$$

Then, using the boundedness of $A^{\gamma}e^{-\varepsilon A}$, $\varepsilon > 0$, together with (10.34), we see that $\Phi(u; y)$ is bounded on bounded subsets of $H$, and, since $\mu_0(\|u\|^2 \le 1) > 0$ (by Theorem 33 all balls have positive measure for Gaussians on a separable Banach space), the required positivity follows.
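Because prior, noise, and forward map all diagonalize in the eigenbasis of $A$, the posterior here is Gaussian and can be computed mode by mode from standard conjugacy formulas. The sketch below does this for illustrative parameter values, making visible how the smoothing $e^{-A}$ wipes out data information in high modes.

```python
import numpy as np

# Mode-by-mode Gaussian conjugacy for the heat-equation posterior:
# y_j = exp(-lambda_j) u_j + eta_j, prior u_j ~ N(0, lambda_j^{-alpha}),
# noise eta_j ~ N(0, lambda_j^{-beta}); beta = 0 is white noise.
rng = np.random.default_rng(5)
J, alpha, beta = 200, 2.0, 0.0
lam = (np.pi * np.arange(1, J + 1)) ** 2
g = np.exp(-lam)                      # eigenvalues of the forward map e^{-A}

u_true = rng.standard_normal(J) * lam ** (-alpha / 2)
y = g * u_true + rng.standard_normal(J) * lam ** (-beta / 2)

post_prec = lam ** alpha + g ** 2 * lam ** beta     # posterior precision
post_var = 1.0 / post_prec
post_mean = post_var * g * lam ** beta * y

# conditioning can only shrink the prior variance ...
assert np.all(post_var <= lam ** (-alpha) + 1e-15)
# ... and the severe smoothing leaves high modes at the prior mean 0
assert np.allclose(post_mean[10:], 0.0, atol=1e-12)
```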
We consider the elliptic inverse problem from Sect. 1.3 from the Bayesian perspective. We consider the use of both uniform and Gaussian priors. Before studying the inverse problem, however, it is important to derive some continuity properties of the forward problem. Throughout this section, we consider equation (10.7) under the assumption that $f \in V^*$. For $i = 1, 2$, let $p_i$ solve

$$-\nabla\cdot\bigl(\kappa_i\nabla p_i\bigr) = f, \quad x \in D,$$
$$p_i = 0, \quad x \in \partial D.$$
Then

$$\|p_1 - p_2\|_V \le \frac{1}{\kappa_{\min}^2}\,\|f\|_{V^*}\,\|\kappa_1 - \kappa_2\|_{L^\infty}.$$

Using the fact that $\|\varphi\|_V = \|\nabla\varphi\|$, and applying Lemma 2 to bound $p_2$ in $V$, we find the desired bound. $\sqcap\!\sqcup$
since $X' \subset L^\infty(D)$. When considering uniform priors for the elliptic problem, we work with this definition of $X^+$. We define $G: X^+ \to \mathbb{R}^J$ by

$$G_j(u) = l_j\bigl(R(u)\bigr), \qquad j = 1,\dots,J,$$

where, recall, the $l_j$ are elements of $V^*$: bounded linear functionals on $V$. Then $G(u) = \bigl(G_1(u),\dots,G_J(u)\bigr)$, and we are interested in the inverse problem of finding $u \in X^+$ from $y$, where

$$y = G(u) + \eta.$$
For the uniform prior, every draw satisfies

$$u \in X_0^+ := \Bigl\{ v \in X' \,\Bigm|\, \frac{1}{1+\delta}\,m_{\min} \le v(x) \le m_{\max} + \frac{\delta}{1+\delta}\,m_{\min}\ \ \text{a.e. } x\in D\Bigr\}. \tag{10.37}$$

Thus $\mu_0(X_0^+) = 1$.
The likelihood is defined as follows. Since $\eta \sim N(0, \Gamma)$, it follows that $\mathbb{Q}_0 = N(0, \Gamma)$, $\mathbb{Q}_u = N\bigl(G(u), \Gamma\bigr)$ and

$$\frac{d\mathbb{Q}_u}{d\mathbb{Q}_0}(y) = \exp\bigl(-\Phi(u; y)\bigr),$$

$$\Phi(u; y) = \frac12\bigl|\Gamma^{-\frac12}\bigl(y - G(u)\bigr)\bigr|^2 - \frac12\bigl|\Gamma^{-\frac12}y\bigr|^2.$$
$$\frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp\bigl(-\Phi(u; y)\bigr), \tag{10.38}$$

$$Z = \int_{X^+}\exp\bigl(-\Phi(u; y)\bigr)\,\mu_0(du),$$

provided $Z > 0$ for $y$, $\mathbb{Q}_0$-almost surely. To see that $Z > 0$, note that

$$Z = \int_{X_0^+}\exp\bigl(-\Phi(u; y)\bigr)\,\mu_0(du),$$
We may use Remark 5 to shift $\Phi$ by $\frac12\bigl|\Gamma^{-\frac12}y\bigr|^2$, since this is almost surely finite under $\mathbb{Q}_0$, and hence under $\nu(du, dy) = \mathbb{Q}_u(dy)\,\mu_0(du)$. We then obtain the equivalent form for the posterior distribution $\mu^y$:

$$\frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp\Bigl(-\frac12\bigl|\Gamma^{-\frac12}\bigl(y - G(u)\bigr)\bigr|^2\Bigr), \tag{10.39a}$$

$$Z = \int_X \exp\Bigl(-\frac12\bigl|\Gamma^{-\frac12}\bigl(y - G(u)\bigr)\bigr|^2\Bigr)\,\mu_0(du). \tag{10.39b}$$
We take as prior on $u$ the measure $N(0, A^{-s})$ with $s > d/2$. Then Theorem 12 shows that $\mu_0(X) = 1$. The likelihood is unchanged by the prior, since it concerns $y$ given $u$, and is hence identical to that in the case of the uniform prior, although the mean shift from $\mathbb{Q}_0$ to $\mathbb{Q}_u$ by $G(u)$ now has a different interpretation, since $\kappa = \exp(u)$ rather than $\kappa = u$. Thus we again obtain (10.38) for the posterior distribution (albeit with a different definition of $G(u)$), provided that we can establish that, $\mathbb{Q}_0$-a.s.,

$$Z = \int_X \exp\Bigl(\frac12\bigl|\Gamma^{-\frac12}y\bigr|^2 - \frac12\bigl|\Gamma^{-\frac12}\bigl(y - G(u)\bigr)\bigr|^2\Bigr)\,\mu_0(du) > 0.$$
Hence

$$Z \ge \int_B \exp(-M)\,\mu_0(du) = \exp(-M)\,\mu_0(B) > 0,$$

since all balls have positive measure for Gaussian measure on a separable Banach space. Thus we again obtain (10.39) for the posterior measure, now with the new definition of $G$, and hence of $\Phi$.
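The positivity of $Z$ can also be seen numerically by averaging the likelihood over prior draws. In the sketch below the prior is a truncated Gaussian series and, to keep the example self-contained, the elliptic forward map is replaced by a cheap illustrative stand-in (the spatial mean of $\exp(u)$); it is not the Darcy map of the text.

```python
import numpy as np

# Monte Carlo illustration of (10.39b): Z = E^{mu_0} exp(-|y - G(u)|^2 /
# (2 sigma^2)) approximated by averaging over prior draws.  G here is an
# illustrative surrogate for the elliptic solver.
rng = np.random.default_rng(6)
J, n_mc = 32, 20_000
x = np.linspace(0, 1, 128, endpoint=False)
phi = np.array([np.sqrt(2) * np.sin(np.pi * (j + 1) * x) for j in range(J)])
gamma = np.arange(1, J + 1) ** -1.5

def G(c):
    # stand-in forward map: spatial mean of exp(u), u the series draw
    return np.exp((c * gamma) @ phi).mean()

sigma, y_obs = 0.2, 1.3
xi = rng.standard_normal((n_mc, J))
Gs = np.array([G(c) for c in xi])
Z = np.mean(np.exp(-0.5 * ((y_obs - Gs) / sigma) ** 2))
assert Z > 0                 # positivity, as established in the text
```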
4 Common Structure
In many classical inverse problems, small changes in the data can induce arbitrarily
large changes in the solution, and some form of regularization is needed to
counteract this ill posedness. We illustrate this effect with the inverse heat equation
example. We then proceed to show that the Bayesian approach to inversion has
the property that small changes in the data lead to small changes in the posterior
distribution. Thus working with probability measures on the solution space, and
adopting suitable priors, provides a form of regularization.
Example 6. Consider the heat equation introduced in Sect. 1.2 and both perfect data $y = e^{-A}u$, derived from the forward model with no noise, and noisy data $y' = e^{-A}u + \eta$. Consider the case where $\eta = \delta\varphi_j$, with $\delta$ small and $\varphi_j$ a normalized eigenfunction of $A$. Thus $\|\eta\| = \delta$. Obviously, application of the inverse of $e^{-A}$ to $y$ returns the point $u$ which gave rise to the perfect data. It is natural to apply the inverse of $e^{-A}$ to both $y$ and to $y'$ to understand the effect of the noise. Doing so yields the identity

$$\|e^{A}y - e^{A}y'\| = \|e^{A}(y - y')\| = \|e^{A}\eta\| = \delta\|e^{A}\varphi_j\| = \delta e^{\lambda_j}.$$

Recall Assumption 1, which gives $\lambda_j \asymp j^{2/d}$. Now fix any $a > 0$ and choose $j$ large enough to ensure that $\lambda_j \ge (a+1)\log(\delta^{-1})$. It then follows that $\|y - y'\| = O(\delta)$ while $\|e^{A}y - e^{A}y'\| = O(\delta^{-a})$. This is a manifestation of ill posedness. Furthermore, since $a > 0$ is arbitrary, the ill posedness can be made arbitrarily bad by considering $a \to \infty$.
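The amplification factor $\delta e^{\lambda_j}$ in Example 6 is easy to tabulate; the truncation and the value of $\delta$ below are illustrative.

```python
import numpy as np

# Numerical illustration of Example 6: a data perturbation of size delta
# in eigenmode j is amplified by exp(lambda_j) when the smoothing map
# exp(-A) is inverted naively.  Here lambda_j = (pi j)^2 on D = (0,1).
lam = (np.pi * np.arange(1, 30)) ** 2
delta = 1e-3
amplification = delta * np.exp(lam[:5])      # delta * e^{lambda_j}
# ||y - y'|| = delta, but ||e^A y - e^A y'|| = delta * e^{lambda_j}:
assert amplification[0] > 19.0               # already ~19.3 at j = 1
assert np.all(np.diff(amplification) > 0)    # worse in every higher mode
```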
Our aim in this section is to show that this ill-posedness effect does not occur in the Bayesian posterior distribution: small changes in the data $y$ lead to small changes in the measure $\mu^y$. Let $X$ and $Y$ be separable Banach spaces, equipped with the Borel $\sigma$-algebra, and $\mu_0$ a measure on $X$. We will work under assumptions which enable us to make sense of the following measure $\mu^y \ll \mu_0$ defined, for some $\Phi: X\times Y \to \mathbb{R}$, by

$$\frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z(y)}\exp\bigl(-\Phi(u; y)\bigr), \tag{10.40a}$$

$$Z(y) = \int_X \exp\bigl(-\Phi(u; y)\bigr)\,\mu_0(du). \tag{10.40b}$$
For $r > 0$ and all $u \in X$, $y, y_1, y_2 \in B_Y(0, r)$, we assume that

$$\Phi(u; y) \ge -M_1(r, \|u\|_X),$$

$$|\Phi(u; y_1) - \Phi(u; y_2)| \le M_2(r, \|u\|_X)\,\|y_1 - y_2\|_Y.$$
Then, for every $y \in Y$, $Z(y)$ given by (10.40b) is positive and finite, and the probability measure $\mu^y$ given by (10.40) is well defined.

Proof. The finiteness of $Z(y)$ follows directly from the lower bound on $\Phi$, together with the assumed integrability condition in the theorem. Since $u \sim \mu_0$ satisfies $u \in X'$ a.s., we have

$$Z(y) = \int_{X'}\exp\bigl(-\Phi(u; y)\bigr)\,\mu_0(du).$$

Hence, for a ball $B' \subset X'$ on which $\Phi(\cdot; y) \le R_2$,

$$Z(y) \ge \int_{B'}\exp(-R_2)\,\mu_0(du) = \exp(-R_2)\,\mu_0(B') > 0. \tag{10.41}$$
Remarks 6. The following remarks apply to the preceding and the following theorem. The integrability condition assumed in both is

$$\exp\bigl(M_1(r,\|u\|_X)\bigr)\bigl(1 + M_2(r,\|u\|_X)^2\bigr) \in L^1_{\mu_0}(X;\mathbb{R}),$$

and the conclusion of the following theorem is the well-posedness estimate

$$d_{\mathrm{Hell}}(\mu^y, \mu^{y'}) \le C\,\|y - y'\|_Y.$$

Note also the elementary bounds

$$\exp\bigl(M_1(r,\|u\|_X)\bigr)\,M_2(r,\|u\|_X) \le \exp\bigl(M_1(r,\|u\|_X)\bigr)\bigl(1 + M_2(r,\|u\|_X)^2\bigr), \tag{10.42a}$$

$$\exp\bigl(M_1(r,\|u\|_X)\bigr) \le \exp\bigl(M_1(r,\|u\|_X)\bigr)\bigl(1 + M_2(r,\|u\|_X)^2\bigr). \tag{10.42b}$$
Let $Z = Z(y)$ and $Z' = Z(y')$ denote the normalization constants for $\mu^y$ and $\mu^{y'}$, so that, by Theorem 15,

$$Z = \int_{X'} \exp\bigl(-\Phi(u;y)\bigr)\,\mu_0(du) > 0,$$

$$Z' = \int_{X'} \exp\bigl(-\Phi(u;y')\bigr)\,\mu_0(du) > 0.$$
10 The Bayesian Approach to Inverse Problems 353
Then, using the local Lipschitz property of the exponential and the assumed Lipschitz continuity of $\Phi(u;\cdot)$, together with (10.42a), we have

$$|Z - Z'| \le \int_{X'} \bigl|\exp(-\Phi(u;y)) - \exp(-\Phi(u;y'))\bigr|\,\mu_0(du)$$
$$\le \int_{X'} \exp\bigl(M_1(r,\|u\|_X)\bigr)\,|\Phi(u;y) - \Phi(u;y')|\,\mu_0(du)$$
$$\le \Bigl(\int_{X'} \exp\bigl(M_1(r,\|u\|_X)\bigr)\,M_2(r,\|u\|_X)\,\mu_0(du)\Bigr)\,\|y - y'\|_Y$$
$$\le \Bigl(\int_{X'} \exp\bigl(M_1(r,\|u\|_X)\bigr)\bigl(1 + M_2(r,\|u\|_X)^2\bigr)\,\mu_0(du)\Bigr)\,\|y - y'\|_Y$$
$$\le C\,\|y - y'\|_Y.$$

The last line follows because the integrand is in $L^1_{\mu_0}$ by assumption. From the definition of Hellinger distance, we have

$$d_{\mathrm{Hell}}(\mu^y, \mu^{y'})^2 \le I_1 + I_2,$$

where

$$I_1 = \frac{1}{Z}\int_{X'} \Bigl(\exp\bigl(-\tfrac12\Phi(u;y)\bigr) - \exp\bigl(-\tfrac12\Phi(u;y')\bigr)\Bigr)^2\,\mu_0(du),$$

$$I_2 = \bigl|Z^{-\frac12} - (Z')^{-\frac12}\bigr|^2 \int_{X'} \exp\bigl(-\Phi(u;y')\bigr)\,\mu_0(du).$$

Note that, again using similar Lipschitz calculations to those above, together with the fact that $Z > 0$ and Assumption 1,

$$I_1 \le \frac{1}{4Z}\int_{X'} \exp\bigl(M_1(r,\|u\|_X)\bigr)\,|\Phi(u;y) - \Phi(u;y')|^2\,\mu_0(du)$$
$$\le \frac{1}{4Z}\Bigl(\int_{X'} \exp\bigl(M_1(r,\|u\|_X)\bigr)\,M_2(r,\|u\|_X)^2\,\mu_0(du)\Bigr)\,\|y - y'\|_Y^2$$
$$\le C\,\|y - y'\|_Y^2.$$

A similar argument, using (10.42b), shows that the integral appearing in $I_2$ is finite:

$$\int_{X'} \exp\bigl(-\Phi(u;y')\bigr)\,\mu_0(du) \le \int_{X'} \exp\bigl(M_1(r,\|u\|_X)\bigr)\,\mu_0(du) < \infty.$$
354 M. Dashti and A.M. Stuart
Hence, bounding $|Z^{-1/2} - (Z')^{-1/2}|^2$ in terms of $|Z - Z'|^2$,

$$I_2 \le C\,\bigl(Z^{-3} \vee (Z')^{-3}\bigr)\,|Z - Z'|^2 \le C\,\|y - y'\|_Y^2.$$
Remark 1. The Hellinger metric has the very desirable property that it translates directly into bounds on expectations. For functions $f$ which are in $L^2_{\mu^y}(X;\mathbb{R})$ and $L^2_{\mu^{y'}}(X;\mathbb{R})$, closeness in the Hellinger metric implies closeness of the expectations of $f$. To be precise, for $y, y' \in B_Y(0,r)$, we have

$$|\mathbb{E}^{\mu^y} f(u) - \mathbb{E}^{\mu^{y'}} f(u)| \le C\,d_{\mathrm{Hell}}(\mu^y, \mu^{y'}),$$

where the constant $C$ depends on $r$ and on the expectations of $|f|^2$ under $\mu^y$ and $\mu^{y'}$. It follows that

$$|\mathbb{E}^{\mu^y} f(u) - \mathbb{E}^{\mu^{y'}} f(u)| \le C\,\|y - y'\|_Y,$$

for a possibly different constant $C$ which also depends on $r$ and on the expectations of $|f|^2$ under $\mu^y$ and $\mu^{y'}$.
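The passage from Hellinger distance to expectations can be checked numerically. The following sketch uses an illustrative one-dimensional Gaussian model (unit-variance likelihood, $N(0,1)$ prior, so the posterior for data $y$ is $N(y/2, 1/2)$) and verifies the standard bound $|\mu(f) - \mu'(f)| \le 2\bigl(\mu(|f|^2) + \mu'(|f|^2)\bigr)^{1/2} d_{\mathrm{Hell}}(\mu,\mu')$ by quadrature; the model and constants are assumptions of this illustration, not taken from the chapter:

```python
import numpy as np

# Posterior density of mu^y = N(y/2, 1/2) for a 1D Gaussian model.
def posterior_density(u, y):
    return np.exp(-0.5 * (u - y / 2.0) ** 2 / 0.5) / np.sqrt(2 * np.pi * 0.5)

u = np.linspace(-10, 10, 20001)
du = u[1] - u[0]
y1, y2 = 0.3, 0.35      # two nearby data values

p, q = posterior_density(u, y1), posterior_density(u, y2)

# Hellinger distance: d^2 = (1/2) * int (sqrt(p) - sqrt(q))^2 du
d_hell = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * du)

# Difference of expectations of f(u) = u, and the Hellinger-based bound.
f = u
diff = abs(np.sum(f * p) * du - np.sum(f * q) * du)
bound = 2.0 * np.sqrt(np.sum(f**2 * p) * du + np.sum(f**2 * q) * du) * d_hell

print(diff, bound)
assert diff <= bound
```

For mean shifts of Gaussians this bound is nearly sharp, which makes the example a useful sanity check on any Hellinger computation.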
4.2 Approximation

Consider the measure $\mu$ given by (10.43), and define its approximation $\mu^N$, obtained by replacing the potential $\Phi$ with an approximation $\Phi^N$, by

$$\frac{d\mu^N}{d\mu_0}(u) = \frac{1}{Z^N}\exp\bigl(-\Phi^N(u)\bigr), \tag{10.44a}$$

$$Z^N = \int_X \exp\bigl(-\Phi^N(u)\bigr)\,\mu_0(du). \tag{10.44b}$$

We assume that $\Phi$ and $\Phi^N$ satisfy, for all $u \in X$,

$$\Phi(u) \ge -M_1(\|u\|_X),$$
$$\Phi^N(u) \ge -M_1(\|u\|_X),$$
$$|\Phi(u) - \Phi^N(u)| \le M_2(\|u\|_X)\,\psi(N),$$

where $\psi(N) \to 0$ as $N \to \infty$.
The following two theorems are very similar to Theorems 15 and 16, and the proofs are adapted to estimate changes in the posterior caused by changes in the potential $\Phi$, rather than the data $y$.

Then $Z$ and $Z^N$ given by (10.43b) and (10.44b) are positive and finite, and the probability measures $\mu$ and $\mu^N$ given by (10.43) and (10.44) are well defined. Furthermore, for sufficiently large $N$, $Z^N$ given by (10.44b) is bounded below by a positive constant independent of $N$.
Proof. Finiteness of the normalization constants $Z$ and $Z^N$ follows from the lower bounds on $\Phi$ and $\Phi^N$ given in Assumptions 2, together with the integrability condition in the theorem. Since $u \sim \mu_0$ satisfies $u \in X'$ $\mu_0$-a.s., we have

$$Z = \int_{X'} \exp\bigl(-\Phi(u)\bigr)\,\mu_0(du).$$

Hence, with $B'$ as in the proof of Theorem 15,

$$Z \ge \int_{B'} \exp(-R_2)\,\mu_0(du) = \exp(-R_2)\,\mu_0(B') > 0.$$

For $N$ sufficiently large, the approximation property of $\Phi^N$ in Assumptions 2 gives $\Phi^N \le 2R_2$ on $B'$, so that

$$Z^N \ge \int_{B'} \exp(-2R_2)\,\mu_0(du) = \exp(-2R_2)\,\mu_0(B') > 0.$$
$$d_{\mathrm{Hell}}(\mu, \mu^N) \le C\,\psi(N).$$

Let $Z$ and $Z^N$ denote the normalization constants for $\mu$ and $\mu^N$, so that, for all $N$ sufficiently large, by Theorem 17,
$$Z = \int_{X'} \exp\bigl(-\Phi(u)\bigr)\,\mu_0(du) > 0,$$

$$Z^N = \int_{X'} \exp\bigl(-\Phi^N(u)\bigr)\,\mu_0(du) > 0,$$

with positive lower bounds independent of $N$. Then, using the local Lipschitz property of the exponential and the approximation property of $\Phi^N(\cdot)$ from Assumptions 2, together with (10.45a), we have

$$|Z - Z^N| \le \int_{X'} \bigl|\exp(-\Phi(u)) - \exp(-\Phi^N(u))\bigr|\,\mu_0(du)$$
$$\le \int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)\,|\Phi(u) - \Phi^N(u)|\,\mu_0(du)$$
$$\le \Bigl(\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)\,M_2(\|u\|_X)\,\mu_0(du)\Bigr)\,\psi(N)$$
$$\le \Bigl(\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)\bigl(1 + M_2(\|u\|_X)^2\bigr)\,\mu_0(du)\Bigr)\,\psi(N)$$
$$\le C\,\psi(N).$$
The last line follows because the integrand is in $L^1_{\mu_0}$ by assumption. From the definition of Hellinger distance, we have

$$d_{\mathrm{Hell}}(\mu, \mu^N)^2 \le I_1 + I_2,$$

where

$$I_1 = \frac{1}{Z}\int_{X'} \Bigl(\exp\bigl(-\tfrac12\Phi(u)\bigr) - \exp\bigl(-\tfrac12\Phi^N(u)\bigr)\Bigr)^2\,\mu_0(du),$$

$$I_2 = \bigl|Z^{-\frac12} - (Z^N)^{-\frac12}\bigr|^2 \int_{X'} \exp\bigl(-\Phi^N(u)\bigr)\,\mu_0(du).$$

Note that, again by means of similar Lipschitz calculations to those above, using the fact that $Z, Z^N > 0$ uniformly for $N$ sufficiently large by Theorem 17, and Assumptions 2,

$$I_1 \le \frac{1}{4Z}\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)\,|\Phi(u) - \Phi^N(u)|^2\,\mu_0(du)$$
$$\le \frac{1}{4Z}\Bigl(\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)\,M_2(\|u\|_X)^2\,\mu_0(du)\Bigr)\,\psi(N)^2$$
$$\le C\,\psi(N)^2.$$
A similar calculation shows that the integral appearing in $I_2$ is finite, and bounding $|Z^{-1/2} - (Z^N)^{-1/2}|^2$ as in the proof of Theorem 16 gives $I_2 \le C\,\psi(N)^2$, completing the proof.
Remarks 7. The following two remarks are relevant to establishing the conditions of the preceding two theorems and to applying them. In particular, arguing as in Remark 1, closeness in the Hellinger distance implies that, for suitable $f$,

$$|\mathbb{E}^{\mu} f(u) - \mathbb{E}^{\mu^N} f(u)| \le C\,\psi(N).$$
The aim of this section is to connect the probabilistic approach to inverse problems with the classical method of Tikhonov regularization. We consider the setting in which the prior measure $\mu_0$ is a Gaussian measure. We then show that MAP estimators, points of maximal probability, coincide with minimizers of a Tikhonov-Phillips regularized least squares functional, with regularization with respect to the Cameron-Martin norm of the Gaussian prior. The data $y$ plays no explicit role in our developments here, and so we work in the setting of equation (10.43). Recall, however, that in the context of inverse problems, a classical methodology is simply to try to minimize $\Phi(u)$, subject to some regularization. Indeed, for finite data and observational noise with Gaussian distribution $N(0, \Gamma)$, we have

$$\Phi(u) = \frac12\Bigl|\Gamma^{-\frac12}\bigl(y - \mathcal{G}(u)\bigr)\Bigr|^2.$$
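In finite dimensions the MAP/Tikhonov connection can be made completely concrete. The following sketch assumes a linear forward map and Gaussian prior (illustrative choices, not the chapter's general setting), in which case the minimizer of the regularized least squares functional is available in closed form and can be checked against the first-order optimality condition:

```python
import numpy as np

# A finite-dimensional sketch: linear forward map G, noise N(0, Gamma),
# Gaussian prior N(0, C).  The MAP estimator minimizes the
# Tikhonov-Phillips functional
#   I(u) = 0.5 * |Gamma^{-1/2}(y - G u)|^2 + 0.5 * |C^{-1/2} u|^2,
# whose minimizer solves (G^T Gamma^{-1} G + C^{-1}) u = G^T Gamma^{-1} y.
rng = np.random.default_rng(0)
n, m = 5, 8
G = rng.standard_normal((m, n))
Gamma = 0.1 * np.eye(m)
C = np.eye(n)

u_true = rng.standard_normal(n)
y = G @ u_true + rng.multivariate_normal(np.zeros(m), Gamma)

Gi = np.linalg.inv(Gamma)
u_map = np.linalg.solve(G.T @ Gi @ G + np.linalg.inv(C), G.T @ Gi @ y)

# Check optimality: the gradient of I vanishes at u_map.
grad = G.T @ Gi @ (G @ u_map - y) + np.linalg.inv(C) @ u_map
assert np.allclose(grad, 0.0, atol=1e-10)
```

In this linear Gaussian setting the MAP point is also the posterior mean; in the general nonlinear setting of this section only the variational characterization survives.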
Here $(E, \|\cdot\|_E)$ denotes the Cameron-Martin space associated to the Gaussian prior $\mu_0$. We view $\mu_0$ as a Gaussian probability measure on a separable Banach space $(X, \|\cdot\|_X)$ so that $\mu_0(X) = 1$. We make the following assumptions about the function $\Phi$:

(i) For every $\varepsilon > 0$, there is an $M = M(\varepsilon) \in \mathbb{R}$ such that, for all $u \in X$,
$$\Phi(u) \ge M - \varepsilon\|u\|_X^2.$$
(ii) $\Phi$ is locally bounded from above, i.e., for every $r > 0$ there exists $K = K(r) > 0$ such that, for all $u \in X$ with $\|u\|_X < r$, we have
$$\Phi(u) \le K.$$
(iii) $\Phi$ is locally Lipschitz continuous, i.e., for every $r > 0$ there exists $L = L(r) > 0$ such that, for all $u_1, u_2 \in X$ with $\|u_1\|_X, \|u_2\|_X < r$, we have
$$|\Phi(u_1) - \Phi(u_2)| \le L\,\|u_1 - u_2\|_X.$$
In finite dimensions, for measures which have a continuous density with respect to Lebesgue measure, there is an obvious notion of most likely point(s): simply the point(s) at which the Lebesgue density is maximized. This way of thinking does not translate into the infinite-dimensional context, but there is a way of restating it which does. Fix a small radius $\delta > 0$ and identify centres of balls of radius $\delta$ which have maximal probability. Letting $\delta \to 0$ then recovers the preceding definition, when there is a continuous Lebesgue density. We adopt this small ball approach in the infinite-dimensional setting.

For $z \in E$, let $B^\delta(z) \subset X$ be the open ball centred at $z \in X$ with radius $\delta$ in $X$. Let

$$J^\delta(z) = \mu\bigl(B^\delta(z)\bigr)$$

be the mass of the ball $B^\delta(z)$ under the measure $\mu$. Similarly, we define

$$J_0^\delta(z) = \mu_0\bigl(B^\delta(z)\bigr),$$

the mass of the ball $B^\delta(z)$ under the Gaussian prior. Recall that all balls in a separable Banach space have positive Gaussian measure, by Theorem 33; it thus follows that $J_0^\delta(z)$ is finite and positive for any $z \in E$. By Assumptions 3(i) and (ii), together with the Fernique Theorem 10, the same is true of $J^\delta(z)$. Our first theorem encapsulates the idea that probability is maximized where $I$ is minimized. To see this, fix any point $z_2$ in the Cameron-Martin space $E$ and notice that the probability of the small ball at $z_1$ is maximized, asymptotically as the radius of the ball tends to zero, at minimizers of $I$.

Theorem 19. Let Assumptions 3 hold and assume that $\mu_0(X) = 1$. Then the function $I$ defined by (10.46) satisfies, for any $z_1, z_2 \in E$,

$$\lim_{\delta \to 0} \frac{J^\delta(z_1)}{J^\delta(z_2)} = \exp\bigl(I(z_2) - I(z_1)\bigr).$$
Proof. Since $J^\delta(z)$ is finite and positive for any $z \in E$, the ratio of interest is finite and positive. The key estimate in the proof is given in Theorem 35:

$$\lim_{\delta \to 0} \frac{J_0^\delta(z_1)}{J_0^\delta(z_2)} = \exp\Bigl(\frac12\|z_2\|_E^2 - \frac12\|z_1\|_E^2\Bigr). \tag{10.47}$$

This estimate transfers questions about probability, naturally asked on the space $X$ of full measure under $\mu_0$, into statements concerning the Cameron-Martin norm of $\mu_0$; note that under this norm, a random variable distributed as $\mu_0$ is almost surely infinite, so the result is nontrivial.

We have

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} = \frac{\int_{B^\delta(z_1)} \exp\bigl(-\Phi(u)\bigr)\,\mu_0(du)}{\int_{B^\delta(z_2)} \exp\bigl(-\Phi(v)\bigr)\,\mu_0(dv)}$$
$$= \frac{\int_{B^\delta(z_1)} \exp\bigl(-\Phi(u) + \Phi(z_1)\bigr)\exp\bigl(-\Phi(z_1)\bigr)\,\mu_0(du)}{\int_{B^\delta(z_2)} \exp\bigl(-\Phi(v) + \Phi(z_2)\bigr)\exp\bigl(-\Phi(z_2)\bigr)\,\mu_0(dv)}.$$

Using the local Lipschitz continuity of $\Phi$ and the estimate (10.47), we deduce that

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} \le r_1(\delta)\,e^{(L_2 + L_1)\delta}\,e^{-I(z_1) + I(z_2)}$$

with $r_1(\delta) \to 1$ as $\delta \to 0$. Thus

$$\limsup_{\delta \to 0} \frac{J^\delta(z_1)}{J^\delta(z_2)} \le e^{-I(z_1) + I(z_2)}. \tag{10.48}$$

Similarly, we obtain

$$\liminf_{\delta \to 0} \frac{J^\delta(z_1)}{J^\delta(z_2)} \ge e^{-I(z_1) + I(z_2)}, \tag{10.49}$$

and the result follows.
Recall that a functional $J$ on $E$ is weakly lower semicontinuous if $J(u) \le \liminf_{n\to\infty} J(u_n)$ whenever $u_n \rightharpoonup u$ in $E$.
Lemma 6. If $(E, \langle\cdot,\cdot\rangle_E)$ is a Hilbert space with induced norm $\|\cdot\|_E$, then the quadratic form $J(u) := \frac12\|u\|_E^2$ is weakly lower semicontinuous.

Proof. Let $u_n \rightharpoonup u$ in $E$. Then

$$J(u_n) - J(u) = \frac12\|u_n\|_E^2 - \frac12\|u\|_E^2$$
$$= \frac12\langle u_n - u,\, u_n + u\rangle_E$$
$$= \frac12\langle u_n - u,\, 2u\rangle_E + \frac12\|u_n - u\|_E^2$$
$$\ge \frac12\langle u_n - u,\, 2u\rangle_E.$$

But the right-hand side tends to zero since $u_n \rightharpoonup u$ in $E$. Hence the result follows.
Theorem 20. Suppose that Assumptions 3 hold, and let $E$ be a Hilbert space compactly embedded in $X$. Then there exists $\bar{u} \in E$ such that $I(\bar{u}) = \bar{I} := \inf\{I(u) : u \in E\}$.

Proof. Since $E$ is embedded in $X$, there is a constant $C > 0$ such that

$$\|u\|_X^2 \le C\,\|u\|_E^2.$$

Hence, by Assumption 3(i), it follows that, for any $\varepsilon > 0$, there is $M(\varepsilon) \in \mathbb{R}$ such that

$$\Bigl(\frac12 - C\varepsilon\Bigr)\|u\|_E^2 + M(\varepsilon) \le I(u).$$

By choosing $\varepsilon$ sufficiently small, we deduce that there is $M \in \mathbb{R}$ such that, for all $u \in E$,

$$\frac14\|u\|_E^2 + M \le I(u). \tag{10.50}$$

Let $\{u_n\}$ be a minimizing sequence: for any $\delta > 0$ there exists $N_1 = N_1(\delta)$ such that

$$\bar{I} \le I(u_n) \le \bar{I} + \delta, \qquad \forall n \ge N_1. \tag{10.51}$$

Using (10.50), we deduce that the sequence $\{u_n\}$ is bounded in $E$ and, since $E$ is a Hilbert space, there exists $\bar{u} \in E$ such that $u_n \rightharpoonup \bar{u}$ in $E$ (along a subsequence, which we relabel). By the compact embedding of $E$ in $X$, $u_n \to \bar{u}$ strongly in $X$, and by weak lower semicontinuity of $I$,

$$\bar{I} \le I(\bar{u}) \le \bar{I} + \delta.$$

Since $\delta$ is arbitrary, $\bar{u}$ is a minimizer. To show strong convergence of the minimizing sequence, note the parallelogram identity

$$\frac14\|u_n - u_\ell\|_E^2 = \frac12\|u_n\|_E^2 + \frac12\|u_\ell\|_E^2 - \frac14\|u_n + u_\ell\|_E^2$$
$$= I(u_n) + I(u_\ell) - 2I\Bigl(\frac12(u_n + u_\ell)\Bigr) - \Phi(u_n) - \Phi(u_\ell) + 2\Phi\Bigl(\frac12(u_n + u_\ell)\Bigr)$$
$$\le 2(\bar{I} + \delta) - 2\bar{I} - \Phi(u_n) - \Phi(u_\ell) + 2\Phi\Bigl(\frac12(u_n + u_\ell)\Bigr)$$
$$\le 2\delta - \Phi(u_n) - \Phi(u_\ell) + 2\Phi\Bigl(\frac12(u_n + u_\ell)\Bigr).$$

By the strong convergence of $\{u_n\}$ in $X$ and the local Lipschitz continuity of $\Phi$, for $n, \ell$ sufficiently large,

$$\frac14\|u_n - u_\ell\|_E^2 \le 3\delta.$$

Hence the sequence is Cauchy in $E$ and converges strongly, and the proof is complete.
Corollary 1. Suppose that Assumptions 3 hold and that the Gaussian measure $\mu_0$, with Cameron-Martin space $E$, satisfies $\mu_0(X) = 1$. Then there exists $\bar{u} \in E$ such that $I(\bar{u}) = \inf\{I(u) : u \in E\}$.
Section 4.1. The well-posedness theory described here was introduced in the
papers [17] and [92]. Relationships between the Hellinger distance on probability
measures and the total variation distance and Kullback-Leibler divergence
may be found in [38] and Pollard (Distances and affinities between measures.
Unpublished manuscript, http://www.stat.yale.edu/~pollard/Books/Asymptopia/
Metrics.pdf), as well as in [92].
Section 4.2. Generalization of the well-posedness theory to study the effect of
numerical approximation of the forward model on the inverse problem may be
found in [20]. The relationship between expectations and Hellinger distance, as
used in Remark 7, is demonstrated in [92].
Section 4.3. The connection between Tikhonov-Phillips regularization and MAP
estimators is widely appreciated in computational Bayesian inverse problems;
see [53]. Making the connection rigorous in the separable Banach space setting
is the subject of the paper [30]; further references to the historical development of
the subject may be found therein. Related to Lemma 6, see also [23, Chapter 3].
The aim of this section is to study Markov processes, in continuous time, and Markov chains, in discrete time, which preserve the measure $\mu$ given by (10.43). The overall setting is described in Sect. 5.1, which introduces the role of detailed balance and reversibility in constructing measure-preserving Markov chains/processes. Section 5.2 concerns Markov chain Monte Carlo (MCMC) methods; these are Markov chains which are invariant with respect to $\mu$. Metropolis-Hastings methods are introduced, and the role of detailed balance in their construction is explained. The benefit of conceiving MCMC methods which are defined on the infinite-dimensional space is emphasized. In particular, the idea of using proposals which preserve the prior, more specifically which are prior reversible, is introduced as an example. In Sect. 5.3 we show how sequential Monte Carlo (SMC) methods can be used to construct approximate samples from the measure $\mu$ given by (10.43). Again our perspective is to construct algorithms which are provably well defined on the infinite-dimensional space, and in fact we find an upper bound for the approximation error of the SMC method which proves its convergence on an infinite-dimensional space. The MCMC methods from the previous section play an important role in the construction of these SMC methods. Sections 5.4–5.6 concern continuous time $\mu$-reversible processes. In particular they concern the derivation and study of a Langevin equation which is invariant with respect to the measure $\mu$. (Note that this is called the overdamped Langevin equation by physicists and simply the Langevin equation by statisticians.) In continuous time we work entirely in the case of a Gaussian prior measure $\mu_0 = N(0, \mathcal{C})$ on a Hilbert space $\mathcal{H}$, with inner product and norm denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively; in discrete time, however, our analysis is more general, applying on a separable Banach space $(X, \|\cdot\|)$ and for quite general prior measure.
This section is devoted to Banach space valued Markov chains or processes which are invariant with respect to the posterior measure $\mu^y$ constructed in Sect. 3.2. Within this section, the data $y$ arising in the inverse problem plays no explicit role; indeed, the theory applies to a wide range of measures on a separable Banach space $X$. Thus the discussion in this chapter includes, but is not limited to, Bayesian inverse problems. All of the Markov chains we construct will exploit structure in a reference measure $\mu_0$ with respect to which the measure $\mu$ is absolutely continuous; thus $\mu$ has a density with respect to $\mu_0$. In continuous time we will explicitly use the Gaussianity of $\mu_0$, but in discrete time we will be more general.

Let $\mu_0$ be a reference measure on the separable Banach space $X$ equipped with the Borel $\sigma$-algebra $\mathcal{B}(X)$. We assume that $\mu$ is given by

$$\frac{d\mu}{d\mu_0}(u) = \frac{1}{Z}\exp\bigl(-\Phi(u)\bigr), \tag{10.52a}$$

$$Z = \int_X \exp\bigl(-\Phi(u)\bigr)\,\mu_0(du), \tag{10.52b}$$

where $Z \in (0, \infty)$. In the following we let $P(u, dv)$ denote a Markov transition kernel, so that $P(u, \cdot)$ is a probability measure on $\bigl(X, \mathcal{B}(X)\bigr)$ for each $u \in X$. Our interest is in probability kernels which preserve $\mu$.

Definition 2. The Markov chain with transition kernel $P$ is invariant with respect to $\mu$ if

$$\int_X \mu(du)\,P(u, \cdot) = \mu(\cdot)$$

as measures on $\bigl(X, \mathcal{B}(X)\bigr)$. The Markov kernel is said to satisfy detailed balance with respect to $\mu$ if

$$\mu(du)\,P(u, dv) = \mu(dv)\,P(v, du)$$

as measures on $X \times X$; a chain satisfying detailed balance with respect to $\mu$ is said to be reversible with respect to $\mu$.
Reversible Markov chains and processes arise naturally in many physical systems which are in statistical equilibrium. They are also important, however, as a means of constructing Markov chains which are invariant with respect to a given probability measure. We demonstrate this in Sect. 5.2, where we consider the Metropolis-Hastings variant of MCMC methods. Then, in Sects. 5.4, 5.5 and 5.6, we move to continuous time Markov processes. In particular we show that the equation

$$\frac{du}{dt} = -\bigl(u + \mathcal{C}\,D\Phi(u)\bigr) + \sqrt{2}\,\frac{dW}{dt}, \qquad u(0) = u_0, \tag{10.53}$$

with $W$ a $\mathcal{C}$-Wiener process, is reversible with respect to $\mu$.

As an example of reversibility in discrete time, consider the Gaussian proposal kernel $P(u, dv)$ under which, for $u \sim \mu_0 = N(0,\mathcal{C})$, the pair $(u, v)$ is centred Gaussian with

$$\mathbb{E}\,u \otimes u = \mathcal{C}, \qquad \mathbb{E}\,v \otimes v = \mathcal{C}, \qquad \mathbb{E}\,u \otimes v = (1 - \beta^2)^{\frac12}\,\mathcal{C}. \tag{10.54}$$
Indeed, letting $\nu(du, dv) := \mu_0(du)\,P(u, dv)$, and with $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ the inner product and norm on $\mathcal{H}$, respectively, we can write, using (10.115),

$$\hat{\nu}(\ell, m) = \int_{\mathcal{H}\times\mathcal{H}} e^{i\langle u,\ell\rangle + i\langle v,m\rangle}\,\mu_0(du)\,P(u, dv)$$
$$= \int_{\mathcal{H}} e^{i\langle u,\ell\rangle}\Bigl(\int_{\mathcal{H}} e^{i\langle v,m\rangle}\,P(u, dv)\Bigr)\mu_0(du)$$
$$= \int_{\mathcal{H}} e^{i\langle u,\ell\rangle}\,e^{i(1-\beta^2)^{1/2}\langle u,m\rangle - \frac{\beta^2}{2}\|\mathcal{C}^{1/2}m\|^2}\,\mu_0(du)$$
$$= e^{-\frac{\beta^2}{2}\|\mathcal{C}^{1/2}m\|^2}\int_{\mathcal{H}} e^{i\langle u,\,\ell + (1-\beta^2)^{1/2}m\rangle}\,\mu_0(du)$$
$$= e^{-\frac{\beta^2}{2}\|\mathcal{C}^{1/2}m\|^2}\,e^{-\frac12\|\mathcal{C}^{1/2}(\ell + (1-\beta^2)^{1/2}m)\|^2}$$
$$= \exp\Bigl(-\frac12\|\mathcal{C}^{\frac12}\ell\|^2 - \frac12\|\mathcal{C}^{\frac12}m\|^2 - (1-\beta^2)^{\frac12}\langle\mathcal{C}^{\frac12}\ell,\, \mathcal{C}^{\frac12}m\rangle\Bigr).$$

Hence, by Lemma 19 and equation (10.115), $\mu_0(du)\,P(u, dv)$ is a centred Gaussian measure with the covariance operator given by (10.54). Since the expression in the last line of the above equation is symmetric in $\ell$ and $m$, $\mu_0(dv)\,P(v, du)$ is a centred Gaussian measure with the same covariance as $\mu_0(du)\,P(u, dv)$, and so the reversibility is proved.
$$\frac{du}{dt} = -u + \sqrt{2}\,\frac{dW}{dt}, \qquad u(0) = u_0, \tag{10.55}$$

whose solution is

$$u(t) = e^{-t}u_0 + \sqrt{2}\int_0^t e^{-(t-s)}\,dW(s).$$
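The Ornstein-Uhlenbeck dynamics (10.55) preserve $N(0,\mathcal{C})$ when $W$ is a $\mathcal{C}$-Wiener process. A finite-dimensional sketch, using an illustrative diagonal covariance and the exact one-step update implied by the variation-of-constants formula above:

```python
import numpy as np

# Finite-dimensional OU dynamics: du = -u dt + sqrt(2) dW with W a
# C-Wiener process preserves N(0, C).  C is taken diagonal here
# (an illustrative choice).
rng = np.random.default_rng(1)
c = np.array([1.0, 0.5, 0.25])       # eigenvalues of C
dt, n_steps, n_paths = 0.1, 2000, 4000

u = rng.standard_normal((n_paths, c.size)) * np.sqrt(c)  # u(0) ~ N(0, C)
decay = np.exp(-dt)
noise_std = np.sqrt(c * (1.0 - np.exp(-2.0 * dt)))  # exact OU increment
for _ in range(n_steps):
    u = decay * u + noise_std * rng.standard_normal(u.shape)

# Empirical variances should match the eigenvalues of C.
emp = u.var(axis=0)
print(emp)
assert np.allclose(emp, c, rtol=0.15)
```

Because the exact transition density is used, the invariance holds without time-discretization bias; only Monte Carlo error remains.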
Algorithm 1
Given $a : X \times X \to [0, 1]$ and a proposal kernel $Q(u, dv)$, generate $\{u^{(k)}\}_{k\ge0}$ as follows: set $k = 0$ and pick $u^{(0)} \in X$; propose $v^{(k)} \sim Q(u^{(k)}, \cdot)$; set $u^{(k+1)} = v^{(k)}$ with probability $a(u^{(k)}, v^{(k)})$, and $u^{(k+1)} = u^{(k)}$ otherwise; increment $k$ and repeat.
Notice that the overall transition kernel of this chain is

$$P(u, dv) = Q(u, dv)\,a(u, v) + \delta_u(dv)\int_X \bigl(1 - a(u, w)\bigr)\,Q(u, dw),$$

and that

$$\int_X P(u, dv) = 1$$

as required. Substituting the expression for $P$ into the detailed balance condition from Definition 2, we obtain

$$\mu(du)\,Q(u, dv)\,a(u, v) + \mu(du)\,\delta_u(dv)\int_X \bigl(1 - a(u, w)\bigr)\,Q(u, dw)$$
$$= \mu(dv)\,Q(v, du)\,a(v, u) + \mu(dv)\,\delta_v(du)\int_X \bigl(1 - a(v, w)\bigr)\,Q(v, dw).$$

We now note that the measure $\mu(du)\,\delta_u(dv)$ is in fact symmetric in the pair $(u, v)$ and that $u = v$ almost surely under it. As a consequence, the identity reduces to

$$\mu(du)\,Q(u, dv)\,a(u, v) = \mu(dv)\,Q(v, du)\,a(v, u). \tag{10.57}$$

Our aim now is to identify choices of $a$ which ensure that (10.57) is satisfied. This will then ensure that the prototype algorithm does indeed lead to a Markov chain for which $\mu$ is invariant. To this end we define the measures

$$\nu(du, dv) = \mu(du)\,Q(u, dv)$$

and

$$\nu^T(du, dv) = \mu(dv)\,Q(v, du)$$

on $\bigl(X \times X, \mathcal{B}(X) \otimes \mathcal{B}(X)\bigr)$. The following theorem determines a necessary and sufficient condition for the choice of $a$ to make the algorithm reversible and identifies the canonical Metropolis-Hastings choice.
In particular, the choice $a_{\mathrm{mh}}(u, v) = \min\{1, r(v, u)\}$, with $r(u, v) := \frac{d\nu}{d\nu^T}(u, v)$, will imply detailed balance; indeed, for this choice,

$$\frac{d\nu}{d\nu^T}(u, v)\,a(u, v) = a(v, u),$$

as required. A good example of the resulting methodology arises in the case where $Q(u, dv)$ is reversible with respect to $\mu_0$:

$$\frac{d\nu}{d\nu^T}(u, v) = r(u, v) = \exp\bigl(\Phi(v) - \Phi(u)\bigr).$$
The preceding algorithm works well when the likelihood is not too informative; however, when the information in the likelihood is substantial, and $\Phi(\cdot)$ varies significantly depending on where it is evaluated, the independence sampler will not work well. In such a situation, it is typically the case that local proposals are needed, with a parameter controlling the degree of locality; this parameter can then be optimized by choosing it as large as possible, consistent with achieving a reasonable acceptance probability. The following algorithm is an example of this concept, with the parameter $\beta$ playing the role of the locality parameter. The algorithm may be viewed as the natural generalization of the random walk Metropolis method, for targets defined by density with respect to Lebesgue measure, to the situation where the targets are defined by density with respect to Gaussian measure. The name pCN is used because of the original derivation of the algorithm via a Crank-Nicolson discretization of the Hilbert space valued SDE (10.55).

Algorithm 3 (pCN)
1. Set $k = 0$ and pick $u^{(0)} \in X$.
2. Propose $v^{(k)} = (1 - \beta^2)^{\frac12}u^{(k)} + \beta\xi^{(k)}$, $\xi^{(k)} \sim N(0, \mathcal{C})$.
3. Set $u^{(k+1)} = v^{(k)}$ with probability $a(u^{(k)}, v^{(k)})$, independently of $(u^{(k)}, \xi^{(k)})$.
4. Set $u^{(k+1)} = u^{(k)}$ otherwise.
5. $k \to k + 1$ and return to 2.

Here the acceptance probability is based on differences of $\Phi$: $a(u, v) = \min\bigl\{1, \exp\bigl(\Phi(u) - \Phi(v)\bigr)\bigr\}$.
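The pCN iteration above is a few lines of code. The following finite-dimensional sketch uses illustrative choices (a diagonal prior covariance and a simple potential standing in for the negative log-likelihood); note that only differences of $\Phi$ enter the accept/reject step, never the prior density:

```python
import numpy as np

# Finite-dimensional pCN sketch: prior N(0, C) with diagonal C, and an
# illustrative potential Phi (not from the chapter).
rng = np.random.default_rng(2)
c = np.array([1.0, 0.5, 0.25])            # eigenvalues of C

def phi(u):
    return 0.5 * np.sum(u**2) + 0.1 * np.sum(u**4)

beta, n_iters = 0.3, 20000
u = np.zeros(3)
accepted = 0
samples = np.empty((n_iters, 3))
for k in range(n_iters):
    xi = np.sqrt(c) * rng.standard_normal(3)       # xi ~ N(0, C)
    v = np.sqrt(1.0 - beta**2) * u + beta * xi     # pCN proposal
    if rng.uniform() < min(1.0, np.exp(phi(u) - phi(v))):
        u = v
        accepted += 1
    samples[k] = u

print("acceptance rate:", accepted / n_iters)
# Phi adds extra concentration, so the posterior is tighter than the
# prior: sample variances should fall below the prior variances c.
assert (samples[5000:].var(axis=0) < c).all()
```

Because the proposal is prior reversible, the acceptance probability is dimension independent; this is what makes pCN robust under mesh refinement, in contrast to the standard random walk Metropolis method.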
Example 9. Example 8 shows that using the proposal from Example 7 within a Metropolis-Hastings context may be viewed as using a proposal based on the $\mu$-measure-preserving equation (10.53), but with the $\mathcal{C}D\Phi(u)$ term dropped. The accept-reject mechanism of Algorithm 3, which is based on differences of $\Phi$, then compensates for the missing $\mathcal{C}D\Phi(u)$ term.
In this section we introduce sequential Monte Carlo methods and show how these
may be viewed as a generic tool for sampling the posterior distribution arising
in Bayesian inverse problems. These methods have their origin in filtering of
dynamical systems but, as we will demonstrate, have the potential as algorithms
for probing a very wide class of probability measures. The key idea is to introduce
a sequence of measures which evolve the prior distribution into the posterior
distribution. Particle filtering methods are then applied to this sequence of measures
in order to evolve a set of particles that are prior distributed into a set of particles that
are approximately posterior distributed. From a practical perspective, a key step in
the construction of these methods is the use of MCMC methods which preserve the
measure of interest and other measures closely related to it; furthermore, our interest
is in designing SMC methods which, in principle, are well defined on the infinite-
dimensional space; for these two reasons, the MCMC methods from the previous
subsection play a central role in what follows.
Given an integer $J$, let $h = 1/J$, and for nonnegative integers $j \le J$, define the sequence of measures $\mu_j \ll \mu_0$ by

$$\frac{d\mu_j}{d\mu_0}(u) = \frac{1}{Z_j}\exp\bigl(-jh\,\Phi(u)\bigr), \tag{10.59a}$$

$$Z_j = \int_X \exp\bigl(-jh\,\Phi(u)\bigr)\,\mu_0(du). \tag{10.59b}$$

We assume that, for some $\Phi^\pm \in \mathbb{R}$,

$$\Phi^- \le \Phi(u) \le \Phi^+ \qquad \forall u \in X. \tag{10.60}$$

Note that $\mu_0$ is the prior, $\mu_J = \mu$ is the posterior, and

$$\frac{d\mu_{j+1}}{d\mu_j}(u) = \frac{Z_j}{Z_{j+1}}\exp\bigl(-h\,\Phi(u)\bigr). \tag{10.61}$$

An important idea here is that, while $\mu_0$ and $\mu$ may be quite far apart as measures, the pair of measures $\mu_j, \mu_{j+1}$ can be quite close, for sufficiently small $h$. This fact can be used to incrementally evolve samples from $\mu_0$ into approximate samples of $\mu_J$.

Let $L$ denote the operator on probability measures which corresponds to application of Bayes theorem with likelihood proportional to $\exp\bigl(-h\,\Phi(u)\bigr)$, and let $P_j$ denote any Markov kernel which preserves the measure $\mu_j$; such kernels arise, for example, from the MCMC methods of the previous subsection. These considerations imply that

$$\mu_{j+1} = L P_j\,\mu_j. \tag{10.62}$$
This update may be split into the two steps

$$\hat\mu_{j+1} = P_j\,\mu_j, \tag{10.63a}$$

$$\mu_{j+1} = L\,\hat\mu_{j+1}. \tag{10.63b}$$

We approximate each of the two steps in (10.63) separately. To this end it helps to note that, since $P_j$ preserves $\mu_j$,

$$\frac{d\mu_{j+1}}{d\hat\mu_{j+1}}(u) = \frac{Z_j}{Z_{j+1}}\exp\bigl(-h\,\Phi(u)\bigr). \tag{10.64}$$
The approximate distribution is completely defined by particle positions $v_j^{(n)}$ and weights $w_j^{(n)}$, respectively. Thus the objective of the method is to find an update rule for $\{v_j^{(n)}, w_j^{(n)}\}_{n=1}^N \mapsto \{v_{j+1}^{(n)}, w_{j+1}^{(n)}\}_{n=1}^N$. The weights must sum to one. To do this we proceed as follows. First, each particle $v_j^{(n)}$ is updated by proposing a new candidate particle $\hat v_{j+1}^{(n)}$ according to the Markov kernel $P_j$; this corresponds to (10.63a) and creates an approximation of $\hat\mu_{j+1}$. (See the last two parts of Remark 8 for a discussion of the role of $P_j$ in the algorithm.) We can think of this approximation as a prior distribution for application of Bayes rule in the form (10.63b), or equivalently (10.64). Secondly, each new particle is re-weighted according to the desired distribution $\mu_{j+1}$ given by (10.64). The required calculations are very straightforward because of the assumed form of the measures as sums of Diracs, as we now explain.

The first step of the algorithm has made the approximation

$$\hat\mu_{j+1} \approx \hat\mu_{j+1}^N = \sum_{n=1}^N w_j^{(n)}\,\delta_{\hat v_{j+1}^{(n)}}. \tag{10.66}$$

Re-weighting then gives

$$\mu_{j+1} \approx \mu_{j+1}^N := \sum_{n=1}^N w_{j+1}^{(n)}\,\delta_{\hat v_{j+1}^{(n)}}, \tag{10.67}$$
where

$$\hat w_{j+1}^{(n)} = \exp\bigl(-h\,\Phi(\hat v_{j+1}^{(n)})\bigr)\,w_j^{(n)}, \tag{10.68}$$

$$w_{j+1}^{(n)} = \hat w_{j+1}^{(n)} \Big/ \sum_{n=1}^N \hat w_{j+1}^{(n)}. \tag{10.69}$$
Practical experience shows that some weights become very small, and for this
.n/
reason it is desirable to add a resampling step to determine the fvj C1 g by drawing
from (10.67); this has the effect of removing particles with very low weights and
replacing them with multiple copies of the particles with higher weights. Because
the initial measure $\mathbb{P}(v_0)$ is not in Dirac form, it is convenient to place this
resampling step at the start of each iteration, rather than at the end as we have
presented here, as this naturally introduces a particle approximation of the initial
measure. This reordering makes no difference to the iteration we have described
and results in the following algorithm.
Algorithm 4
1. Let $\mu_0^N = \mu_0$ and set $j = 0$.
2. Draw $v_j^{(n)} \sim \mu_j^N$, $n = 1, \ldots, N$.
3. Set $w_j^{(n)} = 1/N$, $n = 1, \ldots, N$, and define $\mu_j^N$ by (10.65).
4. Draw $\hat v_{j+1}^{(n)} \sim P_j(v_j^{(n)}, \cdot)$.
5. Define $w_{j+1}^{(n)}$ by (10.68), (10.69) and $\mu_{j+1}^N$ by (10.67).
6. $j \to j + 1$ and return to 2.
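The full mutate / re-weight / resample cycle of Algorithm 4 can be sketched in a few lines. The choices below (scalar state, standard Gaussian prior, bounded potential, a pCN kernel invariant for each $\mu_j$ as the mutation) are illustrative assumptions, not prescriptions from the chapter:

```python
import numpy as np

rng = np.random.default_rng(3)

def phi(v):
    return np.tanh(v) ** 2          # bounded: 0 <= Phi <= 1, as in (10.60)

J, N, beta = 20, 5000, 0.5
h = 1.0 / J
v = rng.standard_normal(N)          # draws from mu_0 = N(0, 1)

for j in range(J):
    # Mutation: a few pCN steps with tempered potential, invariant for mu_j.
    for _ in range(3):
        prop = np.sqrt(1 - beta**2) * v + beta * rng.standard_normal(N)
        accept = rng.uniform(size=N) < np.exp(j * h * (phi(v) - phi(prop)))
        v = np.where(accept, prop, v)
    # Re-weighting by exp(-h * Phi), Eqs. (10.68)-(10.69), then resampling.
    w = np.exp(-h * phi(v))
    w /= w.sum()
    v = rng.choice(v, size=N, p=w)

# v now holds approximate samples from mu, with density proportional to
# exp(-Phi(v)) * exp(-v^2 / 2); the reweighting favours small |v|, so
# the sample variance should drop below the prior variance 1.
assert v.var() < 1.0
```

Resampling is performed at every level here for simplicity; in practice it is often triggered only when the effective sample size of the weights degenerates.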
$$\mu_{j+1}^N = L\,S^N P_j\,\mu_j^N, \tag{10.70}$$

which should be compared with the iteration (10.62). Thus, analyzing the particle filter requires estimation of the error induced by application of $S^N$ (the resampling error), together with estimation of the rate of accumulation of this error in time.
The operators $L$, $P_j$, and $S^N$ map the space $\mathcal{P}(X)$ of probability measures on $X$ into itself according to the following:

$$(L\mu)(dv) = \frac{\exp\bigl(-h\,\Phi(v)\bigr)\,\mu(dv)}{\int_X \exp\bigl(-h\,\Phi(v)\bigr)\,\mu(dv)},$$

$$(P_j\mu)(dv) = \int_X P_j(v', dv)\,\mu(dv'),$$

$$(S^N\mu)(dv) = \frac{1}{N}\sum_{n=1}^N \delta_{v^{(n)}}(dv), \qquad v^{(n)} \sim \mu \ \text{i.i.d.}$$
The distance we use between (possibly random) probability measures is

$$d(\mu, \nu) = \sup_{|f|_\infty \le 1}\Bigl(\mathbb{E}^\omega\bigl|\mu(f) - \nu(f)\bigr|^2\Bigr)^{\frac12},$$

with $|f|_\infty := \sup_{v\in X}|f(v)|$, and where we have used the convention that $\mu(f) = \int_X f(v)\,\mu(dv)$ for measurable $f : X \to \mathbb{R}$, and similarly for $\nu$. This distance does indeed generate a metric and, in particular, satisfies the triangle inequality. In fact, it is simply the total variation distance in the case of measures which are not random.

With respect to this distance between random probability measures, we may prove that the SMC method generates a good approximation of the true measure $\mu$, in the limit $N \to \infty$. We use the fact that, under (10.60), we have

$$\exp(-h\,\Phi^+) \le \exp\bigl(-h\,\Phi(v)\bigr) \le \exp(-h\,\Phi^-).$$

Since $\Phi^- \le 0$ and $\Phi^+ \ge 0$, we deduce that there exists $\kappa \in (0, 1)$ such that, for all $v \in X$,

$$\kappa \le \exp\bigl(-h\,\Phi(v)\bigr) \le \kappa^{-1}.$$
We may then prove the following error bound for the iteration (10.70):

$$d(\mu_J^N, \mu_J) \le \sum_{j=1}^J (2\kappa^{-2})^j\,\frac{1}{\sqrt{N}}.$$
Proof. The desired result is a consequence of the following three facts, whose proofs we postpone to three lemmas at the end of the subsection:

$$\sup_{\mu \in \mathcal{P}(X)} d(S^N\mu, \mu) \le \frac{1}{\sqrt{N}},$$

$$d(P_j\nu, P_j\nu') \le d(\nu, \nu'),$$

$$d(L\nu, L\nu') \le 2\kappa^{-2}\,d(\nu, \nu').$$

Using these, together with the triangle inequality,

$$d(\mu_{j+1}^N, \mu_{j+1}) = d\bigl(L S^N P_j\,\mu_j^N,\; L P_j\,\mu_j\bigr)$$
$$\le d\bigl(L P_j\,\mu_j^N,\, L P_j\,\mu_j\bigr) + d\bigl(L S^N P_j\,\mu_j^N,\, L P_j\,\mu_j^N\bigr)$$
$$\le 2\kappa^{-2}\,d(\mu_j^N, \mu_j) + 2\kappa^{-2}\,d\bigl(S^N P_j\,\mu_j^N,\, P_j\,\mu_j^N\bigr)$$
$$\le 2\kappa^{-2}\,d(\mu_j^N, \mu_j) + \frac{2\kappa^{-2}}{\sqrt{N}}.$$

Iterating, since $\mu_0^N = \mu_0$, gives the desired result.

Remarks 8. This theorem shows that the sequential particle filter actually reproduces the true posterior distribution $\mu = \mu_J$, in the limit $N \to \infty$. We make some comments about this.
We now prove the three lemmas which underlie the convergence proof. The first concerns the sampling operator:

$$\sup_{\mu \in \mathcal{P}(X)} d(S^N\mu, \mu) \le \frac{1}{\sqrt{N}}.$$

Proof. Let $\mu$ be an element of $\mathcal{P}(X)$ and $\{v^{(k)}\}_{k=1}^N$ a set of i.i.d. samples with $v^{(1)} \sim \mu$; the randomness entering the probability measures is through these samples, expectation with respect to which we denote by $\mathbb{E}^\omega$ in what follows. Then

$$S^N\mu(f) = \frac{1}{N}\sum_{k=1}^N f(v^{(k)}),$$

and, defining $\bar f = f - \mu(f)$, we have

$$S^N\mu(f) - \mu(f) = \frac{1}{N}\sum_{k=1}^N \bar f(v^{(k)}).$$

The $\bar f(v^{(k)})$ are i.i.d. with zero mean. Furthermore, for $|f|_\infty \le 1$,

$$\mathbb{E}^\omega\bigl|\mu(f) - S^N\mu(f)\bigr|^2 = \frac{1}{N^2}\sum_{k=1}^N \mathbb{E}^\omega|\bar f(v^{(k)})|^2 \le \frac{1}{N}.$$

Since the result is independent of $\mu$, we may take the supremum over all probability measures and obtain the desired result.
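The $N^{-1/2}$ rate in this lemma is easy to observe empirically. The following sketch uses illustrative choices ($\mu = N(0,1)$ and $f = \tanh$, which satisfies $|f|_\infty \le 1$) and checks the lemma's bound on the root mean square error of the empirical measure:

```python
import numpy as np

rng = np.random.default_rng(4)
f = np.tanh
# Accurate reference value of mu(f) for mu = N(0, 1) (it is 0 by symmetry).
mu_f = np.mean(f(rng.standard_normal(10**6)))

for N in [100, 1000, 10000]:
    reps = 500
    draws = rng.standard_normal((reps, N))
    errs = f(draws).mean(axis=1) - mu_f     # S^N mu(f) - mu(f), per replica
    rmse = np.sqrt(np.mean(errs**2))
    print(N, rmse)
    assert rmse <= 1.0 / np.sqrt(N)         # the lemma's bound, |f| <= 1
```

The observed RMSE is in fact $\sqrt{\operatorname{Var}_\mu(f)/N}$, which is strictly below the lemma's worst-case bound $N^{-1/2}$.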
The second lemma states that, for all $\nu, \nu' \in \mathcal{P}(X)$,

$$d(P_j\nu, P_j\nu') \le d(\nu, \nu').$$

Proof. The result is generic for any Markov kernel $P$, so we drop the index $j$ on $P_j$ for the duration of the proof. Define

$$q(v') = \int_X P(v', dv)\,f(v),$$

that is, the expected value of $f$ under one step of the Markov chain started from $v'$. Clearly, since

$$|q(v')| \le \int_X P(v', dv)\,\sup_v|f(v)| = \sup_v|f(v)|,$$

it follows that $|q|_\infty \le |f|_\infty$. Also, since

$$\nu P(f) = \int_X f(v)\Bigl(\int_X P(v', dv)\,\nu(dv')\Bigr) = \nu(q),$$

we have

$$d(P\nu, P\nu') = \sup_{|f|_\infty \le 1}\Bigl(\mathbb{E}^\omega\bigl|\nu P(f) - \nu'P(f)\bigr|^2\Bigr)^{\frac12}$$
$$\le \sup_{|q|_\infty \le 1}\Bigl(\mathbb{E}^\omega\bigl|\nu(q) - \nu'(q)\bigr|^2\Bigr)^{\frac12}$$
$$= d(\nu, \nu'),$$

as required.
Proof. Define $g(v) = \exp\bigl(-h\,\Phi(v)\bigr)$. Notice that for $|f|_\infty \le 1$, we can rewrite

$$(L\nu)(f) - (L\nu')(f) = \frac{\nu(fg)}{\nu(g)} - \frac{\nu'(fg)}{\nu'(g)}$$
$$= \frac{1}{\nu(g)}\bigl(\nu(fg) - \nu'(fg)\bigr) + \frac{\nu'(fg)}{\nu'(g)}\,\frac{1}{\nu(g)}\bigl(\nu'(g) - \nu(g)\bigr).$$

Now notice that $\nu(g)^{-1} \le \kappa^{-1}$ and that, for $|f|_\infty \le 1$, $|\nu'(fg)|/\nu'(g) \le 1$, since the expression corresponds to an expectation with respect to the measure found from $\nu'$ by reweighting with likelihood proportional to $g$. Thus, since $|fg|_\infty$ and $|g|_\infty$ are bounded by $\kappa^{-1}$, taking expectations and the supremum over $|f|_\infty \le 1$ gives $d(L\nu, L\nu') \le 2\kappa^{-2}\,d(\nu, \nu')$, as required.
In the remainder of this section, we shift our attention to continuous time processes which preserve $\mu$; these are important in the construction of proposals for MCMC methods and also as diffusion limits for MCMC. Our main goal is to show that equation (10.53) preserves $\mu$. Our setting is to work in the separable Hilbert space $\mathcal{H}$ with inner product and norm denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively. We assume that the prior $\mu_0$ is a Gaussian on $\mathcal{H}$ and, furthermore, we specify the space $X \subset \mathcal{H}$ that will play a central role in this continuous time setting. This choice of space $X$ will link the properties of the reference measure $\mu_0$ and the potential $\Phi$. We assume that $\mathcal{C}$ has eigendecomposition $\mathcal{C}\varphi_j = \gamma_j^2\varphi_j$, with normalized eigenfunctions $\{\varphi_j\}_{j\ge1}$. For $r > 0$ define

$$X^r = \Bigl\{u \in \mathcal{H} \;\Big|\; \sum_{j=1}^\infty j^{2r}\,|\langle u, \varphi_j\rangle|^2 < \infty\Bigr\},$$

and then extend to superspaces $r < 0$ by duality. We use $\|\cdot\|_r$ to denote the norm induced by the inner product

$$\langle u, v\rangle_r = \sum_{j=1}^\infty j^{2r}\,u_j v_j$$

for $u_j = \langle u, \varphi_j\rangle$ and $v_j = \langle v, \varphi_j\rangle$.

We first study the finite-dimensional setting, in which $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^n$; we also use this notation for the induced matrix norm on $\mathbb{R}^n$. We assume that

$$I \in C^2(\mathbb{R}^n, \mathbb{R}^+), \qquad \int_{\mathbb{R}^n} e^{-I(u)}\,du = 1.$$
Consider the SDE

$$\frac{du}{dt} = -A\,DI(u) + \sqrt{2A}\,\frac{dB}{dt}, \qquad u(0) = u_0, \tag{10.75}$$

where $A$ is a symmetric positive-definite matrix and $B$ a standard Brownian motion on $\mathbb{R}^n$, and assume that

$$|D^2I(u)| \le M.$$

Theorem 24. For every $u_0 \in \mathbb{R}^n$ and $\mathbb{W}$-a.s., equation (10.75) has a unique global in time solution $u \in C([0,\infty); \mathbb{R}^n)$.

Proof. Write (10.75) in integral form:

$$u(t) = u_0 - \int_0^t A\,DI\bigl(u(s)\bigr)\,ds + \sqrt{2A}\,B(t). \tag{10.76}$$

Define $\mathcal{X} = C([0,T]; \mathbb{R}^n)$ and $F : \mathcal{X} \to \mathcal{X}$ by

$$(Fv)(t) = u_0 - \int_0^t A\,DI\bigl(v(s)\bigr)\,ds + \sqrt{2A}\,B(t). \tag{10.77}$$

Thus $u \in \mathcal{X}$ solving (10.76) is a fixed point of $F$. We show that $F$ has a unique fixed point for $T$ sufficiently small. To this end we study a contraction property of $F$:

$$\|(Fv_1) - (Fv_2)\|_{\mathcal{X}} = \sup_{0\le t\le T}\Bigl|\int_0^t \bigl(A\,DI(v_1(s)) - A\,DI(v_2(s))\bigr)\,ds\Bigr|$$
$$\le \int_0^T \bigl|A\,DI(v_1(s)) - A\,DI(v_2(s))\bigr|\,ds$$
$$\le |A|\,M\int_0^T |v_1(s) - v_2(s)|\,ds$$
$$\le T\,|A|\,M\,\|v_1 - v_2\|_{\mathcal{X}}.$$

Choosing $T|A|M < 1$ makes $F$ a contraction, giving a unique solution on $[0,T]$; repeating the argument on successive intervals gives a unique global solution.

Now write

$$B(t) = \sum_{j=1}^n \beta_j(t)\,e_j,$$

where the $\{\beta_j\}_{j=1}^n$ are an i.i.d. collection of standard unit Brownian motions on $\mathbb{R}$, and the $\{e_j\}$ are orthonormal eigenvectors of $A$ with eigenvalues $\{\alpha_j\}$. Thus we obtain

$$\sqrt{A}\,B(t) = \sum_{j=1}^n \sqrt{\alpha_j}\,\beta_j(t)\,e_j =: W(t),$$

and (10.75) may be written as

$$\frac{du}{dt} = -A\,DI(u) + \sqrt{2}\,\frac{dW}{dt}, \qquad u(0) = u_0. \tag{10.78}$$
Theorem 25. Let $u(t)$ solve (10.75). If $u_0 \sim \nu$, where $\nu(du) = e^{-I(u)}\,du$, then $u(t) \sim \nu$ for all $t > 0$. More precisely, for all $\varphi : \mathbb{R}^n \to \mathbb{R}^+$ bounded and continuous, $u_0 \sim \nu$ implies

$$\mathbb{E}\varphi\bigl(u(t)\bigr) = \mathbb{E}\varphi(u_0), \qquad \forall t > 0.$$

Proof. Consider the additive noise SDE, for additive noise with strictly positive-definite diffusion matrix $\Sigma$,

$$\frac{du}{dt} = f(u) + \sqrt{2\Sigma}\,\frac{dB}{dt}, \qquad u(0) = u_0 \sim \rho_0.$$

The density $\rho(u,t)$ of $u(t)$ satisfies the Fokker-Planck equation

$$\frac{\partial\rho}{\partial t} = \nabla\cdot\bigl(-f\rho + \Sigma\nabla\rho\bigr), \qquad (u, t) \in \mathbb{R}^n \times \mathbb{R}^+,$$
$$\rho|_{t=0} = \rho_0.$$

At time $t > 0$ the solution of the SDE is distributed according to the measure with density $\rho(u,t)$ solving the Fokker-Planck equation. Thus the initial measure with density $\rho_0$ is preserved if

$$\nabla\cdot\bigl(-f\rho_0 + \Sigma\nabla\rho_0\bigr) = 0.$$

For (10.75) we have $f = -A\,DI$ and $\Sigma = A$, while the candidate density is $\rho_0 = e^{-I}$, for which $\nabla\rho_0 = -DI\,\rho_0$. Thus

$$\nabla\cdot\bigl(A\,DI(u)\,\rho_0 + A\,\nabla\rho_0\bigr) = \nabla\cdot(0) = 0,$$

and the result follows.
We now apply this result to the finite-dimensional analogue of our measure of interest:

$$\nu(du) = \exp\bigl(-I(u)\bigr)\,du, \qquad I(u) = \frac12\bigl|\mathcal{C}^{-\frac12}u\bigr|^2 + \Phi(u) + \ln Z.$$

Thus

$$DI(u) = \mathcal{C}^{-1}u + D\Phi(u),$$

and (10.75) takes the form

$$\frac{du}{dt} = -A\bigl(\mathcal{C}^{-1}u + D\Phi(u)\bigr) + \sqrt{2A}\,\frac{dB}{dt}.$$

Choosing $A = \mathcal{C}$ gives

$$\frac{du}{dt} = -\bigl(u + \mathcal{C}\,D\Phi(u)\bigr) + \sqrt{2\mathcal{C}}\,\frac{dB}{dt}.$$

This is exactly (10.53), provided $W = \sqrt{\mathcal{C}}\,B$, where $B$ is a Brownian motion with covariance $I$. Then $W$ is a Brownian motion with covariance $\mathcal{C}$. This is the finite-dimensional analogue of the construction of a $\mathcal{C}$-Wiener process in the Appendix. We are now in a position to prove Theorems 26 and 27, which are the infinite-dimensional analogues of Theorems 24 and 25.
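Before passing to infinite dimensions, the finite-dimensional invariance can be checked by direct simulation. The following sketch takes a scalar example with an illustrative potential $\Phi$ (an assumption of this illustration), integrates the Langevin equation with $A = C$ by Euler-Maruyama, and compares the long-run second moment against quadrature:

```python
import numpy as np

# Scalar Langevin equation du = -(u + C*Phi'(u)) dt + sqrt(2C) dB,
# which preserves nu(du) proportional to exp(-Phi(u)) N(0, C)(du).
rng = np.random.default_rng(5)
c = 0.5                                  # C, scalar covariance
phi = lambda u: 0.25 * u**4              # illustrative potential
dphi = lambda u: u**3

dt, n_steps, n_paths = 1e-3, 3000, 10000
u = np.sqrt(c) * rng.standard_normal(n_paths)    # start at the prior
for _ in range(n_steps):
    u += -(u + c * dphi(u)) * dt \
         + np.sqrt(2 * c * dt) * rng.standard_normal(n_paths)

# Reference second moment of nu by quadrature.
x = np.linspace(-5, 5, 4001)
w = np.exp(-phi(x) - x**2 / (2 * c))
ref = np.sum(x**2 * w) / np.sum(w)

print(np.mean(u**2), ref)
assert abs(np.mean(u**2) - ref) < 0.05
```

The small residual discrepancy is the $O(\Delta t)$ bias of Euler-Maruyama plus Monte Carlo error; an exact check would require a Metropolis correction, as in MALA.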
Assumptions 4 also include a bound on the second derivative of $\Phi$, of the form

$$\|D^2\Phi(u)\|_{L(X^t, X^{-t})} \le M_4.$$

Example 10. The functional $\Phi(u) = \frac12\|u\|_t^2$ satisfies Assumptions 4. To see this, note that we may write $\Phi(u) = \frac12\langle u, Ku\rangle$, where

$$K = \sum_{j=1}^\infty j^{2t}\,\varphi_j \otimes \varphi_j.$$

Using the smoothing property of $\mathcal{C}$, one then checks that the function $\mathcal{C}D\Phi(\cdot)$ is globally Lipschitz on $X^t$.
386 M. Dashti and A.M. Stuart
d N 1
.u/ D N exp .P N u/ ; (10.82a)
d 0 Z
Z
ZN D exp .P N u/ 0 .d u/: (10.82b)
X0
N WD P N : (10.83)
Since and hence N are bounded below by M1 , and since the function 1Ckuk2
is integrable by the Fernique theorem 10, the approximation Theorem 18 applies.
We deduce that the Hellinger distance between and N is bounded above by
O.N r / for any r < s 12 t since t 2 .0; s 12 t /.
We will not use this explicit convergence rate in what follows, but we will use
the idea that N converges to in order to prove invariance of the measure under
the SDE (10.53). The measure N has a product structure that we will exploit in the
following. We note that any element u 2 H is uniquely decomposed as u D p C q
where p 2 X N and q 2 X ? . Thus we will write N .d u/ D N .dp; dq/, and
similar expressions for 0 and so forth, in what follows.
$$\frac{d\mu_P}{d\mu_{0,P}}(u) \propto \exp\bigl(-\Phi(p)\bigr),$$

and the approximating SDE is

$$\frac{du^N}{dt} = -\Bigl(u^N + \mathcal{C}P^N D\Phi^N(u^N)\Bigr) + \sqrt{2}\,\frac{dW}{dt}. \tag{10.84}$$
Lemma 13. Under Assumption 4, the following estimates hold, with all constants uniform in $N$: the function $\mathcal{C}P^ND\Phi^N(\cdot)$ is globally Lipschitz on $X^t$, and the functional $\Phi^N(\cdot) : X^t \to \mathbb{R}$ satisfies a second-order Taylor formula (for which we extend $\langle\cdot,\cdot\rangle$ from an inner product on $X$ to the dual pairing between $X^{-t}$ and $X^t$): there exists a constant $M_6 > 0$ such that

$$\Phi^N(v) - \Phi^N(u) - \langle D\Phi^N(u),\, v - u\rangle \le M_6\,\|u - v\|_t^2 \qquad \forall u, v \in X^t. \tag{10.86}$$
388 M. Dashti and A.M. Stuart
The solution is said to be global if T > 0 is arbitrary. For us, W will be a C-Wiener
process and hence random; we look for existence of a global solution, almost surely
with respect to the Wiener measure. Similarly a solution of (10.84) is a function
uN 2 C .0; T
I X t / satisfying the integral equation
Z p
N
u . / D u0 C F N uN .s/ ds C 2 W . / 8t 2 0; T
: (10.88)
0
Again, the solution is random because W is a C-Wiener process. Note that the
solution to this equation is not confined to X N , because both u0 and W have
nontrivial components in X ? : However, within X ? , the behavior is purely Gaussian
and within X N , it is finite dimensional. We will exploit these two facts.
The following establishes basic existence, uniqueness, continuity, and approximation properties of the solutions of (10.87) and (10.88).
Theorem 26. For every u_0 ∈ X^t and for almost every C-Wiener process W, equation (10.87) (respectively, (10.88)) has a unique global solution. For any pair (u_0, W) ∈ X^t × C([0,T]; X^t), we may thus define the Itô map
$$\Psi : X^t \times C([0,T]; X^t) \to C([0,T]; X^t)$$
which maps (u_0, W) to the unique solution u of the integral equation (10.87) (respectively, Ψ^N mapping to u^N for (10.88)). The map Ψ (respectively, Ψ^N) is globally Lipschitz continuous. Finally, we have that Ψ^N(u_0, W) → Ψ(u_0, W) strongly in C([0,T]; X^t) for every pair (u_0, W) ∈ X^t × C([0,T]; X^t).
Proof. The existence and uniqueness of local solutions to the integral equation (10.87) is a simple application of the contraction mapping principle, following arguments similar to those employed in the proof of Theorem 24. Extension to a global solution may be achieved by repeating the local argument on successive intervals.
Now let u^{(i)} solve
$$u^{(i)}(\tau) = u_0^{(i)} + \int_0^\tau F\bigl(u^{(i)}(s)\bigr)\,ds + \sqrt{2}\,W^{(i)}(\tau), \qquad \tau \in [0,T],$$
10 The Bayesian Approach to Inverse Problems 389
Thus, since C DΦ is globally Lipschitz on X^t by Lemma 11, and P^N has norm one as a mapping from X^t into itself, F^N is globally Lipschitz on X^t, with Lipschitz constant uniform in N.

Corollary 2. For each τ ∈ [0,T] we may define the map
$$\Psi_\tau : X^t \times C([0,T]; X^t) \to X^t$$
(respectively, Ψ^N_τ for (10.88)) which maps (u_0, W) to the unique solution u(τ) of the integral equation (10.87) (respectively, u^N(τ) for (10.88)) at time τ. The map Ψ_τ (respectively, Ψ^N_τ) is globally Lipschitz continuous. Finally, we have that Ψ^N_τ(u_0, W) → Ψ_τ(u_0, W) for every pair (u_0, W) ∈ X^t × C([0,T]; X^t).
Theorem 27. Let Assumption 4 hold. Then the measure μ given by (10.43) is invariant for (10.53); that is, for all continuous bounded functions φ: X^t → ℝ, if E^μ denotes expectation with respect to the product measure found from initial condition u_0 ∼ μ and W ∼ 𝕎, the C-Wiener measure on X^t, then E^μ φ(u(τ)) = E^μ φ(u_0).
Proof. It suffices to show that
$$\mathbb{E}^{\mu^N}\varphi\bigl(u^N(\tau)\bigr) \to \mathbb{E}^{\mu}\varphi\bigl(u(\tau)\bigr) \tag{10.93}$$
and
$$\mathbb{E}^{\mu^N}\varphi(u_0) \to \mathbb{E}^{\mu}\varphi(u_0). \tag{10.94}$$
Both of these facts follow from the dominated convergence theorem, as we now show. First note that
$$\mathbb{E}^{\mu^N}\varphi(u_0) = \frac{1}{Z^N}\int \varphi(u_0)\,e^{-\Phi(P^N u_0)}\,\mu_0(du_0).$$
Since φ(·)e^{−Φ(P^N ·)} is bounded, independently of N, by (sup φ)e^{M₁}, and since (Φ∘P^N)(u) converges pointwise to Φ(u) on X^t, we deduce (applying dominated convergence to the integral above and to Z^N separately) that
$$\mathbb{E}^{\mu^N}\varphi(u_0) \to \frac{1}{Z}\int \varphi(u_0)\,e^{-\Phi(u_0)}\,\mu_0(du_0) = \mathbb{E}^{\mu}\varphi(u_0),$$
so that (10.94) holds. The convergence in (10.93) holds by a similar argument. From (10.91) we have
$$\mathbb{E}^{\mu^N}\varphi\bigl(u^N(\tau)\bigr) = \frac{1}{Z^N}\iint \varphi\bigl(\Psi^N_\tau(u_0,W)\bigr)\,e^{-\Phi(P^N u_0)}\,\mu_0(du_0)\,\mathbb{W}(dW). \tag{10.95}$$
The integrand is again dominated by (sup φ)e^{M₁}. Using the pointwise convergence of Ψ^N_τ to Ψ_τ on X^t × C([0,T]; X^t), as proved in Corollary 2, as well as the pointwise convergence of (Φ∘P^N)(u) to Φ(u), the desired result follows from dominated convergence: we find that
$$\mathbb{E}^{\mu^N}\varphi\bigl(u^N(\tau)\bigr) \to \frac{1}{Z}\iint \varphi\bigl(\Psi_\tau(u_0,W)\bigr)\,e^{-\Phi(u_0)}\,\mu_0(du_0)\,\mathbb{W}(dW) = \mathbb{E}^{\mu}\varphi\bigl(u(\tau)\bigr).$$
Lemma 14. Let Assumption 4 hold. Then the measure μ^N given by (10.82) is invariant for (10.84); that is, for all continuous bounded functions φ: X^t → ℝ, if E^{μ^N} denotes expectation with respect to the product measure found from initial condition u_0 ∼ μ^N and W ∼ 𝕎, the C-Wiener measure on X^t, then E^{μ^N} φ(u^N(τ)) = E^{μ^N} φ(u_0).

Proof. Recall from Lemma 12 that the measure μ^N given by (10.82) factors as the independent product of two measures, μ_P on X^N and μ_Q on X^⊥. On X^⊥ the measure is simply the Gaussian μ_Q = N(0, C^⊥), while on X^N the measure μ_P is finite dimensional, with density proportional to
$$\exp\Bigl(-\Phi(p) - \tfrac12\bigl\|(C^N)^{-1/2}p\bigr\|^2\Bigr). \tag{10.96}$$
Equation (10.84) also decouples on the spaces X^N and X^⊥. On X^⊥ it gives the integral equation
$$q(\tau) = q(0) - \int_0^\tau q(s)\,ds + \sqrt{2}\,Q^N W(\tau), \tag{10.97}$$
where Q^N denotes the projection onto X^⊥.
priors, in [10]. The proof of Theorem 23 follows the very clear exposition given
in [84] in the context of filtering for hidden Markov models.
Sections 5.4–5.6 concern measure-preserving continuous-time dynamics. The finite-dimensional aspects of this subsection, which we introduce for motivation, are covered in the texts [79] and [37]; the first of these books is an excellent introduction to the basic existence and uniqueness theory, outlined in a simple case in Theorem 24, while the second provides an in-depth treatment of the subject from the viewpoint of the Fokker-Planck equation, as used in Theorem 25. This subject has a long history, which is overviewed in the paper [41], where the idea is applied to finding SPDEs which are invariant with respect to the measure generated by a conditioned diffusion process. This idea is generalized to certain conditioned hypoelliptic diffusions in [42]. It is also possible to study deterministic Hamiltonian dynamics which preserves the same measure. This idea is described in [9] in the same setup as employed here; that paper also contains references to the wider literature. Lemma 11 is proved in [72] and Lemma 13 in [82]. Lemma 14 requires knowledge of the invariance of Ornstein-Uhlenbeck processes, together with invariance of finite-dimensional first-order Langevin equations with the form of gradient dynamics subject to additive noise. The invariance of the Ornstein-Uhlenbeck process is covered in [29], and invariance of finite-dimensional SDEs using the Fokker-Planck equation is discussed in [37]. The C-Wiener process and its properties are described in [28].
The primary focus of this section has been on the theory of measure-preserving
dynamics and its relations to algorithms. The SPDEs are of interest in their own
right as a theoretical object, but have particular importance in the construction of
MCMC methods and in understanding the limiting behavior of MCMC methods.
It is also important to appreciate that MCMC and SMC methods are by no
means the only tools available to study the Bayesian inverse problem. In this
context we note that computing the expectation with respect to the posterior
can be reformulated as computing the ratio of two expectations with respect
to the prior, the denominator being the normalization constant. Quasi-Monte Carlo (QMC) methods can be effective in some such high-dimensional integration problems; [59] and [77] are general references on the QMC methodology.
of QMC for bounded integration domains and is relevant for uniform priors.
The paper [60] contains theoretical results for unbounded integration domains
and is relevant to, for example, Gaussian priors. The use of QMC in plain
uncertainty quantification (calculating the pushforward of a measure through a
map) is studied for elliptic PDEs with random coefficients in [58] (uniform)
and [39] (Gaussian). More sophisticated integration tools can be employed,
using polynomial chaos representations of the prior measure, and computing
posterior expectations in a manner which exploits sparsity in the map from
unknown random coefficients to measured data; see [89, 90]. Much of this work, viewing uncertainty quantification from the point of view of high-dimensional integration, has its roots in early papers concerning plain uncertainty quantification in elliptic PDEs with random coefficients; the paper [7] was foundational in this area.
6 Conclusions
A Appendix
In this subsection we briefly define the Hilbert and Banach spaces that will be important in our developments of probability and integration in infinite-dimensional spaces. We pay particular attention to the issue of separability (the existence of a countable dense subset), which we require in that context. We primarily restrict our discussion to ℝ- or ℂ-valued functions, but the reader will easily be able to extend to ℝ^n-valued or ℝ^{n×n}-valued situations; we discuss Banach space-valued functions at the end of the subsection.
$$\ell^p_w = \ell^p_w(\mathbb{N};\mathbb{R}) = \Bigl\{ u \in \mathbb{R}^\infty : \sum_{j=1}^\infty w_j |u_j|^p < \infty \Bigr\}.$$
Then ℓ^p_w is a Banach space when equipped with the norm
$$\|u\|_{\ell^p_w} = \Bigl( \sum_{j=1}^\infty w_j |u_j|^p \Bigr)^{1/p}.$$
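The weighted norm above is easy to evaluate for truncated sequences. The following Python sketch (an illustration, not from the text) computes $\|u\|_{\ell^p_w}$ and shows how the choice of weights $w_j$ decides whether the sequence $u_j = 1/j$ belongs to $\ell^2_w$:

```python
import numpy as np

def weighted_lp_norm(u, w, p):
    """Truncated version of ||u||_{l^p_w} = (sum_j w_j |u_j|^p)^(1/p)."""
    u, w = np.asarray(u, dtype=float), np.asarray(w, dtype=float)
    return (w * np.abs(u) ** p).sum() ** (1.0 / p)

j = np.arange(1, 200001, dtype=float)
u = 1.0 / j                                   # the sequence u_j = 1/j

# With weights w_j = 1/j the series sum_j j^(-3) converges (to zeta(3)),
# so u lies in l^2_w; with weights w_j = j the series sum_j 1/j diverges.
print(weighted_lp_norm(u, 1.0 / j, 2) ** 2)   # approaches zeta(3) ~ 1.2021
print(weighted_lp_norm(u, j, 2) ** 2)         # grows like log of the cutoff
```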
In the case p = 2, the resulting spaces are Hilbert spaces when equipped with the inner product
$$\langle u, v\rangle = \sum_{j=1}^\infty w_j u_j v_j.$$
The space ℓ^∞ of bounded sequences is not separable. Each element of the sequence u_j is real valued, but the definitions may be readily extended to complex-valued, ℝ^n-valued, and ℝ^{n×n}-valued sequences, replacing |·| by the complex modulus, the vector ℓ^p norm, and the operator ℓ^p norm on matrices, respectively.
We now extend the idea of p-summability of sequences to p-integrability of functions. Let D be a bounded open set in ℝ^d with Lipschitz boundary, and define the space L^p = L^p(D; ℝ) of Lebesgue measurable functions f: D → ℝ with norm ‖·‖_{L^p(D)} defined by
$$\|f\|_{L^p(D)} := \begin{cases} \bigl(\int_D |f|^p\,dx\bigr)^{1/p} & \text{for } 1 \le p < \infty,\\[2pt] \operatorname{ess\,sup}_D |f| & \text{for } p = \infty. \end{cases}$$
Here a.e. is with respect to Lebesgue measure, and the integral is, of course, the Lebesgue integral. Sometimes we drop explicit reference to the set D in the norm and simply write ‖·‖_{L^p}. For Lebesgue measurable functions f: D → ℝ^n, the norm is readily extended, replacing |f| under the integral by the vector p-norm on ℝ^n. Likewise we may consider Lebesgue measurable f: D → ℝ^{n×n}, using the operator p-norm on ℝ^{n×n}. In all these cases, we write L^p(D) as shorthand for L^p(D; X) where X = ℝ, ℝ^n, or ℝ^{n×n}. Then L^p(D) is the vector space of all (equivalence classes of) measurable functions f: D → X for which ‖f‖_{L^p(D)} < ∞. The space L^p(D) is separable for p ∈ [1, ∞), while L^∞(D) is not separable. We define periodic versions of L^p(D), denoted by L^p_per(D), in the case where D is a unit
cube; these spaces are defined as the completion of C^∞ periodic functions on the unit cube with respect to the L^p-norm. If we define 𝕋^d to be the d-dimensional unit torus, then we write L^p_per([0,1]^d) = L^p(𝕋^d). Again these spaces are separable for 1 ≤ p < ∞, but not for p = ∞.
C(D) is a Banach space. Building on this, we define the space C^{0,γ}(D) to be the space of functions in C(D) which are Hölder with any exponent γ ∈ (0,1], with norm
$$\|f\|_{C^{0,\gamma}(D)} = \sup_{x\in D} |f(x)| + \sup_{x,y\in D} \frac{|f(x)-f(y)|}{|x-y|^{\gamma}}. \tag{10.99}$$
Consider the subset of functions f ∈ C^{0,γ}(D) for which
$$\lim_{y\to x} \frac{|f(x)-f(y)|}{|x-y|^{\gamma}} = 0,$$
uniformly in x; we denote the resulting separable space by C_0^{0,γ}(D; ℝ). This is analogous to the fact that the space of bounded measurable functions is not separable, while the space of continuous functions on a compact domain is.
Furthermore, it may be shown that C^{0,γ} ⊂ C_0^{0,γ'} for every γ > γ'. All of the preceding spaces can be generalized to functions C^{0,γ}(D; ℝ^n) and C_0^{0,γ}(D; ℝ^n); they may also be extended to periodic functions on the unit torus 𝕋^d, found by identifying opposite faces of the unit cube [0,1]^d. The same separability issues arise for these generalizations.
with norm
$$\|u\|_{W^{r,p}(D)} = \begin{cases} \bigl(\sum_{|\alpha|\le r} \|D^\alpha u\|_{L^p(D)}^p\bigr)^{1/p} & \text{for } 1 \le p < \infty,\\[2pt] \sum_{|\alpha|\le r} \|D^\alpha u\|_{L^\infty(D)} & \text{for } p = \infty. \end{cases} \tag{10.101}$$
then the space H^1(D) is a separable Hilbert space with inner product
$$\langle u, v\rangle_{H^1} = \int_D \bigl(uv + \nabla u\cdot\nabla v\bigr)\,dx$$
and norm (10.101) with p = 2. Likewise, the space H^1_0(D) is a separable Hilbert space with inner product
$$\langle u, v\rangle_{H^1_0} = \int_D \nabla u\cdot\nabla v\,dx$$
and norm ‖u‖_{H^1_0} = ‖∇u‖_{L^2(D)}.
where the identity holds for (Lebesgue) almost every x ∈ 𝕋^d. Furthermore, the L² norm of u is given by Parseval's identity: ‖u‖²_{L²} = Σ_k |u_k|². The fractional Sobolev space H^s(𝕋^d) for s ≥ 0 is the subspace of functions u ∈ L²(𝕋^d) such that
$$\|u\|_{H^s}^2 := \sum_{k\in\mathbb{Z}^d} (1 + 4\pi^2 |k|^2)^s |u_k|^2 < \infty. \tag{10.103}$$
Note that this is a separable Hilbert space by virtue of ℓ²_w being separable. Note also that H^0(𝕋^d) = L²(𝕋^d) and that, for positive integer s, the definition agrees with the definition H^s(𝕋^d) = W^{s,2}(𝕋^d) obtained from (10.100), with the obvious generalization from D to 𝕋^d. For s < 0, we define H^s(𝕋^d) as the closure of L² under the norm (10.103). The spaces H^s(𝕋^d) for s < 0 may also be defined via duality. The resulting spaces H^s are separable for all s ∈ ℝ.
We now link the spaces H^s(𝕋^d) to a specific Hilbert scale of spaces. Hilbert scales are families of spaces defined by D(A^{s/2}) for A a positive, unbounded, self-adjoint operator on a Hilbert space. To view the fractional Sobolev spaces from this perspective, let A = I − △ with domain H²(𝕋^d), noting that the eigenvalues of A are simply 1 + 4π²|k|² for k ∈ ℤ^d. We thus see that, by the spectral decomposition theorem, H^s = D(A^{s/2}), and we have ‖u‖_{H^s} = ‖A^{s/2}u‖_{L²}. Note that we may work in the space of real-valued functions, where the eigenfunctions of A, {φ_j}_{j=1}^∞, comprise sine and cosine functions; the eigenvalues λ_j of A, when ordered on a one-dimensional lattice, then satisfy λ_j ≍ j^{2/d}. This is relevant to the more general perspective of Hilbert scales that we now introduce.
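Definition (10.103) is directly computable for trigonometric polynomials. The Python sketch below (illustrative, not part of the text) evaluates the $H^s(\mathbb{T}^1)$ norm of $u(x) = \sin(2\pi x)$, whose only nonzero Fourier coefficients are $u_{\pm 1}$ with $|u_{\pm1}|^2 = 1/4$, so that $\|u\|_{H^s}^2 = \tfrac12(1+4\pi^2)^s$:

```python
import numpy as np

def hs_norm_sq(u_vals, s):
    """||u||_{H^s}^2 = sum_k (1 + 4 pi^2 k^2)^s |u_k|^2 on the 1-d torus,
    with Fourier coefficients u_k estimated from equispaced samples."""
    n = len(u_vals)
    uk = np.fft.fft(u_vals) / n              # coefficients u_k
    k = np.fft.fftfreq(n, d=1.0 / n)         # integer wavenumbers
    return float(((1 + 4 * np.pi**2 * k**2) ** s * np.abs(uk) ** 2).sum())

x = np.arange(256) / 256.0
u = np.sin(2 * np.pi * x)                    # u_k nonzero only for k = +-1
for s in (0.0, 0.5, 1.0, 2.0):
    exact = 0.5 * (1 + 4 * np.pi**2) ** s    # closed form for sin(2 pi x)
    print(s, hs_norm_sq(u, s), exact)
```

Note how the weight $(1+4\pi^2 k^2)^s$ penalizes high wavenumbers for s > 0, so smoothness corresponds to rapid decay of the coefficients.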
We can now generalize the previous construction of fractional Sobolev spaces to more general domains than the torus. The resulting spaces do not, in general, coincide with Sobolev spaces, because of the effect of the boundary conditions of the operator A used in the construction. On an arbitrary bounded open set D ⊂ ℝ^d with Lipschitz boundary, we consider a positive self-adjoint operator A satisfying Assumption 1, so that its eigenvalues satisfy λ_j ≍ j^{2/d}; then we define the spaces 𝓗^s = D(A^{s/2}) for s > 0. Given a Hilbert space (H, ⟨·,·⟩, ‖·‖) of real-valued functions on a bounded open set D in ℝ^d, we recall from Assumption 1 the orthonormal basis for H denoted by {φ_j}_{j=1}^∞. Any u ∈ H can be written as
$$u = \sum_{j=1}^\infty \langle u, \varphi_j\rangle\,\varphi_j.$$
Thus, writing u_j = ⟨u, φ_j⟩,
$$\mathcal{H}^s = \bigl\{ u : D \to \mathbb{R} \;\bigm|\; \|u\|_{\mathcal{H}^s}^2 < \infty \bigr\} \tag{10.104}$$
where
$$\|u\|_{\mathcal{H}^s}^2 = \sum_{j=1}^\infty j^{2s/d} |u_j|^2.$$
In fact 𝓗^s is a Hilbert space: for v_j = ⟨v, φ_j⟩ we may define the inner product
$$\langle u, v\rangle_{\mathcal{H}^s} = \sum_{j=1}^\infty j^{2s/d} u_j v_j.$$
For any s > 0, the Hilbert space (𝓗^s, ⟨·,·⟩_{𝓗^s}, ‖·‖_{𝓗^s}) is a subset of the original Hilbert space H; for s < 0 the spaces are defined by duality and are supersets of H. Note also that we have Parseval-like identities showing that the 𝓗^s norm of a function u is equivalent to the ℓ²_w norm of the sequence {u_j}_{j=1}^∞ with the choice w_j = j^{2s/d}. The spaces 𝓗^s are separable Hilbert spaces for any s ∈ ℝ.
$$\|u\|_{L^p_\mu(M;E)} = \Bigl( \int_M \|u(x)\|_E^p \,\mu(dx) \Bigr)^{1/p}.$$
For p ∈ [1, ∞) these spaces are separable. However, separability fails to hold for p = ∞. We will use these Banach spaces in the case where μ is a probability measure ℙ, with corresponding expectation 𝔼, and we then have
$$\|u\|_{L^p_{\mathbb{P}}(M;E)} = \bigl( \mathbb{E}\,\|u\|_E^p \bigr)^{1/p}.$$
For a, b ≥ 0 and conjugate exponents p, q ∈ (1,∞) with 1/p + 1/q = 1, Young's inequality states that
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$
From this Hölder-like inequality, the following interpolation bound results: let θ ∈ [0,1] and let L denote a (possibly unbounded) self-adjoint operator on the Hilbert space (H, ⟨·,·⟩, ‖·‖). Then the bound
$$\|L^{\theta} u\| \le \|Lu\|^{\theta}\,\|u\|^{1-\theta} \tag{10.106}$$
holds for all u in the domain of L.
Lemma 15. Let Assumption 1 hold. Then for any t > s, any r ∈ [s, t], and any u ∈ 𝓗^t, it follows that
$$\|u\|_{\mathcal{H}^r} \le \|u\|_{\mathcal{H}^s}^{(t-r)/(t-s)}\,\|u\|_{\mathcal{H}^t}^{(r-s)/(t-s)}.$$
Whether the spaces 𝓗^s embed into L^∞ or L^p depends not only on the operator A which defines the Hilbert scale, but also on the behavior of the corresponding orthonormal basis of eigenfunctions in L^∞. To this end we let Assumption 2 hold. It then turns out that bounding the L^∞ norm is rather straightforward, and we start with this case.
Lemma 16. Let Assumption 2 hold and define the resulting Hilbert scale of spaces 𝓗^s by (10.104). Then for every s > d/2, the space 𝓗^s is contained in the space L^∞(D), and there exists a constant K₁ such that ‖u‖_{L^∞} ≤ K₁‖u‖_{𝓗^s}.
By the Cauchy-Schwarz inequality,
$$\|u\|_{L^\infty} \le \sum_{k\in\mathbb{Z}^d} |u_k| \le \Bigl( \sum_{k\in\mathbb{Z}^d} (1+|k|^2)^{s} |u_k|^2 \Bigr)^{1/2} \Bigl( \sum_{k\in\mathbb{Z}^d} (1+|k|^2)^{-s} \Bigr)^{1/2}.$$
Since the sum in the second factor converges if and only if s > d/2, the claim follows.
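The Cauchy-Schwarz argument in the proof is easy to test numerically. The sketch below (illustrative, using the torus-style norm $\sum_k (1+|k|^2)^s|u_k|^2$ appearing in the proof) verifies $\|u\|_{L^\infty} \le K_1 \|u\|_{H^s}$ with $K_1 = (\sum_k (1+|k|^2)^{-s})^{1/2}$ for random trigonometric polynomials in one dimension:

```python
import numpy as np

rng = np.random.default_rng(1)
s = 1.0                                  # need s > d/2; here d = 1
K = np.arange(-64, 65)                   # truncated set of wavenumbers
x = np.linspace(0.0, 1.0, 2001)
E = np.exp(2j * np.pi * K[:, None] * x[None, :])   # modes e^{2 pi i k x}

# Constant from the Cauchy-Schwarz step: K1^2 = sum_k (1 + |k|^2)^(-s)
K1 = np.sqrt(((1.0 + K**2) ** (-s)).sum())

for _ in range(100):
    uk = rng.standard_normal(len(K)) + 1j * rng.standard_normal(len(K))
    u = uk @ E                           # u(x) = sum_k u_k e^{2 pi i k x}
    hs = np.sqrt(((1.0 + K**2) ** s * np.abs(uk) ** 2).sum())
    assert np.abs(u).max() <= K1 * hs + 1e-9
print("||u||_inf <= K1 ||u||_{H^s} holds on 100 random trig polynomials")
```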
As a consequence of Lemma 16, we are able to obtain a more general Sobolev embedding for all L^p spaces.
Proof. The case p = 2 is obvious and the case p = ∞ has already been shown, so it remains to show the claim for p ∈ (2, ∞). The idea is to divide the space of eigenfunctions into blocks and to estimate separately the L^p norm of every block. More precisely, we define a sequence of functions u^{(n)} by
$$u^{(-1)} = u_0 \varphi_0, \qquad u^{(n)} = \sum_{2^n \le j < 2^{n+1}} u_j \varphi_j,$$
so that, for each block, the L^p interpolation inequality gives
$$\|u^{(n)}\|_{L^p}^p \le \|u^{(n)}\|_{L^2}^2\,\|u^{(n)}\|_{L^\infty}^{p-2}. \tag{10.107}$$
Now set s' = d/2 + ε for some ε > 0, and note that the construction of u^{(n)}, together with Lemma 16, gives the bounds
$$\|u^{(n)}\|_{L^2} \le K 2^{-ns/d}\,\|u^{(n)}\|_{\mathcal{H}^s}, \qquad \|u^{(n)}\|_{L^\infty} \le K_1 \|u^{(n)}\|_{\mathcal{H}^{s'}} \le K 2^{n(s'-s)/d}\,\|u^{(n)}\|_{\mathcal{H}^s}. \tag{10.108}$$
Loosely speaking, one can interpret this theorem as stating that if one knows
the law of any finite number of components of a random vector or function, then
this determines the law of the whole random vector or function; in particular, in the
case of the random function, this comprises uncountably many components. This
statement is thus highly nontrivial as soon as the set I is infinite since we have a
priori defined PA only for finite subsets A I , and the theorem allows us to extend
this uniquely also to infinite subsets.
As a simple application, we can use this theorem to define the infinite product measure ℙ = ⊗_{k=1}^∞ ℙ_0 as the measure given by Kolmogorov's Extension Theorem 29, if we take as our family of specifications ℙ_A = ⊗_{k∈A} ℙ_0. Our i.i.d. sequence is then naturally defined as a random sample taken from the probability space (ℝ^∞, 𝓑(ℝ^∞), ℙ). A more complicated example follows from making sense of the random field perspective on random functions, as explained in Sect. 2.5.
Lemma 17. Let B be a separable Banach space and let μ and ν be two Borel probability measures on B. If ℓ♯μ = ℓ♯ν for every ℓ ∈ B*, then μ = ν.
where f_n is a sequence of simple functions, for which the integral on the right-hand side may be defined in the usual way, chosen such that
$$\lim_{n\to\infty} \int \|f_n(\omega) - f(\omega)\|\,\mathbb{P}(d\omega) = 0.$$
With this definition the value of the integral does not depend on the approximating sequence, it is linear in f, and
$$\int \ell\bigl(f(\omega)\bigr)\,\mathbb{P}(d\omega) = \ell\Bigl( \int f(\omega)\,\mathbb{P}(d\omega) \Bigr) \tag{10.110}$$
for every ℓ ∈ B*.
$$P_x \ell := \ell(x)\,x. \tag{10.112}$$
The problem with this definition is that, if we view the map x ↦ P_x as a map taking values in the space L(B*, B) of bounded linear operators from B* to B then, since this space is not separable in general, it is not clear a priori whether (10.113) makes sense as a Bochner integral. This suggests defining the subspace B_⋆(B) ⊂ L(B*, B) given by the closure (in the usual operator norm) of the linear span of operators of the type P_x given in (10.112) for x ∈ B. We then have:
Corollary 3. Assume that μ has finite variance, so that the map x ↦ ‖x‖² is integrable with respect to μ. Then the covariance operator C defined by (10.113) exists as a Bochner integral in B_⋆(B).
Remark 3. Once the covariance is defined, the fact that (10.111) holds is then an immediate consequence of (10.110). In general, not every element C ∈ B_⋆(B) can be realised as the covariance of some probability measure. This is the case even if we impose the positivity condition ℓ(Cℓ) ≥ 0, which by (10.111) is a condition satisfied by every covariance operator. For further insight into this issue, see Lemma 23, which characterizes precisely the covariance operators of a Gaussian measure in a separable Hilbert space.
$$\hat\mu(\ell) := \int_B e^{i\ell(x)}\,\mu(dx). \tag{10.114}$$
For a Gaussian measure μ_0 on B with mean a and covariance operator C, one has
$$\hat\mu_0(\ell) = e^{\,i\ell(a) - \frac12 \ell(C\ell)}. \tag{10.115}$$
Lemma 19. Let μ and ν be any two probability measures on a separable Banach space B. If μ̂(ℓ) = ν̂(ℓ) for every ℓ ∈ B*, then μ = ν.
$$C\ell = \int_{\mathcal{H}} \langle \ell, x\rangle\,x\,\mu(dx), \tag{10.116}$$
$$C = \int_{\mathcal{H}} x \otimes x\,\mu(dx). \tag{10.117}$$
Corollary 3 shows that we can indeed make sense of the second formulation as a Bochner integral, provided that μ has finite variance in 𝓗.
$$d_{TV}(\mu, \mu') = \frac12 \int \Bigl| \frac{d\mu}{d\nu} - \frac{d\mu'}{d\nu} \Bigr|\,d\nu.$$
If μ' is absolutely continuous with respect to μ, this may be written as
$$d_{TV}(\mu, \mu') = \frac12 \int \Bigl| 1 - \frac{d\mu'}{d\mu} \Bigr|\,d\mu. \tag{10.118}$$
$$d_{Hell}(\mu, \mu') = \sqrt{ \frac12 \int \Bigl( \sqrt{\frac{d\mu}{d\nu}} - \sqrt{\frac{d\mu'}{d\nu}} \Bigr)^2 d\nu }.$$
If μ' is absolutely continuous with respect to μ, this may be written as
$$d_{Hell}(\mu, \mu') = \sqrt{ \frac12 \int \Bigl( 1 - \sqrt{\frac{d\mu'}{d\mu}} \Bigr)^2 d\mu }. \tag{10.119}$$
Note that the numerical constant 1/2 appearing in both definitions is chosen in such a way as to ensure the bounds
$$0 \le d_{TV}(\mu,\mu') \le 1, \qquad 0 \le d_{Hell}(\mu,\mu') \le 1.$$
Lemma 20. The total variation and Hellinger metrics are related by the inequalities
$$\frac{1}{\sqrt{2}}\,d_{TV}(\mu,\mu') \le d_{Hell}(\mu,\mu') \le d_{TV}(\mu,\mu')^{1/2}.$$
Proof. We have
$$d_{TV}(\mu,\mu') = \frac12 \int \Bigl| \sqrt{\tfrac{d\mu}{d\nu}} - \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr| \Bigl( \sqrt{\tfrac{d\mu}{d\nu}} + \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr) d\nu$$
$$\le \Bigl( \frac12 \int \Bigl( \sqrt{\tfrac{d\mu}{d\nu}} - \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr)^2 d\nu \Bigr)^{1/2} \Bigl( \frac12 \int \Bigl( \sqrt{\tfrac{d\mu}{d\nu}} + \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr)^2 d\nu \Bigr)^{1/2}$$
$$\le \Bigl( \frac12 \int \Bigl( \sqrt{\tfrac{d\mu}{d\nu}} - \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr)^2 d\nu \Bigr)^{1/2} \Bigl( \int \Bigl( \tfrac{d\mu}{d\nu} + \tfrac{d\mu'}{d\nu} \Bigr) d\nu \Bigr)^{1/2}$$
$$= \sqrt{2}\,d_{Hell}(\mu,\mu'),$$
as required.
where
$$Q = \frac{1}{4\sigma_1^2}(x - m_1)^2 + \frac{1}{4\sigma_2^2}(x - m_2)^2.$$
Define σ² by
$$\frac{1}{2\sigma^2} = \frac{1}{4\sigma_1^2} + \frac{1}{4\sigma_2^2},$$
and set
$$y = x - \frac{m_1 + m_2}{2},$$
so that
$$Q = \frac{1}{2\sigma^2}(y - \bar m)^2 + \frac{1}{4(\sigma_1^2 + \sigma_2^2)}(m_2 - m_1)^2,$$
where the constant m̄ does not appear in what follows and so we do not detail it. Noting that the integral is then a multiple of a standard Gaussian N(m̄, σ²) gives the desired result. In particular, this calculation shows that the Hellinger distance between two Gaussians on ℝ tends to zero if and only if the means and variances of the two Gaussians approach one another. Furthermore, by the previous lemma, the same is true for the total variation distance. □
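The outcome of this calculation can be checked numerically: for two scalar Gaussians, with the normalization used here, $d_{Hell}^2 = 1 - \sqrt{2\sigma_1\sigma_2/(\sigma_1^2+\sigma_2^2)}\,\exp\bigl(-(m_1-m_2)^2/(4(\sigma_1^2+\sigma_2^2))\bigr)$. The Python sketch below (illustrative) compares this closed form with a direct quadrature of definition (10.119), and also checks the inequalities of Lemma 20:

```python
import numpy as np

def gpdf(x, m, s):
    """Density of N(m, s^2)."""
    return np.exp(-(x - m) ** 2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

m1, s1, m2, s2 = 0.0, 1.0, 0.7, 1.5
x = np.linspace(-20, 20, 400001)
dx = x[1] - x[0]
p, q = gpdf(x, m1, s1), gpdf(x, m2, s2)

# Hellinger and total variation distances by quadrature (1/2 normalization)
hell = np.sqrt(0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2).sum() * dx)
tv = 0.5 * np.abs(p - q).sum() * dx

# Closed form for two scalar Gaussians
closed = np.sqrt(1 - np.sqrt(2 * s1 * s2 / (s1**2 + s2**2))
                 * np.exp(-(m1 - m2) ** 2 / (4 * (s1**2 + s2**2))))

print(hell, closed)                            # agree to quadrature accuracy
print(tv / np.sqrt(2) <= hell <= np.sqrt(tv))  # Lemma 20 inequalities
```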
The preceding example generalizes to higher dimension and shows that, for
example, the total variation and Hellinger metrics cannot metrize weak convergence
of probability measures (as one can also show that convergence in total variation
metric implies strong convergence). They are nonetheless useful distance measures,
for example, between families of measures which are mutually absolutely con-
tinuous. Furthermore, the Hellinger distance is particularly useful for estimating
the difference between expectation values of functions of random variables under
different measures. This is encapsulated in the following lemma:
Lemma 21. Let μ and μ' be two probability measures, and assume that the function f has second moments with respect to both μ and μ'. Then
$$\bigl\| \mathbb{E}^{\mu} f - \mathbb{E}^{\mu'} f \bigr\| \le 2 \bigl( \mathbb{E}^{\mu}\|f\|^2 + \mathbb{E}^{\mu'}\|f\|^2 \bigr)^{1/2}\,d_{Hell}(\mu,\mu').$$
Furthermore, if f has fourth moments with respect to both measures, then
$$\bigl\| \mathbb{E}^{\mu} (f\otimes f) - \mathbb{E}^{\mu'} (f\otimes f) \bigr\| \le 2 \bigl( \mathbb{E}^{\mu}\|f\|^4 + \mathbb{E}^{\mu'}\|f\|^4 \bigr)^{1/2}\,d_{Hell}(\mu,\mu').$$
Proof. Let ν be a reference probability measure as above. We then have the bound
$$\bigl\| \mathbb{E}^{\mu} f - \mathbb{E}^{\mu'} f \bigr\| \le \int \|f\| \Bigl| \frac{d\mu}{d\nu} - \frac{d\mu'}{d\nu} \Bigr| d\nu$$
$$= \int \frac{1}{\sqrt{2}} \Bigl| \sqrt{\tfrac{d\mu}{d\nu}} - \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr| \cdot \sqrt{2}\,\|f\| \Bigl( \sqrt{\tfrac{d\mu}{d\nu}} + \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr) d\nu$$
$$\le \Bigl( \frac12 \int \Bigl( \sqrt{\tfrac{d\mu}{d\nu}} - \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr)^2 d\nu \Bigr)^{1/2} \Bigl( 2 \int \|f\|^2 \Bigl( \sqrt{\tfrac{d\mu}{d\nu}} + \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr)^2 d\nu \Bigr)^{1/2}$$
$$\le \Bigl( \frac12 \int \Bigl( \sqrt{\tfrac{d\mu}{d\nu}} - \sqrt{\tfrac{d\mu'}{d\nu}} \Bigr)^2 d\nu \Bigr)^{1/2} \Bigl( 4 \int \|f\|^2 \Bigl( \tfrac{d\mu}{d\nu} + \tfrac{d\mu'}{d\nu} \Bigr) d\nu \Bigr)^{1/2}$$
$$= 2 \bigl( \mathbb{E}^{\mu}\|f\|^2 + \mathbb{E}^{\mu'}\|f\|^2 \bigr)^{1/2}\,d_{Hell}(\mu,\mu'),$$
as required. The proof for f ⊗ f follows from the bound
$$\bigl\| \mathbb{E}^{\mu}(f\otimes f) - \mathbb{E}^{\mu'}(f\otimes f) \bigr\| = \sup_{\|h\|=1} \bigl\| \mathbb{E}^{\mu} \langle f, h\rangle f - \mathbb{E}^{\mu'} \langle f, h\rangle f \bigr\| \le \int \|f\|^2 \Bigl| \frac{d\mu}{d\nu} - \frac{d\mu'}{d\nu} \Bigr| d\nu,$$
and then arguing similarly to the first case, with ‖f‖ replaced by ‖f‖².
We now define a third widely used distance concept for comparing two probability measures. Note, however, that it does not give rise to a metric in the strict sense, because it violates both symmetry and the triangle inequality. Assume that μ' is absolutely continuous with respect to μ. Then the Kullback-Leibler divergence of μ' with respect to μ is
$$D_{KL}(\mu' \,\|\, \mu) = \int \log\Bigl( \frac{d\mu'}{d\mu} \Bigr)\,\frac{d\mu'}{d\mu}\,d\mu.$$
If μ is also absolutely continuous with respect to μ', so that the two measures are equivalent, then
$$D_{KL}(\mu' \,\|\, \mu) = -\int \log\Bigl( \frac{d\mu}{d\mu'} \Bigr)\,d\mu'.$$
For two scalar Gaussians ν₁ = N(m₁, σ₁²) and ν₂ = N(m₂, σ₂²), a direct calculation gives
$$D_{KL}(\nu_1 \,\|\, \nu_2) = \ln\Bigl( \frac{\sigma_2}{\sigma_1} \Bigr) + \frac{\sigma_1^2}{2\sigma_2^2} - \frac12 + \frac{(m_2 - m_1)^2}{2\sigma_2^2},$$
as required. □
As for the Hellinger distance, this example shows that two Gaussians on ℝ approach one another in the Kullback-Leibler divergence if and only if their means and variances approach one another. This generalizes to higher dimensions. The Kullback-Leibler divergence provides an upper bound for the square of the Hellinger distance and for the square of the total variation distance.
Lemma 22. Assume that the two measures μ and μ' are equivalent. Then the bounds
$$d_{Hell}(\mu,\mu')^2 \le \frac12\,D_{KL}(\mu \,\|\, \mu'), \qquad d_{TV}(\mu,\mu')^2 \le D_{KL}(\mu \,\|\, \mu')$$
hold.
Proof. The second bound follows from the first by using Lemma 20; thus it suffices to prove the first. In the following we use the fact that
$$\log(x) \le x - 1 \qquad \forall\, x \ge 0,$$
so that
$$1 - \sqrt{x} \le -\frac12 \log(x) \qquad \forall\, x \ge 0.$$
We then have
$$d_{Hell}(\mu,\mu')^2 = \frac12 \int \Bigl( 1 - \sqrt{\tfrac{d\mu'}{d\mu}} \Bigr)^2 d\mu = \frac12 \int \Bigl( 1 + \frac{d\mu'}{d\mu} - 2\sqrt{\tfrac{d\mu'}{d\mu}} \Bigr) d\mu$$
$$= \int \Bigl( 1 - \sqrt{\tfrac{d\mu'}{d\mu}} \Bigr) d\mu \le -\frac12 \int \log\Bigl( \tfrac{d\mu'}{d\mu} \Bigr) d\mu = \frac12\,D_{KL}(\mu \,\|\, \mu'),$$
as required.
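Both the Gaussian formula for $D_{KL}$ and the bounds of Lemma 22 can be verified by quadrature. The following Python sketch (illustrative) does so for a pair of scalar Gaussians:

```python
import numpy as np

m1, s1, m2, s2 = 0.3, 0.8, 0.0, 1.0
x = np.linspace(-20, 20, 400001)
dx = x[1] - x[0]
p = np.exp(-(x - m1) ** 2 / (2 * s1**2)) / (s1 * np.sqrt(2 * np.pi))
q = np.exp(-(x - m2) ** 2 / (2 * s2**2)) / (s2 * np.sqrt(2 * np.pi))

kl_quad = (p * np.log(p / q)).sum() * dx     # D_KL(mu || mu') by quadrature
kl_closed = np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

hell2 = 0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2).sum() * dx
tv = 0.5 * np.abs(p - q).sum() * dx

print(kl_quad, kl_closed)                              # agree
print(hell2 <= 0.5 * kl_closed, tv**2 <= kl_closed)    # Lemma 22 bounds
```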
where d_X denotes the distance function on X and d the dimension of the compact domain D. Then, for every α' < α, there exists a unique measure μ on C^{0,α'}(D, X) such that the canonical process under μ has the same law as u.
We have here generalized the notion of Hölder spaces from Sect. A.1.2 to functions taking values in a Polish space; such generalizations are discussed in Sect. A.1.4. The notion of canonical process is defined in Sect. A.4.
We will frequently use Kolmogorov's continuity test in the following setting: we again assume that we are given a compact domain D ⊂ ℝ^d, and now a collection u(x) of ℝ^n-valued random variables indexed by x ∈ D. We have the following:
Corollary 4. Assume that there exist p > 1, α > 0, and K > 0 such that
$$\mathbb{E}\,|u(x) - u(y)|^p \le K\,|x - y|^{d + \alpha p}.$$
Then, for every α' < α, there exists a unique measure on C^{0,α'}(D) such that the canonical process under it has the same law as u.
Remark 5. Recall that C^{0,α}(D) ⊂ C_0^{0,α'}(D) for all α' < α, so that, since the range of admissible exponents α' < α in this corollary is open, we may interpret the result as giving an equivalent measure defined on a separable Banach space.
where {ξ_k}_{k≥0} is an i.i.d. sequence and the ψ_k are real- or complex-valued Hölder functions on bounded open D ⊂ ℝ^d satisfying, for some γ ∈ (0,1],
$$|\psi_k(x) - \psi_k(y)| \le h(\gamma, k)\,|x - y|^{\gamma}. \tag{10.122}$$
Corollary 5. Let {ξ_k}_{k≥0} be countably many centred i.i.d. random variables (real or complex) with bounded moments of all orders. Moreover, let {ψ_k}_{k≥0} satisfy (10.122). Suppose there is some δ ∈ (0,2) such that
$$S_1 := \sum_{k\ge 0} \|\psi_k\|_{L^\infty}^2 < \infty \quad\text{and}\quad S_2 := \sum_{k\ge 0} \|\psi_k\|_{L^\infty}^{2-\delta}\,h(\gamma,k)^{\delta} < \infty. \tag{10.123}$$
Then u defined by (10.121) is almost surely finite for every x ∈ D, and u is Hölder continuous for every Hölder exponent smaller than γδ/2.
Proof. Let us denote by κ_n(X) the n-th cumulant of a random variable X. The odd cumulants of centred random variables are zero. Furthermore, using the fact that the cumulants of independent random variables simply add up, and that the cumulants of ξ_k are all finite by assumption, we obtain for p ≥ 1 the bound
$$\kappa_{2p}\bigl( u(x) - u(y) \bigr) = \sum_{k\ge 0} \kappa_{2p}(\xi_k)\,\bigl( \psi_k(x) - \psi_k(y) \bigr)^{2p}$$
$$\lesssim C_p \sum_{k\ge 0} \min\bigl\{ 2^{2p}\,\|\psi_k\|_{L^\infty}^{2p},\; h(\gamma,k)^{2p}\,|x-y|^{2p\gamma} \bigr\}$$
$$\lesssim C_p \sum_{k\ge 0} \|\psi_k\|_{L^\infty}^{(1-\frac{\delta}{2})2p}\,h(\gamma,k)^{p\delta}\,|x-y|^{p\gamma\delta}$$
$$\lesssim C_p\,|x-y|^{p\gamma\delta},$$
of Rn are all Gaussian. This is a property that can readily be generalised to infinite-
dimensional spaces:
Since the covariance operator, and hence the mean, exist for a Gaussian measure,
and since they may be shown to characterize the measure completely, we write
N .m; C / for a Gaussian with mean m and covariance operator C .
Measures in infinite-dimensional spaces are typically mutually singular. Further-
more, two Gaussian measures are either mutually singular or equivalent (mutually
absolutely continuous). The Cameron-Martin space plays a key role in characteriz-
ing whether or not two Gaussians are equivalent.
$$H_\mu = \{ h \in B : \exists\, h^* \in B^* \text{ with } h = C h^* \}, \tag{10.125}$$
under the norm ‖h‖²_μ = ⟨h, h⟩_μ = h*(C h*). It is a Hilbert space when endowed with the scalar product ⟨h, k⟩_μ = h*(C k*) = h*(k) = k*(h), where h = C h* and k = C k*.
Theorem 34. The closed unit ball in the Cameron-Martin space H_μ is compactly embedded into the separable Banach space B.
In the setting of Gaussian measures on a separable Banach space, all balls have positive probability. The Cameron-Martin norm is useful in the characterization of small-ball properties of Gaussian measures. Let B_ε(z) denote a ball of radius ε in B centred at a point z ∈ H_μ.
Theorem 35. The ratios of small-ball probabilities under a Gaussian measure μ satisfy
$$\lim_{\varepsilon\to 0} \frac{\mu\bigl(B_\varepsilon(z_1)\bigr)}{\mu\bigl(B_\varepsilon(z_2)\bigr)} = \exp\Bigl( \frac12 \|z_2\|_\mu^2 - \frac12 \|z_1\|_\mu^2 \Bigr).$$
Example 13. Let μ denote the Gaussian measure N(0, K) on ℝ^n with K positive definite. Then Theorem 35 is the statement that
$$\lim_{\varepsilon\to 0} \frac{\mu\bigl(B_\varepsilon(z_1)\bigr)}{\mu\bigl(B_\varepsilon(z_2)\bigr)} = \exp\Bigl( \frac12 |K^{-1/2} z_2|^2 - \frac12 |K^{-1/2} z_1|^2 \Bigr),$$
since in this case the Cameron-Martin norm is ‖z‖_μ = |K^{−1/2}z|.
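In one dimension the small-ball probabilities are available in closed form through the error function, so the limit in Theorem 35 can be observed directly. The Python sketch below (illustrative, with $\mu = N(0, k)$ on $\mathbb{R}$) shows the ratio approaching $\exp\bigl((z_2^2 - z_1^2)/(2k)\bigr)$ as $\varepsilon \to 0$:

```python
import math

def ball_prob(z, eps, var):
    """mu(B_eps(z)) for mu = N(0, var) on R, via the error function."""
    s = math.sqrt(2 * var)
    return 0.5 * (math.erf((z + eps) / s) - math.erf((z - eps) / s))

var, z1, z2 = 0.5, 1.0, 0.3
limit = math.exp((z2**2 - z1**2) / (2 * var))   # Theorem 35 in 1 dimension
for eps in (0.5, 0.1, 0.01, 0.001):
    ratio = ball_prob(z1, eps, var) / ball_prob(z2, eps, var)
    print(eps, ratio)                           # tends to the limit below
print(limit)
```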
We now turn to the Feldman-Hájek theorem in the Hilbert space setting. Let {φ_j}_{j=1}^∞ denote an orthonormal basis for 𝓗. Then the Hilbert-Schmidt norm of a linear operator L: 𝓗 → 𝓗 is defined by
$$\|L\|_{HS}^2 := \sum_{j=1}^\infty \|L\varphi_j\|^2.$$
The value of the norm is, in fact, independent of the choice of orthonormal basis. In
the finite-dimensional setting, the norm is known as the Frobenius norm.
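The basis independence of the Hilbert-Schmidt norm, and its agreement with the Frobenius norm in finite dimensions, is easy to check numerically (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
L = rng.standard_normal((5, 5))          # an operator on H = R^5

def hs_norm(L, basis):
    """||L||_HS via sum_j ||L phi_j||^2 over the columns of `basis`."""
    return np.sqrt(sum(np.linalg.norm(L @ phi) ** 2 for phi in basis.T))

std_basis = np.eye(5)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthonormal basis

print(hs_norm(L, std_basis))             # equals the Frobenius norm
print(hs_norm(L, Q))                     # same value in the rotated basis
print(np.linalg.norm(L, "fro"))
```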
$$\frac{1}{k^2\pi^2 + 1}, \qquad \frac{1}{k^2\pi^2}.$$
From this it follows that the Cameron-Martin spaces of the two measures coincide and are equal to H_0^1(J). Notice that the operator
$$T = C_1^{-1/2} C_2\, C_1^{-1/2} - I$$
is diagonalized in the same basis, with eigenvalues
$$\frac{1}{k^2\pi^2}.$$
These are square summable, and so by Theorem 37 the two measures are absolutely continuous with respect to one another.
$$\mathbb{E}\,\beta_i(s)\,\beta_j(t) = (s \wedge t)\,\delta_{ij}.$$
If we define
$$B^N(t) := \sum_{n=1}^N \beta_n(t)\,e_n,$$
then clearly 𝔼‖B^N(t)‖²_𝓗 = tN, and so the series will not converge in 𝓗 for fixed t > 0. However, the expression (10.127) is nonetheless the right way to think of a cylindrical Wiener process on 𝓗; indeed, for fixed t > 0, the truncated series for B^N will converge in a larger space containing 𝓗. We define the following scale of Hilbert subspaces, for r > 0, by
$$X^r = \Bigl\{ u \in \mathcal{H} \;\Bigm|\; \sum_{j=1}^\infty j^{2r} |\langle u, e_j\rangle|^2 < \infty \Bigr\},$$
and then extend to superspaces with r < 0 by duality. We use ‖·‖_r to denote the norm induced by the inner product
$$\langle u, v\rangle_r = \sum_{j=1}^\infty j^{2r} u_j v_j$$
for u_j = ⟨u, e_j⟩ and v_j = ⟨v, e_j⟩. A simple argument, similar to that used to prove Theorem 8, shows that {B^N(t)} is, for fixed t > 0, Cauchy in X^r for any r < −1/2. In fact it is possible to construct a stochastic process as the limit of the truncated series, living on the space C([0, ∞); X^r) for any r < −1/2, by the Kolmogorov Continuity Theorem 30 in the setting where D = [0, T] and X = X^r. We give details in the more general setting that follows.
Building on the preceding, we now discuss the construction of a C-Wiener process W, using the finite-dimensional case described in Remark 2 to guide us. Here C: 𝓗 → 𝓗 is assumed to be trace class, with eigenvalues σ_j² and corresponding orthonormal eigenfunctions e_j. Consider the cylindrical Wiener process given by
$$B(t) = \sum_{j=1}^\infty \beta_j(t)\,e_j,$$
and define
$$W(t) = \sum_{j=1}^\infty \sigma_j\,\beta_j(t)\,e_j. \tag{10.129}$$
A direct computation gives
$$\mathbb{E}\,W(t)\otimes W(s) = \mathbb{E}\sum_{j=1}^\infty \sum_{k=1}^\infty \sigma_j\sigma_k\,\beta_j(t)\beta_k(s)\,e_j\otimes e_k$$
$$= \sum_{j=1}^\infty \sum_{k=1}^\infty \sigma_j\sigma_k\,\mathbb{E}\bigl(\beta_j(t)\beta_k(s)\bigr)\,e_j\otimes e_k$$
$$= \sum_{j=1}^\infty \sum_{k=1}^\infty \sigma_j\sigma_k\,\delta_{jk}\,(t\wedge s)\,e_j\otimes e_k$$
$$= \Bigl( \sum_{j=1}^\infty \sigma_j^2\,e_j\otimes e_j \Bigr)(t\wedge s)$$
$$= C\,(t\wedge s).$$
Thus the process has the covariance structure of Brownian motion in time, and covariance operator C in space; hence the name C-Wiener process.
Assume now that the sequence σ = {σ_j}_{j=1}^∞ is such that Σ_{j=1}^∞ j^{2r}σ_j² = M < ∞ for some r ∈ ℝ. For fixed t it is then possible to construct a stochastic process as the limit of the truncated series
$$W^N(t) = \sum_{j=1}^N \sigma_j\,\beta_j(t)\,e_j.$$
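The truncated series $W^N$ is straightforward to simulate, since $\beta_j(t) \sim N(0,t)$ independently across $j$. The Python sketch below (illustrative, with the assumed choice $\sigma_j = j^{-1}$) checks that $\mathbb{E}\|W^N(t)\|_{\mathcal{H}}^2 = t\sum_{j\le N}\sigma_j^2$, which remains bounded as $N \to \infty$ precisely because $C$ is trace class:

```python
import numpy as np

rng = np.random.default_rng(4)
N, t, n_samples = 50, 0.7, 50_000
sigma = 1.0 / np.arange(1, N + 1)        # trace class: sum sigma_j^2 < inf

# beta_j(t) ~ N(0, t), independent across j; coordinates of W^N(t)
beta_t = np.sqrt(t) * rng.standard_normal((n_samples, N))
WN = sigma * beta_t                      # coefficients sigma_j beta_j(t)

emp = (WN**2).sum(axis=1).mean()         # empirical E ||W^N(t)||^2
exact = t * (sigma**2).sum()             # = t * sum_j sigma_j^2
print(emp, exact)
```

Replacing `sigma` by the constant sequence 1 recovers the truncated cylindrical Wiener process B^N, for which the same quantity equals tN and diverges with N.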
Section A.1 introduces various Banach and Hilbert spaces, as well as the notion of separability; see [100]. In the context of PDEs, see [33] and [87] for all of the function spaces defined in Sects. A.1.1–A.1.3; Sobolev spaces are developed in detail in [2]. The nonseparability of the Hölder spaces C^{0,γ} and the separability of C_0^{0,γ} are discussed in [40]. For asymptotics of the eigenvalues of the Laplacian operator, see [91, Chapter 11]. For discussion of the more general spaces of E-valued functions over a measure space (M, μ), we refer the reader to [100]. Section A.1.5 concerns Sobolev embedding theorems, building rather explicitly on the case of periodic functions. For the corresponding embedding results in domains with more general boundary conditions, or even on more general manifolds or unbounded domains, we refer to the comprehensive series of monographs [95–97]. The interpolation inequality (10.106) and Lemma 15 may be found in [87]; see also Proposition 6.10 and Corollary 6.11 of [40]. The proof of Theorem 28 closely follows that given in [40, Theorem 6.16], and is a slight generalization to the Hilbert scale setting used here.
Section A.2 briefly introduces the theory of probability measures on infinite-
dimensional spaces. We refer to the extensive treatise by Bogachev [15], and
to the much shorter but more readily accessible book by Billingsley [12], for
more details. The subject of independent sequences of random variables, as
overviewed in Sect. A.2.1 in the i.i.d. case, is discussed in detail in [27, section
1.5.1]. The Kolmogorov Extension Theorem 29 is proved in numerous texts
in the setting where X D R [79]; since any Polish space is isomorphic to R
it may be stated as it is here. Proofs of Lemmas 17 and 19 may be found in
10 The Bayesian Approach to Inverse Problems 423
[40], where they appear as Proposition 3.6 and Proposition 3.9 respectively. For
(10.115) see [28, Chapter 2]. In Sect. A.2.2 we introduce the Bochner integral;
see [13,48] for further details. Lemma 18 and the resulting Corollary 3 are stated
and proved in [14]. The topic of metrics on probability measures, introduced
in Sect. A.2.4 is overviewed in [38], where detailed references to the literature
on the subject may also be found; the second inequality in Lemma 22 is often
termed the Pinsker inequality and can be found in [22]. Note that the choice
of normalization constants in the definitions of the total variation and Hellinger
metrics differs in the literature. For a more detailed account of material on weak
convergence of probability measures we refer, for example, to [12, 15, 98]. A
proof of the Kolmogorov continuity test as stated in Theorem 30 can be found
in [85, p. 26] for the simple case where D is an interval and X a separable Banach space;
the generalization given here may be found in a forthcoming up-to-date version
of [40].
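As a concrete numerical illustration of these metrics (an addition, not part of the original notes): for discrete distributions, the total variation and Hellinger distances can be computed directly and Pinsker's inequality checked. The normalizations below (d_TV = ½Σ|p−q|, d_H² = ½Σ(√p−√q)²) are one common choice; as noted above, conventions differ across the literature.

```python
import numpy as np

def tv_distance(p, q):
    # Total variation with the normalization d_TV = (1/2) * sum |p - q|.
    return 0.5 * np.abs(p - q).sum()

def hellinger_distance(p, q):
    # Hellinger with the normalization d_H^2 = (1/2) * sum (sqrt p - sqrt q)^2.
    return np.sqrt(0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2).sum())

def kl_divergence(p, q):
    # Kullback-Leibler divergence D_KL(p || q); assumes q > 0 wherever p > 0.
    mask = p > 0
    return (p[mask] * np.log(p[mask] / q[mask])).sum()

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Pinsker's inequality: d_TV(p, q) <= sqrt(D_KL(p || q) / 2)
assert tv_distance(p, q) <= np.sqrt(kl_divergence(p, q) / 2)
# With these normalizations, d_H^2 <= d_TV also holds
assert hellinger_distance(p, q) ** 2 <= tv_distance(p, q)
```

The assertions encode the two inequalities for this particular pair of distributions; changing the normalization constants rescales the bounds accordingly.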
The subject of Gaussian measures, as introduced in Sect. A.3, is comprehensively
studied in [14] in the setting of locally convex topological spaces, including
separable Banach spaces as a special case. See also [67] which is concerned
with Gaussian random functions. The Fernique theorem 31 is proved in [35]
and the reader is directed to [40] for a very clear exposition. In Theorem 31
it is possible to take α to be any value smaller than 1/(2‖C‖) and this value
is sharp: see [66, Thm 4.1]. See [14, 67] for more details on the Cameron-
Martin space, and proof of Theorem 32. Theorem 33 follows from Theorem
3.6.1 and Corollary 3.5.8 of [14]: Theorem 3.6.1 shows that the topological
support is the closure of the Cameron-Martin space in B and Corollary 3.5.8
shows that the Cameron-Martin space is dense in B. The reproducing kernel
Hilbert space (or just the reproducing kernel for short) appears widely in
the literature and is isomorphic to the Cameron-Martin space in a natural way.
There is considerable confusion between the two as a result. We retain in these
notes the terminology from [14], but the reader should keep in mind that there
are authors who use a slightly different terminology. Theorem 35 as stated
is a consequence of Proposition 3 in section 18 in [67]. Turning now to the
Hilbert space setting we note that Lemma 23 is proved as Proposition 3.15, and
Theorem 36 appears as Exercise 3.34, in [40]. See [14, 28, 52] for alternative
developments of the Cameron-Martin and Feldman–Hájek theorems. The original
statement of the Feldman–Hájek Theorem 37 can be found in [34, 45]. Our
statement of Theorem 37 mirrors Theorem 2.23 of [28] and Remark 7 is Lemma
6.3.1(ii) of [14]. Note that we have not stated a result analogous to Theorem 32 in
the case of two equivalent Gaussian measures with differing covariances.
Such a result can be stated, but is technically complicated in general because the
ratio of normalization constants of approximating finite-dimensional measures
can blow up as the limiting infinite-dimensional Radon-Nikodym derivative is
attained; see Corollary 6.4.11 in [14].
Section A.4 contains a discussion of cylindrical and C-Wiener processes. The
development is given in more detail in section 3.4 of [40], and in section 4.3
of [28].
424 M. Dashti and A.M. Stuart
Acknowledgements The authors are indebted to Martin Hairer for help in the development of
these notes, and in particular for considerable help in structuring the Appendix, for the proof of
Theorem 28 (which is a slight generalization to Hilbert scales of Theorem 6.16 in [40]) and for the
proof of Corollary 5 (which is a generalization of Corollary 3.22 in [40] to the non-Gaussian setting
and to Hölder, rather than Lipschitz, functions). They are also grateful to Joris Bierkens,
Patrick Conrad, Matthew Dunlop, Shiwei Lan, Yulong Lu, Daniel Sanz-Alonso, Claudia Schillings
and Aretha Teckentrup for careful proof-reading of the notes and related comments. AMS is
grateful to various hosts who gave him the opportunity to teach this material in short course form
at TIFR-Bangalore (Amit Apte), Göttingen (Axel Munk), PKU-Beijing (Tiejun Li), ETH-Zurich
(Christoph Schwab) and Cambridge CCA (Arieh Iserles), a process which led to refinements of
the material; the authors are also grateful to the students on those courses, who provided useful
feedback. The authors would also like to thank Sergios Agapiou and Yuan-Xiang Zhang for help
in the preparation of these lecture notes, including type-setting, proof-reading, providing the proof
of Lemma 3 and delivering problems classes related to the short courses. AMS is also pleased
to acknowledge the financial support of EPSRC, ERC and ONR over the last decade, during
which the research that underpins this work was developed.
References
1. Adler, R.: The Geometry of Random Fields. SIAM, Philadelphia (1981)
2. Adams, R.A., Fournier, J.J.: Sobolev Spaces. Pure and Applied Mathematics. Elsevier, Oxford
(2003)
3. Agapiou, S., Larsson, S., Stuart, A.M.: Posterior contraction rates for the Bayesian approach
to linear ill-posed inverse problems. Stoch. Process. Appl. 123, 3828–3860 (2013)
4. Agapiou, S., Stuart, A.M., Zhang, Y.X.: Bayesian posterior consistency for linear severely
ill-posed inverse problems. J. Inverse Ill-posed Probl. 22(3), 297–321 (2014)
5. Alexanderian, A., Petra, N., Stadler, G., Ghattas, O.: A fast and scalable method for A-optimal
design of experiments for infinite-dimensional Bayesian nonlinear inverse problems. SIAM
J. Sci. Comput. 38(1), A243–A272 (2016)
6. Alexanderian, A., Gloor, P., Ghattas, O.: On Bayesian A- and D-optimal experimental designs
in infinite dimensions. http://arxiv.org/abs/1408.6323 (2016)
7. Babuska, I., Tempone, R., Zouraris, G.: Galerkin finite element approximations of stochastic
elliptic partial differential equations. SIAM J. Numer. Anal. 42, 800–825 (2004)
8. Banks, H.T., Kunisch, K.: Estimation Techniques for Distributed Parameter Systems.
Birkhäuser, Boston (1989)
9. Beskos, A., Pinski, F.J., Sanz-Serna, J.-M., Stuart, A.M.: Hybrid Monte-Carlo on Hilbert
spaces. Stoch. Process. Appl. 121, 2201–2230 (2011)
10. Beskos, A., Jasra, A., Muzaffer, E.A., Stuart, A.M.: Sequential Monte Carlo methods for
Bayesian elliptic inverse problems. Stat. Comput. (2015)
11. Bernardo, J., Smith, A.: Bayesian Theory. Wiley, Chichester (1994)
12. Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1968)
13. Bochner, S.: Integration von Funktionen, deren Werte die Elemente eines Vektorraumes sind.
Fund. Math. 20, 262–276 (1933)
14. Bogachev, V.I.: Gaussian Measures. Mathematical Surveys and Monographs, vol. 62. Ameri-
can Mathematical Society, Providence (1998)
15. Bogachev, V.I.: Measure Theory, vol. I, II. Springer, Berlin (2007)
16. Bui-Thanh, T., Ghattas, O., Martin, J., Stadler, G.: A computational framework for infinite-
dimensional Bayesian inverse problems. Part I: the linearized case, with application to global
seismic inversion. SIAM J. Sci. Comput. 35(6), A2494–A2523 (2013)
17. Cotter, S., Dashti, M., Robinson, J., Stuart, A.: Bayesian inverse problems for functions
and applications to fluid mechanics. Inverse Probl. 25. doi:10.1088/0266-5611/25/11/115008
(2009)
18. Cohen, A., DeVore, R., Schwab, C.: Convergence rates of best n-term Galerkin approxima-
tions for a class of elliptic sPDEs. Found. Comput. Math. 10, 615–646 (2010)
19. Cohen, A., DeVore, R., Schwab, Ch.: Analytic regularity and polynomial approximation of
parametric and stochastic elliptic PDEs. Anal. Appl. 9(1), 11–47 (2011)
20. Cotter, S., Dashti, M., Stuart, A.: Approximation of Bayesian inverse problems. SIAM J.
Numer. Anal. 48, 322–345 (2010)
21. Cotter, S., Roberts, G., Stuart, A., White, D.: MCMC methods for functions: modifying old
algorithms to make them faster. Stat. Sci. 28, 424–446 (2013)
22. Csiszár, I., Körner, J.: Information Theory: Coding Theorems for Discrete Memoryless
Systems. Cambridge University Press, Cambridge (2011)
23. Dacorogna, B.: Introduction to the Calculus of Variations. Translated from the 1992 French
original, 2nd edn. Imperial College Press, London (2009)
24. Dashti, M., Harris, S., Stuart, A.: Besov priors for Bayesian inverse problems. Inverse Probl.
Imaging 6, 183–200 (2012)
25. Dashti, M., Stuart, A.: Uncertainty quantification and weak approximation of an elliptic
inverse problem. SIAM J. Numer. Anal. 49, 2524–2542 (2011)
26. Del Moral, P.: Feynman-Kac Formulae. Springer, New York (2004)
27. Da Prato, G.: An introduction to infinite-dimensional analysis. Universitext. Springer, Berlin
(2006). Revised and extended from the 2001 original by Da Prato
28. DaPrato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Encyclopedia
of Mathematics and its Applications, vol. 44. Cambridge University Press, Cambridge
(1992)
29. DaPrato, G., Zabczyk, J.: Ergodicity for Infinite Dimensional Systems. Cambridge University
Press, Cambridge (1996)
30. Dashti, M., Law, K.J.H., Stuart, A.M., Voss, J.: MAP estimators and their consistency in
Bayesian nonparametric inverse problems. Inverse Probl. 29, 095017 (2013)
31. Daubechies, I.: Ten lectures on wavelets. CBMS-NSF Regional Conference Series in Applied
Mathematics, vol. 61. Society for Industrial and Applied Mathematics (SIAM), Philadelphia
(1992)
32. Engl, H., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Dordrecht/-
Boston (1996)
33. Evans, L.: Partial Differential Equations. AMS, Providence (1998)
34. Feldman, J.: Equivalence and perpendicularity of Gaussian processes. Pac. J. Math. 8,
699–708 (1958)
35. Fernique, X.: Intégrabilité des vecteurs gaussiens. C. R. Acad. Sci. Paris Sér. A-B 270,
A1698–A1699 (1970)
36. Franklin, J.: Well-posed stochastic extensions of ill-posed linear problems. J. Math. Anal.
Appl. 31, 682–716 (1970)
37. Gardiner, C.W.: Handbook of Stochastic Methods. For Physics, Chemistry and the Natural
Sciences, 2nd edn. Springer, Berlin (1985)
38. Gibbs, A., Su, F.: On choosing and bounding probability metrics. Int. Stat. Rev. 70, 419–435
(2002)
39. Graham, I.G., Kuo, F.Y., Nicholls, J.A., Scheichl, R., Schwab, C., Sloan, I.H.:
Quasi-Monte Carlo finite element methods for elliptic PDEs with log-normal ran-
dom coefficients. Seminar for Applied Mathematics, ETH, SAM Report 2013-14
(2013)
40. Hairer, M.: Introduction to Stochastic PDEs. Lecture Notes. http://arxiv.org/abs/0907.4178
(2009)
41. Hairer, M., Stuart, A.M., Voss, J.: Analysis of SPDEs arising in path sampling, part II: the
nonlinear case. Ann. Appl. Probab. 17, 1657–1706 (2007)
42. Hairer, M., Stuart, A., Voss, J.: Sampling conditioned hypoelliptic diffusions. Ann. Appl.
Probab. 21(2), 669–698 (2011)
43. Hairer, M., Stuart, A., Voss, J., Wiberg, P.: Analysis of SPDEs arising in path sampling. Part
I: the Gaussian case. Comm. Math. Sci. 3, 587–603 (2005)
44. Hairer, M., Stuart, A., Vollmer, S.: Spectral gaps for a Metropolis-Hastings algorithm in
infinite dimensions. Ann. Appl. Probab. 24(6), 2455–2490 (2014)
45. Hájek, J.: On a property of normal distribution of any stochastic process. Czechoslov. Math.
J. 8(83), 610–618 (1958)
46. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications.
Biometrika 57(1), 97–109 (1970)
47. Helin, T., Burger, M.: Maximum a posteriori probability estimates in infinite-dimensional
Bayesian inverse problems. Inverse Probl. 31(8), 085009 (2015)
48. Hildebrandt, T.H.: Integration in abstract spaces. Bull. Am. Math. Soc. 59, 111–139 (1953)
49. Kahane, J.-P.: Some Random Series of Functions. Cambridge Studies in Advanced Mathe-
matics, vol. 5. Cambridge University Press, Cambridge (1985)
50. Kantas, N., Beskos, A., Jasra, A.: Sequential Monte Carlo methods for high-dimensional
inverse problems: a case study for the Navier-Stokes equations. arXiv preprint,
arXiv:1307.6127
51. Kirsch, A.: An Introduction to the Mathematical Theory of Inverse Problems. Springer, New
York (1996)
52. Kühn, T., Liese, F.: A short proof of the Hájek–Feldman theorem. Teor. Verojatnost. i
Primenen. 23(2), 448–450 (1978)
53. Kaipio, J., Somersalo, E.: Statistical and Computational Inverse Problems. Applied Mathe-
matical Sciences, vol. 160. Springer, New York (2005)
54. Kallenberg, O.: Foundations of Modern Probability, 2nd edn. Probability and its Applications.
Springer, New York, (2002)
55. Knapik, B., van der Vaart, A., van Zanten, J.: Bayesian inverse problems with Gaussian
priors. Ann. Stat. 39(5), 2626–2657 (2011)
56. Knapik, B., van der Vaart, A., van Zanten, J.H.: Bayesian recovery of the initial condition for
the heat equation. Commun. Stat. Theory Methods 42, 1294–1313 (2013)
57. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo methods for very high dimensional
integration: the standard (weighted Hilbert space) setting and beyond. ANZIAM J. 53, 1–37
(2011)
58. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for a class
of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal.
50(6), 3351–3374 (2012)
59. Kuo, F.Y., Sloan, I.H.: Lifting the curse of dimensionality. Not. AMS 52(11), 1320–1328
(2005)
60. Kuo, F.Y., Sloan, I.H., Wasilkowski, G.W., Waterhouse, B.J.: Randomly shifted lattice rules
with the optimal rate of convergence for unbounded integrands. J. Complex. 26, 135–160
(2010)
61. Lasanen, S.: Discretizations of generalized random variables with applications to inverse
problems. Ann. Acad. Sci. Fenn. Math. Dissertation, University of Oulu, 130 (2002)
62. Lasanen, S.: Measurements and infinite-dimensional statistical inverse theory. PAMM 7,
1080101–1080102 (2007)
63. Lasanen, S.: Non-Gaussian statistical inverse problems part II: posterior convergence for
approximated unknowns. Inverse Probl. Imaging 6(2), 267 (2012)
64. Lasanen, S.: Non-Gaussian statistical inverse problems. Part I: posterior distributions. Inverse
Probl. Imaging 6(2), 215–266 (2012)
65. Lasanen, S.: Non-Gaussian statistical inverse problems. Part II: posterior distributions. Inverse
Probl. Imaging 6(2), 267–287 (2012)
66. Ledoux, M.: Isoperimetry and Gaussian analysis. In: Lectures on Probability Theory and
Statistics (Saint-Flour, 1994). Lecture Notes in Mathematics, vol. 1648, pp. 165–294.
Springer, Berlin (1996)
67. Lifshits, M.: Gaussian Random Functions. Mathematics and its Applications, vol. 322.
Kluwer, Dordrecht (1995)
68. Lehtinen, M.S., Päivärinta, L., Somersalo, E.: Linear inverse problems for generalised random
variables. Inverse Probl. 5(4), 599–612 (1989). http://stacks.iop.org/0266-5611/5/599
69. Lassas, M., Saksman, E., Siltanen, S.: Discretization-invariant Bayesian inversion and Besov
space priors. Inverse Probl. Imaging 3, 87–122 (2009)
70. Lunardi, A.: Analytic Semigroups and Optimal Regularity in Parabolic Problems. Progress
in Nonlinear Differential Equations and their Applications, vol. 16. Birkhäuser Verlag, Basel
(1995)
71. Mandelbaum, A.: Linear estimators and measurable linear transformations on a Hilbert space.
Z. Wahrsch. Verw. Gebiete 65(3), 385–397 (1984). http://dx.doi.org/10.1007/BF00533743
72. Mattingly, J., Pillai, N., Stuart, A.: Diffusion limits of the random walk Metropolis algorithm
in high dimensions. Ann. Appl. Probab. 22, 881–930 (2012)
73. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equations of
state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)
74. Meyer, Y.: Wavelets and operators. Translated from the 1990 French original by D.H.
Salinger. Cambridge Studies in Advanced Mathematics, vol. 37. Cambridge University Press,
Cambridge (1992)
75. Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Communications and
Control Engineering Series. Springer, London (1993)
76. Neal, R.: Regression and classification using Gaussian process priors. http://www.cs.toronto.
edu/~radford/valencia.abstract.html (1998)
77. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM,
Philadelphia (1994)
78. Norris, J.: Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics.
Cambridge University Press, Cambridge (1998)
79. Oksendal, B.: Stochastic Differential Equations. An Introduction with Applications. Univer-
sitext, 6th edn. Springer, Berlin (2003)
80. Pazy, A.: Semigroups of Linear Operators and Applications to Partial Differential Equations.
Springer, New York (1983)
81. Petra, N., Martin, J., Stadler, G., Ghattas, O.: A computational framework for infinite-
dimensional Bayesian inverse problems. Part II: stochastic Newton MCMC with application
to ice sheet flow inverse problems. SIAM J. Sci. Comput. 36(4), A1525A1555 (2014)
82. Pillai, N.S., Stuart, A.M., Thiery, A.H.: Noisy gradient flow from a random walk in Hilbert
space. Stoch. PDEs: Anal. Comput. 2, 196–232 (2014)
83. Pinski, F., Stuart, A.: Transition paths in molecules at finite temperature. J. Chem. Phys. 132,
184104 (2010)
84. Rebeschini, P., van Handel, R.: Can local particle filters beat the curse of dimensionality?
Ann. Appl. Probab. 25(5), 2809–2866 (2015)
85. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion. Grundlehren der Mathe-
matischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 293, 2nd
edn. Springer, Berlin (1994)
86. Richter, G.: An inverse problem for the steady state diffusion equation. SIAM J. Appl. Math.
41(2), 210–221 (1981)
87. Robinson, J.C.: Infinite-Dimensional Dynamical Systems. Cambridge Texts in Applied
Mathematics. Cambridge University Press, Cambridge (2001)
88. Rudin, W.: Real and Complex Analysis, 3rd edn. McGraw-Hill, New York (1987)
89. Schillings, C., Schwab, C.: Sparse, adaptive Smolyak quadratures for Bayesian inverse
problems. Inverse Probl. 29, 065011 (2013)
90. Schwab, C., Stuart, A.: Sparse deterministic approximation of Bayesian inverse problems.
Inverse Probl. 28, 045003 (2012)
91. Strauss, W.A.: Partial differential equations. An introduction, 2nd edn. Wiley, Chichester
(2008)
92. Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numer. 19, 451–559 (2010)
93. Stuart, A.M.: Uncertainty quantification in Bayesian inversion. ICM2014. Invited Lecture
(2014)
94. Tierney, L.: A note on Metropolis-Hastings kernels for general state spaces. Ann. Appl.
Probab. 8(1), 1–9 (1998)
95. Triebel, H.: Theory of function spaces. Mathematik und ihre Anwendungen in Physik und
Technik [Mathematics and its Applications in Physics and Technology], vol. 38. Akademische
Verlagsgesellschaft Geest & Portig K.-G., Leipzig (1983)
96. Triebel, H.: Theory of Function Spaces. II. Monographs in Mathematics, vol. 84. Birkhäuser
Verlag, Basel (1992)
97. Triebel, H.: Theory of Function Spaces. III. Monographs in Mathematics, vol. 100. Birkhäuser
Verlag, Basel (2006)
98. Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58.
American Mathematical Society, Providence (2003)
99. Vollmer, S.: Posterior consistency for Bayesian inverse problems through stability and
regression results. Inverse Probl. 29, 125011 (2013)
100. Yosida, K.: Functional Analysis. Classics in Mathematics. Springer, Berlin (1995). Reprint of
the sixth (1980) edition.
101. Yin, G., Zhang, Q.: Continuous-Time Markov Chains and Applications. Applications of
Mathematics (New York), vol. 37. Springer, New York (1998)
Multilevel Uncertainty Integration
11
Sankaran Mahadevan, Shankar Sankararaman, and Chenzhao Li
Abstract
This chapter discusses a Bayesian methodology to integrate model verification,
validation, and calibration activities for the purpose of overall uncertainty
quantification in model-based prediction. The methodology is first developed
for single-level models and then extended to systems that are studied using
multilevel models that interact with each other. Two types of interactions among
multilevel models are considered: (1) Type-I, where the output of a lower-level
model (component and/or subsystem) becomes an input to a higher-level system
model, and (2) Type-II, where parameters of the system model are inferred
using lower-level models and tests (that describe simplified components and/or
isolated physics). The various models; their inputs, parameters, and outputs;
experimental data; and various sources of model error are connected through
a Bayesian network. The results of calibration, verification, and validation
with respect to each individual model are integrated using the principles of
conditional probability and total probability and propagated through the Bayesian
network in order to quantify the overall system-level prediction uncertainty.
For Type-II interaction, the relevance of each lower-level output to the system-level
quantity of interest is quantified by comparing Sobol indices, thus measuring the
extent to which a lower-level test represents the characteristics of the system
so that the calibration results can be reliably used in the system level. The
proposed methodology is illustrated with numerical examples that deal with heat
conduction and structural dynamics.
S. Mahadevan () C. Li
Department of Civil and Environmental Engineering, Vanderbilt University, Nashville, TN, USA
e-mail: sankaran.mahadevan@vanderbilt.edu
S. Sankararaman
NASA Ames Research Center, SGT Inc., Moffett Field, CA, USA
Keywords
Multilevel system · Uncertainty quantification · Bayesian network · Calibration ·
Validation · Verification · Relevance · Sobol indices · Bayes factor ·
Model reliability metric
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
1.2 Multilevel System Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
1.3 Organization of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
2 Integration of Verification, Validation, and Calibration (Single Level) . . . . . . . . . . . . . . . 436
2.1 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
2.2 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
2.3 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
2.4 Integration for Overall Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . 447
3 Numerical Example: Single-Level Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
3.1 Description of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
3.2 Verification, Validation, and Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
3.3 Integration and Overall Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . 449
4 Multilevel Models with Type-I Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
4.1 Verification, Calibration, and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
4.2 Integration for Overall Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . 452
4.3 Extension to Multiple Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
5 Numerical Example: Two Models with Type-I Interaction . . . . . . . . . . . . . . . . . . . . . . . . . 455
6 Multilevel Models with Type-II Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
6.1 Verification, Calibration, and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
6.2 Relevance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
6.3 Integration for Overall Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . 462
7 Numerical Example: Models with Type-II Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
7.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
7.2 Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
1 Introduction
1.1 Motivation
While the Bayesian approach offers several advantages, there are some practical
challenges in implementing Bayesian methods. The need to assume prior probability
distributions may be a challenge in some situations; in other situations it is attractive
to incorporate prior knowledge (if available), and non-informative prior probability
distributions [7] can be used when no prior knowledge is available.
Bayesian techniques involve the computation of high-dimensional integrals that are
often solved through Markov chain Monte Carlo (MCMC) sampling techniques that
require extensive computational effort. With the advent of the high-performance
432 S. Mahadevan et al.
Typically, a multilevel system is studied using different types of models; each model
may represent a particular component, a particular subsystem, or an isolated set of
features/physics of the original system. The interaction between any two models
depends on what features of the original system they represent, and it is necessary
to translate the dependency between the features into mathematical relationship
between the models. In order to facilitate such translation, and hence, the objectives
of this chapter, two types of interactions (designated as Type-I and Type-II in this
chapter) are considered in detail. In both of these types, the quantity of interest
is an overall system-level response, but there is a significant difference in how
this quantity is calculated using information from lower-level models, verification,
validation, and calibration.
The interaction between two models (denoted by G and H in Fig. 11.1a) is
considered to be Type-I, when the lower-level model G represents the behavior
of a particular component/subsystem whose output (Y ) becomes an input to a
higher-level subsystem/component whose behavior is represented by the higher-
level model H . Each model has its own set of model parameters (not indicated
in Fig. 11.1a); there may or may not be any model parameter common between
two models. For example, the rise in the temperature (Y , computed using G) of a
conducting wire leads to a change in its resistance and, hence, in its current carrying
capacity (Z, which is computed using H ).
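The conducting-wire example can be sketched as a simple Monte Carlo propagation through the G → H chain. The model forms and distributions below are purely illustrative assumptions, not taken from the chapter; the point is only that the output Y of the lower-level model feeds the higher-level model as an input (Type-I interaction).

```python
import numpy as np

rng = np.random.default_rng(0)

def G(x, theta_g):
    # Hypothetical lower-level model: temperature rise Y of a wire
    # as a function of current x and a heat-transfer parameter theta_g.
    return theta_g * x**2

def H(y, theta_h):
    # Hypothetical higher-level model: current-carrying capacity Z,
    # which degrades as the temperature rise Y (output of G) increases.
    return theta_h / (1.0 + 0.01 * y)

# Monte Carlo propagation: uncertain input and parameters flow through G,
# whose output Y then becomes an input to H (Type-I coupling).
x = rng.normal(10.0, 1.0, size=10_000)        # uncertain input
theta_g = rng.normal(0.5, 0.05, size=10_000)  # uncertain parameter of G
theta_h = rng.normal(100.0, 5.0, size=10_000) # uncertain parameter of H

y = G(x, theta_g)   # lower-level output
z = H(y, theta_h)   # system-level quantity of interest

print(f"mean(Z) = {z.mean():.2f}, std(Z) = {z.std():.2f}")
```

In this sketch each model has its own parameter, matching the remark that the two models may or may not share parameters.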
11 Multilevel Uncertainty Integration 433
Fig. 11.1 Two types of interactions between models. (a) Type-I interaction. (b) Type-II interac-
tion
The interaction between two models is said to be Type-II, when the lower-level
model represents a simplification of the overall system. Typically, it may not be
possible to study certain features of the system (i.e., the system parameters) since
extensive testing of the overall system may be prohibitively expensive. Therefore, a simplified
configuration that consists of an isolated set of features and/or an isolated set of
physics of the multilevel system is often considered for testing. The response of the
simplified subsystem is not directly related to the response of the overall system;
however, there are some parameters (θ) of the system model (H ) that can be
inferred using lower-level models and experiments. Further, multiple test options
may be available (two in Fig. 11.1b), where the Level 2 model G2 may describe more
features than the Level 1 model G1 , in terms of complexity. The model of the highest
complexity (H ) represents the system of interest and the system-level response (Z)
needs to be calculated. The model parameters (θ) are calibrated using models and
experiments of reduced complexity (e.g., isolated components or physics) and then
propagated through the system model to compute the desired response.
For example, consider a coupon subjected to axial testing (say, the axial
deflection is modeled using G1 ), a cantilever beam subjected to point loading (say,
the deflection is modeled using G2), and a plate subjected to bending (say, the
bending stress is modeled using H ), where the coupon, the beam, and the plate
are made of the same material. While both the axial deflection of the coupon (Y1 )
and the deflection of the beam (Y2) are not directly related to the bending stress in
the plate (Z), some material properties (like the elastic modulus, denoted by θ)
of the plate can be studied using tests performed on the coupon and the beam.
Note that there is no interaction between models G1 and G2 ; there is Type-II
interaction between models G1 and H and between models G2 and H. Urbina
et al. [1], Sankararaman et al. [11], Mullins et al. [12], and Li et al. [13] discuss
practical multilevel systems with Type-II interaction; these studies consider lower-
level models of increasing complexity and physics, i.e., only a few aspects of physics
are captured at the lowest level model and more aspects are increasingly captured in
subsequent higher levels.
The rest of the chapter is organized as follows. Section 2 develops the methodology
for the integration of calibration, verification, and validation in a single-level model,
followed by a numerical example in Sect. 3. Section 4 develops the integration
Consider a single-level model as shown in Fig. 11.2. The inputs are x, the model
parameters are θ, the true solution of the mathematical equation is y, and the code
output is yc. Both yc and y are deterministic functions of the inputs (x) and model
parameters (θ).
This section discusses methods to integrate the results of calibration, verification,
and validation of the model. Since the process of verification is not related to
experimental data, it needs to be performed first; both calibration and validation
must include the results of verification analysis (i.e., solution error quantification).
Then, the results of verification, calibration, and validation are integrated to compute
the overall uncertainty in the response quantity.
2.1 Verification
The process of verification checks how close the code output is to the true
solution of the mathematical equation. Verification includes both code verification
(identification of programming errors and debugging) and solution verification (con-
vergence studies, identifying and computing solution approximation errors) [20,21].
Methods for code verification [22–27] and estimation of solution approximation
error [25, 27–34] have been investigated by several researchers. It is desirable
to perform verification before calibration and validation so that the solution
approximation errors are accounted for during calibration and validation. Solution
approximation errors in finite element analysis have been estimated using a variety
Fig. 11.2 A single-level model. (a) Mathematical equation. (b) Computer code
stochastic errors of the surrogate model are accounted for through sampling based
on their estimated distributions. As a result, the overall solution approximation error
εse(x; θ) is also stochastic, i.e., y is stochastic even for given values of x and θ,
and this stochasticity, without physical randomness, also needs to be interpreted
subjectively. The remainder of this subsection briefly reviews the estimation of
discretization error and surrogate model uncertainty.
Recall that verification is performed before calibration and validation in the
approach of this chapter. Once the solution approximation errors are quantified, the
model predictions are corrected as explained above; the corrected solution (y) and
not the code output (yc ) is used for calibration and validation. Such an approach
integrates the result of verification into calibration, validation, and all subsequent
uncertainty quantification.
y = φ1 − εh,   εh = (φ2 − φ1)/(r^p − 1),   p log(r) = log[(φ3 − φ2)/(φ2 − φ1)]   (11.1)
The solutions φ1, φ2, and φ3 are dependent on both x and θ, and hence the error
estimate εh and the true solution y are also functions of both x and θ. Since the
discretization error is a deterministic quantity, it needs to be corrected for in the
context of uncertainty propagation.
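A minimal numerical sketch of the relations in Eq. (11.1): given solutions on three grids with constant refinement ratio r, the observed order p, the discretization-error estimate εh, and the extrapolated solution y are recovered. The manufactured second-order solution below is an illustrative assumption.

```python
import math

def richardson(phi1, phi2, phi3, r):
    """Observed order p, discretization-error estimate eps_h for the finest
    solution phi1, and the extrapolated solution y, following Eq. (11.1).
    phi1, phi2, phi3 are solutions on fine, medium, and coarse grids with
    constant refinement ratio r."""
    p = math.log((phi3 - phi2) / (phi2 - phi1)) / math.log(r)
    eps_h = (phi2 - phi1) / (r**p - 1.0)
    y = phi1 - eps_h
    return p, eps_h, y

# Manufactured example: phi(h) = y_true + C * h^2 (second-order scheme)
y_true, C = 1.0, 0.3
h = 0.1
phi1 = y_true + C * h**2        # fine grid
phi2 = y_true + C * (2 * h)**2  # medium grid (r = 2)
phi3 = y_true + C * (4 * h)**2  # coarse grid

p, eps_h, y = richardson(phi1, phi2, phi3, r=2.0)
# For this manufactured case, p recovers 2 and y recovers y_true.
```

For a genuinely second-order scheme the recovered p is close to 2 and the extrapolated y removes the leading-order error from the fine-grid solution.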
Recently, Rangavajhala et al. [34] extended the Richardson extrapolation
methodology from a polynomial relation to a more flexible Gaussian process
extrapolation. This approach expresses the discretization error as a probability
distribution, and therefore, the training points (in particular, the output values)
for the surrogate model are themselves stochastic. Rasmussen [46–48] discusses
constructing GP models when the training point values are themselves stochastic.
$$\log p(\mathbf{y}_T \mid \mathbf{x}_T; \Theta) = -\tfrac{1}{2}\,\mathbf{y}_T^{\mathsf T}\bigl(K_{TT} + \sigma_n^2 I\bigr)^{-1}\mathbf{y}_T - \tfrac{1}{2}\log\bigl\lvert K_{TT} + \sigma_n^2 I \bigr\rvert - \tfrac{d}{2}\log(2\pi) \tag{11.4}$$
Once the hyper-parameters are estimated, the Gaussian process model can be
used for predictions using Eq. (11.3). For details of this method, refer to [46, 48–54].
Note that the variance in Eq. (11.3) is representative of the uncertainty due to the
use of the GP surrogate model, and this uncertainty needs to be accounted for while
computing the overall prediction uncertainty.
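To make the surrogate-model uncertainty concrete, the following sketch implements plain GP regression with a squared-exponential kernel. The kernel choice and the hyper-parameter values (ell, sf, sn) are illustrative assumptions; in practice the hyper-parameters are estimated by maximizing the log marginal likelihood of Eq. (11.4).

```python
import numpy as np

def gp_predict(xT, yT, xs, ell=1.0, sf=1.0, sn=1e-3):
    """GP posterior mean and variance at test points xs, given training
    inputs xT and outputs yT (1-D arrays). The variance is the surrogate
    uncertainty referred to in the text; hyper-parameter values here are
    placeholders, not estimated."""
    k = lambda a, b: sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(xT, xT) + sn**2 * np.eye(len(xT))   # K_TT + sigma_n^2 I
    Ks = k(xs, xT)                            # cross-covariance k(x*, x_T)
    alpha = np.linalg.solve(K, yT)
    mean = Ks @ alpha                         # posterior mean
    # posterior variance: k(x*,x*) - k(x*,x_T) K^{-1} k(x_T,x*)
    var = sf**2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var
```

The returned variance is small near the training points and approaches sf² far from them; this is the surrogate uncertainty that must be carried into the overall prediction uncertainty.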
440 S. Mahadevan et al.
2.2 Calibration
$$f(\theta \mid G, D^C) = \frac{L(\theta)\, f(\theta)}{\int L(\theta)\, f(\theta)\, d\theta} \tag{11.5}$$
In Eq. (11.5), f(θ) is the prior PDF and f(θ|G, D^C)
is the posterior PDF; the calibration procedure uses the model form G and hence the
posterior is conditioned on G. The function L(θ) is the likelihood of θ, defined as
being proportional to the probability of observing the data D^C (given as x_i vs. y_i,
i = 1 to n) conditioned on the parameters θ. This likelihood function is evaluated as
$$L(\theta) \propto \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{\bigl(y_i - G(x_i; \theta)\bigr)^2}{2\sigma^2}\right) \tag{11.6}$$
where σ is the standard deviation of ε_obs = y − G(x; θ). Note that the likelihood
is constructed using the actual solution of the mathematical equation (y), i.e.,
after correcting the raw code prediction (y_c) with the solution approximation error
(ε_soln). Sometimes, σ is also inferred along with θ, by constructing the joint
likelihood L(θ, σ). Note that Eq. (11.6) assumes that the experimental observations
are unbiased. If the predictions are biased due to modeling errors, then the output
can be modeled as y = G(x; θ) + δ + ε_obs, where δ represents the modeling error
and is inferred along with θ. This approach was further extended by Kennedy and
O'Hagan [30] by modeling the output as y = G(x; θ) + δ(x) + ε_obs, where δ(x)
(sometimes referred to as the model inadequacy function) is represented using a
Gaussian process (GP) whose hyper-parameters are also inferred along with θ. A
challenge with the Kennedy–O'Hagan (KOH) framework is that it is necessary to (1)
simultaneously calibrate both the model parameters and the GP hyper-parameters
and (2) possess good prior knowledge regarding both the model parameters and
the GP hyper-parameters. This issue is still being addressed
by several researchers [64, 65]. Using the KOH framework, all the parameters to
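The posterior of Eq. (11.5) with the likelihood of Eq. (11.6) is generally sampled numerically (the example in Sect. 3 uses slice sampling); the sketch below substitutes a simple random-walk Metropolis sampler, with illustrative function and argument names, for a scalar parameter θ.

```python
import math, random

def calibrate(model, xs, ys, log_prior, sigma, n_steps=20000, theta0=0.0,
              step=0.2, seed=1):
    """Metropolis sampler for the posterior of Eq. (11.5) with the Gaussian
    likelihood of Eq. (11.6). `model(x, theta)` plays the role of G(x; theta);
    the normalizing constant of the likelihood is dropped (constant in theta)."""
    rng = random.Random(seed)

    def log_post(theta):
        ll = sum(-0.5 * ((y - model(x, theta)) / sigma) ** 2
                 for x, y in zip(xs, ys))
        return ll + log_prior(theta)          # log L(theta) + log f(theta)

    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_steps):
        cand = theta + rng.gauss(0.0, step)   # random-walk proposal
        lp_cand = log_post(cand)
        if math.log(rng.random()) < lp_cand - lp:
            theta, lp = cand, lp_cand         # accept the candidate
        samples.append(theta)
    return samples
```

Discarding an initial burn-in portion of the chain, the remaining samples represent the posterior f(θ|G, D^C).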
2.3 Validation
Model validation refers to the process of determining the degree to which a model is
an accurate representation of the real world from the perspective of the intended use
of the model [22, 72]. Generally, model validation is realized by comparing the model
prediction against experimental data. Different types of validation metrics have been
studied in the literature to express the accuracy of a computational model through
comparison of its prediction against observed data and to determine whether the
model is adequate for its intended use (sometimes referred to as qualification [25]).
Coleman and Stern [73] and Oberkampf and Trucano [25] discussed conceptual
and practical aspects of model validation and provided guidelines for conducting
validation experiments and developing validation metrics. Available approaches
for quantitative model validation are based on statistical confidence intervals
[74], computing the distance between the model prediction and experimental data
using the area metric [9, 75], normalizing residuals [76], classical statistics-based
hypothesis testing [77], Bayesian hypothesis testing [4, 78–81], and reliability
analysis-based techniques [82–84]. Liu et al. [85] and Ling and Mahadevan [5]
investigated several of these validation approaches in detail and discussed their
practical implications in engineering.
The term validation has been used to imply different approaches in different
studies. While some studies compute validation metrics, some other approaches
focus on estimating the model discrepancy [15, 32] as the difference between the
model prediction and the underlying physical phenomenon the model seeks to rep-
resent. Comprehensive reviews on model validation can be found in [25, 74, 86, 87].
The present chapter only focuses on computing validation metrics with validation
data and does not estimate the model form error with validation data. The model
discrepancy is computed through the use of a discrepancy function in the Kennedy–
O'Hagan framework during model calibration, a previous task. Separate sets of data
are considered for calibration and validation.
In this chapter, model calibration and model validation are distinct activities.
Model validation can be conducted without any model calibration [74] if the
model parameters are assumed to be known. However, the model parameters are
often unknown. Therefore, prior to model validation, model calibration can be con-
ducted to quantify the values of θ or reduce the uncertainty about their values. Thus
validation is a subsequent and distinct activity after calibration. Both model calibra-
tion and model validation are conducted in this chapter, but they use different sets
of experimental data (no calibration data is used in model validation), as suggested
by [88]. The model parameters do not change as a result of model validation.
Among the validation methods mentioned above, this chapter uses Bayesian
hypothesis testing and the model reliability metric for the purpose of uncertainty
integration, since both methods give the validation result as a probability value. The
Bayes factor metric in Bayesian hypothesis testing is the ratio of the likelihoods
that the model is valid and that the model is invalid. The Bayes factor can be used
to directly calculate the posterior probability that the model is valid. Further, the
threshold Bayes factor for model acceptance can be derived based on a risk vs.
cost trade-off, thereby aiding in robust, meaningful decision-making, as shown by
Jiang and Mahadevan [89]. Alternatively, the model reliability metric [83] can also
compute the probability that the model is valid.
$$f_Y(y \mid x, G, D^C) = \int f_Y(y \mid x, \theta)\, f(\theta \mid G, D^C)\, d\theta \tag{11.7}$$
In the case of partially characterized validation data (e.g., field data), the input x may
not be measured, in which case the model prediction must include the uncertainty
in the input as
$$f_Y(y \mid G, D^C) = \iint f_Y(y \mid x, \theta)\, f_X(x)\, f(\theta \mid G, D^C)\, d\theta\, dx \tag{11.8}$$
The above equations simply refer to uncertainty propagation through the model;
hence the model prediction is conditioned on the event that the mathematical model
is correct and written as f_Y(y|G, D^C). The results of verification are included while
computing y, and the results of calibration are included by using the posterior PDF
of the model parameters, f(θ|G, D^C), in the prediction.
The model prediction is then compared with the validation data using Bayesian
hypothesis testing in this section. Let P(G) and P(G′) denote the probabilities
that the model is valid (null hypothesis) and that the model is invalid (alternate
hypothesis), respectively. Prior to validation, if no information is available, P(G) =
P(G′) = 0.5. Using Bayesian hypothesis testing, these probabilities can be updated
using the validation data (D^V), and the likelihood ratio, referred to as the Bayes
factor, is defined as
$$B = \frac{P(D^V \mid G)}{P(D^V \mid G')} \tag{11.9}$$
In Eq. (11.10), the term f(D^V|y) is calculated based on the measurement error
ε_obs ∼ N(0, σ_obs²), as

$$f(D^V \mid y) = \frac{1}{\sigma_{\mathrm{obs}}\sqrt{2\pi}} \exp\!\left(-\frac{(D^V - y)^2}{2\sigma_{\mathrm{obs}}^2}\right) \tag{11.11}$$
data, the probability that the model is correct, i.e., P(G|D^V), can be calculated as
B/(B + 1) [32].
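A minimal sketch of the Bayes factor of Eq. (11.9): the likelihood of the validation data under the model averages the Gaussian of Eq. (11.11) over posterior prediction samples, while the alternate hypothesis G′ is taken, as an illustrative assumption (the same one used later for Fig. 11.4), to be a uniform density over a stated range.

```python
import math

def bayes_factor(validation_data, y_samples, sigma_obs, y_range):
    """Bayes factor B = P(D^V|G)/P(D^V|G') and the posterior probability
    B/(B+1) that the model is valid. y_samples are posterior samples of the
    (corrected) model prediction; y_range defines the uniform alternate."""
    def gauss(d, y):  # Eq. (11.11)
        return (math.exp(-0.5 * ((d - y) / sigma_obs) ** 2)
                / (sigma_obs * math.sqrt(2.0 * math.pi)))

    p_G = 1.0
    for d in validation_data:        # average over prediction uncertainty
        p_G *= sum(gauss(d, y) for y in y_samples) / len(y_samples)
    lo, hi = y_range                 # P(D^V | G'): uniform density
    p_G0 = (1.0 / (hi - lo)) ** len(validation_data)
    B = p_G / p_G0
    return B, B / (B + 1.0)
```

Validation data close to the calibrated predictions yields B ≫ 1 and a posterior model probability near 1.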
The probability in Eq. (11.12) is used as a metric to measure model validity; thus
this metric is named the model reliability metric. If the prediction y and the
tolerance λ are deterministic, Eq. (11.13) computes the model reliability, where ε
is a dummy variable for integration:

$$P(G \mid D) = \int_{-\lambda}^{\lambda} \frac{1}{\sigma_m \sqrt{2\pi}}\, \exp\!\left(-\frac{\bigl(\varepsilon - (y - D)\bigr)^2}{2\sigma_m^2}\right) d\varepsilon \tag{11.13}$$
Note here that the model prediction y refers to the computational model output
y = G(x; θ); this prediction may or may not include a model discrepancy term
explicitly, depending on the situation. The discussion below is valid in both cases,
and the numerical examples in this chapter illustrate both types of situations.
Although the model input x is known, the model prediction y is still stochastic due
to the uncertainty in θ. In this case, the model reliability is

$$P(G \mid D) = \int P(G \mid \theta, D)\, f''(\theta)\, d\theta \tag{11.14}$$

where P(G|θ, D) is given by the right-hand side of Eq. (11.13) and f″(θ) is the joint
posterior distribution of θ.
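A sketch of Eqs. (11.13)–(11.14): the Gaussian integral over the tolerance interval (−λ, λ) is written in terms of the standard normal CDF, and the average over f″(θ) is taken over posterior samples of the prediction y. Names and values are illustrative.

```python
import math

def model_reliability(D, y_samples, sigma_m, lam):
    """Probability that the prediction is within the tolerance lam of the
    observation D (Eq. (11.13)), averaged over posterior prediction samples
    y_samples (Eq. (11.14)). sigma_m is the measurement-error std. dev."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
    total = 0.0
    for y in y_samples:
        # integral of N(eps; y - D, sigma_m^2) over (-lam, lam)
        total += (Phi((lam - (y - D)) / sigma_m)
                  - Phi((-lam - (y - D)) / sigma_m))
    return total / len(y_samples)
```

For a perfect prediction (y = D) and λ = σ_m, this returns 2Φ(1) − 1 ≈ 0.683, and tends to 1 as the tolerance grows.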
Equations (11.13) and (11.14) are only suitable for a single observed value D for
a scalar output quantity. The concept of the model reliability metric can be extended
to deal with multiple data points and multivariate output, as follows.
Multivariate Output
In the case of multivariate output, if K output quantities are observed in a validation
experiment, we have a set of K models sharing the same model input and model
parameters:

$$\mathbf{y} = \mathbf{G}(x; \theta) \;\Longleftrightarrow\; \begin{cases} y_1 = G_1(x; \theta) \\ y_2 = G_2(x; \theta) \\ \quad\vdots \\ y_K = G_K(x; \theta) \end{cases} \tag{11.15}$$
a data set D_i = {D_{i1}, ..., D_{ij}, ..., D_{iK}}^T. In addition, the predefined tolerance for
each quantity is included in a vector λ = {λ_1, ..., λ_j, ..., λ_K}^T.
The distance between Z and D_i can be measured by multiple distance
functions such as the Euclidean distance, Chebyshev distance, Manhattan
distance, and Minkowski distance [90]. This chapter uses the Mahalanobis
distance [91]. The Mahalanobis distance between Z and D_i is defined as
M = sqrt((Z − D_i)^T Σ_Z^{−1} (Z − D_i)), where Σ_Z is the covariance matrix of Z.
The Mahalanobis distance transfers Z and D_i into the normalized principal
component (PC) space [91] by using Σ_Z^{−1}. Compared to other distance functions,
the Mahalanobis distance brings two advantages: (1) the correlations between
output quantities are considered, and (2) the output quantities are normalized to
the same scale to prevent any quantity from dominating the metric simply due to
large numerical values. Using the Mahalanobis distance, the model reliability for
multivariate output is defined as
$$P(G \mid D_i) = P(M < \lambda_M) = P\!\left(\sqrt{(Z - D_i)^{\mathsf T} \Sigma_Z^{-1} (Z - D_i)} < \sqrt{\lambda^{\mathsf T} \Sigma_Z^{-1} \lambda}\right) \tag{11.16}$$

where λ_M = sqrt(λ^T Σ_Z^{−1} λ) is the normalized tolerance.
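A sampling-based sketch of Eq. (11.16): prediction samples stand in for the distribution of Z, the covariance matrix Σ_Z is estimated from those samples, and the model reliability is the fraction of samples whose Mahalanobis distance to the data vector falls below the normalized tolerance. Function and argument names are illustrative.

```python
import numpy as np

def multivariate_reliability(z_samples, D_i, lam):
    """Extended model reliability for multivariate output: P(M < lam_M),
    with M the Mahalanobis distance between prediction samples and the data
    vector D_i, and lam_M the normalized tolerance of Eq. (11.16)."""
    Z = np.asarray(z_samples, dtype=float)          # (n_samples, K)
    Sinv = np.linalg.inv(np.cov(Z, rowvar=False))   # Sigma_Z^{-1}
    diff = Z - np.asarray(D_i, dtype=float)
    M = np.sqrt(np.einsum('ij,jk,ik->i', diff, Sinv, diff))  # per-sample M
    lam = np.asarray(lam, dtype=float)
    lam_M = np.sqrt(lam @ Sinv @ lam)               # normalized tolerance
    return float(np.mean(M < lam_M))
```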
Generally the posterior distributions obtained in model calibration are numerical
samples generated by MCMC, so the model reliability in Eqs. (11.13)
and (11.14) is also computed numerically. Numerical computation also facilitates
the evaluation of the extended model reliability in Eq. (11.16). Here the model
reliability is expressed as

$$P(G \mid D_i) = P(M < \lambda_M \mid D_i) = \int_0^{\lambda_M} f(M \mid D_i)\, dM = \int\!\!\int_0^{\lambda_M} f(M \mid D_i, \theta)\, f''(\theta)\, dM\, d\theta \tag{11.17}$$
The calibration procedure in Sect. 2.2 assumed that the model form G is valid and
estimated the model parameters θ. In contrast, the validation procedure in Sect. 2.3.1
calculated the probability that the model G is valid by accounting for the uncertainty
in the model parameters θ. The two results can be combined to calculate the overall
uncertainty in the model prediction, using the theorem of total probability, as

$$f_Y(y \mid D^C, D^V) = P(G \mid D^V)\, f_Y(y \mid G, D^C) + P(G' \mid D^V)\, f_Y(y \mid G') \tag{11.18}$$

In Eq. (11.18), P(G|D^V) can also be calculated using the model reliability metric
instead of Bayesian hypothesis testing. Note that the result of verification, i.e., the solu-
tion approximation error, was already included in both calibration and validation.
Thus, the PDF f_Y(y|D^C, D^V) includes the results of the verification, calibration, and
validation activities.
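The mixture of Eq. (11.18) can be sampled directly; the uniform alternate-hypothesis density below is an illustrative assumption (the same one used in the example of Sect. 3), not part of the equation itself.

```python
import random

def unconditional_prediction(p_valid, samples_G, y_range, n=10000, seed=0):
    """Draw n samples from the unconditional PDF of Eq. (11.18): with
    probability p_valid = P(G|D^V) draw from the calibrated prediction
    samples f_Y(y|G, D^C); otherwise draw from a uniform density over
    y_range, standing in for f_Y(y|G')."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        if rng.random() < p_valid:
            out.append(rng.choice(samples_G))   # model-valid branch
        else:
            out.append(rng.uniform(*y_range))   # model-invalid branch
    return out
```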
Consider steady-state heat transfer in a thin wire of length L, with thermal
conductivity k and convective heat transfer coefficient h. Assume that the heat source is
Q(x) = 25(2x − L)², where x is measured along the length of the wire. For the
sake of illustration, it is assumed that this problem is essentially one dimensional
and that the solution can be obtained from the following boundary value problem
[32]:

$$-k\,\frac{\partial^2 T}{\partial x^2} + h\,T = Q(x), \qquad T(0) = T_0, \quad T(L) = T_L \tag{11.19}$$
First, the differential equation in Eq. (11.19) is solved using a finite difference code.
Three different discretization sizes are considered, and Richardson extrapolation
[42] is used to calculate the solution approximation error, which is used to correct
the model prediction every time this differential equation is solved. Since there
are four uncertain quantities (T_0, T_L, k, and the model parameter being updated,
i.e., h), it is necessary to quantify the solution approximation error as a function
of these four quantities. Every time the model prediction Y needs to be computed
as a function of these four quantities, a mesh refinement study is conducted, i.e.,
three different mesh sizes are considered (with sizes 0.01, 0.005, and 0.0025),
and Eq. (11.1) is used to compute the solution approximation error and, hence, the
corrected prediction.
It may be noted that, for this particular numerical example, linear diffusion
dominates and, therefore, the solution approximation error is not very sensitive
to k. However, this behavior is not explicitly considered during verification, for
two reasons. First, it may be recalled that, in order to integrate the
results of verification into calibration and validation, the solution approximation
errors are computed and corrected whenever needed while performing calibration
(update using data D^C = {22; 23; 25; 26.1; 25.4}, in °C) and validation. In other
words, during calibration, samples of the aforementioned four uncertain quantities
are generated through slice sampling; for each generated sample, the solution
approximation error is quantified through mesh refinement. Since it is necessary
to repeat this procedure for every generated sample, it does not matter whether the
nominal value of k or the actually sampled value of k is used, from a computational
point of view. The second reason is that the underlying physics behavior and the
choice of numerical values are not exploited during the implementation of the
methodology. The code used to solve the differential equation in Eq. (11.19) is
treated as if it were a black box model, in an effort to keep the illustration as
general as possible. In specific problems, special features may be exploited to reduce
computational effort.
The prior (f(h)) and posterior (f(h|G, D^C)) PDFs of h are shown in
Fig. 11.3. Additional validation data (D^V = {24; 24.5; 24.6; 23.8}, in °C) is used
to compute the probability that the temperature prediction model is correct, i.e.,
P(G|D^V) = 0.84.
The method developed in Sect. 2.4 is used to calculate the unconditional PDF of
temperature using the principle of total probability, as shown in Fig. 11.4. This
PDF integrates the results of verification, validation, and calibration to compute the
overall uncertainty in the temperature at the midpoint of the wire.
Figure 11.4 indicates three PDFs: (i) f_Y(y|G, D^C) denotes the model prediction,
(ii) f_Y(y|G′) denotes the prediction under the alternate hypothesis (assumed
uniform; due to sampling errors and the use of kernel density estimation for plotting,
the PDF is not perfectly horizontal in Fig. 11.4), and (iii) f_Y(y|D^C, D^V), which
represents the PDF that integrates the validation result with the previous calibration
and verification activities. The third PDF is referred to as the unconditional PDF
of the temperature response, since it is not conditioned on the model form.
Consider a system that is studied using multiple levels of models with Type-
I interaction, i.e., the output of a lower-level model becomes an input to the
higher-level model and hence is the linking variable between the two models.
While the methods of verification, validation, and calibration can be applied to
each of the individual models, the challenge is to integrate the results from these
activities performed at multiple levels. This section discusses a methodology for
the integration of verification, validation, and calibration across multiple levels of
modeling with Type-I interaction. The methodology in this section is illustrated for
two levels of models, as shown in Fig. 11.5 and Eq. (11.20), but can be extended to
any number of levels of modeling without loss of generality.
$$Y = G(X; \theta), \qquad Z = H(Y, W; \phi) \tag{11.20}$$
Assume that data are not available at the system level, i.e., it is not possible to
validate/calibrate the model H. Let D^C and D^V denote the data available on Y for
calibration (of θ) and validation (of G), respectively. Let ε_obs ∼ N(0, σ_obs²) denote
the measurement error in the data.
The first step is to connect the various sources of uncertainty using a Bayesian
network, as shown in Fig. 11.6.
This Bayesian network indicates that two sets of data are available for calibration
and validation; the Bayesian methods for calibration and validation can be applied
to these sets. (If the KOH framework is pursued for calibration, then both the
parameters θ and the model inadequacy function δ(x) can be included in the Bayesian
network.) While Bayesian updating is used for model calibration, either Bayesian
hypothesis testing or the model reliability metric can be pursued for model
validation.
The task is to compute the overall uncertainty in Z by using lower-level data;
this uncertainty must include the effect of verification, calibration, and validation
activities.
Both models G and H can be verified, since experimental data are not required
for verification. During the process of verification, the solution approximation
error (ε_soln) is quantified for both G and H. Note that the solution
approximation error is a function of the inputs and the model parameters, and
that these solution approximation errors (ε_soln for both G and H) account for the
combined effect of both deterministic and stochastic errors, as discussed in Sect. 2.1.
The Bayesian network can be used for forward propagation of uncertainty using the
principles of conditional probability and total probability. Prior to the collection of
any data, the uncertainty in x, w, θ, and φ can be propagated through the models as
$$f_Z(Z \mid H) = \int f_Z(Z \mid w, \phi, y, H)\, f_W(w)\, f(\phi)\, f_Y(y \mid G)\, dw\, d\phi\, dy$$
$$f_Y(y \mid G) = \int f_Y(y \mid x, \theta, G)\, f_X(x)\, f(\theta)\, dx\, d\theta \tag{11.21}$$
However, this procedure assumes that (1) the PDFs of the parameters θ and φ are
f(θ) and f(φ) and (2) the models G and H are correct. These two issues were
addressed in calibration and validation, respectively. While the PDF of φ did not
change, the PDF f(θ) was updated to f(θ|G, D^C). Further, the probability
that G is correct, i.e., P(G|D^V), was evaluated. These two quantities can now be
used to calculate the overall uncertainty in Z. First, if the calibration data alone
were used, then the PDFs of Y and Z would be given by
$$f_Z(Z \mid G, H, D^C) = \int f_Z(Z \mid w, \phi, y, H)\, f_W(w)\, f(\phi)\, f_Y(y \mid G, D^C)\, dw\, d\phi\, dy$$
$$f_Y(y \mid G, D^C) = \int f_Y(y \mid x, \theta, G)\, f_X(x)\, f(\theta \mid G, D^C)\, dx\, d\theta \tag{11.22}$$
The theorem of total probability can then be used to include the result of validation.
The PDF of Y is modified as

$$f_Y(y \mid D^C, D^V) = P(G \mid D^V)\, f_Y(y \mid G, D^C) + P(G' \mid D^V)\, f_Y(y \mid G') \tag{11.23}$$
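The propagation of Eqs. (11.22)–(11.23) through the upper-level model can be written as a Monte Carlo loop; the standard-normal input distributions and the uniform alternate density below are illustrative stand-ins, and G and H are taken as scalar functions.

```python
import random

def propagate_two_level(G, H, p_G_valid, theta_post, y_alt_range,
                        n=20000, seed=2):
    """Sample the system output Z = H(Y, W): Y is drawn from its
    unconditional PDF (calibrated model G with probability P(G|D^V),
    uniform alternate otherwise, Eq. (11.23)), then pushed through H."""
    rng = random.Random(seed)
    z = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                  # lower-level input f_X
        w = rng.gauss(0.0, 1.0)                  # upper-level input f_W
        if rng.random() < p_G_valid:
            theta = rng.choice(theta_post)       # posterior f(theta|G, D^C)
            y = G(x, theta)
        else:
            y = rng.uniform(*y_alt_range)        # alternate density f_Y(y|G')
        z.append(H(y, w))
    return z
```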
Until now, only the first model G was considered for verification, validation,
and calibration. However, the methodology is general and can be extended to
multiple models. For example, consider the case where there are two models whose
individual outputs become inputs for the system model, described by the
equations:
$$Y_1 = G_1(X_1; \theta_1), \qquad Y_2 = G_2(X_2; \theta_2), \qquad Z = H(Y_1, Y_2) \tag{11.25}$$
The inputs to the models G_1 and G_2 are X_1 and X_2, respectively; the corresponding
parameters are θ_1 and θ_2, respectively. The Bayesian network for this multilevel
system is shown in Fig. 11.7.
Assume that there are no data at the system level Z, but data are available
for calibration and validation of the lower-level models G_1 and G_2, as shown in the
Bayesian network in Fig. 11.7. Using the calibration data, the PDFs f(θ_1|G_1, D_1^C),
f(θ_2|G_2, D_2^C), f(y_1|G_1, D_1^C), and f(y_2|G_2, D_2^C) are calculated. Using the
validation data, the probabilities P(G_1|D_1^V) and P(G_2|D_2^V) are calculated; further,
P(G_1′|D_1^V) = 1 − P(G_1|D_1^V) and P(G_2′|D_2^V) = 1 − P(G_2|D_2^V). As explained
earlier, the probabilities that G_1 and G_2 are correct can also be calculated using the
reliability-based metric.
The unconditional PDF of Z needs to be calculated by considering four
quantities:

1. P(G_1 ∩ G_2 | D_1^V, D_2^V) = P(G_1|D_1^V) P(G_2|D_2^V)
2. P(G_1 ∩ G_2′ | D_1^V, D_2^V) = P(G_1|D_1^V) P(G_2′|D_2^V)
3. P(G_1′ ∩ G_2 | D_1^V, D_2^V) = P(G_1′|D_1^V) P(G_2|D_2^V)
4. P(G_1′ ∩ G_2′ | D_1^V, D_2^V) = P(G_1′|D_1^V) P(G_2′|D_2^V)
Note the assumption that the two models G_1 and G_2 are independent. If the
dependence is known, then it can be included in the calculation of the joint
probabilities. Then, the unconditional PDF of Z is written as

$$\begin{aligned} f_Z(Z \mid D_1^C, D_1^V, D_2^C, D_2^V, H) &= P(G_1 \mid D_1^V)\, P(G_2 \mid D_2^V)\, f_Z(Z \mid G_1, G_2, H) \\ &+ P(G_1' \mid D_1^V)\, P(G_2 \mid D_2^V)\, f_Z(Z \mid G_1', G_2, H) \\ &+ P(G_1 \mid D_1^V)\, P(G_2' \mid D_2^V)\, f_Z(Z \mid G_1, G_2', H) \\ &+ P(G_1' \mid D_1^V)\, P(G_2' \mid D_2^V)\, f_Z(Z \mid G_1', G_2', H) \end{aligned} \tag{11.26}$$
When there are multiple lower-level models (G_1, G_2, G_3, and so on), the number of terms on the right-hand side
of Eq. (11.26) increases exponentially, since the number of terms is equal
to 2^{n_m}, where n_m is the number of models. Each of these terms indicates which
subset of the models is correct. For example, in the case of three models, the event
G_1 ∩ G_2′ ∩ G_3′ indicates that the model G_1 is correct, whereas the models G_2 and G_3
are not. Similarly, the event G_1′ ∩ G_2 ∩ G_3′ indicates that the model G_2 is correct,
whereas the models G_1 and G_3 are not. Though Eq. (11.26) clearly distinguishes
between all such possibilities, the computation of the right-hand side of this equation
may be cumbersome due to the exponential number of terms. Such computational
complexity can be easily avoided by first computing the unconditional PDFs of
the lower-level outputs (f(y_1|D_1^C, D_1^V) and f(y_2|D_2^C, D_2^V) in this case), similar
to Eq. (11.23), and then propagating these PDFs through the model H. Both
approaches yield the same resultant PDF of Z, which accounts for the results
of the verification, validation, and calibration activities in both models G_1 and G_2.
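For independent models, the 2^{n_m} joint-validity probabilities that weight Eq. (11.26) can be enumerated directly; the model names below are placeholders.

```python
from itertools import product

def joint_model_probabilities(p_valid):
    """Enumerate the 2^{n_m} joint validity states for independent models.
    p_valid maps a model name to P(G_i|D_i^V); the returned dict maps each
    tuple of validity states (True = model correct) to its probability."""
    table = {}
    for states in product([True, False], repeat=len(p_valid)):
        pr = 1.0
        for (name, p), valid in zip(p_valid.items(), states):
            pr *= p if valid else 1.0 - p     # independence assumption
        table[states] = pr
    return table
```

The probabilities sum to one, and each entry weights the corresponding conditional PDF f_Z(Z | ..., H) in the roll-up.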
This section illustrates two models, representing thermal and electrical analyses,
with Type-I interaction. This example is an extension of the heat conduction
problem in Sect. 3; the temperature rise in the wire causes a change in the electrical
resistance. The goal is to predict the system response, which is the electric current
in the wire. Hence, the output of the lower-level model (the temperature predictor in
Eq. (11.19)), i.e., temperature, becomes an input to the higher-level model (the current
predictor), as shown in Fig. 11.8.
Consider the same wire as in Sect. 3. Before application of the heat, the resistance
of the wire is given in terms of the resistivity (ρ), the cross-section area (A), and the
length (L) as

$$R_{\mathrm{old}} = \frac{\rho L}{A} \tag{11.27}$$

After steady state is reached, the midpoint temperature (Y) computed in Eq. (11.19)
causes an increase in the resistance of the wire; this increase is evaluated using the
temperature coefficient of resistivity (α). The current through the wire when a 10 V voltage is
applied is calculated as

$$I = \frac{10}{R_{\mathrm{old}}\,(1 + \alpha Y)} \tag{11.28}$$
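Eqs. (11.27)–(11.28) form a short deterministic map from the midpoint temperature Y to the current I, through which samples of the unconditional PDF of Y can simply be pushed:

```python
def wire_current(rho, L, A, alpha, Y, V=10.0):
    """Current through the heated wire, Eqs. (11.27)-(11.28): the
    temperature rise Y scales the cold resistance by (1 + alpha * Y)."""
    R_old = rho * L / A                        # Eq. (11.27)
    return V / (R_old * (1.0 + alpha * Y))     # Eq. (11.28)
```

Propagating a list of temperature samples, e.g. `[wire_current(rho, L, A, alpha, y) for y in y_samples]`, yields samples of the current PDF of Fig. 11.10.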
Assume that there is no electrical performance test data for the wire, and it is
required to predict the uncertainty in the electrical current by including the results
of verification (using Richardson extrapolation), validation, and calibration in the
lower-level model. The two models and the associated sources of uncertainty are
connected through a Bayesian network, as shown in Fig. 11.9. The four uncertain
quantities (T_0, T_L, k, h) are used to predict the midpoint temperature (Y), which
is then used to compute the resistance (R) as a function of the higher-level model
parameters (A, ρ, L, α). Notice that the lower-level solution approximation error
is actually a function of the aforementioned four uncertain quantities (T_0, T_L, k, h).
Data (D^C, D^V) are available for comparison against Y, and ε_obs is the measurement
error associated with such data.
Since the thermal model used for temperature prediction has already been
verified, calibrated, and validated, the unconditional PDF of the temperature is
simply propagated through the current-predictor model to calculate the current in
the wire. For the purpose of illustration, and to see the effect of the uncertainty in Y on
the uncertainty in the electrical current (I), the other parameters of the current-predictor
model (ρ, A, α) are chosen to be deterministic. The PDF of the current in the wire
is shown in Fig. 11.10, for three cases.
$$Z = H(\theta, X; \xi) \tag{11.29}$$

$$Y_1 = G_1(\theta, X_1; \xi_1), \qquad Y_2 = G_2(\theta, X_2; \xi_2) \tag{11.30}$$

Here θ denotes the model parameters shared across the levels, and ξ, ξ_1, and ξ_2
denote the level-specific auxiliary variables.
Assume that separate sets of data are available for calibration (D_1^C and D_2^C for levels
1 and 2, respectively) and validation (D_1^V and D_2^V for levels 1 and 2, respectively).
Full system testing is not possible, i.e., no test data are available at the system level
(Z), and it is required to quantify the uncertainty in the system-level prediction
using the data at the lower levels (Y_1 and Y_2). The inputs, model parameters,
outputs, and data at all levels are connected through a Bayesian network, as shown
in Fig. 11.11.
The steps of verification, calibration, and validation in each model are similar to
the previous sections; however the procedure for integration of these activities is
different.
If θ is estimated using each individual model (G_1 or G_2) and the corresponding
calibration data (D_1^C or D_2^C), then the corresponding PDFs of the model parameters
are f(θ|D_1^C, G_1) or f(θ|D_2^C, G_2), respectively.
The Bayesian network facilitates the simultaneous use of both models and
the corresponding data to calibrate θ and obtain the PDF f(θ|D_1^C, D_2^C, G_1, G_2).
This step of simultaneous calibration using multiple data sets from experiments
of differing complexity is different from the calibration considered in Sects. 2
and 4, where only one model and the corresponding calibration data were used to
estimate θ. In order to integrate the results of verification, validation, and calibration
in Sect. 6.3 below, all of these PDFs, i.e., those calibrated using individual data sets
(f(θ|D_1^C, G_1) and f(θ|D_2^C, G_2)) as well as that calibrated using multiple data
sets (f(θ|D_1^C, D_2^C, G_1, G_2)), are necessary.
The use of validation data is identical to the procedure in Sects. 2 and 4.
The quantities P(G_1|D_1^V) and P(G_2|D_2^V) are calculated using the Bayes factor
metric; further, P(G_1′|D_1^V) = 1 − P(G_1|D_1^V) and P(G_2′|D_2^V) = 1 − P(G_2|D_2^V).
Alternatively, the reliability-based method can also be used to calculate these
probabilities. Since the two models are assumed independent, P(G_1 ∩ G_2|D_1^V, D_2^V) =
P(G_1|D_1^V) P(G_2|D_2^V).
Section 1.2 mentioned the necessity of assigning a larger weight to the level that is physically
closer or more relevant to the system level in the case of Type-II
interaction. Hence this section develops a method for relevance analysis, which
measures the degree to which the experimental configuration at a lower level has
physical characteristics similar to those of the system of interest. Currently such a measure
is only intuitive and qualitative; an objective, quantitative measure of relevance is
needed for uncertainty integration.
The methodology to measure relevance should have two desired features. First,
the methodology should need no mathematical details of the model at each level,
since the model at each level could be a black box. Second, the resultant relevance
comparison is not easy if the models at different levels have distinct formats and are
addressing different physical configurations; further, a model may be
a black box, so its mathematical details cannot be accessed and a direct comparison
would be difficult.
would be difficult. The obtained sensitivity vectors quantify the contribution of each
model input/parameter toward the uncertainty in the model output. In other words,
the sensitivity vector indicates which model input/parameter is more important
in affecting the model output uncertainty. Whether a model input/parameter is
important is determined by the physics of the model, thus the sensitivity vector
represents the physics to the extent that the model represents the physics accurately.
Therefore, this chapter considers the sensitivity vector as an indicator of the physics
captured in the model. (Of course, how well the physics is captured in the model
is already indicated by the model reliability metric.) Thus the comparison of the
vectors from two different levels is used to quantify the relevance between these
two levels.
One issue in the comparison of V_Li (i = 1, 2) and V_s is that they may have
different sizes (N_L1, N_L2, and N_s may not be equal to each other) and some elements
in one vector may not be present in the other vector. The shared dimensions
of V_Li and V_s are the model parameters θ, and the unshared dimensions are the
different model inputs and auxiliary variables at each level. To solve this problem,
we add the unshared dimensions in V_Li or V_s to the other vectors but set the
corresponding sensitivity indices to zero, since the added dimensions have no effect
on the computation of the original sensitivity vector. Thus all the vectors V_Li and V_s
are brought to the same size.
Several methods are available to compare two vectors, such as Euclidean distance
[90], Manhattan distance [90], Chebyshev distance [90], and cosine similarity [90,
96]. To include the relevance in the subsequent uncertainty integration conveniently,
we define the relevance index R as the square of cosine similarity of the sensitivity
vectors, where the cosine similarity is the standardized dot product of two vectors:
$$R = \left(\frac{V_{Li} \cdot V_s}{\lVert V_{Li} \rVert\, \lVert V_s \rVert}\right)^{2} \tag{11.31}$$
In other words, the above relevance index is the square of the cosine value of the
angle between two sensitivity vectors, the elements in which are all positive. If
the angle is zero, the relevance among the two levels is 1; if the two vectors are
perpendicular, the relevance is 0.
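Eq. (11.31) in code, assuming the sensitivity vectors have already been padded to a common size as described above:

```python
import math

def relevance_index(v_low, v_sys):
    """Relevance index R of Eq. (11.31): the squared cosine similarity
    between a lower-level sensitivity vector and the system-level one
    (both padded with zeros on unshared dimensions)."""
    dot = sum(a * b for a, b in zip(v_low, v_sys))
    norm = (math.sqrt(sum(a * a for a in v_low))
            * math.sqrt(sum(b * b for b in v_sys)))
    return (dot / norm) ** 2   # 1 for parallel vectors, 0 for perpendicular
```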
In addition, this definition of relevance generates a value on the interval [0, 1],
and its complement, the square of the sine value, indicates physical non-relevance;
hence the sum of relevance and non-relevance is unity. Here the relevance
index is a plausibility model for the proposition "the lower-level model reflects the
physical characteristics of the system level," and the plausibility of this proposition
is the relevance index. Based on Cox's theorem [97], this plausibility model is
isomorphic to probability, since (1) the relevance index is a real value depending
on the information in the sensitivity vectors we obtained and (2) the relevance index
changes sensibly as the sensitivity vectors change. Thus the relevance index can
be converted to probability by scaling; this has already been done, since the relevance
index defined in Eq. (11.31) is already on the interval [0, 1]. Therefore, in the
roll-up methodology of Sect. 6.3, we treat the relevance index as a probability and
conveniently include it as a weighting term in the uncertainty integration.
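One way to realize a relevance-weighted roll-up is to draw each parameter sample from a lower-level posterior chosen with probability proportional to its weight (for instance, model reliability times relevance index). This simplified mixture is an illustration of the weighting idea, not the chapter's exact roll-up formula; all names are placeholders.

```python
import random

def integrated_parameter_samples(posteriors, weights, n=10000, seed=3):
    """Draw n samples of the system-level parameter from a weighted mixture
    of lower-level posterior sample sets. `posteriors` maps a level name to
    its posterior samples; `weights` maps the same names to nonnegative
    weights (e.g., model reliability times relevance index)."""
    rng = random.Random(seed)
    names = list(posteriors)
    probs = [weights[k] for k in names]
    total = sum(probs)
    out = []
    for _ in range(n):
        u = rng.random() * total          # pick a level proportionally
        acc = 0.0
        for name, p in zip(names, probs):
            acc += p
            if u <= acc:
                out.append(rng.choice(posteriors[name]))
                break
    return out
```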
A further question arises in the computation of the relevance index. Sobol indices
consider the entire distribution of the influencing variable, but the posterior distribu-
tion of θ (to be used in system-level prediction) is unknown before the uncertainty
integration. To solve this problem, a straightforward iterative algorithm can be used
to compute the relevance index R.
Thus, the results of calibration and validation at each lower level and the relevance
indices between the lower levels and the system level have been obtained. The next
task is to construct the integrated distribution of the system-level model parameters
and predict the system output.
The method for integrating the verification, calibration, validation, and
relevance analysis results for Type-II interaction differs from that of Sect. 4 because
the linking variables in this case are the model parameters, whereas the linking
variables in Type-I interaction were the outputs of lower-level models. While the
unconditional PDF of the lower-level output was calculated in Sect. 4, it is now
necessary to calculate the integrated distribution of the model parameters that also
accounts for validation and relevance results.
For a multilevel problem of Type-II interaction, the purpose of uncertainty
integration is to combine all the available information (from calibration, validation,
and relevance analysis) from the lower levels and predict the response at the
system level. In this chapter, the information from the lower levels includes (1) the
posterior distributions from model calibration considering data at each individual
lower level, as well as data from multiple lower levels, (2) the model reliability
distributions from model validation at each lower level, and (3) the relevance indices
between each lower level and the system level. A roll-up methodology has been
proposed in [16, 98] for uncertainty integration. This methodology computes the
unconditional PDF of the model parameters as a weighted average of the individual posterior
distributions, where the weight of each posterior is determined purely by model
validation and the model validity is a fixed value. This section extends this method
to include two additional concepts:
11 Multilevel Uncertainty Integration 463
Take the multilevel problem in Fig. 11.1b as an example. The integrated distribution
of model parameters conditioned on the calibration and validation data and model
reliability P(G_i) (i = 1, 2) is [99]:

f(\theta \mid D_1^C, D_1^V, D_2^C, D_2^V, P(G_1), P(G_2))
  = P(G_1 S_1 G_2 S_2) \, f(\theta \mid D_1^C, D_2^C)
  + P\big(G_1 S_1 \cap (G_2' \cup S_2')\big) \, f(\theta \mid D_1^C)
  + P\big(G_2 S_2 \cap (G_1' \cup S_1')\big) \, f(\theta \mid D_2^C)
  + P\big((G_1' \cup S_1') \cap (G_2' \cup S_2')\big) \, f(\theta) \qquad (11.32)
From the viewpoint of generating samples, Eq. (11.32) indicates two criteria: (1) whether
a level is relevant to the system level and (2) whether a level has a valid model.
A sample of θ is generated from f(θ | D_1^C, D_2^C) only when both levels satisfy
both criteria; a sample of θ is generated from f(θ | D_i^C) if level i satisfies both
criteria but the other level does not; and a sample of θ is generated from the prior
distribution f(θ) if neither level satisfies both criteria. By assuming independence
of model validity and relevance between different lower levels, the weight terms
in Eq. (11.32) are computed using the values of P(G_i), P(S_i | G_i) and two
fundamental probability relationships: P(G_i S_i) = P(G_i) P(S_i | G_i) and P(G_i' ∪ S_i') =
1 − P(G_i S_i). Eq. (11.32) also implies the option of using only data from one level.
If both the model validity and relevance are 1 for Level 1 and either model validity
or relevance is 0 for Level 2, Eq. (11.32) reduces to f(θ | D_1^{C,V}, D_2^{C,V}) = f(θ | D_1^C),
i.e., only Level 1 data is used.
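The two-criteria sampling view of Eq. (11.32) can be sketched as follows. The function and argument names are hypothetical, and the posterior and prior samplers are passed in as callables, since the chapter obtains them only numerically:

```python
import numpy as np

def sample_theta(p_g, p_s_given_g, post_both, post_single, prior, rng):
    """Draw one sample of theta per the four-branch mixture of Eq. (11.32).

    p_g[i]         : model validity P(G_i) for level i (i = 0, 1)
    p_s_given_g[i] : relevance P(S_i | G_i)
    post_both()    : draw from f(theta | D1^C, D2^C)
    post_single[i] : draw from f(theta | Di^C)
    prior()        : draw from f(theta)
    """
    # A level "passes" when its model is both valid and relevant,
    # which happens with probability P(G_i S_i) = P(G_i) P(S_i | G_i).
    passes = [rng.random() < p_g[i] * p_s_given_g[i] for i in (0, 1)]
    if passes[0] and passes[1]:
        return post_both()
    if passes[0]:
        return post_single[0]()
    if passes[1]:
        return post_single[1]()
    return prior()

# With certain validity and relevance at both levels, the joint posterior is always used.
rng = np.random.default_rng(0)
theta = sample_theta([1.0, 1.0], [1.0, 1.0], lambda: "both",
                     [lambda: "L1", lambda: "L2"], lambda: "prior", rng)
```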
The integrated distribution of θ, which is conditioned on both calibration and
validation data, can now be computed as

f(\theta \mid D_1^C, D_1^V, D_2^C, D_2^V)
  = \iint f(\theta \mid D_1^C, D_1^V, D_2^C, D_2^V, P(G_1), P(G_2)) \, f(P(G_1)) \, f(P(G_2)) \, dP(G_1) \, dP(G_2) \qquad (11.33)
Eqs. (11.32) and (11.33) express the approach to integrate calibration, validation,
and relevance results at lower levels. Note that Eq. (11.33) accounts for stochastic
model reliability. The analytical expression of f(θ | D_1^C, D_1^V, D_2^C, D_2^V) is difficult
to derive since the results we collect in model calibration and validation
are all numerical. A single-loop sampling approach is given here to construct
f(θ | D_1^C, D_1^V, D_2^C, D_2^V) numerically, as follows:
(In general, if there are n_m models, then the right-hand side of Eq. (11.32) has
2^{n_m} terms. Each of these terms, except the one corresponding to the case where
none of the models is correct, needs to be computed through a Bayesian inference
procedure. Hence, this may be a computational challenge in larger applications. In
such scenarios, the use of an inexpensive surrogate model is recommended for the
repeated Bayesian inference calculations, as performed in this chapter.)
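A minimal sketch of the single-loop idea: an outer draw of the stochastic model reliabilities P(G_i) per Eq. (11.33), then the four-branch mixture of Eq. (11.32). The beta reliability distributions, relevance values, and normal stand-ins for the calibrated posteriors and prior are all illustrative assumptions, not the chapter's actual results:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ingredients (placeholders, not the chapter's numbers):
rel_dist = [(8.0, 2.0), (6.0, 4.0)]        # beta(a, b) for P(G1), P(G2)
p_s = [0.58, 0.90]                         # relevance indices P(S_i | G_i)
draw_post_both = lambda: rng.normal(1.0, 0.1)   # stands in for f(theta|D1C,D2C)
draw_post = [lambda: rng.normal(0.9, 0.2),      # f(theta|D1C)
             lambda: rng.normal(1.1, 0.2)]      # f(theta|D2C)
draw_prior = lambda: rng.normal(0.0, 1.0)       # f(theta)

samples = []
for _ in range(5000):
    # Outer draw: stochastic model reliability, per Eq. (11.33).
    p_g = [rng.beta(a, b) for a, b in rel_dist]
    # Inner draw: four-branch mixture, per Eq. (11.32).
    ok = [rng.random() < p_g[i] * p_s[i] for i in (0, 1)]
    if ok[0] and ok[1]:
        samples.append(draw_post_both())
    elif ok[0]:
        samples.append(draw_post[0]())
    elif ok[1]:
        samples.append(draw_post[1]())
    else:
        samples.append(draw_prior())

samples = np.array(samples)  # numerical approximation of Eq. (11.33)
```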
The PDF f(θ | D_1^C, D_1^V, D_2^C, D_2^V) calculated in Eq. (11.33) accounts for the
verification, calibration, validation, and relevance analysis activities with respect
to each of the lower-level models. This unconditional PDF is propagated through
the system model H in order to quantify the uncertainty in the system-level
response Z. This can be done by Monte Carlo sampling or other preferred
stochastic analysis methods. Thus, similar to Sect. 4, it can be seen that the tools
of conditional probability and total probability are directly useful for integrating
verification, validation, and calibration and thereby aid in the quantification of the
system-level prediction uncertainty.
Fig. 11.12 Structural dynamics challenge problem. (a) Level 1. (b) Level 2. (c) System level
The data points for each quantity from the first five tests are selected as calibration
data and those from the remaining five tests as validation data.
Computational models for the three levels have been established. The method to
solve the dynamic problem at Level 1 can be found in structural dynamics textbooks
[102], and the computational models using the finite element method for Level 2 and
the system level are provided by Sandia National Laboratories [101].
Since the model input at each level is fixed, the input-dependent model error
is an unknown deterministic value. Thus the parameters to be calibrated in this
example are the spring stiffnesses k_i (i = 1, 2, 3), the damping ratios ζ_i (i = 1, 2, 3),
the model error, and the output measurement error standard deviation σ_m if the data of
the corresponding quantity are used in model calibration. Based on expert opinion,
the prior distribution of each k_i and ζ_i is assumed to be lognormal with a
coefficient of variation of 10% and mean values of k_1 = 5000, k_2 = 9000, k_3 =
8000, and ζ_i = 0.025 (i = 1, 2, 3). The prior distribution of the model error is assumed to
be uniform, i.e., U(a, b), and the prior of σ_m is the Jeffreys prior f_0(σ_m) ∝ 1/σ_m.
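A lognormal prior specified by a mean and a coefficient of variation, as for the k_i and ζ_i priors above, maps to the parameters of the underlying normal distribution in closed form. A small sketch (the function name is ours):

```python
import numpy as np

def lognormal_params(mean, cov):
    """Return (mu, sigma) of the underlying normal for a lognormal
    distribution with the given mean and coefficient of variation (std/mean)."""
    sigma2 = np.log(1.0 + cov ** 2)          # variance of log(X)
    mu = np.log(mean) - 0.5 * sigma2         # so that E[X] equals `mean`
    return mu, np.sqrt(sigma2)

# Prior for k1 in the example: mean 5000, coefficient of variation 10%.
mu, sigma = lognormal_params(5000.0, 0.10)
```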
In order to reduce the computational effort, Gaussian process (GP) surrogate models
are established to replace the computational models for all the output quantities. The
surrogate model uncertainty introduced by the GP models is incorporated in model
calibration and validation, as explained in Sects. 2 and 3.1. The calibration results of
k_i and ζ_i using the calibration data of the six output quantities at different levels are
shown in Fig. 11.13, including all the PDFs needed in Eq. (11.32). As more data are
used in the calibration, the uncertainty of the model parameters will decline. Thus
Fig. 11.13 shows that the posterior distributions using the data at both levels always
have less uncertainty than those using data at a single level. The difference between
the posterior distributions within each sub-figure also indicates that the posterior
distribution is a best-fitting result in the sense of representing that particular data
set, but we do not yet know how to combine these alternatives in the subsequent
prediction. This is answered by model validation and relevance analysis.
Next, model validation is performed using the stochastic model reliability metric
with multivariate output. The tolerance for each quantity is chosen to be 15% of the
validation data. Level 2 is expected to have a lower model reliability value due to two
main factors:
1. The discretization error at Level 2 due to a limited number of finite elements for
the beam (41 in this example). However, this factor is not active here, since the data
at Level 2 are synthetic data generated using the computational model, meaning
that the difference between the computational model and the physics model is
ignored. This factor would come into play if experimental data instead of synthetic
data were used.
2. The coupling between the beam and the damped mass-spring system introduces
stronger nonlinearity at Level 2. For the same number of training points,
the GP surrogate model at Level 2 has more surrogate uncertainty (larger GP
prediction variance) than the GP surrogate model at Level 1. This factor
is included in the numerical example.
The model reliability values given by the validation data from each validation
test are listed in Table 11.1, which indicate lower model reliability at Level 2. In
Fig. 11.14, these values are used to construct the distributions of model reliability at
Level 1 and Level 2 using the method of moments. However, even though the model
at Level 1 has higher model reliability than the model at Level 2, the configuration
at Level 2 is closer to the system level of interest. Therefore relevance analysis also
needs to be considered.
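The chapter constructs the model reliability distributions by the method of moments but does not restate the distribution family here. A natural choice for a quantity on [0, 1] such as model reliability is the beta distribution, sketched below under that assumption (the sample values are hypothetical):

```python
import numpy as np

def beta_mom(values):
    """Fit a beta(a, b) distribution to data on (0, 1) by matching the
    first two sample moments (method of moments)."""
    values = np.asarray(values, dtype=float)
    m, v = values.mean(), values.var()
    if v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("moments incompatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

# Hypothetical per-test model reliability values for one level.
a, b = beta_mom([0.92, 0.88, 0.95, 0.90, 0.93])
```

By construction the fitted mean a/(a + b) reproduces the sample mean.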
Fig. 11.14 Distributions of model reliability (PDF vs. model reliability) at Level 1 and Level 2
The relevance index of each lower level to the system level is computed using
the iterative algorithm in Sect. 6.2. The initial values of the relevance indices for both
lower levels are set to 1. The algorithm converges after three iterations for Level 1
and after five iterations for Level 2. The results are P(S_1) = 0.5785 and P(S_2) =
0.8971. This means that Level 2 is more relevant to the system level, which
is consistent with our intuition, since Level 2 has the same structural configuration
as the system and differs only in the load input (sinusoidal vs. random process).
Compared with the result of model validation, Level 2 has a lower model
reliability value but a higher relevance index.
Based on all the information from calibration, validation, and relevance analyses,
the integrated distributions of all six model parameters are constructed in Fig. 11.15
using Eqs. (11.32) and (11.33). Figure 11.15 also shows the result obtained by considering
validation only (no relevance) using the previous roll-up method of Ref. [98],
with the stochastic model reliability metric still used. The new
roll-up method is more conservative than the previous one, since the generation of
samples from the posterior distribution now applies the additional criterion of
relevance.
The system output is predicted by propagating the integrated distribution of
model parameters through the computational model at the system level. Figure 11.16
gives not only the prediction using the data of all six quantities but also the
prediction by other combinations of quantities whose names are shown in the
legend. The mean values and variances of the predictions are shown in Table 11.2.
As more quantities are employed, the mean value of the prediction decreases from 712
to 656, and the variance shows an overall decreasing tendency, though not a monotonic
one (the variance increases slightly when the number of outputs considered rises from 2
to 3 and from 4 to 5).
Fig. 11.16 Predicted PDF of the maximum acceleration at mass 3 using different combinations of output quantities (A3; A3, D2; A1, A3, D2; A1, A2, A3, D2; A1, A2, A3, D1, D2; A1, A2, A3, D1, D2, D3)
8 Conclusion
This chapter has presented a methodology that integrates the results of calibration
and validation in order to compute the overall uncertainty in the model-based prediction.
The integration methodology is first developed for single-level models and then
extended to multilevel systems that consist of interacting models. Two types of
interactions are discussed in detail: (1) Type-I interaction, where the output of a
lower-level model becomes an input to the higher-level model, and (2) Type-II
interaction, where models and experiments at various levels of reduced complexity
are used to infer system model parameters. While verification is performed before
calibration and validation in both cases, in order to account for the results of
verification during calibration and validation, the procedure for the integration of
calibration and validation results at lower levels is different for Type-I and Type-II
interactions; in the former case, the key idea is to compute the unconditional PDF
of the output of the lower-level model, whereas in the latter case, the key idea is
to compute the unconditional PDF of the system model parameters. If a system-level
prediction is based on models with both types of interactions (both Type-I and
Type-II), then the unconditional PDFs of the intermediate output and the parameters
can both be used to compute the overall system-level prediction uncertainty.
The multilevel roll-up methodology described in this chapter offers considerable
promise toward the quantification of margins and uncertainties in multilevel system
prediction. While calibration and validation have previously been performed inde-
pendently at individual levels, this methodology systematically integrates all such
activities in order to compute the system-level prediction uncertainty, thereby aiding
in risk-informed decision-making with all available information. Only two types of
interactions between multiple models were considered in this chapter.
An important related issue is the extrapolation of the model to application
conditions under which experiments may not have been performed. Often, there
are two types of extrapolation. The first type is where the model is validated at
certain input values, but prediction needs to be performed at other input values
that are not contained in the validation domain. The second type of extrapolation
is where validation is performed using a simplified system (with restricted features,
physics, etc.) and the desired prediction is of the original system. While regression-
based techniques have been developed for the first type of extrapolation [27], the
second type of extrapolation could be represented through a Type-II configuration
discussed in Sects. 6 and 7. The methods discussed in this chapter can be used
for the propagation of uncertainty to the extrapolation domain only as long as there is no
change in the physics of the system between the validation and extrapolation domains.
References
1. Urbina, A., Mahadevan, S., Paez, T.L.: Quantification of margins and uncertainties of complex systems in the presence of aleatoric and epistemic uncertainty. Reliab. Eng. Syst. Saf. 96(9), 1114–1125 (2011)
2. Sankararaman, S., Ling, Y., Mahadevan, S.: Uncertainty quantification and model validation of fatigue crack growth prediction. Eng. Fract. Mech. 78(7), 1487–1504 (2011)
3. Sankararaman, S., Mahadevan, S.: Model parameter estimation with imprecise and unpaired data. Inverse Probl. Sci. Eng. 20(7), 1017–1041 (2012)
4. Sankararaman, S., Mahadevan, S.: Model validation under epistemic uncertainty. Reliab. Eng. Syst. Saf. 96(9), 1232–1241 (2011)
5. Ling, Y., Mahadevan, S.: Quantitative model validation techniques: new insights. Reliab. Eng. Syst. Saf. 111, 217–231 (2013)
6. Sankararaman, S., Mahadevan, S.: Likelihood-based representation of epistemic uncertainty due to sparse point data and/or interval data. Reliab. Eng. Syst. Saf. 96(7), 814–824 (2011)
7. Jeffreys, H.: Theory of Probability. Oxford University Press, Oxford (1998)
8. Sankararaman, S., Ling, Y., Shantz, C., Mahadevan, S.: Uncertainty quantification in fatigue crack growth prognosis. Int. J. Progn. Health Manag. 2(1), 15 (2011)
9. Sankararaman, S., Ling, Y., Shantz, C., Mahadevan, S.: Inference of equivalent initial flaw size under multiple sources of uncertainty. Int. J. Fatigue 33(2), 75–89 (2011)
10. Sankararaman, S., Ling, Y., Mahadevan, S.: Statistical inference of equivalent initial flaw size with complicated structural geometry and multi-axial variable amplitude loading. Int. J. Fatigue 32(10), 1689–1700 (2010)
11. Sankararaman, S., McLemore, K., Mahadevan, S., Bradford, S.C., Peterson, L.D.: Test resource allocation in hierarchical systems using Bayesian networks. AIAA J. 51(3), 537–550 (2013)
12. Mullins, J., Li, C., Mahadevan, S., Urbina, A.: Optimal selection of calibration and validation test samples under uncertainty. In: IMAC XXXII, Orlando, pp. 391–401 (2014)
13. Li, C., Mahadevan, S.: Sensitivity analysis for test resource allocation. In: IMAC XXXIII, Orlando (2015)
14. Mullins, J., Li, C., Sankararaman, S., Mahadevan, S.: Probabilistic integration of validation and calibration results for prediction level uncertainty quantification: application to structural dynamics. In: 54th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Boston (2013)
15. Kennedy, M.C., O'Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc. 63(3), 425–464 (2001)
16. Sankararaman, S., Mahadevan, S.: Comprehensive framework for integration of calibration, verification and validation. In: 53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, Honolulu, pp. 1–12 (2012)
17. Li, C., Mahadevan, S.: Uncertainty quantification and output prediction in multi-level problems. In: 16th AIAA Non-Deterministic Approaches Conference, National Harbor (2014)
18. Li, C., Mahadevan, S.: Role of calibration, validation, and relevance in multi-level uncertainty integration. Reliab. Eng. Syst. Saf. 148, 32–43 (2016)
19. Sankararaman, S., Mahadevan, S.: Likelihood-based approach to multidisciplinary analysis under uncertainty. J. Mech. Des. 134(3), 031008 (2012)
20. Babuska, I., Oden, J.T.: Verification and validation in computational engineering and science: basic concepts. Comput. Methods Appl. Mech. Eng. 193(36–38), 4057–4066 (2004)
21. Roy, C.J.: Review of code and solution verification procedures for computational simulation. J. Comput. Phys. 205(1), 131–156 (2005)
22. AIAA: Guide for the verification and validation of computational fluid dynamics simulations. American Institute of Aeronautics and Astronautics (AIAA), no. AIAA G-077-1998 (1998)
23. Defense Modelling and Simulation Office: Verification, Validation, and Accreditation (VV&A) Recommended Practices Guide, Alexandria (1998)
24. Oberkampf, W.L., Blottner, F.G.: Issues in computational fluid dynamics code verification and validation. AIAA J. 36(5), 687–695 (1998)
25. Oberkampf, W.L., Trucano, T.G.: Verification and validation in computational fluid dynamics. Prog. Aerosp. Sci. 38(3), 209–272 (2002)
26. Benay, R., Chanetz, B., Delery, J.: Code verification/validation with respect to experimental data banks. Aerosp. Sci. Technol. 7(4), 239–262 (2003)
27. Roy, C.J., Oberkampf, W.L.: A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing. Comput. Methods Appl. Mech. Eng. 200(25), 2131–2144 (2011)
28. Roache, P.J.: Verification of codes and calculations. AIAA J. 36(5), 696–702 (1998)
29. Roache, P.J.: Verification and Validation in Computational Science and Engineering. Hermosa Publishers, Albuquerque (1998)
30. Oberkampf, W.L., Trucano, T.G., Hirsch, C.: Verification, validation, and predictive capability in computational engineering and physics. Appl. Mech. Rev. 57(5), 345–384 (2004)
31. Roy, C.J., McWherter-Payne, M.A., Oberkampf, W.L.: Verification and validation for laminar hypersonic flowfields, part 1: verification. AIAA J. 41(10), 1934–1943 (2003)
32. Rebba, R., Mahadevan, S., Huang, S.: Validation and error estimation of computational models. Reliab. Eng. Syst. Saf. 91(10–11), 1390–1397 (2006)
33. Liang, B., Mahadevan, S.: Error and uncertainty quantification and sensitivity analysis in mechanics computational models. Int. J. Uncertain. Quantif. 1(2), 147–161 (2011)
34. Rangavajhala, S., Sura, V.S., Hombal, V.K., Mahadevan, S.: Discretization error estimation in multidisciplinary simulations. AIAA J. 49(12), 2673–2683 (2011)
35. Ferziger, J., Peric, M.: Computational Methods for Fluid Dynamics. Springer, Berlin (1996)
36. Ainsworth, M., Oden, J.T.: A posteriori error estimation in finite element analysis. Comput. Methods Appl. Mech. Eng. 142(1–2), 1–88 (1997)
37. Oberkampf, W.L., DeLand, S.M., Rutherford, B.M., Diegert, K.V., Alvin, K.F.: Error and uncertainty in modeling and simulation. Reliab. Eng. Syst. Saf. 75(3), 333–357 (2002)
38. Haldar, A., Mahadevan, S.: Probability, Reliability, and Statistical Methods in Engineering Design. John Wiley, New York (2000)
39. Ghanem, R., Spanos, P.D.: Polynomial chaos in stochastic finite elements. J. Appl. Mech. 57(1), 197–202 (1990)
40. Buhmann, M.D.: Radial Basis Functions: Theory and Implementations, vol. 12. Cambridge University Press, Cambridge/New York (2003)
41. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
42. Richards, S.A.: Completed Richardson extrapolation in space and time. Commun. Numer. Methods Eng. 13(7), 573–582 (1997)
43. Xu, P., Su, X., Mahadevan, S., Li, C., Deng, Y.: A non-parametric method to determine basic probability assignment for classification problems. Appl. Intell. 41(3), 681–693 (2014)
44. Babuska, I., Rheinboldt, W.C.: A posteriori error estimates for the finite element method. Int. J. Numer. Methods Eng. 12(10), 1597–1615 (1978)
45. Demkowicz, L., Oden, J.T., Strouboulis, T.: Adaptive finite elements for flow problems with moving boundaries. Part I: variational principles and a posteriori estimates. Comput. Methods Appl. Mech. Eng. 46(2), 217–251 (1984)
46. Rasmussen, C.E.: Evaluation of Gaussian processes and other methods for non-linear regression. PhD dissertation, University of Toronto (1996)
47. Rasmussen, C.E.: The infinite Gaussian mixture model. In: NIPS, Denver, vol. 12, pp. 554–560 (1999)
48. Rasmussen, C.E.: Gaussian processes in machine learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) Advanced Lectures on Machine Learning, vol. 3176, pp. 63–71 (2004)
49. Santner, T.J., Williams, B.J., Notz, W.I.: The Design and Analysis of Computer Experiments. Springer, Dordrecht/New York (2013)
50. Bichon, B.J., Eldred, M.S., Swiler, L.P., Mahadevan, S., McFarland, J.M.: Efficient global reliability analysis for nonlinear implicit performance functions. AIAA J. 46(10), 2459–2468 (2008)
51. McFarland, J.M.: Uncertainty Analysis for Computer Simulations through Validation and Calibration. Vanderbilt University, Nashville (2008)
52. Cressie, N.: Spatial Statistics. John Wiley, New York (1991)
53. Chiles, J.-P., Delfiner, P.: Geostatistics: Modeling Spatial Uncertainty, vol. 344. Wiley-Interscience, New York (1999)
82. Rebba, R., Mahadevan, S.: Computational methods for model reliability assessment. Reliab. Eng. Syst. Saf. 93(8), 1197–1207 (2008)
83. Sankararaman, S., Mahadevan, S.: Assessing the reliability of computational models under uncertainty. In: 54th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Boston, pp. 1–8 (2013)
84. Thacker, B.H., Paez, T.L.: A simple probabilistic validation metric for the comparison of uncertain model and test results. In: ASME Verification and Validation Symposium, Las Vegas (2013)
85. Liu, Y., Chen, W., Arendt, P., Huang, H.-Z.: Toward a better understanding of model validation metrics. J. Mech. Des. 133(7), 071005 (2011)
86. Roache, P.J.: Fundamentals of Verification and Validation. Hermosa Press, Socorro (2009)
87. Oberkampf, W.L., Roy, C.J.: Verification and Validation in Scientific Computing. Cambridge University Press, New York (2010)
88. O'Hagan, A.: Fractional Bayes factors for model comparison. J. R. Stat. Soc. 57(1), 99–138 (1995)
89. Jiang, X., Mahadevan, S.: Bayesian risk-based decision method for model validation under uncertainty. Reliab. Eng. Syst. Saf. 92(6), 707–718 (2007)
90. Cha, S.: Comprehensive survey on distance/similarity measures between probability density functions. Int. J. Math. Models Methods Appl. Sci. 1(4) (2007)
91. De Maesschalck, R., Jouan-Rimbaud, D., Massart, D.L.: The Mahalanobis distance. Chemom. Intell. Lab. Syst. 50(1), 1–18 (2000)
92. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., Tarantola, S.: Global Sensitivity Analysis: The Primer. John Wiley, Chichester (2008)
93. Sobol, I.M.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55(1–3), 271–280 (2001)
94. Li, C., Mahadevan, S.: Global sensitivity analysis for system response prediction using auxiliary variable method. In: 17th AIAA Non-Deterministic Approaches Conference, Kissimmee (2015)
95. Li, C., Mahadevan, S.: Relative contributions of aleatory and epistemic uncertainty sources in time series prediction. Int. J. Fatigue 82, 474–486 (2016)
96. Singhal, A.: Modern information retrieval: a brief overview. IEEE Data Eng. Bull. 24(4), 35–43 (2001)
97. Van Horn, K.S.: Constructing a logic of plausible inference: a guide to Cox's theorem. Int. J. Approx. Reason. 34(1), 3–24 (2003)
98. Sankararaman, S., Mahadevan, S.: Integration of model verification, validation, and calibration for uncertainty quantification in engineering systems. Reliab. Eng. Syst. Saf. 138, 194–209 (2015)
99. Li, C., Mahadevan, S.: Uncertainty quantification and integration in multi-level problems. In: IMAC XXXII, Orlando, vol. 3 (2014)
100. Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27(3), 832–837 (1956)
101. Red-Horse, J.R., Paez, T.L.: Sandia National Laboratories validation workshop: structural dynamics application. Comput. Methods Appl. Mech. Eng. 197(29–32), 2578–2584 (2008)
102. Chopra, A.K.: Dynamics of Structures: Theory and Applications to Earthquake Engineering, 4th edn. Prentice Hall, Englewood Cliffs (2011)
Bayesian Cubic Spline in Computer
Experiments 12
Yijie Dylan Wang and C. F. Jeff Wu
Abstract
Cubic splines are commonly used in numerical analysis. They have also become
popular in the analysis of computer experiments, thanks to their adoption by
the software JMP 8.0.2 (2010). In this chapter, a Bayesian version of the cubic
spline method is proposed, in which the random function that represents prior
uncertainty about y is taken to be a specific stationary Gaussian process and
y is the output of the computer experiment. A Markov chain Monte Carlo
(MCMC) procedure is developed for updating the prior given the observed y
values. Simulation examples and a real data application are given to show that
the proposed Bayesian method performs better than the frequentist cubic spline
method and the standard method based on the Gaussian correlation function.
Keywords
Gaussian process · Markov chain Monte Carlo (MCMC) · Kriging · Nugget ·
Uncertainty quantification
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
2 A Brief Review on Kriging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
3 Bayesian Cubic Spline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
3.1 The Prior and Posterior Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
3.2 Nugget Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
1 Introduction
where d = ‖x_1 − x_2‖ is the distance between two input values x_1 and x_2 and θ is the
scale parameter. It has been used in many applications [1, 24, 28, 29] and software,
including JMP 8.0.2 (2010). However, a process y(x) with (12.2) as the correlation
function has the property that its realization on an arbitrarily small continuous
interval determines the realization on the whole real line. This global influence of
local data is considered unrealistic and possibly misleading in some applications
[6, p. 54]. This property will be referred to hereafter as global prediction. Another
12 Bayesian Cubic Spline in Computer Experiments 479
well-known correlation function is the Matérn family (Matérn, 1960). For the one-dimensional
case, it is a two-parameter family:
where K_ν(·) denotes a modified Bessel function of order ν > 0 and θ > 0 is a scale
parameter for the distance d. As ν → ∞, the Matérn correlation function converges
to (12.2).
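For the half-integer smoothness values that admit closed forms, the Matérn correlation can be evaluated without Bessel functions. The sketch below uses the common √(2ν)-scaled parameterization, which may differ from the parameterization used in this chapter:

```python
import numpy as np

def matern_corr(d, nu, theta):
    """Matern correlation for the half-integer smoothness values that admit
    closed forms (nu = 1/2, 3/2, 5/2); general nu needs a Bessel function."""
    z = np.sqrt(2.0 * nu) * np.abs(d) / theta
    if nu == 0.5:
        return np.exp(-z)                              # exponential correlation
    if nu == 1.5:
        return (1.0 + z) * np.exp(-z)
    if nu == 2.5:
        return (1.0 + z + z ** 2 / 3.0) * np.exp(-z)
    raise NotImplementedError("general nu requires scipy.special.kv")
```

For ν = 1/2 the family reduces to the exponential correlation; larger ν gives smoother sample paths, approaching the Gaussian correlation as ν → ∞.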
Another commonly used interpolation method is the spline. An order-s spline
with knots ξ_i, i = 1, ..., l, is a piecewise polynomial of order s that has continuous
derivatives up to order s − 2. A cubic spline has s = 4. The GP may also be
viewed as a spline in a reproducing kernel Hilbert space, with the reproducing kernel
given by its covariance function [32]. The main difference between them is in the
interpretation. While the spline is driven by a minimum norm interpolation based
on a Hilbert space structure, the GP is driven by minimizing the expected squared
prediction error based on a stochastic model.
In this chapter, emphasis is placed on the cubic spline by considering it in the GP
framework via the cubic spline correlation function [4, 31]:
R(d) = \begin{cases}
  1 - 6 (d/\theta)^2 + 6 (|d|/\theta)^3, & \text{if } |d| < \theta/2, \\
  2 (1 - |d|/\theta)^3, & \text{if } \theta/2 \le |d| < \theta, \\
  0, & \text{if } |d| \ge \theta,
\end{cases} \qquad (12.3)
where θ > 0 is the scale parameter. Currin et al. [5] showed that the BLUP
with the function in (12.3) as the correlation function gives the usual cubic spline
interpolator. An advantage of the cubic spline correlation is that θ can be made
small, which permits prediction to be based on data in a local region around the
predicting location [31, p. 38]. This property shall be referred to hereafter as local
prediction.
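The compactly supported correlation function (12.3) is simple to vectorize. A NumPy sketch (the function name is ours):

```python
import numpy as np

def cubic_spline_corr(d, theta):
    """Cubic spline correlation of Eq. (12.3): exactly zero for |d| >= theta,
    which is what enables local prediction."""
    u = np.abs(np.asarray(d, dtype=float)) / theta
    return np.where(u < 0.5,
                    1.0 - 6.0 * u ** 2 + 6.0 * u ** 3,
                    np.where(u < 1.0, 2.0 * (1.0 - u) ** 3, 0.0))
```

The two polynomial pieces join continuously at |d| = θ/2, where both branches equal 0.25.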
In this chapter, a Bayesian version of the Gaussian process approach is introduced
for the cubic spline correlation function given in (12.3). One advantage of Bayesian
prediction is that the variability of y(x) given observations can be used to provide
measures of posterior uncertainty, and designs can be sought to minimize the
expected uncertainty [29, 36]. Some empirical studies have shown the superiority
of Gaussian processes over other interpolating techniques, including splines
(see Laslett [18]). Here, some potential advantages of using the Bayesian cubic spline
in the GP model compared to the power exponential correlation function (12.2) are
illustrated through simulation studies.
The chapter is organized as follows. In the following section, a brief review of the
kriging technique based on GP models is provided. A methodological section is then
presented introducing a Bayesian version of the cubic spline method, abbreviated
as BCS. This section also outlines methodology for the nugget parameter in BCS
for situations where numerical and estimation stability is required. A summary
procedure for BCS is also given, and its extension to high dimensions is discussed.
480 Y.D. Wang and C.F.J. Wu
The GP model has been used in geostatistics and is known as kriging [3, 6, 20].
Kriging is used to analyze spatially referenced data, which have the following
characteristics [3]: the observed values y_i are at a discrete set of sampling locations
x_i, i = 1, ..., n, within a spatial region, and the observations y_i are statistically related
to the values of an underlying continuous spatial phenomenon. Sacks et al. [29]
proposed kriging as a technique for developing metamodels from a computer
experiment. Computer experiments produce a response for a given set of input
variables. Our methodology development considers only deterministic computer
experiments, i.e., the code produces identical answers if run twice using the same
set of inputs.
Suppose the goal is to predict the output of a function y(x) at an untried location
x, given the observed y values at D = (x_1, ..., x_n)^t. For the Gaussian process
in (12.1), the best linear unbiased estimator (BLUE) of μ is
and spatial process models [4, 19, 29, 30, 34]. For the power exponential correlation
function in (12.2), the estimate of 2 yields
\hat{\sigma}^2(\theta) = \frac{1}{n} (Y_D - \mu_D)' R_D(\theta)^{-1} (Y_D - \mu_D).
With Y_D \sim N(\mu_D, \sigma^2 R_D(\theta)), the Bayesian framework for the cubic spline method
can now be developed, where R_D is the correlation matrix based on the function
in (12.3). First assign noninformative priors to \mu_D and \gamma and a conjugate
prior to \sigma^2, and further assume that these priors are independent of each other:

\mu_D \propto 1,
\sigma^2 \mid \alpha, \gamma \sim \mathrm{InverseGamma}(\alpha, \gamma),
\theta \mid a \sim U(0, a),
\gamma \propto 1/\gamma,
a = \max_{x_i, x_j \in D} \| x_i - x_j \|.
The reason for choosing this particular a is that the function in (12.3) is
truncated and equals zero for |d| \ge \theta. Since the local prediction property is desired
for the GP, a is chosen to be the largest distance among the x values in D. A
simulation study (not reported here) shows that a larger value of a does not change
the results.
The marginal posterior of \theta then satisfies

\pi(\theta \mid Y_D, \mu_D, \alpha, \gamma) \propto \int P(Y_D \mid \mu_D, \theta, \sigma^2)\, P(\theta)\, P(\sigma^2 \mid \alpha, \gamma)\, d\sigma^2\, d\gamma
\propto \int N(\mu_D, \sigma^2 R_D(\theta)) \cdot 1 \cdot \mathrm{InverseGamma}(\alpha, \gamma)\, d\sigma^2\, d\gamma.
However, the parameter \theta is embedded in the correlation function R in (12.3),
and its posterior distribution has no closed form. Hence, posterior samples of \theta
are obtained using the Metropolis–Hastings (MH) algorithm [14, 22]. The MH
algorithm works by generating a sequence of sample values in such a way that, as
more and more sample values are produced, the distribution of values more closely
approximates the desired distribution. Specifically, at each iteration, the algorithm
picks a candidate for the next sample value based on the current sample value. Then,
with some probability, the candidate is either accepted (in which case the candidate
value is used in the next iteration) or rejected (in which case the candidate value is
discarded and the current value is reused in the next iteration).
Specifically, a new value of \theta (denoted \theta_{\text{new}}) is sampled from a normal density
centered at the existing value (denoted \theta_{\text{old}}). The normal density works as a
jumping distribution: it proposes a new sample value based on the current
sample. In theory, any jumping probability density Q(\theta_{\text{old}} \mid \theta_{\text{new}}) can work,
where \theta is the parameter of interest. Here, a symmetric jumping density Q(\theta_{\text{old}} \mid
\theta_{\text{new}}) = Q(\theta_{\text{new}} \mid \theta_{\text{old}}) is chosen for simplicity. The variance term in the normal
distribution controls the jumping size from the old sample to the new sample in the MH
algorithm; here, the variance is chosen to be one. As the jumping size gets smaller,
the deviation of new parameters from previous ones becomes smaller. The jumping
distribution is therefore

\theta_{\text{new}} \sim N(\theta_{\text{old}}, 1).
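As a rough illustration only (not the authors' implementation), the random-walk MH update for \theta described above can be sketched in Python. Here `log_post` is a placeholder for the (unnormalized) log marginal posterior of \theta, the U(0, a) prior is enforced by rejecting proposals outside (0, a), and the toy target at the end is hypothetical.

```python
import math
import random

def metropolis_hastings(log_post, theta0, a, n_iter=5000, jump_sd=1.0):
    """Random-walk Metropolis-Hastings for a scalar parameter theta in (0, a).

    log_post : log posterior density of theta, up to an additive constant
    theta0   : starting value (e.g., the posterior mode, as in step 1)
    jump_sd  : std. dev. of the symmetric normal jumping distribution
    """
    theta = theta0
    samples = []
    for _ in range(n_iter):
        # Propose theta_new ~ N(theta_old, jump_sd^2): a symmetric jump
        cand = random.gauss(theta, jump_sd)
        # The U(0, a) prior puts zero density outside (0, a): auto-reject
        if 0.0 < cand < a:
            # Symmetric proposal => accept with prob min(1, pi(cand)/pi(theta))
            if math.log(random.random()) < log_post(cand) - log_post(theta):
                theta = cand
        samples.append(theta)
    return samples

# Hypothetical toy posterior: a truncated bell curve centered at 1 on (0, 2)
random.seed(0)
draws = metropolis_hastings(lambda t: -0.5 * ((t - 1.0) / 0.3) ** 2,
                            theta0=1.0, a=2.0)
```

With a smaller `jump_sd`, successive draws move less, exactly as described above.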
One potential problem with the kriging approach is numerical instability
in the computation of the inverse of the correlation matrix in (12.5). This
happens when the correlation matrix is nearly singular. Numerical instability is
serious because it can lead to large variability and poor performance of the predictor.
The simplest, and perhaps most appealing, remedy is to add a nugget effect in the
GP modeling. In the spatial statistics literature [3], a nugget effect is introduced to
compensate for local discontinuities in an underlying stochastic process. A well-known
precursor is ridge regression in linear regression analysis. Gramacy and
Lee [13] gave justifications for the use of a nugget in GP modeling for deterministic
computer experiments. Here, the option of adding a nugget parameter to the GP model
is considered, in the spirit of ridge regression.
Consider the Gaussian model:

Y \sim N\big( \mu,\ \sigma^2 R(\theta) + \tau^2 I \big),

where \tau^2 is the nugget parameter. Adding the matrix \tau^2 I to \sigma^2 R makes the covariance
matrix nonsingular and helps stabilize the parameter estimates. MH sampling can be
used to estimate \tau^2 by letting \delta^2 = \tau^2 / \sigma^2 and assigning a uniform U(0, c) prior on \delta^2,
where c is fixed and known. The correlation matrices R_D and R_0 introduced at the
start of this section can then be replaced by V_D = R_D + \delta^2 I and V_0 = R_0 + \delta^2 I.
To use MH sampling for \delta^2, the jumping distribution is

\log \delta^2_{\text{new}} \sim N\big( \log \delta^2_{\text{old}},\ 1 \big). \qquad (12.8)
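To see numerically why the nugget helps, the following sketch (illustrative only; the compactly supported cubic correlation form below is one common choice and is not claimed to be identical to (12.3)) builds a nearly singular 3x3 correlation matrix from nearly coincident design points and shows that adding a small multiple of the identity moves its determinant away from zero.

```python
def cubic_corr(d, theta):
    """One common compactly supported cubic correlation form (an
    illustrative choice): it is exactly zero for |d| >= theta."""
    h = abs(d) / theta
    if h >= 1.0:
        return 0.0
    if h <= 0.5:
        return 1.0 - 6.0 * h ** 2 + 6.0 * h ** 3
    return 2.0 * (1.0 - h) ** 3

def det3(m):
    """Determinant of a 3x3 matrix (expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Three nearly coincident design points make R almost singular.
x = [0.500, 0.501, 0.502]
theta = 1.0
R = [[cubic_corr(xi - xj, theta) for xj in x] for xi in x]

delta2 = 1e-3  # nugget ratio delta^2 = tau^2 / sigma^2
V = [[R[i][j] + (delta2 if i == j else 0.0) for j in range(3)]
     for i in range(3)]

print(det3(R))  # essentially zero: inversion is unstable
print(det3(V))  # bounded away from zero after adding the nugget
```

The same mechanism keeps V_D = R_D + \delta^2 I invertible in the BCS computations.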
In step 1, the posterior mode is chosen as the starting value. This choice is
sensible and can avoid the need for burn-in in MCMC; see its justification
in Geyer [10].
First, the performance of the proposed Bayesian cubic spline (BCS) method is
compared with two other methods: GC and CS (described in the review section).
The criterion for evaluating the performance of the estimators is the integrated mean
squared error (IMSE), defined as
12 Bayesian Cubic Spline in Computer Experiments 485
\mathrm{IMSE}(\hat f) = \int_{\Omega} \big( \hat f(x) - f(x) \big)^2\, dx,

where f and \hat f are, respectively, the true function and the estimated function, and \Omega
is the region of the x values. The following mean squared error (MSE) is a finite-sample
approximation to the IMSE:

\mathrm{MSE}(\hat f) = \frac{1}{m} \sum_{i=1}^{m} \big( \hat f(x_i) - f(x_i) \big)^2, \qquad (12.9)
where m is the number of randomly selected points \{x_i\} from \Omega. Three choices of
the true function f(x) are considered in Examples 1–3, ranging from low to
high dimensions and from smooth to non-smooth functions.
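In code, the finite-sample MSE (12.9) is a one-liner; the sketch below is illustrative, with a hypothetical constant-offset predictor standing in for a fitted emulator.

```python
def mse(f_hat, f_true, points):
    """Finite-sample MSE (12.9): average squared error over m test points."""
    return sum((f_hat(x) - f_true(x)) ** 2 for x in points) / len(points)

# Toy check: a predictor off by a constant 0.1 everywhere has MSE 0.1^2
err = mse(lambda x: x + 0.1, lambda x: x, [0.0, 0.25, 0.5, 0.75, 1.0])
```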
Example 1.
For noise based on U(0, 0) (and U(0, .2), U(0, .5), and U(0, 1), respectively), the
simulation is conducted as follows. First, a noise is randomly sampled from U(0, 0)
(and U(0, .2), U(0, .5), and U(0, 1)). For each simulation, the noise is fixed and
denoted as \{\epsilon_1, \ldots, \epsilon_n\}. Here n, the number of design points, is 16, 16, 25, and
36, respectively, for the four designs. Second, the values of \{f_1(x_1), \ldots, f_1(x_n)\}
are calculated. Then the values of \{f_1(x_1) + \epsilon_1, \ldots, f_1(x_n) + \epsilon_n\} are treated as the
response values by GC, BCS, and CS in parameter estimation. The purpose of this
step is to facilitate the study of the robustness of estimation against noise of various
sizes. Then, the MSE (see (12.9)) is calculated on m = 100 values of x randomly sampled
from [0, 1]^2. The errors \{\epsilon_1, \ldots, \epsilon_n\} and inputs \{x_1, \ldots, x_m\} are sampled repeatedly
and independently for each simulation, and the average of the MSE values from 500
simulations for each noise and design setting is recorded in Table 12.1. For each
simulation setting, the method with the smallest average MSE is highlighted in boldface.
The MCMC iterations of the BCS terminate if the change in the parameter estimate
is smaller than 10^{-4}. The running time to reach convergence is about 20 s on an Intel Xeon CPU
at 2.66 GHz with 3.00 GB of RAM. The power exponential
method performs best when the noise is small (U(0, 0) and U(0, .2)) or the design
Table 12.1 Average MSE values for GC, BCS, and CS predictors in Example 1

Design   Method   U(0,0)   U(0,.2)   U(0,.5)   U(0,1)
4^2 (J)  GC       1.0034   1.0437    1.7192    1.8951
         BCS      1.0901   1.1024    1.6722    1.8642
         CS       1.1036   1.2518    1.6714    1.9854
4^2 (C)  GC       1.1223   1.3495    1.6545    2.4491
         BCS      1.1651   1.4475    1.5819    2.1270
         CS       1.1710   1.4322    1.6318    2.1476
5^2      GC       0.9087   1.0186    1.3516    1.4845
         BCS      1.0112   1.0574    1.2416    1.3391
         CS       1.0481   1.1204    1.2665    1.5816
6^2      GC       0.2171   0.2588    0.5604    0.6352
         BCS      0.2716   0.3079    0.7008    0.6893
         CS       0.2942   0.2769    0.7479    0.8042
size is large (36-run). This is because the example is relatively smooth when the noise is
small, and the function f_1 contains the exponential term exp(.5/x_2), which is best
captured by the (everywhere nonzero) exponential correlation function. GC also benefits from
the larger sample size of 6^2, which helps stabilize the estimate. For relatively
small designs (16- and 25-run) with large noise (U(0, .5) and U(0, 1)), CS and BCS
perform better than GC. When the design is small and the noise is large, the response
surface tends to be very rugged, and there are not enough data for GC to estimate the
surface with good precision. Localized estimators like CS and BCS, with truncated
correlation functions, give smaller MSE. BCS in most cases outperforms CS. The
over-smoothing property of GC can result in large estimation errors, as will be shown
in Example 2 and in the application section.
Example 2.
Table 12.2 Average MSE values for GC, BCS, and CS predictors in Example 2

Design   Method   U(0,0)     U(0,.2)   U(0,.5)   U(0,1)
5        GC       0.0026     0.2857    0.7214    0.8147
         BCS      0.0018     0.0772    0.2135    0.2740
         CS       0.0021     0.0518    0.2768    0.2853
10       GC       0.0017     0.0389    0.1626    0.3260
         BCS      0.0013     0.0261    0.0508    0.1892
         CS       0.0013     0.0259    0.0591    0.1964
20       GC       0.0015     0.0092    0.1443    0.1547
         BCS      0.0004     0.0057    0.0721    0.1208
         CS       0.0011     0.0063    0.1125    0.1248
30       GC       0.0011     0.0049    0.0754    0.1853
         BCS      6.70E-04   0.0032    0.0341    0.1807
         CS       7.20E-04   0.0028    0.0596    0.2005
In all cases, CS and BCS beat GC, even when no noise is added to the true
function. BCS performs better than CS in most cases. CS performs better than
BCS in four cases, in three of which the difference is not significant. GC gives much
worse results when the design size is small (5, 10) and the noise is large (U(0, .5)
and U(0, 1)). This is due to the global prediction property of GC. For non-smooth
functions, this can bring in unnecessarily large errors. On the other hand, the better
performance of CS and BCS benefits from their local prediction property.
Example 3.
f_3(\mathbf{x}) = \frac{2\pi\, x_1 (x_2 - x_3)}{\ln(x_4 / x_5) \left[ 1 + \dfrac{2 x_1 x_6}{\ln(x_4 / x_5)\, x_5^2\, x_7} + \dfrac{x_1}{x_8} \right]}.
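The reconstructed f_3 appears to match the structure of the widely used borehole test function; under that assumption, it can be coded as follows. The input values in the usage line are merely illustrative mid-range borehole-style choices, not values taken from the chapter.

```python
import math

def f3(x1, x2, x3, x4, x5, x6, x7, x8):
    """Borehole-style test function, transcribed from the reconstruction
    of f3 above (the extraction lost the 2*pi factor and the layout)."""
    lr = math.log(x4 / x5)
    return (2.0 * math.pi * x1 * (x2 - x3)
            / (lr * (1.0 + 2.0 * x1 * x6 / (lr * x5 ** 2 * x7) + x1 / x8)))

# One illustrative evaluation at mid-range inputs
y = f3(89335.0, 1050.0, 770.0, 25050.0, 0.1, 1400.0, 10950.0, 89.55)
```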
Table 12.4 MSE values for GC, BCS, and CS predictors in Example 3

Design   Method   U(0,0)     U(0,.2)   U(0,.5)   U(0,1)
10       GC       0.0386     0.0457    0.0695    0.1850
         BCS      0.0490     0.0516    0.0855    0.1608
         CS       0.0501     0.0672    0.0884    0.1691
10LH     GC       0.0277     0.0473    0.0721    0.1821
         BCS      0.0473     0.0655    0.0890    0.1542
         CS       0.0458     0.0632    0.0876    0.1409
20LH     GC       0.0013     0.0125    0.0405    0.1714
         BCS      0.0067     0.0366    0.0784    0.1509
         CS       0.0077     0.0298    0.0868    0.1621
50LH     GC       5.10E-04   0.0096    0.0311    0.1355
         BCS      0.0039     0.0117    0.0520    0.1428
         CS       0.0043     0.0159    0.0581    0.1523
The results are similar to those of Example 1. This is expected, as both are
smooth functions. GC gives the best results among the three methods when the noise is
small (U(0, 0) and U(0, .2)) or the sample size is large (50LH). BCS and CS perform
well when the sample size is small (10 and 20) and the noise is large (U(0, 1)). BCS
generally outperforms CS. Noting that x_4 and x_5 appear in the f_3 function only through
the ln function, an alternative simulation was run by taking the log transformation of
x_4 and x_5 and rescaling them into [0, 1]. Since it does not change the overall picture
of the comparisons, these results are not reported here.
5 Application
its 50-run design. In addition, Mitchell and Morris gave 20-, 30-, 40-, 50-, and
7-run variable maximin designs in their paper. The first 20, 30, 40, and 50 runs
in Table 12.5 consist of these designs, denoted as D20, D30, D40, and D50. The
response y is the logarithm of the ignition delay time.
Before conducting the comparison, a careful data analysis is performed to show
some features of the data. For D20, D30, and D50, 90%, 80%, and 50% of the original
data are taken as the response values for GC and BCS to estimate \theta. The average
values of \hat\theta_j, j = 1, ..., 7, based on 100 simulations, are calculated and given in
Table 12.6 for each setting. For each design, the value of \hat\theta from GC increases as
the number of input data decreases, while the \hat\theta values for BCS are more stable. The
divergent behavior between GC and BCS for these data can be partially explained
by their respective global and local prediction properties. First, note that a larger \theta
value indicates a smoother surface. As the size of the input data gets smaller, the data
[Fig. 12.1: scatter plot of the MSE of BCS (vertical axis, 0 to 1.5) against the MSE of PEM (horizontal axis, 0 to 1.5)]
points are spread more thinly in the design region [0, 1]^7. The fitted response surface
from GC becomes smoother because of its global prediction property. The change
is not as dramatic for BCS, thanks to its local prediction property. Even though
the ruggedness of the true response surface is not known, this divergent behavior
suggests that BCS is the better method for these data. This is confirmed in the
next study, based on cross-validation.
The same data and design settings are then used to run cross-validations on D50,
D40, and D30 for each of the three methods. One round of cross-validation involves
partitioning the data into complementary subsets, performing the analysis on one
subset (called the training set) and validating the analysis on the other subset (called
the validation set). To reduce variability, multiple rounds of cross-validation are
performed using different partitions, and the validation results are averaged over
the rounds [9, 17]. Each time, a fixed number of data points out of D50 (and D40
and D30, respectively) are used for model fitting. The remaining data are used to calculate
the MSE in (12.9). Because CS gives much larger MSE in each case, only the MSE
results for GC and BCS are reported. For D50, the results for training data sizes of 40
and 30 are plotted in Figs. 12.1 and 12.2. In each figure, a dot indicates the MSE
from GC versus the MSE from BCS for a given design. The 45° reference line
indicates where the two methods are equally good, since they then render the same MSE.
When the majority of the dots lie below the line, BCS has the smaller MSE.
This is evident in Fig. 12.2. GC gives some very bad predictions, with MSE as high as
5.5 (dots at the bottom right of Fig. 12.2), while the majority of the BCS MSEs center
around 1.5. The average of the MSE from 100 simulations for each cross-validation
[Fig. 12.2: scatter plot of the MSE of BCS (vertical axis, 0 to 4.5) against the MSE of PEM (horizontal axis, 0 to 5)]
setting for BCS and GC, and the percentage of BCS outperforming GC, are given in
Table 12.7. There is a relatively larger difference between GC and BCS when the
training data size is relatively small (20 for D30, 25 for D40, and 30 for D50). This
is probably caused by the global prediction and over-smoothing properties of GC.
There is no systematic pattern in the percent improvement figures, possibly due to
the small number of simulations.
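The repeated-partition cross-validation scheme described above can be sketched generically; this is an illustration, not the study's actual code, and `fit`/`predict` are placeholders for any of the three methods.

```python
import random

def cv_mean_mse(data, train_size, n_rounds, fit, predict):
    """Repeated random-partition cross-validation: fit on a random training
    subset, score squared error on the held-out rest, average over rounds."""
    rng = random.Random(0)  # fixed partitions, for reproducibility
    round_mses = []
    for _ in range(n_rounds):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        train = [data[i] for i in idx[:train_size]]
        valid = [data[i] for i in idx[train_size:]]
        model = fit(train)
        round_mses.append(
            sum((predict(model, x) - y) ** 2 for x, y in valid) / len(valid))
    return sum(round_mses) / n_rounds

# Toy placeholder model: predict the training-set mean response everywhere
data = [(x / 10.0, 2.0) for x in range(10)]  # constant response
avg = cv_mean_mse(data, train_size=7, n_rounds=5,
                  fit=lambda tr: sum(y for _, y in tr) / len(tr),
                  predict=lambda m, x: m)
```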
6 Conclusions
used correlation functions, the Matérn family and the power exponential correlation,
do not enjoy this property. A sparse correlation matrix can reduce the cost of
computation and enhance its stability. The viability of the cubic spline for computer
experiment applications received a further boost when JMP 8.0.2 (2010) and its
update 11.1.1 (2014) provided the power exponential correlation and the cubic spline
correlation as its only two choices in GP modeling. The prominence that JMP gives
to the cubic spline was one motivation for us to develop a Bayesian version of
the cubic spline method. By putting a prior on the parameters of the cubic spline
correlation function in (12.3), Bayesian computation can be performed using
MCMC. The Bayesian cubic spline should outperform its frequentist counterpart
because of its smoothness and shrinkage properties. It also provides posterior
estimates, which enable statistical inference on the parameters of interest.
In this chapter, a comparison of BCS with CS and GC is performed in three
simulation examples and in an application to real data. Other correlation functions with
compact support have also been considered, such as those from the spherical family:

R(d) = \begin{cases} 1 - \dfrac{3|d|}{2\theta} + \dfrac{1}{2}\left(\dfrac{|d|}{\theta}\right)^3, & \text{if } |d| \le \theta, \\ 0, & \text{if } |d| > \theta. \end{cases}
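A direct transcription of the spherical correlation function above (a sketch; \theta plays the same truncation role as in (12.3)) makes its compact support explicit:

```python
def spherical_corr(d, theta):
    """Spherical correlation: compactly supported, zero for |d| > theta."""
    h = abs(d) / theta
    if h > 1.0:
        return 0.0
    return 1.0 - 1.5 * h + 0.5 * h ** 3
```

Because the function vanishes beyond \theta, correlation matrices built from it are sparse, which is the computational advantage mentioned above.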
Because the performance of the frequentist version of the spherical family is similar
to that of the cubic spline, these results are omitted in the chapter. In the three
simulation examples, BCS outperforms CS in most cases. GC performs the best
when the true function is smooth or the data size is large. BCS and CS perform
better than GC when the true function is rugged and the data size is relatively small.
This difference in performance can be explained by the local prediction property
of BCS and CS and the global prediction property of GC. Recall that, in global
prediction, the prediction at any location is influenced by far-away locations (though
with smaller weights). This leads to over-smoothing, which is not good for a rugged
surface. Local prediction does not suffer from this, as the prediction depends only on
nearby locations. In the real data application, BCS outperforms GC for most choices
of design. In summary, when the response surface is non-smooth and/or the input
dimension is high, the BCS method can have potential advantages and should be
considered for adoption in data analysis.
There are other methods that also attempt to balance local and global
prediction. See, for example, Kaufman et al. [16] and Gramacy and Apley [12],
although the main purpose of those two papers was to leverage sparsity to build
emulators for large computer experiments. Comparing our proposed method
with these and other methods in the literature will require a separate study, and it lies
outside our original scope of comparing the Bayesian and frequentist methods
for the cubic spline.
Some issues need to be considered in future work. When the dimension is
high, the parameter estimation is based on MH sampling, which can be costly.
Grouping parameters to reduce computation is an alternative. Also, only non-
References
1. Abrahamsen, P.: A Review of Gaussian Random Fields and Correlation Functions. Norsk Regnesentral/Norwegian Computing Center (1997)
2. Chen, Z., Gu, C., Wahba, G.: Comment on linear smoothers and additive models. Ann. Stat. 17(3), 515–521 (1989)
3. Cressie, N.: Statistics for Spatial Data. Wiley, Chichester/New York (1992)
4. Currin, C., Mitchell, T., Morris, M., Ylvisaker, D.: A Bayesian approach to the design and analysis of computer experiments. ORNL-6498 (1988)
5. Currin, C., Mitchell, T., Morris, M., Ylvisaker, D.: Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. J. Am. Stat. Assoc. 86(416), 953–963 (1991)
6. Diggle, P.J., Ribeiro, P.J.: Model-Based Geostatistics. Springer, New York (2007)
7. DiMatteo, I., Genovese, C.R., Kass, R.E.: Bayesian curve-fitting with free-knot splines. Biometrika 88(4), 1055–1071 (2001)
8. Fang, K.-T., Li, R., Sudjianto, A.: Design and Modeling for Computer Experiments. Chapman and Hall/CRC, Boca Raton (2010)
9. Geisser, S.: Predictive Inference: An Introduction, vol. 55. CRC Press, New York (1993)
10. Geyer, C.J.: Introduction to MCMC. Chapman & Hall, Boca Raton (2011)
11. Gill, J.: Bayesian Methods: A Social and Behavioral Sciences Approach. CRC Press, Boca Raton (2002)
12. Gramacy, R.B., Apley, D.W.: Local Gaussian process approximation for large computer experiments. J. Comput. Graph. Stat. 24(2), 561–578 (2014)
13. Gramacy, R.B., Lee, H.K.H.: Cases for the nugget in modeling computer experiments. Stat. Comput. 22(3), 713–722 (2012)
14. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)
15. Joseph, V.R.: Limit kriging. Technometrics 48(4), 458–466 (2006)
16. Kaufman, C.G., Bingham, D., Habib, S., Heitmann, K., Frieman, J.A.: Efficient emulators of computer experiments using compactly supported correlation functions, with an application to cosmology. Ann. Appl. Stat. 5(4), 2470–2492 (2011)
17. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: International Joint Conference on Artificial Intelligence, vol. 14, pp. 1137–1145. Lawrence Erlbaum Associates Ltd (1995)
18. Laslett, G.M.: Kriging and splines: an empirical comparison of their predictive performance in some applications. J. Am. Stat. Assoc. 89(426), 391–400 (1994)
19. Mardia, K.V., Marshall, R.J.: Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71(1), 135–146 (1984)
20. Matheron, G.: Principles of geostatistics. Econ. Geol. 58(8), 1246–1266 (1963)
21. McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245 (1979)
22. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)
23. Morris, M.D., Mitchell, T.J., Ylvisaker, D.: Bayesian design and analysis of computer experiments: use of derivatives in surface prediction. Technometrics 35(3), 243–255 (1993)
24. O'Hagan, A., Kingman, J.F.C.: Curve fitting and optimal design for prediction. J. R. Stat. Soc. Ser. B (Methodol.) 40(1), 1–42 (1978)
25. Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58(3), 545–554 (1971)
26. Peng, C.-Y., Wu, C.F.J.: On the choice of nugget in kriging modeling for deterministic computer experiments. J. Comput. Graph. Stat. 23(1), 151–168 (2014)
27. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods, vol. 319. Springer, New York (2004)
28. Sacks, J., Schiller, S.: Spatial designs. Stat. Decis. Theory Relat. Topics IV 2(32), 385–399 (1988)
29. Sacks, J., Schiller, S., Welch, W.J.: Designs for computer experiments. Technometrics 31(1), 41–47 (1989)
30. Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Stat. Sci. 4(4), 409–423 (1989)
31. Santner, T.J., Williams, B.J., Notz, W.I.: The Design and Analysis of Computer Experiments. Springer, New York (2003)
32. Wahba, G.: Spline Models for Observational Data, vol. 59. Society for Industrial and Applied Mathematics, Philadelphia (1990)
33. Wang, X.: Bayesian free-knot monotone cubic spline regression. J. Comput. Graph. Stat. 17(2), 518–527 (2008)
34. Wecker, W.E., Ansley, C.F.: The signal extraction approach to nonlinear regression and spline smoothing. J. Am. Stat. Assoc. 78(381), 81–89 (1983)
35. Ylvisaker, D.: Designs on random fields. Surv. Stat. Des. Linear Models 37(6), 593–607 (1975)
36. Ylvisaker, D.: Prediction and design. Ann. Stat. 15(1), 1–19 (1987)
13 Propagation of Stochasticity in Heterogeneous Media and Applications to Uncertainty Quantification

Guillaume Bal
Abstract
This chapter reviews several results on the derivation of asymptotic models for
solutions to partial differential equations (PDE) with highly oscillatory random
coefficients. We primarily consider elliptic models with random diffusion or
random potential terms. In the regime of small correlation length of the random
coefficients, the solution may be described either as the sum of a leading, deter-
ministic, term, and random fluctuations, whose macroscopic law is described, or
as a random solution of a stochastic partial differential equation (SPDE). Several
models for such random fluctuations or SPDEs are reviewed here.
The second part of the chapter focuses on potential applications of such
macroscopic models to uncertainty quantification and effective medium models.
The main advantage of macroscopic models, when they are available, is that
the small correlation length parameter no longer appears. Highly oscillatory
coefficients are typically replaced by (additive or multiplicative) white noise
or fractional white noise forcing. Quantification of uncertainties, such as, for
instance, the probability that a certain function of the PDE solution exceeds
a certain threshold, then sometimes takes an explicit, solvable, form. In most
cases, however, the propagation of stochasticity from the random forcing to the
PDE solution needs to be estimated numerically. Since (fractional) white noise
oscillates at all scales (as a fractal object), its numerical discretization involves
a large number of random degrees of freedom, which renders accurate computations
extremely expensive. Some remarks on a coupled polynomial chaos–Monte Carlo
framework and related concentration (Efron–Stein) inequalities for
numerically solving problems with small-scale randomness conclude this chapter.
G. Bal ()
Department of Applied Physics and Applied Mathematics, Columbia University, New York,
NY, USA
e-mail: gb2030@columbia.edu
Keywords
Concentration inequalities · Equations with random coefficients · Propagation of stochasticity · Uncertainty quantification
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
2 Propagation of Stochasticity for Elliptic Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
2.1 Asymptotic Random Fluctuations in One Dimension . . . . . . . . . . . . . . . . . . . . . . . . 500
2.2 Large Deviations in One Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
2.3 Some Remarks on the Higher-Dimensional Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
3 Equations with a Random Potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
3.1 Perturbation Theory for Bounded Random Potentials . . . . . . . . . . . . . . . . . . . . . . . . 506
3.2 Homogenization Theory for Large Random Potentials . . . . . . . . . . . . . . . . . . . . . . . 510
3.3 Convergence to Stochastic Limits for Long-Range Random Potentials . . . . . . . . . . 512
4 Applications to Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
4.1 Application to Effective Medium Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
4.2 Concentration Inequalities and Coupled PCE-MC Framework . . . . . . . . . . . . . . . . . 515
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
1 Introduction
L(q, x, u) = 0, \quad x \in X \subset \mathbb{R}^d, \qquad (13.1)
The residual stochasticity in u coming from such rapid fluctuations is then expected
to appear as a correction to the law of large numbers, for instance, as an application
of a central limit result.
These remarks lead us to the modified modeling of (13.1) given by

-\nabla \cdot \Big( a\big(\tfrac{x}{\varepsilon}, \omega\big) \nabla u_\varepsilon \Big) = f, \quad x \in X \subset \mathbb{R}^d, \qquad u_\varepsilon = g, \quad x \in \partial X, \qquad (13.3)

with a(y, \omega) a stationary ergodic random field bounded above and below by positive
constants. It is then known that u_\varepsilon converges strongly, e.g., in the L^2 sense, to a
deterministic, homogenized limit u^* satisfying the same equation as above with
a replaced by an effective, deterministic, constant diffusion tensor a^*; see, e.g.,
[29, 31, 33].
500 G. Bal
Let us define a_\varepsilon(x, \omega) = a(\tfrac{x}{\varepsilon}, \omega). When d = 1, the above elliptic equation takes
the form

-\frac{d}{dx}\Big( a_\varepsilon(x, \omega)\, \frac{d}{dx} u_\varepsilon \Big) = f(x) \ \ \text{in } (0, 1), \qquad u_\varepsilon(0, \omega) = 0, \quad u_\varepsilon(1, \omega) = g. \qquad (13.4)

Here, a(x, \omega) is a stationary ergodic random process satisfying the ellipticity
condition 0 < \alpha_0 \le a(x, \omega) \le \alpha_0^{-1} a.e. for (x, \omega) \in \mathbb{R} \times \Omega, where (\Omega, \mathcal{F}, \mathbb{P})
is an abstract probability space.
With F(x) = \int_0^x f(y)\, dy, the solution is written using explicit integrals of the
random coefficient a:

u_\varepsilon(x, \omega) = \int_0^x \frac{c_\varepsilon(\omega) - F(y)}{a_\varepsilon(y, \omega)}\, dy, \qquad c_\varepsilon(\omega) = \frac{g + \displaystyle\int_0^1 \frac{F(y)}{a_\varepsilon(y, \omega)}\, dy}{\displaystyle\int_0^1 \frac{1}{a_\varepsilon(y, \omega)}\, dy}. \qquad (13.5)
-\frac{d}{dx}\Big( a^* \frac{d}{dx} u^* \Big) = f(x) \ \ \text{in } (0, 1), \qquad u^*(0) = 0, \quad u^*(1) = g, \qquad (13.6)

with a^* = \big( E\{ a^{-1}(0, \cdot) \} \big)^{-1} the harmonic mean of a(0, \cdot). We also have the explicit
expression

u^*(x) = \int_0^x \frac{c^* - F(y)}{a^*}\, dy, \qquad c^* = g a^* + \int_0^1 F(y)\, dy. \qquad (13.7)
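As an illustration of (13.5)–(13.7) (not taken from the chapter), the following sketch evaluates u_\varepsilon by midpoint-rule quadrature for a piecewise-constant random coefficient taking the arbitrary values 1 and 4 on cells of width \varepsilon, and compares it with the homogenized solution u^* built from the harmonic mean a^* = 1.6. Here f = 1, so F(y) = y, and g = 0.

```python
import random

random.seed(0)

# Piecewise-constant stationary coefficient: a = 1 or 4, independently
# on each cell of width eps (an illustrative choice of random medium).
eps = 1.0 / 400
cells = [random.choice([1.0, 4.0]) for _ in range(int(1 / eps) + 1)]
a_eps = lambda y: cells[int(y / eps)]

F = lambda y: y            # F(y) = int_0^y f, with source f = 1
g = 0.0                    # boundary value at x = 1

# Midpoint-rule quadrature of the explicit formula (13.5)
n = 4000
ys = [(i + 0.5) / n for i in range(n)]
inv_mean = sum(1.0 / a_eps(y) for y in ys) / n           # int_0^1 1/a_eps
c_eps = (g + sum(F(y) / a_eps(y) for y in ys) / n) / inv_mean

def u_eps(x):
    m = int(n * x)
    return sum((c_eps - F(y)) / a_eps(y) for y in ys[:m]) / n

# Homogenized solution (13.6)-(13.7): a* is the harmonic mean of a
a_star = 1.0 / (0.5 * (1.0 / 1.0 + 1.0 / 4.0))           # = 1.6
c_star = g * a_star + 0.5                                 # int_0^1 F(y) dy
u_star = lambda x: (c_star * x - x * x / 2.0) / a_star

print(abs(u_eps(0.5) - u_star(0.5)))  # small for small eps
```

The residual gap between u_\varepsilon and u^* at fixed \varepsilon is exactly the random corrector whose law is characterized next.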
where g_x is a stationary Gaussian process with mean zero and variance one and \Phi
is a bounded function such that

V_0 = E\{\Phi(g_0)\} = \int_{\mathbb{R}} \Phi(g)\, \frac{e^{-g^2/2}}{\sqrt{2\pi}}\, dg = 0, \qquad (13.11)

V_1 = E\{ g_0 \Phi(g_0) \} = \int_{\mathbb{R}} g\, \Phi(g)\, \frac{e^{-g^2/2}}{\sqrt{2\pi}}\, dg > 0.
where \kappa_g > 0 and \alpha \in (0, 1). Then we can show [7] that R(y) \sim \kappa\, y^{-\alpha} as y \to \infty.
We observe that R(y) is no longer integrable. In this setting, we obtain [7] that

\frac{u_\varepsilon(x) - u^*(x)}{\varepsilon^{\alpha/2}} \;\xrightarrow[\varepsilon \to 0]{}\; \sqrt{\frac{\kappa}{H(2H-1)}} \int_{\mathbb{R}} K(x, y)\, dW_y^H, \qquad (13.14)

in the space of continuous functions C([0, 1]), where K(x, y) is as above and W_y^H is
a fractional Brownian motion with Hurst index H = 1 - \alpha/2.
We thus observe that the random fluctuations are of variance \varepsilon^\alpha \gg \varepsilon, which
is larger than in the case of an integrable correlation function and in fact could
be arbitrarily close to \varepsilon^0 = 1. Moreover, they are conveniently represented as
a stochastic integral with respect to a fractional Brownian motion such that the
correlation function of dW_y^H also decays like y^{-\alpha} as y \to \infty.
Note that \kappa = 0 when V_1 = 0. In such a case, we can also sometimes exhibit a
limit for u_\varepsilon - u^*, which is no longer Gaussian. Let us assume that V_0 = V_1 = 0 and
that

V_2 = E\{ g_0^2 \Phi(g_0) \} = \int_{\mathbb{R}} g^2\, \Phi(g)\, \frac{e^{-g^2/2}}{\sqrt{2\pi}}\, dg > 0, \qquad (13.15)

in other words, \Phi is of Hermite rank 2. Defining \bar\alpha = 2\alpha, we then observe for
\alpha \in (0, \tfrac{1}{2}) [26] that

R(y) := E\{\varphi(x)\varphi(x+y)\} \sim \bar\kappa\, y^{-\bar\alpha} \ \text{as } y \to \infty, \quad \text{with } \bar\kappa = \tfrac{1}{2} \kappa_g^2 V_2^2, \qquad (13.16)

in the space of continuous functions C([0, 1]), where K(x, y) is as above and R_y^D
is a Rosenblatt process with index D [36]. The result holds for \bar\alpha \in (0, 1) and
thus mimics that obtained in (13.14), with the fractional Brownian motion replaced by
a non-Gaussian Rosenblatt process.
For small enough \varepsilon, the homogenized solution u^* captures the bulk of the solution
u_\varepsilon. The corrector attempts to capture some statistics of the term u_\varepsilon - u^*. The above
corrector result for integrable correlation R shows that for any \ell > 0

P\Big( \frac{u_\varepsilon(x) - u^*(x)}{\sqrt{\varepsilon}} \ge \ell \Big) \;\xrightarrow[\varepsilon \to 0]{}\; P\Big( v := \int_0^1 K(x, y)\, dW_y \ge \ell \Big). \qquad (13.18)
13 Propagation of Stochasticity in Heterogeneous Media and ... 503
P\big[ u_\varepsilon(x) \ge \ell \big] \approx P\big[ u^*(x) + \sqrt{\varepsilon}\, v \ge \ell \big]. \qquad (13.19)
The answer to the above question is in general negative for u^*(x) < \ell = O(1). This
review follows the presentation in [8], to which the reader is referred for additional
details, and presents some relevant results in the analysis of (13.19). Let us first
recall a few definitions.
-\inf_{y \in \Gamma^\circ} I(y) \le \liminf_{\varepsilon \to 0} \varepsilon \log P[Y_\varepsilon \in \Gamma] \le \limsup_{\varepsilon \to 0} \varepsilon \log P[Y_\varepsilon \in \Gamma] \le -\inf_{y \in \bar\Gamma} I(y).
Rather than directly analyzing (13.19), we consider the simpler limit for \ell >
E\{u_\varepsilon\}:

assuming such a limit exists. Note that such a limit implies that for all \delta > 0 and
\varepsilon < \varepsilon_0(\delta), we have

e^{-\frac{1}{\varepsilon}(I_{u_\varepsilon}(\ell) + \delta)} \le P[u_\varepsilon > \ell] \le e^{-\frac{1}{\varepsilon}(I_{u_\varepsilon}(\ell) - \delta)}.
In dimension d = 1, we verify from the explicit formulas above that the solution
u_\varepsilon(x) = g(Z_\varepsilon) for g(z_1, \ldots, z_4) = z_1 + z_2 z_3 z_4^{-1} and

Z_{\varepsilon, j} = \int_0^1 \frac{H_j(s)}{a_\varepsilon(s)}\, ds, \qquad H := (H_1, \ldots, H_4) = (f\, \mathbf{1}_{(0,x)},\ f,\ \mathbf{1}_{(0,x)},\ 1). \qquad (13.21)
Let us now describe the main steps of the process leading to a characterization of
the rate function I_{u_\varepsilon}(\ell), and first recall two results.
\Lambda^*(\ell) := \sup_{\lambda \in \mathbb{R}^n} \big( \lambda \cdot \ell - \Lambda(\lambda) \big).
The above theorem allows us to characterize the rate function I_{Z_\varepsilon} of the oscillatory
integrals in (13.21). From this, we deduce the rate function I_{u_\varepsilon} by means of the
following contraction principle:

Then I' is a good rate function controlling the LDP associated with the measures
\mu_\varepsilon \circ f^{-1} (where \mu_\varepsilon \circ f^{-1}(B) = P[f(Z_\varepsilon) \in B]).
Here, a_0 and b are chosen so that 0 < \lambda_1 < a_\varepsilon(x, \omega) < \lambda_2. For a random variable
Y, we define the logarithmic moment generating function

L(Y, \lambda) := \log E\, e^{\lambda Y}.
Then \Lambda \in C^1(\mathbb{R}^4) is a convex function such that, when a_\varepsilon is defined as above,

\lim_{\varepsilon \to 0} \varepsilon \log E\, e^{\frac{1}{\varepsilon} \lambda \cdot Z_\varepsilon} = \Lambda(\lambda).

Moreover, for fixed x, Z_\varepsilon satisfies a large deviation principle with good convex rate
function

\Lambda^*(\ell) := \sup_{\lambda \in \mathbb{R}^4} \big( \lambda \cdot \ell - \Lambda(\lambda) \big),

and u_\varepsilon satisfies a large deviation principle with good rate function
The above example and further examples presented in [8] show that a large
deviation principle may be obtained for u_\varepsilon, asymptotically characterizing the
probability P[u_\varepsilon > \ell]. However, the derivation of such a rate function is not
straightforward and depends on the whole law of the random coefficient a_\varepsilon,
not only on its correlation function as in the application of central limit results.
The above results strongly exploit the integral representation (13.5), which is
only available in the one-dimensional setting. Fewer results exist for the higher-dimensional
problem, concerning the propagation of stochasticity from a_\varepsilon(x, \omega) =
a(\tfrac{x}{\varepsilon}, \omega) to u_\varepsilon in the elliptic equation -\nabla \cdot (a_\varepsilon \nabla u_\varepsilon) = f on X \subset \mathbb{R}^d, augmented with,
say, Dirichlet conditions on \partial X.
As we already indicated, homogenization theory states that u_\varepsilon converges to a
deterministic solution u^* when the diffusion coefficient a(x, \omega) is a stationary,
ergodic function (bounded above and below by positive constants) [29, 31, 33].
Homogenization is obtained by introducing a (vector-valued) corrector \chi_\varepsilon such that
v_\varepsilon := u_\varepsilon - u^* - \varepsilon \chi_\varepsilon \cdot \nabla u^* converges to 0 in the strong H^1 sense. In the periodic
setting and away from boundaries, \varepsilon \chi_\varepsilon \cdot \nabla u^* also captures the main contribution of
the fluctuations u_\varepsilon - u^*, with v_\varepsilon = o(\varepsilon) in the L^2 sense [11]. In the random setting,
such results no longer hold. It remains true that v_\varepsilon converges to 0 in the H^1 sense,
but it is no longer necessarily of order O(\varepsilon) in the L^2 sense, as may be observed
in the one-dimensional setting. Moreover, \varepsilon \chi_\varepsilon \cdot \nabla u^* may no longer be the main
contribution to the error u_\varepsilon - u^*, as shown in, e.g., [27].
Concerning the random fluctuations $u_\varepsilon - u^*$, Yurinskii [37] gave the first statistical
error estimate, a nonoptimal rate of convergence to homogenization. Recent results,
borrowing from (Naddaf, A., Spencer, T.: Estimates on the variance of some
homogenization problems. Unpublished manuscript, 1998), provide optimal rates
of convergence of $u_\varepsilon$ to its deterministic limit [23–25] in the discrete and continuous
settings for random coefficients with short range (heuristically corresponding to a
correlation function $R$ with compact support). In [1, 16], rates of convergence for
fully nonlinear equations are also provided.
The limiting law of the random fluctuations $u_\varepsilon - \mathbb{E}\{u_\varepsilon\}$ is the object of current
active research. Using central limit results of [17], it was shown in [32] that the
(normalized) fluctuations of certain functionals of $u_\varepsilon$ were indeed Gaussian. Central
limit results for the effective conductance have been obtained in [12] in the setting
of small conductance contrast and using a martingale CLT method. Convergence of
the random fluctuations $u_\varepsilon - \mathbb{E}\{u_\varepsilon\}$ was obtained in the discrete setting in [27].
Rather than describing in detail the results obtained in this rapidly evolving field,
we consider in the following section a simpler multidimensional problem with a
random (zeroth-order) potential for which a more complete theory is available.
We come back to the above elliptic problem briefly in the section devoted to
applications to uncertainty quantification.
In this section, we consider linear equations with a random potential of the form
for a Schwartz kernel $G(x,y)$, which we assume is nonnegative, real valued, and
symmetric, so that $G(x,y) = G(y,x)$.
Then $u_\varepsilon$ converges, for instance in $L^2(X)$, to the unperturbed solution $u$.
We are next interested in a macroscopic characterization of the fluctuations $u_\varepsilon - u$.
These fluctuations may be decomposed into the superposition of a deterministic
corrector $\mathbb{E}\{u_\varepsilon\} - u$ and the random fluctuations $u_\varepsilon - \mathbb{E}\{u_\varepsilon\}$. The latter contribution
dominates when the Green's function $G(x,y)$ is a little more than square integrable
in the sense that
$$\int_X |G(x,y)|^{2+\eta}\, dy < \infty \quad \text{for some } \eta > 0.$$
The above constraint is satisfied for $P(x,D) = -\nabla \cdot a(x) \nabla + \rho(x)$, for $a(x)$
bounded and coercive and $\rho(x) \ge 0$ bounded, in dimension $d \le 3$.
Under sufficient conditions on the decorrelation properties of $q(x,\omega)$, we obtain
that $u_\varepsilon - u$ is well approximated by a central limit theory, as in the preceding section.
We describe the results obtained in [2]; see also [21]. The main idea is to decompose
$u_\varepsilon - u$ as a sum of stochastic integrals that can be analyzed explicitly, as in the one-
dimensional case, and negligible higher-order terms. We now present some details
of the derivation of such a decomposition.
We define $\tilde q_\varepsilon(x,\omega) = q(\frac{x}{\varepsilon}, \omega)$, where $q(x,\omega)$ is a mean-zero, strictly stationary
process defined on an abstract probability space $(\Omega, \mathcal{F}, \mathbb{P})$ [15]. We assume that
$q(x,\omega)$ has an integrable correlation function $R(x) = \mathbb{E}\{q(0)q(x)\}$. We also
assume that $q(x,\omega)$ is strongly mixing in the following sense. For two Borel sets
$A, B \subset \mathbb{R}^d$, we denote by $\mathcal{F}_A$ and $\mathcal{F}_B$ the sub-$\sigma$-algebras of $\mathcal{F}$ generated by the
field $q(x,\omega)$ for $x \in A$ and $x \in B$, respectively. Then we assume the existence of a
($\rho$-)mixing coefficient $\varphi(r)$ such that
The above equation may not be invertible for all realizations, even if $G$ is bounded.
We are not interested in the analysis of such possible resonances here and thus
modify the definition of our random field $q_\varepsilon$. Let $0 < \gamma < 1$. We denote by $\Omega_\varepsilon$
the set where $\|G \tilde q_\varepsilon G \tilde q_\varepsilon\|^2_{\mathcal{L}(L^2(X))} > \gamma$. We deduce from (13.27) that $\mathbb{P}(\Omega_\varepsilon) \le C \varepsilon^d$.
We thus modify $\tilde q_\varepsilon$ as
$$q_\varepsilon(\cdot, \omega) = \begin{cases} \tilde q_\varepsilon(\cdot, \omega), & \omega \in \Omega \setminus \Omega_\varepsilon, \\ 0, & \omega \in \Omega_\varepsilon, \end{cases} \tag{13.29}$$
and now assume that the above $q_\varepsilon$ is a reasonable representation of the physical
heterogeneous potential. Note that the process $q_\varepsilon$ is no longer necessarily stationary
or ergodic. However, since the set of bad realizations $\Omega_\varepsilon$ is small, all subsequent
calculations involving $q_\varepsilon$ can be performed using $\tilde q_\varepsilon$ up to a negligible correction.
Now, almost surely, $\|G q_\varepsilon G q_\varepsilon\|^2_{\mathcal{L}(L^2(X))} < \gamma < 1$, and $u_\varepsilon$ is well defined in $L^2(X)$
$\mathbb{P}$-a.s. Moreover, we observe that
Since $G q_\varepsilon G q_\varepsilon$ is small thanks to (13.27), we verify that $\mathbb{E}\{\|G q_\varepsilon G q_\varepsilon (u_\varepsilon - u)\|\} \le C \varepsilon^d$
is also small.
The analysis of $u_\varepsilon - u$ therefore boils down to that of $G q_\varepsilon G f$ and $G q_\varepsilon G q_\varepsilon G f$,
which are integrals of the stochastic field $q_\varepsilon$ of a similar nature to those obtained in the
preceding section. When (13.25) holds, we obtain that the former term dominates
the latter. It thus remains to analyze $G q_\varepsilon u$, which, up to a negligible contribution,
is the same as $G q(\frac{\cdot}{\varepsilon}, \omega) u$. This integral may be analyzed as in the one-dimensional
setting considered in the preceding section to obtain [2]:
Theorem 4. Let $q$ satisfy the hypotheses mentioned above. Then we have that
$$\frac{u_\varepsilon - u}{\varepsilon^{d/2}}(x) \;\Longrightarrow\; \sigma \int_X G(x,y)\, u(y)\, dW_y \quad (\varepsilon \to 0), \tag{13.31}$$
in distribution, weakly in space, where $\sigma^2 = \int_{\mathbb{R}^d} \mathbb{E}\{q(0)q(x)\}\, dx < \infty$ and $dW_y$ is
a standard multiparameter Wiener measure on $\mathbb{R}^d$.
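The central limit scaling in Theorem 4 can be checked in a minimal discrete one-dimensional analogue: for a short-range stationary field $q$, oscillatory integrals of the type $\varepsilon^{-1/2}\int_0^1 q(x/\varepsilon)\varphi(x)\,dx$ have variance close to $\sigma^2 \int_0^1 \varphi^2$. The moving-average field and test function below are illustrative choices, not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000          # n = 1/eps grid points on the unit interval
trials = 3000

x = np.arange(1, n + 1) / n
phi = np.sin(np.pi * x)          # smooth test function, int phi^2 = 1/2

# short-range stationary field: moving average of iid Gaussians,
# with R(0) = 1, R(+-1) = 1/2, so sigma^2 = sum_j R(j) = 2
xi = rng.standard_normal((trials, n + 1))
q = (xi[:, :-1] + xi[:, 1:]) / np.sqrt(2.0)

# discrete analogue of eps^{-1/2} * int_0^1 q(x/eps) phi(x) dx
X = q @ phi / np.sqrt(n)

print(X.var())   # close to sigma^2 * int phi^2 = 2 * 1/2 = 1
```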
random coefficient $q(x,\omega)$. We refer to [10] for such a theory when the operator $P$
is the square root of the Laplacian, which finds applications in cell biology and the
diffusion of molecules through heterogeneous membranes.
Assuming now that the random potential has a slowly decaying correlation
function, we expect the random fluctuations $u_\varepsilon - u$ to be significantly larger. Let $g_x$
be a stationary, centered Gaussian random field with unit variance and a correlation
function that has a heavy tail
for $\kappa_g > 0$ and some $0 < \alpha < d$. Let then $\Phi: \mathbb{R} \to \mathbb{R}$ be bounded (and sufficiently
small) so that
$$\mathbb{E}\{\Phi(g_0)\} = \int_{\mathbb{R}} \Phi(g)\, \frac{e^{-\frac{g^2}{2}}}{\sqrt{2\pi}}\, dg = 0, \qquad \sigma^2 = \kappa_g \big( \mathbb{E}\{g_0 \Phi(g_0)\} \big)^2 > 0.$$
The above weak-in-space convergence may often be improved. Consider, for
instance, the case of $P(x,D) = -\Delta + 1$ in dimension $d \le 3$. Then we can show
[6, Theorem 2.7] that $Y_\varepsilon := \varepsilon^{-\alpha/2}(u_\varepsilon - \mathbb{E}\{u_\varepsilon\})$ converges in distribution in the space
of functions $L^2(X)$ to its limit $Y$ given on the right-hand side of (13.32). This
more precise statement means that for any continuous map $f$ from $L^2(X)$ to $\mathbb{R}$,
we have that
$$\mathbb{E}\{f(Y_\varepsilon)\} \xrightarrow{\varepsilon \to 0} \mathbb{E}\{f(Y)\}, \tag{13.33}$$
so that, for instance, the $L^2$ norm of $Y_\varepsilon$ converges to that of $Y$. See [6] for some
generalizations of the above convergence result.
In the preceding section, the elliptic problems involved a highly oscillatory potential
$q_\varepsilon$ satisfying bounds independent of $\varepsilon$. We saw that the limit of the random solution
$u_\varepsilon$ was given by the solution $u$ obtained by replacing $q_\varepsilon$ by its ensemble average.
Such a centered potential is therefore not sufficiently strong to have an influence on
the leading term $u$ as $\varepsilon \to 0$.
In this and the following two sections, we consider the more strongly stochastic
case where the potential is rescaled such that it has an influence of order $O(1)$ on
the limit as $\varepsilon \to 0$, assuming the latter exists. In this section, we consider results
obtained by a diagrammatic expansion method that converges only for Gaussian
potentials $q_\varepsilon$. With this restriction in mind, consider the problem
$$\frac{\partial u_\varepsilon}{\partial t} + P(D) u_\varepsilon - \frac{1}{\varepsilon^\alpha}\, q\Big(\frac{x}{\varepsilon}\Big) u_\varepsilon = 0, \quad t \ge 0, \; x \in \mathbb{R}^d; \qquad u_\varepsilon(0,x) = u_0(x), \; x \in \mathbb{R}^d, \tag{13.34}$$
where $d \ge 1$ is the spatial dimension, $P(D) = (-\Delta)^{\frac{m}{2}}$ for some $m > 0$, and $q(x)$ is a
stationary, centered Gaussian field with correlation function $R(x) = \mathbb{E}\{q(0)q(x)\}$.
We assume the initial condition $u_0$ to be sufficiently smooth, deterministic, and compactly
supported.
The limit of $u_\varepsilon$ and the natural choice of $\alpha$ depend on the decorrelation properties
of $q$. When the correlation function of $q$ decays sufficiently rapidly, then averaging
effects are sufficiently efficient to imply that $u_\varepsilon$ converges to a deterministic solution
$u$. However, when the correlation function of $q$ decays slowly, stochasticity persists
in the limit, and $u$ may be shown to be the solution of a stochastic partial differential
equation with multiplicative noise. The threshold rate of decay of the correlation is
as follows. Define the power spectrum of $q$ as the Fourier transform (up to a factor
$(2\pi)^d$) of the correlation function
$$(2\pi)^d \hat R(\xi) = \int_{\mathbb{R}^d} e^{-i x \cdot \xi} R(x)\, dx. \tag{13.35}$$
Define
$$\rho := \int_{\mathbb{R}^d} \frac{\hat R(\xi)}{|\xi|^m}\, d\xi. \tag{13.36}$$
When the above quantity is finite, then $u_\varepsilon$ converges to the deterministic solution of
$$\Big( \frac{\partial}{\partial t} + P(D) + \rho \Big) u(t,x) = 0, \quad x \in \mathbb{R}^d, \; t > 0; \qquad u(0,x) = u_0(x), \; x \in \mathbb{R}^d. \tag{13.37}$$
When the above integral diverges (because of the behavior of the integrand at
$\xi = 0$), then $u_\varepsilon$ converges to a stochastic limit described in (13.43) below.
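Whether $\rho$ in (13.36) is finite is governed by the radial behavior $r^{d-1-m}$ of the integrand near $\xi = 0$. A small numerical sketch makes this visible; the Gaussian power spectrum below is a hypothetical choice for illustration only:

```python
import numpy as np

def trapz(y, x):
    # simple trapezoidal rule on a (possibly nonuniform) grid
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def rho_partial(d, m, eps):
    """Radial part of int_{eps<|xi|<1} R_hat(xi)/|xi|^m d xi (up to the
    angular constant) for a hypothetical power spectrum R_hat = exp(-|xi|^2)."""
    r = np.geomspace(eps, 1.0, 20_000)
    return trapz(np.exp(-r**2) * r**(d - 1 - m), r)

# d > m: removing the inner cutoff changes almost nothing (rho finite)
vals_conv = [rho_partial(3, 2, e) for e in (1e-2, 1e-3, 1e-4)]
# d < m: the partial integrals blow up like 1/eps (rho infinite)
vals_div = [rho_partial(1, 2, e) for e in (1e-2, 1e-3, 1e-4)]
print(vals_conv)
print(vals_div)
```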
In the case of convergence to a deterministic limit, we have the following result:
More precise rates of convergence are given in [4, Theorem 1]. A similar result of
convergence holds in the critical dimension $d = m$ with $R(x)$ integrable. In such
a case, $\varepsilon^\alpha$ has to be chosen as $\varepsilon^{\frac{m}{2}} |\ln \varepsilon|^{\frac{1}{2}}$ [4]. The same method shows that for any
choice of potential rescaling $\alpha < \frac{m}{2}$, $\rho$ is replaced by $\rho\, \varepsilon^{m-2\alpha}$ in (13.37), so that
$u_\varepsilon$ converges uniformly in time on compact intervals $(0,T)$ (with no restriction on
$T$ then) to the unperturbed solution $u$ of (13.37) with $\rho$ replaced by 0.
The residual stochasticity of $u_\varepsilon$ can be computed explicitly in the diagrammatic
expansion. Let us separate $u_\varepsilon - u$ as $u_\varepsilon - \mathbb{E}\{u_\varepsilon\}$ and $\mathbb{E}\{u_\varepsilon\} - u$. The latter contribution
is a deterministic corrector, which could be larger than the random fluctuations. We
refer to [4] for its size and how it may be computed. For the random fluctuations
$u_\varepsilon - \mathbb{E}\{u_\varepsilon\}$, we have the following convergence result:
in distribution and weakly in space, where $u_1$ is the unique solution of the following
stochastic partial differential equation (SPDE) with additive noise:
$$\Big( \frac{\partial}{\partial t} + P(D) \Big) u_1(t,x) = u \dot W, \quad x \in \mathbb{R}^d, \; t > 0; \qquad u_1(0,x) = 0, \; x \in \mathbb{R}^d. \tag{13.40}$$
Here, we have defined the normalizing constant $c_p$, an explicit ratio of Gamma
functions given in [4].
The proof of these results may be found in [4], with some extensions in [5]. The
convergence result in Theorem 6 was extended to the case of Schrödinger equations,
with $\frac{\partial}{\partial t}$ replaced by $i \frac{\partial}{\partial t}$, to arbitrary times $0 < t < T < \infty$ in [39], using the
unitarity of the unperturbed solution operator and the decomposition introduced in
[20].
The behavior of $u_\varepsilon$ is different when the correlation function decays slowly or when
$d < m$. When $p$ tends to $m$, we observe that the random fluctuations (13.39) become
of order $O(1)$, and we thus expect the limit of $u_\varepsilon$, when it exists, to be stochastic.
Theorem 8. Let either $m > d$ and $R(x)$ be an integrable function, in which case
we set $p = d$, or let $R$ be a bounded function such that $R(x) \sim \kappa |x|^{-p}$ as $|x| \to \infty$,
with $0 < p < m$. Let us choose $\alpha = \frac{p}{2}$.
Then there exists a solution to (13.34) with $u_\varepsilon(t) \in L^2(\Omega \times \mathbb{R}^d)$ uniformly in $0 < \varepsilon <
\varepsilon_0$ and $t \in [0,T]$ for all $T > 0$. Moreover, we have the convergence result
$$u_\varepsilon \;\Longrightarrow\; u \quad (\varepsilon \to 0). \tag{13.42}$$
The derivation of the above result is presented in [3], with some extensions in [5].
In low dimensions $d < m$, and in arbitrary dimension $d \ge m$ when the correlation
function decays sufficiently slowly that $0 < p < m$, we observe that the solution
$u_\varepsilon$ remains stochastic in the limit $\varepsilon \to 0$. Note that we are in a situation where
the integral in (13.36) is infinite. A choice of $\alpha = \frac{m}{2}$ would generate too large
a random potential. Smaller potentials, but with a heavier tail, corresponding to
$\alpha = \frac{p}{2} < \frac{m}{2}$, generate an influence of order $O(1)$ on the (limiting) solution $u$.
Any choice $\alpha < \frac{p}{2}$ would again lead $u_\varepsilon$ to converge (in the strong $L^2(\Omega \times \mathbb{R}^d)$ sense
then) to the unperturbed solution $u$ of (13.43) with $\sigma$ replaced by 0.
Note that it is not obvious that the (Stratonovich) product $u \dot W$ is defined a priori:
$\dot W$ is a distribution, and as a consequence, $u$ is also singular. It turns out that in order
to make sense of a solution to (13.43), we need either a sufficiently low dimension
$d$, ensuring that $e^{-tP(D)}$ is an efficient smoothing operator, or a sufficiently slow
decay $p < m$, ensuring that $\dot W$, with statistics recalled in (13.41), is sufficiently
regular. When $d < m$ or $m < p$, then the product of the two distributions $u \dot W$
in (13.43) cannot be defined as a distribution. From a physical point of view, we may
not need such SPDE models, since $u_\varepsilon$ then converges to the deterministic solution
in (13.37), with its random fluctuations described by the well-defined SPDE with
additive noise (13.40); see [18] for a general treatment and references to SPDEs.
As for the case of convergence to a deterministic solution, similar results may be
obtained for the Schrödinger equation with $\frac{\partial}{\partial t}$ above replaced by $i \frac{\partial}{\partial t}$; see [30, 38].
The results presented above extend to the setting of time-dependent Gaussian
potentials
$$\frac{\partial u_\varepsilon}{\partial t} + P(D) u_\varepsilon - \frac{1}{\varepsilon^\alpha}\, q\Big(\frac{t}{\varepsilon}, \frac{x}{\varepsilon}\Big) u_\varepsilon = 0, \quad t \ge 0, \; x \in \mathbb{R}^d; \qquad u_\varepsilon(0,x) = u_0(x), \; x \in \mathbb{R}^d, \tag{13.44}$$
with a correlation function satisfying
$$R(t,x) \sim \frac{\kappa}{|x|^p\, t^b} \quad \text{as } |x|, t \to \infty.$$
We restrict ourselves to the setting $0 < b < 1$ and $0 < p < d$, with formally $b = 1$
when $R$ is integrable in time (uniformly in space) and $p = d$ when $R$ is integrable in
space (uniformly in time). Then, when $p$ and $b$ are sufficiently small, we again obtain
that $u_\varepsilon$ converges to the solution of an SPDE, while it converges to a homogenized,
deterministic solution otherwise.
More precisely, when $bm + p < m$, then we should choose $\alpha = \frac{1}{2}(p + bm)$, and $u_\varepsilon$
then converges to the solution of an SPDE of the form (13.43), with $\dot W$ replaced by a spatio-temporal
fractional Brownian motion with asymptotically the same correlation function as
$R(t,x)$, i.e., such that
$$\mathbb{E}\{\dot W(s,x)\, \dot W(s+t, x+y)\} = \frac{c_{p,b}}{|y|^p\, |t|^b}. \tag{13.45}$$
In the opposite, homogenized regime, $u_\varepsilon$ converges to the deterministic solution of (13.37) with
$$\rho = \lim_{\varepsilon \to 0} \varepsilon^{d-2\alpha} \int_0^\infty \int_{\mathbb{R}^d} e^{-t|\xi|^m}\, \hat R\Big(\frac{t}{\varepsilon}, \varepsilon \xi\Big)\, d\xi\, dt,$$
and the random fluctuations are still given by the solution $u_1$ of the SPDE (13.40), with $\dot W$ the spatio-temporal fractional
Brownian motion given by (13.45).
We refer to [5] for additional details on these results, which use a diagrammatic
expansion that can be applied only to Gaussian potentials. Many results remain valid
for non-Gaussian potentials as well; see, e.g., [28, 34, 35].
$$\partial_x \Big( a\big(\tfrac{x}{\varepsilon}, \omega\big)\, \partial_x u_\varepsilon \Big) = f \quad \text{in } (0,1),$$
with a fluctuation theory given by (13.14) (or (13.9)): $u_\varepsilon(x) = u^*(x) +
\varepsilon^{\delta/2} \int_0^1 K(x,t)\, dW_t^H + r_\varepsilon(x)$, where $\varepsilon^{-\delta/2} r_\varepsilon$ converges to 0 (in probability in the
uniform norm).
From this, we deduce asymptotic results of the form
$$\mathbb{P}\Big[ u_\varepsilon(x) \ge u^*(x) + \varepsilon^{\delta/2}\, \ell \Big] = \mathbb{P}\Big[ \int_0^1 K(x,t)\, dW_t^H \ge \ell \Big] + o(1). \tag{13.46}$$
We also observed in the one-dimensional case that such results no longer hold
for $\varepsilon^{\delta/2} \ell$ of order $O(1)$, where more complex large deviation results need to be
developed.
How such asymptotic results may be used for computational purposes is less
clear. Consider again the above one-dimensional model. We can formally write the
limiting model (with $dW_t^H$ formally written as $\dot W^H dt$)
$$\partial_x \frac{a^*}{1 + \varepsilon^{\delta/2} \dot W^H}\, \partial_x \tilde u_\varepsilon = f, \qquad (a^*)^{-1} = \mathbb{E}\{a^{-1}\},$$
and verify that, asymptotically, $u_\varepsilon$ and $\tilde u_\varepsilon$ have the same limiting random fluctuations.
The above model is heuristic, and $\varepsilon^{\delta/2} \dot W^H$ should be appropriately smoothed out
and truncated to preserve the ellipticity of the diffusion coefficient in the above
equation, without significantly changing the law of the fluctuations for $\varepsilon \ll 1$.
From this formal expression preserving the leading contribution to the stochas-
ticity of $u_\varepsilon$, we can draw two conclusions: (i) the upscaling of the diffusion coefficient
involves a small-scale limit taking the form of white noise (when $H = \frac{1}{2}$) in the
short-range case or colored noise (when $H > \frac{1}{2}$) in the long-range case, and (ii) the
small-scale structure of the homogenized coefficient makes it very difficult to solve
such equations by polynomial chaos expansions, since the rapid spatial fluctuations
of $\dot W^H(x)$ require a very large number of polynomials in the expansion.
Let us come back to the general problem (13.2) presented in the introduction and
the propagation of stochasticity from $q_0 = q_0(x, \theta)$ and $q_\varepsilon = q_\varepsilon(x, \omega)$ to $u_\varepsilon =
u_\varepsilon(x, \theta, \omega)$. We assume here that $q_0$ corresponds to the low-frequency component of
the random coefficients and $q_\varepsilon$ to their high-frequency component, which means that
$\theta$ is relatively low dimensional, whereas $\omega$ lives in a high-dimensional space. The
asymptotic models presented earlier in this chapter allow us to estimate the influence
of the parameters $\omega$ on the distribution of $u_\varepsilon$. Unfortunately, the latter propagation
can be obtained theoretically only for a very limited class of problems.
When such theoretical results are not available, we would like to devise com-
putational tools that respect the above multi-scale decomposition. One such model
consists of combining a PCE method to model the effect of the random variables $\theta$
with a Monte Carlo (MC) method to estimate that of $\omega$.
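A minimal sketch of such a hybrid strategy on a toy model $u(\theta,\omega)$ (all modeling choices below are illustrative, not from the chapter): for each Monte Carlo draw of the small-scale variables $\omega$, the dependence on the low-dimensional $\theta$ is projected onto a probabilists' Hermite polynomial chaos basis by Gauss quadrature, and the PCE coefficients are averaged over the MC samples.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

rng = np.random.default_rng(2)

def u(theta, omega):
    # hypothetical model: smooth dependence on the low-dimensional theta,
    # plus a weak averaged effect of the n small-scale variables omega_j
    return np.exp(theta) * (1.0 + omega.mean() / 10.0)

n, K, P = 400, 200, 4                  # small scales, MC samples, PCE order
nodes, w = hermegauss(12)              # Gauss nodes for weight exp(-x^2/2)
w = w / np.sqrt(2.0 * np.pi)           # normalize to the N(0,1) measure

coeffs = np.zeros(P + 1)
for _ in range(K):                     # Monte Carlo over the small scales
    omega = rng.standard_normal(n)
    vals = u(nodes, omega)
    for k in range(P + 1):
        ck = np.zeros(k + 1)
        ck[k] = 1.0
        hk = hermeval(nodes, ck)       # He_k evaluated at the nodes
        # projection E{u He_k} / E{He_k^2}, with E{He_k^2} = k!
        coeffs[k] += np.sum(w * vals * hk) / math.factorial(k)
coeffs /= K

print(coeffs[:2])   # both approach E{exp(theta)} = sqrt(e) ~ 1.6487
```

Each MC sample requires only a few deterministic solves (one per quadrature node), which is what makes the decomposition attractive when $\omega$ is high dimensional.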
Although the theoretical asymptotic results mentioned in the preceding sections
involve technical derivations that are problem specific, a central idea underlying
the validity of several of them is the following type of inequality, which we will
refer to as an Efron-Stein inequality. We follow here the presentation in [13]. Let
us assume that $\omega = (\omega_1, \ldots, \omega_n)$ with $n$ large, which is the case in the asymptotic
regimes considered earlier, where typically $n \approx \varepsilon^{-d}$ in dimension $d$. Now let us
assume that the variables $\omega_j$ are independent and that they each have an influence
on the solution $u$ proportional to their physical volume of order $\varepsilon^d \approx n^{-1}$ and hence
small. This means that
$$\sup_{\omega_i'} \big| u(\omega_1, \ldots, \omega_n) - u(\omega_1, \ldots, \omega_{i-1}, \omega_i', \omega_{i+1}, \ldots, \omega_n) \big| \le \frac{c_i}{n}, \quad 1 \le i \le n.
\tag{13.47}$$
Setting $Z = u(\omega_1, \ldots, \omega_n)$, the Efron-Stein inequality then yields
$$\mathrm{Var}(Z) \le \frac{c^2}{2n}, \qquad c^2 = \frac{1}{n} \sum_{j=1}^n c_j^2. \tag{13.48}$$
The above result is consistent with those presented in (13.9) in the one-
dimensional setting and in Theorem 4, although the latter results do not require a
stochastic model involving independent random variables. The results obtained for
long-range random fluctuations in Theorem 5, for instance, correspond to random
variables that have an effect that is larger than their physical volume. However,
another, similar averaging mechanism ensures that the global effect of the $n$ variables
decays as $n$ increases.
Results such as (13.48) do not provide an explicit characterization of the variance
of $u(\omega)$ but rather an upper bound. In general, the effect of these $n$ random variables
is not straightforward to describe and hence needs to be obtained by computational
means. In settings where an estimate such as (13.48) may be obtained, we claim
that the Monte Carlo method is particularly well suited. Indeed, for $\omega^{(k)}$, $1 \le k \le K$,
realizations of $\omega$ and $Z_k = u(\omega^{(k)})$, we obtain for the empirical mean
$S = \frac{1}{K} \sum_{k=1}^K Z_k$, using (13.48), that
$$\mathrm{Var}(S) \le \frac{c^2}{2Kn},$$
which in other words is small even for moderately large values of $K$, provided that $n$ is
large.
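The $c^2/(2Kn)$ scaling can be observed on a toy functional with bounded differences $c_i/n$, $c_i = 2$ (all choices below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, reps = 100, 40, 1000

def u(omega):
    # tanh takes values in (-1, 1), so the sup-difference over one
    # coordinate of the mean is at most 2/n, i.e., c_i = 2
    return np.mean(np.tanh(omega), axis=-1)

# reps independent MC estimators, each averaging K evaluations of u
omega = rng.standard_normal((reps, K, n))
S = u(omega).mean(axis=-1)

bound = 2.0**2 / (2.0 * K * n)          # c^2/(2Kn)
print(S.var(), bound)                   # empirical variance below the bound
```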
Coming back to $u = u(x; \theta, \omega)$ as the solution of an equation of the form
5 Conclusions
The first part of this chapter reviewed several macroscopic models that describe
the propagation of stochasticity from (highly oscillatory) random coefficients to
random solutions of partial differential equations. Although not exhaustive, the
set of examples covered above displays the main features of the problem: (i)
the explicit, analytic derivation of a macroscopic model is a difficult task that has
been completed for a relatively small set of examples, and (ii) the propagation of
stochasticity depends on the correlation function of the random fluctuations. In the
presence of long-range fluctuations, stochasticity may still dominate in the limit
of vanishing correlation length and the macroscopic model takes the form of a
stochastic partial differential equation. For short-range fluctuations in some models
and independent of the correlation function in other models, averaging (law of large
numbers) effects dominate the propagation process, and the limiting model takes
the form of a homogenized, effective medium equation. The residual stochasticity
may then be characterized as a central limit correction to homogenization when
randomness is sufficiently short range. In the presence of long-range fluctuations,
the random fluctuations of the solutions are often described by means of integrals of
fractional white noise.
The advantages of analytically tractable macroscopic models in uncertainty
quantification are clear. When they can be derived, such models provide an explicit
expression for the probability density of many functionals of the PDE solution of
interest. Moreover, these explicit models typically involve a very small number of
parameters, such as the integral or the asymptotic decay of the correlation function.
A notable exception is the case of large deviation results (see, e.g., Theorem 3),
which involve the fine structure of the random process. Finally, these models show
that the random fluctuations of PDE solutions typically involve functionals of
(fractional) white noise.
The presence of such fractal processes is, however, problematic when the
propagation of stochasticity needs to be performed computationally: white noise
requires a large number of degrees of freedom to be modeled accurately. In such a
situation, the last part of this chapter presented a combined PCE-MC computational
framework in which large-scale random coefficients are treated by polynomial chaos
expansions, while the large number of small-scale coefficients is handled by Monte
Carlo. In the event that each small-scale random coefficient has a small influence
on the final solution, concentration (Efron-Stein) inequalities, which are consistent
with a central limit scaling, show that the collective influence of these random
coefficients has a relatively small variance that can be accurately predicted by a
reasonable number of MC samples.
References
1. Armstrong, S.N., Smart, C.K.: Quantitative stochastic homogenization of elliptic equations in nondivergence form. Arch. Ration. Mech. Anal. 214, 867–911 (2014)
2. Bal, G.: Central limits and homogenization in random media. Multiscale Model. Simul. 7(2), 677–702 (2008)
3. Bal, G.: Convergence to SPDEs in Stratonovich form. Commun. Math. Phys. 212(2), 457–477 (2009)
4. Bal, G.: Homogenization with large spatial random potential. Multiscale Model. Simul. 8(4), 1484–1510 (2010)
5. Bal, G.: Convergence to homogenized or stochastic partial differential equations. Appl. Math. Res. Express 2011(2), 215–241 (2011)
6. Bal, G., Garnier, J., Gu, Y., Jing, W.: Corrector theory for elliptic equations with long-range correlated random potential. Asymptot. Anal. 77, 123–145 (2012)
7. Bal, G., Garnier, J., Motsch, S., Perrier, V.: Random integrals and correctors in homogenization. Asymptot. Anal. 59(1–2), 1–26 (2008)
8. Bal, G., Ghanem, R., Langmore, I.: Large deviation theory for a homogenized and corrected elliptic ODE. J. Differ. Equ. 251(7), 1864–1902 (2011)
9. Bal, G., Gu, Y.: Limiting models for equations with large random potential: a review. Commun. Math. Sci. 13(3), 729–748 (2015)
10. Bal, G., Jing, W.: Corrector theory for elliptic equations in random media with singular Green's function. Application to random boundaries. Commun. Math. Sci. 9(2), 383–411 (2011)
11. Bensoussan, A., Lions, J.-L., Papanicolaou, G.C.: Homogenization in deterministic and stochastic problems. In: Symposium on Stochastic Problems in Dynamics, University of Southampton, Southampton, 1976, pp. 106–115. Pitman, London (1977)
12. Biskup, M., Salvi, M., Wolff, T.: A central limit theorem for the effective conductance: linear boundary data and small ellipticity contrasts. Commun. Math. Phys. 328, 701–731 (2014)
13. Boucheron, S., Lugosi, G., Bousquet, O.: Concentration inequalities. In: Advanced Lectures on Machine Learning. Lecture Notes in Computer Science, vol. 3176, pp. 208–240. Springer, Berlin (2004)
14. Bourgeat, A., Piatnitski, A.: Estimates in probability of the residual between the random and the homogenized solutions of one-dimensional second-order operator. Asymptot. Anal. 21, 303–315 (1999)
15. Breiman, L.: Probability. Classics in Applied Mathematics, vol. 7. SIAM, Philadelphia (1992)
16. Caffarelli, L.A., Souganidis, P.E.: Rates of convergence for the homogenization of fully nonlinear uniformly elliptic PDE in random media. Invent. Math. 180, 301–360 (2010)
17. Chatterjee, S.: Fluctuations of eigenvalues and second order Poincaré inequalities. Probab. Theory Relat. Fields 143, 1–40 (2009)
18. Da Prato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Cambridge University Press, Cambridge (2008)
19. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Applications of Mathematics. Springer, New York (1998)
20. Erdős, L., Yau, H.T.: Linear Boltzmann equation as the weak coupling limit of a random Schrödinger equation. Commun. Pure Appl. Math. 53(6), 667–735 (2000)
21. Figari, R., Orlandi, E., Papanicolaou, G.: Mean field and Gaussian approximation for partial differential equations with random coefficients. SIAM J. Appl. Math. 42(5), 1069–1077 (1982)
22. Ghanem, R.G.: Hybrid stochastic finite elements and generalized Monte Carlo simulation. Trans. ASME 65, 1004–1009 (1998)
23. Gloria, A., Otto, F.: An optimal variance estimate in stochastic homogenization of discrete elliptic equations. Ann. Probab. 39, 779–856 (2011)
24. Gloria, A., Otto, F.: An optimal error estimate in stochastic homogenization of discrete elliptic equations. Ann. Appl. Probab. 22, 1–28 (2012)
25. Gloria, A., Otto, F.: An optimal variance estimate in stochastic homogenization of discrete elliptic equations. ESAIM Math. Model. Numer. Anal. 48, 325–346 (2014)
26. Gu, Y., Bal, G.: Random homogenization and convergence to integrals with respect to the Rosenblatt process. J. Differ. Equ. 253(4), 1069–1087 (2012)
27. Gu, Y., Mourrat, J.-C.: Scaling limit of fluctuations in stochastic homogenization. Probab. Theory Relat. Fields (2015, to appear)
28. Hairer, M., Pardoux, E., Piatnitski, A.: Random homogenization of a highly oscillatory singular potential. Stoch. Partial Differ. Equ. 1, 572–605 (2013)
29. Jikov, V.V., Kozlov, S.M., Oleinik, O.A.: Homogenization of Differential Operators and Integral Functionals. Springer, New York (1994)
30. Komorowski, T., Nieznaj, E.: On the asymptotic behavior of solutions of the heat equation with a random, long-range correlated potential. Potential Anal. 33(2), 175–197 (2010)
31. Kozlov, S.M.: The averaging of random operators. Mat. Sb. (N.S.) 109, 188–202 (1979)
32. Nolen, J.: Normal approximation for a random elliptic equation. Probab. Theory Relat. Fields 159, 661–700 (2014)
33. Papanicolaou, G.C., Varadhan, S.R.S.: Boundary value problems with rapidly oscillating random coefficients. In: Random Fields, Esztergom, 1979, Volumes I and II. Colloquia Mathematica Societatis János Bolyai, vol. 27, pp. 835–873. North-Holland, Amsterdam/New York (1981)
34. Pardoux, E., Piatnitski, A.: Homogenization of a singular random one-dimensional PDE. GAKUTO Int. Ser. Math. Sci. Appl. 24, 291–303 (2006)
35. Pardoux, E., Piatnitski, A.: Homogenization of a singular random one-dimensional PDE with time-varying coefficients. Ann. Probab. 40, 1316–1356 (2012)
36. Taqqu, M.S.: Weak convergence to fractional Brownian motion and to the Rosenblatt process. Probab. Theory Relat. Fields 31, 287–302 (1975)
37. Yurinskii, V.V.: Averaging of symmetric diffusion in a random medium. Siberian Math. J. 27(4), 603–613 (1986). English translation of: Sibirsk. Mat. Zh. 27(4), 167–180 (1986, Russian)
38. Zhang, N., Bal, G.: Convergence to SPDE of the Schrödinger equation with large, random potential. Commun. Math. Sci. 5, 825–841 (2014)
39. Zhang, N., Bal, G.: Homogenization of a Schrödinger equation with large, random potential. Stoch. Dyn. 14, 1350013 (2014)
Polynomial Chaos: Modeling, Estimation,
and Approximation 14
Roger Ghanem and John Red-Horse
Abstract
Polynomial chaos decompositions (PCE) have emerged over the past three
decades as a standard among the many tools for uncertainty quantification. They
provide a rich mathematical structure that is particularly well suited to enabling
probabilistic assessments in situations where interdependencies between physi-
cal processes or between spatiotemporal scales of observables constitute credible
constraints on system-level predictability. Algorithmic developments exploiting
their structural simplicity have permitted the adaptation of PCE to many of
the challenges currently facing prediction science. These include requirements
for large-scale high-resolution computational simulations implicit in modern
applications, non-Gaussian probabilistic models, and non-smooth dependen-
cies and for handling general vector-valued stochastic processes. This chapter
presents an overview of polynomial chaos that underscores their relevance
to problems of constructing and estimating probabilistic models, propagating
them through arbitrarily complex computational representations of underlying
physical mechanisms, and updating the models and their predictions as additional
constraints become known.
Keywords
Polynomial chaos expansions · Stochastic analysis · Stochastic modeling ·
Uncertainty quantification
R. Ghanem
Department of Civil and Environmental Engineering, University of Southern California,
Los Angeles, CA, USA
e-mail: ghanem@usc.edu
J. Red-Horse
Engineering Sciences Center, Sandia National Laboratories, Albuquerque, NM, USA
e-mail: jrredho@sandia.gov
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
2 Mathematical Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
3 Polynomial Chaos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
4 Representation of Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
5 Polynomial Chaos with Random Coefficients: Model Error . . . . . . . . . . . . . . . . . . . . . . . 534
6 Adapted Representations of PCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
7 Stochastic Galerkin Implementation of PCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
7.1 Nonintrusive Evaluation of the Stochastic Galerkin Solution . . . . . . . . . . . . . . . . . 539
7.2 Adapted Preconditioners for the Stochastic Galerkin Equations . . . . . . . . . . . . . . . 541
8 Embedded Quadratures for Stochastic Coupled Physics . . . . . . . . . . . . . . . . . . . . . . . . . . 542
9 Constructing Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
1 Introduction
science and engineering were successfully tackled, and a very strong foundation
for nonlinear system theory was forged [19, 48, 49, 51, 52, 65, 75].
Extensions of Wiener's formalism to non-Gaussian measures have been pursued
with some success. In particular, those ideas were readily extended to functionals
of Lévy processes [50, 78, 79, 104]. Methods based on Jacobi fields [58] and
bi-orthogonal expansions [3, 53] have provided a systematic approach for these
extensions to other processes [12, 54].
Interest in treating problems with random coefficients, as opposed to problems
with random external forcing, was initially motivated by waves propagating in
heterogeneous media [13, 16, 82]. This work relied upon perturbation methods
and low-order Taylor expansions as essential means for treating the nonlinearities
introduced by parametric dependence. The adaptation of these expansions to
general random operators soon followed in the form of stochastic Green's functions
[1, 2] and Neumann expansions of the inverse operator [107]. Integration of these
approaches into finite element formalisms enabled their application to a much wider
range of problems [11, 44, 45, 57, 63, 80, 87]. In spite of their algorithmic and
computational simplicity, issues of convergence and statistical significance limited
the applicability of these methods to problems with relatively small dependence on
the random coefficients. It is critical to note here that during this time period, Monte
Carlo sampling methods were challenged with limited computational resources,
thus motivating the development of alternative formalisms.
The adaptation of the Wiener-based approaches to problems of parametric
dependence required a change of perspective from Wiener and Volterra's initial
efforts. Parametric noise as motivated by the initial physical problems described
above fluctuates in space and does not exhibit the non-anticipative behavior, i.e.,
the clear distinction between past and future, implicit in time-dependent processes.
Pioneering efforts in this direction were carried out by Ghanem and Spanos [40]
where the Karhunen-Loève expansion of a correlated Gaussian process was used
to embed the problem in an infinite dimensional Gaussian space. This embedding
enabled the use of the polynomial chaos development of Wiener to represent the var-
ious stochastic processes involved. That work also introduced a Galerkin projection
scheme to integrate polynomial chaos expansions into an approximation formalism
for algebraic and differential equations with random coefficients. Extensions of that
work with basis adaptation [56], hybrid expansions [32], geometric uncertainties
[35], dynamic model reduction [39], and nonlinear partial differential equations and
coupled physics [24, 33, 36, 60] were also developed around that time. Given their
reliance on Wiener's discretization of Brownian motion, these initial applications of
polynomial chaos expansions to parametric uncertainty were limited to parameters
with Gaussian measure or to models that are simple transformations of Gaussian
processes, such as lognormal processes. The Karhunen-Loève expansion provides
a natural discretization of a stochastic process into a denumerable set of jointly
distributed random variables. Other settings with denumerable random variables are
common in science and engineering and are typically associated with a collection
of generally dependent scalar or vector random variables used to parameterize
526 R. Ghanem and J. Red-Horse
2 Mathematical Setup
3 Polynomial Chaos
We adopt a polynomial chaos expansion (PCE) [40] form for the uncertainty models
as they provide the means to develop flexible representations of random variables
that can be easily constrained either by observations or by governing equations.
These representations serve also as highly efficient generators of random vari-
ables, permitting on-the-fly sampling from complex distributions and the real-time
implementation of Approximate Bayesian Computation (ABC) methods [26, 95].
A polynomial chaos decomposition of a random variable, X , involves two stages.
In the first of these stages, the input set of system random variables, $X \in G$, is
described as a function of an underlying set of basic random variables, $\xi \in \mathbb{R}^d$, which
we refer to as the germ of the expansion [4, 40, 84]. The probability description
of $\xi$ is assumed to be given by a probability density function, $\rho$, and the functional
dependence is denoted by $X = X(\xi)$. The second stage consists of developing this
general functional dependence in a polynomial expansion with respect to the germ
that is convergent in $L^2(\mathbb{R}^d, G; \rho)$ and estimating the corresponding expansion
coefficients. The structure of this $L^2$ space was detailed elsewhere [84].
The polynomial chaos expansion (PCE) of a random variable $X$ thus takes the
form

$$X = \sum_{|\alpha| \ge 0} X_\alpha \, \Psi_\alpha(\xi), \qquad (14.1)$$

where $\alpha = (\alpha_1, \ldots, \alpha_d)$, $|\alpha| = \sum_{i=1}^{d} \alpha_i$, and $\{\Psi_\alpha\}$ are polynomials orthonormal with
respect to the measure of $\xi$. The coefficients can thus be readily expressed as

$$X_\alpha = \int_{\mathbb{R}^d} X(\xi)\, \Psi_\alpha(\xi)\, \rho(\xi)\, d\xi, \qquad (14.2)$$
where the mathematical expectation indicated by this last integral can be shown to
be the scalar product in $L^2(\mathbb{R}^d, G; \rho)$ between $X$ and $\Psi_\alpha$ and will be denoted as
$\langle X \Psi_\alpha \rangle$. While the case where $\xi$ is a discretization of the Brownian motion would
align this development with Wiener's theory, the infinite dimensional problem is
usually discretized and truncated in a preprocessing step as part of modeling and
data assimilation. This preprocessing obviates the need for an infinite dimensional
analysis, allowing far simpler concepts from multivariate polynomial approximation
to be utilized. The summation in the above expansion is typically truncated after
some polynomial order $p$. To be consistent with current research on efficient PCE
representations, however, we will replace such a truncation with a projection on
a subspace spanned by an indexing set $I$ and thus replace Eq. (14.1) with the
following equation:

$$X = \sum_{\alpha \in I} X_\alpha \, \Psi_\alpha(\xi). \qquad (14.3)$$
$$h(U, X(\xi)) = 0, \qquad h : H \times \mathbb{R}^d \mapsto \mathbb{R}^m,$$

$$Q = q(U(\xi)), \qquad q : \mathbb{R}^d \mapsto \mathbb{R}^n. \qquad (14.4)$$
At first sight it would seem that the QoI, $Q$, should be expressed in terms of $\xi$.
This would require the characterization of a nonlinear multivariate map, a task
closely associated with the curse of dimensionality.
Capitalizing on the simplicity of $Q$ relative to $X$ and $U$ has been shown to
lead to adapted algorithms that significantly enhance the efficiency of associated
computational models. This topic will be further elaborated in a separate section
below.
Two algorithmic methodologies have been pursued for characterizing the PCE of
U and Q. In a first of these, an approximation for U is substituted into the governing
equations, and the resulting error is constrained to be orthogonal to the operator
input space which yields the following equation:
$$\left\langle \Psi_k \, h\!\left( \sum_{j \in I} U_j \Psi_j, \, X \right) \right\rangle = 0, \qquad \forall k \in I. \qquad (14.6)$$
This last equation is a system of coupled nonlinear equations for the coefficients
$U_j$ in the PCE representation of $U$. Linearizing $h$ with respect to $U$, which is
typically done as part of nonlinear solution algorithms, results in the following,
where $l$ denotes the linearized operator:

$$\sum_{j \in I} \left\langle \Psi_k \Psi_j \, l(U_j, X) \right\rangle = 0, \qquad \forall k \in I. \qquad (14.7)$$
$$Q_\alpha = \langle Q \, \Psi_\alpha \rangle, \qquad \alpha \in I, \qquad (14.10)$$

$$Q_\alpha \approx \sum_{r=1}^{n_q} Q(\xi^{(r)}) \, \Psi_\alpha(\xi^{(r)}) \, w_r, \qquad (14.11)$$

where $\{\xi^{(r)}\}$ are the $n_q$ quadrature points in $\mathbb{R}^d$ and $\{w_r\}$ their associated weights.
The number of quadrature points required in the above approximation depends
$$X = \sum_{i=1}^{\infty} \xi_i \, X_i, \qquad (14.13)$$

$$X = \sum_{i=1}^{\infty} \sqrt{\lambda_i} \, \xi_i \, e_i, \qquad (14.14)$$

$$\xi_i = \frac{1}{\sqrt{\lambda_i}} \, (e_i, X), \qquad (14.16)$$
random vector [22, 23]. For the special case where the Hilbert space $G$ is identified
with $L^2(T)$ for some subset $T$ of a metric space, the eigenproblem specified in
Eq. (14.15) is replaced by

$$\int_T k(t,s) \, e_i(s) \, ds = \lambda_i \, e_i(t), \qquad t \in T, \qquad (14.17)$$
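A discretized analogue of the eigenproblem in Eq. (14.17) can be sketched as follows (a minimal Nyström-type discretization on a uniform grid of $[0,1]$, with an exponential covariance kernel chosen purely for illustration):

```python
import numpy as np

# Midpoint discretization of T = [0, 1]
n = 200
h = 1.0 / n
t = (np.arange(n) + 0.5) * h

# Exponential covariance kernel (illustrative choice), correlation length L
L = 0.3
K = np.exp(-np.abs(t[:, None] - t[None, :]) / L)

# Discrete eigenproblem approximating  int_T k(t,s) e_i(s) ds = lambda_i e_i(t)
lam, V = np.linalg.eigh(K * h)
lam, V = lam[::-1], V[:, ::-1]          # sort eigenpairs in descending order
V = V / np.sqrt(h)                      # normalize: h * sum_j e_i(t_j)^2 = 1

# Truncated KL representation, cf. Eq. (14.14): X ~= sum sqrt(lam_i) xi_i e_i
m = 10
xi = np.random.default_rng(0).standard_normal(m)
sample = V[:, :m] @ (np.sqrt(lam[:m]) * xi)

captured = lam[:m].sum() / lam.sum()    # fraction of total variance retained
```

For this kernel the eigenvalues decay quickly, so a handful of modes captures most of the variance; the truncation level $m$ is the stochastic dimension handed to the PCE machinery.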
simultaneously permitting both their sampling and their integration with other
simulation codes that rely on polynomial chaos representations for their input
parameters.
A challenge with inferring a stochastic process from data is the very large number
of constraints required on any probabilistic representation of the process. In the case
of a Karhunen-Loève expansion, these constraints take the form of specifying the
covariance operator and also specifying the joint probability density function of
the random variables $\xi$. Under the added assumption of a Gaussian process, $\xi$ is a
vector of independent standard Gaussian variables. Even in this overly simplified
setting, any assumption concerning the form of the covariance function presumes
the ability to observe the process over a continuum of scales within the domain
T of the process, a feat that is likely to remain elusive. Alternative procedures
for characterizing stochastic processes that are not predicated on knowledge of
covariance functions are required for stochastic models to bridge the gap between
predictions and reality. This point is addressed in a subsequent section in this
chapter.
Given its structure, there are several ways in which the PCE can be viewed as
a prior model, and the associated update method can take on at least one of the
following three distinct forms. First, the germ can be updated, thus assimilating
the new knowledge into the motive source of uncertainty. This can be achieved by
either updating the measure of the same germ or by introducing new components
into the germ, thus increasing the underlying stochastic dimension of the PCE. The
first of these germ update methods maintains the same physical conceptualization
of uncertainty, while modifying its measure. The second germ update approach
introduces new observables and in the process necessitates more elaborate physical
interpretations, typically through multiscale and multiphysics formulations.
A second prior update method is to update the numerical values of the coefficients
in the PCE with the goal of minimizing some measure of discrepancy between pre-
dictions and observations. Even with such a deterministic update of the coefficients,
predictions from the PCE remain probabilistic in view of their dependence on the
stochastic germ. This second update approach can be achieved through a Bayesian
MAP, a Maximum Likelihood estimation, or even a least squares setting.
A third alternative update method consists of maintaining the probability measure
of the germ at its prior value while updating a probabilistic model of the coefficients.
This last option presupposes that a prior model of the PCE coefficients can be
constructed.
Each of these three update alternatives is associated with a distinctly different
perspective on interpreting and managing uncertainty.
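The second update alternative, a deterministic update of the PCE coefficients, can be sketched as ridge regression, which coincides with a Bayesian MAP estimate under a Gaussian prior on the coefficients. The germ samples, prior values, and noise levels below are all hypothetical, chosen only to make the sketch self-contained.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

rng = np.random.default_rng(1)

p, n_obs = 4, 50
sigma, tau = 0.05, 1.0                    # observation noise and prior std (hypothetical)

# Hypothetical prior and "true" PCE coefficients in the He_0..He_p basis
c_prior = np.array([1.0, 0.5, 0.10, 0.00, 0.0])
c_true  = np.array([1.0, 0.6, 0.15, 0.02, 0.0])

# Synthetic data: model evaluated at sampled germ values, plus noise
xi = rng.standard_normal(n_obs)
Psi = hermevander(xi, p)                  # design matrix of Hermite polynomials
y = Psi @ c_true + sigma * rng.standard_normal(n_obs)

# MAP update of the coefficients: ridge regression toward the prior
A = Psi.T @ Psi / sigma**2 + np.eye(p + 1) / tau**2
b = Psi.T @ y / sigma**2 + c_prior / tau**2
c_post = np.linalg.solve(A, b)
```

Even after this deterministic update, predictions remain probabilistic through the stochastic germ, as noted above.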
A unique feature of a PCE that differentiates it among other probabilistic
representations is that the parameters, $X$, the solution to the governing equations, $U$,
and the QoI, $Q$, are all represented with respect to the same germ $\xi$. This gives
rise to interesting additional options for updating prior PCE models. Specifically,
the representation of X can be updated and subsequently propagated through the
model to obtain a correspondingly updated representation of Q. Alternatively, the
representation of Q can itself be directly updated with the implicit understanding
that the coefficients in the PCE of Q already encapsulate a numerical resolution of
the prior governing equations.
Equation (14.3) is thus replaced by the following equation:

$$X = \sum_{\alpha \in I} X_\alpha(\zeta) \, \Psi_\alpha(\xi), \qquad (14.19)$$

where the new random variables $\zeta \in \mathbb{R}^{n_\zeta}$ describe the uncertainty about the
probabilistic model. Typically, each $X_\alpha$ depends on a proper subset of $\zeta$. The
QoI can thus be expressed as $Q(\xi, \zeta)$ to underscore its dependence on the $(d, n_\zeta)$-
dimensional random vectors $(\xi, \zeta)$. Clearly, the uncertainty captured by the random
variables, $\zeta$, can be reduced by assimilating more data into the estimation problem,
and increasing the size of $\zeta$ suggests more subscale details. In the limit, the
asymptotic distributions of $X_\alpha$ and their posterior probability densities become
concentrated. The uncertainty reflected by the variables $\xi$, on the other hand, is
a property of the selected uncertainty model form and characterizes the limits on
Equations (14.20) and (14.21) provide two different, yet consistent, representa-
tions. Given the orthogonality of the polynomials appearing in these representations,
the coefficients $Q_\alpha$ can be obtained via projection and quadrature in the form

$$Q_\alpha(\zeta) = \sum_{r=1}^{n_q} w_r \, q(\xi^{(r)}, \zeta) \, \Psi_\alpha(\xi^{(r)}), \qquad (14.22)$$
(Figure: probability density functions (PDF) of Maximum Acceleration; the spread of curves is associated with randomness in the coefficients.)
clear from this figure that inferences about Maximum Acceleration, in particular
near the tail of its distribution, are quite sensitive to additional samples of material
properties.
$$P = \frac{(d+p)!}{d! \, p!}. \qquad (14.23)$$
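The count in Eq. (14.23) is the binomial coefficient $\binom{d+p}{p}$, and its growth with dimension can be checked directly (a trivial sketch, included only to make the scaling concrete):

```python
from math import comb

def n_pce_terms(d, p):
    """Number of PCE terms of total degree <= p in d variables: (d+p)!/(d! p!)."""
    return comb(d + p, p)

# Growth with dimension at fixed order p = 3
print(n_pce_terms(2, 2))     # 6
print(n_pce_terms(10, 3))    # 286
print(n_pce_terms(100, 3))   # 176851
```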
This factorial growth in the number of terms has motivated the pursuit of compact
representations which typically select a subset of terms in the expansions. Another
recent approach, reviewed in this section, capitalizes on properties of Gaussian
germs to develop representations that are uniquely adapted to specific quantities
of interest. This approach is based on the observation that irrespective of how
large the stochastic dimension is, the quantity of interest q is typically very low
dimensional, spanning a manifold of significantly lower intrinsic dimension than
the ambient Euclidean space. Recently, algorithms were developed for identifying
two such manifolds, namely, a one-dimensional subspace and a sequence of one-
dimensional subspaces [98]. Identifying these manifolds involves computing an
isometry, A, in Gaussian space that rotates the Gaussian germ so as to minimize
the concentration of the measure of the QoI away from the manifold. For problems
involving a non-Gaussian germ, a mapping back to Gaussian variables is typically
implemented as a preprocessing step. This step is both highly efficient and accurate.
According to this basis adaptation approach, Eq. (14.5) is first rewritten as follows:
$$Q = Q_0 + \sum_{i=1}^{d} Q_i \, \xi_i + \sum_{|\alpha|>1} Q_\alpha \, \Psi_\alpha(\xi). \qquad (14.24)$$
Under the conditions of a Gaussian germ $\xi$, the first summation in this last equation
is a Gaussian random variable. The idea then is to rotate the germ into a new germ
where the first coordinate, $\eta_1$, is aligned with $\sum_{i=1}^{d} Q_i \xi_i$. To that end, let $\eta$ be
such a transformed germ expressed as

$$\eta = A \, \xi, \qquad (14.25)$$
where $A$ is an isometry in $\mathbb{R}^d$ such that $\eta_1 = \sum_{i=1}^{d} Q_i \xi_i$. The other rows of $A$ can
be completed through a Gram-Schmidt orthogonalization procedure, which may be
supplemented by additional constraints. The same PCE for $Q$ can then be expressed
either in terms of $\xi$ or in terms of $\eta$, leading to the following two expressions:
$$Q = Q_0 + \sum_{i=1}^{d} Q_i \, \xi_i + \sum_{|\alpha|>1} Q_\alpha \, \Psi_\alpha(\xi)
    = Q_0 + Q_1^A \, \eta_1 + \sum_{|\alpha|>1} Q_\alpha^A \, \Psi_\alpha^A(\xi), \qquad (14.26)$$

where $\Psi_\alpha^A(\xi) = \Psi_\alpha(A\,\xi)$. Equation (14.26) can be rewritten as follows:
$$Q = Q_0 + Q_1^A \, \eta_1 + \sum_{i=2}^{p} Q_i^A \, \Psi_i(\eta_1) + \sum_{|\alpha|>1} Q_\alpha^A \, \Psi_\alpha(\eta), \qquad (14.27)$$
where the equality is in distribution and the equation provides a map from
a Gaussian variate to a $Q$-variate. According to the Skorokhod representation
theorem [14], a version of $\eta$ can be defined on the same probability triple as $Q$. Thus
$\eta$ appearing in the inverse CDF expression above can be defined in the linear span
of the germ $\xi$. Other constructions of the isometry, $A$, with alternative probabilistic
arguments and justifications have also been presented [98].
The construction of the adapted bases requires an initial investment of resources
for the evaluation of the Gaussian components of the QoI Q. These components
are used to construct the isometry, $A$, and thus the dominant direction $\eta_1$ that is
adapted to $Q$. Following that, a high-order expansion in $\eta_1$ is sought and can be
evaluated using either intrusive or nonintrusive procedures.
An extension to the case where Q is a stochastic process (i.e., infinite dimen-
sional) has also been implemented [99].
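The construction of the isometry $A$ around Eqs. (14.24) and (14.25) can be sketched numerically as follows (a toy with randomly chosen first-order coefficients; a QR factorization plays the role of the Gram-Schmidt completion of the rows):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

# First-order (Gaussian) PCE coefficients of Q -- hypothetical values
Q1 = rng.standard_normal(d)

# First row of the isometry: the normalized coefficient vector
a1 = Q1 / np.linalg.norm(Q1)

# Complete the remaining rows by Gram-Schmidt (here via a QR factorization)
M = np.eye(d)
M[:, 0] = a1
Qmat, _ = np.linalg.qr(M)
A = Qmat.T
if A[0] @ a1 < 0:          # fix the sign so that the first row equals a1
    A[0] = -A[0]

# eta = A xi is again a standard Gaussian germ, and the linear part of Q
# depends on eta_1 alone:  sum_i Q_i xi_i = ||Q1|| * eta_1
xi = rng.standard_normal((d, 1000))
eta = A @ xi
```

Because $A$ is orthogonal, $\eta$ has the same standard Gaussian measure as $\xi$, so the adapted expansion in $\eta_1$ reuses the same Hermite machinery.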
A salient feature of the mathematical setup underlying PCE is the L2 setting induced
by the germ. This permits L2 -convergent representations of random variables in
contradistinction to representations of their statistical moments or distributions.
A by-product of this L2 structure is the ability to characterize the convergence
of solutions to stochastic operator equations in an appropriate stochastic operator
norm. For bounded linear operators, this results in an immediate extension of Céa's
lemma and a corresponding interpretation of the Lax-Milgram theorem [9]. This is
typically implemented according to the so-called intrusive approach described in
Eqs. (14.6), (14.7), (14.8), and (14.9).
These intrusive implementations face two challenges, both of which were
touched upon briefly earlier in this chapter. First, the intrusive nature of the
algorithms presents an implementation challenge as it necessitates the development
of new software or significant revisions to existing code. Second, the size of these
linear problems is much larger than the size of the corresponding deterministic
problems, thus requiring careful attention to algorithmic implementations and the
development of adapted preconditioners. These two challenges are briefly discussed
in this section.
$$\sum_{j \in I} \sum_{i \in I} \langle \Psi_i \Psi_j \Psi_k \rangle \, L_i \, \mathbf{U}_j = \mathbf{f}_k, \qquad \forall k \in I, \qquad (14.29)$$
where the deterministic linear operators $\{l_i\}$ have been projected on a suitable finite
dimensional space and are represented by the associated matrices $\{L_i\}$, with $\mathbf{U}_j$ and
$\mathbf{f}_k$ denoting, respectively, the corresponding projections of $U_j$ and of the boundary
conditions. We further distinguish a special case where Eq. (14.8) can be written as
$$l(U_j, X) = \sum_{i=0}^{d} l_i(U_j) \, \xi_i, \qquad l_i(U_j) = \langle l(U_j, X), \xi_i \rangle, \qquad (14.30)$$

$$\sum_{j \in I} \sum_{i=0}^{d} \langle \xi_i \Psi_j \Psi_k \rangle \, L_i \, \mathbf{U}_j = \mathbf{f}_k, \qquad k \in I. \qquad (14.31)$$
j 2I iD0
$$\sum_{j \in I} L_{jk} \, \mathbf{U}_j = \mathbf{f}_k, \quad k \in I, \qquad L_{jk} = \sum_{i} c_{ijk} \, L_i, \qquad (14.32)$$

$$L_0 \, \mathbf{U}_k = \mathbf{f}_k - \sum_{j \in I} \sum_{i \in I^+} c_{ijk} \, L_i \, \mathbf{U}_j, \qquad k \in I, \qquad (14.33)$$
where $I^+$ denotes the set $I$ without $0$. This last equation serves two purposes.
First, it describes one of the most robust preconditioners for the system given
by Eq. (14.29), namely, a preconditioning via the inverse of the mean operator
L0 [67]. Second, it provides a path for evaluating the solution of the intrusive
approach via a nonintrusive algorithm. Specifically, as long as an analysis code
exists for solving the mean operator, Eq. (14.33) describes how to evaluate the
polynomial chaos coefficients U k by updating the right-hand side of the equations
and iterating until convergence is achieved. Block-diagonal preconditioning has also
been proposed [67, 68] as well as preconditioning with truncated PCE solutions
[88, 89].
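The mean-based iteration of Eq. (14.33) can be demonstrated on a toy problem in which the operators $L_i$ are scalars and the germ is one-dimensional; for orthonormal Hermite polynomials the triple products $\langle \xi \psi_j \psi_k \rangle$ form a tridiagonal Jacobi matrix. The values of $L_0$, $L_1$, and the right-hand side below are illustrative only.

```python
import numpy as np

P = 5                      # number of PCE terms retained for the solution
L0, L1 = 1.0, 0.2          # scalar stand-ins for the mean and fluctuating operators

# c_{1jk} = <xi psi_j psi_k> for orthonormal Hermite polynomials:
# tridiagonal with sqrt(j+1) on the off-diagonals
J = np.zeros((P, P))
for j in range(P - 1):
    J[j, j + 1] = J[j + 1, j] = np.sqrt(j + 1.0)

# Full stochastic Galerkin system (cf. Eq. (14.31)) and its direct solution
A = L0 * np.eye(P) + L1 * J
f = np.zeros(P)
f[0] = 1.0                 # deterministic right-hand side
u_direct = np.linalg.solve(A, f)

# Mean-based iteration, Eq. (14.33): each sweep only inverts L0
u = np.zeros(P)
for _ in range(200):
    u = (f - L1 * (J @ u)) / L0
```

When the mean operator dominates the fluctuations, the iteration converges to the direct solution; this is the sense in which Eq. (14.33) turns the intrusive solve into repeated deterministic solves with the mean operator.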
14 Polynomial Chaos: Modeling, Estimation, and Approximation 541
$$c_\ell = \begin{pmatrix} c_{\ell-1} & b_\ell^T \\ b_\ell & d_\ell \end{pmatrix}, \qquad \ell = 1, \ldots, p, \qquad (14.34)$$

where $c_{\ell-1}$ are the first principal submatrices corresponding to the $(\ell-1)$-th order
polynomial expansion. We note that even though the matrices $c_\ell$ are symmetric, the
global stochastic Galerkin matrix will be symmetric only if each one of the matrices
Fig. 14.2 Typical sparsity patterns of the coefficient matrices for different $c_{ijk}$. (a) $c_{ijk} =
\langle \xi_i \Psi_j \Psi_k \rangle$. (b) $c_{ijk} = \langle \Psi_i \Psi_j \Psi_k \rangle$
$L_i$ is symmetric. Clearly, all matrices $L_i$ will have the same sparsity pattern. In
either case, the block sparse or the block dense, the linear system (14.29) can be
written as

$$A_P \, \mathbf{U}_P = \mathbf{f}_P, \qquad (14.35)$$

$$A_\ell = \begin{pmatrix} A_{\ell-1} & B_\ell \\ C_\ell & D_\ell \end{pmatrix}, \qquad \ell = P, \ldots, 1, \qquad (14.36)$$
where $u_{P_i}$ refers to the set of stochastic outputs over $P_i$. This notation highlights
the fact that significant smoothing takes place as uncertainty propagates through
each model component in the overall system model. It is thus clear that rather
than carrying out the high-dimensional integrals over $d$-space, a subspace which
contains the fluctuations of $X_i$ together with those of $u_{P_i}$ should be sufficient. It
is a challenge to efficiently characterize the probability measure on this space and
to compute the associated quadrature rules. Instead, we will rely on an embedded
quadrature approach [6]. A first step in this approach is to reduce the functions
$u_{P_i}$, whenever they are described as random fields, via their Karhunen-Loève
decomposition. Let the number of terms retained for these functions be equal to
$t_i$. Clearly, these terms will generally be dependent, and hence the current approach,
which is described next, is conservative as they are presumed independent. Let $Q_d$
denote the coordinates of a quadrature rule in $d$-space. We then identify a quadrature
rule $Q_{d_i + t_i}$ in the space of dimension $d_i + t_i$ that is a subset of $Q_d$ with an $L_1$
constraint on the weights. We demonstrate this idea for a two-model problem below.
Consider two physical processes, u and v, governed by the following functional
relationships:
where $\xi$ and $\zeta$ denote random parameters implicit to the first and second physics
relationships, respectively. We will assume that these have been discretized into
random vectors using, for instance, a Karhunen-Loève expansion. Also, $\lambda(v)$ and
$\phi(u)$ are additional parameters that describe the coupling between the two physical
processes. A complete solution of the problem would require an analysis over a
stochastic space that is large enough to characterize $\xi$ and $\zeta$ simultaneously. We
will approach this challenge in two steps. First, we reduce the stochastic character
of each of $u$ and $v$ through decorrelation, by describing their respective dominant
Karhunen-Loève expansions. Our goal is to express $u$, $v$, $\lambda$, and $\phi$. To begin, we
take

$$u = \sum_{\beta=0}^{K_u} u_\beta \, \hat{\eta}_\beta, \qquad v = \sum_{\beta=0}^{K_v} v_\beta \, \hat{\zeta}_\beta, \qquad (14.39)$$
where $\hat{\eta}$ and $\hat{\zeta}$ are the Karhunen-Loève variables which, for now, are assumed to be
independent. Note that each of these variables can be described independently, using
an inverse CDF mapping, as a function of a Gaussian variable. This results in the
following representations:
$$u = \sum_{\alpha=0}^{L_u} u_\alpha \, \Psi_\alpha(\xi), \qquad v = \sum_{\alpha=0}^{L_v} v_\alpha \, \Psi_\alpha(\zeta),$$

$$\phi(u) = \sum_{\alpha=0}^{L_\phi} \phi_\alpha \, \Psi_\alpha(\xi), \qquad \lambda(v) = \sum_{\alpha=0}^{L_\lambda} \lambda_\alpha \, \Psi_\alpha(\zeta). \qquad (14.40)$$
$$v(\xi, \zeta) \approx \tilde{v}(\xi, \zeta) = \sum_{\alpha=0}^{L} \sum_{\beta=0}^{K} v_{\alpha\beta} \, \Psi_\alpha(\xi) \, \Psi_\beta(\zeta), \qquad (14.41)$$
$$u_\alpha(v) = E\{u \, \Psi_\alpha\}, \qquad v_\alpha(u) = E\{v \, \Psi_\alpha\}. \qquad (14.42)$$
If the function $h$ is an indicator function for a set $A$, then $E\{f\}$ evaluates the
probability of $g$ being in $A$. We are also interested in situations where the function,
$h$, is an orthogonal polynomial in the random variables, $\kappa_i$, in which case $E\{f\}$
evaluates to the polynomial chaos coefficient in the expansion of $g$ in a basis
consisting of these polynomials. In some instances, we will also be interested in
situations where the random variables $\kappa$ are themselves functions of another set of
random variables $\xi \in \mathbb{R}^N$. Thus, in general, we will consider

$$I = E\{f\} = \int_{\mathbb{R}^m} f(\kappa)\, \rho_\kappa(\kappa)\, d\kappa = \int_{\mathbb{R}^N} f(\kappa(\xi))\, \rho(\xi)\, d\xi \approx \sum_{q=1}^{n_q} w_q\, f(\xi^q), \qquad \xi^q \in \mathbb{R}^N, \quad f \in V, \qquad (14.44)$$
Fig. 14.3 Quadrature points in original space and embedded quadratures in reduced space
where $V$ denotes a functional space specified by the physics of the problem, $\rho$ is the
probability density function of the $N$-dimensional random variable $\xi$, and $(w_q, \xi^q)$
are weight/coordinate pairs of some particular quadrature rule. The integral $I$ can
thus be evaluated either over $\mathbb{R}^m$ with respect to the measure of $\kappa$ or over $\mathbb{R}^N$ with
respect to the measure of $\xi$. In most cases of interest, quadrature with respect to
$\xi$, while easier to compute (since the $\xi_i$ are usually statistically independent), is in
a much higher dimension. On the other hand, quadrature with respect to $\kappa$, while
over a much lower dimension, is with respect to a dependent measure and does
not conform to standard quadrature rules. Recent algorithms [6] select quadrature
points for $\kappa$-integration as a very small subset from the points required for the $\xi$-
integration. The selection is based on an optimality requirement for the $L_1$ norm of
the weights of the selected subset. Figure 14.3a, b show a comparison between the
required quadrature points using full tensor products and the method that was just
described. The results in these figures pertain to an application involving stationary
transport of neutrons with thermal coupling [5]. The figures show the projection
onto a two-dimensional plane of all quadrature points used for evaluating the PCE
coefficients of the temperature field, to within similar accuracy. The number of
required quadrature points is reduced from over 1000 to 6.
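A much-simplified stand-in for this embedded quadrature idea (not the $L_1$-optimal selection algorithm of [6]) can be sketched as follows: project a 2-d tensor grid onto a hypothetical one-dimensional reduced variable $\kappa$, retain a few of the projected nodes, and re-solve for weights so that low-order moments of $\kappa$ are matched.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Full tensor-product Gauss-Hermite rule in the 2-d germ (xi1, xi2)
x, w = hermegauss(8)
w = w / math.sqrt(2.0 * math.pi)
XI1, XI2 = np.meshgrid(x, x)

# Hypothetical reduced variable: kappa = (xi1 + xi2)/sqrt(2) ~ N(0, 1)
kappa_full = ((XI1 + XI2) / math.sqrt(2.0)).ravel()

# Candidate reduced nodes: distinct projections of the tensor-grid points
nodes = np.unique(np.round(kappa_full, 10))
idx = np.linspace(0, len(nodes) - 1, 5).astype(int)
k_red = nodes[idx]

# Re-weight the 5 retained nodes so the first five moments of N(0,1) are matched
moments = np.array([1.0, 0.0, 1.0, 0.0, 3.0])
V = np.vander(k_red, 5, increasing=True).T    # V[p, q] = k_red[q] ** p
w_red = np.linalg.solve(V, moments)
```

The actual method constrains the $L_1$ norm of the weights when choosing the subset, which controls the stability that this naive moment-matching sketch does not address.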
$$\frac{\partial \langle P \rangle_V}{\partial x_i} = -k_{ij}^{-1} \, \langle u_j \rangle_V, \qquad (14.45)$$

where $\langle P \rangle_V$ and $\langle u_j \rangle_V$ are the volume-averaged pressure and velocity obtained from the fine-
scale solution. The volume average is computed as

$$\langle a \rangle = \frac{1}{V} \int_V a \, dV. \qquad (14.46)$$
10 Conclusion
References
1. Adomian, G.: Stochastic Green's functions. In: Bellman, R. (ed.) Proceedings of Symposia
in Applied Mathematics. Volume 16: Stochastic Processes in Mathematical Physics and
Engineering. American Mathematical Society, Providence (1964)
2. Adomian, G.: Stochastic Systems. Academic, New York (1983)
3. Albeverio, S., Daletsky, Y., Kondratiev, Y., Streit, L.: Non-Gaussian infinite dimensional
analysis. J. Funct. Anal. 138, 311–350 (1996)
4. Arnst, M., Ghanem, R.: Probabilistic equivalence and stochastic model reduction in
multiscale analysis. Comput. Methods Appl. Mech. Eng. 197(43–44), 3584–3592 (2008)
5. Arnst, M., Ghanem, R., Phipps, E., Red-Horse, J.: Dimension reduction in stochastic
modeling of coupled problems. Int. J. Numer. Methods Eng. 92, 940–968 (2012)
6. Arnst, M., Ghanem, R., Phipps, E., Red-Horse, J.: Measure transformation and efficient
quadrature in reduced-dimensional stochastic modeling of coupled problems. Int. J. Numer.
Methods Eng. 92, 1044–1080 (2012)
7. Arnst, M., Ghanem, R., Phipps, E., Red-Horse, J.: Reduced chaos expansions with random
coefficients in reduced-dimensional stochastic modeling of coupled problems. Int. J. Numer.
Methods Eng. 97(5), 352–376 (2014)
8. Arnst, M., Ghanem, R., Soize, C.: Identification of Bayesian posteriors for coefficients of
chaos expansions. J. Comput. Phys. 229(9), 3134–3154 (2010)
9. Babuška, I., Tempone, R., Zouraris, G.E.: Galerkin finite element approximations of
stochastic elliptic partial differential equations. SIAM J. Numer. Anal. 42(2), 800–825 (2005)
10. Babuška, I., Tempone, R., Zouraris, G.E.: Solving elliptic boundary value problems with
uncertain coefficients by the finite element method: the stochastic formulation. Comput.
Methods Appl. Mech. Eng. 194(12–16), 1251–1294 (2005)
11. Benaroya, H., Rehak, M.: Finite element methods in probabilistic structural analysis: a
selective review. Appl. Mech. Rev. 41(5), 201–213 (1988)
12. Berezansky, Y.M.: Infinite-dimensional non-Gaussian analysis and generalized translation
operators. Funct. Anal. Appl. 30(4), 269–272 (1996)
13. Bharucha-Reid, A.T.: On random operator equations in Banach space. Bull. Acad. Polon.
Sci. Ser. Sci. Math. Astr. Phys. 7, 561–564 (1959)
14. Billingsley, P.: Probability and Measure. Wiley Interscience, New York (1995)
15. Bose, A.G.: A theory of nonlinear systems. Technical report 309, Research Laboratory of
Electronics, MIT (1956)
16. Boyce, E.W., Goodwin, B.E.: Random transverse vibration of elastic beams. SIAM J. 12(3),
613–629 (1964)
17. Brilliant, M.B.: Theory of the analysis of nonlinear systems. Technical report 345, Research
Laboratory of Electronics, MIT (1958)
18. Cameron, R.H., Martin, W.T.: The orthogonal development of nonlinear functionals in a series
of Fourier-Hermite functionals. Ann. Math. 48, 385–392 (1947)
19. Chorin, A.: Hermite expansions in Monte-Carlo computation. J. Comput. Phys. 8, 472–482
(1971)
20. Cornish, E., Fisher, R.: Moments and cumulants in the specification of distributions. Rev. Int.
Stat. Inst. 5(4), 307–320 (1938)
21. Das, S., Ghanem, R.: A bounded random matrix approach for stochastic upscaling. SIAM J.
Multiscale Model. Simul. 8(1), 296–325 (2009)
22. Das, S., Ghanem, R., Finette, S.: Polynomial chaos representation of spatio-temporal random
fields from experimental measurements. J. Comput. Phys. 228(23), 8726–8751 (2009)
23. Das, S., Ghanem, R., Spall, J.: Sampling distribution for polynomial chaos representation
of data: a maximum-entropy and Fisher information approach. SIAM J. Sci. Comput. 30(5),
2207–2234 (2008)
24. Debusschere, B., Najm, H., Matta, A., Knio, O., Ghanem, R., Le Maitre, O.: Protein labeling
reactions in electrochemical microchannel flow: numerical simulation and uncertainty propa-
gation. Phys. Fluids 15(8), 2238–2250 (2003)
25. Descelliers, C., Ghanem, R., Soize, C.: Maximum likelihood estimation of stochastic chaos
representation from experimental data. Int. J. Numer. Methods Eng. 66(6), 978–1001 (2006)
26. Diggle, P., Gratton, R.: Monte Carlo methods of inference for implicit statistical models.
J. R. Stat. Soc. Ser. B 46, 193–227 (1984)
27. Doostan, A., Ghanem, R., Red-Horse, J.: Stochastic model reduction for chaos representa-
tions. Comput. Methods Appl. Mech. Eng. 196, 3951–3966 (2007)
28. Ernst, O.G., Ullmann, E.: Stochastic Galerkin matrices. SIAM J. Matrix Anal. Appl. 31(4),
1848–1872 (2010)
29. Fisher, R., Cornish, E.: The percentile points of distributions having known cumulants.
Technometrics 2(2), 209–225 (1960)
30. Ganapathysubramanian, B., Zabaras, N.: Sparse grid collocation methods for stochastic
natural convection problems. J. Comput. Phys. 225, 652–685 (2007)
31. George, D.A.: Continuous nonlinear systems. Technical report 355, Research Laboratory of
Electronics, MIT (1959)
32. Ghanem, R.: Hybrid stochastic finite elements: coupling of spectral expansions with Monte
Carlo simulations. ASME J. Appl. Mech. 65, 1004–1009 (1998)
33. Ghanem, R.: Scales of fluctuation and the propagation of uncertainty in random porous media.
Water Resour. Res. 34(9), 2123–2136 (1998)
34. Ghanem, R., Abras, J.: A general purpose library for stochastic finite element computations.
In: Bathe, J. (ed.) Second MIT Conference on Computational Mechanics, Cambridge (2003)
35. Ghanem, R., Brzkala, V.: Stochastic finite element analysis for randomly layered media.
ASCE J. Eng. Mech. 122(4), 361–369 (1996)
36. Ghanem, R., Dham, S.: Stochastic finite element analysis for multiphase flow in heteroge-
neous porous media. Transp. Porous Media 32, 239–262 (1998)
37. Ghanem, R., Doostan, A., Red-Horse, J.: A probabilistic construction of model validation.
Comput. Methods Appl. Mech. Eng. 197, 2585–2595 (2008)
38. Ghanem, R., Red-Horse, J., Benjamin, A., Doostan, A., Yu, A.: Stochastic process
model for material properties under incomplete information (AIAA 2007-1968). In: 48th
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference,
Honolulu, 23–26 Apr 2007. AIAA (2007)
39. Ghanem, R., Sarkar, A.: Reduced models for the medium-frequency dynamics of stochastic
systems. JASA 113(2), 834–846 (2003)
40. Ghanem, R., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Springer, New
York (1991). Revised edition by Dover Publications (2003)
41. Ghiocel, D., Ghanem, R.: Stochastic finite element analysis of seismic soil-structure
interaction. J. Eng. Mech. 128(1), 66–77 (2002)
42. Gikhman, I., Skorohod, A.: The Theory of Stochastic Processes I. Springer, Berlin (1974)
43. Guilleminot, J., Soize, C., Ghanem, R.: Stochastic representation for anisotropic permeability
tensor random fields. Int. J. Numer. Anal. Methods Geomech. 36, 1592–1608 (2012)
44. Hart, G.C., Collins, J.D.: The treatment of randomness in finite element modelling. In: SAE
Shock and Vibrations Symposium, Los Angeles, pp. 2509–2519 (1970)
45. Hasselman, T.K., Hart, G.C.: Modal analysis of random structural systems. ASCE J. Eng.
Mech. 98(EM3), 561–579 (1972)
46. Hida, T.: White noise analysis and nonlinear filtering problems. Appl. Math. Optim. 2, 82–89
(1975)
47. Hida, T., Kuo, H.-H., Potthoff, J., Streit, L.: White Noise: An Infinite Dimensional Calculus.
Kluwer Academic Publishers, Dordrecht/Boston (1993)
48. Imamura, T., Meecham, W.: Wiener-Hermite expansion in model turbulence in the late decay
stage. J. Math. Phys. 6(5), 707–721 (1965)
49. Itô, K.: Multiple Wiener integrals. J. Math. Soc. Jpn. 3(1), 157–169 (1951)
50. Itô, K.: Spectral type of shift transformations of differential process with stationary
increments. Trans. Am. Math. Soc. 81, 253–263 (1956)
51. Jahedi, A., Ahmadi, G.: Application of Wiener-Hermite expansion to nonstationary random
vibration of a Duffing oscillator. ASME J. Appl. Mech. 50, 436–442 (1983)
52. Kallianpur, G.: Stochastic Filtering Theory. Springer, New York (1980)
53. Klein, S., Yasui, S.: Nonlinear systems analysis with non-Gaussian white stimuli: general
basis functionals and kernels. IEEE Trans. Inf. Theory IT-25(4), 495–500 (1979)
54. Kondratiev, Y., Da Silva, J., Streit, L., Us, G.: Analysis on Poisson and Gamma spaces.
Infinite Dimens. Anal. Quantum Probab. Relat. Top. 1(1), 91–117 (1998)
55. Lévy, P.: Leçons d'Analyse Fonctionnelle. Gauthier-Villars, Paris (1922)
56. Li, R., Ghanem, R.: Adaptive polynomial chaos simulation applied to statistics of extremes
in nonlinear random vibration. Probab. Eng. Mech. 13(2), 125–136 (1998)
57. Liu, W.K., Besterfield, G., Mani, A.: Probabilistic finite element methods in nonlinear
structural dynamics. Comput. Methods Appl. Mech. Eng. 57, 61–81 (1986)
58. Lytvynov, E.: Multiple Wiener integrals and non-Gaussian white noise: a Jacobi field approach. Methods Funct. Anal. Topol. 1(1), 61–85 (1995)
59. Le Maitre, O., Najm, H., Ghanem, R., Knio, O.: Multi-resolution analysis of Wiener-type uncertainty propagation schemes. J. Comput. Phys. 197(2), 502–531 (2004)
60. Le Maitre, O., Reagan, M., Najm, H., Ghanem, R., Knio, O.: A stochastic projection method for fluid flow. II: random process. J. Comput. Phys. 181, 9–44 (2002)
61. Matthies, H.G., Keese, A.: Galerkin methods for linear and nonlinear elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Eng. 194(12–16), 1295–1331 (2005). Special Issue on Computational Methods in Stochastic Mechanics and Reliability Analysis
62. Meidani, H., Ghanem, R.: Uncertainty quantification for Markov chain models. Chaos 22(4) (2012)
63. Nakagiri, S., Hisada, T.: Stochastic finite element method applied to structural analysis with uncertain parameters. In: Proceedings of the International Conference on FEM, pp. 206–211 (1982)
64. Nakayama, A., Kuwahara, F., Umemoto, T., Hayashi, T.: Heat and fluid flow within an anisotropic porous medium. Trans. ASME 124, 746–753 (2012)
65. Ogura, H.: Orthogonal functionals of the Poisson process. IEEE Trans. Inf. Theory IT-18(4), 473–481 (1972)
66. Pawlowski, R., Phipps, E., Salinger, A., Owen, S., Siefert, C., Staten, M.: Automating embedded analysis capabilities and managing software complexity in multiphysics simulation, Part II: application to partial differential equations. Sci. Program. 20(3), 327–345 (2012)
67. Pellissetti, M.F., Ghanem, R.G.: Iterative solution of systems of linear equations arising in the context of stochastic finite elements. Adv. Eng. Softw. 31(8–9), 607–616 (2000)
68. Powell, C.E., Elman, H.C.: Block-diagonal preconditioning for spectral stochastic finite-element systems. IMA J. Numer. Anal. 29(2), 350–375 (2009)
69. Pugachev, V., Sinitsyn, I.: Stochastic Systems: Theory and Applications. World Scientific, River Edge (2001)
70. Red-Horse, J., Ghanem, R.: Elements of a functional analytic approach to probability. Int. J. Numer. Methods Eng. 80(6–7), 689–716 (2009)
71. Reichel, L., Trefethen, L.: Eigenvalues and pseudo-eigenvalues of Toeplitz matrices. Linear Algebra Appl. 162, 153–185 (1992)
72. Rosenblatt, M.: Remarks on a multivariate transformation. Ann. Math. Stat. 23, 470–472 (1952)
73. Rosenblatt, M.: Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27, 832–837 (1956)
74. Rosseel, E., Vandewalle, S.: Iterative solvers for the stochastic finite element method. SIAM J. Sci. Comput. 32(1), 372–397 (2010)
75. Rugh, W.J.: Nonlinear System Theory: The Volterra-Wiener Approach. Johns Hopkins University Press, Baltimore (1981)
76. Sakamoto, S., Ghanem, R.: Simulation of multi-dimensional non-Gaussian non-stationary random fields. Probab. Eng. Mech. 17(2), 167–176 (2002)
77. Sargsyan, K., Najm, H., Ghanem, R.: On the statistical calibration of physical models. Int. J. Chem. Kinet. 47(4), 246–276 (2015)
78. Schoutens, W.: Stochastic Processes and Orthogonal Polynomials. Springer, New York (2000)
79. Segall, A., Kailath, T.: Orthogonal functionals of independent-increment processes. IEEE Trans. Inf. Theory IT-22(3), 287–298 (1976)
80. Shinozuka, M., Astill, J.: Random eigenvalue problem in structural mechanics. AIAA J. 10(4), 456–462 (1972)
81. Skorohod, A.V.: Random Linear Operators. Reidel Publishing Company, Dordrecht (1984)
82. Sobczyk, K.: Wave Propagation in Random Media. Elsevier, Amsterdam (1985)
83. Soize, C.: A nonparametric model of random uncertainties for reduced matrix models in structural dynamics. Probab. Eng. Mech. 15(3), 277–294 (2000)
84. Soize, C., Ghanem, R.: Physical systems with random uncertainties: chaos representations with arbitrary probability measure. SIAM J. Sci. Comput. 26(2), 395–410 (2004)
14 Polynomial Chaos: Modeling, Estimation, and Approximation 551
85. Soize, C., Ghanem, R.: Reduced chaos decomposition with random coefficients of vector-valued random variables and random fields. Comput. Methods Appl. Mech. Eng. 198(21–26), 1926–1934 (2009)
86. Soize, C., Ghanem, R.: Data-driven probability concentration and sampling on manifold. J. Comput. Phys. 321, 242–258 (2016)
87. Soong, T.T., Bogdanoff, J.L.: On the natural frequencies of a disordered linear chain of n degrees of freedom. Int. J. Mech. Sci. 5, 237–265 (1963)
88. Sousedik, B., Elman, H.: Stochastic Galerkin methods for the steady-state Navier-Stokes equations. J. Comput. Phys. 316, 435–452 (2016)
89. Sousedik, B., Ghanem, R.: Truncated hierarchical preconditioning for the stochastic Galerkin FEM. Int. J. Uncertain. Quantif. 4(4), 333–348 (2014)
90. Sousedik, B., Ghanem, R., Phipps, E.: Hierarchical Schur complement preconditioner for the stochastic Galerkin finite element methods. Numer. Linear Algebra Appl. 21(1), 136–151 (2014)
91. Steinwart, I., Scovel, C.: Mercer's theorem on general domains: on the interaction between measures, kernels, and RKHSs. Constr. Approx. 35, 363–417 (2012)
92. Stone, M.: The generalized Weierstrass approximation theorem. Math. Mag. 21(4), 167–184 (1948)
93. Takemura, A., Takeuchi, K.: Some results on univariate and multivariate Cornish-Fisher expansion: algebraic properties and validity. Sankhyā 50, 111–136 (1988)
94. Tan, W., Guttman, I.: On the construction of multi-dimensional orthogonal polynomials. Metron 34, 37–54 (1976)
95. Tavare, S., Balding, D., Griffiths, R., Donnelly, P.: Inferring coalescence times from DNA sequence data. Genetics 145, 505–518 (1997)
96. Thimmisetty, C., Khodabakhshnejad, A., Jabbari, N., Aminzadeh, F., Ghanem, R., Rose, K., Disenhof, C., Bauer, J.: Multiscale stochastic representation in high-dimensional data using Gaussian processes with implicit diffusion metrics. In: Ravela, S., Sandu, A. (eds.) Dynamic Data-Driven Environmental Systems Science. Lecture Notes in Computer Science, vol. 8964. Springer (2015). doi:10.1007/978-3-319-25138-7_15
97. Tipireddy, R.: Stochastic Galerkin projections: solvers, basis adaptation and multiscale modeling and reduction. PhD thesis, University of Southern California (2013)
98. Tipireddy, R., Ghanem, R.: Basis adaptation in homogeneous chaos spaces. J. Comput. Phys. 259, 304–317 (2014)
99. Tsilifis, P., Ghanem, R.: Reduced Wiener chaos representation of random fields via basis adaptation and projection. J. Comput. Phys. (2016, submitted)
100. Volterra, V.: Theory of Functionals and of Integral and Integro-Differential Equations. Blackie & Son, Ltd., Glasgow (1930)
101. Wan, X., Karniadakis, G.: Multi-element generalized polynomial chaos for arbitrary probability measures. SIAM J. Sci. Comput. 28(3), 901–928 (2006)
102. Wiener, N.: Differential space. J. Math. Phys. 2, 131–174 (1923)
103. Wiener, N.: The homogeneous chaos. Am. J. Math. 60(4), 897–936 (1938)
104. Wintner, A., Wiener, N.: The discrete chaos. Am. J. Math. 65, 279–298 (1943)
105. Xiu, D., Karniadakis, G.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24, 619–644 (2002)
106. Xiu, D., Hesthaven, J.S.: High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. 27(3), 1118–1139 (2005)
107. Yamazaki, F., Shinozuka, M., Dasgupta, G.: Neumann expansion for stochastic finite-element analysis. ASCE J. Eng. Mech. 114(8), 1335–1354 (1988)
Part III
Forward Problems
Bayesian Uncertainty Propagation Using
Gaussian Processes 15
Ilias Bilionis and Nicholas Zabaras
Abstract
Classic non-intrusive uncertainty propagation techniques typically require a
significant number of model evaluations in order to yield convergent statistics.
In practice, however, the computational complexity of the underlying computer
codes limits significantly the number of observations that one can actually
make. In such situations the estimates produced by classic approaches cannot
be trusted since the limited number of observations induces additional epistemic
uncertainty. The goal of this chapter is to highlight how the Bayesian formalism
can quantify this epistemic uncertainty and provide robust predictive intervals
for the statistics of interest with as few simulations as one has available. It
is shown how the Bayesian formalism can be materialized by employing the
concept of a Gaussian process (GP). In addition, several practical aspects that
depend on the nature of the underlying response surface, such as the treatment
of spatiotemporal variation, and multi-output responses are discussed. The
practicality of the approach is demonstrated by propagating uncertainty through
a dynamical system and an elliptic partial differential equation.
Keywords
Epistemic uncertainty · Expensive computer code · Expensive computer simulations · Gaussian process · Uncertainty propagation
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
I. Bilionis ()
School of Mechanical Engineering, Purdue University, West Lafayette, IN, USA
e-mail: ibilion@purdue.edu; http://www.predictivesciencelab.org
N. Zabaras
Warwick Centre for Predictive Modelling, University of Warwick, Coventry, UK
e-mail: nzabaras@gmail.com; http://www.zabaras.com
1 Introduction
National laboratories, research groups, and corporate R&D departments have spent
decades of time and billions of dollars to develop realistic multi-scale/multi-physics computer codes for a wide range of engineered systems such as aircraft engines, nuclear reactors, and automobiles. The driving force behind the development
of these models has been their potential use for designing systems with desirable
properties. However, this potential has been hindered by the inherent presence of
uncertainty attributed to the lack of knowledge about initial/boundary conditions,
material properties, geometric features, the form of the models, as well as the noise
present in the experimental data used in model calibration.
This chapter focuses on the task of propagating parametric uncertainty through
a given computer code ignoring the uncertainty introduced by discretizing an
ideal mathematical model. This task is known as the uncertainty propagation
(UP) problem. Even though the discussion is limited to the UP problem, all the
ideas presented can be extended to the problems of model calibration and design
optimization under uncertainty, albeit this is beyond the scope of this chapter.
The simplest approach to the solution of the UP problem is the Monte Carlo (MC)
approach. When using MC, one simply samples the parameters, evaluates the model,
and records the response. By post-processing the recorded responses, it is possible
to quantify any statistic of interest. However, obtaining convergent statistics via MC
for computationally expensive realistic models is not feasible, since one may be able
to run only a handful of simulations.
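The MC recipe just described is short enough to sketch in code. The following is a minimal illustration, not code from this chapter; `model` and `sample_input` are hypothetical stand-ins for an expensive simulator and the input distribution:

```python
import numpy as np

def propagate_mc(model, sample_input, n_samples, seed=0):
    """Plain Monte Carlo uncertainty propagation: sample the stochastic
    input, evaluate the model, and post-process the recorded responses."""
    rng = np.random.default_rng(seed)
    ys = np.array([model(sample_input(rng)) for _ in range(n_samples)])
    # Any statistic of interest can be estimated from the recorded outputs.
    return ys.mean(), ys.var(ddof=1)
```

For instance, with the toy response y = 2ξ and ξ ~ N(0, 1), the estimated mean and variance converge to 0 and 4, but only at the slow O(n^{-1/2}) Monte Carlo rate, which is exactly why a handful of simulations is not enough.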
The most common way of dealing with expensive models is to replace them with
inexpensive surrogates. That is, one evaluates the model on a set of design points
and then tries to develop an accurate representation of the response surface based on what one observes. The most popular such approach is to expand the response
in a generalized polynomial chaos basis (gPC) [4, 5153] and approximate its
coefficients with a Galerkin projection computed with a quadrature rule, e.g., sparse
grids [44]. In relatively low-dimensional parametric settings, these techniques
15 Bayesian Uncertainty Propagation using Gaussian Processes 557
2 Methodology
$f \colon \mathcal{X} = \mathcal{X}_s \times \mathcal{X}_t \times \mathcal{X}_\xi \to \mathcal{Y}$, (15.1)
where $\xi \in \mathcal{X}_\xi$ collects all the parameters required in order to provide a complete description of the system. This covers any physical parameters, boundary conditions, external forcings, etc.
$\dot{\mathbf{z}} = h(\mathbf{z}; \xi_2), \qquad \mathbf{z}(0) = \mathbf{z}_0(\xi_1)$. (15.2)
$\mathbf{f}(\mathbf{x}_s, t, \xi) = \big(p(\mathbf{x}_s, t, \xi),\, v_1(\mathbf{x}_s, t, \xi),\, v_2(\mathbf{x}_s, t, \xi),\, v_3(\mathbf{x}_s, t, \xi)\big)$, (15.3)
where $p(\mathbf{x}_s, t, \xi)$ and $v_i(\mathbf{x}_s, t, \xi)$, $i = 1, 2, 3$, are the pressure and the $i$-th component of the velocity field of the fluid, respectively.
A computer emulator, $f_c(\cdot)$, of a physical model, $\mathbf{f}(\cdot)$, is a function that reports the physical model at a given set of spatial locations,
$f_c \colon \mathcal{X}_\xi \to \mathcal{Y}^{n_s n_t} \subset \mathbb{R}^{n_s n_t d_y}$, (15.6)
with
$f_c(\xi) = \big[\mathbf{f}(\mathbf{x}_{s,1}, t_1, \xi)^T \cdots \mathbf{f}(\mathbf{x}_{s,1}, t_{n_t}, \xi)^T \cdots \mathbf{f}(\mathbf{x}_{s,n_s}, t_1, \xi)^T \cdots \mathbf{f}(\mathbf{x}_{s,n_s}, t_{n_t}, \xi)^T\big]^T$. (15.7)
For example, for the case of dynamical systems, Xt may be the time steps on which
the numerical integration routine reports the solution. In the porous flow example
Xs may be the centers of the cells of a finite volume scheme and Xt the time steps
on which the numerical integration reports the solution.
Alternatively, the computer code represents the physical model in an internal
basis, e.g., spectral elements. In this case,
$\mathbf{f}(\mathbf{x}_s, t, \xi) \approx \sum_{i=1}^{n_c} c_i(t, \xi)\, \phi_i(\mathbf{x}_s)$, (15.8)
where the $\phi_i(\mathbf{x}_s)$ are known spatial basis functions. Then, one may think of the computer code as the function that returns the coefficients $c_i(t, \xi) \in \mathbb{R}^{d_y}$, i.e.,
$f_c \colon \mathcal{X}_\xi \to \mathbb{R}^{n_c n_t d_y}$, (15.9)
$f_c(\xi) = \big[c_1(t_1, \xi)^T \cdots c_1(t_{n_t}, \xi)^T \cdots c_{n_c}(t_1, \xi)^T \cdots c_{n_c}(t_{n_t}, \xi)^T\big]^T$. (15.10)
In all problems of physical relevance, the parameters, $\xi$, are not known explicitly. Without loss of generality, uncertainty about $\xi$ may be represented by assigning to it a probability density, $p(\xi)$. Accordingly, $\xi$ is referred to as the stochastic input. The goal of uncertainty propagation is to study the effects of $p(\xi)$ on the model output $\mathbf{f}(\mathbf{x}_s, t, \xi)$. Usually, it is sufficient to be able to compute low-order statistics such as the mean,
$\mathbb{E}[\mathbf{f}(\cdot)](\mathbf{x}_s, t) = \int \mathbf{f}(\mathbf{x}_s, t, \xi)\, p(\xi)\, d\xi$, (15.11)
Assume that one has made $n$ simulations and collected a data set $\mathcal{D}$. Bayes' rule then updates one's prior beliefs about the response:
$p(f(\cdot) \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid f(\cdot))\, p(f(\cdot))}{p(\mathcal{D})}$. (15.15)
Equation (15.17) contains everything that is known about the value of $Q$, given the observations in $\mathcal{D}$. The quantity $p(Q \mid \mathcal{D})$ is known as the predictive distribution of the statistic $Q$ given $\mathcal{D}$. The uncertainty in $p(Q \mid \mathcal{D})$ corresponds to the epistemic uncertainty induced by one's limited data budget. The Bayesian approach is the only one that can naturally characterize this epistemic uncertainty.
Equation (15.17) applies generically to any Bayesian surrogate and to any
statistic Q. For the case of a Gaussian process surrogate, it is possible to develop
semi-analytic approximations to Equation (15.17) when Q is the mean or the
variance of the response. However, in general one has to think of Equation (15.17)
in a generative manner in the sense that it enables sampling of possible statistics in
a two-step procedure:
For simplicity, the section starts by developing the theory for one-dimensional outputs, i.e., $n_y = 1$.
beliefs, in the sense that it assigns probability one to the set of functions that are
consistent with it. A Gaussian process is a great way to represent this probability
measure.
A Gaussian process is a generalization of a multivariate Gaussian random
variable to infinite dimensions [39]. In particular, f ./ is a Gaussian process with
mean function $m \colon \mathcal{X} \to \mathbb{R}$ and covariance function $k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, i.e.,
where $m(\mathbf{X}) = (m(\mathbf{x}_i))_i$, $k(\mathbf{X}, \mathbf{X}') = \big(k(\mathbf{x}_i, \mathbf{x}'_j)\big)_{ij}$, and $\mathcal{N}_n(\cdot \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$ is the probability density function of an $n$-dimensional multivariate normal random variable with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$.
But how does Equation (15.18) encode one's prior knowledge about the code? It does so through the choice of the mean and the covariance functions. The mean function can be an arbitrary function. Its role is to encode any generic trends of the response. The covariance function can be any positive semi-definite function. It is used to model the signal strength of the response and how it varies across $\mathcal{X}$, the similarity (correlation) of the response at two distinct input points, noise, regularity, periodicity, invariance, and more. The choice of mean and covariance functions is discussed more elaborately in what follows.
Vague prior knowledge, i.e., prior beliefs, may be modeled by parameterizing the
mean and covariance functions and assigning prior probabilities to their parameters.
In particular, the following generic forms for the mean and covariance functions are
considered:
$m \colon \mathcal{X} \times \Theta_m \to \mathbb{R}$, (15.20)
and
$k \colon \mathcal{X} \times \mathcal{X} \times \Theta_k \to \mathbb{R}$, (15.21)
respectively. Here,
and for all $\theta_k \in \Theta_k$ the function $k(\cdot, \cdot; \theta_k)$ is positive definite. The parameters $\theta_m \in \Theta_m$ and $\theta_k \in \Theta_k$, of the mean and covariance functions, respectively, are
known as hyper-parameters of the model. Using this notation, the most general
model considered in this work is:
$\theta_m \sim p(\theta_m)$, (15.24)
$\theta_k \sim p(\theta_k)$. (15.25)
$\theta := \{\theta_m, \theta_k\}$, (15.26)
and
$p(\theta) := p(\theta_m)\, p(\theta_k)$. (15.27)
$p(\theta_m) = p(\mathbf{b}) \propto 1$. (15.29)
where $\mathbf{b} \in \mathbb{R}^{d_m}$, and $\boldsymbol{\Sigma}_b \in \mathbb{R}^{d_m \times d_m}$ is positive definite. It can be shown that for both choices, Equation (15.29) or Equation (15.30), it is actually possible to integrate $\mathbf{b}$ out of the model [15]. Some examples of generalized linear models are:
$h(x) = 1$; (15.31)
$h_{2i}(x) = \sin(\omega_i x)$, and $h_{2i+1}(x) = \cos(\omega_i x)$. (15.34)
1. Modeling measurement noise. Assume that measurements of $f(\mathbf{x})$ are noisy and that this noise is Gaussian with variance $\sigma^2$. GPs can account for this fact if one adds a Kronecker-delta-like function to the covariance, i.e., if one works with a covariance of the form:
$k(\mathbf{x}, \mathbf{x}'; \theta_k) = k_0(\mathbf{x}, \mathbf{x}'; \theta_{k_0}) + \sigma^2 \delta_{\mathbf{x}, \mathbf{x}'}$. (15.35)
2. Modeling regularity. If $k(\cdot, \cdot; \theta_k)$ is $2\kappa$ times differentiable at $\mathbf{x}$, then samples $f(\cdot)$ from Equation (15.23) are $\kappa$ times differentiable a.s. at $\mathbf{x}$.
3. Modeling invariance. If $k(\cdot, \cdot; \theta_k)$ is invariant with respect to a transformation $T$, i.e., $k(\mathbf{x}, T\mathbf{x}'; \theta_k) = k(T\mathbf{x}, \mathbf{x}'; \theta_k) = k(\mathbf{x}, \mathbf{x}'; \theta_k)$, then samples $f(\mathbf{x})$ from Equation (15.23) are invariant with respect to the same transformation a.s. In particular, if $k(\cdot, \cdot; \theta_k)$ is periodic, then samples $f(\mathbf{x})$ from Equation (15.23) are periodic a.s.
4. Modeling additivity. Assume that the covariance function is additive, i.e.,
$k(\mathbf{x}, \mathbf{x}'; \theta_k) = \sum_{i=1}^{d} k_i(x_i, x_i'; \theta_{k_i}) + \sum_{1 \le i < j \le d} k_{ij}\big((x_i, x_j), (x_i', x_j'); \theta_{k_{ij}}\big) + \cdots$, (15.36)
with $\theta_k = \{\theta_{k_i}\} \cup \{\theta_{k_{ij}}\} \cup \cdots$. If $f_i(\mathbf{x}), f_{ij}(\mathbf{x}), \ldots$ are samples from Equation (15.23) with covariances $k_i(\cdot, \cdot; \theta_{k_i})$, $k_{ij}(\cdot, \cdot; \theta_{k_{ij}})$, $\ldots$, respectively, then
$f(\mathbf{x}) = \sum_{i=1}^{d} f_i(x_i) + \sum_{1 \le i < j \le d} f_{ij}(x_i, x_j) + \cdots$
is a sample from Equation (15.23) with the additive covariance defined in Equation (15.36). These ideas can be used to deal effectively with high-dimensional inputs [22, 23].
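A first-order additive covariance of the type in Equation (15.36) is easy to assemble. The sketch below is an illustration only; it assumes, for concreteness, that each one-dimensional term $k_i$ is a squared exponential:

```python
import numpy as np

def additive_cov(X, Xp, ells):
    """First-order additive covariance: k(x, x') = sum_i k_i(x_i, x_i'),
    with each k_i a 1-d squared exponential of length scale ells[i]."""
    X, Xp = np.atleast_2d(X), np.atleast_2d(Xp)
    K = np.zeros((X.shape[0], Xp.shape[0]))
    for i, ell in enumerate(ells):
        diff = X[:, i][:, None] - Xp[:, i][None, :]
        K += np.exp(-0.5 * diff**2 / ell**2)  # the term k_i(x_i, x_i')
    return K
```

Because each term depends on a single coordinate, samples from the resulting GP decompose exactly as in the sum above, which is what makes additive models attractive in high dimensions.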
The most commonly used covariance function for representing one's prior knowledge about a computer code is of the form of Equation (15.35) with $k_0(\cdot, \cdot; \theta_{k_0})$ being the squared exponential (SE) covariance function:
$k_0(\mathbf{x}, \mathbf{x}'; \theta_{k_0}) = s^2 \exp\left\{ -\sum_{i=1}^{d} \frac{(x_i - x_i')^2}{2\ell_i^2} \right\}$, (15.37)
with
$p(\theta_{k_0}) = p(s) \prod_{i=1}^{d} p(\ell_i)$. (15.39)
Typically, $p(\sigma), p(s), p(\ell_1), \ldots, p(\ell_d)$ are chosen to be Jeffreys priors or exponential probability densities with fixed parameters.
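The SE covariance of Equation (15.37), together with the nugget of Equation (15.35), takes only a few lines to implement. This is an illustrative sketch, not library code:

```python
import numpy as np

def se_cov(X, Xp, s, ells, sigma=0.0):
    """Squared exponential covariance k0 (Eq. 15.37) with signal strength s
    and length scales ells; sigma adds the nugget of Eq. 15.35 when X == Xp."""
    X, Xp = np.atleast_2d(X), np.atleast_2d(Xp)
    ells = np.asarray(ells, dtype=float)
    # Scaled squared distances summed over the d input dimensions.
    d2 = (((X[:, None, :] - Xp[None, :, :]) / ells) ** 2).sum(axis=-1)
    K = s**2 * np.exp(-0.5 * d2)
    if sigma > 0.0 and X.shape == Xp.shape and np.array_equal(X, Xp):
        K = K + sigma**2 * np.eye(X.shape[0])  # Kronecker-delta noise term
    return K
```

The length scales $\ell_i$ control how quickly the correlation between two inputs decays along each coordinate, which is why they are the key hyper-parameters to learn.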
566 I. Bilionis and N. Zabaras
$p(\theta \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}$, (15.43)
where
simplifying assumptions must be made. Namely, one has to assume that the mean
is a generalized linear model (see Equation (15.28)) with a separable set of basis
functions, h./, and that the covariance function is also separable.
The mean function $\mathbf{h} \colon \mathcal{X}_s \times \mathcal{X}_t \times \mathcal{X}_\xi \to \mathbb{R}^{d_h}$ is separable if it can be written as a product of spatial, time, and stochastic factors. Similarly, the covariance function is separable if
$k(\mathbf{x}, \mathbf{x}'; \theta_k) = k_s(\mathbf{x}_s, \mathbf{x}_s'; \theta_{k,s})\, k_t(t, t'; \theta_{k,t})\, k_\xi(\xi, \xi'; \theta_{k,\xi})$, (15.46)
where $k_s(\cdot, \cdot; \theta_{k,s})$, $k_t(\cdot, \cdot; \theta_{k,t})$, and $k_\xi(\cdot, \cdot; \theta_{k,\xi})$ are spatial, time, and stochastic-domain covariance functions, respectively; $\theta_{k,s}$, $\theta_{k,t}$, and $\theta_{k,\xi}$ are the corresponding hyper-parameters; and $\theta_k = \{\theta_{k,s}, \theta_{k,t}, \theta_{k,\xi}\}$. Then, as shown in Bilionis et al.
[10], the covariance matrix can be written as the Kronecker product of a spatial, a
time, and a stochastic covariance, i.e.,
Exploiting the fact that factorizations (e.g., Cholesky or QR) of a matrix formed by Kronecker products are given by the Kronecker products of the factorizations of the individual matrices [47], inference and predictions can be made in $O(n_s^3) + O(n_t^3) + O(n_\xi^3)$ time. For the complete details of this approach, the reader is directed to the appendix of Bilionis et al. [10]. It is worth mentioning that exactly the same idea was used by Stegle et al. [46].
idea was used by Stegle et al. [46].
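The computational payoff of the separable structure can be seen in a small sketch: a linear solve against a Kronecker-structured covariance never needs the full matrix. This illustrates the general idea only, not the implementation of [10]; two factors are shown instead of three:

```python
import numpy as np

def kron_solve(Ks, Kt, b):
    """Solve (Ks kron Kt) x = b using only the small factors.
    Relies on (Ks kron Kt) vec(B) = vec(Ks B Kt^T) for row-major vec."""
    ns, nt = Ks.shape[0], Kt.shape[0]
    B = b.reshape(ns, nt)
    # x = vec(Ks^{-1} B Kt^{-T}); covariance factors are symmetric, so
    # Kt^{-T} = Kt^{-1}.
    X = np.linalg.solve(Ks, np.linalg.solve(Kt, B.T).T)
    return X.ravel()
```

The cost is two solves of size n_s and n_t instead of one solve of size n_s n_t, which is exactly where the $O(n_s^3) + O(n_t^3)$ scaling comes from.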
and then learn the map between the reduced variables. For notational convenience, let $n_y = n_s n_t$ denote the number of outputs of the code, $\mathbf{y} \in \mathbb{R}^{n_y}$ the full output, and $\mathbf{z} \in \mathbb{R}^{n_z}$ the reduced output. The choice of the right dimensionality reduction map is an open research problem. Principal component analysis (PCA) [12, Chapter 12] is
going to be used for the identification of the dimensionality reduction map. Consider the empirical covariance matrix $\mathbf{C} \in \mathbb{R}^{n_y \times n_y}$:
$\mathbf{C} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{Y}_i - \mathbf{m})(\mathbf{Y}_i - \mathbf{m})^T$, (15.49)
where $\mathbf{Y}_i$ is the $i$-th row of the output matrix $\mathbf{Y}$ and $\mathbf{m} \in \mathbb{R}^{n_y}$ is the empirical mean of the observed outputs:
$\mathbf{m} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{Y}_i$. (15.50)
$\mathbf{C} = \mathbf{V} \mathbf{D} \mathbf{V}^T$, (15.51)
where $\mathbf{V} \in \mathbb{R}^{n_y \times n_y}$ contains the eigenvectors of $\mathbf{C}$ as columns, and $\mathbf{D} \in \mathbb{R}^{n_y \times n_y}$ is a diagonal matrix with the eigenvalues of $\mathbf{C}$ on its diagonal. The PCA connection between $\mathbf{y}$ and $\mathbf{z}$ (reconstruction map) is given by:
$\mathbf{y} = \mathbf{V} \mathbf{D}^{1/2} \mathbf{z} + \mathbf{m}$. (15.52)
In this equation, the reduced outputs corresponding to very small eigenvalues can be eliminated to a very good approximation. Typically, one keeps $n_z$ eigenvalues of $\mathbf{C}$ so that 95%, or more, of the observed variance of $\mathbf{y}$ is explained. This is achieved by removing columns from $\mathbf{V}$ and columns and rows from $\mathbf{D}$. The inverse of Equation (15.52) (reduction map) is given by:
$\mathbf{z} = \mathbf{D}^{-1/2} \mathbf{V}^T (\mathbf{y} - \mathbf{m})$. (15.53)
The map between $\xi$ and the reduced variables $\mathbf{z}$ can be learned by any of the multi-output GP regression techniques to be introduced in subsequent sections.
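Equations (15.49)–(15.53) translate directly into code. The sketch below is illustrative; for simplicity, the number of retained components n_z is passed explicitly rather than chosen automatically by the 95% energy rule:

```python
import numpy as np

def pca_maps(Y, nz):
    """PCA reduction/reconstruction maps (Eqs. 15.49-15.53).
    Y is the n x ny output matrix; nz is the number of retained
    eigenvalues (in practice chosen so that, e.g., 95% of the
    observed variance is explained)."""
    m = Y.mean(axis=0)                   # empirical mean, Eq. 15.50
    C = np.cov(Y, rowvar=False)          # empirical covariance, Eq. 15.49
    lam, V = np.linalg.eigh(C)           # C = V D V^T, Eq. 15.51
    order = np.argsort(lam)[::-1][:nz]   # keep the nz largest eigenvalues
    V, sqrt_lam = V[:, order], np.sqrt(lam[order])
    reduce_map = lambda y: (V.T @ (y - m)) / sqrt_lam   # Eq. 15.53
    reconstruct_map = lambda z: V @ (sqrt_lam * z) + m  # Eq. 15.52
    return reduce_map, reconstruct_map
```

When the data actually lie in an n_z-dimensional affine subspace, the reduce/reconstruct round trip recovers each sample exactly; otherwise the error is controlled by the discarded eigenvalues.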
$p(\theta \mid \mathcal{D}) \approx \sum_{s=1}^{S} w^{(s)} \delta\big(\theta - \theta^{(s)}\big)$. (15.54)
Here, $w^{(s)} \ge 0$ and $\sum_{s=1}^{S} w^{(s)} = 1$. The usefulness of Equation (15.54) relies on the fact that it allows one to approximate expectations with respect to $p(\theta \mid \mathcal{D})$ in a straightforward manner:
$\mathbb{E}[g(\theta)] := \int g(\theta)\, p(\theta \mid \mathcal{D})\, d\theta \approx \sum_{s=1}^{S} w^{(s)} g\big(\theta^{(s)}\big)$. (15.55)
There are various ways in which a particle approximation can be constructed. The
discussion below includes the most common approaches.
$\theta_{\mathrm{MLE}} = \arg\max_{\theta \in \Theta} p(\mathcal{D} \mid \theta)$. (15.56)
This corresponds to a single-particle approximation, $p(\theta \mid \mathcal{D}) \approx \delta(\theta - \theta_{\mathrm{MLE}})$, which is justified if the prior is relatively flat and the posterior is sharply peaked around $\theta_{\mathrm{MLE}}$.
$\theta_{\mathrm{MAP}} = \arg\max_{\theta \in \Theta} p(\theta \mid \mathcal{D})$, (15.58)
where the posterior $p(\theta \mid \mathcal{D})$ was defined in Equation (15.43). This is also a single-particle approximation:
$p(\theta \mid \mathcal{D}) \approx \delta(\theta - \theta_{\mathrm{MAP}})$. (15.59)
$p_\gamma(\theta \mid \mathcal{D}) \propto p(\mathcal{D} \mid \theta)^{\gamma}\, p(\theta)$. (15.60)
Notice that for $\gamma = 0$ one obtains the prior and for $\gamma = 1$ one obtains the posterior. The idea is to start a particle approximation $\{w_0^{(s)}, \theta_0^{(s)}\}_{s=1}^{S}$ from the prior ($\gamma = 0$), which is very easy to sample from, and gradually move it to the posterior ($\gamma = 1$). This can be easily achieved if there is an underlying MCMC routine for sampling from Equation (15.60). In addition, the schedule of $\gamma_s$ can be picked adaptively. The reader is directed to the appendix of Bilionis and Zabaras [9] for the complete details. SMC-based methodologies are suitable for multimodal posteriors. Another very attractive attribute is that they are embarrassingly parallelizable.
where $\theta_i = \{\theta_{m,i}, \theta_{k,i}\}$ and $p(\mathbf{y}_i \mid \mathbf{X}, \theta_i)$ is a multivariate Gaussian exactly the same as the one in Equation (15.19), with $m_i(\cdot; \theta_{m,i})$ and $k_i(\cdot, \cdot; \theta_{k,i})$ instead of $m(\cdot; \theta_m)$ and $k(\cdot, \cdot; \theta_k)$, respectively. A typical choice of a prior for $\theta$ assumes a priori independence of the various parameters, i.e.,
$p(\theta) = \prod_{i=1}^{d_y} p(\theta_{m,i})\, p(\theta_{k,i})$. (15.64)
$\mathbf{f}(\cdot) \mid \theta \sim \prod_{i=1}^{d_y} \mathrm{GP}\big(f_i(\cdot) \mid m(\cdot; \theta_{m,i}),\, s_i^2\, k(\cdot, \cdot; \theta_k)\big)$, (15.65)
with
$\theta = \{\theta_k\} \cup \bigcup_{i=1}^{d_y} \{\theta_{m,i}, s_i\}$, (15.66)
where $\theta_{m,i}$ and $s_i$ are the parameters of the mean and the signal strength, respectively, of the $i$-th output, and $\theta_k$ are the parameters of the covariance
function shared by all outputs. The likelihood of this model is given by an equation
similar to Equation (15.63) with an appropriate mean and covariance function.
The prior of is assumed to have an a priori independence structure similar
to Equation (15.64). The advantage of this approach compared with the fully
independent approach is that all outputs share the same covariance function, and,
hence, its computational complexity is the same as that of a single-output Gaussian
process regression.
with
$\theta = \{\theta_m, \theta_k, \boldsymbol{\Sigma}\}$, (15.68)
and the cross covariance between outputs modeled as
$\mathrm{Cov}\big(f_i(\mathbf{x}), f_j(\mathbf{x}')\big) = \Sigma_{ij}\, k(\mathbf{x}, \mathbf{x}'; \theta_k)$. (15.70)
Notice that there is no need to include a signal parameter in the covariance function, since it would be absorbed by $\boldsymbol{\Sigma}$. Therefore, in this model, one can take the signal strength of the response to be identically equal to one. Equation (15.70) provides the intuitive meaning of $\boldsymbol{\Sigma}$ as the matrix capturing the linear part of the correlations of the various outputs. In this case, the likelihood is given via a matrix normal [18]:
$p(\mathcal{D} \mid \theta) = p(\mathbf{Y} \mid \mathbf{X}, \theta) = \mathcal{N}_{n \times d_y}\big(\mathbf{Y} \mid \mathbf{m}(\mathbf{X}; \theta_m),\, \boldsymbol{\Sigma},\, k(\mathbf{X}, \mathbf{X}; \theta_k)\big)$. (15.71)
$p(\boldsymbol{\Sigma}) \propto |\boldsymbol{\Sigma}|^{-\frac{d_y + 1}{2}}$. (15.73)
The purpose of this section is to demonstrate how the posterior Gaussian process defined by Equations (15.40) and (15.43) can be represented as $\hat{f}(\cdot; \chi)$, where $\chi$ is a finite-dimensional random variable with probability density $p(\chi \mid \mathcal{D})$. There are two ways in which this can be achieved. The first way is based on a truncated Karhunen–Loève expansion [26, 32, 45] (KLE) of Equation (15.40). The second way is based on the idea introduced in O'Hagan et al. [38], further discussed in Oakley and O'Hagan [34, 35], and revisited in Bilionis et al. [10] as well as in Chen et al. [14]. In both approximations, $\hat{f}(\cdot; \chi)$ is given analytically. In particular, it can be expressed as
where $m^*(\cdot; \theta)$ is the posterior mean given in Equation (15.41) and $k^*(\mathbf{x}, \mathbf{X}_d; \theta)$ is the posterior cross covariance, defined generically via
$k^*(\mathbf{X}, \mathbf{X}'; \theta) := \begin{pmatrix} k^*(\mathbf{x}_1, \mathbf{x}_1'; \theta) & \cdots & k^*(\mathbf{x}_1, \mathbf{x}_{n'}'; \theta) \\ \vdots & \ddots & \vdots \\ k^*(\mathbf{x}_n, \mathbf{x}_1'; \theta) & \cdots & k^*(\mathbf{x}_n, \mathbf{x}_{n'}'; \theta) \end{pmatrix}$, (15.75)
with $k^*(\cdot, \cdot; \theta)$ being the posterior covariance defined in Equation (15.42). The parameters that characterize the surrogate surface $\hat{f}(\mathbf{x}; \chi)$ are:
$\chi = \{\theta, \boldsymbol{\omega}\}$, (15.76)
with
$\boldsymbol{\omega} \sim \mathcal{N}_{d_\omega}(\boldsymbol{\omega} \mid \mathbf{0}, \mathbf{I})$, (15.77)
and
$p(\chi \mid \mathcal{D}) = p(\boldsymbol{\omega})\, p(\theta \mid \mathcal{D})$, (15.79)
with $p(\theta \mid \mathcal{D})$ being the posterior of the hyper-parameters (see Equation (15.43)). The matrix
$\mathbf{X}_d = \big[\mathbf{x}_{d,1}^T \cdots \mathbf{x}_{d,n_d}^T\big]^T$ (15.80)
$\hat{f}(\mathbf{x}; \chi) = m^*(\mathbf{x}; \theta) + \sum_{\ell=1}^{d_\omega} \sqrt{\lambda_\ell(\theta)}\, \omega_\ell\, \phi_\ell(\mathbf{x}; \theta)$, (15.83)
where $\theta$ and $\boldsymbol{\omega}$ are as in Equations (15.76) and (15.77), respectively. The probability density of $\chi$, $p(\chi \mid \mathcal{D})$, is defined via Equations (15.79) and (15.78).
Equation (15.81) is a Fredholm integral eigenvalue problem. In general, this equation cannot be solved analytically. A very recent study of the numerical techniques that can be used for the solution of this problem can be found in Betz et al. [5]. This work relies on the Nyström approximation [20, 40] and follows Betz et al. [5] closely in its development.
Start by approximating the integral on the left-hand side of Equation (15.81) by
$\sum_{j=1}^{n_d} w_j\, k^*(\mathbf{x}, \mathbf{x}_{d,j}; \theta)\, \phi_\ell(\mathbf{x}_{d,j}; \theta) \approx \lambda_\ell(\theta)\, \phi_\ell(\mathbf{x}; \theta)$, (15.84)
where $\{\mathbf{x}_{d,j}, w_j\}_{j=1}^{n_d}$ is a suitable quadrature rule. Notice that in this approximation the design points of Equation (15.80) correspond to the quadrature points. The simplest quadrature rule would be a Monte Carlo type of rule, in which the points $\mathbf{x}_{d,j}$ are randomly picked in $\mathcal{X}$ and $w_j = \frac{1}{n_d}$. Other choices could be based on tensor products of one-dimensional rules or a sparse grid quadrature rule [44]. As shown later on, for the special but very common case of a separable covariance function, the difficulty of the problem can be reduced dramatically.
The next step in the Nyström approximation is to solve Equation (15.84) at the quadrature points:
$\sum_{j=1}^{n_d} w_j\, k^*(\mathbf{x}_{d,i}, \mathbf{x}_{d,j}; \theta)\, \phi_\ell(\mathbf{x}_{d,j}; \theta) \approx \lambda_\ell(\theta)\, \phi_\ell(\mathbf{x}_{d,i}; \theta)$. (15.85)
$\mathbf{B} \tilde{\mathbf{v}}_\ell = \lambda_\ell \tilde{\mathbf{v}}_\ell$, (15.87)
where
$\mathbf{B} = \mathbf{W}^{\frac{1}{2}}\, k^*(\mathbf{X}_d, \mathbf{X}_d; \theta)\, \mathbf{W}^{\frac{1}{2}}$. (15.88)
Typically, the KLE is truncated at some $d_\omega \le n_d$ such that $\alpha\%$ of the energy of the field is captured, e.g., $\alpha = 90\%$. That is, one may pick $d_\omega$ so that:
$\sum_{\ell=1}^{d_\omega} \lambda_\ell \ge \frac{\alpha}{100} \sum_{\ell=1}^{n_d} \lambda_\ell$. (15.92)
$\tilde{\mathbf{V}} = (\tilde{\mathbf{v}}_1 \cdots \tilde{\mathbf{v}}_{d_\omega})$, (15.94)
where
$m^*(\mathbf{X}_d; \theta) = \big[m^*(\mathbf{x}_{d,1}; \theta) \cdots m^*(\mathbf{x}_{d,n_d}; \theta)\big]^T$, (15.96)
with $m^*(\cdot; \theta)$ being the posterior mean defined in Equation (15.41), and
$k^*(\mathbf{X}_d, \mathbf{X}_d; \theta) = \begin{pmatrix} k^*(\mathbf{x}_{d,1}, \mathbf{x}_{d,1}; \theta) & \cdots & k^*(\mathbf{x}_{d,1}, \mathbf{x}_{d,n_d}; \theta) \\ \vdots & \ddots & \vdots \\ k^*(\mathbf{x}_{d,n_d}, \mathbf{x}_{d,1}; \theta) & \cdots & k^*(\mathbf{x}_{d,n_d}, \mathbf{x}_{d,n_d}; \theta) \end{pmatrix}$, (15.97)
with $k^*(\cdot, \cdot; \theta)$ being the posterior covariance defined in Equation (15.42). Finally, let us condition the posterior Gaussian process of Equation (15.40) on the hypothetical observations $\{\mathbf{X}_d, \mathbf{Y}_d\}$. One has:
$f(\cdot) \mid \mathcal{D}, \theta, \mathbf{X}_d, \mathbf{Y}_d \sim \mathrm{GP}\big(f(\cdot) \mid m^*(\cdot; \theta, \mathbf{Y}_d),\, k^*(\cdot, \cdot; \theta, \mathbf{Y}_d)\big)$, (15.98)
where the mean is given by:
$m^*(\mathbf{x}; \theta, \mathbf{Y}_d) = m^*(\mathbf{x}; \theta) + k^*(\mathbf{x}, \mathbf{X}_d; \theta)\, k^*(\mathbf{X}_d, \mathbf{X}_d; \theta)^{-1}\, \big(\mathbf{Y}_d - m^*(\mathbf{X}_d; \theta)\big)$, (15.99)
and the covariance is given by:
$k^*(\mathbf{x}, \mathbf{x}'; \theta, \mathbf{Y}_d) = k^*(\mathbf{x}, \mathbf{x}'; \theta) - k^*(\mathbf{x}, \mathbf{X}_d; \theta)\, k^*(\mathbf{X}_d, \mathbf{X}_d; \theta)^{-1}\, k^*(\mathbf{X}_d, \mathbf{x}'; \theta)$. (15.100)
The idea of [38] was that if $n_d$ is sufficiently large, then $k^*(\mathbf{x}, \mathbf{x}'; \theta, \mathbf{Y}_d)$ is very small and, thus, negligible. In other words, if $\mathbf{X}_d$ is dense enough, then all the probability mass of Equation (15.98) is accumulated around the mean $m^*(\cdot; \theta, \mathbf{Y}_d)$. Therefore, one may think of the mean $m^*(\cdot; \theta, \mathbf{Y}_d)$ as a sample surface from Equation (15.40).
To make the connection with Equation (15.74), let $\mathbf{L}^*(\theta)$ be the Cholesky decomposition of the covariance $k^*(\mathbf{X}_d, \mathbf{X}_d; \theta)$ of Equation (15.97),
$k^*(\mathbf{X}_d, \mathbf{X}_d; \theta) = \mathbf{L}^*(\theta)\, \mathbf{L}^*(\theta)^T$.
A hypothetical observation may then be generated as
$\mathbf{Y}_d = m^*(\mathbf{X}_d; \theta) + \mathbf{L}^*(\theta)\, \boldsymbol{\omega}$,
for a $\boldsymbol{\omega} \sim \mathcal{N}_{n_d}(\boldsymbol{\omega} \mid \mathbf{0}, \mathbf{I})$. Therefore, one may rewrite Equation (15.99) as:
$m^*(\mathbf{x}; \theta, \boldsymbol{\omega}) = m^*(\mathbf{x}; \theta) + k^*(\mathbf{x}, \mathbf{X}_d; \theta)\, \mathbf{L}^*(\theta)^{-T}\, \boldsymbol{\omega}$.
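The construction above fits in a few lines of code. In the sketch below, `mstar` and `kstar` are hypothetical stand-ins for the posterior mean and covariance functions of Equations (15.41) and (15.42); the jitter term is a numerical-stability assumption, not part of the theory:

```python
import numpy as np

def sample_surface(mstar, kstar, Xd, seed=0, jitter=1e-10):
    """Draw one approximate sample surface from a posterior GP:
    generate hypothetical observations Yd = m*(Xd) + L* omega and return
    the conditioned mean x -> m*(x) + k*(x, Xd) L*^{-T} omega."""
    K = kstar(Xd, Xd) + jitter * np.eye(len(Xd))   # k*(Xd, Xd; theta)
    L = np.linalg.cholesky(K)                      # K = L L^T
    omega = np.random.default_rng(seed).standard_normal(len(Xd))
    def surface(X):
        # m*(x; theta, omega) = m*(x) + k*(x, Xd) L^{-T} omega
        return mstar(X) + kstar(X, Xd) @ np.linalg.solve(L.T, omega)
    return surface
```

At the design points the surface reproduces the hypothetical observations exactly, since $k^*(\mathbf{X}_d, \mathbf{X}_d)\, \mathbf{L}^{-T} \boldsymbol{\omega} = \mathbf{L} \boldsymbol{\omega}$; between them it interpolates smoothly according to the covariance.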
1. Start with
$\mathbf{X}_{d,s} = \{\}$.
2. If
$|\mathbf{X}_{d,s}| = n_d$,
then stop.
Notice that when one includes a new point $\mathbf{x}_{d,i}$, one has to compute the Cholesky decomposition of the covariance matrix $k^*(\mathbf{X}_{d,s} \cup \{\mathbf{x}_{d,i}\}, \mathbf{X}_{d,s} \cup \{\mathbf{x}_{d,i}\}; \theta)$. This can be done efficiently using rank-one updates of the covariance matrix (see Seeger [43]).
It is quite obvious how Equation (15.74) can be used to obtain samples from the predictive distribution, $p(Q \mid \mathcal{D})$, of a statistic of interest $Q$. Thus, using a Monte Carlo procedure, one can characterize one's uncertainty about any statistic of the response surface. This could become computationally expensive in the case of high dimensions and many observations, albeit less expensive than evaluating these statistics using the simulator itself. Fortunately, as shown in this section, it is actually possible to evaluate exactly the predictive distribution for the mean statistic, Equation (15.11), since it turns out to be Gaussian. Furthermore, it is possible to derive the predictive mean and variance of the covariance statistic, Equation (15.12).
In this subsection, it is shown that the predictive distribution for the mean statistic
Equation (15.11) is actually Gaussian.
where $\chi$, $\theta$, and $\boldsymbol{\omega}$ are as before. Taking the expectation of this over $p(\xi)$, one obtains the mean statistic, with
$\boldsymbol{\epsilon}(\mathbf{X}_d; \theta) := \mathbb{E}_\xi\big[k^*(\mathbf{X}_d, \cdot; \theta)\big]$. (15.107)
$\mu_f(\theta) = \mathbb{E}_\xi\big[m(\cdot; \theta_m)\big] + \boldsymbol{\epsilon}(\cdot; \theta_k)^T\, k(\mathbf{X}, \mathbf{X}; \theta_k)^{-1}\, \big(\mathbf{Y} - m(\mathbf{X}; \theta_m)\big)$, (15.109)
and a similar expression for the predictive variance follows from Equation (15.42). For the case of an SE covariance (see Equation (15.37)) combined with a Gaussian or uniform distribution $p(\xi)$, [34] and [7], respectively, show that Equation (15.108) can be computed analytically. As shown in [8], for the case of an arbitrary separable covariance as well as arbitrary independent random variables $\xi$, Equation (15.108) can be computed efficiently by doing $d_\xi$ one-dimensional integrals.
In other words, it has been shown that if $f(\cdot)$ is a Gaussian process, then its mean $\mathbb{E}[f(\cdot)](\cdot)$ is a Gaussian process:
$\mathbb{E}[f(\cdot)](\cdot) \mid \mathcal{D}, \theta \sim \mathrm{GP}\big(\mathbb{E}[f(\cdot)](\cdot) \mid m_{\mathrm{mean}}(\cdot; \theta),\, k_{\mathrm{mean}}(\cdot, \cdot; \theta)\big)$. (15.112)
Note that if the stochastic variables in $\xi$ are independent and the covariance function $k(\mathbf{x}, \mathbf{x}'; \theta)$ is separable with respect to the $\xi_i$'s, then all these quantities can be computed efficiently with numerical integration.
Equations similar to Equation (15.111) can be derived without difficulty for
the covariance statistic of Equation (15.12) as well as for the variance statistic of
Equation (15.13) [10, 14]. In contrast to the mean statistic, however, the resulting
random field is not Gaussian. That is, an equation similar to Equation (15.112) does
not hold.
3 Numerical Examples
In this synthetic example, the ability of the Bayesian approach to characterize one's state of knowledge about the statistics with a very limited number of observations is demonstrated. To keep things simple, start with no space/time inputs ($d_s = d_t = 0$), one stochastic variable ($d_\xi = 1$), and one output ($d_y = 1$). That is, the input is just $x = \xi$.
Consider $n = 7$ arbitrary observations, $\mathcal{D} = \big\{(x^{(i)}, y^{(i)})\big\}_{i=1}^{n}$, which are shown as crosses in Fig. 15.1a. The goal is to use these seven observations to learn the underlying response function $y = f(x)$ and characterize one's state of knowledge about the mean $\mathbb{E}[f(\cdot)]$ (Equation (15.11)), the variance $\mathbb{V}[f(\cdot)]$ (Equation (15.13)), and the induced probability density function of the response $y$:
$p(y) = \int \delta\big(y - f(x)\big)\, p(x)\, dx$,
where p.x/ is the input probability density, taken to be a Beta.10; 2/ and shown in
Fig. 15.1b.
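The integral above simply says that p(y) is the pushforward of p(x) through f. Given any surrogate of f, it can therefore be approximated by sampling; the sketch below is illustrative only (the response function is a stand-in, not the chapter's data) and smooths the pushed-forward samples with a kernel density estimate:

```python
import numpy as np
from scipy.stats import beta, gaussian_kde

def pushforward_pdf(f, n_samples=20000, seed=0):
    """Estimate p(y) = int delta(y - f(x)) p(x) dx with p(x) = Beta(10, 2):
    draw x ~ p(x), push the samples through f, and smooth with a KDE."""
    x = beta(10, 2).rvs(size=n_samples, random_state=seed)
    return gaussian_kde(f(x))

# Toy response standing in for the (unknown) true f.
kde = pushforward_pdf(lambda x: np.sin(2 * np.pi * x))
```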
The first step is to assign a prior Gaussian process to the response (Equa-
tion (15.18)). This is done by picking a zero mean and an SE covariance function
(Equation (15.37)) with no nugget, σ² = 0 in Equation (15.35), and fixed signal and
length-scale parameters, s = 1 and ℓ = 0.1, respectively. These choices represent
one's prior beliefs about the underlying response function y = f(x).
Given the observations in D, the updated state of knowledge is characterized
by the posterior GP of Equation (15.40). The posterior mean function, m*(·) of
Equation (15.41), is the dashed blue line of Fig. 15.1a. The shaded gray area of the
same figure corresponds to a 95% predictive interval about the mean. This interval
is computed using the posterior covariance function, k*(·, ·) of Equation (15.42).
Specifically, the point predictive distribution at x is

    p(y | x, D) = N(m*(x), σ*²(x)),

where σ*(x) = √(k*(x, x)) and, thus, the 95% predictive interval at x is given,
approximately, by (m*(x) − 1.96 σ*(x), m*(x) + 1.96 σ*(x)). The posterior mean
can be thought of as a point estimate of the underlying response surface.
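The posterior formulas above translate into a few lines of linear algebra. The following is a generic reimplementation (not the authors' software) of the zero-mean, noise-free SE posterior with the hyperparameters quoted above, s = 1 and ℓ = 0.1; the tiny jitter added to the diagonal is a numerical assumption standing in for the absent nugget:

```python
import numpy as np

def se_cov(X1, X2, s=1.0, ell=0.1):
    """Squared-exponential covariance k(x, x') = s^2 exp(-(x - x')^2 / (2 ell^2))."""
    d = X1[:, None] - X2[None, :]
    return s ** 2 * np.exp(-0.5 * d ** 2 / ell ** 2)

def gp_posterior(X, Y, Xs, s=1.0, ell=0.1, jitter=1e-10):
    """Posterior mean m*(x) and standard deviation sigma*(x) of a zero-mean
    GP conditioned on noise-free observations (X, Y), evaluated at Xs."""
    K = se_cov(X, X, s, ell) + jitter * np.eye(len(X))
    Ks = se_cov(Xs, X, s, ell)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    m = Ks @ alpha                                   # posterior mean m*(Xs)
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(se_cov(Xs, Xs, s, ell)) - np.sum(v ** 2, axis=0)
    return m, np.sqrt(np.maximum(var, 0.0))          # clamp tiny negatives
```

The 95% predictive interval at x is then (m − 1.96 σ, m + 1.96 σ), exactly as shaded in the figure.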
Fig. 15.1 Synthetic: Subfigure (a) shows the observed data (cross symbols), the mean (dashed
blue line), the 95% predictive intervals (shaded gray area), and three samples (solid black
lines) from the posterior Gaussian process conditioned on the observed data. The green line of
subfigure (b) shows the probability density function imposed on the input x. The three lines in
subfigure (c) correspond to the first three eigenfunctions used in the KLE of the posterior GP.
Subfigures (d) and (e) depict the predictive distribution conditioned on the observations of the mean
and the variance statistic of f .x/, respectively. Subfigure (f) shows the mean predicted probability
density of y = f(x) (blue dashed line) with 95% predictive intervals (shaded gray area), and
three samples (solid black lines) from the posterior predictive probability measure on the space of
probability densities
582 I. Bilionis and N. Zabaras
where

    ξ_i ∼ U(−1, 1), i = 1, 2,

is considered. To make the connection with the notation of this chapter, note that
ds = 0, dt = 1, dξ = 2, and dy = 3. For each choice of ξ, the computer
Fig. 15.2 Dynamical system: Subfigures (a) and (b) correspond to the predictions about the
mean of y1(t) and y3(t), respectively, using n = 100 simulations. Subfigures (c) and (d) ((e)
and (f)) show the predictions about the variance of the same quantities for n = 100 (n = 150)
simulations (Reproduced with permission from [10])
Fig. 15.3 Dynamical system: Columns correspond to results using n = 70, 100, and 150
simulations, counting from the left. Counting from the top, rows one, two, three, and four show
the predictions about the PDF of y2(t) at times t = 4, 6, 8, 10, respectively (Reproduced with
permission from [10])
Fig. 15.4 Partial differential equation: Mean of E[u(x_s; ·)]. Subfigures (a), (b), and (c) show the
predictive mean of E[u(x_s; ·)] as a function of x_s conditioned on 24, 64, and 120 simulations,
respectively. Subfigure (d) plots two standard deviations of E[u(x_s; ·)] conditioned on 120
observations. Subfigure (e) shows the MC estimate of the same quantity (Reproduced with
permission from [10])
Fig. 15.5 Partial differential equation: Mean of E[v(x_s; ·)]. Subfigures (a), (b), and (c) show the
predictive mean of E[v(x_s; ·)] as a function of x_s conditioned on 24, 64, and 120 simulations,
respectively. Subfigure (d) plots two standard deviations of E[v(x_s; ·)] conditioned on 120
observations. Subfigure (e) shows the MC estimate of the same quantity (Reproduced with
permission from [10])
Fig. 15.6 Partial differential equation: Mean of E[p(x_s; ·)]. Subfigures (a), (b), and (c) show the
predictive mean of E[p(x_s; ·)] as a function of x_s conditioned on 24, 64, and 120 simulations,
respectively. Subfigure (d) plots two standard deviations of E[p(x_s; ·)] conditioned on 120
observations. Subfigure (e) shows the MC estimate of the same quantity (Reproduced with
permission from [10])
Fig. 15.7 Partial differential equation: Mean of V[u(x_s; ·)]. Subfigures (a), (b), and (c) show the
predictive mean of V[u(x_s; ·)] as a function of x_s conditioned on 24, 64, and 120 simulations,
respectively. Subfigure (d) plots two standard deviations of V[u(x_s; ·)] conditioned on 120
observations. Subfigure (e) shows the MC estimate of the same quantity (Reproduced with
permission from [10])
Fig. 15.8 Partial differential equation: Mean of V[v(x_s; ·)]. Subfigures (a), (b), and (c) show the
predictive mean of V[v(x_s; ·)] as a function of x_s conditioned on 24, 64, and 120 simulations,
respectively. Subfigure (d) plots two standard deviations of V[v(x_s; ·)] conditioned on 120
observations. Subfigure (e) shows the MC estimate of the same quantity (Reproduced with
permission from [10])
Fig. 15.9 Partial differential equation: Mean of V[p(x_s; ·)]. Subfigures (a), (b), and (c) show the
predictive mean of V[p(x_s; ·)] as a function of x_s conditioned on 24, 64, and 120 simulations,
respectively. Subfigure (d) plots two standard deviations of V[p(x_s; ·)] conditioned on 120
observations. Subfigure (e) shows the MC estimate of the same quantity (Reproduced with
permission from [10])
Fig. 15.10 Partial differential equation: The prediction for the PDF of u(x_s = (0.5, 0.5); ·). The
solid blue line shows the mean predictive distribution of the PDF conditioned on 24 (a), 64 (b),
and 120 (c) simulations. The filled gray area depicts two standard deviations of the predictive
distribution of PDFs about the predictive mean of the PDF. The solid red line of (d) shows the MC
estimate for comparison (Reproduced with permission from [10])
4 Conclusions
Fig. 15.11 Partial differential equation: The prediction for the PDF of u(x_s = (0.25, 0.25); ·).
The solid blue line shows the mean predictive distribution of the PDF conditioned on 24 (a), 64 (b),
and 120 (c) simulations. The filled gray area depicts two standard deviations of the predictive
distribution of PDFs about the predictive mean of the PDF. The solid red line of (d) shows the MC
estimate for comparison (Reproduced with permission from [10])
Despite the successes of the current state of the Bayesian approach to the UP
problem, there is still a wealth of open research questions. First, carrying out
GP regression in high dimensions is not a trivial problem since it requires the
development of application-specific covariance functions. The study of covariance
functions that automatically perform some kind of internal dimensionality reduction
seems to be a promising step forward. Second, in order to capture sharp variations in
the response surface, such as localized bumps or even discontinuities, there is a need
for flexible nonstationary covariance functions or alternative approaches based on
mixtures of GPs, e.g., see [14]. Third, there is a need for computationally efficient
ways of treating nonlinear correlations between distinct model outputs, since this
is expected to squeeze more information out of the simulations. Fourth, as a semi-
Fig. 15.12 Partial differential equation: The prediction for the PDF of p(x_s = (0.5, 0.5); ·). The
solid blue line shows the mean predictive distribution of the PDF conditioned on 24 (a), 64 (b),
and 120 (c) simulations. The filled gray area depicts two standard deviations of the predictive
distribution of PDFs about the predictive mean of the PDF. The solid red line of (d) shows the MC
estimate for comparison (Reproduced with permission from [10])
intrusive approach, the mathematical models describing the physics of the problem
could be used to derive physics-constrained covariance functions that would,
presumably, force the prior GP probability measure to be compatible with known
response properties, such as mass conservation. That is, such an approach would put
more effort into better representing our prior state of knowledge about the response.
Fifth, there is an evident need for developing simulation selection policies which
are specifically designed to gather information about the uncertainty propagation
task. Finally, note that the Bayesian approach can also be applied to other important
contexts such as model calibration and design optimization under uncertainty. As a
result, all the open research questions have the potential to also revolutionize these
fields.
Fig. 15.13 Partial differential equation: The prediction for the PDF of p(x_s = (0.25, 0.25); ·).
The solid blue line shows the mean predictive distribution of the PDF conditioned on 24 (a), 64 (b),
and 120 (c) simulations. The filled gray area depicts two standard deviations of the predictive
distribution of PDFs about the predictive mean of the PDF. The solid red line of (d) shows the MC
estimate for comparison (Reproduced with permission from [10])
References
1. Aarnes, J.E., Kippe, V., Lie, K.A., Rustad, A.B.: Modelling of multiscale structures in flow simulations for petroleum reservoirs. In: Hasle, G., Lie, K.A., Quak, E. (eds.): Geometric Modelling, Numerical Simulation, and Optimization, chap. 10, pp. 307–360. Springer, Berlin/Heidelberg (2007). doi:10.1007/978-3-540-68783-2_10
2. Alvarez, M., Lawrence, N.D.: Sparse convolved Gaussian processes for multi-output regression. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.): Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver, B.C., Canada (2008)
3. Alvarez, M., Luengo-Garcia, D., Titsias, M., Lawrence, N.: Efficient multioutput Gaussian processes through variational inducing kernels. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, USA (2011)
4. Babuška, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. 45(3), 1005–1034 (2007)
5. Betz, W., Papaioannou, I., Straub, D.: Numerical methods for the discretization of random fields by means of the Karhunen-Loève expansion. Comput. Methods Appl. Mech. Eng. 271, 109–129 (2014). doi:10.1016/j.cma.2013.12.010
6. Bilionis, I.: py-orthpol: Construct orthogonal polynomials in python. https://github.com/PredictiveScienceLab/py-orthpol (2013)
7. Bilionis, I., Zabaras, N.: Multi-output local Gaussian process regression: applications to uncertainty quantification. J. Comput. Phys. 231(17), 5718–5746 (2012). doi:10.1016/j.jcp.2012.04.047
8. Bilionis, I., Zabaras, N.: Multidimensional adaptive relevance vector machines for uncertainty quantification. SIAM J. Sci. Comput. 34(6), B881–B908 (2012). doi:10.1137/120861345
9. Bilionis, I., Zabaras, N.: Solution of inverse problems with limited forward solver evaluations: a Bayesian perspective. Inverse Probl. 30(1), 015004 (2014). doi:10.1088/0266-5611/30/1/015004
10. Bilionis, I., Zabaras, N., Konomi, B.A., Lin, G.: Multi-output separable Gaussian process: towards an efficient, fully Bayesian paradigm for uncertainty quantification. J. Comput. Phys. 241, 212–239 (2013). doi:10.1016/j.jcp.2013.01.011
11. Bilionis, I., Drewniak, B.A., Constantinescu, E.M.: Crop physiology calibration in the CLM. Geoscientific Model Dev. 8(4), 1071–1083 (2015). doi:10.5194/gmd-8-1071-2015
12. Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)
13. Boyle, P., Frean, M.: Dependent Gaussian processes. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.): Advances in Neural Information Processing Systems 17 (NIPS 2004), Whistler, B.C., Canada (2004)
14. Chen, P., Zabaras, N., Bilionis, I.: Uncertainty propagation using infinite mixture of Gaussian processes and variational Bayesian inference. J. Comput. Phys. 284, 291–333 (2015)
15. Conti, S., O'Hagan, A.: Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Inference 140(3), 640–651 (2010). doi:10.1016/j.jspi.2009.08.006
16. Currin, C., Mitchell, T., Morris, M., Ylvisaker, D.: A Bayesian approach to the design and analysis of computer experiments. Report, Oak Ridge National Laboratory (1988)
17. Currin, C., Mitchell, T., Morris, M., Ylvisaker, D.: Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. J. Am. Stat. Assoc. 86(416), 953–963 (1991). doi:10.2307/2290511
18. Dawid, A.P.: Some matrix-variate distribution theory: notational considerations and a Bayesian application. Biometrika 68(1), 265–274 (1981)
19. Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(3), 411–436 (2006)
20. Delves, L.M., Walsh, J.E. (eds.): Numerical Solution of Integral Equations. Clarendon Press, Oxford (1974)
21. Doucet, A., De Freitas, N., Gordon, N. (eds.): Sequential Monte Carlo Methods in Practice (Statistics for Engineering and Information Science). Springer, New York (2001)
22. Durrande, N., Ginsbourger, D., Roustant, O.: Additive covariance kernels for high-dimensional Gaussian process modeling. arXiv:1111.6233 (2011)
23. Duvenaud, D., Nickisch, H., Rasmussen, C.E.: Additive Gaussian processes. In: Advances in Neural Information Processing Systems, vol. 24, pp. 226–234 (2011)
24. Gautschi, W.: On generating orthogonal polynomials. SIAM J. Sci. Stat. Comput. 3(3), 289–317 (1982). doi:10.1137/0903018
25. Gautschi, W.: Algorithm 726: ORTHPOL, a package of routines for generating orthogonal polynomials and Gauss-type quadrature rules. ACM Trans. Math. Softw. 20(1), 21–62 (1994). doi:10.1145/174603.174605
26. Ghanem, R., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach, rev. edn. Dover Publications, Mineola (2003)
27. Gramacy, R.B., Lee, H.K.H.: Cases for the nugget in modeling computer experiments. Stat. Comput. 22(3), 713–722 (2012). doi:10.1007/s11222-010-9224-x
28. Haff, L.: An identity for the Wishart distribution with applications. J. Multivar. Anal. 9(4), 531–544 (1979). doi:10.1016/0047-259X(79)90056-3
29. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970). doi:10.2307/2334940
30. Higdon, D., Gattiker, J., Williams, B., Rightley, M.: Computer model calibration using high-dimensional output. J. Am. Stat. Assoc. 103(482), 570–583 (2008)
31. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer Series in Statistics. Springer, New York (2001)
32. Loève, M.: Probability Theory, 4th edn. Graduate Texts in Mathematics. Springer, New York (1977)
33. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953). doi:10.1063/1.1699114
34. Oakley, J., O'Hagan, A.: Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika 89(4), 769–784 (2002)
35. Oakley, J.E., O'Hagan, A.: Probabilistic sensitivity analysis of complex models: a Bayesian approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66, 751–769 (2004). doi:10.1111/j.1467-9868.2004.05304.x
36. O'Hagan, A.: Bayes-Hermite quadrature. J. Stat. Plan. Inference 29(3), 245–260 (1991)
37. O'Hagan, A., Kennedy, M.: Gaussian emulation machine for sensitivity analysis (GEM-SA) (2015). http://www.tonyohagan.co.uk/academic/GEM/
38. O'Hagan, A., Kennedy, M.C., Oakley, J.E.: Uncertainty analysis and other inference tools for complex computer codes. Bayesian Stat. 6, 503–524 (1999)
39. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
40. Reinhardt, H.J.: Analysis of Approximation Methods for Differential and Integral Equations. Applied Mathematical Sciences. Springer, New York (1985)
41. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer Texts in Statistics. Springer, New York (2004)
42. Sacks, J., Welch, W.J., Mitchell, T., Wynn, H.P.: Design and analysis of computer experiments. Stat. Sci. 4(4), 409–423 (1989)
43. Seeger, M.: Low rank updates for the Cholesky decomposition. Report, University of California at Berkeley (2007)
44. Smolyak, S.A.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Sov. Math. Dokl. 4, 240–243 (1963)
45. Stark, H., Woods, J.W.: Probability and Random Processes with Applications to Signal Processing, 3rd edn. Prentice Hall, Upper Saddle River (2002)
46. Stegle, O., Lippert, C., Mooij, J.M., Lawrence, N.D., Borgwardt, K.M.: Efficient inference in matrix-variate Gaussian models with iid observation noise. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.): Advances in Neural Information Processing Systems 24 (NIPS 2011), Granada, Spain (2011)
47. Van Loan, C.F.: The ubiquitous Kronecker product. J. Comput. Appl. Math. 123(1–2), 85–100 (2000)
48. Wan, J., Zabaras, N.: A Bayesian approach to multiscale inverse problems using the sequential Monte Carlo method. Inverse Probl. 27(10), 105004 (2011)
49. Wan, X.L., Karniadakis, G.E.: An adaptive multi-element generalized polynomial chaos method for stochastic differential equations. J. Comput. Phys. 209(2), 617–642 (2005). doi:10.1016/j.jcp.2005.03.023
50. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J., Morris, M.D.: Screening, predicting, and computer experiments. Technometrics 34(1), 15–25 (1992)
51. Xiu, D.B.: Efficient collocational approach for parametric uncertainty analysis. Commun. Comput. Phys. 2(2), 293–309 (2007)
52. Xiu, D.B., Hesthaven, J.S.: High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. 27(3), 1118–1139 (2005)
53. Xiu, D.B., Karniadakis, G.E.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)
16 Solution Algorithms for Stochastic Galerkin Discretizations of Differential Equations with Random Data
Howard Elman
Abstract
This chapter discusses algorithms for solving systems of algebraic equations
arising from stochastic Galerkin discretization of partial differential equations
with random data, using the stochastic diffusion equation as a model problem.
For problems in which uncertain coefficients in the differential operator are linear
functions of random parameters, a variety of efficient algorithms of multigrid
and multilevel type are presented, and, where possible, analytic bounds on
convergence of these methods are derived. Some limitations of these approaches
for problems that have nonlinear dependence on parameters are outlined, but for
one example of such a problem, the diffusion equation with a diffusion coefficient
that has exponential structure, a strategy is described for which the reformulated
problem is also amenable to efficient solution by multigrid methods.
Keywords
Convergence analysis · Iterative methods · Multigrid · Stochastic Galerkin
Contents
1 Introduction: The Stochastic Finite Element Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
2 Solution Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
2.1 Multigrid Methods I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
2.2 Multigrid Methods II: Mean-Based Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . 608
2.3 Hierarchical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
3 Approaches for Other Formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
H. Elman ()
Department of Computer Science and Institute for Advanced Computer Studies, University of
Maryland, College Park, MD, USA
e-mail: elman@cs.umd.edu
This chapter is concerned with algorithms for solving the systems of algebraic
equations arising from stochastic Galerkin discretization of partial differential
equations (PDEs) with random data. Consider a PDE of generic form Lu = f
on a spatial domain D, subject to boundary conditions Bu = g on ∂D, where
one or more of the operators L, B, or functions f, g depend on random data. The
most challenging scenario is when the dependence is in the differential operator L,
and the discussion will be concentrated on this case. An example is the diffusion
equation, where Lu ≡ −∇·(a∇u), subject to boundary conditions u = g_D on
∂D_D, a ∂u/∂n = 0 on ∂D_N. The diffusion coefficient a = a(x, ω) is a random field:
given a probability space (Ω, F, P), for each ω ∈ Ω, the realization a(·, ω) is a
function defined on D, and for each x ∈ D, a(x, ·) is a random variable. Although the
focus will be on the diffusion equation, the solution algorithms described are easily
generalized to other problems with random coefficients. (See, e.g., [22] for a study
of the Navier-Stokes equations.) In discussion of the diffusion problem, it will be
assumed that a(x, ω) is uniformly bounded, i.e., 0 < α₁ ≤ a(x, ω) ≤ α₂ < ∞ a.e.,
which ensures well-posedness. The Galerkin formulation of the stochastic diffusion
problem augments the standard (spatial) Galerkin methodology using averaging
with respect to expected value in the probability measure.
To begin, this section contains a concise introduction to the stochastic Galerkin
methodology. Comprehensive treatments of this approach can be found in [10,
17, 18, 33]. The next section contains a description and analysis of two efficient
solution algorithms that use a multigrid structure in the spatial component of the
problem, together with a description of an algorithm that can be viewed as having
a multilevel structure in the stochastic component. These methods are designed for
problems that depend linearly on a set of random parameters. This is followed by
a section discussing some limitations of these ideas for problems with nonlinear
dependence on random parameters, coupled with the presentation of an effective
strategy to handle one specific version of such problems, where the diffusion
coefficient is of exponential form. The final section contains a brief recapitulation
of the chapter.
Let H¹(D) denote the Sobolev space of functions on D with square-integrable
first derivatives, let H¹_E(D) ≡ {u ∈ H¹(D) | u = g_D on ∂D_D}, let H¹₀(D) ≡ {u ∈
H¹(D) | u = 0 on ∂D_D}, and let L²(Ω) denote the Hilbert space with inner product
defined by expected value on Ω,

    ⟨v, w⟩ ≡ ∫_Ω v(ω) w(ω) dP(ω).
where {ξ_r}_{r=1}^m is a set of m random variables. A typical example comes from a
Karhunen-Loève (KL) expansion derived from a covariance function. In this case,
{a_r}_{r=1}^m correspond to discrete versions of the m eigenfunctions of the covariance
operator associated with the m largest eigenvalues, and {ξ_r} are uncorrelated zero-
mean random variables defined on Ω. (It is assumed for (16.2) that the associated
eigenvalues are incorporated into {a_r} as factors.) The covariance function is pos-
itive semi-definite, and {a_r}_{r=1}^m can be listed in nonincreasing order of magnitude,
according to the nonincreasing sizes of the eigenvalues.
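A discrete version of this construction is easy to sketch: eigendecompose the covariance matrix evaluated on a grid and absorb the square roots of the eigenvalues into the modes, as assumed in the parenthetical remark above. The covariance function and grid below are illustrative assumptions, and quadrature weights are ignored for simplicity:

```python
import numpy as np

def discrete_kl(x, cov_fn, m):
    """Discrete Karhunen-Loeve sketch: eigendecompose C_ij = c(x_i, x_j)
    and keep the m dominant modes, returning columns a_r = sqrt(lam_r) v_r
    so that a(x, xi) ~ abar(x) + sum_r a_r(x) xi_r."""
    C = cov_fn(x[:, None], x[None, :])
    lam, V = np.linalg.eigh(C)            # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]        # dominant modes first
    return np.sqrt(np.maximum(lam[:m], 0.0)) * V[:, :m]

# Exponential covariance with correlation length 0.5 on a 1-D grid.
x = np.linspace(0.0, 1.0, 101)
A = discrete_kl(x, lambda s, t: np.exp(-np.abs(s - t) / 0.5), m=6)
```

The columns come out in nonincreasing order of magnitude, mirroring the ordering convention described in the text.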
Let ξ ≡ (ξ₁, …, ξ_m)ᵀ. For fixed x, the diffusion coefficient can be viewed as a
function of ξ, and it will be written using the same symbol, as a(x, ξ). It can then
be shown that the solution u(x, ξ) is also a function of ξ [18, Ch. 9], and if the joint
density ρ(ξ) is known, then (16.1) can be rewritten as

    ∫_Γ ∫_D a(x, ξ) ∇u(x, ξ) · ∇v(x, ξ) dx ρ(ξ) dξ = ∫_Γ ∫_D f(x, ξ) v(x, ξ) dx ρ(ξ) dξ,   (16.3)

where Γ = ξ(Ω). Note that in (16.3), the vector ξ of random variables is playing
a role analogous to spatial Cartesian coordinates in the physical domain. For a d-
dimensional physical domain, (16.3) is similar in form to the weak formulation of a
(d + m)-dimensional continuous deterministic problem.
Numerical approximation is done on a finite-dimensional subset of H¹_E(D) ⊗
L²(Γ). For this, let S⁽ʰ⁾ denote a finite-dimensional subspace of H¹₀(D) with basis
{φ_j}_{j=1}^{N_x}, let S_E⁽ʰ⁾ be an extended version of this space with additional functions
{φ_j}_{j=N_x+1}^{N_x+N_∂}, so that Σ_{j=N_x+1}^{N_x+N_∂} u_j φ_j interpolates the boundary data g_D on ∂D_D,
and let T⁽ᵖ⁾ denote a finite-dimensional subspace of L²(Γ) with basis {ψ_q}_{q=1}^{N_ξ}.
The discrete weak problem is then to find u⁽ʰᵖ⁾ ∈ S_E⁽ʰ⁾ ⊗ T⁽ᵖ⁾ such that

    ∫_Γ ∫_D a ∇u⁽ʰᵖ⁾ · ∇v⁽ʰᵖ⁾ dx ρ(ξ) dξ = ∫_Γ ∫_D f v⁽ʰᵖ⁾ dx ρ(ξ) dξ,

for all v⁽ʰᵖ⁾ ∈ S⁽ʰ⁾ ⊗ T⁽ᵖ⁾. The discrete solution has the form

    u⁽ʰᵖ⁾(x, ξ) = Σ_{q=1}^{N_ξ} Σ_{j=1}^{N_x} u_{jq} φ_j(x) ψ_q(ξ) + Σ_{j=N_x+1}^{N_x+N_∂} u_j φ_j(x).   (16.4)
604 H. Elman
for the coefficients {u_{jq}}, j = 1, …, N_x, q = 1, …, N_ξ. (From the assumption of deterministic
boundary data, the known coefficients of the spatial basis functions on the Dirichlet
boundary ∂D_D can be incorporated into the right-hand side of (16.5).) If the vector
u⁽ʰᵖ⁾ of these coefficients is ordered by grouping the spatial indices together, as

    u₁₁, u₂₁, …, u_{N_x,1}, u₁₂, u₂₂, …, u_{N_x,2}, …, u_{1,N_ξ}, u_{2,N_ξ}, …, u_{N_x,N_ξ},

and the equations are ordered in an analogous way, then the coefficient matrix is a
sum of matrices of Kronecker-product structure,

    A⁽ʰᵖ⁾ = G₀⁽ᵖ⁾ ⊗ A₀⁽ʰ⁾ + Σ_{r=1}^m G_r⁽ᵖ⁾ ⊗ A_r⁽ʰ⁾,   (16.6)
where

    [A_r⁽ʰ⁾]_{jk} = ∫_D a_r(x) ∇φ_k(x) · ∇φ_j(x) dx,
    [G₀⁽ᵖ⁾]_{lq} = ∫_Γ ψ_q(ξ) ψ_l(ξ) ρ(ξ) dξ,   [G_r⁽ᵖ⁾]_{lq} = ∫_Γ ξ_r ψ_q(ξ) ψ_l(ξ) ρ(ξ) dξ,   (16.7)

and the Kronecker product of matrices G (of order N_ξ) and A (of order N_x) is the
matrix of order N_x N_ξ given by

    G ⊗ A = [ g₁₁A   g₁₂A   ⋯  g_{1,N_ξ}A
              g₂₁A   g₂₂A   ⋯  g_{2,N_ξ}A
               ⋮       ⋮     ⋱      ⋮
              g_{N_ξ,1}A  g_{N_ξ,2}A  ⋯  g_{N_ξ,N_ξ}A ].
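In practice A⁽ʰᵖ⁾ is never assembled; the identity (G ⊗ A) vec(U) = vec(A U Gᵀ) yields matrix-vector products from the small factors alone, which is all that Krylov and multigrid solvers need. A sketch under the ordering just described (the function name is illustrative):

```python
import numpy as np

def galerkin_matvec(Gs, As, u):
    """Apply A = sum_r G_r (x) A_r to a vector without forming A, using
    (G (x) A) vec(U) = vec(A U G^T).  The vector u groups the spatial
    index fastest, i.e. u = vec(U) column-major with U of shape Nx-by-Nxi."""
    Nx = As[0].shape[0]
    Nxi = Gs[0].shape[0]
    U = u.reshape(Nxi, Nx).T              # unpack: columns are spatial vectors
    W = sum(A @ U @ G.T for G, A in zip(Gs, As))
    return W.T.reshape(-1)                # repack in the same ordering
```

This reduces storage from O(N_x² N_ξ²) for the dense Kronecker matrix to the cost of the small factors, at O(Σ_r (N_x² N_ξ + N_x N_ξ²)) work per product (less when the factors are sparse).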
The joint density function ρ(ξ) may not be known, and an additional assumption
typically made is that the random variables {ξ_r} are independent with known
marginal density functions {ρ_r} [34] (cf. [12] for an analysis of this assumption).
The joint density function is then the product of the marginal density functions, ρ(ξ) =
ρ₁(ξ₁) ρ₂(ξ₂) ⋯ ρ_m(ξ_m). This also enables a simple choice of basis functions for
T⁽ᵖ⁾. Let {p_j⁽ʳ⁾}_{j≥0} be the set of polynomials orthogonal with respect to the density
function ρ_r, normalized so that ⟨p_j⁽ʳ⁾, p_j⁽ʳ⁾⟩ = 1. Then, each basis function ψ_q can
be taken to be a product of univariate polynomials [10, 34]
$$\psi_q(\xi) = p_{j_1}^{(1)}(\xi_1) \, p_{j_2}^{(2)}(\xi_2) \cdots p_{j_m}^{(m)}(\xi_m),$$

where $T^{(p)}$ is taken to consist of polynomials of total degree at most $p$, i.e., $j_1 + \cdots + j_m \le p$. Then $N_\xi = \binom{m+p}{p}$, and it follows from properties of orthogonal polynomials that the matrices $\{G_r^{(p)}\}$ are sparse, with at most two nonzeros per row. $A^{(hp)}$ is also sparse; a representative example of its sparsity pattern, for $m = 6$ and $p = 4$, is shown in Fig. 16.1. Each pixel in the figure corresponds to a matrix of order $N_x$ with nonzero structure like that obtained from discretization of a deterministic PDE. The blocking of the matrix will be explained below.

Fig. 16.1 Representative sparsity pattern of the coefficient matrix obtained from stochastic Galerkin discretization with $m = 6$, $p = 4$, giving block order 210
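The dimension counts above are easy to check; the following snippet reproduces the block orders quoted for Fig. 16.1 ($m = 6$, $p = 4$).

```python
from math import comb

def N_xi(m, p):
    """Number of m-variate polynomials of total degree <= p."""
    return comb(m + p, p)

assert N_xi(6, 4) == 210        # block order of the matrix in Fig. 16.1
assert N_xi(6, 3) == 84         # leading block, total degree <= 3
assert N_xi(6, 4) - N_xi(6, 3) == 126   # basis functions of degree exactly 4
```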
The following lemma, which bounds the eigenvalues of the matrices $G_r^{(p)}$, will be of use for analyzing the convergence properties of iterative solution algorithms for the linear system; see [8] or [21] for a proof. It is shown in the latter reference that the maximal eigenvalue of $G_r^{(p)}$ is bounded by $\sqrt{p} + \sqrt{p-1}$ in the case of standard Gaussian random variables and by $\sqrt{3}$ for uniform variables with mean 0 and variance 1. The assumption concerning normalized polynomials implies that $G_0^{(p)} = I$, the identity matrix of order $N_\xi$.
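The sparsity and eigenvalue claims can be verified numerically in a single stochastic dimension. The sketch below (polynomial degree chosen for illustration) builds $G_r^{(p)}$ for a variance-1 uniform variable using the orthonormal scaled Legendre polynomials and checks that it is tridiagonal with zero diagonal (hence at most two nonzeros per row) and that its spectrum lies inside $[-\sqrt{3}, \sqrt{3}]$.

```python
import numpy as np
from numpy.polynomial import legendre

p = 6  # polynomial degree; basis psi_0, ..., psi_p

# Orthonormal polynomials for xi ~ Uniform(-sqrt(3), sqrt(3)) (mean 0, variance 1):
# psi_n(xi) = sqrt(2n+1) * P_n(xi / sqrt(3)), with P_n the Legendre polynomial.
t, w = legendre.leggauss(2 * p + 2)      # Gauss-Legendre nodes/weights on [-1, 1]
w = w / 2.0                              # weights of the uniform density on [-1, 1]
psi = np.array([np.sqrt(2 * n + 1) * legendre.Legendre.basis(n)(t)
                for n in range(p + 1)])

# Entries <xi psi_q psi_l> with xi = sqrt(3) * t.
G = np.einsum("k,qk,lk,k->ql", np.sqrt(3.0) * t, psi, psi, w)
G[np.abs(G) < 1e-12] = 0.0

# Sparsity: G is tridiagonal with zero diagonal.
assert all(np.count_nonzero(row) <= 2 for row in G)
# Eigenvalue bound for uniform variables: |lambda| < sqrt(3).
assert np.max(np.abs(np.linalg.eigvalsh(G))) < np.sqrt(3.0)
```

The eigenvalues of this tridiagonal matrix are, up to the scaling $\sqrt{3}$, the Gauss-Legendre quadrature nodes, which is why they stay strictly inside the support of the variable.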
2 Solution Algorithms
This section presents a collection of efficient iterative solution algorithms for the
coupled linear system (16.5) that arises from stochastic finite element methods.
The emphasis is on approaches whose convergence behavior has been shown to
be insensitive to the parameters such as spatial mesh size or polynomial order that
determine the discretization; they include multigrid and multilevel approaches and
techniques that take advantage of the hierarchical structure of the problem. Other
methods, mostly developed earlier than the ones described here, can be viewed
as progenitors of these techniques; they include methods based on incomplete
factorization [11, 20] or block splitting methods [15, 24]. The discussion will be
limited to primal formulations of the problem; see [5, 9] for treatment of the
stochastic diffusion equation using mixed finite elements in space.
One way to apply multigrid to the stochastic Galerkin system (16.5) is by generalizing methods devised for deterministic problems. This idea was developed in [16], and convergence analysis was presented in [4]; see also [26]. Assume there is a set of spatial grids of mesh width $h$, $2h$, $4h$, etc., and let the discrete space $T^{(p)}$ associated with the stochastic component of the problem be fixed. The coefficient matrices for fine-grid and coarse-grid spatial discretizations are

$$A^{(hp)} = G_0^{(p)} \otimes A_0^{(h)} + \sum_{r=1}^{m} G_r^{(p)} \otimes A_r^{(h)}, \qquad A^{(2h,p)} = G_0^{(p)} \otimes A_0^{(2h)} + \sum_{r=1}^{m} G_r^{(p)} \otimes A_r^{(2h)}.$$
Let $P_{2h}^{h}$ denote a prolongation operator mapping (spatial) coarse-grid vectors to fine-grid vectors, and let $R_h^{2h}$ denote a restriction operator mapping fine to coarse. Typically $P_{2h}^{h}$ is specified using interpolation and $R_h^{2h} = (P_{2h}^{h})^T$ [7]. Then, $R = I \otimes R_h^{2h}$ is a restriction operator on the tensor product space that leaves the discrete stochastic space intact, and $P = I \otimes P_{2h}^{h}$ is an analogous prolongation operator. In addition, let $Q$ represent a smoothing operator on the fine grid; a more precise specification of $Q$ is given below. One step of a two-grid algorithm for (16.5) to update an approximate solution $u^{(hp)}$ is defined as follows:
Recall that any positive-definite matrix $M$ induces a norm $\|v\|_M \equiv (v, Mv)^{1/2}$. Convergence of Algorithm 1 is established by the following result [4]. It is stated in terms of generic constants $c_1$ and $c_2$ whose properties will be discussed below.

Theorem 1. Let $u_i^{(hp)}$ denote the approximate solution to (16.5) obtained after $i$ steps of Algorithm 1, with error $e_i = u^{(hp)} - u_i^{(hp)}$. If the smoothing property

$$\big\| A^{(hp)} (I - Q^{-1} A^{(hp)})^k \, y \big\|_2 \le \eta(k) \, \|y\|_{A^{(hp)}} \quad \text{for all } y \in \mathbb{R}^{N_x N_\xi} \qquad (16.9)$$

holds together with the approximation property (16.10), then $\|e_i\|_{A^{(hp)}} \le c_2 \, \|e_{i-1}\|_{A^{(hp)}}$.
16 Solution Algorithms for Stochastic Galerkin Discretizations of. . . 607
Proof. The errors associated with Algorithm 1 satisfy the recursive relationship

$$\|e_i\|_{A^{(hp)}} = \big\| \big[ (A^{(hp)})^{-1} - P (A^{(2h,p)})^{-1} R \big] \, A^{(hp)} (I - Q^{-1} A^{(hp)})^k \, e_{i-1} \big\|_{A^{(hp)}}$$
$$\le c_1 \big\| A^{(hp)} (I - Q^{-1} A^{(hp)})^k \, e_{i-1} \big\|_2 \le c_1 \, \eta(k) \, \|e_{i-1}\|_{A^{(hp)}} \le c_2 \, \|e_{i-1}\|_{A^{(hp)}}.$$

The largest eigenvalue of $A^{(hp)}$ can be bounded using the Kronecker structure:

$$\lambda_{\max}\big(A^{(hp)}\big) \le \sum_{r=0}^{m} \lambda_{\max}\big(G_r^{(p)} \otimes A_r^{(h)}\big) = \sum_{r=0}^{m} \lambda_{\max}\big(G_r^{(p)}\big) \, \big|\lambda_{\max}\big(A_r^{(h)}\big)\big|. \qquad (16.11)$$

Bounds on the eigenvalues of $G_r^{(p)}$ come from Lemma 1. The eigenvalues of $A_r^{(h)}$ can be bounded using Rayleigh quotients; see (16.15) below.
To see what is needed to establish the approximation property (16.10), first recall some results for deterministic problems. Let $S^{(h)}$ and $S_E^{(h)}$ be as above. Given $\mathbf{y} \in \mathbb{R}^{N_x}$, let $y = y^{(h)} = \sum_{j=1}^{N_x} y_j \phi_j \in S^{(h)}$, and consider the deterministic diffusion equation $-\nabla \cdot (a \nabla u) = y$ on $D$, for which no components of the problem depend on random data. Let $u \in H_E^1(D)$ denote the (weak) solution. Then, $\mathbf{u} = (A^{(h)})^{-1} \mathbf{y}$ corresponds to a discrete weak solution $u^{(h)} \in S^{(h)}$ (which can be extended to $S_E^{(h)}$ as in (16.4)). Similarly, $\mathbf{u} = (A^{(2h)})^{-1} R_h^{2h} \mathbf{y}$ corresponds to a coarse-grid solution $u^{(2h)}$. The approximation property follows from the relations
A different way to apply multigrid to the system (16.5) comes from use of $Q_0^{(hp)} \equiv G_0^{(p)} \otimes A_0^{(h)}$ as a preconditioner for the coefficient matrix $A^{(hp)}$ [21]. The preconditioning operator derives from the mean $a_0$ of the diffusion coefficient. If $a(\cdot,\cdot)$ is not too large a perturbation of $a_0$, then $Q_0^{(hp)}$ will be a reasonable approximation of $A^{(hp)}$ and can be used as a preconditioner in combination with Krylov subspace methods such as the conjugate gradient method (CG). Moreover, since $G_0^{(p)} = I_{N_\xi}$, the identity matrix, $Q_0^{(hp)}$ is a block-diagonal matrix consisting of $N_\xi$ decoupled copies of $A_0^{(h)}$. Thus, the overhead of using it with CG entails applying the action of the inverse of $N_\xi$ decoupled copies of $A_0^{(h)}$ at each step. Assume that $a_0(x)$ is uniformly bounded below by $\alpha_1 > 0$. The following result establishes the effectiveness of $Q_0^{(hp)}$ as a preconditioner.
$$\tau = \frac{1}{\alpha_1} \sum_{r=1}^{m} \lambda_{\max}\big(G_r^{(p)}\big) \, \|a_r\|_{\infty}, \qquad (16.13)$$

where $\lambda_{\max}(G_r^{(p)})$ is the modulus of the eigenvalue of $G_r^{(p)}$ of maximal modulus. If $\tau < 1$, the condition number of the preconditioned operator $(Q_0^{(hp)})^{-1} A^{(hp)}$ is bounded by $(1+\tau)/(1-\tau)$.
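A small numerical experiment illustrates the bound (16.13); the 1D model problem below (piecewise-linear elements on $(0,1)$, a single uniform random variable, and specific coefficient values) is a hypothetical example chosen for this sketch, not taken from the text.

```python
import numpy as np
from numpy.polynomial import legendre

# Toy problem on (0,1): a(x, xi) = 1 + a1(x)*xi, xi ~ Uniform(-sqrt(3), sqrt(3)),
# homogeneous Dirichlet boundary conditions; alpha_1 = 1.
n_el, p, eps = 32, 4, 0.3
h = 1.0 / n_el
xc = (np.arange(n_el) + 0.5) * h
a1 = eps * (0.5 + 0.5 * xc)                  # ||a1||_inf <= eps

def stiffness(a):
    """P1 stiffness matrix (interior nodes) for piecewise-constant coefficient a."""
    n = len(a) - 1
    A = np.zeros((n, n))
    for e, ae in enumerate(a):               # element e couples interior nodes e-1, e
        for i in (e - 1, e):
            for j in (e - 1, e):
                if 0 <= i < n and 0 <= j < n:
                    A[i, j] += ae / h * (1.0 if i == j else -1.0)
    return A

A0, A1 = stiffness(np.ones(n_el)), stiffness(a1)

# G1 for orthonormal scaled Legendre polynomials (G0 is the identity).
t, w = legendre.leggauss(2 * p + 2)
w = w / 2.0
psi = np.array([np.sqrt(2 * n + 1) * legendre.Legendre.basis(n)(t)
                for n in range(p + 1)])
G1 = np.einsum("k,qk,lk,k->ql", np.sqrt(3.0) * t, psi, psi, w)

Ahp = np.kron(np.eye(p + 1), A0) + np.kron(G1, A1)
Q0 = np.kron(np.eye(p + 1), A0)              # mean-based preconditioner

tau = np.max(np.abs(np.linalg.eigvalsh(G1))) * eps   # (16.13) with alpha_1 = 1
# Spectrum of Q0^{-1} Ahp via the congruent symmetric form L^{-1} Ahp L^{-T}.
L = np.linalg.cholesky(Q0)
M = np.linalg.solve(L, np.linalg.solve(L, Ahp).T)
evals = np.linalg.eigvalsh(0.5 * (M + M.T))
cond = evals.max() / evals.min()
assert tau < 1 and cond <= (1 + tau) / (1 - tau) + 1e-10
```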
Each of the fractions in the sum is bounded by the product of the maximal eigenvalue of $G_r^{(p)}$ times the modulus of the maximal eigenvalue of $(A_0^{(h)})^{-1} A_r^{(h)}$. The latter quantities are the extrema of the Rayleigh quotient

$$\frac{\big| \big(v^{(h)}, A_r^{(h)} v^{(h)}\big) \big|}{\big(v^{(h)}, A_0^{(h)} v^{(h)}\big)}. \qquad (16.14)$$

The assertion follows. Bounds on the eigenvalues of $G_r^{(p)}$ again come from Lemma 1.
It is generally preferable to approximate the action of $(A_0^{(h)})^{-1}$, which can be done using multigrid. In particular, suppose $(Q_{0,MG}^{(h)})^{-1}$ represents a spectrally equivalent approximation to $(A_0^{(h)})^{-1}$ resulting from a fixed number of multigrid steps. That is,

$$\gamma_1 \le \frac{\big(v^{(h)}, A_0^{(h)} v^{(h)}\big)}{\big(v^{(h)}, Q_{0,MG}^{(h)} v^{(h)}\big)} \le \gamma_2$$

for all $v^{(h)}$, with constants $\gamma_1, \gamma_2 > 0$ independent of the mesh size $h$.
As is well known, the number of steps required for convergence of the conjugate gradient method is bounded in terms of the condition number of the coefficient matrix [7]. Thus, Corollary 1 establishes the optimality of this method. The requirement $\tau < 1$ implies that the method is useful only if $A^{(hp)}$ is not too large a perturbation of $Q_{0,MG}^{(hp)}$. However, this method is significantly simpler to implement than Algorithm 1. In particular, it is straightforward to use algebraic multigrid (e.g., [25]) to handle the approximate action of $(A_0^{(h)})^{-1}$, and there are no coarse-grid operations with matrices of order $|T^{(p)}|$. A comparison of these two versions of multigrid, showing some advantages of each, is given in [24]. It is also possible to improve performance of the mean-based approach using a variant of the form $\widehat{Q}^{(hp)} = \widehat{G}^{(p)} \otimes A_0^{(h)}$, where some $\widehat{G}^{(p)}$ replaces the identity in the Kronecker product [29]; a good choice of $\widehat{G}^{(p)}$ is determined by minimizing the Frobenius norm $\|A^{(hp)} - \widehat{G}^{(p)} \otimes A_0^{(h)}\|_F$ [31].
The blocking of the matrix shown in Fig. 16.1 reflects the hierarchical structure of the discrete problem. The coefficient matrix arising from the subspace $T^{(p)} \subset L^2(\Gamma)$, consisting of polynomials of total degree at most $p$, has the form

$$A^{(hp)} = \begin{bmatrix} A^{(h,p-1)} & B^{(hp)} \\ C^{(hp)} & D^{(hp)} \end{bmatrix},$$

where the $(1,1)$-subblock comes from $S^{(h)} \otimes T^{(p-1)}$. In the figure, $p = 4$ and (with $m = 6$) the block corresponding to $T^{(3)}$ has order $\binom{6+3}{3} = 84$. It follows from the orthogonality of the basis for $T^{(p)}$ that the $(2,2)$-block $D^{(hp)}$ is a block-diagonal matrix, each of whose blocks is $A_0^{(h)}$ (see, e.g., [24]). The number of such blocks is the number of basis functions of total degree exactly $p$.
This structure has been used in [28], building on ideas in [11], to develop
a hierarchical preconditioning strategy. The approach can also be viewed as a
multilevel method in the stochastic dimension, and it bears some resemblance
to iterative substructuring methods as described, for example, in [27]. To make
the description more readable, the dependence on the spatial mesh size h will be
suppressed from the notation. The coefficient matrix then has a block factorization
$$A^{(p)} = \begin{bmatrix} A^{(p-1)} & B^{(p)} \\ C^{(p)} & D^{(p)} \end{bmatrix} = \begin{bmatrix} I & B^{(p)} (D^{(p)})^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} S^{(p-1)} & 0 \\ 0 & D^{(p)} \end{bmatrix} \begin{bmatrix} I & 0 \\ (D^{(p)})^{-1} C^{(p)} & I \end{bmatrix},$$

where $S^{(p-1)} = A^{(p-1)} - B^{(p)} (D^{(p)})^{-1} C^{(p)}$ is a Schur complement. Equivalently,

$$\big(A^{(p)}\big)^{-1} = \begin{bmatrix} I & 0 \\ -(D^{(p)})^{-1} C^{(p)} & I \end{bmatrix} \begin{bmatrix} (S^{(p-1)})^{-1} & 0 \\ 0 & (D^{(p)})^{-1} \end{bmatrix} \begin{bmatrix} I & -B^{(p)} (D^{(p)})^{-1} \\ 0 & I \end{bmatrix}.$$

The Schur complement is expensive to work with, and the idea in [28] is to replace $S^{(p-1)}$ with $A^{(p-1)}$, giving the approximate factorization

$$\big(A^{(p)}\big)^{-1} \approx \begin{bmatrix} I & 0 \\ -(D^{(p)})^{-1} C^{(p)} & I \end{bmatrix} \begin{bmatrix} (A^{(p-1)})^{-1} & 0 \\ 0 & (D^{(p)})^{-1} \end{bmatrix} \begin{bmatrix} I & -B^{(p)} (D^{(p)})^{-1} \\ 0 & I \end{bmatrix}. \qquad (16.16)$$

The preconditioning operator is then defined by applying this strategy recursively to $(A^{(p-1)})^{-1}$; implementation details are given in [28]. A key point is that the only (subsidiary) system solves required entail independent computations of the action of $(A_0^{(h)})^{-1}$. These are analogous to the coarse-grid solves required by Algorithm 1, and they can be replaced by approximate solves.
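A sketch of the recursion defined by (16.16) is given below, with dense exact subblock solves standing in for the decoupled $(A_0^{(h)})^{-1}$ applications; the partition sizes follow the total-degree blocking, and the test matrix is a hypothetical block-diagonal example for which the approximation (16.16) is exact.

```python
import numpy as np
from math import comb

def degree_block_sizes(m, p):
    """Cumulative basis counts n_0 <= ... <= n_p, with n_k = C(m+k, k)."""
    return [comb(m + k, k) for k in range(p + 1)]

def hier_solve(A, sizes, b):
    """Apply the approximate inverse (16.16) recursively.

    A is ordered by total degree; sizes[k] is the number of basis
    functions of degree <= k.  The deepest level solves exactly."""
    if len(sizes) == 1:
        return np.linalg.solve(A, b)
    n1 = sizes[-2]
    B, C, D = A[:n1, n1:], A[n1:, :n1], A[n1:, n1:]
    b1, b2 = b[:n1], b[n1:]
    y1 = b1 - B @ np.linalg.solve(D, b2)           # [I, -B D^-1; 0, I] b
    x1 = hier_solve(A[:n1, :n1], sizes[:-1], y1)   # A^(p-1) replaces the Schur complement
    x2 = np.linalg.solve(D, b2 - C @ x1)           # [I, 0; -D^-1 C, I] combine
    return np.concatenate([x1, x2])

# Sanity check: if the coupling blocks B, C between degree levels vanish,
# the Schur complement equals A^(p-1) and the recursion is an exact solve.
rng = np.random.default_rng(1)
sizes = degree_block_sizes(2, 3)                   # [1, 3, 6, 10]
n = sizes[-1]
A = np.zeros((n, n))
prev = 0
for s in sizes:
    blk = rng.standard_normal((s - prev, s - prev))
    A[prev:s, prev:s] = blk @ blk.T + (s - prev) * np.eye(s - prev)
    prev = s
b = rng.standard_normal(n)
assert np.allclose(hier_solve(A, sizes, b), np.linalg.solve(A, b))
```

In the actual method each $D^{(p)}$ solve is itself block diagonal with $A_0^{(h)}$ blocks, so all subsidiary solves can be performed independently (and approximately, e.g., by multigrid).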
The following convergence result is given in [28].
Bounds on the terms f1q g appearing in this expression have not been estab-
lished. However, experimental results in [28] suggest that for a diffusion equation
with coefficient as in (16.2) (with uniformly distributed random variables), the
condition number of the preconditioned system .Q.hp/ /1 A.hp/ is very close to 1
and largely insensitive to the problem parameters, i.e., spatial mesh size h, stochastic
dimension m, and polynomial degree p.
As noted above, the stochastic Galerkin method is most effective when the dependence of the problem on random data is a linear function of the parameters, as in (16.2). This section expands on this point, showing what issues arise for problems depending on random fields that are nonlinear functions of the data. In addition, one method is presented that avoids some of these issues for one important class of problems, the particular case of the diffusion equation in which the diffusion coefficient has an exponential structure. A commonly used model of this type is one where the diffusion coefficient has a lognormal distribution; see [1, 35].
Suppose now that $a(x, \xi)$ is a nonlinear function of $\xi$, where $\xi$ is of length $m$, and that this function has a series expansion in the polynomial chaos basis

$$a(x, \xi) = \sum_{r} a_r(x) \, \psi_r(\xi).$$

A stochastic Galerkin discretization then leads to a coefficient matrix with the same Kronecker-product structure as before,

$$A^{(hp)} = \sum_{r=1}^{\hat{m}} G_r^{(p)} \otimes A_r^{(h)}. \qquad (16.17)$$

The main difference lies in the definition of the matrices $\{G_r^{(p)}\}$ associated with the stochastic terms, where $\xi_r$ in (16.7) is replaced by $\psi_r(\xi)$:

$$\big[G_r^{(p)}\big]_{\ell q} = \int_\Gamma \psi_r(\xi) \, \psi_q(\xi) \, \psi_\ell(\xi) \, \rho(\xi) \, d\xi. \qquad (16.18)$$
(There is also a slight difference in indexing conventions, in that the lowest index in (16.17) is $r = 1$.) If the finite-dimensional subspace $T^{(p)}$ is defined by the generalized polynomial chaos of maximal degree $p$, then this determines the length of the finite-term approximation to $a(\cdot,\cdot)$. In particular, it follows from properties of orthogonal polynomials that the triple product $\langle \psi_r \psi_q \psi_\ell \rangle = 0$ for all indices $r$ corresponding to polynomials of degree greater than $2p$ [19]. Thus, the number of terms in (16.17) is $\hat{m} = \binom{m+2p}{2p}$.
It is possible to develop preconditioning strategies that are effective with respect to iteration counts for solving the analogue of (16.5) in this setting. In particular, let $a_0(x) \equiv \langle a(x, \cdot) \rangle$ denote the mean of $a(x, \cdot)$, and let $A_0^{(h)}$ denote the matrix obtained from finite element discretization of the diffusion operator with diffusion coefficient $a_0$. Then, the analogue of the mean-based preconditioner described above is $Q_0^{(hp)} \equiv I \otimes A_0^{(h)}$, and the analysis of [21] establishes mesh-independent conditioning of the preconditioned operator $(Q_0^{(hp)})^{-1} A^{(hp)}$.

Unfortunately, there are serious drawbacks to this approach. The coefficient matrices of (16.18) are considerably more dense than those of (16.7). Indeed, $A^{(hp)}$ of (16.17) is block dense: although each individual block of order $N_x$ has sparsity structure like that arising from finite element discretization in space, most blocks are nonzero [19]. Equivalently, the analogue of Fig. 16.1 would be dense. As a result, the cost of a matrix-vector product required by a Krylov subspace method is of order $O(N_x N_\xi^2)$. Even with an optimally small number of iterations, the costs of a preconditioned iterative solver cannot scale linearly with the number of unknowns, $N_x N_\xi$. The mean-based preconditioned operator is also ill conditioned with respect to the polynomial degree $p$ [23]. Thus, the stochastic Galerkin method is less effective for problems whose dependence on $\xi$ is nonlinear than in the linear case.
There is a way around this in the particular setting where the diffusion coefficient is the exponential of a (smooth enough) random field. Our description follows [30]. Consider the equation

$$-\nabla \cdot \big(\exp(c) \nabla u\big) = f.$$

Premultiplying by $\exp(-c)$ and applying the product rule to the divergence operator transforms this problem to

$$-\Delta u + w \cdot \nabla u = f \exp(-c), \qquad (16.19)$$

where $w = -\nabla c$. Now suppose that $c = c(x, \xi)$ is a random field that is well approximated by a finite-term KL expansion, i.e.,

$$c(x, \xi) = c_0(x) + \sum_{r=1}^{m} c_r(x) \, \xi_r. \qquad (16.20)$$

Then

$$w = -\nabla c(x, \xi) = -\nabla c_0(x) - \sum_{r=1}^{m} \nabla c_r(x) \, \xi_r, \qquad (16.21)$$

a random field that is linear in the parameters $\{\xi_r\}$. The stochastic Galerkin method can be applied in the identical way it was used for the diffusion equation. The coefficient matrix has the form

$$A^{(hp)} = G_0^{(p)} \otimes \big(L^{(h)} + N_0^{(h)}\big) + \sum_{r=1}^{m} G_r^{(p)} \otimes N_r^{(h)},$$

where

$$\big[L^{(h)}\big]_{jk} = \int_D \nabla\phi_k(x) \cdot \nabla\phi_j(x) \, dx, \qquad \big[N_r^{(h)}\big]_{jk} = -\int_D \phi_j(x) \, \nabla c_r \cdot \nabla\phi_k(x) \, dx,$$

and, most importantly, $\{G_r^{(p)}\}$ are as in (16.7).
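The transformation leading to (16.19) can be verified symbolically. The following sympy check confirms that $-\nabla \cdot (e^c \nabla u)$ equals $e^c(-\Delta u + w \cdot \nabla u)$ with $w = -\nabla c$ in two dimensions, which is the identity underlying the log-transformed formulation.

```python
import sympy as sp

x, y = sp.symbols("x y")
c = sp.Function("c")(x, y)
u = sp.Function("u")(x, y)

def div(Fx, Fy):
    """Divergence of the 2D vector field (Fx, Fy)."""
    return sp.diff(Fx, x) + sp.diff(Fy, y)

# Original operator: -div(exp(c) grad u)
lhs = -div(sp.exp(c) * sp.diff(u, x), sp.exp(c) * sp.diff(u, y))

# Transformed operator (16.19), multiplied back by exp(c):
# exp(c) * (-lap u + w . grad u), with w = -grad c
lap_u = sp.diff(u, x, 2) + sp.diff(u, y, 2)
w_dot_grad_u = -(sp.diff(c, x) * sp.diff(u, x) + sp.diff(c, y) * sp.diff(u, y))
rhs = sp.exp(c) * (-lap_u + w_dot_grad_u)

assert sp.simplify(lhs - rhs) == 0
```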
Remark 1. Although this method avoids the difficulties associated with nonlinear dependence on parameters, it is somewhat less general than a more direct approach. In particular, if (16.20) is a truncated approximation to a convergent series, then the series expansion of the gradient $\nabla c$, for which (16.21) is an approximation, must also be convergent. This is true if and only if $\nabla c_0$ and the second derivatives of the covariance function for $c$ are continuous; see, e.g., [3, Sect. 4.3]. It holds, for example, for covariance functions of the form $\exp(-\|x - y\|_2^2)$ but not for those of the form $\exp(-\|x - y\|_2)$.
Any of the solution methods developed for problems that depend linearly on $\xi$ are applicable in this setting, with slight modifications. For example, the mean-based preconditioner is $Q_0^{(hp)} \equiv G_0^{(p)} \otimes (L^{(h)} + N_0^{(h)})$. Since $A^{(hp)}$ is nonsymmetric, this preconditioner must be used with a Krylov subspace method such as GMRES designed for nonsymmetric systems. Moreover, $L^{(h)} + N_0^{(h)}$ is a discrete convection-diffusion operator, so that applying the preconditioner requires the (approximate) solution of $N_\xi$ decoupled systems of this type. There are robust methods for solving the discrete convection-diffusion equation using multigrid (see, e.g., [7, 32]), so the presence of this operator is not a significant drawback of this approach.
Convergence analysis of the GMRES method typically takes the form of bounds on the eigenvalues of the preconditioned operator $(Q_0^{(hp)})^{-1} A^{(hp)}$, which determine bounds on the asymptotic convergence factor for GMRES. The proof and additional details can be found in [30].
4 Conclusion
This chapter contains a description of some of the main approaches for solving
the coupled system of equations associated with stochastic Galerkin discretization
of parameter-dependent partial differential equations. This intrusive discretization
strategy gives rise to large algebraic systems of linear equations, but there is a variety
of computational algorithms available that, especially for problems that depend
linearly on the random parameters, offer the prospect of rapid solution. Indeed,
through the use of such solvers, Galerkin methods are competitive with other ways
of handling stochastic components of such models, such as stochastic collocation
methods [6]. Their utility for more complex models remains an open question.
References
1. Babuška, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. 45(3), 1005–1034 (2007)
2. Braess, D.: Finite Elements. Cambridge University Press, London (1997)
3. Christakos, G.: Random Field Models in Earth Sciences. Academic, New York (1992)
4. Elman, H., Furnival, D.: Solving the stochastic steady-state diffusion problem using multigrid. IMA J. Numer. Anal. 27, 675–688 (2007)
5. Elman, H.C., Furnival, D.G., Powell, C.E.: H(div) preconditioning for a mixed finite element formulation of the diffusion problem with random data. Math. Comput. 79, 733–760 (2009)
6. Elman, H.C., Miller, C.W., Phipps, E.T., Tuminaro, R.S.: Assessment of collocation and Galerkin approaches to linear diffusion equations with random data. Int. J. Uncertain. Quantif. 1, 19–33 (2011)
7. Elman, H.C., Silvester, D.J., Wathen, A.J.: Finite Elements and Fast Iterative Solvers, 2nd edn. Oxford University Press, Oxford (2014)
8. Ernst, O.G., Ullmann, E.: Stochastic Galerkin matrices. SIAM J. Matrix Anal. Appl. 31, 1818–1872 (2010)
9. Ernst, O.G., Powell, C.E., Silvester, D.J., Ullmann, E.: Efficient solvers for a linear stochastic Galerkin mixed formulation of diffusion problems with random data. SIAM J. Sci. Comput. 31, 1424–1447 (2009)
10. Ghanem, R., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991)
11. Ghanem, R.G., Kruger, R.M.: Numerical solution of spectral stochastic finite element systems. Comput. Methods Appl. Mech. Eng. 129, 289–303 (1996)
12. Grigoriu, M.: Probabilistic models for stochastic elliptic partial differential equations. J. Comput. Phys. 229, 8406–8429 (2010)
13. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, New York (1991)
14. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, New York (1991)
15. Keese, A.: Numerical solution of systems with stochastic uncertainties. PhD thesis, Universität Braunschweig, Braunschweig (2004)
16. Le Maître, O.P., Knio, O.M., Debusschere, B.J., Najm, H.N., Ghanem, R.G.: A multigrid solver for two-dimensional stochastic diffusion equations. Comput. Methods Appl. Mech. Eng. 192, 4723–4744 (2003)
17. Le Maître, O.P., Knio, O.M.: Spectral Methods for Uncertainty Quantification. Springer, New York (2010)
18. Lord, G.J., Powell, C.E., Shardlow, T.: An Introduction to Computational Stochastic PDEs. Cambridge University Press, London (2014)
19. Matthies, H.G., Keese, A.: Galerkin methods for linear and nonlinear elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Eng. 194, 1295–1331 (2005)
20. Pellissetti, M.F., Ghanem, R.G.: Iterative solution of systems of linear equations arising in the context of stochastic finite elements. Adv. Eng. Softw. 31, 607–616 (2000)
21. Powell, C.E., Elman, H.C.: Block-diagonal preconditioning for spectral stochastic finite element systems. IMA J. Numer. Anal. 29, 350–375 (2009)
22. Powell, C.E., Silvester, D.J.: Preconditioning steady-state Navier-Stokes equations with random data. SIAM J. Sci. Comput. 34, A2482–A2506 (2012)
23. Powell, C.E., Ullmann, E.: Preconditioning stochastic Galerkin saddle point matrices. SIAM J. Matrix Anal. Appl. 31, 2813–2840 (2010)
24. Rosseel, E., Vandewalle, S.: Iterative methods for the stochastic finite element method. SIAM J. Sci. Comput. 32, 372–397 (2010)
25. Ruge, J.W., Stüben, K.: Algebraic multigrid (AMG). In: McCormick, S.F. (ed.) Multigrid Methods, Frontiers in Applied Mathematics, pp. 73–130. SIAM, Philadelphia (1987)
26. Seynaeve, B., Rosseel, E., Nicolaï, B., Vandewalle, S.: Fourier mode analysis of multigrid methods for partial differential equations with random coefficients. J. Comput. Phys. 224, 132–149 (2007)
27. Smith, B., Bjørstad, P., Gropp, W.: Domain Decomposition. Cambridge University Press, Cambridge (1996)
28. Sousedík, B., Ghanem, R.G., Phipps, E.T.: Hierarchical Schur complement preconditioner for the stochastic Galerkin finite element methods. Numer. Linear Algebra Appl. 21, 136–151 (2014)
29. Ullmann, E.: A Kronecker product preconditioner for stochastic Galerkin finite element discretizations. SIAM J. Sci. Comput. 32, 923–946 (2010)
30. Ullmann, E., Elman, H.C., Ernst, O.G.: Efficient iterative solvers for stochastic Galerkin discretizations of log-transformed random diffusion problems. SIAM J. Sci. Comput. 34, A659–A682 (2012)
31. Van Loan, C.F., Pitsianis, N.: Approximation with Kronecker products. In: Moonen, M.S., Golub, G.H., de Moor, B.L.R. (eds.) Linear Algebra for Large Scale and Real-Time Applications, pp. 293–314. Kluwer, Dordrecht (1993)
32. Wesseling, P.: An Introduction to Multigrid Methods. John Wiley & Sons, New York (1992)
33. Xiu, D.: Numerical Methods for Stochastic Computations. Princeton University Press, Princeton (2010)
34. Xiu, D., Karniadakis, G.E.: Modeling uncertainty in steady-state diffusion problems using generalized polynomial chaos. Comput. Methods Appl. Mech. Eng. 191, 4927–4948 (2002)
35. Zhang, D.: Stochastic Methods for Flow in Porous Media: Coping with Uncertainties. Academic, San Diego (2002)
Intrusive Polynomial Chaos Methods for
Forward Uncertainty Propagation 17
Bert Debusschere
Abstract
Polynomial chaos (PC)-based intrusive methods for uncertainty quantification
reformulate the original deterministic model equations to obtain a system of
equations for the PC coefficients of the model outputs. This system of equations
is larger than the original model equations, but solving it once yields the uncer-
tainty information for all quantities in the model. This chapter gives an overview
of the literature on intrusive methods, outlines the approach on a general level,
and then applies it to a system of three ordinary differential equations that model
a surface reaction system. Common challenges and opportunities for intrusive
methods are also highlighted.
Keywords
Galerkin projection · Intrusive spectral projection · Polynomial chaos
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
2 Theory and Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
3 Example: Intrusive Propagation of Uncertainty Through ODEs . . . . . . . . . . . . . . . . . . . . 625
4 Challenges and Opportunities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
5 Software for Intrusive UQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
B. Debusschere ()
Mechanical Engineering, Sandia National Laboratories, Livermore, CA, USA
Reacting Flow Research Department, Sandia National Laboratories, Livermore, CA, USA
e-mail: bjdebus@sandia.gov
1 Introduction
Rather than trying to cover all of the material in this research field, this chapter focuses on the Galerkin projection of governing equations in the context of global PCEs only.
The next section outlines the main theoretical and algorithmic concepts for ISP
methods, followed by a specific example of propagation of parametric uncertainty
through a set of ordinary differential equations (ODEs). After that, some of the
challenges as well as opportunities for intrusive spectral propagation of uncertainty
are covered. Finally, pointers are provided to some software packages that are set
up for ISP uncertainty quantification.
Consider spectral PCEs following Wiener [56]. Let $u$ be a real second-order random variable, i.e., a Lebesgue-measurable mapping from a probability space $(\Omega, \Sigma, P)$ into $\mathbb{R}$, where $\Omega$ is the set of elementary events, $\Sigma$ is a $\sigma$-algebra on $\Omega$ generated by the germ $\xi$, and $P$ is a probability measure on $(\Omega, \Sigma)$ [13]. Such a random variable can be represented by a PCE as follows [11, 13, 56]:

$$u = \sum_{k=0}^{\infty} u_k \, \Psi_k(\xi) \qquad (17.1)$$
that extends to higher values of $\xi$ in case random variables with long tails need to be represented. Many other choices of random variables and associated polynomials are available, as detailed in [57].

A related question is how many basis terms are needed to accurately represent a random variable with a PCE. In general, the more similar the germ is to the random variable that needs to be represented, the fewer terms will be needed. For example, to represent a normally distributed random variable $u$ with mean $\mu$ and standard deviation $\sigma$, a Gauss-Hermite expansion only needs two terms,

$$u = \mu + \sigma \xi, \qquad (17.3)$$

while a Legendre-Uniform expansion would need many more terms. Even a 10th-order Legendre-Uniform expansion offers only a very crude approximation [43].
While this rule of thumb is quite intuitive, the formal study of the convergence of PCEs is not as straightforward. For Gauss-Hermite expansions, many convergence results are available, but this is not the case for most other types of bases. A careful analysis is presented by Ernst et al. [11]. One of the key findings in that work is that the distribution of a random variable needs to be uniquely determined by its moments in order for this random variable to be a valid germ for a PCE. One notable example that fails this test is a lognormal random variable. Further, for a given germ $\xi$, a proof is provided that shows that any random variable with finite variance that resides in the probability space with a $\sigma$-algebra generated by $\xi$ can be represented with a PCE that has $\xi$ as its germ. While these conditions are very informative, it is clear, however, that the formal convergence of PCEs is a topic of ongoing research.

In practice, the choice of the PCE basis type and the order of the representation are often an engineering choice that depends on the amount of information provided about the random variable that needs to be represented.
To determine the PC coefficients for an arbitrary random variable $u$, the inner product (17.2) can be used as follows: First, multiply both sides of Eq. (17.1) with the basis function $\Psi_k$, dropping the dependence on $\xi$ for clarity of notation:

$$\Psi_k u = \Psi_k \sum_{i=0}^{\infty} u_i \Psi_i \qquad (17.4)$$

Then, move $\Psi_k$ inside the summation on the right-hand side, and take the expectation:

$$\langle \Psi_k u \rangle = \left\langle \sum_{i=0}^{\infty} u_i \Psi_i \Psi_k \right\rangle \qquad (17.5)$$

After rearranging some terms and recognizing that $u_i$ does not depend on $\xi$, it can be factored outside of the expectation operator:

$$\langle \Psi_k u \rangle = \sum_{i=0}^{\infty} u_i \langle \Psi_i \Psi_k \rangle \qquad (17.6)$$

By the orthogonality of the basis functions, only the $i = k$ term survives, which yields the Galerkin projection

$$u_k = \frac{\langle \Psi_k u \rangle}{\langle \Psi_k^2 \rangle} \qquad (17.7)$$
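As an illustration of the projection in Eqs. (17.4), (17.5), and (17.6), the following sketch computes the PC coefficients of $u = \exp(\xi)$ in an orthonormal Gauss-Hermite basis by quadrature; for this particular $u$, the coefficients are known in closed form ($u_k = e^{1/2}/\sqrt{k!}$), which serves as a check. The basis order and quadrature level are choices of the sketch, not prescriptions of the text.

```python
import numpy as np
from numpy.polynomial import hermite_e
from math import factorial, exp, sqrt

# Germ: standard normal xi.  Orthonormal basis: psi_k = He_k / sqrt(k!).
P = 8
t, w = hermite_e.hermegauss(40)      # probabilists' Gauss-Hermite nodes/weights
w = w / np.sqrt(2 * np.pi)           # normalize to the standard normal density

u = np.exp(t)                        # target random variable u = exp(xi)
coeffs = []
for k in range(P + 1):
    psi_k = hermite_e.hermeval(t, [0] * k + [1]) / sqrt(factorial(k))
    coeffs.append(np.sum(u * psi_k * w))   # u_k = <u psi_k> (psi_k orthonormal)

# Closed form in this basis: u_k = e^{1/2} / sqrt(k!).
expected = [exp(0.5) / sqrt(factorial(k)) for k in range(P + 1)]
assert np.allclose(coeffs, expected, atol=1e-8)
```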
One nice property of the Galerkin projection is that the residual between the random variable $u$ and its PC representation $\tilde{u}$ is orthogonal to the space spanned by the basis functions. In essence, $\tilde{u}$ is the best possible representation of $u$ with the given basis set (in the corresponding $L^2$ norm on that same space).

Note that so far, the dimensionality of the PCE germ $\xi$ has not been specified. In general, the dimensionality of $\xi$ corresponds to the number of degrees of freedom in the random variable that is represented by the PCE. For example, if a quantity depends on two other uncertain inputs, then its PCE would depend on $\xi = \{\xi_1, \xi_2\}$. In a more general context, consider a random field, which has infinitely many degrees of freedom. To properly capture random fields with finite-dimensional representations, a commonly used tool is the Karhunen-Loève (KL) expansion [13, 26]. The KL expansion represents a random field in terms of the eigenfunctions of its covariance matrix, multiplied with random variables that are uncorrelated. If the spectrum of the eigenvalues of the covariance matrix decays rapidly, then only a few eigenfunctions are needed to properly represent the random field with a truncated KL expansion. This generally occurs when the random field
has long correlation lengths. However, when the random field has short correlation
lengths, its realizations will have a lot of small-scale variability, which results in a
slowly decaying eigenspectrum. In this case, many more terms are needed in the
KL expansion to properly capture the random field, and a high-dimensional spectral
representation is required.
As far as ISP is concerned, however, all methods described here apply naturally
to PCEs of any dimension. The only thing that changes is the number of terms $P+1$ in the PCE. As such, in this work, all PCEs will be expressed as a function of the germ $\xi$ without specifying its dimensionality. In many cases, $\xi$ will even be omitted from the notations for clarity.
In intrusive spectral uncertainty propagation, the Galerkin projection (17.7) is applied to the model equations in order to obtain a set of reformulated equations in the PCE coefficients. Consider, e.g., a simple ODE with an uncertain parameter $\lambda$:

$$\frac{du}{dt} = -\lambda u \qquad (17.9)$$

Assume a PCE is specified for $\lambda$:

$$\lambda = \sum_{i=0}^{P} \lambda_i \, \Psi_i(\xi) \qquad (17.10)$$

and similarly $u(t) = \sum_{j=0}^{P} u_j(t) \, \Psi_j(\xi)$. After substituting these expansions into (17.9), the next step applies a Galerkin projection onto the spectral basis functions by multiplying both sides of the equation with $\Psi_k$ and taking the expectation with respect to the basis germ $\xi$:

$$\left\langle \sum_{j=0}^{P} \frac{du_j(t)}{dt} \Psi_j(\xi) \Psi_k(\xi) \right\rangle = -\left\langle \sum_{i=0}^{P} \sum_{j=0}^{P} \lambda_i u_j(t) \, \Psi_i(\xi) \Psi_j(\xi) \Psi_k(\xi) \right\rangle \qquad (17.14)$$
Since only the basis functions depend on $\xi$, the expectation operator can be moved inside the summations and products to yield

$$\sum_{j=0}^{P} \frac{du_j(t)}{dt} \big\langle \Psi_j(\xi) \Psi_k(\xi) \big\rangle = -\sum_{i=0}^{P} \sum_{j=0}^{P} \lambda_i u_j(t) \big\langle \Psi_i(\xi) \Psi_j(\xi) \Psi_k(\xi) \big\rangle \qquad (17.15)$$

Given the orthogonality of the basis functions, Equation (17.2), and dropping the dependence on $\xi$ for notational clarity, this can be simplified to

$$\frac{du_k}{dt} \big\langle \Psi_k^2 \big\rangle = -\sum_{i=0}^{P} \sum_{j=0}^{P} \lambda_i u_j \big\langle \Psi_i \Psi_j \Psi_k \big\rangle \qquad (17.16)$$

After dividing by $\langle \Psi_k^2 \rangle$, this becomes

$$\frac{du_k}{dt} = -\sum_{i=0}^{P} \sum_{j=0}^{P} \lambda_i u_j \, C_{ijk} \qquad (17.17)$$

where

$$C_{ijk} = \frac{\langle \Psi_i \Psi_j \Psi_k \rangle}{\langle \Psi_k^2 \rangle} \qquad (17.18)$$
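The reformulated system (17.17) is a coupled set of $P+1$ deterministic ODEs that can be integrated with any standard solver. The following sketch (the values of $\lambda_0$, $\lambda_1$, the basis order, and the time grid are assumptions for illustration) propagates a Gauss-Hermite PCE through the decay equation and checks the computed mean against the closed-form lognormal moment $E[e^{-\lambda t}] = e^{-\lambda_0 t + (\lambda_1 t)^2/2}$.

```python
import numpy as np
from numpy.polynomial import hermite_e
from math import factorial, sqrt

P = 8
# Orthonormal probabilists' Hermite basis psi_k = He_k / sqrt(k!), xi ~ N(0,1).
t, w = hermite_e.hermegauss(40)
w = w / np.sqrt(2 * np.pi)
psi = np.array([hermite_e.hermeval(t, [0] * k + [1]) / sqrt(factorial(k))
                for k in range(P + 1)])

# Triple-product tensor C_ijk = <psi_i psi_j psi_k> (here <psi_k^2> = 1).
C = np.einsum("ik,jk,lk,k->ijl", psi, psi, psi, w)

# Uncertain decay rate lambda = 1.0 + 0.1 * xi; deterministic initial condition.
lam = np.zeros(P + 1); lam[0], lam[1] = 1.0, 0.1
u = np.zeros(P + 1); u[0] = 1.0

def rhs(u):                                   # Galerkin system (17.17)
    return -np.einsum("i,j,ijl->l", lam, u, C)

T, nsteps = 0.5, 500                          # RK4 time integration
dt = T / nsteps
for _ in range(nsteps):
    k1 = rhs(u); k2 = rhs(u + dt / 2 * k1)
    k3 = rhs(u + dt / 2 * k2); k4 = rhs(u + dt * k3)
    u = u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

exact_mean = np.exp(-1.0 * T + (0.1 * T) ** 2 / 2)
assert abs(u[0] - exact_mean) < 1e-6          # u_0 is the mean of u(T)
```

Solving this one system once yields all PC coefficients of $u(t)$, and hence its full distribution, which is the key payoff of the intrusive approach.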
which multiplies all factors in the product at once and keeps all higher-order terms
in the process. However, this leads to the need to compute the norms of the products
of many basis functions, which becomes unwieldy and expensive as the number of
factors in a product increases [8]. In practice, the pseudo-spectral approach is used
most often for its ease of use. However, it is important to choose a PC expansion with
a high-enough order to minimize the errors associated with the loss of information
when intermediate results with higher-order terms are truncated in the process.
Divisions of variables represented by PC expansions can be handled by con-
structing an appropriate system of equations. Consider, e.g., $w = u/v$, which can
be rewritten as $u = v\,w$. By writing out this product using Equation (17.19), and
substituting in the known PC expansions for $u$ and $v$, a system of $P + 1$ equations
in the unknown PC coefficients of $w$ is obtained.
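As a sketch of this construction (assuming, for illustration, a 1-D probabilists' Hermite basis and the quadrature-built $C_{ijk}$ of Equation (17.18); the function names are hypothetical):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial

P = 4
x, w = hermegauss(20)
w = w / np.sqrt(2.0 * np.pi)                   # normalized quadrature weights
psi = np.array([hermeval(x, [0.0] * i + [1.0]) for i in range(P + 1)])
C = np.einsum('iq,jq,kq,q->ijk', psi, psi, psi, w)
C /= np.array([float(factorial(k)) for k in range(P + 1)])

def galerkin_product(u, v):
    """Spectral product truncated at order P: (uv)_k = sum_ij u_i v_j C_ijk."""
    return np.einsum('i,j,ijk->k', u, v, C)

def galerkin_division(u, v):
    """PCE of w = u/v: solve the (P+1)x(P+1) linear system u = v*w for w."""
    A = np.einsum('i,ijk->kj', v, C)           # A[k, j] = sum_i v_i C_ijk
    return np.linalg.solve(A, u)

v = np.array([2.0, 0.3, 0.05, 0.0, 0.0])       # illustrative PCE coefficients
w_true = np.array([1.0, -0.2, 0.1, 0.02, 0.0])
u = galerkin_product(v, w_true)                # u = v*w, then recover w = u/v
```

Note that `galerkin_division` exactly inverts the order-$P$ truncated product, since both use the same linear operator; it does not, in general, recover the untruncated quotient.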
Through sums, products, and divisions, polynomial expressions of PC expan-
sions as well as ratios of such expressions can be evaluated. However, non-
polynomial expressions can be more challenging to compute intrusively. One
approach is to expand non-polynomial functions as a Taylor series around the mean
of the random variable that is the function argument. This approach is often very
effective when the function has a rapidly converging Taylor series and when the
uncertainty in the function argument is low. In the present context, low uncertainty
means that realizations of the random variable represented by the PCE fall
within the radius of convergence of the Taylor series around the mean. For functions
that have a very large or infinite radius of convergence, such as the exponential
function, this is not an issue, and Taylor series tend to be quite robust. Other
functions, such as the logarithm, have a finite radius of convergence, which
can make convergence problematic. In particular, when Gauss-Hermite expansions are
used, the corresponding random variable can in theory take on any value on the real
line. In practice, the Taylor series for the logarithm of Gauss-Hermite PCEs will
only converge if the uncertainty is low enough that the probability of realizations
falling outside the radius of convergence is negligible [8].
For some specific functions, namely, those whose derivative can be written as
a rational expression of the function outcome or its argument, the PC expansion
of the function evaluation can be found through an integration approach. Consider,
e.g., the $\exp(\cdot)$ function. If $v = e^u$, then $dv = e^u\,du = v\,du$. As such,

$$e^{b} - e^{a} = \int_{a}^{b} v\, du \qquad (17.20)$$
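One way to realize such an integration approach, sketched here under the assumption of a 1-D probabilists' Hermite basis, is to parametrize $u(s) = u_0 + s\,(u - u_0)$ and integrate the Galerkin projection of $dv = v\,du$ from the deterministic point $v(0) = e^{u_0}$ to $s = 1$; the function names are illustrative, and this is not necessarily the exact scheme of the cited references.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial

P = 4
x, w = hermegauss(30)
w = w / np.sqrt(2.0 * np.pi)
psi = np.array([hermeval(x, [0.0] * i + [1.0]) for i in range(P + 1)])
C = np.einsum('iq,jq,kq,q->ijk', psi, psi, psi, w)
C /= np.array([float(factorial(k)) for k in range(P + 1)])

def gprod(p, q):
    return np.einsum('i,j,ijk->k', p, q, C)

def pce_exp(u, nsteps=200):
    """PCE of exp(u): integrate dv = v du along the path u(s) = u0 + s*(u - u0)."""
    du = u.copy(); du[0] = 0.0                 # mean-free part of u
    v = np.zeros(P + 1); v[0] = np.exp(u[0])   # start from v(0) = exp(mean of u)
    h = 1.0 / nsteps
    for _ in range(nsteps):                    # RK4 on dv_k/ds = sum_ij v_i du_j C_ijk
        k1 = gprod(v, du)
        k2 = gprod(v + 0.5 * h * k1, du)
        k3 = gprod(v + 0.5 * h * k2, du)
        k4 = gprod(v + h * k3, du)
        v += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return v

# Exact lognormal PCE for u = m + a*xi: exp(u)_k = exp(m + a^2/2) * a^k / k!
m, a = 0.5, 0.1
u = np.zeros(P + 1); u[0], u[1] = m, a
exact = np.exp(m + a * a / 2) * np.array([a ** k / factorial(k) for k in range(P + 1)])
```

For this Gaussian argument the exact coefficients of the lognormal result are known in closed form, so the truncation and time-stepping error of the sketch can be checked directly.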
PCEs for the model variables are first postulated, which are then substituted into the
governing equations.

$$b = \sum_{i=0}^{P} b_i\, \psi_i(\xi) \qquad (17.28)$$

$$u(t) = \sum_{i=0}^{P} u_i(t)\, \psi_i(\xi) \qquad v(t) = \sum_{i=0}^{P} v_i(t)\, \psi_i(\xi) \qquad (17.29)$$

$$w(t) = \sum_{i=0}^{P} w_i(t)\, \psi_i(\xi) \qquad z(t) = \sum_{i=0}^{P} z_i(t)\, \psi_i(\xi) \qquad (17.30)$$
Substituting those PCEs first into Equation (17.21),

$$\frac{du}{dt} = a z - c u - 4 d\, u v \qquad (17.31)$$

gives

$$\frac{d}{dt}\sum_{i=0}^{P} u_i \psi_i = a \sum_{i=0}^{P} z_i \psi_i - c \sum_{i=0}^{P} u_i \psi_i - 4 d \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, \psi_i \psi_j \qquad (17.32)$$

Multiplying with $\psi_k$ and taking the expectation w.r.t. $\xi$ then results in

$$\left\langle \psi_k\, \frac{d}{dt}\sum_{i=0}^{P} u_i \psi_i \right\rangle = a \left\langle \psi_k \sum_{i=0}^{P} z_i \psi_i \right\rangle - c \left\langle \psi_k \sum_{i=0}^{P} u_i \psi_i \right\rangle - 4 d \left\langle \psi_k \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, \psi_i \psi_j \right\rangle \qquad (17.33)$$
After rearranging terms and invoking the orthogonality of the basis functions, this
results in

$$\frac{du_k}{dt}\, \langle\psi_k^2\rangle = a z_k\, \langle\psi_k^2\rangle - c u_k\, \langle\psi_k^2\rangle - 4 d \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, \langle\psi_i \psi_j \psi_k\rangle \qquad (17.34)$$

$$\frac{du_k}{dt} = a z_k - c u_k - 4 d \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, C_{ijk} \qquad (17.35)$$

where Equation (17.35) is solved for $k = 0, \dots, P$, using the same $C_{ijk}$ constants
as defined before in Equation (17.18).
For the dimer coverage fraction, the same procedure is applied to Equa-
tion (17.22) to get

$$\frac{dv}{dt} = 2 b z^2 - 4 d\, u v \qquad (17.36)$$

$$\frac{d}{dt}\sum_{i=0}^{P} v_i \psi_i = 2 \sum_{h=0}^{P}\sum_{i=0}^{P}\sum_{j=0}^{P} b_h z_i z_j\, \psi_h \psi_i \psi_j - 4 d \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, \psi_i \psi_j \qquad (17.37)$$

$$\left\langle \psi_k\, \frac{d}{dt}\sum_{i=0}^{P} v_i \psi_i \right\rangle = 2 \left\langle \psi_k \sum_{h=0}^{P}\sum_{i=0}^{P}\sum_{j=0}^{P} b_h z_i z_j\, \psi_h \psi_i \psi_j \right\rangle - 4 d \left\langle \psi_k \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, \psi_i \psi_j \right\rangle \qquad (17.38)$$

$$\frac{dv_k}{dt}\, \langle\psi_k^2\rangle = 2 \sum_{h=0}^{P}\sum_{i=0}^{P}\sum_{j=0}^{P} b_h z_i z_j\, \langle\psi_h \psi_i \psi_j \psi_k\rangle - 4 d \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, \langle\psi_i \psi_j \psi_k\rangle \qquad (17.39)$$

$$\frac{dv_k}{dt} = 2 \sum_{h=0}^{P}\sum_{i=0}^{P}\sum_{j=0}^{P} b_h z_i z_j\, \frac{\langle\psi_h \psi_i \psi_j \psi_k\rangle}{\langle\psi_k^2\rangle} - 4 d \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, \frac{\langle\psi_i \psi_j \psi_k\rangle}{\langle\psi_k^2\rangle} \qquad (17.40)$$

$$\frac{dv_k}{dt} = 2 \sum_{h=0}^{P}\sum_{i=0}^{P}\sum_{j=0}^{P} b_h z_i z_j\, D_{hijk} - 4 d \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, C_{ijk} \qquad (17.41)$$
In Equation (17.41), the factors $D_{hijk}$ result from the fact that the original equation
contains a product of three uncertain variables, and all terms in the product are
retained (full spectral product). However, as discussed in the previous section,
precomputing and storing the $D_{hijk}$ quickly becomes tedious and expensive. Further, even though
628 B. Debusschere
the product is performed fully spectrally, only the first $(P + 1)$ terms are retained,
since Equation (17.41) is only solved for $k = 0, \dots, P$.
To avoid the need for computing the $D_{hijk}$ factors, a pseudo-spectral approach is
used by rewriting Equation (17.22) using an auxiliary variable $g$ as follows:

$$g = z^2 \qquad (17.42)$$

$$\frac{dv}{dt} = 2 b g - 4 d\, u v \qquad (17.43)$$

Using this transformation, the product of three uncertain variables has been
removed, and the Galerkin-projected equations for the PC coefficients become

$$g_k = \sum_{i=0}^{P}\sum_{j=0}^{P} z_i z_j\, C_{ijk} \qquad (17.44)$$

$$\frac{dv_k}{dt} = 2 \sum_{i=0}^{P}\sum_{j=0}^{P} b_i g_j\, C_{ijk} - 4 d \sum_{i=0}^{P}\sum_{j=0}^{P} u_i v_j\, C_{ijk} \qquad (17.45)$$

The same procedure applied to the remaining model equations gives

$$\frac{dw_k}{dt} = e z_k - f w_k \qquad k = 0, \dots, P \qquad (17.46)$$

$$z_0 = 1 - u_0 - v_0 - w_0 \qquad (17.47)$$

$$z_k = -u_k - v_k - w_k \qquad k = 1, \dots, P \qquad (17.48)$$
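The reformulated system, Equations (17.35) and (17.44)-(17.48), can then be integrated with a standard ODE scheme. The sketch below assumes a uniform germ with a Legendre basis for the uncertain rate b; all parameter values are illustrative and are not those of the chapter's example.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

P = 5
x, w = leggauss(2 * P + 2)
w = 0.5 * w                                    # uniform germ xi ~ U(-1, 1)
psi = np.array([legval(x, [0.0] * i + [1.0]) for i in range(P + 1)])
C = np.einsum('iq,jq,kq,q->ijk', psi, psi, psi, w)
C /= np.array([1.0 / (2 * k + 1) for k in range(P + 1)])  # <P_k^2> = 1/(2k+1)

def gprod(p, q):
    # Pseudo-spectral Galerkin product, as in Eq. (17.44)
    return np.einsum('i,j,ijk->k', p, q, C)

# Illustrative rate constants (not the chapter's values); b is uncertain
a, c, d, e, f = 1.0, 1.0, 1.0, 1.0, 1.0
b = np.zeros(P + 1); b[0], b[1] = 2.0, 0.2     # b = 2 + 0.2*xi

def rhs(y):
    u, v, wc = y                               # PCE coefficient vectors
    z = -u - v - wc; z[0] += 1.0               # Eqs. (17.47)-(17.48)
    g = gprod(z, z)                            # Eq. (17.44): auxiliary g = z^2
    du = a * z - c * u - 4 * d * gprod(u, v)   # Eq. (17.35)
    dv = 2 * gprod(b, g) - 4 * d * gprod(u, v) # Eq. (17.45)
    dw = e * z - f * wc                        # Eq. (17.46)
    return np.array([du, dv, dw])

y = np.zeros((3, P + 1))                       # initially bare surface
nsteps, dt = 5000, 1e-3                        # integrate to t = 5
for _ in range(nsteps):                        # RK4 time stepping
    k1 = rhs(y); k2 = rhs(y + 0.5 * dt * k1)
    k3 = rhs(y + 0.5 * dt * k2); k4 = rhs(y + dt * k3)
    y += (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

Since z is eliminated through Equations (17.47)-(17.48), the state holds only the 3(P+1) coefficients of u, v, and w; the mean coverages (the k = 0 coefficients) remain valid fractions, and the higher-order coefficients carry the uncertainty induced by b.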
later on in the simulation, suggesting that both the uncertainty and the skewness of
the distribution of u change in time. Note that the higher-order terms are
smallest in magnitude when the mean value goes through a local maximum or
minimum and largest in magnitude when the mean value has the largest gradients
in time. This indicates that much of the uncertainty in u is due to phase shifts in the
oscillation caused by the uncertainty in b.
To get a feel for the distribution of u, Fig. 17.5 shows PDFs of u at selected
instances in time. To generate these PDFs, the PCEs of u at the corresponding points
in time were evaluated for 100,000 samples of the PCE germ, and the resulting u
samples were used as data for kernel density estimation (KDE).
For the points in time where the mean of u has a high value (left plot in Fig. 17.5),
the PDFs show a distinct peak, but there is a tail toward lower values that broadens
in time. For the cases where the standard deviation has a local maximum (right plot
in Fig. 17.5), the PDFs of u show increasingly distinct bimodal behavior as
time goes on.
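The sampling/KDE procedure just described can be sketched as follows; the PCE coefficients here are illustrative, a plain Gaussian-kernel KDE with Silverman's bandwidth stands in for whatever KDE implementation was actually used, and fewer samples are drawn than the chapter's 100,000.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

rng = np.random.default_rng(0)

# Illustrative PCE of u at one instant: u = sum_k u_k He_k(xi), xi ~ N(0, 1)
u_coeffs = np.array([0.25, 0.05, -0.02, 0.01])

n = 20_000                                   # germ samples (chapter uses 100,000)
xi = rng.standard_normal(n)
u_samples = hermeval(xi, u_coeffs)           # evaluate the PCE at the samples

# Plain Gaussian-kernel KDE with Silverman's rule-of-thumb bandwidth
bw = 1.06 * u_samples.std() * n ** (-0.2)
grid = np.linspace(u_samples.min(), u_samples.max(), 200)
z = (grid[:, None] - u_samples[None, :]) / bw
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2.0 * np.pi))
```

`density` then approximates the PDF of u on `grid` and integrates to approximately one.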
Figure 17.6 shows a comparison between PDFs for u, generated with intrusive
spectral projection (ISP), nonintrusive spectral projection (NISP), and Monte Carlo
Fig. 17.5 Probability density functions (PDFs) of u at select points in time. Left: two instances
in time where the mean value of u reaches a maximum (t = 330.5 and t = 756.5). Right: two
instances in time where the standard deviation of u reaches a maximum (t = 377.8 and t = 803.0)
sampling (MC). For the NISP results, the deterministic system of ODEs was
evaluated for six values of b, sampled at the Gauss-Legendre quadrature points.
The resulting u samples were then used to project u onto the PC basis func-
tions via quadrature integration. For more information on this approach, see
Chap. 21, Sparse Collocation Methods for Stochastic Interpolation and Quadra-
ture. In the Monte Carlo approach, the deterministic ODE system was evaluated
for 50,000 randomly sampled values of b. The resulting u values were not used
to construct a PCE but were instead fed directly into KDE to generate a PDF. All
three approaches show very good agreement with each other, which validates the
implementation of the ISP and NISP methods and also confirms that the choice of a
5th-order PCE was appropriate for this example.
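Generically, the NISP projection step amounts to one deterministic model run per quadrature node followed by a weighted sum per coefficient. The sketch below assumes a uniform parameter with a Legendre basis and uses a simple closed-form placeholder in place of the ODE solve.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

order = 5
nodes, weights = leggauss(order + 1)          # Gauss-Legendre nodes on [-1, 1]
weights = 0.5 * weights                       # density of the uniform germ

def model(xi):
    # Placeholder for the deterministic solve (one ODE integration per node)
    return np.exp(0.3 * xi) + 0.1 * xi ** 2

samples = model(nodes)                        # one deterministic run per node

# Project onto the Legendre basis: u_k = <model, P_k> / <P_k^2>, <P_k^2> = 1/(2k+1)
u_pce = np.array([
    (2 * k + 1) * np.sum(weights * samples * legval(nodes, [0.0] * k + [1.0]))
    for k in range(order + 1)
])

xi_test = np.linspace(-1.0, 1.0, 11)
approx = legval(xi_test, u_pce)               # evaluate the resulting PCE
```

For this smooth placeholder model, the 5th-order PCE already matches the model very closely over the germ's range, mirroring the close ISP/NISP agreement reported above.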
A more detailed comparison between the ISP and NISP results is shown in
Fig. 17.7, which compares the 4th- and 5th-order terms in the PCE of u as a function
of time.
The lower-order terms in the PCE of u are indistinguishable (not shown)
between the ISP and NISP approaches. The 4th- and 5th-order terms are also
nearly indistinguishable, except at late time, when very minor differences become
Fig. 17.7 Comparison of the 4th- and 5th-order terms in the PCE of u as a function of time,
generated with intrusive (ISP) and nonintrusive (NISP) spectral projection
visible. If the integration time horizon were further extended, those differences
would become larger, due to the need to use higher-order representations to properly
capture all intermediate variables in the computations and avoid phase errors in the
oscillatory dynamics [38]. This touches on some of the challenges with the ISP
approach, which are discussed in the next section.
information, with both of these conditions placing a lot of demands on the spectral
representation of uncertain quantities.
The reformulation of the governing equations actually alters the nature of the
equations to be solved. Not only is the system of equations larger than the original
deterministic system of equations, but its dynamics can change, e.g., through the
emergence of spurious eigenvalues [45]. All of this requires careful construction
of solvers for the systems of equations resulting from intrusive Galerkin projection
[1, 30].
Due to these challenges, intrusive uncertainty quantification approaches have
mostly found use in problems with near-linear behavior. Also, the fact that intrusive
methods require a code rewrite remains a significant barrier to their adoption.
Instead, the various nonintrusive approaches are much more widely used, thanks to
their relative ease of implementation and the fact that they rely on the same solvers
that are used for the original deterministic equations. Recently, however,
intrusive UQ methods have been gathering increased attention for their potential to make
better use of extreme-scale computing architectures. With proper preconditioners
and solvers, the large systems of equations resulting from the intrusive Galerkin
reformulation are often well suited to take advantage of intra-node concurrency,
e.g., via multi-core nodes or GPUs [40]. As such, research in intrusive methods for
uncertainty propagation is alive and well.
Given the complexities of creating and solving the systems of equations for intrusive
UQ, and its status as more of a research topic than a mainstream application
method, relatively few software packages are available for intrusive UQ. Two open-
source packages are mentioned here: UQTk and Stokhos.
UQTk stands for UQ Toolkit [9]; it contains libraries and tools for the quantifica-
tion of uncertainty in computational models, including tools for the intrusive Galerkin
projection of many numerical operations. It is geared toward algorithm prototyping
as well as tutorial and academic use.
Stokhos [39] is part of the Trilinos project. It provides methods for computing
well-known intrusive stochastic Galerkin projections such as polynomial chaos
and generalized polynomial chaos, interfaces for forming the resulting nonlinear
systems, and linear solver methods for solving block stochastic Galerkin linear
systems. Stokhos is geared toward handling large systems of equations and, through
its connection with other Trilinos libraries, is set up for high performance computing
platforms.
6 Conclusions
Intrusive methods for uncertainty quantification provide a powerful and elegant way
to propagate uncertainties through computational models. Through the Galerkin
projection of the original governing equations of a deterministic problem with
Cross-References
References
1. Augustin, F., Rentrop, P.: Stochastic Galerkin techniques for random ordinary differential equations. Numer. Math. 122(3), 399–419 (2012)
2. Babuška, I., Tempone, R., Zouraris, G.: Galerkin finite element approximations of stochastic elliptic partial differential equations. SIAM J. Numer. Anal. 42(2), 800–825 (2004)
3. Babuška, I., Tempone, R., Zouraris, G.: Solving elliptic boundary value problems with uncertain coefficients by the finite element method: the stochastic formulation. Comput. Methods Appl. Mech. Eng. 194, 1251–1294 (2005)
4. Beran, P.S., Pettit, C.L., Millman, D.R.: Uncertainty quantification of limit-cycle oscillations. J. Comput. Phys. 217(1), 217–47 (2006). doi:10.1016/j.jcp.2006.03.038
5. Chen, Q.Y., Gottlieb, D., Hesthaven, J.: Uncertainty analysis for the steady-state flows in a dual throat nozzle. J. Comput. Phys. 204, 378–398 (2005)
6. Deb, M.K., Babuška, I., Oden, J.: Solution of stochastic partial differential equations using Galerkin finite element techniques. Comput. Methods Appl. Mech. Eng. 190, 6359–6372 (2001)
7. Debusschere, B., Najm, H., Matta, A., Knio, O., Ghanem, R., Le Maître, O.: Protein labeling reactions in electrochemical microchannel flow: numerical simulation and uncertainty propagation. Phys. Fluids 15(8), 2238–2250 (2003)
8. Debusschere, B., Najm, H., Pébay, P., Knio, O., Ghanem, R., Le Maître, O.: Numerical challenges in the use of polynomial chaos representations for stochastic processes. SIAM J. Sci. Comput. 26(2), 698–719 (2004)
9. Debusschere, B., Sargsyan, K., Safta, C., Chowdhary, K.: UQ Toolkit. http://www.sandia.gov/UQToolkit (2015)
10. Elman, H.C., Miller, C.W., Phipps, E.T., Tuminaro, R.S.: Assessment of collocation and Galerkin approaches to linear diffusion equations with random data. Int. J. Uncertain. Quantif. 1(1), 19–33 (2011)
11. Ernst, O., Mugler, A., Starkloff, H.J., Ullmann, E.: On the convergence of generalized polynomial chaos expansions. ESAIM: Math. Model. Numer. Anal. 46, 317–339 (2012)
12. Ghanem, R., Dham, S.: Stochastic finite element analysis for multiphase flow in heterogeneous porous media. Transp. Porous Media 32, 239–262 (1998)
13. Ghanem, R., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991)
14. Knio, O., Le Maître, O.: Uncertainty propagation in CFD using polynomial chaos decomposition. Fluid Dyn. Res. 38(9), 616–40 (2006)
15. Le Maître, O., Knio, O., Najm, H., Ghanem, R.: A stochastic projection method for fluid flow I. Basic formulation. J. Comput. Phys. 173, 481–511 (2001)
16. Le Maître, O., Reagan, M., Najm, H., Ghanem, R., Knio, O.: A stochastic projection method for fluid flow II. Random process. J. Comput. Phys. 181, 9–44 (2002)
17. Le Maître, O., Knio, O., Debusschere, B., Najm, H., Ghanem, R.: A multigrid solver for two-dimensional stochastic diffusion equations. Comput. Methods Appl. Mech. Eng. 192, 4723–4744 (2003)
18. Le Maître, O., Ghanem, R., Knio, O., Najm, H.: Uncertainty propagation using Wiener-Haar expansions. J. Comput. Phys. 197(1), 28–57 (2004)
19. Le Maître, O., Najm, H., Ghanem, R., Knio, O.: Multi-resolution analysis of Wiener-type uncertainty propagation schemes. J. Comput. Phys. 197, 502–531 (2004)
20. Le Maître, O., Reagan, M., Debusschere, B., Najm, H., Ghanem, R., Knio, O.: Natural convection in a closed cavity under stochastic, non-Boussinesq conditions. SIAM J. Sci. Comput. 26(2), 375–394 (2004)
21. Le Maître, O., Najm, H., Pébay, P., Ghanem, R., Knio, O.: Multi-resolution analysis scheme for uncertainty quantification in chemical systems. SIAM J. Sci. Comput. 29(2), 864–889 (2007)
22. Le Maître, O.P., Mathelin, L., Knio, O.M., Hussaini, M.Y.: Asynchronous time integration for polynomial chaos expansion of uncertain periodic dynamics. Discret. Contin. Dyn. Syst. 28(1), 199–226 (2010)
23. Lucor, D., Karniadakis, G.: Noisy inflows cause a shedding-mode switching in flow past an oscillating cylinder. Phys. Rev. Lett. 92(15), 154501 (2004)
24. Ma, X., Zabaras, N.: A stabilized stochastic finite element second-order projection method for modeling natural convection in random porous media. J. Comput. Phys. 227(18), 8448–8471 (2008)
25. Makeev, A.G., Maroudas, D., Kevrekidis, I.G.: Coarse stability and bifurcation analysis using stochastic simulators: kinetic Monte Carlo examples. J. Chem. Phys. 116(23), 10,083 (2002)
26. Marzouk, Y.M., Najm, H.N.: Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems. J. Comput. Phys. 228(6), 1862–1902 (2009)
27. Matthies, H.G., Keese, A.: Galerkin methods for linear and nonlinear elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Eng. 194, 1295–1331 (2005)
28. Millman, D., King, P., Maple, R., Beran, P., Chilton, L.: Uncertainty quantification with a B-spline stochastic projection. AIAA J. 44(8), 1845–1853 (2006)
29. Najm, H.: Uncertainty quantification and polynomial chaos techniques in computational fluid dynamics. Ann. Rev. Fluid Mech. 41(1), 35–52 (2009). doi:10.1146/annurev.fluid.010908.165248
30. Najm, H., Valorani, M.: Enforcing positivity in intrusive PC-UQ methods for reactive ODE systems. J. Comput. Phys. 270, 544–569 (2014)
31. Narayanan, V., Zabaras, N.: Variational multiscale stabilized FEM formulations for transport equations: stochastic advection-diffusion and incompressible stochastic Navier-Stokes equations. J. Comput. Phys. 202(1), 94–133 (2005)
32. Pawlowski, R.P., Phipps, E.T., Salinger, A.G.: Automating embedded analysis capabilities and managing software complexity in multiphysics simulation, Part I: template-based generic programming. Sci. Program. 20(2), 197–219 (2012). doi:10.3233/SPR-2012-0350, arXiv:1205.3952v1
33. Pawlowski, R.P., Phipps, E.T., Salinger, A.G., Owen, S.J., Siefert, C.M., Staten, M.L.: Automating embedded analysis capabilities and managing software complexity in multiphysics simulation, Part II: application to partial differential equations. Sci. Program. 20(3), 327–345 (2012). doi:10.3233/SPR-2012-0351, arXiv:1205.3952v1
34. Perez, R., Walters, R.: An implicit polynomial chaos formulation for the Euler equations. In: Paper AIAA 2005-1406, 43rd AIAA Aerospace Sciences Meeting and Exhibit, Reno (2005)
35. Pettersson, M.P., Iaccarino, G., Nordström, J.: Polynomial Chaos Methods for Hyperbolic Partial Differential Equations. Numerical Techniques for Fluid Dynamics Problems in the Presence of Uncertainties. Springer, Cham (2015)
36. Pettersson, P., Nordström, J., Iaccarino, G.: Boundary procedures for the time-dependent Burgers equation under uncertainty. Acta Math. Sci. 30(2), 539–550 (2010). doi:10.1016/S0252-9602(10)60061-6
37. Pettersson, P., Iaccarino, G., Nordström, J.: A stochastic Galerkin method for the Euler equations with Roe variable transformation. J. Comput. Phys. 257(PA), 481–500 (2014)
38. Pettit, C.L., Beran, P.S.: Spectral and multiresolution Wiener expansions of oscillatory stochastic processes. J. Sound Vib. 294(4/5), 752–779 (2006). doi:10.1016/j.jsv.2005.12.043
39. Phipps, E.: Stokhos. https://trilinos.org/packages/stokhos/ (2015). Accessed 9 Sept 2015
40. Phipps, E., Hu, J., Ostien, J.: Exploring emerging manycore architectures for uncertainty quantification through embedded stochastic Galerkin methods. Int. J. Comput. Math. 123 (2013). doi:10.1080/00207160.2013.840722
41. Powell, C.E., Elman, H.C.: Block-diagonal preconditioning for spectral stochastic finite-element systems. IMA J. Numer. Anal. 29(2), 350–375 (2009)
42. Reagan, M., Najm, H., Debusschere, B., Le Maître, O., Knio, O., Ghanem, R.: Spectral stochastic uncertainty quantification in chemical systems. Combust. Theory Model. 8, 607–632 (2004)
43. Sargsyan, K., Debusschere, B., Najm, H., Marzouk, Y.: Bayesian inference of spectral expansions for predictability assessment in stochastic reaction networks. J. Comput. Theor. Nanosci. 6(10), 2283–2297 (2009)
44. Schwab, C., Todor, R.: Sparse finite elements for stochastic elliptic problems. Numer. Math. 95, 707–734 (2003)
45. Sonday, B., Berry, R., Najm, H., Debusschere, B.: Eigenvalues of the Jacobian of a Galerkin-projected uncertain ODE system. SIAM J. Sci. Comput. 33, 1212–1233 (2011)
46. Todor, R., Schwab, C.: Convergence rates for sparse chaos approximations of elliptic problems with stochastic coefficients. IMA J. Numer. Anal. 27, 232–261 (2007)
47. Tryoen, J., Le Maître, O., Ndjinga, M., Ern, A.: Intrusive Galerkin methods with upwinding for uncertain nonlinear hyperbolic systems. J. Comput. Phys. 229(18), 6485–6511 (2010)
48. Tryoen, J., Le Maître, O., Ndjinga, M., Ern, A.: Roe solver with entropy corrector for uncertain hyperbolic systems. J. Comput. Appl. Math. 235(2), 491–506 (2010)
49. Tryoen, J., Le Maître, O., Ern, A.: Adaptive anisotropic spectral stochastic methods for uncertain scalar conservation laws. SIAM J. Sci. Comput. 34(5), A2459–A2481 (2012)
50. Vigil, R., Willmore, F.: Oscillatory dynamics in a heterogeneous surface reaction: breakdown of the mean-field approximation. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Topics 54(2), 1225–1231 (1996)
51. Villegas, M., Augustin, F., Gilg, A., Hmaidi, A., Wever, U.: Application of the polynomial chaos expansion to the simulation of chemical reactors with uncertainties. Math. Comput. Simul. 82(5), 805–817 (2012). doi:10.1016/j.matcom.2011.12.001
52. Wan, X., Karniadakis, G.: Long-term behavior of polynomial chaos in stochastic flow simulations. Comput. Methods Appl. Mech. Eng. 195, 5582–5596 (2006)
53. Wan, X., Karniadakis, G.: Multi-element generalized polynomial chaos for arbitrary probability measures. SIAM J. Sci. Comput. 28(3), 901–928 (2006)
54. Wan, X., Karniadakis, G.E.: An adaptive multi-element generalized polynomial chaos method for stochastic differential equations. J. Comput. Phys. 209, 617–642 (2005)
55. Wan, X., Xiu, D., Karniadakis, G.: Stochastic solutions for the two-dimensional advection-diffusion equation. SIAM J. Sci. Comput. 26(2), 578–590 (2004)
56. Wiener, N.: The homogeneous chaos. Am. J. Math. 60, 897–936 (1938). doi:10.2307/2371268
57. Xiu, D., Karniadakis, G.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002). doi:10.1137/S1064827501387826
58. Xiu, D., Karniadakis, G.: Modeling uncertainty in flow simulations via generalized polynomial chaos. J. Comput. Phys. 187, 137–167 (2003)
59. Xiu, D., Karniadakis, G.: A new stochastic approach to transient heat conduction modeling with uncertainty. Int. J. Heat Mass Transf. 46(24), 4681–4693 (2003)
60. Xiu, D., Lucor, D., Su, C.H., Karniadakis, G.: Stochastic modeling of flow-structure interactions using generalized polynomial chaos. ASME J. Fluids Eng. 124, 51–59 (2002)
18 Multiresolution Analysis for Uncertainty Quantification
Olivier P. Le Maître and Omar M. Knio
Abstract
We survey the application of multiresolution analysis (MRA) methods in uncer-
tainty propagation and quantification problems. The methods are based on
the representation of uncertain quantities in terms of a series of orthogonal
multiwavelet basis functions. The unknown coefficients in this expansion are
then determined through a Galerkin formalism. This is achieved by injecting
the multiwavelet representations into the governing system of equations and
exploiting the orthogonality of the basis in order to derive suitable evolution
equations for the coefficients. Solution of this system of equations yields the
evolution of the uncertain solution, expressed in a format that readily affords the
extraction of various properties.
One of the main features of multiresolution representations is their
natural ability to accommodate steep or discontinuous dependence of the solution
on the random inputs, combined with the ability to dynamically adapt the
resolution, including basis enrichment and reduction, following the
evolution of the surfaces of steep variation or discontinuity. These capabilities
are illustrated in light of simulations of a simple dynamical system exhibiting
a bifurcation and of more complex applications to a traffic problem and wave
propagation in gas dynamics.
Keywords
Multiresolution analysis · Multiwavelet basis · Stochastic refinement · Stochastic bifurcation
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 638
2 One-Dimensional Multiresolution System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
2.1 Multiresolution Analysis and Multiresolution Space . . . . . . . . . . . . . . . . . . . . . . . . . 640
2.2 Stochastic Element Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
2.3 Multiwavelet Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
3 Multidimensional Extension and Multiscale Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
3.1 Binary-Tree Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
3.2 Multidimensional Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
3.3 Multiscale Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
4 Adaptivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
4.1 Coarsening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
4.2 Anisotropic Enrichment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
5 Illustrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
5.1 Simple ODE Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
5.2 Scalar Conservation Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671
1 Introduction
A severe difficulty arising in the analysis of stochastic systems concerns the situation
where the solution exhibits steep or discontinuous dependence on the random
variables that are used to parametrize the uncertain inputs. Well-known examples
where these situations arise include complex systems involving shock formation or
an energy cascade [1, 2], bifurcations [3, 4], and chemical system ignition [5].
It is well known that these complex settings result in severe difficulties in repre-
senting the solution in terms of smooth global functionals. For instance, Gaussian
processes frequently exhibit large errors when the solution is discontinuous and
generally become impractical. When global polynomial basis functions are used [6],
the representation generally suffers from a low convergence rate when the solution
varies steeply with the random inputs and requires an excessively large basis when
discontinuities arise.
In order to address and overcome these difficulties, it is generally desirable
to develop methodologies that can provide appropriate resolution in regions of
(smooth) steep variation and, in extreme cases, isolate the regions where
discontinuities occur. When discontinuities occur along hypersurfaces bounding well-
defined regions, one can generally decompose the random parameter domain into
regions over which the solution varies smoothly and consequently build a global
representation as a collection of smooth representations defined over subdomains.
The domain decomposition approach [4] is relatively straightforward in situ-
ations where the hypersurfaces across which the solution varies discontinuously
have fixed positions and when the latter are known. When this is not the case,
the hypersurfaces must be first localized. Various approaches have been recently
developed to address this problem, including detection approaches [7].
18 Multiresolution Analysis for Uncertainty Quantification 639
In many cases, however, one is faced with the additional complexity that
the hypersurfaces of discontinuity can evolve according to the dynamics of the
system as well as the uncertain inputs. This necessitates a methodology that can
dynamically adapt the representation of the solution response in such a way as to
accommodate such behavior. The focus of this chapter is to provide a brief outline
of a class of adaptive multiresolution analysis (MRA) methods that can achieve this
capability.
In contrast to methods based on global polynomials [6], experiences with the
application of MRA methods to uncertain systems have been more limited [3–5, 8–14].
Our goals in the present discussion are twofold: (1) to outline the basic
construction of multiresolution representations in a probabilistic setting and their
implementation in the context of a Galerkin formalism for uncertainty propaga-
tion [15] and (2) to illustrate applications in which dynamic adaptation of the
representation is particularly essential.
This chapter is organized as follows. Section 2 provides an introduction to the
multiresolution system (MRS) for the case of a single dimension. Section 3 details
the construction of multiresolution spaces in higher dimension, relying on binary
trees, and introduces essential multiscale operators. Section 4 discusses adaptivity
based on the multiresolution framework, detailing crucial criteria for deciding the
coarsening and enrichment of the approximation spaces. Illustrations of simulations
using multiresolution schemes are provided in Sect. 5, in light of applications to an
elementary dynamical system and to a traffic flow problem. Concluding remarks are
given in Sect. 6.
$$p_X(x \in \mathbb{R}^N) = \begin{cases} 1, & x_{i=1,\dots,N} \in U, \\ 0, & \text{otherwise.} \end{cases} \qquad (18.1)$$

The space $L^2$ is equipped with the inner product, denoted $\langle\cdot,\cdot\rangle$, and norm $\|\cdot\|_2$,
given by

$$\langle U, V\rangle := \int U(x)\, V(x)\, p_X(x)\, dx, \qquad \|U\|_2^2 := \langle U, U\rangle. \qquad (18.2)$$

We recall that

$$\|U\|_2 < +\infty \iff U \in L^2. \qquad (18.3)$$
Let $U$ be the unit interval. For a given integer $k \ge 0$, called the resolution level, we
define the partition $\mathcal{P}^{(k)}$ of $U$ into $2^k$ non-overlapping subintervals $U_l^{(k)}$ having equal
size,

$$\mathcal{P}^{(k)} := \left\{ U_l^{(k)},\ l = 1,\dots,2^k \right\}, \qquad U_l^{(k)} := \left[ \frac{l-1}{2^k},\ \frac{l}{2^k} \right),$$

so we have

$$\bigcup_{l=1}^{2^k} U_l^{(k)} = U, \qquad U_l^{(k)} \cap U_{l'}^{(k)} = \emptyset \ \text{ for } l \ne l'.$$
$$V_{\mathrm{No}}^{(k)} := \left\{ U : x \in U \mapsto \mathbb{R};\ U|_{U_l^{(k)}} \in \Pi_{\mathrm{No}}\!\left(U_l^{(k)}\right),\ l = 1,\dots,2^k \right\}, \qquad (18.4)$$

where we have denoted by $\Pi_{\mathrm{No}}(I)$ the space of polynomials with degree less than or equal
to No defined over $I$.
In other words, the space $V_{\mathrm{No}}^{(k)}$ consists of the functions that are polynomials of
degree at most No over each subinterval $U_l^{(k)}$ of the partition $\mathcal{P}^{(k)}$. Observe that
$\dim(V_{\mathrm{No}}^{(k)}) = (\mathrm{No}+1)\, 2^k$. In addition, these spaces have a nested structure in the
sense that

$$V_{\mathrm{No}}^{(0)} \subset V_{\mathrm{No}}^{(1)} \subset V_{\mathrm{No}}^{(2)} \subset \cdots \qquad \text{and} \qquad V_0^{(k)} \subset V_1^{(k)} \subset V_2^{(k)} \subset \cdots$$

It is remarked [16] that the unions of these spaces are dense in $L^2(U)$, the space of square-
integrable functions equipped with the inner product denoted $\langle\cdot,\cdot\rangle_U$.
We now construct two distinct orthonormal bases for $V_{\mathrm{No}}^{(k)}$.

$$V_{\mathrm{No}}^{(0)} = \mathrm{span}\left\{ L_\alpha,\ \alpha = 0,\dots,\mathrm{No} \right\}. \qquad (18.5)$$

Introducing the affine mapping $M_l^{(k)} : U_l^{(k)} \mapsto U$ through

$$M_l^{(k)}(x) := 2^k x - l + 1, \qquad (18.7)$$

we construct the associated shifted and scaled versions of the Legendre polynomials
$L_\alpha$ as follows:

$$\phi_{\alpha,l}^{(k)}(x) = \begin{cases} 2^{k/2}\, L_\alpha\!\left(M_l^{(k)}(x)\right), & x \in U_l^{(k)}, \\ 0, & \text{otherwise.} \end{cases} \qquad (18.8)$$

The resulting functions $\phi_{\alpha,l}^{(k)}(x)$ obviously belong to $V_{\mathrm{No}}^{(k)}$, and, for a given resolu-
tion level $k$, one can easily verify that they form an orthonormal set:

$$\left\langle \phi_{\alpha,l}^{(k)},\ \phi_{\alpha',l'}^{(k)} \right\rangle_U = \begin{cases} 1, & \alpha = \alpha' \ \text{and}\ l = l', \\ 0, & \text{otherwise.} \end{cases} \qquad (18.9)$$

Thus, $\left\{ \phi_{\alpha,l}^{(k)};\ \alpha = 0,\dots,\mathrm{No};\ l = 1,\dots,2^k \right\}$ is an orthonormal basis of $V_{\mathrm{No}}^{(k)}$ since
$\dim V_{\mathrm{No}}^{(k)} = 2^k (\mathrm{No}+1)$.
Therefore, any function $U \in V_{\mathrm{No}}^{(k)}$ can be expanded as

$$V_{\mathrm{No}}^{(k)} \ni U(x) = \sum_{\alpha=0}^{\mathrm{No}} \sum_{l=1}^{2^k} u_{\alpha,l}^{(k)}\, \phi_{\alpha,l}^{(k)}(x), \qquad (18.10)$$

where the coefficients $u_{\alpha,l}^{(k)}$ are the SE coefficients of $U$. Also, the orthogonal
projection of $U \in L^2(U)$ onto $V_{\mathrm{No}}^{(k)}$, denoted hereafter $P_{\mathrm{No}}^{(k)} U$, minimizes $\|U - V\|_U$
over all $V \in V_{\mathrm{No}}^{(k)}$ and has for expression

$$P_{\mathrm{No}}^{(k)} U(x) = \sum_{\alpha=0}^{\mathrm{No}} \sum_{l=1}^{2^k} u_{\alpha,l}^{(k)}\, \phi_{\alpha,l}^{(k)}(x), \qquad u_{\alpha,l}^{(k)} = \left\langle U,\ \phi_{\alpha,l}^{(k)} \right\rangle_U. \qquad (18.11)$$

The projection error $U - P_{\mathrm{No}}^{(k)} U$ can be reduced by considering richer projection
spaces, that is, by refining the partition $\mathcal{P}^{(k)}$ (increasing the resolution level $k$), or
the polynomial degree (increasing No), or both.
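Numerically, the projection (18.11) amounts to per-subinterval quadratures against the scaled Legendre basis of Equation (18.8). The sketch below (with illustrative function names) projects a step function and illustrates how the L² error decays as the resolution level k increases.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

def se_project_error(f, k, No, nq=50):
    """L2(U) error of projecting f onto V_No^(k) (piecewise Legendre, 2^k cells)."""
    t, w = leggauss(nq)
    t, w = 0.5 * (t + 1.0), 0.5 * w           # quadrature rule on the unit interval
    err2 = 0.0
    h = 1.0 / 2 ** k
    for l in range(2 ** k):
        x = l * h + h * t                     # quadrature nodes inside cell U_l^(k)
        fx = f(x)
        proj = np.zeros_like(x)
        for n in range(No + 1):
            Ln = legval(2.0 * t - 1.0, [0.0] * n + [1.0])  # L_n mapped to the cell
            coeff = (2 * n + 1) * np.sum(w * fx * Ln)      # local projection coeff
            proj += coeff * Ln
        err2 += h * np.sum(w * (fx - proj) ** 2)
    return np.sqrt(err2)

def step(x):
    return np.where(x < 0.37, 0.0, 1.0)       # discontinuity at x = 0.37

errors = [se_project_error(step, k, No=3) for k in range(6)]
```

With No fixed, the error is dominated by the single cell containing the discontinuity and decays only like $O(2^{-k/2})$, which is precisely the slow behavior that motivates the hierarchical multiwavelet construction discussed next.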
From Eq. (18.11), we remark that, for a fixed resolution level $k$, the projection coefficients $u_{\alpha,l}^{(k)}$ associated with polynomial degrees $\alpha = 1, \dots, No$ are not affected when considering spaces with higher polynomial degrees. In other words, increasing the polynomial degree amounts to complementing the SE expansion with additional (higher-degree) terms, leaving the previous ones unchanged. On the contrary, the whole
18 Multiresolution Analysis for Uncertainty Quantification 643
SE expansion is affected when one changes the resolution level $k$, for instance, by considering a finer partition $\mathcal{P}^{(k)}$, even if the polynomial degree $No$ is kept constant. This is due to the structure of the SE basis functions, which are not orthogonal (in general) across resolution levels. The absence of orthogonality between SE basis functions at different resolution levels makes it difficult to control and reduce the projection error through partition refinement. This motivates the introduction of hierarchical enrichments that are orthogonal across resolution levels. Stochastic multiwavelet (MW) expansions exactly achieve this task.
We denote by $W_{No}^{(k)}$ the orthogonal complement of $V_{No}^{(k)}$ in $V_{No}^{(k+1)}$. The spaces $W_{No}^{(k)}$ are called detail spaces; we have
$$V_{No}^{(0)} \oplus \bigoplus_{k \ge 0} W_{No}^{(k)} = L^2(U). \qquad (18.13)$$
The objective is now to construct an orthonormal basis for these detail spaces $W_{No}^{(k)}$. To this end, we first observe that
$$\dim W_{No}^{(k)} = \dim V_{No}^{(k)} = 2^k (No+1),$$
since $\dim V_{No}^{(k+1)} = 2^{k+1}(No+1)$. Focusing on the case $k = 0$, we have to determine orthonormal functions $\psi_\alpha$, $\alpha = 0, \dots, No$, spanning $W_{No}^{(0)}$. Clearly, $\psi_\alpha \in V_{No}^{(1)}$, so each function has an SE expansion of the form
$$\psi_\alpha = \sum_{\beta=0}^{No} \sum_{l=1}^{2} a_{\beta,l}^{\alpha}\, \phi_{\beta,l}^{(1)}. \qquad (18.14)$$
Equation (18.15) then provides $(No+2)(No+1)/2$ distinct conditions on the coefficients. Accounting for the orthogonality $W_{No}^{(0)} \perp V_{No}^{(0)}$, one obtains another set of $(No+1)^2$ vanishing-moment conditions for the functions $\psi_\alpha$:
$$\left\langle \psi_\alpha, x^\beta \right\rangle_U = 0, \qquad 0 \le \alpha, \beta \le No. \qquad (18.16)$$
The orthonormality and vanishing-moment properties are the only conditions that need to be enforced for the $\psi_\alpha$ to form an orthonormal basis of $W_{No}^{(0)}$. Having fewer conditions than coefficients determining all the $\psi_\alpha$ reflects the fact that there is an infinite number of orthonormal bases for $W_{No}^{(0)}$. Higher-order vanishing-moment conditions could then be imposed on some of the functions $\psi_\alpha$. This could be particularly useful for the compression of the MW expansions (see the thresholding procedure described below). However, improved vanishing-moment properties will not be exploited here, as the orthogonality between functions at different resolution levels is really what matters; therefore, we shall rely in Sect. 2.3.2 on a simple but systematic construction procedure to obtain a set of functions $\psi_\alpha$ satisfying conditions (18.15) and (18.16).
Once the so-called mother functions $\psi_\alpha$ are obtained, bases for the whole hierarchy of subspaces $W_{No}^{(k)}$ can readily be defined by means of simple transformations. Using the affine mapping in (18.7), we define the MW basis functions $\psi_{\alpha,l}^{(k)}$ through
$$\psi_{\alpha,l}^{(k)}(x) = \begin{cases} 2^{k/2}\, \psi_\alpha\big(M_l^{(k)} x\big), & x \in U_l^{(k)}, \\ 0, & \text{otherwise.} \end{cases} \qquad (18.17)$$
As a result, the projection of $U \in L^2(U)$ on $V_{No}^{(k)}$ can be expressed in the MW basis as
$$P_{No}^{(k)} U(x) = \sum_{\alpha=0}^{No} u_{\alpha,1}^{(0)}\, \phi_{\alpha,1}^{(0)}(x) + \sum_{i=0}^{k-1} \sum_{l=1}^{2^i} \sum_{\alpha=0}^{No} \tilde u_{\alpha,l}^{(i)}\, \psi_{\alpha,l}^{(i)}(x). \qquad (18.20)$$
The first sum in (18.20) corresponds to the projection of $U$ on $V_{No}^{(0)}$, whereas the remaining contributions are orthogonal details at successive resolution levels $i = 0, \dots, k-1$.
In a first step, each function $\tilde q_\alpha$ is orthogonalized with respect to all the functions $p_\beta(x)$ for $\beta = 0, \dots, No$. That is, we determine a new set of functions $q_\alpha(x)$ as
$$q_\alpha(x) := \tilde q_\alpha(x) + \sum_{\beta=0}^{No} c_{\alpha,\beta}\, p_\beta(x), \qquad (18.23)$$
so that
$$\left\langle q_\alpha, p_\beta \right\rangle_U = 0, \qquad \text{for } \alpha, \beta = 0, 1, \dots, No. \qquad (18.24)$$
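For moderate $No$, an orthonormal basis of $W^{(0)}_{No}$ can also be computed numerically. The sketch below (Python) uses an SVD null-space computation instead of the explicit orthogonalization of Eq. (18.23); this shortcut is our own, but it produces an equally valid set of mother functions, orthonormal and with the vanishing moments (18.16) holding by construction.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

No = 2
P1 = 2 * (No + 1)                       # dim V^(1)_No
gq, gw = leggauss(16)
cells = [(0.0, 0.5), (0.5, 1.0)]
# quadrature nodes/weights covering the two cells of the partition P^(1)
X = np.concatenate([0.5 * (b - a) * (gq + 1) + a for (a, b) in cells])
W = np.concatenate([0.5 * (b - a) * gw for (a, b) in cells])

def L(a, x):
    """Legendre polynomial of degree a, orthonormal on [0, 1]."""
    return np.sqrt(2 * a + 1) * Legendre.basis(a, domain=[0.0, 1.0])(x)

# rows of B: the SE basis of V^(1) (Eq. (18.8) with k = 1) sampled at the nodes
B = np.zeros((P1, X.size))
for l in (1, 2):
    lo, hi = cells[l - 1]
    m = (X >= lo) & (X <= hi)
    for a in range(No + 1):
        B[(l - 1) * (No + 1) + a, m] = np.sqrt(2.0) * L(a, 2 * X[m] - l + 1)

# G: coordinates of the global Legendre basis of V^(0) in the SE basis of V^(1)
G = np.array([[np.sum(W * L(b, X) * B[j]) for j in range(P1)]
              for b in range(No + 1)])

# W^(0)_No is the null space of G inside V^(1)_No: take the trailing right
# singular vectors; their rows are orthonormal coordinate vectors
psi = np.linalg.svd(G)[2][No + 1:] @ B  # mother MW functions at the nodes
```

The resulting `psi` rows are mutually orthonormal and orthogonal to every polynomial of degree at most $No$, exactly the defining properties of the mother multiwavelets.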
Fig. 18.1 Mother MW functions $\psi_\alpha(x)$ for polynomial degrees $No = 1$, 2, 3, and 5 (Adapted from [6])
For higher polynomial degrees, the functions $\psi_\alpha$ are similarly piecewise polynomials of degree $No$, with discontinuities of orders ranging from 0 to $No$ localized at the center point of $U$.
The partitions $\mathcal{P}^{(k)}$ and their associated approximation spaces can be conveniently represented in terms of binary trees. In a binary tree $T$, every node has either zero or two children, and every node, except the root node denoted by $n_0$, has a unique parent. Nodes are collected in the set $N(T)$. Nodes with no children are called leaves and are collected in the set $L(T)$, while nodes with children are collected in the set $\hat N(T) := N(T) \setminus L(T)$. The two children of a node $n \in \hat N(T)$ are called left and right children (and also sisters) and are denoted by $c^-(n)$ and $c^+(n)$, respectively. The parent of a node $n \in N(T) \setminus \{n_0\}$ is denoted by $p(n)$. To each node $n \in N(T)$, we assign an element $U_i^{(k)}$ of a partition as follows. The support of a node is denoted $S(n) = [x_n^-, x_n^+] \subseteq U$. We set $S(n_0) = U$. The supports of the other nodes are defined recursively by a dyadic partition of the support of the parent node. Then the supports of the left and right children are, respectively, $S(c^-(n)) = [x_n^-, (x_n^- + x_n^+)/2]$ and $S(c^+(n)) = [(x_n^- + x_n^+)/2, x_n^+]$. This construction leads to a partition of $U$ in the form
$$U = \bigcup_{l \in L(T)} S(l). \qquad (18.25)$$
For a node $n \in N(T)$, its depth $|n|$ is defined as the number of generations it takes to reach $n$ from the root node $n_0$. It is readily seen that the support of node $n$ has measure $|S(n)| := 2^{-|n|}$. Finally, for any node $n \in N(T)$, $M_n$ denotes the affine map from $S(n)$ onto the reference stochastic domain $U$.
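A minimal tree data structure makes these definitions concrete (Python sketch; the names `Node`, `split`, and `leaves` are ours, not the chapter's). Exact rational endpoints keep the dyadic supports explicit.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

@dataclass
class Node:
    lo: Fraction                      # S(n) = [lo, hi], with S(n0) = U = [0, 1]
    hi: Fraction
    depth: int = 0                    # |n|, so that |S(n)| = 2^{-|n|}
    left: Optional["Node"] = None     # c^-(n)
    right: Optional["Node"] = None    # c^+(n)

    def split(self):
        """Dyadic partition of S(n) into the supports of the two children."""
        mid = (self.lo + self.hi) / 2
        self.left = Node(self.lo, mid, self.depth + 1)
        self.right = Node(mid, self.hi, self.depth + 1)

    def leaves(self):
        """The set L(T), left to right; their supports partition U (Eq. (18.25))."""
        if self.left is None:
            return [self]
        return self.left.leaves() + self.right.leaves()

root = Node(Fraction(0), Fraction(1))
root.split()
root.left.split()                     # an incomplete tree: leaves at depths 2, 2, 1
```

The leaves returned in left-to-right order tile $U$ without gaps, and each leaf support has measure $2^{-|n|}$, as stated above.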
In practice, we consider binary trees $T$ with a fixed maximum number of successive partitions allowed. This quantity is called the resolution level and is denoted by $Nr$ in the following. As a result, there holds, for all $n \in N(T)$, $|S(n)| \ge 2^{-Nr}$. A particular case is the complete binary tree, for which $|S(l)| = 2^{-Nr}$ for all the leaves. As there are $2^{Nr}$ leaves in a complete binary tree, each of the leaves is identified with an element of the partition $\mathcal{P}^{(Nr)}$. An example of a complete binary tree with $Nr = 4$ is provided in the top plot of Fig. 18.2. However, trees need not be complete, and in view of adaptivity, we consider general binary trees with leaves having different depths $|n|$. An example of such an incomplete binary tree is reported in the bottom plot of Fig. 18.2.
Even if the binary tree is incomplete, one can identify with each leaf an element $U_i^{(k)}$ of a certain partition $\mathcal{P}^{(k)}$ with $k \le Nr$. As a result, we can associate to a binary tree $T$ an approximation space $V(T)$, which is the space of functions in $L^2(U)$ that are polynomials of degree at most $No$ over the support of each leaf of $T$. Note that the notation does not explicitly mention the polynomial degree, as it will be considered
Fig. 18.2 Complete binary tree (top) and incomplete binary tree (bottom) in one dimension; the corresponding partitions of $U = [0, 1]$ are shown below the trees
648 O.P. Le Maître and O.M. Knio
and kept constant. On the contrary, we aim at adapting the tree to the function to be approximated. The SE expansion of $U \in V(T)$ can further be expressed as
$$V(T) \ni U(x) = \sum_{n \in L(T)} \sum_{\alpha=0}^{No} u_\alpha^n\, \phi_\alpha^n(x), \qquad (18.26)$$
where $u_\alpha^n := \langle U, \phi_\alpha^n \rangle_U$ for $\alpha = 0, \dots, No$ are the SE coefficients associated with the leaf $n \in L(T)$. Similarly, the MW expansion can be expressed as
$$V(T) \ni U(x) = \sum_{\alpha=0}^{No} u_\alpha^{n_0}\, \phi_\alpha^{n_0}(x) + \sum_{n \in \hat N(T)} \sum_{\alpha=0}^{No} \tilde u_\alpha^n\, \psi_\alpha^n(x), \qquad (18.27)$$
where the detail coefficients of a node that is not a leaf are given by $\tilde u_\alpha^n := \langle U, \psi_\alpha^n \rangle_U$ for $\alpha = 0, \dots, No$.
Using the binary tree structure to represent partitions and define approximation spaces, the extension of the one-dimensional multiresolution framework to a higher number of dimensions becomes quite straightforward. We shall now consider the case of an $N$-dimensional parameter space consisting of the $N$-dimensional hypercube $\Omega := U^N$. Regarding the partitioning of $\Omega$, instead of extending the previous approach to trees with $2^N$ children per node, which would quickly become intractable as $N$ increases, we require to keep the binary structure (each node has zero or two children), starting from the root node $n_0$, whose support is $S(n_0) = \Omega$. This requires the introduction of an indicator for each node $n \in \hat N(T)$ which keeps track of the direction along which $n$ is partitioned dyadically to produce its two children. Denoting by $d(n) \in \{1, \dots, N\}$ the direction along which the dyadic partition of the support $S(n)$ is performed, the left and right children have for respective supports
$$S(c^-(n)) = \left[x_{n,1}^-, x_{n,1}^+\right] \times \cdots \times \left[x_{n,d(n)}^-, \big(x_{n,d(n)}^- + x_{n,d(n)}^+\big)/2\right] \times \cdots \times \left[x_{n,N}^-, x_{n,N}^+\right]$$
and
$$S(c^+(n)) = \left[x_{n,1}^-, x_{n,1}^+\right] \times \cdots \times \left[\big(x_{n,d(n)}^- + x_{n,d(n)}^+\big)/2, x_{n,d(n)}^+\right] \times \cdots \times \left[x_{n,N}^-, x_{n,N}^+\right].$$
Fig. 18.3 Multidimensional binary tree for $N = 2$ (left). Dashed (resp. solid) segments represent a partition along the first (resp. second) direction. Corresponding partition of $\Omega = [0, 1]^2$ (right)
Fig. 18.4 Example of two equivalent trees for $N = 2$. The solid (resp. dashed) segments represent a partition along the first (resp. second) direction. The partition of $\Omega$ is shown on the right
The notion of equivalent trees is needed in the coarsening and enrichment procedures of Sect. 4.
In practice, we shall consider binary trees $T$ with a fixed maximum number of successive partitions allowed in each direction $d \in \{1, \dots, N\}$. As in the one-dimensional case, this quantity is called the resolution level and is denoted by $Nr$. Thus, there are $2^{N\,Nr}$ leaves in a complete binary tree in $N$ dimensions, showing that adaptive strategies are mandatory to apply these multiresolution schemes in high dimensions.
The construction of the SE basis in the multidimensional case can proceed as in the one-dimensional case. Let us denote by $\Pi$ a prescribed polynomial space over $\Omega$, by $P$ the dimension of this polynomial space, and by $\{\pi_1, \dots, \pi_P\}$ an orthonormal basis of $\Pi$. For instance, one may construct $\Pi$ from a tensorization of the normalized Legendre basis of $\Pi_{No}(U)$, resulting in $P = (No+1)^N$ for a full tensorization of the Legendre polynomials or $P = (No+N)!/(No!\,N!)$ for a total-degree truncation strategy. In any case, the SE basis functions $\phi_\alpha^n$ associated with a leaf $n$ are defined from the $N$-variate polynomials $\pi_\alpha$ using the mapping $M_n$ between $S(n)$ and $\Omega$ and a scaling factor $|S(n)|^{-1/2}$. We are then in position to define the multiresolution space $V(T)$ as the space of functions whose restrictions to each leaf of $T$ are polynomials belonging to $\Pi$. Clearly, equivalent trees are associated with the same multiresolution space.
For $U \in L^2(\Omega)$, we shall denote by $U^T$ its approximation in $V(T)$. The SE expansion of $U^T$ has a general structure similar to Eq. (18.26), specifically
$$V(T) \ni U^T(x) = \sum_{n \in L(T)} \sum_{\alpha=1}^{P} u_\alpha^n\, \phi_\alpha^n(x), \qquad u_\alpha^n = \langle U, \phi_\alpha^n \rangle_\Omega. \qquad (18.29)$$
For the MW basis, one now needs $N$ sets of directional mother functions $\psi_\alpha^d$, each having an SE expansion over the two children of the root node,
$$\psi_\alpha^d = \sum_{\beta=1}^{P} \left[ a_{\alpha,\beta}^d\, \phi_\beta^{c^-(n_0)} + b_{\alpha,\beta}^d\, \phi_\beta^{c^+(n_0)} \right]. \qquad (18.30)$$
It is required that the directional MW mother functions are all orthogonal to any function of $\Pi$ (i.e., they have vanishing moments), or equivalently
$$\left\langle \psi_\alpha^d, \phi_\beta^{n_0} \right\rangle_\Omega = 0, \qquad 1 \le \alpha, \beta \le P \text{ and } d = 1, \dots, N. \qquad (18.31)$$
Once the $N$ sets of mother functions are determined, the affine mapping $M_n$ can be used to derive the MW functions of any node $n \in \hat N(T)$ through
$$\psi_\alpha^n(x) = \begin{cases} |S(n)|^{-1/2}\, \psi_\alpha^{d(n)}(M_n x), & x \in S(n), \\ 0, & \text{otherwise.} \end{cases} \qquad (18.33)$$
By construction, these functions satisfy
$$\operatorname{span}\{\phi^{c^-(n)}, \phi^{c^+(n)}\} = \operatorname{span}\{\phi^n\} \oplus \operatorname{span}\{\psi^n\}. \qquad (18.34)$$
Fig. 18.5 Directional mother MW functions $\psi_\alpha^{d=1}$, $\alpha = 1, \dots, 4$, for $N = 2$ and $No = 1$
As an illustration, the mother functions $\psi_\alpha^{d=1}$ are reported in Fig. 18.5 for the two-dimensional case ($N = 2$) and a polynomial space $\Pi$ consisting of the full tensorization of the degree-one polynomials, so $P = (No+1)^2 = 4$. The plots of the MW mother functions show the different types of singularity across the line $x_{d=1} = 1/2$ corresponding to the split into the two children.
Then, the approximation of $U \in L^2(\Omega)$ in $V(T)$ has the hierarchical expansion in the MW basis given by
$$V(T) \ni U^T(x) = \sum_{\alpha=1}^{P} u_\alpha^{n_0}\, \phi_\alpha^{n_0}(x) + \sum_{n \in \hat N(T)} \sum_{\alpha=1}^{P} \tilde u_\alpha^n\, \psi_\alpha^n(x), \qquad \tilde u_\alpha^n = \langle U, \psi_\alpha^n \rangle_\Omega. \qquad (18.35)$$
3.3 Restriction and Prediction Operators
In this section, we introduce two essential multiscale operators, the restriction and prediction operators, which are useful tools in the adaptive context. We start by introducing the notion of inclusion over trees. Let $T_1$ and $T_2$ be two binary trees. We say that $T_1 \subset T_2$ if $N(T_1) \subset N(T_2)$. The restriction operator $R_{\downarrow T_1}$ maps $U^{T_2} \in V(T_2)$ to $V(T_1)$; for the MW coefficients of a node $n \in \hat N(T_1)$, one simply has
$$\widetilde{\big(R_{\downarrow T_1} U^{T_2}\big)}{}_\alpha^{\,n} = \tilde u_\alpha^n. \qquad (18.37)$$
(The large tilde over the left-hand side stresses that we express a MW coefficient of the restriction of $U^{T_2}$ to $V(T_1)$.) It shows that the restriction of the approximation space, from $V(T_2)$ to $V(T_1)$, preserves the MW coefficients but reduces the set of nodes supporting these coefficients: $\hat N(T_1) \subset \hat N(T_2)$. The computation of the SE coefficients of the restriction is not as immediate. Assuming that the SE expansion of $U^{T_2}$ is known, we construct a sequence of trees $T^{(i)}$ such that $T_2 = T^{(0)} \supset \cdots \supset T^{(i)} \supset \cdots \supset T^{(l)} = T_1$, where two consecutive trees differ by one generation only, i.e., a leaf of $T^{(i+1)}$ is either a leaf or a node with leaf children in $T^{(i)}$. Therefore, the transition from $T^{(i)}$ to $T^{(i+1)}$ consists in removing pairs of sister leaves. The process is illustrated in the left part of Fig. 18.6 for the removal of a single pair of sister leaves.
Fig. 18.6 Schematic representation of the elementary restriction (left) and prediction (right) operators, through the removal and creation, respectively, of the (leaf) children of a node
Focusing on the removal of a (left-right ordered) pair of sister leaves $\{l^-, l^+\}$, the SE coefficients of the restriction of $U^{T^{(i)}}$ associated with the new leaf $l = p(l^-) = p(l^+) \in L(T^{(i+1)})$, split in direction $d(l)$, are
$$u_\alpha^l = \sum_{\beta=1}^{P} \left[ R_{\alpha,\beta}^{-,d(l)}\, u_\beta^{l^-} + R_{\alpha,\beta}^{+,d(l)}\, u_\beta^{l^+} \right], \qquad (18.38)$$
where, for all $d \in \{1, \dots, N\}$, the transition matrices $R^{\pm,d}$ of order $P$ have entries given by $R_{\alpha,\beta}^{\pm,d} = \big\langle \phi_\alpha^{n_0}, \phi_\beta^{c_d^\pm(n_0)} \big\rangle_\Omega$.
For the SE coefficients of the prediction $P_{\uparrow T_2} U^{T_1}$, we can again proceed iteratively, starting from the SE expansion over $T_1$ and using a series of increasing intermediate trees, differing by only one generation from one to the other. This time, the elementary operation consists in adding to some node $n$, being a leaf of the current tree, children in a prescribed direction $d(n)$. The process is illustrated in the right part of Fig. 18.6. The SE coefficients associated with the new leaves of a node $n$ are given by
$$u_\alpha^{c^-(n)} = \sum_{\beta=1}^{P} R_{\beta,\alpha}^{-,d(n)}\, u_\beta^n, \qquad u_\alpha^{c^+(n)} = \sum_{\beta=1}^{P} R_{\beta,\alpha}^{+,d(n)}\, u_\beta^n, \qquad (18.40)$$
with the same transition coefficients as those used in (18.38). For two trees $T_1 \subset T_2$, we observe that $R_{\downarrow T_1} P_{\uparrow T_2} = I_{T_1}$, while in general $P_{\uparrow T_2} R_{\downarrow T_1} \ne I_{T_2}$ ($I$ denoting the identity).
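In one dimension, the transition matrices can be tabulated once and for all by quadrature. The following sketch (Python; all names are ours) computes $R^{\pm}$ for the orthonormal Legendre basis and verifies the identity $R_{\downarrow} P_{\uparrow} = I$ as well as the exactness of the elementary restriction (18.38) for a quadratic function.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

No = 3
gq, gw = leggauss(16)

def L(a, x):
    """Legendre polynomial of degree a, orthonormal on the parent support [0, 1]."""
    return np.sqrt(2 * a + 1) * Legendre.basis(a, domain=[0.0, 1.0])(x)

def transition(child):
    """R^{child} with entries <phi_parent_a, phi_child_b>, child in {-1, +1}."""
    lo, hi, shift = (0.0, 0.5, 0) if child < 0 else (0.5, 1.0, 1)
    xs = 0.5 * (hi - lo) * (gq + 1) + lo
    ws = 0.5 * (hi - lo) * gw
    return np.array([[np.sum(ws * L(a, xs) * np.sqrt(2.0) * L(b, 2 * xs - shift))
                      for b in range(No + 1)] for a in range(No + 1)])

Rm, Rp = transition(-1), transition(+1)

def child_coeffs(lo, hi, shift):
    """SE coefficients of U(x) = x^2 on one child, by exact quadrature."""
    xs = 0.5 * (hi - lo) * (gq + 1) + lo
    ws = 0.5 * (hi - lo) * gw
    return np.array([np.sum(ws * xs**2 * np.sqrt(2.0) * L(b, 2 * xs - shift))
                     for b in range(No + 1)])

# restriction of the two-cell representation of x^2 to the parent, Eq. (18.38)
u_parent = Rm @ child_coeffs(0.0, 0.5, 0) + Rp @ child_coeffs(0.5, 1.0, 1)
```

Because the rows of $[R^- \,|\, R^+]$ are the coordinates of an orthonormal family in an orthonormal basis, the prediction-then-restriction identity reduces to $R^- (R^-)^T + R^+ (R^+)^T = I$.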
4 Adaptivity
In this section, we detail the essential adaptivity tools needed to control the local stochastic resolution, with the objective of efficiently reducing the complexity of the computations. There are two essential ingredients: the coarsening and enrichment procedures. Below we detail the criteria needed to perform these two operations. These thresholding and enrichment criteria were initially proposed in [11].
Recall that, for $n \in N(T)$, $|n|$ is the distance of $n$ from the root node $n_0$, so the measure of its support is $|S(n)| := 2^{-|n|}$. We shall also need the measure of $S(n)$ in a specific direction $d \in \{1, \dots, N\}$, that is, $|S(n)|_d := x_{n,d}^+ - x_{n,d}^-$, its diameter $\operatorname{diam}(S(n)) := \max_d |S(n)|_d$, and its volume in all directions except $d$, $|S(n)|_{\sim d} := |S(n)| / |S(n)|_d$.
4.1 Coarsening
Let $T$ be a binary tree and let $U^T \in V(T)$. The coarsening procedure aims at constructing a subtree $T^- \subset T$ (or, equivalently, a stochastic approximation subspace $V(T^-) \subset V(T)$) through a thresholding of the MW expansion coefficients of $U^T$. Given a threshold $\eta > 0$, we collect in the set $\mathcal{D}(\eta, Nr)$ the nodes with negligible details, in the sense that
$$n \in \mathcal{D}(\eta, Nr) \iff \|\tilde u^n\|_{\ell_2} \le 2^{-|n|/2}\, (N\,Nr)^{-1/2}\, \eta, \qquad (18.41)$$
where $\tilde u^n := (\tilde u_\alpha^n)_{\alpha \le P}$ and $\|\tilde u^n\|_{\ell_2}^2 = \sum_{\alpha \le P} (\tilde u_\alpha^n)^2$. The motivation for (18.41) is that, letting $\hat U^T$ be the thresholded version of $U^T$ obtained by omitting in the second sum of (18.35) the nodes $n \in \mathcal{D}(\eta, Nr)$, there holds
$$\|\hat U^T - U^T\|_{L^2(\Omega)}^2 = \sum_{n \in \mathcal{D}(\eta, Nr)} \|\tilde u^n\|_{\ell_2}^2 \le \sum_{n \in \mathcal{D}(\eta, Nr)} 2^{-|n|}\, (N\,Nr)^{-1}\, \eta^2 \le \eta^2, \qquad (18.42)$$
since
$$\sum_{n \in \mathcal{D}(\eta, Nr)} 2^{-|n|} = \sum_{j=0}^{N\,Nr-1} \#\{n \in \mathcal{D}(\eta, Nr);\ |n| = j\}\, 2^{-j} \le \sum_{j=0}^{N\,Nr-1} 1 = N\,Nr.$$
Therefore, using the definition in (18.41) to threshold the MW expansion of a function in $V(T)$, we can guarantee an $L^2$ error less than $\eta$.
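The error guarantee can be checked on synthetic data. The sketch below (Python; the decaying random details are our own construction, not taken from the chapter) applies criterion (18.41) on a complete one-dimensional tree and verifies that the discarded energy stays below $\eta^2$, as stated by (18.42).

```python
import numpy as np

rng = np.random.default_rng(0)
N, Nr, No, eta = 1, 4, 2, 0.1

# synthetic MW details: node (depth j, position l) -> coefficient vector,
# with a decay in the depth j mimicking a smooth function
details = {(j, l): rng.normal(scale=2.0 ** (-2 * j), size=No + 1)
           for j in range(Nr) for l in range(2**j)}

def discard(node, u):
    """Criterion (18.41): the detail is negligible if its l2 norm is below the
    depth-dependent threshold 2^{-|n|/2} (N Nr)^{-1/2} eta."""
    j, _ = node
    return np.linalg.norm(u) <= 2.0 ** (-j / 2) * (N * Nr) ** (-0.5) * eta

dropped = {n: u for n, u in details.items() if discard(n, u)}
# L^2 error of the thresholded expansion, as in Eq. (18.42)
err2 = sum(np.sum(u**2) for u in dropped.values())
```

Since at most $2^j$ nodes exist at depth $j$, each contributing at most $2^{-j}(N\,Nr)^{-1}\eta^2$, the total discarded energy can never exceed $\eta^2$, regardless of the random draw.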
Fig. 18.7 Illustration of the elementary operation generating equivalent trees: the pattern of a node with its children divided along the same direction (left) is replaced by the same pattern with an exchange of the partition directions (right), plus the corresponding permutation of the descendants of the children
4.2 Enrichment
Let $T$ be a binary tree and let $U^T \in V(T)$. The purpose of the enrichment is to increase the dimension of $V(T)$ by adding descendants to some of its leaves. Enrichment of the stochastic space is required to adaptively construct approximations of particular quantities. Another typical situation where enrichment is necessary is in dynamical problems, with the possible emergence in time of new features in the stochastic solution, such as shocks, that require more resolution. Classically, the enrichment of a tree $T$ is restricted to at most one partition along each dimension in dynamical problems. For steady problems, the enrichment procedure can simply be applied multiple times to end up with sufficiently refined approximations.
The simplest enrichment procedure consists in systematically partitioning all the leaves $l \in L(T)$ once along every direction $d \in \{1, \dots, N\}$, provided $|S(l)|_d > 2^{-Nr}$. This procedure generates a tree $T^+$ that typically has $2^N \operatorname{card}(L(T))$ leaves, which is only practical when $N$ is small.
More economical strategies are based on the analysis of the MW coefficients in $U^T$ to decide which leaves of $T$ need be partitioned and along which direction (see, for instance, [4, 5]). We derive below two directional enrichment criteria in the context of $N$-dimensional binary trees. In one dimension, the magnitude of the MW coefficients of a node $n$ can be bounded as
$$\|\tilde u^n\|_{\ell_2} \le C\, 2^{-|n|(No+1)}\, \|U\|_{H^{No+1}(S(n))}, \qquad (18.43)$$
where $H^{No+1}(S(n))$ is the usual Sobolev space of order $No+1$ on $S(n)$. Recalling that $|S(n)| = 2^{-|n|}$, the bound (18.43) shows that the norm of the MW coefficients decays roughly as $O(2^{-|n|(No+1)})$ for smooth $U$. Therefore, the norm of the (unknown) MW coefficients of a leaf $l \in L(T^{1D})$ can be estimated from the norm of the (known) MW coefficients of its parent, and the leaf $l$ is partitioned whenever
$$\|\tilde u^{p(l)}\|_{\ell_2} \ge 2^{No+1}\, 2^{-|l|/2}\, Nr^{-1/2}\, \eta \quad\text{and}\quad |S(l)| > 2^{-Nr}. \qquad (18.44)$$
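Criterion (18.44) amounts to a simple predicate per leaf. A minimal transcription (Python; function and argument names are ours), which also enforces the resolution cap, reads:

```python
def refine_leaf(parent_detail_norm, leaf_depth, No=2, Nr=6, eta=1e-3):
    """One-dimensional enrichment test of Eq. (18.44): estimate the unseen leaf
    details from the parent's, via the O(2^{-|n|(No+1)}) decay of (18.43), and
    refuse to split leaves already at the maximum resolution |S(l)| = 2^{-Nr}."""
    if 2.0 ** (-leaf_depth) <= 2.0 ** (-Nr):
        return False
    threshold = 2.0 ** (No + 1) * 2.0 ** (-leaf_depth / 2) * Nr ** (-0.5) * eta
    return parent_detail_norm >= threshold
```

A leaf with large parent details well above the threshold is split, a leaf with negligible parent details is kept, and a leaf at maximal depth is never split.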
Fig. 18.8 Illustration of the virtual sisters of a leaf $l$ of a tree $T$ whose partition is shown in the left plot. In the center plot, the leaf $l$ is hatched diagonally in blue, and its two virtual sisters for $d = 1$ and $d = 2$ (hatched horizontally and vertically, respectively) are leaves of $T$, both being $c^+(p_d(l))$. In the right plot, a different leaf $l$ is considered (still hatched diagonally in blue), with virtual sisters $s_d(l)$ which for $d = 1$ (hatched horizontally) is a node of $T$ but not a leaf and which for $d = 2$ (hatched vertically) is not a node of $T$
In the multidimensional setting, we denote by $p_d(l)$ the virtual parent of a leaf $l$ in direction $d$, whose support is obtained by merging $S(l)$ with the adjacent element of the same size along $d$; this adjacent element is the support of $s_d(l)$, the virtual sister of $l$ along direction $d$. Note that $p_d(l) \in N(T)$ only for $d = d(p(l))$; moreover, in general, $s_d(l) \notin N(T)$. These definitions are illustrated in Fig. 18.8, which shows for $N = 2$ the partition associated with a tree $T$ (left plot) and the virtual sisters of two leaves.
The SE coefficients of the virtual sisters,
$$u_\alpha^{s_d(l)} := \left\langle U^T, \phi_\alpha^{s_d(l)} \right\rangle_\Omega, \qquad \alpha = 1, \dots, P, \qquad (18.45)$$
are efficiently computed by exploiting the binary structure of $T$ and relying on the elementary restriction and prediction operators defined in Sect. 3.3. Without going into too many details, let us mention that the computation of the SE coefficients of $s_d(l)$ amounts to (i) finding the subset of leaves in $L(T)$ whose supports overlap with $S(s_d(l))$, (ii) constructing the subtree having for leaves this subset, and (iii) restricting the solution over this subtree up to $s_d(l)$. In practice, one can reuse the restriction operator defined in Sect. 3.3 to compute the details in the $\{\psi_\alpha^{n,d}\}_{\alpha=1,\dots,P}$ basis for a chosen direction $d$.
We now return to the design of a multidimensional enrichment criterion. Assuming an underlying isotropic polynomial space $\Pi$, with degree $No$ in all directions, a natural extension of (18.44) is that a leaf $l$ is partitioned in the direction $d$ if
$$\left( \frac{\operatorname{diam}(S(p_d(l)))}{\operatorname{diam}(S(l))} \right)^{-(No+1)} \|\tilde u^{p_d(l)}\|_{\ell_2} \ge 2^{-|l|/2}\, (N\,Nr)^{-1/2}\, \eta \quad\text{and}\quad |S(l)|_d > 2^{-Nr}. \qquad (18.46)$$
We recall that $\operatorname{diam}(S(n))$ is the diameter of the support of the considered node. The criterion in (18.46) is motivated by the following multidimensional extension of the bound (18.43) for the magnitude of the MW coefficients $\tilde u_\alpha^n$ in the direction $d$ for a generic node $n$:
$$|\tilde u_\alpha^n| = \inf_{Q \in \Pi} \left| \left\langle (U - Q), \psi_\alpha^{n,d} \right\rangle_\Omega \right| \le C\, \operatorname{diam}(S(n))^{No+1}\, \|U\|_{H^{No+1}(S(n))}. \qquad (18.47)$$
To better detect directional features, we also introduce the directional details of a node $n$ in direction $d$, defined through one-dimensional wavelets:
$$\bar u_\alpha^{n,d} := \left\langle U, \bar\psi_\alpha^{n,d} \right\rangle_\Omega, \qquad \bar\psi_\alpha^{n,d}(x) = \begin{cases} |S(n)|^{-1/2}\, \psi_\alpha\!\left( \dfrac{x_d - x_{n,d}^-}{x_{n,d}^+ - x_{n,d}^-} \right), & x \in S(n), \\ 0, & \text{otherwise,} \end{cases} \qquad (18.48)$$
where $\{\psi_\alpha\}_{\alpha \in \{1, \dots, No+1\}}$ is the set of 1D wavelet functions defined on $[0, 1]$. The vector of coefficients $\bar u^{n,d}$ measures details in $U$ at the scale $|S(n)|_d$, in direction $d$ only, by averaging out any variability in $U$ along the other directions.
For a direction $d \in \{1, \dots, N\}$, let $\sim d$ denote all the directions except $d$. Because by construction $\|\bar\psi_\alpha^{n,d}\| = 1$, it follows that
$$\begin{aligned} |\bar u_\alpha^{n,d}| &= \inf_{Q \in \Pi_{No}(U)} \left| \left\langle U - Q, \bar\psi_\alpha^{n,d} \right\rangle \right| = \inf_{Q \in \Pi_{No}(U)} \left| \int_{S(n)} \big( U(x_d, x_{\sim d}) - Q(x_d) \big)\, \bar\psi_\alpha^{n,d}(x_d)\, dx \right| \\ &= \inf_{Q \in \Pi_{No}(U)} \left| |S(n)|_{\sim d} \int_{S_d(n)} \big( \bar U_d^n(x_d) - Q(x_d) \big)\, \bar\psi_\alpha^{n,d}(x_d)\, dx_d \right| \\ &\le C\, |S(n)|_{\sim d}\, |S(n)|_d^{No+1}\, \|\bar U_d^n\|_{H^{No+1}(S_d(n))}\, \|\bar\psi_\alpha^{n,d}\|_{L^2(S_d(n))} \\ &= C\, |S(n)|_{\sim d}^{1/2}\, |S(n)|_d^{No+1}\, \|\bar U_d^n\|_{H^{No+1}(S_d(n))}, \end{aligned} \qquad (18.49)$$
where $\bar U_d^n(x_d) = |S(n)|_{\sim d}^{-1} \int_{S_{\sim d}(n)} U(x_d, x_{\sim d})\, dx_{\sim d}$ is the marginalization of $U(x)$ over the support $S(n)$ in all the directions $\sim d$. Furthermore (omitting the reference to the node $n$),
$$\begin{aligned} \|\bar U_d\|_{H^{No+1}(S_d)}^2 &= \int_{S_d} \left( \frac{\partial^{No+1}}{\partial x_d^{No+1}} \frac{1}{|S|_{\sim d}} \int_{S_{\sim d}} U(x_d, x_{\sim d})\, dx_{\sim d} \right)^2 dx_d \\ &= \frac{1}{|S|_{\sim d}^2} \int_{S_d} \left( \int_{S_{\sim d}} \frac{\partial^{No+1} U}{\partial x_d^{No+1}}(x_d, x_{\sim d})\, dx_{\sim d} \right)^2 dx_d \le |S|_{\sim d}^{-1} \int_{S} \left( \frac{\partial^{No+1} U}{\partial x_d^{No+1}} \right)^2 dx, \end{aligned}$$
whence we deduce
$$\|\bar U_d\|_{H^{No+1}(S_d)} \le |S|_{\sim d}^{-1/2}\, \|U\|_{L^2(S_{\sim d};\, H^{No+1}(S_d))}, \qquad (18.50)$$
with the anisotropic Sobolev norm $\|U\|_{L^2(S_{\sim d};\, H^{No+1}(S_d))}^2 = \int_{S_{\sim d}} \|U(\cdot, x_{\sim d})\|_{H^{No+1}(S_d)}^2\, dx_{\sim d}$.
Combining (18.49) with (18.50), the estimate for the directional detail magnitude is now
$$|\bar u_\alpha^{n,d}| = \inf_{Q(x_d) \in \Pi_{No}(U)} \left| \left\langle (U - Q), \bar\psi_\alpha^{n,d} \right\rangle \right| \le C\, |S(n)|_d^{No+1}\, \|U\|_{L^2(S_{\sim d}(n);\, H^{No+1}(S_d(n)))}. \qquad (18.51)$$
This estimate suggests the following directional enrichment criterion: a leaf $l$ is partitioned in direction $d$ if
$$\|\bar u^{p_d(l)}\|_{\ell_2} \ge 2^{No+1}\, 2^{-|l|/2}\, (N\,Nr)^{-1/2}\, \eta \quad\text{and}\quad |S(l)|_d > 2^{-Nr}. \qquad (18.52)$$
The detail norm associated with the basis $\{\bar\psi_\alpha^{n,d}\}_\alpha$ can be obtained explicitly from the vector of MW coefficients $\tilde u^n$ by averaging it in all but the $d$th direction.
5 Illustrations
As a first illustration, we consider a nonlinear second-order ordinary differential equation,
$$\frac{d^2 U}{dt^2} + f\, \frac{dU}{dt} = F(U), \qquad F(U) = -\frac{35}{2} U^3 + \frac{15}{2} U, \qquad (18.53)$$
whose long-time solution depends on the initial position and so is uncertain. However, the mapping $\xi \mapsto U(t \to \infty, \xi)$ is not necessarily smooth; indeed, $F(u) = 0$ has two stable roots $r_1 = -r_2 = \sqrt{15/35}$, so the system has two stable steady points. For simplicity, we shall consider the uncertain initial condition $U^0(\xi) = 0.05 + 0.2\,\xi$, where $\xi$ has a uniform distribution in $[-1, 1]$; the multiresolution scheme detailed above can then be applied with $x = (\xi + 1)/2$.
The problem is solved relying on a stochastic Galerkin projection method [15, 21], seeking an expansion of $U(t, \xi)$ according to
$$U(t, \xi) = \sum_\alpha u_\alpha(t)\, \Phi_\alpha(\xi). \qquad (18.55)$$
Fig. 18.9 Time evolution of the Galerkin solution of (18.53) discretized in the spectral space $V_{No}^{(0)}$ (scaled Legendre polynomials) with $No = 3$, 5, and 9 (from left to right)
Inserting the expansion into the model equation and requiring the orthogonality of the residual with respect to the stochastic approximation space, one obtains the governing equations for the expansion coefficients; specifically, this results in
$$\frac{d u_\alpha}{dt} = \left\langle F\!\left( \sum_\beta u_\beta\, \Phi_\beta \right), \Phi_\alpha \right\rangle. \qquad (18.56)$$
Upon truncation of the expansion, one has to solve a set of coupled nonlinear ODEs for the expansion coefficients. The initial conditions for these expansion coefficients are also obtained by projecting the uncertain initial data, yielding $u_\alpha(t=0) = \left\langle U^0, \Phi_\alpha \right\rangle$.
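A compact Galerkin solver in $V^{(0)}_{No}$ can be sketched as follows (Python; for brevity we integrate the first-order gradient-flow form $dU/dt = F(U)$ rather than the full second-order equation (18.53), and all names are ours). Evaluating the expansion at Gauss quadrature nodes, applying $F$, and projecting back implements the right-hand side of (18.56) exactly for this cubic nonlinearity.

```python
import numpy as np
from numpy.polynomial.legendre import legvander, leggauss

No = 9                                   # polynomial degree of V^(0)_No
xi_q, w_q = leggauss(2 * No + 2)
w_q = 0.5 * w_q                          # uniform density 1/2 on [-1, 1]
# orthonormal Legendre basis Phi_alpha evaluated at the quadrature nodes
Phi = legvander(xi_q, No) * np.sqrt(2 * np.arange(No + 1) + 1)

def F(u):
    return 7.5 * u - 17.5 * u**3         # F(U) = (15/2) U - (35/2) U^3

def rhs(u_hat):
    """Galerkin right-hand side of (18.56): evaluate U, apply F, project back."""
    return Phi.T @ (w_q * F(Phi @ u_hat))

def integrate(u_hat, t_end, dt=1e-3):
    for _ in range(int(round(t_end / dt))):   # classical RK4 time stepping
        k1 = rhs(u_hat)
        k2 = rhs(u_hat + 0.5 * dt * k1)
        k3 = rhs(u_hat + 0.5 * dt * k2)
        k4 = rhs(u_hat + dt * k3)
        u_hat = u_hat + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u_hat

# uncertain initial condition U^0(xi) = 0.05 + 0.2 xi, projected on the basis
u_unc = integrate(Phi.T @ (w_q * (0.05 + 0.2 * xi_q)), 5.0)
# sanity check: a certain (deterministic) initial condition must reproduce the
# scalar dynamics and converge to the stable root sqrt(15/35)
u_det = integrate(Phi.T @ (w_q * np.full_like(xi_q, 0.3)), 5.0)
```

The deterministic run excites only the mean mode and converges to $\sqrt{15/35}$, which provides a check of the projection; the uncertain run reproduces the Gibbs-oscillation behavior discussed next.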
Figure 18.9 shows the time evolution of the solution from the uniformly distributed initial condition, for different one-dimensional multiresolution spaces $V_{No}^{(0)}$ as defined in Sect. 2.1. Computations for the polynomial orders $No = 3$, 5, and 9 are reported. It is seen that the smooth approximations at level 0, corresponding to spectral expansions with the functions $\Phi_\alpha$ being the scaled Legendre polynomials, are not able to properly represent the evolution and asymptotic behavior of $U$. Specifically, when increasing the polynomial order $No$, the computed asymptotic steady state is plagued with Gibbs oscillations.
In contrast, piecewise constant approximations in $V_0^{(Nr)}$ quickly converge with the resolution level $Nr$, as can be appreciated from the plots of Fig. 18.10, which depict the solutions for increasing resolution levels $Nr = 2$, 3, and 5. In particular, it is seen that the exact asymptotic steady solution is recovered for most values of $\xi$, except over the subinterval containing the discontinuity. When the resolution level increases, a finer localization of the discontinuity is achieved, resulting in a highly accurate approximation.
Note that, in the case of an approximation in $V_{No}^{(Nr)}$, one can decide to formulate the Galerkin problem in the SE or the MW basis. From the computational point of view, using one or the other basis can dramatically affect the complexity of the computations. Specifically, for the SE basis, the nonlinear system of $2^{Nr}(No+1)$ equations is in fact a set of $2^{Nr}$ uncoupled nonlinear systems of size $No+1$, because of the limited number of overlapping supports. In contrast, formulating the Galerkin problem in the MW basis maintains the nonlinear coupling between all the MW coefficients of the solution, requiring the resolution of a significantly larger problem. As a general rule, determining approximations in multiresolution spaces is more conveniently performed with SE expansions (i.e., computing local expansions over the set of leaves), while the multiresolution analysis deciding on enrichment and coarsening mostly uses the MW coefficients, as shown in Sect. 4. The multiscale operators and the tree representation provide convenient means to efficiently translate one set of coefficients into the other.
This example also illustrates the need for adaptivity tools allowing the tuning of the stochastic discretization effort, i.e., the resolution level, dynamically in time and locally in the stochastic space. For instance, it is seen that an essentially uniform resolution level is needed at early times while, asymptotically, fine resolution is needed only in the neighborhood of the discontinuity.
Fig. 18.10 Time evolution of the Galerkin solution of (18.53) discretized in the multiresolution space $V_0^{(Nr)}$, that is, piecewise constant approximations, with $Nr = 2$, 3, and 5 (from left to right)
In [11], a fully adaptive strategy of the stochastic discretization was proposed for the resolution of scalar conservation equations of the form
$$\frac{\partial U}{\partial t} + \frac{\partial}{\partial x} F(U; \xi) = 0, \qquad x \in [0, 1], \qquad (18.57)$$
with periodic boundary conditions (here $x$ denotes the spatial variable). This equation models the evolution of a normalized density of vehicles, $U$, on a road (a closed track, in this case); the traffic model corresponds to a flux function $F(U; \xi)$ involving a random parameter $\xi_5$ (see [11] for its expression). The uncertain initial condition has the form
$$U^{IC}(x, \xi) = U(\xi_1) - U(\xi_2)\, I_{[0.1, 0.3]}(x) + U(\xi_3)\, I_{[0.3, 0.5]}(x) - U(\xi_4)\, I_{[0.5, 0.7]}(x), \qquad (18.59)$$
where $I_{[a,b]}$ denotes the indicator function of the interval $[a, b]$ and the amplitudes $U(\xi_i)$ depend on independent uniform random variables $\xi_1, \dots, \xi_4$ (their expressions are also given in [11]).
The problem therefore has five stochastic dimensions ($N = 5$), and the selected polynomial space $\Pi$ is spanned by partially tensorized Legendre polynomials.
Fig. 18.11 Stochastic traffic equation: sample set of 20 realizations of the initial condition (left) and computed solution at $t = 0.4$ (middle) and $t = 0.9$ (right) (Adapted from [11])
Fig. 18.12 Space-time diagrams of the solution expectation (left) and standard deviation (right) (Adapted from [11])
Fig. 18.13 Space-time diagrams $(x, t) \in [0, 1] \times [0, 1]$ of the first-order sensitivity indices $S_1, \dots, S_5$ and of the contribution $1 - \sum_d S_d$ of the sensitivity indices of higher order (Adapted from [11])
Each cell of the spatial mesh supports its own binary tree for its stochastic solution space, which is adapted at each time step of the simulation by relying on the coarsening and refinement procedures detailed above.
The two enrichment criteria (multidimensional (18.46) and directional (18.52)) were tested for different values of $\eta$ and $No$ and a fixed maximal resolution $Nr = 6$. For fixed $\eta$ and $No$, the multidimensional criterion leads to more refined stochastic discretizations, but with only a marginal reduction of the approximation error (as measured by the stochastic approximation error $\varepsilon_{sd}$ defined in (18.61) below) compared to the directional criterion. This is illustrated in Fig. 18.15, which shows the time evolution of the total number of SE for the two enrichment criteria, different values of $\eta$, and $No = 3$. The rightmost plot shows the corresponding error $\varepsilon_{sd}$ as a function of the total number of SE at $t = 0.5$. Because the two enrichment criteria have similar computational complexity, the directional criterion (18.52) is generally preferred.
The dependence on space and time of the adapted trees can be appreciated from Fig. 18.16, which displays the averaged depths of the trees, measured as $\log_2 \operatorname{card}(L(T_i^n))$, in the computation with $\eta = 10^{-4}$ and $No = 3$. This plot shows the adaptation to the local stochastic smoothness; as expected, a finer stochastic discretization along the path of the shock waves is necessary, while a coarser discretization suffices in the expansion waves and in the regions where the solution is spatially constant. The right plot in Fig. 18.16 shows the time evolution of the total number of leaves in the stochastic discretization. We observe a monotonic increase in the number of leaves, with higher rates when additional wave interactions occur and, subsequently, with a roughly constant rate, since the stochastic shocks, which dominate the discretization need, affect a portion of the spatial domain growing linearly in time.
Fig. 18.14 Total sensitivity indices $T_1, \dots, T_5$ as a function of $x \in [0, 1]$ at $t = 0.4$ (left) and $t = 0.9$ (right) (Adapted from [11])
To quantify the quality of the adapted discretizations, we use the error measure
$$\varepsilon_{sd}^2 = \sum_{i=1}^{N_c} \Delta x \int \left( U_i^n(\xi) - U_{ex,i}^n(\xi) \right)^2 d\xi, \qquad (18.61)$$
where $U_{ex,i}^n$ denotes the exact stochastic semi-discrete solution and $N_c = 200$ is the number of spatial cells. In practice, the error is approximated by means of a Monte Carlo simulation from a uniform sampling of the range of $\xi$: for each element of the sample, the approximate and exact semi-discrete solutions are evaluated, and the squared differences are averaged.
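The Monte Carlo estimation of (18.61) can be sketched as follows (Python; the two semi-discrete solutions below are hypothetical stand-ins, chosen so that the exact error is known in closed form, namely $\varepsilon_{sd} = \sqrt{10^{-5}}$).

```python
import numpy as np

rng = np.random.default_rng(42)
Nc, M = 200, 20000            # number of spatial cells and of MC samples
dx = 1.0 / Nc

def U_approx(i, xi):          # stand-in approximate semi-discrete solution, cell i
    return np.sin(2 * np.pi * i * dx) * (1.0 + 0.1 * xi)

def U_exact(i, xi):           # stand-in exact semi-discrete solution, cell i
    return np.sin(2 * np.pi * i * dx) * (1.0 + 0.1 * xi + 0.01 * xi**2)

# uniform sampling of the stochastic parameter, as in Eq. (18.61)
xi = rng.uniform(-1.0, 1.0, size=M)
err2 = sum(dx * np.mean((U_approx(i, xi) - U_exact(i, xi)) ** 2)
           for i in range(Nc))
eps_sd = np.sqrt(err2)
```

Replacing the integral in $\xi$ by a sample mean converges at the usual $O(M^{-1/2})$ Monte Carlo rate, which is sufficient here since $\varepsilon_{sd}$ is only used to compare discretization strategies.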
Fig. 18.15 Comparison of the two enrichment criteria for $No = 3$ and different values of $\eta$ as indicated. Evolution in time of the total number of stochastic elements in the discretization for the multidimensional criterion (18.46) (left plot) and the directional criterion (18.52) (center plot). Right plot: corresponding error measures $\varepsilon_{sd}$ at $t = 0.5$ as a function of the total number of SE for the two enrichment criteria (Adapted from [11])
Fig. 18.16 Space-time diagrams of the averaged depth of the local trees in $\log_2$ scale (left) and evolution in time of the total number of stochastic elements (right) (Adapted from [11])
Fig. 18.17 Space-time diagrams (x, t) ∈ [0, 1] × [0, 1] of the averaged directional depths D1–D5 and of the aspect ratio r (Adapted from [11])
On the contrary, for highly resolved computations (lowest values of the threshold), high polynomial degrees achieve a more accurate approximation for a lower number of degrees of freedom. Such behavior is typical of multiresolution schemes; when numerical diffusion slightly smooths out the discontinuity, at high-enough resolution, high-degree approximations recover their effectiveness.
To complete this example on stochastic adaptation, computational efficiency is briefly discussed. The purpose here is limited to demonstrating that the overhead arising from adapting the stochastic discretization in space and time is limited. It is first recalled that when considering SE expansions, the determination of the solution is independent from one leaf to another. This characteristic offers opportunities for designing efficient parallel solvers, with different processors dealing with independent subsets of leaves. For the present example, for instance, the computation of the stochastic (Roe) flux over different leaves was performed in parallel, and the only remaining issue concerns load balancing in the case of complicated trees, particularly for trees evolving dynamically in time. Regarding the other parts of the adaptive procedure, namely, the enrichment and coarsening steps, it is important that they do not consume excessive computational resources; otherwise, the gain from adapting the approximation space would be lost. For the present example, numerical experiments demonstrate that the computational times of the enrichment and coarsening steps roughly scale with the number of leaves. This is shown in the plots of Fig. 18.19, which report the CPU times (in arbitrary units) for the advancement of the solution over a single time step as a function of the number of leaves, when using the discretization parameters No = 2, η = 10⁻³, and No = 3, η = 10⁻⁴, respectively. The global CPU times reported are in fact split into different contributions, including flux computation times and coarsening and enrichment
Fig. 18.18 Convergence of the semi-discrete error ε_sd at time tⁿ = 0.5 for different values of η ∈ [10⁻⁵, 10⁻²] and different polynomial degrees No ∈ {2, …, 5}. The left plot reports the error as a function of the total number of stochastic elements, while the right plot shows the error as a function of the total number of degrees of freedom in the stochastic approximation space (Adapted from [11])
computational times. These numerical experiments demonstrate that for the present problem, owing to the representation of the stochastic approximation spaces using binary tree structures, an asymptotically linear computational time in the number of leaves is achieved for the adaptation-specific steps of the computation. Further, in this example, the computational times dedicated to adaptivity management (the coarsening and enrichment procedures) are not significant compared to the rest of the computations.
6 Conclusions
This chapter has presented a multiresolution approach for propagating input uncertainties in a model output. The multiresolution framework is designed to handle situations where the uncertain model solution has a complex dependence on the parameters. These include steep variations and discontinuities, a situation that is extremely challenging for classical spectral approaches based on smooth global basis functions.
The key feature of the multiresolution spaces is their piecewise polynomial nature that can, with sufficient resolution, accommodate singularities of various kinds. However, this capability comes at the cost of a potentially significant increase in the dimensionality of approximation spaces, making it essential to consider effective adaptive strategies that are able to tune the local resolution level according to the local complexity of the model solution. We have shown that this can be efficiently achieved by relying on a suitable data structure, namely, binary trees. One significant advantage of the binary tree structure is that, not only does it scale reasonably with the dimensionality of the input parameter space, but it also facilitates the conversion of local expansions (in the SE basis) to detail expansions (in the MW basis), namely, through the recursive application of scale-independent operators. As a result, fast enrichment and coarsening procedures can be implemented without significantly slowing down the computation, thus taking full advantage of a stochastic discretization that is adapted to the solution
Fig. 18.19 Dependence of the CPU time (per time iteration) on the stochastic discretization measured by the total number of leaves; left: No = 2 and η = 10⁻³; right: No = 3 and η = 10⁻⁴. The contributions of the various steps of the adaptive algorithm are also shown (Adapted from [11])
needs. All these characteristics are crucial to make the problem tractable from the
efficiency perspective.
Multiresolution frameworks and algorithms have enabled the resolution of engineering uncertainty quantification problems that could not be solved with classical spectral approaches, such as global PC representations. These successes include applications to conservation law models, compressible flows, stiff chemical systems, and dynamical systems with uncertain parametric bifurcations. MRA schemes, however, are complex to implement, especially in high-dimensional problems. So far, applications have only considered a limited set of uncertain input variables. Another aspect preventing a wider spread of the MRA approach is the absence of available software libraries automating the adaptive refinement procedures. It is expected that such numerical tools will be made available in the coming years. Another area that could also result in wider adoption of MRA schemes concerns the application of the associated MW representations in a nonintrusive context. Recent advances in compressive sensing, sparse approximations, and low-rank approximation methodologies offer promising avenues toward the emergence of new, highly attractive capabilities.
Acknowledgements The authors are thankful to Dr. Alexandre Ern and Dr. Julie Tryoen for helpful discussions and for their contributions to the work presented in this chapter.
References
1. Chorin, A.J.: Gaussian fields and random flow. J. Fluid Mech. 63, 21–32 (1974)
2. Meecham, W.C., Jeng, D.T.: Use of the Wiener-Hermite expansion for nearly normal turbulence. J. Fluid Mech. 32, 225 (1968)
3. Le Maître, O., Knio, O., Najm, H., Ghanem, R.: Uncertainty propagation using Wiener-Haar expansions. J. Comput. Phys. 197(1), 28–57 (2004)
4. Le Maître, O.P., Najm, H.N., Ghanem, R.G., Knio, O.M.: Multi-resolution analysis of Wiener-type uncertainty propagation schemes. J. Comput. Phys. 197(2), 502–531 (2004)
5. Le Maître, O.P., Najm, H.N., Pébay, P.P., Ghanem, R.G., Knio, O.M.: Multi-resolution-analysis scheme for uncertainty quantification in chemical systems. SIAM J. Sci. Comput. 29(2), 864–889 (2007)
6. Le Maître, O., Knio, O.: Spectral Methods for Uncertainty Quantification. Scientific Computation. Springer, Dordrecht/New York (2010)
7. Gorodetsky, A., Marzouk, Y.: Efficient localization of discontinuities in complex computational simulations. SIAM J. Sci. Comput. 36, A2584–A2610 (2014)
8. Beran, P.S., Pettit, C.L., Millman, D.R.: Uncertainty quantification of limit-cycle oscillations. J. Comput. Phys. 217, 217–247 (2006)
9. Pettit, C.L., Beran, P.S.: Spectral and multiresolution Wiener expansions of oscillatory stochastic processes. J. Sound Vib. 294, 752–779 (2006)
10. Tryoen, J., Le Maître, O., Ndjinga, M., Ern, A.: Multi-resolution analysis and upwinding for uncertain nonlinear hyperbolic systems. J. Comput. Phys. 228, 6485–6511 (2010)
11. Tryoen, J., Le Maître, O., Ern, A.: Adaptive anisotropic spectral stochastic methods for uncertain scalar conservation laws. SIAM J. Sci. Comput. 34, 2459–2481 (2012)
12. Ren, X., Wu, W., Xanthis, L.S.: A dynamically adaptive wavelet approach to stochastic computations based on polynomial chaos capturing all scales of random modes on independent grids. J. Comput. Phys. 230, 7332–7346 (2011)
13. Sahai, T., Pasini, J.M.: Uncertainty quantification in hybrid dynamical systems. J. Comput. Phys. 237, 411–427 (2013)
14. Pettersson, P., Iaccarino, G., Nordström, J.: A stochastic Galerkin method for the Euler equations with Roe variable transformation. J. Comput. Phys. 257, 481–500 (2014)
15. Ghanem, R., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Dover, Mineola (2003)
16. Alpert, B.K.: A class of bases in L² for the sparse representation of integral operators. SIAM J. Math. Anal. 24, 246–262 (1993)
17. Strang, G.: Introduction to Applied Mathematics. Wellesley-Cambridge Press, Wellesley (1986)
18. Cohen, A., Müller, S., Postel, M., Kaber, S.: Fully adaptive multiresolution schemes for conservation laws. Math. Comput. 72, 183–225 (2002)
19. Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet techniques in numerical simulation. In: Stein, E., de Borst, R., Hughes, T.J.R. (eds.) Encyclopedia of Computational Mechanics, vol. 1, pp. 157–197. Wiley, Chichester (2004)
20. Harten, A.: Multiresolution algorithms for the numerical solution of hyperbolic conservation laws. Commun. Pure Appl. Math. 48(12), 1305–1342 (1995)
21. Le Maître, O.P., Knio, O.M.: Spectral Methods for Uncertainty Quantification. Springer, Dordrecht/New York (2010)
22. Tryoen, J., Le Maître, O., Ndjinga, M., Ern, A.: Intrusive projection methods with upwinding for uncertain nonlinear hyperbolic systems. J. Comput. Phys. 228(18), 6485–6511 (2010)
23. Tryoen, J., Le Maître, O., Ndjinga, M., Ern, A.: Roe solver with entropy corrector for uncertain nonlinear hyperbolic systems. J. Comput. Appl. Math. 235(2), 491–506 (2010)
24. Crestaux, T., Le Maître, O.P., Martinez, J.M.: Polynomial chaos expansion for sensitivity analysis. Reliab. Eng. Syst. Saf. 94(7), 1161–1172 (2009)
19 Surrogate Models for Uncertainty Propagation and Sensitivity Analysis
Khachik Sargsyan
Abstract
For computationally intensive tasks such as design optimization, global sensitivity analysis, or parameter estimation, a model of interest needs to be evaluated multiple times, exploring potential parameter ranges or design conditions. If a single simulation of the computational model is expensive, it is common to employ a precomputed surrogate approximation instead. The construction of an appropriate surrogate still requires a number of training evaluations of the original model. Typically, more function evaluations lead to more accurate surrogates, and therefore a careful accuracy-versus-efficiency tradeoff needs to take place for a given computational task. This chapter specifically focuses on polynomial chaos surrogates that are well suited for forward uncertainty propagation tasks, discusses a few construction mechanisms for such surrogates, and demonstrates the computational gain on select test functions.
Keywords
Bayesian inference · Global sensitivity analysis · Polynomial chaos · Regression · Surrogate modeling
Contents
1 Introduction
2 Surrogate Modeling for Forward Propagation
3 Polynomial Chaos Surrogate
3.1 Input PC Specification
3.2 PC Surrogate Construction
3.3 Surrogate Construction Challenges
3.4 Moment Evaluation
3.5 Global Sensitivity Analysis
4 Conclusions
Appendix
References

K. Sargsyan
Reacting Flow Research Department, Sandia National Laboratories, Livermore, CA, USA
e-mail: ksargsy@sandia.gov
1 Introduction
Over the last decade, improved computing capabilities have enabled computationally intensive model studies that seemed infeasible before. In particular, models nowadays are being explored in a wide range of conditions, targeting, e.g., optimal input parameter settings or predictions with uncertainties that correspond to the variability of input conditions. In turn, the algorithmic improvements of such inverse and forward modeling studies have pushed the boundaries of state-of-the-art computational resources whenever a single simulation of the complex model at hand is expensive enough. In this regard, fast-to-evaluate surrogate models serve to alleviate the computational expense of having to simulate a complex model prohibitively many times. In various applications, while leading to major computational efficiency improvements, such inexpensive approximations of full models across a range of conditions have also been called response surface models [25, 55], metamodels [39, 78], fully equivalent operational models (FEOM) [38, 49], or emulators [5, 8]. Such synthetic, surrogate approximations have been utilized in various computationally intensive studies, such as design optimization [48], parameter estimation [41], global sensitivity analysis [62], uncertainty propagation [25], and reliability analysis [70]. It is useful to differentiate surrogate models, as purely functional approximations, from, say, reduced-order models that tend to preserve certain physical or application-specific mechanisms. Furthermore, parametric surrogates have a predefined form, thus casting surrogate construction as a parameter estimation problem. Polynomial chaos (PC) surrogates, due to the orthogonality of the underlying polynomial bases, offer closed-form formulae for the output moment and sensitivity computations. They also allow orthogonal projection formulae with quadrature integration for PC coefficient estimation. However, the projection may not scale well into high-dimensional settings and is generally inaccurate if the function evaluations are noisy. Regression methods, in particular Bayesian regression, suffer from these difficulties in a much more controllable fashion. Besides reviewing the PC regression approach in general and in a Bayesian setting, the focus of this chapter is on two key forward uncertainty quantification tasks, uncertainty propagation and sensitivity analysis, and the assessment of how surrogate models help accelerate such studies. The proof-of-concept demonstrations will be based on synthetic models that are not expensive to simulate but will help demonstrate the efficiency gains in terms of the number of times the expensive model is evaluated.
In this chapter, general principles of surrogate modeling are presented, with specific focus on PC surrogates that offer convenient means for forward uncertainty propagation and sensitivity analysis. Various methods of constructing PC surrogates are presented, with an emphasis on Bayesian regression methods that are well positioned to work with noisy, sparse function evaluations and provide efficient tools for uncertain model prediction, as well as model comparison and selection. The chapter will also highlight key challenges of surrogate modeling, such as model selection, high dimensionality, and nonsmoothness, and review the available methods for tackling these challenges.
Parametric, where the surrogate has a predefined form with a set of parameters that need to be determined. A parametric surrogate is convenient since one simply needs to store a vector of parameters to fully describe it. However, it typically entails an underlying assumption about some features of the function, e.g., smoothness. Most often, parametric surrogates are also equipped with a mechanism to increase the resolution or accuracy within the same functional form, e.g., the polynomial order in polynomial chaos (PC) surrogates [44] or Padé approximations [12], or the resolution level for wavelet-based approximations [35].
Nonparametric, where underlying assumptions help derive rules to construct the surrogate for any given λ, without a predefined functional form. Thus, nonparametric surrogates often provide more flexibility, albeit at an extra computational cost associated with their construction and evaluation. It is worth noting that nonparametric surrogates also rely on some structural assumptions about the function and may involve a controlling parameter set, such as the correlation length in Gaussian process surrogates [51], the regularization parameter in radial basis function constructions [45], or knot locations for spline-based approximations [64].
Moment evaluation: This work will specifically focus on the first two moments and demonstrate the computation of the model mean E[f(λ)] and model variance V[f(λ)] with surrogates.
Sensitivity analysis: This forward propagation task is otherwise called global sensitivity analysis or variance-based decomposition, which helps attribute the output variance to the input dimensions, thus enabling parameter importance ranking and dimensionality reduction studies. The main effect sensitivity of dimension i refers to the fractional contribution of the i-th parameter toward the total variance via
$$ S_i = \frac{V_i\left( E_{-i}\left[ f(\lambda) \,|\, \lambda_i \right] \right)}{V\left[ f(\lambda) \right]}, $$
where $V_i$ and $E_{-i}$ refer to the variance with respect to the i-th parameter and the mean with respect to the rest of the parameters, respectively.
In lieu of a surrogate, Monte Carlo (MC) methods are often employed for such uncertainty propagation tasks. Further in this chapter, such MC estimators for moments and sensitivities, used for comparison, will be described. With the underlying premise of a computationally expensive forward model f(λ), the measure of success of the surrogate construction will be the estimation of moments and sensitivity indices with a smaller number of training samples to achieve an accuracy comparable to the MC method.
Polynomial chaos (PC) expansions are chosen here for the illustration of the surrogate-based acceleration of computationally intensive studies. In fact, they offer additional convenience for the selected uncertainty propagation tasks since the first two moments and the sensitivity indices are analytically computable from the PC surrogate [14, 69]. Generally, if analytical expressions are not available, assuming surrogate evaluation is much cheaper than the model evaluation itself, one can employ MC estimates for moments and sensitivities of the surrogate with a much larger number of samples, otherwise infeasible with the original function f(λ).
that converge in an L² sense, provided that the moment problem for each ξ_m is uniquely solvable. The latter condition means that the moments of ξ_m uniquely determine the probability density function of ξ_m. This is a key condition and a special case of more general results developed in Ernst et al. [16]. The polynomials in the expansion (19.1) are constructed to be orthogonal with respect to the PDF of ξ, π(ξ), i.e.,
$$ \langle \Psi_k \Psi_j \rangle = \int \Psi_k(\xi)\, \Psi_j(\xi)\, \pi(\xi)\, d\xi = \delta_{kj}\, \|\Psi_k\|^2, \qquad (19.2) $$
where $\delta_{kj}$ is the Kronecker delta, and the polynomial norms are defined as
$$ \|\Psi_k\| = \left( \int \Psi_k^2(\xi)\, \pi(\xi)\, d\xi \right)^{1/2}. \qquad (19.3) $$
$$ \lambda_i = \sum_{k=0}^{K_{in}-1} \lambda_{ik}\, \Psi_k(\xi) \qquad (19.5) $$
Frequently, only the mean and standard deviation, or bounds, of λ_i are reported in the literature or extracted by expert elicitation. In such cases, maximum entropy considerations lead to Gaussian and uniform random inputs, respectively [29]. The former case corresponds to a univariate, first-order Gauss-Hermite expansion
$$ \lambda_i = \frac{a_i + b_i}{2} + \frac{b_i - a_i}{2}\, \xi_i, \qquad (19.7) $$
where ξ_i are i.i.d. uniform random variables on [−1, 1]. The classical task of forward propagation of uncertainties for a black-box model y = f(λ) is then to find the coefficients of the PC representation of the output y:
$$ y \simeq \sum_{k=0}^{K-1} c_k\, \Psi_k(\xi). \qquad (19.8) $$
The implicit relationship between λ and y encoded in Eqs. (19.5) and (19.8) serves as a surrogate for the function f(λ). While the general PC forms (19.5) and (19.6) should be used for uncertainty propagation from the input λ to the output f(λ), the PC surrogate in particular is constructed within the same framework using a first-order input (19.7) for all parameters. In such cases, the surrogate is a polynomial function with respect to scaled inputs,
$$ y = f(\lambda) \simeq g_c(\lambda) = \sum_{k=0}^{K-1} c_k\, \Psi_k(\xi(\lambda)), \qquad (19.9) $$
where ξ(λ) is the simple scaling relationship, in this case derived from the Legendre-Uniform linear PC in Eq. (19.7), i.e., ξ_i = (2λ_i − a_i − b_i)/(b_i − a_i) for i = 1, …, d. Similar to the general input PC case (19.5), one typically truncates the polynomial expansion (19.9) according to a total degree p, such that the number of terms is K = (p + d)!/(p!\,d!).
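As a quick check of this count (an illustrative snippet, not from the chapter), the number of total-degree-p terms in d dimensions equals the binomial coefficient C(p + d, d):

```python
from math import comb

def num_pc_terms(p, d):
    """Number of multivariate PC basis terms of total degree <= p in
    d dimensions: K = (p + d)! / (p! d!) = C(p + d, d)."""
    return comb(p + d, d)

print(num_pc_terms(3, 2))  # 10
print(num_pc_terms(5, 3))  # 56: the count grows rapidly with p and d
```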
Projection relies on the orthogonality of the basis functions, enabling the formula
$$ c_k^{Proj} = \frac{1}{\|\Psi_k\|^2} \int f(\lambda(\xi))\, \Psi_k(\xi)\, \pi(\xi)\, d\xi, \qquad (19.10) $$
and minimizes the L²-distance between the function f and its surrogate:
$$ c^{Proj} = \arg\min_{c} \int \left( f(\lambda(\xi)) - \sum_{k=0}^{K-1} c_k\, \Psi_k(\xi) \right)^2 \pi(\xi)\, d\xi. \qquad (19.11) $$
In practice, the projection integral (19.10) is evaluated by quadrature,
$$ c_k^{Proj} \approx \frac{1}{\|\Psi_k\|^2} \sum_{q=1}^{Q} f(\lambda(\xi^{(q)}))\, \Psi_k(\xi^{(q)})\, w_q, \qquad (19.12) $$
where the quadrature point-weight pairs $\{(\xi^{(q)}, w_q)\}_{q=1}^{Q}$ are usually chosen such that the integration of the highest-degree polynomial coefficient is sufficiently accurate. For high-dimensional problems, one can use sparse quadrature [4, 19, 23, 76] in order to reduce the number of required function evaluations for a given level of integration accuracy.
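In one dimension, the quadrature projection (19.12) can be sketched as follows, using Gauss-Legendre nodes and the uniform density π(ξ) = 1/2 on [−1, 1]; the target function ξ² is an illustrative assumption:

```python
import numpy as np
from numpy.polynomial import Legendre
from numpy.polynomial.legendre import leggauss

def pc_project(f, K, Q):
    """Pseudo-spectral projection c_k = <f, P_k> / ||P_k||^2 for Legendre
    PC on [-1, 1] with uniform density pi(xi) = 1/2, using Q-point
    Gauss-Legendre quadrature."""
    xi, w = leggauss(Q)            # nodes/weights for unit weight on [-1, 1]
    w = w / 2.0                    # absorb the density 1/2
    fvals = f(xi)
    c = np.empty(K)
    for k in range(K):
        Pk = Legendre.basis(k)(xi)
        norm2 = 1.0 / (2 * k + 1)  # ||P_k||^2 under the uniform density
        c[k] = np.sum(w * fvals * Pk) / norm2
    return c

# f(xi) = xi^2 = (1/3) P_0 + (2/3) P_2, so c should be [1/3, 0, 2/3]
c = pc_project(lambda x: x**2, K=3, Q=5)
print(np.round(c, 6))  # [0.333333 0.       0.666667]
```

With Q = 5 nodes the quadrature is exact for polynomial integrands up to degree nine, so the coefficients are recovered to machine precision.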
Regression-based approaches directly minimize a distance measure between any set of evaluations of the function, $f = \{f(\lambda(\xi^{(n)}))\}_{n=1}^{N}$, and the surrogate, which can be written in matrix form as
$$ g_c = \left\{ \sum_{k=0}^{K-1} c_k\, \Psi_k(\xi^{(n)}) \right\}_{n=1}^{N} = G\, c. \qquad (19.13) $$
The least-squares problem then reads
$$ c^{LSQ} = \arg\min_{c} \sum_{n=1}^{N} \left( f(\lambda(\xi^{(n)})) - \sum_{k=0}^{K-1} c_k\, \Psi_k(\xi^{(n)}) \right)^2 = \arg\min_{c}\, \|f - G\, c\|^2 \qquad (19.15) $$
that has the closed-form solution
$$ c^{LSQ} = \left( G^T G \right)^{-1} G^T f. \qquad (19.16) $$
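A minimal one-dimensional sketch of this least-squares construction, with arbitrarily located training samples; the target function is again an assumed toy example:

```python
import numpy as np
from numpy.polynomial import Legendre

def pc_regression(xi, fvals, K):
    """Least-squares PC coefficients, c = (G^T G)^{-1} G^T f, with
    G[n, k] = P_k(xi_n) for one-dimensional Legendre bases."""
    G = np.column_stack([Legendre.basis(k)(xi) for k in range(K)])
    c, *_ = np.linalg.lstsq(G, fvals, rcond=None)  # stable normal-equation solve
    return c

rng = np.random.default_rng(1)
xi = rng.uniform(-1, 1, 50)          # arbitrarily located training samples
c = pc_regression(xi, xi**2, K=3)
print(np.round(c, 6))  # [0.333333 0.       0.666667], matching the projection
```

Since the noiseless target lies in the span of the basis, regression and projection agree; with noisy data the two generally differ.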
Both projection and regression fall into the category of collocation approaches, in which the surrogate is constructed using a finite set of evaluations of f(λ) [40, 74, 76]. While projection typically requires function evaluations at predefined parameter values λ(ξ^(q)) corresponding to quadrature points, regression has the additional flexibility that the function evaluations, or training samples, can be located arbitrarily, although the accuracy of the resulting surrogate and the numerical stability of the coefficient computation will depend on the distribution of the input parameter samples. Furthermore, if function evaluations are corrupted by noise or occasional faults, the projection loses its attractive properties; in particular, sparse quadrature is extremely unstable due to negative weights [19]. Regression is much more stable in such noisy training data scenarios and is flexible enough to use various objective functions. For example, if the function is fairly smooth but one expects rare outliers in function evaluations due to, e.g., soft faults in the computing environment, then an ℓ₁ objective function is more appropriate than the ℓ₂ objective function shown in (19.15), as it leads to a surrogate with the fewest outlier residuals, inspired by compressed sensing techniques [9, 10, 43, 63]. Besides, the regression approach allows a direct extension to a Bayesian framework, detailed in the next subsection.
$$ \underbrace{p(c\,|\,D)}_{\text{Posterior}} = \frac{\overbrace{p(D\,|\,c)}^{\text{Likelihood}}\ \overbrace{p(c)}^{\text{Prior}}}{\underbrace{p(D)}_{\text{Evidence}}}, \qquad (19.17) $$
which essentially measures the goodness of fit of the training data D = {f} to the surrogate model evaluations g_c for a parameter set c. As far as the estimation of c is concerned, the evidence p(D) is simply a normalizing factor. Nevertheless, it plays a crucial role in model comparison and model selection studies, discussed further in this chapter. Since for general likelihoods and a high-dimensional parameter vector c the posterior distribution in (19.17) is hard to compute, one often employs Markov chain Monte Carlo (MCMC) approaches to sample from it. MCMC
methods perform a search in the parameter space, exploring potential values for c, and, via an accept-reject mechanism, generate a Markov chain that has the posterior PDF as its stationary distribution [17, 22]. MCMC methods require many evaluations of the surrogate model. However, such construction is performed only once, and the surrogates are typically inexpensive to evaluate. Therefore, the main computational burden is in simulating the complex model f(λ) at training input locations in order to generate the dataset D for Bayesian inference via MCMC. As such, one should try to find the most accurate surrogate with as few model evaluations as possible.
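A bare-bones random-walk Metropolis sampler illustrating the accept-reject mechanism described above; the two-dimensional Gaussian log-posterior is a toy assumption, not a surrogate posterior from the chapter:

```python
import numpy as np

def metropolis(log_post, c0, n_steps=5000, step=0.1, seed=0):
    """Random-walk Metropolis: propose Gaussian perturbations of c and
    accept with probability min(1, exp(log_post(c') - log_post(c)))."""
    rng = np.random.default_rng(seed)
    c = np.atleast_1d(np.asarray(c0, dtype=float))
    lp = log_post(c)
    chain = np.empty((n_steps, c.size))
    for i in range(n_steps):
        prop = c + step * rng.standard_normal(c.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept-reject step
            c, lp = prop, lp_prop
        chain[i] = c                              # rejected moves repeat c
    return chain

# Toy posterior: standard normal in 2-D; the post-burn-in mean approaches zero
chain = metropolis(lambda c: -0.5 * np.sum(c**2),
                   c0=[3.0, -3.0], n_steps=20000, step=0.5)
print(np.round(chain[5000:].mean(axis=0), 1))
```

In practice, adaptive proposals and convergence diagnostics would be layered on top of this basic scheme.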
The posterior distribution reaches its maximum at the maximum a posteriori (MAP) value. Working with logarithms of the prior and posterior distributions as well as the likelihood, the MAP value solves the optimization problem
$$ c^{MAP} = \arg\max_{c}\, \log p(c\,|\,D) = \arg\max_{c}\, \left[ \log L_D(c) + \log p(c) \right]. \qquad (19.19) $$
Clearly, this problem is equivalent to deterministic regression, with the negative log-likelihood −log L_D(c) playing the role of an objective function augmented by a regularization term, the negative log-prior −log p(c). In principle, the Bayesian framework also allows the inclusion of nuisance parameters, e.g., parameters of the prior or the likelihood, that are inferred together with c and subsequently integrated out to lead to marginal posterior distributions on c. In a classical case, assuming a uniform prior p(c) and an i.i.d. Gaussian likelihood with, say, constant variance σ²,
$$ \log L_D(c) = -\frac{N}{2}\log 2\pi - N \log\sigma - \frac{1}{2\sigma^2}\, \|f - G\,c\|^2. \qquad (19.20) $$
Clearly, the posterior mean value is equal to c^MAP and also coincides with the least-squares estimate (19.16). With the probabilistic description of c, the PC surrogate is uncertain and is in fact a Gaussian process with analytically computable mean and covariance functions,
$$ g_c(\lambda(\xi)) \sim \mathcal{GP}\!\left( \Psi(\xi)\,\mu_c,\; \Psi(\xi)\,\Sigma_c\,\Psi(\xi')^T \right), \qquad (19.22) $$
where Ψ(ξ) is the basis measurement vector at parameter value ξ, i.e., its k-th entry is Ψ(ξ)_k = Ψ_k(ξ). Such an uncertain surrogate is particularly useful when there is a low number of training simulations, and one would like to quantify the epistemic uncertainty due to the lack of information. This is the key strength of Bayesian regression in the presence of noisy or sparse training data: it allows useful surrogate results with an accompanying uncertainty estimate.
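A one-dimensional sketch of such an uncertain surrogate; to keep the posterior proper, a broad Gaussian prior c ~ N(0, τ²I) is assumed here in place of the uniform prior, giving Σ_c = (GᵀG/σ² + I/τ²)⁻¹ and μ_c = Σ_c Gᵀf/σ². All settings below are illustrative:

```python
import numpy as np
from numpy.polynomial import Legendre

def bayes_pc_posterior(xi, fvals, K, sigma2, tau2=1e4):
    """Gaussian posterior over PC coefficients for f ~ N(G c, sigma2 I)
    with an assumed broad Gaussian prior c ~ N(0, tau2 I)."""
    G = np.column_stack([Legendre.basis(k)(xi) for k in range(K)])
    Sigma_c = np.linalg.inv(G.T @ G / sigma2 + np.eye(K) / tau2)
    mu_c = Sigma_c @ G.T @ fvals / sigma2
    return mu_c, Sigma_c

def predict(xi_new, mu_c, Sigma_c, K):
    """Predictive mean and variance of the uncertain surrogate."""
    Psi = np.column_stack([Legendre.basis(k)(xi_new) for k in range(K)])
    mean = Psi @ mu_c
    var = np.einsum('nk,kl,nl->n', Psi, Sigma_c, Psi)  # diag(Psi S Psi^T)
    return mean, var

rng = np.random.default_rng(2)
xi = rng.uniform(-1, 1, 30)
fvals = xi**2 + 0.01 * rng.standard_normal(30)   # noisy quadratic data
mu_c, Sigma_c = bayes_pc_posterior(xi, fvals, K=3, sigma2=1e-4)
mean, var = predict(np.array([0.0, 0.5]), mu_c, Sigma_c, K=3)
print(np.round(mean, 2))  # approximately [0, 0.25]
```

The predictive variance shrinks where training samples are dense and grows where they are sparse, which is the epistemic uncertainty discussed above.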
reduces with increasing p, while the surrogate itself is becoming less and less accurate across the full range of values of λ. Such overfitting can be avoided using a separate validation set of samples $V = \{\lambda^{(r)}\}_{r=1}^{R}$ that is held out, measuring the quality of the surrogate via an error measure
$$ e_V(f, g_c) = \sqrt{ \frac{1}{R} \sum_{r=1}^{R} \left( f(\lambda^{(r)}) - g_c(\lambda^{(r)}) \right)^2 }. \qquad (19.24) $$
Often the validation set is split from within the training set in a variety of ways, leading to cross-validation methods that can be used for the selection of an appropriate surrogate degree. For a survey of cross-validation methods for model selection, see Arlot et al. [2]. In the absence of a computational budget for cross-validation, one can employ heuristic information criteria, e.g., the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), that penalize models with too many parameters, thus alleviating the overfitting issue to an extent [1]. Bayesian machinery generally allows for model selection strategies via the Bayes factor, which has firm probabilistic grounds and is more general than AIC or BIC, albeit often difficult to evaluate [30]. The Bayes factor is the ratio of the evidence p(D) of one model versus another. While the evidence is difficult to compute in general, it admits a useful decomposition for any parameterized model (first integrating with respect to the posterior distribution, then employing Bayes' formula):
$$ \log p(D) = \int \log p(D)\, p(c\,|\,D)\, dc = \int \log \frac{L_D(c)\, p(c)}{p(c\,|\,D)}\, p(c\,|\,D)\, dc $$
$$ = \underbrace{\int \log L_D(c)\, p(c\,|\,D)\, dc}_{\text{Fit}} - \underbrace{\int \log \frac{p(c\,|\,D)}{p(c)}\, p(c\,|\,D)\, dc}_{\text{Complexity}}. \qquad (19.25) $$
The posterior average of the log-likelihood measures the ability of the surrogate model to fit the data, while the second term in (19.25) is the relative entropy, or the Kullback-Leibler divergence, between the posterior and prior distributions [32]. In other words, it measures the information gain about the parameters c given the training dataset D. A more complex model extracts more information from the data; therefore, this complexity term directly implements Ockham's razor, which states that, with everything else equal, one should choose the simpler model [28]. Most information criteria, such as AIC or BIC, enforce Ockham's razor in a more heuristic way. For the Bayesian regression described above in this chapter, the linearity of the model with respect to c and the Gaussian likelihood allow closed-form expressions for the fit and complexity scores in (19.25) and, consequently, for the evidence:
$$ \text{Fit} = -\frac{N}{2}\log 2\pi\sigma^2 - \frac{1}{2\sigma^2}\, f^T \left( I - G\,(G^T G)^{-1} G^T \right) f, \qquad (19.26) $$
$$ \text{Complexity} = -\frac{K}{2}\log 2\pi\sigma^2 + \frac{1}{2}\log\det(G^T G) + K \log(2\gamma), \qquad (19.27) $$
where γ denotes the half-width of the uniform prior on each coefficient.
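For a quick numerical counterpart of this model-selection criterion, one can use the Gaussian-prior variant of the linear model, whose evidence has the well-known closed form p(D) = N(f; 0, σ²I + τ²GGᵀ); this is a sketch with assumed settings, not the uniform-prior expressions above:

```python
import numpy as np
from numpy.polynomial import Legendre

def log_evidence(xi, fvals, K, sigma2, tau2=10.0):
    """Log marginal likelihood of the linear-Gaussian surrogate model:
    with prior c ~ N(0, tau2 I) and noise N(0, sigma2 I),
    p(D) = N(fvals; 0, sigma2 I + tau2 G G^T)."""
    G = np.column_stack([Legendre.basis(k)(xi) for k in range(K)])
    C = sigma2 * np.eye(len(fvals)) + tau2 * G @ G.T
    _, logdet = np.linalg.slogdet(C)
    quad = fvals @ np.linalg.solve(C, fvals)
    return -0.5 * (len(fvals) * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(4)
xi = rng.uniform(-1, 1, 20)
fvals = xi**3 + xi**2 - 6 + 0.1 * rng.standard_normal(20)  # true order 3
evid = {p: log_evidence(xi, fvals, K=p + 1, sigma2=0.01) for p in (1, 2, 3, 5, 8)}
for p, le in evid.items():
    print(p, round(le, 1))  # the evidence should peak near the true order
```

Under-resolved models pay through the fit term, over-parameterized models through the complexity penalty embedded in the determinant, so the evidence trades the two off automatically.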
Fig. 19.1 Illustration of Bayesian regression with N = 11 training data points extracted from noisy evaluations of a third-order polynomial f(λ) = λ³ + λ² − 6, with noise variance σ² = 0.01. The gray region indicates the predictive variance of the resulting uncertain surrogate (19.22) together with the data noise variance. Three cases of the PC surrogate order are shown, demonstrating the high-order overfitting on the right plot. (a) PC order = 2. (b) PC order = 3. (c) PC order = 8
Fig. 19.2 (Left) Illustration of the log-evidence together with its component scores for fit and complexity as functions of the PC surrogate order. (Right) The same log-evidence plotted together with a validation error computed at a separate set of R = 100 points according to (19.24)
Fig. 19.3 Log-evidence as a function of the PC surrogate order for varying amounts of training data (left) and for varying values of the data noise variance (right)
Fig. 19.4 Illustration of a PC surrogate constructed via Bayesian compressive sensing for a 50-dimensional exponential function, defined in the Appendix. The left plot shows PC surrogate values versus the actual model evaluations at a separate validation set of R = 100 sample points. The right plot illustrates the same result in a different way: the validation samples are ordered according to ascending model values and plotted together with the corresponding PC surrogate values with respect to a counting index of the validation samples. The gray error bars correspond to posterior predictive standard deviations. The number of training samples used to construct the surrogate is N = 100, and the constructed sparse PC expansion has only K = 11 bases, with the highest order equal to three
Assuming $\lambda$ is distributed according to the input PC expansion (19.7), i.e., uniform on $\prod_{i=1}^{d} [a_i, b_i]$, the moments of the function can be evaluated using standard Monte Carlo estimators with $M$ function evaluations:

$$\mathbb{E}[f(\lambda)] \approx \hat{E}f(\lambda) = \frac{1}{M} \sum_{m=1}^{M} f(\lambda(\xi^{(m)})), \qquad (19.29)$$

$$\mathbb{V}[f(\lambda)] \approx \hat{V}f(\lambda) = \frac{1}{M} \sum_{m=1}^{M} \left( f(\lambda(\xi^{(m)})) - \hat{E}f(\lambda) \right)^2. \qquad (19.30)$$
With the PC surrogate $g_c(\xi)$, on the other hand, the moments are available analytically:

$$\mathbb{E}[f(\lambda)] \approx \mathbb{E}[g_c(\xi)] = \int g_c(\xi)\, \pi(\xi)\, d\xi = \sum_{k=0}^{K-1} c_k \int \Psi_k(\xi)\, \pi(\xi)\, d\xi = c_0, \qquad (19.31)$$

$$\mathbb{V}[f(\lambda)] \approx \mathbb{V}[g_c(\xi)] = \int \left( g_c(\xi) - c_0 \right)^2 \pi(\xi)\, d\xi = \int \left( \sum_{k=1}^{K-1} c_k \Psi_k(\xi) \right)^2 \pi(\xi)\, d\xi = \sum_{k=1}^{K-1} c_k^2 \|\Psi_k\|^2. \qquad (19.32)$$
Note that the exact expressions for the surrogate moments (19.31) and (19.32) presume that $\lambda$ is distributed uniformly on $\prod_{i=1}^{d} [a_i, b_i]$. For more complicated PDFs $\pi(\lambda)$, one can employ MC estimates with a much larger ensemble than would have been possible without using the surrogate. This is the main reason for constructing a surrogate to replace the complex model in computationally intensive studies. In this chapter, however, specific moment evaluation and sensitivity computation tasks are selected that allow exact analytical formulae with PC surrogates, in order to avoid errors due to finite sampling of the surrogate itself.
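The contrast between the sampling estimators (19.29)-(19.30) and the analytic surrogate moments (19.31)-(19.32) can be sketched in one dimension. The target $f(\xi) = \exp(\xi)$ and the sizes below are illustrative assumptions; the norms $\|P_k\|^2 = 1/(2k+1)$ correspond to unnormalized Legendre polynomials under the uniform density on $[-1, 1]$.

```python
import numpy as np

# Sketch: moments read off a 1-D Legendre PC surrogate of f(xi) = exp(xi),
# xi ~ U[-1, 1], versus the analytic values. The function and orders are
# illustrative choices, not the chapter's test cases.
rng = np.random.default_rng(1)
order = 8
xi = rng.uniform(-1.0, 1.0, 200)                   # training samples
G = np.polynomial.legendre.legvander(xi, order)    # regression matrix
c, *_ = np.linalg.lstsq(G, np.exp(xi), rcond=None)

norms2 = 1.0 / (2.0 * np.arange(order + 1) + 1.0)  # ||P_k||^2 for uniform density
mean_pc = c[0]                                     # Eq. (19.31): 0th coefficient
var_pc = np.sum(c[1:]**2 * norms2[1:])             # Eq. (19.32)

mean_exact = np.sinh(1.0)                          # (e - 1/e)/2
var_exact = np.sinh(2.0) / 2.0 - np.sinh(1.0)**2
print(mean_pc - mean_exact, var_pc - var_exact)    # both close to zero
```

Once the coefficients are in hand, no further sampling is needed: the mean is read off as $c_0$ and the variance as a weighted sum of squared coefficients.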
For the demonstration of moment estimation as well as sensitivity index computation in the next subsection, the classical least-squares regression problem (19.15) is employed for surrogate construction and estimation of the PC coefficients, with the solution given in (19.16). Note that if Bayesian regression is employed, the resulting uncertain surrogate can lead to an uncertainty estimate in the evaluation of moments as well. Figure 19.5 illustrates the relative error in the estimation of mean and variance for a varying number of function evaluations and PC surrogate order, for the oscillatory function described in the Appendix, with dimensionality d = 3. Also, the MC estimates from (19.29) and (19.30) are shown for comparison. Clearly, for the estimate of the mean of this function, even a linear PC surrogate outperforms MC. In fact, it can be shown that the MC mean estimate coincides with a zeroth-order PC surrogate, i.e., the constant fit value that solves the least-squares problem (19.15). At the same time, the variance estimate requires at least a second-order PC surrogate to guarantee improvement over the MC error for all inspected values of the number of function evaluations.
Similar results hold across all the tested models listed in the Appendix, as Fig. 19.6 suggests. It illustrates relative error convergence with respect to the number of function evaluations, as well as with respect to the PC surrogate order, for all the considered model functions with d = 3. Again, one can see that the mean and, to a lesser extent, the variance are typically much more efficient to estimate with the PC surrogate approximation than with the naive MC estimator. The continuous (model #4, discontinuous first derivative) and corner-peak (model #5, sharp peak at the domain corner) functions, however, are less amenable to polynomial approximation, as the third column suggests, leading to moment estimates that are comparable in accuracy with MC estimates, at least for low surrogate orders.
690 K. Sargsyan
Fig. 19.5 Relative error in the estimation of mean (left column) and variance (right column)
with varying number of function evaluations (top row) and PC surrogate order (bottom row) for
the three-dimensional oscillatory function. The corresponding MC estimates are highlighted for
comparison with dashed lines or as a separate abscissa value. Presented values of relative errors
are based on medians over 100 different training sample sets, while error bars indicate 25% and
75% quantiles
Sobol sensitivity indices are useful statistical summaries of model output [57, 67]. They measure the fractional contributions of each parameter or group of parameters toward the total output variance. In this chapter, the main effect sensitivities, also called first-order sensitivities, are explored; they are defined as

$$S_i = \frac{V_i\left[ E_{\neg i}\left[ f(\lambda) \mid \lambda_i \right] \right]}{V[f(\lambda)]}, \qquad (19.33)$$

where $V_i$ and $E_{\neg i}$ indicate variance with respect to the $i$-th parameter and expectation with respect to the rest of the parameters, respectively. The sampling-based approach developed in Saltelli et al. [58] leads to the estimator

$$\hat{S}_i = \frac{1}{\hat{V}f(\lambda)} \left[ \frac{1}{M} \sum_{m=1}^{M} f(\lambda(\xi^{(m)})) \left( f(\lambda(\bar{\xi}^{(m)}_{\neg i})) - f(\lambda(\bar{\xi}^{(m)})) \right) \right], \qquad (19.34)$$

where $\bar{\xi}^{(m)}$ is an independent resample and $\bar{\xi}^{(m)}_{\neg i}$ coincides with $\bar{\xi}^{(m)}$ except in the $i$-th component, which is taken from $\xi^{(m)}$.
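A hedged sketch of a sampling-based first-order index estimator in the spirit of Saltelli et al. [58]: the additive test function with known $S_1 = 0.8$ is an illustrative choice, not one of the chapter's test cases.

```python
import numpy as np

# Sampling-based main-effect Sobol index: pair each base sample with an
# independent resample whose i-th component is copied from the base sample.
rng = np.random.default_rng(2)
M, d = 20000, 2
f = lambda x: x[:, 0] + 0.5 * x[:, 1]   # illustrative; exact S_1 = 0.8

A = rng.uniform(-1.0, 1.0, (M, d))      # base samples xi^(m)
B = rng.uniform(-1.0, 1.0, (M, d))      # independent resamples
fA, fB = f(A), f(B)
var_hat = np.var(fA)

def main_effect(i):
    Bi = B.copy()
    Bi[:, i] = A[:, i]                  # resample with i-th column from A
    return np.mean(fA * (f(Bi) - fB)) / var_hat

print(round(main_effect(0), 2))         # close to the exact value 0.8
```

Each index requires an extra batch of M model evaluations, which is exactly the cost that a PC surrogate avoids.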
19 Surrogate Models for Uncertainty Propagation and Sensitivity Analysis
Fig. 19.6 Illustration of the relative error in the estimate of mean (left column) and variance (middle column), as well as surrogate error (right column), i.e.,
mean-square mismatch of the surrogate and the true model over a separate set of 100 validation samples. The results are illustrated in two ways, with respect to
N (top row), the number of function evaluations for the PC surrogate order set to p D 4, and with respect to the PC surrogate order for N D 10;000 (bottom
row). The corresponding MC estimates are highlighted for comparison with dashed lines or as a separate abscissa value. The error bars indicate 25 % and 75 %
quantiles, while the dots correspond to medians over 100 replica simulations. Models are color coded as shown in the legends
Fig. 19.7 Relative error in the estimation of the first main sensitivity index, $S_1$, with respect to the PC surrogate order, with a varying number of function evaluations, for the oscillatory function in three cases of dimensionality d. The corresponding MC estimates are highlighted for comparison as a separate abscissa value. Presented values of relative errors are based on medians over 100 different training sample sets, while error bars indicate 25% and 75% quantiles. (a) d = 1. (b) d = 3. (c) d = 10
With the PC surrogate, on the other hand, the main effect indices are available analytically:

$$S_i \approx \frac{V_i\left[ E_{\neg i}\left[ g_c(\xi) \mid \xi_i \right] \right]}{V[g_c(\xi)]} = \frac{\sum_{m=1}^{M_i} c_{k_m(i)}^2 \|\Psi_{k_m(i)}\|^2}{\sum_{k=1}^{K-1} c_k^2 \|\Psi_k\|^2}, \qquad (19.35)$$

where $k_m(i)$, for $m = 1, \ldots, M_i$, enumerate the basis terms that involve only the $i$-th dimension.
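A sketch of the coefficient-based formula (19.35) for a two-dimensional tensor Legendre PC surrogate; the quadratic test function and basis truncation below are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Main-effect Sobol indices read off PC coefficients, Eq. (19.35).
# Basis: tensor Legendre polynomials in d = 2, per-dimension order <= 2.
rng = np.random.default_rng(3)
d, p = 2, 2
mindex = list(product(range(p + 1), repeat=d))   # multi-indices alpha
f = lambda x: x[:, 0] + 0.5 * x[:, 1] + x[:, 0] * x[:, 1]  # illustrative

X = rng.uniform(-1.0, 1.0, (400, d))
# Psi_alpha(x) = P_{alpha_1}(x_1) * P_{alpha_2}(x_2), columnwise
V1 = np.polynomial.legendre.legvander(X[:, 0], p)
V2 = np.polynomial.legendre.legvander(X[:, 1], p)
G = np.stack([V1[:, a] * V2[:, b] for (a, b) in mindex], axis=1)
c, *_ = np.linalg.lstsq(G, f(X), rcond=None)

norm2 = np.array([1.0 / ((2*a + 1) * (2*b + 1)) for (a, b) in mindex])
total_var = np.sum(c[1:]**2 * norm2[1:])         # skip the constant term

def S(i):
    # Basis terms that involve only dimension i
    sel = [m for m, alpha in enumerate(mindex)
           if alpha[i] > 0 and sum(alpha) == alpha[i]]
    return np.sum(c[sel]**2 * norm2[sel]) / total_var

print(round(S(0), 3), round(S(1), 3))  # -> 0.632 0.158 (i.e., 12/19 and 3/19)
```

The interaction term $\xi_1 \xi_2$ carries the remaining variance fraction, which is why the two main effects do not sum to one.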
4 Conclusions
This chapter has focused on surrogate modeling for computationally expensive for-
ward models. Specifically, polynomial chaos (PC) surrogates have been studied for
the purposes of forward propagation and variance-based sensitivity analysis. Among
various methods of PC coefficient computation, Bayesian regression has been
highlighted as it offers a host of advantages. First of all, it is applicable with noisy
function evaluations and allows likelihood or objective function construction stem-
ming from formal probabilistic assumptions. Second, it provides robust answers
with an uncertainty certificate with any number of and arbitrarily distributed training
function evaluations, which is often very practical for complex physical models,
e.g., climate models that are simulated on supercomputers with a single simulation
taking hours or days. This is particularly useful for high-dimensional surrogate
construction, when one necessarily operates under the condition of sparse training
data. Relatedly, high dimensionality or a large number of surrogate parameters
often lead to an underdetermined problem in which case Bayesian sparse learning
methods such as Bayesian compressive sensing provide an uncertainty-enabled
alternative to classical, deterministic regularization methods. Furthermore, Bayesian
methods allow sequential updating of surrogates as new function evaluations
arrive by encoding the state of knowledge about the surrogate parameters into a
prior distribution. Finally, Bayesian regression allows an efficient solution to the
surrogate model selection problem via evidence computation and Bayes factors
as demonstrated on toy examples in this chapter. Irrespective of the construction
methodology, a PC surrogate enables drastic computational savings when evaluating
moments or sensitivity indices of complex models, as illustrated on a select class of
functions with tunable dimensionality and a varying degree of smoothness.
Table 19.1 Test functions used in the studies of this section. The shift parameters are set to $u_i = 0.3$ for all dimensions $i = 1, \ldots, d$, while the weight parameters are selected as $w_i = C/i$ with normalization constant $C = 1/\sum_{i=1}^{d} i^{-1}$ to ensure $\sum_{i=1}^{d} w_i = 1$. The variance formula for the product-peak function is
$$v(u, w) = \prod_{i=1}^{d} \left[ \frac{w_i^3}{2}\left( \arctan(w_i(1-u_i)) + \arctan(w_i u_i) \right) + w_i^4 \left( \frac{1-u_i}{2(1+w_i^2(1-u_i)^2)} + \frac{u_i}{2(1+w_i^2 u_i^2)} \right) \right] - m(u, w)^2.$$

For a function $f_{u,w}(\lambda)$, the exact mean is $m(u, w) = \int_{[0,1]^d} f_{u,w}(\lambda)\, d\lambda$ and the exact variance is $v(u, w) = \int_{[0,1]^d} \left( f_{u,w}(\lambda) - m(u, w) \right)^2 d\lambda$.

1. Oscillatory: $f = \cos\left( 2\pi u_1 + \sum_{i=1}^{d} w_i \lambda_i \right)$; exact mean $\prod_{i=1}^{d} \frac{2\sin(w_i/2)}{w_i} \cdot \cos\left( 2\pi u_1 + \frac{1}{2}\sum_{i=1}^{d} w_i \right)$; exact variance $\frac{1}{2} + \frac{1}{2} m(2u, 2w) - m(u, w)^2$.
2. Gaussian: $f = \exp\left( -\sum_{i=1}^{d} w_i^2 (\lambda_i - u_i)^2 \right)$; exact mean $\prod_{i=1}^{d} \frac{\sqrt{\pi}}{2 w_i} \left( \operatorname{erf}(w_i(1-u_i)) + \operatorname{erf}(w_i u_i) \right)$; exact variance $m(u, \sqrt{2}\, w) - m(u, w)^2$.
3. Exponential: $f = \exp\left( \sum_{i=1}^{d} w_i (\lambda_i - u_i) \right)$; exact mean $\prod_{i=1}^{d} \frac{1}{w_i} \left( \exp(w_i(1-u_i)) - \exp(-w_i u_i) \right)$; exact variance $m(u, 2w) - m(u, w)^2$.
4. Continuous: $f = \exp\left( -\sum_{i=1}^{d} w_i |\lambda_i - u_i| \right)$; exact mean $\prod_{i=1}^{d} \frac{1}{w_i} \left( 2 - \exp(-w_i u_i) - \exp(w_i(u_i - 1)) \right)$; exact variance $m(u, 2w) - m(u, w)^2$.
5. Corner peak: $f = \left( 1 + \sum_{i=1}^{d} w_i \lambda_i \right)^{-(d+1)}$; exact mean $\frac{1}{d!\, \prod_{i=1}^{d} w_i} \sum_{r \in \{0,1\}^d} \frac{(-1)^{\|r\|_1}}{1 + \sum_{i=1}^{d} w_i r_i}$; variance: sampling estimator (19.30) with $M = 10^7$.
6. Product peak: $f = \prod_{i=1}^{d} \frac{w_i^2}{1 + w_i^2 (\lambda_i - u_i)^2}$; exact mean $\prod_{i=1}^{d} w_i \left( \arctan(w_i(1-u_i)) + \arctan(w_i u_i) \right)$; variance: see the caption.
Appendix
Table 19.1 shows six classes of functions employed in the numerical tests in this chapter. All functions, except the exponential one, are taken from the classical Genz family of test functions [18]. The weight parameters of the test functions are chosen according to a predefined decay rate $w_i = C/i$ with a normalization factor $C = 1/\sum_{i=1}^{d} i^{-1}$ to ensure $\sum_{i=1}^{d} w_i = 1$. The exact moments are analytically available and are used as reference values, with the exception of the corner-peak function, for which the variance estimator (19.30) with sampling size $M = 10^7$ is used. The true reference values for the sensitivity indices $S_i$ are also computed via Monte Carlo, using the estimator (19.34) with $M = 10^5$. The functions are defined on $\lambda \in [0, 1]^d$; assuming the inputs are i.i.d. uniform random variables, the underlying input PC expansions are simple linear transformations $\lambda_i = 0.5\xi_i + 0.5$, relating the physical model inputs $\lambda_i \in [0, 1]$ to the PC surrogate inputs $\xi_i \in [-1, 1]$, for $i = 1, \ldots, d$.
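The analytic mean of the oscillatory function (row 1 of Table 19.1) can be cross-checked against plain Monte Carlo; a small sketch with the Appendix's parameter choices, where $d = 3$ is an illustrative pick.

```python
import numpy as np

# Verify the analytic mean of the oscillatory Genz function against MC,
# for d = 3, u_i = 0.3, w_i = C/i with sum(w) = 1, as in the Appendix.
rng = np.random.default_rng(4)
d, u1 = 3, 0.3
i = np.arange(1, d + 1)
w = (1.0 / i) / np.sum(1.0 / i)           # w_i = C/i, normalized

f = lambda lam: np.cos(2 * np.pi * u1 + lam @ w)
# m(u, w) = prod(2 sin(w_i/2)/w_i) * cos(2 pi u_1 + sum(w_i)/2)
m_exact = np.prod(2 * np.sin(w / 2) / w) * np.cos(2 * np.pi * u1 + np.sum(w) / 2)

lam = rng.uniform(0.0, 1.0, (200000, d))  # lambda ~ U[0,1]^d
m_mc = np.mean(f(lam))
print(abs(m_mc - m_exact))                # within Monte Carlo error
```

The Monte Carlo estimate agrees with the closed form up to the usual $O(M^{-1/2})$ sampling error.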
References
1. Acquah, H.: Comparison of Akaike information criterion (AIC) and Bayesian information criterion (BIC) in selection of an asymmetric price relationship. J. Dev. Agric. Econ. 2(1), 1-6 (2010)
2. Arlot, S., Celisse, A., et al.: A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40-79 (2010)
3. Babacan, S., Molina, R., Katsaggelos, A.: Bayesian compressive sensing using Laplace priors. IEEE Trans. Image Process. 19(1), 53-63 (2010)
4. Barthelmann, V., Novak, E., Ritter, K.: High-dimensional polynomial interpolation on sparse grids. Adv. Comput. Math. 12, 273-288 (2000)
5. Bastos, L., O'Hagan, A.: Diagnostics for Gaussian process emulators. Technometrics 51(4), 425-438 (2009)
6. Bernardo, J., Smith, A.: Bayesian Theory. Wiley Series in Probability and Statistics. Wiley, Chichester (2000)
7. Blatman, G., Sudret, B.: Adaptive sparse polynomial chaos expansion based on least angle regression. J. Comput. Phys. 230(6), 2345-2367 (2011)
8. Borgonovo, E., Castaings, W., Tarantola, S.: Model emulation and moment-independent sensitivity analysis: an application to environmental modelling. Environ. Model. Softw. 34, 105-115 (2012)
9. Candès, E., Romberg, J.: Sparsity and incoherence in compressive sampling. Inverse Probl. 23(3), 969-985 (2007)
10. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489-509 (2006)
11. Carlin, B.P., Louis, T.A.: Bayesian Methods for Data Analysis. Chapman and Hall/CRC, Boca Raton (2011)
12. Chantrasmi, T., Doostan, A., Iaccarino, G.: Padé-Legendre approximants for uncertainty analysis with discontinuous response surfaces. J. Comput. Phys. 228(19), 7159-7180 (2009)
13. Cox, D.A., Little, J., O'Shea, D.: Ideals, Varieties and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer, New York (1997)
14. Crestaux, T., Le Maître, O., Martinez, J.: Polynomial chaos expansion for sensitivity analysis. Reliab. Eng. Syst. Saf. 94(7), 1161-1172 (2009)
15. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289-1306 (2006)
16. Ernst, O., Mugler, A., Starkloff, H.J., Ullmann, E.: On the convergence of generalized polynomial chaos expansions. ESAIM: Math. Model. Numer. Anal. 46, 317-339 (2012)
17. Gamerman, D., Lopes, H.F.: Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman and Hall/CRC, Boca Raton (2006)
18. Genz, A.: Testing multidimensional integration routines. In: Proceedings of International Conference on Tools, Methods and Languages for Scientific and Engineering Computation, pp 81-94. Elsevier North-Holland (1984)
19. Gerstner, T., Griebel, M.: Numerical integration using sparse grids. Numer. Algorithms 18, 209-232 (1998). doi:10.1023/A:1019129717644 (also as SFB 256 preprint 553, Univ. Bonn, 1998)
20. Ghanem, R., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991)
21. Ghosh, D., Ghanem, R.: Stochastic convergence acceleration through basis enrichment of polynomial chaos expansions. Int. J. Numer. Method Eng. 73, 162-174 (2008)
22. Gilks, W.R.: Markov Chain Monte Carlo. Wiley Online Library (2005)
23. Griebel, M.: Sparse grids and related approximation schemes for high dimensional problems. In: Proceedings of the Conference on Foundations of Computational Mathematics, Santander (2005)
24. Huan, X., Marzouk, Y.: Simulation-based optimal Bayesian experimental design for nonlinear systems. J. Comput. Phys. 232, 288-317 (2013)
25. Isukapalli, S., Roy, A., Georgopoulos, P.: Stochastic response surface methods (SRSMs) for uncertainty propagation: application to environmental and biological systems. Risk Anal. 18(3), 351-363 (1998)
26. Jakeman, J.D., Eldred, M.S., Sargsyan, K.: Enhancing ℓ1-minimization estimates of polynomial chaos expansions using basis selection. J. Comput. Phys. 289, 18-34 (2015)
27. Jansen, M.J.: Analysis of variance designs for model output. Comput. Phys. Commun. 117(1), 35-43 (1999)
28. Jefferys, W.H., Berger, J.O.: Ockham's razor and Bayesian analysis. Am. Sci. 80, 64-72 (1992)
29. Kapur, J.N.: Maximum-Entropy Models in Science and Engineering. Wiley, New Delhi (1989)
30. Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90(430), 773-795 (1995)
31. Kersaudy, P., Sudret, B., Varsier, N., Picon, O., Wiart, J.: A new surrogate modeling technique combining Kriging and polynomial chaos expansions: application to uncertainty analysis in computational dosimetry. J. Comput. Phys. 286, 103-117 (2015)
32. Kullback, S., Leibler, R.: On information and sufficiency. Ann. Math. Stat. 22(1), 79-86 (1951)
33. Le Maître, O., Knio, O.: Spectral Methods for Uncertainty Quantification. Springer, New York (2010)
34. Le Maître, O., Knio, O., Debusschere, B., Najm, H., Ghanem, R.: A multigrid solver for two-dimensional stochastic diffusion equations. Comput. Methods Appl. Mech. Eng. 192, 4723-4744 (2003)
35. Le Maître, O., Ghanem, R., Knio, O., Najm, H.: Uncertainty propagation using Wiener-Haar expansions. J. Comput. Phys. 197(1), 28-57 (2004a)
36. Le Maître, O., Najm, H., Ghanem, R., Knio, O.: Multi-resolution analysis of Wiener-type uncertainty propagation schemes. J. Comput. Phys. 197, 502-531 (2004b)
37. Le Maître, O., Najm, H., Pébay, P., Ghanem, R., Knio, O.: Multi-resolution analysis scheme for uncertainty quantification in chemical systems. SIAM J. Sci. Comput. 29(2), 864-889 (2007)
38. Li, G., Rosenthal, C., Rabitz, H.: High dimensional model representations. J. Phys. Chem. A 105, 7765-7777 (2001)
39. Marrel, A., Iooss, B., Laurent, B., Roustant, O.: Calculations of Sobol indices for the Gaussian process metamodel. Reliab. Eng. Syst. Saf. 94(3), 742-751 (2009)
40. Marzouk, Y.M., Najm, H.N.: Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems. J. Comput. Phys. 228(6), 1862-1902 (2009)
41. Marzouk, Y.M., Najm, H.N., Rahn, L.A.: Stochastic spectral methods for efficient Bayesian solution of inverse problems. J. Comput. Phys. 224(2), 560-586 (2007)
42. Mathelin, L., Gallivan, K.: A compressed sensing approach for partial differential equations with random input data. Commun. Comput. Phys. 12(4), 919-954 (2012)
43. Moore, B., Natarajan, B.: A general framework for robust compressive sensing based nonlinear regression. In: 2012 IEEE 7th Sensor Array and Multichannel Signal Processing Workshop (SAM), Hoboken, pp 225-228. IEEE (2012)
44. Najm, H.: Uncertainty quantification and polynomial chaos techniques in computational fluid dynamics. Ann. Rev. Fluid Mech. 41(1), 35-52 (2009). doi:10.1146/annurev.fluid.010908.165248
45. Orr, M.: Introduction to radial basis function networks. Technical Report, Center for Cognitive Science, University of Edinburgh (1996)
46. Park, T., Casella, G.: The Bayesian Lasso. J. Am. Stat. Assoc. 103(482), 681-686 (2008)
47. Peng, J., Hampton, J., Doostan, A.: A weighted ℓ1-minimization approach for sparse polynomial chaos expansions. J. Comput. Phys. 267, 92-111 (2014)
48. Queipo, N.V., Haftka, R.T., Shyy, W., Goel, T., Vaidyanathan, R., Tucker, P.K.: Surrogate-based analysis and optimization. Prog. Aerosp. Sci. 41(1), 1-28 (2005)
49. Rabitz, H., Alis, O.F.: General foundations of high-dimensional model representations. J. Math. Chem. 25, 197-233 (1999)
50. Rabitz, H., Alis, O.F., Shorter, J., Shim, K.: Efficient input-output model representations. Comput. Phys. Commun. 117, 11-20 (1999)
51. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
52. Rauhut, H., Ward, R.: Sparse Legendre expansions via ℓ1-minimization. J. Approx. Theory 164(5), 517-533 (2012)
53. Reagan, M., Najm, H., Ghanem, R., Knio, O.: Uncertainty quantification in reacting flow simulations through non-intrusive spectral projection. Combust. Flame 132, 545-555 (2003)
54. Reagan, M., Najm, H., Debusschere, B., Le Maître, O., Knio, O., Ghanem, R.: Spectral stochastic uncertainty quantification in chemical systems. Combust. Theory Model. 8, 607-632 (2004)
55. Rutherford, B., Swiler, L., Paez, T., Urbina, A.: Response surface (meta-model) methods and applications. In: Proceedings of 24th International Modal Analysis Conference, St. Louis, pp 184-197 (2006)
56. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput. Phys. Commun. 145, 280-297 (2002). doi:10.1016/S0010-4655(02)00280-1
57. Saltelli, A., Tarantola, S., Campolongo, F., Ratto, M.: Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. Wiley, Chichester/Hoboken (2004)
58. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S.: Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput. Phys. Commun. 181(2), 259-270 (2010)
59. Sargsyan, K., Debusschere, B., Najm, H., Le Maître, O.: Spectral representation and reduced order modeling of the dynamics of stochastic reaction networks via adaptive data partitioning. SIAM J. Sci. Comput. 31(6), 4395-4421 (2010)
60. Sargsyan, K., Safta, C., Debusschere, B., Najm, H.: Multiparameter spectral representation of noise-induced competence in Bacillus subtilis. IEEE/ACM Trans. Comput. Biol. Bioinf. 9(6), 1709-1723 (2012a). doi:10.1109/TCBB.2012.107
61. Sargsyan, K., Safta, C., Debusschere, B., Najm, H.: Uncertainty quantification given discontinuous model response and a limited number of model runs. SIAM J. Sci. Comput. 34(1), B44-B64 (2012b)
62. Sargsyan, K., Safta, C., Najm, H., Debusschere, B., Ricciuto, D., Thornton, P.: Dimensionality reduction for complex models via Bayesian compressive sensing. Int. J. Uncertain. Quantif. 4(1), 63-93 (2014). doi:10.1615/Int.J.UncertaintyQuantification.2013006821
63. Sargsyan, K., Rizzi, F., Mycek, P., Safta, C., Morris, K., Najm, H., Le Maître, O., Knio, O., Debusschere, B.: Fault resilient domain decomposition preconditioner for PDEs. SIAM J. Sci. Comput. 37(5), A2317-A2345 (2015)
64. Schumaker, L.: Spline Functions: Basic Theory. Cambridge University Press, New York (2007)
65. Sivia, D.S., Skilling, J.: Data Analysis: A Bayesian Tutorial, 2nd edn. Oxford University Press, Oxford (2006)
66. Sobol, I.M.: Sensitivity estimates for nonlinear mathematical models. Math. Model. Comput. Exp. 1, 407-414 (1993)
67. Sobol, I.M.: Theorems and examples on high dimensional model representation. Reliab. Eng. Syst. Saf. 79, 187-193 (2003)
68. Stein, M.L.: Interpolation of Spatial Data: Some Theory for Kriging. Springer Science & Business Media, New York (2012)
69. Sudret, B.: Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst. Saf. (2007). doi:10.1016/j.ress.2007.04.002
70. Sudret, B.: Meta-models for structural reliability and uncertainty quantification. In: Asian-Pacific Symposium on Structural Reliability and its Applications, Singapore, pp 1-24 (2012)
71. Wan, X., Karniadakis, G.E.: An adaptive multi-element generalized polynomial chaos method for stochastic differential equations. J. Comput. Phys. 209, 617-642 (2005)
72. Webster, M., Tatang, M., McRae, G.: Application of the probabilistic collocation method for an uncertainty analysis of a simple ocean model. Technical report, MIT Joint Program on the Science and Policy of Global Change Report Series 4, MIT (1996)
73. Wiener, N.: The homogeneous chaos. Am. J. Math. 60, 897-936 (1938). doi:10.2307/2371268
74. Xiu, D.: Efficient collocational approach for parametric uncertainty analysis. Commun. Comput. Phys. 2(2), 293-309 (2007)
75. Xiu, D.: Fast numerical methods for stochastic computations: a review. Commun. Comput. Phys. 5(2-4), 242-272 (2009)
76. Xiu, D., Hesthaven, J.S.: High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. 27(3), 1118-1139 (2005)
77. Xiu, D., Karniadakis, G.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619-644 (2002). doi:10.1137/S1064827501387826
78. Zuniga, M.M., Kucherenko, S., Shah, N.: Metamodelling with independent and dependent inputs. Comput. Phys. Commun. 184(6), 1570-1580 (2013)
Stochastic Collocation Methods: A Survey
20
Dongbin Xiu
Abstract
Stochastic collocation (SC) has become one of the major computational tools for
uncertainty quantification. Its primary advantage lies in its ease of implementa-
tion. To carry out SC, one needs only a reliable deterministic simulation code
that can be run repetitively at different parameter values. And yet, modern-day SC methods can retain the high-order accuracy properties enjoyed by most other methods. This is accomplished by utilizing the large body of literature in classical approximation theory. Here we survey the major approaches in
SC. In particular, we focus on a few well-established approaches: interpolation,
regression, and pseudo projection. We present the basic formulations of these
approaches and some of their major variations. Representative examples are also
provided to illustrate their major properties.
Keywords
Compressed sensing · Interpolation · Least squares · Stochastic collocation
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
2 Definition of Stochastic Collocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
3 Stochastic Collocation via Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
3.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
3.2 Interpolation SC on Structured Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
3.3 Interpolation on Unstructured Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
4 Stochastic Collocation via Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
4.1 Over-sampled Case: Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
4.2 Under-sampled Case: Sparse Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
D. Xiu
Department of Mathematics and Scientific Computing and Imaging Institute, University of Utah,
Salt Lake City, UT, USA
e-mail: dongbin.xiu@utah.edu
1 Introduction
$$u(x, t, Z) : \bar{D} \times [0, T] \times I_Z \to \mathbb{R},$$

or, for fixed $x$ and $t$,

$$u(\cdot, Z) : I_Z \to \mathbb{R}, \qquad (20.2)$$

where the dependence on the spatial and temporal variables $(x, t)$ is suppressed. Hereafter, all statements are made for any fixed $x$ and $t$.
In stochastic collocation, the system (20.1) is solved in a discrete manner. More specifically, we seek to enforce the equation at a discrete set of nodes, the collocation points. Let $\Theta_M = \{Z^{(j)}\}_{j=1}^{M} \subset I_Z$ be a set of (prescribed) nodes in the random space, where $M \ge 1$ is the number of nodes. Then in SC, we enforce (20.1) at each node $Z^{(j)}$, for all $j = 1, \ldots, M$, by solving

$$\begin{cases} u_t(x, t, Z^{(j)}) = \mathcal{L}(u), & D \times (0, T], \\ \mathcal{B}(u) = 0, & \partial D \times [0, T], \\ u = u_0, & D \times \{t = 0\}. \end{cases} \qquad (20.3)$$
20 Stochastic Collocation Methods: A Survey 701
It is easy to see that for each j , (20.3) is a deterministic problem because the value of
the random parameter Z is fixed. Therefore, solving the system poses no difficulty
provided one has a well-established deterministic algorithm. Let $u^{(j)} = u(\cdot, Z^{(j)})$, $j = 1, \ldots, M$, be the solution of the above problem. The result of solving (20.3) is an ensemble of deterministic solutions $\{u^{(j)}\}_{j=1}^{M}$, and one can apply various post-processing operations to the ensemble to extract useful information about $u(Z)$.
From this point of view, all classical sampling methods belong to the class of
collocation methods. For example, in Monte Carlo sampling, the nodal set $\Theta_M$ is
generated randomly according to the distribution of Z, and the ensemble averages
are used to estimate the solution statistics, e.g., mean and variance. In deterministic
sampling methods, the nodal set is typically the nodes of a cubature rule (i.e.,
quadrature rule in multidimensional space) defined on IZ such that one can use
the integration rule defined by the cubature to estimate the solution statistics.
In SC, the goal is to construct an accurate approximation to the solution response
function using the samples. This is a stronger goal than estimating the solution
statistics and the major difference between SC and the classical sampling methods.
Knowing the function response of the solution allows us to immediately derive all
of the statistical information of the solution. The converse is not true: knowing the solution statistics does not allow one to reconstruct the solution response. To this
end, SC can be classified as strong approximation methods, whereas the traditional
sampling methods are weak approximation methods. (More precise definitions of
strong and weak approximations can be found in [30].)
3.1 Formulation
$$w(Z) = \sum_{i=1}^{N} c_i b_i(Z), \qquad (20.5)$$
This immediately leads to a linear system of equations for the unknown coefficients:

$$A c = f, \qquad (20.7)$$

where

$$c = (c_1, \ldots, c_N)^T \quad \text{and} \quad f = \left( u(Z^{(1)}), \ldots, u(Z^{(M)}) \right)^T$$

are the coefficient vector and solution sample vector, respectively. For example, if one adopts the gPC expansion, then $V_N$ is the polynomial space $P_n^d$ from (20.4), and the matrix $A$ becomes the Vandermonde-like matrix with entries $(A)_{ji} = \Psi_i(Z^{(j)})$, $1 \le j \le M$, $1 \le i \le N$, where $\Psi_j(Z)$ are the gPC orthogonal polynomials chosen based on the probability distribution of $Z$ and satisfying

$$\int_{I_Z} \Psi_i(z) \Psi_j(z) \rho(z)\, dz = \delta_{ij}. \qquad (20.11)$$
Here, $\delta_{ij}$ is the Kronecker delta and the polynomials are normalized. When the number of collocation points is the same as the number of basis functions, i.e., $M = N$, the matrix $A$ is square and can be inverted when it is nonsingular. One then immediately obtains

$$c = A^{-1} f, \qquad (20.12)$$
and can construct the approximation w.Z/ using (20.5). Although very flexible,
this approach is not used widely in practice. The reason is that interpolation is
often not very robust and leads to wild behavior in w.Z/. This is especially true in
multidimensional spaces (d > 1). The accuracy of the interpolation is also difficult
to assess and control. Even though the interpolation w.Z/ has no error at the nodal
points, it can incur large errors between the nodes. Rigorous mathematical analysis
is also lacking on this front, particularly in high dimensions. Some of these facts are
well documented in texts such as [7, 23, 27].
Another approach to accomplish interpolation is to employ the Lagrange interpolation approach. That is, we seek

$$w(Z) = \sum_{j=1}^{M} u(Z^{(j)}) L_j(Z), \qquad (20.13)$$

where

$$L_j(Z^{(i)}) = \delta_{ij}, \qquad 1 \le i, j \le M, \qquad (20.14)$$

are the Lagrange interpolating polynomials.
Fig. 20.1 Polynomial interpolation of the rescaled Runge function $f(x) = 1/(1 + 25x^2)$ on $[-1, 1]$. Left: interpolation on uniformly distributed nodes; Right: interpolation on nonuniform nodes (the zeros of Chebyshev polynomials)
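The behavior in Fig. 20.1 is easy to reproduce; a sketch comparing uniform and Chebyshev-extrema node placement, where the node count is an illustrative choice.

```python
import numpy as np

# Runge phenomenon: interpolate f(x) = 1/(1 + 25 x^2) with 21 nodes,
# uniform versus Chebyshev-extrema placement.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
n = 21
dense = np.linspace(-1.0, 1.0, 1001)

def max_interp_error(nodes):
    # Degree n-1 interpolant, built in the Chebyshev basis for conditioning
    coef = np.polynomial.chebyshev.chebfit(nodes, f(nodes), n - 1)
    return np.max(np.abs(np.polynomial.chebyshev.chebval(dense, coef) - f(dense)))

uniform = np.linspace(-1.0, 1.0, n)
cheb = np.cos(np.pi * np.arange(n) / (n - 1))   # Chebyshev extrema
print(max_interp_error(uniform), max_interp_error(cheb))
# uniform nodes blow up near the endpoints; Chebyshev nodes stay accurate
```

Both interpolants match the data exactly at the nodes; the difference lies entirely in the error between the nodes, which is the point made in the surrounding text.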
Let

$$Q_{m_i} f = \sum_{j=1}^{m_i} f(Z_i^{(j)}) L_j(Z_i), \qquad i = 1, \ldots, d, \qquad (20.16)$$

be the one-dimensional interpolation operator in the $i$-th direction, with error behavior

$$\|(I - Q_{m_i}) f\| \propto m_i^{-\alpha},$$

where the constant $\alpha > 0$ depends on the smoothness of the function $f$. Then, in terms of the number of nodes $m$ per direction, the overall tensor-grid interpolation error follows the same convergence rate,

$$\|(I - Q_M) f\| \propto m^{-\alpha},$$

but in terms of the total number of nodes $M = m^d$ it deteriorates to

$$\|(I - Q_M) f\| \propto M^{-\alpha/d}, \qquad d \ge 1.$$
Again it is clear that this is the union of a collection of subsets of the full tensor grids.
Unfortunately there is usually no explicit formula to determine the total number of
nodes M in terms of d and `.
One popular choice of sparse grids is based on Clenshaw-Curtis nodes, which are the extrema of the Chebyshev polynomials and are defined, for any $1 \le i \le d$, as

$$Z_i^{(j)} = \cos\left( \frac{\pi (j - 1)}{m_i^k - 1} \right), \qquad j = 1, \ldots, m_i^k, \qquad (20.21)$$

with the total number of nodes scaling as

$$M = \#\Theta_k \sim 2^k d^k / k!, \qquad d \gg 1. \qquad (20.22)$$
It has been shown ([3]) that the interpolation through the Clenshaw-Curtis sparse grid is exact if the function is in $P_k^d$. (In fact, the polynomial space for which the interpolation is exact is slightly bigger than $P_k^d$.) For large dimensions $d \gg 1$, $\dim P_k^d = \binom{d+k}{d} \sim d^k / k!$. Therefore, the number of points from (20.22) is about $2^k$ times larger, and this factor is independent of the dimension $d$. For this reason, the
Clenshaw-Curtis construction is often considered the best, as all other types of sparse grid constructions have a (much) larger number of points. To this end, sparse grid interpolation has been used for moderately high dimensions, say, $d \sim O(10)$.
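The nestedness that makes Clenshaw-Curtis nodes attractive for sparse grid constructions can be checked directly. A small sketch of the one-dimensional nodes from (20.21), using the standard doubling rule $m_k = 2^{k-1} + 1$ (a common choice, assumed here rather than taken from the text):

```python
import numpy as np

# One-dimensional Clenshaw-Curtis nodes, Eq. (20.21), with doubling rule
# m_k = 2^(k-1) + 1: each level reuses all nodes of the previous one.
def cc_nodes(k):
    m = 2**(k - 1) + 1 if k > 1 else 1
    if m == 1:
        return np.array([0.0])
    j = np.arange(1, m + 1)
    return np.cos(np.pi * (j - 1) / (m - 1))

for k in range(2, 6):
    coarse = set(np.round(cc_nodes(k), 12))
    fine = set(np.round(cc_nodes(k + 1), 12))
    print(k, coarse.issubset(fine))  # True at every level
```

Because of this nesting, refining a Clenshaw-Curtis grid by one level reuses every previously computed solution sample, which is what keeps the total node count in (20.22) modest.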
The results are shown in Fig. 20.3. We observe almost the same convergence rate
and errors in both the least interpolation and the standard interpolation using Gauss-
Legendre nodes. The fundamental difference between the two methods is that the
Fig. 20.3 Interpolation accuracy for functions f0 , f1 , f2 , and f3 in (20.23). Left: errors by least
interpolation; Right: errors by interpolation on Legendre-Gauss nodes (Reproduction of Fig. 5.2 in
[22])
In regression-type SC, one does not require the approximation $w(Z)$ to precisely match the solution $u(Z)$ at the collocation nodes $\Theta_M$. Instead, one seeks to minimize the difference

$$\min_{w \in V_N} \| w(Z) - u(Z) \|_{\Theta_M}, \qquad (20.24)$$

where the norm $\|\cdot\|$ is a discrete norm defined over the nodal set $\Theta_M$. By doing so, the numerical errors are more evenly distributed over the entire parameter space $I_Z$, assuming that the set $\Theta_M$ fills the space $I_Z$ in a reasonable manner. Thus, the non-robustness of interpolation can be alleviated. The regression approach also
becomes a natural choice when the solution samples u.Z .j / / are of low accuracy or
contaminated by noise, in which case interpolation of the solution samples becomes
an unnecessarily strong requirement.
When the number of samples is larger than the cardinality of the linear space $V_N$, we have an over-determined system of equations (20.7) with $M > N$. Consequently, the equality cannot hold true in general. The natural approach is the least squares method: the norm in (20.24) is taken to be the vector 2-norm, and we have the well-known least squares solution
$$ \mathbf{c} = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{f}. $$
It is worthwhile to strategically choose the points to achieve better accuracy. This is the topic of experimental design. Interested readers can refer to, for example, [1, 15], and the references therein.
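A minimal numerical sketch of the over-sampled least squares procedure is given below (a hypothetical one-dimensional example with a Legendre basis; the sample sizes and target function are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6     # cardinality of V_N: Legendre degrees 0..5
M = 60    # number of samples, over-sampled so that M > N

Z = rng.uniform(-1.0, 1.0, M)                  # random samples Z^(j)
u = np.exp(Z) + 1e-3 * rng.standard_normal(M)  # noisy solution samples u(Z^(j))

# Design matrix A with A[j, i] = Phi_i(Z^(j)), Phi_i the Legendre polynomials
A = np.polynomial.legendre.legvander(Z, N - 1)

# Least squares solution: c minimizes ||A c - u|| in the vector 2-norm
c, *_ = np.linalg.lstsq(A, u, rcond=None)

print(abs(np.polynomial.legendre.legval(0.3, c) - np.exp(0.3)))
```

Because the system is over-determined, the fit does not interpolate the noisy samples; the regression spreads the error over $I_Z$, as discussed above.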
Despite the large amount of literature on least squares methods, a series of more recent studies from the computational mathematics perspective shows that linear over-sampling of Monte Carlo and quasi-Monte Carlo points for polynomial approximation can be asymptotically unstable (cf. [8, 18-20]). Care must be taken if one intends to conduct very high-order polynomial approximations using the least squares method.
When the number of samples is smaller than the cardinality of the linear space $V_N$, we have an under-determined system of equations (20.7) with $M < N$. Equation (20.7) then admits an infinite number of solutions. In this case, one can resort to the idea of compressive sensing (CS) and seek a sparse solution:
$$ \min_{\mathbf{c}} \|\mathbf{c}\|_1 \quad \text{subject to} \quad \mathbf{A}\mathbf{c} = \mathbf{f}, \qquad (20.27) $$
where $\|\mathbf{c}\|_1 = |c_1| + \cdots + |c_N|$. The use of the $\ell_1$ norm promotes sparsity, and the optimization problem can be cast as a linear program and solved efficiently. The constraint $\mathbf{A}\mathbf{c} = \mathbf{f}$ effectively enforces interpolation. This does not need to be the case, especially when the samples $\mathbf{f}$ contain errors or noise. In this case, the de-noising version of CS [5, 6, 12] can be used.
$$ \min_{\mathbf{c}} \|\mathbf{c}\|_1 \quad \text{subject to} \quad \|\mathbf{A}\mathbf{c} - \mathbf{f}\|_2 \le \epsilon, \qquad (20.28) $$
where $\epsilon > 0$ is a real number associated with the noise level in the sample data $\mathbf{f}$.
The idea of CS was first used in UQ computations in [13], where gPC-type orthogonal approximations using Legendre polynomials were employed. The advantage of this approach lies in the fact that it allows one to construct reliable sparse gPC approximations when the underlying system is (severely) undersampled. Most of the existing studies focus on the use of Legendre polynomials [17, 24, 33].
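As noted above, the equality-constrained $\ell_1$ problem (20.27) can be cast as a linear program. A self-contained sketch (synthetic data; the problem sizes are our own choices) using the standard splitting $\mathbf{c} = \mathbf{c}^+ - \mathbf{c}^-$ with $\mathbf{c}^+, \mathbf{c}^- \ge 0$:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
M, N, s = 60, 120, 5                      # under-determined: M < N, sparsity s
A = rng.standard_normal((M, N)) / np.sqrt(M)
c_true = np.zeros(N)
c_true[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
f = A @ c_true                            # noiseless samples

# min ||c||_1  s.t.  A c = f, as an LP over (c_plus, c_minus) >= 0
cost = np.ones(2 * N)
res = linprog(cost, A_eq=np.hstack([A, -A]), b_eq=f,
              bounds=[(0, None)] * (2 * N))
c_rec = res.x[:N] - res.x[N:]
print(np.max(np.abs(c_rec - c_true)))
```

With the number of samples well above the sparsity-dependent requirement, the sparse vector is recovered; replacing the equality constraint by $\|\mathbf{A}\mathbf{c} - \mathbf{f}\|_2 \le \epsilon$ gives the de-noising variant (20.28), which requires a second-order cone solver rather than an LP.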
where $\mathbf{W}$ is a diagonal matrix with entries
$$ w_{j,j} = (\pi/2)^{d/2} \prod_{i=1}^{d} \bigl(1 - (z_i^{(j)})^2\bigr)^{1/4}, \qquad j = 1, \ldots, M. $$
Note that this is the tensor product of the one-dimensional Chebyshev weights. The corresponding de-noising version for (20.28) takes a similar form. On the other hand, it was also proved that in high dimensions $d \gg 1$, the standard non-weighted version (20.27) using uniformly distributed random samples is in fact better than the weighted Chebyshev version (20.29).
The performance of the $\ell_1$ minimization methods is typically measured by the probability of recovering the underlying sparse functions. For a high rate of recovery, the required number of samples typically grows linearly with the sparsity $s$ (up to logarithmic factors), where $s$ is the number of nonzero terms in the underlying function. A representative result can be seen in Fig. 20.4, where the $\ell_1$ minimization is used to recover a $d = 3$ dimensional polynomial.
Fig. 20.4 Probability of successful function recovery vs. number of samples ($d = 3$ and $s = 10$). The degree of the polynomial space is $k = 10$ with $\dim \mathbb{P}_{10}^3 = 286$. Line pattern: dotted-circle: uniform samples; dotted-triangle: Chebyshev samples; solid-circle: weighted uniform samples; solid-triangle: weighted Chebyshev samples. (Reproduction of Fig. 1 in [33])
712 D. Xiu
In the pseudo projection approach (first formally defined in [29]), one seeks to approximate the continuous orthogonal projection using an integration rule. Since the orthogonal projection is the best approximation with respect to a properly chosen norm, the pseudo projection method allows one to obtain a near-best approximation whenever the chosen integration rule is sufficiently accurate. Again, let $V_N$ be a properly chosen linear space from which an approximation is sought. Then, the orthogonal projection of the solution is
$$ u_N := P_{V_N} u, \qquad (20.30) $$
where $P$ denotes the orthogonal projection operator. The projection operator is often defined via integrals. In the pseudo projection approach, one then approximates the integrals using a quadrature/cubature rule.
To better illustrate the method, let us again use the gPC-based approximation. In this case, $V_N = \mathbb{P}_n^d$, and we seek an approximation
$$ w(Z) = \sum_{i=1}^{N} c_i \, \Phi_i(Z), \qquad N = \dim \mathbb{P}_n^d = \binom{n+d}{n}. \qquad (20.31) $$
The orthogonal projection, which is the best approximation in the $L^2$ norm, takes the following form:
$$ u_N(Z) := \sum_{i=1}^{N} \hat{u}_i \, \Phi_i(Z), \qquad \hat{u}_i = \int u(z) \, \Phi_i(z) \, \rho(z) \, dz. \qquad (20.32) $$
In the pseudo projection approach, we then use a cubature rule to approximate the coefficients $\hat{u}_i$. That is, in (20.31), we set
$$ c_i = \sum_{j=1}^{M} u(Z^{(j)}) \, \Phi_i(Z^{(j)}) \, w_j \approx \hat{u}_i, \qquad i = 1, \ldots, N. \qquad (20.33) $$
This means the collocation nodal set needs to be a quadrature set, such that integrals over $I_Z$ can be approximated by a weighted sum. That is,
$$ \int_{I_Z} f(z) \, \rho(z) \, dz \approx \sum_{j=1}^{M} f(Z^{(j)}) \, w_j, \qquad (20.34) $$
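The pseudo projection (20.30)-(20.34) can be sketched in one dimension, where the cubature rule reduces to Gauss-Legendre quadrature (the target function and rule sizes below are our own choices):

```python
import numpy as np
from numpy.polynomial import legendre as L

def pseudo_projection(u, n_deg, n_quad):
    # Approximate u_hat_i = int u(z) Phi_i(z) rho(z) dz, rho(z) = 1/2 on [-1,1],
    # with orthonormal Legendre polynomials Phi_i = sqrt(2i+1) P_i,
    # by the quadrature sum c_i = sum_j u(Z^(j)) Phi_i(Z^(j)) w_j (cf. (20.33)).
    z, w = L.leggauss(n_quad)     # nodes Z^(j) and Gauss weights
    w = w / 2.0                   # fold the density rho = 1/2 into the weights
    c = np.zeros(n_deg + 1)
    for i in range(n_deg + 1):
        phi_i = L.legval(z, np.eye(n_deg + 1)[i]) * np.sqrt(2 * i + 1)
        c[i] = np.sum(w * u(z) * phi_i)
    return c

c = pseudo_projection(np.exp, 8, 20)
z0 = 0.4   # evaluate the resulting approximation w(Z) at a test point
w_z0 = sum(c[i] * L.legval(z0, np.eye(9)[i]) * np.sqrt(2 * i + 1)
           for i in range(9))
print(abs(w_z0 - np.exp(z0)))
```

With 20 quadrature points the degree-8 projection coefficients are computed to near machine precision, illustrating the "near best approximation" property when the integration rule is sufficiently accurate.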
Fig. 20.5 Error convergence of the pseudo projection method with different choices of cubature rule ($q_1 = 2, 4, 6$). Here the cubature rule is the full tensor quadrature rule with $q_1$ points in each dimension
6 Summary
References
1. Atkinson, A.C., Donev, A.N., Tobias, R.D.: Optimum Experimental Designs, with SAS. Oxford University Press, Oxford (2007)
2. Babuska, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. 45(3), 1005-1034 (2007)
3. Barthelmann, V., Novak, E., Ritter, K.: High dimensional polynomial interpolation on sparse grids. Adv. Comput. Math. 12, 273-288 (1999)
4. Bungartz, H.-J., Griebel, M.: Sparse grids. Acta Numer. 13, 1-123 (2004)
5. Candès, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207-1223 (2006)
6. Candès, E.J., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory 52(12), 5406-5425 (2006)
7. Cheney, W., Light, W.: A Course in Approximation Theory. Brooks/Cole Publishing Company, Pacific Grove (2000)
8. Cohen, A., Davenport, M.A., Leviatan, D.: On the stability and accuracy of least squares approximations. Found. Comput. Math. 13(5), 819-834 (2013)
9. Cools, R.: Advances in multidimensional integration. J. Comput. Appl. Math. 149, 1-12 (2002)
10. De Boor, C., Ron, A.: On multivariate polynomial interpolation. Constr. Approx. 6, 287-302 (1990)
11. De Boor, C., Ron, A.: Computational aspects of polynomial interpolation in several variables. Math. Comput. 58, 705-727 (1992)
12. Donoho, D.L.: Compressed sensing. IEEE Trans. Inform. Theory 52(4), 1289-1306 (2006)
13. Doostan, A., Owhadi, H.: A non-adapted sparse approximation of PDEs with stochastic inputs. J. Comput. Phys. 230(8), 3015-3034 (2011)
14. Engels, H.: Numerical Quadrature and Cubature. Academic, London/New York (1980)
15. Fedorov, V.V., Leonov, S.L.: Optimal Design for Nonlinear Response Models. CRC Press, Boca Raton (2014)
16. Ghanem, R.G., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991)
17. Mathelin, L., Gallivan, K.A.: A compressed sensing approach for partial differential equations with random input data. Commun. Comput. Phys. 12, 919-954 (2012)
18. Migliorati, G., Nobile, F.: Analysis of discrete least squares on multivariate polynomial spaces with evaluations at low-discrepancy point sets. J. Complex. 31(4), 517-542 (2015)
19. Migliorati, G., Nobile, F., von Schwerin, E., Tempone, R.: Approximation of quantities of interest in stochastic PDEs by the random discrete L2 projection on polynomial spaces. SIAM J. Sci. Comput. 35(3), A1440-A1460 (2013)
20. Migliorati, G., Nobile, F., von Schwerin, E., Tempone, R.: Analysis of the discrete L2 projection on polynomial spaces with random evaluations. Found. Comput. Math. 14(3), 419-456 (2014)
21. Narayan, A., Xiu, D.: Stochastic collocation methods on unstructured grids in high dimensions via interpolation. SIAM J. Sci. Comput. 34(3), A1729-A1752 (2012)
22. Narayan, A., Xiu, D.: Constructing nested nodal sets for multivariate polynomial interpolation. SIAM J. Sci. Comput. 35(5), A2293-A2315 (2013)
23. Powell, M.J.D.: Approximation Theory and Methods. Cambridge University Press, Cambridge (1981)
24. Rauhut, H., Ward, R.: Sparse Legendre expansions via $\ell_1$-minimization. J. Approx. Theory 164, 517-533 (2012)
25. Smolyak, S.A.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Soviet Math. Dokl. 4, 240-243 (1963)
26. Stroud, A.H.: Approximate Calculation of Multiple Integrals. Prentice-Hall, Englewood Cliffs (1971)
27. Trefethen, L.N.: Approximation Theory and Approximation Practice. SIAM, Philadelphia (2013)
28. Wasilkowski, G.W., Woźniakowski, H.: Explicit cost bounds of algorithms for multivariate tensor product problems. J. Complex. 11, 1-56 (1995)
29. Xiu, D.: Efficient collocational approach for parametric uncertainty analysis. Commun. Comput. Phys. 2(2), 293-309 (2007)
30. Xiu, D.: Numerical Methods for Stochastic Computations. Princeton University Press, Princeton (2010)
31. Xiu, D., Hesthaven, J.S.: High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. 27(3), 1118-1139 (2005)
32. Xiu, D., Karniadakis, G.E.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619-644 (2002)
33. Yan, L., Guo, L., Xiu, D.: Stochastic collocation algorithms using $\ell_1$-minimization. Int. J. Uncertain. Quantif. 2(3), 279-293 (2012)
Sparse Collocation Methods for Stochastic
Interpolation and Quadrature 21
Max Gunzburger, Clayton G. Webster, and Guannan Zhang
Abstract
In this chapter, the authors survey the family of sparse stochastic collocation methods (SCMs) for partial differential equations with random input data. The SCMs under consideration can be viewed as a special case of the generalized stochastic finite element method (Gunzburger et al., Acta Numer 23:521-650, 2014), where the approximation of the solution's dependence on the random variables is constructed using Lagrange polynomial interpolation. Relying on the delta property of the interpolation scheme, the physical and stochastic degrees of freedom can be decoupled, such that the SCMs have the same nonintrusive property as stochastic sampling methods but feature much faster convergence. To define the interpolation schemes or interpolatory quadrature rules, several approaches have been developed, including global sparse polynomial approximation, for which global polynomial subspaces (e.g., sparse Smolyak spaces (Nobile et al., SIAM J Numer Anal 46:2309-2345, 2008) or quasi-optimal subspaces (Tran et al., Analysis of quasi-optimal polynomial approximations for parameterized PDEs with deterministic and stochastic coefficients. Tech. Rep. ORNL/TM-2015/341, Oak Ridge National Laboratory, 2015)) are used to exploit the inherent regularity of the PDE solution, and local sparse approximation, for which hierarchical polynomial bases (Ma and Zabaras, J Comput Phys 228:3084-3113, 2009; Bungartz and Griebel, Acta Numer 13:1-123, 2004) or wavelet bases (Gunzburger et al., Lect Notes Comput Sci Eng 97:137-170, Springer, 2014) are used to accurately capture irregular behaviors of the PDE solution. All these method classes are surveyed in this chapter, including
M. Gunzburger ()
Department of Scientific Computing, The Florida State University, Tallahassee, FL, USA
e-mail: gunzburg@fsu.edu
C.G. Webster G. Zhang
Department of Computational and Applied Mathematics, Oak Ridge National Laboratory, Oak
Ridge, TN, USA
e-mail: webstercg@ornl.gov; zhangg@ornl.gov
some novel recent developments. Details about the construction of the various
algorithms and about theoretical error estimates of the algorithms are provided.
Keywords
Uncertainty quantification · Stochastic partial differential equations · High-dimensional approximation · Stochastic collocation · Sparse grids · Hierarchical basis · Best approximation · Local adaptivity
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
2 Problem Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
3 Stochastic Finite Element Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
3.1 Spatial Finite Element Semi-discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
3.2 Stochastic Fully Discrete Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
3.3 Stochastic Polynomial Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
4 Global Stochastic Collocation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
4.1 Lagrange Global Polynomial Interpolation in Parameter Space . . . . . . . . . . . . . . . . 731
4.2 Generalized Sparse Grid Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
4.3 Nonintrusive Sparse Interpolation in Quasi-optimal Subspaces . . . . . . . . . . . . . . . . 744
5 Local Stochastic Collocation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746
5.1 Hierarchical Stochastic Collocation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746
5.2 Adaptive Hierarchical Stochastic Collocation Methods . . . . . . . . . . . . . . . . . . . . . . . 753
6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759
1 Introduction
by truncating an infinite expansion [30] (e.g., the correlated random field case). As such, a crucial, yet often complicated, ingredient that all numerical approaches to UQ must incorporate is an accurate and efficient numerical approximation technique for solving PDEs with random inputs. Although in some circles the nomenclature stochastic partial differential equations (SPDEs) is reserved for a specific class of PDEs having random inputs, here, for the sake of economy of notation, this terminology is used to refer to any PDE having random inputs.
Currently, there are several types of numerical methods available for solving SPDEs: Monte Carlo methods (MCMs) [29], quasi-MCMs [46, 47], multilevel MCMs [4, 20, 36], stochastic Galerkin methods (SGMs) [2, 3, 35, 55], and stochastic collocation methods (SCMs) [1, 51, 57, 58, 71, 73]. In this chapter, the authors provide an overview of the family of stochastic collocation methods for SPDEs.
Recently, SCMs based on either full or sparse tensor-product approximation spaces
[1, 32, 54, 57, 58, 62, 71] have gained considerable attention. As shown in [1],
SCMs can essentially match the fast convergence of intrusive polynomial chaos
methods, even coinciding with them in particular cases. The major difference
between the two approaches is that stochastic collocation methods are ensemble-
based, nonintrusive approaches that achieve fast convergence rates by exploiting
the inherent regularity of PDE solutions with respect to parameters. Compared to
nonintrusive polynomial chaos methods, they also require fewer assumptions about
the underlying SPDE. SCMs can also be viewed as stochastic Galerkin methods
in which one employs an interpolatory basis built from the zeros of orthogonal
polynomials with respect to the joint probability density function of the input
random variables. For additional details about the relations between polynomial
chaos methods and stochastic collocation methods, see [41], and for computational
comparisons between the two approaches, see [5, 25, 28, 41].
To achieve increased rates of convergence, most of the SCMs described above are based on global polynomial approximations that take advantage of the smooth behavior of the solution in the multidimensional parameter space. Hence, when there are steep gradients, sharp transitions, bifurcations, or finite discontinuities (e.g., piecewise processes) in stochastic space, these methods converge very slowly or even fail to converge.
converge. Such problems often arise in scientific and engineering problems due
to the highly complex nature of most physical or biological phenomena. To be
effective, refinement strategies must be guided by accurate estimations of errors
while not expending significant computational effort approximating an output of
interest within each random dimension. The papers [27, 45, 51, 52, 74] apply an
adaptive sparse grid stochastic collocation strategy that follows the work of [14,37].
This approach utilizes the hierarchical surplus as an error indicator to automatically
detect regions of importance (e.g., discontinuities) in the stochastic parameter space
and adaptively refine the collocation points in this region. To this end, grids are
constructed in an adaptation process steered by the indicator in such a way that a
prescribed global error tolerance is attained. This goal, however, might be achieved
using more points than necessary due to the instability of this multi-scale basis.
The outline of this chapter is as follows. In Sect. 2, a generalized mathematical
description of SPDEs is provided, together with the notations that are used
2 Problem Setting
on $D$, and let $a(x, y)$, with $x \in D$ and $y \in \Gamma$, represent the input coefficient associated with the operator $\mathcal{L}$. The forcing term $f = f(x, y)$ is also assumed to be a parameterized field on $D \times \Gamma$. This chapter concentrates on the following parameterized boundary value problem: for all $y \in \Gamma$, find $u(\cdot, y) : D \to \mathbb{R}$ such that the following equation holds
To guarantee the well-posedness of the system (21.1) in a Banach space, the following assumptions are needed:

Assumption 1. (a) The solution to (21.1) has realizations in the Banach space $W(D)$; i.e., $u(\cdot, y) \in W(D)$ almost surely, and it satisfies
$$ \| u(\cdot, y) \|_{W(D)} \le C \, \| f(\cdot, y) \|_{W^*(D)}, $$
where $W^*(D)$ denotes the dual space of $W(D)$ and $C$ denotes a constant whose value is independent of the realization $y \in \Gamma$.
(b) The forcing term $f \in L^2(\Gamma; W^*(D))$ is such that the solution $u$ is unique and bounded in $L^2(\Gamma; W(D))$.
such that Assumption 1(a)-(b) is satisfied with $W(D) = H_0^1(D)$; see [1].
$$ \begin{cases} -\nabla \cdot \bigl( a(x, y) \nabla u(x, y) \bigr) + u(x, y) \, |u(x, y)|^s = f(x, y) & \text{in } D, \\ u(x, y) = 0 & \text{on } \partial D, \end{cases} \qquad (21.4) $$
with $a(x, y)$ uniformly bounded from above and below and $f(x, y)$ square integrable with respect to the probability measure, such that Assumption 1(a)-(b) is satisfied with $W(D) = H_0^1(D) \cap L^{s+2}(D)$; see [69].
In many applications, the stochastic input data may have a simple piecewise random representation, whereas, in other applications, the coefficient $a$ and the right-hand side $f$ in (21.1) may have spatial variation that can be modeled as a correlated random field, making them amenable to description by a Karhunen-Loève (KL) expansion [37, 38]. In practice, one has to truncate such expansions according to the degree of correlation and the desired accuracy of the simulation. Examples of both types of random input data are given next.
Example 3 (Piecewise constant random fields). Assume that the spatial domain $D$ is the union of non-overlapping subdomains $D_n$, $n = 1, \ldots, N$. Then, consider a coefficient $a(x, y)$ that is a random constant in each subdomain $D_n$; i.e., $a(x, y)$ is the piecewise constant function
$$ a(x, y) := a_0 + \sum_{n=1}^{N} a_n \, y_n(\omega) \, \mathbf{1}_{D_n}(x), $$
$$ a(x, y) := \bar{a}(x) + \sum_{n=1}^{N} \sqrt{\lambda_n} \, b_n(x) \, y_n(\omega), $$
where $\lambda_n$ and $b_n(x)$ for $n = 1, \ldots, N$ are the dominant eigenvalues and corresponding eigenfunctions of the covariance function, and $y_n(\omega)$ for $n = 1, \ldots, N$ denote uncorrelated real-valued random variables. In addition, for the purpose of keeping
the property that $a$ is bounded away from zero, the logarithm of the random field is instead expanded, so that $a(x, y)$ has the form
$$ a(x, y) := a_{\min} + \exp \Bigl\{ \bar{a}(x) + \sum_{n=1}^{N} \sqrt{\lambda_n} \, b_n(x) \, y_n(\omega) \Bigr\}, \qquad (21.5) $$
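A realization of the truncated, log-transformed KL field (21.5) can be sketched as follows (the grid, covariance kernel, and all parameter values are our own choices; the eigenpairs are computed numerically from a discretized covariance matrix):

```python
import numpy as np

n_pts, corr_len, N = 200, 0.5, 8      # grid size, correlation length, KL terms
x = np.linspace(0.0, 1.0, n_pts)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)  # exponential covariance

lam, B = np.linalg.eigh(C)                 # eigenpairs of the covariance matrix
lam = lam[::-1][:N] / n_pts                # dominant eigenvalues (grid-normalized)
B = B[:, ::-1][:, :N] * np.sqrt(n_pts)     # corresponding eigenfunctions b_n(x)

rng = np.random.default_rng(2)
y = rng.standard_normal(N)                 # uncorrelated random variables y_n
a_min, a_bar = 0.1, 1.0
a = a_min + np.exp(a_bar + B @ (np.sqrt(lam) * y))   # one realization of (21.5)
print(a.min())
```

Because the expansion sits inside the exponential, every realization satisfies $a(x, y) > a_{\min}$, which is exactly the boundedness-away-from-zero property the log transformation is designed to preserve.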
$$ \sum_{k=1}^{K} \int_{\Gamma} \int_{D} S_k(u; y) \, T_k(v) \, \rho(y) \, dx \, dy = \int_{\Gamma} \int_{D} v \, f(x, y) \, \rho(y) \, dx \, dy \quad \forall\, v \in W(D) \otimes L^q(\Gamma), \qquad (21.6) $$
Without loss of generality, it suffices to consider the single-term form of (21.6), i.e.,
$$ \int_{\Gamma} \int_{D} S(u; y) \, T(v) \, \rho(y) \, dx \, dy = \int_{\Gamma} \int_{D} v \, f(y) \, \rho(y) \, dx \, dy \quad \forall\, v \in W(D) \otimes L^q(\Gamma), \qquad (21.7) $$
where $T(\cdot)$ is a linear operator and, in general, $S(\cdot)$ is a nonlinear operator, and where again the explicit reference to dependences on the spatial variable $x$ is suppressed.

At each point $y \in \Gamma$, the coefficients $c_j(y)$, and thus $u_{J_h}$, are determined by solving the problem
$$ \int_{D} S \Bigl( \sum_{j=1}^{J_h} c_j(y) \, \phi_j(x); \, y \Bigr) \, T(\phi_{j'}) \, dx = \int_{D} \phi_{j'} \, f(y) \, dx \quad \text{for } j' = 1, \ldots, J_h. \qquad (21.9) $$
What this means is that to obtain the semi-discrete approximation $u_{J_h}(x, y)$ at any specific point $y_0 \in \Gamma$, one only has to solve a deterministic finite element problem by fixing $y = y_0$ in (21.9). The subset of $\Gamma$ in which (21.9) has no solution has zero measure with respect to $\rho \, dy$. For convenience, it is assumed that the coefficient $a$ and the forcing term $f$ in (21.1) admit a smooth extension on $\rho \, dy$-zero measure sets. Then, (21.9) can be extended a.e. in $\Gamma$ with respect to the Lebesgue measure, instead of the measure $\rho \, dy$.
$$ u_{J_h, M}(x, y) := \sum_{m=1}^{M} \sum_{j=1}^{J_h} c_{jm} \, \phi_j(x) \, \psi_m(y) \in W_h(D) \otimes \mathcal{P}(\Gamma), \qquad (21.10) $$
where the coefficients $c_{jm}$, and thus $u_{J_h, M}$, are determined by solving the problem
$$ \int_{\Gamma} \int_{D} S(u_{J_h, M}; y) \, T(v) \, \rho(y) \, dx \, dy = \int_{\Gamma} \int_{D} v \, f(y) \, \rho(y) \, dx \, dy \quad \forall\, v \in W_h(D) \otimes \mathcal{P}(\Gamma). \qquad (21.11) $$
$$ \sum_{r=1}^{R} w_r \, \rho(\hat{y}_r) \, \psi_{m'}(\hat{y}_r) \int_{D} S \Bigl( \sum_{m=1}^{M} \sum_{j=1}^{J_h} c_{jm} \, \phi_j(x) \, \psi_m(\hat{y}_r); \, \hat{y}_r \Bigr) \, T\bigl(\phi_{j'}(x)\bigr) \, dx = \sum_{r=1}^{R} w_r \, \rho(\hat{y}_r) \, \psi_{m'}(\hat{y}_r) \int_{D} \phi_{j'}(x) \, f(x, \hat{y}_r) \, dx $$
In this section, several choices of the stochastic polynomial subspace $\mathcal{P}(\Gamma)$ are discussed for constructing the fully discrete approximation in (21.10).

Let $\mathcal{P}(\Gamma) := \mathcal{P}_{\mathcal{J}(p)}(\Gamma) \subset L^2(\Gamma)$ denote the multivariate polynomial space over $\Gamma$ corresponding to the index set $\mathcal{J}(p)$, defined by
$$ \mathcal{P}_{\mathcal{J}(p)}(\Gamma) = \operatorname{span} \Bigl\{ \prod_{n=1}^{N} y_n^{\nu_n} \;:\; \nu = (\nu_1, \ldots, \nu_N) \in \mathcal{J}(p), \; y_n \in \Gamma_n \Bigr\}, \qquad (21.13) $$
where $M := \dim \mathcal{P}_{\mathcal{J}(p)}$ is the dimension of the polynomial subspace.
Several choices for the index set and the corresponding polynomial spaces are available [5, 22, 30, 65]. The most obvious one is the tensor-product (TP) polynomial space, defined by choosing $\mathcal{J}(p) = \{ \nu \in \mathbb{N}^N : \max_n \nu_n \le p \}$. In this case, $M = (p+1)^N$, which results in an explosion in computational effort for higher dimensions. For the same value of $p$, the same nominal rate of convergence is achieved at a substantially lower cost by the total degree (TD) polynomial spaces, for which $\mathcal{J}(p) = \bigl\{ \nu \in \mathbb{N}^N : \sum_{n=1}^{N} \nu_n \le p \bigr\}$ and $M = (N+p)!/(N!\,p!)$. Other subspaces having dimension smaller than the TP subspace include the hyperbolic cross (HC) polynomial spaces, for which $\mathcal{J}(p) = \bigl\{ \nu \in \mathbb{N}^N : \sum_{n=1}^{N} \log_2(\nu_n + 1) \le \log_2(p+1) \bigr\}$, and the sparse Smolyak (SS) polynomial spaces, for which
$$ \mathcal{J}(p) = \Bigl\{ \nu \in \mathbb{N}^N : \sum_{n=1}^{N} \lambda(\nu_n) \le \lambda(p) \Bigr\} \quad\text{with}\quad \lambda(p) = \begin{cases} 0 & \text{for } p = 0, \\ 1 & \text{for } p = 1, \\ \lceil \log_2(p) \rceil & \text{for } p \ge 2. \end{cases} \qquad (21.14) $$
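The four index sets can be generated directly from their definitions. In the sketch below ($N = 2$ and $p = 8$, matching Fig. 21.1; the HC condition is evaluated in the equivalent integer product form $\prod_n (\nu_n + 1) \le p + 1$ to avoid floating-point boundary issues):

```python
from itertools import product
from math import ceil, comb, log2, prod

def lam(p):
    # the level function lambda(p) from (21.14)
    return 0 if p == 0 else 1 if p == 1 else ceil(log2(p))

def index_set(kind, N, p):
    keep = {
        "TP": lambda nu: max(nu) <= p,
        "TD": lambda nu: sum(nu) <= p,
        "HC": lambda nu: prod(n + 1 for n in nu) <= p + 1,
        "SS": lambda nu: sum(lam(n) for n in nu) <= lam(p),
    }[kind]
    return [nu for nu in product(range(p + 1), repeat=N) if keep(nu)]

N, p = 2, 8
sizes = {k: len(index_set(k, N, p)) for k in ("TP", "TD", "HC", "SS")}
print(sizes)
assert sizes["TP"] == (p + 1) ** N and sizes["TD"] == comb(N + p, p)
```

The cardinalities shrink from TP through TD and SS down to HC, which is exactly the trade-off between approximation power and cost discussed above.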
As illustrated by Example 4, it is often the case that the stochastic input data exhibit anisotropic behavior with respect to the directions $y_n$, $n = 1, \ldots, N$. To exploit this effect, it is necessary to approximate $u_{J_h}$ in an anisotropic polynomial space. Following [57], the vector $\alpha = (\alpha_1, \ldots, \alpha_N) \in \mathbb{R}_+^N$ of positive weights is introduced, with $\alpha_{\min} = \min_{n=1,\ldots,N} \alpha_n$. The anisotropic versions $\mathcal{J}_\alpha(p)$ of the aforementioned polynomial spaces are described in [5]. Here, the anisotropic SS polynomial space is given by
$$ \mathcal{J}_\alpha(p) = \Bigl\{ \nu \in \mathbb{N}^N : \sum_{n=1}^{N} \alpha_n \, \lambda(\nu_n) \le \alpha_{\min} \, \lambda(p) \Bigr\}. $$
For $N = 2$ dimensions, in Fig. 21.1 examples are provided of both isotropic and anisotropic TP, TD, HC, and SS polynomial spaces, where $p = 8$ and $\alpha = (2, 1)$.
Fig. 21.1 For a finite dimensional $\Gamma$ with $N = 2$ and a fixed polynomial index $p = 8$. Top row: the indices $(p_1, p_2) \in \mathcal{J}(8)$ corresponding to the isotropic TP, TD, HC, and SS polynomial spaces. Bottom row: the indices $(p_1, p_2) \in \mathcal{J}_\alpha(8)$ with $\alpha_1 / \alpha_2 = 2$ corresponding to the anisotropic TP, TD, HC, and SS polynomial spaces
where the basis functions of the best $M$-term subspace $\mathcal{P}_{\mathcal{J}_M^{\mathrm{opt}}}(\Gamma)$ are indexed by $\nu \in \mathcal{J}_M^{\mathrm{opt}}$ and the coefficient $t_\nu(x)$ is defined by $t_\nu(x) := \sum_{j=1}^{J_h} c_{j,\nu} \, \phi_j(x)$. The literature on the best $M$-term Taylor and Galerkin approximations has been growing fast recently; see [6, 7, 10, 15, 16, 21, 22, 42-44]. In the benchmark work [22], the analytical dependence of the solutions of stochastic elliptic PDEs on the parameters was proved under mild assumptions on the input coefficients, and convergence analysis of the best $M$-term Taylor and Legendre approximations was established subsequently. Consider, for example, the Taylor expansion of the semi-discrete solution $u_{J_h}(x, y)$ with respect to $y$ on $\Gamma = [-1, 1]^N$. Application of the triangle inequality gives
$$ \sup_{y \in \Gamma} \Bigl\| u_{J_h}(y) - \sum_{\nu \in \mathcal{J}_M^{\mathrm{opt}}} t_\nu \, y^\nu \Bigr\|_{W(D)} \le \sum_{\nu \notin \mathcal{J}_M^{\mathrm{opt}}} \| t_\nu \|_{W(D)}, $$
which suggests determining the optimal index set $\mathcal{J}_M^{\mathrm{opt}}$ by choosing for it the set of indices corresponding to the $M$ largest $\| t_\nu \|_{W(D)}$. In [22], the error of such an approximation was estimated via the Stechkin inequality (see, e.g., [23]):
$$ \sum_{\nu \notin \mathcal{J}_M^{\mathrm{opt}}} \| t_\nu \|_{W(D)} \le \bigl\| (\| t_\nu \|_{W(D)}) \bigr\|_{\ell^q(S)} \, M^{1 - \frac{1}{q}}, \qquad (21.16) $$
where $q \in (0, 1)$ is such that $(\| t_\nu \|_{W(D)})_{\nu \in S}$ is $\ell^q$-summable. It should be noted that the convergence rate (21.16) does not depend on the dimension of the parameter domain (which is possibly countably infinite therein). This error estimate, however, has some limitations. First, a sharp, explicit evaluation of the coefficient $\| (\| t_\nu \|_{W(D)}) \|_{\ell^q(S)}$ is inaccessible in general (thus so is the total estimate). Secondly, (21.16) often holds for infinitely many values of $q$, and the stronger rates, corresponding to smaller $q$, are also attached to bigger coefficients. For a specific range of $M$, the effective rate of convergence is unclear. In [66], the estimate (21.16) has been improved to a sharp sub-exponential convergence rate of the form
M exp..
M /1=N /, where
is a constant depending on the shape and cardinality
of multi-index sets. On the other hand, in implementation, finding the best index
set and polynomial space is an infeasible task, since this requires computation
of all of the t . As a strategy to circumvent this challenge, adaptive algorithms
which generate the index set in a near-optimal, greedy procedure were developed
in [15]. This method gives the optimal rates; however, it comes with a high cost
of exploring the polynomial space, which may be daunting in high-dimensional
problems.
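The best $M$-term selection itself is straightforward once the coefficient norms are known; the difficulty discussed above is that computing all of the $t_\nu$ is infeasible. A toy sketch (the coefficient decay model is our own choice) of selecting the $M$ largest norms and measuring the resulting truncation tail:

```python
import numpy as np

# Hypothetical decay model ||t_nu|| = exp(-(nu_1 + nu_2)) on a 2-d index box
idx = [(i, j) for i in range(20) for j in range(20)]
norms = np.array([np.exp(-(i + j)) for i, j in idx])

M = 15
order = np.argsort(norms)[::-1]          # indices sorted by decreasing norm
kept = set(order[:M])                    # the best M-term index set
tail = sum(norms[k] for k in range(len(idx)) if k not in kept)
print(tail)   # the truncation error is bounded by this discarded-norm sum
```

For this decay model the selected indices are exactly those with $\nu_1 + \nu_2 \le 4$, i.e., the best $M$-term set coincides with a total-degree set, which is why structured index sets often come close to the optimal one.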
$$ \sum_{\nu \notin \mathcal{J}_M^{q\text{-opt}}} B(\nu) \le \| B(\nu) \|_{\ell^q(S)} \, M^{1 - \frac{1}{q}}, \qquad (21.17) $$
and then took advantage of the formula for $B(\nu)$ to compute the $q \in (0, 1)$, depending on $M$, which minimizes $\| B(\nu) \|_{\ell^q(S)} \, M^{1 - \frac{1}{q}}$.
Although known as an essential tool to study the convergence rate of best $M$-term approximations, the Stechkin inequality is probably less efficient for quasi-optimal methods. As a generic estimate, it does not fully exploit the available information on the decay of the coefficient bounds. In this setting, a direct estimate of $\sum_{\nu \notin \mathcal{J}_M^{q\text{-opt}}} B(\nu)$ may be viable and advantageous in giving a sharper result. On the other hand, the process of solving the minimization problem $q = \operatorname{argmin}_{q \in (0,1)} \| B(\nu) \|_{\ell^q(S)} \, M^{1 - \frac{1}{q}}$ needs to be tailored to $B(\nu)$, making this approach not ideal for generalization. Currently, the minimization has been limited to some quite simple types of upper bounds. In many scenarios, the sharp estimates of the coefficients may involve complicated bounds which are not even explicitly computable, such as those proposed in [22]. The extension of this approach to such cases seems to be challenging.
thus the quality of the error estimate mainly depends on the approximation of the cardinality of $Q_k$. To tackle this problem, the authors of [66] developed a strategy which extends $Q_k$ into a continuous domain and, by relating the number of $N$-dimensional lattice points to a continuous volume (Lebesgue measure), established a sharp estimate of the cardinality $\#(Q_k)$ up to any prescribed accuracy. The development includes the utilization and extension of several results on lattice point enumeration; see [8, 38] for a survey. Under some weak assumptions on $B(\nu)$, an asymptotic sub-exponential convergence rate of the truncation error of the form $M \exp(-(\kappa M)^{1/N})$ was achieved, where $\kappa$ is a constant depending on the shape and size of the quasi-optimal index sets. Through several examples, the authors explicitly derived $\kappa$ and demonstrated the optimality of the estimate both theoretically (by proving a lower bound) and computationally (via comparison with exact calculation of the truncation error). The advantage of the analysis framework is therefore twofold: first, it applies to a general class of quasi-optimal approximations; further, it gives sharp estimates for their asymptotic convergence rates.
Assumption 2. For $0 < \delta < a_{\min}$, there exists $\mathbf{r}$, with $(r_n)_{1 \le n \le N}$ and $r_n > 1$ for all $n$, such that the complex extension of $u$ to the $(\delta, \mathbf{r})$-polyellipse $E_{\mathbf{r}}$, i.e., $u : E_{\mathbf{r}} \to W(D)$, is well-defined and analytic in an open neighborhood of $E_{\mathbf{r}}$, given by
$$ E_{\mathbf{r}} = \bigotimes_{1 \le n \le N} \Bigl\{ z_n \in \mathbb{C} : \Re(z_n) = \frac{r_n + r_n^{-1}}{2} \cos \theta, \; \Im(z_n) = \frac{r_n - r_n^{-1}}{2} \sin \theta, \; \theta \in [0, 2\pi) \Bigr\}, $$
$$ \sum_{m=1}^{M} c_m(x) \, \psi_m(y_{m'}) = u_{J_h}(x, y_{m'}) \quad \text{for } m' = 1, \ldots, M. \qquad (21.19) $$
Thus, each of the coefficient functions $\{c_m(x)\}_{m=1}^M$ is a linear combination of the finite element data $\{u_{J_h}(x, y_m)\}_{m=1}^M$; the specific linear combinations are determined in the usual manner from the entries of the inverse of the $M \times M$ interpolation matrix $L$ having entries $L_{m',m} = \psi_m(y_{m'})$, $m, m' = 1, \ldots, M$. The sparsity and conditioning of $L$ heavily depend on the choice of basis; that choice could result in matrices that range from fully dense to diagonal and from highly ill-conditioned to perfectly well-conditioned.
The main attraction of interpolatory approximation of parameter dependences is that it effects a complete decoupling of the spatial and probabilistic degrees of freedom. Clearly, once the interpolation points $\{y_m\}_{m=1}^M$ are chosen, one can solve $M$ deterministic finite element problems, one for each parameter point $y_m$, with total disregard to what basis $\{\psi_m(y)\}_{m=1}^M$ one chooses to use. Then, the coefficients $\{c_m(x)\}_{m=1}^M$ defining the approximation (21.18) are found from the interpolation conditions in (21.19); it is only in this last step that the choice of stochastic basis enters into the picture. Note that this decoupling property makes the implementation of Lagrange interpolatory approximations of parameter dependences almost as trivial as it is for Monte Carlo sampling. However, if that dependence is smooth, then, because of the higher accuracy of global polynomial approximations in the space $\mathcal{P}_{\mathcal{J}(p)}(\Gamma)$, interpolatory approximations require substantially fewer sampling points to achieve a desired error tolerance.
Given a set of interpolation points, to complete the setup of a Lagrange interpolation problem, one has to choose a basis. The simplest and most popular choices are the Lagrange fundamental polynomials, i.e., polynomials that have the delta property $\psi_{m'}(y_m) = \delta_{m'm}$, where $\delta_{m'm}$ denotes the Kronecker delta. In this case, the interpolation conditions (21.19) reduce to $c_m(x) = u_{J_h}(x, y_m)$ for $m = 1, \ldots, M$, i.e., the interpolation matrix $\mathbf{L}$ is simply the $M \times M$ identity matrix. In this sense, the use of Lagrange polynomial bases can be viewed as resulting in pure sampling methods, much the same as Monte Carlo methods, but instead of randomly sampling in the parameter space $\Gamma$, the sample points are deterministically structured. Mathematically, using the Lagrange fundamental polynomial basis $\{\psi_m\}_{m=1}^{M}$, this ensemble-based approach results in a fully discrete approximation of the solution $u(x, y)$ of the form
$$u^{gSC}_{J_h M}(x, y) = \sum_{m=1}^{M} u_{J_h}(x, y_m)\, \psi_m(y). \qquad (21.20)$$
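As a concrete illustration of this sampling viewpoint, the following Python sketch interpolates a stand-in parameter dependence (here $e^y$, a hypothetical placeholder for the value of a PDE solution at one fixed spatial point) at five Clenshaw-Curtis points using Lagrange fundamental polynomials; by the delta property, the coefficients in (21.20) are exactly the sampled values.

```python
import numpy as np

def lagrange_basis(nodes, m, y):
    """m-th Lagrange fundamental polynomial: psi_m(y_m') = delta_{m m'}."""
    val = 1.0
    for k, yk in enumerate(nodes):
        if k != m:
            val *= (y - yk) / (nodes[m] - yk)
    return val

def interpolate(nodes, samples, y):
    """Evaluate the interpolant sum_m u(y_m) psi_m(y), as in (21.20)."""
    return sum(s * lagrange_basis(nodes, m, y) for m, s in enumerate(samples))

# five Clenshaw-Curtis points on [-1, 1]
nodes = np.cos(np.pi * np.arange(5) / 4)
u = lambda y: np.exp(y)          # placeholder for u_{J_h}(x, .) at a fixed x
samples = [u(yk) for yk in nodes]
```

Each entry of `samples` would, in practice, come from one independent deterministic solve; the interpolation step never couples those solves.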
For each direction $n$, the one-dimensional Lagrange interpolation operator $U_n^{m(l_n)}$ is defined by
$$U_n^{m(l_n)} v(y_n) = \sum_{k=1}^{m(l_n)} v\big(y_{n,k}^{(l_n)}\big)\, \psi_{n,k}^{(l_n)}(y_n) \quad \text{for } l_n = 1, 2, \ldots, \qquad (21.21)$$
where $\psi_{n,k}^{(l_n)} \in \mathcal{P}_{m(l_n)-1}(\Gamma_n)$, $k = 1, \ldots, m(l_n)$, are Lagrange fundamental polynomials of degree $p_{l_n} = m(l_n) - 1$ such that
$$\psi_{n,k}^{(l_n)}(y_n) = \prod_{\substack{k'=1 \\ k' \neq k}}^{m(l_n)} \frac{y_n - y_{n,k'}^{(l_n)}}{y_{n,k}^{(l_n)} - y_{n,k'}^{(l_n)}}.$$
Using the convention that $U_n^{m(0)} = 0$, the difference operator can be defined by
$$\Delta_n^{m(l_n)} = U_n^{m(l_n)} - U_n^{m(l_n - 1)} \qquad (21.22)$$
and the tensor-product difference operator by
$$\Delta^{\mathbf{m}} = \bigotimes_{n=1}^{N} \Delta_n^{m(l_n)}, \qquad (21.23)$$
and, from (21.22) and (21.23), the $L$-th level generalized sparse grid operator is given by
$$I_L^{m,g} = \sum_{g(\mathbf{l}) \le L}\; \bigotimes_{n=1}^{N} \Delta_n^{m(l_n)}, \qquad (21.24)$$
so that
$$u^{gSC}_{J_h M_L} = I_L^{m,g} u_{J_h} = \sum_{L-N+1 \,\le\, g(\mathbf{l}) \,\le\, L}\ \sum_{\substack{\mathbf{k} \in \{0,1\}^N \\ g(\mathbf{l}+\mathbf{k}) \le L}} (-1)^{|\mathbf{k}|} \bigotimes_{n=1}^{N} U_n^{m(l_n)} u_{J_h}. \qquad (21.25)$$
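The index set and combination coefficients implicit in (21.24)–(21.25) can be enumerated directly. The following Python sketch does so for $g(\mathbf{l}) = \sum_n (l_n - 1)$ with the nested Clenshaw-Curtis doubling rule; it is a minimal illustration, not an optimized sparse grid implementation.

```python
import itertools
import numpy as np

def m(l):                      # doubling rule m(1)=1, m(l)=2^(l-1)+1
    return 1 if l == 1 else 2 ** (l - 1) + 1

def cc_points(l):              # nested Clenshaw-Curtis points on [-1, 1]
    n = m(l)
    if n == 1:
        return np.array([0.0])
    return np.cos(np.pi * np.arange(n) / (n - 1))

def combination_coeffs(N, L):
    """Nonzero coefficient of each tensor grid in the combination formula
    (21.25), with g(l) = sum_n (l_n - 1)."""
    coeffs = {}
    for l in itertools.product(range(1, L + 2), repeat=N):
        if not (L - N + 1 <= sum(li - 1 for li in l) <= L):
            continue
        c = sum((-1) ** sum(k)
                for k in itertools.product((0, 1), repeat=N)
                if sum(li + ki - 1 for li, ki in zip(l, k)) <= L)
        if c != 0:
            coeffs[l] = c
    return coeffs

def sparse_grid(N, L):
    """Union of the tensor grids appearing with nonzero coefficient."""
    pts = set()
    for l in combination_coeffs(N, L):
        for y in itertools.product(*(np.round(cc_points(li), 12) for li in l)):
            pts.add(y)
    return pts
```

Because each $U_n^{m(l_n)}$ reproduces constants, the coefficients must sum to one; for $N = 2$ and $L = 2$ the resulting grid has the familiar 13 Clenshaw-Curtis sparse grid points.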
734 M. Gunzburger et al.
The fully discrete gSCM (21.25) requires the independent evaluation of the finite element approximation $u_{J_h}(x, y)$ on a deterministic set of distinct collocation points given by
$$\mathcal{H}_L^{m,g} = \bigcup_{g(\mathbf{l}) \le L}\; \bigotimes_{n=1}^{N} \big\{ y_{n,k}^{(l_n)} \big\}_{k=1}^{m(l_n)}$$
and
$$\mathbb{V}\big[u^{gSC}_{J_h M_L}\big](x) = \sum_{m=1}^{M_L} \widetilde{w}_m\, u_{J_h}^2(x, y_m) - \Big( \mathbb{E}\big[u^{gSC}_{J_h M_L}\big](x) \Big)^2,$$
where $\widetilde{w}_m = \int_\Gamma \psi_m(y)\,\rho(y)\,dy$, $m = 1, \ldots, M_L$.
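The weights and the resulting moment estimates can be computed as follows. This Python sketch assumes, for illustration, the uniform density $\rho = 1/2$ on $[-1,1]$ and obtains the interpolatory quadrature weights by moment matching; any density with known moments works the same way.

```python
import numpy as np

def interpolatory_weights(nodes):
    """Weights w_m = integral of the m-th Lagrange fundamental polynomial
    against rho = 1/2 on [-1, 1], computed by matching polynomial moments."""
    M = len(nodes)
    V = np.vander(nodes, M, increasing=True).T          # V[j, m] = y_m ** j
    moments = np.array([(1.0 - (-1.0) ** (j + 1)) / (2 * (j + 1))
                        for j in range(M)])             # E[Y^j], Y ~ U(-1, 1)
    return np.linalg.solve(V, moments)

nodes = np.cos(np.pi * np.arange(5) / 4)    # 5 Clenshaw-Curtis points
w = interpolatory_weights(nodes)
u = np.exp(nodes)                            # samples of a stand-in solution

mean = w @ u                                 # approximates E[u]
var = w @ u**2 - mean**2                     # approximates V[u], as above
```

For a density, the weights sum to one; here the exact mean and variance of $e^Y$, $Y \sim U(-1,1)$, are $\sinh(1)$ and $\sinh(2)/2 - \sinh^2(1)$, which the five-point rule matches closely.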
The work [5] constructs the underlying polynomial space associated with the approximation (21.25) for particular choices of $m$, $g$, and $L$. Let $m^{-1}(k) = \min\{ l \in \mathbb{N}_+ \mid m(l) \ge k \}$ denote the left inverse of $m$, so that $m^{-1}(m(l)) = l$ and $m(m^{-1}(k)) \ge k$. Then, let $s(\mathbf{l}) = \big( m(l_1), \ldots, m(l_N) \big)$ and define the polynomial index set $J^{m,g}(L)$; see [5] for its precise definition.
With this definition in hand, the following proposition, whose proof can be found in [5, Proposition 1], characterizes the underlying polynomial space of the sparse grid approximation $I_L^{m,g} u_{J_h}$.
With Proposition 1 in hand, the sparse grid approximation $I_L^{m,g}$ can be related to the corresponding polynomial subspaces $\mathcal{P}_{J^{m,g}(L)}(\Gamma)$, with $m(l) = l$ and $g(\mathbf{l}) = \max_{n=1,\ldots,N}(l_n - 1) \le L$, $g(\mathbf{l}) = \sum_{n=1}^{N}(l_n - 1) \le L$, and
21 Sparse Collocation Methods for Stochastic Interpolation and Quadrature 735
$g(\mathbf{l}) = \sum_{n=1}^{N} \log_2(l_n) \le \log_2(L+1)$ for the tensor-product, total degree, and hyperbolic cross polynomial subspaces, respectively. However, the most widely used polynomial subspace is the sparse Smolyak space given by (21.14), which, in the context of the sparse grid approximation, is defined by
$$m(1) = 1, \quad m(l) = 2^{l-1} + 1 \ \text{ for } l > 1, \quad \text{and} \quad g(\mathbf{l}) = \sum_{n=1}^{N} (l_n - 1). \qquad (21.26)$$
In addition, one sets $y_1^{(l)} = 0$ if $m(l) = 1$ and chooses the multi-index map $g$ as well as the number of points $m(l)$, $l > 1$, at each level as in (21.26). Note that this particular choice corresponds to the most used sparse grid approximation because it leads to nested sequences of points, i.e., $\{y_k^{(l)}\}_{k=1}^{m(l)} \subset \{y_k^{(l+1)}\}_{k=1}^{m(l+1)}$, so that the sparse grids are also nested, i.e., $\mathcal{H}_L^{m,g} \subset \mathcal{H}_{L+1}^{m,g}$.
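The nesting just described is easy to check numerically. The short Python sketch below generates the Clenshaw-Curtis points under the doubling rule (21.26) and verifies that each level's points are contained in the next level's.

```python
import numpy as np

def m(l):                        # number of points per level, per (21.26)
    return 1 if l == 1 else 2 ** (l - 1) + 1

def cc_points(l):
    """Clenshaw-Curtis points (Chebyshev extrema) on [-1, 1] at level l."""
    n = m(l)
    return np.array([0.0]) if n == 1 else np.cos(np.pi * np.arange(n) / (n - 1))

# the level-l points are a subset of the level-(l+1) points
nested = all(
    set(np.round(cc_points(l), 12)) <= set(np.round(cc_points(l + 1), 12))
    for l in range(1, 6)
)
```

The point counts $1, 3, 5, 9, 17, \ldots$ also show the doubling growth that motivates the slow-growth variant discussed next.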
However, even though the CC choice of points results in a significantly reduced number of points used by $I_L^{m,g}$, that number eventually increases exponentially fast with $N$. With this in mind, an alternative to the standard Clenshaw-Curtis (CC) family of rules is considered that attempts to retain the advantages of nestedness while reducing the excessive growth described above. To achieve this, one exploits the fact that the CC interpolant is exact in the polynomial space $\mathcal{P}_{m(l)-1}$ and drops, in each direction, the requirement that the function $m$ be strictly increasing. Instead, a new mapping $\widetilde{m}(l): \mathbb{N}_+ \to \mathbb{N}_+$ is defined such that $\widetilde{m}(l) \le \widetilde{m}(l+1)$ and $\widetilde{m}(l) = m(k)$, where $k = \operatorname{argmin}\{ k' \mid 2^{k'-1} \ge l \}$. In other words, the current rule is reused for as many levels as possible, until the total degree subspace is properly included. Figure 21.2 shows the difference between the standard CC sparse grid and the slow-growth CC (sCC) sparse grid for $l = 1, 2, 3, 4$. Figure 21.3 shows the corresponding polynomial accuracy of the CC and sCC sparse grids when used in a quadrature rule approximation (as opposed to an interpolation approximation).
Fig. 21.2 For $\Gamma = [-1, 1]^2$, the sparse grids corresponding to levels $L = 1, 2, 3, 4$, using standard Clenshaw-Curtis points (top) and slow-growth Clenshaw-Curtis points (bottom)
Fig. 21.3 For $\Gamma = [-1, 1]^2$, the polynomial subspaces associated with integrating a function $u \in C^0(\Gamma)$, using sparse grids corresponding to levels $L = 3, 4, 5, 6$, using standard Clenshaw-Curtis points (top) and slow-growth Clenshaw-Curtis points (bottom)
$$\widehat{\rho}(y) = \prod_{n=1}^{N} \widehat{\rho}_n(y_n) \quad \forall\, y \in \Gamma \qquad \text{and such that} \qquad \left\| \frac{\rho}{\widehat{\rho}} \right\|_{L^\infty(\Gamma)} < \infty.$$
Note that $\widehat{\rho}(y)$ factorizes, so that it can be viewed as a joint PDF of $N$ independent random variables.
For each parameter dimension $n = 1, \ldots, N$, let the $m(l_n)$ Gaussian points be the roots of the degree-$m(l_n)$ polynomial that is $\widehat{\rho}_n$-orthogonal to all polynomials of degree $m(l_n) - 1$ on the interval $\Gamma_n$. The auxiliary density $\widehat{\rho}$ should be chosen as close as possible to the true density so that the quotient $\rho/\widehat{\rho}$ is not too large.
such that the solution $u(x, y_n^*, y_n)$, viewed as a function of $y_n$, admits an analytic extension $u(x, y_n^*, z)$, $z \in \Sigma_n \subset \mathbb{C}$. The ability to treat the stochastic dimensions differently is a necessity because many practical problems exhibit highly anisotropic behavior, e.g., the size $\tau_n$ of the analyticity region associated with each random variable $y_n$ increases with $n$. In such a case, it is known that if the dependence on each random variable is approximated with polynomials, the best approximation error decays exponentially fast with respect to the polynomial degree. More precisely, for a bounded region $\Gamma_n$ and a univariate analytic function, recall the following lemma, whose proof can be found in [1, Lemma 7] and which is an immediate extension of the result given in [24, Chapter 7, Section 8].
$$E_{m(l_n)} \equiv \min_{w \in \mathcal{P}_{m(l_n)}} \| v - w \|_{C^0(\Gamma_n)} \le \frac{2}{e^{2 r_n} - 1}\, e^{-2\, m(l_n)\, r_n} \max_{z \in \Sigma(\Gamma_n; \tau_n)} \| v(z) \|_{W(D)}$$
with
$$0 < r_n = \frac{1}{2} \log\!\left( \frac{2\tau_n}{|\Gamma_n|} + \sqrt{1 + \frac{4\tau_n^2}{|\Gamma_n|^2}} \right). \qquad (21.28)$$
A related result with weighted norms holds for unbounded random variables whose probability density decays like the Gaussian density at infinity; see [1].
In the multivariate case, the size $\tau_n$ of the analyticity region depends, in general, on the direction $n$. As a consequence, the decay coefficient $r_n$ will also depend on the direction. The key idea of the anisotropic sparse gSCM in [58] is to place more points in directions having slower convergence rates, i.e., with smaller values of $r_n$. In particular, the weights $\alpha_n$ can be linked with the rate of exponential convergence in the corresponding direction by setting
$$\alpha_n = r_n. \qquad (21.29)$$
Let
$$r = \min_{n=1,\ldots,N} \{ r_n \} \quad \text{and} \quad R(N) = \sum_{n=1}^{N} r_n. \qquad (21.30)$$
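The effect of the anisotropic weights is easiest to see on the multi-index set itself. The following Python sketch enumerates the admissible indices under the condition $\sum_n \alpha_n (l_n - 1) \le (\min_n \alpha_n)\, L$; faster-converging directions (larger $\alpha_n$) simply receive fewer levels.

```python
import itertools

def anisotropic_indices(alpha, L):
    """Multi-indices l (1-based) with sum_n alpha_n*(l_n - 1) <= min(alpha)*L,
    the admissibility condition of the anisotropic sparse grid."""
    a_min = min(alpha)
    N = len(alpha)
    out = []
    for l in itertools.product(range(1, L + 2), repeat=N):
        if sum(a * (li - 1) for a, li in zip(alpha, l)) <= a_min * L:
            out.append(l)
    return out

iso = anisotropic_indices((1.0, 1.0), 7)    # isotropic case alpha_2/alpha_1 = 1
aniso = anisotropic_indices((1.0, 2.0), 7)  # dimension 2 converges faster
```

For $L = 7$ in two dimensions this reproduces the index triangles of Fig. 21.4: the anisotropic set is a strict subset of the isotropic one, with the second direction truncated earlier.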
Fig. 21.4 For $\Gamma = [0, 1]^2$ and $L = 7$: the anisotropic sparse grids with $\alpha_2/\alpha_1 = 1$ (isotropic), $\alpha_2/\alpha_1 = 3/2$, and $\alpha_2/\alpha_1 = 2$ utilizing the Clenshaw-Curtis points (top row) and the corresponding indices $(l_1, l_2)$ such that $\alpha_1(l_1 - 1) + \alpha_2(l_2 - 1) \le (\min_n \alpha_n)\, L$ (bottom row)
The convergence of the gSCM approximation is evaluated in the natural norm $L^2(\Gamma; W(D))$. Note that if the stochastic data, i.e., $a$ and/or $f$, are not exact representations but are instead approximations in terms of $N$ random variables, e.g., arising from a suitable truncation of infinite representations of random fields, then there would be an additional error $\| u - u_N \|$ to consider. This contribution to the total error was considered in [57, Section 4.2]. By controlling the error in this natural norm, the error in the expected value of the solution is also controlled; e.g., writing $\varepsilon$ for the approximation error,
$$\| \mathbb{E}[\varepsilon] \|_{W(D)} \le \mathbb{E}\big[ \| \varepsilon \|_{W(D)} \big] \le \| \varepsilon \|_{L^2(\Gamma; W(D))}.$$
The quantity $(I)$ accounts for the error with respect to the spatial grid size $h$, i.e., the finite element error; it is estimated using standard approximability properties of the finite element space $W_h(D)$ and the spatial regularity of the solution $u$; see, e.g., [11, 18]. Specifically,
$$\| u - u_{J_h} \|_{L^2(\Gamma; W(D))} \le h^s \left( \int_\Gamma C(y)\, C\big(s;\, u(y)\big)^2\, \rho(y)\, dy \right)^{1/2}.$$
The primary concern will be to analyze the approximation error $(II)$, i.e.,
$$\big\| u_{J_h} - I_L^{m,g} u_{J_h} \big\|_{L^2(\Gamma; W(D))}, \qquad (21.32)$$
for the Clenshaw-Curtis points using the anisotropic sparse grid approximation with $m(l)$ and $g$ defined as follows:
$$m(1) = 1, \quad m(l) = 2^{l-1} + 1, \quad \text{and} \quad g(\mathbf{l}) = \sum_{n=1}^{N} \alpha_n (l_n - 1) \le \Big( \min_{n} \alpha_n \Big) L. \qquad (21.33)$$
Error analysis of the sparse grid approximation with other isotropic or anisotropic choices of $m(l)$ and $g$ can be found in [57, 58].
Under the very reasonable assumption that the semi-discrete finite element solution $u_{J_h}$ admits an analytic extension as described in Assumption 2, with the same analyticity region as for $u$, the behavior of the error (21.32) is analogous to that of $\| u - I_L^{m,g} u \|_{L^2(\Gamma; W(D))}$. For this reason, the analysis presented next considers the latter.
Recall that even though in the global estimate (21.31) it is enough to bound the approximation error $(II)$ in the $L^2(\Gamma; W(D))$ norm, the more stringent $L^\infty(\Gamma; W(D))$ norm is considered here. In this chapter, the norm $\| \cdot \|_{\infty,n}$ is shorthand for $\| \cdot \|_{L^\infty(\Gamma_n; W(D))}$ and, similarly, $\| \cdot \|_{\infty,N}$ is shorthand for $\| \cdot \|_{L^\infty(\Gamma; W(D))}$.
The multidimensional error estimate for $\| u - I_L^{m,g} u \|$ is constructed from one-dimensional results. For Clenshaw-Curtis points, the Lebesgue constant of the one-dimensional interpolation operator satisfies
$$\mathbb{L}_{m(l_n)} \le \frac{2}{\pi} \log\big( m(l_n) - 1 \big) + 1 \qquad (21.35)$$
for $m(l_n) \ge 2$; see [26]. On the other hand, using Lemma 1, the best approximation to functions $u \in C^0(\Gamma_n; W(D))$ that admit an analytic extension as described by Assumption 2 is bounded by
$$E_{m(l_n)}(u) \le \frac{2}{e^{2 r_n} - 1}\, e^{-2\, m(l_n)\, r_n}\, \Theta(u), \qquad (21.36)$$
where $\Theta(u) = \max_{1 \le n \le N} \max_{y_n^* \in \Gamma_n^*} \max_{z \in \Sigma(\Gamma_n; \tau_n)} \| u(z) \|_{W(D)}$. For $n = 1, 2, \ldots, N$, denote the one-dimensional identity operator by $I_n: C^0(\Gamma_n; W(D)) \to C^0(\Gamma_n; W(D))$ and use (21.34)–(21.36) to obtain the estimates
$$\big\| u - U_n^{m(l_n)} u \big\|_{\infty,n} \le \frac{4}{e^{2 r_n} - 1}\, l_n\, e^{-r_n 2^{l_n}}\, \Theta(u)$$
and
$$\big\| \Delta_n^{m(l_n)} u \big\|_{\infty,n} \le \frac{8}{e^{2 r_n} - 1}\, l_n\, e^{-r_n 2^{l_n - 1}}\, \Theta(u). \qquad (21.37)$$
Because the value $\Theta(u)$ affects the error estimates only as a multiplicative constant, from here on it is assumed to be one without any loss of generality.
The next theorem provides an error bound in terms of the total number $M_L$ of Clenshaw-Curtis collocation points. The details of the proof can be found in [58, Section 3.1.1] and are therefore omitted. A similar result holds for the sparse grid $I_L^{m,g}$ using Gaussian points and can be found in [58, Section 3.1.2].

Theorem 1. Let $u \in L^2(\Gamma; W(D))$ and let the functions $m$ and $g$ satisfy (21.33) with weights $\alpha_n = r_n$. Then, for the gSCM approximation based on the Clenshaw-Curtis points, the following estimates hold.

Algebraic convergence, for $0 \le L \le \dfrac{R(N)}{r \log(2)}$:
$$\big\| u - I_L^{m,g} u \big\|_{L^\infty(\Gamma^N; W(D))} \le \widehat{C}(r, N)\, M_L^{-\mu_1} \quad \text{with} \quad \mu_1 = \frac{r\,\big( e \log(2) - 1/2 \big)}{\log(2) + \sum_{n=1}^{N} r / r_n}. \qquad (21.38)$$

Sub-exponential convergence, for $L > \dfrac{R(N)}{r \log(2)}$:
$$\big\| u - I_L^{m,g} u \big\|_{L^\infty(\Gamma^N; W(D))} \le \widehat{C}(r, N)\, M_L^{\mu_2/2} \exp\!\left( - \frac{R(N)}{2^{1/N}}\, M_L^{\mu_2 / R(N)} \right) \quad \text{with} \quad \mu_2 = \frac{r \log(2)}{\log(2) + \sum_{n=1}^{N} r / r_n}. \qquad (21.39)$$
Remark 3 (On the curse of dimensionality). Suppose that the stochastic input data are truncated expansions of random fields and that the values $\{r_n\}_{n=1}^\infty$ can be estimated. Whenever the sum $\sum_{n=1}^\infty r / r_n$ is finite, the algebraic exponent in (21.38) does not deteriorate as the truncation dimension $N$ increases. This condition is satisfied, e.g., whenever $r_n \to +\infty$ sufficiently fast. This is a clear advantage compared to the isotropic Smolyak method studied in [57]: the constant $\widehat{C}(r, N)$ does not deteriorate with $N$, i.e., it is bounded, and therefore the method does not suffer from the curse of dimensionality. In fact, in such a case, the anisotropic Smolyak formula can be extended to infinite dimensions, i.e., to the index set $\sum_{n=1}^\infty (l_n - 1)\, r_n \le L\, r$.
The condition $\sum_{n=1}^\infty r / r_n < \infty$ is clearly sufficient to break the curse of dimensionality. In that case, even an anisotropic full tensor approximation also breaks the curse of dimensionality. The algebraic exponent for the convergence of the anisotropic full tensor approximation again deteriorates with the value of $\sum_{n=1}^\infty r / r_n$, but the constant for such convergence is $\prod_{n=1}^{N} \frac{2}{e^{2 r_n} - 1}$. This constant is worse than the constant $\widehat{C}(r, N)$ corresponding to the anisotropic Smolyak approximation.
On the other hand, by considering the case where all $r_n$ are equal and the results derived in [57], it can be seen that the algebraic convergence exponent has not been estimated sharply. The anisotropic Smolyak method is expected to break the curse of dimensionality for a wider set of problems, i.e., the condition $\sum_{n=1}^\infty r / r_n < \infty$ seems to be unnecessary for breaking the curse of dimensionality. This is in agreement with Remark 1.
where $\widetilde{g}(\mathbf{l}) = \sum_{n=1}^{N} \alpha_n (l_n - 1)$. This problem has the solution $r$, and hence the choice of weights (21.29) is optimal.
$$I_{J_M^{q\text{-opt}}} u = \sum_{\nu \in J_M^{q\text{-opt}}} \Delta^{m(\nu)} u = \sum_{\nu \in J_M^{q\text{-opt}}}\; \bigotimes_{n=1}^{N} \big( U^{m(\nu_n)} - U^{m(\nu_n - 1)} \big) u \qquad (21.40)$$
and
$$\mathcal{H}_{J_M^{q\text{-opt}}} = \bigcup_{\nu \in J_M^{q\text{-opt}}}\; \bigotimes_{n=1}^{N} \big\{ y_{n,k} \big\}_{k=1}^{m(\nu_n)}, \qquad (21.41)$$
$$\big\| u - I_{J_M^{q\text{-opt}}} u \big\|_{L^\infty(\Gamma)} \le \Big( 1 + \mathbb{L}_{J_M^{q\text{-opt}}} \Big) \inf_{v \in \mathcal{P}_{J_M^{q\text{-opt}}}(\Gamma)} \| u - v \|_{L^\infty(\Gamma)} \qquad (21.42)$$
$$\le \Big( 1 + \mathbb{L}_{J_M^{q\text{-opt}}} \Big) \big\| u - u_{J_M^{q\text{-opt}}} \big\|_{L^\infty(\Gamma)}.$$
where $(i)$ is also known as the Leja sequence [9, 17], $R^Z_{K-1}(y) = \prod_{k=1}^{K-1} | y - y_k |$ is the residual function, and $L^Z_{K-1}(y)$ is the well-known Lebesgue function [12, 60]. The three separate one-dimensional optimization problems are referred to as $(i)$ Leja, $(ii)$ Max-Lebesgue, and $(iii)$ Min-Lebesgue. Preliminary comparisons of the growth of the Lebesgue constants in $N = 1$ and $N = 2$ dimensions (using the sparse interpolant $I_{J_M^{q\text{-opt}}}$ in a total degree polynomial subspace) are given in Fig. 21.5. All three cases exhibit moderate growth of the Lebesgue constant, but theoretical estimates of the growth of the Lebesgue constants are still open questions.
Fig. 21.5 (Left) Comparison between Lebesgue constants for one-dimensional interpolation rules; (right) comparison between Lebesgue constants for two-dimensional interpolation rules constructed by virtue of the quasi-optimal interpolant $I_{J_M^{q\text{-opt}}} u$
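The greedy Leja construction (optimization $(i)$ above) can be sketched in a few lines of Python. Here the continuous maximization of the residual is replaced, as an illustrative assumption, by a search over a fine candidate grid, and the sequence is started at the endpoint $y = 1$ (one common convention).

```python
import numpy as np

def leja_points(K, candidates=None):
    """Greedy Leja sequence on [-1, 1]: each new point maximizes the
    residual R_{K-1}(y) = prod_k |y - y_k| over a candidate grid."""
    if candidates is None:
        candidates = np.linspace(-1.0, 1.0, 100001)
    pts = [1.0]                        # start at an endpoint
    for _ in range(K - 1):
        residual = np.ones_like(candidates)
        for yk in pts:
            residual *= np.abs(candidates - yk)
        pts.append(float(candidates[np.argmax(residual)]))
    return pts
```

The first few points come out as $1, -1, 0, \pm 1/\sqrt{3}, \ldots$, and, unlike Clenshaw-Curtis levels, the sequence grows one point at a time while remaining nested by construction.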
To realize their high accuracy, the stochastic collocation methods based on the use of global polynomials discussed above require high regularity of the solution $u(x, y)$ with respect to the random parameters $\{y_n\}_{n=1}^{N}$. They are therefore ineffective for the approximation of solutions that depend irregularly on those parameters. Motivated by finite element methods (FEMs) for spatial approximation, an alternative and potentially more effective approach for approximating irregular solutions is to use locally supported piecewise polynomials to approximate the solution's dependence on the random parameters. To achieve greater accuracy, global polynomial approaches increase the polynomial degree; piecewise polynomial approaches instead keep the polynomial degree fixed but refine the grid used to define the approximation space.
from which an arbitrary hat function with support $(y_{l,i} - \widetilde{h}_l,\, y_{l,i} + \widetilde{h}_l)$ can be generated by dilation and translation, i.e.,
$$\psi_{l,i}(y) := \psi\!\left( \frac{y + 1 - i\, \widetilde{h}_l}{\widetilde{h}_l} \right),$$
where $l$ denotes the resolution level, $\widetilde{h}_l = 2^{1-l}$ for $l = 0, 1, \ldots$ denotes the grid size of the level-$l$ grid for the interval $[-1, 1]$, and $y_{l,i} = i\, \widetilde{h}_l - 1$ for $i = 0, 1, \ldots, 2^l$ denotes the grid points of that grid. The basis function $\psi_{l,i}(y)$ has local support and is centered at the grid point $y_{l,i}$; the number of grid points in the level-$l$ grid is $2^l + 1$.
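The dilated and translated hat functions just defined take one line each in Python; this small sketch encodes the level-$l$ grid on $[-1, 1]$ and the basis function $\psi_{l,i}$.

```python
import numpy as np

def hat(t):
    """Standard hat function with support (-1, 1)."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def h(l):                  # grid size 2^(1-l) on [-1, 1]
    return 2.0 ** (1 - l)

def y_grid(l, i):          # grid points y_{l,i} = i*h_l - 1, i = 0, ..., 2^l
    return i * h(l) - 1.0

def psi(l, i, y):
    """Basis function psi_{l,i}: a dilated, translated hat centered at y_{l,i}."""
    return hat((y - y_grid(l, i)) / h(l))
```

Each $\psi_{l,i}$ equals one at its own grid point, vanishes at the neighboring grid points of the same level, and is supported on an interval of width $2\widetilde{h}_l$.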
$$Z_0 \subset Z_1 \subset \cdots \subset Z_l \subset Z_{l+1} \subset \cdots \subset Z.$$
Each of the subspaces $\{Z_l\}_{l=0}^\infty$ is the standard finite element subspace of continuous piecewise linear polynomial functions on $[-1, 1]$ that is defined with respect to the grid having grid size $\widetilde{h}_l$. The set $\{\psi_{l,i}(y)\}_{i=0}^{2^l}$ is the standard nodal basis for the space $Z_l$.
An alternative to the nodal basis $\{\psi_{l,i}(y)\}_{i=0}^{2^l}$ for $Z_l$ is a hierarchical basis, the construction of which starts with the hierarchical index sets
$$B_l = \big\{ i \in \mathbb{N} \;\big|\; i = 1, 3, 5, \ldots, 2^l - 1 \big\} \quad \text{for } l = 1, 2, \ldots$$
Due to the nesting property of $\{Z_l\}_{l=0}^\infty$, it is easy to see that $Z_l = Z_{l-1} \oplus W_l$, where $W_l$ is the complement of $\bigoplus_{l'=0}^{l-1} Z_{l'}$ in $Z_l$ for $l = 1, 2, \ldots$. Then, the hierarchical subspace splitting of $Z_l$ is given by
$$Z_l = Z_0 \oplus W_1 \oplus \cdots \oplus W_l \quad \text{for } l = 1, 2, \ldots$$
It is easy to verify that, for each $l$, the subspaces spanned by the hierarchical and the nodal bases are the same, i.e., they are both bases for $Z_l$.
The nodal basis $\{\psi_{L,i}(y)\}_{i=0}^{2^L}$ for $Z_L$ possesses the delta property, i.e., $\psi_{L,i}(y_{L,i'}) = \delta_{i,i'}$ for $i, i' \in \{0, \ldots, 2^L\}$. The hierarchical basis (21.44) for $Z_L$ possesses only a partial delta property; specifically, the basis functions corresponding to a specific level possess the delta property with respect to their own level and coarser levels, but not with respect to finer levels, i.e., for $l = 0, 1, \ldots, L$ and $i \in B_l$,
$$\psi_{l,i}(y_{l',i'}) = \delta_{l,l'}\, \delta_{i,i'} \quad \text{for } l' \le l. \qquad (21.45)$$
A comparison between the linear hierarchical polynomial basis and the corresponding nodal basis for $L = 3$ is given in Fig. 21.6.
For each grid level $l$, the interpolant of a function $g(y)$ in the subspace $Z_l$ in terms of its nodal basis $\{\psi_{l,i}(y)\}_{i=0}^{2^l}$ is given by
$$I_l g(y) = \sum_{i=0}^{2^l} g(y_{l,i})\, \psi_{l,i}(y). \qquad (21.46)$$
The delta property of the nodal basis implies that the interpolation matrix is diagonal. The interpolation matrix for the hierarchical basis is not diagonal, but the partial delta property (21.45) implies that it is triangular, so that the coefficients in the interpolant can be solved for explicitly. This can also be seen from the definition (21.47) of $\Delta_l(\cdot)$ and the recursive form of $I_l(\cdot)$ in (21.48), for which the surpluses can be computed explicitly.
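The explicit, level-by-level computation of the hierarchical surpluses can be sketched directly in Python (here in one dimension on $[-1, 1]$, with $B_0 = \{0, 1\}$ for the two boundary points): each surplus is the function value minus the value of the interpolant built from all coarser levels.

```python
def hat(t): return max(0.0, 1.0 - abs(t))
def h(l): return 2.0 ** (1 - l)
def node(l, i): return i * h(l) - 1.0
def psi(l, i, y): return hat((y - node(l, i)) / h(l))

def hier_indices(l):                 # B_0 = {0, 1}; B_l = odd indices for l >= 1
    return (0, 1) if l == 0 else tuple(range(1, 2 ** l, 2))

def hierarchical_interpolant(g, L):
    """Surpluses {(l, i): c_{l,i}} computed explicitly, level by level,
    exploiting the partial delta property; returns an evaluator as well."""
    surpluses = {}
    def evaluate(y):
        return sum(c * psi(l, i, y) for (l, i), c in surpluses.items())
    for l in range(L + 1):
        for i in hier_indices(l):
            surpluses[(l, i)] = g(node(l, i)) - evaluate(node(l, i))
    return surpluses, evaluate
```

For $g(y) = y^2$ the surpluses decay by a factor of four per level ($-1$, $-1/4$, $-1/16$, ...), the quadratic decay expected for a function with bounded second derivatives.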
Fig. 21.6 Piecewise linear polynomial bases for $L = 3$. Top four rows: the basis functions for $Z_0$, $W_1$, $W_2$, and $W_3$, respectively; the hierarchical basis for $Z_3$ is the union of the functions in the top four rows. Bottom row: the nodal basis for $Z_3$
$$\mathbf{W}_{\mathbf{l}} = \bigotimes_{n=1}^{N} W_{l_n} = \operatorname{span}\big\{ \psi_{\mathbf{l},\mathbf{i}}(y) \;\big|\; \mathbf{i} \in B_{\mathbf{l}} \big\},$$
$$Z_l = \bigoplus_{l'=0}^{l} \mathbf{W}_{l'} = \bigoplus_{l'=0}^{l}\; \bigoplus_{\gamma(\mathbf{l}') = l'} \mathbf{W}_{\mathbf{l}'},$$
where the key is how the mapping $\gamma(\mathbf{l})$ is defined, because it determines the incremental subspaces $\mathbf{W}_{l'} = \bigoplus_{\gamma(\mathbf{l}') = l'} \mathbf{W}_{\mathbf{l}'}$. For example, $\gamma(\mathbf{l}) = \max_{n=1,\ldots,N} l_n$ leads to a full tensor-product space, whereas $\gamma(\mathbf{l}) = |\mathbf{l}| = l_1 + \cdots + l_N$ leads to a sparse polynomial space. Because the full tensor-product space suffers especially from the curse of dimensionality as $N$ increases, that choice is not feasible for even moderately high-dimensional problems. Thus, only the sparse polynomial space obtained by setting $\gamma(\mathbf{l}) = |\mathbf{l}|$ is considered in what follows.
The level-$l$ hierarchical sparse grid interpolant of the multivariate function $g(y)$ is then given by
$$\begin{aligned} g_l(y) &:= \sum_{l'=0}^{l} \sum_{|\mathbf{l}'| = l'} \big( \Delta^{l_1'} \otimes \cdots \otimes \Delta^{l_N'} \big) g(y) \\ &= g_{l-1}(y) + \sum_{|\mathbf{l}'| = l} \big( \Delta^{l_1'} \otimes \cdots \otimes \Delta^{l_N'} \big) g(y) \\ &= g_{l-1}(y) + \sum_{|\mathbf{l}'| = l} \sum_{\mathbf{i} \in B_{\mathbf{l}'}} \big( g(y_{\mathbf{l}',\mathbf{i}}) - g_{l-1}(y_{\mathbf{l}',\mathbf{i}}) \big)\, \psi_{\mathbf{l}',\mathbf{i}}(y) \\ &= g_{l-1}(y) + \sum_{|\mathbf{l}'| = l} \sum_{\mathbf{i} \in B_{\mathbf{l}'}} c_{\mathbf{l}',\mathbf{i}}\, \psi_{\mathbf{l}',\mathbf{i}}(y). \end{aligned} \qquad (21.49)$$
$\mathbf{l}' \in \{ (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2) \}$. The level $l = 2$ sparse grid $\mathcal{H}_2^2(\Gamma)$ shown at the right-top includes only six of the nine sub-grids; the three sub-grids depicted in gray are not included because they fail the criterion $|\mathbf{l}'| \le l = 2$. Moreover, due to the nesting property of the
Fig. 21.7 Nine tensor-product sub-grids (left) for levels $l = 0, 1, 2$, of which only the 6 sub-grids for which $l_1' + l_2' \le l = 2$ are chosen to appear in the level $l = 2$ isotropic sparse grid $\mathcal{H}_2^2(\Gamma)$ (right-top) containing 17 points. With adaptivity, only points that correspond to a large surplus, i.e., the points in red, blue, and green, lead to 2 children points added in each direction, resulting in the adaptive sparse grid $\widetilde{\mathcal{H}}_2^2(\Gamma)$ (right-bottom) containing 12 points
hierarchical basis, $\mathcal{H}_2^2(\Gamma)$ has only 17 points, as opposed to the 49 points of the full tensor-product grid.
$$u_{J_h M_L}(x, y) = \sum_{l=0}^{L} \sum_{|\mathbf{l}| = l} \sum_{\mathbf{i} \in B_{\mathbf{l}}} c_{\mathbf{l},\mathbf{i}}(x)\, \psi_{\mathbf{l},\mathbf{i}}(y), \qquad (21.50)$$
where now the coefficients are functions of $x$ to reflect the spatial dependence of the function $u_{J_h M_L}(x, y)$. In the usual manner, those coefficients are given in terms of the spatial finite element basis $\{\phi_j(x)\}_{j=1}^{J_h}$ by $c_{\mathbf{l},\mathbf{i}}(x) = \sum_{j=1}^{J_h} c_{j,\mathbf{l},\mathbf{i}}\, \phi_j(x)$ so that, from (21.50), it can be obtained that
$$\begin{aligned} u_{J_h M_L}(x, y) &= \sum_{l=0}^{L} \sum_{|\mathbf{l}| = l} \sum_{\mathbf{i} \in B_{\mathbf{l}}} \sum_{j=1}^{J_h} c_{j,\mathbf{l},\mathbf{i}}\, \phi_j(x)\, \psi_{\mathbf{l},\mathbf{i}}(y) \\ &= \sum_{j=1}^{J_h} \left( \sum_{l=0}^{L} \sum_{|\mathbf{l}| = l} \sum_{\mathbf{i} \in B_{\mathbf{l}}} c_{j,\mathbf{l},\mathbf{i}}\, \psi_{\mathbf{l},\mathbf{i}}(y) \right) \phi_j(x). \end{aligned} \qquad (21.51)$$
Then, it is easy to see from (21.51) that, for fixed $j$, the coefficients $\{c_{j,\mathbf{l},\mathbf{i}}\}_{|\mathbf{l}| \le L,\, \mathbf{i} \in B_{\mathbf{l}}}$ can be obtained by solving the linear system
$$u_{J_h M_L}(x_j, y_{\mathbf{l}',\mathbf{i}'}) = \sum_{l=0}^{L} \sum_{|\mathbf{l}| = l} \sum_{\mathbf{i} \in B_{\mathbf{l}}} c_{j,\mathbf{l},\mathbf{i}}\, \psi_{\mathbf{l},\mathbf{i}}(y_{\mathbf{l}',\mathbf{i}'}). \qquad (21.52)$$
Thus, the approximation $u_{J_h M_L}(x, y)$ can be obtained by solving $J_h$ linear systems. However, because the hierarchical basis functions $\psi_{\mathbf{l},\mathbf{i}}(y)$ satisfy $\psi_{\mathbf{l},\mathbf{i}}(y_{\mathbf{l}',\mathbf{i}'}) = 0$ if $|\mathbf{l}'| < |\mathbf{l}|$ (this is a consequence of the one-dimensional partial delta property), the coefficient
$c_{j,\mathbf{l}',\mathbf{i}'}$ in the system (21.52) corresponding to the sparse grid point $y_{\mathbf{l}',\mathbf{i}'}$ on level $L$, i.e., for $|\mathbf{l}'| = L$, reduces to
$$c_{j,\mathbf{l}',\mathbf{i}'} = u_{J_h}(x_j, y_{\mathbf{l}',\mathbf{i}'}) - \sum_{l=0}^{L-1} \sum_{|\mathbf{l}| = l} \sum_{\mathbf{i} \in B_{\mathbf{l}}} c_{j,\mathbf{l},\mathbf{i}}\, \psi_{\mathbf{l},\mathbf{i}}(y_{\mathbf{l}',\mathbf{i}'}), \qquad (21.53)$$
so that the linear system becomes a triangular system and all the coefficients can be computed explicitly by using (21.53) recursively. Note that (21.53) is consistent with the definition of the $c_{\mathbf{l},\mathbf{i}}(x)$ given in (21.49).
where $u_{J_h M_{L-1}}(x, y)$ is the sparse grid approximation in $Z_{L-1}$ and $\Delta u_{J_h M_L}(x, y)$ is the hierarchical surplus interpolant in the subspace $\mathbf{W}_L$. According to the analysis in [14], for smooth functions, the surpluses $c_{j,\mathbf{l},\mathbf{i}}$ of the sparse grid interpolant $u_{J_h M_L}$ in (21.51) tend to zero as the interpolation level $l$ goes to infinity. For example, in the context of using piecewise linear hierarchical bases and assuming the spatial approximation $u_{J_h}(x, y)$ of the solution has bounded second-order weak derivatives with respect to $y$, i.e., $u_{J_h}(x, y) \in W_h(D) \otimes H^2(\Gamma)$, the surplus $c_{j,\mathbf{l},\mathbf{i}}$ can be bounded as
$$| c_{j,\mathbf{l},\mathbf{i}} | \le C\, 2^{-2|\mathbf{l}|},$$
where the constant $C$ is independent of the level $l$. Furthermore, the smoother the target function is, the faster the surplus decays. This provides a good avenue for constructing adaptive sparse grid interpolants using the magnitude of the surplus as an error indicator, especially for irregular functions having, e.g., steep slopes or jump discontinuities.
The construction of one-dimensional adaptive grids is introduced first; this strategy is then extended to multidimensional sparse grids. As shown in Fig. 21.8, the one-dimensional hierarchical grid points have a treelike structure. In general, a grid point $y_{l,i}$ on level $l$ has two children, namely, $y_{l+1,2i-1}$ and $y_{l+1,2i+1}$ on level $l + 1$. Special treatment is required when moving from level 0 to level 1, where only one single child $y_{1,1}$ is added on level 1. On each successive interpolation level, the basic idea of adaptivity is to use the hierarchical surplus as an error indicator to detect the smoothness of the target function and refine
Fig. 21.8 A 6-level adaptive sparse grid for interpolating the one-dimensional function $g(y) = \exp\big( -(y - 0.4)^2 / 0.0625^2 \big)$ on $[0, 1]$ with an error tolerance of 0.01. The resulting adaptive sparse grid has only 21 points (the black points), whereas the full grid has 65 points (the black and gray points)
the grid by adding two new points on the next level for each point for which the magnitude of the surplus is larger than the prescribed error tolerance. For example, in Fig. 21.8, the 6-level adaptive grid is illustrated for interpolating the function $g(y) = \exp\big( -(y - 0.4)^2 / 0.0625^2 \big)$ on $[0, 1]$ with error tolerance 0.01. From level 0 to level 2, because the magnitude of every surplus is larger than 0.01, two points are added for each grid point on levels 0 and 2; as mentioned above, only one point is added for each grid point on level 1. However, on level 3, there is only one point, namely, $y_{3,3}$, whose surplus has a magnitude larger than 0.01, so only two new points are added on level 4. After adding levels 5 and 6, one ends up with the 6-level adaptive grid with only 21 points (points in black in Fig. 21.8), whereas the 6-level nonadaptive grid has a total of 65 points (points in black and gray in Fig. 21.8).
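This one-dimensional adaptive procedure can be sketched in Python as follows. This is a simplified illustration: the treatment of the first levels (here always refined) and the resulting point counts depend on implementation details and need not reproduce the 21 points of Fig. 21.8 exactly.

```python
import numpy as np

def hat(t): return max(0.0, 1.0 - abs(t))
def h(l): return 1.0 if l == 0 else 2.0 ** (-l)      # grid sizes on [0, 1]
def node(l, i): return i * h(l)
def psi(l, i, y): return hat((y - node(l, i)) / h(l))

def children(l, i):
    if l == 0:                       # single child y_{1,1} from level 0
        return {(1, 1)}
    return {(l + 1, 2 * i - 1), (l + 1, 2 * i + 1)}

def adaptive_interpolant(g, tol, max_level):
    surpluses = {}
    def evaluate(y):
        return sum(c * psi(l, i, y) for (l, i), c in surpluses.items())
    active = {(0, 0), (0, 1)}
    for _ in range(max_level + 1):
        next_active = set()
        for (l, i) in sorted(active):
            c = g(node(l, i)) - evaluate(node(l, i))   # hierarchical surplus
            surpluses[(l, i)] = c
            # refine where the error indicator |c| exceeds the tolerance;
            # the first two levels are always refined in this sketch
            if abs(c) > tol or l < 2:
                next_active |= children(l, i)
        active = next_active
    return surpluses, evaluate

g = lambda y: np.exp(-(y - 0.4) ** 2 / 0.0625 ** 2)
surpluses, evaluate = adaptive_interpolant(g, 0.01, 6)
```

Note that points whose surplus falls below the tolerance are still evaluated (to detect that refinement can stop), exactly as discussed for (21.56) below; only their children are omitted.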
It is straightforward to extend this adaptive approach from one dimension to a multidimensional adaptive sparse grid. In general, as shown in Fig. 21.7, in $N$ dimensions a grid point has $2N$ children, which are also its neighbor points. However, note that the children of a parent point correspond to hierarchical basis functions on the next interpolation level, so that the interpolant $u_{J_h M_L}$ in (21.51) can be built from level $L - 1$ to level $L$ by only adding those points on level $L$ whose parents have surpluses greater than the prescribed tolerance. At each sparse grid point $y_{\mathbf{l},\mathbf{i}}$, the error indicator is set to the maximum magnitude of the $J_h$ surpluses, i.e., to $\max_{j=1,\ldots,J_h} | c_{j,\mathbf{l},\mathbf{i}} |$. In this way, the sparse grid can be refined locally, resulting in an adaptive sparse grid which is a sub-grid of the corresponding isotropic sparse grid, as illustrated by the right-bottom plot in Fig. 21.7. The solution of the corresponding adaptive hSGSC approach is represented by
$$u^{\varepsilon}_{J_h M_L}(x, y) = \sum_{l=0}^{L} \sum_{|\mathbf{l}| = l} \sum_{\mathbf{i} \in B_{\mathbf{l}}^{\varepsilon}} \left( \sum_{j=1}^{J_h} c_{j,\mathbf{l},\mathbf{i}}\, \phi_j(x) \right) \psi_{\mathbf{l},\mathbf{i}}(y). \qquad (21.56)$$
Note that $B_{\mathbf{l}}^{\varepsilon}$ is an optimal multi-index set that contains only the indices of the basis functions with surplus magnitudes larger than the tolerance $\varepsilon$. However, in practice, the deterministic FEM solver needs to be executed at a certain number of grid points $y_{\mathbf{l},\mathbf{i}}$ with $\max_{j=1,\ldots,J_h} | c_{j,\mathbf{l},\mathbf{i}} | < \varepsilon$ in order to detect when mesh refinement can stop. For example, in Fig. 21.8, the points $y_{3,1}$, $y_{3,5}$, $y_{3,7}$, and $y_{5,11}$ are points of this type. In this case, the number of degrees of freedom in (21.56) is usually smaller than the necessary number of executions of the deterministic FEM solver.
Fig. 21.9 Left: quartic hierarchical basis functions, where linear, quadratic, and cubic basis functions are used on levels 0, 1, and 2, respectively. Quartic basis functions appear beginning with level 3. Right: the construction of a cubic hierarchical basis function and a quartic hierarchical basis function, respectively
the hierarchical structure can be retained. It should be noted that, because a total of $p$ ancestors are needed, a polynomial of order $p$ cannot be defined earlier than level $p - 1$. In other words, at level $L$, the maximum order of polynomials is $p = L + 1$. For example, a quartic polynomial basis of level 3 is plotted in Fig. 21.9 (left), where linear, quadratic, and cubic polynomials are used on levels 0, 1, and 2 due to the lack of ancestors. It is observed that there are multiple types of basis functions on each level when $p \ge 3$ because of the different distributions of supporting points for different grid points. In general, the hierarchical basis of order $p > 1$ contains $2^{p-2}$ types of $p$-th order polynomials. In Table 21.1, the supporting points used to define the hierarchical polynomial bases of order $p = 2, 3, 4$ are listed.
Wavelet Basis
Besides the hierarchical bases discussed above, wavelets form another important family of basis functions which can provide a stable subspace splitting because of their Riesz property. The second-generation wavelets, constructed using the lifting scheme discussed in [63, 64], are briefly introduced in the following. Second-generation wavelets are a generalization of biorthogonal wavelets that are easier to apply to functions defined on bounded domains. The lifting scheme [63, 64] is a tool for constructing second-generation wavelets that are no longer dilates and translates of a single scaling function. The basic idea behind lifting is to start with a simple multi-resolution analysis and gradually build a multi-resolution analysis with specific, a priori defined properties. The lifting scheme can be viewed as a process of taking an existing wavelet and modifying it by adding linear combinations of the scaling function at the coarse level. In the context of the piecewise linear basis, the second-generation wavelet on level $l \ge 1$, denoted by $\psi^{w}_{l,i}(y)$, is constructed by lifting the piecewise linear basis $\psi_{l,i}(y)$ as
$$\psi^{w}_{l,i}(y) := \psi_{l,i}(y) + \sum_{i'=0}^{2^{l-1}} \beta^{i'}_{l,i}\, \psi_{l-1,i'}(y),$$
where, for $i' = 0, \ldots, 2^{l-1}$, the $\psi_{l-1,i'}(y)$ are the nodal polynomials on level $l - 1$ and the weights $\beta^{i'}_{l,i}$ in the linear combination are chosen in such a way that the wavelet $\psi^{w}_{l,i}(y)$ has more vanishing moments than $\psi_{l,i}(y)$ and thus provides a stabilization effect. Specifically, on the bounded domain $[-1, 1]$, there are three types of linear lifting wavelets:
lifting wavelets:
w 1 1
l;i WD l;i l1; i 1 l1; i C1
for 1 < i < 2l 1, i odd
4 2 4 2
w 3 1
l;i WD l;i l1; i 1 l1; i C1
for i D 1 (21.57)
4 2 8 2
w 1 3
l;i WD l;i l1; i 1 l1; i C1
for i D 2l 1;
8 2 4 2
where the three equations define the central mother wavelet, the left-boundary
wavelet, and the right-boundary wavelet, respectively. The three lifting wavelets are
plotted in Fig. 21.10. For additional details, see [63].
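The stabilization that lifting provides can be checked concretely: with the coefficients in (21.57), each lifted wavelet integrates to zero (one vanishing moment), whereas the plain hat functions do not. A small Python sketch on $[-1, 1]$:

```python
import numpy as np

def hat(t): return np.maximum(0.0, 1.0 - np.abs(t))
def h(l): return 2.0 ** (1 - l)
def node(l, i): return i * h(l) - 1.0
def psi(l, i, y): return hat((y - node(l, i)) / h(l))

def lifted_wavelet(l, i, y):
    """Second-generation linear lifting wavelet on [-1, 1], per (21.57)."""
    if i == 1:                        # left-boundary wavelet
        a, b = 0.75, 0.125
    elif i == 2 ** l - 1:             # right-boundary wavelet
        a, b = 0.125, 0.75
    else:                             # central wavelet (i odd, interior)
        a, b = 0.25, 0.25
    return (psi(l, i, y)
            - a * psi(l - 1, (i - 1) // 2, y)
            - b * psi(l - 1, (i + 1) // 2, y))

y = np.linspace(-1.0, 1.0, 4097)      # grid hits every kink exactly
def integral(f):                      # trapezoid rule, exact for piecewise linear f
    return float(np.sum((f[1:] + f[:-1]) / 2) * (y[1] - y[0]))
```

Because the integrand is piecewise linear and all kinks lie on the quadrature grid, the trapezoid rule here is exact up to roundoff.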
Note that the property given in (21.45) is not valid for the lifting wavelets
in (21.57) because neighboring wavelets at the same level have overlapping support.
As a result, the coefficient matrix of the linear system (21.52) is no longer triangular.
Fig. 21.10 Left-boundary wavelet (left), central wavelet (middle), right-boundary wavelet (right)
6 Conclusion
This chapter provides an overview of the stochastic collocation methods for PDEs
with random input data. To alleviate the curse of dimensionality, sparse global
polynomial subspaces and local hierarchical polynomial subspaces are incorporated
into the framework of SCMs. By exploiting the inherent regularity of PDE solutions
with respect to random parameters, the global SCMs can essentially match the fast
convergence of the intrusive stochastic Galerkin methods and retain the nonintrusive
nature that leads to massively parallel implementation. There are a variety of global
sparse polynomial subspaces for the SCMs as alternatives to the standard isotropic
full tensor-product space, in order to obtain better approximations to the best M-term
polynomial subspace. For instance, the generalized sparse grid SCMs can
efficiently approximate PDE solutions with anisotropic behavior by exploiting the
size of the analyticity region associated with each random parameter and assigning
an appropriate weight to each stochastic dimension. The nonintrusive sparse
interpolation in quasi-optimal subspaces can provide more accurate approximations
to the best M-term polynomial expansion by building the subspaces based on sharp
upper bounds of the coefficients of the polynomial expansion of the PDE solutions.
In addition, there are many works on reducing the complexity of implementing
SCMs. For example, in [67], a multilevel version of the stochastic collocation
method was proposed, which uses hierarchies of spatial approximations to reduce
the overall computational complexity. Moreover, this approach utilizes, for
approximation in stochastic space, a sequence of multidimensional interpolants
of increasing fidelity, which can then be used for approximating statistics of the
solution. With the use of interpolating polynomials, the multilevel SCM [39] can
provide high-order surrogates featuring faster convergence rates.
21 Sparse Collocation Methods for Stochastic Interpolation and Quadrature 759
References
1. Babuška, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. 45, 1005–1034 (2007)
2. Babuška, I.M., Tempone, R., Zouraris, G.E.: Galerkin finite element approximations of stochastic elliptic partial differential equations. SIAM J. Numer. Anal. 42, 800–825 (2004) (electronic)
3. Babuška, I.M., Tempone, R., Zouraris, G.E.: Solving elliptic boundary value problems with uncertain coefficients by the finite element method: the stochastic formulation. Comput. Methods Appl. Mech. Eng. 194, 1251–1294 (2005)
4. Barth, A., Lang, A., Schwab, C.: Multilevel Monte Carlo method for parabolic stochastic partial differential equations. BIT Numer. Math. 53, 3–27 (2013)
5. Beck, J., Nobile, F., Tamellini, L., Tempone, R.: Stochastic spectral Galerkin and collocation methods for PDEs with random coefficients: a numerical comparison. Lect. Notes Comput. Sci. Eng. 76, 43–62 (2011)
6. Beck, J., Nobile, F., Tamellini, L., Tempone, R.: Convergence of quasi-optimal stochastic Galerkin methods for a class of PDEs with random coefficients. Comput. Math. Appl. 67, 732–751 (2014)
7. Beck, J., Tempone, R., Nobile, F., Tamellini, L.: On the optimal polynomial approximation of stochastic PDEs by Galerkin and collocation methods. Math. Models Methods Appl. Sci. 22, 1250023 (2012)
8. Beck, M., Robins, S.: Computing the Continuous Discretely: Integer-Point Enumeration in Polyhedra. Springer, New York (2007)
9. Białas-Cież, L., Calvi, J.-P.: Pseudo Leja sequences. Annali di Matematica Pura ed Applicata 191, 53–75 (2012)
10. Bieri, M., Andreev, R., Schwab, C.: Sparse tensor discretization of elliptic sPDEs. SIAM J. Sci. Comput. 31, 4281–4304 (2009)
11. Brenner, S.C., Scott, L.R.: The Mathematical Theory of Finite Element Methods. Springer, New York (1994)
12. Brutman, L.: On the Lebesgue function for polynomial interpolation. SIAM J. Numer. Anal. 15, 694–704 (1978)
13. Buffa, A., Maday, Y., Patera, A., Prud'homme, C., Turinici, G.: A priori convergence of the greedy algorithm for the parametrized reduced basis method. ESAIM: Math. Model. Numer. Anal. 46, 595–603 (2012)
14. Bungartz, H.-J., Griebel, M.: Sparse grids. Acta Numer. 13, 1–123 (2004)
15. Chkifa, A., Cohen, A., DeVore, R., Schwab, C.: Sparse adaptive Taylor approximation algorithms for parametric and stochastic elliptic PDEs. ESAIM: Math. Model. Numer. Anal. 47, 253–280 (2013)
16. Chkifa, A., Cohen, A., Schwab, C.: Breaking the curse of dimensionality in sparse polynomial approximation of parametric PDEs. J. Math. Pures Appl. 103, 400–428 (2015)
17. Chkifa, M.A.: On the Lebesgue constant of Leja sequences for the complex unit disk and of their real projection. J. Approx. Theory 166, 176–200 (2013)
18. Ciarlet, P.G.: The Finite Element Method for Elliptic Problems. North-Holland, New York (1978)
19. Clenshaw, C.W., Curtis, A.R.: A method for numerical integration on an automatic computer. Numer. Math. 2, 197–205 (1960)
20. Cliffe, K.A., Giles, M.B., Scheichl, R., Teckentrup, A.L.: Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Comput. Vis. Sci. 14, 3–15 (2011)
21. Cohen, A., DeVore, R., Schwab, C.: Convergence rates of best n-term Galerkin approximations for a class of elliptic SPDEs. Found. Comput. Math. 10, 615–646 (2010)
22. Cohen, A., DeVore, R., Schwab, C.: Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDEs. Anal. Appl. 9, 11–47 (2011)
23. DeVore, R.: Nonlinear approximation. Acta Numer. 7, 51–150 (1998)
24. DeVore, R.A., Lorentz, G.G.: Constructive Approximation. Volume 303 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, Berlin (1993)
25. Dexter, N., Webster, C., Zhang, G.: Explicit cost bounds of stochastic Galerkin approximations for parameterized PDEs with random coefficients. ArXiv:1507.05545 (2015)
26. Dzjadyk, V.K., Ivanov, V.V.: On asymptotics and estimates for the uniform norms of the Lagrange interpolation polynomials corresponding to the Chebyshev nodal points. Anal. Math. 9, 85–97 (1983)
27. Elman, H., Miller, C.: Stochastic collocation with kernel density estimation. Tech. Rep., Department of Computer Science, University of Maryland (2011)
28. Elman, H.C., Miller, C.W., Phipps, E.T., Tuminaro, R.S.: Assessment of collocation and Galerkin approaches to linear diffusion equations with random data. Int. J. Uncertain. Quantif. 1, 19–33 (2011)
29. Fishman, G.: Monte Carlo. Springer Series in Operations Research. Springer, New York (1996)
30. Frauenfelder, P., Schwab, C., Todor, R.A.: Finite elements for elliptic problems with stochastic coefficients. Comput. Methods Appl. Mech. Eng. 194, 205–228 (2005)
31. Galindo, D., Jantsch, P., Webster, C.G., Zhang, G.: Accelerating stochastic collocation methods for partial differential equations with random input data. Tech. Rep. ORNL/TM-2015/219, Oak Ridge National Laboratory (2015)
32. Ganapathysubramanian, B., Zabaras, N.: Sparse grid collocation schemes for stochastic natural convection problems. J. Comput. Phys. 225, 652–685 (2007)
33. Gentleman, W.M.: Implementing Clenshaw–Curtis quadrature, II: computing the cosine transformation. Commun. ACM 15, 343–346 (1972)
34. Gerstner, T., Griebel, M.: Numerical integration using sparse grids. Numer. Algorithms 18, 209–232 (1998)
35. Ghanem, R.G., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991)
36. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56, 607–617 (2008)
37. Griebel, M.: Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences. Computing 61, 151–179 (1998)
38. Gruber, P.: Convex and Discrete Geometry. Springer Grundlehren der mathematischen Wissenschaften (2007)
39. Gunzburger, M., Jantsch, P., Teckentrup, A., Webster, C.G.: A multilevel stochastic collocation method for partial differential equations with random input data. SIAM/ASA J. Uncertainty Quantification 3, 1046–1074 (2015)
40. Gunzburger, M., Webster, C.G., Zhang, G.: An adaptive wavelet stochastic collocation method for irregular solutions of partial differential equations with random input data. Lect. Notes Comput. Sci. Eng. 97, 137–170. Springer (2014)
41. Gunzburger, M.D., Webster, C.G., Zhang, G.: Stochastic finite element methods for partial differential equations with random input data. Acta Numer. 23, 521–650 (2014)
42. Hansen, M., Schwab, C.: Analytic regularity and nonlinear approximation of a class of parametric semilinear elliptic PDEs. Math. Nachr. 286, 832–860 (2013)
43. Hansen, M., Schwab, C.: Sparse adaptive approximation of high dimensional parametric initial value problems. Vietnam J. Math. 41, 181–215 (2013)
44. Hoang, V.H., Schwab, C.: Sparse tensor Galerkin discretizations for parametric and random parabolic PDEs: analytic regularity and generalized polynomial chaos approximation. SIAM J. Math. Anal. 45, 3050–3083 (2013)
45. Jakeman, J.D., Archibald, R., Xiu, D.: Characterization of discontinuities in high-dimensional stochastic problems on adaptive sparse grids. J. Comput. Phys. 230, 3977–3997 (2011)
46. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo methods for high-dimensional integration: the standard (weighted Hilbert space) setting and beyond. ANZIAM J. 53, 1–37 (2011)
47. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50, 3351–3374 (2012)
48. Li, C.F., Feng, Y.T., Owen, D.R.J., Li, D.F., Davis, I.M.: A Fourier–Karhunen–Loève discretization scheme for stationary random material properties in SFEM. Int. J. Numer. Methods Eng. 73, 1942–1965 (2007)
49. Loève, M.: Probability Theory. I. Graduate Texts in Mathematics, vol. 45, 4th edn. Springer, New York (1977)
50. Loève, M.: Probability Theory. II. Graduate Texts in Mathematics, vol. 46, 4th edn. Springer, New York (1978)
51. Ma, X., Zabaras, N.: An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. J. Comput. Phys. 228, 3084–3113 (2009)
52. Ma, X., Zabaras, N.: An adaptive high-dimensional stochastic model representation technique for the solution of stochastic partial differential equations. J. Comput. Phys. 229, 3884–3915 (2010)
53. Maday, Y., Nguyen, N., Patera, A., Pau, S.: A general multipurpose interpolation procedure: the magic points. Commun. Pure Appl. Anal. 8, 383–404 (2009)
54. Mathelin, L., Hussaini, M.Y., Zang, T.A.: Stochastic approaches to uncertainty quantification in CFD simulations. Numer. Algorithms 38, 209–236 (2005)
55. Matthies, H.G., Keese, A.: Galerkin methods for linear and nonlinear elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Eng. 194, 1295–1331 (2005)
56. Milani, R., Quarteroni, A., Rozza, G.: Reduced basis methods in linear elasticity with many parameters. Comput. Methods Appl. Mech. Eng. 197, 4812–4829 (2008)
57. Nobile, F., Tempone, R., Webster, C.G.: An anisotropic sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46, 2411–2442 (2008)
58. Nobile, F., Tempone, R., Webster, C.G.: A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46, 2309–2345 (2008)
59. Sauer, T., Xu, Y.: On multivariate Lagrange interpolation. Math. Comput. 64, 1147–1170 (1995)
60. Smith, S.J.: Lebesgue constants in polynomial interpolation. Annales Mathematicae et Informaticae 33, 109–123 (2006)
61. Smolyak, S.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR 4, 240–243 (1963) (English translation)
62. Stoyanov, M., Webster, C.G.: A gradient-based sampling approach for dimension reduction for partial differential equations with stochastic coefficients. Int. J. Uncertain. Quantif. 5, 49–72 (2015)
63. Sweldens, W.: The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl. Comput. Harmon. Anal. 3, 186–200 (1996)
64. Sweldens, W.: The lifting scheme: a construction of second generation wavelets. SIAM J. Math. Anal. 29, 511–546 (1998)
65. Todor, R.A.: Sparse perturbation algorithms for elliptic PDEs with stochastic data. Diss. No. 16192, ETH Zurich (2005)
66. Tran, H., Webster, C.G., Zhang, G.: Analysis of quasi-optimal polynomial approximations for parameterized PDEs with deterministic and stochastic coefficients. Tech. Rep. ORNL/TM-2015/341, Oak Ridge National Laboratory (2015)
67. Gunzburger, M., Jantsch, P., Teckentrup, A., Webster, C.G.: A multilevel stochastic collocation method for partial differential equations with random input data. Tech. Rep. ORNL/TM-2014/621, Oak Ridge National Laboratory (2014)
68. Trefethen, L.N.: Is Gauss quadrature better than Clenshaw–Curtis? SIAM Rev. 50, 67–87 (2008)
69. Webster, C.G.: Sparse grid stochastic collocation techniques for the numerical solution of partial differential equations with random input data. PhD thesis, Florida State University (2007)
70. Wiener, N.: The homogeneous chaos. Am. J. Math. 60, 897–936 (1938)
71. Xiu, D., Hesthaven, J.S.: High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. 27, 1118–1139 (2005)
72. Xiu, D., Karniadakis, G.E.: The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24, 619–644 (2002)
73. Zhang, G., Gunzburger, M.: Error analysis of a stochastic collocation method for parabolic partial differential equations with random input data. SIAM J. Numer. Anal. 50, 1922–1940 (2012)
74. Zhang, G., Webster, C., Gunzburger, M., Burkardt, J.: A hyper-spherical adaptive sparse-grid method for high-dimensional discontinuity detection. SIAM J. Numer. Anal. 53, 1508–1536 (2015)
22 Method of Distributions for Uncertainty Quantification
Daniel M. Tartakovsky and Pierre A. Gremaud
Abstract
Parametric uncertainty, considered broadly to include uncertainty in system
parameters and driving forces (source terms and initial and boundary condi-
tions), is ubiquitous in mathematical modeling. The method of distributions,
which comprises PDF and CDF methods, quantifies parametric uncertainty by
deriving deterministic equations for either probability density function (PDF) or
cumulative distribution function (CDF) of model outputs. Since it does not rely
on finite-term approximations (e.g., a truncated Karhunen–Loève transformation)
of random parameter fields, the method of distributions does not suffer from the
curse of dimensionality. On the contrary, it is exact for a class of nonlinear
hyperbolic equations whose coefficients lack spatiotemporal correlation, i.e.,
exhibit an infinite number of random dimensions.
Keywords
Random · Stochastic · Probability density function (PDF) · Cumulative distribution function (CDF) · Langevin equation · White noise · Colored noise · Fokker–Planck equation
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
1.1 Randomness in Mathematical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
1.2 Uncertainty Quantification in Langevin sODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
1.3 Uncertainty Quantification in PDEs with Random Coefficients . . . . . . . . . . . . . . . . 765
1 Introduction
The distinct challenges posed by the two types of randomness (Langevin forces and
uncertain parameters) are illustrated by the following two examples. The first is
provided by a Langevin (stochastic ordinary-differential) equation (sODE)
    du/dt = h(u, t) + g(u, t) ξ(t, ω).   (22.1a)

It describes the dynamics of the state variable u(t, ω), which consists of a slowly
varying (deterministic) part h and a fast-varying random part gξ; the random
fluctuations ξ(t, ω) have zero mean and a two-point covariance function

    C_ξ(t, s) ≡ ⟨ξ(t, ω) ξ(s, ω)⟩ = σ_ξ² ρ(t, s).   (22.1b)

Here ω is an element of a probability space (Ω, F, P), ⟨·⟩ ≡ E[·] denotes the
ensemble mean over this space, and σ_ξ² and ρ are the variance and correlation
function of ξ, respectively. At time t, the system's state is characterized by the
probability P[u(t) ≤ U] or, equivalently, by its probability density function (PDF)
f_u(U; t).
22 Method of Distributions for Uncertainty Quantification 765
The second example is the random hyperbolic equation

    ∂u/∂t + a ∂u/∂x = b H(u),   (22.2)

where the spatially varying and uncertain coefficients a and b are correlated random
fields a(x, ω) and b(x, ω). This equation is subject to deterministic or random initial
and boundary conditions. Rather than having a unique solution, this problem admits
an infinite set of solutions that is characterized by a PDF f_u(U; x, t).
Monte Carlo simulations (MCS) provide the most robust and straightforward way to
solve PDEs with random coefficients. In the case of (22.2), for instance, they consist
of (i) generating multiple realizations of the input parameters a and b, (ii) solving
deterministic PDEs for each realization, and (iii) evaluating ensemble statistics or
PDFs of these solutions. MCS do not impose limitations on statistical properties of
input parameters, entail no modifications of existing deterministic solvers, and are
ideal for parallel computing. Yet MCS have a slow convergence rate, which renders
them computationally expensive (often prohibitively so).
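Steps (i)–(iii) can be sketched on a toy version of (22.2) with b = 0 and a random advection speed, for which each realization has a closed-form solution (the distribution U(0.5, 1.5) and the initial datum are our illustrative choices, not the chapter's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy version of (22.2) with b = 0: du/dt + a du/dx = 0, u(x, 0) = sin(pi x),
# with random speed a ~ U(0.5, 1.5).  Each realization has the exact solution
# u = sin(pi (x - a t)), so step (ii) of MCS is trivial here.
x, t, M = 0.3, 0.4, 200_000
a = rng.uniform(0.5, 1.5, size=M)             # (i)   sample the random input
u = np.sin(np.pi * (x - a * t))               # (ii)  "solve" each realization
mean, var = float(u.mean()), float(u.var())   # (iii) ensemble statistics

# exact mean: average of sin(pi (x - a t)) over a in [0.5, 1.5]
exact = (np.cos(np.pi * (x - 1.5 * t)) - np.cos(np.pi * (x - 0.5 * t))) / (np.pi * t)
assert abs(mean - exact) < 5e-3               # O(M^{-1/2}) sampling error
assert var > 0.0
```

The slow O(M^{−1/2}) decay of the sampling error in the last assertion is precisely the cost the method of distributions seeks to avoid.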
766 D.M. Tartakovsky and P.A. Gremaud
2 Method of Distributions

    du/dt = G(u, t),   u(0, ω) = u₀(ω).   (22.3)

    E[δ(U − u)] = ∫_{−∞}^{+∞} δ(U − ũ) f_u(ũ; t) dũ = f_u(U; t).   (22.5)

    du/dt = −k u,   u(0) = u₀,   (22.7)
in which both the coefficient k(ω) and the initial condition u₀(ω) are random variables.
While (22.7) looks simple, the determination of the statistical properties of its
solution is a nonlinear problem. The need to deal with the mixed moment E[ku]
suggests defining the raw joint PDF of k(ω) and u(t, ω),

    π(U, K; t) = δ(U − u(t, ω)) δ(K − k(ω)).   (22.8)

Its ensemble mean is the joint PDF, E[π] = f_uk(U, K; t). Manipulations similar to
those used in the previous example lead to the PDF equation

    ∂f_uk/∂t = K ∂(U f_uk)/∂U.   (22.9)
768 D.M. Tartakovsky and P.A. Gremaud
Fig. 22.1 Expected value of the solution to (22.7) with k ∼ U(0, 1) and u₀ ∼ U(−1, 1). The
gPC approximations (3rd and 5th degree) only capture the solution for small times. The PDF
method is exact and yields E[u(t, ω)] = (1 − e^{−t})/t
This equation can be solved by the method of characteristics, and the marginal
f_u(U; t) is obtained by integration; see Fig. 22.1.
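A minimal numerical illustration of the exact-mean claim in Fig. 22.1: taking k ∼ U(0, 1) and, as a simplifying choice for this sketch, a deterministic initial condition u₀ = 1, averaging the exact solution e^{−kt} over k reproduces (1 − e^{−t})/t, and a Monte Carlo estimate converges to it:

```python
import numpy as np

# For du/dt = -k u with k ~ U(0, 1) and (in this sketch) deterministic u0 = 1,
# u(t) = exp(-k t), and averaging over k gives E[u(t)] = (1 - exp(-t)) / t,
# the exact curve that truncated gPC expansions lose at large times.
rng = np.random.default_rng(1)
k = rng.uniform(0.0, 1.0, size=500_000)

for t in (1.0, 10.0, 50.0):
    mc_mean = float(np.exp(-k * t).mean())
    exact = (1.0 - np.exp(-t)) / t
    assert abs(mc_mean - exact) < 2e-3    # sampling error only; the formula is exact
```

The PDF method produces this mean in closed form for all t, whereas a fixed polynomial basis in k degrades as the solution drifts.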
Numerical techniques such as gPC evolve approximations based on the statistical
properties of the data (here k and u₀). These approximations become increasingly
inappropriate under the stochastic drift generated by the nonlinear dynamics (see
Fig. 22.1); in stark contrast to gPC, the PDF method is exact. Methods such as
time-dependent and/or adaptive variants of gPC (see the relevant chapters of this
handbook) have been proposed to locally adapt the gPC polynomial bases; these
improvements only partially solve the above problem, at the price, however, of
significant numerical costs and complexity.
The derivation of equations (22.6) and (22.9) for the PDFs of the solutions
to (22.3) and (22.7), respectively, involves several delicate points that deserve
further scrutiny. While PDF equations have been derived through the use of
characteristic functions [14], we prefer a more explicit justification based on
regularization arguments. We use the study of systems of nonlinear ODEs to provide
a mathematical justification of the approach. Consider a set of state variables
u(t) = (u₁, …, u_N)ᵀ whose dynamics are governed by a system of coupled ODEs
subject to a random initial condition

    du_i/dt = G_i(u, t),   i = 1, …, N;   u(0, ω) = u₀(ω),   (22.10)
One can show (e.g., [8], Appendix C) that f_{u,ε} is a smooth approximation of f_u.
Using this fact, we show in the Appendix that for any φ ∈ C¹_c(ℝ^N × [0, ∞)), the
PDF f_u satisfies

    ∫₀^∞ ∫_{ℝ^N} f_u ∂_t φ dU dt + ∫₀^∞ ∫_{ℝ^N} G f_u · ∇_U φ dU dt + ∫_{ℝ^N} f_u(U; 0) φ(U, 0) dU = 0.   (22.12)

Equivalently, f_u satisfies, in the weak sense,

    ∂f_u/∂t + ∇_U · [G(U, t) f_u] = 0,   f_u(U, 0) = f₀(U),   (22.13)

where f₀(U) is the distribution corresponding to u₀. It is worthwhile emphasizing
that the PDF equation (22.13) is exact. Figure 22.2 illustrates this approach in the
case of the van der Pol equation

    d/dt (u₁, u₂)ᵀ = (u₂, μ(1 − u₁²) u₂ − u₁)ᵀ   (22.14a)
Fig. 22.2 Phase-space trajectory of the PDF of the solution to the van der Pol equation (22.14a)
with μ = 1.25, subject to the Gaussian initial condition (22.14b) with σ² = 0.01, at times t = 1.0,
2.0, 4.0, and 5.2; the solid green line is the trajectory of the mean solution, while the green dot
corresponds to the expected value at the considered time. The PDF undergoes many expansions
and contractions; this complex dynamics cannot be accurately described by a few moments of
the solution
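The exactness of the Liouville-type equation (22.13) can be checked numerically: along a characteristic of the van der Pol field, d(log f_u)/dt = −∇_U · G(U, t), so density times local phase-space area is conserved. The sketch below (our illustration; step sizes and tolerances are arbitrary choices) tracks a tiny triangle of initial conditions and compares its area growth against the predicted density decay:

```python
import numpy as np

mu = 1.25  # van der Pol parameter, matching Fig. 22.2

def G(u):
    # van der Pol vector field of (22.14a)
    return np.array([u[1], mu * (1.0 - u[0]**2) * u[1] - u[0]])

def div_G(u):
    # divergence of G with respect to u: mu * (1 - u1^2)
    return mu * (1.0 - u[0]**2)

def rk4_step(y, dt):
    k1 = G(y); k2 = G(y + 0.5 * dt * k1)
    k3 = G(y + 0.5 * dt * k2); k4 = G(y + dt * k3)
    return y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def flow(u0, T=2.0, n=4000):
    # integrate the characteristic and, via (22.13), d(log f)/dt = -div G
    u, logf, dt = u0.copy(), 0.0, T / n
    for _ in range(n):
        logf -= dt * div_G(u)
        u = rk4_step(u, dt)
    return u, logf

# Liouville check: the area growth of a tiny triangle of initial states must
# cancel the density decay along the central trajectory.
p0 = [np.array([1.0, 0.0]), np.array([1.0 + 1e-5, 0.0]), np.array([1.0, 1e-5])]
p1 = [flow(p)[0] for p in p0]
area = lambda a, b, c: 0.5 * abs((b[0]-a[0]) * (c[1]-a[1]) - (c[0]-a[0]) * (b[1]-a[1]))
growth = area(*p1) / area(*p0)
_, logf = flow(p0[0])
assert abs(growth * np.exp(logf) - 1.0) < 5e-2   # f * area is conserved
```

This conservation is exactly what low-order moment descriptions miss when the PDF repeatedly expands and contracts, as in Fig. 22.2.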
where H(·) is the Heaviside function. At time t, the ensemble mean E[Π] gives the
CDF F_u(U; t) of u(t, ω):

    E[H(U − u)] = ∫_{−∞}^{+∞} H(U − ũ) f_u(ũ; t) dũ = ∫_{−∞}^{U} f_u(ũ; t) dũ = F_u(U; t).   (22.16)

    ∂F_u/∂t + G(U, t) ∂F_u/∂U = 0,   F_u(U; 0) = F₀(U),   (22.17)
    ∂u/∂t + ∂G(u)/∂x = H(u, x, t),   u(x, 0, ω) = u₀(x, ω),   (22.18)

    ∂π/∂t = −∂[H(U, x, t) π]/∂U + (dG/dU)(∂u/∂x)(∂π/∂U).   (22.19)

    ∂f_u/∂t − (dG/dU) ∂/∂U E[(∂u/∂x) π] − (d²G/dU²) E[(∂u/∂x) π] = −∂[H(U, x, t) f_u]/∂U,   (22.20)
    ∂f_u/∂t + (dG/dU) ∂f_u/∂x + (d²G/dU²) ∂/∂x ∫_{−∞}^{U} f_u(Ũ; x, t) dŨ + ∂[H(U, x, t) f_u]/∂U = 0.   (22.21)
A CDF formulation can also be derived (through integration); it takes the simple
form

    ∂F_u/∂t + (dG/dU) ∂F_u/∂x + H(U, x, t) ∂F_u/∂U = 0.   (22.22)

Consider, for example, the case G(u) = u, i.e.,

    ∂u/∂t + ∂u/∂x = H(u, x, t),   u(x, 0) = u₀(ω).   (22.23)
Defining the raw PDF and CDF, π = δ(U − u(x, t, ω)) and Π = H(U − u(x, t, ω)),
and following the procedure described above yields Cauchy problems for the single-
point PDF and CDF of u,

    ∂f_u/∂t + ∂f_u/∂x + ∂[H(U, x, t) f_u]/∂U = 0   (22.24)

and

    ∂F_u/∂t + ∂F_u/∂x + H(U, x, t) ∂F_u/∂U = 0.   (22.25)
The above equations can be numerically solved to high accuracy with standard
methods. In fact, due to their linear structure, they can often be solved exactly
through a characteristic analysis; the PDF solution for H(u, x, t) ≡ u − u³ and
a standard Gaussian initial condition u₀(ω) is shown in Fig. 22.3.
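For the spatially constant case of Fig. 22.3, u obeys du/dt = u − u³, and the CDF is transported along characteristics: F_u(U; t) = F₀(U₀), with U₀ the backward characteristic through U. The flow happens to be available in closed form, which the sketch below (our illustration) uses and cross-checks against Monte Carlo:

```python
import numpy as np
from math import erf, exp, sqrt

def fwd(u0, t):
    # closed-form flow of du/dt = u - u^3: u(t) = u0 e^t / sqrt(1 + u0^2 (e^{2t} - 1))
    return u0 * np.exp(t) / np.sqrt(1.0 + u0**2 * (np.exp(2 * t) - 1.0))

F0 = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))   # N(0, 1) initial CDF

# CDF method: F_u(U; t) = F0(U0), U0 the backward characteristic through U
t, U = 1.0, 0.5
U0 = U * exp(-t) / sqrt(1.0 - U**2 + U**2 * exp(-2.0 * t))
F_cdf = F0(U0)

# Monte Carlo cross-check: push N(0, 1) samples through the same flow
rng = np.random.default_rng(2)
samples = fwd(rng.standard_normal(200_000), t)
assert abs(float((samples <= U).mean()) - F_cdf) < 5e-3
```

The characteristic solution is exact; the Monte Carlo estimate only matches it to within sampling error.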
Additional probabilistic information can be gained by similar means. For
instance, for an initial condition u₀ corresponding to a random field, the equation
for the two-point CDF of u can be derived in the same way.

Fig. 22.3 PDF of the solution to (22.23) for a spatially constant initial condition u₀ ∼ N(0, 1)
and for H(u, x, t) ≡ u − u³ (the solution u does not depend on x)

For the hyperbolic problem (22.2), the raw CDF satisfies

    ∂Π/∂t + a ∂Π/∂x = −b H(U, x, t) ∂Π/∂U,   Π(U; x, 0, ω) = H(U − u₀).   (22.27)
    ∫₀^t ∫_D (∂G/∂x_j)(F; x, t) E[v′_i(y) v′_j(x)] (∂F/∂y_i)(y, τ) dy dτ,   (22.28b)
The solutions to generic nonlinear hyperbolic balance laws such as (22.18) are
in general non-smooth and can present shocks even for smooth initial conditions.
The various types of probabilistic distributions describing such solutions are also
expected to not be smooth. As a result, the resolution of the evolution equations
describing such distributions (see, for instance, (22.21) and (22.22)) becomes
problematic. A way to incorporate shock dynamics in the method of distributions
(CDF equation) can be found in [31] and consists essentially in adapting the concept
of front tracking to the present framework (see [30] for a different approach in the
context of kinematic-wave equations).
Following [31], we illustrate the approach on a model of multiphase flow in
porous media given by the Buckley-Leverett equation,
    ∂u/∂t + q ∂G(u)/∂x = 0,   G = (u − s_wi)² / [(u − s_wi)² + (1 − s_oi − u)²].   (22.30)
The saturation value ahead of the front, u*, is constant along the characteristic curve
defined by

    dx/dt = v(u*) = q (dG/du)(u*),

    [G(u*) − G(u⁺)] / (u* − u⁺) = (dG/du)(u*).   (22.32)

Solving (22.32) gives u* and hence the location of the shock front x_f(t). The
continuous rarefaction solution u_r(x, t) ahead of the front is found by using the
method of characteristics in the range u* ≤ u_r ≤ 1 − s_oi. The complete solution is
given by [31]:

    u(x, t) = u_r(x, t) for 0 ≤ x < x_f(t),   u(x, t) = s_wi for x > x_f(t).   (22.33)
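The tangent construction (22.32) is easy to carry out numerically. The sketch below (our illustration) takes the hypothetical choice s_wi = s_oi = 0, for which the condition G(u*)/u* = dG/du(u*) has the closed-form root u* = 1/√2:

```python
import numpy as np

swi = soi = 0.0                      # residual saturations; zero is our illustrative choice

def G(u):
    # Buckley-Leverett fractional flow function of (22.30)
    return (u - swi)**2 / ((u - swi)**2 + (1.0 - soi - u)**2)

def dG(u, h=1e-7):
    # centered finite-difference derivative of G
    return (G(u + h) - G(u - h)) / (2.0 * h)

# Welge-tangent / Rankine-Hugoniot condition (22.32), with u+ = swi ahead of the front
u_plus = swi
cond = lambda u: (G(u) - G(u_plus)) / (u - u_plus) - dG(u)

# simple bisection for the post-shock saturation u*
a, b = 0.55, 0.99                    # cond(a) < 0 < cond(b) for this flux
for _ in range(200):
    m = 0.5 * (a + b)
    if cond(m) * cond(a) > 0.0:
        a = m
    else:
        b = m
u_star = 0.5 * (a + b)

assert abs(u_star - 1.0 / np.sqrt(2.0)) < 1e-6   # known closed form for swi = soi = 0
v_front = 1.0 * dG(u_star)                       # front speed q (dG/du)(u*), with q = 1
assert v_front > 0.0
```

With random q, repeating this construction per realization gives samples of the front position x_f, which is exactly the joint statistic entering (22.35).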
The raw CDF is subdivided into two parts, Π_a and Π_b, according to the
saturation solution (22.33):

    Π(U, x, t) = Π_a = H(U − s_wi)   for U < u*, x > x_f(t),
    Π(U, x, t) = Π_b = H(U − s_r)    for s_r < U, x < x_f(t),   (22.34a)

where s_r is determined by the rarefaction solution, with

    C(U, t) = ∫₀^t (dG/dU) q(τ) dτ.   (22.34c)
The ensemble average of (22.34) yields a formal expression for the CDF
F_u(U; x, t) [31],

    F_u = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} Π_a H(x − X_f) f⁺_{u,x_f}(U, X_f; x, t) dU dX_f   for U < u*,
    F_u = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} [Π_a H(x − X_f) f⁺_{u,x_f} + Π_b H(X_f − x) f⁻_{u,x_f}] dU dX_f   for U ≥ u*.   (22.35)

Here f_{u,x_f}(U, X_f; x, t) is the (unknown) joint PDF of u and x_f, with the superscripts
− and + indicating the parts of this PDF restricted to values of u before and after the
front, respectively. Since f_{u,x_f} dU dX_f = f_q dQ, this yields a computable solution
(see [31] for details)

    F_u(U; x, t) = ∫_{−∞}^{+∞} Π_a H(x − X_f(Q)) p_q dQ   for U < u*,
    F_u(U; x, t) = ∫_{−∞}^{+∞} [Π_a H(x − X_f(Q)) + Π_b H(X_f(Q) − x)] p_q dQ   for U ≥ u*.   (22.36)
The local nature in x and t of (22.21) and (22.22) is a direct result of the simple
dependency structure associated with (22.18): solutions can be constructed using one
family of characteristics. The situation is more involved for random hyperbolic
systems of balance laws such as

    ∂u/∂t + ∂G(u)/∂x = H(u, x, t).   (22.37)
An equation for the joint PDF f_u(U; x, t) can in principle be obtained as the
ensemble mean (over realizations of the random u_i, i = 1, …, N, at point x and time
t), i.e., ⟨π⟩ = f_u(U; x, t); a closed exact equation such as (22.21) or (22.22) in
the scalar case is, however, not available in general.

We illustrate this point on a simple example: the wave equation with constant
speed of propagation c > 0,

    ∂/∂t (v, w)ᵀ + [0  −c²; −1  0] ∂/∂x (v, w)ᵀ = (0, 0)ᵀ.   (22.40)
    ∂²u/∂t² − c² ∂²u/∂x² = 0,   (22.43)

with

    v = ∂u/∂t   and   w = ∂u/∂x,
which, again, is an equation that does not admit an elementary closure. This difficulty is
not specific to the wave equation: the nonlocal nature and self-interacting character
of most partial differential equations unfortunately preclude the existence of
pointwise equations for the PDFs of their solutions. The determination of PDF
equations for the solutions of random PDEs is an active area of research; three
approaches can be considered.

First, PDF equations can be obtained from the principles outlined above through
the use of closure approximations. Such approximations are typically application
dependent, and attention must be paid to the resulting accuracy. For instance, we show
below that the simple mean-field approximation, when applied to the wave equation,
leads to exact means but that the higher moments are not evolved exactly.
Second, using functional integral methods, sets of differential constraints satis-
fied by PDFs of random PDEs have been proposed [26]. We note however that this
approach may still require closure approximations and that, in general, the existence
of a set of differential constraints allowing the unique determination of a PDF is an
open question.
Third, the need to invoke a closure approximation can, in principle, be avoided
through discretization. Let's consider again the hyperbolic system of balance
laws (22.37). Spatial semi-discretization of (22.37) by appropriate methods such
as central schemes [15–18] leads to ODE systems akin to (22.10). A PDF approach
can then be applied to the semi-discrete problems following the lines of Sect. 2.
We note however that dimension reduction methods have to be considered for the
numerical resolution of the resulting PDF equations. Analyzing the accuracy of the
approach, i.e., the closeness of the PDF of solutions to a discretized problem to the
PDF of solutions to the original problem, is an open question.
For most applications corresponding to systems of balance laws such as (22.37),
the spatial domain is finite. This seemingly innocuous remark has profound conse-
quences. The imposition of boundary conditions requires a characteristic analysis.
Let B = (∂G_i/∂u_j) be the Jacobian of the system and let B = V Λ V⁻¹ be its
eigenvalue decomposition (which exists by the hyperbolicity assumption). Assuming
for now a linear system, i.e., a constant Jacobian and a problem defined on the half
line x > 0, the number of boundary conditions to be imposed at x D 0 is q, the
number of positive eigenvalues of B. The boundary condition is then imposed on
the characteristic variables with positive wave speed. If the problem is defined on
an interval, the process is simply repeated at the other end by considering again the
characteristics entering the computational domain there.
For nonlinear systems with smooth solutions, one can linearize the equations
about an appropriate local state (frozen coefficient method). (The situation is much
more involved for nonlinear problems admitting shock formation [3, 7, 11, 12].)
The back-and-forth transformations between characteristic variables (z = V⁻¹u)
and conserved variables (u) are trivial in a deterministic setting. In the present
framework, the unknowns are joint PDFs or marginals of random variables
that are not independent. Therefore, even linear transformations require additional,
detailed analysis.
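For the wave system (22.40) the characteristic analysis is explicit; the sketch below (our illustration, with an arbitrary c) computes the eigendecomposition B = VΛV⁻¹, counts the positive wave speeds to get the number q of boundary conditions at x = 0, and forms the characteristic variables z = V⁻¹u:

```python
import numpy as np

c = 2.0                                 # wave speed (arbitrary illustrative value)
B = np.array([[0.0, -c**2],
              [-1.0,  0.0]])            # Jacobian of the wave system (22.40)

lam, V = np.linalg.eig(B)               # B = V diag(lam) V^{-1}
assert np.allclose(sorted(lam.real), [-c, c])

q = int(np.sum(lam.real > 0))           # number of BCs to impose at x = 0
assert q == 1                           # one right-going characteristic enters the domain

# characteristic variables z = V^{-1} u decouple the (deterministic, linear) system
u = np.array([0.3, -0.7])
z = np.linalg.solve(V, u)
assert np.allclose(V @ z, u)
```

In the deterministic setting this is the whole story; in the probabilistic setting the same linear map V⁻¹ must be applied to a joint PDF of dependent variables, which is the analysis the text refers to.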
We now illustrate the effect of a closure approximation in the case of the wave
equation (22.40). The application of the simple approximation to (22.42) yields

    E[(∂π/∂W)(∂v/∂x)] ≈ E[∂π/∂W] E[∂v/∂x] = (∂E[v]/∂x)(∂f_vw/∂W),   (22.45)

where

    E[v] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} V* f_vw(V*, W*; x, t) dV* dW*,
and similarly for the other term in (22.42). This leads to the Fokker–Planck
equation (22.46) for the approximate joint PDF f̃_vw, whose first moments satisfy

    ∂Ẽ[v]/∂t − c² ∂Ẽ[w]/∂x = 0   and   ∂Ẽ[w]/∂t − ∂Ẽ[v]/∂x = 0,   (22.47)
where Ẽ denotes the expected value taken with respect to f̃_vw. On the other hand,
by taking directly the expectation of (22.40), we get

    ∂E[v]/∂t − c² ∂E[w]/∂x = 0   and   ∂E[w]/∂t − ∂E[v]/∂x = 0.   (22.48)

Subtracting the two systems equation by equation yields that the difference
between Ẽ[v] and E[v] is constant in time and can be made zero through consistent
initial conditions; a similar result holds for E[w]. Therefore (22.46) evolves the
means exactly. Similar arguments show that
    ∂/∂t (E[v²] − Ẽ[v²]) = 2 Cov(v, c² ∂w/∂x),   (22.49)

    ∂/∂t (E[w²] − Ẽ[w²]) = 2 Cov(w, ∂v/∂x),   (22.50)
and thus, in general, (22.46) does not evolve the higher-order moments exactly.
Figure 22.5 illustrates the case of the wave equation on the half space x > 0 with
random initial and boundary conditions. The retained closure is the simple mean
field approximation.
4 Conclusions
The method of distributions, which comprises PDF and CDF methods, quantifies
uncertainty in model parameters and driving forces (source terms and initial and
boundary conditions) by deriving deterministic equations for either probability
density function (PDF) or cumulative distribution function (CDF) of model outputs.
Since it does not rely on finite-term approximations (e.g., a truncated Karhunen–
Loève transformation) of random parameter fields, the method of distributions
does not suffer from the curse of dimensionality. On the contrary, it is exact for
a class of nonlinear hyperbolic equations whose coefficients lack spatiotemporal
correlation.
Appendix

It follows from (22.11) and the fact that u solves (22.10) that, for any
φ ∈ C¹_c(ℝ^N × [0, ∞)),

    I = ∫₀^∞ ∫_{ℝ^N} ∫_{ℝ^N} δ′_ε(U − ũ) G(ũ, t) f_u(ũ; t) φ(U, t) dU dũ dt − ∫_{ℝ^N} f_{u,ε}(U; 0) φ(U, 0) dU.

Rewriting the first term as a convolution,

    I = − ∫₀^∞ ∫_{ℝ^N} (δ_ε ⋆ G f_u)(U, t) · ∇_U φ(U, t) dU dt − ∫_{ℝ^N} f_{u,ε}(U; 0) φ(U, 0) dU.

Letting ε → 0 then yields (22.12).
References
1. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems. The Clarendon Press/Oxford University Press, Oxford/New York (2000)
2. Arnold, L.: Random Dynamical Systems. Springer, Berlin/New York (1998)
3. Benabdallah, A., Serre, D.: Problèmes aux limites pour des systèmes hyperboliques non linéaires de deux équations à une dimension d'espace. C. R. Acad. Sci. Paris 305, 677–680 (1986)
4. Bharucha-Reid, A.T. (ed.): Probabilistic Methods in Applied Mathematics. Academic, New York (1968)
5. Boso, F., Broyda, S.V., Tartakovsky, D.M.: Cumulative distribution function solutions of advection-reaction equations with uncertain parameters. Proc. R. Soc. A 470(2166), 20140189 (2014)
6. Broyda, S., Dentz, M., Tartakovsky, D.M.: Probability density functions for advective-reactive transport in radial flow. Stoch. Environ. Res. Risk Assess. 24(7), 985–992 (2010)
7. Dubois, F., LeFloch, P.: Boundary conditions for nonlinear hyperbolic systems. J. Differ. Equ. 71, 93–122 (1988)
8. Evans, L.C.: Partial Differential Equations, 2nd edn. AMS, Providence (2010)
9. Fox, R.F.: Functional calculus approach to stochastic differential equations. Phys. Rev. A 33, 467–476 (1986)
10. Fuentes, M.A., Wio, H.S., Toral, R.: Effective Markovian approximation for non-Gaussian noises: a path integral approach. Physica A 303(1–2), 91–104 (2002)
11. Gisclon, M.: Étude des conditions aux limites pour un système strictement hyperbolique via l'approximation parabolique. PhD thesis, Université Claude Bernard, Lyon I (France) (1994)
12. Gisclon, M., Serre, D.: Étude des conditions aux limites pour un système strictement hyperbolique via l'approximation parabolique. C. R. Acad. Sci. Paris 319, 377–382 (1994)
13. Hänggi, P., Jung, P.: Advances in Chemical Physics, chapter Colored Noise in Dynamical Systems, pp. 239–326. John Wiley & Sons, New York (1995)
14. Kozin, F.: On the probability densities of the output of some random systems. Trans. ASME Ser. E J. Appl. Mech. 28, 161–164 (1961)
15. Kurganov, A., Lin, C.-T.: On the reduction of numerical dissipation in central-upwind schemes. Commun. Comput. Phys. 2, 141–163 (2007)
16. Kurganov, A., Noelle, S., Petrova, G.: Semi-discrete central-upwind scheme for hyperbolic conservation laws and Hamilton–Jacobi equations. SIAM J. Sci. Comput. 23, 707–740 (2001)
17. Kurganov, A., Petrova, G.: A third order semi-discrete genuinely multidimensional central scheme for hyperbolic conservation laws and related problems. Numer. Math. 88, 683–729 (2001)
18. Kurganov, A., Tadmor, E.: New high-resolution central schemes for nonlinear conservation laws and convection-diffusion equations. J. Comput. Phys. 160, 241–282 (2000)
19. Lichtner, P.C., Tartakovsky, D.M.: Stochastic analysis of effective rate constant for heterogeneous reactions. Stoch. Environ. Res. Risk Assess. 17(6), 419–429 (2003)
20. Lindenberg, K., West, B.J.: The Nonequilibrium Statistical Mechanics of Open and Closed Systems. VCH Publishers, New York (1990)
21. Lundgren, T.S.: Distribution functions in the statistical theory of turbulence. Phys. Fluids 10(5), 969–975 (1967)
22. Risken, H.: The Fokker-Planck Equation: Methods of Solution and Applications, 2nd edn. Springer, Berlin/New York (1989)
23. Tartakovsky, D.M., Broyda, S.: PDF equations for advective-reactive transport in heterogeneous porous media with uncertain properties. J. Contam. Hydrol. 120–121, 129–140 (2011)
24. Tartakovsky, D.M., Dentz, M., Lichtner, P.C.: Probability density functions for advective-reactive transport in porous media with uncertain reaction rates. Water Resour. Res. 45, W07414 (2009)
25. Venturi, D., Karniadakis, G.E.: New evolution equations for the joint response-excitation probability density function of stochastic solutions to first-order nonlinear PDEs. J. Comput. Phys. 231(21), 7450–7474 (2012)
26. Venturi, D., Karniadakis, G.E.: Differential constraints for the probability density function of stochastic solutions to wave equation. Int. J. Uncertain. Quant. 2, 195–213 (2012)
27. Venturi, D., Sapsis, T.P., Cho, H., Karniadakis, G.E.: A computable evolution equation for the joint response-excitation probability density function of stochastic dynamical systems. Proc. R. Soc. A 468(2139), 759–783 (2012)
28. Venturi, D., Tartakovsky, D.M., Tartakovsky, A.M., Karniadakis, G.E.: Exact PDF equations and closure approximations for advective-reactive transport. J. Comput. Phys. 243, 323–343 (2013)
29. Wang, P., Tartakovsky, A.M., Tartakovsky, D.M.: Probability density function method for Langevin equations with colored noise. Phys. Rev. Lett. 110(14), 140602 (2013)
30. Wang, P., Tartakovsky, D.M.: Uncertainty quantification in kinematic-wave models. J. Comput. Phys. 231(23), 7868–7880 (2012)
31. Wang, P., Tartakovsky, D.M., Jarman, K.D., Jr., Tartakovsky, A.M.: CDF solutions of Buckley–Leverett equation with uncertain parameters. Multiscale Model. Simul. 11(1), 118–133 (2013)
32. Xiu, D., Karniadakis, G.E.: The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24, 619–644 (2002)
Sampling via Measure Transport: An
Introduction 23
Youssef Marzouk, Tarek Moselhy, Matthew Parno, and
Alessio Spantini
Abstract
We present the fundamentals of a measure transport approach to sampling. The
idea is to construct a deterministic coupling, i.e., a transport map, between
a complex target probability measure of interest and a simpler reference
measure. Given a transport map, one can generate arbitrarily many independent
and unweighted samples from the target simply by pushing forward reference
samples through the map. If the map is endowed with a triangular structure, one
can also easily generate samples from conditionals of the target measure. We
consider two different and complementary scenarios: first, when only evaluations
of the unnormalized target density are available and, second, when the target
distribution is known only through a finite collection of samples. We show
that in both settings, the desired transports can be characterized as the
solutions of variational problems. We then address practical issues associated with
the optimization-based construction of transports: choosing finite-dimensional
parameterizations of the map, enforcing monotonicity, quantifying the error
of approximate transports, and refining approximate transports by enriching
Y. Marzouk ()
Massachusetts Institute of Technology, Cambridge, MA, USA
e-mail: ymarz@mit.edu
T. Moselhy
D. E. Shaw Group, New York, NY, USA
e-mail: tmoselhy@mit.edu
M. Parno
Massachusetts Institute of Technology, Cambridge, MA, USA
U. S. Army Cold Regions Research and Engineering Laboratory, Hanover, NH, USA
e-mail: matthew.d.parno@usace.army.mil; mparno@mit.edu
A. Spantini
Massachusetts Institute of Technology, Cambridge, MA, USA
e-mail: spantini@mit.edu
Keywords
Measure transport · Optimal transport · Knothe–Rosenblatt map · Monte Carlo methods · Bayesian inference · Approximate Bayesian computation · Density estimation · Convex optimization
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 786
2 Transport Maps and Optimal Transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
3 Direct Transport: Constructing Maps from Unnormalized Densities . . . . . . . . . . . . . . . . . 790
3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790
3.2 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791
3.3 Convergence, Bias, and Approximate Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794
4 Inverse Transport: Constructing Maps from Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797
4.1 Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797
4.2 Convexity and Separability of the Optimization Problem . . . . . . . . . . . . . . . . . . . . . 799
4.3 Computing the Inverse Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 801
5 Parameterization of Transport Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803
5.1 Polynomial Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803
5.2 Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805
5.3 Monotonicity Constraints and Monotone Parameterizations . . . . . . . . . . . . . . . . . . . 805
6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 806
7 Conditional Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 810
8 Example: Biochemical Oxygen Demand Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 813
8.1 Inverse Transport: Map from Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814
8.2 Direct Transport: Map from Densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
9 Conclusions and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 819
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 822
1 Introduction
no further appeals to ν_tar are needed. A transport map can also be used to devise
deterministic sampling approaches, i.e., quadratures for nonstandard measures ν_tar,
based on quadratures for the reference measure ν_ref. Going further, if the transport
map is endowed with an appropriate structure, it can enable direct simulation from
particular conditionals of ν_tar. (We will describe this last point in more detail later.)
The potential to accomplish all of these tasks using measure transport is the
launching point for this chapter. We will present a variational approach to the
construction of transport maps, i.e., characterizing the desired maps as the solutions
of particular optimization problems. We will also discuss the parameterization of
transport maps, a challenging task since maps are, in general, high-dimensional
multivariate functions, which ultimately must be approximated in finite-dimensional
spaces. Because maps are sought via optimization, standard tools for assessing
convergence can be used. In particular, it will be useful to quantify the error incurred
when the map is not exact, i.e., when we have only T_♯ν_ref ≈ ν_tar, and to develop
strategies for refining the map parameterization in order to reduce this error. More
broadly, it will be useful to understand how the structure of the transport map (e.g.,
sparsity and other low-dimensional features) depends on the properties of the target
distribution and how this structure can be exploited to construct and represent maps
more efficiently.
In discussing these issues, we will focus on two classes of map construction
problems: (P1) constructing transport maps given the ability to evaluate only the
unnormalized probability density of the target distribution and (P2) constructing
transport maps given only samples from a distribution of interest, but no explicit
density. These problems are frequently motivated by Bayesian inference, though
their applicability is more general. In the Bayesian context, the first problem
corresponds to the typical setup where one can evaluate the unnormalized density
of the Bayesian posterior. The second problem can arise when one has samples
from the joint distribution of parameters and observations and wishes to condition
the former on a particular realization of the latter. This situation is related to
approximate Bayesian computation (ABC) [9, 52], and here our ultimate goal is
conditional simulation.
This chapter will present the basic formulations employed in the transport map
framework and illustrate them with simple numerical examples. First, in Sect. 2, we
will recall foundational notions in measure transport and optimal transportation.
Then we will present variational formulations corresponding to Problems P1 and P2
above. In particular, Sect. 3 will explore the details of Problem P1: constructing
transport maps from density evaluations. Section 4 will explore Problem P2:
sample-based map construction. Section 5 then discusses useful finite-dimensional
parameterizations of transport maps. After presenting these formulations, we will
describe connections between the transport map framework and other work in
Sect. 6. In Sect. 7 we describe how to simulate from certain conditionals of the
target measure using a map, and in Sect. 8 we illustrate the framework on a Bayesian
inference problem, including some comparisons with MCMC. Because this area is
rapidly developing, this chapter will not attempt to capture all of the latest efforts
and extensions. Instead, we will provide the reader with pointers to current work in
Sect. 9, along with a summary of important open issues.
c(x, z) = \sum_{i=1}^{n} t^{\,i-1}\, |x_i - z_i|^2, \qquad t > 0, \qquad (23.3)
then [18] and [13] show that the optimal map becomes lower triangular as t → 0.
Lower triangular maps take the form
T(x) = \begin{bmatrix} T^1(x_1) \\ T^2(x_1, x_2) \\ \vdots \\ T^n(x_1, x_2, \ldots, x_n) \end{bmatrix} \qquad \forall\, x = (x_1, \ldots, x_n) \in \mathbb{R}^n, \qquad (23.4)
where T^i represents output i of the map. Importantly, when ν_tar and ν_ref are
absolutely continuous, a unique lower triangular map satisfying (23.1) exists; this
map is exactly the Knothe–Rosenblatt rearrangement [13, 18, 70].
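The Gaussian case makes the triangular structure of (23.4) concrete: when both reference and target are Gaussian, the Knothe–Rosenblatt rearrangement is linear, given by a mean shift plus the lower triangular Cholesky factor of the target covariance. A minimal numerical sketch, assuming NumPy (the particular mean and covariance are illustrative, not from the chapter):

```python
import numpy as np

# Knothe-Rosenblatt map from a standard Gaussian reference to a correlated
# Gaussian target.  For Gaussian measures the triangular map in (23.4) is
# linear: because L is lower triangular, T^1 depends on x1 only and T^2 on
# (x1, x2).
rho = 0.8
mean = np.array([1.0, -2.0])
cov = np.array([[1.0, rho], [rho, 1.0]])
L = np.linalg.cholesky(cov)        # lower triangular factor of the covariance

def T(x):
    return mean + x @ L.T          # rows of x are reference samples

rng = np.random.default_rng(0)
x_ref = rng.standard_normal((200_000, 2))   # independent reference samples
z_tar = T(x_ref)                            # pushforward: target samples

print(np.round(z_tar.mean(axis=0), 2))      # close to the target mean
print(np.round(np.cov(z_tar.T), 2))         # close to the target covariance
```

Once the map is in hand, additional unweighted target samples cost only a matrix multiply per batch, which is the practical appeal of the approach.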
The numerical computation of the optimal transport map, for generic measures
on Rn , is a challenging task that is often restricted to very low dimensions
[4, 10, 38, 50]. Fortunately, in the context of stochastic simulation and Bayesian
inference, we are not particularly concerned with the optimality aspect of the
transport; we just need to push forward the reference measure to the target measure.
Thus we will focus on transports that are easy to compute, but that do not necessarily
satisfy an optimality criterion based on transport cost. Triangular transports will
thus be of particular interest to us. The triangular structure will make constructing
transport maps feasible (see Sects. 3 and 4), conditional sampling straightforward
(see Sect. 7), and map inversion efficient (see Sect. 4.3). Accordingly, we will
require the transport map to be (lower) triangular and search for a transformation
that satisfies (23.1). The optimization problems arising from this formulation are
described in the next two sections.
In this section we show how to construct a transport map that pushes forward a
reference measure to the target measure when only evaluations of the unnormalized
target density are available. This is a central task in Bayesian inference, where the
target is the posterior measure.
3.1 Preliminaries
We assume that both target and reference measures are absolutely continuous with
respect to the Lebesgue measure on R^n. Let π and η be, respectively, the normalized
target and reference densities with respect to the Lebesgue measure. In what follows,
and for the sake of simplicity, we assume that both π and η are smooth, strictly
positive functions on their support. We seek a diffeomorphism T (a smooth function
with smooth inverse) that pushes forward the reference to the target measure,

T_♯η := η ∘ T^{-1} |det ∇T^{-1}|, \qquad (23.6)
where ∇T^{-1} denotes the Jacobian of the inverse of the map. (Recall that the
Jacobian determinant det ∇T^{-1} is equal to 1/(det ∇T ∘ T^{-1}).) As noted in the
introduction, if (x^i)_i are independent samples from η, then (T(x^i))_i are independent
samples from T_♯η. (Here and throughout the chapter, we use the notation (x^i)_i as a
shorthand for (x^i)_{i=1}^{M} to denote a collection (x^1, …, x^M) whenever the definition
of the cardinality M is either unimportant or possibly infinite.) Hence, if we find a
transport map T that satisfies T_♯η = π, then (T(x^i))_i will be independent samples
from the target distribution. In particular, the change of variables formula

\int g(x)\, \pi(x)\, \mathrm{d}x = \int g\!\left( T(x) \right) \eta(x)\, \mathrm{d}x \qquad (23.7)

holds for any integrable real-valued function g on R^n [1]. The map therefore allows
for direct computation of posterior expectations.
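In one dimension, (23.7) is just the familiar inverse-CDF method. A small sketch, assuming NumPy: the map T(x) = −log(1 − x) pushes Uniform(0,1) forward to Exp(1), so expectations under the target are obtained by averaging g ∘ T over reference samples.

```python
import numpy as np

# Change of variables (23.7): an expectation under the target is computed by
# pushing reference samples through the map.  Reference: Uniform(0,1).
# Target: Exp(1).  T(x) = -log(1 - x) is the inverse-CDF transport.
rng = np.random.default_rng(1)
x = rng.uniform(size=500_000)       # reference samples
z = -np.log1p(-x)                   # pushforward: Exp(1) samples

# E_pi[g] for g(z) = z and g(z) = z^2, via the right-hand side of (23.7)
print(z.mean())        # approx 1.0 (mean of Exp(1))
print((z**2).mean())   # approx 2.0 (second moment of Exp(1))
```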
is a valid transport map that pushes forward the reference to the target measure.¹ In
fact, any global minimizer of (23.8) achieves the minimum cost D_KL(T_♯η ∥ π) = 0
and implies that T_♯η = π. The constraint det ∇T > 0 ensures that the pushforward
density T_♯η is strictly positive on the support of the target. In particular, the
constraint det ∇T > 0 ensures that the KL divergence evaluates to finite values
over T, and it does not rule out any useful transport map since we assume that both
target and reference densities are positive. The existence of global minimizers of
(23.8) is a standard result in the theory of deterministic couplings between random
variables [85].
¹ See [61] for a discussion of the asymptotic equivalence of the KL divergence and the Hellinger
distance in the context of transport maps.
where T_Δ is now the vector space of smooth triangular maps. The constraint
∇T > 0 (i.e., ∂_k T^k > 0 for each k) suffices to enforce invertibility of a feasible
triangular map. Equation (23.9) is a far better behaved optimization problem than the
original formulation (23.8). Hence, for the rest of this section, we will focus on the
computation of a Knothe–Rosenblatt rearrangement by solving (23.9). Recall that our
goal is just to compute a transport map from ν_ref to ν_tar. If there are multiple
transports, we can opt for the easiest one to compute. A possible drawback of a
triangular transport is that the complexity of a parameterization of the map depends
on the ordering of the input variables. This dependence motivates questions of what
the best ordering is, or how to find at least a good ordering. We refer the reader to
[75] for an in-depth discussion of this topic. For the computation of general
non-triangular transports, see [61]; for a generalization of the framework to
compositions of maps, see [61, 63]; and for the computation of optimal transports,
see, for instance, [4, 10, 50, 85].
Now let π̄ denote any unnormalized version of the target density. For any map T
in the feasible set of (23.9), the objective function can be written as:
² The lexicographic order on R^n is defined as follows. For x, y ∈ R^n, we define x ⪯ y if and only if
either x = y or the first nonzero coordinate in y − x is positive [32]. ⪯ is a total order on R^n. Thus,
we define T to be a monotone increasing function if and only if x ⪯ y implies T(x) ⪯ T(y).
Notice that monotonicity can be defined with respect to any order on R^n (e.g., ⪯ need not be the
lexicographic order). There is no natural order on R^n except when n = 1. It is easy to verify that,
for a triangular function T, monotonicity with respect to the lexicographic order is equivalent to
the following: the kth component of T is a monotone function of the kth input variable.
Notice that we can evaluate the objective of (23.11) given only the unnormalized
density π̄ and a way to compute the integral E_η. There exists a host of techniques to
approximate the integral with respect to the reference measure, including quadrature
and cubature formulas, sparse quadratures, Monte Carlo methods, and quasi-Monte
Carlo (QMC) methods. The choice between these methods is typically dictated
by the dimension of the reference space. In any case, the reference measure is
usually chosen so that the integral with respect to η can be approximated easily
and accurately. For instance, if ν_ref is a standard Gaussian measure, then we
can generate arbitrarily many independent samples to yield an approximation of
E_η to any desired accuracy. This will be a crucial difference relative to the
sample-based construction of the map described in Sect. 4, where samples from the
target distribution are required to accurately solve the corresponding optimization
problem.
Equation (23.11) is a linearly constrained nonlinear differentiable optimization
problem. It is also non-convex unless, for instance, the target density is log concave
[41]. That said, many statistical models have log-concave posterior distributions
and hence yield convex map optimization problems; consider, for example, a log-
Gaussian Cox process [34]. All n components of the map have to be computed
simultaneously, and each evaluation of the objective function of (23.11) requires
an evaluation of the unnormalized target density. The latter is also the minimum
requirement for alternative sampling techniques such as MCMC. Of course,
the use of derivatives in the context of the optimization problem is crucial for
computational efficiency, especially for high-dimensional parameterizations of the
map. The same can be said of the state-of-the-art MCMC algorithms that use
gradient or Hessian information from the log-target density to yield better proposal
distributions, such as Langevin or Hamiltonian MCMC [34]. In the present context,
we advocate the use of quasi-Newton (e.g., BFGS) or Newton methods [90] to solve
(23.11). These methods must be paired with a finite-dimensional parameterization
of the map; in other words, we must solve (23.11) over a finite-dimensional
space T4h T4 of triangular diffeomorphisms. In Sect. 5 we will discuss
various choices for T4h along with ways of enforcing the monotonicity constraint
rT 0.
D_{\mathrm{KL}}\!\left( T_\sharp \eta \,\|\, \pi \right) \approx \frac{1}{2}\, \mathrm{Var}_\eta\!\left[ \log \eta - \log\!\left( T_\sharp^{-1} \bar{\pi} \right) \right] \qquad (23.12)
\min_{T \in \mathcal{T}_\Delta^h} \; \frac{1}{M} \sum_{i=1}^{M} \left[ -\log \bar{\pi}\!\left( T(x^i) \right) - \sum_{k=1}^{n} \log \partial_k T^k(x^i) \right] \qquad (23.14)
kD1
estimator for E g since Q is not the target density. It turns out that this bias can be
bounded as:
q
kE g EQ gk C.g; ; /
Q DKL . T] jj / (23.15)
p 1
where C.g; ; / Q :D 2 E kgk2 C EQ kgk2 2 . The proof of this result is in
[79, Lemma 6.37] together with a similar result for the approximation of the second
moments. Note that the KL divergence on the right-hand side of (23.15) is exactly
the quantity we minimize in (23.11) during the computation of a transport map, and
it can easily be estimated using (23.12). Thus the transport map framework allows
a systematic control of the bias resulting from estimation of E g by means of an
approximate map T e.
In practice, the mean-square error in approximating E_π g using E_π̃ g will be
entirely due to the bias described in (23.15). The reason is that a Monte Carlo
estimator of E_π̃ g can be constructed to have virtually no variance: one can cheaply
generate an arbitrary number of independent and unweighted samples from π̃ using
the approximate map. Hence the approximate transport map yields an essentially
zero-variance, biased estimator of the quantity of interest E_π g. This property
should be contrasted with MCMC methods, which, while asymptotically unbiased,
yield estimators of E_π g that have nonzero variance and bias for finite sample size.
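The variance diagnostic (23.12) can be checked on a case where the KL divergence is available in closed form. In the sketch below (assuming NumPy), the exact map from a standard Gaussian reference to the target N(3, 2²) is T(x) = 3 + 2x; we evaluate the diagnostic for the deliberately imperfect map T̃(x) = 3 + 1.9x and compare it with the exact Gaussian KL divergence.

```python
import numpy as np

# Estimate D_KL(T~_# eta || pi) for an approximate map via the variance
# diagnostic (23.12), using only the unnormalized target density.
rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)        # reference samples

b = 1.9                                   # approximate slope (exact map: b = 2)
log_eta = -0.5 * x**2                     # log reference density, up to a constant
# log pullback of the unnormalized target through T~(x) = 3 + b*x:
# log pi_bar(T~(x)) + log |T~'(x)|  (additive constants drop inside Var)
log_pullback = -0.5 * ((b * x) / 2.0) ** 2 + np.log(b)

kl_estimate = 0.5 * np.var(log_eta - log_pullback)
# Exact KL between N(3, b^2) and N(3, 2^2), for comparison
kl_exact = np.log(2.0 / b) + (b**2 - 2.0**2) / (2 * 2.0**2)
print(round(kl_estimate, 4), round(kl_exact, 4))   # both approx 2.5e-3
```

The diagnostic is an approximation whose accuracy improves as T̃ approaches the true map; with a cruder slope such as b = 1.5 the two numbers separate visibly.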
If one is not content with the bias associated with an approximate map, then there
are at least two ways to proceed. First, one can simply refine the approximation
parameters (M, h) to improve the current approximation of the transport map. On
the other hand, it is straightforward to apply any classical sampling technique (e.g.,
MCMC, importance sampling) to the pullback density T̃_♯^{-1}π. This density
is defined as

T̃_♯^{-1}π := π ∘ T̃ |det ∇T̃|, \qquad (23.16)
There is clearly a trade-off between the computational cost associated with
constructing a more accurate transport map and the cost of correcting an approximate
map by applying an exact sampling scheme to the pullback. Focusing on the former,
it is natural to ask how to refine the finite-dimensional approximation space T_Δ^h so
that it can better capture the true map T. Depending on the problem at hand, a naïve
finite-dimensional parameterization of the map might require a very large number of
degrees of freedom before reaching a satisfactory approximation. This is particularly
true when the parameter dimension n is large, and it is a challenge shared by any
function approximation algorithm (e.g., high-dimensional regression). We will revisit
this issue in Sect. 9, but the essential way forward is to realize that the structure of the
target distribution is reflected in the structure of the transport map. For instance,
conditional independence in the target distribution yields certain kinds of sparsity in the
transport, which can be exploited when solving the optimization problems above.
Many Bayesian inference problems also contain low-rank structure that causes the
map to depart from the identity only on a low-dimensional subspace of R^n. From the
optimization perspective, adaptivity can also be driven by a systematic analysis of
the first variation of the KL divergence D_KL(T_♯η ∥ π) as a function of T ∈ T_Δ.
We focus on the inverse triangular transport because it can be computed via convex
optimization given samples from the target distribution. It is easy to see that the
monotone increasing Knothe–Rosenblatt rearrangement that pushes forward ν_tar to
ν_ref is the unique minimizer of
where E_π denotes integration with respect to the target measure, and where C
is once again a term independent of the transport map and thus a constant for
the purposes of optimization. The resulting optimization problem is a stochastic
program given by:
\min_{S} \; \frac{1}{M} \sum_{i=1}^{M} \left[ -\log \eta\!\left( S(z^i) \right) - \log \det \nabla S(z^i) \right] \qquad (23.21)
Note that (23.21) is a convex optimization problem as long as the reference density
is log concave [41]. Since the reference density is a degree of freedom of the
problem, it can always be chosen to be log concave. Thus, (23.21) can be a
convex optimization problem regardless of the particular structure of the target.
This is a major difference from the density-based construction of the map in Sect. 3,
where the corresponding optimization problem (23.11) is convex only under certain
conditions on the target (e.g., that the target density be log concave).
For smooth reference densities, the objective function of (23.21) is also smooth.
Moreover, its gradients do not involve derivatives of the log-target density which
might require expensive adjoint calculations when the target density contains a PDE
model. Indeed, the objective function of (23.21) does not contain the target density
at all! This feature should be contrasted with the density-based construction of the
map, where the objective function of (23.11) depends explicitly on the log-target
density. Moreover, if the reference density can be written as the product of its
marginals, then (23.21) is a separable optimization problem, i.e., each component
of the inverse transport can be computed independently and in parallel.
As a concrete example, let the reference measure be standard Gaussian. In this
case, (23.21) can be written as
\min_{S} \; \frac{1}{M} \sum_{i=1}^{M} \sum_{k=1}^{n} \left[ \frac{1}{2} \left( S^k(z^i) \right)^2 - \log \partial_k S^k(z^i) \right] \qquad (23.22)
are separable: the kth component of the inverse transport can be computed as the
solution of a single convex optimization problem,
\min_{S^k} \; \frac{1}{M} \sum_{i=1}^{M} \left[ \frac{1}{2} \left( S^k(z^i) \right)^2 - \log \partial_k S^k(z^i) \right] \qquad (23.23)
D_{\mathrm{KL}}\!\left( S_\sharp \pi \,\|\, \eta \right) \approx \frac{1}{2}\, \mathrm{Var}_\pi\!\left[ \log \bar{\pi} - \log\!\left( S_\sharp^{-1} \eta \right) \right] \qquad (23.24)
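A one-dimensional instance of the separable problem (23.23), assuming NumPy and SciPy: given samples from an unknown target (here drawn from N(5, 0.5²) so the answer can be checked), fit S(z) = a + exp(c)·z so that S pushes the samples toward a standard Gaussian reference. Note that the objective never touches the target density, only the samples; the minimizer is the standardizing map S(z) ≈ (z − mean)/std.

```python
import numpy as np
from scipy.optimize import minimize

# Map-from-samples, one dimension: fit the inverse transport S by minimizing
# (23.23).  With S(z) = a + exp(c) * z, monotonicity (S' > 0) is automatic and
# log S'(z) = c.
rng = np.random.default_rng(3)
z = rng.normal(loc=5.0, scale=0.5, size=20_000)   # samples from the target

def objective(theta):
    a, c = theta
    Sz = a + np.exp(c) * z
    return np.mean(0.5 * Sz**2) - c    # (1/M) sum [ S(z)^2/2 - log S'(z) ]

res = minimize(objective, x0=np.zeros(2), method="BFGS")
a_opt, b_opt = res.x[0], np.exp(res.x[1])
# The fitted map standardizes the samples: S(z) = (z - mean) / std
print(round(b_opt, 2))                 # close to 1/0.5 = 2.0
print(round(a_opt + b_opt * 5.0, 2))   # close to 0.0
```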
Up to now, we have shown how to compute the triangular inverse transport S via
convex optimization given samples from the target distribution. In many problems of
interest, however, the goal is to evaluate the direct transport T, i.e., a map that pushes
forward the reference to the target measure. Clearly, the following relationship
between the direct and inverse transports holds: T = S^{-1}. Thus, evaluating
z = T(x) amounts to solving the triangular system of equations S(z) = x,
where the kth component of S is just a function of the first k input variables. This
system is in general nonlinear, but we can devise a simple recursion in k to compute
each component of z as

z_k = \left( S^k(z_1, \ldots, z_{k-1}, \,\cdot\,) \right)^{-1}(x_k), \qquad k = 1, \ldots, n, \qquad (23.27)

where each scalar equation S^k(z_1, …, z_{k−1}, z_k) = x_k admits a
unique real solution for each k and any given x. Here, one can use any off-the-shelf
root-finding algorithm.³ Whenever the transport is high-dimensional (e.g., hundreds
or thousands of components), this recursive approach might become inaccurate, as it
is sequential in nature. In this case, we recommend running a few Newton iterations
of the form

z ← z − (∇S(z))^{-1} \left( S(z) − x \right) \qquad (23.28)

to clean up the approximation of the root z obtained from the recursive algorithm
(23.27).
An alternative way to evaluate the direct transport is to build a parametric
representation of T itself via standard regression techniques. In particular, if
{z_1, …, z_M} are samples from the target distribution, then {x_1, …, x_M}, with
x_k := S(z_k) for k = 1, …, M, are samples from the reference distribution. Note
that there is a one-to-one correspondence between target and reference samples.
Thus, we can use these pairs of samples to define a simple constrained least-squares
problem to approximate the direct transport as:
\min_{T} \; \sum_{k=1}^{M} \sum_{i=1}^{n} \left( T^i(x_k) - z_k^i \right)^2 \qquad (23.29)

\min_{T^i} \; \sum_{k=1}^{M} \left( T^i(x_k) - z_k^i \right)^2 \qquad (23.30)
³ Roots can be found using, for instance, Newton's method. When a component of the inverse
transport is parameterized using polynomials, however, a more robust root-finding approach
is to use a bisection method based on Sturm sequences (e.g., [63]).
As noted in the previous sections, the optimization problems that one solves to
obtain either the direct or inverse transport must, at some point, introduce
discretization. In particular, we must define finite-dimensional approximation spaces (e.g.,
T_Δ^h) within which we search for a best map. In this section we describe several
useful choices for T_Δ^h and the associated map parameterizations. Closely related
to the map parameterization is the question of how to enforce the monotonicity
constraints ∂_k T^k > 0 or ∂_k S^k > 0 over the support of the reference and target
densities, respectively. For some parameterizations, we will explicitly introduce
discretizations of these monotonicity constraints. A different map parameterization,
discussed in Sect. 5.3, will satisfy these monotonicity conditions automatically.
For simplicity, we will present the parameterizations below mostly in the context
of the direct transport T. But these parameterizations can be used interchangeably
for both the direct and inverse transports.
\psi_{\mathbf{j}}(x) = \prod_{i=1}^{n} \varphi_{j_i}(x_i), \qquad (23.31)
Each component of the map can then be written as an expansion in these basis
functions,

T^k(x) = \sum_{\mathbf{j} \in J_k} \gamma_{k,\mathbf{j}}\, \psi_{\mathbf{j}}(x), \qquad k = 1, \ldots, n, \qquad (23.32)

where J_k is a set of multi-indices defining the polynomial terms in the expansion for
dimension k and γ_{k,j} ∈ R is a scalar coefficient. Importantly, the proper choice of
each multi-index set J_k will force T to be lower triangular. For instance, a standard
choice of J_k involves restricting each map component to a total-degree polynomial
space:

J_k^{TO} = \{ \mathbf{j} : \|\mathbf{j}\|_1 \le p \;\wedge\; j_i = 0, \; \forall i > k \}, \qquad k = 1, \ldots, n. \qquad (23.33)

The first constraint in this set, ‖j‖₁ ≤ p, limits the total degree of each polynomial
to p, while the second constraint, j_i = 0 for all i > k, forces T to be lower triangular.
Expansions built using J_k^{TO} are quite expressive, in the sense of being able
to capture complex nonlinear dependencies in the target measure. However, the
number of terms in J_k^{TO} grows rapidly with k and p. A smaller multi-index set
can be obtained by removing all the mixed terms in the basis:

J_k^{NM} = \{ \mathbf{j} : \|\mathbf{j}\|_1 \le p \;\wedge\; j_i j_\ell = 0, \; \forall i \ne \ell \;\wedge\; j_i = 0, \; \forall i > k \}.

An even more parsimonious option is to use diagonal maps, via the multi-index sets

J_k^{D} = \{ \mathbf{j} : \|\mathbf{j}\|_1 \le p \;\wedge\; j_i = 0, \; \forall i \ne k \}.
Figure 23.1 illustrates the difference between these three sets for p = 3 and k = 2.

Fig. 23.1 Visualization of multi-index sets for the second component of a two-dimensional map,
T^2(x_1, x_2). In this case, j_1 is the degree of a basis polynomial in x_1 and j_2 is the degree in x_2. A
filled circle indicates that a term is present in the set of multi-indices
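The three multi-index sets can be generated mechanically from their defining constraints. A small sketch in pure Python (the function name is ours, not the chapter's):

```python
from itertools import product

# Build the total-order (TO), no-mixed-terms (NM), and diagonal (D)
# multi-index sets for map component T^k in n dimensions, total degree p.
def multi_index_sets(n, k, p):
    J_TO, J_NM, J_D = [], [], []
    for j in product(range(p + 1), repeat=n):
        # enforce ||j||_1 <= p and j_i = 0 for i > k (1-based indexing)
        if sum(j) > p or any(j[i] != 0 for i in range(k, n)):
            continue
        J_TO.append(j)                                 # total-order set
        if sum(1 for ji in j if ji > 0) <= 1:
            J_NM.append(j)                             # no mixed terms
        if all(ji == 0 for i, ji in enumerate(j) if i != k - 1):
            J_D.append(j)                              # diagonal: only x_k
    return J_TO, J_NM, J_D

J_TO, J_NM, J_D = multi_index_sets(n=2, k=2, p=3)
print(len(J_TO), len(J_NM), len(J_D))   # 10 7 4
```

For n = 2, k = 2, p = 3, the definitions above yield 10 total-order terms, 7 terms without mixed products, and 4 diagonal terms, matching the setting of Fig. 23.1.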
T^k(x) = a_{k,0} + \sum_{j=1}^{k} a_{k,j}\, x_j + \sum_{j=1}^{P_k} b_{k,j}\, \phi_j(x_1, x_2, \ldots, x_k;\, \bar{x}_{k,j}), \qquad k = 1, \ldots, n, \qquad (23.34)
where P_k is the total number of radial basis functions used for the kth component of
the map and φ_j(x_1, x_2, …, x_k; x̄_{k,j}) is a radial basis function centered at x̄_{k,j} ∈ R^k.
Note that this representation ensures that the overall map T is lower triangular. The
a and b coefficients can then be exposed to the optimization algorithm used to search
for the map.
Choosing the centers and scales of the radial basis functions can be challenging
in high dimensions, though some heuristics for doing so are given in [63]. To
circumvent this difficulty, [63] also proposes using only univariate radial basis
functions and embedding them within a composition of maps.
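As an illustration of (23.34), the following sketch evaluates a single map component combining a linear part with Gaussian radial basis functions. The Gaussian kernel, the coefficient values, and the centers are illustrative assumptions, not a prescription from the chapter:

```python
import numpy as np

def rbf(x, center, scale=1.0):
    """Gaussian radial basis function on R^k (one common kernel choice)."""
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * scale ** 2))

def map_component(x, a, b, centers):
    """Evaluate T^k(x) = a_0 + sum_j a_j x_j + sum_j b_j phi_j(x; xbar_j).
    `x` holds the first k inputs only, so T^k never sees x_{k+1},...,x_n,
    which keeps the overall map lower triangular."""
    linear = a[0] + np.dot(a[1:], x)
    return linear + sum(bj * rbf(x, c) for bj, c in zip(b, centers))

# Second component of a 2-D map: k = 2, two RBF terms (illustrative values).
a = np.array([0.1, 0.0, 1.0])                         # a_{2,0}, a_{2,1}, a_{2,2}
b = np.array([0.5, -0.3])                             # b_{2,1}, b_{2,2}
centers = [np.array([0.0, 0.0]), np.array([1.0, -1.0])]
print(map_component(np.array([0.2, 0.4]), a, b, centers))
```

Because each component receives only the first $k$ inputs, stacking such components for $k = 1, \ldots, n$ automatically yields a lower triangular map.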
Neither the polynomial representation (23.32) nor the radial basis function representation (23.34) yields monotone maps for all values of the coefficients. With either of these choices for the approximation space $\mathcal{T}_h^{\triangle}$, we need to enforce the monotonicity constraints explicitly. Recall that, for the triangular maps considered here, the monotonicity constraint reduces to requiring that $\partial_k T^k > 0$ over the entire support of the reference density, for $k = 1, \ldots, n$. It is difficult to enforce this condition everywhere, so instead we choose a finite sequence of points $(x_i)_i$, a
806 Y. Marzouk et al.
stream of samples from the reference distribution, very often the same samples used for the sample-average approximation of the objective (23.14), and enforce local monotonicity at each point: $\partial_k T^k(x_i) > 0$, for $k = 1, \ldots, n$. The result is a finite set of linear constraints. Collectively these constraints are weaker than requiring monotonicity everywhere, but as the cardinality of the sequence $(x_i)_i$ grows, we have stronger guarantees on the monotonicity of the transport map over the entire support of the reference density. When monotonicity is lost, it is typically only in the tails of the reference density, where samples are fewer. We should also point out that the $-\log \det \nabla T$ term in the objective of (23.11) acts as a barrier function for the constraint $\nabla T \succ 0$ [68].
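The discretized constraint can be checked mechanically: evaluate $\partial_k T^k$ at each sample point and flag any non-positive value. A minimal sketch, using a central finite difference and illustrative toy components (in practice the derivative of a polynomial or RBF expansion is available in closed form):

```python
import numpy as np

def local_monotonicity_ok(T_k, k, points, eps=1e-6):
    """Check d/dx_k T^k(x_i) > 0 at each sample x_i via central differences."""
    for x in points:
        xp, xm = x.copy(), x.copy()
        xp[k - 1] += eps
        xm[k - 1] -= eps
        if (T_k(xp) - T_k(xm)) / (2 * eps) <= 0:
            return False
    return True

rng = np.random.default_rng(0)
samples = rng.standard_normal((200, 2))          # reference samples, as in (23.14)

# Globally monotone component: derivative 1 + 0.1 sech^2(x_2) >= 1 everywhere.
T_good = lambda x: x[1] + 0.1 * np.tanh(x[1])
# Component that loses monotonicity in the tail: derivative 1 - 0.4 x_2.
T_bad = lambda x: x[1] - 0.2 * x[1] ** 2

print(local_monotonicity_ok(T_good, 2, samples))               # True
print(local_monotonicity_ok(T_bad, 2, [np.array([0.0, 3.0])])) # False: tail point
```

As the text notes, violations such as the second case tend to occur only in the tails, where reference samples are sparse.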
A more elegant alternative to discretizing and explicitly enforcing the monotonicity constraints is to employ parameterizations of the map that are in fact guaranteed to be monotone [12]. Here we take advantage of the fact that monotonicity of a triangular function can be expressed in terms of one-dimensional monotonicity of its components. A smooth monotone increasing function of one variable, e.g., the first component of the lower triangular map, can be written as [66]:

$$T^1(x_1) = a_1 + \int_0^{x_1} \exp\big(b_1(w)\big)\, dw, \tag{23.35}$$

and the $k$th component of the map admits the analogous representation

$$T^k(\mathbf{x}) = a_k(x_1, \ldots, x_{k-1}) + \int_0^{x_k} \exp\big(b_k(x_1, \ldots, x_{k-1}, w)\big)\, dw,$$

where $a_k$ is not a function of the $k$th input variable. Of course, we now have to pick a finite-dimensional parameterization of the functions $a_k, b_k$, but the monotonicity constraint is automatically enforced since

$$\partial_k T^k(\mathbf{x}) = \exp\big(b_k(x_1, \ldots, x_k)\big) > 0$$

for all $\mathbf{x}$ and for all choices of $a_k$ and $b_k$. Enforcing monotonicity through the map parameterization in this way, i.e., choosing $\mathcal{T}_h^{\triangle}$ so that it only contains monotone lower triangular functions, allows the resulting finite-dimensional optimization problem to be unconstrained.
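The construction (23.35) can be exercised numerically: whatever coefficient function $b_1$ is chosen, the resulting $T^1$ is increasing, since the integrand $\exp(b_1(w))$ is strictly positive. A midpoint-quadrature sketch (the particular $b_1$ is an arbitrary illustration):

```python
import numpy as np

def monotone_T1(x1, a1, b1, n_quad=200):
    """T^1(x1) = a1 + integral_0^{x1} exp(b1(w)) dw, via the midpoint rule.
    Monotone increasing for ANY constant a1 and ANY function b1, because
    the integrand exp(b1(w)) is strictly positive."""
    w = (np.arange(n_quad) + 0.5) / n_quad * x1   # midpoints between 0 and x1
    return a1 + (x1 / n_quad) * np.sum(np.exp(b1(w)))

b1 = lambda w: -0.5 * w + np.sin(w)   # deliberately non-monotone coefficient
xs = np.linspace(-3.0, 3.0, 25)
vals = [monotone_T1(x, a1=0.7, b1=b1) for x in xs]
print(all(v2 > v1 for v1, v2 in zip(vals, vals[1:])))  # True: always increasing
```

Note that $T^1(0) = a_1$ exactly, so $a_1$ simply shifts the map without affecting monotonicity.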
6 Related Work
log .z/
N C D log .x/; (23.39)
7 Conditional Sampling
In this section we will show how the triangular structure of the transport map allows efficient sampling from particular conditionals of the target density. This capability is important because, in general, the ability to sample from a distribution does not necessarily provide efficient techniques for also sampling its conditionals. As in the previous sections, assume that the reference and target measures are absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^n$, with smooth and positive densities. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a triangular and monotone increasing transport that pushes forward the reference density $\rho$ to the target density $\pi$, i.e., $T_\sharp\, \rho = \pi$, where $T$ is the Knothe–Rosenblatt rearrangement.

We first need to introduce some additional notation. There is no loss of generality in thinking of the target as the joint distribution of some random vector $Z$ in $\mathbb{R}^n$. Consider a partition of this random vector as $Z = (D, \Theta)$, where $D \in \mathbb{R}^{n_d}$, $\Theta \in \mathbb{R}^{n_\theta}$, and $n = n_d + n_\theta$. In other words, $D$ simply comprises the first $n_d$ components of $Z$. We equivalently denote the joint density of $Z = (D, \Theta)$ by either $\pi$ or $\pi_{D,\Theta}$; that is, $\pi \equiv \pi_{D,\Theta}$. We define the conditional density of $\Theta$ given $D$ as

$$\pi_{\Theta|D} := \frac{\pi_{D,\Theta}}{\pi_D}, \tag{23.40}$$

where $\pi_D := \int \pi_{D,\Theta}(\cdot, \theta)\, d\theta$ is the marginal density of $D$. In particular, $\pi_{\Theta|D}(\theta|d)$ is the conditional density of $\Theta$ at $\theta \in \mathbb{R}^{n_\theta}$ given the event $\{D = d\}$. Finally, we define $\pi_{\Theta|D=d}$ as a map from $\mathbb{R}^{n_\theta}$ to $\mathbb{R}$ such that

$$\pi_{\Theta|D=d}(\theta) := \pi_{\Theta|D}(\theta|d). \tag{23.41}$$
Fig. 23.2 (left) Illustration of a two-dimensional joint density $\pi_{D,\Theta}$ together with a particular slice at $d = 0$. (right) Conditional density $\pi_{\Theta|D}(\theta|0)$ obtained from a normalized slice of the joint density at $d = 0$
We can think of $\pi_{\Theta|D=d}$ as a particular normalized slice of the joint density $\pi_{D,\Theta}$ for $D = d$, as shown in Fig. 23.2. Our goal is to show how the triangular transport map $T$ can be used to efficiently sample the conditional density $\pi_{\Theta|D=d}$.

If $T$ is a monotone increasing lower triangular transport on $\mathbb{R}^n$, we can denote its components by

$$T(\mathbf{x}) := T(\mathbf{x}_D, \mathbf{x}_\Theta) = \begin{bmatrix} T^D(\mathbf{x}_D) \\ T^\Theta(\mathbf{x}_D, \mathbf{x}_\Theta) \end{bmatrix}, \qquad \forall\, \mathbf{x} \in \mathbb{R}^n. \tag{23.42}$$

Moreover, assume that the reference density can be written as the product of its marginals; that is, $\rho(\mathbf{x}) = \rho_D(\mathbf{x}_D)\, \rho_\Theta(\mathbf{x}_\Theta)$ for all $\mathbf{x} = (\mathbf{x}_D, \mathbf{x}_\Theta)$ in $\mathbb{R}^n$, with marginal densities $\rho_D$ and $\rho_\Theta$. This hypothesis is not restrictive, as the reference density is a degree of freedom of the problem (e.g., $\rho$ is often a standard normal density).

Lemma 1. For a fixed $d \in \mathbb{R}^{n_d}$, define $\mathbf{x}^\star_d$ as the unique element of $\mathbb{R}^{n_d}$ such that $T^D(\mathbf{x}^\star_d) = d$. Then the map $T_d : \mathbb{R}^{n_\theta} \to \mathbb{R}^{n_\theta}$, defined as

$$\mathbf{w} \mapsto T^\Theta(\mathbf{x}^\star_d, \mathbf{w}), \tag{23.43}$$

pushes forward the marginal reference density $\rho_\Theta$ to the conditional density $\pi_{\Theta|D=d}$.
Proof. First of all, notice that $\mathbf{x}^\star_d := (T^D)^{-1}(d)$ is well defined, since $T^D : \mathbb{R}^{n_d} \to \mathbb{R}^{n_d}$ is a monotone increasing and invertible function by definition of the Knothe–Rosenblatt rearrangement $T$. Then:

$$T_d^\sharp\, \pi_{\Theta|D=d}(\mathbf{w}) = \pi_{\Theta|D}\big(T_d(\mathbf{w})\,\big|\,d\big)\, \big|\det \nabla T_d(\mathbf{w})\big| = \frac{\pi_{D,\Theta}\big(d,\, T^\Theta(\mathbf{x}^\star_d, \mathbf{w})\big)}{\pi_D(d)}\, \det \nabla_{\mathbf{w}} T^\Theta(\mathbf{x}^\star_d, \mathbf{w}). \tag{23.44}$$

Substituting $d = T^D(\mathbf{x}^\star_d)$ and using $\pi_D\big(T^D(\mathbf{x}^\star_d)\big)\, \det \nabla T^D(\mathbf{x}^\star_d) = (T^D)^\sharp\, \pi_D(\mathbf{x}^\star_d)$ yields

$$T_d^\sharp\, \pi_{\Theta|D=d}(\mathbf{w}) = \frac{\pi_{D,\Theta}\big(T^D(\mathbf{x}^\star_d),\, T^\Theta(\mathbf{x}^\star_d, \mathbf{w})\big)}{\pi_D\big(T^D(\mathbf{x}^\star_d)\big)}\, \det \nabla_{\mathbf{w}} T^\Theta(\mathbf{x}^\star_d, \mathbf{w}) = \frac{\pi_{D,\Theta}\big(T^D(\mathbf{x}^\star_d),\, T^\Theta(\mathbf{x}^\star_d, \mathbf{w})\big)\, \det \nabla T^D(\mathbf{x}^\star_d)\, \det \nabla_{\mathbf{w}} T^\Theta(\mathbf{x}^\star_d, \mathbf{w})}{(T^D)^\sharp\, \pi_D(\mathbf{x}^\star_d)}$$

$$= \frac{T^\sharp\, \pi_{D,\Theta}(\mathbf{x}^\star_d, \mathbf{w})}{\rho_D(\mathbf{x}^\star_d)} = \frac{\rho(\mathbf{x}^\star_d, \mathbf{w})}{\rho_D(\mathbf{x}^\star_d)} = \rho_\Theta(\mathbf{w}),$$

where we used the identity $(T^D)_\sharp\, \rho_D = \pi_D$, which follows from the definition of the Knothe–Rosenblatt rearrangement (e.g., [85]). $\square$
We can interpret the content of Lemma 1 in the context of Bayesian inference. If we observe a particular realization of the data, i.e., $D = d$, then we can easily sample the posterior distribution $\pi_{\Theta|D=d}$ as follows. First, solve the nonlinear triangular system $T^D(\mathbf{x}^\star_d) = d$ to get $\mathbf{x}^\star_d$. Since $T^D$ is a lower triangular and invertible map, one can solve this system using the techniques described in Sect. 4.3. Then, define a new map $T_d : \mathbb{R}^{n_\theta} \to \mathbb{R}^{n_\theta}$ as $T_d(\mathbf{w}) := T^\Theta(\mathbf{x}^\star_d, \mathbf{w})$ for all $\mathbf{w} \in \mathbb{R}^{n_\theta}$, and notice that the pushforward through this map of the marginal distribution of the reference over the parameters, i.e., $(T_d)_\sharp\, \rho_\Theta$, is precisely the desired posterior distribution.
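Lemma 1 can be verified end to end for a bivariate Gaussian target, where the Knothe–Rosenblatt rearrangement from a standard normal reference is available in closed form. The sketch below (illustrative values) solves $T^D(x^\star_d) = d$, builds $T_d$, and recovers the exact Gaussian conditional:

```python
import numpy as np

# Target: (D, Theta) jointly Gaussian with standard marginals and correlation r.
# The Knothe-Rosenblatt map from a standard normal reference is then
#   T^D(x_D) = x_D,    T^Theta(x_D, x_Theta) = r x_D + sqrt(1 - r^2) x_Theta.
r, d = 0.8, 1.5

# Step 1: solve T^D(x*) = d (trivial here; in general a triangular solve).
x_star = d

# Step 2: push reference marginal samples through T_d(w) = T^Theta(x*, w).
rng = np.random.default_rng(1)
w = rng.standard_normal(200_000)
theta = r * x_star + np.sqrt(1.0 - r ** 2) * w

# The exact conditional is N(r d, 1 - r^2) = N(1.2, 0.36):
print(theta.mean())  # ~ 1.2
print(theta.var())   # ~ 0.36
```

Since the pushed-forward samples match the known Gaussian conditional, the map $T_d$ is indeed sampling $\pi_{\Theta|D=d}$, as the lemma asserts.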
a transport $T_{\tilde{d}}$ according to (23.43). Moreover, note that the particular triangular structure of the transport map $T$ is essential to achieving efficient sampling from the conditional $\pi_{\Theta|D=d}$ in the manner described by Lemma 1.
$$A \sim \mathcal{U}(0.4,\ 1.2), \qquad B \sim \mathcal{U}(0.01,\ 0.31). \tag{23.46}$$

Instead of inferring $A$ and $B$ directly, we choose to invert for some new target parameters, $\theta_1$ and $\theta_2$, that are related to the original parameters through the CDF of a standard normal distribution:

$$A := 0.4 + 0.4\left(1 + \operatorname{erf}\!\left(\frac{\theta_1}{\sqrt{2}}\right)\right), \tag{23.47}$$

$$B := 0.01 + 0.15\left(1 + \operatorname{erf}\!\left(\frac{\theta_2}{\sqrt{2}}\right)\right). \tag{23.48}$$

Notice that these transformations are invertible. Moreover, the resulting prior marginal distributions over the target parameters $\theta_1$ and $\theta_2$ are given by

$$\theta_1 \sim \mathcal{N}(0, 1), \qquad \theta_2 \sim \mathcal{N}(0, 1). \tag{23.49}$$
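The reparameterization (23.47) and (23.48) can be sanity-checked by simulation: pushing standard normal draws through the error-function transform must land inside the uniform ranges (23.46). A sketch:

```python
import numpy as np
from math import erf, sqrt

def to_physical(theta1, theta2):
    """Map standard-normal parameters to the uniform priors (23.46):
    (1 + erf(t / sqrt(2))) / 2 is the standard normal CDF, so
    A ~ U(0.4, 1.2) and B ~ U(0.01, 0.31) when theta1, theta2 ~ N(0, 1)."""
    A = 0.4 + 0.4 * (1.0 + erf(theta1 / sqrt(2.0)))
    B = 0.01 + 0.15 * (1.0 + erf(theta2 / sqrt(2.0)))
    return A, B

rng = np.random.default_rng(2)
AB = np.array([to_physical(t1, t2) for t1, t2 in rng.standard_normal((10_000, 2))])
print(AB[:, 0].min() > 0.4 and AB[:, 0].max() < 1.2)    # True: A in (0.4, 1.2)
print(AB[:, 1].min() > 0.01 and AB[:, 1].max() < 0.31)  # True: B in (0.01, 0.31)
```

Working with $(\theta_1, \theta_2)$ instead of $(A, B)$ lets the transport use a standard normal reference while respecting the bounded uniform priors.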
Table 23.1 BOD problem of Sect. 8.1 via inverse transport and conditioning. First four moments of the conditional density $\pi_{\Theta|D=d}$, for $d = (0.18, 0.32, 0.42, 0.49, 0.54)$, estimated by a reference run of an adaptive Metropolis sampler with $6 \times 10^6$ steps, and by transport maps up to degree $p = 7$ with $3 \times 10^4$ samples

| Map type | # training samples | Mean $\theta_1$ | Mean $\theta_2$ | Variance $\theta_1$ | Variance $\theta_2$ | Skewness $\theta_1$ | Skewness $\theta_2$ | Kurtosis $\theta_1$ | Kurtosis $\theta_2$ |
|---|---|---|---|---|---|---|---|---|---|
| MCMC truth | – | 0.075 | 0.875 | 0.190 | 0.397 | 1.935 | 0.681 | 8.537 | 3.437 |
| $p = 1$ | 5000 | 0.199 | 0.717 | 0.692 | 0.365 | 0.005 | 0.010 | 2.992 | 3.050 |
| $p = 1$ | 50,000 | 0.204 | 0.718 | 0.669 | 0.348 | 0.016 | 0.006 | 3.019 | 3.001 |
| $p = 3$ | 5000 | 0.066 | 0.865 | 0.304 | 0.537 | 0.909 | 0.718 | 4.042 | 3.282 |
| $p = 3$ | 50,000 | 0.040 | 0.870 | 0.293 | 0.471 | 0.830 | 0.574 | 3.813 | 3.069 |
| $p = 5$ | 5000 | 0.027 | 0.888 | 0.200 | 0.447 | 1.428 | 0.840 | 5.662 | 3.584 |
| $p = 5$ | 50,000 | 0.018 | 0.907 | 0.213 | 0.478 | 1.461 | 0.843 | 6.390 | 3.606 |
| $p = 7$ | 5000 | 0.090 | 0.908 | 0.180 | 0.490 | 2.968 | 0.707 | 29.589 | 16.303 |
| $p = 7$ | 50,000 | 0.034 | 0.902 | 0.206 | 0.457 | 1.628 | 0.872 | 7.568 | 3.876 |
reference adaptive Metropolis MCMC scheme [37]. While more efficient MCMC algorithms exist, the adaptive Metropolis algorithm is well known and enables a qualitative comparison of the computational cost of the transport approach to that of a widely used and standard method. The MCMC sampler was tuned to have an acceptance rate of 26%. The chain was run for $6 \times 10^6$ steps, $2 \times 10^4$ of which were discarded as burn-in. The moments of the approximate conditional density $\pi_{\Theta|D=d}$ given by each computed transport are estimated using $3 \times 10^4$ independent samples generated from the conditional map. The accuracy comparison in Table 23.1 shows that the cubic map captures the mean and variance of $\pi_{\Theta|D=d}$ but does not accurately capture the higher moments. Increasing the map degree, together with the number of target samples, yields better estimates of these moments. For instance, degree-seven maps constructed with $5 \times 10^4$ target samples can reproduce the skewness and kurtosis of the conditional density reasonably well. Kernel density estimates of the two-dimensional conditional density $\pi_{\Theta|D=d}$ using 50,000 samples are also shown in Fig. 23.3 for different orders of the computed transport. The degree-seven map gives results that are nearly identical to the MCMC reference computation.
In a typical application of Bayesian inference, we can regard the time required to compute an approximate inverse transport $\tilde{S}$ and a corresponding approximate direct transport $\tilde{T}$ as offline time. This is the expensive step of the computations, but it is independent of the observed data. The online time is that required to generate samples from the conditional distribution $\pi_{\Theta|D=d}$ when a new realization of the data $\{D = d\}$ becomes available. The online step is computationally inexpensive since it requires, essentially, only the solution of a single nonlinear triangular system of the dimension of the data (see Lemma 1).
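The online solve exploits the triangular structure: the $k$th equation involves only $x_1, \ldots, x_k$, so the system unrolls into a sequence of one-dimensional root-finding problems. A bisection-based sketch with illustrative map components (Newton's method, as in Sect. 4.3, would be the practical choice):

```python
def solve_triangular(T_components, d, lo=-10.0, hi=10.0, tol=1e-12):
    """Solve T^D(x) = d for a monotone increasing lower triangular map:
    component k depends only on x_1..x_k, so each equation is a 1-D
    root-finding problem in x_k given the already-computed x_1..x_{k-1}."""
    x = []
    for T_k, d_k in zip(T_components, d):
        a, b = lo, hi
        while b - a > tol:
            m = 0.5 * (a + b)
            if T_k(x + [m]) < d_k:   # monotone in x_k: move lower bound up
                a = m
            else:
                b = m
        x.append(0.5 * (a + b))
    return x

# A 2-D monotone triangular map (illustrative components):
T = [lambda x: x[0] + 0.1 * x[0] ** 3,
     lambda x: 0.5 * x[0] + x[1] + 0.2 * x[1] ** 3]
x = solve_triangular(T, d=[1.1, 0.7])
print(x)  # x[0] solves x + 0.1 x^3 = 1.1, i.e. x[0] = 1.0
```

Each component is solved once, so the cost scales with the dimension of the data, matching the claim that the online step is inexpensive.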
In Table 23.2 we compare the computational time of the map-based approach
to that of the reference adaptive Metropolis MCMC scheme. The online time
column shows how long each method takes to generate 30,000 independent samples
Fig. 23.3 BOD problem of Sect. 8.1 via inverse transport and conditioning. Kernel density estimates of the conditional density $\pi_{\Theta|D=d}$, for $d = (0.18, 0.32, 0.42, 0.49, 0.54)$, using 50,000 samples from either a reference adaptive MCMC sampler (left) or conditioned transport maps of varying total degree. Contour levels and color scales are constant for all figures. (a) MCMC truth. (b) Degree $p = 3$. (c) Degree $p = 5$. (d) Degree $p = 7$
Table 23.2 BOD problem of Sect. 8.1 via inverse transport and conditioning. Efficiency of approximate Bayesian inference with an inverse transport map from samples. The offline time is defined as the time it takes to compute an approximate inverse transport $\tilde{S}$ and a corresponding approximate direct transport $\tilde{T}$ via regression (see Sect. 4). The online time is the time required, after observing $\{D = d\}$, to generate the equivalent of 30,000 independent samples from the conditional $\pi_{\Theta|D=d}$. For MCMC, the online time is the average amount of time it takes to generate a chain with an effective sample size of 30,000

| Map type | # training samples | Offline: $\tilde{S}$ construction (sec) | Offline: $\tilde{T}$ regression (sec) | Online time (sec) |
|---|---|---|---|---|
| MCMC truth | – | NA | NA | 591.17 |
| $p = 1$ | 5000 | 0.46 | 0.18 | 2.60 |
| $p = 1$ | 50,000 | 4.55 | 1.65 | 2.32 |
| $p = 3$ | 5000 | 4.13 | 1.36 | 3.54 |
| $p = 3$ | 50,000 | 40.69 | 18.04 | 3.58 |
| $p = 5$ | 5000 | 22.82 | 8.40 | 5.80 |
| $p = 5$ | 50,000 | 334.25 | 103.47 | 6.15 |
| $p = 7$ | 5000 | 145.00 | 40.46 | 8.60 |
| $p = 7$ | 50,000 | 1070.67 | 432.95 | 8.83 |
from the conditional $\pi_{\Theta|D=d}$. For MCMC, we use the average amount of time required to generate a chain with an effective sample size of 30,000. Measured in terms of online time, the polynomial transport maps are roughly two orders of magnitude more efficient than the adaptive MCMC sampler. More sophisticated MCMC samplers could be used, of course, but the conditional map approach will retain a significant advantage because, after solving for $\mathbf{x}^\star_d$ in Lemma 1, it can generate independent samples at negligible cost. We must stress, however, that the samples produced in this way are from an approximation of the targeted conditional. In fact, the conditioning lemma holds true only if the computed joint transport is exact, and a solution of (23.22) is an approximate transport for the reasons discussed in Sect. 4.2. Put another way, the conditioning lemma is exact for the pushforward
Fig. 23.4 BOD problem of Sect. 8.2 via direct transport. Observations are taken at times $t \in \{1, 2, 3, 4, 5\}$. The observed data vector is given by $d = [0.18;\ 0.32;\ 0.42;\ 0.49;\ 0.54]$. (a) Reference density. (b) Target density
Fig. 23.5 BOD problem of Sect. 8.2 via direct transport: pushforwards of the reference density
under a given total-degree triangular map. The basis of the map consists of multivariate Hermite
polynomials. The expectation with respect to the reference measure is approximated with a full tensor product Gauss–Hermite quadrature rule. The approximation is already excellent with a map
of degree p D 3. (a) Target density. (b) Pushforward p D 3. (c) Pushforward p D 5
Fig. 23.6 BOD problem of Sect. 8.2 via direct transport: pullbacks of the target density under a given total-order triangular map. Same setup of the optimization problem as in Fig. 23.5. The pullback density is progressively Gaussianized as the degree of the transport map increases. (a) Reference density. (b) Pullback $p = 3$. (c) Pullback $p = 5$
map that pushes forward the target measure to the reference measure can be
computed efficiently via separable convex optimization. The direct transport can
then be evaluated by solving a nonlinear triangular system via a sequence of one-dimensional root-finding problems (see Sect. 4.3). Moreover, we showed that characterizing
the target distribution as the pushforward of a triangular transport enables efficient
sampling from particular conditionals (and of course any marginal) of the target
(see Sect. 7). This feature can be extremely useful in the context of online Bayesian
inference, where one is concerned with fast posterior computations for multiple
realizations of the data (see Sect. 8.1).
Ongoing efforts aim to expand the transport map framework by: (1) understand-
ing the fundamental structure of transports and how this structure flows from certain
properties of the target measure; (2) developing rigorous and automated methods for
the adaptive refinement of maps; and (3) coupling these methods with more effective
parameterizations and computational approaches.
An important preliminary issue, which we discussed briefly in Sect. 5.3, is how to enforce monotonicity, and thus invertibility, of the transport. In general, there is no easy way to parameterize a monotone map. However, as shown in Sect. 5.3 and detailed in [12], if we restrict our attention to triangular transports, that is, if we consider the computation of a Knothe–Rosenblatt rearrangement, then the monotonicity constraint can be enforced strictly in the parameterization of the map. This result is inspired by monotone regression techniques [66] and is useful in the transport map framework as it removes explicit monotonicity constraints altogether, enabling the use of unconstrained optimization techniques.
Another key challenge is the need to construct low-dimensional parameterizations of transport maps in high-dimensional settings. The critical observation in [75] is that Markov properties, i.e., the conditional independence structure of the target distribution, induce an intrinsic low dimensionality of the transport map in terms of sparsity and decomposability. A sparse transport is a multivariate map where each component is a function of only a few input variables, whereas a decomposable transport is a map that can be written as the exact composition of a finite number of simple functions. The analysis in [75] reveals that these sparsity and decomposability properties can be predicted before computing the actual transport, simply by examining the Markov structure of the target distribution. These properties can then be explicitly enforced in the parameterization of candidate transport maps, leading to optimization problems of considerably reduced dimension. Note that there is a constant effort in applications to formulate probabilistic models of phenomena of interest using sparse Markov structures; one prominent example is multiscale modeling [65]. A further source of low dimensionality in transports is low-rank structure, i.e., situations where a map departs from the identity only on a low-dimensional subspace of the input space [75]. This situation is fairly common in large-scale Bayesian inverse problems where the data are informative, relative to the prior, only about a handful of directions in the parameter space [25, 76].
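The savings from sparsity can be substantial. As a rough count (assuming, as the analysis in [75] predicts for a first-order Markov chain target, that each map component needs only two active inputs), the number of total-degree-$p$ polynomial terms per component with $m$ active inputs is $\binom{m+p}{p}$:

```python
from math import comb

def n_terms(m, p):
    """Number of polynomial terms of total degree <= p in m variables."""
    return comb(m + p, p)

n, p = 50, 3
# Dense triangular map: component k sees x_1, ..., x_k.
dense = sum(n_terms(k, p) for k in range(1, n + 1))
# Chain-structured target: component k sees only x_{k-1} and x_k.
sparse = sum(n_terms(min(k, 2), p) for k in range(1, n + 1))
print(dense, sparse)  # 316250 vs. 494 coefficients
```

Because the sparsity pattern is known before any optimization is run, the far smaller parameterization can be imposed from the start.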
Building on these varieties of low-dimensional structure, we still need to
construct explicit representations of the transport. In this chapter, we have opted for a
and inference in likelihood-free settings (e.g., with radar and image data), and rare
event simulation.
References
1. Adams, M.R., Guillemin, V.: Measure Theory and Probability. Birkhäuser, Basel (1996)
2. Ambrosio, L., Gigli, N.: A user's guide to optimal transport. In: Piccoli, B., Rascle, M. (eds.) Modelling and Optimisation of Flows on Networks, pp. 1–155. Springer, Berlin/Heidelberg (2013)
3. Andrieu, C., Moulines, E.: On the ergodicity properties of some adaptive MCMC algorithms. Ann. Appl. Probab. 16(3), 1462–1505 (2006)
4. Angenent, S., Haker, S., Tannenbaum, A.: Minimizing flows for the Monge–Kantorovich problem. SIAM J. Math. Anal. 35(1), 61–97 (2003)
5. Atkins, E., Morzfeld, M., Chorin, A.J.: Implicit particle methods and their connection with variational data assimilation. Mon. Weather Rev. 141(6), 1786–1803 (2013)
6. Attias, H.: Inferring parameters and structure of latent variable models by variational Bayes. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, pp. 21–30. Morgan Kaufmann Publishers Inc. (1999)
7. Bangerth, W., Rannacher, R.: Adaptive Finite Element Methods for Differential Equations. Birkhäuser, Basel (2013)
8. Bardsley, J.M., Solonen, A., Haario, H., Laine, M.: Randomize-then-optimize: a method for sampling from posterior distributions in nonlinear inverse problems. SIAM J. Sci. Comput. 36(4), A1895–A1910 (2014)
9. Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate Bayesian computation in population genetics. Genetics 162(4), 2025–2035 (2002)
10. Benamou, J.D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)
11. Bernard, P., Buffoni, B.: Optimal mass transportation and Mather theory. J. Eur. Math. Soc. 9, 85–121 (2007)
12. Bigoni, D., Spantini, A., Marzouk, Y.: On the computation of monotone transports (2016, preprint)
13. Bonnotte, N.: From Knothe's rearrangement to Brenier's optimal transport map. SIAM J. Math. Anal. 45(1), 64–87 (2013)
14. Box, G., Cox, D.: An analysis of transformations. J. R. Stat. Soc. Ser. B 26(2), 211–252 (1964)
15. Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417 (1991)
16. Brooks, S., Gelman, A., Jones, G., Meng, X.L. (eds.): Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC, Boca Raton (2011)
17. Calderhead, B.: A general construction for parallelizing Metropolis–Hastings algorithms. Proc. Natl. Acad. Sci. 111(49), 17408–17413 (2014)
18. Carlier, G., Galichon, A., Santambrogio, F.: From Knothe's transport to Brenier's map and a continuation method for optimal transport. SIAM J. Math. Anal. 41(6), 2554–2576 (2010)
19. Champion, T., De Pascale, L.: The Monge problem in $\mathbb{R}^d$. Duke Math. J. 157(3), 551–572 (2011)
20. Chib, S., Jeliazkov, I.: Marginal likelihood from the Metropolis–Hastings output. J. Am. Stat. Assoc. 96(453), 270–281 (2001)
21. Chorin, A., Morzfeld, M., Tu, X.: Implicit particle filters for data assimilation. Commun. Appl. Math. Comput. Sci. 5(2), 221–240 (2010)
22. Chorin, A.J., Tu, X.: Implicit sampling for particle filters. Proc. Natl. Acad. Sci. 106(41), 17249–17254 (2009)
23. Csilléry, K., Blum, M.G.B., Gaggiotti, O.E., François, O.: Approximate Bayesian computation (ABC) in practice. Trends Ecol. Evol. 25(7), 410–418 (2010)
24. Cui, T., Law, K.J.H., Marzouk, Y.M.: Dimension-independent likelihood-informed MCMC. J. Comput. Phys. 304(1), 109–137 (2016)
25. Cui, T., Martin, J., Marzouk, Y.M., Solonen, A., Spantini, A.: Likelihood-informed dimension reduction for nonlinear inverse problems. Inverse Probl. 30(11), 114015 (2014)
26. Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. R. Stat. Soc. B 68(3), 411–436 (2006)
27. Feyel, D., Üstünel, A.S.: Monge–Kantorovitch measure transportation and Monge–Ampère equation on Wiener space. Probab. Theory Relat. Fields 128(3), 347–385 (2004)
28. Fox, C.W., Roberts, S.J.: A tutorial on variational Bayesian inference. Artif. Intell. Rev. 38(2), 85–95 (2012)
29. Gautschi, W.: Orthogonal polynomials: applications and computation. Acta Numer. 5, 45–119 (1996)
30. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn. Chapman and Hall, Boca Raton (2003)
31. Gelman, A., Meng, X.L.: Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Stat. Sci. 13, 163–185 (1998)
32. Ghorpade, S., Limaye, B.V.: A Course in Multivariable Calculus and Analysis. Springer, New York (2010)
33. Gilks, W., Richardson, S., Spiegelhalter, D. (eds.): Markov Chain Monte Carlo in Practice. Chapman and Hall, London (1996)
34. Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B 73, 1–37 (2011)
35. Goodman, J., Lin, K.K., Morzfeld, M.: Small-noise analysis and symmetrization of implicit Monte Carlo samplers. Commun. Pure Appl. Math. (2015)
36. Gorham, J., Mackey, L.: Measuring sample quality with Stein's method. In: Advances in Neural Information Processing Systems, Montréal, pp. 226–234 (2015)
37. Haario, H., Saksman, E., Tamminen, J.: An adaptive Metropolis algorithm. Bernoulli 7(2), 223–242 (2001)
38. Haber, E., Rehman, T., Tannenbaum, A.: An efficient numerical method for the solution of the $L^2$ optimal mass transfer problem. SIAM J. Sci. Comput. 32(1), 197–211 (2010)
39. Huan, X., Parno, M., Marzouk, Y.: Adaptive transport maps for sequential Bayesian optimal experimental design (2016, preprint)
40. Jaakkola, T.S., Jordan, M.I.: Bayesian parameter estimation via variational methods. Stat. Comput. 10(1), 25–37 (2000)
41. Kim, S., Ma, R., Mesa, D., Coleman, T.P.: Efficient Bayesian inference methods via convex optimization and optimal transport. In: IEEE Symp. Inf. Theory, pp. 2259–2263 (2013)
42. Kleywegt, A., Shapiro, A., Homem-de-Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2), 479–502 (2002)
43. Kushner, H., Yin, G.: Stochastic Approximation and Recursive Algorithms and Applications. Springer, New York (2003)
44. Laparra, V., Camps-Valls, G., Malo, J.: Iterative Gaussianization: from ICA to random rotations. IEEE Trans. Neural Netw. 22(4), 1–13 (2011)
45. Laurence, P., Pignol, R.J., Tabak, E.G.: Constrained density estimation. In: Quantitative Energy Finance, pp. 259–284. Springer, New York (2014)
46. Le Maître, O., Knio, O.M.: Spectral Methods for Uncertainty Quantification: With Applications to Computational Fluid Dynamics. Springer, Dordrecht/New York (2010)
47. Litvinenko, A., Matthies, H.G.: Inverse Problems and Uncertainty Quantification. arXiv:1312.5048 (2013)
48. Litvinenko, A., Matthies, H.G.: Uncertainty quantification and non-linear Bayesian update of PCE coefficients. PAMM 13(1), 379–380 (2013)
49. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, New York (2004)
50. Loeper, G., Rapetti, F.: Numerical solution of the Monge–Ampère equation by a Newton's algorithm. Comptes Rendus Math. 340(4), 319–324 (2005)
51. Luenberger, D.G.: Optimization by Vector Space Methods. Wiley, New York (1968)
52. Marin, J.M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22(6), 1167–1180 (2012)
53. Martin, J., Wilcox, L., Burstedde, C., Ghattas, O.: A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM J. Sci. Comput. 34(3), A1460–A1487 (2012)
54. Matthies, H.G., Zander, E., Rosic, B.V., Litvinenko, A., Pajonk, O.: Inverse problems in a Bayesian setting. arXiv:1511.00524 (2015)
55. McCann, R.: Existence and uniqueness of monotone measure-preserving maps. Duke Math. J. 80(2), 309–323 (1995)
56. Meng, X.L., Schilling, S.: Warp bridge sampling. J. Comput. Graph. Stat. 11(3), 552–586 (2002)
57. Monge, G.: Mémoire sur la théorie des déblais et des remblais. In: Histoire de l'Académie Royale des Sciences de Paris, avec les Mémoires de Mathématique et de Physique pour la même année, pp. 666–704 (1781)
58. Morzfeld, M., Chorin, A.J.: Implicit particle filtering for models with partial noise, and an application to geomagnetic data assimilation. arXiv:1109.3664 (2011)
59. Morzfeld, M., Tu, X., Atkins, E., Chorin, A.J.: A random map implementation of implicit filters. J. Comput. Phys. 231(4), 2049–2066 (2012)
60. Morzfeld, M., Tu, X., Wilkening, J., Chorin, A.: Parameter estimation by implicit sampling. Commun. Appl. Math. Comput. Sci. 10(2), 205–225 (2015)
61. Moselhy, T., Marzouk, Y.: Bayesian inference with optimal maps. J. Comput. Phys. 231(23), 7815–7850 (2012)
62. Neal, R.M.: MCMC using Hamiltonian dynamics. In: Brooks, S., Gelman, A., Jones, G.L., Meng, X.L. (eds.) Handbook of Markov Chain Monte Carlo, chap. 5, pp. 113–162. Taylor and Francis, Boca Raton (2011)
63. Parno, M.: Transport maps for accelerated Bayesian computation. Ph.D. thesis, Massachusetts Institute of Technology (2014)
64. Parno, M., Marzouk, Y.: Transport Map Accelerated Markov Chain Monte Carlo. arXiv:1412.5492 (2014)
65. Parno, M., Moselhy, T., Marzouk, Y.: A Multiscale Strategy for Bayesian Inference Using Transport Maps. arXiv:1507.07024 (2015)
66. Ramsay, J.: Estimating smooth monotone functions. J. R. Stat. Soc. Ser. B 60(2), 365–375 (1998)
67. Reich, S.: A nonparametric ensemble transform method for Bayesian inference. SIAM J. Sci. Comput. 35(4), A2013–A2024 (2013)
68. Renegar, J.: A Mathematical View of Interior-Point Methods in Convex Optimization, vol. 3. SIAM, Philadelphia (2001)
69. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, New York (2004)
70. Rosenblatt, M.: Remarks on a multivariate transformation. Ann. Math. Stat. 23(3), 470–472 (1952)
71. Rosic, B.V., Litvinenko, A., Pajonk, O., Matthies, H.G.: Sampling-free linear Bayesian update of polynomial chaos representations. J. Comput. Phys. 231(17), 5761–5787 (2012)
72. Saad, G., Ghanem, R.: Characterization of reservoir simulation models using a polynomial chaos-based ensemble Kalman filter. Water Resour. Res. 45(4), n/a (2009)
73. Smith, A., Doucet, A., de Freitas, N., Gordon, N. (eds.): Sequential Monte Carlo Methods in Practice. Springer, New York (2001)
74. Spall, J.C.: Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, vol. 65. Wiley, Hoboken (2005)
75. Spantini, A., Marzouk, Y.: On the low-dimensional structure of measure transports (2016, preprint)
76. Spantini, A., Solonen, A., Cui, T., Martin, J., Tenorio, L., Marzouk, Y.: Optimal low-rank approximations of Bayesian linear inverse problems. SIAM J. Sci. Comput. 37(6), A2451–A2487 (2015)
77. Stavropoulou, F., Müller, J.: Parameterization of random vectors in polynomial chaos expansions via optimal transportation. SIAM J. Sci. Comput. 37(6), A2535–A2557 (2015)
78. Strang, G., Fix, G.J.: An Analysis of the Finite Element Method, vol. 212. Prentice-Hall, Englewood Cliffs (1973)
79. Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numer. 19, 451–559 (2010)
80. Sullivan, A.B., Snyder, D.M., Rounds, S.A.: Controls on biochemical oxygen demand in the upper Klamath River, Oregon. Chem. Geol. 269(1–2), 12–21 (2010)
81. Tabak, E., Turner, C.V.: A family of nonparametric density estimation algorithms. Commun. Pure Appl. Math. 66(2), 145–164 (2013)
82. Tabak, E.G., Trigila, G.: Data-driven optimal transport. Commun. Pure Appl. Math. (2014)
83. Thode, H.C.: Testing for Normality, vol. 164. Marcel Dekker, New York (2002)
84. Villani, C.: Topics in Optimal Transportation, vol. 58. American Mathematical Society, Providence (2003)
85. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin/Heidelberg (2008)
86. Wackernagel, H.: Multivariate Geostatistics: An Introduction with Applications. Springer, Berlin/Heidelberg (2013)
87. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)
88. Wang, L.: Methods in Monte Carlo computation, astrophysical data analysis and hypothesis testing with multiply-imputed data. Ph.D. thesis, Harvard University (2015)
89. Wilkinson, D.J.: Stochastic Modelling for Systems Biology. CRC Press, Boca Raton (2011)
90. Wright, S.J., Nocedal, J.: Numerical Optimization, vol. 2. Springer, New York (1999)
91. Xiu, D., Karniadakis, G.: The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)
Compressive Sampling Methods for Sparse
Polynomial Chaos Expansions 24
Jerrad Hampton and Alireza Doostan
Abstract
A salient task in uncertainty quantification (UQ) is to study the dependence of a
quantity of interest (QoI) on input variables representing system uncertainties.
Relying on linear expansions of the QoI in orthogonal polynomial bases of
inputs, polynomial chaos expansions (PCEs) are now among the widely used
methods in UQ. When there exists a smoothness in the solution being approxi-
mated, the PCE exhibits sparsity in that a small fraction of expansion coefficients
are significant. By exploiting this sparsity, compressive sampling, also known
as compressed sensing, provides a natural framework for accurate PCE using
relatively few evaluations of the QoI and in a manner that does not require
intrusion into legacy solvers. The PCE possesses a rich structure relating the QoI being approximated, the polynomials and input variables used to perform the approximation, and the inputs at which the QoI is evaluated. This chapter provides insights into this structure, summarizing a portion of the current literature on PCE via compressive sampling within the context of UQ.
Keywords
Legendre Polynomials · Hermite Polynomials · Orthogonal Polynomials · Compressed Sensing · Polynomial Chaos Expansions · Markov Chain Monte Carlo · $\ell_1$-minimization · Basis Pursuit · Sparse Approximation
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828
1.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828
1.2 The Askey Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830
1.3 Polynomial Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831
1.4 Sparsity in PCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 832
1 Introduction
the random vector, ξ. In this manner a scalar QoI, denoted by u(ξ), is modeled
as an unknown function of the random input, and the goal is to approximate this
function. This approximation then forms a surrogate model for u(ξ), which can be
evaluated at negligible cost.
Often u(ξ) depends on a function v(x, t, ξ), the solution to a set of partial or
ordinary differential equations,

$$\mathcal{R}(v;\, \boldsymbol{x}, t, \boldsymbol{\xi}) = 0, \tag{24.1}$$

with appropriate boundary and initial conditions. Here, $\mathcal{R}$ denotes the set of
governing equations, x ∈ D ⊂ ℝ^D, D = 1, 2, 3, the spatial variable, and t ∈ [0, T]
the time.
As the computation of realizations of v(x, t, ξ), and hence of u(ξ), potentially
requires the use of legacy codes, it is often preferred not to alter the solver for v
to handle ξ in a sophisticated manner, e.g., by considering the Galerkin projection
of $\mathcal{R}$ onto a subspace spanned by a basis in ξ. Additionally, the evaluation of a
realized u(ξ) can become a significant computational bottleneck, and in these cases
it is prudent to identify an approximation using as few realizations of u(ξ) as is
practical.
Approximating u(ξ), assumed to have finite variance, using a basis expansion
in multivariate orthogonal polynomials, each of which is denoted by ψ_k(ξ), yields
the PCE

$$u(\boldsymbol{\xi}) = \sum_{k=1}^{\infty} c_k\, \psi_k(\boldsymbol{\xi}). \tag{24.2}$$
Table 24.1 The orthogonal polynomials corresponding to several commonly used continuous
distributions from the Askey scheme

  Distribution of random variable, f(ξ):       Normal    Uniform    Gamma     Beta
  Associated orthogonal polynomial, ψ_k(ξ):    Hermite   Legendre   Laguerre  Jacobi
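The pairings of Table 24.1 can be checked numerically. The sketch below (function names are ours, not from the chapter) normalizes NumPy's probabilists' Hermite and Legendre polynomials and verifies their orthonormality under the corresponding densities using Gauss quadrature:

```python
import numpy as np
from math import factorial, sqrt

def hermite_orthonormal(n, x):
    """Probabilists' Hermite polynomial He_n, normalized so that
    E[psi_m(X) psi_n(X)] = delta_{mn} for X ~ N(0, 1)."""
    coef = np.zeros(n + 1)
    coef[n] = 1.0
    return np.polynomial.hermite_e.hermeval(x, coef) / sqrt(factorial(n))

def legendre_orthonormal(n, x):
    """Legendre polynomial P_n, normalized so that
    E[psi_m(X) psi_n(X)] = delta_{mn} for X ~ Uniform(-1, 1)."""
    coef = np.zeros(n + 1)
    coef[n] = 1.0
    return np.polynomial.legendre.legval(x, coef) * sqrt(2 * n + 1)

def gram(poly, nodes, weights, pmax):
    """Quadrature approximation of the Gram matrix E[psi_i psi_j]."""
    V = np.array([poly(n, nodes) for n in range(pmax + 1)])
    return (V * weights) @ V.T

# Gauss-Hermite(e) rule integrates against exp(-x^2/2); rescale to N(0, 1).
xh, wh = np.polynomial.hermite_e.hermegauss(30)
wh = wh / np.sqrt(2 * np.pi)
# Gauss-Legendre rule on [-1, 1]; rescale to Uniform(-1, 1), density 1/2.
xl, wl = np.polynomial.legendre.leggauss(30)
wl = wl / 2.0

G_hermite = gram(hermite_orthonormal, xh, wh, 5)
G_legendre = gram(legendre_orthonormal, xl, wl, 5)
```

Both Gram matrices come out as the identity (up to quadrature precision), confirming the distribution-polynomial pairings.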
Fig. 24.1 Plots of orthonormal Legendre (a) and Hermite (b) polynomials of degree up to three
$$\psi_{\boldsymbol{k}}(\boldsymbol{\xi}) = \prod_{i=1}^{d} \psi_{k_i}(\xi_i),$$

and truncating (24.2) to a finite index set $\mathcal{K}_p$ gives the approximation

$$\hat{u}(\boldsymbol{\xi}) = \sum_{\boldsymbol{k} \in \mathcal{K}_p} c_{\boldsymbol{k}}\, \psi_{\boldsymbol{k}}(\boldsymbol{\xi}), \tag{24.4}$$

which by (24.3) is the approximation minimizing $\|u - \hat{u}\|_{L_2}$. Perhaps the simplest
formulation of polynomial order is that of tensor order, defined to be

$$\mathcal{K}_p^{(\infty)} := \Big\{\boldsymbol{k} : \max_{1 \le i \le d} k_i \le p\Big\},$$
where the superscript (∞) is chosen due to the ℓ∞-norm on k when viewed as a vector.
This set has $|\mathcal{K}_p^{(\infty)}| = (p+1)^d$ polynomials in it, which is exponentially dependent
on d, but has the benefit of being able to handle a high level of interaction among
the different dimensions. However, it is rare that u has such a high dependence
between dimensions as to simultaneously require high-order polynomials in all
dimensions to accurately reconstruct u. For this reason, a more popular definition of
order for PCE, total order, is defined by

$$\mathcal{K}_p^{(1)} := \Big\{\boldsymbol{k} : \sum_{i=1}^{d} k_i \le p\Big\}. \tag{24.5}$$

A combinatorial consideration reveals that this definition yields $|\mathcal{K}_p^{(1)}| = \binom{p+d}{d} \le
\min\{(p+1)^d, (d+1)^p\}$ basis functions, which often gives a significant reduction
in the number of basis functions, though it still grows rapidly if both p and d are large.
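The two index sets and the cardinalities quoted above can be built and checked by brute force; the short sketch below (names ours) does so:

```python
from itertools import product
from math import comb

def tensor_order_indices(d, p):
    """Tensor-order set: max_i k_i <= p; contains (p+1)^d multi-indices."""
    return [k for k in product(range(p + 1), repeat=d)]

def total_order_indices(d, p):
    """Total-order set: sum_i k_i <= p; contains C(p+d, d) multi-indices."""
    return [k for k in product(range(p + 1), repeat=d) if sum(k) <= p]

d, p = 4, 3
n_tensor = len(tensor_order_indices(d, p))   # (p+1)^d = 256
n_total = len(total_order_indices(d, p))     # C(p+d, d) = 35
```

For d = 4 and p = 3, the total-order basis already contains roughly 7 times fewer functions than the tensor-order one, and the gap widens rapidly with d.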
832 J. Hampton and A. Doostan
A family of index sets interpolating between these choices is given by

$$\mathcal{K}_p^{(\alpha)} := \Big\{\boldsymbol{k} : \|\boldsymbol{k}\|_{\alpha} = \Big(\sum_{i=1}^{d} k_i^{\alpha}\Big)^{1/\alpha} \le p\Big\}, \tag{24.6}$$

where α > 0 is a parameter controlling the shape of this set of basis functions.
Smaller α reduces the number of basis functions more dramatically, particularly by
removing polynomials of higher orders in multiple dimensions. In contrast, larger α
more rapidly increases the number of basis functions, with the advantage of being
more robust for recovering solutions possessing highly interactive dimensions.
In this chapter, the focus is on the total-order bases as defined in (24.5),
corresponding to α = 1 in (24.6), and the superscript is dropped. As seen in the
next section, approximations to u using such a basis often require a relatively small
proportion of these basis functions, leading to sparsity in computed solutions.
The sparsity in the PCE is closely intertwined with recovery via compressive
sampling. Intuitively, a PCE is sparse if a small but unknown subset of the basis
set may still accurately approximate the function u. Refer to the set of indices of
these basis functions as Sp , noting that it depends on p, in the sense that with a
higher-order approximation, previously unavailable basis functions may improve
the approximation. For instance, a smooth u is well approximated by a truncated
Taylor series expansion, which can itself be constructed from a small number of
basis polynomials. Specifically, a function u is s-sparse if for some $\mathcal{S}_p \subseteq \mathcal{K}_p$ with
$|\mathcal{S}_p| = s$,

$$u(\boldsymbol{\xi}) = \sum_{\boldsymbol{k} \in \mathcal{S}_p} c_{\boldsymbol{k}}\, \psi_{\boldsymbol{k}}(\boldsymbol{\xi}).$$
Practically, this will not hold with equality, but rather only as an accurate approxima-
tion, often referred to as approximate sparsity. An example of such sparsity, adapted
from [64], is a solution computed from a thermally driven cavity flow problem
shown in Fig. 24.2 and discussed later, exhibiting a significant decay in coefficient
magnitude as the order increases, and for polynomials that vary with respect to
dimensions that are less useful for constructing an accurate approximation. This
solution is for a problem summarized in Fig. 24.8. Though a priori information as
to which basis functions are useful in approximating u is often unavailable, there is
often reasonable certainty that the basis could be compressed to a smaller number
of significant basis functions.
24 Compressive Sampling Methods for Sparse Polynomial Chaos Expansions 833
Fig. 24.2 Solution coefficient magnitude for basis functions of varied order, demonstrating
coefficient decay motivating a sparse reconstruction. Coefficient indices to the left of the vertical
dashed lines are included in the set of PC basis functions of order not larger than the specified p
2 Solution Computation
While there is motivation to identify a solution that has small support $\mathcal{S}_p$, finding
the smallest such support is a difficult combinatorial problem [26], and instead a
similar, tractable problem is solved. Starting from (24.4), a matrix equation can be
constructed from realized data and solved to identify the coefficients c_k.
Specifically, utilizing N realizations of ξ, each denoted by ξ^(i), one can define
N equations in a linear system, where methods to select these realizations are
discussed in a later section. Here, let u be a vector with u_i = u(ξ^(i)). For some
indexing of polynomials that maps k ∈ $\mathcal{K}_p$ to j ∈ {1, …, P}, let c be a vector
with entries c_j, and Ψ_{i,j} = ψ_j(ξ^(i)). This gives an N × 1 vector u, a P × 1 vector
c, and an N × P matrix Ψ, referred to as the measurement matrix. Denote the
approximation to c by ĉ, computed using Ψ and u. Moreover, compressive sampling
allows this to be done while undersampling, that is, when N < P. One problem to
investigate, for some tolerance δ ≥ 0, is

$$\hat{\boldsymbol{c}} = \arg\min_{\boldsymbol{c}} \|\boldsymbol{c}\|_0 \quad \text{subject to} \quad \|\boldsymbol{\Psi}\boldsymbol{c} - \boldsymbol{u}\|_2 \le \delta. \tag{24.7}$$
One useful method is the convex relaxation of the $\|\cdot\|_0$ objective function in (24.7)
to the ℓ1-norm $\|\cdot\|_1$. With δ = 0 this gives the basis pursuit problem

$$\hat{\boldsymbol{c}} = \arg\min_{\boldsymbol{c}} \|\boldsymbol{c}\|_1 \quad \text{subject to} \quad \boldsymbol{\Psi}\boldsymbol{c} = \boldsymbol{u}, \tag{24.8}$$

while for δ > 0,

$$\hat{\boldsymbol{c}} = \arg\min_{\boldsymbol{c}} \|\boldsymbol{c}\|_1 \quad \text{subject to} \quad \|\boldsymbol{\Psi}\boldsymbol{c} - \boldsymbol{u}\|_2 \le \delta, \tag{24.9}$$

which is known as basis pursuit denoising (BPDN). It is possible to select δ via the
statistical technique of cross-validation [2, 11, 31, 82]. In many practical contexts, the
truncation error defined via (24.4) as u − û implies a need for a positive δ to account
for the QoI not being fully explained by the chosen basis.
Closely connected with (24.9) is the regularized formulation, for some positive
parameter λ,

$$\hat{\boldsymbol{c}} = \arg\min_{\boldsymbol{c}}\; \|\boldsymbol{\Psi}\boldsymbol{c} - \boldsymbol{u}\|_2^2 + \lambda \|\boldsymbol{c}\|_1. \tag{24.10}$$

This formulation also admits a Bayesian interpretation, in which the ℓ1 penalty
corresponds to placing a Laplace prior on the coefficients,

$$\pi(\boldsymbol{c} \mid \lambda) := \prod_{\boldsymbol{k} \in \mathcal{K}_p} \frac{\lambda}{2}\, e^{-\lambda |c_{\boldsymbol{k}}|}, \tag{24.11}$$

for some parameter λ. For computational purposes, an alternative, but similar
in nature, prior distribution on c is often employed [5].
Several measures for recovery are focused on identifying when the solution of the
ℓ1-minimization problem (24.9), or of OMP without noise, is equivalent to the solution
of (24.7). This recovery is also shown to be robust to noise, which includes the stability
of the problem with respect to the truncation error as well as potential errors in
identifying u(ξ). Though a variety of measures are considered here, other properties
relating to the spark and null space of a matrix are additionally useful [25, 47, 48].
In this context, the restricted isometry constant (RIC) [18, 34, 58], closely related
to the uniform uncertainty principle [14], is defined to be the smallest constant $\delta_s$
satisfying

$$(1 - \delta_s)\|\boldsymbol{c}_s\|_2^2 \;\le\; \frac{1}{N}\|\boldsymbol{\Psi}\boldsymbol{c}_s\|_2^2 \;\le\; (1 + \delta_s)\|\boldsymbol{c}_s\|_2^2, \tag{24.12}$$

for all $\boldsymbol{c}_s$ having s or fewer nonzero elements. Note that a smaller $\delta_s$ implies a near
isometry of any s columns of Ψ, and it is for this reason that the RIC is analytically
useful. Further, when $\delta_s$ is less than some threshold for an appropriate s, the
matrix is said to satisfy a restricted isometry property (RIP), guaranteeing several
results. An example of such an RIP result is a theorem restated from [69].
Unfortunately, for practically sized s, the RIC is difficult to compute for a given
matrix, and as a result any RIP can be difficult to verify. Still, the RIC can be
guaranteed in a probabilistic sense, leading to a minimum sample size N for
successful solution recovery via (24.8). The statement of any particular theorem is
deferred until the coherence parameter of the next section is defined.
Let Ψ(i, :) denote the ith row of Ψ and Ψ(:, j) denote the jth column of Ψ.
The rows of Ψ are assumed to be independent and identically distributed. Recall
that as basis polynomials are taken according to the Askey scheme, an appropriate
normalization for the basis polynomials gives that

$$\frac{1}{N}\,\mathbb{E}(\boldsymbol{\Psi}^T\boldsymbol{\Psi}) = \mathbb{E}\big((\boldsymbol{\Psi}(1, :))^T\,\boldsymbol{\Psi}(1, :)\big) = \boldsymbol{I}. \tag{24.13}$$

The coherence parameter is defined as

$$\mu := \sup_{\boldsymbol{k} \in \mathcal{K}_p,\; \boldsymbol{\xi} \in \Omega} |\psi_{\boldsymbol{k}}(\boldsymbol{\xi})|^2, \tag{24.14}$$

where Ω denotes the support of ξ. In terms of μ, a RIP holds with probability at
least 1 − γ when

$$\frac{N}{\log(10N)} \;\ge\; C\, \mu\, s\, \log^2(100s)\, \log(4P)\, \log(7\gamma^{-1}), \tag{24.15}$$

for an absolute constant C.
This result may be combined with Theorem 1 to bound recovery. Although not
needed here, it is worth noting that the result in Theorem 2 can be improved
slightly to that of Theorem 8.4 in [68]. The RIC implies uniform recovery over all
potential c, but this guarantee of uniform recovery is often unneeded in applications.
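As a small numerical illustration of the coherence parameter (24.14): for the 1-D orthonormal Legendre basis, |ψ_k(ξ)| attains its supremum at ξ = ±1 with value √(2k+1), so μ = 2p + 1 grows linearly in the order. The grid-based check below is our own sketch:

```python
import numpy as np

def legendre_orthonormal(n, x):
    """Orthonormal Legendre polynomial w.r.t. Uniform(-1, 1)."""
    coef = np.zeros(n + 1)
    coef[n] = 1.0
    return np.polynomial.legendre.legval(x, coef) * np.sqrt(2 * n + 1)

def coherence_1d_legendre(p, n_grid=20001):
    """Approximate mu = sup over k <= p and xi in [-1, 1] of |psi_k(xi)|^2."""
    xi = np.linspace(-1.0, 1.0, n_grid)
    return max(np.max(legendre_orthonormal(k, xi) ** 2) for k in range(p + 1))

p = 6
mu = coherence_1d_legendre(p)
# Since |P_k(1)| = 1, the supremum is |psi_p(1)|^2 = 2p + 1, attained at the
# endpoints of the domain.
```

For Hermite polynomials the analogous supremum is infinite, which is why the Hermite case requires the restricted domains discussed later in the chapter.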
If one instead wishes to assure recovery for an arbitrary but fixed c, then one may
invoke Theorem 1.1 of [19], reproduced here, which requires

$$N \;\ge\; C_0\,(1 + \beta)\, \mu\, s \log(P), \tag{24.16}$$

for a constant $C_0$ and a parameter β controlling the probability of failure. We note
that this result is stated for the solution to (24.8), though a similar result
holds for the solution to (24.9) [19].
Theorem 4 ([29]). Let δ be the error parameter in OMP, let M denote the mutual
coherence of the columns of Ψ, and let c satisfy

$$\|\boldsymbol{c}\|_0 \;\le\; \frac{1 + M}{2M} - \frac{1}{M}\,\frac{\delta}{\min_{c_k \ne 0} |c_k|}.$$

Then the support of the OMP solution $\hat{\boldsymbol{c}}$ matches that of c,

$$\mathrm{supp}(\boldsymbol{c}) = \mathrm{supp}(\hat{\boldsymbol{c}}),$$

and

$$\|\hat{\boldsymbol{c}} - \boldsymbol{c}\|_2^2 \;\le\; \frac{\delta^2}{1 - M\,(\|\boldsymbol{c}\|_0 - 1)}.$$
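A minimal implementation of the greedy OMP method addressed by Theorem 4 might look as follows; this is our own sketch on a synthetic Gaussian matrix, not the chapter's code:

```python
import numpy as np

def omp(Psi, u, eps=1e-8, max_terms=None):
    """Orthogonal matching pursuit: greedily grow the support until the
    residual norm drops below eps, re-solving least squares at each step."""
    N, P = Psi.shape
    max_terms = max_terms or N
    support, r = [], u.copy()
    c = np.zeros(P)
    cs = np.zeros(0)
    while np.linalg.norm(r) > eps and len(support) < max_terms:
        j = int(np.argmax(np.abs(Psi.T @ r)))   # column most correlated with r
        if j in support:
            break
        support.append(j)
        cs, *_ = np.linalg.lstsq(Psi[:, support], u, rcond=None)
        r = u - Psi[:, support] @ cs
    c[support] = cs
    return c

rng = np.random.default_rng(1)
Psi = rng.standard_normal((30, 80))
c_true = np.zeros(80)
c_true[[5, 20, 60]] = [1.0, -2.0, 0.5]
c_hat = omp(Psi, Psi @ c_true)
```

Each iteration costs one matrix-vector product plus a small least-squares solve, so OMP is attractive when the target sparsity s is small.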
While this coherence involves the inner products between vectors, a different
definition is useful for random matrices commonly utilized in PCE. This related
coherence is introduced in the next section.
4 Improving Recovery
Several methods exist for identifying the ξ^(i) used to form the linear system
in (24.7). A straightforward approach is to draw ξ^(i) independently from the
distribution of ξ, i.e., f(ξ). This distribution is, in general, not an ideal sampling
method for solution recovery [40], and it may be desirable to sample from a different
distribution, g(ξ). Unfortunately, this would destroy the orthogonality of the basis
functions, in that (24.13) would no longer hold. This, however, can be corrected
within the context of importance sampling by noting that if g(ξ) > 0 for ξ ∈ Ω, then

$$\mathbb{E}\big[\psi_i(\boldsymbol{\xi})\,\psi_j(\boldsymbol{\xi})\big] = \int_{\Omega} \psi_i(\boldsymbol{\xi})\,\psi_j(\boldsymbol{\xi})\,f(\boldsymbol{\xi})\,d\boldsymbol{\xi} = \int_{\Omega} \psi_i(\boldsymbol{\xi})\,\psi_j(\boldsymbol{\xi})\,\frac{f(\boldsymbol{\xi})}{g(\boldsymbol{\xi})}\,g(\boldsymbol{\xi})\,d\boldsymbol{\xi}.$$

This motivates a diagonal weight matrix W with entries $w(\boldsymbol{\xi}^{(i)}) := \sqrt{f(\boldsymbol{\xi}^{(i)})/g(\boldsymbol{\xi}^{(i)})}$,
where the weight must also be applied to the corresponding realizations of the QoI, u(ξ),
as the matrix relation WΨc = Wu ensures that Ψc = u. Here, the choice of g(ξ)
depends in particular on (24.14), as described in the next section. Note that the
inclusion of w(ξ) leads to a new definition for (24.14) given by

$$\mu_w := \sup_{\boldsymbol{k} \in \mathcal{K}_p,\; \boldsymbol{\xi} \in \Omega} |w(\boldsymbol{\xi})\,\psi_{\boldsymbol{k}}(\boldsymbol{\xi})|^2. \tag{24.19}$$
The coherence-minimizing choice of sampling distribution takes the form

$$g(\boldsymbol{\xi}) := c\,\Big(\sup_{\boldsymbol{k} \in \mathcal{K}_p} |\psi_{\boldsymbol{k}}(\boldsymbol{\xi})|\Big)^2 f(\boldsymbol{\xi}), \tag{24.21}$$

where c is the necessary normalizing constant to ensure that g(ξ) is a probability
distribution and is typically unneeded in practice. Some of these coherence-minimizing
distributions, g(ξ), for one-dimensional Hermite and Legendre polynomials, are
provided for reference in Fig. 24.4. Here, the limit of the Legendre sampling
distribution as p → ∞ is the Chebyshev distribution, defined to be

$$g(\xi) := \frac{1}{\pi\sqrt{1 - \xi^2}}. \tag{24.22}$$

In contrast, the limiting behavior for the Hermite sampling can be derived from
theoretical results in [54] and is more nuanced, as depicted in Fig. 24.5. Here, the
input on the horizontal axis scales as $\tilde{\xi} = \xi/(\sqrt{2}\sqrt{2p+1})$, and for $\tilde{\xi}$ defined
this way the distribution has a theoretical form proportional to $|\tilde{\xi}|^{-1/6}$ for $\tilde{\xi} \in [-1, 1]$.
Fig. 24.4 Plots of a few coherence-minimizing sampling distributions for Legendre (a) and
Hermite (b) polynomials
As the support grows with p, a normalization dividing by $\sqrt{2}\sqrt{2p+1}$ is required
to scale the support of the distribution, and so the limit of the coherence-minimizing
distributions for a basis of total order p as p → ∞ is not itself a distribution. In
a practical sense, sampling uniformly from a d-dimensional hypersphere of radius
$\sqrt{2p+1}$ yields reasonable coherence values [40] and is used here as a distribution
for high-order expansions.
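Sampling uniformly from such a region is straightforward. The sketch below (ours, assuming the solid d-dimensional ball is intended) combines a Gaussian direction with a radial correction:

```python
import numpy as np

def uniform_ball(n_samples, d, radius, rng):
    """Draw points uniformly from the d-dimensional ball of given radius:
    a uniformly random direction times a U^(1/d)-distributed radius."""
    z = rng.standard_normal((n_samples, d))
    z /= np.linalg.norm(z, axis=1, keepdims=True)       # uniform direction
    r = radius * rng.random(n_samples) ** (1.0 / d)     # radial density ~ r^(d-1)
    return z * r[:, None]

rng = np.random.default_rng(3)
d, p = 10, 8
xi = uniform_ball(5000, d, np.sqrt(2 * p + 1), rng)
```

The U^(1/d) radial transform compensates for the r^(d-1) growth of volume with radius, so the draws are uniform over the ball rather than clustered near its center.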
There is a practical method to sample from g./ so as to minimize (24.19)
based on Markov chain Monte Carlo (MCMC) sampling [37], which does not
require the computation of the normalizing constant c in (24.21). A Metropolis-
Hastings [55] type version of MCMC sampling is presented in Algorithm 2 and
uses random samples which are rejected and accepted in such a way that the
distribution of samples converges to the desired distribution, which is the stationary
distribution of a Markov chain. This is a particularly useful tool when the sampling
distribution is accessible up to an unknown normalizing constant, as in the case
of g./. There are several considerations [71] to take into account when using
Algorithm 2 to sample from g./. One is that as MCMC samples are dependent,
there may be significant serial correlation between samples, and so a number of
intermediate samples, denoted by T in Algorithm 2, may need to be discarded
to reduce this effect. Another consideration is that a number of initial samples must
be discarded to ensure that the samples are being drawn approximately
from the desired distribution. Both of these considerations depend on the choice
of proposal distribution gp ./ in Algorithm 2. Choices of proposal distributions
depend on whether the PCE is of a higher dimension or order [40] and in the latter
case can be derived from asymptotic considerations of the polynomials, leading to
the asymptotic distributions shown in Figs. 24.4 and 24.5. For example, if a high-
order Legendre approximation is used, gp ./ can be taken to be the Chebyshev
distribution, while a low-order Legendre approximation would be better served by
a uniformly distributed gp(ξ). Similarly, for a high-order Hermite approximation,
the proposal may be derived from the asymptotic distribution depicted in Fig. 24.5.
Fig. 24.6 Phase transition diagrams for Legendre PCE under various sampling strategies. Differ-
ent rows correspond to different problems ranging from high dimensional to high order. Different
columns correspond to different sampling distributions (Figure adapted from [40] with stricter
`1 -minimization solver tolerance and larger number of iterations)
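Since Algorithm 2 is not reproduced in this excerpt, the following is our own minimal random-walk Metropolis-Hastings sketch for a 1-D target known only up to a normalizing constant (here the Chebyshev density), with the burn-in and thinning parameter T discussed above:

```python
import numpy as np

def metropolis_hastings(log_target, n_samples, rng, x0=0.0,
                        step=0.5, burn_in=1000, thin=10):
    """Random-walk Metropolis sampler for a target known only up to a
    normalizing constant; keeps every `thin`-th draw after `burn_in`."""
    samples, x, lt = [], x0, log_target(x0)
    total = burn_in + n_samples * thin
    for i in range(total):
        y = x + step * rng.standard_normal()
        lty = log_target(y)
        if np.log(rng.random()) < lty - lt:   # accept with prob min(1, ratio)
            x, lt = y, lty
        if i >= burn_in and (i - burn_in) % thin == 0:
            samples.append(x)
    return np.array(samples)

def log_cheb(x):
    """Unnormalized log of the Chebyshev density on (-1, 1)."""
    return -0.5 * np.log1p(-x * x) if -1.0 < x < 1.0 else -np.inf

rng = np.random.default_rng(4)
draws = metropolis_hastings(log_cheb, 4000, rng)
```

Only log-density ratios enter the acceptance test, so the unknown normalizing constant c of (24.21) cancels, which is exactly why MCMC is convenient for the coherence-minimizing distributions.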
Notice that relatively large values of $W_{k,k}$ impose a higher penalty on large $\hat{c}_k$,
and so the corresponding values of $\hat{c}_k$ will tend to be smaller than in the solution
to (24.8). Similarly, relatively small values of $W_{k,k}$ yield a smaller penalty for
larger $\hat{c}_k$, and so the values of $\hat{c}_k$ will tend to be larger than in the solution to (24.8). In
this way, W distorts the ℓ1-ball and, via the effect displayed in Fig. 24.7, may improve
solution recovery [1, 17, 21, 42, 51, 61, 64, 70, 87, 90]. Weights may be adjusted
iteratively and (24.23) recomputed, in a process called iteratively reweighted ℓ1-
minimization [17, 90]. This adjustment is also related to a variant of (24.7) via
the concept of weighted sparsity, as in [1, 70]. The choice of weights influences
the computed solutions based on which coordinates are more heavily or lightly
weighted, and so the weights must be identified carefully. Sometimes these weights
can be derived by a careful analysis of decay rates [64], other times from a simplified or
lower-fidelity model. In the context of function interpolation via solution of (24.8),
column weighting is used to reduce aliasing, ensuring that high-order basis functions
are not overrepresented [1, 70].
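A sketch (ours) of iteratively reweighted ℓ1-minimization using the common weight update $w_k = 1/(|\hat{c}_k| + \epsilon)$ of [17], with a proximal-gradient inner solver on synthetic data:

```python
import numpy as np

def weighted_ista(Psi, u, lam, w, n_iter=3000):
    """Minimize 0.5 * ||Psi c - u||_2^2 + lam * sum_k w_k |c_k|."""
    L = np.linalg.norm(Psi, 2) ** 2
    c = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        z = c - Psi.T @ (Psi @ c - u) / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)
    return c

def reweighted_l1(Psi, u, lam=0.02, eps=1e-2, n_rounds=4):
    """Iteratively reweighted l1: coordinates with small coefficients get
    large weights (pushed toward zero); large ones are penalized less."""
    w = np.ones(Psi.shape[1])
    for _ in range(n_rounds):
        c = weighted_ista(Psi, u, lam, w)
        w = 1.0 / (np.abs(c) + eps)
    return c

rng = np.random.default_rng(5)
Psi = rng.standard_normal((35, 90))
c_true = np.zeros(90)
c_true[[2, 30, 70]] = [1.5, -1.0, 0.8]
c_hat = reweighted_l1(Psi, Psi @ c_true)
```

After the first round, the weights sharply penalize coordinates outside the estimated support, which both suppresses spurious entries and reduces the shrinkage bias on the true coefficients.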
$$\boldsymbol{u}_{\partial} := \Big[\frac{\partial u}{\partial \xi_1}(\boldsymbol{\xi}^{(1)}), \ldots, \frac{\partial u}{\partial \xi_d}(\boldsymbol{\xi}^{(1)}), \ldots, \frac{\partial u}{\partial \xi_1}(\boldsymbol{\xi}^{(N)}), \ldots, \frac{\partial u}{\partial \xi_d}(\boldsymbol{\xi}^{(N)})\Big]^T \tag{24.24}$$

and

$$\boldsymbol{\Psi}_{\partial}\big((i-1)\,d + k,\; j\big) := \frac{\partial \psi_j}{\partial \xi_k}(\boldsymbol{\xi}^{(i)}), \qquad k = 1, \ldots, d. \tag{24.25}$$

In this way, $\boldsymbol{u}_{\partial}$ contains all the partial derivative information for u(ξ) with respect
to each coordinate of ξ. Similarly, $\boldsymbol{\Psi}_{\partial}$ contains the partial derivatives of each basis
function with respect to each coordinate of ξ. Whereas before Ψ had independent
rows, here the measurement matrix instead has independent submatrices of size (d + 1) × P.
Let the submatrix of independent information related to the kth sample be given
by $X_k$, with a generic realization given by X. Specifically,

$$X(i, j) = \frac{\partial \psi_j}{\partial \xi_i}(\boldsymbol{\xi}), \quad i = 1, \ldots, d; \qquad X(d + 1, j) = \psi_j(\boldsymbol{\xi}).$$
Here, $\boldsymbol{W}_{\partial}$ is a fixed matrix depending on the basis functions and their derivatives,
chosen to ensure that

$$\boldsymbol{W}_{\partial}^T\, \mathbb{E}\big(N^{-1}\boldsymbol{\Psi}^T \boldsymbol{\Psi}\big)\, \boldsymbol{W}_{\partial} = \boldsymbol{I}, \tag{24.28}$$

where Ψ here denotes the derivative-augmented measurement matrix. For d-dimensional
Hermite polynomials, it holds that

$$\mathbb{E}\Big[\psi_i(\boldsymbol{\xi})\,\psi_j(\boldsymbol{\xi}) + \sum_{k=1}^{d} \frac{\partial \psi_i}{\partial \xi_k}(\boldsymbol{\xi})\,\frac{\partial \psi_j}{\partial \xi_k}(\boldsymbol{\xi})\Big] = \delta_{i,j}\Big(1 + \sum_{k=1}^{d} i_k\Big), \tag{24.29}$$

where $i_k$ denotes the order of $\psi_i$ in dimension k. This implies that $\boldsymbol{W}_{\partial}$ is diagonal,
with entries determined by the degree of the polynomial in each of the various
dimensions. In general, however, $\boldsymbol{W}_{\partial}$ will not be diagonal; this is the case, for
example, when considering d-dimensional Legendre polynomials.
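The Hermite identity (24.29) can be verified numerically in one dimension, where the orthonormal probabilists' Hermite polynomials satisfy $\psi_n' = \sqrt{n}\,\psi_{n-1}$. The quadrature check below is our own sketch:

```python
import numpy as np
from math import factorial, sqrt

def psi(n, x):
    """Orthonormal probabilists' Hermite polynomial under N(0, 1)."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return np.polynomial.hermite_e.hermeval(x, c) / sqrt(factorial(n))

def dpsi(n, x):
    """Derivative via the identity psi_n' = sqrt(n) * psi_{n-1}."""
    return sqrt(n) * psi(n - 1, x) if n > 0 else np.zeros_like(x)

# Gauss-Hermite(e) quadrature, rescaled to expectations under N(0, 1).
x, wq = np.polynomial.hermite_e.hermegauss(40)
wq = wq / np.sqrt(2 * np.pi)

pmax = 5
G = np.array([[np.sum(wq * (psi(i, x) * psi(j, x) + dpsi(i, x) * dpsi(j, x)))
               for j in range(pmax + 1)] for i in range(pmax + 1)])
# G matches diag(1 + i), i = 0..pmax: the one-dimensional case of (24.29).
```

The diagonal entries 1 + i show how derivative rows rescale the Gram matrix, which is exactly what the normalization $\boldsymbol{W}_{\partial}$ undoes.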
It is possible to define coherence in the presence of derivative information in a
manner that extends (24.14). Specifically, let

$$\mu_{\nabla} := \sup_{\boldsymbol{k} \in \mathcal{K}_p,\; \boldsymbol{\xi} \in \Omega} \|X(:, \boldsymbol{k})\|_2^2, \tag{24.30}$$

where the supremum is taken over all columns of X and potential realizations of
X. From (24.30), $\mu_{\nabla}$ represents the largest (squared) ℓ2-norm of a column of X evaluated
over the domain of the random input. Stated another way, $\mu_{\nabla}$ represents the largest
ℓ2-contribution to Ψ from any specific basis function over any potential realization.
This definition of coherence is related to the recovery of solutions by (24.27), both
directly and via the RIC in (24.12). As for (24.14), for Hermite polynomials (24.30)
is unbounded, and it is necessary to restrict the domain of ξ to a smaller set as
in [19, 40], giving boundedness with high probability.
The same guarantees of Theorem 2 hold in this case [65]. The coherence
in (24.30) is generally less than that of (24.14) [65] and is compatible with
Theorem 3, implying that there is no need to increase the sample size when including
derivative information. In terms of the conditioning of the measurement matrix,
relative to a comparable Ψ, including derivative information as described
does not reduce the ability of the matrix to uniformly recover solutions, although
contributions from the evaluation of the QoI and the appropriate derivatives may
affect the error in the solution computation [65]. In practice, the computational savings
associated with including derivative information in the ℓ1-minimization solution
depend on both the cost of computing the derivatives of the QoI and the benefit of
including derivative information.
Note that the importance sampling to minimize (24.30) involves the weight
function

$$w(\boldsymbol{\xi}) := \Big(\sup_{\boldsymbol{k} \in \mathcal{K}_p} \|X(:, \boldsymbol{k})\|_2\Big)^{-1}. \tag{24.31}$$

Here, the realized weight function multiplies the entire submatrix for the
corresponding sample. The associated sampling distribution is

$$g(\boldsymbol{\xi}) := c\,\Big(\sup_{\boldsymbol{k} \in \mathcal{K}_p} \|X(:, \boldsymbol{k})\|_2\Big)^{2} f(\boldsymbol{\xi}), \tag{24.32}$$

where c is a normalizing constant that need not be identified when samples from
g(ξ) are drawn via MCMC.
Following [56, 64, 66], a 2-D heat-driven square cavity flow problem with uncertain
wall temperature, as shown in Fig. 24.8, is considered. The left vertical wall has a
deterministic, constant temperature $\tilde{T}_h$, while the right vertical wall has a stochastic
temperature $\tilde{T}_c < \tilde{T}_h$ with constant mean $\bar{\tilde{T}}_c$. The tilde (˜) denotes
dimensional quantities. Both the top and bottom walls are assumed to be
adiabatic. The reference temperature and the reference temperature difference are
defined as $\tilde{T}_{\mathrm{ref}} = (\tilde{T}_h + \bar{\tilde{T}}_c)/2$ and $\Delta\tilde{T}_{\mathrm{ref}} = \tilde{T}_h - \bar{\tilde{T}}_c$, respectively. Under the
assumption of small temperature differences, i.e., the Boussinesq approximation,
the governing equations in dimensionless variables are given by
$$\frac{\partial \boldsymbol{u}}{\partial t} + \boldsymbol{u} \cdot \nabla \boldsymbol{u} = -\nabla p + \frac{\Pr}{\sqrt{\mathrm{Ra}}}\,\nabla^2 \boldsymbol{u} + \Pr\, T\, \hat{\boldsymbol{y}},$$
$$\nabla \cdot \boldsymbol{u} = 0, \tag{24.33}$$
$$\frac{\partial T}{\partial t} + \nabla \cdot (\boldsymbol{u}\, T) = \frac{1}{\sqrt{\mathrm{Ra}}}\,\nabla^2 T,$$
where $\hat{\boldsymbol{y}}$ is the unit vector (0, 1), u = (u, v) is the velocity vector field,
$T = (\tilde{T} - \tilde{T}_{\mathrm{ref}})/\Delta\tilde{T}_{\mathrm{ref}}$ is the normalized temperature, p is pressure, and t is time.
The nondimensional Prandtl and Rayleigh numbers are defined, respectively, as
$\Pr = \tilde{\mu}\,\tilde{c}_p/\tilde{\kappa}$ and $\mathrm{Ra} = \tilde{g}\,\tilde{\rho}\,\tilde{\beta}\,\Delta\tilde{T}_{\mathrm{ref}}\,\tilde{L}^3/(\tilde{\mu}\,\tilde{\alpha})$. Specifically, $\tilde{\mu}$ is the molecular viscosity, $\tilde{\alpha}$ is the
thermal diffusivity, $\tilde{\kappa}$ is the thermal conductivity, $\tilde{\rho}$ is the density, $\tilde{g}$ is the gravitational
acceleration, the coefficient of thermal expansion is given by $\tilde{\beta}$, and $\tilde{L}$ is the reference
length. In this example, the Prandtl and Rayleigh numbers are set to Pr = 0.71 and Ra = 10⁶,
respectively. For more details on the nondimensional variables in (24.33), the interested
reader is referred to [56, Chapter 6.2]. On the right vertical wall, a (normalized) temperature
distribution with stochastic fluctuations of the form
$$T_c(x = 1, y, \boldsymbol{\xi}) = \bar{T}_c + T_c', \qquad T_c' = \sigma_T \sum_{i=1}^{d} \sqrt{\lambda_i}\,\varphi_i(y)\,\xi_i, \tag{24.34}$$

is considered, where $\lambda_i$ and $\varphi_i(y)$ are the eigenpairs of
the exponential covariance kernel $C_{T_c T_c}(y_1, y_2) = \exp\!\big(-|y_1 - y_2|/l_c\big)$, where $l_c$ is the
correlation length. In this setting, a (semi-)analytic representation of the eigenpairs
$(\lambda_i, \varphi_i(y))$ in (24.34) is available; see, e.g., [36].
Here, $(T_h, \bar{T}_c) = (0.5, -0.5)$, d = 20, $l_c = 1/21$, and $\sigma_T = 11/100$. Our QoI is
the vertical velocity component at (x, y) = (0.25, 0.25), denoted by u(ξ).
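In place of the semi-analytic eigenpairs from [36], a discretized Karhunen-Loève expansion can be sketched by eigendecomposing the covariance matrix on a grid. The code below is our own illustration (the distribution of ξ is assumed uniform here for concreteness):

```python
import numpy as np

def kl_modes(n_grid, lc, n_modes):
    """Discretized KL of the exponential kernel exp(-|y1 - y2| / lc) on [0, 1]:
    eigenpairs of the covariance matrix scaled by the grid spacing."""
    y = np.linspace(0.0, 1.0, n_grid)
    h = y[1] - y[0]
    C = np.exp(-np.abs(y[:, None] - y[None, :]) / lc)
    lam, phi = np.linalg.eigh(C * h)                  # ascending eigenvalues
    lam = lam[::-1][:n_modes]                         # keep the largest modes
    phi = phi[:, ::-1][:, :n_modes] / np.sqrt(h)      # L2-normalized on [0, 1]
    return y, lam, phi

d, lc, sigma_T = 20, 1.0 / 21.0, 11.0 / 100.0
y, lam, phi = kl_modes(401, lc, d)

# One realization of the stochastic wall-temperature fluctuation T_c'.
rng = np.random.default_rng(6)
xi = rng.uniform(-1.0, 1.0, d)       # input distribution assumed for illustration
Tc_prime = sigma_T * phi @ (np.sqrt(lam) * xi)
```

The eigenvalues of the exponential kernel decay only algebraically, which is why d = 20 modes are retained to capture the short correlation length $l_c = 1/21$.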
Figures 24.9a and 24.10a contrast the approximate PCE coefficients obtained via
ℓ1-minimization, with N = 200 and N = 1000, respectively, against the reference
solution, confirming an improved solution recovery when N is increased.
As an example of the improvement that can be achieved via weighted ℓ1-
minimization, Figs. 24.9b and 24.10b display approximations to the solution, with
and without weights as in Fig. 24.11, generated from a simplified model which
does not require the solution to (24.33) [64]. Note that in Fig. 24.9, considerable
improvement occurs even though the weights are not a very accurate representation
of the solution decay. Figure 24.10 verifies that the weighted and unweighted
Fig. 24.9 Reference solution overlaid with a computed solution via standard ℓ1-minimization
(a) and weighted ℓ1-minimization (b) using 200 realizations of the input random vector (Figure
adapted from [64])
Fig. 24.10 Reference solution overlaid with a computed solution via standard ℓ1-minimization
(a) and weighted ℓ1-minimization (b) using 1000 realizations of the input random vector (Figure
adapted from [64])
Fig. 24.11 Approximate PCE coefficient magnitudes of the QoI v(0.25, 0.25) for the cavity
problem of Fig. 24.8. The approximation is derived from a simplified model [64] and is used
to construct weights for solving (24.23) (Figure adapted from [64])
Fig. 24.12 Reference solution overlaid with a computed solution via stochastic collocation with
Smolyak sparse grid based on Clenshaw-Curtis abscissas with N D 841 samples
reduces when a larger number of samples is used. Second, improving the accuracy
of the stochastic collocation (by using a larger sample size N) requires the
simulation of an additional number of samples that is dictated by the Smolyak
grid construction, as opposed to the available computational budget. This limitation,
however, does not apply to the compressive sampling methods described above, as
they rely on random samples.
6 Conclusion
The use of polynomial basis functions can impart sparsity in the solutions of many
problems, allowing tractable computation for a large class of problems. This chapter
has examined how a QoI is built from computations at realized random variables, how
orthogonal basis polynomials are identified in those random variables, and how
coefficients for those basis functions are then computed. A variety of theoretical
measures useful in gauging how well a solution may be recovered with a given
number of samples were investigated, and adjustments to the standard approach
that may lead to improved recovery were considered.
References
1. Adcock, B.: Infinite-dimensional `1 minimization and function approximation from pointwise
data. arXiv preprint arXiv:150302352 (2015)
2. Arlot, S., Celisse, A.: A survey of cross-validation procedures for model selection. Stat. Surv.
4, 4079 (2010)
3. Askey, R., Wainger, S.: Mean convergence of expansions in Laguerre and hermite series. Am.
J. Math. 87(3), 695708 (1965)
4. Askey, R.A., Arthur, W.J.: Some Basic Hypergeometric Orthogonal Polynomials That Gener-
alize Jacobi Polynomials, vol. 319. AMS, Providence (1985)
5. Babacan, S., Molina, R., Katsaggelos, A.: Bayesian compressive sensing using laplace priors.
IEEE Trans. Image Process. 19(1), 5363 (2010)
6. Becker, S., Bobin, J., Cands, E.J.: NESTA: A fast and accurate first-order method for sparse
recovery. ArXiv e-prints (2009). Available from http://arxiv.org/abs/0904.3367
7. Berg, E.v., Friedlander, M.P.: SPGL1: a solver for large-scale sparse reconstruction (2007).
Available from http://www.cs.ubc.ca/labs/scl/spgl1
8. Berveiller, M., Sudret, B., Lemaire, M.: Stochastic finite element: a non intrusive approach
by regression. Eur. J. Comput. Mech. Revue (Europenne de Mcanique Numrique) 15(13),
8192 (2006)
9. Blatman, G., Sudret, B.: Adaptive sparse polynomial chaos expansion based on least angle
regression. J. Comput. Phys. 230, 23452367 (2011)
10. Bouchot, J.L., Bykowski, B., Rauhut, H., Schwab, C.: Compressed sensing Petrov-Galerkin
approximations for parametric PDEs. In: International Conference on Sampling Theory and
Applications (SampTA 2015), pp. 528532. IEEE (2015)
11. Boufounos, P., Duarte, M., Baraniuk, R.: Sparse signal reconstruction from noisy compressive
measurements using cross validation. In: Proceedings of the 2007 IEEE/SP 14th Workshop
on Statistical Signal Processing (SSP07), Madison, pp. 299303. IEEE Computer Society
(2007)
852 J. Hampton and A. Doostan
12. Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to
sparse modeling of signals and images. SIAM Rev. 51(1), 3481 (2009)
13. Cands, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203
4215 (2005)
14. Cands, E., Tao, T.: Near optimal signal recovery from random projections: Universal encoding
strategies? IEEE Trans. Inf. Theory 52(12), 54065425 (2006)
15. Cands, E., Wakin, M.: An introduction to compressive sampling. IEEE Signal Process. Mag.
25(2), 2130 (2008)
16. Cands, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction
from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489509
(2006)
17. Cands, E., Wakin, M., Boyd, S.: Enhancing sparsity by reweighted `1 minimization. J. Fourier
Anal. Appl. 14(5), 877905 (2008)
18. Cands, E.J.: The restricted isometry property and its implications for compressed sensing. C.
R. Math. 346(9), 589592 (2008)
19. Cands, E.J., Plan, Y.: A probabilistic and ripless theory of compressed sensing. IEEE Trans.
Inf. Theory 57(11), 72357254 (2010)
20. Candes, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate
measurements. Commun. Pure Appl. Math. 59(8), 12071223 (2006)
21. Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: 33rd
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas
(2008)
22. Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci.
Comput. 20, 3361 (1998)
23. Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM Rev. 43(1),
129159 (2001)
24. Dai, W., Milenkovic, O.: Subspace pursuit for compressive sensing signal reconstruction. IEEE
Trans. Inf. Theory 55(5), 22302249 (2009)
25. Davenport, M.A., Duarte, M.F., Eldar, Y.C., Kutyniok, G.: Introduction to Compressed
Sensing. Cambridge University Press, Cambridge (2012)
26. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 12891306 (2006)
27. Donoho, D., Huo, X.: Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inf.
Theory 47(7), 28452862 (2001). doi:10.1109/18.959265
28. Donoho, D., Tanner, J.: Observed universality of phase transitions in high-dimensional
geometry, with implications for modern data analysis and signal processing. Philos. Trans.
R Soc. A Math. Phys. Eng. Sci. 367(1906), 42734293 (2009)
29. Donoho, D., Elad, M., Temlyakov, V.: Stable recovery of sparse overcomplete representations
in the presence of noise. IEEE Trans. Inf. Theory 52(1), 618 (2006)
30. Donoho, D., Stodden, V., Tsaig, Y.: About SparseLab (2007)
31. Doostan, A., Owhadi, H.: A non-adapted sparse approximation of PDEs with stochastic inputs.
J. Comput. Phys. 230, 30153034 (2011)
32. Doostan, A., Owhadi, H., Lashgari, A., Iaccarino, G.: Non-adapted sparse approximation
of PDEs with stochastic inputs. Tech. Rep. Annual Research Brief, Center for Turbulence
Research, Stanford University (2009)
33. Eldar, Y., Kutyniok, G.: Compressed Sensing: Theory and Applications. Cambridge University
Press, Cambridge (2012)
34. Foucart, S.: A note on guaranteed sparse recovery via, `1 -minimization. Appl. Comput.
Harmonic Anal. 29(1), 97103 (2010). Elsevier
35. Gerstner, T., Griebel, M.: Numerical integration using sparse grids. Numer. Algorithms 18(3
4):209232 (1998)
36. Ghanem, R., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Dover, Minneola
(2002)
37. Gilks, W.R., Richardson, S., Spiegelhalter, D.J.: Markov Chain Monte Carlo in Practice, vol 2.
CRC Press, Boca Raton (1996)
24 Compressive Sampling Methods for Sparse Polynomial Chaos Expansions 853
38. Hadigol, M., Maute, K., Doostan, A.: On uncertainty quantification of lithium-ion batteries.
arXiv preprint arXiv:150507776 (2015)
39. Hampton, J., Doostan, A.: Coherence motivated sampling and convergence analysis of least
squares polynomial chaos regression. Comput. Methods Appl. Mech. Eng. 290, 7397 (2015)
40. Hampton, J., Doostan, A.: Compressive sampling of polynomial chaos expansions: conver-
gence analysis and sampling strategies. J. Comput. Phys. 280, 363386 (2015)
41. Hosder, S., Walters, R., Perez, R.: A non-intrusive polynomial chaos method for uncertainty
propagation in CFD simulations. In: 44th AIAA Aerospace Sciences Meeting and Exhibit,
AIAA-2006-891, Reno (NV) (2006)
42. Huang, A.: A re-weighted algorithm for designing data dependent sensing dictionary. Int. J. Phys. Sci. 6(3), 386–390 (2011)
43. Jakeman, J., Eldred, M., Sargsyan, K.: Enhancing ℓ1-minimization estimates of polynomial chaos expansions using basis selection. J. Comput. Phys. 289, 18–34 (2015)
44. Jakeman, J.D., Eldred, M.S., Sargsyan, K.: Enhancing ℓ1-minimization estimates of polynomial chaos expansions using basis selection. ArXiv e-prints 1407.8093 (2014)
45. Ji, S., Xue, Y., Carin, L.: Bayesian compressive sensing. IEEE Trans. Signal Process. 56(6), 2346–2356 (2008)
46. Jones, B., Parrish, N., Doostan, A.: Postmaneuver collision probability estimation using sparse polynomial chaos expansions. J. Guidance Control Dyn. 38(8), 1–13 (2015)
47. Juditsky, A., Nemirovski, A.: Accuracy guarantees for ℓ1-recovery. IEEE Trans. Inf. Theory 57, 7818–7839 (2011)
48. Juditsky, A., Nemirovski, A.: On verifiable sufficient conditions for sparse signal recovery via ℓ1 minimization. Math. Program. 127(1), 57–88 (2011)
49. Karagiannis, G., Lin, G.: Selection of polynomial chaos bases via Bayesian model uncertainty methods with applications to sparse approximation of PDEs with stochastic inputs. J. Comput. Phys. 259, 114–134 (2014)
50. Karagiannis, G., Konomi, B., Lin, G.: A Bayesian mixed shrinkage prior procedure for spatial stochastic basis selection and evaluation of gPC expansions: applications to elliptic SPDEs. J. Comput. Phys. 284, 528–546 (2015)
51. Khajehnejad, M.A., Xu, W., Avestimehr, A.S., Hassibi, B.: Improved sparse recovery thresholds with two-step reweighted ℓ1 minimization. In: 2010 IEEE International Symposium on Information Theory Proceedings (ISIT), Austin, pp. 1603–1607. IEEE (2010)
52. Komkov, V., Choi, K., Haug, E.: Design Sensitivity Analysis of Structural Systems, vol. 177. Academic, Orlando (1986)
53. Krahmer, F., Ward, R.: Beyond incoherence: stable and robust sampling strategies for compressive imaging. arXiv preprint arXiv:1210.2380 (2012)
54. Krasikov, I.: New bounds on the Hermite polynomials. ArXiv Mathematics e-prints math/0401310 (2004)
55. Ma, X., Zabaras, N.: An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method. Inverse Probl. 25, 035013 (2009)
56. Le Maître, O., Knio, O.: Spectral Methods for Uncertainty Quantification with Applications to Computational Fluid Dynamics. Springer, Dordrecht/New York (2010)
57. Mathelin, L., Gallivan, K.: A compressed sensing approach for partial differential equations with random input data. Commun. Comput. Phys. 12, 919–954 (2012)
58. Mo, Q., Li, S.: New bounds on the restricted isometry constant δ2k. Appl. Comput. Harmonic Anal. 31(3), 460–468 (2011)
59. Narayan, A., Zhou, T.: Stochastic collocation on unstructured multivariate meshes. Commun. Comput. Phys. 18, 1–36 (2015)
60. Narayan, A., Jakeman, J.D., Zhou, T.: A Christoffel function weighted least squares algorithm for collocation approximations. arXiv preprint arXiv:1412.4305 (2014)
61. Needell, D.: Noisy signal recovery via iterative reweighted ℓ1-minimization. In: Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove (2009)
62. Needell, D., Tropp, J.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmonic Anal. 26(3), 301–321 (2008)
854 J. Hampton and A. Doostan
63. Park, T., Casella, G.: The Bayesian lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008)
64. Peng, J., Hampton, J., Doostan, A.: A weighted ℓ1-minimization approach for sparse polynomial chaos expansions. J. Comput. Phys. 267, 92–111 (2014)
65. Peng, J., Hampton, J., Doostan, A.: On polynomial chaos expansion via gradient-enhanced ℓ1-minimization. arXiv preprint arXiv:1506.00343 (2015)
66. Le Quéré, P.: Accurate solutions to the square thermally driven cavity at high Rayleigh number. Comput. Fluids 20(1), 29–41 (1991)
67. Rall, L.B.: Automatic Differentiation: Techniques and Applications, vol. 120. Springer, Berlin (1981)
68. Rauhut, H.: Compressive sensing and structured random matrices. Theor. Found. Numer. Methods Sparse Recover. 9, 1–92 (2010)
69. Rauhut, H., Ward, R.: Sparse Legendre expansions via ℓ1-minimization. J. Approx. Theory 164(5), 517–533 (2012)
70. Rauhut, H., Ward, R.: Interpolation via weighted ℓ1-minimization. Appl. Comput. Harmonic Anal. 40, 321–351 (2015)
71. Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer, New York (2004)
72. Sargsyan, K., Safta, C., Najm, H., Debusschere, B., Ricciuto, D., Thornton, P.: Dimensionality reduction for complex models via Bayesian compressive sensing. Int. J. Uncertain. Quantif. 4, 63–93 (2013)
73. Savin, E., Resmini, A., Peter, J.: Sparse polynomial surrogates for aerodynamic computations with random inputs. arXiv preprint arXiv:1506.02318 (2015)
74. Schiavazzi, D., Doostan, A., Iaccarino, G.: Sparse multiresolution regression for uncertainty propagation. Int. J. Uncertain. Quantif. (2014). doi:10.1615/Int.J.UncertaintyQuantification.2014010147
75. Schoutens, W.: Stochastic Processes and Orthogonal Polynomials. Springer, New York (2000)
76. Smolyak, S.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Soviet Math. Dokl. 4, 240–243 (1963)
77. Szegő, G.: Orthogonal Polynomials. American Mathematical Society, Providence (1939)
78. Tang, G., Iaccarino, G.: Subsampled Gauss quadrature nodes for estimating polynomial chaos expansions. SIAM/ASA J. Uncertain. Quantif. 2(1), 423–443 (2014)
79. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. (Ser. B) 58, 267–288 (1996)
80. Tropp, J.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004). doi:10.1109/TIT.2004.834793
81. Tropp, J.A., Gilbert, A.C.: Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 53, 4655–4666 (2007)
82. Ward, R.: Compressed sensing with cross validation. IEEE Trans. Inf. Theory 55(12), 5773–5782 (2009)
83. West, T., Brune, A., Hosder, S., Johnston, C.: Uncertainty analysis of radiative heating predictions for Titan entry. J. Thermophys. Heat Transf. 1–14 (2015)
84. Xiu, D.: Numerical Methods for Stochastic Computations: A Spectral Method Approach. Princeton University Press, Princeton (2010)
85. Xiu, D., Hesthaven, J.: High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. 27(3), 1118–1139 (2005)
86. Xiu, D., Karniadakis, G.: The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)
87. Xu, W., Khajehnejad, M., Avestimehr, A., Hassibi, B.: Breaking through the thresholds: an analysis for iterative reweighted ℓ1 minimization via the Grassmann angle framework (2009). ArXiv e-prints. Available from http://arxiv.org/abs/0904.0994
88. Xu, Z., Zhou, T.: On sparse interpolation and the design of deterministic interpolation points. SIAM J. Sci. Comput. 36(4), A1752–A1769 (2014)
24 Compressive Sampling Methods for Sparse Polynomial Chaos Expansions 855
89. Yan, L., Guo, L., Xiu, D.: Stochastic collocation algorithms using ℓ1-minimization. Int. J. Uncertain. Quantif. 2(3), 279–293 (2012)
90. Yang, X., Karniadakis, G.E.: Reweighted ℓ1 minimization method for stochastic elliptic differential equations. J. Comput. Phys. 248, 87–108 (2013)
91. Yang, X., Lei, H., Baker, N., Lin, G.: Enhancing sparsity of Hermite polynomial expansions by iterative rotations. arXiv preprint arXiv:1506.04344 (2015)
Low-Rank Tensor Methods for Model
Order Reduction 25
Anthony Nouy
Abstract
Parameter-dependent models arise in many contexts such as uncertainty quan-
tification, sensitivity analysis, inverse problems, or optimization. Parametric or
uncertainty analyses usually require the evaluation of an output of a model for
many instances of the input parameters, which may be intractable for complex
numerical models. A possible remedy consists in replacing the model by an
approximate model with reduced complexity (a so-called reduced order model)
allowing a fast evaluation of output variables of interest. This chapter provides
an overview of low-rank methods for the approximation of functions that are
identified either with order-two tensors (for vector-valued functions) or higher-
order tensors (for multivariate functions). Different approaches are presented
for the computation of low-rank approximations, either based on samples of
the function or on the equations that are satisfied by the function, the latter
approaches including projection-based model order reduction methods. For
multivariate functions, different notions of ranks and the corresponding low-rank
approximation formats are introduced.
Keywords
High-dimensional problems · Low-rank approximation · Model order reduction · Parameter-dependent equations · Stochastic equations · Uncertainty quantification
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 858
2 Low-Rank Approximation of Order-Two Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 860
2.1 Best Rank-m Approximation and Optimal Subspaces . . . . . . . . . . . . . . . . . . . . . . . . 860
A. Nouy ()
Department of Computer Science and Mathematics, GeM, Ecole Centrale Nantes, Nantes, France
e-mail: anthony.nouy@ec-nantes.fr
1 Introduction

Consider a parameter-dependent equation of the form

R(u(ξ); ξ) = 0,   (25.1)

where ξ are parameters taking values in some set Ξ and where the solution u(ξ) is in some vector space V, say R^M. Parametric or uncertainty analyses usually require the evaluation of the solution for many instances of the parameters, which may be intractable for complex numerical models (with large M) for which one single solution requires hours or days of computation time. Therefore, one usually relies on approximations of the solution map u: Ξ → V allowing for a rapid evaluation of
output quantities of interest. These approximations take on different names such as meta-model, surrogate model, or reduced order model. They are usually of the form

u_m(ξ) = Σ_{i=1}^{m} v_i s_i(ξ),   (25.2)

where the v_i are elements of V and the s_i are elements of some space S of functions defined on Ξ. Standard linear approximation methods rely on the introduction of generic bases (e.g., polynomials, wavelets), allowing an accurate approximation of a large class of models to be constructed, but at the price of requiring expansions with a high number of terms m. These generic approaches usually result in a very high computational complexity.
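To make the cost structure of (25.2) concrete, the following sketch evaluates a separated representation once the v_i and s_i are available; the random modes and the polynomial coefficient functions are purely illustrative placeholders, not part of any specific model.

```python
import numpy as np

# Sketch of a reduced order model u_m(xi) = sum_i v_i s_i(xi), cf. (25.2).
# The modes v_i and coefficient functions s_i are illustrative placeholders.
M, m = 1000, 3
rng = np.random.default_rng(0)
modes = rng.standard_normal((M, m))       # columns: v_1, ..., v_m in V = R^M
s_funcs = [lambda xi: 1.0,
           lambda xi: xi,
           lambda xi: xi ** 2]            # s_1, ..., s_m

def u_m(xi):
    # Evaluate the separated representation at a parameter value xi:
    # cost O(M m), independent of the cost of solving the full model.
    coeffs = np.array([s(xi) for s in s_funcs])
    return modes @ coeffs

u = u_m(0.5)
```

Each online evaluation is a small matrix-vector product, which is what makes many-query analyses (sampling, optimization) affordable once the reduced model is built.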
The set R_m admits the subspace-based parametrization

R_m = { w ∈ V_m ⊗ L²_μ(Ξ) : V_m ⊂ V, dim(V_m) = m }.

Then the best rank-m approximation problem can be reformulated as an optimization problem over the set of m-dimensional subspaces:

min_{dim(V_m)=m} ∫_Ξ ‖u(ξ) − P_{V_m} u(ξ)‖²_V μ(dξ),   (25.5)

where P_{V_m} is the orthogonal projection from V onto V_m, which provides the best approximation P_{V_m} u(ξ) of u(ξ) from V_m, defined by

P_{V_m} u(ξ) = arg min_{v ∈ V_m} ‖u(ξ) − v‖_V.

That means that the best rank-m approximation problem is equivalent to the problem of finding an optimal subspace of dimension m for the projection of the solution.
Kolmogorov widths are defined from the set of solutions

K := u(Ξ) = { u(ξ) : ξ ∈ Ξ },

and measure how well the set of solutions K can be approximated by m-dimensional subspaces. They provide a quantification of the ideal performance of model order reduction methods.
For p = ∞, assuming that K is compact, the number

d_m^{(∞)}(u) = min_{dim(V_m)=m} sup_{ξ∈Ξ} ‖u(ξ) − P_{V_m} u(ξ)‖_V = min_{dim(V_m)=m} sup_{v∈K} ‖v − P_{V_m} v‖_V   (25.7)

is the Kolmogorov m-width of the set K. For p < ∞, the numbers

d_m^{(p)}(u) = min_{dim(V_m)=m} ( ∫_Ξ ‖u(ξ) − P_{V_m} u(ξ)‖_V^p μ(dξ) )^{1/p}   (25.8)

provide measures of the error that take into account the measure μ on the parameter set. This is particularly relevant for uncertainty quantification, where these numbers d_m^{(p)}(u) directly control the error on the moments of the solution map u (such as mean, variance, or higher-order moments).
Some general results on the convergence of linear widths are available in approximation theory (see, e.g., [51]). More interesting results which are specific to some classes of parameter-dependent equations have been recently obtained [16, 38]. These results usually exploit smoothness and anisotropy of the solution map u: Ξ → V and are typically upper bounds deduced from results on polynomial approximation. Even if a priori estimates for d_m^{(p)}(u) are usually not available, a challenging problem is to propose numerical methods that provide approximations u_m of the form (25.3) with an error ‖u − u_m‖_p of the order of the best achievable accuracy d_m^{(p)}(u).
The solution map u, identified with an operator U, admits a singular value decomposition

U = Σ_{i≥1} σ_i v_i ⊗ s_i,

where the σ_i are the singular values and where v_i and s_i are the corresponding normalized right and left singular vectors, respectively. Denoting by U* the adjoint operator of U, defined by U*: s ∈ L²_μ(Ξ) ↦ ∫_Ξ u(ξ) s(ξ) μ(dξ) ∈ V, the operator

U*U := C(u): v ∈ V ↦ ∫_Ξ u(ξ) (u(ξ), v)_V μ(dξ) ∈ V   (25.9)

is the correlation operator of u.
Here we adopt a subspace point of view for the low-rank approximation of the solution map u: Ξ → V. We first describe how to define projections onto a given subspace. Then we present methods for the practical construction of subspaces.
3.1.1 Interpolation
When the subspace V_m is the linear span of evaluations (samples) of the solution u at m given points {ξ¹, ..., ξ^m} in Ξ, i.e., V_m = span{u(ξ¹), ..., u(ξ^m)}, an approximation can be defined by

u_m(ξ) = Σ_{i=1}^{m} u(ξ^i) φ_i(u(ξ)),   (25.11)

with {φ_i} a dual system to {u(ξ^i)} such that φ_i(u(ξ^j)) = δ_ij for 1 ≤ i, j ≤ m. This yields an interpolation u_m(ξ) depending not only on the value of u at the points {ξ¹, ..., ξ^m} but also on u(ξ). This is of practical interest if φ_i(u(ξ)) can be efficiently computed without a complete knowledge of u(ξ). For example, if the coefficients of u(ξ) on some basis of V can be estimated without computing u(ξ), then a possible choice consists in taking for {φ_1, ..., φ_m} a set of functions that associate to an element of V a set of m of its coefficients. This is the idea behind the empirical interpolation method [41] and its generalization [40].
where c(ξ) ≥ 1. As an example, let us consider the case of a linear problem where

R(v; ξ) = A(ξ)v − b(ξ),

with A(ξ) a linear operator from V into some Hilbert space W, and let us consider a Galerkin projection defined by minimizing some residual norm, that means

u_m(ξ) = arg min_{v ∈ V_m} ‖A(ξ)v − b(ξ)‖_W.

Assuming that α(ξ)‖v‖_V ≤ ‖A(ξ)v‖_W ≤ β(ξ)‖v‖_V for all v ∈ V, the resulting projection u_m(ξ) satisfies (25.12) with a constant c(ξ) = β(ξ)/α(ξ), which can be interpreted as the condition number of the operator A(ξ). This reveals the interest of introducing efficient preconditioners in order to better exploit the approximation power of the subspace V_m (see, e.g., [13, 58] for the construction of parameter-dependent preconditioners). The resulting approximation can be written u_m(ξ) = Σ_{i=1}^{m} v_i s_i(ξ), where the coefficient functions s(ξ) = (s_1(ξ), ..., s_m(ξ)) solve a reduced system of equations. An efficient solution of this reduced system requires that the residual admits a representation

R(u_m; ξ) = Σ_j R_j λ_j(s(ξ); ξ),

where the R_j are independent of ξ and where the λ_j can be computed with a complexity depending on m (the dimension of the reduced order model) but not on the dimension of V. This allows Galerkin projections to be computed by solving a reduced system of m equations on the unknown coefficients s(ξ), with a computational complexity independent of the dimension of the full order model. For linear problems, such a property is obtained when the operator A(ξ) and the right-hand side b(ξ) admit low-rank representations (so-called affine representations)

A(ξ) = Σ_{i=1}^{L} A_i λ_i(ξ),   b(ξ) = Σ_{i=1}^{R} γ_i(ξ) b_i.   (25.16)
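Such an affine structure enables the classical offline/online splitting. The sketch below is a minimal illustration with random stand-ins for the affine terms A_i, b_i, the reduced basis, and the coefficient functions λ_i, γ_i; none of these are tied to a specific model.

```python
import numpy as np

# Offline/online splitting under an affine representation like (25.16):
# A(xi) = sum_i lambda_i(xi) A_i, b(xi) = sum_i gamma_i(xi) b_i.
rng = np.random.default_rng(2)
N, m, L, R = 300, 4, 3, 2
Vm = np.linalg.qr(rng.standard_normal((N, m)))[0]        # orthonormal basis of V_m
A_terms = [np.eye(N) + 0.1 * rng.standard_normal((N, N)) for _ in range(L)]
b_terms = [rng.standard_normal(N) for _ in range(R)]

# Offline stage: project each affine term once (cost depends on N).
A_red = [Vm.T @ Ai @ Vm for Ai in A_terms]               # m x m matrices
b_red = [Vm.T @ bi for bi in b_terms]                    # m-vectors

def solve_reduced(lam, gam):
    # Online stage: assemble and solve an m x m Galerkin system;
    # the cost is independent of the full dimension N.
    Am = sum(t * Ai for t, Ai in zip(lam, A_red))
    bm = sum(g * bi for g, bi in zip(gam, b_red))
    return Vm @ np.linalg.solve(Am, bm)

u = solve_reduced([1.0, 0.2, -0.1], [1.0, 0.5])
```

The online cost involves only m×m objects, which is the point of requiring the affine form (25.16).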
If A(ξ) and b(ξ) are not explicitly given under the form (25.16), then a preliminary approximation step is needed. Such expressions can be obtained by the empirical interpolation method [5, 11] or other low-rank truncation methods. Note that preconditioners for parameter-dependent operators should also have such representations in order to preserve a reduced order model with a complexity independent of the dimension of the full order model.
L² Optimality
When one is interested in optimal subspaces with respect to the norm ‖·‖₂, the definition (25.5) of optimal subspaces can be replaced by

min_{dim(V_m)=m} (1/K) Σ_{k=1}^{K} ‖u(ξ^k) − P_{V_m} u(ξ^k)‖²_V = min_{v∈R_m} (1/K) Σ_{k=1}^{K} ‖u(ξ^k) − v(ξ^k)‖²_V,   (25.17)

where {ξ¹, ..., ξ^K} are K samples drawn from the measure μ. The resulting subspace V_m is obtained as the dominant eigenspace of the empirical correlation operator

C_K(u) = (1/K) Σ_{k=1}^{K} u(ξ^k) (u(ξ^k), ·)_V.   (25.18)

More generally, one can consider

min_{dim(V_m)=m} Σ_{k=1}^{K} ω_k ‖u(ξ^k) − P_{V_m} u(ξ^k)‖²_V = min_{v∈R_m} Σ_{k=1}^{K} ω_k ‖u(ξ^k) − v(ξ^k)‖²_V,   (25.19)

using a suitable quadrature rule for the integration over Ξ, e.g., exploiting the smoothness of the solution map u: Ξ → V in order to improve the convergence
with K and therefore decrease the number of evaluations of u for a given accuracy. The resulting subspace V_m is obtained as the dominant eigenspace of the operator

C_K(u) = Σ_{k=1}^{K} ω_k u(ξ^k) (u(ξ^k), ·)_V.   (25.20)
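In practice, the dominant eigenspace of (25.20) is obtained from an SVD of the snapshot matrix with columns scaled by the square roots of the weights. A minimal sketch with synthetic snapshots and Monte Carlo weights (both illustrative):

```python
import numpy as np

# Snapshot-based construction of V_m as the dominant eigenspace of the
# weighted empirical correlation operator (25.20): scaling snapshot k by
# sqrt(w_k) turns the eigenproblem into an ordinary SVD.
rng = np.random.default_rng(3)
Mdim, K, m = 150, 40, 4
snapshots = rng.standard_normal((Mdim, 6)) @ rng.standard_normal((6, K))
w = np.full(K, 1.0 / K)                   # quadrature (here Monte Carlo) weights

weighted = snapshots * np.sqrt(w)         # scale column k by sqrt(w_k)
V, sigma, _ = np.linalg.svd(weighted, full_matrices=False)
Vm = V[:, :m]                             # POD basis spanning V_m

# Weighted mean-square projection error onto span(Vm):
proj_err = snapshots - Vm @ (Vm.T @ snapshots)
err = np.sum(w * np.linalg.norm(proj_err, axis=0) ** 2)
```

The discarded squared singular values give the weighted mean-square error exactly, so the rank m can be chosen from the singular value decay.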
L∞ Optimality
If one is interested in optimality with respect to the norm ‖·‖∞, the definition (25.5) of optimal subspaces can be replaced by

min_{dim(V_m)=m} sup_{ξ∈Ξ_K} ‖u(ξ) − P_{V_m} u(ξ)‖_V = min_{v∈R_m} sup_{ξ∈Ξ_K} ‖u(ξ) − v(ξ)‖_V,   (25.21)

where the m points are selected in the finite set of points Ξ_K. In practice, this combinatorial problem can be replaced by a greedy algorithm, which consists in selecting the points adaptively: given the first m points and the corresponding subspace V_m, a new interpolation point ξ^{m+1} is defined such that

ξ^{m+1} ∈ arg max_{ξ∈Ξ_K} ‖u(ξ) − P_{V_m} u(ξ)‖_V.   (25.22)
polynomial basis. Let us note that the statistical estimate C_K(u) in (25.18) can be interpreted as the correlation operator C(u_K) of a piecewise constant interpolation u_K(ξ) = Σ_{k=1}^{K} u(ξ^k) 1_{ξ∈O_k} of u(ξ), where the sets {O_1, ..., O_K} form a partition of Ξ such that ξ^k ∈ O_k and μ(O_k) = 1/K for all k.
Remark 1. Let us mention that the optimal rank-m singular value decomposition of u can be equivalently obtained by computing the dominant eigenspace of the operator Ĉ(u) = UU*: L²_μ(Ξ) → L²_μ(Ξ). Then, a dual approach for model order reduction consists in defining a subspace S_m in L²_μ(Ξ) as the dominant eigenspace of Ĉ(u_K), where u_K is an approximation of u. An approximation of u can then be obtained by a Galerkin projection onto the subspace V ⊗ S_m (see, e.g., [20], where an approximation u_K of the solution of a parameter-dependent partial differential equation is first computed using a coarse finite element approximation).
This problem is an optimization problem over the set R_m of rank-m tensors, which can be solved using optimization algorithms on low-rank manifolds, such as an alternating minimization algorithm which consists in successively minimizing over the v_i and over the s_i. Assume that Δ(·; ξ) defines a distance to the solution u(ξ) which is uniformly equivalent to the one induced by the norm ‖·‖_V (condition (25.25)).
This results in an adaptive interpolation algorithm which was first introduced in [52]. It is the basic idea behind the so-called reduced basis methods (see, e.g., [50, 53]). Assuming that Δ satisfies (25.25) and that the projection u_m(ξ) verifies the quasi-optimality condition (25.12), the selection of interpolation points defined by (25.27) is quasi-optimal in the sense that

‖u(ξ^{m+1}) − P_{V_m} u(ξ^{m+1})‖_V ≥ γ sup_{ξ∈Ξ} ‖u(ξ) − P_{V_m} u(ξ)‖_V,

with a constant γ ≤ 1 proportional to inf_{ξ∈Ξ} c(ξ)⁻¹. That makes (25.27) a suboptimal version of the greedy algorithm (25.22) (a so-called weak greedy algorithm). Convergence results for this algorithm can be found in [8, 9, 18], where the authors provide explicit comparisons between the resulting error ‖u(ξ) − u_m(ξ)‖_V and the Kolmogorov m-width of u(Ξ_K), which is the best achievable error by projections onto m-dimensional spaces.
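Over a finite training set, the greedy selection (25.22) can be sketched as below; in an actual reduced basis method, the exact projection error in the selection step would be replaced by a cheap error estimate Δ, yielding the weak greedy variant. The snapshot matrix here is synthetic.

```python
import numpy as np

# Greedy selection of interpolation points over a finite training set, as in
# (25.22): pick the parameter with the largest current projection error and
# enrich V_m with the corresponding snapshot.
rng = np.random.default_rng(4)
Mdim, K, m = 100, 60, 5
U = rng.standard_normal((Mdim, 7)) @ rng.standard_normal((7, K))

basis = np.zeros((Mdim, 0))               # orthonormal basis of the current V_m
selected = []
for _ in range(m):
    err = np.linalg.norm(U - basis @ (basis.T @ U), axis=0)
    k = int(np.argmax(err))               # next interpolation point
    selected.append(k)
    vnew = U[:, k] - basis @ (basis.T @ U[:, k])   # Gram-Schmidt step
    basis = np.hstack([basis, (vnew / np.linalg.norm(vnew))[:, None]])

max_err = np.linalg.norm(U - basis @ (basis.T @ U), axis=0).max()
```

Each iteration requires only one new evaluation of the full model (the selected snapshot), which is what makes the greedy strategy attractive compared with sampling all of Ξ_K.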
The above definitions of interpolation points, and therefore of the resulting subspaces V_m, do not take into account explicitly the probability measure μ. However, this measure is taken into account implicitly when working with a sample set Ξ_K drawn according to the probability measure μ. A construction that takes into account the measure explicitly has been proposed in [12], where Δ(v; ξ) is replaced by ω(ξ)Δ(v; ξ), with ω(ξ) a weight function depending on the probability measure μ.
subspace of the algebraic tensor space L²_{μ₁}(Ξ₁) ⊗ ... ⊗ L²_{μ_d}(Ξ_d) (whose completion is L²_μ(Ξ)).
The canonical rank of a tensor v in S is the minimal integer m such that v can be written under the form (25.29). An approximation of the form (25.29) is called an approximation in canonical tensor format. It has a storage complexity in O(rnd). For order-two tensors, this is the standard and unique notion of rank. For higher-order tensors, other notions of rank can be introduced, therefore yielding different types of rank-structured approximations. First, for a certain subset of dimensions α ⊂ D := {1, ..., d} and its complementary subset α^c = D \ α, S can be identified with the space S_α ⊗ S_{α^c} of order-two tensors, where S_α = ⊗_{ν∈α} S_ν. The α-rank of a tensor v, denoted rank_α(v), is then defined as the minimal integer r_α such that

v(ξ₁, ..., ξ_d) = Σ_{i=1}^{r_α} v_i^α(ξ_α) v_i^{α^c}(ξ_{α^c}),

where v_i^α ∈ S_α and v_i^{α^c} ∈ S_{α^c} (here ξ_α denotes the collection of variables {ξ_ν : ν ∈ α}), which is the standard and unique notion of rank for order-two tensors. Low-rank Tucker formats are then defined by imposing the α-rank for a collection of subsets α ⊂ D. The Tucker rank (or multilinear rank) of a tensor is the tuple (rank_{{1}}(v), ..., rank_{{d}}(v)) in N^d. A tensor with Tucker rank bounded by (r₁, ..., r_d) can be written as
v(ξ₁, ..., ξ_d) = Σ_{i₁=1}^{r₁} ... Σ_{i_d=1}^{r_d} a_{i₁...i_d} v¹_{i₁}(ξ₁) ... v^d_{i_d}(ξ_d),   (25.30)
where v^ν_{i_ν} ∈ S_ν and where a ∈ R^{r₁×...×r_d} is a tensor of order d, called the core tensor. An approximation of the form (25.30) is called an approximation in Tucker format. It can be seen as an approximation in a tensor space U₁ ⊗ ... ⊗ U_d, where U_ν = span{v^ν_i}_{i=1}^{r_ν} is an r_ν-dimensional subspace of S_ν. The storage complexity of this format is in O(rnd + r^d) (with r = max_ν r_ν and n = max_ν n_ν) and grows exponentially with the dimension d. Additional constraints on the ranks of v (or of the core tensor a) have to be imposed in order to reduce this complexity. Tree-based (or hierarchical) Tucker formats [25, 30] are based on a notion of rank associated with a dimension partition tree T, which is a tree-structured collection of subsets of D, with D as the root of the tree and the singletons {1}, ..., {d} as the leaves of the tree (see Fig. 25.1). The tree-based (or hierarchical) Tucker rank associated with T is then defined as the tuple (rank_α(v))_{α∈T}. A particular case of interest is the tensor train (TT) format [47], which is associated with a simple rooted tree T with interior nodes {ν, ..., d}, 1 ≤ ν ≤ d (represented in Fig. 25.1b). The TT-rank of a tensor v is the tuple (r₁, ..., r_{d−1}) with r_ν = rank_{{ν+1,...,d}}(v).
Fig. 25.1 Examples of dimension partition trees over D = {1, ..., 4}. (a) Balanced tree. (b) Unbalanced tree
A tensor with TT-rank bounded by (r₁, ..., r_{d−1}) can be written as

v(ξ₁, ..., ξ_d) = Σ_{i₁=1}^{r₁} ... Σ_{i_{d−1}=1}^{r_{d−1}} v¹_{1,i₁}(ξ₁) v²_{i₁,i₂}(ξ₂) ... v^d_{i_{d−1},1}(ξ_d),   (25.31)

with v^ν_{i,j} ∈ S_ν, which is very convenient in practice (for storage, evaluations, and algebraic manipulations). The storage complexity for the TT format is in O(dr²n) (with r = max_ν r_ν).
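The following sketch illustrates the storage gain and the entry-wise evaluation of a tensor in TT format (25.31) on a finite grid; the cores are random and purely illustrative.

```python
import numpy as np

# A tensor in TT format (25.31) on a finite grid: cores[k] has shape
# (r_{k-1}, n, r_k) with r_0 = r_d = 1, and an entry is evaluated by a
# chain of small matrix products.
rng = np.random.default_rng(5)
d, n, r = 6, 10, 3
ranks = [1] + [r] * (d - 1) + [1]
cores = [rng.standard_normal((ranks[k], n, ranks[k + 1])) for k in range(d)]

def tt_eval(cores, idx):
    # v(i_1,...,i_d) = G_1[:, i_1, :] G_2[:, i_2, :] ... G_d[:, i_d, :]
    out = np.ones((1, 1))
    for G, i in zip(cores, idx):
        out = out @ G[:, i, :]
    return out[0, 0]

storage = sum(G.size for G in cores)      # O(d r^2 n) numbers ...
full = n ** d                             # ... versus n^d for the full tensor
value = tt_eval(cores, [0] * d)
```

Here 480 ≫ 420 numbers replace the 10⁶ entries of the full array; more importantly, the cost of one evaluation is a product of d small matrices, linear in d.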
Sparse tensor methods consist in searching for approximations of the form (25.28) with only a few nonzero terms, that means approximations v(ξ) = Σ_{i=1}^{m} a_i s_i(ξ), where the m functions s_i are selected (with either adaptive or nonadaptive methods) in the collection (dictionary) of functions D = {φ¹_{k₁}(ξ₁) ... φ^d_{k_d}(ξ_d) : 1 ≤ k_ν ≤ n_ν, 1 ≤ ν ≤ d}. A typical choice consists in taking for D a basis of multivariate polynomials. Recently, theoretical results have been obtained on the convergence of sparse polynomial approximations of the solution of parameter-dependent equations (see [15]). Also, algorithms have been proposed that can achieve convergence rates comparable to the best m-term approximations. For such a dictionary D containing rank-one functions, a sparse m-term approximation is a tensor with canonical rank bounded by m (usually lower than m, see [29, Section 7.6.5]). Therefore, an approximation in canonical tensor format (25.29) can be seen as a sparse m-term approximation where the m functions are selected in the dictionary of all rank-one (separated) functions R₁ = {s(ξ) = s¹(ξ₁) ... s^d(ξ_d) : s^ν ∈ S_ν}. Convergence results for best rank-m approximations can therefore be deduced from convergence results for sparse tensor approximation methods.
Let us mention some other standard structured approximations that are particular cases of low-rank tensor approximations. First, a function v(ξ) = v(ξ_ν) depending on a single variable ξ_ν is a rank-one (elementary) tensor. It has an α-rank equal to 1 for any subset α in D. A low-dimensional function v(ξ) = v(ξ_β) depending on a subset of variables ξ_β, β ⊂ D, has an α-rank equal to 1 for any subset of dimensions α containing β or such that α ∩ β = ∅. An additive function v(ξ) = v₁(ξ₁) + ... + v_d(ξ_d), which is a sum of d elementary tensors, is a tensor with canonical rank d. Also, such an additive function has rank_α(v) ≤ 2 for any subset α ⊂ D, which means that it admits an exact representation in any hierarchical Tucker format (including the TT format) with a rank bounded by (2, ..., 2).
Remark 2. Let us note that low-rank structures (as well as other types of structures) can be revealed only after a suitable change of variables. For example, let η = (η₁, ..., η_m) be the variables obtained by an affine transformation of the variables ξ, with η_i = Σ_{j=1}^{d} a_{ij} ξ_j + b_i, 1 ≤ i ≤ m. Then the function v(ξ) = Σ_{i=1}^{m} v_i(η_i) := v̂(η), as a function of m variables, can be seen as an order-m tensor with canonical rank less than m. This type of approximation corresponds to the projection pursuit regression model.
A fundamental question is to quantify the best achievable accuracy

inf_{v∈M_r} ‖u − v‖ := ε_r

for a given class of functions u and a given low-rank format. A few convergence results have been obtained for functions with standard Sobolev regularity (see, e.g., [55]). An open and challenging problem is to characterize approximation classes of the different low-rank formats, which means the class of functions u such that ε_r has a certain (e.g., algebraic or exponential) decay with r.
The characterization of topological and geometrical properties of the subsets M_r is important for different purposes, such as proving the existence of best approximations in M_r or deriving algorithms. For d ≥ 3, the subsets M_r associated with the notion of canonical rank are not closed, and best approximation problems in M_r are ill-posed. Subsets of tensors associated with the notions of tree-based (hierarchical) Tucker ranks have better properties. Indeed, they are closed sets,
which ensures the existence of best approximations. Also, they are differentiable
manifolds [24,32,57]. This has useful consequences for optimization [57] or for the
projection of dynamical systems on these manifolds [39].
For the different notions of rank introduced above, an interesting property of the corresponding low-rank subsets is that they admit simple parametrizations:

M_r = { v = F_r(p₁, ..., p_ℓ) : p_k ∈ P_k },

where F_r is a multilinear map, the P_k are vector spaces of parameters, and the number of parameters ℓ is in O(d). An optimization problem on M_r can then be reformulated as an optimization problem on P₁ × ... × P_ℓ, for which simple alternating minimization algorithms (block coordinate descent) can be used [23].
Given samples {u(ξ^k)}_{k=1}^{K} of the function, a low-rank approximation can then be defined by the least-squares problem

min_{v∈M_r} (1/K) Σ_{k=1}^{K} ‖u(ξ^k) − v(ξ^k)‖².   (25.33)
Remark 3. Note that for other objectives in statistical learning (e.g., classification), (25.33) can be replaced by

min_{v∈M_r} (1/K) Σ_{k=1}^{K} ℓ(u(ξ^k), v(ξ^k)),

where ℓ is a so-called loss function measuring a certain distance between u(ξ^k) and the approximation v(ξ^k).
5.2 Interpolation/Projection
Remark 4. When exploiting only the order-two tensor structure, the methods
presented here are closely related to projection-based model reduction methods.
Although they provide a directly exploitable low-rank approximation of u, they can
also be used for the construction of a low-dimensional subspace in V (a candidate
for projection-based model reduction) which is extracted from the obtained low-
rank approximation.
R(u) = 0,   (25.34)

with u in V = R^N or V = R^{n₁} ⊗ ... ⊗ R^{n_d}.
In order to clarify the structure of Eq. (25.34), let us consider the case of a linear problem where R(v; ξ) = A(ξ)v − b(ξ) and assume that A(ξ) and b(ξ) have low-rank (or affine) representations of the form

A(ξ) = Σ_{i=1}^{L} A_i λ_i(ξ)   and   b(ξ) = Σ_{i=1}^{R} b_i γ_i(ξ).   (25.35)

Then the problem takes the form of a tensor-structured equation Au = b, with

A = Σ_{i=1}^{L} A_i ⊗ Ã_i   and   b = Σ_{i=1}^{R} b_i ⊗ b̃_i,   (25.36)

where the Ã_i and b̃_i are the algebraic representations of the functions λ_i and γ_i. When these functions themselves admit rank-one (separated) representations, i.e.,

A(ξ) = Σ_{i=1}^{L} A_i λ_i¹(ξ₁) ... λ_i^d(ξ_d)   and   b(ξ) = Σ_{i=1}^{R} b_i γ_i¹(ξ₁) ... γ_i^d(ξ_d),   (25.37)

the operator and the right-hand side admit the higher-order low-rank representations

A = Σ_{i=1}^{L} A_i ⊗ Ã_i¹ ⊗ ... ⊗ Ã_i^d   and   b = Σ_{i=1}^{R} b_i ⊗ b̃_i¹ ⊗ ... ⊗ b̃_i^d.   (25.38)
A first solution strategy consists in using standard iterative solvers (e.g., Richardson, conjugate gradient, Newton) combined with efficient low-rank truncation methods for the iterates [13, 35, 37, 43]. A simple iterative algorithm takes the form u^{k+1} = M(u^k), where M is an iteration map involving simple algebraic operations between tensors (additions, multiplications), which requires the implementation of a tensor algebra. Low-rank truncation methods can be systematically used for limiting the storage complexity and the computational complexity of algebraic operations. This results in approximate iterations u^{k+1} ≈ M(u^k), and the resulting algorithm can be analyzed as an inexact version (or perturbation) of the initial algorithm (see, e.g., [31]). As an example, let us consider a linear tensor-structured problem Au − b = 0. An approximate Richardson algorithm takes the form

u^{k+1} = Π_ε(u^k + ω(b − Au^k)),

where Π_ε is a low-rank truncation operator with precision ε and ω is a relaxation parameter.
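In code, such a truncated iteration might look like the following order-two sketch, where an SVD-based truncation plays the role that hierarchical or TT truncation would play in higher dimensions; the operator A1 X + X A2ᵀ stands for a Kronecker-structured A, and all matrices are illustrative.

```python
import numpy as np

# Truncated Richardson iteration for a tensor-structured linear problem:
# solve A1 X + X A2^T = B (a Kronecker-structured operator acting on an
# order-two unknown), re-truncating the iterate to rank r after each update.
rng = np.random.default_rng(7)
n, r, omega = 50, 4, 0.2
G1, G2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A1 = 2 * np.eye(n) + 0.02 * (G1 + G1.T)   # well-conditioned symmetric operators
A2 = 2 * np.eye(n) + 0.02 * (G2 + G2.T)
B = np.outer(rng.standard_normal(n), rng.standard_normal(n))   # rank-one rhs

def truncate(X, r):
    # SVD-based rank-r truncation: the operator Pi_epsilon of the text.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]

X = np.zeros((n, n))
for _ in range(100):
    residual = B - (A1 @ X + X @ A2.T)
    X = truncate(X + omega * residual, r)

rel_res = np.linalg.norm(B - (A1 @ X + X @ A2.T)) / np.linalg.norm(B)
```

The iterate never exceeds rank r, so storage stays at O(nr) throughout, while the residual stagnates at a level set by the truncation rank rather than decreasing to machine precision.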
Another strategy consists in directly computing an approximation u_r in M_r by minimizing a functional J [26]. Also, J(v) can be taken as a certain norm of the residual R(v). For linear problems, choosing J(v) as the squared residual norm yields a quasi-optimality result of the form

‖u − u_r‖² ≤ κ² min_{v∈M_r} ‖u − v‖²,

where κ = β/α is the condition number of the operator A. Here again, the use of preconditioners allows us to better exploit the approximation power of a given low-rank manifold M_r.
Constructive algorithms are also possible, the most prominent algorithm being a greedy algorithm which consists in computing a sequence of low-rank approximations u_m obtained by successive rank-one corrections, i.e., u_m = u_{m−1} + w_m ⊗ s_m, where the rank-one correction w_m ⊗ s_m is obtained by minimizing some functional.
Remark 5. The above algorithms can also be used within iterative algorithms for which an iteration takes the form

C_k u^{k+1} = F_k(u^k),

where F_k(u^k) can be computed with a low complexity using low-rank tensor algebra (with possible low-rank truncations), but where the inverse of the operator C_k is not known explicitly, so that C_k⁻¹ F_k(u^k) cannot be obtained from simple algebraic
7 Concluding Remarks
Low-rank tensor methods have emerged as a very powerful tool for the solution of
high-dimensional problems arising in many contexts and in particular in uncertainty
quantification. However, there remain many challenging issues to address for a bet-
ter understanding of this type of approximation and for a diffusion of these methods
in a wide class of applications. From a theoretical point of view, open questions
include the characterization of the approximation classes of a given low-rank tensor
format, which means the class of functions for which a certain type of convergence
(e.g., algebraic or exponential) can be expected, and also the characterization of
the problems yielding solutions in these approximation classes. Also, quantitative
results on the approximation of a low-rank function (or tensor) from samples of
this function (or entries of this tensor) could allow us to answer some practical
issues such as the determination of the number of samples required for a stable
approximation in a given low-rank format or the design of sampling strategies
which are adapted to particular low-rank formats. From a numerical point of
view, challenging issues include the development of efficient algorithms for global
optimization on low-rank manifolds, with guaranteed convergence properties, and
the development of adaptive algorithms for the construction of controlled low-rank
approximations, with an adaptive selection of ranks and potentially of the tensor
formats (e.g., based on tree optimization for tree-based formats). From a practical point of view, low-rank tensor methods exploiting model equations (Galerkin-type methods) are often seen as intrusive methods in the sense that they require (a priori) the development of specific software. An important issue is then to develop weakly intrusive implementations of these methods, which would allow the use of existing computational frameworks and would therefore contribute to a wider diffusion of these methods.
References
1. Bachmayr, M., Dahmen, W.: Adaptive near-optimal rank tensor approximation for high-dimensional operator equations. Found. Comput. Math. 15(4), 839–898 (2015)
2. Bachmayr, M., Schneider, R.: Iterative methods based on soft thresholding of hierarchical tensors. ArXiv e-prints 1501.07714 (Jan 2015)
3. Ballani, J., Grasedyck, L.: A projection method to solve linear systems in tensor format. Numer. Linear Algebra Appl. 20(1), 27–43 (2013)
4. Ballani, J., Grasedyck, L., Kluge, M.: Black box approximation of tensors in hierarchical Tucker format. Linear Algebra Appl. 438(2), 639–657 (2013). Tensors and Multilinear Algebra
5. Barrault, M., Maday, Y., Nguyen, N.C., Patera, A.T.: An empirical interpolation method: application to efficient reduced-basis discretization of partial differential equations. C. R. Math. 339(9), 667–672 (2004)
6. Bebendorf, M., Maday, Y., Stamm, B.: Comparison of some reduced representation approx-
imations. In: Quarteroni, A., Rozza, G. (eds.) Reduced Order Methods for Modeling and
Computational Reduction. Volume 9 of MS&A Modeling, Simulation and Applications,
pp. 67–100. Springer International Publishing, Cham (2014)
7. Beylkin, G., Garcke, J., Mohlenkamp, M.J.: Multivariate regression and machine learning
with sums of separable functions. J. Comput. Phys. 230, 2345–2367 (2011)
8. Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., Wojtaszczyk, P.: Convergence
rates for greedy algorithms in reduced basis methods. SIAM J. Math. Anal. 43(3), 1457–1472
(2011)
9. Buffa, A., Maday, Y., Patera, A.T., Prud'homme, C., Turinici, G.: A priori convergence of the
greedy algorithm for the parametrized reduced basis method. ESAIM: Math. Model. Numer.
Anal. 46(3), 595–603 (2012). Special volume in honor of Professor David Gottlieb
10. Cancès, E., Ehrlacher, V., Lelièvre, T.: Convergence of a greedy algorithm for high-
dimensional convex nonlinear problems. Math. Models Methods Appl. Sci. 21(12), 2433–2467
(2011)
11. Casenave, F., Ern, A., Lelièvre, T.: A nonintrusive reduced basis method applied to
aeroacoustic simulations. Adv. Comput. Math. 41(5), 961–986 (2015)
12. Chen, P., Quarteroni, A., Rozza, G.: A weighted reduced basis method for elliptic partial
differential equations with random input data. SIAM J. Numer. Anal. 51(6), 3163–3185 (2013)
13. Chen, Y., Gottlieb, S., Maday, Y.: Parametric analytical preconditioning and its applications to
the reduced collocation methods. C. R. Math. 352(7/8), 661–666 (2014). ArXiv e-prints
14. Chevreuil, M., Lebrun, R., Nouy, A., Rai, P.: A least-squares method for sparse low rank
approximation of multivariate functions. SIAM/ASA J. Uncertain. Quantif. 3(1), 897–921
(2015)
15. Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numer.
24, 1–159 (2015)
16. Cohen, A., DeVore, R.: Kolmogorov widths under holomorphic mappings. IMA J. Numer.
Anal. (2015)
17. Defant, A., Floret, K.: Tensor Norms and Operator Ideals. North-Holland, Amster-
dam/New York (1993)
18. DeVore, R., Petrova, G., Wojtaszczyk, P.: Greedy algorithms for reduced bases in Banach
spaces. Constr. Approx. 37(3), 455–466 (2013)
19. Dolgov, S., Khoromskij, B.N., Litvinenko, A., Matthies, H.G.: Polynomial chaos expansion
of random coefficients and the solution of stochastic partial differential equations in the tensor
train format. SIAM/ASA J. Uncertain. Quantif. 3(1), 1109–1135 (2015)
20. Doostan, A., Ghanem, R., Red-Horse, J.: Stochastic model reductions for chaos representa-
tions. Comput. Methods Appl. Mech. Eng. 196(37–40), 3951–3966 (2007)
21. Doostan, A., Validi, A., Iaccarino, G.: Non-intrusive low-rank separated approximation of
high-dimensional stochastic models. Comput. Methods Appl. Mech. Eng. 263, 42–55
(2013)
22. Espig, M., Grasedyck, L., Hackbusch, W.: Black box low tensor-rank approximation using
fiber-crosses. Constr. Approx. 30, 557–597 (2009)
23. Espig, M., Hackbusch, W., Khachatryan, A.: On the convergence of alternating least squares
optimisation in tensor format representations (May 2015). ArXiv e-prints 1506.00062
24. Falcó, A., Nouy, A.: Proper generalized decomposition for nonlinear convex problems in tensor
Banach spaces. Numerische Mathematik 121, 503–530 (2012)
25. Falcó, A., Hackbusch, W., Nouy, A.: Geometric structures in tensor representations. Found.
Comput. Math. (Submitted)
26. Giraldi, L., Liu, D., Matthies, H.G., Nouy, A.: To be or not to be intrusive? The solution
of parametric and stochastic equations – proper generalized decomposition. SIAM J. Sci.
Comput. 37(1), A347–A368 (2015)
27. Grasedyck, L.: Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal.
Appl. 31, 2029–2054 (2010)
25 Low-Rank Tensor Methods for Model Order Reduction 881
28. Grasedyck, L., Kressner, D., Tobler, C.: A literature survey of low-rank tensor approximation
techniques. GAMM-Mitteilungen 36(1), 53–78 (2013)
29. Hackbusch, W.: Tensor Spaces and Numerical Tensor Calculus. Volume 42 of Springer Series
in Computational Mathematics. Springer, Heidelberg (2012)
30. Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier Anal. Appl.
15(5), 706–722 (2009)
31. Hackbusch, W., Khoromskij, B., Tyrtyshnikov, E.: Approximate iterations for structured
matrices. Numerische Mathematik 109, 365–383 (2008). doi:10.1007/s00211-008-0143-0
32. Holtz, S., Rohwedder, T., Schneider, R.: On manifolds of tensors of fixed TT-rank. Numerische
Mathematik 120(4), 701–731 (2012)
33. Kahlbacher, M., Volkwein, S.: Galerkin proper orthogonal decomposition methods for
parameter dependent elliptic systems. Discuss. Math.: Differ. Incl. Control Optim. 27, 95–
117 (2007)
34. Khoromskij, B.: Tensors-structured numerical methods in scientific computing: survey on
recent advances. Chemom. Intell. Lab. Syst. 110(1), 1–19 (2012)
35. Khoromskij, B.N., Schwab, C.: Tensor-structured Galerkin approximation of parametric and
stochastic elliptic PDEs. SIAM J. Sci. Comput. 33(1), 364–385 (2011)
36. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500
(2009)
37. Kressner, D., Tobler, C.: Low-rank tensor Krylov subspace methods for parametrized linear
systems. SIAM J. Matrix Anal. Appl. 32(4), 1288–1316 (2011)
38. Lassila, T., Manzoni, A., Quarteroni, A., Rozza, G.: Generalized reduced basis methods and
n-width estimates for the approximation of the solution manifold of parametric PDEs. In:
Brezzi, F., Colli Franzone, P., Gianazza, U., Gilardi, G. (eds.) Analysis and Numerics of Partial
Differential Equations. Volume 4 of Springer INdAM Series, pp. 307–329. Springer, Milan
(2013)
39. Lubich, C., Rohwedder, T., Schneider, R., Vandereycken, B.: Dynamical approximation by
hierarchical Tucker and tensor-train tensors. SIAM J. Matrix Anal. Appl. 34(2), 470–494
(2013)
40. Maday, Y., Mula, O.: A generalized empirical interpolation method: application of reduced
basis techniques to data assimilation. In: Brezzi, F., Colli Franzone, P., Gianazza, U., Gilardi, G.
(eds.) Analysis and Numerics of Partial Differential Equations. Volume 4 of Springer INdAM
Series, pp. 221–235. Springer, Milan (2013)
41. Maday, Y., Nguyen, N.C., Patera, A.T., Pau, G.S.H.: A general multipurpose interpolation
procedure: the magic points. Commun. Pure Appl. Anal. 8(1), 383–404 (2009)
42. Matthies, H.G., Keese, A.: Galerkin methods for linear and nonlinear elliptic stochastic partial
differential equations. Comput. Methods Appl. Mech. Eng. 194(12–16), 1295–1331 (2005)
43. Matthies, H.G., Zander, E.: Solving stochastic systems with low-rank tensor compression.
Linear Algebra Appl. 436(10), 3819–3838 (2012)
44. Nouy, A.: A generalized spectral decomposition technique to solve a class of linear stochastic
partial differential equations. Comput. Methods Appl. Mech. Eng. 196(45–48), 4521–4537
(2007)
45. Nouy, A.: Generalized spectral decomposition method for solving stochastic finite element
equations: invariant subspace problem and dedicated algorithms. Comput. Methods Appl.
Mech. Eng. 197, 4718–4736 (2008)
46. Nouy, A.: Proper generalized decompositions and separated representations for the numerical
solution of high dimensional stochastic problems. Arch. Comput. Methods Eng. 17(4), 403–
434 (2010)
47. Oseledets, I.: Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–2317 (2011)
48. Oseledets, I., Tyrtyshnikov, E.: Breaking the curse of dimensionality, or how to use SVD in
many dimensions. SIAM J. Sci. Comput. 31(5), 3744–3759 (2009)
49. Oseledets, I., Tyrtyshnikov, E.: TT-cross approximation for multidimensional arrays. Linear
Algebra Appl. 432(1), 70–88 (2010)
50. Patera, A.T., Rozza, G.: Reduced Basis Approximation and A Posteriori Error Estimation
for Parametrized PDEs. MIT-Pappalardo Graduate Monographs in Mechanical Engineering.
Massachusetts Institute of Technology, Cambridge (2007)
51. Pietsch, A.: Eigenvalues and s-Numbers. Cambridge University Press, Cambridge/New York
(1987)
52. Prud'homme, C., Rovas, D., Veroy, K., Maday, Y., Patera, A.T., Turinici, G.: Reliable real-time
solution of parametrized partial differential equations: reduced-basis output bound methods. J.
Fluids Eng. 124(1), 70–80 (2002)
53. Quarteroni, A., Rozza, G., Manzoni, A.: Certified reduced basis approximation for
parametrized partial differential equations and applications. J. Math. Ind. 1(1), 1–49 (2011)
54. Rauhut, H., Schneider, R., Stojanac, Z.: Tensor completion in hierarchical tensor representa-
tions (Apr 2014). ArXiv e-prints
55. Schneider, R., Uschmajew, A.: Approximation rates for the hierarchical tensor format in
periodic Sobolev spaces. J. Complex. 30(2), 56–71 (2014). Dagstuhl 2012
56. Temlyakov, V.: Greedy approximation in convex optimization (June 2012). ArXiv e-prints
57. Uschmajew, A., Vandereycken, B.: The geometry of algorithms using hierarchical tensors.
Linear Algebra Appl. 439(1), 133–166 (2013)
58. Zahm, O., Nouy, A.: Interpolation of inverse operators for preconditioning parameter-
dependent equations (April 2015). ArXiv e-prints
26 Random Vectors and Random Fields in High Dimension: Parametric
Model-Based Representation, Identification from Data,
and Inverse Problems
Christian Soize
Abstract
The statistical inverse problem for the experimental identification of a non-
Gaussian matrix-valued random field, that is, the model parameter of a boundary
value problem, using some partial and limited experimental data related to a
model observation, is a very difficult and challenging problem. A complete
advanced methodology and the associated tools are presented for solving such
a problem in the following framework: the random field that must be identified
is a non-Gaussian matrix-valued random field and is not simply a real-valued
random field; this non-Gaussian random field is in high stochastic dimension
and is identified in a general class of random fields; some fundamental algebraic
properties of this non-Gaussian random field must be satisfied such as symmetry,
positiveness, invertibility in mean square, boundedness, symmetry class, spatial-
correlation lengths, etc.; and the available experimental data sets correspond only
to partial and limited data for a model observation of the boundary value problem.
The developments presented are mainly related to the elasticity framework,
but the methodology is general and can be used in many areas of computational
sciences and engineering. The developments are organized as follows. The
first part is devoted to the definition of the statistical inverse problem that
has to be solved in high stochastic dimension and is focused on stochastic
elliptic operators such as the ones that are encountered in the boundary value
problems of linear elasticity. The second part deals with the construction of
two possible parameterized representations for a non-Gaussian positive-definite
matrix-valued random field that models the model parameter of a boundary
value problem. A parametric model-based representation is then constructed
by introducing a statistical reduced model and a polynomial chaos expansion,
first with deterministic coefficients and then with random coefficients. This
C. Soize
Laboratoire Modélisation et Simulation Multi Échelle (MSME), Université Paris-Est,
Marne-la-Vallée, France
e-mail: christian.soize@univ-paris-est.fr
Keywords
Random vector · Random field · Random matrix · High dimension · High
stochastic dimension · Non-Gaussian · Non-Gaussian random field ·
Representation of random fields · Polynomial chaos expansion · Generator ·
Maximum entropy principle · Prior model · Maximum likelihood method ·
Bayesian method · Identification · Inverse problem · Statistical inverse
problem · Random media · Heterogeneous microstructure · Composite
materials · Porous media
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 886
2 Notions on the High Stochastic Dimension and on the Parametric
Model-Based Representations for Random Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 886
2.1 What Is a Random Vector or a Random Field with a High Stochastic
Dimension? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 886
2.2 What Is a Parametric Model-Based Representation for the Statistical
Identification of a Random Model Parameter from Experimental Data? . . . . . . . 887
3 Brief History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888
3.1 Classical Methods for Statistical Inverse Problems . . . . . . . . . . . . . . . . . . . . . . . . 888
3.2 Case of a Gaussian Random Model Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
3.3 Case for Which the Model Parameter Is a Non-Gaussian
Second-Order Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
3.4 Finite-Dimension Approximation of the BVP and Finite-Dimension
Parameterization of the Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890
3.5 Parameterization of the Non-Gaussian Second-Order Random Vector . . . . . . . 890
3.6 Statistical Inverse Problem for Identifying a Non-Gaussian Random
Field as a Model Parameter of a BVP, Using Polynomial Chaos Expansion . . . . 891
3.7 Algebraic Prior Stochastic Models of the Model Parameters of BVP . . . . . . . . . . 892
4 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 893
5 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894
5.1 Euclidean Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894
5.2 Sets of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894
5.3 Kronecker Symbol, Unit Matrix, and Indicator Function . . . . . . . . . . . . . . . . . . . 894
5.4 Norms and Usual Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894
5.5 Order Relation in the Set of All the Positive-Definite Real Matrices . . . . . . . . . . 895
5.6 Probability Space, Mathematical Expectation, and Space of
Second-Order Random Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895
6 Setting the Statistical Inverse Problem to be Solved in High Stochastic Dimension . . . 895
6.1 Stochastic Elliptic Operator and Boundary Value Problem . . . . . . . . . . . . . . . . . . 895
6.2 Stochastic Finite Element Approximation of the Stochastic Boundary
Value Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 897
6.3 Experimental Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 898
6.4 Statistical Inverse Problem to be Solved . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 898
7 Parametric Model-Based Representation for the Model Parameters and
Model Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899
7.1 Introduction of a Class of Lower-Bounded Random Fields for K and
Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899
7.2 Construction of the Nonlinear Transformation G . . . . . . . . . . . . . . . . . . . . . . . . . . 900
7.3 Truncated Reduced Representation of Second-Order Random Field
G and Its Polynomial Chaos Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 902
7.4 Parameterization of Compact Stiefel Manifold V_m(R^N) . . . . . . . . . . . . . . . . . 904
7.5 Parameterized Representation for Non-Gaussian Random Field K . . . . . . . . . . 904
7.6 Parametric Model-Based Representation of Random Observation
Model U . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 905
8 Methodology for Solving the Statistical Inverse Problem
in High Stochastic Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 905
8.1 Step 1: Introduction of a Family {K_APSM(x; s), x ∈ Ω} of Algebraic
Prior Stochastic Models (APSM) for Non-Gaussian Random Field K . . . . . . . 905
8.2 Step 2: Identification of an Optimal Algebraic Prior Stochastic Model
(OAPSM) for Non-Gaussian Random Field K . . . . . . . . . . . . . . . . . . . . . . . . . . 906
8.3 Step 3: Choice of an Adapted Representation for Non-Gaussian
Random Field K and Optimal Algebraic Prior Stochastic Model for
Non-Gaussian Random Field G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 907
8.4 Step 4: Construction of a Truncated Reduced Representation of
Second-Order Random Field G_OAPSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 907
8.5 Step 5: Construction of a Truncated Polynomial Chaos Expansion of
G_OAPSM and Representation of Random Field K_OAPSM . . . . . . . . . . . . . . . . 908
8.6 Step 6: Identification of the Prior Stochastic Model Kprior of K in
the General Class of the Non-Gaussian Random Fields . . . . . . . . . . . . . . . . . . . . 912
8.7 Step 7: Identification of a Posterior Stochastic Model Kpost of K . . . . . . . . . . 913
9 Construction of a Family of Algebraic Prior Stochastic Models . . . . . . . . . . . . . . . . . . . 914
9.1 General Properties of the Non-Gaussian Random Field K with a
Lower Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 915
9.2 Algebraic Prior Stochastic Model for the Case of Anisotropic
Statistical Fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 915
9.3 Algebraic Prior Stochastic Model for the Case of Dominant Statistical
Fluctuations in a Symmetry Class with Some Anisotropic Statistical
Fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 919
10 Key Research Findings and Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931
10.1 Additional Ingredients for Statistical Reduced Models, Symmetry
Properties, and Generators for High Stochastic Dimension . . . . . . . . . . . . . . . . . . 931
10.2 Tensor-Valued Random Fields and Continuum Mechanics of
Heterogeneous Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931
11 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 932
1 Introduction
in order that the mean-square error of U − U^(m) is sufficiently small. It can thus be
written U ≃ U^(m) (in mean square). In such a reduced representation, λ_1 ≥ … ≥
λ_m > 0 are the dominant eigenvalues of the covariance matrix of U, and b_1, …, b_m
are the associated orthonormal eigenvectors in R^{N_U}. The components η_1, …, η_m are
m centered and uncorrelated real-valued random variables. If random vector U is a
Gaussian random vector, then η_1, …, η_m are m independent Gaussian real-valued
random variables, and, for this particular Gaussian case, the stochastic dimension
of U is m. However, for the general case, U is a non-Gaussian random vector, and
consequently, the real-valued random variables η_1, …, η_m (that are centered and
uncorrelated) are not independent but are statistically dependent. In such a case, m
is not the stochastic dimension of U, but clearly the stochastic dimension is less
than or equal to m (the equality is obtained for the Gaussian case). Let us assume that
there exists a deterministic nonlinear mapping Y from R^{N_g} into R^m such that the
random vector η = (η_1, …, η_m) can be written as η = Y(Ξ_1, …, Ξ_{N_g}), in which
N_g < m and where Ξ_1, …, Ξ_{N_g} are N_g independent real-valued random variables
(for instance, Y can be constructed using the polynomial chaos expansion of the
second-order random vector η). In such a case, the stochastic dimension of U is less
than or equal to N_g. If, among all the possible nonlinear mappings and all the possible
integers N_g such that 1 ≤ N_g < m, the mapping Y and the integer N_g correspond
to the smallest possible value of N_g such that η = Y(Ξ_1, …, Ξ_{N_g}), then N_g is the
stochastic dimension of U, and U has a high stochastic dimension if N_g is large.
If {u(x), x ∈ Ω} is a second-order random field indexed by Ω ⊂ R^d with values
in R^{N_u}, for which its cross-covariance function is square integrable, then
a reduced representation u^(m)(x) = Σ_{i=1}^{m} √λ_i η_i b_i(x) of u can be constructed
using the Karhunen–Loève expansion of u, in which m is calculated in order that the
mean-square error of u − u^(m) is sufficiently small. Therefore, the explanations given
before can be applied to the random vector η = (η_1, …, η_m) in order to estimate
the stochastic dimension of random field u.
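These notions can be made concrete with a small numerical sketch (the toy model, dimensions, and variable names below are invented for illustration, not taken from the chapter): a non-Gaussian random vector U driven by N_g = 2 independent germs, its reduced representation built from the dominant eigenvalues of its covariance matrix, and a check that the reduced coordinates η_i are uncorrelated even though, U being non-Gaussian, they remain statistically dependent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-Gaussian random vector U in R^50 driven by N_g = 2 independent
# germs Xi_1, Xi_2 (the model is invented purely for illustration).
n_samples, N_U = 20000, 50
Xi = rng.standard_normal((n_samples, 2))
x = np.linspace(0.0, 1.0, N_U)
U = (np.cos(np.outer(Xi[:, 0], 2 * np.pi * x))
     + 0.3 * (Xi[:, 1] ** 2)[:, None] * np.sin(np.pi * x))

# Reduced representation: eigendecomposition of the covariance matrix of U.
U_mean = U.mean(axis=0)
lam, b = np.linalg.eigh(np.cov(U, rowvar=False))
lam, b = lam[::-1], b[:, ::-1]                 # dominant eigenvalues first

# Choose m so that the relative mean-square error of U - U^(m) is <= 1e-3.
frac = np.cumsum(lam) / lam.sum()
m = int(np.argmax(frac >= 1 - 1e-3) + 1)

# Reduced coordinates eta_i: centered and uncorrelated, but NOT independent
# (U is non-Gaussian), so m only upper-bounds the stochastic dimension.
eta = (U - U_mean) @ b[:, :m] / np.sqrt(lam[:m])
off_diag = np.abs(np.corrcoef(eta, rowvar=False) - np.eye(m)).max()
print(m, off_diag)
```

The sample correlation matrix of the η_i is diagonal by construction (the b_i diagonalize the same empirical covariance), while the true stochastic dimension here is N_g = 2 < m.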
3 Brief History
The problem related to the identification of a model parameter (scalar, vector, field)
of a boundary value problem (BVP) (for instance, the coefficients of a partial
differential equation) using experimental data related to a model observation (scalar,
vector, field) of this BVP is a problem for which there exists a rich literature,
including numerous textbooks. In general, in the deterministic context, there
is no unique solution, because the function that maps the model parameter
(which belongs to an admissible set) to the model observation (which belongs to
another admissible set) is not a one-to-one mapping and, consequently, cannot be
inverted. It is an ill-posed problem. However, such a problem can be reformulated
as an optimization problem consisting of calculating an optimal value of the model
parameter, which minimizes a certain distance between the observed experimental
data and the model observation that is computed with the BVP and that depends
data and the model observation that is computed with the BVP and that depends
on the model parameter (see, for instance, [76] for an overview concerning the
general methodologies and [36] for some mathematical aspects related to the inverse
problems for partial differential equations). In many cases, the analysis of such
an inverse problem can have a unique solution in the framework of statistics,
that is to say when the model parameter is modeled by a random quantity, with
or without external noise on the model observation (observed output). In such
a case, the random model observation is completely defined by its probability
distribution (in finite or in infinite dimension) that is the unique transformation of
the probability distribution of the random model parameter. This transformation is
defined by the functional that maps the model parameter to the model observation.
Such a formulation is constructed for obtaining a well-posed problem that has
a unique solution in the probability theory framework. We refer the reader to
[38] and [72] for an overview concerning the general methodologies for statistical
and computational inverse problems, including general least-squares inversion and
the maximum likelihood method [54, 67] and including the Bayesian approach
[8, 9, 67, 68].
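The two formulations recalled above can be illustrated on a deliberately small example: a scalar model parameter θ of a toy forward model, identified first by maximum likelihood (equivalent to least squares under additive Gaussian noise) and then by a Bayesian posterior evaluated on a grid. The forward model, the noise level, and the Gaussian prior are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy forward model standing in for the BVP's model observation at 12
# sensor locations (all names and values are illustrative assumptions).
x = np.linspace(0.1, 1.0, 12)

def forward(theta):
    return np.exp(-theta * x)

theta_true, sigma = 1.5, 0.02
u_exp = forward(theta_true) + sigma * rng.standard_normal(x.size)  # noisy data

# Optimization reformulation: minimize the data-model misfit over a grid.
grid = np.linspace(0.1, 4.0, 2000)
misfit = np.array([np.sum((forward(t) - u_exp) ** 2) for t in grid])
theta_ml = grid[misfit.argmin()]          # maximum likelihood = least squares

# Bayesian formulation: posterior density ~ likelihood x prior on the grid.
log_post = -misfit / (2 * sigma**2) - 0.5 * (grid - 1.0) ** 2   # N(1,1) prior
post = np.exp(log_post - log_post.max())
dx = grid[1] - grid[0]
post /= post.sum() * dx                   # normalize the posterior density
theta_mean = float((grid * post).sum() * dx)
print(theta_ml, theta_mean)
```

Both estimates land close to θ_true; the Bayesian posterior additionally quantifies the remaining uncertainty on θ, which is the advantage of the probabilistic formulation.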
46,47,52]). The polynomial chaos expansion has also been extended for an arbitrary
probability measure [20, 43, 44, 57, 77, 78] and for sparse representation [5]. New
algorithms have been proposed for obtaining a robust computation of realizations of
high degrees of polynomial chaos [49,65]. This type of representation has also been
extended for the case of the polynomial chaos expansion with random coefficients
[66], for the construction of a basis adaptation in homogeneous chaos spaces [73],
and for an arbitrary multimodal multidimensional probability distribution [64].
Since it is assumed that the experimental data that are available for the statistical
inverse problem are partial and limited, parametric statistics must be used
instead of nonparametric statistics, which cannot be used. This implies that
a parameterized representation of the non-Gaussian second-order random vector
must be constructed. There are two main methods for constructing such a
parameterization.
(i) The first one is a direct approach that consists of constructing an algebraic prior
representation of the non-Gaussian probability distribution by using the
maximum entropy principle (MaxEnt) [37, 63] under the constraints defined
by the available information. A general computational methodology for
problems in high stochastic dimension is proposed in [4, 59] and is synthesized
in Sect. 11 of Chap. 8, Random Matrix Models and Nonparametric
Method for Uncertainty Quantification, in Part II of the present Handbook
on Uncertainty Quantification. Such a construction allows a low-dimension
4 Overview
A complete methodology and the associated tools are presented for the experimental
identification of a non-Gaussian matrix-valued random field that is the model
parameter of a boundary value problem, using some experimental data related to
a model observation. The difficulties of the statistical inverse problem that are
presented are due to the following chosen framework that corresponds to many
practical situations in computational sciences and engineering:
For such a statistical inverse problem, the above framework implies the use of
an adapted and advanced methodology. The developments presented hereinafter are
mainly related to the elasticity framework, but the methodology is general and can
be used in many areas of computational sciences and engineering. The developments
are organized as follows.
The first part is devoted to the definition of the statistical inverse problem that has
to be solved in high stochastic dimension and is focused on stochastic elliptic
operators such as the ones that are encountered in the boundary value problems
of the linear elasticity.
The second part deals with the construction of two possible parameterized rep-
resentations for a non-Gaussian positive-definite matrix-valued random field that
models the model parameter of a boundary value problem. A parametric model-
based representation is then constructed by introducing a statistical reduced
model and a polynomial chaos expansion, first with deterministic coefficients
and then with random coefficients. This parametric model-based representation
is directly used for solving the statistical inverse problem.
The third part is devoted to the description of all the steps of the methodology
allowing the statistical inverse problem to be solved in high stochastic dimension.
This methodology corresponds to the work initiated in [61], extended in [62]
for constructing a posterior stochastic model using the Bayesian approach, and
revisited in [48, 49].
The fourth part presents the construction of an algebraic prior stochastic model of
the model parameter of the boundary value problem, for a non-Gaussian matrix-
valued random field. This construction is based on the works [27, 28, 58, 60] and
reuses the formalism and the results introduced in the developments presented in
5 Notations
the usual inner product <x, y> = Σ_{j=1}^{n} x_j y_j and the associated norm
‖x‖ = <x, x>^{1/2}.
Let G and H be two matrices in M_n^+(R). The notation G > H means that the
matrix G − H belongs to M_n^+(R).
Let d be an integer such that 1 ≤ d ≤ 3. Let n be another finite integer such that
n ≥ 1, and let N_u be an integer such that 1 ≤ N_u ≤ n. Let Ω be a bounded open
domain of R^d, with generic point x = (x_1, …, x_d), with boundary ∂Ω, and let Ω̄ be
Ω̄ = Ω ∪ ∂Ω.
random field K cannot be a Gaussian field. Such a random field K allows for
constructing the coefficients of a given stochastic elliptic operator u ↦ D_x(u) that
applies to the random field u(x) = (u_1(x), …, u_{N_u}(x)), indexed by Ω̄, with values
in R^{N_u}.
The boundary value problem that is formulated in u involves the stochastic
elliptic operator D_x, and some Dirichlet and Neumann boundary conditions are
given on ∂Ω, which is written as the union of three parts, ∂Ω = Γ_0 ∪ Γ ∪ Γ_1. On
the part Γ_0, a Dirichlet condition is given. The part Γ corresponds to the part of the
boundary on which there is a zero Neumann condition and on which experimental
data are available for u. On the part Γ_1, a Neumann condition is given. Boundary
value problems involving such a stochastic elliptic operator D_x are encountered in
many problems of computational sciences and engineering.
Examples of stochastic elliptic operators.

D_x = M^(1) ∂/∂x_1 + M^(2) ∂/∂x_2 + M^(3) ∂/∂x_3,   (26.4)

where M^(1), M^(2), and M^(3) are the (n × N_u) real matrices (here n = 6, N_u = 3)
defined by

         ⎡ 1    0    0  ⎤           ⎡ 0    0    0  ⎤           ⎡ 0    0    0  ⎤
         ⎢ 0    0    0  ⎥           ⎢ 0    1    0  ⎥           ⎢ 0    0    0  ⎥
M^(1) =  ⎢ 0    0    0  ⎥, M^(2) =  ⎢ 0    0    0  ⎥, M^(3) =  ⎢ 0    0    1  ⎥.   (26.5)
         ⎢ 0  1/√2   0  ⎥           ⎢1/√2  0    0  ⎥           ⎢ 0    0    0  ⎥
         ⎢ 0    0  1/√2 ⎥           ⎢ 0    0    0  ⎥           ⎢1/√2  0    0  ⎥
         ⎣ 0    0    0  ⎦           ⎣ 0    0  1/√2 ⎦           ⎣ 0  1/√2   0  ⎦
D_x(u) = 0 in Ω,   (26.6)

in which the stochastic operator D_x is defined by Eq. (26.2), where the Dirichlet
condition is

u = 0 on Γ_0,   (26.7)

in which M(n(x)) = M^(1) n_1(x) + M^(2) n_2(x) + M^(3) n_3(x) and where f_1
is a given surface force field applied to Γ_1. The boundary value problem defined
by Eqs. (26.6), (26.7), and (26.8) is typically the one for which the random field
{K(x), x ∈ Ω} has to be identified by solving a statistical inverse problem in high
stochastic dimension with the partial and limited experimental data {u^{exp,ℓ}, ℓ =
1, …, ν_exp}.
Let us assume that the weak formulation of the stochastic boundary value problem
involving stochastic elliptic operator D_x is discretized by using the finite element
method. Let I = {x^1, …, x^{N_p}} be the finite subset of Ω̄ made up of all the
integration points in the numerical integration formulae for the finite elements [79]
used in the mesh of Ω. Let U = (U_1, …, U_{N_U}) be the random model observation
with values in R^{N_U}, constituted of the N_U observed degrees of freedom for which
there are available experimental data (corresponding to some degrees of freedom
of the nodal values of u at all the nodes in Γ). The random observation vector U
is the unique deterministic nonlinear transformation of the finite family of the N_p
dependent random matrices K(x^1), …, K(x^{N_p}) such that

U = h(K(x^1), …, K(x^{N_p})),   (26.9)

in which

(K_1, …, K_{N_p}) ↦ h(K_1, …, K_{N_p}) : M_n^+(R) × … × M_n^+(R) → R^{N_U}.   (26.10)
It is assumed that ν_exp experimental data sets are available for the random obser-
vation vector U. Each experimental data set corresponds to partial experimental
data (only some degrees of freedom of the nodal values of the displacement
field on Γ are observed) with a limited length (ν_exp is relatively small). These
ν_exp experimental data sets correspond to measurements of ν_exp experimental
configurations associated with the same boundary value problem. For configuration
ℓ, with ℓ = 1, …, ν_exp, the observation vector (corresponding to U for the
computational model) is denoted by u^{exp,ℓ} and belongs to R^{N_U}. Therefore, the
available data are made up of the ν_exp vectors u^{exp,1}, …, u^{exp,ν_exp} in R^{N_U}. It
is assumed that u^{exp,1}, …, u^{exp,ν_exp} correspond to ν_exp independent realizations
of a random vector U^exp defined on a probability space (Θ^exp, T^exp, P^exp) and
correspond to random observation vector U of the stochastic computational model
(random vectors U^exp and U are not defined on the same probability space). It should
be noted that the experimental data do not correspond to a field measurement in Ω
but only to a field measurement on the part Γ of the boundary ∂Ω of domain Ω.
This is the reason why the experimental data are called partial.
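The practical consequence of "partial and limited" data can be made concrete with a toy computation (the sizes below are invented for illustration): when the number ν_exp of realizations is much smaller than the number N_U of observed degrees of freedom, the empirical covariance estimator of U^exp is rank deficient, which is one reason nonparametric statistics cannot be used in this setting.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented sizes: the full response has N = 200 DOFs, only N_U = 30 boundary
# DOFs are measured ("partial"), over nu_exp = 12 configurations ("limited").
N, N_U, nu_exp = 200, 30, 12
L = rng.standard_normal((N, N))                 # fixes a full covariance L L^T
full = rng.standard_normal((nu_exp, N)) @ L.T   # nu_exp independent realizations
u_exp = full[:, :N_U]                           # observed data u^{exp,l}

mean_hat = u_exp.mean(axis=0)                   # empirical mean, shape (N_U,)
cov_hat = np.cov(u_exp, rowvar=False)           # (N_U x N_U) unbiased estimate
# With nu_exp << N_U, the empirical covariance has rank at most nu_exp - 1:
# nonparametric estimation of the probability law of U^exp is out of reach.
rank = np.linalg.matrix_rank(cov_hat, tol=1e-8)
print(rank, nu_exp - 1)
```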
K(x) = (1/(1 + ε)) L(x)^T {ε I_n + K_0(x)} L(x),   ∀ x ∈ Ω,   (26.11)

in which ε > 0 is any fixed positive real number, where L(x) is the upper
triangular (n × n) real matrix such that K̄(x) = L(x)^T L(x), and where
K_0 = {K_0(x), x ∈ Ω} is any random field indexed by Ω, with values in M_n^+(R).
Equation (26.11) can be inverted,

Random field K is effective with values in M_n^+(R). For all x fixed in Ω, the
lower bound is the matrix belonging to M_n^+(R) defined by K_ε(x) = (ε/(1 + ε)) K̄(x),
and for any random matrix K_0(x) with values in M_n^+(R), K(x), defined by
Eq. (26.12), is a random matrix with values in a subset of M_n^+(R) such that
K(x) ≥ K_ε(x) almost surely.
For all integer p 1, fK.x/1 ; x 2 g is a p-order random field with values
1 p
in MC n .R/, i.e., for all x in , EfkK.x/ kF g < C1 and, in particular, is a
second-order random field.
If K_0 is a second-order random field, i.e., for all x in Ω, E{‖[K_0(x)]‖_F^2} < +∞, then K is a second-order random field, i.e., for all x in Ω, E{‖[K(x)]‖_F^2} < +∞.
If function K̄ is chosen as the mean function of random field K, i.e., K̄(x) = E{[K(x)]} for all x, then E{[K_0(x)]} is equal to [I_n], which shows that random field K_0 is normalized.
The class of random fields defined by Eq. (26.11) yields a uniform stochastic
elliptic operator Dx that allows for studying the existence and uniqueness
of a second-order random solution of a stochastic boundary value problem
involving Dx .
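These properties can be illustrated with a minimal numerical sketch (not from the chapter: the dimension, the mean model [K̄], and the germ realization [K_0(x)] below are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 4, 0.1

# Stand-in mean model [Kbar] (SPD), with upper-triangular factorization
# [Kbar] = [Lbar]^T [Lbar].
A = rng.standard_normal((n, n))
Kbar = A @ A.T + n * np.eye(n)
Lbar = np.linalg.cholesky(Kbar).T            # upper-triangular factor

# One realization of the normalized germ [K0(x)], with values in M_n^+(R).
B = rng.standard_normal((n, n))
K0 = B @ B.T + 1e-6 * np.eye(n)

# Eq. (26.11): [K] = (1/(1+eps)) [Lbar]^T {eps [I_n] + [K0]} [Lbar].
K = Lbar.T @ (eps * np.eye(n) + K0) @ Lbar / (1.0 + eps)

# Lower-bound property: [K] - [K_eps] is positive semidefinite, with
# [K_eps] = (eps/(1+eps)) [Kbar].
Keps = (eps / (1.0 + eps)) * Kbar
eigmin = np.linalg.eigvalsh(K - Keps).min()
```

Since [K] − [K_ε] = (1/(1+ε)) [L̄]^T [K_0] [L̄] with [K_0] positive definite, `eigmin` is nonnegative up to round-off.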
in which log_M is the inverse mapping of exp_M, which is defined on M_n^+(R) with values in M_n^S(R), but in general, random field G is not a second-order random field. Conversely, if G is any second-order random field with values in M_n^S(R), in general, the random field K_0 = exp_M(G) is not a second-order random field. Nevertheless, it can be proved that, if K_0 and K_0^{-1} are second-order random fields with values in M_n^+(R), then there exists a second-order random field G with values in M_n^S(R)
(ii) there are real numbers 0 < c_h < +∞ and 0 < c_a < +∞ such that, for all g in R, we have h(g; a) ≤ c_a + c_h g².

The introduced hypotheses imply that, for all a > 0, g ↦ h(g; a) is a one-to-one mapping from R onto R^+ and, consequently, the inverse mapping, v ↦ h^{-1}(v; a), is a strictly monotonically increasing function from R^+ onto R. The square-type representation of random field K_0, indexed by Ω, with values in M_n^+(R), is defined by
[L] = L([G]), where [G] ↦ L([G]) is the measurable mapping from M_n^S(R) into M_n^U(R) defined by

[L([G])]_{jk} = [G]_{jk}, 1 ≤ j < k ≤ n;  [L([G])]_{jj} = √(h([G]_{jj}; a_j)), 1 ≤ j ≤ n,   (26.16)
in which L^{-1} is the inverse function of L, from M_n^+(R) into M_n^S(R). The function g ↦ h^{APSM}(g; a_j) is one-to-one from R into R^+, and there are positive real numbers c_h and c_{a_j} such that, for all g in R, we have h^{APSM}(g; a_j) ≤ c_{a_j} + c_h g². In addition, it can easily be seen that the inverse function is written as (h^{APSM})^{-1}(v; a) = s F_W^{-1}(F_a(v/(2s²))).
Construction of the transformation G and its inverse G^{-1}.

K̄ such that, for all x in Ω, the matrix [K̄(x)] − [K_ε(x)] > 0. Transformation G maps M_n^S(R) into M_n^{+b}(R) ⊂ M_n^+(R), and G^{-1} maps M_n^{+b}(R) into M_n^S(R).
Two versions of the nonlinear transformation G from M_n^S(R) into M_n^+(R) are defined by Eqs. (26.19) and (26.21). For the statistical inverse problem, G is chosen in the class of second-order random fields indexed by Ω with values in M_n^S(R), which is reduced using its truncated Karhunen-Loève decomposition, in which the random coordinates are represented using a truncated polynomial Gaussian chaos expansion. Consequently, the approximation G^{(m,N,N_g)} of the non-Gaussian second-order random field G is introduced such that
G^{(m,N,N_g)}(x) = G_0(x) + Σ_{i=1}^m √λ_i G_i(x) η_i,   (26.23)

η_i = Σ_{j=1}^N y_i^j Ψ_j(Ξ),   (26.24)
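A hedged sketch of how the pairs (λ_i, G_i(x)) in Eq. (26.23) are obtained in practice: here for a scalar field on a 1-D grid with a made-up squared-exponential covariance (the chapter's field is matrix-valued, so this only illustrates the reduction step):

```python
import numpy as np

rng = np.random.default_rng(1)
n_pts, n_real, m = 50, 2000, 5          # grid size, realizations, KL order

# Synthetic second-order random-field realizations on a 1-D grid
# (a scalar stand-in for the matrix-valued field G): smooth correlated samples.
x = np.linspace(0.0, 1.0, n_pts)
C = np.exp(-((x[:, None] - x[None, :]) / 0.3) ** 2)   # covariance kernel
Lc = np.linalg.cholesky(C + 1e-10 * np.eye(n_pts))
samples = (Lc @ rng.standard_normal((n_pts, n_real))).T   # (n_real, n_pts)

# Empirical mean G0(x) and covariance, then eigenpairs (lambda_i, G_i(x)).
G0 = samples.mean(axis=0)
cov = np.cov(samples, rowvar=False)
lam, vecs = np.linalg.eigh(cov)
lam, vecs = lam[::-1][:m], vecs[:, ::-1][:, :m]

# Truncated KL representation: G(x) ~ G0(x) + sum_i sqrt(lam_i) G_i(x) eta_i.
eta = rng.standard_normal(m)
G_approx = G0 + vecs @ (np.sqrt(lam) * eta)

energy = lam.sum() / np.linalg.eigvalsh(cov).sum()   # captured variance
```

With a smooth kernel, a handful of modes captures most of the variance, which is what makes the truncation at order m effective.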
26 Random Vectors and Random Fields in High Dimension . . . 903
in which

[z]^T [z] = [I_m],   (26.25)

[z]_{ji} = y_i^j, 1 ≤ i ≤ m, 1 ≤ j ≤ N.   (26.26)

V_m(R^N) = {[z] ∈ M_{N,m}(R) ; [z]^T [z] = [I_m]}.   (26.28)
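A small sketch of the orthogonality constraint (26.25): any full-rank N × m matrix can be mapped to a point of V_m(R^N) by a QR factorization (one standard retraction on the Stiefel manifold; the specific algorithm used in [61] may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 12, 3

# Map an arbitrary full-rank (N x m) matrix onto V_m(R^N) via QR.
w = rng.standard_normal((N, m))
q, r = np.linalg.qr(w)
z = q * np.sign(np.diag(r))        # fix column signs for uniqueness

# Check the Stiefel constraint [z]^T [z] = [I_m] of Eq. (26.25).
orth_error = np.abs(z.T @ z - np.eye(m)).max()
```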
Let {G^{(m,N,N_g)}(x), x ∈ Ω} be defined by Eqs. (26.23) and (26.24), and let G be defined by Eq. (26.19) for the exponential-type representation and by Eq. (26.21) for the square-type representation. The corresponding parameterized representation of the non-Gaussian positive-definite matrix-valued random field {K(x), x ∈ Ω} is denoted by {K^{(m,N,N_g)}(x), x ∈ Ω} and is rewritten, for all x in Ω, as

K^{(m,N,N_g)}(x; Ξ, [z]) = G(G_0(x) + Σ_{i=1}^m √λ_i G_i(x) {[z]^T Ψ(Ξ)}_i).   (26.31)
From Eqs. (26.9) and (26.30), the parametric model-based representation of the random model observation U with values in R^{N_U}, corresponding to the representation {K^{(m,N,N_g)}(x), x ∈ Ω} of random field {K(x), x ∈ Ω}, is denoted by U^{(m,N,N_g)} and is written as

in which (ξ, [z]) ↦ B^{(m,N,N_g)}(ξ, [z]) is a deterministic mapping defined on R^{N_g} × V_m(R^N) with values in R^{N_U} such that

B^{(m,N,N_g)}(ξ, [z]) = h(K^{(m,N,N_g)}(x_1; ξ, [z]), …, K^{(m,N,N_g)}(x_{N_p}; ξ, [z])).   (26.33)

For N_p fixed, the sequence {U^{(m,N,N_g)}}_{m,N,N_g} of R^{N_U}-valued random variables converges to U in L²_{N_U}.
The first step consists in introducing a family {K^{APSM}(x; s), x ∈ Ω} of algebraic prior stochastic models (APSM) for the non-Gaussian second-order random field K, defined on (Θ, T, P), indexed by Ω, with values in M_n^+(R), which has been introduced in Sect. 6.1. This family depends on an unknown hyperparameter s belonging to an admissible set C_s that is a subset of R^{N_s}, whose dimension, N_s, is assumed to be relatively small, while the stochastic dimension of K^{APSM} is high. For instance, s can be made up of the mean function, a matrix-valued lower bound, some spatial-correlation lengths, and some parameters controlling the statistical fluctuations and the shape of the tensor-valued correlation function. For s fixed in C_s, the probability distribution (i.e., the system of marginal probability distributions) of random field K^{APSM} and the corresponding generator of independent realizations are assumed to have been constructed and, consequently, are assumed to be known.
The calculation of s^opt in C_s can be carried out by using the maximum likelihood method:

s^opt = arg max_{s ∈ C_s} Σ_{ℓ=1}^{ν_exp} log p_U(u^{exp,ℓ}; s).   (26.35)
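The logic of Eq. (26.35) can be sketched with a toy scalar model. Everything below is a hypothetical stand-in (the forward model U = s Ξ², the grid playing the role of C_s, and the numerical values); in the chapter the likelihood involves the pdf of the APSM observation, which must itself be estimated, here with a simple Gaussian kernel density estimate built from simulations:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_obs(s, size):
    # Hypothetical forward model: scalar observation U = s * Xi^2.
    return s * rng.standard_normal(size) ** 2

def kde_logpdf(points, samples, bw=0.2):
    # Gaussian kernel density estimate built from model simulations.
    d = (points[:, None] - samples[None, :]) / bw
    dens = np.exp(-0.5 * d * d).mean(axis=1) / (bw * np.sqrt(2 * np.pi))
    return np.log(dens + 1e-300)

u_exp = simulate_obs(2.0, 500)                # synthetic "experimental" data
grid = [0.5, 1.0, 2.0, 4.0, 8.0]              # stand-in admissible set C_s

# Eq. (26.35): maximize the log-likelihood of the data over s in the grid.
loglik = [kde_logpdf(u_exp, simulate_obs(s, 4000)).sum() for s in grid]
s_opt = grid[int(np.argmax(loglik))]
```

The candidate that generated the data dominates the log-likelihood by a wide margin, which is what makes the grid search reliable here.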
These realizations can be generated at points x_1, …, x_{N_p} (or at any other points), with ν_KL as large as desired, without inducing a significant computational cost.
For a fixed choice of the type of representation of random field K given by Eq. (26.19) (exponential type) or Eq. (26.21) (square type), the corresponding optimal algebraic prior model {G^{OAPSM}(x), x ∈ Ω} of random field {G(x), x ∈ Ω} is written as

G^{OAPSM(m)}(x) = G_0(x) + Σ_{i=1}^m √λ_i G_i(x) η_i^{OAPSM}, ∀ x ∈ Ω.   (26.39)
Using the ν_KL independent realizations G^{(1)}, …, G^{(ν_KL)} of random field G^{OAPSM} calculated with Eq. (26.38), ν_KL independent realizations η^{(1)}, …, η^{(ν_KL)} of the
where the integer N_d is the maximum degree of the normalized multivariate Hermite polynomials and N_g the dimension of random vector Ξ. In Eq. (26.41), the symbol ≃ means that the mean-square convergence is reached for N_d and N_g (with N_g ≤ m) sufficiently large.
Identification of an optimal value [z^0(N_d, N_g)] of [z] for a fixed value of N_d and N_g.

For a fixed value of N_d and N_g such that N_d ≥ 1 and 1 ≤ N_g ≤ m, the identification of [z] is performed using the maximum likelihood method. The log-likelihood function is written as
L([z]) = Σ_{ℓ=1}^{ν_KL} log p_{η^chaos(N_d,N_g)}(η^{(ℓ)}; [z]),   (26.43)
(i) For [z] fixed in V_m(R^N), the probability density function e ↦ p_{η^chaos(N_d,N_g)}(e; [z]) of random variable η^chaos(N_d, N_g) is estimated by the multidimensional
The first one is required for generating the independent realizations Ψ_j(Ξ^{(ℓ)}) of Ψ_j(Ξ) while preserving the orthogonality condition for any high values of N_g and N_d. An efficient algorithm is presented hereinafter.

The second one requires an advanced algorithm to optimize the trials for solving the high-dimension optimization problem defined by Eq. (26.44), the constraint [z]^T [z] = [I_m] being automatically and exactly satisfied, as described in [61].
[Ψ] = [ Ψ_1(Ξ^{(1)}) … Ψ_1(Ξ^{(ν_chaos)}) ; … ; Ψ_N(Ξ^{(1)}) … Ψ_N(Ξ^{(ν_chaos)}) ],   (26.45)

lim_{ν_chaos → +∞} (1/ν_chaos) [Ψ] [Ψ]^T = [I_N].   (26.46)
It should be noted that the algorithm, which is used for the Gaussian chaos Ψ_j(Ξ) = ψ_{α_1}(Ξ_1) × … × ψ_{α_{N_g}}(Ξ_{N_g}) for j = 1, …, N, can also be used for an arbitrary non-separable probability distribution p_Ξ(ξ) dξ on R^{N_g} without any modification, but in such a case, the multivariate polynomials {Ψ_j(ξ)}_{j=1}^N, which verify the orthogonality property, E{Ψ_j(Ξ) Ψ_{j'}(Ξ)} = ∫_{R^{N_g}} Ψ_j(ξ) Ψ_{j'}(ξ) p_Ξ(ξ) dξ = δ_{jj'}, are not written as a tensorial product of univariate polynomials (we no longer have Ψ_j(Ξ) = ψ_{α_1}(Ξ_1) × … × ψ_{α_{N_g}}(Ξ_{N_g})). It has been proved in [65] that, for the usual probability measures, the use of the explicit algebraic formula (constructed with a symbolic toolbox) or the use of the computational recurrence relation with respect to the degree induces significant numerical noise, and the orthogonality property is lost. In addition, if a global orthogonalization were done to correct this loss of orthogonality, then the independence of the realizations would be lost. A robust computational method has been proposed in [49, 65] to preserve the orthogonality
properties and the independence of the realizations. The two main steps are the
following.
[M] = [ M(Ξ^{(1)}) … M(Ξ^{(ν_chaos)}) ] = [ M_1(Ξ^{(1)}) … M_1(Ξ^{(ν_chaos)}) ; … ; M_N(Ξ^{(1)}) … M_N(Ξ^{(ν_chaos)}) ].   (26.47)
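One way to realize the empirical-orthonormality idea behind Eqs. (26.46) and (26.47) is to orthonormalize with respect to the empirical measure of the realizations themselves (a simplified scalar-germ sketch; the actual algorithm of [49, 65] differs in its details):

```python
import numpy as np

rng = np.random.default_rng(4)
N, nu = 6, 20000                      # basis size, number of realizations

# Realizations of the (non-orthonormal) monomials M_j(Xi) = Xi^(j-1),
# evaluated at independent copies of a scalar germ Xi ~ N(0, 1).
xi = rng.standard_normal(nu)
M = np.vander(xi, N, increasing=True).T          # (N, nu), rows M_j(xi^(l))

# Empirical Gram matrix and its Cholesky factorization.  Applying the
# inverse factor realization-wise yields polynomials that are exactly
# orthonormal with respect to the empirical measure (cf. Eq. (26.46))
# while keeping the nu columns independent realizations.
G = (M @ M.T) / nu
L = np.linalg.cholesky(G)
Psi = np.linalg.solve(L, M)                      # rows Psi_j(xi^(l))

orth_error = np.abs(Psi @ Psi.T / nu - np.eye(N)).max()
```

By construction (1/ν) [Ψ][Ψ]^T = [L]^{-1} [G] [L]^{-T} = [I_N] to machine precision, without any symbolic recurrence that could lose orthogonality.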
variable η_i^chaos(N_d, N_g), which is estimated with the one-dimensional kernel density estimation method using ν_chaos independent realizations,
in which BI_i is a bounded interval of the real line, which is defined as the support of the one-dimensional kernel density estimator of random variable η_i^{OAPSM} and which is then adapted to the independent realizations η^{(1)}, …, η^{(ν_KL)} of η^{OAPSM}.
(ii) For random vector η^chaos(N_d, N_g), the L¹-log error function is denoted by err(N_d, N_g) and is defined by

err(N_d, N_g) = (1/m) Σ_{i=1}^m err_i(N_d, N_g).   (26.49)
(iii) The optimal values N_d^opt and N_g^opt of the truncation parameters N_d and N_g are determined by minimizing the error function err(N_d, N_g), taking into account the admissible set for the values of N_d and N_g, as described in [49]. Let C_{N_d,N_g} be the admissible set for the values of N_d and N_g, which is defined by
It should be noted that the higher the values of N_d and N_g, the bigger the matrix [z^0(N_d, N_g)], and thus the more difficult the numerical identification. Rather than directly minimizing the error function err(N_d, N_g), it is more accurate to search for the optimal values of N_d and N_g that minimize the dimension of the projection basis, N = (N_d + N_g)! / (N_d! N_g!). For a given error threshold ε, we then introduce the admissible set C_ε such that
and the optimal values N_d^opt and N_g^opt are given as the solution of the optimization problem,

(N_d^opt, N_g^opt) = arg min_{(N_d,N_g) ∈ C_ε} (N_d + N_g)! / (N_d! N_g!),  N^opt = h(N_d^opt, N_g^opt).
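The selection logic can be sketched as follows (the err values below are invented placeholders; only the combinatorics N = (N_d + N_g)!/(N_d! N_g!) and the argmin over C_ε come from the text):

```python
import math

# Hypothetical error table err(Nd, Ng): made-up values standing in for
# the L1-log error of Eq. (26.49); only the selection logic is real.
err = {(1, 1): 0.30, (2, 1): 0.21, (1, 2): 0.24, (2, 2): 0.09,
       (3, 2): 0.07, (2, 3): 0.08, (3, 3): 0.04}
eps = 0.10

def basis_dim(nd, ng):
    # N = (Nd + Ng)! / (Nd! Ng!): number of multivariate polynomials.
    return math.comb(nd + ng, ng)

# Admissible set C_eps, then minimization of the basis dimension over it.
C_eps = [k for k, e in err.items() if e <= eps]
nd_opt, ng_opt = min(C_eps, key=lambda k: basis_dim(*k))
N_opt = basis_dim(nd_opt, ng_opt)
```

Here (2, 2) wins: it meets the error threshold with a basis of dimension 6, while the more accurate pairs would require dimensions 10 and 20.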
This step consists in identifying the prior stochastic model {K^prior(x), x ∈ Ω} of {K(x), x ∈ Ω}, using the maximum likelihood method with the experimental data sets u^{exp,1}, …, u^{exp,ν_exp} relative to the random model observation U of the stochastic computational model (see Eq. (26.9)) and using the parametric model-based representation of random model observation U (see Eq. (26.32)). We thus have to identify the value [z^prior] in V_m(R^N) of [z] such that

[z^prior] = arg max_{[z] ∈ V_m(R^N)} Σ_{ℓ=1}^{ν_exp} log p_{U^{(m,N,N_g)}}(u^{exp,ℓ}; [z]),   (26.51)
in which p_{U^{(m,N,N_g)}}(u^{exp,ℓ}; [z]) is the value, at u = u^{exp,ℓ}, of the pdf p_{U^{(m,N,N_g)}}(u; [z]) of the random vector U^{(m,N,N_g)} given (see Eq. (26.32)) by

where (ξ, [z]) ↦ B^{(m,N,N_g)}(ξ, [z]) is the deterministic mapping from R^{N_g} × V_m(R^N) into R^{N_U} defined by Eq. (26.33) with Eq. (26.31), in which G_0(x), λ_i, and G_i(x), for i = 1, …, m, are the quantities computed in Step 4.
(i) For [z] fixed in V_m(R^N), the pdf u ↦ p_{U^{(m,N,N_g)}}(u; [z]) of random variable U^{(m,N,N_g)} is estimated by the multidimensional kernel density estimation method using the ν_chaos independent realizations ξ^{(1)}, …, ξ^{(ν_chaos)} of Ξ.

(ii) It is assumed that N > m, which is generally the case. The parameterization [z] = R_{z^0}([w]) defined by Eq. (26.29) is used for exploring, with a random search algorithm, the subset of V_m(R^N) centered at [z^0] := [z^0(N_d, N_g)] ∈ V_m(R^N) computed in Step 5. The optimization problem defined by Eq. (26.51) is then replaced by [z^prior] = R_{z^0}([w^prior]) with
[w^prior] = arg max_{[w] ∈ T_{z^0}} Σ_{ℓ=1}^{ν_exp} log p_{U^{(m,N,N_g)}}(u^{exp,ℓ}; R_{z^0}([w])),   (26.53)

in which [z^prior] is given by Eq. (26.51) and where K^{(m,N,N_g)}(x; Ξ, [z^prior]) is defined by Eq. (26.31) with [z] = [z^prior].
in which R_{z^prior} is the mapping defined by Eq. (26.29), where [z^prior] has been calculated in Step 6, and where [W^prior] = Proj_{T_{z^prior}}(σ [Δ^prior]) is a random matrix with values in T_{z^prior}, which is the projection on T_{z^prior} of a random matrix σ [Δ^prior] with values in M_{N,m}(R) whose entries are independent normalized Gaussian real-valued random variables, i.e., E{Δ_{ji}} = 0 and E{Δ_{ji}²} = 1. For a sufficiently small value of σ, the statistical fluctuations of the V_m(R^N)-valued random matrix [Z^prior] are approximately centered around [z^prior]. The Bayesian update allows the posterior distribution of the random matrix [W^post] with values in T_{z^prior} to be estimated using the stochastic
in which K^{(m,N^opt,N_g^opt)} is defined by Eq. (26.31).

(iii) Once the probability distribution of [W^post] has been estimated in Step 7, ν_KL independent realizations can be calculated for the random field G^post(x) = G_0(x) + Σ_{i=1}^m √λ_i G_i(x) η_i^post, in which η^post = [Z^post]^T Ψ(Ξ) and where [Z^post] = R_{z^prior}([W^post]). The identification procedure can then be restarted
The first one is the algebraic prior stochastic model K^{APSM} for the non-Gaussian positive-definite matrix-valued random field K that exhibits anisotropic statistical fluctuations (initially introduced in [56, 58]), for which there is a parameterization with a maximum of d n(n+1)/2 spatial-correlation lengths and for which a positive-definite lower bound is given [60, 63]. An extension of this model can be found in [32] for the case in which some positive-definite lower and upper bounds are introduced as constraints.

The second one is the algebraic prior stochastic model K^{APSM} described in [28, 29] for the non-Gaussian positive-definite matrix-valued random field K that exhibits (i) dominant statistical fluctuations in a symmetry class M_n^sym(R) ⊂ M_n^+(R) of dimension N (isotropic, cubic, transversely isotropic, tetragonal,
We consider the case for which the random field exhibits anisotropic statistical
fluctuations.
[C̄] = [K̄] − [C_ℓ] ∈ M_n^+(R).   (26.59)
In Eq. (26.58), {G_0(x), x ∈ R^d} is a random field defined on (Θ, T, P), indexed by R^d, with values in M_n^+(R), homogeneous on R^d, and second-order such that, for all x in R^d,
Consequently, the random fields {U_{jk}(x), x ∈ R^d}_{1≤j≤k≤n} are completely and uniquely defined by the n(n+1)/2 autocorrelation functions τ = (τ_1, …, τ_d) ↦ R_{U_{jk}}(τ) = E{U_{jk}(x+τ) U_{jk}(x)} from R^d into R, such that R_{U_{jk}}(0) = 1. The spatial-correlation lengths L_1^{jk}, …, L_d^{jk} of random field {U_{jk}(x), x ∈ R^d} are defined by

L_α^{jk} = ∫_0^{+∞} |R_{U_{jk}}(0, …, τ_α, …, 0)| dτ_α, α = 1, …, d,   (26.63)

R_{U_{jk}}(τ) = ρ_1^{jk}(τ_1) × … × ρ_d^{jk}(τ_d),   (26.64)
in which, for all α = 1, …, d, ρ_α^{jk}(0) = 1, and for all τ_α ≠ 0,

ρ_α^{jk}(τ_α) = 4 (L_α^{jk})² / (π² τ_α²) sin²(π τ_α / (2 L_α^{jk})),   (26.65)

where L_1^{jk}, …, L_d^{jk} are positive real numbers. Each random field U_{jk} is mean-square continuous on R^d, and its power spectral density function defined on R^d has a compact support, [−π/L_1^{jk}, π/L_1^{jk}] × … × [−π/L_d^{jk}, π/L_d^{jk}]. Such a model has d n(n+1)/2 real parameters {L_1^{jk}, …, L_d^{jk}}_{1≤j≤k≤n} that represent the spatial-correlation lengths of the stochastic germs {U_{jk}(x), x ∈ R^d}_{1≤j≤k≤n}, because

∫_0^{+∞} |R_{U_{jk}}(0, …, τ_α, …, 0)| dτ_α = L_α^{jk}.   (26.66)
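A quick numerical check of Eq. (26.66) for the correlation model (26.65), truncating the improper integral at a large but finite upper bound:

```python
import math

def rho(tau, L):
    # Eq. (26.65), with rho(0) = 1 by continuity.
    if tau == 0.0:
        return 1.0
    return 4.0 * L**2 / (math.pi**2 * tau**2) \
        * math.sin(math.pi * tau / (2.0 * L)) ** 2

# Trapezoidal approximation of int_0^inf rho(tau) dtau, which should
# recover the spatial-correlation length L itself.
L = 0.7
T, n = 1000.0 * L, 500_000
h = T / n
integral = h * (0.5 * rho(0.0, L) + sum(rho(k * h, L) for k in range(1, n)))
```

Since ρ ≥ 0 here, |ρ| = ρ, and the truncated integral approaches L with a tail error of order L²/T.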
h(u; a) = F_a^{-1}(F_U(u)),   (26.67)

in which u ↦ F_U(u) = ∫_{−∞}^u (1/√(2π)) e^{−v²/2} dv is the cumulative distribution function of the normalized Gaussian random variable U. The function p ↦ F_a^{-1}(p), from ]0, 1[ into ]0, +∞[, is the reciprocal function of the cumulative distribution function γ ↦ F_a(γ) = ∫_0^γ (1/Γ(a)) t^{a−1} e^{−t} dt of the gamma random variable with parameter a, in which Γ(a) is the gamma function defined by Γ(a) = ∫_0^{+∞} t^{a−1} e^{−t} dt.
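Equation (26.67) can be prototyped with the standard library alone, using a series evaluation of the regularized incomplete gamma function and bisection for F_a^{-1} (in practice a library routine such as SciPy's gammaincinv would normally be used):

```python
import math

def gauss_cdf(u):
    # F_U: cdf of the normalized Gaussian random variable U.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def gamma_cdf(x, a):
    # F_a: regularized lower incomplete gamma function P(a, x), series form.
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 1
    while True:
        term *= x / (a + k)
        total += term
        if term < total * 1e-15 or k > 10_000:
            break
        k += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def h(u, a):
    # Eq. (26.67): h(u; a) = F_a^{-1}(F_U(u)), computed by bisection.
    p = gauss_cdf(u)
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, a) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, h(0; a) returns the median of the gamma distribution with parameter a, since F_U(0) = 1/2, and h is strictly increasing in u, as stated in the text.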
Defining the random field {G_0(x), x ∈ R^d} and its generator of realizations.

For all x fixed in R^d, the available information is defined by Eq. (26.60) and by the constraint |E{log(det[G_0(x)])}| < +∞, which is introduced in order that the zero matrix be a repulsive value for the random matrix [G_0(x)]. The use of the maximum entropy principle under the constraints defined by this available information leads to taking the random matrix [G_0(x)] in ensemble SG_0^+ defined in Sect. 8.1 of Chap. 8, Random Matrix Models and Nonparametric Method for Uncertainty Quantification, in Part II of the present handbook. Taking into account the algebraic representation of any random matrix belonging to ensemble SG_0^+, the spatial-correlation structure of random field G_0 is then introduced by replacing the Gaussian random variables U_{jk} by the Gaussian real-valued random fields {U_{jk}(x), x ∈ R^d} defined above, for which the spatial-correlation structure is defined by the spatial-correlation lengths {L_α^{jk}}_{α=1,…,d}. Consequently, the random field {G_0(x), x ∈ R^d}, defined on probability space (Θ, T, P), indexed by R^d, with values in M_n^+(R), is constructed as follows:
(i) Let {U_{jk}(x), x ∈ R^d}_{1≤j≤k≤n} be the n(n+1)/2 independent random fields introduced above. Consequently, for all x in R^d,
where C_{G_0} is the positive constant of normalization. For all x fixed in R^d, the random variables {[G_0(x)]_{jk}, 1 ≤ j ≤ k ≤ 6} are mutually dependent. In addition, the system of the marginal probability distributions of random field {G_0(x), x ∈ R^d}
26 Random Vectors and Random Fields in High Dimension . . . 919
in which sup is the supremum and where 0 < cG < C1 is a finite positive constant.
the reshaping of [C_ℓ] ∈ M_n^+(R) (the lower bound) and [K̄] ∈ M_n^+(R) (the mean value),

the d n(n+1)/2 positive real numbers {L_1^{jk}, …, L_d^{jk}}_{1≤j≤k≤n} (the spatial-correlation lengths, for the parameterization given in the example) and δ (the dispersion) such that 0 < δ < √((n+1)/(n+5)).
We now consider the case for which the random field exhibits dominant statistical
fluctuations in a symmetry class and some anisotropic statistical fluctuations.
C_m = {m ∈ R^N | Σ_{j=1}^N m_j [E_j^sym] ∈ M_n^+(R)}.   (26.78)
It should be noted that matrices [E_1^sym], …, [E_N^sym] are symmetric but are not positive definite. For the usual material symmetry classes, the possible values of N are the following: 2 for isotropic, 3 for cubic, 5 for transversely isotropic, 6 or 7 for tetragonal, 6 or 7 for trigonal, 9 for orthotropic, 13 for monoclinic, and 21 for anisotropic. The following properties are proved (see [34, 75]):
(i) If [M] and [M'] belong to M_n^sym(R), then for all a and b in R, a [M] + b [M'] ∈ M_n^sym(R), and

(ii) any matrix [N] belonging to M_n^sym(R) can be written as
[N] = exp_M([Ñ]), [Ñ] = Σ_{j=1}^N y_j [E_j^sym], y = (y_1, …, y_N) ∈ R^N,   (26.80)

exp_M(Σ_{j=1}^N y_j [E_j^sym]) = Σ_{j=1}^N m_j(y) [E_j^sym], ∀ y ∈ R^N,   (26.81)
[C̄] = [K̄] − [C_ℓ] ∈ M_n^+(R).   (26.83)
Let [Ā] be the deterministic matrix in M_n^sym(R), independent of x, representing the projection of the mean matrix [C̄] on the symmetry class M_n^sym(R),

in which [C̄] is defined by Eq. (26.83) and where P^sym is the projection operator from M_n^+(R) onto M_n^sym(R).
(i) For a random field with values in a given symmetry class with N < 21 (there are no anisotropic statistical fluctuations), the matrices [K̄] and [C_ℓ] belong to the symmetry class, and consequently, [C̄] must belong to M_n^sym(R); thus, [Ā] is equal to [C̄].

(ii) If the symmetry class is anisotropic (thus, N = 21), then M_n^sym(R) coincides with M_n^+(R), and again, [Ā] is equal to the mean matrix [C̄] that belongs to M_n^+(R).

(iii) In general, for a given symmetry class with N < 21, and due to the presence of anisotropic statistical fluctuations, the mean value [C̄] of the random matrix [C(x)] = [K(x)] − [C_ℓ] belongs to M_n^+(R) but does not belong to M_n^sym(R). For this case, an invertible deterministic (n × n) real matrix [S] is introduced such that
[C̄] = [S]^T [Ā] [S].   (26.85)

[C̄] = [L_C]^T [L_C], [Ā] = [L_A]^T [L_A].   (26.86)

[S] = [L_A]^{-1} [L_C].   (26.87)

It should be noted that, for cases (i) and (ii) above, Eq. (26.85) shows that [S] = [I_n].
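A numerical sanity check of Eqs. (26.85), (26.86), and (26.87), with random SPD stand-ins for [C̄] and [Ā] (in the chapter [Ā] comes from the projection (26.84), not from random data):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6

def random_spd(n):
    # Random symmetric positive-definite matrix (illustrative stand-in).
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

Cbar = random_spd(n)          # stand-in for the mean matrix [Cbar]
Abar = random_spd(n)          # stand-in for the projection [Abar]

# Eq. (26.86): upper-triangular factors [Cbar] = [L_C]^T [L_C],
# [Abar] = [L_A]^T [L_A].
LC = np.linalg.cholesky(Cbar).T
LA = np.linalg.cholesky(Abar).T

# Eq. (26.87): [S] = [L_A]^{-1} [L_C], which recovers Eq. (26.85).
S = np.linalg.solve(LA, LC)
residual = np.abs(S.T @ Abar @ S - Cbar).max()
```

Substituting (26.87) into (26.85) gives [S]^T [Ā] [S] = [L_C]^T [L_C] = [C̄], which is what the residual verifies.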
For all x in R^d, the mean value of random matrix [G_0(x)] is the matrix [I_n] (see Eq. (26.71)).

The level of the anisotropic statistical fluctuations is controlled by the dispersion parameter δ (independent of x) such that 0 < δ < √((n+1)/(n+5)) (see Eq. (26.72)).

The hyperparameter s_{G_0} of random field G_0 is constituted of the dispersion parameter δ and of the spatial-correlation lengths {L_1^{jk}, …, L_d^{jk}}_{1≤j≤k≤n} that are positive real numbers (see Eq. (26.63)).
Statistical fluctuations in the given symmetry class, described by {A(x), x ∈ R^d}.

The random field {A(x), x ∈ R^d} models the statistical fluctuations belonging to the given symmetry class M_n^sym(R) ⊂ M_n^+(R). This random field, defined on the probability space (Θ', T', P'), is statistically independent of random field {G_0(x), x ∈ R^d}; is indexed by R^d, with values in M_n^sym(R) ⊂ M_n^+(R); and is non-Gaussian, homogeneous, second-order, and mean-square continuous on R^d. In Eq. (26.88), for all x fixed in R^d, the random matrix [A(x)]^{1/2} is the square root of random matrix [A(x)] and, due to Eq. (26.79), takes its values in M_n^sym(R) ⊂ M_n^+(R).

For all x in R^d, the mean value of random matrix [A(x)] is the matrix [Ā].
In order that, for all x in R^d, the zero matrix be a repulsive value for random matrix [A(x)], the following constraint is introduced,

Due to the statistical independence of [A(x)] and [G_0(x)], taking the mathematical expectation of the two members of Eq. (26.88), and from Eqs. (26.83) and (26.85), it can be deduced that, for all x in R^d,

and {K^APSM(x), x ∈ Ω} is a random field indexed by Ω with values in M_n^+(R).
Statistical fluctuations in the symmetry class going to zero (δ_A → 0).

If the given symmetry class is anisotropic (N = 21) and if δ_A → 0, then [A(x)] goes to the mean matrix [C̄] and [S] goes to [I_n], and Eq. (26.88) shows that K^APSM(x) − [C_ℓ] goes to [C̄]^{1/2} [G_0(x)] [C̄]^{1/2} (in probability distribution), which is a random matrix with values in M_n^+(R). Consequently, if there are no statistical fluctuations in the symmetry class (δ_A = 0), then Eq. (26.86) becomes
in which exp_M denotes the exponential of the symmetric real matrices, where Y(x) = (Y_1(x), …, Y_N(x)) and where {Y(x), x ∈ R^d} is a non-Gaussian random field defined on (Θ', T', P'), indexed by R^d, with values in R^N, homogeneous, second-order, and mean-square continuous on R^d. Using the change of representation defined by Eqs. (26.81) and (26.82), random matrix [N(x)] defined by Eq. (26.96) can be rewritten as

[N(x)] = Σ_{j=1}^N m_j(Y(x)) [E_j^sym].   (26.97)
E{[N(x)]} = [I_n],   (26.98)
The constraints defined by Eqs. (26.100) and (26.101) are globally rewritten as

in which

y ↦ g(y) = (g_1(y), …, g_{1+N}(y)) is the mapping from R^N into R^{1+N} such that g_1(y) = Σ_{j=1}^N y_j tr[E_j^sym] and g_{1+j}(y) = m_j(y) for j = 1, …, N.

f = (f_1, …, f_{1+N}) is the vector in R^{1+N} such that f_1 = c_N and f_{1+j} = {E^{-1} I}_j for j = 1, …, N.
9.3.6 Construction of the pdf for Random Vector Y.x/ Using the
MaxEnt Principle
For all x fixed in R^d, the probability density function y ↦ p_{Y(x)}(y) from R^N into R^+ of the R^N-valued random vector Y(x) is independent of x (Y is homogeneous). This pdf is constructed using the maximum entropy principle presented in Sect. 11 of Chap. 8, Random Matrix Models and Nonparametric Method for Uncertainty Quantification, in Part II of the present handbook, under the constraints defined by the normalization condition ∫_{R^N} p_{Y(x)}(y) dy = 1 and by Eq. (26.102). For all y in R^N, the pdf is written as
spatial-correlation structure.

L_j = ∫_0^{+∞} |R_j(0, …, τ, …, 0)| dτ.
(ii) For all countable ordered subsets 0 ≤ r_1 < … < r_k < r_{k+1} < … of R^+, the sequence of random fields {B^{r_k r_{k+1}}(x), x ∈ R^d}_{k∈N}

consists of mutually independent random fields,

is such that, ∀ k ∈ N, {B^{r_k r_{k+1}}(x), x ∈ R^d} is an independent copy of {B(x), x ∈ R^d}, which implies that E{B^{r_k r_{k+1}}(x)} = E{B(x)} = 0 and that

From the properties of random field {B(x), x ∈ R^d} and of the family of random fields {B^{r_k r_{k+1}}(x), x ∈ R^d}_{k∈N} for all countable ordered subsets 0 ≤ r_1 < … < r_k < r_{k+1} < …, it is deduced that, for all x fixed in R^d,
(i) The components W_x^{(1)}, …, W_x^{(N)} of W_x are mutually independent real-valued stochastic processes.

(ii) {W_x(r), r ≥ 0} is a stochastic process with independent increments.

(iii) For all 0 ≤ s < r < +∞, the increment ΔW_x^{sr} = W_x(r) − W_x(s) is an R^N-valued second-order random variable which is Gaussian, centered, and with a covariance matrix that is written as [C_{ΔW_x^{sr}}] = E{ΔW_x^{sr} (ΔW_x^{sr})^T} = (r − s) [I_N].

(iv) Since W_x(0) = 0, and from (i), (ii), and (iii), it can be deduced that {W_x(r), r ≥ 0} is an R^N-valued normalized Wiener process.
for which the Wiener process is the family {W_x(r), r ≥ 0} that contains the imposed spatial-correlation structure defined by Eq. (26.105),

that admits the same unique invariant measure (independent of x), which is defined by the pdf p_{Y(x)} given by Eqs. (26.103) and (26.104).

Taking into account Eq. (26.103), the potential u ↦ Φ(u), from R^N into R, is defined by
For all x fixed in R^d, let {(U_x(r), V_x(r)), r ≥ 0} be the Markov stochastic process defined on the probability space (Θ', T', P'), indexed by r ≥ 0, with values in R^N × R^N, satisfying, for all r > 0, the following ISDE,

in which u_0 and v_0 are given vectors in R^N (generally taken as zero in the applications) and f_0 > 0 is a free parameter whose usefulness is explained below. From Eqs. (26.82) and (26.102), it can be deduced that the function u ↦ Φ(u): (i) is continuous on R^N and (ii) is such that u ↦ ‖∇_u Φ(u)‖ is a locally bounded function on R^N (i.e., is bounded on all compact sets in R^N). In addition, the Lagrange multiplier λ^sol, which belongs to C_λ ⊂ R^{1+N}, is such that

∫_{R^N} ‖∇_u Φ(u)‖ e^{−Φ(u)} du < +∞.   (26.114)
Taking into account (i), (ii), and Eqs. (26.112), (26.113), and (26.114), using Theorems 4 to 7 on pages 211–216 of Ref. [55], for which the Hamiltonian is taken as H(u, v) = ‖v‖²/2 + Φ(u), and using [17, 39] for the ergodic property, it can be deduced that the problem defined by Eqs. (26.109), (26.110), and (26.111) admits a unique solution. For all x fixed in R^d, this solution is a second-order diffusion
ρ_st(u, v) = c_N exp{−(1/2) ‖v‖² − Φ(u)},   (26.115)
Random variable Y(x) (for which the pdf p_{Y(x)} is defined by Eq. (26.103)) can then be written, for all fixed positive values of r_st, as

Y(x) = U_x^st(r_st) = lim_{r→+∞} U_x(r) in probability distribution.   (26.117)
The free parameter f_0 > 0 introduced in Eq. (26.110) allows a dissipation term to be introduced in the nonlinear second-order dynamical system (formulated in the Hamiltonian form with an additional dissipative term) in order to reach more rapidly the asymptotic behavior corresponding to the stationary and ergodic solution associated with the invariant measure. Using Eq. (26.117) and the ergodic property of stationary stochastic process U_x^st, it should be noted that, if w is any mapping from R^N into a Euclidean space such that E{w(Y(x))} = ∫_{R^N} w(y) p_{Y(x)}(y) dy is finite, then

E{w(Y(x))} = lim_{R→+∞} (1/R) ∫_0^R w(U_x(r; θ')) dr,   (26.118)

in which, for θ' ∈ Θ', U_x(·; θ') is any realization of U_x.
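A minimal 1-D sketch of this ISDE-based generator and of the ergodic average (26.118), using a crude Euler-Maruyama scheme instead of the Störmer-Verlet scheme of the references, and the toy potential Φ(u) = u²/2, whose invariant measure for U is N(0, 1):

```python
import math
import random

random.seed(0)

def grad_phi(u):
    # Toy potential Phi(u) = u^2 / 2, so grad Phi(u) = u.
    return u

f0, dt, n_steps, burn = 1.5, 0.01, 400_000, 50_000
u, v, acc = 0.0, 0.0, 0.0
for k in range(n_steps):
    # dU = V dr,  dV = -grad Phi(U) dr - (f0/2) V dr + sqrt(f0) dW
    dw = random.gauss(0.0, math.sqrt(dt))
    u, v = u + v * dt, v - (grad_phi(u) + 0.5 * f0 * v) * dt + math.sqrt(f0) * dw
    if k >= burn:
        acc += u * u

# Time average of w(U) = U^2; the exact second moment of N(0, 1) is 1.
second_moment = acc / (n_steps - burn)
```

Whatever the value of f_0, the invariant measure is proportional to exp{−½v² − Φ(u)}, so the time average of u² converges to 1; f_0 only controls how quickly the transient is dissipated.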
the reshaping of [C_ℓ] ∈ M_n^+(R) (the lower bound) and [K̄] ∈ M_n^+(R) (the mean value),
11 Conclusions
A complete advanced methodology and the associated tools have been presented
for solving the challenging statistical inverse problem related to the experimental
identification of a non-Gaussian matrix-valued random field that is the model
parameter of a boundary value problem, using partial and limited experimental data related to a model observation. Many applications and validations of this methodology can be found in the given references.
References
1. Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds.
Princeton University Press, Princeton (2000)
2. Arnst, M., Ghanem, R., Soize, C.: Identification of Bayesian posteriors for coefficients of chaos
expansions. J. Comput. Phys. 229(9), 31343154 (2010)
3. Batou, A., Soize, C.: Stochastic modeling and identification of an uncertain computational
dynamical model with random fields properties and model uncertainties. Arch. Appl. Mech.
83(6), 831848 (2013)
4. Batou, A., Soize, C.: Calculation of Lagrange multipliers in the construction of maximum
entropy distributions in high stochastic dimension. SIAM/ASA J. Uncertain. Quantif. 1(1),
431451 (2013)
5. Blatman, G., Sudret, B.: Adaptive sparse polynomial chaos expansion based on least angle
regression. J. Comput. Phys. 230(6), 23452367 (2011)
6. Burrage, K., Lenane, I., Lythe, G.: Numerical methods for second-order stochastic differential
equations. SIAM J. Sci. Comput. 29, 245264 (2007)
7. Cameron, R.H., Martin, W.T.: The orthogonal development of non-linear functionals in series
of Fourier-Hermite functionals. Ann. Math. Second Ser. 48(2), 385392 (1947)
8. Carlin, B.P., Louis, T.A.: Bayesian Methods for Data Analysis, 3rd edn. Chapman & Hall/CRC
Press, Boca Raton (2009)
9. Congdon, P.: Bayesian Statistical Modelling, 2nd edn. Wiley, Chichester (2007)
10. Das, S., Ghanem, R.: A bounded random matrix approach for stochastic upscaling. Multiscale
Model. Simul. 8(1), 296325 (2009)
11. Das, S., Ghanem, R., Spall, J.C.: Asymptotic sampling distribution for polynomial chaos
representation from data: a maximum entropy and fisher information approach. SIAM J. Sci.
Comput. 30(5), 22072234 (2008)
12. Das, S., Ghanem, R., Finette, S.: Polynomial chaos representation of spatio-temporal random
field from experimental measurements. J. Comput. Phys. 228, 87268751 (2009)
13. Debusschere, B.J., Najm, H.N., Pebay, P.P., Knio, O.M., Ghanem, R., Le Matre, O.: Numerical
challenges in the use of polynomial chaos representations for stochastic processes. SIAM J. Sci.
Comput. 26(2), 698719 (2004)
14. Desceliers, C., Ghanem, R., Soize, C.: Maximum likelihood estimation of stochastic chaos
representations from experimental data. Int. J. Numer. Methods Eng. 66(6), 9781001 (2006)
15. Desceliers, C., Soize, C., Ghanem, R.: Identification of chaos representations of elastic
properties of random media using experimental vibration tests. Comput. Mech. 39(6), 831
838 (2007)
16. Desceliers, C., Soize, C., Naili, S., Haiat, G.: Probabilistic model of the human cortical bone
with mechanical alterations in ultrasonic range. Mech. Syst. Signal Process. 32, 170177
(2012)
17. Doob, J.L.: Stochastic Processes. Wiley, New York (1990)
18. Doostan, A., Ghanem, R., Red-Horse, J.: Stochastic model reduction for chaos representations.
Comput. Methods Appl. Mech. Eng. 196(3740), 39513966 (2007)
19. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality
constraints. SIAM J. Matrix Anal. Appl. 20(2), 303353 (1998)
20. Ernst, O.G., Mugler, A., Starkloff, H.J., Ullmann, E.: On the convergence of generalized
polynomial chaos expansions. ESAIM Math. Model. Numer. Anal. 46(2), 317339 (2012)
26 Random Vectors and Random Fields in High Dimension . . . 933
21. Ghanem, R., Dham, S.: Stochastic finite element analysis for multiphase flow in heterogeneous porous media. Transp. Porous Media 32, 239–262 (1998)
22. Ghanem, R., Doostan, A.: Characterization of stochastic system parameters from experimental data: a Bayesian inference approach. J. Comput. Phys. 217(1), 63–81 (2006)
23. Ghanem, R., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991). See also the revised edition, Dover Publications, New York (2003)
24. Ghanem, R., Doostan, A., Red-Horse, J.: A probabilistic construction of model validation. Comput. Methods Appl. Mech. Eng. 197(29–32), 2585–2595 (2008)
25. Ghosh, D., Ghanem, R.: Stochastic convergence acceleration through basis enrichment of polynomial chaos expansions. Int. J. Numer. Methods Eng. 73(2), 162–184 (2008)
26. Guilleminot, J., Soize, C.: Non-Gaussian positive-definite matrix-valued random fields with constrained eigenvalues: application to random elasticity tensors with uncertain material symmetries. Int. J. Numer. Methods Eng. 88(11), 1128–1151 (2011)
27. Guilleminot, J., Soize, C.: Probabilistic modeling of apparent tensors in elastostatics: a MaxEnt approach under material symmetry and stochastic boundedness constraints. Probab. Eng. Mech. 28, 118–124 (2012)
28. Guilleminot, J., Soize, C.: Stochastic model and generator for random fields with symmetry properties: application to the mesoscopic modeling of elastic random media. Multiscale Model. Simul. (SIAM Interdiscip. J.) 11(3), 840–870 (2013)
29. Guilleminot, J., Soize, C.: Itô SDE-based generator for a class of non-Gaussian vector-valued random fields in uncertainty quantification. SIAM J. Sci. Comput. 36(6), A2763–A2786 (2014)
30. Guilleminot, J., Soize, C., Kondo, D., Binetruy, C.: Theoretical framework and experimental procedure for modelling volume fraction stochastic fluctuations in fiber reinforced composites. Int. J. Solids Struct. 45(21), 5567–5583 (2008)
31. Guilleminot, J., Soize, C., Kondo, D.: Mesoscale probabilistic models for the elasticity tensor of fiber reinforced composites: experimental identification and numerical aspects. Mech. Mater. 41(12), 1309–1322 (2009)
32. Guilleminot, J., Noshadravan, A., Soize, C., Ghanem, R.G.: A probabilistic model for bounded elasticity tensor random fields with application to polycrystalline microstructures. Comput. Methods Appl. Mech. Eng. 200, 1637–1648 (2011)
33. Guilleminot, J., Soize, C., Ghanem, R.: Stochastic representation for anisotropic permeability tensor random fields. Int. J. Numer. Anal. Methods Geomech. 36(13), 1592–1608 (2012)
34. Guilleminot, J., Le, T.T., Soize, C.: Stochastic framework for modeling the linear apparent behavior of complex materials: application to random porous materials with interphases. Acta Mech. Sinica 29(6), 773–782 (2013)
35. Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration. Structure-Preserving Algorithms for Ordinary Differential Equations. Springer, Heidelberg (2002)
36. Isakov, V.: Inverse Problems for Partial Differential Equations. Springer, New York (2006)
37. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620–630; 108(2), 171–190 (1957)
38. Kaipio, J., Somersalo, E.: Statistical and Computational Inverse Problems. Springer, New York (2005)
39. Khasminskii, R.: Stochastic Stability of Differential Equations, 2nd edn. Springer, Heidelberg (2012)
40. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Heidelberg (1992)
41. Krée, P., Soize, C.: Mathematics of Random Phenomena. Reidel, Dordrecht (1986)
42. Le Maître, O.P., Knio, O.M.: Spectral Methods for Uncertainty Quantification with Applications to Computational Fluid Dynamics. Springer, Heidelberg (2010)
43. Le Maître, O.P., Knio, O.M., Najm, H.N.: Uncertainty propagation using Wiener–Haar expansions. J. Comput. Phys. 197(1), 28–57 (2004)
44. Lucor, D., Su, C.H., Karniadakis, G.E.: Generalized polynomial chaos and random oscillators. Int. J. Numer. Methods Eng. 60(3), 571–596 (2004)
934 C. Soize
45. Marzouk, Y.M., Najm, H.N.: Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems. J. Comput. Phys. 228(6), 1862–1902 (2009)
46. Najm, H.N.: Uncertainty quantification and polynomial chaos techniques in computational fluid dynamics. Annu. Rev. Fluid Mech. 41, 35–52 (2009)
47. Nouy, A.: Proper generalized decomposition and separated representations for the numerical solution of high dimensional stochastic problems. Arch. Comput. Methods Eng. 16(3), 403–434 (2010)
48. Nouy, A., Soize, C.: Random fields representations for stochastic elliptic boundary value problems and statistical inverse problems. Eur. J. Appl. Math. 25(3), 339–373 (2014)
49. Perrin, G., Soize, C., Duhamel, D., Funfschilling, C.: Identification of polynomial chaos representations in high dimension from a set of realizations. SIAM J. Sci. Comput. 34(6), A2917–A2945 (2012)
50. Perrin, G., Soize, C., Duhamel, D., Funfschilling, C.: Karhunen–Loève expansion revisited for vector-valued random fields: scaling, errors and optimal basis. J. Comput. Phys. 242(1), 607–622 (2013)
51. Perrin, G., Soize, C., Duhamel, D., Funfschilling, C.: A posteriori error and optimal reduced basis for stochastic processes defined by a set of realizations. SIAM/ASA J. Uncertain. Quantif. 2, 745–762 (2014)
52. Puig, B., Poirion, F., Soize, C.: Non-Gaussian simulation using Hermite polynomial expansion: convergences and algorithms. Probab. Eng. Mech. 17(3), 253–264 (2002)
53. Rozanov, Y.A.: Random Fields and Stochastic Partial Differential Equations. Kluwer Academic, Dordrecht (1998)
54. Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (1980)
55. Soize, C.: The Fokker-Planck Equation for Stochastic Dynamical Systems and Its Explicit Steady State Solutions. World Scientific, Singapore (1994)
56. Soize, C.: Random-field model for the elasticity tensor of anisotropic random media. Comptes Rendus Mécanique 332, 1007–1012 (2004)
57. Soize, C., Ghanem, R.: Physical systems with random uncertainties: chaos representation with arbitrary probability measure. SIAM J. Sci. Comput. 26(2), 395–410 (2004)
58. Soize, C.: Non-Gaussian positive-definite matrix-valued random fields for elliptic stochastic partial differential operators. Comput. Methods Appl. Mech. Eng. 195(1–3), 26–64 (2006)
59. Soize, C.: Construction of probability distributions in high dimension using the maximum entropy principle. Applications to stochastic processes, random fields and random matrices. Int. J. Numer. Methods Eng. 76(10), 1583–1611 (2008)
60. Soize, C.: Tensor-valued random fields for meso-scale stochastic model of anisotropic elastic microstructure and probabilistic analysis of representative volume element size. Probab. Eng. Mech. 23(2–3), 307–323 (2008)
61. Soize, C.: Identification of high-dimension polynomial chaos expansions with random coefficients for non-Gaussian tensor-valued random fields using partial and limited experimental data. Comput. Methods Appl. Mech. Eng. 199(33–36), 2150–2164 (2010)
62. Soize, C.: A computational inverse method for identification of non-Gaussian random fields using the Bayesian approach in very high dimension. Comput. Methods Appl. Mech. Eng. 200(45–46), 3083–3099 (2011)
63. Soize, C.: Stochastic Models of Uncertainties in Computational Mechanics. American Society of Civil Engineers (ASCE), Reston (2012)
64. Soize, C.: Polynomial chaos expansion of a multimodal random vector. SIAM/ASA J. Uncertain. Quantif. 3(1), 34–60 (2015)
65. Soize, C., Desceliers, C.: Computational aspects for constructing realizations of polynomial chaos in high dimension. SIAM J. Sci. Comput. 32(5), 2820–2831 (2010)
66. Soize, C., Ghanem, R.: Reduced chaos decomposition with random coefficients of vector-valued random variables and random fields. Comput. Methods Appl. Mech. Eng. 198(21–26), 1926–1934 (2009)
67. Spall, J.C.: Introduction to Stochastic Search and Optimization. Wiley, Hoboken (2003)
68. Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numer. 19, 451–559 (2010)
69. Ta, Q.A., Clouteau, D., Cottereau, R.: Modeling of random anisotropic elastic media and impact on wave propagation. Eur. J. Comput. Mech. 19(1–3), 241–253 (2010)
70. Talay, D.: Simulation and numerical analysis of stochastic differential systems. In: Krée, P., Wedig, W. (eds.) Probabilistic Methods in Applied Physics. Lecture Notes in Physics, vol. 451, pp. 54–96. Springer, Heidelberg (1995)
71. Talay, D.: Stochastic Hamiltonian system: exponential convergence to the invariant measure and discretization by the implicit Euler scheme. Markov Process. Relat. Fields 8, 163–198 (2002)
72. Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, Philadelphia (2005)
73. Tipireddy, R., Ghanem, R.: Basis adaptation in homogeneous chaos spaces. J. Comput. Phys. 259, 304–317 (2014)
74. Vanmarcke, E.: Random Fields, Analysis and Synthesis, Revised and Expanded New edn. World Scientific, Singapore (2010)
75. Walpole, L.J.: Elastic behavior of composite materials: theoretical foundations. Adv. Appl. Mech. 21, 169–242 (1981)
76. Walter, E., Pronzato, L.: Identification of Parametric Models from Experimental Data. Springer, Berlin (1997)
77. Wan, X.L., Karniadakis, G.E.: Multi-element generalized polynomial chaos for arbitrary probability measures. SIAM J. Sci. Comput. 28(3), 901–928 (2006)
78. Xiu, D.B., Karniadakis, G.E.: Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)
79. Zienkiewicz, O.C., Taylor, R.L.: The Finite Element Method for Solid and Structural Mechanics, 6th edn. Elsevier/Butterworth-Heinemann, Amsterdam (2005)
Model Order Reduction Methods in
Computational Uncertainty Quantification 27
Peng Chen and Christoph Schwab
Abstract
This work surveys formulation and algorithms for model order reduction (MOR
for short) techniques in accelerating computational forward and inverse UQ.
Operator equations (comprising elliptic and parabolic partial differential equa-
tions (PDEs for short) and boundary integral equations (BIEs for short)) with
distributed uncertain input, being an element of an infinite-dimensional, separable
Banach space 𝒳, are admitted. Using an unconditional basis of 𝒳,
computational UQ for these equations is reduced to numerical solution of
countably parametric operator equations with smooth parameter dependence.
In computational forward UQ, efficiency of MOR is based on recent sparsity
results for countably parametric solutions which imply upper bounds on
Kolmogorov N-widths of the manifold of (countably) parametric solutions and
quantities of interest (QoI for short) with dimension-independent convergence
rates. Subspace sequences which realize the N -width convergence rates are
obtained by greedy search algorithms in the solution manifold. Heuristic search
strategies in parameter space based on finite searches over anisotropic sparse
grids render greedy searches in reduced basis construction feasible. Instances
of the parametric forward problems which arise in the greedy searches are
assumed to be discretized by abstract classes of Petrov–Galerkin (PG for short)
discretizations of the parametric operator equation, covering most conforming
primal, dual, and mixed finite element methods (FEMs), as well as certain space-
time Galerkin schemes for the application problem of interest. Based on the
PG discretization, MOR for both linear and nonlinear, and affine and nonaffine,
parametric problems is presented.
P. Chen ()
Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin,
TX, USA
e-mail: peng@ices.utexas.edu
C. Schwab
Departement Mathematik, Seminar for Applied Mathematics, ETH Zurich, Switzerland
e-mail: christoph.schwab@sam.math.ethz.ch
Keywords
Uncertainty quantification · Sparse grid · Reduced basis · Empirical interpolation · Greedy algorithm · High fidelity · Petrov–Galerkin · A posteriori error estimate · A priori error estimate · Bayesian inversion
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 938
2 Forward UQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 942
2.1 A Class of Forward Problems with Uncertain Input Data . . . . . . . . . . . . . . . . . . . . . 942
2.2 Uncertainty Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 946
2.3 Parameter Sparsity in Forward UQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951
3 Model Order Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 955
3.1 High-Fidelity Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
3.2 Reduced Basis Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 958
3.3 Reduced Basis Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 960
3.4 Linear and Affine-Parametric Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 963
3.5 Nonlinear and Nonaffine-Parametric Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 968
3.6 Sparse Grid RB Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 975
4 Inverse UQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976
4.1 Bayesian Inverse Problems for Parametric Operator Equations . . . . . . . . . . . . . . . . 976
4.2 Parametric Bayesian Posterior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 978
4.3 Well-Posedness and Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 980
4.4 Reduced Basis Acceleration of MCMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 982
4.5 Dimension and Order Adaptive, Deterministic Quadrature . . . . . . . . . . . . . . . . . . . . 983
4.6 Quasi-Monte Carlo Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984
5 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984
6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 985
7 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 987
1 Introduction
(i) Modeling error: the mathematical model under consideration does not
correctly describe the physical phenomena of interest: its exact solution does
not predict the QoI properly. (ii) Discretization error: the discretization of the
mathematical model (the substitution of the continuous mathematical model by
a discrete, finite-dimensional approximation which, in principle, is solvable to
any prescribed numerical accuracy on computers in floating point arithmetic), whose
computational realization is used in forward UQ, introduces a mismatch between
the true response (understood as exact solution of the mathematical model) and
the computed response obtained from the computational model, for a given (set
of) input data, and for a prescribed quantity of interest (QoI). Discretization errors
comprise the replacement of mathematical continuum models by finite difference, finite
volume, or finite element models, the inexact solution of the finite-dimensional
problems which result from discretization, the replacement of continuous-time models by
discrete timestepping, and of random variables by Monte Carlo samples and their realization by
random number generators. The classical paradigm of numerical analysis requires
discretizations to be stable and consistent. Particular issues for discretizations in the
context of UQ are uniform stability and consistency, with respect to all instances
of the uncertain input u. (iii) Computational error: the discretized model for the
numerical computation of the QoI which is obtained from a mathematical model and
its subsequent discretization is not numerically solvable for the given input data with
the computer resources at hand. This could be due, for example, to CPU limitations
and imprecise floating point arithmetic (rounding), but also to runtime failures of
hardware components, partial loss of data at runtime, etc.
Under the (strong) assumptions that modeling error (i) and computational error
(iii) are negligible, which is to say that epistemic uncertainty is absent, i.e., the
mathematical model under consideration is well posed and accounts in principle
for all phenomena of interest, and the assumption that floating point computations are
reliable, a first key task in computational forward UQ is the efficient computational
prediction of the QoI of a mathematical model for any given instance of uncertain
input u from a space 𝒳 of admissible input data. Computational challenges arise
when the space 𝒳 of uncertain inputs is infinite-dimensional. For example, for
distributed uncertain input (such as material properties in inhomogeneous solids
or fluids, shapes obtained from noisy imaging, random forcing, etc.), 𝒳 is a (subset
of a) function space. Computational UQ then involves uncertainty parametrization
via a basis of 𝒳 (such as, for example, a Fourier basis in 𝒳 = L²), resulting
in a parametric forward problem depending on sequences y = (y_j)_{j≥1} of
uncertain input parameters. Probability measures on 𝒳 which encode information
on aleatoric uncertainty can be introduced via countable products of probability
measures on sequence space. Computational UQ procedures access the parametric
response or forward solutions nonintrusively, i.e., by numerical solution of the
forward model for instances of the parameter sequence y. Due to the possibly
infinite dimension of space of uncertain inputs y and due to the possibly high
cost of each forward solve, two key issues in computational UQ are (a) efficient
computational sampling of uncertain inputs from high (possibly infinite) dimen-
sional parameter spaces and (b) efficient numerical solution of parametric forward
models.
940 P. Chen C. Schwab
Two key methodologies are reviewed, which address issue (a) and issue (b):
sparsity of the parametric response map and model order reduction (MOR). Sparsity
refers to the possibility to represent responses of the parametric forward model
with user-specified accuracy 0 < ε ≪ 1 using responses at O(ε^{-1/s}) parameter
instances, where the sparsity parameter s > 0 and the constant in O(·) are
independent of the dimension of the parameter space. MOR refers to replacing the parametric response
from the mathematical model for all admissible parameters by one parsimonious,
the so-called reduced order model, or surrogate model, which allows fast
response evaluation with certified accuracy ε for any parameter instance y. The
accuracy which can be achieved by MOR with n basis functions, for a given set of
solutions, is determined by the so-called n-width of the solution set in the space of
all possible solutions.
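The greedy basis construction alluded to here can be sketched in a few lines. The following is a minimal Python illustration of a strong greedy selection over a precomputed snapshot matrix (practical reduced basis methods instead use the residual-based weak greedy discussed in Sect. 3; all names and sizes below are illustrative assumptions):

```python
import numpy as np

def greedy_basis(snapshots, n):
    """Select n orthonormal reduced-basis vectors from a snapshot matrix.

    snapshots: (dim, K) array whose columns sample the solution manifold
    at K training parameters.  At each step, the snapshot with the largest
    orthogonal-projection error onto the current basis is added.
    """
    V = np.zeros((snapshots.shape[0], 0))
    for _ in range(n):
        # projection errors of all snapshots onto span(V)
        residual = snapshots - V @ (V.T @ snapshots)
        k = np.argmax(np.linalg.norm(residual, axis=0))
        v = residual[:, k]
        V = np.hstack([V, (v / np.linalg.norm(v))[:, None]])
    return V
```

Each added vector is the snapshot currently worst approximated by the basis, so the projection error decreases monotonically; the n-width of the snapshot set bounds the best error any n-dimensional subspace can achieve.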
Recent results on sparsity of parametric forward models are reviewed, covering
in particular uncertain coefficients, loadings and domains of definition in partial
differential equation models which arise in engineering and in the sciences. Sparsity-adapted
interpolation schemes in parameter space allow one to build polynomial chaos
surrogates of the parametric forward response maps. The corresponding point sets
in parameter space are (generalized) sparse grids; they are based on dimension- and
order-adaptive interpolation processes described here. As with Monte Carlo sampling,
the performance of collocation in these deterministic point sets is independent of
the dimension of the parameter space, and their convergence rates can exceed 1/2,
provided the parametric response exhibits sufficient sparsity.
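The work counts implied by these rates can be made concrete with a toy calculation; in the sketch below, all constants are suppressed (set to 1) purely for illustration:

```python
import math

def instances_needed(eps, s):
    """Forward solves to reach accuracy eps under convergence rate s:
    an error O(N**-s) implies N = O(eps**(-1/s)) parameter instances.
    Monte Carlo sampling corresponds to the rate s = 1/2."""
    return math.ceil(eps ** (-1.0 / s))

eps = 2.0 ** -10                      # roughly 1e-3
work_sparse = instances_needed(eps, 1.0)   # sparse rate s = 1
work_mc = instances_needed(eps, 0.5)       # Monte Carlo rate 1/2
```

With these illustrative constants, a rate s = 1 needs on the order of a thousand solves where the Monte Carlo rate 1/2 needs on the order of a million, which is the motivation for exploiting sparsity.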
Sampling of high-dimensional parameter spaces on generalized sparse grids can
be used in so-called stochastic collocation for building polynomial surrogates of
the parametric forward maps, in the training phase of MOR and for deterministic
Smolyak quadrature over all uncertainties to evaluate response statistics in forward
UQ and Bayesian estimates in inverse UQ, for example.
The chapter's focus is therefore on model order reduction and sparse, adaptive
collocation methods in high-dimensional parameter spaces in computational for-
ward and inverse UQ. All concepts are developed on abstract classes of forward
problems with parametric distributed uncertain input data, which is to say that
the uncertain data may take values in an infinite-dimensional function space.
Mathematical forward models which are considered here are specified in terms of
(linear or nonlinear) PDEs, of elliptic or parabolic type, with smooth nonlinearities.
Key steps in the approach are:
2 Forward UQ
Here the spaces for the system response q are X = Y = H₀¹(D), the space for the
uncertain input is 𝒳 = L^∞(D), and Y′ = (H₀¹(D))′ = H⁻¹(D). Note that (27.1)
is to hold in the weak or variational sense of Y′ = H⁻¹(D); this gives rise to the
variational form of the residual equation: find q ∈ X such that

0 = _Y⟨v, R(q; u)⟩_{Y′} = ∫_D ∇v(x) · u(x)∇q(x) dx − ∫_D v(x) f(x) dx   (27.2)

for all v ∈ Y = H₀¹(D).
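To make the weak form (27.2) concrete, here is a minimal one-dimensional finite-difference sketch of −div(u∇q) = f with homogeneous Dirichlet conditions; the domain D = (0, 1), the constant data u ≡ 1 and f ≡ 1, and the grid size are illustrative assumptions, not taken from the text:

```python
import numpy as np

def solve_diffusion(u_faces, f_interior, h):
    """Solve -(u q')' = f on (0,1) with q(0) = q(1) = 0 by centered
    finite differences.  u_faces holds the coefficient at the n+1 cell
    interfaces x_{i+1/2}; f_interior holds the load at the n interior nodes."""
    n = f_interior.size
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = (u_faces[i] + u_faces[i + 1]) / h**2
        if i > 0:
            A[i, i - 1] = -u_faces[i] / h**2
        if i < n - 1:
            A[i, i + 1] = -u_faces[i + 1] / h**2
    return np.linalg.solve(A, f_interior)

n = 199                               # interior nodes, h = 1/200
h = 1.0 / (n + 1)
q = solve_diffusion(np.ones(n + 1), np.ones(n), h)
# for u = 1, f = 1, the exact solution is q(x) = x(1 - x)/2
```

Replacing `np.ones(n + 1)` by samples of a parametric coefficient u(x, y) turns this into one instance of the parametric forward solve discussed below.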
find q(u) ∈ X such that the residual equation (27.4) is well posed, i.e., for every fixed
u ∈ 𝒳̃ ⊂ 𝒳, and for every F(u) ∈ Y′, there exists a unique solution q(u) of (27.4)
which depends continuously on u.
Solutions in the regular branch (27.6) are called nonsingular if, in addition, the
differential
Proposition 1. Assume that Y is reflexive and that, for some nominal value ⟨u⟩ ∈ 𝒳
of the uncertain input data, the operator equation (27.5) admits a regular branch of
solutions (27.6). Then the differential D_q R at (⟨u⟩, q₀), given by the bilinear map
and
The inf-sup conditions (27.8) and (27.9) are implied, for linear, self-adjoint PDEs
such as the diffusion equation in Example 1, by the more familiar concept of
coercivity.
Example 3. In the context of Example 1, one verifies with Y = X = H₀¹(D) that

_{Y′}⟨(D_q R)(q₀; u)φ, ψ⟩_Y = ∫_D ∇ψ(x) · u(x)∇φ(x) dx.
In particular, for linear operator equations, the residual R(q; u) is linear with respect
to q, and the differential (D_q R)(q₀; u) in (27.8) does not depend on q₀. Note that
uniform validity of (27.8) for all realizations of the uncertain input u implies a
constraint on the set 𝒳̃ ⊂ 𝒳 of admissible data: one may choose, for example,
𝒳 = L^∞(D) and require u to take values on
Then (27.8) (and thus also (27.7)) are implied by the coercivity of (D_q R)(q₀; u) in
X = H₀¹(D):

_{Y′}⟨(D_q R)(q₀; u)φ, φ⟩_Y = ∫_D ∇φ(x) · u(x)∇φ(x) dx ≥ c₀ ∫_D |∇φ(x)|² dx
  ≥ c₀ (1 + C_P)⁻¹ ‖φ‖²_X,

where C_P(D) > 0 denotes the constant in the Poincaré inequality ‖φ‖²_{L²(D)} ≤
C_P ‖∇φ‖²_{L²(D)}, valid in X uniformly for all inputs u ∈ 𝒳̃. The saddle point stability
conditions (27.8) and the possibility of different trial and test function spaces are
not necessary here.
Remark 1. The saddle point stability conditions (27.8) are, however, indispensable
for indefinite variational problems such as the space-time formulation of the
parabolic evolution problem (27.3). For (27.3), the inf-sup conditions (27.8) have
been verified in [70, Appendix]. Consider, for further illustration, the Helmholtz
equation for the propagation of a time-harmonic pressure amplitude Φ(x, t) in an
uncertain, linearly elastic medium, in a bounded, certain domain D ⊂ ℝ^d, with
homogeneous Dirichlet boundary conditions on ∂D.
Separation of variables Φ(x, t) = exp(iωt) q(x) implies the Helmholtz equation.
One chooses again X = Y = H₀¹(D) and u ∈ 𝒳̃ as in Example 3. Then, for large
frequency ω > 0, the saddle point stability conditions (27.8) hold only if
ω² ∉ ⋃_{u∈𝒳̃} σ(A(·; u)), where σ(A(·; u)) ⊂ ℝ_{>0} denotes the (discrete and
countable) spectrum of A(·; u).
Under conditions (27.8) and (27.9), for every u ∈ 𝒳̃ ⊂ 𝒳, there exists a unique,
regular solution q(u) of (27.5) which is uniformly bounded with respect to u ∈ 𝒳̃,
in the sense that there exists a constant C(F, 𝒳̃) > 0 such that

sup_{u∈𝒳̃} ‖q(u)‖_X ≤ C(F, 𝒳̃).   (27.11)
For (27.8), (27.9), (27.10), and (27.11) being valid, we shall say that the set
{(q(u), u) : u ∈ 𝒳̃} ⊂ X × 𝒳̃ forms a regular branch of nonsingular solutions.
If the data-to-solution map 𝒳̃ ∋ u ↦ q(u) is also Fréchet differentiable with
respect to u at every point of the regular branch {(q(u); u) : u ∈ 𝒳̃} ⊂ X × 𝒳̃,
then the dependence of the forward map, i.e., the mapping relating u to q(u) within
the branch of nonsingular solutions, is locally Lipschitz on 𝒳̃: there exists a
Lipschitz constant L(F, 𝒳̃) such that

∀ u, v ∈ 𝒳̃ : ‖q(u) − q(v)‖_X ≤ L(F, 𝒳̃) ‖u − v‖_𝒳.   (27.12)

This follows from the identity (D_u q)(u) = −(D_q R)⁻¹(D_u R), from the
isomorphism property (D_q R)(q₀; ⟨u⟩) ∈ L_iso(X, Y′) which is implied by (27.8)
and (27.9), and from the continuity of the differential D_q R on the regular branch.
In what follows, the abstract setting (27.4) is considered with a uniformly continuously
differentiable mapping R(q; u) in a product of neighborhoods B_𝒳(⟨u⟩; R) ×
B_X(q(⟨u⟩); R) ⊂ 𝒳 × X of sufficiently small radius R > 0, satisfying the structural
assumption (27.5). In Proposition 1 and throughout what follows, q(⟨u⟩) ∈ X
denotes the unique regular solution of (27.5) at the nominal input ⟨u⟩ ∈ 𝒳.
where we emphasize that, for every j ≥ 1, each coordinate function ψ_j(y) may
depend on all coordinates y_j of y. A typical example of a separable uncertainty
parametrization is the so-called thermal fin problem (cp. [59])

u(x, y) = ∑_{j=1}^{J} χ_{D_j}(x) 10^{y_j},   (27.15)

where D̄ := ⋃_{j=1}^{J} D̄_j is decomposed into J nonoverlapping subdomains D_j,
j = 1, …, J; χ_{D_j} is the characteristic function of D_j, i.e., χ_{D_j}(x) = 1 if x ∈ D_j
and 0 otherwise; and y_j ∈ [−1, 1], j = 1, …, J.
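Formula (27.15) is straightforward to evaluate pointwise; the sketch below uses a hypothetical one-dimensional stand-in for the subdomain partition (the actual fin geometry of [59] is multidimensional), with D = [0, 1) split into J equal cells:

```python
def u_fin(x, y):
    """Piecewise-constant coefficient (27.15) on an illustrative 1-D
    partition: u(x, y) = sum_j chi_{D_j}(x) * 10**y_j, with the (assumed)
    subdomains D_j = [(j-1)/J, j/J), j = 1, ..., J, and y_j in [-1, 1]."""
    J = len(y)
    j = min(int(x * J), J - 1)   # index of the subdomain containing x
    return 10.0 ** y[j]

y = [-1.0, 0.0, 1.0]             # one parameter per subdomain
# u_fin(0.1, y) == 0.1, u_fin(0.5, y) == 1.0, u_fin(0.9, y) == 10.0
```

Each y_j ∈ [−1, 1] thus scales the conductivity of its subdomain over two orders of magnitude, which is what makes the parametrization affine in 10^{y_j} but nonlinear in y_j.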
(c) Nonlinear transformation of an affine parametrization. This case occurs, for
example, in log Gaussian models with a positivity constraint, such as a linear
diffusion equation with a log Gaussian permeability; for illustration, consider
the Dirichlet problem:
Bases in the uncertain input space 𝒳 are, in general, not unique, even when
fixing the scaling of the coordinates y_j in (27.13). Being bases of 𝒳, they are
mathematically equivalent. In the context of computational UQ, the concrete choice
of basis can have a significant impact on the numerical stability of UQ algorithms.
For illustration we mention the (textbook) example 𝒳 = L²(−1, 1), for which two
bases are given by Ψ₁ = {1, x, x², …} and Ψ₂ = {P_j(x) : j = 0, 1, 2, …},
with P_j denoting the classical Legendre polynomial of degree j ≥ 0, with the
normalization P_j(1) = 1. Both bases, Ψ₁ and Ψ₂, as well as the trigonometric
functions constituting a Karhunen–Loève basis of 𝒳 = L²(−1, 1), are global,
meaning that their elements are supported in the entire domain [−1, 1]. Alternative
bases of 𝒳 = L²(−1, 1) with local supports are spline wavelet bases, such as the
Haar wavelet basis. Most localized bases have an intrinsic limit on the approximation
order which can be reached for sufficiently smooth uncertain input data. Uncertainty
parametrization with localized bases can, however, substantially increase sparsity in
the parametric forward map.
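The numerical difference between the two bases can be checked in a few lines; the snippet below uses numpy's Legendre class, which follows the classical normalization P_j(1) = 1, and verifies the orthogonality that the monomial basis lacks:

```python
import numpy as np
from numpy.polynomial import legendre

# Normalization: numpy's Legendre polynomials satisfy P_j(1) = 1.
P5 = legendre.Legendre.basis(5)
assert abs(P5(1.0) - 1.0) < 1e-12

# Orthogonality: int_{-1}^{1} P_i(x) P_j(x) dx = 0 for i != j,
# checked here with 10-point Gauss-Legendre quadrature (exact to degree 19).
x, w = legendre.leggauss(10)
P2, P3 = legendre.Legendre.basis(2), legendre.Legendre.basis(3)
assert abs(np.sum(w * P2(x) * P3(x))) < 1e-12
```

Orthogonality is what makes Ψ₂ numerically well conditioned: expansion coefficients decouple, whereas Gram matrices of the monomial basis Ψ₁ become ill conditioned rapidly with the degree.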
Norm convergence of the series (27.13) in 𝒳 is implied by the summability
condition

∑_{j≥1} ‖ψ_j‖_𝒳 < ∞,   (27.18)
Uncertain input data can also signify domain uncertainty, i.e., the shape of the
physical domain D in which the boundary value problem is considered is uncertain.
Upon suitable domain parametrization, such problems also are covered by the
ensuing parametric, variational formulation; domain uncertainty in the physical
domain can be reduced by domain mapping to a parametric problem in a fixed,
nominal domain D₀. This was considered for (27.2), with a parametric coefficient u
depending on the shape of the domain, in [28] for a particular example, and for the
following smooth and nonlinear elliptic problem from [23]; see also [52] for
applications to artery variability in hemodynamics, and [45] for applications of
reduced basis techniques in electromagnetic scattering.
where the random domain D_{u(y)} is homeomorphic to the unit disc and is explicitly
given by the expansion (27.13) with

u(0) = ⟨u⟩ = 1 and ψ_j(θ) = (0.5 / j^γ) sin(jθ), j ≥ 1, where γ > 2.   (27.21)

By T_u we denote a transformation map from the nominal domain D_⟨u⟩, the unit disk
of ℝ² centered at the origin, to the parametric domain D_u: T_u(r cos θ, r sin θ) :=
(u(y) r cos θ, u(y) r sin θ). The nonlinear operator equation (27.19) in the parametric,
uncertain physical domain becomes a parametric, nonlinear equation in the
fixed nominal domain, which reads as: given a parameter sequence y ∈ U, find a
parametric response q(y) : D_⟨u⟩ → ℝ such that

−div(M(y)∇q(y)) + q³(y) d(y) = f d(y)  in D_⟨u⟩,
q(y) = 0  on ∂D_⟨u⟩,   (27.22)
where d(y) denotes the determinant of the Jacobian dT_u of the map T_u, given as
d(y) = (u(y))², and

M(y) := d(y) dT_u⁻¹ dT_u⁻ᵀ = [ 1 + (b(y))²   −b(y)
                               −b(y)          1 ],  where b(y) := ∂_θ u(y) / u(y).   (27.23)
This example fits into the abstract setting of Sect. 2.1 as follows: the uncertain datum
u = u(y; ·) ∈ 𝒳_t = C^t_per([0, 2π)), where the degree of smoothness t = t(γ) depends
on the exponent γ > 2 in (27.21). The spaces X and Y then are function spaces on
the nominal domain, chosen as X = Y = H₀¹(D_⟨u⟩).
See (27.21) in Example 4, where the exponent γ > 2 determines the (Hölder)
smoothness of the domain transformation, and where the spaces 𝒳_t correspond to
Hölder spaces of 2π-periodic functions.
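Under these assumptions, the metric terms in (27.23) can be evaluated pointwise. The sketch below (truncation length, parameter values, and the concrete exponent γ = 3 are illustrative choices) assembles d(y) and M(y) for the radius expansion (27.21):

```python
import numpy as np

def metric_terms(y, theta, gamma=3.0):
    """Evaluate d(y) and M(y) of (27.23) at angle theta for the radius
    u(y)(theta) = 1 + sum_j (0.5 / j**gamma) y_j sin(j theta), cf. (27.21),
    truncated to len(y) terms; gamma > 2 is an illustrative choice."""
    j = np.arange(1, len(y) + 1)
    u = 1.0 + np.sum(0.5 / j**gamma * y * np.sin(j * theta))
    du = np.sum(0.5 / j**gamma * y * j * np.cos(j * theta))  # du/dtheta
    b = du / u
    d = u**2                                  # determinant of the Jacobian dT_u
    M = np.array([[1.0 + b**2, -b], [-b, 1.0]])
    return d, M
```

A quick consistency check: the explicit 2 × 2 form of M(y) has determinant (1 + b²)·1 − b² = 1, matching det M = d² det(dT_u)⁻² = 1.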
Once an unconditional basis {ψ_j}_{j≥1} of 𝒳 has been chosen, every realization u ∈
𝒳 can be identified in a one-to-one fashion with the pair (⟨u⟩, y), where ⟨u⟩ denotes
the nominal instance of the uncertain datum u and y is the coordinate vector in the
representation (27.13). Inserting (27.13) into (27.4), we obtain under Assumption 1
the equivalent, countably parametric form: given F : U → Y′,
Remark 2. In what follows, by a slight abuse of notation, one identifies the subset
U in (27.25) with the infinite-dimensional parameter domain U ⊂ ℝ^ℕ without
explicitly writing so. The operator A(q; u) in (27.5) then becomes, via the parametric
dependence u = u(y), a parametric operator family A(q; u(y)), which one denotes
(with slight abuse of notation) by {A(q; y) : y ∈ U}, with the parameter set
U = [−1, 1]^ℕ (again, one uses in what follows this definition in place of the set U
as defined in (27.25)). If A(q; y) in (27.5) is linear, one has A(q; y) = A(y)q with
A(y) ∈ L(X, Y′). One does not assume,
however, that the maps q ↦ A(q; y) are linear in what follows, unless explicitly
stated.
With this understanding, and under the assumptions (27.11) and (27.12), the
operator equation (27.5) will admit, for every y ∈ U, a unique solution q(y; F)
which is, due to (27.11) and (27.12), uniformly bounded and depends Lipschitz
continuously on the parameter sequence y ∈ U: there holds

and, if the local Lipschitz condition (27.12) holds, there exists a Lipschitz constant
L > 0 such that

The Lipschitz constant in (27.28) is not, in general, equal to L(F, 𝒳̃) in (27.12): it
depends on the nominal instance ⟨u⟩ ∈ 𝒳 and on the choice of basis {ψ_j}_{j≥1}.
Unless explicitly stated otherwise, throughout what follows, we identify q₀ =
q(0; F) ∈ X in Proposition 1 with the solution of (27.4) at the nominal input
⟨u⟩ ∈ 𝒳.
In forward UQ for problems with distributed, uncertain input data, upon uncertainty
parametrizations such as (27.13), the solution of the forward problem becomes,
as a function of the parameters y_j in the sequence y, a countably parametric map
from the parameter space U to the solution state space X. Efficient computational
UQ for such problems is crucially related to the approximation of such countably
parametric maps with convergence rates which are independent of the dimension,
i.e., independent of the number of coordinates which are active in the approximation.
Mathematical results are reviewed which allow to establish such approximation
results, with a key insight being that the attainable convergence rate is independent
of the dimension of the space of active parameters, and depends only on the
sparsity of the parametric map which is to be approximated. One technique to
verify parametric sparsity for a broad class of parametric problems is to verify
the existence of suitable holomorphic extensions of the parameter to solution map
into the complex domain. The existence of such extensions is closely related
to polynomial approximations of the parametric maps. The version presented
here applies the holomorphic extension of countably parametric maps for which
polynomial approximations take the form of so-called generalized polynomial chaos
expansions which will be described in detail below. The results presented in this
section are detailed in [28, 29] and the references there.
952 P. Chen C. Schwab
2.3.1 (b,p)-Holomorphy
For $s > 1$, introduce the Bernstein ellipse in the complex plane

$$\mathcal{E}_s := \left\{ \frac{w + w^{-1}}{2} : 1 \le |w| \le s \right\} \subset \mathbb{C}, \qquad (27.29)$$

which has semi-axes of length $\frac{s+s^{-1}}{2} > 1$ and $\frac{s-s^{-1}}{2} > 0$, and denote the tensorized polyellipse

$$\mathcal{E} := \bigotimes_{j\ge1} \mathcal{E}_{s_j} \subset \mathbb{C}^{\mathbb{N}}. \qquad (27.30)$$
Definition 1. For a positive sequence $b = (b_j)_{j\ge1} \in \ell^p(\mathbb{N})$ for some $0 < p < 1$, a parametric mapping $U \ni y \mapsto q(y) \in X$ satisfies the $(b,p)$-holomorphy assumption in the Hilbert space $X$ if and only if

1. For each $y \in U$, there exists a unique $q(y) \in X$, and the map $y \mapsto q(y)$ from $U$ to $X$ is uniformly bounded, i.e.,
The significance of $(b,p)$-holomorphy lies in the following facts: (a) solutions of well-posed, countably parametric, possibly nonlinear operator equations with $(b,p)$-holomorphic parametric operator families are $(b,p)$-holomorphic; (b) $(b,p)$-holomorphic parametric solution maps $\{q(y) : y \in U\} \subset X$ allow for tensorized, so-called polynomial chaos approximations with dimension-independent $N$-term approximation rates which depend only on the summability exponent $p$ of the sequence $b$; (c) $(b,p)$-holomorphic parametric solution maps $\{q(y) : y \in U\} \subset X$ can also be constructively approximated by sparse, Smolyak-type interpolation methods, see [25, 26]; (d) $(b,p)$-holomorphy is preserved under composition with holomorphic maps, in particular, for example, in the context of Bayesian inverse problems, see [66, 68] for details; and (e) $(b,p)$-holomorphic parametric solution maps $\{q(y) : y \in U\} \subset X$ allow for low-parametric, reduced basis surrogates. Points (b) and (c) are explained next, detailing in particular computational approximation strategies for the efficient computation of sparse approximations of countably parametric solution families.
where $L_\nu(y) := \prod_{j\ge1} L_{\nu_j}(y_j)$, with $L_n$ denoting the version of the Legendre polynomial $P_n$ normalized in $L^2(-1,1;\frac{dt}{2})$, i.e.,

$$q_\nu = \left( \prod_{j\ge1} (1 + 2\nu_j) \right)^{1/2} v_\nu. \qquad (27.38)$$

$$\inf_{w \in X_N} \|q - w\|_{L^\infty(U,X)} \le C (N+1)^{-s}, \qquad s = \frac{1}{p} - 1, \qquad (27.40)$$
The sparse grid of interpolation points associated with a downward closed index set $\Lambda$ is

$$\Gamma_\Lambda := \{ z_\nu : \nu \in \Lambda \} \quad \text{where } z_\nu := (z_{\nu_j})_{j\ge1}, \qquad (27.42)$$

and the sparse interpolant of a function $g$ is given by

$$I_\Lambda g = \sum_{i=1}^{N} g_i H_i, \qquad (27.43)$$
27 Model Order Reduction Methods in Computational Uncertainty Quantification 955
$$H_\nu(y) := \prod_{j\ge1} h_{\nu_j}(y_j) \quad \text{where } h_0(t) = 1 \text{ and } h_k(t) = \prod_{j=0}^{k-1} \frac{t - z_j}{z_k - z_j}, \quad k \ge 1, \qquad (27.44)$$

and where the coefficients $g_k$ are recursively defined by

$$g_1 := g(z_0), \qquad g_{k+1} := g(z_{k+1}) - I_k g(z_{k+1}) = g(z_{k+1}) - \sum_{i=1}^{k} g_i H_i(z_{k+1}). \qquad (27.45)$$
The sparse grid $\Gamma_\Lambda \subset U$ is unisolvent for the space $X_\Lambda$ of multivariate polynomials with coefficients in $X$. The interpolation operator that maps functions defined on $U$ with values in $X$ into $X_\Lambda$ can be computed by the recursion (27.43) if one admits $g \in X$. Naturally, in this case, the coefficients $g_i$, being elements of a function space, cannot be exactly represented and must be additionally approximated, e.g., by a finite element or a collocation approximation in a finite-dimensional subspace $X_h \subset X$.
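In one variable, the hierarchical recursion (27.44)-(27.45) can be sketched as follows; the nested point sequence, the target function, and all names are illustrative stand-ins rather than the chapter's implementation.

```python
import numpy as np

def hierarchical_basis(k, t, z):
    """h_k(t) = prod_{j<k} (t - z_j)/(z_k - z_j), with h_0 = 1, cf. (27.44)."""
    h = 1.0
    for j in range(k):
        h *= (t - z[j]) / (z[k] - z[j])
    return h

def interpolation_coefficients(g, z):
    """Coefficients from the recursion (27.45):
    g_1 = g(z_0), g_{k+1} = g(z_{k+1}) - sum_i g_i h_i(z_{k+1})."""
    coeffs = [g(z[0])]
    for k in range(1, len(z)):
        surplus = g(z[k]) - sum(c * hierarchical_basis(i, z[k], z)
                                for i, c in enumerate(coeffs))
        coeffs.append(surplus)
    return coeffs

def interpolant(coeffs, t, z):
    return sum(c * hierarchical_basis(i, t, z) for i, c in enumerate(coeffs))

z = [0.0, 1.0, -1.0, 0.5]          # nested point sequence (assumed)
g = lambda t: np.exp(t)
c = interpolation_coefficients(g, z)
print(max(abs(interpolant(c, t, z) - g(t)) for t in z))
```

By construction, the interpolant reproduces $g$ at the interpolation points; adding a point only appends one new coefficient, which is what makes the recursion attractive for adaptive constructions.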
The following result recovers the best $N$-term approximation rate $O(N^{-s})$ of (27.40) for the interpolation on a different choice of downward closed sets $\Lambda$; see [25] for a proof.
$$\|q - I_{\Lambda_N} q\|_{L^\infty(U,X)} \le C (N+1)^{-s}, \qquad s = \frac{1}{p} - 1. \qquad (27.46)$$
Remark 3. The above theorem guarantees the existence of a sparse grid interpolant with dimension-independent error convergence rate. However, the practical construction of such a sparse grid is not a trivial task and depends on the specific problem. A dimension-adaptive algorithm has been proposed in [42] and further developed in [15, 54, 66]. This algorithm has been found to perform well in a host of examples from forward and inverse UQ. Its convergence and (quasi-)optimality are, however, not yet justified mathematically.
Given any sample $y \in U$, an accurate solution of the forward PDE model (27.26) relies on a stable and consistent numerical solver with high precision, which typically requires a high-fidelity discretization of the PDE model and a computationally expensive solution of the corresponding algebraic system. Such a large-scale
A1 stability: the parametric bilinear form $a$ satisfies the discrete HiFi-PG inf-sup condition

$$\forall y \in U: \quad \inf_{0 \ne w_h \in X_h} \sup_{0 \ne v_h \in Y_h} \frac{a(w_h, v_h; y)}{\|w_h\|_X \|v_h\|_Y} =: \beta_h(y) \ge \bar{\beta}_h > 0, \qquad (27.51)$$

where the inf-sup constant $\beta_h(y)$ depends on $h$ and on $y$ and may vanish, $\beta_h(y) \to 0$ as $h \to 0$.

A2 consistency: the best approximation satisfies the consistent approximation property

$$\forall y \in U: \quad \lim_{h\to0} \frac{1}{\beta_h^2(y)} \inf_{w_h \in X_h} \|q(y) - w_h\|_X = 0. \qquad (27.52)$$
Theorem 3. Under Assumption 2, there exist $h_0 > 0$ and $\tau_0 > 0$ such that for $0 < h \le h_0$, there exists a solution $q_h(y) \in X_h$ of the HiFi-PG approximation,
where $C$ is independent of the mesh size $h$ and uniformly bounded w.r.t. $y$. Moreover, one has the a posteriori error estimate

$$\|q(y) - q_h(y)\|_X \le \frac{4}{\beta(y)} \|R(q_h(y); y)\|_{Y'}. \qquad (27.56)$$

In many (but not all) practical applications in UQ, the lower bounds of the stability constants $\beta(y)$ and $\beta_h(y)$ in Assumption 2 are independent of $y$ and of $h$: consider specifically the parametric elliptic diffusion problem in Example 3. In this example, one has for $X = Y = H_0^1(D)$, for every $y \in U$, that $\beta_h(y) \ge \beta(y) \ge c_0 (1 + C_P)^{-2}$.
except for the trial and test spaces, which indicates that the RB solution $q_N(y)$ is a PG compression/projection of the HiFi solution $q_h(y)$; specifically, a PG projection from the HiFi space into the RB space. To ensure the well-posedness of the RB solution, one makes the following assumptions.
Assumption 3. In addition to Assumption 2, with the same notation for the bilinear form $a(\cdot,\cdot; y): X \times Y \to \mathbb{R}$ defined as in (27.50), one makes the further assumptions:

A1 stability: the parametric bilinear form $a$ satisfies the discrete RB-PG inf-sup condition: there holds

$$\forall y \in U: \quad \inf_{0 \ne w_N \in X_N} \sup_{0 \ne v_N \in Y_N} \frac{a(w_N, v_N; y)}{\|w_N\|_X \|v_N\|_Y} =: \beta_N(y) \ge \bar{\beta}_N > 0, \qquad (27.58)$$

where $\bar{\beta}_N$ is a lower bound of the inf-sup constant $\beta_N(y)$, which depends on $N$ and on $y$ and may converge to the HiFi inf-sup constant, $\beta_N(y) \to \beta_h(y)$ as $N \to N_h$.

A2 consistency: the best approximation satisfies the consistent approximation property

$$\forall y \in U: \quad \lim_{N \to N_h} \frac{1}{\beta_N^2(y)} \inf_{w_N \in X_N} \|q_h(y) - w_N\|_X = 0. \qquad (27.59)$$
Proceeding as in [60], one can establish the following error estimates for the RB solution (see [23]).

Theorem 4. Under Assumption 3, there exist $N_0 > 0$ and $\tau_0' > 0$ such that for $N \ge N_0$, there exists a solution $q_N(y) \in X_N$ of the RB-PG compression problem (27.57), which is unique in $B_X(q_h(y); \tau_0' \beta_N(y))$. Moreover, for any $N \ge N_0$, there holds the a priori error estimate

$$\|q_h(y) - q_N(y)\|_X \le 2\, \frac{\|a(y)\|}{\beta_h(y)} \left(1 + \frac{\|a(y)\|}{\beta_N(y)}\right) \inf_{w_N \in X_N} \|q_h(y) - w_N\|_X. \qquad (27.60)$$

Moreover, one has the a posteriori error estimate

$$\|q_h(y) - q_N(y)\|_X \le \frac{4}{\beta_h(y)} \|R(q_N(y); y)\|_{Y'}. \qquad (27.61)$$
Remark 4. Note that both the a priori and the a posteriori error estimates for the RB solution turn out to be the same as those for the HiFi solution, with different stability constants and different approximation spaces. These results are a consequence of the fact that the RB-PG problem (27.57) differs from the HiFi-PG problem (27.49) only in the approximation spaces.
As the computational cost for the solution of the RB-PG problem (27.57) critically depends on the number $N$ of RB degrees of freedom (dof), one needs to construct an optimal RB space $X_N$ that is most representative of all the parametric solutions at the required approximation accuracy, such that $N$ is as small as possible. However, it is computationally infeasible to obtain such an optimal subspace $X_N$, as this is an infinite-dimensional optimization problem involving expensive HiFi solutions. In the following, two practical algorithms are presented that allow, in practice, quasi-optimal construction of the RB trial spaces $X_N$. Construction of the RB test space $Y_N$ is deferred to the next sections.
Let $(\lambda_n, \psi_n)_{n=1}^{N_r}$ denote the eigenpairs of the correlation matrix $C$, i.e.,

$$C \psi_n = \lambda_n \psi_n, \qquad n = 1, \dots, N_r. \qquad (27.63)$$

The POD basis functions are obtained as

$$h_n = \frac{1}{\sqrt{\lambda_n}} \sum_{m=1}^{N_t} \psi_n^{(m)}\, q_h(y^m), \qquad n = 1, \dots, N_r. \qquad (27.64)$$

Moreover,

$$\sum_{n=1}^{N_t} \|q_h(y^n) - P_N^{\mathrm{pod}} q_h(y^n)\|_X^2 = \sum_{n=N+1}^{N_r} \lambda_n. \qquad (27.67)$$
Remark 5. Proposition 2 implies that the POD basis functions achieve the optimal compression, measured in the ensemble of squared $X$-norms of the orthogonal projection errors. Moreover, the ensemble of projection errors can be bounded explicitly according to (27.67), which can serve as an error indicator to choose a suitable number of POD basis functions for a given accuracy requirement. Due to this optimality, POD has been widely used for reduced basis construction in Hilbert spaces [7, 10, 75].
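A minimal POD sketch by the method of snapshots, under the simplifying assumption that the $X$-inner product is Euclidean (an $X$-weighted variant would use a Gram matrix in the correlation); the snapshot matrix is a random stand-in.

```python
import numpy as np

def pod_basis(Q, N):
    """POD via the method of snapshots: eigendecompose C = Q^T Q (cf. (27.63))
    and form h_n = (1/sqrt(lam_n)) * sum_m psi_n[m] * Q[:, m] (cf. (27.64))."""
    C = Q.T @ Q                            # correlation matrix, N_t x N_t
    lam, psi = np.linalg.eigh(C)           # ascending eigenvalues
    lam, psi = lam[::-1], psi[:, ::-1]     # sort descending
    H = Q @ psi[:, :N] / np.sqrt(lam[:N])  # first N POD basis vectors, orthonormal
    return H, lam

rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 8))           # 8 snapshots of dimension 50
H, lam = pod_basis(Q, N=4)
# projection error identity (27.67): squared error ensemble = sum of tail eigenvalues
err = np.linalg.norm(Q - H @ (H.T @ Q))**2
print(err, lam[4:].sum())
```

The printed pair illustrates the identity (27.67): the total squared projection error equals the sum of the discarded eigenvalues, which is exactly what makes the eigenvalue tail a usable error indicator.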
Remark 6. However, to compute the POD basis functions, one needs to compute the HiFi solution at a sufficiently large number of properly chosen random samples. The possibly large training set can be prohibitive for a given computational budget, especially for high-dimensional problems that require numerous samples.
at which one constructs the first RB space $X_1 = \mathrm{span}\{q_h(y^1)\}$. Then, for $N = 1, 2, \dots$, one seeks the next sample $y^{N+1}$ such that

Theorem 5. Let $d_N(\mathcal{M}_h, X_h)$ denote the Kolmogorov $N$-width, i.e., the worst-case error of the $X$-projection of the HiFi solution $q_h(y) \in \mathcal{M}_h$ onto the optimal among all possible $N$-dimensional subspaces $Z_N \subset X_h$. Specifically,

Then the following results hold for the convergence rates of the RB compression error [4, 34]:

Proof. The proof of the results in the finite-dimensional approximation spaces $X_h$ and $\mathcal{M}_h$ follows those in [34], where $X$ is a Hilbert space and the solution manifold $\mathcal{M}$ is a compact set in $X$.
Remark 7. The above results indicate that the RB compression by the (weak) greedy algorithm achieves optimal convergence rates in comparison with the Kolmogorov width, in the cases of both algebraic and exponential rates. However, the Kolmogorov width is typically not available for general parametric problems. In our setting, i.e., for smooth parameter dependence, it can be bounded from above by the sparse interpolation error estimate (27.46), i.e., with algebraic convergence rate $N^{-s}$. Exponential convergence rates are shown in [16] for a one-dimensional parametric problem whose solution is analytic w.r.t. the parameter.
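A sketch of the weak greedy construction over a finite training set; the true projection error stands in for the a posteriori estimator used in practice, and the synthetic snapshots (lying on a 3-dimensional manifold) are illustrative.

```python
import numpy as np

def greedy_rb(snapshots, tol=1e-10, max_N=10):
    """snapshots: list of HiFi solution vectors q_h(y) over a training set."""
    Q = np.array(snapshots, dtype=float).T        # columns are snapshots
    basis = np.empty((Q.shape[0], 0))
    for _ in range(max_N):
        # projection error of every training snapshot onto the current RB space
        proj = basis @ (basis.T @ Q)
        errs = np.linalg.norm(Q - proj, axis=0)
        n_star = int(np.argmax(errs))             # greedy selection
        if errs[n_star] <= tol:
            break
        # orthonormalize the selected snapshot against the current basis
        v = Q[:, n_star] - proj[:, n_star]
        basis = np.column_stack([basis, v / np.linalg.norm(v)])
    return basis

rng = np.random.default_rng(1)
W = rng.standard_normal((30, 3))                  # solutions lie in a 3-dim subspace
snaps = [W @ rng.standard_normal(3) for _ in range(20)]
B = greedy_rb(snaps)
print(B.shape[1])
```

The greedy loop stops once the worst training-set error falls below the tolerance; on this synthetic manifold, three basis vectors suffice, which is the low-rank behavior the Kolmogorov-width bounds quantify.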
By $(w_h^n)_{n=1}^{N_h}$ and $(v_h^n)_{n=1}^{N_h}$ one denotes the basis functions of the HiFi trial and test spaces $X_h$ and $Y_h$. Then, the parametric solution $q_h(y)$ can be written as

$$q_h(y) = \sum_{n=1}^{N_h} q_h^n(y)\, w_h^n, \qquad (27.75)$$

where $\mathbf{q}_h(y) = (q_h^1(y), \dots, q_h^{N_h}(y))^\top$ denotes the (parametric) coefficient vector of the HiFi-PG solution $q_h(y)$. The algebraic formulation of (27.74) reads:

$$\text{given } y \in U, \text{ find } \mathbf{q}_h(y) \in \mathbb{R}^{N_h}: \quad \sum_{j\in J} y_j A_j^h \mathbf{q}_h(y) = \sum_{j\in J} y_j \mathbf{f}_j^h. \qquad (27.76)$$

The HiFi matrices $A_j^h \in \mathbb{R}^{N_h \times N_h}$ and the HiFi vectors $\mathbf{f}_j^h \in \mathbb{R}^{N_h}$ can be assembled as
$$q_N(y) = \sum_{n=1}^{N} q_N^n(y)\, w_N^n, \qquad (27.78)$$

with the coefficient vector $\mathbf{q}_N(y) = (q_N^1(y), \dots, q_N^N(y))^\top$. Then, the parametric RB-PG compression problem can be written in the algebraic formulation as

$$\text{given } y \in U, \text{ find } \mathbf{q}_N(y) \in \mathbb{R}^{N}: \quad \sum_{j\in J} y_j A_j^N \mathbf{q}_N(y) = \sum_{j\in J} y_j \mathbf{f}_j^N, \qquad (27.79)$$

where

$$A_j^N = V^\top A_j^h W \quad \text{and} \quad \mathbf{f}_j^N = V^\top \mathbf{f}_j^h, \qquad j \in J, \qquad (27.80)$$

and where $W$ and $V$ are the transformation matrices between the HiFi and RB basis functions, i.e.,

$$w_N^n = \sum_{m=1}^{N_h} W_{mn}\, w_h^m \quad \text{and} \quad v_N^n = \sum_{m=1}^{N_h} V_{mn}\, v_h^m, \qquad n = 1, \dots, N. \qquad (27.81)$$
Thanks to the linear and affine structure of the parametric terms in (27.73), one can assemble the RB matrices $A_j^N$ and the RB vectors $\mathbf{f}_j^N$, $j \in J$, once and for all. For each given $y$, one only needs to assemble and solve the RB algebraic system (27.79), with computational cost depending only on $N$: $O(N^2)$ for assembling and $O(N^3)$ for solving (27.79). This leads to considerable computational reduction as long as $N \ll N_h$.
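The offline/online split described above can be sketched as follows; the affine operators, loads, and bases are random stand-ins for quantities assembled from an actual PG discretization.

```python
import numpy as np

rng = np.random.default_rng(2)
Nh, N, J = 200, 5, 3

# HiFi quantities (assembled once, offline); illustrative data
A_h = [np.eye(Nh) + 0.01 * rng.standard_normal((Nh, Nh)) for _ in range(J)]
f_h = [rng.standard_normal(Nh) for _ in range(J)]
W = np.linalg.qr(rng.standard_normal((Nh, N)))[0]   # RB trial basis
V = W                                                # Galerkin: test = trial

# Offline: project each affine term once, cf. (27.80)
A_N = [V.T @ A @ W for A in A_h]
f_N = [V.T @ f for f in f_h]

def rb_solve(y):
    """Online stage, cf. (27.79): O(J*N^2) assembly + O(N^3) solve, no Nh cost."""
    A = sum(yj * Aj for yj, Aj in zip(y, A_N))
    f = sum(yj * fj for yj, fj in zip(y, f_N))
    return np.linalg.solve(A, f)

y = [1.0, 0.4, -0.2]
q_N = rb_solve(y)
print(q_N.shape)
```

The point of the split is that, once `A_N` and `f_N` are stored, each new parameter query touches only $N \times N$ objects.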
Let $e_N(y) = q_h(y) - q_N(y)$ denote the RB error; then, by the stability condition (27.51), one has

so that $\|\hat{e}_N(y)\|_Y = \|r(\cdot; y)\|_{Y'}$. By setting $v_h = \hat{e}_N(y)$ in (27.86), one has

$$\|\hat{e}_N(y)\|_Y^2 = r(\hat{e}_N(y); y) = a(e_N(y), \hat{e}_N(y); y) \le \gamma_h \|e_N(y)\|_X \|\hat{e}_N(y)\|_Y, \qquad (27.87)$$

where $\gamma_h$ is the continuity constant of the bilinear form $a$ on $X_h \times Y_h$, which implies that

$$\triangle_N(y) \le \frac{\gamma_h}{\beta_h} \|e_N(y)\|_X. \qquad (27.88)$$

Therefore, the error bound $\triangle_N(y)$ is tight, with the constants in (27.69) being $c = \gamma_h/\beta_h$ and $C = 1$, and effectivity $c/C = \gamma_h/\beta_h$.
For the evaluation of $\triangle_N(y)$, one makes use of the affine structure (27.73) by computing the Riesz representation $\hat{A}_j^n$ of the linear functional $a_j(w_N^n; \cdot) = {}_{Y'}\langle A_j w_N^n, \cdot \rangle_Y : Y_h \to \mathbb{R}$ as the solution of

where $w_N^n$ is the $n$th RB basis function. Analogously, one computes the Riesz representation $\hat{F}_j$ of the linear functional $f_j(\cdot) = {}_{Y'}\langle F_j, \cdot \rangle_Y : Y_h \to \mathbb{R}$ as the solution of

Finally, one can compute the dual norm of the residual in the error bound $\triangle_N(y)$ by

$$\|\hat{e}_N(y)\|_Y^2 = \sum_{j,j'\in J} y_j y_{j'} \left( (\hat{F}_j, \hat{F}_{j'})_Y - 2\sum_{n=1}^{N} q_N^n(y)\, (\hat{F}_j, \hat{A}_{j'}^n)_Y + \sum_{n,n'=1}^{N} q_N^n(y)\, q_N^{n'}(y)\, (\hat{A}_j^n, \hat{A}_{j'}^{n'})_Y \right),$$
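The offline/online evaluation of this dual norm can be sketched as follows; the $Y$-inner product is taken as Euclidean and the Riesz representers are explicit random vectors, both illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(3)
Nh, N, J = 100, 4, 2
F = [rng.standard_normal(Nh) for _ in range(J)]            # representers of f_j
A = [[rng.standard_normal(Nh) for _ in range(N)] for _ in range(J)]  # of a_j(w_N^n; .)

# Offline: all parameter-independent Y-inner products, computed once
FF = np.array([[Fj @ Fk for Fk in F] for Fj in F])                              # (J, J)
FA = np.array([[[Fj @ A[k][n] for n in range(N)] for k in range(J)] for Fj in F])  # (J, J, N)
AA = np.array([[[[A[j][n] @ A[k][m] for m in range(N)] for n in range(N)]
                for k in range(J)] for j in range(J)])                          # (J, J, N, N)

def residual_norm_sq(y, qN):
    """Online: cost depends only on J and N, not on Nh."""
    total = 0.0
    for j in range(J):
        for k in range(J):
            total += y[j] * y[k] * (FF[j, k]
                                    - 2.0 * qN @ FA[j, k]
                                    + qN @ AA[j, k] @ qN)
    return total

y = np.array([1.0, 0.5])
qN = rng.standard_normal(N)
# cross-check against the direct Nh-dimensional residual evaluation
r = sum(y[j] * (F[j] - sum(qN[n] * A[j][n] for n in range(N))) for j in range(J))
print(np.isclose(residual_norm_sq(y, qN), r @ r))  # True
```

The cross-check confirms that the quadratic expansion reproduces the directly computed residual norm; in a real code, the offline arrays would be built once per RB basis update.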
This definition implies $\sup_{v_h \in Y_h} |a(w_h, v_h; y)| = |a(w_h, T^y w_h; y)|$, i.e., $T^y w_h$ is the supremizer of $w_h$ in $Y_h$ w.r.t. the bilinear form $a$. Then the $y$-dependent RB test space $Y_N^y$ is constructed as

$$Y_N^y = \mathrm{span}\{ T^y w_N : w_N \in X_N \}. \qquad (27.92)$$

$$\beta_N(y) := \inf_{w_N \in X_N} \sup_{v_N \in Y_N^y} \frac{a(w_N, v_N; y)}{\|w_N\|_X \|v_N\|_Y} \ge \beta_h(y). \qquad (27.93)$$
This implies that the inf-sup stability of the HiFi-PG discretization is inherited by the corresponding RB-PG trial and test spaces: all RB-PG compression problems are inf-sup stable under the stability assumption A1 of Assumption 2 on the HiFi-PG approximation. In particular, if the HiFi-PG discretizations are inf-sup stable uniformly with respect to the uncertain input parameter $u$ (resp. its parametrization in terms of $y$), then so is any RB-PG method with PG trial space $X_N$ obtained by a greedy search and the corresponding PG test space (27.92).
Due to the affine structure (27.73), one can compute $T^y w_N$ for each $y \in U$ as

$$T^y w_N = \sum_{j\in J} y_j\, T_j w_N, \quad \text{where } (T_j w_N, v_h)_Y = a_j(w_N, v_h) \;\; \forall v_h \in Y_h, \qquad (27.94)$$

where $T_j w_N$, $w_N \in X_N$, $j \in J$, needs to be computed only once; given any $y$, $T^y w_N$ can be assembled in $O(N)$ operations, independent of the number $N_h$ of degrees of freedom in the HiFi-PG discretization. The compressed RB-PG discretization (27.79) can be written more explicitly as

$$\text{given } y \in U, \text{ find } \mathbf{q}_N(y) \in \mathbb{R}^N: \quad \sum_{j,j'\in J} y_j y_{j'} A_{j,j'}^N \mathbf{q}_N(y) = \sum_{j,j'\in J} y_j y_{j'} \mathbf{f}_{j,j'}^N, \qquad (27.95)$$

where the RB compressed stiffness matrices $A_{j,j'}^N \in \mathbb{R}^{N\times N}$ and the RB compressed load vectors $\mathbf{f}_{j,j'}^N \in \mathbb{R}^N$ are given by

Since all these quantities are independent of $y$, one can precompute them once and for all.
The linearity of the (integral or differential) operator in the forward model and the affine parameter dependence play a crucial role in the effective separation of parameter-dependent and parameter-independent quantities. This, in turn, leads to a massive reduction in computational work for the RB-PG compression. For more general problems that involve terms which are nonlinear w.r.t. the state variable $q$ and/or nonaffine w.r.t. the parameter $y$, for instance Example 4, it is necessary to obtain an affine approximation of these terms in order to retain the effective decomposition and RB reduction. In this section, such an affine approximation based on empirical interpolation [2, 12, 18, 40, 57] is presented.
where $\lambda^{(k)}$ is a constant determined by a line search [33]. The Newton iteration is stopped once

$$\|\delta q_h^{(k)}(y)\|_X \le \varepsilon_{\mathrm{tol}} \quad \text{or} \quad \|R(q_h^{(k)}(y); y)\|_{Y'} \le \varepsilon_{\mathrm{tol}}, \qquad (27.99)$$

with $\varepsilon_{\mathrm{tol}}$ a given tolerance; then one sets $q_h(y) = q_h^{(k+1)}(y)$.

With the bases $\{w_h^n\}_{n=1}^{N_h}$ and $\{v_h^n\}_{n=1}^{N_h}$ of $X_h$ and $Y_h$, respectively, one can write

$$q_h^{(k)}(y) = \sum_{n=1}^{N_h} q_{h,n}^{(k)}(y)\, w_h^n \quad \text{and} \quad \delta q_h^{(k)}(y) = \sum_{n=1}^{N_h} \delta q_{h,n}^{(k)}(y)\, w_h^n, \qquad (27.100)$$
with the coefficient vectors $\mathbf{q}_h^{(k)}(y) = \big(q_{h,1}^{(k)}(y), \dots, q_{h,N_h}^{(k)}(y)\big)^\top$ and $\delta\mathbf{q}_h^{(k)}(y) = \big(\delta q_{h,1}^{(k)}(y), \dots, \delta q_{h,N_h}^{(k)}(y)\big)^\top$, so that the algebraic formulation of the parametric HiFi-PG approximation problem (27.97) reads: find the coefficient vector $\delta\mathbf{q}_h^{(k)}(y) \in \mathbb{R}^{N_h}$ such that

$$J_h\big(\mathbf{q}_h^{(k)}(y); y\big)\, \delta\mathbf{q}_h^{(k)}(y) = -\,\mathbf{r}_h\big(\mathbf{q}_h^{(k)}(y); y\big), \qquad (27.101)$$

where the Jacobian matrix $J_h\big(\mathbf{q}_h^{(k)}(y); y\big) \in \mathbb{R}^{N_h \times N_h}$ is given by

$$\Big(J_h\big(\mathbf{q}_h^{(k)}(y); y\big)\Big)_{nn'} = {}_{Y'}\Big\langle D_q R\big(q_h^{(k)}(y); y\big)(w_h^n),\, v_h^{n'} \Big\rangle_Y, \qquad n, n' = 1, \dots, N_h, \qquad (27.102)$$

and the residual vector $\mathbf{r}_h\big(\mathbf{q}_h^{(k)}(y); y\big) \in \mathbb{R}^{N_h}$ takes the form

$$\Big(\mathbf{r}_h\big(\mathbf{q}_h^{(k)}(y); y\big)\Big)_n = {}_{Y'}\Big\langle R\big(q_h^{(k)}(y); y\big),\, v_h^n \Big\rangle_Y, \qquad n = 1, \dots, N_h. \qquad (27.103)$$

$${}_{Y'}\Big\langle D_q R\big(q_N^{(k)}(y); y\big)\big(\delta q_N^{(k)}(y)\big),\, v_N \Big\rangle_Y = -\,{}_{Y'}\Big\langle R\big(q_N^{(k)}(y); y\big),\, v_N \Big\rangle_Y \quad \forall v_N \in Y_N; \qquad (27.104)$$
where again $\lambda^{(k)}$ is a constant determined by a line search method [33]. The stopping criterion is

$$\|\delta q_N^{(k)}(y)\|_X \le \varepsilon_{\mathrm{tol}} \quad \text{or} \quad \|R(q_N^{(k)}(y); y)\|_{Y'} \le \varepsilon_{\mathrm{tol}}, \qquad (27.106)$$
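The damped Newton iteration with line search used above can be sketched as follows; the residual and Jacobian below are toy stand-ins, and the backtracking rule is one simple choice among many.

```python
import numpy as np

def damped_newton(residual, jacobian, q0, eps_tol=1e-10, max_iter=50):
    """Solve J dq = -r, update q <- q + lam*dq with backtracking; stop when the
    Newton step or the residual norm falls below eps_tol (cf. the criterion above)."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iter):
        r = residual(q)
        if np.linalg.norm(r) <= eps_tol:
            break
        dq = np.linalg.solve(jacobian(q), -r)
        lam = 1.0
        # backtracking: accept the first step that reduces the residual norm
        while lam > 1e-4 and np.linalg.norm(residual(q + lam * dq)) >= np.linalg.norm(r):
            lam /= 2.0
        q = q + lam * dq
        if np.linalg.norm(lam * dq) <= eps_tol:
            break
    return q

# toy nonlinear system: R(q) = q + 0.1*q^3 - b, with exact Jacobian
b = np.array([1.0, 2.0])
R = lambda q: q + 0.1 * q**3 - b
J = lambda q: np.eye(2) + np.diag(0.3 * q**2)
q = damped_newton(R, J, np.zeros(2))
print(np.linalg.norm(R(q)) < 1e-8)  # True
```

The same loop structure applies at the HiFi level (with sparse $J_h$) and at the RB level (with the small dense $J_N$); only the sizes of the linear solves change.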
$$q_N^{(k)}(y) = \sum_{n=1}^{N} q_N^{(n,k)}(y)\, w_N^n \quad \text{and} \quad \delta q_N^{(k)}(y) = \sum_{n=1}^{N} \delta q_N^{(n,k)}(y)\, w_N^n, \qquad (27.107)$$

with the coefficient vectors $\mathbf{q}_N^{(k)} = \big(q_N^{(1,k)}(y), \dots, q_N^{(N,k)}(y)\big)^\top$ and $\delta\mathbf{q}_N^{(k)} = \big(\delta q_N^{(1,k)}(y), \dots, \delta q_N^{(N,k)}(y)\big)^\top$. Then the algebraic formulation of the RB-PG compression (27.104) reads: find $\delta\mathbf{q}_N^{(k)} \in \mathbb{R}^N$ such that

$$J_N\big(\mathbf{q}_N^{(k)}(y); y\big)\, \delta\mathbf{q}_N^{(k)} = -\,\mathbf{r}_N\big(\mathbf{q}_N^{(k)}(y); y\big), \qquad (27.108)$$

where the parametric RB Jacobian matrix $J_N\big(\mathbf{q}_N^{(k)}(y); y\big) \in \mathbb{R}^{N\times N}$ and the parametric RB residual vector $\mathbf{r}_N\big(\mathbf{q}_N^{(k)}(y); y\big) \in \mathbb{R}^N$ are given (through the transformation matrices $W$ and $V$) by

$$J_N\big(\mathbf{q}_N^{(k)}(y); y\big) = V^\top J_h\big(W\mathbf{q}_N^{(k)}(y); y\big)\, W \quad \text{and} \quad \mathbf{r}_N\big(\mathbf{q}_N^{(k)}(y); y\big) = V^\top \mathbf{r}_h\big(W\mathbf{q}_N^{(k)}(y); y\big). \qquad (27.109)$$
One can observe that, due to the nonlinearity (and/or nonaffinity) of the residual operator, neither the residual vector nor the Jacobian matrix admits an affine decomposition into parameter-dependent and parameter-independent terms. This obstructs effective RB-PG compression.
$$\mathbf{r}_m^t \in \mathbb{R}^{N_h}, \qquad m = 1, \dots, M_t, \qquad (27.110)$$

for instance collected from the residual vectors $\mathbf{r}_h(\mathbf{q}_h^{(k)}; y)$ at successive iteration steps, enumerated by the index $k$, and at different samples $y$. A greedy algorithm is applied to construct the empirical interpolation for the approximation of any given $\mathbf{r} \in \mathbb{R}^{N_h}$: one picks the first EI basis vector $\mathbf{r}_1 \in \mathbb{R}^{N_h}$ as

$$\mathbf{r}_1 = \frac{\mathbf{r}_{m^*}^t}{\|\mathbf{r}_{m^*}^t\|_\infty}, \quad \text{where } m^* = \underset{1 \le m \le M_t}{\mathrm{argmax}}\, \|\mathbf{r}_m^t\|_\infty, \qquad (27.111)$$

where $\|\cdot\|_\infty$ can also be replaced by $\|\cdot\|_2$; then one chooses the first index $n_1 \in \{1, \dots, N_h\}$ such that

$$n_1 = \underset{1 \le n \le N_h}{\mathrm{argmax}}\, |(\mathbf{r}_1)_n|, \qquad (27.112)$$

where $(\mathbf{r}_1)_n$ is the $n$-th entry of the vector $\mathbf{r}_1$. The number of EI basis vectors is denoted by $M$, with $M = 1$ for the time being. Any $\mathbf{r} \in \mathbb{R}^{N_h}$ is approximated by the (empirical) interpolation

$$I_M \mathbf{r} = \sum_{m=1}^{M} c_m \mathbf{r}_m, \qquad (27.113)$$

with coefficients determined by the interpolation conditions

$$(\mathbf{r})_{m'} = \sum_{m=1}^{M} c_m (\mathbf{r}_m)_{m'}, \qquad m' = n_1, \dots, n_M. \qquad (27.114)$$

More explicitly, let $P_M \in \{0,1\}^{M \times N_h}$ denote the index indicator matrix with nonzero entries $(P_M)_{m, n_m} = 1$, $m = 1, \dots, M$, and let $R_M \in \mathbb{R}^{N_h \times M}$ denote the EI basis matrix whose $m$-th column is $\mathbf{r}_m$, $m = 1, \dots, M$. Then the coefficient vector $\mathbf{c}$ can be written as

$$\mathbf{c} = (P_M R_M)^{-1} P_M\, \mathbf{r}.$$

For the construction of the next EI basis vector, one sets

$$\mathbf{r}_{M+1} = \frac{\mathbf{r}_{m^*}^t - I_M \mathbf{r}_{m^*}^t}{\|\mathbf{r}_{m^*}^t - I_M \mathbf{r}_{m^*}^t\|_\infty}, \quad \text{where } m^* = \underset{1 \le m \le M_t}{\mathrm{argmax}}\, \|\mathbf{r}_m^t - I_M \mathbf{r}_m^t\|_\infty, \qquad (27.117)$$

and finds the next index $n_{M+1}$ as

$$n_{M+1} = \underset{1 \le n \le N_h}{\mathrm{argmax}}\, |(\mathbf{r}_{M+1})_n|. \qquad (27.118)$$
The greedy algorithm is terminated when $|(\mathbf{r}_{M+1})_{n_{M+1}}| \le \varepsilon_{\mathrm{tol}}$. The empirical interpolation is consistent in the sense that, as $M \to N_h$, one has $\mathbf{r} - I_M \mathbf{r} \to 0$ due to the interpolation property. Moreover, an a priori error analysis shows that the greedy algorithm for the EI construction yields the same convergence of the EI compression error, relative to the Kolmogorov width, as the bound stated in Theorem 5, up to a Lebesgue constant depending on $M$; see [23, 56] for more details.
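The EI greedy above can be sketched as follows; the training vectors are samples of a smooth illustrative family rather than actual residual snapshots, and the termination test uses the worst surplus over the training set.

```python
import numpy as np

def interp(RM, idx, r):
    """I_M r = R_M (P_M R_M)^{-1} P_M r, cf. (27.113)-(27.114)."""
    c = np.linalg.solve(RM[idx, :], r[idx])
    return RM @ c

def eim(train, tol=1e-8, max_M=25):
    """train: (Nh, Mt) array of training vectors; returns basis R_M and indices."""
    R, idx = [], []
    for _ in range(max_M):
        # surplus of every training vector w.r.t. the current basis
        surplus = train - interp(np.array(R).T, idx, train) if R else train.copy()
        m_star = int(np.argmax(np.abs(surplus).max(axis=0)))   # worst snapshot
        s = surplus[:, m_star]
        if np.abs(s).max() <= tol:
            break
        n_next = int(np.argmax(np.abs(s)))                     # next magic point
        R.append(s / s[n_next])
        idx.append(n_next)
    return np.array(R).T, idx

x = np.linspace(0, 1, 100)
train = np.array([np.exp(-mu * x) for mu in np.linspace(1, 5, 30)]).T
RM, idx = eim(train, tol=1e-6)
err = np.abs(train - interp(RM, idx, train)).max()
print(RM.shape[1], err)
```

For this one-parameter analytic family, a handful of basis vectors reaches the tolerance, mirroring the exponential Kolmogorov-width decay mentioned above.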
For any $y \in U$, let $\mathbf{q}_{N,M}(y) \in \mathbb{R}^N$ denote the coefficient vector of the solution $q_{N,M}(y) \in X_N$ of the RB-PG compression problem with empirical interpolation. By the empirical interpolation of the HiFi residual vector in (27.109), one can approximate the RB residual vector as

$$\mathbf{r}_N\big(\mathbf{q}_{N,M}^{(k)}(y); y\big) \approx V^\top R_M (P_M R_M)^{-1} P_M\, \mathbf{r}_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big), \qquad (27.119)$$

where the $y$-independent quantity $V^\top R_M (P_M R_M)^{-1}$ can be computed once and for all, and the $y$-dependent quantity $P_M \mathbf{r}_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big)$ can be evaluated in $O(MN)$ operations as long as locally supported HiFi basis functions are used, e.g., finite element basis functions.
Similarly, one can approximate the HiFi Jacobian matrix in (27.109) as

$$J_h\big(W\mathbf{q}_N^{(k)}(y); y\big) \approx R_M (P_M R_M)^{-1} P_M\, J_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big), \qquad (27.120)$$

$$J_N\big(\mathbf{q}_N^{(k)}(y); y\big) \approx J_{N,M}\big(\mathbf{q}_{N,M}^{(k)}(y); y\big) := V^\top R_M (P_M R_M)^{-1} P_M\, J_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big)\, W, \qquad (27.121)$$

where the $y$-dependent quantity $P_M J_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big)\, W$ can be computed efficiently with $O(MN^2)$ operations, as long as the Jacobian matrix $J_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big)$ is sparse. This is typically the case for PG PDE approximations with locally supported (e.g., finite element) basis functions. Direct approximation of the HiFi Jacobian matrix $J_h$ by empirical interpolation has also been studied in [10].
By the above EI compression, $\mathbf{q}_{N,M}(y)$ is the solution of the problem

One observes that the RB solution with EI differs from the RB solution, $\mathbf{q}_N(y) \ne \mathbf{q}_{N,M}(y)$, due to the empirical interpolation error. Moreover, the RB-EI solution $q_{N,M}(y)$ converges to the RB solution $q_N(y)$ as $M \to N_h$.
Analogously, recall the RB-EI-PG problem (with slight abuse of notation): given any $y \in U$, find $q_{N,M}(y) \in X_N$ such that

Subtracting (27.123) from (27.124) and inserting two zero terms, one obtains by rearranging

$$\|\mathbf{r}_h(q_h(y); y) - \mathbf{r}_h(q_{N,M}(y); y)\| = \big\|D_q \mathbf{r}_h(q_h(y); y)\big(q_h(y) - q_{N,M}(y)\big)\big\| + o\big(\|q_h(y) - q_{N,M}(y)\|_X\big),$$

where the first term accounts for the empirical interpolation error, which can be approximated by

$$\|I_M \mathbf{r}_h(q_{N,M}(y); y)\| = \|R_M (P_M R_M)^{-1} P_M\, \mathbf{r}_h(q_{N,M}(y); y)\|, \qquad (27.129)$$
where the HiFi terms can be computed only once; given any $y \in U$, the evaluation of $P_M \mathbf{r}_h(q_{N,M}(y); y)$ takes $O(MN)$ operations. Therefore, once the $y$-independent quantities are (pre)computed, the evaluation of the a posteriori error indicators for both the EI compression error and the RB compression error can be achieved efficiently, with cost depending only on $N$ and $M$ for each given $y \in U$.
Finally, one can define the a posteriori error indicator of the RB-EI compression error as

$$\tilde{\triangle}_N(y) := \frac{1}{\beta_h} \Big( \|\mathbf{r}_h(q_{N,M}(y); y) - I_M \mathbf{r}_h(q_{N,M}(y); y)\| + \|I_M \mathbf{r}_h(q_{N,M}(y); y)\| \Big), \qquad (27.130)$$

which can be efficiently evaluated for each $y \in U$ with cost independent of the HiFi dof $N_h$.
$$J_{N,M}^s\big(\mathbf{q}_{N,M}^{(k)}(y); y\big)\, \delta\mathbf{q}_{N,M}^{(k)} = -\,\mathbf{r}_h^s\big(\mathbf{q}_{N,M}^{(k)}(y); y\big), \qquad (27.131)$$

where the Jacobian matrix $J_{N,M}^s\big(\mathbf{q}_{N,M}^{(k)}(y); y\big)$ with stabilization is given by

$$\Big(P_M J_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big) W\Big)^\top \big(R_M (P_M R_M)^{-1}\big)^\top M_h^{-1}\, R_M (P_M R_M)^{-1} P_M\, J_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big) W, \qquad (27.132)$$

and the residual vector $\mathbf{r}_h^s\big(\mathbf{q}_{N,M}^{(k)}(y); y\big)$ is given by

$$\Big(P_M J_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big) W\Big)^\top \big(R_M (P_M R_M)^{-1}\big)^\top M_h^{-1}\, R_M (P_M R_M)^{-1} P_M\, \mathbf{r}_h\big(W\mathbf{q}_{N,M}^{(k)}(y); y\big), \qquad (27.133)$$

which can also be efficiently evaluated for each given $y$ with an additional $O(MN)$ operations. Then, until a suitable termination criterion is met, e.g., $\|\delta\mathbf{q}_{N,M}^{(k)}\|_2 \le \varepsilon_{\mathrm{tol}}$ for a prescribed tolerance $\varepsilon_{\mathrm{tol}}$ and a suitable vector norm $\|\cdot\|$ (cp. Section 3.5.4), the solution is updated as $\mathbf{q}_{N,M}^{(k+1)} = \mathbf{q}_{N,M}^{(k)} + \lambda^{(k)} \delta\mathbf{q}_{N,M}^{(k)}$, with a suitable constant $\lambda^{(k)}$ obtained by a line search method.
For the construction of the RB space $X_N$, one can directly solve the optimization problem (27.69), with the true error replaced by a suitable a posteriori error estimate, for instance,

To choose the training samples, random sampling methods have mostly been used in practice [63]; adaptive sampling with certain saturation criteria [43] has also been developed recently to remove samples from, and add samples to, the training set. In the present setting of uncertainty parametrization as introduced in Sect. 2.2, one takes advantage of the sparsity of the parametric data-to-solution map, which is implied
4 Inverse UQ
Following [32, 66, 68, 71, 73], one equips the space $\mathcal{X}$ of uncertain inputs and the space $X$ of solutions of the forward maps with norms $\|\cdot\|_{\mathcal{X}}$ and $\|\cdot\|_X$, respectively. Consider the abstract (possibly nonlinear) operator equation (27.5), where the uncertain operator $A(\cdot; u) \in C^1(X, Y')$ is assumed to be boundedly invertible, at least locally for uncertain inputs $u$ sufficiently close to a nominal input $\langle u\rangle \in \mathcal{X}$, i.e., for $\|u - \langle u\rangle\|_{\mathcal{X}}$ sufficiently small, so that, for such $u$, the response of the forward problem (27.5) is uniquely defined. Define the forward response map, which relates a given uncertain input $u$ and a given forcing $F$ to the response $q$ in (27.5), by

To ease notation, one does not list the dependence of the response on $F$ and simply denotes the dependence of the forward solution on the uncertain input as $q(u) = G(u)$. Assume given an observation functional $\mathcal{O}(\cdot): X \to Y$, a bounded linear observation operator from the space $X$ of observed system responses into the space $Y$ of observations. Throughout the remainder of this chapter, one assumes that there is a finite number $K$ of sensors, so that $Y = \mathbb{R}^K$ with $K < \infty$. Then $\mathcal{O} \in \mathcal{L}(X; Y) \simeq (X')^K$. One equips $Y = \mathbb{R}^K$ with the Euclidean norm, denoted by $|\cdot|$. For example, $\mathcal{O}(\cdot)$ may be a $K$-vector of observation functionals, $\mathcal{O}(\cdot) = (o_k(\cdot))_{k=1}^K$.
In this setting, one wishes to predict computationally the expected value (under the Bayesian posterior) of a system response QoI, conditional on given, noisy measurement data $\delta$. Specifically, the data is assumed to consist of observations of system responses in the data space $Y$, corrupted by additive observation noise, e.g., by a realization $\eta$ of a random variable taking values in $Y$ with law $\mathbb{Q}_0$. One assumes additive, centered Gaussian noise on the observed data $\delta \in Y$. That is, the data $\delta$ is composed of the observed system response and the additive noise according to $\delta = \mathcal{O}(G(u)) + \eta \in Y$. One assumes that $Y = \mathbb{R}^K$ and that $\eta$ is Gaussian, i.e., a random vector $\eta \sim \mathbb{Q}_0 = N(0, \Gamma)$ with a positive definite covariance $\Gamma$,

where $L^2_\Gamma(\mathbb{R}^K)$ denotes the space of random vectors taking values in $Y = \mathbb{R}^K$ which are square integrable with respect to the Gaussian measure on $Y = \mathbb{R}^K$. Bayes' formula [32, 73] yields a density of the Bayesian posterior with respect to the prior whose negative log-likelihood equals the observation-noise covariance-weighted least-squares functional (also referred to as potential in what follows) $\Phi: \mathcal{X} \times Y \to \mathbb{R}$, $\Phi(u; \delta) = \frac12 |\delta - G(u)|^2_\Gamma$, i.e.,

$$\Phi(u; \delta) = \frac{1}{2} \big|\delta - G(u)\big|_\Gamma^2 := \frac{1}{2} \big(\delta - G(u)\big)^\top \Gamma^{-1} \big(\delta - G(u)\big). \qquad (27.138)$$
$$\frac{d\mu^\delta}{d\mu_0}(u) = \frac{1}{Z} \exp\big(-\Phi(u; \delta)\big). \qquad (27.139)$$

In particular, the Radon-Nikodym derivative of the Bayesian posterior $\mu^\delta$ w.r.t. the prior measure admits a bounded density w.r.t. the prior $\mu_0$, which is denoted by $\Theta$ and which is given by (27.139).
$$(U, \mathcal{B}, \mu_0). \qquad (27.140)$$

To ease notation, one assumes throughout what follows that the prior measure $\mu_0$ on the uncertain input $u \in X$, parametrized in the form (27.13), is the uniform measure (the ensuing derivations remain applicable if $\mu_0$ is absolutely continuous with respect to the uniform measure, with a smooth and bounded density). Since $\mu_0$ is a countable product probability measure, this assumption implies the statistical independence of the coordinates $y_j$ in the parametrization (27.13). With the parameter domain $U$ as in (27.140), the parametric uncertainty-to-observation map $\mathcal{G}: U \to Y = \mathbb{R}^K$ is given by

$$\mathcal{G}(y) = G(u)\big|_{u = \langle u\rangle + \sum_{j\in J} y_j \psi_j}. \qquad (27.141)$$
$$\frac{d\mu^\delta}{d\mu_0}(y) = \frac{1}{Z}\, \Theta(y), \qquad (27.142)$$

$$\Theta(y) = \exp\big(-\Phi(u; \delta)\big)\Big|_{u = \langle u\rangle + \sum_{j\in J} y_j \psi_j}, \qquad (27.143)$$

$$Z = \mathbb{E}^{\mu_0}[\Theta] = \int_U \Theta(y)\, d\mu_0(y) > 0. \qquad (27.144)$$

$$\Psi(y) = \Theta(y)\,\phi(u)\big|_{u = \langle u\rangle + \sum_{j\in J} y_j \psi_j} = \exp\big(-\Phi(u; \delta)\big)\,\phi(u)\Big|_{u = \langle u\rangle + \sum_{j\in J} y_j \psi_j} : U \to \mathcal{Z}. \qquad (27.145)$$

Then the Bayesian estimate of the QoI $\phi$, given noisy observation data $\delta$, reads

$$\mathbb{E}^{\mu^\delta}[\phi] = Z'/Z, \qquad Z' := \int_{y\in U} \Psi(y)\, \mu_0(dy), \qquad Z := \int_{y\in U} \Theta(y)\, \mu_0(dy). \qquad (27.146)$$

The task in computational Bayesian estimation is therefore to approximate the ratio $Z'/Z \in \mathcal{Z}$ in (27.146). In the parametrization with respect to $y \in U$, $Z$ and $Z'$ take the form of infinite-dimensional iterated integrals with respect to the prior $\mu_0(dy)$.
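The ratio estimate $Z'/Z$ in (27.146) can be sketched with plain Monte Carlo over a uniform prior on $U = [-1,1]^J$; the forward map, QoI, and data below are toy stand-ins for a (surrogate) forward solver and actual observations.

```python
import numpy as np

rng = np.random.default_rng(5)
J, K = 4, 2
A = rng.standard_normal((K, J))
G = lambda y: A @ y                     # uncertainty-to-observation map (toy)
phi = lambda y: y[0]                    # quantity of interest (toy)
Gamma_inv = np.eye(K)                   # noise precision
y_true = rng.uniform(-1, 1, J)
delta = G(y_true) + 0.1 * rng.standard_normal(K)   # synthetic noisy data

def potential(y):
    """Phi(y; delta) = 0.5*(delta - G(y))^T Gamma^{-1} (delta - G(y)), cf. (27.138)."""
    r = delta - G(y)
    return 0.5 * r @ Gamma_inv @ r

M = 20000
Y = rng.uniform(-1, 1, (M, J))          # samples from the uniform prior
Theta = np.exp(-np.array([potential(y) for y in Y]))
Zp = np.mean([phi(y) * t for y, t in zip(Y, Theta)])   # ~ Z'
Z = Theta.mean()                                        # ~ Z
print(Zp / Z)                            # posterior mean of the QoI
```

In practice, the expensive part is each evaluation of the potential (a forward solve), which is exactly what the MOR surrogates and sparse quadratures of this chapter accelerate.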
$> 0$) which are monotone and non-decreasing separately in each argument, such that for all $u \in \tilde{X}$ and for all $\delta, \delta_1, \delta_2 \in B_Y(0, r)$

and

$$|\Phi(u; \delta_1) - \Phi(u; \delta_2)| \le M_2(r, \|u\|_X)\, \|\delta_1 - \delta_2\|_Y. \qquad (27.148)$$

Under Assumption 4, the expectation (27.146) depends Lipschitz continuously on $\delta$ (see [32, Sec. 4.1] for a proof):

$$\forall \phi \in L^2(\mu^{\delta_1}, X; \mathbb{R}) \cap L^2(\mu^{\delta_2}, X; \mathbb{R}): \quad \|\mathbb{E}^{\mu^{\delta_1}}[\phi] - \mathbb{E}^{\mu^{\delta_2}}[\phi]\|_{\mathcal{Z}} \le C(\Gamma, r)\, \|\delta_1 - \delta_2\|_Y. \qquad (27.149)$$
Below, one is interested in the impact of approximation errors in the forward response of the system (e.g., due to discretization and approximate numerical solution of system responses) on the Bayesian predictions (27.146). For continuity of the expectations (27.146) w.r.t. changes in the potential, the following assumption is imposed:

$$|\Phi(u; \delta) - \Phi^N(u; \delta)| \le M_2(r, \|u\|_X)\, \|\delta\|_Y\, \psi(N), \qquad (27.150)$$

where $\psi(N) \to 0$ as $N \to \infty$.
Proposition 3. Under Assumption 5, and under the assumption that for $\tilde{X} \subseteq \mathcal{X}$ and for some bounded $B \subset \mathcal{X}$ one has $\mu_0(\tilde{X} \cap B) > 0$ and the consistency condition (27.152) holds, there holds, for every QoI $\phi: \mathcal{X} \to \mathcal{Z}$ such that $\phi \in L^2(\mu^\delta, \mathcal{X}; \mathcal{Z}) \cap L^2(\mu^{\delta,N}, \mathcal{X}; \mathcal{Z})$ uniformly w.r.t. $N$, that $Z > 0$ in (27.144) and

$$\|\mathbb{E}^{\mu^\delta}[\phi] - \mathbb{E}^{\mu^{\delta,N}}[\phi]\|_{\mathcal{Z}} \le C(\Gamma, r)\, \|\delta\|_Y\, \psi(N). \qquad (27.151)$$

(Note that the convergence rate $s$ in such bounds can be substantially higher than the rate $1/2$ afforded by MCMC methods; cp. Sect. 4.4 and [32, Sections 5.1, 5.2].) Here the approximate potential is

$$\Phi^N(u; \delta) = \frac{1}{2}\big(\delta - G^N(u)\big)^\top \Gamma^{-1}\big(\delta - G^N(u)\big) : \mathcal{X} \times Y \to \mathbb{R}. \qquad (27.153)$$
The preceding result shows that the consistency condition (27.152) for the approximate forward map $q^N$ ensures the corresponding consistency of the Bayesian estimate $\mathbb{E}^{\mu^{\delta,N}}[\phi]$, due to (27.151). Note that so far no specific assumption on the nature of the approximation of the forward map has been made. Using a MOR surrogate of the parametric forward model, Theorem 5 allows one to bound $\psi(N)$ in (27.152) by

$$\psi(N) \le \triangle_N \lesssim d_N(\mathcal{M}_h; X_h). \qquad (27.154)$$
$U \ni y \mapsto q(y) \in X$. The theory in [47] can be extended using the consistency error bound of Assumption 5 in the Bayesian potential and Proposition 4 for the RB error in the forward map. Practical applications and implementations of RB with MCMC for (Bayesian) inverse problems can be found, for instance, in [30, 51].
$$\mathcal{N}(\Lambda) := \{\nu \notin \Lambda : \nu - e_j \in \Lambda\ \forall j \in \mathrm{supp}(\nu), \text{ and } \nu_j = 0\ \forall j > j(\Lambda) + 1\}$$

for any downward closed index set $\Lambda \subset \mathcal{F}$ of currently active gpc modes, where $j(\Lambda) := \max\{j : \nu_j > 0 \text{ for some } \nu \in \Lambda\}$. This heuristic approach aims at controlling the global approximation error by locally collecting the indices, among the current set of neighbors, with the largest estimated error contributions. In the following, the resulting algorithm to recursively build the downward closed index set $\Lambda$ in the Smolyak quadrature, adapted to the posterior density $\Theta$ (and, due to the explicit expression (27.143) from Bayes' formula, also to the observation data $\delta$), is summarized. The reader is referred to [42, 66, 68] for details and numerical results. The development and analysis of the combination of RB, MOR, and ASG for Bayesian inversion are described in depth in [21-23], for both linear and nonlinear, affine and nonaffine parametric problems.
1: function ASG
2:   Set $\Lambda_1 = \{0\}$, $k = 1$, and compute $\Delta_0(\Theta)$.
3:   Determine the reduced set (27.155) of neighbors $\mathcal{N}(\Lambda_1)$.
4:   Compute $\Delta_\nu(\Theta)$, $\forall \nu \in \mathcal{N}(\Lambda_1)$.
5:   while $\sum_{\nu \in \mathcal{N}(\Lambda_k)} \|\Delta_\nu(\Theta)\|_S > tol$ do
6:     Select $\nu^*$ from $\mathcal{N}(\Lambda_k)$ with largest $\|\Delta_{\nu^*}\|_S$ and set $\Lambda_{k+1} = \Lambda_k \cup \{\nu^*\}$.
7:     Determine the reduced set (27.155) of neighbors $\mathcal{N}(\Lambda_{k+1})$.
8:     Compute $\Delta_\nu(\Theta)$, $\forall \nu \in \mathcal{N}(\Lambda_{k+1})$.
9:     Set $k = k + 1$.
10:  end while
11: end function
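The adaptive loop can be sketched as follows in a finite number of dimensions; the error indicator is a user-supplied stand-in for the Smolyak quadrature differences $\|\Delta_\nu(\Theta)\|_S$, and the neighbor-truncation rule of (27.155) is simplified to the basic admissibility condition.

```python
def neighbors(Lam, J):
    """Admissible neighbors: nu outside Lam whose backward neighbors all lie in Lam."""
    out = set()
    for nu in Lam:
        for j in range(J):
            cand = tuple(n + 1 if i == j else n for i, n in enumerate(nu))
            if cand in Lam or cand in out:
                continue
            if all(tuple(c - 1 if i == k else c for i, c in enumerate(cand)) in Lam
                   for k in range(J) if cand[k] > 0):
                out.add(cand)
    return out

def asg(indicator, J, tol, max_steps=200):
    """Grow a downward closed index set, adding the neighbor with the largest
    estimated error contribution until the total neighbor estimate is below tol."""
    Lam = {(0,) * J}
    for _ in range(max_steps):
        errs = {nu: indicator(nu) for nu in neighbors(Lam, J)}
        if sum(errs.values()) <= tol:
            break
        Lam.add(max(errs, key=errs.get))   # largest estimated contribution
        # the enlarged set stays downward closed by the admissibility condition
    return Lam

# anisotropic, exponentially decaying toy indicator
indicator = lambda nu: 2.0 ** (-sum((j + 1) * n for j, n in enumerate(nu)))
Lam = asg(indicator, J=3, tol=1e-2)
print(len(Lam), (0, 0, 0) in Lam)
```

Because only admissible indices are ever added, the returned set is downward closed by construction, which is the property the Smolyak quadrature needs.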
5 Software
and test of new algorithms. The code and an accompanying textbook [59] are
available through the link:
http://augustine.mit.edu/methodology/methodology_rbMIT_System.htm
RBniCS: An RB extension of the Finite Element software package FEniCS
[53] is under development and public domain through the link:http://mathlab.
sissa.it/rbnics. Implementation includes POD and greedy algorithm for coercive
problems, which is suited for an introductory course of RB together with the
book [44].
Dune-RB: It is a module for the Dune (Distributed and Unified Numerics
Environment) library in C++. Template classes are available for RB construction
based on several HiFi discretizations, including Finite Element and Finite
Volume. Parallelization is available for RB construction too. Tutorials and code
are available at http://www.dune-project.org/.
pyMOR: A Python library for MOR of parameterized PDEs. It provides
user-friendly interfaces and clean integration with external high-dimensional
PDE solvers; finite element and finite volume discretizations built on
NumPy/SciPy are also included. For more information, see http://pymor.org.
AKSELOS: MOR is the core technology of the startup company AKSELOS,
which serves several engineering fields such as port infrastructure and industrial
machinery. Different components are included, such as FEA and CAD. The HiFi
solution in AKSELOS is implemented on an HPC- and cloud-based simulation
platform and is available for commercial use. For more information, see http://
www.akselos.com/.
For further libraries/software packages, we refer to http://www.ians.uni-stuttgart.de/MoRePaS/software/.
6 Conclusion
7 Glossary
References
1. Abgrall, R., Amsallem, D.: Robust model reduction by L¹-norm minimization and approximation via dictionaries: application to linear and nonlinear hyperbolic problems. Technical report (2015)
2. Barrault, M., Maday, Y., Nguyen, N.C., Patera, A.T.: An empirical interpolation method: application to efficient reduced-basis discretization of partial differential equations. Comptes Rendus Mathématique, Analyse Numérique 339(9), 667-672 (2004)
3. Beirão da Veiga, L., Buffa, A., Sangalli, G., Vázquez, R.: Mathematical analysis of variational isogeometric methods. Acta Numer. 23, 157-287 (2014)
4. Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., Wojtaszczyk, P.: Convergence rates for greedy algorithms in reduced basis methods. SIAM J. Math. Anal. 43(3), 1457-1472 (2011)
5. Brezzi, F., Rappaz, J., Raviart, P.-A.: Finite-dimensional approximation of nonlinear problems. I. Branches of nonsingular solutions. Numer. Math. 36(1), 1-25 (1980/1981)
6. Buffa, A., Maday, Y., Patera, A.T., Prud'homme, C., Turinici, G.: A priori convergence of the greedy algorithm for the parametrized reduced basis method. ESAIM: Math. Modell. Numer. Anal. 46(3), 595-603 (2012)
7. Bui-Thanh, T., Damodaran, M., Willcox, K.E.: Aerodynamic data reconstruction and inverse design using proper orthogonal decomposition. AIAA J. 42(8), 1505-1516 (2004)
8. Bui-Thanh, T., Ghattas, O., Martin, J., Stadler, G.: A computational framework for infinite-dimensional Bayesian inverse problems. Part I: the linearized case, with application to global seismic inversion. SIAM J. Sci. Comput. 35(6), A2494-A2523 (2013)
9. Bui-Thanh, T., Willcox, K., Ghattas, O.: Model reduction for large-scale systems with high-dimensional parametric input space. SIAM J. Sci. Comput. 30(6), 3270-3288 (2008)
10. Carlberg, K., Bou-Mosleh, C., Farhat, C.: Efficient non-linear model reduction via a least-squares Petrov-Galerkin projection and compressive tensor approximations. Int. J. Numer. Methods Eng. 86(2), 155-181 (2011)
11. Chatterjee, A.: An introduction to the proper orthogonal decomposition. Curr. Sci. 78(7), 808-817 (2000)
12. Chaturantabut, S., Sorensen, D.C.: Nonlinear model reduction via discrete empirical interpolation. SIAM J. Sci. Comput. 32(5), 2737-2764 (2010)
13. Chen, P., Quarteroni, A.: Accurate and efficient evaluation of failure probability for partial differential equations with random input data. Comput. Methods Appl. Mech. Eng. 267, 233-260 (2013)
14. Chen, P., Quarteroni, A.: Weighted reduced basis method for stochastic optimal control problems with elliptic PDE constraints. SIAM/ASA J. Uncertain. Quantif. 2(1), 364-396 (2014)
15. Chen, P., Quarteroni, A.: A new algorithm for high-dimensional uncertainty quantification based on dimension-adaptive sparse grid approximation and reduced basis methods. J. Comput. Phys. 298, 176-193 (2015)
16. Chen, P., Quarteroni, A., Rozza, G.: A weighted reduced basis method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. 51(6), 3163-3185 (2013)
17. Chen, P., Quarteroni, A., Rozza, G.: Comparison of reduced basis and stochastic collocation methods for elliptic problems. J. Sci. Comput. 59, 187-216 (2014)
18. Chen, P., Quarteroni, A., Rozza, G.: A weighted empirical interpolation method: a priori convergence analysis and applications. ESAIM: Math. Modell. Numer. Anal. 48, 943-953 (2014)
19. Chen, P., Quarteroni, A., Rozza, G.: Multilevel and weighted reduced basis method for stochastic optimal control problems constrained by Stokes equations. Numerische Mathematik 133(1), 67-102 (2015)
20. Chen, P., Quarteroni, A., Rozza, G.: Reduced order methods for uncertainty quantification problems. Report 2015-03, Seminar for Applied Mathematics, ETH Zürich (2015, Submitted)
21. Chen, P., Schwab, Ch.: Sparse-grid, reduced-basis Bayesian inversion. Comput. Methods Appl. Mech. Eng. 297, 84-115 (2015)
22. Chen, P., Schwab, Ch.: Adaptive sparse grid model order reduction for fast Bayesian estimation and inversion. In: Garcke, J., Pflüger, D. (eds.) Sparse Grids and Applications - Stuttgart 2014, pp. 1-27. Springer, Cham (2016)
23. Chen, P., Schwab, Ch.: Sparse-grid, reduced-basis Bayesian inversion: nonaffine-parametric nonlinear equations. J. Comput. Phys. 316, 470-503 (2016)
24. Cheng, M., Hou, T.Y., Zhang, Z.: A dynamically bi-orthogonal method for time-dependent stochastic partial differential equations I: derivation and algorithms. J. Comput. Phys. 242, 843-868 (2013)
25. Chkifa, A., Cohen, A., DeVore, R., Schwab, Ch.: Adaptive algorithms for sparse polynomial approximation of parametric and stochastic elliptic PDEs. M2AN Math. Mod. Num. Anal. 47(1), 253-280 (2013)
26. Chkifa, A., Cohen, A., Schwab, Ch.: High-dimensional adaptive sparse polynomial interpolation and applications to parametric PDEs. J. Found. Comput. Math. 14(4), 601-633 (2013)
27. Ciesielski, Z., Domsta, J.: Construction of an orthonormal basis in C^m(I^d) and W_p^m(I^d). Studia Math. 41, 211-224 (1972)
28. Cohen, A., Chkifa, A., Schwab, Ch.: Breaking the curse of dimensionality in sparse polynomial approximation of parametric PDEs. J. Math. Pures et Appliquées 103(2), 400-428 (2015)
29. Cohen, A., DeVore, R.: Kolmogorov widths under holomorphic mappings. IMA J. Numer. Anal. (2015). doi:10.1093/imanum/dru066
30. Cui, T., Marzouk, Y.M., Willcox, K.E.: Data-driven model reduction for the Bayesian solution of inverse problems (2014). arXiv preprint arXiv:1403.4290
31. Dahmen, W., Plesken, C., Welper, G.: Double greedy algorithms: reduced basis methods for transport dominated problems. ESAIM: Math. Modell. Numer. Anal. 48(3), 623-663 (2014)
32. Dashti, M., Stuart, A.M.: The Bayesian approach to inverse problems. In: Ghanem, R., et al. (eds.) Handbook of UQ (2016). http://www.springer.com/us/book/9783319123844
33. Deuflhard, P.: Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms, vol. 35. Springer, Berlin/New York (2011)
34. DeVore, R., Petrova, G., Wojtaszczyk, P.: Greedy algorithms for reduced bases in Banach spaces. Constr. Approx. 37(3), 455-466 (2013)
35. Dick, J., Gantner, R., LeGia, Q.T., Schwab, Ch.: Higher order Quasi-Monte Carlo integration for Bayesian inversion of holomorphic, parametric operator equations. Technical report, Seminar for Applied Mathematics, ETH Zürich (2015)
36. Dick, J., Kuo, F.Y., Le Gia, Q.T., Nuyens, D., Schwab, Ch.: Higher order QMC Petrov-Galerkin discretization for affine parametric operator equations with random field inputs. SIAM J. Numer. Anal. 52(6), 2676-2702 (2014)
37. Dick, J., Kuo, F.Y., LeGia, Q.T., Schwab, C.: Multi-level higher order QMC Galerkin discretization for affine parametric operator equations. SIAM J. Numer. Anal. (2016, to appear)
38. Dick, J., Kuo, F.Y., Sloan, I.H.: High-dimensional integration: the Quasi-Monte Carlo way. Acta Numer. 22, 133-288 (2013)
39. Dick, J., LeGia, Q.T., Schwab, Ch.: Higher order Quasi-Monte Carlo integration for holomorphic, parametric operator equations. SIAM/ASA J. Uncertain. Quantif. 4(1), 48-79 (2016)
40. Drohmann, M., Haasdonk, B., Ohlberger, M.: Reduced basis approximation for nonlinear parametrized evolution equations based on empirical operator interpolation. SIAM J. Sci. Comput. 34(2), A937-A969 (2012)
41. Gantner, R.N., Schwab, Ch.: Computational higher order Quasi-Monte Carlo integration. Technical report 2014-25, Seminar for Applied Mathematics, ETH Zürich (2014)
42. Gerstner, T., Griebel, M.: Dimension-adaptive tensor-product quadrature. Computing 71(1), 65-87 (2003)
27 Model Order Reduction Methods in Computational Uncertainty Quantification 989
43. Hesthaven, J., Stamm, B., Zhang, S.: Efficient greedy algorithms for high-dimensional parameter spaces with applications to empirical interpolation and reduced basis methods. ESAIM: Math. Modell. Numer. Anal. 48(1), 259-283 (2011)
44. Hesthaven, J.S., Rozza, G., Stamm, B.: Certified Reduced Basis Methods for Parametrized Partial Differential Equations. Springer Briefs in Mathematics. Springer, Cham (2016)
45. Hesthaven, J.S., Stamm, B., Zhang, S.: Certified reduced basis method for the electric field integral equation. SIAM J. Sci. Comput. 34(3), A1777-A1799 (2012)
46. Hoang, V.H., Schwab, Ch.: n-term Wiener chaos approximation rates for elliptic PDEs with lognormal Gaussian random inputs. Math. Mod. Methods Appl. Sci. 24(4), 797-826 (2014)
47. Hoang, V.H., Schwab, Ch., Stuart, A.: Complexity analysis of accelerated MCMC methods for Bayesian inversion. Inverse Probl. 29(8), 085010 (2013)
48. Huynh, D.B.P., Knezevic, D.J., Chen, Y., Hesthaven, J.S., Patera, A.T.: A natural-norm successive constraint method for inf-sup lower bounds. Comput. Methods Appl. Mech. Eng. 199(29), 1963-1975 (2010)
49. Huynh, D.B.P., Rozza, G., Sen, S., Patera, A.T.: A successive constraint linear optimization method for lower bounds of parametric coercivity and inf-sup stability constants. Comptes Rendus Mathématique, Analyse Numérique 345(8), 473-478 (2007)
50. Lassila, T., Manzoni, A., Quarteroni, A., Rozza, G.: Generalized reduced basis methods and n-width estimates for the approximation of the solution manifold of parametric PDEs. In: Brezzi, F., Colli Franzone, P., Gianazza, U., Gilardi, G. (eds.) Analysis and Numerics of Partial Differential Equations. Springer INdAM Series, vol. 4, pp. 307-329. Springer, Milan (2013)
51. Lassila, T., Manzoni, A., Quarteroni, A., Rozza, G.: A reduced computational and geometrical framework for inverse problems in hemodynamics. Int. J. Numer. Methods Biomed. Eng. 29(7), 741-776 (2013)
52. Lassila, T., Rozza, G.: Parametric free-form shape design with PDE models and reduced basis method. Comput. Methods Appl. Mech. Eng. 199(23), 1583-1592 (2010)
53. Logg, A., Mardal, K.A., Wells, G.: Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book, vol. 84. Springer, Berlin/New York (2012)
54. Ma, X., Zabaras, N.: An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. J. Comput. Phys. 228(8), 3084-3113 (2009)
55. Maday, Y., Mula, O., Patera, A.T., Yano, M.: The generalized empirical interpolation method: stability theory on Hilbert spaces with an application to the Stokes equation. Comput. Methods Appl. Mech. Eng. 287, 310-334 (2015)
56. Maday, Y., Mula, O., Turinici, G.: A priori convergence of the generalized empirical interpolation method. In: 10th International Conference on Sampling Theory and Applications (SampTA 2013), Bremen, pp. 168-171 (2013)
57. Maday, Y., Nguyen, N.C., Patera, A.T., Pau, G.S.H.: A general, multipurpose interpolation procedure: the magic points. Commun. Pure Appl. Anal. 8(1), 383-404 (2009)
58. Maday, Y., Patera, A.T., Turinici, G.: A priori convergence theory for reduced-basis approximations of single-parameter elliptic partial differential equations. J. Sci. Comput. 17(1), 437-446 (2002)
59. Patera, A.T., Rozza, G.: Reduced basis approximation and a posteriori error estimation for parametrized partial differential equations. Copyright MIT, http://augustine.mit.edu (2007)
60. Pousin, J., Rappaz, J.: Consistency, stability, a priori and a posteriori errors for Petrov-Galerkin methods applied to nonlinear problems. Numerische Mathematik 69(2), 213-231 (1994)
61. Prud'homme, C., Maday, Y., Patera, A.T., Turinici, G., Rovas, D.V., Veroy, K., Machiels, L.: Reliable real-time solution of parametrized partial differential equations: reduced-basis output bound methods. J. Fluids Eng. 124(1), 70-80 (2002)
62. Quarteroni, A.: Numerical Models for Differential Problems, 2nd edn. Springer, Milano (2013)
63. Rozza, G., Huynh, D.B.P., Patera, A.T.: Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations. Arch. Comput. Methods Eng. 15(3), 229-275 (2008)
64. Rozza, G., Veroy, K.: On the stability of the reduced basis method for Stokes equations in parametrized domains. Comput. Methods Appl. Mech. Eng. 196(7), 1244-1260 (2007)
65. Sapsis, T.P., Lermusiaux, P.F.J.: Dynamically orthogonal field equations for continuous stochastic dynamical systems. Phys. D: Nonlinear Phenom. 238(23), 2347-2360 (2009)
66. Schillings, C., Schwab, Ch.: Sparse, adaptive Smolyak quadratures for Bayesian inverse problems. Inverse Probl. 29(6), 065011 (2013)
67. Schillings, C., Schwab, Ch.: Scaling limits in computational Bayesian inversion. In: ESAIM: M2AN (2014, to appear). http://dx.doi.org/10.1051/m2an/2016005
68. Schillings, C., Schwab, Ch.: Sparsity in Bayesian inversion of parametric operator equations. Inverse Probl. 30(6), 065007 (2014)
69. Schwab, Ch., Gittelson, C.: Sparse tensor discretizations of high-dimensional parametric and stochastic PDEs. Acta Numerica 20, 291-467 (2011)
70. Schwab, Ch., Stevenson, R.: Space-time adaptive wavelet methods for parabolic evolution problems. Math. Comput. 78(267), 1293-1318 (2009)
71. Schwab, Ch., Stuart, A.M.: Sparse deterministic approximation of Bayesian inverse problems. Inverse Probl. 28(4), 045003 (2012)
72. Schwab, Ch., Todor, R.A.: Karhunen-Loève approximation of random fields by generalized fast multipole methods. J. Comput. Phys. 217(1), 100-122 (2006)
73. Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numerica 19(1), 451-559 (2010)
74. Taddei, T., Perotto, S., Quarteroni, A.: Reduced basis techniques for nonlinear conservation laws. ESAIM: Math. Modell. Numer. Anal. 49(3), 787-814 (2015)
75. Willcox, K., Peraire, J.: Balanced model reduction via the proper orthogonal decomposition. AIAA J. 40(11), 2323-2330 (2002)
28 Multifidelity Uncertainty Quantification Using Spectral Stochastic Discrepancy Models
Abstract
When faced with a restrictive evaluation budget that is typical of today's high-
fidelity simulation models, the effective exploitation of lower-fidelity alternatives
within the uncertainty quantification (UQ) process becomes critically important.
Herein, we explore the use of multifidelity modeling within UQ, for which
we rigorously combine information from multiple simulation-based models
within a hierarchy of fidelity, in seeking accurate high-fidelity statistics at lower
computational cost. Motivated by correction functions that enable the provable
convergence of a multifidelity optimization approach to an optimal high-fidelity
point solution, we extend these ideas to discrepancy modeling within a stochas-
tic domain and seek convergence of a multifidelity uncertainty quantification
process to globally integrated high-fidelity statistics. For constructing stochastic
models of both the low-fidelity model and the model discrepancy, we employ
stochastic expansion methods (non-intrusive polynomial chaos and stochastic
collocation) computed by integration/interpolation on structured sparse grids
or regularized regression on unstructured grids. We seek to employ a coarsely
resolved grid for the discrepancy in combination with a more finely resolved
grid for the low-fidelity model. The resolutions of these grids may be defined
statically or determined through uniform and adaptive refinement processes.
Adaptive refinement is particularly attractive, as it has the ability to preferentially
target stochastic regions where the model discrepancy becomes more complex,
i.e., where the predictive capabilities of the low-fidelity model start to break down
and greater reliance on the high-fidelity model (via the discrepancy) is necessary.
These adaptive refinement processes can either be performed separately for the
different grids or within a coordinated multifidelity algorithm. In particular,
we present an adaptive greedy multifidelity approach in which we extend the
generalized sparse grid concept to consider candidate index set refinements
drawn from multiple sparse grids, as governed by induced changes in the
statistical quantities of interest and normalized by relative computational cost.
Through a series of numerical experiments using statically defined sparse grids,
adaptive multifidelity sparse grids, and multifidelity compressed sensing, we
demonstrate that the multifidelity UQ process converges more rapidly than a
single-fidelity UQ in cases where the variance of the discrepancy is reduced
relative to the variance of the high-fidelity model (resulting in reductions in initial
stochastic error), where the spectrum of the expansion coefficients of the model
discrepancy decays more rapidly than that of the high-fidelity model (resulting
in accelerated convergence rates), and/or where the discrepancy is more sparse
than the high-fidelity model (requiring the recovery of fewer significant terms).
Keywords
Multifidelity · Uncertainty quantification · Discrepancy model · Polynomial chaos · Stochastic collocation · Sparse grid · Compressed sensing · Wind turbine
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 993
2 Stochastic Expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 996
2.1 Non-intrusive Polynomial Chaos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 997
2.2 Stochastic Collocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 999
2.3 Sparse Grid Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1001
2.4 Compressed Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1003
3 Multifidelity Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1004
3.1 Corrected Low-Fidelity Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1005
3.2 Sparse Grids with Predefined Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1006
3.3 Adaptive Sparse Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1006
3.4 Compressed Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1009
3.5 Analytic Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1010
4 Computational Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1012
4.1 Simple One-Dimensional Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1012
4.2 Short Column Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1013
4.3 Elliptic PDE Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1017
4.4 Horn Acoustics Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1022
4.5 Production Engineering Example: Vertical-Axis Wind Turbine . . . . . . . . . . . . . . . 1025
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1031
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1034
28 Multifidelity Uncertainty Quantification Using Spectral Stochastic . . . 993
1 Introduction
The rapid advancement of both computer hardware and physics simulation capabil-
ities has revolutionized science and engineering, placing computational simulation
on an equal footing with theoretical analysis and physical experimentation. This
rapidly increasing reliance on the predictive capabilities of computational models
has created the need for rigorous quantification of the effect that all types of
uncertainties have on these predictions, in order to inform decisions and design
processes with critical information on quantified variability and risk. A variety
of challenges arise when deploying uncertainty quantification (UQ) to ever more
complex physical systems. Foremost among these is the intersection of UQ prob-
lem scale with simulation resource constraints. When pushing the boundaries of
computational science using complex three-dimensional (multi-)physics simulation
capabilities, a common result is a highest-fidelity simulation that can be performed
only a very limited number of times, even when executing on leadership-class
parallel computers. Combining this state with the significant number of uncertainty
sources (often in the hundreds, thousands, or beyond) that exist in these complex
applications can result in an untenable proposition for UQ or at least a state of
significant imbalance in which resolution of spatiotemporal physics has been given
much higher priority than resolution of quantities of interest (QoIs) throughout
the stochastic domain. This primary challenge can be further compounded by a
number of factors, including (i) the presence of a mixture of aleatory and epistemic
uncertainties, (ii) the need to evaluate the probability of rare events, and/or (iii) the
presence of nonsmoothness in the variation of the response QoIs over the range
of the random parameters. In these challenging cases, our most sophisticated UQ
algorithms may be inadequate on their own for affordably generating accurate high-
fidelity statistics; we need to additionally exploit opportunities that exist within a
hierarchy of model fidelity.
The computational simulation of a particular physical phenomenon often has
multiple discrete model selection possibilities. Here, we will broadly characterize
this situation into two classes: a hierarchy of model fidelity and an ensemble of
model peers. In the former case of a hierarchy, a clear preference structure exists
among the models such that high-fidelity and low-fidelity judgments are readily
assigned. Here, the goal is to manage the trade-off between accuracy and expense
among the different model fidelities in order to achieve high-quality statistical
results at lower cost. In the latter case, a clear preference structure is lacking
and there is additional uncertainty created by the lack of a best model. In this
case, the goal becomes one of management and propagation of this model form
uncertainty [15] or, in the presence of experimental data for performing inference,
reducing the model form uncertainty through formal model selection processes [10,
37]. Taking computational fluid dynamics (CFD) applied to a canonical cavity flow
as an example, Fig. 28.1 depicts a multifidelity hierarchy that may include invis-
cid and boundary-element-based simulations, Reynolds averaged Navier-Stokes
(RANS), unsteady RANS, large-eddy simulation (LES), and direct numerical
simulation (DNS). Peers at a given level (e.g., RANS and LES) involve a variety
994 M.S. Eldred et al.
[Figure: (a) sketch of a cavity flow with uniform inflow, potential flow regions, a vortex sheet, and cavity acoustic waves, with potential flow, RANS, and hybrid RANS/LES model annotations; (b) diagram of increasing model fidelity, from a potential flow/vortex sheet model through one-equation, two-equation, and Reynolds-stress RANS models to Smagorinsky and dynamic LES models.]
Fig. 28.1 Example of an ensemble of CFD models for a cavity flow problem (a) and hierarchy
of fidelity and ensemble of peer alternatives (b). Observational data can be used to inform model
selection among peers (red boxes)
2 Stochastic Expansions
In polynomial chaos, one must estimate the chaos coefficients for a set of basis
functions. To reduce the nonlinearity of the expansion and improve convergence,
the polynomial bases are chosen such that their orthogonality weighting functions
match the probability density functions of the stochastic parameters up to a constant
factor [43]. The basis functions are typically obtained from the Askey family of
hypergeometric orthogonal polynomials [4], and Table 28.1 lists the appropriate
polynomial bases for some commonly used continuous probability distributions. If
the stochastic parameters do not follow these standard probability distributions, then
the polynomial bases may be generated numerically [19, 25, 41]. Alternatively, or if
correlations are present, variable transformations [1, 12, 39] may be used. These
transformations provide support for statistical independence (e.g., decorrelation of
standard normal distributions as in [12]) and can also enable the use of nested
quadrature rules such as Gauss-Patterson and Genz-Keister for uniform and normal
distributions, respectively, within the transformed space.
Let $R(\xi)$ be a black box that takes $d$ stochastic parameters $\xi = (\xi_1, \ldots, \xi_d)$
as inputs and returns the system response $R$ as the output.
The system response is approximated by the expansion
$$R(\xi) \approx \sum_{i \in I_p} \alpha_i \Psi_i(\xi), \qquad (28.1)$$
Table 28.1 Some standard continuous probability distributions and their corresponding Askey
polynomial bases. $B(\cdot,\cdot)$ is the Beta function and $\Gamma(\cdot)$ is the Gamma function

| Distribution | Density function | Polynomial basis | Orthogonality weight | Support |
| Normal | $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ | Hermite $He_n(x)$ | $e^{-x^2/2}$ | $[-\infty, \infty]$ |
| Uniform | $\frac{1}{2}$ | Legendre $P_n(x)$ | $1$ | $[-1, 1]$ |
| Beta | $\frac{(1-x)^{\alpha}(1+x)^{\beta}}{2^{\alpha+\beta+1} B(\alpha+1, \beta+1)}$ | Jacobi $P_n^{(\alpha,\beta)}(x)$ | $(1-x)^{\alpha}(1+x)^{\beta}$ | $[-1, 1]$ |
| Exponential | $e^{-x}$ | Laguerre $L_n(x)$ | $e^{-x}$ | $[0, \infty]$ |
| Gamma | $\frac{x^{\alpha} e^{-x}}{\Gamma(\alpha+1)}$ | Gen. Laguerre $L_n^{(\alpha)}(x)$ | $x^{\alpha} e^{-x}$ | $[0, \infty]$ |
From Multifidelity Uncertainty Quantification Using Non-Intrusive Polynomial Chaos and
Stochastic Collocation, by Ng and Eldred, 2012, AIAA-2012-1852, published in the Proceedings
of the 53rd SDM conference; reprinted by permission of the American Institute of Aeronautics and
Astronautics, Inc.
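The orthogonality relations in Table 28.1 can be checked numerically with NumPy's orthogonal polynomial modules. The sketch below verifies the Hermite and Legendre rows under their probability densities (the quadrature sizes and basis orders are arbitrary choices for the check):

```python
import math
import numpy as np
from numpy.polynomial import hermite_e, legendre

def gram(basis, nodes, weights, nmax):
    """Gram matrix <psi_m, psi_n> under the discrete measure (nodes, weights)."""
    return np.array([[np.sum(weights * basis(m, nodes) * basis(n, nodes))
                      for n in range(nmax)] for m in range(nmax)])

# Probabilists' Hermite He_n under the N(0,1) density: <He_m, He_n> = n! delta_mn
x, w = hermite_e.hermegauss(20)          # Gauss rule for the weight exp(-x^2/2)
w = w / np.sqrt(2.0 * np.pi)             # normalize the weight to a density
He = lambda n, t: hermite_e.hermeval(t, [0.0] * n + [1.0])
G_hermite = gram(He, x, w, 5)

# Legendre P_n under the uniform density 1/2 on [-1,1]: <P_m, P_n> = delta_mn/(2n+1)
x, w = legendre.leggauss(20)
w = w / 2.0
P = lambda n, t: legendre.legval(t, [0.0] * n + [1.0])
G_legendre = gram(P, x, w, 5)
```

Both Gram matrices come out diagonal to machine precision, matching the orthogonality-weight columns of the table.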
$$I_p = \{\, i : i_k \le p_k,\ k = 1, \ldots, d \,\}. \qquad (28.3)$$
and
$$M_{TP} = \prod_{k=1}^{d} (p_k + 1), \qquad (28.5)$$
respectively.
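The tensor-product index set and its cardinality can be enumerated directly; a small illustrative sketch:

```python
from itertools import product

def tensor_index_set(p):
    """All multi-indices i with i_k <= p_k in each dimension (Eq. 28.3)."""
    return list(product(*(range(pk + 1) for pk in p)))

def m_tp(p):
    """Number of terms in the tensor-product expansion (Eq. 28.5)."""
    count = 1
    for pk in p:
        count *= pk + 1
    return count
```

For $p = (2, 3)$, for example, enumeration yields $(2+1)(3+1) = 12$ multi-indices, in agreement with Eq. 28.5.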
One approach to calculate the chaos coefficients $\alpha_i$ is the spectral projection
method that takes advantage of the orthogonality of the bases. This results in
$$\alpha_i = \frac{\langle R(\xi), \Psi_i(\xi) \rangle}{\langle \Psi_i^2(\xi) \rangle} = \frac{1}{\langle \Psi_i^2(\xi) \rangle} \int_{\Gamma} R(\xi)\, \Psi_i(\xi)\, \rho(\xi)\, d\xi, \qquad (28.6)$$
where $\rho(\xi) = \prod_{k=1}^{d} \rho_k(\xi_k)$ is the joint probability density of the stochastic
parameters over the support $\Gamma = \Gamma_1 \times \cdots \times \Gamma_d$, and $\langle \Psi_i^2(\xi) \rangle$ is available in closed
form. Thus, the bulk of the work is in evaluating the multidimensional integral in the
numerator. Tensor-product quadrature may be employed to evaluate these integrals
or, if $d$ is more than a few parameters, sparse grid quadrature may be employed as
described in Eqs. 28.19-28.21 to follow.
Once the chaos coefficients are known, the statistics of the system response can
be estimated directly and inexpensively from the expansion. For example, the mean
and the variance can be obtained analytically as
$$\mu_R = \langle R(\xi) \rangle \approx \sum_{i \in I_p} \alpha_i \langle \Psi_i(\xi) \rangle = \alpha_0 \qquad (28.7)$$
and
$$\sigma_R^2 = \langle R(\xi)^2 \rangle - \mu_R^2 \approx \sum_{i \in I_p} \sum_{j \in I_p} \alpha_i \alpha_j \langle \Psi_i(\xi) \Psi_j(\xi) \rangle - \alpha_0^2 = \sum_{i \in I_p \setminus 0} \alpha_i^2 \langle \Psi_i^2(\xi) \rangle, \qquad (28.8)$$
where $0$ is a vector of zeros (i.e., the first multi-index). Other statistics such as
probabilities and quantiles can be estimated by sampling the polynomial expansion.
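As a concrete one-dimensional illustration of Eqs. 28.6-28.8 (a sketch, with $R(\xi) = e^{\xi}$ and $\xi \sim N(0,1)$ chosen arbitrarily), the coefficients can be computed by Gauss-Hermite quadrature and the moments recovered from the expansion; for this choice the exact mean is $e^{1/2}$ and the exact variance is $e^2 - e$:

```python
import math
import numpy as np
from numpy.polynomial import hermite_e

def pce_coefficients(f, order, nquad=30):
    """Chaos coefficients of f(xi), xi ~ N(0,1), via spectral projection (Eq. 28.6)."""
    x, w = hermite_e.hermegauss(nquad)
    w = w / np.sqrt(2.0 * np.pi)              # normalize weight to the N(0,1) density
    coeffs = []
    for i in range(order + 1):
        basis = hermite_e.hermeval(x, [0.0] * i + [1.0])
        coeffs.append(np.sum(w * f(x) * basis) / math.factorial(i))  # <He_i^2> = i!
    return np.array(coeffs)

def pce_moments(coeffs):
    """Mean and variance from the expansion coefficients (Eqs. 28.7-28.8)."""
    mean = coeffs[0]
    var = sum(c * c * math.factorial(i) for i, c in enumerate(coeffs) if i > 0)
    return mean, var

mean, var = pce_moments(pce_coefficients(np.exp, order=12))
```

With a degree-12 expansion the truncation error in the variance is already far below the quadrature tolerance, illustrating the spectral convergence of the expansion for a smooth response.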
In stochastic collocation, the system response is instead approximated by an
interpolant,
$$R(\xi) \approx \sum_{j=1}^{n} R(\xi^{(j)})\, \ell^{(j)}(\xi), \qquad (28.9)$$
where $R(\xi^{(j)})$ is the system response evaluated at collocation points $\xi^{(j)}$,
$j = 1, \ldots, n$. The basis functions $\ell^{(j)}(\xi)$ can be either local or global and
either value based or gradient enhanced [1], but the most common option is the
global Lagrange polynomial based on interpolation of values:
$$\ell^{(j)}(\xi) = \prod_{k=1,\, k \ne j}^{n} \frac{\xi - \xi^{(k)}}{\xi^{(j)} - \xi^{(k)}}. \qquad (28.10)$$
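A one-dimensional sketch of Eq. 28.10 (illustrative, not tied to any particular code): the Lagrange basis satisfies $\ell^{(j)}(\xi^{(k)}) = \delta_{jk}$, and the interpolant reproduces any polynomial of degree below the number of nodes.

```python
import numpy as np

def lagrange_basis(nodes, j, x):
    """Global Lagrange polynomial ell^(j)(x) over the collocation nodes (Eq. 28.10)."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    for k, xk in enumerate(nodes):
        if k != j:
            out *= (x - xk) / (nodes[j] - xk)
    return out

def collocation_interpolant(nodes, values, x):
    """Evaluate sum_j R(xi^(j)) ell^(j)(x)."""
    return sum(v * lagrange_basis(nodes, j, x) for j, v in enumerate(values))
```

Using Gauss-Legendre nodes here mirrors the text's choice of quadrature-optimal collocation points.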
The collocation points can be chosen for accuracy in interpolation (as indicated by
the Lebesgue measure) or integration (as indicated by polynomial exactness). Here,
we emphasize the latter and choose Gaussian quadrature nodes that correspond
to the same orthogonal polynomial selections used for the polynomial chaos
expansion. In the multivariate case, we start from a tensor-product formulation with
the multi-index $i = (i_1, \ldots, i_d)$, $i_k = 0, 1, 2, \ldots$:
$$R(\xi) \approx U^{i}(R) \overset{\text{def}}{=} \left( U^{i_1} \otimes \cdots \otimes U^{i_d} \right)(R) = \sum_{j_1=1}^{n_1} \cdots \sum_{j_d=1}^{n_d} R\!\left( \xi_1^{(j_1)}, \ldots, \xi_d^{(j_d)} \right) \ell_1^{(j_1)}(\xi_1) \cdots \ell_d^{(j_d)}(\xi_d). \qquad (28.11)$$
Statistics of the system response such as the mean and the variance of a tensor-
product expansion can be obtained analytically as
$$\mu_R = \langle R(\xi) \rangle \approx \sum_{j_1=1}^{n_1} \cdots \sum_{j_d=1}^{n_d} R\!\left( \xi_1^{(j_1)}, \ldots, \xi_d^{(j_d)} \right) \left\langle \ell_1^{(j_1)}(\xi_1) \cdots \ell_d^{(j_d)}(\xi_d) \right\rangle = \sum_{j_1=1}^{n_1} \cdots \sum_{j_d=1}^{n_d} R\!\left( \xi_1^{(j_1)}, \ldots, \xi_d^{(j_d)} \right) w_1^{(j_1)} \cdots w_d^{(j_d)} \overset{\text{def}}{=} Q^{i}(R) \qquad (28.12)$$
and
$$\sigma_R^2 = \langle R(\xi)^2 \rangle - \mu_R^2 \approx \sum_{j_1=1}^{n_1} \cdots \sum_{j_d=1}^{n_d} \sum_{k_1=1}^{n_1} \cdots \sum_{k_d=1}^{n_d} R\!\left( \xi_1^{(j_1)}, \ldots, \xi_d^{(j_d)} \right) R\!\left( \xi_1^{(k_1)}, \ldots, \xi_d^{(k_d)} \right) \left\langle \ell_1^{(j_1)}(\xi_1) \cdots \ell_d^{(j_d)}(\xi_d)\, \ell_1^{(k_1)}(\xi_1) \cdots \ell_d^{(k_d)}(\xi_d) \right\rangle - \mu_R^2 = \sum_{j_1=1}^{n_1} \cdots \sum_{j_d=1}^{n_d} R^2\!\left( \xi_1^{(j_1)}, \ldots, \xi_d^{(j_d)} \right) w_1^{(j_1)} \cdots w_d^{(j_d)} - \mu_R^2 \overset{\text{def}}{=} Q^{i}(R^2) - \mu_R^2, \qquad (28.13)$$
where the expectation integrals of the basis polynomials use the property that
$\ell^{(s)}(\xi^{(t)}) = \delta_{s,t}$, such that numerical quadrature of these integrals leaves only the
quadrature weights $w$. Higher moments can be obtained analytically in a similar
manner, and other statistics such as probabilities and quantiles can be estimated by
sampling the expansion.
We can collapse these tensor-product sums by moving to a multivariate basis
representation
$$R(\xi) \approx \sum_{j=1}^{N_{TP}} R(\xi^{(j)})\, L^{(j)}(\xi), \qquad (28.14)$$
where the multivariate basis functions are products of the one-dimensional basis
functions,
$$L^{(j)}(\xi) = \prod_{k=1}^{d} \ell_k^{(c_k^j)}(\xi_k), \qquad (28.15)$$
where $c_k^j$ is a collocation multi-index (similar to the polynomial chaos multi-index
in Eqs. 28.1-28.3). The corresponding moment expressions are then
$$\mu_R = \sum_{j=1}^{N_{TP}} R(\xi^{(j)})\, w^{(j)} \qquad (28.16)$$
$$\sigma_R^2 = \sum_{j=1}^{N_{TP}} R^2(\xi^{(j)})\, w^{(j)} - \mu_R^2. \qquad (28.17)$$
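The moment formulas above amount to tensor-product quadrature of $R$ and $R^2$. A minimal sketch for uniform inputs on $[-1,1]^d$ (the dimension and quadrature orders are arbitrary choices):

```python
from itertools import product
import numpy as np

def tensor_quadrature_moments(f, n_per_dim):
    """Mean and variance of f(xi), xi uniform on [-1,1]^d, via tensor-product
    Gauss-Legendre quadrature (Eqs. 28.16-28.17)."""
    rules = [np.polynomial.legendre.leggauss(n) for n in n_per_dim]
    mean, second = 0.0, 0.0
    for combo in product(*(range(n) for n in n_per_dim)):
        point = [rules[k][0][j] for k, j in enumerate(combo)]
        weight = np.prod([rules[k][1][j] / 2.0 for k, j in enumerate(combo)])
        r = f(point)
        mean += r * weight           # Eq. 28.16
        second += r * r * weight
    return mean, second - mean ** 2  # Eq. 28.17
```

For a low-degree test function such as $f(\xi) = \xi_1 + \xi_2^2$, the quadrature recovers the exact moments (mean $1/3$, variance $19/45$), since both $f$ and $f^2$ lie within the polynomial exactness of the rule.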
A sparse interpolant and its moments are then formed from a linear combination of
these tensor products, as described in the following section.
where $\Delta^{i}(R) = \left( \Delta^{i_1} \otimes \cdots \otimes \Delta^{i_d} \right)(R)$ and $\Delta^{i_k} = U^{i_k} - U^{i_k - 1}$ with $U^{-1} = 0$.
Thus, the tensor-product grids used in the isotropic sparse grid construction are
those whose multi-indices lie within a simplex defined by the sparse grid level q.
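The difference operators telescope: summing $\Delta^{i}$ over all levels up to $L$ recovers $U^{L}$. A one-dimensional sketch (with quadrature standing in for the interpolation operators, and an arbitrary linear growth rule):

```python
import numpy as np

def u_level(f, n):
    """U^i applied to integration: n-point Gauss-Legendre rule, uniform density."""
    x, w = np.polynomial.legendre.leggauss(n)
    return np.sum(w / 2.0 * f(x))

def delta_level(f, level, growth):
    """Difference operator Delta^i = U^i - U^(i-1), with U^(-1) = 0."""
    lower = u_level(f, growth(level - 1)) if level > 0 else 0.0
    return u_level(f, growth(level)) - lower
```

Summing `delta_level` over levels 0 through L reproduces the single rule at level L, which is the identity exploited by the sparse grid combination.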
The relationship between the index $i_k$ and the number of collocation points $n_k$
in each dimension $k = 1, \ldots, d$ is called the growth rule and is an important detail
of the sparse grid construction. If the collocation points are chosen based on a fully
nested quadrature rule, then a nonlinear growth rule that approximately doubles $n_k$
with every increment in $i_k$ (e.g., $n_k = 2^{i_k+1} - 1$ for open nested rules such as Gauss-
Patterson) is used to augment existing model evaluations. If the collocation points
are based on a weakly nested or non-nested quadrature rule, then a linear growth
rule (e.g., $n_k = 2 i_k + 1$) may alternatively be used to provide finer granularity in
the order of the rule and associated degree of the polynomial basis.
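The two growth rules read, in code form (a direct transcription of the formulas above):

```python
def nested_growth(level):
    """Nonlinear growth for open fully nested rules (e.g., Gauss-Patterson):
    n_k = 2^(i_k + 1) - 1, roughly doubling per level."""
    return 2 ** (level + 1) - 1

def linear_growth(level):
    """Linear growth for weakly nested or non-nested rules: n_k = 2 i_k + 1."""
    return 2 * level + 1
```

The nested rule yields point counts 1, 3, 7, 15, ... (each level reusing all previous points), while the linear rule yields 1, 3, 5, 7, ... for finer control of the polynomial degree.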
Fig. 28.2 Identification of the admissible forward neighbors for an index set (red). The indices of
the reference basis are gray and admissible forward indices are hashed. An index set is admissible
only if its backward neighbors exist in every dimension
In the left graphic, the forward neighbors of the selected index are admissible,
because their backward neighbors exist in every dimension. In the right graphic,
only the child in the vertical dimension is admissible, as not all parents of the
horizontal child exist.
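The admissibility rule illustrated in Fig. 28.2 can be stated directly in code (a sketch, with multi-indices represented as tuples):

```python
def backward_neighbors(index):
    """Backward neighbors index - e_j in every dimension with index_j > 0."""
    return [tuple(n - (1 if k == j else 0) for k, n in enumerate(index))
            for j in range(len(index)) if index[j] > 0]

def admissible_forward_neighbors(reference_set):
    """Forward neighbors of the reference grid whose backward neighbors all
    exist in the reference set (the hashed indices in Fig. 28.2)."""
    candidates = set()
    for index in reference_set:
        for j in range(len(index)):
            candidates.add(tuple(n + (1 if k == j else 0)
                                 for k, n in enumerate(index)))
    return {c for c in candidates - set(reference_set)
            if all(b in reference_set for b in backward_neighbors(c))}
```

For the reference set {(0,0), (1,0)}, for instance, the forward neighbor (1,1) is rejected because its parent (0,1) is missing, mirroring the right graphic of Fig. 28.2.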
Thus, admissible multi-indices can be added one by one starting from an initial
reference grid, often the level 0 grid (i D 0) corresponding to a single point
in parameter space. The refinement process is governed by a greedy algorithm,
where each admissible index set is evaluated for promotion into the reference grid
based on a refinement metric. The best index set is selected for promotion and
used to generate additional admissible index sets, and the process is continued
until convergence criteria are satisfied. The resulting generalized sparse grid is then
defined from
$$\mathcal{A}_{\mathcal{J},d}(R) = \sum_{\mathbf{i} \in \mathcal{J}} \Delta_{\mathbf{i}}(R). \qquad (28.21)$$
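The admissibility test and greedy promotion loop described above can be sketched as follows (an illustrative sketch, not the chapter's implementation; the refinement metric `gain` is a hypothetical stand-in for the change in a statistical QoI normalized by cost):

```python
# Sketch of generalized sparse grid refinement: indices are promoted
# greedily, and a forward neighbor is admissible only if all of its
# backward neighbors already exist in the reference grid.

def backward_neighbors(idx):
    """All indices obtained by decrementing one dimension of idx."""
    return [tuple(idx[k] - (k == j) for k in range(len(idx)))
            for j in range(len(idx)) if idx[j] > 0]

def is_admissible(idx, reference):
    """Admissible if every backward neighbor is in the reference grid."""
    return all(b in reference for b in backward_neighbors(idx))

def forward_neighbors(idx):
    return [tuple(idx[k] + (k == j) for k in range(len(idx)))
            for j in range(len(idx))]

def greedy_refine(gain, d=2, n_steps=5):
    reference = {(0,) * d}                    # level-0 grid: a single point
    active = set(forward_neighbors((0,) * d))
    for _ in range(n_steps):
        best = max(active, key=gain)          # promote the best active set
        active.discard(best)
        reference.add(best)
        for f in forward_neighbors(best):     # generate new admissible sets
            if f not in reference and is_admissible(f, reference):
                active.add(f)
    return reference

# Example with a hypothetical metric favoring the first dimension:
grid = greedy_refine(gain=lambda i: -i[1] + 0.1 * i[0])
```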
of unique points in the sparse grid, $N_{SG}$), where the weights $w^{(j)}$ are now the sparse weights defined by the linear combination of all tensor-product weights applied to the same collocation point $\xi^{(j)}$. Note that while each tensor-product expansion interpolates the system responses at the collocation points, the sparse grid expansion will not interpolate in general unless a nested set of collocation points is used [7]. Another important detail in the non-nested case is that the summations in Eqs. 28.16-28.17 correspond to sparse numerical integration of $R$ and $R^2$, differing slightly from expectations of the sparse interpolant and its square.
Hierarchical interpolation on nested grids [2, 31] is particularly useful in an
adaptive refinement setting, where we are interested in measuring small increments
in statistical QoI due to small increments (e.g., candidate index sets in generalized
sparse grids) in the sparse grid definition. By reformulating using hierarchical interpolants, we mitigate the loss of precision due to subtractive cancellation. In hierarchical sparse grids, the $\Delta_{\mathbf{i}}$ in Eqs. 28.20 and 28.21 are tensor products of the one-dimensional difference interpolants defined over the increment in collocation points:

$$\Delta_{\mathbf{i}}(R) = \sum_{j_1 = 1}^{n_1} \cdots \sum_{j_d = 1}^{n_d} s\!\left(\xi_{j_1}^{i_1}, \ldots, \xi_{j_d}^{i_d}\right) \left(L_{j_1}^{i_1} \otimes \cdots \otimes L_{j_d}^{i_d}\right), \qquad (28.22)$$
where s are the hierarchical surpluses computed from the difference between the
newly added response value and the value predicted by the previous interpolant
level. Hierarchical increments in expected value are formed simply from the
expected value of the new i contributions induced by a refinement increment
(a single i for each candidate index set in the case of generalized sparse grids).
Hierarchical increments in a variety of other statistical QoI may also be derived in
a manner that preserves precision [1], starting from increments in the response mean $\mu$ and covariance $\sigma$:

$$\Delta\mu_i = E[\Delta R_i] \qquad (28.23)$$

$$\Delta\sigma_{ij} = E[\Delta(R_i R_j)] - \mu_i E[\Delta R_j] - \mu_j E[\Delta R_i] - E[\Delta R_i]\, E[\Delta R_j] \qquad (28.24)$$
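As a sanity check, the covariance-increment identity of Eq. 28.24 can be verified numerically on synthetic samples (an illustrative sketch; the sample data and increments are arbitrary):

```python
# Check that the covariance of the incremented responses equals the
# reference covariance plus the increment of Eq. 28.24, where population
# (divide-by-n) means and covariances are used throughout.
import random

random.seed(0)
n = 2000
R_i  = [random.gauss(0, 1) for _ in range(n)]
R_j  = [0.5 * x + random.gauss(0, 0.3) for x in R_i]
dR_i = [0.1 * x * x for x in R_i]   # arbitrary increment to R_i
dR_j = [0.2 * y for y in R_j]       # arbitrary increment to R_j

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    return mean([x * y for x, y in zip(a, b)]) - mean(a) * mean(b)

mu_i, mu_j = mean(R_i), mean(R_j)
# Sample analogue of E[Delta(R_i R_j)]:
d_prod = [(x + dx) * (y + dy) - x * y
          for x, dx, y, dy in zip(R_i, dR_i, R_j, dR_j)]
d_sigma = (mean(d_prod) - mu_i * mean(dR_j) - mu_j * mean(dR_i)
           - mean(dR_i) * mean(dR_j))

full = cov([x + dx for x, dx in zip(R_i, dR_i)],
           [y + dy for y, dy in zip(R_j, dR_j)])
assert abs(full - (cov(R_i, R_j) + d_sigma)) < 1e-9
```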
$$\begin{bmatrix} \Psi_1(\xi^{(1)}) & \Psi_2(\xi^{(1)}) & \cdots & \Psi_M(\xi^{(1)}) \\ \Psi_1(\xi^{(2)}) & \Psi_2(\xi^{(2)}) & \cdots & \Psi_M(\xi^{(2)}) \\ \vdots & \vdots & \ddots & \vdots \\ \Psi_1(\xi^{(N)}) & \Psi_2(\xi^{(N)}) & \cdots & \Psi_M(\xi^{(N)}) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_M \end{bmatrix} = \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_N \end{bmatrix}, \quad \text{i.e.,} \quad \boldsymbol{\Psi} \boldsymbol{\alpha} = \mathbf{R}. \qquad (28.25)$$
This system may be over-, under-, or uniquely determined, based on the number of candidate basis terms $M$ and the number of response QoI observations $N$. We are particularly interested in the under-determined case $N < M$, for which the problem is ill-posed and a regularization is required to enforce solution uniqueness. One approach is to apply the pseudo-inverse of $\boldsymbol{\Psi}$, which returns the solution with minimum 2-norm $\|\boldsymbol{\alpha}\|_{\ell_2}$. However, if the solution is sparse (many terms are zero) or compressible (many terms are small due to rapid coefficient decay), then an effective alternative is to seek the solution with the minimum number of terms. This sparse solution is the minimum zero-norm solution

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_{\ell_0} \quad \text{subject to} \quad \boldsymbol{\Psi} \boldsymbol{\alpha} = \mathbf{R},$$

which can be approximated using orthogonal matching pursuit (OMP), a greedy heuristic algorithm that iteratively constructs the solution vector term by term. Other approaches, such as basis pursuit denoising (BPDN), compute a sparse solution using the minimum one-norm

$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_{\ell_1} \quad \text{subject to} \quad \|\boldsymbol{\Psi} \boldsymbol{\alpha} - \mathbf{R}\|_{\ell_2} \le \varepsilon.$$
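A minimal OMP sketch illustrates the greedy term-by-term construction (illustrative only; production UQ codes use optimized solvers, and the random matrix and sparsity level below are arbitrary):

```python
# Sketch of orthogonal matching pursuit for the underdetermined system
# Psi @ alpha = R: at each step, the column most correlated with the
# residual is added to the active set, and the coefficients on the
# active set are refit by least squares.
import numpy as np

def omp(Psi, R, n_nonzero):
    m, M = Psi.shape
    active, residual = [], R.copy()
    alpha = np.zeros(M)
    for _ in range(n_nonzero):
        corr = Psi.T @ residual
        k = int(np.argmax(np.abs(corr)))   # most correlated basis term
        if k not in active:
            active.append(k)
        coef, *_ = np.linalg.lstsq(Psi[:, active], R, rcond=None)
        residual = R - Psi[:, active] @ coef
    alpha[active] = coef
    return alpha

# Demo: recover a 2-sparse vector from 10 random measurements of 30
# unknowns (recovery is typical, though not guaranteed, for such cases):
rng = np.random.default_rng(1)
Psi = rng.standard_normal((10, 30))
true = np.zeros(30)
true[3], true[17] = 2.0, -1.5
alpha = omp(Psi, Psi @ true, n_nonzero=2)
```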
3 Multifidelity Extensions
Let Rhigh ./ be the system response obtained by evaluating the expensive high-
fidelity model and Rlow ./ be the system response obtained by evaluating the
inexpensive low-fidelity model. In a multifidelity stochastic expansion, the low-
fidelity model values are corrected to match the high-fidelity model values (and
potentially their derivatives) at the high-fidelity collocation points.
28 Multifidelity Uncertainty Quantification Using Spectral Stochastic . . . 1005
If the additive and multiplicative corrections are defined as $A(\xi) = R_{\mathrm{high}}(\xi) - R_{\mathrm{low}}(\xi)$ and

$$M(\xi) = \frac{R_{\mathrm{high}}(\xi)}{R_{\mathrm{low}}(\xi)}, \qquad (28.29)$$

respectively, then the high-fidelity response may be approximated from the corrected low-fidelity expansion as $\tilde{R}_{\mathrm{low}} + \tilde{A}$ or $\tilde{R}_{\mathrm{low}} \tilde{M}$, or from a combination of the two weighted by a factor $\gamma$, where the second moments of $A(\xi)$ and $M(\xi)$ can be estimated analytically from their stochastic expansions as described in Eqs. 28.7-28.8 and Eqs. 28.16
1006 M.S. Eldred et al.
28.17. This choice of $\gamma$ balances the additive correction and the multiplicative correction such that neither becomes too large.
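The additive, multiplicative, and blended corrections can be sketched as follows (the model functions are hypothetical placeholders; the point of the sketch is that wherever both corrections are built from exact model values, any blend weight gamma reproduces the high-fidelity response):

```python
# Sketch of additive, multiplicative, and gamma-blended corrections.
import math

def r_high(x):  # hypothetical expensive model
    return math.exp(-0.05 * x**2) + 0.3 * math.sin(x)

def r_low(x):   # hypothetical cheap model
    return math.exp(-0.05 * x**2)

def additive(x):        # A(x) = R_high(x) - R_low(x)
    return r_high(x) - r_low(x)

def multiplicative(x):  # M(x) = R_high(x) / R_low(x)
    return r_high(x) / r_low(x)

def corrected(x, gamma):
    """Blend: gamma * (R_low + A) + (1 - gamma) * (R_low * M)."""
    return (gamma * (r_low(x) + additive(x))
            + (1.0 - gamma) * r_low(x) * multiplicative(x))

# With exact corrections, any gamma reproduces the high-fidelity value:
assert abs(corrected(1.7, gamma=0.4) - r_high(1.7)) < 1e-12
```

In practice the corrections are themselves approximated by truncated stochastic expansions, so the blend genuinely trades off the two correction forms rather than collapsing to an identity.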
respectively, where $r$ is a sparse grid level offset between the stochastic expansion of the low-fidelity model and the stochastic expansion of the correction function, with $r \le q$. Thus, the multifidelity stochastic expansion at sparse grid level $q$ can be constructed with $N_{q-r,d}$ instead of $N_{q,d}$ high-fidelity model evaluations, plus $N_{q,d}$ low-fidelity model evaluations. If the low-fidelity model is significantly less expensive to evaluate than the high-fidelity model, then significant computational savings will be obtained. Furthermore, if the low-fidelity model is sufficiently predictive, then accuracy comparable to the single-fidelity expansion with $N_{q,d}$ high-fidelity model evaluations can be achieved. This notion of a predetermined sparse grid level offset enforces computational savings explicitly, with less regard to managing the accuracy in $\tilde{R}_{\mathrm{high}}(\xi)$. In the case of adaptive refinement, to be discussed in the following section, we instead manage the accuracy explicitly by investing resources where they are most needed for resolving statistics of $\tilde{R}_{\mathrm{high}}(\xi)$, and the computational savings that result are achieved indirectly.
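The cost trade-off above can be captured with a back-of-envelope sketch (the point counts and work ratio are hypothetical):

```python
# Cost of the multifidelity expansion (N(q-r) high-fidelity points plus
# N(q) low-fidelity points) relative to a single-fidelity expansion
# (N(q) high-fidelity points), measured in high-fidelity units.

def cost_ratio(n_qr, n_q, work_ratio):
    """(multifidelity cost) / (single-fidelity cost)."""
    return (n_qr + n_q / work_ratio) / n_q

# e.g., N(q-r) = 200, N(q) = 2000, low-fidelity model 40x cheaper:
print(round(cost_ratio(200, 2000, 40), 3))  # 0.125
```

An eightfold saving in this hypothetical case, provided the low-fidelity model is predictive enough that the coarser discrepancy grid suffices.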
a norm of QoI increments, or multiple sets could be selected that are each the
best for at least one QoI (a non-dominated Pareto set).
4. Update sets: If the largest change induced by the active trial sets exceeds a
specified convergence tolerance, then promote the selected trial set(s) from the
active set to the reference set and update the active set with new admissible
forward neighbors; return to step 2 and evaluate all active trial sets with respect
to the new reference grid. If the convergence tolerance is satisfied, advance to
step 5. An important algorithm detail for this step involves recursive set updating.
In one approach, the selection of a discrepancy set could trigger the promotion of
that set for both the discrepancy grid and the grid level(s) below it, in support of
the notion that the grid refinements should be strictly hierarchical and support a
spectrum from less to more resolved. Alternatively, one could let the sparse grid
at each level evolve without constraint and ensure only the caching and reuse of
evaluations among related levels. In this work, we employ the latter approach,
as we wish to preserve the ability to under-resolve levels that are not providing
predictive utility.
5. Finalization: Promote all remaining active sets to the reference set, update
all expansions within the hierarchy, and perform a final combination of the
low-fidelity and discrepancy expansions to arrive at the final result for the high-
fidelity statistical QoI.
Figure 28.3 depicts a multifidelity sparse grid adaptation in process, for which
the low-fidelity grid has three active index sets under evaluation and the discrepancy
grid has four. These seven candidates are evaluated for influence on the statistical
QoI(s), normalized by relative cost, in step 2 above. It is important to emphasize
that all trial set evaluations (step 2) involve sets of actual model evaluations. That is,
this algorithm does not identify the best candidates based on any a priori estimation;
rather, the algorithm incrementally constructs sparse grids based on greedy selection
of the best evaluated candidates in an a posteriori setting, followed by subsequent
augmentation with new index sets that are the children of the selected index sets. In
the end, the multi-index frontiers for the sparse grids have been advanced only in
the regions with the greatest influence on the QoI, and all of the model evaluations
(including index sets that were evaluated but never selected) contribute to the final
answer, based on step 5.
In the limiting case where the low-fidelity model provides little useful infor-
mation, this algorithm will prefer refinements to the model discrepancy and will
closely mirror the single-fidelity case, with the penalty of carrying along the low-
fidelity evaluations needed to resolve the discrepancy. This suggests an additional
adaptive control, in which one drops low (and intermediate)-fidelity models from
the hierarchy that are adding expense but not adding value (as measured by their
frequency of selection in step 3). In addition, this general framework can be
extended to include pointwise local refinement [28] (for handling nonsmoothness)
as well as adjoint-enhanced approaches [1, 17] (for improving scalability with
respect to random input dimension).
28 Multifidelity Uncertainty Quantification Using Spectral Stochastic . . . 1009
Fig. 28.3 Multifidelity generalized sparse grid with reference index sets in gray (existing) and
hashed (newly promoted) and active index sets in black (existing) and white (newly created).
Newton-Cotes grid points associated with the reference index sets are shown at bottom
In the case of compressed sensing, we control the relative refinement of the low-fidelity model and the discrepancy model via the number of points within each sample set. We define a ratio $r_{\mathrm{points}} = m_{\mathrm{low}} / m_{\mathrm{high}} \ge 1$, where the high-fidelity points must be contained within the low-fidelity point set to support the definition of the model discrepancy at these points.
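A nested sample design satisfying this containment constraint might be sketched as follows (illustrative; the sampling scheme and bounds are placeholders):

```python
# Sketch of a nested sample design for compressed sensing with point
# ratio r_points = m_low / m_high >= 1: the high-fidelity points are a
# subset of the low-fidelity points, so the discrepancy
# A = R_high - R_low is defined wherever the high-fidelity model runs.
import random

def nested_design(m_high, r_points, lo=-1.0, hi=1.0, seed=0):
    rng = random.Random(seed)
    pts_low = [rng.uniform(lo, hi) for _ in range(m_high * r_points)]
    pts_high = pts_low[:m_high]   # reuse the first m_high low-fidelity points
    return pts_low, pts_high

pts_low, pts_high = nested_design(m_high=50, r_points=4)
assert set(pts_high) <= set(pts_low) and len(pts_low) == 200
```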
We now revisit our surrogate high-fidelity expansions produced from additive, multiplicative, or combined correction of the low-fidelity model. Replacing our previous sparse grid operator $S_{q,d}[\cdot]$ in Eqs. 28.35-28.37 with a compressed sensing operator, we obtain

$$\tilde{R}_{\mathrm{high}} = CS_{m_{\mathrm{low}},p,\varepsilon,d}\!\left[R_{\mathrm{low}}\right] + CS_{m_{\mathrm{high}},p,\varepsilon,d}\!\left[A\right], \qquad (28.38)$$

$$\tilde{R}_{\mathrm{high}} = CS_{m_{\mathrm{low}},p,\varepsilon,d}\!\left[R_{\mathrm{low}}\right] \, CS_{m_{\mathrm{high}},p,\varepsilon,d}\!\left[M\right], \qquad (28.39)$$

$$\tilde{R}_{\mathrm{high}} = \gamma \left( CS_{m_{\mathrm{low}},p,\varepsilon,d}\!\left[R_{\mathrm{low}}\right] + CS_{m_{\mathrm{high}},p,\varepsilon,d}\!\left[A\right] \right) + (1 - \gamma)\, CS_{m_{\mathrm{low}},p,\varepsilon,d}\!\left[R_{\mathrm{low}}\right] \, CS_{m_{\mathrm{high}},p,\varepsilon,d}\!\left[M\right], \qquad (28.40)$$
$$S^{pc}_{q,d} R_{\mathrm{low}}(\xi) = \sum_{\mathbf{i} \in \mathcal{J}_{q,d}} \alpha_{\mathrm{low}_\mathbf{i}} \Psi_\mathbf{i}(\xi) \qquad (28.41)$$

and

$$S^{pc}_{q-r,d} A(\xi) = \sum_{\mathbf{i} \in \mathcal{J}_{q-r,d}} \alpha_{A_\mathbf{i}} \Psi_\mathbf{i}(\xi). \qquad (28.42)$$

Since $\mathcal{J}_{q-r,d} \subseteq \mathcal{J}_{q,d}$, the bases $\Psi_\mathbf{i}(\xi)$, $\mathbf{i} \in \mathcal{J}_{q-r,d}$, are common between $S^{pc}_{q,d} R_{\mathrm{low}}$ and $S^{pc}_{q-r,d} A$. Therefore, the chaos coefficients of those bases can be added to produce a single polynomial chaos expansion

$$S^{pc}_{q,d} R_{\mathrm{low}}(\xi) + S^{pc}_{q-r,d} A(\xi) = \sum_{\mathbf{i} \in \mathcal{J}_{q-r,d}} \left( \alpha_{\mathrm{low}_\mathbf{i}} + \alpha_{A_\mathbf{i}} \right) \Psi_\mathbf{i}(\xi) + \sum_{\mathbf{i} \in \mathcal{J}_{q,d} \setminus \mathcal{J}_{q-r,d}} \alpha_{\mathrm{low}_\mathbf{i}} \Psi_\mathbf{i}(\xi). \qquad (28.43)$$
The mean and the variance can be computed directly from this multifidelity expansion by simply collecting terms for each basis and applying Eqs. 28.7 and 28.8:

$$\mu_R = \alpha_{\mathrm{low}_\mathbf{0}} + \alpha_{A_\mathbf{0}} \qquad (28.44)$$

$$\sigma_R^2 = \sum_{\mathbf{i} \in \mathcal{J}_{q-r,d} \setminus \mathbf{0}} \left( \alpha_{\mathrm{low}_\mathbf{i}} + \alpha_{A_\mathbf{i}} \right)^2 \left\langle \Psi_\mathbf{i}^2 \right\rangle + \sum_{\mathbf{i} \in \mathcal{J}_{q,d} \setminus \mathcal{J}_{q-r,d}} \alpha_{\mathrm{low}_\mathbf{i}}^2 \left\langle \Psi_\mathbf{i}^2 \right\rangle. \qquad (28.45)$$
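The coefficient merging and moment evaluation of Eqs. 28.43-28.45 can be sketched as follows, assuming an orthonormal basis so that every $\langle \Psi_\mathbf{i}^2 \rangle = 1$ (multi-indices are dictionary keys; the coefficient values are arbitrary):

```python
# Sketch: merge low-fidelity and discrepancy PCE coefficients on shared
# multi-indices, then read off the mean and variance of the combined
# expansion for an orthonormal basis.

def merge_pce(alpha_low, alpha_A):
    """Sum coefficients on shared multi-indices; keep the rest as-is."""
    merged = dict(alpha_low)
    for i, a in alpha_A.items():
        merged[i] = merged.get(i, 0.0) + a
    return merged

def moments(alpha, d=2):
    """Mean = coefficient of the zero multi-index; variance = sum of
    squares of the remaining coefficients (orthonormal basis)."""
    zero = (0,) * d
    mean = alpha.get(zero, 0.0)
    var = sum(a * a for i, a in alpha.items() if i != zero)
    return mean, var

# Low-fidelity expansion on a larger index set; discrepancy on a subset:
alpha_low = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.2, (2, 0): 0.1}
alpha_A   = {(0, 0): 0.3, (1, 0): -0.1}
mean, var = moments(merge_pce(alpha_low, alpha_A))
print(mean)  # 1.3
```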
In the adaptive sparse grid and compressed sensing cases, the primary difference
is the loss of a strict subset relationship between the discrepancy and low-fidelity
multi-indices, which we will represent in generalized form as $\mathcal{J}_A$ and $\mathcal{J}_{\mathrm{low}}$, respectively. We introduce a new multi-index set for the intersection, $\mathcal{J}_I = \mathcal{J}_A \cap \mathcal{J}_{\mathrm{low}}$. Eq. 28.44 remains the same, as all approaches retain the $\mathbf{0}$ term, but Eq. 28.45 becomes generalized as follows:

$$\sigma_R^2 = \sum_{\mathbf{i} \in \mathcal{J}_I \setminus \mathbf{0}} \left( \alpha_{\mathrm{low}_\mathbf{i}} + \alpha_{A_\mathbf{i}} \right)^2 \left\langle \Psi_\mathbf{i}^2 \right\rangle + \sum_{\mathbf{i} \in \mathcal{J}_{\mathrm{low}} \setminus \mathcal{J}_I} \alpha_{\mathrm{low}_\mathbf{i}}^2 \left\langle \Psi_\mathbf{i}^2 \right\rangle + \sum_{\mathbf{i} \in \mathcal{J}_A \setminus \mathcal{J}_I} \alpha_{A_\mathbf{i}}^2 \left\langle \Psi_\mathbf{i}^2 \right\rangle. \qquad (28.46)$$
from which the variance and higher moments can be obtained analytically. This requires evaluating $S^{sc}_{q,d} R_{\mathrm{low}}$ and either $S^{sc}_{q-r,d} A$ or $S^{sc}_{q-r,d} M$ at the collocation points associated with the multi-indices in $\mathcal{J}_{q,d}$. For the former, the low-fidelity model values at all of the collocation points are readily available and can be used directly. For the latter, the discrepancy expansion $S^{sc}_{q-r,d} A$ or $S^{sc}_{q-r,d} M$ must be evaluated. Since the discrepancy function values are available at collocation points associated with the multi-indices of $\mathcal{J}_{q-r,d}$, it may be tempting to only evaluate $S^{sc}_{q-r,d} A$ or $S^{sc}_{q-r,d} M$ at collocation points associated with the multi-indices in $\mathcal{J}_{q,d} \setminus \mathcal{J}_{q-r,d}$. However, because sparse grid stochastic collocation does not interpolate unless the set of collocation points is nested [7], the function values from $A$ and $M$ and from $S^{sc}_{q-r,d} A$ and $S^{sc}_{q-r,d} M$ should not be mixed together when they are not consistent.
To generalize to adaptive sparse grid cases that could lack the strict subset relationship $\mathcal{J}_{q-r,d} \subseteq \mathcal{J}_{q,d}$ provided by a predefined level offset, we introduce a multi-index union $\mathcal{J}_U$ and form a new interpolant on the union grid:

$$S^{sc}_U\!\left[ S^{sc}_{\mathrm{low}} R_{\mathrm{low}} + S^{sc}_{A} A \right] \quad \text{for } \mathcal{J}_U = \mathcal{J}_{\mathrm{low}} \cup \mathcal{J}_A \qquad (28.51)$$

$$S^{sc}_U\!\left[ S^{sc}_{\mathrm{low}} R_{\mathrm{low}} \; S^{sc}_{M} M \right] \quad \text{for } \mathcal{J}_U = \mathcal{J}_{\mathrm{low}} \cup \mathcal{J}_M. \qquad (28.52)$$

Similar to the predefined offset case, the new interpolant is formed by evaluating $S^{sc}_{\mathrm{low}} R_{\mathrm{low}}$, $S^{sc}_{A} A$, and/or $S^{sc}_{M} M$ at the collocation points associated with the multi-indices in $\mathcal{J}_U$ and computing the necessary sum or product.
4 Computational Results
First, we present a simple example to motivate the approach and demonstrate the
efficiency improvements that are possible when an accurate low-fidelity model is
available. The system responses of the high-fidelity model and the low-fidelity
model are, respectively,
$$R_{\mathrm{high}}(\xi) = e^{-0.05 \xi^2} \cos(0.5 \xi) - 0.5\, e^{-0.02 (\xi - 5)^2}$$

$$R_{\mathrm{low}}(\xi) = e^{-0.05 \xi^2} \cos(0.5 \xi),$$
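A quick numerical check of this example, using the model forms as written above, confirms that the additive discrepancy reduces exactly to the second term of $R_{\mathrm{high}}$:

```python
# The one-dimensional example: R_high minus R_low equals the second
# (shifted Gaussian) term of R_high at every input.
import math

def r_high(x):
    return (math.exp(-0.05 * x**2) * math.cos(0.5 * x)
            - 0.5 * math.exp(-0.02 * (x - 5)**2))

def r_low(x):
    return math.exp(-0.05 * x**2) * math.cos(0.5 * x)

for x in [-8.0, -1.3, 0.0, 4.2, 12.0]:   # points spanning Uniform(-8, 12)
    discrepancy = r_high(x) - r_low(x)
    assert abs(discrepancy + 0.5 * math.exp(-0.02 * (x - 5)**2)) < 1e-12
```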
Fig. 28.4 Convergence of single-fidelity PCE and multifidelity PCE with additive discrepancy for
the one-dimensional example. (a) Error in mean. (b) Error in standard deviation (From Multifidelity
Uncertainty Quantification Using Non-Intrusive Polynomial Chaos and Stochastic Collocation,
by Ng and Eldred, 2012, AIAA-2012-1852, published in the Proceedings of the 53rd SDM
conference; reprinted by permission of the American Institute of Aeronautics and Astronautics,
Inc.)
where $\xi \sim \mathrm{Uniform}(-8, 12)$. An additive discrepancy $A(\xi) = R_{\mathrm{high}}(\xi) - R_{\mathrm{low}}(\xi)$ is used, which is just the second term of $R_{\mathrm{high}}(\xi)$. In Fig. 28.4, we compare the
convergence in mean and standard deviation of the single (high)-fidelity PCE with
the convergence of the multifidelity PCE. The multifidelity PCE is constructed from
a PCE of the discrepancy function at order 1 to 20 combined with a PCE of the
low-fidelity model at order 60 (for which the low-fidelity statistics are converged
to machine precision). This corresponds to a case where low-fidelity expense can
be considered to be negligible, and by eliminating any issues related to low-
fidelity accuracy, we can focus more directly on comparing the convergence of
the discrepancy function with convergence of the high-fidelity model (in the next
example, we will advance to grid level offsets that accommodate nontrivial low-
fidelity expense). The error is plotted against the number of high-fidelity model
evaluations and is measured with respect to an overkill single-fidelity PCE solution
at order 60. It is evident that the multifidelity stochastic expansion converges much
more rapidly because the additive discrepancy function in the example has lower
complexity than the high-fidelity model. This can be seen from comparison of the
decay of the normalized spectral coefficients as plotted in Fig. 28.5, which shows
that the discrepancy expansion decays more rapidly allowing the statistics of the
discrepancy expansion to achieve a given accuracy using fewer PCE terms than that
required by the high-fidelity model.
Fig. 28.5 Decay of the spectral coefficients for the one-dimensional example, comparing the high-fidelity model and the correction function (From Multifidelity Uncertainty Quantification Using Non-Intrusive Polynomial Chaos and Stochastic Collocation, by Ng and Eldred, 2012, AIAA-2012-1852, published in the Proceedings of the 53rd SDM conference; reprinted by permission of the American Institute of Aeronautics and Astronautics, Inc.)
We next employ sparse grid level offsets between low fidelity and high fidelity, which are more
appropriate for cases where the low-fidelity model expense is non-negligible. Let
the system response of the high-fidelity model be
$$R_{\mathrm{high}}(\xi) = 1 - \frac{4M}{b h^2 Y} - \left(\frac{P}{b h Y}\right)^2, \qquad (28.53)$$

where $\xi = (b, h, P, M, Y)$ with $b \sim \mathrm{Uniform}(5, 15)$, $h \sim \mathrm{Uniform}(15, 25)$, $P \sim \mathrm{Normal}(500, 100)$, $M \sim \mathrm{Normal}(2000, 400)$, and $Y \sim \mathrm{Lognormal}(5, 0.5)$, and we neglect the traditional correlation between $P$ and $M$ for simplicity. We consider
three artificially constructed low-fidelity models of varying predictive quality, which
yield additive discrepancy forms of varying complexity:
$$R_{\mathrm{low1}}(\xi) = 1 - \frac{4P}{b h^2 Y} - \left(\frac{P}{b h Y}\right)^2, \quad A_1(\xi) = \frac{4(P - M)}{b h^2 Y}, \qquad (28.54)$$

$$R_{\mathrm{low2}}(\xi) = 1 - \frac{4M}{b h^2 Y} - \left(\frac{M}{b h Y}\right)^2, \quad A_2(\xi) = \frac{M^2 - P^2}{(b h Y)^2}, \qquad (28.55)$$

$$R_{\mathrm{low3}}(\xi) = 1 - \frac{4M}{b h^2 Y} - \left(\frac{P}{b h Y}\right)^2 - \frac{4(P - M)}{b h Y}, \quad A_3(\xi) = \frac{4(P - M)}{b h Y}. \qquad (28.56)$$
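These discrepancy forms can be verified numerically (a sketch using the expressions as reconstructed above; the sample point is an arbitrary nominal value of the random inputs):

```python
# Check that each discrepancy A_k equals R_high - R_lowk for the short
# column forms above, at a nominal sample of (b, h, P, M, Y).

def r_high(b, h, P, M, Y):
    return 1 - 4*M/(b*h**2*Y) - (P/(b*h*Y))**2

def r_low1(b, h, P, M, Y):
    return 1 - 4*P/(b*h**2*Y) - (P/(b*h*Y))**2

def A1(b, h, P, M, Y):
    return 4*(P - M)/(b*h**2*Y)

def r_low2(b, h, P, M, Y):
    return 1 - 4*M/(b*h**2*Y) - (M/(b*h*Y))**2

def A2(b, h, P, M, Y):
    return (M**2 - P**2)/(b*h*Y)**2

def r_low3(b, h, P, M, Y):
    return 1 - 4*M/(b*h**2*Y) - (P/(b*h*Y))**2 - 4*(P - M)/(b*h*Y)

def A3(b, h, P, M, Y):
    return 4*(P - M)/(b*h*Y)

x = (10.0, 20.0, 500.0, 2000.0, 150.0)  # nominal point near the medians
for r_low, A in [(r_low1, A1), (r_low2, A2), (r_low3, A3)]:
    assert abs(r_high(*x) - r_low(*x) - A(*x)) < 1e-9
```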
Fig. 28.6 Convergence of single-fidelity and multifidelity PCE with isotropic sparse grids and additive discrepancy for the short column example. The multifidelity sparse grid level offset is $r = 1$. (a) Error in mean. (b) Error in standard deviation
Fig. 28.7 Convergence of single-fidelity PCE and multifidelity PCE with isotropic sparse grids and additive discrepancy for the short column example. The multifidelity sparse grid level offset is $r = 1$. (a) Error in mean. (b) Error in standard deviation (From Multifidelity Uncertainty Quantification Using Non-Intrusive Polynomial Chaos and Stochastic Collocation, by Ng and Eldred, 2012, AIAA-2012-1852, published in the Proceedings of the 53rd SDM conference; reprinted by permission of the American Institute of Aeronautics and Astronautics, Inc.)
Fig. 28.8 Convergence of single-fidelity PCE and multifidelity PCE with isotropic sparse grids and additive discrepancy for the short column example. The multifidelity sparse grid level offset is compared using $r = 1$ from Fig. 28.6 (solid lines) and $r = 2$ (dashed lines). (a) Error in mean. (b) Error in standard deviation (From Multifidelity Uncertainty Quantification Using Non-Intrusive Polynomial Chaos and Stochastic Collocation, by Ng and Eldred, 2012, AIAA-2012-1852, published in the Proceedings of the 53rd SDM conference; reprinted by permission of the American Institute of Aeronautics and Astronautics, Inc.)
Finally, we investigate the effect of the sparse grid level offset r. Figure 28.8 is
the same as Fig. 28.6 but with r increased from one to two, resulting in greater
resolution in the low-fidelity expansion for a particular discrepancy expansion
resolution. The results from Fig. 28.6 are included as solid lines, and the new
results with r D 2 are shown as dashed lines. A small improvement can be seen
in the mean convergence for multifidelity using Rlow1 ./ and Rlow3 ./ and for
standard deviation convergence using Rlow1 ./, but results are mixed and it is
unclear whether the benefit of increasing the offset is worth the additional low-
fidelity evaluations, especially in the case where their expense is nontrivial. Thus,
it appears that an automated procedure will be needed to optimize these offsets
accounting for relative cost. This motivates the multifidelity adaptive sparse grid
algorithm described previously (Fig. 28.3).
$$\kappa(x, \omega) = 0.1 + 0.03 \sum_{k=1}^{10} \sqrt{\lambda_k}\, \phi_k(x)\, Y_k(\omega), \qquad Y_k \sim \mathrm{Uniform}(-1, 1)$$
Table 28.2 Comparison of the relative error and the number of model evaluations for the elliptic PDE example

                               Relative error          Relative error          High-fidelity   Low-fidelity
                               in mean                 in std deviation        evaluations     evaluations
Single fidelity (q = 3)        $5.3 \times 10^{-6}$    $2.7 \times 10^{-4}$    1981            --
Single fidelity (q = 4)        $4.1 \times 10^{-7}$    $2.3 \times 10^{-5}$    12,981          --
Multifidelity (q = 4, r = 1)   $4.7 \times 10^{-7}$    $2.6 \times 10^{-5}$    1981            12,981

From Multifidelity Uncertainty Quantification Using Non-Intrusive Polynomial Chaos and Stochastic Collocation, by Ng and Eldred, 2012, AIAA-2012-1852, published in the Proceedings of the 53rd SDM conference; reprinted by permission of the American Institute of Aeronautics and Astronautics, Inc.
The PDE is solved by finite elements, and the output of interest is $u(0.5, \omega)$. We use a fine spatial grid with 500 states for the high-fidelity model and a coarse spatial grid with 50 states for the low-fidelity model. The ratio of average run time between the high-fidelity model and the low-fidelity model is $r_{\mathrm{work}} = 40$.
Fig. 28.9 Convergence of single-fidelity and multifidelity PCE comparing compressed sensing with isotropic sparse grids for the short column example for $R_{\mathrm{low1}}$, $R_{\mathrm{low2}}$, and $R_{\mathrm{low3}}$. Discrepancy is additive, the multifidelity sparse grid level offset is $r = 1$, $r_{\mathrm{work}} = 10$, and $r_{\mathrm{points}} = 10$. (a) Error in mean for $R_{\mathrm{low1}}$. (b) Error in standard deviation for $R_{\mathrm{low1}}$. (c) Error in mean for $R_{\mathrm{low2}}$. (d) Error in standard deviation for $R_{\mathrm{low2}}$. (e) Error in mean for $R_{\mathrm{low3}}$. (f) Error in standard deviation for $R_{\mathrm{low3}}$
Fig. 28.10 Convergence of single-fidelity and multifidelity PCE with additive discrepancy using
adaptive sparse grids for the elliptic PDE example. (a) Error in mean. (b) Error in standard devi-
ation (From Multifidelity Uncertainty Quantification Using Non-Intrusive Polynomial Chaos and
Stochastic Collocation, by Ng and Eldred, 2012, AIAA-2012-1852, published in the Proceedings
of the 53rd SDM conference; reprinted by permission of the American Institute of Aeronautics and
Astronautics, Inc.)
Figure 28.10 shows the convergence for the single-fidelity and multifidelity PCE
with additive discrepancy based on adaptive refinement using generalized sparse
grids. The single-fidelity case uses the standard generalized sparse grid proce-
dure [21], whereas the multifidelity case uses the multifidelity adaptive sparse grid
algorithm depicted in Fig. 28.3. The initial grid for both the low-fidelity model and
the correction function is a level one sparse grid (requiring 11 model evaluations).
We use the equivalent number of high-fidelity model evaluations ($m_{\mathrm{eqv}}$) to include
the additional cost of low-fidelity model evaluations in the comparison with the
single-fidelity case. By considering the potential error reduction per unit cost of
refining the sparse grid of the discrepancy versus that of refining the sparse grid of
the low-fidelity model, the multifidelity adaptive algorithm is able to achieve a faster
convergence than the single-fidelity adaptive generalized sparse grid. For achieving
error levels consistent with the non-adapted multifidelity approach in Table 28.2,
the adaptive multifidelity algorithm reduces the equivalent number of high-fidelity
evaluations by 33% for the mean and by 62% for the standard deviation.
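A plausible bookkeeping for $m_{\mathrm{eqv}}$, assuming low-fidelity evaluations are discounted by the work ratio (our reading of the text, not a formula stated explicitly here), can be sketched as:

```python
# Equivalent number of high-fidelity model evaluations: low-fidelity
# evaluations count as a fraction 1/r_work of a high-fidelity run.

def equivalent_evals(m_high, m_low, work_ratio):
    return m_high + m_low / work_ratio

# e.g., the non-adapted multifidelity result in Table 28.2: 1981
# high-fidelity plus 12,981 low-fidelity evaluations at r_work = 40
print(round(equivalent_evals(1981, 12981, 40), 1))  # 2305.5
```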
Fig. 28.11 Convergence of standard deviation in the elliptic PDE problem using multifidelity compressed sensing with different point ratios. (a) $r_{\mathrm{points}} = 4$. (b) $r_{\mathrm{points}} = 6$. (c) $r_{\mathrm{points}} = 8$. (d) $r_{\mathrm{points}} = 10$
runs, and CS employs the OMP solver in combination with cross-validation to select the total-order polynomial degree $p$ of the candidate basis and the noise tolerance $\varepsilon$.
It is evident that CS-based approaches perform better for this problem than the
sparse grid approaches, and the benefit of multifidelity CS over single-fidelity
CS is comparable to that of multifidelity sparse grids over single-fidelity sparse
grids. Finally, increasing $r_{\mathrm{points}}$ is advantageous for this problem, implying strong predictivity in the low-fidelity model and allowing for lower relative investment in resolving the discrepancy.
Fig. 28.12 2D horn geometry and the propagation of acoustic waves (From Multifidelity Uncer-
tainty Quantification Using Non-Intrusive Polynomial Chaos and Stochastic Collocation, by Ng
and Eldred, 2012, AIAA-2012-1852, published in the Proceedings of the 53rd SDM conference;
reprinted by permission of the American Institute of Aeronautics and Astronautics, Inc.)
Fig. 28.13 Convergence of single-fidelity PCE and multifidelity PCE with additive correction
using adaptive generalized sparse grids for the acoustic horn example. (a) Error in mean. (b)
Error in standard deviation (From Multifidelity Uncertainty Quantification Using Non-Intrusive
Polynomial Chaos and Stochastic Collocation, by Ng and Eldred, 2012, AIAA-2012-1852,
published in the Proceedings of the 53rd SDM conference; reprinted by permission of the American
Institute of Aeronautics and Astronautics, Inc.)
Fig. 28.14 Comparison of finite-element model (FEM) and reduced-basis model (RBM) results
for the acoustic horn example. (a) Comparison of low- and high-fidelity QoI. (b) Model
discrepancy
Although the low-fidelity model is accurate (its error relative to the high-fidelity model is about 2%), its interpolatory nature results in oscillations that require
a higher-order PCE expansion to resolve (Fig. 28.14). In addition, reduced-order
modeling approaches based on projection of a truncated basis will in general tend to
retain dominant lower-order effects and omit higher-order behavior. This highlights the primary weakness of a multifidelity sparse grid approach: it must step through lower grid levels to reach higher grid levels and therefore has difficulty benefiting from a low-fidelity model that only predicts low-order effects. The presence of similar high-order content within the discrepancy
Fig. 28.15 Convergence of standard deviation in the horn problem using multifidelity compressed sensing with different $r_{\mathrm{points}}$ values. (a) $r_{\mathrm{points}} = 4$. (b) $r_{\mathrm{points}} = 6$. (c) $r_{\mathrm{points}} = 8$. (d) $r_{\mathrm{points}} = 10$
and Monte Carlo sampling. As for previous examples, convergence plots for Monte
Carlo sampling and CS are averaged over 10 runs, and CS employs the OMP
solver in combination with cross-validation to select the total-order polynomial
degree $p$ of the candidate basis and the noise tolerance $\varepsilon$. For the non-adapted
sparse grid approaches, the multifidelity PCE shows a small improvement relative
to the single-fidelity approach, although the amount of improvement decreases
with resolution level (similar to the adapted sparse grid result in Fig. 28.13b). For
the CS approaches, more rapid convergence overall is evident than for the sparse
grid approaches, indicating sparsity or compressibility in the coefficient spectrum.
The multifidelity CS approaches show modest improvements relative to the single-fidelity approaches, with the greatest separation corresponding to $r_{\mathrm{points}} = 6$.
Comparing this observation to the elliptic PDE problem where the highest point
ratios performed best, one could infer that solutions for the horn problem require
greater reliance on the high-fidelity model for resolving the high-order discrepancy
effects.
Figure 28.16 plots the spectrum of expansion coefficients $\alpha_i$, comparing single-fidelity with multifidelity CS for different point ratios $r_{\mathrm{points}}$. The single-fidelity PCE coefficients recovered from $m = 200$ samples are compared to the multifidelity PCE coefficients using the same 200 high-fidelity samples in combination with 800, 1200, 1600, and 2000 low-fidelity samples for $r_{\mathrm{points}} = 4$, 6, 8, and 10, respectively.
These cases are plotted against reference single-fidelity cases for which 800, 1200,
1600, and 2000 samples are performed solely on the high-fidelity model. All CS
solutions employ cross-validation to select the most accurate candidate basis order
$p$ and noise tolerance $\varepsilon$. It is evident that all cases are in agreement with respect to capturing the dominant terms with the largest coefficient magnitude (terms greater than approximately $10^{-5}$). Differences manifest for the smaller coefficients, with the
more resolved single-fidelity reference solutions (blue) recovering many additional
terms relative to the 200-sample single-fidelity solutions (green), with some relative
inaccuracy apparent in the smallest terms for the latter. Augmenting the 200 high-
fidelity samples with low-fidelity simulations allows the multifidelity approach (red)
to more effectively approximate these less-dominant terms, effectively extending the
recovered spectrum to a level similar to that of the high-fidelity reference solution.
Wind turbine reliability plays a critical role in the long-term prospects for cost-
effective wind-based energy generation. The computational assessment of failure
probability or life expectancy of turbine components is fundamentally hindered
by the presence of large uncertainties in the environmental conditions, the blade
structure, and in the form of turbulence closure models that are used to simulate
complex flow. Rigorous quantification of the impact of such uncertainties can
fundamentally improve the state of the art in computational predictions and, as a
result, aid in the design of more cost-effective devices.
Fig. 28.16 Coefficient spectrum for the horn problem using multifidelity compressed sensing with different $r_{\mathrm{points}}$ values. (a) $r_{\mathrm{points}} = 4$. (b) $r_{\mathrm{points}} = 6$. (c) $r_{\mathrm{points}} = 8$. (d) $r_{\mathrm{points}} = 10$
Fig. 28.17 Vertical-axis wind turbine test bed and two-dimensional CACTUS simulation. (a)
VAWT test bed. (b) CACTUS simulation
Fig. 28.18 VAWT geometry and leading edge viscous surface traction for three crosswind
velocities. (a) Mesh geometry. (b) Integrated viscous surface traction
shows a sample simulation for a VAWT geometry of interest where the crosswind magnitude was varied in a fixed tip-speed configuration of 40 m/s. Three different crosswinds were used (3, 5, and 8 m/s), from which the integrated surface traction was computed (Fig. 28.18b). The mesh outlining the three-blade configuration is shown in Fig. 28.18a.
This two-level hierarchy has extreme separation in cost. CACTUS executions for
two-dimensional VAWT simulations are fast, typically requiring a few minutes of
execution time on a single core of a workstation. Conchas simulations, on the other
hand, require approximately 72 h on 48 cores for simulation with 2M finite elements,
such that these simulations are strongly resource constrained. For this case, the work ratio between fidelity levels is approximately 10⁵.
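As a back-of-envelope check of that ratio (taking "a few minutes" of CACTUS time as roughly two core-minutes, an assumption of this sketch):

```python
# Cost ratio between the two fidelity levels, in core-minutes.
conchas_cost = 72 * 60 * 48   # ~72 h on 48 cores for the 2M-element Conchas run
cactus_cost = 2 * 1           # ~2 min on a single workstation core (assumed)
ratio = conchas_cost / cactus_cost
print(f"{ratio:.0e}")         # on the order of 1e5
```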
[Figure: maximum torque (roughly 25–35) and maximum blade normal force (roughly 0.075–0.095) versus parameter step (−100 to 100) for the gust time, gust X0, and gust amplitude parameters.]
Fig. 28.19 Centered 1D parameter studies for initial formulation using the low-fidelity model. Maximum torque and maximum blade normal force are computed as functions of gust phasing, location, and amplitude. (a) Maximum torque. (b) Maximum blade normal force
28 Multifidelity Uncertainty Quantification Using Spectral Stochastic . . . 1029
Fig. 28.20 Conchas simulation of a synthetic gust (inviscid Taylor Vortex) passing through a
VAWT. (a) Closeup of vortex/rotor interaction. (b) Vortex passing downstream
[Figure: blade normal force (N/m) versus time (s), CACTUS and Conchas overlaid, for the two gust amplitudes.]
Fig. 28.21 Comparison of CACTUS and Conchas time histories for small and large gust amplitudes. (a) Small gust (amplitude = 5). (b) Large gust (amplitude = 20)
an inviscid Taylor vortex, as shown in Fig. 28.20. The random variables have been
slightly modified from those in Fig. 28.19 to describe the vortex radius measured
in rotor radii, the lateral position of the vortex in rotor radii, and the amplitude
of the vortex. These three random variables are modeled using bounded normal (μ = 0.25, σ = 0.05, l = 0.0, u = 1.5), uniform (l = −1.5, u = 1.5), and Gumbel (α = 7.106, β = 1.532) probability distributions, respectively. Figure 28.21
overlays CACTUS and Conchas results for the time histories of blade normal
force for two different values of vortex amplitude, where it is evident that there
is qualitative agreement between the time histories. The details of flow separation
for the large gust have important differences, as expected for the different physics
models. Figure 28.22 displays a centered parameter study for the integrated impulse
for blade normal force plotted against variations in the vortex radius, lateral position,
and amplitude parameters. While the obvious discontinuities have been tamed by the
migration to integrated metrics, the response quantity remains a complex function
of its parameters.
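Input sample sets over these three random variables can be generated with standard tools. A minimal NumPy sketch, assuming the bounded normal is a truncated normal and that the Gumbel parameters (α, β) map to location and scale:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def truncated_normal(rng, mu, sigma, lo, hi, size):
    """Sample a normal(mu, sigma) truncated to [lo, hi] by rejection.
    Acceptance is essentially 1 here since the bounds lie ~5 sigma away."""
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(mu, sigma, size)
        out = np.concatenate([out, draw[(draw >= lo) & (draw <= hi)]])
    return out[:size]

radius = truncated_normal(rng, 0.25, 0.05, 0.0, 1.5, n)   # vortex radius (rotor radii)
lateral = rng.uniform(-1.5, 1.5, n)                       # lateral position (rotor radii)
amplitude = rng.gumbel(7.106, 1.532, n)                   # (alpha, beta) assumed = (loc, scale)

samples = np.column_stack([radius, lateral, amplitude])
print(samples.mean(axis=0))
```

Such a fixed sample set is the starting point for the compressed sensing recovery described in the surrounding text.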
[Figure: blade normal force impulse (×10⁴, roughly 7.5–8.3) versus parameter step (−50 to 50) for the vortex radius, vortex lateral position, and vortex amplitude parameters.]
Fig. 28.22 Centered 1D parameter studies for refined formulation using the low-fidelity model. Integrated impulse for blade normal force is computed as a function of Taylor vortex radius, lateral position, and amplitude
[Figure: log-log plot of |spectral coefficient| versus basis index (total-order basis).]
Fig. 28.23 Multifidelity coefficient spectrum: multifidelity coefficients (black) are composed from low fidelity (green) and discrepancy (blue) to extend spectrum from a single-fidelity approach (red)
lower for the multifidelity approaches, indicating more rapid relative convergence. Convergence in PDF and CCDF is less clear, with the most relevant observation being that, while the less resolved single-fidelity (q = 1, in red) and corresponding multifidelity (q, q_low = 1, 5, in blue) results are quite different, the more resolved single-fidelity (q = 2, in green) and corresponding multifidelity (q, q_low = 2, 5, in black) results have begun to coalesce. Thus, the augmentation with low-fidelity information appears to be providing utility, and the process overall appears to support the trends observed with previous examples, although it is clear that additional resolution is needed to achieve more conclusive statistical convergence.
5 Conclusions
[Figure: (a) mean and (b) standard deviation predictions versus number of HF simulations for HF/MF PCE and SC on sparse grids; (c) PDF and (d) CCDF of blade impulse from 100k LHS samples for HF SGL 1, HF SGL 2, MF SGL 1,5, and MF SGL 2,5.]
Fig. 28.24 Refinement of blade impulse statistics for multifidelity PCE/SC compared to single-fidelity PCE/SC. (a) Refinement of mean. (b) Refinement of standard deviation. (c) Refinement of PDF. (d) Refinement of CCDF
Compared to the approach of directly estimating the statistics of the system response from a limited number of expensive high-fidelity model evaluations, greater accuracy can be obtained by incorporating information about the system response from a less expensive but still informative low-fidelity model.
Additive, multiplicative, and combined discrepancy formulations have been
described within the context of polynomial chaos and stochastic collocation based
on sparse grids and within the context of sparse polynomial chaos based on
compressed sensing. Multifidelity sparse grid approaches can include simple pre-
defined offsets in resolution level that enforce computational savings but do not
explicitly manage accuracy, or adaptive approaches that seek the most cost-effective
incremental refinements for accurately resolving statistical quantities of interest.
Multifidelity compressed sensing approaches employ fixed sample sets but adapt the
candidate basis and noise tolerance through the use of cross-validation to achieve an
accurate recovery without overfitting.
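The additive-discrepancy mechanism can be illustrated with a hypothetical one-dimensional model pair (toy stand-ins for an expensive and a cheap solver, not the chapter's codes): a high-order expansion of the low-fidelity model built from many cheap samples, plus a low-order expansion of the discrepancy built from a few expensive samples:

```python
import numpy as np

# Hypothetical model pair: a cheap approximation of an expensive response.
f_hi = lambda x: np.exp(0.7 * x) * np.sin(3.0 * x)
f_lo = lambda x: np.exp(0.7 * x) * np.sin(3.0 * x + 0.05) + 0.1 * x

def legendre_fit(x, y, degree):
    """Least-squares Legendre expansion on [-1, 1] (a PCE-like surrogate)."""
    return np.polynomial.legendre.Legendre.fit(x, y, degree, domain=[-1, 1])

x_lo = np.linspace(-1, 1, 200)   # many cheap low-fidelity evaluations
x_hi = np.linspace(-1, 1, 8)     # few expensive high-fidelity evaluations

lo_surrogate = legendre_fit(x_lo, f_lo(x_lo), degree=12)
disc_surrogate = legendre_fit(x_hi, f_hi(x_hi) - f_lo(x_hi), degree=4)

def multifidelity(x):
    # Additive discrepancy: high fidelity ~ low fidelity + correction.
    return lo_surrogate(x) + disc_surrogate(x)

x_test = np.linspace(-1, 1, 500)
err_mf = np.max(np.abs(multifidelity(x_test) - f_hi(x_test)))
err_sf = np.max(np.abs(legendre_fit(x_hi, f_hi(x_hi), 4)(x_test) - f_hi(x_test)))
print(err_mf, err_sf)  # the multifidelity error should be smaller
```

Because the discrepancy is smaller and smoother than the high-fidelity response itself, the few expensive samples are spent on an easier approximation problem, mirroring the spectral-decay argument made in the surrounding text.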
These multifidelity approaches are most effective in cases where the coefficient spectrum of the model discrepancy decays more rapidly than that of the high-fidelity model (resulting in accelerated convergence rates), and/or where the discrepancy is sparse relative to the high-fidelity model (requiring the recovery of fewer significant terms).
Mori-Zwanzig Approach to Uncertainty
Quantification 29
Daniele Venturi, Heyrim Cho, and George Em Karniadakis
Abstract
Determining the statistical properties of nonlinear random systems is a problem
of major interest in many areas of physics and engineering. Even with recent
theoretical and computational advancements, no broadly applicable technique
has yet been developed for dealing with the challenging problems of high
dimensionality, low regularity and random frequencies often exhibited by the
system. The Mori-Zwanzig and the effective propagator approaches discussed
in this chapter have the potential of overcoming some of these limitations,
in particular the curse of dimensionality and the lack of regularity. The key
idea stems from techniques of irreversible statistical mechanics, and it relies
on developing exact evolution equations and corresponding numerical methods
for quantities of interest, e.g., functionals of the solution to stochastic ordinary
and partial differential equations. Such quantities of interest could be low-
dimensional objects in infinite-dimensional phase spaces, e.g., the lift of an
airfoil in a turbulent flow, the local displacement of a structure subject to random
loads (e.g., ocean waves loading on an offshore platform), or the macroscopic
properties of materials with random microstructure (e.g., modeled atomistically
in terms of particles). We develop the goal-oriented framework in two different,
although related, mathematical settings: the first one is based on the Mori-
Zwanzig projection operator method, and it yields exact reduced-order equations
for the quantity of interest. The second approach relies on effective propagators,
D. Venturi ()
Department of Applied Mathematics and Statistics, University of California Santa Cruz, Santa
Cruz, CA, USA
e-mail: venturi@ucsc.edu
H. Cho
Department of Mathematics, University of Maryland, College Park, MD, USA
G.E. Karniadakis ()
Division of Applied Mathematics, Brown University, Providence, RI, USA
e-mail: gk@dam.brown.edu, george_karniadakis@brown.edu
Keywords
High-dimensional stochastic dynamical systems · Probability density function equations · Projection operator methods · Dimension reduction
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1038
1.1 Overcoming High Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1041
1.2 Overcoming Low Regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1041
2 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1042
2.1 Some Properties of the Solution to the Joint PDF Equation . . . . . . . . . . . . . . . . . . 1043
3 Dimension Reduction: BBGKY Hierarchies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1044
4 The Mori-Zwanzig Projection Operator Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1045
4.1 Coarse-Grained Dynamics in the Phase Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1046
4.2 Projection Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1047
4.3 Time-Convolutionless Form of the Mori-Zwanzig Equation . . . . . . . . . . . . . . . . . . 1048
4.4 Multilevel Coarse-Graining in Probability and Phase Spaces . . . . . . . . . . . . . . . . . 1049
5 The Closure Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1050
5.1 Beyond Perturbation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1051
5.2 Effective Propagators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1052
5.3 Algorithms and Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1055
6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057
6.1 Stochastic Resonance Driven by Colored Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057
6.2 Mori-Zwanzig Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1058
6.3 Fractional Brownian Motion, Lévy, and Other Noises . . . . . . . . . . . . . . . . . . . 1060
6.4 Stochastic Advection-Reaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1061
6.5 Stochastic Burgers Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1063
6.6 Coarse-Grained Models of Particle Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1065
7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1066
8 Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1066
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1067
1 Introduction
for such contraction of state variables is the derivation of the Boltzmann equation from Newton's law or from the Liouville equation [16, 118, 142]. Another
example of a different type is the Brownian motion of a particle in a liquid,
where the master equation governing the position and momentum of the particle
is derived from first principles (Hamilton equations of motion of the full system),
by eliminating the degrees of freedom of the liquid [17,67]. In stochastic systems far
from equilibrium, one often has to deal with the problem of eliminating macroscopic
phase variables, i.e., phase variables with the same order of magnitude and dynam-
ical properties as the ones of interest. For example, to define the turbulent viscosity
in the inertial range of fully developed turbulence, one has to eliminate short
wavelength components of the fluid velocity which are far from equilibrium. This
problem arises more often than one would expect, and it is more challenging than
the problem of contracting microscopic phase variables. For example, it arises when
deriving the master equation for the series expansion of the solution to a nonlinear
stochastic partial differential equation (SPDE), given any discretized form.
In this chapter we illustrate how to perform the contraction of state variables in
nonequilibrium stochastic dynamical systems by using the Mori-Zwanzig projection
operator method and the effective propagator approach. In particular, we will
show how to develop computable evolution equations for quantities of interest in
high-dimensional stochastic systems and how to determine their statistical prop-
erties. This problem received considerable attention in recent years. Well-known
approaches to compute such properties are generalized polynomial chaos (gPC)
[50, 154, 155], multielement generalized polynomial chaos (ME-gPC) [138, 146],
multielement and sparse adaptive probabilistic collocation (ME-PCM) [32, 43,
44, 84], high-dimensional model representations [76, 112], stochastic biorthogonal
expansions [131, 132, 137], and generalized spectral decompositions [100, 101].
These techniques can provide considerable speedup in computational time when
compared to classical approaches such as Monte Carlo (MC) or quasi-Monte Carlo
methods. However, there are still several important computational limitations that
have not yet been overcome. They are related to:
[Figure: four Poincaré sections of the (x₁, x₂) phase plane at t = 0, 8, 16, and 76.]
Fig. 29.1 Duffing equation. Poincaré sections of the phase space at different times obtained by evolving a zero-mean jointly Gaussian distribution with covariance C₁₁ = C₂₂ = 1/4, C₁₂ = 0. Note that simple statistical properties such as the mean and variance are not sufficient to describe the stochastic dynamics of the system (29.5) (Adapted from [136])
unavoidable in nonlinear systems and they are often associated with interesting
physics, e.g., around bifurcation points [138, 139]. By using adaptive methods,
e.g., ME-gPC or ME-PCM, one can effectively resolve such discontinuities
and restore convergence. This is where h-refinement in parameter space is
particularly important [43, 144].
3. Multiple scales: Stochastic systems can involve multiple scales in space, time,
and phase space (see Fig. 29.1) which could be difficult to resolve by conven-
tional numerical methods.
4. Long-term integration: The flow map defined by systems of differential equations can yield large deformations, stretching and folding of the phase space. As a consequence, methods that represent the parametric dependence of the solution on random input variables, e.g., in terms of polynomial chaos of fixed order or in terms of a fixed number of collocation points, will lose accuracy as time increases. This phenomenon can be mitigated, although not completely eliminated.
29 Mori-Zwanzig Approach to Uncertainty Quantification 1041
The Mori-Zwanzig and the effective propagator approaches have the potential of
overcoming some of these limitations, in particular the curse of dimensionality and
the lack of regularity.
The Mori-Zwanzig and the effective propagator approaches allow for a systematic
elimination of the irrelevant degrees of freedom of the system, and they yield
formally exact equations for quantities of interest, e.g., functionals of the solution to
high-dimensional systems of stochastic ordinary differential equations (SODEs) and
stochastic partial differential equations (SPDEs). This allows us to avoid integrating
the full (high-dimensional) stochastic dynamical system and solve directly for the
quantities of interest. In principle, this can break the curse of dimensionality in numerical simulations of SODEs and SPDEs, at the price of solving complex integrodifferential PDEs, i.e., the Mori-Zwanzig equations. The computability of such
PDEs relies on approximations. Over the years many methods have been proposed
for this scope, for example, small correlation expansions [30, 46, 121], cumulant
resummation methods [13, 17, 45, 78, 136], functional derivative techniques [51–53, 140], path integral methods [86, 108, 130, 153], decoupling approximations [54],
and local linearization methods [35]. However, these techniques are not, in general,
effective in eliminating degrees of freedom with the same order of magnitude and
dynamical properties as the quantities of interest. Several attempts have been made
to overcome these limitations and establish a computable framework for Mori-
Zwanzig equations that goes beyond closures based on perturbation analysis. We
will discuss some of these methods later in this chapter.
dimension depends on the amplitude of the forcing term (see, e.g., [69]). However,
the marginal distributions of such a complex joint PDF are approximately Gaussian
(see Fig. 29.4).
2 Formulation

Consider a nonlinear dynamical system driven by random parameters and random initial conditions,

$$\frac{dx(t;\omega)}{dt} = f\big(x(t;\omega),\,\xi(\omega),\,t\big), \qquad x(0;\omega) = x_0(\omega), \qquad (29.1)$$

and the joint probability density function (PDF) of the phase variables and the parameters,

$$p(a,b,t) = \big\langle \delta\big(a - x(t;\omega)\big)\,\delta\big(b - \xi(\omega)\big) \big\rangle, \qquad (29.2)$$

where ⟨·⟩ denotes an integral with respect to the joint probability distribution of ξ(ω) and x₀(ω), while the δ's are multidimensional Dirac delta functions [68, 71]. Also, the vectors a and b represent the phase-space coordinates corresponding to x_i(t; ω) and ξ(ω), respectively. By differentiating (29.2) with respect to time and using well-known identities involving the Dirac delta function, it is straightforward to obtain the following exact hyperbolic conservation law:
$$\frac{\partial p(a,b,t)}{\partial t} = L(a,b,t)\,p(a,b,t), \qquad L(a,b,t) = -\sum_{i=1}^{n}\left(\frac{\partial f_i(a,b,t)}{\partial a_i} + f_i\,\frac{\partial}{\partial a_i}\right). \qquad (29.3)$$

In the sequel we will often set p(t) ≡ p(a, b, t) and L(t) ≡ L(a, b, t) for notational convenience. Equation (29.3) is equivalent to the Liouville equation of classical
statistical mechanics (for non-Hamiltonian systems), with the remarkable difference
that the phase variables we consider here can be rather general coordinates not
simply positions and momenta of particles. For instance, they could be the Galerkin
or the collocation coefficients arising from a spatial discretization of a SPDE, e.g.,
if we represent the solution as
$$u(X,t;\omega) = \sum_{j=1}^{n} x_j(t;\omega)\,\phi_j(X), \qquad (29.4)$$

where φ_j(X) are spatial basis functions. Early formulations in this direction were
proposed by Edwards [33], Herring [56], and Montgomery [90] in the context of
fluid turbulence.
Nonlinear systems in the form (29.1) can lead to all sorts of dynamics, including
bifurcations, fractal attractors, multiple stable steady states, and transition scenarios.
Consequently, the solution to the joint PDF equation (29.3) can be very complex
as well, since it relates directly to the geometry of the phase space. For example,
it is known that the time-asymptotic joint PDF associated with the Lorenz three-mode problem lies on a fractal attractor with Hausdorff dimension of about 2.06 (see
[143]). Chaotic states and existence of strange attractors have been well documented
for many other systems, such as the Lorenz-84 (see [14]) and the Lorenz-96 [69]
models. Even in the much simpler case of the Duffing equation
$$\frac{dx_1}{dt} = x_2, \qquad \frac{dx_2}{dt} = x_1 - 5x_1^3 - \frac{x_2}{50} + 8\cos\left(\frac{t}{2}\right), \qquad (29.5)$$
we can have attractors with fractal structure and chaotic phase similarities [11]. This is clearly illustrated in Fig. 29.1 where we plot the Poincaré sections of the two-dimensional phase space at different times. Such sections are obtained by sampling 10⁶ initial states from a zero-mean jointly Gaussian distribution and then evolving
them by using (29.5). Since the joint PDF of the phase variables is, in general, a
high-dimensional compactly supported distribution with a possibly fractal structure,
its numerical approximation is a very challenging task, especially in longtime
integration.
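The sampling experiment behind Fig. 29.1 can be sketched directly (a reduced sketch: fewer samples than the chapter's 10⁶, and the signs in the right-hand side of (29.5) are assumed to be dx₁/dt = x₂, dx₂/dt = x₁ − 5x₁³ − x₂/50 + 8 cos(t/2)):

```python
import numpy as np

def duffing_rhs(t, x):
    """Duffing right-hand side, cf. (29.5); x has shape (2, N)."""
    x1, x2 = x
    return np.array([x2,
                     x1 - 5.0 * x1**3 - x2 / 50.0 + 8.0 * np.cos(t / 2.0)])

def rk4(rhs, x, t0, t1, dt):
    """Classical fourth-order Runge-Kutta integration over [t0, t1]."""
    t = t0
    while t < t1 - 1e-12:
        h = min(dt, t1 - t)
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2, x + h / 2 * k1)
        k3 = rhs(t + h / 2, x + h / 2 * k2)
        k4 = rhs(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

rng = np.random.default_rng(1)
n_samples = 5_000                   # the chapter uses 10**6
x0 = rng.multivariate_normal([0.0, 0.0], np.eye(2) / 4.0, n_samples).T
x8 = rk4(duffing_rhs, x0, 0.0, 8.0, dt=0.01)   # ensemble state at t = 8
print(x8.mean(axis=1), x8.std(axis=1))
```

A scatter plot of the columns of `x8` reproduces the stretched-and-folded cloud of the t = 8 panel; the printed mean and standard deviation illustrate how little these low-order statistics say about that structure.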
However, the statistical description of the system (29.1) in terms of the joint
PDF equation (29.3) is often far beyond practical needs. For instance, we may be
interested in the PDF of only one component, e.g., x₁(t; ω), or in the PDF of a phase-space function u = g(x) such as (29.4). These PDFs can be obtained either
by integrating out several phase variables from the solution to Eq. (29.3), by constructing NARMAX (Nonlinear AutoRegressive Moving Average with eXogenous input) models (see [7], Section 5.7) or by applying the projection operator or the effective
propagator approaches discussed in this chapter. This may yield a low-dimensional
PDF equation whose solution is more regular than the one obtained by solving
directly Eq. (29.3) and therefore more amenable to computation. The regularization
of the reduced-order PDF is due to multidimensional integration (marginalization)
of the joint PDF.
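This regularizing effect of marginalization can be checked in a small numerical experiment (a sketch with a synthetic joint density, not a solution of (29.3)): a joint PDF concentrated on a thin curved ridge is highly oscillatory along any one-dimensional slice, while the marginal obtained by integrating out the second variable is smooth:

```python
import numpy as np

# Synthetic joint PDF: a narrow Gaussian ridge along the curve y = sin(3x),
# modulated by exp(-x^2) and normalized by a Riemann sum on the grid.
x = np.linspace(-2.0, 2.0, 400)
y = np.linspace(-2.0, 2.0, 400)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")
ridge = np.exp(-(Y - np.sin(3.0 * X)) ** 2 / (2.0 * 0.02 ** 2)) * np.exp(-X ** 2)
p_joint = ridge / (ridge.sum() * dx * dy)

# Marginalize out y: the reduced-order PDF of x alone.
p_x = p_joint.sum(axis=1) * dy

# Total variation of a slice of the joint PDF versus that of the marginal.
tv_slice = np.abs(np.diff(p_joint[:, 200])).sum()   # slice near y = 0
tv_marginal = np.abs(np.diff(p_x)).sum()
print(tv_slice, tv_marginal)  # the marginal is far less oscillatory
```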
1044 D. Venturi et al.
$$p(t,a,b) = p_\xi(b)\,\prod_{i=1}^{n} p_i(a_i,t), \qquad (29.7)$$

where p_ξ(b) is the joint PDF of the random vector ξ, yields the system of conservation laws (i = 1, …, n):
$$\frac{\partial p_i(a_i,t)}{\partial t} = -\frac{\partial}{\partial a_i}\left[\, p_i(a_i,t) \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_i(a,b,t)\, p_\xi(b) \prod_{\substack{j=1 \\ j\neq i}}^{n} p_j(a_j,t)\, da_j\, db \right]. \qquad (29.8)$$
These equations are coupled through the integrals appearing within the square bracket. As an example, consider the Lorenz-96 system [69]

$$\frac{dx_i}{dt} = (x_{i+1} - x_{i-2})\,x_{i-1} - x_i + c, \qquad i = 1,\dots,40. \qquad (29.9)$$
[Figure: second-order BBGKY and MC standard deviations of the 40 phase variables versus time, and absolute errors of the first- and second-order truncations (mean and standard deviation) relative to MC.]
Fig. 29.2 Lorenz-96 system. Standard deviation of the phase variables versus time (left) and absolute errors of first- and second-order truncations of the BBGKY hierarchy relative to MC (Adapted from [22])
$$\frac{\partial p_i(a_i,t)}{\partial t} = -\frac{\partial}{\partial a_i}\Big[\big(\langle x_{i+1}\rangle - \langle x_{i-2}\rangle\big)\langle x_{i-1}\rangle\, p_i(a_i,t) - (a_i - c)\, p_i(a_i,t)\Big], \qquad (29.10)$$
where ⟨·⟩ denotes averaging with respect to the joint PDF of the system, assumed in the form (29.7). Higher-order truncations, i.e., truncations involving multipoint
PDFs, can be obtained in a similar way (see [22, 90]). Clearly, higher truncation
orders yield better accuracy, but at higher computational cost (see Fig. 29.2).
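Equation (29.10) implies that, under the first-order closure, the mean of each component evolves by evaluating the Lorenz-96 right-hand side at the vector of means: every nonlinear term in (29.9) couples distinct components, so moments of products factor into products of moments. A minimal check against a Monte Carlo ensemble (the forcing value c = 8 and the initial spread are assumptions of this sketch):

```python
import numpy as np

def l96(x, c=8.0):
    """Lorenz-96 right-hand side, cf. (29.9); x has shape (40,) or (40, N)."""
    return (np.roll(x, -1, axis=0) - np.roll(x, 2, axis=0)) * np.roll(x, 1, axis=0) - x + c

def rk4_step(f, x, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + h / 2 * k1)
    k3 = f(x + h / 2 * k2)
    k4 = f(x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(2)
n, h, steps = 40, 0.005, 40                      # integrate to t = 0.2
mean0 = rng.uniform(-1.0, 1.0, n)
ens = mean0[:, None] + 0.05 * rng.standard_normal((n, 5000))  # MC ensemble

m = mean0.copy()                                 # first-order (mean-field) closure
for _ in range(steps):
    ens = rk4_step(l96, ens, h)
    m = rk4_step(l96, m, h)                      # closure: RHS evaluated at the means

err = np.abs(m - ens.mean(axis=1)).max()
print(err)  # small at short times; grows as correlations build up
```

The growing error as correlations develop is exactly what motivates the higher-order truncations compared in Fig. 29.2.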
where T is the chronological time-ordering operator (latest times to the left). For a
detailed derivation, see, e.g., [13, 17, 66, 136, 159]. From Eq. (29.11) we see that the
exact dynamics of the PDF of the relevant phase variables (projected PDF Pp(t)) depends on three terms: the Markovian term PL(t)Pp(t), computable based on the current state Pp(t); the initial condition (or noise) term PL(t)G(t,0)Qp(0); and the memory term (time convolution), the latter two depending on the propagator G(t,0) of the orthogonal dynamics. The critical part of the MZ formulation is to find reliable and accurate approximations of the memory and the initial condition terms.
The nature of the projection operator P will be discussed extensively in
subsequent sections. For now, it is sufficient to note that such projection basically
extracts from the full joint PDF equation (29.3) only the part that describes (in
an exact way) the dynamics of the relevant phase variables. A simple analysis of
Eq. (29.11) immediately shows its irreversibility. Roughly speaking, the projected
distribution function Pp(t), initially in a certain subspace, leaks out of this subspace
so that information is lost, hence the memory (time convolution) and the initial
condition terms.
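The decomposition just described follows the standard Nakajima-Zwanzig structure. As a sketch in generic operator notation (reconstructed from the cited literature [13, 17, 66, 136, 159]; the chapter's Eq. (29.11) may differ in notational details), with Q = I − P:

```latex
\frac{\partial}{\partial t}\,\mathcal{P}p(t)
  = \underbrace{\mathcal{P}L(t)\,\mathcal{P}p(t)}_{\text{Markovian}}
  + \underbrace{\mathcal{P}L(t)\,G(t,0)\,\mathcal{Q}p(0)}_{\text{initial condition}}
  + \underbrace{\int_0^t \mathcal{P}L(t)\,G(t,s)\,\mathcal{Q}L(s)\,\mathcal{P}p(s)\,\mathrm{d}s}_{\text{memory}},
\qquad
G(t,s) = \mathcal{T}\exp\!\left(\int_s^t \mathcal{Q}L(\tau)\,\mathrm{d}\tau\right),
```

where T is the chronological time-ordering operator, as in the text.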
The Mori-Zwanzig projection operator method just described can also be used to reduce the dimensionality of either deterministic or stochastic systems of equations in phase space, yielding generalized Langevin equations [61, 118] for quantities of interest. One remarkable example of such equations is the one describing the coarse-grained dynamics of a particle system. Within this framework, the phase variables x_i(t) in (29.1) can represent either the position or the momentum of particle i. Coarse-graining is achieved by defining a new set of state variables:
Fig. 29.3 Coarse-graining particle systems using the projection operator method. The microscopic degrees of freedom associated with each atom are condensed into a smaller number of degrees of freedom (those associated with the big green particles). Coarse-graining is not unique, and therefore fundamental questions regarding model inadequacy, selection, and validation have to be carefully addressed
with respect to time and substitute (29.14) within the average (see, e.g., [92, 118] for the derivation). If (29.1) represents the semi-discrete form of an SPDE and we are interested in the phase-space function (29.4), then the Mori-Zwanzig formulation yields the exact PDF equation for the series expansion of the solution to the SPDE. This overcomes the well-known closure problem arising in PDF equations corresponding to SPDEs with diffusion or higher-order terms [105, 110, 141]. On the other hand, if $u_i(t) = x_i(t)$ $(i = 1,\ldots,q)$ are the first q components of a Galerkin dynamical system, then the Mori-Zwanzig projection operator method allows us to construct a closure approximation in which the unresolved dynamics (modes from q+1 to n) are injected in a formally exact way into the resolved dynamics. This use of projection operators has been investigated, e.g., by Chorin [24, 27], Stinis [119], and Chertock [20]. Early studies in this direction not involving Mori-Zwanzig employed inertial manifolds [40] and nonlinear Galerkin projections [83].
(coarse-graining in the phase space, Eq. (29.14)). Well-known choices are the Zwanzig projection [159], the Mori projection, projections defined in terms of Boltzmann-Gibbs measures [61, 118, 128], or projections defined by conditional expectations [25, 27, 29, 119]. If the relevant and the irrelevant phase variables (hereafter denoted by a and b, respectively) are statistically independent, i.e., if $p(0) = p_a(0)\,p_b(0)$, then a convenient projection is
$$P p(t) = p_b(0)\int p(t)\,db \qquad\Rightarrow\qquad p_a(t) = \int P p(t)\,db. \tag{29.17}$$
This projection takes the joint PDF p(t) and sends it to a separated state. In this case, p(0) is in the range of P, i.e., $Pp(0) = p(0)$, and therefore the initial-condition term in the MZ-PDF equation drops out, since $Qp(0) = (I - P)\,p(0) = 0$.
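The action of this projection is easy to verify numerically. The following sketch uses a hypothetical correlated Gaussian joint PDF p(a, b) on a tensor grid (an illustration, not an example from the chapter) and checks that P is idempotent and that marginalizing Pp over b recovers the relevant-variable PDF:

```python
import numpy as np

# Verify the projection (29.17) on a hypothetical correlated Gaussian
# joint PDF p(a, b): P is idempotent, and marginalizing Pp over b
# recovers the relevant-variable marginal p_a.
a = np.linspace(-5, 5, 201)
b = np.linspace(-5, 5, 201)
da, db = a[1] - a[0], b[1] - b[0]
A, B = np.meshgrid(a, b, indexing="ij")

rho = 0.6                                   # correlation between a and b
p = np.exp(-(A**2 - 2*rho*A*B + B**2) / (2 * (1 - rho**2)))
p /= p.sum() * da * db                      # normalize the joint PDF
pb0 = np.exp(-b**2 / 2) / np.sqrt(2*np.pi)  # PDF of the irrelevant variable

def P(f):
    """Projection P f = p_b(0) * (integral of f over b), cf. (29.17)."""
    return np.outer(f.sum(axis=1) * db, pb0)

Pp = P(p)
pa = p.sum(axis=1) * db                     # relevant-variable marginal
assert np.allclose(P(Pp), Pp, atol=1e-6)    # idempotent, up to quadrature
assert np.allclose(Pp.sum(axis=1) * db, pa, atol=1e-6)
print("projection checks passed")
```

The quadrature tolerance accounts for the truncated Gaussian tails on the finite grid.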
and replace p(s) with the solution to Eq. (29.3), propagated backward from time t to time s < t, i.e.,
$$p(s) = Z(t,s)\,p(t), \qquad Z(t,s) = \overrightarrow{T}\exp\left(-\int_s^t L(\tau)\,d\tau\right). \tag{29.19}$$
In the latter definition, $\overrightarrow{T}$ is the anti-chronological ordering operator (latest times to the right). Substituting (29.19) into (29.18) yields
$$Q p(t) = \left[I - \Sigma(t)\right]^{-1}\Sigma(t)\,P p(t) + \left[I - \Sigma(t)\right]^{-1} G(t,0)\,Q p(0), \tag{29.20}$$
where
$$\Sigma(t) = \int_0^t G(t,s)\,Q L(s)\,P\,Z(t,s)\,ds. \tag{29.21}$$
Equation (29.20) states that the irrelevant part of the PDF, Qp(t), can, in principle, be determined from the knowledge of the relevant part Pp(t) at time t and from the initial condition Qp(0). Thus, the dependence on the history of the relevant part, which occurs in the classical Mori-Zwanzig equation, has been removed by the backward propagation (29.19). This leads to the convolutionless equation
29 Mori-Zwanzig Approach to Uncertainty Quantification 1049
$$\frac{\partial P p(t)}{\partial t} = K(t)\,P p(t) + H(t)\,Q p(0), \tag{29.22}$$
where
$$K(t) = P L(t)\left[I - \Sigma(t)\right]^{-1}, \qquad H(t) = P L(t)\left[I - \Sigma(t)\right]^{-1} G(t,0). \tag{29.23}$$
$$\frac{\partial Q p(t)}{\partial t} = Q L(t)\left[P p(t) + Q p(t)\right], \tag{29.24}$$
$$\frac{\partial P_1 Q p(t)}{\partial t} = P_1 Q L(t)\left[P p(t) + P_1 Q p(t) + Q_1 Q p(t)\right], \tag{29.25}$$
$$\frac{\partial Q_1 Q p(t)}{\partial t} = Q_1 Q L(t)\left[P p(t) + P_1 Q p(t) + Q_1 Q p(t)\right]. \tag{29.26}$$
@t
Proceeding similarly, we can split the equation for $Q_1 Q p(t)$ by using a new pair of orthogonal projections $P_2$ and $Q_2$ satisfying $P_2 + Q_2 = I$. This yields two additional evolution equations, for $P_2 Q_1 Q p(t)$ and $Q_2 Q_1 Q p(t)$, respectively. Obviously, one can repeat this process indefinitely to obtain a hierarchy of equations which generalizes both the Mori-Zwanzig and the BBGKY frameworks. The
advantage of this formulation over the classical approach lies in the fact that the joint PDF p(t) is not simply split into relevant and irrelevant parts by using P and Q. Indeed, the dynamics of the irrelevant part Qp(t) is decomposed further in terms of a new set of projections.
This allows us to coarse-grain relevant features of the orthogonal dynamics further in terms of lower-dimensional quantities. In other words, the multilevel projection operator method allows us to seamlessly interface dynamical systems at different scales in a mathematically rigorous way. This is particularly useful when coarse-graining (in state space) high-dimensional systems in the form (29.1). To this end, we simply have to define a set of quantities of interest $u^{(1)} = g^{(1)}(x,t)$, $u^{(2)} = g^{(2)}(x,t)$, etc. (see (29.13)), e.g., representing clusters of particles of different sizes, together with corresponding projection operators $P_1$, $P_2$, etc. This yields a coupled set of equations resembling (29.14) in which relevant features of the microscopic dynamics interact at different scales defined by the different projection operators.
$$L = L_0 + \varepsilon L_1. \tag{29.27}$$
Here $L_0$ depends only on the relevant variables of the system, $\varepsilon$ is a positive real number (the coupling constant of time-dependent quantum perturbation theory), and the norm $\|L_1\|$ is small in a suitable sense (see, e.g., [13, 17, 93]). By using the interaction representation of quantum mechanics [9, 149], it is quite straightforward to obtain from (29.22) and (29.27) an effective approximation (see [136]). One way to do so is to expand the operators (29.23) in a cumulant series, e.g., in terms of Kubo-Van Kampen operator cumulants [55, 64, 74, 97], involving increasing powers of $\varepsilon$. Any finite-order truncation of such a series then represents an approximation to the exact MZ-PDF equation. In particular, the approximation obtained by retaining only the first two cumulants is known as the Born approximation in quantum field theory [17]. We remark that, from the point of view of perturbation theory, the convolutionless form (29.22) has distinctive advantages over the usual convolution form (29.11). In particular, in the latter case, a certain amount of
Fig. 29.4 Lorenz-96 system. Joint PDFs of different phase variables at time t = 100. Setting c = 20 in (29.9) yields chaotic dynamics. In this case it can be shown that the joint PDF that solves Eq. (29.3) converges to a fractal attractor with Hausdorff dimension 34.5 [69]. However, the reduced-order PDFs are approximately Gaussian. This can be justified on the basis of chaos and multidimensional integration (29.6)
techniques [26] and PDF estimates to construct a hybrid approach in which the memory and the initial-condition terms in the MZ equation are computed on the fly based on samples. In this way, one can compensate for the loss of accuracy associated with approximating the full MZ equation using only a few samples of the full dynamical system. A closely related approach is to estimate the expansion coefficients of the effective propagator (see the subsequent section) by using samples of the full dynamical system (29.1) and retaining only the coefficients larger than a certain threshold.
If the initial state p(0) is separable, i.e., if $p(0) = p_a(0)\,p_b(0)$ (where a and b are the relevant and irrelevant phase-space coordinates), then the exact evolution of the relevant part of the PDF is given by
$$p_a(t) = \left\langle e^{tL}\right\rangle p_a(0), \tag{29.29}$$
where ⟨·⟩ denotes an average with respect to the PDF $p_b(0)$. For example, the exact evolution of the PDF of the first component of the Lorenz-96 system (29.9) is given by
$$p_1(a_1,t) = \left[\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{tL}\, p_2(a_2,0)\cdots p_n(a_n,0)\,da_2\cdots da_n\right] p_1(a_1,0), \tag{29.30}$$
where L, in this case, is
$$L = nI - \sum_{i=1}^{n}\left[(a_{i+1} - a_{i-2})\,a_{i-1} - a_i + c\right]\frac{\partial}{\partial a_i}. \tag{29.31}$$
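The near-Gaussianity of the reduced-order PDFs noted in Fig. 29.4 can be checked by direct simulation. The following sketch integrates the Lorenz-96 system with n = 40 and c = 20 and samples the first component on the attractor (the time step, initial condition, and run length are assumptions of this illustration):

```python
import numpy as np

# Integrate Lorenz-96, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + c,
# with n = 40 and c = 20, and collect samples of the first component.
n, c, dt = 40, 20.0, 0.005
rng = np.random.default_rng(0)

def f(x):
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + c

x = c + 0.01 * rng.standard_normal(n)   # small perturbation of equilibrium
samples = []
for k in range(60_000):
    k1 = f(x); k2 = f(x + 0.5*dt*k1); k3 = f(x + 0.5*dt*k2); k4 = f(x + dt*k3)
    x = x + dt * (k1 + 2*k2 + 2*k3 + k4) / 6      # classical RK4 step
    if k > 10_000:                                # discard the transient
        samples.append(x[0])

s = np.asarray(samples)
skew = np.mean((s - s.mean())**3) / s.std()**3
kurt = np.mean((s - s.mean())**4) / s.std()**4 - 3.0
print(s.mean(), s.std(), skew, kurt)   # sample moments of the reduced PDF
```

The printed skewness and excess kurtosis can be compared against zero, the Gaussian reference values.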
The linear operator $\langle e^{tL}\rangle$ appearing in (29.29) is known as the relaxation operator [74] or the effective propagator [63, 87] of the reduced-order dynamics. Such a propagator is no longer a semigroup, as $\langle e^{(t+s)L}\rangle \neq \langle e^{tL}\rangle\langle e^{sL}\rangle$, i.e., the evolution of $p_a(t)$ is non-Markovian. This reflects the memory effect induced in the reduced-order dynamics when we integrate out the phase variables b. To compute the effective propagator, we need to resort to approximations. For example, we could expand it in a power series [34, 70] as
$$\left\langle e^{tL}\right\rangle = I + \sum_{k=1}^{\infty}\frac{t^k}{k!}\left\langle L^k\right\rangle. \tag{29.32}$$
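Both the loss of the semigroup property and the moment expansion (29.32) can be illustrated with a scalar surrogate, where the operator L is replaced by a random decay rate and the operator average by a Monte Carlo mean (a toy model, not an example from the chapter):

```python
import math
import numpy as np

# Scalar surrogate of the effective propagator: replace L by a random
# decay rate lam; averaging over lam mimics the average over the
# irrelevant variables in (29.29). Values below are illustrative.
rng = np.random.default_rng(1)
lam = -1.0 + 0.5 * rng.standard_normal(100_000)

def avg_prop(t):
    """Scalar analogue of <e^{tL}>."""
    return float(np.mean(np.exp(t * lam)))

t, s = 0.7, 0.9
lhs, rhs = avg_prop(t + s), avg_prop(t) * avg_prop(s)
print(lhs, rhs)              # distinct: averaging destroys the semigroup
assert abs(lhs - rhs) > 1e-3

# the moment series (29.32) reproduces the averaged propagator
series = 1.0 + sum(t**k / math.factorial(k) * np.mean(lam**k)
                   for k in range(1, 25))
assert abs(series - avg_prop(t)) < 1e-8
```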
This expression shows that the dynamics of the PDF $p_a(t)$ is fully determined by the moments of the operator L relative to the joint distribution of the irrelevant phase variables. In particular, the kth-order moment $\langle L^k\rangle$ governing the dynamics of $p_1(a_1,t)$ in the Lorenz-96 system (29.30) is a linear differential operator in $a_1$ involving derivatives up to order k, i.e.,
$$\left\langle L^k\right\rangle = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \underbrace{L\cdots L}_{k\ \text{times}}\; p_2(a_2,0)\cdots p_n(a_n,0)\,da_2\cdots da_n \tag{29.33}$$
$$= \sum_{j=0}^{k}\gamma_j^{(k)}(a_1)\,\frac{\partial^{\,j}}{\partial a_1^{\,j}}. \tag{29.34}$$
The coefficients $\gamma_j^{(k)}$ can be calculated by substituting (29.31) into (29.33) and carrying out all integrations. This is a cumbersome calculation, but in principle it can be performed and yields exact results. The problem is that truncating moment expansions such as (29.32) at any finite order usually yields results of poor accuracy. This is because we may be discarding terms growing like $t^k$ if the norm of $\langle L^k\rangle$ does not decay rapidly enough. A classical approach to overcome these limitations is to use operator cumulants [55, 64, 66, 73, 74]. For autonomous dynamical systems, we have the exact formula (see, e.g., [4])
$$\left\langle e^{tL}\right\rangle = e^{\left\langle e^{tL} - I\right\rangle_c}, \tag{29.35}$$
Following Kubo [73, 74], we emphasize that many different types of operator cumulants can be defined. Disregarding, for the moment, the specific prescription used to construct such operator cumulants (see [55, 74]), let us differentiate (29.29) with respect to time and take (29.35) into account. This yields the following exact reduced-order PDF equation:
$$\frac{\partial p_a(t)}{\partial t} = \left(\left\langle L\right\rangle_c + \sum_{k=2}^{\infty}\frac{t^{k-1}}{(k-1)!}\left\langle L^k\right\rangle_c\right) p_a(t), \tag{29.37}$$
where
$$\left\langle L^k\right\rangle_c = \sum_{i_1,\ldots,i_q=0}^{k}\gamma_{i_1\cdots i_q}^{(k)}(a_1,\ldots,a_q)\,\frac{\partial^{\,i_1+\cdots+i_q}}{\partial a_1^{i_1}\cdots\partial a_q^{i_q}}. \tag{29.38}$$
A substitution of this series expansion into Eq. (29.37) immediately suggests that the exact evolution of the reduced-order PDF $p_a(t)$ is governed, in general, by a linear PDE involving derivatives of infinite order in the phase variables $(a_1,\ldots,a_q)$. All coefficients $\gamma_{i_1\cdots i_q}^{(k)}(a_1,\ldots,a_q)$ appearing in (29.38) can be expressed in terms of integrals of polynomial functions of the $f_i$ (see Eq. (29.1)). However, computing such coefficients at all orders is neither trivial nor practical. On the other hand, determining an approximate advection-diffusion form of (29.37) is possible, simply by taking into account those coefficients leading to second-order derivatives in the phase variables. This can be achieved in a systematic way by truncating the series (29.38) to derivatives of second order and computing the corresponding coefficients.
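The advantage of cumulants over raw moments can again be seen on a scalar surrogate (a toy Gaussian-rate model, not from the chapter): for a Gaussian random rate all cumulants beyond the second vanish, so a second-order cumulant truncation stays accurate where the second-order moment truncation of (29.32) fails:

```python
import numpy as np

# Gaussian surrogate rates: the second-order cumulant truncation of
# <e^{t lam}> stays accurate as t grows, while the second-order moment
# truncation degrades (terms growing like t^k are discarded).
rng = np.random.default_rng(2)
lam = -1.0 + 0.4 * rng.standard_normal(200_000)
m1, m2 = np.mean(lam), np.mean(lam**2)    # first two moments
c1, c2 = m1, m2 - m1**2                   # first two cumulants

for t in (0.5, 1.0, 2.0):
    exact = np.mean(np.exp(t * lam))
    mom2 = 1 + t*m1 + 0.5*t**2*m2         # truncated moment series
    cum2 = np.exp(t*c1 + 0.5*t**2*c2)     # truncated cumulant series
    print(t, exact, mom2, cum2)

exact = np.mean(np.exp(2.0 * lam))
assert abs(np.exp(2.0*c1 + 2.0*c2) - exact) < 1e-2 * exact
assert abs(1 + 2.0*m1 + 2.0*m2 - exact) > 0.1 * exact
```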
Each operator $U(t_i, t_{i-1})$ (short-time propagator) can then be approximated according to an appropriate decomposition formula [10, 122, 123, 150]. In particular, if L is given by (29.3) and if $\Delta t = |t_i - t_{i-1}|$ is small, then one can use the following first-order approximation:
$$\exp\left[-\Delta t\sum_{i=1}^{n}\left(\frac{\partial f_i}{\partial a_i} + f_i\frac{\partial}{\partial a_i}\right)\right] \simeq \exp\left[-\Delta t\sum_{i=1}^{n}\frac{\partial f_i}{\partial a_i}\right]\prod_{k=1}^{n}\exp\left[-\Delta t\, f_k\frac{\partial}{\partial a_k}\right]. \tag{29.40}$$
This allows us to split the joint PDF equation (29.3) into a system of PDF equations. This approach is quite standard in numerical methods for linear PDEs in which the generator of the semigroup can be represented as a superposition of linear operators. The error estimate for the decomposition formula (29.40) is given in [122, 124]. Higher-order formulas, such as Lie-Trotter, Suzuki, and related Baker-Campbell-Hausdorff formulas, can be constructed as well. The literature on this subject is very rich; see, e.g., [9, 15, 122, 127, 148, 151].
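The first-order accuracy of such splittings is easy to verify on finite-dimensional surrogates of the operators. The random matrices below are purely illustrative stand-ins for the two exponentials in (29.40):

```python
import numpy as np
from scipy.linalg import expm

# First-order (Lie-Trotter) splitting error for two non-commuting random
# matrices A, B: the local error is O(dt^2), so halving dt roughly
# quarters it.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

def split_err(dt):
    return np.linalg.norm(expm(dt*(A + B)) - expm(dt*A) @ expm(dt*B))

e1, e2 = split_err(0.01), split_err(0.005)
print(e1 / e2)            # ratio close to 4 for a first-order splitting
assert 3.5 < e1 / e2 < 4.5
```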
A somewhat related approach relies on approximating the exponential semigroup $e^{tL}$ in terms of operator polynomials, e.g., the Faber polynomials $F_k$ [103]. In this case, the exact evolution of the PDF can be expressed as
$$p_a(t) = \sum_{k=0}^{N}\beta_k(t)\,\varphi_k(a), \qquad \text{where } \varphi_k(a) = \left\langle F_k(L)\right\rangle p_a(0). \tag{29.41}$$
The basis functions $\varphi_k$ satisfy the three-term recurrence
$$\varphi_{k+1}(a) = \left\langle L\,F_k(L)\right\rangle p_a(0) - c_0\,\varphi_k(a) - c_1\,\varphi_{k-1}(a), \qquad \varphi_0(a) = 1. \tag{29.43}$$
In some cases, the operator averages $\langle L^n\rangle$ appearing in $\langle L\,F_k(L)\rangle$ can be reduced to one-dimensional integrals. This happens, in particular, if the initial PDF p(0) is separable and if the functions $f_k$ appearing in (29.1) are separable as well. Although this might seem a severe restriction, it is actually satisfied by many systems, including Lorenz-96 [79], Kraichnan-Orszag [106], and the semi-discrete form of SPDEs with polynomial-type nonlinearities (e.g., the viscous Burgers and Navier-Stokes equations).
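On a real interval the Faber polynomials reduce to scaled Chebyshev polynomials, so the flavor of the expansion (29.41) can be sketched on a scalar surrogate of the propagator (the interval, degree, and test function below are assumptions of this sketch):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Polynomial approximation of the exponential e^{lam} for decay rates
# lam in [-4, 0]: Chebyshev.interpolate builds the expansion, which is
# the Faber expansion specialized to a real interval.
cheb = C.Chebyshev.interpolate(np.exp, 12, domain=[-4.0, 0.0])

lam = np.linspace(-4.0, 0.0, 101)
err = np.max(np.abs(cheb(lam) - np.exp(lam)))
print(err)                  # uniform error of the degree-12 expansion
assert err < 1e-8
```

The rapid (spectral) decay of the error with the degree is what makes polynomial approximations of the effective propagator attractive.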
example, the joint PDF equation (29.3) is a hyperbolic conservation law whose
solution is purely advected (with no diffusion) by the compressible flow G. This
can easily yield mixing, fractal attractors, and all sorts of complex dynamics (see
Fig. 29.1).
Lack of regularity: The solution to a PDF equation is, in general, a distribution
[68]. For example, it could be a multivariate Dirac delta function, a function
with shock-type discontinuities [23], or even a fractal object. From a numerical
viewpoint, resolving such distributions is not trivial although in some cases it can
be done by taking integral transformations or projections [156]. An additional
numerical difficulty inherent to the simulation of PDF equations arises due to the
fact that the solution could be compactly supported over disjoint domains. This
obviously requires the development of appropriate numerical techniques such as
adaptive discontinuous Galerkin methods [21, 28, 113].
Conservation properties: There are several properties of the solution to a PDF
equation that must be preserved in time. The most obvious one is mass, i.e., the
solution always integrates to one. Other properties that must be preserved are the
positivity of the joint PDF and the fact that a partial marginalization of a joint
PDF still yields a PDF.
Long-term integration: The flow map defined by nonlinear dynamical systems can yield large deformations, stretching, and folding of the phase space. As a consequence, numerical schemes for the kinetic equations associated with such systems will generally lose accuracy over time.
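The conservation properties listed above can be monitored explicitly in any discretization. The following is a minimal sketch (a hypothetical first-order upwind finite-volume scheme for a one-dimensional Liouville equation with f(a) = −a and periodic wrap, not a method from the chapter) checking unit mass and positivity:

```python
import numpy as np

# Conservative upwind finite-volume scheme for dp/dt + d/da( f(a) p ) = 0
# with f(a) = -a; we check the structural properties of a PDF equation:
# the solution integrates to one and stays nonnegative.
na, L = 400, 5.0
a = np.linspace(-L, L, na, endpoint=False) + L / na    # cell centers
da = a[1] - a[0]
p = np.exp(-(a - 1.0)**2 / 0.5)
p /= p.sum() * da                                      # initial PDF

f_edge = -(a - da / 2)                                 # velocity at left edges
dt = 0.5 * da / np.abs(f_edge).max()                   # CFL condition

for _ in range(500):
    pl, pr = np.roll(p, 1), p                          # upwind states
    flux = np.where(f_edge > 0, f_edge * pl, f_edge * pr)
    p = p - dt / da * (np.roll(flux, -1) - flux)       # flux-difference update

print(p.sum() * da, p.min())
assert abs(p.sum() * da - 1.0) < 1e-10                 # mass conserved
assert (p >= 0).all()                                  # positivity preserved
```

The flux-difference form conserves mass by construction; positivity follows from the CFL restriction for this first-order scheme.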
Over the years, many different methods have been proposed to address these issues, with the most efficient ones being problem dependent. For example, a widely used method in statistical fluid mechanics is the particle/mesh method [95, 109–111], which is based directly on stochastic Lagrangian models. Other methods make use of stochastic fields [129] or direct quadrature of moments [47]. In the case of the Boltzmann equation, there is a very rich literature. Both probabilistic approaches, such as direct simulation Monte Carlo [8, 116], and deterministic methods, e.g., discontinuous Galerkin and spectral methods [18, 19, 39], have been proposed to compute the solution. Probabilistic methods such as direct simulation Monte Carlo are extensively used because of their very low computational cost compared to finite volumes, finite differences, or spectral methods, especially in the multidimensional case. However, Monte Carlo usually yields fluctuating solutions of limited accuracy, which need to be post-processed appropriately, for example through variance reduction techniques. We refer to Dimarco and Pareschi [31] for a recent review.
In our previous work [21], we addressed the lack of regularity and high
dimensionality (in the space of parameters) of kinetic equations by using adaptive
discontinuous Galerkin methods [28, 114] combined with sparse probabilistic
collocation. Specifically, the phase variables of the system were discretized by using
spectral elements on an adaptive nonconforming grid that tracks the support of the
PDF in time, while the parametric dependence of the solution was handled by using
sparse grids. More recently, we proposed and validated new classes of algorithms
6 Applications
Fig. 29.6 Stochastic resonance. We study the system (29.44) with parameters set to 10, 3, 2, and 0.2, subject to weakly colored random noise (correlation time 0.01) of different amplitudes: (a) 0; (b) 0.2; (c) 0.4; (d) 0.8. Each panel shows a single solution sample. At low noise levels, the average residence time in the two states is much longer than the driving period. However, if we increase the noise level to 0.8 (d), then we observe almost periodic transitions between the two metastable states. In most cases, we have a jump from one state to the other and back again approximately once per modulation period (Adapted from [136])
a phenomenon known as stochastic resonance [6, 82, 104, 147]: random noise can significantly enhance the transmission of a weak periodic signal. The mechanism that makes this possible is explained in Fig. 29.6, with reference to the system
$$\begin{cases} \dfrac{dx(t)}{dt} = \dfrac{2x - 2x^3 - x^5}{2(1+x^2)^2} + \sigma f(t;\omega) + \varepsilon\cos(\Omega t), \\[4pt] x(0) = x_0(\omega), \end{cases} \tag{29.44}$$
where $f(t;\omega)$ is an exponentially correlated Gaussian process with covariance
$$C(t,s) = \frac{1}{2\tau}\, e^{-|t-s|/\tau}. \tag{29.45}$$
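The transition mechanism can be reproduced with a direct Monte Carlo sketch of (29.44). All parameter values and the Euler discretization below are assumptions of this illustration, not taken from the chapter:

```python
import numpy as np

# Euler scheme for (29.44) driven by an exponentially correlated
# Ornstein-Uhlenbeck process whose stationary covariance matches (29.45).
rng = np.random.default_rng(4)
eps, Omega, sigma, tau = 0.2, 1.0, 0.8, 0.01   # hypothetical values
dt, steps = 1e-4, 200_000
xi = rng.standard_normal(steps)

x, fn = 0.9, 0.0
path = np.empty(steps)
for k in range(steps):
    drift = (2*x - 2*x**3 - x**5) / (2 * (1 + x**2)**2)
    x += dt * (drift + sigma * fn + eps * np.cos(Omega * k * dt))
    # OU update: stationary covariance e^{-|t-s|/tau}/(2 tau), cf. (29.45)
    fn += -fn * dt / tau + np.sqrt(dt) / tau * xi[k]
    path[k] = x

print(path.min(), path.max())   # excursions around the two metastable states
assert np.isfinite(path).all()
assert np.abs(path).max() < 5.0
```

Lowering the noise amplitude sigma suppresses the inter-well transitions, as in panels (a)-(b) of Fig. 29.6.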
The exact evolution equation for the PDF of x(t) can be obtained by applying the convolutionless projection operator method described in the previous sections. Such an equation is a linear partial differential equation of infinite order in the phase variable a. If we consider a second-order approximation, i.e., if we expand the propagator of $p_x$ in terms of cumulant operators and truncate the expansion at second order, then we obtain
Fig. 29.7 Stochastic resonance. Range of validity of the MZ-PDF equation (29.46) as a function
of the noise amplitude and correlation time (a). Effective diffusion coefficient D.a; t / (see
Eq. (29.48)) corresponding to exponentially correlated Gaussian noises (b) (Adapted from [136])
$$\frac{\partial p_x}{\partial t} = L_0\, p_x - \varepsilon\cos(\Omega t)\frac{\partial p_x}{\partial a} + \sigma^2\left[\int_0^t C(t,s)\,\frac{\partial}{\partial a}\,e^{(t-s)L_0}\,\frac{\partial}{\partial a}\,e^{(s-t)L_0}\,ds\right] p_x, \tag{29.46}$$
where
$$L_0 = -\left[\frac{\partial}{\partial a}\,\frac{2a - 2a^3 - a^5}{2(1+a^2)^2}\right] I - \frac{2a - 2a^3 - a^5}{2(1+a^2)^2}\,\frac{\partial}{\partial a}. \tag{29.47}$$
Truncating the expansion to second-order derivatives in the phase variable yields the advection-diffusion approximation
$$\frac{\partial p_x}{\partial t} = L_0\, p_x - \varepsilon\cos(\Omega t)\frac{\partial p_x}{\partial a} + \sigma^2\,\frac{\partial^2}{\partial a^2}\bigl(D(a,t)\, p_x\bigr), \tag{29.48}$$
where the effective diffusion coefficient D(a,t) depends on the type of noise. Note that if the correlation time $\tau$ goes to zero (white-noise limit), then Eq. (29.46), with C(t,s) defined in (29.45), consistently reduces to the classical Fokker-Planck equation. The proof is simple, and it relies on the limits
$$\lim_{\tau\to 0}\int_0^t \frac{1}{2\tau}\,e^{-s/\tau}\,ds = \frac{1}{2}, \qquad \lim_{\tau\to 0}\int_0^t \frac{1}{2\tau}\,e^{-s/\tau}\,s^k\,ds = 0, \quad k\in\mathbb{N}. \tag{29.49}$$
$$\lim_{\tau\to 0}\int_0^t \frac{1}{2\tau}\,e^{-s/\tau}\,\frac{\partial}{\partial a}\,e^{sL_0}\,\frac{\partial}{\partial a}\,e^{-sL_0}\,ds = \left[\lim_{\tau\to 0}\int_0^t \frac{1}{2\tau}\,e^{-s/\tau}\,ds\right]\frac{\partial^2}{\partial a^2} = \frac{1}{2}\,\frac{\partial^2}{\partial a^2}, \tag{29.50}$$
i.e., for $\tau \to 0$ Eq. (29.46) reduces to the Fokker-Planck equation
$$\frac{\partial p_x}{\partial t} = L_0\, p_x - \varepsilon\cos(\Omega t)\frac{\partial p_x}{\partial a} + \frac{\sigma^2}{2}\,\frac{\partial^2 p_x}{\partial a^2}. \tag{29.51}$$
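The limits (29.49) can be verified numerically with a simple midpoint-rule check (purely illustrative):

```python
import numpy as np

# As tau -> 0 the kernel e^{-s/tau}/(2 tau) concentrates at s = 0: its
# integral over [0, t] tends to 1/2 and its first moment tends to 0.
t, N = 1.0, 1_000_000
s = (np.arange(N) + 0.5) * (t / N)       # midpoints of a fine grid
for tau in (0.1, 0.01, 0.001):
    kern = np.exp(-s / tau) / (2 * tau)
    I0 = np.sum(kern) * (t / N)          # zeroth moment of the kernel
    I1 = np.sum(s * kern) * (t / N)      # first moment of the kernel
    print(tau, I0, I1)

assert abs(I0 - 0.5) < 1e-6   # values from the last (smallest) tau
assert I1 < 1e-3
```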
Next, we study the transient dynamics of the one-time PDF of the solution x(t) within the period T = 3. To this end, we consider a set of parameters (with values 1, 1, 10, and 0.5) leading to a slow relaxation to statistical equilibrium. This allows us to study the transient of the PDF more carefully and compare the results with Monte Carlo (MC). This is shown in Fig. 29.8, where it is seen that for small noise amplitudes the random forcing term in (29.44) does not significantly influence the dynamics, and therefore the PDF of x(t) is mainly advected by the operator $L_0$. Note that the PDF tends to accumulate around the metastable equilibrium states $\pm\sqrt{\sqrt{3}-1}$. For larger noise amplitudes, the probability of switching between the metastable states increases, and therefore the strong bimodality observed in Fig. 29.8 (left) is attenuated.
There exists a close connection between the statistical properties of the random noise and the structure of the MZ-PDF equation for the response variables of the system. In particular, it has recently been shown, e.g., in [75], that the PDF of the solution to the Langevin equation driven by Lévy flights satisfies a fractional Fokker-Planck equation. Such an equation can be easily derived by using the Mori-Zwanzig projection operator framework, which therefore represents a very general tool to exploit the relation between noise and reduced-order PDF equations. For example, let us consider the non-stationary covariance function of fractional Brownian motion
$$C(t,s) = \frac{1}{2}\left(|t|^{2h} + |s|^{2h} - |t-s|^{2h}\right), \qquad 0 < h < 1, \quad t,s > 0, \tag{29.52}$$
which reduces to the covariance function of standard Lévy noise for h = 1/2, i.e., $C_{\text{Lévy}}(t,s) = \min\{t,s\}$. A substitution of (29.52) into (29.46) yields an equation for
the PDF of the solution to the system (29.1) driven by fractional Brownian motion
of small amplitude. As is well known, such noise can trigger either sub-diffusion or
super-diffusion in the PDF dynamics.
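The sub- and super-diffusive scalings can be checked by sampling fBm paths directly from the covariance (29.52). The grid, path count, and jitter below are arbitrary choices of this sketch:

```python
import numpy as np

# Sample fractional Brownian motion from the covariance (29.52) via a
# Cholesky factorization, then check Var[B_h(t)] ~ t^{2h}: slower than
# linear growth for h < 1/2 (sub-diffusion), faster for h > 1/2.
def fbm_paths(h, t, n_paths, rng):
    T, S = np.meshgrid(t, t, indexing="ij")
    C = 0.5 * (T**(2*h) + S**(2*h) - np.abs(T - S)**(2*h))
    Lc = np.linalg.cholesky(C + 1e-12 * np.eye(len(t)))  # tiny jitter
    return rng.standard_normal((n_paths, len(t))) @ Lc.T

rng = np.random.default_rng(5)
t = np.linspace(0.01, 1.0, 100)
for h in (0.25, 0.5, 0.75):
    v = fbm_paths(h, t, 20_000, rng).var(axis=0)
    slope = np.polyfit(np.log(t), np.log(v), 1)[0]   # log-log growth rate
    print(h, slope)                                  # close to 2h
    assert abs(slope - 2*h) < 0.15
```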
Fig. 29.8 Stochastic resonance. Time snapshots of the PDF of x(t) as predicted by Eq. (29.46) (continuous lines) and MC simulation (10^5 samples) (dashed lines). The Gaussian process $f(t;\omega)$ in (29.44) is exponentially correlated with small correlation time ((a) and (b)). Note that the Karhunen-Loève expansion of such noise requires 280 Gaussian random variables to achieve 99% of the correlation energy in the time interval [0, 3]. We also show the PDF dynamics corresponding to fractional Brownian motion of small amplitude and different Hurst indices (c) (Adapted from [136])
$$\frac{\partial u}{\partial t} + V(x)\cdot\nabla u = \left[\kappa_0(x) + \sigma\,\kappa_1(x;\omega)\right] R(u), \qquad x\in\mathbb{R}^m, \tag{29.53}$$
Fig. 29.9 Stochastic advection-reaction. Samples of the concentration field solving (29.53) in one spatial dimension for periodic boundary conditions and different realizations of the random initial condition and the random reaction rate. The correlation length of the random initial condition decreases from (a) to (e). In (f) we plot a few realizations of the random reaction rate ($\sigma$ = 0.3) (Adapted from [136])
In [135, 141] we studied Eq. (29.53) by using the response-excitation PDF method as well as the large-eddy-diffusivity (LED) closure [125]. Here we consider a different approach based on the MZ-PDF equation [136]. To this end, we assume that $\sigma$ is reasonably small and that the concentration field u is independent of the reaction rate at the initial time, i.e., that the initial joint PDF of the system has the form $p(0) = p_u(0)\,p_\kappa$. Under these hypotheses, we obtain the following second-order approximation to the MZ-PDF equation:
$$\frac{\partial p_u(t)}{\partial t} = L_0\, p_u(t) + \sigma^2\left[\int_0^t \left\langle \kappa_1\, e^{sL_0}\,\kappa_1\right\rangle e^{-sL_0}\,ds\right] F^2\, p_u(t), \tag{29.54}$$
where
$$L_0 = -\kappa_0(x)\,F - V(x)\cdot\nabla, \qquad F = \frac{\partial R(a)}{\partial a} + R(a)\,\frac{\partial}{\partial a}. \tag{29.55}$$
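A Monte Carlo baseline for this kind of problem can be sketched by integrating along characteristics. The constant velocity, the logistic reaction R(u) = u(1 − u), and the random data below are assumptions of this illustration, not the cases studied in the chapter:

```python
import numpy as np

# 1D instance of an advection-reaction equation like (29.53) with constant
# velocity V and logistic reaction R(u) = u(1 - u): along a characteristic
# x = x0 + V t the PDE reduces to du/dt = kappa u (1 - u), solvable in
# closed form, which makes sampling the concentration PDF trivial.
rng = np.random.default_rng(6)
V, sigma, t, x = 1.0, 0.3, 1.0, np.pi
n = 50_000

x0 = x - V * t                                      # foot of characteristic
u0 = 0.5 + 0.2 * np.sin(x0 + 2*np.pi*rng.random(n))     # random initial data
kappa = 1.0 + sigma * rng.standard_normal(n)        # random reaction rates
e = np.exp(kappa * t)
u = u0 * e / (1 + u0 * (e - 1))                     # exact logistic solution

print(u.mean(), u.std())          # one-point statistics of the concentration
assert ((u > 0) & (u < 1)).all()  # reaction keeps the concentration in (0, 1)
```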
@a @a
$$\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = f(x,t;\omega) \tag{29.56}$$
Fig. 29.10 Stochastic advection-reaction. Time snapshots of the concentration PDF predicted by
the MZ-PDF equation (29.54) (Adapted from [136])
Fig. 29.12 Stochastic Burgers equation. One realization of the velocity field computed by using
adaptive discontinuous Galerkin methods (left). Time snapshots of the one-point PDF obtained by
solving the MZ-PDF equation (29.57) (Adapted from [23])
Fig. 29.13 Stochastic Burgers equation. One-point PDF of the velocity field at $x = \pi$ for exponentially correlated, homogeneous (in space) random forcing processes with correlation time 0.01 and amplitudes $\sigma$ = 0.01 and $\sigma$ = 0.1 (second row). Shown are results obtained from MC and from two different truncations of the MZ-PDF equation (29.57) (Adapted from [23])
to formally integrate out the random forcing term. This yields the following equation (second-order approximation) for the one-point, one-time PDF of the velocity field (see Fig. 29.12 and [23]):
$$\frac{\partial p_u(t)}{\partial t} = L_0\, p_u(t) - \left\langle f(x,t)\right\rangle \frac{\partial p_u(t)}{\partial a} + \sigma^2\left[\int_0^t \left\langle f(x,t)\,\frac{\partial}{\partial a}\,e^{(t-s)L_0} f(x,s)\right\rangle \frac{\partial}{\partial a}\,e^{-(t-s)L_0}\,ds\right] p_u(t), \tag{29.57}$$
where $L_0$ is given by
$$L_0 = -\frac{\partial}{\partial x}\int_{-\infty}^{a} da' \; - \; a\,\frac{\partial}{\partial x}. \tag{29.58}$$
In Fig. 29.13 we compare the PDF dynamics obtained by solving Eq. (29.57) with
Monte Carlo simulation. It is seen that, as we increase the amplitude of the forcing,
the second-order approximation (29.57) loses accuracy and higher-order corrections
have to be included.
Particle systems are often used in models of systems biology and soft-matter physics to simulate and understand large-scale effects based on microscopic first principles. The computability of such systems depends critically on the number of particles and on the interaction potentials. Full molecular dynamics (MD) simulations can be performed for particle systems with O(10^13) particles. However, such hero simulations require hundreds of thousands of computer cores and significant time and data processing to be completed successfully. This motivates the use of coarse-graining techniques for particle systems, such as dissipative particle dynamics (DPD) [98], to compute macroscopic/mesoscopic observables at a reasonable computational cost. Among different approaches, the Mori-Zwanzig formulation [61, 118] has proved effective in achieving this goal [1, 58, 77]. The key idea is shown in Fig. 29.14, where a star polymer described atomistically is coarse-grained into bigger particles (the MZ-DPD particles) by following the procedure sketched in Fig. 29.3. The calculation of the solution to the MZ-DPD system, e.g., Eq. (29.14), relies on approximations. In particular, the memory term plays an important role in the dynamics of the coarse-grained system, and this role becomes more relevant as we increase the coarse-graining level [77, 157]. In Fig. 29.14 we compare the velocity autocorrelation function obtained from molecular dynamics simulations (MD) with that of the coarse-grained MZ-DPD system.
Fig. 29.14 Coarse-grained model of a particle system. Comparison between the velocity autocorrelation function obtained from molecular dynamics simulation (MD) and Mori-Zwanzig dissipative particle dynamics (MZ-DPD) (Courtesy of Dr. Zhen Li, Brown University (unpublished))
7 Conclusions
8 Cross-References
References
1. Akkermans, R.L.C., Briels, W.J.: Coarse-grained dynamics of one chain in a polymer melt. J. Chem. Phys. 113(15), 620–630 (2000)
2. Al-Mohy, A.H., Higham, N.J.: Computing the Fréchet derivative of the matrix exponential, with an application to condition number estimation. SIAM J. Matrix Anal. Appl. 30(4), 1639–1657 (2009)
3. Al-Mohy, A.H., Higham, N.J.: Computing the action of the matrix exponential, with an application to exponential integrators. SIAM J. Sci. Comput. 33(2), 488–511 (2011)
4. Arai, T., Goodman, B.: Cumulant expansion and Wick theorem for spins. Application to the antiferromagnetic ground state. Phys. Rev. 155(2), 514–527 (1967)
5. Balescu, R.: Equilibrium and Non-equilibrium Statistical Mechanics. Wiley, New York (1975)
6. Benzi, R., Sutera, A., Vulpiani, A.: The mechanism of stochastic resonance. J. Phys. A: Math. Gen. 14, L453–L457 (1981)
7. Billings, S.A.: Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains. Wiley, Chichester (2013)
8. Bird, G.A.: Molecular Gas Dynamics and Direct Numerical Simulation of Gas Flows. Clarendon Press, Oxford (1994)
9. Blanes, S., Casas, F., Oteo, J.A., Ros, J.: The Magnus expansion and some of its applications. Phys. Rep. 470, 151–238 (2009)
10. Blanes, S., Casas, F., Murua, A.: Splitting methods in the numerical integration of non-autonomous dynamical systems. RACSAM 106, 49–66 (2012)
11. Bonatto, C., Gallas, J.A.C., Ueda, Y.: Chaotic phase similarities and recurrences in a damped-driven Duffing oscillator. Phys. Rev. E 77, 026217 (2008)
12. Botev, Z.I., Grotowski, J.F., Kroese, D.P.: Kernel density estimation via diffusion. Ann. Stat. 38(5), 2916–2957 (2010)
13. Breuer, H.P., Kappler, B., Petruccione, F.: The time-convolutionless projection operator technique in the quantum theory of dissipation and decoherence. Ann. Phys. 291, 36–70 (2001)
14. Broer, H., Simó, C., Vitolo, R.: Bifurcations and strange attractors in the Lorenz-84 climate model with seasonal forcing. Nonlinearity 15, 1205–1267 (2002)
15. Casas, F.: Solutions of linear partial differential equations by Lie algebraic methods. J. Comput. Appl. Math. 76, 159–170 (1996)
16. Cercignani, C., Gerasimenko, U.I., Petrina, D.Y. (eds.): Many Particle Dynamics and Kinetic Equations, 1st edn. Kluwer Academic, Dordrecht/Boston (1997)
17. Chaturvedi, S., Shibata, F.: Time-convolutionless projection operator formalism for elimination of fast variables. Applications to Brownian motion. Z. Phys. B 35, 297–308 (1979)
18. Cheng, Y., Gamba, I.M., Majorana, A., Shu, C.W.: A discontinuous Galerkin solver for Boltzmann-Poisson systems in nano devices. Comput. Methods Appl. Mech. Eng. 198, 3130–3150 (2009)
19. Cheng, Y., Gamba, I.M., Majorana, A., Shu, C.W.: A brief survey of the discontinuous Galerkin method for the Boltzmann-Poisson equations. SEMA J. 54, 47–64 (2011)
20. Chertock, A., Gottlieb, D., Solomonoff, A.: Modified optimal prediction and its application to a particle method problem. J. Sci. Comput. 37(2), 189–201 (2008)
21. Cho, H., Venturi, D., Karniadakis, G.E.: Adaptive discontinuous Galerkin method for response-excitation PDF equations. SIAM J. Sci. Comput. 35(4), B890–B911 (2013)
22. Cho, H., Venturi, D., Karniadakis, G.E.: Numerical methods for high-dimensional probability density function equations. J. Comput. Phys., under review (2014)
23. Cho, H., Venturi, D., Karniadakis, G.E.: Statistical analysis and simulation of random shocks in Burgers equation. Proc. R. Soc. A 470(2171), 1–21 (2014)
24. Chorin, A., Lu, F.: A discrete approach to stochastic parametrization and dimensional reduction in nonlinear dynamics, pp. 1–12. arXiv:submit/1219662 (2015)
25. Chorin, A.J., Stinis, P.: Problem reduction, renormalization and memory. Commun. Appl. Math. Comput. Sci. 1(1), 1–27 (2006)
26. Chorin, A.J., Tu, X.: Implicit sampling for particle filters. PNAS 106(41), 17249–17254 (2009)
27. Chorin, A.J., Hald, O.H., Kupferman, R.: Optimal prediction and the Mori-Zwanzig representation of irreversible processes. Proc. Natl. Acad. Sci. U.S.A. 97(7), 2968–2973 (2000)
28. Cockburn, B., Karniadakis, G.E., Shu, C.W.: Discontinuous Galerkin Methods. Vol. 11 of Lecture Notes in Computational Science and Engineering. Springer, New York (2000)
29. Darve, E., Solomon, J., Kia, A.: Computing generalized Langevin equations and generalized Fokker-Planck equations. Proc. Natl. Acad. Sci. U.S.A. 106(27), 10884–10889 (2009)
30. Dekker, H.: Correlation time expansion for multidimensional weakly non-Markovian Gaussian processes. Phys. Lett. A 90(1–2), 26–30 (1982)
31. Dimarco, G., Pareschi, L.: Numerical methods for kinetic equations. Acta Numer. 23, 369–520 (2014)
32. Doostan, A., Owhadi, H.: A non-adapted sparse approximation of PDEs with stochastic inputs. J. Comput. Phys. 230(8), 3015–3034 (2011)
33. Edwards, S.F.: The statistical dynamics of homogeneous turbulence. J. Fluid Mech. 18, 239–273 (1964)
34. Engel, K.J., Nagel, R.: One-Parameter Semigroups for Linear Evolution Equations. Springer, New York (2000)
35. Faetti, S., Grigolini, P.: Unitary point of view on the puzzling problem of nonlinear systems driven by colored noise. Phys. Rev. A 36(1), 441–444 (1987)
36. Faetti, S., Fronzoni, L., Grigolini, P., Mannella, R.: The projection operator approach to the Fokker-Planck equation. I. Colored Gaussian noise. J. Stat. Phys. 52(3/4), 951–978 (1988)
37. Faetti, S., Fronzoni, L., Grigolini, P., Palleschi, V., Tropiano, G.: The projection operator approach to the Fokker-Planck equation. II. Dichotomic and nonlinear Gaussian noise. J. Stat. Phys. 52(3/4), 979–1003 (1988)
38. Feynman, R.P.: An operator calculus having applications in quantum electrodynamics. Phys. Rev. 84, 108–128 (1951)
39. Filbet, F., Russo, G.: High-order numerical methods for the space non-homogeneous Boltzmann equations. J. Comput. Phys. 186, 457–480 (2003)
40. Foias, C., Sell, G.R., Temam, R.: Inertial manifolds for nonlinear evolutionary equations. Proc. Natl. Acad. Sci. U.S.A. 73(2), 309–353 (1988)
41. Foias, C., Manley, O.P., Rosa, R., Temam, R.: Navier-Stokes Equations and Turbulence, 1st edn. Cambridge University Press, Cambridge (2001)
42. Foias, C., Jolly, M.S., Manley, O.P., Rosa, R.: Statistical estimates for the Navier-Stokes equations and the Kraichnan theory of 2-D fully developed turbulence. J. Stat. Phys. 108(3/4), 591–646 (2002)
43. Foo, J., Karniadakis, G.E.: The multi-element probabilistic collocation method (ME-PCM): error analysis and applications. J. Comput. Phys. 227, 9572–9595 (2008)
44. Foo, J., Karniadakis, G.E.: Multi-element probabilistic collocation method in high dimensions. J. Comput. Phys. 229, 1536–1557 (2010)
45. Fox, R.F.: A generalized theory of multiplicative stochastic processes using cumulant techniques. J. Math. Phys. 16(2), 289–297 (1975)
46. Fox, R.F.: Functional-calculus approach to stochastic differential equations. Phys. Rev. A 33(1), 467–476 (1986)
47. Fox, R.O.: Computational Models for Turbulent Reactive Flows. Cambridge University Press, Cambridge (2003)
48. Friedrich, R., Daitche, A., Kamps, O., Lülff, J., Voßkuhle, M., Wilczek, M.: The Lundgren-Monin-Novikov hierarchy: kinetic equations for turbulence. Comp. Rend. Phys. 13(9–10), 929–953 (2012)
49. Frisch, U.: Turbulence: The Legacy of A.N. Kolmogorov. Cambridge University Press, Cambridge (1995)
29 Mori-Zwanzig Approach to Uncertainty Quantification 1069
50. Ghanem, R.G., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1998)
51. Hänggi, P.: Correlation functions and master equations of generalized (non-Markovian) Langevin equations. Z. Phys. B 31, 407–416 (1978)
52. Hänggi, P.: On derivations and solutions of master equations and asymptotic representations. Z. Phys. B 30, 85–95 (1978)
53. Hänggi, P.: The functional derivative and its use in the description of noisy dynamical systems. In: Pesquera, L., Rodriguez, M. (eds.) Stochastic Processes Applied to Physics, pp. 69–95. World Scientific, Singapore (1985)
54. Hänggi, P., Jung, P.: Colored noise in dynamical systems. In: Prigogine, I., Rice, S.A. (eds.) Advances in Chemical Physics, vol. 89, pp. 239–326. Wiley-Interscience, New York (1995)
55. Hegerfeldt, G.C., Schulze, H.: Noncommutative cumulants for stochastic differential equations and for generalized Dyson series. J. Stat. Phys. 51(3/4), 691–710 (1988)
56. Herring, J.R.: Self-consistent-field approach to nonstationary turbulence. Phys. Fluids 9(11), 2106–2110 (1966)
57. Hesthaven, J.S., Gottlieb, S., Gottlieb, D.: Spectral Methods for Time-Dependent Problems. Cambridge University Press, Cambridge (2007)
58. Hijón, C., Español, P., Vanden-Eijnden, E., Delgado-Buscalioni, R.: Mori-Zwanzig formalism as a practical computational tool. Faraday Discuss. 144, 301–322 (2010)
59. Hosokawa, I.: Monin-Lundgren hierarchy versus the Hopf equation in the statistical theory of turbulence. Phys. Rev. E 73, 067301(1–4) (2006)
60. Hughes, K.H., Burghardt, I.: Maximum-entropy closure of hydrodynamic moment hierarchies including correlations. J. Chem. Phys. 136, 214109(1–18) (2012)
61. Izvekov, S.: Microscopic derivation of particle-based coarse-grained dynamics. J. Chem. Phys. 138, 134106(1–16) (2013)
62. Jaynes, E.T.: Probability Theory: The Logic of Science. Cambridge University Press, Cambridge (2003)
63. Jensen, R.V.: Functional integral approach to classical statistical dynamics. J. Stat. Phys. 25(2), 183–210 (1981)
64. Kampen, N.G.V.: A cumulant expansion for stochastic linear differential equations. II. Physica 74, 239–247 (1974)
65. Kampen, N.G.V.: Elimination of fast variables. Phys. Rep. 124(2), 69–160 (1985)
66. Kampen, N.G.V.: Stochastic Processes in Physics and Chemistry, 3rd edn. North Holland, Amsterdam (2007)
67. Kampen, N.G.V., Oppenheim, I.: Brownian motion as a problem of eliminating fast variables. Physica A 138, 231–248 (1986)
68. Kanwal, R.P.: Generalized Functions: Theory and Technique, 2nd edn. Birkhäuser, Boston (1998)
69. Karimi, A., Paul, M.R.: Extensive chaos in the Lorenz-96 model. Chaos 20(4), 043105(1–11) (2010)
70. Kato, T.: Perturbation Theory for Linear Operators, 4th edn. Springer, New York (1995)
71. Khuri, A.I.: Applications of Dirac's delta function in statistics. Int. J. Math. Educ. Sci. Technol. 35(2), 185–195 (2004)
72. Kraichnan, R.H.: Statistical dynamics of two-dimensional flow. J. Fluid Mech. 67, 155–175 (1975)
73. Kubo, R.: Generalized cumulant expansion method. J. Phys. Soc. Jpn. 17(7), 1100–1120 (1962)
74. Kubo, R.: Stochastic Liouville equations. J. Math. Phys. 4(2), 174–183 (1963)
75. Kullberg, A., del Castillo-Negrete, D.: Transport in the spatially tempered, fractional Fokker-Planck equation. J. Phys. A: Math. Theor. 45(25), 255101(1–21) (2012)
76. Li, G., Wang, S.W., Rabitz, H., Wang, S., Jaffé, P.: Global uncertainty assessments by high dimensional model representations (HDMR). Chem. Eng. Sci. 57(21), 4445–4460 (2002)
77. Li, Z., Bian, X., Caswell, B., Karniadakis, G.E.: Construction of dissipative particle dynamics models for complex fluids via the Mori-Zwanzig formulation. Soft Matter 10, 8659–8672 (2014)
1070 D. Venturi et al.
78. Lindenberg, K., West, B.J., Masoliver, J.: First passage time problems for non-Markovian processes. In: Moss, F., McClintock, P.V.E. (eds.) Noise in Nonlinear Dynamical Systems, vol. 1, pp. 110–158. Cambridge University Press, Cambridge (1989)
79. Lorenz, E.N.: Predictability: a problem partly solved. In: ECMWF Seminar on Predictability, Reading, vol. 1, pp. 1–18 (1996)
80. Luchtenburg, D.M., Brunton, S.L., Rowley, C.W.: Long-time uncertainty propagation using generalized polynomial chaos and flow map composition. J. Comput. Phys. 274, 783–802 (2014)
81. Lundgren, T.S.: Distribution functions in the statistical theory of turbulence. Phys. Fluids 10(5), 969–975 (1967)
82. Luo, X., Zhu, S.: Stochastic resonance driven by two different kinds of colored noise in a bistable system. Phys. Rev. E 67, 021104(1–13) (2003)
83. Ma, X., Karniadakis, G.E.: A low-dimensional model for simulating three-dimensional cylinder flow. J. Fluid Mech. 458, 181–190 (2002)
84. Ma, X., Zabaras, N.: An adaptive hierarchical sparse grid collocation method for the solution of stochastic differential equations. J. Comput. Phys. 228, 3084–3113 (2009)
85. Mattuck, R.D.: A Guide to Feynman Diagrams in the Many-Body Problem. Dover, New York (1992)
86. McKane, A.J., Luckock, H.C., Bray, A.J.: Path integrals and non-Markov processes. I. General formalism. Phys. Rev. A 41(2), 644–656 (1990)
87. McComb, W.D.: The Physics of Fluid Turbulence. Oxford University Press, Oxford (1990)
88. Moler, C., Loan, C.V.: Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. 45(1), 3–49 (2003)
89. Monin, A.S.: Equations for turbulent motion. Prikl. Mat. Mekh. 31(6), 1057–1068 (1967)
90. Montgomery, D.: A BBGKY framework for fluid turbulence. Phys. Fluids 19(6), 802–810 (1976)
91. Mori, H.: Transport, collective motion, and Brownian motion. Prog. Theor. Phys. 33(3), 423–455 (1965)
92. Mori, H., Morita, T., Mashiyama, K.T.: Contraction of state variables in non-equilibrium open systems. I. Prog. Theor. Phys. 63(6), 1865–1883 (1980)
93. Moss, F., McClintock, P.V.E. (eds.): Noise in Nonlinear Dynamical Systems. Volume 1: Theory of Continuous Fokker-Planck Systems. Cambridge University Press, Cambridge (1995)
94. Mukamel, S., Oppenheim, I., Ross, J.: Statistical reduction for strongly driven simple quantum systems. Phys. Rev. A 17(6), 1988–1998 (1978)
95. Muradoglu, M., Jenny, P., Pope, S.B., Caughey, D.A.: A consistent hybrid finite-volume/particle method for the PDF equations of turbulent reactive flows. J. Comput. Phys. 154, 342–371 (1999)
96. Nakajima, S.: On quantum theory of transport phenomena: steady diffusion. Prog. Theor. Phys. 20(6), 948–959 (1958)
97. Neu, P., Speicher, R.: A self-consistent master equation and a new kind of cumulants. Z. Phys. B 92, 399–407 (1993)
98. Español, P., Warren, P.: Statistical mechanics of dissipative particle dynamics. Europhys. Lett. 30(4), 191–196 (1995)
99. Noack, B.R., Niven, R.K.: A hierarchy of maximum entropy closures for Galerkin systems of incompressible flows. Comput. Math. Appl. 65(10), 1558–1574 (2012)
100. Nouy, A.: Proper generalized decompositions and separated representations for the numerical solution of high dimensional stochastic problems. Arch. Comput. Methods Appl. Mech. Eng. 17, 403–434 (2010)
101. Nouy, A., Le Maître, O.P.: Generalized spectral decomposition for stochastic nonlinear problems. J. Comput. Phys. 228, 202–235 (2009)
102. Novak, E., Ritter, K.: High dimensional integration of smooth functions over cubes. Numer. Math. 75, 79–97 (1996)
103. Novati, P.: Solving linear initial value problems by Faber polynomials. Numer. Linear Algebra Appl. 10, 247–270 (2003)
104. Nozaki, D., Mar, D.J., Grigg, P., Collins, J.J.: Effects of colored noise on stochastic resonance in sensory neurons. Phys. Rev. Lett. 82(11), 2402–2405 (1999)
105. O'Brien, E.E.: The probability density function (pdf) approach to reacting turbulent flows. In: Topics in Applied Physics. Turbulent Reacting Flows, vol. 44, pp. 185–218. Springer, Berlin/New York (1980)
106. Orszag, S.A., Bissonnette, L.R.: Dynamical properties of truncated Wiener-Hermite expansions. Phys. Fluids 10(12), 2603–2613 (1967)
107. Pereverzev, A., Bittner, E.R.: Time-convolutionless master equation for mesoscopic electron-phonon systems. J. Chem. Phys. 125, 144107(1–7) (2006)
108. Pesquera, L., Rodriguez, M.A., Santos, E.: Path integrals for non-Markovian processes. Phys. Lett. 94(6–7), 287–289 (1983)
109. Pope, S.B.: A Monte Carlo method for the PDF equations of turbulent reactive flow. Combust. Sci. Technol. 25, 159–174 (1981)
110. Pope, S.B.: Lagrangian PDF methods for turbulent flows. Ann. Rev. Fluid Mech. 26, 23–63 (1994)
111. Pope, S.B.: Simple models of turbulent flows. Phys. Fluids 23(1), 011301(1–20) (2011)
112. Rabitz, H., Aliş, Ö.F., Shorter, J., Shim, K.: Efficient input–output model representations. Comput. Phys. Commun. 117(1–2), 11–20 (1999)
113. Remacle, J.F., Flaherty, J.E., Shephard, M.S.: An adaptive discontinuous Galerkin technique with an orthogonal basis applied to compressible flow problems. SIAM Rev. 45(1), 53–72 (2003)
114. Remacle, J.F., Flaherty, J.E., Shephard, M.S.: An adaptive discontinuous Galerkin technique with an orthogonal basis applied to compressible flow problems. SIAM Rev. 45(1), 53–72 (2003)
115. Richter, M., Knorr, A.: A time-convolutionless density matrix approach to the nonlinear optical response of a coupled system-bath complex. Ann. Phys. 325, 711–747 (2010)
116. Rjasanow, S., Wagner, W.: Stochastic Numerics for the Boltzmann Equation. Springer, Berlin/New York (2004)
117. Sapsis, T.P., Lermusiaux, P.F.J.: Dynamically orthogonal field equations for continuous stochastic dynamical systems. Physica D 238(23–24), 2347–2360 (2009)
118. Snook, I.: The Langevin and Generalised Langevin Approach to the Dynamics of Atomic, Polymeric and Colloidal Systems, 1st edn. Elsevier, Amsterdam/Boston (2007)
119. Stinis, P.: A comparative study of two stochastic mode reduction methods. Physica D 213, 197–213 (2006)
120. Stinis, P.: Mori-Zwanzig-reduced models for systems without scale separation. Proc. R. Soc. A 471, 20140446(1–13) (2015)
121. Stratonovich, R.L.: Topics in the Theory of Random Noise, vols. 1 and 2. Gordon and Breach, New York (1967)
122. Suzuki, M.: Decomposition formulas of exponential operators and Lie exponentials with applications to quantum mechanics and statistical physics. J. Math. Phys. 26(4), 601–612 (1985)
123. Suzuki, M.: General decomposition theory of ordered exponentials. Proc. Jpn. Acad. B 69(7), 161–166 (1993)
124. Suzuki, M.: Convergence of general decompositions of exponential operators. Commun. Math. Phys. 163, 491–508 (1994)
125. Tartakovsky, D.M., Broyda, S.: PDF equations for advective-reactive transport in heterogeneous porous media with uncertain properties. J. Contam. Hydrol. 120–121, 129–140 (2011)
126. Terwiel, R.H.: Projection operator method applied to stochastic linear differential equations. Physica 74, 248–265 (1974)
127. Thalhammer, M.: High-order exponential operator splitting methods for time-dependent Schrödinger equations. SIAM J. Numer. Anal. 46(4), 2022–2038 (2008)
128. Turkington, B.: An optimization principle for deriving nonequilibrium statistical models of Hamiltonian dynamics. J. Stat. Phys. 152, 569–597 (2013)
129. Valiño, L.: A field Monte Carlo formulation for calculating the probability density function of a single scalar in a turbulent flow. Flow Turbul. Combust. 60(2), 157–172 (1998)
130. Venkatesh, T.G., Patnaik, L.M.: Effective Fokker-Planck equation: path-integral formalism. Phys. Rev. E 48(4), 2402–2412 (1993)
131. Venturi, D.: On proper orthogonal decomposition of randomly perturbed fields with applications to flow past a cylinder and natural convection over a horizontal plate. J. Fluid Mech. 559, 215–254 (2006)
132. Venturi, D.: A fully symmetric nonlinear biorthogonal decomposition theory for random fields. Physica D 240(4–5), 415–425 (2011)
133. Venturi, D.: Conjugate flow action functionals. J. Math. Phys. 54, 113502(1–19) (2013)
134. Venturi, D., Karniadakis, G.E.: Differential constraints for the probability density function of stochastic solutions to the wave equation. Int. J. Uncertain. Quantif. 2(3), 131–150 (2012)
135. Venturi, D., Karniadakis, G.E.: New evolution equations for the joint response-excitation probability density function of stochastic solutions to first-order nonlinear PDEs. J. Comput. Phys. 231, 7450–7474 (2012)
136. Venturi, D., Karniadakis, G.E.: Convolutionless Nakajima-Zwanzig equations for stochastic analysis in nonlinear dynamical systems. Proc. R. Soc. A 470(2166), 1–20 (2014)
137. Venturi, D., Wan, X., Karniadakis, G.E.: Stochastic low-dimensional modelling of a random laminar wake past a circular cylinder. J. Fluid Mech. 606, 339–367 (2008)
138. Venturi, D., Wan, X., Karniadakis, G.E.: Stochastic bifurcation analysis of Rayleigh-Bénard convection. J. Fluid Mech. 650, 391–413 (2010)
139. Venturi, D., Choi, M., Karniadakis, G.E.: Supercritical quasi-conduction states in stochastic Rayleigh-Bénard convection. Int. J. Heat Mass Transf. 55(13–14), 3732–3743 (2012)
140. Venturi, D., Sapsis, T.P., Cho, H., Karniadakis, G.E.: A computable evolution equation for the joint response-excitation probability density function of stochastic dynamical systems. Proc. R. Soc. A 468(2139), 759–783 (2012)
141. Venturi, D., Tartakovsky, D.M., Tartakovsky, A.M., Karniadakis, G.E.: Exact PDF equations and closure approximations for advective-reactive transport. J. Comput. Phys. 243, 323–343 (2013)
142. Villani, C.: A review of mathematical topics in collisional kinetic theory. In: Friedlander, S., Serre, D. (eds.) Handbook of Mathematical Fluid Dynamics, vol. I, pp. 73–258. North-Holland, Amsterdam (2002)
143. Viswanath, D.: The fractal property of the Lorenz attractor. Physica D 190, 115–128 (2004)
144. Wan, X., Karniadakis, G.E.: An adaptive multi-element generalized polynomial chaos method for stochastic differential equations. J. Comput. Phys. 209(2), 617–642 (2005)
145. Wan, X., Karniadakis, G.E.: Long-term behavior of polynomial chaos in stochastic flow simulations. Comput. Methods Appl. Mech. Eng. 195, 5582–5596 (2006)
146. Wan, X., Karniadakis, G.E.: Multi-element generalized polynomial chaos for arbitrary probability measures. SIAM J. Sci. Comput. 28(3), 901–928 (2006)
147. Wang, C.J.: Effects of colored noise on stochastic resonance in a tumor cell growth system. Phys. Scr. 80, 065004 (5pp) (2009)
148. Wei, J., Norman, E.: Lie algebraic solutions of linear differential equations. J. Math. Phys. 4(4), 575–581 (1963)
149. Weinberg, S.: The Quantum Theory of Fields, vol. I. Cambridge University Press, Cambridge (2002)
150. Wiebe, N., Berry, D., Høyer, P., Sanders, B.C.: Higher-order decompositions of ordered operator exponentials. J. Phys. A: Math. Theor. 43, 065203(1–20) (2010)
151. Wilcox, R.M.: Exponential operators and parameter differentiation in quantum physics. J. Math. Phys. 8, 399–407 (1967)
152. Wilczek, M., Daitche, A., Friedrich, R.: On the velocity distribution in homogeneous isotropic turbulence: correlations and deviations from Gaussianity. J. Fluid Mech. 676, 191–217 (2011)
153. Wio, H.S., Colet, P., San Miguel, M., Pesquera, L., Rodríguez, M.A.: Path-integral formulation for stochastic processes driven by colored noise. Phys. Rev. A 40(12), 7312–7324 (1989)
154. Xiu, D., Karniadakis, G.E.: The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)
155. Xiu, D., Karniadakis, G.E.: Modeling uncertainty in flow simulations via generalized polynomial chaos. J. Comput. Phys. 187, 137–167 (2003)
156. Yang, Y., Shu, C.W.: Discontinuous Galerkin method for hyperbolic equations involving δ-singularities: negative-order norm error estimates and applications. Numer. Math. 124, 753–781 (2013)
157. Yoshimoto, Y., Kinefuchi, I., Mima, T., Fukushima, A., Tokumasu, T., Takagi, S.: Bottom-up construction of interaction models of non-Markovian dissipative particle dynamics. Phys. Rev. E 88, 043305(1–12) (2013)
158. Zwanzig, R.: Ensemble methods in the theory of irreversibility. J. Chem. Phys. 33(5), 1338–1341 (1960)
159. Zwanzig, R.: Memory effects in irreversible thermodynamics. Phys. Rev. 124, 983–992 (1961)
Rare-Event Simulation
30
James L. Beck and Konstantin M. Zuev
Abstract
Rare events are events that are expected to occur infrequently or, more technically, those that have low probabilities (say, of order $10^{-3}$ or less) of occurring according to a probability model. In the context of uncertainty quantification, the rare events often correspond to failure of systems designed for high reliability, meaning that the system performance fails to meet some design or operation specifications. As reviewed in this section, computation of such rare-event probabilities is challenging. Analytical solutions are usually not available for nontrivial problems, and standard Monte Carlo simulation is computationally inefficient. Therefore, much research effort has focused on developing advanced stochastic simulation methods that are more efficient. In this section, we address the problem of estimating rare-event probabilities by Monte Carlo simulation, importance sampling, and subset simulation for highly reliable dynamic systems.
Keywords
Rare-event simulation · Dynamic system reliability · Monte Carlo simulation · Subset simulation · Importance sampling · Splitting
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076
1.1 Mathematical Formulation of Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076
2 Standard Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1078
3 Importance Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1081
4 Subset Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1084
5 Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089
6 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1091
7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1098
1 Introduction
$$X(t+1) = f(X(t), U(t), t), \qquad X(t) \in \mathbb{R}^n,\; t = 0, \ldots, T \tag{30.1}$$

where $U(t) \in \mathbb{R}^m$ is the input at discrete time $t$.
If the original model consists of a BVP with PDEs describing a response $u(x,t)$, where $x \in \mathbb{R}^d$, then we assume that a finite set of basis functions $\{\phi_1(x), \ldots, \phi_n(x)\}$ is chosen (e.g., global bases such as Fourier and Hermite polynomials or localized ones such as finite-element interpolation functions) so that the solution is well approximated by

$$u(x,t) \approx \sum_{i=1}^{n} X_i(t)\,\phi_i(x) \tag{30.2}$$
The PDF $p(u)$ is assumed to be readily sampled. Although direct sampling from a high-dimensional PDF is not possible in most cases, multidimensional Gaussians are an exception because a Gaussian vector can be readily transformed so that its components are independent and the PDF is a product of one-dimensional Gaussian PDFs. In many applications, the discrete-time stochastic input history is modeled by running discrete-time Gaussian white noise through a digital filter to shape its spectrum in the frequency domain and then, if it is nonstationary, multiplying the filtered sequence by an envelope function to shape it in the time domain.
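As a rough illustration of this input model, the sketch below shapes discrete-time Gaussian white noise with a one-pole (AR(1)) digital filter and a ramp envelope; the filter coefficient and envelope are illustrative choices, not taken from this chapter:

```python
import numpy as np

def colored_excitation(T, rho=0.9, sigma=1.0, rng=None):
    """Filtered, enveloped Gaussian excitation U(0), ..., U(T).

    White noise W(t) ~ N(0, sigma^2) is passed through the one-pole
    filter V(t) = rho*V(t-1) + W(t) to shape its spectrum, then
    multiplied by a deterministic envelope e(t) to make the sequence
    nonstationary in time.
    """
    rng = np.random.default_rng(rng)
    w = rng.normal(0.0, sigma, T + 1)        # discrete-time white noise
    v = np.empty(T + 1)
    v[0] = w[0]
    for t in range(1, T + 1):                # digital (recursive) filter
        v[t] = rho * v[t - 1] + w[t]
    t_grid = np.arange(T + 1)
    envelope = np.minimum(t_grid / (0.2 * T + 1), 1.0)  # ramp-up envelope
    return envelope * v

U = colored_excitation(T=200, rng=0)
print(U.shape)  # (201,)
```

Any other filter (e.g., a band-pass shaping a target spectrum) or envelope can be substituted without changing the structure of the sampler.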
The model in (30.1) may also depend on uncertain parameters $\theta \in \mathbb{R}^p$, which include the initial values $X(0)$ if they are uncertain. Then a prior PDF $p(\theta)$ may be chosen to quantify the uncertainty in the value of vector $\theta$. Some of the parameters may characterize the PDF for the input $U$, which can then be denoted $p(u \mid \theta)$. It is convenient to redefine the vector $U$ to also include $\theta$; then the new PDF $p(u)$ is $p(u \mid \theta)\,p(\theta)$ in terms of the previous PDFs. We assume that model parameter uncertainty is incorporated in this way, so the basic equations remain the same as (30.1), (30.3), and (30.4). When model uncertainty is incorporated, the calculated $p_E$ has been referred to as the robust rare-event probability [10, 40], meaning robust to model uncertainty, as in robust control theory.
2 Standard Monte Carlo Simulation

The standard Monte Carlo simulation (MCS) method is one of the most robust and straightforward ways to simulate rare events and estimate their probabilities. The method was originally developed in [37] for solving problems in mathematical physics. Since then, MCS has been used in many applications in physics, statistics, computer science, and engineering, and it currently lies at the heart of all random sampling-based techniques [35, 44].
The basic idea behind MCS is to observe that the probability in (30.4) can be written as an expectation:

$$p_E = \int_{\mathbb{R}^D} I_E(u)\,p(u)\,du = \mathbb{E}_p[I_E] \tag{30.5}$$

where $I_E(u)$ denotes the indicator function of the event $E$. By the law of large numbers, $p_E$ can therefore be estimated as the sample mean of $I_E$ over $N$ i.i.d. samples $U_1, \ldots, U_N \sim p(u)$:

$$p_E \approx \hat{p}_E^{\mathrm{MCS}} = \frac{1}{N} \sum_{i=1}^{N} I_E(U_i) \tag{30.6}$$
In the Bayesian interpretation, $p_E$ is treated as a stochastic variable; given the number $\hat{N}_E$ of samples falling in $E$ out of $N$, its posterior PDF is the beta distribution

$$p(p_E \mid \hat{N}_E, N) = \frac{p_E^{\hat{N}_E}\,(1 - p_E)^{N - \hat{N}_E}}{B\!\left(\hat{N}_E + 1,\; N - \hat{N}_E + 1\right)} \tag{30.8}$$

where the beta function $B$ is the normalizing constant, which here equals $\hat{N}_E!\,(N - \hat{N}_E)!/(N + 1)!$. The MCS estimate is the maximum a posteriori (MAP) estimate, which is the mode of the posterior distribution (30.8) and therefore the most probable value of $p_E$ a posteriori:

$$\hat{p}_E^{\mathrm{MCS}} = \frac{\hat{N}_E}{N} \tag{30.9}$$
Notice that the posterior PDF in (30.8) gives a complete description of the uncertainty in the value of $p_E$ based on the specific set of $N$ samples of $U$ drawn from $p(u)$. The posterior distribution in (30.8) is in fact the original Bayes result [9], although Bayes' theorem was developed in full generality by Laplace [34].
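As a numerical illustration, the posterior (30.8) is a Beta distribution, so its MAP estimate, mean, and c.o.v. are available in closed form; the sample counts below are purely illustrative:

```python
import math

def posterior_summary(N_E, N):
    """Summaries of the Beta(N_E + 1, N - N_E + 1) posterior in (30.8)."""
    map_est = N_E / N                      # MAP estimate, eq. (30.9)
    mean = (N_E + 1) / (N + 2)             # posterior mean
    var = mean * (1.0 - mean) / (N + 3)    # Beta posterior variance
    cov = math.sqrt(var) / mean            # posterior c.o.v.
    return map_est, mean, cov

# Suppose 8 of N = 10,000 MCS samples fell in E:
map_est, mean, cov = posterior_summary(N_E=8, N=10000)
print(map_est)  # 0.0008
print(cov)      # about 0.33: a single run of this size is still quite uncertain
```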
The standard MCS method for estimating the probability in (30.4) thus proceeds as follows: generate $N$ i.i.d. input excitations $U_i \sim p(u)$, compute the corresponding trajectories $X_i$ from (30.1), count the number $\hat{N}_E$ of trajectories for which $\max_{t} g(X_i(t)) > b$, and return $\hat{p}_E^{\mathrm{MCS}} = \hat{N}_E/N$.
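A minimal runnable version of this procedure, with a toy scalar linear system standing in for (30.1) (the system, threshold, and sample size are illustrative assumptions, not from the chapter), might look like:

```python
import numpy as np

def mcs_rare_event(n_samples, T=50, a=0.8, b=3.0, rng=None):
    """Standard MCS estimate of p_E = P(max_t X(t) > b) for the toy
    linear system X(t+1) = a*X(t) + U(t) driven by Gaussian white noise."""
    rng = np.random.default_rng(rng)
    hits = 0
    for _ in range(n_samples):
        u = rng.normal(size=T)           # input excitation U ~ p(u)
        x, peak = 0.0, -np.inf
        for t in range(T):               # propagate the system model
            x = a * x + u[t]
            peak = max(peak, x)
        if peak > b:                     # performance exceeds threshold
            hits += 1
    return hits / n_samples              # eq. (30.6)

p_hat = mcs_rare_event(n_samples=2000, rng=1)
print(p_hat)
```

For genuinely rare events the threshold `b` would be much larger, and, as discussed below, the sample size needed by plain MCS then becomes prohibitive.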
The accuracy of the MCS estimate is commonly measured by its coefficient of variation (c.o.v.):

$$\delta\!\left(\hat{p}_E^{\mathrm{MCS}} \mid p_E, N\right) = \frac{\sqrt{\mathrm{Var}_p\!\left[\hat{p}_E^{\mathrm{MCS}}\right]}}{\mathbb{E}_p\!\left[\hat{p}_E^{\mathrm{MCS}}\right]} = \sqrt{\frac{1 - p_E}{N p_E}} \tag{30.10}$$

This can be approximated by replacing $p_E$ by the estimate $\hat{p}_E^{\mathrm{MCS}} = \hat{N}_E/N$ for a given set of $N$ samples $\{U_1, \ldots, U_N\}$:

$$\delta\!\left(\hat{p}_E^{\mathrm{MCS}} \mid p_E, N\right) \approx \sqrt{\frac{1 - \hat{p}_E^{\mathrm{MCS}}}{N \hat{p}_E^{\mathrm{MCS}}}} \triangleq \hat{\delta}_N^{\mathrm{MCS}} \tag{30.11}$$
For the Bayesian interpretation, the posterior c.o.v. for the stochastic variable $p_E$, conditional on the set of $N$ samples, follows from (30.8):

$$\delta\!\left(p_E \mid \hat{N}_E, N\right) = \frac{\sqrt{\mathrm{Var}\!\left[p_E \mid \hat{N}_E, N\right]}}{\mathbb{E}\!\left[p_E \mid \hat{N}_E, N\right]} = \sqrt{\frac{N - \hat{N}_E + 1}{(\hat{N}_E + 1)(N + 3)}} \;\longrightarrow\; \sqrt{\frac{1 - \hat{p}_E^{\mathrm{MCS}}}{N \hat{p}_E^{\mathrm{MCS}}}} = \hat{\delta}_N^{\mathrm{MCS}} \tag{30.12}$$

as $N \to \infty$. Therefore, the same expression $\hat{\delta}_N^{\mathrm{MCS}}$ can be used to assess the accuracy of the MCS estimate, even though the two c.o.v.s have distinct interpretations.
The approximation $\hat{\delta}_N^{\mathrm{MCS}}$ for the two c.o.v.s reveals both the main advantage of the standard MCS method and its main drawback. The main strength of MCS, which makes it very robust, is that its accuracy does not depend on the geometry of the domain $E \subset \mathbb{R}^D$ or its dimension $D$. As long as an algorithm for generating i.i.d. samples from $p(u)$ is available, MCS, unlike many other methods (e.g., numerical integration), does not suffer from the "curse of dimensionality." Moreover, an irregular, or even fractal-like, shape of $E$ will not affect the accuracy of MCS.

On the other hand, the serious drawback of MCS is that this method is not computationally efficient in estimating the small probabilities $p_E$ corresponding to rare events, where from (30.10),

$$\delta\!\left(\hat{p}_E^{\mathrm{MCS}} \mid p_E, N\right) \approx \frac{1}{\sqrt{N p_E}} \tag{30.13}$$

For example, achieving a modest c.o.v. of $\delta = 10\%$ requires $N \approx 1/(\delta^2 p_E) = 100/p_E$ samples, so for $p_E = 10^{-4}$ this means on the order of $10^6$ dynamic simulations.
3 Importance Sampling
The importance sampling (IS) method belongs to the class of variance reduction techniques that aim to increase the accuracy of the estimates by constructing (sometimes biased) estimators with a smaller variance [1, 22]. It seems it was first proposed in [29], soon after the standard MCS method appeared.

The inefficiency of MCS for rare-event estimation stems from the fact that most of the generated samples $U_i \sim p(u)$ do not belong to $E$, so that the vast majority of the terms in the sum (30.6) are zero and only very few (if any) are equal to one. The basic idea of IS is to make use of the information available about the rare event $E$ to generate samples that lie more frequently in $E$, or in the important region $\tilde{E} \subset E$ that accounts for most of the probability content in (30.4). Rather than estimating $p_E$ as an average of many 0s and very few 1s as in $\hat{p}_E^{\mathrm{MCS}}$, IS seeks to reduce the variance by constructing an estimator of the form $\hat{p}_E^{\mathrm{IS}} = \frac{1}{N}\sum_{i=1}^{N'} w_i$, where $N'$ is an appreciable fraction of $N$ and the $w_i$ are small but not zero, ideally of the same order as the target probability, $w_i \sim p_E$.
Specifically, for an appropriate PDF $q(u)$ on the excitation space $\mathbb{R}^D$, the integral in (30.5) can be rewritten as follows:

$$p_E = \int_{\mathbb{R}^D} I_E(u)\,p(u)\,du = \int_{\mathbb{R}^D} \frac{I_E(u)\,p(u)}{q(u)}\,q(u)\,du = \mathbb{E}_q\!\left[\frac{I_E\,p}{q}\right] \tag{30.14}$$

The IS estimator is now constructed similarly to (30.6) by utilizing the law of large numbers:

$$p_E \approx \hat{p}_E^{\mathrm{IS}} = \frac{1}{N} \sum_{i=1}^{N} \frac{I_E(U_i)\,p(U_i)}{q(U_i)} = \frac{1}{N} \sum_{i=1}^{N} I_E(U_i)\,w(U_i) \tag{30.15}$$

where $U_1, \ldots, U_N$ are i.i.d. samples from $q(u)$, called the importance sampling density (ISD), and $w(U_i) = \frac{p(U_i)}{q(U_i)}$ is the importance weight of sample $U_i$.

The IS estimator $\hat{p}_E^{\mathrm{IS}}$ converges almost surely as $N \to \infty$ to $p_E$ by the strong law of large numbers, provided that the support of $q(u)$, i.e., the domain in $\mathbb{R}^D$ where $q(u) > 0$, contains the support of $I_E(u)p(u)$. Intuitively, the latter condition guarantees that all points of $E$ that can be generated by sampling from the original PDF $p(u)$ can also be generated by sampling from the ISD $q(u)$. Note that if $q(u) = p(u)$, then $w(U_i) = 1$ and IS simply reduces to MCS, $\hat{p}_E^{\mathrm{MCS}} = \hat{p}_E^{\mathrm{IS}}$. By choosing the ISD $q(u)$ appropriately, IS aims to obtain an estimator with a smaller variance.
The IS estimator $\hat{p}_E^{\mathrm{IS}}$ is also unbiased, with mean and variance:

$$\mathbb{E}_q\!\left[\hat{p}_E^{\mathrm{IS}}\right] = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_q\!\left[\frac{I_E\,p}{q}\right] = p_E$$

$$\mathrm{Var}_q\!\left[\hat{p}_E^{\mathrm{IS}}\right] = \frac{1}{N^2} \sum_{i=1}^{N} \mathrm{Var}_q\!\left[\frac{I_E\,p}{q}\right] = \frac{1}{N}\left(\mathbb{E}_q\!\left[\frac{I_E\,p^2}{q^2}\right] - p_E^2\right) \tag{30.16}$$
Importance Sampling
Input:
▷ N, total number of samples.
▷ q(u), importance sampling density.
Algorithm:
Set j = 0, counter for the number of samples in E.
for i = 1, ..., N do
    Sample the input excitation U_i = (U_i(0), ..., U_i(T)) ~ q(u).
    Compute the system trajectory X_i = (X_i(0), ..., X_i(T))
        using the system model (30.1) with U(t) = U_i(t).
    if max_{t=0,...,T} g(X_i(t)) > b then
        j <- j + 1
        Compute the importance weight of the j-th sample in E: w_j = p(U_i)/q(U_i).
    end if
end for
N_E = j, the total number of trials where the event E occurred.
Output:
▷ p̂_E^IS = (1/N) Σ_{j=1}^{N_E} w_j, the IS estimate of p_E.
The optimal ISD, which makes the variance in (30.16) zero, is the conditional PDF

$$q_0(u) = p(u \mid E) = \frac{I_E(u)\,p(u)}{p_E} \tag{30.17}$$

It cannot be used directly, however, since it involves the unknown probability $p_E$. Nevertheless, when an ISD that suitably approximates $q_0$ is used, the c.o.v. of the IS estimate behaves like

$$\delta\!\left(\hat{p}_E^{\mathrm{IS}} \mid p_E, N\right) \approx \frac{\gamma}{\sqrt{N}} \tag{30.18}$$

where the proportionality constant $\gamma$ is close to 1, regardless of how small the value of $p_E$ is. In fact, $\gamma$ decreases slightly as $p_E$ decreases, exhibiting the opposite behavior to MCS.
In general, it is known that in many practical cases of rare-event estimation, it is
difficult to construct a good ISD that leads to a low-variance IS estimator, especially
if the dimension of the uncertain excitation space RD is large, as it is in dynamic
reliability problems [5]. A geometric explanation as to why IS is often inefficient
in high dimensions is given in [32]. Au [2] has presented an efficient IS method
for estimating pE in (30.4) for elastoplastic systems subject to Gaussian excitation.
In recent years, substantial progress has been made by tailoring the sequential
importance sampling (SIS) methods [35], where the ISD is iteratively refined, to
rare-event problems. SIS and its modifications have been successfully used for
estimating rare events in dynamic portfolio credit risk [19], structural reliability
[33], and other areas.
4 Subset Simulation
The subset simulation (SS) method [4] is an advanced stochastic simulation method for estimating rare events, based on Markov chain Monte Carlo (MCMC) [35, 44]. The basic idea behind SS is to represent a very small probability $p_E$ of the rare event $E$ as a product of larger probabilities of more frequent events and then estimate these larger probabilities separately. To implement this idea, let

$$\mathbb{R}^D = E_0 \supset E_1 \supset \cdots \supset E_L = E \tag{30.19}$$

be a sequence of nested subsets of the uncertain excitation space starting from the entire space $E_0 = \mathbb{R}^D$ and shrinking to the target rare event $E_L = E$. By analogy with (30.3), the subsets $E_i$ can be defined by relaxing the value of the critical threshold $b$:

$$E_i = \left\{ U \in \mathbb{R}^D : \max_{t=0,\ldots,T} g(X(t)) > b_i \right\} \tag{30.20}$$

where $b_1 < b_2 < \cdots < b_L = b$. By the chain rule of probability,

$$p_E = \prod_{i=1}^{L} \mathbb{P}(E_i \mid E_{i-1}) \tag{30.21}$$
The first factor $\mathbb{P}(E_1) = \mathbb{P}(E_1 \mid E_0)$ can be estimated by standard MCS,

$$\mathbb{P}(E_1) \approx \frac{1}{n} \sum_{j=1}^{n} I_{E_1}(U_j) \tag{30.22}$$

where $U_1, \ldots, U_n$ are i.i.d. samples from $p(u)$. Estimating the remaining probabilities $\mathbb{P}(E_i \mid E_{i-1})$, $i \geq 2$, is more challenging, since one needs to generate samples from the conditional distribution $p(u \mid E_{i-1}) = \frac{I_{E_{i-1}}(u)\,p(u)}{\mathbb{P}(E_{i-1})}$, which, in general, is not a trivial task. Notice that a sample $U$ from $p(u \mid E_{i-1})$ is one drawn from $p(u)$ that lies in $E_{i-1}$. However, it is not efficient to use MCS for generating samples from $p(u \mid E_{i-1})$: sampling from $p(u)$ and accepting only those samples that belong to $E_{i-1}$ is computationally very expensive, especially at higher levels $i$.
In standard SS, samples from the conditional distribution $p(u \mid E_{i-1})$ are generated by the modified Metropolis algorithm (MMA) [4], which belongs to the family of MCMC methods for sampling from complex probability distributions that are difficult to sample from directly [35, 44]. An alternative strategy, "splitting," is described in the next section.
The MMA algorithm is a component-wise version of the original Metropolis algorithm [38]. It is specifically tailored for sampling from high-dimensional conditional distributions, and it works as follows. First, without loss of generality, assume that $p(u) = \prod_{k=1}^{D} p_k(u_k)$, i.e., the components of $U$ are independent. This assumption is indeed not a limitation, since in simulation one always starts from independent variables to generate correlated excitation histories $U$. Suppose further that some vector $U_1 \in \mathbb{R}^D$ is already distributed according to the target conditional distribution, $U_1 \sim p(u \mid E_{i-1})$. MMA prescribes how to generate another vector $U_2 \sim p(u \mid E_{i-1})$, and it consists of two steps: first, a candidate state $\xi$ is generated component by component, where each component $\xi_k$ is proposed from a local symmetric distribution centered on $U_{1,k}$ and accepted or rejected with the usual Metropolis ratio $p_k(\xi_k)/p_k(U_{1,k})$; second, the candidate is accepted, $U_2 = \xi$, if $\xi \in E_{i-1}$, and rejected, $U_2 = U_1$, otherwise.
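This component-wise update can be sketched as follows for independent standard Gaussian components, with a uniform random-walk proposal; the proposal width and helper names are illustrative assumptions, not part of the chapter:

```python
import numpy as np

def mma_step(u1, in_event, spread=1.0, rng=None):
    """One modified-Metropolis transition: u1 ~ p(u|E_{i-1}) in,
    a new state distributed as p(u|E_{i-1}) out.  Here p(u) is the
    D-dimensional standard Gaussian, and in_event(u) tests u in E_{i-1}."""
    rng = np.random.default_rng(rng)
    D = len(u1)
    # Step 1: component-wise Metropolis candidate.
    xi = u1.copy()
    prop = u1 + rng.uniform(-spread, spread, D)      # symmetric proposal
    log_ratio = 0.5 * (u1**2 - prop**2)              # log p_k(prop_k)/p_k(u1_k)
    accept = np.log(rng.uniform(size=D)) < log_ratio
    xi[accept] = prop[accept]
    # Step 2: accept the candidate only if it stays in E_{i-1}.
    return xi if in_event(xi) else u1

# Example: conditional sampling of a 2D Gaussian given u[0] + u[1] > 2.
u = np.array([1.5, 1.5])
for k in range(100):
    u = mma_step(u, in_event=lambda v: v[0] + v[1] > 2.0, rng=k)
print(u)
```

Because the seed already lies in the conditioning event, every state of the chain does too, which is what makes the "perfect sampling" property of SS described below possible.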
The conditional probability $\mathbb{P}(E_i \mid E_{i-1})$ is then estimated from the MMA samples $U_1^{(i-1)}, \ldots, U_n^{(i-1)} \sim p(u \mid E_{i-1})$ as

$$\mathbb{P}(E_i \mid E_{i-1}) \approx \frac{1}{n} \sum_{j=1}^{n} I_{E_i}\!\left(U_j^{(i-1)}\right) \tag{30.25}$$

In SS, the intermediate thresholds are chosen adaptively. Let $G_1^{(0)} \geq G_2^{(0)} \geq \cdots \geq G_n^{(0)}$ denote the performance values $G_j^{(0)} = \max_t g(X_j^{(0)}(t))$ of the MCS samples, sorted in decreasing order. The first threshold is defined as

$$b_1 = \frac{G_{np_0}^{(0)} + G_{np_0+1}^{(0)}}{2} \tag{30.26}$$
where p_0 is a chosen probability satisfying 0 < p_0 < 1. This choice of b_1 has two immediate consequences: first, the MCS estimate of P(E_1) in (30.22) is exactly p_0, and, second, U_1^{(0)},...,U_{np_0}^{(0)} not only belong to E_1 but are also distributed according to the conditional distribution p(u | E_1). Each of these np_0 samples can now be used as a mother seed in MMA to generate (1/p_0 − 1) offspring, giving a total of n samples U_1^{(1)},...,U_n^{(1)} ~ p(u | E_1). Since these seeds start in the stationary state p(u | E_1) of the Markov chain, this MCMC method gives perfect sampling, i.e., no wasteful burn-in period is needed. Similarly, b_2 is defined as
    b_2 = ( G_{np_0}^{(1)} + G_{np_0+1}^{(1)} ) / 2                          (30.27)
30 Rare-Event Simulation 1087
where {G_j^{(1)}} are the (ordered) performance values corresponding to the excitations {U_j^{(1)}}. Again by construction, the estimate (30.25) gives P(E_2 | E_1) ≈ p_0, and U_1^{(1)},...,U_{np_0}^{(1)} ~ p(u | E_2). The SS method proceeds in this manner until the target rare event E is reached and is sufficiently sampled. All but the last factor in (30.21) are approximated by p_0, and the last factor P(E | E_{L-1}) ≈ n_E/n ≥ p_0, where n_E is the number of samples in E among U_1^{(L-1)},...,U_n^{(L-1)} ~ p(u | E_{L-1}). The method is more formally summarized in the following pseudo-code.
Subset Simulation
Input:
▷ n, the number of samples per conditional level.
▷ p_0, the level probability; e.g. p_0 = 0.1.
▷ {q_{k,i}}, the proposal distributions; e.g. q_{k,i}(·|u) = N(·|u, σ²_{k,i}).
Algorithm:
Set i = 0, the number of the conditional level.
Set n_E^{(0)} = 0, the number of the MCS samples in E.
Sample the input excitations U_1^{(0)},...,U_n^{(0)} ~ p(u).
Compute the corresponding trajectories X_1^{(0)},...,X_n^{(0)}.
for j = 1,...,n do
    if G_j^{(0)} = max_{t=0,...,T} g(X_j^{(0)}(t)) > b then
        n_E^{(0)} ← n_E^{(0)} + 1
    end if
end for
while n_E^{(i)}/n < p_0 do
    i ← i + 1, a new subset E_i is needed; set n_E^{(i)} = 0.
    Sort {U_j^{(i-1)}} so that G_1^{(i-1)} ≥ G_2^{(i-1)} ≥ ... ≥ G_n^{(i-1)}.
    Define the i-th intermediate threshold: b_i = ( G_{np_0}^{(i-1)} + G_{np_0+1}^{(i-1)} )/2.
    for j = 1,...,np_0 do
        Using W_{j,1} = U_j^{(i-1)} ~ p(u | E_i) as a seed, use MMA to generate
        (1/p_0 − 1) additional states of a Markov chain
        W_{j,1},...,W_{j,1/p_0} ~ p(u | E_i).
    end for
    Renumber: {W_{j,s}}, j = 1,...,np_0, s = 1,...,1/p_0, as U_1^{(i)},...,U_n^{(i)} ~ p(u | E_i).
    Compute the corresponding trajectories X_1^{(i)},...,X_n^{(i)}.
    for j = 1,...,n do
        if G_j^{(i)} = max_{t=0,...,T} g(X_j^{(i)}(t)) > b then
            n_E^{(i)} ← n_E^{(i)} + 1
        end if
    end for
end while
L = i + 1, the number of levels, i.e. of subsets E_i in (30.19) and (30.20).
N = n + n(1 − p_0)(L − 1), the total number of samples.
Output:
▶ p̂_E^{SS} = p_0^{L-1} · n_E^{(L-1)}/n, the SS estimate of p_E.
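The pseudo-code can be exercised end to end on a toy limit state whose answer is known in closed form. The sketch below (all names and tuning values are illustrative, not from the text) estimates P(g(U) > b) for g(u) = (u_1 + u_2)/√2 with U ~ N(0, I_2), so that g(U) ~ N(0,1) and the exact answer is the standard Gaussian tail probability; conditional sampling uses a minimal MMA kernel as described above:

```python
import math
import numpy as np

def subset_simulation(g, D, b, n=2000, p0=0.1, sigma=1.0, seed=0):
    """Subset-simulation estimate of p_E = P(g(U) > b) for U ~ N(0, I_D),
    following the pseudo-code above with a minimal MMA kernel."""
    rng = np.random.default_rng(seed)
    nc = int(n * p0)                            # number of seeds (chains) per level
    U = rng.standard_normal((n, D))             # level 0: plain Monte Carlo
    G = np.array([g(u) for u in U])
    prod = 1.0                                  # accumulates one factor p0 per level
    for _ in range(50):                         # safety cap on the number of levels
        n_E = int(np.sum(G > b))
        if n_E / n >= p0:                       # target event sufficiently sampled
            return prod * n_E / n
        order = np.argsort(G)[::-1]             # sort so that G_1 >= ... >= G_n
        U, G = U[order], G[order]
        b_i = 0.5 * (G[nc - 1] + G[nc])         # intermediate threshold, cf. (30.26)
        prod *= p0                              # P(E_i | E_{i-1}) ~ p0 by construction
        newU, newG = list(U[:nc]), list(G[:nc])
        for j in range(nc):                     # grow each seed into a chain of 1/p0 states
            u, gu = U[j], G[j]
            for _ in range(int(1 / p0) - 1):
                xi = u + sigma * rng.standard_normal(D)
                acc = rng.random(D) < np.exp(0.5 * (u**2 - xi**2))
                cand = np.where(acc, xi, u)     # component-wise pre-candidate
                gc = g(cand)
                if gc > b_i:                    # keep only if still inside E_i
                    u, gu = cand, gc
                newU.append(u)
                newG.append(gu)
        U, G = np.array(newU), np.array(newG)
    return prod * np.sum(G > b) / n

g = lambda u: (u[0] + u[1]) / math.sqrt(2)      # g(U) ~ N(0,1)
p_hat = subset_simulation(g, D=2, b=3.5)
p_true = 0.5 * math.erfc(3.5 / math.sqrt(2))    # exact tail, roughly 2.3e-4
```

With n = 2000 and p_0 = 0.1 the run typically uses three or four levels, automatically adapting the total effort to the rarity of the event.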
1088 J.L. Beck and K.M. Zuev
    δ²(p̂_E^{SS} | p_E, p_0, N) ≈ [ (1 + γ)(1 − p_0) / (N p_0) ] · (ln p_E^{-1})^r / (ln p_0^{-1})^r        (30.28)

for r = 2, γ = 3, and p_0 = 0.1. For rare events, p_E^{-1} is very large, and, as expected, SS outperforms MCS; for example, if p_E = 10^{-6}, then δ²_MCS / δ²_SS ≈ 800.
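Plugging the quoted numbers into (30.28) reproduces this gain (a quick arithmetic check, assuming the standard MCS unit c.o.v. δ²_MCS ≈ (1 − p_E)/(N p_E); the sample size N cancels from the ratio):

```python
import math

p_E, p0, N = 1e-6, 0.1, 1.0   # N factors out of the ratio
r, gamma = 2, 3               # values used in the text

# SS squared c.o.v. per (30.28): 4 * 0.9 / 0.1 * (ln 1e6 / ln 10)^2 = 36 * 36 = 1296/N
d2_SS = (1 + gamma) * (1 - p0) / (N * p0) * (math.log(1 / p_E) / math.log(1 / p0)) ** r

# MCS squared c.o.v. for the same N: (1 - p_E)/(N p_E), about 1e6/N
d2_MCS = (1 - p_E) / (N * p_E)

print(d2_MCS / d2_SS)   # ratio of roughly 770, i.e. ~800 as quoted
```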
In recent years, a number of modifications of SS have been proposed, including
SS with splitting [17] (described in the next section), hybrid SS [18], two-stage
SS [30], spherical SS [31], and SS with delayed rejection [57]. A Bayesian post-
processor for SS, which generalizes the Bayesian interpretation of MCS described
above, was developed in [56]. In the original paper [4], SS was developed for
estimating reliability of complex civil engineering structures such as tall buildings
and bridges at risk from earthquakes. It was applied for this purpose in [5] and [24].
SS and its modifications have also been successfully applied to rare-event simulation
in fire risk analysis [8], aerospace [39, 52], nuclear [16], wind [49] and geotechnical
engineering [46], and other fields. A detailed exposition of SS at an introductory
level and a MATLAB code implementing the above pseudo-code are given in [55].
For more advanced and complete reading, the fundamental monograph on SS [7] is
strongly recommended.
5 Splitting
In the previously presented stochastic simulation methods, samples of the input and output discrete-time histories, {U(t) : t = 0,...,T} ⊂ R^m and {X(t) : t = 0,...,T} ⊂ R^n, are viewed geometrically as vectors U and X that define points in the vector spaces R^{(T+1)m} and R^{(T+1)n}, respectively. In the splitting method,
designated subset in the input or output spaces at some time are treated as mothers
and are then split into multiple offspring trajectories by separate sampling of the
input histories subsequent to the splitting time. These multiple trajectories can
themselves subsequently be treated as mothers if they reach another designated
subset nested inside the first subset at some later time and so be split into multiple
offspring trajectories. This is continued until a certain number of the trajectories
reach the smallest nested subset corresponding to the rare event of interest.
Splitting methods were originally introduced by Kahn and Harris [28], and they
have been extensively studied (e.g., [12, 17, 42, 54]). We describe splitting here
by using the framework of subset simulation where the only change is that the
conditional sampling in the nested subsets is done by splitting the trajectories that
reach each subset, rather than using them as seeds to generate more samples from
Markov chains in their stationary state. As a result, only standard Monte Carlo
simulation is needed, instead of MCMC simulation.
The procedure in [17] is followed here to generate offspring trajectories at the i-th level (i = 1,...,L) of subset simulation from each of the mother trajectories in E_i constructed from samples from the previous level, except that we present it from the viewpoint of trajectories in the input space, rather than the output space. Therefore, at the i-th level, each of the np_0 sampled input histories U_j, j = 1,...,np_0, from the previous level that satisfies U_j ∈ E_i, as defined in (30.20) (so the corresponding output history X_j satisfies max_{t=0,...,T} g(X_j(t)) > b_i), is split at its first-passage time t_j = min{ t ∈ {0,...,T} : g(X_j(t)) > b_i }.
This means that the mother trajectory U_j is partitioned as [U_j^−, U_j^+], where U_j^− = [U_j(0),...,U_j(t_j)] and U_j^+ = [U_j(t_j+1),...,U_j(T)]; then a subtrajectory sample Ũ_j^+ = [Ũ_j(t_j+1),...,Ũ_j(T)] is drawn from

    p(u_j^+ | U_j^−, E_i) = p(u_j^+ | U_j^−) · P(E_i | u_j^+, U_j^−) / P(E_i | U_j^−) = p(u_j^+ | U_j^−) = p(u_j^+)        (30.31)

where the last two equalities hold because U_j^− already reaches E_i by time t_j (so both conditional probabilities of E_i equal one) and because the input samples are independent across time.
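The split-and-resample step implied by (30.31) is mechanically simple: copy the mother's inputs up to t_j and redraw everything after t_j from p(u). A sketch for i.i.d. Gaussian input histories (the function name and shapes are illustrative):

```python
import numpy as np

def split_trajectory(Z, t_j, n_offspring, rng):
    """Generate offspring input histories by trajectory splitting, cf. (30.31).
    Z    : mother's i.i.d. standard-Gaussian input history, shape (T+1,)
    t_j  : first-passage time index at which the mother entered E_i
    Each offspring keeps Z[:t_j+1] (so it also reaches E_i by t_j) and
    redraws the remaining inputs from p(u+), here i.i.d. N(0,1)."""
    T1 = len(Z)
    offspring = np.tile(Z, (n_offspring, 1))                     # copy the mother
    offspring[:, t_j + 1:] = rng.standard_normal((n_offspring, T1 - t_j - 1))
    return offspring
```

Because every offspring shares the mother's past up to t_j, each one lies in E_i by construction, and no MCMC acceptance step is needed.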
trajectory splitting method. They show that as long as the conditional probability in subset simulation satisfies p_0 ≥ 0.1, the coefficient of variation for p_E when estimating it by (30.21) and (30.25) is insensitive to p_0.
Ching, Beck, and Au [18] also introduce a hybrid version of subset simulation
that combines some advantages of the splitting and MCMC versions when gen-
erating the conditional samples U_j, j = 1,...,n, at each level. It is limited to
dynamic problems because of the splitting, but it can handle parameter uncertainty
through using MCMC. All three variants of subset simulation are applied to a
series of benchmark reliability problems in [6]; their results imply that for the same
computational effort in the dynamic benchmark problems, the hybrid version gives
slightly better accuracy for the rare-event probability than the MCMC version. For a
comparison between these results and those of other stochastic simulation methods
that are applied to some of the same benchmark problems (e.g., spherical subset
simulation, auxiliary domain method, and line sampling), the reader may wish to
check [47].
6 Illustrative Example
To illustrate MCS, IS, and SS with MCMC and splitting for rare-event estimation,
consider the forced Lorenz system of ordinary differential equations (30.32, 30.33, and 30.34). Let X(0) be the initial condition and X(t) the corresponding solution. Lorenz showed [36] that the solution of (30.32, 30.33, and 30.34) with U(t) ≡ 0 always (for any t) stays inside the bounding ellipsoid E:
    X_1(t)² / (bR²) + X_2(t)² / (bR²) + (X_3(t) − R)² / R² ≤ 1,    R = r + σ        (30.37)
Suppose that the system is now excited by U(t) = κB(t), where B(t) is the standard Brownian process (Gaussian white noise) and κ is a scaling constant. The uncertain stochastic excitation U(t) makes the corresponding system trajectory X(t) also stochastic. Let us say that the event E occurs if X(t) leaves the bounding ellipsoid E during the time interval of interest [0, T].
The discretization of the excitation U is obtained by the standard discretization of the Brownian process:

    U(0) = 0,    U(kΔt) = κB(kΔt) = U((k−1)Δt) + κ√Δt Z_k = κ√Δt Σ_{i=1}^{k} Z_i,        (30.38)

where Δt = 0.1 s is the sampling interval, k = 1,...,D = T/Δt, and Z_1,...,Z_D are i.i.d. standard Gaussian random variables. The target domain E ⊂ R^D is then the set of excitations for which the response leaves the ellipsoid, i.e. max_{k=1,...,D} g(k) > 1, where

    g(k) = X_1(kΔt)² / (bR²) + X_2(kΔt)² / (bR²) + (X_3(kΔt) − R)² / R²        (30.40)
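The discretized excitation (30.38) is just a cumulative sum of scaled i.i.d. Gaussian increments, which can be generated in one line; a sketch (the function name is illustrative, and the scaling constant defaults to one):

```python
import numpy as np

def brownian_excitation(T, dt=0.1, kappa=1.0, rng=None):
    """Discretize U(t) = kappa * B(t) on t = 0, dt, ..., T via (30.38):
    U(0) = 0, U(k*dt) = U((k-1)*dt) + kappa * sqrt(dt) * Z_k, Z_k i.i.d. N(0,1).
    Returns the array [U(0), U(dt), ..., U(D*dt)] with D = T/dt."""
    rng = np.random.default_rng() if rng is None else rng
    D = int(round(T / dt))
    Z = rng.standard_normal(D)                      # the D uncertain inputs
    return np.concatenate(([0.0], kappa * np.sqrt(dt) * np.cumsum(Z)))
```

The vector Z of increments is exactly the point of R^D on which the target domain E is defined.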
Figure 30.1 shows the solution of the unforced Lorenz system (with κ = 0, so U(t) ≡ 0) and an example of the solution of the forced system (with κ = 3) that corresponds to an excitation U ∈ E (slightly abusing notation, U = U(Z_1,...,Z_D) ∈ E means that the corresponding Gaussian vector (Z_1,...,Z_D) ∈ E).
Monte Carlo Simulation: For κ = 3, Fig. 30.2 shows the probability p_E of event E as a function of T estimated using standard MCS:

    p̂_E^{MCS} = (1/N) Σ_{i=1}^{N} I_E(Z^{(i)})        (30.41)

where Z^{(i)} = (Z_1^{(i)},...,Z_D^{(i)}) ~ φ(z) are i.i.d. samples from the standard D-dimensional Gaussian PDF φ(z). For each value of T, N = 10⁴ samples were used.
When T < 25, the accuracy of the MCS estimate (30.41) begins to degenerate, since the total number of samples N becomes too small for the corresponding target probability. Moreover, for T < 15, none of the N generated MCS samples belong to the target domain E, making the MCS estimate zero. Figure 30.2 shows, as expected, that p_E is an increasing function of T: the more time the system has, the more likely its trajectory eventually penetrates the boundary of the ellipsoid E.
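The estimator (30.41) and its breakdown for rare events are easy to reproduce on a one-dimensional stand-in: with N = 10⁴ samples, a moderate tail probability is estimated well, while a probability of order 10⁻⁶ almost always yields an estimate of exactly zero (the events and sample size below are illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
Z = rng.standard_normal(N)

# Moderate event: MCS works well (true value ~2.28e-2)
p_hat = np.mean(Z > 2.0)
p_true = 0.5 * math.erfc(2.0 / math.sqrt(2))

# Rare event: P(Z > 4.5) ~ 3.4e-6, so N * p ~ 0.034 expected hits;
# the estimate is typically exactly zero
p_rare = np.mean(Z > 4.5)
```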
Importance Sampling: IS is a variance reduction technique and, as discussed in previous sections, its efficiency critically depends on the choice of the
Fig. 30.1 The left column shows the solution of the unexcited Lorenz system (κ = 0) enclosed in the bounding ellipsoid E (top) and the corresponding response function g(t) (bottom), where t ∈ [0, T], T = 100. The right top panel shows the solution of the forced Lorenz system (κ = 3) that corresponds to an excitation U ∈ E. As is clearly seen, this solution leaves the ellipsoid E. According to the response function g(t) shown in the right bottom panel, this first-passage event happens around t = 90
ISD q. Usually some geometric information about the target domain E is needed for constructing a good ISD. To get some intuition, Fig. 30.3 shows the domain E for two lower-dimensional cases: T = 1, Δt = 0.5 (D = 2) and T = 1.5, Δt = 0.5 (D = 3), and suggests an ISD formed as an equally weighted mixture of two unimodal PDFs q₋ and q₊, one per component of E:

    q(z) = ( q₋(z) + q₊(z) ) / 2        (30.42)
Fig. 30.2 The top panel shows the estimate of the probability p_E of event E, where κ = 3, as a function of the duration time T. For each value of T ∈ [5, 100], N = 10⁴ samples were used in MCS, and n = 2×10³ samples per conditional level were used in the two versions of SS (MCMC and splitting). The MCS and SS/splitting estimates of p_E are zero for T < 15 and T < 12, respectively. The bottom panel shows the total computational effort automatically chosen by both SS algorithms
since φ(z_E) has the largest (among generated samples) value. We then take the ISD q₂ as the mixture of Gaussian PDFs centered at z_E and −z_E.
Case 3: To illustrate what happens if one ignores the geometric information about the two components of E, we choose q₃(z) = φ(z | z_E), the single Gaussian PDF centered at z_E given in Case 2.
Let T = 1 and κ = 20. The dimension of the uncertain excitation space is then D = 10. Table 30.1 shows the simulation results for the above three cases as well as for standard MCS. The IS method with q₁, on average, correctly estimates p_E. However, the c.o.v. of the estimate is very large, which results in large fluctuations of the estimate in independent runs. IS with q₂ works very well and outperforms MCS: the c.o.v. is reduced by half. Finally, IS with q₃ completely misses one component
Fig. 30.3 Left panel: visualization of the domain E in the two-dimensional case D = 2, where T = 1, Δt = 0.5, and κ = 20. N = 10⁴ samples were generated and marked by red circles (respectively, green dots) if they do (respectively, do not) belong to E. Right panel: the same as on the left panel, but with D = 3 and T = 1.5
part of the target domain E, and the resulting estimate is about half of the correct
value. Note that the c.o.v. in this case is very small, which is very misleading.
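The effect of missing one component of E can be reproduced on a one-dimensional analogue with a symmetric two-component rare event, |Z| > a (the event and the ISDs below are illustrative stand-ins for q₂ and q₃, not the chapter's exact construction):

```python
import math
import numpy as np

def phi(z, mu=0.0):
    """Density of N(mu, 1) at z."""
    return np.exp(-0.5 * (z - mu) ** 2) / math.sqrt(2 * math.pi)

rng = np.random.default_rng(0)
N, a = 10_000, 3.5
p_true = math.erfc(a / math.sqrt(2))     # P(|Z| > a), two symmetric components

# q2-style ISD: mixture centered on BOTH components of the target domain
z = np.where(rng.random(N) < 0.5, a, -a) + rng.standard_normal(N)
w = (np.abs(z) > a) * phi(z) / (0.5 * phi(z, a) + 0.5 * phi(z, -a))
p_q2 = np.mean(w)                        # unbiased, small c.o.v.

# q3-style ISD: single Gaussian on ONE component, misses the other one
z3 = a + rng.standard_normal(N)
w3 = (np.abs(z3) > a) * phi(z3) / phi(z3, a)
p_q3 = np.mean(w3)                       # about p_true/2, with a deceptively small c.o.v.
```

The one-sided ISD practically never produces samples in the left component, so its estimate settles near half the true value while its empirical c.o.v. stays small, exactly the misleading behavior described in the text.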
It was mentioned in previous sections that IS is often not efficient in high dimensions, because it becomes more difficult to construct a good ISD [5, 32]. To illustrate this effect, IS with q₂ was used to estimate p_E for a sequence of problems where the total duration time gradually grows from T = 1 to T = 10. This results in an increase of the dimension D of the underlying uncertain excitation space from 10 to 100. Figure 30.4 shows how the IS estimate degenerates as the dimension D of the problem increases. While IS is accurate when D = 10 (T = 1), it strongly underestimates the true value of p_E as D approaches 100 (T = 10).
Subset Simulation: SS is a more advanced simulation method and, unlike IS, it does not suffer from the curse of dimensionality. For κ = 3, Fig. 30.2 shows the estimate of the target probability p_E as a function of T using SS with MCMC and with splitting. For each value of T, n = 2×10³ samples were used in each conditional level of SS. Unlike MCS, SS is capable of efficiently simulating very rare events and estimating their small probabilities. The total computational effort, i.e., the total number N of samples automatically chosen by SS, is shown in the bottom panel of Fig. 30.2. Note that the larger the value of p_E, the smaller the number of conditional levels in SS and, therefore, the smaller the total number of samples N.
Fig. 30.4 Estimation of the target probability p_E as a function of the duration time T. Solid red and dashed blue curves correspond to MCS and IS with q₂, respectively. In this example, κ = 20, and N = 10⁴ samples are used for each value of T. It is clearly visible how the IS estimate degenerates as the dimension D goes from 10 (T = 1) to 100 (T = 10)
7 Conclusion
the real system behavior, implying that there are usually no true values of the
model parameters and the accuracy of its predictions is uncertain. In the engineering
literature, the mathematical problem to be solved numerically for the probability
of performance failure, commonly called the failure probability, is referred to as
the first-passage reliability problem. It does not have an analytical solution, and
numerical solutions must face two challenging aspects:
1. The vector representing the time-discretized stochastic process that models the
future system excitation lies in an input space of high dimension;
2. The dynamic systems of interest are assumed to be highly reliable so that their
performance failure is a rare event, that is, the probability of its occurrence, pE ,
is very small.
as a function of the number of training samples from the complex model and the
accuracy of the estimate of the rare-event probability as a function of the total
number of samples from both the complex model and the surrogate model.
References
1. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis. Springer, New York (2010)
2. Au, S.K.: Importance sampling for elasto-plastic systems using adapted process with deterministic control. Int. J. Nonlinear Mech. 44, 189–198 (2009)
3. Au, S.K., Beck, J.L.: First excursion probabilities for linear systems by very efficient importance sampling. Prob. Eng. Mech. 16, 193–207 (2001)
4. Au, S.K., Beck, J.L.: Estimation of small failure probabilities in high dimensions by subset simulation. Prob. Eng. Mech. 16(4), 263–277 (2001)
5. Au, S.K., Beck, J.L.: Importance sampling in high dimensions. Struct. Saf. 25(2), 139–163 (2003)
6. Au, S.K., Ching, J., Beck, J.L.: Application of subset simulation methods to reliability benchmark problems. Struct. Saf. 29(3), 183–193 (2007)
7. Au, S.K., Wang, Y.: Engineering Risk Assessment and Design with Subset Simulation. Wiley, Singapore (2014)
8. Au, S.K., Wang, Z.H., Lo, S.M.: Compartment fire risk analysis by advanced Monte Carlo method. Eng. Struct. 29, 2381–2390 (2007)
9. Bayes, T.: An essay towards solving a problem in the doctrine of chances. Philos. Trans. R. Soc. Lond. 53, 370–418 (1763). Reprinted in Biometrika 45, 296–315 (1958)
10. Beck, J.L.: Bayesian system identification based on probability logic. Struct. Control Health Monit. 17, 825–847 (2010)
11. Beck, J.L., Au, S.K.: Reliability of dynamic systems using stochastic simulation. In: Proceedings of the 6th European Conference on Structural Dynamics, Paris (2005)
12. Botev, Z.I., Kroese, D.P.: Efficient Monte Carlo simulation via the generalized splitting method. Stat. Comput. 22(1), 1–16 (2012)
13. Bourinet, J.M., Deheeger, F., Lemaire, M.: Assessing small failure probabilities by combined subset simulation and support vector machines. Struct. Saf. 33(6), 343–353 (2011)
14. Bucher, C., Bourgund, U.: A fast and efficient response surface approach for structural reliability problems. Struct. Saf. 7, 57–66 (1990)
15. Bucklew, J.A.: Introduction to Rare Event Simulation. Springer Series in Statistics. Springer, New York (2004)
16. Cadini, F., Avram, D., Pedroni, N., Zio, E.: Subset simulation of a reliability model for radioactive waste repository performance assessment. Reliab. Eng. Syst. Saf. 100, 75–83 (2012)
17. Ching, J., Au, S.K., Beck, J.L.: Reliability estimation for dynamical systems subject to stochastic excitation using subset simulation with splitting. Comput. Methods Appl. Mech. Eng. 194, 1557–1579 (2005)
18. Ching, J., Beck, J.L., Au, S.K.: Hybrid subset simulation method for reliability estimation of dynamical systems subject to stochastic excitation. Prob. Eng. Mech. 20, 199–214 (2005)
19. Deng, S., Giesecke, K., Lai, T.L.: Sequential importance sampling and resampling for dynamic portfolio credit risk. Oper. Res. 60(1), 78–91 (2012)
20. Ditlevsen, O., Madsen, H.O.: Structural Reliability Methods. Wiley, Chichester/New York (1996)
21. Dubourg, V., Sudret, B., Deheeger, F.: Meta-model based importance sampling for structural reliability analysis. Prob. Eng. Mech. 33, 47–57 (2013)
22. Dunn, W.L., Shultis, J.K.: Exploring Monte Carlo Methods. Elsevier, Amsterdam/Boston (2012)
23. Hurtado, J.: Structural Reliability: Statistical Learning Perspectives. Springer, Berlin/New York (2004)
24. Jalayer, F., Beck, J.L.: Effects of two alternative representations of ground-motion uncertainty on probabilistic seismic demand assessment of structures. Earthq. Eng. Struct. Dyn. 37, 61–79 (2008)
25. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620–630 (1957)
26. Jaynes, E.T.: Probability Theory: The Logic of Science. Cambridge University Press, Cambridge (2003)
27. Johnson, C.: Numerical Solution of Partial Differential Equations by the Finite Element Method. Dover, Mineola (2009)
28. Kahn, H., Harris, T.E.: Estimation of particle transmission by random sampling. Natl. Bur. Stand. Appl. Math. Ser. 12, 27–30 (1951)
29. Kahn, H., Marshall, A.W.: Methods of reducing sample size in Monte Carlo computations. J. Oper. Res. Soc. Am. 1(5), 263–278 (1953)
30. Katafygiotis, L.S., Cheung, S.H.: A two-stage subset simulation-based approach for calculating the reliability of inelastic structural systems subjected to Gaussian random excitations. Comput. Methods Appl. Mech. Eng. 194, 1581–1595 (2005)
31. Katafygiotis, L.S., Cheung, S.H.: Application of spherical subset simulation method and auxiliary domain method on a benchmark reliability study. Struct. Saf. 29(3), 194–207 (2007)
32. Katafygiotis, L.S., Zuev, K.M.: Geometric insight into the challenges of solving high-dimensional reliability problems. Prob. Eng. Mech. 23, 208–218 (2008)
33. Katafygiotis, L.S., Zuev, K.M.: Estimation of small failure probabilities in high dimensions by adaptive linked importance sampling. In: Proceedings of COMPDYN-2007, Rethymno (2007)
34. Laplace, P.S.: Théorie Analytique des Probabilités. Courcier, Paris (1812)
35. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, New York (2001)
36. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963)
37. Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44, 335–341 (1949)
38. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)
39. Pellissetti, M.F., Schuëller, G.I., Pradlwarter, H.J., Calvi, A., Fransen, S., Klein, M.: Reliability analysis of spacecraft structures under static and dynamic loading. Comput. Struct. 84, 1313–1325 (2006)
40. Papadimitriou, C., Beck, J.L., Katafygiotis, L.S.: Updating robust reliability using structural test data. Prob. Eng. Mech. 16, 103–113 (2001)
41. Papadopoulos, V., Giovanis, D.G., Lagaros, N.D., Papadrakakis, M.: Accelerated subset simulation with neural networks for reliability analysis. Comput. Methods Appl. Mech. Eng. 223, 70–80 (2012)
42. Pradlwarter, H.J., Schuëller, G.I., Melnik-Melnikov, P.G.: Reliability of MDOF-systems. Prob. Eng. Mech. 9, 235–243 (1994)
43. Rackwitz, R.: Reliability analysis – a review and some perspectives. Struct. Saf. 23, 365–395 (2001)
44. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2004)
45. Ross, S.M.: A First Course in Probability, 8th edn. Prentice Hall, Upper Saddle River (2009)
46. Santoso, A.M., Phoon, K.K., Quek, S.T.: Modified Metropolis–Hastings algorithm with reduced chain correlation for efficient subset simulation. Prob. Eng. Mech. 26, 331–341 (2011)
47. Schuëller, G.I., Pradlwarter, H.J.: Benchmark study on reliability estimation in higher dimensions of structural systems – an overview. Struct. Saf. 29(3), 167–182 (2007)
48. Schuëller, G.I., Pradlwarter, H.J., Koutsourelakis, P.S.: A critical appraisal of reliability estimation procedures for high dimensions. Prob. Eng. Mech. 19, 463–474 (2004)
49. Sichani, M.T., Nielsen, S.R.K.: First passage probability estimation of wind turbines by Markov chain Monte Carlo. Struct. Infrastruct. Eng. 9, 1067–1079 (2013)
50. Sparrow, C.: The Lorenz Equations: Bifurcations, Chaos, and Strange Attractors. Springer, New York (1982)
51. Taflanidis, A.A., Beck, J.L.: Analytical approximation for stationary reliability of certain and uncertain linear dynamic systems with higher dimensional output. Earthq. Eng. Struct. Dyn. 35, 1247–1267 (2006)
52. Thunnissen, D.P., Au, S.K., Tsuyuki, G.T.: Uncertainty quantification in estimating critical spacecraft component temperatures. AIAA J. Thermophys. Heat Transf. 21(2), 422–430 (2007)
53. Valdebenito, M.A., Pradlwarter, H.J., Schuëller, G.I.: The role of the design point for calculating failure probabilities in view of dimensionality and structural nonlinearities. Struct. Saf. 32, 101–111 (2010)
54. Villén-Altamirano, M., Villén-Altamirano, J.: Analysis of RESTART simulation: theoretical basis and sensitivity study. Eur. Trans. Telecommun. 13(4), 373–386 (2002)
55. Zuev, K.: Subset simulation method for rare event estimation: an introduction. In: Beer, M., et al. (eds.) Encyclopedia of Earthquake Engineering. Springer, Berlin/Heidelberg (2015). Available online at http://www.springerreference.com/docs/html/chapterdbid/369348.html
56. Zuev, K.M., Beck, J.L., Au, S.K., Katafygiotis, L.S.: Bayesian post-processor and other enhancements of subset simulation for estimating failure probabilities in high dimensions. Comput. Struct. 92–93, 283–296 (2012)
57. Zuev, K.M., Katafygiotis, L.S.: Modified Metropolis–Hastings algorithm with delayed rejection. Prob. Eng. Mech. 26, 405–412 (2011)
Part IV
Introduction to Sensitivity Analysis
31 Introduction to Sensitivity Analysis
Bertrand Iooss and Andrea Saltelli
Abstract
Sensitivity analysis provides users of mathematical and simulation models with tools to appreciate the dependency of the model output on the model inputs and to investigate how important each model input is in determining the output. All application areas are concerned, from theoretical physics to engineering and socio-economics. This introductory paper presents the aims and objectives of sensitivity analysis in order to explain the composition of the overall Sensitivity Analysis chapter of the Springer Handbook. It also describes the basic principles of sensitivity analysis, some classification grids for understanding the application range of each method, a useful software package, and the notation used in the chapter papers. This section also offers a succinct description of sensitivity auditing, a new discipline that tests the entire inferential chain, including model development, implicit assumptions, and normative issues, and which is recommended when the inference provided by the model needs to feed into a regulatory or policy process. For the Sensitivity Analysis chapter, in addition to this introduction, eight papers have been written by around twenty practitioners from different fields of application. They cover the most widely used methods for this subject: deterministic methods such as local sensitivity analysis, experimental design strategies, the sampling-based and variance-based methods
developed from the 1980s, and the newer importance measures and metamodel-based techniques established and studied since the 2000s. In each paper, toy examples or industrial applications illustrate their relevance and usefulness.

B. Iooss (✉)
Industrial Risk Management Department, EDF R&D, Chatou, France
Institut de Mathématiques de Toulouse, Université Paul Sabatier, Toulouse, France
e-mail: bertrand.iooss@edf.fr; biooss@yahoo.fr

A. Saltelli
Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen (UIB), Bergen, Norway
Keywords
Computer experiments · Uncertainty analysis · Sensitivity analysis · Sensitivity auditing · Risk assessment · Impact assessment
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1104
2 Basic Principles of Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1105
3 Methods Contained in the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1108
4 Specialized R Software Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1112
5 Sensitivity Auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1114
6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1118
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1119
1 Introduction
Several textbooks and specialist works [2, 3, 8, 10–12, 21, 56, 59] have covered most of the classic SA methods and objectives. In parallel, a scientific conference called SAMO (Sensitivity Analysis of Model Output) has been organized every three years since 1995 and extensively covers SA-related subjects. Works presented at the different SAMO conferences can be found in their proceedings and in several special issues of international journals (mainly in Reliability Engineering and System Safety).
The main goal of this chapter is to provide an overview of classic and advanced
SA methods, as none of the referenced works have reported all the concepts and
methods in one single document. Researchers and engineers will find this document
to be an up-to-date report on SA as it currently stands, although this scientific field
remains very active in terms of new developments. The present chapter is only a
snapshot in time and only covers well-established methods.
The next section of this paper presents the basic principles of SA, including elementary graphical methods. In the third section, the SA methods contained in the chapter are described using a classification grid, together with the main mathematical notations of the chapter papers. Then, the SA-specialized packages developed in the R software environment are discussed. To finish this introductory paper, a process for the sensitivity auditing of models in a policy context is discussed, with seven rules that extend the use of SA. As discussed in Saltelli et al. [61], SA, mandated by existing guidelines as a good practice to use in conjunction with mathematical modeling, is by itself insufficient to ensure quality in the treatment of scientific uncertainty for policy purposes. Finally, the concluding section lists some important recent research works that could not be covered in the present chapter.
The first historical approach to SA is known as the local approach. The impact of
small input perturbations on the model output is studied. These small perturbations
occur around nominal values (the mean of a random variable, for instance). This
deterministic approach consists of calculating or estimating the partial derivatives
of the model at a specific point of the input variable space [68]. The use of adjoint-
based methods allows models with a large number of input variables to be processed.
Such approaches are particularly well suited to tackling uncertainty analysis, SA, and data assimilation problems in environmental systems, such as those in climatology, oceanography, hydrogeology, etc. [3, 4, 48].
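When analytic or adjoint derivatives are unavailable, local sensitivities are often approximated by finite differences of the model at the nominal point; a minimal sketch (the test function, nominal point, and step size are illustrative choices):

```python
import numpy as np

def local_sensitivities(G, x0, h=1e-6):
    """Local SA: central-difference estimates of dG/dx_i at the nominal
    point x0 (a cheap stand-in for analytic or adjoint derivatives)."""
    x0 = np.asarray(x0, dtype=float)
    grad = np.empty_like(x0)
    for i in range(len(x0)):
        e = np.zeros_like(x0)
        e[i] = h
        grad[i] = (G(x0 + e) - G(x0 - e)) / (2 * h)   # central difference
    return grad

# Example: G(x) = x1 + 2*x2 + x3^2 at the nominal point (1, 1, 1)
grad = local_sensitivities(lambda x: x[0] + 2 * x[1] + x[2] ** 2, [1.0, 1.0, 1.0])
# grad is approximately [1, 2, 2]
```

Note that this costs 2d model runs for d inputs, which is exactly the scaling that adjoint-based methods avoid for large d.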
To overcome the limitations of local methods (linearity and normality assumptions, local variations), another class of methods has been developed in a statistical framework. In contrast to local SA, which studies how small variations in inputs around a given value change the value of the output, global sensitivity analysis (global in opposition to local) does not single out any initial set of model input values, but considers the numerical model over the entire domain of possible input parameter variations [57]. Thus, global SA is an instrument used to study a mathematical model as a whole, rather than one of its solutions around specific parameter values.
Numerical model users and modelers have shown high interest in these global
tools that take full advantage of the development of computing equipment and
numerical methods (see Helton [20], de Rocquigny et al. [10], and [11] for industrial
and environmental applications). Saltelli et al. [58] emphasized the need to specify
clearly the objectives of a study before performing an SA. These objectives may
include:
The factor prioritization setting, which aims at identifying the most important
factors. The most important factor is the one that, if fixed, would lead to the
greatest reduction in the uncertainty of the output;
The factor fixing setting, which aims at reducing the number of uncertain inputs
by fixing unimportant factors. Unimportant factors are the ones that, if fixed to
any value, would not lead to a significant reduction of the output uncertainty.
This is often a preliminary step before the calibration of model inputs using some
available information (real output observations, constraints, etc.);
The variance cutting setting, which can be a part of a risk assessment study.
Its aim is to reduce the output uncertainty from its initial value to a lower pre-
established threshold value;
The factor mapping setting, which aims at identifying the important inputs in a
specific domain of the output values, for example, which combinations of factors
produce output values above or below a given threshold.
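The factor prioritization setting can be made concrete with a short numerical sketch. The following Python fragment (the three-input linear test function and its coefficients are hypothetical choices, not taken from this chapter) estimates, for each input, the quantity Var(E[Y|X_i]), i.e., the expected reduction of the output variance obtained by fixing that input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test model (not from the chapter): Y = G(X), linear in 3 inputs.
def G(x1, x2, x3):
    return 4.0 * x1 + 2.0 * x2 + 1.0 * x3

n_outer, n_inner = 200, 500  # conditioning values / samples per condition

def var_reduction(i):
    """Estimate Var(E[Y|X_i]), the expected variance drop when X_i is fixed."""
    cond_means = []
    for _ in range(n_outer):
        xi = rng.uniform(0, 1)                    # fix the studied factor...
        others = rng.uniform(0, 1, (n_inner, 2))  # ...and resample the rest
        args = [others[:, 0], others[:, 1]]
        args.insert(i, np.full(n_inner, xi))
        cond_means.append(G(*args).mean())
    return np.var(cond_means)

reductions = [var_reduction(i) for i in range(3)]
# X1 has the largest coefficient, so fixing it reduces output variance most:
print(np.argmax(reductions))  # 0
```

For this linear test function the theoretical values are a_i² Var(X_i), so X1, which carries the largest coefficient, is the factor to fix first.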
Y = G(X) (31.1)
Fig. 31.1 Scatterplots of 200 simulations of a numerical model with three inputs (on the abscissa
of each plot) and one output (on the ordinate). Dotted curves are local-polynomial-based smoothers
many other chapters of this handbook). To begin with, simple scatterplots between
each input variable and the model output allow the detection of linear or
nonlinear input/output relations. Figure 31.1 gives an example of scatterplots on a
simple function with three input variables and one output variable. As two-
dimensional scatterplots do not capture the possible interaction effects between
the inputs, cobweb plots [32] can be used. Also known as parallel coordinate
plots, cobweb plots visualize the simulations as a set of trajectories, joining by a
line the values (or the corresponding quantiles) of each variable combination
of the simulation sample (see Fig. 31.2): Each vertical line represents
one variable, the last vertical line representing the output variable. In Fig. 31.2, the
simulations leading to the smallest values of the model output have been highlighted
in red. This shows immediately that these simulations correspond to
combinations of small and large values of the first and second
inputs, respectively.
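The construction behind a cobweb plot is simple to reproduce. The Python sketch below (the three-input toy model and the 10 % highlighting threshold are illustrative assumptions; the actual line drawing, which any plotting library can do, is only indicated in a comment) rank-transforms each column so that all vertical axes carry comparable quantiles:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample of n = 200 runs with d = 3 inputs and one output.
n = 200
X = rng.uniform(0, 1, (n, 3))
y = 10 * X[:, 0] * (1 - X[:, 1]) + X[:, 2]   # toy model: y is small when
data = np.column_stack([X, y])               # X1 is small and X2 is large

# Rank-transform every column to [0, 1]: the cobweb plot joins quantiles,
# not raw values, so that all vertical axes are comparable.
ranks = data.argsort(axis=0).argsort(axis=0) / (n - 1)

# Highlight the 10% of simulations with the smallest outputs.
low = y <= np.quantile(y, 0.10)

# Each row of `ranks` is one trajectory across the vertical axes X1, X2, X3, Y;
# a plotting library would now draw one polyline per row, e.g.
#   plt.plot(range(4), ranks[i], color="red" if low[i] else "grey")
print(ranks.shape)  # (200, 4)
```

For this toy model, the highlighted (low-output) trajectories concentrate on small quantiles of the first axis and large quantiles of the second, reproducing the reading of Fig. 31.2.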
Moreover, from the same sample (x_1^(i), ..., x_d^(i), y^(i)), i = 1, ..., n, quantitative global
sensitivity measures can be easily estimated, such as the linear (or Pearson) correlation
coefficient and the rank (or Spearman) correlation coefficient [56]. It is also possible
to fit a linear model explaining the behavior of Y given the values of X, provided
that the sample size n is sufficiently large (at least n > d). The main indices are then
Fig. 31.2 Cobweb plot of 200 simulations of a numerical model with three inputs (first three
columns) and one output (last column)
the Standard Regression Coefficients SRC_j = β_j √(Var(X_j)/Var(Y)), where β_j is the linear
regression coefficient associated with X_j. SRC_j² represents a share of the output variance if the
linearity hypothesis is confirmed. Among many simple sensitivity indices, all these
indices belong to the so-called sampling-based sensitivity analysis methods
(see the description of the content of this chapter in the next section and Helton
et al. [23]).
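These sampling-based estimators fit in a few lines of code. The Python sketch below (the linear test model, its coefficients, and the noise level are made-up choices for illustration) computes Pearson, Spearman, and SRC measures from a single Monte Carlo sample:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear model with independent inputs (illustration only).
n, d = 1000, 3
X = rng.normal(size=(n, d))
y = X @ np.array([3.0, 2.0, 1.0]) + rng.normal(scale=0.5, size=n)

def rank(v):
    # Ranks in 0..n-1, used for the Spearman (rank) correlation.
    return v.argsort().argsort().astype(float)

pearson = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
spearman = np.array([np.corrcoef(rank(X[:, j]), rank(y))[0, 1] for j in range(d)])

# SRC_j = beta_hat_j * sqrt(Var(X_j) / Var(Y)), with beta_hat from a fitted
# linear model including an intercept.
A = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(A, y, rcond=None)[0][1:]
src = beta_hat * X.std(axis=0, ddof=1) / y.std(ddof=1)

# For a (nearly) linear model, the squared SRCs sum to the regression R^2.
print(round(float(np.sum(src**2)), 2))
```

Because this test model is linear, the squared SRCs sum approximately to the regression R², close to 1 here; a sum well below 1 would signal that the linearity hypothesis fails and that the SRCs miss part of the output variance.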
Three families of methods are described in this chapter, depending on the
objective of the analysis:
shows how, from an initial sample of input and output values, quantitative
sensitivity indices can be obtained by various methods (correlation, multiple
linear regression, nonparametric regression) and applied in analyzing composite
indicators. In Paper 5 (see Chap. 35, Variance-Based Sensitivity Analysis:
Theory and Estimation Algorithms, written by Clémentine Prieur and
Stefano Tarantola), the definitions of the variance-based importance measures
(the so-called Sobol indices) and the algorithms to calculate them will be
detailed. In Paper 6 (see Chap. 36, Derivative-Based Global Sensitivity
Measures, written by Serge Kucherenko and Bertrand Iooss), the global
SA measures based on derivative samples (the DGSM indices) are explained, while
in Paper 7 (see Chap. 37, Moment-Independent and Reliability-Based
Importance Measures, written by Emanuele Borgonovo and Bertrand
Iooss), the moment-independent and reliability importance measures are
described.
3. Third, in-depth exploration of model behavior with respect to input variation can
be carried out. Paper 8 (see Chap. 38, Metamodel-Based Sensitivity Analysis:
Polynomial Chaos Expansions and Gaussian Processes, written by Loïc Le
Gratiet, Stefano Marelli, and Bruno Sudret) includes recent advances made in the
modeling of computer experiments. A metamodel is used as a surrogate for the
computer model within any SA technique, when the computer model is too
CPU time-consuming to allow a sufficient number of model calculations. Special
attention is paid to two of the most popular metamodels: the polynomial chaos
expansion and the Gaussian process model, for which the Sobol indices can be
efficiently obtained. Finally, Paper 9 (see Chap. 39, Sensitivity Analysis of
Spatial and/or Temporal Phenomena, written by Amandine Marrel, Nathalie
Saint-Geours, and Matthias De Lozzo) extends the SA tools in the context of
temporal and/or spatial phenomena.
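As a minimal numerical illustration of the metamodel idea (the degree-2 test model, the 50-run budget, and the two-input setting are hypothetical choices; real polynomial chaos software is far more general), the Python sketch below fits a surrogate on an orthonormal Legendre basis and then reads the Sobol indices directly off the squared coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical "expensive" model (illustration only), inputs uniform on [-1, 1].
def G(x1, x2):
    return x1 + 0.7 * x1**2 + 0.3 * x2

# A small training design: 50 model runs stand in for a costly simulator budget.
n_train = 50
Xt = rng.uniform(-1, 1, (n_train, 2))
yt = G(Xt[:, 0], Xt[:, 1])

# Degree-2 Legendre basis, orthonormal for the uniform density on [-1, 1].
def p1(x): return np.sqrt(3.0) * x
def p2(x): return np.sqrt(5.0) * (3 * x**2 - 1) / 2

# Polynomial-chaos-like surrogate: least-squares fit of the coefficients.
A = np.column_stack([np.ones(n_train),
                     p1(Xt[:, 0]), p2(Xt[:, 0]),
                     p1(Xt[:, 1]), p2(Xt[:, 1]),
                     p1(Xt[:, 0]) * p1(Xt[:, 1])])
c = np.linalg.lstsq(A, yt, rcond=None)[0]

# With an orthonormal basis, variance and Sobol indices come for free:
var = np.sum(c[1:] ** 2)
S1 = (c[1]**2 + c[2]**2) / var          # first-order index of X1
S2 = (c[3]**2 + c[4]**2) / var          # first-order index of X2
print(round(S1, 2), round(S2, 2))  # 0.93 0.07
```

Once the coefficients are known, no further model run is needed: every variance-based index is an algebraic function of the fitted coefficients, which is precisely why the polynomial chaos route is attractive for CPU-intensive simulators.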
Fig. 31.3 Coarse classification of main global SA methods in terms of required number of model
evaluations and model complexity
Fig. 31.4 Classification of global SA methods in terms of the required number of model
evaluations and model complexity
31 Introduction to Sensitivity Analysis 1111
R is easy to install and run on the main operating systems. It can be used efficiently
on an old computer, on a single workstation, or on one of the most recent
supercomputers;
R is an interpreted, object-oriented programming language, which provides
vector operators and matrix computation;
R contains a large number of built-in statistical functions;
R is extremely well documented, with a built-in help system;
R encourages collaboration, discussion forums, and the creation of new
packages by researchers and students. Thousands of packages are made available
on the CRAN website (http://cran.r-project.org/).
The main drawbacks of R are its virtual memory limits and non-optimized
computation times. To overcome these problems, compiled languages such as Fortran
or C are much more efficient and can be embedded as compiled code inside R
algorithms requiring heavy computation.
All these benefits highlight the interest of developing specific software in R, and
many packages devoted to SA have been developed. For instance, the FME package
contains basic SA and local SA methods (see Chap. 32, Variational Methods), while
the spartan package contains basic methods for exploring stochastic numerical
models.
For global SA, the sensitivity package includes a collection of functions for
factor screening, sensitivity index estimation, and reliability sensitivity analysis of
model outputs. It implements:
A few screening techniques, such as sequential bifurcation and the Morris method
(see Chap. 33, Design of Experiments for Screening). Note that the R
package planor allows fractional factorial designs to be built (see Chap. 33,
Design of Experiments for Screening);
The main sampling-based procedures, such as linear regression coefficients, partial
correlations, and rank transformations (see Chap. 34, Weights and Importance
in Composite Indicators: Mind the Gap). Note that the ipcp function of the R
package iplots provides an interactive cobweb graphical tool (see Fig. 31.2),
5 Sensitivity Auditing
The first rule looks at the instrumental use of mathematical modeling to advance
one's agenda. This use is called rhetorical, or strategic, like the use of Latin by
the elites and clergy before the Reformation. At times the use of models is driven by
a simple pursuit of profit; according to Stiglitz [64], this was the case for the modelers
pricing the derivatives at the root of the sub-prime mortgage crisis:
[...] Part of the agenda of computer models was to maximize the fraction of, say, a lousy
sub-prime mortgage that could get an AAA rating, then an AA rating, and so forth, [...]
This was called rating at the margin, and the solution was still more complexity. (p. 161)
At times this use of models can be called ritual, in the sense that it offers a false
sense of reassurance. An example is Fisher [13] (quoting Szenberg [65]):
Kenneth Arrow, one of the most notable Nobel Laureates in economics, has his own
perspective on forecasting. During World War II, he served as a weather officer in the
U.S. Army Air Corps and worked with a team charged with the particularly difficult
task of producing month-ahead weather forecasts. As Arrow and his team reviewed
these predictions, they confirmed statistically what you and I might just as easily have
guessed: The Corps' weather forecasts were no more useful than random rolls of a die.
Understandably, the forecasters asked to be relieved of this seemingly futile duty. Arrow's
recollection of his superiors' response was priceless: "The commanding general is well
aware that the forecasts are no good. However, he needs them for planning purposes"
(Szenberg [65]).
The second rule, about assumption hunting, is a reminder to look for what was
assumed when the model was originally framed. Models are full of caeteris paribus
assumptions, meaning that, e.g., in economics the model can predict the result of a
shock to a given set of equations assuming that all the rest (all other input variables
and inputs) remains equal; but in real life caeteris are never paribus, meaning
that variables tend to be linked with one another, so that they can hardly change
in isolation.
Furthermore, at times the assumptions made by modelers do not withstand
scrutiny. A good example of assumption hunting is from John Kay [28], where the
author takes issue with modeling used in transport policy. This author discovered
that among the input values assumed (and hence fixed) in the model were average
car occupancy rates, differentiated by time of day, in 2035. The point is that
such assumptions are very difficult to justify. This comment was published in the
Financial Times (where John Kay is a columnist), showing that nowadays
controversies that could be called epistemological escape the confines of academia
and populate the media.
Rule three is about artificially exaggerating or playing down uncertainties
wherever convenient. The tobacco lobbies exaggerated the uncertainties about the
health effects of smoking according to Oreskes and Conway [44], while advocates
of the death penalty played down the uncertainties in the negative relations between
capital punishment and crime rate [34]. Clearly the latter wanted the policy, in this
case the death penalty, and were interested in showing that the supporting evidence
was robust. In the former case the lobbies did not want regulation (e.g., bans on
tobacco smoking in public places) and were hence interested in amplifying the
uncertainty in the smoking-health effect causality relationship.
Rule four is about confessing uncertainties before going public with the
analysis. This rule is also one of the commandments of applied econometrics
according to Kennedy [29]: "Thou shall confess in the presence of sensitivity.
Corollary: Thou shall anticipate criticism." According to this rule a sensitivity
analysis should be performed before the results of a modeling study are published.
There are many good reasons for doing this, one being that a carefully performed
sensitivity analysis often uncovers plain coding mistakes or model inadequacies.
The other is that more often than not the analysis reveals uncertainties that are
larger than those anticipated by the model developers. Econometrician Edward
Leamer, discussing this [34], argues that "One reason these methods are rarely
used is their honesty seems destructive." In Saltelli and d'Hombres [52], the negative
consequences of doing a sensitivity analysis a posteriori are discussed. The case is
the first review of the cost of offsetting climate change done by Nicholas Stern of
the London School of Economics (the so-called Stern Review), which was criticized
by William Nordhaus, of Yale University, on the basis of the large sensitivity of
the estimates to the discount factors employed by Stern. Stern's own sensitivity
analysis, published as an annex to the review, revealed, according to the authors
in Saltelli and d'Hombres [52], that while the discount factors were not the only
important factors determining the cost estimate, the estimates were indeed very
uncertain. For the large uncertainties of integrated assessment models of climate's
impact, see also Saltelli et al. [62].
Rule five is about presenting the results of the modeling study in a transparent
fashion. Both rules originate from the practice of impact assessment, where a
Sensitivity auditing can then be seen as a user's guide for criticizing model-based
studies, and SA is a part of this guide. In the following section, we go back to
sensitivity analysis in order to conclude and give some perspectives.
6 Conclusion
This introductory paper has presented the aims and objectives of SA, the basic
principles of SA and sensitivity auditing, the different methods explained in the
chapter papers (positioning them in a classification grid), the useful R software
packages, and the notations used in the chapter papers. The chapter's Editor, Bertrand
Iooss, would like to sincerely thank all authors and co-authors of the Sensitivity
Analysis chapter for their efforts and the quality of their contributions. Dr. Jean-
Philippe Argaud (EDF R&D), Dr. Géraud Blatman (EDF R&D), Dr. Nicolas
Bousquet (EDF R&D), Dr. Sébastien da Veiga (Safran), Dr. Hervé Monod (INRA),
Dr. Matieyendou Lamboni (INRA), and Dr. Loïc Le Gratiet (EDF R&D) are also
warmly thanked for their advice on the different chapter papers.
The papers in this chapter include the most widely used methods for sensitivity
analysis: the deterministic methods such as local sensitivity analysis, the experimental
design strategies, the sampling-based and variance-based methods developed from
the 1980s, and the new importance measures and metamodel-based techniques
established and studied since the 2000s. However, with such a rich subject, choices
had to be made for the different chapter papers, and some important omissions
are present. For instance, robust Bayesian analysis [1, 24] is not discussed,
although it is a key ingredient in the study of the sensitivity of Bayesian
answers to assumptions and uncertain calculation inputs. Moreover, in the context
of non-probabilistic representations of uncertainty (as in interval analysis,
evidence theory, and possibility theory), only a small number of SA methods have
been developed [22]. This subject is deferred to a future review work.
Due to their very new or incomplete nature, several other SA issues are not
discussed in this chapter. For instance, estimating total Sobol indices at low
cost remains a problem of primary importance in many applications (see Saltelli
et al. [60] for a recent review on the subject). Second-order Sobol index estimation
has recently been considered by Fruth et al. [16] (by way of total interaction
indices) and Tissot and Prieur [67] (by way of replicated orthogonal-array LHS).
The latter work offers a powerful estimation method because the number of model
calls is independent of the number of inputs (in the spirit of permutation-
based techniques [37, 38]). However, for high-dimensional models (several hundred
inputs), estimation biases and computational costs remain considerable; De Castro
and Janon [9] have proposed to introduce modern statistical techniques based on
variable selection in regression models. Owen [46] has introduced generalized Sobol
indices that allow efficient estimators to be compared and sought (such as the new
one found in Owen [45]).
Another mathematical difficulty is the consideration of the dependence between
the inputs X_i (i = 1, ..., d). Nonindependence between inputs in SA has been
discussed by many authors such as Saltelli and Tarantola [55], Jacques et al. [27],
Xu and Gertner [72], Da Veiga et al. [7], Li et al. [36], Kucherenko et al. [31],
Mara and Tarantola [39], and Chastaing et al. [5]. Despite all these works, much
confusion still exists in practical applications. In practice, it would be useful to be
able to measure the influence of the potential dependence between some inputs on
the output quantity of interest.
Note that most of the works in this chapter focus on SA relative to the overall
variability of the model output (second-order statistics). In practice, one can be
interested in other quantities of interest, such as the output entropy, the probability
that the output exceeds a threshold, or a quantile [15, 35, 56]. This is an active
area of research as shown in this chapter (see Chap. 37, Moment-Independent
and Reliability-Based Importance Measures), but innovative and powerful ideas
have recently been developed by Owen et al. [47] and Geraci et al. [18] using higher-
order statistics, Fort et al. [14] using contrast functions, and Da Veiga [6] using a
kernel point of view.
Finally, in some situations, the computer code is not a deterministic simulator but
a stochastic one. This means that two model calls with the same set of input variables
lead to different output values. Typical stochastic computer codes are queuing
models, agent-based models, and models involving partial differential equations
applied to heterogeneous or Monte Carlo-based numerical models. For this type
of code, Marrel et al. [40] have proposed a first solution for dealing with Sobol
indices. Moutoussamy et al. [43] have also tackled the issue in the context of
metamodel building for the output probability density function. Developing relevant
SA methods in this context will certainly be the subject of future work.
References
1. Berger, J.: An overview of robust Bayesian analysis (with discussion). Test 3, 5–124 (1994)
2. Borgonovo, E., Plischke, E.: Sensitivity analysis: a review of recent advances. Eur. J. Oper.
Res. 248, 869–887 (2016)
3. Cacuci, D.: Sensitivity and Uncertainty Analysis Theory. Chapman & Hall/CRC, Boca Raton
(2003)
4. Castaings, W., Dartus, D., Le Dimet, F.X., Saulnier, G.M.: Sensitivity analysis and parameter
estimation for distributed hydrological modeling: potential of variational methods. Hydrol.
Earth Syst. Sci. Discuss. 13, 503–517 (2009)
5. Chastaing, G., Gamboa, F., Prieur, C.: Generalized Hoeffding-Sobol decomposition for
dependent variables: application to sensitivity analysis. Electron. J. Stat. 6, 2420–2448 (2012)
6. Da Veiga, S.: Global sensitivity analysis with dependence measures. J. Stat. Comput. Simul.
85, 1283–1305 (2015)
7. Da Veiga, S., Wahl, F., Gamboa, F.: Local polynomial estimation for sensitivity analysis on
models with correlated inputs. Technometrics 51(4), 452–463 (2009)
8. Dean, A., Lewis, S. (eds.): Screening Methods for Experimentation in Industry, Drug
Discovery and Genetics. Springer, New York (2006)
9. De Castro, Y., Janon, A.: Randomized pick-freeze for sparse Sobol indices estimation in high
dimension. ESAIM Probab. Stat. 19, 725–745 (2015)
10. de Rocquigny, E., Devictor, N., Tarantola, S. (eds.): Uncertainty in Industrial Practice. Wiley,
Chichester/Hoboken (2008)
11. Faivre, R., Iooss, B., Mahévas, S., Makowski, D., Monod, H. (eds.): Analyse de sensibilité et
exploration de modèles. Éditions Quæ (2013)
12. Fang, K.T., Li, R., Sudjianto, A.: Design and Modeling for Computer Experiments. Chapman
& Hall/CRC, Boca Raton (2006)
13. Fisher, R.W.: Remembering Carol Reed, Aesop's Fable, Kenneth Arrow and Thomas Dewey.
In: Speech: An Economic Overview: What's Next, Federal Reserve Bank of Dallas. http://
www.dallasfed.org/news/speeches/fisher/2011/fs110713.cfm (2011)
14. Fort, J., Klein, T., Rachdi, N.: New sensitivity analysis subordinated to a contrast. Com-
mun. Stat. Theory Methods (2014, in press). http://www.tandfonline.com/doi/full/10.1080/
03610926.2014.901369#abstract
15. Frey, H., Patil, S.: Identification and review of sensitivity analysis methods. Risk Anal. 22,
553–578 (2002)
16. Fruth, J., Roustant, O., Kuhnt, S.: Total interaction index: a variance-based sensitivity index
for second-order interaction screening. J. Stat. Plan. Inference 147, 212–223 (2014)
17. Funtowicz, S., Ravetz, J.: Uncertainty and Quality in Science for Policy. Kluwer Academic,
Dordrecht (1990)
18. Geraci, G., Congedo, P., Iaccarino, G.: Decomposing high-order statistics for sensitivity
analysis. In: Thermal & Fluid Sciences Industrial Affiliates and Sponsors Conference, Stanford
University, Stanford (2015)
19. Grundmann, R.: The role of expertise in governance processes. For. Policy Econ. 11, 398–403
(2009)
20. Helton, J.: Uncertainty and sensitivity analysis techniques for use in performance assessment
for radioactive waste disposal. Reliab. Eng. Syst. Saf. 42, 327–367 (1993)
21. Helton, J.: Uncertainty and sensitivity analysis for models of complex systems. In: Graziani, F.
(ed.) Computational Methods in Transport: Verification and Validation, pp. 207–228. Springer,
New York (2008)
22. Helton, J., Johnson, J., Oberkampf, W., Sallaberry, C.: Sensitivity analysis in conjunction with
evidence theory representations of epistemic uncertainty. Reliab. Eng. Syst. Saf. 91, 1414–1434
(2006a)
23. Helton, J., Johnson, J., Sallaberry, C., Storlie, C.: Survey of sampling-based methods for
uncertainty and sensitivity analysis. Reliab. Eng. Syst. Saf. 91, 1175–1209 (2006b)
24. Insua, D., Ruggeri, F. (eds.): Robust Bayesian Analysis. Springer, New York (2000)
25. Ioannidis, J.P.A.: Why most published research findings are false. PLoS Med. 2(8), 696–701
(2005)
26. Iooss, B., Lemaître, P.: A review on global sensitivity analysis methods. In: Meloni, C.,
Dellino, G. (eds.) Uncertainty Management in Simulation-Optimization of Complex Systems:
Algorithms and Applications. Springer, New York (2015)
27. Jacques, J., Lavergne, C., Devictor, N.: Sensitivity analysis in presence of model uncertainty
and correlated inputs. Reliab. Eng. Syst. Saf. 91, 1126–1134 (2006)
28. Kay, J.: A wise man knows one thing – the limits of his knowledge. Financial Times, 29 Nov
2011
29. Kennedy, P.: A Guide to Econometrics, 5th edn. Blackwell Publishing, Oxford (2007)
30. Kleijnen, J.: Sensitivity analysis and related analyses: a review of some statistical techniques.
J. Stat. Comput. Simul. 57, 111–142 (1997)
31. Kucherenko, S., Tarantola, S., Annoni, P.: Estimation of global sensitivity indices for models
with dependent variables. Comput. Phys. Commun. 183, 937–946 (2012)
32. Kurowicka, D., Cooke, R.: Uncertainty Analysis with High Dimensional Dependence Mod-
elling. Wiley, Chichester/Hoboken (2006)
33. Latour, B.: We Have Never Been Modern. Harvard University Press, Cambridge (1993)
34. Leamer, E.E.: Tantalus on the road to asymptopia. J. Econ. Perspect. 4(2), 31–46 (2010)
35. Lemaître, P., Sergienko, E., Arnaud, A., Bousquet, N., Gamboa, F., Iooss, B.: Density
modification based reliability sensitivity analysis. J. Stat. Comput. Simul. 85, 1200–1223
(2015)
36. Li, G., Rabitz, H., Yelvington, P., Oluwole, O., Bacon, F., Kolb, C., Schoendorf, J.: Global
sensitivity analysis for systems with independent and/or correlated inputs. J. Phys. Chem. 114,
6022–6032 (2010)
37. Mara, T.: Extension of the RBD-FAST method to the computation of global sensitivity indices.
Reliab. Eng. Syst. Saf. 94, 1274–1281 (2009)
38. Mara, T., Joseph, O.: Comparison of some efficient methods to evaluate the main effect of
computer model factors. J. Stat. Comput. Simul. 78, 167–178 (2008)
39. Mara, T., Tarantola, S.: Variance-based sensitivity indices for models with dependent inputs.
Reliab. Eng. Syst. Saf. 107, 115–121 (2012)
40. Marrel, A., Iooss, B., Da Veiga, S., Ribatet, M.: Global sensitivity analysis of stochastic
computer models with joint metamodels. Stat. Comput. 22, 833–847 (2012)
41. Marris, C., Wynne, B., Simmons, P., Weldon, S.: Final report of the PABE research project
funded by the Commission of European Communities. Technical report contract number: FAIR
CT98-3844 (DG12 SSMI), Commission of European Communities (2001)
42. Monbiot, G.: Beware the rise of the government scientists turned lobbyists. The Guardian,
29 Apr 2013
43. Moutoussamy, V., Nanty, S., Pauwels, B.: Emulators for stochastic simulation codes. ESAIM:
Proc. Surv. 48, 116–155 (2015)
44. Oreskes, N., Conway, E.M.: Merchants of Doubt: How a Handful of Scientists Obscured
the Truth on Issues from Tobacco Smoke to Global Warming. Bloomsbury Press, New York
(2010)
45. Owen, A.: Better estimation of small Sobol sensitivity indices. ACM Trans. Model. Comput.
Simul. 23, 11 (2013a)
46. Owen, A.: Variance components and generalized Sobol indices. J. Uncert. Quantif. 1, 19–41
(2013b)
47. Owen, A., Dick, J., Chen, S.: Higher order Sobol indices. Inf. Inference: J. IMA 3, 59–81
(2014)
48. Park, K., Xu, L.: Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications.
Springer, Dordrecht (2008)
49. Pujol, G., Iooss, B., Janon, A.: Sensitivity Package, Version 1.11. The Comprehensive R
Archive Network. http://www.cran.r-project.org/web/packages/sensitivity/ (2015)
50. Rakovec, O., Hill, M.C., Clark, M.P., Weerts, A.H., Teuling, A.J., Uijlenhoet, R.: Distributed
evaluation of local sensitivity analysis (DELSA), with application to hydrologic models. Water
Resour. Res. 50, 1–18 (2014)
51. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput.
Phys. Commun. 145, 280–297 (2002)
52. Saltelli, A., d'Hombres, B.: Sensitivity analysis didn't help. A practitioner's critique of the Stern
review. Glob. Environ. Change 20(2), 298–302 (2010)
53. Saltelli, A., Funtowicz, S.: When all models are wrong: more stringent quality criteria are
needed for models used at the science-policy interface. Issues Sci. Technol. XXX(2), 79–85
(2014, Winter)
54. Saltelli, A., Funtowicz, S.: Evidence-based policy at the end of the Cartesian dream: the case of
mathematical modelling. In: Pereira, G., Funtowicz, S. (eds.) The End of the Cartesian Dream.
Beyond the Techno-Scientific Worldview. Routledge Series: Explorations in Sustainability
and Governance, pp. 147–162. Routledge, London (2015)
55. Saltelli, A., Tarantola, S.: On the relative importance of input factors in mathematical models:
safety assessment for nuclear waste disposal. J. Am. Stat. Assoc. 97, 702–709 (2002)
56. Saltelli, A., Chan, K., Scott, E. (eds.): Sensitivity Analysis. Wiley Series in Probability and
Statistics. Wiley, Chichester/New York (2000a)
57. Saltelli, A., Tarantola, S., Campolongo, F.: Sensitivity analysis as an ingredient of modelling.
Stat. Sci. 15, 377–395 (2000b)
58. Saltelli, A., Tarantola, S., Campolongo, F., Ratto, M.: Sensitivity Analysis in Practice: A Guide
to Assessing Scientific Models. Wiley, Chichester/Hoboken (2004)
59. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M.,
Tarantola, S.: Global Sensitivity Analysis: The Primer. Wiley, Chichester (2008)
60. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S.: Variance based
sensitivity analysis of model output. Design and estimator for the total sensitivity index.
Comput. Phys. Commun. 181, 259–270 (2010)
61. Saltelli, A., Pereira, G., Van der Sluijs, J.P., Funtowicz, S.: What do I make of your latinorum?
Sensitivity auditing of mathematical modelling. Int. J. Foresight Innov. Policy 9(2/3/4),
213–234 (2013)
62. Saltelli, A., Stark, P., Becker, W., Stano, P.: Climate models as economic guides. Scientific
challenge or quixotic quest? Issues Sci. Technol. XXXI(3), 79–84 (2015)
63. Savage, S.L.: The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty.
Wiley, Hoboken (2009)
64. Stiglitz, J.: Freefall: Free Markets and the Sinking of the Global Economy. Penguin, London
(2010)
65. Szenberg, M.: Eminent Economists: Their Life Philosophies. Cambridge University Press,
Cambridge (1992)
66. The Economist: How science goes wrong. The Economist, 19 Oct 2013
67. Tissot, J.Y., Prieur, C.: A randomized orthogonal array-based procedure for the estimation of
first- and second-order Sobol indices. J. Stat. Comput. Simul. 85, 1358–1381 (2015)
68. Turanyi, T.: Sensitivity analysis for complex kinetic systems, tools and applications. J. Math.
Chem. 5, 203–248 (1990)
69. Van der Sluijs, J.P., Craye, M., Funtowicz, S., Kloprogge, P., Ravetz, J., Risbey, J.: Combining
quantitative and qualitative measures of uncertainty in model based environmental assessment:
the NUSAP system. Risk Anal. 25(2), 481–492 (2005)
70. Wang, J., Faivre, R., Richard, H., Monod, H.: mtk: a general-purpose and extensible R
environment for uncertainty and sensitivity analyses of numerical experiments. R J. 7/2,
206–226 (2016)
71. Winner, L.: The Whale and the Reactor: A Search for Limits in an Age of High Technology.
The University of Chicago Press, Chicago (1989)
72. Xu, C., Gertner, G.: Extending a global sensitivity analysis technique to models with correlated
parameters. Comput. Stat. Data Anal. 51, 5579–5590 (2007)
Variational Methods
32
Maëlle Nodet and Arthur Vidard
Abstract
This contribution presents derivative-based methods for local sensitivity analysis,
called Variational Sensitivity Analysis (VSA). If one defines an output called the
response function, its sensitivity to input variations around a nominal value can
be studied using derivative (gradient) information. The main issue of VSA is then
to provide an efficient way of computing gradients.
This contribution first presents the theoretical grounds of VSA: framework
and problem statement and tangent and adjoint methods. Then it covers practical
means to compute derivatives, from naive to more sophisticated approaches,
discussing their various merits. Finally, applications of VSA are reviewed, and
some examples are presented, covering various application fields: oceanography,
glaciology, and meteorology.
Keywords
Variational sensitivity analysis · Variational methods · Tangent model ·
Adjoint model · Gradient · Automatic differentiation · Derivative ·
Local sensitivity analysis · Stability analysis · Geophysical applications ·
Meteorology · Glaciology · Oceanography
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1124
2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1125
2.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1125
2.2 Tangent and Adjoint Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126
1 Introduction
2 Methods
In this section, the sensitivity of the output vector y with respect to the input vector
x is considered. This output is often called the response function, since it represents
the response of the system to variations of the input. In the variational framework,
practitioners are generally interested in studying sensitivities of given numerical
models M coming from various application domains (physics, geosciences, biology,
etc.), so that the output is a function of a state vector u(x; t) ∈ R^p, which
depends on the input x ∈ R^d and on time t and represents the state of a given
system of interest:
    du/dt = M(u; x),   t ∈ [0, T]
    u(t = 0) = u_0(x)                                (32.1)
    y = G(u(x))                                      (32.2)
where G is the response function, mapping the state space into the output space.
Remark 1. In the definition of the response function G lies all the art of sensitivity
analysis. It should be designed carefully depending on the aim of the study
(model validation, physical phenomenon study, etc.). This is discussed in depth
in the contributions Chap. 39, Sensitivity Analysis of Spatial and/or Temporal
Phenomena and Variables Weights and Importance in Arithmetic Averages: Two
Stories to Tell.
Remark 2. A more general case would be y = G(u(x), x). The theory easily extends
to that case, but for readability, only this simpler case will be presented here.
In order to simplify the notations and clarify the reading, scalar outputs will be
considered in this chapter, i.e., y ∈ R, but vector-valued outputs can of course be
considered as well.
1126 M. Nodet and A. Vidard
The quantity of interest for sensitivity analysis is then the gradient

$$\frac{dG}{dx}(x).$$
The partial derivatives of G are particular cases of the directional derivative, also called the Gâteaux derivative, which is defined by G′(x)h such that

$$G'(x)h = \lim_{\alpha \to 0} \frac{G(x + \alpha h) - G(x)}{\alpha}.$$

In particular, taking h = e_i (the i-th canonical basis vector) gives the partial derivatives

$$G'(x)e_i = \frac{\partial G}{\partial x_i}(x),$$

and the directional derivative relates to the gradient through

$$G'(x)h = \nabla_x G(x) \cdot h.$$
The most naive approach to track sensitive variables consists in fixing all parameters
except one, increasing it by a given percentage (of its standard deviation or its
absolute value) and then evaluating its impact on the output. This type of analysis
can allow for a quick ranking of the variables if there are not too many of them. The
reader can refer to [11] for a brief review on this subject.
A more refined approach would be to obtain a numerical approximation of the
gradient using finite differences, i.e., computing the gradient as a limit of a growth
rate (see [11] and next paragraph about practical aspects). This method is very
simple but has two main drawbacks. First, its computational cost increases rapidly
32 Variational Methods 1127
with the dimension d of x. Second, the choice of the step Δx_i is critical: if too large, the truncation error becomes large; if too small, rounding error occurs.
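This trade-off can be observed numerically. The sketch below uses a toy response function and illustrative step sizes (neither comes from the chapter): the one-sided finite-difference error first shrinks with the step, then grows again once rounding error dominates.

```python
import numpy as np

def G(x):
    # Toy scalar response function (illustrative).
    return np.sin(x[0]) * np.exp(x[1])

def fd_partial(G, x, i, dx):
    """One-sided finite-difference approximation of dG/dx_i."""
    e = np.zeros_like(x)
    e[i] = dx
    return (G(x + e) - G(x)) / dx

x = np.array([0.7, -0.3])
exact = np.cos(x[0]) * np.exp(x[1])  # true dG/dx_0

# Errors first decrease with dx (truncation error ~ dx), then increase
# again when dx is too small (rounding error ~ machine eps / dx).
errors = {dx: abs(fd_partial(G, x, 0, dx) - exact)
          for dx in (1e-1, 1e-4, 1e-8, 1e-12)}
```

With these values, the error at dx = 1e-12 is orders of magnitude larger than at dx = 1e-8, illustrating the rounding regime.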
To address this last point, one can obtain an exact calculation of the Gâteaux derivatives (FSAP, Forward Sensitivity Analysis Procedure in the terminology of [3]).
Assuming the output is given by Eqs. (32.1) and (32.2), differentiating the model in a direction h yields the Tangent Linear Model (TLM)

$$\begin{cases} \dfrac{dv}{dt} = \dfrac{\partial \mathcal{M}}{\partial u}(u;\, x)\, v + \dfrac{\partial \mathcal{M}}{\partial x}(u;\, x)\, h, & t \in [0, T] \\[6pt] v(t = 0) = \nabla_x u_0 \cdot h \end{cases} \qquad (32.3)$$

where ∂M/∂u and ∂M/∂x are the Jacobian matrices of the model with respect to the state u and the parameters x. The Tangent Linear Model allows one to compute exactly the
directional derivative of the response function G, for a given direction h. To get
the entire gradient, all the partial derivatives need to be computed; therefore, d
integrations of the TLM are required. The accuracy problem may be solved, but
for a large-dimensional set of parameters, the computing cost of the FSAP method
remains prohibitive.
In large-dimensional cases, however, an adjoint method can be used to compute the gradient (ASAP, Adjoint Sensitivity Analysis Procedure in the terminology of [3]).
As the derivation of the adjoint model is tedious in the general abstract case, this
contribution will focus on common examples.
One will first consider the case where the response function is given by

$$G(x) = \mathcal{G}(u(x)) = \int_{t=0}^{t=T} H(u(x;\, t))\, dt \qquad (32.4)$$
As Eq. (32.5) needs to be rewritten, the following backward equation for p is set and called the adjoint model:

$$\begin{cases} -\dfrac{dp}{dt} = \left(\dfrac{\partial \mathcal{M}}{\partial u}(u(x);\, x)\right)^T p + \nabla_u H(u(x;\, t)), & t \in [0, T] \\[6pt] p(t = T) = 0 \end{cases} \qquad (32.7)$$
Combining (32.5), (32.6), and (32.7), the following formula for the Gâteaux derivative is obtained:

$$G'(x)h = \nabla_x G(x) \cdot h = \left( (\nabla_x u_0)^T p(0) + \int_0^T \left(\frac{\partial \mathcal{M}}{\partial x}(u(x);\, x)\right)^T p \, dt \right) \cdot h$$
which does not involve the variable v anymore. Thus, the computing cost is independent of the number of parameters, and ∇_x G(x) can be computed exactly using one
integration of the direct model and one backward integration of the adjoint model.
Note that if the model is linear, only the adjoint integration is needed.
Similarly, the adjoint model can be computed for an output evaluated at a single time t1,

$$G(x) = \mathcal{G}(u(x)) = \int_{t=0}^{t=T} H(u(x;\, t))\, \delta_{t=t_1}(t)\, dt \qquad (32.9)$$

where δ_{t=t1}(t) is the Dirac function of t at point t1. The adjoint model is then

$$\begin{cases} -\dfrac{dp}{dt} = \left(\dfrac{\partial \mathcal{M}}{\partial u}(u(x);\, x)\right)^T p + \nabla_u H(u(x;\, t))\, \delta_{t=t_1}(t), & t \in [0, T] \\[6pt] p(t = T) = 0 \end{cases} \qquad (32.10)$$
Notice here that the adjoint variable is equal to zero between times t1 and T. Computationally speaking, it is therefore sufficient to run the adjoint model (backward) from t1 to 0.
Then the gradient is unchanged:

$$\nabla_x G(x) = (\nabla_x u_0)^T\, p(t=0) + \int_{t=0}^{t=T} \left(\frac{\partial \mathcal{M}}{\partial x}(u(x);\, x)\right)^T p(t)\, dt \qquad (32.11)$$
Note here that this adjoint method allows one to obtain directly every component of the gradient vector with a single run of the adjoint model, thus avoiding the curse of dimensionality with respect to the parameter set dimension.
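A minimal sketch of the full procedure for a toy scalar model (the model, time step, and response function below are illustrative assumptions, not from the chapter): one forward integration plus one backward adjoint integration returns every gradient component, which can be checked against finite differences.

```python
import numpy as np

dt, N = 0.01, 200

def forward(x):
    """Forward Euler for du/dt = -x0*u + x1, u(0) = x2;
    response G = dt * sum(0.5 * u_n**2), a discrete analogue of (32.4)."""
    u = np.empty(N + 1)
    u[0] = x[2]
    for n in range(N):
        u[n + 1] = (1.0 - dt * x[0]) * u[n] + dt * x[1]
    return u, dt * np.sum(0.5 * u**2)

def adjoint_gradient(x):
    """One forward run + one backward (adjoint) run gives the full
    gradient, whatever the number of parameters."""
    u, _ = forward(x)
    g = np.zeros(3)
    ubar = dt * u[N]                   # adjoint 'final condition'
    for n in range(N - 1, -1, -1):     # backward integration
        g[0] += -dt * u[n] * ubar      # sensitivity through the model term
        g[1] += dt * ubar
        ubar = dt * u[n] + (1.0 - dt * x[0]) * ubar
    g[2] = ubar                        # sensitivity to the initial condition
    return g

x = np.array([0.5, 1.0, 2.0])
g_adj = adjoint_gradient(x)

# Check each component against centered finite differences.
g_fd = np.empty(3)
for i in range(3):
    e = np.zeros(3); e[i] = 1e-6
    g_fd[i] = (forward(x + e)[1] - forward(x - e)[1]) / 2e-6
```

Note that the finite-difference check costs 2d model runs, while the adjoint route costs two runs regardless of d.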
Practical aspects are an important incentive for the choice of either of the gradient computation methods presented above. For a small-dimension problem, the finite difference approximation route is pretty straightforward, and unless high precision is required, it is probably the better choice. For larger dimension problems, however, either choice will require some effort in terms of computing cost and/or code development. This section presents practical aspects of such developments, as well as a further approximation of the finite difference method tailored to high-dimension problems.
Two routes can be followed to obtain an adjoint code:
- Discretize the continuous direct model, and then write the adjoint of the discrete direct model. This is generally called the discrete adjoint approach; see, e.g., [17].
- Write the continuous adjoint from the continuous direct model (as explained before), and then discretize the continuous adjoint equations. This is called the continuous adjoint approach.
The two approaches are not equivalent. As an example, a simple ordinary differential equation can be considered:

$$c'(t) = F(t)\, c(t)$$

where F is a time-dependent linear operator acting on the vector c(t). If this model is discretized using a forward Euler scheme, a typical time step writes as

$$c_{n+1} = (I + \Delta t\, F_n)\, c_n$$
where I is the identity matrix and n the time index. Then the adjoint of this discrete equation would give the following typical time step (discrete adjoint):

$$c^*_n = (I + \Delta t\, F_n)^T\, c^*_{n+1}$$

where * denotes the adjoint variable. On the other hand, the continuous adjoint equation is

$$-\,(c^*)'(t) = F(t)^T\, c^*(t).$$

If the same forward Euler explicit scheme is chosen, one obtains for the continuous adjoint

$$c^*_n = (I + \Delta t\, F_{n+1})^T\, c^*_{n+1}$$

and the time dependency of F then implies that both approaches are not identical. This illustrates the fact that discretization and transposition (the word "adjointization" does not exist; the process of obtaining an adjoint code is called transposition or derivation) do not, in general, commute.
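The non-commutation can be illustrated with a scalar time-dependent coefficient F (illustrative values; with a scalar state, transposition is trivial, so only the time indexing of F distinguishes the two backward recursions):

```python
import math

dt, N = 0.1, 10
F = [math.sin(0.3 * n) for n in range(N + 1)]   # time-dependent coefficient

# Discrete adjoint: exact transpose of the discrete forward step
# c_{n+1} = (1 + dt*F_n) c_n   =>   cstar_n = (1 + dt*F_n) cstar_{n+1}
cstar_d = 1.0
for n in range(N - 1, -1, -1):
    cstar_d = (1.0 + dt * F[n]) * cstar_d

# Continuous adjoint -(c*)' = F^T c*, discretized afterwards with the
# same explicit scheme:   cstar_n = (1 + dt*F_{n+1}) cstar_{n+1}
cstar_c = 1.0
for n in range(N - 1, -1, -1):
    cstar_c = (1.0 + dt * F[n + 1]) * cstar_c

# Nonzero gap: transposition and discretization do not commute,
# though the two results agree up to discretization error.
gap = abs(cstar_d - cstar_c)
```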
The choice of the discrete adjoint approach should be immediate for two main
reasons:
1. The response function G.x/ is computed through the discrete direct model; its
gradient is therefore given by the discrete adjoint. The continuous adjoint gives
only an approximation of this gradient.
2. The discrete adjoint approach allows for the use of automatic adjoint compilers (software that takes as input the direct model code and produces as output the tangent and adjoint codes, e.g., Tapenade [12]).
However, it must be noted here that, for large complex systems, obtaining the
discrete adjoint can be a time- and expertise-demanding task. Therefore, if one
has limited time and experience in adjoint coding and if one can be satisfied
with just an approximation of the gradient, the continuous adjoint approach can
be considered. Moreover, for complex nonlinear or non-differentiable equations,
the discrete adjoint has been shown in [29] to present problems in computing the sensitivities, so that other approaches may be considered. Similarly, [18] presents an application with an adaptive mesh, where it is preferable to go through the continuous
adjoint route. In any case, the validation of the gradient (see below) should give a
good idea about the quality of the chosen gradient.
To write the tangent linear code (and then the adjoint code, by transposition), it is necessary to apply the chain
rule to this function composition. Because of nonlinearities, dependencies, and
inputs/outputs, applying the chain rule to a large code is very complex. There exist
recipes for adjoint code construction, explaining in details how to get the adjoint for
various instruction types (assignment, loop, conditional statements, inputs/outputs,
and so on); see, e.g., [9, 10]. Code differentiation can be done by hand following
these recipes.
An alternative to adjoint handwriting is to use specially designed software that automates the writing of the adjoint code, called automatic differentiation tools, such as Tapenade [12]. These tools are powerful, always improving,
and can now derive large computer codes in various programming languages. It
should be mentioned, however, that these tools cannot be used completely in black
box mode and may require some preparatory work on the direct code. Despite
these difficulties, some large-scale usages of automatic differentiation have been
performed, e.g., for the ocean model of the MIT (MITgcm, see [20]) or a Greenland
ice model (see below the Applications section and [13]).
In practice, A is a large matrix and may be difficult to invert; therefore, the authors propose to replace A by its diagonal. In the end, an approximate gradient is given by the formula:

$$\widehat{\nabla_x G}(x) = E\!\left(\delta G\, h^T\right)\,\left(\mathrm{Diag}(A)\right)^{-1} \qquad (32.16)$$
where the expectation is computed using a Monte Carlo method, requiring a certain number of model runs. This formula is simple enough but should be handled with care; indeed, it involves three successive approximations: finite differences instead of the true gradient, a Monte Carlo approximation (usually carried out with a small sample size), and A replaced by Diag(A). However, this kind of approach has been successfully used in data assimilation to obtain gradient-like information without resorting to adjoint code construction; see, e.g., [7, 19].
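A sketch of such an ensemble-based, adjoint-free gradient estimate in the spirit of Eq. (32.16), assuming Gaussian perturbations h with covariance A = σ²I (the response function, sample size, and σ are illustrative assumptions):

```python
import random

def G(x):
    # Toy quadratic response function (illustrative).
    return 0.5 * (x[0]**2 + 4.0 * x[1]**2) + x[0] * x[1]

def ensemble_gradient(G, x, n_samples=5000, sigma=1e-3, seed=0):
    """Monte Carlo estimate E(dG h^T) (Diag A)^{-1}, with
    A = E(h h^T) = sigma^2 I for the Gaussian perturbations used here."""
    rng = random.Random(seed)
    d = len(x)
    acc = [0.0] * d
    g0 = G(x)
    for _ in range(n_samples):
        h = [rng.gauss(0.0, sigma) for _ in range(d)]
        dG = G([xi + hi for xi, hi in zip(x, h)]) - g0   # finite difference
        for i in range(d):
            acc[i] += dG * h[i]
    # divide by n (expectation) and by Diag(A) = sigma^2
    return [a / n_samples / sigma**2 for a in acc]

x = [1.0, -0.5]
g_est = ensemble_gradient(G, x)
g_true = [x[0] + x[1], 4.0 * x[1] + x[0]]   # exact gradient for comparison
```

The estimate carries Monte Carlo noise on top of the finite-difference and diagonal approximations, so only gradient-like accuracy should be expected.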
First-Order Test
The first-order test is very simple; the idea is just to check the following approximation at first order as α → 0:

$$G(x + \alpha h) = G(x) + \alpha\, \nabla_x G(x) \cdot h + o(\alpha).$$

The principle of the test is then to compute, for various perturbation directions h and various values of α (with α → 0, e.g., α = 10^{-n}, n = 1, ..., 8), the following quantities: the finite-difference growth rate τ(α, h) = (G(x + αh) − G(x))/α, the gradient-based derivative φ(h) = ∇_x G(x)·h, and the relative error

$$\varepsilon(\alpha, h) = \frac{|\tau(\alpha, h) - \varphi(h)|}{|\varphi(h)|},$$

which should tend to zero as α → 0.
Second-Order Test
For this test, the Taylor expansion at second order is written:

$$G(x + \alpha h) = G(x) + \alpha\,\langle \nabla G, h\rangle + \frac{\alpha^2}{2}\,\langle h, \nabla^2 G\, h\rangle + o(\alpha^2).$$

One then computes

$$r(\alpha, h) = \frac{\tau(\alpha, h) - \varphi(h)}{\alpha}$$

for various directions h and various values of α and checks that it tends to a constant (depending on h) when α → 0.
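Both tests can be sketched as follows, writing tau for the finite-difference growth rate (G(x + αh) − G(x))/α and phi for the gradient-based directional derivative ∇_x G(x)·h (the test function and step sizes are illustrative):

```python
import math

def G(x):
    return math.sin(x[0]) * x[1]**2

def gradG(x):
    # Hand-coded gradient to be validated.
    return [math.cos(x[0]) * x[1]**2, 2.0 * math.sin(x[0]) * x[1]]

x, h = [0.4, 1.3], [0.7, -0.2]
phi = sum(g * hi for g, hi in zip(gradG(x), h))   # gradient-based derivative

def tau(alpha):
    """Finite-difference growth rate in direction h."""
    xp = [xi + alpha * hi for xi, hi in zip(x, h)]
    return (G(xp) - G(x)) / alpha

# First-order test: eps(alpha) = |tau - phi| / |phi| should tend to 0.
eps = {a: abs(tau(a) - phi) / abs(phi) for a in (1e-1, 1e-3, 1e-5)}

# Second-order test: r(alpha) = (tau - phi)/alpha should tend to a
# constant, namely (1/2) h^T (Hessian of G) h.
r = {a: (tau(a) - phi) / a for a in (1e-2, 1e-3, 1e-4)}
```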
Stability analysis is the study of how perturbations of the system will grow. Such tools, and in particular the so-called singular vectors, can be used for sensitivity analysis. Indeed, looking at extremal perturbations gives insight into the sensitivities of the system (see [22, 23, 27] for application examples).
The growth rate of a given perturbation h0 of, say, the initial condition u0 of the model is classically defined by

$$\gamma(h_0) = \frac{\| \mathcal{M}(u_0 + h_0, T) - \mathcal{M}(u_0, T) \|}{\| h_0 \|} \qquad (32.17)$$
By restricting the study to the linear part of the perturbation behavior, the growth rate becomes (denoting L = ∂M/∂u for clarity)

$$\gamma(h_0) \simeq \frac{\| L\, h_0 \|}{\| h_0 \|}.$$

This ratio is maximized by the leading eigenvectors f_i^+ of L^T L, with eigenvalues λ_i, called the forward singular vectors. One can notice then that L f_i^+ is an eigenvector of L L^T, which allows one to define the backward singular vectors, noted f_i^-, as

$$L\, f_i^+ = \sqrt{\lambda_i}\; f_i^-$$
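For a toy tangent-linear propagator L (a random matrix here, purely illustrative), the singular vectors and the associated growth rates can be computed directly with an SVD; a sketch:

```python
import numpy as np

# Toy tangent-linear propagator L of a 3-variable linear model over [0, T]
rng = np.random.default_rng(1)
L = rng.standard_normal((3, 3))

# SVD: L = U S V^T. Forward singular vectors f+ are the columns of V
# (eigenvectors of L^T L); backward singular vectors f- the columns of U;
# L f+ = sqrt(lambda) f- with sqrt(lambda) the singular values.
U, s, Vt = np.linalg.svd(L)
f_plus = Vt[0]          # leading forward singular vector
f_minus = U[:, 0]       # leading backward singular vector
growth = s[0]           # largest growth rate sqrt(lambda_1)

# The leading singular vector is the perturbation with maximal growth:
# any other direction grows at most as fast.
h = rng.standard_normal(3)
rate_random = np.linalg.norm(L @ h) / np.linalg.norm(h)
```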
3 Applications
Applications of VSA can be generally divided into two classes: sensitivity to initial
or boundary condition changes and sensitivity to parameter changes. However, one
can extend the notion of sensitivity analysis to second-order sensitivity and stability
analysis. This section is organized following this classification. Even though VSA
has been used extensively in a wide range of problems, examples given here come
from geophysical applications. Indeed, in that case, the control vector is generally of
very large dimension; therefore, global SA techniques are out of reach. Moreover,
quite often tangent and/or adjoint models are already available since they are used
for data assimilation.
where z_obs are observations of the system, H maps the state vector to the observation space, and ‖·‖_O is typically a weighted L2 norm. Following (32.4), the response function is then

$$G(u_0) = \frac{1}{2} \int_0^T \| z_{obs}(t) - H(u(u_0;\, t)) \|_O^2 \, dt \qquad (32.21)$$
Local sensitivities of such functions can be very useful to understand the behavior
of a system and have been extensively used in geosciences (see, for instance,
[8, 28, 34], or [2]).
One can find an example of application of such methods on the Mercator Océan 1/4° global ocean model in [31]. The initial objective was to estimate the influence of geographical areas on the forecast error, using an adjoint method to compute the sensitivities. A preliminary study considered the misfit to observations as a proxy of the forecast error and sought to determine the sensitivity of this misfit to regional changes in the initial condition and/or to the forcing. That should give an indication of the important phenomena to consider to improve this system.
The most easily interpreted case in this study considers a sensitivity criterion coming from the difference in sea surface temperature (SST) maps at the final instant of the assimilation cycle, because of its dense coverage in space. The response function (32.21) is a discrete version of the time integral, in which the operator H (mapping the state vector u to the observation space) simply extracts the SST of the state vector. This can be translated into computing the gradient of

$$G(u_0, q) = \frac{1}{2} \sum_{n=1}^{N_{SST}} \| H_{SST}(u_n) - SST_{obs} \|_{R^{-1}}^2 \qquad (32.22)$$
Fig. 32.1 Top: misfit between forecast and observed SST (left) and mixed layer depth (right). Bottom: sensitivity to 1-week lead time SST error with respect to variations in initial surface (left) and 100 m (right) temperature (From [31])
This kind of study is also routinely used to target observations. For example, in
order to be able to track tropical cyclones, it is possible to use the so-called adjoint-
derived sensitivity steering vectors (ADSSV, [5, 14, 32]). In that case, the model
Eq. (32.1) represents the evolution of the atmosphere, starting from an initial state
vector u0 . Some technicalities allow to choose the parameter vector x equal to the
vorticity at initial time x D 0 D @v@x
0
@u
@y
0
. Then the response function follows the
form (32.9); it is a two-criteria function at a given verification time t1 :
Z Z T
1 1
G.0 / D u.t1 / dx; v.t1 / dx (32.23)
jAj A jAj A
1
R 1
R
where jAj A u dx and jAj A v dx are the zonal and meridional averaged wind
@Gu
velocities over a given area of interest A. By looking at the sensitivities to 0 : @0
and
@Gv
@0
,one will get information about the way the tropical cyclone is likely to go. For
example, if at a given forecast time at one particular grid point the ADSSV vector
points to the east, an increase in the vorticity at this very point at the observing time
would be associated with an increase in the eastward steering flow of the storm at
the verifying time [32]. This information, in turn, helps to decide where to launch
useful observations.
Fig. 32.2 Adjoint sensitivity maps of the total Greenland ice volume V related to (a) basal sliding
cb and (b) basal melt rate qbm (From [13], copyright International Glaciological Society)
perturbations on the basal sliding cb and the basal melt rate qbm . Let us note
here that in this case, the adjoint model has been obtained using automatic
differentiation tools. Figure 32.2 shows that sliding sensitivities exhibit significant
regional variations and are mostly located in the coastal areas, whereas melt-
rate sensitivities are either fairly uniform or they completely vanish. This kind
of information is of prime importance when tuning the model parameters and/or designing an observation campaign for measuring said parameters. For instance, in this particular system, there is no gain to be expected from focusing on the interior
of the domain.
to observations through a sequence of prediction and correction steps, i.e., the model
is corrected as follows:
$$\frac{du}{dt} = \mathcal{M}(u;\, x) + K \left( H(u(t)) - y_{obs}(t) \right) \qquad (32.24)$$
where K is a gain matrix. For the simplest versions of filtering data assimilation, K does not depend on u(t), so computing sensitivities of such a system can be done similarly as before (see [33] for an example). However, in more sophisticated approaches, like variational data assimilation, it is less straightforward. In that case,
the data assimilation problem is solved through the minimization of a cost function
that is typically Eq. (32.21). Looking for local sensitivities on the forecasting system
would mean to look for sensitivities to the optimal solution of the minimization of
(32.21), that is to say to compute the gradient of the whole optimality system (direct
model, adjoint model, cost function); doing so by adjoint method may require the
second-order adjoint model [6, 16].
Another example of a complex system is to perform a sensitivity analysis on a stability analysis (i.e., on how given perturbations will affect the stability of a system).
In [27], the authors use stability analysis to study the sensitivity of the ther-
mohaline oceanic circulation (large-scale circulation, mostly dominated by density
variations, i.e., temperature and salinity variations). To do so, they look for the
optimal initial perturbation of the sea surface salinity that induces the largest
variation of the thermohaline circulation.
In [23], the authors are interested in the moist predictability in meteorological models. Since this is a very nonlinear process, they propose to use nonlinear singular vectors as response function G:

$$G(u_0) = \underset{\|h_0\| = E_0}{\arg\max}\; \frac{\| \mathcal{M}(u_0 + h_0, T) - \mathcal{M}(u_0, T) \|}{\| h_0 \|} \qquad (32.25)$$
This tells which variation in the initial condition will most affect the optimal perturbations, and hence the predictability. Note that in that case, computing ∇_{u0} G(u0) also requires the second-order adjoint.
Obviously, these are only a handful of possible applications among many; as long as one can define a response function, one could be interested in studying its sensitivities.
4 Conclusion
Variational methods are local sensitivity analysis techniques; they gather a set of methods ranging from very basic to sophisticated, as presented above. The main advantage is that they can be used for very large dimension problems when using adjoint methods; the downside is that this may require some heavy development. This burden can be reduced, however, by the use of automatic differentiation tools. They have been
used for a very wide range of applications and even on a daily basis in operational
numerical weather prediction. Although they are local by essence, adjoint-based
variational methods can be extended to global sensitivity analysis as will be
presented in Chap. 36, Derivative-Based Global Sensitivity Measures.
Methods presented here are dedicated to first-order local sensitivity analysis.
This can be extended to interaction studies by using the second-order derivatives
(Hessian), which can be computed similarly using so-called second-order adjoint
models.
Readers interested in learning more could start with clearly written and easy-to-read papers such as [4, 13, 17]. To go further, the book [3] and the application paper [2] are recommended. Finally, concerning second-order derivatives, [16] provides a nice introduction, and [23] offers an advanced application.
5 Cross-References
References
1. Ancell, B., Hakim, G.J.: Comparing adjoint- and ensemble-sensitivity analysis with applications to observation targeting. Mon. Weather Rev. 135(12), 4117–4134 (2007)
2. Ayoub, N.: Estimation of boundary values in a North Atlantic circulation model using an adjoint method. Ocean Model. 12(3–4), 319–347 (2006)
3. Cacuci, D.G.: Sensitivity and Uncertainty Analysis: Theory. CRC Press, Boca Raton (2005)
4. Castaings, W., Dartus, D., Le Dimet, F.X., Saulnier, G.M.: Sensitivity analysis and parameter estimation for distributed hydrological modeling: potential of variational methods. Hydrol. Earth Syst. Sci. 13(4), 503–517 (2009)
5. Chen, S.G., Wu, C.C., Chen, J.H., Chou, K.H.: Validation and interpretation of adjoint-derived sensitivity steering vector as targeted observation guidance. Mon. Weather Rev. 139, 1608–1625 (2011)
6. Daescu, D.N., Navon, I.M.: Reduced-order observation sensitivity in 4D-Var data assimilation. In: American Meteorological Society 88th AMS Annual Meeting, New Orleans (2008)
7. Desroziers, G., Camino, J.T., Berre, L.: 4DEnVar: link with 4D state formulation of variational assimilation and different possible implementations. Q. J. R. Meteorol. Soc. 140, 2097–2110 (2014)
8. Errico, R.M., Vukicevic, T.: Sensitivity analysis using an adjoint of the PSU-NCAR mesoscale model. Mon. Weather Rev. 120(8), 1644–1660 (1992)
9. Giering, R., Kaminski, T.: Recipes for adjoint code construction. ACM Trans. Math. Softw. 24(4), 437–474 (1998)
10. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, Philadelphia (2008)
11. Hamby, D.M.: A review of techniques for parameter sensitivity analysis of environmental models. Environ. Monit. Assess. 32(2), 135–154 (1994)
12. Hascoët, L., Pascual, V.: The Tapenade automatic differentiation tool: principles, model, and specification. ACM Trans. Math. Softw. 39(3), 20 (2013)
13. Heimbach, P., Bugnion, V.: Greenland ice-sheet volume sensitivity to basal, surface and initial conditions derived from an adjoint model. Ann. Glaciol. 50, 67–80 (2009)
14. Hoover, B.T., Morgan, M.C.: Dynamical sensitivity analysis of tropical cyclone steering using an adjoint model. Mon. Weather Rev. 139, 2761–2775 (2011)
15. Lauvernet, C., Hascoët, L., Le Dimet, F.X., Baret, F.: Using automatic differentiation to study the sensitivity of a crop model. In: Forth, S., Hovland, P., Phipps, E., Utke, J., Walther, A. (eds.) Recent Advances in Algorithmic Differentiation. Lecture Notes in Computational Science and Engineering, vol. 87, pp. 59–69. Springer, Berlin (2012)
16. Le Dimet, F.X., Ngodock, H.E., Luong, B., Verron, J.: Sensitivity analysis in variational data assimilation. J. Meteorol. Soc. Jpn. Ser. 2, 75, 135–145 (1997)
17. Lellouche, J.M., Devenon, J.L., Dekeyser, I.: Boundary control of Burgers' equation: a numerical approach. Comput. Math. Appl. 28(5), 33–34 (1994)
18. Li, S., Petzold, L.: Adjoint sensitivity analysis for time-dependent partial differential equations with adaptive mesh refinement. J. Comput. Phys. 198(1), 310–325 (2004)
19. Liu, C., Xiao, Q., Wang, B.: An ensemble-based four-dimensional variational data assimilation scheme. Part I: technical formulation and preliminary test. Mon. Weather Rev. 136(9), 3363–3373 (2008)
20. Marotzke, J., Wunsch, C., Giering, R., Zhang, K., Stammer, D., Hill, C., Lee, T.: Construction of the adjoint MIT ocean general circulation model and application to Atlantic heat transport sensitivity. J. Geophys. Res. 104(C12), 29529–29547 (1999)
21. Mu, M., Duan, W., Wang, B.: Conditional nonlinear optimal perturbation and its applications. Nonlinear Process. Geophys. 10(6), 493–501 (2003)
22. Qin, X., Mu, M.: Influence of conditional nonlinear optimal perturbations sensitivity on typhoon track forecasts. Q. J. R. Meteorol. Soc. 138, 185–197 (2011)
23. Rivière, O., Lapeyre, G., Talagrand, O.: A novel technique for nonlinear sensitivity analysis: application to moist predictability. Q. J. R. Meteorol. Soc. 135(643), 1520–1537 (2009)
24. Saltelli, A., Ratto, M., Tarantola, S., Campolongo, F.: Sensitivity analysis for chemical models. Chem. Rev. 105(7), 2811–2828 (2005)
25. Sandu, A., Daescu, D.N., Carmichael, G.R.: Direct and adjoint sensitivity analysis of chemical kinetic systems with KPP: Part I. Theory and software tools. Atmos. Environ. 37(36), 5083–5096 (2003)
26. Sandu, A., Daescu, D.N., Carmichael, G.R., Chai, T.: Adjoint sensitivity analysis of regional air quality models. J. Comput. Phys. 204(1), 222–252 (2005)
27. Sévellec, F.: Optimal surface salinity perturbations influencing the thermohaline circulation. J. Phys. Oceanogr. 37(12), 2789–2808 (2007)
28. Sykes, J.F., Wilson, J.L., Andrews, R.W.: Sensitivity analysis for steady state groundwater flow using adjoint operators. Water Resour. Res. 21(3), 359–371 (1985)
29. Thuburn, J., Haine, T.W.N.: Adjoints of nonoscillatory advection schemes. J. Comput. Phys. 171(2), 616–631 (2001)
30. Vidard, A.: Data assimilation and adjoint methods for geophysical applications. Habilitation thesis, Université de Grenoble (2012)
31. Vidard, A., Rémy, E., Greiner, E.: Sensitivity analysis through adjoint method: application to the GLORYS reanalysis. Contrat no. 08/D43, Mercator Océan (2011)
32. Wu, C.C., Chen, J.H., Lin, P.H., Chou, K.H.: Targeted observations of tropical cyclone movement based on the adjoint-derived sensitivity steering vector. J. Atmos. Sci. 64(7), 2611–2626 (2007)
33. Zhu, Y., Gelaro, R.: Observation sensitivity calculations using the adjoint of the gridpoint statistical interpolation (GSI) analysis system. Mon. Weather Rev. 136(1), 335–351 (2008)
34. Zou, X., Barcilon, A., Navon, I.M., Whitaker, J., Cacuci, D.G.: An adjoint sensitivity study of blocking in a two-layer isentropic model. Mon. Weather Rev. 121, 2833–2857 (1993)
Design of Experiments for Screening
33
David C. Woods and Susan M. Lewis
Abstract
The aim of this paper is to review methods of designing screening experiments,
ranging from designs originally developed for physical experiments to those
especially tailored to experiments on numerical models. The strengths and
weaknesses of the various designs for screening variables in numerical models
are discussed. First, classes of factorial designs for experiments to estimate
main effects and interactions through a linear statistical model are described,
specifically regular and nonregular fractional factorial designs, supersaturated
designs, and systematic fractional replicate designs. Generic issues of aliasing,
bias, and cancellation of factorial effects are discussed. Second, group screening
experiments are considered including factorial group screening and sequential
bifurcation. Third, random sampling plans are addressed including Latin hyper-
cube sampling and sampling plans to estimate elementary effects. Fourth, a
variety of modeling methods commonly employed with screening designs are
briefly described. Finally, a novel study demonstrates six screening methods on
two frequently used exemplars, and their performances are compared.
Keywords
Computer experiments · Fractional factorial designs · Gaussian process models · Group screening · Space-filling designs · Supersaturated designs · Variable selection
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1144
1.1 Linear Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1146
1.2 Gaussian Process Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1147
1.3 Screening without a Surrogate Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1148
1 Introduction
Such models often describe complex input-output relationships and have numerous
input variables. A primary reason for building a numerical model is to gain better
understanding of the nature of these relationships, especially the identification of
the active input variables. If a small set of active variables can be identified, then
the computational costs of subsequent exploration and exploitation of the numerical
model are reduced. Construction of a surrogate model from the active variables
requires less experimentation, and smaller Monte Carlo samples may suffice for
uncertainty analysis and uncertainty propagation.
The effectiveness of screening can be evaluated in a variety of ways. Suppose there are d input variables held in vector x = (x1, ..., xd)^T and that X ⊆ R^d contains all possible values of x, i.e., all possible combinations of input variable values. Let A_T ⊆ {1, ..., d} be the set of indices of the truly active variables and A_S ⊆ {1, ..., d} consist of the indices of those variables selected as active through screening. Then, the following measures may be defined: (i) sensitivity, s = |A_S ∩ A_T| / |A_T|, the proportion of active variables that are successfully detected, where s is defined as 1 when A_T = ∅; (ii) false discovery rate [8], fdr = |A_S ∩ Ā_T| / |A_S|, where Ā_T is the complement of A_T, the proportion of variables selected as active that are actually inactive, and fdr is defined as 0 when A_S = ∅; and (iii) type I error rate, |A_S ∩ Ā_T| / |Ā_T|, the proportion of inactive variables that are selected as active. In practice, high sensitivity is often considered more important than a low type I error rate or false discovery rate [34] because failure to detect an active input variable results in no further investigation of the variable and no exploitation of its effect on the output for purposes of optimization and control.
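These three measures can be sketched as a small helper (conventions as in the text; the example index sets are hypothetical):

```python
def screening_measures(selected, truly_active, d):
    """Sensitivity, false discovery rate, and type I error rate for a
    screening result over d input variables (indexed 1..d)."""
    A_S, A_T = set(selected), set(truly_active)
    A_T_bar = set(range(1, d + 1)) - A_T          # complement of A_T
    s = 1.0 if not A_T else len(A_S & A_T) / len(A_T)
    fdr = 0.0 if not A_S else len(A_S & A_T_bar) / len(A_S)
    type1 = len(A_S & A_T_bar) / len(A_T_bar) if A_T_bar else 0.0
    return s, fdr, type1

# 10 inputs; variables 1-3 truly active; screening selected {1, 2, 7}:
s, fdr, t1 = screening_measures({1, 2, 7}, {1, 2, 3}, d=10)
```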
The majority of designs for screening experiments are tailored to the identification and estimation of a surrogate model that approximates an output variable Y(x). A class of surrogate models which has been successfully applied in a variety of fields [80] has the form

$$Y(x) = h^T(x)\,\beta + \varepsilon(x), \qquad (33.1)$$

where h(x) is a p-vector of known functions of x, β is a p-vector of unknown parameters, and ε(x) is an error term.

Linear regression models assume that ε(x′) and ε(x″), x′ ≠ x″ ∈ X, are
independent random variables. Estimation of detailed mean functions h^T(x)β with a large number of terms requires large experiments which can be prohibitively expensive. Hence, many popular screening strategies investigate each input variable xi at two levels, often coded +1 and −1 and referred to as high and low, respectively [15, chs. 6 and 7]. Interest is then in identifying those variables that have a large main effect, defined for variable xi as the difference between the average expected response for the 2^{d−1} combinations of variable values with xi = +1 and the average for the 2^{d−1} combinations with xi = −1. Main effects may be estimated via a first-order surrogate model with

$$h^T(x)\,\beta = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d, \qquad (33.2)$$
fitted by least squares as

$$\hat\beta = (H^T H)^{-1} H^T Y_n, \qquad (33.4)$$

where H = (h(x_{n1}), ..., h(x_{nn}))^T is the model matrix, (x_{nj})^T is the jth row of X_n, and Y_n is the vector of observed outputs. For the least squares estimators to be uniquely defined, H must be of full column rank.
In physical screening experiments, often no attempt is made to estimate nonlinear
effects other than two-variable interactions. This practice is underpinned by the
principle of effect hierarchy [112] which states that low-order factorial effects, such
as main effects and two-variable interactions, are more likely to be important than
higher-order effects. This principle is supported by substantial empirical evidence
from physical experiments.
However, the exclusion of higher-order terms from surrogate model (33.1) can result in biased estimators (33.4). Understanding, and minimizing, this bias is key to effective linear model screening. Suppose that a more appropriate surrogate model is

$$Y(x) = h^T(x)\,\beta + \tilde h^T(x)\,\tilde\beta + \varepsilon(x),$$

where h̃(x) is a p̃-vector of model terms, additional to those held in h(x), and β̃ is a p̃-vector of constants. Then, the expected value of β̂ is given by

$$E(\hat\beta) = \beta + A\,\tilde\beta, \qquad (33.5)$$

where

$$A = (H^T H)^{-1} H^T \tilde H, \qquad (33.6)$$

and H̃ = (h̃^T(x_{nj}))_j. The alias matrix A determines the pattern of bias in β̂ due to omitting the terms h̃^T(x)β̃ from the surrogate model and can be controlled through the choice of design. The size of the bias is determined by β̃, which is outside the experimenter's control.
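The alias matrix of Eq. (33.6) can be computed directly for a small regular fraction; the sketch below uses the 2^(3−1) design with generator x3 = x1·x2, for which each main effect turns out to be aliased with the two-variable interaction of the other two factors:

```python
import numpy as np

# Regular 2^(3-1) fractional factorial with generator x3 = x1 * x2
X = np.array([[-1, -1,  1],
              [ 1, -1, -1],
              [-1,  1, -1],
              [ 1,  1,  1]], dtype=float)

# Fitted surrogate: intercept + main effects
H = np.column_stack([np.ones(4), X])
# Omitted terms: the three two-variable interactions
H_tilde = np.column_stack([X[:, 0] * X[:, 1],
                           X[:, 0] * X[:, 2],
                           X[:, 1] * X[:, 2]])

# Alias matrix A = (H^T H)^{-1} H^T H_tilde  (eq. 33.6)
A = np.linalg.inv(H.T @ H) @ H.T @ H_tilde
```

Each row of A shows which omitted interaction biases the corresponding estimated effect: here the estimator of main effect i picks up the x_j x_k interaction with unit coefficient.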
Gaussian process (GP) models are used when it is anticipated that understanding more complex relationships between the input and output variables is necessary for screening. Under a GP model, it is assumed that ε(x′), ε(x″) follow a bivariate normal distribution with correlation dependent on a distance metric applied to x′, x″; see [93] and Metamodel-Based Sensitivity Analysis: Polynomial Chaos and Gaussian Process.
Screening with a GP model requires interrogation of the parameters that control this correlation. A common correlation function employed for GP screening has the form

$$\mathrm{cor}(x, x') = \prod_{i=1}^{d} \exp\left( -\theta_i\, |x_i - x_i'|^{p_i} \right), \quad \theta_i \ge 0,\; 0 < p_i \le 2. \qquad (33.7)$$
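A sketch of a correlation of this form (parameter values are illustrative): a large θ_i means the output decorrelates quickly as x_i moves, flagging x_i as active, while θ_i near zero flags it as a candidate inactive variable.

```python
import math

def power_exp_cor(x, xp, theta, p):
    """Product power-exponential correlation of the form (33.7):
    theta_i >= 0 controls the decay rate in variable i, 0 < p_i <= 2
    its smoothness."""
    return math.exp(-sum(t * abs(a - b) ** q
                         for a, b, t, q in zip(x, xp, theta, p)))

theta, p = [5.0, 0.01], [2.0, 2.0]     # x1 'active', x2 nearly inert
c_move_x1 = power_exp_cor([0.0, 0.0], [0.5, 0.0], theta, p)  # exp(-1.25)
c_move_x2 = power_exp_cor([0.0, 0.0], [0.0, 0.5], theta, p)  # ~ 1
```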
1148 D.C. Woods and S.M. Lewis
The selection of active variables using a surrogate model relies on the model assumptions and their validation. An alternative, model-free approach is the estimation of elementary effects [74]. The elementary effect for the ith input variable at a combination of input values x0 ∈ X is an approximation to the derivative of Y(x0) in the direction of the ith variable. More formally, for a step size Δ,

$$EE_i(x_0) = \frac{Y(x_0 + \Delta\, e_i) - Y(x_0)}{\Delta},$$

where e_i is the ith unit vector.
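A sketch of elementary-effect screening (the test function, step Δ, and the mean-absolute-value summary statistic are illustrative choices, the latter in the spirit of the Morris method):

```python
import random

def elementary_effect(Y, x0, i, delta):
    """Elementary effect of input i at base point x0: finite-difference
    approximation to the directional derivative."""
    x1 = list(x0)
    x1[i] += delta
    return (Y(x1) - Y(x0)) / delta

def Y(x):
    # Toy model: x0 active (linear), x1 active (nonlinear), x2 inactive
    return 2.0 * x[0] + x[1]**2

rng = random.Random(3)
delta, R = 0.25, 20
# Sample elementary effects at several base points and summarize each
# input by the mean absolute elementary effect.
mu_star = [0.0, 0.0, 0.0]
for _ in range(R):
    x0 = [rng.random() for _ in range(3)]
    for i in range(3):
        mu_star[i] += abs(elementary_effect(Y, x0, i, delta)) / R
```

A variable whose elementary effects are uniformly near zero (here x2) is a candidate for removal; spread across base points hints at nonlinearity or interactions.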
In a full factorial design, each of the d input variables is assigned a fixed number
of values or levels, and the design consists of one run of each of the distinct
combinations of these values. Designs in which each variable has two values are
mainly considered here, giving n = 2^d runs in the full factorial design. For even
moderate values of d , experiments using such designs may be infeasibly large due
to the costs or computing resources required. Further, such designs can be wasteful
as they allow estimation of all interactions among the d variables, whereas effect
hierarchy suggests that low-order factorial effects (main effects and two-variable
interactions) will be the most important. These problems may be overcome by using
a carefully chosen subset, or fraction, of the combinations of variable values in the
full factorial design. Such fractional factorial designs have a long history of use in
33 Design of Experiments for Screening 1149
physical experiments [39] and, more recently, have also been used in the study of
numerical models [36]. However, they bring the complication that the individual
main effects and interactions cannot be estimated independently. Two classes of
designs are discussed here.
The most widely used two-level fractional factorial designs are 1/2^q fractions of the
2^d full factorial design, known as 2^{d-q} designs [112, ch. 5] (1 ≤ q < d, q integer).
As the outputs from all the combinations of variable values are not available from the
experiment, the individual main effects and interactions cannot be estimated. How-
ever, in a regular fractional factorial design, 2d q linear combinations of the facto-
rial effects can be estimated. Two factorial effects that occur in the same linear com-
bination cannot be independently estimated and are said to be aliased. The designs
are constructed by choosing which factorial effects should be aliased together.
The following example illustrates a full factorial design, the construction of a
regular fractional factorial design, and the resulting aliasing among the factorial
effects. Consider first a 2^3 factorial design in variables x1, x2, x3. Each run of this
design is shown as a row across columns 3–5 in Table 33.1. Thus, these three
columns form the design matrix. The entries in these columns are the coefficients
of the expected responses in the linear combinations that constitute the main
effects, ignoring constants. Where interactions are involved, as in model (33.3),
their corresponding coefficients are obtained as elementwise products of columns
3–5. Thus, columns 2–8 of Table 33.1 give the model matrix for model (33.3).
A 2^{4-1} regular fractional factorial design in n = 8 runs may be constructed
from the 2^3 design by assigning the fourth variable, x4, to one of the interaction
columns. In Table 33.1, x4 is assigned to the column corresponding to the highest-
order interaction, x1x2x3. Each of the eight runs now has the property that
x1x2x3x4 = +1, and hence, as each variable can only take values ±1, it follows
that x1 = x2x3x4, x2 = x1x3x4, and x3 = x1x2x4. Similarly, x1x2 = x3x4, x1x3 =
x2x4, and x1x4 = x2x3. Two consequences are: (i) each main effect is aliased with
Table 33.1 A 2^{4-1} fractional factorial design constructed from the 2^3 full factorial design,
showing the aliased effects

Run  I            x1          x2          x3          x1x2     x1x3     x2x3     x1x2x3
     = x1x2x3x4   = x2x3x4    = x1x3x4    = x1x2x4    = x3x4   = x2x4   = x1x4   = x4
1    +1           -1          -1          -1          +1       +1       +1       -1
2    +1           -1          -1          +1          +1       -1       -1       +1
3    +1           -1          +1          -1          -1       +1       -1       +1
4    +1           -1          +1          +1          -1       -1       +1       -1
5    +1           +1          -1          -1          -1       -1       +1       +1
6    +1           +1          -1          +1          -1       +1       -1       -1
7    +1           +1          +1          -1          +1       -1       -1       -1
8    +1           +1          +1          +1          +1       +1       +1       +1
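The construction in Table 33.1 is mechanical: build the 2^3 full factorial, set x4 = x1x2x3, and the aliasing follows. A sketch in NumPy (run order follows itertools, not the table):

```python
import itertools
import numpy as np

# 2^3 full factorial in x1, x2, x3: rows are runs, entries are -1/+1 levels
full = np.array(list(itertools.product([-1, 1], repeat=3)))
x1, x2, x3 = full.T

# Assign x4 to the highest-order interaction column: x4 = x1 * x2 * x3
x4 = x1 * x2 * x3
design = np.column_stack([x1, x2, x3, x4])

# Defining relation: x1 * x2 * x3 * x4 = +1 on every run
assert np.all(x1 * x2 * x3 * x4 == 1)

# Aliasing: the x1 * x2 column is indistinguishable from the x3 * x4 column
assert np.all(x1 * x2 == x3 * x4)
```

The final assertion verifies the aliasing relation x1x2 = x3x4 stated above; the mutual orthogonality of the design columns can be checked the same way.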
$$\mathbf{h}^T(\mathbf{x})\boldsymbol{\beta} = \beta_0 + \beta_1 x_1 + \cdots + \beta_4 x_4 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3,$$
with model matrix $H$ given by columns 2–9 of Table 33.1. The columns of $H$ are
mutually orthogonal, $\mathbf{h}(\mathbf{x}_{nj})^T \mathbf{h}(\mathbf{x}_{nk}) = 0$ for $j \ne k$; $j, k = 1, \ldots, 8$. The aliasing in
the design will result in a biased estimator of $\boldsymbol{\beta}$. This can be seen by setting
$$\tilde{\mathbf{h}}^T(\mathbf{x})\tilde{\boldsymbol{\beta}} = \beta_{14} x_1 x_4 + \beta_{24} x_2 x_4 + \beta_{34} x_3 x_4 + \sum_{1 \le j < k < l \le 4} \beta_{jkl} x_j x_k x_l + \beta_{1234} x_1 x_2 x_3 x_4,$$
which leads to the alias matrix $A = \sum_{j=1}^{8} \mathbf{e}_{j8} \mathbf{e}_{(8-j+1)8}^T$, an anti-diagonal
identity matrix, and hence, for example,
$$E(\hat{\beta}_0) = \beta_0 + \beta_{1234},$$
with each remaining estimator biased by the single effect aliased with it.
The regular designs discussed above require n to be a power of two, which limits
their application to some experiments. Further, even resolution III regular fractional
factorials may require too many runs to be feasible for large numbers of variables.
For example, with 11 variables, a resolution III regular fractional design requires
n = 16 runs. Smaller experiments with n not equal to a power of two can often be
performed by using the wider class of nonregular fractional factorial designs [115]
that cannot be constructed via a set of defining words. For 11 variables, a design with
n = 12 runs can be constructed that can estimate all 11 main effects independently
of each other. While these designs are more flexible in their run size, the cost is
a more complex aliasing scheme that makes interpretation of experimental results
more challenging and requires the use of more sophisticated modeling methods.
Many nonregular designs are constructed via orthogonal arrays [92]. A symmetric
orthogonal array of strength t, denoted by OA(n, s^d, t), is an n × d matrix
of s different symbols such that all ordered t-tuples of the symbols occur equally
often as rows of any n × t submatrix of the array. Each such array defines an
n-run factorial design in d variables, each having s levels. Here, only arrays with
s = 2 symbols, ±1, will be discussed. The strength of the array is closely related
to the resolution of the design. An array of strength t D 2 allows estimation of all
main effects independently of each other but not of the two-variable interactions
(cf. resolution III); a strength 3 array allows estimation of main effects inde-
pendently of two-variable interactions (cf. resolution IV). Clearly, the two-level
regular fractional factorial designs are all orthogonal arrays. However, the class of
orthogonal arrays is wider and includes many other designs that cannot be obtained
via a defining relation.
where 1_A is the indicator function for the set A and b_{ijk} = 0 or 1 (i, j, k =
1, ..., 11). For this design, each interaction is partially aliased with nine main
effects. The competing 16-run resolution III 2^{11-7} regular fraction has each main
effect aliased with at most four two-variable interactions, and each interaction
aliased with at most one main effect. Hence, while an active interaction would
bias only one main effect for the regular design, it would bias nine main effects for
the PB design, albeit to a lesser extent.
However, an important advantage of partial aliasing is that it allows interactions
to be considered through the use of variable selection methods (discussed later)
without requiring a large increase in the number of runs. For example, the 12-run
PB design has been used to identify important interactions [26, 46].
A wide range of nonregular designs can be constructed. An algorithm has been
developed for constructing designs which allow orthogonal estimation of all main
effects together with catalogues of designs for n D 12; 16; 20 [101]. Other authors
have used computer search and criteria based on model selection properties to
find nonregular designs [58]. A common approach is to use criteria derived from
D-optimality [4, ch. 11] to find fractional factorial designs for differing numbers
of variables and runs [35]. Designs from these methods may or may not allow
independent estimation of the variable main effects dependent on the models under
investigation and the number of runs available.
Most screening experiments use designs at two levels, possibly with the addition
of one or more center points to provide a portmanteau test for curvature. Recently,
an economical class of three-level screening designs, called definitive screening
designs (DSDs) [53], has been proposed to investigate d variables, generally
in as few as n = 2d + 1 runs. The structure of the designs is illustrated in
Table 33.3 for d = 6. The design has a single center point and 2d runs formed
from d mirrored pairs. The jth pair has the jth variable set to zero and the other
d - 1 variables set to ±1. The second run in the pair is formed by multiplying all
the elements in the first run by -1. That is, the 2d runs form a foldover design
[17]. This foldover property ensures that main effects and two-variable interactions
are orthogonal and hence main effects are estimated independently from these
interactions, unlike for resolution III or PB designs. Further, all quadratic effects
are estimated independently of the main effects but not independently of the two-
variable interactions. Finally, the two-variable interactions will be partially aliased
with each other. These designs are growing in popularity, with a sizeable literature
available on their construction [79, 82, 113].
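The mirrored-pair structure of a DSD is easy to sketch. Note this is only the structural skeleton described above: the ±1 entries below are random for illustration, whereas published DSDs choose them carefully (e.g., via conference matrices) to achieve the orthogonality properties just listed.

```python
import numpy as np

def dsd_structure(d, seed=None):
    """Skeleton of a definitive screening design: one centre point plus d
    mirrored (foldover) pairs; the j-th pair has variable j at 0 and the
    remaining d-1 variables at +/-1.  The +/-1 pattern is random here,
    for illustration only -- it is NOT an optimised DSD."""
    rng = np.random.default_rng(seed)
    runs = [np.zeros(d)]                       # single centre point
    for j in range(d):
        run = rng.choice([-1.0, 1.0], size=d)  # illustrative +/-1 entries
        run[j] = 0.0                           # j-th variable at its mid level
        runs.append(run)
        runs.append(-run)                      # mirror image of the first run
    return np.array(runs)                      # n = 2d + 1 runs in total

X = dsd_structure(6, seed=1)                   # the d = 6 setting of Table 33.3
```

Each pair sums to the zero vector, which is the foldover property that makes main effects orthogonal to two-variable interactions.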
For experiments with a large number of variables or runs that are very expensive
or time consuming, supersaturated designs have been proposed as a low-resource
(small n) solution to the screening problem [42]. Originally, supersaturated designs
were defined as having too few runs to estimate the intercept and the d main effects
in model (33.2), that is, n < d + 1. The resulting partial aliasing is more complicated
than for the designs discussed so far, in that at least one main effect estimator
is biased by one or more other main effects. Consequently, there has been some
controversy about the use of these designs [1]. Recently, evidence has been provided
for the effectiveness of the designs when factor sparsity holds and the active main
effects are large [34, 67]. Applications of supersaturated designs include screening
variables in numerical models for circuit design [65], extraterrestrial atmospheric
science [27], and simulation models for maritime terrorism [114].
Supersaturated designs were first proposed in the discussion [14] of random
balance designs [98]. The first systematic construction method [11] found designs
via computer search that have pairs of columns of the design matrix Xn as nearly
orthogonal as possible through use of the E(s²) design selection criterion (defined
below). There was no further research in the area for more than 30 years until
Lin [61] and Wu [111] independently revived interest in the construction of these
designs. Both their methods are based on Hadamard matrices and can be understood,
respectively, as (i) selecting a half-fraction from a Hadamard matrix (Lin) and (ii)
appending one or more interaction columns to a Hadamard matrix and assigning a
new variable to each of these columns (Wu).
Both methods can be illustrated using the n = 12-run PB design in Table 33.2.
To construct a supersaturated design for d = 10 variables in n = 6 runs by method
(i), all six runs of the PB design with x11 = -1 are removed, followed by deletion
of the x11 column. The resulting design is shown in Table 33.4. To obtain a design
by method (ii) for d = 21 variables in n = 12 runs, ten columns are appended that
correspond to the interactions of x1 with variables x2 to x11, and variables x12 to x21
are assigned to these columns; see Table 33.5.
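Both constructions are mechanical given a Hadamard matrix. The 12-run PB matrix is not reproduced in this excerpt, so the sketch below uses the 8-run Sylvester Hadamard matrix from SciPy instead; the principle (Lin: take the half-fraction on a branching column; Wu: append interaction columns) is identical.

```python
import numpy as np
from scipy.linalg import hadamard

H = hadamard(8)   # 8 x 8 Sylvester Hadamard matrix with +/-1 entries
X = H[:, 1:]      # drop the constant column: 8 runs, 7 two-level variables

# Method (i), Lin: keep the half-fraction with the last ("branching") column
# at +1, then delete that column -> 4-run design for 6 variables
lin = X[X[:, -1] == 1][:, :-1]

# Method (ii), Wu: append the interactions of x1 with the other columns and
# assign new variables to them -> 8-run design for 13 variables
wu = np.hstack([X, X[:, :1] * X[:, 1:]])
```

Both resulting designs have more factorial effects than runs, which is exactly the supersaturated situation.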
Table 33.4 An n = 6-run supersaturated design for d = 10 variables obtained by the method of
Lin [61]; the rows are the six retained PB runs (±1 levels of x1, ..., x10) after deletion of the x11 column
Table 33.5 An n = 12-run supersaturated design for d = 21 variables obtained by the method of
Wu [111]; the rows are the 12 PB runs (±1 levels) with the ten x1-interaction columns appended
and assigned to x12, ..., x21
Since 1993, there has been a substantial research effort on construction methods
for supersaturated designs; see, for example, [62, 77, 78]. The most commonly used
criterion for design selection in the literature is E(s²)-optimality [11]. More recently,
the Bayesian D-optimality criterion [35, 51] has become popular.
$$E(s^2) = \frac{2}{d(d-1)} \sum_{i<j} s_{ij}^2, \tag{33.9}$$
where $s_{ij}$ is the $(i,j)$th entry of $X_n^T X_n$, and
$$\phi_D = \left| (H^+)^T H^+ + K/\tau^2 \right|^{1/(d+1)}, \tag{33.10}$$
where $H^+ = [\mathbf{1}_n \,|\, X_n]$, $K = I_{d+1} - \mathbf{e}_{1(d+1)} \mathbf{e}_{1(d+1)}^T$, $\tau^2 > 0$, and $\tau^2 K^{-1}$ is the
prior variance-covariance matrix for $\boldsymbol{\beta}$. Equation (33.10) results from assuming an
informative prior distribution for each $\beta_i$ ($i = 1, \ldots, d$) with mean zero and small
prior variance, to reflect factor sparsity, and a non-informative prior distribution
for $\beta_0$. The prior information can be regarded as equivalent to having sufficient
additional runs to allow estimation of all parameters $\beta_0, \ldots, \beta_d$, with the value of $\tau^2$
reflecting the quantity of available prior information. However, the optimal designs
obtained tend to be insensitive to the choice of $\tau^2$ [67].
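The E(s²) criterion (33.9) is a direct computation on the design matrix; a minimal sketch (the example designs are illustrative):

```python
import itertools
import numpy as np

def e_s2(X):
    """E(s^2) criterion (33.9): average squared off-diagonal entry s_ij of
    X^T X over the d(d-1)/2 column pairs.  Smaller values mean columns
    closer to orthogonality."""
    d = X.shape[1]
    S = X.T @ X
    off = S[np.triu_indices(d, k=1)]          # the s_ij with i < j
    return 2.0 * np.sum(off ** 2) / (d * (d - 1))

# An orthogonal two-level design attains the minimum value of zero
full = np.array(list(itertools.product([-1, 1], repeat=3)))
example = e_s2(full)
```

For supersaturated designs orthogonality is impossible, so the criterion is minimized rather than driven to zero.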
Both E(s²)- and Bayesian D-optimal designs may be found numerically, using algorithms
such as columnwise-pairwise [59] or coordinate exchange [71]. From simulation
studies, it has been shown that there is little difference in the performance of E(s²)-
and Bayesian D-optimal designs assessed by, for example, sensitivity and type I
error rate [67].
Supersaturated designs have also been constructed that allow the detection
of two-variable interactions [64]. Here, the definition of supersaturated has been
widened to include designs that have fewer runs than the total number of factorial
effects to be investigated. In particular, Bayesian D-optimal designs have been
shown to be effective in identifying active interactions [34]. Note that under this
expanded definition of supersaturated designs, all fractional factorial designs are
supersaturated under model (33.1) when n < p.
The analysis of unreplicated factorial designs commonly used for screening experiments
has been a topic of much research [45, 56, 105]. In a physical experiment,
the lack of replication to provide a model-free estimate of σ² can make it difficult
to assess the importance of individual factorial effects. The most commonly applied
method for orthogonal designs treats this problem as analogous to the identification
of outliers and makes use of (half-) normal plots of the factorial effects. For
many nonregular and supersaturated designs, more advanced analysis methods
are necessary; see later. For studies on numerical models, provided all the input
variables are controlled, the problem of assessing statistical significance does not
occur as no unusually large observations can have arisen due to chance. Here,
factorial effects may be ranked by size and those variables whose effects lead to a
substantive change in the response declared active.
Biased estimators of factorial effects, however, are an issue for experiments
on both numerical models and physical processes. Complex (partial) aliasing can
produce two types of bias in the estimated parameters in model (33.1): upward bias
so that a type I error may occur (amalgamation) or downward bias leading to missing
active variables (cancellation). Simulation studies have been used to assess these
risks [31, 34, 67].
Bias may also, of course, be induced by assuming a form of the surrogate
model that is too simple, for example, through the surrogate having too few turning
points (e.g., being a polynomial of too low order) or lacking the detail to explain
the local behavior of the numerical model. This kind of bias is potentially the
primary source of mistakes in screening variables in numerical models. When
prior scientific knowledge suggests that the numerical model is highly nonlinear,
screening methods should be employed that have fewer restrictions on the surrogate
model or are model-free. Such methods, including designs for the estimation of
elementary effects (33.8), are described later in this paper. Typically, they require
larger experiments than the designs in the present section.
$$S_o(i) = \beta_i + \sum_{j=1}^{d} \sum_{k=1}^{d} \beta_{ijk} + \cdots, \qquad i \ne j \ne k, \tag{33.11}$$
and
$$S_e(i) = \sum_{j=1}^{d} \beta_{ij} + \sum_{j=1}^{d} \sum_{k=1}^{d} \sum_{l=1}^{d} \beta_{ijkl} + \cdots, \qquad i \ne j, \quad i \ne j \ne k \ne l, \tag{33.12}$$
$$C_o(i) = \frac{1}{4} \left[ \left( Y^{(2d+2)} - Y^{(d+i+1)} \right) + \left( Y^{(i+1)} - Y^{(1)} \right) \right],$$
and
$$C_e(i) = \frac{1}{4} \left[ \left( Y^{(2d+2)} - Y^{(d+i+1)} \right) - \left( Y^{(i+1)} - Y^{(1)} \right) \right].$$
For numerical models, where observations are not subject to random error, active
variables are selected by ranking the sensitivity indices defined by
$$S(i) = \frac{M(i)}{\sum_{j=1}^{d} M(j)}, \qquad i = 1, \ldots, d. \tag{33.13}$$
Early work on group screening used pooled blood samples to detect individuals
with a disease as economically as possible [33]. The technique was extended,
almost 20 years later, to screening large numbers of two-level variables in factorial
experiments where a main effects only model is assumed for the output [108]. For
an overview of this work and several other strategies, see [75].
In group screening, the set of variables is partitioned into groups, and the values
of the variables within each group are varied together. Smaller designs can then
be used to experiment on these groups. This strategy deliberately aliases the main
effects of the individual variables. Hence, follow-up experimentation is needed on
those variables in the groups found to be important in order to detect the individual
active variables. The main screening techniques that employ grouping of variables
are described below.
The majority of factorial group screening methods apply to variables with two
levels and use two stages of experimentation. At the first stage, the d variables
are partitioned into g groups, where the j th group contains gj 1 variables
.j D 1; : : : ; g/. High and low levels for each of the g grouped variables are
defined by setting all the individual variables in a group to either their high level
or their low level simultaneously. The first experiment with n1 runs is performed
on the relatively small number of grouped variables. Classical group screening then
estimates the main effects for each of the grouped variables and takes those variables
involved in groups that have large estimated main effects through to a second-stage
experiment. Individual variables are investigated at this stage, and their main effects,
and possibly interactions, are estimated.
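A first-stage grouped experiment can be sketched end to end for a numerical model. The grouping, design, and effect estimator below are illustrative choices, not prescriptions from the chapter:

```python
import itertools
import numpy as np

def stage1_group_effects(Y, groups):
    """Classical group screening, stage 1 (illustrative sketch): run a
    two-level full factorial in the grouped variables, with every variable
    in a group set to that group's level, and estimate each grouped main
    effect as mean(output at high) - mean(output at low)."""
    g = len(groups)
    d = sum(len(gr) for gr in groups)
    Z = np.array(list(itertools.product([-1.0, 1.0], repeat=g)))  # 2^g runs
    X = np.zeros((len(Z), d))
    for r, z in enumerate(Z):
        for j, gr in enumerate(groups):
            X[r, gr] = z[j]        # all variables in group j share level z_j
    y = np.array([Y(x) for x in X])
    return np.array([y[Z[:, j] == 1].mean() - y[Z[:, j] == -1].mean()
                     for j in range(g)])

# Hypothetical model: variable 0 dominates, variable 5 is nearly inert
Y = lambda x: 4.0 * x[0] + 0.1 * x[5]
eff = stage1_group_effects(Y, [[0, 1, 2], [3, 4, 5]])
```

Groups with large estimated effects proceed to the second stage; note that active effects of opposite sign within a group can cancel, which is one reason grouping decisions matter.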
the first new group should contain 2^l variables such that 2^l < m. The remaining
m - 2^l variables are assigned to the second group. If variables have been ordered
by an a priori assessment of increasing importance, the most important variables
will be in the second, smaller group, and hence more variables can be ruled out as
unimportant more quickly.
The importance of two-variable interactions may be investigated by using the
output from the following runs to assess each split. The first is the run x used in the
standard sequential bifurcation method; the second is the mirror image of x, in which
each variable that is set high in x is set low, and vice versa. This foldover structure
ensures that any two-variable interactions will not bias estimators of grouped
main effects at each stage. This alternative design also permits standard sequential
bifurcation to be performed and, if the variables deemed active differ from those
found via the foldover, then the presence of active interactions is indicated. Again,
successful identification of the active variables relies on the principle of strong effect
heredity.
A variety of adaptations of sequential bifurcation have been proposed, including
methods of controlling type I and type II error rates [106, 107] and a procedure to
identify dispersion effects in robust parameter design [3]. For further details, see
[55, ch. 4].
The use of iterated fractional factorial designs requires a larger total number
of runs than other group screening methods, as a sequence of factorial screening
designs is implemented. However, the method has been suggested for use when
there are many variables (thousands) arranged in a few large groups. Simulation
studies [95, 96] have indicated that it can be effective here, provided there are very
few active variables.
Fig. 33.1 Latin hypercube samples with n = 9 and d = 2: (a) random LHS, (b) random LHS
generated from an orthogonal array, (c) maximin LHS
each $d_j^{(i)}$, or some equivalent operation performed, prior to transformation to $x_j^{(i)}$.
An LHS design generated by this method is shown in Fig. 33.1a.
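The generation of a random LHS just described can be sketched as follows (a standard construction; details such as the within-cell jitter are conventions assumed here):

```python
import numpy as np

def random_lhs(n, d, seed=None):
    """Random Latin hypercube sample on [0,1]^d: each axis is cut into n
    equal cells, each column of cell indices is an independent random
    permutation of 0..n-1, and each point is jittered within its cell."""
    rng = np.random.default_rng(seed)
    cells = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (cells + rng.uniform(size=(n, d))) / n

X = random_lhs(9, 2, seed=0)   # the n = 9, d = 2 setting of Fig. 33.1
```

By construction, projecting the points onto any single axis gives exactly one point per cell, the defining one-dimensional stratification of an LHS.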
There may be great variation in the overall space-filling properties of LHS
designs. For example, the LHS design in Fig. 33.1a clearly has poor two-
dimensional space-filling properties. Hence, a variety of extensions to Latin
hypercube sampling have been proposed. Most prevalent are orthogonal array-
based and maximin Latin hypercube sampling.
To generate an orthogonal array-based LHS [81, 102], the matrix D is formed
from an orthogonal array. Hence, the columns of D are no longer independent
permutations of 1, ..., n. For simplicity, assume O is a symmetric OA(n, s^d, t)
with symbols 1, ..., s and t ≥ 2. The jth column of D is formed by mapping the
n/s occurrences of each symbol in the jth column of O to a random permutation,
π, of n/s new symbols, i.e., 1 → π(1, ..., n/s), 2 → π(n/s + 1, ..., 2n/s),
..., s → π((s - 1)n/s + 1, ..., n), where π(1, ..., a) denotes a random permutation of the
integers 1, ..., a.
$$\mathrm{dist}(\mathbf{x}, \mathbf{x}') = \left\{ \sum_{j=1}^{d} (x_j - x_j')^2 \right\}^{1/2}. \tag{33.14}$$
Rather than maximizing directly the minimum of (33.14), most authors [6, 76] find
designs by minimization of
$$\phi(X_n) = \left\{ \sum_{1 \le i < j \le n} \left[ \mathrm{dist}(\mathbf{x}_{ni}, \mathbf{x}_{nj}) \right]^{-q} \right\}^{1/q}, \tag{33.15}$$
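Criterion (33.15) is straightforward to compute; a sketch (the default q = 15 is a common choice in the space-filling literature, assumed here):

```python
import numpy as np
from itertools import combinations

def phi_q(X, q=15):
    """Criterion (33.15): { sum over point pairs of dist^(-q) }^(1/q).
    Minimising it penalises small inter-point distances, approximating
    the maximin objective as q grows."""
    dists = np.array([np.linalg.norm(a - b) for a, b in combinations(X, 2)])
    return float(np.sum(dists ** (-q)) ** (1.0 / q))
```

A design with two nearly coincident points receives a very large penalty, so search algorithms that minimize this criterion push points apart.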
$$\mu_i = \frac{1}{r} \sum_{j=1}^{r} EE_i(\mathbf{x}_j) \tag{33.17}$$
and
$$\sigma_i = \left\{ \sum_{j=1}^{r} \frac{\left[ EE_i(\mathbf{x}_j) - \mu_i \right]^2}{r - 1} \right\}^{1/2}. \tag{33.18}$$
A large value of the mean $\mu_i$ suggests the $i$th input variable is active. Nonlinear
and interaction effects are indicated by large values of $\sigma_i$. Plots of the sensitivity
indices may be used to decide which variables are active and, among these variables,
which have complex (nonadditive and nonlinear) effects.
In addition to (33.17) and (33.18), a further measure [22] of the individual
effect of the $i$th variable has been proposed that overcomes the possible cancellation
of elementary effects in (33.17) due to non-monotonic variable effects, namely,
$$\mu_i^{\star} = \frac{1}{r} \sum_{j=1}^{r} \left| EE_i(\mathbf{x}_j) \right|, \tag{33.19}$$
where $|\cdot|$ denotes the absolute value. Large values of both $\mu_i$ and $\mu_i^{\star}$ suggest that
the $i$th variable is active and has a linear effect on $Y(\mathbf{x})$; large values of $\mu_i^{\star}$ and
small values of $\mu_i$ indicate cancellation in (33.17) and a nonlinear effect for the $i$th
variable.
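The three sample measures are one-liners; the sketch below also demonstrates the cancellation phenomenon just described (the elementary-effect values are invented for illustration):

```python
import numpy as np

def morris_measures(ee):
    """Sample measures (33.17)-(33.19) from the r elementary effects of one
    input: mean mu_i, standard deviation sigma_i, mean absolute value mu*_i."""
    ee = np.asarray(ee, float)
    r = len(ee)
    mu = ee.mean()
    sigma = np.sqrt(np.sum((ee - mu) ** 2) / (r - 1))
    mu_star = np.abs(ee).mean()
    return mu, sigma, mu_star

# A non-monotone effect: the elementary effects cancel in mu but not in mu*
mu, sigma, mu_star = morris_measures([2.0, -2.0, 2.0, -2.0])
```

Here $\mu_i = 0$ while $\mu_i^{\star} = 2$ and $\sigma_i > 0$, exactly the signature of cancellation and a nonlinear effect.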
Four examples of the types of effects which use of (33.17), (33.18) and (33.19)
seeks to identify are shown in Fig. 33.2. These are linear, nonlinear, and interaction
effects corresponding to nonzero values of (33.17) and (33.19) and both zero and
nonzero values of (33.18). It is not possible to use these statistics to distinguish
between nonlinear and interaction effects; see Fig. 33.2b, c.
The future development of a surrogate model can be simplified by separation
of the active variables into two vectors, xS1 and xS2 , where the variables in xS1
have linear effects and the variables in xS2 have nonlinear effects or are involved in
interactions [12]. For example, model (33.1) might be fitted in which $\mathbf{h}(\mathbf{x}_{S1})$ consists
of linear functions and $\varepsilon(\mathbf{x}_{S2})$ is modeled via a Gaussian process with correlation
structure dependent only on variables in $\mathbf{x}_{S2}$.
In the elementary effect literature, the design region is assumed to be X = [0, 1]^d,
after any necessary scaling, and is usually approximated by a d-dimensional
grid, X_G, having f equally spaced values, 0, 1/(f - 1), ..., 1, for each input
variable. The design of a screening experiment to allow the computation of the
sample moments (33.17), (33.18), and (33.19) has three components: the trajectory
vectors x_1, ..., x_r; the m-run sub-design used to calculate EE_i(x_j) (i = 1, ..., d)
for each x_j (j = 1, ..., r); and the stepsize Δ. Choices for these components are now
discussed.
Fig. 33.2 Illustrative examples of effects for an active variable $x_i$ with values of $\mu_i$, $\mu_i^{\star}$, and $\sigma_i$. In
plots (c) and (d), the plotting symbols correspond to the ten levels of a second variable. (a) Linear
effect: $\mu_i > 0$, $\mu_i^{\star} > 0$, $\sigma_i = 0$. (b) Nonlinear effect: $\mu_i^{\star} > 0$, $\sigma_i > 0$. (c) Interaction: $\sigma_i > 0$.
(d) Nonlinear effect and interaction: $\mu_i^{\star} > 0$, $\sigma_i > 0$
$$B = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & 1 & \cdots & 1 & 1 \end{pmatrix}.$$
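Successive rows of B differ in exactly one coordinate, so a base point plus ΔB traces a one-at-a-time trajectory. A simplified sketch (published versions also randomise step signs and restrict the base point to the grid X_G; those refinements are omitted here):

```python
import numpy as np

def morris_trajectory(d, delta, seed=None):
    """One random trajectory of d+1 runs built from B: consecutive runs
    differ by +delta in a single coordinate, the coordinates moving in a
    random order (simplified convention; step signs are not randomised)."""
    rng = np.random.default_rng(seed)
    B = np.tril(np.ones((d + 1, d)), k=-1)   # the strictly lower-triangular B
    P = np.eye(d)[rng.permutation(d)]        # random order of coordinate moves
    x0 = rng.uniform(0, 1 - delta, size=d)   # base point; steps stay in [0,1]
    return x0 + delta * (B @ P)

T = morris_trajectory(5, 0.25, seed=0)       # 6 runs, each step of size 0.25
```

Each of the d steps yields one elementary effect, so r such trajectories supply the r values entering (33.17)-(33.19) at a cost of r(d + 1) runs.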
is now given of variable selection methods for (a) linear models with nonregular and
supersaturated designs and (b) Gaussian process models.
For designs with complex partial aliasing such as supersaturated and nonregular
fractional factorial designs, a wide range of model selection methods have been
proposed. An early suggestion was forward stepwise selection [72], but this was
shown to have low sensitivity in many situations [1, 67]. More recently, evidence
has been provided for the effectiveness of shrinkage regression for the selection of
active effects using data from these more complex designs [34, 67, 83], particularly
use of the Dantzig selector [24]. For this method, estimators $\hat{\boldsymbol{\beta}}$ of the parameters in
model (33.1) are chosen to satisfy
$$\min_{\hat{\boldsymbol{\beta}} \in \mathbb{R}^p} \sum_{u=1}^{p} |\hat{\beta}_u| \quad \text{subject to} \quad \left\| H^T \left( \mathbf{Y}_n - H \hat{\boldsymbol{\beta}} \right) \right\|_\infty \le s, \tag{33.20}$$
with $s$ a tuning constant and $\|\mathbf{a}\|_\infty = \max_i |a_i|$, $\mathbf{a}^T = (a_1, \ldots, a_p)$. This equation
balances the desire for a parsimonious model with the need for models that
adequately describe the data. The value of s can be chosen via an information
criterion [20] (e.g., AIC, AICc, BIC). The solution to (33.20) may be obtained
via linear programming; computationally efficient algorithms exist for calculating
coefficient paths for varying s [48].
The Dantzig selector is applied to choose a subset of potentially active variables,
and then standard least squares is used to fit a reduced linear model. The terms
in this model whose coefficient estimates exceed a threshold t, elicited from
subject experts, are declared active. This procedure is known as the Gauss-Dantzig
selector [24].
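Problem (33.20) becomes a linear program after writing β = u − v with u, v ≥ 0, so it can be solved with a generic LP solver; a sketch using scipy.optimize.linprog (the design, response, and tuning value in the test are illustrative, and production use would rely on a dedicated package such as the flare implementation mentioned later):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(H, y, s):
    """Solve (33.20): min ||beta||_1  s.t.  ||H^T (y - H beta)||_inf <= s,
    as a linear program in (u, v) with beta = u - v, u >= 0, v >= 0."""
    n, p = H.shape
    G = H.T @ H
    c = H.T @ y
    obj = np.ones(2 * p)                      # sum(u) + sum(v) = ||beta||_1
    A_ub = np.vstack([np.hstack([G, -G]),     #  G (u - v) <= c + s
                      np.hstack([-G, G])])    # -G (u - v) <= s - c
    b_ub = np.concatenate([c + s, s - c])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * p))
    return res.x[:p] - res.x[p:]
```

With an orthogonal two-level design (H^T H = nI), the constraints decouple and the solution reduces to soft-thresholding of the least squares estimates, which makes the shrinkage behavior easy to verify by hand.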
Other methods for variable selection that have been effective for these designs
include Bayesian methods that use mixture prior distributions for the elements of
$\boldsymbol{\beta}$ [26, 41] and the application of stochastic optimization algorithms [110].
Screening for a Gaussian process model (33.1), with constant mean ($h(\mathbf{x}) = 1$)
and $\varepsilon(\mathbf{x})$, $\varepsilon(\mathbf{x}')$ having correlation (33.7), may be performed by deciding which $\theta_i$
in (33.7) are large using data obtained from an LHS or other space-filling design.
Two approaches to this problem are outlined below: stepwise Gaussian process
variable selection (SGPVS) [109] and reference distribution variable selection
(RDVS) [63].
In the first method, screening is via stepwise selection of $\theta_i$, $\gamma_i$ in (33.7),
analogous to forward stepwise selection in linear regression models. The SGPVS
algorithm identifies those variables that differ from the majority in their impact on
the response in the following steps:
(i) Find the maximized log-likelihood for model (33.1) subject to $\theta_i = \theta$ and
$\gamma_i = \gamma$ for all $i = 1, \ldots, d$; denote this by $l_0$.
(ii) Set $E = \{1, \ldots, d\}$.
(iii) For each $j \in E$, find the maximized log-likelihood subject to $\theta_k = \theta$ and
$\gamma_k = \gamma$ for all $k \in E \setminus \{j\}$; denote the maximized log-likelihood by $l_j$.
(iv) Let $j^{\star} = \arg\max_{j \in E} l_j$. If $l_{j^{\star}} - l_0 > c$, set $E = E \setminus \{j^{\star}\}$, $l_0 = l_{j^{\star}}$, and go to
step (iii). Otherwise, stop.
In step (iv), the value $c \approx 6$ (the 5% critical value for a $\chi^2_2$ distribution) has been
suggested [109].
The algorithm starts by assuming an isotropic correlation function and, at each
iteration, at most one variable is allocated individual correlation parameters. The
initial model contains only four parameters, and the largest model considered has
4d + 2 parameters. However, factor sparsity suggests that the algorithm usually
terminates before models of this size are considered. Hence, smaller experiments
can be used than are normally employed for GP regression in d variables (e.g.,
d = 20 and n = 40 or 50 [109]). This approach essentially adds one variable at a
time to the model. Hence it has, potentially, similar issues as stepwise regression for
linear models; in particular, the space of possible GP models is not very thoroughly
explored.
The second method, RDVS, is a fully Bayesian approach to the GP variable selection
problem. A Bayesian treatment of model (33.1) with correlation function (33.7)
requires numerical methods, such as Markov chain Monte Carlo (MCMC), to
obtain an approximate joint posterior distribution of $\theta_1, \ldots, \theta_d$ (in RDVS, $\gamma_i = 2$
for all $i$). Conjugate prior distributions can be used for $\beta_0$ and $\sigma^2$ to reduce the
computational complexity of this approximation. In RDVS, a prior distribution,
formed as a mixture of a standard uniform distribution on [0, 1] and a point mass
at 1, is assigned to $\rho_i = \exp(-0.25\,\theta_i)$. This reparameterization of $\theta_i$ aids the
implementation of MCMC algorithms and provides an intuitive interpretation: a
small value $0 < \rho_i < 1$ corresponds to an active variable with the response changing
rapidly with respect to the $i$th variable.
To screen variables using RDVS, the design matrix $X_n$ for the experiment is
augmented by a $(d + 1)$th column corresponding to an inert variable (having no
substantive effect on the response) whose values are set at random. The posterior
median for the correlation parameter, $\theta_{d+1}$, of the inert variable is computed for
b different randomly generated design matrices, formed by sampling values for the
inert variable, to obtain an empirical reference distribution for the median of $\theta_{d+1}$. The
percentiles of this reference distribution can be used to assess the importance of the
real variables via the size of the corresponding correlation parameters.
For methods that also incorporate variable selection into the Gaussian process
mean function, i.e., incorporating the choice of functions in h.x/, see [68, 80].
In this section, six combinations of the design and modeling strategies discussed
in this paper are demonstrated and compared for variable screening using two test
functions from the literature having d = 20 variables and X = [-1, 1]^d. The
functions differ in the number of active variables and the strength of influence of
these variables on the output.
$$\begin{aligned} Y(\mathbf{x}) = {} & \frac{5 w_{12}}{1 + w_1} + 5 (w_4 - w_{20})^2 + w_5 + 40 w_{19}^3 - 5 w_{19} \\ & + 0.05 w_2 + 0.08 w_3 - 0.03 w_6 + 0.03 w_7 - 0.09 w_9 - 0.01 w_{10} \\ & - 0.07 w_{11} + 0.25 w_{13}^2 - 0.04 w_{14} + 0.06 w_{15} - 0.01 w_{17} - 0.03 w_{18}, \end{aligned} \tag{33.21}$$
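Function (33.21), as reconstructed here, can be coded directly. The excerpt does not reproduce the transformation from x to the w_i, so the sketch takes the 20-vector w as its input:

```python
import numpy as np

def example1_response(w):
    """Test function (33.21); w is a 20-vector (w_1, ..., w_20), held here
    in 0-based positions w[0], ..., w[19]."""
    w = np.asarray(w, float)
    return (5.0 * w[11] / (1.0 + w[0]) + 5.0 * (w[3] - w[19]) ** 2
            + w[4] + 40.0 * w[18] ** 3 - 5.0 * w[18]
            + 0.05 * w[1] + 0.08 * w[2] - 0.03 * w[5] + 0.03 * w[6]
            - 0.09 * w[8] - 0.01 * w[9] - 0.07 * w[10] + 0.25 * w[12] ** 2
            - 0.04 * w[13] + 0.06 * w[14] - 0.01 * w[16] - 0.03 * w[17])
```

Only a handful of variables (w1, w4, w5, w12, w19, w20) carry large coefficients; the rest contribute weakly, which is the factor-sparse structure the screening methods are meant to detect.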
$$Y(\mathbf{x}) = \beta_0 + \sum_{j=1}^{20} \beta_j v_j + \sum_{1 \le j < k \le 20} \beta_{jk} v_j v_k + \sum_{1 \le j < k < l \le 20} \beta_{jkl} v_j v_k v_l + \sum_{1 \le j < k < l < u \le 20} \beta_{jklu} v_j v_k v_l v_u, \tag{33.22}$$
There are some important differences between these two examples: func-
tion (33.21) has a proportion of active variables (0.3) in line with factor sparsity,
but the influence of many of these active variables on the response is only small;
function (33.22) is not effect sparse (with 50% of the variables being active), but
the active variables have much stronger influence on the response. This second
function also contains many more interactions. Thus, the examples present different
screening challenges. For these two deterministic functions, a single data set was
generated for each design employed.
The designs used are an SSD with n = 16 runs and a DSD with n = 41 runs.
For each design, variable selection is performed using the Dantzig selector as
implemented in the R package flare [60], with shrinkage parameter s chosen using
AICc. Note that these designs are tailored to screening in linear models and hence
may not perform well when the output is inadequately described by a linear model.
Tables 33.6 and 33.7 summarize the results for Examples 1 and 2, respectively,
and present the sensitivity (s), type I error rate (I), and false discovery rate
(fdr) for the six methods. The summaries reported in these tables use automatic
selection of tuning parameters in each method; see below. More nuanced screening,
for example, using graphical methods, may produce different trade-offs between
sensitivity, type I error rate, and false discovery rate. In general, with the exception
of the EE method, results are better for Example 1, which obeys factor sparsity, than
for Example 2, which does not.
For the Gaussian process methods (SGPVS and RDVS), the assessments
presented in Tables 33.6 and 33.7 are the better of the results obtained from the maximin
Latin hypercube sampling design and the maximum projection space-filling design.
In Example 1, for n = 16, 41, 84 the maximin LHS design gave the better
performance, while for n = 200 the maximum projection design was preferred.
In Example 2, the maximum projection design was preferred for n = 16, 200
Table 33.6 Sensitivity (s), type I error rate (I), and false discovery rate (fdr) for Example 1

Gaussian processes and space-filling designs
          n = 16           n = 41          n = 84         n = 200
          s     I    fdr   s     I   fdr   s    I   fdr   s         I   fdr
SGPVS     0.33  0.07 0.33  1     0   0     1    0   0     0.67 (1)  0   0
RDVS      0.17  0    0     0.33  0   0     1    0   0     1         0   0

One-factor-at-a-time designs
          n = 42           n = 84          n = 210
EE        0.5   0    0     0.83  0   0     0.83 0   0
SFRD      0.83^a (1)^b 0 0

Nonregular fractional factorial designs and linear models
          n = 16           n = 41
SSD       0.50  0.14 0.40
DSD                        0.17 (0.33)^c 0 0

Using (33.7) with the power parameter set to 1
^a Using a threshold of 5% on the sensitivity indices
^b Using a threshold of 1% on the sensitivity indices
^c Sensitivity for a main effects only model
Table 33.7 Sensitivity (s), type I error rate (I), and false discovery rate (fdr) for Example 2

Gaussian processes and space-filling designs
          n = 16           n = 41          n = 84          n = 200
          s     I    fdr   s     I   fdr   s     I   fdr   s    I   fdr
SGPVS     0     0    0     0.40  0   0     1     0   0     1    0   0
RDVS      0     0    0     0.30  0   0     0.80  0   0     1    0   0

One-factor-at-a-time designs
          n = 42           n = 84          n = 210
EE        0.80  0    0     1     0   0     1     0   0
SFRD      0.60^a (1)^b 0 0

Nonregular fractional factorial designs and linear models
          n = 16           n = 41
SSD       0.60  0.90 0.60
DSD                        0.10 (0)^c 0 0

^a Using a threshold of 5% on the sensitivity indices
^b Using a threshold of 1% on the sensitivity indices
^c Sensitivity for a main effects only model
and the maximin LHS design for n = 41, 84. For Example 1, note that, rather
counterintuitively, for correlation function (33.7) with the power parameter set to 2,
SGPVS has lower sensitivity for n = 200 than for n = 41 or n = 84. A working
hypothesis to explain this result is that larger designs, with points closer together
in the design space, can present challenges in estimating the correlation parameters
when the correlation function is very smooth. Setting the power parameter to 1, so
that the correlation function is less smooth, resulted in better screening for n = 200.
The choice of design for Gaussian process screening is an area for further research,
as is the application to screening of extensions of the Gaussian process model to
high-dimensional inputs [37, 43].
SGPVS and RDVS performed well when used with larger designs (n = 84, 200)
for both examples. For Example 1, the effectiveness of SGPVS has already been
demonstrated [109] for a Latin hypercube design with n = 50 runs. In the current
study, the method was also effective when n = 41. Neither RDVS nor SGPVS
provided reliable screening when n = 16. Both methods found Example 2, with
a greater proportion of active variables, more challenging; effective screening was
only achieved when n = 84, 200.
For RDVS, recall that the empirical distribution of the posterior median of
the correlation parameter for the inert variable is used as a reference distribution
to assess the size of the correlation parameters for the actual variables. For the
assessments in Tables 33.6 and 33.7, a variable was declared active if the posterior
median of its correlation parameter exceeded the 90th percentile of the reference
distribution. A graphical analysis can provide a more detailed assessment of the
method. Figure 33.3 shows the reference distribution and the posterior median of
the correlation parameter for each of the 20 variables. Variables declared active have
their posterior medians in the right-hand tail of the reference distribution. For both
examples, the greater effectiveness of RDVS for larger n is clear. It is interesting to
note that choosing a smaller percentile as the threshold to declare a variable active,
for example, the 80th, would have resulted in considerably higher type I error and
false discovery rates for n = 16, 41.
For the EE method, performance was assessed by visual inspection of plots of
σ_i against μ*_i (i = 1,…,20); see Fig. 33.4. A number of different samples of
trajectory vectors x_1,…,x_r were used, and similar results were obtained for each. For
Example 1, where active variables have a smaller influence on the response, the EE
method struggled to identify all the active variables. Variable x19 was consistently
declared active, having a nonlinear effect. For larger n, variables x1, x4, x12, and
x20 were also identified. For Example 2, with larger active effects, the performance
was better. All active variables are identified when n = 84 (as also demonstrated
by Morris [74]) and when n = 200. Performance was also strong for n = 42, with
only x3 and x7 not identified.
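The elementary effects behind these plots can be sketched with simple one-at-a-time differencing. The code below is a simplified illustration (random base points rather than the trajectory designs of Morris [74]; function name and defaults are ours): for each base point, each input is perturbed in turn and the scaled difference quotient recorded, then summarized by its mean absolute value and standard deviation per input:

```python
import random
import statistics

def elementary_effects(f, d, r=20, delta=0.1, seed=0):
    # For r random base points in [-1, 1 - delta]^d, perturb each of the
    # d inputs in turn by delta and record the scaled difference quotient
    # (an elementary effect).
    rng = random.Random(seed)
    effects = [[] for _ in range(d)]
    for _ in range(r):
        x = [rng.uniform(-1.0, 1.0 - delta) for _ in range(d)]
        fx = f(x)
        for i in range(d):
            xi = list(x)
            xi[i] += delta
            effects[i].append((f(xi) - fx) / delta)
    mu_star = [statistics.fmean(abs(e) for e in eff) for eff in effects]
    sigma = [statistics.stdev(eff) for eff in effects]
    return mu_star, sigma
```

A large mean absolute effect with small standard deviation suggests a strong linear influence; a large standard deviation signals nonlinearity or interaction, which is why both quantities are plotted.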
For both examples, relatively effective screening was achieved through use of an
SFRD to estimate sensitivity indices (33.13). These estimated indices are displayed
in Fig. 33.5. Using a threshold of S(i) > 0.05 leads to a small number of active
variables being missed (one in Example 1 and four in Example 2); the choice of the
lower threshold of S(i) > 0.01 results in all active variables being identified in both
examples and no type I errors being made.
A study of the two true functions provides some understanding of the strong
performance of the SFRD. Both functions can be reasonably well approximated
(via Taylor series expansions) by linear models involving main effect and interaction
terms, with no cancellation of main effects with three-variable interactions or of
two-variable with four-variable interactions. It is not difficult to modify the two functions
to achieve a substantial reduction in the effectiveness of the SFRD. In Example 1,
replacement of the term 5(w_4 − w_{20})^2 by 5w_4^2 − 5w_{20}^2 produces a function which
is highly nonlinear in w_4 and w_{20}. Screening for this function resulted in these two
variables being no longer declared active when the SFRD was used. For Example 2,
Fig. 33.3 RDVS results: histograms of the empirical distribution of the posterior median of the
correlation parameter for 1000 randomly generated inert variables. The posterior medians of the
correlation parameters for the 20 variables are marked as vertical lines (unbroken, declared active;
broken, declared inactive). If there are fewer than five variables declared active, the variable names
are also given. (a) Example 1: n = 16. (b) Example 2: n = 16. (c) Example 1: n = 41.
(d) Example 2: n = 41. (e) Example 1: n = 84. (f) Example 2: n = 84. (g) Example 1: n = 200.
(h) Example 2: n = 200
Fig. 33.4 EE results: plots of σ_i (33.19) against μ*_i (33.18) for Examples 1 and 2. Labels indicate
variables declared active by visual inspection. (a) Example 1: n = 42. (b) Example 2: n = 42. (c)
Example 1: n = 84. (d) Example 2: n = 84. (e) Example 1: n = 210. (f) Example 2: n = 210
Fig. 33.5 Sensitivity indices (33.13) from the SFRD for Examples 1 and 2 for variables x1–x20
Fig. 33.6 Supersaturated (SSD) and definitive screening design (DSD) results: plots of penalty
parameter s against parameter estimates. In each plot, the variables and effects declared active are
labeled, with the exception of plot (b) where 15 variables are declared active. (a) Example 1: SSD
n = 16. (b) Example 2: SSD n = 16. (c) Example 1: DSD n = 41. (d) Example 2: DSD n = 41
The nonregular designs (SSD and DSD) provide an interesting contrast to the
other methods, all of which are tailored to, or have been suggested for, variable
screening for nonlinear numerical models. For the SSD, only main effects are
considered; for the DSD, the surrogate model can include main effects, two-variable
interactions, and quadratic terms. Figure 33.6 gives shrinkage plots for each of the
SSD and DSD which show the estimated model parameters against the shrinkage
parameter s in (33.20). As s → 0, the shrinkage of the estimated parameters is
reduced; at s = 0, there would be no shrinkage, which is not possible for designs
with n < p. These plots may be used to choose the active variables, namely, those
involved in at least one effect whose corresponding estimated model parameter is
Fig. 33.7 Definitive screening design (DSD) results for main effects only: half-normal plots.
(a) Example 1: DSD n = 41 (main effects). (b) Example 2: DSD n = 41 (main effects)
nonzero for larger values of s. In each plot, the value of s chosen by AICc is marked
by a vertical line. Figure 33.6a, c shows shrinkage plots for Example 1 from which
the dominant active variables are easily identified, although a number of smaller
active variables are missed. For Example 2, Fig. 33.6b, d is much harder to interpret,
because they have a number of moderately large estimated parameters. This reflects
the larger number of active variables in Example 2. Clearly, the effectiveness of both
methods for this second example is limited.
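The qualitative behavior of such shrinkage paths is easy to see in the orthogonal-design special case, where L1-type shrinkage has a closed-form soft-thresholding solution (a textbook simplification for illustration, not the Dantzig-selector computation used in the study):

```python
def soft_threshold(b, s):
    # L1 shrinkage of a least-squares estimate b under an orthogonal design:
    # the estimate is pulled toward zero by s and set to zero once s >= |b|.
    sign = 1.0 if b >= 0 else -1.0
    return sign * max(abs(b) - s, 0.0)

# Shrinkage path for three effects: one dominant, one moderate, one tiny.
ls_estimates = [2.0, 0.6, 0.05]
path = {s: [soft_threshold(b, s) for b in ls_estimates]
        for s in (1.0, 0.5, 0.1, 0.0)}
```

As s decreases toward 0, estimates leave zero one by one and approach the unshrunk least-squares values, which is exactly the pattern read off a shrinkage plot: dominant effects are nonzero over a wide range of s, while small effects appear only near s = 0.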
To provide a further comparison between the SSD and DSD, data from the latter
design were also analyzed using a main effects only surrogate model (33.2). For
these models, the DSD is an orthogonal design and hence standard linear model
analyses are appropriate. To summarize the results, Fig. 33.7 gives half-normal
plots [30] for each example. Here, the ordered absolute values of the estimated
main effects are plotted against theoretical half-normal quantiles. Variables whose
estimated main effects stand away from a straight line are declared active, such as
x12 and x19 in Example 1; see Fig. 33.7a. No variables are identified as active for
Example 2. These results agree with t-tests on the estimated model parameters.
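The half-normal plotting positions are straightforward to compute. A minimal sketch (one common choice of plotting positions; conventions vary between texts, and the function name is ours):

```python
from statistics import NormalDist

def half_normal_plot_points(effects):
    # Order the absolute estimated effects and pair them with theoretical
    # half-normal quantiles Phi^{-1}(0.5 + 0.5 * (i - 0.5) / m), i = 1..m.
    m = len(effects)
    ordered = sorted(abs(e) for e in effects)
    quantiles = [NormalDist().inv_cdf(0.5 + 0.5 * (i - 0.5) / m)
                 for i in range(1, m + 1)]
    return quantiles, ordered
```

Plotting `ordered` against `quantiles` reproduces Fig. 33.7: inert effects fall near a straight line through the origin, and active effects stand away from it.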
7 Conclusions
Screening with numerical models is a challenging problem, due to the large number
of input variables that are typically under investigation and the complex nonlinear
relationships between these variables and the model outputs.
The results from the study in the previous section highlight these challenges
and the dangers of attempting screening using experiments that are too small or
are predicated on linear model methodology. For this study of d = 20 variables
and six screening methods, a sample size of at least n = 40 was required for
effective screening, with more runs needed when factor sparsity did not hold. The
EE method was the most effective and robust method for screening, with the highly
resource-efficient SSD and DSD being the least effective here. Of course, these two
nonregular designs were not developed for the purpose of screening variables in
nonlinear functions; in contrast to the SFRD, neither explicitly incorporates higher-
order interactions, and the SSD suffers from partial aliasing between main effects.
The two Gaussian process methods, RDVS and SGPVS, required a greater number of
runs to provide sensitive screening.
Methods that use Gaussian process models have the advantage of also providing
predictive models for the response. Building such models with the EE or SFRD
methods is likely to require additional experimentation. In common with screening
via physical experiments, a sequential screening strategy, where possible, is likely to
be more effective. Here, a small initial experiment could be run, for example, using
the EE method, with more targeted follow-up experimentation and model building
focused on a subset of variables using a Gaussian process modeling approach.
Cross-References
References
1. Abraham, B., Chipman, H., Vijayan, K.: Some risks in the construction and analysis of supersaturated designs. Technometrics 41, 135–141 (1999)
2. Andres, T.H., Hajas, W.C.: Using iterated fractional factorial design to screen parameters in sensitivity analysis of a probabilistic risk assessment model. In: Proceedings of Joint International Conference on Mathematical Methods and Supercomputing in Nuclear Applications, Karlsruhe, pp. 328–340 (1993)
3. Ankenman, B.E., Cheng, R.C.H., Lewis, S.M.: Screening for dispersion effects by sequential bifurcation. ACM Trans. Model. Comput. Simul. 25, 2:1–2:27 (2014)
4. Atkinson, A.C., Donev, A.N., Tobias, R.D.: Optimum Experimental Designs, with SAS, 2nd edn. Oxford University Press, Oxford (2007)
5. Ba, S.: SLHD: Maximin-Distance (Sliced) Latin Hypercube Designs. http://CRAN.R-project.org/package=SLHD (2015). R package version 2.1-1
6. Ba, S., Brenneman, W.A., Myers, W.R.: Optimal sliced Latin hypercube designs. Technometrics 57, 479–487 (2015)
33. Dorfman, R.: The detection of defective members of large populations. Ann. Math. Stat. 14, 436–440 (1943)
34. Draguljic, D., Woods, D.C., Dean, A.M., Lewis, S.M., Vine, A.E.: Screening strategies in the presence of interactions (with discussion). Technometrics 56, 1–28 (2014)
35. DuMouchel, W., Jones, B.A.: A simple Bayesian modification of D-optimal designs to reduce dependence on an assumed model. Technometrics 36, 37–47 (1994)
36. Dupuy, D., Corre, B., Claeys-Bruno, M., Sergent, M.: Comparison of different screening methods. Case Stud. Bus. Ind. Gov. Stat. 5, 115–125 (2014)
37. Durrande, N., Ginsbourger, D., Roustant, O.: Additive covariance kernels for high-dimensional Gaussian process modeling. Annales de la Faculté des Sciences de Toulouse 21, 481–499 (2012)
38. Fang, K.T., Lin, D.K.J., Winker, P., Zhang, Y.: Uniform design: theory and application. Technometrics 42, 237–248 (2000)
39. Finney, D.J.: The fractional replication of factorial arrangements. Ann. Eugen. 12, 291–301 (1943)
40. Franco, J., Dupuy, D., Roustant, O., Damblin, G., Iooss, B.: DiceDesign: Design of Computer Experiments. http://CRAN.R-project.org/package=DiceDesign (2014). R package version 1.6
41. George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993)
42. Gilmour, S.G.: Factor screening via supersaturated designs. In: Dean, A.M., Lewis, S.M. (eds.) Screening: Methods for Experimentation in Industry, Drug Discovery and Genetics, pp. 169–190. Springer, New York (2006)
43. Gramacy, R.B., Lee, H.K.H.: Bayesian treed Gaussian process models with an application to computer modeling. J. Am. Stat. Assoc. 103, 1119–1130 (2008)
44. Hall, M.J.: Combinatorial Theory. Blaisdell, Waltham (1967)
45. Hamada, M., Balakrishnan, N.: Analyzing unreplicated factorial experiments: a review with some new proposals. Statistica Sinica 8, 1–41 (1998)
46. Hamada, M., Wu, C.F.J.: Analysis of designed experiments with complex aliasing. J. Qual. Technol. 24, 130–137 (1992)
47. Iman, R.L., Conover, W.J.: A distribution-free approach to inducing rank correlation among input variables. Commun. Stat. Simul. Comput. 11, 311–334 (1982)
48. James, G.M., Radchenko, P., Lv, J.: DASSO: connections between the Dantzig selector and lasso. J. R. Stat. Soc. B 71, 127–142 (2009)
49. Jin, R., Chen, W., Sudjianto, A.: An efficient algorithm for constructing optimal design of computer experiments. J. Stat. Plan. Inference 134, 268–287 (2005)
50. Johnson, M., Moore, L.M., Ylvisaker, D.: Minimax and maximin distance designs. J. Stat. Plan. Inference 26, 131–148 (1990)
51. Jones, B.A., Lin, D.K.J., Nachtsheim, C.J.: Bayesian D-optimal supersaturated designs. J. Stat. Plan. Inference 138, 86–92 (2008)
52. Jones, B.A., Majumdar, D.: Optimal supersaturated designs. J. Am. Stat. Assoc. 109, 1592–1600 (2014)
53. Jones, B.A., Nachtsheim, C.J.: A class of three-level designs for definitive screening in the presence of second-order effects. J. Qual. Technol. 43, 1–15 (2011)
54. Joseph, R., Gul, E., Ba, S.: Maximum projection designs for computer experiments. Biometrika 102, 371–380 (2015)
55. Kleijnen, J.P.C.: Design and Analysis of Simulation Experiments, 2nd edn. Springer, New York (2015)
56. Lenth, R.V.: Quick and easy analysis of unreplicated factorials. Technometrics 31, 469–473 (1989)
57. Lewis, S.M., Dean, A.M.: Detection of interactions in experiments on large numbers of factors (with discussion). J. R. Stat. Soc. B 63, 633–672 (2001)
58. Li, W.: Screening designs for model selection. In: Dean, A.M., Lewis, S.M. (eds.) Screening: Methods for Experimentation in Industry, Drug Discovery and Genetics, pp. 207–234. Springer, New York (2006)
59. Li, W.W., Wu, C.F.J.: Columnwise-pairwise algorithms with applications to the construction of supersaturated designs. Technometrics 39, 171–179 (1997)
60. Li, X., Zhao, T., Wong, L., Yuan, X., Liu, H.: flare: Family of Lasso Regression. http://CRAN.R-project.org/package=flare (2014). R package version 1.5.0
61. Lin, D.K.J.: A new class of supersaturated designs. Technometrics 35, 28–31 (1993)
62. Lin, D.K.J.: Generating systematic supersaturated designs. Technometrics 37, 213–225 (1995)
63. Linkletter, C., Bingham, D., Hengartner, N., Higdon, D., Ye, K.Q.: Variable selection for Gaussian process models in computer experiments. Technometrics 48, 478–490 (2006)
64. Liu, Y., Dean, A.M.: K-circulant supersaturated designs. Technometrics 46, 32–43 (2004)
65. Liu, M., Fang, K.T.: A case study in the application of supersaturated designs to computer experiments. Acta Mathematica Scientia 26B, 595–602 (2006)
66. Loeppky, J.L., Sacks, J., Welch, W.J.: Choosing the sample size of a computer experiment: a practical guide. Technometrics 51, 366–376 (2009)
67. Marley, C.J., Woods, D.C.: A comparison of design and model selection methods for supersaturated experiments. Comput. Stat. Data Anal. 54, 3158–3167 (2010)
68. Marrel, A., Iooss, B., Van Dorpe, F., Volkova, E.: An efficient methodology for modeling complex computer codes with Gaussian processes. Comput. Stat. Data Anal. 52, 4731–4744 (2008)
69. Mauro, C.A., Smith, D.E.: The performance of two-stage group screening in factor screening experiments. Technometrics 24, 325–330 (1982)
70. McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245 (1979)
71. Meyer, R.K., Nachtsheim, C.J.: The coordinate-exchange algorithm for constructing exact optimal experimental designs. Technometrics 37, 60–69 (1995)
72. Miller, A.: Subset Selection in Regression, 2nd edn. Chapman & Hall/CRC, Boca Raton (2002)
73. Moon, H., Dean, A.M., Santner, T.J.: Two-stage sensitivity-based group screening in computer experiments. Technometrics 54, 376–387 (2012)
74. Morris, M.D.: Factorial sampling plans for preliminary computational experiments. Technometrics 33, 161–174 (1991)
75. Morris, M.D.: An overview of group factor screening. In: Dean, A.M., Lewis, S.M. (eds.) Screening: Methods for Experimentation in Industry, Drug Discovery and Genetics, pp. 191–206. Springer, New York (2006)
76. Morris, M.D., Mitchell, T.J.: Exploratory designs for computer experiments. J. Stat. Plan. Inference 43, 381–402 (1995)
77. Nguyen, N.K.: An algorithmic approach to constructing supersaturated designs. Technometrics 38, 69–73 (1996)
78. Nguyen, N.K., Cheng, C.S.: New E(s²)-optimal supersaturated designs constructed from incomplete block designs. Technometrics 50, 26–31 (2008)
79. Nguyen, N.K., Stylianou, S.: Constructing definitive screening designs using cyclic generators. J. Stat. Theory Pract. 7, 713–724 (2012)
80. Overstall, A.M., Woods, D.C.: Multivariate emulation of computer simulators: model selection and diagnostics with application to a humanitarian relief model. J. R. Stat. Soc. C, in press (2016). DOI:10.1111/rssc.12141
81. Owen, A.B.: Orthogonal arrays for computer experiments, integration and visualisation. Statistica Sinica 2, 439–452 (1992)
82. Phoa, F.K.H., Lin, D.K.J.: A systematic approach for the construction of definitive screening designs. Statistica Sinica 25, 853–861 (2015)
83. Phoa, F.K.H., Pan, Y.H., Xu, H.: Analysis of supersaturated designs via the Dantzig selector. J. Stat. Plan. Inference 139, 2362–2372 (2009)
84. Plackett, R.L., Burman, J.P.: The design of optimum multifactorial experiments. Biometrika 33, 305–325 (1946)
85. Pronzato, L., Müller, W.G.: Design of computer experiments: space filling and beyond. Stat. Comput. 22, 681–701 (2012)
86. Pujol, G.: Simplex-based screening designs for estimating meta-models. Reliab. Eng. Syst. Saf. 94, 1156–1160 (2009)
87. Pujol, G., Iooss, B., Janon, A.: sensitivity: Sensitivity Analysis. http://CRAN.R-project.org/package=sensitivity (2015). R package version 1.11
88. Qian, P.Z.G.: Sliced Latin hypercube designs. J. Am. Stat. Assoc. 107, 393–399 (2012)
89. Qian, P.Z.G., Wu, C.F.J.: Sliced space-filling designs. Biometrika 96, 945–956 (2009)
90. Qian, P.Z.G., Wu, H., Wu, C.F.J.: Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics 50, 383–396 (2008)
91. Qu, X., Wu, C.F.J.: One-factor-at-a-time designs of resolution V. J. Stat. Plan. Inference 131, 407–416 (2005)
92. Rao, C.R.: Factorial experiments derivable from combinatorial arrangements of arrays. J. R. Stat. Soc. Suppl. 9, 128–139 (1947)
93. Rasmussen, C.E., Williams, C.: Gaussian Processes for Machine Learning. MIT, Cambridge (2006)
94. Ryan, K.J., Bulutoglu, D.A.: E(s²)-optimal supersaturated designs with good minimax properties. J. Stat. Plan. Inference 137, 2250–2262 (2007)
95. Saltelli, A., Andres, T.H., Homma, T.: Sensitivity analysis of model output. An investigation of new techniques. Comput. Stat. Data Anal. 15, 211–238 (1993)
96. Saltelli, A., Andres, T.H., Homma, T.: Sensitivity analysis of model output. Performance of the iterated fractional factorial design method. Comput. Stat. Data Anal. 20, 387–407 (1995)
97. Santner, T.J., Williams, B.J., Notz, W.I.: The Design and Analysis of Computer Experiments. Springer, New York (2003)
98. Satterthwaite, F.: Random balance experimentation. Technometrics 1, 111–137 (1959)
99. Scinto, P.R., Wilkinson, R.G., Wang, Z., Rose, A.D.: Comment: need for guidelines on appropriate screening designs for practitioners. Technometrics 56, 23–24 (2014)
100. Scott-Drechsel, D., Su, Z., Hunter, K., Li, M., Shandas, R., Tan, W.: A new flow co-culture system for studying mechanobiology effects of pulse flow waves. Cytotechnology 64, 649–666 (2012)
101. Sun, D.X., Li, W., Ye, K.Q.: An algorithm for sequentially constructing non-isomorphic orthogonal designs and its applications. Technical report SUNYSB-AMS-02-13, Department of Applied Mathematics, SUNY at Stony Brook, New York (2002)
102. Tang, B.: Orthogonal array-based Latin hypercubes. J. Am. Stat. Assoc. 88, 1392–1397 (1993)
103. Tang, B.: Selecting Latin hypercubes using correlation criteria. Statistica Sinica 8, 965–977 (1998)
104. Vine, A.E., Lewis, S.M., Dean, A.M.: Two-stage group screening in the presence of noise factors and unequal probabilities of active effects. Statistica Sinica 15, 871–888 (2005)
105. Voss, D.T., Wang, W.: Analysis of orthogonal saturated designs. In: Dean, A.M., Lewis, S.M. (eds.) Screening: Methods for Experimentation in Industry, Drug Discovery and Genetics, pp. 268–286. Springer, New York (2006)
106. Wan, H., Ankenman, B.E., Nelson, B.L.: Controlled sequential bifurcation: a new factor-screening method for discrete-event simulation. Oper. Res. 54, 743–755 (2006)
107. Wan, H., Ankenman, B.E., Nelson, B.L.: Improving the efficiency and efficacy of controlled sequential bifurcation for simulation factor screening. INFORMS J. Comput. 3, 482–492 (2010)
108. Watson, G.S.: A study of the group screening method. Technometrics 3, 371–388 (1961)
109. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J., Morris, M.D.: Screening, predicting and computer experiments. Technometrics 34, 15–25 (1992)
110. Wolters, M.A., Bingham, D.R.: Simulated annealing model search for subset selection in screening experiments. Technometrics 53, 225–237 (2011)
111. Wu, C.F.J.: Construction of supersaturated designs through partially aliased interactions. Biometrika 80, 661–669 (1993)
112. Wu, C.F.J., Hamada, M.: Experiments: Planning, Analysis and Optimization, 2nd edn. Wiley, Hoboken (2009)
113. Xiao, L., Lin, D.K.J., Bai, F.: Constructing definitive screening designs using conference matrices. J. Qual. Technol. 44, 2–8 (2012)
114. Xing, D., Wan, H., Zhu, Y., Sanchez, S.M., Kaymal, T.: Simulation screening experiments using Lasso-optimal supersaturated design and analysis: a maritime operations application. In: Pasupathy, R., Kim, S.H., Tolk, A., Hill, R., Kuhl, M.E. (eds.) Proceedings of the 2013 Winter Simulation Conference, Washington, DC, pp. 497–508 (2013)
115. Xu, H., Phoa, F.K.H., Wong, W.K.: Recent developments in nonregular fractional factorial designs. Stat. Surv. 3, 18–46 (2009)
116. Yang, H., Butz, K.D., Duffy, D., Niebur, G.L., Nauman, E.A., Main, R.P.: Characterization of cancellous and cortical bone strain in the in vivo mouse tibial loading model using microCT-based finite element analysis. Bone 66, 131–139 (2014)
117. Ye, K.Q.: Orthogonal column Latin hypercubes and their application in computer experiments. J. Am. Stat. Assoc. 93, 1430–1439 (1998)
34 Weights and Importance in Composite Indicators: Mind the Gap

William Becker, Paolo Paruolo, Michaela Saisana, and Andrea Saltelli
Abstract
Multidimensional measures (often termed composite indicators) are popular
tools in the public discourse for assessing the performance of countries on
human development, perceived corruption, innovation, competitiveness, or other
complex phenomena. These measures combine a set of variables using an
aggregation formula, which is often a weighted arithmetic average. The values
of the weights are usually meant to reflect the variables' importance in the index.
This paper uses measures drawn from global sensitivity analysis, specifically
the Pearson correlation ratio, to discuss to what extent the importance of each
variable coincides with the intentions of the developers. Two nonparametric
regression approaches are used to provide alternative estimates of the correlation
ratios, which are compared with linear measures. The relative advantages of
different estimation procedures are discussed. Three case studies are investigated:
the Resource Governance Index, the Good Country Index, and the Financial
Secrecy Index.
Keywords
Composite indicators · Nonlinear regression · Sensitivity analysis
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1188
2 Measures of Importance and Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1191
2.1 Transformations and Weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1191
2.2 Importance Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1192
2.3 Relation to Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1195
3 Estimating Main Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1196
3.1 Polynomial Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1196
3.2 Penalized Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1197
3.3 Local Polynomial Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1198
3.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199
4 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199
4.1 Resource Governance Index (RGI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1200
4.2 Financial Secrecy Index (FSI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1204
4.3 Good Country Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209
5 Discussion on Estimation Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1212
6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1213
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1215
1 Introduction
different estimation methods for the conditional expectations are discussed. Results
obtained for nonlinear regression are compared with those obtained by linear
regression, and the added value of nonlinearity is assessed in concrete case studies.
The approach of this chapter lends itself to the possibility of selecting nominal
weights which imply the intended importance (correlation ratios) of each input
variable by searching through the weight space (for a short discussion on reverse
engineering the weights, see [13]). An additional feature of this approach is that, as a
by-product of the analysis, it provides an estimate of the conditional expectation of the
composite indicator as a function of a single input variable. The local slope of this
conditional expectation answers the related research question: how much would the
composite indicator increase for a marginal increase of the input variable (averaged
over variations in the other variables)? This question may be of interest when
discussing alternative policy measures geared to influence different input variables.
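As a concrete illustration of the correlation ratio Var(E[Y|X_i])/Var(Y), a crude binning estimator can stand in for the regression-based estimators discussed later in the chapter (this sketch and its names are ours, not the chapter's method): slice X_i into equal-count bins, average Y within each slice, and compare the variance of those averages with the total variance of Y:

```python
import statistics

def correlation_ratio(x, y, bins=10):
    # Slice x into equal-count bins (by rank) and compare the variance of
    # the within-bin means of y with the total variance of y.
    n = len(x)
    order = sorted(range(n), key=lambda j: x[j])
    grand = statistics.fmean(y)
    var_between = 0.0
    for b in range(bins):
        idx = order[b * n // bins:(b + 1) * n // bins]
        if idx:
            m = statistics.fmean(y[j] for j in idx)
            var_between += len(idx) * (m - grand) ** 2
    return (var_between / n) / statistics.pvariance(y)
```

The estimate approaches 1 when Y is (any) deterministic function of X_i alone and is near 0 when Y does not depend on X_i, which is what makes it a natural nonlinear generalization of the squared linear correlation.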
The remainder of this chapter is organized as follows: the construction of
composite indicators and some measures of variable importance are reviewed first,
including the correlation ratio. A description is then given of linear and nonlinear
approaches to estimation of the correlation ratio. Three case studies are used to
illustrate the relative merits of each estimation approach: the Resource Governance
Index, which aims to measure transparency and accountability in the oil, gas, and
mining sectors; the Financial Secrecy Index, which measures secrecy and scope for
abuse in the financial sector for each country; and the Good Country Index, which
aims to measure to what extent a given country contributes to some preestablished
normative goods for humanity. In these case studies, the relative strengths of the
correlation ratio compared to linear measures of dependence are also discussed.
$$y_j = \sum_{i=1}^{d} w_i \, x_{ji}, \qquad j = 1, 2, \ldots, n \qquad (34.1)$$
where x_{ji} is the normalized score of individual j (e.g., country) based on the value
X_{ji} of the i-th raw variable, i = 1, …, d, and w_i is the nominal weight assigned to
the i-th variable X_i (or x_i). Input variables are usually normalized according to the
min-max normalization method (see [3]):
$$x_{ji} = \frac{X_{ji} - X_{\min,i}}{X_{\max,i} - X_{\min,i}}, \qquad (34.2)$$
1192 W. Becker et al.
where X_{max,i} and X_{min,i} are the upper and lower values, respectively, for the variable
X_i; in this case all scores x_{ji} vary in [0, 1]. X_{max,i} and X_{min,i} could be replaced
by maximal and minimal values for X_i that do not depend on the sample
observations.
A popular alternative to the min-max normalization in (34.2) is given by the
standardization:
$$x_{ji} = \frac{X_{ji} - E(X_i)}{\sqrt{V(X_i)}}, \qquad (34.3)$$
where E(X_i) and V(X_i) are the mean and variance of the original variable X_i.
Here E(X_i) and V(X_i) can be estimated by the sample mean and variance. Note
that (34.3) guarantees equal variances, while transformation (34.2) does not.
In fact, transformation (34.2) scales each input variable onto [0, 1] but allows
for different means and variances. On the other hand, (34.3) transforms all variables
such that they have a mean of zero and a variance of one. Importantly, however,
neither transformation affects the correlation structure of the variables, because both
are linear transformations of the original variables and correlations are invariant with
respect to linear transformations.
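As an aside, the invariance of correlations under both normalizations is easy to verify numerically. The following sketch applies Eqs. (34.2) and (34.3) to hypothetical raw scores (the data are synthetic, not from the chapter's case studies):

```python
import numpy as np

# Hypothetical raw scores for 58 "countries" on 4 correlated variables
rng = np.random.default_rng(0)
L = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.3, 0.4, 1.0, 0.0],
              [0.2, 0.3, 0.5, 1.0]])
X = rng.normal(size=(58, 4)) @ L.T   # induce correlation between columns

# Min-max normalization (Eq. 34.2): each column scaled onto [0, 1]
x_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (Eq. 34.3): zero mean, unit variance per column
x_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Both are linear transformations, so correlations are unchanged
assert np.allclose(np.corrcoef(X.T), np.corrcoef(x_minmax.T))
assert np.allclose(np.corrcoef(X.T), np.corrcoef(x_std.T))
```

The two assertions confirm the point made above: either normalization can be used without altering the correlation structure that drives the importance measures discussed next.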
The weight, wi , attached to each variable, xi , is usually chosen so as to reflect the
importance of that variable with respect to the concept being measured. The ratios
wi =w` can be taken to be the revealed target relative importance of inputs i and `
because they measure the substitution effect between xi and x` , i.e., how much x`
must be increased to offset or balance a unit decrease in xi (see [8]).
One of the central messages in this chapter is that the target importance does
not usually coincide with the true importance of each variable, as defined by the
correlation ratio, as shown above for the trivial case of two uncorrelated variables.
In order to better understand the contribution of each input variable to the output of
the composite indicator, measures of linear and nonlinear dependence may be used;
these are reviewed in the following section.
In the following, the sample index subscript j (usually representing the country or
region) is dropped unless it is needed for clarity. Let the expected value and
variance of y be E(y) = μ_y and V(y) = σ_y². Similarly, the means and variances of
each x_i are denoted {μ_i}_{i=1}^{d} and {σ_i²}_{i=1}^{d}, respectively.
For any given composite indicator, one can define measures of importance of
each of the input variables with respect to the output values of the composite
indicator. One approach for assessing the influence of each input variable x_i on
the composite indicator is to measure the dependence of y on x_i, where, from here
on, variables are assumed to be normalized; see Eqs. (34.2) and (34.3).
Assume that

$$y_j = f_i(x_{ji}) + \varepsilon_j, \qquad (34.4)$$

where f_i(x_{ji}) is an appropriate function, possibly nonlinear, that models E(y_j | x_i),
the conditional expectation of y given x_i, and ε_j is an error term. Dependence
of y on xi can be measured in a number of ways. The covariance and correlation
between xi and y, for example, are defined as
$$\operatorname{cov}(y, x_i) := E\big[(y - \mu_y)(x_i - \mu_i)\big], \qquad R_i := \operatorname{corr}(y, x_i) := \frac{\operatorname{cov}(y, x_i)}{\sigma_y \, \sigma_i}. \qquad (34.5)$$
Remark here that corr(y, x_i) is a standardized version of the covariance, called the
Pearson product-moment correlation coefficient, which scales the covariance onto
the interval [−1, 1]. In the case of a simple linear regression of y on x_i, the square of
the correlation coefficient gives the well-known linear coefficient of determination
R², i.e., R_i² := corr²(y, x_i), which takes values in [0, 1].
The coefficient of determination is used to measure the goodness of fit of an
ordinary linear regression: as such Ri2 is a measure of linear dependence. Because
of (34.5), the covariance cov(y, x_i), the correlation corr(y, x_i), and the coefficient
of determination R_i² are all related measures of linear dependence. In sample, the
coefficient of determination R_i² can be computed as

$$R_i^2 = \sum_{j=1}^{n} (\hat{y}_j - \bar{y})^2 \Big/ \sum_{j=1}^{n} (y_j - \bar{y})^2, \qquad (34.6)$$

where ŷ_j is the fitted value from the linear regression of y on x_i. A measure of
dependence that does not assume linearity is the correlation ratio

$$S_i := \frac{V_{x_i}\big(E_{x_{\sim i}}(y \mid x_i)\big)}{V(y)}, \qquad (34.7)$$
where x_{∼i} is defined as the vector containing all the variables (x_1, …, x_d) except
variable x_i, and E_{x_{∼i}}(y | x_i) denotes the conditional expectation of y given x_i,
e.g., with x_i fixed at one value in its interval of variation. The notation employed
here stresses that the expectation in E_{x_{∼i}}(y | x_i) is computed with respect to the
distribution of x_{∼i}, i.e., with respect to all other input variables, while the subscript
x_i on the outer variance signifies that the variance is taken over all possible
values of x_i. In the following, the variance and expectation subscripts are dropped
to economize on notation.
The conditional expectation E .y j xi / is known as the main effect of xi on y,
and it describes the expected value of y (the composite indicator) averaged over all
input variables except xi , keeping xi fixed. As such, E .y j xi / is a function of xi
and is here denoted as fi .xi /. This function is not typically known; however, it can
be estimated by performing a (nonlinear) regression of y on xi . Various approaches
for this problem are discussed in the following section; any of them delivers a fitted
value m_j := f̂_i(x_{ji}) corresponding to the prediction of y_j. The correlation ratio S_i
can then be computed in sample as
$$S_i = \sum_{j=1}^{n} (m_j - \bar{m})^2 \Big/ \sum_{j=1}^{n} (y_j - \bar{y})^2, \qquad (34.8)$$

where m̄ := (1/n) Σ_{j=1}^{n} m_j, m_j := m̂(x_{ji}), and m̂(·) is the estimate of
m(·) := f_i(·). Equation (34.8) mimics (34.6).
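To illustrate, here is a minimal numerical sketch of Eqs. (34.5) and (34.8) on synthetic data. The main effect is estimated by a crude binned mean (an assumption standing in for the spline or kernel fits used in this chapter), which is enough to show the correlation ratio detecting dependence that R² misses:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 500)
y = (x - 0.5) ** 2 + rng.normal(0, 0.02, 500)  # symmetric, nonlinear

# Linear measure (Eq. 34.5): squared Pearson correlation
R2 = np.corrcoef(x, y)[0, 1] ** 2

# Correlation ratio (Eq. 34.8), using a crude binned estimate of the
# main effect E(y | x) in place of a spline or kernel fit
edges = np.linspace(0, 1, 21)
b = np.digitize(x, edges)
m = np.array([y[b == k].mean() for k in b])  # fitted value m_j per point
S = np.sum((m - m.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# The nonlinear measure detects dependence the linear one misses
assert S > 0.5 > R2
```

Because y depends on x through a symmetric parabola, the Pearson correlation is near zero, while S_i correctly attributes most of the output variance to x.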
The correlation ratio S_i is closely connected with the measure R_i² of linear
dependence discussed previously. More specifically, R_i² equals S_i when f_i(x_i) is
linear; note that the main effect depends on the functional form of the composite
indicator and on the dependence structure between the variables. This shows that, in the
linear case, Si is also related to the covariance and correlation coefficients by the
associations discussed earlier.
In order to illustrate why linear measures of importance would not be sufficient
for nonlinear dependence, Fig. 34.1 plots data generated using the relationship
[Fig. 34.1 (not reproduced): scatterplot of data with a nonlinear relationship between x ∈ [0, 1] and y]
34 Weights and Importance in Composite Indicators: Mind the Gap 1195
correlation. However, even for the case of non-independent inputs, the Si measure
preserves its meaning of expected fractional reduction of the output variance that
would be achieved if a variable could be fixed [22].
In order to estimate f_i(x_i), a wide range of approaches is available. Two
nonlinear regression approaches are discussed in the next section; both offer a
flexible framework for estimating nonlinear main effects.
$$y = X\beta + \varepsilon. \qquad (34.10)$$
Minimizing the sum of squared residuals with respect to β gives the well-known
expression

$$\hat{\beta} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} y. \qquad (34.11)$$
$$\beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{k=1}^{K} \beta_{p+k}\,(x - \kappa_k)_+^p \qquad (34.12)$$

where (u)_+ := max(0, u) and κ_1, …, κ_K are the knot locations. A constraint on the
coefficients reduces the influence of the K additional terms Σ_{k=1}^{K} β_{p+k} (x − κ_k)_+^p
in (34.12) to avoid overfitting the data and results in the penalized spline model. The
constraining of regression parameters shares similarities with other approaches such
as ridge regression (see, e.g., [9]) and the LASSO [26].
The constrained minimization problem is solved by a ridge-type estimator,

$$\hat{\beta}_\lambda = (X^{\mathsf{T}} X + \lambda D)^{-1} X^{\mathsf{T}} y,$$

where X is now the spline basis matrix, D is a diagonal matrix penalizing only the
knot coefficients, and λ plays the role of a Lagrange multiplier for the penalization.
High values of λ result in a very smooth fit, whereas low values give a rough fit which
is closer to interpolating the data points. The smoothing parameter λ needs to be optimized,
and this is done using a cross-validation approach, in which a search algorithm is
used to find the value of that minimizes the average squared error resulting from
removing a point from the set of training data and predicting at that point. The fact
that penalized splines are only a short step from linear regression means that they
can exploit well-known properties to give fast order-n algorithms for the calculation
of cross-validation measures; therefore, the fitting of a penalized spline can be very
fast. More details on this can be found in [17], Section 5.3.
The spline model requires the specification of the knots κ_k; here they are placed at
equally spaced quantiles of the unique x_i values, and the number of knots is chosen
from a set of candidate values as the number which results in the best
fit according to a cross-validation measure. This procedure is described in more
detail in [17], Section 5.5.
In the applications in this chapter, the spline models are deliberately constrained
to be quite simple fits to the data. To do this, the spline is allowed to have
K = 0, 1, 2, 3, 4 knots, where the optimal number is chosen as that which minimizes
the cross-validation measure. In the case of K = 0, the spline reduces to the
polynomial model, which in this work is cubic, since p is set to 3.
Finally, note that in the following work, penalized splines will sometimes be
referred to simply as splines. All splines used in this investigation are penalized
cubic splines.
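A compact sketch of such a penalized cubic spline follows, under the assumptions stated in this section: truncated power basis of Eq. (34.12), ridge-type penalty on the knot coefficients only, and the smoothing parameter chosen by leave-one-out cross-validation via the hat matrix. The function name and interface are illustrative, not the authors' implementation:

```python
import numpy as np

def penalized_spline_fit(x, y, n_knots=4, lambdas=np.logspace(-4, 4, 30)):
    """Penalized cubic spline (truncated power basis, Eq. 34.12) with the
    smoothing parameter chosen by leave-one-out cross-validation.
    Returns the fitted main-effect values at the sample points."""
    # Interior knots at equally spaced quantiles of the unique x values
    knots = np.quantile(np.unique(x), np.linspace(0, 1, n_knots + 2)[1:-1])
    B = np.column_stack([x ** p for p in range(4)] +
                        [np.clip(x - k, 0, None) ** 3 for k in knots])
    D = np.diag([0.0] * 4 + [1.0] * n_knots)  # penalize only knot terms
    best_cv, best_fit = np.inf, None
    for lam in lambdas:
        # Hat matrix of the ridge-type solution (B'B + lam*D)^-1 B'y
        H = B @ np.linalg.solve(B.T @ B + lam * D, B.T)
        fit = H @ y
        # Leave-one-out CV, computed cheaply from the hat-matrix diagonal
        loocv = np.mean(((y - fit) / (1.0 - np.diag(H))) ** 2)
        if loocv < best_cv:
            best_cv, best_fit = loocv, fit
    return best_fit
```

The closed-form leave-one-out residuals used here are what makes the fast cross-validation mentioned above possible for linear smoothers.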
A commonly used kernel function is the Gaussian density function with standard
deviation b. Local-linear regression is used here (as opposed to, e.g., local mean
regression) since it is generally regarded as a good choice, due to its good properties
near the edges of the data cloud. The smoothing parameter b can be optimized by
cross-validation; this is the method that is adopted in the applications in this chapter.
The local polynomial approach is used by [7] for the estimation of first-order
sensitivity indices for models with correlated inputs.
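A minimal sketch of the local-linear estimator described above, with a Gaussian kernel of bandwidth b (the function is illustrative; in practice b would be chosen by cross-validation, as the text describes):

```python
import numpy as np

def local_linear(x, y, x0, b):
    """Local-linear estimate of E(y | x = x0): weighted least squares with
    Gaussian kernel weights of bandwidth b."""
    w = np.exp(-0.5 * ((x - x0) / b) ** 2)          # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # centered linear design
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]                                  # intercept = fit at x0
```

Because the local model is linear rather than a constant, the estimator reproduces straight-line trends exactly, which is the good boundary behavior referred to above.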
3.4 Remarks
The choice of whether to use penalized splines, kernel regression, linear regression,
or another nonlinear regression approach entirely is a working decision of the
analyst. By construction, linear regression (i.e., linear with respect to x) only reveals
linear dependencies between variables and so should be used with this caveat in
mind. Higher-order polynomial regression or other basis functions within a linear
regression might improve the fit, but also risk overfitting. The penalized splines and
local-linear regression allow for nonlinearities when they are present in the data,
but can also model near-linear data when required. In most applications, these two
latter approaches are expected to give comparable results. However, in the presence
of highly skewed and/or heteroskedastic data, the two fits may be quite different.
While the fits are likely to be similar, splines may have a slight advantage over
local-linear regression in some other respects. First, they are less computationally
demanding than kernel regression, due to the fast computation of cross-validation
measures discussed previously. While the difference is small enough to be negligible
for running a few regressions on small data sets, the difference may be sizable
if one wants to attempt an optimization of weights or if a very large number of
regressions need to be run. Large data sets of the order of thousands or millions of
points are common in environmental science. A second advantage of splines is that
the computation of their derivatives is just as easy.
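To illustrate the point about derivatives: with the truncated power basis of Eq. (34.12), the fitted spline's derivative is obtained by applying the estimated coefficients to the analytically differentiated basis. A sketch with illustrative helper names (not the authors' code):

```python
import numpy as np

def spline_basis(x, knots, p=3):
    """Truncated-power basis of Eq. (34.12): [1, x, ..., x^p, (x - k)_+^p]."""
    return np.column_stack([x ** q for q in range(p + 1)] +
                           [np.clip(x - k, 0, None) ** p for k in knots])

def spline_basis_deriv(x, knots, p=3):
    """Analytic derivative of the same basis: the fitted spline's derivative
    is just this matrix applied to the same coefficient vector."""
    poly = [np.zeros_like(x, dtype=float)] + \
           [q * x ** (q - 1) for q in range(1, p + 1)]
    trunc = [p * np.clip(x - k, 0, None) ** (p - 1) for k in knots]
    return np.column_stack(poly + trunc)
```

Derivative curves such as those in Fig. 34.4 can therefore be produced at negligible extra cost once the spline coefficients are in hand.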
In any case, a prudent strategy would be to run both analyses, with splines
and with kernel regression, and compare the results; this is the approach taken in the
following examples, where both methods are applied and compared.
4 Case Studies
This section delves into the conceptual and statistical properties of three composite
indicators selected for their interesting structure as well as for their popularity. These
examples allow a practical demonstration of the methods described in previous
sections, as well as providing an interesting analysis of three well-known composite
indicators. The case studies are chosen because:
– They are transparent: the values of each input variable are publicly
available, and the index can be reproduced given the methodological notes
provided by the developers.
– They use three different aggregation formulas, which allows the effect of the
different measures of importance described here, and of the estimation methods,
to be assessed.
– They deal with issues of governance and accountability and hence offer interest-
ing narratives on the misconception of what matters when developing an index.
4.1.1 Aim
The Resource Governance Index (RGI) is developed by the Revenue Watch Institute
in order to measure the transparency and accountability in the oil, gas, and mining
sectors in 58 countries [15]. These nations produce 85% of the world's petroleum,
90% of diamonds, and 80% of copper, generating trillions of dollars in annual
profits. The future of these countries depends on how well they manage their natural
resources.
34 Weights and Importance in Composite Indicators: Mind the Gap 1201
4.1.2 Sources
To evaluate the quality of governance in the extractive sector, the Resource
Governance Index employs a 173-item questionnaire based on the standards
put forward by the International Monetary Fund's 2007 Guide on Resource Revenue
Transparency and the Extractive Industries Transparency Initiative, among others.
The answers to the 173 questions are grouped into 45 indicators that are then
mapped into three (of the four) RGI dimensions: Institutional and Legal Setting,
Reporting Practices, and Safeguards and Quality Controls. The fourth dimension,
Enabling Environment, consists of five additional indicators that describe
a country's broader governance environment; it uses data compiled from over
30 external sources by the Economist Intelligence Unit, the International Budget
Partnership, Transparency International, and the Worldwide Governance Indicators.
The Index is therefore a hybrid, with three dimensions based on the questionnaire
specifically assessing the extractive sector and the fourth rating the country's overall
governance.
1. Institutional and Legal Setting (20% weight): ten indicators that assess whether
the laws, regulations, and institutional practices enable comprehensive disclo-
sures, open and fair competition, and accountability.
2. Reporting Practices (40% weight): 20 indicators that evaluate the actual disclo-
sure of information and reporting practices by government agencies.
3. Safeguards and Quality Controls (20% weight): 15 indicators that measure the
checks and oversight mechanisms that guard against conflicts of interest and
undue discretion, such as audits.
4. Enabling Environment (20% weight): five indicators of the broader governance
environment generated using over 30 external measures of accountability, gov-
ernment effectiveness, rule of law, corruption, and democracy.
The RGI score is a weighted arithmetic average of the four dimensions, i.e., of
the form of (34.1), where the dimensions here are treated as the x1 ; x2 ; x3 ; x4 input
variables.
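As a worked example of Eq. (34.1) with the RGI weights (the dimension scores below are hypothetical, for illustration only):

```python
# Hypothetical dimension scores for one country, aggregated with the
# RGI weights (20/40/20/20 %) as in Eq. (34.1)
scores = {"Institutional and Legal Setting": 70.0,
          "Reporting Practices": 55.0,
          "Safeguards and Quality Controls": 60.0,
          "Enabling Environment": 65.0}
weights = {"Institutional and Legal Setting": 0.2,
           "Reporting Practices": 0.4,
           "Safeguards and Quality Controls": 0.2,
           "Enabling Environment": 0.2}
rgi = sum(weights[k] * scores[k] for k in scores)  # weighted arithmetic mean
```

With these illustrative scores the aggregate is 0.2·70 + 0.4·55 + 0.2·60 + 0.2·65 = 61.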
Because actual disclosure constitutes the core of transparency, the developers
assigned a greater weight to the Reporting Practices dimension. This choice
also reflects a belief that, without reporting information, rules and safeguards ring
hollow. Therefore, Reporting Practices is assigned a weight equal to 40% of the
final country score, and the other three dimensions (Institutional and Legal Setting,
Safeguards and Quality Controls, and Enabling Environment) account for 20% each.
On the inclusion of the Enabling Environment dimension, there have been
arguments for and against. Against its inclusion, the arguments are:
1. The Enabling Environment dimension dilutes the focus of the index on the oil,
gas, and mining sector by incorporating measures of overall governance.
1202 W. Becker et al.
2. The Enabling Environment dimension can have an undue effect on the index
scores, driving scores up or down, inflating or depressing performances beyond
what countries actually show in their extractive sector.
Given these last two arguments, the developers chose to include the Enabling
Environment dimension and to allocate a 20% weight to this dimension. As part
of the Index website, the developers provide a tool that allows users to change the
weights for the different dimensions, creating different composite scores that reflect
their own sense of prioritization. This is a direct example of how weights are often
(erroneously) interpreted as measures of importance.
4.1.4 Results
Before examining the relationships between each variable (dimension/pillar) and
the output of the composite indicator, a basic analysis of the structure of the data
was performed. There are no outliers (absolute skewness is less than 0.63) in the
four dimensions' distributions that could bias the subsequent interpretations of
the correlation structure. The four dimensions of the Resource Governance Index
have moderate to high correlations that range from 0.41 (between Institutional and
Legal Setting and Enabling Environment) to 0.82 (between Reporting Practices and
Safeguards and Quality Controls) and an overall good average bivariate correlation
of 0.65. Principal component analysis suggests that there is indeed a single latent
phenomenon underlying the four index dimensions. This first principal component
captures 74% of the variation in the four dimensions.
Moving now to the importance measures, Table 34.1 shows the estimates of
the correlation ratios obtained both with penalized splines and local-linear (LL)
regression, as well as linear correlations, for the four input variables of the index.
The correlation ratios Si (penalized splines and LL approach) and the Pearson
correlation coefficients Ri2 confirm that the Reporting Practices component has
indeed the highest impact on the index. This was the intention of the RGI
developers on the grounds that actual disclosure constitutes the core of transparency.
This choice also reflects a belief that without reporting information, rules and
safeguards are inconsequential. In fact, the correlation between Reporting Practices
and Safeguards and Quality Control is very high (0.82, the highest among the
Table 34.1 Measures of dependence of the Resource Governance Index on its input variables:
R_i = corr(x_i, y) (correlation), S_i,spl (correlation ratio, penalized spline), S_i,LL (correlation ratio, local-linear)

Resource Governance Index (n = 58)    x_i   w_i   R_i    R_i²   S_i,spl   S_i,LL
Institutional and Legal Setting       x1    0.2   0.79   0.63   0.65      0.67
Reporting Practices                   x2    0.4   0.95   0.90   0.90      0.94
Safeguards and Quality Controls       x3    0.2   0.91   0.82   0.83      0.83
Enabling Environment                  x4    0.2   0.77   0.59   0.65      0.70
components). If one could fix the Reporting Practices variable, the variance of the
RGI scores across the 58 countries would on average be reduced by 94% (local-
linear estimate). It is worth noting that despite the equal weights assigned to the
other three components, their impact on the RGI variation differs: by fixing any
of the other variables, the variance reduction would be 83% for Safeguards and
Quality Control, 70% for Enabling Environment, and 67% for Institutional and
Legal Setting, using the estimates of the local-linear regression.
As per the developers' intention to have Reporting Practices twice as important
as any of the other three dimensions (Institutional and Legal Setting, Safeguards
and Quality Controls, and Enabling Environment), it is easy to see that this was not
achieved. The strong correlation of Reporting Practices with Safeguards and Quality
Controls results in these two pillars being almost equally important (0.90 and 0.82),
while the importance of Reporting Practices relative to the remaining two pillars is
about 3/2 rather than 2.
Aside from the implications of this analysis from the point of view of the RGI,
it is interesting to look at the differences between the measures of importance
in Table 34.1. The linear Ri2 measure consistently gives the lowest estimates of
importance, the LL estimate of correlation ratio is the highest, and the penalized
spline estimate of correlation ratio is somewhere in between. To see in a little
more detail what is happening, Fig. 34.3 shows the scatterplots of the four input
variables and the simple linear, penalized spline and local-linear estimations of
main effects. For the Institutional and Legal Setting variable, the two nonlinear
fits are significantly different to the linear fit, but fairly similar to each other.
However, the structure of the data is such that the gradient of the linear fit (and
the overall trend of the nonlinear fits) is not very steep, resulting in relatively
low importance estimates compared to other variables. The Reporting Practices
variable shows quite similar fits between all three methods, although the local-linear
curve has a slightly higher variance. Safeguards and Quality gives a near-linear
fit for both nonlinear regression approaches, showing a strong agreement in
measures of importance. The Enabling Environment variable is the most interesting
here from a regression point of view: the penalized spline fits a roughly cubic
model, whereas the local-linear curve fluctuates more strongly with the data.
Whether the more parsimonious penalized spline curve or the more variable local-
linear curve is a better estimate of E.yjxi / is not intuitively clear from visual
inspection.
[Fig. 34.3 (panels not reproduced): scatterplots of the Resource Governance Index (0–100) against each of its four components (Institutional/Legal, Reporting, Safeguards/Quality, Enabling Environment), with spline, linear, and local-linear fits overlaid]
Fig. 34.3 Penalized spline, local linear, and linear fits to the Resource Governance Index
As a further analysis of the effect of each variable, Fig. 34.4 shows the first
derivatives of E.yjxi / with respect to each input variable, as estimated by the
penalized splines. First, the nonlinearities of the spline fits are evident from the
nonconstant derivatives. The derivative plots also have implications for the index
itself: they effectively summarize the expected change in the RGI that a country
would achieve if it moved a given amount in each variable. Two example countries,
Libya and India, have their scores in each indicator marked as dotted vertical
lines. In the case of Libya, it is clear that to move up the rankings in the RGI, it
would be better to invest efforts in improving the Institutional and Legal Setting,
and Enabling Environment dimensions, whereas gains in Reporting and Safeguards
and Quality would yield lesser gains. India, a country ranked overall 12th in the
RGI, would on the other hand stand to gain very little from small improvements in
Enabling Environment, and immediate efforts would be better directed toward either
Reporting or Safeguards and Quality.
4.2.1 Aim
The Financial Secrecy Index (FSI) is developed by the Tax Justice Network (TJN)
and aims to measure the contribution to the global problem of financial secrecy
in 80 jurisdictions worldwide [5]. In other words, the Index attempts to provide
an answer to the question: by providing offshore financial services in combination
[Fig. 34.4 (panels not reproduced): derivative-of-main-effect curves for the four RGI components, with the indicator values of Libya and India marked]
Fig. 34.4 Derivatives dE(y|x_i)/dx_i using penalized spline and linear regression fits to the
Resource Governance Index. Indicator values of Libya and India marked as vertical dotted lines
with a lack of transparency, how much damage is each secrecy jurisdiction actually
responsible for?
To give an example of what this implies, the home page of the FSI informs us that,
because capital flight has dwarfed the inflow of aid money, Africa has in fact been a net
creditor to the rest of the world since the 1970s. As for the money stashed away,
the TJN informs us that those assets are in the hands of a few wealthy people,
protected by offshore secrecy, while the debts are shouldered by broad African
populations.
4.2.2 Sources
The index combines qualitative data and quantitative data. Qualitative data are
based on laws, regulations, cooperation with information exchange processes, and
other verifiable data sources and are used to calculate a secrecy score for each
jurisdiction. Secrecy jurisdictions with the highest secrecy scores are more opaque
in the operations they host, less engaged in information sharing with other national
authorities, and less compliant with international norms relating to combating
money laundering. Lack of transparency and unwillingness to engage in effective
information exchange makes a secrecy jurisdiction a more attractive location for
routing illicit financial flows and for concealing criminal and corrupt activities.
Quantitative data is used to create a global scale score, for each jurisdiction,
according to its share of offshore financial service activity in the global total.
Publicly available data about the trade in international financial services of each
jurisdiction are used. Where necessary because of missing data, the developers fol-
low International Monetary Fund methodology to extrapolate from stock measures
to generate flow estimates. Jurisdictions with the largest weightings are those that
play the biggest role in the market for financial services offered to nonresidents.
Critics have argued that the Global Scale dimension unfairly points to large
financial centers. However, the developers' response is that to dispense with
the scale risks ignoring the big elephants in the room. While large players
may be slightly less secretive than other jurisdictions, the extraordinary size of
their financial sectors offers far more opportunities for illicit financial flows to
hide. Therefore, the larger an international financial sector becomes, the better its
regulations and transparency ought to be. This logic is reflected in the FSI which
aims to avoid the conceptual pitfalls of the usual suspects lists of tax havens
which are often remote islands whose overall share in global financial markets is
tiny. In the FSI a jurisdiction with a larger share of the offshore finance market, and
a moderate degree of opacity, may receive the same overall ranking as a smaller but
more secretive jurisdiction.
Due to a significantly greater skew in the Global Scale scores compared to the
Financial Secrecy scores, the developers transform the two to generate series that
are more evenly distributed. The choice of the transformation has been guided by
the 90/10% and the 75/25% quantile ratios in each of the two series. In the original
Table 34.2 Importance measures of the variables of the Financial Secrecy Index:
R_i = corr(x_i, y) (correlation), S_i,spl (correlation ratio, penalized spline), S_i,LL (correlation ratio, local-linear)

Financial Secrecy Index (n = 80)   x_i   w_i   R_i    R_i²   S_i,spl   S_i,LL
Global Scale                       x1    0.5   0.57   0.32   0.63      0.79
Financial Secrecy                  x2    0.5   0.09   0.01   0.13      0.09
series, the 90/10% ratio is more than five thousand times higher for the Global Scale
than for the Secrecy variable, and the 75/25% ratio nearly a hundred times higher. If one
squares the Secrecy Score and takes the square root of the Global Scale, these ratios
fall to below 26 and 6, respectively; and if one cubes the Secrecy Score and takes the
cube root of the Global Scale, they fall below 3 and 2, respectively. Finally, looking
at fourth and fifth roots and powers, the variation of the Global Scale series becomes
disproportionately small. Hence, the cube root/cube combination was adopted by the
developers on the grounds that these transformations are sufficient to ensure that
neither secrecy nor scale alone determines the FSI; see [6].
4.2.4 Results
Despite the intentions of the developers, and the reasoning based on the quantiles,
the correlation ratios Si (splines and LL approach) and the Pearson correlation
coefficients reveal a notably unbalanced impact of the two components on the FSI
(see Table 34.2). The greatest difference between the estimates is in the correlation
ratios provided by the local-linear regression, in which the values are 0.79 for Global
Scale and 0.09 for Secrecy. This is in stark contrast to the intended influence, which
should be roughly equal for both variables. Examining the scatterplots in Fig. 34.5
however, the data looks quite challenging to smooth, and the three regression
approaches have markedly different fits. In particular, the local-linear regression
of the Global Scale variable has a large spike in the fit at a value of around 7, which
does not appear to be justified by the data. Therefore, the LL importance estimates
ought to be treated with caution. The spline fit is slightly more convincing, but
seems to be quite heavily biased by the outlying points above a score of around 5.
However, given that even the linear fit yields a far higher importance measure for
the Global Scale variable than for the Secrecy variable (0.32 and 0.01, respectively),
the evidence seems fairly compelling that the Global Scale variable dominates the
FSI by a significant margin. The analysis illustrates vividly that the cube root/cube
transformation of the FSI components and the equal weights assigned to the two
components are not a sufficient condition to ensure equal importance, at least
according to the correlation ratio measure.
The Global Scale scores are particularly skewed (skewness = 4.6 and kur-
tosis = 23). Yet, after taking the cube root, the distribution is no longer
problematic (skewness = 1.7 and kurtosis = 3.0). Nevertheless, what remains prob-
lematic is the strong negative association between the cubed distribution of the
Financial Secrecy scores and the cube-rooted distribution of the Global Scale scores
(corr = −0.536).
[Fig. 34.5 (panels not reproduced): scatterplots of the FSI against the transformed Global Scale and Financial Secrecy scores, with spline, linear, and local-linear (LL) fits overlaid]
Fig. 34.5 Scatter plots of Financial Secrecy Index versus its components
Table 34.3 Importance measures of the variables of the Financial Secrecy Index with the N_trim
points with the highest Global Scale scores removed: R_i = corr(x_i, y) (correlation), S_i,spl
(correlation ratio, spline), S_i,LL (correlation ratio, local-linear)

N_trim   Financial Secrecy Index (n = 50)   R_i²    S_i,spl   S_i,LL
3        Global Scale                       0.616   0.798     0.754
         Financial Secrecy                  0.017   0.112     0.074
8        Global Scale                       0.028   0.310     0.077
         Financial Secrecy                  0.015   0.119     0.093
If the points with the highest Global Scale scores are treated as outliers, the
problem lessens somewhat but does not disappear. Table 34.3 shows the results
of rerunning the analysis, trimming the points with the three highest or eight highest
Global Scale values. By visual inspection, these seem to be two reasonable
definitions of outliers depending on where the cutoff value is set. Trimming the top
three points, the disparity is very similar, with the Global Scale correlation ratios far
higher than the Secrecy values. After trimming the top eight points, the importance
34 Weights and Importance in Composite Indicators: Mind the Gap 1209
scores are much closer; however, in the linear and spline estimates, the importance
of the Global Scale variable is still at least twice that of the Secrecy variable. In
the case of the LL regression, the Secrecy variable is now rated as slightly more
influential than the Global Scale, but given that this is the only (mild) contradiction
to an otherwise compelling trend, the conclusion that the variables are not equally
influential appears to stand.
In defense of the developers, one may note that this gross imbalance refers to a specific definition of importance (how much the variance of the index would be reduced, on average, if one could fix one dimension), and this definition appears to condemn the FSI as problematic. One might argue, however, that the variance of the main effect E(y | x_i) might not tell the whole story as a measure of importance, and that interactions between the two variables should also be accounted for. Following this line of thought, a useful tool might be the total effect index S_Ti, defined as E_{x~i}[V_{x_i}(y | x~i)] / V(y), where x~i denotes the vector of all variables but x_i. This measure also accounts for variance due to interactions. In fact, the scatterplot of Financial Secrecy in Fig. 34.5 looks like a textbook example of a variable with a low S_i and a high S_Ti: the points display a rather constant mean and an increasing variance while moving along the horizontal axis.
An investigation using this measure in the context of composite indicators is
outside of the scope of this work, but further information can be found in the section
on Variance-Based Sensitivity Analysis: Theory and Applications.
4.3.1 Aim
The Good Country Index (GCI) was developed by the Good Country Party with a view to measuring what a country contributes to the common good of humanity, and what it takes away [1], following its developers' normative framework and world view. In
total, 125 countries are included in the Index. In contrast to the majority of similar
composite indicators, the Good Country Index does not measure what countries do
at home; rather, it aims to start a global discussion about how countries can balance
their duty to their own citizens with their responsibility to the wider world.
This reflects the consideration that the biggest challenges facing humanity today
are global and borderless: problems such as climate change, economic crisis,
terrorism, drug trafficking, slavery, pandemics, poverty and inequality, population
growth, food and water shortages, energy, species loss, human rights, and migration.
All of these problems stretch across national borders, and so they can only be properly tackled through international efforts. Hence, the concept of the Good Country is
about encouraging populations and their governments to be more outward looking
and to consider the international consequences of their national behavior.
4.3.2 Sources
The GCI builds upon 35 indicators, most of them produced by the United Nations and other international agencies and a few by NGOs and other organizations. Most of
A ranking is calculated for each of the seven dimensions. The Good Country
Index is then calculated by taking the arithmetic average of the seven ranks in
Science, Technology, and Knowledge; Culture; International Peace and Security;
World Order; Planet and Climate; Prosperity and Equality; and, finally, Health and
Wellbeing. This aggregation scheme has been selected by the developers because
of its simplicity and relative robustness to outliers. Beyond what is stated by the
developers, a further argument in favor of this aggregation scheme would be the
imperfect substitutability across all seven index components, i.e., the reduction of
the fully compensatory nature of an arithmetic average of the seven scores.
4.3.4 Results
Before looking at the correlation ratios describing the importance of each variable
to the output, it is useful to look at the correlations between input variables, as it
is good practice in the construction/evaluation of composite indicators. Six of the
seven dimensions of the Good Country Index have low to moderate correlations
that range from 0.20 (between several pairs of dimensions, mostly those involving
Prosperity and Equality) to 0.78 (between Science and Technology and Culture)
and an overall moderate average bivariate correlation of 0.37. Principal component
analysis suggests that there is indeed a single latent phenomenon underlying the
six dimensions and that the first principal component captures almost 50% of
the variation in these dimensions. However, the International Peace and Security dimension has a negative correlation to both the Science and Technology and the Culture variables (−0.48) and an almost random correlation to all remaining
dimensions. This point is a strong concern for the validity of the GCI. The negative
correlations between International Peace and Security on one hand and either
Science and Technology or Culture or Health and Wellbeing on the other are
undesirable in a composite indicator context, as they suggest the presence of trade-
offs, and are a reminder of the danger of compensability between components.
Turning now to the correlation coefficients, Table 34.4 shows that, contrary to what the equal weighting scheme of the seven components would suggest, the impact of the seven components on the index is uneven. Three of the seven components, namely,
Culture, World Order, and Science and Technology, account for over 50% of the
variation in the index scores (up to 63% for Culture). By contrast, fixing any one of the three components Health and Wellbeing, Planet and Climate, or Prosperity and Equality would reduce the variance by between 25 and 37%. What is striking
Table 34.4 R_i = corr(x_i, y) (correlation), S_i,spl (correlation ratio, spline), S_i,LL (correlation ratio, local-linear)

  Good Country Index (n = 125)      x_i  w_i  R_i    R_i^2  S_i,spl  S_i,LL
  Science and Technology            x1   1/7   0.71  0.50   0.50     0.50
  Culture                           x2   1/7   0.79  0.63   0.63     0.66
  International Peace and Security  x3   1/7  −0.17  0.03   0.05     0.03
  World Order                       x4   1/7   0.78  0.62   0.64     0.63
  Planet and Climate                x5   1/7   0.57  0.32   0.34     0.33
  Prosperity and Equality           x6   1/7   0.49  0.24   0.27     0.25
  Health and Wellbeing              x7   1/7   0.55  0.30   0.35     0.37
The tables showing correlation ratios and correlation coefficients from the case
studies in the previous section (i.e., Tables 34.1, 34.2, and 34.4) help to understand
the relative merits of the measures used in this study. The correlation coefficient Ri
and the coefficient of determination Ri2 give the linear correlation of the composite
indicator with each of its inputs. The Ri2 measure can be interpreted as the fraction
of variance accounted for by the linear regression (similar to Si ), but it is also useful
to see the Ri value in order to understand whether the relation is positive or negative.
As shown in the Good Country Index (Table 34.4), this measure revealed that one
of the input indicators is in fact negatively correlated with the composite indicator; this is an undesirable property, as discussed previously, at least in the context of
linearly or geometrically aggregated composite indicators.
The spline and local-linear estimates of S_i are all higher than or equal to the R_i^2 values. In most cases the difference is small; this reflects the fact that the main effects are close to linear. In these cases (see, e.g., the Reporting Practices variable of the Resource Governance Index in Table 34.1), the spline and LL regression fits are very close to straight lines. However, in some cases, such as the Enabling Environment pillar (also from the Resource Governance Index), the S_i estimates are significantly higher than R_i^2; this is due to a nonlinear main effect, which is captured by the spline and LL regressions but missed by the linear regression. In this instance, R^2 = 0.59, whereas S_i is estimated as 0.65 by the spline and 0.70 by the LL regression. Looking at all three tables, one can see that in the majority of cases R_i^2 would be a sufficient approximation of the effect of each variable, but there are several exceptions. This shows the added value of nonlinear regression: it can approximate linear and near-linear main effects, which occur in the majority of cases, but can generalize to nonlinear fits when required.
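This added value can be sketched in a few lines. The snippet below is a minimal illustration on synthetic data (not the case-study data): it estimates S_i as the variance of a smoothing-spline main effect divided by the output variance and compares it with the linear R^2; the smoothing parameter is a plausible default rather than a tuned choice.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
y = x**2 + rng.normal(0.0, 0.1, 500)  # symmetric nonlinear main effect

# Linear fit: R^2 is the squared correlation, close to zero here
r2 = np.corrcoef(x, y)[0, 1] ** 2

# Smoothing spline: S_i estimated as Var(fitted main effect) / Var(y)
order = np.argsort(x)
spline = UnivariateSpline(x[order], y[order], s=500 * 0.1**2)
s_spline = np.var(spline(x)) / np.var(y)
```

On this data set the linear R^2 is near zero while the spline recovers most of the variance of the quadratic main effect, mirroring the Enabling Environment example discussed above.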
An obvious question at this point is to ask whether splines or LL regression
provides a better estimate of Si , given that the fits are slightly different in some
cases. The short answer is that there is no way to know which is better, given that
we do not know the true values of Si or the true main effects. The three tables
show however that in the majority of cases, the estimates are very similar and in
the cases where the estimates differ, neither the splines nor the LL regression has a
tendency to provide higher estimates than the other overall.
It is tempting to think that higher estimates might be better estimates, given that
they capture more of the total variance. However, both the splines and the LL
regression could be easily adjusted to capture 100% of the variance; this would
result in interpolation rather than smoothing. But this does not in general provide a
good approximation of the main effects and results in overfitting. Instead, the aim of
nonlinear regression is to find a balance between interpolation and simplicity; this is known as the bias-variance tradeoff in machine learning; see, e.g., [9].
Given then that neither splines nor local-linear regression provides consistently
higher or lower estimates than the other, the best strategy would be to estimate
main effects using both methods and then compare the results. It can also be helpful
to visually examine the fits; for example, in Fig. 34.5, the Global Scale variable of the Financial Secrecy Index gives a data set that results in very different main effect estimates between the linear, spline, and local-linear approaches. Although none of the fits seems extremely convincing (the data is strongly skewed and heteroskedastic), the LL fit in particular has a strange spike that does not appear to be justified by the data; in this case one might treat the estimate of the local-linear
regression with caution. A clear avenue of research here would be to incorporate
confidence intervals into the estimates of main effects and subsequently into the
estimates of Si . Some approaches to doing this within the context of splines can
be found in [17]. An alternative tool might be to use a Bayesian implementation of
a Gaussian process, which would formally propagate uncertainties in main effect
estimation to estimates of correlation ratios.
Aside from the fitting of the data, splines may offer some small advantages over
LL regression. The first is that, being closely related to linear regression, they can
provide fast analytic estimates of derivatives. This property is used in Fig. 34.4, to
illustrate the gain in Resource Governance Index that a country or entity would
achieve if its value of a given variable changed by a given amount along its axis.
Furthermore, splines are able to exploit properties of linear regression (such as
calculation of the cross-validation measures) to allow very fast fitting to a given data
set. In the examples here, which feature relatively small data sets, computational
time is not an issue. However, some composite indicators based on physical maps
measure concepts such as ecosystem service indices at resolutions as high as 1km,
over the whole of Europe [12]. In such cases, the data set may feature millions
of points, and splines offer a significant advantage over the slower local-linear
regression.
6 Conclusions
The leitmotif of the present chapter is to mind the gap between the two stories
which can be told about the weights of composite indicators. On one hand weights
appear to belong to developers, who are entitled to set them by analysis and
negotiation based on their norms and values. On the other hand, most forms of aggregation, and notably all linear aggregations, are particularly inept at translating these weights into effects. The proposal here is to look at the
problem by choosing a statistically defensible definition of importance, one that
derives from the theory of global sensitivity analysis, which in turn originates
from the theory of experimental design and of analysis of variance (ANOVA-like
decomposition). It is therefore possible to compare the importance of different
variables against what their weights would purport them to be. The discussion
of the Financial Secrecy Index makes it clear that all inferences made here are
conditional on the definition of importance. There is nothing surprising in this.
The same holds in sensitivity analysis. In fact, in sensitivity analysis, "[an] analyst [must] specify what is meant by factors' importance for a particular application. Leaving instead the concept of importance undefined could result in several tests being thrown at the problem and in several rankings of factor importance being obtained, without a basis to decide which one to believe" [24]. The situation is
similar in the analysis of the coherence of the weights with the behavior of the
index.
Still, within the limits of this analysis, it is perhaps possible to say something useful to the developers. As far as the Resource Governance Index is concerned, it can be said that the developers were successful in making Reporting Practices the most important dimension. They were less successful in making the other three dimensions equally important (Table 34.1). Further, in the intention of the developers, each of these three dimensions, Institutional and Legal Setting, Safeguards and Quality Controls, and Enabling Environment, was supposed to be half as important as Reporting Practices. The correlation structure of the problem did not allow this to happen, at least with the selected weights.
For the Financial Secrecy Index, the measures of importance point to a problematic property of the index, whereby one dimension, Global Scale, is apparently far more important than the other, Financial Secrecy (with the exact ratio varying somewhat between the various estimates), while the two are intended to be equally
important. At the same time, the aggregation formula chosen generates an important
interaction term in the index. This challenges the correlation ratio as a sufficient
measure of importance for this particular application. More analysis is needed on
this challenging test case.
In the case of the Good Country Index, it is clear that the ambition to capture
several dimensions normatively labeled as equally important was not rewarded by
the results. Not only are the dimensions unequal, but one important dimension,
International Peace and Security, has a very small influence on the index score
and in fact has a negative correlation with the output.
Note that the approach presented in this chapter can be used to actually improve the indices by tuning the weights so as to obtain the desired importance [13]. The
approach has been implemented in two important indices: the INSEAD-WIPO
Global Innovation Index [18] and the Environmental Performance Index developed
by Yale University [2].
References
1. Anholt, S., Govers, R.: The good country index. Tech. rep., The Good Country Party. http://
www.goodcountry.org/ (2014)
2. Athanasoglou, S., Weziak-Bialowolska, D., Saisana, M.: Environmental performance index 2014: JRC analysis and recommendations. Tech. rep., European Commission, Joint Research Centre (2014)
3. Bandura, R.: Composite indicators and rankings: inventory 2011. Tech. rep., United Nations
Development Programme Office of Development Studies (2011)
4. Bowman, A., Azzalini, A.: Applied Smoothing Techniques for Data Analysis: The Ker-
nel Approach with S-Plus Illustrations, vol 18. Oxford University Press, New York
(1997)
5. Cobham, A., Jansky, P., Christensen, J., Eichenberger, S.: Financial Secrecy Index 2013:
Methodology. Tech. rep., The Tax Justice Network. http://www.financialsecrecyindex.com/
(2013)
6. Cobham, A., Jansky, P., Meinzer, M.: The financial secrecy index: shedding new light on the geography of secrecy. Econ. Geogr. 91(3), 281–303 (2015)
7. Da Veiga, S., Wahl, F., Gamboa, F.: Local polynomial estimation for sensitivity analysis on models with correlated inputs. Technometrics 51(4), 452–463 (2009)
8. Decancq, K., Lugo, M.A.: Weights in multidimensional indices of wellbeing: an overview. Econ. Rev. 32(1), 7–34 (2013)
9. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Data Mining,
Inference and Prediction. Springer, New York (2001)
10. Kelley, J.G., Simmons, B.A.: Politics by number: indicators as social pressure in international relations. Am. J. Pol. Sci. 59(1), 55–70 (2015)
11. Li, G., Rabitz, H., Yelvington, P.E., Oluwole, O.O., Bacon, F., Kolb, C.E., Schoendorf, J.: Global sensitivity analysis for systems with independent and/or correlated inputs. J. Phys. Chem. A 114(19), 6022–6032 (2010)
12. Paracchini, M.L., Zulian, G., Kopperoinen, L., Maes, J., Schägner, J.P., Termansen, M., Zandersen, M., Perez-Soba, M., Scholefield, P.A., Bidoglio, G.: Mapping cultural ecosystem services: a framework to assess the potential for outdoor recreation across the EU. Ecol. Indic. 45, 371–385 (2014)
13. Paruolo, P., Saisana, M., Saltelli, A.: Ratings and rankings: voodoo or science? J. R. Stat. Soc. Ser. A. (Stat. Soc.) 176(3), 609–634 (2013)
14. Pearson, K.: On the General Theory of Skew Correlation and Non-linear Regression. Volume XIV of Mathematical Contributions to the Theory of Evolution, Drapers' Company Research Memoirs. Dulau & Co., London (1905). Reprinted in: Early Statistical Papers, Cambridge University Press, Cambridge (1948)
15. Quiroz, J.C., Lintzer, M.: The 2013 resource governance index. Tech. rep., The Revenue Watch
Institute (2013)
16. Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
17. Ruppert, D., Wand, M., Carroll, R.: Semiparametric Regression, vol. 12. Cambridge University
Press, Cambridge (2003)
18. Saisana, M., Saltelli, A.: Joint Research Centre statistical audit of the 2014 Global Innovation
Index. Tech. rep., European Commission, Joint Research Centre (2014)
19. Saisana, M., Saltelli, A., Tarantola, S.: Uncertainty and sensitivity analysis techniques as tools for the quality assessment of composite indicators. J. R. Stat. Soc. A 168(2), 307–323 (2005)
20. Saisana, M., d'Hombres, B., Saltelli, A.: Rickety numbers: volatility of university rankings and policy implications. Res. Policy 40(1), 165–177 (2011)
21. Saltelli, A., Tarantola, S., Campolongo, F.: Sensitivity analysis as an ingredient of modelling. Stat. Sci. 15(4), 377–395 (2000)
22. Saltelli, A., Tarantola, S.: On the relative importance of input factors in mathematical models: safety assessment for nuclear waste disposal. J. Am. Stat. Assoc. 97, 702–709 (2002)
23. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., Tarantola, S.: Global Sensitivity Analysis: The Primer. John Wiley & Sons, Hoboken (2008)
24. Saltelli, A., Ratto, M., Tarantola, S., Campolongo, F.: Update 1 of: Sensitivity analysis for chemical models. Chem. Rev. 112(5), 125 (2012)
25. Storlie, C., Helton, J.: Multiple predictor smoothing methods for sensitivity analysis: description of techniques. Reliab. Eng. Syst. Saf. 93(1), 28–54 (2008)
26. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)
27. Yang, L.: An inventory of composite measures of human progress. Tech. rep., United Nations
Development Programme Human Development Report Office (2014)
Variance-Based Sensitivity Analysis:
Theory and Estimation Algorithms 35
Clémentine Prieur and Stefano Tarantola
Abstract
This chapter presents an overview of variance-based approaches for global sensitivity analysis. Starting from the functional ANOVA decomposition, Sobol' indices are first defined, and estimation algorithms are then provided. The performance of these algorithms is discussed both theoretically and practically. The review includes recent results on the topic.
Keywords
FANOVA · Sobol' sensitivity indices · Global sensitivity analysis · Monte Carlo sampling · Quasi-Monte Carlo sampling · Sampling design · Replication · Latin hypercube sampling · Orthogonal arrays (OA) · Spectral methods · Fourier amplitude sensitivity test · Random balance design · Effective algorithm for sensitivity indices · Polynomial chaos expansion · Effective dimension · Sensitivity indices with given data
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1218
2 General Definition of FANOVA Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1218
3 Definition of Sobol Sensitivity Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1220
4 Available Estimation Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1221
C. Prieur ()
Laboratoire Jean Kuntzmann (LJK), University of Grenoble Alpes, INRIA, Grenoble, France
e-mail: clementine.prieur@imag.fr
S. Tarantola
Statistical Indicators for Policy Assessment, Joint Research Centre of the European Commission,
Ispra (VA), Italy
e-mail: stefano.tarantola@jrc.ec.europa.eu
1 Introduction
Many mathematical models involve input parameters and variables (simply named
inputs from now on), which are not precisely known. Global sensitivity analysis
aims to identify the inputs whose uncertainty has the largest impact on the variability
of a quantity of interest (the output of the model). One of the statistical tools used to quantify the influence of each input on the output is the set of Sobol' sensitivity indices. These indices measure the part of the output's variance due to one or more inputs. To define the so-called Sobol' sensitivity indices, a first step is the decomposition of the model, seen as a d-variate function G(x), into a finite hierarchical expansion of component functions in terms of the input x = (x_1, x_2, ..., x_d). A way to do this is the functional ANOVA (FANOVA) decomposition, also known as the high-dimensional model representation (HDMR) technique.
Definition 1 (FANOVA decomposition; see, e.g., [9, 37, 48]). Let $x = (x_1, \ldots, x_d) \in I_1 \times \cdots \times I_d$ and let $G \in L^2_{\mathbb{R}}(P_X) = \{ f : I_1 \times \cdots \times I_d \to \mathbb{R} \mid \int f^2(x)\, P_X(dx) < +\infty \}$. One can decompose $G(x) = G(x_1, \ldots, x_d)$ as a sum of functions of increasing dimension:

$$G(x) = G_\emptyset + \sum_{j=1}^{d} G_j(x_j) + \sum_{1 \le j < k \le d} G_{j,k}(x_j, x_k) + \cdots + G_{1,\ldots,d}(x) = \sum_{u \in \mathcal{P}(\{1,\ldots,d\})} G_u(x_u),$$
with $x_u = (x_j,\ j \in u)$. This expansion exists and is unique under the constraints

$$\int G_u(x_u)\, P_{X_j}(dx_j) = 0 \qquad \forall\, j \in u,\ \forall\, u \in \mathcal{P}(\{1,\ldots,d\}).$$
Corollary 1 (variance decomposition). Under the constraints of Definition 1, the variance of $G(X)$ decomposes as

$$\mathrm{Var}(G(X)) = \sum_{j=1}^{d} \mathrm{Var}(G_j(X_j)) + \sum_{1 \le j < k \le d} \mathrm{Var}(G_{j,k}(X_j, X_k)) + \cdots + \mathrm{Var}(G_{1,\ldots,d}(X)) = \sum_{u \in \mathcal{P}(\{1,\ldots,d\})} \mathrm{Var}(G_u(X_u)).$$
Proof. One has

$$\mathbb{E}\big[(G(X) - G_\emptyset)^2\big] = \mathbb{E}\Big[\Big(\sum_{u \in \mathcal{P}(\{1,\ldots,d\}),\ u \neq \emptyset} G_u(X_u)\Big)^2\Big].$$

Then, under the constraints in Definition 1, one gets that $G_\emptyset = \mathbb{E}(G(X))$ and that the $G_u(X_u)$, $u \neq \emptyset$, are centered and pairwise uncorrelated, so that the cross terms vanish. Thus $\mathbb{E}\big[(G(X) - G_\emptyset)^2\big] = \mathrm{Var}(G(X))$ and $\mathbb{E}\big[G_u(X_u)^2\big] = \mathrm{Var}(G_u(X_u))$. This concludes the proof of Corollary 1.
Remark. The elements of the decomposition can be computed recursively with the formula

$$G_u(X_u) = \sum_{v \subseteq u} (-1)^{|u| - |v|}\, \mathbb{E}\big(G(X) \mid X_i,\ i \in v\big).$$
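For a model simple enough that the conditional expectations are available in closed form, the recursive formula and the variance decomposition can be checked numerically. The sketch below (an illustration, not part of the chapter) uses $G(x_1, x_2) = x_1 + x_2 + x_1 x_2$ with independent $U(0,1)$ inputs, for which $\mathbb{E}(G \mid x_1) = 1.5\, x_1 + 0.5$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
x1, x2 = rng.random(N), rng.random(N)
G = x1 + x2 + x1 * x2

# Recursive formula, using the closed-form conditional expectations of this G
G0 = G.mean()              # estimates E[G(X)] = 1.25
G1 = 1.5 * x1 + 0.5 - G0   # E(G | x1) - G0
G2 = 1.5 * x2 + 0.5 - G0   # E(G | x2) - G0
G12 = G - G0 - G1 - G2     # interaction component, equals (x1 - 0.5)(x2 - 0.5)

# Zero-mean constraints and the variance decomposition of Corollary 1
total = G.var()
parts = G1.var() + G2.var() + G12.var()
```

Up to Monte Carlo error, the components are centered and the variance of $G$ equals the sum of the component variances.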
Following [48], Sobol sensitivity indices are defined based on the functional
ANOVA decomposition introduced in the previous section.
Definition 2. For any nonempty $u \in \mathcal{P}(\{1,\ldots,d\})$, if $G \in L^2_{\mathbb{R}}(P_X)$ and $\mathrm{Var}(G(X)) \neq 0$, define

- the Sobol' index associated to $u$: $\displaystyle S_u = \frac{\mathrm{Var}(G_u(X_u))}{\mathrm{Var}(G(X))}$;

- the closed Sobol' index associated to $u$: $\displaystyle S_u^{\mathrm{clo}} = \frac{\mathrm{Var}\big(\mathbb{E}(G(X) \mid X_j,\ j \in u)\big)}{\mathrm{Var}(G(X))}$;

- the total Sobol' index associated to $u$: $\displaystyle S_u^{\mathrm{tot}} = 1 - \frac{\mathrm{Var}\big(\mathbb{E}(G(X) \mid X_j,\ j \notin u)\big)}{\mathrm{Var}(G(X))}$.

The Sobol' indices sum to one over the nonempty subsets:

$$1 = \sum_{u \in \mathcal{P}(\{1,\ldots,d\}),\ u \neq \emptyset} S_u.$$

- The first-order Sobol' index $S_{\{j\}}$, also denoted by $S_j$, measures the main effect of the group of inputs $X_j \in \mathbb{R}^{k_j}$.
- The second-order Sobol' index $S_{\{j,k\}}$ measures the effect due to the second-order interaction between the groups $X_j$ and $X_k$, once the main effect of each group has been removed.
- The closed second-order Sobol' index $S_{\{j,k\}}^{\mathrm{clo}}$ measures the effect due to both groups $X_j$ and $X_k$. One has $S_{\{j,k\}}^{\mathrm{clo}} = S_{\{j,k\}} + S_{\{j\}} + S_{\{k\}}$. Moreover, $S_{\{j\}}^{\mathrm{clo}} = S_{\{j\}}$.
- At last, the total Sobol' index $S_{\{j\}}^{\mathrm{tot}}$ measures the effect due to $X_j$ and its interactions with all the other $X_k$, $k = 1, \ldots, j-1, j+1, \ldots, d$ [15]. One has $S_{\{j\}}^{\mathrm{tot}} = \sum_{u \in \mathcal{P}(\{1,\ldots,d\}),\ j \in u} S_u$. Note that $S_{\{j\}}^{\mathrm{tot}} = 1 - S_{\sim j}^{\mathrm{clo}}$, where $\sim j = \{1, \ldots, j-1, j+1, \ldots, d\}$.
Remark.

- The origin of Sobol' indices probably goes back to the correlation ratio [39].
- In the seminal paper [48], the $X_j$ are assumed to be uniformly distributed on $[0, 1]$. However, the definitions in [48] easily generalize to the ones proposed in Definition 2.
- The generalization of the total Sobol' index $S_u^{\mathrm{tot}}$, for any $u \in \mathcal{P}(\{1,\ldots,d\})$, is also known as the total variation associated with the input set $u$ (see, e.g., [45]). It is also possible to consider the importance superset measure of the group of variables $X_u$, as introduced in [31] for uniform distributions: $\sum_{v \supseteq u} S_v$. This last measure was also investigated in the data-mining framework by [16], and in the case where $u$ represents a pair of inputs, it was used for screening purposes in [11]. In these last papers, the authors work with the unnormalized versions of the indices, that is, they consider the above indices multiplied by the overall variance $\mathrm{Var}(G(X))$.
where $N$ is the sample size (typically of the order of 500–1,000, although larger values might be needed to achieve stable estimates of sensitivity for more complex $G(X)$); $A$ and $B$ are two independent $N \times d$ sampling matrices, where each row of a matrix is a sample point in the $d$-dimensional space of the inputs. In (35.5), $A_B^{(u)}$ is a resampled matrix, where the columns in $u$ come from matrix $B$ and all other $d - \mathrm{card}(u)$ columns come from matrix $A$. Here, $\mathrm{card}(u)$ is the cardinality of the subset $u$. The term $\mathrm{Var}(G(X))$ can be computed efficiently using the $2N$ sample points, i.e., the $N$ rows of matrix $A$ and the $N$ rows of matrix $B$:
$$\widehat{\mathrm{Var}}(G(X)) = \frac{1}{2N} \sum_{i=1}^{N} \Big[ \big(G(A)^{(i)} - \mu(A)\big)^2 + \big(G(B)^{(i)} - \mu(B)\big)^2 \Big] \qquad (35.6)$$

where $\mu(A) = \frac{1}{N} \sum_{i=1}^{N} G(A)^{(i)}$ and $\mu(B) = \frac{1}{N} \sum_{i=1}^{N} G(B)^{(i)}$.
In the particular case where $u = \{j\}$, (35.5) reduces to (35.7a), i.e., to the estimator of the first-order Sobol' index for a single input:

$$\widehat{S}_j\, \mathrm{Var}(G(X)) = \frac{1}{N} \sum_{i=1}^{N} G(B)^{(i)} \Big( G\big(A_B^{(\{j\})}\big)^{(i)} - G(A)^{(i)} \Big). \qquad (35.7a)$$
Setting $Y^{(i)} = G(B)^{(i)}$ and $Y_j^{(i)} = G\big(A_B^{(\{j\})}\big)^{(i)}$, an alternative estimator of $S_j$ is obtained as the ratio of

$$\frac{1}{N} \sum_{i=1}^{N} Y^{(i)} Y_j^{(i)} - \left( \frac{1}{N} \sum_{i=1}^{N} \frac{Y^{(i)} + Y_j^{(i)}}{2} \right)^2 \qquad (35.7b)$$

to

$$\frac{1}{N} \sum_{i=1}^{N} \frac{\big(Y^{(i)}\big)^2 + \big(Y_j^{(i)}\big)^2}{2} - \left( \frac{1}{N} \sum_{i=1}^{N} \frac{Y^{(i)} + Y_j^{(i)}}{2} \right)^2. \qquad (35.7c)$$
Remark. It was proven in [18] that the estimator based on (35.7b) and (35.7c)
is asymptotically efficient, in the sense that it has an optimal asymptotic variance
among a given class of estimators containing, e.g., the estimator defined by
formulas (35.6) and (35.7a). However, in the form of (35.7b) and (35.7c), the
numerical stability is far from being guaranteed. In Remark 1.4 of [18], the authors
propose mathematically equivalent formulas to (35.7b) and (35.7c) which enable
greater numerical stability (i.e., less error due to roundoffs).
Remark. Formulas (35.7b) and (35.7c) can be extended for the estimation of closed
Sobol indices of any order as well (see [18] for more details).
Remark. Owen [38] extends (35.5) by drawing points from a third $N \times d$ sample matrix, obtaining more accurate estimates of small sensitivity indices. However, no additional model evaluations are necessary when applying the strategy of simultaneous estimation of all indices described in [43], Theorem 1 (see also the comments after Theorem 1).
The total Sobol' index can be estimated similarly:

$$\widehat{S}_j^{\mathrm{tot}}\, \mathrm{Var}(G(X)) = \frac{1}{2N} \sum_{i=1}^{N} \Big( G(A)^{(i)} - G\big(A_B^{(\{j\})}\big)^{(i)} \Big)^2. \qquad (35.7d)$$
where $\{j_1, j_2\} \subset \{1, \ldots, d\}$ and $\sim\{j_1, j_2\}$ means all the variables except $\{j_1, j_2\}$. Another theorem (Theorem 1 in [43]) shows that one can obtain single estimates of all first-order, second-order, and total sensitivity indices using $N_T = N(d + 2)$ model evaluations.
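The estimators (35.6), (35.7a), and (35.7d) can be implemented in a few lines. The Python sketch below assumes independent $U(0,1)$ inputs and a vectorized model; the function name and the additive test model (whose analytic first-order indices are $1/14$, $4/14$, and $9/14$) are illustrative only.

```python
import numpy as np

def sobol_pick_freeze(g, d, N, rng):
    """First-order and total Sobol' indices via eqs. (35.6), (35.7a), (35.7d)."""
    # Two independent N x d sampling matrices A and B
    A = rng.random((N, d))
    B = rng.random((N, d))
    gA, gB = g(A), g(B)
    # Total variance from the 2N points of A and B, eq. (35.6)
    var = 0.5 * (((gA - gA.mean()) ** 2).mean() + ((gB - gB.mean()) ** 2).mean())
    S, ST = np.empty(d), np.empty(d)
    for j in range(d):
        ABj = A.copy()
        ABj[:, j] = B[:, j]        # column j from B, the other columns from A
        gABj = g(ABj)
        S[j] = np.mean(gB * (gABj - gA)) / var          # eq. (35.7a)
        ST[j] = np.mean((gA - gABj) ** 2) / (2 * var)   # eq. (35.7d)
    return S, ST

# Additive test model on U(0,1)^3: S_j = j^2 / 14, and S_j^tot = S_j
g = lambda X: X[:, 0] + 2 * X[:, 1] + 3 * X[:, 2]
S, ST = sobol_pick_freeze(g, d=3, N=2**15, rng=np.random.default_rng(0))
```

The total cost of this scheme is $N(d+2)$ model evaluations, in line with the count stated above.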
To explain the procedure yielding Theorem 1 above, an illustration in dimension $d = 4$ is given. The following notation is used:

$$U_u = \mathbb{E}\big[\mathbb{E}(G(X) \mid X_u)^2\big], \qquad V_u = \mathrm{Var}\,\mathbb{E}(G(X) \mid X_u) = U_u - \mathbb{E}^2(G(X)),$$

so that

$$S_j = \frac{U_j - \mathbb{E}^2(G(X))}{\mathrm{Var}(G(X))}, \qquad S_j^{\mathrm{tot}} = 1 - \frac{U_{\sim j} - \mathbb{E}^2(G(X))}{\mathrm{Var}(G(X))}, \qquad S_{\{j_1, j_2\}} = \frac{U_{j_1 j_2} - U_{j_1} - U_{j_2} + \mathbb{E}^2(G(X))}{\mathrm{Var}(G(X))}.$$
bound). If the model has some regularity, in the sense that it has bounded Hardy-Krause variation, the Koksma-Hlawka inequality gives an upper bound for the QMC integration error. The rate of convergence is then in $O\big((\log N)^d / N\big)$, where $N$ is the sample size and $d$ the dimension. However, this is a weak bound for the integration error, as in practice QMC has been shown to outperform MC even for large dimension $d$. Indeed, the key parameter is not the model's nominal dimension but rather its effective dimension [2]. The effective dimension in the truncation sense, $d_T$, is roughly speaking equal to the number of important factors in the model, while a model has effective dimension $d_S$ in the superposition sense if it is almost a sum of $s$-dimensional components, with $s \le d_S$, in the FANOVA decomposition. These dimensions $d_T$ and $d_S$ have a much more important impact on the rate of convergence of QMC than the nominal dimension $d$. This property will be illustrated on numerical experiments later in the chapter.
There exist two main families of constructions for low-discrepancy point sets and sequences: lattices and digital nets/sequences (see [28] for more details on these constructions and their properties). It is usual to use Sobol' sequences [21] as particular cases of quasi-random (or low-discrepancy) sequences of size $(N, d)$. The low-discrepancy properties of Sobol' sequences deteriorate with increasing dimension of the input space. If the important inputs are located in the last components of $X$, then the rate of convergence is negatively affected. Therefore, if a preliminary ranking of inputs in order of importance is available, the convergence can be improved by assigning the most important inputs to the first components of the sequence.
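Sobol' sequences are readily available, for instance in `scipy.stats.qmc`. The sketch below compares a QMC and a plain MC estimate of the mean of a simple linear function over $[0,1]^5$ (true value $5/2$); the sample size $2^{10}$ and the test function are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import qmc

d, m = 5, 10                           # N = 2^m = 1024 points
sampler = qmc.Sobol(d=d, scramble=False)
X_qmc = sampler.random_base2(m=m)      # unscrambled Sobol' points in [0, 1)^d

def g(X):
    return X.sum(axis=1)               # E[g(X)] = d / 2 for independent U(0,1) inputs

est_qmc = g(X_qmc).mean()

rng = np.random.default_rng(0)
X_mc = rng.random((2**m, d))
est_mc = g(X_mc).mean()
```

For such a smooth, low-effective-dimension integrand, the QMC estimate is already very close to the true value at this modest sample size, whereas the MC error decays only at the $N^{-1/2}$ rate.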
Finally, the variance of all such conditional expectations yields the numerator of $S_j$, namely $\mathrm{Var}(G_j(X_j))$. Although several partition strategies are possible, an effective way is to form classes of nearly the same size $N_m = N/M$. A de-biased formula has been proposed by [41]:

$$\widehat{S}_j = \frac{\sum_{m=1}^{M} N_m \big(\bar{G}_m - \bar{G}\big)^2}{\sum_{i=1}^{N} \big(G(X)^{(i)} - \bar{G}\big)^2} \qquad (35.8)$$

where

$$\bar{G}_m = \frac{1}{N_m} \sum_{i \,:\, X_j^{(i)} \in C_m} G(X)^{(i)} \qquad \text{and} \qquad \bar{G} = \frac{1}{N} \sum_{i=1}^{N} G(X)^{(i)}.$$
Remark. Numerical experiments have shown that increasing such equally sized
partitions beyond 50 classes has negligible effect on the estimation accuracy [41].
Remark. The form of the estimate in (35.8) is very close to the one proposed by
McKay [34] (see also [35]) based on permuted column sampling. However, the one
described here is more general as it allows using any sampling design.
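A sketch of this class-based approach in Python is given below, assuming only a given sample of inputs and outputs; the variable names, the choice of $M = 50$ classes (following the remark above), and the toy model are illustrative.

```python
import numpy as np

def first_order_by_classes(xj, y, M=50):
    """Estimate S_j from given data by partitioning the range of x_j into
    M classes of nearly equal size, in the spirit of eq. (35.8)."""
    order = np.argsort(xj)
    classes = np.array_split(order, M)  # class sizes N_m differ by at most one
    y_bar = y.mean()
    num = sum(len(c) * (y[c].mean() - y_bar) ** 2 for c in classes)
    den = ((y - y_bar) ** 2).sum()
    return num / den

rng = np.random.default_rng(3)
N = 10_000
x = rng.random((N, 2))
y = x[:, 0] + 0.1 * rng.normal(size=N)   # x1 explains almost all of the variance
S1 = first_order_by_classes(x[:, 0], y)
S2 = first_order_by_classes(x[:, 1], y)
```

Because the conditional means are computed from whatever sample is given, no special sampling design is required, which is the "given data" advantage noted above.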
In case the model G has some regularity, one might prefer spectral approaches rather
than Monte Carlo approaches to estimate Sobol indices.
Now define

$$\mathbb{Z}_{u,0} := \{ k \in \mathbb{Z}^d \mid \forall j \in u,\ k_j \in \mathbb{Z} \setminus \{0\} \ \text{and}\ \forall j \notin u,\ k_j = 0 \}$$

and $\mathbb{Z}^d_{\neq 0} = \{ k \in \mathbb{Z}^d \mid k \neq (0, \ldots, 0) \}$.
Thanks to the orthonormality of the bases $\{\psi_{j,k},\ k \in \mathbb{Z}\}$, it is possible to identify the components in the FANOVA decomposition as

$$G_u(X_u) = \sum_{k \in \mathbb{Z}_{u,0}} c_k(G)\, \psi_{1,k_1}(X_1) \cdots \psi_{d,k_d}(X_d). \qquad (35.9)$$
Proposition 1. Assume that $G \in L^2_K(P_X)$ and that $\operatorname{Var}(G(X)) \neq 0$. Then, for any nonempty subset $u \subseteq \{1, \ldots, d\}$,
$$S_u = \frac{\sum_{k \in \mathbb{Z}_{u,0}} |c_k(G)|^2}{\operatorname{Var}(G(X))}. \qquad (35.10)$$
Proof. Endow the space $L^2_K(P_X)$ with the scalar product associated to $P_X$: $\langle f, g \rangle = \int f(x) g(x)\, P_X(dx)$. Equality (35.10) is a straightforward consequence of the identification equality (35.9) and of Parseval's identity.
Recall that for all $j = 1, \ldots, d$, $I_j$ is the support of the probability distribution of $X_j$. For $n \in \mathbb{Z}_{>0}$, let $D = \{x_1, \ldots, x_n\}$ be any finite subset of $n$ points in $I_1 \times \cdots \times I_d$. Let $p = (p_1, \ldots, p_n) \in (\mathbb{R}_{\geq 0})^n$ be such that $\sum_{i=1}^{n} p_i = 1$. In the following, the integrals in (35.10) are approximated by quadrature formulas associated to the set of nodes $D$ and to the weights $p$. The coefficient $c_k(G) = \langle G, \psi_k \rangle$ can be approximated by
$$\hat c_k(G, D, p) = \sum_{i=1}^{n} G(x_i)\, \psi_k(x_i)\, p_i$$
and the variance by
$$\hat V(D, p) = \sum_{i=1}^{n} \left( G(x_i) - \sum_{j=1}^{n} G(x_j)\, p_j \right)^2 p_i.$$
Let now $K_u$ be any finite subset of $\mathbb{Z}_{u,0}$ and define the following intuitive estimator for $S_u$:
$$\hat S_u(K_u, D, p) = \frac{\sum_{k \in K_u} |\hat c_k(G, D, p)|^2}{\hat V(D, p)}. \qquad (35.11)$$
35 Variance-Based Sensitivity Analysis: Theory and Estimation Algorithms 1229
Remark. As the infinite set $\mathbb{Z}_{u,0}$ is truncated, an important requirement for this way of estimating Sobol indices is the fast decrease of the coefficients $c_k(G)$. This is why spectral approaches are particularly well suited to regular models G, for which they outperform Monte Carlo-type approaches. However, the choice of the truncation set $K_u$ is very important: too drastic a truncation may lead to nonconvergent estimates.
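A minimal Python sketch of the estimator (35.11) for a first-order index, with uniform weights $p_i = 1/n$, a random design $D$, and the trigonometric basis on $[0,1]$ (the test model and all names are illustrative, not from the chapter). The truncation at $k_{\max}$ deliberately produces the slight underestimation discussed in the remark.

```python
import numpy as np

def first_order_spectral(x1, y, k_max=10):
    """Sketch of the spectral estimator (35.11) for S_1: approximate the
    Fourier coefficients c_k of G in the direction x_1 by the quadrature
    formula with nodes D = {x^(i)} and uniform weights p_i = 1/n, truncate
    the sum at k <= k_max, and normalize by the empirical variance."""
    yc = y - y.mean()
    num = 0.0
    for k in range(1, k_max + 1):
        # orthonormal trigonometric basis on [0, 1]
        c_cos = np.mean(yc * np.sqrt(2) * np.cos(2 * np.pi * k * x1))
        c_sin = np.mean(yc * np.sqrt(2) * np.sin(2 * np.pi * k * x1))
        num += c_cos ** 2 + c_sin ** 2
    return num / np.var(y)

rng = np.random.default_rng(1)
X = rng.uniform(size=(100_000, 2))
Y = X[:, 0] + X[:, 1]          # additive toy model: S_1 = S_2 = 1/2 exactly
s1 = first_order_spectral(X[:, 0], Y)
```

For this model the coefficients decay like $1/k^2$, so truncating at $k_{\max} = 10$ recovers roughly 94% of $V_1$; increasing $k_{\max}$ reduces the negative bias, exactly as in Table 35.2.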
with, for any real number $z$, $\{z\} = z - \lfloor z \rfloor$ the fractional part of $z$, and $\lfloor \cdot \rfloor$ the floor function.
Then, define the linear operators $R_p$ and $T_\varphi$ on $L^2_C([0,1]^d)$ (see Fig. 35.2 below) such that for all $x \in [0,1]^d$,
$$R_p G(x) = G\big(r_p(x_1), \ldots, r_p(x_d)\big) \quad\text{and}\quad T_\varphi G(x) = G\big(t_{\varphi_1}(x_1), \ldots, t_{\varphi_d}(x_d)\big),$$
with $R_p = \underbrace{R_1 \circ \cdots \circ R_1}_{p\ \text{times}}$.
Lemma 1 (Lemma 1 in [54]). For any $p \in \mathbb{N}$ and any $\varphi \in [0, 2\pi)^d$, the variance decomposition is $R_p$- and $T_\varphi$-invariant on $L^2_C([0,1]^d)$.
Proof. The interested reader is referred to Appendix A.3 in [54] for a detailed proof
of Lemma 1.
Two classical designs of experiments are introduced. For any $\omega \in (\mathbb{Z}_{>0})^d$, denote
$$\mathcal{G}(\omega) = \left\{ \left( \left\{ \frac{i\,\omega_1}{n} \right\}, \ldots, \left\{ \frac{i\,\omega_d}{n} \right\} \right),\ i \in \{0, \ldots, n-1\} \right\}$$
the cyclic subgroup of order $n/\gcd(\omega_1, \ldots, \omega_d, n)$ of the torus $\mathbb{T}^d = (\mathbb{R}/\mathbb{Z})^d \simeq [0,1)^d$ generated by $(\{\omega_1/n\}, \ldots, \{\omega_d/n\})$.
Fig. 35.2 Examples of operators $R_p$ and $T_\varphi$ in dimension 1. (a) Plot of $f: x \mapsto x + \sin(x)$. (b) Plot of $R_1 f$. (c) Plot of $T_{\pi/10} f$. (d) Plot of $(T_{\pi/30} \circ R_1) f$
Definition 3. For any nonempty subset $u \subseteq \{1, \ldots, d\}$, any finite subset $K_u \subset \mathbb{Z}_{u,0}$, any $\varphi \in [0, 2\pi)^d$ and any $\omega \in (\mathbb{Z}_{>0})^d$, one has
$$\hat S_u^{\mathrm{FAST}}(G, K_u, x) = \hat S_u\big((T_\varphi \circ R_1)G,\ K_u,\ \mathcal{G}(\omega),\ p_0\big).$$
For any nonempty subset $u \subseteq \{1, \ldots, d\}$, any finite subset $K_{\{u\}} \subset \mathbb{Z}_{u,0}$, any $\sigma \in \mathcal{S}$ and any $\omega \in \mathbb{Z}_{>0}$, one has
$$\hat S_u^{\mathrm{RBD}}(G, K_{\{u\}}, x) = \hat S_u\big((T_{\tilde\omega} \circ R_\omega)G,\ \omega K_{\{u\}},\ \mathcal{A}(\sigma),\ p_0\big)$$
where $\tilde\omega = \left( \dfrac{(1-\omega)\pi}{2\omega}, \ldots, \dfrac{(1-\omega)\pi}{2\omega} \right)$ and $\omega K_{\{u\}} = \{(\omega k_1, \ldots, \omega k_d),\ k \in K_{\{u\}}\}$.
Remark. Note that the linear operator $R_p$ regularizes the function $G$, in the sense that the coefficients $c_k(R_p G)$ decrease to zero faster than the coefficients $c_k(G)$ as $\sum_{j=1}^{d} |k_j|$ tends to infinity. The other operator, $T_\varphi$, essentially allows one to define randomized estimators in FAST.
Remark. FAST was introduced in [6–8] and RBD for first-order Sobol indices in [50] and for higher-order Sobol indices in [57]. Both methods were revisited recently in [54], leading to Definition 3. Under regularity assumptions, the estimators in RBD are asymptotically unbiased.
Remark. A variant of RBD called quasi-random balance design (QRBD) has
recently been proposed to increase the precision and accuracy of first-order indices.
The method differs from RBD in the way the sample is created. QRBD uses
multidimensional quasi-random sequences to construct the permutations which
are used to create the realizations of the inputs. A working paper describing the
methodology is being prepared by Plischke, Tarantola, and Mara.
$$(k - k') \cdot \omega \neq 0 \quad \text{for all } k, k' \in \mathbb{Z}^d,\ k \neq k', \text{ s.t. } \sum_{i=1}^{d} |k_i - k_i'| \leq M + 1, \qquad (35.12)$$
with
$$n \geq M \max(\omega_1, \ldots, \omega_d).$$
More recently, referring to classical information theory, the authors in [46] suggest replacing the previous formula with the Nyquist–Shannon sampling condition
$$n > 2M \max(\omega_1, \ldots, \omega_d).$$
$$(k - k') \cdot \omega \neq 0 \quad \text{for all } k, k' \in \mathbb{Z}^d,\ k \neq k', \text{ s.t. } \sum_{i=1}^{d} |k_i| \leq M \ \text{and}\ \sum_{i=1}^{d} |k_i'| \leq M.$$
Remark. In the RBD method, the parameter ! is usually set equal to 1. The RBD
approach is known to be biased (even if asymptotically unbiased), and recently a
partial correction of the bias was proposed in [53] and generalized in [54].
5 Test Functions
The first test case is the g-Sobol function. It is a classical test case, which is
parameter dependent. Depending on the choice of the parameter, the effective
dimension of this model can be varied. Moreover, an analytical expression for
Sobol indices is available.
5.1.1 Model
$$Y = f_1(X_1) \cdots f_d(X_d) \quad\text{with}\quad (X_1, \ldots, X_d) \sim \mathcal{U}[0,1]^d$$
and
$$f_j(X_j) = \frac{|4X_j - 2| + a_j}{1 + a_j}, \qquad a_j \geq 0,\ j = 1, \ldots, d.$$
The example provided in [43] with $a = (0, 0.5, 3, 9, 99, 99)$ has been chosen in dimension $d = 6$.
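Since the g-Sobol function is a product of independent factors, its variance decomposition is analytic: $V_j = \frac{1}{3(1+a_j)^2}$, $V = \prod_j (1+V_j) - 1$, $S_j = V_j/V$, and $S_j^{tot} = V_j \prod_{k \neq j}(1+V_k)/V$. A short Python sketch of these standard formulas (variable names are illustrative):

```python
import numpy as np

a = np.array([0.0, 0.5, 3.0, 9.0, 99.0, 99.0])   # parameters from [43]

def g_sobol(x, a):
    """g-Sobol test function: product over j of (|4 x_j - 2| + a_j)/(1 + a_j)."""
    return np.prod((np.abs(4 * x - 2) + a) / (1 + a), axis=-1)

# Analytic partial variances and indices for the product structure:
#   V_j = 1 / (3 (1 + a_j)^2),  V = prod(1 + V_j) - 1,
#   S_j = V_j / V,  S_j^tot = V_j * prod_{k != j} (1 + V_k) / V.
Vj = 1.0 / (3.0 * (1.0 + a) ** 2)
V = np.prod(1.0 + Vj) - 1.0
S = Vj / V
S_tot = Vj * np.array([np.prod(np.delete(1.0 + Vj, j))
                       for j in range(len(a))]) / V
```

With this choice of $a$, the first input dominates ($S_1 \approx 0.59$), inputs 5 and 6 are essentially inert, and $S_j^{tot} \geq S_j$ for every $j$, which is what Fig. 35.3 illustrates empirically.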
Fig. 35.3 Estimation of (left) first-order and (right) closed second-order indices for the g-Sobol function with $a = (0, 0.5, 3, 9, 99, 99)$, using replicated designs and formulas (35.7b) and (35.7c), with replicated LHS (left) of level 1024 and replicated randomized orthogonal arrays (right) of level 31. The confidence bounds are provided by the asymptotic central limit theorem stated in [18]
Fig. 35.4 Estimated and theoretical Sobol indices for the g-Sobol function with $a = (0, 0.5, 3, 9, 99, 99)$, using random sampling and formula (35.7d). The confidence bounds are computed with $B = 100$ bootstrap replications. The number of model evaluations is $N = (d+2) \times 1024 = 8192$
$$G(\mathbf{X}) = \sum_{i=1}^{d} X_i - \delta/2,$$
where
$\delta = 1$ if $\exists\, X_i : 0 \leq X_i < 1/2$,
$\delta = 2$ if $\exists\, i \neq j,\ \exists\, X_i, X_j : \{0 \leq X_i < 1/2,\ 0 \leq X_j < 1/2\}$,
$\delta = 3$ if $\exists\, i, j, k$ distinct, $\exists\, X_i, X_j, X_k : \{0 \leq X_i < 1/2,\ 0 \leq X_j < 1/2,\ 0 \leq X_k < 1/2\}$,
and so on until $\delta = d$; in other words, $\delta$ counts the inputs falling below $1/2$.
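Since the function is additive and symmetric, in $d = 2$ each first-order index equals $1/2$ exactly. A quick pick-freeze Monte Carlo check (the pick-freeze estimator is not one of the designs compared in Table 35.1; it is used here only as an independent sanity check, and all names are illustrative):

```python
import numpy as np

def g_disc(x):
    """Discontinuous additive test function: sum_i x_i - delta/2, where
    delta counts the inputs that fall below 1/2."""
    return x.sum(axis=1) - 0.5 * (x < 0.5).sum(axis=1)

# Pick-freeze estimate of S_1: S_1 = Cov(G(X), G(X_1, Z')) / Var(G(X)),
# where Z' is an independent copy of the remaining inputs.
rng = np.random.default_rng(2)
n, d = 200_000, 2
X = rng.uniform(size=(n, d))
Xp = rng.uniform(size=(n, d))
Xp[:, 0] = X[:, 0]                      # freeze x_1, resample the rest
y, yp = g_disc(X), g_disc(Xp)
s1 = (np.mean(y * yp) - y.mean() * yp.mean()) / np.var(y)
```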
Table 35.1 contains the results of applying the Sobol method and all the spectral
methods described in this chapter, for the estimation of first-order indices to the
discontinuous function defined above in two dimensions. Table 35.1 shows quite
interesting results. Firstly, the Sobol method provides very accurate estimates even
at N = 32, although the precision of such estimates (expressed by the standard
deviation over the 100 replicates) is quite weak and becomes comparable to that
of the other techniques only at N = 512. Secondly, RBD is biased at small sample
size, although this bias tends to disappear as N increases. At small N, the de-
biased version of RBD is very accurate, but its precision is lower than that of RBD.
QRBD is both accurate and precise even at small N. However, its de-biased version
underestimates the true values. This is because the assumption of white noise
Table 35.1 First-order indices obtained by the method of Sobol; de-biased and non-de-biased versions of RBD, QRBD, and EASI; as well as the definition of first-order index with no specific design. The tests have been made on the discontinuous function, defined above, in the case of two inputs. The analytic first-order indices are 0.5 and 0.5, due to the symmetric nature of the test function. N indicates the total sample size. The analysis is replicated 100 times. In each cell, the average sensitivity indices and the standard deviation are shown. The "no specific design" approach uses a simple random sample with 16 equally sized bins. In these tests, 16 bins allow the estimation of sensitivity indices only for sample sizes 128, 256, and 512. At lower sample sizes, estimation is not possible due to a few empty bins.
(each cell: S1 S2 (standard deviation))
N                        32             64             128            256            512
Sobol                    .48 .50 (.40)  .49 .48 (.21)  .47 .49 (.13)  .49 .50 (.08)  .49 .49 (.04)
RBD                      .68 .67 (.18)  .58 .58 (.12)  .54 .53 (.08)  .51 .51 (.06)  .51 .51 (.04)
RBD de-biased            .51 .53 (.26)  .49 .50 (.15)  .49 .49 (.10)  .49 .49 (.06)  .50 .49 (.04)
QRBD                     .56 .56 (.04)  .52 .52 (.03)  .50 .50 (.01)  .50 .50 (.01)  .49 .49 (.00)
QRBD de-biased           .28 .28 (.07)  .42 .41 (.04)  .45 .44 (.01)  .47 .47 (.01)  .48 .48 (.00)
EASI                     .68 .67 (.09)  .58 .57 (.06)  .54 .54 (.05)  .51 .51 (.03)  .50 .50 (.02)
EASI de-biased           .47 .48 (.16)  .48 .48 (.07)  .50 .50 (.05)  .50 .50 (.03)  .49 .49 (.02)
No specific design [41]  --             --             .56 .56 (.05)  .53 .52 (.04)  .51 .51 (.03)
Table 35.2 First-order indices obtained using the de-noised version of the EASI method for
different values of frequency truncation; see Equation (35.12). The tests have been made on the
discontinuous function, defined above, in the case of two inputs. Each analytic first-order index is
0.5, due to the symmetric nature of the test function. The total sample size N D 10;000 is chosen
sufficiently large so as to minimize sampling error and highlight the underestimation error due to
the truncation of the Fourier coefficients. The analysis is replicated 100 times. In each cell, the
average sensitivity indices over the 100 replicates are shown. The standard deviations across the
100 replicates are all equal to 0.005. As one can see, the negative bias is progressively reduced as
the truncation parameter increases
Truncation    S1        S2
4             0.4870    0.4865
6             0.4925    0.4915
8             0.4942    0.4942
10            0.4951    0.4953
20            0.4978    0.4981
50            0.4985    0.4987
100           0.4993    0.4998
underlying the de-biasing algorithm does not hold for quasi-random permutations.
Therefore, it is recommended not to use this procedure. Finally, the EASI technique,
which does not require a specific design but works for any given input dataset, is
very precise even at N = 32, although less accurate. However, at large sample size,
its estimates are very accurate. The de-biased version of EASI proves very effective
in both accuracy and precision (Table 35.1).
6 Conclusions
Various approaches have been proposed to handle the case of correlated inputs.
The interested reader is referred to [35, 26, 29, 33] (see also Introduction of the
present chapter).
In many instances, one has to deal with multivariate outputs. Two interesting
approaches are described in [13, 27]. The reader can also refer to [14] for an
overview of different approaches to the analysis of multivariate output.
It is also possible to generalize global sensitivity analysis to computer codes with functional inputs. In [10, 17, 30], the authors propose approaches which result in one sensitivity index for each functional input as a whole. More recently, the authors in [12] proposed a method which provides insight into the sensitivity with respect to changes at specific intervals of the functional domain (see also Chap. 39, Sensitivity Analysis of Spatial and/or Temporal Phenomena, of the present handbook, which presents a review of sensitivity analysis for functional inputs and/or functional outputs).
References
1. Blatman, G., Sudret, B.: Adaptive sparse polynomial chaos expansion based on least angle regression. J. Comput. Phys. 230(6), 2345–2367 (2011)
2. Caflisch, R., Morokoff, W., Owen, A.: Valuation of mortgage backed securities using Brownian bridges to reduce effective dimension. J. Comput. Fin. 1(1), 27–46 (1997)
3. Champion, M., Chastaing, G., Gadat, S., Prieur, C.: L2-Boosting for sensitivity analysis with dependent inputs. Stat. Sin. 25(4) (2015). doi:10.5705/ss.2013.310
4. Chastaing, G., Gamboa, F., Prieur, C.: Generalized Hoeffding-Sobol decomposition for dependent variables. Application to sensitivity analysis. Electron. J. Stat. 6, 2420–2448 (2012)
5. Chastaing, G., Gamboa, F., Prieur, C.: Generalized Sobol sensitivity indices for dependent variables: numerical methods. J. Stat. Comput. Simul. (ahead-of-print), 1–28 (2014)
6. Cukier, H., Fortuin, C.M., Shuler, K., Petschek, A.G., Schaibly, J.H.: Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients: theory. J. Chem. Phys. 59, 3873–3878 (1973)
7. Cukier, H., Levine, R.I., Shuler, K.: Nonlinear sensitivity analysis of multiparameter model systems. J. Comput. Phys. 26, 1–42 (1978)
8. Cukier, H., Schaibly, J.H., Shuler, K.: Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients: analysis of the approximations. J. Chem. Phys. 63, 1140–1149 (1975)
9. Efron, B., Stein, C.: The jackknife estimate of variance. Ann. Stat. 9(3), 586–596 (1981)
10. Fort, J.-C., Klein, T., Lagnoux, A., Laurent, B.: Estimation of the Sobol indices in a linear functional multidimensional model. J. Stat. Plan. Inference 143(9), 1590–1605 (2013)
11. Fruth, J., Roustant, O., Kuhnt, S.: Total interaction index: a variance-based sensitivity index for second-order interaction screening. J. Stat. Plann. Inference 147, 212–223 (2014)
12. Fruth, J., Roustant, O., Kuhnt, S.: Sequential designs for sensitivity analysis of functional inputs in computer experiments. Reliab. Eng. Syst. Saf. 134, 260–267 (2015)
13. Gamboa, F., Janon, A., Klein, T., Lagnoux, A.: Sensitivity analysis for multidimensional and functional outputs. Electron. J. Stat. 8(1), 575–603 (2014)
14. Garcia-Cabrejo, O., Valocchi, A.: Global sensitivity analysis for multivariate output using polynomial chaos expansion. Reliab. Eng. Syst. Saf. 126, 25–36 (2014)
15. Homma, T., Saltelli, A.: Importance measures in global sensitivity analysis of nonlinear models. Reliab. Eng. Syst. Saf. 52(1), 1–17 (1996)
16. Hooker, G.: Discovering additive structure in black box functions. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04), Seattle, pp. 575–580. ACM (2004)
17. Iooss, B., Ribatet, M.: Global sensitivity analysis of computer models with functional inputs. Reliab. Eng. Syst. Saf. 94(7), 1194–1204 (2009)
18. Janon, A., Klein, T., Lagnoux-Renaudie, A., Nodet, M., Prieur, C.: Asymptotic normality and efficiency of two Sobol index estimators. ESAIM: Probab. Stat. 18, 342–364 (2014)
19. Jansen, M.J.W.: Analysis of variance designs for model output. Comput. Phys. Commun. 117(1), 35–43 (1999)
20. Jansen, M.J.W., Rossing, W.A.H., Daamen, R.A.: Monte Carlo estimation of uncertainty contributions from several independent multivariate sources. In: Gasman, J., van Straten, G. (eds.) Predictability and Nonlinear Modelling in Natural Sciences and Economics, pp. 334–343. Kluwer Academic, Dordrecht (1994)
21. Joe, S., Kuo, F.Y.: Remark on algorithm 659: implementing Sobol's quasirandom sequence generator. ACM Trans. Math. Softw. 29(1), 49–57 (2003)
22. Joint Research Centre of the European Commission: Sensitivity analysis software. http://ipsc.jrc.ec.europa.eu/index.php?id=756 (2014)
23. Kelley, T.L.: An unbiased correlation ratio measure. Proc. Natl. Acad. Sci. U.S.A. 21(9), 554–559 (1935)
24. Kishen, K.: On latin and hyper-graeco cubes and hypercubes. Curr. Sci. 11(3), 98–99 (1942)
25. Kucherenko, S., Balazs, F., Nilay, S., Mauntz, W.: The identification of model effective dimensions using global sensitivity analysis. Reliab. Eng. Syst. Saf. 96, 440–449 (2011)
26. Kucherenko, S., Tarantola, S., Annoni, P.: Estimation of global sensitivity indices for models with dependent variables. Comput. Phys. Commun. 183(4), 937–946 (2012)
27. Lamboni, M., Monod, H., Makowski, D.: Multivariate sensitivity analysis to measure global contribution of input factors in dynamic models. Reliab. Eng. Syst. Saf. 96(4), 450–459 (2011)
28. Lemieux, C.: Monte Carlo and Quasi-Monte Carlo Sampling. Springer Series in Statistics. Springer, New York (2009)
29. Li, G., Rabitz, H.: General formulation of HDMR component functions with independent and correlated variables. J. Math. Chem. 50(1), 99–130 (2012)
30. Lilburne, L., Tarantola, S.: Sensitivity analysis of spatial models. Int. J. Geogr. Inf. Sci. 23(2), 151–168 (2009)
31. Liu, R., Owen, A.: Estimating mean dimensionality of analysis of variance decompositions. J. Am. Stat. Assoc. 101(474), 712–721 (2006)
32. Mara, T.A., Joseph, O.R.: Comparison of some efficient methods to evaluate the main effect of computer model factors. J. Stat. Comput. Simul. 78, 167–178 (2008)
33. Mara, T.A., Tarantola, S.: Variance-based sensitivity indices for models with dependent inputs. Reliab. Eng. Syst. Saf. 107, 115–121 (2012)
34. McKay, M.D.: Evaluating prediction uncertainty. Technical Report NUREG/CR-6311, US Nuclear Regulatory Commission and Los Alamos National Laboratory, pp. 1–79 (1995)
35. Moore, L.M., Morris, M.D., McKay, M.D.: Using orthogonal arrays in the sensitivity analysis of computer models. Technometrics 50, 205–215 (2008)
36. Monod, H., Naud, C., Makowski, D.: Uncertainty and sensitivity analysis for crop models. In: Wallach, D., Makowski, D., Jones, J.W. (eds.) Working with Dynamic Crop Models: Evaluation, Analysis, Parameterization, Applications, chap. 4, pp. 55–99. Elsevier (2006)
37. Owen, A.B.: Orthogonal arrays for computer experiments, integration and visualization. Stat. Sin. 2, 439–452 (1992)
38. Owen, A.B.: Better estimation of small Sobol sensitivity indices. ACM Trans. Model. Comput. Simul. 23, 11–17 (2013)
39. Pearson, K.: Mathematical contributions to the theory of evolution. Proc. R. Soc. Lond. 71, 288–313 (1903)
40. Plischke, E.: An effective algorithm for computing global sensitivity indices (EASI). Reliab. Eng. Syst. Saf. 95, 354–360 (2010)
41. Plischke, E., Borgonovo, E., Smith, C.L.: Global sensitivity measures from given data. Eur. J. Oper. Res. 226, 536–550 (2013)
42. Qian, P.Z.G.: Nested Latin hypercube designs. Biometrika 96(4), 957–970 (2009)
43. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput. Phys. Commun. 145, 280–297 (2002)
44. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S.: Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput. Phys. Commun. 181(2), 259–270 (2010)
45. Saltelli, A., Tarantola, S., Campolongo, F.: Sensitivity analysis as an ingredient of modeling. Stat. Sci. 15(4), 377–395 (2000)
46. Saltelli, A., Tarantola, S., Chan, K.: A quantitative, model-independent method for global sensitivity analysis of model output. Technometrics 41, 39–56 (1999)
47. Schaibly, J.H., Shuler, K.: Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients: applications. J. Chem. Phys. 59, 3879–3888 (1973)
48. Sobol, I.M.: Sensitivity analysis for nonlinear mathematical models. Math. Model. Comput. Exp. 1, 407–414 (1993)
49. Sudret, B.: Global sensitivity analysis using polynomial chaos expansion. Reliab. Eng. Syst. Saf. 93, 964–979 (2008)
50. Tarantola, S., Gatelli, D., Mara, T.A.: Random balance designs for the estimation of first-order global sensitivity indices. Reliab. Eng. Syst. Saf. 91, 717–727 (2006)
51. The Comprehensive R Archive Network: DiceDesign package. http://cran.r-project.org/web/packages/DiceDesign/index.html
52. The Comprehensive R Archive Network: Sensitivity package. http://cran.r-project.org/web/packages/sensitivity/
53. Tissot, J.Y., Prieur, C.: A bias correction method for the estimation of sensitivity indices based on random balance designs. Reliab. Eng. Syst. Saf. 107, 205–213 (2012)
54. Tissot, J.Y., Prieur, C.: Variance-based sensitivity analysis using harmonic analysis. Technical report (2012). http://hal.archives-ouvertes.fr/hal-00680725
55. Tissot, J.Y., Prieur, C.: A randomized orthogonal array-based procedure for the estimation of first- and second-order Sobol indices. J. Stat. Comput. Simul. 85(7), 1358–1381 (2015)
56. Wang, X., Fang, K.-T.: The effective dimension and quasi-Monte Carlo integration. J. Complex. 19(2), 101–124 (2003)
57. Xu, C., Gertner, G.: Understanding and comparisons of different sampling approaches for the Fourier amplitudes sensitivity test (FAST). Comput. Stat. Data Anal. 55(1), 184–198 (2011)
Derivative-Based Global Sensitivity
Measures 36
Sergey Kucherenko and Bertrand Iooss
Abstract
The method of derivative-based global sensitivity measures (DGSM) has recently
become popular among practitioners. It has a strong link with the Morris
screening method and Sobol sensitivity indices and has several advantages
over them. DGSM are very easy to implement and evaluate numerically. The
computational time required for numerical evaluation of DGSM is generally
much lower than that for estimation of Sobol sensitivity indices. This paper presents a survey of recent advances in DGSM concerning lower and upper bounds on the values of the Sobol total sensitivity indices $S_i^{tot}$. Using these bounds, it is possible in most cases to get a good practical estimation of the values of $S_i^{tot}$.
Several examples are used to illustrate an application of DGSM.
Keywords
Sensitivity analysis · Sobol indices · Morris method · Model derivatives · DGSM · Poincaré inequality
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1242
2 From Morris Method to DGSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1244
2.1 Basics of the Morris Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1244
2.2 The Local Sensitivity Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1245
S. Kucherenko ()
Department of Chemical Engineering, Imperial College London, London, UK
e-mail: s.kucherenko@imperial.ac.uk
B. Iooss
Industrial Risk Management Department, EDF R&D, Chatou, France
1 Introduction
with characteristic dimensions much less than $\Delta$. Secondly, it lacks the ability
of the Sobol method to provide information about main effects (contribution of
individual variables to uncertainty), and it cannot distinguish between low- and
high-order interactions.
This paper presents a survey of derivative-based global sensitivity measures
(DGSM) and their link with Sobol sensitivity indices. DGSM are based on
averaging local derivatives using Monte Carlo or quasi-Monte Carlo sampling
methods. This technique is much more accurate than the Morris method as the
elementary effects are evaluated as strict local derivatives with small increments
compared to the variable uncertainty ranges. Local derivatives are evaluated at
randomly or quasi-randomly selected points in the whole range of uncertainty and
not at the points from a fixed grid.
The so-called alternative global sensitivity estimator, defined as a normalized integral of partial derivatives, was first introduced in [36]. Kucherenko et al. [17]
introduced some other DGSM and coined the acronym DGSM. They showed that
DGSM can be seen as the generalization of the Morris method [21]. Kucherenko
et al. [17] also established empirically the link between DGSM and Sobol
sensitivity indices. They showed that the computational cost of numerical evaluation
of DGSM can be much lower than that for estimation of Sobol sensitivity indices.
Sobol and Kucherenko [38] proved theoretically that, in the cases of uniformly
and normally distributed input variables, there is a link between DGSM and the
Sobol total sensitivity index Sitot for the same input. They showed that DGSM can
be used as an upper bound on total sensitivity index Sitot . Small values of DGSM
imply small Sitot and hence unessential factors xi . However, ranking influential
factors using DGSM can be similar to that based on Sitot only for the case of
linear and quasi-linear models. For highly nonlinear models, two rankings can be
very different. They also introduced modified DGSM which can be used for both a
single input and groups of inputs [39]. From DGSM, [16] have also derived lower
bounds on total sensitivity index. Lamboni et al. [19] extended results of Sobol
and Kucherenko for models with input variables belonging to the general class
of continuous probability distributions. In the same framework, [28] have defined
crossed-DGSM, based on second-order derivatives of model output, in order to
bound the total Sobol indices of an interaction between two inputs.
All these DGSM can be applied for problems with a high number of input
variables to reduce the computational time. Indeed, the numerical efficiency of the
DGSM method can be improved by using automatic differentiation algorithms for calculating DGSM, as was shown in [15]. However, the number of required function evaluations still remains proportional to the number of inputs. This dependence can be greatly reduced using an approach based on algorithmic differentiation in the adjoint or reverse mode [9] (see Chap. 32, Variational Methods). It allows estimating all derivatives at a cost of at most 4 to 6 times that of evaluating the original function [13].
This paper is organized as follows: the Morris method and DGSM are first described in the following section. Sobol global sensitivity indices and useful relationships are then introduced. Next, DGSM-based lower and upper bounds
1244 S. Kucherenko and B. Iooss
on total Sobol sensitivity indices for uniformly and normally distributed random
variables are presented, followed by DGSM for groups of variables and their link
with total Sobol sensitivity indices. Another section presents the upper bound
results in the general case of variables with continuous probability distributions.
Then, computational costs are considered, followed by some test cases which
illustrate an application of DGSM and their links with total Sobol sensitivity
indices. Finally, conclusions are presented in the last section.
The Morris method is traditionally used as a screening method for problems with a high number of variables for which function evaluations can be CPU time-consuming (see Chap. 33, Design of Experiments for Screening). It is composed
of individually randomized one-factor-at-a-time (OAT) experiments. Each input
factor may assume a discrete number of values, called levels, which are chosen
within the factor range of variation.
The sensitivity measures proposed in the original work of [21] are based on what
is called an elementary effect. It is defined as follows. The range of each input
variable is divided into p levels. Then the elementary effect (incremental ratio) of
the i-th input factor is defined as
$$EE_i(x) = \frac{G\big(x_1, \ldots, x_{i-1},\, x_i + \Delta,\, x_{i+1}, \ldots, x_d\big) - G(x)}{\Delta}, \qquad (36.1)$$
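The elementary effect (36.1) is a one-at-a-time finite difference. A minimal Python sketch (the toy model and base point are illustrative, not from the chapter; the full Morris method then averages $|EE_i|$ over many randomly chosen base points):

```python
import numpy as np

def elementary_effects(G, x, delta, d):
    """One-at-a-time elementary effects (36.1) at a base point x:
    EE_i = (G(x + delta * e_i) - G(x)) / delta."""
    gx = G(x)
    ee = np.empty(d)
    for i in range(d):
        xp = x.copy()
        xp[i] += delta            # perturb only the i-th factor
        ee[i] = (G(xp) - gx) / delta
    return ee

G = lambda x: x[0] + 0.5 * x[1] ** 2     # illustrative toy model
x0 = np.array([0.25, 0.25])
ee = elementary_effects(G, x0, delta=0.5, d=2)
```

For the linear first factor the elementary effect is exactly 1 regardless of the base point; for the quadratic second factor it depends on both the base point and $\Delta$, which is precisely the incremental-ratio behavior the Morris statistics $\mu^*$ and $\sigma$ summarize.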
Finally, other extensions of the initial Morris method have been introduced for
the second-order effects analysis [2, 4, 6], for the estimation of Morris measures
with any type of design [26, 32], and for building some 3D Morris graph [26].
In (36.3) and (36.6), $\nu_i$ is in fact the mean value of $(\partial G/\partial x_i)^2$. In the following, and in practice, it will be the most useful DGSM.
3.1 Definitions
The method of global sensitivity indices developed by Sobol (see Chap. 35, Variance-Based Sensitivity Analysis: Theory and Estimation Algorithms) is based on the ANOVA decomposition [32, 33]. Consider a square integrable function $G(x)$ defined in the unit hypercube $H^d$. It can be expanded in the following form:
$$G(x) = g_0 + \sum_i g_i(x_i) + \sum_{i<j} g_{ij}(x_i, x_j) + \ldots + g_{12\ldots d}(x_1, x_2, \ldots, x_d). \qquad (36.8)$$
This decomposition is unique if the conditions $\int_0^1 g_{i_1 \ldots i_s}\, dx_{i_k} = 0$ for $1 \leq k \leq s$ are satisfied. Here $1 \leq i_1 < \cdots < i_s \leq d$.
The variances of the terms in the ANOVA decomposition add up to the total variance of the function:
$$V = \sum_{s=1}^{d} \sum_{i_1 < \cdots < i_s}^{d} V_{i_1 \ldots i_s},$$
where $V_{i_1 \ldots i_s} = \int_0^1 g^2_{i_1 \ldots i_s}(x_{i_1}, \ldots, x_{i_s})\, dx_{i_1} \cdots dx_{i_s}$ are called partial variances.
Sobol defined the global sensitivity indices as the ratios $S_{i_1 \ldots i_s} = V_{i_1 \ldots i_s}/V$, so that
$$\sum_{i=1}^{d} S_i + \sum_{i<j} S_{ij} + \sum_{i<j<k} S_{ijk} + \ldots + S_{1,2,\ldots,d} = 1.$$
Sobol also defined sensitivity indices for subsets of variables. Consider two complementary subsets of variables $y$ and $z$: $x = (y, z)$.
36 Derivative-Based Global Sensitivity Measures 1247
$V_y$ includes all partial variances $V_{i_1}$, $V_{i_2}$, …, $V_{i_1 \ldots i_s}$ such that their subsets of indices $(i_1, \ldots, i_s) \in K$.
The total sensitivity indices were introduced by [11]. The total variance $V_y^{tot}$ is defined as
$$V_y^{tot} = V - V_z.$$
$V_y^{tot}$ consists of all $V_{i_1 \ldots i_s}$ such that at least one index $i_p \in K$, while the remaining indices can belong to the complementary set $\bar K$. The corresponding global sensitivity indices are defined as
$$S_y = V_y/V, \qquad S_y^{tot} = V_y^{tot}/V. \qquad (36.9)$$
For individual variables $x_i$,
$$S_i = V_i/V, \qquad S_i^{tot} = V_i^{tot}/V. \qquad (36.10)$$
Their values in most cases provide sufficient information to determine the sensitivity of the analyzed function to individual input variables. Variance-based methods generally require a large number of function evaluations (see Chap. 35, Variance-Based Sensitivity Analysis: Theory and Estimation Algorithms) to achieve reasonable convergence and can become impractical for large engineering problems.
To present further results on lower and upper bounds of $S_i^{tot}$, new notations and useful relationships have to be presented first. Denote by $u_i(x)$ the sum of all terms in the ANOVA decomposition (36.8) that depend on $x_i$:
$$u_i(x) = g_i(x_i) + \sum_{j=1,\, j \neq i}^{d} g_{ij}(x_i, x_j) + \cdots + g_{12\ldots d}(x_1, \ldots, x_d). \qquad (36.11)$$
It is obvious that
$$\frac{\partial G}{\partial x_i} = \frac{\partial u_i}{\partial x_i}. \qquad (36.13)$$
Denote by $z = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$ the vector of all variables but $x_i$; then $x \equiv (x_i, z)$ and $G(x) \equiv G(x_i, z)$. The ANOVA decomposition of $G(x)$ (36.8) can be presented in the following form:
$$u_i(x_i, z) = G(x) - \int_0^1 G(x)\, dx_i. \qquad (36.14)$$
This equation can be found in [18]. The total partial variance $V_i^{tot}$ can be computed as
$$V_i^{tot} = \int_{H^d} u_i^2(x)\, dx = \int_{H^d} u_i^2(x_i, z)\, dx_i\, dz.$$
where $x' = (x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n)$.
Theorem 1. Assuming that $c \leq \dfrac{\partial G}{\partial x_i} \leq C$, then
$$\frac{c^2}{12\,V} \leq S_i^{tot} \leq \frac{C^2}{12\,V}. \qquad (36.17)$$
In this section, several theorems are listed in order to define useful lower and upper
bounds of the total Sobol indices. The proofs of these theorems come from previous
works and papers and are not recalled here. Two cases are considered: variables x
following uniform distributions and variables x following Gaussian distributions.
The general case will be seen in a subsequent section.
Theorem 3. There exists the following lower bound, denoted $\gamma(m)$, between DGSM (36.4) and the Sobol total sensitivity index:
$$\gamma(m) = \frac{(2m+1)\,\big[w_i^{(m+1)}\big]^2}{(m+1)^2\, V} < S_i^{tot}, \qquad (36.21)$$
where, by integration by parts, $w_i^{(m+1)} = (m+1)\int_{H^d} \big(G(1, z) - G(x)\big)\, x_i^m\, dx$.
$$\gamma_i^{(m)} = \{f_i,\, w_i,\, \gamma_i\}, \qquad m > 0. \qquad (36.24)$$
$$S_i^{tot} \leq \frac{\nu_i}{\pi^2\, V}. \qquad (36.25)$$
From a practical point of view, the criteria $\nu_i$ and $\tau_i$ are equivalent: they are evaluated by the same numerical algorithm and are linked by the relations $\nu_i \leq C\,\tau_i$ and $\tau_i \leq \sqrt{\nu_i}$.
The right term in (36.25) is further called the upper bound number one (UB1).
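A quick numerical sanity check of UB1, i.e. of $S_i^{tot} \leq \nu_i/(\pi^2 V)$ with $\nu_i$ the mean of the squared derivative, for a toy model $G(x) = x_1 + x_2^2$ with independent uniform inputs on $[0,1]^2$ (an illustrative example, not one from the chapter; here the derivatives and variances are known analytically, so no model evaluations are needed):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(size=(200_000, 2))

# Analytic partial derivatives of G(x) = x_1 + x_2^2:
dG1 = np.ones(len(x))                   # dG/dx_1 = 1
dG2 = 2.0 * x[:, 1]                     # dG/dx_2 = 2 x_2
nu = np.array([np.mean(dG1 ** 2),       # nu_1 = 1
               np.mean(dG2 ** 2)])      # nu_2 -> 4/3

# Additive model, so the total indices are analytic:
V = 1.0 / 12.0 + 4.0 / 45.0             # Var(x_1) + Var(x_2^2)
S_tot = np.array([(1.0 / 12.0) / V, (4.0 / 45.0) / V])
UB1 = nu / (np.pi ** 2 * V)             # the bound (36.25)
```

For this model UB1 gives roughly 0.59 and 0.78 against the true total indices 0.48 and 0.52: the bound holds but, as noted in the text, its tightness depends on how nonlinear the model is.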
Theorem 5. There exists the following upper bound between DGSM (36.5) and the Sobol total sensitivity index:
$$S_i^{tot} \leq \frac{\varsigma_i}{V}. \qquad (36.26)$$
Further, $\varsigma_i/V$ is called the upper bound number two (UB2). Note that $\frac{1}{2}x_i(1-x_i)$ for $0 \leq x_i \leq 1$ is bounded: $0 \leq \frac{1}{2}x_i(1-x_i) \leq 1/8$. Therefore, $0 \leq \varsigma_i \leq \nu_i/8$.
$$\frac{\sigma_i^4}{(\sigma_i^2 + \mu_i^2)\, V}\, w_i^2 \leq S_i^{tot}. \qquad (36.28)$$
Proof. Using Eq. (36.15) and the Cauchy–Schwarz inequality applied to $\int_{\mathbb{R}^d} x_i u_i(x)\, dF(x)$ (with $F$ the joint cdf), [16] give the proof of this inequality when $\mu_i = 0$ (omitting to mention this condition). The general proof, obtained by [25], is given below.
Consider a univariate function $g(X)$, with $X$ a normally distributed variable with mean $\mu$, finite variance $\sigma^2$, and cdf $F$. With adequate conditions on $g$, the following equality is obtained by integrating by parts:
$$\mathbb{E}\, g'(X) = \int_{-\infty}^{\infty} g'(x)\, dF(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} g'(x) \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma}\left[g(x)\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right]_{-\infty}^{+\infty} + \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} g(x)\,\frac{x-\mu}{\sigma^2}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx$$
$$= \frac{1}{\sigma^2}\left[\int_{-\infty}^{\infty} x\, g(x)\, dF(x) - \mu \int_{-\infty}^{\infty} g(x)\, dF(x)\right].$$
because $\int_{\mathbb{R}^d} u_i(\mathbf{x})\, dF(\mathbf{x}) = 0$ (due to the ANOVA decomposition condition). Moreover, the Cauchy-Schwarz inequality applied to $\int_{\mathbb{R}^d} x_i u_i(\mathbf{x})\, dF(\mathbf{x})$ gives

$$\left(\int_{\mathbb{R}^d} x_i u_i(\mathbf{x})\, dF(\mathbf{x})\right)^2 \le \int_{\mathbb{R}^d} x_i^2\, dF(\mathbf{x}) \int_{\mathbb{R}^d} u_i(\mathbf{x})^2\, dF(\mathbf{x}).$$

This leads to

$$w_i^2 \le \frac{1}{\sigma_i^4}\left(\mu_i^2 + \sigma_i^2\right) V\, S_i^{tot},$$

which is equivalent to (36.28).
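The Gaussian integration-by-parts identity used above, $\mathbb{E}[g'(X)] = \left(\mathbb{E}[Xg(X)] - \mu\,\mathbb{E}[g(X)]\right)/\sigma^2$, can be checked numerically for any smooth $g$ with moderate growth. A sketch using simple midpoint quadrature (the test function and the parameters $\mu, \sigma$ are arbitrary illustrative choices):

```python
import math

def gauss_expect(h, mu, sigma, n=200001, width=10.0):
    """Midpoint-rule approximation of E[h(X)] for X ~ N(mu, sigma^2)."""
    a = mu - width * sigma
    dx = 2 * width * sigma / n
    norm = math.sqrt(2 * math.pi) * sigma
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += h(x) * math.exp(-(x - mu) ** 2 / (2 * sigma**2)) * dx
    return total / norm

mu, sigma = 0.7, 1.3
g = lambda x: math.sin(x) + x**2        # arbitrary smooth test function
gp = lambda x: math.cos(x) + 2 * x      # its derivative

lhs = gauss_expect(gp, mu, sigma)
rhs = (gauss_expect(lambda x: x * g(x), mu, sigma)
       - mu * gauss_expect(g, mu, sigma)) / sigma**2
```

The two sides agree to quadrature accuracy, confirming that the boundary term in the integration by parts indeed vanishes.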
Theorem 7. If $X_i$ has a finite variance $\sigma_i^2$ and $c \le \left|\frac{\partial G}{\partial x_i}\right| \le C$, then

$$\frac{\sigma_i^2 c^2}{V} \;\le\; S_i^{tot} \;\le\; \frac{\sigma_i^2 C^2}{V}. \qquad (36.29)$$

The constant factor $\sigma_i^2$ cannot be improved.
Theorem 8. If $X_i$ is normally distributed with a finite variance $\sigma_i^2$, there exists the following upper bound between the DGSM (36.6) and the Sobol total sensitivity index:

$$S_i^{tot} \;\le\; \frac{\sigma_i^2\, \nu_i}{V}. \qquad (36.30)$$

The constant factor $\sigma_i^2$ cannot be reduced.
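For a function that is linear in the inputs, the bound (36.30) is attained, which is consistent with the claim that the factor $\sigma_i^2$ cannot be reduced: with $G(\mathbf{x}) = \sum_j c_j x_j$ and independent $X_j \sim N(0, \sigma_j^2)$, one has $\nu_i = c_i^2$, $V = \sum_j c_j^2 \sigma_j^2$, and $S_i^{tot} = c_i^2\sigma_i^2/V = \sigma_i^2\nu_i/V$. A sketch (the coefficients are illustrative choices):

```python
# Linear Gaussian model: G(x) = sum(c_j * x_j), X_j ~ N(0, sigma_j^2).
c = [1.0, 2.0, 0.5]
sigma = [1.0, 0.3, 2.0]

V = sum((cj * sj) ** 2 for cj, sj in zip(c, sigma))   # total variance
nu = [cj**2 for cj in c]                              # nu_i = E[(dG/dx_i)^2]
S_tot = [(cj * sj) ** 2 / V for cj, sj in zip(c, sigma)]
ub = [sj**2 * ni / V for sj, ni in zip(sigma, nu)]    # right side of (36.30)
```

In this additive case the total indices also sum to one, since there are no interactions.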
$$\Upsilon_y = \sum_{p=1}^{s} \int \left(\frac{\partial G(\mathbf{x})}{\partial x_{i_p}}\right)^{\!2} \frac{1 - 3x_{i_p} + 3x_{i_p}^2}{6}\, d\mathbf{x}. \qquad (36.31)$$
Proof. The proofs of these theorems are given in [39]. The second theorem shows that small values of $\Upsilon_y$ imply small values of $S_y^{tot}$, and this allows identification of a set of unessential factors $y$ (usually defined by a condition of the type $S_y^{tot} < \varepsilon$, where $\varepsilon$ is small).
Considering the one-dimensional case, when the subset $y$ consists of only one variable, $y = (x_i)$, the measure $\Upsilon_y = \Upsilon_i$ has the form

$$\Upsilon_i = \int \left(\frac{\partial G(\mathbf{x})}{\partial x_i}\right)^{\!2} \frac{1 - 3x_i + 3x_i^2}{6}\, d\mathbf{x}. \qquad (36.32)$$
$$S_i^{tot} \;\le\; \frac{24\, \Upsilon_i}{\pi^2 V}. \qquad (36.33)$$
Thus, small values of $\Upsilon_i$ imply small values of $S_i^{tot}$, which is characteristic of nonimportant variables $x_i$. At the same time, the following corollary is obtained from Theorem 9: if $G(\mathbf{x})$ depends linearly on $x_i$, then $S_i^{tot} = \Upsilon_i/V$. Thus $\Upsilon_i$ is closer to $V_i^{tot}$ than $\nu_i$. Note that the constant factor $1/\pi^2$ in (36.25) is the best possible. But in the general inequality for $\Upsilon_i$ (36.33), the best possible constant factor is unknown.
There is a general link between the importance measures $\nu_i$, $\varsigma_i$, and $\Upsilon_i$:

$$\frac{1}{6}\,\nu_i = \varsigma_i + \Upsilon_i,$$

and therefore

$$\varsigma_i = \frac{1}{6}\,\nu_i - \Upsilon_i.$$
$$S_y^{tot} \;\le\; \frac{24\, \Upsilon_y}{\pi^2 V}.$$
The L$^2$-Poincaré inequality writes

$$\int G(\mathbf{x})^2\, dF(\mathbf{x}) \le C(F) \int \left\|\nabla G(\mathbf{x})\right\|^2 dF(\mathbf{x}), \qquad (36.34)$$

where $F$ is the joint cdf of $(X_1, \ldots, X_d)$. Equation (36.34) is valid for all functions $G$ in $L^2(F)$ such that $\int G(\mathbf{x})\, dF(\mathbf{x}) = 0$ and $\|\nabla G\| \in L^2(F)$. The constant $C(F)$ in Eq. (36.34) is called a Poincaré constant of $F$. In some cases, there exists an optimal Poincaré constant $C_{opt}(F)$, which is the best possible constant. In measure theory, Poincaré constants are expressed as functions of so-called Cheeger constants [1], which are used for SA in [19] (see [28] for more details).
A connection between total indices and DGSM has been established by [19] for
variables with continuous distributions (called Boltzmann probability measures in
their paper).
Theorem 12. Let $F_i$ and $f_i$ be, respectively, the cdf and the pdf of $X_i$; the following inequality is obtained:

$$S_i^{tot} \;\le\; \frac{C(F_i)\, \nu_i}{V}, \qquad (36.35)$$

with

$$C(F_i) = 4\left[\sup_{x \in \mathbb{R}} \frac{\min\left(F_i(x),\, 1 - F_i(x)\right)}{f_i(x)}\right]^2. \qquad (36.36)$$
Proof. This result comes from the direct application of the L$^2$-Poincaré inequality (36.34) to $u_i(\mathbf{x})$ (see Eq. (36.11)).
In [19] and [28], the particular case of log-concave probability distributions has been developed. It includes classical distributions such as the normal, exponential, beta, gamma, and Gumbel distributions. In this case, the constant writes

$$C(F_i) = \frac{1}{f_i(\tilde{m}_i)^2} \qquad (36.37)$$

with $\tilde{m}_i$ the median of the distribution $F_i$. This allows one to obtain analytical expressions for $C(F_i)$ in several cases [19]. In the case of a log-concave distribution truncated on $[a, b]$, the constant writes [28]

$$C(F_i) = \left(F_i(b) - F_i(a)\right)^2 \Big/ f_i\!\left(q_i\!\left(\frac{F_i(a) + F_i(b)}{2}\right)\right)^{\!2} \qquad (36.38)$$

with $q_i(\cdot)$ the quantile function of $X_i$. Table 36.1 gives some examples of Poincaré constants for several well-known probability distributions often used in practice.
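For instance, for a normal distribution $N(\mu, \sigma^2)$ the median is $\mu$ and $f(\mu) = 1/(\sigma\sqrt{2\pi})$, so (36.37) gives $C = 2\pi\sigma^2$; for an exponential distribution with rate $\lambda$ the median is $\ln 2/\lambda$, where the pdf equals $\lambda/2$, so $C = 4/\lambda^2$. A small sketch of these closed forms:

```python
import math

def poincare_normal(sigma):
    """Eq. (36.37) for N(mu, sigma^2): C = 1 / f(median)^2 = 2*pi*sigma^2."""
    f_median = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return 1.0 / f_median**2

def poincare_exponential(lam):
    """Eq. (36.37) for Exp(lam): median = ln(2)/lam, f(median) = lam/2."""
    f_median = lam * math.exp(-lam * (math.log(2) / lam))
    return 1.0 / f_median**2
```

These constants then plug directly into the bound (36.35).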
For studying second-order interactions, [28] have derived an inequality similar to (36.35), based on the squared crossed derivatives of the function. Assuming that the second-order derivatives of $G$ are in $L^2(F)$, it uses the so-called crossed-DGSM

$$\nu_{ij} = \int \left(\frac{\partial^2 G(\mathbf{x})}{\partial x_i\, \partial x_j}\right)^{\!2} dF(\mathbf{x}), \qquad (36.39)$$

introduced by [7]. An inequality link is made with an extension of the total Sobol sensitivity indices to general sets of variables (called superset importance or total interaction index) proposed by [20]. In the case of a pair of variables $\{X_i, X_j\}$, the superset importance is defined as

$$V_{ij}^{super} = \sum_{I \supseteq \{i, j\}} V_I. \qquad (36.40)$$
The estimation methods for this total interaction index have also been studied by [8]. Roustant et al. [28] have shown on several examples how to apply this result in order to detect pairs of inputs that do not interact (see also [22] and [8], which use Sobol indices).
7 Computational Costs

All DGSM can be computed using the same set of partial derivatives $\frac{\partial G(\mathbf{x})}{\partial x_i}$, $i = 1, \ldots, d$. Evaluation of $\frac{\partial G(\mathbf{x})}{\partial x_i}$ can be done analytically for explicitly given, easily differentiable functions, or numerically:

$$\frac{\partial G(\mathbf{x}^*)}{\partial x_i} = \frac{G\left(x_1^*, \ldots, x_{i-1}^*,\, x_i^* + \delta,\, x_{i+1}^*, \ldots, x_n^*\right) - G(\mathbf{x}^*)}{\delta}. \qquad (36.43)$$
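Combined with MC sampling, Eq. (36.43) yields a crude estimator of the DGSM $\nu_i = \mathbb{E}\big[(\partial G/\partial x_i)^2\big]$ at a cost of $n(d+1)$ model runs. A sketch for independent $U(0,1)$ inputs (the test function and step size are illustrative choices, and no boundary handling is done, which is fine for this smooth example):

```python
import random

def dgsm_nu(G, d, n=20000, delta=1e-5, seed=0):
    """Crude MC estimate of nu_i = E[(dG/dx_i)^2], i = 1..d, using
    forward finite differences (Eq. 36.43) on U(0,1) inputs."""
    rng = random.Random(seed)
    acc = [0.0] * d
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        g0 = G(x)
        for i in range(d):
            xp = list(x)
            xp[i] += delta       # perturb only the i-th coordinate
            acc[i] += ((G(xp) - g0) / delta) ** 2
    return [a / n for a in acc]

# For G(x) = x1 + 2*x2 the derivatives are 1 and 2, so nu = [1, 4].
nu = dgsm_nu(lambda x: x[0] + 2 * x[1], d=2)
```

All $d$ DGSM are obtained from the same sample, which is the point made in this section.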
8 Test Cases

In this section, three test cases are considered in order to illustrate the application of DGSM and their links with $S_i^{tot}$.

Example 2. Consider the so-called g-function, which is often used in global SA for illustration purposes:

$$G(\mathbf{x}) = \prod_{i=1}^{d} v_i, \qquad \text{where} \quad v_i = \frac{|4x_i - 2| + a_i}{1 + a_i}$$

and $a_i$ ($i = 1, \ldots, d$) are constants. It is easy to see that for this function $g_i(x_i) = v_i - 1$, $u_i(\mathbf{x}) = (v_i - 1)\prod_{j=1, j \ne i}^{d} v_j$, and as a result LB1 = 0. The total variance is

$$V = -1 + \prod_{j=1}^{d}\left(1 + \frac{1/3}{(1 + a_j)^2}\right).$$

The analytical values of $S_i$, $S_i^{tot}$, and LB2 are given in Table 36.2.
By solving the equation $\frac{d\gamma(m)}{dm} = 0$, one obtains $m^* = 9.64$ and $\gamma(m^*) = \frac{0.0772}{(1 + a_i)^2 V}$. It is interesting to note that $m^*$ depends neither on $a_i$, $i = 1, 2, \ldots, d$, nor on $d$.

Table 36.2 The analytical expressions for $S_i$, $S_i^{tot}$, and $\gamma(m)$ for the g-function

$$S_i = \frac{1/3}{(1 + a_i)^2 V}, \qquad S_i^{tot} = \frac{1/3}{(1 + a_i)^2 V}\prod_{j=1, j\ne i}^{d}\left(1 + \frac{1/3}{(1 + a_j)^2}\right), \qquad \gamma(m) = \frac{2m + 1}{(1 + a_i)^2 (m + 1)^2 V}\left[\frac{4\left(1 - (1/2)^{m+1}\right)}{m + 2}\right]^2$$

In the extreme cases, if $a_i \to \infty$ for all $i$, then $\frac{\gamma(m^*)}{S_i^{tot}} \to 0.257$ and $\frac{S_i}{S_i^{tot}} \to 1$, while if $a_i \to 0$ for all $i$, then $\frac{\gamma(m^*)}{S_i^{tot}} \to \frac{0.257}{(4/3)^{d-1}}$ and $\frac{S_i}{S_i^{tot}} \to \frac{1}{(4/3)^{d-1}}$. The analytical expressions for $S_i^{tot}$, UB1, and UB2 are given in Table 36.3.

For this test function, $\frac{S_i^{tot}}{\mathrm{UB1}} = \frac{\pi^2}{48}$ and $\frac{S_i^{tot}}{\mathrm{UB2}} = \frac{1}{4}$; hence $\frac{\mathrm{UB2}}{\mathrm{UB1}} = \frac{\pi^2}{12} < 1$. Values of $S_i$, $S_i^{tot}$, UB1, UB2, and LB2 for the case of $a = [0, 1, 4.5, 9, 99, 99, 99, 99]$, $d = 8$, are given in Table 36.4 and shown in Fig. 36.1. One can see that the knowledge
Table 36.3 The analytical expressions for $S_i^{tot}$, UB1, and UB2 for the g-function

$$S_i^{tot} = \frac{1/3}{(1 + a_i)^2 V}\prod_{j\ne i}\left(1 + \frac{1/3}{(1 + a_j)^2}\right), \qquad \mathrm{UB1} = \frac{16}{\pi^2 (1 + a_i)^2 V}\prod_{j\ne i}\left(1 + \frac{1/3}{(1 + a_j)^2}\right), \qquad \mathrm{UB2} = \frac{4}{3(1 + a_i)^2 V}\prod_{j\ne i}\left(1 + \frac{1/3}{(1 + a_j)^2}\right)$$
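The expressions in Table 36.3 imply the constant ratios $S_i^{tot}/\mathrm{UB1} = \pi^2/48$ and $S_i^{tot}/\mathrm{UB2} = 1/4$ quoted above; a quick check for the configuration of Example 2 (a direct transcription of the table, not an independent derivation):

```python
import math

a = [0, 1, 4.5, 9, 99, 99, 99, 99]
d = len(a)
V = math.prod(1 + (1 / 3) / (1 + ai) ** 2 for ai in a) - 1

def prod_except(i):
    return math.prod(1 + (1 / 3) / (1 + a[j]) ** 2 for j in range(d) if j != i)

S_tot = [(1 / 3) / ((1 + a[i]) ** 2 * V) * prod_except(i) for i in range(d)]
UB1 = [16 / (math.pi**2 * (1 + a[i]) ** 2 * V) * prod_except(i) for i in range(d)]
UB2 = [4 / (3 * (1 + a[i]) ** 2 * V) * prod_except(i) for i in range(d)]
```

Since both upper bounds are proportional to $S_i^{tot}$ here, they preserve the ranking of the inputs exactly.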
Table 36.4 Values of LB*, $S_i$, $S_i^{tot}$, UB1, and UB2. Example 2, $a = [0, 1, 4.5, 9, 99, 99, 99, 99]$, $d = 8$

            $x_1$    $x_2$    $x_3$     $x_4$     $x_5 \ldots x_8$
LB*         0.166    0.0416   0.00549   0.00166   0.000017
$S_i$       0.716    0.179    0.0237    0.00720   0.0000716
$S_i^{tot}$ 0.788    0.242    0.0343    0.0105    0.000105
UB1         3.828    1.178    0.167     0.0509    0.00051
UB2         3.149    0.969    0.137     0.0418    0.00042
Fig. 36.1 Values of $S_i$, $S_i^{tot}$, LB2, and UB1 for all input variables. Example 2 with $a = [0, 1, 4.5, 9, 99, 99, 99, 99]$, $d = 8$
of LB2 and UB1 allows one to correctly rank all the variables in order of their importance.
Example 3. Consider the reduced Morris test function with four inputs [3]:

$$f(\mathbf{x}) = \sum_{i=1}^{4} b_i x_i + \sum_{i<j}^{4} b_{ij} x_i x_j + \sum_{i<j<k}^{4} b_{ijk} x_i x_j x_k \qquad (36.44)$$

with

$$b_i = \begin{pmatrix} 0.05 \\ 0.59 \\ 10 \\ 0.21 \end{pmatrix}, \qquad b_{ij} = \begin{pmatrix} 0 & 80 & 60 & 40 \\ 0 & 30 & 0.73 & 0.18 \\ 0 & 0 & 0.64 & 0.93 \\ 0 & 0 & 0 & 0.06 \end{pmatrix}, \qquad b_{ij4} = \begin{pmatrix} 0 & 10 & 0.98 & 0.19 \\ 0 & 0 & 0.49 & 50 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Numerical tests involving nonuniform and non-normal distributions for the inputs can be found in [19] and [8].
9 Conclusions

This chapter has shown that, by using lower and upper bounds based on DGSM, it is possible in most cases to get a good practical estimation of the values of $S_i^{tot}$ at a fraction of the CPU cost of estimating $S_i^{tot}$ directly. Upper and lower bounds can be estimated by MC/QMC integration methods using the same set of partial derivative values. Most of the applications show that DGSM can be used for fixing unimportant variables and subsequent model reduction, because small values of DGSM imply small values of $S_i^{tot}$. In the general case, variable ranking can be different for DGSM and variance-based methods, but for linear functions and product functions, DGSM give the same variable ranking as $S_i^{tot}$.

Engineering applications of DGSM can be found, for instance, in [15] and [27] for biological systems modeling, [24] for structural mechanics, [12] for an aquatic prey-predator model, [25] for a river flood model, and [41] for a hydrogeological simulator in the oil industry. One of the main prospects in practical situations is to use algorithmic differentiation in the reverse (adjoint) mode on the numerical model, allowing efficient estimation of all partial derivatives of this model (see Chap. 32, Variational Methods). In this case, the cost of DGSM estimation would be independent of the number of input variables. Obtaining global sensitivity information at a reasonable CPU time cost is therefore possible even for large-dimensional models (several tens of inputs, and spatially distributed inputs in the recent and pioneering attempt of [25]). When the adjoint model is not available, DGSM estimation remains a problem in high dimension, and novel ideas have to be explored [23, 24]. Coupling DGSM with nonparametric regression techniques or metamodel-based techniques (see Chap. 38, Metamodel-Based Sensitivity Analysis: Polynomial Chaos Expansions and Gaussian Processes) is another research prospect, as first shown by [40] and [5].
Acknowledgments The authors would like to thank Prof. I. Sobol, Dr. S. Song, S. Petit, Dr. M. Lamboni, Dr. O. Roustant, and Prof. F. Gamboa for their contributions to this work. One of the authors (SK) gratefully acknowledges the financial support of the EPSRC grant EP/H03126X/1.
References
1. Bobkov, S.G.: Isoperimetric and analytic inequalities for log-concave probability measures. Ann. Probab. 27(4), 1903–1921 (1999)
2. Campolongo, F., Braddock, R.: The use of graph theory in the sensitivity analysis of model output: a second order screening method. Reliab. Eng. Syst. Saf. 64, 1–12 (1999)
3. Campolongo, F., Cariboni, J., Saltelli, A.: An effective screening design for sensitivity analysis of large models. Environ. Model. Softw. 22, 1509–1518 (2007)
4. Cropp, R., Braddock, R.: The new Morris method: an efficient second-order screening method. Reliab. Eng. Syst. Saf. 78, 77–83 (2002)
5. De Lozzo, M., Marrel, A.: Estimation of the derivative-based global sensitivity measures using a Gaussian process metamodel. SIAM/ASA J. Uncertain. Quantif. (2016, accepted for publication)
6. Fédou, J.M., Rendas, M.J.: Extending Morris method: identification of the interaction graph using cycle-equitable designs. J. Stat. Comput. Simul. 85, 1398–1419 (2015)
7. Friedman, J., Popescu, B.: Predictive learning via rule ensembles. Ann. Appl. Stat. 2(3), 916–954 (2008)
8. Fruth, J., Roustant, O., Kuhnt, S.: Total interaction index: a variance-based sensitivity index for second-order interaction screening. J. Stat. Plan. Inference 147, 212–223 (2014)
9. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2nd edn. SIAM, Philadelphia (2008)
10. Hardy, G., Littlewood, J., Pólya, G.: Inequalities, 2nd edn. Cambridge University Press, London (1988)
11. Homma, T., Saltelli, A.: Importance measures in global sensitivity analysis of non linear models. Reliab. Eng. Syst. Saf. 52, 1–17 (1996)
12. Iooss, B., Popelin, A.L., Blatman, G., Ciric, C., Gamboa, F., Lacaze, S., Lamboni, M.: Some new insights in derivative-based global sensitivity measures. In: Proceedings of the PSAM11 ESREL 2012 Conference, Helsinki, pp. 1094–1104 (2012)
13. Jansen, K., Leovey, H., Nube, A., Griewank, A., Mueller-Preussker, M.: A first look at quasi-Monte Carlo for lattice field theory problems. Comput. Phys. Commun. 185, 948–959 (2014)
14. Jansen, M.: Analysis of variance designs for model output. Comput. Phys. Commun. 117, 35–43 (1999)
15. Kiparissides, A., Kucherenko, S., Mantalaris, A., Pistikopoulos, E.: Global sensitivity analysis challenges in biological systems modeling. Ind. Eng. Chem. Res. 48, 1135–1148 (2009)
16. Kucherenko, S., Song, S.: Derivative-based global sensitivity measures and their link with Sobol sensitivity indices. In: Cools, R., Nuyens, D. (eds.) Proceedings of the Eleventh International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (MCQMC 2014). Springer, Leuven (2015)
17. Kucherenko, S., Rodriguez-Fernandez, M., Pantelides, C., Shah, N.: Monte Carlo evaluation of derivative-based global sensitivity measures. Reliab. Eng. Syst. Saf. 94, 1135–1148 (2009)
18. Lamboni, M.: New way of estimating total sensitivity indices. In: Proceedings of the 7th International Conference on Sensitivity Analysis of Model Output (SAMO 2013), Nice (2013)
19. Lamboni, M., Iooss, B., Popelin, A.L., Gamboa, F.: Derivative-based global sensitivity measures: general links with Sobol indices and numerical tests. Math. Comput. Simul. 87, 45–54 (2013)
20. Liu, R., Owen, A.: Estimating mean dimensionality of analysis of variance decompositions. J. Am. Stat. Assoc. 101(474), 712–721 (2006)
21. Morris, M.: Factorial sampling plans for preliminary computational experiments. Technometrics 33, 161–174 (1991)
22. Muehlenstaedt, T., Roustant, O., Carraro, L., Kuhnt, S.: Data-driven Kriging models based on FANOVA-decomposition. Stat. Comput. 22, 723–738 (2012)
23. Patelli, E., Pradlwarter, H.: Monte Carlo gradient estimation in high dimensions. Int. J. Numer. Methods Eng. 81, 172–188 (2010)
24. Patelli, E., Pradlwarter, H.J., Schuëller, G.I.: Global sensitivity of structural variability by random sampling. Comput. Phys. Commun. 181, 2072–2081 (2010)
25. Petit, S.: Analyse de sensibilité globale du module MASCARET par l'utilisation de la différentiation automatique. Rapport de stage de fin d'études de Supélec, EDF R&D, Chatou (2015)
26. Pujol, G.: Simplex-based screening designs for estimating metamodels. Reliab. Eng. Syst. Saf. 94, 1156–1160 (2009)
27. Rodriguez-Fernandez, M., Banga, J., Doyle, F.: Novel global sensitivity analysis methodology accounting for the crucial role of the distribution of input parameters: application to systems biology models. Int. J. Robust Nonlinear Control 22, 1082–1102 (2012)
28. Roustant, O., Fruth, J., Iooss, B., Kuhnt, S.: Crossed-derivative-based sensitivity measures for interaction screening. Math. Comput. Simul. 105, 105–118 (2014)
29. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput. Phys. Commun. 145, 280–297 (2002)
30. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., Tarantola, S.: Global Sensitivity Analysis: The Primer. Wiley, Chichester/Hoboken (2008)
31. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S.: Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput. Phys. Commun. 181, 259–270 (2010)
32. Santiago, J., Corre, B., Claeys-Bruno, M., Sergent, M.: Improved sensitivity through Morris extension. Chemom. Intell. Lab. Syst. 113, 52–57 (2012)
33. Sobol, I.: Sensitivity estimates for non linear mathematical models (in Russian). Matematicheskoe Modelirovanie 2, 112–118 (1990)
34. Sobol, I.: Sensitivity estimates for non linear mathematical models. Math. Model. Comput. Exp. 1, 407–414 (1993)
35. Sobol, I.: Global sensitivity indices for non linear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55, 271–280 (2001)
36. Sobol, I., Gershman, A.: On an alternative global sensitivity estimators. In: Proceedings of SAMO 1995, Belgirate, pp. 40–42 (1995)
37. Sobol, I., Kucherenko, S.: Global sensitivity indices for non linear mathematical models. Rev. Wilmott Mag. 1, 56–61 (2005)
38. Sobol, I., Kucherenko, S.: Derivative based global sensitivity measures and their links with global sensitivity indices. Math. Comput. Simul. 79, 3009–3017 (2009)
39. Sobol, I., Kucherenko, S.: A new derivative based importance criterion for groups of variables and its link with the global sensitivity indices. Comput. Phys. Commun. 181, 1212–1217 (2010)
40. Sudret, B., Mai, C.V.: Computing derivative-based global sensitivity measures using polynomial chaos expansions. Reliab. Eng. Syst. Saf. 134, 241–250 (2015)
41. Touzany, S., Busby, D.: Screening method using the derivative-based global sensitivity indices with application to reservoir simulator. Oil Gas Sci. Technol. - Rev. IFP Energ. Nouvelles 69, 619–632 (2014)
Moment-Independent and
Reliability-Based Importance Measures 37
Emanuele Borgonovo and Bertrand Iooss
Abstract
This chapter discusses the class of moment-independent importance measures.
This class comprises density-based, cumulative distribution function-based, and
value of information-based sensitivity measures. The chapter illustrates the
definition and properties of these importance measures as they have been
proposed in the literature, reviewing a common rationale that envelops them,
as well as recent results that concern the general properties of global sensitivity
measures. The final part of the chapter reviews importance measures developed
in the context of reliability and structural reliability theories.
Keywords
Computer experiment · Global sensitivity analysis · Moment-independent importance measures · Reliability importance measures · Structural reliability · Value of information · Common rationale · Uncertainty
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1266
2 Density-Based Importance Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1267
3 Sensitivity Measures Based on Separations Between Cumulative Distribution
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1272
4 Value of Information-Based Sensitivity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1274
5 A Common Rationale: Properties of Sensitivity Measures . . . . . . . . . . . . . . . . . . . . . . . . 1276
E. Borgonovo ()
Department of Decision Sciences, Bocconi University, Milan, Italy
e-mail: emanuele.borgonovo@unibocconi.it
B. Iooss
Industrial Risk Management Department, EDF R&D, Chatou, France
1 Introduction
The last section introduces first classical reliability measures and then a recently
introduced perturbation-based sensitivity measure for structural reliability.
$$Y = G(X_1, X_2, X_3) = X_1 X_2 + X_3 \qquad (37.1)$$

and suppose that the analyst is uncertain about the model inputs. She assigns $X_1$ a uniform distribution between $-1$ and $1$, and $X_2$ and $X_3$ uniform distributions between $0$ and $1$, all independent. Then, our degree of belief about $Y$ is displayed by the distribution of $Y$. If $Y$ is a continuous random variable, then we can refer to its density, $f_Y(y)$, which is the blue curve in Fig. 37.1. Precisely, $f_Y(y)$ is the unconditional density of $Y$.
Fig. 37.1 Unconditional model output density ($f_Y(y)$, blue) and conditional model output density given that $X_1 = 0$ ($f_{Y|X_1=0}(y)$, red)
Consider now inspecting what happens if we are able to fix $X_1$, say, at its mean value (which equals 0). Then, the input-output mapping becomes

$$G(0, X_2, X_3) = X_3 \qquad (37.2)$$

and the conditional model output density given $X_1 = 0$ is a uniform density between 0 and 1. This distribution is plotted as the red curve in Fig. 37.1. Thus, receiving the information that $X_1 = 0$ causes a change in the decision-maker's view about $Y$, which is reflected by the change in the corresponding densities.
The shaded region in Fig. 37.1 displays the area enclosed between the two densities. Such an area is given, in formulae, by

$$a_1(0) = \int_Y \left| f_Y(y) - f_{Y|X_1=0}(y) \right| dy \qquad (37.3)$$

where the subscript in $a_1(0)$ refers to the model input ($X_1$) and $(0)$ reflects the value at which $X_1$ has been fixed. In our case, we register $a_1(0) = 0.4556$. In general, we shall write

$$a_i(x_i) = \int_Y \left| f_Y(y) - f_{Y|X_i=x_i}(y) \right| dy \qquad (37.4)$$

where $a_i(x_i)$ is the area enclosed between the conditional and unconditional densities, obtained by fixing the generic model input $X_i$ at $x_i$.
However, because $X_i$ is a continuous random variable, it can assume any other value in its support. Thus, $a_i(x_i)$ is a conditional separation for any value $x_i$. We can make the separation unconditional by taking the expectation over all possible values of $X_i$ in accordance with the marginal distribution of $X_i$. We then write

$$\delta_i = \frac{1}{2}\,\mathbb{E}\!\left[\int_Y \left| f_Y(y) - f_{Y|X_i}(y) \right| dy\right] = \frac{1}{2}\int_{X_i} f_{X_i}(x_i)\left[\int_Y \left| f_Y(y) - f_{Y|X_i=x_i}(y) \right| dy\right] dx_i \qquad (37.5)$$

where the factor $\frac{1}{2}$ is inserted for normalization purposes. The sensitivity measure in Eq. (37.5) is the expected separation between the conditional and unconditional model output densities provoked by fixing $X_i$, with the separation measured using the $L^1$-norm. The sensitivity measure in Eq. (37.5) has been introduced in Borgonovo [6] and is referred to as the moment-independent importance measure [36] or Borgonovo's $\delta$ [51]. We shall refer to it as $\delta$, for simplicity, in the remainder.

$\delta$ possesses several properties. The first is normalization: $0 \le \delta_i \le 1$. In particular, $\delta_i$ assumes the null value if and only if $Y$ is independent of $X_i$. In fact, if $Y$ and $X_i$ are independent, fixing $X_i$ leaves the distribution of $Y$ unchanged. Conversely, note that $\delta_i$ can be rewritten as (see Plischke et al. [41])

$$\delta_i = \frac{1}{2}\int_{X_i}\int_Y \left| f_Y(y)\, f_{X_i}(x_i) - f_{Y, X_i}(y, x_i) \right| dy\, dx_i \qquad (37.6)$$
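$\delta_i$ can be estimated from a single input-output sample by partitioning the sample on $X_i$ and comparing histogram estimates of the conditional and unconditional densities, in the spirit of the given-data estimators discussed later in this chapter. The sketch below targets the running model $Y = X_1 X_2 + X_3$; the partition size, bin count, and the dummy input $X_4$ (independent of $Y$) are illustrative choices:

```python
import random

def delta_hat(xs, ys, n_part=20, n_bins=30):
    """Histogram-based estimate of Borgonovo's delta from a given sample."""
    n = len(ys)
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / n_bins

    def hist(sample):
        # piecewise-constant density estimate on a common grid
        h = [0.0] * n_bins
        for y in sample:
            k = min(int((y - lo) / width), n_bins - 1)
            h[k] += 1.0 / (len(sample) * width)
        return h

    f_y = hist(ys)
    pairs = sorted(zip(xs, ys))       # equal-probability partition on X_i
    size = n // n_part
    total = 0.0
    for p in range(n_part):
        f_cond = hist([y for _, y in pairs[p * size:(p + 1) * size]])
        # inner L1 separation between conditional and unconditional densities
        total += 0.5 * sum(abs(u - v) for u, v in zip(f_y, f_cond)) * width
    return total / n_part

rng = random.Random(1)
n = 50000
x1 = [rng.uniform(-1, 1) for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
x4 = [rng.random() for _ in range(n)]     # dummy input, absent from the model
y = [a * b + rng.random() for a, b in zip(x1, x2)]

d1, d4 = delta_hat(x1, y), delta_hat(x4, y)
```

The dummy input is not exactly zero because of estimation noise; it provides a practical noise floor against which the other estimates can be compared.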
Moreover, $\delta$ reaches its maximum when all model inputs are fixed simultaneously; the conditional density then degenerates, and

$$\delta_{1,2,\ldots,d} = \frac{1}{2}\,\mathbb{E}\!\left[\int_Y \left| f_Y(y) - f_{Y|\mathbf{X}=\mathbf{x}}(y) \right| dy\right] = \frac{1}{2}\int 2 f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x} = 1 \qquad (37.7)$$
A third property concerns sequential learning. Consider starting from the present state of knowledge, and suppose that we are able to fix model input $X_j$ after we determine model input $X_i$. Then, we register a first change in degree of belief, from $f_Y(y)$ (where we have not fixed any model input) to $f_{Y|X_i}(y)$ (Fig. 37.2). This change is quantified by $\delta_i$.

Once we get to know $X_j$, we have a second shift in degree of belief, from the situation in which we know $X_i$ to the situation in which we know $X_i$ and $X_j$, that is, a shift from $f_{Y|X_i}(y)$ to $f_{Y|X_i, X_j}(y)$. The average shift from $f_{Y|X_i}(y)$ to $f_{Y|X_i, X_j}(y)$ is given by

$$\delta_{j|i} = \frac{1}{2}\,\mathbb{E}\!\left[\int_Y \left| f_{Y|X_i}(y) - f_{Y|X_i, X_j}(y) \right| dy\right]. \qquad (37.8)$$
Now, after these two shifts, we are in a state in which we know both $X_i$ and $X_j$. Then, our degree of belief about $Y$ is represented by $f_{Y|X_i, X_j}(y)$. This state is
Fig. 37.2 A visual explanation of the triangular inequality for distance-based sensitivity measures
equivalently reached with a direct shift from $f_Y(y)$ to $f_{Y|X_i, X_j}(y)$. This last change is quantified by

$$\delta_{i,j} = \frac{1}{2}\,\mathbb{E}\!\left[\int_Y \left| f_Y(y) - f_{Y|X_i, X_j}(y) \right| dy\right]. \qquad (37.9)$$

The three shifts are linked by the triangular inequality

$$\delta_{i,j} \le \delta_i + \delta_{j|i}. \qquad (37.10)$$
A fourth property is invariance under monotonic transformations of the output: if $Z = g(Y)$,

$$\delta_i^Y = \delta_i^Z. \qquad (37.11)$$

In fact, for any density function, the change-of-variable rule says that if $Z = g(Y)$ is a monotonically increasing function, then

$$f_Z(z)\, dz = f_Y(y)\, dy. \qquad (37.12)$$

Hence, we have

$$\delta_i^Y = \frac{1}{2}\,\mathbb{E}\!\left[\int_Y \left| f_Y(y) - f_{Y|X_i}(y) \right| dy\right] = \frac{1}{2}\,\mathbb{E}\!\left[\int_Z \left| f_Z(z) - f_{Z|X_i}(z) \right| dz\right] = \delta_i^Z. \qquad (37.13)$$
Fig. 37.3 First three graphs: unconditional model output density (black) and conditional densities given that $X_1$, $X_2$, and $X_3$ are fixed at several points in their domains, respectively. Fourth graph: the separations $a_i(x_i)$ as $x_i$ varies between 0 and 1, for $i = 1, 2, 3$
Fig. 37.4 Visual interpretation of $k_i(x_i)$ in Eq. (37.19): Kuiper's distance between cumulative distribution functions
$F_{Y|X_i}(y)$ is the cumulative distribution function conditional on fixing model input $X_i$. The expectation is taken in accordance with the marginal distribution of $X_i$. In Baucells and Borgonovo [3], the choice of the metric is thoroughly discussed. If scale invariance is not a relevant property, one can use a metric based on a generic $L^p$-distance between cumulative distribution functions, selecting

$$h\left(F_Y(y), F_{Y|X_i}(y)\right) = \left[\int \left| F_Y(y) - F_{Y|X_i}(y) \right|^p dy\right]^{1/p}, \qquad 1 \le p < \infty. \qquad (37.17)$$
If transformation invariance is, instead, a convenient property, then one can base the sensitivity measure on the Kolmogorov-Smirnov distance and its modifications. In Baucells and Borgonovo [3], Kuiper's distance is considered, because this metric "is equally sensitive for all values of $y$, and therefore puts all percentiles on an equal footing" [21, p. 140]. Kuiper's modification remedies a weakness associated with the Kolmogorov-Smirnov metric, which attributes more weight to deviations around the median than at the distribution tails [19, 37]. A visualization of Kuiper's metric is in Fig. 37.4.

Formally, one has

$$k_i(X_i) = \sup_y \left[ F_Y(y) - F_{Y|X_i}(y) \right] + \sup_y \left[ F_{Y|X_i}(y) - F_Y(y) \right] \qquad (37.18)$$

where $\sup_y \left[ F_Y(y) - F_{Y|X_i}(y) \right]$ is the highest value of the difference between $F_Y(y)$ and $F_{Y|X_i}(y)$, while $\sup_y \left[ F_{Y|X_i}(y) - F_Y(y) \right]$ is the highest value of the opposite difference.
" #
Ku
D E sup FY .y/ FY jXi .y/ C sup FY jXi .y/ FY .y/ (37.19)
y y
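Given the two cdfs, Kuiper's distance (37.18) can be computed by scanning a grid of $y$ values. A small sketch, where the two cdfs are illustrative stand-ins (a $U(0,1)$ and a shifted $U(0.5, 1.5)$) for which both suprema are known exactly:

```python
def kuiper(F, G, ys):
    """Kuiper distance sup_y(F - G) + sup_y(G - F), Eq. (37.18),
    with the cdfs F and G evaluated on a grid of points ys."""
    d_plus = max(F(y) - G(y) for y in ys)
    d_minus = max(G(y) - F(y) for y in ys)
    # each supremum is floored at zero, as in the definition
    return max(d_plus, 0.0) + max(d_minus, 0.0)

F_unif = lambda y: min(max(y, 0.0), 1.0)          # cdf of U(0, 1)
G_shift = lambda y: min(max(y - 0.5, 0.0), 1.0)   # cdf of U(0.5, 1.5)

grid = [k / 1000.0 for k in range(-500, 2001)]
k_dist = kuiper(F_unif, G_shift, grid)            # exact value: 0.5 + 0 = 0.5
```

For a pure location shift only one of the two suprema is active; a spread change activates both, which is where Kuiper's distance differs from Kolmogorov-Smirnov.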
A broader family is obtained by weighting the Kolmogorov-Smirnov distance:

$$h\left(F_Y, F_{Y|X_i}\right) = \sup_y\, w\!\left(F_Y(y), F_{Y|X_i}(y)\right) \left| F_Y(y) - F_{Y|X_i}(y) \right| \qquad (37.20)$$

where $w\left(F_Y(y), F_{Y|X_i}(y)\right)$ is a general weighting function that allows us to assign a different relevance to different portions of the distribution. The weighting function proposed by Anderson and Darling [1], $w\left(F_Y(y), F_{Y|X_i}(y)\right) = \left[F_Y(y)\left(1 - F_Y(y)\right)\right]^{-1/2}$, is one of the most well known. The Anderson-Darling weight places emphasis on the distribution tails. A weighting function well known in financial studies of value at risk, also focused on the distribution tails, is proposed by Crnkovic and Drachman [21], who obtain a suitable modification of Kuiper's metric.
To illustrate, we conclude with the values of $\beta^{Ku}$ for the model in Eq. (37.1). The sensitivity measures are computed using the subroutine betaKS2.m (available upon request). The numerical values are reported in the fourth row of Table 37.1. We observe that the values of $\beta_i^{Ku}$ are equal to the values of $\delta_i$ for the three model inputs. This is not a coincidence: as proven in Plischke and Borgonovo [40], the two sensitivity measures $\beta^{Ku}$ and $\delta$ are numerically identical when the conditional and unconditional distributions are unimodal.
The decision-maker selects the alternative that has the maximum expected utility, solving the problem

$$\max_{a=1,2,\ldots,A} \mathbb{E}\left[Y_a\right]. \qquad (37.21)$$

Equation (37.21) implies that the decision-maker uses the model to estimate the distribution of the decision criterion under each alternative, propagates uncertainty, and then selects the alternative that produces the highest expected value. The quantity $\max_{a=1,2,\ldots,A} \mathbb{E}[Y_a]$ is the value of the decision problem given the present state of information.
In this context, the effect of receiving information about $X_i$ is twofold. On the one hand, the distributions of all the $Y_a$'s change. We have discussed this effect in the previous sections. On the other hand, the change in distributions might induce a change in the preferred alternative. In fact, after we are informed that $X_i = x_i$, the decision problem becomes selecting the best alternative conditional on $X_i = x_i$. In other words, the optimal alternative is $a$ without the additional information, but it might become $a'$ when the new information has arrived. In this case, we register a change in both the preferred alternative and the value of the decision problem, the latter change being

$$\max_{a=1,2,\ldots,A} \mathbb{E}\left[Y_a \mid X_i = x_i\right] - \max_{a=1,2,\ldots,A} \mathbb{E}\left[Y_a\right]. \qquad (37.22)$$

Taking the expectation with respect to $X_i$, we obtain the partial expected value of perfect information about $X_i$:

$$\eta_i = \mathbb{E}\!\left[\max_{a=1,2,\ldots,A} \mathbb{E}\left[Y_a \mid X_i\right]\right] - \max_{a=1,2,\ldots,A} \mathbb{E}\left[Y_a\right]. \qquad (37.23)$$
The concept of value of information has been introduced by Howard [28]. The expression $\mathbb{E}\left[\max_{a=1,2,\ldots,A} \mathbb{E}[Y_a \mid X_i]\right]$ is called in Pratt et al. [42, p. 252] the prior expected value of action posterior to perfect information. In the remainder, we shall refer to $\eta_i$ as the information value. As observed in Howard [28], the information value is a nonnegative quantity.

In sensitivity analysis, the intuition of using the information value as a global importance measure is due to Felli and Hazen [25]. It has then been studied in several works [38, 46, 48].
As for the properties of the information value, we note that $\eta_i$ is greater than or equal to zero. Its lower bound, $\eta_i = 0$, is reached when $Y_a$ is independent of $X_i$. Its upper bound is represented by the total information value,

$$\eta^T = \mathbb{E}\!\left[\max_{a=1,2,\ldots,A} \mathbb{E}\left[Y_a \mid \mathbf{X}\right]\right] - \max_{a=1,2,\ldots,A} \mathbb{E}\left[Y_a\right], \qquad (37.24)$$

which is the value of information for resolving uncertainty in all the model inputs simultaneously.
As opposed to $\delta_i$ and $\beta_i^{Ku}$, a null information value does not reassure the analyst that $Y$ is independent of $X_i$. In fact, it might be the case that fixing $X_i$ never changes the preferred alternative even though $Y_a$ depends on $X_i$. We refer to Borgonovo, Hazen, and Plischke [13] for further details. We conclude the section by estimating the value of information for the model inputs of our running example. Suppose the decision-maker has two available alternatives, $a$ and $b$. Alternative $a$ leads to a sure $Y_a = \frac{1}{2}$, while alternative $b$ leads to an uncertain $Y_b$, where $Y_b$ is given by Eq. (37.1), with the usual model input distributions. The numerical values of $\eta_i$ are reported in Table 37.1. We observe that $\eta_1 = \eta_3 = \frac{1}{8}$, while $\eta_2 = 0$. This last result, as mentioned, does not mean that $Y$ is independent of $X_2$. Conversely, it indicates that fixing $X_2$ at any possible value does not cause the preferred alternative to change.
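These values can be reproduced by one-dimensional quadrature, since the conditional expectations are available in closed form for the running example: $\mathbb{E}[Y_b \mid X_1 = x] = x/2 + 1/2$, $\mathbb{E}[Y_b \mid X_2 = x] = 1/2$, and $\mathbb{E}[Y_b \mid X_3 = x] = x$. A sketch (the quadrature resolution is an illustrative choice):

```python
def voi(cond_mean, lo, hi, base=0.5, n=200000):
    """Partial expected value of perfect information, Eq. (37.23), for the
    two-alternative problem: sure payoff `base` versus the uncertain Y_b.
    `cond_mean` maps x to E[Y_b | X_i = x]; X_i ~ U(lo, hi).
    Here E[Y_b] = 1/2 = base, so the prior value of the problem is `base`."""
    dx = (hi - lo) / n
    acc = sum(max(base, cond_mean(lo + (k + 0.5) * dx)) for k in range(n)) * dx
    return acc / (hi - lo) - base

eta1 = voi(lambda x: x / 2 + 0.5, -1.0, 1.0)   # equals 1/8
eta2 = voi(lambda x: 0.5, 0.0, 1.0)            # equals 0
eta3 = voi(lambda x: x, 0.0, 1.0)              # equals 1/8
```

The max inside the integral is what makes the information value insensitive to $X_2$: its conditional expectation never crosses the sure payoff, so no new information about $X_2$ can change the choice.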
This section reports some more recent theoretical developments. The discussion in the previous sections shows that density-based, distribution-based, and information value sensitivity measures have a common aspect. As underlined in Borgonovo, Hazen, and Plischke [13], they measure, using some form of discrepancy, the shift between the unconditional model output distribution and the conditional one, given that $X_i$ is fixed. This common rationale is thoroughly investigated in Borgonovo, Hazen, and Plischke [13], to which we refer for full details. We succinctly report here the main findings. Let $P_Y$ be the distribution that reflects the present decision-maker degree of belief about the model output(s). Then, let $P_{Y|X_i}$ be the distribution that reflects the present decision-maker degree of belief about the model output(s) when $X_i$ is fixed. Then, consider a generic operator between distributions, say $\zeta(\cdot, \cdot)$. The operator goes from pairs of distributions $(P', P'')$ to $\mathbb{R}$. Borgonovo, Hazen, and Plischke [13] define an inner operator as an operator $\zeta(\cdot, \cdot)$ that possesses the minimal property that $\zeta(P, P) = 0$. That is, if the two distributions coincide, then the operator returns a null value. Then, one can define the inner statistic based on the inner operator $\zeta(\cdot, \cdot)$ as the quantity

$$\zeta_i(X_i) = \zeta\left(P_Y, P_{Y|X_i}\right). \qquad (37.25)$$

The inner statistic in Eq. (37.25) is a function that goes from the support of $X_i$ into $\mathbb{R}$ and whose numerical value depends on the value taken on by $X_i$. Requiring that $\zeta_i(X_i)$ is Riemann-Stieltjes integrable with respect to the marginal distribution of $X_i$, [13] defines the global sensitivity measure of $X_i$ with respect to $Y$ as the quantity

$$\xi_i = \mathbb{E}\!\left[\zeta\left(P_Y, P_{Y|X_i}\right)\right]. \qquad (37.26)$$
Observe that the definition in Eq. (37.26) does not require independence among the
model inputs. The definition also applies when a group of model inputs is considered
(thus, the subscript i would denote a group of inputs). Reference [13] shows that this
definition encompasses all sensitivity measures discussed in this chapter, as well as
variance-based sensitivity measures. In fact, if we set

$$\zeta(P, Q) = \int_{\Omega} h\left(\left|\frac{\mathrm{d}P}{\mathrm{d}\mu} - \frac{\mathrm{d}Q}{\mathrm{d}\mu}\right|\right) \mathrm{d}\mu, \qquad (37.28)$$

where P and Q are two probability measures on a measurable space $(\Omega, \mathcal{A})$ with
dominating measure $\mu$, and $h: \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ satisfies $h(0) = 0$ and
$\sup_t h(2t)/h(t) < \infty$. Then, if one lets $h(t) = t$, one obtains the variational distance
$d_{\mathrm{var}}(P, Q)$; that is, if P and Q have corresponding densities, then $d_{\mathrm{var}}(P, Q)$ is the
$L^1$-norm between the densities.
Borgonovo, Tarantola, Plischke, and Morris [14] also show that Eq. (37.28)
contains the Birnbaum-Orlicz family of metrics.
where $M(N)$ is the number of partition classes at sample size N, and
$\zeta\big(\widehat{F}_Y(y),\, \widehat{F}_{Y|X_i \in [x_i, x_i+\Delta]}(y)\big)$ is an estimator of the inner statistic in which the point
condition has been replaced by the bin condition.
1280 E. Borgonovo and B. Iooss
Borgonovo, Hazen, and Plischke [13] then prove a general result showing that,
under the assumptions that $\zeta(\cdot,\cdot)$ is a continuous function of its arguments, that the
inner statistic is Riemann-Stieltjes integrable, and that $M(N)$ is a monotonically
increasing function such that $\lim_{N\to\infty} N/M(N) = \infty$, the estimator $\widehat{\xi}_i$ in Eq. (37.33) is
a consistent estimator of $\xi_i$.
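To make the estimation strategy concrete, the following sketch (hypothetical helper name, not the implementation of [13]) computes a given-data estimate of $\xi_i$ using a Kolmogorov-Smirnov distance between empirical CDFs as the inner operator: the sample of $X_i$ is split into equal-frequency classes, the discrepancy between the conditional and unconditional output distributions is evaluated on each class, and the results are averaged.

```python
import numpy as np

def given_data_sensitivity(x, y, n_bins=20):
    """Given-data estimate of xi_i = E[zeta(P_Y, P_Y|Xi)], with a
    Kolmogorov-Smirnov distance between empirical CDFs as inner operator.
    x: (N,) sample of the input Xi; y: (N,) corresponding model outputs."""
    y_sorted = np.sort(y)
    n = len(y)
    # unconditional ECDF evaluated at the pooled output values
    F_all = np.searchsorted(y_sorted, y_sorted, side="right") / n
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))  # equal-frequency bins
    xi = 0.0
    for m in range(n_bins):
        lo, hi = edges[m], edges[m + 1]
        if m == n_bins - 1:
            mask = (x >= lo) & (x <= hi)
        else:
            mask = (x >= lo) & (x < hi)
        y_bin = y[mask]
        if y_bin.size == 0:
            continue
        # conditional ECDF on the same grid, then KS distance to the unconditional one
        F_bin = np.searchsorted(np.sort(y_bin), y_sorted, side="right") / y_bin.size
        delta = np.max(np.abs(F_all - F_bin))
        xi += (y_bin.size / n) * delta         # probability-weighted average over bins
    return xi
```

Replacing the KS distance by another inner operator (e.g., an $L^1$ distance between binned densities) yields the other measures of the common rationale.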
The numerical values of the sensitivity measures presented in Table 37.1 are
obtained using a given-data estimation approach.
We finally note that the common rationale cited in this work is also related to the
recent work by Fort et al. [26].
$$P = \Pr\{G(\mathbf{X}) < 0\} = \int \mathbf{1}_{\{G(\mathbf{x}) < 0\}}\, f(\mathbf{x})\, \mathrm{d}\mathbf{x}, \qquad (37.34)$$
where $f(\mathbf{x})$ is the pdf of $\mathbf{X}$. The objective of the sensitivity analysis becomes the
quantification of the influence of each $X_i$ on this probability.

The proposed approach consists in perturbing the original pdf $f_i$ of a given
input variable $X_i$ while keeping constant the pdfs of all the other input variables
$\mathbf{X}_{-i} = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_d)$. Let us call $X_{i\delta} \sim f_{i\delta}$ the corresponding
perturbed random input. This perturbed input takes the place of the real random
input $X_i$, in a modeling-error sense: what if the real variable were $X_{i\delta}$ instead of
$X_i$? To answer this question, a new value of the failure probability, denoted
$P_{i\delta}$, is computed. $X_i$ can be said to be influential (resp. not influential) on the failure
probability if the value $P_{i\delta}$ is different from (resp. similar to) the reference value $P$.
Following this idea, the perturbed failure probability of the system writes
$$P_{i\delta} = \int \mathbf{1}_{\{G(\mathbf{x}) < 0\}}\, \frac{f_{i\delta}(x_i)}{f_i(x_i)}\, f(\mathbf{x})\, \mathrm{d}\mathbf{x}. \qquad (37.35)$$

The perturbed law indices (PLI) are then defined as

$$S_{i\delta} = \left(\frac{P_{i\delta}}{P} - 1\right) \mathbf{1}_{\{P_{i\delta} \geq P\}} + \left(1 - \frac{P}{P_{i\delta}}\right) \mathbf{1}_{\{P_{i\delta} < P\}}. \qquad (37.36)$$
The PLI can be estimated directly from the set of simulations used to compute
the system failure probability, thus limiting the number of calls to the numerical
model. Under mild support constraints, a consistent estimator of Pi& is given by
$$\widehat{P}_{i\delta}^{\,N} = \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}_{\{G(\mathbf{x}^j) < 0\}}\, \frac{f_{i\delta}(x_i^j)}{f_i(x_i^j)}, \qquad (37.37)$$
where $(\mathbf{x}^1, \ldots, \mathbf{x}^N)$ is a Monte Carlo sample. This property also holds in the more
general case where P is originally estimated by importance sampling (see, e.g.,
Lemaire [33]) rather than simple Monte Carlo, which is more appealing in contexts
where G is time-consuming. Asymptotic properties of the indices are derived in
Lemaitre et al. [34], including a central limit theorem for the plug-in estimator
of $S_{i\delta}$.
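The reweighting estimator above can be sketched as follows (a minimal illustration with hypothetical names; it assumes the failure event is $G(\mathbf{x}) < 0$ and that the perturbed density is absolutely continuous with respect to the original one):

```python
import numpy as np

def pli_estimate(x, g_vals, f_orig, f_pert, i):
    """Sketch of the estimator of Eq. (37.37) and the index of Eq. (37.36).
    x: (N, d) Monte Carlo sample; g_vals: (N,) values of G at those points;
    f_orig / f_pert: original and perturbed densities of Xi; i: input index."""
    w = f_pert(x[:, i]) / f_orig(x[:, i])    # likelihood ratios
    fail = (g_vals < 0).astype(float)
    p_ref = fail.mean()                      # reference failure probability
    p_pert = (fail * w).mean()               # reweighted (perturbed) estimate
    if p_pert >= p_ref:
        s = p_pert / p_ref - 1.0
    else:
        s = 1.0 - p_ref / p_pert
    return p_ref, p_pert, s
```

Note that both probabilities are estimated from the same set of model runs, which is precisely what makes the PLI inexpensive once the reference failure probability has been computed.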
Concerning the perturbed input density $f_{i\delta}$, it is defined as the closest distribution
to the original $f_i$ in the entropy sense, under some perturbation constraints.
The Kullback-Leibler divergence (Eq. 37.15) is used in [34] to quantify the distance
between the reference model input density and the perturbed density. The perturbed
density is then found as the solution of the following constrained optimization problem:

$$\min_{f_{\mathrm{mod}}}\ \mathrm{KL}(f_{\mathrm{mod}}, f_i) \quad \text{s.t.} \quad \int g_k(x_i)\, f_{\mathrm{mod}}(x_i)\, \mathrm{d}x_i = \delta_{k,i} \quad (k = 1, \ldots, K), \qquad (37.38)$$

whose solution belongs to an exponential family,

$$f_{i\delta}(x) = f_i(x) \exp\left(\sum_{k=1}^{K} \lambda_k g_k(x) - \psi_i(\boldsymbol{\lambda})\right), \qquad (37.39)$$

where

$$\psi_i(\boldsymbol{\lambda}) = \log\left[\int f_i(x) \exp\left(\sum_{k=1}^{K} \lambda_k g_k(x)\right) \mathrm{d}x\right]. \qquad (37.40)$$
Alternative types of perturbation are considered in Lemaitre et al. [34]. The first
is mean shifting, i.e., a perturbation of the mean:

$$\int x_i\, f_{\mathrm{mod}}(x_i)\, \mathrm{d}x_i = \delta_i. \qquad (37.41)$$
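As a concrete instance, for a standard normal input the KL-optimal perturbed density of Eq. (37.38) under the mean constraint is available in closed form: exponential tilting with $g(x) = x$ yields $\lambda = \delta$ and $\psi_i(\lambda) = \lambda^2/2$, and the tilted density is simply that of $\mathcal{N}(\delta, 1)$. A minimal sketch (hypothetical function name):

```python
import numpy as np

def tilt_normal_mean(x, delta):
    """Exponentially tilted standard normal density with mean constraint
    E[X] = delta (solution of Eq. (37.38) with g(x) = x).
    For f_i = N(0,1) this equals the N(delta, 1) density."""
    lam = delta                 # Lagrange multiplier
    psi = lam**2 / 2.0          # log-normalizer psi_i(lambda) for N(0,1)
    f0 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    return f0 * np.exp(lam * x - psi)
```

Completing the square in the exponent shows that the returned values coincide with the $\mathcal{N}(\delta, 1)$ density, so the perturbation simply shifts the Gaussian input.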
where $q_r$ is the reference quantile. Equation (37.43) means that $f_{\mathrm{mod}}$ is a
density whose $\delta_i$-quantile is $q_r$.
To illustrate the PLI indices defined in Eq. (37.36), we report the example in [34],
which uses a modified version of the Ishigami function.
Fig. 37.5 Estimated PLI (indices $\widehat{S}_{i\delta}$) for the thresholded Ishigami function with a mean shifting
8 Conclusions
References
1. Anderson, T.W., Darling, D.A.: Asymptotic theory of certain goodness-of-fit criteria based on
stochastic processes. Ann. Math. Stat. 23, 193–212 (1952)
2. Auder, B., Iooss, B.: Global sensitivity analysis based on entropy. In: ESREL 2008 Conference,
ESRA, Valencia, Sept 2008
3. Baucells, M., Borgonovo, E.: Invariant probabilistic sensitivity analysis. Manag. Sci. 59(11),
2536–2549 (2013). http://www.scopus.com/inward/record.url?eid=2-s2.0-84888863510&partnerID=tZOtx3y1
4. Birnbaum, Z.W.: On the importance of different components in a multicomponent system. In:
Krishnaiah, P.R. (ed.) Multivariate Analysis, vol. 2, pp. 1–15. Academic, New York (1969)
5. Borgonovo, E.: Measuring uncertainty importance: investigation and comparison of alternative
approaches. Risk Anal. 26(5), 1349–1361 (2006)
6. Borgonovo, E.: A new uncertainty importance measure. Reliab. Eng. Syst. Saf. 92(6), 771–784
(2007)
7. Borgonovo, E.: Differential, criticality and Birnbaum importance measures: an application to
basic event, groups and SSCs in event trees and binary decision diagrams. Reliab. Eng. Syst.
Saf. 92(10), 1458–1467 (2007)
8. Borgonovo, E.: The reliability importance of components and prime implicants in coherent
and non-coherent systems including total-order interactions. Eur. J. Oper. Res. 204(3),
485–495 (2010)
9. Borgonovo, E., Plischke, E.: Sensitivity analysis: a review of recent advances. Eur. J. Oper.
Res. 248(3), 869–887 (2016)
10. Borgonovo, E., Castaings, W., Tarantola, S.: Moment independent uncertainty importance: new
results and analytical test cases. Risk Anal. 31(3), 404–428 (2011)
11. Borgonovo, E., Castaings, W., Tarantola, S.: Model emulation and moment-independent sensitivity
analysis: an application to environmental modeling. Environ. Model. Softw. 34, 105–115
(2012)
12. Borgonovo, E., Hazen, G., Plischke, E.: A common rationale for global sensitivity analysis.
In: Steenbergen, R.D.M., Van Gelder, P., Miraglia, S., Vrouwenvelder, A.C.W.M.T. (eds.)
Proceedings of the 2013 ESREL Conference, Amsterdam, pp. 3255–3260 (2013)
13. Borgonovo, E., Hazen, G., Plischke, E.: A common rationale for global sensitivity measures
and their estimation. Risk Anal. (2016, in press). doi:10.1111/risa.12555
14. Borgonovo, E., Tarantola, S., Plischke, E., Morris, M.: Transformation and invariance in the
sensitivity analysis of computer experiments. J. R. Stat. Soc. Ser. B 76(5), 925–947 (2014)
15. Borgonovo, E., Hazen, G., Jose, V., Plischke, E.: Value of information, scoring rules
and global sensitivity analysis (2016, work in progress)
16. Caniou, Y., Sudret, B.: Distribution-based global sensitivity analysis using polynomial chaos
expansions. Procedia Soc. Behav. Sci., pp. 7625–7626 (2010)
17. Caniou, Y., Sudret, B.: Distribution-based global sensitivity analysis in case of correlated
input parameters using polynomial chaos expansions. In: 11th International Conference on
Applications of Statistics and Probability in Civil Engineering (ICASP11), Zurich (2011)
18. Castaings, W., Borgonovo, E., Tarantola, S., Morris, M.D.: Sampling strategies in density-
based sensitivity analysis. Environ. Model. Softw. 38, 13–26 (2012)
19. Chandra, M., Singpurwalla, N.D., Stephens, M.A.: Kolmogorov statistics for tests of fit for the
extreme value and Weibull distributions. J. Am. Stat. Assoc. 76(375), 729–731 (1981)
20. Critchfield, G.G., Willard, K.E.: Probabilistic analysis of decision trees using Monte Carlo
simulation. Med. Decis. Mak. 6(2), 85–92 (1986)
21. Crnkovic, C., Drachman, J.: Quality control. RISK 9(9), 139–143 (1996)
22. Csiszár, I.: Axiomatic characterizations of information measures. Entropy 10, 261–273
(2008)
23. Da Veiga, S.: Global sensitivity analysis with dependence measures. J. Stat. Comput. Simul.
85, 1283–1305 (2015)
24. De Lozzo, M., Marrel, A.: New improvements in the use of dependence measures for sensitivity
analysis and screening, pp. 1–21 (Dec 2014). arXiv:1412.1414v1
25. Felli, J., Hazen, G.: Sensitivity analysis and the expected value of perfect information. Med.
Decis. Mak. 18, 95–109 (1998)
26. Fort, J., Klein, T., Rachdi, N.: New sensitivity analysis subordinated to a contrast. Commun.
Stat. Theory Methods (2014, in press)
27. Gamboa, F., Klein, T., Lagnoux, A.: Sensitivity analysis based on Cramér–von Mises distance,
pp. 1–20 (2015). arXiv:1506.04133 [math.PR]
28. Howard, R.A.: Decision analysis: applied decision theory. In: Proceedings of the Fourth
International Conference on Operational Research. Wiley-Interscience, New York (1966)
29. Iman, R., Hora, S.: A robust measure of uncertainty importance for use in fault tree system
analysis. Risk Anal. 10, 401–406 (1990)
30. Krzykacz-Hausmann, B.: Epistemic sensitivity analysis based on the concept of entropy. In:
SAMO 2001, Madrid, CIEMAT, pp. 31–35 (2001)
31. Kuo, W., Zhu, X.: Importance Measures in Reliability, Risk and Optimization. Wiley,
Chichester (2012)
32. Le Gratiet, L., Cannamela, C., Iooss, B.: A Bayesian approach for global sensitivity analysis
of (multifidelity) computer codes. SIAM/ASA J. Uncertain. Quantif. 2, 336–363 (2014)
33. Lemaire, M.: Structural Reliability. Wiley-ISTE, London/Hoboken (2009)
34. Lemaitre, P., Sergienko, E., Arnaud, A., Bousquet, N., Gamboa, F., Iooss, B.: Density
modification based reliability sensitivity analysis. J. Stat. Comput. Simul. 85, 1200–1223
(2015)
35. Liu, H., Chen, W., Sudjianto, A.: Relative entropy based method for probabilistic sensitivity
analysis in engineering design. ASME J. Mech. Des. 128, 326–336 (2006)
36. Luo, X., Lu, Z., Xu, X.: A fast computational method for moment-independent uncertainty
importance measure. Comput. Phys. Commun. 185, 19–27 (2014)
37. Mason, D.M., Schuenemeyer, J.H.: A modified Kolmogorov–Smirnov test sensitive to tail
alternatives. Ann. Stat. 11(3), 933–946 (1983)
38. Oakley, J.: Decision-theoretic sensitivity analysis for complex computer models. Technometrics
51(2), 121–129 (2009)
39. Park, C.K., Ahn, K.I.: A new approach for measuring uncertainty importance and distributional
sensitivity in probabilistic safety assessment. Reliab. Eng. Syst. Saf. 46, 253–261 (1994)
40. Plischke, E., Borgonovo, E.: Probabilistic sensitivity measures from empirical cumulative
distribution functions: a horse race of methods (2016, work in progress)
41. Plischke, E., Borgonovo, E., Smith, C.: Global sensitivity measures from given data. Eur. J.
Oper. Res. 226(3), 536–550 (2013)
42. Pratt, J., Raiffa, H., Schlaifer, R.: Introduction to Statistical Decision Theory. MIT Press, Cambridge
(1995)
43. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput. Phys.
Commun. 145, 280–297 (2002)
44. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M.,
Tarantola, S.: Global Sensitivity Analysis. The Primer. Wiley, Chichester (2008)
45. Scheffé, H.: A useful convergence theorem for probability distributions. Ann. Math. Stat. 18(3),
434–438 (1947)
46. Strong, M., Oakley, J.: An efficient method for computing partial expected value of perfect
information for correlated inputs. Med. Decis. Mak. 33, 755–766 (2013)
47. Strong, M., Oakley, J.E., Chilcott, J.: Managing structural uncertainty in health economic
decision models: a discrepancy approach. J. R. Stat. Soc. Ser. C 61(1), 25–45
(2012)
48. Strong, M., Oakley, J., Brennan, A.: Estimating multiparameter partial expected value of
perfect information from a probabilistic sensitivity analysis sample: a nonparametric regression
approach. Med. Decis. Mak. 34, 311–326 (2014)
49. Sudret, B.: Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst.
Saf. 93, 964–979 (2008)
50. Xu, X., Lu, Z., Luo, X.: Stable approach based on asymptotic space integration for moment-
independent uncertainty importance measure. Risk Anal. 34(2), 235–251 (2014)
51. Zhang, L., Lu, Z., Cheng, L., Fan, C.: A new method for evaluating Borgonovo moment-
independent importance measure with its application in an aircraft structure. Reliab. Eng. Syst.
Saf. 132, 163–175 (2014)
38 Metamodel-Based Sensitivity Analysis:
Polynomial Chaos Expansions and
Gaussian Processes
Abstract
Global sensitivity analysis is now established as a powerful approach for
determining the key random input parameters that drive the uncertainty of model
output predictions. Yet the classical computation of the so-called Sobol indices
is based on Monte Carlo simulation, which is not affordable when computationally
expensive models are used, as is the case in most applications in
engineering and applied sciences. In this respect, metamodels such as polynomial
chaos expansions (PCE) and Gaussian processes (GP) have received tremendous
attention in the last few years, as they allow one to replace the original, taxing
model by a surrogate which is built from an experimental design of limited size.
Then the surrogate can be used to compute the sensitivity indices in negligible
time. In this chapter an introduction to each technique is given, with an emphasis
on their strengths and limitations in the context of global sensitivity analysis. In
particular, Sobol (resp. total Sobol) indices can be computed analytically from
the PCE coefficients. In contrast, confidence intervals on sensitivity indices can
be derived straightforwardly from the properties of GPs. The performance of the
two techniques is finally compared on three well-known analytical benchmarks
(Ishigami, G-Sobol, and Morris functions) as well as on a realistic engineering
application (deflection of a truss structure).
Keywords
Polynomial chaos expansions · Gaussian process regression · Kriging · Error
estimation · Sobol indices · Model selection
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1290
2 Polynomial Chaos Expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1292
2.1 Mathematical Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1292
2.2 Polynomial Chaos Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1292
2.3 Nonstandard Variables and Truncation Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1294
2.4 Computation of the Coefficients and Error Estimation . . . . . . . . . . . . . . . . . . . . . . 1295
2.5 Error Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1297
2.6 Post-processing for Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1298
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1301
3 Gaussian Process-Based Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1302
3.1 A Short Introduction to Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1302
3.2 Gaussian Process Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1302
3.3 Main Effects Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1308
3.4 Variance of the Main Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1309
3.5 Numerical Estimates of Sobol Indices by Gaussian Process Sampling . . . . . . . . . 1310
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1312
4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1312
4.1 Ishigami Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1312
4.2 G-Sobol Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1314
4.3 Morris Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1317
4.4 Maximum Deflection of a Truss Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1319
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1321
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1322
1 Introduction
$$\mathbf{x} \in \mathcal{D}_X \subset \mathbb{R}^d \mapsto y = G(\mathbf{x}), \qquad (38.1)$$
which is constructed based on a limited number of runs of the true model, the so-
called experimental design:
$$\mathcal{X} = \left\{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(n)}\right\}. \qquad (38.2)$$
Once a type of surrogate model is selected, its parameters have to be fitted based
on the information contained in the experimental design $\mathcal{X}$ and the associated runs of
the original computational model, $\mathcal{Y} = \left\{y_i = G(\mathbf{x}^{(i)}),\ i = 1, \ldots, n\right\}$. Then the
accuracy of the surrogate shall be estimated by some kind of validation technique.
For a general introduction to surrogate modeling, the reader is referred to [60] and
to the recent review by Iooss and Lemaître [33].
In this chapter, we discuss two classes of surrogate models that are commonly
used for sensitivity analysis, namely, polynomial chaos expansions (PCE) and
Gaussian processes (GP). The reader is referred to the books of Santner et al.
[52] and Rasmussen and Williams [49] for more detail about Gaussian process
regression. The use of polynomial chaos expansions in the context of sensitivity
analysis has been originally presented in Sudret [61, 63] using a nonintrusive least-
square method. Other nonintrusive strategies for the calculation of PCE coefficients
include spectral projection through sparse grids (e.g., Crestaux et al. [20], Buzzard
and Xiu [17], Buzzard [16]) and sparse polynomial expansions (e.g., Blatman
and Sudret [13]). In the last 5 years, numerous application examples have been
developed using PCE for sensitivity analysis, e.g., Fajraoui et al. [28], Younes et al.
[70], Brown et al. [15], and Sandoval et al. [51]. Recent extensions to problems with
dependent input parameters can be found in Sudret and Caniou [65] and Munoz
Zuniga et al. [46].
In parallel, Gaussian process modeling has been introduced in the context of
sensitivity analysis by Welch et al. [68], Oakley and O'Hagan [48], and Marrel et al.
[42, 43]. Recent developments in which metamodeling errors are taken into account
in the analysis have been proposed by Le Gratiet et al. [38] and Chastaing and Le
Gratiet [18].
The chapter first recalls the basics of the two approaches and details how
they can be used to compute sensitivity indices. The two approaches are then
compared on different benchmark examples as well as on an application in structural
mechanics.
1292 L. Le Gratiet et al.
$$Y = \sum_{j=0}^{\infty} y_j Z_j. \qquad (38.3)$$

The random variable Y is therefore cast as an infinite series, in which $\{Z_j\}_{j=0}^{\infty}$ is
a numerable set of random variables (which form a basis of the Hilbert space), and
$\{y_j\}_{j=0}^{\infty}$ are coefficients. The latter may be interpreted as the coordinates of Y in
this basis. In the sequel we focus on polynomial chaos expansions, in which the
basis terms $\{Z_j\}_{j=0}^{\infty}$ are multivariate orthonormal polynomials in the input vector
$\mathbf{X}$, i.e., $Z_j = \Psi_j(\mathbf{X})$.
In the sequel we assume that the input variables are statistically independent,
so that the joint PDF is the product of the d marginal distributions: $f_{\mathbf{X}}(\mathbf{x}) =
\prod_{i=1}^{d} f_{X_i}(x_i)$, where the $f_{X_i}(x_i)$ are the marginal distributions of each variable
$\{X_i,\ i = 1, \ldots, d\}$ defined on $\mathcal{D}_{X_i}$. For each single variable $X_i$ and any two
functions $\phi_1, \phi_2: x \in \mathcal{D}_{X_i} \mapsto \mathbb{R}$, we define the functional inner product by the
following integral (provided it exists):

$$\langle \phi_1, \phi_2 \rangle_i \overset{\mathrm{def}}{=} \mathbb{E}\left[\phi_1(X_i)\, \phi_2(X_i)\right] = \int_{\mathcal{D}_{X_i}} \phi_1(x)\, \phi_2(x)\, f_{X_i}(x)\, \mathrm{d}x. \qquad (38.4)$$
Using the above notation, classical algebra allows one to build a family of
orthogonal polynomials $\{P_k^{(i)},\ k \in \mathbb{N}\}$ satisfying:

$$\left\langle P_j^{(i)}, P_k^{(i)} \right\rangle_i \overset{\mathrm{def}}{=} \mathbb{E}\left[P_j^{(i)}(X_i)\, P_k^{(i)}(X_i)\right] = \int_{\mathcal{D}_{X_i}} P_j^{(i)}(x)\, P_k^{(i)}(x)\, f_{X_i}(x)\, \mathrm{d}x = a_j^{(i)}\, \delta_{jk}. \qquad (38.5)$$
38 Metamodel-Based Sensitivity Analysis: Polynomial Chaos . . . 1293
Table 38.1 Classical families of orthogonal polynomials (taken from Sudret [62])

- Uniform, $\mathcal{U}(-1, 1)$, PDF $\mathbf{1}_{[-1,1]}(x)/2$: Legendre polynomials $P_k(x)$; Hilbertian basis $\psi_k(x) = P_k(x) \big/ \sqrt{\tfrac{1}{2k+1}}$
- Gaussian, $\mathcal{N}(0, 1)$, PDF $\tfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$: Hermite polynomials $He_k(x)$; $\psi_k(x) = He_k(x) \big/ \sqrt{k!}$
- Gamma, $\Gamma(a, \lambda = 1)$, PDF $x^a e^{-x}\, \mathbf{1}_{\mathbb{R}^+}(x)$: Laguerre polynomials $L_k^a(x)$; $\psi_k(x) = L_k^a(x) \big/ \sqrt{\tfrac{\Gamma(k+a+1)}{k!}}$
- Beta, $\mathcal{B}(a, b)$, PDF $\mathbf{1}_{[-1,1]}(x)\, \tfrac{(1-x)^a (1+x)^b}{B(a)\, B(b)}$: Jacobi polynomials $J_k^{a,b}(x)$; $\psi_k(x) = J_k^{a,b}(x) / \mathfrak{J}_{a,b,k}$, with $\mathfrak{J}_{a,b,k}^2 = \dfrac{2^{a+b+1}}{2k+a+b+1}\, \dfrac{\Gamma(k+a+1)\, \Gamma(k+b+1)}{\Gamma(k+a+b+1)\, \Gamma(k+1)}$
See, e.g., Abramowitz and Stegun [1]. In the above equation the subscript k denotes the
degree of the polynomial $P_k^{(i)}$, $\delta_{jk}$ is the Kronecker symbol, equal to 1 when $j = k$
and 0 otherwise, and $a_j^{(i)}$ corresponds to the squared norm of $P_j^{(i)}$:

$$a_j^{(i)} \overset{\mathrm{def}}{=} \left\| P_j^{(i)} \right\|_i^2 = \left\langle P_j^{(i)}, P_j^{(i)} \right\rangle_i. \qquad (38.6)$$

The family of orthonormal polynomials is obtained by normalization:

$$\psi_k^{(i)}(x) = P_k^{(i)}(x) \Big/ \sqrt{a_k^{(i)}}, \qquad k \in \mathbb{N}. \qquad (38.7)$$
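The orthonormality conditions of Eq. (38.5) and the normalization constants of Table 38.1 can be checked numerically. The sketch below (an illustration, not part of the chapter's software) builds the orthonormal Legendre family for the uniform case and verifies the Gram matrix with Gauss-Legendre quadrature:

```python
import numpy as np
from numpy.polynomial import legendre

# Orthonormal Legendre polynomials w.r.t. the U(-1,1) density f(x) = 1/2:
# psi_k(x) = sqrt(2k+1) * P_k(x), so that <psi_j, psi_k> = delta_jk.
def psi(k, x):
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                       # select the degree-k Legendre polynomial
    return np.sqrt(2 * k + 1) * legendre.legval(x, coeffs)

# Gauss-Legendre quadrature is exact here since the integrands are polynomials
nodes, weights = legendre.leggauss(20)
gram = np.array([[np.sum(weights * 0.5 * psi(j, nodes) * psi(k, nodes))
                  for k in range(5)] for j in range(5)])
# gram is (numerically) the 5x5 identity matrix
```

The factor 0.5 is the uniform density on [-1, 1]; replacing it and the polynomial family adapts the check to the other rows of Table 38.1.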
From the sets of univariate orthonormal polynomials, one can now build multivariate
orthonormal polynomials by a tensor product construction. For this purpose,
let us define the multi-indices $\boldsymbol{\alpha} \in \mathbb{N}^d$, which are ordered lists of integers:

$$\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d), \qquad \alpha_i \in \mathbb{N}. \qquad (38.8)$$

The associated multivariate polynomial reads:

$$\Psi_{\boldsymbol{\alpha}}(\mathbf{x}) \overset{\mathrm{def}}{=} \prod_{i=1}^{d} \psi_{\alpha_i}^{(i)}(x_i), \qquad (38.9)$$
where the univariate polynomials $\{\psi_k^{(i)},\ k \in \mathbb{N}\}$ are defined above, see
Eqs. (38.5) and (38.7). By virtue of Eq. (38.5) and the above tensor product
construction, the multivariate polynomials in the input vector $\mathbf{X}$ are also
orthonormal, i.e.:

$$\mathbb{E}\left[\Psi_{\boldsymbol{\alpha}}(\mathbf{X})\, \Psi_{\boldsymbol{\beta}}(\mathbf{X})\right] \overset{\mathrm{def}}{=} \int_{\mathcal{D}_X} \Psi_{\boldsymbol{\alpha}}(\mathbf{x})\, \Psi_{\boldsymbol{\beta}}(\mathbf{x})\, f_{\mathbf{X}}(\mathbf{x})\, \mathrm{d}\mathbf{x} = \delta_{\boldsymbol{\alpha}\boldsymbol{\beta}} \qquad \forall\, \boldsymbol{\alpha}, \boldsymbol{\beta} \in \mathbb{N}^d. \qquad (38.10)$$
In practical sensitivity analysis problems, the input variables may not necessarily
have standardized distributions such as the ones described in Table 38.1. Thus reduced
variables $\mathbf{U}$ with standardized distributions are first introduced through an isoprobabilistic
transform:

$$\mathbf{X} = \mathcal{T}(\mathbf{U}). \qquad (38.12)$$

For instance, when dealing with independent uniform distributions with support
$\mathcal{D}_{X_i} = [a_i, b_i],\ i = 1, \ldots, d$, the isoprobabilistic transform reads:

$$X_i = \frac{a_i + b_i}{2} + \frac{b_i - a_i}{2}\, U_i, \qquad U_i \sim \mathcal{U}(-1, 1). \qquad (38.13)$$

In the case of independent Gaussian variables $\{X_i \sim \mathcal{N}(\mu_i, \sigma_i),\ i = 1, \ldots, d\}$,
the one-to-one mapping reads:

$$X_i = \mu_i + \sigma_i U_i, \qquad U_i \sim \mathcal{N}(0, 1). \qquad (38.14)$$

In the general case when the input variables are non-Gaussian (e.g., Gumbel
distributions, see the application in the last example of Sect. 4 of this chapter), the one-
to-one mapping may be obtained as follows:

$$X_i = F_{X_i}^{-1}\left(\Phi(U_i)\right), \qquad U_i \sim \mathcal{N}(0, 1). \qquad (38.15)$$
The total degree of a multivariate polynomial $\Psi_{\boldsymbol{\alpha}}$ is defined by

$$|\boldsymbol{\alpha}| \overset{\mathrm{def}}{=} \sum_{i=1}^{d} \alpha_i. \qquad (38.16)$$

The standard truncation scheme consists in selecting all polynomials such that $|\boldsymbol{\alpha}|$
is smaller than or equal to a given p. This leads to a set of polynomials denoted by
$\mathcal{A}^{d,p} = \left\{\boldsymbol{\alpha} \in \mathbb{N}^d : |\boldsymbol{\alpha}| \leq p\right\}$ of cardinality:

$$\mathrm{card}\, \mathcal{A}^{d,p} = \binom{d+p}{p} = \frac{(d+p)!}{d!\, p!}. \qquad (38.17)$$
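The standard truncation set and the cardinality formula of Eq. (38.17) are easy to enumerate directly (hypothetical helper name):

```python
import math
from itertools import product

def truncation_set(d, p):
    """Enumerate the standard truncation set A^{d,p} = {alpha in N^d : |alpha| <= p}."""
    return [a for a in product(range(p + 1), repeat=d) if sum(a) <= p]

# Cardinality matches Eq. (38.17): card A^{d,p} = binom(d + p, p)
A = truncation_set(3, 5)
print(len(A), math.comb(3 + 5, 5))  # both equal 56
```

This brute-force enumeration is perfectly adequate for the low dimensions where the standard truncation scheme is used.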
The use of polynomial chaos expansions has emerged in the late 1980s in uncer-
tainty quantification problems under the form of stochastic finite element methods
[29]. In this setup the constitutive equations of the physical problem are discretized
both in the physical space (using standard finite element techniques) and in the
random space using polynomial chaos expansion. This results in coupled systems
of equations which require ad hoc solvers, thus the term intrusive approach.
Nonintrusive techniques such as projection or stochastic collocation have
emerged in the last decade as a means to compute the coefficients of PC expansions
from repeated evaluations of the existing model G considered as a black-box
function. In this section we focus on a particular nonintrusive approach based on
least-square analysis.
Following Berveiller et al. [7, 8], the computation of the PCE coefficients may
be cast as a least-square minimization problem (originally termed a regression
problem) as follows: once a truncation scheme $\mathcal{A} \subset \mathbb{N}^d$ is chosen (for instance,
$\mathcal{A} = \mathcal{A}^{d,p}$), the infinite series is recast as the sum of the truncated series and a
residual:

$$Y = G(\mathbf{X}) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} y_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(\mathbf{X}) + \varepsilon, \qquad (38.18)$$

in which $\varepsilon$ corresponds to all those PC polynomials whose index $\boldsymbol{\alpha}$ is not in the
truncation set $\mathcal{A}$. The least-square minimization approach consists in finding the set
of coefficients $\mathbf{y} = \{y_{\boldsymbol{\alpha}},\ \boldsymbol{\alpha} \in \mathcal{A}\}$ which minimizes the mean-square error:

$$\mathbb{E}\left[\varepsilon^2\right] \overset{\mathrm{def}}{=} \mathbb{E}\left[\left( G(\mathbf{X}) - \sum_{\boldsymbol{\alpha} \in \mathcal{A}} y_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(\mathbf{X}) \right)^2\right]. \qquad (38.19)$$

In practice, the expectation is replaced by the empirical mean over a sample set:

$$\hat{\mathbf{y}} = \arg \min_{\mathbf{y} \in \mathbb{R}^{\mathrm{card}\, \mathcal{A}}} \frac{1}{n} \sum_{i=1}^{n} \left( G(\mathbf{x}^{(i)}) - \sum_{\boldsymbol{\alpha} \in \mathcal{A}} y_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(\mathbf{x}^{(i)}) \right)^2. \qquad (38.21)$$
In this expression, $\mathcal{X} = \{\mathbf{x}^{(i)},\ i = 1, \ldots, n\}$ is a sample set of points called the
experimental design (ED), typically obtained by Monte Carlo simulation
of the input random vector $\mathbf{X}$. To solve the least-square minimization problem in
Eq. (38.21), the computational model G is first run for each point in the ED, and
the results are stored in a vector $\mathcal{Y} = \left( y^{(1)} = G(\mathbf{x}^{(1)}), \ldots, y^{(n)} = G(\mathbf{x}^{(n)}) \right)^{\mathsf{T}}$.
Then the so-called information matrix is calculated from the evaluation of the basis
polynomials at each point of the ED:

$$\mathbf{A} \overset{\mathrm{def}}{=} \left\{ A_{ij} = \Psi_j(\mathbf{x}^{(i)}),\ i = 1, \ldots, n,\ j = 1, \ldots, \mathrm{card}\, \mathcal{A} \right\}. \qquad (38.22)$$

The solution of the least-square problem finally reads:

$$\hat{\mathbf{y}} = \left( \mathbf{A}^{\mathsf{T}} \mathbf{A} \right)^{-1} \mathbf{A}^{\mathsf{T}}\, \mathcal{Y}. \qquad (38.23)$$
The points of the experimental design may be obtained from crude Monte Carlo
simulation. However, other types of designs are in common use, especially Latin
hypercube sampling (LHS), see McKay et al. [45], or quasi-random sequences such
as the Sobol or Halton sequences [47]. The size of the experimental design is of
crucial importance: it must be larger than the number of unknowns $\mathrm{card}\, \mathcal{A}$ for the
problem to be well posed. In practice, the rule of thumb $n \approx 2$–$3\, \mathrm{card}\, \mathcal{A}$ is used [10].
The simple least-square approach summarized above does not allow one to cope
with the curse of dimensionality. Indeed, the standard truncation scheme requires
approximately $3\binom{d+p}{p}$ runs of the original model $G(\mathbf{x})$, which is in the order of
$10^4$ when, e.g., $d = 15$ and $p = 5$. However, in practice most of the problems lead
eventually to sparse expansions, i.e., PCE in which most of the coefficients are zero
or negligible. In order to find directly the significant polynomials and associated
coefficients, sparse PCEs have been introduced recently by Blatman and Sudret
[11, 12], and Bieri and Schwab [9]. The recent developments make use of specific
selection algorithms which, by solving a penalized least-square problem, lead by
construction to sparse expansions. Of interest in this chapter is the use of the least-
angle regression algorithm (LAR, Efron et al. [27]), which was introduced in the
field of uncertainty quantification by Blatman and Sudret [14]. Details can be found
in Sudret [64]. Note that other techniques based on compressive sensing have also
been developed recently, see, e.g., Doostan and Owhadi [23], Sargsyan et al. [53],
and Jakeman et al. [34].
The empirical error of the least-square fit reads:

$$\varepsilon_{\mathrm{emp}} = \frac{1}{n} \sum_{i=1}^{n} \left( G(\mathbf{x}^{(i)}) - \sum_{\boldsymbol{\alpha} \in \mathcal{A}} \hat{y}_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(\mathbf{x}^{(i)}) \right)^2. \qquad (38.24)$$
The leave-one-out technique consists in building the PC expansion $G^{\mathrm{PC}\setminus i}$ from the reduced
experimental design $\mathcal{X} \setminus \mathbf{x}^{(i)} \overset{\mathrm{def}}{=} \left\{\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(i-1)}, \mathbf{x}^{(i+1)}, \ldots, \mathbf{x}^{(n)}\right\}$. Then the
error is computed at point $\mathbf{x}^{(i)}$:

$$\Delta_i \overset{\mathrm{def}}{=} G(\mathbf{x}^{(i)}) - G^{\mathrm{PC}\setminus i}(\mathbf{x}^{(i)}). \qquad (38.25)$$

The leave-one-out error is the mean-square average of these residuals:

$$\varepsilon_{\mathrm{LOO}} = \frac{1}{n} \sum_{i=1}^{n} \Delta_i^2 = \frac{1}{n} \sum_{i=1}^{n} \left( G(\mathbf{x}^{(i)}) - G^{\mathrm{PC}\setminus i}(\mathbf{x}^{(i)}) \right)^2. \qquad (38.26)$$

Algebraic manipulations allow one to rewrite this error without building the n expansions
explicitly:

$$\varepsilon_{\mathrm{LOO}} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{G(\mathbf{x}^{(i)}) - G^{\mathrm{PC}}(\mathbf{x}^{(i)})}{1 - h_i} \right)^2, \qquad (38.27)$$
where hi is the i -th diagonal term of the projection matrix A.AT A/1 AT (matrix A
is defined in Eq. (38.22)) and G PC ./ is now the PC expansion built up from the full
experimental design X .
As a conclusion, when using a least-square minimization technique to compute
the coefficients of a PC expansion, an a posteriori estimator of the mean-square
error is readily available. This allows one to compare PCEs obtained from different
truncation schemes and select the best one according to the leave-one-out error
estimate.
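Equation (38.27) means that the leave-one-out error comes at the cost of a single fit. The sketch below (a hypothetical helper, applicable to any least-square regression) implements it through the diagonal of the projection matrix, and can be checked against n explicit refits:

```python
import numpy as np

def loo_error(A, y):
    """Leave-one-out error of Eq. (38.27): one least-square fit plus the
    diagonal of the projection matrix A (A^T A)^{-1} A^T replaces n refits."""
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coeffs                       # full-design residuals
    h = np.diag(A @ np.linalg.inv(A.T @ A) @ A.T)
    return np.mean((resid / (1.0 - h)) ** 2)
```

For a well-posed problem (more points than basis functions), this returns exactly the average of the n squared leave-one-out residuals of Eq. (38.26).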
Due to the orthonormality of the basis, the mean and variance of a truncated PC
expansion read $\mathbb{E}\left[G^{\mathrm{PC}}(\mathbf{X})\right] = y_{\mathbf{0}}$ and
$\mathrm{Var}\left[G^{\mathrm{PC}}(\mathbf{X})\right] = \sum_{\boldsymbol{\alpha} \in \mathcal{A},\, \boldsymbol{\alpha} \neq \mathbf{0}} y_{\boldsymbol{\alpha}}^2$.
In other words, the mean and variance of the random response may be obtained by a
mere combination of the PCE coefficients once the latter have been computed.
The Sobol-Hoeffding decomposition of the model reads:

$$G(\mathbf{x}) = G_0 + \sum_{i=1}^{d} G_i(x_i) + \sum_{1 \leq i < j \leq d} G_{ij}(x_i, x_j) + \cdots + G_{12\ldots d}(\mathbf{x}), \qquad (38.30)$$

where the summands are defined recursively through conditional expectations:

$$\begin{aligned} G_0 &= \mathbb{E}\left[G(\mathbf{X})\right] \\ G_i(x_i) &= \mathbb{E}\left[G(\mathbf{X}) \mid X_i = x_i\right] - G_0 \\ G_{ij}(x_i, x_j) &= \mathbb{E}\left[G(\mathbf{X}) \mid X_i = x_i,\, X_j = x_j\right] - G_i(x_i) - G_j(x_j) - G_0. \end{aligned} \qquad (38.31)$$
More generally, for any index set

$$\mathsf{A} \overset{\mathrm{def}}{=} \{i_1, \ldots, i_s\} \subset \{1, \ldots, d\}, \qquad (38.32)$$

one defines a summand $G_{\mathsf{A}}(\mathbf{x}_{\mathsf{A}})$, where $\mathbf{x}_{\mathsf{A}}$ is the subvector of $\mathbf{x}$ which only contains the
components that belong to the index set $\mathsf{A}$. It can be proven that the summands are orthogonal to each other:

$$\mathbb{E}\left[G_{\mathsf{A}}(\mathbf{X}_{\mathsf{A}})\, G_{\mathsf{B}}(\mathbf{X}_{\mathsf{B}})\right] = 0 \qquad \forall\, \mathsf{A}, \mathsf{B} \subset \{1, \ldots, d\},\ \mathsf{A} \neq \mathsf{B}. \qquad (38.34)$$
Using this orthogonality property, one can decompose the variance of the model
output:

$$V \overset{\mathrm{def}}{=} \mathrm{Var}\left[Y\right] = \mathrm{Var}\left[\sum_{\substack{\mathsf{A} \subset \{1,\ldots,d\} \\ \mathsf{A} \neq \varnothing}} G_{\mathsf{A}}(\mathbf{X}_{\mathsf{A}})\right] = \sum_{\substack{\mathsf{A} \subset \{1,\ldots,d\} \\ \mathsf{A} \neq \varnothing}} \mathrm{Var}\left[G_{\mathsf{A}}(\mathbf{X}_{\mathsf{A}})\right], \qquad (38.35)$$
where each partial variance reads

$$V_{\mathsf{A}} \overset{\mathrm{def}}{=} \mathrm{Var}\left[G_{\mathsf{A}}(\mathbf{X}_{\mathsf{A}})\right] = \mathbb{E}\left[G_{\mathsf{A}}^2(\mathbf{X}_{\mathsf{A}})\right]. \qquad (38.36)$$

The Sobol index attached to each subset of variables $\mathsf{A} \overset{\mathrm{def}}{=} \{i_1, \ldots, i_s\} \subset \{1, \ldots, d\}$ is finally defined by:

$$S_{\mathsf{A}} = \frac{V_{\mathsf{A}}}{V} = \frac{\mathrm{Var}\left[G_{\mathsf{A}}(\mathbf{X}_{\mathsf{A}})\right]}{\mathrm{Var}\left[Y\right]}. \qquad (38.37)$$
First-order Sobol indices quantify the portion of the total variance V that can be
apportioned to the sole input variable $X_i$:

$$S_i = \frac{V_i}{V} = \frac{\mathrm{Var}\left[G_i(X_i)\right]}{\mathrm{Var}\left[Y\right]}. \qquad (38.38)$$

Second-order indices quantify the joint effect of variables $(X_i, X_j)$ that cannot be
explained by each single variable separately:

$$S_{ij} = \frac{V_{ij}}{V} = \frac{\mathrm{Var}\left[G_{ij}(X_i, X_j)\right]}{\mathrm{Var}\left[Y\right]}. \qquad (38.39)$$

Finally, total Sobol indices $S_i^{\mathrm{tot}}$ quantify the total impact of a given parameter $X_i$,
including all of its interactions with other variables. They may be computed as the
sum of the Sobol indices of any order that contain $X_i$:

$$S_i^{\mathrm{tot}} = \sum_{\mathsf{A} \ni i} S_{\mathsf{A}}. \qquad (38.40)$$
Among other methods, Monte Carlo estimators of the various indices are available
in the literature and thoroughly discussed in Chap. 35, "Variance-Based
Sensitivity Analysis: Theory and Estimation Algorithms". Their computation
usually requires $10^3$–$10^4$ runs of the model G for each index, which leads to a global
computational cost that is not affordable when G is expensive to evaluate.
The union of all these sets is by construction equal to $\mathcal{A}$. Thus we can reorder the
terms of the truncated PC expansion so as to exhibit the Sobol decomposition:

$$G^{\mathrm{PC}}(\mathbf{x}) = y_{\mathbf{0}} + \sum_{\substack{\mathsf{A} \subset \{1,\ldots,d\} \\ \mathsf{A} \neq \varnothing}} G_{\mathsf{A}}^{\mathrm{PC}}(\mathbf{x}_{\mathsf{A}}) \qquad \text{where} \qquad G_{\mathsf{A}}^{\mathrm{PC}}(\mathbf{x}_{\mathsf{A}}) \overset{\mathrm{def}}{=} \sum_{\boldsymbol{\alpha} \in \mathcal{A}_{\mathsf{A}}} y_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(\mathbf{x}). \qquad (38.42)$$
In other words, from a given PC expansion, the Sobol indices at any order may be
obtained by a mere combination of the squares of the coefficients. More specifically,
the PC-based estimator of the first-order Sobol indices reads:

$$\widehat{S}_i = \frac{\displaystyle\sum_{\boldsymbol{\alpha} \in \mathcal{A}_i} \hat{y}_{\boldsymbol{\alpha}}^2}{\displaystyle\sum_{\boldsymbol{\alpha} \in \mathcal{A},\, \boldsymbol{\alpha} \neq \mathbf{0}} \hat{y}_{\boldsymbol{\alpha}}^2} \qquad \text{where} \qquad \mathcal{A}_i = \left\{\boldsymbol{\alpha} \in \mathcal{A} : \alpha_i > 0,\ \alpha_j = 0\ \forall\, j \neq i\right\}. \qquad (38.44)$$
2.7 Summary
Polynomial chaos expansions allow one to cast the random response G.X /
as a truncated series expansion. By selecting an orthonormal basis w.r.t. the
input parameter distributions, the corresponding coefficients can be given a
straightforward interpretation: the first coefficient y0 is the mean value of the
model output, whereas the variance is the sum of the squares of the remaining
coefficients. Similarly, the Sobol indices are obtained by summing up the squares
of suitable coefficients. Note that in low dimension (d < 10), the coefficients
can be computed by solving a mere ordinary least-square problem. In higher
dimensions advanced techniques leading to sparse expansions must be used to keep
the total computational cost (measured in terms of the size N of the experimental
design) affordable. Yet the post-processing to get the Sobol indices from the PCE
coefficients is independent of the technique used.
The principle of Gaussian process regression is to consider that the prior knowledge
about the computational model $G(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^d$, can be modeled by a Gaussian
process $Z(\mathbf{x})$ with a mean denoted by $m(\mathbf{x})$ and a covariance kernel denoted by
$k(\mathbf{x}, \mathbf{x}')$. Roughly speaking, we consider that the true response is a realization of
$Z(\mathbf{x})$. Usually, the mean and the covariance are parametrized as follows:

$$m(\mathbf{x}) = \mathbf{f}(\mathbf{x})^{\mathsf{T}} \boldsymbol{\beta}$$

and

$$k(\mathbf{x}, \mathbf{x}') = \sigma^2\, r(\mathbf{x}, \mathbf{x}'; \boldsymbol{\theta}),$$

where $\mathbf{f}(\mathbf{x})$ is a vector of regression functions, $\boldsymbol{\beta}$ a vector of coefficients, $\sigma^2$ the
process variance, and $r(\cdot,\cdot; \boldsymbol{\theta})$ a correlation function with hyper-parameters $\boldsymbol{\theta}$.
Fig. 38.1 Examples of Gaussian process realizations with squared exponential kernels and
different means. The shaded areas represent the point-wise 95% confidence intervals
k_n(x, x') = \sigma^2 \left( r(x - x'; \theta) - \begin{pmatrix} f^T(x) & r^T(x) \end{pmatrix} \begin{pmatrix} 0 & F^T \\ F & R \end{pmatrix}^{-1} \begin{pmatrix} f(x') \\ r(x') \end{pmatrix} \right), \qquad (38.50)

where F = (f^T(x^{(i)}))_{i=1,\dots,n} is the trend matrix, R the correlation matrix of the experimental design, and r(x) the vector of correlations between x and the design points.
\hat{\beta} = \left( F^T R^{-1} F \right)^{-1} F^T R^{-1} Y. \qquad (38.51)

The term \hat{\beta} denotes the mode of the posterior distribution of \beta obtained from the
improper non-informative prior distribution \pi(\beta) \propto 1 [50].
Remark. The predictive distribution is given by the Gaussian process Z(x) conditioned
on the known observations Y. The Gaussian process regression metamodel
is given by the conditional expectation m_n(x), and its mean-square error is given by
the conditional variance k_n(x, x). An illustration of m_n(x) and k_n(x, x) is given in
Fig. 38.2.
The reader can note that the predictive distribution (38.48) integrates over the
posterior distribution of \beta. However, the hyper-parameters \sigma^2 and \theta are not known
in practice and must be estimated with the maximum likelihood method [32, 52] or
a cross-validation strategy [3]. Their estimates are then plugged into the predictive
distribution. The restricted maximum likelihood estimate of \sigma^2 is given by:

\hat{\sigma}^2 = \frac{(Y - F\hat{\beta})^T R^{-1} (Y - F\hat{\beta})}{n - p}. \qquad (38.52)
Unfortunately, such a closed-form expression does not exist for \theta, which therefore has to be
estimated numerically.
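The predictor of Eqs. (38.49), (38.51), and (38.52) can be sketched as follows for a constant trend f(x) = 1 and a squared exponential correlation (a minimal illustration under these assumptions; the function names and the jitter term are not part of the chapter):

```python
import numpy as np

def sq_exp(X1, X2, theta):
    """Squared exponential correlation r(x - x') = exp(-||x - x'||^2 / (2 theta^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / theta ** 2)

def kriging_predict(X, Y, Xnew, theta, nugget=1e-8):
    """Universal kriging mean with a constant trend f(x) = 1."""
    n = X.shape[0]
    F = np.ones((n, 1))
    R = sq_exp(X, X, theta) + nugget * np.eye(n)         # small jitter for conditioning
    Ri = np.linalg.inv(R)
    # beta_hat = (F^T R^-1 F)^-1 F^T R^-1 Y               (Eq. (38.51))
    beta = np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ Y)
    resid = Y - F @ beta
    # restricted ML estimate of sigma^2, with p = 1 trend  (Eq. (38.52))
    sigma2 = (resid.T @ Ri @ resid).item() / (n - 1)
    r = sq_exp(Xnew, X, theta)
    mean = (np.ones((Xnew.shape[0], 1)) @ beta + r @ Ri @ resid).ravel()
    return mean, beta.item(), sigma2

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (8, 1))
Y = np.sin(4 * X)                     # toy stand-in for the computational model
mean, beta, sigma2 = kriging_predict(X, Y, X, theta=0.2)
# the kriging mean interpolates the observations at the design points
```

The interpolation property at the design points is a quick sanity check of the implementation.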
Remark. Gaussian process regression can easily be extended to the case of noisy
observations. Let us suppose that Y is tainted by a white Gaussian noise \varepsilon:

Y_{obs} = Y + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Delta_\varepsilon).

The term \sigma_\varepsilon(x) represents the standard deviation of the observation noise.
The mean and the covariance of the predictive distribution Z(x) \mid Z(X) + \varepsilon =
Y_{obs}, \sigma^2, \theta are then obtained by replacing in Equations (38.49), (38.50), and (38.51)
the correlation matrix \sigma^2 R by \sigma^2 R + \Delta_\varepsilon, where \Delta_\varepsilon is a diagonal matrix given by:

\Delta_\varepsilon = \mathrm{diag}\left( \sigma_\varepsilon^2(x^{(1)}), \; \sigma_\varepsilon^2(x^{(2)}), \; \dots, \; \sigma_\varepsilon^2(x^{(n)}) \right).
We emphasize that the closed-form expression for the restricted maximum likelihood
estimate of \sigma^2 does not exist anymore in this case. Therefore, this parameter also has to be
estimated numerically.
More efficient criteria can be found in Bates et al. [4], van Beers and Kleijnen [6],
and Le Gratiet and Cannamela [37].
Fig. 38.3 The squared exponential kernel as a function of h = x - x' for different correlation
lengths \theta, and examples of resulting Gaussian process realizations
cross-validation predictive distribution, see for instance Dubrule [25]. It allows one
to derive efficient methods for parameter estimation [3] or sequential design [37].
Some usual stationary covariance kernels are listed below.
The squared exponential covariance function. The form of this kernel is given
by:

k(x, x') = \sigma^2 \exp\left( -\frac{1}{2\theta^2} \|x - x'\|^2 \right).
The \nu-Matérn covariance function. The form of this kernel is given by:

k_\nu(h) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\,\|h\|}{\theta} \right)^{\nu} K_\nu\!\left( \frac{\sqrt{2\nu}\,\|h\|}{\theta} \right),

where K_\nu is the modified Bessel function of the second kind and \Gamma the Gamma function.
Fig. 38.4 The \nu-Matérn kernel as a function of h = x - x' for different regularity parameters \nu,
and examples of resulting Gaussian process realizations
surely with 0 < \nu. Three popular choices of \nu-Matérn covariance kernels are
the ones for \nu = 1/2, \nu = 3/2, and \nu = 5/2:
k_{\nu=1/2}(h) = \sigma^2 \exp\left( -\frac{\|h\|}{\theta} \right),

k_{\nu=3/2}(h) = \sigma^2 \left( 1 + \frac{\sqrt{3}\,\|h\|}{\theta} \right) \exp\left( -\frac{\sqrt{3}\,\|h\|}{\theta} \right),

and

k_{\nu=5/2}(h) = \sigma^2 \left( 1 + \frac{\sqrt{5}\,\|h\|}{\theta} + \frac{5\|h\|^2}{3\theta^2} \right) \exp\left( -\frac{\sqrt{5}\,\|h\|}{\theta} \right).
The \gamma-exponential covariance function. The form of this kernel is given by:

k(h) = \sigma^2 \exp\left( -\left( \frac{\|h\|}{\theta} \right)^{\gamma} \right), \qquad 0 < \gamma \le 2.

For \gamma < 2 the corresponding Gaussian processes are not differentiable in mean
square, whereas for \gamma = 2 the process is infinitely differentiable (it corresponds to
the squared exponential kernel). We illustrate in Fig. 38.5 the one-dimensional
\gamma-exponential kernel for different values of \gamma.
Fig. 38.5 The \gamma-exponential kernel as a function of h = x - x' for different regularity parameters \gamma,
and examples of resulting Gaussian process realizations
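The kernels above and the sampling of unconditioned realizations (used later for the sensitivity estimators) can be sketched in a few lines; the one-dimensional grid, the seed, and the jitter term are illustrative choices, not part of the chapter:

```python
import numpy as np

def matern_5_2(h, theta, sigma2=1.0):
    """Matérn-5/2 covariance as a function of the lag h."""
    a = np.sqrt(5.0) * np.abs(h) / theta
    return sigma2 * (1.0 + a + a ** 2 / 3.0) * np.exp(-a)

def gamma_exponential(h, theta, gamma, sigma2=1.0):
    """gamma-exponential covariance, 0 < gamma <= 2 (gamma = 2: squared exponential)."""
    return sigma2 * np.exp(-((np.abs(h) / theta) ** gamma))

# unconditioned realizations on a regular grid via Cholesky factorization
x = np.linspace(0.0, 1.0, 200)
K = matern_5_2(x[:, None] - x[None, :], theta=0.2) + 1e-8 * np.eye(x.size)
L = np.linalg.cholesky(K)                       # K = L L^T
rng = np.random.default_rng(1)
Z = L @ rng.standard_normal((x.size, 3))        # three independent sample paths
```

Cholesky factorization is the simplest of the sampling methods mentioned in this chapter; the spectral and Gibbs-sampler alternatives scale better for large grids.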
From now on, the input parameter x \in R^d is considered as a random input vector
X = (X_1, \dots, X_d) with independent components. Before focusing on variance-
based sensitivity indices, the inference about the main effects is studied in this
section. Main effects are a powerful tool to visualize the impact of each input
variable on the model output (see, e.g., Oakley and O'Hagan [48]). The main effect
of the group of input variables X_A, A \subset \{1, \dots, d\}, is defined by E[G(X) \mid X_A].
38 Metamodel-Based Sensitivity Analysis: Polynomial Chaos . . . 1309
Although the main effect enables one to visualize the impact of a group of variables
on the model output, it does not quantify it. To perform such an analysis, consider
the variance of the main effect:

S_A = \frac{V_A}{V} = \frac{\mathrm{Var}\left( \mathrm{E}[Z_n(X) \mid X_A] \right)}{\mathrm{Var}\left( Z_n(X) \right)}. \qquad (38.56)
Sobol indices are the most popular measures to carry out a sensitivity analysis, since
their value can easily be interpreted as the share of the total variance due to a group
of variables. However, contrary to the partial variance V_A, the Sobol index does not provide
information about the absolute order of magnitude of the contribution of the variable
group X_A to the model output variance.
S_{A,N} = \frac{ \frac{1}{N} \sum_{i=1}^{N} Z_n(X^{(i)})\, Z_n(X_A^{(i)}) - \left( \frac{1}{2N} \sum_{i=1}^{N} \left( Z_n(X^{(i)}) + Z_n(X_A^{(i)}) \right) \right)^2 }{ \frac{1}{N} \sum_{i=1}^{N} Z_n(X^{(i)})^2 - \left( \frac{1}{2N} \sum_{i=1}^{N} \left( Z_n(X^{(i)}) + Z_n(X_A^{(i)}) \right) \right)^2 }, \qquad (38.57)

where (X^{(i)}, X_A^{(i)})_{i=1,\dots,N} is an N-sample from the random vector (X, X_A).
In particular, this approach avoids computing the integrals presented in Oakley
and O'Hagan [48] and thus simplifies the estimation of V_A and V. Furthermore, it
takes into account their joint distribution.
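The pick-freeze estimator (38.57) reduces to a few array operations; the sketch below applies it directly to an additive toy model with known indices rather than to a Gaussian process realization (the model, sample size, and seed are illustrative):

```python
import numpy as np

def pick_freeze(g, A, d, N, rng):
    """Pick-freeze estimate of the first-order Sobol index S_A (cf. Eq. (38.57))."""
    X = rng.standard_normal((N, d))
    Xp = rng.standard_normal((N, d))
    Xp[:, A] = X[:, A]              # "freeze" the components in A, re-sample the rest
    Y, Yp = g(X), g(Xp)
    m = np.mean((Y + Yp) / 2.0)     # common mean estimate (the 1/(2N) sum in (38.57))
    num = np.mean(Y * Yp) - m ** 2  # estimates Var(E[Y | X_A])
    den = np.mean(Y ** 2) - m ** 2  # estimates Var(Y)
    return num / den

g = lambda x: x[:, 0] + 2.0 * x[:, 1]   # additive toy model: S1 = 1/5, S2 = 4/5
rng = np.random.default_rng(42)
S1 = pick_freeze(g, [0], d=2, N=200_000, rng=rng)
```

The numerator estimates the covariance between the two "pick-freeze" evaluations, which equals the partial variance V_A.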
Remark. This result can easily be extended to the total Sobol index S_i^{tot} = \sum_{A \ni i} S_A.
The reader is referred to Sobol et al. [57] and Chap. 35, Variance-Based
Sensitivity Analysis: Theory and Estimation Algorithms for examples of pick-
freeze estimates of S_A^{tot}.
The sensitivity index S_{A,N} (38.57) is obtained after substituting the Gaussian
process Z_n(x) for the original computational model G(x). Therefore, it is a random
variable defined on the same probability space as Z_n(x). The aim of this section is
to present a simple methodology to obtain a sample of S_{A,N}. From this sample, an
estimate of S_A (38.56) and a quantification of its uncertainty can be deduced.
\tilde{Z}_n(x) = m_n(x) - \tilde{m}_n(x) + \tilde{Z}(x), \qquad (38.59)

where \tilde{m}_n(x) = f^T(x)\tilde{\beta} + r^T(x) R^{-1} \left( \tilde{Z}(X) - F\tilde{\beta} \right) and \tilde{\beta} = \left( F^T R^{-1} F \right)^{-1} F^T R^{-1} \tilde{Z}(X).
The process \tilde{Z}_n(x) has the same distribution as Z_n(x). Therefore, one can compute realizations
of Z_n(x) from realizations of \tilde{Z}(x). Since \tilde{Z}(x) is not conditioned, the problem
is numerically easier. Among the available Gaussian process sampling methods,
several can be mentioned: Cholesky decomposition [49], Fourier spectral decom-
position [59], Karhunen-Loève spectral decomposition [49], and the propagative
version of the Gibbs sampler [36].
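The conditioning identity (38.59) can be sketched with a simple-kriging (zero-trend) mean, which slightly simplifies the universal-kriging formulas used in the chapter; design, kernel, and seed are illustrative assumptions:

```python
import numpy as np

def sq_exp(a, b, theta):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / theta) ** 2)

rng = np.random.default_rng(3)
theta, nugget = 0.2, 1e-8
X = rng.uniform(0.0, 1.0, 8)                           # experimental design
Y = np.sin(6.0 * X)                                    # observations of the "model"
xs = np.concatenate([np.linspace(0.0, 1.0, 100), X])   # prediction grid + design

R = sq_exp(X, X, theta) + nugget * np.eye(X.size)
Ri = np.linalg.inv(R)
r = sq_exp(xs, X, theta)
m_n = r @ Ri @ Y                          # conditional (kriging) mean m_n(x)

# an unconditioned realization of Z(x), sampled jointly on xs
K = sq_exp(xs, xs, theta) + nugget * np.eye(xs.size)
Zt = np.linalg.cholesky(K) @ rng.standard_normal(xs.size)
m_t = r @ Ri @ Zt[100:]                   # kriging mean computed from Zt(X)
Zn = m_n + Zt - m_t                       # Eq. (38.59): a conditioned realization
# Zn interpolates the data: its last 8 entries (the design points) equal Y
```

Only the cheap unconditioned path Zt must be re-sampled to obtain further conditioned realizations; the matrix factorizations are done once.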
Remark. Let us suppose that a new point x^{(n+1)} is added to the experimental
design X. A classical result of conditional probability implies that the new pre-
dictive distribution Z(x) \mid Z(X) = Y, Z(x^{(n+1)}) = G(x^{(n+1)}), \sigma^2, \theta is identical
to Z_n(x) \mid Z_n(x^{(n+1)}) = G(x^{(n+1)}), \sigma^2, \theta. Therefore, Z_n(x) can be viewed as
an unconditioned Gaussian process and, using the Kriging conditioning method,
realizations of Z(x) \mid Z(X) = Y, Z(x^{(n+1)}) = G(x^{(n+1)}), \sigma^2, \theta can be derived
from realizations of Z_n(x) using the following equation:

Z_{n+1}(x) = \frac{k_n(x^{(n+1)}, x)}{k_n(x^{(n+1)}, x^{(n+1)})} \left( G(x^{(n+1)}) - Z_n(x^{(n+1)}) \right) + Z_n(x). \qquad (38.60)
Therefore, it is easy to calculate a new sample of S_{A,N} after adding a new point
x^{(n+1)} to the experimental design set X. This result is used in the function
sobolGP of the R CRAN package sensitivity to perform sequential design for
sensitivity analysis using a stepwise uncertainty reduction (SUR) strategy [5, 38, 39].
The estimate of S_A is then obtained by averaging over the m available realizations S_{A,i}^N, i = 1, \dots, m:

\hat{S}_A = \frac{1}{m} \sum_{i=1}^{m} S_{A,i}^N, \qquad (38.61)

with variance:

\hat{\sigma}^2_{\hat{S}_A} = \frac{1}{m-1} \sum_{i=1}^{m} \left( S_{A,i}^N - \hat{S}_A \right)^2. \qquad (38.62)
The term \hat{\sigma}^2_{\hat{S}_A} represents the uncertainty on the estimate of S_A (38.56) due to the
metamodel approximation. Therefore, with the presented strategy, one can obtain
both an unbiased estimate of the sensitivity index S_A and a quantification of its
uncertainty. The same procedure can be used to estimate the total Sobol indices.
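Equations (38.61) and (38.62) are simply the sample mean and the unbiased sample variance over the m metamodel realizations; the values below are hypothetical pick-freeze estimates used only to illustrate the computation:

```python
import numpy as np

# hypothetical pick-freeze estimates S_{A,i}^N from m = 5 GP realizations
S_AN = np.array([0.31, 0.28, 0.33, 0.30, 0.29])
S_hat = S_AN.mean()                # Eq. (38.61)
var_hat = S_AN.var(ddof=1)         # Eq. (38.62): unbiased, 1/(m-1) normalization
```

Note the `ddof=1` argument, which selects the 1/(m-1) normalization of Eq. (38.62).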
Finally it may be of interest to evaluate the error due to the pick-freeze
approximation and to compare it to the error due to the metamodel. To do so, one
can use the central limit theorem [18,35] or a bootstrap procedure [38]. In particular,
a methodology to evaluate the uncertainty on the sensitivity index due to both the
Gaussian process and to the pick-freeze approximations is presented in Le Gratiet
et al. [38]. It makes it possible to determine the value of N such that the pick-freeze
approximation error is negligible compared to that of the metamodel.
3.6 Summary
4 Applications
The input distributions of X_1, X_2, and X_3 are uniform over the interval [-\pi, \pi].
This is a classical academic benchmark for sensitivity analysis, with first-order
Sobol indices:
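Since the function definition and its constants are not reproduced in this extract, the following sketch assumes the commonly used form of the Ishigami function with a = 7 and b = 0.1, together with its known analytic variance decomposition:

```python
import numpy as np

# assumption: the usual Ishigami constants a = 7, b = 0.1 (not stated in this extract)
A_CONST, B_CONST = 7.0, 0.1

def ishigami(x):
    """Ishigami function; x has shape (N, 3) with entries in [-pi, pi]."""
    return (np.sin(x[:, 0])
            + A_CONST * np.sin(x[:, 1]) ** 2
            + B_CONST * x[:, 2] ** 4 * np.sin(x[:, 0]))

# analytic variance decomposition for these constants
V1 = 0.5 * (1.0 + B_CONST * np.pi ** 4 / 5.0) ** 2
V2 = A_CONST ** 2 / 8.0
V3 = 0.0                                   # X3 acts only through its interaction with X1
V13 = 8.0 * B_CONST ** 2 * np.pi ** 8 / 225.0
V = V1 + V2 + V3 + V13
S = np.array([V1, V2, V3]) / V             # first-order indices: ~[0.314, 0.442, 0.0]
```

The vanishing first-order index of X_3 together with a non-zero interaction term is what makes this benchmark discriminating for sensitivity methods.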
For the polynomial chaos approach, the coefficients are calculated based on a
degree-adaptive LARS strategy (for details, see Blatman and Sudret [14]), resulting
in a sparse basis set. The maximum polynomial degree is adaptively selected in
the interval 3 \le p \le 15 based on LOO cross-validation error estimates (see
Eq. (38.27)).
For the Gaussian process approach, a tensorized Matérn-5/2 covariance kernel
is chosen (see Rasmussen and Williams [49]) with trend functions given by:

f^T(x) = \left( 1, \; x_2, \; x_2^2, \; x_1^3, \; x_2^3, \; x_1^4, \; x_2^4 \right). \qquad (38.65)
To select f^T(x), we use a classical stepwise regression (i.e., the model errors
are considered independent and identically distributed according to a normal
distribution) with the Bayesian information criterion and a bidirectional selection.
This simple procedure yields a relevant linear trend for the Gaussian process
regression.
The hyper-parameters \theta are estimated with a leave-one-out cross-validation
procedure, while the parameters \beta and \sigma^2 are estimated with a restricted maximum
likelihood method.
First we illustrate in Fig. 38.6 the accuracy of the models with respect to
the sample size n. The Nash-Sutcliffe model efficiency coefficient (also called
predictivity coefficient) is defined as follows:
Q^2 = 1 - \frac{\sum_{i=1}^{n_{test}} \left( G(x^{(i)}) - \hat{G}(x^{(i)}) \right)^2}{\sum_{i=1}^{n_{test}} \left( G(x^{(i)}) - \bar{G} \right)^2}, \qquad \bar{G} = \frac{1}{n_{test}} \sum_{i=1}^{n_{test}} G(x^{(i)}), \qquad (38.66)
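Eq. (38.66) is a one-liner in practice; this sketch (function name illustrative) also shows its two reference values, Q² = 1 for a perfect metamodel and Q² = 0 for the constant-mean predictor:

```python
import numpy as np

def q2(y_true, y_pred):
    """Predictivity (Nash-Sutcliffe) coefficient of Eq. (38.66)."""
    y_true = np.asarray(y_true, dtype=float)
    num = np.sum((y_true - y_pred) ** 2)
    den = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - num / den

y = np.array([1.0, 2.0, 3.0, 4.0])
# a perfect metamodel gives Q2 = 1; predicting the mean gives Q2 = 0
```

Values close to 1 on an independent test sample indicate an accurate metamodel.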
Fig. 38.6 Q^2 coefficient as a function of the sample size n for the Ishigami function. For each n,
the box-plots represent the variations of Q^2 obtained over 100 LHS replications
Fig. 38.7 First-order Sobol index estimates as a function of the sample size n for the Ishigami
function. The horizontal solid lines represent the exact values of S_1, S_2, and S_3. For each n, the box-
plot represents the variations obtained from 100 LHS replications. The validation set comprises
n_{test} = 10,000 samples
G(x) = \prod_{i=1}^{d} \frac{|4x_i - 2| + a_i}{1 + a_i}, \qquad a_i \ge 0. \qquad (38.67)
V_i = \frac{1}{3(1 + a_i)^2}, \quad i = 1, \dots, d, \qquad V = \prod_{i=1}^{d} (1 + V_i) - 1, \qquad S_i = V_i / V. \qquad (38.68)
a D f1; 2; 5; 10; 20; 50; 100; 500; 1000; 1000; 1000; 1000; 1000; 1000; 1000g :
(38.69)
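Eqs. (38.67)-(38.69) can be checked numerically in a few lines; the Monte Carlo sample used for the check is an illustrative choice:

```python
import numpy as np

a = np.array([1, 2, 5, 10, 20, 50, 100, 500] + [1000] * 7, dtype=float)  # Eq. (38.69)

def g_sobol(x, a=a):
    """G-Sobol function of Eq. (38.67); x has shape (N, d) with entries in [0, 1]."""
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)

# analytic partial variances and first-order indices (Eq. (38.68))
Vi = 1.0 / (3.0 * (1.0 + a) ** 2)
V = np.prod(1.0 + Vi) - 1.0
S = Vi / V
```

With this a vector the indices decay quickly: the first four variables carry almost all of the variance, matching the convergence study in Fig. 38.9.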
As in the previous section, different sample sizes n are considered and 100
LHS replications are computed for each n. Sparse polynomial chaos expansions are
obtained with the same strategy as for the Ishigami function: adaptive polynomial
degree selection with 3 \le p \le 15 and LARS-based calculation of the coefficients.
For the Gaussian process regression model, a tensorized Matérn-5/2 covariance
kernel is considered with a constant trend function f(x) = 1. The hyper-parameters \theta
are estimated with a leave-one-out cross-validation procedure, and the parameters \beta
and \sigma^2 are estimated with the maximum likelihood method.
The accuracy of the metamodels with respect to n is presented in Fig. 38.8. It
is computed from a test sample set of size n_{test} = 10,000. The convergence of
the estimates of the first four first-order Sobol indices is represented in Fig. 38.9.
Fig. 38.8 Q^2 coefficient as a function of the sample size n for the G-Sobol academic example. For
each n, the box-plot represents the variations of Q^2 obtained from 100 LHS replications
Fig. 38.9 Sobol index estimates with respect to the sample size n for the G-Sobol function. The
horizontal solid lines represent the true values of S_1, S_2, S_3, and S_4. For each n, the box-plot
represents the variations obtained from 100 LHS replications
Table 38.2 Sobol index estimates for the G-Sobol function. The median and the root mean-
square error (RMSE) of the estimates are given for n = 100 and n = 500

                     Polynomial chaos expansion        Gaussian process regression
                     Median           RMSE             Median            RMSE
Index   Value        n=100   n=500    n=100    n=500   n=100    n=500    n=100   n=500
S1      0.604        0.619   0.607    0.034    0.007   0.618    0.599    0.035   0.012
S2      0.268        0.270   0.269    0.027    0.005   0.233    0.245    0.046   0.026
S3      0.067        0.063   0.065    0.014    0.003   0.045    0.070    0.029   0.016
S4      0.020        0.014   0.019    0.008    0.001   0.008    0.023    0.018   0.013
S5      0.005        0.002   0.005    0.003    0.001   8.6e-4   1.8e-3   0.014   0.013
S6      0.001        0.000   7.2e-4   0.001    3.5e-4  6.4e-4   5.3e-4   0.013   0.013
S7      0.000        0.000   1.1e-4   1.1e-3   1.4e-4  5.3e-4   3.0e-4   0.013   0.013
S8      0.000        0.000   0.000    3.3e-4   1.7e-5  6.5e-4   7.1e-4   0.013   0.013
S9      0.000        0.000   0.000    4.1e-4   1.1e-5  8.5e-4   4.4e-4   0.014   0.013
S10     0.000        0.000   0.000    2.4e-4   1.1e-5  2.2e-4   1.7e-4   0.013   0.013
S11     0.000        0.000   0.000    9.5e-4   1.2e-5  5.5e-4   -9.9e-5  0.013   0.013
S12     0.000        0.000   0.000    5.2e-4   2.1e-5  2.6e-4   4.1e-4   0.013   0.013
S13     0.000        0.000   0.000    5.1e-4   5.9e-6  9.8e-4   4.7e-4   0.013   0.013
S14     0.000        0.000   0.000    8.8e-4   1.9e-5  1.8e-4   6.9e-4   0.013   0.013
S15     0.000        0.000   0.000    8.6e-4   9.7e-6  7.2e-4   3.1e-4   0.013   0.013
indices are insignificant. One can observe that the RMS error over the 100 LHS
replications is slightly smaller when using PCE for both n D 100 and n D 500 ED
points. Note that the second-order and total Sobol indices are also available for free
when using PCE.
G(x) = \sum_{i=1}^{20} \beta_i w_i + \sum_{i<j}^{20} \beta_{ij} w_i w_j + \sum_{i<j<l}^{20} \beta_{ijl} w_i w_j w_l + 5 w_1 w_2 w_3 w_4, \qquad (38.70)

where X_i \sim U(0, 1), i = 1, \dots, 20, and w_i = 2(x_i - 1/2) for all i except for
i = 3, 5, 7, where w_i = 2(1.1 x_i / (x_i + 0.1) - 1/2). The coefficients are defined as
\beta_i = 20, i = 1, \dots, 10; \beta_{ij} = -15, i, j = 1, \dots, 6; and \beta_{ijl} = -10, i, j, l =
1, \dots, 5. The remaining coefficients are set equal to \beta_i = (-1)^i and \beta_{ij} = (-1)^{i+j},
and all the rest are zero. The reference values of the first-order Sobol indices of
the Morris function are calculated by a large Monte Carlo-based sensitivity analysis
(n = 10^6).
As in the previous section, different sample sizes n are considered and 100
LHS replications are computed for each n. Sparse polynomial chaos expansions
are obtained by adaptive polynomial degree selection 5 \le p \le 13 and LARS-based
calculation of the coefficients.
Furthermore, a tensorized Matérn-5/2 covariance kernel is used for the Gaussian
process regression, and the trend has been selected with a stepwise regression
procedure (as for the Ishigami function).
Fig. 38.10 Q^2 coefficient as a function of the sample size n for the Morris function example. For
each n, the box-plot represents the variations of Q^2 obtained from 100 LHS replications
Fig. 38.11 First-order Sobol index estimates as a function of the sample size n for the Morris
function. The horizontal solid lines represent the exact values of S_3, S_8, and S_9. For each n, the
box-plot represents the variations obtained from 100 LHS replications
Table 38.3 First-order Sobol index estimates for the Morris function. The median and the
root mean-square error (RMSE) of the estimates are given for n = 100 and n = 500

                     Polynomial chaos expansion      Gaussian process regression
                     Median          RMSE            Median          RMSE
Index   Value        n=100   n=500   n=100   n=500   n=100   n=500   n=100   n=500
S2      0.005        0.000   0.005   0.252   0.017   0.011   0.004   0.109   0.085
S3      0.008        0.000   0.009   0.175   0.027   0.006   0.007   0.089   0.088
S1      0.017        0.000   0.015   0.304   0.047   0.003   0.017   0.130   0.109
S4      0.009        0.000   0.009   0.119   0.023   0.017   0.011   0.130   0.097
S5      0.016        0.000   0.015   0.230   0.043   0.005   0.016   0.120   0.109
S6      0.000        0.000   0.000   0.061   0.003   0.000   0.000   0.061   0.070
S7      0.069        0.045   0.068   0.585   0.058   0.072   0.062   0.095   0.123
S8      0.100        0.128   0.107   0.950   0.105   0.105   0.116   0.108   0.211
S9      0.150        0.192   0.160   1.241   0.143   0.127   0.143   0.246   0.117
S10     0.100        0.133   0.106   0.875   0.092   0.138   0.111   0.404   0.155
S11     0.000        0.000   0.000   0.185   0.003   0.004   0.000   0.088   0.074
S12     0.000        0.000   0.000   0.083   0.004   0.000   0.000   0.064   0.077
S13     0.000        0.000   0.000   0.081   0.003   0.000   0.000   0.064   0.074
S14     0.000        0.000   0.000   0.020   0.003   0.000   0.000   0.070   0.078
S15     0.000        0.000   0.000   0.140   0.003   0.000   0.000   0.065   0.075
S16     0.000        0.000   0.000   0.040   0.005   0.001   0.000   0.077   0.074
S17     0.000        0.000   0.000   0.264   0.004   0.000   0.000   0.065   0.075
S18     0.000        0.000   0.000   0.084   0.004   0.000   0.000   0.064   0.075
S19     0.000        0.000   0.000   0.083   0.004   0.000   0.000   0.064   0.076
S20     0.000        0.000   0.000   0.049   0.004   0.000   0.000   0.064   0.075
Fig. 38.12 Model of a truss structure with 23 members, loaded with six vertical loads P_1, \dots, P_6.
The quantity of interest is the maximum displacement at mid-span, u_4
Sensitivity analysis is also of great interest for engineering models whose input
parameters may have different distributions. As an example, consider the elastic
truss structure depicted in Fig. 38.12 (see, e.g., Blatman and Sudret [11]). This truss
is made of two types of bars, namely, horizontal bars with cross-section A_1 and
Young's modulus (stiffness) E_1 on the one hand, and oblique bars with cross-section
A_2 and Young's modulus (stiffness) E_2 on the other hand. The truss is loaded with
six vertical loads applied on the top chord. Of interest is the maximum vertical
displacement (called deflection) at mid-span. This quantity is computed using a
finite element model comprising elastic bar elements.
The various parameters describing the behavior of this truss structure are
modeled by independent random variables that account for the uncertainty in both
the physical properties of the structure and the applied loads. Their distributions are
gathered in Table 38.4.
These input variables are collected in the random vector:
X = \{ E_1, E_2, A_1, A_2, P_1, \dots, P_6 \}. \qquad (38.72)

u_4 = G^{FE}(X). \qquad (38.73)
Fig. 38.13 First-order Sobol index estimates as a function of the sample size n
for the truss model. The horizontal solid lines represent reference values for the input variables E_1, P_3,
and P_5 from a Monte Carlo estimate on 6,000,000 samples. For the Gaussian process regression,
the segments represent the 95% confidence intervals taking into account both the metamodel and
the Monte Carlo errors
5 Conclusions
Sobol indices are recognized as good descriptors of the sensitivity of the output of
a computational model to its various input parameters. Classical estimation methods
based on Monte Carlo simulation are computationally expensive, though. The
required costs, on the order of 10^3–10^4 model evaluations, are often not compatible
with the advanced simulation models encountered in engineering applications.
For this reason, surrogate models may be first built up from a limited number
of runs of the computational model (the so-called experimental design), and the
sensitivity analysis is then carried out by substituting the surrogate model for the
original one.
Polynomial chaos expansions and Gaussian processes are two popular methods
that can be used for this purpose. The advantage of the PCE approach is that the
Sobol indices at any order may be computed analytically once the expansion is
available. In this contribution, least-squares minimization techniques are presented
to compute the PCE coefficients, yet any intrusive or nonintrusive method could be
used as an alternative.
In contrast Gaussian process surrogate models are used together with Monte
Carlo simulation for estimating the Sobol indices. The advantage of this approach
is that the metamodel error can be included in the estimators. Note that bootstrap
techniques can be used similarly to calculate and include metamodeling error also
for PCE-based sensitivity analysis, as demonstrated by Dubreuil et al. [24].
As shown in the various comparisons, PCE and GP give similar accuracy
(measured in terms of the Q2 validation coefficient) for a given size of the
experimental design in a broad range of applications. The replication of the analyses
with different random designs of the same size shows a smaller scatter using GP
for extremely small designs, whereas PCE becomes more stable for medium-size
designs. Selecting the best technique is in the end problem-dependent, and it is
worth comparing the two approaches using the same experimental design, as
can be done in recent sensitivity analysis toolboxes such as OpenTURNS [2] and
UQLab [41].
Finally it is worth mentioning that the so-called derivative-based global sensitiv-
ity measures (DGSM) originally introduced by Sobol and Kucherenko [56] can also
be computed using surrogate models. In particular, polynomial chaos expansions
may be used to compute the DGSM analytically, as shown in Sudret and Mai
[66]. Furthermore, Gaussian process regression can also be used as presented in
De Lozzo and Marrel [21]. The recent combination of polynomial chaos expansions
and Gaussian processes into PC-Kriging [54] also appears promising for estimating
sensitivity indices from extremely small experimental designs.
References
1. Abramowitz, M., Stegun, I.: Handbook of Mathematical Functions. Dover Publications,
New York (1970)
2. Andrianov, G., Burriel, S., Cambier, S., Dutfoy, A., Dutka-Malen, I., de Rocquigny, E.,
Sudret, B., Benjamin, P., Lebrun, R., Mangeant, F., Pendola, M.: Open TURNS, an
open source initiative to Treat Uncertainties, Risks'N Statistics in a structured industrial
approach. In: Proceedings of the ESREL 2007 Safety and Reliability Conference, Stavanger
(2007)
3. Bachoc, F.: Cross validation and maximum likelihood estimations of hyper-parameters of
Gaussian processes with model misspecification. Comput. Stat. Data Anal. 66, 55–69 (2013)
4. Bates, R.A., Buck, R., Riccomagno, E., Wynn, H.: Experimental design and observation for
large systems. J. R. Stat. Soc. Ser. B 58(1), 77–94 (1996)
5. Bect, J., Ginsbourger, D., Li, L., Picheny, V., Vazquez, E.: Sequential design of computer
experiments for the estimation of a probability of failure. Stat. Comput. 22, 773–793 (2012)
6. van Beers, W., Kleijnen, J.: Customized sequential designs for random simulation experiments:
Kriging metamodelling and bootstrapping. Eur. J. Oper. Res. 186, 1099–1113 (2008)
7. Berveiller, M., Sudret, B., Lemaire, M.: Presentation of two methods for computing the
response coefficients in stochastic finite element analysis. In: Proceedings of the 9th ASCE
Specialty Conference on Probabilistic Mechanics and Structural Reliability, Albuquerque
(2004)
8. Berveiller, M., Sudret, B., Lemaire, M.: Stochastic finite elements: a non intrusive approach by
regression. Eur. J. Comput. Mech. 15(1–3), 81–92 (2006)
9. Bieri, M., Schwab, C.: Sparse high order FEM for elliptic sPDEs. Comput. Methods Appl.
Mech. Eng. 198, 1149–1170 (2009)
10. Blatman, G.: Adaptive sparse polynomial chaos expansions for uncertainty propagation and
sensitivity analysis. PhD thesis, Université Blaise Pascal, Clermont-Ferrand (2009)
11. Blatman, G., Sudret, B.: Sparse polynomial chaos expansions and adaptive stochastic finite
elements using a regression approach. C. R. Mécanique 336(6), 518–523 (2008)
12. Blatman, G., Sudret, B.: An adaptive algorithm to build up sparse polynomial chaos expansions
for stochastic finite element analysis. Prob. Eng. Mech. 25, 183–197 (2010)
13. Blatman, G., Sudret, B.: Efficient computation of global sensitivity indices using sparse
polynomial chaos expansions. Reliab. Eng. Syst. Saf. 95, 1216–1229 (2010)
14. Blatman, G., Sudret, B.: Adaptive sparse polynomial chaos expansion based on Least Angle
Regression. J. Comput. Phys. 230, 2345–2367 (2011)
15. Brown, S., Beck, J., Mahgerefteh, H., Fraga, E.: Global sensitivity analysis of the impact of
impurities on CO2 pipeline failure. Reliab. Eng. Syst. Saf. 115, 43–54 (2013)
16. Buzzard, G.: Global sensitivity analysis using sparse grid interpolation and polynomial chaos.
Reliab. Eng. Syst. Saf. 107, 82–89 (2012)
17. Buzzard, G., Xiu, D.: Variance-based global sensitivity analysis via sparse-grid interpolation
and cubature. Commun. Comput. Phys. 9(3), 542–567 (2011)
18. Chastaing, G., Le Gratiet, L.: ANOVA decomposition of conditional Gaussian processes
for sensitivity analysis with dependent inputs. J. Stat. Comput. Simul. 85(11), 2164–2186
(2015)
19. Chilès, J., Delfiner, P.: Geostatistics: Modeling Spatial Uncertainty. Wiley Series in Probability
and Statistics (Applied Probability and Statistics Section). Wiley, New York (1999)
20. Crestaux, T., Le Maître, O., Martinez, J.M.: Polynomial chaos expansion for sensitivity
analysis. Reliab. Eng. Syst. Saf. 94(7), 1161–1172 (2009)
21. De Lozzo, M., Marrel, A.: Estimation of the derivative-based global sensitivity measures using
a Gaussian process metamodel (2015, submitted)
22. Ditlevsen, O., Madsen, H.: Structural Reliability Methods. Wiley, Chichester (1996)
23. Doostan, A., Owhadi, H.: A non-adapted sparse approximation of PDEs with stochastic inputs.
J. Comput. Phys. 230(8), 3015–3034 (2011)
24. Dubreuil, S., Berveiller, M., Petitjean, F., Salaün, M.: Construction of bootstrap confidence
intervals on sensitivity indices computed by polynomial chaos expansion. Reliab. Eng. Syst.
Saf. 121, 263–275 (2014)
25. Dubrule, O.: Cross validation of Kriging in a unique neighborhood. Math. Geol. 15, 687–699
(1983)
26. Durrande, N., Ginsbourger, D., Roustant, O., Laurent, C.: ANOVA kernels and RKHS of
zero mean functions for model-based sensitivity analysis. J. Multivar. Anal. 115, 57–67
(2013)
27. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32,
407–499 (2004)
28. Fajraoui, N., Ramasomanana, F., Younes, A., Mara, T., Ackerer, P., Guadagnini, A.: Use of
global sensitivity analysis and polynomial chaos expansion for interpretation of nonreactive
transport experiments in laboratory-scale porous media. Water Resour. Res. 47(2) (2011)
29. Ghanem, R., Spanos, P.: Stochastic Finite Elements: A Spectral Approach. Springer,
New York (1991). (Reedited by Dover Publications, Mineola, 2003)
30. Gramacy, R., Taddy, M.: Categorical inputs, sensitivity analysis, optimization and importance
tempering with tgp version 2, an R package for treed Gaussian process models. J. Stat. Softw.
33, 1–48 (2010)
31. Gramacy, R.B., Taddy, M., Wild, S.M.: Variable selection and sensitivity analysis using
dynamic trees, with an application to computer performance tuning. Ann. Appl. Stat. 7(1),
51–80 (2013)
32. Harville, D.: Maximum likelihood approaches to variance component estimation and to related
problems. J. Am. Stat. Assoc. 72(358), 320–338 (1977)
33. Iooss, B., Lemaître, P.: A review on global sensitivity analysis methods. In: Meloni, C.,
Dellino, G. (eds.) Uncertainty Management in Simulation-Optimization of Complex Systems:
Algorithms and Applications. Springer, New York (2015)
34. Jakeman, J., Eldred, M., Sargsyan, K.: Enhancing \ell_1-minimization estimates of polynomial
chaos expansions using basis selection. J. Comput. Phys. 289, 18–34 (2015)
35. Janon, A., Klein, T., Lagnoux, A., Nodet, M., Prieur, C.: Asymptotic normality and efficiency
of two Sobol index estimators. ESAIM: Prob. Stat. 18, 342–364 (2014)
36. Lantuéjoul, C., Desassis, N.: Simulation of a Gaussian random vector: a propagative version of
the Gibbs sampler. In: The 9th International Geostatistics Congress, Oslo, p. 1747181, https://
hal-mines-paristech.archives-ouvertes.fr/hal-00709250 (2012)
37. Le Gratiet, L., Cannamela, C.: Cokriging-based sequential design strategies using fast
cross-validation techniques for multifidelity computer codes. Technometrics 57(3), 418–427
(2015)
38. Le Gratiet, L., Cannamela, C., Iooss, B.: A Bayesian approach for global sensitivity anal-
ysis of (multifidelity) computer codes. SIAM/ASA J. Uncertain. Quantif. 2(1), 336–363
(2014)
39. Le Gratiet, L., Couplet, M., Iooss, B., Pronzato, L.: Planification d'expériences séquentielle pour
l'analyse de sensibilité. In: 46èmes Journées de Statistique de la SFdS, Rennes (2014)
40. Lebrun, R., Dutfoy, A.: An innovating analysis of the Nataf transformation from the copula
viewpoint. Prob. Eng. Mech. 24(3), 312–320 (2009)
41. Marelli, S., Sudret, B.: UQLab: a framework for uncertainty quantification in Matlab. In:
Vulnerability, Uncertainty, and Risk (Proceedings of the 2nd International Conference on
Vulnerability, Risk Analysis and Management (ICVRAM2014), Liverpool), pp. 2554–2563.
doi:10.1061/9780784413609.257, http://ascelibrary.org/doi/abs/10.1061/9780784413609.257
(2014)
42. Marrel, A., Iooss, B., Van Dorpe, F., Volkova, E.: An efficient methodology for model-
ing complex computer codes with Gaussian processes. Comput. Stat. Data Anal. 52(10),
4731–4744 (2008)
43. Marrel, A., Iooss, B., Laurent, B., Roustant, O.: Calculations of Sobol indices for the Gaussian
process metamodel. Reliab. Eng. Syst. Saf. 94, 742–751 (2009)
44. Marrel, A., Iooss, B., Da Veiga, S., Ribatet, M.: Global sensitivity analysis of stochastic
computer models with joint metamodels. Stat. Comput. 22(3), 833–847 (2010)
45. McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting
values of input variables in the analysis of output from a computer code. Technometrics 21(2),
239–245 (1979)
46. Munoz Zuniga, M., Kucherenko, S., Shah, N.: Metamodelling with independent and dependent
inputs. Comput. Phys. Commun. 184, 1570–1580 (2013)
47. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. Society for
Industrial and Applied Mathematics, Philadelphia (1992)
48. Oakley, J., O'Hagan, A.: Probabilistic sensitivity analysis of complex models: a Bayesian
approach. J. R. Stat. Soc. Ser. B 66(part 3), 751–769 (2004)
49. Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. MIT Press, Cambridge
(2006)
50. Robert, C.: The Bayesian Choice: From Decision-Theoretic Foundations to Computational
Implementation. Springer, New York (2007)
51. Sandoval, E.H., Anstett-Collin, F., Basset, M.: Sensitivity study of dynamic systems using
polynomial chaos. Reliab. Eng. Syst. Saf. 104, 15–26 (2012)
52. Santner, T., Williams, B., Notz, W.: The Design and Analysis of Computer Experiments.
Springer, New York (2003)
53. Sargsyan, K., Safta, C., Najm, H., Debusschere, B., Ricciuto, D., Thornton, P.: Dimensionality reduction for complex models via Bayesian compressive sensing. Int. J. Uncertain. Quantif. 4(1), 63–93 (2014)
54. Schöbi, R., Sudret, B., Wiart, J.: Polynomial-chaos-based Kriging. Int. J. Uncertain. Quantif. 5(2), 171–193 (2015)
55. Sobol, I.: Sensitivity estimates for nonlinear mathematical models. Math. Model. Comput. Exp. 1, 407–414 (1993)
56. Sobol, I., Kucherenko, S.: Derivative based global sensitivity measures and their link with global sensitivity indices. Math. Comput. Simul. 79(10), 3009–3017 (2009)
57. Sobol, I., Tarantola, S., Gatelli, D., Kucherenko, S., Mauntz, W.: Estimating the approximation error when fixing unessential factors in global sensitivity analysis. Reliab. Eng. Syst. Saf. 92(7), 957–960 (2007)
58. Soize, C., Ghanem, R.: Physical systems with random uncertainties: chaos representations with arbitrary probability measure. SIAM J. Sci. Comput. 26(2), 395–410 (2004)
59. Stein, M.: Interpolation of Spatial Data. Springer Series in Statistics. Springer, New York (1999)
60. Storlie, C., Swiler, L., Helton, J., Sallaberry, C.: Implementation and evaluation of nonparametric regression procedures for sensitivity analysis of computationally demanding models. Reliab. Eng. Syst. Saf. 94(11), 1735–1763 (2009)
61. Sudret, B.: Global sensitivity analysis using polynomial chaos expansions. In: Spanos, P., Deodatis, G. (eds.) Proceedings of the 5th International Conference on Computational Stochastic Mechanics (CSM5), Rhodos (2006)
62. Sudret, B.: Uncertainty propagation and sensitivity analysis in mechanical models: contributions to structural reliability and stochastic spectral methods. Tech. rep., Université Blaise Pascal, Clermont-Ferrand, France, habilitation à diriger des recherches (229 pages) (2007)
63. Sudret, B.: Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst. Saf. 93, 964–979 (2008)
64. Sudret, B.: Polynomial chaos expansions and stochastic finite element methods. In: Phoon, K.K., Ching, J. (eds.) Risk and Reliability in Geotechnical Engineering. Taylor and Francis, Boca Raton (2015)
65. Sudret, B., Caniou, Y.: Analysis of covariance (ANCOVA) using polynomial chaos expansions. In: Deodatis, G. (ed.) Proceedings of the 11th International Conference on Structural Safety and Reliability (ICOSSAR'2013), New York (2013)
66. Sudret, B., Mai, C.V.: Computing derivative-based global sensitivity measures using polynomial chaos expansions. Reliab. Eng. Syst. Saf. 134, 241–250 (2015)
67. Svenson, J., Santner, T., Dean, A., Hyejung, M.: Estimating sensitivity indices based on Gaussian process metamodels with compactly supported correlation functions. J. Stat. Plan. Inference 114, 160–172 (2014)
68. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J., Morris, M.D.: Screening, predicting, and computer experiments. Technometrics 34(1), 15–25 (1992)
69. Xiu, D., Karniadakis, G.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)
70. Younes, A., Mara, T., Fajraoui, N., Lehmann, F., Belfort, B., Beydoun, H.: Use of global sensitivity analysis to help assess unsaturated soil hydraulic parameters. Vadose Zone J. 12(1) (2013)
39 Sensitivity Analysis of Spatial and/or Temporal Phenomena
Amandine Marrel, Nathalie Saint-Geours, and Matthias De Lozzo
Abstract
This section presents several sensitivity analysis methods to deal with spatial
and/or temporal models. Focusing on the variance-based approach, solutions
are proposed to perform global sensitivity analysis with functional inputs and
outputs. Some of these solutions are illustrated on two industrial case studies: an
environmental model for flood risk assessment and an atmospheric dispersion
model for radionuclide release. These test cases are fully described at the
beginning of the paper. Then a section is dedicated to spatiotemporal inputs
and proposes several sensitivity analysis methods. The use of metamodels is
also addressed. Pros and cons of the various methods are then discussed.
In a subsequent section, solutions to deal with spatiotemporal outputs are
proposed: aggregated, site, and block indices are described. The use of functional
metamodels for sensitivity analysis purposes is also discussed.
Keywords
Spatiotemporal inputs · Spatiotemporal outputs · Metamodel · Macro-parameter · Joint metamodeling · Dimension reduction · Distance-based dissimilarity measure · Trigger input · Map labeling · Aggregated indices · Site indices · Block indices · Change of support
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1328
2 Environmental and Industrial Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1329
2.1 NOE Environmental Model for Flood Risk Assessment . . . . . . . . . . . . . . . . . . . . . 1329
2.2 Ceres-Mithra Model for Radionuclide Atmospheric Dispersion . . . . . . . . . . . . . . . 1330
3 Methods for Spatiotemporal Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1331
3.1 Variance-Based Sensitivity Indices for Spatiotemporal Inputs (Without
Metamodeling) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1334
3.2 Variance-Based Sensitivity Analysis with Metamodels . . . . . . . . . . . . . . . . . . . . . . 1337
3.3 Pros and Cons of Available Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1340
3.4 Application on NOE Test Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1342
4 Methods for Spatiotemporal Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1344
4.1 Sensitivity Index Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1344
4.2 Block Sensitivity Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1345
4.3 Aggregated Sensitivity Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1345
4.4 Use of Metamodels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1348
4.5 Application on NOE Test Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1349
4.6 Application on Ceres-Mithra Test Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1351
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1355
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1355
1 Introduction
Most of the sensitivity analysis (SA) methods presented in the previous
papers are initially defined for scalar variables, whereas industrial and environmental
applications often deal with more complex inputs and outputs. Indeed,
these parameters can be spatial, temporal, spatiotemporal, or functional in a more
general way. Classical SA methods must be adapted to address the problems posed
by this kind of data as input of numerical models but also as output. Among the
factors hindering the use of SA methods in the domain of spatiotemporal data
are the explosion of dimensionality problems and the lack of maturity of certain
models. We will examine in particular several other limits: (i) spatial model input
data generally exhibit some autocorrelation, yet conventional sensitivity analysis
methods only consider independent scalar variables; (ii) notions of scale, support,
and resolution, which play a prominent role in spatial modeling, are ignored
in the formal frameworks of conventional sensitivity analysis methods; and (iii)
interpreting the SA results is difficult in the case of functional outputs. Hence,
there appears to be a great need to adapt sensitivity analysis methods to the
specific context of spatially distributed modeling. Some ideas have already been
provided in the literature to address this issue. First, in existing research related to
sensitivity analysis, one may find some publications that deal with correlated input
variables, but these studies rarely examine the particular case of spatial dependence.
In addition, spatial statistics, and more specifically geostatistics, offer theoretical
frameworks to describe the uncertainty affecting spatially distributed data and
to grasp the notions of spatial scale, support, or resolution. Geostatistics also provides
tools to simulate these spatial uncertainties. However, this theoretical corpus has
never been linked to that of sensitivity analysis of numerical models.
2 Environmental and Industrial Applications

2.1 NOE Environmental Model for Flood Risk Assessment
Flood risk research makes intensive use of numerical modeling to forecast flooding
events, assess the impacts of potential flooding events in terms of human well-
being and economic and social development, and design efficient flood management
policies. These models simulate hydrological, hydraulic, and economic processes
over a given study area. They are usually spatially distributed and often make use of
geographic information systems (GIS) tools.
The NOE model is a spatially distributed code which is used to assess the
economic impact of flood risk [11]. It calculates the so-called EAD risk indicator
(expected annual damage [€/year]), which is defined as the expected economic
losses due to flooding events over one year on a given study area. The EAD indicator
is spatially distributed and mapped over the floodplain. Its estimation requires
computing flood damages and flood return intervals for a range of flood scenarios of
various magnitudes (e.g., a 10-year flood, a 30-year flood, etc.). The return intervals
of these scenarios are computed from a series of annual maximum flows at a gauging
station. The estimation of economic damages caused by a given flood scenario
(Fig. 39.1) is based on the combination of various input data: (i) a hazard map which
gives maximum water depths, water velocity, and flood duration over the study area
for this flood scenario; (ii) a land use map which identifies and locates the various
assets at stake over the study area (e.g., buildings, agricultural land, shops, factories,
etc.); and (iii) a set of so-called damage functions that describe the vulnerability of
flood-exposed assets.
As a study area, we selected the Lower Orb River fluvial plain, located in the
south of France (Hérault). This 63 sq. km area has a Mediterranean subhumid regime
and suffers from frequent flooding events that may result in large economic losses.
For example, the flood of Dec. 1995–Jan. 1996, with a peak discharge of 1,700 m³/s,
affected approximately 15,000 people and caused a total amount of damage of
53 M€ [42].
The NOE model has both spatially distributed inputs and a spatial output. It also
has two important characteristics that need to be stressed. First, the model end
user (e.g., a water manager) is interested in the whole output map of the EAD
risk indicator, but he/she is also interested in knowing the aggregated value of
the EAD risk indicator over one district, one city, or the entire floodplain. We
Fig. 39.1 NOE model: estimation of economic losses for a given flood scenario
thus say that the NOE model is spatially additive, which means that the output of interest over a given spatial support $\Omega \subset \mathbb{R}^2$ is the sum $Y_\Omega = \int_{z \in \Omega} Y(z)\,dz$. Many other environmental models are also spatially additive (e.g., models that describe the rainfall over a study area), but some are not (e.g., a model that predicts the maximum pollutant concentration $\max\{Y(z) \mid z \in \Omega\}$ over a study area). Second, the NOE model is also a point-based model, meaning that the value of the model output at a given location $z \in \mathbb{R}^2$ depends only on the set of scalar inputs and on the values $W_i(z)$ of the spatially distributed inputs at that same location [21]. Such point-
based models are encountered in various fields of environmental and earth sciences,
whenever spatial interactions in the physical processes under study can be neglected
in a first approximation.
Most SA techniques were initially designed for models with independent and scalar
random inputs only: this is a first obstacle that limits their extension to spatial models.
Indeed, in a spatial model, some model inputs are not scalar values but 2D raster
Fig. 39.2 Flowchart of the C-M model, with the uncertain inputs in bold and the spatial outputs of the simulator
or vector data and may exhibit some sort of spatial autocorrelation; the original
framework of VB-GSA no longer fits in this case. In this paper, we explore
this issue and investigate how variance-based sensitivity indices can be calculated in
numerical models with spatially distributed inputs. Only models with scalar outputs
are considered here (spatially distributed outputs will be studied in the second part
of the paper). We consider the model f with d scalar inputs X D .X1 ; : : : ; Xd /
and a functional input fWz gz2Dz indexed by the continuous variable z defined on the
domain Dz , where Dz can be a time interval or a spatial domain, for example. The
model is defined by:
f W Rd F ! R
(39.1)
.X; Wz / 7! Y D f .X; Wz /
3.1.1 Macro-parameter
A first approach is to consider the spatiotemporal input $\{W(z) : z \in D_z\}$, or more precisely its numerical representation, as a vector of $m$ scalar inputs $W = (W_1, \ldots, W_m)$. Following [24], we will use the term macro-parameter to refer to this method. Most often, the temporal or spatial domain $D_z$ is discretized as a regular grid with $m$ elements:

$D_z \xrightarrow{\text{discretization}} (z^{(1)}, \ldots, z^{(m)}) \in D_z^m.$

In this case, the scalar inputs $W = (W_1, \ldots, W_m)$ are simply the values $(W(z^{(1)}), \ldots, W(z^{(m)}))$ at the grid points. Spatiotemporal inputs may also be stored in a more complex format, for example, as a vector layer in a geographic information system, comprising numerous spatial objects with various alphanumerical attributes. In such a case, the scalar inputs $W = (W_1, \ldots, W_m)$ are the values of the attributes of each spatial object in the layer, and $m$ is a multiple of the number of objects. In both cases, the model (39.1) is replaced by its discretized version:

$f^\star : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}, \quad (X, W) \mapsto Y = f^\star(X, W). \qquad (39.2)$
dimension $m$ gets too large ($m \geq 100$, say), the large samples required to estimate the
sensitivity indices may be difficult to handle.
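As a toy illustration of the macro-parameter approach, the sketch below treats the discretized field $W = (W_1, \ldots, W_m)$ of Eq. (39.2) as a single group of inputs and estimates its first-order group Sobol index by pick-and-freeze. The additive model and all sizes are assumptions made for the example, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, N = 2, 25, 50_000  # 2 scalar inputs, a 5x5 grid field, N Monte Carlo samples

def model(X, W):
    """Toy stand-in for the discretized model f*(X, W) of Eq. (39.2)."""
    return X[:, 0] + 2.0 * X[:, 1] + W.mean(axis=1)

# sample the scalar inputs and the macro-parameter W = (W(z^(1)), ..., W(z^(m)))
X, Xp = rng.normal(size=(N, d)), rng.normal(size=(N, d))
W = rng.normal(size=(N, m))

# pick-and-freeze: freeze the whole group W, resample the scalar inputs
Y, Y_frozen = model(X, W), model(Xp, W)

# first-order group index S_W = Cov(Y, Y_frozen) / Var(Y)
S_W = np.cov(Y, Y_frozen)[0, 1] / Y.var()
```

For this additive toy model the exact value is $\mathrm{Var}(\bar W)/\mathrm{Var}(Y) = (1/m)/(1 + 4 + 1/m) \approx 0.008$: the field contributes little to the output variance once averaged over the grid.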
$f^\star : \mathbb{R}^d \times \{0, 1\} \to \mathbb{R}, \quad (X, \delta) \mapsto Y = f^\star(X, \delta). \qquad (39.4)$
$f^\star : \mathbb{R}^d \times \{1, \ldots, n\} \to \mathbb{R}, \quad (X, L) \mapsto Y = f(X, W_z^{(L)}). \qquad (39.5)$
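The map labeling method can be sketched as follows: $n$ realizations of the spatial input are generated beforehand, and the model only sees the uniformly distributed integer label $L$. The sizes and data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 400                      # n pre-generated maps of m pixels each
maps = rng.normal(size=(n, m))       # W^(1), ..., W^(n): realizations of W_z

N = 8                                # model runs
L = rng.integers(1, n + 1, size=N)   # label input L ~ U{1, ..., n}
W_selected = maps[L - 1]             # the model is then evaluated as f(X, W^(L))
```

The spatial autocorrelation of $W_z$ is preserved automatically, since only pre-generated realizations are ever presented to the model.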
When the computer model is too CPU time-consuming to allow a sufficient number
of model calculations, it can be approximated by a metamodel, estimated on a
few simulations of the model (learning sample), as described in Paper 8 (see
Chap. 38, Metamodel-Based Sensitivity Analysis: Polynomial Chaos Expan-
sions and Gaussian Processes). Then, the sensitivity indices are directly estimated
on the metamodel. In the case of functional inputs, the application of such an approach raises the problem of taking a functional input into account in a metamodel. In what follows, several strategies are proposed to address this problem.
$f : \mathbb{R}^d \times \mathbb{N} \to \mathbb{R}, \quad (X, X_\varepsilon) \mapsto Y = f(X, X_\varepsilon). \qquad (39.6)$
Joint metamodels yield two metamodels, one for the mean component and another for the dispersion component:

$Y_m(X) = E(Y \mid X), \qquad Y_d(X) = \mathrm{Var}(Y \mid X).$

Referring to the total variance formula, the variance of the output variable $Y$ can be rewritten and deduced from the two metamodels:

$\mathrm{Var}(Y) = \mathrm{Var}[E(Y \mid X)] + E[\mathrm{Var}(Y \mid X)] = \mathrm{Var}[Y_m(X)] + E[Y_d(X)]. \qquad (39.9)$
Furthermore, the variance of $Y$ is the sum of the contributions of all the input variables $X = (X_1, \ldots, X_d)$ and $X_\varepsilon$:

$\mathrm{Var}(Y) = V_\varepsilon(Y) + \sum_{i=1}^{d} \sum_{|J|=i} \left[ V_J(Y) + V_{J\varepsilon}(Y) \right] \qquad (39.10)$

where $V_\varepsilon(Y) = \mathrm{Var}[E(Y \mid X_\varepsilon)]$, $V_i(Y) = \mathrm{Var}[E(Y \mid X_i)]$, $V_{i\varepsilon}(Y) = \mathrm{Var}[E(Y \mid X_i, X_\varepsilon)] - V_i(Y) - V_\varepsilon(Y)$, $V_{ij}(Y) = \mathrm{Var}[E(Y \mid X_i, X_j)] - V_i(Y) - V_j(Y)$, etc. The variance of the mean component $Y_m(X)$, denoted hereafter $Y_m$, can also be decomposed:

$\mathrm{Var}(Y_m) = \sum_{i=1}^{d} \sum_{|J|=i} V_J(Y_m). \qquad (39.11)$
Note that $V_J(Y) = V_J(Y_m)$. Moreover, the Sobol indices of $Y$ with respect to the input variables $X = (X_i)_{i=1,\ldots,d}$ can be derived from their usual definition given in Paper 5 (see Chap. 35, Variance-Based Sensitivity Analysis: Theory and Estimation Algorithms):

$S_J = \dfrac{V_J(Y_m)}{\mathrm{Var}(Y)}. \qquad (39.13)$

These Sobol indices can be computed using the same classical Monte Carlo techniques: these algorithms are applied to the metamodel defined by the mean component $Y_m$ of the joint model.
Thus, all the terms contained in $\mathrm{Var}[Y_m(X)]$ in Eq. (39.9) have been considered. Then, $E[Y_d(X)]$ can be estimated by a simple numerical integration of $Y_d(X)$ following the distribution of $X$, where $Y_d(X)$ is evaluated with a metamodel, for example, the dispersion component of the joint model. Therefore, the total sensitivity index of $X_\varepsilon$ is given by:

$S_\varepsilon^{tot} = \dfrac{V_\varepsilon(Y) + \sum_{i=1}^{d} \sum_{|J|=i} V_{J\varepsilon}(Y)}{\mathrm{Var}(Y)} = \dfrac{E[Y_d(X)]}{\mathrm{Var}(Y)}. \qquad (39.14)$
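On a toy heteroscedastic model (an assumption chosen for illustration, with exact mean and dispersion components available in closed form), the total variance formula and the total index of Eq. (39.14) can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
X = rng.uniform(-1, 1, size=N)              # scalar input
eps = rng.normal(size=N)                    # stands in for the seed X_eps

Y = np.sin(np.pi * X) + (0.5 + 0.4 * X) * eps   # toy model (assumed)

Ym = np.sin(np.pi * X)          # mean component E[Y | X]
Yd = (0.5 + 0.4 * X) ** 2       # dispersion component Var[Y | X]

var_Y = Y.var()
# total variance formula: Var(Y) = Var(Ym) + E(Yd)
check = Ym.var() + Yd.mean()
# total index of the seed: S_eps_tot = E[Yd(X)] / Var(Y)
S_eps_tot = Yd.mean() / var_Y
```

Here `check` and `var_Y` agree up to Monte Carlo error, and the seed (i.e., the functional-input randomness) carries a total index of about 0.38 for this toy model.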
Several types of distances can be defined, but the important point is that the distance
should be chosen such that it can be computed rapidly and helps predict whether two
parameter fields will lead to similar or different model outputs Y . For example, [45]
propose to use the Hausdorff distance to quantify the differences in the geometry
of complex 3D models (having different fault systems, horizon geometries, etc.).
In their application of oil recovery from a production well, [40] define the square
distance between two parameter fields as the integrated square difference between
the responses computed for the two parameter fields with a fast streamline solver.
The distance between every pair of parameter fields is computed and used as a basis
for mapping all the parameter fields in an abstract metric space. Going a step further,
[4] use the same framework to formulate the inverse problem. In a similar way,
several authors recently proposed to handle functional inputs in a Gaussian process
metamodel using covariance functions adapted to the study of functional variables
[12, 18], such as a covariance function depending on functional distances as those
previously mentioned. Still in the framework of Gaussian process metamodels, [33] proposes, for time-varying inputs and outputs, a correlation function based on a weighted distance between input functions, reflecting the belief that, at any time, the output is most sensitive to recent values of the input function.
Whatever the kind of distance chosen to take into account the functional inputs in
the distance-based metamodel, the latter, once built, is then directly used to compute
the VB-GSA indices with classical intensive Monte Carlo methods.
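A minimal sketch of a distance-based covariance for functional inputs, here using the integrated squared difference between curves; the curves, the bandwidth, and the squared-exponential kernel form are assumptions made for illustration:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)
dt = t[1] - t[0]

# three hypothetical functional inputs on a common time grid
curves = [np.sin(2 * np.pi * f * t) for f in (1.0, 1.1, 3.0)]

def l2_dist(w1, w2):
    """L2 distance between two curves, approximated on the grid."""
    return np.sqrt(np.sum((w1 - w2) ** 2) * dt)

def k(w1, w2, length=0.5):
    """Distance-based covariance (squared-exponential form, assumed)."""
    return np.exp(-(l2_dist(w1, w2) / length) ** 2)

# pairwise distance matrix: the abstract metric space of parameter fields
D = np.array([[l2_dist(a, b) for b in curves] for a in curves])
```

Curves with close frequencies end up close in the metric space, so a Gaussian process with this covariance predicts similar outputs for them.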
We discuss here the pros and cons of the various methods presented in the previous
section. All these methods intend to compute variance-based sensitivity indices that
could measure the influence of an uncertain spatiotemporal input fW .z/ W z 2 Dz g
on the variance of model output Y . However, they do not produce exactly the same
information.
the set of sensitivity indices $(S_{W_j}, S_{W_j}^{tot})_{j \in \{1,\ldots,m\}}$ does not yield a good measure of the main and total contributions of $W_z$ to the variance of the model output $Y$. One must also note that separate sensitivity indices $S_{W_j}$, $S_{W_j}^{tot}$ can be computed for each parameter $W_j$, $j \in \{1, \ldots, m\}$, if and only if those parameters $W_j$ are assumed to be independent. Otherwise, only group sensitivity indices should be computed, because the usual variance-based sensitivity indices are not defined for dependent variables. Besides, the dimension reduction with grouping method only yields an approximation of the sensitivity indices of $W_z$, as the total information initially contained in $W_z$ is reduced to a small set of scalar parameters.
It should also be noted that in the joint metamodeling method, only the total-order sensitivity index $S_W^{tot}$ can be estimated; the first-order index $S_W$ remains unknown. Results of a numerical study [36] also clearly suggest that the trigger method does not produce the same information as the other methods. The sensitivity index of the trigger parameter proved to be always smaller than the measure $S_W$ given by the macro-parameter or map labeling methods. One possible explanation is that, in the trigger method, uncertainty in $W_z$ is taken into account only when the sampled value of the trigger input is equal to $\delta = 1$, that is, in one out of every two model runs on average ($P(\delta = 0) = P(\delta = 1) = 1/2$). Hence, the effect of the uncertain model input $W_z$ on the variance of the model output $Y$ is systematically underestimated. This result leads us to believe that the trigger method is not appropriate to deal with spatially distributed inputs in variance-based global sensitivity analysis.
Table 39.1 Methods for VB-GSA indices with spatiotemporal inputs. Column (a): number of map realizations needed. Column (b): coupling with a metamodel is possible. Column (c): possibility to cope with several spatial inputs. Column (d): possibility to model spatiotemporal autocorrelation in $\{W(z) : z \in D_z\}$. Column (e): sensitivity indices associated with $W_z$

| Method | (a) | (b) | (c) | (d) | (e) | Comments |
|---|---|---|---|---|---|---|
| Macro-parameter | $N$ | No | Yes | No | Indices of the group of inputs $(W_j)_{j \in \{1,\ldots,m\}}$ | Possibly intractable for large data |
| Dimension reduction | $N$ | No | Yes | No | Indices of the group of inputs $(W_j)_{j \in \{1,\ldots,m\}}$ | Simplification of the input data |
| Trigger | $N/2$ | No | Yes | Yes | Indices of the trigger input | A measure of sensitivity is obtained, which is not equal to the variance-based sensitivity indices of $W(z)$ |
| Map labeling | $n_L < N$ | No | Yes | Yes | Indices of the label input $L$ | Spatially distributed input $W(z)$ is under-sampled compared to other inputs |
| Joint GAM metamodel | $N$ | Yes | No | Yes | Total-order index $S_{W_z}^{tot}$ only | Use of a metamodel |
Saint-Geours et al. [38] applied VB-GSA to the NOE model. Here we briefly
summarize their main results, considering a single scalar output Y , which is the
total expected annual damage [€] over the entire Orb Delta.
3.4.3 Results
Figure 39.3 displays the total-order sensitivity indices $S_i^{tot}$ computed for each uncertain model input with respect to the aggregated model output $Y = \int_{z \in \Omega} Y(z)\,dz$ over the entire Orb floodplain. The nonspatially distributed inputs (flood return intervals
X1 and damage functions X2 ) prove to be the most important sources of uncertainty
when aggregating the EAD risk indicator over the total floodplain. On the contrary,
the contribution of the two spatially distributed inputs, that is, the hazard map $W_{z,1}$
and the land use map $W_{z,2}$, is small.
$f : \mathbb{R}^d \to \mathcal{F}, \quad X \mapsto Y(z) = f(X; z) \qquad (39.15)$

for any $z \in D_z$. The inputs $X$ are modeled by random variables and characterized by their probability density function. We also consider a discretization of $D_z$ with $p$ elements:

$D_z \xrightarrow{\text{discretization}} Z = (z^{(1)}, \ldots, z^{(p)}) \in D_z^p,$

so that the model can be viewed as a multivariate one:

$\mathbf{f} : \mathbb{R}^d \to \mathbb{R}^p, \quad X \mapsto \mathbf{Y} = \mathbf{f}(X) = \left( f(X; z^{(1)}), \ldots, f(X; z^{(p)}) \right)^T. \qquad (39.16)$
The vector output can be decomposed with respect to each input $X_i$ as

$\mathbf{Y} = \mathbf{f}_0 + \mathbf{f}_i + \mathbf{f}_{\sim i} + \mathbf{f}_{i,\sim i}$

where $\mathbf{f}_0 = E[\mathbf{Y}]$, $\mathbf{f}_i = E[\mathbf{Y} \mid X_i] - \mathbf{f}_0$, $\mathbf{f}_{\sim i} = E[\mathbf{Y} \mid X_{\sim i}] - \mathbf{f}_0$, and $\mathbf{f}_{i,\sim i} = \mathbf{Y} - \mathbf{f}_i - \mathbf{f}_{\sim i} - \mathbf{f}_0$, with $X_{\sim i} = (X_j)_{1 \le j \le d,\, j \ne i}$. The covariance matrix of both sides of this equality is then computed, and the trace operator is applied in order to turn the matrix format into a scalar one:

$S_i = \dfrac{\mathrm{trace}(C_i)}{\mathrm{trace}(\Sigma)} = \dfrac{\sum_{l=1}^{p} V_{Y_l} S_i^{(l)}}{\sum_{l=1}^{p} V_{Y_l}} \qquad (39.17)$

where $\Sigma$ is the covariance matrix of $\mathbf{Y}$, $C_i$ the covariance matrix of $\mathbf{f}_i$, $V_{Y_l}$ the variance associated with the $l$th scalar output, and $S_i^{(l)}$ the first-order Sobol index associated with the $l$th output for the $i$th input parameter.
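The right-hand side of Eq. (39.17) is simply a variance-weighted average of the per-output first-order indices, as the small numerical check below shows (the variances and indices are illustrative values, not taken from the chapter):

```python
import numpy as np

V = np.array([4.0, 1.0, 0.5])    # variances V_{Y_l} of the p scalar outputs
S = np.array([0.6, 0.2, 0.1])    # per-output first-order indices S_i^(l)

# aggregated index of Eq. (39.17): trace(C_i) / trace(Sigma)
S_agg = (V * S).sum() / V.sum()
```

Outputs with large variance dominate the weighting, so the aggregated ranking is driven by the most variable sites of the map.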
Gamboa et al. [16] estimate this index using the pick-and-freeze estimator

$S_{i,N} = \dfrac{\sum_{l=1}^{p} \left[ \frac{1}{N} \sum_{j=1}^{N} Y_l^{(j)} Y_l^{i,(j)} - \left( \frac{1}{N} \sum_{j=1}^{N} \frac{Y_l^{(j)} + Y_l^{i,(j)}}{2} \right)^{2} \right]}{\sum_{l=1}^{p} \left[ \frac{1}{N} \sum_{j=1}^{N} \frac{\big(Y_l^{(j)}\big)^2 + \big(Y_l^{i,(j)}\big)^2}{2} - \left( \frac{1}{N} \sum_{j=1}^{N} \frac{Y_l^{(j)} + Y_l^{i,(j)}}{2} \right)^{2} \right]} \qquad (39.18)$

where $Y_l^{(j)}$ and $Y_l^{i,(j)}$ denote the $l$th component of the $j$th evaluations of the pick-and-freeze pair of output samples.
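A direct implementation of the pick-and-freeze estimator of Eq. (39.18) can be sketched as follows; the two-output linear toy model used to exercise it is an assumption:

```python
import numpy as np

def aggregated_sobol_pf(Y, Yi):
    """Aggregated first-order pick-and-freeze estimator of Eq. (39.18).
    Y, Yi: (N, p) arrays of vector outputs for the base sample and the
    sample where every input except X_i has been resampled."""
    mu = ((Y + Yi) / 2).mean(axis=0)                        # pooled mean per output
    num = (Y * Yi).mean(axis=0) - mu ** 2                   # per-output covariances
    den = ((Y ** 2 + Yi ** 2) / 2).mean(axis=0) - mu ** 2   # pooled variances
    return num.sum() / den.sum()

rng = np.random.default_rng(3)
N = 100_000
X1, X2, X2p = rng.normal(size=(3, N))

# two scalar outputs: Y_1 depends only on X1, Y_2 only on X2
Y  = np.column_stack([X1, X2])
Y1 = np.column_stack([X1, X2p])     # X1 frozen, X2 resampled

S1 = aggregated_sobol_pf(Y, Y1)     # exact aggregated index is 0.5 here
```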
$\mathbf{Y} = \boldsymbol{\mu} + V H$

where $\boldsymbol{\mu} = E[\mathbf{Y}]$, $V = (v_1, \ldots, v_p)$ is the matrix of eigenvectors of the covariance matrix of $\mathbf{Y}$, $\lambda_k$ the associated eigenvalues, and $H = (H_1, \ldots, H_p)^T$ the vector of principal components.
Then, for any nonempty subset $w \subseteq \{1, \ldots, d\}$, the corresponding sensitivity index for the $k$th principal component $v_k$ is defined by

$S_{w,k} = \dfrac{V_{w,k}}{\lambda_k}$

where $V_{w,k} = V[E(H_k \mid X_w)] - \sum_{u \subsetneq w} V_{u,k}$ and $V_{\emptyset,k} = 0$, with $X_w = (X_i)_{i \in w}$. In particular, for any $i \in \{1, \ldots, d\}$, the first-order and total variance-based sensitivity indices of $v_k$ are

$S_{i,k} = \dfrac{V[E(H_k \mid X_i)]}{\lambda_k} \quad \text{and} \quad S_{i,k}^{tot} = \sum_{w \subseteq \{1,\ldots,d\},\, w \ni i} S_{w,k}.$
This first-order Sobol index $S_i$ has an expression similar to that of [16] presented in Eq. (39.17); moreover, it can easily be proven that these terms are equal. For the estimation step, we consider an $N$-sample $(X^{(i)}, \mathbf{Y}^{(i)})_{1 \le i \le N}$ associated with a particular Sobol index computation method and replace the covariance matrix by the empirical one

$\hat{\Sigma} = \dfrac{1}{N} \sum_{i=1}^{N} \left( \mathbf{Y}^{(i)} - \bar{\mathbf{Y}} \right) \left( \mathbf{Y}^{(i)} - \bar{\mathbf{Y}} \right)^T \qquad (39.20)$

where $\bar{\mathbf{Y}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{Y}^{(i)}$. Similarly, we denote by $\hat{\lambda}_k$ and $\hat{v}_k$ the $k$th eigenvalue and eigenvector of $\hat{\Sigma}$, respectively, and we consider the score matrix $\hat{H} = (\hat{H}_{ik})_{1 \le i \le N,\, 1 \le k \le p}$ obtained by projecting the centered outputs onto the eigenvectors. These aggregated sensitivity indices are global measures that summarize the local model behavior over the spatiotemporal domain $D_z$. A unique ranking of the model inputs can be derived from these aggregated sensitivity indices.
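The estimation step (empirical covariance of Eq. (39.20), eigendecomposition, and score matrix) can be sketched numerically; the synthetic output sample below is an assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 500, 40
A = rng.normal(size=(p, p))
Y = rng.normal(size=(N, p)) @ A          # hypothetical (N x p) vector outputs

Ybar = Y.mean(axis=0)
Yc = Y - Ybar
Sigma_hat = Yc.T @ Yc / N                # empirical covariance, Eq. (39.20)

lam, V = np.linalg.eigh(Sigma_hat)       # eigendecomposition (ascending order)
lam, V = lam[::-1], V[:, ::-1]           # hat-lambda_k, hat-v_k, decreasing order

H = Yc @ V                               # score matrix, one column per component
```

The sample variance of the $k$th column of `H` equals $\hat{\lambda}_k$, so the per-component indices $S_{i,k}$ can then be estimated by conditioning the columns of the score matrix on the inputs.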
Remark 1. The previous sensitivity index maps and block sensitivity indices often require thousands of model evaluations for their estimation. Recently, Da Veiga [8] introduced the use of dependence measures in global sensitivity analysis, and De Lozzo and Marrel [9] extended this approach to the specific case of screening. One of these new indices is based on the Hilbert-Schmidt independence criterion (HSIC), which is defined, for any pair of measurable random variables $(X, Y)$, as the Hilbert-Schmidt norm of the cross-covariance operator between some function transformations of $X$ and $Y$ in reproducing kernel Hilbert spaces [19]. Da Veiga [8] noted that SA with HSIC-based sensitivity measures requires fewer model evaluations than variance-based ones. Moreover, kernels dedicated to functional data make it easy to deal with spatiotemporal inputs and outputs. These recent tools remain to be confirmed on industrial applications but already represent an alternative for sensitivity analysis with spatial outputs.
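A minimal sketch of the biased V-statistic estimator of HSIC with Gaussian kernels follows; the kernel bandwidths and the test data are assumptions, and a kernel dedicated to functional data would replace the scalar Gaussian kernels for spatiotemporal variables:

```python
import numpy as np

def hsic(x, y, sx=1.0, sy=1.0):
    """Biased V-statistic estimator of HSIC between two scalar samples,
    with Gaussian kernels of bandwidths sx and sy (assumed choices)."""
    n = len(x)
    K = np.exp(-np.subtract.outer(x, x) ** 2 / (2 * sx ** 2))
    L = np.exp(-np.subtract.outer(y, y) ** 2 / (2 * sy ** 2))
    Hc = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ Hc @ L @ Hc) / (n - 1) ** 2

rng = np.random.default_rng(5)
x = rng.normal(size=300)
y_dep = x + 0.1 * rng.normal(size=300)   # strongly dependent on x
y_ind = rng.normal(size=300)             # independent of x
```

Evaluated on these samples, `hsic(x, y_dep)` is much larger than `hsic(x, y_ind)`, which is how the criterion separates influential from non-influential inputs in a screening step.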
All the methods discussed above (sensitivity index maps, block sensitivity indices,
aggregated sensitivity indices) are based on intensive Monte Carlo methods and
require many simulations to be estimated. Some spatiotemporal models can be too time-consuming to be used directly in such variance-based methods. To avoid prohibitive computation times, it can be useful to replace the complex computer code by a metamodel, as introduced in the first section. The
problem of building a metamodel for a functional output has recently been addressed
by Shi et al. [41] and Bayarri et al. [1] who proposed an approach based upon
functional decomposition, such as wavelets. Recently, Marrel et al. [28] proposed
a complete methodology to perform sensitivity analysis of two-dimensional output:
their approach consists in building a functional metamodel and computing variance-
based sensitivity indices at each location of the spatial output map. The maps of
Sobol indices thus obtained yield the identification of global and local influence
of each input variable and the detection of areas with interactions. The functional
metamodel is based on a decomposition on a basis of orthogonal functions
followed by a metamodeling of the coefficients related to the main decomposition
components. This method can be applied with any kind of orthogonal functions such
as wavelets, Fourier functions, principal component analysis, polynomial chaos,
etc. For example, in Marrel et al. [28], the authors use a wavelet decomposition
and Gaussian process metamodels. As this decomposition seems to require a large number of metamodels to be built, Marrel et al. [30] then proposed to apply this method using a proper orthogonal decomposition [6] instead of the wavelet decomposition, in order to reduce the number of metamodels.
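The functional metamodeling strategy (orthogonal decomposition of the output, then one metamodel per retained coefficient) can be sketched with a POD basis and quadratic least-squares regressions standing in for the Gaussian process metamodels; the toy model and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 300, 100
X = rng.uniform(-1, 1, size=(N, 2))          # learning sample of inputs
z = np.linspace(0.0, 1.0, p)                 # discretized output domain
# toy functional output (assumed): two spatial patterns driven by the inputs
Y = np.outer(X[:, 0], z) + np.outer(X[:, 1] ** 2, z ** 2)

# 1) orthogonal decomposition of the centered outputs (POD via SVD)
Ybar = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Ybar, full_matrices=False)
K = 2                                        # retained components
coeffs = (Y - Ybar) @ Vt[:K].T               # coefficients to be metamodeled

# 2) one metamodel per coefficient (quadratic least squares here)
def design(X):
    return np.column_stack([np.ones(len(X)),
                            X[:, 0], X[:, 0] ** 2,
                            X[:, 1], X[:, 1] ** 2])
betas = [np.linalg.lstsq(design(X), coeffs[:, k], rcond=None)[0]
         for k in range(K)]

# 3) predict the full output curve for a new input
x_new = np.array([[0.3, -0.5]])
c_hat = np.array([(design(x_new) @ b).item() for b in betas])
y_hat = Ybar + c_hat @ Vt[:K]
```

Sensitivity indices can then be computed cheaply by applying Monte Carlo estimators to the predicted curves instead of the original simulator.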
Fig. 39.4 Maps $S_{\{i\}}$ of the site sensitivity indices for each model input
$Y(z) = f_{loc}(X, W(z))$. Under the stationarity hypothesis on $W_z$, the site sensitivity indices $S_{\{i\}}^{tot}(z)$ do not depend on the site $z \in D_z$ and can simply be denoted by $S_{\{i\}}^{tot}$. For any support $\Omega \subseteq D_z$, the block sensitivity indices of $X$ and $W_z$ over $\Omega$ verify [37, for a detailed proof]:

$\dfrac{S_{W_z}^{tot}(\Omega)}{S_X^{tot}(\Omega)} \simeq \dfrac{c}{|\Omega|} \qquad (39.22)$

where $c$ depends on the covariance function $C(\cdot)$. Equation (39.22) shows that the ratio $c/|\Omega|$ determines the relative contribution of the model inputs $W_z$ and $X$ to the output variance $\mathrm{var}(Y)$. For a small ratio (i.e., when the area $|\Omega|$ of the support $\Omega$ is large compared with the critical size $c$), the variability of the spatial input $W_z$ is mainly local, and the spatial correlation of $W_z$ over $\Omega$ is weak. This local variability averages out over the support $\Omega$ when the aggregated model output $Y$ is computed; hence, the spatial input $W_z$ explains a small fraction of the output variance $\mathrm{var}(Y)$. This change of support effect is of great importance to better understand how the result of a sensitivity analysis may change under model upscaling or downscaling.
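The averaging effect behind Eq. (39.22) can be reproduced with a toy one-dimensional correlated field; the moving-average covariance and all sizes below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
corr = 5          # correlation range of the toy field (moving-average width)
n_rep = 2000      # independent field realizations

def block_mean(m):
    """Mean of a correlated 1D field over a support of m points."""
    w = rng.normal(size=(n_rep, m + corr - 1))
    kern = np.ones(corr) / corr
    field = np.array([np.convolve(row, kern, "valid") for row in w])
    return field.mean(axis=1)

var_small = block_mean(10).var()     # support comparable to the correlation range
var_large = block_mean(1000).var()   # support much larger than the correlation range

ratio = var_large / var_small        # shrinks roughly like 1/|Omega|
```

Once the support is much larger than the correlation range, the variance of the block average, and hence the share of output variance attributable to the field in an additive model, collapses, which is the change of support effect described above.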
The model output of the Ceres-Mithra test case is a discretized spatial map of the time-integrated surface activity concentration of cesium 137 (Bq.s.m$^{-3}$) at the moments $t_0 + 20'$, $t_0 + 40'$, ..., $t_0 + 2$ h, where $t_0$ is the beginning of the releases. This output is a function of six uncertain inputs: the heights of release (HR1 and HR2), the deposition velocity (DV), the source term activity (STA), and the wind direction and speed (WD and WS). The associated two-dimensional spatial grid is made up of $60 \times 45 = 2700$ points and is the same for all the Ceres-Mithra simulations. In the following, a global sensitivity analysis is performed on the Ceres-Mithra test case, using first the sensitivity index maps described on page 1344 to get some insight into the local model behavior and then the aggregated sensitivity indices defined on page 1345 to get a global ranking of the model inputs. Since these sensitivity measures require a high number of evaluations, a surrogate model is necessary because of the important CPU time cost of a Ceres-Mithra run.
Fig. 39.6 Temporal evolution of the site first-order Sobol indices for the $^{137}$Cs integrated surface activity
Fig. 39.7 Temporal evolution of the site total Sobol indices for the $^{137}$Cs integrated surface activity
From a global point of view, the wind direction is the most influential variable for the surface activity forecast: it explains more than half of the activity variance. The source term activity is the second most influential variable, with on average about 15% of explained variance. The uncertainty on the wind speed is less influential, while the deposition velocity and the heights of release do not affect the forecast. The sum of the primary effects (i.e., without considering any interactions between variables) explains between 80% and 100% of the variance. Thus, the interactions account for less than 10% of the variance, and the only significant interaction is the one between the wind variables. The estimation of the second-order indices shows that this interaction is located at the limit of the plume and increases slightly over time, while the wind direction index decreases; see Fig. 39.8.
Fig. 39.8 Temporal evolution of the site second-order Sobol indices between the wind components, for the $^{137}$Cs integrated surface activity
Fig. 39.9 Temporal evolution of the normalized block total Sobol indices for the $^{137}$Cs integrated surface activity
Figure 39.9 displays these sensitivity indices after a normalization step, so that they sum to one. Clearly, the wind direction is the predominant input, explaining between 66% and 76% of the global output variance, depending on the considered moment. Then, the source term activity represents between 12% and 19% of this variability. The wind speed uncertainty is less influential, with a contribution between 6% and 15%.
variability, which can be helpful from a physical point of view, to better understand
the local model behavior. Consequently, it is recommended to perform both types
of sensitivity analyses in parallel, in order to deal with a spatial model output.
5 Conclusions
References
1. Bayarri, M., Berger, J., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R.,
Paulo, R., Sacks, J., Walsh, D.: Computer model validation with functional output. Ann. Stat.
35, 1874–1906 (2007)
2. Bonin, O.: Sensitivity analysis and uncertainty analysis for vector geographical applications.
In: 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and
Environmental Sciences, Lisbon, 5–7 July 2006
3. Caers, J.: Modeling Uncertainty in the Earth Sciences. Wiley-Blackwell, Hoboken (2011)
4. Caers, J., Park, K., Scheidt, C.: Modeling uncertainty of complex earth systems in metric space.
In: Handbook of Geomathematics, pp. 865–889. Springer, Berlin/New York (2010)
5. Campbell, K., McKay, M.D., Williams, B.J.: Sensitivity analysis when model outputs are
functions. Reliab. Eng. Syst. Saf. 91(10), 1468–1472 (2006)
6. Chatterjee, A.: An introduction to the proper orthogonal decomposition. Curr. Sci. 78(7),
808–817 (2000)
1356 A. Marrel et al.
7. Crosetto, M., Tarantola, S.: Uncertainty and sensitivity analysis: tools for GIS-based model
implementation. Int. J. Geogr. Inf. Sci. 15, 415–437 (2001)
8. Da Veiga, S.: Global sensitivity analysis with dependence measures. J. Stat. Comput. Simul.
85(7), 1283–1305 (2014)
9. De Lozzo, M., Marrel, A.: New improvements in the use of dependence measures for sensitivity
analysis and screening. J. Stat. Comput. Simul. (2015, submitted). https://hal.archives-ouvertes.fr/hal-01090475
10. Doury, A.: Pratiques françaises en matière de prévision quantitative de la pollution atmo-
sphérique potentielle liée aux activités nucléaires. In: Seminar on Radioactive Releases and Their
Dispersion in the Atmosphere Following a Hypothetical Reactor Accident, vol. I, pp. 403–448
(1980)
11. Erdlenbruch, K., Gilbert, E., Grelot, F., Lescouliers, C.: Une analyse coût-bénéfice spatialisée
de la protection contre les inondations : application de la méthode des dommages évités à la
basse vallée de l'Orb. Ingénieries Eau-Agriculture-Territoires 53, 3–20 (2008)
12. Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis. Springer Series in Statistics.
Springer, New York (2006)
13. Fisher, P.F.: Modelling soil map-unit inclusions by Monte Carlo simulation. Int. J. Geogr. Inf.
Sci. 5(2), 193–208 (1991)
14. Fort, J.C., Klein, T., Lagnoux, A., Laurent, B.: Estimation of the Sobol indices in a linear
functional multidimensional model. J. Stat. Plan. Inference 143, 1590–1605 (2013)
15. Francos, A., Elorza, F.J., Bouraoui, F., Bidoglio, G., Galbiati, L.: Sensitivity analysis of dis-
tributed environmental simulation models: understanding the model behaviour in hydrological
studies at the catchment scale. Reliab. Eng. Syst. Saf. 79(2), 205–218 (2003)
16. Gamboa, F., Janon, A., Klein, T., Lagnoux, A., et al.: Sensitivity analysis for
multidimensional and functional outputs. Electron. J. Stat. 8, 575–603 (2014)
17. Gijbels, I., Prosdocimi, I., Claeskens, G.: Nonparametric estimation of mean and dispersion
functions in extended generalized linear models. Test 19(3), 580–608 (2010)
18. Ginsbourger, D., Rosspopoff, B., Pirot, G., Durrande, N., Renard, P.: Distance-based kriging
relying on proxy simulations for inverse conditioning. Adv. Water Resour. 52, 275–291
(2013)
19. Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.: Measuring statistical dependence with
Hilbert-Schmidt norms. In: Algorithmic Learning Theory, vol. 3734, pp. 63–77. Springer,
Berlin/Heidelberg (2005)
20. Hengl, T.: Finding the right pixel size. Comput. Geosci. 32, 1283–1298 (2006)
21. Heuvelink, G.B.M., Brus, D.J., Reinds, G.: Accounting for spatial sampling effects in regional
uncertainty propagation analysis. In: Proceedings of the Ninth International Symposium on
Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Leicester,
pp. 85–88, 20–23 July 2010
22. Heuvelink, G.B.M., Burgers, S.L.G.E., Tiktak, A., Van den Berg, F.: Uncertainty and stochas-
tic sensitivity analysis of the GeoPEARL pesticide leaching model. Geoderma 155(3–4),
186–192 (2010)
23. Iooss, B.: Treatment of spatially dependent input variables in sensitivity analysis of model
output methods. Technical report, European project PAMINA (Performance assessment
methodologies in application to guide the development of the safety case) (2008)
24. Iooss, B., Ribatet, M.: Global sensitivity analysis of computer models with functional inputs.
Reliab. Eng. Syst. Saf. 94, 1194–1204 (2009)
25. Jacques, J., Lavergne, C., Devictor, N.: Sensitivity analysis in presence of model uncertainty
and correlated inputs. Reliab. Eng. Syst. Saf. 91(10–11), 1126–1134 (2006)
26. Lamboni, M., Monod, H., Makowski, D.: Multivariate sensitivity analysis to measure global
contribution of input factors in dynamic models. Reliab. Eng. Syst. Saf. 96(4), 450–459 (2011)
27. Lilburne, L., Tarantola, S.: Sensitivity analysis of spatial models. Int. J. Geogr. Inf. Sci. 23(2),
151–168 (2009)
28. Marrel, A., Iooss, B., Jullien, M., Laurent, B., Volkova, E.: Global sensitivity analysis for
models with spatially dependent outputs. Environmetrics 22(3), 383–397 (2011)
39 Sensitivity Analysis of Spatial and/or Temporal Phenomena 1357
29. Marrel, A., Iooss, B., Da Veiga, S., Ribatet, M.: Global sensitivity analysis of stochastic
computer models with joint metamodels. Stat. Comput. 22(3), 833–847 (2012)
30. Marrel, A., Perot, N., Mottet, C.: Development of a surrogate model and sensitivity analysis
for spatio-temporal numerical simulators. Stoch. Environ. Res. Risk Assess. 29(3), 959–974
(2015)
31. McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman and
Hall, London (1989)
32. Monfort, M., Patryl, L., Armand, P.: Presentation of the CERES platform used to evaluate the
consequences of the emissions of radionuclides in the environment. HARMO 13, 14 (2010)
33. Morris, M.D.: Gaussian surrogates for computer models with time-varying inputs and outputs.
Technometrics 54(1), 42–50 (2012)
34. Romary, T.: Heterogeneous media stochastic model inversion. PhD thesis, Université Pierre et
Marie Curie (Paris VI) (2008)
35. Ruffo, P., Bazzana, L., Consonni, A., Corradi, A., Saltelli, A., Tarantola, S.: Hydrocarbon
exploration risk evaluation through uncertainty and sensitivity analyses techniques. Reliab.
Eng. Syst. Saf. 91(10–11), 1155–1162 (2006)
36. Saint-Geours, N.: Sensitivity analysis of spatial models: application to cost-benefit analysis of
flood risk management plans. PhD thesis, Université Montpellier 2. http://tel.archives-ouvertes.fr/tel-00761032 (2012)
37. Saint-Geours, N., Lavergne, C., Bailly, J.S., Grelot, F.: Change of support in variance-based
spatial sensitivity analysis. Math. Geosci. 44(8), 945–958 (2012)
38. Saint-Geours, N., Bailly, J.S., Grelot, F., Lavergne, C.: Multi-scale spatial sensitivity analysis
of a flood damage assessment model. Environ. Model. Softw. 60, 153–166 (2014). http://dx.doi.org/10.1016/j.envsoft.2014.06.012
39. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S.: Variance based
sensitivity analysis of model output. Design and estimator for the total sensitivity index.
Comput. Phys. Commun. 181(2), 259–270 (2010)
40. Scheidt, C., Caers, J.: Representing spatial uncertainty using distances and kernels. Math.
Geosci. 41(4), 397–419 (2009)
41. Shi, J.Q., Wang, B., Murray-Smith, R., Titterington, D.M.: Gaussian process functional
regression modeling for batch data. Biometrics 63(3), 714–723 (2007)
42. SMVOL: Programme d'actions de prévention des inondations sur les bassins de l'Orb et du
Libron (34) pour les années 2011–2015. Note de présentation, Syndicat Mixte des Vallées de
l'Orb et du Libron (2011)
43. Smyth, G.K.: Generalized linear models with varying dispersion. J. R. Stat. Soc. Ser. B
(Methodol.) 51, 47–60 (1989)
44. Suzuki, S., Caers, J.: A distance-based prior model parameterization for constraining solutions
of spatial inverse problems. Math. Geosci. 40(4), 445–469 (2008)
45. Suzuki, S., Caumon, G., Caers, J.: Dynamic data integration for structural modeling: model
screening approach using a distance-based model parameterization. Comput. Geosci. 12(1),
105–119 (2008)
46. Volkova, E., Iooss, B., Van Dorpe, F.: Global sensitivity analysis for a numerical model
of radionuclide migration from the RRC Kurchatov Institute radwaste disposal site. Stoch.
Environ. Res. Risk Assess. 22(1), 17–31 (2008)
47. Wechsler, S.: Uncertainties associated with digital elevation models for hydrologic applica-
tions: a review. Hydrol. Earth Syst. Sci. 11(4), 1481–1500 (2007)
48. Zabalza, I., Dejean, J.P., Collombier, D.: Prediction and density estimation of a horizontal
well productivity index using generalized linear models. In: 6th European Conference on the
Mathematics of Oil Recovery, Peebles, Scotland (1998)
Part V
Risk
Decision Analytic and Bayesian
Uncertainty Quantification for Decision 40
Support
D. Warner North
Abstract
This essay introduces probability in support of decision-making, as a state
of mind and not of things. Probability provides a framework for reasoning
coherently about uncertainty. According to Cox's theorem, it is the only way
to reason coherently about uncertainty. Probability summarizes states of infor-
mation. A basic desideratum is that states of information judged equivalent
should lead to the same probability distributions. Some widely used probabilistic
models have both sufficient statistics and maximum entropy characterizations.
In practice, outside of certain physics and engineering applications, human
judgment is usually needed to quantify uncertainty as probabilities. Reality is
highly complex, but one can test judgments against what is known, drawing upon
many specialized areas of human knowledge. Sensitivity analysis can evaluate
the importance of multiple uncertainties in a decision context. When the choice
between alternatives is close, what is the value of further information (VOI)?
Important concepts discussed in this chapter are the expected value of perfect
information and the expected value of further experimental testing.
Practical application is illustrated in two case studies. The first involves
assessing the probability for replication of a terrestrial microbe on Mars, in the
decision context of a constraint on planetary exploration. The second involves
weather modification of hurricanes/typhoons: whether to deploy this technology
to reduce damage from hurricanes impacting US coastal areas and valuing
experimental testing offshore. The decision context is that of a natural disaster
turned political by deployment of new technology where such deployment might
be followed by a reduction, or by an increase, in adverse impacts, compared to
not deploying the technology.
Decision support should be viewed as an iterative process of working with
those with decision responsibility and with experts who are the best available
Keywords
Bayes' theorem · Decision analysis · Value of information
Contents
1 A Viewpoint on Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1362
1.1 Many Problems with Using Probability Theory Remain . . . . . . . . . . . . . . . . . . . . . 1367
1.2 Bayes' Theorem: A Consistency Condition Among Probabilities . . . . . . 1372
1.3 A Basic Desideratum: Two States of Information Perceived to be
Equivalent Should Lead to the Same Probability Assignments . . . . . . . . . . . . . . . . 1373
1.4 Decision Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1377
2 Two Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1379
2.1 Case Study #1: Terrestrial Microbes on Mars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1379
2.2 Case Study #2: Decision Analysis of Hurricane Modification . . . . . . . . . . . . . . . . 1384
3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1395
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1396
Normative context and the axiomatic basis for probability. Modern decision
theory was developed by Abraham Wald [61] and Leonard J. Savage [48] in the mid-
twentieth century, building on foundations laid by Ramsey [47] and von Neumann
and Morgenstern [59]. Decision analysis, as the application of decision theory to
complex practical problems, came about in the 1960s largely through the research
and writings of Ward Edwards, Ronald Howard, and Howard Raiffa and their
colleagues [2, 3, 15–17, 32, 44–46, 60].
Many people trained in decision analysis view its theoretical basis as the
axiomatization by Leonard J. Savage in his 1954 book, The Foundations of Statistics
40 Decision Analytic and Bayesian Uncertainty Quantification for Decision Support 1363
quantitative reasoning about uncertainty. Jaynes' posthumous text [21] carries the
title Probability Theory: The Logic of Science. Jaynes' interpretation has other
enthusiastic fans, such as Nassim Nicholas Taleb, author of The Black Swan ([52],
see p. 319, notes on chapter 9), Fooled by Randomness ([53], see p. 289, notes
on chapter 13), and Antifragile [54]. Many practitioners of risk and decision
analysis are not familiar with the Jaynes and Jeffreys interpretation, sometimes
called the necessarist school of probability. A central idea from Jaynes and his
predecessors is that the assessment of a probability reflects a state of information.
All probability assessments should be regarded as conditional on this state of
information. The state of information is not a physically measurable entity but
a mental construct: what an individual has as knowledge, which may include a
great deal of observations plus models and mental constructs used to organize
this information. While in many applications information is shared, in a specific
instance, different individuals may have different information about the same
physical event.
As a very simple example, consider a coin toss. The coin is assumed to be fair
in having a Head on one side, a Tail on the other, and tossed in such an imprecise
way that these two outcomes become equally likely, and one of the two is certain
to occur. Individual one (Tom) watches individual two (Bill) toss the coin high
in the air, catch it in the palm of the hand, and then peek at the landed coin, so that
Bill knows which side is up. Tom should assign a probability of 0.5 to the outcome
of a Head. Individual two, Bill, has seen the result of the toss. (We assume good
vision, so Bill will assign a probability of either 0 or 1 to the outcome of a Head.)
The event is the same, the time of the probability assessment is the same, and the
two individuals may generally agree on most items of information. But one of them
has seen the outcome of the coin toss and the other has not. This example illustrates
that all probabilities are conditional: Uncertainty can be a matter of who has what
knowledge from one or more observations, rather than about the physics of the coin
and the coin tossing system. Probability reflects the state of mind, rather than being
a measurement of the state of a physical system, such as temperature as a measure
of the energy in a material object.
The concepts from philosophy underlying this idea are that probability is
the logic for reasoning, and the numerical probabilities we use reflect states of
knowledge, not physical states of a system. The ideas come down from Plato
through Immanuel Kant to his disciple, Hans Vaihinger, author of The Philosophy
of As-If, written in 1877, published in German in 1911 and in English in 1924
[57]. We humans think using numbers and models, and we should not confuse these
models with reality. They are "useful fictions" in Vaihinger's writings. Human brains
connect to underlying physical reality only through sensory data observations,
which may be numerical measurements. How well can we predict what we do not
yet know? The answer lies in the realm of mental processing in the brain(s) of the
predictor(s), who use their state of knowledge to make the prediction. Uncertainty is
a lack of knowledge in the human brain, not some sort of objective reality.
Probability, as a measure of uncertainty, reflects one's state of mind, and not a state
of things.
(A + B) | CD
1366 D.W. North
is the event that at least one of the propositions A and B is true given that C and D
are (both) true.
Greater plausibility is represented by a greater number, so we can write, for
example, P(B′|A) > P(B|A) when, given A, B′ is judged more plausible than B.
But if, for another proposition C, the change from B to B′ does not change the
plausibility, then

P(C|AB′) = P(C|AB)
The plausibility of the combined proposition AB, that A and B are both true,
depends on the plausibility of B and of A given B, A|B. So the functional form of
plausibility for AB must be a function of the plausibilities of B and of A|B. The Cox
proof shows that conjunction of plausibilities must be associative, and since all
strictly increasing associative binary operations on real numbers are isomorphic to
multiplication (Cox [9], pp. 12–16; Jaynes [21], Chapter 2; Wiki, Cox's theorem
[64]), our plausibility model might as well be ordinary multiplication:

P(AB) = P(A|B) P(B)
The final stage in the proof, that the functional relationship must be multiplication,
is required by consistency: If the plausibility of a proposition can be derived in
multiple (valid) ways, the numerical results must be equal. Suppose that A+B is
equivalent in plausibility to C+D. Then we acquire new information on A and then
new information on B; the numerical results of conditioning on the new information
must be the same as if we had acquired new information on C and then new
information on D.
Jaynes [21] devotes most of two chapters to building up to Cox's theorem. The
derivations of Cox and successors get into subtleties on the domain of what we
have called propositions and descriptions of these propositions using set theory.
The wiki on Cox's theorem describes relevant literature. The remainder of this
chapter assumes that probability theory is the appropriate logic to use in dealing
with uncertainty, assuming that two-valued logic applies and that ambiguities have
been removed through careful definition. If propositions or outcomes of concern in
Poorly supported inputs. If one uses probability theory in the form of a model to
calculate probabilities of outcomes from probabilities placed on inputs, including
Combining the information from diverse sources: experts and data. Characterizing
uncertainty in a complex decision context should involve adequate time learning
about the uncertainty and how it arises. There may be knowledge about cause and
effect. Various areas of science and engineering may apply. An analyst may need
to learn new concepts and specialized terminology before engaging in dialogue
with experts in the subject area, so as to understand their reasoning and assess
their judgment in the form of probabilities. Different people may have knowledge
of different aspects or of different events in a causal chain. An analyst may need
to invest considerable effort in building a model of how things work before
beginning to assess the uncertainties.
There are methods for combining the judgment of multiple experts by weighting
their probabilities with calibration based on test questions [7]. Such methods can
imprecise by design (e.g., a long toss with a bounce off the side onto the table),
one can calculate quite precisely the frequency of winning in a large number of
repetitions of a dice game such as Craps or a card game such as 21. (One of the
founders of decision analysis, Ward Edwards, did such calculations, impressing a
gambler sponsor who subsequently funded a series of psychology experiments as
games of chance in the Four Queens Casino in Las Vegas [42].)
The same idea can be applied in other contexts. Life insurance is another
example, and it illustrates how the process becomes more complicated. The
length of a person's life is an uncertain quantity. People of the same age and
sex without known health impairments or disabilities may be regarded as having
equivalent probabilities of dying after n years. The insurance industry has used
mortality data to develop life tables. The current versions are quite sophisticated.
(Example: https://www.johnhancockinsurance.com/life/life-expectancy-tool.aspx.)
Put in information on known health impairments, exercise pattern, alcoholic
beverage consumption, driving record, etc., and observe how the expected remaining
lifetime changes with these factors. There is an important message here. The
probability distribution on a person's lifetime as determined from mortality statistics
is dependent on a list of factors influencing life span. As more factors are added in
the evaluation, probabilities that were initially equal can become unequal, because
the more detailed state of information provides a basis for distinguishing among
individuals. That same message applies in many other contexts. It is therefore
helpful to consider that ALL probabilities are conditional on a state of information.
Outside of games of chance, there are few situations where conditioning can be
ignored. Ignoring possible conditioning factors is sensible if one can reasonably
assume that, based on symmetry or experience, the probabilities of the uncertain
outcomes are known to high precision, so knowledge of these additional factors is
not significant for the assessments. But even in the context of games of chance,
the simplifying assumptions underlying such assumed-to-be-precise probabilities
may not hold. Consider flipping bent coins or thumbtacks, rolling loaded dice, or
the spin of an unbalanced roulette wheel or wheel of fortune: The outcomes are now
not equally likely.
Italian scholar Bruno de Finetti developed the concept of exchangeable
sequences as the appropriate foundation for uncertain situations in which statistical
methods apply. Such situations are often described as aleatory uncertainties. The
basic idea is that a sequence of numbers X1 ; X2 ; X3 ; : : :, representing experimental
data, observations, or outcomes, can be considered as having a joint probability
distribution that is invariant to relabeling the indices. In other words, the order
of the data elements does not matter in making the probability assignments. De
Finetti's theorem states that the elements of such a sequence are conditionally
independent of each other, given one or more latent underlying variables that are
uncertain (i.e., described by a probabilistic model). For these variables the term
epistemic uncertainty applies. Inference or learning about these variables can occur
by observing elements of the sequence [10, 65].
Now we have a theory into which probabilistic models such as the Bernoulli
process fit. For flipping thumbtacks instead of tossing coins, the probability of
a Head (point down, head of tack up) is unknown, but as one flips a thumbtack
a large number of times, the observed frequency of heads compared to the number
of flips converges to a number which is the probability of heads on a single flip.
This is the process of statistical inference described in many texts on probability
and statistics.
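This convergence of observed frequency to the single-flip probability is easy to simulate; the sketch below (not from the text) assumes a hypothetical "true" heads probability for the thumbtack:

```python
import random

random.seed(0)

p_true = 0.38   # hypothetical long-run "heads" (point-down) probability
n = 100000
heads = sum(random.random() < p_true for _ in range(n))
freq = heads / n
print(round(freq, 3))  # observed frequency approaches p_true as n grows
```

By the law of large numbers, the discrepancy between `freq` and `p_true` shrinks roughly as 1/sqrt(n).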
How does one revise a probability distribution based on new information, such as
the number of heads in additional tosses of a thumbtack? The process is a simple
application of the logical consistency requirement in probability theory. Let A and
B be two uncertain quantities, and consider how one can write the probability of the
joint event A and B:

P(AB) = P(A|B) P(B) = P(B|A) P(A)

Then, dividing, with an assumption that P(A) is not zero (P(A) = 0 would mean A
is not possible),

P(B|A) = P(B) P(A|B) / P(A)    (40.1)

We have a new probability for B given that A is known to have occurred. P(B|A) is
often called the posterior probability, and P(B) the prior probability, referring to time
periods before and after the resolution of uncertainty on A, which revised the state
of information. The expression P(A|B)/P(A) is often called the likelihood function.
The relationship works with A and B as random variables, as this term is used in
probability and statistics texts.
This relationship (40.1) is known as Bayes' theorem, after the Reverend Thomas
Bayes, an English minister, logician, and Fellow of Britain's Royal
Society. The relationship was published by a friend in 1763, 2 years after Bayes'
death.
With the combination of exchangeable sequences and Bayes' theorem, much
of traditional statistics can be reinterpreted as using new information to revise
probability distributions.
Here is a simple example. A fair coin is flipped four times. You are told that the
outcomes included at least one head. What is the probability of all heads?
Many people who have taken classes in probability and statistics fail to get
the correct answer. Finding the correct answer is simple for those not trapped
in the thought pattern of traditional statistics, where the usual assumption is that
probabilities do not change. Draw a tree diagram showing the possible head-tail
sequences of four flips. There are 16 different sequences, ranging
from HHHH to TTTT. Only the last is eliminated by the information that there was
at least one head, and the relative probability of the other 15 outcomes remains
the same. The answer is 1/15. The reasoning is a very simple example of Bayes'
theorem.
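The 1/15 answer can also be checked by brute-force enumeration of the 16 sequences (a minimal sketch, not part of the original text):

```python
from itertools import product

# All 2^4 equally likely head/tail sequences of four fair flips.
sequences = list(product("HT", repeat=4))

# Conditioning on "at least one head" eliminates only TTTT;
# Bayes' theorem here is just renormalization over the 15 survivors.
consistent = [s for s in sequences if "H" in s]
p_all_heads = sum(s == ("H", "H", "H", "H") for s in consistent) / len(consistent)
print(p_all_heads)  # 1/15 ≈ 0.0667
```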
A more complex application is the Monty Hall [68] problem, arising from the
American television show Let's Make a Deal and publicized in Marilyn vos Savant's
widely read column "Ask Marilyn" in Parade magazine.
Suppose you're on a game show, and you're given the choice of three doors: Behind one
door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows
what's behind the doors, opens another door, say No. 3, which has a goat. He then says to
you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
The correct answer is to switch the choice of the door, which increases the
probability of winning from 1/3 to 2/3. Many readers of the magazine claimed this
answer was wrong. It is counterintuitive, and as such, an excellent example of why
logical reasoning is needed in complex uncertain decision situations. Apply Bayes
theorem: The probability of the car being behind door 1 is 1/3. The host now reveals
that the car is not behind door 3. So it must be behind either door 1 or door 2. Has
the probability on door 1 changed? No, the information from what the host has done
should only change the probability on the other two doors. The host is not going to
open the door with the car or the door that you chose. He opens the third (No. 3),
one with a goat. The other door, No. 2, now has the probability that you would have
earlier placed on door 3, plus its earlier probability: 1/3 + 1/3 = 2/3.
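The switching argument can be verified by exhaustively enumerating the nine equally likely (car location, initial pick) pairs (an illustrative sketch, not from the text):

```python
from itertools import product

# Enumerate the 9 equally likely (car location, initial pick) pairs.
# The host then opens a goat door that is neither the pick nor the car,
# so switching wins exactly when the initial pick was wrong.
switch_wins = sum(pick != car for car, pick in product(range(3), repeat=2))
stay_wins = sum(pick == car for car, pick in product(range(3), repeat=2))

p_switch = switch_wins / 9   # 6/9 = 2/3
p_stay = stay_wins / 9       # 3/9 = 1/3
print(p_switch, p_stay)
```

Exact enumeration avoids the sampling noise of a Monte Carlo simulation while making the same point.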
At this point readers who may be unfamiliar or only partially knowledgeable
have been introduced to Bayes' theorem and exchangeable sequences, two key
ideas for how uncertainty can be characterized in quantitative form using probability
theory. Probabilities reflect a state of information that changes as new information
is obtained. Probability illustrated through games of chance is unusual, because in
this context people believe they know the input probabilities and can compute with
confidence probabilities of complex sequences of events, such as winning at the dice
game Craps, drawing a Royal Flush hand in poker, etc.
If the two sides of the coin, the faces of the die, or the equal arcs around the wheel
are judged equal in the sense that relabeling leads to an equivalent state of
information, then the probability assignments should be the same: 1/2 for each side
of the fair coin, 1/6 for each face of a fair die, and 1/360 times the degrees of arc
for the wheel (with correction for space taken up by dividers).
The second application is exchangeable sequences. It is the same idea: the
probabilities are invariant under rearrangements of the order of the elements in
the sequence. Using Bayes' theorem enables observations of sequence elements to
revise probability distributions on parameters, which turn out to have meaning as
frequencies in the limit when the sequence has a large number of repetitions. The
most well-known models from probability theory fit this description: the Bernoulli
process, a generalization of the coin-tossing game to binary events where the
probability of heads versus tails (or success versus failure) is unknown; the Poisson
process, where nonnegative numbers come from the times of successive events,
such as customers arriving at a service center or decays of radioisotopes; and the
Gaussian or normal distribution, composed of the sum of many small independent
increments: a random walk, the velocity of molecules in an ideal gas, or the
distribution of a characteristic such as height in a male or female
human population. Readers may have learned about such processes from a standard
text on probability and statistics. But what they may not have learned is that these
probability distributions are special cases in which Bayes' theorem can be done
by simple arithmetic on parameters, because observations can be characterized by
sufficient statistics.
E.T. Jaynes' writings, including his 2003 textbook, assert maximum entropy as a
principle for choosing probability distributions. Is maximum entropy an appropriate
principle for this purpose? An answer is provided in North [38]. Maximum entropy
turns out to be an application of the Basic Desideratum.
Howard Raiffa, his Harvard colleague Robert Schlaifer, and others noted the
special character of probability distributions that have a conjugate prior, such
that revision from prior to posterior for these distributions by Bayes' theorem
can be done within the same family: Bernoulli, Poisson, normal, gamma, and a
few others [46]. The relation to sufficient statistics is clear. For example, in the
Bernoulli process a prior distribution is used for the parameter θ, which corresponds
to the long-run frequency of heads and to the probability of a head on the next flip
with current information. The beta prior probability distribution is defined over the
interval from 0 to 1, with two parameters u and v:

f(θ | u, v) ∝ θ^(u−1) (1 − θ)^(v−1)
Similar discussions are readily found for the Poisson process and exponential
distribution with unknown mean inter-arrival time and the Gaussian or normal
distribution with unknown mean and variance.
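A minimal sketch of this conjugate bookkeeping for the beta-Bernoulli case, assuming a uniform Beta(1, 1) prior and an illustrative flip record (not an example from the text):

```python
def beta_update(u, v, flips):
    # Conjugate update for the Bernoulli process: a Beta(u, v) prior on the
    # heads probability becomes Beta(u + h, v + t) after h heads and t tails.
    return u + flips.count("H"), v + flips.count("T")

u1, v1 = beta_update(1, 1, "HHTHTHHH")   # 6 heads, 2 tails -> Beta(7, 3)
posterior_mean = u1 / (u1 + v1)          # predictive P(head on the next flip)
print(u1, v1, posterior_mean)            # 7 3 0.7

# Sufficiency and exchangeability: only the counts matter, not the order.
assert beta_update(1, 1, "TTHHHHHH") == (u1, v1)
```

The entire observational history is compressed into the two counts, which is exactly what "Bayes' theorem by simple arithmetic on parameters" means here.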
The mathematical proof showing the equivalence between the exponential family
of probability distributions having sufficient statistics and the solutions of the
problem of finding the probability distribution that maximizes entropy, subject to
constraints corresponding to knowing the mean and/or higher moments of the
probability distribution, is found in North [38]. Readers interested in these details
can find them there. The following is a sketch of the argument, for discrete outcomes
(intervals) which, in the usual take-the-limit argument of calculus, become
arbitrarily small with a correspondingly large number of them.
Consider the data from a series of observations from an exchangeable sequence.
A histogram of the results summarizes these data. Relabeling the observations does
not change the histogram. Count all the sequences that lead to a given histogram,
and these sequences are judged to have equivalent probabilities.
Let n be the number of trials and nk the number in the kth interval, where k runs
from 1 to N; both n and N are large numbers, with n much larger than N and each
nk large. Let fk = nk/n, the frequency of the kth interval in the n trials.
The number of ways W to arrange n objects in N categories is the multinomial
expression:

W = n! / (n_1! n_2! · · · n_N!)

Stirling's approximation gives n! ≈ e^(−n) n^n √(2πn). Therefore,

log W = n log n − n − Σ_{k=1}^{N} n_k log n_k + Σ_{k=1}^{N} n_k + O(log n)

where O(log n) denotes terms that increase no faster than log n as n gets large.
Then, since the n_k sum to n,

(1/n) log W = − Σ_{k=1}^{N} f_k log f_k + O((log n)/n)

The leading term is the entropy of the frequency distribution:

H(p_1, . . . , p_N) = − Σ_{k=1}^{N} p_k log p_k
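The entropy function H can be computed directly; a small sketch (with hypothetical distributions, not from the text) confirms that, absent constraints, the uniform distribution has the largest entropy:

```python
import math

def entropy(p):
    # H(p_1, ..., p_N) = -sum_k p_k log p_k, with the convention 0 log 0 = 0.
    return -sum(q * math.log(q) for q in p if q > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]
print(entropy(uniform), entropy(skewed))  # the uniform case is larger
```

For N categories the maximum is log N, attained only by the uniform distribution; moment constraints pull the maximizer into the exponential family.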
Essentially, entropy is counting the most likely histograms as the numbers n and
N become large. One might think of this as follows: The exchangeable sequence
process becomes random, subject to what is known about expected values of functions
of the probability distribution, such as its mean, variance, and possibly higher
moments. The usual proof of the Central Limit Theorem follows essentially the
same logic.
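The counting argument above can be checked numerically. The short sketch below (illustrative, not from the chapter) computes (1/n) log W exactly via the log-gamma function and compares it with the entropy of the histogram frequencies; the gap shrinks like (log n)/n:

```python
import math

def log_multinomial(counts):
    """log W = log( n! / (n_1! ... n_N!) ), computed exactly via log-gamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(freqs):
    """H(p_1, ..., p_N) = -sum p_k log p_k (natural logarithm)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

# Fix the histogram shape and let the number of trials n grow:
# (1/n) log W approaches H, with error of order (log n)/n.
p = [0.5, 0.3, 0.2]
for n in (100, 10_000, 1_000_000):
    counts = [round(pk * n) for pk in p]
    scaled = log_multinomial(counts) / sum(counts)
    print(f"n = {n:>9,}: (1/n) log W = {scaled:.5f}   H = {entropy(p):.5f}")
```

For n = 100 the two quantities already agree to a few percent; by n = 1,000,000 they match to about five decimal places.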
In case of conjugate prior-posterior families, the insight is that all one needs to
do for Bayesian updating is to keep track of statistics such as the sample mean
and sample variance, and other aspects of the observational history average out in a
randomization process. This line of reasoning has a number of important applications
in physics. It means one can avoid the need for an ergodic hypothesis,
an assumption that time averages equal ensemble averages. Rather, for a system
in equilibrium, in assigning probabilities on the state of the system, it should not
matter when the experimental observation occurs. So, one can randomize the time
of the experiment, and the probabilities on the state of the system should remain the
same. Invariance to such randomization leads directly to the probability distributions
developed by J. Willard Gibbs for statistical mechanics, without requiring the
ensemble as an intermediate concept [38].
Here is a summary of an application of the basic desideratum from physics: The
maximum entropy assignment for molecular velocities, given knowledge of the mean
kinetic energy, is that the velocities follow a normal distribution, and for gases where
molecules do not interact (ideal gas assumption), this distribution is experimentally
verified as correct [67].
In most applications in risk and decision analysis, we are not dealing with the
orderly randomness of the normal or exponential probability distributions. But
thinking about what state variables influence the behavior of a complex system
will help to identify what variables are important to include in a description of the
state of information. Conditioning on these variables may enable describing how
one uncertain quantity is linked to another. No linkage, or independence, of one
uncertain quantity from another, may be motivated by a lack of any known causal
relationship between them. But in a physical, economic, political, or interpersonal
situation, it is often the case that there are linkages between uncertain events.
Then conditional probabilities should be used to characterize these linkages where
the occurrence of predecessor events can make a successor event more, or less,
probable. And there may be changes in the probabilities with time (a nonstationary
process). An analyst wishing to characterize an uncertain event in terms of a
probability should consider carefully how the event probability might change over
time or with further information about other, possibly related, events.
It is therefore advisable to consider all probabilities as conditional on one's
state of knowledge. Markov models in probability theory consider transitions
among states, and such models may be useful. The second case study below
included a Markov model, but when probability numbers became established, the
complexity of considering multiple transitions among states was judged unneces-
sary.
Networks of events, in which the probability of successor events depends on the
outcomes of predecessor events, have become widely used both in decision analysis
as influence diagrams [18] and in computer science as Bayesian networks or
Bayesian belief networks [63, 66].
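As a minimal illustration of such a linkage (the numbers are hypothetical, not from the case studies), the probability of a successor event B can be computed from conditional probabilities on a predecessor A, and observing B revises the probability of A by Bayes theorem:

```python
# Hypothetical two-event linkage: predecessor A influences successor B.
# All numbers are illustrative, not taken from the case studies.
p_a = 0.3              # P(A): predecessor event occurs
p_b_given_a = 0.8      # P(B | A): A makes B much more probable
p_b_given_not_a = 0.1  # P(B | not A)

# Total probability: P(B) = P(B|A) P(A) + P(B|~A) P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes theorem: observing B revises the probability of A upward.
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B) = {p_b:.2f}")              # 0.31
print(f"P(A | B) = {p_a_given_b:.3f}")  # ~0.774
```

An influence diagram or Bayesian network is, computationally, a systematic bookkeeping of many such conditional linkages.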
What is decision analysis? The term comes from both Howard Raiffa [45] and
Ronald Howard [15]. Decision analysis is best viewed as a process, rather than a product. The
case studies in the latter part of this chapter can be viewed as products, descriptions
of what came out of a process. The National Research Council reports of [35] and
[36] emphasize analysis as a process. See also Fischhoff [11].
Decision analysis as a process supports decision-making by those with decision
responsibility. It involves characterizing uncertainties, which are usually what make
decision-making difficult. Putting the task in the second person with the reader as
the decision-maker, the process involves structuring through quantitative analysis
your alternatives (what you can do), your information (what you know, including
information relevant to uncertainties), and your preferences (what you want) to
identify the alternative(s) that are best in terms of maximizing expected utility. A
good decision is one that is logically consistent with the alternatives, information,
and preferences.
A transitive preference ordering may not be obtainable for a group from the
preference orderings of the individuals in the group, even if individual preference
orderings are transitive. Majority voting may not work: The
Condorcet paradox on majority voting [71] and the Arrow Impossibility Theorem
[1, 62] show that construction of preferences for the group obeying transitivity may
not be possible. Cultural change, ethics, and morality may come into play in helping
members of a group agree on preferences. To the extent a group or society can
agree in the sense of having transitive group preferences, preference measures can
be constructed to evaluate outcomes. But obtaining agreement on the ordering of
good outcomes in the context of a public policy decision may involve considerable
difficulty [14].
Decision analysis involves defining outcome events related to the decisions
so that the questions "What will happen?" and "What do those responsible
want?" can have answers that are well defined and not ambiguous. And
the process continues, to obtain for each outcome or event a numerical measure
of uncertainty (a probability) and a numerical measure of value. (This numeraire
may be a utility, or in the absence of a nonlinear measure of risk preference, it
might be a value assessed with dollars or another currency.) Cost-benefit analysis
involves assessing costs and benefits from the outcomes from the choice among the
set of decision alternatives. More sophisticated versions examine the distribution of
costs and benefits and not just the aggregate amounts of cost and benefit across the
population that experiences the outcomes.
In 1972 the author and his colleagues at SRI International received a challenging
assignment: to assist the US National Aeronautics and Space Administration
(NASA) by making an assessment of the probability that the planned 1976 landing
of the Viking spacecraft on Mars would result in contamination of the planet by
viable terrestrial organisms.
The probabilistic model traces organisms leaving a source and reaching a receptor
area where vulnerable species might be damaged.
The model begins with an initiating event, the landing of the spacecraft with its
bio-burden of viable terrestrial organisms (VTOs) in different locations on the
spacecraft. The next event, release, depends on whether the landing is soft as
planned or a hard landing due to equipment malfunction and, also, where on the
spacecraft the organisms are located. The analysis team considered three means by
which organisms might enter the Martian environment: implantation beneath the
surface, such as from being on the bottom of a landing pad or in a buried fragment
of the spacecraft that broke up from a hard landing; release from the surface of the
spacecraft, as an organism might come off from vibration and land on the soil
surface; and erosion, as wind-driven dust and sand might wear away the spacecraft
over a long period of time, releasing organisms encased in the plastic or on interior
surfaces. The model included all of
these modes of release. The next stage is the transport of an organism to a favorable
microenvironment. Does such a microenvironment exist, with water near enough the
liquid state to be usable for reproduction by the organism? Are the needed nutrients
available? Does the organism have the capability to reproduce in the absence of
oxygen and at temperatures well below where water normally freezes? The model
also included the possibility that the life detection experiment on the spacecraft
becomes contaminated, resulting in a large amount of microbial reproduction on the
spacecraft and a much larger subsequent release of organisms. This structure for the
probabilistic model is shown in Fig. 40.1.
The first box on the left summarized information on the bio-burden: the number
of viable terrestrial organisms on board the spacecraft at the time of landing. The
estimate took into account the effect of the heat sterilization processes for the
components, data on microbial spores in the plastic, estimates on organisms that
might be trapped between surfaces as components of the spacecraft were assembled,
and recontamination after final assembly, including the potential for contamination
of the life detection experiment. The output is the estimated number of organisms
in each of five categories of location. The second box includes the outcome for
landing, at the planned low velocity with three pads in contact with the Martian soil
or a much higher velocity hard crash landing that would cause the breakup of the
[Fig. 40.1: Structure of the probabilistic model. Sterilization, recontamination, and
bioexperiment contamination determine the bio-burden; the landing mode and the
release mechanism (implantation, erosion, vibration) govern release; a transport
mechanism carries organisms toward usable water; the existence of usable water,
nutrients, lethality, and resistance then determine the probability of growth
(contamination).]
spacecraft with pieces buried deep into Martian soil. The outputs are estimates of
the expected number of organisms implanted into the soil, released through wind-
driven dust or sand erosion into the Martian atmosphere and released via vibration
or other nonviolent processes from the spacecraft onto the Martian surface. The
third box deals with transport to a possible favorable microenvironment, a location
with water near enough the liquid state at some time in the Martian year (i.e.,
useable) so that reproduction might occur. The inputs are the probability that
such a microenvironment exists on Mars and the fraction of the surface area that
it covers, given it exists. The third box is a relatively elaborate six-state Markov
model allowing for multiple transport events (by a series of Martian dust storms)
from release from the spacecraft to contact with useable water. (Details are in
Judd et al. [25]). The output of the third box is the expected number of organisms
that reach useable water. The fourth box considers availability of nutrients and
the characteristics needed for the organisms to reproduce in the cold and without
oxygen. If these conditions for reproduction are met, the outcome is growth, i.e.,
planetary contamination.
The model then gives a probability for contamination built up from approx-
imately 20 inputs describing the microbial contents on the landing vehicle, the
probabilities assigned to the mission outcome (nominal landing and failure modes),
and Martian environmental conditions. The scientists who provided these inputs
found it relatively easy to understand the structure of the analysis, the judgments
they were being asked to make, and how these judgments would be used. When a
complete set of inputs was assembled, the assessment was reviewed with these sci-
entists. The nominal results are shown on the arrows in Fig. 40.2. The nominal case
result is a probability of 6 × 10⁻⁶, a factor of about 16 below the mission risk limit of 10⁻⁴.
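The multiplicative source-to-receptor structure can be sketched in a few lines. The stage values below are purely illustrative placeholders, chosen so their product matches the reported nominal result of 6 × 10⁻⁶; they are not the roughly 20 actual inputs of the Viking analysis:

```python
import math

# Sketch of the source-to-receptor chain, with purely ILLUSTRATIVE stage
# values (not the ~20 actual inputs of the Viking analysis); the numbers
# are chosen so the product equals the reported nominal result of 6e-6.
n_vto = 300.0       # expected viable terrestrial organisms at landing
p_release = 1e-2    # released (implantation, erosion, or vibration)
p_transport = 1e-3  # transported to a usable-water microenvironment
p_growth = 2e-3     # nutrients present and reproduction possible

expected_growth_events = n_vto * p_release * p_transport * p_growth
p_contamination = 1 - math.exp(-expected_growth_events)  # Poisson: at least one

risk_limit = 1e-4
print(f"P(contamination) ~ {p_contamination:.1e}")
print(f"factor below limit: {risk_limit / p_contamination:.1f}")
```

Because the chain is multiplicative, a sensitivity analysis of the kind described next amounts to scaling individual factors up or down and observing the effect on the product.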
Extensive sensitivity analyses were carried out to see how the calculated
probability of contamination changed with changes in the inputs to the calculation,
singly and in combination. An illustrative set is shown in Table 40.1.
Fig. 40.2 Viking Lander Mission contamination model with results (Retrieved from http://ntrs.
nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19750005702.pdf)
40 Decision Analytic and Bayesian Uncertainty Quantification for Decision Support 1383
The pattern of numerical results suggested a reason why the calculated risk was
more than an order of magnitude below the risk limit of 10⁻⁴. Most of the release
of viable organisms was through erosion. The Martian atmosphere has no ozone
layer to absorb the ultraviolet (UV) radiation in sunlight, so a high UV flux reaches
through the Martian atmosphere down to the surface. A terrestrial microorganism
would be killed by this UV radiation unless it was protected by shielding, such as
being encased in a piece of plastic much larger than the microorganism itself. The
density of the Martian atmosphere is well known from observation by telescope
from Earth. Only small particles would be transported a significant distance from
the spacecraft, even in the periodic Martian dust storms. These small particles would
not provide adequate shielding to protect the microbes from the UV radiation.
It was therefore highly unlikely that microbes could survive a journey from the
Viking landing site to a favorable microenvironment if such microenvironments
exist on Mars. This insight was not known previously within NASA and its scientific
advisors, but the scientists advising the Planetary Quarantine Officer and other
senior officials at NASA accepted the reasoning. Within a short time the scientists
who had expressed concern about the risk of contamination agreed that the objective
of holding the risk for the first Viking Mission below 10⁻⁴ had been met and that
the Viking Mission should proceed without the need for further steps to reduce the
microbial load on the Viking Lander. The Viking Mission flew on schedule and the
landing on Mars in 1976 was successful.
In 2015 more information has become available. Microenvironments have now
been observed in the form of dark streaks that appear in several locations on Mars
when temperatures are above −23 °C. These streaks are thought to be hydrated
minerals, a mixture of salts and water. And particles do migrate, even moderately
large ones. Sand-grain-size particles have been observed to impact on the solar
panels of the Mars Rover vehicles. Theoretical calculations and orbiter photos
indicate that sand dunes on Mars move [23, 24, 28].
The authors have not been asked to redo their analysis based on such new
information. In 1992 a committee of leading scientists stated, ". . . it is the
unanimous opinion of the task group that terrestrial organisms have almost no
chance of multiplying on the surface of Mars and in fact have little chance of
surviving for long periods of time, especially if they are exposed to wind and to UV
radiation" (Space Science Board [49], page 49). The insight from the analysis had
become conventional wisdom. But revisiting the analysis might suggest that NASA
should be concerned about the risks of contamination from a rover or a lander near
the areas where the streaks that might be liquid water have been observed.
Calculating the probability that a terrestrial microorganism could reproduce on
Mars would seem like a situation where there are no data, an example of deep
uncertainty and an absence of the information needed for expert judgment. But
in this project the analysis team broke the assessment into pieces and assembled
judgments from experts via the model described above. Analysis using this model
provided a basis for NASA to decide to proceed with the mission.
The second case study was done a few years earlier. It was on a decision to deploy
a new technology in a domain where it was previously assumed that nothing could
be done to avert natural disaster but to warn the potentially affected population and,
in severe situations, to evacuate them from the area.
Hurricanes are intense cyclonic storms with winds in excess of 74 miles per hour
that form over warm ocean water, such as in the Caribbean or adjacent areas of the
western Atlantic. In the Pacific and Indian Oceans, these storms are called typhoons,
another name for the same natural phenomenon.
In the early 1960s scientists at the National Hurricane Research Center developed
a theory that seeding the clouds around the eye (the calm circular center) of a
hurricane could reduce the intensity of the maximum winds and the storm surge, so
that a hurricane impacting a populated coastal area should become less destructive.
The theory held that seeding with silver iodide crystals would cause supercooled
water to form into ice crystals, releasing heat that would cause the cloud bank around
the eye wall to move outward, reducing the speed of the maximum winds [70].
Small-scale experimental seeding had been carried out on two hurricanes after the
theory was proposed, and a computer-implemented model of hurricane dynamics
and experiments had been developed. Strong evidence supporting the theory came
from cloud seeding on Hurricane Debbie on August 18 and 20 of 1969: After the first
day of seeding, a reduction of 31 % in the maximum sustained winds was observed.
The hurricane was left alone for a day and then reseeded, with a subsequent wind
speed reduction of 15 %.
Our analysis team from SRI International was engaged subsequent to the
experiment on Hurricane Debbie. The team was asked by the Assistant Secretary
of Commerce for Science and Technology, Myron Tribus, to carry out a decision
analysis on whether the government program, Project Stormfury, should be moved
from a research project into an operational mode that could seed hurricanes
threatening coastal areas of the USA.
This analysis was subsequently published in Science [19]. The following descrip-
tion is a summary from that publication, with an emphasis on the assessment
of uncertainty and the valuation of information from opportunities for further
experimental seeding of the kind that had been done for Hurricane Debbie in 1969.
While currently available scientific information no longer supports the Stormfury
theory, the case study illustrates a number of aspects that apply in other decisions
on deploying a new technology in uncertain situations that could evolve negatively
as well as positively. The potential for benefit must be weighed against the risk of
a negative outcome, and public perceptions become important when a government
agency takes responsibility for introducing technology into a natural process that
has a potential for harm.
The property damage caused by a hurricane is strongly related to the speed of the
maximum sustained winds. These winds do damage directly and through the storm
surge, pushing seawater to abnormally high levels as the hurricane impacts a coastal
area, causing both flooding and structural damage from wave action.
A simplified view illustrating the difficulty is shown in Fig. 40.3. Suppose
a hurricane is about 12 h away from impacting a populated coastal area. Natural
forces might be causing the hurricane to intensify, but with current knowledge the
change in the hurricane winds cannot be anticipated. Meteorologists cannot predict
accurately how the behavior of the hurricane will evolve over time.
The diagram shows a possible scenario: If the hurricane is seeded, the wind
speed at time t₁ is reduced from w(t₁) to w′(t₁), so that seeding has reduced the wind
and the property damage. But suppose natural forces intensify the hurricane: when it
makes landfall, it has intensified less because it was seeded, yet the wind speed has
increased since the seeding decision. Those people suffering property damage might
blame a government decision-maker for making the choice to seed.
At the time of our analysis, nearly all the government meteorologists that we
questioned said that they would seed a hurricane threatening their homes and
families but only if they could be released from professional liability. The trade-off
between accepting the responsibility for seeding and accepting higher probabilities
of severe property damage is a crucial issue in this decision. There are many similar
[Fig. 40.3: Maximum sustained winds w(t) without seeding and w′(t) with seeding,
over the 12 hours from seeding initiation at t₀ to landfall at t₁.]
situations where new technology may bring benefits, but not enough is known to
assure that there are not substantial risks.
The consequences of seeding and not seeding are characterized with probability
distributions. Our team developed these distributions from data on the changes in
central pressure of hurricanes, a standard relationship between wind and central
barometric pressure used in predicting hurricane winds and verified by wind obser-
vations, plus judgment of these government experts on the additional uncertainty
introduced by seeding. The team considered three outcomes: (H1) The beneficial
Stormfury hypothesis: The effect of seeding averaged over many hurricanes is
a reduction of the maximum sustained wind speed. (H2) The null hypothesis:
Seeding on average induces no change in maximum wind speed. (H3) The detri-
mental hypothesis: On average seeding increases the maximum sustained wind
speed. These outcomes are mutually exclusive and collectively exhaustive: Only
one can be true. Available knowledge is characterized by probabilities over these
three hypotheses. Bayes theorem allows a logical revision of these probabilities,
from before the Debbie experiment, after the results from the Debbie experiment
became available (the time of the analysis), and to a future time when more seeding
experiments might yield results.
Using the central pressure to wind speed relationship with the available historical
data gave a probability distribution for the 12-h wind speed change of an unseeded
hurricane, which also serves, under the null hypothesis, for a seeded one. This
distribution seemed to fit reasonably well to the normal or Gaussian distribution,
and to first order, the 12-h wind speed change in percentage terms stayed the same
as the initial hurricane wind speed varied. The team decided to stay within the
two-parameter normal family of probability distributions for characterizing 12-h
wind speed changes, with an important distinction between the average change (the
mean over many hurricanes) and the wind speed change in a specific hurricane,
which was estimated with limited precision from the central pressure change data.
[Figure content: probability density function for the maximum sustained wind, in
mph (or %), at time t = 12 hours (landfall), over roughly 40 to 160; the distribution
is based on observations of 12-hour changes in central pressure, combined with an
empirical formula relating central pressure to maximum sustained wind; standard
deviation s = 15.6.]
Fig. 40.4 Probabilistic model for 12-h change in maximum sustained wind, natural hurricane
The standard deviation for this unseeded case (and, if seeded, the null-hypothesis
case) was 15.6 %. See Fig. 40.4.
For the beneficial seeded case, the team of analysts and experts agreed on
a 15 % reduction on average and a standard deviation of 7 % for the effect on
individual hurricanes. For the detrimental case, the corresponding numbers were
a mean increase of 10 % and the same standard deviation of 7 %. The standard
deviation for an individual hurricane under H1 or H3 is the square root of the sum
of the squares, 18.6 %.
The team went through an informal process of assessing probabilities on the three
hypotheses but using the mathematics of Bayes theorem. Establishing probabilities
for the three hypotheses was a crucial aspect of characterizing the overall uncertainty
on how seeding could reduce wind speed and property damage.
If one starts with equal prior probabilities on the three hypotheses, then using
Bayes theorem to update after the Debbie results, the posterior probabilities are
H1 = 81 %, H2 = 15 %, and H3 = 4 %. The team used a lower value for H1, based
on agreement that (1) before Debbie, H1 and H3 were approximately equal but
each with lower probability than H2, and (2) after Debbie, H1 and H2 were
approximately equally likely. The probabilities that met these conditions were
H1 = H2 = 0.49 and H3 = 2 %. The government meteorology experts accepted these
probabilities as reasonable judgments given the currently available information.
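The mechanics of this update can be reproduced in a few lines. The likelihood ratios 81:15:4 for the Debbie observations are inferred from the equal-prior posteriors quoted above; the unequal priors used below are illustrative values consistent with the stated conditions, not numbers quoted from the study:

```python
def bayes_update(priors, likelihoods):
    """Posterior proportional to prior times likelihood, normalized."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Likelihood ratios for the Debbie results under H1:H2:H3, inferred from
# the equal-prior posteriors (81 %, 15 %, 4 %) reported in the text.
likelihoods = [0.81, 0.15, 0.04]

post_equal = bayes_update([1/3, 1/3, 1/3], likelihoods)
print([round(x, 3) for x in post_equal])   # recovers 0.81, 0.15, 0.04

# Illustrative unequal priors consistent with the stated conditions
# (H1 ~ H3 before Debbie, both less probable than H2):
post = bayes_update([0.15, 0.75, 0.10], likelihoods)
print([round(x, 3) for x in post])         # roughly 0.49, 0.49, 0.02
```

Any priors satisfying the stated qualitative conditions lead, via the same likelihood ratios, to posteriors close to the 0.49/0.49/0.02 assignment the team adopted.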
Developing this set of probabilities and probability distributions, shown in Figs. 40.5
and 40.6, was essentially the author's responsibility as the project leader of the
analysis team.
Fig. 40.5 Probabilistic model for 12-h change in maximum sustained wind, seeded hurricane
[Fig. 40.6: Probability that the maximum sustained wind 12 hours after the seeding
decision exceeds the amount shown (from 60 % to 130 %), plotted for the seeded
and the unseeded hurricane.]
[Fig. 40.7: Property damage (1969 $, log scale from 1 × 10⁶ to 100 × 10⁶) versus
maximum sustained wind, fit by the power law d = c₁w^c₂ with c₂ = 4.363.]
The scaling constant c₁ could be adjusted based on property values for different
coastal areas. Such adjustment was not done in estimating the exponent from the
data on historical hurricanes, but it could be done in a site-specific analysis.
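Because the constant c₁ cancels in ratios, the power law alone determines how a percentage change in wind translates into a damage multiplier. A small sketch using the exponent from the figure:

```python
# Damage scales as d = c1 * w**c2 with c2 = 4.363 (exponent from the text);
# c1 depends on local property values, so damage RATIOS are free of c1.
c2 = 4.363

def damage_ratio(wind_change_pct):
    """Multiplier on damage after a percentage change in max sustained wind."""
    return (1 + wind_change_pct / 100.0) ** c2

# A 15 % wind reduction (the mean effect under the beneficial hypothesis H1)
# roughly halves property damage, while a 10 % increase raises it ~50 %:
print(f"-15% wind -> damage x {damage_ratio(-15):.2f}")  # ~0.49
print(f"+10% wind -> damage x {damage_ratio(+10):.2f}")  # ~1.52
```

This steep nonlinearity is why modest wind speed changes carry such large economic stakes in the analysis.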
The probability distributions on wind changes and the relationship of wind
changes to property damage permitted computation of the expected damage with
and without seeding, shown in Fig. 40.8. This calculation could have been done on a
computer with great precision. At the time of this analysis, computers were large and
took time for access, while handheld calculators were common, as cell phones with
calculators have become today. The quantitative characterizations were not precise;
they were the equivalent of sketches summarizing the available information. For
convenience in explaining and ease for others to reproduce the calculations, the wind
speed change outcomes were divided into five intervals: an increase of 25 % or more,
an increase of 10 % to 25 %, little change (+10 % to −10 %), a decrease of 10 % to
25 %, and a decrease of more than 25 %. The corresponding representative values for
the five intervals are +32 %, +16 %, 0, −16 %, and −34 %. The decision on whether to
seed is shown in Fig. 40.9.
Expected values for the wind speed change and property damage were calculated
for each of the intervals, and then the expected damage was computed by multiplying
probability times damage and summing over the five intervals, with and without
seeding. This is readily done on a hand calculator. The expected property damage
loss was reduced 18.7 % by seeding, a result that pleased Dr. Tribus and confirmed
his intuition. The comparison of the alternatives is shown below in decision tree
form in Fig. 40.9, with the decision shown as a box and the resolution of uncertainty
on damage from the first seeded hurricane shown with five discrete outcomes. A cost
of $250,000 for seeding was assumed, small compared to damage in the $100 million
range.
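The expected-value arithmetic can be reproduced directly from the five outcomes. The probabilities and damages below are those shown in the decision tree; the $116.00M no-seed expected loss equals the reported $94.33M plus the $21.67M value of seeding, and small differences from the published $94.08M reflect rounding in the figures:

```python
# Five discrete wind-change outcomes for the seeded nominal hurricane,
# with probabilities and damages (M$, 1969) as shown in the decision tree.
outcomes = [  # (probability, representative wind change %, damage M$)
    (0.038, +32, 335.8),
    (0.143, +16, 191.1),
    (0.392,   0, 100.0),
    (0.255, -16,  46.7),
    (0.172, -34,  16.3),
]
seeding_cost = 0.25    # M$
unseeded_loss = 116.00 # M$, expected loss without seeding (from the tree)

expected_damage = sum(p * d for p, _, d in outcomes)
seeded_loss = expected_damage + seeding_cost
value = unseeded_loss - seeded_loss
print(f"expected seeded loss ~ ${seeded_loss:.2f}M")
print(f"value of seeding     ~ ${value:.2f}M "
      f"({100 * value / unseeded_loss:.1f}% reduction)")
```

The result, roughly a $21.7 million expected saving (an 18.7 % reduction), matches the published figure to within rounding.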
Fig. 40.8 Complementary cumulative probability distributions on property damage for the seeded
and unseeded hurricane
[Fig. 40.9: Decision tree for the seeding decision. Seed branch outcomes
(probability / wind change / damage): 0.038 / +32 % / $335.8M; 0.143 / +16 % /
$191.1M; 0.392 / 0 / $100.0M; 0.255 / −16 % / $46.7M; 0.172 / −34 % / $16.3M.
Expected loss with seeding = $94.08M + $0.25M cost of seeding = $94.33M; expected
loss without seeding = $116.00M. Economic value of seeding = $21.67 million (an
18.7 % reduction in loss); the decision is not sensitive to the specific assumptions.]
Decision not sensitive to specific assumptions.
[Figure content, seed branch (probability / change in maximum sustained wind /
property damage / government responsibility cost as % of damage / total cost):
0.038 / +32 % / $335.8M / +50 % / $503.7M; 0.142 / +16 % / $191.1M / +30 % /
$248.4M; 0.392 / 0 / $100.0M / +5 % / $105.0M; 0.255 / −16 % / $46.7M / 0 /
$46.7M; 0.172 / −34 % / $16.3M / 0 / $16.3M. Expected loss with seeding =
$110.67M + $0.25M cost of seeding = $110.92M.]
Fig. 40.10 The seeding decision for the nominal hurricane (government responsibility cost
included)
This legal research persuaded Dr. Tribus and others in the leadership of the
National Hurricane Research Laboratory and the National Weather Service that the
decision to move to operational seeding would need to be made by the US President
or under new legal authority provided by Congress.
There was a further issue to be addressed: drawing attention to the opportunities
for further seeding experiments such as had been done on Hurricane Debbie. What
would it be worth to resolve the uncertainty about which of the three hypotheses is
true before a seeding decision needed to be made?
Consider, as an alternative, resolving this uncertainty by hiring a hypothetical
clairvoyant who knows the answer. But the clairvoyant must be paid before we
get the clairvoyant's information. What is the clairvoyant's information worth,
in the context of the nominal $100 million hurricane? The calculation of the
expected value of information is relatively straightforward once a decision tree has
been constructed with values on the outcomes. In the case of perfect information,
complete resolution of the uncertainty about which hypothesis is correct, one need
only reverse the sequence: the uncertainty is resolved before the choice of whether
to seed is made. The decision tree is shown in Fig. 40.11.
Fig. 40.11 Expected value of the clairvoyant's information: which hypothesis describes the effect
of seeding?
A more complex calculation with a decision tree addresses the decision to carry
out an experimental seeding of the type that was done for Hurricane Debbie.
Probabilities are calculated for the outcomes for a seeded hurricane, and Bayes
theorem is used to update the posterior probabilities of the three hypotheses. The
overall result, shown in Fig. 40.12, is that for one $100 million hurricane, the
expected value of an experimental seeding is on the order of ten times the cost.
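The structure of such a value-of-information calculation can be sketched as follows. The per-hypothesis expected losses below are illustrative placeholders, not the study's numbers; only the tree logic (learn which hypothesis holds first, then choose the better action) follows the text:

```python
# Expected value of perfect information (the "clairvoyant"), sketched with
# ILLUSTRATIVE per-hypothesis expected losses in M$ (not the study's numbers).
p_h = {"H1": 0.49, "H2": 0.49, "H3": 0.02}  # hypothesis probabilities (text)
loss = {  # expected loss (M$) for each action, given each hypothesis
    "H1": {"seed":  70.0, "no_seed": 116.0},
    "H2": {"seed": 116.3, "no_seed": 116.0},
    "H3": {"seed": 135.0, "no_seed": 116.0},
}

# Without the clairvoyant: choose the single action minimizing expected loss.
def expected_loss(action):
    return sum(p_h[h] * loss[h][action] for h in p_h)

best_without = min(expected_loss(a) for a in ("seed", "no_seed"))

# With the clairvoyant: learn the hypothesis first, then act optimally.
best_with = sum(p_h[h] * min(loss[h].values()) for h in p_h)

evpi = best_without - best_with
print(f"EVPI ~ ${evpi:.2f}M")
```

The value of an imperfect experiment is computed the same way, except that the experiment's outcome only revises the hypothesis probabilities (via Bayes theorem) rather than resolving them, so its value is bounded above by the EVPI.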
The project final report [4] includes calculated values for multiple seeding
experiments and extrapolated to US hurricane seasons. Opportunities for seeding
in the Pacific were more plentiful, and both the analysis team and the government
sponsors believed that seeding experiments in the Pacific should be done. The
analysis was presented to Dr Edward David, the Science Advisor to President
Nixon, and the Presidents Science Advisory Committee (PSAC). President Nixon
himself was briefed by Dr. David. The Presidents reaction is interesting, especially
in view of later events in the Nixon Administration. President Nixon considered that
if a severe hurricane threatened a US city such as New Orleans, he might wish to
go onto television and ask for a plebiscite from affected citizens as to whether, as
Commander in Chief, he should order a seeding.
[Fig. 40.12: Decision tree for the experimental seeding decision (expected values
in M$). Performing the experiment and then deciding leads to an expected loss of
$107.96M; not performing the experiment leads to an expected loss of $110.92M.
Value of the experiment = $2.96 million; actual cost = $0.25 million.]
3 Conclusions
The author takes considerable pride in the way the Mars analysis team learned
about the importance of UV radiation and particle shielding and the way the
hurricane analysis team included government responsibility cost into the hurricane
analysis as a way of bringing attention to the importance of the legal issues related to
trying to change a weather phenomenon. Both teams dealt with the complexity and
uncertainty under time and resource constraints and with difficulties to overcome in
getting cooperation from government scientists. Some of the government scientists
in the hurricane case study were initially hostile to what they thought was an effort
to take seeding operational as part of coastal defense before the technology had
been established as acceptably safe. The team learned of their concerns and tried to
address them openly. The hurricane seeding decision analysis was presented to the
committee of leading scientists who advised the President, and the (at the time)
innovative Bayesian probability and value of information calculations were well
received by these scientists and published in Science [19]. The results of the analysis
encouraged support for seeding in the Pacific, which seemed like a very good
alternative at the time.
The author has learned, over many applications of decision analysis in addition
to these two case studies, the importance of proper framing: What are the decisions,
and what are the consequences that the stakeholders care about? [35, 36].
It is disappointing that there have not been more applications of decision analysis
to major public policy problems. There is opportunity: The tools of probabilistic
analysis are there, and efforts are ongoing to apply these tools to a range of
problems, including the safety of nuclear power plants and the global decisions to
deal with the threats posed by greenhouse-gas-induced climate change [40].
Many corporations are now using decision analysis routinely; Chevron received
the first Raiffa-Howard Award (named for Howard Raiffa and Ronald Howard) for
excellence in decision quality at the 2014 INFORMS meeting in San Francisco.
Opportunities to carry out analysis in support of decisions depend on the leadership
with the decision responsibility. These leaders can ask for the support that decision
analysis can provide.
Acknowledgements This chapter draws from presentations on the two case studies at the
November 2014 meeting of INFORMS (Institute for Operations Research and the Management
Sciences) in San Francisco, on the occasion of the 50th Anniversary of Decision Analysis honoring
its founders, Stanford University Professor Ronald Howard and Harvard University Professor
Howard Raiffa. The author wishes to acknowledge the collaboration with colleagues in the two case
studies, and appreciation to Tony Cox for detailed comments on the initial draft of this chapter. The
author expresses gratitude to these and other colleagues over 50 years in decision and risk analysis
for the many insights learned about assessing uncertainties and values in a decision support context.
References
1. Arrow, K.J.: A difficulty in the concept of social welfare. J. Political Econ. 58, 328–346 (1950)
2. Arrow, K.J.: Alternative approaches to the theory of choice in risk-taking situations. Econometrica 19(4), 404–437 (1951)
3. Arrow, K.J.: Essays in the Theory of Risk-Bearing. North Holland Pub. Co., Amsterdam (1970)
4. Boyd, D.W., Howard, R.A., Matheson, J.E., North, D.W.: Decision Analysis of Hurricane
Modification, Final Report, SRI Project 8503, Stanford Research Institute, Menlo Park (1971)
5. Budnitz, R.J., Apostolakis, G., Boore, D.M., Cluff, L.S., Coppersmith, K.J., Cornell, C.A.,
Morris, P.A.: Recommendations for Probabilistic Seismic Hazard Analysis: Guidance on
Uncertainty and Use of Experts, Report prepared for the Nuclear Regulatory Commission,
NUREG/CR-6372, vol. 1, main report, vol. 2, appendices. http://www.nrc.gov/reading-rm/
doc-collections/nuregs/contract/cr6372/ (1997)
6. Budnitz, R.J., Apostolakis, G., Boore, D.M., Cluff, L.S., Coppersmith, K.J., Cornell, C.A.,
Morris, P.A.: Use of technical expert panels: applications to probabilistic seismic hazard
analysis. Risk Anal. 18(4), 463–469 (1998)
7. Cooke, R.M.: Experts in Uncertainty: Opinion and Subjective Probability in Science. Oxford
University Press, Oxford (1991)
8. Cox, R.T.: Probability, frequency, and reasonable expectation. Am. J. Phys. 14, 1–10 (1946)
9. Cox, R.T.: The Algebra of Probable Inference. Johns Hopkins University Press, Baltimore
(1961); See also http://en.wikipedia.org/wiki/Cox%27s_theorem; http://en.wikipedia.org/wiki/
Richard_Threlkeld_Cox
10. De Finetti, B.: La Prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut
Henri Poincaré (1937). An English translation is in Studies in Subjective Probability, Kyburg
and Smokler (eds), 1964
11. Fischhoff, B.: The realities of risk-cost-benefit analysis. Science 350(6260) (2015).
doi:10.1126/science.aaa6516
12. Fishburn, P.C.: Utility Theory for Decision Making. Wiley, New York (1970)
13. Fishburn, P.C.: Personal communication (1970)
14. Haidt, J.: The Righteous Mind: Why Good People Are Divided by Politics and Religion.
Vintage Books, New York (2013)
15. Howard, R.A.: Decision analysis: applied decision theory. In: Hertz, D.B., Melese, J. (eds.)
Proceedings of the Fourth International Conference on Operational Research, pp. 55–71. Wiley-
Interscience, New York (1966)
16. Howard, R.A.: The foundations of decision analysis. IEEE Trans. Syst. Sci. Cybern. SSC-4(3), 1–9 (1968)
17. Howard, R.A., Abbas, A.E.: Foundations of Decision Analysis. Pearson, New York (2015)
18. Howard, R.A., Matheson, J.E.: Influence diagrams. Decis. Anal. 2(3), 127–143 (2005)
19. Howard, R.A., Matheson, J.E., North, D.W.: The decision to seed hurricanes. Science 176, 1191–1202 (1972)
20. Jaynes, E.T.: Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 4(3), 227–241 (1968)
21. Jaynes, E.T.: Probability Theory: The Logic of Science. Cambridge University Press, Cambridge (2003). Earlier versions of this book were circulated as colloquium lectures, Socony Mobil Oil Company, 1958, then in a version available online, prior to Jaynes' death in 1998. The Cambridge University Press version was edited by G. Larry Bretthorst and published five years after Jaynes' death
22. Jeffreys, H.: Theory of Probability, 3rd edn. Clarendon Press, Oxford (1939, 1961)
23. Jet Propulsion Laboratory News: http://mars.nasa.gov/mro/news/whatsnew/index.cfm?
FuseAction=ShowNews&NewsID=1183 (2011)
24. Jet Propulsion Laboratory News: http://www.jpl.nasa.gov/news/news.php?feature=4722 (2015)
25. Judd, B.R., Warner North, D., Pezier, J.P.: Assessment of the Probability of Contaminating
Mars., Report No. MSU-2788, prepared for the Planetary Programs Division, National
Aeronautics and Space Administration by SRI International, Menlo Park, 156 pp. (1974).
Available at: http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19740018186.pdf
26. Kahneman, D.: Thinking, Fast and Slow. Farrar, Strauss, and Giroux, New York (2011)
27. Kahneman, D., Tversky, A., Slovic, P.: Judgment Under Uncertainty: Heuristics and Biases.
Cambridge University Press, New York (1982)
28. Kok, J.F., Parteli, E.J.R., Michaels, T.I., Karam, D.B.: The physics of wind-blown sand and
dust. Rep. Prog. Phys. 75, 106901 (2012). Available at http://arxiv.org/ftp/arxiv/papers/1201/
1201.4353.pdf
1398 D.W. North
29. Kyburg, H.E., Jr., Smokler, H.E. (eds.): Studies in Subjective Probability. Wiley, New York
(1964)
30. Laplace, P.-S., marquis de: Essai philosophique sur les probabilités (1814). English translation,
Dover, New York (1951)
31. Loève, M.: Probability Theory, 4th edn. 1963, Springer, New York (1977)
32. Luce, R.D., Raiffa, H.: Games and Decisions: Introduction and Critical Survey. Wiley,
New York (1957). Reprinted by Dover (1989)
33. Morgan, M.G.: The use (and abuse) of expert elicitation in support of decision making for
public policy. Proc. Natl. Acad. Sci. 111, 7176–7184 (2014)
34. Morgan, M.G.: Commentary: our knowledge of the world is not simple: policy makers should
deal with it. Risk Anal. 35(1), 19–20 (2015)
35. National Research Council: Understanding Risk: Informing Decisions in a Democratic Society.
National Academy Press, Washington, D.C. (1996)
36. National Research Council: Public Participation in Environmental Assessment and Decision
Making. National Academy Press, Washington, D.C. (2008)
37. Navarro, D., Perfors, A.: An introduction to the Beta-Binomial Model, University of Adelaide.
https://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf (undated). Last accessed
2 Feb 2016
38. North, D.W.: The invariance approach to the probabilistic encoding of information. Ph.D.
dissertation, Stanford University (1970). Available at: http://www.northworks.net/phd_thesis.
pdf
39. North, D.W.: Limitations, definitions, principles, and methods of risk analysis. Rev. Sci. Tech. Off. Int. Epiz. 14(4), 913–923 (1995)
40. North, D.W.: Review of five books on climate change. Risk Anal. 35(12), 2221–2227 (2015)
41. North, D.W., Judd, B.R., Pezier, J.P.: New methodology for assessing the probability of
contaminating Mars. Life Sci. Space Res. XIII, 103–109. Akademie-Verlag, Berlin (1975)
42. Phillips, L., von Winterfeldt, D.: Reflections on the Contributions of Ward Edwards to Decision
Analysis and Behavioral Research, Working paper, London School of Economics. eprints.
lse.ac.uk/22711/1/06086.pdf (2006). Chapter 5 in Advances in Decision Analysis. Cambridge
University Press, Cambridge (2007)
43. Pratt, J.: Risk aversion in the small and in the large. Econometrica 32(1/2), 122–136 (1964)
44. Pratt, J., Raiffa, H., Schlaifer, R.: Introduction to Statistical Decision Theory. MIT, Cambridge
(1995)
45. Raiffa, H.: Decision Analysis: Introductory Lectures on Choices under Uncertainty. Addison-
Wesley, Reading (1968)
46. Raiffa, H., Schlaifer, R.: Applied Statistical Decision Theory. Graduate School of Business,
Harvard University, Cambridge, MA (1961). Reprinted, Wiley Classics Library, New York
(2000)
47. Ramsey, F.P.: Truth and Probability (1926). Reprinted in: Studies in Subjective Probability,
Kyburg and Smokler (eds.) 1964. Also, the text of this paper appears in the book by Ramsey,
The Foundations of Mathematics and Other Essays. Harcourt Brace and Company, New
York (1931) and is available at: http://socserv2.socsci.mcmaster.ca/~econ/ugcm/3ll3/ramseyfp/
ramsess.pdf
48. Savage, L.J.: The Foundations of Statistics. Wiley, New York (1954). 2nd revised edition,
Dover, New York (1972)
49. Space Science Board: National Research Council, Biological Contamination of Mars.
National Academy Press, Washington, D.C. http://www.nap.edu/catalog.php?record_id=12305
(1992)
50. Spetzler, C., Staël von Holstein, C.-A.S.: Probability encoding in decision analysis. Manag.
Sci. 22, 340–358 (1975)
51. Stanford Encyclopedia of Philosophy, Modal Logic entry: http://plato.stanford.edu/entries/
logic-modal/. Last accessed 2 Feb 2016
52. Taleb, N.N.: Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets,
2001, updated second edition. Random House, New York (2005)
53. Taleb, N.N.: The Black Swan. Random House, New York (2007)
54. Taleb, N.N.: Antifragile: Things That Gain from Disorder. Random House, New York (2012)
55. Tetlock, P.E., Gardner, D.: Superforecasting: The Art and Science of Prediction. Crown
Publishers, New York (2015)
56. Tribus, M.: Rational Descriptions, Decisions, and Designs, Pergamon Press, Elmsford (1969)
57. Vaihinger, H.: The Philosophy of As-If: A System of the Theoretical, Practical, and Religious
Fictions of Mankind. Published in German (1911); in England (1924). Available in a translation
by C.K. Ogden, New York, Barnes and Noble. http://en.wikipedia.org/wiki/Hans_Vaihinger
(1968). Last accessed 2 Mar 2015
58. Van Horn, K.S.: Constructing a logic of probable inference: a guide to Cox's theorem. Int. J.
Approx. Reason. 34(1), 3–24 (2003)
59. Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior, 2nd edn. 1947,
paperback edition, 2007, Princeton University Press, Princeton
60. Von Winterfeldt, D., Edwards, W.: Decision Analysis and Behavioral Research. Cambridge
University Press, New York (1986)
61. Wald, A.: Statistical Decision Functions. Wiley, New York (1950)
62. Wiki, Arrow's impossibility theorem: https://en.wikipedia.org/wiki/Arrow%27s_
impossibility_theorem. Last accessed 30 Jan 2016
63. Wiki, Bayesian network: https://en.wikipedia.org/wiki/Bayesian_network. Last accessed 30
Jan 2016
64. Wiki, Cox's theorem: https://en.wikipedia.org/wiki/Cox%27s_theorem. Last accessed 30 Jan
2016
65. Wiki, De Finetti's theorem: https://en.wikipedia.org/wiki/De_Finetti%27s_theorem. Last
accessed 7 Feb 2016
66. Wiki, Influence diagram: https://en.wikipedia.org/wiki/Influence_diagram. Last accessed on
30 Jan 2016
67. Wiki, Maxwell-Boltzmann distribution: https://en.wikipedia.org/wiki/Maxwell%E2%80
%93Boltzmann_distribution. See also: http://farside.ph.utexas.edu/teaching/sm1/lectures/
node72.html. Both last accessed 7 Feb 2016
68. Wiki, Monty Hall problem: https://en.wikipedia.org/wiki/Monty_Hall_problem. Last accessed
14 Nov 2015
69. Wiki, Probability interpretations: http://en.wikipedia.org/wiki/Probability_interpretations. Last
accessed 17 Nov 2015
70. Wiki, Project Stormfury: https://en.wikipedia.org/wiki/Project_Stormfury. Last accessed 15
Nov 2015
71. Wiki, Voting paradox: https://en.wikipedia.org/wiki/Voting_paradox. Last accessed 30 Jan
2016
72. Winkler, R.: Equal versus differential weighting in combining forecasts. Risk Anal. 31(1),
16–18 (2015)
41 Validation, Verification, and Uncertainty Quantification for Models with Intelligent Adversaries
Abstract
Model verification and validation (V&V) are essential before a model can be
implemented in practice. Integrating model V&V into the process of model
development can help reduce the risk of errors, enhance the accuracy of the
model, and strengthen the confidence of the decision-maker in model results.
Alongside V&V, uncertainty quantification (UQ) techniques are used to verify
and validate computational models. Modeling intelligent adversaries is different
from, and more difficult than, modeling non-intelligent agents; yet it is critical
to infrastructure protection and national security. Model V&V and UQ for
intelligent adversaries present a significant challenge. This chapter first reviews
the concepts of model V&V and UQ in the literature and then discusses model
V&V and UQ for intelligent adversaries. Several V&V techniques for modeling
intelligent adversaries are provided, which could benefit model developers and
decision-makers facing intelligent adversaries.
Keywords
Decision making · Intelligent adversaries · Model validation and verification · Validation techniques
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1402
2 Model Verification vs. Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1403
2.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1403
2.2 Relationships Between Verification and Validation . . . . . . . . . . . . . . . . . . . . . . . . . 1405
3 Validation, Verification, and UQ in the Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1405
3.1 Validation, Verification, and UQ in Model Development Process . . . . . . . . . . . . . . 1405
3.2 Quantitative Model Validation Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1406
1 Introduction
Models have been used extensively in research to describe systems and predict
scenarios. Model verification and validation (V&V) can quantify confidence
in the accuracy of model-based predictions under certain assumptions.
Verification refers to building the system right, while validation refers to building
the right system [47]. Verification is conducted before validation. The verification
process includes assessing code verification and calculation verification. Validation
consists of conceptual model validity and operational validity. There are many
approaches to validation, such as validation by assumption, validation by results,
and validation by common sense. Validation techniques include animation, comparison
to other models, degenerate tests, event validity, extreme condition tests, face validity,
fixed values, historical data validation, historical methods, internal validity, multistage
validation, operational graphics, parameter variability, predictive validation, traces,
and Turing tests [58]. See the explanations of the techniques in Table 41.1.
Academia, industry, and government have been interested in model validation.
Some terms related to models, such as reliability, credibility, confidence,
and applicability, have become common in academic and industrial studies, as
well as government reports and implementations. A Standards Committee for the
development of model V&V procedures for computational solid mechanics models
has been formed by the American Society of Mechanical Engineers (ASME); a
V&V model for all safety-related nuclear facility design, analyses, and operations
has been supported by the Defense Nuclear Facilities Safety Board (DNFSB); and
validation of complex models has been a key concern of the military simulation
community for over three decades [39].
Considerable attention has been paid to model verification and validation.
Numerous articles have appeared in the literature expressing different concerns
about the validity of the models that have been proposed. The advances in the techniques
of modeling and solution have impacted how people perceive model validation.
For details about model V&V and about computational simulation models [57–59],
see [44]. Validation methods, procedures for economic and financial models, urban
and transportation models, government and criminology models, and medical and
physiological models have also been studied [16].
The V&V approach quantifies the degree of accuracy and confidence inferred
from the comparison of model predictions with results observed in reality.
2.1 Terminology
Table 41.2 Definitions of verification and validation by DMSO and IEEE (Source: [7, 24])

DMSO:
  Verification: the process of determining that a model implementation accurately represents the developer's conceptual description and specifications.
  Validation: the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model.

IEEE:
  Verification: the process of evaluating a system or component to determine whether the products of a given development phase satisfy the conditions imposed at the start of that phase.
  Validation: the process of evaluating a system or component during or at the end of the development process to determine whether it satisfies specified requirements.
In standardizing terminology for V&V, the Department of Defense (DoD) Modeling and Simulation
Office (DMSO) has been the leader and has played a major role in attempting to
standardize the definitions of V&V [7, 8]. In addition, DMSO developed the
US fundamental concepts and terminology for model V&V applied to high-level
systems such as ballistic missile defense and battle management simulations [71].
There is a variety of formal definitions. The Defense Modeling and Simulation
Office (DMSO) of the Department of Defense (DoD) and the Institute of
Electrical and Electronics Engineers (IEEE) give the most widely used definitions
of the terms verification and validation [44]; see Table 41.2.
Software engineering and software quality assurance use the IEEE definitions
[44]. By contrast, computational simulation in science and engineering and
operations research use the DMSO definitions. The DMSO definitions are widely adopted
by [1, 32, 33, 55, 63, 71], as well as in this chapter. Thacker et al. [71] state that
software V&V is fundamentally different from model V&V: software V&V is required
when a computer program or code is the end product, whereas model V&V is required
when a predictive model is the end product. A code is the computer implementation of
algorithms developed to facilitate the formulation and approximate solution of a
class of models.
Different papers have defined model V&V differently according to specific
contexts. For example, [59] defines model validation as "substantiation that a
computerized model within its domain of applicability possesses a satisfactory
range of accuracy consistent with the intended application of the model"; according
to [15], model validation refers to activities "to establish how closely the model
mirrors the perceived reality of the model user/developer team"; [36] defines model
validation as activities "designed to determine the usefulness of a model; i.e., whether it
is appropriate for its intended use(s); whether the benefits of improving model
usefulness exceed the costs; whether the model contributes to making better
decisions; and possibly how well the particular model performs compared to
alternative models"; and [46] defines verification and validation as "the process of
determining the accuracy with which a computational model can produce results
deliverable by the mathematical model on which it is based" (code verification and
solution verification) and "the process of determining the accuracy with which a
model can predict observed physical events (or the important features of a physical
reality)", respectively.
Model verification and validation (V&V) are the primary processes for quantifying
and building credibility in numerical models, and they are essential parts of the model
development process if models are to be accepted and used to support decision-making.
Verification and validation are often mentioned together in requirements to test
models, yet they are fundamentally different.
Verifying a model means checking if the model produces the intended output as
a function of the inputs, and whether it is mathematically correct. The focus here
is on the model implementation and coding [18]. Verification is a critical activity,
but is not the same as validation. Fundamentally, model validation is subjective,
and different perspectives on model validation could have very different meanings.
The assertion "the model was judged valid" can mean almost anything, since the
modelers choose the validity tests, the criteria for passing those tests, what model
outputs to validate, what setting to test in, what data to use, etc. [37].
Verification is a matter of asking "Did I build the thing right?" and "Have
the model and the simulation been built so that they fully satisfy the developer's
intent?". By contrast, validation asks "Did I build the right thing?" and "Will the
model be able to adequately support its intended use? Is its fidelity appropriate
for that?" [49].
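The "build the thing right" check can be illustrated by calculation verification against a case with a known exact answer; the solver, test problem, and tolerance below are illustrative assumptions, not from the chapter:

```python
import math

# Calculation verification: check that the implementation reproduces a case
# with a known exact answer. A forward-Euler solver for dy/dt = -y, y(0) = 1
# is compared against the exact solution e^(-t) at t = 1.
def euler(f, y0, t_end, n):
    h, y, t = t_end / n, y0, 0.0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

approx = euler(lambda t, y: -y, 1.0, 1.0, 10_000)
exact = math.exp(-1.0)
assert abs(approx - exact) < 1e-4, "implementation fails verification"
```

Passing such a check says nothing about whether the model itself is the right one for its intended use; that is the province of validation.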
The purpose of model verification and validation is to assess and improve
credibility, accuracy, and trustworthiness. Model verification and validation cannot
certify a model to be accurate for all scenarios, but they can provide evidence that a
model is sufficiently accurate for its intended use [71].
U.S. GAO [76] introduces basic steps in the modeling process, which include
(1) describing the problem, (2) isolating the system, (3) adopting supporting theory,
(4) formulating the model, (5) analyzing data requirements and collecting data,
(6) developing the computer program, (7) debugging the computer program,
(8) developing alternative solutions, (9) evaluating model output/results,
(10) presenting results/plans, (11) developing model maintenance procedures, and
(12) transferring the system to users. According to [76], steps (1)–(7) are covered by
model verification, and steps (1)–(11) are covered by model validation. Yet the
absence of necessary information often makes it hard to follow these steps to
validate models. Gass [15] adopts this modeling process, which aims at indicating
how the research community concerned with policy models is attempting to develop
and test procedures for improving the role of models as decision aids.
Over the past three decades, new dimensions have been brought to the notion
of model V&V by large-scale computer-based mathematical and simulation models
1406 J. Zhang and J. Zhuang
[30]. The Sargent Circle in simulation validation is one of the earliest and most
influential among all the paradigms of the relationships among V&V activities [56].
In the Sargent Circle [56], conceptual model validation is defined as "determining
that the theories and assumptions underlying the conceptual model are correct and
that the model representation of the problem entity is reasonable for the intended
purpose of the model"; computerized model verification is defined as "assuring
that the computer programming and implementation of the conceptual model is
correct"; and operational validation is defined as "determining that the model's
output behavior has sufficient accuracy for the model's intended purpose over
the domain of the model's intended applicability". Most validation takes place in
operational validation. In order to obtain a high degree of confidence in the model
operational validation. In order to obtain a high degree of confidence in the model
and its results, the model developers need to compare the input-output behaviors of
the model and the system. There are three basic comparison approaches: (1) graphs
of the model and system behavior data, (2) confidence intervals, and (3) hypothesis
tests. For details of the methodologies, see [2, 4, 31, 33].
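Comparison approach (3) can be sketched as follows; the sample values are synthetic placeholders, not data from the chapter:

```python
import math
from statistics import mean, stdev

# A two-sample (Welch) t-test comparing system observations with model
# output over matched experimental conditions.
system_obs = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 10.4]
model_out = [10.1, 10.0, 10.6, 9.9, 10.2, 10.3, 10.1, 10.5]

m1, m2 = mean(system_obs), mean(model_out)
s1, s2 = stdev(system_obs), stdev(model_out)
n1, n2 = len(system_obs), len(model_out)
t_stat = (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
print(f"t = {t_stat:.3f}")  # |t| well below ~2: no evidence the means differ
```

Confidence intervals (approach 2) invert the same statistic: an interval for the mean difference that contains zero tells the same story as a non-significant t.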
A detailed schematic of the model V&V in model development process is given
by [71]. There are two branches in the procedure of model V&V: the first is to obtain
relevant and high-quality experimental data via physical testing; and the second is
to develop and exercise the model.
Because of inherent randomness, uncertainties cannot be ignored. Uncertainty
quantification (UQ) arises in both performing experiments and developing
models, which underscores the importance of UQ in improving confidence in
both experimental and model outcomes.
By comparison with experimental data, validation is able to quantify the confi-
dence in the predictive capability of the model. For more definitions, uncertainties,
and model explanations, see [1, 45, 52, 53, 70, 71].
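One common way to quantify how input uncertainty propagates to model outcomes is Monte Carlo sampling; a minimal sketch, with an illustrative stand-in model and input distribution (not from the chapter):

```python
import random
import statistics

# Monte Carlo uncertainty propagation: sample the uncertain input, run the
# model, and summarize the resulting output distribution.
random.seed(0)

def model(k):
    return 3.0 * k + 1.0  # stand-in for an expensive simulation

outputs = [model(random.gauss(2.0, 0.1)) for _ in range(10_000)]
mu = statistics.mean(outputs)
sd = statistics.stdev(outputs)
print(f"output mean ~ {mu:.2f}, std ~ {sd:.2f}")  # roughly 7.00 and 0.30
```

The same sampling machinery applies on the experimental side, where measurement uncertainty can be represented as a distribution over the observed data.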
The literature discusses quantitative model validation techniques from the perspectives
of both hypothesis testing-based and non-hypothesis testing-based methods and gives
a systematic procedure for quantitative model validation. The quantitative model
validation techniques include classical hypothesis testing, Bayesian hypothesis testing,
confidence intervals, the reliability-based metric, and the area metric-based method.
For details and examples of these quantitative model validation techniques, see [13, 26, 32].
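As one illustration, the area metric measures the area between the empirical CDFs of model predictions and experimental observations; a minimal sketch with synthetic placeholder data:

```python
# Area metric sketch: the validation metric is the area between the
# empirical CDFs of model predictions and experimental observations. With
# equal sample sizes, this area reduces to the mean absolute difference
# between matched order statistics.
model_preds = sorted([1.02, 0.97, 1.10, 1.05, 0.99, 1.03])
experiments = sorted([1.00, 0.95, 1.08, 1.06, 0.98, 1.01])

area = sum(abs(m - e) for m, e in zip(model_preds, experiments)) / len(model_preds)
print(f"area metric = {area:.4f}")  # smaller area = closer agreement
```

Unlike a hypothesis test, the metric returns a magnitude of disagreement in the units of the response, which can be compared directly against an accuracy requirement.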
It is difficult to estimate the losses resulting from terrorist incidents, since it is hard
to assess when, where, and how terrorists would strike. Meanwhile, as technologies
evolve, adaptive terrorists could mount new types of attacks. In the reality of
counterterrorism there are many uncertainties, and the models that have been built
rest on assumptions and dubious parameters. Unless the models are validated, the
model results may not be trustworthy.
It is possible to make empirical observations to compare with the prediction
results from the corresponding models. For example, [75] proposes a prospect
theory model of coaches' utility and estimates the model's parameters using
data from the 2009 NFL season; [19] studies parking choice models, which are first
calibrated with data collected from video-recorded observations of a set of parking
lots on the University at Buffalo north campus and then used to predict drivers'
behavior. Intelligent adversaries are more difficult to model, since they are adaptive
and may have unknown preferences, beliefs, and capabilities. Guikema [17]
indicates that the key difference between risk assessment for situations with
intelligent adversaries and traditional risk assessment problems is that intelligent
adversaries adapt: they adapt to observed, perceived, and imputed likely future
actions by those defending the system they are attempting to damage. This adaptive
behavior must be considered if risk assessment models are to provide accurate
estimates of future risk from intelligent adversaries and appropriately support risk
management decision making.
Also, [3] claims that adversary risk analysis has three special uncertainties: (1)
aleatory uncertainty (randomness of outcomes), (2) epistemic uncertainty (strategic
choices of an intelligent adversary), and (3) concept uncertainty (beliefs about how
the problems are framed).
Validating models of the probable behaviors of intelligent adversaries may not
mean comparing model results with existing or experimental data, as in validating
traditional models, because the data, if any, are often incomplete and sometimes
classified. In terms of validating counterterrorism models, [65] states that for
terrorist acts, validation is only possible in a limited sense and may be more
correctly characterized as ensuring the models are reasonable or credible,
performing sanity checks, ensuring consistency with what is known about terrorist
groups, and not being able to invalidate the model. Validity, in this case, is viewed
as a range, not a binary valid/invalid assessment.
1. Identifying and evaluating the validity of key assumptions implicit in the overall
system design;
2. Comparing the world representation in RMAT to external sources;
3. Assessing the completeness of the attack scenarios considered in RMAT, includ-
ing both weapon-target pairings and pathways by which attacks are carried out;
4. Comparing the attack consequences modeled in RMAT to external sources.
Regarding the data, diverse forms of evidence, such as logic, subject matter
expert judgments, and literature searches, are used for validation.
to replicate and control the environment. The results of the experiment reveal
behavioral patterns that are consistent with the predicted patterns. Holt et al. [23]
provide theoretical analysis and experimental validation to guide policy makers in
improving the effectiveness of targeted profiled screening and investigate efficient
profiling and counterterrorism policy.
Based on the historical data from the GTD and UASI (the Urban Area Security
Initiative, a Department of Homeland Security grant program), the parameter values
are estimated in [78], such as the economic loss, fatality loss, success probability,
and the defense cost-effectiveness for different attack types and targets. The authors
estimate that basing defensive planning on the proposed model results in the lowest
expected loss, which is 8–57% lower than that of the single attack-type model and
82–96% lower than that of the real allocation [78].
Despite the fact that Stackelberg game-based applications have been deployed in
practice, measuring the effectiveness of the applications remains a difficult problem,
and the data available about the deterrence of real-world terrorist attacks are very
limited. Tambe and Shieh [67] suggest several methods to evaluate the effectiveness
of Stackelberg games in real-world deployments, including:
lexicon for defining each technical term appearing in its reports and presentations,
and that, to assess the probabilities of terrorist decisions, DHS should use elicitation
techniques and decision-oriented models that explicitly recognize terrorists as
intelligent adversaries who observe U.S. defensive preparations and seek to maximize
the achievement of their own objectives. The review [10] concludes that the BTRA is
not valid and suggests that DHS not continue the development of that model.
5 Conclusion
Acknowledgements This research was partially supported by the United States Department
of Homeland Security (DHS) through the National Center for Risk and Economic Analysis
of Terrorism Events (CREATE) under award number 2010-ST-061-RE0001. This research was
also partially supported by the United States National Science Foundation under award numbers
1200899 and 1334930. However, any opinions, findings, and conclusions or recommendations in
this document are those of the authors and do not necessarily reflect views of the DHS, CREATE,
or NSF. The authors assume responsibility for any errors.
References
1. AIAA: AIAA guide for the verification and validation of computational fluid dynamics
simulation. AIAA-G-077-1998, Reston (1998)
2. Balci, O., Sargent, R.G.: A methodology for cost-risk analysis in the statistical validation of
simulation models. Commun. ACM 24(4), 190–197 (1981)
3. Banks, D.: Adversarial Risk Analysis: Principles and Practice. Presentation at the First
Conference on Validating Models of Adversary Behaviors, Buffalo (2013)
4. Banks, J., Carson II, J.S., Nelson, B.L.: Discrete-Event System Simulation, 2nd edn. Prentice
Hall International, London (1996)
5. Borgonovo, E.: Measuring uncertainty importance: investigation and comparison of alternative
approaches. Risk Anal. 26(5), 1349–1361 (2006)
6. Coleman, H.W., Steele, W.G.: Experimentation, Validation, and Uncertainty Analysis for
Engineers. Wiley, Hoboken (2009)
7. DoD: DoD directive No 5000.59: Modeling and Simulation (M&S) Management. Defense
Modeling and Simulation Office, Office of the Director of Defense Research and Engineering
(1994)
8. DoD: Verification, Validation, and Accreditation (VV&A) Recommended Practices Guide.
Defense Modeling and Simulation Office, Office of the Director of Defense Research and
Engineering (1996)
9. DoD: Special Topic on Subject Matter Experts and Validation, Verification and Accredi-
tation, DoD Recommended Practices Guide (RPG) for Modeling and Simulation VV&A,
Millennium Edition (2000)
10. DHS: Department of Homeland Security Bioterrorism Risk Assessment: A Call for Change.
Available at http://www.nap.edu/catalog/12206.html (2006). Accessed in Nov 2015
11. Elovici, Y., Kandel, A., Last, M., Shapira, B., Zaafrany, O.: Using data mining techniques for
detecting terror-related activities on the Web. Available at http://www.ise.bgu.ac.il/faculty/
mlast/papers/JIW_Paper.pdf. Accessed in Nov 2015
12. Ezell, B.C., Bennett, S.P., Winterfeldt, D., Sokolowski, J., Collins, A.J.: Probabilistic risk
analysis and terrorism risk. Risk Anal. 30(4), 575–589 (2010)
13. Ferson, S., Oberkampf, W.: Validation of imprecise probability models. Int. J. Reliab. Saf. 3(1),
3–22 (2009)
14. Garrick, B.J., Hall, J.E., Kilger, M., McDonald, J.C., O'Toole, T., Probst, P.S., Parker, E.R.,
Rosenthal, R., Trivelpiece, A.W., Arsdale, L.V., Zebroski, E.L.: Confronting the risks of
terrorism: making the right decisions. Reliab. Eng. Syst. Saf. 86(2), 129–176 (2004)
15. Gass, S.I.: Decision-aiding models: validation, assessment, and related issues for policy
analysis. Oper. Res. 31(4), 603–631 (1983)
16. Gruhl, J., Gruhl, H.: Methods and Examples of Model Validation – An Annotated Bibliography.
MIT Energy Laboratory Working Paper MIT-EL 78-022WP (1978)
17. Guikema, S.: Modeling intelligent adversaries for terrorism risk assessment: some necessary
conditions for adversary models. Risk Anal. 32(7), 1117–1121 (2012)
18. Guikema, S., Reilly, A.: Perspectives on Validation of Terrorism Risk Analysis Models.
Presentation at the First Conference on Validating Models of Adversary Behaviors, Buffalo (2013)
41 Validation, Verification, and Uncertainty Quantification for Models: : : 1417
19. Guo, L., Huang, S., Zhuang, J.: Modeling parking behavior under uncertainty: a static game
theoretic versus a sequential neo-additive capacity modeling approach. Netw. Spat. Econ.
13(3), 327–350 (2013)
20. Hausken, K., Zhuang, J.: The impact of disaster on the interaction between company and
government. Eur. J. Oper. Res. 225(2), 363–376 (2013)
21. Hemez, F.M., Doebling, S.W.: Model validation and uncertainty quantification. In: Proceedings
of IMAC-XIX, the 19th International Modal Analysis Conference, Kissimmee, 5–8 Feb 2001
22. Hills, R.G., Leslie, I.H.: Statistical validation of engineering and scientific models: validation
experiments to application. Sandia Technical Report SAND2003-0706 (2003)
23. Holt, C.A., Kydd, A., Razzolini, L., Sheremeta, R.: The Paradox of Misaligned Profiling:
Theory and Experimental Evidence. Available at http://www.people.vcu.edu/lrazzolini/
Profiling.pdf (2014). Accessed in Nov 2015
24. IEEE: IEEE Standard Glossary of Software Engineering Terminology. IEEE Std 610.12-1990,
New York (1991)
25. International Terrorism: Attributes of Terrorist Events (ITERATE). Available at http://library.
duke.edu/data/collections/iterate. Accessed in Nov 2015
26. Jiang, X., Mahadevan, S.: Bayesian risk-based decision method for model validation under
uncertainty. Reliab. Eng. Syst. Saf. 92(6), 707–718 (2007)
27. John, R., Rosoff, H.: Validation of Proxy Random Utility Models for Adaptive Adver-
saries. Available at http://psam12.org/proceedings/paper/paper_437_1.pdf (2014). Accessed in
Nov 2015
28. Jose, V.R.R., Zhuang, J.: Technology adoption, accumulation, and competition in multi-
period attacker-defender games. Mil. Oper. Res. 18(2), 33–47 (2013)
29. LaFree, G., Dugan, L.L.: Introducing the global terrorism database. Terror. Political Violence
19(2), 181–204 (2007)
30. Landry, M., Malouin, J.L., Oral, M.: Model validation in operations research. Eur. J. Oper. Res.
14(3), 207–220 (1983)
31. Law, A.M., Kelton, W.D.: Simulation Modeling and Analysis, 2nd edn. McGraw-Hill, New
York (1991)
32. Ling, Y., Mahadevan, S.: Quantitative model validation techniques: new insights. Reliab. Eng.
Syst. Saf. 111, 217–231 (2013)
33. Liu, Y., Chen, W., Arendt, P., Huang, H.: Toward a better understanding of model validation
metrics. J. Mech. Des. 133(7), 1–13 (2011)
34. Louisa, N., Johnson, C.W.: Validation of Counter-terrorism Simulation Models. Available at
http://www.dcs.gla.ac.uk/~louisa/Publications_files/ISSC09_Paper_2.pdf (2009). Accessed in
Nov 2015
35. Mason, R., McInnis, B., Dalal, S.: Machine learning for the automatic identification of
terrorist incidents in worldwide news media. In: 2012 IEEE International Conference on
Intelligence and Security Informatics (ISI), Washington, DC, pp. 84–89 (2012)
36. McCarl, B.A.: Model validation: an overview with some emphasis on risk models. Rev. Market.
Agric. Econ. 52(3), 153–173 (1984)
37. McCarl, B.A., Spreen, T.H.: Validation of Programming Models. Available at http://agecon2.
tamu.edu/people/faculty/mccarl-bruce/mccspr/new18.pdf (1997). Accessed in Nov 2015
38. Merrick, J., Parnell, G.S.: A comparative analysis of PRA and intelligent adversary methods
for counterterrorism risk management. Risk Anal. 31(9), 1488–1510 (2011)
39. Morral, A.R., Price, C.C., Ortiz, D.S., Wilson, B., LaTourrette, T., Mobley, B.W., McKay, S.,
Willis, H.H.: Modeling Terrorism Risk to the Air Transportation System: An Independent
Assessment of TSA's Risk Management Analysis Tool and Associated Methods. RAND report.
Available at http://www.rand.org/content/dam/rand/pubs/monographs/2012/RAND_MG1241.
pdf (2012). Accessed in Nov 2015
40. NRC: Review of the Department of Homeland Security's Approach to Risk Analysis. Available
at https://www.fema.gov/pdf/government/grant/2011/fy11_hsgp_risk.pdf (2010). Accessed in
Nov 2015
1418 J. Zhang and J. Zhuang
41. NRC: Bioterrorism Risk Assessment. Biological Threat Characterization Center of the
National Biodefense Analysis and Countermeasures Center, Fort Detrick (2008)
42. Oberkampf, W., Barone, M.: Measures of agreement between computation and experiment:
validation metrics. J. Comput. Phys. 217(1), 5–36 (2006)
43. Oberkampf, W., Trucano, T.: Verification and validation in computational fluid dynamics.
Progr. Aerosp. Sci. 38(3), 209–272 (2002)
44. Oberkampf, W.L.: Bibliography for Verification and Validation in Computational Simulation.
Sandia Report (1998)
45. Oberkampf, W.L., Trucano, T.G., Hirsch, C.: Verification, validation, and predictive capability
in computational engineering and physics. Appl. Mech. Rev. 57(5), 345–384 (2004)
46. Oden, J.T.: A Brief View of Verification, Validation, and Uncertainty Quantification. Avail-
able at http://users.ices.utexas.edu/~serge/WebMMM/Talks/Oden-VVUQ-032610.pdf (2009).
Accessed in Nov 2015
47. O'Keefe, R.M., O'Leary, D.E.: Expert system verification and validation: a survey and tutorial.
Artif. Intell. Rev. 7, 3–42 (1993)
48. Oliver, R.W.: What Is Transparency? McGraw-Hill, New York (2004)
49. Pace, D.K.: Modeling and simulation verification and validation challenges. Johns Hopkins
APL Tech. Dig. 25(2), 163–172 (2004)
50. Sarin, R.K., Keller, L.R.: From the editors: probability approximations, anti-terrorism
strategy, and bull's-eye display for performance feedback. Decis. Anal. 10(1), 1–5 (2013)
51. Rebba, R., Mahadevan, S.: Validation of models with multivariate output. Reliab. Eng. Syst.
Saf. 91(8), 861–871 (2006)
52. Roache, P.J.: Verification and Validation in Computational Science and Engineering. Hermosa
Publishers, Albuquerque (1998)
53. Salari, K., Knupp, P.: Code Verification by the Method of Manufactured Solutions. Sandia
National Laboratories, SAND2000-1444 (2000)
54. Saltelli, A., Tarantola, S.: On the relative importance of input factors in mathematical models:
safety assessment for nuclear waste disposal. J. Am. Stat. Assoc. 97(459), 702–709 (2002)
55. Sankararaman, S., Mahadevan, S.: Model validation under epistemic uncertainty. Reliab. Eng.
Syst. Saf. 96(9), 1232–1241 (2011)
56. Sargent, R.G.: An assessment procedure and a set of criteria for use in the evaluation of
computerized models and computer-based modeling tools. Final technical report RADC-TR-
80-409, U.S. Air Force (1981)
57. Sargent, R.G.: Some subjective validation methods using graphical displays of data. In:
Proceedings of the 1996 Winter Simulation Conference, Coronado, California (1996)
58. Sargent, R.G.: Verification and validation of simulation models. In: Proceedings of the 2009
Winter Simulation Conference, Austin, Texas, pp. 162–176 (2009)
59. Schlesinger, S., Crosbie, R.E., Innis, G.S., Lalwani, C.S., Loch, J., Sylvester, R.J., Wright,
R.D., Kheir, N., Bartos, D.: Terminology for model credibility. Simulation 32(3), 103–104
(1979)
60. Shan, X., Zhuang, J.: Cost of equity in homeland security resource allocation in the face of a
strategic attacker. Risk Anal. 33(6), 1083–1099 (2013)
61. Shan, X., Zhuang, J.: Hybrid defensive resource allocations in the face of partially strategic
attackers in a sequential defender-attacker game. Eur. J. Oper. Res. 228(1), 262–272 (2013)
62. Shieh, E., An, B., Yang, R., Tambe, M., Baldwin, C., DiRenzo, J., Maule, B., Meyer, G.:
PROTECT: a deployed game theoretic system to protect the ports of the United States. In:
AAMAS, Valencia, Spain (2012)
63. Sornette, D., Davis, A.B., Vixie, K.R., Pisarenko, V., Kamm, J.R.: Algorithm for model
validation: theory and applications. Proc. Natl. Acad. Sci. U. S. A. 104(16), 6562–6567 (2007)
64. START: Global Terrorism Database [data file]. Available at http://www.start.umd.edu/gtd.
Accessed in Nov 2015
65. Streetman, S.: The Art of the Possible in Validating Models of Adversary Behavior for Extreme
Terrorist Acts. Presentation at the First Conference on Validating Models of Adversary Behaviors,
Buffalo (2013)
66. Tambe, M.: Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned.
Cambridge University Press, New York (2011)
67. Tambe, M., Shieh, E.: Stackelberg Games in Security Domains: Evaluating Effectiveness of
Real-World Deployments. Presentation at the First Conference on Validating Models of Adversary
Behaviors, Buffalo (2013)
68. Tetlock, P.E., Gardner, D.: Superforecasting: The Art and Science of Prediction. Crown, New
York (2015)
69. Terrorism in Western Europe: Events Data (TWEED). Available at http://folk.uib.no/sspje/
tweed.htm. Accessed in Nov 2015
70. Thacker, B.H., Riha, D.S., Millwater, H.R., Enright, M.P.: Errors and uncertainties in prob-
abilistic engineering analysis. In: Proceedings of the 42nd AIAA/ASME/ASCE/AHS/ASC
Structures, Structural Dynamics, and Materials Conference and Exhibit, Seattle, Washington
(2001)
71. Thacker, B.H., Doebling, S.W., Hemez, F.M., Anderson, M.C., Pepin, J.E., Rodriguez, E.A.:
Concepts of Model Verification and Validation. Available at http://www.ltas-vis.ulg.ac.be/
cmsms/uploads/File/LosAlamos_VerificationValidation.pdf (2004). Accessed in Nov 2015
72. The Federal Emergency Management Agency (FEMA). Available at http://www.fema.gov/.
Accessed in Nov 2015
73. The National Research Council (NRC). Available at http://www.nationalacademies.org/nrc/.
Accessed in Nov 2015
74. Toubaline, S., Borrion, H., Sage, L.T.: Dynamic generation of event trees for risk modeling of
terrorist attacks. In: 2012 IEEE Conference on Technologies for Homeland Security (HST),
Waltham, MA, pp. 111–116 (2012)
75. Urschel, J., Zhuang, J.: Are NFL coaches risk and loss averse? Evidence from their use of
kickoff strategies. J. Quant. Anal. Sports 7(3), Article 14 (2011)
76. U.S. GAO: Guidelines for Model Evaluation. PAD-79-17, Washington, DC (1979)
77. U.S. Government Accountability Office (U.S. GAO). Available at http://www.gao.gov/.
Accessed in Nov 2015
78. Zhang, J., Zhuang, J.: Modeling a multi-period, multi-target attacker-defender game with
multiple attack types. Working paper (2015)
79. Zhang, J., Zhuang, J.: Defending remote border security with sensors and UAVs based on
network interdiction methods. Working paper (2015)
80. Zhuang, J., Bier, V.: Balancing terrorism and natural disasters – defensive strategy with endoge-
nous attacker effort. Oper. Res. 55(5), 976–991 (2007)
81. Zhuang, J., Saxton, G., Wu, H.: Publicity vs. impact in nonprofit disclosures and donor
preferences: a sequential game with one nonprofit organization and N donors. Ann. Oper. Res.
221(1), 469–491 (2014)
42 Robust Design and Uncertainty Quantification for Managing Risks in
Engineering
Ron Bates
Abstract
Methods for uncertainty quantification lie at the heart of the robust design
process. Good robust design practice seeks to understand how a product or pro-
cess behaves under uncertain conditions to design out unwanted effects such as
inconsistent or below-par performance or reduced life (and hence increased
service or total life cycle costs). Understanding these effects can be very costly,
requiring a deep understanding of the system(s) under investigation and of the
uncertainties to be guarded against.
This chapter explores applications of UQ methods in an engineering design
environment, including discussions on risk and decision, systems engineering,
and validation and verification. The need for a well-aligned hierarchy of high-
quality models is also discussed. These topics are brought together in an
Uncertainty Management Framework to provide an overall context for embed-
ding UQ methods in the engineering design process. Lastly, some significant
challenges to the approach are highlighted.
Keywords
Systems Engineering · Verification & Validation · Risk · Decision Making ·
Simulation · Uncertainty Propagation · Robust optimization
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1422
2 Risk and Decision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1423
3 Systems Engineering and Verification & Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1425
3.1 Simulation Verification & Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1426
R. Bates ()
Design Sciences, Engineering Capability, Rolls-Royce plc., Derby, UK
e-mail: Ron.Bates@Rolls-Royce.com
1 Introduction
This approach provides a natural framework for dealing with uncertainty: in the
system itself, in the network of models that represents the system, in model form,
and in model inputs and outputs. The last of these is the main focus of uncertainty
quantification methods in engineering.
The rest of this chapter describes in more detail the application of UQ in
the engineering design environment, followed by a set of challenges that need to
be overcome for application to complex engineering systems. The topics to be
discussed are:
The relationship between quality, risk, and uncertainty is worth exploring. Robust
design has its roots in the quality revolution in postwar Japan (Ishikawa, Deming),
where the connection was made between statistical methods and their application to
industry and management, leading to Deming's famous 14 key principles for
management, later embraced by the total quality management movement. The point here
is that we undertake UQ studies to improve our understanding of the effect of
uncertainty on the system (or subsystem, or component) in order to make a decision
as part of the product development process (PDP). It is all about those decisions.
There is a strong hierarchical link between probabilistic analysis, UQ, uncertainty
management, and risk (Fig. 42.1).
Taken in isolation, UQ studies on complex engineering systems are costly and
time-consuming to conduct. High cost can make it difficult to justify their use,
particularly in low-volume industries. Long-running studies may take weeks or even
months to complete, which may be too late to influence the design. However, the
design process itself is not a linear process, and in general, neither is innovation.
Instead ideas are generated, tested, and broken in an iterative process, and design
studies need to keep pace with this. Mapping this process to hierarchical models of
the system (integrated, multi-scale, multi-fidelity) may facilitate detailed UQ studies
early in the PDP, reducing the risk of costly design modifications later. This will be
discussed further in later sections.
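One way such a model hierarchy can pay off in a UQ study is sketched below with a toy multifidelity Monte Carlo estimator. Everything here is an assumption for illustration: the two analytic functions stand in for high- and low-fidelity simulations, and the control-variate pairing is just one of several possible multifidelity schemes, not a method prescribed by this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in models (purely illustrative): a cheap low-fidelity function
# that is well correlated with an "expensive" high-fidelity one.
def f_hi(x):
    return np.sin(x) + 0.05 * x ** 2  # pretend this costs hours per run

def f_lo(x):
    return np.sin(x)                  # pretend this costs milliseconds

x_paired = rng.normal(0.0, 1.0, 200)     # few affordable hi-fi runs
x_extra = rng.normal(0.0, 1.0, 20_000)   # many cheap lo-fi runs

y_hi, y_lo = f_hi(x_paired), f_lo(x_paired)
cov = np.cov(y_hi, y_lo)
alpha = cov[0, 1] / cov[1, 1]  # control-variate coefficient

# Multifidelity estimate of E[f_hi]: correct the small hi-fi sample mean
# using the low-fidelity mean estimated from many cheap runs.
mf_mean = y_hi.mean() + alpha * (f_lo(x_extra).mean() - y_lo.mean())
print(f"plain hi-fi estimate:   {y_hi.mean():.4f}")
print(f"multifidelity estimate: {mf_mean:.4f}")
```

The cheap model absorbs most of the sampling error, so far fewer expensive runs are needed for a given accuracy; this is the statistical payoff of keeping aligned models at several fidelities.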
When considering uncertainty, there are three main categories: (1) aleatory
uncertainty, the inherent variability of the system and its environment; (2)
epistemic uncertainty, arising from lack of knowledge; and (3) deep uncertainty,
which cannot readily be quantified.
The field of UQ is most often concerned with (1) and (2). Probabilistic studies
can provide a natural framework for dealing with aleatory uncertainty. Probability
The Cynefin model [1] provides a framework for integrating and expanding these
levels of uncertainty to encompass the wider system view and related decision-
making processes and appropriate modes of behavior for dealing with uncertainty.
Figure 42.3 shows an attempt at mapping the levels of uncertainty defined in
Fig. 42.2 into this framework. Where there is deep uncertainty, one is firmly on
the left-hand side of the Cynefin model.
In addition to facilitating UQ studies by helping with the quantification problem,
systems engineering can also help to address the issue of ineptitude or incom-
petence, for example, by providing a framework, or checklist, to facilitate the
implementation of best practice, or simply by having a structured approach that
minimizes the overhead of data transfer, data processing, model sharing, and model integration. The
issue of incomprehension is also worth considering at this point. A proposed solu-
tion to a problem may not be adequate if the initial specifications are misunderstood
or if the specified system is not correctly modeled. A systems engineering approach
can help guard against these problems by checking for inconsistencies and unex-
pected behavior, which will be explored in the next section. Ultimately, managing
uncertainty helps to speed up implementation, reduce error, and reduce cost.
Fig. 42.3 Mapping uncertainty and systems thinking to the Cynefin model
Fig. 42.5 Simulation V&V (A simplified version of the Sargent Cycle [4])
This simulation framework can be used to solve the forward problem (what is the
variation in outputs, given variation in inputs?) and the inverse problem (what input
variation caused the observed output variation?). There are significant challenges
associated with each pillar of the framework, and these are discussed next.
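In its simplest sampling form, the forward problem amounts to pushing samples of the input distributions through the model and summarizing the output variation. A minimal sketch follows; the deflection-like model and the input distributions are made-up stand-ins, not any particular engineering code:

```python
import numpy as np

rng = np.random.default_rng(42)

def model(load, thickness):
    """Stand-in simulation model (illustrative only): a
    deflection-like response of a loaded component."""
    return load / thickness ** 3

n = 50_000
load = rng.normal(100.0, 10.0, n)      # uncertain operating load
thickness = rng.uniform(1.9, 2.1, n)   # manufacturing tolerance band

# Forward problem: push input samples through the model and
# summarize the induced variation in the output.
response = model(load, thickness)
print(f"mean response: {response.mean():.2f}")
print(f"95% interval:  {np.percentile(response, [2.5, 97.5]).round(2)}")
```

The inverse problem is harder: instead of plain sampling it typically requires statistical inference (e.g., Bayesian calibration) to work back from observed output variation to plausible input variation.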
Standard robust design and systems engineering tools can be used to identify sources
of input uncertainty, and once identified, these need to be quantified. This usually
involves gathering data, although interval-based methods do exist; they are not
usually associated with UQ studies, which concentrate on probabilistic analysis.
Where deeper uncertainty exists (i.e., uncertainty that cannot be quantified), one can
take a scenario-based approach, as previously mentioned. Different types of input
uncertainty related to engineering analysis include:
1. Physics. This defines the nature of the problem, e.g., find displacements given
forces on objects.
2. Geometry. This defines the domain over which the problem will be solved.
This may not necessarily be the fully featured part geometry but could be a
simplification.
A system model or network of models that represents the system needs to be:
Methods for the propagation of uncertainty through the model framework include:
The design and development of a new engine is supported by analysis and test
operations that rely on models at system, subsystem, and component level. The
impact of variation at component level needs to be assessed not only at that level
but also at higher levels. This can create a significant challenge for the practical
application of UQ methods.
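When each model run is expensive, a common way to make such assessments affordable is to fit a cheap surrogate to a small design of runs and propagate uncertainty through the surrogate instead, in the spirit of the computer-experiments literature [3]. A hedged sketch, in which both the "expensive" model and the cubic surrogate are stand-ins (Gaussian-process emulators are the more usual choice):

```python
import numpy as np

rng = np.random.default_rng(1)

def expensive_model(x):
    """Placeholder for a long-running simulation (assumption)."""
    return np.exp(-0.5 * x) * np.cos(2.0 * x)

# 1. Run the expensive model at a small design of input points.
x_design = np.linspace(-1.0, 1.0, 12)
y_design = expensive_model(x_design)

# 2. Fit a cheap surrogate (a cubic polynomial here, for brevity).
surrogate = np.poly1d(np.polyfit(x_design, y_design, deg=3))

# 3. Propagate input uncertainty through the surrogate at negligible cost.
x_samples = rng.normal(0.0, 0.3, 100_000)
y_samples = surrogate(x_samples)
print(f"output mean: {y_samples.mean():.3f}, std: {y_samples.std():.3f}")
```

The surrogate's own approximation error then becomes part of the uncertainty budget, which is one reason the hierarchy of models must be kept aligned and validated.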
A generic example problem, which is nonetheless based on a real engineering
design problem, is presented to illustrate some of the issues that have been
discussed. The problem is to design a blade and disc assembly for a jet engine.
The idea behind the example is to show the hierarchy of design systems and the
multifaceted nature of the design problem.
The disc and blade design problem is addressed at component level, but also
needs to be considered at subsystem level (compressor or turbine) and at system
(whole engine) level. Figures 42.7, 42.8, and 42.9 show these three levels of design
geometry. These geometry models form the basis of the set of analysis models used
to inform both design and the verification and validation of the system.
Figure 42.7 shows a schematic of a Rolls-Royce Plc Trent engine. At the
core of the engine lies the compressor-combustor-turbine arrangement, where air
is compressed, heated, and expanded to produce work. The detailed designs of
the blade and disc arrangements of the compressor and turbine are critical in
determining overall engine performance and are subject to multiple design criteria.
Performance is evaluated over a full flight profile, with hundreds of complex
thermomechanical loading conditions. The blade and disc components need to be
evaluated for strength (stress, strain), fatigue, and vibration, requiring the model to
be able to represent nonlinear physical phenomena such as creep, plasticity, fatigue,
and contact and friction, all in a coupled thermomechanical model.
Not all of these conditions and physics-based models need to be evaluated at
the whole engine level. Whole engine analysis is extremely costly more can
be done with available resources if the physical phenomena that drive individual
design criteria are modeled at appropriate scale and fidelity. This however creates an
overhead in managing the hierarchy of analysis models. The models need to be kept
in alignment via common geometry definitions and consistent boundary conditions,
all while allowing them to flex to take uncertainty into account and to allow for
larger design changes in the search for a good design solution.
Fig. 42.9 A generic disc and blade model arrangement (left) and section thereof (right)
1. Simulation
a. Multidisciplinary: common parametric models to enable flow of data
b. Aligned models: systems of multi-fidelity models
c. Fully stochastic simulation models
2. Uncertainty quantification
a. Data-led quantification of sources of uncertainty
b. Methods for UQ where data are sparse (expensive) and multi-scale
c. Propagation methods integrated with simulation models
3. Optimization
a. Improved automated geometry creation and meshing (morph geometry or
mesh?)
b. Increase in the number of objectives considered during optimization
4. Robust design
a. Tools and methods able to cope with large nonlinear problems
b. Hierarchy of variable fidelity models from component to subsystem to system
c. Full System view of uncertainty quantification and management: decision-
making in the face of uncertainty
d. Validation: calibration at a test point, extrapolation to a new design vs
interpolation
e. Close the loop: how well did we predict the variation that we see in the
field?
7 Conclusion
References
1. Snowden, D.J., Boone, M.E.: A leader's framework for decision making. Harv. Bus. Rev. 85(11),
68–76 (2007)
2. Hartmann, S.: The world as a process: simulations in the natural and social sciences. In:
Hegselmann, R., et al. (eds.) Modelling and Simulation in the Social Sciences from the
Philosophy of Science Point of View, Theory and Decision Library, pp. 77–100. Kluwer,
Dordrecht (1996)
3. Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments.
Stat. Sci. 4(4), 409–423 (1989)
4. Sargent, R.G.: Validation of simulation models. In: Highland, H.J., Spiegel, M.F., Shannon, R.E.
(eds.) Proceedings of the 1979 Winter Simulation Conference, pp. 497–503. IEEE, Piscataway
(1979)
5. Stephens, E.M., Edwards, T.L., Demeritt, D.: Communicating probabilistic information from
climate model ensembles – lessons from numerical weather prediction. WIREs Clim. Change 3,
409–426 (2012). doi:10.1002/wcc.187
43 Quantifying and Reducing Uncertainty About Causality in Improving Public
Health and Safety
Abstract
Effectively managing uncertain health, safety, and environmental risks requires
quantitative methods for quantifying uncertain risks, answering the following
questions about them, and characterizing uncertainties about the answers:
These are all causal questions. They are about the uncertain causal relations
between causes, such as exposures, and consequences, such as adverse health
outcomes. This chapter reviews advances in quantitative methods for answering
them. It recommends integrated application of these advances, which might
collectively be called causal analytics, to better assess and manage uncertain
risks. It discusses uncertainty quantification and reduction techniques for causal
modeling that can help to predict the probable consequences of different policy
Keywords
Adaptive learning · Change-point analysis (CPA) · Bayesian networks (BN) ·
Causal analytics · Causal graph · Causal laws · Counterfactual · Directed
acyclic graph (DAG) · DAG model · Dynamic Bayesian networks (DBN) ·
Ensemble learning algorithms · Evaluation analytics · Granger causality ·
Influence diagram (ID) · Intervention analysis · Interrupted time series
analysis · Learning analytics · Marginal structural model (MSM) · Model
ensembles · Multi-agent influence diagram (MAID) · Path analysis ·
Predictive analytics · Prescriptive analytics · Propensity score ·
Quasi-experiments (QEs) · Simulation · Structural equations model (SEM) ·
Structure discovery · Transfer Entropy · Uncertainty analytics
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1439
2 Some Limitations of Traditional Epidemiological Measures for Causal
Inference: Uncertainty About Whether Associations Are Causal . . . . . . . . . . . . . . . . . 1442
2.1 Example: Exposure-Response Relations Depend on Modeling Choices . . . . . 1443
3 Event Detection and Consequence Prediction: What's New, and So What? . . . . . . . . . 1448
3.1 Example: Finding Change Points in Surveillance Data . . . . . . . . . . . . . . . . . . . 1449
4 Causal Analytics: Determining Whether a Specific Exposure Harms Human
Health . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1453
5 Causes and Effects Are Informative About Each Other: DAG Models,
Conditional Independence Tests, and Classification and Regression Tree
Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1454
6 Changes in Causes Should Precede, and Help to Predict and Explain,
Changes in the Effects that They Cause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1455
6.1 Change-Point Analysis Can Be Used to Determine Temporal Order . . . . . . . . 1456
6.2 Intervention Analysis Estimates Effects of Changes Occurring
at Known Times, Enabling Retrospective Evaluation of the
Effectiveness of Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1456
6.3 Granger Causality Tests Show Whether Changes in Hypothesized
Causes Help to Predict Subsequent Changes in Hypothesized Effects . . . . . . . 1459
7 Information Flows From Causes to Their Effects over Time: Transfer Entropy . . . . . . 1460
8 Changes in Causes Make Future Effects Different From What They
Otherwise Would Have Been: Potential-Outcome and Counterfactual Analyses . . . . . 1461
9 Valid Causal Relations Cannot Be Explained Away by Noncausal Explanations . . . . . 1466
10 Changes in Causes Produce Changes in Effects via Networks of Causal
Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1466
10.1 Structural Equation and Path Analysis Models Model Linear Effects
Among Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1467
1 Introduction
Politically contentious issues often turn on what appear to be technical and scientific
questions about cause and effect. Once a perceived undesired state of affairs
reaches a regulatory or policy agenda, the question arises of what to do about it
to change things for the better. Useful answers require understanding the probable
consequences caused by alternative policy actions. For example,
A century ago, policy makers might have asked whether a prohibition amendment
would decrease or increase alcohol abuse.
A decade ago, policy makers might have wondered whether an invigorated war
on drugs would increase or decrease drug abuse.
Do seatbelts reduce deaths from car accidents, even after accounting for risk
homeostasis changes in driving behaviors?
Does gun control reduce deaths due to shootings?
Does the death penalty reduce violent crime?
Has banning smoking in bars reduced mortality rates due to heart attacks?
Do sex education and birth control programs in schools decrease teen pregnancy
rates and prevalence of sexually transmitted diseases?
Has the Clean Air Act reduced mortality rates, e.g., due to lung cancer or
coronary heart disease (CHD) or to all causes?
Will reformulations of childhood vaccines reduce autism?
Would banning routine antibiotic use in farm animals reduce antibiotic-resistant
infections in people?
quickly and how long does it last? They want to know what will work best and what
has (and has not) worked well in reducing risks and undesirable outcomes without
causing unintended adverse consequences. And they want to know how certain or
uncertain the answers to these questions are.
Developing trustworthy answers to these questions and characterizing uncer-
tainty about them requires special methods. It is notoriously difficult to quickly and
accurately identify events or exposures that cause adverse human health outcomes,
quantify uncertainties about causal relations and impacts, accurately predict the
probable consequences of a proposed action such as a change in exposure or
introduction of a regulation or intervention program, and quantify in retrospect
what effects an action actually did cause, especially if other changing factors
affected the observed outcomes. The following section explains some limitations
of association-based epidemiological and regulatory risk assessment methods that
are often used to try to answer these questions. These limitations suggest that
association-based methods are not adequate for the task [21], contributing to an
unnecessarily widespread prevalence of false positives in current epidemiology that
undermines the credibility and value of scientific studies that should be providing
trustworthy, crucial information to policy makers [22, 56, 70, 102]. New and better
ideas and methods are needed, and are available, to provide better answers. The
remaining sections review the current state of the art of methods for answering the
following causal questions and quantifying uncertainties about their answers:
1. Event detection: What has changed recently in disease patterns or other adverse
outcomes, by how much, when, and why? For example, have hospital or
emergency room admissions or age-specific mortalities with similar symptoms
recently jumped significantly, perhaps suggesting a disease outbreak (or a
terrorist bio-attack)?
2. Consequence prediction: What are the implications for what will probably
happen next if different actions (or no new actions) are taken? For example, how
many new illnesses are likely to occur and when? How quickly can a confident
answer be developed and how certain and accurate can answers be based on
limited surveillance data?
3. Risk attribution: What is causing current undesirable outcomes? Does a specific
exposure harm human health? If so, who is at greatest risk (e.g., children, elderly,
other vulnerable subpopulations) and under what conditions (e.g., for what
exposure concentrations and durations or for what co-exposures)? Answering
this question is the subject of hazard identification in health risk assessment. For
example, do ambient concentrations of fine particulate matter or ozone in air
(possibly in combination with other pollutants) cause increased incidence rates
of heart disease or lung cancer in one or more vulnerable populations? Here,
cause is meant in the specific sense that reducing exposures would reduce the
risks per person per year of the adverse health effects. (The following section
contrasts this with other interpretations of "exposure X causes disease Y," such as
"exposure X is strongly, consistently, specifically, temporally, and statistically
significantly associated with disease Y.")
43 Quantifying and Reducing Uncertainty About Causality in … 1441
These questions are fundamental in epidemiology and health and safety risk
assessment. They are mainly about how changes in exposures affect changes in
health outcomes and about how certain the answers are. They can be answered
using current methods of causal analysis and uncertainty quantification (UQ) for
causal models if sufficient data are available.
The following sections discuss methods for drawing valid causal inferences
from epidemiological data and for quantifying uncertainties about causal impacts,
taking into account model uncertainty as well as sampling errors and measurement,
classification, or estimation errors in predictors. UQ methods based on model
ensemble methods, such as Bayesian model averaging (BMA) and various forms of
resampling, boosting, model cross validation, and simulation, can help to overcome
over-fitting and other modeling biases, leading to wider confidence intervals for
the estimated impacts of actions and reducing false-positive rates [50]. UQ has the
potential to restore greater integrity and credibility to model-based risk estimates
and causal predictions, to reveal the quantitative impacts of model and other
uncertainties on risk estimates and recommended risk management actions, and to
guide more productive applied research to decrease key remaining uncertainties and
to improve risk management decision-making via active exploration and discovery
of valid causal conclusions and uncertainty characterizations.
1442 L.A. Cox, Jr.,
These are equations with the explicit causal interpretation that a change in the
variable on the right side causes a corresponding change in the variable on the
left side to restore equality between the two sides (e.g., increasing age increases
cumulative exposure and disease risk, but increasing exposure decreases risk at any
age). These two structural equations together constitute a structural equation model
(SEM) that can be diagrammed as in Fig. 43.1:
Fig. 43.1 Causal graph for the SEM: AGE → EXPOSURE (coefficient 1), AGE → RISK (coefficient 2), and EXPOSURE → RISK (coefficient -1)
In this diagram, each variable depends causally only on the variables that point
into it, as revealed by the SEM equations. The weights on the arrows (the coefficients
in the SEM equations) show how the average value of the variable at the arrow's
head will change if the variable at its tail is increased by one unit. For example,
increasing AGE by 1 decade (if that is the relevant unit) increases RISK directly by
2 units (e.g., 2 expected illnesses per decade, if that is the relevant unit), increases
EXPOSURE by one unit, and thereby decreases RISK indirectly by 1 unit, via the
path through EXPOSURE, for a net effect of a 1 unit increase in RISK per unit
increase in AGE.
By contrast to such causal SEM models, what is called a reduced-form model is
obtained by regressing RISK against EXPOSURE. Using the first SEM equation,
EXPOSURE = 1*AGE, to substitute EXPOSURE for AGE in the second SEM
equation, RISK = 2*AGE - EXPOSURE, yields the following reduced-form
equation: RISK = 1*EXPOSURE.
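This contrast between structural and reduced-form coefficients can be checked numerically. A minimal sketch (the noise terms and sample size are illustrative assumptions added here, not part of the SEM in the text) fits both regressions to data simulated from the two structural equations:

```python
import numpy as np

# Simulate data from the SEM; small noise terms are assumed so that
# the regressions below are well defined
rng = np.random.default_rng(0)
n = 10_000
age = rng.uniform(0, 10, n)
exposure = 1.0 * age + rng.normal(0.0, 0.1, n)          # EXPOSURE = 1*AGE
risk = 2.0 * age - exposure + rng.normal(0.0, 0.1, n)   # RISK = 2*AGE - EXPOSURE

# Reduced-form model: regress RISK on EXPOSURE alone -> slope near +1,
# even though the structural (causal) coefficient of EXPOSURE is -1
b_reduced = np.polyfit(exposure, risk, 1)[0]

# Structural regression: including the confounder AGE recovers both
# structural coefficients (about -1 for EXPOSURE and +2 for AGE)
X = np.column_stack([exposure, age, np.ones(n)])
b_exposure, b_age, _ = np.linalg.lstsq(X, risk, rcond=None)[0]
```

Conditioning on the confounder AGE reverses the sign of the estimated exposure coefficient, which is exactly the hazard of interpreting reduced-form regression coefficients causally.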
(e.g., pollutant levels and mortality rates both tend to be higher on cold winter days
than during the rest of the year, and both have declined in recent decades) with
a causal, predictive relation that implies that future reductions in pollution would
cause further future reductions in elderly mortality rates. Since such association-
based studies are often unreliable indicators of causality [21] or simply irrelevant
for determining causality, as in the examples for Fig. 43.1, policy makers who wish
to use reliable causal relations to inform policy decisions must seek elsewhere.
These limitations of association-based methods have been well discussed among
methodological specialists for decades [98]. Key lessons, such as that the same data
set can yield either a statistically significant positive exposure-response regression
coefficient or a statistically significant negative exposure-response regression coef-
ficient, depending on the modeling choices made by the investigators, are becoming
increasingly appreciated by practitioners [21]. They illustrate an important type of
uncertainty that arises in epidemiology, but that is less familiar in many other applied
statistical settings: uncertainty about the interpretation of regression coefficients
(or other association-based measures such as RR, PAF, etc.) as indicating causal
relations vs. confounded associations or modeling biases vs. some of each. This type
of uncertainty cannot be addressed by presenting conventional statistical uncertainty
measures such as confidence intervals, p-values, regression diagnostics, sensitivity
analyses, or goodness-of-fit statistics, since the uncertainty is not about how well a
model fits data or about the estimated parameters of the model. Rather, it is about the
extent to which the model is only descriptive of the past vs. predictive of different
futures caused by different choices. Although this is not an uncertainty to which
conventional statistical tests apply, it is crucial for the practical purpose of making
model-informed risk management decisions. Policy interventions will successfully
increase the probabilities of desired outcomes and decrease the frequencies of
undesired ones only to the extent that they act causally on drivers of the outcomes
and not necessarily to the extent that the models used describe past associations.
One way to try to bridge the gap between association and causation is to ask
selected experts what they think about whether or to what extent associations might
be causal. However, research on the performance of expert judgments has called into
question the reliability of expert judgments, specifically including judgments about
causation [61]. Such judgments typically reflect qualitative weight of evidence
(WoE) considerations about the strength, consistency (e.g., do multiple independent
researchers find the claimed associations?), specificity, coherence (e.g., are asso-
ciations of exposure with multiple health endpoints mutually consistent with each
other and with the hypothesis of causality?), temporality (do hypothesized causes
precede their hypothesized effects?), gradient (are larger exposures associated with
larger risks?), and biological plausibility of statistical associations and the quality
of the data sources and studies supporting them. One difficulty is that a strong
confounder (such as age in Fig. 43.1) with delayed effects can create strong,
consistent, specific, coherent, temporal associations between exposure and risk
of an adverse response, with a clear gradient associating larger risks with larger
exposures, without providing any evidence that exposure actually causes increased
risk.
Showing that an association is strong, for example, does not address whether
it is causal, although many WoE systems simply assume that the former supports
the latter without explicitly addressing whether the strong associations are instead
explained by strong confounding, strong biases, or strong modeling assumptions.
Similarly, showing that different investigators find the same or similar association
does not necessarily show whether this consistency results from shared modeling
assumptions, biases, or confounders. Conflating causal and associational concepts,
such as evidence for the strength of an association and evidence for causality of
the association, too often makes assessments of causality in epidemiology untrust-
worthy compared to methods used in other fields, discussed subsequently [51, 83].
Most epidemiologists are trained to treat various aspects of association as evidence
for causation, even though they are not, and this undermines the trustworthiness of
expert judgments about causation based on WoE considerations [83].
In addition, experts are sometimes asked to judge the probability that an
association is causal (e.g., [25]). This makes little sense. It neglects the fact that
an association may be partly causal and partly due to confounding or modeling
biases or coincident historical trends. For example, if exposure does increase risk,
but is also confounded by age, then asking for the probability that the regression
coefficient relating exposure to risk is causal overlooks the realistic possibility
that it reflects both a causal component and a confounding component, so that the
probability that it is partly causal might be 1 and the probability that it is completely
causal might be 0. A more useful question to pose to experts might be what fraction
of the association is causal, but this is seldom asked. Common noncausal sources
of statistical associations include model selection and multiple testing biases,
model specification errors, unmodeled errors in explanatory variables in multivariate
models, biases due to data selection and coding (e.g., dichotomizing or categorizing
continuous variables such as age, which can lead to residual confounding), and
coincident historical trends, which can induce statistically significant-appearing
associations between statistically independent random walks, a phenomenon
sometimes called "spurious regression" [17, 98].
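The spurious-regression phenomenon is easy to reproduce. A minimal Monte Carlo sketch (series length and replication count are arbitrary choices, not values from the text) compares the typical correlation between two independent random walks with that between two independent white-noise series:

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_reps = 500, 200

def mean_abs_corr(make_series):
    """Average |correlation| across independent pairs of simulated series."""
    out = []
    for _ in range(n_reps):
        a, b = make_series(), make_series()
        out.append(abs(np.corrcoef(a, b)[0, 1]))
    return float(np.mean(out))

walk = lambda: np.cumsum(rng.normal(size=n_steps))   # random walk (trending)
noise = lambda: rng.normal(size=n_steps)             # i.i.d. white noise

m_walk = mean_abs_corr(walk)    # typically large despite true independence
m_noise = mean_abs_corr(noise)  # near zero, as classical theory predicts
```

The walks are statistically independent by construction, yet their sample correlations are routinely far from zero, which is why trending time series can appear strongly associated without any causal link.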
Finally, qualitative subjective judgments and ratings used in many WoE systems
are subject to well-documented psychological biases. These include confirmation
bias (seeing what one expects to see and discounting or ignoring evidence that
might challenge one's preconceptions), motivated reasoning (finding what it benefits
one to find and believing what it pays one to believe), and overconfidence (not
sufficiently doubting, questioning, or testing one's own beliefs and hence not
seeking potentially disconfirming information that might require those beliefs to
be revised) [61, 102].
That statistical associations do not in general convey information sufficient for
making valid causal predictions has been well understood for decades by statis-
ticians and epidemiologists specializing in technical methods for causal analysis
(e.g., [31, 36]). This understanding is gradually percolating through the larger
epidemiological and risk analysis communities. Peer-reviewed published papers and
reports, including those relied on in many regulatory risk assessments, still too often
make the fundamental mistake of reinterpreting empirical exposure-response (ER)
relations between historical levels of exposure and response as if they were causal
relations useful for predicting how future changes in exposures would change future
responses. Fortunately, this confusion is unnecessary today: appropriate technical
methods for causal analysis and modeling are now well developed, widely available
in free software such as R or Python, and readily applicable to the same kinds
of cross-sectional and longitudinal data collected for association-based studies.
Table 43.1 summarizes some of the most useful study designs and methods for valid
causal analysis and modeling of causal exposure-response relations.
Despite the foregoing limitations, there is much of potential value in several WoE
considerations, notably consistency, specificity, and temporality of associations,
especially if they are used as part of a relatively objective, quantitative, data-
Table 43.1 Some formal methods for modeling and testing causal hypotheses

Conditional independence tests [31, 32]
Basic idea: Is the hypothesized effect (e.g., lung cancer) statistically independent of the hypothesized cause (e.g., exposure to a chemical), given values of other variables (e.g., education and income)? If so, this undermines causal interpretation.
Appropriate study design: Cross-sectional data; can also be applied to multi-period data (e.g., in dynamic Bayesian networks)

Panel data analysis [2, 109]
Basic idea: Are changes in exposures followed by changes in the effects that they are hypothesized to help cause? If not, this undermines causal interpretation; if so, this strengthens causal interpretation. Example: Are changes in exposure levels followed (but not preceded) by corresponding changes in mortality rates?
Appropriate study design: Panel data study: collect a sequence of observations on the same subjects or units over time

Granger causality test [23], transfer entropy [81, 91, 99, 118]
Basic idea: Does the history of the hypothesized cause improve ability to predict the future of the hypothesized effect? If so, this strengthens causal interpretation; otherwise, it undermines causal interpretation. Example: Can lung cancer mortality rates in different occupational groups be predicted better from time series histories of exposure levels and mortality rates than from the time series history of mortality rates alone?
Appropriate study design: Time series data on hypothesized causes and effects

Quasi-experimental design and analysis [12, 40, 41]
Basic idea: Can control groups and other comparisons refute alternative (noncausal) explanations for observed associations between hypothesized causes and effects? For example, can coincident trends and regression to the mean be refuted as possible explanations? If so, this strengthens causal interpretation.
Appropriate study design: Longitudinal observational data on subjects exposed and not exposed to interventions that change the hypothesized cause(s) of effects

(continued)
driven approach to inferring probable causation. The following sections discuss this
possibility and show how such traditional qualitative WoE considerations can be fit
into more formal quantitative causal analyses.
Fig. 43.2 Surveillance time series showing a possible increase in hospitalization rates (weekly hospital_admissions, roughly 30-55 per week, plotted against time for weeks 0-60)
Fig. 43.3 Bayesian posterior distribution for the timing of the increase in Fig. 43.2, if one has occurred (posterior probability plotted against week index, 0-60)
right side of Fig. 43.2 than the left, but might this plausibly just be due to chance, or
is it evidence for a real increase in hospitalization rates?
Figure 43.3 illustrates a typical result of current statistical technology (also
used in computational intelligence, computational Bayesian, machine learning,
pattern recognition, and data mining technologies) for solving such problems by
using statistical evidence, together with risk models, to draw inferences about
what is probably happening in the real world. The main idea is simple: the
highest points indicate the times that are computed to be most likely for when a
change in hospitalization rate occurred, based on the data in Fig. 43.2. (Technically,
Fig. 43.3 plots the likelihood function of the data, assuming that at most one jump
from one level to a different level has occurred for the hospitalization rate. The
Fig. 43.4 Proportion of emergency department visits for influenza-like illness by age group (All Ages, 0-4 yr, 5-17 yr, 18-44 yr, 45-64 yr, 65+), by week ending, October 4, 2008, to October 9, 2010, in a US Department of Health and Human Services region (Source: [62], http://jamia.oxfordjournals.org/content/19/6/1075.long)
likelihoods are rescaled so that their sum over all 60 weeks is 1, which allows them
to be interpreted as posterior probabilities if the prior is assumed to be uniform.
More sophisticated algorithms are discussed next.) The maximum likelihood-based
algorithm accurately identifies both the time of the change (week 25) and the
magnitude of its effect to one significant decimal place (not shown in Fig. 43.3).
The spread of the likelihood function (or posterior probability distribution) around
the most likely value in Fig. 43.3 also shows how precise the estimate of the
change-point time is.
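The likelihood-scan idea behind Fig. 43.3 can be sketched as follows (the simulated counts, the jump at week 25, and the normal approximation are assumptions made here for illustration; they are not the data underlying the figures):

```python
import numpy as np

# Simulated weekly counts whose mean jumps at week 25 (assumed values)
rng = np.random.default_rng(42)
y = np.concatenate([rng.poisson(35, 25), rng.poisson(48, 35)]).astype(float)
n = y.size

# Profile log-likelihood of a single change at week k (normal approximation,
# common variance; segment means fitted by maximum likelihood)
loglik = np.full(n, -np.inf)
for k in range(2, n - 2):  # require a few observations in each segment
    rss = ((y[:k] - y[:k].mean()) ** 2).sum() + ((y[k:] - y[k:].mean()) ** 2).sum()
    loglik[k] = -0.5 * n * np.log(rss / n)

# Rescale so the values sum to 1; under a uniform prior these are the
# posterior probabilities for the change time, as in Fig. 43.3
posterior = np.exp(loglik - loglik.max())
posterior /= posterior.sum()
k_hat = int(posterior.argmax())  # most likely change week
```

The location of the peak estimates when the jump occurred, and the spread of the posterior around it conveys how precisely the change time is pinned down by the data.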
Figure 43.4 shows a real-world example of a change point and its consequences
for emergency department visits over time. Admissions for flu-like symptoms,
especially among infants and children (0-4 and 5-17 year olds), increased sharply
in August and declined gradually in each age group thereafter. Being able to
identify the jump quickly, and then to apply a predictive model such as a
stochastic compartmental transition model with susceptible, infected, and recovered
subpopulations (SIR model) for each age group to predict the time course of the
disease in the population can help forecast the care resources that will be needed
over time for each age group.
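As a sketch of the forecasting step, a minimal discrete-time SIR model might look like the following (the population size and the transmission and recovery rates beta and gamma are illustrative assumptions, not estimates from the outbreak described above):

```python
import numpy as np

def sir_forecast(s0, i0, r0, beta, gamma, weeks):
    """Discrete-time SIR: forecast susceptible/infected/recovered counts."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    path = [(s, i, r)]
    for _ in range(weeks):
        new_inf = beta * s * i / n   # new infections this week
        new_rec = gamma * i          # recoveries this week
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        path.append((s, i, r))
    return np.array(path)

# Assumed illustrative numbers: 1% initially infected, R0 = beta/gamma = 2
path = sir_forecast(s0=99_000, i0=1_000, r0=0, beta=0.6, gamma=0.3, weeks=40)
peak_week = int(path[:, 1].argmax())  # week of peak infection load
```

Running one such model per age group, seeded from the detected jump, gives the time course of infections needed to plan care resources week by week.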
More generally, detecting change points can be accomplished by testing the null
hypothesis of no change for each time step and correcting for multiple testing bias
(which would otherwise inflate false-positive rates, since testing the null hypothesis
for each of the many different possible times at which a change might have
occurred multiplies the occasions on which an apparently significant change can
occur by chance).
Nakahara et al. [85] used change-point analysis (CPA) to assess the impact on
vehicle crash fatalities
of a program initiated in Japan in 2002 that severely penalized drunk driving.
Fatality rates between 2002 and 2006 (the end of the available data series) were
significantly lower than between 1995 and 2002. However, the CPA revealed
that the change point occurred around the end of 1999, right after a high-profile
vehicle fatality that was much discussed in the news. The authors concluded that
changes in drunk-driving behavior occurred well before the new penalties were
instituted.
In Finland in 1981-1986, a nationwide oral poliovirus vaccine campaign was
closely followed by, and partly overlapped with, a significant increase in the
incidence of Guillain-Barré syndrome (GBS). This temporal association raised
the important question of whether something about the vaccine might have
caused some or all of the increase in GBS. Kinnunen et al. [63] applied CPA to
medical records from a nationwide Hospital Discharge Register database. They
found that a change point in the occurrence of GBS had probably already taken
place before the oral poliovirus vaccine campaign started. They concluded that
Table 43.2 lists seven principles that have proved useful in various fields for
determining whether available data provide valid evidence that some events or
conditions cause others. They can be applied to epidemiological data to help
determine whether and how much exposures to a hazard contribute causally to
subsequent risks of adverse health outcomes in a population, in the sense that
reducing exposure would reduce risk: for example, whether and by how much
a given reduction in air pollution would reduce cardiovascular mortality rates
among the elderly, whether and by how much reducing exposure to television
violence in childhood would reduce propensity for violent behavior years later,
or whether decreasing high-fat or high-sugar diets in youth would reduce risks
1. Conditional independence principle: Causes and effects are informative about each other.
Technically, there should be positive mutual information (measured in bits and quantified by
statistical methods of information theory) between the random variables representing a cause
and its effect. This positive mutual information cannot be removed by conditioning on the
levels of other variables.
2. Granger principle: Changes in causes should precede, and help to predict and explain,
changes in the effects that they cause.
3. Transfer entropy principle: Information flows from causes to their effects over time.
4. Counterfactual principle: Changes in causes make future effects different from what they
otherwise would have been.
5. Causal graph principle: Changes in causes produce changes in effects by propagation via
one or more paths (sequences of causal mechanisms) connecting them.
6. Mechanism principle: Valid causal mechanisms are lawlike, yielding identically distributed
outputs when the inputs are the same.
7. Quasi-experiment principle: Valid causal relations produce differences (e.g., compared to
relevant comparison groups) that cannot be explained away by noncausal explanations.
of heart attacks in old age. The following sections explain and illustrate these
principles and introduce technical methods for applying them to data. They also
address the fundamental questions of how to model causal responses to exposure
and other factors, how to decide what to do to reduce risk, how to determine
how well interventions have succeeded in reducing risks, and how to characterize
uncertainties about the answers to these questions.
A key principle for causal analytics is that causes and their effects provide
information about each other. If exposure is a cause of increased disease risk,
then measures of exposure and of response (i.e., disease risk) should provide
mutual information about each other, in the sense that the conditional probability
distribution for each varies with the value of the other. Software for determining
whether this is the case for two variables in a data set is discussed at the end of
this section. In addition, if exposures are direct causes of responses, then the mutual
information between them cannot be eliminated by conditioning on the values of
other variables, such as confounders: a cause provides unique information about its
effects. This provides the basis for using statistical conditional independence tests
to test the observable statistical implications of causal hypotheses: An effect should
never be conditionally independent of its direct causes, given (i.e., conditioned on)
the values of other variables.
As a simple example, if both air pollution and elderly mortality rates are elevated
on cold winter days, then if air pollution is a cause of increased elderly mortality
rate, the mutual information between air pollution and elderly mortality rates should
not be eliminated (explained away) by temperature, even though temperature may
be associated with each of them. If both temperature and air pollution contribute
to increased mortality rates (indicated in causal graph notation as temperature →
mortality_rate ← pollution), then conditioning on the level of temperature will
not eliminate the mutual information between pollution and mortality rate. On
the other hand, if the correct causal model were that temperature is a confounder
that explains both mortality rate and pollution (e.g., because coal-fired power
plants produce more pollution during days with extremely hot and cold weather,
and, independently, these temperature extremes lead to greater elderly mortality),
diagrammed as mortality_rate ← temperature → pollution, then conditioning
on the level of temperature would eliminate the mutual information between
pollution and mortality rate. Thus, tests that reveal conditional independence
relations among variables can also help to discriminate among alternative causal
hypotheses.
The notation in these graphs is as follows. Each node in the graph (such as
temperature, pollution, or mortality_rate in the preceding example) represents
a random variable. Arrows between nodes reveal statistical dependencies (and,
43 Quantifying and Reducing Uncertainty About Causality in: : : 1455
implicitly, conditional independence relations) among the variables. The arrows are
usually constrained to form a directed acyclic graph (DAG), meaning that no node can
be its own predecessor in the partial ordering of nodes determined by the arrows.
The probability distribution of each variable with inward-pointing arrows depends
on the values of the variables that point into it, i.e., the conditional probability
distribution for the variable at the head of an arrow is affected by the values of
its direct parents (the variables that point into it) in the causal graph. Conversely,
a random variable represented by a node is conditionally independent of all other
variables, given the values of the variables that point into it (its parents in the DAG),
the values of the variables into which it points (its children), and the values of any
other parents of its children (its "spouses"), a set of nodes collectively called its
"Markov blanket" in the DAG model.
To illustrate these ideas, suppose that X causes Y and Y causes Z, as indicated
by the DAG X → Y → Z, where X is an exposure-related variable (e.g.,
job category for an occupational risk or location of a residence for a public health
risk), Y is a measure of individual exposure, and Z is an indicator of adverse health
response. Then even though each variable is statistically associated with the other
two, Z is conditionally independent of X given the value of Y . But Z cannot be
made conditionally independent of Y by conditioning on X . One way to test for
such conditional independence relations in data is with classification and regression
tree algorithms (see, e.g., https://cran.r-project.org/web/packages/rpart/rpart.pdf for
a free R package and documentation). In this example, a tree for Z would not
contain X after splitting on values of Y , reflecting the fact that Z is conditionally
independent of X given Y . However, a tree for Z would always contain Y , provided
that the data set is large and diverse enough so that the tree-growing algorithm can
detect the mutual information between them.
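For linear-Gaussian data, partial correlation provides a simple stand-in for such conditional independence tests. The following sketch (simulated data; the linear model and sample size are assumptions added here) illustrates the chain X → Y → Z:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
x = rng.normal(size=n)        # exposure-related variable (e.g., job category score)
y = x + rng.normal(size=n)    # individual exposure, caused by X
z = y + rng.normal(size=n)    # adverse health response, caused by Y

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing out 'given' from each."""
    ra = a - np.polyval(np.polyfit(given, a, 1), given)
    rb = b - np.polyval(np.polyfit(given, b, 1), given)
    return float(np.corrcoef(ra, rb)[0, 1])

r_zx = float(np.corrcoef(z, x)[0, 1])   # marginal association: clearly nonzero
r_zx_given_y = partial_corr(z, x, y)    # near 0: Z independent of X given Y
r_zy_given_x = partial_corr(z, y, x)    # stays well away from 0: Y is a direct cause
```

Conditioning on the intermediate cause Y removes the Z-X association, while no amount of conditioning on X removes the Z-Y association, mirroring the tree-based test described in the text.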
For practitioners, algorithms are now freely available in R, Python, and Google
software packages to estimate mutual information, the conditional entropy reduction
in one variable when another is observed, and related measures for quantifying
how many bits of information observations of one variable provide about another
and whether one variable is conditionally independent of another given the values
of other variables ([74]; Ince et al. 2009). For example, free R software and
documentation for performing these calculations can be found at the following sites:
https://cran.r-project.org/web/packages/entropy/entropy.pdf
https://cran.r-project.org/web/packages/partykit/vignettes/partykit.pdf.
If changes in exposures always precede and help to predict and explain subsequent
corresponding changes in health effects, this is consistent with the hypothesis that
exposures cause health effects. The following methods and algorithms support
formal testing of this hypothesis.
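A minimal version of the Granger idea compares prediction errors with and without the history of the hypothesized cause. In the sketch below (simulated series; all coefficients are illustrative assumptions), x drives y with a one-week lag:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()                   # exposure series
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()  # effect driven by lagged x

def granger_F(effect, cause):
    """F statistic for adding one lag of 'cause' to an AR(1) model of 'effect'."""
    tgt = effect[1:]
    A1 = np.column_stack([effect[:-1], np.ones(tgt.size)])            # restricted
    A2 = np.column_stack([effect[:-1], cause[:-1], np.ones(tgt.size)])  # unrestricted
    rss1 = np.sum((tgt - A1 @ np.linalg.lstsq(A1, tgt, rcond=None)[0]) ** 2)
    rss2 = np.sum((tgt - A2 @ np.linalg.lstsq(A2, tgt, rcond=None)[0]) ** 2)
    return (rss1 - rss2) / (rss2 / (tgt.size - 3))

F_xy = granger_F(y, x)  # large: x's history sharply improves prediction of y
F_yx = granger_F(x, y)  # small: y's history does not help predict x
```

The asymmetry of the two F statistics is the Granger signature: the hypothesized cause helps predict the effect, but not conversely.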
How much difference exposure reductions or other actions have made in reducing
adverse health outcomes or producing other desired outcomes is often addressed
using intervention analysis, also called interrupted time series analysis. The basic
idea is to test whether the best description of an effect's time series changes
significantly when a risk factor or exposure changes, e.g., due to an intervention
that increases or reduces it [47, 48, 68]. If the answer is yes, then the size
of the change over time provides quantitative estimates of the sizes and timing of
changes in effects following an intervention. For example, an intervention analysis
might test whether weekly counts of hospital admissions with a certain set of
diagnostic codes or cardiovascular mortalities per person per year among people
over 70 fell significantly when exposures fell due to closure of a plant that generated
high levels of air pollution. If so, then comparing the best-fitting time series models
(e.g., the maximum-likelihood models within a broad class of models, such as the
autoregressive integrated moving average (ARIMA) models widely used in time
series analysis) describing the data before and after the date of the intervention may
help to quantify the size of the effect associated with the intervention. If not, then
the interrupted time series does not provide evidence of a detectable effect of the
intervention. Free software for intervention analysis is available in R (e.g., [80];
the CausalImpact package from Google, 2015).
Two main methods of intervention analysis are segmented regression, which fits
regression lines or curves to the effects time series before and after the intervention
and then compares them to detect significant changes in slope or level, and Box-Tiao
analysis, often called simply intervention analysis, which fits time series models
(ARIMA or Box-Jenkins models with models of intervention effects, e.g., jumps in
the level, changes in the slope, or ramp-ups or declines in the effects over time) to
the effects data before and after the intervention and tests whether proposed effects
of interventions are significantly different from zero. If so, the parameters of the
intervention effect are estimated from the combined pre- and post-intervention data
(e.g., [47, 48]). For effects time series that are stationary (meaning that the same
statistical description of the time series holds over time) both before and after an
intervention that changes exposure, but that have a jump in mean level due to the
intervention, quantifying the difference in effects that the intervention has made
can be as simple as estimating the difference in means for the effects time series
before and after the intervention, similar to the CPA in Fig. 43.2, but for a known
change point. The top panel of Fig. 43.5 shows a similar comparison for heart attack
rates before and after a smoking ban. Based only on the lines shown, it appears that
Fig. 43.5 Straight-line extrapolation of the historical trend for heart attack (AMI) rates over-
predicts future AMI rates (upper panel) and creates the illusion that smoking bans were followed
by reduced AMI rates, compared to more realistic nonlinear extrapolation (lower panel), which
shows no detectable benefit from a smoking ban (Source: [33])
heart attack rates dropped following the ban. (If the effects of a change in exposure
occur gradually, then distributed-lag models of the intervention's effects can be used
to describe the post-intervention observations [47].) In nonstationary time series,
however, the effect of the intervention may be obscured by other changes in the
time series. Thus, the bottom panel of Fig. 43.5 considers nonlinear trends over time
and shows that, in this analysis, any effect of the ban now appears to be negative
(i.e., heart attack rates are increased after the ban compared to what is expected
based on the nonlinear trend extrapolated from pre-ban data).
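For the simplest case described above, a stationary effects series with a known intervention date and a jump in mean level, the intervention effect estimate reduces to a difference in segment means. A minimal sketch (all numbers are illustrative assumptions, not data from the studies discussed):

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(120)                        # months; known intervention at month 60
level = np.where(t < 60, 50.0, 44.0)      # true drop of 6 units at the intervention
y = level + rng.normal(0.0, 3.0, t.size)  # observed stationary effects series

# Effect estimate: difference in means before and after the intervention
effect = y[t >= 60].mean() - y[t < 60].mean()
```

For nonstationary series, as the Fig. 43.5 example shows, this simple comparison can be badly misleading, and trend models (segmented regression or ARIMA-based Box-Tiao analysis) are needed instead.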
Intervention analyses, together with comparisons to time series for comparison
populations not affected by the interventions, have been widely applied, with
varying degrees of justification and success, to evaluate the impacts caused by
changes in programs and policies in healthcare, social statistics, economics, and
epidemiology. For example, Lu et al. [75] found that prior authorization policies
introduced in Maine to help control the costs of antipsychotic drug treatments for
Medicaid and Medicare Part D patients with bipolar disorder were associated with
an unintended but dramatic (nearly one third) reduction in initiation of medication
regimens among new bipolar patients, but produced no detectable switching of
currently medicated patients toward less expensive treatments. Morriss et al. [84]
found that the time series of suicides in a district population did not change
significantly following a district-wide training program that measurably improved
the skills, attitudes, and confidence of primary care, accident and emergency,
and mental health workers who received the training. They concluded that "brief
educational interventions to improve the assessment and management of suicide
for front-line health professionals in contact with suicidal patients may not be
sufficient to reduce the population suicide rate." The study in [55] used intervention
analysis to estimate that the introduction of pedestrian countdown timers in Detroit
cut pedestrian crashes by about two thirds. Jiang et al. [59] applied intervention analysis
to conclude that, in four Australian states, the introduction of randomized breath
testing led to a substantial reduction in car accident fatalities. Callaghan et al. [10]
used a variant of intervention analysis, regression-discontinuity analysis, to test
whether the best-fitting regression model describing mortality rates among young
people changed significantly at the minimum legal drinking age, which was 18
in some provinces and 19 in others. They found that mortality rates for young
men jumped upward significantly precisely at the minimum legal drinking age, which
enabled them to quantify the impact of drinking-age laws on mortality rates. In
these and many other applications, intervention analysis and comparison groups
have been used to produce empirical evidence for what has worked and what has
not and to quantify the sizes over time of effects attributed to interventions when
these effects are significantly different from zero.
Intervention analysis has important limitations, however. Even if an intervention
analysis shows that an effects time series changed when an intervention occurred,
this does not show whether the intervention caused the change. Thus, in applications
from air pollution bans to gun control, initial reports that policy interventions had
significant beneficial effects were later refuted by findings that equal or greater
beneficial changes occurred at the same time in comparison populations not affected by the interventions [44, 64].

43 Quantifying and Reducing Uncertainty About Causality in ... 1459

Also, more sophisticated methods such as transfer
entropy, discussed later, must be used to test and estimate effects in nonstationary
time series, since both segmented regression models and intervention analyses that
assume stationarity typically produce spurious results for nonstationary time series.
For example, as illustrated in Fig. 43.5, Gasparrini et al. [33] in Europe and Barr
et al. [8] in the United States found that straight-line projections of what future heart
attack (acute myocardial infarction, AMI) rates would have been in the absence of an
intervention that banned smoking in public places led to a conclusion that smoking
bans were associated with a significant reduction in AMI hospital admission rates
following the bans. However, allowing for nonlinearity in the trend, which was
significantly more consistent with the data, led to the reverse conclusion that the
bans had no detectable impact on reducing AMI admission rates. As illustrated in
Fig. 43.5, the reason is that fitting a straight line to historical data and using it to
project future AMI rates in the absence of intervention tends to overestimate what
those future AMI rates would have been, because the real time series is downward-
curving, not straight. Thus, estimates of the effect of an intervention based on
comparing observed to model-predicted AMI admission rates will falsely attribute a
positive effect even to an intervention that had no effect if straight-line extrapolation
is used to project what would have happened in the absence of an intervention,
ignoring the downward curvature in the time series. This example illustrates how
model specification errors can lead to false inferences about effects of interventions.
The transfer entropy techniques discussed later avoid the need for curve-fitting and
thereby the risks of such model specification errors.
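The extrapolation bias just described can be reproduced in a few lines. The sketch below fits a straight line to a synthetic downward-curving pre-intervention trend and shows that the linear forecast over-predicts the later values of the same curve, so an intervention with no effect at all would appear beneficial; the trend function and all numbers are invented for illustration.

```python
# Sketch of the pitfall in Fig. 43.5: a straight line fitted to a
# downward-curving pre-intervention trend over-predicts the counterfactual,
# manufacturing a spurious "benefit". Synthetic data only.

def ols_line(ts, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(ts)
    tbar = sum(ts) / n
    ybar = sum(ys) / n
    slope = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / \
            sum((t - tbar) ** 2 for t in ts)
    return slope, ybar - slope * tbar

def true_rate(t):
    # Hypothetical AMI-like rate that declines with downward curvature.
    return 100.0 - 0.2 * t ** 2

pre = list(range(10))                       # pre-intervention years 0..9
slope, intercept = ols_line(pre, [true_rate(t) for t in pre])

t_future = 15                               # a post-"intervention" year
linear_forecast = intercept + slope * t_future
actual = true_rate(t_future)                # no intervention effect at all

# Positive difference = spurious "benefit" attributed to the intervention.
spurious_benefit = linear_forecast - actual
print(round(spurious_benefit, 1))
```

A nonlinear (here, quadratic) model fitted to the same pre-period would extrapolate the curve correctly and show no effect, which is the point of the lower panel of Fig. 43.5.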
Often, the hypothesized cause (e.g., exposure) and effect (e.g., disease rate) time
series both undergo continual changes over time, instead of changing only once
or occasionally. For example, pollution levels and hospital admission rates for
respiratory or cardiovascular ailments change daily. In such settings of ongoing
changes in both hypothesized cause and effect time series, Granger causality tests
(and the closely related Granger-Sims tests for pairs of time series) address the
question of whether the former helps to predict the latter. If not, then the exposure-
response histories provide no evidence that exposure is a (Granger) cause of the
effects time series, no matter how strong, consistent, etc., the association between
their levels over time may be. More generally, a time series variable X is not
a Granger cause of a time series variable Y if the future of Y is conditionally
independent of the history of X (its past and present values), given the history of
Y itself, so that future Y values can be predicted as well from the history of Y values
as from the histories of both X and Y . If exposure is a Granger cause of health
effects but health effects are not Granger causes of exposures, then this provides
evidence that the exposure time series might indeed be a cause of the effects time
series. If exposure and effects are Granger causes of each other, then a confounder
that causes both of them is likely to be present. The key idea of Granger causality
testing is to provide formal quantitative statistical tests of whether the available data
suffice to reject (at a stated level of significance) the null hypothesis that the future
of the hypothesized effect time series can be predicted no better from the history
of the hypothesized cause time series together with the history of the effect time
series than it can be predicted from the history of the effect time series alone. Data
that do not enable this null hypothesis to be rejected do not support the alternative
hypothesis that the hypothesized cause helps to predict (i.e., is a Granger cause of)
the hypothesized effect.
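The null-hypothesis test described above can be sketched as a pair of nested autoregressions compared by an F-test; this is one minimal way to implement a lag-1 Granger test, not the full machinery of the packages cited below, and the data are synthetic with the causal direction built in by construction.

```python
# Minimal lag-1 Granger causality sketch: compare an autoregression of y on
# its own past with one that also uses the past of x, via an F-statistic on
# the residual sums of squares. Pure-Python OLS; synthetic data.
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rss(X, y):
    """Residual sum of squares from OLS of y on the columns of X."""
    n, k = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    return sum((y[i] - sum(beta[j] * X[i][j] for j in range(k))) ** 2 for i in range(n))

def granger_f(x, y):
    """F-statistic for H0: lagged x adds no predictive power for y (lag 1)."""
    n = len(y) - 1
    restricted = [[1.0, y[t]] for t in range(n)]   # y_{t+1} ~ 1 + y_t
    full = [[1.0, y[t], x[t]] for t in range(n)]   # y_{t+1} ~ 1 + y_t + x_t
    target = [y[t + 1] for t in range(n)]
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    return (rss_r - rss_f) / (rss_f / (n - 3))     # 1 restriction, 3 parameters

random.seed(0)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0]                                          # y is driven by lagged x
for t in range(1, 300):
    y.append(0.8 * x[t - 1] + random.gauss(0, 0.3))

f_xy, f_yx = granger_f(x, y), granger_f(y, x)
print(f_xy > f_yx)   # far more predictive power in the x -> y direction
```

A large F in one direction but not the other is the pattern, described in the text, that supports (without proving) a causal interpretation; it presupposes stationarity, as the following discussion makes clear.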
Granger causality tests can be applied to time series on different time scales to
study effects of time-varying risk factors. For example, the study in [76] identified a Granger-
causal association between fatty diet and risk of heart disease decades later in
aggregate (national level) data. Cox and Popken [16] found a statistically signifi-
cant historical association, but no evidence of Granger causation, between ozone
exposures and elderly mortality rates on a time scale of years. Granger causality
testing software is freely available in R (e.g., http://cran.r-project.org/web/packages/
MSBVAR/MSBVAR.pdf).
Originally Granger causality tests were restricted to stationary linear (autore-
gressive) time series models and to only two time series, a hypothesized cause
and a hypothesized effect. However, recent advances have generalized them to
multiple time series (e.g., using vector autoregressive (VAR) time series models)
and to nonlinear time series models (e.g., using nonparametric versions of the test
or parametric models that allow for multiplicative as well as additive interactions
among the different time series variables) ([6, 7, 111, 122]; Diks and Wolski
2014). These advances are now being made available in statistical toolboxes for
practitioners ([7]). For nonstationary time series, special techniques have been
developed, such as vector error-correction (VECM) models fit to first differences
of nonstationary variables or algorithms that search for co-integrated series (i.e.,
series whose weighted averages show zero mean drift). However, these techniques
are typically quite sensitive to model specification errors [91]. Transfer entropy (TE)
and its generalizations, discussed next, provide a more robust analytic framework
for identifying causality from multiple nonstationary time series based on the flow
of information among them.
Both Granger causality tests and conditional independence tests apply the principle
that causes should be informative about their effects; more specifically, changes in
direct causes provide information that helps to predict subsequent changes in effects.
This information is not redundant with the information from other variables and
cannot be explained away by knowledge of (i.e., by conditioning on the values of)
other variables. A substantial generalization and improvement of this information-
based insight is that information flows over time from causes to their effects, but
not in the reverse direction. Thus, instead of just testing whether past and present
exposures provide information about (and hence help to predict) future health
effects, it is possible to quantify the rate at which information, measured in bits,
flows from the past and present values of the exposure time series to the future
values of the effects time series. This is the key concept of transfer entropy (TE)
[81, 91, 99, 118]. It provides a nonparametric, or model-free, way to detect and
quantify rates of information flow among multiple variables and hence to infer
causal relations among them based on the flow of information from changes in
causal variables (drivers) to subsequent changes in the effect variables that they
cause (responses). If there is no such information flow, then there is no evidence
of causality.
Transfer entropy (TE) is model-free in that it examines the empirically estimated
conditional probabilities of values for one time series, given previous values of
others, without requiring any parametric models describing the various time series.
Like Granger causality, TE was originally developed for only two time series,
a possible cause and a possible effect, but has subsequently been generalized
to multiple time series with information flowing among them over time (e.g.,
[81, 91, 99]). In the special case where the exposure and response time series can
be described by linear autoregressive (AR) processes with multivariate normal error
terms, tests for TE flowing from exposure to response are equivalent to Granger
causality tests (Barnett et al. 2009), and Granger tests, in turn, are equivalent to
conditional independence tests for whether the future of the response series is
conditionally independent of the history of the exposure series, given the history
of the response series. Free software packages for computing the TE between or
among multiple time series variables are now available for MATLAB [81] and
other software (http://code.google.com/p/transfer-entropy-toolbox/downloads/list).
Although transfer entropy and closely related information-theoretic quantities have been
developed and applied primarily within physics and neuroscience to quantify flows
of information and appropriately defined causal influences [58] among time series
variables, they are likely to become more widely applied in epidemiology as their
many advantages become more widely recognized.
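For discrete-valued series, the plug-in estimate of transfer entropy is a direct transcription of its definition: the expected information, in bits, that the present of X adds about the next value of Y beyond what Y's own present provides. The sketch below is such a plug-in estimator for binary series; the data are synthetic, with the information flow from x to y built in by construction, and no bias correction is applied.

```python
# Plug-in transfer entropy TE(x -> y) for discrete time series:
# TE = sum over (y_{t+1}, y_t, x_t) of p * log2[ P(y_{t+1}|y_t,x_t) / P(y_{t+1}|y_t) ].
from collections import Counter
from math import log2
import random

def transfer_entropy(x, y):
    """Plug-in estimate of TE(x -> y), in bits, for discrete-valued series."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_t, x_t)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_{t+1}, y_t)
    singles = Counter(y[:-1])                       # y_t
    n = len(x) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n
        p_cond_full = c / pairs_yx[(y0, x0)]             # P(y_{t+1} | y_t, x_t)
        p_cond_self = pairs_yy[(y1, y0)] / singles[y0]   # P(y_{t+1} | y_t)
        te += p_joint * log2(p_cond_full / p_cond_self)
    return te

random.seed(1)
x = [random.randint(0, 1) for _ in range(5000)]
y = [0]                     # y copies the previous x with 90% probability
for t in range(1, 5000):
    y.append(x[t - 1] if random.random() < 0.9 else 1 - x[t - 1])

te_xy, te_yx = transfer_entropy(x, y), transfer_entropy(y, x)
print(te_xy > te_yx)   # information flows from x to y, not the reverse
```

No parametric model of either series is assumed, which is the sense in which TE is model-free; in practice, finite-sample bias and the discretization of continuous series require care.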
The insight that changes in causes produce changes in their effects, making the
probability distributions for effect variables different from what they otherwise
would have been, has contributed to a well-developed field of counterfactual
(potential-outcome) causal modeling [51]. A common analytic technique in this
field is to treat the unobserved outcomes that would have occurred had causes
(e.g., exposures or treatments) been different as missing data and then to apply
missing-data methods for regression models to estimate the average difference
then the estimated effect of the treatment on mortality rates might be very different
from what it would be if it is assumed that the patient would not have received the
treatment because the physician is confident that there is no need for it and that the
patient would recover anyway. In practice, counterfactual comparisons usually do
not specify in detail the causal mechanisms behind the counterfactual assumptions
about treatments or exposures, and this can obscure the precise interpretation of any
comparison between what did happen and what it is supposed, counterfactually, would
have happened had treatments (or exposure) been different. Standard approaches
estimate effects under the assumption that those who are treated or exposed are
exchangeable with those who are not within strata of adjustment factors that may
affect (but are not affected by) the treatment or the outcome, but the validity of this
assumption is usually difficult or impossible to prove.
An alternative, increasingly popular, approach to using untested assumptions
to justify causal conclusions is the instrumental variable (IV) method, originally
developed in econometrics [112]. In this approach, an instrument is defined as a
variable that is associated with the treatment or exposure of interest (a condition
that is usually easily verifiable) and that also affects outcomes (e.g., adverse health
effects) only through the treatment or exposure variable, without sharing any causes
with the outcome. (In DAG notation, such an instrument would be a variable, such
as X in the DAG model X → Y → Z, with an arrow directed only into Y, where Z is
the outcome variable, Y the treatment or exposure variable, and X a variable that
affects exposure, such as job category, residential location, or intent-to-treat.) These
latter conditions are typically assumed in IV analyses, but not tested or verified. If
they hold, then the effects of unmeasured confounders on the estimated association
between Y and Z can be eliminated using observed values of the instrument,
and this is the potential great advantage of IV analysis. However, in practice,
IV methods applied in epidemiology are usually dangerously misleading, as even
minor violations of their untested assumptions can lead to substantial errors and
biases in estimates of the effects of different exposures or treatments on outcomes;
thus, many methodologists consider it inadvisable to use IV methods to support
causal conclusions in epidemiology, despite their wide and increasing popularity
for this purpose [112]. Unfortunately, within important application domains in
epidemiology, including air pollution health effects research, leading investigators
sometimes draw strong but unwarranted causal conclusions using IV or counter-
factual (PSM or MSM) methods and then present these dubious causal conclusions
and effects estimates to policy-makers and the public as if they were known to be
almost certainly correct, rather than depending crucially on untested assumptions
of unknown validity (e.g., [104]). Such practices lead to important-seeming journal
articles and policy recommendations that are untrustworthy, potentially reflecting
the ideological biases or personal convictions of the investigator rather than true
discoveries about real-world causal impacts of exposures [103]. Other scientists
and policy makers are well advised to remain on guard against such enthusiastic
claims about causal impacts and effects estimates promoted by practitioners of IV
and counterfactual methods who do not appropriately caveat their conclusions by
emphasizing their dependence on untested modeling assumptions.
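To make the mechanics of the IV idea concrete (while keeping the chapter's caveats firmly in mind), the sketch below applies the simplest IV estimator, the Wald ratio for a binary instrument, to synthetic data in which the exclusion restriction holds by construction. Every variable and coefficient is invented; in real data the key assumption, that X affects Z only through Y, is exactly the untestable premise the text warns about.

```python
# Wald instrumental-variable sketch for the chain X -> Y -> Z with an
# unmeasured confounder U of Y and Z. Synthetic data; the exclusion
# restriction is true here only because we wrote the data generator.
import random

def mean(v):
    return sum(v) / len(v)

def wald_iv(x, y, z):
    """Wald IV estimate of the effect of y on z using binary instrument x."""
    z1 = mean([zi for xi, zi in zip(x, z) if xi == 1])
    z0 = mean([zi for xi, zi in zip(x, z) if xi == 0])
    y1 = mean([yi for xi, yi in zip(x, y) if xi == 1])
    y0 = mean([yi for xi, yi in zip(x, y) if xi == 0])
    return (z1 - z0) / (y1 - y0)

random.seed(2)
n = 20000
x = [random.randint(0, 1) for _ in range(n)]        # instrument
u = [random.gauss(0, 1) for _ in range(n)]          # unmeasured confounder
y = [2.0 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]        # exposure
z = [0.5 * yi + 1.5 * ui + random.gauss(0, 1) for yi, ui in zip(y, u)]  # outcome

# A naive regression of z on y would be biased upward by u; because x is
# independent of u and affects z only through y here, the Wald ratio
# recovers a value close to the true coefficient 0.5.
print(wald_iv(x, y, z))
```

If the generator were altered so that x also affected z directly, the same estimator would return a confidently wrong answer, which is precisely the fragility the text describes.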
An older, but still useful, approach to causal inference from observational data,
developed largely in the 1960s and 1970s, consists of showing that there is an
association between exposure and response that cannot plausibly be explained by
confounding, biases (including model and data selection biases and specification
errors), or coincidence (e.g., from historical trends in exposure and response that
move together but that do not reflect causation). Quasi-experiment (QE) design and
analysis approaches originally developed in social statistics [12] systematically enu-
merate potential alternative explanations for observed associations (e.g., coincident
historical trends, regression to the mean, population selection, and response biases)
and provide statistical tests for refuting them with data, if they can be refuted. The
interrupted time series analysis studies discussed earlier are examples of quasi-
experiments: they do not allow random assignment of individuals to exposed and
unexposed populations but do allow comparisons of what happened in different
populations before and after an intervention that affects some of the populations
but not others (the comparison groups).
A substantial tradition of refutationist approaches in epidemiology follows the
same general idea of providing evidence for causation by using data to explicitly
test, and if possible refute, other explanations for exposure-response associations
[77]. As stated by Samet and Bodurow [100], "Because a statistical association
between exposure and disease does not prove causation, plausible alternative
hypotheses must be eliminated by careful statistical adjustment and/or consideration
of all relevant scientific knowledge. Epidemiologic studies that show an association
after such adjustment, for example through multiple regression or instrumental
variable estimation, and that are reasonably free of bias and further confounding,
provide evidence but not proof of causation." This is overly optimistic, insofar as
associations that are reasonably free of bias and confounding do not necessarily
provide evidence of causation. For example, strong, statistically significant associ-
ations (according to usual tests, e.g., t-tests) typically occur in regression models
in which the explanatory and dependent variables undergo statistically independent
random walks. The resulting associations do not arise from confounding or bias
but from spurious regression, i.e., coincident historical trends created by random
processes that are not well described by the assumptions of the regression models.
Nonetheless, the recommendation that "plausible alternative hypotheses must be
eliminated by careful statistical adjustment and/or consideration of all relevant
scientific knowledge" well expresses the refutationist point of view.
Perhaps the most useful and compelling valid evidence of causation, with the
possible exception of differences in effects between treatment and control groups
in well-conducted randomized control trials, consists of showing that changes in
For most of the past century, DAG models such as those in Figs. 43.1 and 43.6,
in which arrows point from some variables into others and there are no directed
cycles, have been used to explicate causal networks of mechanisms and to provide
formal tests for their hypothesized causal structures. For example, path analysis
methods showing the dependency relations among variables in SEMs have been
used for many decades to show how some variables influence others when all
relations are assumed to be linear. Figure 43.7 presents an example involving several
variables that are estimated to significantly predict lung cancer risk: the presence of
a particular single nucleotide polymorphism (SNP) (the CHRNA5-A3 gene cluster,
a genetic variant which is associated with increased risk of lung cancer), smoking,
and presence of chronic obstructive pulmonary disease (COPD) [120].

Fig. 43.6 Simulation model for major health conditions related to cardiovascular disease (CVD) and their causes. Boxes represent risk factor prevalence rates modeled as dynamic stocks. Population flows among these stocks (including people entering the adult population, entering the next age category, immigration, risk factor incidence, recovery, cardiovascular event survival, and death) are not shown (Source: [52])
Key: Blue solid arrows: causal linkages affecting risk factors and cardiovascular events and deaths.
Brown dashed arrows: influences on costs.
Purple italics: factors amenable to direct intervention.
Black italics (population aging, cardiovascular event fatality): other specified trends.
Black nonitalics: all other variables, affected by italicized variables and by each other

Fig. 43.7 A path diagram with standardized coefficients showing linear effects of some variables on others (Source: [120])
assumptions of linear SEMs and jointly normally distributed error terms relating
the value of each variable to the values of its parents hold; analyses based on
correlations can then be used instead. Such consistency and coherence constraints
can be expressed as systems of equations that can be solved, when identifiability
conditions hold, to estimate the path coefficients (including confidence intervals)
relating changes in parent variables to changes in their children. Summing these
changes over all paths leading from exposure to response variables allows the total
effect (via all paths) of a change in exposure on changes in expected responses to be
estimated or predicted.
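The path-summation rule just stated (total effect = sum over all directed paths of the products of path coefficients along each path) can be sketched directly. The tiny DAG and its standardized coefficients below are hypothetical, loosely patterned on the variables of Fig. 43.7, not taken from [120].

```python
# Path-analysis sketch: total effect of a source on a target is the sum over
# all directed paths of the products of the path coefficients along each path.
# Hypothetical DAG and coefficients, loosely patterned on Fig. 43.7.

# edges[parent] = list of (child, standardized path coefficient)
edges = {
    "SNP":     [("Smoking", 0.10), ("COPD", 0.20), ("LungCancer", 0.05)],
    "Smoking": [("COPD", 0.30), ("LungCancer", 0.40)],
    "COPD":    [("LungCancer", 0.25)],
}

def total_effect(source, target):
    """Sum of coefficient products over every directed path source -> target."""
    if source == target:
        return 1.0
    return sum(coef * total_effect(child, target)
               for child, coef in edges.get(source, []))

# Paths SNP -> LungCancer: direct 0.05; via Smoking 0.1*0.4; via COPD
# 0.2*0.25; via Smoking -> COPD 0.1*0.3*0.25. Total = 0.1475.
print(round(total_effect("SNP", "LungCancer"), 4))   # -> 0.1475
```

Because the DAG is acyclic, the recursion terminates; in a standardized linear SEM this quantity is the predicted change in the target, in standard deviations, per standard-deviation change in the source.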
Path analysis and other SEM models are particularly valuable for detecting
and quantifying the effects of unmeasured (latent) confounders based on the
patterns of correlations that they induce among observed variables. SEM modeling
methods have also been extended to include quadratic terms, categorical variables,
and interaction terms [66]. Standard statistics packages and procedures, such as
PROC CALIS in SAS, have made this technology available to modelers for the
past four decades, and free modern implementations are readily available (e.g., in
the R packages SEM or RAMpath).
Path analysis, which is now nearly a century old, provided the main technology
for causal modeling for much of the twentieth century. More recently, DAG
models have been generalized so that causal mechanisms need not be described
by linear equations for expected values, but may instead be described by arbitrary
conditional probability relations. The nodes in such a graph typically represent
random variables, stochastic processes, or time series variables in which a decision-
maker may intervene from time to time by taking actions that cause changes in some
of the time series [4, 23].
Bayesian networks (BNs) are a prominent type of DAG model in which nodes
represent constants, random variables, or deterministic functions [3]. Figure 43.8
shows an example of a BN for cardiovascular disease (CVD). As usual, the absence
of an arrow between two nodes, such as between ethnic group and CVD in Fig. 43.8,
implies that neither has a probability distribution that depends directly on the
value of the other. Thus, ethnic group is associated with CVD, but the association
is explained by smoking as a mediating variable, and the structure of the DAG
shows no further dependence of CVD on ethnic group. Statistically, the random
variable indicating cardiovascular disease, CVD, is conditionally independent of
ethnic group, given the value of smoking.
Some useful causal implications may be revealed by the structure of a DAG,
even before the model is quantified to create a fully specified BN (or other DAG)
model. For example, if the DAG structure in Fig. 43.8 correctly describes an
individual or population, then elevated systolic BP (blood pressure) is associated
with CVD risk, since both have age and ethnicity as ancestors in the DAG. However,
changes in statin use, which could affect systolic BP via the intermediate variable antihypertensive, would not be expected to have any effect on CVD risk.

Fig. 43.8 Directed acyclic graph (DAG) structure of a Bayesian network (BN) model for cardiovascular disease (CVD) risk (Source: [115])

Learning the correct DAG structure of causal relations among variables from data (a key part of the difficult task of causal discovery, discussed later) can reveal important
and unexpected findings about what works and what does not for changing the
probabilities of different outcomes, such as CVD in Fig. 43.8. However, uncertainty
about the correct causal structure can make sound inference about causal impacts
(and hence recommendations about the best choice of actions to produce desired
changes) difficult to determine. Such model uncertainty motivates the use of
ensemble modeling methods, discussed later.
Fig. 43.9 BN model of risk of infectious diarrhea among children under 5 in Cameroon. The left
panel shows the unconditional risk for a random child from the population (14.97%); the right panel
shows the conditional risk for a malnourished child from a home in the lowest income quintile and
with poor sanitation (20.00%) (Source: [87])
The left panel shows a BN model for risk of infectious diarrhea among young
children in Cameroon. Each of three risk factors, income quintile, availability of
toilets ("sanitation"), and stunted growth ("malnutrition"), affects the probability
that a child will ever have had diarrhea for at least two weeks ("diarrhea"). In
addition, these risk factors affect each other, with an observation of low income
making observations of poor sanitation and malnutrition more likely and observed
poor sanitation also making observed malnutrition more likely at each level of
income. The right panel shows an instance of the model for a particular case of
a malnourished child from the poorest income quintile living with poor sanitation;
these three risk factor values all have probabilities set to 100%, since their values
are known. The result is that the risk of diarrhea, conditioned on this information,
is increased from an average value of 14.97% in this population of children to
20% when all three risk factors are set to these values.
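The kind of query just described, an unconditional risk versus a risk conditioned on observed risk factors, can be sketched by enumerating the joint distribution that the DAG factorization defines. The CPT numbers below are invented for illustration and are not those of the model in [87]; only the structure (income influencing sanitation and malnutrition, all three influencing diarrhea) is retained.

```python
# BN sketch in the spirit of Fig. 43.9: build the joint distribution from CPTs
# along the DAG and compare unconditional vs. conditional outcome probabilities.
# All CPT numbers are hypothetical.
from itertools import product

p_low = 0.2                                          # P(income = low)
p_poor_san = {True: 0.7, False: 0.3}                 # P(poor sanitation | income low?)
p_malnut = {(True, True): 0.5, (True, False): 0.3,   # P(malnourished | income, sanitation)
            (False, True): 0.3, (False, False): 0.1}
p_diarrhea = {(True, True): 0.35, (True, False): 0.25,  # P(diarrhea | sanitation, malnut.)
              (False, True): 0.20, (False, False): 0.10}

def joint(low, poor, mal, dia):
    """Joint probability from the DAG factorization of the four variables."""
    p = p_low if low else 1 - p_low
    p *= p_poor_san[low] if poor else 1 - p_poor_san[low]
    p *= p_malnut[(low, poor)] if mal else 1 - p_malnut[(low, poor)]
    p *= p_diarrhea[(poor, mal)] if dia else 1 - p_diarrhea[(poor, mal)]
    return p

def prob_diarrhea(**evidence):
    """P(diarrhea | evidence) by brute-force enumeration of the joint."""
    num = den = 0.0
    for low, poor, mal in product([True, False], repeat=3):
        case = {"low": low, "poor": poor, "mal": mal}
        if any(case[k] != v for k, v in evidence.items()):
            continue
        den += joint(low, poor, mal, True) + joint(low, poor, mal, False)
        num += joint(low, poor, mal, True)
    return num / den

base = prob_diarrhea()                                # unconditional risk
worst = prob_diarrhea(low=True, poor=True, mal=True)  # all risk factors present
print(base < worst)   # conditioning on the risk factors raises the risk
```

Real BN software performs the same computation with algorithms that avoid full enumeration, but the semantics (CPTs composed along the DAG, then conditioned on evidence) are exactly these.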
A BN model stores conditional probability tables (CPTs) at nodes with inward-pointing
arrows. A CPT simply tabulates the conditional probabilities for the values
of the node variable for each combination of values of the variables that point into
it (its parents in the DAG). For example, the malnutrition node in Fig. 43.9 has
a CPT with 10 rows, since its two parents, Income and Sanitation, have 5 and 2
possible levels, respectively, implying ten possible pairs of input values. For each of
these ten combinations of input values, the CPT shows the conditional probability
for each possible value of Malnutrition (here, just Yes and No, so that the CPT
for Malnutrition has 2 columns); these conditional probabilities must sum to 1 for
each row of the CPT. BN nodes can also represent deterministic functions by CPTs
that assign 100% probability to a specific output value for each combination of
input values. The conditional probability distribution for the value of a node (i.e.,
variable) thus depends on the values of the variables that point into it; it can be freely
specified (or estimated from data, if adequate data are available) without making any
Any joint probability distribution of multiple random variables can be factored into
a product of marginal and conditional probability distributions and displayed in
DAG form, usually in several different ways. For example, the joint probability
mass function P(x, y) of two discrete random variables X and Y, specifying the
probability of each pair of specific values (x, y) for random variables X and Y,
can be factored as P(x)P(y|x) or as P(y)P(x|y) and can be displayed in a BN
as X → Y or as Y → X, respectively. Here, x and y denote possible values
of random variables X and Y, respectively; P(x, y) denotes the joint probability
that X = x and Y = y; P(x) denotes the marginal probability that X = x;
P(y) denotes the marginal probability that Y = y; and P(y|x) and P(x|y) denote
the conditional probabilities that Y = y given that X = x and that X = x given that
Y = y, respectively. Thus, there is nothing inherently causal about a BN. Its nodes
need not represent causal mechanisms that map values of inputs to probabilities
for the values of outputs caused by those inputs. Even if they do represent such
causal mechanisms, they may not explicate how or why the mechanisms work. For
example, the direct link from Income to Malnutrition in Fig. 43.9 gives no insight
into how or why changes in income affect changes in malnutrition (e.g., what
specific decisions or behaviors are influenced by income that, in turn, result in
better or worse nutrition). Thus, it is possible to build and use BNs for probabilistic
inference without seeking any causal interpretation of the statistical dependencies
among its variables.
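The point that arrow direction alone carries no causal content can be checked numerically: both factorizations of the same joint distribution reproduce it exactly. The joint table below is arbitrary, chosen only for the check.

```python
# Check that P(x, y) = P(x) P(y|x) = P(y) P(x|y): the same joint distribution
# can be drawn as the BN X -> Y or as Y -> X. Arbitrary joint for illustration.

joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}   # P(x, y)

px = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
py = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

for (x, y), p in joint.items():
    forward = px[x] * (joint[(x, y)] / px[x])    # P(x) P(y|x): BN drawn X -> Y
    backward = py[y] * (joint[(x, y)] / py[y])   # P(y) P(x|y): BN drawn Y -> X
    assert abs(p - forward) < 1e-12 and abs(p - backward) < 1e-12

print("both factorizations reproduce the same joint distribution")
```

A causal interpretation must therefore come from outside the probability model, e.g., from the stable-mechanism assumptions discussed next.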
However, BNs are often deliberately constructed and interpreted to mean that
changes in the value of a variable at the tail of an arrow will cause a change in
the probability distribution of the variable into which it points, as described by the
CPT at that node. The effect of a change in a parent variable on the probability
distribution of a child variable into which it points may depend on the values of
other parents of that node, thus allowing interactions among direct causes at that
node to be modeled. For example, in Fig. 43.8, the effects of smoking on CVD risk
may be different at different ages, and this would be indicated in the CPT for the
CVD node by having different probabilities for the values of the CVD variable at
different ages for the same value of the smoking variable. A causal BN is a BN
in which the nodes represent stable causal mechanisms or laws that predict how
changes in input values change the probability distribution of output values. The
CPT at a node of a causal BN describes the conditional probability distribution for
its value caused by each combination of values of its inputs, meaning that changes
in one or more of its input values will be followed by corresponding changes in the
probability distribution for the node's value, as specified by the CPT. This is similar
to the concept of a causal mechanism in structural equation models, where a change
in a right-hand side (explanatory or independent) variable in a structural equation
is followed by a change in the left-hand side (dependent or response variable) to
restore equality [119].
A causal BN allows changes at input nodes to be propagated throughout the rest
of the network, yielding a posterior joint probability distribution for the values of
all variables. (If the detailed time course of changes in probabilities is of interest,
then differential equations or dynamic Bayesian networks (DBNs), discussed later,
may be used to model how a node's probability distribution of values changes
from period to period.) The order in which changes propagate through a network
provides insight into the (total or partial) causal ordering of variables and can be
used to help deduce network structures from time series data [119]. Similarly, in a
set of simultaneous linear structural equations describing how equilibrium levels
of variables in a system are related, the causal ordering of variables (called the
"Simon causal ordering" in econometrics) is revealed by the order in which the
equations must be solved to determine the values of all the variables, beginning
with exogenous inputs (and assuming that the system of equations can be solved
uniquely, i.e., that the values of all variables are uniquely identifiable from the data).
Causality flows from exogenous to endogenous variables and among endogenous
variables in such SEMs (ibid.). Exactly how the changes in output probabilities
(or in the expected values of left-side variables in an SEM) caused by changes in
inputs are to be interpreted (e.g., as changes in the probability distribution of future
observed values for a single individual or as changes in the frequency distribution
of the variable values in a population of individuals described by the BN) depends
on the situation being modeled.
A true causal mechanism that has been explicated in enough detail to make reliable
predictions can be modeled as a conditional probability table (CPT) that gives the
same conditional probabilities of output values whenever the input values are the
same. Such a stable, repeatable relation, which might be described as lawlike,
43 Quantifying and Reducing Uncertainty About Causality in: : : 1475
can be applied across multiple contexts as long as the inputs to the node are
sufficient to determine (approximately) unique probabilities for its output values.
For example, a dose-response relation between radiation exposure and excess age-
specific probability (or, more accurately, hazard rate) for first diagnosis with a
specific type of leukemia might be estimated from data for one population and then
applied to another with similar exposures, provided that the change in risk caused by
exposure does not depend on omitted factors. If it depends on age and ethnic group,
for example, then these would have to be included, along with exposure, as inputs
to the node representing leukemia status. By contrast, unexplained heterogeneity,
in which the estimated CPT differs significantly when study designs are repeated
by different investigators, signals that a lawlike causal mechanism has not yet been
discovered. In that case, the models and the knowledge that the BN represents need
to be further refined to discover and express predictively useful causal relations
that can be applied to new conditions. The key idea is that, to be transferable
across contexts (e.g., populations), the probabilistic relations encoded in CPTs must
include all of the input conditions that suffice to make their conditional probabilities
accurate, given accurately measured or estimated input values.
A proposed causal relation that turns out to be very heterogeneous, sometimes
showing significant positive effects and other times no effects or significant negative
effects under the same conditions, does not correspond to a lawlike causal relation
and cannot be relied on to make valid causal predictions (e.g., by using mean values
averaged over many heterogeneous studies). Thus, the estimated CPTs at nodes in
Fig. 43.9 may be viewed as averages of many individual-specific CPTs, and the
predictions that they make for any individual case may not be accurate. CPTs that
simply summarize historical data on conditional frequency distributions, but that
do not represent causal mechanisms, may be no more than mixtures of multiple
CPTs for the (perhaps unknown) populations and conditions that contributed to
the historical data. They cannot necessarily be generalized to new populations or
conditions (sometimes described as being "transported" to new contexts) or used to
predict how outputs will change in response to changes in inputs, unless the relevant
mixtures are known. For example, suppose that the Sanitation node has a value of 1
for children from homes with toilets and a value of 0 otherwise. If homes may have
toilets either because the owners bought them or because a government program
supplied them along with child food and medicine, then the effect of a finding
that Sanitation = 1 on the conditional probability distribution of Malnutrition may
depend very much on which of these reasons resulted in Sanitation = 1. But this
is not revealed by the model in Fig. 43.9. In such a case, the estimated CPTs for
the nodes in Fig. 43.9 should not be interpreted as describing causal mechanisms,
and the effects on other variables of setting Sanitation = 1 by alternative methods
cannot be predicted from the model in Fig. 43.9.
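A toy calculation (all numbers invented) makes the point concrete: a CPT estimated from a population that mixes two mechanisms fails to transport when an intervention changes the mixture.

```python
# Hypothetical illustration: a pooled CPT that mixes two mechanisms
# does not transport when the mixture weights change.

# Two reasons a home can have Sanitation = 1 (probabilities invented):
#   "bought":  P(Malnutrition = 1 | Sanitation = 1) = 0.30
#   "program": the program also supplies food and medicine, so
#              P(Malnutrition = 1 | Sanitation = 1) = 0.10
p_mal = {"bought": 0.30, "program": 0.10}

def pooled_cpt(w_bought):
    """CPT entry P(Malnutrition = 1 | Sanitation = 1) estimated from a
    population in which a fraction w_bought of toilets were bought."""
    return w_bought * p_mal["bought"] + (1 - w_bought) * p_mal["program"]

# Historical data: 80% of toilets were bought.
historical = pooled_cpt(0.8)   # 0.26

# New context: a government program sets Sanitation = 1 everywhere,
# so all toilets now come via the program.
new_context = pooled_cpt(0.0)  # 0.10

print(historical, new_context)
```

The pooled estimate 0.26 is correct for the historical mixture but overstates risk by a factor of 2.6 in the new context, even though neither underlying mechanism changed.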
Once a BN has been quantified by specifying its DAG structure and the probability
tables at its nodes, it can be used to draw a variety of useful inferences by applying
1476 L.A. Cox, Jr.,
any of several well-developed algorithms created and refined over the past three
decades [3]. The most essential inference capability of a BN model is that if
observations (or findings) about the values at some nodes are entered, then the
conditional probability distributions of all other nodes can be computed, conditioned
on the evidence provided by these observed values. This is called posterior
inference. In other words, the BN provides computationally practicable algorithms
for accomplishing the Bayesian operation of updating prior knowledge or beliefs,
represented in the node probability tables, with observations to obtain posterior
probabilities. For example, if known values of a patient's age, sex, and systolic blood
pressure were to be entered for the BN in Fig. 43.8, then the conditional probability
distributions based on that information could be computed for all other variables,
including diabetes status and CVD risk, by BN posterior inference algorithms. In
Fig. 43.9, learning that a child is from a home with inadequate sanitation would
allow updated (posterior) probabilities for the possible income and nutrition levels,
as well as the probability of diarrhea, to be computed using exact probabilistic
inference algorithms. The best-known exact algorithm (the junction tree algorithm)
is summarized succinctly by [3]. For large BNs, approximate posterior probabilities
can be computed efficiently using Monte Carlo sampling methods, such as Gibbs
sampling, in which input values drawn from the marginal distributions at input nodes
are propagated through the network by sampling from the conditional distributions
given by the CPTs, thus building up a sample distribution for any output variable(s)
of interest.
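For very small networks, exact posterior inference can be carried out by brute-force enumeration of the joint distribution. The sketch below uses a hypothetical three-node chain whose variable names loosely echo Fig. 43.9; all probabilities are invented for illustration.

```python
# Hypothetical three-node chain BN: Income -> Sanitation -> Diarrhea.
# All probabilities below are made up for illustration.
p_i1 = 0.4                         # P(Income = 1)
p_s1_given_i = {1: 0.9, 0: 0.3}    # P(Sanitation = 1 | Income)
p_d1_given_s = {1: 0.05, 0: 0.40}  # P(Diarrhea = 1 | Sanitation)

def joint(i, s, d):
    """Joint probability P(i, s, d) from the chain factorization."""
    pi = p_i1 if i == 1 else 1 - p_i1
    ps = p_s1_given_i[i] if s == 1 else 1 - p_s1_given_i[i]
    pd = p_d1_given_s[s] if d == 1 else 1 - p_d1_given_s[s]
    return pi * ps * pd

# Posterior inference: P(Income = 1 | Diarrhea = 1) by summing out Sanitation.
num = sum(joint(1, s, 1) for s in (0, 1))
den = sum(joint(i, s, 1) for i in (0, 1) for s in (0, 1))
posterior = num / den
print(round(posterior, 3))  # 0.161
```

Observing diarrhea lowers the probability of high income from the prior 0.4 to about 0.16, since low sanitation (more likely under low income) raises diarrhea risk. Junction-tree and Gibbs-sampling algorithms compute the same quantity without enumerating the full joint.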
The BN in Fig. 43.10 illustrates that different types of data, from demographics
(age and sex) to choices and behaviors (smoking) to comorbidities (diabetes) to
clinical measurements (such as systolic and diastolic blood pressures (SBP and DBP))
and biomarkers (cholesterol levels) can be integrated in a BN model, here built using
Fig. 43.10 BN model for predicting CHD risk from multiple types of data [117]
the popular Netica BN software product, to inform risk estimates for coronary heart
disease (CHD) occurring in the next ten years. In addition to posterior inference of
entire probability distributions for its variables, BNs can be used to compute the
most likely explanations for observed or postulated outcomes (e.g., what are the
most likely input values that lead to a specified set of output values) and to study
the sensitivity of the probability of achieving (or avoiding) specified target sets of
output values to changes in the probabilities of input values.
BN software products such as Netica, Hugin, or BayesiaLab not only provide
algorithms to carry out these computations but also integrate them with graphic
user interfaces for drawing BNs, populating their probability tables, and reporting
results. For example, the DAG in Fig. 43.10, drawn in Netica, displays probability
distributions for the values of each node. The probabilities are for a man, since the
Sex node at the top of the diagram has its probability for the value "male" set to
100%. This node is shaded to show that observed or assumed values have been
specified by the user, rather than being inferred by the BN model. If additional facts
(findings) are entered, such as that the patient is a diabetic never-smoker, then the
probability distributions at the 10-yr Risk of Event node and the other nodes would
all be automatically updated to reflect (condition upon) this information.
Free BN software packages for creating BNs and performing posterior inference
are also available in both R and Python. In R, the gRain package allows BNs to be
specified by entering the probability tables for their nodes. The resulting BN model
can then be queried by entering the variables for which the posterior probability is
desired, along with observed or assumed values for other variables. The package will
return the posterior probabilities of the query variables, conditioned on the specified
observations or assumptions. Both exact and approximate algorithms (such as the
junction tree algorithm and Monte Carlo simulation-based algorithms, respectively)
for such posterior inference in Bayesian networks are readily available if all
variables are modeled as discrete with only a few possible values. For continuous
variables, algorithms are available if each node can be modeled as having a normal
distribution with a mean that is a weighted sum of the values of its parents, so that
each node's value depends on its parents' values through a linear regression equation.
Various algorithms based on Monte Carlo simulation are available for the case of
mixed discrete and continuous BNs [13].
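A minimal sketch of the linear-Gaussian case, with invented coefficients: because the joint distribution is multivariate normal, the posterior of a parent given a child is available in closed form, which the code checks against naive rejection sampling.

```python
import random

# Minimal linear-Gaussian BN with two nodes (coefficients invented):
#   X ~ N(0, 1);  Y = 2*X + N(0, 1).
# (X, Y) is jointly Gaussian, so the posterior of X given Y = y has the
# closed form E[X | Y = y] = cov(X, Y) / var(Y) * y = (2/5) * y.
beta, var_x, var_noise = 2.0, 1.0, 1.0
var_y = beta**2 * var_x + var_noise            # 5.0

def posterior_mean(y):
    """Closed-form posterior mean of X given the observation Y = y."""
    return beta * var_x / var_y * y

# Sanity check by crude rejection sampling: among simulated pairs whose
# Y lands near 3, the average X should approach (2/5) * 3 = 1.2.
random.seed(0)
xs = []
while len(xs) < 2000:
    x = random.gauss(0, 1)
    y = beta * x + random.gauss(0, 1)
    if abs(y - 3.0) < 0.1:
        xs.append(x)
print(posterior_mean(3.0), sum(xs) / len(xs))
```

The same conditioning formula, applied network-wide, is what exact inference algorithms for Gaussian BNs compute without any sampling.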
A far more difficult problem than posterior inference is to infer or learn BNs
or other causal graph models directly from data. This is often referred to as the
problem of causal discovery (e.g., [54]). It includes the structure discovery problem
of inferring the DAG of a BN from data, e.g., by making sure that it shows the
conditional independence relations (treated as constraints), statistical dependencies,
and order of propagation of changes [119] inferred from data. Structure learning
algorithms are typically either constraint-based, seeking to find DAG structures
that satisfy the conditional independence relations and other constraints inferred
from data, or score-based, seeking to find the DAG structure that maximizes a
criterion (e.g., likelihood or posterior probability penalized for complexity) [3,9,20],
although hybrid algorithms have also been developed. Learning a BN from data
also requires quantifying the probability tables (or other representations of the
probabilistic input-output relation) at each node, but this is usually much easier
than structure learning. Simply tabulating the frequencies of each output value
for each combination of input values may suffice for large data sets if the nodes
have been constructed to represent causal mechanisms. For smaller data sets, fitting
classification trees or regression models to available data can generate an estimated
CPT, giving the conditional probability of each output value for each set of values
of the inputs. Alternatively, Bayesian methods can be used to condition priors
(typically, Dirichlet priors for multinomial random variables) on available data to
obtain posterior distributions for the CPTs [110].
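For a binary node, the Dirichlet prior reduces to a Beta prior, and the posterior over a single CPT entry has a simple closed form; a minimal sketch with made-up counts:

```python
# Sketch: Bayesian estimation of one CPT entry from counts.
# For a binary node, a Dirichlet prior reduces to a Beta(a1, a0) prior on
# P(output = 1 | inputs); after observing n1 ones and n0 zeros for one
# input combination, the posterior is Beta(a1 + n1, a0 + n0).
def cpt_posterior_mean(n1, n0, a1=1.0, a0=1.0):
    """Posterior mean of P(output = 1 | inputs) under a Beta(a1, a0) prior."""
    return (a1 + n1) / (a1 + n1 + a0 + n0)

# With more data, the posterior mean approaches the raw frequency and the
# prior's influence fades.
print(cpt_posterior_mean(3, 1))      # small sample: 4/6 ~ 0.667
print(cpt_posterior_mean(300, 100))  # large sample: ~ 0.749
```

With no data at all, the uniform prior returns 0.5, which is one practical advantage over raw frequency tabulation when some input combinations are rare.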
Although many BN algorithms are now available to support learning BNs from
data [105], a fundamental limitation and challenge remains that multiple different
models often provide approximately equally good explanations of available data,
as measured by any of the many scoring rules, information-theoretic measures, and
other criteria that have been proposed, and yet they make different predictions for
new cases or situations. In such cases, it is better to use an ensemble of BN models
instead of any single one to make predictions and support decisions [3]. How best
to use common-sense knowledge-based constraints (e.g., that death can be an effect
but not a cause of exposure or that age can be a cause but not an effect of health
effects) to extract unique causal models, or small sets of candidate models, from
data is still an active area of research, but most BN packages allow users to specify
both required and forbidden arrows between nodes when these knowledge-based
constraints are available. Since it may be impossible to identify a unique BN model
from available data, the BN-learning and causal discovery algorithms included in
many BN software packages should be regarded as useful heuristics for suggesting
possible causal models, rather than as necessarily reliable guides to the truth.
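One reason for this non-uniqueness can be shown in a few lines: Markov-equivalent DAGs, such as X → Y and Y → X for two dependent variables, assign identical maximum likelihoods to any data set, so no score based on fit alone can orient the arrow. A sketch with invented counts:

```python
from math import log
from collections import Counter

# The DAGs X -> Y and Y -> X are Markov equivalent: with CPTs fit by
# maximum likelihood, they assign identical likelihoods to any data set,
# so data alone cannot orient the arrow. Illustrative data:
data = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 15 + [(1, 1)] * 35

n = len(data)
cxy = Counter(data)
cx = Counter(x for x, _ in data)
cy = Counter(y for _, y in data)

# log-likelihood under X -> Y: sum of log P(x) + log P(y | x)
ll_x_to_y = sum(log(cx[x] / n) + log(cxy[(x, y)] / cx[x]) for x, y in data)
# log-likelihood under Y -> X: sum of log P(y) + log P(x | y)
ll_y_to_x = sum(log(cy[y] / n) + log(cxy[(x, y)] / cy[y]) for x, y in data)

print(round(ll_x_to_y, 6), round(ll_y_to_x, 6))  # identical
```

Both factorizations reduce to the same joint distribution P(x, y), which is why knowledge-based constraints (required and forbidden arrows) are needed to break such ties.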
For practical applications, the bnlearn package in R [105] provides an assortment
of algorithms for causal discovery, with the option of including knowledge-based
constraints by specifying directed or undirected arcs that must always be included
or that must never be included. For example, in Fig. 43.8, sex, age, and ethnic group
cannot have arrows directed into them (they are not caused by other variables), and
CVD deaths cannot be a cause of any other variable [115]. The DAG model for
cardiovascular disease risk prediction in Fig. 43.8 was discovered using one of the
bnlearn algorithms (the grow-shrink algorithm for structure learning), together with
these knowledge-based constraints. On the other hand, the BN model in Fig. 43.10,
which was developed manually based on an existing regression model, has a DAG
structure that is highly questionable. Its logical structure is that of a regression
model: for men, all other explanatory or independent variables point into the
dependent variable 10-year Risk of Event, and there are no arrows directed between
explanatory variables, e.g., from smoking to risk of diabetes. Such additional
structure would probably have been discovered had machine learning algorithms
for causal discovery such as those in bnlearn been applied to the original data.
BNs and other causal graph models are increasingly used in epidemiology to
model uncertain and multivariate exposure-response relations. They are particularly
useful for characterizing uncertain causal relations, since they can represent both
uncertainty about the appropriate causal structure (DAG model), via the use of
multiple DAGs (ensembles of DAG models), and uncertainties about the marginal
and conditional probabilities at the input and other nodes. As noted by Samet
and Bodurow [100], "The uncertainty about the correct causal model involves
uncertainty about whether exposure in fact causes disease at all, about the set of
confounders that are associated with exposure and cause disease, about whether
there is reverse causation, about what are the correct parametric forms of the
relations of the exposure and confounders with outcome, and about whether there
are other forms of bias affecting the evidence. One currently used method for
making this uncertainty clear is to draw a set of causal graphs, each of which
represents a particular causal hypothesis, and then consider evidence insofar as it
favors one or more of these hypotheses and related graphs over the others."
An important principle for characterizing and coping with uncertainty about
causal models is not to select and use any single model when there is substantial
uncertainty about which one is correct [3]. It is more effective, as measured by
many performance criteria for evaluating predictive models, such as mean squared
prediction error, to combine the predictions from multiple models that all fit the
data adequately (e.g., that all have likelihoods at least 10% as large as that of
the most likely model). Indeed, the use of multiple models is often essential for
accurately depicting model uncertainty when quantifying uncertainty intervals or
uncertainty sets for model-based predictions. For example, Table 43.3 presents a
Table 43.3 A machine learning challenge: What outcome should be predicted for case 7 based
on the data in cases 1–6?
Case Predictor 1 Predictor 2 Predictor 3 Predictor 4 Outcome
1 1 1 1 1 1
2 0 0 0 0 0
3 0 1 1 0 1
4 1 1 0 0 0
5 0 0 0 0 0
6 1 0 1 1 1
7 1 1 0 1 ?
small hypothetical data set to illustrate that multiple models may provide equally
good (in this example, perfect) descriptions of all available data and yet make very
different predictions for new cases. For simplicity, all variables in this example are
binary (0–1) variables.
Suppose that cases 1–6 constitute a training set, with 4 predictors and one
outcome column (the rightmost) to be predicted from them. The challenge for
predictive analytics or modeling in this example is to predict the outcome for case
7 (the value, either 0 or 1, in the "?" cell in the lower right of the table). For
example, predictors 1–4 might represent various features (1 = present, 0 = absent)
of a chemical, or perhaps results of various quick and inexpensive assays for the
chemical (1 = positive, 0 = negative) and the outcome might indicate whether the
chemical would be classified as a rodent carcinogen in relatively expensive two-year
live-animal experiments. A variety of machine-learning algorithms are available for
inducing predictive rules or models from training data, from logistic regression
to classification trees (or random forest, an ensemble-modeling generalization of
classification trees) to BN learning algorithms. Yet, no algorithm can provide
trustworthy predictions for the outcome in case 7 based on the training data in
cases 1–6, since many different models fit the training data equally well and yet
make opposite predictions. For example, the following two models each describe
the training data in rows 1–6 perfectly, yet they make opposite predictions for case 7:
Likewise, these two models would make opposite predictions for a chemical with
predictor values of (0, 0, 1, 0). Model 1 can be represented by a BN DAG structure
in which predictors 2, 3, and 4 are the parents of the outcome node (and the CPT is
a deterministic function with probabilities of 1 or 0 that the outcome = 1, depending
on the values of these predictors). Model 2 would be represented by a BN in which
only node 3 is a parent of the outcome node. Additional models or prediction
rules include:
Model 3: Outcome is the greater of the values of predictors 1 and 2 except when
both equal 1, in which case the outcome is the greater of the values of predictors 3
and 4.
Model 4: Outcome is the greater of the values of predictors 1 and 2 except when
both equal 1, in which case the outcome is the lesser of the values of predictors 3
and 4.
These models also provide equally good fits to, or descriptions of, the training
data, but make opposite predictions for case 7 and imply yet another BN structure.
Thus, it is impossible to confidently identify a single correct model structure from
the training data (the data-generating process is non-identifiable from the training
data), and no predictive analytics or machine learning algorithm can determine from
these data a unique model (or set of prediction rules) for correctly predicting the
outcome for new cases or situations.
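The claim can be checked mechanically. The sketch below encodes Models 3 and 4 as stated above, plus Model 2 (outcome equals predictor 3, as implied by its one-parent BN structure); Model 1's explicit rule is not reproduced in the text, so it is omitted.

```python
# Verify for Table 43.3 that several rules fit cases 1-6 perfectly yet
# disagree on case 7. Rows are (p1, p2, p3, p4, outcome).
training = [
    (1, 1, 1, 1, 1), (0, 0, 0, 0, 0), (0, 1, 1, 0, 1),
    (1, 1, 0, 0, 0), (0, 0, 0, 0, 0), (1, 0, 1, 1, 1),
]
case7 = (1, 1, 0, 1)

model2 = lambda p1, p2, p3, p4: p3  # outcome's only BN parent is node 3
model3 = lambda p1, p2, p3, p4: max(p3, p4) if p1 == p2 == 1 else max(p1, p2)
model4 = lambda p1, p2, p3, p4: min(p3, p4) if p1 == p2 == 1 else max(p1, p2)

for name, m in [("Model 2", model2), ("Model 3", model3), ("Model 4", model4)]:
    fits = all(m(*row[:4]) == row[4] for row in training)
    print(name, "fits cases 1-6:", fits, " predicts case 7:", m(*case7))
```

All three rules reproduce cases 1–6 exactly, yet Model 3 predicts 1 for case 7 while Models 2 and 4 predict 0, so the training data cannot adjudicate among them.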
priors), then Monte Carlo uncertainty analysis can be used to propagate the
uncertain probabilities efficiently through the BN model, leading to uncertainty
distributions for the posterior probabilities of the values of the variables in the BN
(Kleiter 1996). Imprecise Dirichlet models have also been used to learn credal sets
from data, resulting in upper and lower bounds for the probability that each variable
takes on a given value [15].
Rather than using sets or intervals for uncertain probabilities, it is sometimes
possible to simply use best guesses (point estimates) and yet to have confidence that
the results will be approximately correct. Henrion et al. (1996) note that, in many
situations, the key inferences and insights from BN models are quite insensitive (or
robust) to variations in the estimated values in the probability tables for the nodes.
When this is the case, best guesses (e.g., MLE point estimates) of probability values
may be adequate for inference and prediction, even if the data and expertise used to
form those estimates are scarce and the resulting point estimates are quite uncertain.
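A minimal sketch of such a robustness check, using a hypothetical two-node Disease → Test network with invented probabilities: each probability-table entry is perturbed by ±0.05 and the qualitative inference is re-examined.

```python
from itertools import product

# Robustness check for a tiny two-node BN (Disease -> Test); all numbers
# are illustrative. The key inference is: "given a positive test, disease
# is more likely than not." Does it survive +/-0.05 perturbations of the
# prior, sensitivity, and specificity?
def posterior(prior, sens, spec):
    """P(Disease = 1 | Test = +) by Bayes' rule."""
    p_pos = prior * sens + (1 - prior) * (1 - spec)
    return prior * sens / p_pos

base = (0.30, 0.90, 0.80)
print(round(posterior(*base), 3))  # 0.659

robust = all(
    posterior(base[0] + dp, base[1] + ds, base[2] + dq) > 0.5
    for dp, ds, dq in product((-0.05, 0.0, 0.05), repeat=3)
)
print("inference robust to +/-0.05 perturbations:", robust)
```

Here the worst-case posterior over all 27 perturbed tables is still above 0.5, so point estimates would suffice for this inference even though each entry is uncertain.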
The BN techniques discussed so far are useful for predicting how output probabil-
ities will change if input values are varied, provided that the DAG structure can be
correctly interpreted as showing how changes in the inputs propagate through net-
works of causal mechanisms to cause changes in outputs. (As previously discussed,
this requires that the network is constructed so that the CPTs at nodes represent
not merely statistical descriptions of conditional probabilities in historical data but
causal relations determining probabilities of output values for each combination
of input values.) Once the probabilities of different outputs can be predicted for
different inputs, it is natural to ask how the controllable inputs should be set to make
the resulting probability distribution of outputs as desirable as possible. This is the
central question of decision analysis, and mainstream decision analysis provides a
standard answer: choose actions to maximize the expected utility of the resulting
probability distribution of consequences.
To modify BN models to support optimal (i.e., expected utility-maximizing)
risk management decision-making, the BNs must be augmented with two types
of nodes that do not represent random variables or deterministic functions. There
is a utility node, also sometimes called a value node, which is often depicted
in DAG diagrams as a hexagon and given a name such as "Decision-maker's
utility." There must also be one or more choice nodes, also called decision nodes,
commonly represented by rectangles. The risk management decision problem is to
make choices at the decision nodes to maximize the expected value of the utility
node, taking into account the uncertainties and conditional probabilities described
by the rest of the DAG model. Input decision nodes (i.e., decision nodes with
only outward-directed arrows) represent inputs whose values are controlled by the
decision-maker. Decision nodes with inputs represent decision rules, i.e., tables
or functions specifying how the decision node's value is to be chosen, for each
Fig. 43.11 An influence diagram (ID) model with two decision nodes (green rectangles) and with
Net health effect as the value node. (Questions and comments in trapezoids on the periphery are
not parts of the formal ID model, but help to interpret it for policy makers) (Source: www.lumina.com/case-studies/farmed-salmon/)
combination of values of its inputs. BNs with choice and value nodes are called
influence diagram (ID) models. BN posterior inference algorithms can be adapted
to solve for the best decisions in an ID, i.e., the choices to make at the choice nodes
in order to maximize the expected value of the utility node [3, 123].
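A minimal single-decision sketch (all probabilities, utilities, and node names invented, loosely echoing the consumption-advisory decision of Fig. 43.11) shows what "solving" an ID means:

```python
# Tiny influence diagram (all numbers invented): one decision node
# ("restrict" consumption or not), one chance node ("high pollutant
# exposure"), and a utility table over (decision, exposure). Solving the
# ID means choosing the decision with the largest expected utility.
p_high_exposure = {"restrict": 0.1, "no_restrict": 0.4}
utility = {  # (decision, high_exposure?) -> utility
    ("restrict", True): -50, ("restrict", False): -10,  # restriction has a cost
    ("no_restrict", True): -80, ("no_restrict", False): 0,
}

def expected_utility(decision):
    """Expected utility of a decision, averaging over the chance node."""
    p = p_high_exposure[decision]
    return p * utility[(decision, True)] + (1 - p) * utility[(decision, False)]

best = max(p_high_exposure, key=expected_utility)
print({d: expected_utility(d) for d in p_high_exposure}, "->", best)
```

With these invented numbers, restricting consumption has expected utility -14 versus -32 for not restricting, so the ID recommends restriction; ID solvers perform the same maximization over all decision nodes jointly.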
Figure 43.11 shows an example of an ID model developed and displayed using
the commercial ID software package Analytica. Its two decision nodes represent
choices about whether to lower the allowed limits for pollutants in fish feed and
whether to recommend to consumers that they restrict consumption of farmed
salmon, respectively [49]. The two decision nodes are shown as green rectangles,
located toward the top of the ID. The value or utility node in Fig. 43.11, shown
as a pink hexagon located toward the bottom of the diagram, is a measure of net
health effect in a population. It can be quantified in units such as change in life
expectancy (added person-year of life) or change in cancer mortality rates caused
by different decisions and by the other factors shown in the model. Many of these
factors, such as (a) the estimated exposure-response relations for health harm caused
by consuming pollutants in fish and (b) the health benefits caused by consuming
omega-3 fatty acids in fish, are uncertain. The uncertainties are represented by
random variables (the dark blue oval-shaped nodes throughout the diagram) and by
modeling assumptions that allow other quantities (the light blue oval-shaped nodes)
to be calculated from them.
An example of a modeling assumption is that pollutants increase mortality
rates in proportion to exposure, with the size of this slope factor being uncertain.
Different models (or expert opinions) for relevant toxicology, dietary habits,
consumer responses to advisories and recommendations, nutritional benefits of fish
consumption, and so forth can contribute to developing the CPTs for different parts
of the ID model and characterizing uncertainties about them. IDs thus provide a
constructive framework for coordinating and integrating multiple submodels and
contributions from multiple domains of expertise and for applying them to help
answer practical questions such as how different policies, regulations, warnings, or
other actions will affect probable health effects, consumption patterns, and other
outcomes of interest.
If multiple decision makers with different jurisdictions or spans of control
attempt to control the same outcome, however, then coordinating their decisions
effectively may require resolving game-theoretic issues in which each decision-
maker's best decision depends on what the others do. For example, in Fig. 43.11,
if the regulators in charge of setting allowed limits for pollutant contamination
levels in fish feed are different from the regulators or public health agencies issuing
advisories about what to eat and what not to eat, then each might decide not to take
additional action to protect public health if it mistakenly assumes that the other will
do so. Problems of risk regulation or management with multiple decision-makers
can be solved by generalizing IDs to multi-agent influence diagrams (MAIDs)
[67, 89, 107]. MAID algorithms recommend what each decision-maker, each with
its own utility function and decision variables, should do, taking into account any
information it has about the actions of others, when their decisions propagate
through a DAG model to jointly determine probabilities of consequences.
Although the idea of extending BNs to include decision and value nodes seems
straightforward in principle, understanding which variables are controllable by
whom over what time interval may require careful thought in practice. For example,
consider a causal graph model (Fig. 43.12) showing factors affecting treatment of
tooth defects (central node), such as the patient's age, genetics, smoking status, diabetes,
use of antibacterials, pulpal status, available surgical devices, and operator skill [1].
These variables have not been pre-labeled as chance or choice nodes. Even without
expertise in dentistry, it is clear that some of the variables, such as genetics or
age, should not ordinarily be modeled as decision variables. Others, such as Use of
antibacterials and Pulpal status (reflecting oral hygiene) may result from a history
of previous decisions by patients and perhaps other physicians or periodontists.
Still others, such as available surgical devices and operator skill, are fixed in the
short run, but might be considered decision variables over intervals long enough to
include the operator's education, training, and experience or if the decisions to be
Fig. 43.12 Different variables can be treated as decision variables on different time scales
(Source: [1]). (Nodes shown: Plaque control, Smoking, Use of antibacterials, Diabetes, Defect
morphology, Surgical procedure, Operator skill, Tooth anatomy, Surgical approach, Root surface
preparation, Devices)
made include hiring practices and device acquisition decisions of the clinic where
the surgery is performed. Smoking and diabetes indicators might also be facts about
a patient that cannot be varied in the short run, but that might be considered as
at least in part determined by past health and lifestyle decisions. In short, even
if a perfectly accurate causal graph model were available, the question of who
acts upon the world, how, and over what time frame via the causal mechanisms
in the model must still be resolved in formulating an ID or MAID model from
a causal BN. In organizations or nations seeking to reduce various risks through
policies or regulations, who should manage what, which variables should be taken
as exogenously determined, and which should be subjected to control must likewise
be resolved before ID or MAID models can be formulated and solved to obtain
recommended risk management decisions.
Once a causal ID model has been fully quantified, it can be used to predict how
the probability distributions for different outcomes of interest (such as net health
effect in Fig. 43.11) and expected utility will change if different decisions are
making for forecasting in detail the probabilities of different time courses of diseases
and related quantities, such as probability of first diagnosis with a disease or adverse
condition within a specified number of months or years [121], survival times and
probabilities for patients with different conditions and treatments, and remaining
days of hospitalization or remaining years of life for individual patients being
monitored and treated for complex diseases, from cancers to multiple surgeries to
sequential organ failures [5, 101].
DBN estimation software is freely available in R packages [69, 94]. It has been
developed and used largely by the systems biology community for interpreting time
series of gene expression. Biological and medical researchers,
electrical engineers, computer scientists, artificial intelligence researchers, and
statisticians have recognized that DBNs generalize important earlier methods of
dynamic estimation and inference, such as Hidden Markov Models and Kalman
filtering for estimation and signal processing [34]. DBNs are also potentially
extremely valuable in a wide range of other engineering, regulatory, policy, and
decision analysis settings where decisions and their consequences are distributed
over time, where feedback loops or other cycles make any static BN inapplicable,
or where detailed monitoring of changing probabilities of events is desired
so that midcourse changes in actions can be made in order to improve final
outcomes.
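A hidden Markov model is the simplest DBN, with one hidden node per time slice; the forward (filtering) recursion below, with invented transition and emission probabilities, shows how a node's probability distribution is updated from period to period as observations arrive.

```python
# Forward filtering in a two-state hidden Markov model (the simplest DBN).
# All probabilities are invented for illustration.
states = ("healthy", "ill")
transition = {"healthy": {"healthy": 0.9, "ill": 0.1},
              "ill":     {"healthy": 0.3, "ill": 0.7}}
emission = {"healthy": {"ok": 0.8, "symptom": 0.2},
            "ill":     {"ok": 0.3, "symptom": 0.7}}
prior = {"healthy": 0.95, "ill": 0.05}

def filter_states(observations):
    """Track P(state_t | observations up to time t)."""
    belief = dict(prior)
    for obs in observations:
        # predict: push the current belief through the transition model
        predicted = {s: sum(belief[r] * transition[r][s] for r in states)
                     for s in states}
        # update: condition on the new observation and renormalize
        unnorm = {s: predicted[s] * emission[s][obs] for s in states}
        z = sum(unnorm.values())
        belief = {s: unnorm[s] / z for s in states}
    return belief

belief = filter_states(["symptom", "symptom"])
print({s: round(p, 3) for s, p in belief.items()})
```

After two observed symptoms, the belief shifts from 95% healthy to roughly 61% ill; Kalman filtering performs the analogous predict-update cycle for linear-Gaussian DBNs.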
Development and application of DBN algorithms and various generalizations are
fruitful areas of ongoing applied research. Key concepts of DBNs and multi-agent
IDs have been successfully combined to model multi-agent control of dynamic
random processes (modeled as multi-agent partially observable Markov decision
processes, POMDPs) [93]. More recently, DBN methods have been combined with
ideas from change-point analysis for situations where arcs in the DAG model are
gained or lost at certain times as new influences or mechanisms start to operate or
former ones cease [97]. These advances further extend the flexibility and realism of
DBN models (and dynamic IDs based on them) to apply to description and control
of nonstationary time series.
As already discussed, value of information (VOI) calculations, familiar from
decision analysis, can be carried out straightforwardly in ID models. Less familiar,
but still highly useful, are methods for optimizing the sequential collection of
information to better ascertain correct causal models. The best available methods
involve design of experiments [116] and of time series of observations [90]. When
the correct ID model describing the relation between decisions and consequence
probabilities is initially uncertain, then collecting additional information may have
value not only for improving specific decisions (i.e., changing decisions or decision
rules to increase expected utility) within the context of a specified ID model but also
for discriminating among alternative ID models to better ascertain which ones best
describe reality. New information can help in learning IDs from data by revealing
how the effects of manipulations develop in affected variables over time [119].
For example, Tong and Koller [116] present a Bayesian approach to sequential
experimentation in which a distribution of BN DAG structures and CPTs is updated
by experiments that set certain variables to new values and monitor the changes in
1488 L.A. Cox, Jr.,
values of other variables. At each step, the next experiment to perform is selected
to most reduce expected loss from incorrect inferences about the presence and
directions of arcs in the DAG model. Even in BNs without decision or utility nodes,
designing experiments and time series of observations to facilitate accurate learning
of BN descriptions can be very valuable in creating and validating models with high
predictive accuracy [90].
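A minimal VOI calculation can be sketched as follows (the two-action, two-state decision problem and all utilities are hypothetical): the expected value of perfect information (EVPI) is the gap between the expected utility attainable with perfect foresight about the uncertain state and the expected utility of the best action chosen under the prior alone.

```python
# Illustrative value-of-information (VOI) calculation for a two-action,
# two-state decision problem with made-up numbers.

p_bad = 0.3                       # prior probability the hazard is present
U = {                             # utility of (action, state)
    ("act",  "bad"): 90, ("act",  "good"): 70,
    ("wait", "bad"): 20, ("wait", "good"): 95,
}

def eu(action, p):
    """Expected utility of an action under probability p of the bad state."""
    return p * U[(action, "bad")] + (1 - p) * U[(action, "good")]

# Best action chosen now, under the prior alone.
eu_best_prior = max(eu(a, p_bad) for a in ("act", "wait"))

# Expected utility if the true state were revealed before acting.
eu_foresight = (p_bad * max(U[("act", "bad")], U[("wait", "bad")])
                + (1 - p_bad) * max(U[("act", "good")], U[("wait", "good")]))

evpi = eu_foresight - eu_best_prior
print(evpi)  # 17.5
```

In an ID, the same comparison is carried out automatically by solving the diagram with and without an information arc into the decision node; EVPI bounds the value of any real (imperfect) observation.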
11 Causal Analytics
The preceding sections have discussed how causal Bayesian networks and other
DAG and time series algorithms provide constructive methods for carrying out many
risk assessment and risk management tasks, even when there is substantial initial
uncertainty about relevant cause-and-effect relations and about the best (expected
utility-maximizing) courses of action. Other graphical formalisms for risk analysis
and decision-making, such as decision trees, game trees, fault trees, and event
trees, which have long been used to model the propagation of probabilistic events
in complex systems, can all be converted to equivalent IDs or BNs, often with
substantial reductions in computational complexity and with savings in the number
of nodes and combinations of variable values that must be explicitly represented
[107]. Thus, BNs and IDs provide an attractive unifying framework for characteriz-
ing, quantifying, and reducing uncertainties and for deciding what to do under the
uncertainties that remain. Together with time series and machine learning
techniques, they provide a toolkit for using data to inform inference, prediction,
and decision-making with realistic uncertainties. These methods empower the
following important and widely used types of analytics for using data to inform
decisions:
Descriptive analytics: BNs and IDs describe how the part of the world being
modeled probably works, showing which factors influence or determine the prob-
ability distributions for which other variables and quantifying the probabilistic
relations among variables. If a BN or ID has CPTs that represent the operation of
lawlike causal mechanisms (i.e., if it is a causal BN or ID), then it can be used
to describe how changes in some variables affect the probability distributions
of others and hence how probabilistic causal influences propagate to change the
probabilities of outcomes.
Predictive analytics: A BN can be used to predict how the probabilities of future
observations change when new evidence is acquired (or assumed). A causal
BN or ID predicts how changes made at input nodes will affect the future
probabilities of outputs. Dynamic Bayesian networks (DBNs) are used to forecast
the probable sequences of future changes that will occur after observed changes
in inputs, culminating in a new posterior joint probability distribution for all
other variables over time (calculated via posterior inference algorithms). BNs
and DBNs are also used to predict and compare the probable consequences
(changes in probability distributions of outputs and other variables) caused
by alternative hypothetical (counterfactual) scenarios for changes in inputs,
43 Quantifying and Reducing Uncertainty About Causality in: : : 1489
including alternative decisions. Conversely, BNs can predict the most likely
explanation for observed data, such as the most likely diagnosis explaining
observed symptoms or the most likely sequence of component failures leading
to a real or hypothesized failure of a complex system. By predicting the probable
consequences of alternative policies or decisions and the most likely causes for
undesired outcomes, BNs can inform risk management decision-making and help
to identify where to allocate resources to repair or forestall likely failure paths.
Uncertainty analytics: Both BNs and IDs are designed to quantify uncertain-
ties about their predictions by using probability distributions for all uncertain
quantities. When model uncertainty is important, model ensemble methods
allow the predictions or recommendations from multiple plausible models to
be combined to obtain more accurate forecasts and better-performing decision
recommendations [3]. DBNs provide the ability to track event probabilities
in detail as they change over time, and dynamic versions of MAIDs allow
uncertainties about the actions of other decision-makers to be modeled.
Prescriptive analytics: If a net benefit, loss, or utility function for different
outcomes is defined, and if the causal DAG relating choices to probabilities
of consequences is known, then ID algorithms can be used to solve for the
best combination of decision variables to minimize expected loss or maximize
expected utility. If more than one decision-maker or policy maker makes choices
that affect the outcome, then MAIDs or dynamic versions of MAIDs can be used
to recommend what each should do.
Evaluation and learning analytics: Ensembles of BNs, IDs, and dynamic ver-
sions and extensions of these can be learned from data and experimentation.
Value of information (VOI) calculations determine when a single decision-
maker in a situation modeled by a known ID should stop collecting information
and take action. Dynamic causal BNs and IDs can be learned from time
series data in many settings (including observed responses to manipulations or
designed experiments) and current decision rules or policies can be evaluated
and improved during the learning process, via methods such as low-regret
learning with model ensembles, until no further improvements can be found
[107]. Learning about causal mechanisms from the observed time series of
responses to past interventions, manipulations, decisions, or policies provides
a promising technical approach to using past experience to deliberately improve
future decisions and outcomes.
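To make the predictive-analytics use concrete, the following sketch builds a hypothetical three-node BN (a confounder influencing both an exposure and a disease, with invented CPTs) and answers an observational query by brute-force enumeration; real systems use the posterior inference algorithms discussed above, but the arithmetic is the same.

```python
# Three-node Bayesian network: Confounder -> Exposure, and
# {Exposure, Confounder} -> Disease, with hypothetical CPTs.
# Queries are answered by enumerating the joint distribution.

from itertools import product

p_c = {1: 0.4, 0: 0.6}                       # P(Confounder)
p_e_given_c = {1: 0.5, 0: 0.2}               # P(Exposure=1 | C)
p_d_given_ec = {(1, 1): 0.30, (1, 0): 0.10,  # P(Disease=1 | E, C)
                (0, 1): 0.08, (0, 0): 0.02}

def joint(e, d, c):
    """P(E=e, D=d, C=c) from the chain rule along the DAG."""
    pe = p_e_given_c[c] if e == 1 else 1 - p_e_given_c[c]
    pd = p_d_given_ec[(e, c)] if d == 1 else 1 - p_d_given_ec[(e, c)]
    return p_c[c] * pe * pd

def prob_disease_given(e_obs):
    """P(Disease=1 | Exposure=e_obs) by summing out the confounder."""
    num = sum(joint(e_obs, 1, c) for c in (0, 1))
    den = sum(joint(e_obs, d, c) for d, c in product((0, 1), repeat=2))
    return num / den

print(round(prob_disease_given(1), 3))  # 0.225
print(round(prob_disease_given(0), 3))  # 0.038
```

Enumeration is exponential in the number of variables; junction-tree and variable-elimination algorithms exploit the DAG's sparsity to make the same computation tractable in large networks.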
Table 43.4 shows how these various components, which might collectively be
called causal analytics, provide constructive methods for answering the funda-
mental questions raised in the introduction. For event detection and consequence
prediction, DBNs (especially nonstationary DBNs) and change-point analysis
(CPA) algorithms are well suited for detecting changes in time series of observations
and occurrences of unobserved events based on their observable effects. DBNs and
causal simulation models, as well as time series models that accurately describe how
impacts of changes are distributed over time, are also useful for predicting the prob-
able future consequences of recent changes or shocks in the inputs to a system.
Table 43.4 Causal analytics algorithms address fundamental risk management questions under
realistic uncertainties

Fundamental question: Event detection. What has changed recently in disease patterns or other
adverse outcomes, by how much, and when?
Methods: change-point analysis (CPA) algorithms; dynamic Bayesian networks (DBNs)

Fundamental question: Consequence prediction. What are the implications for what will probably
happen next if different actions (or no new actions) are taken?
Methods: dynamic Bayesian networks (DBNs); simulation modeling; time series forecasting

Fundamental question: Risk attribution. What is causing current undesirable outcomes? Does a
specific exposure harm human health, and, if so, who is at greatest risk and under what
conditions?
Methods: causal DAG models (e.g., BNs, IDs); ensembles of DAG models; Granger causality and
transfer entropy (TE) for time series
For risk attribution, causal graph models (such as BNs, IDs, and dynamic
versions of these) or ensembles of such models can be learned from data and used
to quantify the evidence that suspected hazards indeed cause the adverse effects
attributed to them (i.e., that there is, with high confidence, a directed arc pointing
from a node representing exposure to a hazard into a node representing the effect). If
so, the CPT for the effect node quantifies how changes in the exposure node change
probabilities of effects, given the levels of other causes with which exposure may
interact. Multivariate response modeling, in which the joint distributions of one or
more responses vary with the levels of one or more factors that probabilistically
cause them, can readily be modeled by DAGs that include the different causal
factors and effects. For risk management or regulation under uncertainty, if utility
nodes and decision nodes are incorporated into the causal graph models to create
known causal ID or MAID models, then the best decisions for risk management (i.e.,
for inducing the greatest achievable expected utilities) can be identified by well-
developed ID solution algorithms, and VOI calculations can be used to optimize
costly information collection and the timing of final decisions.
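How the CPT of an effect node combines with the distribution of other causes can be sketched with the back-door adjustment formula (all numbers hypothetical): the interventional risk P(effect | do(exposure)) averages the effect node's CPT over the marginal, rather than the exposure-conditional, distribution of the confounder.

```python
# Back-door adjustment sketch with hypothetical CPTs:
# P(Disease=1 | do(Exposure=e)) = sum_c P(C=c) * P(Disease=1 | e, c).
# Averaging over the *marginal* of the confounder is what distinguishes
# the causal (interventional) risk from the observational conditional risk.

p_c = {1: 0.4, 0: 0.6}                      # P(Confounder)
p_d_given_ec = {(1, 1): 0.30, (1, 0): 0.10, # P(Disease=1 | Exposure, C)
                (0, 1): 0.08, (0, 0): 0.02}

def p_do(e):
    """Interventional risk of disease when exposure is set to e."""
    return sum(p_c[c] * p_d_given_ec[(e, c)] for c in (0, 1))

causal_risk_difference = p_do(1) - p_do(0)
print(round(causal_risk_difference, 4))  # 0.136
```

The same quantity read off by naive conditioning on exposure would be biased whenever the confounder also shifts the probability of being exposed.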
The power and maturity of the technical methods in Table 43.4 have spurred their
rapid uptake and application in fields such as neurobiology, systems biology, econo-
metrics, artificial intelligence, control engineering, game theory, signal processing,
and physics. However, they have so far had relatively limited impact on the practice
of uncertainty quantification and risk management in epidemiology, public health,
and regulatory science, perhaps because these fields give great deference to the
use of subjective judgments informed by weight-of-evidence considerations, an
approach widely used and taught since the 1960s but of unproved and doubtful
probative value [83]. Previous sections have illustrated some of the potential of
more modern methods of causal analytics, but the vast majority of applied work in
epidemiology, public health, and regulatory risk assessment unfortunately still uses
older association-based methods and subjective opinions about the extent to which
statistically significant differences between risk model coefficients for differently
exposed populations might have causal interpretations.
To help close the gap between these poor current practices and the potentially
much more objective, reliable, accurate, and sensitive methods of causal analytics
in Table 43.4, the following checklist may prove useful in judging the adequacy
of policy analyses or quantitative risk assessments (QRAs) that claim to have
identified useful predictive causal relations between exposures to risk factors or
hazards and resulting risks of adverse effects (responses), i.e., causal exposure-
response (E-R) relations.
1. Does the QRA show that changes in exposures precede the changes in health
effects that they are said to cause? Are results of appropriate technical analyses
(e.g., change-point analyses, intervention analyses and other quasi-experimental
comparisons, and Granger causality tests or transfer entropy results) presented,
along with supporting data? If effects turn out to precede their presumed causes,
then unmeasured confounders or residual confounding by confounders that the
investigators claim were statistically controlled for may be at work.
2. Does the QRA demonstrate that health effects cannot be made conditionally
independent of exposure by conditioning on other variables, especially potential
confounders? Does it present the details, data, and results of appropriate statis-
tical tests (e.g., conditional independence tests and DAGs) showing that health
effects and exposures share mutual information that cannot be explained away
by any combination of confounders?
3. Does the QRA present and test explicit causal graph models, showing the results
of formal statistical tests of the causal hypotheses implied by the structure
of the model (i.e., which variables point into which others)? Does it identify
which alternative causal graph models are most consistent with available data
(e.g., using the Occam's Window method of [78])? Most importantly, does it
present clear evidence that changes in exposure propagate through the causal
graph, causing successive measurable changes in the intermediate variables along
hypothesized causal paths? Such coherence, consistency, and biological
plausibility, demonstrated in explicit causal graph models showing how
hypothesized causal mechanisms dovetail with each other to transduce changes in
exposures to changes in health risks, can provide compelling objective evidence of a causal
relation between them, thus accomplishing what older and more problematic
WoE frameworks have long sought to provide [95].
4. Have noncausal explanations for statistical relations among observed variables
(including exposures, health effects, and any intermediate variables, modifying
factors, and confounders) been explicitly identified and convincingly refuted
using well-conducted and reported statistical tests? Especially, have model
diagnostics (e.g., plots of residuals and discussions of any patterns) and formal
tests of modeling assumptions been presented that show that the models used
appropriately describe the data to which the QRA applies them and that claimed
associations are not caused by model selection biases or specification errors,
failures to model errors in exposure estimates and other explanatory variables,
omitted confounders or other latent variables, uncorrected multiple testing bias,
or coincident historical trends (e.g., spurious regression, if the exposure and
health effects time series in longitudinal studies are not stationary)?
5. Have all causal mechanisms postulated in the QRA modeling been demonstrated
to exhibit stable, uniform, lawlike behavior, so that there is no substantial
unexplained heterogeneity in estimated input-output (e.g., E-R or C-R) relations?
If the answer is no, then missing factors may need to be identified and their effects
modeled before valid predictions can be made based on the assumption that
changes in causes will yield future changes in effects that can be well described
and predicted based on estimates of cause-effect relations from past data.
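The Granger-style precedence check invoked in question 1 can be illustrated with a toy example (simulated data and a bare-bones F statistic, not the full test batteries cited above): ask whether adding the lagged exposure series x to an autoregression of the outcome series y significantly reduces prediction error.

```python
import random

# Toy Granger-precedence check: compare the fit of y_t ~ y_{t-1}
# (restricted) against y_t ~ y_{t-1} + x_{t-1} (unrestricted).
# A large F statistic says lagged exposure helps predict the outcome.

def ols_sse(rows, y):
    """Least-squares SSE via normal equations and Gaussian elimination."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for i in range(k):                      # forward elimination with pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [arj - f * aij for arj, aij in zip(A[r], A[i])]
            b[r] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):            # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(200):                        # x drives y with a one-step lag
    x.append(rng.gauss(0, 1))
    y.append(0.8 * y[-1] + 0.9 * x[-2] + rng.gauss(0, 0.1))

Y = y[2:]
restricted   = [[1.0, y[t - 1]]            for t in range(2, len(y))]
unrestricted = [[1.0, y[t - 1], x[t - 1]]  for t in range(2, len(y))]
sse_r, sse_u = ols_sse(restricted, Y), ols_sse(unrestricted, Y)
F = ((sse_r - sse_u) / 1) / (sse_u / (len(Y) - 3))
print(F > 10.0)  # True: lagged x clearly improves prediction of y
```

A full analysis would choose lag orders, compute p-values, and test in both directions (y should not similarly predict x); transfer-entropy estimators generalize the same comparison to nonlinear dependence.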
If the answers to these five diagnostic questions are all yes, then the QRA has
met the burden of proof of showing that the available data are consistent with a
causal relation and that other (noncausal) explanations are not plausible. It can then
proceed to quantify the estimated changes in probability distributions of outputs,
such as future health effects, that would be caused by changes in controllable
inputs (e.g., future exposure levels) using the causal models developed to show
that exposure causes adverse effects. The effort needed to establish valid evidence
of a causal relation between historical levels of inputs and outputs by being able
to answer yes to questions 1–5 pays off at this stage. Causal graph models (e.g.,
Bayesian networks with validated causal interpretations for their CPTs), simulation
models based on composition of validated causal mechanisms, and valid path
diagrams and SEM causal models can all be used to predict quantitative changes
in outputs that would be caused by changes in inputs, e.g., changes in future health
risks caused by changes in future exposure levels, given any scenario for the future
values of other inputs.
Conversely, if the answer to any of the preceding five diagnostic questions is no,
then it is premature to make causal predictions based on the work done so far. Either
the additional work needed to make the answers yes should be done or results should
be stated as contingent on the as-yet unproved assumption that this can eventually
be done.
Cross-References
Rare-Event Simulation
References
1. Alpiste Illueca, F.M., Buitrago Vera, P., de Grado Cabanilles, P., Fuenmayor Fernandez, V.,
Gil Loscos, F.J.: Periodontal regeneration in clinical practice. Med. Oral Patol. Oral Cir.
Bucal. 11(4), E382–E392 (2006)
2. Angrist, J.D., Pischke, J.-S.: Mostly Harmless Econometrics: An Empiricist's Companion.
Princeton University Press, Princeton (2009)
3. Ashcroft, M.: Performing decision-theoretic inference in Bayesian network ensemble models.
In: Jaeger, M., Nielsen, T.D., Viappiani, P. (eds.) Twelfth Scandinavian Conference on
Artificial Intelligence, Aalborg, vol. 257, pp. 25–34 (2013)
4. Arnold, A., Liu, Y., Abe, N.: Temporal causal modeling with graphical Granger methods. In:
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining (SIGKDD-07), San Jose, 12–15 Aug 2007. ACM, New York. http://dl.acm.
org/citation.cfm?id=1281192&picked=prox
5. Azhar, N., Ziraldo, C., Barclay, D., Rudnick, D.A., Squires, R.H., Vodovotz, Y., Pediatric
Acute Liver Failure Study Group: Analysis of serum inflammatory mediators identifies unique
dynamic networks associated with death and spontaneous survival in pediatric acute liver
failure. PLoS One. 8(11), e78202 (2013). doi:10.1371/journal.pone.0078202
6. Bai, Z., Wong, W.K., Zhang, B.: Multivariate linear and nonlinear causality tests. Math.
Comput. Simul. 81(1), 5–17 (2010)
7. Barnett, L., Seth, A.K.: The MVGC multivariate Granger causality toolbox: a new approach
to Granger-causal inference. J. Neurosci. Methods 223 (2014)
8. Barr, C.D., Diez, D.M., Wang, Y., Dominici, F., Samet, J.M.: Comprehensive smoking bans
and acute myocardial infarction among Medicare enrollees in 387 US counties: 1999–2008.
Am. J. Epidemiol. 176(7), 642–648 (2012). Epub 17 Sep 2012
9. Brenner, E., Sontag, D.: SparsityBoost: a new scoring function for learning Bayesian
network structure. In: 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013),
Westin Bellevue Hotel, Washington, DC, 11–15 July 2013. http://auai.org/uai2013/prints/
papers/30.pdf
10. Callaghan, R.C., Sanches, M., Gatley, J.M., Stockwell, T.: Impacts of drinking-age
laws on mortality in Canada, 1980–2009. Drug Alcohol Depend. 138, 137–145 (2014).
doi:10.1016/j.drugalcdep.2014.02.019
11. Cami, A., Wallstrom, G.L., Hogan, W.R.: Measuring the effect of commuting on the
performance of the Bayesian Aerosol Release Detector. BMC Med. Inform. Decis. Mak.
9(Suppl 1), S7 (2009)
12. Campbell, D.T., Stanley, J.C.: Experimental and Quasi-experimental Designs for Research.
Rand McNally, Chicago (1966)
13. Chang, K.C., Tian, Z.: Efficient inference for mixed Bayesian networks. In: Proceedings of
the Fifth International Conference on Information Fusion, Annapolis, vol. 1, pp. 527–534,
8–11 July 2002. IEEE. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1021199
14. Christensen, T.M., Møller, L., Jørgensen, T., Pisinger, C.: The impact of the Danish smoking
ban on hospital admissions for acute myocardial infarction. Eur. J. Prev. Cardiol. 21(1), 65–73
(2014). doi:10.1177/2047487312460213
15. Corani, G., Antonucci, A., Zaffalon, M.: Bayesian networks with imprecise probabilities:
theory and application to classification. In: Holmes, D.E., Jain, C. (eds.) Data Mining:
Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol. 23, pp.
49–93 (2012)
16. Cox, L.A. Jr., Popken, D.A.: Has reducing fine particulate matter and ozone caused reduced
mortality rates in the United States? Ann. Epidemiol. 25(3), 162–173 (2015)
17. Cox, L.A. Jr., Popken, D.A., Berman, D.W.: Causal versus spurious spatial exposure-response
associations in health risk analysis. Crit. Rev. Toxicol. 43(Suppl 1), 26–38 (2013)
18. Crowson, C.S., Schenck, L.A., Green, A.B., Atkinson, E.J., Therneau, T.M.: The basics
of propensity scoring and marginal structural models. Technical report #84, 1 Aug 2013.
Department of Health Sciences Research, Mayo Clinic, Rochester. http://www.mayo.edu/
research/documents/biostat-84-pdf/doc-20024406
19. Dash, D., Druzdzel, M.J.: A note on the correctness of the causal ordering algorithm. Artif.
Intell. 172, 1800–1808 (2008). http://www.pitt.edu/~druzdzel/psfiles/aij08.pdf
20. De Campos, C.P., Ji, Q.: Efficient structure learning of Bayesian networks using constraints.
J. Mach. Learn. Res. 12, 663–689 (2011)
21. Dominici, F., Greenstone, M., Sunstein, C.R.: Science and regulation. Particulate matter
matters. Science. 344(6181), 257–259 (2014). doi:10.1126/science.1247348
22. The Economist: Trouble at the Lab: scientists like to think of science as self-correcting. To
an alarming degree, it is not. www.economist.com/news/briefing/21588057-scientists-think-
science-self-correcting-alarming-degree-it-not-trouble, 19 Oct 2013
23. Eichler, M., Didelez, V.: On Granger causality and the effect of interventions in time series.
Lifetime Data Anal. 16(1), 3–32 (2010). Epub 26 Nov 2009. http://www.ncbi.nlm.nih.gov/
pubmed/19941069
24. EPA (U.S. Environmental Protection Agency): The Benefits and Costs of the Clean Air Act
from 1990 to 2020. Final Report Rev. A. Office of Air and Radiation, Washington, DC
(2011)
25. EPA: Expanded expert judgment assessment of the concentration-response relationship
between PM2.5 exposure and mortality. www.epa.gov/ttn/ecas/regdata/Uncertainty/pm_ee_
report.pdf (2006)
26. Exarchos, K.P., Goletsis, Y., Fotiadis, D.I.: A multiscale and multiparametric approach for
modeling the progression of oral cancer. BMC Med. Inform. Decis. Mak. 12, 136 (2012).
doi:10.1186/1472-6947-12-136.
27. Ezzati, M., Hoorn, S.V., Lopez, A.D., Danaei, G., Rodgers, A., Mathers, C.D., Murray,
C.J.L.: Comparative quantification of mortality and burden of disease attributable to selected
risk factors. In: Lopez, A.D., Mathers, C.D., Ezzati, M., Jamison, D.T., Murray, C.J.L.
(eds.) Global Burden of Disease and Risk Factors, chapter 4. World Bank, Washington, DC
(2006)
28. Fann, N., Lamson, A.D., Anenberg, S.C., Wesson, K., Risley, D., Hubbell, B.J.: Estimating
the national public health burden associated with exposure to ambient PM2.5 and Ozone. Risk
Anal. 32(1), 81–95 (2012)
29. Ferson, S., Donald, S.: Probability bounds analysis. In: Mosleh, A., Bari, R.A. (eds.)
Probabilistic Safety Assessment and Management, pp. 1203–1208. Springer, New York
(1998)
30. Ferson, S., Hajagos, J.G.: Arithmetic with uncertain numbers: rigorous and (often) best
possible answers. In: Helton, J.C., Oberkampf, W.L. (eds.) Alternative Representations of
Epistemic Uncertainty. Reliability Engineering & System Safety, vol. 85(1–3), pp. 135–152
(2004)
31. Freedman, D.A.: Graphical models for causation, and the identification problem. Eval. Rev.
28(4), 267–293 (2004)
32. Friedman, N., Goldszmidt, M.: Learning Bayesian networks with local structure. In: Jordan,
M.I. (ed.) Learning in Graphical Models, pp. 421–459. MIT, Cambridge (1998)
33. Gasparrini, A., Gorini, G., Barchielli, A.: On the relationship between smoking bans
and incidence of acute myocardial infarction. Eur. J. Epidemiol. 24(10), 597–602
(2009)
34. Ghahramani, Z.: Learning dynamic Bayesian networks. In: Giles, C.L., Gori, M. (eds.) Adap-
tive Processing of Sequences and Data Structures. International Summer School on Neural
Networks "E.R. Caianiello", Vietri sul Mare, Salerno, 6–13 Sept 1997. Tutorial Lectures.
Lecture Notes in Computer Science, vol. 1387 (1998). http://link.springer.com/sbook/10.1007/
BFb0053992, http://link.springer.com/bookseries/558, http://mlg.eng.cam.ac.uk/zoubin/SAL
D/learnDBNs.pdf
35. Greenland, S.: Epidemiologic measures and policy formulation: lessons from potential
outcomes. Emerg. Themes Epidemiol. 2, 5 (2005)
36. Greenland, S., Brumback, B.: An overview of relations among causal modelling methods. Int.
J. Epidemiol. 31(5), 1030–1037 (2002). http://www.ncbi.nlm.nih.gov/pubmed/12435780
37. Gruber, S., Logan, R.W., Jarrín, I., Monge, S., Hernán, M.A.: Ensemble learning of inverse
probability weights for marginal structural modeling in large observational datasets. Stat.
Med. 34(1), 106–117 (2015)
38. Grundmann, O.: The current state of bioterrorist attack surveillance and preparedness in the
US. Risk Manag. Health Policy. 7, 177–187 (2014)
39. Hack, C.E., Haber, L.T., Maier, A., Shulte, P., Fowler, B., Lotz, W.G., Savage, R.E., Jr.: A
Bayesian network model for biomarker-based dose response. Risk Anal. 30(7), 1037–1051
(2010)
40. Harris, A.D., Bradham, D.D., Baumgarten, M., Zuckerman, I.H., Fink, J.C., Perencevich,
E.N.: The use and interpretation of quasi-experimental studies in infectious diseases. Clin.
Infect. Dis. 38(11), 1586–1591 (2004)
41. Harris, A.D., McGregor, J.C., Perencevich, E.N., Furuno, J.P., Zhu, J., Peterson, D.E., Finkel-
stein, J.: The use and interpretation of quasi-experimental studies in medical informatics. J.
Am. Med. Inform. Assoc. 13(1), 16–23 (2006)
42. Harvard School of Public Health: Press Release: Ban On Coal Burning in Dublin Cleans
the Air and Reduces Death Rates. www.hsph.harvard.edu/news/press-releases/archives/2002-
releases/press10172002.html (2002)
43. Health Effects Institute (HEI): Impact of Improved Air Quality During the 1996 Summer
Olympic Games in Atlanta on Multiple Cardiovascular and Respiratory Outcomes. HEI
Research Report #148 (2010). Authors: Jennifer L. Peel, Mitchell Klein, W. Dana Flanders,
James A. Mulholland, and Paige E. Tolbert. Health Effects Institute. Boston, MA. http://pubs.
healtheffects.org/getfile.php?u=564
44. Health Effects Institute (HEI): Did the Irish Coal Bans Improve Air Quality and Health?
HEI Update. http://pubs.healtheffects.org/getfile.php?u=929 (Summer, 2013). Last Retrieved
1 Feb 2014
45. Helfenstein, U.: The use of transfer function models, intervention analysis and related time
series methods in epidemiology. Int. J. Epidemiol. 20(3), 808–815 (1991)
46. Hernán, M.A., Taubman, S.L.: Does obesity shorten life? The importance of well-defined
interventions to answer causal questions. Int. J. Obes. (Lond.) 32(Suppl 3), S8–S14 (2008)
47. Hibbs, D.A., Jr.: On analyzing the effects of policy interventions: Box-Jenkins and Box-Tiao
vs. structural equation models. Sociol. Methodol. 8, 137–179 (1977). http://links.jstor.org/
sici?sici=0081-1750%281977%298%3C137%3AOATEOP%3E2.0.CO%3B2-K
48. Hipel, K.W., Lettenmaier, D.P., McLeod, I.: Assessment of environmental impacts part one:
intervention analysis. Environ. Manag. 2(6), 529–535 (1978)
49. Hites, R.A., Foran, J.A., Carpenter, D.O., Hamilton, M.C., Knuth, B.A., Schwager, S.J.:
Global assessment of organic contaminants in farmed salmon. Science. 303(5655), 226–229
(2004)
50. Hoeting, J., Madigan, D., Raftery, A., Volinsky, C.: Bayesian model averaging. Stat. Sci. 14,
382–401 (1999)
51. Höfler, M.: The Bradford Hill considerations on causality: a counterfactual perspective.
Emerg. Themes Epidemiol. 2, 11 (2005)
52. Homer, J., Milstein, B., Wile, K., Trogdon, J., Huang, P., Labarthe, D., et al.: Simulating and
evaluating local interventions to improve cardiovascular health. Prev. Chronic Dis. 7(1), A18
(2010). www.cdc.gov/pcd/issues/2010/jan/08_0231.htm. Accessed 3 Nov 2015
53. Hora, S.: Eliciting probabilities from experts. In: Edwards, W., Miles, R.F., von Winterfeldt,
D. (eds.) Advances in Decision Analysis: From Foundations to Applications, pp. 129–153.
Cambridge University Press, New York (2007)
54. Hoyer, P.O., Hyvärinen, A., Scheines, R., Spirtes, P., Ramsey, J., Lacerda, G., Shimizu,
S.: Causal discovery of linear acyclic models with arbitrary distributions. In: Proceedings
of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence - UAI, Helsinki,
Conference held 9–12 July 2008, pp. 282–289. http://arxiv.org/ftp/arxiv/papers/1206/1206.
3260.pdf
55. Huitema, B.E., Van Houten, R., Manal, H.: Time-series intervention analysis of pedestrian
countdown timer effects. Accid. Anal. Prev. 72, 23–31 (2014). doi:10.1016/j.aap.2014.05.025
56. Ioannidis, J.P.A.: Why most published research findings are false. PLoS Med. 2(8), e124
(2005). doi:10.1371/journal.pmed.0020124
57. James, N.A., Matteson, D.S.: ecp: an R package for nonparametric multiple change point
analysis of multivariate data. J. Stat. Softw. 62(7) (2014). http://www.jstatsoft.org/v62/i07/
paper
58. Janzing, D., Balduzzi, D., Grosse-Wentrup, M., Scholkopf, B.: Quantifying causal influences.
Ann. Stat. 41(5), 2324–2358 (2013). doi:10.1214/13-AOS1145
59. Jiang, H., Livingston, M., Manton, E.: The effects of random breath testing and lowering the
minimum legal drinking age on traffic fatalities in Australian states. Inj. Prev. 21(2), 77–83
(2015). doi:10.1136/injuryprev-2014-041303
60. Joffe, M., Gambhir, M., Chadeau-Hyam, M., Vineis, P.: Causal diagrams in systems epidemi-
ology. Emerg. Themes Epidemiol. 9(1), 1 (2012). doi:10.1186/1742-7622-9-1
61. Kahneman, D.: Thinking, Fast and Slow. Farrar, Straus, and Giroux, New York (2011)
62. Kass-Hout, T.A., Xu, Z., McMurray, P., Park, S., Buckeridge, D.L., Brownstein, J.S.,
Finelli, L., Groseclose, S.L.: Application of change point analysis to daily influenza-like
illness emergency department visits. J. Am. Med. Inform. Assoc. 19(6), 1075–1081 (2012).
doi:10.1136/amiajnl-2011-000793
63. Kinnunen, E., Junttila, O., Haukka, J., Hovi, T.: Nationwide oral poliovirus vaccination
campaign and the incidence of Guillain-Barré syndrome. Am. J. Epidemiol. 147(1), 69–73
(1998)
64. Kleck, G., Britt, C.L., Bordua, D.: The emperor has no clothes: an evaluation of interrupted
time series designs for policy impact assessment. J. Firearms Public Policy 12, 197–247
(2000)
65. Klein, L.R.: Regression systems of linear simultaneous equations. In: A Textbook of
Econometrics, 2nd edn, pp. 131–196. Prentice-Hall, Englewood Cliffs (1974). ISBN:0-13-
912832-8
66. Kline, R.B.: Principles and Practice of Structural Equation Modeling. Guilford Press, New
York (1998)
67. Koller, D., Milch, B.: Multi-agent influence diagrams for representing and solving games.
In: Proceedings of the 17th International Joint Conference on Artificial Intelligence
(2001)
68. Lagarde, M.: How to do (or not to do) . . . Assessing the impact of a policy change with routine
longitudinal data. Health Policy Plan. 27(1), 76–83 (2012). doi:10.1093/heapol/czr004
69. Lebre, S.: Package G1DBN: a package performing dynamic Bayesian network inference.
CRAN repository, 19 Feb 2015. https://cran.r-project.org/web/packages/G1DBN/G1DBN.
pdf
70. Lehrer, J.: Trials and errors: why science is failing us. Wired. http://www.wired.co.uk/
magazine/archive/2012/02/features/trials-and-errors?page=all, 28 Jan 2012
71. Lei, H., Nahum-Shani, I., Lynch, K., Oslin, D., Murphy, S.A.: A SMART design for building
individualized treatment sequences. Ann. Rev. Clin. Psychol. 8, 14.1–14.28 (2012)
72. Linn, K.A., Laber, E.B., Stefanski, L.A.: iqLearn: interactive Q-learning in R. https://cran.r-
project.org/web/packages/iqLearn/vignettes/iqLearn.pdf (2015)
73. Lipsitch, M., Tchetgen Tchetgen, E., Cohen, T.: Negative controls: a tool for detecting
confounding and bias in observational studies. Epidemiology 21(3), 383–388 (2010)
74. Lizier, J.T.: JIDT: an information-theoretic toolkit for studying the dynamics of
complex systems. Front. Robot. AI 1, 11 (2014); doi:10.3389/frobt.2014.00011 (pre-
print: arXiv:1408.3270), http://arxiv.org/pdf/1408.3270.pdf
75. Lu, C.Y., Soumerai, S.B., Ross-Degnan, D., Zhang, F., Adams, A.S.: Unintended impacts of
a Medicaid prior authorization policy on access to medications for bipolar illness. Med Care.
48(1), 4–9 (2010). doi:10.1097/MLR.0b013e3181bd4c10
76. Lynch, W.D., Glass, G.V., Tran, Z.V.: Diet, tobacco, alcohol, and stress as causes of coronary
artery heart disease: an ecological trend analysis of national data. Yale J. Biol. Med. 61(5),
413–426 (1988)
77. Maclure, M.: Taxonomic axes of epidemiologic study designs: a refutationist perspective.
J. Clin. Epidemiol. 44(10), 1045–1053 (1991)
78. Madigan, D., Raftery, A.: Model selection and accounting for model uncertainty in graphical
models using Occam's window. J. Am. Stat. Assoc. 89, 1535–1546 (1994)
79. Madigan, D., Andersson, S.A., Perlman, M.D., Volinsky, C.M.: Bayesian model averaging
and model selection for Markov equivalence classes of acyclic digraphs. Commun. Stat.
Theory Methods 25, 2493–2519 (1996)
80. McLeod et al. (2011) Time series analysis with R. http://www.stats.uwo.ca/faculty/aim/tsar/
tsar.pdf
81. Montalto, A., Faes, L., Marinazzo, D.: MuTE: a MATLAB toolbox to compare established
and novel estimators of the multivariate transfer entropy. PLoS One 9(10), e109462 (2014).
doi:10.1371/journal.pone.0109462
82. Moore, K.L., Neugebauer, R., van der Laan, M.J., Tager, I.B.: Causal inference in epidemio-
logical studies with strong confounding. Stat Med. (2012). doi:10.1002/sim.4469
83. Morabia, A.: Hume, Mill, Hill, and the sui generis epidemiologic approach to causal
inference. Am. J. Epidemiol. 178(10), 15261532 (2013)
84. Morriss, R., Gask, L., Webb, R., Dixon, C., Appleby, L.: The effects on suicide rates of an
educational intervention for front-line health professionals with suicidal patients (the STORM
project). Psychol. Med. 35(7), 957960 (2005)
85. Nakahara, S., Katanoda, K., Ichikawa, M.: Onset of a declining trend in fatal motor vehicle
crashes involving drunk-driving in Japan. J. Epidemiol. 23(3), 195204 (2013)
86. Neugebauer, R., Fireman, B., Roy, J.A., Raebel, M.A., Nichols, G.A., OConnor, P.J.:
Super learning to hedge against incorrect inference from arbitrary parametric assump-
tions in marginal structural modeling. J. Clin. Epidemiol. 66(8 Suppl):S99S109 (2013).
doi:10.1016/j.jclinepi.2013.01.016
1498 L.A. Cox, Jr.,
87. Nguefack-Tsague, G.: Using Bayesian networks to model hierarchical relationships in epi-
demiological studies. Epidemiol. Health 33, e2011006 (2011). doi:10.4178/epih/e2011006.
Epub 17 Jun 2011. http://e-epih.org/journal/view.php?doi=10.4178/epih/e2011006
88. Nuzzo, R.: Scientific method: statistical errors. P values, the gold standard of statistical
validity, are not as reliable as many scientists assume. Nature 506, 150152 (2014).
doi:10.1038/506150a
89. Owczarek, T.: On modeling asymmetric multi-agent scenarios. In: IEEE International
Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology
and Applications, Rende (Cosenza), 2123 Sept 2009
90. Page, D., Ong, I.M.: Experimental design of time series data for learning from dynamic
Bayesian networks. Pac. Symp. Biocomput. 2006, 267278 (2006)
91. Papana, A., Kyrtsou, C., Kugiumtzis, D., Cees, D.: Detecting causality in non-stationary
time series using partial symbolic transfer entropy: evidence in financial data. Comput. Econ.
47(3), 341365 (2016). http://link.springer.com/article/10.1007%2Fs10614-015-9491-x
92. Pearl, J.: An introduction to causal inference. Int. J. Biostat. 6(2), Article 7 (2010).
doi:10.2202/15574679.1203
93. Polich, K., Gmytrasiewicz, P.: Interactive dynamic influence diagrams. In: Proceedings of the
6th International Joint Conference on Autonomous Agents and Multiagent Systems. ACM,
New York. Article No. 34. http://dl.acm.org/citation.cfm?id=1329166
94. Rau, A.: Package ebdbNet: empirical Bayes estimation of dynamic Bayesian net-
works. CRAN repository, 19 Feb 2015. https://cran.r-project.org/web/packages/ebdbNet/ebdb
Net.pdf
95. Rhomberg, L.: Hypothesis-based weight of evidence: an approach to assessing causation and
its application to regulatory toxicology. Risk Anal. 35(6), 11141124 (2015)
96. Robins, J.M., Hernn, M.A., Brumback, B.: Marginal structural models and causal inference
in epidemiology. Epidemiology 11(5), 550560 (2000)
97. Robinson, J.W., Hartemink, A.J.: Learning non-stationary dynamic Bayesian networks.
J. Mach. Learn. Res. 11, 36473680 (2010)
98. Rothman, K.J., Lash, L.L., Greenland, S.: Modern Epidemiology, 3rd edn. Lippincott,
Williams, & Wilkins. New York (2012)
99. Runge, J., Heitzig, J., Petoukhov, V., Kurths, J.: Escaping the curse of dimensionality in
estimating multivariate transfer entropy. Phys. Rev. Lett. 108, 258701. Published 21 June
2012
100. Samet, J.M., Bodurow, C.C. (eds.): Improving the Presumptive Disability Decision-Making
Process for Veterans. Committee on Evaluation of the Presumptive Disability Decision-
Making Process for Veterans, Board on Military and Veterans Health, Institute of Medicine.
National Academies Press, Washington, DC (2008)
101. Sandri, M., Berchialla, P., Baldi, I., Gregori, D., De Blasi, R.A.: Dynamic Bayesian networks
to predict sequences of organ failures in patients admitted to ICU. J. Biomed. Inform. 48,
106113 (2014). doi:10.1016/j.jbi.2013.12.008
102. Sarewitz, D.: Beware the creeping cracks of bias. Nature 485, 149 (2012)
103. Sarewitz, D.: Reproducibility will not cure what ails science. Nature 525(7568), 159 (2015)
104. Schwartz, J., Austin, E., Bind, M.A., Zanobetti, A., Koutrakis, P.: Estimating causal associa-
tions of fine particles with daily deaths in Boston. Am. J. Epidemiol. 182(7), 644650 (2015)
105. Scutari, M.: Learning Bayesian networks with the bnlearn R package. J. Stat. Softw. 35(3)
(2010). www.jstatsoft.org/v35/i03/paper. Last accessed 5 May 2015
106. Shen, Y., Cooper, G.F.: A new prior for Bayesian anomaly detection: application to biosurveil-
lance. Methods Inf. Med. 49(1), 4453 (2010)
107. Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game-Theoretic, and
Logical Foundations. Cambridge University Press, Cambridge (2010)
108. Skrvseth, S.O., Bellika, J.G., Godtliebsen, F.: Causality in scale space as an approach to
change detection. PLoS One. 7(12), e52253 (2012). doi:10.1371/journal.pone.0052253
109. Stebbings, J.H., Jr.: Panel studies of acute health effects of air pollution. II. A methodologic
study of linear regression analysis of asthma panel data. Environ. Res. 17(1), 1032 (1978)
43 Quantifying and Reducing Uncertainty About Causality in: : : 1499
110. Steck, H.: Learning the Bayesian network structure: Dirichlet prior versus data. In: Proceed-
ings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008),
University of Helsinki City Centre Campus, Helsinki, 912 July 2008
111. Sun, X.: Assessing nonlinear granger causality from multivariate time series. Mach. Learn.
Knowl. Discov. Databases. Lect. Notes Comput. Sci. 5212, 440455 (2008)
112. Swanson, S.A., Hernn, M.A.: How to report instrumental variable analyses (suggestions
welcome). Epidemiology 24(3), 370374 (2013)
113. Tashiro, T., Shimizu, S., Hyvrinen, A., Washio T.: ParceLiNGAM: a causal ordering method
robust against latent confounders. Neural Comput. 26(1), 5783 (2014)
114. Taubman, S.L., Allen, H.L., Wright, B.J., Baicker, K., Finkelstein, A.N.: Medicaid increases
emergency-department use: evidence from Oregons health insurance experiment. Science.
343(6168), 263268 (2014). doi:10.1126/science.1246183
115. Thornley, S., Marshall, R.J., Wells, S., Jackson, R.: Using directed acyclic graphs for
investigating causal paths for cardiovascular disease. J. Biomet. Biostat. 4, 182 (2013).
doi:10.4172/2155-6180.1000182
116. Tong, S., Koller, D.: Active learning for structure in Bayesian networks. In: International Joint
Conference on Artificial Intelligence (IJCAI), Seattle (2001)
117. Twardy, C.R., Nicholson, A.E., Korb, K.B., McNeil, J.: Epidemiological data mining of
cardiovascular Bayesian networks. J. Health Inform. 1(1), e3:1e3:13 (2006)
118. Vicente, R., Wibral, M., Lindner, M., Pipa, G.: Transfer entropy-a model-free measure of
effective connectivity for the neurosciences. J. Comput. Neurosci. 30(1), 4567 (2011)
119. Voortman, M., Dash, D., Druzdzel, M.J.: Learning causal models that make correct manip-
ulationpredictions with time series data. In: Guyon, I., Janzing, D., Schlkopf, B. (eds.)
JMLR Workshop and Conference Proceedings, vol. 6, pp. 257266. NIPS 2008 Workshop
on Causality. http://jmlr.csail.mit.edu/proceedings/papers/v6/voortman10a/voortman10a.pdf
(2008)
120. Wang, J., Spitz, M.R., Amos, C.I., et al.: Method for evaluating multiple mediators:
Mediating effects of smoking and COPD on the association between the CHRNA5-A3
Variant and Lung Cancer Risk. de Torres JP, ed. PLoS One. 7(10), e47705 (2012).
doi:10.1371/journal.pone.0047705
121. Watt, E.W., Bui, A.A.: Evaluation of a dynamic Bayesian belief network to predict
osteoarthritic knee pain using data from the osteoarthritis initiative. AMIA Annul. Symp.
Proc. 2008, 788792 (2008)
122. Wen, X., Rangarajan, G., Ding, M.: Multivariate Granger causality: an estimation framework
based on factorization of the spectral density matrix. Philos. Trans. R. Soc. A 371, 20110610
(2013). http://dx.doi.org/10.1098/rsta.2011.0610
123. Zhang, N.L.: Probabilistic inference in influence diagrams. Comput. Intell. 14, 475497
(1998)
Part VI
Codes of Practice and Factors of Safety
44 Conceptual Structure of Performance Assessments for Geologic Disposal of Radioactive Waste

J.C. Helton et al.
Abstract
A conceptual structure for performance assessments (PAs) for radioactive waste disposal facilities and other complex engineered facilities is described, based on three basic conceptual entities: EN1, a probability space that characterizes aleatory uncertainty; EN2, a function that predicts consequences for individual elements of the sample space for aleatory uncertainty; and EN3, a probability space that characterizes epistemic uncertainty. This structure provides a basis for (i) a separation of the effects and implications of aleatory and epistemic uncertainty; (ii) the design, computational implementation, and documentation of a PA for a complex system; and (iii) informative uncertainty and sensitivity analyses. The implementation of this structure is illustrated with results from PAs for the Waste Isolation Pilot Plant for transuranic radioactive waste and the proposed Yucca Mountain repository for spent nuclear fuel and high-level radioactive waste.
Keywords
Aleatory uncertainty · Epistemic uncertainty · Performance assessment · Radioactive waste · Sensitivity analysis · Uncertainty analysis
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1504
2 Characterization of Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1504
3 EN1, Representation of Aleatory Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1506
4 EN2, Model That Estimates Consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1508
5 EN3, Representation of Epistemic Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1509
6 Propagation and Display of Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1513
7 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1530
8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1533
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1534
1 Introduction
A conceptual structure for performance assessments (PAs) for the geologic disposal
of radioactive waste is described. Illustrations of this structure are provided by
past PAs for (i) a repository for transuranic radioactive waste near Carlsbad, New
Mexico, known as the Waste Isolation Pilot Plant (WIPP) [1] and (ii) a proposed
repository for spent nuclear fuel and high-level radioactive waste at Yucca Mountain
(YM), Nevada [2]. Probabilistic risk assessments (PRAs) for nuclear power plants and other complex engineered facilities also have a similar conceptual structure [3–5].
The following topics are addressed: (i) characterization of uncertainty (Sect. 2),
(ii) representation of aleatory uncertainty (Sect. 3), (iii) estimation of consequences
(Sect. 4), (iv) representation of epistemic uncertainty (Sect. 5), (v) propagation
and display of uncertainty (Sect. 6), and (vi) sensitivity analysis (Sect. 7). The
presentation then ends with a brief summary (Sect. 8).
This presentation is an edited and updated adaptation of two prior conference
presentations [6, 7] and also draws on additional material in two special journal
issues devoted to the indicated WIPP and YM PAs [1, 2].
2 Characterization of Uncertainty
A PA for a geologic repository for radioactive waste or any other complex facility is
an analysis intended to answer three questions about the facility and one additional
question about the analysis itself. The initial three questions are Q1, "What could happen?"; Q2, "How likely is it to happen?"; and Q3, "What are the consequences if it does happen?". Formally, the answers to the preceding three questions can be
represented by a set of ordered triples of the form

$(S_i, pS_i, \mathbf{c}S_i), \quad i = 1, 2, \ldots, nS,$  (44.1)
where (i) $S_i$ is a set of similar occurrences, (ii) the sets $S_i$ are disjoint (i.e., $S_i \cap S_j = \emptyset$ for $i \neq j$) and $\cup_i S_i$ contains everything that could potentially occur at the particular facility under consideration, (iii) $pS_i$ is the probability for $S_i$, and (iv) $\mathbf{c}S_i$ is a vector of consequences associated with $S_i$ [8]. The fourth question is Q4,
"What is the uncertainty in the answers to the first three questions?" or, equivalently, "What is the level of confidence in the answers to the first three questions?".
The set of ordered triples in Eq. (44.1) is often referred to as the Kaplan-Garrick ordered triple representation for risk and, in essence, is an intuitive representation for a function $f$ (i.e., a random variable) defined in association with a probability space $(\mathcal{A}, \mathbb{A}, p_A)$, where (i) $\mathcal{A}$ is the set of everything that could occur in the particular universe under consideration, (ii) $\mathbb{A}$ is the collection of subsets of $\mathcal{A}$ for which probability is defined, and (iii) $p_A$ is the function that defines probability for the elements of $\mathbb{A}$ ([9], Sect. IV.3). Specifically, the sets $S_i$ are elements of $\mathbb{A}$ with $\cup_i S_i = \mathcal{A}$; $p_A(S_i)$ is the probability $pS_i$ of $S_i$; and $f(a_i)$ for a representative element $a_i$ of $S_i$ defines $\mathbf{c}S_i$. Another, less commonly used, possibility is that $\mathbf{c}S_i$ is the expected value of $f$ conditional on $S_i$.
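As a toy illustration of this ordered-triple representation, the following sketch checks that the probabilities of the disjoint sets sum to 1 and computes the expected consequence over aleatory uncertainty. The set names, probabilities, and scalar consequence values are made-up assumptions for illustration, not values from any actual PA.

```python
import math

# Hypothetical ordered triples (S_i, pS_i, cS_i) for a toy facility;
# all values below are illustrative assumptions.
triples = [
    # (description of S_i, pS_i, cS_i as a scalar consequence)
    ("no disruption",       0.90, 0.0),
    ("single intrusion",    0.08, 1.2),
    ("multiple intrusions", 0.02, 5.0),
]

# The sets S_i are disjoint and their union is the sample space A,
# so the probabilities must sum to 1.
assert math.isclose(sum(p for _, p, _ in triples), 1.0)

# Expected consequence over aleatory uncertainty: sum_i pS_i * cS_i.
expected = sum(p * c for _, p, c in triples)
print(expected)  # 0.196
```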
The uncertainty characterized by the sets $S_i$ and associated probabilities $pS_i$ in Eq. (44.1), introduced as answers to questions Q1 and Q2, is often referred to as aleatory uncertainty and results from a perceived randomness in future occurrences that could take place at the facility under consideration. The descriptor "aleatory" derives from the Latin aleae, which refers to games of dice, and is the reason for the notational use of $\mathcal{A}$ in the definition of the probability space $(\mathcal{A}, \mathbb{A}, p_A)$. Alternative designators for aleatory uncertainty include stochastic, type A, and irreducible. Question Q4 relates to uncertainty that results from a lack of knowledge with respect to the appropriateness of assumptions and/or parameter values used in an analysis. The basic idea is that an analysis has been developed to the point that it has a well-defined overall structure with fixed values for models and parameters, but uncertainty remains with respect to appropriate parameter values and possibly submodel structure or choice for use in this overall structure. This form of uncertainty is usually indicated by the descriptor "epistemic," which derives from the Greek episteme for knowledge. Alternative descriptors for epistemic uncertainty include subjective, state of knowledge, type B, and reducible. Most analyses use probability to characterize epistemic uncertainty, which in turn means that in some way a probability space $(\mathcal{E}, \mathbb{E}, p_E)$ characterizing epistemic uncertainty must be defined and incorporated into the analysis. The representation of aleatory and epistemic uncertainty is an important, and sometimes contentious, part of the analysis of a complex system. As a result, an extensive literature exists on the incorporation of aleatory and epistemic uncertainty into analyses of complex systems (e.g., [10–23]).
A PA for a geologic repository for radioactive waste or any other complex facility is a very involved analysis. The understanding of such an analysis is greatly facilitated if it can be seen "in the large" before having to work through all the details of its implementation. Fortunately, such an analysis is typically underlain by three basic entities or components: EN1, a probability space $(\mathcal{A}, \mathbb{A}, p_A)$ that characterizes aleatory uncertainty; EN2, a function $f$ that estimates consequences for individual elements $\mathbf{a}$ of the sample space $\mathcal{A}$ for aleatory uncertainty; and EN3, a probability space $(\mathcal{E}, \mathbb{E}, p_E)$ that characterizes epistemic uncertainty [7, 24–26]. A recognition and understanding of these three basic entities makes it possible to understand the conceptual and computational structure of a large PA without having basic concepts obscured by fine details of the analysis and leads to an analysis that results in informative uncertainty and sensitivity analyses. The nature and role of these three basic entities in PA and associated uncertainty and sensitivity analyses are elaborated on and illustrated in the following sections.
3 EN1, Representation of Aleatory Uncertainty

The first of the three entities indicated in Sect. 2, the probability space $(\mathcal{A}, \mathbb{A}, p_A)$, has a sample space $\mathcal{A}$ consisting of vectors

$\mathbf{a} = [a_1, a_2, \ldots, a_{nA}]$  (44.2)
characterizing individual futures that potentially could occur at the facility under
consideration. The set A and the individual futures a contained in A are typically
defined for some specified time interval. For example, the time interval [0, $10^4$ yr] is specified in the regulations for WIPP [24, 27], and time intervals of [0, $10^4$ yr] and [0, $10^6$ yr] are specified in different parts of the regulations for the proposed YM repository ([26], Sect. 6). In contrast, futures are, in effect, defined for 1-year time
intervals in PAs for nuclear power plants [3–5]. In practice, the probability space $(\mathcal{A}, \mathbb{A}, p_A)$ is usually defined by specifying distributions for the individual elements of $\mathbf{a}$. For notational purposes, it is often convenient to represent the distribution associated with the elements $\mathbf{a}$ of $\mathcal{A}$ with a density function $d_A(\mathbf{a})$.
For the 1996 PA supporting the Compliance Certification Application (CCA)
for the WIPP, the only disruptions with sufficient potential for occurrence over the
$10^4$-yr regulatory period to merit inclusion in the analysis were drilling for petroleum
resources and mining of potash above the excavated waste disposal drifts [27, 28].
As a consequence, the individual futures underlying the 1996 WIPP PA have the
form
$\mathbf{a} = [\underbrace{t_1, l_1, e_1, b_1, p_1, a_1}_{\text{1st intrusion}}, \underbrace{t_2, l_2, e_2, b_2, p_2, a_2}_{\text{2nd intrusion}}, \ldots, \underbrace{t_n, l_n, e_n, b_n, p_n, a_n}_{n\text{th intrusion}}, t_{min}],$  (44.3)
where (i) $n$ is the number of drilling intrusions, (ii) $t_i$ is the time (yr) of the $i$th intrusion, (iii) $l_i$ designates the location of the $i$th intrusion, (iv) $e_i$ designates the penetration of an excavated or nonexcavated area by the $i$th intrusion, (v) $b_i$ designates whether or not the $i$th intrusion penetrates pressurized brine in the Castile Formation, (vi) $p_i$ designates the borehole plugging procedure used with the $i$th intrusion, (vii) $a_i$ designates the type of waste penetrated by the $i$th intrusion, and (viii) $t_{min}$ is the time (yr) at which potash mining occurs within the land withdrawal boundary. The distributions for the times $t_i$ for drilling intrusions and $t_{min}$ for potash mining are defined by Poisson processes; the distributions for the remaining elements of $\mathbf{a}$ are defined on the basis of the properties of the waste disposal drifts associated with the WIPP and the geologic environment that surrounds these drifts.
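The structure of Eq. (44.3) and the Poisson-process characterization of the intrusion times can be sketched as follows. The drilling and mining rates, the location discretization, and the per-intrusion category probabilities are illustrative assumptions, not the values used in the 1996 WIPP PA.

```python
import random

# Illustrative (assumed) parameters, not 1996 WIPP PA values:
DRILL_RATE = 3.0e-4   # assumed drilling-intrusion rate (intrusions/yr)
MINE_RATE = 1.0e-4    # assumed potash-mining rate (events/yr)
T_END = 1.0e4         # 10^4-yr regulatory period (yr)

def sample_future(rng):
    """Draw one future a: Poisson-process intrusion times plus
    per-intrusion properties (l, e, b, p, a) and a mining time tmin."""
    intrusions, t = [], rng.expovariate(DRILL_RATE)
    while t < T_END:  # exponential inter-arrival times => Poisson process
        intrusions.append({
            "t": t,                          # time of intrusion (yr)
            "l": rng.randrange(100),         # location index (assumed grid)
            "e": rng.random() < 0.2,         # excavated area hit? (assumed prob.)
            "b": rng.random() < 0.1,         # Castile brine hit? (assumed prob.)
            "p": rng.choice([1, 2, 3]),      # plugging procedure (assumed 3 patterns)
            "a": rng.choice(["CH", "RH"]),   # waste type (assumed 2 types)
        })
        t += rng.expovariate(DRILL_RATE)
    tmin = rng.expovariate(MINE_RATE)        # potash-mining time (yr)
    return {"intrusions": intrusions, "tmin": min(tmin, T_END)}

a = sample_future(random.Random(7))
print(len(a["intrusions"]))
```

Repeated calls with different seeds generate a sample from the (assumed) probability space for aleatory uncertainty.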
For the 2008 PA for the proposed YM repository, the individual futures have the form
$\mathbf{a} = [nEW, nED, nII, nIE, nSG, nSF, \mathbf{a}_{EW}, \mathbf{a}_{ED}, \mathbf{a}_{II}, \mathbf{a}_{IE}, \mathbf{a}_{SG}, \mathbf{a}_{SF}],$  (44.4)
where (i) $nEW$ = number of early waste package (WP) failures, (ii) $nED$ = number of early drip shield (DS) failures, (iii) $nII$ = number of igneous intrusive events, (iv) $nIE$ = number of igneous eruptive events, (v) $nSG$ = number of seismic ground motion events, (vi) $nSF$ = number of seismic fault displacement events, (vii) $\mathbf{a}_{EW}$ = vector defining the $nEW$ early WP failures, (viii) $\mathbf{a}_{ED}$ = vector defining the $nED$ early DS failures, (ix) $\mathbf{a}_{II}$ = vector defining the $nII$ igneous intrusive events, (x) $\mathbf{a}_{IE}$ = vector defining the $nIE$ igneous eruptive events, (xi) $\mathbf{a}_{SG}$ = vector defining the $nSG$ seismic ground motion events, and (xii) $\mathbf{a}_{SF}$ = vector defining the $nSF$ seismic fault displacement events. In turn, the vectors $\mathbf{a}_{EW}$, $\mathbf{a}_{ED}$, $\mathbf{a}_{II}$, $\mathbf{a}_{IE}$, $\mathbf{a}_{SG}$, and $\mathbf{a}_{SF}$ are of the form
$\mathbf{a}_{EW} = [\mathbf{a}_{EW,1}, \mathbf{a}_{EW,2}, \ldots, \mathbf{a}_{EW,nEW}], \quad \mathbf{a}_{ED} = [\mathbf{a}_{ED,1}, \mathbf{a}_{ED,2}, \ldots, \mathbf{a}_{ED,nED}],$  (44.5)

$\mathbf{a}_{II} = [\mathbf{a}_{II,1}, \mathbf{a}_{II,2}, \ldots, \mathbf{a}_{II,nII}], \quad \mathbf{a}_{IE} = [\mathbf{a}_{IE,1}, \mathbf{a}_{IE,2}, \ldots, \mathbf{a}_{IE,nIE}],$  (44.6)

and

$\mathbf{a}_{SG} = [\mathbf{a}_{SG,1}, \mathbf{a}_{SG,2}, \ldots, \mathbf{a}_{SG,nSG}], \quad \mathbf{a}_{SF} = [\mathbf{a}_{SF,1}, \mathbf{a}_{SF,2}, \ldots, \mathbf{a}_{SF,nSF}],$  (44.7)
where (i) $\mathbf{a}_{EW,j}$ = vector defining early WP failure $j$ for $j = 1, 2, \ldots, nEW$, (ii) $\mathbf{a}_{ED,j}$ = vector defining early DS failure $j$ for $j = 1, 2, \ldots, nED$, (iii) $\mathbf{a}_{II,j}$ = vector defining igneous intrusive event $j$ for $j = 1, 2, \ldots, nII$, (iv) $\mathbf{a}_{IE,j}$ = vector defining igneous eruptive event $j$ for $j = 1, 2, \ldots, nIE$, (v) $\mathbf{a}_{SG,j}$ = vector defining seismic ground motion event $j$ for $j = 1, 2, \ldots, nSG$, and (vi) $\mathbf{a}_{SF,j}$ = vector defining seismic fault displacement event $j$ for $j = 1, 2, \ldots, nSF$. Definitions of the vectors $\mathbf{a}_{EW,j}$, $\mathbf{a}_{ED,j}$, $\mathbf{a}_{II,j}$, $\mathbf{a}_{IE,j}$, $\mathbf{a}_{SG,j}$, and $\mathbf{a}_{SF,j}$ and their associated probabilistic characterizations are given in Refs. [30–32].
As an example, the individual vectors $\mathbf{a}_{EW,j}$, $j = 1, 2, \ldots, nEW$, appearing in the definition of $\mathbf{a}_{EW}$ in Eq. (44.5) are defined by

$\mathbf{a}_{EW,j} = [t_j, b_j, d_j]$  (44.8)
and characterize the properties of failed WP $j$, where (i) $t_j$ designates WP type (i.e., $t_j = 1$ ⇒ commercial spent nuclear fuel (CSNF) WP, $t_j = 2$ ⇒ codisposed (CDSP) WP), (ii) $b_j$ designates the percolation bin in which the failed WP is located (i.e., $b_j = k$ ⇒ location of failed WP in percolation bin $k$ for $k = 1, 2, 3, 4, 5$; see Fig. 44.2, Ref. [33]), and (iii) $d_j$ designates whether the failed WP experiences nondripping or dripping conditions (i.e., $d_j = 0$ ⇒ nondripping conditions and $d_j = 1$ ⇒ dripping conditions). In turn, the associated probabilities are based on the assumption that the number of early failed WPs follows a binomial distribution with the failed WPs distributed randomly over WP types, percolation bins, and nondripping/dripping conditions.
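The binomial early-failure model just described can be sketched as follows. The WP count and per-WP failure probability are illustrative assumptions, not values from the YM PA.

```python
import random

# Illustrative (assumed) parameters, not 2008 YM PA values:
N_WP = 11000      # assumed total number of WPs
P_FAIL = 1.0e-4   # assumed per-WP early-failure probability

def sample_aEW(rng):
    """Sample nEW ~ Binomial(N_WP, P_FAIL), then assign each failed WP
    a type t_j, percolation bin b_j, and dripping condition d_j at random."""
    nEW = sum(rng.random() < P_FAIL for _ in range(N_WP))  # binomial draw
    return [{"t": rng.choice([1, 2]),           # 1 = CSNF WP, 2 = CDSP WP
             "b": rng.choice([1, 2, 3, 4, 5]),  # percolation bin k
             "d": rng.choice([0, 1])}           # 0 = nondripping, 1 = dripping
            for _ in range(nEW)]

aEW = sample_aEW(random.Random(42))
print(len(aEW))
```

In the actual PA, the type/bin/condition probabilities need not be uniform; uniform choices are used here only to keep the sketch short.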
Although potentially complex, representations of aleatory uncertainty of the form
illustrated for the WIPP CCA PA and the YM license application PA permit a
complete and unambiguous description of what constitutes aleatory uncertainty and
the probabilistic characterization of this uncertainty.
4 EN2, Model That Estimates Consequences

As indicated in Sect. 2, the second entity that underlies a PA for a complex system
is a function f that estimates a vector f .a/ of consequences for individual elements
a of the sample space A for aleatory uncertainty. In most analyses, the function
f corresponds to a sequence of models for multiple physical processes that must
be implemented and numerically evaluated in one or more computer programs.
However, for notational and conceptual purposes, it is useful to represent this
sequence of models as a function of elements of the sample space A for aleatory
uncertainty. In most analyses associated with radioactive waste disposal, the analysis
results of interest are functions of time. In this situation, the model used to predict
system behavior can be represented by $f(\tau|\mathbf{a})$, where $\tau$ corresponds to time and the indication of conditionality (i.e., $|\mathbf{a}$) emphasizes that the results at time $\tau$ depend on the particular element $\mathbf{a}$ of $\mathcal{A}$ under consideration. In most analyses, $f(\tau|\mathbf{a})$ corresponds to a large number of results. In general, $\tau$ could also correspond to designators other than time such as spatial coordinates; however, it is possible that spatial coordinates and associated location-dependent results would simply be viewed as part of a very large number of results corresponding to $f(\tau|\mathbf{a})$.
As an example, the sequence of linked models that corresponds to f in the
1996 WIPP PA is indicated in Fig. 44.1. As summarized in Table 44.1, the models
indicated in Fig. 44.1 represent a variety of physical processes and also involve a
variety of mathematical structures (see Refs. [1, 28] for additional information on
the models used in the 1996 WIPP PA).
As another example, a subset of the models that correspond to f in the 2008 PA
for the proposed YM repository is indicated in Fig. 44.2. The particular configura-
tion of models shown in Fig. 44.2 was used in the YM PA to calculate consequences
associated with elements of the sample space A for aleatory uncertainty that
involved seismic disruptions. Similar model configurations were used to calculate
consequences for early WP failures, early DS failures, and igneous intrusive events;
a very different suite of models was used to calculate consequences for igneous
eruptive events. The number of individual models used in the 2008 YM PA is too
great to permit their individual description here. However, descriptions of these
models are available in Refs. [35, 36] and in the more detailed model-specific
technical reports cited in Refs. [35, 36].
Fig. 44.1 Models used in the 1996 WIPP PA ([34], Fig. 44.4); see Table 44.1 for a description of
individual models
As is evident from Fig. 44.1, Fig. 44.2, and Table 44.1, the function $f$ that constitutes the second of the three entities that underlie a PA for a radioactive waste repository can be quite complex.
5 EN3, Representation of Epistemic Uncertainty

The third entity indicated in Sect. 2, the probability space $(\mathcal{E}, \mathbb{E}, p_E)$ that characterizes epistemic uncertainty, has a sample space $\mathcal{E}$ whose elements are vectors of the form

$\mathbf{e} = [\mathbf{e}_A, \mathbf{e}_M] = [e_1, e_2, \ldots, e_{nE}],$  (44.9)

where $\mathbf{e}_A$ contains epistemically uncertain quantities used in the characterization of aleatory uncertainty and $\mathbf{e}_M$ contains epistemically uncertain quantities used in the models that estimate consequences.
Fig. 44.2 Models used in the 2008 YM PA for seismic disruptions ([35], Fig. 6.1.46)
As examples, the probability spaces for epistemic uncertainty in the 1996 WIPP PA and the 2008 YM PA involved $nE = 57$ and $nE = 392$ epistemically uncertain quantities, respectively. Example elements of the vector $\mathbf{e}$ for the WIPP and YM PAs are described in Tables 44.2 and 44.3. The example variables in Tables 44.2 and 44.3 were selected to include all variables that appear in the example sensitivity analyses discussed in Sect. 7.
As done in the WIPP and YM PAs, probability is the most widely used mathematical structure for the representation of epistemic uncertainty.
Table 44.1 Summary of individual models used in the 1996 WIPP PA ([34], Table 1)
BRAGFLO: Calculates multiphase flow of gas and brine through a porous heterogeneous
reservoir. Uses finite difference procedures to solve system of nonlinear partial differential
equations (PDEs) that describes the conservation of gas and brine along with appropriate
constraint equations, initial conditions, and boundary conditions.
BRAGFLO-DBR: Special configuration of BRAGFLO model used to calculate dissolved
radionuclide releases to the surface at the time of a drilling intrusion. Uses initial value
conditions obtained from calculations performed with BRAGFLO and CUTTINGS_S.
CUTTINGS_S: Calculates quantity of radioactive material brought to the surface in cuttings,
cavings, and spallings generated by a drilling intrusion. Spalling calculation uses initial value
conditions obtained from calculations performed with BRAGFLO.
GRASP-INV: Generates transmissivity fields conditioned on measured transmissivity values and
calibrated to steady-state and transient pressure data at well locations using an adjoint sensitivity
and pilot-point technique.
NUTS: Solves system of PDEs for dissolved radionuclide transport in the vicinity of the
repository. Uses brine volumes and flows calculated by BRAGFLO as input.
PANEL: Calculates rate of discharge and cumulative discharge of radionuclides from a waste
panel through an intruding borehole. Uses brine volumes and flows calculated by BRAGFLO as
input.
SANTOS: Determines quasistatic, large deformation, inelastic response of two-dimensional
solids with finite element techniques. Used to determine porosity of waste as a function of time
and cumulative gas generation, which is an input to calculations performed with BRAGFLO.
SECOFL2D: Calculates single-phase Darcy flow for groundwater flow in two dimensions based
on a PDE for hydraulic head. Uses transmissivity fields generated by GRASP-INV.
SECOTP2D: Simulates transport of radionuclides in a fractured porous medium. Solves two
PDEs: one provides a two-dimensional representation for convective and diffusive radionuclide
transport in fractures and the other provides a one-dimensional representation for diffusion of
radionuclides into the rock matrix surrounding the fractures. Uses flow fields calculated by
SECOFL2D.
Table 44.2 Examples of the nE D 57 elements of the vector e of epistemically uncertain variables
in the 1996 WIPP PA (see [45], Table 1, for a complete listing of the indicated 57 epistemically
uncertain variables and sources of detailed information on the individual variables)
ANHPRM Logarithm of anhydrite permeability (m²). Distribution: Student's t with 5 degrees of freedom. Range: −21.0 to −17.1. Correlation: −0.99 rank correlation with ANHCOMP (anhydrite compressibility).
BHPRM Logarithm of borehole permeability (m²). Distribution: Uniform. Range: −14.0 to −11.0.
HALPOR Halite porosity (dimensionless). Distribution: Piecewise uniform. Range: 0.001 to 0.03.
SHBCEXP Brooks-Corey pore distribution parameter for shaft (dimensionless). Distribution: Piecewise uniform. Range: 0.11 to 8.10.
WRGSSAT Residual gas saturation in waste (dimensionless). Distribution: Uniform. Range: 0 to 0.15.
WMICDFLG Pointer variable for microbial degradation of cellulose (dimensionless). Distribution: Discrete. WMICDFLG = 1, 2, 3 implies no microbial degradation of cellulose; microbial degradation of only cellulose; and microbial degradation of cellulose, plastic, and rubber, respectively.
WTAUFAIL Shear strength of waste (Pa). Distribution: Uniform. Range: 0.05 to 10.
Table 44.3 Examples of the nE D 392 elements of the vector e of epistemically uncertain
variables in the 2008 YM PA (see [26], App. B, for a complete listing of the indicated 392
epistemically uncertain variables and sources of detailed information on the individual variables)
CORRATSS. Stainless steel corrosion rate (μm/yr). Distribution: Truncated log normal. Range: 0.01 to 0.51. Mean/Median/Mode: 0.267. Standard Deviation: 0.209.
CSNFMASS. Scale factor used to characterize uncertainty in radionuclide content of commercial spent nuclear fuel (CSNF) (dimensionless). Distribution: Uniform. Range: 0.85 to 1.4.
DSNFMASS. Scale factor used to characterize uncertainty in radionuclide content of defense spent nuclear fuel (DSNF) (dimensionless). Distribution: Triangular. Range: 0.45 to 2.9. Mode: 0.62.
DTDRHUNC. Selector variable used to determine the collapsed drift rubble thermal conductivity (dimensionless). Distribution: Discrete. Range: 1 to 2.
HLWDRACD. Effective rate coefficient (affinity term) for the dissolution of high-level waste (HLW) glass in codisposed spent nuclear fuel (CDSP) waste packages (WPs) under low pH conditions (g/(m² d)). Distribution: Triangular. Range: 8.41E+03 to 1.15E+07. Mode: 8.41E+03.
IGRATE. Frequency of intersection of the repository footprint by a volcanic event (yr⁻¹). Distribution: Piecewise uniform. Range: 0 to 7.76E−07.
INFIL. Pointer variable for determining infiltration conditions: 10th, 30th, 50th, or 90th percentile infiltration scenario (dimensionless). Distribution: Discrete. Range: 1 to 4.
INRFRCTC. Initial release fraction of ⁹⁹Tc in a CSNF waste package (dimensionless). Distribution: Triangular. Range: 0.0001 to 0.0026. Mode: 0.001.
MICC14. Groundwater Biosphere Dose Conversion Factor (BDCF) for ¹⁴C in modern interglacial climate ((Sv/yr)/(Bq/m³)). Distribution: Discrete. Range: 7.18E−10 to 2.56E−08. Mean: 1.93E−09. Standard Deviation: 1.85E−09.
MICTC99. Groundwater BDCF for ⁹⁹Tc in modern interglacial climate ((Sv/yr)/(Bq/m³)). Distribution: Discrete. Range: 5.28E−10 to 2.85E−08. Mean: 1.12E−09. Standard Deviation: 1.26E−09.
SCCTHR. Residual stress threshold for stress corrosion cracking (MPa). Distribution: Uniform. Range: 315.9 to 368.55.
SCCTHRP. Residual stress threshold for stress corrosion crack (SCC) nucleation of Alloy 22 (as a percentage of yield strength in MPa) (dimensionless). Distribution: Uniform. Range: 90 to 105.
SZFIPOVO. Logarithm of flowing interval porosity in volcanic units (dimensionless). Distribution: Piecewise uniform. Range: −5 to −1. Mean/Median/Mode: −3.
SZGWSPDM. Logarithm of the scale factor used to characterize uncertainty in groundwater-specific discharge (dimensionless). Distribution: Piecewise uniform. Range: −0.951 to 0.951.
THERMCON. Selector variable for one of three host-rock thermal conductivity scenarios (low, mean, and high) (dimensionless). Distribution: Discrete. Range: 1 to 3.
WDGCA22. Temperature-dependent slope term of Alloy 22 general corrosion rate (K). Distribution: Truncated normal. Range: 666 to 7731. Mean: 4905. Standard Deviation: 1413.
WDGCUA22. Variable for selecting distribution for general corrosion rate (low, medium, or high) (dimensionless). Distribution: Discrete. Range: 1 to 3.
WDNSCC. Stress corrosion cracking growth rate exponent (repassivation slope) (dimensionless). Distribution: Truncated normal. Range: 0.935 to 1.395. Mean: 1.165. Standard Deviation: 0.115.
WDZOLID. Deviation from median yield strength range for outer lid (dimensionless). Distribution: Truncated normal. Range: −3 to 3. Mean: 0. Standard Deviation: 1.
44 Conceptual Structure of Performance Assessments for Geologic…  1513
the presence of limited information have also been developed, including interval analysis, evidence theory, and possibility theory (e.g., [46–55]). As yet, these uncertainty structures have not been used in large-scale PAs.
Two possibilities exist when the propagation and display of uncertainty are con-
sidered in a PA that involves a separation of aleatory uncertainty and epistemic
uncertainty: (i) presentation of the effects of epistemic uncertainty conditional on
a specific realization of aleatory uncertainty and (ii) presentation of the effects of
aleatory uncertainty conditional on a specific realization of epistemic uncertainty.
The presentation of the effects of epistemic uncertainty conditional on a specific
realization of aleatory uncertainty is considered first.
and

\[
p_E(y < \tilde{y} \mid a) = 1 - p_E(\tilde{y} \le y \mid a) = \int_E \delta_y[f(\tau \mid a, e_M)] \, d_E(e_M) \, dE, \tag{44.11}
\]

\[
E_E(y \mid a) = \int_E f(\tau \mid a, e_M) \, d_E(e_M) \, dE, \tag{44.13}
\]

and

\[
E_E(y \mid a) \approx \sum_{k=1}^{n} f(\tau \mid a, e_{Mk})/n, \tag{44.16}
\]
of size 300 in the propagation of epistemic uncertainty [26, 56, 57]. Latin hypercube sampling operates in the following manner to generate a sample of size n from the distributions D_1, D_2, …, D_{nE} associated with the elements of e = [e_1, e_2, …, e_{nE}]. The range of each variable e_l is divided into n disjoint intervals of equal probability, and one value e_{kl} is randomly selected from each interval. The n values for e_1 are randomly paired without replacement with the n values for e_2 to produce n pairs. These pairs are then randomly combined without replacement with the n values for e_3 to produce n triples. This process is continued until a set of n nE-tuples e_k = [e_{k1}, e_{k2}, …, e_{k,nE}], k = 1, 2, …, n, is obtained, with this set constituting the LHS. Owing to its efficient stratification properties, Latin hypercube sampling has also been used for the propagation of epistemic uncertainty in the 1996 WIPP PA [45], the NUREG-1150 probabilistic risk assessments for five nuclear power stations [3, 4], and many other analyses.
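The pairing procedure just described can be sketched compactly. The following Python fragment is an illustrative implementation only (not code from the PAs; the function name and the two example distributions are ours). Permuting each column independently is equivalent to the sequential pairing-without-replacement construction described above:

```python
import numpy as np

def latin_hypercube_sample(n, inverse_cdfs, rng=None):
    """Generate an n-point Latin hypercube sample.

    inverse_cdfs: list of nE inverse CDFs (percent-point functions),
    one per epistemic variable e_l.
    """
    rng = np.random.default_rng(rng)
    nE = len(inverse_cdfs)
    sample = np.empty((n, nE))
    for l, ppf in enumerate(inverse_cdfs):
        # Divide [0, 1] into n equal-probability intervals and draw
        # one uniform value from each interval (stratification).
        u = (np.arange(n) + rng.random(n)) / n
        # Random pairing without replacement: permute this variable's
        # n values before combining them with the other variables.
        sample[:, l] = ppf(rng.permutation(u))
    return sample

# Example: a 300-point LHS for two uniform variables, on [0.85, 1.4]
# and [90, 105] (ranges borrowed from CSNFMASS and SCCTHRP above).
lhs = latin_hypercube_sample(
    300,
    [lambda u: 0.85 + u * (1.4 - 0.85), lambda u: 90.0 + u * 15.0])
```

Each column hits every one of the n equal-probability strata exactly once, which is the source of the efficiency noted in the text.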
(c) Stepwise rank regression analysis for DOSTOT at 600,000 yr:

Step^a  Variable^b   R²^c   SRRC^d
1       WDGCA22      0.85   −0.94
2       WDZOLID      0.87    0.14
3       THERMCON     0.88   −0.10
4       INFIL        0.89   −0.10
5       SCCTHR       0.90   −0.09
6       WDNSCC       0.91   −0.08
7       CORRATSS     0.91   −0.07
8       WDGCUA22     0.92    0.08
9       MICTC99      0.92    0.07
10      DTDRHUNC     0.92    0.04
11      CSNFMASS     0.92    0.04

a: Steps in stepwise rank regression analysis
b: Variables listed in order of selection in stepwise regression
c: Cumulative R² value with entry of each variable into regression model
d: Standardized rank regression coefficients (SRRCs) in final regression model
Fig. 44.3 Estimates obtained with an LHS of size n = 300 of the epistemic uncertainty in dose D_N(τ|a_N, e_M) to the RMEI conditional on the realization a_N of aleatory uncertainty corresponding to nominal conditions: (a) CDF and CCDF for D_N(τ|a_N, e_M) with τ = 600,000 yr ([58], Fig. 1b); (b) individual results D_N(τ|a_N, e_{Mk}), k = 1, 2, …, n, and associated expected (mean) values E_E[D_N(τ|a_N, e_M)] and quantile values Q_{Eq}[D_N(τ|a_N, e_M)], q = 0.05, 0.5, 0.95, for 0 ≤ τ ≤ 10⁶ yr ([58], Fig. 1a); (c) stepwise rank regressions for D_N(τ|a_N, e_M) at τ = 600,000 yr ([33], Table 10); and (d) partial rank correlation coefficients (PRCCs) for D_N(τ|a_N, e_M) for 0 ≤ τ ≤ 10⁶ yr ([33], Fig. 23b)
of size n and that C_r is an analysis result of interest obtained for replicated sample r. For example, C might be the expected value E_E[f(τ|a, e_M)] at time τ that is estimated with the use of Latin hypercube sampling, and C_r = E_{Er}[f(τ|a, e_M)] would be the estimate for C = E_E[f(τ|a, e_M)] obtained with replicated sample r. Then,

\[
\bar{C} = \sum_{r=1}^{n_R} C_r / n_R
\quad \text{and} \quad
\mathrm{SE}(\bar{C}) = \left\{ \sum_{r=1}^{n_R} \left( C_r - \bar{C} \right)^2 \Big/ \left[ n_R (n_R - 1) \right] \right\}^{1/2} \tag{44.25}
\]
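Equation (44.25) is straightforward to implement. The following sketch (illustrative values, not PA results) computes the replicated-sample mean and its standard error:

```python
import numpy as np

def replicated_estimate(results):
    """Combine estimates C_r from nR replicated samples (Eq. 44.25).

    results: sequence of nR independent estimates of the same quantity C,
    one per replicated LHS.
    Returns (C_bar, standard error of C_bar).
    """
    c = np.asarray(results, dtype=float)
    nR = len(c)
    c_bar = c.sum() / nR
    # Standard error of the mean across the nR replicates
    se = np.sqrt(((c - c_bar) ** 2).sum() / (nR * (nR - 1)))
    return c_bar, se

# Three hypothetical replicated estimates of an expected dose (mrem/yr)
c_bar, se = replicated_estimate([0.52, 0.49, 0.55])
```

With nR = 3 replicates, as used for the results in Fig. 44.4, the standard error gives the half-width scale of the confidence intervals shown in Frame b.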
Fig. 44.4 Illustration of replicated sampling to determine the adequacy of an LHS of size 300 for the assessment of the epistemic uncertainty present in estimates for the dose D_N(τ|a_N, e_M) to the RMEI conditional on undisturbed (i.e., nominal) repository conditions as indicated by the vector a_N ([58], Fig. 4): (a) expected values E_{Er}[D_N(τ|a_N, e_M)] and quantiles Q_{Eqr}[D_N(τ|a_N, e_M)], q = 0.05, 0.5, 0.95, for D_N(τ|a_N, e_M) obtained for each of the three replicated samples indicated in Eq. (44.24) and (b) time-dependent 95% confidence intervals (i.e., for α = 0.05) for E_E[D_N(τ|a_N, e_M)] obtained as indicated in conjunction with Eq. (44.25)
and

\[
p_A(y < \tilde{y} \mid e) = 1 - p_A(\tilde{y} \le y \mid e) = \int_A \delta_y[f(\tau \mid a, e_M)] \, d_A(a \mid e_A) \, dA, \tag{44.27}
\]

and

\[
E_A(y \mid e) \approx
\begin{cases}
\sum_{i=1}^{m} f(\tau \mid a_i, e_M)/m \\[4pt]
\sum_{i=1}^{\tilde{m}} f(\tau \mid \tilde{a}_i, e_M) \, p_A(A_i \mid e_A),
\end{cases} \tag{44.31}
\]
Fig. 44.5 Distribution of CCDFs for normalized release to the accessible environment over 10⁴ yr obtained in the 1996 WIPP PA: (a) individual CCDFs for LHS of size 100 ([67], Fig. 1a), (b) mean and quantile curves for CCDFs in Frame a ([67], Fig. 1b), (c) replicated mean and quantile curves for CCDFs in Frame a for three independent LHSs (i.e., R1, R2, R3) of size 100 ([67], Fig. 2a), and (d) partial rank correlation coefficients (PRCCs) for all 300 CCDFs used to generate results in Frame c ([67], Fig. 5)
through the tree being used to both define one set Ai and determine the probability
of this set.
As an example, the CCDFs for normalized radionuclide release to the accessible environment obtained in the 1996 WIPP PA are shown in Fig. 44.5a. In this example, the individual CCDFs are calculated as indicated in Eq. (44.30) with a random sample of size 10⁴ from the sample space for aleatory uncertainty, and the distribution of CCDFs results from repeating this calculation for each element e_k = [e_{Ak}, e_{Mk}] of an LHS of size n = 100 from the sample space for epistemic uncertainty (see Ref. [67] for computational details). As indicated immediately after Eq. (44.27), each CCDF in Fig. 44.5a is a plot of points of the form [y, p_A(y < ỹ | e_k)], with y corresponding to normalized release R at 10,000 yr. Specifically, the indicated release is the integrated (i.e., total) release that takes place from repository closure (i.e., time 0 yr) out to 10,000 yr; thus, in the formal model representation f(τ|a, e_M) used in Eqs. (44.10)–(44.31), the time variable τ corresponds to 10,000 yr in Fig. 44.5a.
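The construction of one such CCDF from sampled futures is simple to sketch. The following Python fragment (illustrative only; the release values are hypothetical and m is far smaller than the 10⁴ futures used in the WIPP PA) estimates an exceedance probability and an expected value, corresponding to the random-sampling branches of Eqs. (44.30) and (44.31):

```python
import numpy as np

def ccdf_point(y, f_values):
    """Estimate p_A(y < ytilde | e): the fraction of sampled futures a_i
    whose consequence f(tau | a_i, e_M) exceeds y."""
    f = np.asarray(f_values, dtype=float)
    return np.mean(f > y)

# Hypothetical normalized releases for m = 8 sampled futures a_1, ..., a_m
f_vals = [0.0, 0.1, 0.1, 0.5, 1.2, 2.0, 4.0, 12.0]

# One point on the CCDF: fraction of futures with release > 1.0
p_exceed_1 = ccdf_point(1.0, f_vals)

# Expected release over aleatory uncertainty (first branch of Eq. 44.31)
mean_release = float(np.mean(f_vals))
```

Sweeping y over a grid of release values and repeating for each epistemic sample element e_k yields the full family of CCDFs plotted in Fig. 44.5a.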
Displays of the form shown in Fig. 44.5a provide a visual impression of the effects of epistemic uncertainty in the determination of the presented results. A more quantitative presentation of the effects of epistemic uncertainty is provided by a plot of the expected value and selected quantile values (e.g., q = 0.05, 0.5, 0.95) for the exceedance probabilities associated with individual values on the abscissa (e.g., normalized release R as in Fig. 44.5a). In general, and with the same notation as used in Eqs. (44.29), (44.30), and (44.31), the indicated expected value E_E[p_A(y < ỹ | e)] is given by
\[
E_E[p_A(y < \tilde{y} \mid e)] = \int_E \left\{ \int_A \delta_y[f(\tau \mid a, e_M)] \, d_A(a \mid e) \, dA \right\} d_E(e) \, dE
\approx
\begin{cases}
\sum_{k=1}^{n} \left[ \sum_{i=1}^{m} \delta_y[f(\tau \mid a_i, e_{Mk})]/m \right] \Big/ n \\[6pt]
\sum_{k=1}^{n} \left[ \sum_{i=1}^{\tilde{m}} \delta_y[f(\tau \mid \tilde{a}_i, e_{Mk})] \, p_A(A_i \mid e_{Ak}) \right] \Big/ n
\end{cases} \tag{44.32}
\]
As an example, mean and quantile (i.e., percentile) results for the CCDFs in
Fig. 44.5a are presented in Fig. 44.5b.
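The pointwise computation of such mean and quantile curves across a family of CCDFs can be sketched as follows; the curves below are synthetic stand-ins for the p_A(y < ỹ | e_k) results, and all parameter values are ours:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical family of CCDFs: rows = n epistemic sample elements e_k,
# columns = exceedance probabilities at a common grid of release values y.
n, n_y = 100, 50
y_grid = np.linspace(0.0, 10.0, n_y)
scales = rng.uniform(0.5, 3.0, size=n)
# Each synthetic CCDF is nonincreasing in y, as a CCDF must be.
ccdfs = np.exp(-np.outer(1.0 / scales, y_grid))

# Pointwise summaries over the n epistemic sample elements:
mean_curve = ccdfs.mean(axis=0)                   # E_E[p_A(y < ytilde | e)]
q05, q50, q95 = np.quantile(ccdfs, [0.05, 0.5, 0.95], axis=0)
```

The resulting four curves are the analogues of the mean and q = 0.05, 0.5, 0.95 curves in Fig. 44.5b: the ordinate of the mean curve at each y is exactly the estimate in Eq. (44.32).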
Results analogous to those illustrated in Fig. 44.5a, b can also be obtained and presented for CDFs. However, CCDFs rather than CDFs usually provide the preferred presentation format for results obtained in PAs for two reasons. First, CCDFs provide an answer to questions of the form "How likely is it to be this bad or worse?", which is typically the type of question of most interest in a PA. Second, CCDFs facilitate the display of the probabilities associated with low-probability but high-consequence aleatory occurrences (especially when a log₁₀ scale is used on the ordinate); in contrast, such probabilities are difficult to obtain from a CDF owing to the lack of resolution in displayed probabilities as cumulative probability approaches 1.0.
As previously indicated, the 1996 WIPP PA used an LHS of size 100 in the
propagation of epistemic uncertainty. To assess the adequacy of this sample size,
the analysis was repeated with three replicated LHSs of size 100 denoted R1, R2,
and R3. As shown in Fig. 44.5c, there is little variability in the results obtained with
the individual replicated samples. The sensitivity analysis results with PRCCs in
Fig. 44.5d are discussed in Sect. 7.
An interesting potential regulatory requirement is also illustrated in Fig. 44.5a, b. The two-step boundary labeled "EPA limit" in the upper right corner of Fig. 44.5a derives from a licensing regulation for the WIPP established by the US Environmental Protection Agency (EPA), which requires for the time interval [0, 10⁴ yr] after repository closure that (i) the probability of exceeding a normalized release of size R = 1.0 shall be less than 0.1 and (ii) the probability of exceeding a normalized release of size R = 10.0 shall be less than 0.001. As specified by the EPA, the indicated requirement is violated if the mean CCDF in Fig. 44.5b crosses the indicated boundary line (see Ref. [24] for additional details). A requirement of this type places stronger restrictions on system outcomes as the severity of these outcomes increases and is known as the Farmer limit line approach to the definition of acceptable risk [68–70].
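A compliance check of this type reduces to comparing the mean CCDF against the boundary at the two step points. The sketch below illustrates the idea with hypothetical curves (the actual comparison is made on the log-log CCDF display, and linear interpolation here is a simplification of ours):

```python
import numpy as np

def violates_epa_limit(y_grid, mean_ccdf):
    """Check a mean CCDF against the two-step WIPP boundary:
    P(R > 1.0) must be < 0.1 and P(R > 10.0) must be < 0.001."""
    p_at = lambda r: np.interp(r, y_grid, mean_ccdf)
    return (p_at(1.0) >= 0.1) or (p_at(10.0) >= 0.001)

y = np.array([0.1, 1.0, 10.0, 100.0])        # normalized release grid
ok_curve = np.array([0.5, 0.05, 1e-4, 0.0])  # mean CCDF below the boundary
bad_curve = np.array([0.8, 0.2, 5e-3, 0.0])  # mean CCDF crossing the boundary

flag_ok = violates_epa_limit(y, ok_curve)
flag_bad = violates_epa_limit(y, bad_curve)
```

Because the boundary tightens by two orders of magnitude as R grows from 1.0 to 10.0, the check embodies the increasing restriction on severe outcomes that characterizes the Farmer limit line approach.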
An additional example of the representation of the effects of aleatory uncertainty conditional on a specific realization of epistemic uncertainty is provided by the treatment in the 2008 YM PA of expected dose E_A[D(τ|a, e_M)|e_A] over aleatory uncertainty to the RMEI conditional on a realization e = [e_A, e_M] of epistemic uncertainty. In the preceding, D(τ|a, e_M) represents dose (mrem/yr) to the RMEI at time τ (yr) conditional on the future defined by the vector a of aleatory quantities (see Eq. (44.4)) and the vector e_M of epistemically uncertain quantities (see Table 44.3) used in the calculation of dose to the RMEI; and E_A[D(τ|a, e_M)|e_A] is defined as indicated in Eq. (44.28). Specifically,

\[
E_A[D(\tau \mid a, e_M) \mid e_A] = \int_A D(\tau \mid a, e_M) \, d_A(a \mid e_A) \, dA. \tag{44.34}
\]

The time-dependent expected values E_A[D(τ|a, e_M)|e_A] for D(τ|a, e_M) were estimated for the time intervals [0, 20,000 yr] and [0, 10⁶ yr] for each of the LHS sample elements e_k = [e_{Ak}, e_{Mk}] indicated in Eq. (44.17).
The resultant expected dose curves [τ, E_A[D(τ|a, e_{Mk})|e_{Ak}]], k = 1, 2, …, n = 300, the associated curves for the quantiles Q_{Eq}{E_A[D(τ|a, e_M)|e_A]}, q = 0.05, 0.5, 0.95, and the expected value E_E{E_A[D(τ|a, e_M)|e_A]} for dose over aleatory and epistemic uncertainty are shown in Fig. 44.6a. The quantities E_A[D(τ|a, e_M)|e_A] and E_E{E_A[D(τ|a, e_M)|e_A]} are both expected doses but with
Fig. 44.6 Estimates obtained with an LHS of size n = 300 of expected dose E_A[D(τ|a, e_M)|e_A] to the RMEI in the 2008 YM PA for the time interval [0, 20,000 yr]: (a) expected doses E_A[D(τ|a, e_{Mk})|e_{Ak}], k = 1, 2, …, n = 300, expected (mean) dose E_E{E_A[D(τ|a, e_M)|e_A]}, and quantiles Q_{Eq}{E_A[D(τ|a, e_M)|e_A]}, q = 0.05, 0.5, 0.95 ([61], Fig. 1a); (b) cumulative probabilities p_E{E_A[D(10⁴|a, e_M)|e_A] ≤ D}, exceedance probabilities p_E{D < E_A[D(10⁴|a, e_M)|e_A]}, expected (mean) dose E_E{E_A[D(10⁴|a, e_M)|e_A]}, and quantiles Q_{Eq}{E_A[D(10⁴|a, e_M)|e_A]}, q = 0.05, 0.5, 0.95, for τ = 10⁴ yr ([61], Fig. 1b); (c) estimates of E_E{E_A[D(τ|a, e_M)|e_A]} and Q_{Eq}{E_A[D(τ|a, e_M)|e_A]}, q = 0.05, 0.5, 0.95, obtained with three replicated LHSs of size n = 300 as indicated in Eq. (44.24) ([61], Fig. 5); and (d) partial rank correlation coefficients (PRCCs) for E_A[D(τ|a, e_M)|e_A] ([61], Fig. 6)
Specifically, Q_{Eq}{E_A[D(τ|a, e_M)|e_A]} corresponds to the value y defined and approximated by

\[
q = \int_E \delta_y\{E_A[D(\tau \mid a, e_M) \mid e_A]\} \, d_E(e) \, dE
\approx \sum_{k=1}^{n} \delta_y\{E_A[D(\tau \mid a, e_{Mk}) \mid e_{Ak}]\}/n, \tag{44.35}
\]
Similarly, Q_{Eq}{E_A[y(τ|a, e_M)|e_A]} and E_E{E_A[y(τ|a, e_M)|e_A]} for an arbitrary analysis result y(τ|a, e_M) are defined and approximated as indicated in Eqs. (44.35) and (44.36) with y(τ|a, e_M) and y(τ|a, e_{Mk}) replacing D(τ|a, e_M) and D(τ|a, e_{Mk}).
The numerics of determining the expected doses E_A[D(τ|a, e_{Mk})|e_{Ak}] are now considered. In concept, E_A[D(τ|a, e_{Mk})|e_{Ak}] can be approximated by

\[
E_A[D(\tau \mid a, e_{Mk}) \mid e_{Ak}] \approx \sum_{i=1}^{m} D(\tau \mid a_i, e_{Mk})/m, \tag{44.37}
\]
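In code, the double loop implied by Eqs. (44.16) and (44.37), with an outer epistemic sample and an inner aleatory sample, looks roughly as follows. A toy dose model and plain random sampling stand in here for the YM models and the LHS; all names and parameter values are ours:

```python
import numpy as np

rng = np.random.default_rng(42)

def dose_model(a, e):
    """Stand-in for D(tau | a, e_M) at a fixed time: purely illustrative."""
    return e * np.maximum(a, 0.0)

n, m = 30, 200          # outer (epistemic) and inner (aleatory) sample sizes
e_samples = rng.uniform(0.5, 1.5, size=n)     # crude stand-in for an LHS

expected_dose = np.empty(n)
for k, e_k in enumerate(e_samples):
    a_samples = rng.normal(1.0, 0.3, size=m)  # sampled futures a_i
    # Eq. (44.37): average the dose model over the m sampled futures
    expected_dose[k] = dose_model(a_samples, e_k).mean()

# Further averaging over the epistemic sample gives E_E{E_A[D | e_A]}
overall_mean = expected_dose.mean()
```

The n values in `expected_dose` are the analogues of the 300 expected dose curves in Fig. 44.6a evaluated at a single time, and their distribution characterizes the epistemic uncertainty in the expected dose.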
In the 2008 YM PA, the first step in the evaluation of E_A[D(τ|a, e_{Mk})|e_{Ak}] was to divide the defining integral for E_A[D(τ|a, e_{Mk})|e_{Ak}] in Eq. (44.34) into a sum of integrals over selected subsets of the sample space A for aleatory uncertainty. These subsets are referred to as scenario classes (and sometimes as modeling cases) in the 2008 YM PA and are denoted and defined with the notation used in Eqs. (44.4)–(44.7) by (i) A_EW = {a : a ∈ A and n_EW ≥ 1} for the early WP failure scenario class, (ii) A_ED = {a : a ∈ A and n_ED ≥ 1} for the early DS failure scenario class, (iii) A_II = {a : a ∈ A and n_II ≥ 1} for the igneous intrusion scenario class, (iv) A_IE = {a : a ∈ A and n_IE ≥ 1} for the igneous eruption scenario class, (v) A_SG = {a : a ∈ A and n_SG ≥ 1} for the seismic ground motion scenario class, (vi) A_SF = {a : a ∈ A and n_SF ≥ 1} for the seismic fault displacement scenario class, and (vii) A_N = {a : a ∈ A and n_EW = n_ED = n_II = n_IE = n_SG = n_SF = 0} = {a_N} for the nominal scenario class. In turn, the indicated sum of integrals is given by

\[
E_A[D(\tau \mid a, e_{Mk}) \mid e_{Ak}] \approx D_N(\tau \mid a_N, e_{Mk}) + \sum_{C \in \mathcal{MC}} \int_{A_C} D_C(\tau \mid a, e_{Mk}) \, d_A(a \mid e_{Ak}) \, dA \tag{44.38}
\]

with (i) MC = {EW, ED, II, IE, SG, SF} and (ii) the doses D_C(τ|a, e_{Mk}) for scenario class A_C calculated with use of only the elements of a and e = [e_A, e_M] that constitute part of the defining conditions for scenario class A_C (i.e., early WP failure, early DS failure, igneous intrusion, igneous eruption, seismic ground motion, and seismic fault displacement). The preceding approximation to E_A[D(τ|a, e_{Mk})|e_{Ak}] was justified for use in the 2008 YM PA on the basis of trade-offs between the effects of high-probability, low-consequence scenario classes and low-probability, high-consequence scenario classes ([61], Sect. 5).
Use of the decomposition in Eq. (44.38) reduces the determination of E_A[D(τ|a, e_{Mk})|e_{Ak}] from the evaluation of a single very complex integral to the evaluations of (i) D_N(τ|a_N, e_{Mk}) and (ii) six individual integrals (i.e., one integral for each element of the set MC). The six integrals for D_C(τ|a, e_{Mk}) over A_C for C ∈ MC that need to be evaluated are still complex, but less complex than a single integral for D(τ|a, e_{Mk}) over A. The advantage of the decomposition in Eq. (44.38) is that the integrals for D_C(τ|a, e_{Mk}) over A_C for C ∈ MC can be implemented in ways that obtain computational efficiencies by taking advantage of specific properties of the dose function D_C(τ|a, e_{Mk}) and the restriction of the density function to the set A_C for scenario class C.
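At any fixed time and epistemic sample element, the decomposition in Eq. (44.38) reduces to a simple sum of per-class contributions once the individual integrals have been evaluated. The sketch below uses purely hypothetical per-class expected doses (all numbers are ours, chosen only for illustration):

```python
# Minimal sketch of the additive structure of Eq. (44.38): total expected
# dose is the nominal dose plus the sum of per-scenario-class expected
# doses, each obtained from its own integral over A_C.

scenario_classes = ["EW", "ED", "II", "IE", "SG", "SF"]

expected_dose_by_class = {      # hypothetical E_A[D_C | e_Ak] values (mrem/yr)
    "EW": 2.0e-4, "ED": 5.0e-5, "II": 1.3e-3,
    "IE": 4.0e-6, "SG": 8.0e-3, "SF": 2.0e-6,
}

# D_N(tau | a_N, e_Mk); zero here, consistent with the text's observation
# that the nominal dose is 0 mrem/yr before 20,000 yr.
nominal_dose = 0.0

total_expected_dose = nominal_dose + sum(
    expected_dose_by_class[c] for c in scenario_classes)
```

The computational advantage noted in the text comes from evaluating each term of this sum with a method tailored to that scenario class, rather than attacking the single integral over A directly.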
For the seismic ground motion scenario class, elements a of A_SG are of the form

\[
a = [n_{SG}, t_1, v_1, A_1, t_2, v_2, A_2, \ldots, t_{n_{SG}}, v_{n_{SG}}, A_{n_{SG}}], \tag{44.39}
\]

where (i) n_SG = number of seismic ground motion events in 20,000 yr, (ii) t_i = time (yr) of event i, (iii) v_i = peak ground motion velocity (m/s) for event i, (iv) A_i = damaged area (m²) on individual WPs for peak ground motion velocity v_i, (v) the occurrence of seismic ground motion events is characterized by a hazard curve for peak ground motion velocity, and (vi) damaged area is characterized by distributions conditional on peak ground motion velocity. In effect, A_SG is the sample space for aleatory uncertainty when only seismic ground motion events are considered over the time interval [0, 20,000 yr].
To evaluate E_A[D(τ|a, e_{Mk})|e_{Ak}], it is necessary to integrate the function D_SG(τ|a, e_{Mk}) over the set A_SG with a defined as indicated in Eq. (44.39). In full detail, D_SG(τ|a, e_{Mk}) is defined by the model system indicated in Fig. 44.2. Evaluation of this system is too computationally demanding to permit its evaluation thousands of times for each element e_k = [e_{Ak}, e_{Mk}] of the LHS in Eq. (44.17). This is a common situation in analyses of complex systems, where very detailed physical models are developed that then turn out to be too computationally demanding to be naively used in the propagation of aleatory uncertainty. In such situations, it is necessary to find ways to efficiently use the results of a limited number of model evaluations to predict outcomes for a large number of different possible realizations of aleatory uncertainty.
For the seismic ground motion scenario class and the time interval [0, 20,000 yr], the needed computational efficiency was achieved by evaluating D_SG(τ|a, e_{Mk}) at a sequence of seismic event times (i.e., 100, 1000, 3000, 6000, 12,000, 18,000 yr) and for a sequence of damaged areas (i.e., 10^{−8+s}(32.6 m²) for s = 1, 2, …, 5, with 32.6 m² corresponding to the surface area of a WP) at each of the indicated times (Fig. 44.7a). This required 6 × 5 = 30 evaluations of the system indicated in Fig. 44.2 for each LHS element in Eq. (44.17). Once obtained, these evaluations can be used with appropriate interpolation and additive procedures to evaluate D_SG(τ|a, e_{Mk}) for different values of a for each LHS element e_k = [e_{Ak}, e_{Mk}] (see Ref. [32], Sect. 4, for computational details).
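The interpolation step can be sketched with a simple bilinear rule on the (event time, damaged area) grid. The dose values below are hypothetical placeholders for the 30 precomputed model runs, and the actual procedure ([32], Sect. 4) additionally combines multiple events additively; this sketch shows only the interpolation:

```python
import numpy as np

# One full model run per (event time, damaged-area level) grid point.
event_times = np.array([100.0, 1000.0, 3000.0, 6000.0, 12000.0, 18000.0])
area_levels = np.arange(1.0, 6.0)     # stand-in for the 5 damaged-area levels
# Hypothetical dose results from the 6 x 5 = 30 evaluations
dose_grid = np.outer(1.0 / np.sqrt(event_times), area_levels)

def interpolated_dose(t, s):
    """Bilinear interpolation of dose for an event at time t with
    damaged-area level s, using only the 30 precomputed evaluations."""
    i = np.clip(np.searchsorted(event_times, t) - 1, 0, len(event_times) - 2)
    j = np.clip(np.searchsorted(area_levels, s) - 1, 0, len(area_levels) - 2)
    wt = (t - event_times[i]) / (event_times[i + 1] - event_times[i])
    ws = (s - area_levels[j]) / (area_levels[j + 1] - area_levels[j])
    d = dose_grid
    return ((1 - wt) * (1 - ws) * d[i, j] + wt * (1 - ws) * d[i + 1, j]
            + (1 - wt) * ws * d[i, j + 1] + wt * ws * d[i + 1, j + 1])

dose_at_grid_point = interpolated_dose(3000.0, 3.0)   # lands on a grid node
```

Once this cheap surrogate is in place, it can be called thousands of times per LHS element to propagate aleatory uncertainty, which is exactly the role the 30-run grid plays in the 2008 YM PA.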
The individual CCDFs in Fig. 44.7b are defined by probabilities of the form shown in Eq. (44.27) with (i) D_SG(τ|a, e_{Mk}) and e_{Ak} replacing y(τ|a, e_M) and e_A and (ii) τ = 10⁴ yr. Numerically, the integrals that define exceedance probabilities for the individual CCDFs are approximated with (i) random sampling from the possible values for a as indicated in Eq. (44.30) and (ii) estimated values D̂_SG(10⁴|a_i, e_{Mk}) for D_SG(10⁴|a, e_{Mk}) constructed from results of
(e) Stepwise rank regression for expected dose at 10,000 yr:

Variable^a   R²^b   SRRC^c
SCCTHRP      0.88   0.93
MICTC99      0.90   0.12
DSNFMASS     0.91   0.14
HLWDRACD     0.91   0.08
MICC14       0.92   0.08

a: Variables listed in order of selection
b: Cumulative R² value
c: SRRCs in final regression model

(f) Variables in PRCC legend: SCCTHRP, SZGWSPDM, SZFIPOVO, INFIL, DSNFMASS, MICC14
Fig. 44.7 Example results for dose (mrem/yr) to RMEI for seismic ground motion scenario class:
(a) dose for seismic events occurring at different times and causing different damaged areas on WPs
([32], Fig. 4a), (b) CCDFs for dose at 10,000 yr ([32], Fig. 10), (c) CCDF and CDF for expected
dose at 10,000 yr ([32], Fig. 6b), (d) time-dependent expected dose ([32], Fig. 6a), (e) stepwise
rank regression for expected dose at 10,000 yr ([72], Table 4), and (f) time-dependent PRCCs for
expected dose ([72], Fig. 4b)
the form shown in Fig. 44.7a (see Ref. [32], Sect. 4, for computational details). Specifically,

\[
\hat{p}_A[y < D_{SG}(10^4 \mid a, e_{Mk}) \mid e_{Ak}] = \sum_{i=1}^{m} \delta_y[\hat{D}_{SG}(10^4 \mid a_i, e_{Mk})]/m, \tag{44.40}
\]
Fig. 44.8 Expected dose E_A[D_C(τ|a, e_{Mk})|e_{Ak}] to the RMEI for 0 ≤ τ ≤ 20,000 yr for C ∈ {EW, ED, II, IE, SG, SF}: (a) early WP failure, (b) early DS failure, (c) igneous intrusion, (d) igneous eruption, (e) seismic ground motion, and (f) seismic fault displacement ([61], Fig. 2)
showed that D_N(τ|a_N, e_{Mk}) = 0 mrem/yr for 0 ≤ τ ≤ 20,000 yr (see Fig. 44.3a); thus, D_N(τ|a_N, e_{Mk}) made no contribution to E_A[D(τ|a, e_{Mk})|e_{Ak}] for 0 ≤ τ ≤ 20,000 yr.
7 Sensitivity Analysis
8 Summary
References
1. Helton, J.C., Marietta, M.G. (eds.): Special issue: the 1996 performance assessment for the Waste Isolation Pilot Plant. Reliab. Eng. Syst. Saf. 69(1–3), 1–451 (2000)
2. Helton, J.C., Hansen, C.W., Swift, P.N. (eds.): Special issue: performance assessment for the proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 1–456 (2014)
3. Helton, J.C., Breeding, R.J.: Calculation of reactor accident safety goals. Reliab. Eng. Syst. Saf. 39(2), 129–158 (1993)
4. Breeding, R.J., Helton, J.C., Gorham, E.D., Harper, F.T.: Summary description of the methods used in the probabilistic risk assessments for NUREG-1150. Nucl. Eng. Des. 135(1), 1–27 (1992)
5. Breeding, R.J., Helton, J.C., Murfin, W.B., Smith, L.N., Johnson, J.D., Jow, H.-N., Shiver, A.W.: The NUREG-1150 probabilistic risk assessment for the Surry Nuclear Power Station. Nucl. Eng. Des. 135(1), 29–59 (1992)
6. Helton, J.C., Hansen, C.W., Sallaberry, C.J.: Conceptual structure of performance assessments for the geologic disposal of radioactive waste. In: 10th International Probabilistic Safety Assessment & Management Conference, Seattle (2010)
7. Helton, J.C., Sallaberry, C.J.: Uncertainty and sensitivity analysis: from regulatory requirements to conceptual structure and computational implementation. In: 10th IFIP WG 2.5 Working Conference on Uncertainty Quantification in Scientific Computing, Boulder. IFIP Advances in Information and Communication Technology (AICT), vol. 377, pp. 60–76 (2012)
8. Kaplan, S., Garrick, B.J.: On the quantitative definition of risk. Risk Anal. 1(1), 11–27 (1981)
9. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 2, 2nd edn. Wiley, New York (1971)
10. Parry, G.W., Winter, P.W.: Characterization and evaluation of uncertainty in probabilistic risk analysis. Nucl. Saf. 22(1), 28–42 (1981)
11. Parry, G.W.: The characterization of uncertainty in probabilistic risk assessments of complex systems. Reliab. Eng. Syst. Saf. 54(2–3), 119–126 (1996)
12. Apostolakis, G.: The concept of probability in safety assessments of technological systems. Science 250(4986), 1359–1364 (1990)
13. Apostolakis, G.: The distinction between aleatory and epistemic uncertainties is important: an example from the inclusion of aging effects into PSA. In: Proceedings of the International Topical Meeting on Probabilistic Safety Assessment, PSA '99: Risk Informed and Performance-Based Regulation in the New Millennium, vol. 1, pp. 135–142. American Nuclear Society, La Grange Park (1999)
14. Helton, J.C.: Treatment of uncertainty in performance assessments for complex systems. Risk Anal. 14(4), 483–511 (1994)
15. Helton, J.C., Burmaster, D.E.: Guest editorial: treatment of aleatory and epistemic uncertainty in performance assessments for complex systems. Reliab. Eng. Syst. Saf. 54(2–3), 91–94 (1996)
16. Helton, J.C.: Uncertainty and sensitivity analysis in the presence of stochastic and subjective uncertainty. J. Stat. Comput. Simul. 57(1–4), 3–76 (1997)
17. Helton, J.C., Johnson, J.D., Sallaberry, C.J.: Quantification of margins and uncertainties: example analyses from reactor safety and radioactive waste disposal involving the separation of aleatory and epistemic uncertainty. Reliab. Eng. Syst. Saf. 96(9), 1014–1033 (2011)
18. Hoffman, F.O., Hammonds, J.S.: Propagation of uncertainty in risk assessments: the need to distinguish between uncertainty due to lack of knowledge and uncertainty due to variability. Risk Anal. 14(5), 707–712 (1994)
19. Paté-Cornell, M.E.: Uncertainties in risk analysis: six levels of treatment. Reliab. Eng. Syst. Saf. 54(2–3), 95–111 (1996)
20. Winkler, R.L.: Uncertainty in probabilistic risk assessment. Reliab. Eng. Syst. Saf. 54(2–3), 127–132 (1996)
21. Ferson, S., Ginzburg, L.: Different methods are needed to propagate ignorance and variability. Reliab. Eng. Syst. Saf. 54(2–3), 133–144 (1996)
22. Aven, T.: On different types of uncertainties in the context of the precautionary principle. Risk Anal. 31(10), 1515–1525 (2011)
23. Aven, T., Reniers, G.: How to define and interpret a probability in a risk and safety setting. Saf. Sci. 51, 223–231 (2013)
24. Helton, J.C., Anderson, D.R., Jow, H.-N., Marietta, M.G., Basabilvazo, G.: Conceptual structure of the 1996 performance assessment for the Waste Isolation Pilot Plant. Reliab. Eng. Syst. Saf. 69(1–3), 151–165 (2000)
25. Helton, J.C.: Mathematical and numerical approaches in performance assessment for radioactive waste disposal: dealing with uncertainty. In: Scott, E.M. (ed.) Modelling Radioactivity in the Environment, pp. 353–390. Elsevier Science, New York (2003)
26. Helton, J.C., Hansen, C.W., Sallaberry, C.J.: Conceptual structure and computational organization of the 2008 performance assessment for the proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 223–248 (2014)
27. Helton, J.C., Davis, F.J., Johnson, J.D.: Characterization of stochastic uncertainty in the 1996 performance assessment for the Waste Isolation Pilot Plant. Reliab. Eng. Syst. Saf. 69(1–3), 167–189 (2000)
28. U.S. Department of Energy: Title 40 CFR Part 191 Compliance Certification Application for the Waste Isolation Pilot Plant. DOE/CAO-1996-2184, vols. I–XXI. U.S. Department of Energy, Carlsbad Area Office, Waste Isolation Pilot Plant, Carlsbad (1996)
29. Helton, J.C., Anderson, D.R., Jow, H.-N., Marietta, M.G., Basabilvazo, G.: Performance assessment in support of the 1996 compliance certification application for the Waste Isolation Pilot Plant. Risk Anal. 19(5), 959–986 (1999)
30. Helton, J.C., Hansen, C.W., Sallaberry, C.J.: Expected dose for the early failure scenario classes in the 2008 performance assessment for the proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 297–309 (2014)
31. Sallaberry, C.J., Hansen, C.W., Helton, J.C.: Expected dose for the igneous scenario classes in the 2008 performance assessment for the proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 339–353 (2014)
32. Helton, J.C., Gross, M.B., Hansen, C.W., Sallaberry, C.J., Sevougian, S.D.: Expected dose for the seismic scenario classes in the 2008 performance assessment for the proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 380–398 (2014)
33. Hansen, C.W., Behie, G.A., Bier, A., Brooks, K.M., Chen, Y., Helton, J.C., Hommel, S.P., Lee, K.P., Lester, B., Mattie, P.D., Mehta, S., Miller, S.P., Sallaberry, C.J., Sevougian, S.D., Vo, P.: Uncertainty and sensitivity analysis for the nominal scenario class in the 2008 performance assessment for the proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 272–296 (2014)
34. Helton, J.C., Johnson, J.D., Jow, H.-N., McCurley, R.D., Rahal, L.J.: Stochastic and subjective uncertainty in the assessment of radiation exposure at the Waste Isolation Pilot Plant. Hum. Ecol. Risk Assess. 4(2), 469–526 (1998)
35. Sandia National Laboratories: Total System Performance Assessment Model/Analysis for the License Application. MDL-WIS-PA-000005 Rev 00, AD 01. U.S. Department of Energy Office of Civilian Radioactive Waste Management, Las Vegas (2008)
36. Hansen, C.W., Birkholzer, J.T., Blink, J., Bryan, C.R., Chen, Y., Gross, M.B., Hardin, E., Houseworth, J., Howard, R., Jarek, R., Lee, K.P., Lester, B., Mariner, P., Mattie, P.D., Mehta, S., Perry, F.V., Robinson, B., Sassani, D., Sevougian, S.D., Stein, J.S., Wasiolek, M.: Overview of total system model used for the 2008 performance assessment for the proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 249–266 (2014)
37. Hora, S.C., Iman, R.L.: Expert opinion in risk analysis: the NUREG-1150 methodology. Nucl. Sci. Eng. 102(4), 323–331 (1989)
38. Thorne, M.C., Williams, M.M.R.: A review of expert judgement techniques with reference to nuclear safety. Prog. Nucl. Saf. 27(2–3), 83–254 (1992)
39. Budnitz, R.J., Apostolakis, G., Boore, D.M., Cluff, L.S., Coppersmith, K.J., Cornell, C.A., Morris, P.A.: Use of technical expert panels: applications to probabilistic seismic hazard analysis. Risk Anal. 18(4), 463–469 (1998)
40. McKay, M., Meyer, M.: Critique of and limitations on the use of expert judgements in accident consequence uncertainty analysis. Radiat. Prot. Dosim. 90(3), 325–330 (2000)
41. Meyer, M.A., Booker, J.M.: Eliciting and Analyzing Expert Judgment: A Practical Guide. SIAM, Philadelphia (2001)
42. Ayyub, B.M.: Elicitation of Expert Opinions for Uncertainty and Risks. CRC Press, Boca Raton (2001)
43. Cooke, R.M., Goossens, L.H.J.: Expert judgement elicitation for risk assessment of critical infrastructures. J. Risk Res. 7(6), 643–656 (2004)
44. Garthwaite, P.H., Kadane, J.B., O'Hagan, A.: Statistical methods for eliciting probability distributions. J. Am. Stat. Assoc. 100(470), 680–700 (2005)
45. Helton, J.C., Martell, M.-A., Tierney, M.S.: Characterization of subjective uncertainty in the 1996 performance assessment for the Waste Isolation Pilot Plant. Reliab. Eng. Syst. Saf. 69(1–3), 191–204 (2000)
46. Baudrit, C., Dubois, D.: Practical representations of incomplete probabilistic knowledge. Comput. Stat. Data Anal. 51(1), 86–108 (2006)
47. Dubois, D., Prade, H.: Possibility theory and its applications: a retrospective and prospective view. In: Riccia, G.D., Dubois, D., Kruse, R., Lenz, H.-J. (eds.) Decision Theory and Multi-agent Planning. CISM International Centre for Mechanical Sciences, vol. 482, pp. 89–109 (2006)
48. Klir, G.J.: Uncertainty and Information: Foundations of Generalized Information Theory. Wiley-Interscience, New York (2006)
49. Ross, T.J.: Fuzzy Logic with Engineering Applications, 2nd edn. Wiley, New York (2004)
50. Helton, J.C., Johnson, J.D., Oberkampf, W.L.: An exploration of alternative approaches to the representation of uncertainty in model predictions. Reliab. Eng. Syst. Saf. 85(1–3), 39–71 (2004)
51. Ross, T.J., Booker, J.M., Parkinson, W.J. (eds.): Fuzzy Logic and Probability Applications: Bridging the Gap. Society for Industrial and Applied Mathematics, Philadelphia (2002)
52. Klir, G.J., Wierman, M.J.: Uncertainty-Based Information. Physica-Verlag, New York (1999)
53. Bardossy, G., Fodor, J.: Evaluation of Uncertainties and Risks in Geology. Springer, New York (2004)
54. Helton, J.C., Johnson, J.D., Oberkampf, W.L., Sallaberry, C.J.: Representation of analysis results involving aleatory and epistemic uncertainty. Int. J. Gen. Syst. 39(6), 605–646 (2010)
44 Conceptual Structure of Performance Assessments for Geologic… 1537
55. Helton, J.C., Johnson, J.D.: Quantification of margins and uncertainties: alternative represen-
tations of epistemic uncertainty. Reliab. Eng. Syst. Saf. 96(9), 1034–1052 (2011)
56. McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting
values of input variables in the analysis of output from a computer code. Technometrics 21(2),
239–245 (1979)
57. Helton, J.C., Davis, F.J.: Latin hypercube sampling and the propagation of uncertainty in
analyses of complex systems. Reliab. Eng. Syst. Saf. 81(1), 23–69 (2003)
58. Helton, J.C., Hansen, C.W., Sallaberry, C.J.: Expected dose for the nominal scenario class in
the 2008 performance assessment for the proposed high-level radioactive waste repository at
Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 267–271 (2014)
59. Helton, J.C., Davis, F.J.: Latin Hypercube Sampling and the Propagation of Uncertainty in
Analyses of Complex Systems. SAND2001-0417. Sandia National Laboratories, Albuquerque
(2002)
60. Iman, R.L.: Statistical methods for including uncertainties associated with the geologic
isolation of radioactive waste which allow for a comparison with licensing criteria. In: Kocher,
D.C. (ed.) Proceedings of the Symposium on Uncertainties Associated with the Regulation
of the Geologic Disposal of High-Level Radioactive Waste, Gatlinburg, 9–13 Mar 1981,
pp. 145–157. U.S. Nuclear Regulatory Commission, Directorate of Technical Information and
Document Control, Washington, DC (1982)
61. Helton, J.C., Hansen, C.W., Sallaberry, C.J.: Expected dose and associated uncertainty and
sensitivity analysis results for all scenario classes in the 2008 performance assessment for
the proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng.
Syst. Saf. 122, 421–435 (2014)
62. Hansen, C.W., Helton, J.C., Sallaberry, C.J.: Use of replicated Latin hypercube sampling to
estimate sampling variance in uncertainty and sensitivity analysis results for the geologic
disposal of radioactive waste. Reliab. Eng. Syst. Saf. 107, 139–148 (2012)
63. Helton, J.C., Sallaberry, C.J.: Computational implementation of sampling-based approaches
to the calculation of expected dose in performance assessments for the proposed high-level
radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng. Syst. Saf. 94(3),
699–721 (2009)
64. Sallaberry, C.J., Helton, J.C., Hora, S.C.: Extension of Latin hypercube samples with correlated
variables. Reliab. Eng. Syst. Saf. 93, 1047–1059 (2008)
65. Iman, R.L., Conover, W.J.: A distribution-free approach to inducing rank correlation among
input variables. Commun. Stat. Simul. Comput. B11(3), 311–334 (1982)
66. Iman, R.L., Davenport, J.M.: Rank correlation plots for use with correlated input variables.
Commun. Stat. Simul. Comput. B11(3), 335–360 (1982)
67. Helton, J.C., Anderson, D.R., Basabilvazo, G., Jow, H.-N., Marietta, M.G.: Summary discus-
sion of the 1996 performance assessment for the Waste Isolation Pilot Plant. Reliab. Eng. Syst.
Saf. 69(1–3), 437–451 (2000)
68. Farmer, F.R.: Reactor safety and siting: a proposed risk criterion. Nucl. Saf. 8(6), 539–548
(1967)
69. Cox, D.C., Baybutt, P.: Limit lines for risk. Nucl. Technol. 57(3), 320–330 (1982)
70. Munera, H.A., Yadigaroglu, G.: On Farmer's line, probability density functions, and overall
risk. Nucl. Technol. 74(2), 229–232 (1986)
71. Helton, J.C., Gross, M.B., Sallaberry, C.J.: Representation of aleatory uncertainty associated
with the seismic ground motion scenario class in the 2008 performance assessment for the
proposed high-level radioactive waste repository at Yucca Mountain, Nevada. Reliab. Eng.
Syst. Saf. 122, 399–405 (2014)
72. Hansen, C.W., Behie, G.A., Bier, A., Brooks, K.M., Chen, Y., Helton, J.C., Hommel, S.P.,
Lee, K.P., Lester, B., Mattie, P.D., Mehta, S., Miller, S.P., Sallaberry, C.J., Sevougian, S.D.,
Vo, P.: Uncertainty and sensitivity analysis for the seismic scenario classes in the 2008
performance assessment for the proposed high-level radioactive waste repository at Yucca
Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 406–420 (2014)
1538 J.C. Helton et al.
73. Helton, J.C., Johnson, J.D., Sallaberry, C.J., Storlie, C.B.: Survey of sampling-based methods
for uncertainty and sensitivity analysis. Reliab. Eng. Syst. Saf. 91(10–11), 1175–1209 (2006)
74. Helton, J.C.: Uncertainty and sensitivity analysis techniques for use in performance assessment
for radioactive waste disposal. Reliab. Eng. Syst. Saf. 42(2–3), 327–367 (1993)
75. Iman, R.L., Shortencarier, M.J., Johnson, J.D.: A FORTRAN 77 Program and User's Guide for
the Calculation of Partial Correlation and Standardized Regression Coefficients. NUREG/CR-
4122, SAND85-0044. Sandia National Laboratories, Albuquerque (1985)
76. Iman, R.L., Conover, W.J.: The use of the rank transform in regression. Technometrics 21(4),
499–509 (1979)
77. Cooke, R.M., van Noortwijk, J.M.: Graphical methods. In: Saltelli, A., Chan, K., Scott, E.M.
(eds.) Sensitivity Analysis, pp. 245–264. Wiley, New York (2000)
78. Kleijnen, J.P.C., Helton, J.C.: Statistical analyses of scatterplots to identify important factors in
large-scale simulations, 1: review and comparison of techniques. Reliab. Eng. Syst. Saf. 65(2),
147–185 (1999)
79. Storlie, C.B., Helton, J.C.: Multiple predictor smoothing methods for sensitivity analysis:
description of techniques. Reliab. Eng. Syst. Saf. 93(1), 28–54 (2008)
80. Storlie, C.B., Helton, J.C.: Multiple predictor smoothing methods for sensitivity analysis:
example results. Reliab. Eng. Syst. Saf. 93(1), 55–77 (2008)
81. Storlie, C.B., Swiler, L.P., Helton, J.C., Sallaberry, C.J.: Implementation and evaluation of
nonparametric regression procedures for sensitivity analysis of computationally demanding
models. Reliab. Eng. Syst. Saf. 94(11), 1735–1763 (2009)
82. Storlie, C.B., Reich, B.J., Helton, J.C., Swiler, L.P., Sallaberry, C.J.: Analysis of computation-
ally demanding models with continuous and categorical inputs. Reliab. Eng. Syst. Saf. 113(1),
30–41 (2013)
83. Hora, S.C., Helton, J.C.: A distribution-free test for the relationship between model input and
output when using Latin hypercube sampling. Reliab. Eng. Syst. Saf. 79(3), 333–339 (2003)
84. Hamby, D.M.: A review of techniques for parameter sensitivity analysis of environmental
models. Environ. Monit. Assess. 32(2), 135–154 (1994)
85. Frey, H.C., Patil, S.R.: Identification and review of sensitivity analysis methods. Risk Anal.
22(3), 553–578 (2002)
86. Ionescu-Bujor, M., Cacuci, D.G.: A comparative review of sensitivity and uncertainty analysis
of large-scale systems – I: deterministic methods. Nucl. Sci. Eng. 147(3), 189–203 (2004)
87. Cacuci, D.G., Ionescu-Bujor, M.: A comparative review of sensitivity and uncertainty analysis
of large-scale systems – II: statistical methods. Nucl. Sci. Eng. 147(3), 204–217 (2004)
88. Cacuci, D.G.: Sensitivity and Uncertainty Analysis, Volume 1: Theory. Chapman and
Hall/CRC Press, Boca Raton (2003)
89. Cacuci, D.G., Ionescu-Bujor, M., Navon, I.M.: Sensitivity and Uncertainty Analysis, Volume 2:
Application to Large-Scale Systems. Chapman and Hall/CRC Press, Boca Raton (2005)
90. Saltelli, A., Chan, K., Scott, E.M. (eds.): Sensitivity Analysis. Wiley, New York (2000)
91. Saltelli, A., Ratto, M., Tarantola, S., Campolongo, F.: Sensitivity analysis for chemical models.
Chem. Rev. 105(7), 2811–2828 (2005)
92. Helton, J.C., Bean, J.E., Economy, K., Garner, J.W., MacKinnon, R.J., Miller, J., Schreiber,
J.D., Vaughn, P.: Uncertainty and sensitivity analysis for two-phase flow in the vicinity of the
repository in the 1996 performance assessment for the Waste Isolation Pilot Plant: disturbed
conditions. Reliab. Eng. Syst. Saf. 69(1–3), 263–304 (2000)
93. Helton, J.C., Bean, J.E., Economy, K., Garner, J.W., MacKinnon, R.J., Miller, J., Schreiber,
J.D., Vaughn, P.: Uncertainty and sensitivity analysis for two-phase flow in the vicinity of the
repository in the 1996 performance assessment for the Waste Isolation Pilot Plant: undisturbed
conditions. Reliab. Eng. Syst. Saf. 69(1–3), 227–261 (2000)
94. Berglund, J.W., Garner, J.W., Helton, J.C., Johnson, J.D., Smith, L.N.: Direct releases to
the surface and associated complementary cumulative distribution functions in the 1996
performance assessment for the Waste Isolation Pilot Plant: cuttings, cavings and spallings.
Reliab. Eng. Syst. Saf. 69(1–3), 305–330 (2000)
95. Stoelzel, D.M., O'Brien, D.G., Garner, J.W., Helton, J.C., Johnson, J.D., Smith, L.N.: Direct
releases to the surface and associated complementary cumulative distribution functions in the
1996 performance assessment for the Waste Isolation Pilot Plant: direct brine release. Reliab.
Eng. Syst. Saf. 69(1–3), 343–367 (2000)
96. Stockman, C.T., Garner, J.W., Helton, J.C., Johnson, J.D., Shinta, A., Smith, L.N.: Radionu-
clide transport in the vicinity of the repository and associated complementary cumulative
distribution functions in the 1996 performance assessment for the Waste Isolation Pilot Plant.
Reliab. Eng. Syst. Saf. 69(1–3), 370–396 (2000)
97. Ramsey, J.L., Blaine, R., Garner, J.W., Helton, J.C., Johnson, J.D., Smith, L.N., Wallace, M.:
Radionuclide and colloid transport in the Culebra Dolomite and associated complementary
cumulative distribution functions in the 1996 performance assessment for the Waste Isolation
Pilot Plant. Reliab. Eng. Syst. Saf. 69(1–3), 397–420 (2000)
98. Hansen, C.W., Behie, G.A., Bier, A., Brooks, K.M., Chen, Y., Helton, J.C., Hommel, S.P.,
Lee, K.P., Lester, B., Mattie, P.D., Mehta, S., Miller, S.P., Sallaberry, C.J., Sevougian, S.D.,
Vo, P.: Uncertainty and sensitivity analysis for the early failure scenario classes in the 2008
performance assessment for the proposed high-level radioactive waste repository at Yucca
Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 310–338 (2014)
99. Sallaberry, C.J., Behie, G.A., Bier, A., Brooks, K.M., Chen, Y., Hansen, C.W., Helton, J.C.,
Hommel, S.P., Lee, K.P., Lester, B., Mattie, P.D., Mehta, S., Miller, S.P., Sevougian, S.D.,
Vo, P.: Uncertainty and sensitivity analysis for the igneous scenario classes in the 2008
performance assessment for the proposed high-level radioactive waste repository at Yucca
Mountain, Nevada. Reliab. Eng. Syst. Saf. 122, 354–379 (2014)
Redundancy of Structures and Fatigue of
Bridges and Ships Under Uncertainty 45
Dan M. Frangopol, Benjin Zhu, and Mohamed Soliman
Abstract
Bridges and ships are key components of civil and marine infrastructure sys-
tems, respectively. Due to the nature of their in-service role, failure of such
structures may lead to very high consequences. Preventing such failures requires
rigorous design and life-cycle management techniques. In order to maintain
satisfactory performance for these structures throughout their service life, various
uncertainties associated with the design and management processes should be
properly accounted for. Among the important design considerations for bridges
and ships that highly depend on the proper identification of these uncertainties
are the structural redundancy quantification and fatigue life assessment. In this
chapter, quantification of redundancy and its integration into the design process
of structural components are discussed. Additionally, the probabilistic fatigue
assessment problem and the sources of uncertainties associated with the fatigue
life prediction models are presented.
Keywords
Redundancy · Reliability · Uncertainty · Fatigue · Life-cycle
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1542
2 Quantitative Redundancy Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1543
2.1 Time-Variant Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1545
2.2 Redundancy Factor for Structural Component Design . . . . . . . . . . . . . . . . . . . . . . . 1547
3 Fatigue in Bridges and Ships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1552
3.1 Fatigue Assessment Based on Fracture Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . 1553
3.2 The S-N Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1557
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1562
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1562
1 Introduction
Structures are expected to maintain desired levels of serviceability and safety during
their service life. Due to the unavoidable uncertainties in structural design and
performance assessment, probabilistic performance indicators, such as reliability
and redundancy, were introduced. Redundancy, in general, indicates the ability
of a structural system to redistribute the load after the failure of one component
or subsystem. Adequate redundancy should be considered in structural design.
System redundancy has attracted considerable research interest in recent decades
[19, 21, 22, 26, 44].
Even though redundancy is a highly desired attribute for a structure, quantitative
guidance is not commonly available in structural design specifications. Load
and resistance factor design (LRFD) has replaced allowable stress design over the
last few decades as the understanding of structural performance and system reliability
theory has improved [2]. Since the LRFD approach treats the uncertainties associated
with resistances and load effects through separate factors, it provides a better means of
incorporating system reliability and redundancy concepts into structural design.
Several measures for quantifying structural redundancy are available in the
literature. Frangopol and Curley [19] defined the system redundancy as the ratio
of the reliability index of the intact system to the difference between the reliability
indices of the intact and the damaged systems. Rabi et al. [44] defined the redun-
dancy by comparing the probability of system collapse to the probability of first
member failure. Ghosn and Moses [22] proposed three redundancy measures that
are defined in terms of relative reliability indices. Since redundancy is regarded as an
important performance indicator, research on the evaluation of system redundancy
in real structures has been conducted. Tsopelas and Husain [52] studied the system
redundancy of reinforced concrete buildings. Wen and Song [53] investigated the
redundancy of special moment resisting frames and dual systems under seismic
excitations. Kim [27] evaluated the redundancy of a steel box-girder bridge using
a nonlinear finite element model.
Fatigue is another critical mechanism which can adversely affect the safety
of metallic structures. Fatigue is the process of crack initiation and propagation
resulting from repetitive loads on the structure [17]. These cracks, if not effectively
detected and repaired, may cause sudden failure of the damaged component. If
the structure does not have an adequate redundancy level, a catastrophic structural
failure may occur due to the failure of the cracked component. Although methodologies
for fatigue assessment and design are already established in the literature and
design specifications, fatigue cracks exist in various types of structures such as
bridges, offshore structures, and naval vessels. This can be attributed to the fact that
a large number of factors contribute to the fatigue crack initiation and propagation
including the structural configuration of the detail, presence of initial defects,
environmental conditions, and loading conditions, among others. Additionally,
several sources of uncertainty affect the fatigue crack initiation and propagation.
Such uncertainties must be well-understood and should be accounted for in order to
correctly assess the performance of a structure with respect to fatigue. Accordingly,
several researchers have addressed fatigue life estimation and performance assessment
on a probabilistic basis [28, 41, 54].
This chapter presents a brief review of several proposed redundancy measures.
A methodology to assess the life-cycle redundancy of a structure is described
in detail in the context of system reliability. Next, several quantifications of the
redundancy factor used for structural component design are discussed. The chapter
also discusses the probabilistic aspects of the fatigue life estimation problem.
2 Quantitative Redundancy Measures

Several deterministic measures of redundancy have been proposed. The reserve
strength factor R1 is defined as the ratio of the capacity of the intact structure, C,
to the nominal load on the structure, Q:

R1 = C / Q   (45.1)
The reserve strength factor varies from a value of infinity, when the intact structure
has no load, to a value of 1.0, when the nominal load on the intact structure equals
its capacity.
The residual strength factor provides a measure for the strength of the system
in a damaged condition compared to the intact system. It is defined as the ratio of
the capacity of the damaged structure or member, Cd , to the capacity of the intact
structure or member, C , and can be expressed as [20]
R2 = Cd / C   (45.2)
This factor takes values between 0, when the damaged structure has zero capacity,
and 1.0, when the damaged structure does not have any reduction in its load-carrying
capacity. Frangopol and Klisinski [20] also proposed the following deterministic
measure of redundancy:
R3 = 1 / (1 − R2)   (45.3)
The system reserve ratios of Ghosn and Moses [22] are expressed in terms of load
factors obtained from an incremental loading analysis, where LF1 is the load factor
causing first member failure:

Ru = LFu / LF1   (45.4)

Rf = LFf / LF1   (45.5)

Rd = LFd / LF1   (45.6)
where Ru is the system reserve ratio for the ultimate limit state, Rf is the system
reserve ratio for the functionality limit state, and Rd is the system reserve ratio for
the damage condition.
In order to account for the uncertainties in system and member capacities, as well
as the applied loads, Frangopol and Curley [19] proposed a probabilistic measure
for redundancy as
RI = [Pf(dmg) − Pf(sys)] / Pf(sys)   (45.7)

where Pf(dmg) is the probability of damage occurrence to the system and Pf(sys) is
the probability of system failure. They also defined the redundancy index

IR = β_intact / (β_intact − β_damaged)   (45.8)
where IR is the redundancy index, β_intact is the reliability index of the intact system,
and β_damaged is the reliability index of the damaged system.
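Collected as code, the measures above translate directly (a minimal Python sketch; the function names are ours, not from the chapter):

```python
def reserve_strength(C, Q):
    """R1 = C / Q, Eq. (45.1): intact capacity over nominal load."""
    return C / Q

def residual_strength(Cd, C):
    """R2 = Cd / C, Eq. (45.2): damaged capacity over intact capacity."""
    return Cd / C

def redundancy_R3(Cd, C):
    """R3 = 1 / (1 - R2), Eq. (45.3)."""
    return 1.0 / (1.0 - residual_strength(Cd, C))

def redundancy_RI(pf_dmg, pf_sys):
    """RI = (Pf(dmg) - Pf(sys)) / Pf(sys), Eq. (45.7)."""
    return (pf_dmg - pf_sys) / pf_sys

def redundancy_index(beta_intact, beta_damaged):
    """IR = beta_intact / (beta_intact - beta_damaged), Eq. (45.8)."""
    return beta_intact / (beta_intact - beta_damaged)

# Example: an intact system with beta = 4.0 that drops to beta = 3.0 when damaged
ir = redundancy_index(4.0, 3.0)
```

A small IR (intact and damaged indices far apart) signals that damage to one component erodes much of the system's safety margin.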
Ghosn and Moses [22] also proposed probabilistic measures of redundancy. They
examined the differences between the reliability indices of the system, expressed
in terms of the reliability index associated with the ultimate capacity of the intact
structure, β_ultimate; the reliability index associated with the loss of functionality,
β_functionality; the reliability index associated with the ultimate capacity of the damaged
structure, β_damaged; and the reliability index of the most critical member of the
structure, β_member. As measures of redundancy levels, they investigated the relative
reliability indices for the ultimate (Δβ_u), functionality (Δβ_f), and damaged (Δβ_d)
limit states, defined as

Δβ_u = β_ultimate − β_member   (45.9)

Δβ_f = β_functionality − β_member   (45.10)

Δβ_d = β_damaged − β_member   (45.11)

2.1 Time-Variant Redundancy
The majority of studies on system redundancy assume that both the loads and
resistances are time independent. In other words, the probability density functions
(PDF) of the random variables are kept constant during the lifetime of a structure due
to the complexity involved in dealing with time-dependent systems. Nevertheless,
accounting for the effects of time on structures may have a significant impact on
the redundancy measures and is necessary for accurate performance assessment of
structural systems.
Prediction of both the time-variant loads and resistances is required to include
the time factor in the performance prediction model. The resistances of structural
components and systems are subject to deterioration over time. Deterioration can
be caused by various mechanisms, such as corrosion and fatigue. Furthermore, the
applied loading may increase over time (e.g., traffic loads on bridges). The combined
effect of decreasing resistance and increasing load leads to a significant drop in
structural reliability and redundancy.
In this section, a procedure for evaluating the time-variant redundancy for
structural systems under uncertainty and deterioration, based on the redundancy
measure given in Eq. (45.8), is presented. As indicated in this equation, the relia-
bilities (or failure probabilities) associated with the intact and damaged structures
must be determined to evaluate redundancy. A common approach for assessing
the time-variant reliability is the point-in-time method in which the reliability
index is computed at discrete time points to establish a lifetime reliability profile.
Alternatively, the reliability and redundancy can be assessed in continuous time
periods using survivor functions [40]. In the point-in-time method, the limit state
functions are required to be updated at every time point according to the predictive
models for deteriorated resistance and increased traffic loads.
uncertainties. The variation of the live load over time is required in order to perform
time-dependent reliability analysis. Then, the limit state equation consisting of the
resistance and the load effects needs to be established. The instantaneous probability
of failure and/or reliability index associated with a damage case can be obtained by
first-order reliability method (FORM), second-order reliability method (SORM),
or Monte Carlo simulation. Once the reliability index or probability of failure is
obtained for a time point, the redundancy index can be calculated based on the
adopted formulation (e.g., Eq. (45.8)). This procedure should be repeated at different
time steps with time-variant values of resistance and load effects in order to obtain
the lifetime redundancy profile. In addition, redundancy associated with different
damage cases must be evaluated to identify the importance of components within
the system.
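The point-in-time procedure can be sketched in a few lines. In the Python sketch below, the 1 %/year resistance decay and 0.5 %/year load growth, and all distribution parameters, are invented for illustration, and crude Monte Carlo stands in for FORM/SORM:

```python
import random
from statistics import NormalDist

def pf_monte_carlo(mean_R, cov_R, mean_P, cov_P, n=100_000, seed=1):
    """Estimate the instantaneous Pf = P(R < P) for independent normal R, P."""
    rng = random.Random(seed)
    fails = sum(
        rng.gauss(mean_R, cov_R * mean_R) < rng.gauss(mean_P, cov_P * mean_P)
        for _ in range(n)
    )
    return fails / n

def beta_from_pf(pf):
    """Reliability index corresponding to a failure probability."""
    return -NormalDist().inv_cdf(pf)

# Lifetime redundancy profile at discrete time points (point-in-time method):
# resistance mean decays 1 % per year, load mean grows 0.5 % per year.
profile = []
for t in (0, 10, 20, 30):
    mP = 1.0 * 1.005 ** t
    b_int = beta_from_pf(pf_monte_carlo(1.6 * 0.99 ** t, 0.10, mP, 0.10))
    b_dmg = beta_from_pf(pf_monte_carlo(1.3 * 0.99 ** t, 0.10, mP, 0.10))
    profile.append((t, b_int / (b_int - b_dmg)))  # redundancy index, Eq. (45.8)
```

Repeating the inner computation for each damage case, as the text notes, identifies which component losses degrade the system most.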
The procedure for system reliability assessment with emphasis on bridges is
further illustrated herein. The system reliability of bridges can be assessed based on
appropriate assumptions associated with the interaction of individual components.
In general, a bridge system can be modeled as a series, parallel, or series-
parallel system. The reliability of a bridge structural system is often evaluated by
considering the system failure as a series-parallel combination of component
failures. After determining the random variables and their statistical parameters for
component reliability analysis, the limit states of components can be included in the
system model. A bridge component may have several failure modes. For example, a
girder can fail in bending or shear. The flexural or shear capacity for girders and the
slab can be calculated based on the strength limit state relations given in AASHTO
LRFD Bridge Design Specifications [1].
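A Monte Carlo sketch of such a series-parallel idealization follows (Python; the four-component layout and all distribution numbers are hypothetical, chosen only to illustrate the system logic):

```python
import random

def series_parallel_pf(n=100_000, seed=2):
    """Monte Carlo failure probability of a series-parallel idealization:
    the system fails if components 1 and 2 both fail, or if components 3
    and 4 both fail. Resistances and load effects are independent normals
    with illustrative means and standard deviations."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(n):
        # One failure indicator per component (bending/shear margins would
        # each contribute such an indicator in a real bridge model)
        failed = [rng.gauss(1.2, 0.12) < rng.gauss(1.0, 0.10) for _ in range(4)]
        if (failed[0] and failed[1]) or (failed[2] and failed[3]):
            fails += 1
    return fails / n

pf_sys = series_parallel_pf()
```

The same sampling loop accommodates any series-parallel combination by changing only the Boolean expression.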
Another approach for redundancy assessment of bridges makes use of the finite
element (FE) method. An appropriate statistical distribution for the desired output
of FE analysis (e.g., stress, displacement, bending moment) can be obtained by
repeating the analysis for a large number of samples of the random variables associ-
ated with the structure. However, for complex structures, it may be computationally
expensive to repeat the FE analysis several times. In such cases, the response surface
method (RSM) can be used to approximate the relation between the desired output
of FE analysis and random variables by performing analyses for a significantly
less number of samples [40]. In this method, the response surface function is
generated by fitting a mathematical model to the simulation outcomes in terms of
input variables of the simulation model. Support points for each design variable are
selected to estimate the parameters of the response surface function. This task can
be performed by using experimental design concepts.
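The RSM fitting step can be sketched as follows (Python; `expensive_model` is a stand-in for the FE analysis, and the bilinear basis and 3x3 factorial support points are illustrative choices, not prescribed by the chapter):

```python
def expensive_model(x1, x2):
    """Stand-in for an FE analysis output (e.g., a midspan stress)."""
    return 3.0 + 2.0 * x1 - 1.5 * x2 + 0.5 * x1 * x2

def fit_response_surface(points, ys):
    """Least-squares fit of y ~ b0 + b1*x1 + b2*x2 + b3*x1*x2 via the
    normal equations, solved by Gaussian elimination with pivoting."""
    feats = [[1.0, x1, x2, x1 * x2] for x1, x2 in points]
    k = 4
    A = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
    c = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(k)]
    for col in range(k):                      # forward elimination
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            m = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= m * A[col][j]
            c[r] -= m * c[col]
    b = [0.0] * k                             # back substitution
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Support points from a simple 3x3 factorial design over normalized inputs
pts = [(x1, x2) for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
coeffs = fit_response_surface(pts, [expensive_model(*p) for p in pts])
```

Once fitted, the cheap surrogate replaces the FE model inside the Monte Carlo loop of the previous section.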
2.2 Redundancy Factor for Structural Component Design

Structures are purposefully designed much stronger than is usually needed to
accommodate the intended loading. The main objective is to create a margin of safety
under extreme events with high consequences, unexpected loadings, and structural
deterioration. The extent by which the structural capacity of a system exceeds the
expected or actual loads is described as the factor of safety. In conventional allowable
stress design (ASD), a safety factor is applied to the material strength of a structural
component to obtain the allowable strength. Although ASD has been the
primary approach used for structural component design since the early 1900s, it is
not a rational and economical design method as the uncertainties involved in the
structural design process are not properly considered.
With the increased understanding of structural behavior and loading process,
the design philosophy evolved to the load factor design (LFD) around the 1970s.
Compared to the ASD, LFD recognizes that the live load was more variable
than dead load, and, therefore, uncertainties in load prediction were considered
by introducing the load factors. However, deterministic approaches were still
used when calibrating these factors. Due to the development and application of
probability-based reliability theory in structural engineering, a design philosophy
referred to as load and resistance factor design (LRFD) was introduced in the 1980s.
Compared with the previous design methods, LRFD is more rational because the
uncertainties associated with resistances and loads are quantitatively incorporated
in the design process [51].
Structural components in a system do not behave independently; instead, they
interact with other components under external loads to form a structural system.
However, the previous design methods ignore this system effect in the component
design. The aforementioned definitions of redundancy indicate that it is connected
to system behavior. Therefore, consideration of system redundancy is essential
for component design. Structures with sufficient redundancy can be rewarded by
allowing their components to have less conservative designs, while structures that
are nonredundant would be required to have higher component capacities. The
existence of uncertainties in resistances and loads requires quantifying the system
redundancy in a probabilistic manner. Since the LRFD approach was developed
based on the probability theory, it provides a basis to incorporate system reliability
and redundancy concepts in structural component design.
In the current LRFD design specifications, load and resistance factors are
developed using statistical knowledge of the variability of loads and structural
performance. In the AASHTO LRFD bridge design specifications, three system
factors relating to ductility, redundancy, and operation classification, respectively,
are applied on the load side of the strength limit state. The factor relating to
redundancy, denoted η_R, is included to account for the effect of system redundancy on
the component design. Its value is determined based on three redundancy levels [1],
as follows:

η_R = 1.05 for nonredundant members
η_R = 1.00 for a conventional level of redundancy   (45.12)
η_R = 0.95 for exceptional levels of redundancy
This classification of the redundancy levels is very general, and the evaluation
of the factor relating to redundancy is very subjective. In fact, this factor is
affected by several parameters, such as system type, number of components in
1. Member failure limit state: to check the safety of an individual component using
elastic analysis.
2. Ultimate limit state: to determine the ultimate capacity of the intact bridge
system.
3. Functionality limit state: defined by a maximum acceptable displacement of a
main component under live load.
4. Damaged condition limit state: to find the ultimate capacity of a damaged bridge
system.
same component when its reliability index is the same as that of the system. For
example, consider a single component with resistance R and load P, which are
treated as random variables. Given the distribution types of R and P, the mean value
of the load E(P), the coefficients of variation of the resistance and load, V(R) and
V(P), and the prescribed component reliability index β_c = 3.5, the mean value of
the component resistance E_c(R) can be determined (e.g., by using Monte Carlo
simulation). In two specific cases where both R and P follow a normal or a
lognormal distribution, E_c(R) can be directly calculated by solving Eqs. (45.13)
and (45.14), respectively:

β_c = [E_c(R) − E(P)] / √[(E_c(R) V(R))² + (E(P) V(P))²]   (45.13)

β_c = ln{[E_c(R) / E(P)] √[(1 + V²(P)) / (1 + V²(R))]} / √{ln[(1 + V²(R))(1 + V²(P))]}   (45.14)
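For other distribution pairs, Eq. (45.13)-style relations must be inverted numerically. A bisection sketch for the normal case (Python; V(R) = V(P) = 0.10 and E(P) = 10 follow the chapter's later examples, and the function names are ours):

```python
from math import sqrt

def beta_normal(mR, VR, mP, VP):
    """Eq. (45.13): reliability index for independent normal R and P."""
    return (mR - mP) / sqrt((mR * VR) ** 2 + (mP * VP) ** 2)

def mean_resistance_for_beta(beta_c, VR, mP, VP):
    """Bisect Eq. (45.13) for the mean resistance E_c(R) giving index beta_c.
    beta_normal is monotone increasing in mR (toward the asymptote 1/VR),
    so bisection on [mP, 100*mP] converges; requires beta_c < 1/VR."""
    lo, hi = mP, 100.0 * mP
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if beta_normal(mid, VR, mP, VP) < beta_c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Mean resistance required for beta_c = 3.5 with 10% CoVs and E(P) = 10
EcR = mean_resistance_for_beta(3.5, 0.10, 10.0, 0.10)
```

The same bisection idea works when the reliability index itself must come from simulation rather than a closed form, at the cost of more function evaluations.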
For a three-component parallel system whose components are the same as the
single component just described (e.g., geometry and material properties, distribution
parameters), the load acting on the system is assumed to be 3P, and the correlation
coefficient between the resistances of components i and j is assumed to be
ρ(Ri, Rj). According to the definition of the redundancy factor, the target system
reliability index should be the same as the component reliability index, which is 3.5
herein. Therefore, the mean value of the component resistance in the system, denoted
E_cs(R), can be calculated by using Monte Carlo simulation. Having obtained the
mean resistance of a component in a system when the system reliability index is
3.5, E_cs(R), and the mean resistance of the same component when the component
reliability index is 3.5, E_c(R), the redundancy factor, denoted η_R, which is
defined as the ratio of E_cs(R) to E_c(R), can be determined.
Since system reliability is affected by system modeling, number of components
in the system, correlations among the component resistances, and post-failure
material behavior of components, the effects of these parameters on the system
redundancy should be considered in the quantification of the redundancy factor.
Three brief examples are presented herein to illustrate these effects. The first
example is a four-component structure. Based on different definitions of system
failure, three types of systems can be modeled: (a) series model, the system
fails if any component fails; (b) parallel model, the system fails only if all
components fail; and (c) series-parallel model, the system fails if components
1 and 2 fail simultaneously or components 3 and 4 fail at the same time. The
resistance R and load effect P associated with each component are assumed to
be the same and to follow a normal distribution. The coefficients of variation of
R and P are both taken as 10%. The mean value of the load effect, denoted E(P), is
assumed to be 10. Three correlation cases among the resistances of the components are
investigated herein: (a) ρ(Ri, Rj) = 0, no correlation; (b) ρ(Ri, Rj) = 0.5, partial
correlation; and (c) ρ(Ri, Rj) = 1.0, perfect correlation. Therefore, the mean values
of resistance of a single component when its reliability index is 3.5 are obtained for the
three correlation cases using Eq. (45.13). Similarly, the mean values of component
resistances E_cs(R) when the system reliability index is also 3.5 can be calculated for
the three systems associated with the different correlation cases. Finally, the redundancy
factors η_R for the series, parallel, and series-parallel systems are obtained as
(a) 1.062, 0.785, and 0.878 when ρ(Ri, Rj) = 0; (b) 1.059, 0.859, and 0.933 when
ρ(Ri, Rj) = 0.5; and (c) 1.0, 1.0, and 1.0 when ρ(Ri, Rj) = 1.0. It is noticed
that the redundancy factor associated with the series system is the highest, while its
counterpart associated with the parallel system is the lowest. As the correlation among
the component resistances increases, the redundancy factor decreases in the series
system but increases in the parallel and series-parallel systems. The redundancy
factor in the perfect correlation case is independent of the system modeling type.
These observations indicate that the component in the series system requires
larger redundancy factor in the design to reach the targeted system reliability level.
However, in the parallel system, components can be designed more economically.
Higher correlation among the component resistances leads to higher redundancy
factor in the parallel and series-parallel systems, which implies that the component
needs to be designed stronger. An opposite conclusion is drawn for the series
system. In the perfect correlation case, the redundancy factor is always 1.0. This
is expected because a system whose components are identical and their failure
modes are perfectly correlated can be reduced to a single component. Therefore,
the redundancy factor does not change with the system type.
The second example studies the effect of the number of components on the
redundancy factor. An additional three-component structure is investigated, and
its results are compared with those from the four-component systems just
discussed. Two types of systems are formed for this three-component structure:
series and parallel. The components are considered identical, and the loads
acting on them are assumed to be similar. The statistical parameters associated with
resistances and loads (i.e., V(R), V(P), and E(P)) in the previous four-component
structure are used herein. Assuming the target reliability index for both a single
component and the systems is 3.5, the redundancy factors of the series and parallel
systems are found to be (a) 1.049 and 0.812 when ρ(Ri, Rj) = 0; (b) 1.047
and 0.879 when ρ(Ri, Rj) = 0.5; and (c) 1.0 and 1.0 when ρ(Ri, Rj) = 1.0.
By comparing these factors with the values in the four-component systems, it is
seen that as the number of components increases from three to four, the redundancy
factor associated with the series system increases, while its counterpart associated with
the parallel system decreases. However, this conclusion does not hold in the perfect
correlation case, where the redundancy factor is constant.
Every material used in practice exhibits its own post-failure behavior. The ability of
components to carry loads beyond the elastic limit affects the redistribution of loads
within a system and the availability of alternative load paths. Therefore, the post-
failure material behavior of components needs to be considered in the quantification
of the redundancy factor [58]. The last example in this section studies the impact
of post-failure behavior on the redundancy factor using the aforementioned three-
component parallel system. The load acting on the system is 3P. The resistances
and loads of the components follow normal distributions with the parameters mentioned
1552 D.M. Frangopol et al.
previously. Two types of post-failure behavior are examined: ductile and brittle.
A ductile component can still carry load after yielding; a brittle component, however,
completely loses its capacity when it fails. This difference results in different limit
state equations of the parallel system in the ductile and brittle cases, and, therefore,
the mean resistances in these two cases will be different. If the target reliability
index is predefined as 3.5, the redundancy factors associated with the ductile and
brittle cases are obtained as (a) 0.98 and 1.033 when ρ(Ri, Rj) = 0; (b)
0.989 and 1.026 when ρ(Ri, Rj) = 0.5; and (c) 1.0 and 1.0 when ρ(Ri, Rj) = 1.0.
It is noticed that the redundancy factor of the ductile system is much lower than that
of the brittle system in the no correlation and partial correlation cases. This means that
a component made of brittle material needs to be designed much stronger than one
made of ductile material.
It is seen from the three examples that the redundancy factor is significantly
affected by the system modeling, number of components, correlations among the
component resistances, and post-failure material behavior. This emphasizes the
necessity of quantifying redundancy factors for a variety of systems covering the
different parameters mentioned previously. To achieve this goal, nondeterministic idealized
systems are used to evaluate the redundancy factors for ductile and brittle systems
with up to 500 components, considering different system types, several correlation
cases, and two probability distribution types. As shown in Zhu et al. [59], the
obtained redundancy factors in ductile and brittle systems can often be larger than
the upper bound (i.e., 1.05) or lower than the lower bound (i.e., 0.95) defined in
the AASHTO LRFD Bridge Design Specifications.
for fatigue design and assessment due to its simplicity and sufficient agreement
with experimental test results. Fatigue assessment approaches based on fracture
mechanics can also be used to assess the crack conditions at a certain detail and
predict the time-variant crack size.
A brief discussion of the S–N approach and of linear elastic fracture mechanics
in fatigue assessment is presented next. Uncertainties associated with different
model parameters are also discussed. Additionally, widely implemented method-
ologies for quantifying and reducing these uncertainties are presented.
In this method, the stresses near the crack tip are related to the range of the stress
intensity factor ΔK. Paris' law [42] relates the crack growth rate to the range of the
stress intensity factor as follows:
da/dN = C · (ΔK)^m    (45.15)

where a is the crack size, N is the number of cycles, and ΔK is the range of the stress
intensity factor; C and m are material parameters. The range of the stress intensity
factor can be expressed as

ΔK = K(a) · Sre · √(πa)    (45.16)

where Sre is the stress range and K(a) is the generalized stress intensity factor, which
depends on the crack orientation and shape. This factor can be expressed as [16]

K(a) = Fe · Fs · Fw · Fg    (45.17)

in which Fe, Fs, Fw, and Fg are correction factors taking into account the effects of
the elliptical crack shape, free surface, finite width, and nonuniform stress acting on
the crack, respectively. Solutions for these correction factors can be found in Tada
et al. [50].
Using Eqs. (45.15) and (45.16), the number of cycles associated with a growth in
the crack size from an initial size a0 to a size at can be calculated as

N = (1 / (C · Sre^m)) ∫ from a0 to at of da / [K(a) · √(πa)]^m    (45.18)

Considering the average daily number of cycles Navg to be constant, the time
interval (in years) associated with a crack growth from a0 to at can be
calculated as

t = (1 / (365 · Navg · C · Sre^m)) ∫ from a0 to at of da / [K(a) · √(πa)]^m    (45.19)
m
The integration in Eq. (45.19) can be performed numerically to find the crack growth
profile for the analyzed detail. However, the cases where K(a) is not constant
may take considerable computational effort, especially if probabilistic performance
assessment is required.
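The numerical integration of Eqs. (45.18) and (45.19) can be sketched as follows. All inputs are assumed illustrative values, not values from the chapter: a constant geometry factor K(a) = 1.12, Paris constants C = 2.0e-13 and m = 3 (in units consistent with MPa and mm), a stress range of 30 MPa, crack growth from 0.5 mm to 25 mm, and 5,000 cycles per day.

```python
import math

# Assumed illustrative inputs (not the chapter's values):
C, M = 2.0e-13, 3.0     # Paris-law constants, units consistent with MPa and mm
S_RE = 30.0             # equivalent stress range [MPa]
A0, AT = 0.5, 25.0      # initial and final crack sizes [mm]
N_DAY = 5000.0          # assumed average daily number of cycles

def k_geom(a):
    """Generalized stress intensity factor K(a) of Eq. (45.17); held constant
    at 1.12 (an edge-crack value) purely for illustration."""
    return 1.12

def cycles_to_grow(a0, at, n=20_000):
    """Trapezoidal evaluation of the integral in Eq. (45.18)."""
    f = lambda a: 1.0 / (k_geom(a) * math.sqrt(math.pi * a)) ** M
    h = (at - a0) / n
    integral = h * (0.5 * (f(a0) + f(at)) + sum(f(a0 + i * h) for i in range(1, n)))
    return integral / (C * S_RE ** M)

N = cycles_to_grow(A0, AT)       # Eq. (45.18): cycles to grow from A0 to AT
t_years = N / (365.0 * N_DAY)    # Eq. (45.19): the same growth expressed in years
print(f"{N:.3e} cycles, {t_years:.1f} years")
```

Because K(a) is constant here the integral could also be done in closed form; the quadrature is what generalizes to the non-constant K(a) case mentioned above.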
For instance, Moan and Song [38] considered the mean value of the initial crack size
to be 0.11 mm. Several distribution types for this parameter have also been reported,
including the lognormal distribution [30, 49] and the exponential distribution [38].
Additionally, the coefficient of variation of the initial crack size has been reported to
range from 0.2 to 0.5 [8, 30]. Accordingly, a significant variability in the value of the
initial crack size can be seen. Similar variability in the descriptors of the random
variables C and m can also be found. Accordingly, attempts should be made to quantify
the uncertainties associated with these parameters in order to improve the accuracy
of the stochastic analysis.
To help quantify the uncertainties associated with such model parameters,
the Bayesian approach can be used to find a better representation of the model
parameter descriptors given field inspection information. In this approach, the
information from inspection actions is used to represent the likelihood function,
which is combined with the prior knowledge of the model parameters to
yield an updated posterior distribution. The performance prediction is then
performed using the posterior parameters to achieve more reliable results.
Figure 45.3 schematically shows this process. This approach was
investigated by Heredia-Zavoni and Montes-Iturrizaga [24], in which the probability
of detection was used in an updating procedure to predict the posterior distribution
of the initial crack size at a certain point in time, and the measurement was used as
the new data to update the PDF of the initial crack size. However, inspection
optimization and scheduling were not considered as a goal of their study. Perrin
et al. [43] used Bayesian techniques
and Markov Chain Monte Carlo (MCMC) for fatigue crack growth analysis based
on data collected during experimental investigations. Their results showed the
feasibility of updating the model parameters based on crack size measurements.
Li et al. [34] used Bayesian updating to study the effect of sensor degradation
on the estimation of the remaining useful life of structures. Soliman and Frangopol
[47] proposed an approach to establish integrated management plans that provide
optimal intervention times and types while making use of the available inspection
and monitoring information to improve the performance prediction process so that
better and more effective decisions can be made. In this process, information
from field measurements is used to represent the likelihood function, which is
combined with the prior information on the model parameters to find their posterior
distributions. The posterior distribution can be found as

P(θ|d) = P(θ) · P(d|θ) / ∫ P(θ) · P(d|θ) dθ    (45.21)

where P(θ|d) is the posterior distribution of the model parameters given the addi-
tional information d, P(θ) represents the prior distribution of the model parameters,
P(d|θ) is the likelihood function of obtaining the information d conditioned on θ, and
d and θ are the vectors of observed data and model parameters, respectively.
By knowing the prior distributions and the likelihood function, the posterior
distributions can be established using sampling approaches based on Markov chain
Monte Carlo simulation, such as the Metropolis algorithm [36] or the slice sampling
algorithm [39]. The considered likelihood function is [43]

P(d|θ) = ∏ from i = 1 to n of (1 / (√(2π) · σe)) · exp[ −(1/2) · ((di − ap,i) / σe)² ]    (45.22)

where di and ap,i are the observed and predicted data, respectively, at the ith
inspection; εe represents a single error term combining the measurement and
modeling errors, which is assumed to follow a normal distribution with zero mean
and standard deviation σe (i.e., N(0, σe)). The vector of observed data, consisting
of crack size measurements identified during fatigue inspections, is

d = {ains,1, ains,2, …, ains,n}    (45.23)

where ains,i is the measured crack size at the ith inspection and n is the number of
inspections. Based on the sample space of inspection outcomes, subsequent inspection
times can be found. In Soliman and Frangopol [47], this updating process was
integrated in a comprehensive life-cycle management approach to find updated
inspection times based on field measurements. An optimization process, executed
iteratively for the possible crack size measurements, was constructed and solved to
find the optimum inspection time that minimizes the total cost, including the
inspection cost as well as the failure cost. The outcome of this process is shown
schematically in Fig. 45.4.
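The updating loop of Eqs. (45.21) and (45.22) can be sketched with a basic Metropolis sampler [36]. Everything below is an assumed toy setup, not the chapter's example: a single uncertain parameter (the initial crack size a0), a closed-form Paris growth model with m = 3 and a constant geometry factor, synthetic crack size "measurements", and an exponential prior on a0.

```python
import math
import random

random.seed(2)

# Toy growth model: Paris law with m = 3 and constant geometry factor 1.12
# integrates in closed form to a(N) = (a0**-0.5 - 0.5*B*N)**-2 (sizes in mm).
B = 2.0e-13 * 1.12 ** 3 * math.pi ** 1.5 * 30.0 ** 3  # lumped growth constant
N_INSP = [1.0e7, 2.0e7, 3.0e7]   # cycle counts at the three inspections
D_OBS = [0.72, 0.98, 1.70]       # synthetic measured crack sizes [mm]
SIGMA_E = 0.15                   # assumed combined measurement/model error [mm]
PRIOR_MEAN = 0.5                 # assumed exponential-prior mean for a0 [mm]

def predict(a0, n_cycles):
    root = a0 ** -0.5 - 0.5 * B * n_cycles
    return root ** -2 if root > 0.0 else math.inf

def log_post(a0):
    """log prior + log likelihood, up to a constant (Eqs. 45.21 and 45.22)."""
    if a0 <= 0.0:
        return -math.inf
    lp = -a0 / PRIOR_MEAN  # exponential prior on the initial crack size
    for n_cycles, d in zip(N_INSP, D_OBS):
        ap = predict(a0, n_cycles)
        if not math.isfinite(ap):
            return -math.inf
        lp -= 0.5 * ((d - ap) / SIGMA_E) ** 2
    return lp

# Metropolis random walk [36]
a0, lp = 0.3, log_post(0.3)
samples = []
for step in range(20_000):
    cand = a0 + random.gauss(0.0, 0.05)
    lp_cand = log_post(cand)
    if random.random() < math.exp(min(0.0, lp_cand - lp)):
        a0, lp = cand, lp_cand
    if step >= 5_000:  # discard burn-in
        samples.append(a0)

post_mean = sum(samples) / len(samples)
print(f"posterior mean of a0: {post_mean:.2f} mm")
```

The posterior mean moves from the prior toward the value implied by the measurements; in a real application the same loop would run over the full parameter vector θ (a0, C, m, ...) rather than a single scalar.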
Although fatigue assessment models based on fracture mechanics are well
established, the standardization and formalization of these approaches, especially
with respect to stochastic analysis, is still required.
of the stress range and the logarithm of the expected number of cycles to failure for
several types of details. In order to construct the S–N line for a certain detail, several
specimens are tested in the laboratory under different cyclic stress amplitudes until
a crack with a predefined size propagates through the detail. The number of cycles to
failure is plotted against the applied stress range of each specimen, and regression
analysis is performed to plot the mean S–N line in the logarithmic scale. In order
to achieve a satisfactory probability of survival for designed structures, a design line
is usually defined in the codes through shifting the mean line horizontally to the left
by a certain amount.
The S–N relationship within each portion of the S–N lines can be expressed as

N = A · S^(−m)    (45.24)

where N is the number of cycles to failure under the stress range S, A is the fatigue
detail coefficient for each category, and m is the slope parameter of the S–N line.
In most structural components in bridges and ships, fatigue-prone details
are normally subjected to variable amplitude stress-range cycles. Accordingly, an
equivalent constant amplitude stress range is computed to calculate the fatigue
damage under this loading. Miner's rule [37] is the most widely employed method to
quantify the fatigue damage accumulation due to variable amplitude loading. If the
histogram of stress ranges is identified for the detail, and assuming linear damage
accumulation, Miner's damage accumulation index is

D = Σ from i = 1 to nss of ni / Ni    (45.25)
Sre = [ Σ from i = 1 to nss of (ni / NT) · Si^m ]^(1/m)    (45.26)

where NT = Σ from i = 1 to nss of ni. By knowing the average annual number of
cycles Navg, an estimate of the fatigue life can be computed as

t (years) = NA / Navg    (45.27)

where NA = A · Sre^(−m) is the number of cycles to failure at the equivalent
stress range, from Eq. (45.24).
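Miner's rule and the equivalent stress range reduce to a few lines of arithmetic. The stress-range histogram and the S–N constants below are assumed illustrative values, not values from the chapter:

```python
# Assumed annual stress-range histogram and S-N constants (illustrative only):
S = [20.0, 35.0, 50.0]       # bin stress ranges [MPa]
n = [4.0e5, 1.0e5, 2.0e4]    # annual number of cycles in each bin
A, m = 1.0e12, 3.0           # S-N detail constant and slope (Eq. 45.24)

N_T = sum(n)                                                        # total annual cycles
S_re = sum(ni / N_T * Si ** m for ni, Si in zip(n, S)) ** (1 / m)   # Eq. (45.26)
N_A = A * S_re ** -m          # cycles to failure at S_re, from Eq. (45.24)
t = N_A / N_T                 # Eq. (45.27), with N_avg = N_T cycles per year

# Cross-check via Miner's index, Eq. (45.25): damage accumulated per year
D_year = sum(ni / (A * Si ** -m) for ni, Si in zip(n, S))
print(f"S_re = {S_re:.1f} MPa, life = {t:.0f} years")
```

Note that 1/D_year equals t exactly: dividing each bin's cycle count by its cycles-to-failure and summing is algebraically the same as collapsing the histogram into the single equivalent range Sre first.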
Several stress analysis methods can be used for the fatigue assessment of steel
and aluminum details, namely, the nominal stress, structural hot spot stress, and
notch stress. The choice of the stress type and its corresponding S–N relationships
depends on the type of detail, the available measurement data, and the specifications
used. The nominal stress approach is adopted by several design and assessment
guides, such as Eurocode 9 [15] and AASHTO [1]. In this approach, the stress
range refers to the stresses acting on the considered location, neglecting the stress
concentration arising from both the weld effects and the structural configuration.
Accordingly, the S–N lines inherently consider these effects. This approach is
characterized by its ease of application, since nominal stresses can be computed
simply for most structural engineering problems. However, assessing the
fatigue life of a specific detail using this approach requires the existence of a similar
match in the design or assessment guidelines, which in turn requires experimental testing
of a large number of details. For steel bridges, the S–N lines in the design and
assessment guidelines cover most of the details adopted in practice. However, for
steel and aluminum ships, several of the details adopted in practice may not be
found in the design guidelines, which increases the difficulty of applying the nominal
stress approach and calls for the application of either the hot spot or the notch stress
approach.
In the structural hot spot stress approach, the stress induced in the proximity of
the weld is used. This stress includes the stress concentration due to the structural
configuration but not that due to the weld itself. Accordingly, this stress is compared
to S–N lines which similarly incorporate the effect of the weld stress concentration.
Computing the hot spot stress therefore requires more advanced structural
analysis than the nominal stress. However, using this approach
requires a lower number of S–N lines to be developed, which reduces the cost
of experimental investigations. To compute the stress while excluding the
concentration due to the weld, a single reference point at a prescribed distance from
the weld toe can be used. Alternatively, the hot spot stress can be extrapolated from
measurements performed at multiple reference points.
In the notch stress approach, the total stress acting at the crack initiation location
including the stress concentration due to both the structural configuration and the
weld geometry is considered. The notch stress is usually more difficult to obtain;
however, it can be used to find the fatigue life of the structural detail using the S–N
curve for a base non-welded metal.
For design purposes, it is generally preferred to use the nominal stress approach
which requires an exact match of the detail to be found in the design specifications.
When dealing with SHM data, it may not be practical to find the stress concentration
at the weld toe using strain measurement, due to the high stress gradient at this
location. Thus, depending on the available data, the nominal stress approach can be
used if a similar detail can be found in design guides. Otherwise, the structural hot
spot stress approach can be used.
Although fatigue life estimation using the S–N approach is straightforward,
the uncertainties associated with the fatigue damage process require more
robust probabilistic analysis to evaluate the fatigue life of structural components.
Accordingly, reliability-based methodologies for fatigue life assessment have been
proposed. These methodologies use the time-variant reliability index as the perfor-
mance indicator. Through the definition of minimum acceptable reliability levels,
the fatigue life is considered to be reached when the reliability reaches this minimum
threshold.
The reliability index β is related to the probability of failure Pf (i.e., the
probability of violating a certain limit state) through the following relationship:

β = Φ⁻¹(1 − Pf)    (45.28)

in which Φ⁻¹(·) is the inverse of the standard normal cumulative distribution func-
tion. For estimating the reliability-based remaining fatigue life, the following
performance function can be used [3, 25, 31, 48]:

g(t) = Δ − D(t)    (45.29)
where Δ is Miner's critical damage accumulation index and D(t) is the accumulated
fatigue damage at time t. Most of the studies recommend the mean value of Δ to be
0.9 [3] or 1.0 [9, 31, 48], with a coefficient of variation of 0.48 [3, 9] or 0.3 [54].
Based on Eq. (45.24), Miner's damage accumulation index can be computed as

D(t) = (N(t) / A) · Sre^m = (t · Navg / A) · Sre^m    (45.30)

where A and m are the S–N relationship parameters, Sre is the equivalent constant
amplitude stress range, t is the operational time of the structure, and Navg is the
average annual number of cycles. In this model, the S–N parameter A can also be
treated as a random variable following a lognormal distribution whose parameters
can be established based on regression analysis of test data.
Based on Eqs. (45.29) and (45.30), and assuming that all the random variables
(i.e., Sre, A, and Δ) follow the lognormal distribution [3, 31], the fatigue reliability
index can be derived as follows:

β(t) = [λΔ + λA − m·λSre − ln N(t)] / √(ζΔ² + ζA² + (m·ζSre)²)    (45.31)

where λ and ζ are the lognormal parameters associated with the different random
variables. If the target reliability threshold βtarget is known, the fatigue life tf can
be determined as follows [48]:

tf = exp(k − m·λSre) / Navg    (45.32)

where

k = λΔ + λA − βtarget · ζ    (45.33)

and

ζ² = ζΔ² + ζA² + (m·ζSre)²    (45.34)
Eq. (45.32) represents a direct way to estimate the reliability-based fatigue life
for a specific detail once the associated stress-range distribution is known. The
performance function in Eq. (45.29) can be used to calculate the reliability index
using computer software such as RELSYS [13]. Alternatively, Eq. (45.31) can be
used directly to calculate the reliability index under the assumption that all the
random variables have lognormal PDFs. After establishing the time-variant fatigue
reliability profile, the service life is considered to be reached when the reliability
level drops to the minimum allowable reliability level. With respect to fatigue, this
reliability threshold is considered to range from two to four [35].
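The closed-form lognormal format of Eqs. (45.31), (45.32), (45.33), and (45.34) can be sketched as below. The means and coefficients of variation are assumed illustrative values, converted to lognormal parameters λ and ζ by the standard moment relations.

```python
import math

def lognorm_params(mean, cov):
    """Lognormal parameters (lambda, zeta) from a mean and coefficient of variation."""
    zeta = math.sqrt(math.log(1.0 + cov ** 2))
    return math.log(mean) - 0.5 * zeta ** 2, zeta

# Assumed illustrative descriptors (mean, COV) of the three lognormal variables:
LAM_D, ZET_D = lognorm_params(1.0, 0.30)      # Miner critical damage index Delta
LAM_A, ZET_A = lognorm_params(1.0e12, 0.45)   # S-N detail constant A
LAM_S, ZET_S = lognorm_params(27.0, 0.10)     # equivalent stress range S_re
M, N_AVG = 3.0, 5.0e5                         # S-N slope and annual cycle count

ZETA = math.sqrt(ZET_D ** 2 + ZET_A ** 2 + (M * ZET_S) ** 2)   # Eq. (45.34)

def beta(t):
    """Fatigue reliability index of Eq. (45.31), with N(t) = t * N_AVG."""
    return (LAM_D + LAM_A - M * LAM_S - math.log(t * N_AVG)) / ZETA

def fatigue_life(beta_target):
    """Closed-form reliability-based fatigue life, Eqs. (45.32)-(45.33)."""
    k = LAM_D + LAM_A - beta_target * ZETA
    return math.exp(k - M * LAM_S) / N_AVG

tf = fatigue_life(2.0)
print(f"beta(10 yr) = {beta(10.0):.2f}; life at beta_target = 2: {tf:.1f} years")
```

By construction, substituting the life from Eq. (45.32) back into Eq. (45.31) returns exactly the target reliability index, which makes the pair a convenient consistency check in an implementation.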
In newly constructed bridges and naval vessels, fatigue-prone components are
designed to have either an infinite fatigue life or a fatigue life that is long enough to
avoid failure during the service life. For bridges, fatigue cracks should be repaired
as soon as they are discovered in order to prevent the occurrence of sudden failures.
In some structures, such as cable-stayed bridges where it is difficult to inspect all
the details frequently, monitoring devices can be installed to provide a warning when
alarming crack growth is detected. However, for large-scale structures which rely to
a great extent on welded connections, such as ships, thousands of crack initiation
locations exist and fatigue cracking is inevitable. In such cases, fatigue repairs and
inspections can significantly impact the life-cycle operational cost of a vessel, and
the structure may be operating with undetected growing cracks. Methodologies to
identify the effect of these growing cracks on the structural performance, reliability,
and redundancy levels are still required.
4 Conclusions
The ultimate goal of the design process is minimizing the life-cycle cost while
maintaining safety at desired levels [18]. Structural systems are subjected to damage
and deterioration in time which may lead to sudden failures. Therefore, in addition
to basic safety measures such as the reliability index, quantifying the structural
redundancy is beneficial. This chapter briefly discussed the redundancy of structural
systems in a life-cycle context. Several measures for system redundancy were
reviewed. Then, the time-dependent aspects of redundancy were described, and a
methodology for assessing time-dependent redundancy was presented. Finally, several
quantifications of the redundancy factor used for structural component design were
discussed. The chapter also discussed the probabilistic aspects of the fatigue life
estimation problem.
In a probabilistic context, system reliability methods should be incorporated
to examine structural system behavior. There are several factors which might
have significant impact on the redundancy of structural systems, such as material
behavior, system modeling type, and correlation effects, among others. These
factors need to be included not only in the redundancy assessment of existing
structures but also in the design of new structures.
Much research is still necessary to solve uncertainty quantification problems
related to structural redundancy and fatigue in real-life applications. Aleatoric
and epistemic uncertainties have to be considered and both forward uncertainty
propagation and inverse uncertainty quantification have to be used in order to design
robust structures and assess and predict the safety of existing structures.
References
1. American Association of State Highway and Transportation Officials (AASHTO): AASHTO
LRFD Bridge Design Specifications, 7th edn. American Association of State Highway and
Transportation Officials, Washington, DC (2014)
2. Ang, A.H-S., Tang, W.H.: Probability Concepts in Engineering Planning and Design, vol. 2.
Wiley, New York (1984)
3. Ayyub, B.M., Assakkaf, I.A., Kihl, D.P., Siev, M.W.: Reliability-based design guidelines for
fatigue of ship structures. Naval Eng. J. 114(2), 113–138 (2002)
4. Barsom, J.M., Rolfe, S.T.: Fracture and Fatigue Control in Structures: Applications of Fracture
Mechanics. ASTM, West Conshohocken (1999)
5. British Standards Institute (BSI): Steel, Concrete, and Composite Bridges: Code of Practice
for Fatigue. 5400-Part 10. British Standards Institute, London (1980)
6. British Standards Institute (BSI): Guide to Methods for Assessing the Acceptability of Flaws
in Metallic Structures. BS7910, British Standards Institute, London (2005)
7. Chan, T.H.T., Li, Z.X., Ko, J.M.: Fatigue analysis and life prediction of bridges with structural
health monitoring data – Part II: application. Int. J. Fatigue 23(1), 55–64 (2001)
8. Chung, H., Manuel, L., Frank, K.: Optimal inspection scheduling of steel bridges using
nondestructive testing techniques. J. Br. Eng. 11(3), 305–319 (2006)
9. Collette, M., Incecik, A.: An approach for reliability-based fatigue design of welded joints in
aluminum high-speed vessels. J. Ship Res. 50(3), 85–98 (2006)
10. Collette, M.: Strength and Reliability of Aluminum Stiffened Panels, pp. 139–198. A
Thesis submitted for the Degree of Doctor of Philosophy, School of Marine Science and
Technology, Faculty of Science, Agriculture and Engineering, University of Newcastle
(2005)
11. Connor, R.J., Fisher, J.W.: Identifying effective and ineffective retrofits for distortion fatigue
cracking in steel bridges using field instrumentation. J. Br. Eng. 11(6), 745–752 (2006)
12. Downing, S.D., Socie, D.F.: Simple rainflow counting algorithms. Int. J. Fatigue 4(1), 31–40
(1982)
13. Estes, A.C., Frangopol, D.M.: RELSYS: a computer program for structural system reliability
analysis. Struct. Eng. Mech. 6(8), 901–919 (1998)
14. Eurocode 3: Design of Steel Structures – Part 1-9: Fatigue Strength. CEN European Committee
for Standardisation, Brussels (2010)
15. Eurocode 9: Design of Aluminium Structures – Part 1-3: Additional Rules for Structures
Susceptible to Fatigue. CEN European Committee for Standardisation, Brussels (2009)
16. Fisher, J.W.: Fatigue and Fracture in Steel Bridges, Case Studies. Wiley, New York (1984)
17. Fisher, J.W., Kulak, G.L., Smith, I.F.: A fatigue primer for structural engineers, National Steel
Bridge Alliance, Chicago (1998)
18. Frangopol, D.M.: Life-cycle performance, management, and optimization of structural systems
under uncertainty: accomplishments and challenges. Struct. Infrastruct. Eng. 7(6), 389–413
(2011)
19. Frangopol, D.M., Curley, J.P.: Effects of damage and redundancy on structural reliability.
J. Struct. Eng. 113(7), 1533–1549 (1987)
20. Frangopol, D.M., Klisinski, M.: Material behavior and optimum design of structural systems.
J. Struct. Eng. 115(5), 1054–1075 (1989)
21. Frangopol, D.M., Nakib, R.: Redundancy in highway bridges. Eng. J. 28(1), 45–50 (1991).
American Institute of Steel Construction, Chicago
22. Ghosn, M., Moses, F.: Redundancy in Highway Bridge Superstructures. NCHRP Report 406.
Transportation Research Board, Washington, DC (1998)
23. Guedes Soares, C., Garbatov, Y.: Fatigue reliability of the ship hull girder. Mar. Struct. 9(3-4),
495–516 (1996)
24. Heredia-Zavoni, E., Montes-Iturrizaga, R.: A Bayesian model for the probability distribution
of fatigue damage in tubular joints. J. Offshore Mech. Arctic Eng. 126(3), 243–249 (2004)
25. Jensen, J.J.: In: Bhattacharyya, R., McCormick, M.E. (eds.) Load and Global Response of
Ships. Ocean Engineering Series, vol. 4. Elsevier, Oxford, UK (2001)
26. Hendawi, S., Frangopol, D.M.: System reliability and redundancy in structural design and
evaluation. Struct. Saf. 16(1-2), 47–71 (1994)
27. Kim, J.: Finite Element Modeling of Twin Steel Box-Girder Bridges for Redundancy Evalua-
tion. Dissertation, The University of Texas at Austin, Austin (2010)
28. Kim, S., Frangopol, D.M.: Optimum inspection planning for minimizing fatigue damage
detection delay of ship hull structures. Int. J. Fatigue 33(3), 448–459 (2011)
29. Kim, S., Frangopol, D.M.: Probabilistic bicriterion optimum inspection/monitoring planning:
application to naval ships and bridges under fatigue. Struct. Infrastruct. Eng. 8(10), 912–927
(2012)
30. Kim, S., Frangopol, D.M., Soliman, M.: Generalized probabilistic framework for optimum
inspection and maintenance planning. J. Struct. Eng. 139(3), 435–447 (2013)
31. Kwon, K., Frangopol, D.M., Soliman, M.: Probabilistic fatigue life estimation of steel bridges
by using a bilinear S-N approach. J. Bridge Eng. 17(1), 58–70 (2012)
32. Leemis, L.M.: Reliability, Probabilistic Models and Statistical Methods. Prentice Hall, Engle-
wood Cliffs (1995)
33. Li, Z.X., Chan, T.H.T., Ko, J.M.: Fatigue analysis and life prediction of bridges with structural
health monitoring data – Part I: methodology and strategy. Int. J. Fatigue 23(1), 45–53 (2001)
34. Li, Z., Zhang, Y., Wang, C.: A sensor-driven structural health prognosis procedure considering
sensor performance degradation. Struct. Infrastruct. Eng. 9(8), 764–776 (2013)
35. Mansour, A.E., Wirsching, P.H., White, G.J., Ayyub, B.M.: Probability-Based Ship Design:
Implementation of Design Guidelines. SSC 392. Ship Structures Committee, Washington
(1996)
36. Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.: Equation of state
calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)
37. Miner, M.A.: Cumulative damage in fatigue. J. Appl. Mech. 12(3), 159–164 (1945)
38. Moan, T., Song, R.: Implications of inspection updating on system fatigue reliability of offshore
structures. J. Offshore Mech. Arctic Eng. 122(3), 173–180 (2000)
39. Neal, R.M.: Slice sampling. Ann. Stat. 31(3), 705–767 (2003)
40. Okasha, N.M., Frangopol, D.M.: Advanced modeling for efficient computation of life-cycle
performance prediction and service-life estimation of bridges. J. Comput. Civil Eng. 24(6),
548–556 (2010)
41. Paik, J., Wang, G.: Time-dependent risk assessment of ageing ships accounting for general/pit
corrosion, fatigue cracking and local dent damage. World Maritime Technology Conference,
San Francisco (2003)
42. Paris, P.C., Erdogan, F.A.: Critical analysis of crack propagation laws. J. Basic Eng. 85(Series
D), 528–534 (1963)
43. Perrin, F., Sudret, B., Pendola, M.: Bayesian updating of mechanical models – application in
fracture mechanics. In: 18ème Congrès Français de Mécanique, Grenoble, 27–31 (2007)
44. Rabi, S., Karamchandani, A., Cornell, C.A.: Study of redundancy of near-ideal parallel
structural systems. In: Proceedings of the 5th International Conference on Structural Safety
and Reliability, pp. 975–982. ASCE, New York (1989)
45. Sharp, M.L., Nordmark, G.E., Menzemer, C.C.: Fatigue Design of Aluminum Components and
Structures. McGraw-Hill, New York (1996)
46. Shigley, J., Mischke, C.: Mechanical Engineering Design, 5th edn. McGraw-Hill, New York
(1989)
47. Soliman, M., Frangopol, D.M.: Life-cycle management of fatigue sensitive structures integrat-
ing inspection information. J. Infrastruct. Syst. 20(2), 04014001 (2014)
48. Soliman, M., Barone, G., Frangopol, D.M.: Fatigue reliability and service life prediction
of aluminum high-speed naval vessels based on structural health monitoring. Struct. Health
Monit. 14(1), 3–19 (2015)
49. Soliman, M., Frangopol, D.M., Kim, S.: Probabilistic optimum inspection planning of steel
bridges based on multiple fatigue sensitive details. Eng. Struct. 49, 996–1006 (2013)
50. Tada, H., Paris, P.C., Irwin, G.R.: The Stress Analysis of Cracks Handbook, 3rd edn. The
American Society of Mechanical Engineers, New York (2000)
51. Tobias, D.H.: Perspectives on AASHTO load and resistance factor design. J. Br. Eng. 16(6),
684–692 (2011)
52. Tsopelas, P., Husain, M.: Measures of structural redundancy in reinforced concrete buildings.
II: redundancy response modification factor RR. J. Struct. Eng. 130(11), 1659–1666 (2004)
53. Wen, Y.K., Song, S.-H.: Structural reliability/redundancy under earthquakes. J. Struct. Eng.
129, 56–67 (2003)
54. Wirsching, P.H.: Fatigue reliability for offshore structures. J. Struct. Eng. 110(10), 2340–2356
(1984)
55. Yazdani, N., Albrecht, P.: Risk analysis of fatigue failure of highway steel bridges. J. Struct.
Eng. 113(3), 483–500 (1987)
56. Zhu, B., Frangopol, D.M.: Effects of post-failure material behavior on redundancy factor for
design of structural components in nondeterministic systems. Struct. Infrastruct. Eng. 11(4),
466–485 (2014)
57. Zhu, B., Frangopol, D.M.: Redundancy-based design of nondeterministic systems, chapter 23.
In: Frangopol, D.M., Tsompanakis, Y. (eds.) Maintenance and Safety of Aging Infrastructure.
Structures and Infrastructures, vol. 10, pp. 707–738. CRC Press/Balkema, Taylor & Francis
Group, London (2014)
58. Zhu, B., Frangopol, D.M.: Effects of post-failure material behavior on the reliability of systems.
ASCE-ASME J. Risk Uncertain. Eng. Syst. Part A: Civil Eng. 1(1), 04014002, 1–13 (2015)
59. Zhu, B., Frangopol, D.M., Kozy, B.M.: System reliability and the redundancy factor by
simplified modeling. In: Furuta, H., Frangopol, D.M., Akiyama, M. (eds.) Assessment,
Maintenance and Management, p. 148. CRC Press/Balkema, Taylor & Francis Group, London
(2015); full paper on DVD, pp. 614–618, Taylor & Francis Group, London (2015)
46 Uncertainty Approaches in Ship Structural Performance
Matthew Collette
Abstract
Uncertainty quantification in ship structural performance remains uneven. While
the ship structure community has long developed probabilistic models for the
ocean environment and structural responses, complete quantification frameworks
remain the exception, not the rule. Much of the current uncertainty focus is
on design code development, with more piecemeal responses in areas where
current design codes are not adequate. This chapter briefly reviews the current
state of the art in ship structural performance, including examples of failures in
service. Then, the current design code landscape is reviewed. In design codes
today, partial safety factors make scattered appearances, but a wide adoption of
such measures is still some time off. While more general concepts of risk are
gaining traction in the marine industry for risk-based approval, direct structural
performance simulation remains the exception rather than the rule. Different
uncertainty types and underlying data are then presented and reviewed. Initially,
basic design parameters and underlying operational data are discussed, and
then strength and loading models are presented. The fragmented nature of
uncertainty quantification is clear across all these topics: while many key
responses are described in stochastic frameworks, an equal number of modeling,
fabrication, and operational parameters are subject to a large amount of epistemic
uncertainty today. This limits the number of integrated uncertainty quantification
computations that can be completed today. Finally, an overview of published
uncertainty frameworks to date is presented. While none of these frameworks are
all-encompassing, they demonstrate both the practical application of uncertainty
quantification to current problems and a foundation for future probabilistic
modeling advances.
M. Collette (✉)
Department of Naval Architecture and Marine Engineering, University of Michigan,
Ann Arbor, MI, USA
e-mail: mdcoll@umich.edu
Keywords
Goal-based standards · Common structural rules · Marine · Ship · Ocean ·
Design code · Fabrication · LRFD · Uncertainty framework · Modeling
uncertainty · Modeling idealization
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1568
2 Current Uncertainty Handling in Design Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1571
3 Uncertainty Types and Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1572
3.1 Types of Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1572
3.2 Uncertainty Data Related to Basic Design Parameters . . . . . . . . . . . . . . . . . . . . . . 1573
3.3 Uncertainty Data Related to Modeling and Experiments . . . . . . . . . . . . . . . . . . . . . 1578
4 Proposed Frameworks to Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1583
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1584
Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1585
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1585
1 Introduction
inspection regime for these vessels to ensure a sufficient level of safety. Barsom and
Rolfe present a case study of this problem in Chapter 17 of their fracture mechanics
textbook [3].
A second example is the result of the "take tons off sensibly" (TOTS) program
in the US Navy. Here, the US Navy reused a smaller hull form from the successful
DD-963 class destroyer for the larger CG-47 class cruiser. In an effort to expand
the combat capability and increase the weight available for upgrades in the CG-47
class cruiser design, the US Navy and the shipyards involved extensively reviewed
the design of the vessel for potential weight savings. By optimizing the structural
design, and using higher-strength steel plating, significant weight savings were
obtained [4]. Unfortunately, not all sources of uncertainty around structural loading
and response were quantified during this optimization. In fact, the hull form is
atypical, with a large bow flare that can dramatically increase impulsive loads from
wave impact. Additionally, a new deck-piercing vertical launch system (VLS) for
missiles was added to the foredeck near the forward end of the superstructure,
creating complex local stress fields. A slight knuckle also exists in the deck forward
of the superstructure before the VLS module, leading to an eccentric load path on the
deck under compressive loads. While the traditional factors of safety were sufficient
to avoid failures in the non-optimized structure without the VLS, the optimized
structure suffered frequent deck buckling at the knuckle and fatigue damage in
service [5]. Costly replating and structural retrofits of most vessels in this class were
then required. The experience with the TAPS vessels and the CG-47 class highlights
the need for better uncertainty management in the marine structural industry.
In addition to in-service failures, the growth of both complex numerical modeling
and on-board instrumentation is further highlighting the need for better uncertainty
management. The growth in the power of numerical modeling in the past two
decades is encouraging engineers to increase the detail of the analysis used in
design. This increased power to estimate responses is leading to more aggressive
optimization of structures. However, these more advanced models also typically
require many more inputs of fundamentally stochastic parameters. Without expand-
ing the uncertainty analysis and quantification used in this analysis, a highly precise
answer is obtained with unknown variance. Likewise, the rapid growth in in-service
monitoring and data recording has led to an increased desire to make structural
diagnosis and prognosis decisions for actual structures. However, for most vessels,
the as-built and as-repaired condition is rarely fully documented. Such structure-
specific analysis also requires an enhanced understanding and quantification of these
sources of uncertainty.
The remainder of this chapter summarizes the current state of the art and chal-
lenges in uncertainty approaches in ship structural performance. Current approaches
in marine industry design codes are presented in Sect. 2. Section 3 discusses the
types of uncertainty underlying the ship structural performance problem. Specific
focus is given to both uncertainties in the physical system and uncertainties in
the modeling and idealization of the structure. Section 4 presents a brief listing
and overview of several proposed integrated uncertainty frameworks, and Sect. 5
presents conclusions.
2 Current Uncertainty Handling in Design Codes
All ship structure codes fundamentally balance load capacity and demand to ensure
safety in service. Historically, such codes were based on experience and simplified,
deterministic calculations that were at best broadly representative of the real loading
and structural failure modes occurring in service. Formulations for load magnitudes
at specified exceedance probabilities first became possible after a landmark paper
in 1953 [6]. This paper extended the electronic filter analogy to a ship operating in
waves, allowing the spectrum of the ship's response to be calculated for known wave
spectra. Just over a decade later, Dunn [7] advocated a reliability-based approach to
the broader problem of design including strength. Dunn framed this argument in
terms of stochastic variables following defined distributions, with safety measured
probabilistically. Dunn took his inspiration from work with electronics and missile
systems, which were adopting similar formats at this time, and carried those ideas into
the marine industry.
From these initial works, uncertainty rapidly entered the ship structural com-
munity via conventional structural reliability analysis (SRA). This development
paralleled the rapid advance of reliability in the civil engineering industry. However,
unlike civil engineering, reliability did not make the jump to structural design codes
via load and resistance factor design (LRFD). While several proposals were made
for using reliability, safety indexes, partial safety factors, and LRFD in ship design
codes [1, 8, 9], industry-wide adoption of LRFD or partial safety factors has not
occurred in the marine industry.
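As a simple illustration of the SRA format behind these LRFD proposals, the classic two-variable reliability problem can be sketched as follows; the capacity and demand statistics here are illustrative assumptions, not values from any ship design code.

```python
import math

# Sketch of the structural reliability analysis (SRA) format: capacity R
# and demand S as independent normal random variables, with safety
# measured by the reliability index beta. All numbers are illustrative.
mu_R, cov_R = 1.5, 0.10   # assumed mean capacity (normalized) and its COV
mu_S, cov_S = 1.0, 0.20   # assumed mean demand and its COV

sigma_R = mu_R * cov_R
sigma_S = mu_S * cov_S
beta = (mu_R - mu_S) / math.sqrt(sigma_R**2 + sigma_S**2)

# Failure probability Pf = Phi(-beta), via the complementary error function
pf = 0.5 * math.erfc(beta / math.sqrt(2.0))
print(f"beta = {beta:.2f}, Pf = {pf:.4f}")
```

Partial safety factors in an LRFD code are, in effect, calibrated so that designs satisfying the factored checking equation achieve a target beta of this kind.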
Instead, uncertainty handling via reliability is incorporated into certain areas
of design codes. The International Association of Classification Societies (IACS)
has developed a standardized code, known as the Common Structural Rules (CSR).
The CSR are also a response to the International Maritime Organization (IMO)
move toward goal-based standards. Goal-based standards specify the desired safety
or performance outcome for a design, instead of specifying prescriptive safety
solutions. However, the IMO specification for the goal-based standards only requires
that uncertainty be addressed [10] but does not specify how. The current CSR
includes the option of using a partial safety factor approach for hull girder collapse
limit states. However, other sections of the CSR are still formulated as working
stress. IACS has also worked to standardize stochastic descriptions of the ocean
environment for design and to standardize that extreme loads should be calculated at
an exceedance probability of 10⁻⁸ per wave cycle.
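To put this exceedance level in context, a back-of-envelope sketch can relate a 10⁻⁸ per-wave-cycle probability to a ship's service exposure; the wave period, design life, and time-at-sea fraction below are illustrative assumptions, not values from the rules.

```python
# Sketch: relate a 1e-8 per-wave-cycle exceedance probability to a design
# lifetime. All three input values are illustrative assumptions.
mean_wave_period_s = 8.0   # assumed average wave encounter period
life_years = 25.0          # assumed design life
fraction_at_sea = 0.85     # assumed fraction of life spent at sea

seconds_at_sea = life_years * 365.25 * 24 * 3600 * fraction_at_sea
wave_cycles = seconds_at_sea / mean_wave_period_s

# Expected number of exceedances of the 1e-8 load in one service life
expected_exceedances = wave_cycles * 1e-8
print(f"wave cycles  ~ {wave_cycles:.2e}")
print(f"expected exceedances of the 1e-8 load ~ {expected_exceedances:.2f}")
```

With these assumptions a ship sees on the order of 10⁸ wave cycles in its life, so the 10⁻⁸ load is roughly the most probable extreme load in one service lifetime.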
Structural reliability approaches are codified as optional or extra procedures
for unusual vessels or situations. These approaches typically go far beyond static
LRFD codes. They include direct involvement of the design engineer in building the
stochastic uncertainty model. For example, the classification society DNV-GL has
issued a classification note on this topic [11]. This note includes guidance
on setting up the reliability problem, characterizing the stochastic variables via
distributions, computing and evaluating the resulting reliability, and performing
inspection updating. However, the application of such approaches remains optional.
The emergence of marine risk-based design methods [12] outside of reliability
has also increased the demand for uncertainty modeling. Risk-based design attempts
to quantify risk levels for a variety of events, including many not directly related
with structural performance. Much of this work has remained in academic circles
to date (e.g. the work of [13] on ship collision probability). Furthermore, perfor-
mance issues of ship grounding, collision, stability, fire, and evacuation have been
investigated in far more detail than structural response. However, a few transitions
into codes that address structures have occurred. The ABS classification society
has issued a guidance note on approval of novel concepts [14] that codifies how
uncertainties are to be handled by a range of quantitative risk assessment procedures
beyond structural reliability analysis.
In summary, the current state of the art for handling uncertainties in the
marine structural community is fragmented. Certain concepts, such as the wave
environment and extreme load exposure levels, are well codified in stochastic
models. Other topics such as structural reliability assessment and quantitative risk
analysis have been partially codified. However, these tools are typically reserved for
novel or special concepts today. Furthermore, no industry-wide standard exists for
the implementation of such methods.
Table 46.1 Summary of basic material property data uncertainty from Hess et al. [15]
Variable                                    Mean value   COV     Recommended distribution
Plate thickness, t                          1.05 t       0.044   Lognormal
Yield stress, σy, ordinary strength steel   1.3 σy       0.124   Lognormal
Yield stress, σy, high-strength steel       1.19 σy      0.083   Lognormal
Ultimate strength, σu, all steels           1.05 σu      0.075   Normal
Elastic modulus, E, all steels              0.987 E      0.076   Normal
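A minimal sketch of how statistics like those in Table 46.1 feed sampling-based uncertainty propagation is given below; the 235 MPa nominal yield stress is an illustrative assumption, while the bias, COV, and lognormal form follow the ordinary-strength steel row of the table.

```python
import math
import random
import statistics

# Sketch: sampling an ordinary-strength steel yield stress using the
# Table 46.1 statistics (mean bias 1.3, COV 0.124, lognormal).
# The 235 MPa nominal value is an illustrative assumption.
random.seed(0)
nominal_yield_mpa = 235.0
bias, cov = 1.3, 0.124
mean = bias * nominal_yield_mpa

# Convert the mean/COV of the lognormal variable to the parameters of the
# underlying normal distribution.
sigma_ln = math.sqrt(math.log(1.0 + cov**2))
mu_ln = math.log(mean) - 0.5 * sigma_ln**2

samples = [random.lognormvariate(mu_ln, sigma_ln) for _ in range(100_000)]
m = statistics.fmean(samples)
c = statistics.pstdev(samples) / m
print(f"sample mean ~ {m:.1f} MPa (target {mean:.1f})")
print(f"sample COV  ~ {c:.3f} (target {cov})")
```

Such samples would be drawn jointly with the other Table 46.1 variables in a Monte Carlo strength calculation.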
as the toolbox for predicting performance with aleatory uncertainties is much richer
than that with epistemic uncertainties.
When this loading is removed, the tensile block unloads elastically to a new, lower,
stress level. In this process, applied loads "wash out" or "shake down" the residual
stresses in the structure.
Much like as-built residual stresses, the community currently lacks widely
applicable models to estimate shakedown effects. Gannon et al. [23] reported up to
40% reduction in residual stresses in a stiffened panel with applied loads of roughly
50% of the yield stress. Syahroni and Berge reported on shakedown of peak residual
stresses at fatigue notches, including an experimental investigation [24]. Similar to
Gannon et al., they found that applied tensile loads largely reduced the residual stresses
at the fatigue notches. The experiments indicated that it is possible to completely
eliminate the residual stresses when the tensile overload is 85% of the yield stress
of the base metal. While this process has been theoretically explained and experi-
mentally verified, practical uncertainty characterization has not yet been completed.
Thus, while basic material properties are well understood, the as-built and in-service
shape and stresses in the structure still include epistemic uncertainties.
has not yet been formulated; however, reasonable statistical data is now available
from these two approaches for common ship structures.
Structural fatigue cracking is another time-varying change to the structure's
integrity, load paths, and remaining structural capacity. As one of the few large,
weld-fabricated structures that see frequent complete reversal of primary stress
fields from tension to compression, naval structures are prone to structural fatigue
cracking. In turn, structural fatigue cracking has historically been marked by high
variability. Aleatory uncertainties have been shown to be large for structural fatigue
crack initiation. It is not uncommon to have otherwise-identical fatigue details
fail in laboratory testing with more than one order of magnitude difference in
fatigue life under equal loading. Such uncertainty can be recorded in laboratory
experiments. However, there are equally significant epistemic uncertainties in
fatigue life prediction for most ship structures. Ship structures have historically been
fabricated with looser construction tolerances than assembly-line structures such as
aerospace structures, and as a result, the weld toe profile, misalignment, and residual
stress present in the final as-built structure are often unknown at the design stage.
These factors can equally influence the final fatigue life as much as the aleatory
uncertainties inherent in the fatigue of large welded structures.
To date, the most common practice is to lump all these uncertainties together
and capture aleatory uncertainty measures from laboratory experiments under the
assumption that the laboratory specimens represent industrial fabrication tolerances.
There have been a few papers focused on the fatigue-related uncertainty in isolation.
Wirsching [29] presented the first major study into fatigue crack initiation reliability,
including uncertainty in the S-N equation. Collette and Incecik tested simplified
reliability formulations against aluminum structural test data and summarized
recommended uncertainty measures [30]. Ayyub et al. summarized similar data and
proposed LRFD rules for naval ship structures [31], while Folsø et al. performed
a similar exercise for commercial ships, including significant data analysis [32].
Depending on the production processes followed, statistical recommendations from
the wider engineering community, such as those included in the International
Institute of Welding (IIW) publications, may also be applicable to ship structures.
For fatigue crack propagation, far fewer studies have been conducted into reasonable
values for uncertainty terms in the crack growth laws. In summary, the aleatory
aspects of fatigue degradation are relatively well documented for laboratory tests
of ship structural components. The epistemic uncertainties have received far less
attention to date.
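The order-of-magnitude life scatter described above can be illustrated with a simple lognormal fatigue-life model; the Basquin-type S-N parameters and the 0.25 standard deviation of log10(life) below are illustrative assumptions, not values from any fatigue code.

```python
import math

# Sketch of fatigue life scatter for a welded detail. Assumed values:
# a Basquin-type S-N curve N = A / S^m with m = 3, and log10(N) normally
# distributed about the mean curve with standard deviation 0.25.
m = 3.0
log10_A = 12.0            # assumed mean S-N intercept
s_log10 = 0.25            # assumed scatter of log10(life)
stress_range_mpa = 100.0  # assumed constant-amplitude stress range

log10_N_mean = log10_A - m * math.log10(stress_range_mpa)
N_mean = 10 ** log10_N_mean

# The two-sigma bounds span an order of magnitude in life
N_lo = 10 ** (log10_N_mean - 2 * s_log10)
N_hi = 10 ** (log10_N_mean + 2 * s_log10)
print(f"mean life ~ {N_mean:.2e} cycles")
print(f"95% band  ~ [{N_lo:.2e}, {N_hi:.2e}]  (ratio {N_hi / N_lo:.0f}x)")
```

Even this modest scatter in log-life reproduces the factor-of-ten spread between otherwise-identical specimens noted above, before any epistemic fabrication uncertainty is added.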
individual topics within the scope of operational data. The operational impact of
loading, wave climate, and crews' weather routing will each be reviewed in turn.
The operational weight distribution is the first major contribution to the structural
loading of the vessel. Differences in weight and buoyancy along and across the
hull lead to large hydrostatic forces and bending moments throughout the structure.
While the hydrostatic external loading from the seawater pressure is known with
high precision with the vessel at rest, the weight distribution on the vessel is not
constant and subject to extreme variability. While the ship's center of gravity is
verified after launching, the distribution of mass along the hull is typically not
verified, and thus the as-built weight distribution is subject to uncertainty. Once
operational, the crew will adjust cargo, fuel, consumables, and ballast water as
necessary to maintain adequate trim and stability. These large variable loads are
termed still water loads and are typically marked by high uncertainty. Guedes Soares
and Moan [33] provide a database of measured still water loads for seven standard
ship types and summarized uncertainty measures. The measured data indicated that
a normal distribution is appropriate, but that the per-ship coefficient of variation
(COV) may be quite high, approaching 0.4.
The ocean wave climate is often represented by engineering idealizations
during the assessment of ship structures. The real ocean environment may include
multidirectional wave systems (e.g., swell and wind-driven seas from different storm
systems) as well as local bathymetric influences. Engineering design idealizes this
situation using linear wave theory to reduce the ocean environment into single- or
two-parameter unidirectional energy spectra. These spectra are given as functions
typically parameterized by wave height and wave period. To represent the different
sea states that are likely to occur in a given area, a scatter diagram is created where
the relative frequency of specific wave height and period combinations are listed
based on observations. The International Association of Classification Societies
(IACS) publishes a recommended wave scatter diagram and spectral formulation
for worldwide service in Recommendation No. 34 [34]. Such formulations allow the wave
environment to be presented in a framework that includes aleatory uncertainties.
Primary epistemic uncertainties include the relevance of the worldwide model to
local conditions, the difficulty in predicting extreme values and departure from
linear wave theory for extreme wave heights [35], and potential future impacts of
climate change on the scatter diagrams [36].
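The two-parameter unidirectional spectrum idealization described above can be sketched as follows, here in Bretschneider form; the sea state values are illustrative, and the closure check exploits the identity Hs = 4√m0 between significant wave height and the zeroth spectral moment.

```python
import math

# Sketch: a two-parameter unidirectional wave spectrum in Bretschneider
# form, parameterized by significant wave height Hs [m] and modal (peak)
# frequency omega_m [rad/s]. The sea state values are illustrative.
def bretschneider(omega, hs, omega_m):
    """Spectral density S(omega) [m^2 s]."""
    return (5.0 / 16.0) * hs**2 * omega_m**4 / omega**5 * \
        math.exp(-1.25 * (omega_m / omega) ** 4)

hs, omega_m = 3.5, 0.7   # an example sea state

# Zeroth spectral moment by numerical integration; for a consistent
# spectrum this recovers m0 = Hs^2 / 16, i.e. Hs = 4 * sqrt(m0).
d_omega = 0.001
m0 = sum(bretschneider(w * d_omega, hs, omega_m) * d_omega
         for w in range(1, 20_000))
print(f"Hs recovered from spectrum ~ {4 * math.sqrt(m0):.2f} m")
```

In linear seakeeping analysis, such a spectrum is multiplied by the squared transfer function of a ship response to obtain the response spectrum, sea state by sea state across the scatter diagram.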
Within the context of the ocean wave environment, the vessel's crew can still
decide heading angle and ship's speed with respect to the wave field. Furthermore,
using weather forecasts for wide areas of the ocean, the crew can attempt to avoid
the worst weather by planning the routing several days in advance. This ability to
weather route means that the ocean environment experienced by the vessel may
differ from that observed by wave buoys or otherwise stationary observers. Data
on crews' decisions with respect to speed and heading can be gathered from two
different approaches. Statistical data mining of ship logs can be used to develop
operational profiles for vessels, indicating the probability of taking a specific speed
and heading given a prevailing sea state. Sikora [37] presents recent data for
operational profiles of naval combatants and auxiliaries.
Experience has shown that such human impacts can be significant, especially for
calculations done in a very small community or infrequently, where both the tool
reliability and operator knowledge may be less than ideal.
Additionally, compared to basic design parameters, stochastic modeling of
uncertainty in idealizations and models is generally less mature in the marine industry.
However, this topic has grown in importance and research activity lately. Sev-
eral initial studies into modeling and tool uncertainty took place in the 1990s
and will be reviewed in this section. A major landmark was the joint Inter-
national Towing Tank Conference (ITTC) and International Ship and Offshore
Structures Congress (ISSC) workshop on stochastic uncertainty modeling for
ship design loads and operational guidance, the papers from which were published
in a special issue of Ocean Engineering [41].
a reference stress related to the stress believed to be causing crack initiation that
is calculable with linear finite element analysis. However, while the concept of the
reference stress has been standardized, the implementation and modeling do vary,
leading to significant modeling uncertainty.
In a 2002 benchmark study, Fricke et al. [45] studied the same weld toe via seven
different hot-spot analysis techniques. The predicted fatigue life of the detail varied
between 1.8 and 20.7 years, indicating a high level of modeling and idealization
uncertainty. A similar study has also been made of notch stress analysis techniques
[46], again showing significant variability. While such studies have shown that the
variability in fatigue idealizations is high, statistics for a particular given method of
analysis on a type of structure are generally not available. In a study investigating
aluminum fatigue experimental test results with author-reported hot-spot stress
analysis, Collette and Incecik [30] showed that the different test programs were
from distinct statistical populations. Thus, a common uncertainty measure was not
recommended. Idealization and modeling errors for fatigue analysis remain a topic
of investigation for the marine community.
The second major ship structure modeling approach is collapse models where
the nonlinear failure of the structure is directly estimated. This is typically done
either via simplified semi-analytic models or nonlinear finite element models.
As ships are thin-walled shell structures, collapse strength depends on many
more basic variables than the linear FEA allowable-stress method reviewed above.
Imperfections, residual stresses, and yield stress variations can all strongly impact
the computed collapse strength. Additionally, the modeling is more complex than
linear elastic analysis. The uncertainty in such collapse models has been actively
investigated for over 30 years.
As would be expected, collapse models for smaller portions of a structure (e.g.,
an unstiffened plate in isolation) generally perform with less variability than models
for more complex assemblies of structural components. This is a result of simpler
and better-fitting idealizations at the component level. For a series of stiffened
panels tested in compression, results have shown that most simplified analysis
methods can predict the strength of the panel within roughly 10% of the true load,
with the coefficient of variation of the prediction also typically close to 10% [47].
However, these predictions are made with fairly complete information about the as-built
material yield stresses, imperfections, and residual stresses in the structure; i.e.,
the underlying material and geometry uncertainties have largely been removed.
Modeling such simple structural experiments also presents minimal opportunity for
human analyst error as the topology, boundary conditions, and relevant failure
modes are normally clear by inspection.
For a more complex calculation of the collapse of a ship hull, a benchmark
calculation between a range of mostly simplified analysis methods was made in
1994 [48] on a 1/3 scale frigate hull that was also tested experimentally. The results
showed much greater scatter with most methods having errors between 10% and
20%, with some methods missing by as much as 40%. However, the structure of
this test case was much more slender than typical commercial vessels, which
may explain some of the struggles of the simplified methods. Recent guidance has
suggested that simplified collapse methods have a bias of less than 5% and
a COV approaching 10% when compared to nonlinear finite element analysis for
conventional ships [49].
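For context, a bias factor and COV of the kind quoted above are typically extracted from measured-to-predicted strength ratios across a benchmark series; the ratios below are invented for illustration and are not the data of the cited studies.

```python
import statistics

# Sketch: extracting a model bias factor and COV from a benchmark.
# The measured/predicted strength ratios below are invented for
# illustration only; they are not the data of [47-49].
measured_over_predicted = [0.96, 1.04, 1.10, 0.91, 1.02, 0.98, 1.07, 0.95]

bias = statistics.fmean(measured_over_predicted)  # mean model bias factor
cov = statistics.stdev(measured_over_predicted) / bias
print(f"bias factor ~ {bias:.3f}, COV ~ {cov:.3f}")
```

A bias near 1.0 with a COV near 0.05 to 0.10, as in this invented series, is the kind of result that would be carried forward as a modeling-uncertainty variable in a reliability calculation.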
Errors in nonlinear finite element analysis are not yet clear. In principle, the FEA
models are able to capture many more failure modes and mode interactions more
successfully than the previous generation of simplified analysis codes. In practice,
the number of modeling and idealization decisions that must go into generating
and solving such a large model are easily an order of magnitude higher than
the simplified analysis codes. Thus, the opportunity for idealization and human
analyst errors is higher. Unfortunately, data on FEA model idealization error is quite
limited at the moment. Rigo et al. [50] compared six different FEA analyses of
an aluminum-stiffened panel with established geometry and boundary conditions.
The six analysis runs were done by different authors who used different codes,
meshes, and element types. In general, the ultimate strength prediction of the panel
fell within a 4% error band, indicating good agreement, though the post-collapse
behavior did differ more strikingly.
In 2012, Paik et al. [51] compared ten different overall collapse prediction codes
on six different ships. The codes included both nonlinear FEA and various simplified
methods. In general, the scatter of the results across all methods was on the order
of 10–20%, including the FEA codes. Even within FEA analysis, similar results
were seen. For example, the same commercial code (ANSYS) was used by two
participants. For most of the vessels they investigated, the results between the two
participants differed by 10–20%. This indicates that the idealization decisions made by the
analyst for nonlinear FEA may be a first-order uncertainty in the calculation. Moan
et al. [52] provide a framework for more formally assessing such uncertainties, but
again without data for the general analyst. Such overall ship collapse analysis with
nonlinear finite element analysis has only been tractable for about the past decade
with standard FEA codes, so this area is still one in rapid development.
5 Conclusions
models used. Despite this circumstance, several authors have proposed integrated
frameworks that tie multiple pieces of uncertainty together and increasingly include
inspection updating and lifecycle performance and risk measures. Such studies point
the way to increased use of uncertainty in the future evaluation of ship structural
performance.
Acknowledgments The author wishes to acknowledge helpful discussion with Dr. Paul Hess of
Code 331 at the Office of Naval Research in the USA in framing this chapter.
Cross-References
References
1. Ayyub, B.M., Assakkaf, I.A., Beach, J.E., Melton, W.M., Nappi, N., Conley, J.A.: Methodol-
ogy for developing reliability-based load and resistance factor design (LRFD) guidelines for
ship structures. Nav. Eng. J. 114(2), 23–42 (2002)
2. Sielski, R.: Aluminum Structure Design and Fabrication Guide. Ship Structure Committee,
Washington, DC, SSC-452 (2007)
3. Barsom, J., Rolfe, S.: Fracture and Fatigue Control in Structures: Applications of Fracture
Mechanics, 3rd edn. Butterworth-Heinemann, West Conshohocken (1999)
4. Staiman, R.C.: Aegis cruiser weight reduction and control. Nav. Eng. J. 99(3), 190–201 (1987)
5. Keane, R. Jr.: Reducing total ownership cost: designing inside-out of the hull. Nav. Eng.
J. 124(4), 67–80 (2012)
6. St. Denis, M., Pierson, W.J.: On the motion of ships in confused seas. Trans. Soc. Nav. Archit.
Mar. Eng. 61, 280–357 (1953)
7. Dunn, T.W.: Reliability in shipbuilding. Trans. Soc. Nav. Archit. Mar. Eng. 72, 14–40 (1964)
8. Mansour, A.E., Wirsching, P., Lucket, M.D., Plumpton, A.M., Lin, Y.H.: Structural safety of
ships. Trans. Soc. Nav. Archit. Mar. Eng. 105, 61–98 (1997)
9. Mansour, A., Wirsching, P., White, G., Ayyub, B.M.: Probability Based Ship Design: Imple-
mentation of Design Guidelines. Ship Structure Committee, Washington, DC, SSC-392 (1996)
10. Hoppe, H.: Goal-based standards – a new approach to the international regulation of ship
construction. IMO News Mag. 2006(1), 13–17 (2006)
11. DNV: Classification Note 30.6: Structural Reliability Analysis of Marine Structures. Det
Norske Veritas, Høvik (1992)
12. Papanikolaou, A.: Risk-Based Ship Design. Springer, Berlin/Heidelberg (2009)
13. Ståhlberg, K., Goerlandt, F., Ehlers, S., Kujala, P.: Impact scenario models for probabilistic
risk-based design for ship–ship collision. Mar. Struct. 33, 238–264 (2013)
14. ABS: Guidance Notes on Review and Approval of Novel Concepts. American Bureau of
Shipping, Houston (2003)
15. Hess, P.E., Bruchman, D., Assakkaf, I.A., Ayyub, B.M.: Uncertainties in material and
geometric strength and load variables. Nav. Eng. J. 114(2), 139–166 (2002)
16. Kaufman, J., Prager, M.: Marine Structural Steel Toughness Data Bank. Ship Structure
Committee, Washington, DC, United States, SSC-352 (1990)
17. Sumpter, J.D.G., Kent, J.S.: Fracture toughness of grade D ship steel. Eng. Fract. Mech. 73(10),
1396–1413 (2006)
18. Paik, J., Thayamballi, A.K., Ryu, J., Jang, C., Seo, J., Park, S., Soe, S., Renaud, C., Kim, N.:
Mechanical Collapse Testing on Aluminum Stiffened Panels for Marine Applications. Ship
Structure Committee, Washington, DC, SSC-451 (2007)
19. Smith, C.S., Davidson, P.C., Chapman, J.C., Dowling, P.J.: Strength and stiffness of ships'
plating under in-plane compression and tension. Trans. R. Inst. Nav. Archit. 130, 277–296
(1988)
20. Kenno, S.Y., Das, S., Kennedy, J.B., Rogge, R.B., Gharghouri, M.: Residual stress distributions
in ship hull specimens. Mar. Struct. 23(3), 263–273 (2010)
21. Gannon, L., Liu, Y., Pegg, N., Smith, M.: Effect of welding sequence on residual stress and
distortion in flat-bar stiffened plates. Mar. Struct. 23(3), 385–404 (2010)
22. Jennings, E., Grubbs, K., Zanis, C., Raymond, L.: Inelastic Deformation of Plate Panels. Ship
Structure Committee, Washington, DC, SSC-364 (1991)
23. Gannon, L.G., Pegg, N.G., Smith, M.J., Liu, Y.: Effect of residual stress shakedown on stiffened
plate strength and behaviour. Ships Offshore Struct. 8(6), 638–652 (2013)
24. Syahroni, N., Berge, S.: Fatigue assessment of welded joints taking into account effects of
residual stress. J. Offshore Mech. Arct. Eng. 134(2), 021405 (2011)
25. Paik, J.K., Lee, J.M., Hwang, J.S., Park, Y.I.: A time-dependent corrosion wastage model for
the structures of single- and double-hull tankers and FSOs and FPSOs. Mar. Technol. 40(3),
201–217 (2003)
26. Wang, G., Spencer, J., Sun, H.: Assessment of corrosion risks to aging ships using an
experience database. J. Offshore Mech. Arct. Eng. 127(2), 167–174 (2005)
27. Melchers, R.E., Jeffrey, R.J.: Probabilistic models for steel corrosion loss and pitting of marine
infrastructure. Reliab. Eng. Syst. Saf. 93(3), 423–432 (2008)
28. Melchers, R.E.: Extreme value statistics and long-term marine pitting corrosion of steel.
Probab. Eng. Mech. 23(4), 482–488 (2008)
29. Wirsching, P.: Fatigue reliability for offshore structures. J. Struct. Eng. 110(10), 2340–2356
(1984)
30. Collette, M., Incecik, A.: An approach for reliability-based fatigue design of welded joints on
aluminum high-speed vessels. J. Ship Res. 50(1), 85–98 (2006)
31. Ayyub, B.M., Assakkaf, I.A., Kihl, D.P., Siev, M.W.: Reliability-based design guidelines for
fatigue of ship structures. Nav. Eng. J. 114(2), 113–138 (2002)
32. Folsø, R., Otto, S., Parmentier, G.: Reliability-based calibration of fatigue design guidelines
for ship structures. Mar. Struct. 15(6), 627–651 (2002)
33. Soares, C.G., Moan, T.: Statistical analysis of stillwater load effects in ship structures. Soc.
Nav. Archit. Mar. Eng. Trans. 96, 129–156 (1988)
34. IACS: Standard Wave Data. Corrected Nov. 2001. IACS (1992)
35. Jonathan, P., Ewans, K.: Statistical modelling of extreme ocean environments for marine
design: a review. Ocean Eng. 62, 91–109 (2013)
36. Bitner-Gregersen, E.M., Eide, L.I., Hørte, T., Skjong, R.: Ship and Offshore Structure Design
in Climate Change Perspective. Springer Science & Business Media, Berlin (2013)
37. Sikora, J.P., Michaelson, R.W., Ayyub, B.M.: Assessment of cumulative lifetime seaway loads
for ships. Nav. Eng. J. 114(2), 167–180 (2002)
38. Sternsson, M., Björkenstam, U.: Influence of weather routing on encountered wave heights.
Int. Shipbuild. Prog. 49(2), 85–94 (2002)
39. Shu, Z., Moan, T.: Effects of avoidance of heavy weather on the wave-induced load on ships.
J. Offshore Mech. Arct. Eng. 130(2) (2008)
40. Papanikolaou, A., Alfred Mohammed, E., Hirdaris, S.E.: Stochastic uncertainty modelling for
ship design loads and operational guidance. Ocean Eng. 86, 47–57 (2014)
41. Hirdaris, S.: Special issue on uncertainty modelling for ships and offshore structures. Ocean
Eng. 86, 1–2 (2014)
42. Østergaard, C., Dogliani, M., Guedes Soares, C., Parmentier, G., Pedersen, P.T.: Measures of
model uncertainty in the assessment of primary stresses in ship structures. Mar. Struct. 9(3–4),
427–447 (1996)
46 Uncertainty Approaches in Ship Structural Performance 1587
43. GL: Rules for Classification and Construction V Analysis Techniques Hull Structural
Design Analysis Guidelines for Global Strength Analysis of Container Ships. Germanischer
Lloyd SE (2011)
44. DNV: Classification Note 34.1: CSA-Direct Analysis of Ship Structures. Det Norske Veritas
(2013)
45. Fricke, W., Cui, W., Kierkegaard, H., Kihl, D., Koval, M., Mikkola, T., Parmentier, G.,
Toyosada, M., Yoon, J.-H.: Comparative fatigue strength assessment of a structural detail in
a containership using various approaches of classification societies. Mar. Struct. 15(1), 113
(2002)
46. Fricke, W., Bollero, A., Chirica, I., Garbatov, Y., Jancart, F., Kahl, A., Remes, H., Rizzo, C.M.,
von Selle, H., Urban, A., Wei, L.: Round robin study on structural hot-spot and effective notch
stress analysis. Ships Offshore Struct. 3(4), 335345 (2008)
47. Hughes, O., Nikolaidis, E., Ayyub, B., White, G., Hess, P.: Uncertainty in Strength Models for
Marine Structures. Ship Structure Committee, Washington, DC, SSC-375 (1994)
48. Jensen, J.J.: Ductile Collapse Committee III.1. In: Proceedings of the 12th International Ship
and Offshore Structures Congress, St. Johns, pp. 299388 (1994)
49. Amlashi, H.K.K., Moan, T.: A proposal of reliability-based design formats for ultimate hull
girder strength checks for bulk carriers under combined global and local loadings. J. Mar. Sci.
Technol. 16(1), 5167 (2011)
50. Rigo, P., Sarghiuta, R., Estefen, S., Lehmann, E., Otelea, S.C., Pasqualino, I., Simonsen, B.C.,
Wan, Z., Yao, T.: Sensitivity analysis on ultimate strength of aluminium stiffened panels. Mar.
Struct. 16(6), 437468 (2003)
51. Paik, J.K., Amlashi, H., Boon, B., Branner, K., Caridis, P., Das, P.K., Fujikubo, M., Huang,
C.H., Josefson, L., Kaeding, P., Kim, C.W., Parmentier, G., Pasqualino, I., Rizzo, C.M.,
Vhanmane, S., Wang, X., Yang, P.: Committee III.1: Ultimate Strength. In: Proceedings of
the 18th International Ship and Offshore Structures Congress, vol. 1, 3 vols., pp. 285363.
Schiffbautechnische Gesellschaft e.V., Hamburg (2012)
52. Moan, T., Dong, G., Amlashi, H.K.K.: Critical assessment of ultimate hull girder capacity of
ships from a reliability analysis point of view. In: Maritime Transportation and Exploitation
of Ocean and Coastal Resources, Two Volume Set, Volume 1., pp. 477485. Lisbon, Taylor &
Francis (2006)
53. Schellin, T.E., stergaard, C., Guedes Soares, C.: Uncertainty assessment of low frequency
load effects for containerships. Mar. Struct. 9(34), 313332 (1996). SPEC. ISS.
54. Parunov, J., Senjanovi, I.: Incorporating model uncertainty in ship reliability analysis. Trans.
Soc. Nav. Archit. Mar. Eng. 111, 376408 (2003)
55. Li, Z., Mao, W., Ringsberg, J.W., Johnson, E., Storhaug, G.: A comparative study of fatigue
assessments of container ship structures using various direct calculation approaches. Ocean
Eng. 82, 6574 (2014)
56. Drummen, I., Holtmann, M.: Benchmark study of slamming and whipping. Ocean Eng. 86,
310 (2014)
57. Kim, Y., Hermansky, G.: Uncertainties in seakeeping analysis and related loads and response
procedures. Ocean Eng. 86, 6881 (2014)
58. Soares, C.G., Garbatov, Y.: Reliability of maintained ship hull girders subjected to corrosion
and fatigue. Struct. Saf. 20(3), 201219 (1998)
59. Soares, C.G., Garbatov, Y.: Reliability of corrosion protected and maintained ship hulls
subjected to corrosion and fatigue. J. Ship Res. 43(2), 6578 (1999)
60. Deco, A., Frangopol, D.M., Zhu, B.: Reliability and redundancy assessment of ships under
different operational conditions. Eng. Struct. 42, 457471 (2012)
61. Dong, Y., Frangopol, D.M.: Risk-informed life-cycle optimum inspection and maintenance of
ship structures considering corrosion and fatigue. Ocean Eng. 101, 161171 (2015)
62. Frangopol, D.D.M., Bocchini, D.P., Dec, A., Kim, D.S., Kwon, D.K., Okasha, D.N.M.,
Saydam, D.: Integrated life-cycle framework for maintenance, monitoring, and reliability of
naval ship structures. Nav. Eng. J. 124(1), 8999 (2012)
1588 M. Collette
63. Faber, M.H., Straub, D., Heredia-Zavoni, E., Montes-Iturrizaga, R.: Risk assessment for
structural design criteria of FPSO systems. Part I: generic models and acceptance criteria. Mar.
Struct. 28(1), 120133 (2012)
64. Heredia-Zavoni, E., Montes-Iturrizaga, R., Faber, M.H., Straub, D.: Risk assessment for
structural design criteria of FPSO systems. Part II: consequence models and applications to
determination of target reliabilities. Mar. Struct. 28(1), 5066 (2012)
65. Ayyub, B.M., Stambaugh, K.A., McAllister, T.A., de Souza, G.F., Webb, D.: Structural life
expectancy of marine vessels: ultimate strength, corrosion, fatigue, fracture, and systems.
ASCE-ASME J. Risk Uncertain. Eng. Syst. Part B Mech. Eng. 1(1), 011001011001 (2015)
66. Moan, T., Ayala-Uraga, E.: Reliability-based assessment of deteriorating ship structures
operating in multiple sea loading climates. Reliab. Eng. Syst. Saf. 93(3), 43346 (2008)
47 Uncertainty Quantification's Role in Modeling and Simulation Planning, and Credibility Assessment Through the Predictive Capability Maturity Model
Abstract
The importance of credible, trustworthy numerical simulations is obvious, especially when the results are used for making high-consequence decisions. Determining the credibility of such numerical predictions is much more difficult and requires a systematic approach to assessing predictive capability, associated uncertainties, and overall confidence in the computational simulation process for the intended use of the model. This process begins with an evaluation of the computational modeling of the identified, important physics of the simulation for its intended use. This is commonly done through a Phenomena Identification Ranking Table (PIRT). Then an assessment of the evidence basis supporting the ability to computationally simulate these physics can be performed using various frameworks such as the Predictive Capability Maturity Model (PCMM). Several critical activities follow in the areas of code and solution verification, validation, and uncertainty quantification, which are described in detail in the following sections. The subject matter is introduced for general applications, but specifics are given for the failure prediction project.
The first task that must be completed in the verification and validation (V&V) procedure is to perform a credibility assessment to fully understand the requirements and limitations of the current computational simulation capability for the specific application's intended use. The PIRT and PCMM are tools used at Sandia National Laboratories (SNL) to provide a consistent manner in which to perform such an assessment. Ideally, all stakeholders should be represented and contribute in order to produce an accurate credibility assessment. PIRTs and PCMMs are both described briefly below, and the resulting assessments for an example project are given.
Keywords
Application-specific · Foundational · Frameworks · PCMM · PIRT
Contents
1 Introduction to Predictive Capability Maturity Model (PCMM) . . . . . . . . . . . . . . . . . . . 1590
2 Need for Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1593
3 Different Frameworks and Their Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1596
3.1 The Original PCMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1596
3.2 Variations on a Theme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1597
3.3 Foundational PCMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1599
3.4 Application Specific PCMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1600
4 Phenomena Identification Ranking Table (PIRT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1601
5 PCMM Example: Qualification Alternatives to the Sandia Pulsed Reactor (QASPR) . . . . 1603
6 The Role of Frameworks like PCMM and PIRT in Credibility . . . . . . . . . . . . . . . . . . . . 1605
7 Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1610
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1611
The PCMM was developed at SNL for the DOE Advanced Simulation and Computing (ASC) program as a means of assessing the completeness of modeling and computational simulation activities for a particular application. There have been three generations of PCMM, with an emphasis evolving from assessment to evidence inventory [8, 9, 11, 12]. Tables 47.1 and 47.2 show one of the PCMM templates commonly used. There are six elements of computational simulation that require investigation with respect to maturity level. Each element has several factors to consider. In the early PCMM versions these were divided into subcategories to be assessed separately, but in the newer, current version they are areas to explore, consider, and aggregate when determining an overall evaluation of the element. The stakeholders, from analysts to the customers, must determine the appropriate goals, or required maturity levels, for each of the elements for the required simulation. The supporting evidence determines the current level of maturity and qualifies the assessment. The difference between the current state and the desired maturity level identifies the gaps that need to be addressed. Before the project progresses, it must be determined and accepted by all involved whether these gaps are acceptable (and thereby the required maturity level reduced) or what the mitigation plan is to reduce each gap. Often these gaps cannot be closed due to funding, time, or technical constraints. Therefore, it must be determined whether a compromise between the stakeholders is possible and viable.
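The target-versus-assessed gap determination described above can be sketched in a few lines. The element names follow the six PCMM elements discussed in this chapter; the scores are hypothetical illustrations, not the project's actual grades:

```python
# Hypothetical PCMM gap tabulation: compare assessed maturity levels (0-3)
# against the stakeholder-agreed target level for each element.
PCMM_ELEMENTS = [
    "Geometry/representation fidelity",
    "Model fidelity",
    "Code verification",
    "Solution verification",
    "Validation",
    "UQ/sensitivity analysis",
]

def pcmm_gaps(assessed, target):
    """Return {element: shortfall} for every element below its target level."""
    return {e: target[e] - assessed[e]
            for e in assessed if assessed[e] < target[e]}

# Illustrative scores only; a real assessment is grounded in supporting evidence.
assessed = {e: s for e, s in zip(PCMM_ELEMENTS, [2, 1, 2, 0, 1, 2])}
target = {e: 2 for e in PCMM_ELEMENTS}  # e.g., a programmatic goal of level 2 throughout

for element, gap in pcmm_gaps(assessed, target).items():
    print(f"{element}: {gap} level(s) below target")
```

The resulting per-element shortfalls are exactly the gaps that stakeholders must either accept (by lowering the required level) or address with a mitigation plan.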
The PCMM assessment performed by the analysis team for the example is given in Table 47.1, with the specific grades highlighted in light green. The programmatic maturity level goal/target was determined to be a level 2 for all elements at the beginning of the project.

Table 47.1 The commonly seen PCMM template with current-state quality levels in green transparent boxes (columns include scaled scores and targets for elements such as VAL, CVER, and SVER)

A Kiviat diagram can be used to provide a visual metric of the state of each element as compared to the specified target level. This simple chart quickly shows which elements of providing a credible prediction are lacking and which are meeting the target. For the puncture example, the associated Kiviat diagram is shown in Fig. 47.1, which reveals the state of the PCMM graphically.
The PCMM was defined to allow the assessment of the state of modeling and
simulation (M&S) in a manner that balances all aspects of the M&S workflow.
PCMM provides a structured breakdown of the component work within an engineer-
ing M&S study. Classically it contains six elements, each requiring specific focus
when the PCMM is applied. These six elements are geometry/representation fidelity,
model fidelity, code verification, solution verification (numerical error estimation),
validation and uncertainty quantification/sensitivity analysis. Each of these elements
entails significant complexity and contributes to the overall quality.
Part of the original role of PCMM was to organize and structure the achievement
of high credibility M&S for the purposes of decision-making. As PCMM was
developed at an engineering laboratory, the focus was necessarily centered upon
engineering analysis and thus biased. Over time these biases have become evident upon the application of PCMM to other M&S endeavors. In the process the framework has been modified and extended, and its biases laid bare. Each extension has helped the model itself mature and has yielded an enhanced understanding of its application and form. Among the most important of these extensions is the expansion of the process surrounding PCMM to include pre- and post-conditions, targets, and an overall iterative framework for applying the assessment. We have come to view PCMM more as a communication and planning tool than as a vehicle for assessment. It is in this broader sense that the PCMM may find its greatest utility: defining high-credibility M&S for decision-making, from inception to completion, with greater quality. A third way to constructively engage with PCMM is to view it as defining the elements of the modeling and simulation workflow. In any workflow the PCMM encapsulates the different issues and topics that must be confronted. Recent work on PCMM has provided a recommended workflow together with the assessment elements.
One of the big problems that the entire V&V enterprise has is its sense of imposition on others. Every simulation worth discussing does V&V at some level, and almost without exception there are weaknesses. Doing V&V "right" or "well" is not easy or simple. Usually, the proper conduct of V&V will expose numerous problems with a code, model, and/or simulation. It's kind of like subjecting yourself to an annual physical; it's good for you because you learn something about yourself you didn't know, but you might have to face some unpleasant realities.
In addition, the activity of V&V is quite broad, and something almost always slips between the cracks (or chasms in many cases). To deal with this breadth, the V&V community has developed some frameworks to hold all the details together. Sometimes these frameworks are approached as prescriptions for all the things you must do. Instead, we suggest that these frameworks should not be recipes, nor should they be thought of as prescriptions; they are things to be seriously considered. If listed aspects of PCMM are disregarded, the omission should be justified through rigorous analysis. They are "thou should," not "thou shalt," or even "you might." Several frameworks exist today and none of them is fit for all purposes, but all of them are instructive on the full range of activities that should be at least considered, if not engaged in.
CSAU (Code Scaling, Applicability, and Uncertainty) [1, 2] was developed by the Nuclear Regulatory Commission to manage the quality of analyses done for power plant accidents. It is principally applied to thermal-fluid (i.e., thermal-hydraulic) phenomena that could potentially threaten the ability of nuclear fuel to contain radioactive products. This process led the way and has been updated recently [13]. It includes processes and perspectives that have not been fully replicated in subsequent work. PCMM is attempting to utilize these lessons in improving its completeness.
1594 W.J. Rider et al.
1. V&V and UQ are both deep fields with numerous deep subfields. Keeping all of this straight is a massive undertaking beyond the capacity of most professional scientists or engineers. A framework provides a blueprint for conducting detailed studies and making certain that no aspect of the technical work remains untouched. Without such a blueprint, assessments are prone to missing important details associated with topics that may not be within the expertise of those doing the work.
2. Human behavior shows that everyone will default to focusing on where they are strong, comfortable, or interested. For some people it is mesh generation, for others it is modeling, and for yet others it is analysis of results. Such deep focus may not lead (or is not likely to lead) to the right sort of quality. Where quality is needed depends upon the problem itself and how the problem's solution is used. The frameworks provide a holistic view of quality in which deep focus does not necessarily lead to myopic behavior. They also encourage a more collaborative approach to the conduct of V&V by highlighting where expertise needs to be sought beyond those conducting the work.
3. These are useful outlines of all the activities that a modeling and simulation project might consider. Project planning can use the frameworks to help set objectives and subtasks and to prioritize and schedule work. The frameworks also provide a rubric for the peer review of such efforts and allow for the proper determination of the breadth of capabilities that might be needed for peer review.
4. These are menus of all the sorts of things you might do, not all the things you must do. Combining the outcomes of the PIRT with the intended use of the simulation and modeling provides a means of sorting through the menu. It also allows the activities not conducted to be explicit rather than implicit in the assessment.
5. They provide a set of activities prepared in a rational sequence, with an eye toward what the modeling and simulation is used for (the intended use). Again, the framework provides suggestions, not a straitjacket. Different sequencing can be executed if reasoned analysis calls for it. The frameworks give a structured manner in which the work can be extended or improved rationally if resources become available.
6. They help keep activities in balance. A generally holistic quality mentality is encouraged, and deviations from full coverage are made explicitly visible. The framework, if properly utilized, will help keep you honest.
7. You will understand what is fit for purpose and when you have put too much effort into a single aspect of quality. Both the impact of neglected aspects of the modeling and simulation effort and the steps to remove such neglect can be easily accessed via the framework.
8. V&V and UQ are developing quickly, and the frameworks provide a "cheat sheet" for all of their different aspects. This assures that the analysis, assessment, and examination remain current. Generally, the list of potential activities can never be exhausted given the fixed resources most projects operate under. The frameworks provide a means of looking toward and measuring continuous improvement.
9. A framework's flexibility is key, and not every application should focus on every quality aspect or apply every quality approach in equal measure. Again, balancing a holistic view of quality with an appropriate narrowing of effort toward the more critical aspects of a particular modeling and simulation effort is necessary.
10. Validation itself is incredibly hard in both breadth and depth. It should be engaged in a structured, thoughtful manner with a strong focus on the end application. Validation is easy to do poorly; the frameworks provide steps by which the underlying work for high-quality validation is readily available.
11. The computational science community largely ignores verification of codes and calculations. Even when it is done, it is usually done poorly or insufficiently. The frameworks provide a focus on this aspect of V&V so that it is not entirely ignored by any effort utilizing them.
12. Error estimation and uncertainty quantification too rarely include the impact of numerical error, and they estimate uncertainty primarily through parametric changes in models. These efforts are highlighted in the frameworks, lead to higher-quality assessments, and provide the means for improving many efforts.
13. Another element of the frameworks is the accounting for numerical error, which is usually much larger than acknowledged. A lot of parametric and model calibration is actually accounting for numerical error, or providing numerical stability, rather than physical modeling.
14. A framework helps identify gaps and associated risks for each modeling and
simulation effort. These gaps can be targeted as additional work to improve
quality.
15. Lastly, a framework can help you incorporate project resource constraints and identify the associated risks and consequences of forgoing various V&V activities. They also provide an explicit focus not only on what is done but also on what is not done. None of this is an alternative to a hard-hitting and technically focused peer review. Bringing in independent experts and fresh perspectives is utterly invaluable to the honesty and integrity of quality assessment.
As originally constructed (and reported in [9]), the PCMM addressed six elements that were identified as essential for the successful application of modeling and simulation. These elements were geometry/representation fidelity, model fidelity, code verification, solution verification, validation, and uncertainty quantification/sensitivity analysis.
In the PCMM process, a general set of attributes was identified for each of these elements to permit characterization of each element into one of four maturity levels (0-3). This resulted in a matrix of maturity levels versus PCMM elements, as illustrated in Table 47.2.
As can be seen from the information contained in Table 47.2, the PCMM maturity levels constitute a hierarchy that represents increasingly greater levels of sophistication and computational fidelity (and expense and resources). The levels in this hierarchy range from informal, judgment-based assessment at level 0 to formal, thoroughly documented, and independently reviewed evidence at level 3.
We note that the assessment of the levels of maturity does not, by itself, indicate the degree to which the M&S capability will be successful at meeting the requirements identified for a particular application (e.g., a licensing application, a regulatory requirement, or a design specification). To identify the degree to which such a capability will be present for a particular application, one would compare the assessed maturity level for the PCMM elements with an objective level of maturity that is identified as necessary for the application of interest. Any application will provide specific quantities of interest associated with defining a successful outcome for the intended purpose.
PCMM is a framework to organize the entire V&V and UQ landscape into neat little boxes. Of course, reality is never so neat and tidy, but structure for such a complex and potentially unbounded activity is good. PCMM has gone through numerous revisions, extensions, and rearticulations, and the safest conclusion is that the framework will never be complete. While for any given use of PCMM a different and focused version may have utility, any effort to achieve completeness may be futile. Such a framework can produce better results, but it will never be a silver bullet.
A close examination of PCMM provides insight into its intrinsic bias toward a certain class of engineering calculations. This is clearest in the emphasis on geometric fidelity and solution verification, which belies its basis in mesh-based calculations. Nonetheless, peeling back a layer, one should not lose sight of the
nature of these entries in the framework. The geometry is really a view of the
representation of reality in the overall computational simulation, while the solution
verification is the estimate of error in the same. Both aspects are essential to
determining the quality and assigning confidence to the simulation results. Other
aspects of the PCMM translate across fields more readily. Code verification and
software quality are necessary elements in providing reliable computer codes
for simulation. Modeling and its credibility are universal across fields. Finally, validation provides the tangible and measured connection to reality, and uncertainty quantification is key to decision-making.
Other elements have been added due to their importance and focus when applying the framework. Recent work has taken the validation element and broken it into four separate activities to be examined. Two elements now involve the quality and acquisition of experimental data, followed by the assessment of the simulation with respect to those data. The validation exercise is divided into two sections: one with data and validation applied to the underlying, low-level models in the code, and the second applying to the application-level, or high-level, modeling and its associated data for comparison. In addition, we considered adding elements to PCMM associated with the requirements arising from the simulation customers and with the impact of the code user/analyst/engineer on the results obtained. This area is sensitive and controversial because the human impact on M&S analysis is a hot-button issue with many, and it is tabled for now.
A useful concept in examining complex, difficult problems is the characterization of the problem as "wicked." A wicked problem has a number of characteristics that make it special and difficult to solve. For example, the problem cannot be fully understood before attempting to solve it. As the problem is tackled, new aspects of the problem are unveiled, and the solution needs to be rescoped. This is a continual aspect of the problem, and it bedevils those who attempt to apply project plans and complete predictability to it. There are even "super-wicked" problems that are more difficult due to counterintuitive feedbacks between the problem itself and the solution. Characteristics of super-wicked problems are that those solving the problem are also causing it, and that future results are irrationally discounted in the present, among others. V&V, with PCMM as an example, can probably be characterized as being wicked and perhaps even super-wicked. Thus the activity defies full articulation.
Despite this intrinsic futility, we would claim that PCMM is useful. In a real sense, the activities within PCMM can be thought of as a menu of activities that one might consider in determining and driving simulation quality. This is as important as the measurement of the quality itself. Being such a complex and potentially unbounded activity means that there is a large probability for details to slip from attention. PCMM provides a list of activities for simulation quality improvement and assessment that code development and application projects can draw upon effectively. A reasonable recommendation is to examine the PCMM for low-hanging fruit that can easily be incorporated into the simulation process. As we will now describe, this process can be further refined by dividing the PCMM into two pieces: one that applies primarily to code development and the foundation of simulation capability, and a second that is more specific to the application of interest. Another practicality is the potentially overwhelming nature of PCMM. It is therefore rational and reasonable to apply less than the complete framework initially.
Thus a useful observation is that one can decompose the whole computational confidence issue into two relatively neat and tidy pieces: that which is more tool (code)-specific, or foundational, and that which is problem- or application-specific. Some work provides the general characterization of the computational tools to be used for analysis, while other work applies directly to the application of those tools. For example, the general software development approach and documentation can be used over and over for different applications, while the specific options of a code used for a given application are narrow and specific to that application. This principle can be applied repeatedly across the span of activities that comprise the quality pedigree for a given code and its intended application, and it applies to nearly every area of code quality investigation as laid out below.
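This foundational/application split can be illustrated with a small, hypothetical evidence inventory; the item names and tags below are invented for illustration, not drawn from the project:

```python
# Hypothetical inventory of quality evidence, tagged by scope: "foundational"
# items carry over to every application of the code, while "application"
# items belong to a single intended use.
evidence = [
    ("Regression test suite and SQE practices", "foundational"),
    ("Method-of-manufactured-solutions code verification", "foundational"),
    ("Material-model validation against laboratory coupon tests", "foundational"),
    ("Mesh-convergence study for the puncture geometry", "application"),
    ("Validation against a full-scale puncture experiment", "application"),
]

def by_scope(items, scope):
    """Select the evidence item names carrying the given scope tag."""
    return [name for name, s in items if s == scope]

# Foundational evidence is reusable; only application-specific work
# must be redone when the code is applied to a new problem.
reusable = by_scope(evidence, "foundational")
print(len(reusable), "evidence items carry over to a new application")
```

The design point is simply that tagging each piece of evidence by scope makes explicit how much of the quality pedigree is reusable and how much is tied to one intended use.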
Software quality is one of the obvious linchpins of the foundation for a code. The practices applied in the development of the code provide the degree of confidence in the code's correctness and stability. High-quality code practices provide important traceability for the overall pedigree of the simulation. These practices apply to every application of the code. We acknowledge that a subset of the software activities will be more directly relevant to a given activity and may be subject to different requirements.
Code verification is another simulation code quality practice that applies across the entire spectrum of potential applications. Code verification, in a nutshell, provides the evidence and confidence that the solution algorithm in the code is implemented correctly and that the given mathematical description is actually being solved. This is distinctive to numerical simulation and applies in a complementary manner to the overall software quality approach. Again, some of the code verification will be more specifically applicable to a certain problem attacked by the code.
The decomposition of validation into two sets allows part of the validation activity to be considered foundational. Many models are common to a large number of simulations and comprise the core of the modeling capability of the code. These models must be validated so that their fidelity can be fully assessed away from the intended application. Furthermore, these models can be compared to special-purpose experiments that have relatively small errors compared to many application settings. This separation allows some of the modeling capability to be assessed in a manner that should greatly increase confidence in the code. When these errors are convoluted with the integral, large-scale data from the applications, the source of discrepancy can become hidden.
Solution verification is the foundational practice for numerical error and uncertainty estimation. Too often the numerical error is simply calibrated for, or muddled with other modeling errors. The key to this step is the clear separation and articulation of this aspect of error and uncertainty apart from other sources. In many cases the solution verification may involve a simple bounding estimate that gives an idea of the magnitude of the effect on the results.
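One common way to obtain such a bounding estimate is Richardson extrapolation from solutions on systematically refined meshes. A minimal sketch, with illustrative (assumed) values for a quantity of interest computed on three grids each refined by a factor of two:

```python
import math

def richardson_error_estimate(f_coarse, f_medium, f_fine, r):
    """Estimate the observed order of convergence p and the discretization
    error remaining in the fine-grid result, given solutions on three grids
    related by a constant refinement ratio r (f_exact ~ f_fine + error)."""
    p = math.log(abs((f_coarse - f_medium) / (f_medium - f_fine))) / math.log(r)
    error = (f_fine - f_medium) / (r**p - 1)
    return p, error

# Illustrative values only: a quantity of interest on coarse, medium,
# and fine meshes, refinement ratio r = 2.
p, err = richardson_error_estimate(f_coarse=0.968, f_medium=0.992, f_fine=0.998, r=2)
print(f"observed order = {p:.2f}, fine-grid error estimate = {err:.4g}")
```

A practical grid convergence index (GCI) would multiply this raw estimate by a safety factor, but even the raw value gives the order-of-magnitude bound on numerical error that the text describes.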
The integral validation aspect of PCMM comes naturally to the nuclear engineer. The unnatural part of the exercise is properly casting the process with all the other elements of PCMM. PCMM is, in essence, the deconvolution of the many effects that often comprise the validation exercise. The foundational aspects of validation, uncertainty, and solution verification attempt to peel away this complexity, leaving the core of the error to be examined. The task with integral validation is first to understand how well the uncalibrated code models the circumstance, and then to calibrate the solution without undoing any of the foundational work. Thus the calibration can be fully exposed to scrutiny and can hopefully underpin the capacity to more directly attack the basis of any lack of credibility.
Application uncertainty and sensitivity analysis is then quite clearly defined.
Again, the separation of foundational model validation, with its requisite
uncertainty, gives a basis for attacking the specific application uncertainty with
clarity. Both aspects of uncertainty impact results and are included in the
assessment, but the goal of clearly identifying the application-model-specific
uncertainties is obtainable by following the structure outlined here.
User qualification has a large impact on the results, yet is rarely assessed. An
ideal case would be to have independent models of the same application. This is
often not possible, despite numerous studies showing that the user effect is large
(larger than model differences in some cases).
Finally, the entity paying for the application work has requirements. These
requirements should be assessed for how well the simulation has met them.
A PIRT should list all of the important physics phenomena occurring in the physical
event being simulated at the specified level of interest [18]. For each individual
phenomenon an assessment/ranking must be declared for the Importance of this
particular phenomenon in the application simulation or, in other words, what the
resulting consequence is if the phenomenon's model is wrong. The rankings are
specified as high (H), medium (M), or low (L). Then Adequacy determinations must
be made with respect to how well the phenomenon is represented with a mathematical
model, how well it is implemented within the simulation code of choice (Sierra/SM
for this project/example) [15], and the level of validation that has been performed
for the application space of interest. Once again a ranking is determined for each of
these three Adequacy areas.
For quick visual adequacy analysis, a color-coding scheme is used to identify
areas of inadequacy or gaps. Green signifies that the adequacy is acceptable; yellow
indicates that the adequacy is marginal and red indicates that the adequacy level
is unacceptable and needs to be addressed. The colors are assigned with respect to
the Importance ranking. If a phenomenon's Importance ranking is low, then any
level of adequacy is deemed acceptable. If the Importance ranking is medium,
then acceptable adequacy rankings are medium and high; an adequacy ranking of
low produces a yellow square that indicates a marginal adequacy level. When the
Importance ranking is high, then the only acceptable adequacy level is high,
earning a green square. For a medium adequacy determination a yellow square
is used; however, for a low adequacy assessment a red square is used to indicate
an unacceptable adequacy level for that particular category. A sample PIRT with
three identified physics phenomena is given in Table 47.3, showing various adequacy
levels with colored gap indicators.
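The color-assignment rule just described can be captured in a few lines. This is an illustrative sketch of the logic, not part of any formal PIRT tooling; the function and names are our own.

```python
# Rank ordering used by the PIRT scheme: low < medium < high.
RANKS = {"L": 0, "M": 1, "H": 2}

def gap_color(importance, adequacy):
    """Color-code a PIRT adequacy entry against the phenomenon's
    Importance ranking, following the scheme described above."""
    imp, adq = RANKS[importance], RANKS[adequacy]
    if imp == 0:          # low importance: any adequacy is acceptable
        return "green"
    if adq >= imp:        # adequacy meets or exceeds importance
        return "green"
    if adq == imp - 1:    # one level short: marginal
        return "yellow"
    return "red"          # two levels short: unacceptable gap
```

For example, a high-importance phenomenon with low adequacy yields a red square, matching the gap-flagging behavior of Table 47.3.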
For the presented project, an assessment of the numerical prediction of important
phenomenological aspects of an abnormal fracture problem was performed at the
start of the project and is presented in Table 47.4. Six different physics phenomena
were identified (large elastic-plastic deformation of metals, ductile material failure,
contact, friction between punch and test item, enforcement of boundary conditions,
inertial loads). All of them were rated of high importance except for friction
between the punch and test item, which was considered of medium importance.
Two areas of concern were quickly identified by their red coloration and required
attention: Ductile Material Failure and Enforcement of Boundary Conditions
with respect to validation adequacy. Both of these were of high importance but
ranked low on the validation adequacy scale. Since the intent of the project
47 Uncertainty Quantification's Role in Modeling and Simulation… 1603
was to better understand failure mechanics modeling and its limitations, validate
its applicability for the application being studied, and characterize its reliability
and usability, the fact that Ductile Material Failure was ranked inadequate was
worrisome and was addressed in the scope of the project.
The boundary condition assumption gap was also identified and investigated.
The model enforces a perfect non-slip boundary condition between the plate and
the anchoring fixture. In reality it is possible that the test article could slip in the
clamping fixture. Everything was done experimentally to minimize the slippage
during impact and penetration. Several experiments were performed to assess the
level of non-compliance with the non-slip boundary condition and to physically
interrogate this situation. The experimentalists deemed that the plate was not slipping
in the clamping fixture. Therefore, the theoretical boundary condition enforcement
was deemed acceptable, and the Importance ranking of Enforcement of Boundary
Conditions was downgraded to L in the final PIRT assessment (shown in black
print in Table 47.4), which elevated the validation adequacy coloration to green.
Here we give an example of what an actual capability assessment looks like in a bit
of detail. In this case we show the outcomes and process outline for the
QASPR project [16] and associated codes, Xyce [4] and Charon [7, 17] (Fig. 47.2).
The assessment was conducted at Sandia National Laboratories, Albuquerque, by
Laboratory staff. QASPR began as an admittedly ambitious effort to replace the
experimental testing of integrated circuits and associated semiconductor material
in a high-radiation environment. Sandia sought to replace the expensive and risky
SPUR reactor with an extensively validated and verified computational capability.
The specific assessment was for the maturity of the III-V Npn model predictability
within a defined threat environment for the Xyce and Charon codes, in particular
the device model for the MESA device developed using Npn InGaP/GaAs.
Fig. 47.2 QASPR: PCMM Kiviat plots showing the assessed status of Xyce and Charon (axes: VAL1, RGF1–RGF3, SVER1–SVER5)
The first thing to do is pull together a team to conduct the assessment, starting
with a lead stakeholder. In this case the stakeholder is acting as the customer for
the work done by the code; in other cases this may, or should, be another person
altogether. The assessment can take input from any member of the team, although
not every member need contribute to each part of the PCMM. Secondly,
we have a (trained) PCMM assessor who also acts as a moderator for the activity.
Next we have a PCMM subject matter expert (SME) whose expertise includes key
aspects of V&V. The team is then rounded out by SMEs in each major role for
the development and use of the code(s). In this case we have SMEs for UQ, V&V
analysis, experiments, applied circuit analysis, calibration, and code development for
both Xyce and Charon (different people).
The first step was to meet with the entire team to brief them on the process
and gain buy-in. In addition, the team receives training on the PCMM and the
assessment process. In the case of QASPR the project had already conducted an
extensive PIRT. Rather than repeat this work, the first step was to review and
vet the existing PIRT for the assessment that follows. The team then held a set
of meetings to conduct the assessment. It is important that the team be well
represented throughout. After a set of meetings the team prepared some focused
feedback on the state of the QASPR project's codes as well as the PCMM process
itself.
Part of this effort used an Excel spreadsheet as the vehicle for the assessment,
and much of the feedback keyed on this. There was a feeling that the language
could be clarified, particularly within the UQ section, and some elements seemed
to overlap one another. Overall, the Excel tool was judged to be very beneficial:
it helps capture a great deal of detail, but can be overwhelming at times during the
assessment. We might consider creating a reduced version for real-time assessment.
Overall it is a great organizational tool for the assessor. This sort of feedback should
always be included to continually improve the process.
A real positive is that the PCMM assessment generated discussion across
different teams consisting of analysts, developers, and experimentalists. As an
example of the discussion: for qualification, solution verification will be critical
(e.g., input/output file verification) and will need to be better formalized; analysts
are now aware of this, and QASPR will need to better formalize this workflow. We
need another assessment iteration to determine the value of some members of the
larger team for the purposes of assessment. Experimentalists seemed to be the odd
man out, but this may be due to QASPR's development of a Physical Simulation PCMM
(the PCMM ideas applied to physical testing or experimentation). The assessment
is also captured in some graphical output and overall spreadsheet content.
At the end of the assessment we crafted a path forward for further assessments:
1. First, review PCMM elements with the PCMM SME prior to assessment to ensure
all the descriptors are well understood. This was done in real time during the
assessment, which caused delays.
2. The Excel tool will be used for future assessments.
3. Evidence may be moved and maintained on a SharePoint site.
The next assessment will focus on Pnp devices and potentially circuits, and an update
of the Npn capability should also be done.
As tools, the frameworks of PIRT and PCMM are only as good as how they are
used to support high-quality work. Using these tools will not necessarily improve
your credibility, but rather help you assess it holistically. Ultimately the credibility
of your modeling & simulation capability is driven by the honesty and integrity of
your assessment. It is easy to discuss what you do well and where you have mastery
over a topic. The real key to assessment is the willingness to articulate
where an effort is weak and where the fundamental foundational knowledge in a
field is the limiting factor in your capacity for solving problems. The goal in
credibility assessment is not demonstration of mastery over a topic, but rather a
demonstration of the actual state of affairs so that decisions can be made with full
knowledge of the weight to place on, and the risks inherent in, modeling & simulation (M&S).
A topic like quality is highly subjective and certainly relative: considerations of
the problem being solved and of the M&S capabilities of the relevant field or fields
come into play. The frameworks serve to provide a basic rubric and a commonly
consistent foundation for the consideration of M&S quality. As such, PIRT and
PCMM are blueprints for numerical modeling. The craft of executing the M&S
within the confines of available resources is the work of the scientists and engineers.
These resources include the ability to muster effort towards completing work, but
also the knowledge and capability base that you can draw upon. The goal is to
provide an honest and holistic approach to guiding the assessment of M&S quality.
In the assessment of quality the most important task to accomplish is an honest
discussion of the technical elements of the work with complete disclosure of faults
and shortcomings. There are significant psychological and social factors that lead
to a lack of honesty in evaluation and assessment. No framework or system can
completely overcome such tendencies, but it might act as a hedge against the tendency
to overlook critical details that do not reflect well on the assessment. The framework
assures that each important category is addressed. The ultimate test of the overall
integrity and honesty of an assessment of M&S credibility depends upon deeper
technical knowledge than any framework can capture.
Quite often an assessment will avoid dealing with systematic problems for
a given capability that have not been solved sufficiently. Several examples of
this phenomenon are useful in demonstrating where it can manifest itself. In
fluid dynamics, turbulence remains a largely unsolved problem. Turbulence has
intrinsic and irreducible uncertainty associated with it, and no single model or
modeling approach is adequate to elucidate the important details. In Lagrangian
solid mechanics the technique of element death (where poorly deformed elements
are removed to allow the simulation to continue) is pervasively utilized for highly
strained flows where fracture and failure occur. It is essential for many simulations,
yet often renders a simulation non-convergent under mesh refinement. In both
cases the communities dependent upon utilizing M&S with these characteristics
tend to under-emphasize the associated systematic issues. This produces
a systematically higher confidence and credibility than is technically justifiable.
The general principle is to be intrinsically wary of unsolved problems in any given
technical discipline.
Together the PIRT and PCMM, adapted and applied to any M&S activity, form
part of the delivery of defined credibility for the effort. The PIRT gives context to
the modeling efforts and the level of importance and knowledge of each part of
the work. It is a structured manner for the experts in a given field to weigh in on the
basis for model construction. The actual activities should strongly reflect the
assessed importance and knowledge basis captured in the PIRT. Similarly,
the PCMM can be used for a structured assessment of the specific aspects of the
modeling and simulation.
The degree of foundational work providing the basis for confidence in the work
is spelled out in the PCMM categories. Included among these are the major areas
of emphasis, some of which may be drawn from outside the specific effort. Code
verification is an exemplar of this: its presence and quality provide a distinct
starting point for the specific aspects of estimating the numerical error for the
particular M&S activity being assessed. Each of the assessed categories forms the
starting point for the specific credibility assessment.
One concrete way to facilitate the delivery of results is the consideration of
the uncertainty budget for a given modeling activity. Here the delivery of results
using PIRT and PCMM is enabled by considering them to be resource guides
for the concrete assessment of an analysis and its credibility. This credibility is
quantitatively defined by the uncertainty and the intended application's capability
to absorb such uncertainties for the sort of questions to be answered or decisions
to be made. If the application is relatively immune to uncertainty, or only needs
a qualitative assessment, then large uncertainties are not worrisome. If, on the
other hand, an application is operating under tight constraints associated with other
considerations (sometimes called a design margin), then the uncertainties need to be
carefully considered in making any decisions based on modeling and simulation.
This gets to the topic of how modeling & simulation are being used. Traditionally,
M&S goes through two distinct phases of use. The first phase is dominated by
"what if" modeling efforts where the results are largely qualitative and exploratory
in nature. The impact of decisions or options is considered on a qualitative basis and
guides decisions in a largely subjective way. Here the standards of quality tend to
focus on completeness and high-level issues. As modeling and simulation proves its
worth for these sorts of studies, it begins to have greater quantitative demands placed
on it. This forms a transition to a more demanding case for M&S where design or
analysis decisions are made. In this case the standards for uncertainty become far more
taxing. This is the place where these frameworks become vital tools in organizing
and managing the assessment of quality.
This is not to say that these tools cannot assist in earlier uses of M&S. In
particular, the PIRT can be a great tool for determining modeling
requirements for an effort. Similarly, the PCMM can be used to judge the
appropriate level of formality and completeness for an effort. Nonetheless,
these frameworks are far more important and impactful when utilized for more
mature, engineering-focused modeling and simulation efforts.
Any high-level integrated view of credibility is built upon the foundation of the
issues exposed in PIRT and PCMM. The problem that often arises in a complex
M&S activity is managing the complexity of the overall activity. Invariably gaps,
missing efforts, and oversights will creep into the execution of the work. The basic
modeling activity is informed by the PIRT's structure: are there important parts
of the model that are missing, or poorly grounded in available knowledge? From
PCMM: are the important parts of the model tested adequately? The PIRT becomes
fuel for assessing the quality of the validation and for planning an appropriate level
of activity around important modeling details. Questions regarding the experimental
support for the modeling can be explored in a structured and complete manner.
While the credibility is not built on the PCMM and PIRT, the ability to manage
its assessment is enabled by the mastery they provide over the complexity of modeling
and simulation.
In getting to the quantitative basis for assessment of credibility, the definition of
the uncertainty budget for a computational simulation activity can be enlightening.
While the PCMM and PIRT provide a broadly encompassing, qualitative view of
M&S quality, the uncertainty budget is ultimately a quantitative
assessment of quality. Forcing the production of numerical values for the quality is
immensely useful and provides important focus. For this to be a useful and powerful
tool, the budget must be determined with well-defined principles and disciplined
decision-making.
One of the key principles underlying a successful uncertainty budget is the
determination of unambiguous categories for assessment. Each of these broad
categories can be populated with sub-categories, and finer and finer categorization.
Once an effort has committed to a certain level of granularity in defining uncertainty,
it is essential that the uncertainty be assessed broadly and holistically. In other
words, it is important, if not essential, that none of the categories be ignored.
This can be extremely difficult because some areas of uncertainty are truly
uncertain: no information may exist to enable a definitive estimate. This is the
core of the difficulty of uncertainty estimation, the unknown value and basis for
some quantitative uncertainties. Generally speaking, the unknown or poorly known
uncertainties are more important to assess than some of the well-known ones. In
practice the opposite happens: when something is poorly known, the value adopted
in the assessment is often implicitly defined quantitatively as zero. A well-defined
budget helps make such choices intentional and avoids unintentional oversights. Its
use can go great lengths to provide direct evidence of due diligence in the assessment
and to highlight the quality of the credibility provided to whomever utilizes the results.
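As a sketch of this principle, an uncertainty budget can be kept as an explicit ledger in which a missing estimate is recorded as unknown rather than silently treated as zero. The categories and numbers below are purely hypothetical, and the root-sum-square combination assumes independent contributions.

```python
import math

# Hypothetical uncertainty budget (relative uncertainties per category).
# None marks a category with no definitive estimate -- flagged, not zeroed.
budget = {
    "numerical error":        0.03,
    "model-form uncertainty": 0.05,
    "parameter uncertainty":  0.04,
    "experimental error":     None,
}

unassessed = [name for name, u in budget.items() if u is None]
# Root-sum-square combination of the assessed terms (independence assumed).
combined = math.sqrt(sum(u**2 for u in budget.values() if u is not None))

print(f"combined assessed uncertainty: {combined:.3f}")
print(f"categories lacking any estimate: {unassessed}")
```

The point of the ledger is the `unassessed` list: it turns the implicit zero into a visible item of due diligence for whoever consumes the results.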
We should not judge people by their peak of excellence; but by the distance they have
traveled from the point where they started.
Henry Ward Beecher
We believe it is time for the community to shoulder some of the blame and rethink
our approach to engaging other scientists and engineers on the topic of M&S quality.
V&V should be an easy sell to the scientific and engineering establishment. It
has not been, and it has often been resisted at every step. V&V is basically a
rearticulation of the scientific method we all learn, use, and ultimately love and
cherish. Instead, we find a great deal of animosity toward V&V, and outright
resistance to including it as part of the M&S product. To some extent it has been
successful in growing as a discipline and focus, but too many barriers still exist.
Through hard-learned lessons we have come to the conclusion that a large part
of the reason is the V&V community's approach. For example, one of the worst
ideas the V&V community has ever had is independent V&V. In this model V&V
comes in independently and renders a judgment on the quality of M&S. It ends up
being completely adversarial with the M&S community, and a recipe for disaster.
We end up less engaged and hated by those we judge. No lasting V&V legacy is
created through the effort. The M&S professionals treat V&V like a disease and
spend a lot of time trying to simply ignore or defeat it. This time could be better
spent improving the true quality, which ought to be everyone's actual objective.
Archetypical examples of this approach in action are federal regulators (the NRC,
the Defense Board, etc.; for a discussion of PCMM in a regulatory context see the
subsequent chapter by Mousseau and Williams). This idea needs to be modified
into something collaborative where the M&S professionals end up owning the quality
of their work, and V&V engages as a resource to improve quality.
The fact is that everyone doing M&S wants to do the best job they can, but
to some degree they don't know how to do everything. In a lot of cases they haven't
even considered some of the issues we can help with. V&V expertise can provide
knowledge and capability to improve quality if it is welcome and trusted. One
of the main jobs of V&V should be to build trust so that practitioners might provide
their knowledge to important work. In a sense, the V&V community should be quality
coaches for M&S. Another way the V&V community can help is to provide
appropriately leveled tools for managing quality. PCMM can be such a tool if
its flexibility is increased. Most acutely, PCMM needs a simpler version. Most
modeling and simulation professionals will do a very good job with some aspects
of quality. Other areas of quality fall outside their expertise or interest. In a very
real sense, PCMM is a catalog of quality measures that could be taken. Following
the framework helps M&S professionals keep all the aspects of quality in mind
and within reach. The V&V community can then provide the necessary expertise to
carry out a deeper quality approach.
If V&V allows itself to get into the role of judge and jury on quality, progress will
be poor. V&V's job is to ask appropriate questions about quality as a partner with
M&S professionals interested in improving the quality of their work. By taking this
approach we can produce an M&S future where quality continuously improves.
References
1. Boyack, B.E., Lellouche, G.S.: Quantifying reactor safety margins part 1: an overview of the code scaling, applicability, and uncertainty evaluation methodology. Nucl. Eng. Des. 119(1), 1–15 (1990)
2. Boyack, B., et al.: Quantifying reactor safety margins: application of code scaling, applicability, and uncertainty evaluation methodology to a large-break, loss-of-coolant accident. Nuclear Regulatory Commission, Washington, DC (USA). Div. of Systems Research (1989)
3. Babula, M., Bertch, W.J., Green, L.L., Hale, J.P., Mosier, G.E., Steele, M.J., Woods, J.: NASA standard for models and simulations: credibility assessment scale. In: AIAA Proceedings of the 47th AIAA Aerospace Sciences Meeting, AIAA-2009-1011 (2009)
4. Fixel, D., Hennigan, G., Castro, J., Lin, P.: Charon parallel semiconductor device simulator. SAND2010-3905, Sandia National Laboratories, Albuquerque (2010)
5. Hemez, F., Atamturktur, H.S., Unal, C.: Defining predictive maturity for validated numerical simulations. Comput. Struct. 88(7), 497–505 (2010)
6. Hills, R.G., Witkowski, W.R., Rider, W.J., Trucano, T.G., Urbina, A.: Development of a fourth generation predictive capability maturity model. Sandia National Laboratories, Albuquerque (2013)
7. Keiter, E.R., Thornquist, H.K., Hoekstra, R.J., Russo, T.V., Schiek, R.L., Rankin, E.L.: Parallel transistor-level circuit simulation. In: Simulation and Verification of Electronic and Biological Systems, pp. 1–21. Springer, Dordrecht (2011). http://dx.doi.org/10.1007/978-94-007-0149-6_1
8. Oberkampf, W.L., Roy, C.J.: Verification and Validation in Scientific Computing. Cambridge University Press, Cambridge (2010)
9. Oberkampf, W.L., Pilch, M.M., Trucano, T.G.: Predictive capability maturity model for computational modeling and simulation. Development 2 (2007)
10. Pace, D.K.: Comprehensive consideration of uncertainty in simulation use. J. Def. Model. Simul. Appl. Methodol. Technol. 10(4), 367–380 (2013)
11. Pilch, M.M., Trucano, T.G., Helton, J.C.: Ideas underlying quantification of margins and uncertainties (QMU): a white paper. Unlimited Release SAND2006-5001, Sandia National Laboratory, Albuquerque, 87185 (2006)
12. Pilch, M.M.: Predictive capability maturity model (PCMM). No. SAND2007-0102C. Sandia National Laboratories (2007)
13. Regulatory Guide 1.203: U.S. Nuclear Regulatory Commission Office of Nuclear Regulatory Research (2005)
14. Roache, P.J.: Verification and Validation in Computational Science and Engineering. Hermosa (1998)
15. Edwards, H.C., Stewart, J.R.: Sierra, a software environment for developing complex multiphysics applications. In: Computational Fluid and Solid Mechanics. Proceedings of the First MIT Conference, pp. 1147–1150 (2001)
16. Holmes, S.M.: Tough enough. Sandia Res. 2(2), 11–15 (2014)
17. Thornquist, H.K., Keiter, E.R., Hoekstra, R.J., Day, D.M., Boman, E.G.: A parallel preconditioning strategy for efficient transistor-level circuit simulation. In: Proceedings of the 2009
Abstract
This chapter describes the use of the Predictive Capability Maturity Model
(PCMM) (Oberkampf et al., Predictive capability maturity model for compu-
tational modeling and simulation. Technical report, SAND2007-5948, Sandia
National Laboratories, 2007) applied to a nuclear reactor simulation. The
application and PCMM will be discussed relative to review by the Nuclear
Regulatory Commission. In a regulatory environment, one takes on the role of
a lawyer presenting evidence to a judge with a prosecuting attorney allowed to
cross-examine. In this type of hostile environment, a structured process that
logically presents the evidence is helpful.
In addition, many simulations are now multi-scale, multi-physics, and multi-
code. For this level of complexity, it is easy to get lost in the details. The PCMM
method has been adapted for this multi-physics multi-code software. Since the
key is to provide the regulator with confidence that the software is capable of
predicting the quantity of interest (QoI) with a well-quantified uncertainty, the
PCMM approach is a natural solution.
Keywords
Bayesian calibration · Code scaling, applicability, and uncertainty (CSAU) · Dittus-Boelter correlation · Expert opinion distributions · Graphical user interface (GUI) · Joint distributions · Marginal distributions · Markov chain Monte Carlo (MCMC) · Neutronics · Numerical uncertainty · Phenomena
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1614
2 Uncertainty Quantification Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1615
2.1 Total Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1616
2.2 Wilks' Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1618
2.3 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1620
3 Prerequisite Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1620
3.1 Top-Down Parameter Exposure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1621
3.2 Bottom-Up Parameter Exposure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1621
3.3 General Parameter Exposure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1622
3.4 As Built . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1622
3.5 QoI Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1623
3.6 Parameter Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1623
3.7 User Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1630
4 Modifications to PCMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1631
4.1 Quantified Parameter Ranking Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1631
4.2 Validation Pyramid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1632
4.3 Partitioning of Code and Application PCMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1632
5 Sample Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1634
5.1 PIRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1634
5.2 Numerical Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1635
5.3 QPRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1635
5.4 PIRT and QPRT Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1636
5.5 Code PCMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1637
5.6 Application PCMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1640
5.7 Best Estimate Plus Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1640
6 Presentation of the Evidence File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1642
7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1642
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1643
A.1 Analysis of Wilks' Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1643
B.2 PIRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1645
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1648
1 Introduction
1. Code bugs: documentation and regression testing are two powerful tools that
fall under the heading of software quality engineering (SQE) to minimize code
bugs.
2. Code verification addresses the uncertainty due to mistakes in implementing
numerical methods.
3. Solution verification quantifies the uncertainty due to numerical methods
applied in suboptimal conditions like coarse mesh spacing and large time steps
required for solution speed.
4. Validation quantifies uncertainty due to the physics models. Verification asks,
"Are you solving the equations correctly?" Validation asks, "Are you solving the
correct equations?"
5. Parameter uncertainty quantifies the uncertainty due to uncertainty in model
parameters.
6. Calibration is the mathematical process by which one reduces uncertainty in
model parameters due to the addition of new experimental data.
These six topics are covered by the four PCMM evidence bins: code verification,
solution verification, model validation, and uncertainty quantification and sensitiv-
ity analysis. PCMM provides a structured approach to gathering the evidence and
presenting it in a logical fashion.
The remainder of this chapter is organized as follows. Section 2 provides
an initial overview of uncertainty quantification. This is followed in Sect. 3 by
a discussion of the prerequisite work required for conducting an uncertainty
quantification study. Sect. 4 introduces changes and improvements to the PCMM
process presented in [14]. A sample PCMM analysis is presented in Sect. 5, followed
in Sect. 6 by a discussion of the presentation of the evidence file. Conclusions are
provided in Sect. 7.
Before beginning an uncertainty quantification study, one implicitly assumes the following:
1. The code is relatively bug-free, and one knows what is in the code.
2. The equations in the code are solved correctly.
3. The correct equations are solved.
If any of these three statements is false, the potentially substantial parameter
perturbations of an uncertainty quantification study are likely to yield useless or
incorrect results. So before
1616 V.A. Mousseau and B.J. Williams
one begins an uncertainty quantification process, one needs to gather evidence that
the above three assumptions are true.
Statistical methods are sometimes employed by engineers because they sound
too good to be true: one can treat the simulation tool as a black box, study only
the parameters in the input processor, and, if utilizing Wilks' formula (see Sect. 2.2
and Appendix A.1), run the code only 59 times for uncertainty quantification. It is
important to understand the underlying assumptions of the statistical method and to
be convinced that it is being applied correctly.
It is normally assumed that the validation data covers the application space.
For many applications this is not the case. The experiments at relevant pressures,
temperatures, or flow rates are either too expensive or too dangerous to perform.
In this case one must use scaling analysis to extrapolate the empirical correlations
calibrated to the validation experiments to the range of applicability needed for the
application. This creates a new mode of uncertainty that must be understood and
addressed.
The first step in the total uncertainty framework is to conduct thorough verifica-
tion analysis of the code(s) under study. Numerical error is used as a surrogate for
numerical uncertainty (note that numerical error is usually better quantified as a
systematic bias than as a random uncertainty, although numerical error models could
incorporate both components). Chapters II and III in [10] provide formal procedures
for verification assessments. Numerical errors that are too large compared against
desired criteria indicate that additional numerical work is required before moving
on to the next step in the total uncertainty framework. For later use, a coarse
metric measuring numerical error is calculated by taking the norm of the difference
between the computed QoI and the computed QoI on an infinitely refined mesh:
‖QoI_computed − QoI_{Δx→0}‖ = numerical error.  (48.1)
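As a small illustration of Eq. (48.1), the mesh-converged QoI can be approximated by Richardson extrapolation from solutions on three successively refined meshes. The sketch below assumes a fixed refinement ratio of 2 and a manufactured QoI with known second-order behavior; it is not taken from any particular code.

```python
import math

def richardson_error(q_h, q_h2, q_h4):
    """Estimate |QoI_computed - QoI_{dx->0}| from three solutions on
    meshes of spacing h, h/2, h/4 (refinement ratio 2)."""
    # Observed order of convergence p from the three-grid formula.
    p = math.log(abs(q_h - q_h2) / abs(q_h2 - q_h4)) / math.log(2.0)
    # Richardson-extrapolated (mesh-converged) QoI.
    q_exact = q_h4 + (q_h4 - q_h2) / (2.0**p - 1.0)
    # Coarse error metric of Eq. (48.1), evaluated on the finest mesh.
    return abs(q_h4 - q_exact), p

# Manufactured example: QoI(h) = 1.0 + 0.5*h**2 (second-order scheme).
qs = [1.0 + 0.5 * h**2 for h in (0.2, 0.1, 0.05)]
err, p = richardson_error(*qs)
print(p)    # ~2.0: recovered convergence order
print(err)  # numerical-error estimate on the finest mesh
```

If the estimated error exceeds the desired criteria, additional mesh refinement is needed before moving on in the total uncertainty framework.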
‖QoI_computed − QoI_bound‖ = model parameter error.  (48.3)
The use of Wilks' formula [15] has become popular in recent years in the nuclear
power industry. The original work from 1941 was designed for use on assembly
lines, where the parameters were controls that allowed for fine adjustment. The
distribution of each parameter was well known, since it was uniform over the range
of the physical control knob. Wilks' formula is mathematically solid and a very
useful tool when properly applied.
The challenge in properly applying Wilks' formula to software instead of an
assembly line comes from meeting the implied assumptions. In the four subsections
below, the four main assumptions behind the use of Wilks' formula in a software
context are discussed.
Specifying the joint parameter distribution is another difficult requirement for expert
opinion-based specifications.
It is interesting to observe that poor choices of parameter ranges and the joint
distribution function may be the cause of code crashes. This leads to a nonlinear
feedback into the system where the expert opinion distribution is then modified
to eliminate the code crashes. This process of shrinking the expert opinion-based
parameter ranges until code crashes are eliminated is in reality a measure of code
robustness and has little to do with uncertainty quantification.
2.2.4 No Biases
The machinery in Wilks' formula is not designed to deal with biases. This is a
particularly hard obstacle for simulation tools to overcome because all numerical
errors (inside the asymptotic convergence range) show up as a
bias. That is, as the time step and mesh spacing are reduced, the code approaches
the exact solution monotonically. Model-form error also has this same signature in
that it introduces a bias to the solution.
2.2.5 Application
Notwithstanding the above, Wilks' formula can still be applied if systematic bias
adjustments to code output are obtainable through statistical inference. Wilks'
formula is a valuable tool in statistical analysis. In its intended application to
assembly lines, the required knowledge of parameters and their distributions was
easy to come by. One should use Wilks' formula with caution when applying
it to software, where knowing all of the parameters and their distributions is not
practical. Note that there is no provision in Wilks' formula for analyzing only the
important parameters; all parameters that impact the QoI must be analyzed.
It should be recognized that many mathematical formulas still provide useful
knowledge when applied in settings where all of their underlying assumptions
cannot be rigorously verified. Although there are certainly cases where Wilks'
formula is not conservative, current experience indicates that it is accurate or
conservative when applied to software. More detail about Wilks' formula is
contained in Appendix A.1.
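The familiar 59-run figure quoted in Sect. 2.2 follows directly from the first-order, one-sided form of Wilks' formula: the sample maximum of n independent runs bounds the 95th percentile of the QoI with 95% confidence once 1 − 0.95ⁿ ≥ 0.95. A minimal sketch (the function name is illustrative):

```python
def wilks_n(coverage=0.95, confidence=0.95):
    """Smallest number of independent code runs n such that the sample
    maximum bounds the `coverage` quantile of the QoI with probability
    `confidence` (first-order, one-sided Wilks' formula)."""
    n = 1
    while 1.0 - coverage**n < confidence:
        n += 1
    return n

print(wilks_n())            # -> 59, the familiar 95/95 run count
print(wilks_n(0.95, 0.99))  # -> 90 runs for a 95/99 statement
```

Note that this count is independent of the number of uncertain parameters, which is part of the formula's appeal; the assumptions discussed above are what limit its applicability.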
2.3 Scaling
Often, due to cost or safety constraints, one cannot perform the validation experiments
at the correct power, pressure, or temperature. For single-physics applications, one
can often define nondimensional parameters that describe how information can be
scaled from one part of state space to another. The Reynolds number in CFD is a
good example of how physics can be accurately scaled.
This, however, becomes significantly more complicated for multi-physics appli-
cations. There may be a dozen different nondimensional numbers that describe
a multi-physics simulation. It is often impossible to match all of the relevant
nondimensional numbers. In that case, the experimentalist will attempt to match
the important nondimensional numbers and provide justification that the others do
not have a significant impact on the solution.
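As a small illustration of the single-physics scaling argument, matching the Reynolds number Re = ρuL/μ lets a reduced-scale experiment reproduce full-scale flow conditions. The numbers below are illustrative, not from any particular facility:

```python
def reynolds(rho, u, L, mu):
    """Reynolds number Re = rho * u * L / mu."""
    return rho * u * L / mu

def matched_velocity(rho, L, mu, re_target):
    """Test-section velocity that reproduces a target Reynolds number."""
    return re_target * mu / (rho * L)

# Full-scale pipe flow (illustrative water properties, SI units).
re_full = reynolds(rho=1000.0, u=2.0, L=0.5, mu=1.0e-3)
# A half-scale experiment must double the velocity to match Re.
u_model = matched_velocity(rho=1000.0, L=0.25, mu=1.0e-3, re_target=re_full)
print(re_full)  # 1,000,000
print(u_model)  # 4.0 m/s
```

For multi-physics problems, each additional nondimensional group imposes a constraint of this kind, and in general they cannot all be satisfied simultaneously.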
It is not easy to defend applying empirical formulas outside of their range of
applicability. The more physics that can be incorporated into the model, the easier
it is to justify using an empirical model to extrapolate outside of its validation data
range. Well-designed scaling arguments can provide a justification for extrapolation.
However, these arguments are easy for a regulator to question.
Whenever possible, respect the validation data range; when this is not possible,
ensure that an easily defensible scaling argument justifies extrapolation. It should be
noted that CSAU is one of the predecessors of the PCMM process. In CSAU, scaling
was considered one of the most important discussions in terms of establishing
confidence in software.
3 Prerequisite Work
Determining which physical phenomena are important is the first step in any uncer-
tainty quantification study. This is accomplished through the PIRT process. However, this
is just the start of a long labor-intensive process. One then needs to find the models in
the code that describe the important physical phenomena and expose the parameters
in these important models for uncertainty quantification.
In many engineering applications, the models are designed one way in the journal
publication but implemented a different way in the software. Testing what the model
was supposed to be, based on the journal paper, is not valuable. One needs to test
the model as built based on how it was implemented in the software.
In addition to the parameters, the QoIs also need to be exposed for study. This can
be an error-prone process if the QoIs are computed from the graphics output files
instead of directly from the simulation code. The quantity and quality of the data in
a graphics file may not support accurate calculation of the QoI. This produces a new
mode of uncertainty related simply to errors in computing the QoI.
Assuming all the parameters and QoIs are exposed for study, the next step
is to construct parameter distribution functions. Traditionally this is done based
on expert opinion. However, Bayesian methods have the ability to construct (not
necessarily analytically) the parameter distribution functions that can then be used
for uncertainty quantification. This Bayesian process requires large amounts of
48 Uncertainty Quantification in a Regulatory Environment 1621
In legacy multi-physics codes, there can be a large number of physical models, and
each of these models can have many parameters. There is a natural way to prune
the parameter tree hierarchy that was initially published in [7]. A key realization
is that there are a small number of classes of physical phenomena. One can then
determine the sensitivity of each class. If the sensitivity of a class is small, then one
does not need to analyze the parameters that make up the models in that class.
For example, wall friction is a class. One can construct a tree of wall friction
by first separating it into single-phase and two-phase wall friction. These then
decompose into laminar and turbulent wall friction. The laminar and turbulent wall
frictions then decompose further into flow regimes. This tree structure can require a
large number of lines of code and a large number of parameters to describe.
In the top-down approach, a sensitivity multiplier is applied to the top of the tree,
namely, wall friction. If perturbations on wall friction do not impact the QoI, then
there is no need to invest any further in uncertainty quantification of the lower-level
models. Often a general purpose tool will have a large number of models, and only a
small number may be employed for any given application. The top-down approach
takes advantage of this idea to minimize the amount of uncertainty quantification
work.
It is important to note that if the class wall friction is important, then the same
top-down idea can be used to determine important subclasses. For example, one
could next determine the importance of single-phase and two-phase wall friction.
Recursive application of the top-down approach will continually define smaller and
smaller sections of the physical models that need to be studied.
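The top-down pruning described above can be sketched as follows; the QoI function and class multipliers are hypothetical stand-ins for a real simulation:

```python
def qoi(multipliers):
    """Toy QoI: pressure drop built from two closure classes.
    `multipliers` scale each class at the top of its tree (hypothetical model)."""
    wall_friction = multipliers["wall_friction"] * 120.0  # Pa
    form_loss = multipliers["form_loss"] * 3.0            # Pa
    return wall_friction + form_loss

nominal = qoi({"wall_friction": 1.0, "form_loss": 1.0})

# Perturb each class multiplier by +10% and record the relative QoI change.
sensitivity = {}
for cls in ("wall_friction", "form_loss"):
    mult = {"wall_friction": 1.0, "form_loss": 1.0}
    mult[cls] = 1.1
    sensitivity[cls] = abs(qoi(mult) - nominal) / nominal

print(sensitivity)
# wall_friction dominates; the form_loss subtree can be pruned from the study,
# while the wall-friction subtree is recursively decomposed further.
```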
Once an important physical phenomenon has been identified from the top-down
approach, the bottom-up approach is then applied. In the bottom-up approach,
one starts with a single model for an important physical phenomenon. One now
determines all of the parameters in that model and their acceptable ranges and
distributions. Another important part of the bottom-up approach is to define the
range of applicability of the model.
Many of the closure laws are empirical in nature. This means that they are only
proven accurate inside of the validation data range used to create the empirical
model. In other words, empirical correlations work well to interpolate experimental
data. However, it is important to know when an empirical model is being
extrapolated outside of its range of applicability. At a minimum, the code user
needs to be warned that this is happening and ideally the uncertainty should
be increased when an empirical correlation is employed outside of its range of
applicability. Furthermore, application of an empirical correlation outside its range
of applicability may introduce undetectable bias into the calculation that may grow
with extrapolation distance.
One can only quantify the uncertainty of parameters that have been exposed for study.
Often in legacy codes, only a small percentage of parameters are exposed for study.
That means that one needs to modify the source code to allow for the parameters to
be perturbed. Depending on the complexity and level of documentation, this can be
a very labor-intensive process.
It is important to recognize that an uncertainty quantification exercise on the
preexisting exposed parameters may provide false information. If only a few of the
important parameters are available for study, there is little to no scientific value
in a partial uncertainty quantification exercise. One needs to follow a dependable
process, like the top-down and bottom-up process defined previously, to demonstrate
that all important parameters are studied.
There is a downside to parameter exposure that has to be understood and
minimized if possible. Traditionally, uncertainty quantification software works off
the code input processor. One perturbs parameters for study by changing their value
in the input file. There is a good reason why many important parameters are not
exposed for study in the input file. The quality of the software is determined by
its physical models. One does not want a code user to be able to modify the code
physics. This defeats the idea of software quality engineering that ensures that the
code models are correct. If the code user can change the physics through input, then
one cannot build a pedigree for the code physics.
Whenever possible, it is best to expose parameters for study through a mechanism
other than the input file. One wants a file that is hard to read, hard to modify, and
ideally unknown to the code user. One
needs to recognize that parameter exposure is important for accurate uncertainty
quantification, but one also needs to be cautious how many tunable parameters
are exposed to the code user.
3.4 As Built
It can be difficult even to recognize a closure law in the software, especially in
legacy software. The closure law in the journal paper has only one requirement: the
correlation needs to approximate the experimental data.
The closure law in the software has many requirements that include derivatives
and correct asymptotic behavior. This is often exacerbated by applying the closure
law outside of its range of applicability. Every time the code crashes in the closure
law, some code developer has to bulletproof the correlation. This bulletproofing
exercise can make the relation between what is in the code and what is in the journal
paper hard to reconcile.
Closure laws are often based on assumptions such as steady-state and fully devel-
oped flow. This results in discontinuities in time and space in the closure laws. Dis-
continuities in time are often dealt with by under-relaxation. This process smooths
the discontinuous closure law in time but as a consequence modifies the closure
law physics. The discontinuities in space are often dealt with by applying ramps
between closure laws. Here, one smooths the closure laws by taking an average of
two adjacent control volumes. This ramp or averaging process also modifies the
closure law physics.
Therefore to assess the uncertainty contribution of a closure law, it is important
to look at the software and study the closure law as built, not as designed in the
journal paper.
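The time under-relaxation described above can be sketched as follows; the regime switch and relaxation factor are illustrative:

```python
def under_relax(old, target, alpha=0.2):
    """One under-relaxation step: blend the discontinuous closure-law value
    `target` with the previous value `old` (alpha = relaxation factor)."""
    return old + alpha * (target - old)

# A closure law that switches regimes discontinuously at t = 5.
raw = [100.0 if t < 5 else 400.0 for t in range(10)]

smoothed, value = [], raw[0]
for target in raw:
    value = under_relax(value, target)
    smoothed.append(value)

print(smoothed)
# The jump at t = 5 is spread over several time steps: smoother numerics,
# but the effective closure-law physics is modified, as noted above.
```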
One of the challenges for uncertainty quantification (as well as verification and
validation) is how to compute the QoI. In an ideal situation, the simulation tool
would compute the QoI in a manner that was consistent with the numerical methods,
the computational grid, and the physical models of the simulation software. This is
not always possible. What is more often the case is that the code exports the required
information to an output file and then the uncertainty quantification software must
build the QoI from the output file. This process can be error prone because there
are often approximations made to minimize the size of the output file. Unless
these approximations are made clear, the QoI constructed from the approximated
output files may contain significant error. This can lead to large perceived errors
in verification and validation when, in reality, the error arose from how the output
files were written by the code and how they were interpreted by the uncertainty
quantification software.
Theory
Denote the vector of model parameters to be calibrated by θ. Model and experimen-
tal outputs are represented by f(x, θ) and y, respectively, where x quantifies input
settings describing the experimental scenario. With ε denoting observational error,
in the absence of systematic model bias, the n experimental and model outputs are
related as follows:

y_i = f(x_i, θ) + ε_i,   i = 1, …, n.
The prior distribution of the parameters θ and the error variance σ² is taken to factor as

π(θ, σ²) = π₁(θ) π₂(σ²),

with

π₁(θ) = ∏_{i=1}^{n_θ} π_{1i}(θ_i),

where n_θ is the number of parameters and π_{1i}(·) is the marginal distribution of
θ_i, i = 1, …, n_θ. The ultimate goal of Bayesian calibration is to determine the
posterior distribution of the parameters θ and σ², denoted π(θ, σ² | y) and given by

π(θ, σ² | y) = L(θ, σ² | y) π(θ, σ²) / c.

That is, the posterior distribution is proportional to the likelihood function mul-
tiplied by the prior distribution. In other words, the experimental data y are used
to update prior knowledge about the parameters θ and σ² through the likelihood
function L(θ, σ² | y). Relative to the prior distribution, values of θ and σ² associated
with greater likelihood values (i.e., more likely to have generated the observed data)
are more probable.
The posterior distribution of θ and σ² is generally not available analytically due
to the fact that the normalizing constant

c = ∫ L(θ, σ² | y) π(θ, σ²) dθ dσ²
Uncertainty about model output is then obtained by summarizing the sample of model outputs
{g(θ⁽ⁱ⁾), i = 1, …, M}. MCMC and predictive inference are infeasible if the model
f(·) computes slowly for each parameter setting θ. In this case, a physical or
mathematical surrogate for f(·) must be employed in both Bayesian calibration
and uncertainty quantification.
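A minimal sketch of the calibration machinery above, using a random-walk Metropolis sampler on a deliberately trivial one-parameter model f(x, θ) = θx; the model, synthetic data, and tuning constants are all illustrative, and a real study would use a surrogate for an expensive f:

```python
import math
import random

random.seed(0)

def f(x, theta):
    """Hypothetical stand-in for the simulation model f(x, theta)."""
    return theta * x

# Synthetic experimental data y_i = f(x_i, theta_true) + eps_i.
theta_true, sigma_true = 2.0, 0.1
xs = [0.1 * i for i in range(1, 21)]
ys = [f(x, theta_true) + random.gauss(0.0, sigma_true) for x in xs]

def log_post(theta, sigma2):
    """Log posterior: Gaussian likelihood times a flat prior on a bounded box."""
    if not (0.0 < theta < 10.0 and 1e-6 < sigma2 < 10.0):
        return -math.inf
    return sum(-0.5 * math.log(2.0 * math.pi * sigma2)
               - (y - f(x, theta))**2 / (2.0 * sigma2)
               for x, y in zip(xs, ys))

# Random-walk Metropolis over (theta, sigma^2).
theta, sigma2, chain = 1.0, 1.0, []
lp = log_post(theta, sigma2)
for _ in range(20000):
    t_prop = theta + random.gauss(0.0, 0.05)
    s_prop = sigma2 + random.gauss(0.0, 0.05)
    lp_prop = log_post(t_prop, s_prop)
    if math.log(random.random()) < lp_prop - lp:      # accept/reject step
        theta, sigma2, lp = t_prop, s_prop, lp_prop
    chain.append(theta)

post_mean = sum(chain[5000:]) / len(chain[5000:])     # discard burn-in
print(post_mean)  # close to theta_true = 2.0
```

Summaries of the retained chain (means, quantiles, histograms) play the role of the posterior sample {θ⁽ⁱ⁾} described in the text.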
When systematic bias is present in model calculations, the statistical model used
to describe the experimental data must be extended. This can be accomplished
in several ways depending on the nature of the bias. Here a particular case is
considered in which the n-vector of experimental data was collected across `
laboratories (ℓ < n):

y = (y₁, y₂, …, y_ℓ),

where y_j collects the observations from laboratory j.
Application
Mixed effects models as described in the previous section were utilized to calibrate
three parameters of the Dittus-Boelter correlation for wall heat transfer and two
parameters of the McAdams correlation. The Dittus-Boelter correlation has the form

f_DB(x, θ) = θ₁ x₁^θ₂ x₂^θ₃,
where x₁ is the Reynolds number and x₂ is the Prandtl number. The nominal value
of θ is set to θ₀ = (0.023, 0.8, 0.4), and the prior distribution of θ is taken to
be uniform on the hyperrectangle defined by the Cartesian product of the intervals
[0, 2θ₀,ᵢ] for i = 1, 2, 3. The McAdams correlation has the following form:

f_M(x, θ) = θ₁ x^θ₂,

where x is the Reynolds number.
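The two correlations and the expert-style uniform prior can be sketched as follows; the power-law forms mirror the standard Dittus-Boelter and McAdams correlations as reconstructed above, and the Re and Pr values are illustrative:

```python
import random

random.seed(1)

def dittus_boelter(re, pr, theta):
    """Nu = theta1 * Re**theta2 * Pr**theta3 (reconstructed form)."""
    return theta[0] * re**theta[1] * pr**theta[2]

def mcadams(re, theta):
    """Nu = theta1 * Re**theta2 (reconstructed two-parameter form)."""
    return theta[0] * re**theta[1]

theta0 = (0.023, 0.8, 0.4)  # nominal Dittus-Boelter parameters

def sample_prior():
    """Uniform prior on the hyperrectangle [0, 2*theta0_i], as in the text."""
    return tuple(random.uniform(0.0, 2.0 * t) for t in theta0)

nu_nominal = dittus_boelter(re=2.0e4, pr=5.0, theta=theta0)
nu_prior = [dittus_boelter(2.0e4, 5.0, sample_prior()) for _ in range(1000)]
print(nu_nominal)                    # nominal Nusselt number, about 121
print(min(nu_prior), max(nu_prior))  # wide spread under the expert-style prior
print(mcadams(2.0e4, theta0[:2]))    # McAdams evaluated at illustrative values
```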
[Scatter plots: Nusselt number versus Reynolds number; the left legend identifies Experiments 1–3 and the right legend Experiments 1–11.]
Fig. 48.1 Experimental data used for Dittus-Boelter (left) and McAdams (right) calibrations
Fig. 48.2 Bivariate projections of posterior samples of the common θ (black) and the perturbed θ
corresponding to each laboratory (see legends) from the Dittus-Boelter calibration
Fig. 48.3 Posterior samples of the common θ (black) and the perturbed θ corresponding to each
laboratory (see legend) from the McAdams calibration
Marginal Distributions
Assume the parameters are statistically independent, having marginal distributions
obtained from Bayesian calibration to experimental data. The support of these
distributions is represented as the red bounding box in Fig. 48.4. The experiment-
based parameter distributions are now peaked within their ranges, reflecting
marginal preferences for parameter values whose output is more consistent
with the experimental data. Sampling these marginal distributions thus results in a
lower likelihood of bad simulation results relative to the expert opinion case.
Joint Distributions
Joint distributions arise from analysis of the interdependency among parameters.
The support of the joint distribution is represented as the solid green trapezoid in
Fig. 48.4. Here, using experimental data, the relationship among the parameters is
recognized. This results in a smaller probabilistic volume to sample from and should
eliminate all code crashes due to unphysical parameter choices. It should be stressed
that the assumption of parameter independence is essential for expert opinion-based
parameter distributions because it is impossible, without detailed mathematical
analysis of the experimental data, for an expert to intuit the relationship among
parameters.
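The effect described above, a smaller probabilistic volume once parameter dependence is recognized, can be illustrated with a toy output that depends on the difference of two strongly correlated parameters; all of the distributions below are hypothetical:

```python
import random

random.seed(2)

def qoi(a, b):
    """Toy code output that depends on the difference of two parameters."""
    return 100.0 + 50.0 * (a - b)

# Suppose calibration reveals that a and b are strongly positively correlated.
def sample_joint():
    a = random.gauss(1.0, 0.1)
    b = a + random.gauss(0.0, 0.02)  # b tracks a closely
    return a, b

# Independent (expert-opinion style) sampling ignores that relationship.
def sample_marginal():
    return random.gauss(1.0, 0.1), random.gauss(1.0, 0.1)

def spread(samples):
    m = sum(samples) / len(samples)
    return (sum((s - m)**2 for s in samples) / len(samples))**0.5

joint = [qoi(*sample_joint()) for _ in range(5000)]
indep = [qoi(*sample_marginal()) for _ in range(5000)]
print(spread(joint))  # small output uncertainty
print(spread(indep))  # much larger, more conservative uncertainty
```

This mirrors the comparison of the red, green, and blue distributions in Fig. 48.5: the joint distribution concentrates the samples where the parameters are mutually consistent.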
In the first analysis using the Bayesian calibration results, marginal posterior
samples were combined across parameters to form a joint sample. The resulting
distribution for maximum fuel temperature is shown in green in Fig. 48.5. The
maximum fuel temperature distribution remains right skewed, as in the expert
opinion case, but is now unimodal due to the much lower probability assigned to
unphysical parameter settings. It is interesting to note that the range of the maximum
fuel temperature distribution is not much different from that of the expert opinion-
based distribution shown in red.
In the second analysis using the Bayesian calibration results, the maximum fuel
temperature distribution was computed based on the joint parameter distribution.
This is shown in blue in Fig. 48.5. A very tight distribution function is now seen
with a nominally Gaussian shape. All sampled parameter settings in this case are
probabilistically consistent with the experimental data.
It is important to note that a significant reduction in uncertainty, and a resulting
larger margin, was accomplished simply through improved parameter distributions.
The lower uncertainty resulting from the joint parameter distribution is easier to
defend to a regulator than the larger expert opinion-based uniform distribution.
As long as the expert opinion-based independent uniform parameter distribution
is larger than the real parameter distribution based on experimental data, it will
typically (under weak assumptions of code input-output behavior) result in a more
conservative estimate of the output uncertainty.
There is recognition in the nuclear power industry that there is a mode of uncertainty
that comes from the code user called the user effect [12]. Based on the training of the
code user and the level of code documentation for user guidelines and best practices,
two different code users can get two different answers given the same simulation
problem. This difference in answers is called the user effect.
To minimize user effect, one needs to eliminate, as much as possible, tunable
parameters. Tunable parameters allow the code user to dial in whatever answer
they expect. They can force the simulation answer to behave the way the code user
expects. This overriding of the code physics minimizes the work on code quality
that provided the pedigree asserting the code physics was correct. If parameters
are put into the input file for uncertainty quantification, a large number of tunable
parameters can be created.
If ten people, given the same problem definition and the same software, get ten
different answers, this typifies user effect. Code documentation that discusses best
practices and user guidelines is the key to minimizing user effect. This can also be
minimized by expanded input processing such as a graphical user interface (GUI).
The GUI can have best practices and user guidelines built into the default values
which reduce the number of parameters code users can change.
4 Modifications to PCMM
The PCMM process is evolving and can be modified to better suit the application.
The following is a list of modifications, changes, and improvements to the PCMM
process for this application:
1. Quantified Parameter Ranking Table (QPRT): the QPRT augments the informa-
tion from the PIRT and provides a more complete picture of the importance of
physical models.
2. Validation pyramid: a validation pyramid decomposes the QoI into its individual
components.
3. Partitioning of code and application PCMM: PCMM applied to an application
drives the PCMM analysis of the individual codes.
The PIRT plays a key role in analyzing the predictability of software. The PIRT
process consists of a team of experts defining which physical phenomena are
important to correctly model for a given application and the set of QoIs. The experts
list the important phenomena and then rank them in terms of each phenomenon's
impact on each QoI (high, medium, or low) and then on the available knowledge
to model that phenomenon (high, medium, or low). Resources for modeling and
simulation are then focused on phenomena that have high importance and low
knowledge.
The challenge of a PIRT is that it is very subjective, and there can be a large
uncertainty in the PIRT process based on one's choice of experts. Therefore the
PIRT process is augmented with a QPRT. The QPRT is a sensitivity analysis based
on the parameters in the software. This is a quantitative process that is repeatable
by different investigators (the expert bias has been removed). One can numerically
measure the impact of any given parameter on each QoI. These parameters can
then be ranked ordinally from most important to least important. Given a working
knowledge of the simulation code, each of these parameters can then be mapped
back to a model of a physical phenomenon. In this way one can construct a list of
the phenomena that the software itself indicates are important.
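A QPRT-style ranking can be sketched with one-at-a-time perturbations; the QoI function and parameter names below are hypothetical:

```python
def qoi(params):
    """Toy simulation QoI as a function of exposed model parameters."""
    return (300.0 + 80.0 * params["wall_heat_transfer"]
            + 5.0 * params["wall_friction"]
            + 0.5 * params["interfacial_friction"])

nominal = {"wall_heat_transfer": 1.0, "wall_friction": 1.0,
           "interfacial_friction": 1.0}
base = qoi(nominal)

# One-at-a-time +10% perturbations; rank by absolute QoI change.
impact = {}
for name in nominal:
    p = dict(nominal)
    p[name] *= 1.1
    impact[name] = abs(qoi(p) - base)

qprt = sorted(impact, key=impact.get, reverse=True)
print(qprt)  # ordinal ranking, most to least important
```

Because the ranking is computed from the code itself, it is repeatable by different investigators and can be compared directly against the expert-based PIRT.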
One is then left with the task of making the PIRT and QPRT agree. It is possible
that an important physical phenomenon is ranked high in the PIRT but does not
show up in the QPRT because a model for that phenomenon was not included in
the code. This means that new models have to be added to the code. Sometimes the
phenomena defined by the experts are not relevant to the simulation. For example,
an expert may believe that interfacial friction is a key phenomenon; however, if the
flow is single phase, then two-phase phenomena are not important.
There is another process that can be used to assess the quality of the PIRT and
QPRT, and that is sensitivity experiments. In a sensitivity experiment, one makes
small perturbations to the nominal experiment based on a physical phenomenon.
One then experimentally confirms that perturbations in the phenomenon have a large
or small effect on the QoI measured in the experiment.
If the PIRT, QPRT, and experimental sensitivity study all agree on what phenom-
ena are important and are not important, then one can proceed with confidence that
the important phenomena have been correctly identified. When the theoretical expert
opinion-based PIRT and the mathematical and computer science-based QPRT agree
with the experimental-based sensitivity studies, it is an easy process to convince a
regulator that the important phenomena have been captured.
The validation pyramid shown in Fig. 48.6 is constructed by starting with the QoI at
the top and then decomposing the physics into its component parts. In the example
given here, the QoI is the fuel temperature. The fuel temperature is computed based
on the fuel code, the neutronic code which provides the heating, and the thermal
hydraulic code which provides the cooling. In addition this process has a variety
of code coupling parameters and some nonlinear feedback mechanisms. The heat
source from the neutronics is a nonlinear function of the coolant density and the fuel
temperature. The neutronics generates the heat which is passed into the fuel which
then conducts it to the clad which is then removed by the fluid through convective
heating and boiling. The validation pyramid pictorially describes all of the physical
processes that need to be validated and all of the coupling and nonlinear feedback
in the multi-physics code.
For a given application, one can divide the PCMM work between the top of the
validation pyramid, which is specific to the application and the code coupling, and
the bottom of the validation pyramid which is foundational and is independent of
48 Uncertainty Quantification in a Regulatory Environment 1633
Fig. 48.6 Validation Pyramid
the application. This way one can partition PCMM work between the separate codes
that make up the multi-physics application (in this case, fuels, thermal hydraulics,
and neutronics) and the application itself, which is computed from the coupled codes.
For a single QoI and a single application, this appears to be a somewhat arbitrary
division. However, when one considers multiple QoIs and multiple applications, the
value becomes more clear.
Each application and subsequent QoI clearly define a set of requirements of
the individual codes. The validation pyramid precisely defines which physics are
required by each separate code to construct the QoI for a given application. The
individual codes can then build a foundational PCMM analysis that establishes
the pedigree for the low-level capability required for the application. If there
are multiple QoIs and multiple applications, then the code-level requirements are
constructed from the union of all of the individual application and QoI requirements.
This builds a set of application-independent requirements that have their maturity
assessed by the code PCMM. The code PCMM is at a low level so validation is
based on separate effects experiments.
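The union construction described above can be sketched in a few lines; the application, QoI, and requirement names below are illustrative, not taken from the handbook.

```python
# Sketch: code-level requirements are the union of the per-application,
# per-QoI requirement sets (all names are hypothetical).
app_requirements = {
    ("assembly", "fuel_temperature"): {"fuel_conduction", "gap_conductance",
                                       "convective_heat_transfer"},
    ("assembly", "reactivity"): {"cross_sections", "doppler_feedback",
                                 "fuel_conduction"},
    ("full_core", "pin_power"): {"cross_sections", "coolant_density_feedback"},
}

# Application-independent requirements whose maturity the code PCMM assesses.
code_requirements = set().union(*app_requirements.values())
print(sorted(code_requirements))
```

As new applications and QoIs are added, the union saturates, which is exactly the behavior that lets later applications focus on integral effects experiments.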
Given a set of codes that have a PCMM pedigree for their application and QoI
requirements, one is ready to then build the application and QoI PCMM pedigree.
The pedigree of the coupled code utilized in the application is established. The
verification and validation are now applied to the coupled application based on
the solid foundation provided by the individual code PCMM. Validation at the
application level is at a higher level and is therefore based on integral effects
experiments.
This process provides a rapid way to move to new applications. The basic
assumption is that the code PCMM will begin to saturate, with all of the low-level
requirements pedigreed by earlier applications. Additional applications can then
focus mainly on integral effects experiments and coupled-code applications,
because the redundancy in code-level requirements diminishes the amount of work
required for code PCMM.
1634 V.A. Mousseau and B.J. Williams
5 Sample Application
The sample application is for a single assembly with 17×17 fuel pins shown in
Fig. 48.7. This is the simulation of a pressurized water reactor at hot full power.
It includes thermal hydraulic feedback and is run at 100% of nominal power with
a boron concentration of 1300 parts per million. More details of this study are
presented in a technical report [8]. This test problem was chosen because it is
small enough to do hundreds of runs easily, but complicated enough that it includes
code coupling and key nuclear reactor phenomena. It should be noted that there are
no validation data for this problem so the application validation effort will not be
presented. The code validation effort was implemented with separate effects data
used to calibrate some of the model parameters.
This is a steady-state problem at 100% of nominal power, implying single-phase flow
without subcooled boiling (subcooled boiling occurs in the hot assemblies that
are at higher than 100% of nominal power). As a result, only a small fraction of the
thermal hydraulic capability is utilized.
5.1 PIRT
The PIRT is presented in its entirety in Appendix B.2. The result of the PIRT is
summarized in the following list of high-importance phenomena:
1. (H,M) Subcooled boiling (minimal effect due to void, possible effect due to
improved heat transfer, but subcooled boiling occurs where the power is low)
2. (H,H) Single-phase heat transfer (Dittus-Boelter affects fuel temperature)
It should be noted that some phenomena from the PIRT in this list have been
collapsed into more general categories.
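The (importance, knowledge) ranking pairs lend themselves to a simple screen for phenomena that are ranked important but less well known, which under common PIRT practice are the entries that drive further experiments and model development. A minimal sketch, using entries drawn from the PIRT lists in Appendix B.2:

```python
# Screen a PIRT for phenomena whose importance ranking exceeds the
# knowledge ranking (ranking scheme: L < M < H).
RANK = {"L": 0, "M": 1, "H": 2}

pirt = [
    ("Subcooled boiling", "H", "M"),
    ("Single-phase heat transfer", "H", "H"),
    ("Cross-flow models", "H", "L"),
    ("Gamma heating", "L", "M"),
]

gaps = [name for name, importance, knowledge in pirt
        if RANK[importance] > RANK[knowledge]]
print(gaps)  # phenomena whose importance outstrips current knowledge
```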
5.2 Solution Verification
Our first step is to use solution verification to estimate the numerical uncertainty
for each of our three QoIs. The QoIs can sometimes be estimated by the stand-
alone codes (CTF) and the coupled codes (VERA-CS). The verification results are
presented in Table 48.1 which quantifies estimates of the numerical uncertainty.
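The chapter does not reproduce the solution verification computations here, but one common recipe for a numerical uncertainty estimate of this kind is Richardson extrapolation with Roache's grid convergence index (GCI). The sketch below assumes three solutions on systematically refined grids; it is illustrative, not the procedure used to produce Table 48.1.

```python
import math

def richardson_uncertainty(f_coarse, f_medium, f_fine, refinement=2.0, safety=1.25):
    """Observed order of convergence p and a GCI-style numerical
    uncertainty for the fine-grid QoI, from three solutions on
    systematically refined grids (a safety factor of 1.25 is
    customary for three-grid studies)."""
    p = math.log(abs((f_coarse - f_medium) / (f_medium - f_fine))) / math.log(refinement)
    gci = safety * abs(f_fine - f_medium) / (refinement ** p - 1.0)
    return p, gci

# Manufactured second-order sequence f(h) = 1 + h^2 at h = 0.4, 0.2, 0.1:
p, u_num = richardson_uncertainty(1.16, 1.04, 1.01)
print(f"observed order p = {p:.2f}, numerical uncertainty = {u_num:.4f}")
```

The resulting uncertainty plays the role of the "numerical floor" discussed in the QPRT section: parameter effects smaller than this value cannot be distinguished from discretization error.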
5.3 QPRT
The code itself is used to estimate the important phenomena. Note
that due to the complexity of the interrelationships among the cross sections, they
were not simply perturbed. Instead, a set of 100 consistent cross-sectional libraries
was constructed and each was run separately. For all other parameters, the multiplier
of the nominal parameter value is listed as well as the resulting change in the QoI.
It should be noted that some parameters received a larger perturbation due to the
larger uncertainty in that parameter (gap conductivity and fuel conductivity in the
fuel model are noted).
Table 48.2 provides sensitivities of the different parameters that represent the
different phenomena in the code. The numerical uncertainty is presented in italics.
The numerical uncertainty sets the floor for future parameter work. Utilizing the
concept of total uncertainty in Sect. 2.1, focus centers on the largest uncertainties
among numerical, validation, and parameter. Since validation data are not available
for this problem, only numerical and parameter uncertainties remain. Therefore
interest in parameter uncertainties smaller than the numerical uncertainty is merely
academic. As expected, the sensitivities associated with cross sections dominate the
reactivity QoI. This is followed by fuel model phenomena and code coupling.
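A one-at-a-time multiplier study of the kind summarized in Table 48.2 can be sketched as follows. The toy model, parameter names, multipliers, and numerical-uncertainty floor are hypothetical, chosen only to mimic the larger perturbations given to the gap and fuel conductivities.

```python
# One-at-a-time multiplier screening in the spirit of the QPRT tables
# (all model details below are illustrative, not values from the study).

def qprt_screen(model, nominal, multipliers, numerical_floor):
    """Perturb each parameter by its multiplier, one at a time, and rank
    parameters by the resulting change in the QoI; changes below the
    numerical-uncertainty floor are of academic interest only."""
    qoi0 = model(nominal)
    rows = []
    for name, mult in multipliers.items():
        perturbed = dict(nominal)
        perturbed[name] = nominal[name] * mult
        delta = abs(model(perturbed) - qoi0)
        rows.append((name, mult, delta, delta > numerical_floor))
    rows.sort(key=lambda r: -r[2])
    return rows

def toy_model(p):
    # Crude fuel-temperature surrogate: coolant temperature plus heat
    # flux times the thermal resistances of the gap and the fuel.
    return p["t_coolant"] + p["power"] * (1.0 / p["gap_cond"] + 1.0 / p["fuel_cond"])

nominal = {"t_coolant": 580.0, "power": 200.0, "gap_cond": 5.0, "fuel_cond": 3.5}
# Gap and fuel conductivity get larger multipliers, mirroring their
# larger parameter uncertainty.
mults = {"t_coolant": 1.01, "power": 1.05, "gap_cond": 1.5, "fuel_cond": 1.2}
for name, mult, delta, important in qprt_screen(toy_model, nominal, mults, 1.0):
    print(f"{name:10s} x{mult:<5} |dQoI| = {delta:7.2f} "
          f"{'important' if important else 'below floor'}")
```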
Table 48.3 provides the QPRT for the pin power QoI. As opposed to reactivity,
which is a single global number, for this simulation pin power is different in every
fuel pin and every axial level. As a local variable, it is a stronger function of the fuel
temperature and the resulting feedback than the cross sections.
Table 48.4 provides the QPRT for the maximum fuel temperature QoI. This table
is dominated by the fuel model which is expected for this QoI.
Table 48.5 compares the PIRT results with the QPRT results. The PIRT panel was
biased toward members with a thermal hydraulic background. This resulted in thermal
hydraulic phenomena being given a higher ranking. However, in the problem studied
and for the QoIs selected, the neutronic and fuel models were more important than
thermal hydraulics. It should also be noted that some entries in the PIRT were
redundant since the lists were constructed separately for fuels, thermal hydraulics,
and neutronics. The code coupling parameters (fission heat and fuel temperature)
play a major role in this application.
5.5 PCMM
The next step in the process is to establish the initial PCMM scores for the separate
codes. The expected PCMM values, referred to as goals, are also established.
This first step provides an initial external assessment (outside of the code team)
which identifies areas of improvement for the code team. It should be noted that the
PCMM goals reflect the intended use of the application results. In this example, the
application was simply a progression problem used to monitor code development
progress. Since no important decisions are based on the results, the goal PCMM
scores are low.
The second step is to come back later and remeasure the PCMM score relative
to the goal. Once the software has achieved its PCMM goal, additional work is of
lower priority.
5.5.1 Neutronics
Table 48.6 provides the initial PCMM scores, goals, and current scores for neutron-
ics. In all cases, the current score meets or exceeds the goal. Note that during this
study, it was decided that the neutronic code would be replaced. This resulted in
relatively low goal PCMM values.
The results of Table 48.6 can be quickly summarized by the Kiviat diagram
shown in Fig. 48.8. The Kiviat diagram provides a quick reference for where
additional work is needed, if anywhere.
Figure 48.9 shows the pictorial summary of Table 48.7. This diagram shows that
additional work is needed in code verification and validation. It should be noted that
this work was completed after this study was conducted.
Table 48.8 provides the PCMM scores for the coupled code application. It is
sometimes challenging to separate the code PCMM from the application PCMM
since there is a certain amount of overlap. Note that since this application does not
have any validation experimental data, the goal for validation is zero.
Figure 48.10 shows the summary of Table 48.8. It is seen that all the goal
values have been met except for application code verification. It should be noted
that whereas solution verification is easy for an application, code verification often
requires additional software development (e.g., method of manufactured solutions).
The construction of the parameter uncertainties included a variety of approaches.
For neutronics, all of the cross sections were represented as a single parameter,
with uncertainty captured by perturbed cross-sectional libraries. Some of the
parameters had detailed parameter distributions
constructed from Bayesian calibration of separate effects tests as described in
Sect. 3.6.2. Some parameters used expert opinion, which is always the fallback
position. The result was the construction of a two-standard-deviation uncertainty
range for each of the QoIs.
5.7.1 Reactivity
The reactivity uncertainty, although small on a percentage basis, was higher than
expected. Future efforts will be applied to understanding this large uncertainty
and attempting to reduce it.
When best estimate plus uncertainty results are presented in a hostile
environment where a regulator is reviewing the information, it is important that
they be presented in a logical fashion that is easy to follow. The story should
demonstrate both a high-level understanding of the application's physical
phenomena and a deep understanding of the code results. The
evidence files for each of the six PCMM elements should be available for a deep
dive if details are needed to support conclusions. However, these details can be
summarized quickly in a Kiviat diagram, and there is no need to step through all
of the evidence.
It is important to recognize the major sources of uncertainty: numerical,
model-form (validation), and parameter uncertainty. It is equally important to
present a succinct summary showing that all important topics have been covered in
a logical fashion.
For parameter uncertainty, it is desirable to establish which parameters were
studied and which were not, with a logical explanation of why parameters were
excluded. When possible, parameter distributions based on Bayesian calibration of
separate effects tests should be employed, but expert opinion is always the
fallback position.
The key to presenting uncertainty quantification results in a regulatory environ-
ment is to prepare as much as possible for whatever reasonable questions could
be asked. If questions are answered quickly and concisely with backup evidence
available, the regulators will quickly gain confidence in the presenter. Because of
the high level of stress induced by a regulatory review, thinking off the top of
one's head can be dangerous. It is sometimes preferable to simply admit ignorance
and ask for more time to study than to give an incorrect answer.
7 Conclusions
4. Finally, one may be reviewed by someone who has intimate knowledge of the
simulation software and knows which model parameters are exposed for study
and which are hidden in the code.
If the regulatory decision is important (e.g., related to nuclear reactor safety), one
will most likely encounter experts in the review process from all four fields above.
In a regulatory environment, black box approaches are only useful if the regulator
does not know what is in the box.
It is important to understand all of the different sources of uncertainty and to
address all of them clearly and logically. The PCMM process provides a structure
that ensures that all important topics are accounted for. It is important for the
regulator to be able to clearly understand the pedigree behind the best estimate value
of the QoI and the quantified uncertainty in the QoI. It is not important to present
all of the details, but it is important to have all of the details readily available when
questions are asked. The PCMM Kiviat diagrams provide a very succinct summary
of the work, and the PCMM evidence files provide the detailed information needed
to answer specific questions.
Appendix
In 1941 Wilks [15] introduced a nonparametric method for setting tolerance limits.
This method assumes that the population cumulative distribution function $G(\cdot)$ of
the QoI admits a density function $g(\cdot)$ with respect to Lebesgue measure, i.e.,
$\frac{d}{dy} G(y) = g(y)$. Calculation of Wilks tolerance limits requires the capability
to randomly sample the distribution $G(\cdot)$. In the applications of interest here, the
distribution $G(y)$ is defined as the probability of the set $\{\theta : f(\theta) \le y\}$, where
$f(\theta)$ denotes the results of a computational model evaluated at parameter setting $\theta$.
In this scenario, the random sampling requirement is met by (a) possessing an ability
to sample $\theta$ from its correct distribution, even if this distribution is not available
analytically (cf. Sect. 2.2.2), and (b) confirming that code crashes are independent
of parameter values (cf. Sect. 2.2.1). The density $g(y)$ will not correctly represent
the true output distribution, and therefore Wilks tolerance limits will be biased, if
(a) the assumed parameter vector excludes additional inputs that, if varied, would
induce changes in the QoI (cf. Sect. 2.2.3) or (b) $f(\theta)$ produces biased predictions
of the QoI for $\theta$ values belonging to a set of positive probability (cf. Sect. 2.2.4).
Given an independent and identically distributed sample $(Y_1, \ldots, Y_n)$ from $g(\cdot)$,
form the order statistics $(Y_{(1)}, \ldots, Y_{(n)})$, where $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$.
Upper [lower] one-sided tolerance bounds of the form $(-\infty, U) = (-\infty, Y_{(t)})$
[$(L, \infty) = (Y_{(s)}, \infty)$] are considered, as are tolerance intervals of the form
$(L, U) = (Y_{(s)}, Y_{(t)})$, $s < t$. Define the following random variables,
$$P_U = 1 - G(Y_{(t)}) = \int_{Y_{(t)}}^{\infty} g(y)\,dy,$$
$$P_L = G(Y_{(s)}) = \int_{-\infty}^{Y_{(s)}} g(y)\,dy, \quad\text{and}$$
$$P_C = G(Y_{(t)}) - G(Y_{(s)}) = \int_{Y_{(s)}}^{Y_{(t)}} g(y)\,dy.$$
The upper one-sided tolerance bound $(-\infty, Y_{(t)})$ has (random) coverage probability
$1 - P_U$, the lower one-sided tolerance bound $(Y_{(s)}, \infty)$ has coverage probability
$1 - P_L$, and the tolerance interval $(Y_{(s)}, Y_{(t)})$ has coverage probability $P_C$. Starting from
the joint probability distribution of $(Y_{(s)}, Y_{(t)})$ (cf. [2]), these coverage probabilities
each have beta distributions:
$$P_U \sim \mathrm{Beta}(n - t + 1,\ t),$$
$$P_L \sim \mathrm{Beta}(s,\ n - s + 1), \quad\text{and}$$
$$P_C \sim \mathrm{Beta}(t - s,\ n - t + s + 1).$$
For an upper one-sided bound based on the $r$-th largest order statistic, set
$t = n - r + 1$. The requirement that this bound cover at least a fraction $P$ of the
distribution with confidence $1 - \alpha$, $\Pr(1 - P_U \ge P) \ge 1 - \alpha$, is then
equivalent to
$$I_{1-P}(r,\ n - r + 1) \ge 1 - \alpha,$$
where $I_x(a, b)$ denotes the regularized incomplete beta function. The analogous
requirement for the two-sided interval $(Y_{(r)}, Y_{(n-r+1)})$ is
$$I_P(n - 2r + 1,\ 2r) \le \alpha.$$
Table 48.9 provides the minimum sample sizes $n$ required to obtain 95%/95%
Wilks tolerance bounds/intervals for $r \in \{1, 2, \ldots, 10\}$, calculated from the recipe
of the previous paragraph. Note that increasing $r$ corresponds to using more centrally
located order statistics to define the tolerance bounds/intervals. It is seen that
larger sample sizes $n$ are required to guarantee that tolerance bounds/intervals using
increasingly centralized order statistics will achieve the required 95% coverage of
the distribution $G(\cdot)$ at the 95% confidence level.
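The entries of Table 48.9 can be reproduced directly from the beta-distribution coverage conditions, since for integer arguments the regularized incomplete beta function reduces to a binomial sum. A small sketch (the function names are ours, not from the chapter):

```python
from math import comb

def beta_cdf_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for integer a, b >= 1, via
    I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^(a+b-1-j)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

def wilks_n_one_sided(r, cover=0.95, conf=0.95):
    """Smallest n such that the r-th most extreme order statistic gives a
    one-sided bound covering a fraction `cover` with confidence `conf`."""
    n = r
    while beta_cdf_int(1 - cover, r, n - r + 1) < conf:
        n += 1
    return n

def wilks_n_two_sided(r, cover=0.95, conf=0.95):
    """Smallest n such that (Y_(r), Y_(n-r+1)) covers a fraction `cover`
    with confidence `conf`."""
    n = 2 * r
    while beta_cdf_int(cover, n - 2 * r + 1, 2 * r) > 1 - conf:
        n += 1
    return n

# Classic 95%/95% results: n = 59 (one-sided, r = 1), n = 93 (two-sided, r = 1).
print(wilks_n_one_sided(1), wilks_n_two_sided(1))
```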
B.2 PIRT
B.2.3 Phenomena
Phenomena are listed for each of the three physics areas. Phenomena will be ranked
as an ordered pair (importance, knowledge) immediately after each phenomenon
number.
Thermal Hydraulics
Here the fluid flow over the fueled portion of the pin is considered:
1. (H,M) Subcooled boiling (minimal effect due to void, possible effect due to
improved heat transfer, but subcooled boiling occurs where the power is low).
2. (M,H) Water density (10% change over core, uncertainty in steam table ASME
1967 vs. ASME 1998).
3. (H,H) Single-phase heat transfer (Dittus-Boelter affects fuel temperature).
4. (L,M) Gamma heating (2% of total power directly into the coolant).
5. (L,L) Dimensional changes due to thermal expansion (may be smaller than
other uncertainties in thermal hydraulics).
6. (M,M) Inlet mass flow rate (no feedback from loop so pressure drops sum to
zero).
7. (M,M) Inlet temperature (no feedback from loop so changes in temperature sum
to zero).
8. (M,H) Wall friction.
9. (H,H) Since the figure of merit is reactivity, power can be fixed for an eigenvalue
search (one cannot fix the eigenvalue, which would fix reactivity, and do a
power search). Until transient capability is available, the eigenvalue is the
only logical search variable.
10. (H,L) Cross-flow models (important near guide tubes).
11. (H,M) Spacer grid model:
(a) (H,M) Loss coefficient is steady state.
(b) (M,M) Mixing term.
(c) (M,M) Enhanced heat transfer.
Fuel Model
Here the pellet, the gap, and the clad are considered. Energy is transferred from
the fuel across the gap to the clad. The fuel model in CTF is currently being
employed; later this will be replaced by Bison-CASL. Note that only fresh fuel
is being considered so there are no burnup effects:
Neutronics
The challenge here is figuring out what to do with the large number of cross sections,
which include fission, absorption, and scattering. The following list of materials
pertains to this application:
1. U235
2. U238
3. U234
4. U236
5. Inconel
6. Zirconium
7. Oxygen
8. Hydrogen
9. Helium
10. Boron
11. Steel (nozzles and core plates)
There are 23 energy groups in the SPn calculation and 252 energy groups in the
pin cell computations. It is important to note that the neutronic domain is larger
than the thermal hydraulic domain. SCALE may provide uncertainty quantification
information. It is necessary to conduct a literature search to look for cross-sectional
prioritization work. The important phenomena for the QoIs (eigenvalue and 3D pin
power directly; 3D fuel temperature is coupled through 3D power and cross-
sectional feedback) are:
1. (H,H) Neutron fission (nuclear data would be the main component of this).
2. (H,H) Neutron/gamma absorption.
3. (H,H) Neutron/gamma scattering.
4. (H,H) Neutrons released per fission.
5. (M,H) Other neutron reactions (n,2n), (n,3n).
6. (H,H) Energy released per fission.
7. (L,H) Energy released per capture.
8. (L,M) Gamma energy deposition (in fuel, clad, and moderator).
9. (H,M) Boron density in the coolant.
10. (H,H) Moderator density.
11. (M,H) Moderator temperature.
12. (H,H) Fuel temperature (Doppler feedback).
13. (L,H) Clad temperature.
14. (M,M) Fuel density.
15. (L,L) Manufacturing uncertainties (mass and dimensions in fuel/clad/grids/
structure).
16. (M,M) Neutron/gamma transport (includes things like SPn order, number of
groups, homogenization, etc.).
17. (L,M) Bowing (only effects of thermal expansion since depletion is not
included).
18. (L,M) Thermal expansion.
19. (L,L) What are the neutronic effects of a layer of bubbles on the fuel rod surface?
References
1. Adams, B.M., Hooper, R.W., Lewis, A., McMahan, J.A. Jr., Smith, R., Swiler, L.P., Williams,
B.J.: User guidelines and best practices for CASL VUQ analysis using Dakota. Technical
report, CASL-U-2014-0038-000, Consortium for Advanced Simulation of LWRs (2014)
2. Arnold, B.C., Balakrishnan, N., Nagaraja, H.N.: A First Course in Order Statistics. SIAM,
Philadelphia (2008)
3. Boyack, B., Duffey, R., Griffith, P., Lellouche, G., Levy, S., Rohatgi, U., Wilson, G., Wulff,
W., Zuber, N.: Quantifying reactor safety margins: application of code scaling, applicability,
and uncertainty evaluation methodology to a large-break, loss-of-coolant accident. Technical
report, NUREG/CR-5249, Nuclear Regulatory Commission (1989)
4. Adams, B.M. et al.: Dakota, a multilevel parallel object-oriented framework for design
optimization, parameter estimation, uncertainty quantification, and sensitivity analysis: Version
6.2 reference manual. Technical report, SAND2014-5015, Sandia National Laboratories (2015)
5. Adams, B.M. et al.: Dakota, a multilevel parallel object-oriented framework for design
optimization, parameter estimation, uncertainty quantification, and sensitivity analysis: Version
6.2 theory manual. Technical report, SAND2014-4253, Sandia National Laboratories (2015)
6. Adams, B.M. et al.: Dakota, a multilevel parallel object-oriented framework for design
optimization, parameter estimation, uncertainty quantification, and sensitivity analysis: Version
6.2 users manual. Technical report, SAND2014-4633, Sandia National Laboratories (2015)
7. Martin, R.P., O'Dell, L.D.: AREVA's realistic large break LOCA analysis methodology. Nucl.
Eng. Des. 235, 1713–1725 (2005)
8. Mousseau, V.A., Dinh, N., Rider, W., Abdel-Khalik, H., Williams, B., Smith, R., Adams, B.,
Belcourt, N., Hooper, R., Copps, K.: Demonstration of integrated DA/UQ for VERA-CS on a
core physics progression problem. Technical report, CASL-I-2014-0158-000, Consortium for
Advanced Simulation of LWRs (2014)
9. Oberkampf, W.L., Pilch, M., Trucano, T.G.: Predictive capability maturity model for com-
putational modeling and simulation. Technical report, SAND2007-5948, Sandia National
Laboratories (2007)
10. Oberkampf, W.L., Roy, C.J.: Verification and Validation in Scientific Computing. Cambridge
University Press, Cambridge (2010)
11. Oliver, T.A., Terejanu, G., Simmons, C.S., Moser, R.D.: Validating predictions of unobserved
quantities. Comput. Methods Appl. Mech. Eng. 283, 1310–1335 (2015)
12. Petruzzi, A., D'Auria, F., Bajs, T., Reventos, F., Hassan, Y.: International course to support
nuclear licensing by user training in the areas of scaling, uncertainty, and 3D thermal-
hydraulics/neutron-kinetics coupled codes: 3D S.UN.COP seminars. Sci. Technol. Nucl.
Install. 2008(874023), 1–16 (2008)
13. Rider, W., Witkowski, W., Kamm, J.R., Wildey, T.: Robust verification analysis. J. Comput.
Phys. 307, 146–163 (2016)
14. Rider, W., Witkowski, W., Mousseau, V.: UQs Role in Modeling and Simulation Planning,
Credibility and Assessment Through the Predictive Capability Maturity Model. Springer,
Switzerland (2015)
15. Wilks, S.S.: Determination of sample sizes for setting tolerance limits. Ann. Math. Stat. 12(1),
91–96 (1941)
Part VII
Introduction to Software for Uncertainty
Quantification
Dakota: Bridging Advanced Scalable
Uncertainty Quantification Algorithms 49
with Production Deployment
Abstract
This chapter highlights uncertainty quantification (UQ) methods in Sandia
National Laboratories Dakota software. The UQ methods primarily focus on
forward propagation of uncertainty, but inverse propagation with Bayesian
calibration is also discussed. The chapter begins with a brief Dakota history
and mechanics of licensing, software and documentation acquisition, and getting
started, including interfacing simulations to Dakota. Early sections are devoted
to core sampling, stochastic expansion, reliability, and epistemic methods, while
subsequent sections discuss more advanced capabilities such as mixed epistemic-
aleatory UQ, multifidelity UQ, optimization under uncertainty, and Bayesian
calibration. The chapter concludes with usage guidelines and a discussion of
future directions.
Keywords
Dakota software · Open-source software · Black box · Parallel computing ·
Surrogate models · Sampling · Reliability · Polynomial chaos expansions ·
Stochastic collocation · Epistemic UQ · Interval estimation · Multifidelity ·
Stochastic design · Bayesian calibration · Adaptive methods · Sensitivity
analysis · Optimization · Calibration · Importance sampling
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1652
1.1 History and Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1653
1.2 Obtaining the Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1654
1.3 Getting Started Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1655
1.4 Interfacing to Dakota . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1655
1.5 Characterizing Uncertain Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1658
2 Sampling Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1658
2.1 Latin Hypercube Sampling Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1660
2.2 Importance Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1662
3 Stochastic Expansion Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1662
3.1 Polynomial Chaos Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1663
3.2 Stochastic Collocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1665
3.3 PCE Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1665
4 Reliability Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1666
4.1 Local Reliability Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1670
4.2 Global Reliability Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1670
4.3 Local Reliability Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1672
5 Epistemic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1673
5.1 Interval Methods for Epistemic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1674
5.2 Dempster-Shafer Theory of Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1675
6 Advanced Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1675
6.1 Mixed Aleatory-Epistemic UQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1675
6.2 Multifidelity UQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1677
6.3 Optimization Under Uncertainty (OUU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1682
6.4 Bayesian Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1683
7 Usage Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1688
7.1 Sampling Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1688
7.2 Reliability Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1689
7.3 Stochastic Expansions Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1690
7.4 Epistemic Uncertainty Quantification Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1690
7.5 Mixed Aleatory-Epistemic UQ Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1690
7.6 Bayesian Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1690
8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1691
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1691
1 Introduction
(Figure: Dakota, providing optimization, calibration, sensitivity analysis, and UQ,
drives a computational model (simulation), sending input parameters x and receiving
output responses g(x).)
Dakota is open source and distributed under the GNU Lesser General Public License
(LGPL): http://dakota.sandia.gov/license.html. The core source code is written in
C++. Dakota distributions include required third-party libraries, written in C, C++,
Fortran 77, and Fortran 90, with their own LGPL-compatible licenses. Given
Dakota's research-to-development spectrum, release notes on the webpage indicate
which capabilities are more recent and whether they are in a prototype stage.
More than 13,000 users worldwide have downloaded Dakota, and it has hundreds
of users within the Department of Energy laboratory complex. It is supported on
numerous Linux, Mac, and Windows operating system variants, including high-
performance computing platforms. Software downloads (source and binary), system
requirements, and installation information are available at http://dakota.sandia.gov/
download.html. When feasible, users should download binary packages of Dakota.
However, the website includes information on satisfying system prerequisites
including compilers, MPI, Boost, and linear algebra libraries and compiling the
source code with CMake.
49 Dakota: Bridging Advanced Scalable Uncertainty Quantification… 1655
Users: The Users Manual [1] addresses the various Dakota method categories
such as parameter studies, design of experiments, optimization, and nonlinear
least squares. Chapter 5 (Uncertainty Quantification Capabilities) addresses
uncertainty quantification. This manual also covers surrogate and other types of
models, how to define input variables and output responses, creating an interface
between Dakota and the simulation code, and parallelism options.
Reference: The Reference Manual [2] describes all valid keywords available
to specify a Dakota study. For example, when using a genetic algorithm for
optimization, one needs to define the population size, the mutation rate, the
crossover rate, etc. This manual details the keywords and associated arguments
that qualify them.
Theory: The Theory Manual [3] supplements the Users Manual with advanced
topics and more theoretical depth. For example, it covers surrogate-based
optimization and optimization under uncertainty. In terms of uncertainty quantifi-
cation, the Theory Manual has comprehensive descriptions of reliability methods,
stochastic expansion methods, and evidence theory.
Developers: The Developers Manual describes Dakota's architecture, interfaces,
development practices, and C++ class details.
Fig. 49.2 Details of Dakota execution and information flow, including user-provided analysis
driver
method
  sampling
    samples = 10
    seed = 98765 rng rnum2
    response_levels = 0.1 0.2 0.6
                      0.1 0.2 0.6
                      0.1 0.2 0.6
    sample_type lhs
    distribution cumulative

variables
  uniform_uncertain = 2
    lower_bounds = 0. 0.
    upper_bounds = 1. 1.
    descriptors = 'x1' 'x2'

interface
  fork asynch evaluation_concurrency = 5
  analysis_driver = 'text_book'

responses
  response_functions = 3
  no_gradients
  no_hessians
Fig. 49.3 Dakota input file for UQ example using Latin hypercube sampling
fields from an output file, or it may involve more significant manipulation of a binary
output file. The script returns the responses to Dakota, and then Dakota starts another
iteration of the simulation based on the method. At the end of all of the simulation
evaluations, Dakota outputs the results both in text form and in a tabular data file.
Figure 49.3 provides an example of a basic Dakota input file. This example
demonstrates an input file for sampling-based UQ. The method section indicates
that the method is sampling, of type lhs, for Latin hypercube sampling (as
opposed to pure Monte Carlo or importance sampling), and there will be 10 samples
with a seed of 98765. The keyword response_levels indicates that Dakota
should compute the cumulative distribution function (CDF) value at the specified
response levels 0.1, 0.2, and 0.6, i.e., the probability that the simulation response is
less than each of these values. Dakota will calculate sample statistics based on the
ten samples; external postprocessing is not necessary.
In the variables section, there are two input variables, x1 and x2, characterized
as uniformly distributed on the interval [0, 1]. The interface for this example is
implemented in an analysis driver called text_book, an executable included with
Dakota.
1658 L.P. Swiler et al.
2 Sampling Methods
Dakota's sampling implementation (MC and LHS) supports the full range of con-
tinuous and discrete distributions mentioned in the introduction. In addition, LHS
accounts for correlations among the variables, which can be used to accommodate
a user-supplied correlation matrix or to minimize correlation when a correlation
matrix is not supplied.
The input file previously shown in Fig. 49.3 demonstrates the use of Latin hypercube
sampling for assessing probability of failure as measured by specified response
levels. The textbook example problem has two uniform variables on [0, 1].
The number of samples to perform is controlled with the samples specification,
the type of sampling algorithm to use is controlled with the sample_type
specification, the levels used for computing statistics on the response functions are
specified with the response_levels input, and the seed specification controls
the sequence of the pseudorandom numbers used by the sampling algorithms.
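To illustrate the mechanics outside Dakota, the following Python sketch (not Dakota code) builds a basic Latin hypercube design for two uniform variables on [0, 1] and estimates CDF values at the response levels from Fig. 49.3. The quadratic response function is a hypothetical stand-in for the bundled text_book driver.

```python
import random

def latin_hypercube(n_samples, n_vars, seed=98765):
    """Basic LHS: one point per equal-probability stratum in each
    dimension, with strata randomly paired across dimensions."""
    rng = random.Random(seed)
    design = []
    for _ in range(n_vars):
        # one jittered point per stratum, then shuffle the strata
        col = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(col)
        design.append(col)
    return list(zip(*design))  # n_samples points, each of dimension n_vars

def empirical_cdf(values, levels):
    """P(response <= level) from the sample, mirroring Dakota's
    response_levels with the default compute probabilities."""
    n = len(values)
    return [sum(v <= lev for v in values) / n for lev in levels]

# hypothetical smooth response standing in for the 'text_book' driver
samples = latin_hypercube(10, 2)
resp = [(x1 - 0.5) ** 2 + (x2 - 0.5) ** 2 for x1, x2 in samples]
cdf = empirical_cdf(resp, [0.1, 0.2, 0.6])
```

Note how the stratification guarantees exactly one sample in each tenth of each coordinate's range, which is what distinguishes LHS from pure Monte Carlo at small sample sizes.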
The response function statistics generated by Dakota when running this input
file are shown in Fig. 49.4. The first two blocks of output specify the response
sample moments and the confidence intervals for the mean and standard deviation.
The last section of the output defines CDF pairs (distribution cumulative
was specified) for the response functions by presenting the probability levels
corresponding to the specified response levels (response_levels were set
and the default compute probabilities was used). Alternatively, Dakota
could have provided CCDF pairings, reliability levels corresponding to prescribed
response levels, or response levels corresponding to prescribed probability or
reliability levels.
In addition to obtaining statistical summary information of the type shown in
Fig. 49.4, the results of LHS sampling also include correlations. These can be
helpful in understanding the influence of parameters on responses for ranking
or screening. Four types of correlations are output: simple and partial raw
correlations and simple and partial rank correlations. Raw correlations refer to
correlations performed on the values of the input and output data. Rank correlations
are instead computed on the ranks of the data. Ranks are obtained by replacing
the data by their ranked (ascending) values. The simple correlation is calculated as
Pearson's correlation coefficient.
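The raw/rank distinction can be made concrete with a short sketch (not Dakota code): rank correlation is simply Pearson's coefficient applied to the ranks, so it detects any monotone relationship even when the raw correlation is weakened by nonlinearity.

```python
import math

def pearson(xs, ys):
    """Simple (raw) correlation: Pearson's coefficient on the values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def ranks(xs):
    """Replace data by their ascending ranks (1-based; ties not handled)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Rank correlation: Pearson's coefficient computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# monotone but nonlinear relation: rank correlation is exactly 1,
# while the raw correlation falls short of 1
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [math.exp(v) for v in x]
```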
Figure 49.5 shows an example of the correlation output provided by Dakota for
the input file in Fig. 49.3. Note that the simple correlations among the inputs are
nearly zero. This is due to the default restricted pairing method in the LHS routine
which forces near-zero correlation among uncorrelated inputs. In this particular
example, the first response function is strongly negatively correlated with both
input variables, meaning that the function decreases as the input values increase
from lower to upper bounds. In contrast, the second response function is positively
correlated with the first variable.
In PCE, the output response is modeled as a function of the input random variables
using a carefully chosen set of polynomials. For example, PCE employs Hermite
polynomials to model Gaussian random variables, as originally employed by
Wiener [39]. Dakota implements the generalized PCE approach using the Wiener-
Askey scheme [40], in which Hermite, Legendre, Laguerre, Jacobi, and generalized
Laguerre orthogonal polynomials are used for modeling the effect of continuous
random variables described by normal, uniform, exponential, beta, and gamma
probability distributions, respectively. These orthogonal polynomial selections are
optimal for these distribution types since the inner product weighting function
corresponds to the probability density functions for these continuous distributions.
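The optimality claim rests on orthogonality of each polynomial family under the matching density. A quick numerical check of the uniform/Legendre pairing (an illustrative sketch, not Dakota code; only orders 0-2 are hardcoded):

```python
def legendre(n, x):
    """Legendre polynomials P0..P2, orthogonal with respect to the
    uniform density on [-1, 1] (the Wiener-Askey pairing for uniform
    random variables)."""
    return [1.0, x, 0.5 * (3.0 * x * x - 1.0)][n]

def inner(m, n, points=20000):
    """<Pm, Pn> under the uniform density 1/2 on [-1, 1], midpoint rule."""
    h = 2.0 / points
    return sum(
        legendre(m, x) * legendre(n, x) * 0.5 * h
        for x in (-1.0 + (k + 0.5) * h for k in range(points))
    )
```

Distinct orders integrate to zero, while the squared norms come out as 1/(2n + 1), which is exactly the weighting a PCE for uniform inputs relies on.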
Dakota allows the selection of three types of basis functions for PCE: Wiener,
Askey, and Extended. The Wiener option uses a Hermite orthogonal polynomial
basis for all random variables. It employs the Nataf variable transformation to
transform non-normal, correlated distributions to normal, independent distributions.
The Askey option, however, employs an extended basis of Hermite, Legendre,
Laguerre, Jacobi, and generalized Laguerre orthogonal polynomials. The Extended
option avoids the use of any nonlinear variable transformations by augmenting the
Askey approach with numerically generated orthogonal polynomials for non-Askey
probability density functions (e.g., bounded normal, Weibull, histogram, etc.).
To propagate input uncertainty through a model using PCE, Dakota performs
the following steps: (1) input uncertainties are transformed to a set of uncorrelated
random variables, (2) a basis such as Hermite polynomials is selected, and (3) the
parameters of the functional approximation are determined. The general polynomial
chaos expansion for a response g has the form
g(x) ≅ Σ_{j=0}^{P} α_j Ψ_j(x)    (49.1)
In the simplest case, if one wants to use a polynomial chaos expansion in Dakota
with isotropic tensor product expansion and five quadrature points for each input
variable, the method section of the Dakota input would look like the following:
method
  polynomial_chaos
    quadrature_order = 5
If there were two variables, this would involve 5^2 = 25 function evaluations. If
instead, one wanted to use a sparse grid at level 1, the method section of the Dakota
input would look like the following:
method
  polynomial_chaos
    sparse_grid_level = 1
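The coefficient-estimation step (step 3 above) can be sketched for one standard normal input using spectral projection with Gauss-Hermite quadrature. This is an illustrative sketch, not Dakota's implementation; the test function g(x) = x^2 is a hypothetical choice whose exact expansion is He_0(x) + He_2(x).

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He  # probabilists' Hermite basis

def pce_coeffs(g, order, quad_order):
    """Spectral projection of g(x), x ~ N(0,1), onto probabilists'
    Hermite polynomials: alpha_j = <g, He_j> / <He_j, He_j>."""
    x, w = He.hermegauss(quad_order)  # nodes/weights for weight exp(-x^2/2)
    coeffs = []
    for j in range(order + 1):
        basis_j = He.hermeval(x, [0.0] * j + [1.0])  # He_j at the nodes
        norm_sq = math.sqrt(2.0 * math.pi) * math.factorial(j)
        coeffs.append(float(np.sum(w * g(x) * basis_j)) / norm_sq)
    return coeffs

alpha = pce_coeffs(lambda x: x * x, order=4, quad_order=5)
# moments follow directly from the coefficients:
mean = alpha[0]
var = sum(a * a * math.factorial(j) for j, a in enumerate(alpha) if j > 0)
```

The five-point rule integrates the required products exactly, and the moments (mean 1, variance 2 for g(x) = x^2) drop out of the coefficients for free, which is why Dakota reports response moments directly from the expansion.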
Fig. 49.6 Comparison of tensor product (left) and sparse grids (right) used in Dakota PCE
construction
Figure 49.6 contrasts the resulting sample designs for two standard normal
variables. The full tensor grid with quadrature order 5 is shown on the left. On the
right, the level 1 sparse grid requires evaluation of the center point and three-point
quadrature rules for each variable (2n + 1 total points for symmetric weakly nested
rules). The level 2 sparse grid requires 21 points, including those in the level 1 sparse
grid due to the default use of nested rules (Genz-Keister for Hermite). The sparse
grid points span a wider range than the tensor quadrature points in this particular example.
Using a PCE method in Dakota results in the moments of each response
function, the covariance matrix for the response functions, and the local and global
sensitivities, as shown in Fig. 49.7. The local sensitivities of each response function
are calculated as the derivative of the response function evaluated at the means of
the uncertain variables. The global sensitivities are calculated by variance-based
decomposition, and the main, interaction, and total effects are reported. In this
particular example, the responses are separable in the two inputs, so the interaction
sensitivities are negligible. As with sampling, one may also request information
about the probability density of the response functions in terms of histogram
information and about the CDF values. This is shown in Fig. 49.8.
Dakota's stochastic expansions support many options. In lieu of listing them here,
the reader is encouraged to explore the Dakota Reference Manual [2]. Table 49.1
summarizes key Dakota PCE features; similar options are available for SC.
4 Reliability Methods
Local sensitivities for each response function evaluated at uncertain variable means:
response_fn_1:
[ -4.0000000000e+00 -4.0000000000e+00 ]
response_fn_2:
[ 2.6888213878e-17 -5.0000000000e-01 ]
response_fn_3:
[ -5.0000000000e-01 -1.3877787808e-16 ]
Fig. 49.7 Statistical output generated from a polynomial chaos expansion in Dakota
Fig. 49.8 PDF and CDF information from results of a polynomial chaos expansion in Dakota
the level of interest z̄ is shown in solid red. The probability calculation involves a
probability-weighted multidimensional integral of the implicit simulation mapping
g.x/ over the failure domain, for example, outside the red curve, as indicated by the
shading.
Figure 49.9 depicts a standard normal probability space u, in which probability
calculations can be more tractable. In Dakota, the Nataf transformation [24] can be
used to automatically map the user-specified uncertain variables x = (x1, x2), whose
joint probability density function can be non-normal and correlated, to a space of
independent Gaussian random variables (u1, u2), each with mean zero and unit
variance. In this transformed space, probability contours are circular, instead of the
rotated hyper-ellipses of correlated variables, or arbitrary level curves of a general
joint probability density function.
In u-space, the multidimensional integrals that define the failure probability can
be approximated by simple functions of a single parameter, β, called the reliability
index. β is the minimum Euclidean distance from the origin in the transformed space
to the failure boundary, also known as the limit state surface. This closest point is
known as the most probable point (MPP) of failure. This nomenclature is due to
the origin of these methods within the disciplines of structural safety and reliability;
however, the methodology is equally applicable for computation of probabilities
unrelated to failure.
Within the class of reliability methods, there are local and global algorithms.
The most well-known are local methods, which locate a single MPP using a local,
often gradient-based, optimization search method and then utilize an approximation
centered around this point (see the notional linear and quadratic approximations in
Fig. 49.9). In contrast, global methods generate approximations over the full random
variable space and can find multiple MPPs if they exist, for example, the second
most likely point, shown as an open red circle in Fig. 49.9. In both cases, a primary
strength of the methods lies in the fact that the computational expense is generally
unrelated to the probability level; the cost of evaluating a probability in the far tails
is no more than that of evaluating near the means.
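The β/MPP machinery can be sketched with the classical Hasofer-Lind (HL-RF) recursion in u-space. This is a minimal sketch, not Dakota's implementation: the linear limit state g(u) = 3 - u1 - u2 is a hypothetical example chosen so the iteration converges immediately and β has a closed form.

```python
import math

def hlrf_beta(g, grad_g, n_vars, max_iter=50, tol=1e-10):
    """Hasofer-Lind / Rackwitz-Fiessler iteration in standard normal
    u-space. Each step projects the current iterate onto the linearized
    limit state g(u) = 0; at convergence, beta = |u*| and the MPP is u*."""
    u = [0.0] * n_vars  # start at the origin (the transformed means)
    for _ in range(max_iter):
        gv, gr = g(u), grad_g(u)
        norm_sq = sum(c * c for c in gr)
        scale = (sum(c * ui for c, ui in zip(gr, u)) - gv) / norm_sq
        u_new = [scale * c for c in gr]
        if max(abs(a - b) for a, b in zip(u_new, u)) < tol:
            u = u_new
            break
        u = u_new
    beta = math.sqrt(sum(ui * ui for ui in u))
    pf = 0.5 * math.erfc(beta / math.sqrt(2.0))  # first-order Phi(-beta)
    return beta, pf

# hypothetical limit state: failure when u1 + u2 > 3, i.e. g(u) < 0
beta, pf = hlrf_beta(lambda u: 3.0 - u[0] - u[1],
                     lambda u: [-1.0, -1.0], n_vars=2)
```

For this linear case the MPP is (1.5, 1.5), β = 3/√2 ≈ 2.12, and the first-order failure probability Φ(-β) ≈ 0.017 regardless of how small that probability is, which is the cost advantage noted above.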
The Dakota Theory Manual [3] provides the algorithmic details for the local
reliability methods as well as references to related research activities. Local methods
include first- and second-order versions of the mean value method (MVFOSM and
MVSOSM) and a variety of most probable point (MPP) search methods, including
the advanced mean value method (AMV and AMV²), the iterated advanced
mean value method (AMV+ and AMV²+), the two-point adaptive nonlinearity
approximation method (TANA-3), and the traditional first-order and second-order
reliability methods (FORM and SORM) [15]. The MPP search methods may be
used in forward (reliability index approach (RIA)) or inverse (performance measure
approach (PMA)) modes, as dictated by the type of level mappings. Each of the MPP
search techniques solves a local optimization problem in order to locate the MPP,
which is then used as the point about which approximate probabilities are integrated
(using first- or second-order integrations in combination with refinements based on
importance sampling).
Given variants of limit state approximation, approximation order, integration
approach, and MPP search type, the number of algorithmic combinations is
significant. Table 49.2 provides a succinct mapping for some of these combinations
to common method names from the reliability literature.
4.2.1 EGRA
As the name implies, efficient global reliability analysis (EGRA) [6] has its
roots in efficient global optimization (EGO) [18, 21]. The main idea in EGO-type
optimization methods is that a global approximation is made of the underlying
function. This Gaussian process model approximation is used to guide the search
by finding points which maximize the expected improvement function (EIF). The
EIF is used to select the location at which a new training point should be added
to the Gaussian process model by maximizing the amount of improvement in the
objective function that can be expected by adding that point. A point could be
expected to produce an improvement in the objective function if its predicted value
is better than the current best solution, or if the uncertainty in its prediction is such
that the probability of it producing a better solution is high. Because the uncertainty
is higher in regions of the design space with fewer observations, this provides a
balance between exploiting areas of the design space that predict good solutions
and exploring areas where more information is needed.
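The exploit/explore balance described above is encoded in the closed-form EIF. The sketch below (an illustration under the usual Gaussian-predictive assumption, not Dakota or EGO library code) evaluates the EIF from a GP's predictive mean and standard deviation at a candidate point:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best):
    """EIF for minimization: expected improvement over the current best
    observed value f_best, given a GP prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)  # no predictive uncertainty left
    z = (f_best - mu) / sigma
    # first term rewards a good predicted mean (exploitation);
    # second term rewards predictive uncertainty (exploration)
    return (f_best - mu) * norm_cdf(z) + sigma * norm_pdf(z)
```

A point whose mean matches the current best still has positive EI whenever σ > 0, and at equal means a larger σ yields a larger EI, which is exactly the exploration incentive in sparsely sampled regions.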
In the case of reliability analysis, specified with global_reliability, the
goal of optimizing an objective function is replaced with the goal of resolving the
failure boundary. In this case, the Gaussian process model is adaptively refined to
accurately resolve a particular contour of a response function using a variation of
the EIF known as the expected feasibility function. Then, the failure probabilities
are estimated on this adaptively refined Gaussian process model using multimodal
adaptive importance sampling.
4.2.2 GPAIS
Gaussian process adaptive importance sampling (GPAIS) [7] starts with an initial
set of LHS samples and adds samples one at a time, with the goal of adaptively
improving the estimate of the ideal importance density during the process. The
approach uses a mixture of component densities. An iterative process is used to
construct the sequence of improving component densities. The GPs are not used
to directly calculate the failure probability; they are only used to approximate
the importance density. Thus, GPAIS overcomes limitations involving using a
potentially inaccurate surrogate model directly in importance sampling calculations.
This method is specified with the keyword gpais. There are three main controls
which govern the behavior of the algorithm. samples specifies the initial number
of Latin hypercube samples which are used to create the initial Gaussian process
surrogate. emulator_samples specifies the number of samples taken on the
latest Gaussian process model each iteration of the algorithm. The third control is
max_iterations, which controls the number of iterations of the algorithm after
the initial LHS samples.
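The importance-sampling idea underlying GPAIS is worth making concrete. The sketch below (not GPAIS itself, which adapts a mixture density; this is plain fixed-proposal importance sampling) estimates a small failure probability for a hypothetical one-dimensional limit state by recentring a unit-variance Gaussian proposal on the failure boundary and weighting each failure indicator by the density ratio:

```python
import math
import random

def importance_sampling_pf(g, proposal_mean, n=20000, seed=7):
    """Failure probability P(g(x) < 0) for x ~ N(0,1), estimated with a
    Gaussian proposal N(proposal_mean, 1) centred near the failure
    region; each indicator is weighted by p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    m = proposal_mean
    for _ in range(n):
        x = rng.gauss(m, 1.0)
        if g(x) < 0.0:
            # density ratio of N(0,1) over N(m,1)
            total += math.exp(-0.5 * x * x + 0.5 * (x - m) ** 2)
    return total / n

# hypothetical limit state: failure when x > 3, i.e. g(x) = 3 - x
est = importance_sampling_pf(lambda x: 3.0 - x, proposal_mean=3.0)
true = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # Phi(-3), about 1.35e-3
```

With the proposal centred on the boundary, roughly half the samples land in the failure region, so a probability near 10^-3 is resolved with far fewer evaluations than crude Monte Carlo would need; GPAIS pursues the same variance reduction while learning the proposal adaptively.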
either in the failure or the non-failure region. This radius depends on the function
evaluation at the disk center, the failure threshold, and an estimate of the function
gradient at the disk center.
After exhausting the sampling budget specified by samples, which is the
number of spheres per failure threshold, the domain is decomposed into two regions.
These regions correspond to failure and non-failure, each represented by the union
of the spheres of each type. After sphere construction, a surrogate model is built and
extensively sampled to estimate the probability of failure for each threshold.
The surrogate model can either be a global surrogate or an ensemble of local
surrogates. The local option leverages a piecewise surrogate model, called a Voronoi
piecewise surrogate (VPS). The VPS model is used to construct high-dimensional
surrogates fitting a few data points, allowing the user to estimate high-dimensional
function values with cheap polynomial evaluations. The core idea in the VPS is to
naturally decompose a high-dimensional domain into its Voronoi tessellation, with
the few given data points as Voronoi seeds. Each Voronoi cell then builds its own
surrogate. The surrogate used is a polynomial that passes through the cell's seed and
optimally fits its local neighbors by minimizing their error in the least squares sense.
Therefore, a function evaluation of a new point requires finding the closest seed and
using its particular polynomial coefficients to find the function value estimate.
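The nearest-seed lookup at the heart of this scheme can be sketched in a few lines (a minimal illustration, not the VPS implementation; the two 1D cells and their local linear fits are hypothetical, and the least-squares fitting of each cell's polynomial is omitted):

```python
def nearest_seed(point, seeds):
    """Index of the Voronoi cell containing the point, i.e. the closest
    seed in squared Euclidean distance."""
    return min(range(len(seeds)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(point, seeds[i])))

def vps_predict(point, seeds, local_models):
    """Piecewise prediction: locate the owning cell, then evaluate that
    cell's own local surrogate (here, a supplied callable)."""
    return local_models[nearest_seed(point, seeds)](point)

# tiny 1D illustration: two cells, each with its own local fit
seeds = [(0.2,), (0.8,)]
local_models = [lambda p: 1.0 + p[0],   # local fit owned by the cell at 0.2
                lambda p: 10.0 * p[0]]  # local fit owned by the cell at 0.8
```

Evaluation cost is one nearest-neighbor search plus one cheap polynomial evaluation, which is what makes the decomposed surrogate inexpensive to sample extensively.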
Figure 49.10 shows the Dakota input file for an example problem that demonstrates
the simplest reliability method, called the mean value method (also referred to as the
mean value first-order second-moment (MVFOSM) method).
interface
  fork asynch
  analysis_driver = 'text_book'
variables
  lognormal_uncertain = 2
    means = 1. 1.
    std_deviations = 0.5 0.5
    descriptors = 'TF1ln' 'TF2ln'
responses
  response_functions = 3
  numerical_gradients
    method_source dakota
    interval_type central
    fd_gradient_step_size = 1.e-4
  no_hessians
Fig. 49.10 Mean value reliability method: the Dakota input file
49 Dakota: Bridging Advanced Scalable Uncertainty Quantification… 1673
-----------------------------------------------------------------
MV Statistics for response_fn_1:
Approximate Mean Response = 0.0000000000e+00
Approximate Standard Deviation of Response = 0.0000000000e+00
Importance Factors not available.
MV Statistics for response_fn_2:
Approximate Mean Response = 5.0000000000e-01
Approximate Standard Deviation of Response = 1.0307764064e+00
Importance Factor for variable TF1ln = 9.4117647059e-01
Importance Factor for variable TF2ln = 5.8823529412e-02
MV Statistics for response_fn_3:
Approximate Mean Response = 5.0000000000e-01
Approximate Standard Deviation of Response = 1.0307764064e+00
Importance Factor for variable TF1ln = 5.8823529412e-02
Importance Factor for variable TF2ln = 9.4117647059e-01
-----------------------------------------------------------------
Fig. 49.11 Results of the mean value method on the textbook function
5 Epistemic Methods
Uncertainty quantification is often used for assessing the risk, reliability, and safety
of engineered systems. In these contexts, uncertainty is increasingly separated into
two categories for analysis purposes: aleatory and epistemic uncertainty [16, 27].
Aleatory uncertainty is also referred to as variability, irreducible or inherent
uncertainty, or uncertainty due to chance. Examples of aleatory uncertainty include
the height of individuals in a population, or the temperature in a processing
environment. Aleatory uncertainty is usually modeled with probability distributions.
In contrast, epistemic uncertainty refers to lack of knowledge or lack of information
about a particular aspect of the simulation model, including the system and
environment being modeled. An increase in knowledge or information relating to
epistemic uncertainty will lead to a reduction in the predicted uncertainty of the
system response or performance. Epistemic uncertainty is also referred to as subjective,
reducible, or lack of knowledge uncertainty. Examples of epistemic uncertainty
include little or no experimental data for a fixed but unknown physical parameter,
incomplete understanding of complex physical phenomena, uncertainty about the
correct model form to use, etc.
There are many approaches which have been developed to model epistemic
uncertainty, including fuzzy set theory, possibility theory, and evidence theory.
There are three approaches to treat epistemic uncertainties in Dakota: interval
analysis, evidence theory, and subjective probability. In the case of subjective
probability, the same probabilistic methods for sampling, reliability, or stochastic
expansion may be used, albeit with a different subjective interpretation of the
statistical results. We describe the interval analysis and evidence theory capabilities
in the following sections.
In interval analysis, one assumes only that the value of each epistemic uncertain
variable lies somewhere within a specified interval. It is not assumed that the value
has a uniform probability over the interval. Instead, the interpretation is that any
value within the interval is a possible value or a potential realization of that variable.
In interval analysis, the uncertainty quantification problem translates to determining
bounds on the output (defining the output interval), given interval bounds on the
inputs. Again, any output response that falls within the output interval is a possible
output with no frequency information assigned to it.
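A sampling-based version of this bound computation is easy to sketch (an illustration, not Dakota's global_interval_est implementation; the two-variable response on the unit box is hypothetical). Points are drawn anywhere in the input box, with no distributional claim intended, and only the observed extremes are kept:

```python
import random

def interval_bounds(f, lower, upper, n=2000, seed=42):
    """Sampling-based interval propagation: evaluate f at points spread
    through the input box and return the observed output interval
    [min f, max f]. No probability structure is imputed to the inputs;
    the samples merely probe the box."""
    rng = random.Random(seed)
    lo, hi = float("inf"), float("-inf")
    for _ in range(n):
        x = [rng.uniform(a, b) for a, b in zip(lower, upper)]
        y = f(x)
        lo, hi = min(lo, y), max(hi, y)
    return lo, hi

# hypothetical response on the box [0,1] x [0,1]; true bounds are [0, 1.25]
lo, hi = interval_bounds(lambda x: (x[0] - 0.5) ** 2 + x[1],
                         [0.0, 0.0], [1.0, 1.0])
```

Note that sampling can only shrink the reported interval inward from the true one, which is why optimization-based bounding (as in Dakota's ego option) is preferred when the extremes matter.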
Dakota supports interval analysis using either global (global_interval_est)
or local (local_interval_est) approaches. The global approach uses
either optimization or sampling to estimate the bounds, with options for acceleration
with surrogates. Specifying the keyword lhs performs Latin hypercube sampling
and takes the minimum and maximum of the samples as the bounds (no optimization
is performed), while optimization approaches are specified via ego, sbo, or ea.
In the case of ego, the efficient global optimization method adaptively refines a
Gaussian process surrogate to calculate bounds. The latter two (sbo for surrogate-
based optimization and ea for evolutionary algorithm) support mixed-integer
nonlinear programming (MINLP), enabling the inclusion of discrete epistemic
parameters such as model form selections