Académique Documents
Professionnel Documents
Culture Documents
Geostatistical Models
Hannes Kazianka and Jurgen
Pilz
1 Introduction
Copulas describe the dependence between random variables independently of their
marginal distributions. They are commonly used in financial and actuarial statistics,
however, they are just beginning to become popular in geostatistics. Spatial dependence is traditionally described using the variogram which is strongly influenced by
the univariate distribution of the random field. Extreme outlying observations adversely affect the empirical and theoretical variogram estimates. Moreover, spatial
modeling often relies on the Gaussian assumption which is hardly fulfilled for environmental processes. To circumvent these disadvantages Bardossy (2006) proposed
the use of copulas to describe the spatial variability. In the following we adopt this
P.M. Atkinson and C.D. Lloyd (eds.), geoENV VII Geostatistics for Environmental
Applications, Quantitative Geology and Geostatistics 16,
c Springer Science+Business Media B.V. 2010
DOI 10.1007/978-90-481-2322-3 27,
307
308
methodology and use it not only for spatial modeling of dependence structures but
also for spatial interpolation. We present three methods for estimating the values
of the random field at unknown locations. The first method we suggest is indicator
and disjunctive kriging. The second method is rank-order kriging, originally proposed by Journel and Deutsch (1996), where we calculate the covariance function
through its relationship to the Spearman rank correlation. The third method is a
plug-in Bayes predictor and can be used if all multivariate distributions of the random field are modeled using the copula.
The paper is organized as follows. Section 2 reviews the basic properties of copulas, while Section 3 briefly describes the spatial copula methodology. In Section 4
the spatial interpolation techniques using copulas are presented and in Section 5
they are used to analyze the Joker data set from the spatial interpolation comparison
SIC2004 (Dubois, 2005). Section 6 is devoted to conclusions.
2 Copulas
The word copula was first used by Sklar (1959) to describe distribution functions
on the n-dimensional unit cube, I n , that link multivariate distributions to their onedimensional margins. To be precise, an n-dimensional copula is an n-dimensional
real function C W I n ! I which satisfies the following properties:
1. For every u 2 I n
C .u/ D 0 if at least one coordinate of u equals 0;
C .u/ D uk if all coordinates of u are 1 except uk :
2. For every a; b 2 I n with a b the n-th order difference of C on a; b
: : : ba11 C .u/ 0:
VC .a; b/ D bann ban1
n1
b
(1)
309
(2)
h F11 .u1 / ; : : : ; Fn1 .un /
@n C .u1 ; : : : ; un /
D
;
Qn
1
@u1 : : : @un
.ui /
i D1 fi Fi
(3)
where h denotes the density of H and the fi denote the densities of Fi . One of the
advantages of working with copulas is that they are invariant under strictly increasing transformations of the random variables. Therefore, typical data transformation
methods, such as taking the logarithm or performing a BoxCox transformation,
have no impact on the copula. Bivariate copulas are directly linked to the scale free
measure of association known as Spearmans rho. The Spearman rank correlation
between two random variables X1 and X2 with copula C can be calculated as
Z Z
X1 ;X2 D 12
Z Z
I2
u1 u2 dC .u1 ; u2 / 3 D 12
I2
C .u1 ; u2 / d u1 d u2 3: (4)
(5)
310
n 1
2X
.1/
Pn
j D1 ij
m; ."i / ;
i D0
P2n 1
d .z1 ; : : : ; zn / D
m; ."i /
qQ
;
n
2n
i D1 zi
i D0
Pn
p
p
j 1
, "i D .1/i1 z1 ; : : : ; .1/in zn and
where ij 2 f0; 1g, i D
j D1 ij 2
; denotes the Gaussian density function. Using Eqs. 23 the copula and its density can be evaluated.
311
for the family of marginal distributions FZ . Inference for all the parameters can be
based on the maximum likelihood approach. If the copula density can be evaluated
for all n 2 dimensions, maximization of the likelihood is not difficult. However, as
is the case for the non-central 2 -copula, it may occur that calculation of the copula
density is infeasible for higher dimensions. Here we proceed to perform maximum
likelihood estimation only with the bivariate copula densities. Under the assumption
that different pairs of observations are treated as independent we have to maximize
l . I Z .x// D
Y
c; F .Z .x i // ; F Z x j
f .Z .x k // ;
k2fi;j g
i;j 2f1;:::;ng
i j
where D .; ; / is the parameter vector, c; is the copula density, F is the
marginal distribution as a function of and f denotes its density. The procedure
works well as long as there is no intention to estimate anisotropy.
An advantage of working with copulas that are constructed from elliptical distributions is that the correlation matrix explicitly appears in their analytical expression.
If we parameterize the correlation matrix using a geostatistical covariance model,
we no longer need to estimate a sill since it is equal to 1. The reason for this is that
the overall variance of the random field is a property of the marginal distribution
and the copula describes the dependence structure without information about the
margins.
u20;1 2
p
where Cn D n .Cn C /, Cn is the empirical copula calculated using the n data
points and C is the estimation under the null hypothesis. The steps of the algorithm
are as follows:
1. For each of the lags h1 ; : : : ; hr compute the empirical copula Cnh1 ; : : : ; Cnhr .
2. Estimate the theoretical copula, for example using the maximum likelihood approach. Denote the estimated parameters by . For every lag class there is a
corresponding theoretical copula Ch1 ; : : : ; Chr .
312
N
1 X hj
h
I Sn;k > Sn j
N
or phj D
kD1
N
1 X hj
h
I Tn;k > Tn j ;
N
kD1
(6)
313
xj D
n
X
i D1
!
n
X
1
1
i v .x i / C
i ;
2
i D1
1
where j D n C 1; : : : ; n C m.Since
back-transforming v x j using FZ would
lead to a biased estimate for Z x j , a bias correction is introduced
z x j D FZ1 v x j C x j FZ1 L v x j
FZ1 v x j
; (7)
314
where L ./ is the distribution function of all kriged values v x j and .x 0 / D
!
2
K
.x j /
2
2
with K
being the kriging variance, max
being the maximal kriging
2
max
1
1
if x 2 max f0; ag ; v x j ;
< min 2.v .x /a/ ; 2v .x / ;
j
j
d .x/ D
1
1
315
100
1 X
k
0:5
zp x j with p D
ZO x j D
:
100
100 100
kD1
316
calculated by just using a Jacobian transformation. To get back from the ranks to the
original scale the transformation is FZ1 . The corresponding Jacobian determinant
is exactly the density fZ . Hence,
O D D ch FZ .z .x 0 // j ;
O D fZ .z .x 0 // :
p z .x 0 / j ;
(8)
1
1
O D fZ .z .x 0 // d z .x 0 /
z .x 0 / ch FZ .z .x 0 // j ;
O D d v .x 0 / :
FZ1 .v .x 0 // ch v .x 0 / j ;
1
0
2
O D d v .x 0 / :
FZ1 .v .x 0 // ZO .x 0 / ch v .x 0 / j ;
317
and the Gaussian spatial copula model is fully determined. If we assume that the
random field follows the Gaussian spatial copula model with known FZ , then
g .z/ WD 1 .FZ .z// is a suitable transformation.
Since we can also use any other copula different from the Gaussian copula in our
approach, we observe that the spatial copula model is a generalization of the transGaussian model. Even if we want to stay within the Gaussian framework, it is more
convenient to use the copula methodology because it is easier to specify the univariate distribution of the random field than to determine a suitable transformation
function. Especially when we work with multimodal or extreme value data this fact
becomes obvious.
For certain copula families not all data values can be used to build the predictive
distribution. For the non-central 2 -copula mentioned in Section 3.1 this happens
because one would need to evaluate 2n terms for the calculation of the conditional
copula. In these cases we propose a local prediction.
a
7
x 105
6
5
1400
1200
1000
800
2
600
400
200
0
1
1
0
8
0.5
0.5
1.5
2.5
3
5
x 10
x 105
3
6
2
4
1
2
x 10
2 1
Fig. 1 The locations of the Joker training (dots) and test data (circles) are displayed in (a).
A surface plot of the Joker data is shown in (b)
318
1600
1400
True Values
1200
1000
800
600
400
200
0
0
b
18
16
14
12
10
8
6
4
2
200 400 600 800 1000 1200 1400 1600 0
Prediction
x 105
Fig. 2 The predicted values of the test set are plotted against the true values in (a). The predictive
density at a hotspot is displayed in (b)
in Section 3.3 we find out that it is sufficient to work with the Gaussian spatial
copula. The correlation matrix of the Gaussian distribution is parameterized (cf.
Section 3.1) by a mixture of a Gaussian and an exponential correlation model.
Geometric anisotropy is considered by a 2 2 transformation matrix, where one
entry is fixed to avoid interference with the ranges of the correlation models. All
parameters, including the nugget, two ranges, one mixing parameter, three parameters for the generalized extreme value distribution and three anisotropy parameters,
are estimated using the maximum likelihood approach. Note that there is no need
to estimate a sill. Prediction is performed using the plug-in Bayes approach. The
predicted values are plotted against the true values in Fig. 2a. The predictive density at a hotspot is visualized in Fig. 2b and it shows that values around 1,500
are still contained in the body of the distribution. The results for the test data are:
RMSE D 65:87, MAE D 16:22, ME D 2:58 and Pearson-r D 0:71. Compared
to the results of more than 30 participants of the SIC2004 this would be the third
smallest RMSE, the second smallest MAE and the third largest Pearson correlation.
6 Conclusion
Copulas can be used to describe spatial dependence and in this way generalize the
concept of the variogram. Moreover, the spatial copula model can be used to perform spatial interpolation. It generalizes the trans-Gaussian kriging method and is
therefore a flexible tool for working with non-Gaussian, multimodal and extreme
value data. Specifying the univariate distribution of the random field is easier than
finding a suitable transformation for trans-Gaussian kriging, which makes the spatial
copula model attractive even if the Gaussian copula is used. Another advantage is
that the sill need not to be estimated and, hence, the model contains one parameter
319
less. Results on the SIC2004 Joker data demonstrate that the presented approach
could be applied for emergency monitoring and estimating exceedance probabilities
for certain emergency thresholds in environmental monitoring systems.
Acknowledgement This work was partially funded by the European Commission, under the Sixth
Framework Programme, by the Contract N. 033811 with DG INFSO, Action Line IST- 2005-2.5.12
ICT for Environmental Risk Management. The views expressed herein are those of the authors and
are not necessarily those of the European Commission.
References
Bardossy A (2006) Copula-based geostatistical models for groundwater quality parameters. Water
Resour Res 42(W11416). doi:10.1029/2005WR004754
Diggle P, Ribeiro P (2007) Model-based geostatistics. Springer, New York
Dubois G (2005) Automatic mapping algorithms for routine and emergency monitoring data. EC
Joint Research Centre, Belgium
Genest C, Remillard B (2008) Validity of the parametric bootstrap for goodness-of-fit testing in
semiparametric models. Ann I H Poincare B 44:10961127
Journel A, Deutsch C (1996) Rank order geostatistics. In: Baafi E, Schofield N (eds) Geostatistics
Wollongong 96. Kluwer, Dordrecht, pp 174187
Nelsen R (2006) An introduction to copulas. Springer, New York
Rivoirard J (1994) Introduction to disjunctive kriging and non-linear geostatistics. Oxford
University Press, Oxford
Saito H, Goovaerts P (2000) Geostatistical interpolation of positively skewed and censored data in
a dioxin-contaminated site. Environ Sci Technol 34:42284235
Sklar A (1959) Fonctions de repartition a n dimensions et leurs marges. Publ Inst Stat Univ Paris
8:229231